Professional Documents
Culture Documents
Probability Theory
An Analytic View, Second Edition
This second edition of Daniel W. Stroocks text is suitable for rst-year graduate
students with a good grasp of introductory undergraduate probability. It provides
a reasonably thorough introduction to modern probability theory with an emphasis on the mutually benecial relationship between probability theory and analysis. It includes more than 750 exercises and oers new material on Levy processes,
large deviations theory, Gaussian measures on a Banach space, and the relationship
between a Wiener measure and partial dierential equations.
The rst part of the book deals with independent random variables, Central Limit
phenomena, the general theory of weak convergence and several of its applications, as
well as elements of both the Gaussian and Markovian theories of measures on function
space. The introduction of conditional expectation values is postponed until the
second part of the book, where it is applied to the study of martingales. This part also
explores the connection between martingales and various aspects of classical analysis
and the connections between a Wiener measure and classical potential theory.
Dr. Daniel W. Stroock is the Simons Professor of Mathematics Emeritus at the
Massachusetts Institute of Technology. He has published many articles and is the
author of six books, most recently Partial Dierential Equations for Probabilists
(2008).
Probability Theory
An Analytic View
Second Edition
Daniel W. Stroock
Massachusetts Institute of Technology
Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
Table of Dependence . . . . . . . . . . . . . . . . . . . . . . xxi
Chapter 1 Sums of Independent Random
1.1 Independence . . . . . . . . . . . .
1.1.1. Independent -Algebras . . . . . .
1.1.2. Independent Functions . . . . . . .
1.1.3. The Rademacher Functions . . . . .
Exercises for 1.1 . . . . . . . . . . . .
1.2 The Weak Law of Large Numbers . . .
1.2.1. Orthogonal Random Variables . . .
1.2.2. Independent Random Variables . . .
1.2.3. Approximate Identities . . . . . . .
Exercises for 1.2 . . . . . . . . . . . .
1.3 Cramers Theory of Large Deviations . .
Exercises for 1.3 . . . . . . . . . . . .
1.4 The Strong Law of Large Numbers . . .
Exercises for 1.4 . . . . . . . . . . . .
1.5 Law of the Iterated Logarithm . . . .
Exercises for 1.5 . . . . . . . . . . . .
Variables
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
. . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
1
1
4
5
7
14
14
15
16
20
22
31
35
42
49
56
59
60
60
62
65
71
72
75
81
82
82
84
87
90
viii
Contents
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. 96
. 96
. 101
. 105
. 110
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
115
116
117
119
122
123
126
137
139
139
141
147
. . . . . . . . . . . . . . . . . . . 151
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
152
153
156
159
160
161
163
168
170
171
174
177
178
180
182
183
185
187
Contents
ix
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
193
194
198
202
205
206
212
214
217
221
226
. .
.
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
233
233
239
240
244
245
248
251
253
256
257
257
262
263
. . . . . . . . . 266
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
266
266
267
270
272
276
280
282
282
284
289
Contents
Exercises for 7.2 . . . . . . . . . . . .
7.3 The Reflection Principle Revisited . . .
7.3.1. Reflecting Symmetric Levy Processes
7.3.2. Reflected Brownian Motion . . . . .
Exercises for 7.3 . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
290
292
292
294
298
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
299
299
299
303
306
306
306
307
310
313
317
317
318
322
326
328
330
337
337
340
342
343
344
346
349
355
358
358
361
363
365
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
367
367
367
370
377
Contents
Exercises for 9.1 . . . . . . . . . . . . . . .
9.2 Regular Conditional Probability Distributions .
9.2.1. Fibering a Measure . . . . . . . . . . .
9.2.2. Representing Levy Measures via the Ito Map
Exercises for 9.2 . . . . . . . . . . . . . . .
9.3 Donskers Invariance Principle . . . . . . . .
9.3.1. Donskers Theorem . . . . . . . . . . .
9.3.2. Rayleighs Random Flights Model . . . . .
Exercise for 9.3 . . . . . . . . . . . . . . .
xi
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
381
386
388
390
392
392
393
396
399
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
400
400
401
404
405
407
411
415
416
416
417
418
426
429
429
431
436
439
445
449
.
.
.
.
.
.
.
.
.
.
.
.
456
456
456
459
463
468
472
475
476
477
486
487
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
xii
Contents
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
. . . .
Capacity
. . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
488
489
496
497
497
500
504
507
514
Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 521
Preface
xiv
Preface
tell the reader enough so that he could understand the ideas and not so much
that he would become bored by them. In addition, they gave me an introduction
to a host of ideas and techniques (e.g., stopping times and the strong Markov
property), all of which Kac himself consigned to the category of overelaborated
measure theory. In fact, it would be reasonable to say that my thesis was simply
the application of techniques which I picked up from Dynkin to a problem that
I picked up by reading some notes by Kac. Of course, along the way I profited
immeasurably from continued contact with McKean, a large number of courses
at N.Y.U. (particularly ones taught by M. Donsker, F. John, and L. Nirenberg),
and my increasingly animated conversations with S.R.S. Varadhan.
As I trust the preceding description makes clear, my graduate education was
anything but deprived; I had ready access to some of the very best analysts
of the day. On the other hand, I never had a proper introduction to my field,
probability theory. The first time that I ever summed independent random
variables was when I was summing them in front of a class at N.Y.U. Thus,
although I now admire the magnificent body of mathematics created by A.N.
Kolmogorov, P. Levy, and the other twentieth-century heroes of the field, I
am not a dyed-in-the-wool probabilist (i.e., what Donsker would have called a
true coin-tosser). In particular, I have never been able to develop sufficient
sensitivity to the distinction between a proof and a probabilistic proof. To me,
a proof is clearly probabilistic only if its punch-line comes down to an argument
like P (A) P (B) because A B; and there are breathtaking examples of such
arguments. However, to base an entire book on these examples would require a
level of genius that I do not possess. In fact, I myself enjoy probability theory
best when it is inextricably interwoven with other branches of mathematics and
not when it is presented as an entity unto itself. For this reason, the reader
should not be surprised to discover that he finds some of the material presented
in this book does not belong here; but I hope that he will make an effort to figure
out why I disagree with him.
Preface to the Second Edition
My favorite preface to a second edition is the one that G.N. Watson wrote for
the second edition of his famous treatise on Bessel functions. The first edition
appeared in 1922, the second came out in 1941, and Watson had originally
intended to stay abreast of developments and report on them in the second
edition. However, in his preface to the second edition Watson admits that his
interest in the topic had waned during the intervening years and apologizes
that, as a consequence, the new edition contains less new material than he had
thought it would.
My excuse for not incorporating more new material into this second edition is
related to but somewhat different from Watsons. In my case, what has waned
is not my interest in probability theory but instead my ability to assimilate
the transformations that the subject has undergone. When I was a student,
Preface
xv
xvi
Preface
Preface
xvii
to extend naturally both to -finite measure spaces as well as to random variables with values in a Banach space. Section 5.2 presents Doobs basic theory
of real-valued, discrete parameter martingales: Doobs Inequality, his Stopping
Time Theorem, and his Martingale Convergence Theorem. In the last part of
5.2, I introduce reversed martingales and apply them to DeFinettis theory of
exchangeable random variables.
6: Chapter 6 opens with extensions of martingale theory in two directions: to
-finite measures and to random variables with values in a Banach space. The
results in 6.1 are used in 6.2 to derive Birkhoffs Individual Ergodic Theorem
and a couple of its applications. Finally, in 6.3 I prove Burkholders Inequality
for martingales with values in a Hilbert space. The derivation that I give is
essentially the same as Burkholders second proof, the one that gives optimal
constants.
7: Section 7.1 provides a brief introduction to the theory of martingales with
a continuous parameter. As anyone at all familiar with the topic knows, anything approaching a full account of this theory requires much more space than a
book like this can give it. Thus, I deal with only its most rudimentary aspects,
which, fortunately, are sufficient for the applications to Brownian motion that I
have in mind. Namely, in 7.2 I first discuss the intimate relationship between
continuous martingales and Brownian motion (Levys martingale characterization of Brownian motion), then derive the simplest (and perhaps most widely
applied) case of the DoobMeyer Decomposition Theory, and finally show what
Burkholders Inequality looks like for continuous martingales. In the concluding section, 7.3, the results in 7.17.2 are applied to derive the Reflection
Principle for Brownian motion.
8: In 8.1 I formulate the description of Brownian motion in terms of its Gaussian, as opposed to its independent increment, properties. More precisely, following Segal and Gross, I attempt to convince the reader that Wiener measure
(i.e., the distribution of Brownian motion) would like to be the standard Gauss
measure on the Hilbert space H 1 (RN ) of absolutely continuous paths with a
square integrable derivative, but, for technical reasons, cannot live there and
has to settle for a Banach space in which H 1 (RN ) is densely embedded. Using
Wiener measure as the model, in 8.2 I show that, at an abstract level, any
non-degenerate, centered Gaussian measure on an infinite dimensional, separable Banach space shares the same structure as Wiener measure in the sense
that there is always a densely embedded Hilbert space, known as the Cameron
Martin space, for which it would like to be the standard Gaussian measure but
on which it does not fit. In order to carry out this program, I need and prove
Ferniques Theorem for Gaussian measures on a Banach space. In 8.3 I begin
by going in the opposite direction, showing how to pass from a Hilbert space H
to a Gaussian measure on a Banach space E for which H is the CameronMartin
space. The rest of 8.3 gives two applications: one to pinned Brownian motion
xviii
Preface
and the second to a very general statement of orthogonal invariance for Gaussian
measures. The main goal of 8.4 is to prove a large deviations result, known as
Schilders Theorem, for abstract Wiener spaces; and once I have Schilders Theorem, I apply it to derive a version of Strassens Law of the Iterated Logarithm.
Starting with the OrnsteinUhlenbeck process, I construct in 8.5 a family of
Gaussian measures known in the mathematical physics literature as Euclidean
free fields. In the final section, 8.6, I first show how to construct Banach space
valued Brownian motion and then derive the original form of Strassens Law of
the Iterated Logarithm in that context.
9: The central topic here is the abstract theory of weak convergence of probability measures on a Polish space. The basic theory is developed in 9.1. In
9.2 I apply the theory to prove the existence of regular conditional probability
distributions, and in 9.3 I use it to derive Donskers Invariance Principle (i.e.,
the pathspace statement of the Central Limit Theorem).
10: Chapter 10 is an introduction to the connections between probability theory and partial differential equations. At the beginning of 10.1 I show that
martingale theory provides a link between probability theory and partial differential equations. More precisely, I show how to represent in terms of Wiener
integrals solutions to parabolic and elliptic partial differential equations in which
the Laplacian is the principal part. In the second part of 10.1, I use this link to
calculate various Wiener integrals. In 10.2 I introduce the Markov property of
Wiener measure and show how it not only allows one to evaluate other Wiener
integrals in terms of solutions to elliptic partial differential equations but also
enables one to prove interesting facts about solutions to such equations as a consequence of their representation in terms of Wiener integrals. Continuing in the
same spirit, I show in 10.2 how to represent solutions to the Dirichlet problem
in terms of Wiener integrals, and in 10.3 I use Wiener measure to construct
and discuss heat kernels related to the Laplacian.
11: The final chapter is an extended example of the way in which probability
theory meshes with other branches of analysis, and the example that I have chosen is the marriage between Brownian motion and classical potential theory. Like
an ideal marriage, this one is simultaneously intimate and mutually beneficial to
both partners. Indeed, the more one knows about it, the more convinced one becomes that the properties of Brownian paths are a perfect reflection of properties
of harmonic functions, and vice versa. In any case, in 11.1 I sharpen the results
in 10.2.3 and show that, in complete generality, the solution to the Dirichlet
problem is given by the Wiener integral of the boundary data evaluated at the
place where Brownian paths exit from the region. Next, in 11.2, I discuss the
Green function for a region and explain how its existence reflects the recurrence
and transience properties of Brownian paths. In preparation for 11.4, 11.3 is
devoted to the Riesz Decomposition Theorem for excessive functions. Finally,
in 11.4, I discuss the capacity of regions, derive Chungs representation of the
Preface
xix
capacitory measure in terms of the last place where a Brownian path visits a
region, apply the probabilistic interpretation of capacity to give a derivation of
Wieners test for regularity, and conclude with two asymptotic calculations in
which capacity plays a crucial role.
Suggestions about the Use of This Book
In spite of the realistic assessment contained in the first paragraph of its preface,
when I wrote the first edition of this book I harbored the nave hope that it
might become the standard graduate text in probability theory. By the time
that I started preparing the second edition, I was significantly older and far less
nave about its prospects. Although the first edition has its admirers, it has
done little to dent the sales record of its competitors. In particular, the first
edition has seldom been adopted as the text for courses in probability, and I
doubt that the second will be either. Nonetheless, I close this preface with a few
suggestions for anyone who does choose to base a course on it.
I am well aware that, except for those who find their way into the poorly
stocked library of some prison camp, few copies of this book will be read from
cover to cover. For this reason, I have attempted to organize it in such a way that,
with the help of the table of dependence that follows, a reader can select a path
which does not require his reading all the sections preceding the information he
is seeking. For example, the contents of 1.11.2, 1.4, 2.1, 2.3, and 5.1
5.2 constitute the backbone of a one semester, graduate level introduction to
probability theory. What one attaches to this backbone depends on the speed
with which these sections are covered and the content of the courses for which
the course is the introduction. If the goal is to prepare the students for a career
as a quant in what is left of the financial industry, an obvious choice is 4.3
and as much of Chapter 7 as time permits, thereby giving ones students a
reasonably solid introduction to Brownian motion. On the other hand, if one
wants the students to appreciate that white noise is not the only noise that they
may encounter in life, one might defer the discussion of Brownian motion and
replace it with the material in Chapter 3 and 4.14.2.
Alternatively, one might use this book in a more advanced course. An introduction to stochastic processes with an emphasis on their relationship to partial
differential equations can be constructed out of Chapters 6, 7, 10, and 11, and
4.3 combined with Chapter 8 could be used to provide background for a course
on Gaussian processes.
Whatever route one takes through this book, it will be a great help to your
students for you to suggest that they consult other texts. Indeed, it is a familiar
fact that the third book one reads on a subject is always the most lucid, and so
one should suggest at least two other books. Among the many excellent choices
available, I mention Wm. Fellers An Introduction to Probability Theory and Its
Applications, Vol. II, and M. Loeves classic Probability Theory. In addition, for
xx
Preface
Table of Dependence
11.111.4
10.110.3
7.47.5
8.18.5
9.3
7.17.3
3.23.3
4.14.3
6.2
2.3
1.5
1.11.4
9.19.2
xxi
Chapter 1
Sums of Independent Random Variables
In one way or another, most probabilistic analysis entails the study of large
families of random variables. The key to such analysis is an understanding
of the relations among the family members; and of all the possible ways in
which members of a family can be related, by far the simplest is when there
is no relationship at all! For this reason, I will begin by looking at families of
independent random variables.
1.1 Independence
In this section I will introduce Kolmogorovs way of describing independence
and prove a few of its consequences.
1.1.1. Independent -Algebras. Let (, F, P) be a probability space
(i.e., is a nonempty set, F is a -algebra over , and P is a non-negative
measure on the measurable space (, F) having total mass 1), and, for each i
from the (non-empty) index set I, let Fi be a sub--algebra of F. I will say
that the -algebras Fi , i I, are mutually P-independent, or, less precisely,
P-independent, if, for every finite subset {i1 , . . . , in } of distinct elements of I
and every choice of Aim Fim , 1 m n,
(1.1.1)
independent -algebras tend to fill up space in a sense made precise by the following beautiful thought experiment designed by A.N. Kolmogorov. Let I be
any index set, take F = {, }, and, for each non-empty subset I, let
!
F =
Fi
Fi
iI
S
be the -algebra
generated by i Fi (i.e., F is the smallest -algebra conS
taining i Fi ). Next, define the tail -algebra T to be the intersection over
all finite I of the -algebras F{ . When I itself is finite, T = {, } and
is therefore P-trivial in the sense that P(A) {0, 1} for every A T . The
interesting remark made by Kolmogorov is that even when I is infinite, T is
P-trivial whenever the original Fi s are P-independent. To see this, for a given
non-empty I, let C denote the collection of sets of the form Ai1 Ain
where {i1 , . . . , in } are distinct elements of and Aim Fim for each 1 m n.
Clearly C is closed under intersection and F = (C ). In addition, by assumption, P(A B) = P(A)P(B) for all A C and B C{ . Hence, by Exercise
1.1.12, F is independent of F{ . But this means that T is independent of FF
for every finite F I, and therefore, again by Exercise 1.1.12, T is independent
of
[
FI =
{FF : F a finite subset of } .
Since T FI , this implies that T is independent of itself ; that is, P(A B) =
P(A)P(B) for all A, B T . Hence, for every A T , P(A) = P(A)2 , or,
equivalently, P(A) {0, 1}, and so I have now proved the following famous
result.
Theorem 1.1.2 (Kolmogorovs 01 Law). Let {Fi : i I} be a family
of P-independent sub--algebras of (, F, P), and define the tail -algebra T
accordingly, as above. Then, for every A T , P(A) is either 0 or 1.
To develop a feeling for the kind of conclusions that can be drawn from Kolmogorovs 01 Law (cf. Exercises 1.1.18 and 1.1.19 as well), let {An : n 1} be
a sequence of subsets of , and recall the notation
lim An
[
\
An = : An for infinitely many n Z+ .
m=1 nm
1.1 Independence
(1.1.4)
P(An ) < = P
n=1
lim An = 0.
(1.1.5)
P(An ) = P
n=1
lim An = 1.
(See part (iii) of Exercise 5.2.40 and Lemma 11.4.14 for generalizations.)
Proof: The first assertion, which is due to E. Borel, is an easy application of
countable additivity. Namely, by countable additivity,
[
X
An lim
P(An ) = 0
P lim An = lim P
n
nm
nm
if
\
\
[
lim An { = 0.
lim P
An { = P
An { = P
m=1 nm
nm
\
n=m
!
An {
= lim
N
Y
n=m
"
N
X
#
P An
=0
n=m
P
if n=1 P(An ) = . (In the preceding, I have used the trivial inequality 1 t
et , t [0, ).)
A second, and perhaps more transparent, way of dealing with the contents of
the preceding is to introduce the non-negative random variable N () Z+
1
P
Theorem, E [N ] = n=1 P(An ), and so Borels contribution is equivalent to
the EP [N ] < = P(N < ) = 1, which is obvious, whereas Cantellis
contribution is that, for mutually independent An s, P(N < ) = EP [N ] <
, which is not obvious.
1.1.2. Independent Functions. Having described what it means for the algebras to be P-independent, I will now transfer the notion to random variables
on (, F, P). Namely, for each i I, let Xi be a random variable (i.e., a
measurable function on (, F)) with values in the measurable space (Ei , Bi )). I
will say that the random variables Xi , i I, are (mutually) P-independent
if the -algebras
(Xi ) = Xi1 Bi Xi1 (Bi ) : Bi Bi , i I,
are P-independent. If B(E; R) = B (E, B); R denotes the space of bounded
measurable R-valued functions on the measurable space (E, B), then it should
be clear that P-independence of {Xi : i I} is equivalent to the statement that
EP fi1 Xi1 fin Xin = EP fi1 Xi1 EP fin Xin
for all finite subsets
{i1 , . . . , in } of distinct elements of I and all choices of
fi1 B Ei1 ; R , . . . , fin B Ein ; R . Finally, if 1A given by
1A ()
if A
if
/A
denotes the indicator function of the set A , notice that the family of sets
{Ai : i I} F is P-independent if and only if the random variables 1Ai , i I,
are P-independent.
Thus far I have discussed only the abstract notion of independence and have
yet to show that the concept is not vacuous. In the modern literature, the
standard way to construct lots of independent
quantities is to take products of
probability spaces.
Namely,
if
E
,
B
,
is
a
probability
space for each i I,
i
i
i
Q
one sets = iI Ei ; defines i : Ei to be the natural projection map
W
for each i I; takes Fi = i1 (Bi ), i I, and F = iI Fi ; and shows that
there is a unique probability measure P on (, F) with the properties that
P i1 i = i i )
1
for all
i I and i Bi
Throughout this book, I use EP [X, A] to denote the expected value under P of X over the set
R
A. That is, EP [X, A] =
X dP. Finally, when A = , I will write EP [X]. Tonellis Theorem
A
is the version of Fubinis Theorem for non-negative functions. Its virtue is that it applies
whether or not the integrand is integrable.
1.1 Independence
I will now show that the Rademacher functions are P-independent. To this end,
first note that every real-valued function f on {1, 1} is of the form + x, x
{1, 1}, for some pair of real numbers and . Thus, all that I have to show is
that
EP (1 + 1 R1 ) (n + n Rn ) = 1 n
for any n Z+ and (1 , 1 ), . . . , (n , n ) R2 . Since this is obvious when
n = 1, I will assume that it holds for n and need only check that it must also
hold for n + 1, and clearly this comes down to checking that
EP F (R1 , . . . , Rn ) Rn+1 = 0
for any F : {1, 1}n R. But (R1 , . . . , Rn ) is constant on each interval
m m+1
, 0 m < 2n ,
Im,n n ,
2n
2
whereas Rn+1 integrates to 0 on each Im,n . Hence, by writing the integral over
as the sum of integrals over the Im,n s, we get the desired result.
At this point I have produced a countably infinite sequence of independent
Bernoulli random variables (i.e., two-valued random variables whose range is
usually either {1, 1} or {0, 1}) with mean value 0. In order to get more general
ta
ba
Lemma 1.1.6. Let {X` : ` Z+ } be a sequence of P-independent {0, 1}valued Bernoulli random variables with mean value 12 on some probability space
(, F, P), and set
X
X`
.
U=
2`
`=1
n ()
1 + Rn ()
,
2
on [0, 1), B[0,1) , [0,1) . But, as is easily checked (cf. part (i) of Exercise 1.1.11),
P
for each [0, 1], = n=1 2n n (). Hence, the desired conclusion is trivial
in this case.
Now let (k, `) Z+ Z+ 7 n(k, `) Z+ be any one-to-one mapping of
+
Z Z+ onto Z+ , and set
Yk,` =
1 + Rn(k,`)
,
2
2
(k, `) Z+ .
Clearly, each Yk,` is a {0, 1}-valued, Bernoulli random variable with mean value
1
+ 2
is P-independent. Hence, by Lemma
2 , and the family Yk,` : (k, `) Z
1.1.6, each of the random variables
Uk
X
Yk,`
`=1
2`
k Z+ ,
t R.
Hence, after combining this with what we already know, I have now completed
the proof of the following theorem.
Theorem 1.1.7. Let = [0, 1), F = B[0,1) , and P = [0,1) . Then, for
any sequence {Fk : k Z+ } of distribution functions on R, there exists a
sequence {Xk : k Z+ } ofP-independent random variables on (, F, P) with
the property that P Xk t = Fk (t), t R, for each k Z+ .
Exercises for 1.1
Exercise 1.1.8. As I pointed out, P A1 A2 = P A1 )P A2 if and only
if the -algebra generated by A1 is P-independent of the one generated by A2 .
Construct an example to show that the analogous statement is false when dealing
with three, instead
of two, sets. That is, just because P A1 A2 A3 =
P A1 P A2 P A3 , show that it is not necessarily true that the three -algebras
generated by A1 , A2 , and A3 are P-independent.
Exercise 1.1.9. This exercise deals with three elementary, but important,
properties of independent random variables. Throughout, (, F, P) is a given
probability space.
(i) Let X1 and X2 be a pair of P-independent random variables with values in
the measurable spaces (E1 , B1 ) and (E2 , B2 ), respectively. Given a B1 B2 measurable function F : E1 E2 R that is bounded below, use Tonellis or
Fubinis Theorem to show that
x2 E2 7 f (x2 ) EP F X1 , x2 R
is B2 -measurable and that
EP F X1 , X2 = EP f X2 .
(ii) Suppose that X1 , . . . , Xn are P-independent, real-valued random variables.
If each of the Xm s is P-integrable, show that X1 Xn is also P-integrable and
that
EP X1 Xn = EP X1 EP Xn .
(iii) Let {Xn : n Z+ } be a sequence of independent random variables taking
values in some separable metric space E. If P(X
n = x) = 0 for all x E and
n Z+ , show that P Xm = Xn for some m 6= n = 0.
cos 2n z
for all z C.
n=1
Z
(i) Show that {P
n () : n 1} is the unique sequence {n : n 1} {0, 1}
n
such that m=1 2m m < 2n , and conclude that 1 () = b2c and
n+1 () = b2n+1 c 2b2n c for n 1.
2n1 (),
n=1
!
n
2n () ,
n=1
and show that [0,1)2 = F [0,1) . That is, [0,1) { : F () } = 2[0,1) () for
all B[0,1)2 .
(iii) Define G : [0, )2 [0, 1) by
X
2n (1 ) + n (2 )
,
G (1 , 2 ) =
4n
n=1
See, for example, 3.1 in the authors A Concise Introduction to the Theory of Integration,
Third Edition, Birkh
auser (1998).
Exercise 1.1.13. In this exercise I discuss two criteria for determining when
random variables on the probability space (, F, P) are independent.
(i) Let X1 , . . . , Xn be bounded, real-valued random variables. Using Weierstrasss Approximation Theorem, show that the Xm s are P-independent if and
only if
EP X1m1 Xnmn = EP X1m1 EP Xnmn
for all m1 , . . . , mn N.
(ii) Let X : Rm and Y : Rn be random variables. Show that X
and Y are P-independent if and only if
h
i
P
E exp 1 , X Rm + , Y Rn
h
h
i
i P
= E exp 1 , X Rm E exp 1 , Y Rn
P
1 (,x)Rm
() d and g(y) =
e 1 (,y)Rn () d,
f (x) =
e
Rm
Rn
where and are smooth functions with rapidly decreasing (i.e., tending
to 0 as |x| faster than any power of (1 + |x|)1 ) derivatives of all orders.
Finally, apply Fubinis Theorem.
Exercise 1.1.14. Given a pair of measurable spaces (E1 , B1 ) and (E
2 , B2 ),
recall that their product is the measurable space E1 E2 , B1 B2 , where
B1 B2 is the -algebra over the Cartesian product space E1 E2 generated by
the sets 1 2 , i Bi . Further, recall that, for any probability measures i
on (Ei , Bi ), there is a unique probability measure 1 2 on E1 E2 , B1 B2
such that
(1 2 ) 1 2 = 1 (1 )2 (2 ) for i Bi .
More Q
generally, for any n 2 and measurable
spaces {(Ei , Bi ) : 1Q i n}, one
Qn
n
n
takes 1 Bi to be the -algebra over 1 Ei generated by the sets 1 i , i Bi .
Qn+1
Qn+1
Qn
In particular, since 1 Ei and 1 Bi can be identified with ( 1 Ei )
10
Qn
En+1 and ( 1 Bi ) Bn+1 , respectively, one can use induction to show that, for
every choice
measures
i on (Ei , Bi ), there is a unique probability
Qn of probability
Qn
Qn
measure 1 i on ( 1 Ei , 1 Bi ) such that
! n !
n
n
Y
Y
Y
i
i =
i (i ), i Bi .
1
on
the
(E
,
B
)s,
there
is
a
unique
probability
measure
i
i
i
iI i on
EI , BI with the property that
!
!!
Y
Y
Y
1
(1.1.15)
i
F
i
=
i i , i Bi ,
iI
iF
iF
for every =
6 F I. Not surprisingly, the probability space
!
Y
Y
Y
Ei ,
Bi ,
i
iI
iI
iI
is called the product over I of the spaces Ei , Bi , i ; and when all the factors
are the same space E, B, , it is customary to denote
it by E I , B I , I , and
if, in addition, I = {1, . . . , N }, one uses E N , B N , N .
(i) After noting (cf. Exercise 1.1.12) that two probability measures that agree on
a -system agree on the -algebra generated bythat -system, show that there
is at most one probability measure on EI , BI that satisfies the condition in
(1.1.15). Hence, the problem is purely one of existence.
(ii) Let A be the algebra over EI generated by C, and show that there is a finitely
additive : A [0, 1] with the property that
!
Y
F1 F =
i F , F BF ,
iF
11
for all =
6 F I. Hence, all that one has to do is check that admits a
-additive extension to BI , and, by a standard extension theorem, this comes
down to checking that (An ) & 0 whenever {An : n 1} A and An & .
Thus, let {An : n 1} be a non-increasing sequence from A, and
Tassume that
(An ) for some > 0 and all n Z+ . One must show that 1 An 6= .
(iii) Referring to the last part of (ii), show that there is no loss in generality
to assume that An = F1
Fn , where, for each n Z+ , =
6 Fn I and
n
Fn BFn . In addition, show that one may assume that F1 = {i1 } and that
Fn = Fn1 {in }, n 2, where {in : n 1} is a sequence of distinct elements
of I. Now, make these assumptions, and show that it suffices to find a` Ei` ,
` Z+ , with the property that, for each m Z+ , (a1 , . . . , am ) Fm .
( iv) Continuing (iii), for each m, n Z+ , define gm,n : EFm [0, 1] so that
gm,n xFm = 1Fn xi1 , . . . , xin
if n m
and
gm,n xFm =
EFn \Fm
n
Y
!
i`
dyFn \Fm
if n > m.
`=m+1
g1,n xi1 i1 dxi1
Ei1
= lim (An ) ,
n
12
iI
iI
let Xi : Ei be the natural projection map from onto Ei , and show that
{Xi : i I} is a family of mutually P-independent random variables such that,
for each i I, Xi has distribution i .
Exercise 1.1.17. Although it does not entail infinite product spaces, an interesting example of the way in which the preceding type of construction can be
effectively applied is provided by the following elementary version of a coupling
argument.
(i) Let (, B, P) be a probability space and X and Y a pair of P-square integrable
R-valued random variables with the property that
X() X( 0 ) Y () Y ( 0 ) 0 for all (, 0 ) 2 .
Show that
EP X Y EP [X] EP [Y ].
Hint: Define Xi and Yi on 2 for i {1, 2} so that Xi () = X(i ) and
Yi () = Y (i ) when = (1 , 2 ), and integrate the inequality
0 X(1 ) X(2 ) Y (1 ) Y (2 ) = X1 () X2 () Y1 () Y2 ()
with respect to P2 .
(ii) Suppose that n Z+ and that f and g are R-valued, Borel measurable
functions on Rn that are non-decreasing with respect to each coordinate (separately). Show that if X = X1 , . . . , Xn is an Rn -valued random variable on a
probability space (, B, P) whose coordinates are mutually P-independent, then
EP f (X) g(X) EP f (X) EP g(X)
so long as f (X) and g(X) are both P-square integrable.
13
Hint: First check that the case when n = 1 reduces to an application of (i).
Next, describe the general case in terms of a multiple integral, apply Fubinis
Theorem, and make repeated use of the case when n = 1.
Exercise 1.1.18. A -algebra is said to be countably generated if it contains
a countable collection of sets that generate it. The purpose of this exercise is to
show that just because a -algebra is itself countably generated does not mean
that all its sub--algebras are.
Let (, F, P) be a probability space and {An : n Z+ F a sequence of
P-independent sub-subsets of F with the property that P(An ) 1 for
some (0, 1). Let Fn be the sub--algebra generated by An . Show that the
tail -algebra T determined by Fn : n Z+ cannot be countably generated.
Hint: Show that C T is an atom in T (i.e., B = C whenever B T \ {} is
contained in C) only if one can write
C = lim Cn
n
\
[
Cn ,
m=1 nm
B`
B` {
if P B` = 1
if P B` = 0
and set
C=
` .
B
`N
Note that, on the one hand, P(C) = 1, while, on the other hand, C is an atom
in T and therefore has probability 0.
Exercise 1.1.19. Here is an interesting application of Kolmogorovs 01 Law
to a property of the real numbers.
(i) Referring to the discussion preceding Lemma 1.1.6 and part (i) of Exercise
1.1.11, define the transformations Tn : [0, 1) [0, 1) for n Z+ so that
Tn () =
Rn ()
,
2n
[0, 1),
and notice (cf. the proof of Lemma 1.1.6) that Tn () simply flips the nth coefficient in the binary expansion . Next, let B[0,1) , and
show that
is measurable with respect to the -algebra {Rn : n > m} generated by
{Rn : n > m} if and only if Tn () = for each 1 n m. In particular,
conclude that [0,1) () {0, 1} if Tn = for every n Z+ .
14
(ii) Let F denote the set of all finite subsets of Z+ , and for each F F, define
T F : [0, 1) [0, 1) so that T is the identity mapping and
T F {m} = T F Tm
As an application of (i), show that for every B[0,1) with [0,1) () > 0,
!
[
[0,1)
T F () = 1.
F F
In particular, this means that if has positive measure, then almost every
[0, 1) can be moved to by flipping a finite number of the coefficients in the
binary expansion of .
1.2 The Weak Law of Large Numbers
Starting with this section, and for the rest of this chapter, I will be studying what
happens when one averages independent, real-valued random variables. The
remarkable fact, which will be confirmed repeatedly, is that the limiting behavior
of such averages depends hardly at all on the variables involved. Intuitively,
one can explain this phenomenon by pretending that the random variables are
building blocks that, in the averaging process, first get homothetically shrunk
and then reassembled according to a regular pattern. Hence, by the time that
one passes to the limit, the peculiarities of the original blocks get lost.
Throughout the discussion, (, F, P) will be a probability space on which there
is a sequence {Xn : n 1} of real-valued random variables. Given n Z+ , use
Sn to denote the partial sum X1 + + Xn and S n to denote the average:
n
1X
Sn
X` .
=
n
n
`=1
and EP Xk X` = 0 if k 6= `.
In particular, if
M sup EP Xn2 < ,
nZ+
then
2 M
,
n Z+ and > 0;
2 P S n EP S n
n
and so S n 0 in L2 (P; R) and therefore also in P-probability.
(1.2.3)
15
(1.2.5)
2 P S n m EP S n m
n
In particular, S n m in L2 (P; R) and therefore in P-probability.
As yet I have made only minimal use of independence: all that I have done
is subtract off the mean of independent random variables and thereby made
them orthogonal. In order to bring the full force of independence into play, one
has to exploit the fact that one can compose independent random variables with
any (measurable) functions without destroying their independence; in particular,
truncating independent random variables does not destroy independence. To see
how such a property can be brought to bear, I will now consider the problem
of extending the last part of Theorem 1.2.4 to Xn s that are less than P-square
integrable. In order to
understand the statement, recall that a family of random
variables Xi : i I is said to be uniformly P-integrable if
h
i
lim sup EP Xi , Xi R = 0.
R% iI
As the proof of the following theorem illustrates, the importance of this condition
is that it allows one to simultaneously approximate the random variables Xi , i
I, by bounded random variables.
16
Theorem 1.2.6 (The Weak Law of Large Numbers). Let Xn : n Z+
be a uniformly P-integrable sequence of P-independent random variables. Then
n
1X
Xm EP [Xm ] 0 in L1 (P; R)
n 1
and therefore also in P-probability. In particular, if Xn : n Z+ is a sequence
of P-independent, P-integrable random variables that are identically distributed,
then S n EP [X1 ] in L1 (P; R) and P-probability. (Cf. Exercise 1.2.11.)
Proof: Without loss in generality, I will assume that EP [Xn ] = 0 for every
n Z+ .
For each R (0, ), define fR (t) = t 1[R,R] (t), t R,
m(R)
= EP fR Xn ,
n
Xn(R) = fR Xn m(R)
n ,
and set
(R)
Sn
1 X (R)
X`
n
and
(R)
Tn
Since E[Xn ] = 0 =
1 X (R)
Y` .
n
`=1
`=1
(R)
mn
= E Xn , |Xn | > R ,
(R)
(R)
EP |S n | EP |S n | + EP |T n |
(R) 1
EP |S n |2 2 + 2 max EP |X` |, |X` | R
1`n
R
EP |X` |, |X` | R ;
+ 2 max
n
`Z+
and therefore, for each R > 0,
lim EP |S n | 2 sup EP |X` |, |X` | R .
`Z+
Hence, because the X` s are uniformly P-integrable, we get the desired convergence in L1 (P; R) by letting R % .
1.2.3. Approximate Identities. The name of Theorem 1.2.6 comes from
a somewhat invidious comparison with the result in Theorem 1.4.9. The reason
why the appellation weak is not entirely fair is that, although The Weak Law
is indeed less refined than the result in Theorem 1.4.9, it is every bit as useful
as the one in Theorem 1.4.9 and maybe even more important when it comes
to applications. What The Weak Law provides is a ubiquitous technique for
constructing an approximate identity (i.e., a sequence of measures that approximate a point mass) and measuring how fast the approximation is taking
17
place. To illustrate how clever selections of the random variables entering The
Weak Law can lead to interesting applications, I will spend the rest of this section
discussing S. Bernsteins approach
to Weierstrasss
Approximation Theorem.
+
For a given p [0, 1], let Xn : n Z
be a sequence of P-independent
{0, 1}-valued Bernoulli random variables with mean value p. Then
P Sn = ` =
n `
p (1 p)n`
`
for
0 ` n.
Hence, for any f C [0, 1]; R , the nth Bernstein polynomial
n
X
n
`
p` (1 p)n`
Bn (p; f )
f
n
`
(1.2.7)
`=0
of f at p is equal to
EP f S n .
In particular,
f (p) Bn (p; f ) = EP f (p) f S n EP f (p) f S n
2kf ku P S n p + (; f ),
where kf ku is the uniform norm of f (i.e., the supremum of |f | over the domain
of f ) and
(; f ) sup |f (t) f (s)| : 0 s < t 1 with t s
is the modulus of continuity of f . Noting that Var Xn = p(1 p)
applying (1.2.5), we conclude that, for every > 0,
1
4
and
f (p) Bn (p; f )
kf ku + (; f ).
u
2n2
f Bn ( ; f )
u (n; f ) inf
kf ku
+ (; f ) : > 0 .
2n2
18
most efficient one,1 as we are about to see, the Bernstein polynomials have a
lot to recommend them. In particular, they have the feature that they provide
non-negative polynomial approximates to non-negative functions. In fact, the
following discussion reveals much deeper non-negativity preservation properties
possessed by the Bernstein approximation scheme.
In order to bring out the virtues of the Bernstein polynomials, it is important to replace (1.2.7) with an expression in which the coefficients of Bn ( ; f )
(as polynomials) are clearly displayed. To this end, introduce the difference
operator h for h > 0 given by
f (t + h) f (t)
.
h f (t) =
h
for m Z+ ,
`=0
(m)
where h
see that
Bn (p; f ) =
n n`
X
X nn `
`=0 k=0
n
X
1
n,
we now
(1)k f (`h)p`+k
r
X
n n`
=
p
(1)r` f (`h)
`
r
`
r=0
r
`=0
n
X
X
r
n
r
=
(p)
(1)` f (`h)
r
`
r=0
r
`=0
n
X
r=0
n
(ph)r rh f (0),
r
Bn (p; f ) =
n
X
`=0
n`
n `
1 f (0)p`
n
`
for
p [0, 1].
See G.G. Lorentzs Bernstein Polynomials, Chelsea Publ. Co. (1986) for a lot more information.
19
X
X
un = 1 and (t) =
un tn for t [0, 1].
n=0
n=0
(0)
0
for every n N and 0 m n.
(iii) m
1
n
Proof: The implication (i) = (ii) is trivial. To see that (ii) implies (iii), first
observe that if is absolutely monotone on (a, b) and h (0, b a), then h
is absolutely monotone on (a, b h). Indeed, because D h = h D on
(a, b h), we have that
h Dm h (t) =
t+h
Dm+1 (s) ds 0,
t (a, b h),
and so m
h (0) 0 when h =
1
n
if
mh < 1,
h% n
we also know that nh (0) 0 when h = n1 , and this completes the proof that
(ii) implies (iii).
Finally, assume that (iii) holds and set n = Bn ( ; ). Then, from (1.2.9) and
the equality n (1) = (1) = 1, we see that each n is a probability generating
function. Thus, in order to complete the proof that (iii) implies (i), all that
20
un,` t` ,
`=0
Because the un,` s are all elements of [0, 1], one can use a diagonalization procedure to choose {nk : k Z+ } so that
lim unk ,` = u` [0, 1]
u` t`
`=0
Finally, by the Monotone Convergence Theorem, the preceding extends immediately to t = 1, and so is a probability generating function. (Notice that
the argument just given does not even use the assumed uniform convergence
and shows that the pointwise limit of probability generating functions is again
a probability generating function.)
The preceding is only one of many examples in which The Weak Law leads
to useful ways of forming an approximate identity. A second example is given
in Exercises 1.2.12 and 1.2.13. My treatment of these is based on that of Wm.
Feller.2
Exercises for 1.2
Exercise 1.2.11. Although, for historical reasons, The Weak Law is usually
thought of as a theorem about convergence in P-probability, the forms in which
I have presented it are clearly results about convergence in either P-mean or
even P-square mean. Thus, it is interesting to discover that one can replace the
uniform integrability assumption made in Theorem 1.2.6 with a weak uniform integrability assumption if one is willing to settle for convergence in P-probability.
Namely, let X1 , . . . , Xn , . . . be mutually P-independent random variables, assume that
F (R) sup RP |Xn | R 0 as R % ,
nZ+
2
Wm. Feller, An Introduction to Probability Theory and Its Applications, Vol. II, Wiley, Series
in Probability and Math. Stat. (1968). Feller provides several other similar applications of The
Weak Law, including the ones in the following exercises.
21
i
1 X Ph
E X` , |X` | n ,
mn =
n
n Z+ .
`=1
n
+
P
max
P S n mn
`
`
1`n
(n)2
`=1
Z n
2
F (t) dt + F (n),
2
n 0
and conclude that S n mn 0 in P-probability. (See part (ii) of Exercises
1.4.26 and 1.4.27 for a partial converse to this statement.)
t P |Y | > t dt.
[0,)
Hint: Let X1 , . . . , Xn , . . . be P-independent, N-valued Poisson random variables with mean value t. That is, the Xn s are P-independent and
tk
for k N.
P Xn = k = et
k!
Show that Sn is an N-valued Poisson random variable with mean value nt, and
conclude that, for each T [0, ) and t (0, ),
X (nt)k
= P Sn T .
ent
k!
0knT
Exercise 1.2.13. Given a right-continuous function F : [0, ) R of bounded variation with F (0) = 0, define its Laplace transform (), [0, ), by
the RiemannStieltjes integral:
Z
() =
et dF (t).
[0,)
as n
knT
22
1.3 Cram
ers Theory of Large Deviations
From Theorem 1.2.4, we know that if Xn : n Z+ is a sequence of Pindependent, P-square integrable random variables with mean value 0, and if
the averages S n , n Z+ , are defined accordingly, then, for every > 0,
max
1mn Var(Xm )
,
n Z+ .
P S n
n2
Thus, so long as
Var(Xn )
0 as n ,
n
the S n s are becoming more and more concentrated near 0, and the rate at
which this concentration is occurring can be estimated in terms of the variances
Var(Xn ). In this section, we will see that, by placing more stringent integrability
requirements on the Xn s, one can gain more information about the rate at which
the S n s are concentrating at 0.
In all of this analysis, the trick is to see how independence can be combined
with 0 mean value to produce unexpected cancellations; and, as a preliminary
warm-up exercise, I begin with the following.
Theorem 1.3.1. Let {Xn : n Z+ } be a sequence of P-independent, Pintegrable random variables with mean value 0, and assume that
M4 sup EP Xn4 < .
nZ+
4 3M4
4 P |S n | EP S n 2 ,
n
In particular, S n 0 P-almost surely.
(1.3.2)
n Z+ .
E Sn4 =
P
EP Xm1 Xm4 ,
m1 ,...,m4 =1
and, by Schwarzs Inequality, each of these terms is dominated by M4 . In addition, of these terms, the only ones that do not vanish have either all their factors
the same or two pairs of equal factors. Thus, the number of non-vanishing terms
is n + 3n(n 1) = 3n2 2n.
Given (1.3.2), the proof of the last part becomes an easy application of the
BorelCantelli Lemma. Indeed, for any > 0, we know from (1.3.2) that
X
P S n < ,
n=1
and therefore, by (1.1.4), that P limn S n = 0.
23
Obviously, (1.3.4) is more than sufficient to guarantee that the Xn s have moments of all orders. In fact, as an application of Lebesgues Dominated Convergence Theorem, one sees that R 7 M () (0, ) is infinitely differentiable
and that
Z
dn M
(0) for all n N.
EP X1n =
xn (dx) =
d n
R
In the discussion that follows, I will use m and 2 to denote, respectively, the
common mean value and variance of the Xn s.
In order to develop some intuition for the considerations that follow, I will
first consider an example, which, for many purposes, is the canonical example in
probability theory. Namely, let g : R (0, ) be the Gauss kernel
|y|2
1
, y R,
(1.3.5)
g(y) exp
2
2
24
There are two obvious reasons for the honored position held by Gaussian
random variables. In the first place, they certainly have finite moment generating
functions. In fact, since
2
Z
y
, R,
e g(y) dy = exp
2
R
it is clear that
2 2
.
Mm,2 () = exp m +
2
(1.3.6)
an N (m,
)-random
variable that is independent of X, then X + X is an
N m + m,
2 +
2 -random variable. In particular, if X1 , . . . , Xn are mutually
independent, standard normal random variables, then S n is an N 0, n1 -random
variable. That is,
r
Z
n|y|2
n
dy.
exp
P Sn =
2
2
h
i
1
log P S n = ess inf
lim
n n
|y|2
: y ,
2
where the ess in (1.3.7) stands for essential and means that what follows is
taken modulo a set of measure 0. (Hence, apart from a minus sign, the right2
hand side of (1.3.7) is the greatest number dominated by |y|2 for Lebesgue-almost
every y .) In fact, because
for > 0.
r
P |S n a| <
n(|a| + )2
22 n
.
exp
2
25
More generally, if the Xn s are mutually independent N (m, 2 )-random variables, then one finds that
r
P |S n m|
n2
2
exp
2
n2
for > 0;
r
P |S n (m + a)| <
n(|a| + )2
22 n
.
exp
2
Of course, in general one cannot hope to know such explicit expressions for the
distribution of S n . Nonetheless, on the basis of the preceding, one can start to
see what is going on. Namely, when the distribution falls off rapidly outside of
compacts, averaging n independent random variables with distribution has the
effect of building an exponentially deep well in which the mean value m lies at the
bottom. More precisely, if one believes that the Gaussian random variables are
normal in the sense that they are typical, then one should conjecture
that,
even
when the random variables are not normal, the behavior of P S n m for
large ns should resemble that of Gaussians with the same variance; and it is in
the verification of this conjecture that the moment generating function M plays
a central role. Namely, although an expression in terms of for the distribution
of Sn is seldom readily available, the moment generating function for Sn is easily
expressed in terms of M . To wit, as a trivial application of independence, we
have
EP eSn = M ()n , R.
P S n a ena M ()n = exp n a () ,
[0, ),
where
(1.3.8)
() log M ()
is the logarithmic moment generating function of . The preceding relation is one of those lovely situations in which a single quantity is dominated by a
whole family of quantities, which means that one should optimize by minimizing
over the dominating quantities. Thus, we now have
(1.3.9)
"
P S n a exp n
sup
[0,)
#
a () .
26
Notice that (1.3.9) is really very good. For instance, when the Xn s are N (m, 2 )random variables and > 0, then (cf. (1.3.6)) the preceding leads quickly to the
estimate
n2
P S n m exp 2 ,
2
0 (),
Z
x (dx)
and
R
2
x (dx) = 00 ().
for x (, m].
Finally, if
= inf x R : (, x] > 0 and = sup x R : [x, ) > 0 ,
then I is smooth on (, ) and identically + off of [, ]. In fact, either
({m}) = 1 and = m = or m (, ), in which case 0 is a smooth,
strictly increasing mapping from R onto (, ),
I (x) = (x) x (x) , x (, ),
where
= 0
1
27
Proof: For notational convenience, I will drop the subscript during the
proof. Further, note that the smoothness of follows immediately from the
positivity and smoothness of M , and the identification of 0 () and 00 () with
the mean and variance of is elementary calculus combined with the remark
following (1.3.4). Thus, I will concentrate on the properties of the function I.
As the pointwise supremum of functions that are linear, I is certainly lower
semicontinuous and convex. Also, because (0) = 0, it is obvious that I 0.
Next, by Jensens Inequality,
Z
() x (dx) = m,
R
and therefore
x () achieves a maximum at some point x R. In
addition, by the first derivative test, 0 (x ) = x, and so x = 1 (x). Finally,
suppose that < . Then
Z
Z
y
e
e (dy) =
e(y) (dy) & ({}) as ,
R
(,]
P S n a enI (a)
P S n a enI (a)
28
n2
1
where is the function given in (1.3.8) and 0
.
Proof: To prove the first part, suppose that a [m, ), and apply the second
part of Lemma 1.3.11 to see that the exponent in (1.3.9) equals nI (a), and,
after replacing {Xn : n 1} by {Xn : n 1}, one also gets the desired
estimate when a m.
To prove the lower bound, let a [m, ) be given, and set = (a)
[0, ). Next, recall the probability measure described in Lemma 1.3.11, and
remember that has mean value a = 0 () and variance 00 (). Further, if
Yn : n Z+ is a sequence of independent, identically distributed random
variables with common distribution , then it is an easy matter to check that,
for any n Z+ and every BRn -measurable F : Rn [0, ),
h
h
i
i
1
P Sn
E
e
F
X
,
.
.
.
,
X
EP F Y1 , . . . , Yn =
1
n .
M ()n
In particular, if
n
X
Tn
,
Tn =
Y` and T n =
n
`=1
But, because the mean value and variance of the Yn s are, respectively, a and
00 (), (1.2.5) leads to
00 ()
.
P T n a
n2
Results like the ones obtained in Theorem 1.3.12 are examples of a class of
results known as large deviations estimates. They are large deviations because the probability of their occurrence is exponentially small. Although large
deviation estimates are available in a variety of circumstances,1 in general one
has to settle for the cruder sort of information contained in the following.
1
In fact, some people have written entire books on the subject. See, for example, J.-D.
Deuschel and D. Stroock, Large Deviations, now available from the A.M.S. in the Chelsea
Series.
29
h
i
1
log P S n
n n
h
i
1
log P S n inf I (x).
lim
n n
x
(I use and to denote the interior and closure of a set . Also, recall that I
take the infemum over the empty set to be +.)
Proof: To prove the upper bound, let be a closed set, and define + =
[m, ) and = (, m]. Clearly,
P S n 2P S n + P S n .
h
i
1
log P S n I (a) (a)
n n
lim
30
Remark 1.3.14. The upper bound in Theorem 1.3.12 is often called Chernoff s Inequality. The idea underlying its derivation is rather mundane by
comparison to the subtle idea underlying the proof of the lower bound. Indeed,
it may not be immediately obvious what that idea was! Thus, consider once
again the second part of the proof of Theorem 1.3.12. What I had to do is estimate the probability that S n lies in a neighborhood of a. When a is the mean
value m, such an estimate is provided by the Weak Law. On the other hand,
when a 6= m, the Weak Law for the Xn s has very little to contribute. Thus,
what I did is replace the original Xn s by random variables Yn , n Z+ , whose
mean value is a. Furthermore, the transformation from the Xn s to the Yn s was
sufficiently simple that it was easy to estimate Xn -probabilities in terms of Yn probabilities. Finally, the Weak Law
Pn applied to the Yn s gave strong information
about the rate of approach of n1 `=1 Y` to a.
I close this section by verifying the conjecture (cf. the discussion preceding
Lemma 1.3.11) that the Gaussian case is normal. In particular, I want to check
that the well around m in which the distribution of S n becomes concentrated
looks Gaussian, and, in view of Theorem 1.3.12, this comes down to the following.
and
2
I (x) (x m) K|x m|3
2 2
Proof: Without loss in generality (cf. Exercise 1.3.17), I will assume that m =
0 and 2 = 1. Since, in this case, (0) = 0 (0) = 0 and 00 (0) = 1, it
follows that (0) = 0 and 0 (0) = 1. Hence, we can find an
M (0, )
and a (0, 1] with < < < for which (x) x M |x|2 and
() 2 M ||3 whenever |x| and || (M + 1), respectively. In
2
particular, this leads immediately to (x) (M + 1)|x| for |x| , and
the estimate for I comes easily from the preceding combined with equation
I (x) = (x)x (x) .
31
as p .
Hint: Handle the case (E) < first, and treat the case when f L1 (; R)
by considering the measure (dx) = f (x) (dx).
Exercise 1.3.17. Referring to the notation used in this section, assume that
is a non-degenerate (i.e., it is not concentrated at a single point) probability
measure on R for which (1.3.4) holds. Next, let m and 2 be the mean and
variance of , use to denote the distribution of
x R 7
xm
R
under ,
Image 0 = m + Image 0 ,
1
xm
, x Image 0 .
(x) =
(x m)2
,
2 2
x R,
when is the N m, 2 distribution with > 0, and show that
I (x) =
bx
bx
xa
xa
,
log
+
log
p(b a)
(1 p)(b a) b a
ba
x (a, b),
32
Pn
Pn
x2
2
1 k2 .
distribution of S 1 k Xk and show that I (x) 2
2 , where
In particular, conclude that
a2
P |S| a 2 exp 2 , a [0, ).
2
Exercise 1.3.19. Although it is not exactly the direction in which I have been
going, it seems appropriate to include here a derivation of Stirlings formula.
Namely, recall Eulers Gamma function:
Z
(1.3.20)
(t)
xt1 ex dx,
t (1, ).
[0,)
t
(1.3.21)
(t + 1) 2t
e
as
t % ,
where the tilde means that the two sides are asymptotic to one another in
the sense that their ratio tends to 1. (See Exercise 2.1.16 for another approach.)
The first step is to make the problem look like one to which Exercise 1.3.16
is applicable. Thus, make the substitution x = ty, and apply Exercise 1.3.16 to
see that
! 1t
1
Z
(t + 1) t
e1 .
=
y t ety dy
tt+1
[0,)
This is, of course, far less than we want to know. Nonetheless, it does show that
all the action is going to take place near y = 1 and that the principal factor in
t
the asymptotics of (t+1)
tt+1 is e . In order to highlight these observations, make
the substitution y = z + 1 and obtain
Z
(t + 1)
=
(1 + z)t etz dz.
tt+1 et
(1,)
2
Before taking the next step, introduce the function R(z) = log(1 + z) z + z2
for z (1, 1), and check that R(z) 0 if z (1, 0] and that |R(z)|
everywhere in (1, 1). Now let (0, 1) be given, and show that
Z
t tz
t 2
t
1 + z e dz (1 ) (1 )e
exp
2
1
and
Z
h
it1 Z
etz dz 1 + e
(1 + z)ez dz
3
t 2
.
+
2 exp 1
3(1 )
2
1+z
t
|z|3
3(1|z|)
33
tz 2
|z|
where
Z
E(t, ) =
tz 2
2
etR(z) 1 dz.
|z|
Check that
Z
r
Z
2
t 2
z2
1
2
2
tz2
e 2 dz 1 e 2 .
dz
e
= t 2
1
|z|
t
t2
|z|t 2
|z|3 e
|z|
|z|
tz 2 35
2 3(1)
dz
12(1 )
(3 5)2 t
p
as long as < 35 . Finally, take = 2t1 log t, and combine these to conclude
that there is a C < such that
C
(t + 1)
1
, t [1, ).
2t t t
t
e
T.H. Carne, A transformation formula for Markov chains, Bull. Sc. Math., 109, pp. 399
405 (1985). As Carne points out, what he is doing is the discrete analog of Hadamards representation, via the Weierstrass transform, of solutions to heat equations in terms of solutions
to the wave equations.
34
C.
In particular, this means that |Q(n, x)| 1 for all x [1, 1]. (It also means
that Q(n, ) is the nth Chebychev polynomial.)
(iii) Using induction on n Z+ , show that
n
A Q( , z) (m) = z n Q(m, z),
m Z and z C,
n Z+
and z C.
In particular, if
h
i
pm,n (z) E Q Sn , z), Sn < m = 2n
X
|2`n|<m
n
Q(2` n, z),
`
m2
sup |x pm,n (x)| P |Sn | m 2 exp
2n
x[1,1]
n
for all 1 m n.
2
f, An g 2kf kH kgkH exp m
H
2n
for n m.
sup |p(x)| kf kH ,
x[1,1]
f H.
35
Z
R and any sequence
n
bn : n Z+ (0, ) that converges to an element of (0, ], the set on which
lim
Sn an
bn
exists in R
lim
Sn an
bn
and
Sn an
bn
n
lim
Theorem 1.4.2.
variables, and if
(1.4.3)
Var Xn ) < ,
n=1
then
X
Xn EP Xn
converges P-almost surely.
n=1
n
X
1 X
Var X` ,
sup P
X` EP X` 2
nN
`=N
`=N
36
P
(1.4.3) certainly implies that the series n=1 Xn EP Xn converges in Pmeasure. Thus, all that I am attempting to do here is replace a convergence
in measure statement with an almost sure one. Obviously, this replacement
would be trivial if the supnN in (1.4.3) appeared on the other side of P. The
remarkable fact which we are about to prove is that, in the present situation,
the supnN can be brought inside!
Theorem 1.4.5 (Kolmogorovs Inequality).
and P-square integrable, then
(1.4.6)
n
!
X
1 X
Var Xn
P sup
X` EP X` 2
n=1
n1
`=1
2
+ 2 SN Sn Sn 2 SN Sn Sn ;
2
EP SN
, An EP Sn2 , An for any An {X1 , . . . , Xn } .
In particular, if A1 = |S1 | > and
n
o
An+1 = Sn+1 > and max S` ,
1`n
n Z+ ,
N
[
max Sn > =
An ,
1nN
n=1
n=1
2
N
X
n=1
n=1
P An = 2 P B N .
37
Thus,
2 X
P sup Sn > = lim 2 P BN lim EP SN
EP Xn2 ,
2
n1
n=1
and so the result follows after one takes left limits with respect to > 0.
Proof of Theorem 1.4.2: Again assume
that the Xn s have mean value 0.
By (1.4.6) applied to XN +n : n Z+ , we see that (1.4.3) implies
1 X
EP Xn2 0 as N
P sup Sn SN 2
n>N
n=N +1
for every > 0, and this is equivalent to the P-almost sure Cauchy convergence
of {Sn : n 1}.
In order to convert the conclusion in Theorem 1.4.2 into a statement about
S n : n 1 , I will need the following elementary summability fact about
sequences of real numbers.
Lemma 1.4.7 (Kronecker). Let bn : n Z+ be a non-decreasing sequence
of positive numbers that tend to , and set n = bn bn1 , where b0 0. If
{sn : n 1} R is a sequence that converges to s R, then
n
1 X
` s` s.
bn
`=1
X
1 X
xn
x` 0 as n .
converges in R =
bn
b
n=1 n
`=1
Proof: To prove the first part, assume that s = 0, and for given > 0 choose
N Z+ so that |s` | < for ` N . Then, with M = supn1 |sn |,
n
Mb
1 X
N
+ as n .
` s`
bn
bn
`=1
Pn
Turning to the second part, set y` = xb`` , s0 = 0, and sn = `=1 y` . After
summation by parts,
n
n
1 X
1 X
` s`1 ;
x` = sn
bn
bn
`=1
`=1
and so, since sn s R as n , the first part gives the desired conclusion.
After combining Theorem 1.4.2 with Lemma 1.4.7, we arrive at the following
interesting statement.
38
X
Var Xn
< ,
b2n
n=1
then
n
1 X
X` EP X` 0
bn
P-almost surely.
`=1
P Yn 6= Xn =
n=1
P |Xn | > n
n=1
Z n
X
n=1
P |X1 | > t dt = EP |X1 | < .
n1
1X P
E [Y` ] = lim EP X1 , |X1 | n = 0,
n
n n
lim
`=1
X
EP [Yn2 ]
< .
n2
n=1
X
EP [Y 2 ]
n
n=1
n2
39
X
1
,
n2
n=`
X
1 X P 2
E
X
,
`
1
<
|X
|
`
=
1
1
n2
n=1
`=1
X
`=1
X
1
EP X12 , ` 1 < |X1 | `
n2
n=`
X
1
`=1
EP X12 , ` 1 < |X1 | ` C EP |X1 | < .
Thus, the P-almost sure convergence is now established, and the L1 (P; R)-convergence result was proved already in Theorem 1.2.6.
Turning to the converse assertion, first note that (by Lemma 1.4.1) if S n
converges in R on a set of positive P-measure, then it converges P-almost surely
to some m R. In particular,
|Xn |
= lim S n S n1 = 0 P-almost surely;
n
n n
and so, if An |Xn | > n , then P limn An = 0. But the An s are mutually
independent, and
by the second part of the BorelCantelli Lemma, we
Ptherefore,
X
P
E |X1 | =
P |X1 | > t dt 1 +
P |Xn | > n < .
lim
n=1
Remark 1.4.10. A reason for being interested in the converse part of Theorem
1.4.9 is that it provides a reconciliation between the measure theory vs. frequency
schools of probability theory.
Although Theorem 1.4.9 is the centerpiece of this section, I want to give
another approach to the study of the almost sure convergence properties of
{Sn : n 1}. In fact, following P. Levy, I am going to show that {Sn : n 1}
converges P-almost surely if it converges in P-measure. Hence, for example,
Theorem 1.4.2 can be proved as a direct consequence of (1.4.4), without appeal
to Kolmogorovs Inequality.
The key to Levys analysis lies in a version of the reflection principle, whose
statement requires the introduction of a new concept. Given an R-valued random
variable Y , say that R is a median of Y and write med(Y ), if
(1.4.11)
P Y P Y 12 .
40
Notice that (as distinguished from a mean value) every Y admits a median; for
example, it is easy to check that
inf t R : P Y t 12
is a median of Y . In addition, it is clear that
med(Y ) = med(Y )
and
On the other hand, the notion of median is flawed by the fact that, in general,
a random variable will admit an entire non-degenerate interval of medians. In
addition, it is neither easy to compute the medians of a sum in terms of the
medians of the summands nor to relate the medians of an integrable random
variable to its mean value. Nonetheless, at least if Y Lp (P; R) for some
p [1, ), the following estimate provides some information. Namely, since, for
med(Y ) and R,
| |p
| |p P Y P Y EP |Y |p ,
2
we see that, for any p [1, ) and Y Lp (P; R),
p1
for all R and med (Y ).
| | 2EP |Y |p
| m|
2Var(Y )
Theorem 1.4.13 (L
evys Reflection Principle). Let Xn : n Z+ be
a sequence of P-independent random variables, and, for k `, choose `,k
med S` Sk . Then, for any N Z+ and > 0,
(1.4.14)
max Sn + N,n
1nN
2P SN ,
and therefore
(1.4.15)
max Sn + N,n
1nN
2P |SN | .
Proof:
Clearly (1.4.15) follows by applying (1.4.14) to both the sequences
Xn : n 1} and {Xn : n 1} and then adding the two results.
41
To prove (1.4.14), set A1 = S1 + N,1 and
An+1 = max S` + N,` < and Sn+1 + N,n+1
1`n
An =
n=1
max Sn + N,n .
1nN
In addition,
{SN An SN Sn N,n
for each 1 n N.
Hence,
N
X
P SN
P An SN Sn N,n
n=1
N
1
1X
P An = P max Sn + N,n ,
1nN
2
2 n=1
where, in the passage to the
last line, I have used the independence of the sets
An and SN Sn N,n .
+
Corollary 1.4.16. Let X
:
n
Z
be
n
a sequence of independent random
+
variables, and assume that Sn : n Z
converges in P-measure to an Rvalued random variable S. Then Sn S P-almost surely. (Cf. Exercise 1.4.25
as well.)
Proof: What I must show is that, for each > 0, there is an M Z+ such
that
sup P max Sn+M SM < .
N 1
1nN
42
Remark 1.4.17. The most beautiful and startling feature of Levys line of
reasoning is that it requires no integrability assumptions. Of course, in many
applications of Corollary 1.4.16, integrability considerations enter into the proof
that {Sn : n 1} converges in P-measure. Finally, a word of caution may be
in order. Namely, the result in Corollary 1.4.16 applies to the quantities Sn
themselves; it does not apply to associated quantities like S n . Indeed, suppose
that {Xn : n 1} is a sequence of independent, identically distributed random
variables that satisfy
12
P Xn t = P Xn t = 1 + t2 log e4 + t2
for all t 0.
On the one hand, by Exercise 1.2.11, we know that the associated averages S n
tend to 0 in probability. On the other hand, by the second part of Theorem
1.4.9, we know that the sequence S n : n 1 diverges almost surely.
i
1 h
P X t EP Y, X t ,
t
(1.4.19)
t (0, ).
Show that
(1.4.20)
p P p p1
,
E Y
p1
p1
EP X p
p (1, ).
f (x) (dx) =
E
f > t dt =
(0,)
t1 f t dt.
(0,)
tp2 EP Y, X t dt
(0,)
" Z
= pE Y
P
p2
dt =
p
EP X p1 Y ,
p1
43
p (1, ),
nZ+
and conclude from this that, for each p (2, ), {Sn : n 1} converges to S
in Lp (P ) if and only if S Lp (P ).
Exercise 1.4.23. If X L2 (P; R), then it is easy to characterize its mean m
as the c R that minimizes EP (X c)2 . Assuming that X L1 (P; R), show
that med(X) if and only if
EP |X | = min EP |X c| .
cR
b
P(X t) P(X t) dt.
Note that this can be used in place of (1.4.15) when proving results like the one
in Corollary 1.4.16.
44
and Sn = Sn Yn .
Show that
7 Sn (), Yn () R2
and 7 Sn (), Yn () R2
t [0, ).
(ii) Continuing in the same setting, add the assumption that the Xn s are identically distributed, and use part (i) to show that
lim P |S n | C = 1
1(1x)n
x
max |X` | > t = 1 P(|X1 | t)n
1`n
n as x & 0.
45
for all
t [0, ),
1
1
1
2 p 1 EP |Y |p p 2EP |Y |p p + || .
In particular, |X|p is integrable if and only if |Y |p is.
(ii) The result in (i) leads to my final refinement of The Weak Law of Large
Numbers. Namely, let {Xn : n 1} be a sequence of independent, identically
distributed random variables. By combining Exercise 1.2.11, part (ii) in Exercise
1.4.26, and part (i) above, show that1
lim P S n C = 1
The beautiful argument given below is due to Y. Guivarch, but its full power
cannot be appreciated in the present context (cf. Exercise 6.2.19). Furthermore,
a classic result (cf. Exercise 5.2.43) due to K.L. Chung and W.H. Fuchs gives a
much better result for the independent random variables. Their result says that
limn |Sn | = 0 P-almost surely.
In order to prove the assertion here, assume that limn |Sn | = with
positive P-probability, use Kolmogorovs 01 Law to see that |Sn | Palmost surely, and proceed as follows.
1
These ideas are taken from the book by Wm. Feller cited at the end of 1.2. They become
even more elegant when combined with a theorem due to E.J.G. Pitman, which is given in
Fellers book.
46
(i) Show that there must exist an > 0 with the property that
P ` > k S` Sk
for some k Z+ and therefore that
P(A) ,
where A : ` Z+ S` () .
and
o
n
n0 () = t R : 1 ` n t S`0 () < 2 ,
Pn
where Sn0 `=1 X`+1 . Next, let Rn () and Rn0 () denote the Lebesgue measure of n () and n0 (), respectively; and, using the translation invariance of
Lebesgue measure, show that
Rn+1 () Rn0 () 1A0 (),
where A0 : ` 2 S` () S1 () .
On the other hand, show that
EP Rn0 = EP Rn
n Z+ ,
1 P
E Rn .
n n
P(A) lim
(iii) In view of parts (i) and (ii), what remains to be done is show that
m = 0 = lim
1 P
E Rn = 0.
n
Rn ()
Sn ()
0,
0 =
n
n
47
Exercise 1.4.29. As I have already said, for many applications The Weak
Law of Large Numbers is just as good as and even preferable to the Strong
Law. Nonetheless, here is an application in which the full strength of the Strong
Law plays an essential role. Namely, I want to use the Strong Law to produce
examples of continuous, strictly increasing functions F on [0, 1] with the property
that their derivative
F (y) F (x)
=0
yx
yx
F 0 (x) lim
By familiar facts about functions of a real variable, one knows that such functions F are in one-to-one correspondence with non-atomic, Borel probability
measures on [0, 1] which charge every non-empty open subset but are singular
to Lebesgues measure.
Namely, F is the distribution function determined by :
F (x) = (, x] .
+
(i) Set = {0, 1}Z , and, for each p (0, 1), take Mp = (p )Z , where p on
{0, 1} is the Bernoulli measure with p ({1}) = p = 1 p ({0}). Next, define
7 Y ()
2n n [0, 1],
n=1
if 2n1 x b2n1 xc
1
2
1
2,
where
denotes the integer part of s. If {n : n 1} {0, 1} satisfies
Pbsc
Thus, p1 p2 whenever p1 6= p2 .
48
(iv) By Lemma 1.1.6, we know that 12 is Lebesgue measure [0,1] on [0, 1].
Hence, we now know that p [0,1] when p 6= 12 . In view of the introductory
remarks, this completes
the proof that, for each p (0, 1) \ { 12 }, the function
Fp (x) = p (, x] is a strictly increasing, continuous function on [0, 1] whose
derivative vanishes at Lebesgue-almost every point. Here, one can do better.
Namely, referring to part (iii), let p denote the set of x [0, 1) such that
1
n (x) = p,
n n
lim
where n (x)
n
X
m (x).
m=1
We know that 12 has Lebesgue measure 1. Show that, for each x 12 and
p (0, 1) \ { 12 }, Fp is differentiable with derivative 0 at x.
n
X
2m m (x)
m=1
Show that
Fp Rn (x) Fp Ln (x) = Mp
n
X
!
2m m = Ln (x)
m=1
When p
that
(0, 1) \ { 12 }
To complete the proof, for given x 12 and n 2 such that n (x) 2, let
mn (x) denote the largest m < n such that m (x) = 1, and show that mnn(x) 1
as n . Hence, since 2n1 < h 2n implies that
Fp (x) Fp (x h)
nmn (x)+1 Fp Rn (x) Fp Ln (x)
,
2
Rn (x) Ln (x)
h
lim
h&0
Fp (x + h) Fp (x)
= +.
h
The argument is similar to the one used to handle part (iv). However, this time
the role played by the inequality 4pq < 1 is played here by (2p)p (2q)q > 1 when
q = 1 p.
49
P-almost surely if
X
1
< .
2
b
n=1 n
1
Thus, for example, Sn grows more slowly than n 2 log n. On the other hand, if
Sn
;
the Xn s are N (0, 1)-random variables, then so are the random variables
n
and therefore, for every R (0, ),
P
Sn
lim R
n
[ Sn
S
R lim P N R > 0.
= lim P
N
N
n
N
nN
Hence, at least for normal random variables, one can use Lemma 1.4.1 to see
that
Sn
lim = P-almost surely;
n
n
1
(1.5.1)
q
2n log(2) (n 3),
where log(2) x log log x (not the logarithm with base 2) for x [e, ). That
is, one has The Law of the Iterated Logarithm:
(1.5.2)
Sn
= 1 P-almost surely.
n n
lim
This remarkable fact was discovered first for Bernoulli random variables by Khinchine, was extended by Kolmogorov to random variables possessing 2 + moments, and eventually achieved its final form in the work of Hartman and Wintner. The approach that I will adopt here is based on ideas (taught to me by
M. Ledoux) introduced originally to handle generalizations of (1.5.2) to random
50
variables with values in a Banach space.1 This approach consists of two steps.
The first establishes a preliminary version of (1.5.2) that, although it is far cruder
than (1.5.2) itself, will allow me to justify a reduction of the general case to the
case of bounded random variables. In the second step, I deal with bounded random variables and more or less follow Khinchines strategy for deriving (1.5.2)
once one has estimates like the ones provided by Theorem 1.3.12.
In what follows, I will use the notation
= []
S[]
and S =
for [3, ),
lim Sn a
(a.s., P) if
X
1
P S m a 2 < .
m=1
2
lim Sn = lim m1max m Sn lim m1max m
m
m
n
n m
n
Sn + m,n
1
,
2 lim maxm
m n
m
and therefore
P lim Sn a P
n
lim max
m n m
Sn + m,n
!
a
12
!
Sn + m,n
1
1
a 2 2P S m a 2 ,
P maxm
n
m
See 8.4.2 and 8.6.3 and, for much more information, M. Ledoux and M. Talagrand, Probability in Banach Spaces, Springer-Verlag, Ergebnisse Series 3.FolgeBand 23 (1991).
2 Here and elsewhere, I use (a.s.,P) to abbreviate P-almost surely.
51
lim Sn 8
(1.5.5)
(a.s., P).
X
3
P S2m 2 2 < .
(*)
m=0
2m
!
X
t [0, 1) :
Rn (t)Xn () a
n=1
#
"
a2
, a [0, ) and .
2 exp P2m
2 n=1 Xn ()2
Hence, if
)
2m
1 X
2
Xm () 2
: m
2 n=1
(
Am
and
2m
)!
X
3
,
t [0, 1) :
Rn (t)Xn () 2 2 2m
(
Fm () [0,1)
n=1
52
o Z
n
3
2
=
Fm ) P(d)
P : S2m () 2 2m
"
82m
exp P2m 2
2 n=1 Xn ()2
Z
2
h
i
P(d) 2 exp 4 log(2) 2m + 2P Am .
P
Thus, (*) comes down to proving that
m=0 P Am < ; and, in order to
check this, I argue in much the same way as I did when I proved the converse
statement in Kolmogorovs Strong Law. Namely, set
m
Tm =
2
X
Xn2 ,
Bm =
n=1
Tm+1 Tm
2 ,
2m
and T m =
Tm
2m
for m N. Clearly, P Am = P Bm . Moreover, the sets Bm , m N, are
mutually independent; and therefore, by the BorelCantelli Lemma, I need only
check that
Tm+1 Tm
2
= 0.
P lim Bm = P lim
m
m
2m
But, by the Strong Law, we know that T m 1 (a.s., P), and therefore it is
clear that
Tm+1 Tm
1 (a.s., P).
2m
I have now proved (1.5.5) with 4 replacing 8 for symmetric random variables.
To eliminate the symmetry assumption, again let (,
F, P) be the probability
0
0
0
space on which the Xn s are defined, let , F , P be a second copy of the
same space, and consider the random variables
(, 0 ) 0 7 Yn , 0 Xn () Xn ( 0 )
under the measure Q P P0 . Since the Yn s are obviously (cf. part (i) of
Exercise 1.4.21) symmetric, the result which I have already proved says that
lim
Sn () Sn ( 0 )
22 8
n|
Now suppose that limn |S
n > 8 on a set of positive P-measure. Then, by
Kolmogorovs 01 Law, there would exist an > 0 such that
|Sn ()|
8 + for P-almost every ;
n
n
lim
53
We have now got the crude statement alluded to above. In order to get the
more precise statement contained in (1.5.2), I will need the following application
of the results in 1.3.
Lemma 1.5.6. Let {Xn : n 1} be a sequence of independent random
variables with mean value 0, variance 1, and common distribution . Further,
assume that (1.3.4) holds. Then, for each R (0, ) there is an N (R) Z+
such that
!
#
"
r
8R log(2) n
R2 log(2) n
(1.5.7)
P Sn R 2 exp 1 K
n
for n N (R). In addition, for each (0, 1], there is an N () Z+ such that,
for all n N () and |a| 1 ,
(1.5.8)
h
i
1
P Sn a < exp a2 + 4K|a| log(2) n .
2
In both (1.5.7) and (1.5.8), the constant K (0, ) is the one in Theorem
1.3.15.
Proof: Set
n
=
n =
n
2 log(2) (n 3)
n
12
To prove (1.5.7), simply apply the upper bound in the last part of Theorem
1.3.15 to see that, for sufficiently large n Z+ ,
3
(Rn )2
K Rn
.
P Sn R = P S n Rn 2 exp n
2
3
This is Fubini at his best and subtlest. Namely, I am using Fubini to switch between horizontal and vertical sets of measure 0.
54
P Sn a < = P S n an < n ,
where an = an and n = n . Thus, by the lower bound in the last part of
Theorem 1.3.15,
2
an
K
2
+ K|an | n + an
P Sn a < 1 2 exp n
2
nn
!
h
i
K
exp a2 + 2K|a| + a2 n log(2) n
1 2
2 log(2) n
lim f
(1.5.10)
Sn
n
=
sup f (t)
(a.s., P).
t[1,1]
(Cf. Exercise 1.5.12 for a converse statement and 8.4.2 and 8.6.3 for related
results.)
Proof: I begin with the observation that, because of (1.5.5), I may restrict
my attention to the case when the Xn s are bounded random variables. Indeed,
for any Xn s and any > 0, an easy truncation procedure allows us to find an
Cb (R; R) such that Yn Xn again has mean value 0 and variance 1
while Zn Xn Yn has variance less than 2 . Hence, if the result is known
when the random variables are bounded, then, by (1.5.5) applied to the Zn s,
Pn
m=1 Zm ()
1 + 8,
lim Sn () 1 + lim
n
n
n
and, for a [1, 1],
Pn
Zm ()
lim Sn () a lim m=1
8
n
n
55
In view of the preceding, from now on I may and will assume that the Xn s
are bounded. To prove that limn Sn 1 (a.s., P), let (1, ) be given,
and use (1.5.7) to see that
h
i
1
1
P S m 2 2 exp 2 log(2) m
+
for all sufficiently
large m Z . Hence, by Lemma 1.5.3 with a = , we see
that limn Sn (a.s., P) for every (1, ). To complete the proof, I
must still show that, for every a (1, 1) and > 0,
P lim Sn a < = 1.
= inf
Skm Skm1
a
lim
k m
P-almost surely.
k m
m Z+ ,
X
P Ak,m = for sufficiently large k 2.
m=1
But
P Ak,m
km
km a
,
<
= P Skm km1
km km1 km km1
and, because
km
1 = 0,
lim max+
k mZ
m m1
k k
X
(*)
P Skm km1 a < =
m=1
for each k 2, a (1, 1), and > 0. Finally, referring to (1.5.8), choose 0 > 0
so small that a2 + 4K0 |a| < 1, and conclude that, when 0 < < 0 ,
h
i
1
P Sn < exp log(2) n
2
for all sufficiently large ns, from which (*) is easy.
56
Remark 1.5.11. The reader should notice that the Law of the Iterated Logarithm provides a naturally occurring sequence of functions that converge in
measure but not almost everywhere. Indeed, it is obvious that
Sn 0 in
2
L (P; R), but the Law of the Iterated Logarithm says that Sn : n 1 is
wildly divergent when looked at in terms of P-almost sure convergence.
Exercises for 1.5
Exercise 1.5.12. Let {Xn : n 1} be a sequence of mutually independent,
identically distributed random variables for which
|Sn |
< > 0.
n n
(1.5.13)
lim
In this exercise I4 will outline a proof that X1 is P-square integrable, EP X1 = 0,
and
(1.5.14)
1
Sn
Sn
= EP X12 2
= lim
n n
n n
lim
(a.s., P).
(i) Using Lemma 1.4.1, show that there is a [0, ) such that
(1.5.15)
lim
Sn
(a.s., P).
(a.s., P).
In other words, everything comes down to proving that (1.5.13) implies that X1
is P-square integrable.
(ii) Assume that the Xn s are symmetric. For t (0, ), set
nt = Xn 1[0,t] |Xn | Xn 1(t,) |Xn | ,
X
and show that
t, . . . , X
t , . . .
X
1
n
4
and
X1 , . . . , X n , . . .
I follow Wm. Feller An extension of the law of the iterated logarithm to variables without
variance, J. Math. Mech., 18 #4, pp. 345355 (1968), although V. Strassen was the first to
prove the result.
57
have the same distribution. Conclude first that, for all t [0, 1),
Pn
m=1 Xn 1[0,t] |Xn |
(a.s., P),
lim
n
n
X
Xn
1 < ,
P
n
n=1
and apply the BorelCantelli Lemma to conclude that Sn Sn1 0 (a.s., P).
Exercise 1.5.17. Let {Xn : n 1} be a sequence of RN -valued, identically
distributed random variables on (, F, P) with the property that, for each e
SN 1 = {x RN : |x| = 1}, e, X1 RN has mean value 0 and variance 1. Set
Pn
n | = 1 P-almost surely.
n = Sn , and show that limn |S
Sn = m=1 Xm and S
n
Here are some steps that you might want to follow.
58
and conclude first that limn |sn | 1 + and then that limn |sn | 1.
At the same time, since |sn | e1 , sn RN , show that limn |sn | 1. Thus
limn |sn | = 1.
(iii) Let {ek : k 1} be as in (i), and apply Theorem 1.5.9 to show that, for
n () : n 1} satisfies the condition in (i).
P-almost all , the sequence {S
Chapter 2
The Central Limit Theorem
In the preceding chapter I dealt with averages of random variables and showed
that, in great generality, those averages converge almost surely or in probability
to a constant. At least when all the random variables have the same distribution
and moments of all orders, one way of rationalizing this phenomenon is to recognize that the mean value is conserved whereas all higher moments are driven
to 0 when one averages. Of course, the reason why it is easy to conserve the
first moment is that the mean of the sum is the sum of the means. Thus, if one
is going to attempt to find a simple normalization procedure that conserves a
quantity involving more than the mean value, one should seek a quantity that
shares this additivity property.
With this in mind, one is led to ask what happens if one normalizes in a way
that conserves the variance. For this purpose, suppose that {Xn : n Z+ } is a
sequence of mutually independent, identicallyP
distributed random variables with
1
n
mean value 0 and variance 1, and set Sn = 1 Xk . Then Sn n 2 Sn again
has mean value 0 and variance 1. On the other hand, because of Theorem 1.5.9,
we know that, with probability 1, limn Sn = = limn Sn . Hence,
from the point of view of either almost sure convergence or even convergence in
probability, there is no hope that the Sn s will converge.
Nonetheless, the random variables {Sn : n 1} possess remarkable stability
when viewed from a distributional perspective. Indeed, if the Xn s are Gaussian,
then so are the Sn s, and therefore Sn N (0, 1) for all n 1. More generally,
even if the Xn s are not Gaussian, fixing their mean value and variance in this
way forces all their moments to stabilize. To be precise, assume that X1 has finite
moments of all orders, that its mean is 0, and that its variance is 1. Trivially,
L1 limn EP [Sn ] = 0 and L2 limn EP [Sn2 ] = 1. Next, assume that
L` limn EP [Sn` ] exists for 1 ` m, where m 2. I will show now that
Lm+1 limn EP [Snm+1 ] exists and is equal to mLm1 . To this end, first note
that, since EP [Xn ] = 0 and the Xn s are independent and identically distributed,
m
X
m+1
m
m P j+1 P mj
P
E Sn
= nE Xn Xn + Sn1
=n
E Xn E Sn1
j
j=0
P
59
60
m+1
Thus, after dividing through by n 2 , one gets the desired conclusion when
n . Starting from L1 = 0 and L2 = 1, one now can use induction to check
Qm
+
that L2m1 = 0 and L2m = `=1 (2` 1) = 2(2m)!
m m! for all m Z . That is,
lim EP Sn2m1 = 0
and
m
Y
(2m)!
lim EP Sn2m =
(2` 1) = m ,
n
2 m!
`=1
Notice that when the Xk s are identically distributed and have variance 1, the
Sn in (2.1.2) is consistent with the notation used above. Finally, set
(2.1.3)
m
1mn n
rn = max
and gn () =
n
i
1 X P h 2
E
X
,
X
m
n
m
2n m=1
61
1
In particular, because
rn2 2 + gn (),
(2.1.6)
> 0,
and Tn =
n
X
Yk ,
and observe that Tn is again an N (0, 1)-random variable and therefore that
EP (Sn ) h, 0,1 i = EP (Sn ) EP (Tn ) .
k =
Further, set X
Xk
n ,
Um =
and define
X
1km1
Yk +
k
X
for 1 m n,
m+1kn
where a sum over the empty set is taken to be 0. It is then clear that
n
X
m EP Um + Ym .
where m EP Um + X
Moreover, if
Rm () Um + (Um ) 0 (Um )
2 00
2 (Um ),
R,
62
n
n
X
2
k000 ku X P 3
, |Xm | n
E |Xm | , |Xm | n + k00 ku
EP X
m
6
1
1
n
2
k000 ku
k000 ku X m
00
+ k00 ku gn (),
+
k
k
g
()
=
u
n
2
6
6
n
1
while
n
X
1
3
n
3
X
k000 ku P
3 4 rn k000 ku
m
.
E |Y1 |3
EP |Rm (Yn )|
6
3n
6
1
63
lim
sup
n xB(0,2R)
|n (x) (x)|hR , n i + lim h(1 R )(n ), n i
n
2C lim h(1 R ), n i = lim 2C h, n i hR , n i = 2Ch(1 R ), i,
n
A Borel measure on a topological space is locally finite if it gives finite measure to compacts.
64
and similarly
lim h, n i h, i
n
lim hR , n i hR , i + C lim h(1 R ), n i + Ch(1 R ), i
n
= 2Ch(1 R ), i.
Finally, because is -integrable, h(1 R ), i 0 as R by Lebesgues
Dominated Convergence Theorem, and so we are done.
By combining Theorem 2.1.4 with the preceding, we have the following version
of the famous Central Limit Theorem.
Theorem 2.1.8 (Central Limit Theorem). With the setting the same as it
was in Theorem 2.1.4, assume that gn () 0 as n for each > 0. Then
lim EP n (Sn ) = h, 0,1 i
n
dy.
exp
(2.1.9)
lim P a Sn b = 0,1 (a, b] =
n
2
2 a
(See Exercise 2.1.10 for more information about the identically distributed case.)
Proof: Take n to be the distribution of Sn . By Theorem 2.1.4, we know
that h, n i h, 0,1 i for all Cc (RN ; R). In addition, we know that
h, n i = 2 = h, 0,1 i when (y) = 1 + y 2 . Hence, the first assertion is an
application of Lemma 2.1.7.
Turning to the second assertion, let a < b be given. To prove (2.1.9), choose
{k : k 1} Cb (R; R) and {k : k 1} Cb (R; R) so that 0 k % 1(a,b)
and 1 k & 1[a,b] as k . Then,
Z
i
P
as k , and, similarly,
lim P a Sn b lim EP k Sn =
Z
R
k (y) 0,1 (dy) 0,1 [a, b] .
65
lim EP Sn2 R2 1
In particular, by Lemma 2.1.7, this will certainly be the case whenever (2.1.1)
holds for every Cc (R; R). The purpose of this exercise is to show that the
Xn s are P-square integrable, have mean value 0, and variance no more than 1;
and the method which I will use is based on the same line of reasoning as was
given in Exercise 1.5.12.
(i) Assuming that X1 L2 (P; R), show that EP X1 = 0 and EP X12 1. In
particular, use this together with the result in part (i) of Exercise 1.4.27 to see
that it suffices to handle the case when the Xn s are symmetric.
(ii) In this and the succeeding parts of this exercise, we will be assuming that
the Xn s are symmetric. Following the same route as was suggested in (ii) of
Exercise 1.5.12, set
t = Xn 1[0,t] |Xn | Xn 1(t,) |Xn | ,
X
n
n Z+ ,
t, . . . , X
nt , . . . and X1 , . . . , Xn , . . . have the same distribuand recall that X
1
tion for each t (0, ). Use this
together with our basic assumption to see that
limR sup nZ+ P An (t, R) = 0, where
t(0,)
)
( n
n
X X
1
t n2 R .
An (t, R)
Xk
X
k
1
After noting that the Xn 1[0,t] |Xn | s are symmetric, check (cf. the proof of
Theorem 1.3.1) that EP |Snt |4 3t4 . In particular, conclude that, for each
t (0, ), there is an R(t) (0, ) such that
1
1
EP |Snt |2 , An t, R(t) 3 2 t2 P An t, R(t) 2 1
for all n Z+ .
66
(iv) Given t (0, ), choose R(t) (0, ) as in the preceding. Taking into
account the identity
Pn
Pn t
X
+
k
t
1
1 Xk
,
Sn =
1
2n 2
show that
EP X12 , |X1 | t = EP |Snt |2 EP |Snt |2 , An t, R(t) { + 1
h
i
EP Sn2 R(t)2 + 1
Next, define T2 for P to be the probability measure on R, BR given by
ZZ
x+y
(dx)(dy) for BR .
T2 () =
1
2
R2
After checking that T2 maps P into itself, use The Central Limit Theorem to
show that, for every P,
Z
Z
n
lim
d T2 =
d0,1 , Cb (R; C).
n
Conclude, in particular, that 0,1 is the one and only element of P with the
property that T2 = and that this fixed point is attracting. (See Exercise
2.3.21 for more information.)
Exercise 2.1.12. Here is another indication of the remarkable stability of normal random variables. Namely, I will outline here a derivation2 of the L
evy
Cram
er Theorem which says that if X and Y are independent random variables whose sum is normal (with some mean and variance), then both X and Y
are normal.
2
67
R2
,
P |X| r + R P |Y | r + R 4 exp
2
R (0, ).
In particular,
show that the moment generating
functions z C 7 M (z) =
EP ezX C and z C 7 N (z) = EhP eizY C exist and are entire functions.
2
Further, note that M (z)N (z) = exp z2 , and conclude that M and N never
vanish. Finally, from the fact that X + Y has mean 0, show that one can reduce
to the case in which both X and Y have mean 0. Thus, from now on, we assume
that M 0 (0) = 0 = N 0 (0).
(iii) Because M never vanishes and M (0) = 1, elementary complex analysis (cf.
Lemma 3.2.3) guarantees that there is a unique entire function : C C such
that (0) = 0 and M (z) = e(z) for all z C. Further, from M 0 (0) = 0, note
that 0 (0) = 0. Thus,
(z) =
cn z n
where n!cn =
n=2
z2
2
xX
dn
P
log
E
e
R.
dxn
x=0
i
(z) .
(x) 0
x2
(x)
2
for all x R.
h
exp Re
z2
2
(z)
i
i
h 2
= EP ezY exp x2 (x)
to arrive at
y 2 2Re (z) x2
for z = x +
1 y C.
68
z C.
(v) To complete the program, use Cauchys integral formula to show that, for
each n Z+ and r > 0, on the one hand,
Z 2
1
re 1 e 1 n d, r > 0,
cn r n =
2 0
while, on the other hand (since (z) = z) and therefore z (z) = 0),
re
0=
1 n
d.
Hence,
1
cn r =
e 1 n d,
Re re 1
n Z+ and r > 0.
Finally, in combination with the estimate obtained in (iv) and the fact that
c0 = c1 = 0, this leads to the conclusion that cn = 0 for n 6= 2 and therefore
that (z) = c2 z 2 with 0 c2 12 .
n2
1
2
n n1
t2
1
n
n3
2
1
1(1,1) n 2 t ,
Although E. Borel seems to have thought he was the first to discover this result and rhap
sodizes about it a good deal in Sur les principes de la cin
etique des gaz, Ann. lEcole
Norm.
69
k1 =
2 2
,
k2
where (t) is Eulers -function (cf. (1.3.20)), and then apply Stirlings formula
(cf. (1.3.21)) to see that
n2
1
2
n n1
1
2
as
n .
Now, using g to denote the density for the standard Gauss distribution (i.e., the
Gauss kernel in (1.3.5)), apply these computations to show that
sup sup
n3 tR
fn (t)
< and that
g(t)
fn (t)
1 uniformly on compacts.
g(t)
(ii) A less computational approach to the same calculation is the following. Let
{Xn : n
variables, and set
p 1} be a sequence of independent N (0, 1) random
2
2
Rn = X1 + + Xn . First note that P Rn = 0 = 0 and then that the
distribution of
1
n 2 X1 , . . . , Xn
n
Rn
R2
is n . Next, use The Strong Law of Large Numbers to see that nn 1 (a.s., P)
and conclude that, for any N Z+ ,
lim EP n(N ) = EP X1 , . . . , XN , Cc RN ; R ,
n
(N )
RN
RN
70
(iii) By considering the case when N = 2, show that, for any Cb (R; R),
Z
(2.1.15)
lim
Sn1 ( n)
1X
xk
n
!2
k=1
d0,1
n (dx) = 0.
Notice that the non-computational argument has the advantage that it immedi(N )
ately generalizes the earlier result to cover n for all N Z+ , not just N = 1
(cf. Exercise 2.3.24). On the other hand, the conclusion is weaker in the sense
that convergence of the densities has been replaced by convergence of integrals
with bounded continuous integrands and that no estimate on the rate of convergence is provided. More work is required to restore the stronger statements
when N 2.
When couched in terms of statistical mechanics, this result can be interpreted
as a derivation of the Maxwell distribution of velocities for an ideal gas of free
particles of mass 2 and having average energy 1.
Exercise 2.1.16. The most frequently encountered applications of Stirlings
formula (cf. (1.3.21)) are to cases when t Z+ . That is, one is usually interested
in the formula
n n
.
(2.1.17)
n! 2n
e
1
Z
n+n
1
1 1+4
xn ex dx
P Sn+1 0, 4 =
n! 1+n
Z 12 1
1
1
n n y
nn+ 2 en n + 4 1+n
12
dy.
y
e
1
+
n
=
1
n!
n 2
Z 14
1
x2
1
e 2 dx.
P Sn 0, 4
2 0
n 2 + 14
1
n 2
1+n1
1+n
12
n
ny
1
4
Z
dy
0
x2
2
dx,
71
and clearly (2.1.17) follows from these. In fact, if one applies the BerryEsseen
estimate proved in the next section, one finds that
2n
n!
n n
e
1
= 1 + O n 2 .
However, this last observation is not very interesting since we saw in Exercise
1.3.19 that the true correction term is of order n1 .4
2.2 The BerryEsseen Theorem via Steins Method
As we will see in the next section, the principles underlying the passage from
Theorem 2.1.4 to Theorem 2.1.8 are very general. In fact, as we will see in
Chapter 9, some of these principles can be formulated in such a way that they
extend to a very abstract setting. However, rather than delve into such extensions here, I will devote this section to a closer examination of the situation at
hand. Specifically, in this section we are going to see how to make the final part
of Theorem 2.1.8 quantitative.
From (2.1.5), we get a rate of convergence in terms of the second and third
derivatives of . In fact, if we assume that
(2.2.1)
1
k EP |Xk |3 3 < ,
1 k n,
and k k ,
Pn 3
Z
000
P
E Sn d0,1 2k ku 1 k
3n
3
R
x R 7 Fn (x) P Sn x [0, 1]
Limit Theorem. In fact, apart from the constant 2, what we now call Stirlings formula was
discovered first by DeMoivre while he was proving The Central Limit Theorem for Bernoulli
random variables. For more information, see, for example, Wm. Fellers discussion of Stirlings
formula in his Introduction to Probability Theory and Its Applications, Vol. I, Wiley, Series in
Probability and Math. Stat. (1968).
72
1
G(x) 0,1 (, x] =
2
t2
e 2 dt.
To see how (2.1.5) and (2.2.2) must be modified in order to gain such information,
first observe that
Z
0 (x) Fn (x) G(x) dx
R
Z
(2.2.5)
P
= E (Sn )
(y) 0,1 (dy), Cb1 (R; R .
R
(To prove (2.2.5), reduce to the case in which Cc1 (R; R) and (0) = 0;
and for this case apply either Fubinis Theorem or integration by parts over
the intervals (, 0] and [0, ) separately.) Hence, in order to get information
about the distance between Fn and G, we will have to learn how to replace
the right-hand sides of (2.1.5) and (2.2.2) with expressions that depend only on
the first derivative of . For example, if the dependence is on k0 ku , then we
get information about the L1 (R; R) distance between Fn and G, whereas if the
dependence is on k0 kL1 (R;R) , then the information will be about the uniform
distance between Fn and G.
2.2.1. L1 -BerryEsseen. The basic idea that I will use to get estimates in
terms of 0 was introduced by C. Stein and is an example of a procedure known
as Steins method.1 In the case at hand, his method stems from the trivial
observation that if is a Borel probability measure
on R and g is the Gauss
kernel in (1.3.5), then = 0,1 if and only if g = 0 in the sense of Schwartz
Lemma 2.2.6. Let C 1 (R; R), assume that k0 ku < , set = h, 0,1 i,
and define
Z x
2
x2
t2
2
dt R.
(t)e
(2.2.7)
x R 7 f (x) e
kf ku 2k0 ku ,
q
kf 0 ku 3 2 k0 ku ,
kf 00 ku 6k0 ku ,
73
and
f 0 (x) xf (x) = (x),
(2.2.9)
x R.
Proof: The facts that f C 1 (R; R) and that (2.2.9) holds are elementary
applications of The Fundamental Theorem of Calculus. Moreover, knowing that
f C 1 (R; R) and using (2.2.9), we see that f C 2 (R; R) and, in fact, that
f 00 (x) xf 0 (x) = f (x) + 0 (x),
(2.2.10)
x R.
To prove the estimates in (2.2.8), first note that, because and therefore f are
unchanged when is replaced by (0), I may and will assume that (0) = 0
and therefore that |(t)| k0 ku |t|. In particular, this means that
Z
Z
q
d0,1 k0 ku |t| 0,1 (dt) = k0 ku 2 .
t2
2
dt = 0, an alternative expression for f
(t)e
2
x2
t2
dt, x R.
(t)e
f (x) = e 2
Thus, by using the original expression for f (x) when x (, 0) and the
alternative one when x [0, ), we see first that
Z
2
x2
t sgn(x) e t2 dt, x R,
2
|f (x)| e
|x|
x2
2
t+
|x|
q
2
t2
e 2 dt.
But, since
Z
Z
t2
x2
t2
x2
d
t e 2 dt 1 = 0
e 2 dt e 2
e2
dx
x
x
t2
te 2 dt = 1 and e
for x [0, ),
x2
2
|x|
|x|
t2
e 2 dt
2;
which means that I have now proved the first estimate in (2.2.8). To prove the
other two estimates there, derive from (2.2.10)
x2
d x2 0
e 2 f (x) = e 2 f (x) + 0 (x)
dx
74
t2
f (t)+0 (t) e 2 dt,
x R.
Thus, reasoning as I did above and using the first estimate in (2.2.8) and the
relations in (2.2.9), (2.2.10), and (2.2.11), one arrives at the second and third
estimates in (2.2.8).
I now have the ingredients needed to apply Steins method to the following
example of a BerryEsseen type of estimate.
Theorem 2.2.12 (L1 -BerryEsseen Estimate). Continuing in the setting
of Theorem 2.1.4, one has that for all > 0 (cf. (2.1.3), (2.2.3), and (2.2.4))
(2.2.13)
Fn G
1
2 gn (2).
6(r
+
)
+
3
n
L (R;R)
Fn G
1
L (R;R)
6rn +
Pn
3
m=1 m
3n
Pn
3
9 m=1 m
.
3n
2
In particular, if m
= 1 and m < for each 1 m n, then
8 3
6 + 2 3
Fn G
1
.
L (R;R)
n
n
Proof: Let C 1 (R; R) having bounded first derivative be given, and define
f accordingly, as in (2.2.7). Everything turns on the equality in (2.2.9). Indeed,
because of that equality, we know that the right-hand side of (2.2.5) is equal to
n
X
2 P
m f (Sn ) ,
EP f 0 (Sn ) EP Sn f (Sn ) =
m
E f 0 (Sn ) EP X
m=1
m
n
m =
and X
Xm
n .
Next, define
m
Tn,m (t) = Sn + (t 1)X
2 0
m f Tn,m (t) dt
EP X
2 P
=
m
E f 0 Tn,m (0) +
Z
0
2 0
m f (Tn,m (t) f 0 Tn,m (0) dt
EP X
m
Am
R
where
m=1
m=1
75
Bm (t) dt,
h
i
Am EP f 0 Sn ) f 0 Tn,m (0)
and
h
i
2
m
Bm (t) EP X
f 0 (Tn,m (t) f 0 Tn,m (0)
.
i
kf 0 ku h 2
2
Bm (t) 2t
, |Xm | 2n .
m
kf 00 ku + 2 2 EP Xm
n
Thus, after summing over 1 m n, integrating with respect to t [0, 1], and
using (2.2.5), (2.2.15), and (*), we arrive at
Z
0 (x) Fn (x) G(x) dx rn + kf 00 ku + 2gn (2)kf 0 ku ,
R
76
Lemma 2.2.16.
C 1 (R; R), and define f accordingly, as in (2.2.7).
p Let
0
Then kf ku 8 k kL1 (R;R) and kf 0 ku k0 kL1 (R;R) .
Proof: I will assume, throughout, that k0 kL1 (R;R) = 1. Observe that, by the
Fundamental Theorem of Calculus, (cf. the notation in Lemma 2.2.6)
Z
(x)
where y (x) =
2e
x2
2
G(xy)G(x)G(y) 0.
and
x2
2xe 2 G(x y) G(x)G(y) + 1(,y] (x) G(y) 1
2
1
1 4 G(x) 12
4
and
G(x)
1 2
2
|x|
1
=
2
ZZ
2
2
!2
d
2 + 2
2
dd =
x2
1
1 e 2 ,
4
2 + 2 x2
which proves the first inequality. To get the second one, it suffices to consider
each of the four cases 0 x y, x 0 & y < x, y < x < 0, and x < 0 & y x
separately and take into account that, from the first part of (2.2.11),
x2
x2
x 0 = 2xe 2 1G(x) 1 and x < 0 = 2|x|e 2 G(x) 1.
77
(2.2.18)
3
m
3n
(2.2.19)
3
m
3
n2
3
max m
.
10
n
1mn
Proof: For each n Z+ , let n denote the smallest number with the property
that
Pn 3
kFn Gku 1 3 m
n
for all choices of random variables satisfying the hypotheses under which (2.2.18)
is to be proved. My strategy is to give an inductive proof that n 10 for all
n Z+ ; and, because 1 1 and therefore 1 1, I need only be concerned
with n 2.
m,
Given n 2 and X1 , . . . , Xn , define X
m , and Tn,m (t) for 1 m n and
t [0, 1] as in the proof of Theorem 2.2.12. Next, for each 1 m n, set
n,m =
2 ,
2n m
m =
m
,
n
n =
n
X
3
m
,
and n,m =
X ` 3
.
n,m
1`n
`6=m
Finally, set
Sn,m =
X`
1`n
`6=m
Sn,m
,
and Sn,m =
n,m
and let x R 7 Fn,m (x) P Sn,m x [0, 1] denote the distribution
function for Sn,m . Notice that, by definition, kFn,m Gku n1 n,m for each
1 m n. Furthermore, because (cf. (2.1.3))
2n,m
2
=1
m
1 rn2
2n
and n,m
n
3
(1 rn2 ) 2
n
n,m
1 m n,
3
n ,
78
(2.2.20)
1mn
n n1
3
(1 rn2 ) 2
Now let Cb2 (R; R) with k00 kL1 (R) < be given, define f accordingly, as
in (2.2.7), and let
{Am : 1 m n}
n,m 0
m 0 Tn,m ()
kf ku + max EP X
kf ku +
n
[0,1]
m 0 Tn,m () .
kf ku + kf 0 ku + max EP X
[0,1]
h
i
2 Tn,m (t) Tn,m (0)
+ EP X
m
m |3 kf ku + tEP |X
m |3 EP |Tn,m (0)| kf 0 ku
tEP |X
Z 1
P 3 0
m Tn,m (t) d
E X
+t
0
3
t
m
3 0
m Tn,m () .
kf ku + kf 0 ku + t max EP X
[0,1]
In order to handle the second term in the last line of each of these calculations,
introduce the function
n,m
0
y R.
(, , y) [0, 1] R 7 (, , y) Xm () +
n
79
E X
Xm ()
(, , y) 0,1 (dy) P(d)
R
Z
Z
Z
k
Z
=
Z
m ()k 0 (t, , y) G(y) Fn,m (y) dy P(d)
X
R
h
k
00
i
1
m k k00 kL1 (R;R) m n1 k kL 3(R;R) n ,
EP X
(1 rn2 ) 2
(1 rn2 )
n1 n
3
2
k {1, 3},
n
k0 kL1 (R;R)
n,m
(dy)
P(d)
0,1
1 .
R
2(1 rn2 ) 2
0
1
k kL (R;R)
n1 n
00
|Am | m kf ku + kf 0 ku +
3 k kL1 (R;R)
12 +
2
2
(1 rn )
2(1 rn2 )
and
0
1
k
k
n1 n
L (R;R)
00
3
|Bm (t)| t
m
kf ku + kf 0 ku +
3 k kL1 (R;R)
12 +
2
2
2
(1 rn )
2(1 rn )
for all 1 m n and t [0, 1], and, after putting these together with (2.2.5)
and (2.2.15), we conclude that
Z
0 (y) G(y) Fn (y) dy
R
3
kf ku + kf 0 ku
(2.2.21)
2
n1 k00 kL1 (R;R) n
k0 kL1 (R;R)
n .
+
3
1 +
(1 rn2 ) 2
2(1 r2 ) 2
n
80
1
h(x) = 1 x
if x < 0
if x [0, 1]
if x > 1,
and define
h (x) =
1 y h(x y) dy
R
where Cc R; [0, ) satisfies R (y) dy = 1. Finally, let a R be given,
and set
x R and , L > 0.
,L (x) = h xa
Ln ,
It is then an easy matter to check that k0,L kL1 (R;R) = 1 while k00,L kL1 (R;R)
2
Ln . Hence, by plugging the estimates from Lemma 2.2.16 into (2.2.21) and
then letting & 0, we find that, for each L > 0,
1 Z a+Ln
G(y) Fn (y) dy
sup
aR Ln a
r
2n1
1
3
n .
+
1+
3
1 +
8
2
(1 rn2 ) 2 L
2(1 r2 ) 2
(2.2.22)
But
1
Ln
1
Fn (y) dy Fn (a)
Ln
aLn
a+Ln
Fn (y) dy,
a
while
0
1
Ln
a+Ln
G(y) dy G(a) =
a
1
Ln
and, similarly,
1
0 G(a)
Ln
Z
a
a+Ln
Ln
(a + Ln y) 0,1 (dy) ,
8
Ln
G(y) dy .
8
aLn
3
kFn Gku +
2
L
3n1
3
9
n ,
+
+
1
3
1 +
32
(1 rn2 ) 2 L (8) 2
8(1 rn2 ) 2
81
+
n1
In order to complete the proof starting from (2.2.23), we have to consider the
1
1
. Because kFn Gku 1,
or n < 10
two cases determined by whether n 10
it is obvious that we can take n 10 in the first case. On the other hand, if
1
and we assume that n1 10, then, because
n 10
n
n
n
X
1 X P 2 32
1 X P
3
3
m
rn3 ,
E Xm =
E |Xm | 3
n = 3
n 1
n 1
1
8
32
2
e 2 dx
F2n+1 (tn ) G(tn ) =
2 tn
1
82
72
14 Pn
1
3
m
3n
Pn
3
12
3
m
is small.
() =
exp 1 , x RN (dx) for x RN .
RN
When is a probability measure which is the distribution of an RN -valued random variable X, probabilists usual call its Fourier transform the characteristic
function of X, and when admits a density with respect to Lebesgue measure
RN , one uses
Z
h
i
(2.3.2)
()
=
exp 1 , x RN (x) dx for RN
RN
in place of
to denote its Fourier transform.
Obviously,
is a continuous function that is bounded by the total variation
kkvar of ; and only slightly less obvious1 is the fact that, for Cc RN ; C ,
C RN ; C and that as well as all its derivatives are rapidly decreasing
1
(i.e., they tend to 0 at infinity faster than 1 + ||2
to any power).
() = ( 1) ()
is bounded by
k
.
1
N
L (R )
kk=n
1
83
It is then easy to check that Cb RN ; C and k kL1 (RN ;R) kkvar for every
cn (n )
().
2.1.7 applied to n (x) = e 1 (n ,x)RN and (x) = e 1 (,x)RN ,
Hence,
cn
uniformly on compacts. Conversely, suppose that
cn
h,
i
n
when Cc RN ; C . But, for such a , is smooth and rapidly decreasing,
and therefore the result follows immediately from the first part of the present
lemma together with Lebesgues Dominated Convergence Theorem.
84
n () =
n
0,1 ()
2n
for every R.
Actually, as we are about to see, a slight variation on the preceding will allow
us to lift the results that we already know for R-valued random variables to random variables with values in RN . However, before I can state this result, I must
introduce the analogs of the mean value and variance for vector-valued random
variables. Thus, given an RN -valued random variable X on the probability space
(, F, P) with |X| L1 (P; R), the mean value EP [X] of X is the m RN that
is determined by the property that
(, m)RN = EP , X RN
for all RN .
Similarly, if |X| is P-square integrable, then the covariance cov(X) of X is the
symmetric linear transformation C on RN determined by
, C RN = EP , X EP [X] RN , X EP [X] RN
for , RN .
Notice that cov(X) is not only symmetric
but is also non-negative definite,
since for each RN , , cov(X) RN is nothing but the variance of (, X)RN .
Finally, given m RN and a symmetric, non-negative C RN RN , I will use
m,C to denote the Borel probability measure on RN determined by the property
that
Z
Z
N
1
(dy), Cb (RN ; R),
(2.3.6)
dm,C =
m + C 2 y 0,1
RN
RN
85
(2.3.7)
[
()
=
exp
m,C
2
RN
n
X
Xm ,
Cn cov(Sn ) =
m=1
n
X
cov(Xm ),
m=1
1
n = Sn .
n = det(Cn ) 2N and S
n
n is consistent
Notice that when N = 1, the above use of the notation n and S
with that in 2.1.1.
With these preparations, I am ready to prove the following multidimensional
generalization of Theorem 2.1.8.
Theorem 2.3.8. Referring to the preceding, assume that the limit
A lim
(2.3.9)
Cn
2n
n
1 X P
E |Xm |2 , |Xm | n = 0 for each > 0.
2
n n
m=1
lim
sup sup
n1
yRN
|n (y)|
<
1 + |y|2
86
(2.3.12)
Sn
= h, 0,C i
lim EP n
n
n
whenever {n : n 1} C(RN ; C) satisfies (2.3.11) and converges to uniformly on compacts.
Proof: Given e SN 1 , set
n (e) =
e, Cn e RN
and n (e) =
n (e)
.
n
p
Then, (e) inf n1 n (e) (0, 1] and n (e) (e, Ae)RN as n . In
particular, if (e1 , . . . , eN ) is an orthonormal basis in RN , then
N
N
X
X
n |2 =
n 2N =
EP |S
EP ei , S
n (ei )2
R
i=1
N
X
i=1
i=1
ei , Aei
RN
RN
Hence, by Lemmas 2.1.7 and 2.3.3 plus (2.3.7), all that we have to do is check
that
i
h
1
(*)
fn () EP e 1 (,Sn )RN e 2 (,A)RN
for each RN .
When = 0, (*) is trivial. Thus, assume that 6= 0, set e =
(e,Sn )RN
. Because
Sn (e) =
|| ,
and take
n (e)
n
X
2
1
EP e, Xm RN , e, Xm RN n (e)
2
n (e) m=1
n
X
2
1
EP e, Xm RN , e, Xm RN (e)n (e)
2
2
(e) n m=1
87
tends to 0 for each > 0, Theorem 2.1.8 combined with Lemma 2.3.3 guarantees
that, for any R,
2
1
EP e 1 n Sn (e) e 2 ||
p
for any {n : n 1} R that tends to . In particular, if = (, A)RN and
n = n (e)||, we find that
1
fn () = EP e 1 n Sn (e) e 2 (,A)RN .
sup
yR
|(y)|
< .
1 + |y|2`
Proof: Refer to the discussion in the introduction to this chapter, and observe
that the argument there shows that
Z
(2`)!
P 2`
y 2` 0,1 (dy)
lim E Sn = ` =
n
2 `!
R
suph p , n i < ,
n1
88
n1
n1
{>R}
as R .
Knowing Lemma 2.3.15, ones problem is to find conditions under which one
n )] < for an interesting class of non-negative
can show that supn1 EP [(S
s. One such class is provided by the notion of a sub-Gaussian random variable. Given [0, ), an RN -valued random variable X is said to be -subGaussian if
2 ||2
EP e(,X)RN e 2 ,
(2.3.17)
RN .
R > 0,
N
2 |X|2
1 ()2 2 .
EP e 2
2 |X|2
< for some (0, ) and EP [X] = 0, then X
Conversely, if A EP e 2
2(1+A)
. In particular, if X is a bounded random
is -sub-Gaussian when =
R,
1
n
m=1 am Xm is
pPn
2
-sub-Gaussian when =
m=1 (am m ) .
Proof: Since the moment generating function of the sum of independent random variables is the product of the moment generating functions of the summands, the final assertion is essentially trivial.
To prove the first assertion, use Lebesgues Dominated Convergence Theorem
to justify
2 t2
e 2 1
1
P t(e,X)RN
=0
E (e, X)RN = lim t
E e
1 lim
t&0
t&0
t
89
and
2 t2
EP et(e,X)RN + EP et(e,X)RN 2
e 2 1
2
= 2
2 lim
E (e, X)RN = lim
t&0
t&0
t2
t2
P
tR
t(e,X) N
2 t2
R
E e
exp tR +
2
P
R2
2 2
by minimizing
1
P |X| R 2N max P (e, X)| RN N 2 R ,
eSN 1
2 |X|2
, use
the estimate for P(|X| R) follows. To get the estimate on EP e 2
Tonellis Theorem to see that
Z
Z
N
2 |X|2
2 ||2
=
EP e(,X)RN 0,2 I (d)
e 2 0,2 I (d) = 1()2 2 .
EP e 2
R
2 |X|2
< for some (0, ) and that EP [X]
Now assume that A = EP e 2
= 0. Then
1
||2 P 2 |||X|
E |X| e
(1 t)EP (, X)2RN et(,X)RN dt 1 +
2
0
||2
A||2
||2 ||22
||2 ||22 P 2 2 |X|2
4
2 ,
e
1+
1+A 2 e
e E |X| e
1+
2
EP e(,X)RN = 1 +
from which it is clear that X is -sub-Gaussian for the prescribed . In par 2 |X|2
2 K 2
e 2 for all 0,
ticular, if K = kXkL (P;RN ) (0, ), then EP e 2
and
X has mean value 0, then X is -sub-Gaussian
for =
p so, if, in addition,
By combining Lemmas 2.3.15 and 2.3.18 with Theorem 2.3.8, we get the following.
Theorem 2.3.19. Working in the setting and with the notation in Theorem
2.3.8, assume that, for each n Z+ ,
2
EP e(,Xn )RN en || ,
RN ,
90
where n (0, ). If
pPn
m=1
sup
2
m
n1
< ,
|(y)| Ce 2 , y RN ,
for some C < and 0, 1 . In particular, if the Xn s are identically
2
2
distributed with covariance C and if EP e |X1 | < for some (0, ),
then, for any C(RN ; C),
1
lim |y|2 log 1 + |(y)| = 0 = lim EP n 2 Sn = h, 0,C i.
n
|y|
P (1)k
for C
(ii) Take the branch of the logarithm
k
given by log = k=1
with |1 | < 1, and check that (1 ) + log |1 |2 for |1 | 12 .
Conclude first that
n
n
X h
i X
h Xm i R2 r2
X
m
n
log EP e 1 n
EP 1 e 1 n +
2
m=1
m=1
n
X
Xm
2
P
0
E 1 cos
n ()
n
2
m=1
uniformly for s in compacts.
91
n
n m=1
m=1
and that
n
X
2
2
gn ()
2
2
Xm
, |Xm | n 2 .
E 1 cos
n
m=1
P
Finally, combine these and apply (ii) to get limn 2 gn () 2 for all R.
() =
cos(x) (dx), R.
R
R.
Finally, note that 1 x log x for x (0, 1], apply this to the preceding to
get
Z
n
n
(1) < , n N,
2
1 cos 2 2 x (dx) log
R
92
and arrive at
x2 (dx) 2 log
(1)
R
Check that is symmetric and that = T2 . Hence, by (i), R x2 (dx) <
(in fact, is centered
normal). Finally, use this and part (i) of Exercise 1.4.27
R
to deduce that R x2 (dx) < .
(iii) Make the obvious extension of T2 to Borel probability measures on RN .
That is,
ZZ
x+y
(dx)(dy) for BRN .
T2 () =
1
1
22
RN RN
Using the result just proved when N = 1, show that = T2 if and only if
= 0,C for some non-negative definite, symmetric C.
Exercise 2.3.23. In connection with the preceding exercise, define T for
(0, ) and Borel probability measures on RN , so that
ZZ
1
T () =
1 2 (x + y) (dx)(dy), BRN .
RN RN
The problem under consideration here is that of determining for which s there
exist nontrivial (i.e., 6= 0 ) solutions to the fixed point equation = T .
Begin by reducing the problem to the case when N = 1. Next, repeat the initial
argument given in part (ii) of Exercise 2.3.21 to see that there is some solution
if and only if there is one that is symmetric. Assuming that is a non-trivial,
symmetric solution, use the reasoning in part (i) there to see that
Z
if (0, 2)
2
x (dx) =
0 if (2, ).
R
In particular, when (2, ), there are no non-trivial solutions to = T .
(See 3.2.3 for more on this topic.)
Exercise 2.3.24. Return to the setting of Exercise 2.1.13. After noting that,
so long as e Sn1 , the distribution of
x Sn1 n 7 (e, x)Rn R
93
for all (0, ) and s (1, ). Use the preceding to see that, for each
p (0, ),
r
p
p+1
2p
P
p if X N (0, 2 ).
(2.3.26)
E |X| =
2
The goal of the exercise is to show that the moments of sub-Gaussian random
variable display similar behavior.
(i) Suppose that X is -sub-Gaussian, and show that, for each p (0, ),
p
p
p
p
+1 .
= 2 2 +1
EP |X|p Kp p where Kp p2 2
2
2
(ii) Again suppose that X is -sub-Gaussian, and let 2 be its variance. Show
that
2+|p2|
+
(1 p
2)
p
EP |X|p K4
where q 0 =
q
q1
is the H
older conjugate of q.
m=1
m=1
2+|p2|
B p EP |S|p Kp B p .
K4
2+|p2|
v
u n
uX
a2m .
where A = t
m=1
94
(iv) The most famous case of the situation discussed in (iii) is when the Xm s are
symmetric Bernoulli (i.e., P(Xm = 1) = 12 ). First use (iii) in Exercise 1.3.17
or direct computation to check that Xm is 1-sub-Gaussian, and then conclude
that
+
(1 p
2)
(2.3.27)
K4
n
X
! p2
a2m
p #
" n
X
P
E
am Xm Kp
n
X
m=1
m=1
! p2
a2m
m=1
+
(1 p
2)
K4
EP
n
X
! p2
! p2
n
h
i
X
2
2
.
EP |S|p Kp EP
Xm
Xm
1
Hint: Refer to the beginning of the proof of Lemma 1.1.6, and let R1 , . . . , Rn be
the Rademacher functions on [0, 1), set Q = [0,1) P on [0, 1) , B[0,1) F ,
and observe that
n
X
7 S()
Xm ()
1
n
X
Rm (t)Xm ()
does under Q. Next, apply Khinchines inequality to see that, for each ,
+
(1 p
2)
K4
n
X
1
! p2
Xm ()2
[0,1)
T (t, )p dt Kp
n
X
! p2
Xm ()2
and complete the proof by taking the P-integral of this with respect to .
At least when p (1, ), I will show later that this sort of inequality holds
in much greater generality. Specifically, see Burkholders Inequality in Theorem
6.3.6.
Exercise 2.3.29. Suppose that X is an RN -valued Gaussian random variable
with mean value 0 and covariance C.
(i) Show that if A : RN RN is a linear transformation, then AX is an
N (0, ACA> ) random variable, where A> is the adjoint transformation.
95
EP e 1(,X)RN e 1(,X)RN = EP e 1(,X)RN EP e 1(,X)RN
C(11)
C(21)
C(12)
C(22)
,
where the block structure corresponds to RN = RN1 RN2 , and assume that
C(22) is non-degenerate. Show that the one and only transformation of the
sort in part (iii) is given by
=
and therefore that
> =
0(11)
C1
(22) C(21)
0(11)
0(21)
0(12)
I(22)
,
C(12) C1
(22)
I(22)
.
Hint: Note
that = 0 if (2) = 0(2) , = if (1) = 0(1) , and that
C(I ) (21) = 0(21) .
(v) Continuing with the assumption that C(22) is non-degenerate, show that
X=
C(12) C1
(22) Y
Y
+
Z
0
,
Z
RN1
(22)
x(2) ,B (dx(1) ) 0,C(22) (dx(2) ).
96
Exercise 2.3.30. Given h L2 (RN ; C), recall that the (n + 2)-fold convolution
h?(n+2) is a bounded continuous function for each n N. Next, assume that
h(x) = h(x) for almost every x RN and that h 0 off of BRN (0, 1). As an
application of part (iii) in Exercise 1.3.22, show that
"
?(n+2)
h
(x) 2khk2 2
L
(|x| 2)+
n
khk
exp
N
1
N
(R ;C)
L (R ;C)
2n
2 #
Hint: Note that h L1 (RN ; C), assume that M khkL1 (RN ;C) > 0, and define
Af = M 1 h ? f for f L2 (RN ; C). Show that A is a self-adjoint contraction on
L2 (RN ; C), check that
h?(n+2) (x) = M n Tx h, An h L2 (RN ;C) ,
where Tx h h( + x), and note that
Tx h, A` h L2 (RN ;C) = 0
if ` |x| 2.
Hn (x) = (1)n e
x2
2
dn x2
e 2 ,
dxn
x R.
Clearly, Hn is an nth order, real, monic (i.e., 1 is the coefficient of the highest
order term) polynomial. Moreover, if we define the raising operator A+ on
C 1 (R; C) by
x2
d
d x2
e 2 (x) = (x) + x(x),
A+ (x) = e 2
dx
dx
then
(2.4.2)
Hn+1 = A+ Hn
for all n N.
x R,
97
At the same time, if and are continuously differentiable functions whose first
derivatives are tempered (i.e., have at most polynomial growth at infinity), then
(2.4.3)
, A+
L2 (
0,1 ;C)
= A ,
L2 (0,1 ;C)
Hm , Hn
L2 (0,1 ;C)
= Hm , An+ H0
L2 (0,1 ;C)
= An Hm , H0
d
dx .
After combining
L2 (0,1 ;C)
= m! m,n ,
where, at the last step, I have used the fact that Hm is a monic mth order
polynomial. Hence, the (normalized) Hermite polynomials
(1)n x2 dn x2
Hn (x)
e 2 ,
= e2
H n (x) =
dxn
n!
n!
x R,
form an orthonormal set in L2 (0,1 ; C). (Indeed, they are one choice of the
orthogonal polynomials relative to the Gauss weight.)
Lemma 2.4.4. For each C, set
2
,
H(x; ) = exp x
2
x R.
Then
(2.4.5)
H(x; ) =
X
n
Hn (x),
n!
n=0
x R,
Proof: By (2.4.1) and Taylors expansion for the function e 2 , it is clear that
(2.4.5) holds for each (x, ) and that the convergence is uniform on compact
subsets of R C. Furthermore, because the Hn s are orthogonal, the asserted
uniform convergence in L2 (0,1 ; C) comes down to checking that
lim
n 2
X
Hn k2 2
L (0,1 ;C) = 0
||R n=m n!
sup
for every R (0, ), and obviously this follows from our earlier calculation that
2
Hn
2
= n!.
L ( ;C)
0,1
98
To prove the assertion that H n : n N forms an orthonormal basis in
L2 (0,1 ; C), it suffices to check that any L2 (0,1 ; C) that is orthogonal to all
of the Hn s must be 0. But, because of the L2 (0,1 ; C) convergence in (2.4.5),
we would have that
Z
(x) ex 0,1 (dx) = 0, C,
R
x2
2
(x)
,
2
x R,
then kkL1 (R;C) = kkL1 (0,1 ;C) kkL2 (0,1 ;C) < and (cf. (2.3.2)) 0,
which, by the L1 (R; C) Fourier inversion formula
Z
1
d &0
in L1 (R; C),
e|| e 1 x ()
2 R
H Hn = n Hn
for each n N.
X
2
Dom H = L2 (0,1 ; C) :
||2n , H n L2 (0,1 ;C) <
n=1
H =
n , H N
L2 (0,1 ;C)
H n,
Dom H .
n=0
This kernel appears in the 1866 article by Mehler referred to in the footnote following (2.1.14).
.
It arises there as the generating function for spherical harmonics on the sphere S
99
for all (0, 1) and (x, ) R C. In conjunction with (2.4.5), this means that
Z
(2.4.6)
H =
and from here it is not very difficult to prove the following properties of H for
(0, 1).
Lemma 2.4.7. For each L2 (0,1 ; C), (, x) (0, 1) R 7 H (x)
C may be chosen to be a continuous function that is non-negative if 0
Lebesgue-almost everywhere. In addition, for each (0, 1) and every p
[1, ],
H
p
L (
(2.4.8)
0,1 ;C)
H (x)p 0,1 (dx)
ZZ
||p d0,1 .
R
RR
Hence, (2.4.8) is now proved for p [1, ). The case when p = is even easier
and is left to the reader.
The conclusions drawn in Lemma 2.4.7 from the Mehler representation in
(2.4.6) are interesting but not very deep (cf. Exercise 2.4.36). A deeper fact is
100
the relationship between Hermite multipliers and the Fourier transform. For the
purposes of this analysis, it is best to define the Fourier operator F by
Z
(2.4.9)
Ff () =
e 1 2x f (x) dx, R,
R
of any further factors of 2, the Parseval Identity (cf. Exercise 2.4.37) becomes
the statement that F determines a unitary operator on L2 (R; C). In order to
relate F to Hermite multipliers, observe that, after analytically continuing the
result of another simple Gaussian computation,
Z
2
2
ex ex dx = e 4 for all C,
R
X
p
2
n
2p x ex dx
e 1 2x Hn
n! R
n=0
n
p
p
2 X
(p 1)2
2p0 ,
pn Hn
+ 1 2p = e
=e
exp
n!
2
n=0
1
p
is the H
older conjugate of p and p 1 (p 1) 2 . Thus,
where p0 = p1
we have now proved that, for each p (1, ) and n N,
Z
p
p
2
2
2p0 x e .
2p x ex dx = pn Hn
(2.4.10)
e 1 2x Hn
n N,
See Exercise 2.4.35, where it is shown that Ap < 1 for p (0, 1).
101
H
q
L (
0,1 ;C)
for all
L2 (0,1 ; C)
if
(2.4.16)
|1 |q + |1 + |q
2
q1
|1 |p + |1 + |p
2
p1
for every C.
That (2.4.16) implies (2.4.15) is trivial is quite remarkable. Indeed, it takes
a problem in infinite dimensional analysis and reduces it to a calculus question
about functions on the complex plane. Even though, as we will see later, this
reduction leads to highly non-trivial problems in calculus, Theorem 2.4.14 has
to be considered a major step toward understanding the contraction properties
of Hermite multipliers.3
The first step in the proof of Theorem 2.4.14 is to interpret (2.4.16) in operator theoretic language. For this
the standard Bernoulli
purpose, let denote
probability measure on R, BR . That is, {1} = 12 . Next, use to denote
the function on R that is constantly equal to 1 and {1} to stand for the identity function on R (i.e., {1} (x) = x, x R). It is then clear that and
{1} constitute an orthonormal basis in L2 (; C); in fact, they are the orthogonal polynomials there. Hence, for each C, we can define the Bernoulli
multiplier K as the unique normal operator on L2 (; C) prescribed by
K F =
2
if F =
{1}
if F = {1}.
See Beckners Inequalities in Fourier analysis, Ann. Math., # 102 #1, pp. 159182 (1975).
Later, in his article Gaussian kernels have only Gaussian maximizers, Invent. Math. 12,
pp. 179208 (1990), E. Lieb essentially killed this line of research. His argument, which is
entirely different from the one discussed here, handles not only the Hermite multipliers but
essentially every operator whose kernel can be represented as the exponential of a second order
polynomial.
3
102
if F 6= . Note that F : F {1, . . . , n} is an orthonormal basis for L2 ( n ; C),
and define Kn to be the unique normal operator on L2 ( n ; C) for which
Kn F = |F | F ,
(2.4.18)
F {1, . . . , n},
where |F | is used here to denote the number of elements in the set F . Alternatively, one can describe Kninductively on n Z+ by saying that K1 = K
and that, for C Rn+1 ; C and (x, y) Rn R,
(n+1)
K
(x, y) = K (x, ) (y) where (x, y) = Kn ( , y) (x).
It is this alternative description that makes it easiest to see the extension
of (2.4.17) alluded to above. Namely, what I will now show is that, for every
n Z+ ,
(2.4.19)
(2.4.17) =
Kn
q n kkLp ( n ;C) , L2 ( n ; C).
L ( ;C)
K
(x,
)
(y)
(dy)
n (dx)
L (
;C)
Rn
Z
pq
|( , y)|p
Rn
Z
Z
(dx) =
n
Rn
Z
L p ( n ;C)
pq
p
( , y) (dy)
q
L p ( n ;C)
Z
pq
pq
p
=
( , y) Lq ( n ;C) (dy)
(dy)
R
pq
( , y)
p p n (dy)
= kkqLp ( n+1 ;C) ,
L ( ;C)
103
where, in the passage to the third line, I have used the continuous form of
Minkowskis Inequality (it is at this point that the only essential use of the
hypothesis p q is made).
I am now ready to take the main step in the proof of Theorem 2.4.14.
Lemma 2.4.20. Define An : L2 (; C) L2 n ; C) by
An (x) =
Pn
`=1 x`
for x Rn .
and
(2.4.22)
H ,
L2 (0,1 ;C)
= lim Kn An , An
n
L2 ( n ;C)
for every (0, 1). Moreover, if, in addition, either or is a polynomial, then
(2.4.22) continues to hold for all C.
Proof: Let and be tempered elements of C(R; C), and define
fn () = Kn An , An
L2 ( n ;C)
and f () = H ,
L2 (0,1 ;C)
(0, 1).
lim fn () = f (),
Notice that (2.4.23) is (2.4.22) for (0, 1) and that In (2.4.21) follows from
(2.4.22) with = 1, = ||p , and any (0, 1).
In order to prove (2.4.23), I will need to introduce other expressions for f ()
and the fn ()s. To this end, set
C =
,
104
R
R
that R {0} (y) k (1, dy) =R {0} (1) and R {1} (y) k (1, dy) = {1} (1)
and therefore K (1) = R (y) k (1, dy) for all . Hence, if be the
probability measure on R2 determined by (dx dy) = k (x, dy) (dx) or,
equivalently,
and {(1, 1)} = 1
{(1, 1)} = 1+
4 ,
4
then
K ,
L2 (;C)
=
R2
R2
R2
Z+
, F = B ,
Pn
1 Zm
,
F
n
An , F L2 ( n ;C) F , An L2 ( n ;C) ,
F
n Z+ and C.
105
time, because (, Hm )L2 (0,1 ;C) = 0 for m > k, f is also a polynomial of degree
at most k, and therefore (2.4.23) already implies that the convergence extends
to the whole of C and is uniform on compacts. Finally, in the case when ,
instead of , is a polynomial, simply note that
Kn An , An
and H ,
L2 (
0,1 ;C)
L2 ( n ;C)
= H,
= Kn
An , An
L2 (0,1 ;C)
L2 ( n ;C)
Proof of Theorem 2.4.14: Assume that (2.4.16) holds for a given pair 1 <
p q < and D. We then know that (2.4.19) holds for every n Z+ .
Hence, by Lemma 2.4.20, if and are tempered elements of C(R; C) and at
least one of them is a polynomial, then
H , L2 (0,1 ;C) = lim Kn An , An 2 n
n
L ( ;C)
lim
An
Lp ( n ;C)
An
Lq0 ( n ;C) = kkLp (0,1 ;C) kkLq0 (0,1 ;C) .
n
In other words, we now know that, for all tempered and from C(R; C),
(2.4.24)
H
,
kkLp (0,1 ;C) kkLq0 (0,1 ;C)
L2 (0,1 ;C)
so long as one or the other is a polynomial.
To complete the proof when p (1, 2], note that, for any fixed polynomial
, (2.4.24) for every tempered C(R; C) guarantees that the inequality in
(2.4.15) holds for that . At the same time, because p (1, 2] and the polynomials are dense in L2 (0,1 ; C), (2.4.15) follows immediately from its own restriction
to polynomials.
Finally, assume that p [2, ) and therefore that q 0 (1, 2]. Then, again
because the polynomials are dense in L2 (0,1 ; C), (2.4.24) for a fixed tempered
C(R; C) and all polynomials implies (2.4.15) first for all tempered continuous s and thence for all L2 (0,1 ; C).
2.4.3. Applications of Beckners Theorem. I will now apply Theorem
2.4.14 to two important examples. The first example involves the case when
(0, 1) and shows that the contraction property proved in Lemma 2.4.7 can
be improved to say that, for each p (1, ) and (0, 1), there is a q =
q(p, ) (p, ) such that H is a contraction on Lp (0,1 ; C) into Lq (0,1 ; C).
Such an operator is said to be hypercontractive, and the fact that H is
hypercontractive was first proved by E. Nelson in connection with his renowned
construction of a non-trivial, two-dimensional quantum field.4 The proof that
4
Nelsons own proof appeared in his The free Markov field, J. Fnal. Anal. 12, pp. 1221
(1974).
106
I will give is entirely different from Nelsons and is much closer to the ideas
introduced by L. Gross5 as they were developed by Beckner.
Theorem 2.4.25 (Nelson). Let (0, 1) and p (1, ) be given, and set
q(p, ) = 1 +
p1
.
2
Then
H
q
kkLp (0,1 ;C) ,
L (0,1 ;C)
(2.4.26)
L2 (0,1 ; C),
(2.4.27)
0,1 ;C)
o
: kkLp (0,1 ;C) = 1 = .
p1
q1
12
I begin with the case when 1 < p < q 2, and I will first consider [0, 1).
Introducing the generalized binomial coefficients
r
r(r 1) (r ` + 1)
`!
`
for r R and ` N,
X
q
|1 |q + |1 + |q
=1+
()2k
2
2k
k=1
and
X
p
|1 |p + |1 + |p
=1+
2k .
2
2k
k=1
See Grosss Logarithmic Sobolev inequalities, Amer. J. Math. 97 #4, pp. 10611083
(1975). In this paper, Gross introduced the idea of proving estimates on H from the corresponding estimates for K . In this connection, have a look at Exercises 2.4.39 and 2.4.41.
107
q
Noting that, because q 2, 2k
0 for every k Z+ , and using the fact that,
p
because pq (0, 1), (1 + x) q 1 + pq x for all x 0, we see that
|1 |q + |1 + |q
2
pq
pX q
()2k .
1+
q
2k
k=1
Hence, I will have completed the case under consideration once I check that
X
p
pX q
2k
()
2k ,
q
2k
2k
k=1
k=1
p
p q
2k
q 2k
2k
for each k Z+ .
But the choice of in (2.4.28) makes the preceding an equality when k = 1, and,
when k 2,
2k
p q
2k1
Y jq
q 2k
1,
p
jp
2k
j=2
|1 | + |1 + |
,
2
b=
|1 | |1 + |
,
2
and c =
b
[1, 1].
a
Then
|1 | = 1+
2 (1 ) +
1
2 (1
) a b,
1
1
|1 c|q + |1 + c|q q
|1 |q + |1 + |q q
a
2
2
1
1
1
|1 |p + |1 + |p p
|a b|p + |a + b|p p
|1 c|p + |1 + c|p p
.
=
=
a
2
2
2
Hence, I have now completed the case when 1 < p < q 2 and is given by
(2.4.28).
108
To handle the other cases, I will use the equivalence of (2.4.16) and (2.4.17).
Thus, what we already know is that (2.4.17) holds for 1 < p < q 2 and the
in (2.4.28). Next, suppose that 2 p < q < . Then, since 1 < q 0 < p0 2 and
q0 1
p1
,
= 0
p 1
q1
kkLp (;C) ,
where the is the one given in (2.4.28). Thus, the only case that remains is the
1
1
one when 1 < p 2 q < . But, in this case, set = (p 1) 2 , = (q 1) 2 ,
and observe that, because the associated in (2.4.28) is the product of with
, K = K K and therefore
K
q
K
L2 (;C) kkLp (;C) .
L (;C)
As my second, and final, application of Theorem 2.4.14, I present the theorem
of Beckner for which he concocted Theorem 2.4.14 in the first place. The result
was
originally by H. Weyl, who guessed, on the basis of Fh0 =
conjectured
0
n
( 1) h0 , that the norm kFkpp0 of F as an operator on Lp (R; C) to Lp (R; C)
should be achieved by h0 . Weyls conjecture was partially verified by I. Babenko,
who proved it when p0 is an even integer. In particular, when combined with
the RieszThorin Interpolation Theorem, Babenkos result already shows (cf.
Exercise 2.4.35) that kFkpp0 < 1 for p (0, 1).
109
(2.4.32)
1
= + 1 (p 1) 2 , where , R.
1 + (p 1)
(*)
h
2
+ (p 1)
i p2
1+
2
+ (p 1)
i p2 p1
for all , R.
To prove (*), consider,
(0, ), the function g : [0, )2 [0, )
1for each
1
defined by g (x, y) = x +y . It is an easy matter to check that g is concave
or convex depending on whether [1, ) or (0, 1). In particular, since
p0
2
2
p0
2,
p0
+ (p 1)
i p20
1+
2
+ (p 1)
i p20
2
g x , y + g x+ , y
x + x+
,y
g
=
2
2
|1 |p + |1 + |p
2
! 20
p
p20
+ (p 1) 2
110
p
2
2
(0, 1),
0
+ (p 1)
i p2
1+
2
+ (p 1)
i p2
2
"
|1 |p + |1 + |p
2
# p2
p2
+ (p0 1) 2
(**)
|1 |p + |1 + |p
2
! 20
p
+(p1)
|1 |p + |1 + |p
2
p2
+(p0 1) 2 .
But because (cf. Theorems 2.4.14 and 2.4.25) we know that (2.4.16) holds with
1
p replaced by 2, q = p0 , and = p 1 2 , the left side of (**) is dominated by
1
(p 1) +
1 (p0 1) 2
2
+ 1 + (p0 1) 2
2
2
= 1 + (p 1) 2 + (p0 1) 2 .
At the same time, again by (2.4.16), only this time with p, 2, and the same
choice of , we see that the right-hand side of (**) dominates
1
(p 1) +
1 (p 1) 2
2
+ 1 + (p 1) 2
2
2
= 1 + (p 1) 2 + (p0 1) 2 .
(2.4.34)
f S =
f 1 + f 2
2R -almost everywhere,
111
(ii) For each n 0, let Z (n) denote span {Hn 1 Hnm 2 : 0 m n} .
S
2
Show that Z (m) Z (n) in L2 (0,1
; R) when m 6= n and the span of n=0 Z (n)
2
2
is dense in L2 (0,1
; R). Conclude from these that if F L2 (0,1
; R), then F =
P
(n)
and the series
n=0 n F , where n denotes orthogonal projection onto Z
convergences in L2 (0,1 ; R).
(iii) Using the generating (2.4.5), show that
n
X
n
n
Hm 1 Hnm 2 ,
Hn S = 2 2
m
m=0
1 + 2
, Hn
=
L2 (0,1 ;R)
1
2
2 n!
Hn 1 + Hn 2 .
f (x) = f, H1
Z
L2 (0,1 ;R)
H1 (x) =
f () 0,1 (d) x
is a strictly convex function that tends to 0 at both end points and is therefore
strictly negative. Hence, Ap < 1 for p (1, 2).
112
Check that takes B(E; C) into itself and that kku kku . Next, given a
-finite measure on (E, B), say that is -invariant if
Z
() =
(x, ) (dx) for all B.
E
B(E; C).
hn =
24
1
(n!) 2
hn ,
n N.
113
(i) Set
F (t) = kft kLq(t) (;C) ,
and, by somewhat tedious but completely elementary differential calculus, show
that
Z
q(t)
F (t)1q(t)
ft
q(t)
dF
d
q(t)
f
log
F
(t)
dt (t) =
q(t)2
R
Z
q(t)2
q(t)1
ft (x)
ft (x) ft (x) (dx) .
+ 2
R
q1
q1
4(q 1) 2 2
( )
q2
2
conclude that
dF
dt
(**)
Z
q(t)
F (t)1q(t)
ft
q(t)
d
q(t)
f
log
(t)
F (t)
q(t)2
R
Z
q(t)
2
q(t)
2
+ q(t) 1
ft (x) 2 ft (x) (dx) .
R
2
d 2
(x) (x) (dx)
(2.4.40)
log kk 2
R
L (;C)
114
Hint: Reduce to the case when (x) = 1 + bx for some b (0, 1), and, in this
case, check that (2.4.40) is the elementary calculus inequality
(1 + b)2 log(1 + b) + (1 b)2 log(1 b) (1 + b2 ) log(1 + b2 ) 2b2 ,
b (0, 1).
(iii) By plugging (2.4.40) into (**), arrive at (*), and conclude that (2.4.17)
holds for (0, 1) and q = 1 + p1
2 .
Exercise 2.4.41. The major difference between Grosss and Beckners approaches to proving Nelsons Theorem 2.4.25 is that Gross based his proof on
the equivalence of contraction results like (2.4.17) and (2.4.15) to Logarithmic
Sobolev Inequalities like (2.4.40). In Exercise 2.4.38, I outlined how one passes
from a Logarithmic Sobolev Inequality to a contraction result. The object of this
exercise is to go in the opposite direction. Specifically, starting from (2.4.26),
show that
Z
(2.4.42)
log
R
2
kkL2 (
0,1 ;C)
Z
d0,1 2
|0 |2 0,1 (dx)
To see that this estimate is quite good, show that kH1 kpLp (0,1 ;C) =
22
1
2
and apply Stirlings formula (1.3.21) to conclude that kH1 kLp (0,1 ;C)
as p .
p+1
2
p1
,
1
p1 2
e
Chapter 3
Infinitely Divisible Laws
The results in this chapter are an attempt to answer the following question.
GivenPan RN -valued random variable Y with the property that, for each n Z+ ,
n
Y = m=1 Xm , where X1 , . . . , Xn are independent and identically distributed,
what can one say about the distribution of Y?
Recall that the convolution 1 ? 2 of two finite Borel measures 1 and 2 on
RN is given by
ZZ
1 ? 2 () =
1 (x + y) 1 (dx)2 (dy), BRN ,
RN RN
and that the distribution of the sum of two independent random variables is the
convolution of their distributions. Thus, the analytic statement of our problem
is that of describing those probability measures that, for each n 1, can be
of some probability measure n1 .
written as the n-fold convolution power ?n
1
n
I will say that such a is infinitely divisible and will use I(RN ) to denote
the class of infinitely divisible measures on RN . Since the Fourier transform
takes convolution into ordinary multiplication, the Fourier formulation of this
problem is that of describing those Borel probability measures on RN whose
Fourier transform
has, for each n Z+ , an nth root which is again the Fourier
transform of a Borel probability measure on RN .
Not surprisingly, the Fourier formulation of the problem is, in many ways, the
most amenable to analysis, and it is the formulation in terms of which I will solve
it in this chapter. On the other hand, this formulation has the disadvantage that,
although it yields a quite satisfactory description of
, it leaves the problem
of extracting information about from properties of
. For this reason, the
following chapter will be devoted to developing a probabilistic understanding of
the analytic answer obtained in this chapter.
3.1 Convergence of Measures on RN
In order to carry out our program, I will need two important facts about the
convergence of probability measures on RN . The first of these is a minor modification of the classical HellyBray Theorem, and the second is an improvement,
due to Levy, of Lemma 2.3.3.
115
116
lim sup B(0, R){ = 0.
(3.1.2)
R S
K
X
k hk , nm i = lim h, nm i kku ,
m
m
() = lim
k=1
`+1 B(0, ` + 1) `+1 B(0, `) = ` B(0, `)
117
Hence, if
X
() lim ` B(0, `) =
` B(0, `) \ B(0, ` 1) ,
`
`=1
and so we would arrive at the contradiction 1 = limR B(0, R) .
3.1.2. L
evys Continuity Theorem. My next goal is to find a test in terms
of the Fourier transform to determine when (3.1.2) holds.
Lemma 3.1.3. Define
1
s(r) = inf
sin
for r (0, ).
1
(re) rR + 2 {y : |(e, y)RN | R} for all e SN 1 ,
and
1
B(0, N 2 R){ N sup {y : |(e, y)RN | R}
eSN 1
(3.1.5)
N
max 1
() : || r .
s(rR)
118
(3.1.6)
||&0 S
(3.1.4), simply observe that 1 e 1(re,y)RN 2 r|(e, y)RN | .
Turning to (3.1.5), note that
1
()
RN
To prove
1 cos(, y)RN (dy).
1
(te) dt
Z
RN \{0}
!
(dy)
s(rR) {y : |(e, y)RN | R} ,
and therefore
(3.1.7)
() s(rR) {y : |(e, y)RN | R} .
sup 1
B(0,r)
119
f (i j )i j 0
for all 1 , . . . , n C.
i,j=1
1(i j ,x)RN
2
n
X
1(
,x)
i
RN ,
i j =
e
i
i=1
i,j=1
()
dx d 0.
f (x )(x)
RN RN
In particular, when f L1 RN ; C , set
N
m(x) = (2)
1 (x,)RN
f () d,
RN
1
Recall that a non-negative definite operator on a complex Hilbert space is always Hermitian.
120
and use Parsevals Identity and Fubinis Theorem, together with elementary
manipulations, to arrive at
Z
ZZ
()
d d 0
(2)N
m(x) (x)2 dx =
f ( )()
RN
RN RN
for all L1 (RN ; R) Cb (RN ; R) with L1 (RN ; R). Conclude that m is non
negative, and use this to complete the proof in the case when f L1 RN ; C .
(iii) It remains only to pass from the case when f L1 RN ; C to the general
|x|2
case. For each t (0, ), set ft (x) = et 2 f (x). Clearly, ft (0) = 1 and
ft Cb (RN ; C) L1 (RN ; C). In addition, show that
Z
n
n
X
X
ft i j i j =
f i j i (x)j (x) 0,tI (dx) 0,
RN
i,j=1
i,j=1
kf ku 1
and |f () f ()|2 2 1 Re f ( ) ,
, RN .
Next, show that (*) follows directly from non-negative definiteness, whether
or not f is continuous. Thus, a non-negative definite function is uniformly
continuous everywhere if it is continuous at the origin.
Hint: Both parts of (*) follow from the fact that
f ()
f ()
1
1
f ( )
A = f ()
f ()
f ( )
is non-negative
definite. To get the second part, consider the quadratic form
v, Av C3 with v = (v1 , 1, 1).2
2
121
1(,x)RN
Exercise 3.1.11. It is important to recognize the extent to which Levys Continuity Theorem and, as a by-product, Bochners Theorem, are strictly finite
dimensional results. For example, let H be an infinite dimensional, separa2
1
ble, real Hilbert space, and define f (h) = e 2 khkH . Obviously, f is a continuous and f (0)= 1. Show that it is also non-negative definite in the sense
that f (hi hj ) 1i,jn is a non-negative definite, Hermitian matrix for each
n Z+ and h1 , . . . , hn H. Now suppose that there were a Borel probability
measure on H such that
Z
(h)
e 1(h,x)H (dx) = f (h), h H.
H
Show that, for any orthonormal basis {ei : i Z+ } in H, the functions Xi (h) =
(ei , h)H , i Z+ , would be, under , a sequence of independent, N (0, 1)-random
variables, and conclude from this that
Z
Y
2
2
ekhkH (dh) =
E eXi = 0.
H
iZ+
Hence, no such can exist. See Chapter 8 for a much more thorough account
of this topic.
Hint: The non-negative definiteness of f can be seen as a consequence of the
analogous result for Rn .
Exercise 3.1.12. The RiemannLebesgue Lemma says that f() 0
as || if f L1 (RN ; C). Thus
() 0 as || if M1 (R)
is absolutely continuous. In this exercise we will examine situations in which
M1 (R) but
()
6 0 as || .
(i) Given a symmetric M1 (R), show that
is real valued, and use Bochners
Theorem to show that
() cannot tend to a strictly negative number as || .
Hint: Let > 0, and suppose that
() 2 as || . Choose R > 0
so that
()
for
||
R
and
n Z+ so that (n 1) > 1. Set A =
122
6
0 as
|| .
Hint: Show that
never vanishes and that
(2m ) is independent of m Z+ .
3.2 The L
evyKhinchine Formula
Throughout, I(R ) will be the set of M1 (RN ) that are infinitely divisible.
My strategy for characterizing I(RN ) will be to start from an easily understood
subset of I(RN ) and to get the rest by taking weak limits.
The elements of I(RN ) that first come to mind are the Gaussian measures
(cf. (2.3.6)) m,C . Indeed, if m RN and C is a symmetric, non-negative
definite transformation on RN , then it is clear from (2.3.7) that m,C = ?n
m C.
n ,n
Unfortunately, this is not a good starting place because it is too rigid: limits of
Gaussians are again Gaussian. Indeed, suppose that mn ,Cn = . Then
N
()
for all RN ,
X
n ?n
.
n!
n=0
1
(dy)
,
d
()
=
exp
e
,
n ,
more hopeful choice of starting point, let m RN and a non-negative definite,
symmetric C be given, and choose (e1 , . . . , eN ) to be p
an orthonormal basis of
eigenvectors for C. Next, set mi = (m, ei )RN and i = (ei , Cei )RN , and take
!
N
N
X
1X
1
.
i ei + i ei
mi ei +
n =
n
2 i=1 n
2N i=1 n
123
N
X
n e
1mi (,ei ) N
R
n
i=1
!
N
X
i (, ei )RN
1
,
1 +
n cos
1
n2
i=1
which tends to [
m,C () as n , and so 2N n,n = m,C as n . Thus,
one can use weak convergence to break out to the class of Poisson measures.
As I will show in the next subsection, the preceding is a special case of a
result (cf. Theorem 3.2.7) that says that every infinitely divisible measure is the
weak limit of Poisson measures. However, before proving that result, it will be
convenient to alter our description of Poisson measures. For one thing, it should
be clear that, without loss in generality, I may always assume that the jump
distribution assigns no mass to 0. Indeed, if ({0}) = 1, then , = 0 = 0, 0
no matter how and 0 are chosen. If = ({0}) (0, 1), then , = 0 , 0 ,
where 0 = (1 ) and 0 = (1 )1 ( 0 ). In addition, although the
segregation of the rate and jumping distribution provides probabilistic insight,
there is no essential reason for doing so. Thus, nothing is lost if one replaces
, by M , where M is the finite measure , in which case
Z
d
M () = exp
1(,y)RN
1 M (dy) .
I(RN ) = P(RN ).
(3.2.1)
Before turning to the proof of (3.2.1), I need the following simple lemma about
non-vanishing, C-valued functions. In its statement, and elsewhere,
(3.2.2)
log =
X
(1 )m
m
m=1
is the principle branch of logarithm function on the open unit disk around 1 in
the complex plane.
Lemma 3.2.3. Let R (0, ) be given. If f C B(0, R); C \ {0} with
f (0) = 1, then there is a unique `f C B(0; R); C such that `f (0) = 0 and
124
f = e`f . Moreover, if B(0; R), r (0, ), and 1
f ()
f ()
`f () `f () = log
f ()
,
f ()
and therefore
f
()
if 1
f ()
if f is a second element of C B(0; R); C \ {0} with
Finally,
f()
1 f () 12 for all B(0, R), then
f
()
|`f () `f ()| 2 1
f ()
()
f
` () `f () 2 1
f
f ()
1
.
2
f(0) = 1 and if
In particular, if {fn : n 1} C B(0, R); C \ {0} with fn (0) = 1 for all n 1,
and if fn f C B(0; R); C \ {0} uniformly on B(0, R), then f (0) = 1 and
`fn `f uniformly on B(0; R).
2
f rm1
||
`f () = `f
rm1
||
+ log
f ()
rm1
||
f ()
f ()
125
1
Turning to the comparison between `f and `f when 1 ff ()
() 2 for all
f()
f () ,
conclude that `f `f = log ff . From this, the asserted estimate for |`f `f | is
immediate.
(3.2.5)
16
r
4R
,
then |
()| 2n for all B(0, R).
nor vanishes anywhere on B(0, r) and therefore that there are unique `, `
Further, since
= en` , uniqueness requires that ` = n1 `. Next, observe that,
because ` = log
and |1
| 12 on B(0, r), |`| 2 there. Hence, because
1
`
Re` 0, |1 | = 1 e n n2 on B(0, r). Using this in (3.1.7), we have, for
any > 0 and e SN 1 , that
2
1
,
max 1 ()
(3.2.6)
{y : |(e, y)RN | }
ns(r)
s(r) B(0,r)
4
for B(0, R). Finally take
which, by (3.1.4), leads to 1 () R + ns(r)
1
() = ()n to check that this gives the desired
= 4R , and use (3.2.5) and
conclusion.
I now have everything that I need to prove the equality (3.2.1).
Mn M0 (RN ) is defined by
(3.2.8)
Mn () n n1 (RN \ {0})
for BRN ,
126
() = cn1 () , we know first that cn1 never vanishes and then that ` = n`,
where ` is the unique element of C(RN ; C) satisfying `(0) = 0 and cn1 = e` . In
1
1
n ` () 1
1
e` () =
()
()
1
=
exp
n
e
d
()
=
exp
n
Mn
n
inf{|
k ()| : k Z+ and B(0, R)} > 0,
`
1 = e n k , and so, as k ,
k, n1 e n ` uniformly on compacts. Hence,
[
k, n
1
127
1
e 1(,y)RN 1 M (dy),
(*)
` = 1 , m RN 2 , C RN +
RN
and, as I already noted, only the Poisson component M offers much flexibility.
With this in mind, I introduce for each [0, ) the class M (RN ) of Borel
measures M on RN such that
Z
|y|
M (dy) < .
M ({0}) = 0 and
RN 1 + |y|
RN
1(,y)RN
. Thus,
subtracting off the next term in the Taylor expansion of e
choose a Borel measurable function : RN [0, 1] that equals 1 in a neighborhood of 0, and set `r () equal to
1 , m
RN
1
2
, C
RN
+
RN
i
e 1(,y)RN 1 1(y) , y RN Mr (dy).
Because
`r () =
1 , mr
RN
1
2
, C
RN
1(,y)RN
RN
Z
where mr = m
(y)y Mr (dy),
RN
i
1 Mr (dy),
128
1
2 (, C)RN
1(, m)RN
Z
+
RN
i
e 1(,y)RN 1 1(y) , y RN M (dy),
then `r ` uniformly on compacts. Hence, again by Levys Continuity Theorem, we know that, for each M M2 (RN ), the function
`()
(**)
1 ,m RN 12 , C RN
Z h
i
+
e 1(,y)RN 1 1(y) , y RN M (dy)
RN
1(,y)RN
RN
1(y) , y
RN
Mr (dy)
by
Z
RN
2 i
e 1(,y)RN 1 1(y) , y RN + 12 (y) , y RN Mr (dy)
in the expression for `r . However, to re-write this `r in the form given in (*),
one would have to replace C by
Z
C
(y)y y Mr (dy),
RN
A = lim n h, n1 i (0)
n
129
1(,x)RN
1(,x)RN
is not an
. Even though x
e
applying A to x
e
element, for technical reasons, it turns out that the class of s on which it
is easiest to understand A is the Schwartz test function space S (RN ; C) (the
space of smooth C-valued functions that, together with all of their derivatives,
are rapidly decreasing). The basic reason why S (RN ; C) is well suited to our
analysis is that the Fourier transform maps S (RN ; C) onto itself. Further, once
we understand how A acts on S (RN ; C), it is a relatively simple matter to use
that understanding to compute ` ().
Lemma 3.2.9. Let I(RN ) be given. For each r (0, ) there exists a
C(r) < such that |` ()| C(r)(1 + ||2 ) for all RN whenever I(RN )
satisfies |1
()| 12 for B(0, r). Moreover,
A (c1 + ) lim n hc1 + , n1 i c + (0)
n
Z
1
` ()()
d
=
(2)N RN
(3.2.10)
||R
4
.
ns(r)
1
, we obtain sup||R |1 cn1 ()| 12
Hence, if R r, then, by taking = 4R
and therefore sup||R | n1 ` ()| 2 if n satisfies (3.2.5). Finally, observe that
2
there
is an > 0 such that s(t) t for t (0, 1], and therefore that |` ()|
2 1+
64R2
r 2
Z
1
d
n e n ` () 1 ()
(2) n h, n1 i (0) =
RN
Z
Z 1 Z
t
` ()()
d,
d dt
=
e n ` () ` ()()
N
RN
RN
1
130
lim AR = 0
for all D,
x
for R > 0. Notice that, by applying the minimum principle
where R (x) = R
to both 1 and 1, one knows that A1 = 0.
To see that A satisfies both these conditions, first observe that if (0) =
minxRN (x), then h, n1 i (0) 0 for all n Z+ , and therefore that
A 0. Secondly, to check that A is quasi-local, note that it suffices to treat
N
S (RN ; R) and that for such a , c
Thus,
R () = R (R).
Z
N
` R1 ()
d 0,
(2) A R =
RN
is rapidly decreasing.
As I am about to show, these two properties allow us to say a great deal about
A . Before explaining this, first observe that if M M (RN ), then, for every
Borel measurable : RN C,
(3.2.13)
|(y)|
< = L1 (M ; C).
Using (3.2.13), one can easily check that if Cb2 (RN ; C) and S (RN ; R)
equals 1 in a neighborhood of 0, then
y
(y) (0) (y) y, (0) RN
131
2m , mP
(y) = 1 and n (y) = 0 unless m 2 n m + 1. Hence, if
(y) = mZ m (y) for y RN \ {0}, then is a smooth function with values
in [1, 4]; and therefore, for each m Z, the function m given by m (0) = 0
m (y)
for y RN \ {0} is a smooth, [0, 1]-valued function that
and m (y) = (y)
vanishes off of B(0, 2m+1 ) \ B(0, 2m2 ). In addition, for each y RN \ {0},
P
m2
|y| 2m+1 .
mZ m (y) = 1 and m (y) = 0 unless 2
Finally, given n Z+ and C n (RN ; C), define n (x) to be the multilinear
map on (RN )n into C by
!
n
X
n
n
x+
tm m
.
(x) (1 , . . . , n ) =
t1 tn
t1 ==tn =0
m=1
m (y) =
m (y)(y)
otherwise.
132
preserving and has norm Km ; and so, by the Riesz Representation Theorem,
we now know that there is a unique non-negative Borel measure Mm on RN
such that MRm is supported on B(0, 2m+1 ) \ B(0, 2m2 ), Km = Mm (RN ), and
A(m ) = RN (y) Mm (dy) for all S (RN ; R).
P
Now define the non-negative Borel measure M on RN by M = mZ Mm .
Clearly, M ({0}) = 0. In addition, if Cc RN \ {0}; R , then there is an
n Z+ such that m 0 unless |m| n. Thus,
A =
n
X
A(m ) =
m=n
m=n
=
RN
(y) Mm (dy)
RN
n
X
Z
n
X
m (y)(y)
Z
M (dy) =
(y) M (dy),
RN
m=n
and therefore
Z
(3.2.17)
A =
(y) M (dy)
RN
for Cc RN \ {0}; R .
Before taking the next step, observe that, as an application of (3.2.11), if
1 , 2 D, then
1 2 and 1 (0) = 2 (0) = A1 A2 .
(*)
(**)
(y) M (dy) A.
RN
Pn
To check this, apply (*) to n = m=n m and , and use (3.2.17) together
with the Monotone Convergence Theorem to conclude that
Z
Z
n (y) M (dy) = lim An A.
RN
RN
Now let be as in the statement of the lemma, and set R (y) = (R1 y) for
R > 0. By (**) with (y) = |y|2 (y) we know that
Z
RN
133
as R ,
RN
RN
R&0
R&0
Then, by the preceding, (3.2.17) holds for and, after one re-arranges terms,
says that (3.2.16) holds. Thus, the properties of C are all that remain to be
proved. That C is symmetric requires no comment. In addition, from (*), it
is clearly non-negative definite. Finally, to see that it is independent of the
chosen, let 0 be a second choice, note that 0 = in a neighborhood of 0, and
apply (3.2.17).
134
sup
|y|
1 (y)
sup
(y)|y|
< ,
yB(0,1)
/
yB(0,1)\{0}
1
1(,y)RN
N M (dy)
1(y)
,
y)
e
R
2
1
+
||
(3.2.20)
RN
is bounded and tends to 0 as || .
1(,y)RN
1 1(y) , y)RN M (dy)
e
RN
Z
1(,y)RN
1 1 , y)RN M (dy)
e
B(0,r)
Z
Z
+ ||
1 (y) |y| M (dy) +
2 + ||(y)|y| M (y)
B(0,r)
||
2
B(0,r){
|y|2 M (dy) + ||
B(0,r)
Z
+ ||
1 (y) M (dy)
B(0,r)
(y)|y| M (dy) + 2M B(0, r){ .
B(0,r){
135
(3.2.21)
`(m,C,M ) () = 1 m, RN 12 , C RN
Z
+
e 1 (,y)RN 1 1(y) , y RN M (dy)
RN
for any Levy system (m, C, M ) and any Borel measurable : RN [0, 1]
that satisfies (3.2.19). Furthermore, because `(m,C,Mr ) `(m,C,M ) uniformly
on compacts when Mr (dy) = 1[r,) (|y|) M (dy), it is clear that `(m,C,M ) is
continuous.
Theorem 3.2.22 (L
evyKhinchine). For each I(RN ), there is a unique
1
` C(RN ; C) such that ` (0) = 0 and
= e` , and, for each n Z+ , e n ` is
the Fourier transform of the unique n1 M1 (RN ) satisfying = ?n
1 . Next,
n
|y|&0
Z
C = lim n
n
RN
0 (y) y y n1 (dy)
and
m0 = lim n
n
0 (y)2 y y M (dy),
RN
Z
RN
0 (y)y n1 (dy)
for any if 0 Cc RN ; [0, 1] satisfying 0 = 1 in a neighborhood of 0. Finally,
for any Borel measurable : RN [0, 1] satisfying (3.2.19),
m
m0
Z
+
(y) 0 (y) M (dy).
RN
136
RN
RN
1 R (y) e (y) M (dy),
RN
and observe that the last term is dominated by M B(0, R){ 0.
So far we know that, for each I(RN ), there is a Levy system (m , C, M )
such that ` () = `(m ,C,M ) . Moreover, in the preliminary discussion at the
beginning of this subsection, it was shown that, for each Levy system (m, C, M ),
there exists a I(RN ) for which `(m,C,M ) = ` .
Finally, let 0 be as in the statement of this theorem. Given I(RN ), let
0
m RN , C Hom(RN ; RN ), and M M2 (RN ) be associated with A as in
0
(3.2.16) of Lemma 3.2.14 when = 0 . As we have just seen, ` = `(m
.
0
,C ,M )
1
RN
RN
RN
and that
m0 = lim n
n
Z
RN
0 (y) n1 (dy).
0
0 (y) (y) , y RN M (dy)
`(m,C,M ) () = `(m,C,M ) () + 1
RN
R
137
e(,y)RN (dy) = e` (
1)
for all CN .
RN
Hint: The first part is completely elementary complex analysis. To handle the
second part, begin by arguing that it is enough to treat the cases when either
M = 0 or C = 0. The case M = 0 is trivial, and the case when C = 0 can be
further reduced to the one in which = M for an M M0 (RN ) with compact
P
m
support in RN \ {0}. Finally, use the representation M = e m=0 m! ?m to
complete the computation in this case.
RN
2 lim t2 ` (t)
Z
(y)y M (dy)
RN
2
and
, m = 1 lim t1 ` (t) + t2 , C RN
t
Z
e 1(,y)RN 1 M (dy).
` () = 12 , C RN + 1 , m RN +
RN
1
` () = , C +
2
Z
RN
cos , y
RN
1 M (dy).
Exercise 3.2.25. Given I(R), show
that (, 0) = 0 if and only if
C = 0, M M1 (R), M (, 0) = 0, and (cf. the preceding exercise)
m 0. The following are steps that you might follow.
138
r
(i) To prove the if assertion,
set M (dy) = 1[r,) (y) M (dy) for r > 0, and
r
show that m ? M r (, 0] = 0 for
all r > 0 and m ? M = as r & 0.
Conclude from these that (, 0) = 0.
(ii) Now assume that (, 0) = 0. To see that C = 0, show that if > 0,
then 0,2 ? (, 0) > 0 for any M1 (R).
n
(iii) Continuing (ii), show that (, 0) n1 (, 0) , and conclude first
that n1 (, 0) = 0 for all n Z+ and then that
Z
M (, 0) = 0 and m
(y)y M (dy).
RN
B(s, t)
1(0,) (x)xs+t1 ex dx,
(s)(t)
where
Z
B(s, t)
s1 (1 )t1 d
(0,1)
(ii) As a consequence of (i), we know that the t s are infinitely divisible. Show
that their LevyKhinchine representation is
#
" Z
1 y
y dy
.
1 e
bt () = exp t
e
y
(0,)
Exercise 3.2.27. Given a M1 (RN ) for which there exists a strictly increasing sequence {nm : m 1} Z+ and a sequence { n1 : m 1} M1 (RN )
m
139
N
and
fixed points of T . That is, F (RN ) =
let F (RN ) denote the set of non-trivial
M1 (R ) \ {0 } : = T . If F (RN ) and 2n denotes the distrin
n
bution of x
2 x under , then = ?2
2n for all n. Hence, by the result in
Exercises 3.2.27, I(RN ), and so F (RN ) I(RN ) for all (0, ). In this
section, I will study the Levy systems associated with elements of F (RN ).
3.3.1. General Results. Knowing that F (RN ) I(RN ), we can phrase the
N
condition = T in terms of the associated Levy systems.
Namely, N F (R )
1
N
(3.3.1)
(y) T M (dy) = 2
(2 y) M (dy)
RN
RN
1
)m =
(y) (2 y) y M (dy).
(1 2
( C = 0,
F (R ) {0 }
RN
1
m +
(y) (2 y) y T M (dy).
mT = 2
RN
140
2
Hence, F (RN ) {0 } if and only if M = T M , C = 21 C , and, for
any satisfying (3.2.19),
1
1
(1 2
)m
1
(y) (2 y) y M (dy).
=
RN
M = T M = M B(0, 2 ) \ B(0, 2
n+1
1
) = 2n M B(0, 1) \ B(0, 2 ) .
1
From this we see that M B(0, 1) \ B(0, 2 ) > 0 unless M = 0 and that
P
the M -integral of |y| over B(0, 1) is bounded below by 21 n=0 2n(1 ) and
P
12
, y
1
1
RN
M (dy)
2 <|y|1
Z
1(,y)RN
1 1[0,1] (|y|) , y
RN
M (dy)
RN
for some M
T
>
M (RN ) \ M (RN ) satisfying M = T M . If (0, 1),
1(,y)RN
1 M (dy)
RN
for some M
T
N
M
(R
)
\ M (RN ) satisfying M = T M . Finally,
>
1 m,
RN
+
RN
e 1(,y)RN 1 1 1[0,1] (|y|) , y RN M (dy)
T
N
N
141
Proof: The first assertion requires no comment. When (0, 2), the if
1
assertions can be proved by checking that, in each case, ` () = 2` (2 ).
When [1, 2), the only if assertion follows immediately from Lemma 3.3.2
with = 1B(0,1) , and when (0, 1), it follows from that lemma combined
with the observation that
Z
Z
1
1
y M (dy).
y M (dy) =
M = T M = 1 2
1
{2 <|y|1}
B(0,1)
3.3.2. -Stable Laws. The most studied elements of F (RN ) are the stable laws: those I(RN )\{0 } such that ` (t) = t ` () for all t (0, ),
not just for t = 2. Equivalently, if M1 (RN ) is -stable if and only if
I(RN ) \ {0 } and, for all non-negative, Borel measurable functions ,
Z
Z
1
(y) t (dy) =
(t y) (dy), t (0, ),
RN
RN
where bt () = et` () . Thus, there are no -stable laws if > 2, and is 2-stable
if and only if = 0,C for some C 6= 0. To examine the -stable laws when
(0, 2), I will need the computations contained in the following lemmas.
Lemma 3.3.4. Assume that M M2 (RN ) and that (0, 2), and define the
finite Borel measure on SN 1 by
Z
2 |y|
1
y
|y| e
M (dy)
|y|
h, i =
(2 ) RN \{0}
RN
(y) M (dy) =
RN
(r)
SN 1
(0,)
dr
r1+
!
(d)
for all Cc RN \ {0}; R .
Proof: The if assertion is obvious. In addition, the only
if assertion
y
will follow once I prove it for s such that (y) = 1 |y| 2 (|y|), where
1 C SN 1 ; [0, ) and 2 Cc (0, ); R . Given 1 C SN 1 ; [0, ) ,
determine the Borel measure on (0, ) by
Z
y
2 (|y|)|y|2 M (dy)
h2 , i =
1 |y|
RN \{0}
142
for 2 Cc (0, ); R . Then (3.3.5) implies that
Z
e
tr
(dr) = t
(0,)
er (dr) = t2 (2 )h, i
(0,)
t (0, ),
(0,)
uniqueness of the Laplace transform (cf. Exercise 1.2.12) implies that (dr) =
h1 , ir1 dr, and therefore that
Z
RN \{0}
y
1 |y|
Z
M (dy) =
(0,)
2 (r)
(dr) = h1 , i
r2
Z
1 (r)
(0,)
dr
r1+
.
Lemma 3.3.6. Let I(RN ). Then is 2-stable if and only if = 0,C for
some symmetric, non-negative definite C 6= 0; is -stable for some (0, 1)
if and only if there is a finite, non-negative Borel measure 6= 0 on SN 1 such
that
!
Z
Z
dr
1(,r) N
R
1 1+ (d);
` () =
e
r
SN 1
(0,)
1 , m RN
Z
Z
+
dr
1(,r) N
R
1 11[0,1] (r) , r RN 2
e
r
(0,)
SN 1
!
(d);
and is -stable for some (1, 2) if and only if there is a finite, non-negative,
Borel measure 6= 0 on SN 1 such that ` () equals
1
1
Z
,
SN 1
Z
+
SN 1
RN
(d)
dr
1(,r) N
R
1 11[0,1] (r) , r RN 1+
e
r
(0,)
!
(d).
143
Proof: The sufficiency part of each case is easy to check directly or as a consequence of Theorem 3.3.3. To prove the necessity, first check that if is -stable
and therefore ` (t) = t ` (), then M must have the scaling property in (3.3.5)
and therefore have the form described in Lemma 3.3.4. Second, when M has
this form, simply check that in each case the result in Theorem 3.3.3 translates
into the result here.
In the following, C+ denotes the open upper half-space { C : Im() > 0}
in C, and C+ denotes its closure { C : Im() 0}. In addition, given C
, where arg is 0 if = 0 and is the
and (0, 2), we take || e 1arg
1r
r1+
(0,)
(1
dr =
for C+ .
In particular,
Z
a
(0,)
and
(2)
(1)
cos
2
if (1, 2)
cos r 1
dr =
(1) cos
2
r1+
2
Z
b
(0,)
(1 )
sin r
sin
dr =
2
r1+
if (0, 1)
if = 1
if (0, 1).
Proof: Let f () denote the integral on the left-hand side of the first equation.
Clearly f is continuous on C+ and analytic on C+ . In addition, f () = f (1)
for (0, ), and Re f (1) < 0. Hence, there exist c > 0 and 0, 2 such
f ( 1) =
Z
(0,)
1
er 1
dr =
1+
Z
(0,)
r er dr =
(1 )
.
and =
Hence, c = (1)
2 .
When (0, 1), the values of a and b follow immediately from the evaluation of f (1). When (1, 2), one can find the value of a by first observing
that
Z
Z
cos r 1
cos(r) 1
dr for (0, ),
dr
=
1+
r1+
r
(0,)
(0,)
144
(0,)
cos r 1
dr =
r1+
Z
(0,)
sin r
dr = b1 .
r
(2 ) cos
2
= .
&1
2
1
a1 = lim a = lim
&1
(d).
` () = (1)1(0,1) ()
1
SN 1
and
` () =
1 , m RN 1
Z
,
SN 1
RN
log ,
RN
(d),
1arg for C.
Proof: When (0, 1), the conclusion is a simple application of the corresponding results in Lemmas 3.3.6 and 3.3.7. When (1, 2), one has to
massage the corresponding expression in Lemma 3.3.6. Specifically, begin with
the observation that
Z
i dr
h
1
+
e 1r 1 11[0,1] (r)r 1+
r
1
(0,)
!
Z
i dr
h
1sgn()
1sgn()r
1 1sgn()1[0,1] (r)r 1+ +
= ||
e
1
r
(0,)
, N g sgn(, )RN (d),
R
145
i dr
1
1 11[0,1] (r)r 1+
g (1) =
e
1
r
(0,)
dr
1
.
sin r 1[0,1] (r)r 1+
= a 1
1
r
(0,)
Z
1r
Next use integration by parts over the intervals (0, 1] and [1, ) to check that
Z
sin r 1[0,1] (r)r
(0,)
a1
Hence, since
dr
1
1
+
=
1+
1
r
Z
(0,)
a1
1
cos r 1
.
+
dr =
1
r
(2)
sin
= (1)
2 ,
g (1) =
(2 )
e 2 ,
( 1)
and therefore
(2 )
sgn(x, )RN , RN =
( 1)
(, )RN
.
1
.
Thus, all that we need to do is replace the in Theorem 3.3.8 by (1)
Turning to the case = 1, note that, because of the mean zero condition on
,
Z
SN 1
1(,)RN r
(0,)
11[0,1] (r)r ,
i dr
RN
r2
!
(d)
!
i dr
h
= lim
e 1(,)RN r 1 1+ (d)
%1 SN 1
r
(0,)
Z
(, )RN
(1 )
(d)
= lim
%1
1
SN 1
Z
1
, RN , RN (d)
= 1 lim
%1 1 SN 1
Z
, RN log , RN (d),
= 1
Z
SN 1
146
Corollary 3.3.9. For any (0, 2], is a symmetric and -stable law if
and only if there is a finite, non-negative, symmetric, Borel measure 6= 0 on
SN 1 such that
Z
, N (d).
` () =
R
SN 1
we see that
` () =
1
, C RN =
2
Z
SN 1
, N 2 (d).
R
SN 1
(,
Z
(, )RN
1(0,1) ()
(d)
(d) = (1)
csc
RN
2
1
SN 1
1(0,1) ()
SN 1
(, )RN
Z
, N (d).
() = cos
R
2
SN 1
, RN (d)
R
SN 1
{:(,)RN <0}
RN
log ,
RN
+ ,
RN
log ,
RN
(d)
{:(,)RN <0}
1
= 1
Z
,
SN 1
RN
log ,
RN
(d),
147
|| , 0 (d 0 ) SN 1 (d) = t|| ,
` () =
SN 1
SN 1
t = (SN 1 )
SN 1
SN 1
e, N SN 1 (d)
R
S () =
X
mZ
2m
1 (2 y) (dy),
RN
and show that this map is one-to-one and onto the set of M M2 (RN ) satisfying
(cf. (3.3.1)) M = T M . Conclude that, for each (0, 2), F (RN ) contains
lots of elements!
Exercise 3.3.11. Here are a few further properties of elements of F (RN ).
(i) Show that there is F (RN ) such that {y : (e, y)RN < 0} = 0 for some
e SN 1 if and only if (0, 1).
Hint: Reduce to the case when N = 1, and look at Exercise 3.2.24.
N 1
(ii) If F1 (RN ), show that,
, {y : (e, y)RN < 0} >
for every e S
0 {y : (e, y)RN > 0} > 0.
(iii) If (1, 2), show that for each > 0 there is a F (R) such that
(, ] = 0.
Exercise 3.3.12. Take N = 1. This exercise is about an important class of
stable laws known as one-sided stable laws: stable laws that are supported
on [0, ).
(i) Show that there exists a one-sided -stable law only if (0, 1).
148
(ii)If
(0, 1), show that is a one-sided -stable law if and only if ` () =
(iii) Let (0, 1), and use t to denote the one-sided -stable law with `t () =
t 1 . Show that
Z
e
1y
t (dy)
= exp t
[0,)
for C with Im() 0.
ey t (dy) = et ,
0.
[0,)
t =
0,2 I t2 (d ),
[0,)
where t2 is the one-sided 2 -stable law in part (iii) of the preceding exercise.
This representation is an example of subordination, and, as we will see in
Exercise 3.3.17, can be used to good effect.
(3.3.15)
dt
dR
for t (0, ),
t
e h
,
t ( ) d = e
0
1
h1 (t ).
and that h
t ( ) t
[0, ),
149
and
2 e(
a2
+b2 )
d =
2 e2ab
a
for all (a, b) (0, ) , and conclude from the second of these that
1
1(0,) ( )e 4
.
h1 ( ) =
3
4 2
1
2
(3.3.16)
Exercise 3.3.17. In this exercise we will discuss the densities of the symmetric
stable laws
t for (0, 2) (cf. Exercise 3.3.13). Once again, we know that
each
admits
a smooth density with respect to Lebesgue measure RN on RN .
t
Further, it is clear that this density is symmetric and that
1
1 d
d
t
1
(t x)
(x) = t
dRN
dRN
for t (0, ).
(i) Referring to Exercise 3.3.14 and using Exercise 3.3.12, show that
Z
|x|2
1
d
1
N
2 e 4 h 2 ( ) d.
(x) =
(3.3.18)
N
dRN
(4) 2 0
1
(ii) Because we have an explicit expression for h12 , we can use (3.3.18) to get an
d11
dRN
N
2tN
d1t
(t, x) (0, ) RN ,
(x) = tR (x)
N +1 ,
dRN
N (t2 + |x|2 ) 2
1
N +1
is the surface area of SN in RN +1 . The function
where N = 2 2 N2+1
R
1 is the density for what probabilists call the Cauchy distribution. For
N
general N s, (t, x) (0, ) RN 7 tR (x) is what analysts call the Poisson
kernel for the right half-space in RN +1 . That is (cf. Exercise 10.2.22), if f
Cb (RN ; R), then
Z
N
(t, x)
uf (t, x) =
f (x y) tR (y) dy
(3.3.19)
RN
150
ZZ
RN RN
|yx|
Z
f (x)f (y) dxdy =
RN
|f()|2
1 (d)
for f L1 (RN ; C). This can be used to prove that k k determines a Hilbert
norm on Cc (RN ; C).
Chapter 4
L
evy Processes
Although analysis was the engine that drove the proofs in Chapter 3, probability
theory can do a lot to explain the meaning of the conclusions drawn there.
Specifically, in this chapter I will develop an intuitively appealing way of thinking
about a random
variable
X whose distribution is infinitely divisible, an X for
1 (,X)
P
N
R
equals
which E e
exp
1 , m)
1
2
, C
Z
+
RN
RN
h
i
1 (,y)RN
1 1 1[0,1] |y| , y RN M (dy)
e
152
4 Levy Processes
For reasons that should be obvious now, an evolution {Z(t) : t [0, )} of the
sort described above used to be called a process with independent, homogeneous increments, the term process being the common one for continuous
families of random variables and the adjective homogeneous referring to the
fact that the distribution of the increment Z(t) Z(s) for 0 s < t depends
only on the length t s of the time interval over which it is taken. In more
recent times, a process with independent, homogeneous increments is said to be
a L
evy process, and so I will adopt this more modern terminology.
Assuming that the family {Z(t) : t [0, )} exists, notice that we already
know what the joint distribution of {Z(tk ) : k N} must be for any choice of
0 = t0 < < tk < . Indeed, Z(0) = 0 and
P Z(tk ) Z(tk1 ) k , 1 k K =
K
Y
tk tk1 (k )
k+1
k=1
j=1
k=1
153
+
n
n
Z and m 2 } and b cn = b cn 2
= max{m2
: m N and m <
2n }. In addition, for 0 a < b,
(4.1.1)
154
4 Levy Processes
k=1
and a = t0 < t1 < < tK = b
is the total variation of [a, b].
Lemma 4.1.3.
r > 0, the set
If D(RN ), then, for each t > 0, kk[0,t] < , and for each
J(t, r, ) { (0, t] : |( ) ( )| r}
is finite subset of (0, t]. In addition, there exists an n(t, r, ) N such that, for
every n n(t, r, ) and m Z+ (0, 2n ],
m2n t (m 1)2n t r = m2n = + for some J(t, r, ).
t n
Finally,
kk[0,t] = lim max |(m2n t)| : m N [0, 2n ]
n
and
var[0,t] () = lim
m2n t (m 1)2n t .
mZ+ [0,2n ]
Proof: Begin by noting that it suffices to treat the case when t = 1, since one
can always reduce to this case by replacing with
(t ).
If kk[0,1] were infinite, then we could find a sequence {n : n 1} [0, 1] such
that |(n )| , and clearly, without loss in generality, we could choose this
sequence so that n [0, 1] and {n : n 1} is either strictly decreasing or
strictly increasing. But, in the first case this would contradict right-continuity,
and in the second it would contradict the existence of left limits. Thus, kk[0,1]
must be finite.
Essentially the same reasoning shows that J(1, r, ) is finite. If it were not,
then we could find a sequence {n : n 0} of distinct points in (0, 1] such
that |(n ) (n )| r, and again we could choose them so that they were
either strictly increasing or strictly decreasing. If they were strictly increasing,
then n % for some (0, 1] and, for each n Z+ , there would exist a
n0 (n1 , n ) such that |(n ) (n0 )| 2r , which would contradict the
existence of a left limit at . Similarly, right-continuity would be violated if the
n s were decreasing.
Although it has the same flavor, the proof of the existence of n(1, r, ) is a
bit trickier. Let 0 < 1 < K 1 be the elements of J(1, r, ). If n(1, r, )
155
n 0m2
The assertion about var[0,1] () is proved in essentially the same manner, although now the monotonicity comes from the triangle inequality and the first
equality in the preceding must be replaced by |(t)(t)| = limn |(btc+
n )
(btcn )|.
I next give D(RN ) the topological structure corresponding to uniform convergence on compacts, or, equivalently, the topological structure for which
(, 0 )
2n
n=1
k 0 k[0,n]
1 + k 0 k[0,n]
and
)| 0
sup |k ( ) (
(0,t]
156
4 Levy Processes
In particular, if t
j(t, ) is a jump function and t > 0, then, either j(t, ) =
j(t, ) or j(t, ) j(t, ) = y for some y RN \ {0}.
Proof: It should be obvious that if J and {y : J} satisfy the stated
conditions, then the t
j(t, ) given by (4.1.5) is a jump function. To go in the
other direction, suppose that t
j(t, ) is a jump function, and, for each r > 0,
set fr (t) = j t, RN \ B(0, r) . Because t
fr (t) is a non-decreasing, piecewise
constant, right-continuous function satisfying fr (0) = 0 and fr (t) fr (t)
157
{0, 1} for each t > 0, it has at most a countable number of discontinuities, and
at most fr (t) of them can occur in any interval (0, t]. Furthermore, if fr has a
discontinuity at , then j , B(0, r) j , B(0, r) = 0, and so the measure
= j(, ) j( , ) is a {0, 1}-valued probability measure on RN that assigns
mass 0 to B(0, r). Hence (cf. Exercise 4.1.15) fr ( ) 6= fr ( ) = = y
for some y RN \ B(0, r). From these considerations, it follows easily that if
J(r) = { (0, ) : fr ( ) 6= fr ( )} and if, for each J(r), y RN \B(0, r)
is chosen so that j(, ) j( , ) = y , then J(r) (0, t] is finite for all t > 0
and
X
j t, B(0, r){ =
1[,) (t)y .
J(r)
1 ( ) ( ) ,
J(t,)
158
4 Levy Processes
j(t, ,R)-integrable for all (t, ) [0, ) D(RN ) and, for each D(RN ),
t
(y) j(t, dy, ) is right-continuous and piecewise constant. Thus, it
RN
suffices to show that, for each t (0, ),
Z
(*)
b c+
n b cn
J(1,r,)
2n
X
(y) j(1, dy, ) = lim
m2n (m 1)2n .
n
m=1
N
j (t, ), and j(t, , ) = j R \ (t, ) = j(t, )j (t, ) for any D(RN )
whose jump function is t
j(t, ). Finally, suppose that {m : m 0}
N
D(R ) and a non-decreasing Ssequence {m : m 0} BRN satisfy the
159
Proof: Throughout the proof I will use the notation introduced in Lemma
4.1.4.
we know that
Assuming that 0
/ ,
X
j (t, ) =
1[,) (t)1 (y )y ,
J
where, for each t > 0, there are only finitely many non-vanishing terms. At the
same time,
X
X
(t) =
1[,) (t)1 (y )y and j(t, , ) =
1[,) (t)1RN \ (y )y
J
if j(t, , ) = j(t, ). Thus, all that remains is to prove the final assertion. To
this end, suppose that j(t, , ) 6= j(t, , ). Since k m k[0,t] 0, there
exists an m such that m (t) 6= m (t) and therefore that j(t, ) j(t, ) = y
for some y m . Since this means that n (t) n (t) = y for all n m, it
follows that (t) (t) = y and therefore that j(t, , ) j(t, , ) = y =
j(t, ) j(t, ). Conversely, suppose that j(t, ) 6= j(t, ) and choose m so
that j(t, ) j(t, ) = y for some y m . Then n (t) n (t) = y for
all n m. Thus, since this means that (t) (t) = y, we again have that
j(t, , ) j(t, , ) = y = j(t, ) j(t, ). After combining these, we see
that j(t, , ) j(t, , ) = j(t, ) j(t, ) for all t > 0, from which it is an
easy step to j(t, ) = j(t, , ) for all t 0.
Exercises for 4.1
Exercise 4.1.9. When dealing with uncountable collections of random variables, it is important to understand what functions are measurable with respect
to them. To be precise, suppose that {Xi : i I} is a non-empty collection of
functions on some space
with values in some measurable space (E, B), and let
F = {Xi : i I} be the -algebra over which they generate. Show that
+
A F if and only if there is a sequence {im : m Z+ } I and an B Z
such that
A = : Xi1 (), . . . , Xim (), . . . .
More generally, if f : R, show that f is F-measurable if and only if there
+
+
is a sequence {im : m Z+ } I and a F Z -measurable F : E Z R such
that
f () = F Xi1 (), . . . , Xim (), . . . .
Hint: Make use of Exercise 1.1.12.
Exercise 4.1.10. Let e SN 1 , set t ( ) = 1[t,) ( )e for t [0, 1], and show
that kt s k[0,1] = 1 for all s 6= t from [0, 1]. Conclude from this that D(RN )
is not separable in the topology of uniform convergence on compacts.
160
4 Levy Processes
RN
Hint: This is most easily seen from the representation of j(t, , ) in terms of
point masses at the discontinuities of . One can use this representation to show
that, for each r > 0,
Z
X
var[0,t] ()
( ) ( ) =
|y| j(t, dy, ), (t, ) [0, ).
|y|r
J(t,r,)
Exercise
4.1.14. If is an absolutely pure jump path, show that var[0,t] () =
R
|y| j(t, dy, ) and therefore that has locally bounded variation. Conversely,
if C(RN ) has locally bounded variation,
show that is an absolutely pure
R
jump path if and only if var[0,t] () = |y| j(t, dy, ). Finally,
if D(RN )
R
and j(t, , ) M1 (RN ) for all t 0, set c (t) (t) y j(t, dy, ) and
show that c C(RN ) and
Z
var[0,t] () = var[0,t] (c ) + |y| j(t, dy, ).
Exercise 4.1.15. If M1 (RN ), show that () {0, 1} for all BRN if
and only if = y for some y RN .
Hint: Begin by showing that it suffices to handle the case when N = 1. Next,
assuming that N = 1, show that is compactly supported, let m be its mean
value, and show that = m .
4.2 Discontinuous L
evy Processes
In this section I will construct the Levy processes corresponding to those
I(RN ) with no Gaussian component. That is,
1 , m RN
() = exp
(4.2.1)
Z h
i
1(,y)
1 1 1[0,1] (|y|) , y RN M (dy) .
+
e
RN
161
Because they are the building blocks out of which all such processes are made,
I will treat separately the case when is a Poisson measure M for some M
M0 (RN ) and will call the corresponding Levy process the Poisson process
associated with M .
4.2.1. The Simple Poisson Process. I begin with the case when
P N 1= 1
1
and M = 1 , for which M is the
simple
Poisson
measure
e
m=0 m! m
for all n Z+ and (t1 , . . . , tn ) Rn . Without loss in generality, I may and will
assume that m () > 0 for all m Z+ and . InPaddition, by The Strong
for t [0, ).
n=1
Clearly t
N (t, ) is a non-decreasing, right-continuous, piecewise constant, Nvalued path that starts at 0 and, whenever it jumps, jumps by +1. In particular,
N ( , ) D(RN ), N (t, ) N (t, ) {0, 1} for all t (0, ), and (cf. (4.1.6))
j t, , N ( , ) = N (t, )1 .
Because P N (t) = n = P Tn t < Tn+1 , P N (t) = 0 = P(1 > t) = et ,
and, when n 1 (below || denotes the Lebesgue measure of BRn )
Z
Z
Pn+1
P N (t) = n , = e m=1 m d1 dn+1 = et |B|,
A
Pn
Pn+1
where A = (1 , . . . , n+1 ) (0, )n+1 :
t
<
and
m
m
m=1
m=1
Pn
B = (1 , . . . , n ) (0, )n :
t
.
By
making
the
change
of
m
m=1
Pm
variables sm = j=1 j and remarking that the associated Jacobian is 1, one
sees that |B| = |C|, where C = (s1 , . . . , sn ) Rn : 0 < s1 < < sn t .
n
Since |C| = tn! , we have shown that the P-distribution of N (t) is the Poisson
measure t1 . In particular, 1 is the P-distribution of N (1).
I now want to use the same sort of calculation to show that {N (t) : t [0, )}
is a simple Poisson process, that is, a Levy process for 1 . (See Exercise
4.2.18 for another, perhaps preferable, approach.)
162
4 Levy Processes
Lemma 4.2.3. For any (s, t) [0, ), the P-distribution of the increment
N (s + t) N (s) is t1 . In addition, for any K Z+ and 0 = t0 < t1 < < tK ,
the increments {N (tk ) N (tk1 ) : 1 k K} are independent.
Proof: What I have to show is that, for all K Z+ , 0 = n0 nK , and
0 = t0 < t1 < < tK ,
P N (tk ) N (tk1 ) = nk nk1 , 1 k K
K
Y
e(tk tk1 ) (tk tk1 )nk nk1
,
(nk nk1 )!
k=1
and, since the case when nK = 0 is trivial, I will assume that nK 1. In fact,
because neither side is changed if one removes those nk s for which nk = nk1 ,
I will assume that 0 = n0 < < nK .
Begin by noting that
P N (tk ) = nk , 0 k K = P Tnk tk < Tnk+1 , 1 k K
Z
Z
PnK +1
= e m=1 m d1 dnK +1 = etK |B|,
A
where
(
A=
nK +1
(1 , . . . , nK +1 ) (0, )
nk
X
m tk <
m=1
nX
k +1
)
m , 1 k K
m=1
and
(
B=
nk
X
)
m tk : 1 k K
m=1
Pm
To compute |B|, make the change of variables sm = j=1 j to see that |B| =
|C|, where
C = (s1 , . . . , snK ) RnK : tk1 < snk1 +1 < < snk tk for 1 k K .
Finally, for 1 k K, set
Ck = (snk1 +1 , . . . , snk ) Rnk nk1 : tk1 < snk1 +1 < < snk tk ,
163
|Ck | =
kS
kS
(4.2.4)
ZM (t, ) =
Xn (),
1nN (t,)
(4.2.5)
j t, , ZM ( , ) =
X
1nN (t,)
Xn () =
n=1
I now want to check that {ZM (t) : t 0} is a Levy process for M and, as
such, deserves to be called a Poisson process associated with M : the one with
. That is, I want to show that, for
rate M (RN ) and jump distribution M M
(RN )
each 0 = t0 < t1 < tK , the random variables ZM (tk ) ZM (tk1 ), 1 k K,
164
4 Levy Processes
are mutually independent and that the kth one has distribution (tk tk1 )M .
Equivalently, I need to check that, for any 1 , . . . , K RN ,
!#
"
K
K
X
Y
P
1
k , ZM (tk ) ZM (tk1 ) RN
=
[
E exp
k M (k ),
k=1
k=1
K
X
EP exp 1
k , Xm
RN
K
K
n n
Y
Y
ek k k k1
(k )nk nk1 =
[
k M (k ).
(nk nk1 )!
nK n1 0 k=1
k=1
Any stochastic process {Z(t) : t 0} with right-continuous, piecewise constant paths and the same distribution as the process {ZM (t) : t 0} just
constructed is called a Poisson process associated with M .
Here is a beautiful and important procedure for transforming one Poisson
process into another.
0
165
from which the first assertion follows immediately from the same computation
with which I just showed that {ZM (t) : t 0} is a Poisson process associated
with M .
To prove the second assertion, I begin by observing that it suffices to treat
the case when I = {1, 2}. To see this, suppose that we know the result in that
case, and let n > 2 and a set {i1 , . . . , in } of distinct elements from I be given.
By taking F1 = (Fi1 , . . . , Fin1 ), F2 = Fin , and applying the assumed result, we
would have that {ZFin (t) : t 0} is independent of ZFi1 (t), . . . , ZFin1 (t) :
t 0 . Hence,
F proceeding by induction, we would be able to show that the
processes {Z im (t) : t 0} : 1 m n are independent.
Now assume that I = {1, 2}. What I have to check is that, for any K Z+ ,
0 = t0 < t1 < < tK , and {(k1 , k2 ) : 1 k K} RN1 RN2 ,
"
K h
X
P
1
k1 , ZF1 (tk ) ZF1 (tk1 RN1
E exp
k=1
i
+ k2 , ZF2 (tk ) ZF2 (tk1 ) RN2
"
P
=E
exp
#
!#
K
X
k1 , ZF1 (tk )
F1
Z (tk1 ) RN1
k=1
"
P
exp
K
X
!#
k2 , ZF2 (tk )
F2
Z (tk1 ) RN2
k=1
For this purpose, take F : RN RN1 +N2 to be given by F (y) = F1 (y), F2 (y) ,
and set k = (k1 , k2 ). Then the first expression in the preceding equals
"
#
K
X
F
F
P
1
k , Z (tk ) Z (tk1 RN1 +N2
E exp
k=1
K
Y
h
i
EP exp 1 k , ZF (tk tk1 ) RN1 +N2
,
k=1
166
4 Levy Processes
Z (t) =
The result in Theorem 4.2.8 says that the jumps of a Poisson process can be
decomposed into a family of mutually independent, simple Poisson processes run
at rates determined by the M -measure of the jump sizes. The next result can
be thought of as a re-assembly procedure that complements this decomposition
result.
Theorem 4.2.9. If {Zk (t) : t 0} : 1 k K are mutually independent
Poisson processes associated with {Mk : 1 k K} M0 (RN ), then
(
)
K
K
X
X
Z(t)
Zk (t) : t 0 is a Poisson process associated with M
Mk .
k=1
k=1
Next, suppose that the Mk s are mutually singular in the sense that, for each
k, there exists a k BRN \{0} with the properties that k ` = and
Mk k { = 0 = M` (k ) for ` 6= k. Then, for P-almost every ,
K
X
j t, , Z( , ) =
j t, , Zk ( , ) ,
t [0, ).
k=1
Equivalently, for P-almost every and all t 0, there is at most one k such
that Zk (t, ) 6= Zk (t, ).
Proof: Clearly, {Z(t) : t 0} starts at 0 and has independent increments. In
addition, for any s, t [0, ) and RN ,
K
i
i Y
h
h
EP e 1(,Zk (s+t)Zk (s))RN
EP e 1(,Z(s+t)Z(s))RN =
k=1
K
Y
Z
exp t
1(,y)RN
1 Mk (dy)
RN
k=1
Z
= exp t
RN
1(,y)RN
1 M (dy) .
167
Now assume that the Mk s are as in the final part of the statement, and choose
k s accordingly. Without loss in generality, I will assume that RN \ {0} =
SK
k=1 k . Also, because the assertion depends only on the joint distribution of
the processes involved, I may and will assume that
Z
Zk (t) =
y j t, dy, Z for 1 k K,
k
PK
since then Z(t) = k=1 Zk (t), and, by Theorem 4.2.8, the Zk s are independent
and the kth one is a Poisson process associated with Mk . But
with this choice,
another application of Theorem 4.2.8 shows that j t, , Zk = j t, k , Z ,
and therefore
K
X
j t, , Z =
j t, , Zk , t [0, ).
k=1
Because the paths of a Poisson process are piecewise constant, they certainly
have finite variation on each compact time interval. The first part of the next
lemma provides an estimate of that variation. The estimate in the second part
will be used in 4.2.5.
Lemma 4.2.10.
M0 (RN ), then
|y| M (dy).
RN
In addition, if
R
RN
2
N 2t
2
[0,t] R N t EP |Z(t)|
= 2
P kZk
R
R2
R
RN
y M (dy), then
|y|2 M (dy).
RN
Z
|y| (dy) = t
RN
|y| M (dy).
RN
N lim sup P
max n e, Z m2 t RN > N R .
n eSN 1
1m2
168
4 Levy Processes
n t) Z((`
e, Z(m2
t) RN =
e, Z(`2
1)2n t) RN ,
1`m
.
P
max n e, Z m2 t RN > N R N R2 EP e, Z(t)
RN
1m2
R
M (t)|2 = t N |y|2 M (dy). To this
Thus, we will be done once I check
that
EP |Z
R
R
2
2 2
E
X m = E
Xm + |m|2 EP N (t)2
1mN (t)
1mN (t)
1 |2 + |m|2 2 t2 + t = tEP |X1 |2 + 2 t2 |m|2 .
= tEP |X
R
Thus, since EP |X1 |2 = RN |y|2 M (dy), the desired equality follows.
4.2.3. Poisson Jump Processes. Rather than attempting to construct more
general Levy processes directly, I will first construct their jump processes and
then construct them out of their jumps. With this idea in mind, given a probability space (, F, P), I will say that (t, )
j(t, , ) is a Poisson jump process
associated with M M (RN ) if, for each , t
j(t, , ) is a jump func+
tion, and for each n Z and collection
Sn {1, . . . , n } of mutually disjoint Borel
subsets sets of RN satisfying 0
/ i=1 i , {j(t, i ) : t 0} : 1 i n are
mutually independent, simple Poisson processes, the ith of which is run at rate
M (i ) for each 1 i n. By starting with simple functions and passing to
limits, one can easily check that
Z
(t, ) [0, ) 7 (y) j(t, dy, ) [0, ]
169
t 0.
k=0
j(t, , )
jk (t, , )
k=1
170
4 Levy Processes
0
4.2.4. L
evy Processes with Bounded Variation. Although the contents
of the previous section provide the machinery with which to construct a Levy
process for any with Fourier transform given by (4.2.1), for reasons made clear
in the next lemma, I will treat the special case when M M1 (RN ) here and will
deal with M M2 (RN ) \ M1 (RN ) in the following subsection.
Lemma 4.2.13. Let {j(t, ) : t 0}R be a Poisson jump process associated
with M M2 (RN ), and set V (t, ) = |y| j(t, dy, ). Then V (t) < almost
surely or V (t) = almost surely for all t > 0, depending on whether M is or is
not in M1 (RN ). (See Exercise 4.3.11 to see that the same conclusion holds for
any M M (RN ).)
R
Proof: Since |y|>1 |y| j(t, dy, ) < for all (t, ) [0, ) , the question
R
is entirely about the finiteness of V0 (t, ) B(0,1) |y| j(t, dy, ). To study this
k+1 ) \ B(0, 2k ), F (y) = |y|1
k
Ak (y), and Vk (t, ) =
Rquestion, set Ak = B(0, 2
|y|
j(t,
dy,
)
for
k
1.
Clearly,
the
processes
{V
(t)
: t 0} : k Z+
k
Ak
are mutually independent. In addition, for each k, t
Vk (t) is non-decreasing
and, by the second part of Lemma 4.2.12, {Vk (t) : t 0} is a Poisson process
associated with M Fk . Thus, by Lemma 4.2.10,
ak EP Vk (t) = t
|y| M (dy) and bk Var Vk (t) = t
Ak
|y|2 M (dy).
Ak
"Z
P
B(0,1)
E Vk (t) =
|y| M (dy),
B(0,1)
k=1
which finishes the case when M M1 (RN ). When M M2 (RN ) \ M1 (RN ), set
Vk (t) = Vk (t) tak . Then, for each t > 0, {Vk (t) : k Z+ } is a sequence of
mutually independent random variables with mean value 0. Furthermore,
X
k=1
X
Var Vk (t) = t
bk = t
k=1
171
P
Hence, by Theorem
1.4.2,
k=1 Vk (t) converges P-almost
P
P surely. But, when
M
/ M1 (RN ), k=1 ak = , and so, for each t > 0, k=1 Vk (t) must diverge
P-almost surely.
Before stating the main result of the subsection, I want to introduce the notion
of a generalized Poisson measure. Namely, if M M1 (RN ) \ M0 (RN ) and
M is the element of I(RN ) whose Fourier transform is given by
Z
1(,y)RN
1 M (dy) ,
exp
e
R
or, equivalently,
d
M is given by (4.2.1) with m = B(0,1) y M (dy), then I will
call M the generalized Poisson measure for M . Similarly, if {Z(t) : t 0}
is a Levy process for a generalized Poisson measure M , I will say that it is a
generalized Poisson process associated with M .
172
4 Levy Processes
r<|y|1
then
!
P
sup Z( ) m Z(r) ( )
[0,t]
N 2t
2
|y|2 M (dy).
B(0,r)
and define
(r)
y j(t, dy, ),
(t, ) =
(t, ) [0, ) ,
|y|>r
for r (0, 1]. By Theorem 4.2.14, we know that {Z(r) (t) : t 0} is a Levy
process for (r) , where
!
Z
i
h
(r) () = exp
d
e 1(,y)RN 1 1 1[0,1] (y) , y RN M (dy) .
|y|>r
Furthermore, by the second part of Lemma 4.2.10, we know that, for 0 < r <
r0 1,
Z
N 2t
0
|y|2 M (dy).
(*)
P kZ(r ) Z(r) k[0,t] 2
0
r<|y|r
then
P sup kZ(rn ) Z(rm ) k[0,t]
n>m
1
m
nm
N 2t
(n + 1)4 2n ,
n=m
173
We now know that there is a P-null set N such that, for any
/ N , there
exists a Z( , ) D(RN ) to which {Z(rm ) ( , ) : n 0} converges uniformly
on compacts. Thus, if we take Z(t, ) = 0 for (t, ) [0, ) N , then it is an
easy matter to check that {Z(t) : t 0} is a Levy process for the I(RN )
whose Fourier transform is given by (4.2.1) with m = 0. In addition, since, by
Theorem 4.1.8, we know that t
j(t, , ) is the jump function for t
Z(t, )
when
/ N , it is clear that {j(t, , Z) : t 0} is a Poisson jump process
associated with M . Finally, to prove the estimate in the concluding assertion,
observe that, for
/ N , the path t
Z(r) (t, ) used in our construction
coincides with the path described in the statement. Thus, the desired estimate
is an easy consequence of the one in (*) above.
Corollary 4.2.16. Let I(RN ) with Fourier transform given by (4.2.1),
and suppose that {Z(t) : t 0} is a Levy process for . Then, depending
on whether or not M M1 (RN ), either P-almost all or P-almost none of the
paths t
Z(t) has locally bounded variation. Moreover, if M M1 (RN ), then,
P-almost surely,
!
Z
y M (dy)
is an absolutely pure jump path.
t
Z(t) t m
B(0,1)
y M (dy).
Z (t) =
y j(t, dy, Z), M (dy) = 1 (y)M (dy), and m =
B(0,1)
1 , m m RN
exp
Z
h
i
1(,y)RN
1 11[0,1] |y| , y RN M (dy) ,
+
e
RN \
and {Z(t) Z (t) : t 0} is independent of {j t, , Z : t 0}, and therefore
of {Z (t) : t 0} as well.
174
4 Levy Processes
Z (t) Z (t) =
1RN \ (y)y j(t, dy) t
y M (dy).
|y|>r
r<|y|1
(r)
Z
i
h
(B(0,r)){
Z
= exp
RN \
1(,y)RN
11[0,1]
!
i
|y| , y RN M (dy) .
Hence, it follows that {Z(t) Z (t) : t 0} is a Levy process for the specified
element of I(RN ).
Exercises for 4.2
Exercise 4.2.18. Here is another proof that the process {N (t) : t 0} in
4.2.1 has independent, homogeneous increments. Refer to the notation used
there.
(i) Given n Z+ and measurable functions f : [0, )n+1 7 [0, ) and g :
[0, )n R, show that
EP f (1 , . . . , n+1 ), n+1 > g(1 , . . . , n )
+
= EP eg(1 ,... ,n ) f 1 , . . . , n , n+1 + g(1 , . . . , n )+ .
(ii) Let K Z+ , 0 = n0 n1 nK , and 0 = t0 t1 < < tK = s
be given, and set A = {N (tk ) = nk , 1 k K}. Show that A = B
{nK +1 > s TnK }, where B {1 , . . . , nK } , and apply (i) to see that
P(A) = EP e(sTnK ) , B .
(iii) Let n Z+ and t > 0 be given, and set h() = P(Tn1 > ). Referring to
(ii) and again using (i), show that
P A {N (s + t) N (s) < n} = EP h(t + s TnK +1 ), B {nK +1 > s TnK }
= EP e(sTnK ) h(t nK +1 ), B = EP h(t nK +1 ) EP e(sTnK ) , B
= P N (t) < n P(A).
175
Hint: First use The Strong Law of Large Numbers to show that limn
1 P-almost surely. Second, use
2
N (t) N (n)
P N (1) n 2 2
P
sup
n
t
ntn+1
to see that
N (n)
n
N (t) N (btc)
= 0 P-almost surely.
lim
t
btc
t
Exercise 4.2.20. Assume that I(R) has its Fourier transform given by
(4.2.1), and let {Z(t) : t 0} be a Levy process for . Using Exercise 3.2.25,
show that t R Z(t) is non-decreasing if and only if M M1 (R), M (, 0) =
0, and m [1,1] y M (dy).
Exercise 4.2.21. Let {j(t, ) : t 0} be a Poisson jump process associated
with some M M (RN ), and suppose that F : RN R is a Borel measurable,
M -integrable function that vanishes at 0.
(i) Let N be the set of for which there is a t > 0 such that F is not
j t, , )-integrable, and show that P(N ) = 0.
(ii) Show that (cf. Lemma 4.2.6) M F M1 (R) and that, in fact,
Z
Z
F
|y| M (dy) = |F (y)| M (dy) < .
Next, define
F
R
Z (t, ) =
if
/N
if N ,
and show that {Z F (t) : t 0} is a (possibly generalized) Poisson process associated with M F .
(iii) Show that
Z F (t)
=
t
t
lim
Z
F (y) M (dy)
P-almost surely.
Hint: Begin by using Lemma 4.2.10 to show that it suffices to handle F s that
vanish in a neighborhood of 0. When F vanishes in a neighborhood of 0, use
Lemma 4.2.12 to see that {Z F (t) : t 0} is a Poisson process associated with
M F . Finally, use the representation of a Poisson process in terms of a simple
Poisson process and independent random variables, and apply The Strong Law
of Large Numbers together with the result in Exercise 4.2.19.
176
4 Levy Processes
Exercise 4.2.22. Let {Z(t) : t 0} be a Levy process for the I(RN ) with
|y|2 M (dy) +
B(0,1)
2
R
1<|y| R
|y|> R
1<|y| R
R){ .
Then,
P kZk[0,t] R P kZ1 k[0,t]
R
2
+ P kZ2 k[0,t]
R
2
+ P kZ3 k[0,t] 6= 0 .
Apply the estimates in Lemma 4.2.10 to control the first two terms on the right,
and use
N
P j t, RN \ B(0, R), Z 6= 0 = 1 etM (R \B(0, R))
177
Exercise 4.2.24. Let M M2 (RN ) be given, and assume that there exists a
decreasing sequence {rn : n 0} (0, 1] with rn & 0 such that
Z
m = lim
y M (dy)
n
rn <|y|1
exists. Let I(RN ) have Fourier transform given by (4.2.1) with this m and
M . If {Z(t) : t 0} is a Levy process for , set
Z
Zn (t, ) =
y j t, dy, Z( , ) ,
|y|>rn
and show that limn P kZ Zn k[0,t] = 0 for all t 0 and > 0. Thus,
after passing to a subsequence {nm : m 0} if necessary, one sees that, P-almost
surely,
Z
Z(t, ) = lim
y j t, dy, Z( , ) ,
m
|y|>rnm
where the convergence is uniform on finite time intervals. In particular, one can
say that P-almost all the paths t
Z(t, ) are conditionally pure jump.
4.3 Brownian Motion, the Gaussian L
evy Process
What remains of the program in this chapter is the construction of a Levy
process for the standard, normal distribution 0,I , the infinitely divisible law
||2
exp
1 , m RN 12 , C RN
Z h
i
+
e 1(,y)RN 1 1 1[0,1] (|y|) , y RN M (dy) .
RN
Because one of its earliest applications was as a mathematical model for the
motion of Brownian particles, 1 such a Levy process for 0,1 is called a Brownian motion. In recognition of its provenance, I will adopt this terminology
and will use the notation {B(t) : t 0} instead of {Z0,I (t) : t 0}.
1
R. Brown, an eighteenth century English botanist, observed the motion of pollen particles
in a dilute gas. His observations were interpreted by A. Einstein as evidence for the kinetic
theory of gases. In his famous 1905 paper, Einstein took the first steps in a program, eventually
completed by N. Wiener in 1923, to give a mathematical model of what Brown had seen.
178
4 Levy Processes
Before getting into the details, it may be helpful to think a little about what
sorts of properties we should expect the paths t
B(t) will possess. For this
N
purpose, set Mn = n 12 + 12 , and recall that we have seen already
n
n
that Mn =0,I . Since a Poisson process associated with Mn has nothing but
1
jumps of size n 2 , if one believes that the Levy process for 0,I should be, in
some sense, the limit of such Poisson processes, then it is reasonable to guess
that its paths will have jumps of size 0. That is, they will be continuous.
Although the prediction that the paths of {B(t) : t 0} will be continuous
is correct, it turns out that, because it is based on the Central Limit Theorem,
the heuristic reasoning just given does not lead to the easiest construction. The
problem is that The Central Limit Theorem gives convergence of distributions,
not random variables, and therefore one should not expect the paths, as opposed
to their distributions, of the approximating Poisson processes to converge. For
this reason, it is easier to avoid The Central Limit Theorem and work with
Gaussian random variables from the start, and that is what I will do here. The
Central Limit approach is the content of 9.3.
B (m + 1)2n B(m2n )
n
Xm,n+1 2 2 +1 Bn+1 (2m 1)2n1 Bn (2m 1)2n1
!
B m2n + B (m 1)2n
n
+1
n1
B (2m 1)2
= 22
2
h
n
= 2 2 B (2m 1)2n1 B (m 1)2n
i
B m2n B (2m 1)2n1
,
179
`=1
m=`
RN
Y
Y
Y
Y
0
0
0
0
1 m
Xm
1 m Xm
P
1 m
Xm
P
1 m Xm
P
E
e
=E
e
e
E
e
m=1
m=1
m=1
m=1
0
for any choice of {m : 1 m n} {m
: 1 m n} R. But the
expectation value on the left is equal to
!2
n
X
1
0
0
m Xm + m
Xm
exp EP
2
m=1
!2
!2
n
n
X
X
1
1
0
0
m
Xm
m Xm EP
= exp EP
2
2
m=1
m=1
#
# " n
" n
Y
Y
0
0
e 1 m Xm ,
= EP
e 1 m Xm EP
m=1
m=1
180
4 Levy Processes
0
0
since EP [Xm Xm
0 ] = 0 for all 1 m, m n.
Armed with Lemma 4.3.1, we can now check that {Xm,n : (m, n) Z+ N}
is independent. Indeed, since, for all (m, n) Z+ N and RN , , Xm,n RN
a member of the Gaussian family G(B), all that we have to do is check that, for
each (m, n) Z+ N, ` N, and (, ) (RN )2 ,
EP , Xm,n+1 RN , B(`2n ) RN = 0.
`
+
(m
1)
`
= 0.
= 2n , RN m 12 `
2
4.3.2. L
evys Construction of Brownian Motion. Levys idea was to
invert the reasoning given in the preceding subsection. That is, start with a
family {Xm,n : (m, n) Z+ N} of independent N (0, I)-random variables.
Next, define {Bn (t) : t 0} inductively
Bn (t) is linear on each
P so that t
n
n
interval [(m 1)2 , m2 ], B0 (m) = 1`m X`,0 , m N, Bn+1 (m2n ) =
Bn (m2n ) for m N, and
n
Bn+1 (2m 1)2n = Bn (2m 1)2n1 + 2 2 1 Xm,n+1 for m Z+ .
181
and
Bn+1 (2m 1)2n1 Bn+1 (m 1)2n
Bn (m2n ) Bn (m 1)2n
n
+ 2 2 1 Xm,n+1 .
=
2
Using these expressions and the induction hypothesis, it is easy to check the
required equation.
Second, and more challenging, we must show that, P-almost surely, these
processes are converging uniformly on compact time intervals. For this purpose,
consider the difference t
Bn+1 (t) Bn (t). Since this path is linear on each
interval [m2n1 , (m + 1)2n1 ],
Bn+1 (m2n1 ) Bn (m2n1 )
max Bn+1 (t) Bn (t) =
max
t[0,2L ]
1m2L+n+1
= 2 2 1
max
1m2L+n
L+n
2X
|Xm,n+1 | 2 2 1
14
|Xm,n+1 |4 .
m=1
L+n
2X
n
EP kBn+1 Bn k[0,2L ] 2 2 1
14
nL4
EP |Xm,n+1 |4 = 2 4 CN ,
m=1
1
where CN EP |X1,0 |4 4 < .
Starting from the preceding, it is an easy matter to show that there is a
measurable B : [0, ) RN such that B(0) = 0, B( , ) C [0, ); RN )
for each , and kBn Bk[0,t] 0 both P-almost surely and in L1 (P; R)
n
n
for every t [0, ). Furthermore, since
) P-almost surely
B(m2 )n= Bn (m2 n
2
for all (m, n) N , it is clear that B (m + 1)2
B(m2 ) : m 0 is a
sequence of independent N (0, 2n I)-random variables for all n N. Hence, by
continuity, it follows that {B(t) : t 0} is a Brownian motion.
We have now completed the task described in the introduction to this section.
However, before moving on, it is only proper to recognize that, clever as his
method is, Levy was not the first to construct a Brownian motion. Instead, it
182
4 Levy Processes
was N. Wiener who was the first. In fact, his famous2 1923 article Differential
Space in J. Math. Phys. #2 contains three different approaches.
4.3.3. L
evys Construction in Context. There are elements of Levys
construction that admit interesting generalizations, perhaps the most important
of which is Kolmogorovs Continuity Criterion.
Theorem 4.3.2. Suppose that {X(t) : t [0, T ]} is a family of random
variables taking values in a Banach space B, and assume that, for some p
[1, ), C < , and r (0, 1],
1
1
EP kX(t) X(s)kpB p C|t s| p +r for all s, t [0, T ].
) B is
X(t) = X(t)
P-almost surely for each t [0, T ] and t [0, T ] 7 X(t,
continuous for all . In fact, for each (0, r),
"
P
sup
0s<tT
X(s)k
kX(t)
B
(t s)
!p # p1
5CT p +r
.
(1 2r )(1 2r )
Proof: First note that, by rescaling time, it suffices to treat the case when
T = 1.
Given n 0, set Mn = max1m2n
X(m2n ) X (m 1)2n
B , and
observe that
! p1
2n
X
1
p
X(m2n ) X (m 1)2n
C2rn .
EP Mnp p EP
B
m=1
Next, let t
Xn (t) be the polygonal path obtained by linearizing t
each interval [(m 1)2n , m2n ], and check that
X(t) on
1)2
X(m2
)
= max n
X (2m 1)2n1
Mn+1 .
1m2
2
t[0,1]
i 12 p
EP sup kX(t)
B
1 2r
t[0,1]
2
Wieners article is remarkable, but I must admit that I have never been convinced that it is
complete. Undoubtedly, my doubts are more a consequence of my own ineptitude than of his.
183
kX(t)
B kX(t) Xn (t)kB + kXn (t) Xn (s)kB + kXn (s) X(s)kB
) Xn ( )kB + 2n (t s)Mn ,
2 sup kX(
[0,1]
kX(t)
B
) Xn ( )kB + 2n 2(1)n Mn .
22(n+1) sup kX(
(t s)
[0,1]
sup
0s<t1
X(s)k
kX(t)
B
(t s)
!p # p1
X
2(n+1) 2rn
n rn
+
2
2
2
1 2r
n=0
5C
.
(1 2r )(1 2r )
Corollary 4.3.3.
If {B(t) : t 0} is an RN -valued Brownian motion, then,
B(t) is P-almost surely H
older continuous of order .
for each 0, 21 , t
In fact, for each T (0, ),
P
sup
0s<tT
|B(t) B(s)|
< .
(t s)
|B(t) B(s)|
<
P s [0, ) lim
t&s
(t s)
= 0.
184
4 Levy Processes
|B(t) B(s)|
<
P s [0, 1) lim
t&s
(t s)
= 0.
|B(t) B(s)|
<
t&s
(t s)
s [0, 1) lim
[
\
[
n L1
[
\
B
m+`+1
n
m+`
n
M
n
M =1 =1 n= m=0 `=0
`+1
n
`
n
0 ` < L = 0.
M
n ,
But
P B
`+1
n
`
n
L
M
= 0, n1 I B 0, n
M
n ,
0`<L
N
2
(2)
Z
1
B(0,M n 2 )
|y|2
2
!L
dy
Cn( 2 )N L .
tI
m,n
m,n
n t[0,T ]
m=1
= 0 P-almost surely,
H.S.
m1
. In particular, P-almost no Brownian path
where m,n B B m
n
n B
has locally bounded variation.
185
N
Proof: Let
(e1 , . . . , eN ) be an orthonormal basis for R , and set Xi (k, n) =
ei , k,n B RN . Then, what we have to show is that
m
X
m
(*)
lim sup
Xi (k, n)Xj (k, n) i,j = 0 P-almost surely.
n 1mnT
n
k=1
To this end, note that, for each n Z+ and 1 i N , {Xi (k, n) : k 1} are
mutually independent N (0, n1 )-random variables. Hence, for each 1 i N ,
{Xi (k, n)2 n1 : k 1} are independent random variables with mean value
0 and variance 3n2 , and therefore, by (1.4.22) and the second inequality in
(1.3.2),
m
4
X
E max
Xi (k, n)2 n1
1mnT
k=1
4
12M4 T 2
X
2
1
,
4E
Xi (k, n) n
n2
1knT
lim
m
n
n
n
m=1
4.3.5. General L
evy Processes. Our original reason for constructing Brownian motion was to complete the program of constructing all the Levy processes.
In this subsection, I will do that.
Throughout this subsection, I(RN ) has Fourier transform
1 , m RN 12 , C RN
exp
(4.3.6)
Z h
i
1(,y)RN
1 11[0,1] (|y|) , y RN M (dy) ,
+
e
i
exp
e 1(,y)RN 1 11[0,1] (|y|) , y RN M (dy) .
Thus, = 0 ? 1 .
186
4 Levy Processes
r<|y|1
|y|2 M (dy).
B(0,r)
187
also have locally bounded variation P-almost surely, and, since {Z0 (t) : t 0}
1
has the same distribution as {tm + C 2 B(t) : t 0}, Theorem 4.3.5 shows that
this is possible only if C = 0.
In terms of this decomposition, Corollary 4.3.8 is saying that the local part of
A governs the continuous part of {Z(t) : t 0} and that the non-local part
governs the discontinuous part.
Exercises for 4.3
Exercise 4.3.10. This exercise deals with a few elementary facts about Brownian motion.
(i) Let {X(t) : t 0} be an RN -valued stochastic process satisfying X(0, ) = 0
N
and X( , ) C(RN ) for all , and showthat {X(t)
: t 0} is an R -valued
Brownian motion if and only if the span of , X(t) RN : t 0 & RN } is a
Gaussian family with the property that, for all t, t0 [0, ) and , 0 RN ,
h
i
EP , X(t) RN 0 , X(t0 ) RN = t t0 (, 0 )RN .
(ii) Assuming that {B(t) : t 0} is an RN -valued Brownian motion, show that
{OB(t) : t 0} is also an RN -valued Brownian motion for any orthogonal transformation O on RN . That is, the distribution of Brownian motion is invariant
under rotation. (See Theorem 8.3.14 for a significant generalization.)
(iii) Assuming that {B(t) : t 0} is an RN -valued Brownian motion, show that
1
{ 2 B(t) : t 0} is also an RN -Brownian motion for each (0, ). This is
called the Brownian scaling invariance property.
Exercise 4.3.11. This exercise introduces the time inversion invariance property of Brownian motion.
(i) Suppose that
{B(t) : t 0} is an RN -valued Brownian motion, and set
1
X(t) = tB t for t > 0. As an application of (i) in Exercise 4.3.10, show that
{X(t) : t > 0} has the same distribution as {B(t) : t > 0}, and conclude from
) = 0 and, for
this that limt&0 X(t) = 0 P-almost surely. In particular, if B(0,
t (0, ),
tB 1t , when lim 0 B 1 , = 0
B(t, ) =
0
otherwise,
N
188
4 Levy Processes
(ii) As a consequence of part (i), prove the Brownian Strong Law of Large Numbers: limt t1 B(t) = 0.
Exercise 4.3.12. Let {B(t) : t 0} be an RN -valued Brownian motion.
(i) As an application of Theorem 1.4.13, show that, for any e SN 1 and
T (0, ),
!
P
sup e, B(t) R
t[0,T ]
R2
2P e, B(T ) RN R 2e 2T ,
(4.3.13)
0`<m
and conclude that P {B(1) a} n () 12 P B (1) a for all n N.
Now let n and then & 0 to arrive at P B (1) a 2P B(1) a .
(iii) By combining the preceding with Brownian scaling invariance, arrive at
r
(4.3.14)
P B (t) a = 2P B(t) a =
1
at 2
x2
2
dx.
This beautiful result, which is sometimes called the reflection principle for
Brownian motion, seems to have appeared first in L. Bacheliers now famous
1900 thesis, where he used what is now called Brownian motion to model
price fluctuations on the Paris Bourse. More information about the reflection
principle can be found in 8.6.3.
189
lim q
B(t)
= 1 = lim q
t&0
2t log(2) t
2t log(2) t1
B(t)
P-almost surely.
Begin by checking that the second equality follows from the first applied to the
is just the Law of the Iterated Logarithm for standard normal random variables.
Thus, all that remains is to show that
B(n)
B(t)
= 0 P-almost surely,
q
lim
sup q
n t[n,n+1]
2t log(2) t
2n log(2) n
which can be checked by a combination of the Strong Law for Brownian motion,
the estimate in (4.3.13), and the easy half of the BorelCantelli Lemma.
Exercise 4.3.16. Given a stochastic process {X(t) : t 0}, the stochastic
process {X(t)
: t 0} is said to be a modification of {X(t) : t 0} if, for
190
4 Levy Processes
for some p [1, ), r > 0, and C < . Show that there exists a family
{X(x)
: x [0, T ] } with the properties that x [0, T ] 7 X(x,
) B
kX(y)
X(x)k
B
K(, r, )CT p +r .
EP sup
|y x|
x,y[0,T ]
y6=x
Hint: First rescale time to reduce to the case when T = 1. Now assume that
2
T = 1. Given n N, take Sn to be the set of pairs (m, m0 ) {0, . . . , 2n }N
P
such that m0i mi for all 1 i and i=1 (m0i mi ) = 1, note that Sn has
no more than 2(n+1) elements, set
Mn = max kX(m0 2n ) X(m2n )kB : (m, m0 ) Sn ,
1
191
involving Brownian paths. For simplicity, I will restrict my attention to the onedimensional case. Thus, let {B(t) : t 0} be an R-valued Brownian motion.
Because t
B(t) is continuous, one knows that any function : [0, 1] R of
bounded variation is RiemannStieltjes integrable on [0, 1] with respect to B
[0, 1]. However, as the following shows, almost no Brownian path is Riemann
Stieltjes with respect to itself. Namely, using Theorem 4.3.5, show that P-almost
surely,
lim
n
X
m=1
n
X
lim
whereas
lim
m1
n
m
n
m
n
m1
n
B(1)2 1
,
2
m
n
m1
n
B(1)2 + 1
,
2
= B(1)2 .
m=1
n
X
2m1
2n
m
n
m1
n
m=1
lim
sup
&0 0<ts
|B(t) B(s)|
= 2
L()
= 1,
192
4 Levy Processes
p
where L() log 1 . Notice that, on the one hand, this result is in the direction that one should expect: we know (cf. Theorem 4.3.4) that Brownian paths
are almost never H
older continuous of any order greater than 12 . On the other
hand, the Brownian Law of the Iterated Logarithm (cf. Exercise q
4.3.15) might
make one guess that their true modulus of continuity ought to be log(2) 1 ,
not L(). However, that guess is wrong because it fails to take into account the
difference between a question about what is true at a single time as opposed to
what is true simultaneously for all times. The purpose of this exercise is to show
how the considerations in 4.3.3 can be used to get a statement that is related
to but far less refined than Levys. The result to be proved here says only that
|B(t) B(s)|
K =1
(4.3.23)
P lim sup
&0 0<ts
L()
P lim
sup
&0 0<ts
s,t[0,1]
|B(t) B(s)|
K = 1
L()
X
n=0
sup
2n1 ts2n
|B(t) B(s)|
>K
L(2n1 )
!
< .
2kB Bn k[0,1]
Mn
|B(t) B(s)|
for 2n1 t s 2n .
+
n1
L(2n )
L(2n1 )
L(2
)
P
P p
(iii) Set C = n=0 (n + 1)2n , show that n=m L(2n )1 CL(2m )1 for
all m 0, and, arguing as in the proof of Theorem 4.3.2, conclude that, for any
R > 0,
X
P kB Bn k[0,1] R
P Mm+1 C 1 RL(2m1 )1 .
m=n
Chapter 5
Conditioning and Martingales
Up to this point I have been dealing with random variables that are either
themselves mutually independent or are built out of other random variables
that are. For this reason, it has not been necessary for me to make explicit
use of the concept of conditioning, although, as we will see shortly, this concept
has been lurking silently in the background. In this chapter I will first give the
modern formulation of conditional expectations and then provide an example of
the way in which conditional expectations can be used.
Let (, F, P) be a probability space, and suppose that A F is a set having
positive P-measure. For reasons that are most easily understood when is finite
and P is uniform, the ratio
P(B|A)
P(A B)
,
P(A)
B F,
194
probability of B when A does not occur, she has provided you with incomplete
information about B. Thus, before you are satisfied, you should demand to
know also what is the conditional probability of B given that A does not occur.
Of course, this second piece of information is relevant only if A is not certain,
in which case P(A) < 1 and therefore P B A{ is well defined. More generally,
suppose that P = {A1 , . . . , AN } (N here may be either finite or countably
infinite) is a partition of into elements of F having positive P-measure. Then,
in order to have complete information about the probability
of B F relative to
P, one has to know the entire list of the numbers P B An , 1 n N . Next,
suppose that one attempts to describe this list in a way that does not depend
explicitly on the positivity of the numbers P(An ). For this purpose, consider the
function
N
X
7 f ()
P B An 1An ().
n=1
Clearly, f is not only F-measurable, it is measurable with respect to the algebra (P) over generated by P. In particular (because the only (P)measurable set of P-measure 0 is empty), f is uniquely determined by its Pintegrals EP [f, A] over sets A (P). Moreover, because, for each B (P)
and n, either An B or B An = , we have that
N
X
EP f, A =
P B An =
n=1
P An B = P A B .
{n:An B}
Hence, the function f is uniquely determined by the properties that it is (P)measurable and that
EP f, A = P A B
for every A (P).
The beauty of this description is that it makes perfectly good sense even if
some of the An s have P-measure 0, except in that case the description does not
determine f pointwise but merely up to a (P)-measurable P-null set (i.e., a
set of P-measure 0), which is the very least one should expect to pay for dividing
by 0.
5.1.1. Kolmogorovs Definition. With the preceding discussion in mind,
one ought to find the following formulation reasonable. Namely, given a sub-algebra F and a (, ]-valued random variable X whose negative
part X (X 0) is P-integrable, I will say that the random variable X
is a conditional expectation of X given if X is (, ]-valued and
is P-integrable, and
-measurable, X
(5.1.1)
EP X , A = EP X, A for every A .
Obviously, having made this definition, my first duty is to show that such an
X always exists and to discover in what sense it is uniquely determined. The
latter problem is dealt with in the following lemma.
5.1 Conditioning
195
for every A ,
1
M P (A),
which, because EP X, A is a finite number, is impossible. At the same time, if
P(B) > 0, then
= EP X, B EP Y, B M < ,
which is also impossible.
196
5.1 Conditioning
197
Once one has established the existence and uniqueness of conditional expectations, there is a long list of more or less obvious properties that one can easily
verify. The following theorem contains some of the more important items that
ought to appear on such a list.
Theorem 5.1.4. Let be a sub--algebra of F. If X is a P-integrable random
variable and C is a -system (cf. Exercise 1.1.12) that generates , then
Y = EP X (a.s., P)
Y L1 (, , P; R) and EP Y, A = EP X, A for A C {}.
Moreover, if X is any (, ]-valued random variable that satisfies EP [X ]
< , then each of the following relations holds P-almost surely:
(5.1.5)
P
E X EP |X| ;
(5.1.6)
h i
EP X T = EP EP X T
and
EP Y X = Y EP X
(5.1.7)
(5.1.8)
(5.1.9) E
lim Xn lim EP Xn
198
Proof: To prove the first assertion, note that the set of A for which
EP [X, A] = EP [Y, A] is (cf. Exercise 1.1.12) a -system that contains C and
therefore . Next, clearly (5.1.5) is just an application of Lemma 5.1.2, while
(5.1.6) and the two equations that follow it are all expressions of uniqueness. As
for the next equation, one can first reduce to the case when X and Y are both
non-negative. Then one can use uniqueness to check it when Y is the indicator
function of an element of , use linearity to extend it to simple -measurable
functions, and complete the job by taking monotone limits. Finally, (5.1.8) is an
immediate application of the Monotone Convergence Theorem, whereas (5.1.9)
comes from the conjunction of
inf Xn inf EP Xn
nm
nm
P
E
with (5.1.8).
(a.s., P),
m Z+ ,
It probably will have occurred to most readers that the properties discussed
in Theorem 5.1.4 give strong evidence that, for fixed , X 7 EP [X|]()
behaves like an integral (in the sense of Daniell) and therefore ought to be
expressible in terms of integration with respect to a probability measure P .
Indeed, if one could actually talk about X 7 EP [X|]() for a fixed (as opposed
to P-almost every) , then there is no doubt that such a P would have to
exist. Thus, it is reasonable to ask whether there are circumstances in which one
can gain sufficient control over all the P-null sets involved to really make sense
out of X 7 EP [X|]() for fixed . Of course, when is generated by a
countable partition P, we already know what to do. Namely, when A P,
we can take
(
0
if P(A) = 0
P
E [X|]() =
EP [X, A]
if P(A) > 0.
P(A)
Even when does not arise in this way, one can often find a satisfactory representation of conditional expectations as expectations. A quite general statement
of this sort is the content of Theorem 9.2.1 in Chapter 9.
5.1.2. Some Extensions. For various applications it is convenient to have
two extensions of the basic theory developed in 5.1.1. Specifically, as I will now
show, the theory is not restricted to probability (or even finite) measures and
can be applied to random variables that take their values in a separable Banach
space. Thus, from now
on, will be an arbitrary (non-negative) measure on
(, F) and E, kkE will be a separable Banach space; and I begin by reviewing
a few elementary facts about -integration for E-valued random variables.2
2 The integration that I outline below is what functional analysts call the Bochner integral for
Banach spacevalued functions. There is a more subtle and intricate theory due to Pettis, but
Bochners theory seems adequate for most probabilistic considerations.
5.1 Conditioning
199
E [X] =
X() (d)
x (X = x).
xE\{0}
Notice that another description of E [X] is as the unique element of E with the
property that
E [X], x = E hX, x i for all x E
(I use E to denote the dual of E and hx, x i to denote the action of x E on
x E), and therefore that the mapping taking -simple X to E [X] is linear.
Next, observe that 7 kX()kE R is F-measurable if X : E is
F-measurable. In particular, for F-measurable X : E, I will set
(
1
if p [1, )
E kXkpE p
kXkLp (;E) =
if p =
inf M : kXkE > M = 0
and will write X Lp (; E) when kXkLp (;E) < . Also, I will say the X :
E is -integrable if X L1 (; E); and I will say that X is locally
-integrable if 1A X is -integrable for every A F with (A) < .
The definition of -integration for an E-valued X is completed in the following
lemma.
Lemma 5.1.10. For
each -integrable
X : E there is a unique element
E [X] E satisfying EP [X], x = EP [hX, x ] for all x E . In particular,
the mapping X L1 (; E) 7 E [X] E is linear and satisfies
E [X]
E kXkE .
(5.1.11)
E
Finally, if X Lp (; E), where p [1, ), then there is a sequence {Xn : n 1}
of E-valued, -simple functions with the property that kXn XkLp (;E) 0.
Proof: Clearly uniqueness, linearity, and (5.1.11) all follow immediately from
the given characterization of E [X]. Thus, all that remains is to prove existence
and the final approximation assertion. In fact, once the approximation assertion
is proved, then existence will follow immediately from the observation that, by
(5.1.11), E [X] can be taken equal to limn E [Xn ] if kX Xn kL1 (;E) 0.
To prove the approximation assertion, I begin with the case when is finite
and M = sup kX()kE < . Next, choose a dense sequence {x` : ` 1} in
E, set A0,n = , and let
o
n
for (`, n) Z+ Z+ .
A`,n = : kX() x` kE < n1
200
Xn () = x`
`1
[
Ak,n
k=0
and Xn () = 0 when
/
SLn
1
M + (E)
.
n
In order to handle the general case, let X Lp (; E) and n Z+ be given.
We can then find an rn (0, 1] with the property that
Z
1
,
kX()kpE (d)
(2n)p
kX Xn kLp (;E)
(rn ){
where
o
n
for r (0, 1].
(r) : r kX()kE 1r
Since, for any r (0, 1], rp (r) kXkpLp (;E) , we can apply the preceding to
the restrictions of and X to (rn ) and thereby find a -simple Xn : (rn )
E with the property
! p1
Z
1
p
.
Z
X() (d)
B
to denote the quantity E [1B X]. Also, when discussing the spaces Lp (; E), I
will adopt the usual convention of blurring the distinction between a particular
F-measurable X : E belonging to Lp (; E) and the equivalence class of
those F-measurable Y s that differ from X on a -null set. Thus, with this
convention, k kLp (;E) becomes a bona fide norm (not just a seminorm) on
Lp (; E) with respect to which Lp (; E) becomes a normed vector space. Finally,
by the same procedure with which one proves the Lp (; R) spaces are complete,
one can prove that the spaces Lp (; E) are complete for any separable Banach
space E.
5.1 Conditioning
201
E X , A = E X, A
(5.1.14)
(a.e., ).
Hence, not only does (5.1.13) continue to hold for any A with 1A X
L1 (; E), but also, for each p [1, ], the mapping X Lp (; E) 7 X
Lp (; E) is a linear contraction.
Proof: Clearly, it is only necessary to prove the = part of the first assertion.
Thus, suppose that (X 6= 0) > 0. Then, because E is separable and therefore
(cf. Exercise 5.1.19) E with the weak* topology
is also separable,
there exists
an > 0 and a x E with the property that X, x > 0, from which
it follows (by -finiteness) that there is an A F for which (A) < and
D
E
i
E X, A , x = E X, x , A 6= 0.
I turn next to the uniqueness and other properties of X . But it is obvious that
uniqueness is an immediate consequence of the first assertion and that linearity
follows from uniqueness. As for (5.1.14), notice that if x E and kx kE 1,
then
E X , x , A = E X, x , A E kXkE , A = E kXkE , A
for every A with (A) < . Hence,
at
least when
is a probability
202
has the required properties. In order to handle general X L1 (P; E), I use the
approximation result in Lemma 5.1.10 to find a sequence {Xn : n 1} of simple
functions that tend to X in L1 (P; E). Then, since
(Xn ) (Xm ) = Xn Xm (a.s., P)
and therefore, by (5.1.14),
(Xn ) (Xm )
1
kXn Xm
L1 (P;E) ,
L (P;E)
1
we
exists a -measurable X L (P; E) to which the sequence
know that there
(Xn ) : n 1 converges; and clearly X has the required properties.
Referring to the setting in the second part of Theorem 5.1.12, I will extend
the convention introduced following Theorem 5.1.3 and call the -equivalence
class of X s satisfying (5.1.13) the -conditional expectation of X given
, will use E [X|] to denote this -equivalence class, and will, in general,
ignore the distinction between the equivalence class and a generic representative
of that class. In addition, if X : E is locally -integrable, then, just
as in Theorem 5.1.4, the following are essentially immediate consequences of
uniqueness:
E Y X = Y E X (a.e., ) for Y L (, , ; R),
and
h i
E X T = E E X T
(a.e., )
203
(i) Let L be a closed linear subspace of L2 (P; R), and let L = {X : X L}
be the -algebra over generated by X L. Show that L = L2 , L , P; R if
and only if 1 L and X + L whenever X L.
Hint: To prove the if assertion, let X L be given, and show that
h
i
+
Xn n X 1 1 L for every R and n Z+ .
Conclude that Xn % 1(,) X must be an element of L.
(ii) Let be an orthogonal projection operator on L2 (P; R), set L = Range(),
and let = L , where L is defined as in part (i). Show that X = EP [X|]
(a.s., P) for all X L2 (P; R) if and only if 1 = 1 and
(*)
X Y = (X)(Y )
for all
X, Y L (P; R).
Hint: Assume that 1 = 1 and that (*) holds. Given X L (P; R), use
induction to show that
2
kXknL2n (P) kXkn1
L (P) kXkL (P)
and
n
= X(X)n1
n
for all n Z+ . Conclude that kXkL (P) kXkL (P) and that X
L, n Z+ , for every X L (P; R). Next, using the preceding together with
Weierstrasss Approximation Theorem, show that (X)+ L, first for X
L (P; R) and then for all X L2 (P; R). Finally, apply (i) to arrive at L =
L2 , , P; R .
(iii) To emphasize the point being made here, consider once again a closed
linear subspace L of L2 (P; R), and let L be orthogonal projection onto L.
Given X L2 (P; R), recall that L X is characterized as the unique element of
L for which X L X L, and show that EP [X|L ] is the unique element of
L2 (, L , P; R) with the property that
X EP X L f Y1 , . . . , Yn
for all n Z+ , f Cb Rn ; R , and Y1 , . . . , Yn L. In particular, L X =
EP [X|L ] if and only if X L X is perpendicular not only to all linear functions
of the Y s in L but even to all nonlinear ones.
Exercise 5.1.16. In spite of the preceding, there is a situation in which orthogonal projection coincides with conditioning. Namely, suppose that G is a
closed Gaussian family in L2 (P; R), and let L be a closed, linear subspace of G.
As an application of Lemma 4.3.1, show that, for any X G, the orthogonal
projection L X of X onto L is a conditional expectation value of X given the
-algebra L generated by the elements of L.
204
where the convergence is in L2 ([0, 1]; C). (Also see Exercise 5.2.45.)
Exercise 5.1.18. Let (, F, ) be a measure space and a sub--algebra of
F with the property that is -finite. Next, let E be a separable Hilbert
0
space, p [1, ], X Lp (; E), and Y a -measurable element of Lp (; E) (p0
is the Holder conjugate of p). Show that
h
i
-almost surely.
E Y, X E = Y, E X
E
Next, choose an orthonormal basis {en : n 0} for E, and justify the steps in
X
E Y, en E en , X E
E Y, X E =
1
h
i
E Y, en E E en , X E = E Y, E [X|] E .
Exercise 5.1.19. Let E be a separable Banach space, and show that, for each
R > 0, the closed ball BE (0, R) with the weak* topology is a compact metric
space. Conclude from this that the weak* topology on E is second countable
and therefore separable.
Hint: Choose a countable, dense subset {xn : n 1} in the unit ball BE (0, 1),
and define
(x , y ) =
X
n=1
2n hxn , x y i for x , y BE (0, R).
205
Show that is a metric for the weak* topology on BE (0, R). Next, choose
{xnm : m 1} so that xn1 = x1 and xnm+1 = xn if n is the first n > nm such
that xn is linearly independent of {x1 , . . . , xn1 }. Given a sequence {x` : ` 1}
in BE (0, R), use a diagonalization argument to find a subsequence {x`k : k 1}
such that am = limk hxnm , x`k i exists for each m 1. Now define f on the
PM
PM
span S of {xnm : m 1} so that f (x) = m=1 m am if x = m=1 m xnm ,
note that f (x) = limk hx, x`k i for x S, and conclude that f is linear on
S and satisfies the estimate |f (x)| RkxkE there. Since S is dense in E,
there is a unique extension of f as a bounded linear functional on E satisfying
the same estimate, and so there exists an x BE (0, R) such that hx, x i =
limk hx, x`k i for all x S. Finally, check that this convergence continues to
hold for all x E, and conclude that x`k x in the weak* topology.
Exercise 5.1.20. The purpose of this exercise is to show that Bochners theory
of integration for Banach space functions relies heavily on the assumption that
the Banach space be separable. In particular, the approximation procedure on
which the proof of Lemma 5.1.10 fails in the absence of separability. To see
this, consider the Banach space ` (; R) of uniformly bounded sequences x =
(x0 , . . . , xn , . . . ) RN with kxk` (N;R) = supn0 |xn |. Next, let {Xn : n 0}
be a sequence of mutually independent, {1, 1}-valued, Bernoulli random with
N}
of
E-valued
random
variables
is
Fn :
say that
the
family
{X
n
n N -progressively measurable if Xn is Fn -measurable for each n N.
random variables
is said
Next, a family {Xn : n N} of (, ]-valued
to be
a P-submartingale with respect to Fn : n N if it is Fn :
n N -progressively measurable, EP [Xn ] < , and, for each n N, X
n
|F
]
(a.s.,
P).
It
is
said
to
be
a
P-martingale
with
respect
to
Fn :
EP [Xn+1
n
n N if {Xn : n N} is an Fn : n N -progressively measurable family of
206
For a much more interesting and complete list of examples, the reader might want to consult
J. Neveus Discrete-parameter Martingales, NorthHolland (1975).
207
Theorem 5.2.1 (Doobs Inequality). Assume that Xn , Fn , P is a submartingale. Then, for every N Z+ and (0, ),
1 P
(5.2.2)
P
max Xn E XN , max Xn .
0nN
0nN
sup Xnp
nN
p1
1
p
sup EP Xnp p .
p 1 nN
for n Z+ .
n=0
n=0
N
X
EP XN , An
1
= EP XN , max Xn .
0nN
n=0
Now assume that the Xn s are non-negative. Given (5.2.2), (5.2.3) becomes
an easy application of Exercise 1.4.18.
Doobs inequality is an example of what analysts call a weak-type inequality. To be more precise, it is a weak-type 11 inequality. The terminology derives
from the fact that such an inequality follows immediately from an L1 -norm, or
strong-type 11, inequality between the objects under consideration; but, in general, it is strictly weaker. In order to demonstrate how powerful such a result
can be, I will now apply Doobs Inequality to prove a theorem of Marcinkewitz.
Because it is an argument to which we will return again, the reader would do
well to become comfortable with the line of reasoning that allows one to pass
from a weak-type inequality, like Doobs, to almost sure convergence results.
Corollary 5.2.4. Let X be an R-valued random variable
and p
[1, ). If
X Lp (P; R), then, for any non-decreasing sequence Fn : n N of sub-algebras of F,
"
#
_
P
P
(a.s., P) and in Lp (P; R) as n .
E X Fn E X Fn
0
In particular, if X is
Lp (P; R).
W
0
208
W
Proof: Without loss in generality, assume that F = 0 Fn .
Given X L1 (P; R), set Xn = EP [X|Fn ] for n N. The key to my proof will
be the inequality
1
(5.2.5)
P sup |Xn | EP |X|, sup |Xn | , (0, );
nN
nN
and, since, by (5.1.5), |Xn | EP [|X| |Fn ] (a.s., P), while proving (5.2.5) I may
and will assume that X and all the Xn s are non-negative. But then, by (5.2.2),
1 P
P
sup Xn > E XN , sup Xn >
0nN
0nN
1
= EP X, sup Xn >
0nN
for all N Z+ , and therefore (5.2.5) follows when N and one takes right
limits in .
As my first application of (5.2.5), note that {Xn : n 0} is uniformly Pintegrable. Indeed, because |Xn | EP [|X| |Fn ], we have from (5.2.5) that
h
h
i
i
sup EP |Xn |, |Xn | sup EP |X|, |Xn |
nN
nN
P
E |X|, sup |Xn | 0
nN
P
sup Xn Xn(k) + P sup Xn(k) X (k)
nN
nN
+ P X (k) X
(k)
2
(k)
(k)
X X
+ P sup Xn X
L1 (P)
nN
209
nN
when P(A) = 0.
[
Pn
and Pn1 Pn , n Z+ .
=
0
for every X L.
210
Proof: To prove the first part, simply set Fn = Pn , identify the Xn in
(5.2.6) as EP [X|Fn ], and finally apply Corollary
5.2.4. As for the second part,
let (L) be the -algebra generated by EP [X|] : X L , note that (L) is
countably generated and that
EP X = EP X (L)
(a.s., P)
for each X L,
..
P
E X
. .
EP XN
In addition, if g : C [0, ) is continuous and concave, then
EP g(X) g X
(a.s., P).
f EP [X|] EP f (X)|] (a.s., P).
and
X EP g(X), A
1A EP g(X)
Yn
P(A)
APn
and
P
EP g(X), A
E [X, A]
g
P(A)
P(A)
211
for all A F with P(A) > 0. Hence, if denotes the set of for which
Xn ()
lim
RN +1
n Yn ()
exists, v is a fixed element of C,
limn Xn () if
X ()
v
if
/ ,
and
Y ()
limn Yn () if
v
if
/ ,
212
213
for n Z+ .
Theorem 5.2.13 (Hunt). Let Xn , Fn , P be a P-integrable submartingale.
Given bounded stopping times and 0 satisfying 0 ,
(5.2.14)
X EP X 0 F
(a.s., P),
and the inequality can be replaced by equality when Xn , Fn , P is a martingale.
(Cf. Exercise 5.2.39 for unbounded stopping times.)
Proof: Choose {An : n N} for (Xn , Fn , P) as in Lemma 5.2.12, and set
Yn = Xn An for n N. Then, because A A 0 and A is F -measurable,
EP X 0 F EP Y 0 + A F = EP Y 0 F + A .
214
Hence, it suffices to prove that equality holds in (5.2.14) when Xn , Fn , P is a
martingale. To this end, choose N Z+ to be an upper bound for 0 , let F
be given, and note that
N
X
EP XN , { = n}
EP XN , =
n=0
N
X
EP Xn , { = n} = EP X , .
n=0
In particular, if
(5.2.17)
sup EP Xn+ < ,
nN
2 In the notes to Chapter VII of his Stochastic Processes, Wiley (1953), Doob gives a thorough
account of the relationship between his convergence result and earlier attempts in the same
direction. In particular, he points out that, in 1946, S. Anderson and B. Jessen formulated
and proved a closely related convergence theorem.
215
then there exists a P-integrable random variable X to which {Xn : n 0} converges P-almost surely. (See Exercises 5.2.36 and 5.2.38 for other derivations.)
a)+
, and note that (by Corollary 5.2.10) Yn , Fn , P is
Proof: Set Yn = (Xnba
a P-integrable submartingale. Next, let N Z+ be given, set 00 = 0, and, for
k Z+ , define
0
k = inf{n k1
: Xn a} N
and k0 = inf{n k : Xn b} N.
U[a,b]
N
X
N
X
0
Yk0 Yk = YN Y0
Yk Yk1
k=1
YN
k=1
N
X
0
Yk Yk1
.
k=1
0
0
0 for all
k and therefore, by (5.2.14), EP Yk Yk1
Hence, since k1
(N )
k Z+ , we see that EP [U[a,b] ] EP [YN ], and clearly (5.2.16) follows from this
after one lets N .
Given (5.2.16), the convergence result is easy. Namely, if (5.2.17) is satisfied,
then (5.2.16) implies that there is a set of full P-measure such that U[a,b] () <
for all rational a < b and ; and so, by the remark preceding the
statement of this theorem, for each , {Xn () : n 0} converges to some
X() [, ]. Hence, we will be done as soon as we know that EP [|X|, ] <
. But
EP |Xn | = 2EP Xn+ EP Xn 2EP Xn+ EP X0 ,
n N,
The inequality in (5.2.16) is quite famous and is known as Doobs Upcrossing Inequality.
Remark 5.2.18. The argument in the proof of Theorem 5.2.15 is so slick that
it is easy to miss the point that makes it work. Namely, the whole proof turns
0
on the inequality EP [Yk Yk1
] 0. At first sight, this inequality seems to be
0
wrong, since one is inclined to think that Yk < Yk1
. However, Yk need be
0
less than Yk1
only if k < N , which is precisely what, with high probability,
the submartingale property is preventing from happening.
216
Corollary 5.2.19. Let Xn , Fn , P be a martingale. Then there exists an
X L1 (P; R) such that Xn = EP [X|Fn ] (a.s., P) for each n N if and only if
the sequence {Xn : n 0} is uniformly P-integrable. In addition, if p (1, ],
then there is an X Lp (P; R) such that Xn = EP [X|Fn ] (a.s., P) for each n N
if and only if {Xn : n 0} is a bounded subset of Lp (P; R).
Proof: Because of Corollary 5.2.4 and (5.2.3), I need only check the if statement in the first assertion. But, if {Xn : n 0} is uniformly P-integrable,
then (5.2.17) holds and therefore Xn X (a.s., P) for some P-integrable X.
Moreover, uniform integrability together with almost sure convergence implies
convergence in L1 (P; R), and therefore, by (5.1.5), for each m N,
Xm = lim EP Xn Fm = EP X Fm (a.s., P).
n
Proof: Without loss in generality, I will assume throughout that all the Xn s
P
P
a
as well as Y dQ
dP take values in [0, ); and clearly, E [Xn ], n N, and E [Y ]
are all dominated by 1.
First note that
n
o
for A Fn .
Qn,s (A) = sup Q(A B) : B Fn and P(B) = 0
217
shows that Xn , Fn , P is a non-negative martingale. Thus, in either case, there
is a non-negative, P-integrable random variable X with the property that Xn
X (a.s., P). In order to identify X as Y , use Fatous Lemma to see that, for any
m N and A Fm ,
EP X, A lim EP Xn , A = lim Qn,a (A) Q(A);
n
S
and therefore EP [X, A] Q(A), first for A 0 Fm and then
for every A F.
In particular, by choosing B F so that Qs (B) = 0 = P B{ , we have that
EP X, A = EP X, A B Q(A B) = Qa (A) = EP Y, A for all A F,
which means that X Y (a.s., P). On the other hand, if Yn = EP [Y |Fn ] for
n N, then
EP Yn , A = Qa (A) Qn,a (A) = EP Xn , A for all A Fn ,
and therefore Yn Xn (a.s., P) for each n N. Thus, since Yn Y and
Xn X P-almost surely, this means that Y X (a.s., P).
Next, assume that Qn Pn for each n N and therefore that Xn , Fn , P
is a non-negative martingale. If {Xn : n 0} is uniformly P-integrable, then
Xn Y in L1 (P; R) and therefore Qs () = 1 EP [Y ] = 0. Hence, Q P
when {Xn : n 0} is uniformly P-integrable. Conversely, if Q P, then it
is easy to see that Xn = EP [Y |Fn ] for each n N, and therefore, by Corollary
5.2.4, that {Xn : n 0} is uniformly P-integrable.
Finally, assume that Qn Pn for each n N. Then, the Xn s can be chosen
dPn
. Hence, if Pa and Ps are
to take their values in (0, ) and Yn X1n = dQ
n
the absolutely continuous and singular parts of P relative to Q and if Y
Q
a
limn Yn , then Y = dP
dQ and so Pa (A) = E [Y, A] for all A F. Thus, when
1
on G and
B F is chosen so that Ps (B) = 0 = Q(B{), then, since Y = X
P
P
E [X, C G] = E [X, C] for all C F, it is becomes clear that
Q(A G = EQ XY, A G = EPa X, A G
= EP X, A G B = EP X, A B = Qa (A B) = Qa (A)
for all A F.
5.2.4. Reversed Martingales and De Finettis Theory. For some applications it is important to know what happens if one runs a submartingale or
martingale backwards. Thus,
again let (, F, P) be a probability space, only
this time suppose that Fn : n N is a sequence of sub--algebras that
is non-increasing. Given a sequence {Xn : n 0} of (, ]-valued random variables, I will say that the triple Xn , Fn , P is either a reversed submartingale or a reversed martingale if, for each n N, Xn is Fn -measurable
and either Xn L1 (P; R) and Xn+1 EP [Xn | Fn+1 ] or Xn L1 (P; R) and
Xn+1 = EP [Xn | Fn+1 ].
218
P sup Xn R
nN
1 P
E X0 , sup Xn R ,
R
nN
R (0, ).
sup Xn
nN
Lp (P;R)
p
X0
p
L (P;R)
p1
when p (1, ).
Moreover, if (Xn , Fn , P) is a reversed martingale, then (|Xn |, Fn , P is a re, Fn , P) is a reversed submartingale and
versed submartingale. Finally, if (XnT
P
max Xn > R
0nN
1 P
E X0 , max Xn > R
0nN
R
P
P
P
E X0 , sup Xn > R E X0 , sup Xn > R = E X0 , sup Xn > R ,
nN
nN
nN
219
nN
nN
\
m=1
+
Am = A B B Z : B = S B for all .
220
(5.2.24)
(5.2.25)
1X
g Xm = EP [g X1 ]
lim
n n
1
221
Now suppose that F is {Xm : 1 m N } -measurable. Then there
exists a g : E N R such that F = g X1 , . . . , XN ). If N = 1, then, because
Pn
limn n1 m=1 g Xm is T -measurable, (5.2.24) says that E P [F | X1 (A )]
is T -measurable. To get the same conclusion when N 2, I want to apply the
same reasoning, only now with E replaced by E N . To be precise, define
)
Z+
: B = S B for all (N ) , where
A(N
= B B
(N ) = : (`N + m) = (`N + 1) + m 1 for all ` N and 1 m < N
of length N .
is the group of finite permutations that transform Z+ in blocks
1 (N )
N
P
By (5.2.24) applied with E replacing E, we find that E F X (A ) =
(N )
1
1
EP F T P-almost surely. Hence,
since X (A )1 X (A ), (5.2.27) holds
for every {Xn : 1 n N } -measurable F L (P; R).
The best known consequence of Lemma 5.2.26 is the HewittSavage 0
1 Law, which says that X1 (A ) is trivial if the Xn s are independent and
identically distributed. Clearly, their result is an immediate consequence of
Lemma 5.2.26 together with Kolmogorovs 01 Law.
Seeing as the Strong Law of Large Numbers follows from (5.2.24) combined
with the HewittSavage 01 Law, one might think that (5.2.24) represents an
extension of the strong law. However, that is not really the case, since it can be
shown that X1 (A ) is trivial only if the Xn s are independent. On the other
hand, the derivation of the strong law via (5.2.24) extends without alteration to
the Banach space setting (cf. part (ii) of Exercise 6.1.16).
5.2.5. An Application to a Tracking Algorithm. In this subsection I will
apply the considerations in 5.2.1 to the analysis of a tracking algorithm. The
origin of this algorithm is an idea which Jan Mycielski introduced as a model
for learning. However, the treatment here derives from a variation, suggested
by Roy O. Davies, of Mycielskis model. Because I do not understand learning
theory, I prefer to think of Mycielskis algorithm as a tracking algorithm.
Let (E, B) be a measurable space for which there exists a nested sequence
{Pk : k 0} of finite or countable partitions such that P0 = {E} and B =
S
be the parent of Q in the sense
( k=0 Pk ). Given k 1 and Q Pk , let Q
that Q is the unique element of Pk1 which contains Q. Also, for each x E
and k 0, use Qk (x) to denote the unique Q Pk such that Q 3 x. Further, let
be a probability measure on (E, B) with the property that, for some (0, 1),
for each Q S Pk
0 < (Q) (1 )(Q)
k=0
Next, let (, F, P) be a probability space on which there exists a sequence
{Xn : n 1} of mutually independent E-valued random variables with distribution . In addition, let {Zn : n 1} be a sequence of E-valued random variables with the property that, for each n 1, Zn is independent of
{Xm : 1 m n} , let n be the distribution of Zn , and assume that
222
/
Q
()
for 1 j < m, Xm () Qk Zn () , and
for some k 0, Xj ()
k
n
/ Qk+1 Zn () for m < j n.
Xj ()
The goal here is to show that the Yn s search out the Zn s in the sense that,
for any B-measurable f : E R,
(5.2.28)
lim P |f (Yn ) f (Zn )| = 0 for all > 0.
n
Notice that, because fk = E f (Pk ) , this would be obvious from Corollary
5.2.4 if the Yn were replaced by Xn . Thus, the problem comes down to showing
that the distributions of Yn s are uniformly sufficiently close to .
For each n 1, define
n (z, ) =
=
n
X
X
k=0 j=1
X
n
k=0
j1
nj
1 Qk (z)
(Qk (z) \ Qk+1 (z)) 1 Qk+1 (z)
(Qk (z) \ Qk+1 (z))
,
Qk+1 (z)
Qk (z) \ Qk+1 (z)
n
n . Then
where n (Q) 1 (Q) 1 (Q)
ZZ
1B (z, y) n (z, dy)n (dz).
(5.2.29)
P (Zn , Yn ) B =
B
\ Q)
X
X
(Q
n
.
n () = n (z, ) n (dz) =
(Q)n (Q)
\ Q)
(Q
k=0 QPk+1
223
In addition, because Q` (z) \ Q`+1 (z) Qk (z) = if ` < k and is equal to
Q` (z) \ Q`+1 (z) when ` k,
X
n Q`+1 (z))
n z, Qk (z) =
`=k
= lim
n
n
n
1 (QL+1 (z)) 1 (Qk (z)
= 1 1 (Qk (z)) .
Z
1(Qk (z))
n
Z
n (dz) Kr
10
r
nr0
,
(dz)
1 (Qk (z))
Given an f L1 (; R) and Q
1
Af (Q) =
(Q)
k=0
Z
f d
Pk , set
(
0
)
Pk
k=0
Clearly,
x Q = M f (Q) f (x) sup A|f | Qk (x) ,
k0
and, because Af Qk (x) = E f (Pk ) (x), Doobs Inequality (5.2.3) implies
p
kf kLp (;R) for all p (1, ].
that kf kLp (;R) p1
kf kLq (n ;R)
rKr
q
kf kLqr0 (;R) .
X
X
1
n
f d
(Q)n (Q)
f dn =
\ Q) Q\Q
(Q
k=0 QPk+1
X X
224
since
(Q \ Q)
1 M f (Q).
f d 1 Af (Q)
Q\Q
n
=
n (Q)n (Q)M f (Q)
QPk+1
n n (Q)M f (Q)
1 (Q)
QPk+1
X
n
n
1 (Q) n (Q)M f (Q)
1 (Q) n (Q)M f (Q)
QPk+1
QPk
X
n
n
1 (Q) n (Q)M f (Q)
1 (Q) n (Q)M f (Q),
X
QPk+1
QPk
and therefore
K X
X
Z
f dn lim
n
1 (Q) n (Q)M f (Q)
QPk+1
k=0
n
1 (Q) n (Q)M f (Q)
QPk
= lim
n
1 (Q) n (Q)M f (Q)
f dn .
QPK+1
f dn
(f ) dn Kr
1 Kr
Z
0
q r
(f )
10
r
d
rKr
r0
kf kqLqr0 (;R) .
kf q kLr0 (;R) =
0
r 1
lim EP |f (Yn ) f (Zn )|q = 0 for each p [1, q).
225
By Holders Inequality,
1
1
n |f f R | Kr |f f R | r0 < Kr r0 ,
1
rKr
rKr 10
r .
|f f R | r0 <
n |f f R |
The proof of (5.2.35) follows the strategy outlined earlier. That is,
1
EP |f (Yn ) f (Zn )|p p
1
kf fk kLp (n ;R) + EP |fk (Yn ) fk (Zn )|p p + kfk f kLp (n ;R) .
By (5.2.33),
kf fk kLp (n ;R)
rKr
p1
kf fk kLpr0 (;R) ,
1
1
/ Qk (Zn ) p
EP |fk (Yn ) fk (Zn )|p p = EP |fk (Yn ) fk (Zn )|p , Yn
11
1
/ Qk (Zn ) p q .
EP |fk (Yn ) fk (Zn )|q q P Yn
By (5.2.30), the final factor tends to 0 as n . Hence, since, by Holders
Inequality and (5.2.33),
1
EP |fk (Yn ) fk (Zn )|q q kfk kLq (n ;R) + kfk kLq (n ;R)
1
1
1
1
r q
r q
+ 1 Krq kf kLqr0 (;R) ,
+ 1 Krq kfk kLqr0 (;R)
226
for
1 m n;
sup Mn Mm
nm
i
1 P h
E M Mm 0
as
m ,
EP A2n 4EP Xn2
for every n N.
Finally, show that there exist M L2 (P; R) and A L2 P; [0, ) such that
Mn M , An % A, and, therefore, Xn X M + A both P-almost surely
and in L2 (P; R).
(iii) Let Xn , Fn , P be a non-negative martingale, set Yn = eXn , n N, use
Corollary 5.2.10 to see that Yn , Fn , P is a uniformly bounded, non-negative,
submartingale, and apply part (ii) to conclude that {Xn : n 0} converges
P-almost surely to a non-negative X L1 (P; R).
227
(iv) Let Xn , Fn , P be a martingale for which
(5.2.37)
sup EP Xn < .
nN
n+1,m
Yn,m (a.s., P), define Ym = limn Yn,m , check that both Ym , Fm , P and
Ym , Fm , P are non-negative martingales with EP Y0+ +Y0 supnN EP |Xn | ,
and note that Xm = Y m+ Ym (a.s., P) for each m N. In other words, every
martingale Xn , Fn , P satisfying (5.2.37 ) admits a Hahn decomposition3 as
the difference of two non-negative martingales whose sum has expectation value
dominated by the left-hand side of (5.2.37). Finally, use this observation together
with (iii) to see that every such martingale converges P-almost surely to some
X L1 (P; R).
(v) By combining the final assertion in (iv) together with Doobs Decomposition
in Lemma 5.2.12, give another proof of the convergence assertion in Theorem
5.2.15.
Exercise 5.2.38. In this exercise we will develop another way to reduce Doobs
Martingale Convergence Theorem to the case of L2 -bounded martingales. The
technique here is due to R. Gundy and derives from the ideas introduced by
Calderon and Zygmund in connection with their famous work on weak-type 11
estimates for singular integrals.
measurable, [0, R]-valued
(i) Let {Zn : n N} be a Fn : n N -progressively
,
F
,
P
is
a
submartingale.
Next, choose
sequence with the property that
Z
n
n
{An : n N} for Zn , Fn , P as in Lemma 5.2.12, note that An s can be chosen
so that 0 An An1 R for all n Z+ , and set Mn = Zn + An , n N.
Check that Mn , Fn , P is a non-negative martingale with Mn (n + 1)R for
each n N. Next, show that
2
= EP Mn Mn1 Zn + Zn1
EP Mn2 Mn1
2
+ EP An An1 Zn + Zn1
= EP Zn2 Zn1
2
+ 2R EP An An1 ,
EP Zn2 Zn1
and conclude that EP [A2n ] EP [Mn2 ] 3REP [Z0 ] for all n N.
(ii) Let Xn , Fn , P be a non-negative martingale. Show that, for each R
(R)
(R)
(R)
(R)
(0, ), Xn = Mn An + n , n N, where Mn , Fn , P is a non-negative
(R)
(R) 2
3R EP X0 ; An : n N is a
martingale satisfying supn0 EP Mn
(R)
0,
228
(R) 2
(R)
(R)
3REP X0 ; and n :
An is Fn1 -measurable, and supn1 EP An
n N is a Fn : n N -progressively measurable sequence with the property
that
1
P n N (R)
6= 0 EP X0 .
n
R
(R)
(R)
(R)
!2
n
X
Vm(R) V (R) 12REP |Xn |
EP
m1
1
for n Z+ ; and {n : N} is an
sequence satisfying
Fn : n N -progressively measurable
2
P 0 m n (R)
=
6
0
EP |Xn | .
m
R
Exercise 5.2.39. In this exercise we will extend Hunts Theorem (cf. Theorem
5.2.13) to allow unbounded stopping times. To this end, let Xn , Fn , P be a
uniformly P-integrable submartingale on the probability space (, F, P), and set
Mn = Xn An , n N, where {An : n
N} is the sequence produced in Lemma
5.2.12. After checking that Mn , Fn , P is a uniformly P-integrable martingale,
show that, for any stopping time : X = EP [M |F ] + A (a.s., P), where
X , M , and A are, respectively, the P-almost sure limits of {Xn : n 0},
{Mn : n 0}, and {An : n 0}. In particular, if and 0 are a pair of stopping
times and 0 , conclude that X EP [X 0 |F ] (a.s., P).
229
Exercise 5.2.40. There are times when submartingales converge even though
they are not bounded in L1 (P; R). For example, suppose that (Xn , Fn , P) is a
submartingale for which there exists a non-decreasing function
: R 7 R with
the properties that (R) R for all R and Xn+1 Xn (a.e., P) for each
n N.
(i) Set R () = inf n N : Xn () R for R (0, ), and note that
sup XnR X0 (R)
(a.e., P).
nN
n=0
or
P An Fn1 () =
n=1
1An () = but
n=0
P An Fn1 () <
n=1
has P-measure 0. In particular, note that this gives another derivation of the
second part of the BorelCantelli Lemma (cf. Lemma 1.1.3).
Exercise 5.2.41. For each n N, let (En , Bn ) be a measurable space and
n and n a pair of probability measures on (En , Bn ) with the property that
Theorem, Q
which says that (cf. Exercise 1.1.14)
n
Qn . Prove Kakutanis
Q
Q
either nN n nN n or nN n nN n .
Hint: Set
Y
Y
Y
Y
En , F =
Bn , P =
n , and Q =
n .
=
nN
nN
nN
nN
Qn
( 0 Bm ), where n is the natural projection from onto
Next, take Fn =
Q
n
0 Em , set Pn = P Fn and Qn = Q Fn , and note that
n1
Xn (x)
Y
dQn
(x) =
fm (xm ),
dPn
0
x ,
230
dn
. In particular, when n n for each n N, use Kolwhere fn d
n
mogorovs
01 Law (cf. Theorem 1.1.2) to see that Q(G) {0, 1}, where G
limn Xn (0, )}, and combine this with the last part of Theorem 5.2.20
to conclude that Q 6 P = Q P. Finally, to remove the assumption
that
n n for all ns, define n on (En , Bn ) by n = 1 2n1 n + 2n1 n ,
Q
n , and use the preceding to complete
check that n n and Q Q
nN
the proof.
X2 Y2 d(P + Q).
dP
d
12
dQ
d
12
d.
Also, check that P Q if and only if P, Q = 0.
(ii) Suppose that Fn : n N is a non-decreasing sequence of sub--algebras
of F, and show that (P, Q)Fn (P, Q)W Fn .
0
(iv) Let {n }
0 (0, ), and, for each n N, let n and n be Gaussian
measures on R with variance n2 . If an and bn are the mean values of, respectively,
n and n , show that
Y
nN
depending on whether
Y
nN
P
0
or
Y
nN
nN
231
1.4.28 we showed that limn |Sn | < P-almost surely. Here we will show
that
(5.2.44)
As was mentioned before, this result was proved first by K.L. Chung and W.H.
Fuchs. The basic observation behind the present proof is due to A. Perlin, who
noticed that, by the HewittSavage 01 Law, limn |Sn | = L P-almost surely
for some L [0, ). Thus, the problem is to show that L = 0, and we will do
this by an simple argument invented by A. Yushkevich.
(i) Assuming that L > 0, use the HewittSavage 01 Law to show that
P |Sn x| <
L
3
i.o. = 0
for any x R,
where i.o. stands for infinitely often and means here for infinitely many
ns.
Hint: Set = L3 . Begin by observing that, because {Sm+n Sm : n Z+ }
has the same P-distribution as {Sn : n Z+ }, P(|Sm+n Sm | < 2 i.o.) = 0 for
any m Z+ . Thus, since |Sm+n x| |Sm+n Sm | |Sm x|, P(|Sn x| <
i.o.) P(|Sm x| ) for any m Z+ . Moreover, by the HewittSavage
01 Law, P(|Sn x| < i.o.) {0, 1}. Hence, either P(|Sn x| < i.o.) = 0,
or one has the contradiction that P(|Sm x| < ) = 0 for all m Z+ and yet
P(|Sn x| < i.o.) = 1.
L
3
i.o. P |Sn + L| <
L
3
i.o. = 1,
P |Sn x| < i.o. = 1.
Exercise 5.2.45.
of reversed martingales.
Here is a rather frivolous application
Let (, F, P), Fn : n N , and {ek : k Z be as in part (v) of Exercise
5.1.17. Next, take
Sm = {(2k + 1)2m : k Z} for each m N, and, for
2
f L [0, 1); C , set
m (f ) =
X
`Sm
f, e`
e,
L2 ([0,1);C) `
232
After noting that Fn : n N is non-increasing, use the convergence result for
reversed martingales in Theorem 5.2.21 to see that the expansion
f = f, 1
+
L2 ([0,1);C)
m (f )
m=0
When f is a function with the property that (f, e` )L2 ([0,1);C) = 0 for all ` Z\{2m : m N},
the preceding almost everywhere convergence result can be interpreted as saying that the
Fourier series of f converges almost everywhere, a result that was discovered originally by
Kolmogorov. The proof suggested here is based on fading memories of a conversation with
N. Varopolous. Of course, ever since L. Carlesons definitive theorem on the almost every
convergence of the Fourier series of an arbitrary square integrable function, the interest in this
result of Kolmogorov is mostly historical.
Chapter 6
Some Extensions and Applications
of Martingale Theory
Many of the results obtained in 5.2 admit easy extensions to both infinite
measures and Banach spacevalued random variables. Furthermore, in many
applications, these extensions play a useful, and occasionally essential, role. In
the first section of this chapter, I will develop some of these extensions, and in the
second section I will show how these extensions can be used to derive Birkhoffs
Individual Ergodic Theorem. The final section is devoted to Burkholders Inequality for martingales, an estimate that is second in importance only to Doobs
Inequality.
6.1 Some Extensions
Throughout
the
discussion that follows, (, F, ) will be a measure space and
Fn : n N will be a non-decreasing sequence of sub--algebras with the
property that F0 is -finite. In particular, this means that the conditional
expectation of a locally -integrable random variable given Fn is well defined (cf.
Theorem 5.1.12) even if the random variable takes
values in a separable Banach
space E. Thus, I will say that the sequence Xn ; n N of E-valued random
variables is a -martingale
with respect to Fn : n N , or,
more briefly,
that the triple Xn , Fn , is a martingale, if {Xn : n N} is Fn : n N progressively measurable, each Xn is locally -integrable, and
Xn1 = E Xn Fn1 (a.e., ) for each n Z+ .
Furthermore, whenE = R, I will
say that {Xn : n N} is a -submartingale
with respect to Fn : n N
(equivalently, the triple (Xn , Fn , ) is a submartingale) if {Xn : n N} is Fn : n N -progressively measurable, each
Xn is locally -integrable, and
Xn1 E Xn Fn1 (a.e., ) for each n Z+ .
6.1.1. Martingale Theory for a -Finite Measure Space. Without any
real effort, I can now prove the following variants of each of the basic results in
5.2.
233
234
Theorem 6.1.1. Let Xn , Fn , be an R-valued -submartingale. Then, for
each N N and A F0 on which XN is -integrable,
1
(6.1.2)
max Xn A E XN ,
max Xn A
0nN
0nN
for all (0, ); and so, when all the Xn s are non-negative, for every p
(1, ) and A F0 ,
p1
E sup |Xn |p , A
1
p
sup E |Xn |p , A p .
p 1 nN
nN
Furthermore, for each stopping time , Xn
, Fn , is a submartingale or a
martingale depending on whether Xn , Fn , is a submartingale or a martingale.
In addition, for any pair of bounded stopping times 0 ,
X E X 0 F
(a.e., ),
and the inequality is an equality in the martingale case. Finally, given a < b
and A F0 ,
E (Xn a)+ , A
,
E U[a,b] , A sup
ba
nN
= Xn X
(a.e., ),
W
where X is 0 Fn -measurable
and locally -integrable. In fact, in the case of
W
martingales, there is a 0 Fn -measurable, locally -integrable X such that
Xn = E X Fn (a.e., ) for all n N
if and only if {Xn : n 0} is uniformly -integrable on each A F0 with
(A) < , in which case X is -integrable if and only if Xn X in L1 (; R).
On the other hand, when p (1, ), X Lp (; R) if and only if {Xn : n 0}
is bounded in Lp (; R), in which case Xn X in Lp (; R).
Proof: Obviously, there is no problem unless () = . However, even then,
each of these results follows immediately from its counterpart in 5.2 once one
makes the following trivial observation. Namely, given 0 F0 with (0 )
(0, ), set
F 0 = F[0 ],
Fn0 = Fn [0 ],
Xn0 = Xn 0 ,
and P =
F0
.
(0 )
235
Then Xn0 , Fn0 , P0 is asubmartingale or a martingale depending on whether
the original Xn , Fn , was a submartingale or a martingale. Hence, when
() = , simply choose aSsequence {k : k 1} of mutually disjoint, -finite
Q=
N
Y
[aj , aj + r)
j=1
These partitions are nicely meshed in the sense that the (n + 1)st is a refinement
of the nth. Equivalently, if Fn denotes the -algebra over RN generated by the
partition Pn , then Fn Fn+1 . Moreover, if f L1 (RN ; R) and
Z
f
nN
f (y) dy for x Cn (k) and k ZN ,
Xn (x) 2
Cn (k)
(a.e., RN ),
{M(0) f }
236
where
(
(0)
f (x) = sup
1
|Q|
Z
|f (y)| dy : x Q
Q
Pn
nZ
then exactly the same argument that (when = 0) led us to (6.1.5) can now be
used to get
Z
n
o
1
N
()
f (y) dy
(*)
x R : M f (x)
{M() f }
for each {0, 1}N and (0, ). Finally, if Q is given by (6.1.3) and
r 3 12n , then it is possible to find an {0, 1}N and a C Pn () for which
Q C. (To see this, first reduce to the case when N = 1.) Hence,
max
{0,1}N
M() f Mf 6N
max
{0,1}N
M() f.
After combining this with the estimate in (*), we arrive at the following version
of the HardyLittlewood Maximal Inequality:
n
o (12)N Z
|f (y)| dy.
(6.1.6)
x RN : Mf (x)
RN
{0,1}N
()
M f
p N
L (R ;R)
p
kf kLp (RN ;R) ,
p1
p (1, ].
237
{M()B(0,R) f }
Next, even though the result in Exercise 1.4.18 was stated for probability measures, it applies equally well to any finite measure. Thus, we now know that
()
kM
! p1
()
(M
f ) (x) dx
B(0,R)
p
kf kLp (RN ;R) ,
p1
N
Mf
p N (12) p kf kLp (RN ;R)
L (R ;R)
p1
for p (1, ].
In this connection, notice that there is no hope of getting this sort of estimate
when p = 1, since it is clear that
lim |x|N Mf (x) > 0
|x|
Proof: I begin with the observation that, for each f L1 (RN ; R),
Z
1
f (y) dy N Mf (x), x RN ,
Mf (x) sup
B3x |B| B
238
2N
with N = B(0, 1). Second, notice that (6.1.9) for every
where n =
N
x RN is trivial when f Cc (RN ; R). Hence, all that remains is to check that
if fn f in L1 (RN ; R) and if (6.1.9) holds for each fn , then it holds for f . To
this end, let > 0 be given and check that, because of the preceding and (6.1.6),
Z
f (y) f (x) dy
x : lim 1
B&{x} |B| B
n
o
x : M(f
fn )(x)
3
Z
1
fn (y) fn (x) dy
+ x : lim
3
B&{x} |B| B
n
o
+ x : fn (x) f (x)
3
3
1 + (12)N N kf fn kL1 (RN )
nZ+
N
kZ
and
! p1
(6.1.11)
X
kZN
sup |S n (k)|p
nZ+
(12)N p
p1
! p1
X
|ak |p
for p (1, ].
kZN
239
for the game of cricket. What Hardy wanted to find is the optimal order in
which to arrange batters to maximize the average score per inning. Thus, he
worked with a non-negative sequence {ak : k 0} in which ak represented the
expected number of runs scored by player k, and what he showed is that, for
each (0, ),
k N : sup S n (k)
+
nZ
X
k N : sup S n (k) 1
ak ,
nZ+
0
(0, ).
Although this sharpened result can also be obtained as a corollary the Sunrise
Lemma,1 Hardys approach remains the most appealing.
6.1.2. Banach SpaceValued Martingales. I turn next to martingales
with values in a separable Banach space. Actually, everything except the easiest
aspects of this topic becomes extremely complicated and technical very quickly,
and, for this reason, I will restrict my attention to those results that do not
involve any deep properties of the geometry of Banach spaces. In fact, the only
general theory with which I will deal is contained in the following.
Theorem 6.1.12. Let E be a separable
Banach
space
and
X
,
F
,
an En
n
valued martingale. Then kXn kE , Fn , is a non-negative submartingale and
therefore, for each N Z+ and all (0, ),
(6.1.13)
sup kXn kE
0nN
1
E kXN kE ,
sup kXn kE .
0nN
sup kXn kE
nN
Lp (;E)
p
sup kXn kLp (;E) .
p 1 nN
See Lemma 3.4.5 in my A Concise Introduction to the Theory of Integration, Third Edition,
Birkhauser (1998).
240
Proof: The fact kXn kE , Fn , is a submartingale is an easy application of
the inequality in (5.1.14); and, given this fact, the inequalities in (6.1.13) and
(6.1.14) follow from the corresponding inequalities in Theorem 6.1.1.
While proving the convergence statement, I may and will assume that F =
W
p
Hence, because
kXn kE kXkE kXn XkE 2kXkE ,
the convergence in L1 (; E) is again an application of Lebesgues Dominated
Convergence Theorem.
Going beyond the convergence result in Theorem 6.1.12 to get an analog of
Doobs Martingale Convergence Theorem is hard. For one thing, a nave analog
is not even true for general separable Banach spaces, and a rather deep analysis
of the geometry of Banach spaces is required in order to determine exactly when
it is true. (See Exercise 6.1.18 for a case in which it is.)
Exercises for 6.1
Exercise 6.1.15. In this exercise we will develop Jensens Inequality in the
Banach space setting. Thus, (, F, P) will be a probability space, C will be a
closed, convex subset of the separable Banach space E, and X will be a C-valued
element of L1 (P; E).
(i) Show that there exists a sequence {Xn : n 1} of C-valued, simple functions
that tend to X both P-almost surely and in L1 (P; E).
(ii) Show that EP [X] C and that
EP g(X) g EP [X]
for every continuous, concave g : C [0, ).
241
APn
This proof, which seems to have been the first, of the Strong Law for Banach spaces was
given by E. Mourier in El
ements al
eatoires dans un espace de Banach, Ann. Inst. Poincar
e
13, pp. 166244 (1953).
242
and
Z
(y) f (x ty) dy,
t ? f (x) =
RN
t&0
x RN \ {0} for some C 1 (0, ); R with
Z
A
rN |0 (r)| dr < .
(0,)
1
|B(x, r)|
Z
f (y) dy
B(x,r)
N1
(0,)
A
N
(x),
Mf
x RN .
t(0,)
f L1 (RN ; R),
243
t (y) f (x y) f (x) dy = 0.
lim
t&0
RN
Two of the most familiar examples to which the preceding applies are the
2
N
Gauss kernel gt (x) = (2t) 2 exp |x|2 and the Poisson kernel (cf. (3.3.19))
N
R
t . In both these cases, A = N .
Exercise 6.1.18. Let E be a separable Hilbert space and (Xn , F, P) an Evalued martingale on some probability space (, F, P) satisfying the condition
sup EP kXn k2E < .
nZ+
W
Proceeding as in (i) of Exercise 5.2.36, first prove that there is a 1 Fn -measurable X L2 (P; E) to which {Xn : n 1} converges in L2 (P; E), next check
that
Xn = EP X Fn (a.s., P) for each n Z+ ,
and finally apply the last part of Theorem 6.1.12 to see that Xn X P-almost
surely.
Exercise 6.1.19. This exercise deals with a variation, proposed by Jan Mycielski, on the sort of search algorithm discussed in 5.2.5. Let G be a non-empty,
bounded, open subset of RN with the property that RN B(x, r) G N rd
for some > 0 and all x G and 0 < r diam(G), and define on (G, BG )
(G)
by () = RNN (G) . Next, let (, F, P) be a probability space on which there
R
exists sequences {Xn : n 1} and {Zn : n 1} of G-valued random variables
with the properties that the Xn s are mutually independent and have distribution , Zn is independent
of {X
1 , . . . , Xn } and has distribution n for each
n
< for some r (1, ). Without loss
n 1, and Kr supn1
d
d
r
L (;R)
244
and therefore that there is a C < such that kMG f kLp (;R)
for all p (1, ].
Cp
p
p1 kf kL (;R)
EP f (Xn ), An (z) n (dz).
Z
MG f dn .
(iii) Given the conclusion drawn at the end of (ii), proceed as in the derivation
of Theorem 5.2.34 from Lemma 5.2.31 to get the desired result.
6.2 Elements of Ergodic Theory
Among the two or three most important general results about dynamical systems
is D. Birkhoffs Individual Ergodic Theorem. In this section, I will present a
generalization, due to N. Wiener, of Birkhoffs basic theorem.
The setting in which I will prove the Ergodic Theorem will be the following.
(,
be a -finite measure space on which there exits a semigroup
kF, ) will N
: k N
of measurable, -measure preserving transformations.
That is, for each k NN , k is an F-measurable map from into itself, 0 is
the identity map, k+` = k ` for all k, ` NN , and
() = (k )1 () for all k N and F.
Further, E will be a separable Banach space with norm k kE , and, given a
function F : E, I will be considering the averages
1 X
F k (), n Z+ ,
An F () N
n
+
kQn
N
where Q+
: kkk < n and kkk max1jN kj . My
n is the cube k N
goal (cf. Theorem 6.2.7) is to show that, for each p [1, ) and F Lp (; E),
{An F : n 1} converges -almost everywhere. In fact, when either is finite
or p (1, ), I will show that the convergence is also in Lp (; E).
245
(24)N
kF kL1 (;E) ,
sup kAn F kE
n1
(0, ),
or
sup kAn F kE
n1
(6.2.3)
Lp ()
(24)N p
kF kLp (;E) ,
p1
F k ()
if k Q+
2n
if k
/ Q+
2n ,
(12)
F k ()
kQ+
2n
The idea of using Hardys Inequality was suggested to P. Hartman by J. von Neumann and
appears for the first time in Hartmans On the ergodic theorem, Am. J. Math. 69, pp.
193199 (1947).
246
and
X
kQ+
n
max
1mn
Am F ()
p
(12)N p
p1
p X
F k ()
p
kQ+
2n
max Am F
1mn
kQ+
n
Z
=
Cn () (d)
Z
(12)N X
F k f d
+
kQ2n
and, similarly,
X Z
kQ+
n
max
1mn
Am F
p
d
(12)N p
p1
p X Z
F k
p
d.
kQ+
2n
Finally, since the distributions of max1mn Am F k and F k do not
depend on k NN , the preceding lead immediately to
max Am F
1mn
and
max Am F
1mn
Lp ()
(24)N
kF kL1 ()
2 p (12)N p
kF kLp ()
p1
for all n Z+ . Thus, (6.2.2) and (6.2.3) follow after one lets n .
Given (6.2.2) and (6.2.3), I adopt again the strategy used in the proof of
Corollary 5.2.4. That is, I must begin by finding a dense subset of each Lp -space
on that the desired convergence results can be checked by hand, and for this
purpose I will have to introduce the notion of invariance.
A set F is said to be invariant, and I write I if = (k )1 () for
every k NN . As is easily checked, I is a sub--algebra of F. In addition, it
is clear that F is invariant if = (ej )1 () for each 1 j N , where
{ei : 1 i N } is the standard orthonormal basis in RN . Finally, if I is the
-completion of I relative to F in the sense that I if and only if F and
I such that ()
= 0 (AB (A\B)(B \A) is the symmetric
there is
difference between the sets A and B), then an F-measurable F : E is
I-measurable if and only if F = F k (a.e., ) for each k NN . Indeed, one
247
need only check this equivalence for indicator functions of sets. But if F
= 0 for some
I, then
and ()
+ ()
= 0,
(k )1 () (k )1 ()
and so I. Conversely, if I, set
[
=
(k )1 (),
kNN
I and ()
= 0.
and check that
Lemma 6.2.4. Let I(E) be the subspace of I-measurable elements of L2 (; E).
Then, I(E) is a closed linear subspace of L2 (; E). Moreover, if I(R) denotes
orthogonal projection from L2 (; R) onto I(R), then there exists a unique linear
contraction I(E) : L2 (; E) I(E) with the property that I(E) (af ) =
aI(R) f for a E and f L2 (; R). Finally, for each F L2 (; E),
(6.2.5)
An F I(E) F
Proof: I begin with the case when E = R. The first step is to identify the
orthogonal complement I(R) of I(R). To this end, let N denote the subspace
of L2 (; R) consisting of elements having the form g g ej for some g
L2 (; R) L (; R) and 1 j N . Given f I(R), observe that
f, g g ej L2 (;R) = f, g L2 (;R) f ej , g ej L2 (;R) = 0.
Hence, N I(R) . On the other hand, if f L2 (; R) and f N , then it is
clear that f f f ej for each 1 j N and therefore that
f f ej
2 2
L (;R)
2
2
= kf kL2 (;R) 2 f, f ej L2 (;R) +
f ej
L2 (;R)
= 2 kf k2L2 (;R) f, f ej L2 (;R) = 2 f, f f ej L2 (;R) = 0.
Thus, for each 1 j N , f = f ej -almost everywhere; and, by induction
on kkk , one concludes that f = f k -almost everywhere for all k NN .
In other words, we have now shown that I(R) = N or, equivalently, that
N = I(R) .
Continuing with E = R, next note that if f I(R), then An f = f (a.e., )
for each n Z+ . Hence, (6.2.5) is completely trivial in this case. On the other
hand, if g L2 (; R) L (; R) and f = g g ej , then
X
X
nN An f =
g k
g k+ej ,
{kQ+
n :kj =0}
{kQ+
n :kj =n1}
248
L (;R)
n
as n .
Hence, in this case also, (6.2.5) is easy. Finally, to complete the proof for E = R,
simply note that, by (6.2.3) with p = 2 and E = R, the set of f L2 (; R) for
which (6.2.5) holds is a closed linear subspace of L2 (; R) and that we have
already verified (6.2.5) for f I(R) and f from a dense subspace of I(R) .
Turning to general Es, first note that I(E) F is well defined for -simple F s.
P`
Indeed, if F = 1 ai 1i for some {ai : 1 i `} E and {i : 1 i `} of
mutually disjoint elements of F with finite -measure, then
I(E) F =
`
X
ai I(R) 1i
and so
I(E) F
2 2
L (;E)
`
X
!2
kai kE I(R) 1i
=
I(R)
`
X
1
!
2
kai kE 1i
2
L (;R)
`
X
Thus, since the space of -simple functions is dense in L2 (; E), it is clear that
I(E) not only exists but is also unique.
Finally, to check (6.2.5) for general Es, note that (6.2.5) for E-valued, simple F s is an immediate consequence of (6.2.5) for E = R. Thus, we already
know (6.2.5) for a dense subspace of L2 (; E), and so the rest is another elementary application of (6.2.3).
6.2.2. Birkhoff s Ergodic Theorem. For any p [1, ), let Ip (E) denote
the subspace of I-measurable elements of Lp (; E). Clearly Ip (E) is closed for
every p [1, ). Moreover, since
(6.2.6)
() < = I(E) F = E F I ,
249
An F I(E) F
(a.e., ).
()
lim An F =
n
if () (0, )
(a.e., ),
if () =
[F ]
, and, when () = , means that Ip (E) = {0}
(6.2.6)) that I(E) F = E()
for all p [1, ).
In view of the preceding, all that remains is to discuss the L1 (; E) convergence
in the case when p = 1 and () < . To this end, observe that, because the
An s are all contractions in L1 (; E), it suffices to prove L1 (; E) convergence
for E-valued, -simple F s. But L1 (; E) convergence for such F s reduces
to showing that An f I(R) f in L1 (; R) for non-negative f L (; R).
Finally, if f L1 ; [0, ) , then
An f kL1 () = kf kL1 () =
I(R) f kL1 (;R) ,
n Z+ ,
where, in the last equality, I used (6.2.6); and this, together with (6.2.8), implies
(cf. the final step in the proof of Theorem 6.1.12) convergence in L1 ().
I will say that semigroup k : k NN is ergodic on (, F, ) if, in addition
to being -measure preserving, () ({) = 0 for every invariant I.
250
Classic Example. In order to get a feeling for what the Ergodic Theorem is
saying, take to be Lebesgue measure on the interval [0, 1) and, for a given
(0, 1), define : [0, 1) [0, 1) so that
() + [ + ] = + mod 1.
If is rational and m is the smallest element of Z+ with the property that
m Z+ , then it is clear that, for any F on [0, 1), F = F if and only if F
1
. Hence, if F L2 [0, 1); C and
has period m
Z
c` (F )
F ()e 1 2` d, ` Z,
[0,1)
then elementary Fourier analysis leads to the conclusion that, in this case,
X
lim An F () =
cm` (F )e 1 2m` for Lebesgue-almost every [0, 1).
n
`Z
On the other hand, if is irrational, then k : k N} is -ergodic on [0, 1).
To see this, suppose that F I(C). Then (cf. the preceding and use Parsevals
Identity)
X
2
c` (F ) c` (F )2 .
0 =
F F
L2 ([0,1);C) =
`Z
But, clearly,
c` (F ) = e
1 2`
c` (F ),
` Z,
[0,1)
Finally, notice that the situation changes radically when one moves from [0, 1) to
[0, ) and again takes to be Lebesgue measure and (0, 1) to be irrational.
If I extend the definition of by taking () = bc + ( bc) for
[0, ), then it is clear that invariant functions are those that are constant on each
R bc+1
interval [m, m+1) and that, Lebesgue-almost surely, An f () bc f () d.
On the other hand, if one defines () = + , then every invariant set that
has non-zero measure will have infinite measure, and so, now, every choice of
(0, 1) (not just irrational ones) will give rise to an ergodic system. In
particular, one will have, for each p [1, ) and F Lp (; E),
lim An F = 0
Lebesgue-almost everywhere,
251
N
In particular, when k : k NN is ergodic on E, B N , I will say that the
family F is ergodic and conclude that the preceding can be replaced by
1 X
F Fk = EP F F (a.s., P) and in Lp (P; B).
(6.2.9)
lim N
n n
+
kQn
So far I have discussed one-sided stationary families, that is, families indexed
by NN . However, for various reasons (cf. Theorem 6.2.11) it is useful to know
that one can usually embed a one-sided stationary family into a two-sided one. In
terms of the semigroup
of shifts,
to the trivial observation that
k
this corresponds
N
NN
the semigroup : k N
on E = E
can be viewed as a sub-semigroup
= E ZN . With these comments in
of the group of shifts k : k ZN on E
mind, I will prove the following.
Lemma 6.2.10. Assume that E is a complete, separable, metric space and that
F = {Xk : k NN } is a stationary family of E-valued random variables on the
and
F,
P)
probability space
(, F, P).
Then there exists a probability space (,
N
N
a family F = Xk : k Z
with the property that, for each ` Z ,
` X
k+` : k NN
F
as F has under P.
has the same distribution under P
252
Proof: When formulated correctly, this theorem is an essentially trivial application of Kolmogorovs Extension Theorem (cf. part (iii) of Exercise 9.1.17).
Namely, for n N, set
n = k ZN : kj n for 1 j N ,
and define n : E 0 E n so that
n (x)k = xn+k
for all BE n .
Hence the n s are consistently defined on the spaces E n , and therefore Kolmogorovs Extension Theorem applies and guarantees the existence of a unique
N
Borel probability measure on E Z with the property that
N
EZ
\n
= n ()
k (
and X
) =
k for k ZN .
As an example of the advantage that Lemma 6.2.10 affords, I present the
following beautiful observation made originally by M. Kac.
Theorem 6.2.11.
Let (E, B) be a measurable space and {Xk : k N}
a stationary sequence of E-valued random variables on the probability space
(, F, P). Given B,
define the return time ()
= inf{k 1 : Xk () }.
Then, EP , X0 = P Xk for some k N . In particular, if {Xk : k
N} is ergodic, then
P X0 > 0 = EP , X0 = 1.
Proof: Set Uk = 1 Xk for k N. Then {Uk : k N} is a stationary sequence
of {0, 1}-valued random
variables. Hence, by Lemma 6.2.10, we can find a prob on which there is a family {U
F,
P
k : k Z} of {0, 1}-valued
ability space ,
253
n , . . . , U
n+k , . . .
random variables with the property that, for every n Z, U
as (U0 , . . . , Uk , . . . ) has under P. In particular,
has the same distribution under P
U
0 = 1 and
P 1, X0 = P
U
n = 1, U
n+1 = 0, . . . , U
0 = 0 ,
P n + 1, X0 = P
n Z+ .
Thus, if
(
) inf k N : Uk (
) = 1 ,
then
= n 1 ,
P n, X0 = P
n Z+ ,
and so
< .
EP , X0 = P
Now observe that
> n = P
U
n = 0, . . . , U
0 = 0 = P X0
P
/ , . . . , Xn
/ ,
from which it is clear that
< = P k N Xk .
P
Finally, assume that
P{Xk : k N} is ergodic and that P(X0 ) > 0.
Because, by (6.2.9), 0 1 Xk = P-almost surely, it follows that, P-almost
surely, Xk for some k N.
It should be noticed that, although there are far more elementary proofs, when
{Xn : n 0} is an irreducible, ergodic Markov chain on a countable state space
E, then Kacs theorem proves that the stationary measure at the state x E is
the reciprocal of the expected time that the chain takes to return to x when it
starts at x.
6.2.4. Continuous Parameter Ergodic Theory. I turn now to the setting of continuously parametrized semigroups
Thus, again
of transformations.
(, F, ) is a -finite measure space and t : t [0, )N is a measurable
semigroup of -measure preserving transformations on . That is, 0 is the
identity, s+t = s t ,
(t, ) [0, )N 7 t () is B[0,)N F-measurable,
and t = for every t [0, )N . Next, given an F-measurable F with
values in some separable Banach space E, let G(F ) be the set of with the
property that
Z
F t ()
dt < for all T (0, ).
E
[0,T )N
254
Clearly,
G(F ) = t () G(F )
and so
F
L (;E)
[0,T )N
< ,
Lp (; E) = G(F ){ = 0.
p[1,)
if G(F )
if
/ G(F ),
k
N
is defined in terms of : k N
as in Theorem 6.2.7. Then, for each
p [1, ) and F Lp (; E),
lim AT F = I(E)
F
(6.2.13)
(a.e., ).
Finally, if t : t [0, )N is ergodic, then (6.2.13) can be replaced by
lim AT F =
E [F ]
()
(a.e., ),
T >0
255
(0, )
and
sup
AT F
T >0
E
(6.2.15)
Lp (;E)
(24)N p
kF kLp (;E)
p1
for p (1, ).
for n T n + 1,
lim
sup
n
nT n+1
AT F An F
E
=0
Lp (;R)
Hence, for F L1 (; E)L (; E), (6.2.13) follows from (6.2.8). As for
the case
when () < , all that we have to do is check that I(E)
F = E F
I (a.e., ).
E F, = E A1 F, for all I.
then
But, if I,
E A1 F, =
E F t , dt
[0,1)N
Z
=
1
E F t , t
() dt = E [F, ].
[0,1)N
Finally, assume that t : t [0, )N is -ergodic. When () < , the
asserted result follows immediately from the preceding; and when () = , it
follows from the fact that I(E)
F is measurable with respect to the -completion
of I.
256
E [f ]
(a.e., ).
n
()
Exercise 6.2.18. Let F = Xk : k NN be a stationary family of random
variables on the probability space (, F, P) with values in the measurable space
NN
(E, B), and let I denote the -algebra of shift invariant BE
.
= lim An f =
(i) Take
T
Xk : kj n for all 1 j N ,
n0
N
the tail -algebra
determined
by
X
:
k
N
. Show that F1 (I) T , and
k
conclude that Xk : k NN is ergodic if T is P-trivial (i.e., P() {0, 1} for
all T ).
(ii) By combining (i), Kolmogorovs 01 Law, and the Individual Ergodic Theorem, give another derivation of The Strong Law of Large Numbers for independent, identically distributed, integrable random variables with values in a
separable Banach space.
Exercise 6.2.19. Let Xk : k N be a stationary, ergodic sequence of Rvalued, integrable random variables on (, F, P). Using the reasoning suggested
in Exercise 1.4.28, prove Guivarchs lemma:
n1
X
Xk < .
EP X1 = 0 = lim
n
k=0
257
m=1
On the other hand, it is not at all clear how to compare the size of Yn to that
of Xn in any of the Lp spaces other than p = 2.
The problem of finding such a comparison was given a definitive solution by D.
Burkholder, and I will present his solution in this section. Actually, Burkholder
solved the problem twice. His first solution was a beautiful adaptation of general
ideas and results that had been developed over the years to solve related problems in probability theory and analysis and, as such, did not yield the optimal
solution. His second approach is designed specifically to address the problem
at hand and bears little or no resemblance to familiar techniques. It is entirely
original, remarkably elementary and effective, but somewhat opaque. The approach is the outgrowth of many years of deep thinking that Burkholder devoted
to the topic, and the reader who wants to understand the path that led him to
it should consult the explanation that he wrote.1
6.3.1. Burkholders Comparison Theorem. Burkholders basic result is
the following comparison theorem.
Theorem
6.3.1 (Burkholder). Let , F, P be a probability space, Fn :
n N a non-decreasing sequence of sub--algebras of F, and E and F a pair
of (real or complex)
separable Hilbert spaces. Next, suppose that Xn , Fn , P
and Yn , Fn , P are, respectively, E- and F -valued martingales. If
kY0 kF kX0 kE and kYn Yn1 kF kXn Xn1 kE , n Z+ ,
P-almost surely, then, for each p (1, ) and n N,
(6.3.2)
Yn
p
Bp
Xn
Lp (P;E) ,
L (P;F )
where Bp (p 1)
1
.
p1
For those who want to know the secret behind this proof, Burkholder has revealed it in his
article Explorations in martingale theory and its applications for the 1989 Saint-Flour Ecole
dEt
e lectures published by Springer-Verlag, LNM 1464 (1991).
258
I may assume that both E and F are complex Hilbert spaces, since we can always
complexify them, and, in addition, that E = F , since, if that is not already the
case, I can embed them in E F . Thus, I will be making these assumptions
throughout.
The heart of the proof lies in the computations contained in the following two
lemmas.
Lemma 6.3.3. Let p (1, ) be given, set
p =
p2p (p 1)p1
2p
if p [2, )
if p (1, 2],
p
kykE + kxkE
p u(x, y),
p1
(x, y) E 2 .
(**)
(p 1)
p1
>1
if p (2, )
<1
if p (1, 2).
2p
(p 1)
p1
p p
(1 ps) (1 s) + (p 1) s
0
0
if p (2, )
if p (1, 2).
259
To this end, note that, by (**), (0) > 0 when p (2, ) and (0) < 0 when
p (1, 2). Also, for s (0, 1),
h
i
0 (s) = p (p 1)p sp1 + (1 s)p1 p2p (p 1)p1
and
h
i
00 (s) = p(p 1) (p 1)p sp2 (1 s)p2 .
In particular, we see that p1 = 0 p1 = 0. In addition, depending on whether
p (2, ) or p (1, 2), lims&0 00 (s) is negative or positive, 00 is strictly increasing or decreasing on (0, 1), and lims%1 00 (1) is positive or negative. Hence,
there exists a unique t = tp (0, 1) with the property that
< 0 if p (2, )
> 0 if p (2, )
00 (0, t)
and 00 (t, 1)
> 0 if p (1, 2)
< 0 if p (1, 2.
Moreover, because 00 (t) = 0, it is easy to see that t 0, p1 .
Now suppose that p (2, ) and consider on each of the intervals p1 , 1 ,
1
t, p , and 0, t separately. Because both and 0 vanish at p1 while 00 > 0
on p1 , 1 , it is clear that > 0 on p1 , 1 . Next, because 0 p1 = 0 and
00 t, p1 > 0, we know that is strictly decreasing on t, p1 and therefore that
t, p1 > p1 = 0. Finally, because 00 (0, t) < 0 while (0) (t) 0,
we also know that (0, t) > 0. The argument when p (1, 2) is similar, only
this time all the signs are reversed.
p2
kxkE .
one has
u(x + k, y + h) u(x, y) v(x, y) Re
y
kykF
x
,
k
, h + w(x, y) Re kxk
E
F
260
Proof: Set
(t) = t; (x, k), (y, h)
ky + thkE (p 1)kx + tkkE
kx + tkkE + ky + thkE
p1
u x + tk, y + th =
t; (x, k), (y, h)
(p 1)
if p [2, )
t; (y, h), (x, k)
if p (1, 2).
Re x(t),k
kx(t)kE
, and b(t) =
Re y(t),h
ky(t)kE
h
i
0 (t) = p(t)p2 (1 p)kx(t)kE a(t) + ky(t)kE + (2 p)kx(t)kE b(t)
h
i
= p (1 p)(t)p2 kx(t)kE a(t) + b(t) + (t)p1 b(t) .
In particular, the first expression establishes the required form for 0 (t). In
addition, from the second expression, we see that
2
00 (t)
= (p 1)(p 2) (t)p3 kx(t)kE a(t) + b(t)
p
i
h
2
2
E
+ (p 1)(t)p2 a(t) a(t) + b(t) + kx(t)k
ky(t)kE b (t) + a (t)
i
h
b (t)2
(t)p2 (p 1) a(t) + b(t) b(t) + (t) ky(t)k
E
2
= (p 1)(p 2) (t)p3 kx(t)kE a(t) + b(t)
+ (p 1)(t)p2 kkk2E khk2E + (p 2)(t)p1
b (t)2
ky(t)kE ,
p
p
where a (t) = kkk2E a(t)2 and b (t) = khk2E b(t)2 . Hence the required
properties of 00 (t) have also been established.
261
()
for each n N. Clearly, (6.3.2) for each Xn and Yn implies (6.3.2) for Xn
and Yn after one lets & 0. Finally, because there is nothing to do when the
right-hand side of (6.3.2) is infinite, let p (1, ) be given, and assume that
Xn Lp (P; E) for each n N. In particular, if u is the function defined in
Lemma 6.3.3 and v and w are those defined in Lemma 6.3.4, then
u(Xn , Yn ) L1 (P; R)
p
is the Holder conjugate of p.
for all n N, where p0 = p1
Note that, by Lemma 6.3.3, it suffices for us to show that An EP u Xn , Yn
0, n N. Since u X0 , Y0 ) 0 P-almost surely, there is no question that
A0 0. Next, assume that An 0, and, depending on whether p [2, ) or
p (1, 2], use the appropriate part of Lemma 6.3.4 to see that
i
h
An+1 EP v(Xn , Yn )Re kYYnnkE , Hn+1
E
i
h
P
Xn
+ E w(Xn , Yn )Re kXn kE , Kn+1
or
i
h
An+1 EP w(Yn , Xn )Re kYYnnkE , Hn+1
E
i
h
P
Xn
.
E v(Yn , Xn )Re kXn kE , Kn+1
E
v(Xn , Yn ) kYYnnkE
But, since
(cf. Exercise 5.1.18)
i
h
= 0.
EP v(Xn , Yn )Re kYYnnkE , Hn+1
E
Since the same reasoning shows that each of the other terms on the right-hand
side vanishes, we have now proved that An+1 0.
As an immediate consequence of Theorem (6.3.2), we have the following answer
to the question raised at the beginning of this section.
262
m=1
This is the form of his inequality which is best known and, as such, is called
Burkholders Inequality. Notice that his inequality can be viewed as a vast
generalization of Khinchines Inequality (2.3.27), although it applies only when
p (1, ).
Theorem
6.3.6 (Burkholders Inequality). Let , F, P and Fn : n
N be as in Theorem 6.3.1, and let Xn , Fn , P be a martingale with values in
the separable Hilbert space E. Then, for each p (1, ),
(6.3.7)
1
sup
Xn X0
Lp (P;E)
Bp nN
! p2 p1
X
Xn Xn1
2
EP
E
1
Bp sup
Xn X0
Lp (P;E) ,
nN
with Bp as in (6.3.2).
Proof: Let F = `2 (N; E) be the separable Hilbert space of sequences
y = x0 , . . . , xn , . . . E N
satisfying
kykF
X
0
! 12
kxn k2E
< ,
263
and define
Yn () = (X0 (), X1 () X0 (), . . . , Xn () Xn1 (), 0, 0, . . . ) F
for and n N. Obviously, Yn , Fn , P is an F -valued martingale. Moreover,
kX0 kE = kY0 kF
n N,
and therefore the right-hand side of (6.3.7) is implied by (6.3.2) while the lefthand side also follows from (6.3.2) when the roles of the Xn s and Yn s are
reversed.
Exercises for 6.3
Exercise 6.3.8. Because it arises repeatedly in the theory of stochastic integration, one of the most frequent applications of Burkholders Inequality is to
situations in which E is a separable Hilbert space and (Xn , Fn , P) is an E-valued
martingale for which one has an estimate of the form
h
1
i 2p
P
2p
<
Kp sup
E kXn Xn1 kE Fm1
+
nZ
(P;R)
for some p [1, ). To see how such an estimate gets used, let F be a second separable Hilbert space and suppose that {n : n N} is a sequence of
Hom(E; F )-valued random variables with the properties that, for each n N,
1
2p
< . Set Y0 = 0 and
n is Fn -measurable and an EP kn k2p
op
Yn =
n
X
m1 Xm Xm1
for n Z+ ,
m=1
n1
1 X 2p
a
n m=0 m
1
! 2p
Exercise 6.3.9. Return to the setting in Exercise 5.2.45, and let [0,1) denote
Lebesgue measure on [0, 1). Given f L2 ([0,1) ; C), show that, for each p
(1, ),
1
f (f, 1)L2 ( ;C)
p
(p 1)
[0,1)
L ([0,1);C)
p1
! p p1
Z
X
2 2
m (f )
dt
[0,1)
m=0
(p 1)
1
f (f, 1)L2 ( ;C)
p
.
[0,1)
L ([0,1) ;C)
p1
264
For functions f with (f, e` )L2 ([0,1) ;C) = 0 unless ` = 2m for some m N, this
estimate is a case of a famous theorem proved by Littlewood and Paley in order
to generalize Parsevals Identity to cover p 6= 2. Unfortunately, the argument
here is far too weak to give their inequality for general f s.
Exercise 6.3.10. In connection with the preceding exercise,
it is interesting
to note that there is an orthonormal basis for L2 [0,1) ; R that, as distinguished
from the trigonometric functions, can be nearly completely understood in terms
of martingale analysis. Namely, recall the Rademacher functions {Rn : n Z+ }
introduced in 1.1.2. Next, use F to denote the set of all finite subsets F of Z+ ,
and define the Walsh function WF for F F by
1
if F =
WF = Q
mF Rm if F 6= .
Finally, set A0 = and An = {1, . . . , n} for n Z+ .
(i) For each n N, let Fn be the -algebra generated by the partition
k k+1
: 0 k < 2n .
2n , 2n
Show that, for each n Z+ , WF : F An is an orthonormal
basis for the
subspace L2 [0, 1), Fn , [0,1) ; R , and conclude
from
this
that
WF : F F
forms an orthonormal basis for L2 [0,1) ; R .
(ii) Let f L1 [0, 1); R be given, and set
!
X Z
f
Xn =
f (t) WF (t) dt WF for n N.
F An
[0,1)
Using the result in (i), show that Xnf = E[0,1) [f |Fn ] and therefore that Xnf , Fn ,
[0,1) is a martingale.
In particular, Xnf f both (a.e., [0,1) ) as well as in
1
L [0,1) ; R .
(iii) Show that for each p (1, ) and f L1 [0,1) ; R with mean value 0,
(p 1) (p 1)1 kf kLp ([0,1);R)
X
X
[0,1)
n=1
F An \An1
f (s) WF (s) ds
[0,1)
p1
2 p2
WF (t) dt
265
1 + x a 1 x a
e = cosh a + x sinh a.
e +
2
2
(ii) Suppose that {Y1 , . . . , Yn } are [1, 1]-valued random variables on the probability space (, F, P) with the property that, for each 1 m n,
EP Yj1 Yjm = 0
n
n
n
X
X
Y
1
a2j ,
aj Yj
cosh aj exp
EP exp
2
j=1
j=1
j=1
!
n
2
X
R
,
P
aj Yj R exp Pn
2 j=1 a2j
j=1
R [0, ).
!
,
R [0, ).
Chapter 7
Continuous Parameter Martingales
It turns out that many of the ideas and results introduced in 5.2 can be easily
transferred to the setting of processes depending on a continuous parameter. In
addition, the resulting theory is intimately connected with Levy processes, and
particularly Brownian. In this chapter, I will give a brief introduction to this
topic and some of the techniques to which it leads.1
7.1 Continuous Parameter Martingales
There is a huge number of annoying technicalities which have to be addressed in
order to give a mathematically correct description of the continuous time theory
of martingales. Fortunately, for the applications which I will give here, I can
keep them to a minimum.
7.1.1. Progressively
Measurable
Functions. Let (, F) be a measurable
space and Ft : t [0, ) a non-decreasing family of sub--algebras. I will say
that a function X on [0, )
into a measurable
space (E, B) is progressively
measurable with respect to Ft : t [0, ) if X [0, T ] is B[0,T ] FT measurable for every T [0, ). When E is a metric space, I will say that
X : [0, ) E is right-continuous if X(s, ) = limt&s X(t, ) for every
(s, ) [0, ) and will say that it is continuous if X( , ) is continuous
for all .
Remark 7.1.1. The reader might have been expecting a slightly different definition of progressive measurability
here. Namely,
he might have thought that
one would say that X is Ft : t [0, ) -progressively measurable if it is
B[0,) F-measurable and 7 X(t, ) E is Ft -measurable for each
t [0, ). Indeed, in extrapolating from the discrete parameter setting, this
would be the first definition at which one would arrive. In fact, it was the notion
with which Doob and Ito originally worked;
and such functions were said by
them to be adapted to Ft : t [0, ) . However, it came to be realized
that there are various problems with the notion of adaptedness. For example,
even if X is adapted and f : E R is a bounded, B-measurable function, the
1
A far more thorough treatment can be found in D. Revuz and M. Yors treatise Continuous
Martingales and Brownian Motion, Springer-Verlag, Grundlehren der Mathematishen #293
(1999).
266
267
Rt
function (t, )
Y (t, ) 0 f X(s, ) ds R need not be adapted. On the
other hand, if X is progressively measurable, then Y will be also.
The following simple lemma should help to explain the virtue of progressive
measurability and its relationship to adaptedness.
Lemma 7.1.2. Let PM denote the set of A [0, ) with the property
that [0, t] A B[0,t] Ft for every t 0. Then PM is a sub--algebra of
B[0,) F and X is progressively measurable if and only if it is PM-measurable.
Furthermore, if E is a separable metric space and X : [0, ) E is a
right-continuous function, then X is progressively measurable if it is adapted.
Proof: Checking that PM is a -algebra is easy. Furthermore, for any X :
[0, ) E, T [0, ), and B,
(t, ) [0, T ] : X(t, )
= [0, T ] (t, ) [0, ) : X(t, ) },
and so X is Ft : t [0, ) -progressively measurable if and only if it is PMmeasurable. Hence, the first assertion has been proved.
Next, suppose that X is a right-continuous, adapted function. To see that X
is progressively measurable, let t [0, ) be given, and define
n
Xnt (, ) = X [2 2n]+1 t, , for (, ) [0, ) and n N.
268
1 , m RN , C RN +
1(,y)RN
1 1[0,1] (|y|) , y
RN
RN
M (dy).
If (, F, P) is a probability space and Z : [0, ) RN is a B[0,) Fmeasurable map with the properties that Z(0, ) = 0 and Z( , ) D(RN ) for
every , then {Z(t) : t 0} is a Levy process for if and only if, for each
x RN ,
(7.1.4)
exp
1(, Z(t))RN t` () , Ft , P is a martingale,
where Ft = {Z( ) : [0, t]} .
Proof: If {Z(t) : t 0} is a Levy process for , then, because Z(t) Z(s) is
independent of Fs and has characteristic function e(ts)` () ,
h
EP exp 1 , Z(t) RN t` ()
= exp
= exp
1 , Z(s)
1 , Z(s)
RN
RN
i
Fs
h
1
s` () e(st)` () EP e
s` () .
i
,Z(t)Z(s)
RN
To prove the converse assertion, observe that the defining distributional property
of a Levy process for can be summarized as the statement that Z(0, ) = 0
and, for each 0 s < t, Z(t) Z(s) is independent of {Z( ) : [0, t]} and
has distribution ts , where
c = e ` . Hence, since (7.1.4) implies that
h
i
EP exp 1 , Z(t) Z(s) RN Fs = e(ts)` () ,
RN ,
1
Trace C2 (x) + m, (x) RN
2 Z
h
i
+
(x + y) (x) 1[0,1] (|y|) y, (x) RN M (dy)
RN
269
is a martingale, where Ft = {Z( ) : [0, t]} and L is the operator
described in (7.1.5). Conversely, if Z is a progressively measurable function
satisfying Z(0, ) = 0 and Z( , ) D(RN ) for each , and if
Z t
Z(t)
L Z( ) d, Ft , P
0
(*)
where
t is the distribution of x under t , the measure determined by bt = et` .
The easiest way to check (*) is to work via Fourier transform and to use (3.2.10)
to verify that
d\
t` ()
()et` () ,
?
t () = ` ()()e
= Ld
dt
=
E L Z( ) , A d = E
L Z( ) d, A ,
s
270
1(,x)RN
, in which case L = ` (), and therefore, for
can take (x) = e
any A Fs , one gets that
Z t
i
h
u( ) d.
u(t) EP exp 1(, Z(t) RN N , A = u(s) + ` ()
R
Since this means that u(t) = e(ts)` () u(s), it follows that {Z(t) : t 0}
satisfies (7.1.4) and is therefore a Levy process for .
As an immediate consequence of the preceding we have the following characterizations of the distribution of a Levy process. In the statement that follows,
Ft is the -algebra over D(RN ) generated by {( ) : [0, t]}.
Theorem 7.1.7. Given I(RN ), let Q M1 D(RN ) be the
distribution
of a Levy process for . Then Q is the unique P M1 D(RN ) that satisfies
either one of the properties:
i
h
exp 1 , (t) RN + t` () , Ft , P
(t) (0)
L ( ) d, Ft , P
t[0,T ]
t[0,T ]
271
"
sup X(t)p
EP
t[0,T ]
1
p
EP X(T )p p ,
p1
p (0, ).
Proof: Because of Exercise 1.4.18, I need only prove the first assertion. To
this end, let T (0, ) and
apply Theorem 5.2.1 to the discrete
n N be given,
mT
, P , and observe that
parameter submartingale X 2n , F mT
n
2
sup X
mT
2n
: 0m2
% sup X(t)
as n .
t[0,T ]
Theorem 7.1.10
(Doobs Martingale Convergence Theorem). Assume
that X(t), Ft , P is a P-integrable submartingale. If
sup EP X(t)+ < ,
t[0,)
W
then there exists an F t0 Ft -measurable X = X() L1 (P; R) to which
X(t) converges P-almost surely as t . Moreover, when X(t), Ft , P is either
a non-negative submartingale or
a martingale, the convergence takes place in
1
L (P; R) if and only if the family X(t) : t [0, ) is uniformly P-integrable,
in which case X(t) EP [X | Ft ] or X(t) = EP [X | Ft ] (a.s., P) for all t [0, ),
and
1
(7.1.11)
P sup |X(t)| EP |X|, sup |X(t)| .
t0
t0
Finally, again when X(t), Ft , P is either a non-negative submartingale
or a
martingale, for each p (1, ) the family |X(t)|p : t [0, ) is uniformly Pintegrable if and only if supt[0,) kX(t)kLp (P) < , in which case X(t) X
in Lp (P; R).
Proof: To prove the initial convergence assertion, note that, by Theorem
W 5.2.15
applied to the discrete parameter process X(n), Fn , P , there is an nN Fn measurable X L1 (P; R) to which X(n) converges P-almost surely. Hence,
we need only check that limt X(t) exists in [, ] P-almost surely. To
(n)
this end, define U[a,b] () for n N and a < b to be the precise number of
times that the sequence X 2mn , : m N upcrosses the interval [a, b] (cf. the
(n)
paragraph preceding Theorem 5.2.15), observe that U[a,b] () is non-decreasing
(n)
as n increases, and set U[a,b] () = limn U[a,b] (). Note that if U[a,b] () < ,
272
then (by right-continuity), there is an s [0, ) such that either X(t, ) b for
all t s or X(t, ) a for all t s. Hence, we will know that X(t, ) converges
in [, ] for P-almost every as soon as we show that EP U[a,b] <
for every pair a < b. In addition, by (5.2.16), we know that
P
sup E
nN
(n)
U[a,b]
EP (X(t) a)+
< ,
sup
ba
t[0,)
(*)
sup |X(t)|
t[0,T ]
"
#
1 P
E |X|, sup |X(t)|
t[0,T ]
for every T (0, ). Hence, (7.1.11) follows when one lets T . But, again
from (*),
EP |X(T )|, |X(T )| EP |X|, |X(T )| EP |X|, sup |X(t)| ,
t0
and therefore, since, by (7.1.11), P supt0 |X(t)| 0 as , we can
conclude that {X(t) : t 0} is uniformly P-integrable.
Finally, if {X(T ) : T 0} is bounded in Lp (P; R) for some p (1, ), then,
by the last part of Theorem 7.1.9, supt0 |X(t)|p is P-integrable and therefore
X(t) X in Lp (P; R).
7.1.4. Stopping Times and Stopping Theorems. A stopping time
relative to a non-decreasing family {Ft : t 0} of -algebras is a map :
[0, ] with the property that { t} Ft for every t 0. Given a
stopping time , I will associate with it the -algebra F consisting of those
A such
S that A { t} Ft for every t 0. Note that, because
{ < t} = n=0 { (1 2n )t}, { < t} Ft for all t 0.
Here are a few useful facts about stopping times.
Lemma 7.1.12. Let be a stopping time. Then is F -measurable, and,
for any progressively measurable function X with
values in a measurable space
(E, B), the function
X(, ) X (), is F -measurable on { < } in
273
the sense that : () < & X(, ) F for all B. In addition,
f is again a stopping time if f : [0, ] [0, ] is a non-decreasing, rightcontinuous function satisfying f ( ) for all [0, ]. Next, suppose that
1 and 2 are a pair of stopping times. Then 1 + 2 , 1 2 , and 1 2
are all stopping times, and F1 2 F1 F2 . Finally, for any A F1 ,
A {1 2 } F1 2 .
Proof: Since { s} { t} = { s t} Ft , it is clear that is
F -measurable. Next, suppose that X is a progressively measurable
function.
To prove that X() is F -measurable, begin by checking that : (),
A Ft for any A Bt Ft . Indeed, this is obvious when A = [0, s] B for
s [0, t] and B Ft and, since these generate B[0,t] Ft , follows in general.
Now, for any t 0 and B,
A(t, ) (, ) [0, ) : , X(, ) [0, t] B[0,t] Ft ,
and therefore
{X() } { t} = : (), A(t, ) Ft .
As for f when f satisfies the stated conditions, simply note that
{f t} = { f 1 (t)} Ft ,
where f 1 (t) inf{ : f ( ) t} t.
Next suppose that 1 and 2 are two stopping times. It is trivial to see that
1 2 and 1 2 are again stopping times. In addition, if Q denotes the set of
rational numbers, then
[
{1 + 2 > t} = {1 > t}
{1 qt & 2 > (1 q)t} Ft .
qQ[0,1]
Finally, if A F1 , then
A {1 2 } {1 2 t} = A {1 t} {1 t 2 },
and therefore, since A {1 t} Ft and {1 t 2 } Ft2 Ft , we have
that A {1 2 } F1 2 .
In order to prove the continuous parameter analog of Theorems 5.2.13 and
5.2.11, I will need the following uniform integrability result.
274
Lemma 7.1.13. If X(t), Ft , P is either a martingale or a non-negative, integrable submartingale, then, for each T > 0, the set
X() : is a stopping time dominated by T
is uniformly P-integrable. Furthermore, if, in addition, {X(t) : t 0} is uniformly
7.1.10) X() = limt X(t) (a.s., P),
P-integrable and (cf. Theorem
then X() : is a stopping time is uniformly P-integrable.
Proof: Throughout, without loss in generality, I will assume that X(t), Ft , P
is a non-negative, integrable submartingale.
n
for n 0. By Lemma 7.1.12,
Given a stopping time T , define n = [2 2]+1
n
n is again a stopping time. Thus, by Theorem
5.2.13
applied to the discrete
parameter submartingale X(m2n ), Fm2n , P ,
X(n ) EP X 2n ([2n T ] + 1) Fn EP X(T + 1) Fn ,
and so
EP X(n ), X(n ) EP X(T + 1), X(n )
"
P
X(T + 1),
sup
X(t) .
t[0,T +1]
Starting from here, noting that n & as n , and applying Fatous Lemma,
we arrive at
"
#
(*)
EP X(), X() > EP X(T + 1), sup X(t) .
t[0,T +1]
Hence, since, by Theorem 7.1.9, P supt[0,T +1] X(t) tends to 0 as ,
this proves the first assertion. When {X(t) : t 0} is uniformly integrable, we
can replace (*) by
P
P
E X( T ), X( T ) > E X(), sup X(t)
t0
for any stopping time and T > 0. Hence, after another application of Fatous
Lemma, we get
P
P
E X(), X() > E X(), sup X(t) .
t0
At the same time, the first inequality in Theorem 7.1.9 can be replaced by
1
1 P
P sup X(t) E X(), sup X(t) EP [X()],
t0
t0
275
276
Theorem 7.1.16. Given I(RN ), let Q M1 D(RN ) be the distribution
of the Levy process for . Then, for each stopping time and FD(RN ) F measurable functions F : D(RN ) D(RN ) [0, ),
Z
ZZ
F , Q (d) =
1[0,) ( 0 ) F (, 0 ) Q (d)Q (d 0 ).
{<}
Q (1 ) BT
Q (BT )
EQT e 1(x,(t))RN t` () , A = EQT e 1(x,(s))RN s` () , A .
To this end, note that, by Theorem 7.1.14 applied to s + T and t + T ,
Q (BT )EQT e 1(x,(t))RN t` () , A
i
h
since
e 1(,())RN +` () 1A ( )1BT () is Fs+T -measurable.
7.1.5. An Integration by Parts Formula. In this subsection I will derive
a simple result that has many interesting applications.
277
on each interval [0, t) for which |V |(t, ) < . Next, suppose that X(t), Ft , P
is a C-valued martingale with the property that, for each (t, ) (0, ) ,
the product kX( , )k[0,t] |V |(t, ) < , and define
(R
Y (t, ) =
(0,t]
otherwise,
where, in the case when |V |(t, ) < , the integral is the Lebesgue integral of
X( , ) on [0, t] with respect to the C-valued measure determined by V ( , ).
If
h
i
EP kXk[0,T ] |V |(T ) + V (0) < for all T (0, ),
then X(t)V (t) Y (t), Ft , P is a martingale.
Proof: Without loss in generality,
I will assume
that both X and V are R
valued. To see that |V | is Ft : t [0, ) -progressively measurable, simply
observe that, by right-continuity,
[2n t]
|V |(t, ) = sup
nN
X
V
k+1
2n
t, V
k
2n ,
;
k=0
or Y (t, ) =
X(s, ) V (ds, )
(0,t]
lim
k+1
2n
t,
k+1
2n
t, V
k
2n
s,
k=[2n s]
In fact, under the stated integrability condition, the convergence in the preceding
takes place in L1 (P; R) for every t [0, ); and therefore, for any 0 s t <
278
and A Fs ,
EP Y (t) Y (s), A
[2n t]
= lim
h
EP X
k+1
2n
t,
h
EP X(t) V
k+1
2n
k+1
2n
t, V
k
2n
s,
,A
k=[2n s]
[2n t]
= lim
t, V
k
2n
s,
,A
k=[2n s]
h
i
i
= EP X(t) V (t) V (s) , A = EP X(t)V (t) X(s)V (s), A ,
and clearly this is equivalent to the asserted martingale property.
We will make frequent practical applications of Theorem 7.1.17 later, but
here I will show that it enables us to prove that there is an important dichotomy
between continuous martingales and functions of bounded variation. However,
before doing so, I need to make a small, technical digression.
A function : [0, ] is an extended stopping time relative to
Ft : t [0, ) if { < t} Ft for every t (0, ). Since { < t} Ft for any
stopping time , it is clear that every stopping time is an extended stopping time.
On the other hand, not every extended stopping time is a stopping time. To wit,
if X : [0, ) R is a right-continuous,
progressively measurable function
relative to
X( ) : [0, t] : t 0 , then = inf{t 0 : X(t) > 1} will
always be an extended stopping time but will seldom be a stopping time.
T
Lemma 7.1.18. For each t 0, set Ft+ = >t F . Then : [0, ]
is an extended stopping time if and only if it is a stopping time relative to
{Ft+ : t 0}. Moreover, if X(t), Ft , P is either a non-negative,
integrable
submartingale or a martingale, then so is X(t), Ft+ , P . In particular, if is
an extended stopping time, then X(t ), Ft+ , P is a non-negative, integrable
submartingale or a martingale.
T
Proof: The first assertion is immediate from { t} = >t { < }. To prove
the second assertion, apply right-continuity and the first uniform integrability
result in Lemma 7.1.13 to see that if 0 s < t and A Fs+ , then
EP X(s), A = lim EP X( ), A EP X(t), A ,
&s
279
XR (t)
XR ( ) XR (d ), Ft+ , P
Z
XR ( ) XR (d ) .
[0,t]2
Z
XR ( ) XR (d ) .
Hence, EP XR (t)2 = 0 for all t > 0, which means that XR ( ) 0 P-almost
surely.
The preceding result leads immediately to the following analog of the uniqueness statement in Lemma 5.2.12.
Corollary 7.1.20. Let X : R be a right-continuous, progressively
measurable function. Then, up to a P-null set, there is at most one continuous,
progressively measurable A : R such that A(0, ) = 0, A( , ) is of
locally bounded variation for P-almost every , and X(t) A(t), Ft , P is
a martingale.
The role of continuity here seems minor, but it is crucial. Namely, continuity was used in Theorem 7.1.19 only when I wanted to know that XR (t)2 =
Rt
2 0 XR ( ) XR (d ). On the other hand, it is critical.
Namely, if {N (t)
: t 0}
is the simple Poisson process in 4.2 and Ft = N ( ) : [0, t] , then it
is easy to check that N (t) t, Ft , P is a martingale, all of whose paths are of
locally bounded variation.
280
Exercise 7.1.21. The definition of stopping times and their associated algebras that I have adopted is due to E.B. Dynkin. Earlier, less ubiquitous
but more transparent, definitions appear in the work of Doob and Hunt under
the name of optional stopping times. To explain these earlier definitions, let E
be a complete, separable metric space and a non-empty collection of rightcontinuous paths : [0, ) E with the property that for all and
t [0, ), the stopped path t given by t ( ) = (t ) is again in . Similarly,
given a function : [0, ], define so that (t) = t () . Finally,
for each t [0, ), define the -algebras
Ft over to be the one generated
W
by {( ) : [0, t]}, and take F = t0 Ft . In terms of these quantities, an
optional stopping time is an F-measurable map : [0, ] such that
() t = () = ( t ), and the associated -algebra is { (t) : t 0} .
The goal of this exercise is to show that is an optional stopping time if and
only if it is a stopping time and that its associated -algebra is F .
(i) It is an easy matter (cf. Exercise 4.1.9) to check that f : R is F+
+
measurable if and only if there exists a B Z -measurable F : E Z R and a
sequence {tm : m Z+ } such that f () = F (t1 ), . . . , (tm ), . . . , from which
it is clear that an F-measurable f will be Ft -measurable for some t [0, ) if
and only if f () = f ( t ). Use this to show that every optional stopping time is
a stopping time.
(ii) Show that : [0, ] is a stopping time relative to Ft : t [0, )
if and only if it is F-measurable and, for each t [0, ), { : () t} =
{ : ( t ) t}. In addition, if is a stopping time, show that () < =
() = ( ), and therefore that () t = () = ( t ) for all t [0, ).
Thus, is an optional stopping time if and only if it is a stopping time.
Hint: In proving the second
part, check that { = t} Ft , and conclude that
1{t} () = 1{t} ( t ) for all (t, ) [0, ) .
(iii) If is a stopping time, show that F = { (t) : t 0} . Besides having
intuitive value, this shows that, at least in the situation here, F is countably
generated.
Hint: Using right-continuity, first show that
is F-measurable. Next,
given a B-measurable f : E R and t [0, ), use (ii) to show that
1[0,t] () f ( ) = 1[0,t] ( t ) f ( ( t ) , [0, ),
and conclude that { (t) : t 0} F . To prove the opposite inclusion, show that if f : R is F -measurable, then, for each t [0, ),
1{t} () f () = 1{t} ( t ) f ( t ), and thereby arrive at f () = f ( ). Fi
nally, use this together with Exercise 4.1.9 to show that f is { (t) : t 0} measurable.
281
Exercise 7.1.22. Let (, F, P) be a probability space and Ft : t [0, )
is non-decreasing family of sub--algebras of F. Denoteby F and Ft the completions of F and Ft with respect to P. If X(t), Ft , P is a submartingale or
martingale, show that X(t), Ft , P is also.
(1)
t =
y2
2
dy.
at 2
Now use the results in Exercise 3.3.14 (especially (3.3.16)) to conclude that the
1
(ii) Here is another, more conceptual way to understand the conclusion drawn
in (i) that the W (1) -distribution is a one-sided 12 -stable law. Namely, begin
by
a
a+b
a
b
a
showing that if (0) = 0 and () < , then
() = () + . As
an application of Theorem 7.1.16, conclude from this that if a denotes the W (1) distribution of a , then a+b = a ? b . In particular, this means that 1
ca = ea` , where ` is the exponent appearing in
is infinitely divisible and that
(iii) Next, use Brownian scaling to see that, for all > 0, a has the same W (1) distribution as 2 a , and use this together with part (iii) of Exercise 3.3.12 to
1
(iv) Although we know from (i) that the constant c must be 2 2 , here is an
1 2
easier way to find it. Use Exercise 7.1.23 to see that e(t) 2 t , Ft , W (1)
for every R, and apply Doobs Stopping Time Theorem and the fact that
(1)
1 2 a
W (1) ( a < ) = 1 to verify the identity EW e 2 = ea for > 0.
1
282
Q { : (t) & () t} = EQ t () , t ,
where, as usual, is determined by
c = e ` . As a consequence,
h
i
B(
)
d,
F
,
P
B(t)
t
2
0
is a martingale for all Cc (RN ; R). In this subsection, I, following Levy,1 will
give another martingale characterization of Brownian motion, this time involving
many fewer test functions. On the other hand, we will have to assume ahead of
time that B( , ) C(RN ) for every .
Theorem 7.2.1 (L
evy). Let B : [0, ) RN be a progressively measurable function satisfying
B(0, ) = 0 and B( , ) C(RN ) for every .
Then B(t), Ft , P is a Brownian motion if and only if
, B(t)
RN
2
t||2
, Ft , P
+ , B(t) RN
2
L
evys Theorem is Theorem 11.9 in Chapter VII of Doobs Stochastic Processes, Wiley (1953).
Doob uses a clever but somewhat opaque Central Limit argument. The argument given here
is far simpler and is adapted from the one introduced by H. Kunita and S. Watanabe in their
article On square integrable martingales, Nagoya Math. J. 30 (1967).
283
Proof: First suppose that B(t), Ft , P is a Brownian motion. Then, because
B(t) B(s) is independent of Fs and has distribution 0,I ,
EP B(t) B(s) Fs = 0 and EP B(t) B(t) B(s) B(s) Fs = (t s)I.
Hence, the necessity is obvious.
To prove the sufficiency, Theorem 7.1.3 says that it is enough to prove that
i
h
t||2
P
E exp 1 , B(t) RN + 2 , A
(*)
i
h
s||2
P
= E exp 1 , B(s) RN + 2 , A
i
h
Dn exp 1 n + 2n 1
i
h
2
Mn exp 1 , B(n1 ) RN + ||2 n1 .
By Taylors Theorem,
1 n +
Dn
n
2
1
2
1 n +
n
2
2
1 ||2 2
6 e 1 n +
3
n
2 .
284
X
X
EP Dn Mm , A =
EP En Mn , A
1
2 1 + ||2 e
||2
2
||2
EP n |Mn |, A 2 1 + ||2 (t s)e 2 (1+t) .
In other words, we have now proved that, for every (0, 1], the difference
||2
between the two sides of (*) is dominated by 2(1 + ||2 )(t s)e 2 (1+t) , and so
the equality in (*) has been established.
As in Theorem 7.1.19, the subtlety here is in the use of the continuity assumption. Indeed, the same example that demonstrated its importance there,
does so again here. Namely, if {N (t) : t 0} is a simple Poisson
process and
X(t) = N (t) t, then both X(t), Ft , P and X(t)2 t, Ft , P are martingales,
but X(t), Ft , P is certainly not a Brownian motion.
7.2.2. DoobMeyer Decomposition, an Easy Case. The continuous parameter analog of Lemma 5.2.12 is a highly non-trivial result, one that was
proved by P.A. Meyer and led him to his profound analysis of stochastic processes. Nonetheless, there is an important case in which Meyers result is relatively easy to prove, and that is the case proved in this subsection. However,
before getting to that result, there is a rather fussy matter to be dealt with.
m n>m
285
and so (, )
() and therefore also (, )
() are progressively
measurable functions. Hence, since B = {(, ) : () 0}, B is
progressively measurable.
Now define
X (),
if (t, )
/ B.
Clearly X( , ) is right-continuous. Moreover, because = (a.s., P), X( , )
is continuous and Xn ( , ) X( , ) uniformly on compacts for P-almost
every . Thus, it only remains to check that X is progressively measurable.
For this purpose, let BR be given, and set C = {(t, ) : X(t, ) }.
Because A and the Xn s are progressively measurable, it is clear that C A is
progressively measurable. Similarly, because B \ A is progressively measurable
and C (B \ A) equals B \ A or depending on whether 0 or 0
/ ,
C (B \ A), and therefore C B, are progressively measurable. Hence, we
now know that X B is progressively measurable. Finally, we showed earlier
that (t, )
t () is progressively
measurable, and therefore so is (t,)
[0, ) 7 t (), B. Thus, because X(t, ) = X t (), , we
are done.
Theorem 7.2.3. Let X(t), Ft , P be an R-valued, square integrable martingale with the property that X( , ) is continuous for P-almost every
. Then there is a P-almost surely unique progressively measurable function
hXi : [0, ) [0, ) such that hXi(0, ) = 0 and hXi( , ) is continuous
and non-decreasing for P-almost every , and X(t)2 hXi(t), Ft , P is a
martingale.
Proof: The uniqueness is an immediate consequence of Corollary 7.1.20.
The proof of existence, which is based on a suggestion I got from K. Ito, is
very much like that of Theorem 7.2.1. Without loss in generality, I will assume
that X(0) 0.
I begin by reducing to the case when X is P-almost surely bounded. To this
end, suppose that we know the result in this case. Given a general X and n N,
define n = inf{t 0 : |X(t)| n} and Xn (t) = X(tn ). Then, |Xn ( , )| n
and, by Doobs Inequality, n ()
% for P-almost every . Moreover, by
Corollary 7.1.15, Xn (t), Ft , P is a martingale. Thus, by our assumption, for
each n, we know hXn i exists. In addition, by Corollary 7.1.15 and uniqueness,
we know (cf. Exercise 7.2.10) that, P-almost surely, hXm i(t) = hXn i(t m ) for
all m n and t 0. Now define hXi so that hXi(t) = hXn i(t) for n t < n+1 .
Then hXi is progressively measurable and right-continuous, hXi(0) = 0, and,
P-almost surely, hXi is continuous and non-decreasing. Furthermore, X(t
286
n )2 hXi(t n ), Ft , P is a martingale for each n N. Finally, note that, by
Doobs Inequality,
EP khXik[0,t] EP kXk2[0,t] 4EP |X(t)|2 ,
and so, as n , X(t n)2 hXi(t n ) X(t)2 hXi(t) in L1 (P; R).
Hence, X(t)2 hXi(t), Ft , P is a martingale.
I now assume that |X( , )| C < for P-almost every . For each
n N, use induction to define {k,n : k 0} so that 0,n = 0, k,0 = k, and, for
(k, n) (Z+ )2 , k,n is equal to
inf{`,n1 : `,n1 > k1,n }
inf t k1,n : (t k1,n ) |X(t) X(k1,n )| n1 .
Working by induction, one sees that, for each n N, {k,n : k 0} is a nondecreasing sequence of bounded stopping times. Moreover, because X( , ) is
P-almost surely continuous, we know that, for each n N, limk k,n () =
P-almost every . Finally, the sequences {k,n : k 0} are nested in the
sense that {k,n1 : k 0} {k,n : k 0} for each n Z+ .
Set Xk,n = X k,n ) and, for k 1, k,n (t) = X t k,n X t k1,n .
Then X(t)2 = 2Mn (t) + hXin (t), where
Mn (t) =
X
k=1
k,n (t)2 .
k=1
Of course, for P-almost every , all but a finite number of terms in each of
these sums vanish. In addition, one should observe that hXin (s) hXin (t) if
s 0 and t s > n1 .
I now want to show that Mn (t), Ft , P is a P-almost surely continuous martingale for all n N, and the first step is to show for each (k, n) Z+ N,
Xk1,n k,n (t), Ft , P is a P-almost surely continuous martingale. Indeed, if
0 s < t and A Fs , then
EP Xk1,n k,n (t), A = EP Xk1,n k,n (t), A {k1,n s}
+ EP Xk1,n k,n (t), A {k1,n > s} .
287
where, in the passage to the second to last equality, I have used the fact that
Xk1,n 1A 1[k1,n ,k,n ) (s) is Fs -measurable and applied Theorem 7.1.14. At the
same time
EP Xk1,n k,n (t), A {k1,n > s}
= EP Xk1,n X(t k,n ) X(t k1,n ) , A {s < k1,n t}
= EP Xk1,n X(t) X(t) , A {s < k1,n t}
= 0 = EP Xk1,n k,n (s), A {k1,n > s} ,
where I have used the fact that Xk1,n 1A 1(s,t] (k1,n ) is Ftk1,n -measurable
and again applied Theorem 7.1.14
in getting the second
to last line. After
P
combining these, one sees that EP Xk1,n
(t),
A
=
E
Xk1,n k,n (s), A ,
k,n
which means that Xk1,n k,n (t), Ft , P is a P-almost surely continuous martingale.
Given the preceding, it is clear that, for each n and `, Mn (t `,n ), Ft , P
is a P-almost surely continuous, square integrable martingale. In addition, for
k 6= k 0 , Xk1 k,n (t `,n ) is orthogonal to Xk0 1 k0 ,n (t `,n ) in L2 (P; R).
Thus
"
#
P
2
E
sup
Mn ( ) 4EP Mn (t `,n )2
0 t`,n
=4
`
X
`
X
2
EP Xk1,n
k,n (t `,n )2 4C 2
EP k,n (t `,n )2
k=1
2 P
k=1
2
= 4C E X(t `,n )
4C E X(t) ,
P
from which it is easy to see that Mn (t), Ft , P is a P-almost surely continuous,
square integrable martingale.
I will now show that limm supn>m kMn Mm k[0,t] = 0 P-almost surely
(m)
(m)
and in L2 (P; R) for each t [0, ). To this end, define Yk1,n so that Yk1,n ()
(m)
= Xk1,n () X`1,m () when `1,m () k1,n () < `,m (). Then Yk1,n
P
(m)
(m)
1
(a.s., P), and Mn Mm = k=1 Yk1,n k,n .
is Fk1,n -measurable, |Yk1,n | m
Hence, by the same reasoning as above,
X
(m)
4
EP kMn Mm k2[0,t] 4
EP (Yk1,n )2 k,n (t)2 2 EP X(t)2 ,
m
k=1
288
Remark 7.2.4. The reader may be wondering why I chose to complicate the
preceding statement and proof by insisting that hXibe progressively measurable
with respect to the original family of -algebras Ft : t [0, ) . Indeed,
Exercise 7.1.22 shows that I could have replaced all the -algebras with their
completions, and, if I had done so, there would have been no reason not to
have taken X( , ) to be continuous and hXi( , ) to be continuous and nondecreasing for every . However, there is a price to be paid for completing
-algebras. In the first place, when one does, all statements become dependent
on the particular P with which one is dealing. Secondly, because completed algebras are nearly never countably generated, certain desirable properties can
be lost by introducing them. See, for example, Theorem 9.2.1.
By combining Theorem 7.2.3 with Theorem 7.2.1, one can show that, up to
time re-parametrization, all continuous martingales are Brownian motions. In
order to avoid technical difficulties, I will prove this result only in the simplest
case.
Corollary 7.2.5. Let X(t), Ft , P be a continuous, square integrable martingale with the properties that, for P-almost every , hXi( , ) is strictly
increasing and
exists a Brownian motion
limt hXi(t, ) = . Then there
B(t), Ft0 , P such that X(t) = X(0) + B hXi(t) , t [0, ) P-almost surely. In
particular,
X(t)
X(t)
= 1 = lim q
lim q
t
2hXi(t) log(2) hXi(t)
2hXi(t) log(2) hXi(t)
P-almost surely.
Proof: Clearly, given the first part, the last assertion is a trivial application of
Exercise 4.3.15.
After replacing F and the Ft s by their completions and applying Exercise
7.1.22, I may and will assume that X(0, ) = 0, X( , ) is continuous, hXi( , )
is continuous and strictly increasing, and limt hXi(t, ) = for every .
Next, for each (t, ) [0, ), set t () = hXi1 (t, ), where hXi1 ( , ) is the
inverse of hXi( , ). Clearly, for each , t
t () is a continuous, strictly
increasing function that tends to infinity as t . Moreover, because hXi is
progressively measurable, t is a stopping time for each t [0, ). Now set
289
B(t) = X(t ). Since it is obvious that X(t) = B hXi(t) , all that I have to
show is that B(t), Ft0 , P is a Brownian motion for some non-decreasing family
{Ft0 : t 0} of sub--algebras.
Trivially, B(0, ) = 0 and B( , ) is continuous for all . In addition, B(t) is Ft -measurable, and so B is progressively measurable with respect
to {Ft : t 0}. Thus, by Theorem
7.2.1, I will be done once I show that
2
B(t), Ft , P and B(t) t, Ft , P are martingales. To this end, first observe
that
"
#
"
#
EP
sup X( )2 = lim EP
[0,t ]
sup
X( )2
[0,T t ]
4 lim EP X(T t )2 4 lim EP hXi(T t ) 4t.
T
Thus, limT X(T t ) B(t) in L2 (P; R). Now let 0 s < t and A Fs
be given. Then, for each T > 0, AT A {s T } FT s , and so, by
Theorem 7.1.14,
EP X(T t ), AT = EP X(T s ), AT
and
EP X(T t )2 hXi(T t ), AT = EP X(T s )2 hXi(T s ), AT .
Now let T , and apply the preceding convergence assertion to get the
desired conclusion.
7.2.3. Burkholders Inequality Again. In this subsection we will see what
Burkholders Inequality looks like in the continuous parameter setting, a result
whose importance for the theory of stochastic integration is hard to overstate.
Theorem 7.2.6 (Burkholder). Let X(t), Ft , P be a P-almost surely continuous, square integrable martingale. Then, for each p (1, ) and t [0, )
(cf. (6.3.2)),
p1
(7.2.7) Bp1 kX(t) X(0)kLp (P;R) EP hX(t)i 2 p Bp kX(t) X(0)kLp (P;R) .
Proof: After completing the -algebras if necessary, I may (cf. Exercise 7.1.22)
and will assume that X( , ) is continuous and that hXi( , ) is continuous and
non-decreasing for every . In addition, I may and will assume that X(0) =
0. Finally, I will assume that X is bounded. To justify this last assumption, let
n = inf{t 0 : |X(t)| n}, set Xn (t) = X(t n ), and use Exercise 7.2.10 to
see that one can take hXn i = hXi(t n ). Hence, if we know (7.2.7) for bounded
martingales, then
p1
Bp1 kX(t n )kLp (P;R) EP hXi(t n ) 2 p Bp kX(t n )kLp (P;R)
290
for all n 1. Since hXi is non-decreasing, we can apply Fatous Lemma to the
preceding and thereby get
p1
kX(t)kLp (P;R) lim kX(t n )kLp (P;R) Bp EP hXi(t) 2 p ,
n
which is the left-hand side of (7.2.7). To get the right-hand side, note that either
kX(t)kLp (P;R) = , in which case there is nothing to do, or kX(t)kLp (P;R) < ,
in which case, by the second half of Theorem 7.1.9, X(t n ) X(t) in
Lp (P; R) and therefore
p1
p1
EP hXi(t) 2 p = lim EP hXi(t n ) 2 p
n
Proceeding under the above assumptions and referring to the notation in the
proof of Theorem 7.2.3, begin by observing that, for any t [0, ) and n
N, Theorem 7.1.14 shows that X(t k,n ), Ftk,n , P is a discrete parameter
martingale indexed by k N. In addition, k,n = t for all but a finite number
of ks. Hence, by (6.3.7) applied to X(t k,n ), Ftk,n , P ,
p1
Bp1 kX(t)kLp (P;R) EP hXin (t) 2 p Bp kX(t)kLp (P;R)
for all n N.
In particular, this shows that supn0 khXin (t)kLp (P;R) < for every p (1, ),
and therefore, since hXin (t) hXi(t) (a.s.,
P), this is more than enough to
p
p
P
P
2
2
verify that E hXin (t) E hXi(t) for every p (1, ).
t
1 2
2 x F
X( ) hXi(d ), Ft , P
291
Exercise 7.2.9. Let X(t), Ft , P be a continuous, square integrable martingale with X(0) = 0, and assume that there exists a non-decreasing function
A : [0, ) [0, ) such that hXi(t) A(t) (a.s.,
P) for each t [0, ). The
goal of this exercise is to show that E(t), Ft , P is a martingale when
E(t) = exp X(t) 12 hXi(t) .
(i) Given R (0, ), set R = inf{t 0 : |X(t)| R}, and show that
!
Z
tR
eX(tR )
1
2
eX( ) dhXi, Ft , P
is a martingale.
Hint: Choose F Cc (R; R) so that F (x) = ex for x [2R, 2R], apply
Exercise 7.2.8 to this F , and then use Doobs Stopping Time Theorem.
1
2
2 hXi(t)
.
[0,t]
sup E ( ) eR
2
2
A(t)
eR+
2
2
A(t)
[0,t]
R2
P kXk[0,t] R 2e 2A(t) .
Finally, given this estimate, show that the conclusion in Exercise 7.2.8 continues
to hold for any F C 2 (R; C) whose second derivative has at most exponential
growth.
292
Exercise 7.2.13.
Given a pair
continuous martingales
of square integrable,
hX+Y ihXY i
, and show that
X(t), Ft , P and Y (t), Ft , P , set hX, Y i =
4
X(t)Y (t) hX, Y i(t), Ft , P is a martingale. Further, show that hX, Y i is
uniquely determined up to a P-null set by this property together with the facts
that hX, Y i(0, ) = 0 and hX, Y i( , ) is continuous and has locally bounded
variation for P-almost every .
Exercise 7.2.14. Let B(t), Ft , P be an RN -valued Brownian motion. Given
f, g Cb1,2 [0, ) RN ; R , set
t
X(t) = f t, B(t)
+ 12 f , B( ) d,
+ 12 g , B( ) d,
Y (t) = g t, B(t)
0
f g , B( ) d.
hX, Y i(t) =
0
2
+ 12 f , B( ) d
2X(t)
0
Z
+ 12 f , B( ) d
2
,
1
, C RN +
2
Z
RN
cos , y RN 1 M (dy)
293
Z(t)
2Z(t ) Z(t) =
2Z() Z(t) if t,
: t 0} is again a Levy process for .
then {Z(t)
Proof: According to Theorem 7.1.3, all that I have to show is that
t`
()
,
F
,
P
exp 1 (, Z(t)
t
RN
t`
()
,
A
s}
EP exp 1 (, Z(t)
RN
i
h
= EP e2 1(,Z(s))RN exp 1 (, Z(t) RN t` () , A { s}
i
h
= EP e2 1(,Z(s))RN exp 1 , Z(s) RN s` () , A { s}
i
h
t`
()
,
A
s}
.
= EP exp 1 , Z(s)
N
R
Similarly,
i
h
t`
()
,
A
{
>
s}
EP exp 1 , Z(t)
RN
i
h
= EP e2 1(,Z(t))RN exp 1 (, Z(t) RN t` () , A { > s}
i
h
= EP exp 1 , Z(t ) RN (t )` () , A { > s}
i
h
= EP exp 1 , Z(s ) RN (s )` () , A { > s}
i
h
s`
()
,
A
{
>
s}
.
= EP exp 1 , Z(s)
N
R
294
t
=
P
Z(t)
a
&
t
+
a
a
P Z(t) > a , one gets P a t 2P Z(t) a , a conclusion that also could
have been reached via Theorem 1.4.13.
7.3.2. Reflected Brownian Motion. The considerations in the preceding
subsection are most interesting
when applied to R-valued Brownian motion.
Thus, let B(t), Ft , P be an R-valued Brownian motion. To appreciate the
improvements that can be made in the calculations just made, again take a =
inf{t 0 : B(t) a} for some a > 0. Then, because Brownian paths are
continuous, a < = B(a ) = a and so, since P(a < ) = 1, we can say
that
(7.3.2) P B(t) x & a t = P B(t) 2ax for (t, x) [0, )(, a].
In particular, by taking x = a and using P B(t) a = P B(t) a & a t ,
we recover the result in Exercise 4.3.12 that
P a t = 2P B(t) a .
A more interesting application of Lemma 7.3.1 to Brownian motion is to the
case when is the exit time from an interval other than a half-line.
Theorem 7.3.3. Let a1 < 0 < a2 be given, define (a1 ,a2 ) = inf{t 0 : B(t)
/
(a1 ,a2 )
(a1 ,a2 )
(a1 , a2 )}, and set Ai (t) = {
t & B(
) = ai } for i {1, 2}. Then,
for B[a1 ,) ,
0 P {B(t) } A1 (t) P {B(t) 2(a2 a1 ) + } A1 (t)
= P B(t) 2a1 P B(t) 2(a2 a1 ) +
and, for B(,a2 ] ,
0 P {B(t) } A2 (t) P {B(t) 2(a2 a1 ) + } A2 (t)
= P B(t) 2a2 P B(t) 2(a2 a1 ) + .
Hence, for B[a1 ,) , P {B(t) } A1 (t) equals
h
X
i
0,t 2a1 + 2(m 1)(a2 a1 ) 0,t + 2m(a2 a1 )
m=1
and, for B(,a2 ] , P {B(t) } A2 (t) equals
h
X
i
0,t 2a2 2(m 1)(a2 a1 ) 0,t 2m(a2 a1 ) ,
m=1
where in both cases the convergence is uniform with respect t in compacts and
B(a1 ,a2 ) .
295
i
P B(t) 2a1 2(m 1)(a2 a1 ) P B(t) 2m(a2 a1 ) +
m=1
+ P {B(t) 2M (a2 a1 ) + } A1 (t)
for all B[a1 ,) . The same line of reasoning applies when B(,a2 ] and
A1 (t) is replaced by A2 (t).
Perhaps the most useful consequence of the preceding is the following corollary.
296
(7.3.5)
P (s + t, x, ) =
I
Next, set
g(t, x) =
g(t, x + 4m),
x2
mZ
and
p(1,1) (t, x, y) = g(t, y x) g(t, y + x + 2)
Then p(1,1) is a smooth function that is symmetric in (x, y), strictly positive
on (0, ) (0, 1)2 , and vanishes when x {1, 1}. Finally, if
pI (t, x, y) = r1 p(1,1) r2 , r1 (x c), r1 (y c) ,
(t, x, y) (0, ) I 2 ,
then
I
(7.3.6)
p (s + t, x, y) =
297
where, in the passage to the second line, I have used Brownian scaling. Now,
use the last part of Theorem 7.3.3, the symmetry of 0,r2 t , and elementary
rearrangement of terms to arrive first at
P I (t, x, ) =
Xh
i
r2 t 4m + r1 ( x) r2 t 4m + 2 + r1 ( + x 2c) ,
mZ
and then at P I (t, x, dy) = pI (t, x, y) dy. Given this and (7.3.5), (7.3.6) is obvious.
Turning to the properties of p(1,1) (t, x, y), both its symmetry and smoothness are clear. In addition, as the density for P (1,1) (t, x. ), it is non-negative,
and, because x
g(t, x) is periodic with period 4, it is easy to see that
(1,1)
p
(t, 1, y) = 0. Thus, everything comes down to proving that p(1,1) (t, x, y)
> 0 for (t, x, y) (0, ) (1, 1)2 . To this end, first observe that, after rearranging terms, one can write p(1,1) (t, x, y) as
g(t,y x) g(t, y + x) + g(t, 2 x y)
h
X
+
g(t, y x + 4m) g(t, y + x + 2 + 4m)
m=1
i
+ g(t, y x 4m) g(t, y + x 2 4m) .
Since each of the terms in the sum over m Z+ is positive, we have that
2(1|x|)(1|y|)
t
1 2e g(t, y x)
p(1,1) (t, x, y) > g(t, y x) 1 2e
if t 2(1 |x|)(1 |y|). Hence, for each (0, 1), p(1,1) (t, x, y) > 0 for all
(t, x, y) [0, 22 ] [1 + , 1 ]2 . Finally, to handle x, y [1 + , 1 ] and
t > 22 , apply (7.3.6) with I = (1, 1) to see that
p
(1,1)
(m + 1) , x, y)
|z|(1)
and use this and induction to see that p(1,1) (m2 , x, y) > 0 for all m 1. Thus,
if n Z+ is chosen so that n2 < t (n + 1)2 , then another application of
(7.3.6) shows that
(1,1)
Z
(t, x, y)
|z|(1)
298
P (s + t, x, ) =
G
(N )
0,(txG )I x (xG ) , xG .
. This is the probabilistic version of Duhamels Formula, which we will see again
in 10.3.1.
(iii) As a consequence of (ii), show that there is a Borel measurable function
pG : (0, ) G2 [0, ) such that (t, y)
pG (t, x, y) is continuous for
each x G and P G (t, x, dy) = pG (t, x, y) dy for each (t, x) (0, ) G. In
particular, use this in conjunction with (i) to conclude that
Z
G
p (s + t, x, y) =
pG (t, z, y)pG (s, x, z) dz.
G
N
||2
2
(iv) Given c = (c1 , . . . , cN ) RN and r > 0, let Q(c, r) denote the open cube
QN
i=1 (ci r, ci + r), and show that (cf. Corollary 7.3.4)
pQ(c,r) (t, x, y) =
N
Y
i=1
Chapter 8
Gaussian Measures on a Banach Space
See I.E. Segals Distributions in Hilbert space and canonical systems of operators, T.A.M.S.,
88 (1958) and L. Grosss Abstract Wiener spaces, Proc. 5th Berkeley Symp. on Prob. &
Stat., 2 (1965), Univ. of California Press. A good exposition of this topic can be found in
H.-H. Kuos Gaussian Measures in Banach Spaces, Springer-Verlag, Math. Lec. Notes., # 463
(1975).
299
300
Moreover, the dual space (RN ) of (RN ) can be identified with the space
of RN -valued, Borel measures on [0, ) with the properties that ({0}) = 0
and 1
Z
kk(RN )
(1 + t) ||(dt) < ,
C(RN ) 7 kk(RN ) sup
[0,)
(t) (dt).
[0,)
[
n
o
\
A m, n1 : (0) = 0 BC(RN ) .
n=1 m=1
In order to analyze the space (RN ), k k(RN ) , define
N
N
N
F : (R ) C0 R; R C R; R : lim |(s)| = 0
|s|
by
(es )
,
F () (s) =
1 + es
1
s R.
301
As is well known, C0 R; RN with the uniform norm is a separable Banach
space,
N
N
and it is obvious that F is an isometry from (R ) onto
C0 R; R . Moreover,
by the Riesz Representation Theorem for C0 R; RN , one knows that the dual
of C0 R; RN is isometric to the space of totally finite, RN -valued measures
on R; BR with the norm given by total variation. Hence, the identification
kBk2(RN )
X
n=0
4
2
n+1
n+2
P
sup |B(t)|
0t2n
P
sup |B(t)|
32EP |B(1)|2 = 32N.
0t1
n=0
(x ) =
exp 1 hx, x i (dx), x E ,
E
then
is a continuous function of weak* convergence on , and
uniquely
determines in the sense that if is a second element of M1 () and
= ,
then = .
Proof: Since it is clear that each of the maps x E 7 hx, x i R is
continuous and therefore BE -measurable, the first assertion will follow as soon
302
x E.
nZ+
m xm
for = (1 , . . . , n ) Rn .
X () =
m=1
I will now compute the Fourier transform of W (N) . To this end, first recall
that, for an RN -valued Brownian motion, { , B(t) RN : t 0 and RN
spans a Gaussian family G(B) in L2 (P; R). Hence, span , (t) : t
0 and RN
is a Gaussian family in L2 (W (N ) ; R). From this, combined
with an easy limit argument using Riemann sum approximations, one sees that,
ZZ
1
\
(N ) () = exp
s t (ds) (dt) , (RN ) .
(8.1.3)
W
2
[0,)2
303
8.1.2. The Classical CameronMartin Space. From the Gaussian standpoint, it is extremely unfortunate that the natural home for Wiener measure is
a Banach space rather than a Hilbert space. Indeed, in finite dimensions, every
centered, Gaussian measure with non-degenerate covariance can be thought of
as the canonical, or standard, Gaussian measure on a Hilbert space. Namely, if
0,C is the Gaussian measure on RN with mean 0 and non-degenerate covariance
C, consider RN as a Hilbert space H with inner product (g, h)H = (g, Ch)RN ,
and take H to be the natural Lebesgue measure there: the one that assigns
measure 1 to a unit cube in H or, equivalently, the one obtained by pushing the
1
usual Lebesgue measure RN forward under the linear transformation C 2 . Then
we can write
khk2
1
2H
H (dh)
e
0,C (dh) =
N
(2) 2
and
2
d
0,C (h) = e
khk
H
2
!2
Z
n
X
(t
)
(t
)
t
t
m
m1
m
m1
d(t1 ) d(tn ).
exp
tm tm1
2
A
m=1
Obviously, nothing very significant has happened yet, since nothing very exciting has been done yet. However, if we now close our eyes, suspend our disbelief,
and pass to the limit as n tends to infinity and the tk s become dense, we arrive
at Feynmans representation 2 of Wieners measure:
#
"
Z
2
1
1
(N )
dt d,
(t)
(8.1.4)
W
d) = exp
2 [0,)
Z
2
In truth, Feynman himself never dabbled in considerations so mundane as the ones that
304
that (RN ) can be identified as a subspace of H1 (RN ). That is, for each
for all h H1 (RN ), and in the present setting it is easy to give a concrete
h(t) (dt) =
(0,)
Z
=
(0,)
(0,)
!
) d
h(
(0,t)
) (, ) d = h, h 1 N ,
h(
H (R )
where
Z
h (t) =
(0,t]
(, ) d.
(dt)
305
Moreover,
kh k2H1 (RN ) =
(, ) |2 d =
(0,)
ZZ
(0,)
(ds) (dt) d
(,)2
ZZ
=
s t (ds) (dt).
(0,)2
Hence, by (8.1.3),
\
(N ) () = exp
W
(8.1.5)
kh k2H(RN )
!
,
(RN ) .
[0,T ]
where the integral in the last expression is taken in the sense of Riemann
Stieltjes. Next, apply the integration by part formula3 to conclude that t
(t, ) is RiemannStieltjes integrable with respect to t
(t) and that
Z T
Z T
Hence, since
|(T )|
lim |(T )|||(T, ) lim
T
T 1 + T
Z
(8.1.6)
h, i = lim
Z
(1 + t) ||(dt) = 0,
(0,)
h (t) d(t),
See, for example, Theorem 1.2.7 in my A Concise Introduction to the Theory of Integration,
Birkh
auser (1999).
306
[
A(g1 , . . . , gn ) : n Z+ and g1 , . . . , gn H
is an algebra that generates BH . Show that there always exists a finitely additive
WH on A that is uniquely determined by the properties that it is -additive on
A(g1 , . . . , gn ) for every n Z+ and {g1 , . . . , gn } H and that
Z
i
h
kgk2H
, g H.
exp 1 (h, g)H WH (dh) = exp
2
H
On the other hand, as we already know, this finitely additive measure admits a
countably additive extension to BH if and only if H is finite dimensional.
8.2 A Structure Theorem for Gaussian Measures
Say that a centered Gaussian
measure
W on a separable Banach space E is
non-degenerate if EW hx, x i2 > 0 unless x = 0. (See Exercise 8.2.11.) In
this section I will show that any non-degenerate, centered Gaussian measure W
on a separable Banach space E shares the same basic structure that W (N ) has
on (RN ). In particular, I will show that there is always a Hilbert space H E
for which W is the standard Gauss measure in the same sense that W (N ) was
shown in 8.1.2 to be the standard Gauss measure for H1 (RN ).
8.2.1. Ferniques Theorem. In order to carry out my program, I need a
basic integrability result about Banach spacevalued, Gaussian random variables. The one that I will use is due to X. Fernique, and his is arguably the
most singularly beautiful result in the theory of Gaussian measures on a Banach
space.
Theorem 8.2.1 (Ferniques Theorem). Let E be a real, separable Banach
space, and suppose that X is an E-valued random variable that is centered and
Gaussian in the sense that, for each x E , hX, x i is a centered, R-valued
Gaussian random variable. If R = inf{r : P(kXkE r) 34 )}, then
(8.2.2)
2n
h kXk2E i
X
1
e
.
E e 18R2 K e 2 +
3
n=0
307
Proof: After enlarging the sample space if necessary, I may and will assume
that there is an E-valued random variable X 0 that is independent of X and has
1
1
the same distribution as X. Set Y = 2 2 (X + X 0 ) and Y 0 = 2 2 (X X 0 ).
Then the pair (Y, Y 0 ) has the same distribution as the pair (X, X 0 ). Indeed, by
2
Lemma 8.1.2, this
random
variable
comes down to showing that the R -valued
Now suppose that P kXk R
1
tn = R + 2 2 tn1 for n 1. Then
3
4,
2
P kXkE R P kXkE tn P kXkE tn1
and therefore
P kXkE tn
P kXkE R
P kXkE tn1
P kXkE R
!2
!2n
P kXkE R
P kXkE R
P kXkE tn
P kXkE R
n+1
2 1
1
2 2 1
32
n+1
2
R, that P kXkE 32
n+1
2
n
R 32 .
Hence,
h kXk2E i
X
n+1
n
n
1
e2 P 32 2 R kXkE 32 2 R
EP e 18R2 e 2 P kXkE 3R +
n=0
1
e2 +
X
n=0
n
e 2
= K.
8.2.2. The Basic Structure Theorem. I will now abstract the relationship,
proved in 8.1.2, between (RN ), H1 (RN ), and W (N ) , and for this purpose I
will need the following simple lemma.
308
Lemma 8.2.3. Let E be a separable, real Banach space, and suppose that
H E is a real Hilbert space that is continuously embedded as a dense subspace
of E.
(i) For each x E there is a unique hx H with the property that
h, hx H = hh, x i for all h H, and the map x E 7 hx H is
linear, continuous, one-to-one, and onto a dense subspace of H.
(ii) If x E, then x H if and only if there is a K < such that |hx, x i|
Kkhx kH for all x E . Moreover, for each h H, khkH = sup{hh, x i : x
E & kx kE 1}.
(iii) If L is a weak* dense subspace of E , then there exists a sequence {xn :
n 0} L such that {hxn : n 0} is an orthonormal basis for H. Moreover,
P
if x E, then x H if and only if n=0 hx, xn i2 < . Finally,
h, h0
H
n=0
H
= hh, x i,
x E ,
309
h, h
H
h, hxn
H
h , hxn
n=0
H
hh, xn ihh0 , xn i.
n=0
P
P
2
2
2
In particular,
n=0 hx, xn i < ,
P khkH = n=0 hh, xn i . Finally, if x E and
c ) = e
W(x
khx k2
H
2
for all x E .
The terminology is justified by the fact, demonstrated at the end of 8.1.2,
that H1 (RN ), (RN ), W (N ) is an abstract Wiener space. The concept of an
abstract Wiener space was introduced by Gross, although his description was
somewhat different from the one just given (cf. Theorem 8.3.9 for a reconciliation
of mine with his definition).
Theorem 8.2.5. Suppose that E is a separable, real Banach space and that
W M1 (E) is a centered Gaussian measure that is non-degenerate. Then there
exists a unique Hilbert space H such that (H, E, W) is an abstract Wiener space.
q
Proof: By Ferniques Theorem, we know that C EW kxk2E < .
To understand the proof of existence, it is best to start with the proof of
uniqueness. Thus, suppose that H is a Hilbert space for which (E, H, W) is an
abstract Wiener space. Then, for all x , y E , hhx , y i = (hx , hy )H =
hhy , x i. In addition,
hhx , x i =
khx k2H
Z
=
hx, x i2 W(dx),
310
hhx , y i =
xhx, x i W(dx), y
for all y E ,
and so
Z
(***)
hx =
xhx, x i W(dx).
L (W;R)
m n>m
311
c ) is continuous
Proof: To prove the initial assertion, remember that x
W(x
and so hxk hx in H.
Given the first assertion, the compactness of {hx : x BE (0, R)} in H
follows from the compactness (cf. Exercise 5.1.19) of BE (0, R) in the weak*
topology. To see that BH (0, R) is compact in E, again apply Exercise 5.1.19 to
check that BH (0, R) is compact in the weak topology on H. Therefore, all that
we have to show is that the embedding map h H 7 h E is continuous
from the weak topology on H into the
strong topology on E. Thus, suppose
that hk h weakly in H. Because hx : x BE (0, 1) is compact in H,
for each > 0 there exist an n Z+ and a {x1 , . . . , xn } BE (0, 1) such that
n
[
BH (hxm , ).
Now choose ` so that max1mn |hhk h, xm i| < for all k `. Then, for any
x BE (0, 1) and all k `,
|hhk h, x i| + min hk h, hx hxm H + 2 sup khk kH .
1mn
k1
Since, by the uniform boundedness principle, supk1 khk kH < , this proves
that khk hkE = sup{hhk h, x i : x BE (0, 1)} 0 as k .
S
Because H = 1 BH (0, n) and BH (0, n) is a compact subset of E for each
n Z+ , it is clear that H BE . To see that W(H) = 0 when E is infinite
dimensional, choose {xn : n 0} as in the final part of Lemma 8.2.3, and
set Xn (x) = hx, xn i. Then the Xn s are an infinite
P sequence of independent,
centered, Gaussians with mean value 1, and so n=0 Xn2 = W-almost surely.
Hence, by Lemma 8.2.3, W-almost no x is in H.
Turning to the map I, define I(hx ) = h , x i. Then, for each x , I(hx ) is
a centered Gaussian with variance khx k2H , and so I is a linear isometry from
312
khk2
H
EW e 1 I(h) = e 2 ,
h H,
1 khk2H
e 2 H (dh),
Z
That (8.2.8) is correct was proved for the classical Wiener space by Cameron
and Martin, and for this reason it is called the CameronMartin formula. In
fact, one has the following result, the second half of which is due to Segal.
313
(*)
2
1 I(h1 )+2 I(h2 )
22
1
2
2
kh1 kH + 1 2 h1 , h2 H + kh2 kH
E e
= exp
2
2
W
for all 1 , 2 C. Indeed, this is obvious when 1 and 2 are pure imaginary,
and, since both sides are entire functions of (1 , 2 ) C2 , it follows in general
by analytic
continuation. In particular, by taking h1 = g, 1 = 1, h2 = hx , and
.
Y (x) = exp hy, x ihx, x i
2
Hence,
2
1
Y EW R Fx EW R 2 Fx ,
and so (cf. Exercise 8.2.19)
hy, x i2
exp
8
1
1
= EW Y 2 EW R 2 (0, 1].
314
Exercise 8.2.11. Let E be a separable Banach space and W a centered Gaussian measure on E, but do not assume
that
W is non-degenerate. Denote by N
the set of x E for which EW hx, x i2 = 0, and set
= x E : hx, x i = 0 for all x N .
E
is closed, that W(E)
= 1, and that W E
is a non-degenerate,
Show that E
if hx, x i = 0 for all x C. For this purpose, recall that, by Exercise 5.1.19, E
with the weak* topology is second countable and therefore that N is separable
with respect to the weak* topology.
Exercise 8.2.12. Let {xP
separable Banach space
n : n 0} be a sequence in the P
E with the property that n=0 kxn kE < . Show that n=0 |n |kxP
n k < for
N
0,1
-almost every RN , and define X : RN E so that X() = n=0 n xn
P
if n=0 |n |kxn kE < and X() = 0 otherwise. Show that the distribution
of X is a centered, Gaussian measure on E. In addition, show that is
non-degenerate if and only if the span of {xn : n 0} is dense in E.
Exercise 8.2.13. Here an application of Ferniques Theorem to functional analysis. Let E and F be a pair of separable Banach spaces and a Borel measurable,
linear map from E to F . Given a centered, Gaussian E-valued random variable
X, use Exercise 2.3.21 see that X is an F -valued, a centered Gaussian random variable, and apply Ferniques Theorem to conclude that X is a square
integrable and has mean value 0. Next, suppose that is not continuous, and
choose {xn : n 0} E and {yn : n 0} F so that kxn kE = 1 = kyn kF
and h(xn ), yn i n + 13 . Using Exercise 8.2.12, show that there exist centered, Gaussian F -valued random variables {Xn : n 0},P
{X n : n 0},
N
2
and X under 0,1 such that Xn () = (n + 1) n xn , X() = n=0 Xn (), and
N
X n () = X() Xn () for 0,1
-almost every RN . Show that
Z
N
k
h X(), yn i 0,1
(d)
Z
N
h Xn (), yn i 0,1
(d) (n + 1),
X()k2F
N
0,1
(d)
N
and thereby arrive at the contradiction that X
/ L2 (0,1
; F ). Conclude that
every Borel measurable, linear map from E to F is continuous. Notice that,
as a consequence, we know that the PaleyWiener integral I(h) of an h in the
CameronMartin space is equal W-almost everywhere to a Borel measurable,
linear function if and only if h = hx for some x E .
315
Hint: Using Exercise 8.2.11, reduce to the case when W is non-degenerate. For
this case, let H be the CameronMartin space for W on E, and show that
i
h
2
2
EP e 1hS,x i = e 2 khx kH for all x E .
Exercise 8.2.15. Referring to the setting in Lemma 8.2.3, show that there is a
(n)
sequence {k kE : n 0} of norms on E each of which is commensurate with
(N )
k kE (i.e., Cn1 k k k kE Cn k k for some Cn [1, )) such that, for
each R > 0,
(n)
316
Exercise 8.2.18. Given (RN ) , I pointed out at the end of 8.1.2 that the
PaleyWiener integral
[I(h )]() can be interpreted as the RiemannStieltjes
integral of (s, ) with respect to (s). In this exercise, I will use this observation as the starting point for what is called stochastic integration.
(i) Given (RN ) and t > 0, set t (d ) = 1[0,t) ( )(d ) + t [t, ) , and
show that for all (RN )
h, t i =
(, ) d( ),
f ( ) d( ),
0
where again the integral on the right is RiemannStieltjes. Use this to see that
the process
Z t
f ( ) d( ) : t 0
0
Z t
2
B
|f ( )| d : t 0 ,
0
f ( ) d( ) : t 0 .
Of course, unless f has bounded variation, the integrals in the preceding are
no longer interpretable as RiemannStieltjes integrals. In fact, they not even
defined by but only as a stochastic process. For this reason, they are called
stochastic integrals.
317
N
for each n N, and show that F W = 0,1
and (F y ) W = 0 an ,1 , where
Q
N
an = hy, xn i. Conclude from this that (y ) W W if 0,1
0 an ,1 . Finally,
P
use this together with Exercise 5.2.42 to see that (y ) W W if 0 a2m = ,
which, by Lemma 8.2.3, will be the case if y
/ H.
318
1 kF 1 h0
k2
1 kh0
k2
(x0 ) H 0 ,
(x0 ) H = e 2
= e 2 khF > (x0 ) kH = e 2
which completes the proof that H 0 , E 0 , F W is an abstract Wiener space.
Theorem 8.3.1 says that there is a one-to-one correspondence between the abstract Wiener spaces associated with one Hilbert space and the abstract Wiener
spaces associated with any other. In particular, it allows us to prove the theorem
of Gross which states that every Hilbert space is the CameronMartin space for
some abstract Wiener space.
319
To convince oneself that this line of reasoning has a chance of leading somewhere, one should observe that Levys construction corresponds to a particular choice of the orthonormal basis {hm : m 0}.1 To see this, determine
{h k,n : (k, n) N2 } by
1
on k21n , (2k + 1)2n
n1
h k,0 = 1[k,k+1) and h k,n = 2 2
1 on (2k + 1)2n , (k + 1)21n
0
elsewhere
for n 1. Clearly, the h k,n s are orthonormal in L2 [0, ); R . In addition, for
each n N, the span of {h k,n : k N} equals that of {1[k2n ,(k+1)2n ) : k N}.
Perhaps the easiest way to check this is to do so by dimension counting. That
is, for a given (`, n) N2 , note that
n X
hk,m (t)Xk,m
m=0 k=0
X
X
2 2 sin n(t k)
Xk,n ,
(t k)1[k,k+1) (t)Xk,0 +
1[k,k+1) (t)
n
+
k=0
(k,n)NZ
320
where again {Xk,n : (k, n) N2 } is a family of independent, RN -valued, N (0, I)random variables. The reason why Levys choice is easier to handle than Wieners
is that, in Levys case, for each n Z+ and t [0, ), hk,n (t) 6= 0 for precisely
one k N. Wieners choice has no such property.
With these preliminaries, the following theorem should come as no surprise.
Theorem 8.3.3. Let H be an infinite dimensional, separable, real Hilbert
space and E a Banach space into which H is continuously embedded as a dense
subspace. If for some orthonormal basis {hm : m 0} in H the series
(8.3.4)
m hm converges in E
m=0
N
for 0,1
-almost every = (0 , . . . , m , . . . ) RN
and if S : RN E is given by
P
m=0 m hm
S() =
0
N
then H, E, W with W = S 0,1
is an abstract Wiener space. Conversely, if
(H, E, W) is an abstract Wiener space and {hm : m 0} is an orthogonal
sequence in H such that, for each m N, either hm = 0 or khm kH = 1, then
"
(8.3.5)
p #
n
X
sup
I(hm )hm
< for all p [1, ),
n0
m=0
m=0
[I(hm )](x)hm .
m=0
Finally,P
if {hm : m 0} is an orthonormal basis in H, then, for W-almost every
m=0
321
m=0
322
I need only show that W BE (0, r) > 0 for all r > 0. To this end, choose an
Pn
orthonormal basis {hm : m 0} in H, and set Sn = m=0 I(hm )hm . Then, by
Theorem 8.3.3, x
Sn (x) is W-independent of x
x Sn (x) and
Sn (x) x
in E for W-almost every x E. Hence, W {kx Sn (x)kE < 2r } 12 for some
n N, and therefore
W BE (0, r) 12 W kSn kE < 2r .
Pn
But kSn k2E CkSn k2H = m=0 I(hm )2 for some C < , and so
n+1
r
> 0 for any r > 0.
BRn+1 0, 2C
W kSn kE < 2r 0,1
8.3.3. Orthogonal Projections. Associated with any closed, linear subspace L of a Hilbert space H, there is an orthogonal projection map L : H
L determined by the property that, for each h H, h L h L. Equivalently,
L h is the element of L that is closest to h. In this subsection I will show that if
(H, E, W) is an abstract Wiener space and L is a finite dimensional subspace of
H, then L admits a W-almost surely unique extension PL to E. In addition,
I will show that PL x x in L2 (W; E) as L % H.
Lemma 8.3.7. Let (H, E, W) be an abstract Wiener space
P and {hm : m
0} an orthonormal basis in H. Then, for each h H,
m=0 (h, hm )H I(hm )
converges to I(h) W-almost surely and in Lp (W; R) for every p [1, ).
Proof: Define the -algebras Fn and F as in the proof P
of Theorem 8.3.3. Then,
n
by the same argument as I used there, one can identify m=0 (h, hm )H I(hm ) as
W
Theorem 8.3.8.
Let (H, E, W) be an abstract Wiener space. For each
finite dimensional subspace L of H there is a W-almost surely unique map
PL : E H such that, for every h H and W-almost every x E,
h, PL x H = I(L h)(x), where L denotes orthogonal projection from H onto
L. In fact, if {g1 , . . . , gdim(L) } is an orthonormal basis for L, then PL x =
Pdim(L)
[I(gi )](x)gi , and so PL x L for W-almost every x E. In partic1
ular, the distribution of x E 7 PL x L under W is the same as that
Pdim(L)
dim(L)
of (1 , . . . , dim(L) ) Rdim(L) 7
i gi L under 0,1
. Finally,
1
x
PL x is W-independent of x
x PL x.
Proof: Set ` = dim(L). It suffices to note that
!
`
`
X
X
I(L h) = I
(h, gk )H gk =
(h, gk )H I(gk ) =
k=1
k=1
`
X
k=1
!
I(gk )gk , h
H
for all h H
We now have the preparations needed to prove a result which shows that
my definition of an abstract Wiener space is the same as Grosss. Specifically,
Grosss own definition was based on the property proved in the following.
323
that 0 I(fn )fn fails to converge in L2 (W; E). Thus, suppose that there exists
an > 0 such that
for all n N there exists a finite dimensional L Ln
with EW kPL xk2E 2 . Under
this assumption, define
{nm : m 0}
N, {`m : m 0} N, and {f0 , . . . , fnm } : m 0 Lnm inductively
by the following prescription. First, take n0 = 0 = `0 and f0 = h0 . Next,
knowing nm and {f0, . . . , fnm }, choose a finite dimensional subspace L Lnm
so that EW kPL xk2E 2 , set `m = dim(L), and let {gm,1 , . . . , gm,`m } be an
orthonormal basis for L. For any > 0 there exists an n nm + `m such that
`m
X
Ln gm,i , Ln gm,j
i,j .
H
i,j=1
`m
X
Ln gm,i , Ln gm,j i,j
k
gm,i gm,i kH K`m
i=1
i,j=1
for some Km < which depends only on `m . Moreover, and because L Lnm ,
gm,i Lnm for all 1 i `m. Hence, we can find an nm+1 nm + `m so that
span {hn : nm < n nm+1 } admits an orthonormal basis {fnm +1 , . . . , fnm+1 }
P`
with the property that 1m kgm,i fnm +i kH 4 .
Clearly {fn : n 0} is an orthonormal basis for H. On the other hand,
2 12
2 12
`m
+`m
X
nmX
I(gm,i )gm,i I(fnm +i )fnm +i
EW
I(fn )fn
EW
n=nm +1
`m
X
2 1
EW
I(gm,i )gm,i I(fnm +i )fnm +i
H 2 ,
2 1
and so, since EW
I(gi,m )gm,i I(fnm +i )fnm +i
H 2 is dominated by
2 1
1
EW
I(gm,i ) I(fnm +i ) gm,i
H 2 + EW I(fnm +i )2 2 kgm,i fnm +i kH
2kgm,i fnm +i kH ,
324
we have that
2 12
+`m
nmX
EW
I(fn )fn
2
n +1
m
for all m 0,
P
and this means that 0 I(fn )fn cannot be converging in L2 (W; E).
Besides showing that my definition of an abstract Wiener space is the same
as Grosss, Theorem 8.3.9 allows us to prove a very convincing statement, again
due to Gross, of just how non-unique is the Banach space for which a given
Hilbert space is the CameronMartin space.
Corollary 8.3.10. If (H, E, W) is an abstract Wiener space, then there
exists a separable Banach space E0 that is continuously embedded in E as a
measurable subset and has the properties that W(E
0 ) = 1, bounded subsets of
E0 are relatively compact in E, and (H, E0 , W E0 is again an abstract Wiener
space.
Proof: Again I will assume that k kE k kH .
Choose {xn : n 0} E so that {hn : n 0} is an orthonormal basis
in H when hn = hxn , and set Ln = span {h0 , . . . , hn } . Next, using Theorem 8.3.9, choose an increasing sequence {nm : m 0} so that n0 = 0 and
1
EW kPL xk2E 2 2m for m 1 and finite dimensional L Lnm , and define
Q` for ` 0 on E into H so that
Q0 x =
hx, x0 ih0
and Q` x =
n`
X
hx, xn ihn
when ` 1.
n=n`1 +1
Pm
`=0
`2
Q` xkE <
`=1
m
X
m `=1
and therefore k kE0 is certainly a norm on E0 . Next, suppose that the sequence
{xk : k 1} E0 is a Cauchy sequence with respect to k kE0 . By the
preceding, we know that {xk : k 1} is also Cauchy convergent with respect to
325
= lim
n kQn x` kE
` n>m
n>m
`>k
Thus, by choosing k for a given > 0 so that sup`>k kx` xk kE0 < , we
conclude that limm kx Sm xkE < and therefore that Sm x x in E.
Hence, x E0 . Finally, to see that xk x in E0 , simply note that
m2 kQm (x xk )kE
m=1
lim
`
!
2
m=1
which tends to 0 as k .
To show that bounded subsets of E0 are relatively compact in E, it suffices
to show that if {x` : ` 1} BE0 (0, R), then there is an x E to which a
subsequence converges in E. For this purpose, observe that, for each m 0,
there is a subsequence {x`k : k 1} along which Sm x`k converges in Lnm .
Hence, by a diagonalization argument, {x`k : k 1} can be chosen so that
{Sm x`k : k 1} converges in Lnm for all m 0. Since, for 1 j < k,
X
kx`k x`j kE kSm x`k Sm x`j kE +
kQn (x`k x`j )kE
n>m
X 1
,
n2
n>m
X
EW kxkE0 = EW kQ0 xkE +
m2 EW kQm xkE
1
EW kQ0 xkE +
X
1
1
m2 EW kQm xk2E 2 < .
326
1
khkE
1
W
2 2 khkE
.
=
E
I(h)
khkH
khk2H
2m
X
X
m2
2
khkH = 25khkH .
khkE0 = kQ0 hkE +
m kQm hkE 1 + 2
m
2
m=1
m=1
To complete the proof, I must show that H is dense in E0 and that, for each
c0 (y ) = e 12 khy k2H , where W0 = W E0 and hy H is determined
y E0 , W
by h, hy H = hh, y i for h H. Both these facts rely on the observation that
X
kx Sm xkE0 =
n2 kQn xkE 0 for all x E0 .
n>m
= lim
nm
X
hy , hn
H
nm
X
hx, xn ihhn , y i
n=0
I(hn ) (x) = I(hy ) (x)
n=0
n
X
htm htm1
(tm ) (tm1 ) ,
t tm1
m=1 m
and so
(t1 ,... ,tn ) (t) [ PL ](t)
(
ttm1
if t [tm1 , tm ]
(t) (tm1 ) tm
(8.3.11)
tm1 (tm ) (tm1 )
=
(t) (tn )
if t [tn , ).
327
Thus, if (, ~y) (RN ) (RN )n 7 (t1 ,... ,tn ),~y (RN ) is given by
(t1 ,... ,tn ),~y = (t1 ,... ,tn ) +
n
X
htm htm1
(ym ym1 ),
t tm1
m=1 m
(8.3.12)
Z
=
(RN )n
(N )
(d) 0,C(t1 ,... ,tn ) (d~y),
(RN )
n
X
htm htm1
ym ,
t tm1
m=1 m
then
Z
F , (t1 ) (t0 ), . . . , (tn ) (tn1 ) W (N ) (d)
(RN )
(8.3.13)
Z
=
(RN )n
(N )
~
F (t1 ,... ,tn ),~y , y W
(d) 0,D(t1 ,... ,tn ) (d~y),
(RN )
where D(t1 , . . . , tn )(m,i),(m0 ,i0 ) = (tm tm1 )m,m0 i,i0 for 1 m, m0 n and
1 i, i0 N is the covariance matrix for (t1 ) (t0 ), . . . , (tn ) (tn1 )
under W (N ) .
There are several comments that should be made about these conclusions. In
the first place, it is clear from (8.3.11) that t
(t1 ,... ,tn ) (t) returns to the origin
at each of the times {tm : 1 m n}. In addition, the excursions (t1 ,... ,tn )
[tm1 , tm ], 1 m n, are independent of each other and of (t1 ,... ,tn ) [tn , ).
(N )
is a regular conditional probability distribution (cf. 9.2) of W (N ) given the algebra generated
by {(t1 ), . . . , (tn )}. Expressed in more colloquial terms, the
process (t1 ,... ,tn ),~y (t) : t 0 is Brownian motion pinned to the points
{ym : 1 m n} at times {tm : 1 m n}.
328
C be the set of x E for which both m=0 [I(hm )](x)hm and m=0 [I(hm )](x)Ohm
converge in E. By Theorem 8.3.3, we know that W(C) = 1 and that
P
m=0 [I(hm )](x)Ohm if x C
x
TO x
0
if x
/C
has distribution W. Hence, all that remains is to check that I(h)TO = I(O> h)
W-almost surely for each h H. To this end, let x E , and observe that
hx , Ohm
H
[I(hm )](x)
m=0
O> hx , hm
H
[I(hm )](x)
m=0
for W-almost every x E. Thus, since, by Lemma 8.3.7, the last of these
series convergences W-almost surely to I(O> hx ), we have that I(hx ) TO =
329
lim EW ( TOn ) = 0
n
See I.E. Segals Ergodic subsgroups of the orthogonal group on a real Hilbert Space, Annals
of Math. 66 # 2, pp. 297303 (1957). For a treatment in the setting here, see my article Some
thoughts about Segals ergodic theorem, Colloq. Math. 118 # 1, pp. 89-105 (2010).
330
where
Cn =
I
B>
n
Bn
I
with Bn =
hk , On h`
H
1k,`N
Perhaps the best tests for whether an orthogonal transformation satisfies the
hypothesis in Theorem 8.3.15 come from spectral theory. To be more precise, if
Hc and Oc are the space and operator obtained by complexifying H and O, the
Spectral Theorem for normal operators allows one to write
Z
Oc =
dE ,
O h, h
H
1n
f () d 0 as n .
See Exercises 8.3.24, 8.3.25, and 8.5.15 for a more concrete examples.
Exercises for 8.3
Exercise 8.3.16. The purpose of this exercise is to provide the linear algebraic
facts that I used in the proof of Theorem 8.3.9. Namely, I want to show that if
a set {h1 , . . . , hn } H is approximately orthonormal, then the vectors hi differ
by very little from their GramSchmidt orthogonalization.
3
This conclusion highlights the poverty of the result here in comparison to Segals result,
which says that TO is ergodic as soon as the spectrum of Oc is continuous.
331
(i) Suppose that A = aij 1i,jn Rn Rn is a lower triangular matrix whose
diagonal entries are non-negative. Show that there is a Cn < , depending only
on n, such that kIRn Akop Cn kIRn AA> kop .
Hint: Show that it suffices to treat the case when AA> 2IRn , and set =
IRn AA> . Assuming that AA> 2IRn , work by induction on n, at each step
using the lower triangularity of A, to see that
12
`
X
1
a2` j if 1 ` < n
|a` ` an ` | |n ` | + (AA> )n2 n
j=1
n1
X
1 a2n n |n n | +
a2n ` .
`=1
(ii) Let {h1 , . . . , hn } H, set B = (hi , hj )H 1i,jn , and assume that kIRn
Bkop < 1. Show that the hi s are linearly independent.
(iii) Continuing part (ii), let {f1 , . . . , fn } be the orthonormal set obtained from
the hi s by the GramSchmidt orthogonalization procedure, and let A be the
matrix whose (i, j)th entry is (hi , fj )H . Show that A is lower triangular and
that its diagonal entries are non-negative. In addition, show that AA> = B.
(iv) By combining (i) and (iii), show that there is a Kn < , depending only
on n, such that
n
X
khi fi kH Kn
i=1
n
X
i,j (hi , hj )H .
i,j=1
Pn
j=1
fi k2H
n
X
IRn A
2
ij
nkIRn Ak2op .
j=1
332
Exercise 8.3.18. Let (H, E, W) be an abstract Wiener space, and assume that
H is infinite dimensional. As was pointed out, {hx : x E } is the subspace of
g H for which there exists a C < with the property that |(h, g)H | CkhkE
for all h H. Show that for each g H there is separable Banach space Eg
that is continuously embedded as a Borel subset of E such that W(Eg ) = 1,
(H, Eg , W Eg ) is an abstract Wiener space, and |(h, g)H | khkEg for all
h H.
Hint: Refer to the notation used in the proof of Corollary 8.3.10. Choose nm %
1
so that n0 = 0 and, for m 1, kL
gkH 2m and EW kPL k2E 2 2m
nm
for finite dimensional L Lnm . Next, define Eg to be the space of x E with
the properties that PLnm x x in E and
kxkEg
X
kQ` xkE + Q` x, g H < ,
`=0
Pn`
(t) = tX0 () + 2 2
Xm ()
m=1
sin(mt)
,
m
t [0, 1],
where the convergence is uniform. From this, show that, W (1) -almost surely,
1
where the convergence of the series is absolute. Using the preceding, conclude
that, for any (0, ),
EW
(1)
Z
# 12
# 12 "
"Y
X
1
2
.
1 + 4
(t)2 dt =
1+ 2 2
m2 2 + 2
m
m=1
m=1
z2
m2 2
,
z C,
333
Z
exp
1
(t) dt
= cosh 2 2 ,
2
1
W (1)
2
E
exp
(t) dt
= cosh 2 T 2 .
0
This is a famous calculation that can be made using many different methods.
We will return to it in 10.1.3. See, in addition, Exercise 8.4.7.
Hint: Use Eulers product formula to see that
X
1
sinh t
d
= 2t
log
2
2
n + t2
t
dt
n=1
for t R.
Exercise 8.3.20. Related to the preceding exercise, but easier, is finding the
Laplace transform of the variance
!2
Z
Z
1 T
1 T
2
(t) dt
(t) dt
VT ()
T 0
T 0
of a Brownian path over the interval [0, T ]. To do this calculation, first use
Brownian scaling to show that
(1)
(1)
EW eVT = EW eT V1 .
Next, use elementary Fourier series to show that (cf. part (iii) of Exercise 8.2.18)
R
2
1
2 X
Z 1
f
(t)
d(t)
X
k
0
,
V1 () = 2
(t) cos(kt) dt =
k2 2
0
k=1
k=1
1
2
E eVT =
2T
.
sinh( 2T )
334
Exercise 8.3.21. The purpose of this exercise is to show that, without knowing ahead of time that W (N ) lives on (RN ), for the Hilbert space H1 (RN ) one
N
-almost surely in (RN ).
can give a proof that any Wiener series converges 0,1
N
Thus, let {hm : m 0} be an orthonormal basis
Pn in H(R ) and, for n N
N
and = (0 , . . . , m , . . . ) R , set Sn (t, ) = m=0 m hm (t). The goal is to
N
show that {Sn ( , ) : n 0} converges in (RN ) for 0,1
-almost every RN .
(i) For RN , set ht, ( ) = t , check that , Sn (t) RN = ht, , Sn (t) H1 (RN ) ,
N
and apply Theorem 1.4.2 to show that limn , Sn (t) RN exists both 0,1
2 N
N
almost surely and in L (0,1 ; R) for each (t, ) [0, ) R . Conclude from
N
this that, for each t [0, ), limn Sn (t) exists both 0,1
-almost surely and
2 N
N
in L (0,1 ; R ).
(ii) On the basis of part (i), show that we will be done once we know that,
N
for 0,1
-almost every x RN , {Sn ( , x) : n 0} is equicontinuous on finite
intervals and that supn0 t1 |Sn (t, x)| 0 as t . Show that both these
will follow from the existence of a C < such that
"
#
Sn (t) Sn (s)
N
3
0,1
CT 8 for all T (0, ).
(*)
E
sup sup
1
(t s) 8
0s<tT n0
(iii) As an application of Theorem 4.3.2, show that (*) will follow once one
checks that
N
0,1
4
E
sup |Sn (t) Sn (s)| B(t s)2 , 0 s < t,
n0
335
(ii) Set H1T (RN ) = {h [0, T ] : h H1 (RN ) & h(T ) = 0}, and define
L2 ([0,T ];RN ) . Show that the triple H1 (RN ), T (RN ), W (N )
khkH1T (RN ) = khk
T
T
(N )
(ii) Complete the program by showing that On h, h0 H1 (RN ) tends to 0 for all
h 0 C (0, ); RN .
(0, ) \ {1} and h, h0 H1 (RN ) with h,
c
(iii) There is another way to think about the operator O . Namely, let RN
be Lebesgue measure on R, define U : H(RN ) L2 (RN ; RN ) by U h(x) =
x
x ), and show that U is an isometry from H1 (RN ) onto L2 (RN ; RN ). Fure 2 h(e
ther, show that U O = log U , where : L2 (RN ; RN ) L2 (RN ; RN ) is
the translation map f (x) = f (x + ). Conclude from this that
On h, h0
H1 (RN )
= (2)1
Z
R
1n log
d
Uch(), U
h0
CN
d,
336
(iv) As a consequence of the above and Theorem 6.2.7, show that for each
(0, ) \ {1}, q [1, ), and F Lq (W (N ) ; C),
n1
(N )
1 X
F Sn = EW [F ] W (N ) -almost surely and in Lq (W (N ) ; C).
n n
m=0
lim
Exercise 8.3.25. Here is a second reasonably explicit example to which Theorem 8.3.15 applies. Again consider the classical case when H = H1 (RN ), and
assume that N Z+ is even. Choose a skew-symmetric A Hom(RN ; RN )
whose kernel is {0}. That is, A> = A and Ax = 0 = x = 0.
(i) Define OA on H1 (RN ) by
Z
) d,
e A h(
OA h(t) =
0
337
inf
h
(8.4.2)
khk2H
lim log W ()
2
&0
khk2H
.
2
h
The original version of Theorem 8.4.1 was proved by M. Schilder for the classical Wiener measure using a method that does not extend easily to the general
case. The statement that I have given is due to Donsker and S.R.S. Varadhan,
and my proof derives from an approach (which very much resembles the arguments given in 1.3 to prove Cramers Theorem) that was introduced into this
context by Varadhan.
The lower bound is an easy application of the CameronMartin formula. Indeed, all that I have to do is show that if h H and r > 0, then
khk2H
.
lim log W BE (h, r)
2
&0
(*)
1
1
W BE (hx , ) = W BE ( 2 hx , 2 )
i
h 1
2
1
1
= EW e 2 hx,x i 2 khx kH , BE (0, 2 )
2
1
1
1
e kx kE 2 khx kH W BE (0, 2 ) ,
khx k2H
,
BE (hx , ) BE (h, r) = lim log W BE (hx , r) kx kE
&0
2
and therefore, after letting & 0 and remembering that {hx : x E } is dense
in H, that (*) holds.
338
The proof of the upper bound in (8.4.2) is a little more involved. The first step
is to show that it suffices to treat the case when is relatively compact. To this
end, refer to Corollary 8.3.10, and set CR equal to the closure in E of BE0 (0, R).
2
By Ferniques Theorem applied to W on E0 , we know that EW ekxkE0 K <
for some > 0. Hence
W E \ CR = W E \ C
2 R
Ke
R2
Thus, if we can prove the upper bound for relatively compact s, then, because
CR is relatively compact, we will know that, for all R > 0,
khk2H
2
h
lim log W ()
inf
&0
R2
,
if y
/ H.
To see that (**) is enough, assume that it is true and let BE \{} be relatively
compact. Given (0, 1), for each y choose r(y) > 0 and (y) > 0 so that
(
W BE (y, r(y))
(1)
2
2 kykH
1
if y H
if y
/H
for all 0 < (y). Because is relatively compact, we can find N Z+ and
SN
{y1 , . . . , yN } such that 1 BE (yn , rn ), where rn = r(yn ). Then, for
sufficiently small > 0,
1
1
2
,
inf khkH
W () N exp
2 h
and so
lim log W ()
&0
1
inf khk2H
2 h
1
.
339
i
h 1
1
1
H
e (hy,x irkx kE ) EW e 2 hx,x i = e hy,x i 2 rkx kE ,
lim lim log W BE (y, r) sup hy, x i 12 khx k2H .
r&0 &0
x E
Finally, note that the preceding supremum is the same as half the supremum
kyk2
of hy, x i over x with khx kH = 1, which, by Lemma 8.2.3, is equal to 2 H if
y H and to if y
/ H.
An interesting corollary of Theorem 8.4.1 is the following sharpening, due to
Donsker and Varadhan, of Ferniques Theorem.
2 2
In particular, EW e 2 kxkE is finite if < 1 and infinite if 1 .
Proof: Set f (r) = inf{khkH : khkE r}. Clearly f (r) = rf (1) and f (1) =
1 . Thus, by the upper bound in (8.4.2), we know that
2
f (1)2
.
=
lim R2 log W kxkE R = lim R2 log WR2 kxkE 1
R
R
2
2
lim R2 log W kxkE R lim R2 log W kxkE > R
R
inf
khk2H
: khkE > R
2
1
f (1 + )2
= (1 + )2 2 ,
2
2
340
1 hg, x i = g, hx H = kgkH khx kH . Hence khx kH kgk1
H . Next,
suppose that h H with khkE = 1. Then, by the HahnBanach Theorem, there
exists a x E with kxkE = 1 and hh, x i = 1. In particular, khkH khx kH
h, hx H = hh, x i = 1, and therefore khk1
H khx kH , which, together with
the preceding, completes the verification.
The next step is to show that there exists an x E with kx kE = 1 such
that khx kH = . To this end, choose {xk : k 1} E with kxk kE = 1 so
that khxk kH . Because BE (0, 1) is compact in the weak* topology and,
by Theorem 8.2.6, x E 7 hx H is continuous from the weak* topology
into the strong topology, we can assume that {xk : k 1} is weak* convergent to
some x BE (0, 1) and that khx kH = , which is possible only if kx kE = 1.
Finally, knowing that this x exists, note that h , x i is a centered Gaussian
under W with variance 2 . Hence, since kxkE |hx, x i|,
h kxk2E i Z
2
e 22 0,2 (d) = .
EW e 22
R
2 2
Sn
for all n Z+ and M n.
/ G exp
(*)
P
2n
341
To check (*), first note (cf. Exercise 8.2.14) that the distribution of Sn under
1
/G =
P is the same as that of x
n 2 x under W and therefore that P Sn
W n2 (G{). Hence, (*) is really just an application of the upper bound in (8.4.2).
Given (*), I proceed in very much the same way as I did at the analogous place
in 1.5. Namely, for any (1, 2),
max
m m1 n m
lim
max
m1
At this point in 1.5 (cf. the proof of Lemma 1.5.3), I applied Levys reflection
principle to get rid of the max. However, Levys argument works only for
R-valued random variables, and so here I will replace his estimate by one based
on the idea in Exercise 1.4.25.
Lemma 8.4.5. Let {YmP: m 1} be mutually independent, E-valued random
n
variables, and set Sn = m=1 Ym for n 1. Then, for any closed F E and
> 0,
P(kSn F kE )
.
P max kSm F kE 2
1mn
1 max1mn P(kSn Sm kE )
Proof: Set
Am = {kSm F kE 2 and kSk F kE < 2 for 1 k < m}.
Following the hint for Exercise 1.4.25, observe that
P max kSm F kE 2
min P(kSn Sm kE < )
1mn
n
X
m=1
1mn
n
X
P Am {kSn Sm kE < }
P Am {kSn F kE } ,
m=1
which, because the Am s are disjoint, is dominated by P kSn F kE .
Applying the preceding to the situation at hand, we see that
!
Sn
BH (0, 1)
P
max
2
1n m
[ m1 ]
E
S[m ]
BH (0, 1)
P
[
m1 ]
E
.
1 max1n m P kSn kE [ m1 ]
342
After combining this with the estimate in (*), it is an easy matter to show that,
for each > 0, there is a (1, 2) such that
!
X
Sn
BH (0, 1)
P
max
2 < ,
m1 n m [ m1 ]
E
m=1
from which it should be clear why limn kSn BH (0, 1)kE = 0 P-almost surely.
The proof that, P-almost surely, limn kSn hkE = 0 for all h BH (0, 1)
differs in no substantive way from the proof of the analogous assertion in the
second part of Theorem 1.5.9. Namely, because BH (0, 1) is separable, it suffices
to work with one h BH (0, 1) at a time. Furthermore, just as I did there, I can
reduce the problem to showing that, for each k 2, > 0, and h with khkH < 1,
X
P
Skm km1 h
E < = .
m=1
lim
Exercise 8.4.7. Show that the in Corollary 8.4.3 is 12 in the case of the
classical abstract Wiener space H1 (RN ), (RN ), W (N ) and therefore that
lim R2 log W (N ) kk(RN ) R = 2.
log W
(N )
sup |( )| R
[0,t]
1
2t
343
and that
!
2
sup |( )| R (t) = 0 = .
t
lim R2 log W (N )
[0,t]
log W
(N )
Z
t
2
|( )| d R
0
=
2
8t2
and that
lim R
log W
(N )
Z
0
2
|( )| d R (t) = 0 = 2 .
2t
2
Hint: In each case after the first, Brownian scaling can be used to reduce the
problem to the case when t = 1, and the challenge is to find the optimal constant
C for which khkE CkhkH , h H for the appropriate
abstract Wiener space
N
(E, H, W).
In
the
second
case
E
=
C
[0,
1]
:
R
[0, 1] : (RN )
0
and H = [0, 1] : H1 (RN ) , in the third (cf. part (ii) of Exercise 8.3.22)
N
E = 1 (RN ) and H = H11 (RN ) , in the fourth E = L2 [0, 1];
{
R ) and H =
1
N
2
N
1 N
[0, 1] : H (R )}, and in the fifth E = L [0, 1]; R
and
H
=
H
(R
).
1
The optimization problems when E = (RN ) or C0 [0, 1]; RN are rather easy
1
consequences of |(t)| t 2 kkH1 (RN ) . When E = 1 (RN ), one should start with
L1 ([0,1];RN ) kkH11 (RN ) .
the observation that if H11 (RN ), then 2kku kk
In the final two cases, one can either use elementary variational calculus or one
can make use of, respectively, the orthonormal bases
2 2 sin n +
1
2
1
: n 0 and 2 2 sin n : n 1 in L2 [0, 1]; R).
Exercise 8.4.8. Suppose that f C E; R , and show, as a consequence of
Theorem 8.4.4, that
lim f Sn = min{f (h) : khkH 1} and lim f Sn = max{f (h) : khkH 1}
n
W N -almost surely.
8.5 Euclidean Free Fields
In this section I will give a very cursory introduction to a family of abstract
Wiener spaces they played an important role in the attempt to give a mathematically rigorous construction of quantum fields. From the physical standpoint,
the fields treated here are trivial in the sense that they model free (i.e.,
non-interacting) fields. Nonetheless, they are interesting from a mathematical
344
standpoint and, if nothing else, show how profoundly properties of a process are
effected by the dimension of its parameter set.
I begin with the case when the parameter set is one dimensional and the
resulting process can be seen as a minor variant of Brownian motion. As we
will see, the intractability of the higher dimensional analogs increases with the
number of dimensions.
8.5.1. The OrnsteinUhlenbeck Process. Given x RN and (RN ),
consider the integral equation
Z
1 t
U(, x, ) d, t 0.
(8.5.1)
U(t, x, ) = x + (t)
2 0
2t
e 2 d( ),
U(t, 0, ) = e
0
U(t, x, ) = e 2 x + U(t, 0, )
EW
(N )
|ts|
s+t
U(s, 0) U(t, 0) = e 2 e 2 I.
In their article On the theory of Brownian motion, Phys. Reviews 36 # 3, pp. 823-841
(1930), L. Ornstein and G. Uhlenbeck introduced this process in an attempt to reconcile some
of the more disturbing properties of Wiener paths with physical reality.
345
= 1 = lim
lim
(8.5.2)
t
2 log t
2 log t
t
W (N ) -almost surely, which confirms the suspicion that the restoring force dampens the Brownian excursions out toward infinity.
A second indication that U( , x) tends to spend more time than Brownian
paths do near the origin is that its distribution at time t will be e 2t x,(1et )I ,
and so, as distinguished from Brownian motion itself, its distribution as time
t tends to a limit, namely 0,I . This observation suggests that it might be
interesting to look at an ancient OrnsteinUhlenbeck process, one that already
has been running for an infinite amount of time. To be more precise, since the
distribution of an ancient OrnsteinUhlenbeck at time 0 would be 0,I , what
we should look at is the process that we get by making the x in U( , x, )
a standard normal random variable. Thus, I will say that a stochastic process
{UA (t) : t 0} is an ancient OrnsteinUhlenbeck process if its distribution
is that of {U(t, x, ) : t 0} under 0,I W (N ) .
If {U
process, then it is clear
A (t) : t 0} is an ancient OrnsteinUhlenbeck
that , UA (t) RN : t 0 & RN spans a Gaussian family with covariance
|ts|
EP UA (s) UA (t) = e 2 I.
As
we see that if {B(t) : t 0} is a Brownian motion, then
at consequence,
e 2 B et : t 0 is an ancient OrnsteinUhlenbeck process. In addition, as
we suspected, the ancient OrnsteinUhlenbeck process is a stationary process
in the sense that, for each T > 0, the distribution of {UA (t + T ) : t 0} is
the same as that of {UA (t) : t 0}, which can be checked either by using the
preceding representation in terms of Brownian motion or by observing that its
covariance is a function of t s.
In fact, even more is true: it is time reversible in the sense that, for each T > 0,
{UA (t) : t [0, T ]} has the same distribution as {UA (T t) : t [0, T ]}. This
observation suggests that we can give the ancient OrnsteinUhlenbeck its past
by running it backwards. That is, define UR : [0, ) RN (RN )2 RN by
U(t, x, + )
if t 0
UR (t, x, + , ) =
U(t, x, ) if t < 0,
346
only now for all s, t R. One advantage of having added the past is that the
statement of reversibility takes a more appealing form. Namely, {UR (t) : t R}
is reversible in the sense that its distribution is the same whether one runs
it forward or backward in time. That is, {UR (t) : t R} has the same
distribution as {UR (t) : t R}. For this reason, I will say that {UR (t) : t 0}
is a reversible OrnsteinUhlenbeck process if its distribution is the same
as that of {UR (t, x, + , ) : t 0} under 0,I W (N ) W (N ) .
An alternative way to realize a reversible OrnsteinUhlenbeck process is to
start with an RN -valued Brownian motion
{B(t) : t 0} and consider the
t
t
process {e 2 B(et ) : t R}. Clearly , e 2 B(et ) RN : (t, ) R RN is
a Gaussian family with covariance given by (8.5.3). It is amusing to observe
that, when one uses this realization, the reversibility of the OrnsteinUhlenbeck
process is equivalent to the time inversion invariance (cf. Exercise 4.3.11) of the
original Brownian motion.
8.5.2. OrnsteinUhlenbeck as an Abstract Wiener Space. So far, my
treatment of the OrnsteinUhlenbeck process has been based on its relationship
to Brownian motion. Here I will look at it as an abstract Wiener space.
Begin with the one-sided process
0) : t 0}. Seeing as this process
t {U(t,
t
2
has the same distribution as e B e 1 : t 0}, it is reasonably clear
that the Hilbert space associated with this process should be the space HU (RN )
t
of functions hU (t) = e 2 h et 1), h H1 (RN ). Thus, define the map F U :
H1 (RN ) HU (RN ) accordingly, and introduce the Hilbert norm k kHU (RN )
on HU (RN ) that makes F U into an isometry. Equivalently,
Z
h d
i2
1
U 2
ds
(1 + s) 2 hU log(1 + s)
kh kHU (RN ) =
[0,) ds
1
U U 2
khU k2 2
= kh U k2 2
N .
N + h ,h
N +
L ([0,);R )
L ([0,);R )
L ([0,);R )
Note that
h U , hU
L2 ([0,);RN )
1
2
Z
[0,)
d U
|h (t)|2 dt =
dt
1
lim |hU (t)|2
2 t
= 0.
1
347
and so we will adopt U (RN ) as the Banach space for HU (RN ). Clearly, the
dual space U (RN ) of U (RN ) can be identified with the space of RN -valued
Borel
measures on [0, ) that give 0 mass to {0} and satisfy kkU (RN )
R
log(e
+ t) ||(dt) < .
[0,)
(N )
Theorem 8.5.4. Let U0 M1 U (RN ) be the distribution of {U(t, 0) :
(N )
t 0} under W (N ) . Then HU (RN ), U (RN ), U0
is an abstract Wiener
space.
Proof: Since Cc (0, ); RN is contained in HU (RN ) and is dense in U (RN ),
we know that HU (RN ) is dense in U (RN ). In addition, because U (t) =
t
e 2 (et 1), where H1 (RN ), and k U kHU (RN ) = kkH1 (RN ) , k U ku
1
k U kHU (RN ) follows from |(t)| t 2 kkH1 (RN ) . Hence, HU (RN ) is continuously
embedded in U (RN ).
To complete the proof, remember our earlier calculation of the covariance of
{U(t; 0) : t 0}, and use it to check that
(N )
EU0
h, i2 =
ZZ
u0 (s, t) (ds) (dt),
where u0 (s, t) e
|st|
2
s+t
2
[0,)2
U
N
Hence, what I need to show is that if U (RN ) hU
H (R ) is the
U
U
U
map determined by hh , i = h , h HU (RN ) , then
(8.5.5)
2
khU
kHU (RN ) =
ZZ
u0 (s, t) (ds) (dt).
[0,)2
ZZ
u0 (s, t) (ds) e, (dt)
[0,)2
Z
=
e,
!
u0 (, t) (dt)
[0,)
.
RN
RN
348
R
Thus, one should guess that hU
( ) = [0,) u0 (, t) (dt) and must check that,
U
U
N
U
U
N
with this choice, h
H (R ), (8.5.5) holds, and, for all h H (R ),
U
U
U
hh , i = h , h HU (RN ) .
The key to proving all these is the equality
Z
Z
hU ( )u0 (, t) d = hU (t),
(*)
h U ( ) u0 (, t) d + 14
[0,)
[0,)
1
kkU (R;RN ) sup log(e + |t|)
|(t)| < .
tR
Furthermore, it should be clear that one can identify U (R; RN ) with the space
of RN -valued Borel measures on R satisfying
Z
kkU (R;RN )
log(e + |t|) ||(dt) < .
R
Theorem 8.5.6. Take H1 (R; RN ) to be the separable Hilbert space of absolutely continuous h : R RN satisfying
khkH1 (R;RN )
2 2 N + 1 khk2 2 N < .
khk
4
L (R:R )
L (R:R )
(N )
|st|
2
H1 (R;RN )
349
ZZ
u(s, t) (ds) (dt)
RR
u(, t) (dt). Hence, since , (t) RN : t 0 & RN
(N )
(N )
spans a Gaussian family in L2 UR ; R and u(s, t)I = EUR (s) (t) , the
proof is complete.
when h ( ) =
( 2)dim(H1 (R;RN ))
Z
1
2
2
1
( 2)dim(H 1 (R ;R))
Z
1
2
2
1
|h(x))| + 4 |h(x)| dx H 1 (R ;R) (dh),
exp
2 R
The need to deal with generalized functions is the primary source of the difficulties that
mathematicians have when they attempt to construct non-trivial quantum fields. Without
going into any details, suffice it to say that in order to construct interacting (i.e., non-Gaussian)
fields, one has to take non-linear functions of a Gaussian field. However, if the Gaussian field
is distribution valued, it is not at all clear how to apply a non-linear function to it.
350
The approach that I will adopt is based on the following subterfuge. The space
H 1 (R ; R) is one of a continuously graded family of spaces known as Sobolev
spaces. Sobolev spaces are graded according to the number of derivatives better or worse than L2 (R ; R) their elements are. To be more precise, for each
s R, define the Bessel operator B s on S (R ; C) so that
s
s () = 1 + ||2 2 ().
d
B
4
m
When s = 2m, it is clear that B s = 14 , and so, in general, it is reasonable
to think of B s as an operator that, depending on whether s 0 or s 0,
involves taking or restoring derivatives of order |s|. In particular, kkH 1 (R ;R) =
kB 1 kL2 (R ;R) for S (R ; R). More generally, define the Sobolev space
H s (R ; R) to be the separable Hilbert space obtained by completing S (R ; R)
with respect to
s
Z
s
1
s
1
2 d.
+ ||2 |h()|
khkH s (R ;R) kB hkL2 (R ;R) =
4
(2) R
Obviously, H 0 (R ; R) is just L2 (R ; R). When s > 0, H s (R ; R) is a subspace of L2 (R ; R), and the quality of its elements will improve as s gets larger.
However, when s < 0, some elements of H s (R ; R) will be strictly worse than
elements of L2 (R ; R), and their quality will deteriorate as s becomes more negative. Nonetheless, for every s R, H s (R ; R) S 0 (R ; R), where S 0 (R ; R),
whose elements are called real-valued tempered distributions, is the dual
space of S (R ; R). In fact, with a little effort, one can check that an alternative
description of H s (R ; R) is as the subspace of u S 0 (R ; R) with the property that B s u L2 (R ; R). Equivalently, H s (R ; R) is the isometric image in
S (R ; R) of L2 (R ; R) under the map B s , and, more generally, H s2 (R ; R) is
the isometric image of H s1 (R ; R) under B s2 s1 . Thus, by Theorem 8.3.1, once
we understand the abstract Wiener spaces for any one of the spaces H s (R ; R),
understanding the abstract Wiener spaces for any of the others comes down to
understanding the action of the Bessel operators, a task that, depending on what
one wants to know, can be highly non-trivial.
+1
is an element of H
+1
2
kh k
(R ; R),
ZZ
2
H
+1
2
(R ;R)
= K
R R
|xy|
2
(dx)(dy),
351
and
hh, i = h, h
H
+1
2
for each h H
(R ;R)
+1
2
(R ; R).
Proof: To prove the initial assertion, use the Fourier inversion formula to write
Z
d
h(x) = (2)
e 1(x,)R h()
R
2 2
1
khk
d
+
||
khku (2) 2
4
Hence, since H
norm k k +1
H
+1
2
+1
2
(R ;R)
(R ;R)
+1
e 1(x,)R ()
1
+1
d
B
(x) =
+1
2
(2) R
1
2
+
||
4
Z
Z
1(yx,)R
e
1
d (dy).
=
+1
(2) R
2
R 1 + ||2
4
Z
|yx|
e 1(yx,)R
1
2
,
d
=
K
e
+1
(2) R 1 + ||2 2
+1
2
(R ;R)
4
+1
|()|
1
d < .
=
(2) R 1 + ||2 +1
2
4
(R ;R)
+1
(R ;R)
particular,
kh k
ZZ
2
H
+1
2
(R ;R)
= hh , i = K
R R
|yx|
2
(dx)(dy).
352
+1
2
(R ;R)
+1
2
(R ; R), and
(R ;R)
+1
2
(R ; R),
+1
2
(R ; R), W
+1
2
(R ;R)
is an abstract Wiener space. Moreover, for each 0, 12 , W
every is H
older continuous of order and, for each > 12 , W
+1
2
(R ;R)
+1
2
(R ;R)
-almost
-almost
no is anywhere H
older continuous of order .
Proof: The initial part of the first assertion follows from the first part of
Lemma 8.5.7 plus the essentially trivial fact that C0 (R ; R) is continuously em+1
bedded as a dense subspace of 2 (R ; R). Further, by the second part of
that same lemma combined with Theorem 8.3.3, we will have proved the second part of the first assertion here once we show that, when {hm : m 0} is
P
+1
an orthonormal basis in H 2 (R ; R), the Wiener series m=0 m hm converges
+1
N
-almost every = (0 , . . . , m , . . . ) RN . Thus, set
in 2 (R ; R) for 0,1
Pn
Sn () = m=0 m hm for n 1. More or less mimicking the steps outlined in
Exercise 8.3.21, I will begin by showing that, for each 0, 12 and R [1, ),
(*)
zR
sup
x6=y
where Q(z, R) = z + [R, R) . Indeed, by the argument given in that exercise combined with the higher dimensional analog of Kolmogorovs continuity
criterion in Exercise 4.3.18, (*) will follow once we show that
N
E0,1 |Sn (y) Sn (x)|2 C|y x|,
x, y R ,
for some C < . To this end, set = y x , and apply Lemma 8.5.7 to check
E
N
0,1
n
X
2
2
|Sn (y) Sn (x)| =
hm , h
+1
2
(R ;R)
m=0
kh k2
+1
2
(R ;R)
= 2K 1 e
|yx|
2
Knowing (*), it becomes an easy matter to see that there exists a measurable S : R RN R such that x
S(x, ) is continuous of each and
353
N
Sn ( , ) S( , ) uniformly on compacts for 0,1
-almost every RN . In
N
fact, because of (*), it suffices to check that limn Sn (x) exists 0,1
-almost
surely for each x R , and this follows immediately from Theorem 1.4.2 plus
Var m hm (x) =
m=0
hm , hx
2
H
+1
2
= khx k2
(R ;R)
m=0
+1
2
(R ;R)
= K .
N
-almost every , x
S(x, )
Furthermore, again from (*), we know that, 0,1
1
is -H
older continuous so long as 0, 2 .
N
I must still check that, 0,1
-almost surely, the convergence of Sn ( , ) to
+1
S( , ) is taking place in 2 (R ; R), and, in view of the fact that we already
N
know that, 0,1
-almost surely, it is taking place uniformly on compacts, this
reduces to showing that
1
N
lim log(e + |x|
sup |Sn (x)| 0 0,1
-almost surely.
|x|
n0
n0
where k ku,C denotes the uniform norm over a set C R . At this point, I
would like to apply Ferniques
Theorem (Theorem 8.2.1) to the Banach space
` N; Cb (Q(z, 1); R) and thereby conclude that there exists an > 0 such that
N
(**)
B sup E0,1 exp sup kSn k2u,Q(z,1)
< .
zR
n0
However, ` N; Cb (Q(z, 1); R) is not separable. Nonetheless, there are two
ways to get around this technicality. The first is to observe that the only place
separability was used in the proof of Ferniques Theorem was at the beginning,
where I used it to guarantee that BE is generated by the maps x
hx, x i as
n0
mQ(0,M )Z
sup kSn ku,Q(m,1) .
n0
354
and therefore
"
Sn (x)
log(e
+ |x|)
|x|R n0
N
0,1
sup sup
1
m(log R) 4
h
i
N
E0,1 supn0 kSn ku,Q(0,em4 )
log(e + e(m1)4 )
1
m(log R) 4
p
log(1 + 2 e(m+1)4 B)
0
log(e + e(m1)4 )
as R .
+1
2
(R ;R)
-almost
no is anywhere Holder continuous of order , and for this purpose I will proceed
as in the proof of Theorem 4.3.4. Because the {(x + y) : x R } has the same
W +1 -distribution for all y, it suffices for me to show that, W +1 H
(R ;R)
(R ;R)
L n
\
:
m+`e
n
m+(`1)e
n
M
n
o
.
Hence, again using translation invariance, we see that we need only show that
there is an L Z+ such that, for each M Z+ ,
(`1)e
M
,
1
: `e
n W +1
n
n
n
H
(R ;R)
(R ;R)
where B(t), Ft , P is an R-valued Brownian motion. But clearly this probability
is dominated by the sum of
`
`1
`
P B e n B e n M2ne 2n
, 1 ` L
355
and
P 1 ` L
`1
1
1 e 2n B e n
M e 2n
2n
.
M 2 n2(1)
8
, which, since < 1,
The second of these is easily dominated by 2Le
means that it causes no problems. As for the first, one can use the independence
of Brownian
increments
and Brownian
scaling to dominate it by the Lth power
of
1
P B(1)B e n M (2n )1 . Hence, I can take any L such that 12 L >
.
As a consequence of the preceding and Theorem 8.3.1, we have the following
corollary.
kks (R ;R) = kB
+1
2 s
and
WH s (R ;R) = (B s
+1
2
) W
+1
2
(R ;R)
+1
2
(R ;R)
s
(8.5.11)
for the standard Gaussian distribution on R. I will outline the proof of (8.5.11)
for S (R; R), but the estimate immediately extends to any L2 (0,1 ; R)
whose (distributional) first derivative is again in L2 (0,1 ; R).
(i) For S (R; R), set
u (t, x) = EW
(1)
U (t, x) ,
t&0
in L2 (0,1 ; R).
Using this second expression, show that u (t, ) S (R; R) and that t
[0, ) 7 u (t, ) S (R; R) is continuous. In addition, show that u (t, x) =
1
00
0
2 u (t, x) xu (t, x) .
356
(ii) For 1 , 2 C 2 (R; R) whose second derivative are tempered, show that
1 , 002 x2
L2 (0,1 ;R)
= 01 , 02
L2 (0,1 ;R)
and use this together with (i) to show that, for any S (R; R),
hu (t, ), 0,1 i = h, 0,1 i and
d
ku (t, )k2L2 (0,1 ;R) = et ku0 (t, )k2L2 (0,1 ;R) .
dt
Conclude that ku (t, )kL2 (0,1 ;R) kkL2 (0,1 ;R) and
d
ku (t, )k2L2 (0,1 ;R) et k0 k2L2 (0,1 ;R) .
dt
log
0,1
(0 )2
0,1
.
u (t, ) log u (t, ) 0,1 =
u (t, ) 0,1
2
dt
et
d
(0 )2
.
0,1
357
Exercise 8.5.13. Although it should be clear that the arguments given in Exercises 8.5.10 and 8.5.12 work equally well in RN and yield (8.5.11) and (2.4.42)
with 0,1 replaced by 0,I and (0 )2 replaced by ||2 , it is significant that each
of these inequalities for R implies its RN analog. Indeed, show that Fubinis Theorem is all that one needs to pass to the higher dimensional results. The reason
why this remark is significant is that it allows one to prove infinite dimensional
versions of both Poincares Inequality and the logarithmic Sobolev Inequality,
and both of these play a crucial role in infinite dimensional analysis. In fact,
Nelsons interest in hypercontractive estimates sprung from his brilliant insight
that hypercontractive estimates would allow him to construct a non-trivial (i.e.,
non-Gaussian), translation invariant quantum field for R2 .
Exercise 8.5.14. It is interesting to see what happens if one changes the sign
of the second term on the right-hand side of (8.5.1), thereby converting the
centripetal force into a centrifugal one.
(i) Show that, for each (RN ), the unique solution to
V(t, ) = (t) +
1
2
V(, ) d,
t 0,
is
V(t, ) = e 2
e 2 d( ),
s+t
2
|ts|
2
(iii) Let {B(t) : t 0} be an RN -valued Brownian motion, and show that the
distribution of
t
e 2 B 1 et : t 0
and set kkV (RN ) supt0 et |(t)|. Show that V (RN ); k kV (RN ) is a
separable Banach space and that there exists a unique V (N ) M1 V (RN )
such that the distribution of {(t) : t 0} under V (N ) is the same as the
distribution of {V(t) : t 0} under W (N ) .
358
(v) There is a subtlety here that is worth mentioning. Namely, show that
HU (RN ) is isometrically embedded in HV (RN ). On the other hand, as distinguished from elements of HU (RN ), it is not true that k 12 k2L2 (R;RN ) =
2L2 (R;RN ) + 41 kk2L2 (R;RN ) , the point being that whereas the elements h of
kk
HV (RN ) with h Cc (0, ); RN are dense in HU (RN ), they are not dense in
HV (RN ).
Exercise 8.5.15. Given x R and a slowly increasing C(R ; R), define
x C(R ; R) so that x (y) = (x + y) for y R . Next, extend x
to S 0 (R ; R) so that h, x ui = hx , ui for S (R ; R), and check that
this is a legitimate extension in the sense that it is consistent with the original
definition when applied to us that are slowly increasing, continuous functions.
Finally, given s R, define Ox : H s (R ; R) H s (R ; R) by Ox h = x h.
(i) Show that B s x = x B s for all s R and x R .
(ii) Given s R, define Ox = x H s (R ; R), and show that Ox is an orthogonal
transformation.
(iii) Referring to Theorem 8.3.14 and Corollary 8.5.9, show that the measure
preserving transformation TOx that Ox determines on s (R ; R), WH s (R ;R) is
the restriction of x to s (R ; R).
(iv) If x 6= 0, show that TOx is ergodic on s (R ; R), WH s (R ;R) .
8.6 Brownian Motion on a Banach Space
In this concluding section I will discuss Brownian motion on a Banach space.
More precisely, given a non-degenerate, centered, Gaussian measure W on a
separable Banach space E, we will see that there exists an E-valued stochastic
process {B(t) : t 0} with the properties that B(0) = 0, t
B(t) is continuous,
and, for all 0 s < t, B(t) B(s) is independent of {B( ) : [0, s]} and
has distribution (cf. the notation in 8.4) Wts .
8.6.1. Abstract Wiener Formulation. Let W on E be as above, use H
to denote its CameronMartin space, and take H 1 (H) to be the Hilbert space
of absolutely continuous h : [0, ) H such that h(0) = 0 and khkH 1 (H) =
L2 ([0,);H) < . Finally, let (E) be the space of continuous : [0, )
khk
359
E
= 0, and turn (E) into a Banach space with norm
E satisfying limt k(t)k
t
1
kk(E) = supt0 (1 + t) k(t)kE . By exactly the same line of reasoning as
I used when E = RN , one can show that (E) is a separable Banach space in
which H 1 (E) is continuously embedded as a dense subspace. My goal is to prove
the following statement.
Theorem 8.6.1. With H 1 (H) and (E) as above, there is a unique W (E)
M1 (E) such that H 1 (H), (E), W (E) is an abstract Wiener space.
1
Choose an orthonormal basis {h1m : m 0} in H
(R), and, for n 0, t 0,
P
n
N
1
and x = (x0 , . . . , xm , . . . ) E , set Sn (t, x) =
m=0 hm (t)xm . I will show
N
that, W -almost surely, {Sn ( , x) : n 0} converges in (E), and, for the
most part, the proof follows the same basic line of reasoning as that suggested in
Exercise 8.3.21 when E = RN . However, there is a problem here that we did not
encounter there. Namely, unless E is finite dimensional, bounded subsets will
not necessarily be relatively compact in E. Hence, local uniform equicontinuity
plus local boundedness is not sufficient to guarantee
that a collection of E-valued
paths is relatively compact in C [0, ); E , and that is the reason why we have
to work a little harder here.
EW
sup
sup
n0 0s<tT
(t s)
1
8
KT 4 .
360
T n0 tT
But,
sup
t2k
X 7`
X
kSn (t, x)kE
kSn (t, x)kE
2 8
sup
t
t
`
`+1
2 t2
`k
`k
sup
0t2`+1
t8
24 K
1
8
2 1
2 8 .
1
every x E . But if x E , then hSn (t, x), x i = 0 hxm , x ihm (t), the random variables x
hxm , x ih1m (t) are P
independent, centered Gaussians under
N
W with variance khx k2H h1m (t)2 , and 0 h1m (t)2 = kht k2H 1 (R) = t. Thus, by
Theorem 1.4.2, we have the required convergence.
Next, define B : [0, ) E N E so that
limn Sn (t, x) if {Sn ( , x) : n 0} converges in (E)
B(t, x) =
0
otherwise.
Given (E) , determine h H 1 (H) by h, h H 1 (H) = hh, i for all h
H 1 (H). I want to show that, under W N , x
361
W N -almost surely.
X
2
2
hxm , xm i = khxm kH =
hh1m hk , i2 .
k=0
X
X
2
2
1
2
hB( ), i =
hhm hk , i =
h1m hk , h H 1 (H) = kh k2H 1 (H) .
m,k=0
m,k=0
Finally, to complete the proof, all that remains is to take W (E) to be the
W N -distribution of x
B( , x).
8.6.2. Brownian Formulation. Let (H, E, W) be an abstract Wiener space.
Given a probability space (, F, P), a non-decreasing family of sub--algebras
{Ft : t 0},
and a measurable map B : [0, ) E, say that the triple
B(t), Ft , P is a W-Brownian motion if
(1) B is {Ft : t 0}-progressively measurable,
(2) B(0, ) = 0 and B( , ) C [0, ); E for P-almost every ,
(3) B(1) has distribution W, and, for all 0 s < t, B(t)B(s) is independent
1
of Fs and has the same distribution as (t s) 2 B(1).
362
Proof: If B(t), Ft , P is a W-Brownian motion and x E with khx kH = 1,
then hB(t), x i hB(s), x i = hB(t) B(s), x i is independent of Fs and is a
centered Gaussian with variance (t s). Thus, hB(t), x i, Ft , P is an R-valued
Brownian motion.
Next assume that hB(t), x i, Ft , P is an R-valued Brownian motion for every
x with khx kH = 1. Then hB(t) B(s), x i is independent of Fs for every
x E , and so, since BE is generated by {h , x i : x E }, B(t) B(s) is
independent of Fs . In addition, hB(t) B(s), x i is a centered Gaussian with
variance (t s)khx k2H , and therefore B(1) has distribution W and B(t) B(s)
1
has the same distribution as (t s) 2 B(1). Thus, B(t), Ft , P is a W-Brownian
motion.
Again assume that B(t), Ft , P is a W-Brownian motion. To prove that
G(B) is a Gaussian family for which (8.6.5) holds, it suffices to show that, for
all 0 t1 t2 and x1 , x2 E , hB(t1 ), x1 i + hB(t2 ), x2 i is a centered Gaussian
with covariance t1 khx1 + hx2 k2H + (t2 t1 )khx2 k2H . Indeed, we would then
know not only that G(B) is a Gaussian family but also that the variance of
hB(t1 ), x1 i hB(t2 ), x2 i is t1 khx1 hx2 k2H + (t2 t1 )khx2 k2H , from which (8.6.5)
is immediate. But
(E)
h(t1 ), x1 ih(t2 ), x2 i = h1 h2 H 1 (H) = (t1 t2 ) hx1 , hx2 H .
363
Starting from this, it is an easy matter to check that the span of {h(t), x i :
(t, x ) [0, ) E } is a Gaussian family in L2 (W (E) ; R) that satisfies (8.6.5).
To prove the converse, begin by observing that, because G(B) is a Gaussian
family satisfying (8.6.5), the distribution of 7 B( , ) C [0, ); E
under P is the same as that of (E) 7 ( ) C [0, ); E under W (E) .
Hence
k(t)kE
kB(t)kE
(E)
= 0 = 1,
lim
=0 =W
P lim
t
t
t
t
B( , ) on
Not surprisingly, the proof differs only slightly from that of Theorem 8.4.4.
In proving the W (E) -almost sure convergence of {n : n 1} to BH 1 (H) (0, 1),
there are two new ingredients here. The first is the use of the Brownian scaling
invariance property (cf. Exercise 8.6.8), which says that the W (E) is invariant
1
under the scaling maps S : (E) (E) given by S = 2 ( ) for
> 0 and is easily proved as a consequence of the fact that these maps are
isometric from H 1 (H) onto itself. The second new ingredient is the observation
that, for any R > 0, r (0, 1], and (E), k(r ) BH 1 (H) (0, R)k(E)
k BH 1 (H) (0, R)k(E) . To see this, let h BH 1 (H) (0, R) be given, and check
that h(r ) is again in BH (0, R) and that k(r ) h(r )k(E) k hk(E) .
364
Taking these into account and applying (8.4.2), one can now justify
W (E) m1max m
n BH 1 (H) (0, 1)
(E)
n
!
m
2 (n m )
(E)
BH 1 (H) (0, 1)
=W
max
n
m1 n m
(E)
m1
[
]
m
W (E) m1max m
m n BH 1 (H) 0,
m
n
2 [ m1 ]
2
(E)
[ m1 ]
m
W (E)
BH 1 (H) 0,
m
2 [ m1 ]
2
(E)
m
1
B
(0,
1)
= W (E)
2 1
m1
H (H)
[
]
(E)
R2 [ m1 ]
(E)
m1
log(2) [
]
= W m 2
k BH 1 (H) (0, 1)k(E) exp
m
[ m1 ]
for all (1, 2), R < inf{khkH 1 (H) : khk(E) }, and sufficiently large m 1.
Armed with this information, one can simply repeat the argument given at the
analogous place in the proof of Theorem 8.4.4.
The proof that, W (E) -almost surely, n approaches every h C infinitely often
also requires only minor modification. To begin, one remarks that if A (E)
is relatively compact, then
k(t)kE
= 0.
T A t[T
/ 1 ,T ] 1 + t
lim sup
sup
Thus, since, by the preceding, for W (E) -almost every , the union of {n : n 1}
and BH 1 (H) (0, 1) is relatively compact in (E), it suffices to prove that
n (t) n (k 1 ) h(t) h(k 1 ) kE
= 0 W (E) -almost surely
lim sup
1+t
n t[k1 ,k]
for each h BH 1 (H) (0, 1) and k 2. Because, for a fixed k 2, the random
variables k2m k2m (k 1 ) [k 1 , k], m 1, are W (E) -independent random
variables, we can use the BorelCantelli Lemma as in 8.4.2 and thereby reduce
the problem to showing that, if km (t) = km (t + k 1 ) km (k 1 ), then
W (E) kk2m hk(E) =
m=1
for each > 0, k 2, and h BH 1 (H) (0, 1). Finally, since W (E) km 1 is the
k2m
W (E) distribution of
k2m , the rest of the argument is the same as the one
given in 8.4.2.
365
(E)
UR -distribution of
B( , t) is that of t times a reversible, RN -valued
OrnsteinUhlenbeck process.
where K is the constant in Ferniques Theorem. Next, use this together with
Theorem 8.4.4 and the reasoning in Exercise 4.3.16 to show that
k(t)kE
k(t)kE
= L = lim q
lim q
t&0
2t log(2) t
2t log(2)
where L = sup khkE : h BH (0, 1) .
t
366
log(2) (n 3)
t
,
n
t [0, ),
Hint: Referring to (ii) in Exercise 8.6.8, show that it suffices to prove these
properties for the sequence {(I)n : n 1}. Next check that
(I)n Ih
=
n h
(E)
(E)
for h H 1 (H),
and use Theorem 8.6.7 and the fact that I is an isometry of H 1 (H) onto itself.
Chapter 9
Convergence of Measures on a Polish Space
Next, again use kku supxE |(x)| to denote the uniform norm of
B(E; R), and consider the neighborhood basis at M1 (E) determined by
the sets
U (, r) = M1 (E) : h, i h, i < r for B(E, R) with kku 1
as r runs over (0, ). For obvious reasons, the topology defined by these neighborhoods U is called the uniform topology on M1 (E). In order to develop
some feeling for the uniform topology, I will begin by examining a few of its
elementary properties.
367
368
Lemma 9.1.1.
M1 (E) by
n
o
k kvar = sup h, i h, i : B(E; R) with kku 1 .
Then (, ) M1 (E)2 7 k kvar is a metric on M1 (E) that is compatible
with the uniform topology. Moreover, if , M1 (E) are two elements of
M1 (E) and is any element of M1 (E) with respect to which both and are
absolutely continuous (e.g., +
2 ), then
(9.1.2)
where f =
d
.
and g =
and equality holds when = sgn (g f ). To prove the assertion that follows
(9.1.2), note that
kg f kL1 (;R) kf kL1 (;R) + kgkL1 (;R) = 2
and that the inequality is strict if and only if f g > 0 on a set of strictly positive
-measure or, equivalently, if and only if 6 . Thus, all that remains is to
check the completeness assertion. To this end, let {n : n 1} M1 (E)
satisfying
lim sup kn m kvar = 0
m nm
P
be given, and set = n=1 2n n . Clearly, is an element of M1 (E) with
n
respect to which each n is absolutely continuous. Moreover, if fn = d
d , then,
1
by (9.1.2), {fn : n 1} is a Cauchy convergent sequence in L (; R). Hence,
since L1 (; R) is complete, there is an f L1 (; R) to which the fn s converge in
L1 (; R). Obviously, we may choose f to be non-negative, and certainly it has
-integral 1. Thus, the measure given by d = f d is an element of M1 (E),
and, by (9.1.2), kn kvar 0.
As a consequence of Lemma 9.1.1, we see that the uniform topology on M1 (E)
admits a complete metric and that convergence in this topology is intimately related to L1 -convergence in the L1 -space of an appropriate element of M1 (E).
369
In fact, M1 (E) looks in the uniform topology like a galaxy that is broken into
many constellations, each constellation consisting of measures that are all absolutely continuous with respect to some fixed measure. In particular, there will
usually be too many constellations for M1 (E) in the uniform topology to be
separable. To wit, if E is uncountable and {x} B for every x E, then the
point masses x , x E, (i.e., x () = 1 (x)) form an uncountable subset of
M1 (E) and ky x kvar = 2 for y 6= x. Hence, in this case, M1 (E) cannot be
covered by a countable collection of open k kvar -balls of radius 1.
As I said at the beginning of this section, the uniform topology is not the only
one available. Indeed, for many purposes and, in particular, for probability theory, it is too rigid a topology to be useful. For this reason, it is often convenient
to consider a more lenient topology on M1 (E). The first one that comes to mind
is the one that results from eliminating the uniformity in the uniform topology.
That is, given a M1 (E), define
o
n
(9.1.3) S , ; 1 , . . . , n M1 (E) : max hk , i hk , i <
1kn
4
4 0
for m 6= n, one sees that {n : n 1} not only fails to converge in the uniform
topology, it does
1not even have any limit points as n 2 . On the other
hand, because 2 2 sin(2nt) : n 1 is orthonormal in L [0,1] ; R , Bessels
Inequality says that
!2
Z
X
2
(t) sin(2nt) dt
kk2L2 ([0,1] ) kk2u <
n=1
[0,1]
370
and therefore h, n i h, [0,1] i for every B [0, 1]; R . In other words,
{n : n 1} converges to [0,1] in the strong topology, but it converges to nothing
at all in the uniform topology.
9.1.2. The Weak Topology. Although the strong topology is weaker than
the uniform and can be effectively used in various applications, it is still not
weak enough for most probabilistic applications. Indeed, even when E possesses
a good topological structure and B = BE is the Borel field over E, the strong
topology on M1 (E) shows no respect for the topology on E. For example,
suppose that E is a metric space and, for each x E, consider the point mass
x on BE . Then, no matter how close x E \ {x} gets to y in the sense
of the topology on E, x is not getting close to y in the strong topology on
M1 (E). More generally (cf. Exercise 9.1.15), measures cannot be close in the
strong topology unless their sets of small measure are essentially the same. Thus,
for example, the convergence that is occurring in The Central Limit Theorem
(cf. Theorem 2.1.8) cannot, in general, be taking place in the strong topology;
and since The Central Limit Theorem is an archetypal example of the sort of
convergence result at which probabilists look, it is only sensible for us to take a
hint from the result that we got there.
Thus, let E be a metric space, set B = BE , and consider the neighborhood
basis at M1 (E) given by the sets S(, ; 1 , . . . , n ) in (9.1.3) when the
k s are restricted to be elements of Cb (E; R). The topology that results is much
weaker than the strong topology, and is therefore justifiably called the weak
topology on M1 (E). (The reader who is familiar with the language of functional
analysis will, with considerable justice, complain about this terminology. Indeed,
if one thinks of Cb (E; R) as a Banach space and of M1 (E) as a subspace of its
dual space Cb (E; R) , then the topology that I am calling the weak topology
is what a functional analyst would call the weak topology. However, because
it is the most commonly accepted choice of probabilists, I will continue to use
the term weak instead of the more correct term weak .) In particular, the weak
topology respects the topology on E: y tends to x in the weak topology on
M1 (E) if and only if y x in E. Lemma 2.3.3 provides further evidence
that the weak topology is well adapted to the sort of analysis encountered in
probability theory, since, by that lemma, weak convergence of {n : n 1}
M1 (RN ) to is equivalent to pointwise convergence of
cn () to
().
Besides being well adapted to probabilistic analysis, the weak topology turns
out to have many intrinsic virtues that are not shared by either the uniform or
strong topologies. In particular, as we will see shortly, when E is a separable
metric space, the weak topology on M1 (E) is not only a metric topology, which
(cf. Exercise 9.1.15) the strong topology seldom is, but it is even separable,
which, as we have seen, the uniform topology seldom is. In order to check these
properties, we will first have to review some elementary facts about separable
metric spaces.
Given a metric for a topological space E, I will use Ub (E; R) to denote
371
(x, pn )
,
1 + (x, pn )
x E.
X
|n n |
2n
n=1
372
is a metric for E. At the same time, since [0, 1]Z is compact, and therefore
the restriction of D to any subset is totally bounded, it is clear that is totally
bounded on E.
denote the completion of E with respect to the totally
To prove (iii), let E
E
is both complete and
bounded metric . Then, because E is dense in E,
R 7 E
totally bounded and therefore compact. In addition, C E;
Ub(E; R) is a surjective homeomorphism; and so (iii) now follows from (i).
One of the main reasons why Lemma 9.1.4 will be important to us is that it
will enable us to show that, for separable metric spaces E, the weak topology
on M1 (E) is also a separable metric topology. However, thus far we do not
even know that the neighborhood bases are countably generated, and so, for a
moment longer, I must continue to consider nets when discussing convergence.
In order to indicate that a net { : A} M1 (E) is converging weakly
(i.e., in the weak topology) to , I will write = .
Theorem 9.1.5. Let E be any metric space and { : A} a net in M1 (E).
Given any M1 (E), the following statements are equivalent:
(i) = .
(ii) If is any metric for E, then h, i h, i for every Ub (E; R).
(iii) For every closed set F E, lim (F ) (F ).
373
are all trivial. Thus, the first part will be complete once I check that (ii) =
(iii), (iv) = (vi), and that (v) together with (vi) imply (vii). To see the
first of these, let F be a closed subset of E, and set
n (x) = 1
(x, F )
1 + (x, F )
n1
for n Z+ and x E.
It is then clear that n Ub (E; R) for each n Z+ and that 1 n (x) & 1F (x)
as n for each x E. Thus, The Monotone Convergence Theorem followed
by (ii) imply that
(F ) = lim hn , i = lim limhn , i lim (F ).
n
In proving that (iv) = (vi), I may and will assume that f is a non-negative,
lower semicontinuous function. For n N, define
fn =
X
` 4n
`=0
where
I`,n =
2n
1I`,n
4
1 X
1J`,n f,
f = n
2
`=0
` `+1
,
2n 2n
and J`,n =
`
, .
2n
for x E.
It is then an easy matter to check that f f f everywhere and that equality holds -almost surely. Furthermore, f is lower semicontinuous, f is upper
semicontinuous, and both are bounded. Hence, by (v) and (vi),
374
and so I have now completed the proof that conditions (i) through (vii) are
equivalent.
Now assume that E is separable, and let be a totally bounded metric for E.
By (iii) of Lemma 9.1.4, Ub(E; R) is separable. Hence, we can find a countable
set {n : n 1} that is dense in Ub(E; R). In particular, by the equivalence of
(i) and (ii) above, we see that hn , i hn , i for all n Z+ if and only if
+
= , which is to say that the corresponding map H : M1 (E) [0, 1]Z is
+
a homeomorphism. Since [0, 1]Z is a compact metric space and D (cf. the proof
of (ii) in Lemma 9.1.4) is a metric for it, we also see that the R described is a
totally bounded metric for M1 (E). In particular, M1 (E) is separable. Finally,
since, by (ii) in Lemma 9.1.4, it is always possible to find a totally bounded
metric for E, the last assertion needs no further comment.
The reader would do well to pay close attention to what (iii) and (iv) say
about the nature of weak convergence. Namely, even though = , it is
possible that some or all of the mass that the s assign to the interior of a
set may gravitate to the boundary in the limit. This phenomenon is most easily
understood by taking E = R, to be the unit point mass at [0, 1),
checking that = 1 , and noting that 1 (0, 1) = 0 < 1 = (0, 1) for each
[0, 1).
Remark 9.1.6. Those who find nets distasteful will be pleased to learn that,
from now on, I will be restricting my attention to separable metric spaces E and
therefore need only discuss sequential convergence when working with the weak
topology on M1 (E). Furthermore, unless the contrary is explicitly stated, I will
always be thinking of the weak topology when working with M1 (E).
Given a separable metric space E, I next want to find conditions that guarantee
that a subset of M1 (E) is compact; and at this point it will be convenient to
have introduced the notation K E to indicate that K is a compact subset
of E. The key to my analysis is the following extension of the sort of Riesz
Representation result in Theorem 3.1.1 combined with a crucial observation
made by S. Ulam.1
Lemma 9.1.7. Let E be a separable metric space, a metric for E, and a
non-negative linear functional on Ub (E; R) (i.e., is a linear map that assigns
a non-negative value to a non-negative Ub (E; R)) with (1) = 1. Then, in
order for there to be a (necessarily unique) M1 (E) satisfying () = h, i
for all Ub (E; R), it is sufficient that, for every > 0, there exist a K E
1
It is no accident that Ulam was the first to make this observation. Indeed, the term Polish
space was coined by Bourbaki in recognition of the contribution made to this subject by the
Polish school in general and C. Kuratowski in particular (cf. Kuratowskis Topologie, Vol. I,
WarszawaLwow (1933)). Ulam had studied with Kuratowski.
375
such that
(9.1.8)
() sup |(x)| + kku ,
xK
Ub (E; R).
Conversely, if E is a Polish space and M1 (E), then for every > 0 there is a
K E such that (K) 1 . In particular, if M1 (E) and () = h, i
for Cb (E; R), then, for each > 0, (9.1.8) holds for some K E.
Proof: I begin with the trivial observation that, because is non-negative and
(1) = 1, () kku . Next, according to the Daniell theory of integration,
the first statement will be proved as soon as we know that (n ) & 0 whenever
{n : n 1} is a non-increasing sequence of functions from Ub E; [0, ) that
tend pointwise to 0 as n . To this end, let > 0 be given, and choose
K E so that (9.1.8) holds. One then has that
lim n lim sup |n (x)| + k1 ku = k1 ku ,
n xK
!
Bk,n
k=1
.
2n
Hence, if
Cn
`n
[
k=1
B k,n
and K =
Cn ,
n=1
then (K) 1 . At the same time, it is obvious that, on the one hand,
K is closed (and therefore -complete) and that, on the other hand, K
S`n
2
for every n Z+ . Hence, K is both complete and totally
k=1 B pk , n
bounded with respect to and, as such, is compact.
As Lemma 9.1.7 makes clear, probability measures on a Polish space like to
be nearly concentrated on a compact set. Following Prohorov and Varadarajan,2
2
See Yu. V. Prohorovs article Convergence of random processes and limit theorems in probability theory, Theory of Prob. & Appl., which appeared in 1956. Independently, V.S.
Varadarajan developed essentially the same theory in Weak convergence of measures on a
separable metric spaces, Sankhy
a, which was published in 1958. Although Prohorov got into
print first, subsequent expositions, including this one, rely heavily on Varadarajan.
376
what we are about to see is that, for a Polish space E, relatively compact subsets
of M1 (E) are those whose elements are nearly concentrated on the same compact
set of E. More precisely, given a separable metric space E, say that M M1 (E)
is tight if, for every > 0, there exists a K E such that (K) 1 for
all M .
Theorem 9.1.9. Let E be a separable metric space and M M1 (E). Then
M is compact if M is tight. Conversely, when E is Polish, M is tight if M is
compact.3
Proof: Since it is clear, from (iii) in Theorem 9.1.5, that M is tight if and only
if M is, I will assume throughout that M is closed in M1 (E).
To prove the first statement, take
to be a totally bounded metric on E,
`
[
!
Bk,n
for `, n Z+ .
k=1
By (iv) in Theorem 9.1.5, M1 (E) 7 f`,n () [0, 1] is lower semicontinuous. Moreover, for each n Z+ , f`,n % 1 as ` % . Thus, by Dinis Lemma,
we can choose, for each n Z+ , one `n Z+ so that f`n ,n () 1 2n for all
3
For the reader who wishes to investigate just how far these results can be pushed before
they start of break down, a good place to start is Appendix III in P. Billingsleys Convergence
of Probability Measures, Wiley (1968). In particular, although it is reasonably clear that
completeness is more or less essential for the necessity, the havoc that results from dropping
separability may come as a surprise.
377
M ; and at this point the rest of the argument is precisely the same as the
one given at the end of the proof of Lemma 9.1.7.
9.1.3. The L
evy Metric and Completeness of M1 (E). We have now seen
that M1 (E) inherits properties from E. To be more specific, if E is a metric
space, then M1 (E) is separable or compact if E itself is. What I want to show
next is that completeness also gets transferred. That is, I will show that M1 (E)
is Polish if E is. In order to do this, I will need a lemma that is of considerable
importance in its own right.
Lemma 9.1.10. Let E be a Polish space and a bounded subset of Cb (E; R)
that is equicontinuous at each x E. (That is, for each x E, sup |(y)
(x)| = 0 as y x.) If {n : n 1} {} M1 (E) and n = , then
lim sup h, n i h, i = 0.
Proof: Let > 0 be given, and use the second part of Theorem 9.1.9 to choose
K E so that
sup kku
sup n K{ < .
4
nZ+
By (iv) of Theorem 9.1.5, K{ satisfies the same estimate. Next, choose a
metric for E and a countable dense set {pk : k 1} in K. Using equicontinuity
together with compactness, find ` Z+ and 1 , . . . , ` > 0 so that K x :
(x, pk ) < k for some 1 k ` and
sup (x) (pk ) <
4
Because r (0, ) 7 y K : (y, x) r
[0, 1] is non-decreasing
for each x K, we can find,
for each 1 k `, an rk k , 2k such that
(Bk ) = 0 when Bk x K : x, pk < rk . Finally, set A1 = B1 and
Sk
S`
Ak+1 = Bk+1 \ j=1 Bj for 1 k < `. Then, K k=1 Ak , the Ak s are
disjoint, and, for each 1 k `,
sup sup (x) pk <
4
xAk
and Ak = 0.
k=1
378
(x, y)
.
and F (x) F (y)
Setting ` =
,
2`
379
380
not depend on the choice of metric. To complete the first part, suppose that
(Xn , X) 0 in P-measure. Then, for every Ub (E; R) and > 0,
lim EP Xn EP X) lim EP Xn (X)
n
n
() + kku lim P Xn , X = (),
n
where
() sup |(y) (x)| : (x, y) 0 as
& 0.
that Yk,n
P = k as n . Finally, let be a complete metric for E, and
suppose that {Xn : n 1} is a sequence of E-valued random variables on
(, F, P) with the property that
(9.1.14)
lim lim P Xn , Yk,n = 0 for every > 0.
k n
`k
lim lim L (Y`,n ) P, (Xn ) P = 0.
k n
381
`k n
lim L , (Xn ) P L(, k ) + lim L (Yk,n ) P, (Xn ) P .
n
Thus, after letting k and applying (*), one concludes that (Xn ) P =
.
Exercises for 9.1
Exercise 9.1.15. Let (E, B) be a measurable space with the property that
{x} B for all x E. In this exercise, we will investigate the strong topology
in a little more detail. In particular, in part (iv), we will show that when
M1 (E) is non-atomic (i.e., {x} = 0 for every x E), then there is no
countable neighborhood basis of in the strong topology. Obviously, this means
that the strong topology for M1 (E) admits no metric whenever M1 (E) contains
a non-atomic element.
(i) Show that, in general,
k kvar = 2 max (A) (A) : A B
and that in the case when E is a metric space, B its Borel field, and a metric
for E,
k kvar = sup h, i h, i : Ub (E; R) and kku 1 .
(ii) Show that if {n : n 1} is a P
sequence in M1 (E) that tends in the strong
382
lim
for
(Suppose not, set (x) = (x,
x)
f Cb (R; R).) Finally, assuming that Cb (E; R) is separable, and, using a
diagonalization procedure, show that every sequence {xn : n 1} E admits a
and limm xn
subsequence {xnm : m 1} that converges to some x
E
m
exists for every Cb (E; R).
383
R
Conversely,
if
E
is
Polish
and
there
is
a
finite
measure
M
such
that
dMn
E
R
dM for every Cb (E; R), show that {Mn : n 1} is tight.
E
Exercise
9.1.17. Let {E` : ` 1} be a sequence of Polish spaces, set E =
Q
1 E` , and give E the product topology.
(i) For each ` Z+ , let ` be a complete metric for E` , and define
X
1 ` (x` , y` )
R(x, y) =
2` 1 + ` (x` , y` )
for x, y E.
`=1
k k ,
k=1
k=1
384
` x1 , . . . , x`
=
n
n`
xn
if
en
otherwise.
Show that (` ) [1,`] : ` Z+ M1 (E) is tight and that any limit must be
the desired .
The conclusion drawn in (iii) is the renowned Kolmogorov Extension (or
Consistency) Theorem. Notice that, at least for Polish spaces, it represents
a vast generalization of the result obtained in Exercise 1.1.14.
Exercise 9.1.18. In this exercise we will use the theory of weak convergence
to develop variations on The Strong Law of Large Numbers (cf. Theorem 1.4.9).
Thus, let E be a Polish space, (, F, P ) a probability space, and {Xn : n 1}
a sequence of mutually independent E-valued random variables on (, F, P )
with common distribution M1 (E). Next, define the empirical distribution
function
n
1 X
X () M1 (E),
7 Ln ()
n m=1 m
n
1 X
Xm () ,
, Ln () =
n m=1
n Z+ and .
which is The Strong Law of Large Numbers for the empirical distribution.
Now show that (9.1.19) provides another (cf. Exercises 6.1.16 and 6.2.18) proof
of the Strong Law of Large Numbers for Banach spacevalued random variables.
Thus, let EPbe a real, separable, Banach space with dual space E , and set
n
S n () = n1 1 Xm () for n Z+ and .
BE (0, R){ = 0
for some
R (0, ).
Choose Cb R; R so that (t) = t for t [R, R] and (t) = 0 when
|t|
R + 1, and define x Cb (E; R) for x E by x (x) = hx, x i , x E,
385
sup
n kx k 1
E
Z
hx , Ln ()i hx, x i (dx) = 0
E
lim
S n () m
E = 0
=
Xn ()
if
Xn ()
E < R
otherwise
Pn (R)
(R)
(R)
() = Xn () Xn (). Next, set S n = n1 1 Xm , n Z+ , and,
(R)
from (i), note that S n () : n 1 converges in E for P-almost every .
In particular, if > 0 is given and R (0, ) is chosen so that
(R)
and Yn
Z
kxkE (dx) <
,
8
{kxkE R}
lim P
sup
S n S m
E
nm
(R)
(R)
lim P sup S n S m
m
2
nm
n
!
1 X
(R)
Yk
+ 2 lim P sup
m
4
nm
n 1
E
!
n
1 X
Y (R)
= 0,
2 lim P sup
k
E
m
4
nm n 1
386
P AB =
P
(B) P(d) for all A and B F.
.
Hence,
since
F
(as
the
Borel
field
over a
The beautiful argument that I have just outlined is due to Ranga Rao. See his 1963 article
The law of large numbers for D[0, 1]-valued random variables, Theory of Prob. & Appl.
VIII #1, where he shows that this method applies even outside the separable context.
n fn
and g =
n=0
387
n gn
n=0
R.
Clearly, (1) = 1 for all . On the other hand, we cannot say that
is always non-negative as a linear functional on S. In fact, the best we can
do is extract a -measurable P-null set N so that is a non-negative linear
functional on S whenever
/ N . To this end, let Q denote the rational reals
and set
Q+ = R QN : f 0 .
Since g 0 (a.s., P) for every Q+ and Q+ is countable,
n
N : Q+
o
g () < 0
f S,
(f ) for
/N
EP [f ]
for
m,n () =
m (, Kn )
1 + m (, Kn )
for m, n Z+ .
388
Clearly, m,n U for each pair (m, n) and 0 m,n % 1Kn { as m for each
n Z+ . Thus, by The Monotone Convergence Theorem, for each n Z+ ,
Z
Z
sup m,n P(d) = lim
m,n P(d)
m
N { mZ+
N{
1
= lim EP m,n n ;
m
2
and so, by the BorelCantelli Lemma, we can find a -measurable P-null set
N 0 N such that
M () sup n sup m,n
< for every
/ N 0.
nZ+
mZ+
Hence, if
/ N 0 , then, for every f U and n Z+ ,
(f ) (1 m,n ) f + m,n f
M ()
kf ku
(1 m,n ) f
u +
n
for all m Z+ . But
(1 m,n ) f
u kf ku,Kn as m , and so we now see
that the condition in (9.1.8) is satisfied by for every
/ N 0 . In other words,
0
0
7 P
by taking P = P for N , then this map is -measurable and
Z
EP f, A =
EP [f ] P(d), A ,
first for all f U and thence for all F-measurable f s that are bounded below.
If P is a probability measure on (, F) and is a sub--algebra of F, then
a conditional probability distribution of P given is a map (, B) 7
P
(B) such that P is a probability measure on (, F) for each and
P (B) a conditional probability of B given for all B F. If, in addition, for outside a -measurable, P-null set and all A , P (A) = 1A (),
then the conditional probability distribution is said to be regular. Notice that,
although they may not always exist, conditional probability distributions are
always unique up to a -measurable, P-null set so long as F is countably generated. Moreover, Theorem 9.2.1 says that they will always exist if is Polish and
F = B . Finally, whenever a conditional probability distribution of P given
exists, the argument leading to the last part of Theorem 9.2.1 when is countably generated is completely general and shows that a regular version can be
found.
9.2.1. Fibering a Measure. When is a product space E1 E2 of two
Polish spaces and is the -algebra generated by the second coordinate, then
the conclusion of Theorem 9.2.1 takes a particularly pleasing form.
389
know that P(x1 ,x2 ) = P(x0 ,x2 ) . In addition, because is countably generated,
1
the final part of Theorem 9.2.1 guarantees
that there exists a 2 -null set B
BE2 such that P
E
{x
}
=
1
for
all x2
/ B. Hence, if we define
0
1
2
(x ,x2 )
1
x2
(x2 , ) by (x2 , ) = P
( E2 ), then, for any Borel measurable
(x01 ,x2 )
: E1 E2 [0, ), h, i equals
Z Z
Z Z
0
( 0 )P
(d
)
P(d)
=
(x
,
x
)
(x
,
dx
)
2 (dx2 ).
1
2
2
1
E2
E1
In the older literature, the result in Theorem 9.2.2 would be called a fibering
of . The name derives from the idea that on E1 E2 can be decomposed into
its vertical component 2 and its restrictions (x2 , ) to horizontal fibers
E1 {x2 }. Alternatively, Theorem 9.2.2 can be interpreted as saying that any
M1 (E1 E2 ) can be decomposed into its marginal distribution on E2 and
a transition probability x2 E2 7 (x2 , ) M1 (E1 ). The two extreme cases
are when the coordinates are independent, in which case (x2 , ) is independent
of x2 , and the case when the coordinates are equal, in which case (x2 , ) = x2 .
As an application of Theorem 9.2.2, I present the following important special
case of a more general result that indicates just how remarkably fungible nonatomic measures are.
Corollary 9.2.3. Let [0,1) denote Lebesgue measure on [0, 1). For each
N Z+ and M1 (RN ), there is a Borel measurable map f : [0, 1) RN
such that = f [0,1) .
Proof: I will work by induction on N Z+ . When N = 1, take
f (u) = inf t R : (, t] u , u [0, 1).
Next, assume the result is true for N , take E1 = R and E2 = RN in Theorem
9.2.2, and, given M1 (RN ), define 2 M1 (RN ) and y RN 7 (y, )
M1 (R) accordingly. By the induction hypothesis, 2 = f2 ( ) [0,1) for some
f2 : [0, 1) RN . Thus, if g : [0, 1)2 R RN is given by
g(u1 , u2 ) = inf t R : f2 (u2 ), (, t] u1 , f2 (u2 )
390
for (u1 , u2 ) [0, 1)2 , then g is Borel measurable on [0, 1)2 and = g 2[0,1) .
Finally, by Lemma 1.1.6 or part (ii) of Exercise 1.1.11, we know
that there is a
Borel measurable map u [0, 1) 7 U(u) = U1 (u), U2 (u) [0, 1)2 such that
U [0,1) = 2[0,1) , and so we can take f (u) = g U.
9.2.2. Representing L
evy Measures via the It
o Map. There is another
way of thinking about the construction of the Poisson jump processes, one that
is based on Corollary 9.2.3 and the transformation property described in Lemma
4.2.12. The advantage of this approach is that it provides a method of coupling
Levy processes corresponding to different Levy measures. Indeed, it is this coupling procedure that underlies K. Itos construction of Markov processes modeled
on Levy processes.1
Let M0 (dy) = |y|N 1 dy, which is the Levy measure for a (cf. Corollary 3.3.9)
symmetric 1-stable law. My first goal is to show that every M M (RN ) can
be realized as (cf. the notation in Lemma 4.2.6) M0F for some Borel measurable
F : RN RN satisfying F (0) = 0.2
Theorem 9.2.4. For each M M (RN ) there exists a Borel measurable map
F : RN RN such that F (0) = 0 and
M () = M0F M0 F 1 ( \ {0}) , BRN .
Proof: I begin with the case when N = 1. Given M M (R), define (r, 1)
for r > 0 by
(r, 1) = sup [0, ) : M [, ) r1
(r, 1) = sup [0, ) : M (, ] r1 ,
where I have taken the supremum over the empty set to be 0. Applying Exercise
9.2.6 with (dr)= r2 (0,) (dr), one sees that M = M0F when F (0) = 0 and
y
for y R \ {0}.
F (y) = |y|, |y|
Now assume that N 2, and let M M (RN ). If M = 0, simply take
F 0. If M 6= 0, choose a non-decreasing function h : (0, ) (0, ) so that
Z
h |y| M (dy) = 1,
See K. It
os On stochastic differential equations, Memoirs of the A.M.S. 4 (1951) or my
Markov Processes from K. It
os Perspective, Princeton Univ. Press, Annals of Math. Studies
155 (2003).
2 There is nothing sacrosanct about the choice of M as my reference measure. For instance, it
0
should be obvious that one can choose any L
evy measure M with the property that M0 = M F
for some Borel measurable F : RN RN that takes 0 to 0.
391
Then, again by Exercise 9.2.6, but this time with (dr) = N 1 r2 (0,) (dr),
for any continuous : RN [0, ) that vanishes in a neighborhood of 0,
Z
Z
(r)
(, dr) = N 1
(r, ) r2 dr, SN 1 ,
(0,)
(0,) h(r)
and so
Z
(y) M (dy) = N 1
(r, ) r
SN 1
RN
dr 2 (d)
(0,)
= N 1
[0,1)
!
2
(r, )f (t) r dr [0,1) (dt).
(0,)
Finally, define g : SN 1 [0, N 1 ) by g() = SN 1 { 0 SN 1 : 10 1 } ,
note that N 1 [0,1) = g SN 1 , and conclude that M = M0F when
y
y
for y RN \ {0}.
f g |y|
F (0) = 0 and F (y) = |y|, |y|
We can now prove the following theorem, which is the simplest example of
It
os procedure.
Theorem 9.2.5. Let {j0 (t, ) : t 0} be a Poisson jump process associated
with M0 . Then, for each M M (RN ), there is a Borel measurable map
F : RN RN with F (0) = 0 and a Poisson jump process {j(t, ) : t 0}
associated with M such that j(t, ) = j0F (t, ), t 0, P-almost surely.
Proof: Choose F as in Theorem 9.2.4 so that M = M0F . For R > 0, set
FR (y) = 1[R,) (y)F (y). By Lemma 4.2.12, we know that {j0FR (t, ) : t 0} is
a Poisson jump process associated with M FR . In particular, for each r > 0,
EP j0F t, RN \ B(0, r) = lim EP j0FR t, RN \ B(0, r) = M RN \ B(0, r) < .
R&0
392
r 0,
where
over the empty set is taken to be 0. Show that [t, ) =
the supremum
r : (r) t for all t > 0, and therefore that h, i = h , i for all Borel
measurable : [0, ) [0, ) that vanish at 0.
Hint: Determine g : (0, )
(0,
)
so
that
g(r),
X
1 k 0 k[0,n]
(, 0 ) =
2n 1 + k 0 k[0,n]
n=1
393
we know that two probability measures , M1 C(RN ) are equal if they
determine the same distribution on (RN )[0,) , that is, if, for each n Z+ and
0 = t0 < t1 < tn , the distribution of C(RN ) 7 (t0 , . . . , (tn ) (RN )n
is the same under and .
9.3.1. Donskers Theorem. Let (, F, P) be a probability space, and suppose that {Xn : n 1} is a sequence of independent,
P-uniformly square
integrable random variables (i.e., as R , EP |Xn |2 , |Xn | R 0
uniformly in n) with mean value 0 and covariance I. Given n 1, define
Pm
m
12
=
n
7 Sn ( , ) C(RN ) so that
S
(0)
=
0,
S
n
n
k=1 Xk , and
n
m1 m
+
Sn ( , ) is linear on each interval n , n for all m Z . Donskers theorem
is the following.
n (k) = n 2
X
j=bntk1 c+1
Xj ,
1
k ,
set
394
where, as usual, I use the notation btc to denote the integer part of t. Noting
that
Sn tk Sn tk1 n (k)
bntk1 c
bntk c
Sn tk Sn
+ Sn tk1 Sn
n
n
Xbnt c+1 + Xbnt c+1
k
k1
,
1
n2
`
2
X
n2
P
Xbntk c+1
4
k=0
`
2 i 4(` + 1)N
4 X P h
=
0
E
X
bnt
c+1
k
n2
n2
k=0
Mn (k)
X
||2
1
P
, Xbntk c+j RN
exp
E exp
1
2
Mn (k) 2 j=1
Mn (k)
n
k , we now
i
k ||2
= \
exp
E exp 1 , n (k) RN
0,k I (),
2
P
and therefore n (k) P = 0,k I .
395
iscompact for any > 0 and {R` : ` 1} [0, ). Thus, since n (0) =
0 = 1, all that we have to do is show that, for each T > 0,
|Sn (t) Sn (s)|
P
< ,
sup E
sup
1
(t s) 8
n1
1s<tT
81N 2 M (` k)2
135N 2 M (t s)2 ,
n2
where, in the passage to the final line, I have taken {ei : 1 i N } to be an
orthonormal basis for RN and used the estimate
2 2
X
`k
N
`k
X X
EP
Xk+j = EP
ei , Xk+j RN
j=1
i=1
j=1
54M (t s)2 +
N
X
i=1
4
`k
X
EP
ei , Xk+j RN 3N 2 M (` k)2
j=1
396
nZ+
and, for every n Z+ , the random variable Xn, fn, Xn has mean value 0
and covariance I. Next, for each > 0, define the maps 7 Sn, ( , )
C(RN ) relative to {Xn, : n 1}, and set n, = Sn, P. Then, by the
preceding, we know that n, = W (N ) for each > 0. Hence, by Theorem
9.1.13, we will have proved that n = W (N ) as soon as we show that
lim sup P sup Sn (t) Sn, (t) = 0
&0 nZ+
0tT
for every T Z+ and > 0. To this end, first observe that, because Sn ( ) and
Sn, ( ) are linear on each interval [(m 1)2n , m2n ],
m
X
1
Y
sup Sn (t) Sn, (t) = max
k, ,
1
1mnT
2
n k=1
t[0,T ]
for every e SN 1 .
9.3.2. Rayleighs Random Flights Model. Here is a more picturesque
scheme for approximating Brownian motion. Imagine the path t
R(t) of a
bird that starts at the origin, flies in a randomly chosen direction at unit speed
397
for a unit exponential random time, then switches to a new randomly chosen
direction for a second unit exponential time, etc. Next, given > 0, rescale time
1
and space so that the path becomes t
R (t), where R (t) 2 R(1 t). I
will show that, as & 0, the distribution of {R (t) : t 0} becomes Brownian
motion. This model was introduced by Rayleigh and is called his random flights
model.
In the following, {m : m 1} is a sequence of mutually independent, unit
exponential random variables from which their partial sums {Tn : n 0} and
the associated simple Poisson process {N (t) : t 0} are defined as in 4.2.1.
Finally, given > 0, N (t) = N (1 t).
Lemma 9.3.3. Let {Xn : n 1} a sequence of mutually independent RN valued, uniformly square P-integrable random variables with mean value 0 and
covariance I, and define {Sn (t) : t 0} accordingly, as in Theorem 9.3.1. (Note
that the Xn s are not assumed to be independent of the n s.) Next, define
X (t, ) =
N (t,)
Xm ,
(t, ) [0, ) .
m=1
lim P
&0
= 0,
where n [1 ].
t[0,T ]
N (t, )
,
X (t, ) Sn (t, ) = ( n 1) Sn
n
N (t, )
, Sn (t, ) .
+ Sn
n
sup X (t) Sn (t) r
t[0,T ]
!
N (t)
t
+ P sup
n
t[0,T ]
!
r
.
+ P sup
sup Sn (t) Sn (s)
2
s[0,T ] |ts|
r
P
sup Sn (t)
2
t[0,T +]
But, by Theorem 9.3.1 and the converse statement in Theorem 9.1.9, we know
that the first term tends to 0 as & 0 uniformly in (0, 1] and that the third
398
term tends to 0 as & 0 uniformly in (0, 1]. Thus, all that remains is to
note that, by Exercise 4.2.19,
!
(9.3.4)
lim P sup N (t) t = 0.
&0
t[0,T ]
Now suppose that {n : n 1} is a sequence of mutually independent RN valued random variables that satisfy the conditions that
h
i
M sup EP |n n |4 < ,
nZ+
h
i
EP n n = 0, and EP (n n ) (n n ) = I, n Z+ .
Finally, define 7 R( , ) C(RN ) by
N (t,)
X
R(t, ) = t TN (t,) () N (t,)+1 () +
m ()m ().
m=1
Hence, by Lemma 9.3.3 and Theorems 9.3.1 and 9.1.13, all that we have to do
is check that
!
lim P sup XN (t)+1 r = 0
&0
t[0,T ]
14
1
M (2 + T ) 4
P X
4
= 0.
E
|Xn+1 |
lim
lim
&0 r
&0
r
0nT
399
Prove their result as an application of Donskers Theorem and part (iii) of Exercise 4.3.11. According to Kac, it was G. Uhlenbeck who first suggested that
their result might be a consequence of a more general invariance principle.
Exercise 9.3.8. Here is another version
of Rayleighs random
flights model.
Again let {k : k 1}, Tm : m 0 , and N (t) : t 0 be as in 4.2.2, and
set
Z t
R(t) =
(1)N (s) ds and R (t) = R t .
0
(1)k k =
n
X
k=1
X
k k+1 k n n =
2k 2k1 n n+1 .
1k n
2
Chapter 10
Wiener Measure and
Partial Differential Equations
In this chapter I will give a somewhat sketchy survey of the bridge between
Brownian motion and partial differential equations. Like all good bridges, it
is valuable when crossed starting at either end. For those starting from the
probability side, it provides a computational tool with which the evaluation of
many otherwise intractable Wiener integrals is reduced to finding the solution to
a partial differential equation. For aficionados of partial differential equations,
it provides a representation of solutions that often reveals properties that are
not at all apparent in more conventional, purely analytic, representations.
10.1 Martingales and Partial Differential Equations
The origin of all the connections between Brownian motion and partial differential equations is the observation that the Gauss kernel
(10.1.1)
g (N ) (t, x) = (2t) 2 e
|x|2
2t
(t, x) (0, ) RN ,
is simultaneously the density for the Gaussian distribution 0,tI and the solution
to the heat equation t u = 12 u in (0, ) R with initial condition 0 . More
precisely, if Cb (RN ; R), then
Z
u (t, x) =
g (N ) (t, y x)(y) dy
RN
is the one and only bounded u C 1,2 (0, ) RN ; R that solves the Cauchy
initial value problem
t u = 21 u in (0, ) RN with lim u(t, ) = uniformly on compacts.
t&0
401
u Cb1,2 [0, ) RN ; C , then that theorem shows that, when B(t), Ft , P is a
Brownian motion, for each T > 0, u(T tT, x+B(tT )Ft , P is a martingale.
Thus,
Z
u(T, x) = EP B(T ) =
(x + y) 0,tI (dy) = u (T, x).
RN
In Theorem 10.1.2, I will prove a refinement of Theorem 7.1.6 that will enable
me (cf. the discussion following Corollary 10.1.3) to remove the assumption that
the derivatives of u are bounded.
As the preceding line of reasoning indicates, the advantage that probability
theory provides comes from lifting questions about a partial differential equation to a pathspace setting, and martingales provide one of the most powerful
machines with which to do the requisite lifting. In this section I will refine and
exploit that machine.
10.1.1. Localizing and Extending Martingale Representations. The
purpose of this subsection is to combine Theorems 7.1.6 and 7.1.17 with Corollary
7.1.15 to obtain a quite general method for representing solutions to partial
differential equations as Wiener integrals.
For the purposes of this chapter, it is best to think of Wiener measure
W (N )
N
N
as a Borel measure on the Polish space C(R ) C [0, ); R
and to take
{Ft : t 0} with Ft = {( ) : [0, t]} as the standard choice of a
non-decreasing family of -algebras. The reason for using C(RN ) instead of (cf.
(N )
8.1.3) (RN ) is that we will want to consider the translates Wx of W (N ) by
(N
)
x RN . That is, Wx is the distribution of
x + under W (N ) . Since it
(N )
N
is clear that the map x R 7 Wx M1 C(RN ) is continuous, there is
no doubt that it is Borel measurable.
Theorem 10.1.2. Let G be a non-empty, open subset of R RN , and, for
s R, define sG : C(RN ) [0, ] by
sG () = inf t 0 : s + t, (t)
/G .
Further, suppose that V : G R is a Borel measurable function that is
bounded above on the whole of G and bounded below on each compact subset
of G, and set
!
Z tsG
EsV (t, ) = exp
V s + , ( ) d .
0
If w C 1,2 (G; R) Cb (G; R) satisfies t + 12 + V w f on G, where f :
G R is a bounded, Borel measurable function, then
EsV (t, )w s + t sG (), (t sG )
Z
tsG ()
E (, )f s + , ( ) , Ft , Wx(N )
402
is a submartingale for every (s, x) G. In particular, if t + 12 + V w = f on
G, then the preceding triple is a martingale.
Proof: Without loss in generality, I may and will assume that s = 0.
Choose a sequence {Gn : n 0} of open sets such that (0, x)S G0 , Gn
Mn (t, ) = wn
t, (t)
with gn = t wn + 12 wn .
gn , ( ) d
Thus, if
Z
En (t, ) = exp
Vn , ( ) d
,
Z
= En (t, )
gn
, () d
En (, )gn , () d,
and therefore
Z
En (, )Mn (, )Vn (, ) d
Z t
= En (t, )wn t, (t)
En (, )fn , ( ) d,
0
403
is a martingale.
Finally, define 0Gn for Gn in the same way as 0G was defined for G. Since
fn f on Gn , an application of Theorem 7.1.15 gives the desired result with
0Gn in place of 0G , and, because 0Gn % 0G , this completes the proof.
Perhaps the most famous application of Theorem 10.1.2 is the FeynmanKac
formula,1 a version of which is the content of the following corollary.
Corollary 10.1.3. Let V : [0, T ] RN R be a Borel measurable function
that is uniformly bounded above everywhere
and bounded below uniformly on
compacts. If u C 1,2 (0, T ) RN ; R is bounded and satisfies the Cauchy initial
value problem
t u = 12 u+V u+f in (0, T )RN
Proof: Given Theorem 10.1.2, there is hardly anything to do. Indeed, here
G = (0, T ) RN and so 0G = T . Thus, by Theorem 10.1.2 applied to w(t, ) =
u(T t, ), we know that
R tT
V (,( )) d
e 0
u T t T, (t)
Z
tT
R
0
V (,()) d
f , ( ) d, Ft , Wx(N )
is a martingale. Hence,
W (N )
u(T, x) = lim E
t%T
Rt
V (,( )) d
e 0
u T t, (t)
Z
+
R
0
V (,()) d
f , ( ) d ,
0
1
In the same spirit as he wrote down (8.1.4), Feynman expressed solutions to Schr
odingers
equation in terms of path-integrals. After hearing Feynman lecture on his method, Kac realized
that one could transfer Feynmans ideas from the Schr
odinger to the heat context and thereby
arrive at a mathematically rigorous but far less exciting theory.
404
u(t, x) = EWx
(t) =
Z
(y)g(t, y x) dy.
RN
Wx(N ) t (0, ) s t, (t)
/ G = 1.
(N )
Wx
h R n ()
i
V (,( )) d
e 0
w n (), (n ) ,
fn } n. Moreover, by (10.1.5), for
where n () = inf{t 0 : t, (t)
/ G
(N )
Wx -almost every , n (), (n ) tends to a point in {(t, x) G : t < 0}
as n , and therefore
lim w n (), (n ) = lim u n (), (n ) 0
n
405
Z
0 = u(0, 0)
R tG (0 )
0
retkV ku k W (N )
V (, 0 ( )) d
u t 0G ( 0 ), 0 (t 0G ) W (N ) (d 0 )
{ 0 : k 0 k[0,t] r} .
Since, by Corollary 8.3.6, W (N ) { 0 : k 0 k[0,t] r} > 0, we have the
required contradiction.
Turning to the final assertion, take G = R G, and observe that for all
(x, y) G2 there is a such that (0) = x, (1) = y, and ( ) G for all
[0, 1].
At first glance, one might think that the strong minimum principle overshadows the weak minimum principle and makes it obsolete. However, that is not
entirely true. Specifically, before one can apply the strong minimum principle,
one has to know that a minimum is actually achieved. In many situations,
continuity plus compactness provide the necessary existence. However, when
compactness is absent, special considerations have to be brought to bear. The
weak minimum principle does not suffer from this problem. On the other hand,
it suffers from a related problem. Namely, one has to know ahead of time that
(10.1.5) holds. As we will see below, this is usually not too serious a problem,
but it should be kept in mind.
10.1.3. The Hermite Heat Equation. In the preceding subsection I gave
an example of how probability theory can give information about solutions to
partial differential equations. In this subsection, it will be a differential equation
that gives us information about probability theory. To be precise, I, following M.
Kac, will give in this subsection his derivation of the formulas that we derived
406
by purely Gaussian techniques in Exercise 8.2.16, and in the next section I will
give his treatment of a closely related problem.2
Closed form solutions to the Cauchy initial value problem are available for
very few V s, but there is a famous one for which they are. Namely, when
V = 12 |x|2 , a great deal is known. Indeed, already in the nineteenth century,
Hermite knew how to analyze the operator 12 12 |x|2 . As a result, this operator
is often called the Hermite operator by mathematicians, although physicists
call it the harmonic oscillator because it arises in quantum mechanics as minus
the Hamiltonian for an oscillator that satisfies Hooks law. Be that as it may,
set (cf. (10.1.1))
(10.1.7)
h(t, x, y) = e
N t+|x|2
2
(N )
|y|2
1 e2t
t
,y e x e 2
2
for (t, x, y) (0, ) RN RN . By using the fact that g (N ) solves the heat
equation and tends to 0 as t & 0, one can apply elementary calculus to check
that
t h(t, , y) = 12 12 |x|2 h(t, , y) in (0, ) RN
for each y RN .
and lim h(t, x, y) = yx
t&0
(y)h(t, x, y) dy.
RN
u (t, x) = EWx
h 1 Rt
i
|( )|2 d
(t) .
e 2 0
Wx
Rt
N2
|x|2
|( )|2 d
12
0
tanh t ,
exp
= cosh t
e
2
which, together with Brownian scaling, vastly generalizes the result in Exercise
8.2.16.
2
See Kacs On some connections between probability theory and differential and integral
equations, Proc. 2nd Berkeley Symp. on Prob. & Stat. Univ. of California Press (1951),
where he gives several additional, intriguing applications of Corollary 10.1.3.
407
10.1.4. The Arcsine Law. As I said at the beginning of the last subsection,
there are very few V s for which one can write down explicit solutions to equations of the form t u = 12 u + V u. On the other hand, when V is independent
of time one can often, particularly whenRN = 1, write down a closed form ex
pression for the Laplace transform U = 0 et u(t, ) dt of u. Indeed, if u is a
bounded solution to t u = 12 u + V u, then it is an elementary exercise to check
that
12 V U = f,
U (x) =EWx
for T > 0,
The preceding remark is the origin of Kacs derivation of Levys Arcsine Law
for Wiener measure.
Theorem 10.1.8. For every T (0, ) and [0, 1],
(
W
(1)
1
C(R) :
T
)!
1[0,) (t) dt
2
arcsin .
Proof: First note that, by Brownian scaling, it suffices to prove the result when
T = 1. Next, set
1
Z
F () = W
C(R) :
1[0,)
(s) ds
,
[0, ),
and let denote the element of M1 [0, ) for which F is the distribution
function. We are going to compute F () by looking at the double Laplace
transform
Z
G()
et g(t) dt, (0, ),
(0,)
where
Z
g(t)
[0,)
et (d),
t (0, );
408
Z t
(1)
G() =
exp
+ 1[0,) (s) ds W (d) dt
0
0
Z R t
(1)
V (( )) d
= EW
e 0
dt
where V 1[0,) .
Z
Z
At this point, the strategy is to calculate G() with the help of the idea
explained above. For this purpose, I begin by seeking as good a solution x
R 7 u (x) R as I can find to the equation 12 u00 + V u = 1. By considering
this equation separately on the left and right half-lines and then matching, in so
far as possible, at 0, one finds that the best choice of bounded u will be to take
i
h p
1
if x [0, )
A exp 2(1 + ) x + 1+
u (x) =
i
h
B exp 2 x + 1
if x (, 0),
where
A =
1
(1 + )
12
1+
and B =
1
(1 + )
12
1
.
u,n (x) = n
n Z+ ,
1 00
u,n + 1[0,) u,n 1
2
on R \ {0}.
Thus, since the argument that I attempted to apply to u works for u,n , we
know that
Z R t
V (( )) d
W (1)
u,n (0) = E
e 0
fn (t) d dt .
0
409
In addition, because
W (1)
Z
1{0}
0
W (1)
Z
(t) dt =
0,t {0} dt = 0,
Z
Rt
0
V (( )) d
fn
(t) dt G().
Hence, the conclusion u (0) = G() has now been rigorously verified.
1
Knowing that G() = (1) 2 , the rest of the calculation is easy. Indeed,
since
r
Z
12 t
,
t e
dt =
es
1
p
ds =
s(t s)
Z
0
et
p
d;
(1 )
2
1
1 1
p
d = arcsin 1 .
F () =
0
(1 )
= arcsin ,
lim P
:
n
n
Pm
where Nn () is the number of m Z+ [0, n] for which Sm () `=1 X` ()
is non-negative.
Proof: Thinking of
9.2.1)
Nn ()
n
1[0,) Sn (t, ) dt,
one should guess that, in view of Theorem 9.3.1 and Theorem 9.1.13, there
should be very little left to be done. However, once again there are continuity
410
issues that have to be dealt with. Thus, for each f C R; [0, 1] and n Z+ ,
introduce the functions F f and Fnf on C(R) given by
F f () =
f (t) dt
and Fnf () =
n
1 X
f
n m=1
m
n
for any f C R; [0, 1] . Since Fnf F f uniformly on compacts, Theorem
9.3.1 plus Lemma 9.1.10 show that the distribution of
7 Afn ()
n
Sm ()
1 X
f
1
n m=1
n2
+
Nn
(1)
f
W
F
lim P
n
n
and
Nn
(1)
f
< W
F
<
lim P
n
n
for every > 0. Passing to the limit as & 0, we arrive at
lim P
Nn
Nn
<
n
(1)
Z
:
1(0,) (t) dt
and
lim P
n
(1)
Z
:
.
Finally, since
Z Z
1{0}
Z
(1)
(t) dt W (d) =
W (1) (t) = 0 dt = 0,
is continuous, the asserted result follows.
411
Remark 10.1.10. The renown of the Arcsine Law stems, in large part, from the
following counterintuitive deduction that can be drawn from it. Namely, given
0, 12 , guess which maximizes limn P Nnn ( , + ) mod1 for
a fixed . Because of The Law of Large Numbers (in more common parlance,
The Law of Averages), most people are inclined to guess that the maximum
should occur at = 12 . Thus, it is surprising that, since
1
[0, ]
[0, 1] 7 p
(1 )
is convex and has its minimum at 12 , the Arcsine Law makes the exact opposite
prediction! The point is, of course, that the sequence of partial sums {Sn () :
n 1} is most likely to make long excursions above and below 0 but tends to
spend relatively little time in a neighborhood of 0. In other words, although
one may be correct to feel that my luck has got to change, one had better be
prepared to wait a long time.
A more technical point is one raised by S. Sternberg. The arcsine distribution
is familiar to people who study iterated maps and is important to them because
(cf. Exercise 10.1.15) it is the one and only absolutely continuous probability
distribution on [0, 1] that is invariant under x [0, 1] 7 4x(1 x) [0, 1].
Sternberg asked whether a derivation
R 1 of Theorem 10.1.8 can be
R 1based on this
invariance property. Taking T+ = 0 1[0,) (s) ds and S = 0 sgn (s) ds,
and noting that 4T+ (1 T+ ) = 1 S 2 , one way to phrase Sternbergs question
is to ask is whether there is a pure thought way to check that T+ and 1 S 2
have the same distribution under W (1) and that that distribution is absolutely
continuous. I have posed this problem to several experts but, as yet, none of
them has come up with a satisfactory solution.
10.1.5. Recurrence and Transience of Brownian Motion. In this subsection I will use solutions to partial differential equations to examine the long
time behavior of Brownian motion.
Theorem 10.1.11. For r [0, ), define
r () = inf t [0, ) : |(t)| = r ,
C(RN ).
Then
r2 |x|2
r =
N
(N )
(N
+ 4)r2 N |x|2 2
r |x|2
EWx r2 =
2
N (N + 2)
(N )
EWx
412
Wx(N )
R |x|
Rr
N 2 N 2
R
|x|N 2
r
N
2
R
rN 2
|x|
if
N =1
if
N =2
if
N 3.
In particular,
Wx(2)
Wx(1) 0 < = 1 for all x R,
0 < = 0, x 6= 0, but Wx(2) r < = 1, x R2 and r > 0,
and
Wx(N )
r < =
r
|x|
N 2
, 0 < r < |x|,
when N 3.
Proof: To prove the first two equalities, set f (t, x) = |x|2 N t, use Theorem
10.1.2 to show that
f t r , (t r ) , Ft , Wx(N )
and
2
f t r , (t r ) 4
tr
N EWx
h
2 i
(N )
t r = EWx (t r |x|2 ,
t [0, ),
and
2
(N )
Wx
N E
(N )
(t r )2 =|x|4 + 4EWx
"Z
tr
2
|(s)| ds
0
(N )
+ 2N EWx
h
2 i
4 i
(N )
(t r ) (t r ) EWx (t r )
413
(N )
for all t [0, ). Now assume that |x| r, and use the first of these N EWx [r ]
(N )
(N )
r2 . Thus Wx (r < ) = 1, and so N EWx [r ] = r2 |x|2 follows when t .
To get the second equality, use Theorem 10.1.2 to show that
!
Z tr
4
2
(t r ) (4 + 2N )
(s) ds, Ft , Wx(N )
0
414
lim (t) = = 1,
x RN .
Proof: Given r > 0, apply Theorem 10.1.2 to see that (cf. the notation in
Theorem 10.1.11)
(t r )N +2 , Ft , Wx(N )
is a bounded, non-negative martingale for every |x| > r > 0. Hence, by Theorem
7.1.14, for any 0 s t < and A Fs ,
h
i
(s)N +2 , A r () > s
h
N +2
i
(N )
= EWx t r
, A r () > s ;
(N )
|x|N +2 EWx
(N )
and, because N 3 and therefore r % a.s., Wx
as r & 0, an application
of the Monotone Convergence Theorem and Fatous Lemma leads to
(N )
|x|N +2 EWx
h
i
h
i
(N )
(s)N +2 , A EWx (t)N +2 , A
inf (t) R
tT +1
Z
=
Wx(N )
inf (t) R
tT
0,I (dx),
RN \{0}
415
Next, referring to Exercise 8.3.21, set `T,x,y (t) = TTt x + Tt y for t [0, T ], let
(N )
WT,x,y M1 C([0, T ]; RN ) denote the W (N ) -distribution of
`T,x,y + T
[0, T ], and show that
i
h 1 RT
(N )
h(t, x, y)
|( )|2 d
.
= (N )
EWT ,x,y e 2 0
g (T, y x)
Exercise 10.1.15.
The purpose of this exercise is to examine the assertion made in Remark 10.1.10 about the characterization of the arcsine distribution (i.e., the Borel probability
measure on [0, 1] with distribution function
x [0, 1] 7 F (x) = 2 arcsin x [0, 1]). Specifically, the goal is to show that
the arcsine distribution is the one and only Borel probability measure on [0, 1]
that is absolutely continuous with respect to Lebesgue measure and invariant
under x [0, 1] 7 4x(1 x) [0, 1].
2
[0, 1], and show that a Borel
(i) Define x [0, 1] 7 (x) = sin x
2
probability measure on [0, 1] is invariant under x
4x(1x) if and only if
is invariant under x
2x mod 1. Conclude that the desired characterization of
the arcsine distribution is equivalent to showing that Lebesgue measure [0,1] on
[0, 1] is the one and only Borel probability measure on [0, 1] that is absolutely
continuous with respect to Lebesgue measure and invariant under x
2x mod 1.
416
(iii) Now add the assumption that [0,1] , let f be the corresponding Radon
Nikodym derivative, and extend f to R by taking f = 0 off of [0, 1]. Given
0 x < x + y 1, conclude that
Z
F (x + y) F (x) F (y) f t + x2n f (t) dt 0
R
X
+
{0, 1}Z 7
2n n [0, 1]
n=1
is invariant under x
2x mod 1. In particular, this means that, for each
p (0, 1) \ { 12 }, the p described in Exercise 1.4.29 is a non-atomic, Borel
probability measure on [0, 1] that is invariant under x
2x mod 1 but singular
to Lebesgue measure.
(10.2.2)
F (,
{:()<}
C(RN )
(N )
) W() (d 0 )
Wx(N ) (d).
417
Proof: Given Theorem 7.1.16, the proof is mostly a matter of notation. In the
first place, by replacing F (, 0 ) with F (x + , 0 ), one can reduce to the case
when x = 0. Thus, I will assume that x = 0. Secondly, = () + if
() < . Hence,
Z
Z
(N )
F , W (d) =
F , () + W (N ) (d).
{:()<}
{:()<}
Now define F (, 0 ) = 1[0,) () F , () + 0 , note that F is again
F BC(RN ) -measurable, and apply Theorem 7.1.16 to reach the desired conclusion.
Theorem 10.2.1 is a statement of the Markov property for Wiener measure.
More precisely, because it involves stopping times, and not just fixed times, it is
often called the strong Markov property.
10.2.2. Recurrence in One and Two Dimensions. As my first application
of the Markov property, I will prove the statement made following Theorem
10.1.11 about the recurrence of Brownian motion when N {1, 2}.
Theorem 10.2.3. If N {1, 2}, then, for all x RN ,
Z
Wx(N )
1B(c,r) (t) dt = for all c RN and r (0, ) = 1.
0
418
In particular, because N {1, 2}, Theorem 10.1.11 says that both B(0,r) and
(N )
r2 are Wy -almost surely finite for all y RN . Thus, by induction, n <
(N )
() 2n () if 2n () <
Xn ()
0
if 2n () = .
(N )
1B(0,r) (t) dt
0
Xn ().
n=0
Hence, if we show that the Xn s are mutually independent and identically dis(N )
tributed under Wx , then (*) will follow from The Strong Law of Large Numbers. But, by (**), we will know that the Xn s have both these properties once
(N )
we show that Wy ( B(0,r) t) is the same for all y RN with |y| = 2r . To
this end, let yi , i {1, 2} with |yi | = 2r be given, and choose an orthogonal
(N )
transformation O of RN so that y2 = Oy1 . Then, Wy2 is the distribution of
(N )
(N )
(N )
Kakutanis 1944 article, Two dimensional Brownian motion and harmonic functions, Proc.
Imp. Acad. Tokyo, 20, together with his 1949 article, Markoff process and the Dirichlet problem, Proc. Imp. Acad. Tokyo, 21, are generally accepted as the first place in which a definitive
connection between harmonic functions and Brownian motion was established. However, it
was not until with Doobs Semimartingales and subharmonic functions, T.A.M.S., 77, in
1954 that the connection was completed. It is ironic that this connection was not made by
Wiener himself. Indeed, Wieners early fame as an analyst was based on his contributions to
potential theory. However, in spite of his claims to the contrary, I know of no evidence that
he discovered the connection between his measure and potential theory.
419
(N )
u(x) = EWx
u ( G ) , G () < .
Z
B(x, r) G = u(x) =
SN 1
u(x + r) SN 1 (d).
420
u(x) = EWx
u ( B(x,r) ) , B(x,r) () < .
Hence, the proof of (10.2.7) reduces to the observation that the distribution of
(N )
{ B(x,r) < } 7 ( B(x,r) ) B(x, r) under Wx is same as that of
{ B(0,r) < } 7 x+( B(0,r) ) under W (N ) and that (cf. Exercise 4.3.10)
the distribution of { B(0,r) < } 7 ( B(0,r) ) under W (N ) is rotation
invariant.
Turning to the converse assertion, suppose that u : G R is a locally
bounded, Borel measurable function for which (10.2.7) holds. To see that u
C (G; R), extend u to RN so that it is 0 off of G, and choose a Cc R; [0, )
with support in (0, 1) and total integral 1. Using (10.2.7) together with Fubinis
Theorem, one sees that, as long as B(x, r) G,
1
Z
u(x) =
(t)
u(x + tr) S N 1 (d) dt
N 1
0
Z S
1
|y x|1N r1 |y x| u(y) dy,
=
N 1 r RN
Z
from which it is clear that u C (G; R). Further, knowing that u is smooth
and satisfies (10.2.7), it is easy to see that it is harmonic. Indeed, by Taylors
Theorem, we know that
Z
SN 1
Z
u(x + r) SN 1 (d) u(x) =
SN 1
r2
2 u(x)
+ o(r2 ),
ei ,
2
SN 1
RN
Z
2
1
SN 1 (d)
SN 1 (d) =
N SN 1
and, when 1 i 6= j N ,
Z
ei ,
SN 1
RN
SN 1 (d) =
ei ,
SN 1
RN
ej ,
RN
SN 1 (d) = 0.
Hence, after dividing through by r2 and letting r & 0, we see that (10.2.7)
implies u(x) = 0.
421
(N )
u ( B(x,r) ) , B(x,r) () < ;
u(x) = EW
(N )
f (t) , G () < .
Thus, if (10.2.5) holds for all x G and we are going to solve the Dirichlet
problem for f , then we have no choice but to show that the u given by (10.2.8)
is a solution. Furthermore, because of the last part of Theorem 10.2.4, we already
know that this u is harmonic in G. Thus, all that remains is to find conditions
under which the u in (10.2.8) will take the correct boundary values.
It should be reasonably clear, and will be verified shortly (cf. Theorem 10.2.14),
that if f is continuous at a G and if
(10.2.9)
lim Wx(N ) ( G ) = 0
xa
xG
sG = s + G s
G
G
and 0+
s = 0+
= sG .
G
a reg G a G and Wa(N ) 0+
> 0 = 0,
422
and so reg G is Borel measurable. Finally, if a reg G, then, for each > 0,
(N )
G
G
(10.2.13)
lim
W
,
(
)
(0,
)
B(a,
)
= 1.
x
xa
xG
Proof: Set G(a, r) = G BRN (a, r). Since it is obvious that G(a,r) is dominated by G , there is no question that a reg G = a reg G(a, r). On the
other hand, if a reg G(a, r) and > 0, then, for all 0 < < ,
lim Wx(N ) ( G )
lim Wx(N ) ( G ) xa
xa
xG
xG
lim
xa
Wx(N )
lim Wx(N ) BRN (a,r)
G(a,r) + xa
xG
xG(a,r)
W (N )
sup |(t)|
t[0,]
r
2
!
0
as & 0.
(N )
G
0+
> 0 = 0, then,
G
lim Wx(N ) 0+
=0
lim Wx(N ) G = xa
xa
xG
xG
for every > 0. To prove the converse, suppose that a reg G, let positive
and be given, and choose r > 0 so that
Wx(N ) G for x G B(a, r).
Then, by the second part of (10.2.10), the Markov property, and (4.3.13), for
each s (0, ) one has
h
i
(N )
(N )
G
Wa(N ) 0+
2 EWa W(s) G , (s) G
r2
+ Wa(N ) (s)
/ B(a, r) + 2N e 2N s ,
423
(N ) G
from which Wa 0+
> 0 = 0 follows when first s & 0 and then & 0.
Now, assume that a reg G, and observe that, for each 0 < < ,
Wx(N ) G
/ B(a, ) or G
!
Wx(N ) G + Wx(N ) sup |(t) a| .
t[0,]
lim Wx(N )
xa
xG
/ B(x, ) or
2
,
2N exp
2N
for all a G.
xa
xG
At the same time, define L(f ) to be the set of v : G R such that v U(f ).
Finally, given a G, say that a admits
a barrier if, for some r > 0, there
exists an C 2 G B(a, r); (0, ) such that
lim
xa
xGB(a,r)
(x) = 0
and
424
xa
xG
for all x G
Hf (x) is a bounded
Z
Z
(N )
=
w 0 ( Bn () ) WXn () (d 0 ) Wx(N ) ()
{: n ()<}
Z
(N )
Wx
=E
SN 1
(N )
EWx
3
{ 0 : Bn () ( 0 )<}
w Xn () + Rn () SN 1 (d), () <
w ( n ) , n () < ,
425
where, in the passage to the second to last line, I have used the fact, established
earlier, that the exit place from a ball of a Brownian path started at its center
is uniformly distributed. Hence, by Fatous Lemma and the boundary condition
satisfied by w,
(N )
w(x) lim EWx w ( n ) , n () <
n
(N )
EWx f ( G ) , G () < = u(x).
Thus, we have now shown that w u for all w U(f ). Of course, if v L(f ),
then, because v U(f ), we also know that v u and therefore that
v u.
(N )
I turn next to the second part of the theorem. Set m(x) = EWx [ G ], x G.
Clearly m is positive. Moreover, if m(x) 0 as x a through G, then a is
regular. Conversely, suppose a is regular. Since
(N )
m(x) + EWx
1
(N )
1
G , G + Wx(N ) ( G ) 2 EWx ( G )2 2 ,
=
(N )
EWx [ B(c,R) ], and observe that, since B(c,R) = G + B(c,R) G when G <
,
(N )
m(x)
m(x) = EWx m
( G ) , G () < .
Thus mm
Hence, after letting first t and then n tend to infinity, we see that m(x) 2 (x)
for all x G; and, since (x) 0 as x tends to a through G, it follows that
a reg G.
426
The argument used to prove the first part of Theorem 10.2.15 is a probabilistic
implementation of what analysts call the balayage procedure for solving the
Dirichlet problem.
Exercises for 10.2
Exercise 10.2.16. Suppose that G is a non-empty, open subset of RM RN and
that (x, y) G 7 u(x, y) R is a Borel measurable function that is harmonic
with respect to x and y separately (i.e., u( , y) is harmonic on {x : (x, y) G}
for each y G and u(x, ) is harmonic on {y : (x, y) G} for each x G).
Assuming that u is bounded below on compact subsets of G, show that u is
harmonic on G.
Hint: Clearly, all that one has to show is that u is smooth on G. In addition,
without loss in generality, one can assume that u can be extended to RM RN
as a non-negative, Borel measurable function. Making this assumption,
proceed
as in the proof of Theorem 10.2.4 to show that if Cc (0, 1); R has total
integral 1 and BRM (x, r) BRN (y, r) G, then u(x, y) equals
ZZ
1
1
1M
1N
1
|x|
|y|
r
|x|
r
|y|
u(, ) dxd.
M 1 N 1 r2
RM RN
427
(N )
(N )
(N )
(ii) For any A BC(RN ) , show that EWx F, A = EWx [F ]Wx (A) for all
bounded, Borel measurable F : C(RN ) R if it holds for all bounded, continuous ones.
(N )
(N )
(iii) By combining (i) and (ii), show that Wx (A) = Wx (A)2 for all A F0+
and x RN .
Exercise 10.2.19. Let G be a non-empty, open subset of RN . In this exercise,
we will develop a criterion for checking the regularity of boundary points.
(1)
G
(ii) As an application of Blumenthals 01 Law, show that Wx (0+
> 0)
{0, 1} for all x RN . Next, using this together with (10.2.12), show that a is
(N )
regular if and only if Wa 0+ = 0 > 0.
(iii) Assume that a G has positive, upper Lebesgue density in G{. That is,
lim
r&0
|B(a, r) G{|
> 0,
|B(a, r)|
0+ t
Wa(N )
N e 12 |B(a, t 12 ) G{|
,
(t)
/G
1
N
|B(a, t 2 )|
(2) 2
is contained in G{
(v) If F is a closed subset of RN , r > 0, and G = {x RN : |x F | > r},
show that every boundary point of G satisfies the exterior cone condition and is
therefore regular.
428
(N )
EPn,x f ( G ) , G () < EWx f ( G ) , G () <
which G () = G () < .
(N )
(ii) Say that a G is strongly regular if Wa G = 0 = 1. If every a G
is strongly regular and if x G is a point at which (10.2.5) holds, show that
(N )
Wx G = G < = 1. Thus, (10.2.21) holds in this situation.
(N )
part (iii) of Exercise 10.2.19, show that Wa G = 0 = 1 if a G has
that is, if
positive, upper Lebesgue density in RN \ G,
Thus, if (10.2.5) holds for all x G and every a G has positive, upper
then (10.2.21) holds uniformly for x in compact
Lebesgue density in RN \ G,
subsets of G.
4
This type of approximation was carried out originally by H. Phillips and N. Wiener in Nets
and Dirichlet problem, J. Math. Phys. 2 in 1923. Ironically, the authors do not appear to have
made the connection between their procedure and probability theory. In 1928, a more complete
429
W(x,y)
(1) (N )
( H ) {0} = EW x, y I () ,
where y () = inf{t 0 : (t) y}. Next, recall from Exercise 7.1.24 that the
1
(N +1)
W(x,y)
( H ) {0} =
yR (y x) dy,
where
N
yR (y) =
22 y
is the unique bounded solution to the classical heat equation that tends to
as t & 0. Of course, from a probabilistic perspective, g (N ) (t, y x) is the
probability (in the sense of densities) of a Brownian path going from x to y
during a time interval of length t.
In this section I will construct other functions that, on the one hand, are
the fundamental solution to a heat equation and, at the same time, the density for the probability of a Brownian motion making transitions under various
conditions.
10.3.1. A General Construction. For each t > 0, let Et : C(RN ) [0, )
be a Ft -measurable function with the property that
(10.3.1)
Es+t () = Es ()Et s )
430
and define
(10.3.2)
q(t, x, y) = EW
(N )
h
i
Et x(1 `t ) + t + y`t g (N ) (t, y x),
C(RN ),
RN
Next note that, by (10.3.1), Es+t x + (s,s+t,(,)) + (y x )`s+t equals
s
(y x ) `s
Es x + s + + s+t
s
(y x ) + (s )t
Et x + + s+t
s
(y x ) `t ( s) .
+ y x + + s+t
431
Plugging this into the expression for q(s + t, x, y) and making the change of
s
(y x ), one finds that q(s + t, x, y) equals
variable x + + s+t
ZZ
q(s, x, )
q (t, , y)g (N ) (s, + c)g (N ) (t, + c) dd,
RN RN
where =
Z
s
s+t ,
t
s+t ,
and c =
tx+sy
s+t .
RN
st
s+t ,
(10.3.5) p (s+t, x, y) =
and
(10.3.6).
pG (t, x, y) = pG (t, y, x)
432
where, in the passage to the second line, I have applied the same reasoning as was
suggested in part (i) of Exercise 7.3.7. Hence, (*) will follow once y
q (t, x, y)
q (t,x,y)
g(N ) (t,yx) is shown to be continuous on G. To this end, argue as in the last
part of Theorem (10.3.1) and apply the Markov property to show that q (t, x, y)
equals
W (N ) x + t (t ) + (y x)`t (t ) G, [(1 )t, t]
= W (N ) y + t ( ) + (x y)`t ( ) G, [(1 )t, t]
Z
=
g (N ) (1 )t, z y Wz(N ) ( ) + y (t) `t ( ) G dz,
RN
433
and
sup pG (t, x, y) = 0 for (s, a) (0, ) reg G.
lim
(t,x)(s,a) yK
xG
pG (t, x, y) dy
pG (t, x, y) dy
G
Z
(N ) G
= Wx ( > t
GB(x,r)
g (N ) (y x) dy
RN \B(x,r)
g (N ) (y) dy,
RN \B(0,r)
and therefore
Z
lim sup 1
pG (t, x, y) dy lim sup Wx(N ) ( G t),
t&0 xK
t&0 xK
GB(x,r)
pG (t, x, y) =EWx
h
i
g (N ) (t, y x) g (N ) t G (), y ( G ) , G < t
N +kk
2
|x|2
1
P t 2 x e 2t ,
PN
i=1
i .
434
(10.3.10)
kk=n
Cn
(t + |x|2 )
N +n e
|x|2
4t
EWx
(N )
y g
t G (), y ( G ) , G () < t
i
h |y(G )|2
(N )
Cn
G
Wx
4t
,
()
<
t
.
E
e
|y G|N +n
Wx
i
h |y(G )|2
|yx|2
|y x|
(N
)
G
4t
,
kk[0,t]
, () < t e 16t + W
e
2
and so we now see (cf. (4.3.13)) that, for some other choice of Cn < ,
(10.3.11) y pG (t, x, y) Cn
(t + |y x|2 )
N +n
2
1
+
|y G|(N +n)
!
e
|yx|2
16N t
when kk = n.
Combining (10.3.11) with the symmetry of pG , we have
(10.3.12) x pG (t, x, y) Cn
(t + |y x|2 )
N +n
2
1
+
|x G|(N +n)
!
e
|yx|2
16N t
Z
G
pG
t
2 , x, z
pG
t
2 , z, y
dz,
Z
1
pG (h, x, y)(y) dy (x) = 12 (x)
h&0 h
ZG
1
G
p (h, x, y)(x) dx (y) = 12 (y).
lim
h&0 h
G
lim
435
To see this, use the symmetry of pG to show that the second of these follows
from the first one. To prove the first one, use pG (h, x, y) g (N ) (h, y x) and
(10.3.8) to show that, for any Cc2 (RN ; R) that equals in a neighborhood
of x,
Z
Z
G
(N
)
p (h, x, y)(y) dy (x)
g (h, y x)(y)
dy
G
dy (x) 12 (x),
h
G
tm pG (t
G
1
G
2 y p (t, x, y).
+ h, x, y) = 2
G
pG (h, x, z)m
y p (t, z, y) dz,
The following result provides the justification for my calling pG the Dirichlet
heat kernel on G.
Corollary 10.3.13. For each Cb (G; R), the function
Z
G
(N )
Wx
u(t, x) = E
(t) , () > t =
(y)pG (t, x, y) dy
G
lim u(t, ) =
t&0
lim
(t,x)(s,a)
xG
u(t, x) = 0
uniformly on compacts,
for (s, a) (0, ) reg G.
436
Proof: That the u in the first part is a bounded, smooth solution follows easily
from (10.3.12) and the last part of Theorem 10.3.9. To prove the uniqueness
assertion when G = reg G, choose {Gn : n 1} to be a non-decreasing
S
sequence of open sets so that Gn G and G = n1 Gn . Given a bounded
solution u, apply Theorem 10.1.2 to see that, for each n 1, u(t, x) equals
(N )
(N )
(t) , Gn () > t + EWx u t Gn (), ( Gn ) , Gn () t
(N )
(N )
= EWx ((t) , G () > t + EWx u t Gn (), ( Gn ) , G () < t
(N )
+ EWx u t Gn (), ( Gn ) (t) , Gn () t < G () .
EWx
= EW
(N )
Z t
exp
V x + t + (y x)`t ( ) d
g (N ) (t, y x).
0
Z
V ( ) d
Et () exp
,
we see that
Z
(10.3.16)
(y)q (t, x, y) dy = E
RN
(N )
Wx
Z t
exp
V ( ) d (t)
0
for (t, x) (0, ) RN and Borel measurable s that are bounded below,
Z
V
(10.3.17)
q (t, x, y) =
q V (s, x, z)q V (t, z, y) dz
RN
437
(10.3.19)
t u = 12 u + V u
(t, x) (0, ) RN .
RN
I now want to make an analysis of q V (t, x, ) which, among other things, will
enable me to show (cf. Corollary 10.3.22) that, under suitable conditions on V ,
the right-hand side of (10.3.20) is necessarily a solution to (10.3.19). For this
reason, I will call q V the FeynmanKac heat kernel with potential V .
Assume that V C n (RN ; R) is bounded above and that,
Theorem 10.3.21.
for some Cn < ,
max x V (x) Cn 1 + V (x) ,
kkn
x RN .
Finally, if n 2 and m
tm q V (t, x, y) =
1
2 x
n
2,
then
m
+ V (x) q V (t, x, y) =
1
2 y
m
+ V (y) q V (t, x, y).
(0)
x g (N ) (t, y x),
438
Rt
V (t,x,y ( )) d
, and
where t,x,y ( ) = x + t ( ) + t (y x), E V (t, x, y, ) = e 0
P`
(k)
= . Since, by our hypotheses, each of the integrands in these terms
k=0
+
is bounded by a constant times etkV ku , the asserted estimate for x q V (t, x, y)
follows from this and (10.3.10).
The rest of the proof is similar to, but easier than, that of Theorem 10.3.9.
Specifically, one uses q V (t, x, y) = q V (t, y, x) and
Z
q (t, x, y) =
qV
RN
t
2 , x, z
qV
t
2 , z, y
dz
to prove the existence of and estimate for x y q V (t, x, y). Also, knowing these
results about the spacial derivatives, one deals with the time derivatives in the
same way as I did at the end of that theorem. The details are left to the
reader.
Corollary 10.3.22. Let V be as in Theorem 10.3.21, and assume that n 2.
Then, for each Cb (RN ; R), the function
(N )
Wx
u(t, x) = E
Z
h Rt
i
V (( )) d
0
e
t) =
(y)q V (t, x, y) dy
RN
is the unique u C 1,2 (0, ) RN ; R that is bounded on (0, T ) RN for each
T > 0 and satisfies (10.3.19).
Proof: The only assertion that has not already been proved is that the u
described takes on the correct initial value. However, because q V (t, x, y)
+
ekV ku g (N ) (t, y x), it is clear that, for each r > 0,
Z
q V (t, x, y) dy = 0.
lim sup
t&0 xRN
B(x,r){
RN
Rt
V (( )) d
W (N ) x
q (t, x, y) dy E
1 e 0
|x|R
RN
+
tK(R)etK(R) + 1 + etkV ku W (N ) kk[0,t] R ,
V
439
QVt
We know that
is a bounded map from Cb (RN ; R) into itself. In addition, by
V
(10.3.17), {Qt : t 0} is a semigroup. That is, QVs+t = QVt QVs . Also, by
Corollary 10.3.22, we know that if
(10.3.23)
V C 2 (RN ; R) and max | V | C 1 + V ,
kk2
then (t, x)
QVt (x) is a solution to (10.3.19).
I will say that : RN R is a ground state for V if is a (strictly) positive,
continuous function that satisfies the equation et = QVt for some R and
all t 0, in which case will be called the eigenvalue associated with .
Lemma 10.3.24. Let V be as above, and assume that C RN ; [0, ) does
not vanish identically. If et = QVt for all t 0, then
is a ground state
with associated eigenvalue . In fact, Cb2 RN ; (0, ) if is bounded and
V C 2 (RN ; R) satisfies (10.3.23). Next, if is a twice continuously differentiable
ground state with associated eigenvalue , then 12 + V = . Conversely,
if is a twice continuously differentiable, bounded solution to 12 + V = ,
then is a ground state with associated eigenvalue .
Proof: Since I can always replace V by V , I may and will assume that = 0
throughout. Also, observe that if C RN ; [0, ) satisfies = QV1 , then,
because q V (1, x, y) > 0 everywhere, > 0 everywhere unless 0. Hence, the
first assertion is proved.
Next suppose that is a twice continuously differentiable ground state with
eigenvalue 0. To see that 12 + V = 0, it suffices to show that
N
1
2 + V , L2 (RN ;R) = 0 for all Cc (R ; R).
To this end, let Cc (RN ; R) be given, and apply symmetry, Theorem 10.1.2,
and Fubinis Theorem to justify
0 = , QV1 L2 (RN ;R) = QV1 , L2 (RN ;R)
Z 1
=
QV 21 + V , 2 N d
L (R ;R)
Z 1
0
1
2
+ V , QV
L2 (RN ;R)
d =
1
2
+ V , L2 (RN ;R) .
440
B(0,r)
and
p (s + t, x, y) =
RN
for all (t, x, y) (0, ) RN RN . In particular, for each Cb (RN ; R), the
function
Z
u(t, x) =
(y)p (t, x, y) dy
N
R
is the one and only bounded u C 1,2 (0, ) RN ; R that satisfies
t u(t, x) = 12 u(t, x) + log (x), u(t, x) RN in (0, ) RN
t&0
The advantage that p (t, x, y) has over q V (t, x, y) is that we can construct
measures on C(RN ) that bear the same relationship to it as the Wiener measures
(N )
Wx bear to the classical heat kernel g (N ) (t, y x).
441
(tm ) m , 1 m n =
Z Y
n
1 n m=1
where y0 = x. In fact, if
R (t, ) = e
1 V
(0)
E (t, ) (t)
then
(N )
Px (A) = EWx
Finally, x
Rt
V ((( )) d
where E (t, ) = e 0
,
V
R (t), A for all t 0 and A Ft .
Z
F ,
Px (d)
{()<}
Z
Z
=
F (,
) P() (d 0 )
Px (d)
{()<}
N
E
R (t) = 1 for all (t, x) [0, ) R . In addition, R (s + t, ) =
EWx
R (s+t), A =
R (s, )E
(N )
(N )
R (t) Wx(N ) (d) = EWx R (s), A
W(s)
for A Fs .
(N )
Determine t,x M1 C(RN ) by t,x (d) = R(t, )Wx (d). By the preceding, t1 ,x Ft1 = t2 ,x Ft1 for
all 0 t1 t2 , and so (cf. Exercise 9.3.6)
there is a unique Px M1 C(RN ) whose restriction to Ft is the same as that
of t,x for all t 0.
To see that x
Px is continuous, it suffices to check that
lim R (t, y + ) = R (t, x + )
yx
in L1 (W (N ) ; R).
But clearly this convergence is taking place pointwise for each C(RN ). In
addition, R (t, ) 0 and, for each z RN , R (t, z + ) has W (N ) -integral 1.
Hence, the convergence is also taking place in L1 (W (N ) ; R).
442
Now suppose that is a stopping time and that T for some T (0, ).
Then, for any F FT -measurable F : C(RN )2 R that is bounded below,
Z
F , Px (d)
Z
= R (), R (2T (), F , Wx(N ) (d)
Z
Z
(N )
0
0
0
= R (),
R 2T (), F (, )W() (d ) Wx(N ) (d)
Z
Z
0
0
= R (),
F (, ) P() (d ) Wx(N ) (d)
Z Z
0
0
=
F (, ) P() (d ) Px (d),
where I have again used (10.2.2) and, in the final step, Hunts Theorem (cf.
Theorem 7.1.14) to replace R (), ) by R (T, ). Starting from this, one
can easily remove the condition that is bounded and extend the result to all
F s that are F BC(RN ) -measurable and bounded below.
To complete the proof, observe that, as a special case of the preceding,
Z
EPx (s + t) , A = EPx
p t, (s), y) dy, A
RN
t, (t)
f , ( ) d, Ft , Px
0
N
t B(x,R) ()
V
E (, )(f ) , ( )
0
d, Ft , Wx(N )
443
R t B(x,R) (), (t B(x,R) )
t B(x,R) ()
R (, )f ( ) d, Ft , Wx(N )
0
(N )
t B(x,R) ()
!
f ( ) d, Ft , Px
b ( ) d.
If P M1 C(RN ) has the properties that P (0) = x = 1 and
Z t
1
( ) d, Ft , P
(t)
2 + b, RN
0
is a martingale for all Cc (RN ; R), then B(t), Ft , P is a Brownian motion.
Proof: Without loss in generality, I will assume that x = 0.
Given RN and R > 0, set e (y) = e 1(,y)RN ,
||2
1 , b(y) RN , and ER (t, ) = exp
f (y) =
2
t B(0,R) ()
f ( ) d
!
.
= e (t
B(0,R)
) +
0
t B(0,R) ()
f ( ) e ( ) d.
444
t B(0,R) ()
!
MR (, )f ( ) ER (, ) d, Ft , P
Hence
exp
1 , B(t
B(0,R)
||2
B(0,R)
t
() , Ft , P
() RN +
2
bR ( ) d,
t [0, T ],
445
XR (t, )
(t)
if
/ A(b).
1 , QVt 2
L2 (RN ;R)
= 2 , QVt 1
L2 (RN ;R)
ku
446
Finally,
ZZ
q (2t, x, x) dx (4t)
q (t, x, y) dx dy =
N
2
RN RN
RN
RN
ku
ku e
tkV + ku
kg
(N )
etkV
ku
N
(2t) 2q
Thus, since QVt maps Cc (RN ; R) into Cb (RN ; R), it also takes Lq (RN ; R) there.
Because (10.3.30) holds for elements of Cc (RN ; R), the preceding estimates
make it clear that it continues to hold for elements of L2 (RN ; R). That is, QVt is
self-adjoint on L2 (RN ; R). To see that it is non-negative definite, simply observe
that
, QVt L2 (RN ;R) = QVt , QVt L2 (RN ;R) 0.
2
RN
RN
In the language of functional analysis, the last part of Lemma 10.3.31 says
that QVT is HilbertSchmidt and therefore compact if e2T V L1 (RN ; R). As
a consequence, the elementary theory of compact, self-adjoint operators allows
us to make the conclusions drawn in the following theorem.
447
Theorem 10.3.32. Assume that eT V L2 (RN ; R) for some T (0, ). Then
there is a unique Cb RN ; (0, ) L2 (RN ; R) such that
kkL2 (RN ;R) = 1 and et = QVt for some R and all t (0, ).
Moreover, if V C 2 (RN ; R) satisfies (10.3.23), then p (t, , y) C 2 (RN ; R) and
t p (t, x, y) = 12 x p (t, x, y)+ log (x), x p (t, x, y) RN in (0, )RN RN .
Proof: The spectral theory of compact, self-adjoint operators guarantees that
the operator QVT has a completely discrete spectrum and that its largest eigenvalue is
(T ) = sup , QVT L2 (RN ;R) : kkL2 (RN ;R) = 1 .
Now let be an L2 (RN ; R)-normalized eigenvector for QVT with eigenvalue
(T ). Because (T ) = QVT , we know that can be taken to be continuous.
In addition, by the preceding paragraph,
ZZ
RN RN
ZZ
RN RN
which, because q V (T, x, y) > 0 for all (x, y), is possible only if (T ) > 0 and
never changes sign. Therefore we can be take to be non-negative. But,
if 0, then, since p (T, x, y) > 0 everywhere and (T ) = QVT , > 0
everywhere. Thus, we have now shown that every normalized eigenvector for
QVT with eigenvalue (T ) is a bounded, continuous function that, after a change
of sign, can be taken to be strictly positive. In particular, if 1 and 2 were
linearly independent, normalized eigenvectors of QVT with eigenvalue (T ), then
g=
448
addition, because t
Finally,
(s + t) = , QVs+t
L2 (RN ;R)
= (s) , QVt
L2 (RN ;R)
(t).
= (s)(t),
which means that (t) = et for some R, and, because (T ) = eT , this completes the proof of everything except the final statement, which is an immediate
consequence of Theorem 10.3.21.
If nothing else, Theorem 10.3.32 helps to explain the terminology that I have
been using. In Schr
odinger mechanics, the function in Theorem 10.3.32 is
called the ground state because it is the wave function corresponding to the
lowest energy level of the quantum mechanical Hamiltonian 12 V . From
our standpoint, its importance is that it shows that lots of V s admit a ground
state.
I turn now to the second method
for producing ground states. Namely, sup
pose that C 2 RN ; (0, ) . Then, it is obvious that 12 + V = 0, where
V =
log + | log |2
.
=
2
2
Theorem 10.3.33.
Let U C 2 (RN ; R), and assume that both U and V U
1
2
N
2 U + |U | are bounded above. Then,
for each x R , there is a unique
U
N
U
Px M1 C(R ) such that Px (0) = x = 1 and
(t)
1
2
Z t
0
+ U, RN ( ) d, Ft , PU
x
Z
(t) x
U ( ) d, Ft , PU
x
=e
U (x)
(N )
Wx
Rt U
h
i
U (((t))+
V (( )) d
0
e
,A
Finally, x
PU
x is continuous and, for any stopping time and any F BC(RN ) measurable F : C(RN ) C(RN ) that is bounded below,
Z
{()<}
F (, ) PU
x (d)
Z
=
Z
F (,
{()<}
0
) PU
() (d )
PU
x (d).
449
1 (N )
g
s
2 ,
g (N )
t
2 ,
p(cr,c+r) (t, x, y) = r1 g(1) r2 t, r1 (yx) r1 g(1) r2 t, r1 (x+y+22c)) ,
g (1) (t, x + 4m).
QN
Exercise 10.3.36. Set Q(a, R) = i=1 [ai R, ai + R] for a RN and R > 0.
Show that
where g(, ) =
mZ
pQ(a,R) (t, x, y)
Q(a,R)
N
Y
i=1
sin
N
N 2 Y
(xi ai + R)
(yi ai + R)
sin
dy = e 8R2 t
2R
2R
i=1
N 2
1
log Wx(N ) ( Q(a,R) > t) =
t t
8R2
lim
450
(i) Show that w is sub-multiplicative in the sense that w(s + t) w(s)w(t), and
conclude from this that limt 1t log w(t) = supT >0 T1 f (T ) [, 0].
Hint: Set f (t) = log w(t). Because w takes values in (0, 1] and is non-increasing,
f is non-positive and bounded on compacts.
f (s+t)
t Further, f is sub-additive:
1
f (s)+f (t). Thus, given, T > 0, f (t) T f (T ), and so limt t f (t) T1 f (T )
for every T > 0. Conclude from this that limt 1t f (t) = supT >0 T1 f (T )
[, 0].
ular, G < .
X
G
(10.3.39)
p (t, x, y) =
etn n (x)n (y),
n=0
451
G
(i) Let PG
t be the operator on Cb (G; R) whose kernel is p (t, x, y), and show
G
2
that Pt admits a unique extension to L (G; R) as a self-adjoint contraction.
Further, show that {PG
t : t > 0} is a continuous semigroup of non-negative
definite, self-adjoint contractions on L2 (G; R). Finally, show that
ZZ
p (t, x, y) dxdy =
pG (2t, x, x) dx
GG
|G|
N
(4t) 2
(*)
L2 (G;R)
etn , n
L2 (G;R)
0 , n
L2 (G;R)
n=0
X
n=0
tn
, n
2
L2 (G;R)
Z
=
GG
What is needed here is the variant of Stones Theorem that applies to semigroups. The
technical question which his theorem addresses is that of finding a simultaneous diagonalization
of the operators PG
t . Because we are dealing here with compact operators, this question can
be reduced to one about operators in finite dimensions, where it is quite easy to handle. For
a general statement, see, for example, K. Yoshidas Functional Analysis and its Applications,
Springer-Verlag (1971).
452
for any L2 (G; R), and use this to show that, for any M N and , 0
L2 (G; R),
tn
, n
n=M
0
e 2M
0
1
1
n L2 (G;R)
N kkL (G;R) k kL (G;R) .
L2 (G;R)
(t) 2
M
1
X
G
2e 2M
p (t, x, y)
etn n (x)n (y)
N ,
(t) 2
n=0
t
e 0 p(t, x, y) 0 (x)0 (y)
0
1
etn n (x)2
t1
2
pG
t
2 , x, x
12
pG
t
2 , y, y
! 12
etn n (y)2
n=1
n=1
12
t1
2
(t) 2
See Kacs wonderful article Can one hear the shape of drum?, Am. Math. Monthly 73 # 4,
pp. 123 (1966), or, better yet, borrow the movie from the A.M.S.
453
radiator. When Kac took up the problem, he turned it around. Namely, he asked
what geometric information, besides the volume, is encoded in the eigenvalues.
When he explained his program to L. Bers, Bers rephrased the problem in the
terms that Kac adopted for his title. Audiophiles will be disappointed to learn
that, according to C. Gordon, D. Webb, and S. Wolperts,4 one cannot hear the
shape of a drum, even a two dimensional one.
This exercise outlines Kacs argument for proving Weyls asymptotic formula
N
|G| 2
,
N ()
N
(2) 2 ( N2+1 )
N (d) =
(0,)
X
n=0
tn
Z
=
pG (t, x, x) dx,
where N (d) denotes integration with respect the purely atomic measure on
(0, ) determined by the non-decreasing function
N ().
(iii) Using (10.3.8), show that
N
(0,)
At this point, Kac invoked Karamatas Tauberian Theorem,5 which relates the
asymptotics at infinity of an increasing function to the asymptotics at zero of
4
See their 1992 announcement in B.A.M.S., new series 27 (2), One cannot hear the shape
of a drum.
5 See, for example, Theorem 1.7.6 in N. Bingham, C. Goldie, and J. Teugels Regularly Varying
Functions, Cambridge U. Press (1987).
454
its Laplace transform. Given the preceding, Karamatas theorem yields Weyls
asymptotic formula. It should be pointed out that the weakness of Kacs method
is its reliance on the Laplace transform and Tauberian theory, which gives only
the principal term in the asymptotics. Further information can be obtained
using Fourier methods, which, in terms of partial differential equations, means
that one is replacing the heat equation by the wave equation, an equation about
which probability theory has embarrassingly little to say.
Exercise 10.3.41. It will have occurred to most readers that the relation between the Hermite heat kernel in (10.1.7) and the OrnsteinUhlenbeck process
in 8.4.1 is the archetypal example of what we have been doing in this section.
This exercise gives substance to this remark.
(i) Set (x) = e
|x|2
2
1
2
12 |x|2 = N2 . By Lemma
(ii) Although it does not follow from Lemma 10.3.24, use (10.1.7) to show that
2
+ is also a ground state for |x|2 with associated N2 . (See Exercise 10.3.43.)
1
x2
d
2
Exercise 10.3.42. Recall the Hermite polynomials Hn (x) = (1)n e 2 dx
ne
in 2.4.1. Show that the Hermite functions (although these are not precisely the
ones introduced in 2.4, they are obtained from those by rescaling)
1
2
4
n (x) = 2 1 e x2 Hn (2 12 x), n 0,
h
(n!) 2
form an orthonormal basis in L2 (R; R) and that
Z
n (x), n 0 and (t, x) (0, ) R,
n (y)h(t, x, y) dy = e(n+ 12 )t h
h
N
Y
hni (xi )
for n NN and x RN ,
i=1
X
2
n
Hn (x) = ex 2 .
n!
n=0
455
Exercise 10.3.43. Part (ii) of Exercise 10.3.41 might lead one to question the
necessity of the boundedness assumption made in Lemma 10.3.24. However, that
would be a mistake because, in general, a positive solution to 12 + V =
need not be a ground state. For example, in this exercise we will show that
x4
although (x) = e 4 satisfies 12 x2 + V = 0 when V = 12 x6 + 3x2 , this is
not a ground state for V . The proof is based on the following idea. If were a
ground state, then Theorems 10.3.26 and its corollaries would apply, and so we
would know that the equation
Z
(*)
X(t, ) = (t) +
X(, )3 d
0
(1)
would have a solution on [0, ) for Wx -almost every C(R) for every x R.
The following steps show that this is impossible.
(i) Suppose that 1 , 2 C(R) and that 0 1 (t) 2 (t) for t [0, 1]. If
X( , 2 ) exists on [0, 1], show that X( , 1 ) exists on [0, 1].
Rt
Hint: Define X0 (t, ) = (t) and Xn+1 (t, ) = (t) + 0 Xn (, )3 d . First
show that if 0 1 (t) 2 (t), then 0 Xn ( , 1 ) Xn ( , 2 ). Second, if
supn0 kXn ( , )k[0,T ] < , show that Xn ( , ) converges uniformly on [0, T ]
to the unique solution to (*) on [0, T ].
1
(ii) Show that if (t) 1 for t [0, 1], then X(t, ) (1 2t) 2 for t 0, 12
and therefore X( , ) fails to exist after time 12 .
(1)
(iii) Show that W2 (t) 1 for t [0, 1] > 0, and conclude from this that
cannot be a ground state for V .
Chapter 11
Some Classical Potential Theory
In this concluding chapter I will discuss a few refinements and extensions of the
material in 10.2 and 10.3. Even so, I will be barely scratching the surface. The
interested reader should consult J.L. Doobs thorough account in Classical Potential Theory and Its Probabilistic Counterpart, published by SpringerVerlag
in 1984, or S. Port and C. Stoness Brownian Motion and Classical Potential
Theory, published by Academic Press in 1978.
11.1 Uniqueness Refined
In this section I will refine some of the uniqueness statements made in 10.2.
The improved statements result from the removal of the defect mentioned in
Remark 10.3.14. To be precise, recall that if G is an open subset of RN , then
G
sG () = inf{t s : (t)
/ G}, 0+
= lims&0 sG , and (cf. Lemma 10.2.11)
(N ) G
reg G is the set of x G such that Wx (0+
= 0) = 1. The main result
proved in this section is Theorem 11.1.15, which states that, for any x G and
(N )
Wx -almost all C(RN ), G () < = ( G ) reg G. However, I
will begin by amending the treatment that I gave in 10.3 of the Dirichlet heat
kernel pG (t, x, y).
11.1.1. The Dirichlet Heat Kernel Again. In 10.3, I introduced the
Dirichlet heat kernel pG (t, x, y). At the time, I was concerned with it only when
(x, y) G G, and so I defined it in such a way that it was 0 outside G G.
When G is regular in the sense that G = reg G, this choice is the obvious one,
since (cf. Theorem 10.3.9) it is the one that makes pG (t, , y) continuous on R
for each (t, y) (0, ) RN . However, when G is not regular, it is too crude
for the analysis here. Instead, from now on I will take
pG (t, x, y) =
(11.1.1)
W (N ) x 1 `t ( ) + t ( ) + y`t ( ) G, (0, t) g (N ) (t, y x),
457
when q (t, x, y) is defined in terms of Et via (10.3.2). Hence, just as in the proof
of Theorem 10.3.3, one can use the results in 8.3.3 to check that pG (t, x, y) =
pG (t, y, x) is again true but that (10.3.4) has to be replaced by
Z
(N )
(11.1.2)
RN
G
((t) , 0+
() t .
However, the analog here of the ChapmanKolmogorov equation (10.3.5) presents something of challenge. To understand this challenge, note that t
Et
fails to satisfy (10.3.1). Indeed,
Es+t
() = 1G (s) Es ()Et ().
(11.1.3)
Thus, repeating the argument used in the proof of Theorem 10.3.3 to derive
(10.3.5), one finds that
(11.1.4)
p (s + t, x, y) =
which, because the integral is over G and not RN , is a flawed version of the
ChapmanKolmogorov equation. In order to remove this flaw, I will need the
following lemma.
Lemma 11.1.5. For each (t, x) (0, ) RN ,
G
Wx(N ) ( G = t) = 0 = Wx(N ) (0+
= t),
and therefore
Z
(11.1.6)
RN
h
i
G
(y)pG (t, x, y) = Wx(N ) (t) , 0+
() > t
Wy(N ) G > 0,I (dy),
(0, ).
458
/ = Wy(N ) ( G = ) = 0
G
In addition, because G () = t = 0+
() = t when t > 0, it follows that
(N ) G
Wx ( = t) = 0 also.
Given the preceding, it is clear how to pass from (11.1.2) to (11.1.6). Finally,
by applying (11.1.6) with = 1G{ , we see that
Z
G
pG (t, x, y) dy = Wx(N ) (t)
/ G & 0+
() > t = 0,
G{
pG (t, x, y) =g (N ) (t, y x)
h
i
G
(N )
G
G
EWx g (N ) t 0+
(), y (0+
) , 0+
() < t
for all (t, x, y) (0, ) (RN )2 , and the idea is very much the same as the one
used to prove (10.3.8). Thus, for (0, 1), set
q (t, x, y) = W (N ) x 1 `t ( ) + t ( ) + y`t ( ) G, (0, t) g (N ) (t, y x).
Obviously, q (t, x, y) & pG (t, x, y) as % 1. In addition, proceeding as in the
proof of Theorem 10.3.3, one finds that q (t, x, ) is continuous and that
Z
h
i
G
(N )
(*)
(y)q (t, x, y) dy = EWx (t) , 0+
() t .
RN
459
for all (0, 1), t (0, ) and s (0, t). Thus, by (*), after letting s & 0,
we see that
Z
(y)q (t, x, y) dy
RN
Z
=
(y)g (N ) (t, y x) dy
RN
Z
(N )
G
G
G
EWx
(y)g t 0+
(), y (0+
) dy, 0+
() < t .
RN
(N )
G
G
G
g
t 0+
(), y (0+
) , 0+
() < t ,
460
for any 0 < s < t, and note that pG (s, , a) is bounded. Hence, the desired
conclusions follow from (10.3.12) and the argument used to prove the last part of
Theorem 10.3.9. Next, suppose that pG (t0 , x0 , a) = 0 for some (t0 , x0 ) (0, )
G. Then, by the strong minimum principle (cf. Theorem 10.1.6), pG (t, x, a) = 0
for all (t, x) (0, t0 ) G. But this, by (11.1.2) and symmetry, means that, for
t (0, t0 ),
Z
Z
(N ) G
G
Wa (0+ t) =
p (t, a, y) dy =
pG (t, x, a) dx = 0,
RN
where I have used the final part of Lemma 11.1.5 to get the second equality.
Hence, pG (t0 , x0 , a) = 0 = a reg G.
Finally, because, by the preceding and symmetry, for any x G, G \ reg G
is contained in {y
/ G : p(1, x, y) > 0}, and, by Lemma 11.1.5, the latter set
has Lebesgue measure 0, it is clear the G \ reg G has Lebesgue measure 0.
I next introduce the function
(N )
v G (x) EWx
(11.1.10)
G
e 0+ ,
x RN .
Then
x RN ,
(0,)
then
(11.1.12)
r(y x) G (dy),
v (x) =
x RN .
RN
In particular, G is always locally finite and is therefore finite in the case when
G{ is compact. Finally, for any non-empty, open set H RN ,
h H
i
(N )
H
(11.1.13) G{ reg(H) = v G (x) = EWx e0+ v G (0+
) , x RN ,
G
0+
()
as a
461
Z
r(x y) dy = 1,
x RN ,
RN
and so (11.1.12) follows after one integrates the preceding over y RN and
applies Tonellis Theorem.
Given (11.1.12) and the fact that r is uniformly positive on compacts, it
becomes obvious that G must be always locally finite and finite when G{ is
compact. Thus, all that remains is to check (11.1.13). But clearly, after multiplying (11.1.8) with G = H throughout by et and integrating with respect to
t (0, ), one gets
Z
h G
i
(N )
G
r(x y) =
et pH (t, x, y) dt + EWx e0+ r (0+
)y .
(0,)
Hence, since, by the first part of Lemma 11.1.9 with G = H, pH (t, x, ) vanishes
on reg(H), (11.1.13) follows after one integrates the preceding with respect to
G (dy) and uses (11.1.12).
Lemma 11.1.14. If G{ is compact and, for some [0, 1), v G G{ , then
(N ) G
Wx 0+
< = 0 for every x RN .
G
Proof:
by checking that
that
I begin
v everywhere. Thus, suppose
N
G
H = x R : v (x) > + 6= for some > 0. Because v G is lower
semicontinuous, H is open. I will derive a contradiction by first showing that
G{ reg(H) and then applying (11.1.13). To carry out the first step, use
(11.1.12) to see that, for any s (0, ),
Z
Z
G
t
(N )
G
v (x)
e
g (t, y x) (dy) dt
s
RN
Z
Z
s
t
(N )
G
=e
e
g (s, y x)v (y) dy dt
(0,)
( +
)Wx(N )
RN
H
(s) H es ( + )Wx(N ) 0+
>s ,
462
H
and so, after letting s & 0, we have that v G (x) ( + )Wx (0+
> 0).
(N ) H
In particular, if x
/ G, then ( + )Wx (0+ > 0), which means that
(N ) H
x
/ G = Wx (0+ > 0) < 1. Hence, because (cf. part (ii) of Exercise
(N ) H
10.2.19) Wx (0+
> 0) {0, 1}, this means that x
/ G = x reg(H) and
therefore that (11.1.13) applies. But if x H, (11.1.13) yields the contradiction
(N )
h H
i
H
e0+ v G (0+
) < + ,
H
H
since 0+
() < = (0+
)
/ H. That is, I have shown that H must be
empty.
Knowing that v G everywhere, I now want to argue that G (RN )
G (RN ). Since G (RN ) < , this will show that G = 0 and therefore, by
(N ) G
(11.1.12), that v G 0, which is the same as saying that Wx (0+
< ) = 0
everywhere. Thus, let K = G{, and set Kn = {x : dist(x, K) n1 } and
Gn = Kn { for n 1. Clearly, K RN \ Gn reg(Gn ), and so, by (11.1.12) and
Tonellis Theorem,
Z
Z
G N
Gn
G
(R ) =
v (x) (dx) =
v G (y) Gn (dy) Gn (RN ).
RN
RN
Thus, all that we have to do is check that Gn (RN ) & G (RN ) when n .
But
Z
Gn
N
(R ) =
v Gn (x) dx
RN
G
Proof: Suppose not. Because Wy (0+
> 0) {0, 1} for all y RN , we could
then find an x G and a > 0 for which
G
G
Wx(N ) 0+
() < & (0+
) > 0,
where
o
n
G
= y G : Wy(N ) 0+
12 .
463
is the one and only bounded, smooth solution to the boundary value problem
described in Corollary 10.3.13.
More interesting are the improvements that Theorem 11.1.15 allows me to
make to the results in 10.2.3.
Theorem 11.1.17.
f : G R, set
(11.1.18)
uf (x) = EWx
f ( G ) , G () < ,
for x G.
in G and
xa
xG
then uf u. In particular, if f Cb G; R , then uf is the one and only
harmonic function u on G with the properties that
u(x) CWx(N ) G < for all x G,
for some C < and
lim u(x) = f (a) for each a reg G.
xa
xG
464
Proof: The initial assertions are covered already by Theorem 10.2.14. Next,
let f Cb (G; R) be given, and suppose that u is an element of C 2 G; [0, )
which satisfies the conditions
in the second assertion. To prove that uf u, set
Ft = {( ) : [0, t]} , and choose a sequence of bounded, open subsets Gn
(N )
so that Gn G and Gn % G. Then, for each n 1, u (t Gn ), Ft , Wx
is a submartingale, and so we know that, for each x G, u(x) dominates
lim
(N )
lim EWx
u (T Gn ) lim
u ( Gn ) , G T
T % n
T % n
(N )
Wx
(N )
lim EWx
h
i
f ( G ) , G < = uf (x),
where, in the passage to the last line, I have used Fatous Lemma and Theorem
11.1.15.
Finally, let f Cb (G; R) be given. What I still have to show is that if u
is a harmonic function on G which tends to f at points in reg G and satisfies
(N )
|u(x)| CWx ( G < ) for some C < , then u = uf . Thus, suppose u is
such a function, and set M = C + kf ku . Then, by the preceding, we have both
that
M Wx(N ) G < + u(x) uM 1+f (x) = M Wx(N ) G < + uf (x)
and that
M Wx(N ) G < u(x) uM 1f (x) = M Wx(N ) G < uf (x),
which means, of course, that u = uf .
As an immediate consequence of Theorem 11.1.17, we have the following.
Corollary 11.1.19. Assume that
(11.1.20)
Then, for each f Cb (G; R) the function uf in (11.1.18) is the one and only
bounded, harmonic function u on G which satisfies limxa u(x) = f (a) for every
xG
a reg G. In particular, this will be the case if G is contained in a half-space.
In order to go further, it will be helpful to have the following lemma.
Lemma 11.1.21. Let G be a non-empty, connected, open set in RN . Then
reg G = Wx(N ) G < = 0 for all x G.
On the other hand, if reg G 6= and b G, then
/ BRN (b, r) & G < > 0.
b
/ reg G lim lim Wx(N ) ( G )
r&0 xb
xG
465
x G,
and so we need only check that limxb uf (x) > 0. To this end, first note that,
xG
since
xa
xG
the Strong Minimum Principle (cf. Theorem 10.1.6) says that uf > 0 everywhere
in G. Next, because b is not regular, we can find a > 0 and a sequence
{xn : n 1} G such that xn b and
) G
inf+ Wx(N
> > 0.
n
nZ
xb
xG
inf uf (y) > 0.
2 yK
466
and construct f so that f 1 on G B(b, r){ and f (b) = 0. Then f (b) <
limxb uf (x).
xG
I next take a closer look at the conditions under which we can assert the
uniqueness of solutions to the Dirichlet problem. To begin, observe that, by
Corollary 11.1.19, the situation is quite satisfactory when (11.1.20) holds. In
fact, the same line of reasoning which I used there shows that the same conclusion
(N )
holds as soon as one knows that Wx G < is bounded below by a positive
(N )
constant; and therefore, because x G 7 Wx ( G < ) is a bounded
harmonic function which tends to 1 at reg G, Theorem 11.1.17 tells us that
(11.1.23)
inf Wx(N ) G < > 0 = inf Wx(N ) G < = 1.
xG
xG
I will close this discussion of the Dirichlet problem with two results which
reflect the transience of Brownian paths in three and higher dimensions and
their recurrence in one and two dimensions.
Theorem 11.1.24. Assume that N 3, and let G be a nonempty, connected,
open subset of RN . If f Cc (G; R), then uf is the one and only bounded
harmonic function u on G which tends to f at reg G and satisfies
(11.1.25)
lim u(x) = 0.
|x|
xG
467
Clearly,
(N )
and
(N )
lim EWx
T %
h
i
f ( G ) , G T
i
u (T ) , T < G < = 0.
Finally, because N 3 and, therefore, by Corollary 10.1.12, (T ) as
(N )
T % for Wx -almost every C(RN ), (11.1.25) guarantees that
(N )
lim EWx
T %
h
i
u (T ) , G = = 0,
Wx(N ) G < ) = 1 for all x G or Wx(N ) G < = 0 for all x G,
depending on whether reg G 6= or reg G = . Moreover, if reg G = , then the
only functions u C 2 G; [0, ) satisfying u 0 are constant. In particular,
either reg G = , and there are no non-constant, nonnegative harmonic functions
on G, or reg G 6= , and, for each f Cb (G; R), uf is the unique bounded
harmonic function on G which tends to f at reg G.
(N )
Proof: Suppose that Wx0 G < < 1 for some x0 G, and choose open
sets Gn % G
so that x0 G1 and Gn G for all n Z+ . Given u
C 2 G; [0, ) with u 0, set
Xn (t, ) = 1(t,] Gn () u (t)
(N )
(N )
Then Xn (t), Ft , Wx0 is a non-positive,
right-continuous, Wx0 -submartin
gale when Ft = {( ) : [0, t]} . Hence, since
pointwise as n ,
468
At the same time, by Theorem 10.2.3, we know that, for Wx0 -almost every
C(RN ),
Z
1U (t) dt = for all open U 6= .
0
(N )
Hence, since Wx0 G = > 0, there exists a 0 C(RN ) with the properties
that (0) = x0 , G (0 ) = ,
Z
1U 0 (t) dt = for all open U 6= , and lim u 0 (t) exists,
t
which is possible only if u is constant. In other words, we have now proved that
(N )
when Wx0 ( G < ) < 1 for some x0 G, then the only u C 2 G; [0, )
with u 0 are constant.
Given the preceding paragraph, the rest is easy. Indeed, if reg G = , then
(N )
Theorem 11.1.15 already implies that Wx ( G < ) = 0 for all x G. On the
(N )
other hand, if a reg G but Wx0 G < < 1 for some x0 G, then the
(N )
(N )
preceding paragraph applied to x
Wx ( G < ) says that Wx ( G < )
is constant, which leads to the contradiction
) G
1 > Wx(N
( < ) = xa
lim Wx(N ) ( G < ) = 1.
0
xG
11.1.4. Harmonic Measure. We now have a rather complete abstract analysis of when the Dirichlet problem can be solved. Indeed, we know that, at least
when f Cc (G; R), one cannot do better than take ones solution to be the
function uf given by (11.1.18). For this reason, I will call
(11.1.27)
G (x, ) Wx(N ) ( G ) , G () <
the harmonic measure for G based at x G of the set BG . Obviously,
Theorem 11.1.15 says that G (x, G \ reg G) = 0, and
Z
uf (x) =
f () G (x, d).
G
Actually, S. Kakutanis 1944 article, Two dimensional Brownian motion and harmonic functions, Proc. Imp. Acad. Tokyo 20, together with his 1949 article, Markoff process and the
Dirichlet problem, Proc. Imp. Acad. Tokyo 21, are generally accepted as the first place in
which a definitive connection between the harmonic functions and Wieners measure was established. However, it was not until with Doobs Semimartingales and subharmonic functions,
T.A.M.S. 77, in 1954 that the connection was completed.
3 In 1957, Hunt published a series of three articles: Markov processes and potentials, parts
I, II, & III, Ill. J. Math. 1 & 2. In these articles, he literally created the modern theory of
Markov processes and established their relationship to potential theory. To see just how far
Hunts ideas can be elaborated, see M. Sharpes General Theory of Markov Processes, Acad.
Press Series in Pure & Appl. Math. 133 (1988).
469
y
2
N 1 (d),
N 1 y 2 + ||2 N2 R
y (0, ),
N 1
where N 1 is the surface area of SN 1 and I have identified RN
+ with R
and used RN 1 to denote Lebesgue measure on RN 1 . Hence, after a trivial
translation,
N
R+ (x, y), d =
y
2
N 1 (d)
N 1 y 2 + |x |2 N2 R
for
(x, y) RN 1 (0, ).
Moreover, by using further translation plus Wiener rotation invariance (cf. (ii) in
Exercise 4.3.10), one can pass easily from the preceding to an explicit expression
of the harmonic measure for an arbitrary half-space.
In the preceding, we were able to derive an expression giving the harmonic
measure for half-spaces directly from probabilistic considerations. Unfortunately, half-spaces are essentially the only regions for which probabilistic reasoning yields such explicit expressions. Indeed, embarrassing as it is to admit,
it must recognized that, when it comes to explicit expressions, the time-honored
techniques of clever changes of variables followed by separation of variables are
more powerful than anything which comes out of (11.1.27). To wit, I have been
unable to give a truly probabilistic derivation of the classical formula given in
the following.
Theorem 11.1.28 (Poisson Formula). Use SN 1 to denote the surface
measure on the unit sphere SN 1 in RN , and define
(N ) (x, ) =
N 1
1 |x|2
|x |N
Then:
B(0,1) (x, d) = (N ) (x, ) SN 1 (d),
470
r2 |x c|2
SN 1 (c,r) (d),
N 1 r |x |N
1
x B(c, r).
lim
xa
xB
SN 1
SN 1 B(a,){
for all x B.
SN 1
Z
=
(N ) (r, ) SN 1 (d),
SN 1
where, in the final step, I have used the easily verified identity
(N ) (r, ) = (N ) (r, )
2
for all r [0, 1) and (, ) SN 1 .
471
|x|2 r2
r|x|2
S1 (0,r) (d)
2 |x|2 r2 x2
for each x
/ D(r). In particular, if u Cb R2 \ D(r); R is harmonic on
R2 \ D(r), then
Z
|x|2 r2
|x|2
u(x) =
u(r)S1 (d),
2 S1 |x|2 rx2
(11.1.30)
\D(r)
(x, d) =
and so
(11.1.31)
1
lim u(x) =
2
|x|
Z
S1
u(r) S1 (d).
Proof: After an easy scaling argument, I may and will assume that r = 1.
Thus, set D = D(1), and
that u Cb R2 \ D; R is harmonic in R2 \
assume
x
for x D \ {0}. Obviously, v is bounded and
D. Next, set v(x) = u |x|
2
continuous. In addition, by using polar coordinates, one can easily check that v
is harmonic in D \ {0}. In particular, if (0, 1) and G() B \ B(0, ), then
h
i
h
i
(N )
(N )
v(x) = EWx v (1 ) , 1 < + EWx v ( ) , < 1 , x G(),
where the notation is that in Theorem 10.1.11. Hence, because, by that theorem,
(N )
% (a.s., Wx ) as & 0, this leads to
Z
h
i
(N )
1
1 |x|2
Wx
v(x) = E
v (1 ) , 1 < =
u() S1 (d)
2 S1 x2
for all x D \{0}. Finally, given the preceding, the rest comes down to a simple
matter of bookkeeping.
As a second application of Poissons formula, I make the following famous observation, which can be viewed as a quantitative version of the Strong Minimum
Principle (cf. Theorem 10.1.6) for harmonic functions.
Corollary 11.1.32 (Harnacks Principle).
(0, ),
rN 2 r |x c| B(c,r)
(c, )
N 1
r + |x c|
(11.1.33)
B(c,r)
rN 2 r + |x c| B(c,r)
(c, ).
(x, )
N 1
r |x c|
472
for all x B(c, r). Hence, if u is a non-negative, harmonic function on B(c, r),
then
rN 2 r + |x c|
rN 2 r |x c|
(11.1.34)
N 1 u(c).
N 1 u(c) u(x)
r |x c|
r + |x c|
In particular, if G is a connected region in RN and {un : n 1} is a nondecreasing sequence of harmonic functions on G, then either limn u(x) =
for every x G or there is a harmonic function u on G to which {un : n 1}
converges uniformly on compact subsets of G.
Proof: The inequalities in (11.1.33) are immediate consequences of Poissons
formula and the triangle inequality; and, given (11.1.33), the inequalities in
(11.1.34) comes from integrating the inequalities in (11.1.33). Finally, let a
connected, open set G and a nondecreasing sequence {un : n 1} of harmonic functions be given. By replacing un with un u0 if necessary, I may
and will assume that all the un s are nonnegative. Next, for each x G, set
u(x) = limn un (x) [0, ]. Because (11.1.34) holds for each of the un s and
B(c, r) G, the Monotone Convergence Theorem allows us to conclude that
it also holds for u itself. Hence, we know that both {x G : u(x) = } and
{x G : u(x) < } are open subsets of G, and so one of them must be empty.
Finally, assume that u < everywhere on G, and suppose that B(c, 2r) G.
Then, by the right-hand side of (11.1.34), the un s are uniformly bounded on
B c, 3r
2 , and so, by the last part of Theorem 11.1.28, we know that u is harmonic and that un u uniformly on B(c, r).
473
and define un on H by
(N )
un (x) = EWx
i
u ( H ) , H < n .
for all x H.
(iii) Let K be a compact subset of RN and a connected G K be given.
Assuming either that N 3 or that reg G 6= , show that (11.1.39) holds if K
is a removable singularity in G for every bounded, harmonic function on G \ K.
(N )
Hint: Consider the function x G \ K 7 Wx K < G [0, 1], and use
the Strong Minimum Principle.
(iv) Let G be a non-empty, open subset of RN , where N 2, and set D =
{(x, x) : x G}, the diagonal in G2 . Given a u C(G2 ; R) which is harmonic
on G \ D, show that u is harmonic on G2 .
Hint: Show that
(2N )
Wx,y
t [0, ) (t) D
Z
for (x, y) G2 \ D.
474
Exercise 11.1.40. For each r (0, ), let S(r) denote the open vertical strip
(r, r) R in R2 . Clearly,
S(r) () = r(1) () inf t 0 : |1 (t)| r ,
and so the harmonic measure for S(r), based at any point in S(r), will be
supported on {(x, y) : x = r and y R}. In particular, if u Cb S(r); R is
bounded and harmonic on S(r), then
(11.1.41)
The estimate in (11.1.41) is a primitive version of the PhragmenLindelof maximum principle. To get a sharper version, one has to relax the global boundedness
condition on S(r). To see what can be expected, consider the function
y
(x + r)
for z = (x, y) R2 .
cosh
ur (z) sin
2r
2r
= u satisfies (11.1.41),
which is the true Phragm
enLindel
of principle
(i)
(i) Given R (0, ), set R () = inf{t 0 : |i (t)| R}, and show that, for
any u C S(r); R which is harmonic on S(r),
h
i
h
i
(2)
(2)
(2)
(2)
(2)
u(z) = EWz u r(1) , r(1) R + EWz u R
, R < r(1)
for z S(r, R) (r, r) (R, R). Conclude that (11.1.41) holds as long as
(2)
lim sup u(x, R) u(x, R)Wz(2) R < r(1) = 0, z S(r).
R |x|1
Thus, the desired conclusion comes down to showing that, for each (r, ),
R
(2)
Wz(2) R < r(1) = 0, z S(r).
(*)
lim exp
R
2
475
(ii) To prove (*), let (r, ) be given. Show that, for R (0, ) and
z S(r, R),
i
h
(2)
(2)
1 R +
(2)
(1)
Wz
R
,
<
sin
u (z) = cosh 2 E
r
R
2
(2)
Wz(2) R < r(1) ,
cos r
cosh R
2
2
1
2 u
= f in G and
xa
Notice that, at least when G is bounded, or, more generally, whenever (11.1.20)
holds, there is at most one bounded u C 2 (G; R) which satisfies (11.2.1). Indeed, if there were two, then their difference would be a bounded harmonic function on G satisfying boundary condition 0 at reg G, which, because of (11.1.20)
and Corollary 11.1.19, means that this difference vanishes. Moreover, when
N 3, even if (11.1.20) fails, one can (cf. Theorem 11.1.24) recover uniqueness
by adding to (11.2.1) the condition that
(11.2.2)
lim u(x) = 0.
|x|
xG
1
T
Z
=
f (y) pG (T, x, y) pG (T 1 , x, y) dy
and xa
lim uT (x) = 0 for a reg G.
xG
R
Hence, at least when (11.1.20) holds and therefore G pG (T, x, y)f (y) dy 0
as T % , it is reasonable to hope that u = limT uT exists and will be the
476
desired solution to (11.2.1). On the other hand, it is neither obvious that the
limit will exist nor, even if it does exist, in what sense either the smoothness
properties or (11.2.2) will survive the limit procedure.
Motivated by these considerations, I now define the Green function to be
the function g G given by
Z
G
(11.2.3)
g (x, y) =
pG (t, x, y) dt, (x, y) G2 .
(0,)
g R (x, y) =
2|y x|2N
,
(N 2)N 1
N
What we still have to find are conditions under which GG f solves (11.2.1) and
satisfies (11.2.2). From (11.2.5) and Theorem 10.2.14, it is clear that GG f (x)
477
N
Thus, what we need to know is whether GG f Cb2 (G; R). By the considerations
N
above, we already know that GG f Cb2 (G; R) if and only if GR f is. Moreover,
N
N
if f Cc2 (G; R), then GR f = GR f for any with kk 2. In addition,
N
GR xi xj f (x) =
N 1
Z
RN
Hence, by starting with f s that are in Cc2 (G; R) and applying an obvious approximation argument, we see that GG f Cb2 (G; R) whenever f Cc1 (G; R).1
Theorem 11.2.7. Assume that N 3 and that G is a non-empty, open subset
of RN . Then, for each f Cc1 (G; R), the function GG f in (11.2.6) is the unique
bounded, twice differentiable solution to (11.2.1) which satisfies (11.2.2).
Remark 11.2.8. Notice that the Duhamel formula in (11.2.5) could have been
N
guessed. To be precise, g R is a fundamental solution for 12 in RN in the
N
sense that 12 GR f = f all test functions f Cc1 (RN ; R), and g G is to be a
fundamental solution for 12 in G with 0 boundary data in the sense that it
should be the kernel for the solution operator which solves the Poisson problem in
(11.2.1). Based on these remarks, one should guess that a reasonable approach
N
to the construction of g G would be to correct g R ( , y) for each y G by
N
subtracting off a harmonic function which has g R ( , y) as its boundary value,
and this is, of course, precisely what is being done in (11.2.5).
11.2.2. Green Functions when N {1, 2}. Because (cf. Theorem 10.2.3)
Brownian paths in one and two dimensions spend infinite time in every nonempty open set, the reasoning 11.2.1 is too crude to handle the Poisson problem
1
478
(y+x)2
1
t 2 e 2t 1 dt
= |x + y| |x y| = 2(x y).
More generally, by translation and reflection, we see that, for any c R,
(11.2.10)
for x, y (c, )
479
(1)
Since Wx ( (a,b) < ) = 1 for all x R and the boundary of (a, b) is regular,
Corollary 11.1.19 together with (11.2.10) say that, as a function of x (a, b),
the second term on the right equals u, where u00 = 0, limx&0 u(x) = 0, and
, and so
limx%b u(x) = 2(y a). Hence, u(x) = 2(xa)(ya)
ba
(11.2.11)
g (a,b) (x, y) =
2
x y a (b x y).
ba
= (y1 , y2 ). Therefore,
where y
Z
2
2
pR+ (t, x, y) dt
(0,)
Z
= lim
T %
1
t
|
y x|2
|y x|2
dt
exp
exp
2t
2t
2
|yx|
Z
= lim
T %
1 1
e 2tT dt,
t
|
yx|2
2
|yx|
|
yx| .
h
(2)
log |
y x| is harmonic in G, one can pass
h
i
(2)
1
1
G
G
)|, 0+
() < ,
log |y x| + EWx log |y (0+
first for G R2+ and then, after translation and rotation, for G contained in
any half-space. In addition, by the same argument as I used in 11.2.1, one can
use (11.2.12) to check that if G is contained in a half-space, then GG f solves
Poissons problem for every f Cc (G; R).
To handle regions that are not contained in a half-space, one needs to work
harder.
480
Lemma 11.2.13.
each K G,
g G (x, y) dx = sup
sup
yG
xG
and
(2)
sup
(x,y)K 2
EWx
g G (x, y) dy <
h
i
log |y ( G )|, G () < < .
for all c G and r > 0 with B(c, 2r) G. Given such a ball, set B = B(c, r)
and 2B = B(c, 2r), and define {n : n 0} inductively by 0 = 0 and, for n 1,
/ 2B}.
2n1 = inf{t 2(n1) : (t) B} and 2n = inf{t 2n1 : (t)
(2)
G
If u(x) = Wx 1 < , then u is a [0, 1]-valued, harmonic function on G \ B
that tends to 0 as x tends to reg G and to 1 as x tends to B. Thus, since
reg G 6= , the Minimum Principle says that u(x) (0, 1) for all x G \ B. In
particular, this means that max{u(x) : |x c| = 2r} (0, 1). At the same
time, by the Markov property,
(2)
Wx(2) 2n+1 < G = EWx u (2n ) , 2n () < G () Wx(2) 2n1 < G ,
(2)
(2)
and so Wx 2n1 < G n1 for n Z+ . Hence, if f (y) = EWy 2B ,
then
"Z G
#
"Z
#
2n
X
(2)
(2)
Wx
Wx
G
E
1B (t) dt =
E
1B (t) dt, 2n1 () <
0
X
n=1
n=1
(2)
EWx
2n1
kf ku
.
f (2n1 ) , 2n1 () < G ()
1
481
lim
sup
r (x,y)K 2
1
1
log |y x| g G (x, y) + log |y x|,
which, by the first part of this lemma, means that limr ur cannot be infinite
everywhere on B 2 .
482
Theorem 11.2.14.
Let G be a non-empty, open subset of R2 for which
reg G 6= . Then, (11.1.20) holds,
(2)
sup EWx
x,yK
h
i
log y ( G ), G < < for K G,
and
(2)
(x, y) G2 7 EWx
h
i
log y ( G ), G < R
log r (2) B(c,r)
Wx
G ,
r
hG (x) lim
(11.2.15)
x G,
i
h
(2)
1
1
log |yx|+ EWx logy( G ), G < +hG (x)
g G ( , y) hG
h
i
(2)
1
1
log |y x| + EWx log |y ( G(r) )|, G(r) () < .
r%
(0,)
r%
483
we conclude from the first part of Lemma 11.2.13 that only the second alternative
c2 and that
is possible. Thus, we now know that g G is harmonic on G
(*)
gr (x, y) % g G (x, y)
c2 .
uniformly on compact subsets of G
To go further, first notice that the expression in (11.2.12) for gr can be rewritten as
(**)
where
(2)
ur (x, y) = EWx
i
h
logy ( G ), G () < B(c,r) ()
and apply Lebesgues Dominated Convergence Theorem together with the integrability estimate in the second part of Lemma 11.2.13 to see that, as |y|
through G, the second term tends to 0 uniformly for x in compact subsets of
G.
484
log |x|
R
.
Wx(2) D(r) < G =
log Rr
hR
\D(R)
(x) =
|x|
1
,
log
R
x
/ D(R).
As we are about to see, for Gs whose complements are compact, the conclusion
drawn about hG at the end of Remark 11.2.18 is typical, at least as |x| .
Corollary 11.2.19. Let everything be as in Theorem 11.2.14, and assume
that K R2 \ G is compact. Then, for each R (0, ) with the property that
K D(R), one has that
Z
|x|2 R2
|x|2
|x|
1
G
=
hG (x) log
h (R) S1 (d)
2 S1 |x|2 Rx2
R
Z
1
hG (R) S1 (d)
2 S1
as |x| .
Proof: Define : C(RN ) [0, ] to be the first entrance time into D(R),
and note (cf. the preceding discussion) that, for each r > R and R < |x| < r,
Wx(2) D(r) < G
h
i
(2)
(2)
= Wx(2) D(r) < + EWx W() D(r) < G , < D(r)
h
i
(2)
log |x|
(2)
Wx
R
W() D(r) < G , < D(r) .
r +E
log R
Hence, after multiplying the preceding through by log r , using (11.2.15), and
letting r , we arrive at
h
i
(2)
1
|x|
1
+ EWx hG () , < , x R2 \ D(R),
hG (x) = log
485
|x|
1
log
R
is a bounded function that is harmonic off of D(R). Thus, the desired result
now follows from the first part of Theorem 11.1.29.
Notice that, as a by-product, one knows that the number
Z
1
1
hG (R) S1 () log R
2 S1
does not depend on R as long as G{ B(0, R). This number plays an important role in classical two-dimensional potential theory, where it is known as
Robins constant for G.
Corollary 11.2.20. Again let everything be as in Theorem 11.2.14. Then,
for each K G and r > 0,
n
o
sup g G (x, y) : |x y| r and y K <
and
lim sup g G (x, y) = 0 for each a reg G.
xa
xG yK
Moreover, for each f Cc1 (G; R), GG f is the unique bounded solution to
(11.2.1).
Proof: To prove the initial statements, let c G and r > 0 satisfying B(c, 2r)
G be given, set B = B(c, r), and define
the first entrance time () of
B
.
By
the Markov property, we see that,
0
:
(t)
into B by () = inf t
for any f Cc B; [0, ) ,
"Z G
#
Z
(2)
g G (x, y)f (y) dy = EWx
f (t) dt, < G
G
(2)
Wx
=E
Z
g
.
Hence, if x
/ 2B B(c, 2r) and therefore g G (x, ) B is continuous, we find
that
h
i
(2)
g G (x, y) = EWx g G (), y), < G
for all y B.
But, because g G (2B) B is bounded, we now see that
(*)
sup g G (x, y) CWx(2) < G , x
/ 2B,
yB
486
for some C (0, ). In particular, this, combined with the obvious Heine
Borel argument, proves the first estimate. In addition, if a reg G, then, for
each > 0,
lim Wx(2) G >
lim Wx(2) + xa
lim Wx(2) < G xa
xa
xG
xG
xG
lim Wx(2)
xa
xG
Thus, since the last expression obviously tends to 0 as & 0, this, together with
(*), implies that
lim sup g G (x, y) = 0,
xa
xG yB
which (again after the obvious HeineBorel argument) means that we have also
proved the second assertion.
Turning to the last part of the statement, let f Cc1 (G, R) be given. By the
preceding, we know that GG f is bounded and tends to 0 at reg G. In addition,
using Theorem 11.2.14, especially (11.2.16), and arguing as I did in the case
when N 3, it is easy to check that GG f C 2 (G; R) and 12 GG = f . Thus,
GG f is a bounded solution to (11.2.1), and, because (11.1.20) holds, it can be
the only such solution.
for distinct x, y from B(c, R). Thus, assume that c = 0 and R = 1. Next,
observe that
y
|x y| = |y|x |y|
for x SN 1 and y BRN (0, 1) \ {0},
and use this observation together with (11.2.12) and (11.2.5) to conclude that
(
y
|y|x
log
1
1
if y 6= 0
B(0,1)
|y|
g
(x, y) = log |y x| +
0
if y = 0
when N = 2 and
N
N
y
gR
|y|x
if y 6= 0
|y|
when N 3.
2
(N 2)N 1
if y = 0
487
Exercise 11.2.22. The derivation that I gave of Poissons formula (cf. Theorem 11.1.28) required me to already know the answer and simply verify that it is
correct. Here I outline another approach, which is the basis for a quite general
procedure. To begin with, recall the classical Greens Identity
Z
Z
u
v
dG
v n
uv vu dx =
u n
G
for bounded, smooth regions G in R and functions u and v that are smooth
in a neighborhood of G. (In the preceding, w
n (x) is used to denote the normal
derivative w(x), n(x) RN , where n(x) is the outer unit normal at x G
and G is the standard surface measure for G.) Next, let c be an element of
B(0, 1), suppose r > 0 satisfies B(c, r) B(0, 1), and let u be a function that
is harmonic in a neighborhood of BRN (0, 1). By applying Greens Identity with
G = BRN (0, 1) \ B(c, r) and v = 12 g B(0,1) (c, ), use Exercise 11.2.21 to verify
Z
N 1
u(c) = lim r
, v(c + r) RN u c + r) SN 1 (d)
r&0
SN 1
Z
Z
=
, v() RN u ) SN 1 (d) =
u ) (N ) (c, ) SN 1 (d),
SN 1
SN 1
uR (x) = EWx
f ( B(0,R) ) , B(0,R) < for R 1 and x B(0, R),
check that, as R & 1, uR u1 uniformly on B(0, 1), and use the preceding to
conclude that
Z
u1 (c) =
f () (N ) (c, ) SN 1 (d),
SN 1
which is, of course, the result that was proved in Theorem 11.1.28.
11.3 Excessive Functions, Potentials, and Riesz Decompositions
The origin of the Green function lies in the theory of electricity and magnetism.
Namely, if G is a region in RN whose boundary is grounded and y G, then
g G ( , y) should be the electrical potential in G that results from placing a unit
point charge at y. More generally, if is any distribution of charge in G (i.e.,
a non-negative, locally finite, Borel measure on G), then one can consider the
potential GG given by
Z
(11.3.1)
GG (x) =
g G (x, y) (dy), x G,
G
488
11.3.1. Excessive Functions. Throughout this subsection, G will be a nonempty, connected, open region in RN , and I will be assuming either that N 3
or that (11.1.20) holds. Thus, by the results obtained in 8.2.1 and 8.2.2, the
Green function (cf. (11.2.3)) g G satisfies (depending on whether N = 1, N = 2,
or N 3) either (11.2.10), (11.2.11), (11.2.16), or (11.2.5), and, in order to have
g G defined everywhere on G2 , I will take g G (x, x) = , x G, when N 2.
I will say that u is an excessive function on G and will write u E(G) if
u is a lower semicontinuous, [0, ]-valued function that satisfies the super mean
value property:
u(x)
N 1
SN 1
u(x + r) SN 1 (d)
whenever BRN (x, r) G.
= lim
r&0
N 1
u(x + r) u(x) S N 1 (d) 0
SN 1
u(x) = E
(N )
u ( B(x,r) ) , B(x,r) < EWx
"Z
B(x,r)
N 1
#
1
2 u
( ) d
Z
SN 1
u(x + r) SN 1 (d).
pG (t, , y) dt % un
1
n
as T ,
489
h
i
(N )
(N )
Wx
un (x) E
fn (t) dt = EWx un (r ) , r <
r
N 1
Z
un (x + rx) SN 1 (dx),
SN 1
we are done.
11.3.2. Potentials and Riesz Decomposition. My next goal is to prove
that, apart from the trivial case when u , every excessive function on G
admits a unique representation in the form GG + h for an appropriate choice
of and h. The proof requires me to make some preparations.
Lemma 11.3.4. If u E(G), then either u or u is locally integrable on G.
Next, given a u E(G) that is not identically infinite, there exists a sequence
{un : n 1} Cc (G; R) and a non-decreasing sequence {Gn : n 1} of
open subsets of G with the properties that Gn G, Gn % G, un u,
un 0 on Gn for each n 1, and un u pointwise as n . Moreover,
if n (dy) = 12 1Gn (y)un (y) dy, then there is a non-negative, locally finite,
Borel measure on G such that
Z
Z
(11.3.5)
lim
dn =
d for all Cc (G; R).
n
Proof: To prove the first assertion, let U denote the set of all x G with the
property that
Z
u(y) dy < for some r > 0 with B(x, r) G.
B(x,r)
490
and so, after integrating this with respect to N sN 1 ds over (0, r), we get
Z
Z
1
1
u(z) dz = ,
u(z) dz
u(y)
N 1 rN B(x,)
N 1 rN B(y,r)
where r |y x|. Hence, we now see that G \ U is also open, and therefore
that either U = G or U = and u .
Now assume that u E(G) is not identically infinite. To construct the required
Gn s and un s, choose a reference point c G, set R = 12 |c G{|, and take
Cc B(0, R4 ); [0, ) to be a rotationally invariant function with total integral
1. Next, for each n Z+ , set
and
Gn = x G B(c, n) : |x G{| > R
n
Z
(11.3.7)
un (x) =
n (x y)u(y) dy, x RN ,
G4n
where n () = nN (n). Clearly, {un : n 1} Cc G; [0, ) . In addition, if
x Gn , then, by taking advantage of the rotation invariance of , one can check
that
Z
Z
N 1
t
t
(t)
u x + n SN 1 (d) dt
un (x) =
(0, R
4 )
SN 1
tN 1 (t) dt = u(x),
u(x) N 1
(0, R
4 )
where : R [0, ) is taken so that (x) = |x| . Similarly, if B(x, r)
Gn , then
Z
un (x + r) SN 1 (d)
SN 1
Z
(z)
=
B(0, R
4 )
Z
N 1
B(0, R
4 )
u x+
SN 1
1
nz
+ r SN 1 (d)
dz
(z)u x + n1 z dz = N 1 un (x).
491
observe that we already know that u(x) limn un (x). On the other hand,
because u is lower semicontinuous, an application of Fatous Lemma yields
Z
(y) u x + n1 y dy = lim un (x).
u(x) lim
n
To complete the proof, let n be the measure described, and note that
#
"Z
t Gn
h
i
(N )
(N )
1
un (x) = EWx un (t Gn ) EWx
2 un (s) ds
0
(N )
Wx
"Z
t Gn
#
1
2 un
(s) ds =
Z t Z
p
0
Gn
(s, x, y) n (dy)
ds
Gn
for all n Z+ and (t, x) (0, ) Gn . Hence, after letting t % , we see that
Z
u(x) un (x)
g Gn (x, y) n (dy), n Z+ and x Gn .
Gn
u
dx
=
lim
dn = lim
n
2 u dx,
2
n
is a non-increasing function.
492
Proof: Let u E(G) be given. Clearly (11.3.9) is trivial in the case when
u . Thus, assume that u 6 , and define Gn and un for n Z+ as in
(11.3.7). Because un Gn 0, we know that
h
i
(N )
EWx un ( Gm T ) , () T < Gm ()
h
i
(N )
EWx un ( T ) , () T < Gm ()
for all 1 m n, x Gm , and T [0, ). Next, after noting that Gm <
(N )
Wx -almost surely, let T % in the preceding, and arrive at
h
i
h
i
(N )
(N )
EWx un ( Gm ) , () < Gm () EWx un () , () < Gm () .
But, because and u un 0, this means that
h
i
h
i
(N )
(N )
EWx un ( ) , () < Gm () EWx u () , () < Gm () ,
which, because 0 un u pointwise, leads, via Fatous Lemma, first to
h
i
h
i
(N )
(N )
EWx u ( ) , () < Gm () EWx u () , () < Gm ()
and thence, by the Monotone Convergence Theorem, to (11.3.9) when m .
From here, the rest is easy. Given a lower semicontinuous u : G [0, ]
and B(x, r) G, we have (cf. (11.3.3))
Z
h
i
(N )
1
u(x + r) SN 1 (d) = EWx u (r ) , r () < G () .
N 1 SN 1
is non-increasing; and, therefore, not only is u excessive but also (after passing
to polar coordinates and integrating) one finds that the monotonicity described
in the final assertion is true.
Theorem 11.3.10 (Riesz Decomposition). Let G be a non-empty, connected open subset of RN , and assume either that N 3 or that (11.1.20)
holds. If u E(G) is not identically infinite, then there exists a unique locally finite, non-negative Borel measure and a unique non-negative harmonic function
h on G with the property that
(11.3.11)
493
un (x) =
Gm
(N )
un ( Gm ) , Gm < .
Hence, by the Monotone Convergence Theorem, for any locally finite, nonnegative, Borel measure on G,
Z
ZZ
Z
Gm
(*)
u(x) (dx) = lim
g (x, y) (dx)dn (y) +
wm (x) (dx),
Gm
Gm
G2m
(N )
where wm (x) = EWx u ( Gm ) , Gm < .
Notice (cf. Harnacks Principle) that, as the non-decreasing limit of nonnegative harmonic functions {wm,n : n m}, wm is either identically infinite
or is itself a non-negative harmonic function on G; and so, since u(x) <
Lebesgue-almost everywhere, (*) shows that the latter must be the case. Now
let a be a fixed element of Gm , take n as in (11.3.7), and, for n m, define
(R
(x a)g Gm (x, y) dx if y Gm
Gm n
n (y) =
0
otherwise.
By taking (dx) = 1Gm (x)n (x a) dx in (*), we see that, for n m,
Z
Z
n (x a) u(x) dx = lim
n (y) k (dy)
k G
Gm
Z
+
n (x a) wm (x) dx.
Gm
But, since Gm is the intersection of two sets, both of which (cf. part (iv) in
Exercise 10.2.19) are regular, and is therefore regular as well, there is an n(a)
m for which n is continuous whenever n n(a). In particular, by (11.3.5), we
can now say that
Z
Z
Z
n (x a) u(x) dx =
n (x) (dx) +
n (x a) wm (x) dx
Gm
Gm
494
the same
time, it is clear that the second term on the right goes to wm (a) and
that n (y) : n n(a) tends non-decreasingly to g Gm (a, y). Thus, we have
now proved that
(**)
u = GGm + wm
on Gm for every m Z+ .
Starting from (**), the rest of the proof is quite easy. Namely, fix x G,
choose m so that x Gm , note that, g Gn (x, ) is non-decreasing as n m
increases, and conclude that GGnm (x) % GG (x). Hence, by (**) (alternatively, by (11.3.9)), we know that wmn (x) tends non-increasingly to a limit
h(x), which Harnacks Principle guarantees to be harmonic as a function of
x G. Thus, after passing to the limit as m in (**), we conclude that
(11.3.11) holds with the satisfying (11.3.6) and h = limm H Gm u.
To prove that these quantities are unique, note that if is any locally finite,
non-negative, Borel measure on G for which u GG is a non-negative harmonic
function, then, for every Cc (G; R), simple integration by parts plus the
symmetry of g G shows that
Z
Z
Z
G
1
1
G d =
d.
u dx = 2
2
G
That is, must satisfy (11.3.6); and so we have now derived the required uniqueness result.
Finally, to check the asserted characterization of h, suppose that v is a nonnegative harmonic function that is dominated by u on G. We then have
(N )
v(x) = EWx v ( Gm ) , Gm () < wm (x) for m Z+ and x Gm ,
and therefore the desired conclusion follows from the fact that wm tends to h.
By combining Lemma 11.3.2 with Theorem 11.3.10, we arrive at the following
characterization of potentials.
Corollary 11.3.12. Let everything be as in Theorem 11.3.10, and suppose
that u : G [0, ] is not identically infinite. Then a necessary and sufficient
condition for u to be the potential GG of some locally finite, non-negative,
Borel measure on G is that u be excessive on G and have the property that
the constant function 0 is the only non-negative harmonic function on G that is
dominated by u.
Let u be an excessive function on G that is not identically infinite. In keeping
with the electrostatic metaphor, I will call the measure entering the Riesz decomposition (11.3.11) of u the charge determined by u. A more mathematical
interpretation is provided by Schwartzs theory of distributions. Namely, when
u E(G) is not identically infinite, it is (cf. Lemma 11.3.4) locally integrable on
G, and, as such, it determines a distribution there. Moreover, in the language
of distribution theory, (11.3.6) says that = 12 u. However, the following
theorem provides a better way of thinking about .
495
u(y)pG (s, x, y) dy
Z
(x) (dx) = lim
(x) s (dx)
s&0
Proof: If u E(G), then, by the first part of Lemma 11.3.8 with = s and
= 0, one sees that u us . Conversely, suppose that u : G [0, ] is lower
semicontinuous, not identically infinite, and satisfies u us for all s > 0. Then,
since pG (s, x, ) > 0, u is locally integrable on G. Thus, if B(c, r) G and
ws (x) =
B(c,r)
u(y)pB(c,r) (s + t, x, y) dy
ws+t (x) =
B(c,r)
B(c,r)
for (s, t) (0, )2 and x B(c, r). Hence, if Cc2 B(c, r); [0, ) , then
Z
B(c,r)
1
t&0 s
12 ws (x)(x) dx = lim
ws (x) ws+t (x) (x) dx 0,
B(c,r)
which proves that ws 0 on B(c, r). Since this means that ws E B(c, r)
for each s > 0 and because
ws is non-increasing as a function of s, we will know
that u E B(c, r) once we show that ws u pointwise on B(c, r). But,
since ws u, this comes down to checking u(x) lims&0 ws (x), which follows
from lower semicontinuity.
496
Turning to the second assertion, begin with the observation that, because
u us and u is lower semicontinuous, us u pointwise as s & 0. Next, note
that for (s, x) (0, ) G,
"Z
#
Z T +s
s
1
ut (x) dt
ut (x) dt
g (x, y)fs (y) dy = lim
T s
0
T
G
Z
1 s
ut (x) dt u(x).
s 0
Z
s =
0
G
1
2 (y)p (,
, y) dy
d,
and so, by Fubinis Theorem and the symmetry of pG (, x, y), one can justify
Z
ds =
G
1
2s
Z
Z Z
u (y) d
(y) dy
Z
1
d.
2 (y)u(y) dy =
G
11.4 Capacity
497
In addition, because pG
K is bounded, the left-hand side is continuous with respect
to x G, and clearly the middle expression tends non-decreasingly to pG
K (x) as
s & 0. Thus, by the first part of Theorem 11.3.13, we now know that pG
E(G).
K
1
It is interesting to note that, although Wieners 1924 article, Certain notions in potential
theory, J. Math. Phys. M.I.T. 4, contains the first proof that an arbitrary compact set is
capacitable, it contains no reference to his own measure.
498
N 1
B(x,r)
(N )
Wx
B(x,r)
pG
pG
,
() <
K () SN 1 (d) = E
K (
SN 1
= Wx(N ) t B(x,r) (), G () (t) K = pG
K (x).
That is, pG
K satisfies the mean value property in G \ K and is therefore harmonic
there.
To complete the proof I must still show that if M(G) is supported on
G
K and u GG 1, then u pG
K , and I will start by showing that u pK
on G \ K. To this end, observe that u is harmonic on G \ K and that it tends
to 0 at reg G {}. Thus, if () = inf{t 0 : (t) K()}, where
K() = {x : |x K| }, then, for 0, dist(K, G{) and x G \ K(), u(x)
is dominated by
(N )
EWx
u ( ) , () < G () Wx(N ) t 0, G () (t) K() .
where ur
g G ( , y) (dy).
G\B(x,r)
11.4 Capacity
499
G
The function pG
K and the measure K are, for the reasons explained above,
known as, respectively, the capacitory potential and the capacitory distribution for K in G, and the total mass
Cap(K; G) G
K (K)
(11.4.3)
GGB(0,R) K
B(0,R)
GG K
1.
i
g G ( GB(0,R) ) , GB(0,R) () < ,
there exists (cf. Corollary 11.2.20 when N = 2) a C < such that g G (x, y)
B(0,R)
g GB(0,R) (x, y) + C for all x
/ B(0, R) and y K. Hence, GG K
500
B(0,R)
1 + CCap K, B(0, R) , and so we have shown that GG K
is a non-zero,
bounded potential on G whose charge is supported in K, which, by the preceding
equivalences, means that Cap(K; G) > 0. Conversely, if Cap(K; G) > 0, then,
again by the preceding equivalences, we know that pG
K > 0everywhere on G,
(N )
which, of course, means that Wx t (0, ) (t) K > 0, first for all
x G and then for all x RN .
The last part of the preceding allows us to use capacity to determine whether
Brownian paths will hit a K RN . Indeed, we now know that they will if
and only if Cap(K; G) > 0 for some G K satisfying our hypotheses. Thus,
the ability of Brownian paths in RN to hit a set is completely determined by
the singularity in the Green function. Namely, they will hit K with positive
probability if and only if there is a non-zero supported on K for which GG
is bounded. When N = 1, there is no singularity, and so even points can be hit.
When N 2, there is a singularity, and so, in order to be hit, K has to be large
enough to support a measure that is sufficiently smooth to mollify the singularity
in the Green function. Non-trivial (i.e., Ks for which K{ is the interior of its
closure) examples of Ks that cannot be hit are hard to come by. Lebesgues
spine provides one in R3 and can be adapted to RN for N 3. When N = 2
one has too work much harder. The most famous example is a devilishly clever
construction, known as Littlewoods crocodile, due to J.E. Littlewood. See M.
ements de la Theorie Classique du Potenial published
Brelots lecture notes El
in 1965 by Centre de Documentation Universitaire, Sorbonne, Paris V.
11.4.2. The Capacitory Distribution. In this subsection I will give a probabilistic representation, discovered by K.L. Chung, of the capacitory distribution
N
G
K . Again I assume that G is a connected open subset of R and that either
N 3 or (11.1.20) holds.
N
The function `G
K : C(R ) [0, ] given by
(11.4.5)
G
`G
K () = sup t 0, () : (t) K
0 if t 0, G () : (t) K = .
for t [0, G ).
This result appeared originally in K.L. Chungs Probabilistic approach in potential theory
to the equilibrium problem, Ann. Inst. Fourier Gren. 23 # 3, pp. 313322 (1973). It gives
the first direct probabilistic interpretation of the capacitory measure.
11.4 Capacity
501
Cap(K; G) > 0. Then, for all Borel measurable : G R that are bounded
below and every c G,
#
"
Z
(N )
(`G
K)
G
G
Wc
, `K (0, ) .
(11.4.7)
dK = E
g G c, (`G
G
K)
Proof: Take u = pG
f and s for s > 0 as in Theorem 11.3.13.
K , and define
s
(N )
G
Then sfs (x) = Wx 0 < `K s , and so, for any Cb (G; R),
"Z G
#
Z
(N )
G
Wc
g (c, y)(y) s (dy) = E
(t) fs (t) dt
G
1
s
1
s
(N )
i
(N )
G
(t) W(t) 0 < `G
K s , > t dt
(N )
i
(t) , t < `G
s
+
t
dt
K
EWc
EWc
#
" Z G
1 `K
G
(t) dt, `K (0, )
=E
s (`G
s)+
K
h
i
G
(N )
EWc (`G
as s & 0,
K ) , `K (0, )
(N )
Wc
where, in the passage to the third line, I have applied the Markov property and
used the time-shift property of `G
K . Next, let Cc (G; R) be given, note that
is
again
an
element
of
Cc (G; R), and conclude from Theorem 11.3.13
= gG (c,
)
and the preceding that (11.4.7) holds first for s in Cc (G; R) and then for all
bounded, measurable s on G.
Aside from its intrinsic beauty, (11.4.7) has the virtue that it simplifies the
proofs of various important facts about capacity. For instance, it allows one to
prove a basic monotone convergence result for capacity. However, before doing
so, I will need to introduce the the energy E G (, ), which is defined for locally
finite, non-negative Borel measures and on G by
ZZ
E G (, ) =
g G (x, y) (dx)(dy).
G2
E G (, )
E G (, );
and, when the factors on the right are both finite, equality holds if and only if
a b = 0 for some pair (a, b) [0, )2 \ (0, 0).
502
pG (t, x, y) (dy)
f (t, x) =
g G (t, x, y) (dy),
and g(t, x) =
E G (, ) =
ZZ
pG (t, x, y) (dx)(dy) dt
Z
(0,)
G2
ZZ
=
t
2, x
t
2, x
dtdx
(0,)G
ZZ
2
t
2, x
ZZ
2
t
2, x
dtdx
(0,)G
12
12
ZZ
ZZ
g(t, x) dtdx
f (t, x) dtdx
(0,)G
(0,)G
dtdx
(0,)G
12
12
q
E G (, ) E G (, ).
Furthermore, when f and g are square integrable, then equality holds if and only
if they are linearly dependent in the sense that af bg = 0 Lebesgue-almost
everywhere for some non-trivial choice of a, b [0, ). But this means that
Z
a
a
d = lim
T
&0
T
G
a
= lim
T &0 T
Z
(x)p (t, x, y) (dx) dt
G
ZZ
b
(x) f (t, x) dtdx = lim
T &0 T
b
T &0 T
Z
= lim
(0,T ]G
(0,T ]G
ZZ
Z
dt = b
d
G
11.4 Capacity
503
and so
Cap(K; G) = lim Cap Kn ; G).
n
= EG
12
G
E G G
Kn , Kn
Z
12
1
G
G
G
G 2
pKn (x) Kn (dx)
K , K
EG
1
1
1
1
G 2
G 2
Cap K; G 2
Cap Kn ; G 2 E G G
G
K , K
K , K
G
E G G
K , K
21
504
G
as n . Hence, Cap(K; G) E G G
K , K . On the other hand, if (G \ K) =
0 and GG 1, then, by Theorem 11.4.1, GG pG
K 1,
Z
Z
G
E G (, ) =
GG d
pG
G
K d = E
K,
G
G
G
G
G
K , K
12
E (, ) 2
E
Z
12
q
p
1
G
G
2
E G (, ),
Cap(K;
G)
E
(,
)
=
pG
d
K
K
G
K
K . In addition, for any with (G \ K) = 0 and
G
G
G 1, it shows that E (, ) Cap(K; G) and that equality can hold only
if and G
in which case = G
K are related by a non-trivial linear equation,
K
G
G
G
G
follows immediately from the equality E K , K = E (, ).
The result in Theorem 11.4.9, which was known to Wiener, played an important role in his analysis of classical potential theory. To be more precise, when
3
3
N = 3 and K{ is regular, pR
K is the continuous function on R that is harmonic
off K, is 1 on K, and tends to 0 at infinity. Thus, it is a relatively simple problem to define the capacitory distribution for such Ks in R3 . The importance
to Wiener of results like that in Theorem 11.4.9 is that they enabled him (cf.
Exercise 11.4.20) to make a consistent assignment of capacity to Ks for which
K{ is not necessarily regular.
11.4.3. Wieners Test. This subsection is devoted to another of Wieners
famous contributions to classical potential theory.
As was pointed out following Corollary 11.4.4, capacity can be used to test
whether Brownian paths will hit a compact set K. By Lemma 11.1.21, an
equivalent statement is that capacity can be used to test whether reg (K{) is
empty or not. The result of Wiener that will be proved here can be viewed as a
sharpening of this remark.
Assume that N 2, and let an open subset G of RN and an a G be given.
For n Z+ , set
n
o
Kn = y
/ G : 2n1 |y a| 2n ,
and define
(11.4.10)
Wn (a, G) =
nCap Kn ; B(a, 1)
if N = 2
a reg G
X
n=1
Wn (a, G) = .
if N 3.
11.4 Capacity
505
Notice that, at least qualitatively, (11.4.11) is what one should expect in that
the divergence of the series is some sort of statement that G{ is robust at a.
The key to my proof of Wieners test is the trivial observation that because
Z
B(a,1)
B(a,1)
pn (x) pKn (x) =
g B(a,1) (x, y) Kn (dy),
Kn
n Z+ .
Hence, in probabilistic terms, Wieners test comes down to the assertion that
Wa(N )
G
0+
X
= 0 = 1
Wa(N ) An = ,
1
where An is the set of C(RN ) that visit Kn before leaving B(a, 1). Actually,
although the preceding equivalence is not obvious, the closely related statement
G
(11.4.12)
Wa(N ) 0+
= 0 = 1 Wa(N ) lim An > 0
n
G
is essentially immediate. Indeed, if (0) = a and 0+
() = 0, then there
exists a sequence of times tm & 0 with the property that (tm ) B(a, 1)
G{ for all m, from which it is clear that visits infinitely many Kn s before
leaving B(a, 1). Hence, the = in (11.4.12) is trivial. As for the opposite
N
B(a,1)
implication,
suppose
() < ,
B(a,1)
that C(R ) has the properties that
t 0,
() : (t) = a} = {0}, and that visits infinitely many Kn s
before leaving B(a, 1). We can then find a subsequence {nm : m 1} and
a convergent sequence of times tm > 0 such that (tm ) Knm for each m.
Clearly, limm
(tm) = a, and therefore
limm
tm = 0. In other words, if
B(a,1) () < , t 0, B(a,1) () : (t) = a = {0}, and limn An ,
G
then 0+
() = 0. Hence, since N 2 and therefore
Wa(N )
= 1,
G
Wa(N ) 0+
= 0 Wa(N ) lim An ;
n
(N )
G
0+
= 0 {0, 1}, we have proved the equivalence
506
In view of the preceding paragraph, the proof of Wieners test reduces to the
problem of showing that
Wa(N )
(11.4.13)
lim An > 0
Wa(N ) An = .
X
1
1
.
P An = = P lim An
n
4C
Proof: Because
X
P An = =
P And+k = for some 0 k < d,
n=1
n=1
whereas
P
lim An P lim And+k
n
1
for all n Z+ . In particular, these assumptions
I will assume that P(An ) 4C
mean that, for each m Z+ , we can find an nm > m such that
sm
nm
X
3 1
,C .
P A` 4C
`=m
Pn
Indeed, simply take nm to be the largest n > m for which `=m P A`
At the same time, by an easy induction argument on n > m, one has that
!
n
n
[
X
1 X
P Ak A`
P
A`
P A`
2
`=m
`=m
mk6=`n
1
C.
11.4 Capacity
for all n > m 1, and therefore
!
[
P
A` P
`=m
n
m
[
!
A`
sm
`=m
507
1
Cs2m
4C
2
for all m Z+ .
Proof of Wieners Test: All that remains is to check that the sets An
(N )
appearing in (11.4.13) satisfy the hypothesis in Lemma 11.4.14 when P = Wa .
To this end, set
n
o
n () = inf t (0, ) : (t) Kn .
Clearly, An = n < B(a,1) , and so
Wa(N ) Am An Wa(N ) m < n < B(a,1) + Wa(N ) n < m < B(a,1)
for all m Z+ and n 6= m. But, by the Markov property,
(N )
Wa(N ) m < n < B(a,1) EWa pn (m ) , m () < B(a,1) ()
(m, n)pm (a),
where I have introduced the notation (m, n) maxxKm pn (x). Finally, beB(a,1)
g B(a,1) (x,y)
g B(a,1) (a,y)
as the amount of heat that flows into K during [0, t] from outside.
3
See Electrostatic capacity, heat flow, and Brownian motion, in Z. Wahrsh. Gebiete. 3. Recently, M. Van den Burg has written several papers in which he greatly refines Spitzers result.
508
we know that t
EK (t) is a bounded, non-negative, continuous, non-decreasing
function.
I next observe that, for any 0 h < t,
Z
EK (t) EK (t h) =
Wx(N ) t h < K t dx.
RN
To see this, notice that there would be nothing to do if the integral were over
(N )
K{. On the other hand, by part (ii) of Exercise 10.2.19, Wx (K > 0) = 0
Lebesgue-almost everywhere on K, and so the integral over K does not contribute anything.
I now want to replace the preceding by
Z
h
(*)
EK (t) EK (t h) =
Wy(N ) K h and K
> t dy,
RN
where
h
K
() inf s (h, ) : (s) K
is the first entrance time into K after time h. To prove (*), set
(x,y)
(s) =
s
ts
x + t (s) + y,
t
t
s [0, t],
Wx(N ) t h < K t
Z
(x,y)
=
W (N ) t h < K t
t g (N ) (t, y x) dy
N
ZR
(y,x)
(y,x)
h
=
W (N ) K t
h and K
t
> t g (N ) (t, y x) dy,
RN
11.4 Capacity
509
1
K (h)
= lim
h&0 h
h&0
h
K (1) = lim
h
Wy(N ) K h & K
= dy
B(0,R)
g (N ) (h, y )pR
K
K () d.
RN
Finally, combine these with Theorem 11.3.13 to arrive at K (1) = Cap K; RN .
To complete the proof, set ]t[= t btc and write
[t]
EK (t) = EK
X
]t[ +
EK ]t[ +n EK ]t[ +n 1 .
n=1
Using this together with K (h) = hCap(K; G), one obtains the desired result.
The next two computations provide asymptotic formulas as t % for the
(N )
quantity Wx K (t, ) .
Theorem 11.4.17.4 If N 3 and K RN , then, as t % ,
pK (t, x)
Wx(N )
N
2Cap(K; RN ) 1 pR
K (x) 1 N
t 2
(t, )
N
(2) 2 (N 2)
This result was conjectured by Kac and first proved by his student A. Joffe. However, I will
follow the argument given by F. Spitzer in the article cited above.
510
Proof: Without loss in generality (cf. Corollary 11.4.4), I will assume that
N
K{
Cap(K; RN ) > 0. Next, set pK (x) = pR
(t, x, y), and
K (x) and pK (t, x, y) = p
note that, by the Markov property,
Z
pK (t, x) =
lim sup t
N
2
t xRN
Z
p
(t,
x)
p
(y)
p
(t,
x,
y)
dy
K
=0
K
K
|y|R
for every R > 0 with K B(0, R). At the same time, because
Z
g R (x, y) R
K (dx),
pK (y) =
K
it is clear that
lim |y|N 2 pK (y) =
|y|
2Cap(K; RN )
.
(N 2)N 1
N Z
p
(t,
x,
y)
2Cap(K;
R
)
K
dy
lim sup t 1 pK (t, x)
=0
N
2
t xRN
(N 2)N 1 |y|R |y|
N
2
for each R (0, ) with K B(0, R), and what we must still prove is that
(*)
N Z
N 1 (N )
pK (t, x, y)
2 1
W
(
=
)
dy
lim sup t
=0
K
N
x
N
2
t |x|r
|y|
(2) 2
|y|R
h
i
(N )
pK (t, x, y)
Wx
dy
=
q(t,
x)
E
q
t
,
(
)
,
<
t
,
K
K
K
|y|N 2
where
Z
q(t, x)
|y|R
g (N ) (t, y x)
dy
|y|N 2
11.4 Capacity
511
After changing to polar coordinates and making a change of variables, one can
easily check that, for each T [0, ),
N
N 1
lim sup t 2 1 q(t s, x)
N
t 0<sT
(2) 2
|x|r
= 0.
N 1 (N )
pK (t, x, y)
(K = )
dy
N Wx
N 2
|y|
(2) 2
|y|R
!
i
h N
(N )
N
N 1
N 1
Wx
1
2 1 q t
,
T
,
(
)
E
t
= t 2 q(t, x)
K
K
K
N
N
(2) 2
(2) 2
i
h N
(N )
N 1 (N )
K (T, ) ,
EWx t 2 1 q t K , (K ) , K (T, t) +
N Wx
(2) 2
N
2
then it becomes clear that (*) will follow once we check that
(**)
lim
lim sup Wx(N ) K (T, ) = 0 and
T xRN
h
i
(N )
N
sup t 2 1 EWx q t K , (K ) , K (T, t) = 0.
T t>T
xRN
To check the first part of (**), note that, by the Markov property,
Wx(N ) K (T, T + 1] =
pK (T, x, y)Wy(N ) K 1 dy
K{
N
2
(2T )
RN
N
Wy(N ) K 1 dy CT 2 ,
X
(T, )
Wx(N ) K (T + n, T + n + 1] ,
n=0
(N )
we see that, as T , Wx K (T, ) 0 uniformly with respect to
x RN .
To handle the second part of (**), note that there is a constant A (0, )
for which
N
q(t, x) A (t 1)1 2 , (t, x) (0, ) K,
512
and therefore
N
(N )
t 2 1 EWx
q t K , (K ) , K (T, t)
N
1
2
Wx(N ) K [t] 1, t
At
[t]1
N
(t `)1 2 Wx(N ) K (` 1, `]
`=[T ]
[t]1
ACt
N
2
([t] 1)
N
2
+ ACt
N
2
(t `)1 2 (` 1) 2 ,
`=[T ]
where the C is the same as the one that appeared in the derivation of the first
part of (**). Thus, everything comes down to verifying that
N
lim sup n 2 1
m n>m
n1
X
(n `)1 2 ` 2 = 0.
`=m
(n `)1 2 ` 2
and
(n `)1 2 ` 2
(1m )n`n
m`(1m )n
n 2 1
n1
X
(n `)1 2 ` 2 Bm .
`=m
for each x R2 \ K.
This theorem is taken from G. Hunts article Some theorems concerning Brownian motion,
T.A.M.S. 81, pp. 294319 (1956). With breathtaking rapidity, it was followed by the articles
referred to in 11.1.4.
11.4 Capacity
513
Proof: The strategy of Hunts proof is to deal with the Laplace transform
Z
(2)
et W (2) K > t dt = 1 1 EWx eK ,
0
show that
(2)
log 1
1 EWx eK = hK (x),
&0 2
(*)
lim
Writing
Z
2f () =
0
1
1
t exp t t1 dt +
Z
+
t1 et dt,
t1 et exp t1 1 dt
log
+ + o(1)
as & 0,
514
where
=
et log t dt.
as & 0.
(2)
1
1
log |y x| + EWx log |y (K )|, K <
(2)
log 1
1 EWx eK + o(1)
+
2
lim
Wx
(2)
Wx
K > t
RN
Further, say that BRN has capacity zero if there is no tame M(RN )
for which () > 0.
(i) If K RN , show that K has capacity 0 if and only if Cap K; B(0, R) = 0
for some R > 0 with K B(0, R). Further, show that if K has capacity 0, G
is open with K G, and either N 3 or (11.1.20) holds, then Cap(K; G) = 0.
(ii) If BRN , show that has capacity 0 if and only if every compact K
has capacity 0.
(iii) For any open G RN , show that G \ reg G has capacity 0.
515
G
N
E G (K1 K2 + K1 K2 , K1 K2 + K1 K2 ) E G (K1 K2 + K1 K2 , K1 + K2 ).
Next, apply (v) of the preceding exercise to see that
E G (K1 K2 +K1 K2 , K1 K2 +K1 K2 ) = Cap(K1 K2 ; G)+3Cap(K1 K2 ; G)
and
E G (K1 K2 +K1 K2 , K1 +K2 ) = Cap(K1 ; G)+Cap(K2 ; G)+2Cap(K1 K2 ; G),
and conclude that Cap( ; G) satisfies the strong sub-additivity property
Cap(K1 K2 ; G) + Cap(K1 K2 ; G) Cap(K1 ; G) + Cap(K2 ; G).
What Choquet showed is that a non-negative set function defined for compact
subsets of G and satisfying the monotonicity property in (i), the monotone
convergence property in (ii), and the strong subadditivity property in (iii) admits
a unique extension to BG in such a way that these properties persist. In the
articles alluded to earlier, Hunt used Choquets result to show that the first
positive entrance into a Borel set is measurable.
1
Notation
General
Description
Notation
ab&ab
a+ & a
f S
k ku
kk[a,b]
See
val [a, b]
Variation norm of the path [a, b]
(4.1.2)
(1.3.20)
N 1
(2.1.13)
N 1
var[a,b] ()
(t)
A()
3.1
1.1
1A
BE (a, r)
B (E; R)
K E
517
Notation
518
SN 1
Q
Z & Z+
C(RN )
RN
9.3
Cb (E; R)
R.
The space of continuous, R-valued functions having com-
Cc (G; R)
C 1,2 (R RN ; R)
D(RN )
4.1.1
H ( RN )
8.1.2
( RN )
The Lebesgue space of E-valued functions f for which
Lp (; E)
kf kpE is -integrable
The space of Borel probability measures on E
M1 (E)
9.1.2
M(E)
sures on E
S (RN ; R) or S (RN ; C)
RN
Measure Theoretic
BE
B(E; R)
E [X, A]
on A. Equivalent to
X d. When A is unspecified, it
Notation
A
E [X | F ]
f
519
the -algebra F
The Fourier transform of the function f
2.3.1
f g
h, i
2.1
10.1
g (N ) (t, x)
Wiener Measure
Gaussian or normal distribution with mean m and co-
m,C
n =
2.3.1
variance C
The Fourier of the measure
2.3.1
N (m, C)
({Xi : i I})
Fi
9.1.2
med(Y )
Chap. III
1.4
2.3.1
(1.1.16)
iI
Fi
iI
7.1.4
10.2.1
8.1.1
10.1.1
W (N )
(N )
Wx
(H, E, WH )
8.2.2
Notation
520
Potential Theoretic
E(G)
g G (x, y)
GG
pG (t, x, y)
11.3.1
11.2
(11.3.1)
10.3.1
Index
absolutely monotone, 19
absolutely pure jump path, 158
abstract Wiener space, 309
orthogonal invariance, 328
ergodicity, 329
adapted, 266
-algebra
atom in, 13
tail, 2
trivial, 2
approximate identity, 16
a.e. convergence of, 241
Arcsine Law, 407
a characterization of, 415
for random variables, 409
asymptotic, 32
atom, 13
Azemas Inequality, 264
B
Bachelier, 188
barrier function, 423
Beckner's inequality, 108
Bernoulli multiplier, 101
Bernoulli random variables, 5
Bernstein polynomial, 17
Berry–Esseen Theorem, 77
Bessel operator, 350
Beta function, 138
Blumenthal's 0–1 Law, 426
Bochner's Theorem, 119
Borel measurable linear maps are continuous, 314
Borel–Cantelli Lemma
extended version of, 506
martingale extension of, 229
original version, 3
Brownian motion, 177
Erdős–Kac Theorem, 399
Hölder continuity, 183
in a Banach space, 359
C
Calderón–Zygmund Decomposition
Gundy's, for martingales, 227
Cameron–Martin formula, 312
Cameron–Martin space, 305
classical, 305
in general, 310
capacitory distribution, 499
Chung's representation of, 500
capacitory potential, 497, 499
capacitory distribution, 499
capacity, 499
monotone continuity, 502
capacity zero, 514
Cauchy distribution, 149
Cauchy initial value problem, 400
centered Gaussian measure, 299
non-degenerate, 306
excessive function (continued)
charge determined by, 494
Riesz Decomposition of, 492
exchangeable random variables, 220
Strong Law for, 220
exponential random variable, 161
extended stopping time, 278
F
Fernique's Theorem, 306
application to functional analysis, 314
Feynman's representation, 303
Feynman–Kac
formula, 403
heat kernel, 437
fibering a measure, 389
first entrance time, asymptotics of distribution
N = 2, 512
N 3, 509
first exit time, 419
fixed points of T , 92
Fourier transform, 82
Beckner's inequality for, 108
diagonalized by Hermite functions, 100
for measure on Banach space, 301
inversion formula, 98, 112
of a function, 82
of a measure, 82
operator, 100
Parseval's Identity for, 112
free fields
Gaussian, 343
ergodicity, 358
existence of, 352
function
characteristic, 82
distribution, 7
error, 72
Euler's Beta, 138
Euler's Gamma, 32
excessive, 488
Fourier transform of, 82
Hermite, 100
indicator, 4
moment generating, 23
logarithmic, 25
normalized Hermite, 112
probability generating, 19
progressively measurable, 266
Rademacher, 5
rapidly decreasing, 82
tempered, 97
G
Kolmogorov's
continuity criterion, 182
Extension or Consistency Theorem, 384
Inequality, 36
Strong Law, 38
0–1 Law, 2
Kronecker's Lemma, 37
L
λ-system, 8
Laplace transform inversion formula, 21
large deviations estimates, 28
Law of Large Numbers
Strong
in Banach space, 241, 256, 384
for empirical distribution, 384
for exchangeable random variables, 220
Kolmogorovs, 38
Weak, 16
refinement, 20, 44, 45
Law of the Iterated Logarithm
converse, 56
proof of, 54
statement, 49
Strassen's Version, 340, 366
Lebesgue's Differentiation Theorem, 237
Lévy measure, 128
Itô map for, 390
Lévy operator, 268
Lévy process, 152
reflection, 292
Lévy system, 134
Lévy's Continuity Theorem, 118
second version, 120
Lévy–Cramér Theorem, 66
Lévy–Khinchine formula, 136
limit superior of sets, 2
Lindeberg's Theorem, 61
Lindeberg–Feller Theorem, 62
Feller's part, 90
Liouville Theorem, 472
locally μ-integrable, 199
Logarithmic Sobolev Inequality, 113
for Bernoulli, 113
for Gaussian, 114, 356
lowering operator, 97
M
marginal distribution, 83
Markov property, 417
martingale, 205
application to Fourier series, 263
continuous parameter, 267
complex, 267
Gundy's decomposition of, 227
Hahn decomposition of, 227
reversed, 217
Banach-valued case, 241
on σ-finite measure space, 233
martingale convergence
continuous parameter, 271
Hilbert-valued case, 243
Marcinkiewicz's Theorem, 207
preliminary version for Banach space, 239
second proof, 226
third proof, 227
via upcrossing inequality, 214
maximal function
Hardy–Littlewood, 235
Hardy–Littlewood inequality, 236
maximum principle of Phragmén–Lindelöf, 474
Maxwell distribution for ideal gas, 70
mean value
Banach space case, 199
vector-valued case, 84
measure
invariant, 112
locally finite, 63
non-atomic, 381
product, 10
pushforward of μ under Φ, 12
measure preserving, 244
measures
consistent family, 383
tight, 376, 382
median, 39
variational characterization, 43
Mehler kernel, 98
minimum principle, 130
strong, 405
weak, 404
moment estimate for sums of independent
random variables, 94
moment generating function, 23
logarithmic, 25
multiplier
Bernoulli, 101
Hermite, 98
N
stochastic process (continued)
right-continuous, 266
state of, 152
stochastic continuity, 189
stopping time, 212
continuous parameter, 272
discrete case, 212
extended, 278
old definition, 280
optional, 280
Stopping Time Theorem
Doob's, continuous parameter, 275
Doob's, discrete parameter, 213
Hunt's, continuous parameter, 275
Hunt's, discrete parameter, 213
Strassen's Theorem, 340
Brownian formulation of, 363
Strong Law of Large Numbers, 23
for Brownian motion, 188
for empirical distribution, 384
in Banach space, 241, 256, 384
Kolmogorov's, 38
strong Markov property, 417
Strong Minimum Principle, 405
strong topology on M1 (E), 369
not metrizable, 381
sub-Gaussian random variables, moment
estimates, 93
submartingale, 205
continuous parameter, 267
Doob's Decomposition, 213
Doob's Inequality
continuous parameter, 270
discrete parameter, 206
Doob's Upcrossing Inequality, 214
reversed, 217
σ-finite measure space, 233
stopping time theorem
Doob's
discrete parameter, 212
continuous parameter, 275
Hunt's
discrete parameter, 213
continuous parameter, 275
subordination, 148
symmetric difference of sets, 246
symmetric random variable, 44
moment relations, 45
T
tail σ-algebra, 2
and exchangeability, 220
ergodicity of, 256
tempered, 97
tempered distribution, 350
tight, 376, 382
for finite measures, 382
time reversal, 335
time-shift map, 416
Tonelli's Theorem, 4
transform
Fourier, see Fourier transform
Laplace, 21
Legendre, 26
transformation, measure preserving, 244
transient, 414
transition probability, 112
U
uniform norm ‖·‖u, 17
uniform topology on M1 (E), 367
uniformly distributed, 6
uniformly integrable, 15
unit exponential random variable, 161
V
variance, 15
variation norm, 368
W
Walsh functions, 264
weak convergence, 116
equivalent formulations, 372
principle of accompanying laws, 380
Weak Law of Large Numbers, 16
Weak Minimum Principle, 404
weak topology on M1 (E), 370
completeness, 377
Prohorov metric for, 379
separable, 376, 382
weak-type inequality, 207
Weierstrass's Approximation Theorem, 17
Wiener measure, 301
Arcsine law, 407
Feynmans representation, 303
Markov property, 417
translation by x, 401
Wiener series, 318
classical case, 334
Wiener's test for regularity, 504