Basic tools
Modern probability theory has become the natural language for formulating
quantitative models of financial markets. This chapter presents, in the form
of a crash course, some of its tools and concepts that will be important in the
sequel. Although the reader is assumed to have some background knowledge
on random variables and probability, there are some tricky notions that we
have found useful to recall. Instead of giving a catalog of definitions and
theorems, which can be found elsewhere, we have tried to justify why these
definitions are relevant and to promote an intuitive understanding of the main
results.
Therefore it is natural to require that for any measurable set A its complement A^c is also measurable.
These remarks can be summarized by saying that the domain of definition of a measure on E is a collection ℰ of subsets of E which
• contains the empty set: ∅ ∈ ℰ;
• is stable under complements: A ∈ ℰ ⇒ A^c ∈ ℰ;
• is stable under countable disjoint unions:

$$A_n \in \mathcal{E},\ (A_n)_{n\ge 1}\ \text{disjoint} \ \Rightarrow\ \bigcup_{n\ge 1} A_n \in \mathcal{E}. \qquad (2.2)$$

Such a collection of subsets is called a σ-algebra.
PROPOSITION 2.1
Given a collection A of subsets of E, there exists a unique σ-algebra, denoted σ(A), with the following property: if a σ-algebra F contains A, then σ(A) ⊂ F. Thus σ(A) is the smallest σ-algebra containing A; it is called the σ-algebra generated by A.
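As a simple illustration (a standard example, not spelled out in the text), the σ-algebra generated by a single subset can be written down explicitly:

```latex
% Sigma-algebra generated by a single subset A of E:
% it must contain A, hence A^c (complements) and \emptyset, E (unions),
% and these four sets already form a sigma-algebra.
\sigma(\{A\}) = \{\, \emptyset,\ A,\ A^{c},\ E \,\}
```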
An important example is the case where E has a topology, i.e., the notion of
open subset is defined on E. This is the case of E = Rd and more generally,
of function spaces for which a notion of convergence is available. The σ-algebra generated by all open subsets is called the Borel σ-algebra and we will denote it by B(E) or simply B. An element B ∈ B is called a Borel set.
Obviously any open or closed set is a Borel set but a Borel set can be horribly
complicated. Defining measures on B will ensure that all open and closed sets
are measurable. Unless otherwise specified, we will systematically consider
the Borel σ-algebra in all the examples of measures encountered in the sequel.
Having defined σ-algebras, we are now ready to properly define a measure: a (positive) measure on (E, ℰ) is a map

$$\mu : \mathcal{E} \to [0, \infty], \qquad A \mapsto \mu(A)$$

such that
1. µ(∅) = 0.
2. For any sequence (A_n)_{n≥1} of disjoint sets in ℰ, $\mu\big(\bigcup_{n\ge1} A_n\big) = \sum_{n\ge1} \mu(A_n)$ (σ-additivity).

An important example is obtained by specifying a positive measurable function ρ on R^d and setting

$$\mu : \mathcal{B}(\mathbb{R}^d) \to [0, \infty], \qquad A \mapsto \mu(A) = \int_A \rho(x)\,dx = \int 1_A(x)\,\rho(x)\,dx. \qquad (2.4)$$
The function ρ is called the density of µ with respect to the Lebesgue measure
λ. More generally, Equation (2.4) defines a positive measure for every
positive measurable function ρ (measurable functions are defined in the next
section).
Dirac measures are other important examples of measures. The Dirac measure δ_x associated to a point x ∈ E is defined as follows: δ_x(A) = 1 if x ∈ A and δ_x(A) = 0 if x ∉ A. More generally one can consider a sum of such Dirac measures. Given a countable set of points X = {x_i, i = 0, 1, 2, ...} ⊂ E, the counting measure $\mu_X = \sum_i \delta_{x_i}$ is defined in the following way: for any A ⊂ E, µ_X(A) counts the number of points x_i in A:

$$\mu_X(A) = \#\{i,\ x_i \in A\} = \sum_{i} 1_{x_i \in A}. \qquad (2.5)$$
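A counting measure is evaluated by literally counting points; a small sketch (the point set and the set A below are hypothetical values chosen for illustration):

```python
# Counting measure mu_X = sum of Dirac masses at points x_i:
# mu_X(A) = number of points x_i that belong to A.
def counting_measure(points, indicator):
    """Evaluate mu_X(A), where A is given by its indicator function."""
    return sum(1 for x in points if indicator(x))

X = [0.3, 1.7, 2.5, 2.9, 4.2]        # hypothetical point set
A = lambda x: 1.0 <= x < 3.0          # A = [1, 3)
print(counting_measure(X, A))         # -> 3  (the points 1.7, 2.5, 2.9)
```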
A measurable set may have zero measure without being empty. Going
back to the analogy with length and volume, we can note that the area of a
line segment is zero, the “length” of a point is also defined to be zero. The
existence of nontrivial sets of measure zero is the origin of many subtleties in
measure theory. If A is a measurable set with µ(A) = 0, it is then natural to set µ(B) = 0 for any B ⊂ A. Such sets, namely subsets of sets of measure zero, are called null sets. If the null sets are not already included in ℰ, one can always enlarge ℰ to include them all: the new σ-algebra is then said to be complete.
A measure µ is said to be integer valued if for any measurable set A, µ(A)
is a (positive) integer. An example of an integer valued measure is a Dirac
measure. More generally, any counting measure is an integer valued measure.
A measure µ₀ is said to be diffuse (or atomless) if it has no atoms:

$$\forall x \in E, \quad \mu_0(\{x\}) = 0. \qquad (2.7)$$

Any measure can be decomposed into a diffuse part and a countable sum of weighted Dirac masses:

$$\mu = \mu_0 + \sum_{j\ge1} b_j\,\delta_{x_j}. \qquad (2.8)$$
A function f : E → F is called measurable if, for every measurable set A ⊂ F, the set

$$f^{-1}(A) = \{x \in E,\ f(x) \in A\}$$

is a measurable subset of E.
As noted above we will often consider measures on sets which already have a
metric or topological structure such as Rd equipped with the Euclidean metric.
On such spaces the notion of continuity for functions is well defined and one
can then ask whether there is a relation between continuity and measurability
for a function f : E → F . In general, there is no relation between these
notions. However it is desirable that the notion of measurability be defined
such that all continuous functions be measurable. This is automatically true if the Borel σ-algebra is chosen (since f is continuous if and only if f⁻¹(A) is open for every open set A), which explains why we will choose it all the time. Hence in the following, whenever the notion of continuous function makes sense, all continuous functions will be measurable.
Simple examples of measurable functions f : E → R are functions of the form

$$f = \sum_{j=1}^{n} c_j\, 1_{A_j}, \qquad (2.9)$$

where (A_j) are measurable sets and c_j ∈ R. These functions are sometimes called simple functions. The integral of such a simple function with respect to a measure µ is defined as $\mu(f) = \sum_{j=1}^{n} c_j\,\mu(A_j)$.
If the measure µ has the decomposition (2.8) then the integral in (2.12) can be interpreted as

$$\mu(f) = \int f(x)\,\mu_0(dx) + \sum_{j\ge1} b_j\, f(x_j). \qquad (2.13)$$
A probability measure on (Ω, F) is a measure P with total mass P(Ω) = 1:

$$P : \mathcal{F} \to [0, 1], \qquad A \mapsto P(A).$$
A random variable X taking values in E is a measurable map

$$X : \Omega \to E,$$

which associates to each scenario ω ∈ Ω an outcome X(ω) ∈ E.
If µ_X = µ_Y then X and Y are said to be identical in law and we write $X \overset{d}{=} Y$. Obviously if X = Y almost surely, they are identical in law.
X : Ω → E is called a real-valued random variable when E ⊂ R. As in Section 2.1.2, one can define the integral of a positive random variable X with respect to P: this quantity, called the expectation of X with respect to P and denoted by $E^P[X] = \int_\Omega X(\omega)\,dP(\omega)$, is either a positive number or +∞. If $E^P[X] < \infty$ then X is said to be P-integrable. By decomposing any real-valued random variable Y into its positive and negative parts Y = Y₊ − Y₋, one sees that if $E^P[|Y|] < \infty$ then Y₋, Y₊ are integrable and the expectation E[Y] = E[Y₊] − E[Y₋] is well-defined. The set of (real-valued) random variables Y verifying $\|Y\|_1 = E^P[|Y|] < \infty$ is denoted by L¹(Ω, P).
It is sometimes useful to allow “infinite” values for positive random vari-
ables, i.e., choose E = [0, ∞[∪{+∞}. Of course if Y ∈ L1 (Ω, P) then Y is
almost surely finite.
The moments of X, when they exist, can be recovered from the derivatives at zero of the moment generating function M_X:

$$m_n = \frac{\partial^n M_X}{\partial u^n}(0). \qquad (2.27)$$

Similarly, the cumulants of X are defined from the derivatives at zero of the cumulant generating function $\Psi_X = \log \Phi_X$:

$$c_n(X) = \frac{1}{i^n}\,\frac{\partial^n \Psi_X}{\partial u^n}(0). \qquad (2.29)$$

By expanding the exponential function at z = 0 and using (2.25), the n-th cumulant can be expressed as a polynomial function of the moments m_k(X), k = 1, …, n.
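For instance, carrying out this expansion for n = 1, 2, 3 yields the standard relations (a routine computation, stated here for concreteness):

```latex
% First three cumulants as polynomials in the moments m_k = E[X^k]:
c_1(X) = m_1,                          % the mean
c_2(X) = m_2 - m_1^2,                  % the variance
c_3(X) = m_3 - 3 m_1 m_2 + 2 m_1^3.    % the third central moment
```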
Some common probability distributions and their characteristic functions:

Distribution | Density µ(x) | Characteristic function Φ(z)
Exponential | $\alpha e^{-\alpha x}\,1_{x\ge 0}$ | $\dfrac{\alpha}{\alpha - iz}$
Two-sided exponential | $\dfrac{\alpha}{2}\,e^{-\alpha|x|}$ | $\dfrac{\alpha^2}{\alpha^2 + z^2}$
Gamma | $\dfrac{\lambda^c}{\Gamma(c)}\,x^{c-1} e^{-\lambda x}\,1_{x\ge 0}$ | $\dfrac{1}{(1 - i\lambda^{-1} z)^c}$
Gaussian | $\dfrac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\Big(-\dfrac{(x-\gamma)^2}{2\sigma^2}\Big)$ | $\exp\!\Big(-\dfrac{\sigma^2 z^2}{2} + i\gamma z\Big)$
Cauchy | $\dfrac{c}{\pi[(x-\gamma)^2 + c^2]}$ | $\exp(-c|z| + i\gamma z)$
Symmetric α-stable | not known in closed form | $\exp(-c|z|^\alpha)$
Student t | $\dfrac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\Big(1 + \dfrac{x^2}{n}\Big)^{-(n+1)/2}$ | not known in closed form
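Closed forms like these are easy to sanity-check by Monte Carlo: the characteristic function is just E[e^{izX}], so an empirical average of e^{izX} over simulated draws should approach the formula. A minimal sketch for the exponential case (the parameter values are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, z = 2.0, 1.5                                # illustrative parameter values
x = rng.exponential(scale=1/alpha, size=200_000)   # Exp(alpha) samples

# Empirical characteristic function E[exp(izX)] vs the closed form alpha/(alpha - iz)
phi_mc = np.mean(np.exp(1j * z * x))
phi_exact = alpha / (alpha - 1j * z)
print(phi_mc, phi_exact)                           # the two complex numbers should be close
```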
A sequence (X_n)_{n≥1} of random variables converges almost surely to X if

$$P\big(\lim_{n\to\infty} X_n = X\big) = 1. \qquad (2.37)$$
For the above definition to make sense the variables (Xn )n≥1 have to be
defined on the same probability space (Ω, F, P). Note that almost sure con-
vergence does not imply convergence of moments: if Xn → X almost surely,
E[Xnk ] may be defined for all n ≥ 1 but have no limit as n → ∞.
(X_n) is said to converge in probability to X if, for each ε > 0, P(|X_n − X| ≥ ε) → 0 as n → ∞; in this case we write

$$X_n \overset{P}{\longrightarrow} X. \qquad (2.39)$$
Almost sure convergence implies convergence in probability but the two no-
tions are not equivalent. Also note that convergence in probability requires
that the variables (Xn )n≥1 be defined on the same probability space (Ω, F, P).
PROPOSITION 2.5
If (X_n) converges in distribution to X, then for any set A with boundary ∂A,

$$\mu(\partial A) = 0 \ \Rightarrow\ P(X_n \in A) = \mu_n(A) \underset{n\to\infty}{\longrightarrow} P(X \in A) = \mu(A). \qquad (2.43)$$
PROPOSITION 2.6
(X_n)_{n≥1} converges in distribution to X if and only if, for every z ∈ R^d, $\Phi_{X_n}(z) \to \Phi_X(z)$ as n → ∞.
Note however that pointwise convergence of (ΦXn ) does not imply the ex-
istence of a weak limit for (Xn ) since the pointwise limit of (ΦXn ) is not
necessarily a characteristic function.
Convergence in distribution does not necessarily entail convergence of mo-
ments: one cannot choose f to be a polynomial in (2.41) since polynomials
are not bounded. In fact the moments of Xn , X need not even exist.
When (Xn ) and X are defined on the same probability space, convergence
in distribution is the weakest notion of convergence among the above; almost
sure convergence entails convergence in probability, which entails convergence
in distribution.
4 The obscure word “cadlag” is a French acronym for “continu à droite, limite à gauche”
which simply means “right-continuous with left limits.” While most books use this terminology, some authors use the (unpronounceable) English acronym “rcll.”
Of course, any continuous function is cadlag but cadlag functions can have discontinuities. If t is a discontinuity point we denote by

$$f(t-) = \lim_{s\to t,\ s<t} f(s), \qquad \Delta f(t) = f(t) - f(t-),$$

the left limit of f at t and the jump of f at t. A simple example of a cadlag function is obtained by adding to a continuous function g a piecewise constant part with jump times t₀ < t₁ < ⋯ < t_n:

$$f(t) = g(t) + \sum_{i=0}^{n-1} f_i\, 1_{[t_i, t_{i+1}[}(t). \qquad (2.48)$$
One can think of FtX as containing all the information one can extract
from having observed the path of X between 0 and t. All explicit examples
of information flow we will use will correspond to the history of a set of asset
prices. Notice that FtX has been completed by adding the null sets; all the
null sets are stuffed into F0 . This means that if a certain evolution for X
between 0 and T is deemed impossible, its impossibility is already known at
t = 0.
A random time T is called a nonanticipating random time or stopping time if, at any time t, the information F_t allows one to decide whether T has occurred or not:

$$\forall t \ge 0, \quad \{T \le t\} \in \mathcal{F}_t.$$

The process X stopped at a stopping time τ is then defined by

$$X_{\tau\wedge t} = X_t \ \text{ if } t < \tau, \qquad X_{\tau\wedge t} = X_\tau \ \text{ if } t \ge \tau. \qquad (2.50)$$

A typical example of a stopping time is the first time a process X exceeds a given level a, T_a = inf{t ≥ 0, X_t > a}: it is the hitting time T_A associated with the open interval A = ]a, ∞[.
An example of a random time which is not a stopping time is the first instant t ∈ [0, T] when X reaches its maximum:

$$T_{\max} = \inf\{t \in [0,T],\ X_t = \sup_{s\in[0,T]} X_s\}.$$

Obviously, in order to know the value of the maximum one must first wait until T to observe the whole path on [0, T]. Therefore, given the information F_t at time t < T, one cannot decide whether T_max has occurred or not.
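The distinction is easy to see on a discretized path: a hitting time can be computed online, using only the values observed so far, whereas locating the maximum requires the whole path. A small illustration (the random-walk path below is an arbitrary stand-in for X):

```python
import numpy as np

rng = np.random.default_rng(1)
path = np.cumsum(rng.normal(size=1000))   # stand-in for an observed path of X

# Hitting time of ]a, inf[: decidable at each step from past observations only.
a = 5.0
hits = np.nonzero(path > a)[0]
T_a = hits[0] if hits.size else None      # first index with X_t > a (or never)

# Argmax: requires the whole path, hence not a stopping time.
T_max = int(np.argmax(path))
print(T_a, T_max)
```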
Given an information flow Ft and a nonanticipating random time τ , the
information set Fτ can be defined as the information obtained by observing
all nonanticipating (cadlag) processes at τ , i.e., the σ-algebra generated by
these observations: Fτ = σ(Xτ , X nonanticipating cadlag process). It can be
shown that this definition is equivalent to the following (see [110, p. 186] or
[324, Chapter 1]):
$$\mathcal{F}_\tau = \{A \in \mathcal{F},\ \forall t \in [0,T],\ A \cap \{\tau \le t\} \in \mathcal{F}_t\}. \qquad (2.54)$$
2.4.4 Martingales
Consider now a probability space (Ω, F, P) equipped with an information flow F_t. A cadlag, nonanticipating process (X_t)_{t∈[0,T]} is called a martingale if X_t is integrable for each t and

$$\forall t > s, \quad E[X_t \mid \mathcal{F}_s] = X_s.$$

By Doob's optional sampling theorem, this equality continues to hold when s and t are replaced by bounded nonanticipating random times; for a proof see [116] or [324, Section I.2]. In particular, a martingale stopped at a nonanticipating random time is still a martingale.
A process (X_t)_{t∈[0,T]} is called a local martingale if there exists a sequence of stopping times (T_n) with T_n → ∞ a.s. such that (X_{t∧T_n})_{t∈[0,T]} is a martingale. Thus a local martingale behaves like a martingale up to some stopping time.⁵

5 Note that here we define processes, and martingales in particular, on a finite time interval [0, T].
With this definition, any nonanticipating cadlag process is optional but the
sample paths of an optional process need not be cadlag in general: nonan-
ticipating cadlag processes “generate” optional processes the same way that
continuous functions “generate” measurable functions.
The distinction made in Section 2.4.1 between left and right continuity and
its interpretation in terms of sudden vs. predictable jumps motivates the
definition of another σ-algebra on [0, T ] × Ω:
While the name “predictable” is justified given the discussion above, the name “optional” is less transparent; since we will not use this notion very often, we will stick to this terminology. Any left-continuous process is therefore predictable (by definition): this is intuitive, since if lim_{s→t, s<t} X_s = X_t then the value of X_t is “announced” by the values at preceding instants.
The distribution function F_Y(y) = 1 − e^{−λy} of an exponential random variable with parameter λ can be inverted explicitly:

$$\forall y \in [0,1[, \quad F_Y^{-1}(y) = -\frac{1}{\lambda}\,\ln(1-y). \qquad (2.59)$$
A simple consequence is that if U is uniformly distributed on [0, 1] then −(1/λ) ln U is exponentially distributed with parameter λ. This result will be used in Chapter 6 to simulate random variables with exponential distribution.
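A minimal sketch of this inverse-transform recipe (the rate λ and sample size are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 0.7                                # illustrative rate parameter
u = rng.uniform(size=100_000)
tau = -np.log(u) / lam                   # -ln(U)/lambda is Exp(lambda)

print(tau.mean(), 1 / lam)               # the sample mean should be close to 1/lambda
```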
The exponential distribution has the following important property: if T is an exponential random variable,

$$\forall t, s > 0, \quad P(T > t+s \mid T > t) = \frac{\int_{t+s}^{\infty} \lambda e^{-\lambda y}\,dy}{\int_{t}^{\infty} \lambda e^{-\lambda y}\,dy} = P(T > s). \qquad (2.60)$$
This is called the memoryless property. Conversely, the exponential distribution is the only distribution with this property:

$$\forall t, s > 0, \quad P(T > t+s \mid T > t) = P(T > s). \qquad (2.61)$$
PROOF Let g(t) = P(T > t). Using the Bayes rule we obtain that g is a multiplicative function:

$$\forall t, s > 0, \quad g(t+s) = P(T > t+s \mid T > t)\,P(T > t) = g(s)\,g(t).$$
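The proof breaks off here; the remaining step is classical and can be sketched as follows (assuming, as is automatic for a survival function, that g is right-continuous):

```latex
% A multiplicative, right-continuous g : (0,\infty) \to (0,1] must be exponential:
% g(t+s) = g(t) g(s) first gives g(q) = g(1)^q for all rational q > 0,
% and right-continuity extends this to every t > 0, hence
g(t) = e^{-\lambda t}, \qquad \lambda = -\ln g(1),
% so T is exponentially distributed with parameter \lambda.
```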
PROPOSITION 2.9
Let (τ_i, i = 1, …, n+1) be independent exponential random variables with parameter λ and define T_k = τ_1 + ⋯ + τ_k. Then
1. The law of (T_1, …, T_n) knowing T_{n+1} = t is D_n([0, t]).
2. The vector $\big(\tfrac{T_1}{T_{n+1}}, \ldots, \tfrac{T_n}{T_{n+1}}\big)$ is independent from T_{n+1} and has the law D_n([0, 1]).
PROPOSITION 2.10
Let U1 , . . . , Un be independent random variables, uniformly distributed on
[0, 1], U(1) ≤ · · · ≤ U(n) be the corresponding order statistics and V be an
independent random variable with a density given by (2.63). Then the vari-
ables
V U(1) , V (U(2) − U(1) ), V (U(3) − U(2) ), . . . , V (U(n) − U(n−1) )
form a sequence of independent exponential variables with parameter λ.
PROPOSITION 2.11
If (τ_i)_{i≥1} are independent exponential random variables with parameter λ then, for any t > 0, the random variable

$$N_t = \inf\Big\{n \ge 1,\ \sum_{i=1}^{n} \tau_i > t\Big\} - 1 \qquad (2.67)$$

follows a Poisson distribution with parameter λt:

$$\forall n \in \mathbb{N}, \quad P(N_t = n) = e^{-\lambda t}\,\frac{(\lambda t)^n}{n!}. \qquad (2.68)$$
PROOF Let $T_k = \sum_{i=1}^{k} \tau_i$ for all k. The density of (T_1, ..., T_k) is given by

$$\lambda^k e^{-\lambda t_k}\, 1_{0 < t_1 < \cdots < t_k}.$$
DEFINITION 2.17 Poisson process Let (τ_i)_{i≥1} be a sequence of independent exponential random variables with parameter λ and $T_n = \sum_{i=1}^{n} \tau_i$. The process (N_t, t ≥ 0) defined by

$$N_t = \sum_{n\ge1} 1_{t \ge T_n} \qquad (2.69)$$

is called a Poisson process with intensity λ.
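Definition 2.17 is directly constructive, so a sample path can be simulated by drawing the interarrival times; a minimal sketch (the intensity and horizon are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, horizon = 2.0, 10.0                  # illustrative intensity and horizon

# Jump times T_n: cumulative sums of iid Exp(lambda) interarrival times.
taus = rng.exponential(scale=1/lam, size=1000)
T = np.cumsum(taus)
T = T[T <= horizon]                       # keep the jump times in [0, horizon]

def N(t):
    """Poisson process N_t = #{n : T_n <= t}, cadlag by construction."""
    return int(np.searchsorted(T, t, side="right"))

print(N(5.0), lam * 5.0)                  # N_5 has mean lambda * 5
```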
PROPOSITION 2.12
Let (Nt )t≥0 be a Poisson process.
1. For any t > 0, Nt is almost surely finite.
2. For any ω, the sample path t ↦ N_t(ω) is piecewise constant and increases by jumps of size 1.
3. The sample paths t ↦ N_t are right continuous with left limits (cadlag).
4. For any t > 0, N_{t−} = N_t with probability 1.
5. (N_t) is continuous in probability:

$$\forall t > 0, \quad N_s \overset{P}{\underset{s\to t}{\longrightarrow}} N_t. \qquad (2.70)$$
PROOF
The joint law of the increments has a product form, which shows their independence. Moreover, each term in the product is recognized to be the density of a Poisson law with parameter λ(t_i − t_{i−1}), which shows the homogeneity of the increments.
9. The Markov property follows from the independence of increments, since N_t − N_s is independent of (N_u, u ≤ s).
Let us now comment on the properties (2), (3) and (4), which seem somewhat paradoxical: on one hand we assert that, with probability one, any sample path of the Poisson process is discontinuous (in fact it only moves by jumps) and, on the other hand, property (4) asserts that at any fixed time t the path is almost surely continuous. There is no contradiction: for each fixed t, the (countable) set of jump times almost surely does not contain t, so N_{t−} = N_t with probability 1 even though every path has jumps.
[FIGURE 2.1: Sample paths of a Poisson process (left) and of a compensated Poisson process (right).]
The right continuity (cadlag property) of the Poisson process is not really
a “property”: we have defined Nt in such a way that at a discontinuity point
Nt = Nt+ , i.e., we have chosen the right-continuous (cadlag) version of the
Poisson process. Another choice would have been the left-continuous one:
$$N^-_t = \sum_{n\ge1} 1_{t > T_n}. \qquad (2.75)$$

The difference between N⁻ and N is that the values of N⁻ in the near future are foreseeable since N^-_s → N^-_t as s → t, s < t, while the values of N are “unpredictable” given the past. The jumps of N are interpreted as sudden events,
which corresponds to our initial motivation for including jumps in models of
price dynamics. Hence we will always use the cadlag version (Nt )t≥0 of the
Poisson process. Following these remarks, for other examples of jump pro-
cess that we will encounter, we will always choose a right-continuous (cadlag)
version.
Poisson processes have some other useful properties which we mention here. First, a superposition of independent Poisson processes is again a Poisson process: if N¹ and N² are independent Poisson processes with intensities λ₁ and λ₂, then N¹ + N² is a Poisson process with intensity λ₁ + λ₂. Second, a Poisson process can be “thinned”: let N be a Poisson process with intensity λ and suppose that each jump of N is kept with probability p and deleted with probability 1 − p, independently of the other jumps; call X the resulting counting process. Then the process X is a Poisson process with intensity pλ. Another way to see this result is the following: if the arrival T_n of each event in the Poisson process N is marked with probability p, independently from event to event, then the process of marked events thus obtained is again a Poisson process whose intensity equals the intensity of N multiplied by the marking probability: λ_X = pλ.
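A quick numerical illustration of thinning (the parameter values are arbitrary): keep each jump with probability p and check that the empirical rate of kept events is close to pλ.

```python
import numpy as np

rng = np.random.default_rng(4)
lam, p, t = 3.0, 0.4, 50.0                # illustrative intensity, marking prob, horizon

jump_times = np.cumsum(rng.exponential(scale=1/lam, size=10_000))
jump_times = jump_times[jump_times <= t]  # Poisson(lam) jump times in [0, t]

kept = jump_times[rng.uniform(size=jump_times.size) < p]  # thinning
print(kept.size / t, p * lam)             # empirical rate vs p * lambda
```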
The compensated Poisson process is defined by Ñ_t = N_t − λt. Ñ_t follows a centered version of the Poisson law, with characteristic function

$$E[e^{iu\tilde N_t}] = \exp\big(\lambda t\,(e^{iu} - 1 - iu)\big).$$

(Ñ_t)_{t≥0} is called a compensated Poisson process and (the deterministic expression) (λt)_{t≥0} is called the compensator of (N_t)_{t≥0}: it is the quantity which has to be subtracted from N in order to obtain a martingale. Sample paths of the Poisson and compensated Poisson process are shown in Figure 2.1. Note that the compensated Poisson process is no longer integer valued: unlike the Poisson process, it is not a counting process.
Figure 2.2 compares Ñ_t/√λ with a standard Wiener process. The two graphs look similar and this is not a coincidence: when the intensity of its jumps increases, the (interpolated) compensated Poisson process converges in distribution to a Wiener process [153]:

$$\Big(\frac{\tilde N_t}{\sqrt{\lambda}}\Big)_{t\in[0,T]} \ \underset{\lambda\to\infty}{\Longrightarrow}\ (W_t)_{t\in[0,T]}. \qquad (2.81)$$
[FIGURE 2.2: A rescaled compensated Poisson process Ñ_t/√λ compared with a sample path of a Wiener process W_t.]
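The scaling in (2.81) is easy to check numerically: for large λ, a path of Ñ_t/√λ is statistically hard to distinguish from Brownian motion. A minimal sketch (λ, the horizon and the grid are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, T, n_grid = 200.0, 30.0, 3000        # large intensity, horizon, grid size

t = np.linspace(0, T, n_grid)
jumps = np.cumsum(rng.exponential(scale=1/lam, size=int(2 * lam * T)))
N = np.searchsorted(jumps, t, side="right")   # N_t evaluated on the grid

X = (N - lam * t) / np.sqrt(lam)          # rescaled compensated Poisson process
print(X[-1], np.sqrt(T))                  # one draw of X_T; its typical size is sqrt(T)
```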
More generally, given an increasing sequence of random times (T_n)_{n≥1} with P(T_n → ∞) = 1, one can define the associated counting process $X_t = \sum_{n\ge1} 1_{t \ge T_n}$: X_t is simply the number of random times {T_n, n ≥ 1} occurring in [0, t]. The condition P(T_n → ∞) = 1 guarantees that, with probability 1, X_t is well-defined (finite) for any t ≥ 0. Like the Poisson process, (X_t)_{t≥0} is a cadlag process with piecewise constant trajectories: its sample paths move by jumps of size +1.
If the random times (Tn ) are constructed as partial sums of a sequence
of i.i.d. exponential random variables, then X is a Poisson process. For a
general counting process, the sequence of random times (Tn ) can have any
distribution and dependence structure. The following lemma shows that the
only counting processes with independent stationary increments are Poisson
processes:
LEMMA 2.1
Let (Xt ) be a counting process with stationary independent increments. Then
(Xt ) is a Poisson process.
PROOF Fix u ∈ R and set f(t) = E[e^{iuX_t}]. The stationarity and independence of the increments imply that

$$M_t = \frac{e^{iuX_t}}{f(t)}$$

is a martingale.
This result shows both that the process (X_{Y_1+t} − X_{Y_1})_{t≥0} is independent from Y_1 (because we could have taken any number of increments) and that it has the same distribution as (X_t)_{t≥0}. This property allows us to conclude that Y_2 is independent from Y_1 and has an exponential distribution, that Y_3 is independent from Y_2 and Y_1 and has an exponential distribution, and so on: the result is obtained by induction.
To a Poisson process (N_t)_{t≥0} with jump times (T_n)_{n≥1} one can associate a measure M on [0, ∞[ by setting, for any measurable set A, M(ω, A) = #{n ≥ 1, T_n(ω) ∈ A}. Then M(ω, .) is a positive, integer valued measure and M(A) is finite with probability 1 for any bounded set A. Note that the measure M(ω, .) depends on ω: it is thus a random measure. The intensity λ of the Poisson process determines the average value of the random measure M: E[M(A)] = λ|A|, where |A| is the Lebesgue measure of A.
The properties of the Poisson process translate into the following properties for the measure M: for disjoint intervals [t₁, t₁′], …, [t_n, t_n′],
1. M([t_k, t_k′]) is the number of jumps of the Poisson process in [t_k, t_k′]: it is a Poisson random variable with parameter λ(t_k′ − t_k).
2. For two disjoint intervals j ≠ k, M([t_j, t_j′]) and M([t_k, t_k′]) are independent random variables.
The random measure M can also be viewed as the “derivative” of the Poisson
process. Recall that each trajectory t → Nt (ω) of a Poisson process is an
increasing step function. Hence its derivative (in the sense of distributions)
is a positive measure: in fact it is simply the superposition of Dirac masses
located at the jump times:
$$\frac{dN_t}{dt}(\omega) = M(\omega, \cdot) = \sum_{i\ge1} \delta_{T_i(\omega)}, \qquad\text{i.e.,}\quad N_t(\omega) = M(\omega, [0,t]). \qquad (2.85)$$
In the same way, one can associate a random measure to the compensated Poisson process Ñ_t, defined in Section 2.5.4, by:

$$\tilde M(\omega, A) = M(\omega, A) - \int_A \lambda\,dt = M(\omega, A) - \lambda|A|. \qquad (2.86)$$
M̃ (A) then verifies E[M̃ (A)] = 0 and Var[M̃ (A)] = λ|A|. Note that unlike
M , M̃ is neither integer valued (counting measure) nor positive: it is a signed
measure. M̃ is an example of a compensated random measure and the measure
A → λ|A| is called the compensator of M . Note that here the compensator is
none other than λ times the Lebesgue measure: λ|A| = E[M (A)] and M̃ is
the “centered” version of M .
This construction can be generalized in various ways, leading to the notions
of Poisson random measure, point process, and marked point process.
9 Note that in most texts M and N are denoted by the same letter, which can be quite confusing.
DEFINITION 2.18 Poisson random measure Given a measurable space (E, ℰ) and a positive measure µ on it, a Poisson random measure on E with intensity measure µ is an integer valued random measure

$$M : \Omega \times \mathcal{E} \to \mathbb{N}, \qquad (\omega, A) \mapsto M(\omega, A),$$

such that
1. For (almost) every ω ∈ Ω, M(ω, ·) is an integer valued measure on E.
2. For each measurable set A with µ(A) < ∞, M(A) is a Poisson random variable with parameter µ(A):

$$\forall k \in \mathbb{N}, \quad P(M(A) = k) = e^{-\mu(A)}\,\frac{(\mu(A))^k}{k!}. \qquad (2.87)$$

3. For disjoint measurable sets A₁, …, A_n, the variables M(A₁), …, M(A_n) are independent.
The construction given in this proof shows that in fact any Poisson ran-
dom measure on E can be represented as a counting measure associated to a
random sequence of points¹¹ in E: there exists {X_n(ω), n ≥ 1} such that

$$\forall A \in \mathcal{E}, \quad M(\omega, A) = \sum_{n\ge1} 1_A(X_n(\omega)). \qquad (2.88)$$

M is thus a sum of Dirac masses located at the random points (X_n)_{n≥1}:

$$M = \sum_{n\ge1} \delta_{X_n}.$$
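This representation is also the standard simulation recipe: on a set A with µ(A) < ∞, draw a Poisson number of points and scatter them according to the normalized measure µ/µ(A). A minimal sketch for a uniform intensity µ = λ·Lebesgue on a rectangle (λ and the rectangle are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
lam = 5.0                                 # illustrative intensity density
a, b = 2.0, 3.0                           # rectangle [0, a] x [0, b]

# Number of points in the rectangle: Poisson with mean mu(A) = lam * a * b.
n = rng.poisson(lam * a * b)

# Given n, the points are iid according to mu / mu(A), here uniform on the rectangle.
points = rng.uniform(low=[0, 0], high=[a, b], size=(n, 2))

def M(x0, x1, y0, y1):
    """M(A) for a sub-rectangle A: just count the points falling in it."""
    inside = (points[:, 0] >= x0) & (points[:, 0] < x1) \
           & (points[:, 1] >= y0) & (points[:, 1] < y1)
    return int(inside.sum())

print(M(0, 1, 0, 1), lam)                 # M([0,1]x[0,1]) has mean lam
```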
11 For this reason Poisson random measures are also called “Poisson point processes,” although the word “process” is confusing since there is no time variable involved yet.
The second sum runs over the events (T_n, Y_n) which have occurred before t, i.e., T_n ≤ t. (X_t(f))_{t∈[0,T]} is thus a jump process whose jumps happen at the random times T_n and have amplitudes given by f(T_n, Y_n). As remarked above, this construction makes sense if f verifies (2.91).
Similarly, under condition (2.91), one can define the integral of f with respect to the compensated Poisson measure M̃. The resulting process, called the compensated integral,

$$\tilde X_t(f) = \int_{[0,t]\times\mathbb{R}^d} f(s,y)\,\tilde M(ds\,dy),$$

is in fact a martingale.
Note that, even though the integrals defined in this section are random variables, they are defined pathwise, i.e., for any ω ∈ Ω.
where the sum runs over the jump times of X, i.e., the instants t ∈ [0, T] with ∆X_t ≠ 0.
Such expressions may involve infinite sums and we will see, in Chapters 3 and
8, when and in what sense they converge.
defines a cadlag process with jump times T_n and jump sizes ∆X_{T_n} = X_n − X_{n−1}. Its jump measure is given by:

$$J_X = \sum_{n\ge1} \delta_{(T_n,\,X_n - X_{n-1})}.$$
Finally, let us note that if Xt is a jump process built from a Poisson random
measure M as in (2.96):
$$M = \sum_{n\ge1} \delta_{(T_n, Y_n)}, \qquad X_t = \int_{[0,t]\times\mathbb{R}^d\setminus\{0\}} f(s,y)\,M(ds\,dy),$$

then the jump measure J_X can be expressed in terms of the Poisson random measure M by

$$J_X = \sum_{n\ge1} \delta_{(T_n,\,f(T_n, Y_n))}. \qquad (2.99)$$
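Putting the pieces together, here is a small sketch of this construction: simulate the marks (T_n, Y_n) of a Poisson random measure on [0, T] × R, then build X_t as the sum of f(T_n, Y_n) over events with T_n ≤ t. The intensity, mark distribution and the function f below are illustrative choices, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(7)
lam, horizon = 1.5, 20.0                  # illustrative jump intensity and horizon

# Marks (T_n, Y_n): event times of a Poisson process with iid Gaussian marks Y_n.
T = np.cumsum(rng.exponential(scale=1/lam, size=200))
T = T[T <= horizon]
Y = rng.normal(size=T.size)

f = lambda s, y: y * np.exp(-0.1 * s)     # illustrative jump-size function f(s, y)

def X(t):
    """Jump process X_t = sum over events with T_n <= t of f(T_n, Y_n)."""
    mask = T <= t
    return float(np.sum(f(T[mask], Y[mask])))

print(X(10.0))                            # value of the path at t = 10
```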
Further reading
General references on topics covered in this chapter are [228] or [153]. For
general notions on probability spaces, random variables, characteristic func-
tions and various notions of convergence we refer the reader to the excellent
introductory text by Jacod and Protter [214]. Poisson processes are treated
in most texts on stochastic processes; the link with point processes is treated, in increasing order of difficulty, by Kingman [235], Bouleau [68], Resnick [333], Fristedt & Gray [153], Neveu [306]. A short introduction to Poisson random measures is given in [205]. Point processes are considered in more detail in [227] using a measure theory viewpoint. A modern treatment using martingales is given in [306, 73]; see also [295].
The Poisson distribution and the Poisson process were named after the French mathematician Siméon-Denis Poisson (1781–1840). Poisson studied mathematics at the École Polytechnique in Paris, under Joseph Louis Lagrange, Pierre Simon Laplace and Jean Baptiste Fourier. Elected to the French Academy of Sciences at age 30, he held academic positions at the École Polytechnique and the Sorbonne. Poisson made major contributions to the theories of
electricity and magnetism, the motion of the moon, the calculus of variations,
differential geometry and, of course, probability theory. Poisson found the
limiting form of the binomial distribution that is now named after him: the
Poisson distribution. The importance of this discovery was only recognized
years later, by Chebyshev in Russia. His works on probability appeared in
the book Recherches sur la probabilité des jugements en matière criminelle
et en matière civile, précédées des règles générales du calcul des probabilités,
published in 1837, where he was the first to use the notion of cumulative
distribution function, to define the density as its derivative and to develop
an asymptotic theory of central limit type with remainder for hypergeometric
sampling [362]. The expression “law of large numbers” is due to Poisson: “La
loi universelle des grands nombres est à la base de toutes les applications du
calcul des probabilités” 13 . Poisson was also the first to compute the Fourier
transform of x → exp(−|x|) and thus discovered what is now (erroneously) called the Cauchy distribution.
13 The universal law of large numbers lies at the foundation of all applications of probability
theory.