à l’Économie et à la Finance
Erasmus Mundus
University Paris 1
Probability 1
Annie Millet
The aim of these lectures is to recall the basic notions of measure theory already seen in the
third year of Licence, concentrating on the tools that are constantly used by probabilists, as
well as the purely probabilistic notions. Most of the measure-theoretical results are stated
without proofs, except for the new ones.
We will develop in more details the conditional expectation (which is crucial in statistics,
in further Probability lectures on discrete and continuous time processes), Gaussian vectors
and convergence theorems.
[Figure: scatter plot of a two-dimensional point cloud; horizontal axis from −8 to 8, vertical axis from −6 to 6.]
The aim of this chapter is to give reminders on the general theory of integration that
will be used in probability. Only the few results which are « new » compared to the lectures
given in the third year of Licence will be proved.
Convention of notation. If A ⊂ X, we denote by Ac the complement of A in X.
1.1 σ-algebra
The information that we need for a qualitative model is given by a set Ω (which often
corresponds to all of the possible results of an experiment) and the subsets of Ω (which
are results of the experiment that have particular properties). However, it is necessary to
give the rules of manipulation for these subsets which will later allow the introduction of
quantitative information. We lightly touch on all of this in a more general setting. So as to
reserve the notation Ω for a probability space, we will refer to the reference space here as
X.
The following remark brings together the important properties of a σ-algebra X on X that
follow immediately from the preceding definition.
Definition 1.3 For every family A of subsets of X there exists a smallest σ-algebra containing A, called the σ-algebra generated by A and written σ(A); it is the intersection of all the σ-algebras that contain A.
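On a finite set this definition can be made concrete: instead of intersecting all σ-algebras containing A, one may close A under complement and union until nothing new appears (on a finite space, countable unions reduce to finite ones). This is only an illustration, not how the definition is used in proofs; the function name `generated_sigma_algebra` is ours.

```python
def generated_sigma_algebra(X, family):
    """Smallest sigma-algebra on the finite set X containing `family`,
    computed by closing under complement and pairwise union until a
    fixed point is reached."""
    X = frozenset(X)
    sigma = {frozenset(), X} | {frozenset(A) for A in family}
    while True:
        new = set(sigma)
        new |= {X - A for A in sigma}                   # complements
        new |= {A | B for A in sigma for B in sigma}    # finite unions
        if new == sigma:
            return sigma
        sigma = new

# sigma({{0}}) on X = {0, 1, 2} consists of the empty set, {0}, {1, 2} and X
sa = generated_sigma_algebra({0, 1, 2}, [{0}])
```

Note that σ({{0}}) contains only four sets, far fewer than the eight sets of P({0, 1, 2}): the generated σ-algebra records exactly the distinctions the generating family can make.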
The following examples of σ-algebras generated by a family of sets will be used often.
Definition 1.4 Let (X, X) and (Y, Y) be measurable spaces. A map f : (X, X) → (Y, Y) is measurable if for all B ∈ Y, its preimage f^{-1}(B) ∈ X, which is written f^{-1}(Y) ⊂ X.
Example 1.5 1. Every constant mapping is measurable for any σ-algebra on the domain
and range.
2. If (X, X) is a measurable space, for all A ∈ X, the indicator function of A, 1_A : (X, X) → (R, R) defined by 1_A(x) = 1 if x ∈ A and 0 otherwise, is measurable.
We remark that if the sets A_i are taken pairwise disjoint, the constants c_i are unique. Step functions are stable under sums, products, suprema and infima.
The sequence
f_n = Σ_{i=0}^{n 2^n − 1} i 2^{−n} 1_{A_i} + n 1_{B_n}
is an increasing sequence of step functions and, for all x, (f_n(x), n ≥ 1) converges to f(x).
(ii) If f takes negative values, we use (i) to approximate f^+ = sup(f, 0) and f^− = sup(−f, 0), and we deduce a sequence of step functions whose difference converges pointwise to f. If the function f is also bounded, the sequences that approximate f^+ and f^− converge uniformly, and their difference then converges uniformly to f = f^+ − f^−. □
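The dyadic construction above can be written in closed form: on the sets A_i = {i 2^{−n} ≤ f < (i+1) 2^{−n}} and B_n = {f ≥ n}, the approximation is exactly f_n(x) = min(⌊2^n f(x)⌋ 2^{−n}, n). A quick numeric sketch (the helper name `dyadic_approx` and the test function f(x) = x² are ours):

```python
import math

def dyadic_approx(f, n):
    """n-th dyadic step approximation of a non negative function f:
    value i*2**-n on A_i = {i*2**-n <= f < (i+1)*2**-n} for i < n*2**n,
    and value n on B_n = {f >= n}."""
    def f_n(x):
        return min(math.floor(f(x) * 2**n) / 2**n, n)
    return f_n

f = lambda x: x * x                  # an example non negative function
x = 1.7
approx = [dyadic_approx(f, n)(x) for n in range(1, 21)]
```

Refining the dyadic grid only adds cut points, which is why the sequence is increasing in n and converges to f(x) at rate 2^{−n} once n exceeds f(x).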
Convention of notation. In the following, if f : X → R, for all A ⊂ R we will write
{f ∈ A} := {x ∈ X : f(x) ∈ A} = f^{-1}(A).
Theorem 1.13 There exists a unique non negative measure λ on (R, R), called the Lebesgue
measure, such that
(i) λ([a, b]) = b − a for all a ≤ b.
(ii) λ is invariant by translation, that is to say that for all a ∈ R and all A ∈ R,
λ(a + A) = λ(A).
(iii) The measure λ is σ-finite.
We may characterize λ by requiring in point (i) only that λ([0, 1]) = 1. It is because of this theorem that the « good » σ-algebra on R is the Borel σ-algebra and not P(R). The proof of the existence of this measure is long, and its arguments are not much used in the rest of the document. The proof of uniqueness uses a result which helps in a number of contexts where we wish to establish a uniqueness result. It relies on the following theorem.
Proof. We will say that a λ-system is a family of subsets of X that satisfies the conditions (i)-(iii) of the theorem. A σ-algebra is clearly a λ-system. We write λ(C) for the intersection of all the λ-systems that contain C; we then deduce that λ(C) is contained in B. To check that λ(C) contains σ(C), it suffices to prove that λ(C) is a σ-algebra.
It is clear that λ(C) satisfies the properties (1) and (2) of Definition 1.1; it then suffices to prove that λ(C) is stable under countable unions. We first show that λ(C) is stable under finite intersections. For all A ∈ C, we denote
Π(A) = {B ∈ λ(C) : A ∩ B ∈ λ(C)}.
It is easy to show that C ⊂ Π(A) ⊂ λ(C). To prove that λ(C) = Π(A), it then suffices to show that Π(A) is a λ-system. First of all, A ∩ X = A ∈ C ⊂ λ(C), so that X ∈ Π(A). Let B_1 ⊂ B_2 be elements of Π(A). Then (B_2 \ B_1) ∩ A = (B_2 ∩ A) \ (B_1 ∩ A) ∈ λ(C), since it is the difference of two elements of λ(C), one contained in the other; hence B_2 \ B_1 ∈ Π(A). Finally, let (B_n, n ≥ 0) be an increasing sequence of elements of Π(A). Then (∪_n B_n) ∩ A = ∪_n (B_n ∩ A), where (B_n ∩ A) is an increasing sequence of elements of λ(C). We then deduce that ∪_n B_n ∈ Π(A). Hence we have shown that λ(C) = Π(A), that is to say that for all A ∈ C and B ∈ λ(C), we have A ∩ B ∈ λ(C).
Now let A ∈ λ(C). The previous reasoning shows that C ⊂ Π(A) and, as before, that Π(A) = λ(C). We deduce that the family λ(C) is stable under finite intersections. The stability under complementation then implies that λ(C) is also stable under finite unions. Finally, let (A_n, n ≥ 1) be a sequence (not necessarily increasing) of elements of λ(C). For all n, the set B_n = A_1 ∪ · · · ∪ A_n belongs to λ(C) by stability under finite unions, and (B_n) is an increasing sequence with ∪_n A_n = ∪_n B_n; the property (iii) then gives ∪_n A_n ∈ λ(C). Hence λ(C) is a σ-algebra. □
Proposition 1.17 Let λ be the Lebesgue measure on R. For every integer d ≥ 2, the measure λ_d = λ^{⊗d} on the Borel σ-algebra R_d is the unique measure on R^d such that for a_i < b_i, i = 1, · · · , d,
λ_d(Π_{i=1}^d ]a_i, b_i[) = Π_{i=1}^d (b_i − a_i).
The following result is a « functional » version of the Monotone Class Theorem ; its proof
is left as an exercise.
Theorem 1.18 Let H be a vector subspace of the set of bounded functions from Ω to R
such that
(i) the constant functions belong to H.
(ii) If (h_n) is a sequence of elements of H that converges uniformly to h, then h ∈ H.
(iii) If (hn ) is an increasing sequence of non negative functions of H such that the
function h = supn hn is bounded, then h ∈ H.
Let C be a subset of H that is stable by multiplication. Then H contains all the bounded
measurable functions from (Ω, σ(C)) to (R, R).
By construction, λ_2({x} × R) = λ_2(R × {x}) = 0 and, more generally, every line is a null set for the Lebesgue measure λ_2. In the same way, every vector (or affine) subspace of R^d of dimension strictly less than d is a null set for the Lebesgue measure λ_d. The following
theorem allows the transfer of a measure from the domain space to the range space by a
measurable function.
Theorem 1.19 Let (X, X) and (Y, Y) be measurable spaces, f : (X, X) → (Y, Y) a measurable function and µ a non negative measure on X. The function ν : Y → [0, +∞] defined by ν(B) = µ(f^{-1}(B)) is a measure on Y, called the image (or pushforward) measure of µ by f. We often use the notation ν = µ ◦ f^{-1} or ν = µ_f. Its total mass is equal to that of µ.
In the rest of this section, except where the contrary is explicitly noted, we will denote by (X, X, µ) a measure space. We also use the systematic convention that 0 × (+∞) = 0. We begin by defining integrals of the « simplest » functions, that is to say the indicator functions, imposing that the integral of 1_A with respect to µ must be µ(A). To preserve linearity, it is then natural to impose the following definition.
Definition 1.20 Let f = Σ_{i=1}^n α_i 1_{A_i} be a non negative step function. Using the convention 0 × (+∞) = 0, we define the integral of f with respect to µ by
∫ (Σ_{i=1}^n α_i 1_{A_i}) dµ := Σ_{i=1}^n α_i µ(A_i).   (1.2)
We readily notice that the preceding definition of ∫ f dµ is independent of the decomposition of the step function f as a linear combination of indicator functions. We will suppose in the following that the sets A_i are pairwise disjoint. We easily verify the following proposition:
Proposition 1.21 Let f and g be non negative step functions, c a non negative real number.
(i) ∫ (f + cg) dµ = ∫ f dµ + c ∫ g dµ.
(ii) If 0 ≤ f ≤ g, then 0 ≤ ∫ f dµ ≤ ∫ g dµ.
For example, if µ is the counting measure on P(N) and if f_n(i) = i for all i ≤ n and f_n(i) = n for all i > n, then ∫ f_n dµ = +∞, while if g_n(i) = i for all i ≤ n and g_n(i) = 0 for all i > n, then ∫ g_n dµ = Σ_{i=0}^n i = n(n+1)/2.
If λ denotes the Lebesgue measure on R, and if f : R → [0, +∞[ is defined by f(x) = 1 if x ∈ [0, 1[, f(x) = 2 if x ∈ [1, 4] and f(x) = 0 if x ∉ [0, 4], then ∫ f dλ = 7.
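For a step function written over pairwise disjoint intervals, definition (1.2) reduces to a finite sum of values times Lebesgue lengths. A minimal sketch reproducing the last computation, 1 · λ([0, 1[) + 2 · λ([1, 4]) = 7 (the function name `step_integral` is ours):

```python
def step_integral(pieces):
    """Integral w.r.t. Lebesgue measure of a non negative step function
    given as a list of (alpha_i, (a_i, b_i)) with pairwise disjoint
    intervals, following (1.2): sum of alpha_i * lambda(A_i)."""
    return sum(alpha * (b - a) for alpha, (a, b) in pieces)

# f = 1 on [0, 1[, 2 on [1, 4], 0 elsewhere
val = step_integral([(1, (0, 1)), (2, (1, 4))])
```

Whether the interval endpoints are included or not does not change the result, since single points are Lebesgue-null.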
We then define the integral of a non negative measurable function by using the point
(ii).
Definition 1.22 Let f : (X, X) → ([0, +∞], B([0, +∞])) be a non negative measurable function. We then define the integral of f with respect to µ by
∫ f dµ = sup { ∫ ϕ dµ : 0 ≤ ϕ ≤ f, ϕ a step function }.   (1.3)
While the definition (1.3) of the integral of a non negative measurable function is « intrinsic », it is rarely useful for computing this integral. Theorem 1.9 gave a constructive procedure for approximating a non negative measurable function by an increasing sequence of step functions. The following theorem is one of the most fundamental of the theory. It allows, by passage to the increasing limit, the concrete computation of integrals of non negative measurable functions.
The following examples demonstrate that the monotonicity of the sequence (f_n) is crucial: the conclusion may fail for a bounded sequence if the measure is infinite, or for an unbounded sequence if the measure is finite.
Example 1.24 (i) We consider the measure space (R, R, λ), where λ denotes the Lebesgue measure. For n ≥ 0 let f_n = 1_{[n,n+1[}. Then the sequence (f_n) converges pointwise to 0, but ∫ f_n dλ = 1 for every integer n. Thus, for a set of infinite measure, it does not suffice that the sequence (f_n) be bounded to exchange limit and integral.
(ii) We consider the measure space ([0, 1], B([0, 1]), λ), where λ denotes the restriction of the Lebesgue measure to [0, 1]. Then for all α > 0, the sequence (f_n = n^α 1_{]0, 1/n]}, n ≥ 1) converges pointwise to 0, but ∫ f_n dλ = n^α λ(]0, 1/n]) = n^{α−1}: thus ∫ f_n dλ = 1 for every n if α = 1, lim_n ∫ f_n dλ = 0 if 0 < α < 1 and ∫ f_n dλ tends to +∞ if α > 1. Again, it is not sufficient for the measure to be a probability in order to exchange limit and integral.
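In part (ii) the integral is exact: the function is constant equal to n^α on an interval of length 1/n, so ∫ f_n dλ = n^{α−1}, and the three regimes can be read off directly. A two-line sketch (the helper name `mass` is ours):

```python
def mass(alpha, n):
    """Integral of f_n = n**alpha * 1_{]0, 1/n]} with respect to the
    Lebesgue measure: a constant on an interval of length 1/n."""
    return n**alpha * (1.0 / n)

# the three regimes alpha = 1, alpha < 1, alpha > 1 at n = 10**4
regimes = (mass(1, 10**4), mass(0.5, 10**4), mass(2, 10**4))
```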
We immediately deduce from the Monotone Convergence Theorem 1.23 and from Theorem
1.9 the following results.
1. If f : (X, X) → ([0, +∞], B([0, +∞])) is a non negative measurable function, for all a ∈ X, ∫ f dδ_a = f(a).
2. If f : N → [0, +∞], f is measurable for the σ-algebra P(N), and if µ denotes the counting measure on P(N),
∫ f dµ = Σ_{n≥0} f(n).
The evaluation of a function at a point and the sum of a series of non negative terms thus appear as particular cases of integrals with respect to appropriate measures.
The following proposition gathers some properties of the integral of a non negative
measurable function.
Proposition 1.25 Let f and g be non negative measurable functions from (X, X) to ([0, +∞], B([0, +∞])). Then
(i) If 0 ≤ f ≤ g, we have 0 ≤ ∫ f dµ ≤ ∫ g dµ.
(ii) If ∫ f dµ = 0, we have µ(f ≠ 0) = 0.
(iii) If ∫ f dµ < +∞, we have µ(f = +∞) = 0.
(iv) (Markov Inequality) For every number a ∈ ]0, +∞[,
µ(f ≥ a) ≤ (1/a) ∫ f dµ.   (1.5)
Finally, the Monotone Convergence Theorem allows the construction of non negative
measures from other non negative measures. Its proof is left as an exercise.
Theorem 1.26 Let (X, X, µ) be a measure space, f : (X, X) → ([0, +∞], B([0, +∞])) a non negative measurable function. Then the mapping ν : X → [0, +∞] defined by
ν(A) = ∫ (f 1_A) dµ , ∀A ∈ X,
is a non negative measure. We say that the measure ν has density f with respect to the measure µ.
We immediately see that if ν has density f with respect to µ, the total mass of ν is equal to ν(X) = ∫ f dµ; the measure ν is thus a probability if ∫ f dµ = 1.
Again in this section (X, X, µ) denotes a measure space. In order to preserve the linearity of the integral, since we can write f = f^+ − f^−, where f^+ = f ∨ 0 and f^− = (−f) ∨ 0, it is tempting to define the integral of f as the difference of the integrals of f^+ and of f^−. However, this definition cannot be made in full generality; indeed, we would have to give a meaning to the expression (+∞) − (+∞). This leads to the introduction of the following definition, which uses the fact that |f| = f^+ + f^−.
Definition 1.27 A measurable function f : (X, X) → (R, R) is µ-integrable (or integrable with respect to the measure µ) if ∫ |f| dµ < +∞. We write L^1(µ) for the collection of µ-integrable functions, and if f ∈ L^1(µ) we write ∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ and ‖f‖_1 = ∫ |f| dµ.
We see that in the previous definition the separate integrals of f^+ and of f^− are both finite. Since we can give a meaning to a − b as soon as at most one of the two terms of the difference is +∞, we can also define the semi-integrable functions by requiring only that one of the integrals ∫ f^+ dµ or ∫ f^− dµ be finite; we again set ∫ f dµ = ∫ f^+ dµ − ∫ f^− dµ ∈ [−∞, +∞].
Remark 1.28 If f = f_1 − f_2 where f_1 and f_2 are non negative integrable measurable functions, we have f^+ ≤ f_1 and f^− ≤ f_2, while (f_1 − f^+) − (f_2 − f^−) = 0. We deduce that ∫ f dµ = ∫ f_1 dµ − ∫ f_2 dµ, which proves the linearity of the integral.
Proposition 1.29 Let f and g be µ-integrable functions, α and β real numbers. Then αf + βg is µ-integrable and ∫ (αf + βg) dµ = α ∫ f dµ + β ∫ g dµ.
In the particular case (X, X ) = (R, R), the following proposition connects the integral
of a function f with respect to the Lebesgue measure λ and the Riemann integral of f . It
is a valuable tool for effectively calculating integrals with respect to the Lebesgue measure.
Proposition 1.30 (i) Every function f that is Riemann integrable on the closed, bounded interval [a, b] is integrable for the Lebesgue measure λ and
∫ (f 1_{[a,b]}) dλ = ∫_a^b f(x) dx.   (1.6)
However, measure theory cannot account for functions whose generalized Riemann integral converges without converging absolutely, as the following example points out. The function f : R → R defined by f(x) = (sin(x)/x) 1_{[1,+∞[}(x) is not integrable (nor even semi-integrable) for the Lebesgue measure, although the generalized Riemann integral ∫_{−∞}^{+∞} f(x) dx converges; it does not converge absolutely.
Let a ∈ X; every measurable function f : (X, X) → (R, R) is integrable for the Dirac measure δ_a and ∫ f dδ_a = f(a).
A function f : N → R (also called a sequence) is integrable for the counting measure µ on P(N) if and only if the series with general term f(n) is absolutely convergent, that is to say Σ_n |f(n)| < +∞. In this case, ∫ f dµ = Σ_n f(n). The function f : N → R defined by f(n) = (−1)^n/(n+1) is thus not integrable for the counting measure, although the alternating series Σ_n f(n) converges.
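The distinction can be seen numerically: the partial sums of Σ (−1)^n/(n+1) converge (to log 2, by the alternating series test), while the partial sums of Σ 1/(n+1) grow without bound, which is why f is not integrable for the counting measure. A quick sketch (the truncation level N is ours):

```python
import math

N = 10**5
# partial sum of the alternating series (-1)^n / (n+1): converges to log 2
alt = sum((-1)**n / (n + 1) for n in range(N))
# partial sum of |f(n)| = 1/(n+1): the harmonic series, which diverges
har = sum(1.0 / (n + 1) for n in range(N))
```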
The following theorems are constantly used in probability theory. The first one links the integrals of a function ϕ with respect to the measure µ and with respect to the measure ν of density f with respect to µ.
Theorem 1.31 Let (X, X , µ) be a measure space, f : (X, X ) → ([0, +∞], B([0, +∞])) a
non negative measurable function and ν the measure of density f with respect to µ defined
by Theorem 1.26. Then
(i) For every non negative measurable function ϕ : (X, X ) → ([0, ∞], B([0, +∞])),
∫ ϕ dν = ∫ (f ϕ) dµ.   (1.7)
(ii) The measurable function ϕ : (X, X ) → (R, R) is ν-integrable if and only if the
function f ϕ is µ-integrable, and if the functions are integrable the equality (1.7) remains
valid.
Proof. (i) By definition, for every set A ∈ X, ∫ 1_A dν = ν(A) = ∫ (f 1_A) dµ, and the equation (1.7) is then true when ϕ = 1_A. By linearity of the integral, the equation (1.7) remains true when ϕ is a non negative step function. Let ϕ be a non negative measurable function and (ϕ_n) an increasing sequence of non negative step functions that converges pointwise to ϕ (whose existence is assured by Theorem 1.9). Because f is non negative, the sequence (f ϕ_n) is increasing and converges pointwise to f ϕ. The Monotone Convergence Theorem 1.23 applied to the sequence (ϕ_n) with respect to the measure ν and to the sequence (f ϕ_n) with respect to the measure µ yields that the equation (1.7) is true for every non negative measurable function ϕ, which proves (i).
(ii) Let ϕ : (X, X) → (R, R) be a measurable function; then |ϕ| is a non negative measurable function and (1.7) shows that ϕ ∈ L^1(ν) if and only if f ϕ ∈ L^1(µ). Moreover, if ϕ ∈ L^1(ν), ϕ^+ and ϕ^− are non negative, measurable, and also belong to L^1(ν). Using (i), we deduce that f ϕ^+ and f ϕ^− are non negative measurable functions that belong to L^1(µ) and such that f ϕ = f ϕ^+ − f ϕ^−. Remark 1.28 and the point (i) allow us to deduce that the equality (1.7) is true for ϕ, which ends the proof of (ii). □
The next theorem allows the comparison of the integrals of ϕ ◦ f with respect to µ and
of ϕ with respect to the image measure of µ by f denoted µ ◦ f −1 . Its proof is left as an
exercise.
Theorem 1.32 (Theorem of the image measure) Let (X, X ) and (Y, Y) be measurable
spaces, µ a non negative measure on X and ν = µ ◦ f −1 the image measure of µ by f
defined on Y in Theorem 1.19. Then
(i) For every non negative measurable function ϕ : (Y, Y) → ([0, ∞], B([0, +∞])),
∫_Y ϕ d(µ ◦ f^{-1}) = ∫_X (ϕ ◦ f) dµ.   (1.8)
(ii) For every measurable function ϕ : (Y, Y) → (R, R), ϕ is µ ◦ f −1 -integrable if and
only if ϕ ◦ f is µ-integrable. In that case, the equality (1.8) remains true.
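For a measure with finitely many atoms, the identity (1.8) is a finite computation: group the mass µ({x}) by the value f(x) to obtain ν = µ ◦ f^{-1}, then compare Σ_y ϕ(y) ν({y}) with Σ_x ϕ(f(x)) µ({x}). A sketch under this purely atomic assumption (the names `pushforward`, `mu`, `phi` are ours):

```python
from collections import defaultdict

def pushforward(mu, f):
    """Image measure nu = mu o f^{-1} of a purely atomic measure mu
    (a dict atom -> mass): nu({y}) = mu(f^{-1}({y}))."""
    nu = defaultdict(float)
    for x, m in mu.items():
        nu[f(x)] += m
    return dict(nu)

mu = {0: 0.5, 1: 1.0, 2: 0.25, 3: 0.25}   # atoms on X = {0, 1, 2, 3}
f = lambda x: x % 2                        # measurable map into Y = {0, 1}
phi = lambda y: 3 * y + 1

nu = pushforward(mu, f)
lhs = sum(phi(y) * m for y, m in nu.items())      # integral of phi d(mu o f^-1)
rhs = sum(phi(f(x)) * m for x, m in mu.items())   # integral of (phi o f) dmu
```

The check also illustrates the last sentence of Theorem 1.19: the total mass of ν equals that of µ.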
The Monotone Convergence Theorem leads to the following result, which is used to show
that functions are integrable.
Theorem 1.33 (Fatou's Lemma) Let (f_n, n ≥ 1) be a sequence of non negative measurable functions. Then
0 ≤ ∫ (lim inf_n f_n) dµ ≤ lim inf_n ∫ f_n dµ ≤ +∞.
We immediately deduce that, if sup_n ∫ |f_n| dµ < +∞ and if the sequence (f_n) converges pointwise to f, then f is µ-integrable.
We seek to extend the Monotone Convergence Theorem in two directions: on the one hand, weakening the notion of pointwise convergence of the sequence (f_n), and on the other hand not requiring that the sequence be increasing and non negative. In this section (X, X, µ) is a measure space.
Definition 1.34 (i) A measurable set A is a µ-null set if µ(A) = 0. When no confusion
concerning the measure µ is possible, we say that a µ-null set is a null set.
(ii) A property is true µ almost everywhere if it is true on the complement of a µ-null
set.
(iii) A function f : X → R is µ-null if it is zero µ-almost everywhere.
Thus, the set Q of rational numbers is a null set for the Lebesgue measure λ on R, and {x} × R is a null set for the Lebesgue measure λ_2 on R^2. The sequence of functions f_n : [0, 1] → R defined by f_n = n 1_{[0, 1/n]} converges to 0 λ-almost everywhere.
If two non negative measurable (or integrable) functions f and g are such that f = g µ-almost everywhere, then ∫ f dµ = ∫ g dµ.
Proposition 1.35 (i) A function f ∈ L^1(µ) (that is to say µ-integrable) is finite µ-almost everywhere.
(ii) A function f is µ-null if and only if ∫ |f| dµ = 0.
(iii) An integrable function f is µ-null if and only if ∫_A f dµ = 0 for every set A ∈ X.
Proof. The Markov inequality (1.5) shows that for every integer n ≥ 1,
µ(|f| ≥ n) ≤ (1/n) ∫ |f| dµ and µ(|f| ≥ 1/n) ≤ n ∫ |f| dµ.
(i) The sequence of sets {|f| ≥ n} is decreasing, µ(|f| ≥ 1) < +∞ and ∩_n {|f| ≥ n} = {|f| = +∞}. Using the property (iv) of the Proposition 1.12 we deduce that µ(|f| = +∞) = 0.
(ii) If ∫ |f| dµ = 0, the sequence of sets {|f| ≥ 1/n} is increasing and µ(|f| ≥ 1/n) = 0 for all n. The point (iii) of Proposition 1.12 shows that µ(f ≠ 0) = µ(∪_n {|f| ≥ 1/n}) = 0. The converse is evident.
(iii) If f is µ-null, then for all A ∈ X, |∫_A f dµ| ≤ ∫ |f| dµ = 0. Conversely, for every integer n ≥ 1, let A_n = {f ≥ 1/n} and B_n = {f ≤ −1/n}. Then 0 ≤ (1/n) µ(A_n) ≤ ∫_{A_n} f dµ = 0, so µ(A_n) = 0 and in the same way µ(B_n) = 0. Since the sequence (A_n ∪ B_n, n ≥ 1) is increasing and {f ≠ 0} = ∪_n (A_n ∪ B_n), we have µ(f ≠ 0) = 0. □
The following result is the second « classical » theorem allowing the exchange of limit and integral for a sequence of functions.
Theorem 1.36 (Dominated Convergence Theorem) Let (fn , n ≥ 1) be a sequence of mea-
surable functions from (X, X ) to (R, R) such that
(i) The sequence (fn ) converges to f µ-almost everywhere.
(ii) There exists a function g ∈ L^1(µ) such that for every n ≥ 1, |f_n| ≤ g µ-almost everywhere.
Then, lim_n ∫ f_n dµ = ∫ f dµ. Furthermore, we have the stronger result: lim_n ∫ |f_n − f| dµ = 0.
We see here an improvement over the corresponding convergence results for sequences of Riemann integrals: in that framework, one has to impose that the sequence of Riemann integrable functions (f_n) converges uniformly to f. Furthermore, the hypothesis of « domination » in part (ii) of the Dominated Convergence Theorem may be satisfied without uniform convergence, as shown in the next example. Let X = [0, 1] be endowed with the Borel σ-algebra and the Lebesgue measure λ (restricted to [0, 1]). For every n ≥ 1 and x ∈ [0, 1] set
f_n(x) = min( e^{−n x²} / √x , n ).
The sequence (f_n) of continuous (and thus Borel) functions is such that (f_n(x), n ≥ 1) converges to 0 for all x ∈ ]0, 1]. The sequence (f_n) thus converges to 0 λ-almost everywhere. Moreover, |f_n(x)| ≤ g(x) with g(x) = 1/√x. Furthermore ∫_0^1 (1/√x) dx < +∞, so g is λ-integrable on [0, 1]. The Dominated Convergence Theorem 1.36 thus applies and lim_n ∫ f_n dλ = 0, while the sequence (f_n) does not converge uniformly to 0.
We immediately deduce the following corollary, which allows the interchange of series and integral.
Corollary 1.37 Let (g_n, n ≥ 1) be a sequence of measurable functions from (X, X) to (R, R).
(i) If the functions g_n are non negative for every integer n, then
∫ (Σ_n g_n) dµ = Σ_n ∫ g_n dµ.   (1.9)
(ii) If Σ_n ∫ |g_n| dµ < +∞, then the functions g_n, the function Σ_n |g_n| and the function defined almost everywhere by Σ_n g_n are µ-integrable. Furthermore, the equality (1.9) is also true.
Proof. (i) We apply the Monotone Convergence Theorem 1.23 to the increasing sequence of partial sums f_n = Σ_{k=1}^n g_k and we conclude by linearity of the integral.
(ii) Let g = Σ_n |g_n|. From (i), g ∈ L^1(µ), so g < +∞ µ-almost everywhere and the series with general term g_n(x), which is almost everywhere absolutely convergent, is almost everywhere convergent. On the null set upon which Σ_n g_n is not defined, we give it an arbitrary value, for example 0. The sequence of partial sums f_n = Σ_{k=1}^n g_k then converges µ-almost everywhere to Σ_n g_n and |f_n| ≤ g for all n. The Dominated Convergence Theorem and the linearity of the integral conclude the proof. □
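Corollary 1.37 can be probed numerically with g_n(x) = x^n/n! on [0, 1[: the sum Σ_n g_n is e^x, so ∫ (Σ g_n) dλ = e − 1, while the term-by-term integrals are ∫_0^1 x^n/n! dx = 1/(n+1)!, whose sum is again e − 1. A sketch (the truncation level N and quadrature step are ours):

```python
import math

# sum of term-by-term integrals: sum over n of 1/(n+1)!
N = 20
sum_of_integrals = sum(1.0 / math.factorial(n + 1) for n in range(N))

# integral of the sum: int_0^1 e^x dx = e - 1, here by a fine midpoint rule
M = 100_000
h = 1.0 / M
integral_of_sum = h * sum(math.exp((k + 0.5) * h) for k in range(M))
```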
The following theorems give sufficient conditions for continuity and differentiability of integrals depending on a parameter. Since these properties may be characterized by the convergence of sequences, they are immediate consequences of the Dominated Convergence Theorem and their proofs are left as an exercise.
The goal of this section is to study functions whose p-th power is integrable, as generalizations of the space L^1. Again we take a measure space (X, X, µ).
Definition 1.40 For every real p ∈ ]0, +∞[, let L^p(X, X, µ) be the set of measurable functions from (X, X) to (R, R) such that ∫ |f|^p dµ < +∞. When there is no ambiguity about the set and the measure, we will write more simply L^p = L^p(X, X, µ).
If a, b and α are real numbers, |αa + b|^p ≤ (2 max(|α||a|, |b|))^p ≤ 2^p |α|^p |a|^p + 2^p |b|^p. We deduce that L^p is a vector space. The following result will be very useful in probability.
Proof. Let µ be a finite measure and 0 < p ≤ q. Then |f|^p ≤ |f|^q 1_{{|f|≥1}} + 1_{{|f|<1}}. If f ∈ L^q, we deduce that
∫ |f|^p dµ ≤ ∫ |f|^q dµ + µ(|f| < 1) < +∞. □
In the case of the Lebesgue measure λ on R, the function f defined by f(x) = (1/√x) 1_{]0,1]}(x) is such that f ∈ L^1 but f ∉ L^2, while the function g defined by g(x) = 1/(|x| + 1) is such that g ∈ L^2 but g ∉ L^1. In the following we will suppose p ≥ 1 and, if f : (X, X) → (R, R) is measurable, we will write
‖f‖_p = ( ∫ |f|^p dµ )^{1/p}.   (1.10)
When f ∈ L2 , we say that the function f is square integrable. The following theorem will
be very useful in the sequel.
Theorem 1.42 (Schwarz's Inequality) Let f and g be two functions belonging to L^2. Then f g ∈ L^1 and
∫ |f g| dµ ≤ ‖f‖_2 ‖g‖_2.   (1.11)
Proof. Let f, g be functions in L^2. Then the linearity of the integral implies that for all a ∈ R,
0 ≤ ∫ (a|f| + |g|)² dµ = a² ∫ |f|² dµ + 2a ∫ |f g| dµ + ∫ |g|² dµ.
Since this trinomial in a is non negative for all a ∈ R, its discriminant is non positive, which proves the inequality (1.11). Further, in the particular case a = 1, the preceding identity and (1.11) show that ‖f + g‖_2^2 = ∫ (f + g)² dµ ≤ (‖f‖_2 + ‖g‖_2)². □
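For the counting measure on a finite index set, inequality (1.11) reads |Σ_i |f_i g_i|| ≤ (Σ_i f_i²)^{1/2} (Σ_i g_i²)^{1/2}, which is easy to verify on concrete vectors. A minimal sketch (the vectors and the helper name `norm2` are ours):

```python
def norm2(v):
    """||v||_2 for the counting measure on a finite index set."""
    return sum(x * x for x in v) ** 0.5

f = [1.0, -2.0, 0.5, 3.0]
g = [0.3, 1.0, -4.0, 2.0]
lhs = sum(abs(a * b) for a, b in zip(f, g))   # integral of |fg|
rhs = norm2(f) * norm2(g)
```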
We may generalize the Schwarz inequality and the triangle inequality of Theorem 1.42 to the case p > 1. We say that the real numbers p ∈ [1, +∞] and q ∈ [1, +∞] are conjugate if
1/p + 1/q = 1,   (1.12)
with the convention 1/(+∞) = 0. Then we see that p = 1 and q = +∞ are conjugate, and so are p = q = 2.
Theorem 1.43 (Hölder's Inequality) Let p ∈ ]1, +∞[ and q ∈ ]1, +∞[ be conjugate exponents.
(i) Let f and g be non negative measurable functions from (X, X) to ([0, +∞[, B([0, +∞[)). Then
0 ≤ ∫ f g dµ ≤ ‖f‖_p ‖g‖_q ≤ +∞.   (1.13)
Furthermore, if ‖f‖_p + ‖g‖_q < +∞, the inequality (1.13) is an equality if and only if there exist non negative real numbers a and b such that (a, b) ≠ (0, 0) and a f^p = b g^q µ-almost everywhere.
Proof. (i) Let α ∈ ]0, 1[. First of all we show the Young inequality for two non negative real numbers u and v:
u^α v^{1−α} ≤ α u + (1 − α) v,   (1.15)
with equality if and only if u = v. Let ϕ_α : [0, +∞[ → R be the function defined by ϕ_α(x) = x^α − αx. Then ϕ_α is differentiable on ]0, +∞[ and ϕ_α'(x) > 0 on ]0, 1[ while ϕ_α'(x) < 0 on ]1, +∞[. We deduce that ϕ_α(x) ≤ ϕ_α(1) = 1 − α for all x ∈ ]0, +∞[, with equality if and only if x = 1. When u ≥ 0 and v > 0, we write the inequality ϕ_α(x) ≤ 1 − α with x = u/v and multiply by v > 0; this yields (1.15). Finally, (1.15) is obviously true if u ≥ 0 and v = 0.
Proof. The triangle inequality is trivial for p = 1 and thus we suppose that p > 1.
Integrating the inequality |f + g|^p ≤ |f| |f + g|^{p−1} + |g| |f + g|^{p−1}, we deduce
‖f + g‖p^p ≤ ∫ |f| |f + g|^{p−1} dµ + ∫ |g| |f + g|^{p−1} dµ.
Because Lp is a vector space, ‖f + g‖p < +∞. If ‖f + g‖p = 0, the triangle inequality is
trivial. Otherwise, applying the Hölder inequality to each term and dividing the resulting
inequality by ‖f + g‖p^{p−1}, we deduce the triangle inequality.
The characterization of the case of equality is not proved. □
The definition of ‖·‖p shows that if a ∈ R and f ∈ Lp , ‖af‖p = |a| ‖f‖p ; thanks to the
Minkowski Inequality, ‖·‖p is a semi-norm, but it is not a norm. In fact ‖f‖p = 0 implies
that ∫ |f|^p dµ = 0, or that |f|^p = 0 µ-almost everywhere, that is to say that f is null
µ-almost everywhere but not necessarily everywhere. Identifying functions that are equal
µ-almost everywhere, one obtains a quotient space, denoted Lp , on which ‖·‖p is a norm.
When p = 2, the quantity hf, gi = ∫ f g dµ
does not depend on the representatives f ∈ L2 and g ∈ L2 and defines a scalar product on
L2 with the associated norm given by hf, f i = ‖f‖2^2 . The vector space L2 is complete for this
norm ; it is a Hilbert space. More generally, the vector space Lp is complete for the norm ‖·‖p .
Convention In the following, so as to ease the notation, we will make a consistent abuse of
notation and identify the equivalence class f̄ ∈ L2 with any of its representatives, a measurable
function f ∈ L2 (which is defined µ-almost everywhere).
Definition 1.48 For every measurable function f : (X, X ) → (R, R), we define
‖f‖∞ = inf{a ∈ [0, +∞[ : µ(|f| > a) = 0}, (1.18)
with the convention inf ∅ = +∞. We write L∞ for the set of measurable functions f such that
there exists a number a ∈ R+ with |f| ≤ a µ-almost everywhere, that is to say such
that ‖f‖∞ < +∞.
The set L∞ is a vector space and ‖·‖∞ is a semi-norm. The Hölder inequality (1.14) is also
true when p = 1 and q = +∞. The preceding equivalence relation allows the introduction
of a norm on the quotient space L∞ , and we will apply the same convention of notation for
the elements of L∞ .
The goal of this section is to define a σ-algebra and a measure on a product space X × Y
and to calculate integrals with respect to this measure.
Definition 1.49 Let (X, X ) and (Y, Y) be measurable spaces. The product σ-algebra of X
and Y is the σ-algebra on X × Y, denoted X ⊗ Y, generated by the collection of measurable
rectangles {A × B : A ∈ X , B ∈ Y}, that is to say
X ⊗ Y = σ({A × B : A ∈ X , B ∈ Y}).
Remark 1.50 (i) The σ-algebra X ⊗ Y is the smallest σ-algebra on X × Y that makes
measurable the canonical projections ΠX : X × Y → X and ΠY : X × Y → Y defined by
ΠX (x, y) = x and ΠY (x, y) = y when the σ-algebra for the domain X (resp. Y) is X (resp.
Y).
(ii) Let (V, V) be a measurable space. A mapping f = (fX , fY ) : (V, V) → (X × Y, X ⊗ Y)
is measurable if and only if the mappings fX and fY are measurable from (V, V) to (X, X )
and to (Y, Y) respectively.
The previous definition may be extended to a finite product of spaces. For i = 1, · · · , d,
let (Xi , Xi ) be a measurable space and let X = Π_{i=1}^d Xi . If A = {Π_{i=1}^d Ai : Ai ∈ Xi } denotes
the set of all products of elements of the σ-algebras Xi , the σ-algebra on X generated by
A is called the product σ-algebra and written X = ⊗_{i=1}^d Xi . In the particular case where
(Xi , Xi ) = (R, R) for all i = 1, · · · , d and X = Rd , the Borel σ-algebra Rd coincides with the
σ-algebra R⊗d . Moreover the operation ⊗ on σ-algebras is associative, that is to say that
⊗_{i=1}^3 Xi = (X1 ⊗ X2 ) ⊗ X3 = X1 ⊗ (X2 ⊗ X3 ).
Proposition 1.51 Let (X, X ) and (Y, Y) be two measurable spaces.
(i) Let A ∈ X ⊗ Y, the sections Ax = {y ∈ Y : (x, y) ∈ A} belong to Y for all x ∈ X and
Ay = {x ∈ X : (x, y) ∈ A} belong to X for all y ∈ Y.
(ii) Let f : (X × Y, X ⊗ Y) → (R, R) be a measurable function. For all x ∈ X the
section of f with respect to x, defined by fx (y) = f (x, y) for y ∈ Y, is measurable from
(Y, Y) to (R, R) and in the same way for all y ∈ Y, the section f y with respect y defined by
f y (x) = f (x, y) for x ∈ X, is measurable from (X, X ) to (R, R).
If (X, X , µ) and (Y, Y, ν) are two measure spaces, the following theorem defines a measure
on the measurable product space (X × Y, X ⊗ Y)
Theorem 1.52 Let (X, X , µ) and (Y, Y, ν) be two σ-finite measure spaces. There exists a
unique measure denoted µ ⊗ ν on (X × Y, X ⊗ Y) defined by
(µ ⊗ ν)(A × B) = µ(A) ν(B) , ∀A ∈ X , ∀B ∈ Y, (1.19)
with the convention 0 × (+∞) = 0.
The existence of this measure is admitted. Its uniqueness is a consequence of the Monotone
Class Theorem (by the Corollary 1.16). We remark that we may iterate this construction
and define a product measure µ1 ⊗ · · · ⊗ µd on the product σ-algebra X1 ⊗ · · · ⊗ Xd defined
on the product space X = Π_{i=1}^d Xi . Finally we remark that the Lebesgue measure λd on
(Rd , R⊗d ) is the product measure λ⊗d .
The following theorem links the integrals of non negative measurable functions defined
on X × Y with respect to µ ⊗ ν to the integrals with respect to µ and to ν.
Convention of notation We often write the equations (1.20) and (1.21) in the form
∫∫ µ(dx) ν(dy) f (x, y) = ∫_X µ(dx) ∫_Y f (x, y) ν(dy) = ∫_Y ν(dy) ∫_X µ(dx) f (x, y).
These two theorems, Fubini-Tonelli and Fubini-Lebesgue, give another proof of Corollary
1.37. They make it possible to define, and to compute, double series with non negative or
absolutely convergent terms.
The next notion of convolution product will be used in probability.
Proposition 1.55 Let f and g be Borel functions from R to R integrable for the Lebesgue
measure λ. Then the function f ∗ g defined for λ-almost every x by
(f ∗ g)(x) = ∫ f (x − y) g(y) λ(dy) = ∫ f (y) g(x − y) λ(dy) (1.22)
is integrable, where the last equality holds by the invariance of the Lebesgue measure under
translation.
The regularity of the convolution product comes by the theorems 1.38 and 1.39 from conti-
nuity and differentiability of integrals depending on a parameter, which are immediate
consequences of the dominated convergence theorem. □
The Hölder inequality yields more precise properties of the convolution product acting
on the spaces Lp .
Proposition 1.56 (i) Let p ∈ [1, +∞] and q be conjugate exponents, f ∈ Lp (λ) and g ∈
Lq (λ). Then
sup_{x∈R} |(f ∗ g)(x)| ≤ ‖f‖p ‖g‖q . (1.24)
(ii) Let p ∈ [1, +∞], f ∈ Lp (λ) and g ∈ L1 (λ). Then f ∗ g ∈ Lp (λ) and ‖f ∗ g‖p ≤ ‖f‖p ‖g‖1 .
Raising the pointwise inequality (1.23) to the power p and then integrating with respect to
the Lebesgue measure λ(dx), because |f|^p ∈ L1 the inequality (1.23) implies
‖ |f| ∗ |g| ‖p^p ≤ ‖ |f|^p ∗ |g| ‖1 ‖g‖1^{p/q} ≤ ‖ |f|^p ‖1 ‖g‖1^{1+p/q} = ‖f‖p^p ‖g‖1^p ,
since 1 + p/q = p.
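The convolution bounds above lend themselves to a quick numerical sanity check; in this sketch the particular f, g and the integration grid are our own illustrative choices, not taken from the text:

```python
import math

# Riemann-sum approximation of the convolution (f*g)(x) = ∫ f(x-y) g(y) dy
# on a uniform grid; f and g are arbitrary integrable illustrative functions.
dx = 0.02
grid = [k * dx for k in range(-500, 500)]            # grid covering [-10, 10[

def f(t):
    return math.exp(-abs(t))                         # ∫ |f| dλ = 2

def g(t):
    return 1.0 if 0.0 < t <= 1.0 else 0.0            # indicator of ]0, 1]

def conv(x):
    return sum(f(x - y) * g(y) for y in grid) * dx   # ≈ (f*g)(x)

norm1_f = sum(abs(f(y)) for y in grid) * dx          # ≈ ‖f‖_1
norm1_g = sum(abs(g(y)) for y in grid) * dx          # ≈ ‖g‖_1

# (1.24) with p = 1, q = ∞ : sup |f*g| ≤ ‖f‖_1 ‖g‖_∞, and here ‖g‖_∞ = 1.
sup_fg = max(conv(x) for x in grid)
assert sup_fg <= norm1_f * 1.0 + 1e-9
```

The same grid also illustrates the case p = 1 of (ii): for non negative f and g the total mass of the discrete convolution never exceeds the product of the individual masses.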
2 Probabilistic formulation
In all of this chapter, we consider a space Ω, described as all the possible values of an
experiment or of an observation « that depends on chance ».
The σ-algebra that describes the information defined by properties of the observations
is traditionally written F . The sets belonging to the σ-algebra are called events and the
elements of Ω are denoted ω. When the space Ω is finite or countable (for example N or
Nd ), the usual σ-algebra is the set of all subsets of Ω, so that F = P(Ω). When Ω is not
countable, for example when Ω = R, the σ-algebra P(Ω) of all subsets of Ω is too large to
carry the measures we need, and we are led to a smaller σ-algebra, for example the Borel σ-algebra R.
The probability P on F is such that P (A) numerically describes the « chances that
the property that defines A will be realized when we look at the results of a particular
experiment ».
The simplest probabilistic model is the one where the space Ω is finite. We often take as
σ-algebra F = P(Ω), the set of all subsets of Ω ; the uniform probability on F , defined by
P (A) = |A|/|Ω| for all A ⊂ Ω, describes mathematically the fact that all outcomes
occur in the same way (a fair coin, a fair die, a person chosen « at
random » in a population, ...)
This model is certainly very limited for translating more complex phenomena and we do
not take it up again here. The following results give the essential reminders of probability
theory using the tools of abstract measure from the preceding chapter.
We begin by reviewing the notions of measurable function and integral by use of the
usual terminology in probability. In the beginning it is a simple problem of vocabulary, but
rather soon problems specific to probability are reached.
Convention For every Borel function ψ : R → R, non negative or integrable with respect
to the Lebesgue measure λ, we will write
∫_R ψ dλ = ∫_R ψ(x) dx.
We remark that every discrete random variable is also a real random variable.
The null sets are the elements of F of P -measure zero, that is to say P -null. A property
true P -almost everywhere will be said true almost surely and denoted a.s.
We may thus restate part (i) of the Proposition 1.35 by saying that an integrable
random variable is finite almost surely. As an exercise we can prove that this property leads
to the following lemma, which is very useful in probability for showing the almost sure
convergence that we will study in Chapter 5.
In the spirit of the section 1.7, we introduce the spaces Lp and Lp . A random variable X
is said to be square integrable if it belongs to L2 := L2 (P ), of power p integrable if it belongs
to Lp := Lp (P ), 1 ≤ p < +∞ and (essentially) bounded if it belongs to L∞ := L∞ (P ). For
all p ∈ [1, +∞[, and every real random variable X,
‖X‖p = [E(|X|^p)]^{1/p} . (2.2)
In the following, we will make the abuse of language indicated at the end of section 1.7 by
identifying an element of the quotient space Lp with one of its representatives and a random
variable belonging to Lp with its equivalence class in Lp . Then, we will speak of the « p
norm » of a random variable X ∈ Lp , while we should rather speak of its semi-norm and of
the norm of the equivalence class of X. The Hölder inequality implies that the spaces Lp
are included one in the others.
Theorem 2.3 Let (Ω, F , P ) be a probability space. Then if 1 ≤ p1 ≤ p2 ≤ +∞, Lp2 ⊂ Lp1 .
More precisely, for every real random variable X,
‖X‖_{p1} ≤ ‖X‖_{p2} . (2.3)
Proof. The inequality (2.3) is clear if X ∉ Lp2 and we suppose that ‖X‖_{p2} < +∞. Ob-
viously, if X ∈ L∞ , |X| ≤ ‖X‖∞ a.s. and thus for all p1 ∈ [1, +∞[, |X|^{p1} ≤ ‖X‖∞^{p1} . By
integrating this inequality, we deduce (2.3) when p2 = ∞.
We suppose 1 ≤ p1 < p2 < +∞, and we write p = p2/p1 ∈]1, +∞[ and let q denote the
conjugate exponent of p. Then the Hölder inequality (1.14) applied to the function g = 1,
such that ‖g‖q = 1, implies
‖X‖_{p1}^{p1} = E(|X|^{p1} × 1) ≤ ( ∫ |X|^{p1 p} dP )^{1/p} × 1 = ‖X‖_{p2}^{p1} ,
since p1 p = p2 ; this proves (2.3). □
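The inclusion of Theorem 2.3 can be illustrated on a finite probability space; the values and weights below are an arbitrary toy choice of ours:

```python
# Theorem 2.3 on a finite probability space: for a simple random variable X
# taking value x_i with probability w_i, ‖X‖_{p1} ≤ ‖X‖_{p2} when p1 ≤ p2.
# The values and weights are an arbitrary illustrative choice.
values  = [0.5, 1.0, 3.0, 7.0]
weights = [0.1, 0.4, 0.3, 0.2]          # a probability: the weights sum to 1

def lp_norm(p):
    return sum(w * abs(x) ** p for x, w in zip(values, weights)) ** (1.0 / p)

norms = [lp_norm(p) for p in (1, 2, 3, 4)]
# The map p -> ‖X‖_p is non decreasing:
assert all(a <= b + 1e-12 for a, b in zip(norms, norms[1:]))
```

Note that this monotonicity uses the fact that the total mass is 1; it fails for a general measure, as the Lebesgue-measure examples after (1.10) show.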
Let g be an integrable function ; the dominated convergence theorem 1.36 proves that
E = {g} is equiintegrable. We infer that E = {f measurable : |f| ≤ |g|} is also an
equiintegrable family. The following result shows that a bounded sequence in Lp ,
p > 1, is uniformly integrable ; this will be useful to study stochastic processes.
Theorem 2.5 Let E be a family of measurable functions such that there exists p ∈]1, +∞[
satisfying M = sup{‖f‖p : f ∈ E} < +∞. Then, E is uniformly integrable.
Proof. Let q ∈]1, +∞[ be the conjugate exponent of p. For all a > 0, the Hölder inequality
implies, for f ∈ E, that
∫_{|f|≥a} |f| dP ≤ ( ∫ |f|^p dP )^{1/p} (P (|f| ≥ a))^{1/q} .
The Markov inequality (1.5) shows that P (|f| ≥ a) ≤ (1/a^p) ∫ |f|^p dP . We deduce that for every
function f ∈ E,
∫_{|f|≥a} |f| dP ≤ M^{1+p/q} a^{−p/q} ,
and the right-hand side, which does not depend on f, converges to 0 as a → +∞. □
Proof. If E is uniformly integrable, for all ε ∈]0, 1] there exists a > 0 such that for every
function f ∈ E, ∫_{|f|≥a} |f| dP ≤ ε, and thus ‖f‖1 ≤ a + 1, which proves (i). Furthermore, for
every measurable set A,
∫_A |f| dP = ∫_{A∩{|f|≥a}} |f| dP + ∫_{A∩{|f|<a}} |f| dP ≤ ε + a P (A),
Theorem 2.7 Let (fn , n ≥ 1) be a sequence of measurable functions that converge almost
surely to f and are equiintegrable. Then limn E(|f − fn |) = 0 and limn E(fn ) = E(f ).
Proof. Following the Proposition 2.6, the sequence (fn ) is bounded in L1 and the Fatou
Lemma 1.33 implies that f is integrable. For every a > 0,
E(|fn − f|) ≤ ∫_{|fn|≥a} |fn| dP + ∫_{|f|≥a} |f| dP + ∫ |fn 1_{|fn|<a} − f 1_{|f|<a}| dP.
For every ε > 0, the Definition 2.4 shows the existence of a > 0 such that for every function
g contained in the equiintegrable family E = {f} ∪ {fn : n ≥ 1}, we have ∫ |g| 1_{|g|≥a} dP ≤ ε.
Moreover, the sequence fn 1_{|fn|<a} − f 1_{|f|<a} converges almost surely to 0 and is dominated
by the constant 2a, which is integrable. The dominated convergence theorem thus shows
that ∫ |fn 1_{|fn|<a} − f 1_{|f|<a}| dP converges to 0. This completes the proof. □
Theorem 2.3 shows that the space L2 of square integrable random variables is included
in the space of integrable random variables. The following definition shows that, for every
square integrable random variable X, E(X) is the best constant that approximates X
for the ‖·‖2 norm. Indeed, if X is square integrable, E(X) is well defined, E(X − E(X)) = 0
and for every real number a, the linearity of the integral implies
E(|X − a|^2) = E(|X − E(X)|^2) + 2 (E(X) − a) E(X − E(X)) + (E(X) − a)^2
≥ E(|X − E(X)|^2).
Following the Proposition 1.35 (ii), a random variable is almost surely constant (and equal
to its expectation) if and only if it is square integrable and has zero variance. Clearly, if X is
integrable, E(aX + b) = aE(X) + b and if X is square integrable, V ar(aX + b) = a2 V ar(X).
The variance of a real variable measures the « dispersion » of X around its expectation.
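The minimization property of E(X) can be checked by brute force on a finite model; the distribution below is a toy choice of ours:

```python
# E|X - a|² is minimized at a = E(X): scan a grid of candidate constants a
# for a simple random variable (toy values; any square integrable X works).
values  = [1.0, 2.0, 4.0, 10.0]
weights = [0.3, 0.3, 0.3, 0.1]

mean = sum(w * x for x, w in zip(values, weights))

def mse(a):
    return sum(w * (x - a) ** 2 for x, w in zip(values, weights))

grid = [k * 0.001 for k in range(0, 12001)]          # candidates a in [0, 12]
best = min(grid, key=mse)
assert abs(best - mean) < 1e-3                       # the minimizer is E(X)
```

The gap mse(a) − mse(mean) equals (a − mean)², in agreement with the identity displayed above.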
The notion of image measure is crucial. The following theorem is a rephrasing of Theo-
rems 1.19 and 1.32.
The function ϕ is PX -integrable if and only if ϕ(X) is P -integrable and in this case we have
anew E(ϕ(X)) = ∫ ϕ dPX .
Moreover, if ν is a probability on R such that E(ϕ(X)) = ∫_R ϕ dν for every non negative
Borel function ϕ (or for every bounded Borel function ϕ), then ν = PX .
We remark that the distribution of X does not describe the probability P on all the σ-
algebra F , but only on the σ-algebra generated by X, σ(X) = {{X ∈ A} : A ∈ R}. Finally we
remark that for every probability ν on the Borel σ-algebra R, there exists a probability
space (Ω, F , P ) and a real random variable X : Ω → R of distribution ν. In fact it suffices
to take Ω = R, F = R, P = ν and X the identity on R.
When X : Ω → R, the Monotone Class Theorem 1.15 applied to the class C = {]a, b[:
a < b, a, b ∈ R} shows that the distribution of X is characterized by the probability of X
on the intervals of R. We may also decrease the class of intervals which are required to
characterize the distribution of X by considering C = {] − ∞, t], t ∈ R}.
Proposition 2.10 Let X : (Ω, F ) → (R, R) be a real random variable and P be a probability
on F . The distribution of X is characterized by the distribution function of X, that is by
the function F : R → [0, 1] defined by F (t) = P (X ≤ t) for all t ∈ R.
Moreover, the distribution function F is increasing, right continuous, lim t→−∞ F (t) = 0 and
limt→+∞ F (t) = 1.
When X : Ω → N is a discrete random variable, its distribution is characterized by the
set of non negative numbers (P (X = n), n ≥ 0) such that Σ_n P (X = n) = 1. In this case,
the distribution of X is written as a weighted series of Dirac masses, as
PX = Σ_{n≥0} P (X = n) δn .
In this case, for every Borel function ϕ which is non negative or such that ϕ(X) is integrable,
E(ϕ(X)) = Σ_{n≥0} ϕ(n) P (X = n).
In particular,
E(X) = Σ_{n≥0} n P (X = n) ∈ [0, +∞].
Example 2.11 Here are some examples of « usual » discrete distributions.
1) Bernoulli distribution of parameter p ∈ [0, 1], denoted B(p)
This distribution models an experiment with two possible results (heads or tails, success
or failure, ...) We encode the results as 0 (failure) and 1 (success). A random variable X :
Ω → {0, 1} follows a Bernoulli distribution of parameter p ∈ [0, 1] if P (X = 1) = p and
P (X = 0) = 1 − p. It is a bounded random variable, which thus belongs to all the Lp ,
1 ≤ p ≤ +∞ spaces. We have E(X) = p and V ar(X) = p(1 − p) ; we then find that X is
constant, that is « deterministic », if p = 0 or p = 1.
2) Binomial distribution of parameters n ≥ 1 and p ∈ [0, 1], denoted B(n, p)
This distribution models the total number of successes when we repeat n times « in an
independent way » the same experiment that every time follows the Bernoulli distribution of
parameter p. We will return to this in section 2.3. If a random variable X : Ω → {0, · · · , n}
follows a binomial distribution B(n, p), then for every integer k = 0, · · · , n, P (X = k) = C_n^k p^k (1 − p)^{n−k} .
Again, a binomial random variable B(n, p) belongs to all the Lp spaces for 1 ≤ p ≤ +∞.
We have E(X) = np and V ar(X) = np(1 − p).
Note that sometimes, in order to describe exactly the waiting time of the first success,
one says that the distribution of X is geometric with parameter a if for every integer n ≥ 1,
P (X = n) = (1 − a) a^{n−1} .
Definition 2.12 A real random variable X (or its distribution) has density f (with respect
to the Lebesgue measure) if its distribution PX has the density f with respect to λ, that is if
for every Borel set A ∈ R,
P (X ∈ A) = ∫ 1_A f dλ = ∫_A f (x) dx. (2.8)
The function f is Borel, non negative, such that ∫_R f (x) λ(dx) = 1 and for every Borel
function ϕ non negative (or such that ϕf is λ-integrable),
E(ϕ(X)) = ∫_R ϕ(x) f (x) dx. (2.9)
The validity of (2.9) for every Borel non negative (or bounded) function ϕ implies that
the distribution of X has the density f with respect to the Lebesgue measure λ. We say more
simply that X has density f .
Moreover, every non negative Borel function g on R such that I := ∫_R g(x) dx ∈ ]0, +∞[
gives, after normalization, a probability density on R ; it suffices to put f (x) = g(x)/I .
Let X be a real variable of density f . Then the distribution function F of X is the
indefinite integral of f , F (t) = ∫_{−∞}^t f (x) dx, and is continuous, so that P (X = t) = 0 for
all t ∈ R. Then it is natural to try to connect the derivative of F , if it exists, to f . This
point is rather delicate. We will admit that every increasing function G on the interval [a, b]
(thus Riemann integrable on [a, b]) that is also continuous on [a, b] is differentiable λ-almost
everywhere on [a, b]. Then, if G′ is its derivative (defined λ-almost everywhere) the following
inequality is true :
∫_a^b G′(x) dx ≤ G(b) − G(a). (2.10)
Indeed, for every integer n ≥ 1, let Gn (x) = n[G(x + 1/n) − G(x)] for a ≤ x ≤ b − 1/n and
Gn (x) = 0 for b − 1/n < x ≤ b. Then the functions Gn are non negative and converge
λ-almost everywhere to G′. The Fatou Lemma 1.33 allows the conclusion that
∫_a^b G′(x) dλ(x) ≤ lim inf_n ∫_a^b Gn (x) dλ(x).
Furthermore,
∫_a^b Gn (x) dλ(x) = n ∫_{b−1/n}^b G(x) dx − n ∫_a^{a+1/n} G(x) dx
= n (Φ(b) − Φ(b − 1/n)) − n (Φ(a + 1/n) − Φ(a)),
where we denote Φ(t) = ∫_a^t G(x) dx. Since G is continuous at a and at b, Φ admits a right-
hand derivative at a equal to G(a) and a left-hand derivative at b equal to G(b). The sequence
∫_a^b Gn (x) dλ(x) thus converges to G(b) − G(a).
The following example shows that the inequality (2.10) between the integral of the derivative
and the increment of G may be strict. If G(x) = 1_{[1/2,1]}(x), a = 0 and b = 1, G′(x) = 0 λ-almost
everywhere on [a, b] and ∫_0^1 G′(x) dx = 0 < G(1) − G(0) = 1. A more complex example shows
that this inequality may be strict even if G is continuous.
Nevertheless, the indefinite integral F of a function f integrable with respect
to the Lebesgue measure λ (such an F is called absolutely continuous) is λ-almost everywhere
differentiable and F′ = f λ-almost everywhere. This will not be proved here. We then deduce
Proposition 2.13 (i) Let X be a real random variable of density f . Then the distribution
function F of X is continuous, almost everywhere differentiable and F 0 (x) = f (x) λ-almost
everywhere.
(ii) Let X be a real random variable whose distribution function F is continuous ; then
F is differentiable almost everywhere. Moreover, if ∫_{−∞}^{+∞} F′(x) dx = 1, we deduce that X has
density F′ .
Proof. The point (i) is clear. For the point (ii), the differentiability of F λ-almost everywhere
is clear. For every integer n, we know by (2.10) that ∫_{−n}^n F′(x) dx ≤ F (n) − F (−n) ≤ 1.
Furthermore, since F is increasing, F′ ≥ 0 λ-almost everywhere. The monotone convergence
theorem then shows that ∫_R F′(x) dx ≤ 1, that is to say that F′ is integrable. If we write
Φ(t) = ∫_{−∞}^t F′(x) dx, the part (i) shows that Φ′ = F′ λ-almost everywhere. Furthermore,
∫_{−∞}^{+∞} F′(x) dx = 1 and for all t ∈ R, (2.10) and the monotone convergence theorem show
that ∫_{−∞}^t F′(x) dx ≤ F (t) while ∫_t^{+∞} F′(x) dx ≤ 1 − F (t). We then deduce that for every t,
Φ(t) = ∫_{−∞}^t F′(x) dx = F (t), which shows that Φ is the distribution function of a distribution
µ on R of density F′, and since, according to Proposition 2.10, the distribution function
characterizes the distribution, we deduce that X has density F′ . □
We calculate the distribution of the image of a real random variable X by a function Φ
in the following manner.
If φ is of non constant sign, 1_{]c,d[} φ is λ-integrable if and only if (φ ◦ Φ) |Φ′| 1_{]a,b[} is λ-integrable
and in this case, the equation (2.11) remains true. The random variable Y = Φ(X) almost
surely takes its values in the interval ]c, d[ and its density is the function g defined by g(y) = 0
if y ∉ ]c, d[ and, for y ∈]c, d[, if Ψ : ]c, d[→]a, b[ denotes the inverse function of Φ,
Example 2.15 The following examples of densities of random variables are classic.
1) Uniform distribution on the interval [a, b], written U ([a, b])
This distribution models a random phenomenon that takes the real values between a
and b, such that the probability of falling in an interval is proportional to its length, that is
to say that the values are placed « at random » in the interval [a, b].
A random variable X : Ω → R follows a uniform distribution on the interval [a, b], where
a and b are real numbers such that a < b, if its density is the function f = (1/(b − a)) 1_{[a,b]} . It only
takes values between a and b ; it is almost surely bounded and is contained in all the Lp
spaces with p ∈ [1, +∞].
We have E(X) = (a + b)/2, V ar(X) = (b − a)^2/12 and the distribution function F is such that
F (t) = 0 if t < a, F (t) = (t − a)/(b − a) if a ≤ t ≤ b, F (t) = 1 if t > b.
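These formulas can be verified by numerical integration of the density; the endpoints a, b below are illustrative values of ours:

```python
# Moments of the uniform distribution U([a, b]) by Riemann (midpoint-rule)
# integration of its density f = 1/(b - a) on [a, b].
a, b = 2.0, 5.0
n = 200_000
dx = (b - a) / n
xs = [a + (k + 0.5) * dx for k in range(n)]          # midpoints of subintervals
f = 1.0 / (b - a)

mean = sum(x * f * dx for x in xs)                   # ≈ E(X) = (a + b)/2
var  = sum((x - mean) ** 2 * f * dx for x in xs)     # ≈ Var(X) = (b - a)²/12

assert abs(mean - (a + b) / 2) < 1e-6
assert abs(var - (b - a) ** 2 / 12) < 1e-6

# Distribution function F(t) = (t - a)/(b - a) for a ≤ t ≤ b:
t = 3.5
F_t = sum(f * dx for x in xs if x <= t)
assert abs(F_t - (t - a) / (b - a)) < 1e-4
```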
Convention In the rest it will be convenient to commit the consistent abuse of notation of
identifying a vector of Rd with the column matrix of its components in the canonical basis.
If A designates the matrix associated to h in the canonical basis, we will write with this
convention : E(X) is the column matrix with entries E(X1 ), · · · , E(Xd ), and E(AX) = A E(X)
for the expectation of the vector h(X) = AX.
Theorem 2.17 (i) Let X and Y be real random variables that are square integrable. The
covariance of X and Y is a real number defined by
Cov(X, Y ) = E(XY ) − E(X)E(Y ) = E[(X − E(X)) (Y − E(Y ))] = Cov(Y, X). (2.13)
Furthermore, Cov(X, X) = V ar(X) and |Cov(X, Y )| ≤ √V ar(X) √V ar(Y ).
(ii) Let X = (X1 , · · · , Xd ) be a square integrable random vector. The covariance matrix
of X is the square d × d matrix denoted ΓX defined by
ΓX = (Cov(Xi , Xj ) : 1 ≤ i, j ≤ d) = E(X X̃) − E(X) Ẽ(X), (2.14)
where B̃ designates the transpose of the matrix B. It is a non negative, symmetric
matrix. More precisely, for every vector (a1 , · · · , ad ) ∈ Rd , if we identify a with the column
matrix of size (d, 1) of the ai ,
ã ΓX a = V ar( Σ_{i=1}^d ai Xi ) ≥ 0. (2.15)
Finally, for every linear function h : Rd → Rr with matrix A in the canonical basis, the
covariance matrix of h(X) = AX is Γ_{AX} = A ΓX Ã.
Finally, X = (X1 , · · · , Xd ) has density f if and only if for every Borel function ϕ : Rd →
[0, +∞[,
E(ϕ(X)) = ∫_{Rd} ϕ(x) f (x) dλd (x),
and this identity is also true if ϕ is Borel (of any sign) such that ϕ(X) is integrable, or
in an equivalent way, ϕf is λd -integrable.
The calculation of integrals with respect to the measure λd , the product measure of
the Lebesgue measure on R, is accomplished by application of the Fubini-Tonelli Theorem
1.53 or Fubini-Lebesgue Theorem 1.54. These theorems allow calculation of the densities of
« sub-vectors » of X.
denoting as y = (xi1 , · · · , xik ) and, with j1 < · · · < jd−k the elements of {1, · · · , d}\{i1 , · · · , ik },
z = (xj1 , · · · , xjd−k ),
In particular, if d = 2, each component Xi of the pair (X1 , X2 ) has a density, called the
ith marginal density fi , defined respectively by
f1 (x1 ) = ∫ f (x1 , x2 ) dx2 and f2 (x2 ) = ∫ f (x1 , x2 ) dx1 . (2.19)
(ii) A Borel function φ is integrable for the restriction of the Lebesgue measure on D if
and only if the function φ ◦ Φ |JΦ | is integrable for the restriction of the Lebesgue measure
on ∆. In this case, the equation (2.20) is again true.
2.3 Independence
It is necessary for us to more precisely formalize the implicit notions in the models
described for introducing the classical distributions, such as the Poisson distribution, the
geometric distribution, ...
The notion of independence is « purely probabilistic » ; its objective is to give a mathe-
matical formulation of successive experiments such that « the results of the early ones do not
influence the ones that follow », but it is not exactly equivalent to this intuitive property.
On the other hand, one must avoid confusing it with linear independence.
The central notion is that of independence of σ-algebras, to which all the other definitions
reduce.
Definition 2.23 (i) Two events A and B in F are independent if
P (A ∩ B) = P (A)P (B).
(ii) A family (Gk , 1 ≤ k ≤ n) of sub-σ-algebras of F is independent if
P (∩_{k=1}^n Ak ) = Π_{k=1}^n P (Ak ) , ∀Ak ∈ Gk , 1 ≤ k ≤ n. (2.22)
A sequence (Gk , k ≥ 1) of sub-σ-algebras of F is independent if for every integer
n, the family (Gk , 1 ≤ k ≤ n) is independent.
(iii) A family of sets (Ak , 1 ≤ k ≤ n) (resp. (Ak , k ≥ 1)) of events is independent if the
family of σ-algebras (σ(Ak ), 1 ≤ k ≤ n) (resp. (σ(Ak ), k ≥ 1) ) is independent.
(iv) A finite family of random vectors (Xk : 1 ≤ k ≤ n) (resp. a sequence of random
vectors (Xk , k ≥ 1)), where Xk : Ω → Rdk , is independent if the family of the σ-algebras
σ(Xk ) = Xk−1 (Rdk ) is independent.
(v) A random vector X : Ω → Rd and a sub-σ-algebra G ⊂ F are independent if the
σ-algebras σ(X) = X −1 (Rd ) and G are independent.
We immediately see that the events A and B are independent if and only if the σ-
algebras σ(A) = {∅, Ω, A, Ac } and σ(B) are independent. Furthermore, if the random vectors
Xk : Ω → Rdk , 1 ≤ k ≤ n, are independent, and if Φk : Rdk → Rrk are Borel functions, then the
random vectors Yk = Φk ◦ Xk : Ω → Rrk are independent.
The following result characterizes the distribution of a random vector (X1 , · · · , Xd ) whose
components are independent (or blocks of whose components are independent). Its proof
relies on the fact that a probability on Rd is characterized by the values that it takes on
the boxes Π_{i=1}^d [ai , bi ].
Theorem 2.24 Let Xi : Ω → Rdi , 1 ≤ i ≤ n, be random vectors, k ∈ {1, · · · , n − 1} and
Y = (X1 , · · · , Xk ), Z = (Xk+1 , · · · , Xn ) be sub-vectors of X = (Y, Z) = (X1 , · · · , Xn ). Then
the following properties are equivalent :
(i) The random vectors Y and Z are independent.
(ii) The distribution of X on R^{d1+···+dn} is equal to the product of the distributions of Y
and of Z respectively on R^{d1+···+dk} and on R^{dk+1+···+dn}, that is to say P(Y,Z) = PY ⊗ PZ .
In the particular case where the random vectors Y and Z have for density, respectively,
g and h, these two properties are also equivalent to
(iii) The density of the pair of random vectors X = (Y, Z) is the « product » of the
densities g and h ; more precisely, it is the function f of x = (y, z) defined by
f (y, z) = g(y)h(z). (2.23)
This result extends to a finite number of sub-vectors of X. In the particular case of real
random variables Xi that have density fi , the random variables X1 , · · · , Xn are independent
if and only if the density of the vector X = (X1 , · · · , Xn ) on Rn is the function f defined
by f (x1 , · · · , xn ) = f1 (x1 ) · · · fn (xn ).
We deduce from this characterization of independence that if the random vectors Xk ,
1 ≤ k ≤ n, are independent, then for all k = 1, · · · , n − 1 the vectors Y = (X1 , · · · , Xk ) and
Z = (Xk+1 , · · · , Xn ) are independent, and consequently that if Φ : R^{d1+···+dk} → Rl and Ψ :
R^{dk+1+···+dn} → Rr are Borel functions, the random vectors Φ(Y ) and Ψ(Z) are independent.
Again, this property extends to a finite number of sub-vectors of X = (X1 , · · · , Xn ).
The theorems of Fubini-Tonelli 1.53 and of Fubini-Lebesgue 1.54 indicate that if X and
Y are independent real random variables of densities, respectively, f and g, then X + Y has
for density the convolution product f ∗ g defined by (1.22). The Fubini theorems imply the
following result.
Theorem 2.25 Let Xi : Ω → Rdi , i = 1, 2 be random independent vectors and Φi : Rdi → R
be Borel functions.
(i) If the functions Φi are non negative,
E(Φ1 (X1 )Φ2 (X2 )) = E(Φ1 (X1 )) E(Φ2 (X2 )). (2.24)
(ii) If the functions Φi are such that the random variables Φi (Xi ) are integrable, the
random variable Φ1 (X1 )Φ2 (X2 ) is also integrable and the equation (2.24) remains true.
We remark that independence makes it possible to weaken the square integrability hypothesis
needed for the product of real random variables to be integrable. We deduce in particular that if two
random variables X and Y are independent and integrable, the product XY is integrable
and that E(XY ) = E(X)E(Y ), that is to say that the covariance Cov(X, Y ) = 0. The
converse is false, as shown in the following example (which has other properties that will be
laid out later).
Example 2.26 Let X be a N (0, 1) random variable, a > 0, and Y be the random va-
riable defined by Y = X 1_{|X|>a} − X 1_{|X|≤a} . Then the random variable Y is equally
N (0, 1), but is a non-constant function of X, so it is not independent of X. If we write
G(t) = ∫_0^t x^2 (1/√(2π)) exp(−x^2/2) dx, the function G : [0, +∞[→ [0, 1/2[ is continuous, G(0) = 0 and
lim_{t→+∞} G(t) = 1/2. The intermediate value theorem then shows that there exists a > 0
(which is furthermore unique, since G is strictly increasing) such that G(a) = 1/4. For this
value of a, E(XY ) = E(X^2 1_{|X|>a}) − E(X^2 1_{|X|≤a}) = 1 − 4G(a) = 0, thus Cov(X, Y ) = 0.
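This construction can be checked numerically. The closed form G(t) = Φ(t) − tφ(t) − 1/2 used below is our own derivation by integration by parts (with φ, Φ the standard normal density and distribution function); the constant a is then found by bisection and the covariance estimated by Monte Carlo with a fixed seed:

```python
import math, random

# Example 2.26 numerically: find a with G(a) = 1/4, where
# G(t) = ∫_0^t x² φ(x) dx = Φ(t) − t φ(t) − 1/2 (integration by parts),
# then check that Cov(X, Y) ≈ 0 although Y = ±X depends on X.
phi = lambda t: math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
Phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
G   = lambda t: Phi(t) - t * phi(t) - 0.5

lo, hi = 0.0, 10.0
for _ in range(100):                       # bisection for G(a) = 1/4
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if G(mid) < 0.25 else (lo, mid)
a = (lo + hi) / 2

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(200_000)]
ys = [x if abs(x) > a else -x for x in xs]

cov_xy = sum(x * y for x, y in zip(xs, ys)) / len(xs)    # E(XY); means are 0
assert abs(cov_xy) < 0.05                                # uncorrelated ...
assert all(abs(x) == abs(y) for x, y in zip(xs, ys))     # ... but |Y| = |X|
```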
On the other hand, we deduce from Theorem 2.25 the
Corollary 2.27 Let X = (X1 , · · · , Xd ) : Ω → Rd be a random vector.
(i) If the components Xi of X are pairwise independent, the covariance matrix of X is
diagonal.
(ii) If k ∈ {1, · · · , d−1} is such that the vectors Y = (X1 , · · · , Xk ) and Z = (Xk+1 , · · · , Xd )
are independent, then the covariance matrix ΓX of X is block diagonal, that is to say
ΓX = ( ΓY 0 ; 0 ΓZ ) .
The converses of the two results of the Corollary 2.27 are false.
2.4 Simulation
xn+1 = a xn + c (mod m) ;
the initial value x0 is called the seed, a is the multiplier, c is the increment and m the modulus
of the sequence. The sequence (xn ) takes its values between 0 and m − 1 and the sequence
(xn /m , n ≥ 1) takes its values in the interval [0, 1[. The maximal period of such a generator
is m and it is important, for the simulation of large samples of a given distribution, to
have generators with a large period. The period of the Mersenne Twister is of the order of 10^6000 .
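A minimal sketch of such a congruential generator; the constants below are the classic choice popularized by Numerical Recipes (an illustrative assumption, any full-period parameters a, c, m would do):

```python
# A minimal linear congruential generator x_{n+1} = a x_n + c (mod m),
# returning the normalized values x_n / m in [0, 1[.
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m

gen = lcg(seed=12345)
sample = [next(gen) for _ in range(10_000)]

assert all(0.0 <= u < 1.0 for u in sample)
# Crude uniformity check: the sample mean should be close to 1/2.
assert abs(sum(sample) / len(sample) - 0.5) < 0.02
```

Such generators are fast but have short periods and known defects; this is why modern libraries prefer the Mersenne Twister.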
We suppose then that we know how to simulate the realization of a sample of uniform
distribution on [0, 1], that is to say a numerical sequence (un , n ≥ 0) of reals in [0, 1], which is
a realization (Un (ω), n ≥ 0) for a sequence (Un , n ≥ 0) of independent random variables with
the same uniform distribution U ([0, 1]), for example by executing the command Random in a
program. We proceed to write three classical methods of simulation of other real distributions
from (Un ).
Proof. We show first of all that for all u ∈]0, 1[ and t ∈ R, F^{−1}(u) ≤ t if and only if
u ≤ F (t). In fact, if u ≤ F (t), by definition F^{−1}(u) ≤ t. Conversely, let y > t ≥ F^{−1}(u) ;
then, because F is increasing, F (y) ≥ u and, because F is right-continuous, when y > t
converges to t, we deduce F (t) ≥ u. □
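For a distribution function with an explicit inverse, this inversion method is immediate to implement; the exponential distribution E(λ), with F(t) = 1 − e^{−λt} and F^{−1}(u) = −ln(1 − u)/λ, is a standard example of our choosing:

```python
import math, random

# Inversion method: if U ~ U([0,1]) then F⁻¹(U) has distribution function F.
# Example with the exponential distribution E(λ): F(t) = 1 − exp(−λ t) on
# [0, +∞[, whose inverse is F⁻¹(u) = −ln(1 − u)/λ.
lam = 2.0

def F_inv(u):
    return -math.log(1.0 - u) / lam

random.seed(42)
sample = [F_inv(random.random()) for _ in range(100_000)]

mean = sum(sample) / len(sample)
assert abs(mean - 1.0 / lam) < 0.01       # E(X) = 1/λ for X ~ E(λ)
assert all(x >= 0 for x in sample)
```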
If the distribution function F of X is explicit, we deduce that (F^{−1}(Un ), n ≥ 1) is a
sample of the distribution of X. This furnishes for example a simulation algorithm when :
Case 1. X takes a finite (or countable) number of values. We suppose that the
values taken by X are (ai , 0 ≤ i ≤ N ), ordered in increasing manner, and that P (X = ai ) =
pi for all i. We then calculate Fi = p0 + · · · + pi for all i, and for all u ∈]0, 1[ we write :
F^{−1}(u) = a0 1_{u≤F0} + Σ_{i≥1} ai 1_{Fi−1<u≤Fi} .
Example of a Bernoulli distribution of parameter p : P (X = 0) = q = 1 − p and
P (X = 1) = p . We deduce the simulation of n independent random variables of the same
Bernoulli distribution of parameter p ∈]0, 1[ that we place in the table X (by using the fact
that if U follows a uniform distribution U ([0, 1]), 1 − U also follows a uniform distribution
U ([0, 1])) :
For k = 1, ..., n
If (Random < p)
X[k] ← 1
Else X[k] ← 0
End
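As a sketch, the same loop in Python, with `random.random()` playing the role of Random (the function name is ours) :

```python
import random

def bernoulli_sample(p, n):
    """Simulate n independent Bernoulli(p) variables: each draw compares
    a uniform U([0,1]) variable to p, exactly as in the pseudocode above."""
    return [1 if random.random() < p else 0 for _ in range(n)]

sample = bernoulli_sample(0.3, 100_000)
print(sum(sample) / len(sample))  # empirical frequency of 1, close to p = 0.3
```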
X takes a finite number of values : If X takes N + 1 values, at the beginning of the
program we calculate the values of Fi that we place in the table F [i], i = 0, · · · , N and we
equally place the values ai in a table a[i]. The critical loop is then the following :
i←0
U ← Random
While (U > F [i])
i←i+1
End
X[k] ← a[i]
We deduce that, simulating independent random variables (Ui , i ≥ 1) with the same
distribution U ([0, 1]), if n(ω) designates the first integer such that U1 U2 · · · Un(ω)+1 < e^{−λ} ,
then n follows a Poisson distribution P(λ). Hence a simulation algorithm for a random
variable X of distribution P(λ) reads :
a ← exp(−λ), X ← 0
U ← Random
While (U > a) do
U ← U ∗ Random , X ← X + 1
End
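A Python sketch of this Poisson algorithm (the function name is ours) :

```python
import math
import random

def poisson_sample(lam):
    """Multiply independent U([0,1]) draws until the product falls below
    exp(-lam); the number of multiplications performed is Poisson(lam)."""
    threshold = math.exp(-lam)
    x, u = 0, random.random()
    while u > threshold:
        u *= random.random()
        x += 1
    return x

sample = [poisson_sample(2.0) for _ in range(100_000)]
print(sum(sample) / len(sample))  # close to lam = 2.0
```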
Do X ← A
While C false
End
Return X
gives a simulation of the uniform distribution on C. In fact, let (Xn , n ≥ 1) be a sequence of
independent random variables of uniform distribution on D and τ = inf{n ≥ 1 : Xn ∈ C} ;
the preceding algorithm returns the random variable Xτ such that for all Borel subsets
B ⊂ C,
P (Xτ ∈ B) = Σk≥1 P ({τ = k} ∩ {Xk ∈ B}) = Σk≥1 P (X1 ∉ C)^{k−1} P (Xk ∈ B)
           = Σk≥1 (1 − |C|/|D|)^{k−1} |B|/|D| = (|B|/|D|) / (|C|/|D|) = |B|/|C| .
The following figure shows a rejection method for the simulation of the uniform distribution
on the unit disk from a uniform distribution on the square [−1, 1]^2 . Among 10 000 points
drawn in the square, only 7 848 are kept because they are in the unit disk ; this is consistent
with the equality π/4 ≈ 0.785 398. This simulation has been obtained as follows :
[Figure : 10 000 points drawn uniformly in the square [−1, 1]^2 ; only the points falling in
the unit disk are kept.]
Do U ← 2 ∗ Random − 1, V ← 2 ∗ Random − 1
While (U ∗ U + V ∗ V > 1)
End
X ← U and Y ← V
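The same loop in Python (function name ours) ; each call keeps drawing in the square until the point falls in the unit disk :

```python
import random

def uniform_in_disk():
    """Rejection method: draw (U, V) uniformly in the square [-1, 1]^2
    and keep the pair only when it falls inside the unit disk."""
    while True:
        u = 2.0 * random.random() - 1.0
        v = 2.0 * random.random() - 1.0
        if u * u + v * v <= 1.0:
            return u, v

points = [uniform_in_disk() for _ in range(10_000)]
# Each accepted point is uniform on the disk; by symmetry each coordinate
# has mean 0.
print(sum(x for x, _ in points) / len(points))
```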
We now write the general rejection method. The idea lies in the following result : simulating
a random variable of density f amounts to drawing a point at random under the graph of f
and returning the abscissa of this point. In fact, if (X, Y ) is a random variable with uniform
distribution under the graph of the function f , then for all t ∈ R,
P (X ≤ t) = ∫_{−∞}^{t} ∫_{0}^{f (x)} dy dx = ∫_{−∞}^{t} f (x) dx.
We remark that this method applies to the case where the random variables X and Y have
a density with respect to the same measure (which is not necessarily the Lebesgue measure,
but may be the counting measure). □
Application to the Gamma distribution. The rejection method allows for example the
simulation of a random variable of distribution Γ(λ, a), that is to say of density
f (x) = (λ^a /Γ(a)) exp(−λ x) x^{a−1} 1]0,+∞[ (x),
where λ and a are strictly positive parameters and Γ(a) = ∫_{0}^{+∞} e^{−x} x^{a−1} dx.
If X and Y are independent random variables of distribution Γ(λ, a) and Γ(λ, b) respectively,
the random variable X + Y follows a distribution Γ(λ, a + b). Moreover, the distribution
Γ(λ, 1) is an exponential distribution E(λ). Adding n independent exponential random
variables E(λ) thus leads to a random variable of distribution Γ(λ, n), for every integer n
greater than or equal to 1.
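For integer shapes this gives an immediate simulation method, sketched below in Python (each E(λ) variable is obtained by inverse CDF, X = − ln(U )/λ ; names are ours) :

```python
import math
import random

def gamma_integer_shape(lam, n):
    """Sum of n independent E(lam) variables, hence a Gamma(lam, n) variable.
    Each exponential is simulated by inverse CDF: -ln(U)/lam with U uniform."""
    return sum(-math.log(1.0 - random.random()) / lam for _ in range(n))

sample = [gamma_integer_shape(2.0, 3) for _ in range(100_000)]
print(sum(sample) / len(sample))  # mean of Gamma(2, 3) is n/lam = 1.5
```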
Finally, a change of variables shows that if Y follows a distribution Γ(1, a), the random
variable X = Y /λ follows a distribution Γ(λ, a). Hence, in order to simulate all the distributions
Γ(λ, a), it suffices to know how to simulate a random variable of distribution Γ(1, a) for a
parameter a ∈]0, 1[ , which is possible by the following rejection method of Ahrens and Dieter
(1974) modified by Best (1983). Note an important numerical feature of this method : it is
not necessary to calculate Γ(a).
Let a ∈]0, 1[, let f (x) = (1/Γ(a)) e^{−x} x^{a−1} 1]0,+∞[ (x) and
g(x) = (a e/(a + e)) ( x^{a−1} 1]0,1[ (x) + e^{−x} 1[1,+∞[ (x) ) ;
then f ≤ ((a + e)/(a e Γ(a))) g and for all x > 0 :
q(x) = f (x) / ( ((a + e)/(a e Γ(a))) g(x) ) = e^{−x} 1]0,1[ (x) + x^{a−1} 1[1,+∞[ (x) .
Let Y be a random variable of density g ; we may easily calculate the distribution function
G of Y and its inverse is defined for z ∈]0, 1[ by :
G^{−1} (z) = ( (a + e) z / e )^{1/a} 1]0, e/(a+e)[ (z) − ln( (a + e)(1 − z)/(a e) ) 1[e/(a+e), 1[ (z) .
(1) We simulate a random variable U of uniform distribution U ([0, 1]) and calculate
Y = G^{−1} (U ). We then simulate V with uniform distribution U ([0, 1]), independent of U .
(2) If V ≤ q(Y ), we set X = Y ; if not, we return to (1).
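A Python sketch of this rejection scheme, combining the inverse G^{−1} and the acceptance ratio q given above (function name ours ; note that Γ(a) is indeed never evaluated) :

```python
import math
import random

def gamma_one_a(a):
    """Rejection sampling of Gamma(1, a) for 0 < a < 1, following the
    Ahrens-Dieter/Best scheme: simulate Y of density g by inverse CDF,
    then accept Y with probability q(Y)."""
    e = math.e
    cut = e / (a + e)  # G^{-1} changes branch at z = e/(a+e)
    while True:
        z = random.random()
        if z < cut:
            y = ((a + e) * z / e) ** (1.0 / a)  # branch with y in ]0, 1[
            q = math.exp(-y)
        else:
            y = -math.log((a + e) * (1.0 - z) / (a * e))  # y in [1, +oo[
            q = y ** (a - 1.0)
        if random.random() <= q:  # step (2): accept when V <= q(Y)
            return y

sample = [gamma_one_a(0.5) for _ in range(100_000)]
print(sum(sample) / len(sample))  # mean of Gamma(1, 0.5) is a = 0.5
```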
Proposition 2.31 Let U1 and U2 be independent random variables of the same uniform
distribution U ([0, 1]) ; then the random variables
X1 = √(−2 ln(U1 )) cos(2 π U2 ) and X2 = √(−2 ln(U1 )) sin(2 π U2 )
are independent and follow the same Gaussian distribution N (0, 1).
[Figure : histogram of the simulated values (« Valeurs ») against the N (0, 1) density, on
the interval [−6, 6].]
The above figure shows the histogram of the simulation of Gaussian random variables
N (0, 1) by the Box-Muller method with the help of 10 000 pairs of independent uniform
draws, and the graph of the theoretical density.
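A Python sketch of the Box-Muller method of Proposition 2.31 (we replace U1 by 1 − U1 , which has the same distribution, so that the logarithm is always defined) :

```python
import math
import random

def box_muller():
    """One Box-Muller draw: two independent U([0,1]) variables U1, U2
    give two independent N(0,1) variables."""
    u1 = 1.0 - random.random()  # in ]0, 1], so log(u1) is well defined
    u2 = random.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

pairs = [box_muller() for _ in range(10_000)]
sample = [x for pair in pairs for x in pair]
mean = sum(sample) / len(sample)
var = sum(x * x for x in sample) / len(sample) - mean ** 2
print(mean, var)  # close to 0 and 1
```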
3 Conditional Expectation
(ii) Let A ∈ F and B ∈ F be events such that P (A) > 0, P (Ac ) > 0 and P (B) > 0.
Then
P (A/B) = P (B/A)P (A) / ( P (B/A)P (A) + P (B/Ac )P (Ac ) ) .
(iii) Let (Ak , 1 ≤ k ≤ n) be a family of measurable sets such that P (∩_{k=1}^{n−1} Ak ) > 0. Then
P (∩_{k=1}^{n} Ak ) = P (A1 )P (A2 /A1 )P (A3 /A1 ∩ A2 ) · · · P (An /A1 ∩ ... ∩ An−1 ).
Theorem 3.3 (Orthogonal Projection Theorem) (i) Let H be a closed vector subspace of
L2 . For all X ∈ L2 , the following properties are equivalent for an element Π(X) ∈ H :
(a) kX − Π(X)k2 = inf{kX − Zk2 : Z ∈ H}.
(b) hX − Π(X), Zi = 0 for all Z ∈ H.
(ii) For all X ∈ L2 , there exists a unique element Π(X) ∈ H that satisfies the properties
(a) or (b), called the orthogonal projection of X on H.
(iii) The mapping Π : L2 → H defined by X 7→ Π(X) is linear and kΠ(X)k2 ≤ kXk2 .
Proof. (i) First of all we show that (a) implies (b). Let Z ∈ H ; for all λ ∈ R, Π(X) + λZ ∈ H,
hence
kX − Π(X)k22 ≤ kX − Π(X) − λZk22 = kX − Π(X)k22 − 2λhX − Π(X), Zi + λ2 kZk22 .
We deduce that the trinomial λ2 kZk22 − 2λhX − Π(X), Zi ≥ 0 for all λ ∈ R ; the discriminant
of this trinomial is thus negative or zero, which implies hX − Π(X), Zi = 0, that is (b).
Conversely, we suppose that (b) is true and let Z ∈ H. Then Π(X) − Z ∈ H and
kX − Zk22 = kX − Π(X)k22 + kΠ(X) − Zk22 ≥ kX − Π(X)k22 , which proves (a).
(ii) Let m = inf{kX − Zk22 : Z ∈ H} and for all n ≥ 1, let Zn ∈ H such that kX − Zn k22 ≤
m + 1/n. The parallelogram identity ka + bk22 + ka − bk22 = 2kak22 + 2kbk22 ,
applied to a = X − Zn and b = X − Zn+k for all integers n ≥ 1 and k ≥ 1, together with the
fact that (Zn + Zn+k )/2 ∈ H, thus shows
4 kX − (Zn + Zn+k )/2k22 + kZn − Zn+k k22 = 2kX − Zn k22 + 2kX − Zn+k k22 ,
which implies, since kX − (Zn + Zn+k )/2k22 ≥ m,
kZn − Zn+k k22 ≤ 2 (m + 1/n) + 2 (m + 1/(n + k)) − 4m ≤ 4/n .
The sequence (Zn ) is thus Cauchy and converges in L2 to a limit Π(X). Since H is closed,
we deduce that Π(X) ∈ H ; on the other hand kX − Π(X)k22 = m. Let Y ∈ H be another
element that also satisfies kX − Y k22 = m. We again apply the parallelogram identity with
a = X − Π(X) and b = X − Y . Then, because (Y + Π(X))/2 ∈ H,
4m + kY − Π(X)k22 ≤ 4 kX − (Y + Π(X))/2k22 + kY − Π(X)k22 = 2kak22 + 2kbk22 = 2m + 2m,
hence kY − Π(X)k22 = 0, that is to say Y = Π(X) a.s., which proves the uniqueness.
(iii) The characterization of Π(X + λY ) given in (b) then shows that Π(X + λY ) = Π(X) + λΠ(Y ).
Finally, the orthogonality of Π(X) ∈ H and X − Π(X) given by the property (b) shows
that kXk22 = kΠ(X)k22 + kX − Π(X)k22 ≥ kΠ(X)k22 , which completes the proof. □
Example 3.4 • In the particular case where H is the vector subspace generated by the
function 1, that is to say the set of constants, we recover the fact that the best approximation
in L2 of a square integrable random variable by a constant is its expectation E(X). The
quadratic error, E(|X − E(X)|2 ) is the variance of X.
• If X designates a square integrable random variable, and if H designates the vector subspace
of L2 generated by the random variables 1 and X, or in an equivalent way by the random
variables 1 and X − E(X), we see that H is of dimension less than or equal to two and is thus
closed. In this case, for every element Y of L2 , Π(Y ) = a + b (X − E(X)) and the constants a
and b are characterized by the following properties, which consist of writing the orthogonality
of Y − Π(Y ) with the generating family of H formed by 1 and X − E(X) :
hY − a − b (X − E(X)) , 1i = E( Y − a − b (X − E(X)) ) = 0 ,
hY − a − b (X − E(X)) , X − E(X)i = E( [Y − a − b (X − E(X))] [X − E(X)] ) = 0.
If V ar(X) > 0, solving this system gives a = E(Y ), b = Cov(X, Y )/V ar(X), hence
Π(Y ) = E(Y ) + ( Cov(X, Y )/V ar(X) ) (X − E(X)) .
Furthermore kY − Π(Y )k22 = V ar(Y ) (1 − ρ2 (X, Y )), where ρ(X, Y ) = Cov(X, Y )/√(V ar(X)V ar(Y )) ∈
[−1, +1] according to the Schwarz inequality. We deduce that if X and Y are correlated,
kY − Π(Y )k22 < V ar(Y ), that is to say that the use of X has allowed the reduction of the
quadratic error of the approximation of Y . However, if X and Y are independent, or more
generally if they are not correlated, the use of X has not improved the approximation of Y .
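The coefficients a = E(Y ) and b = Cov(X, Y )/V ar(X) of the projection can be estimated from simulated data ; a small Python sketch (the model Y = 2X + noise is purely illustrative) :

```python
import random

# Simulate a correlated pair (X, Y) with Y = 2X + noise, then estimate the
# coefficients of Pi(Y) = E(Y) + Cov(X,Y)/Var(X) * (X - E(X)) empirically.
n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [2.0 * x + random.gauss(0.0, 1.0) for x in xs]

mean_x = sum(xs) / n
mean_y = sum(ys) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
b = cov_xy / var_x  # close to the true slope 2
print(mean_y, b)
```

In this illustrative model ρ2 (X, Y ) = 4/5, so using X reduces the quadratic error of the approximation of Y by the factor 1 − ρ2 = 1/5.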
To extend this theorem to the case where X is non negative or integrable, it is
necessary to solve two problems : the existence and the uniqueness of the extension.
Lemma 3.6 Let X and Y be G-measurable random variables, which are either both non
negative, or both integrable, and such that E(1A X) ≤ E(1A Y ) (resp. E(1A X) = E(1A Y ))
for all A ∈ G. Then X ≤ Y a.s. (resp. X = Y a.s.).
Proof. For all a < b, we write F (a, b) = {Y ≤ a < b ≤ X}. Then {Y < X} =
∪a<b, a,b∈Q F (a, b) and it suffices to prove that P (F (a, b)) = 0 for all a < b. We suppose
that P (F (a, b)) > 0 ; then
E(1F (a,b) Y ) ≤ aP (F (a, b)) < bP (F (a, b)) ≤ E(1F (a,b) X),
which contradicts the hypothesis E(1A X) ≤ E(1A Y ) applied to A = F (a, b). Hence X ≤ Y
a.s. ; the case of equality follows by exchanging the roles of X and Y . □
The following result allows us to show the existence of the announced extensions.
Theorem 3.7 Let X be a non negative (resp. integrable) random variable. Then there exists
a non negative (resp. integrable) random variable E G (X), unique up to almost sure equiva-
lence, such that
E(1A X) = E(1A E G (X)) , ∀A ∈ G. (3.3)
Proof. The uniqueness comes from Lemma 3.6 and it thus suffices to prove the existence.
Let X ≥ 0 and for all n, Xn = X ∧ n ∈ L2 . Using Theorem 3.5 for every integer
n ≥ 1, E G (Xn ) ∈ L2 is such that for all A ∈ G, E(1A Xn ) = E(1A E G Xn ). Because the
sequence Xn is increasing, for all A ∈ G and for n ≥ 1, E(1A Xn ) ≤ E(1A Xn+1 ), hence
E(1A E G (Xn )) ≤ E(1A E G (Xn+1 )). Using Lemma 3.6, we deduce that the sequence E G (Xn )
is almost surely increasing. It thus converges almost surely to a G-measurable, non negative,
random variable, written Y . Furthermore, the monotone convergence theorem implies that
for all A ∈ G, E(1A Y ) = limn E(1A E G (Xn )) = limn E(1A Xn ) = E(1A X). We then deduce
that Y = E G (X). Moreover, if X is integrable, when A = Ω we deduce that E G (X) is
likewise integrable.
Let X = X + − X − ∈ L1 . Then the random variables X + and X − are non negative and
integrable ; both random variables E G (X + ) and E G (X − ) are thus also integrable (and hence
a.s. finite) and G-measurable. It then suffices to put E G (X) = E G (X + ) − E G (X − ). □
The following proposition collects the immediate consequences of Theorem 3.7 and of
Lemma 3.6 and generalizes the usual properties of expectation. Its proof is left as an exercise.
Proposition 3.8 (i) E G (1) = 1 and if G = {∅, Ω}, E G (X) = E(X) a.s. for all non negative
or integrable random variables X.
(ii) For X, Y integrable, a, b ∈ R (resp. for X, Y non negative, a, b ∈ [0, +∞[),
Theorem 3.5 thus shows that for every random vector X : Ω → Rd and for every
square integrable random variable Y : Ω → R, there exists a unique (up to almost sure
equivalence) square integrable random variable ϕ ◦ X satisfying (3.4).
Theorem 3.7 finally shows that for every random vector X : Ω → Rd and for every
non negative or integrable random variable Y : Ω → R, there exists a unique (up to almost
sure equivalence) non negative or integrable random variable ϕ ◦ X such that for all B ∈ Rd
E( Y 1B (X) ) = E( ϕ(X) 1B (X) ) . (3.5)
We will only make the explicit calculations in two particular simple cases, which cover
a large number of situations.
Therefore we fix k ∈ N such that P (X = k) > 0 and let ψk = 1{k} be the indicator
function of the singleton {k}. Then ψk ≥ 0, E(ψk (X)2 ) = P (X = k) < +∞ and following
(3.4),
E( Y 1{X=k} ) = E( Σ_{n : P (X=n)>0} ϕ(n) 1{X=n} 1{X=k} ) .
We see that for each set {X = k}, E(Y /X = k) = E( Y 1{X=k} ) / P (X = k) is the expectation of Y
with respect to the conditional probability P (./X = k). In fact, this property is immediate
if Y = 1A , A ∈ F , next extended to step random variables by linearity, to non negative
random variables by monotone convergence and then to integrable random variables by the
difference of the positive and negative parts.
We may also suppose that the conditional distribution of Y given X = k has a density gk
with respect to the Lebesgue measure. In this case, the random variable Y takes real values
and for every Borel set A ∈ R and every integer k, the equality
P (X = k, Y ∈ A) := P ({X = k} ∩ {Y ∈ A}) = P (X = k) ∫_A gk (y) dy
Example 3.11 Let X and Y be independent random variables of Poisson distribution with
parameters λ and µ respectively. We wish to calculate the distribution of X given the sum
S = X +Y which follows a Poisson distribution of parameter λ+µ. For every pair of integers
n ≥ k ≥ 0 it is necessary to calculate
P (X = k / S = n) = P (X = k) P (Y = n − k) / P (S = n) = C_n^k λ^k µ^{n−k} /(λ + µ)^n = C_n^k p^k (1 − p)^{n−k} ,
with p = λ/(λ + µ).
The distribution of X given X + Y = n is thus the binomial distribution B(n, p) and
E(X|X + Y = n) = np, which immediately implies E(X|S) = pS. A similar calculation for
the expectation and the variance of a binomial distribution gives E(X 2 |S) = p2 S 2 +p(1−p)S.
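The identification of this conditional distribution can be checked numerically by computing both sides exactly ; a Python sketch (function names ours) :

```python
import math

def cond_prob(k, n, lam, mu):
    """P(X = k | X + Y = n) computed directly from the Poisson probabilities
    of X, of Y and of S = X + Y."""
    px = math.exp(-lam) * lam ** k / math.factorial(k)
    py = math.exp(-mu) * mu ** (n - k) / math.factorial(n - k)
    ps = math.exp(-(lam + mu)) * (lam + mu) ** n / math.factorial(n)
    return px * py / ps

def binom_pmf(k, n, p):
    """Probability P(Z = k) for Z of binomial distribution B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

lam, mu, n = 2.0, 3.0, 7
p = lam / (lam + mu)
for k in range(n + 1):
    assert abs(cond_prob(k, n, lam, mu) - binom_pmf(k, n, p)) < 1e-12
print("conditional distribution of X given S = 7 agrees with B(7, 0.4)")
```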
q(y/x) := f (x, y)/g(x) = f (x, y) / ∫ f (x, y) dy . (3.9)
We remark that if g(x) > 0, ∫_{Rr} q(y/x) dy = 1, which justifies the terminology.
The following theorem shows that in order to compute E(φ(Y )/X = x), we proceed
formally like we did to compute E(φ(Y )). We replace the density of Y by the conditional
density of Y given X = x.
Theorem 3.13 Let X : Ω → Rd and Y : Ω → Rr be random vectors such that the pair
(X, Y ) has the density f (x, y) with respect to the Lebesgue measure on Rd+r and φ : Rd+r →
R is a Borel function such that φ ≥ 0 or φ ◦ (X, Y ) ∈ L1 . Then for λd -almost all x ∈ Rd such
that g(x) = ∫_{Rr} f (x, y) dy > 0, if q(y/x) denotes the conditional density of Y given X = x,
E(φ(X, Y )/X = x) = ∫_{Rr} φ(x, y) q(y/x) dy = ∫_{Rr} φ(x, y) f (x, y) dy / ∫_{Rr} f (x, y) dy . (3.10)
Proof. For all B ∈ Rd , the characterization ϕ(x) = E(φ(X, Y )/X = x) given by (3.3) and
the Fubini-Lebesgue theorem 1.54 show that
E[ϕ(X)1B (X)] = ∫_B ϕ(x) g(x) dx = E[φ(X, Y )1B (X)]
             = ∫_B ( ∫_{Rr} φ(x, y) f (x, y) dy ) dx = ∫_B ( ∫_{Rr} φ(x, y) q(y/x) dy ) g(x) dx.
The Proposition 1.35 (iii) applied to the Lebesgue measure λd on Rd implies that if α(x) =
ϕ(x) − ∫_{Rr} φ(x, y) q(y/x) dy, the function g(x)α(x) is zero almost everywhere, that is to say
that α(x) = 0 for λd -almost all x such that g(x) > 0. This concludes the proof. □
A calculation similar to the preceding one shows moreover, using the notations of Theorem
3.13, that if φ : Rd+r → [0, +∞[ is Borel,
E[φ(X, Y )|X = x] = ∫_{Rr} φ(x, y) q(y/x) dy = ∫_{Rr} φ(x, y) f (x, y) dy / ∫_{Rr} f (x, y) dy .
In the particular case where X and Y are independent, of respective densities g and h,
then f (x, y) = g(x)h(y), hence q(y/x) = h(y) and E(φ(Y )/X) = E(φ(Y )). We deduce again
that X is useless to improve the approximation in L2 of a function of Y because as soon as
φ(Y ) belongs to L2 , or more generally is non negative or integrable, E(φ(Y )/X) = E[φ(Y )].
Example 3.14 Let X and Y be independent random variables of the same exponential
distribution of parameter λ > 0. We want to calculate the conditional distribution of X given
S = X + Y . We calculate the density of the pair (X, S), denoted by f (x, s). The density of
the pair (X, Y ) is the product of the densities of X and of Y , and if D = {(x, s) : 0 < x < s}
the mapping Φ :]0, +∞[2 → D defined by Φ(x, y) = (x, x + y) is a C 1 -diffeomorphism. The
Jacobian of the inverse mapping calculated at the point (x, s) ∈ D is 1. The change of
variables formula implies that for all non negative Borel functions ϕ : R2 → [0, +∞[,
E(ϕ(X, S)) = ∫_{]0,+∞[2} ϕ(x, x + y) λ^2 e^{−λ(x+y)} dx dy = ∫_D ϕ(x, s) λ^2 e^{−λs} dx ds.
The pair (X, S) then has as density the function f (x, s) = λ^2 e^{−λs} 1{0<x<s} . The marginal
density of S is then λ^2 s e^{−λs} 1[0,+∞[ (s) and we indeed recover a Gamma distribution Γ(λ, 2).
If s > 0, the conditional density of X given S = s is q(x|s) = (1/s) 1]0,s[ (x) : it is the uniform
distribution on the interval ]0, s[. We deduce that for s > 0, E(X|S = s) = s/2 and hence
E(X|S) = S/2. In the same manner, E(X^p |S) = S^p /(p + 1) for every real number p ≥ 1.
A direct reasoning allows to give the value of E(X|S), but not the conditional expectation
of every non negative or integrable Borel function of X. The random variables X and Y
« play symmetric roles », which implies that E(X|X + Y ) = E(Y |X + Y ). Because E(·|X + Y )
is linear, X + Y = E(X + Y |X + Y ) = 2E(X|X + Y ), which yields E(X|X + Y ) = (X + Y )/2
without tedious calculation of the conditional distributions. Note that the above argument
is valid as soon as X and Y are independent and identically distributed.
3.4.4 Application to simulation
The preceding notion of conditional distribution allows the simulation of random vectors
(X1 , · · · , Xn ) whose components are not independent. In fact, it suffices to simulate the
first component X1 by the methods of the section 2.4 ; we deduce that x1 = X1 (ω) ∈ R. We
next simulate the conditional distribution of X2 given X1 = x1 , which is again a probability
on R ; this supplies x2 = X2 (ω) ∈ R. We then simulate the conditional distribution of X3
given (X1 , X2 ) = (x1 , x2 ), and so on up to the simulation of the conditional distribution of
Xn given (X1 , · · · , Xn−1 ) = (x1 , · · · , xn−1 ).
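A Python sketch of this sequential scheme on the pair (S, X) of Example 3.14 with λ = 1 : first simulate S of distribution Γ(λ, 2), then X given S = s with the uniform distribution on ]0, s[ (names ours) :

```python
import math
import random

def exponential(lam):
    """E(lam) variable by inverse CDF: -ln(U)/lam with U uniform on ]0, 1]."""
    return -math.log(1.0 - random.random()) / lam

lam = 1.0
pairs = []
for _ in range(100_000):
    s = exponential(lam) + exponential(lam)  # first component: S ~ Gamma(lam, 2)
    x = s * random.random()                  # then X given S = s  ~  U(]0, s[)
    pairs.append((x, s))

# Since E(X | S) = S/2, we must have E(X) = E(S)/2 = 1/lam.
print(sum(x for x, _ in pairs) / len(pairs))  # close to 1.0
```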
We show here the properties of the conditional expectation E G with respect to some
sub-σ-algebra G of F (which may be σ(X) for some random variable X).
Proposition 3.15 Let X and Y be non negative random variables (or such that Y and XY
are integrable) and X be G-measurable. Then
Proof. We suppose that X and Y are non negative. Then for all A ∈ G, 1A X is non negative
and G-measurable. Proposition 3.9 then shows that E( 1A X E G (Y ) ) = E( 1A X Y ) =
E( 1A (XY ) ). Since the function X E G (Y ) is non negative and G-measurable, we deduce that
X E G (Y ) is a.s. equal to E G (XY ). When X and Y change sign, it suffices to write X = X + − X −
and Y = Y + − Y − . □
which implies that Y = E G (X) a.s. In order to extend this equality to the case where X is
integrable, we decompose X = X + − X − . Because G = {∅, Ω} ⊂ H for every sub-σ-algebra
H of F , the Proposition 3.8 (i) allows us to conclude E( E H (X) ) = E(X). □
The set N = ∪a∈Q Na is also a null set and the preceding inequality is true on N c for all
rational numbers a and thus by continuity for all the real numbers a. We deduce that the
discriminant of this trinomial is almost surely negative or zero, which concludes the proof. □
The following result is a generalization of the Jensen inequality. It allows us to show that
E G contracts each k.kp norm.
Proposition 3.18 (i) Let f : R → R be a convex function and X a random variable such
that X and f (X) are integrable. Then f ( E G (X) ) ≤ E G (f (X)) a.s.
(ii) For all p ∈ [1, +∞[ and X ∈ Lp , |E G (X)|p ≤ E G (|X|p ) a.s. and for all p ∈ [1, +∞]
kE G (X)kp ≤ kXkp . The conditional expectation is then a contraction of each Lp space,
1 ≤ p ≤ +∞.
Proof. (i) For all x ∈ R there exists a line below the graph of f and going through the
point (x, f (x)), that is to say an affine function y 7→ gx (y) = αx (y − x) + f (x) such that
gx (y) ≤ f (y) for all y ∈ R. Considering the countable family of points x ∈ Q, which we
write (xn ), we deduce that g = supn gxn is a convex, continuous function such that
g(x) = f (x) for all x ∈ Q. By continuity, we then deduce that f = g, that is to say
that f is the supremum of a sequence of affine functions gn (x) = an x + bn ≤ f . For all n
we deduce that an X + bn ≤ f (X), and then that there exists a null set Nn such that
an E G (X)(ω) + bn ≤ E G ( f (X) )(ω) for ω ∈ Nnc . Then N = ∪n Nn is a null set and on N c ,
f ( E G (X) ) = supn ( an E G (X) + bn ) ≤ E G (f (X)).
(ii) For p ∈ [1, +∞[, the function |x|^p is convex and it suffices to apply (i). Integrating
with respect to P the inequality |E G (X)|^p ≤ E G (|X|^p ), we deduce kE G (X)kp ≤ kXkp . The
case p = +∞ results from the Proposition 3.8 (i) and (iii). □
The following result generalizes the convergence theorems of the first chapter. Its proof
is left as an exercise.
The following result allows us to find equiintegrable families of random variables related
to conditional expectation. It is important in the theory of martingales.
Theorem 3.20 (i) Let (Xi , i ∈ I) be a family of equiintegrable random variables and
(Gj , j ∈ J) a family of sub-σ-algebras of F . Then the family of random variables (Y ij =
E Gj (Xi ) : i ∈ I, j ∈ J) is equiintegrable.
(ii) In particular, we deduce that :
(a) Let X be an integrable random variable and (Gj , j ∈ J) a family of sub-σ-algebras
of G. Then the family (E Gj (X), j ∈ J) is equiintegrable.
(b) Let (Xi ) be a family of equiintegrable variables and G a sub-σ-algebra of F . Then
the family of random variables Yi = E G (Xi ) is equiintegrable.
Proof. (i) For all i ∈ I and j ∈ J we write Yij = E Gj (Xi ). For all a > 0, the event
{|Yij | ≥ a} ∈ Gj and, according to the Proposition 3.18 (i), |Yij | ≤ E Gj (|Xi |). We deduce
∫_{{|Yij |≥a}} |Yij | dP ≤ ∫_{{|Yij |≥a}} E Gj (|Xi |) dP = ∫_{{|Yij |≥a}} |Xi | dP.
Moreover, the Markov inequality (1.5) and the Proposition 3.16 show that P (|Yij | ≥ a) ≤
(1/a) E( E Gj (|Xi |) ) = (1/a) E(|Xi |). Since, according to the Proposition 2.6, supi E(|Xi |) <
+∞, we deduce that given any α > 0, P (|Yij | ≥ a) ≤ α for a large enough. The Proposition 2.6
then allows the conclusion that for all ε > 0, ∫_{{|Yij |≥a}} |Xi | dP ≤ ε for a large enough, which
concludes the proof.
(ii) If X is integrable, the family {X} is equiintegrable. The point (a) is thus an immediate
consequence of (i). The point (b) is a particular case of (i). □
Proposition 3.22 (i) Let X be a non negative or integrable random variable, independent
of the sub-σ-algebra G. Then, E G (X) = E(X) a.s.
(ii) Let X : Ω → Rd be a random vector and G a sub-σ-algebra. Then X and G are
independent if and only if for every non negative Borel function φ : Rd → [0, +∞[
E G (φ(X)) = E(φ(X)). (3.14)
Proof. (i) The constant E(X) (real or equal to +∞) is G-measurable. Furthermore, for all
A ∈ G, E(1A X) = P (A)E(X) = E[1A E(X)], and the Theorem 3.7 allows us to conclude
E G (X) = E(X) a.s.
(ii) If X and G are independent, (3.14) is a consequence of (i). Conversely, if Y is
G-measurable and non negative, (3.14) and the Proposition 3.9 show that
E( φ(X) Y ) = E( E G (φ(X)) Y ) = E( E(φ(X)) Y ) = E[φ(X)] E(Y ).
Lemma 3.21 then implies the independence of X and G. □
Proof. Let Z be a non negative G-measurable random variable. We write PX for the dis-
tribution of X and P(Y,Z) for the distribution of the pair (Y, Z). The random vectors X and
(Y, Z) are clearly independent and the Theorem 2.24, plus the Fubini-Tonelli Theorem show
that
E[Zφ(X, Y )] = ∫_{Rd} ∫_{Rr+1} z φ(x, y) P(Y,Z) (dy, dz) PX (dx)
            = ∫_{Rr+1} z ( ∫_{Rd} φ(x, y) PX (dx) ) P(Y,Z) (dy, dz)
            = ∫_{Rr+1} z E[φ(X, y)] P(Y,Z) (dy, dz) = E[Zϕ(Y )].
In this chapter, we consider a fixed probability space (Ω, F , P ), which we will not
systematically recall in the sequel. The Fourier transform is a very powerful tool in analysis.
Probabilists traditionally give it another name : the characteristic function. Furthermore,
the notion of Gaussian vector is central in the theory because of the convergence theorems
in distribution which will be seen in the following chapter. Its infinite dimensional
extension, the Brownian motion, is the basis for many models, especially in finance.
Let X be a real random variable. For all t ∈ R the random variables cos(tX) and sin(tX)
are bounded, thus integrable. For every real number a, we recall that e^{ia} = cos(a) + i sin(a) ∈
C is a complex number of modulus 1. Requiring that the real part Re(Z) and the imaginary
part Im(Z) be random variables, we naturally define a complex random variable Z : Ω → C.
We say that Z is integrable (resp. of pth power integrable) if and only if its real and imaginary
parts are. If Z ∈ L1 , we put E(Z) = E(Re(Z)) + iE(Im(Z)). All the properties of the
expectation of real random variables extend to complex random variables. We will write |z|
for the modulus of the complex number z.
Definition 4.1 Let X : Ω → R be a real random variable. Its characteristic function is the
function Φ_X : R → C defined by
Φ_X(t) = E(e^{itX}) = E(cos(tX)) + iE(sin(tX)). (4.1)
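As a quick numerical illustration (a Monte Carlo sketch, not part of the notes, with an illustrative sample size), the empirical average of e^{itX} over a large N(0, 1) sample approximates Φ_X(t); for the standard Gaussian this will be shown in Example 4.6 to equal e^{−t²/2}.

```python
import cmath
import random

random.seed(0)

# Monte Carlo sketch: the empirical characteristic function (1/N) * sum of
# exp(i t X_k) over an N(0,1) sample approximates Phi_X(t) = exp(-t^2/2).
N = 100_000
sample = [random.gauss(0, 1) for _ in range(N)]
for t in (0.0, 0.5, 1.0, 2.0):
    emp = sum(cmath.exp(1j * t * x) for x in sample) / N
    exact = cmath.exp(-t * t / 2)
    print(t, abs(emp - exact))  # errors of order 1/sqrt(N)
```

The sample size and the grid of values of t are arbitrary choices; the error at each t is of the usual Monte Carlo order 1/√N.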
The following proposition gives the immediate properties of the characteristic function.
Proof. Part (i) follows immediately from the linearity of the integral and the evident
equality Φ_{aX}(t) = Φ_X(at) for all a, t ∈ R.
(ii) Because |e^{itX}| = 1, |Φ_X(t)| ≤ E(|e^{itX}|) = 1. Moreover, for all ε > 0 there exists
N such that P(|X| ≥ N) < ε. The mean value theorem shows that for all x, s, t ∈ R,
|e^{itx} − e^{isx}| ≤ 2|x||t − s|. We deduce that if |t − s| ≤ ε/(2N),
|Φ_X(t) − Φ_X(s)| ≤ 2P(|X| ≥ N) + ∫_{−N}^{N} |e^{itx} − e^{isx}| dP_X(x) ≤ 2ε + 2N|t − s| ≤ 3ε. 2
The following property (i) is fundamental. Part (ii) allows us, in certain cases, to
recover the density from the characteristic function. The theorem is temporarily admitted:
it will be proved in the last chapter (Proposition 5.18).
Theorem 4.5 (i) Let X be a real random variable and n ≥ 1 an integer such that E(|X|^n) <
+∞. Then the characteristic function of X is of class C^n and for every k = 1, · · · , n,
Φ_X^{(k)}(t) = i^k E(X^k e^{itX}). (4.3)
In particular, Φ_X^{(k)}(0) = i^k E(X^k).
(ii) Let X ∈ L^2; then Φ'_X(0) = iE(X), Φ''_X(0) = −E(X²). Moreover, there exist α > 0
and a function ε : ]−α, +α[ → C such that lim_{t→0} ε(t) = 0 and, for |t| < α,
Φ_X(t) = exp( itE(X) − (t²/2) Var(X) + t² ε(t) ). (4.4)
Proof. (i) We apply Theorem 1.39 to the function (t, x) → f(x, t) = e^{itx} (in fact separately
to the real part and to the imaginary part of this function, but we bring together the separate
results so that we may reason directly with a complex-valued function), with I = R, d = 1 and
the measure µ = P_X. If X is integrable, |∂f/∂t (x, t)| = |ix e^{itx}| = |x| ∈ L^1(P_X). We deduce
formula (4.3) for k = 1. It then suffices to reason by induction on the successive derivatives
(and the successive powers of X) up to order n.
(ii) If X ∈ L², Φ_X is of class C² and the Taylor formula of order 2 at 0 gives
Φ_X(t) = 1 + itE(X) − (t²/2) E(X²) + o(t²).
The function ln(1 + z) may be defined for a complex number z such that |z| < 1 as
ln(1 + z) = Σ_{k=1}^∞ (−1)^{k+1} z^k / k, and by approximating ln(1 + z) with z − z²/2 + o(|z|²)
when |z| is small enough, we deduce (4.4). 2
Example 4.6 The following example is fundamental. By applying Theorem 4.5 (i), we
calculate the characteristic function of a Gaussian N(m, σ²) random variable Y. Note that
in this case, in equation (4.4) the term ε(t) is identically zero. This will explain the central
role that Gaussian random variables play in the convergence theorems.
If Y follows a Gaussian distribution N(m, 0), Y is a.s. equal to its expectation m and
Φ_Y(t) = e^{itm}.
If σ ≠ 0, X = (Y − m)/σ is Gaussian of distribution N(0, 1) and Y = m + σX. From
Proposition 4.3, Φ_Y(t) = e^{itm} Φ_X(σt) and it then suffices to calculate Φ_X. Because X ∈ L¹,
for all t ∈ R,
Φ'_X(t) = (i/√(2π)) ( ∫_R x cos(tx) e^{−x²/2} dx + i ∫_R x sin(tx) e^{−x²/2} dx ) = − (1/√(2π)) ∫_R x e^{−x²/2} sin(tx) dx.
In fact, the first integral is zero because the integrand is odd and integrable. Integration by
parts in the second integral shows that
∫_R x e^{−x²/2} sin(tx) dx = [−e^{−x²/2} sin(tx)]_{−∞}^{+∞} + t ∫_R e^{−x²/2} cos(tx) dx.
Since by parity Φ_X(t) = (1/√(2π)) ∫_R cos(tx) e^{−x²/2} dx, we deduce that Φ_X is real valued and that
for all t, Φ'_X(t) = −t Φ_X(t). We deduce that ln(|Φ_X(t)|) = −t²/2 + C where C is a real constant.
Since Φ_X is continuous and does not vanish, it keeps a constant sign. Furthermore, Φ_X(0) = 1
implies that Φ_X(t) = e^{−t²/2}.
We finally check that if Y follows a N(m, σ²) Gaussian distribution,
E(e^{itY}) = e^{itm − σ²t²/2} = exp( itE(Y) − (t²/2) Var(Y) ). (4.5)
We can show as an exercise the following result on the moments of a Gaussian N(0, 1)
random variable X: for every odd integer n, E(X^n) = 0 and for every even integer n = 2k,
E(X^{2k}) = (2k)! / (2^k k!).
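As a small sanity check (not part of the notes), the closed form (2k)!/(2^k k!) can be compared numerically with the double-factorial recursion E(X^{2k}) = (2k − 1) E(X^{2k−2}), which is obtained by integration by parts.

```python
from math import factorial

def gaussian_even_moment(k):
    """E[X^(2k)] for X ~ N(0,1), using the closed form (2k)! / (2^k k!)."""
    return factorial(2 * k) // (2 ** k * factorial(k))

# The same moments satisfy the recursion E[X^(2k)] = (2k-1) * E[X^(2k-2)],
# i.e. E[X^(2k)] is the double factorial (2k-1)!!; we check both agree.
m = 1  # E[X^0] = 1
for k in range(1, 8):
    m *= 2 * k - 1
    assert gaussian_even_moment(k) == m

print([gaussian_even_moment(k) for k in range(1, 5)])  # [1, 3, 15, 105]
```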
The following theorem is a d-dimensional version of the first results on real random
variables.
The point (iii) will be proved in the following chapter (Proposition 5.18).
The other properties are shown in a way similar to the corresponding proofs in the
preceding section and the details are left as an exercise.
In the particular case of square integrable random vectors, the following result is a
multidimensional version of equation (4.4). Its proof is left as an exercise. Recall that
we commit the abuse of language of identifying a vector with the column
matrix of its components in the canonical basis, and that we write t̃ for the transpose
of the matrix t.
Theorem 4.9 Let X : Ω → R^d be a square integrable random vector. We write E(X) for
its expectation vector and Γ_X for its covariance matrix. Then
Φ_X(t) = 1 + i⟨t, E(X)⟩ − (1/2) Σ_{k,l=1}^d t_k t_l E(X_k X_l) + o(|t|²).
There exist α > 0 and a function ε with lim_{‖t‖→0} ε(‖t‖) = 0, such that for t ∈ R^d with
‖t‖ < α one has:
Φ_X(t) = exp( i Σ_{k=1}^d t_k E(X_k) − (1/2) Σ_{k,l=1}^d t_k t_l Cov(X_k, X_l) + ‖t‖² ε(‖t‖) )
= exp( i t̃ E(X) − (1/2) t̃ Γ_X t + ‖t‖² ε(‖t‖) ). (4.7)
Proof. To fix ideas, we consider only a pair (X, Y) of real random variables.
Let Z be a random variable with values in R² and with distribution P_X ⊗ P_Y. For all
(s, t) ∈ R², the Fubini-Lebesgue theorem 1.54 shows that
Φ_Z(s, t) = ∫_{R²} e^{i(sx+ty)} d(P_X ⊗ P_Y)(x, y) = ∫_R e^{isx} dP_X(x) ∫_R e^{ity} dP_Y(y) = Φ_X(s) Φ_Y(t).
Because the characteristic function characterizes the distribution, we deduce that Z and the
pair (X, Y) have the same distribution if and only if Φ_Z(s, t) = Φ_{(X,Y)}(s, t) for all (s, t). Thus
Theorem 2.24 concludes the proof. 2
The following proposition is very useful to compute the distribution of the sum of independent random variables. In the case of independent random variables X and Y with densities
f and g, respectively, it recovers a classical result: the Fourier transform of the convolution
product f ∗ g of f and g (which is the density of X + Y) is equal to the product of the
Fourier transforms of f and g.
Proof. It suffices to use Theorem 2.25 and the fact that for all t ∈ R^d the random variables
e^{i⟨t,X_k⟩} are integrable (because they are bounded by 1). We immediately deduce
Φ_S(t) = E( Π_{k=1}^n e^{i⟨t,X_k⟩} ) = Π_{k=1}^n E( e^{i⟨t,X_k⟩} ). 2
Example 4.12 1. The sum of two independent Gaussian random variables X and Y with
distributions N (m1 , σ12 ) and N (m2 , σ22 ) respectively is Gaussian with distribution N (m1 +
m2 , σ12 + σ22 ). To check this, it suffices to compute the characteristic function of X + Y by
means of the preceding Proposition and to use the characteristic function of the Gaussian
distribution N (m, σ 2 ) computed in (4.5).
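Point 1 can be illustrated by a Monte Carlo sketch (the parameters N(1, 2²) and N(−1, 1²) and the sample size are illustrative, not from the notes): the empirical mean and variance of X + Y should be close to m1 + m2 = 0 and σ1² + σ2² = 5.

```python
import random

random.seed(3)

# Monte Carlo sketch: X ~ N(1, 2^2) and Y ~ N(-1, 1^2) independent, so
# X + Y should behave like N(m1 + m2, s1^2 + s2^2) = N(0, 5).
N = 200_000
sums = [random.gauss(1, 2) + random.gauss(-1, 1) for _ in range(N)]
mean = sum(sums) / N
var = sum((s - mean) ** 2 for s in sums) / N
print(mean, var)  # close to 0 and 5
```

This only checks the first two moments, of course; the full statement (that the sum is again Gaussian) is what the characteristic-function argument proves.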
2. Let (X_n, n ≥ 1) be a sequence of independent random variables with the same standard Cauchy
distribution and let X̄_n = (1/n) Σ_{k=1}^n X_k be the average of the first n terms of this sequence.
Then X̄_n also follows a standard Cauchy distribution. To prove this, it suffices to show that the
characteristic function of X̄_n is e^{−|t|}. For all t ∈ R,
E(e^{itX̄_n}) = E( exp( i (t/n) Σ_{k=1}^n X_k ) ) = Π_{k=1}^n Φ_{X_k}(t/n) = ( e^{−|t|/n} )^n = e^{−|t|}.
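This stability under averaging can also be observed numerically (a Monte Carlo sketch with illustrative sample sizes, not part of the notes): the empirical characteristic function of X̄_n stays close to e^{−|t|}, the characteristic function of a single standard Cauchy variable, no matter how large n is.

```python
import cmath
import math
import random

random.seed(0)

# Monte Carlo sketch: a standard Cauchy variable can be simulated as
# tan(pi * (U - 1/2)) with U uniform on (0, 1).  The empirical
# characteristic function of the average of n such variables should stay
# close to exp(-|t|) -- averaging does not "tame" the Cauchy distribution.
n, reps, t = 10, 40_000, 1.0
acc = 0.0 + 0.0j
for _ in range(reps):
    xbar = sum(math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)) / n
    acc += cmath.exp(1j * t * xbar)
emp_cf = acc / reps

print(abs(emp_cf - cmath.exp(-abs(t))))  # small, of order 1/sqrt(reps)
```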
One can show as an exercise that if X and Y are independent, identically distributed,
square integrable real random variables such that E(X) = 0, Var(X) = 1 and (X + Y)/√2 has
the same distribution as X, then X (and Y) are Gaussian N(0, 1). One can also check that
in this case the random variables (X + Y)/√2 and (X − Y)/√2 are independent Gaussian N(0, 1).
We have seen that real Gaussian random variables are characterized by the fact that
their characteristic function is the exponential of a polynomial of degree two (without the
extra term in the Taylor expansion of order two). The notion of Gaussian vector extends
this property to arbitrary dimension.
Definition 4.13 A random vector X = (X1 , · · · , Xd ) is Gaussian if and only if every linear
combination of its components is a real Gaussian random variable (of nonnegative variance).
Proof. (i) is an immediate consequence of (ii) because the sub-vector (X_j, X_k) of X is
Gaussian.
(ii) If the components of a square integrable random vector are independent, the covariance matrix of X is always diagonal. Conversely, suppose that the covariance matrix Γ_X
is diagonal. To show that the components of X are independent, it suffices to calculate
the characteristic function of X and to apply Theorem 4.10. For every vector t ∈ R^d,
equation (4.8) and the fact that Γ_X is diagonal show that
Φ_X(t) = exp( Σ_{k=1}^d [ it_k E(X_k) − (1/2) t_k² Var(X_k) ] ) = Π_{k=1}^d e^{it_k E(X_k) − t_k² Var(X_k)/2} = Π_{k=1}^d Φ_{X_k}(t_k). 2
Proof. By Theorem 4.16, there exists a d × d matrix A such that Γ = AÃ. Let
Y = (Y_1, · · · , Y_d) be a vector whose components are independent Gaussian N(0, 1) random variables.
The vector Y is then Gaussian. Let L : R^d → R^d denote the linear mapping whose matrix in the
canonical basis is A. Then the vector Z = AY is Gaussian with covariance matrix AÃ = Γ.
It then suffices to consider the vector X = m + AY; it is a Gaussian vector with expectation
m and with the same covariance matrix as AY.
If the matrix Γ is invertible, the matrix A is also invertible and Φ : R^d → R^d defined by x =
Φ(y) = Ay + m is a C¹-diffeomorphism from R^d to R^d. The Jacobian matrix of Φ^{−1} is A^{−1},
the density of the vector Y is
g(y) = (2π)^{−d/2} e^{−‖y‖²/2} = (2π)^{−d/2} e^{−ỹy/2},
and from Theorem 2.21, the density of X = Φ(Y) is then f(x) = |det(A^{−1})| g(Φ^{−1}(x)).
Since det(Γ) = det(A) det(Ã) = [det(A)]² and y = Φ^{−1}(x) = A^{−1}(x − m), we deduce (4.10)
from the identity
ỹy = (A^{−1}(x − m))~ A^{−1}(x − m) = (x − m)~ Γ^{−1}(x − m). 2
Gaussian vectors finally have a remarkable property with respect to conditioning. The
proof of the following result is left as an exercise.
Theorem 4.18 Let (X, Y) : Ω → R² be a Gaussian vector with expectation vector m =
(m1, m2) and covariance matrix
Γ = ( σ1²  ρσ1σ2 ; ρσ1σ2  σ2² ),
where σ1 > 0, σ2 > 0 and ρ ∈ [−1, +1].
The random variables X − m1 and (Y − m2) − ρ(σ2/σ1)(X − m1) are independent centered
Gaussian random variables. The conditional expectation of Y given X is E(Y|X) = m2 + ρ(σ2/σ1)(X − m1). It is
an affine function of X and it is therefore also the linear regression of Y on X.
• If |ρ| = 1, then Var(σ2 X − ρσ1 Y) = 0, that is to say the components X and Y
are not linearly independent (in the sense of linear algebra). The support of the distribution
of the vector (X, Y) is the line of equation σ2 x − ρσ1 y = σ2 m1 − ρσ1 m2.
• If |ρ| < 1, the vector (X, Y) has density
f(x, y) = 1 / (2π σ1 σ2 √(1 − ρ²)) · exp( − [σ2²(x − m1)² − 2ρσ1σ2(x − m1)(y − m2) + σ1²(y − m2)²] / (2σ1²σ2²(1 − ρ²)) ).
The conditional distribution of Y given X = x is Gaussian
N( m2 + ρ(σ2/σ1)(x − m1) , σ2²(1 − ρ²) ).
The result generalizes as follows: let X = (Y, Z) be a Gaussian vector. The conditional
distribution of Z given Y = y is that of a Gaussian vector. Moreover, E(Z|Y) =
a + Σ_{i=1}^k b_i Y_i if Y = (Y1, · · · , Y_k), that is to say the conditional expectation E(Z|Y) is
a solution of the problem of multiple regression of Z on Y.
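The regression slope of Theorem 4.18 can be checked by simulation (a Monte Carlo sketch; the parameters m1, m2, σ1, σ2, ρ and the sample size are illustrative): the empirical Cov(X, Y)/Var(X) should be close to ρσ2/σ1, the coefficient of X − m1 in E(Y|X).

```python
import math
import random

random.seed(1)

# Monte Carlo sketch: simulate a Gaussian pair (X, Y) with correlation rho
# and check that the regression coefficient Cov(X, Y)/Var(X) is close to
# rho * sigma2 / sigma1, the slope in E(Y | X) = m2 + rho*(s2/s1)*(X - m1).
m1, m2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6
N = 200_000
xs, ys = [], []
for _ in range(N):
    g1, g2 = random.gauss(0, 1), random.gauss(0, 1)
    xs.append(m1 + s1 * g1)
    ys.append(m2 + s2 * (rho * g1 + math.sqrt(1 - rho ** 2) * g2))

mx = sum(xs) / N
my = sum(ys) / N
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
var = sum((x - mx) ** 2 for x in xs) / N
slope = cov / var
print(slope, rho * s2 / s1)  # both close to 0.15
```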
We can give more precise results for the decomposition of a non negative semi-definite
symmetric d × d matrix Γ of rank r ≤ d. In this case, there exists a d × d matrix A such that
Γ = AÃ. When Γ is invertible (that is to say positive definite) we have a numerical method,
called the Choleski decomposition, for finding a lower triangular matrix A. The Choleski
decomposition of Γ (which is available in numerous program libraries) is calculated in the
following way:
A_{1,1} = √(Γ_{1,1}),   A_{i,1} = Γ_{1,i} / A_{1,1} for 2 ≤ i ≤ d,
then, for i increasing from 2 to d:
A_{i,i} = √( Γ_{i,i} − Σ_{1≤k≤i−1} |A_{i,k}|² ),
A_{j,i} = ( Γ_{i,j} − Σ_{k=1}^{i−1} A_{i,k} A_{j,k} ) / A_{i,i} and A_{i,j} = 0, for i < j ≤ d.
When d = 2 and Γ = ( σ1²  ρσ1σ2 ; ρσ1σ2  σ2² ), we have A = ( σ1  0 ; ρσ2  σ2√(1 − ρ²) ). Then if
Y1 and Y2 are independent Gaussian N(0, 1) random variables, the vector X = (X1, X2)
defined by X1 = m1 + σ1 Y1, X2 = m2 + σ2 ρ Y1 + σ2 √(1 − ρ²) Y2 is Gaussian with expectation
vector m = (m1, m2) and covariance matrix Γ.
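The recursion above can be sketched in a few lines (a minimal implementation for positive definite Γ, not the library routine mentioned in the notes); we check it on the 2 × 2 matrix Γ = [[3, −2], [−2, 3]] that appears in the simulation example.

```python
import math

def choleski(gamma):
    """Lower-triangular A with A*transpose(A) = gamma, following the
    recursion in the notes; gamma must be symmetric positive definite."""
    d = len(gamma)
    a = [[0.0] * d for _ in range(d)]
    a[0][0] = math.sqrt(gamma[0][0])
    for i in range(1, d):
        a[i][0] = gamma[0][i] / a[0][0]
    for i in range(1, d):
        a[i][i] = math.sqrt(gamma[i][i] - sum(a[i][k] ** 2 for k in range(i)))
        for j in range(i + 1, d):
            a[j][i] = (gamma[i][j] - sum(a[i][k] * a[j][k] for k in range(i))) / a[i][i]
    return a

gamma = [[3.0, -2.0], [-2.0, 3.0]]
a = choleski(gamma)
# Reconstruct A*transpose(A) and compare entry by entry with gamma.
for i in range(2):
    for j in range(2):
        prod = sum(a[i][k] * a[j][k] for k in range(2))
        assert abs(prod - gamma[i][j]) < 1e-12
print(a[0][0], a[1][0])  # sqrt(3) and -2/sqrt(3)
```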
The figures below show the simulation of 10 000 vectors X in R², centered and with covariance
matrices Γ = ( 3  −2 ; −2  3 ) and Γ̄ = ( 3  2 ; 2  3 ) respectively. The eigenvalues of Γ
are 5 and 1, with eigenvectors v1 = (1, −1) and v2 = (1, 1) respectively. Its Choleski
decomposition is Γ = AÃ with A = ( √3  0 ; −2/√3  √(5/3) ). We see that in this case the points are
concentrated in an ellipse with axes v1 and v2 whose lengths depend on the eigenvalues
of Γ. The eigenvalues of Γ̄ are 5 and 1 with eigenvectors v̄1 = (1, 1)
and v̄2 = (1, −1) respectively, and we observe a rotation of the ellipse.
[Figure: scatter plots of 10 000 simulated centered Gaussian vectors in R², with covariance matrix Γ (left) and Γ̄ (right).]
One can show the following result: if X and Y are independent random variables with distributions Γ(λ, a)
and Γ(λ, b) respectively, then X + Y follows a Γ(λ, a + b) distribution. We deduce that a χ2n
random variable follows a Γ(1/2, n/2) distribution.
Proof. If the variance is zero, the proposition is clear. If σ > 0, we want to reduce to the case
of N(0, 1) random variables. In fact, for all k ≥ 1, the random variable Yk = σ^{−1}(Xk − m)
is N(0, 1), the Yk are independent, X̄n = m + σȲn and Σn(X) = σ²Σn(Y).
Thus, we suppose that m = 0 and σ = 1. The average X̄n is a centered Gaussian random
variable of variance (1/n²) Σ_{k=1}^n Var(Xk) = 1/n. The vector (X1, · · · , Xn) is Gaussian and the
vector (X̄n, X1 − X̄n, · · · , Xn − X̄n) is thus Gaussian. Because Σn(X) is a Borel function of the
last n components of this vector, it suffices to verify that X̄n and (X1 − X̄n, · · · , Xn − X̄n) are
independent and, thanks to Theorem 4.14, that for every k = 1, · · · , n, Cov(X̄n, Xk − X̄n) =
0; this is straightforward.
It then remains to show that Σn (X) is the sum of the squares of n − 1 independent
N(0, 1) random variables. We write v1 = (1/√n, · · · , 1/√n). The vector v1 is of norm 1 and
may be completed to an orthonormal basis (v1 , · · · , vn ) of Rn . We write A as the change of
basis matrix from the canonical basis to the basis (v1 , · · · , vn ). It is an orthogonal matrix,
as is its transpose Ã. The vector Z = ÃX is Gaussian, centered, and of covariance matrix
Ã Idn A = Idn. The random variables Zk are thus independent N(0, 1) Gaussians, and
Z1 = (1/√n) Σ_{k=1}^n Xk = √n X̄n. Moreover, from Theorem 4.15 (iii), Σ_{k=1}^n Zk² = Σ_{k=1}^n Xk². Finally,
Σn(X) = Σ_{k=1}^n Xk² − 2X̄n Σ_{k=1}^n Xk + n(X̄n)² = Σ_{k=1}^n Xk² − n(X̄n)² = Σ_{k=2}^n Zk².
Definition 4.21 For every integer n ≥ 1, a random variable Tn = X/√(Y/n) follows a Student
distribution with n degrees of freedom if the random variables X and Y are independent with
respective distributions N(0, 1) and χ²n.
With the notation of the preceding proof, one shows that the random variable
T = √n (X̄n − m)/√(Σn(X)/(n − 1))
follows a Student distribution with n − 1 degrees of freedom. The important feature is that
neither the definition of T nor its distribution depends on the parameter σ. This is used
in statistics to test results on the expectation of Gaussian samples when the variance is
unknown.
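The invariance of T under the affine change Xk = m + σYk can be checked numerically: X̄n − m = σȲn and Σn(X) = σ²Σn(Y), so σ cancels exactly. The sketch below is illustrative only; the helper `t_statistic`, the sample size 30 and the values m = 2.5, σ = 7 are arbitrary choices, not from the notes.

```python
import math
import random

def t_statistic(xs, m):
    """T = sqrt(n) * (mean - m) / s, where s^2 is the unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return math.sqrt(n) * (mean - m) / math.sqrt(s2)

random.seed(0)
ys = [random.gauss(0.0, 1.0) for _ in range(30)]   # an N(0, 1) sample
m, sigma = 2.5, 7.0                                # arbitrary location and scale
xs = [m + sigma * y for y in ys]                   # the corresponding N(m, sigma^2) sample
t_y = t_statistic(ys, 0.0)
t_x = t_statistic(xs, m)
print(t_y, t_x)  # equal up to rounding: T does not depend on m or sigma
```

This is exactly why T can be used to test the expectation of a Gaussian sample without knowing σ.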
5 Convergence Theorems
The aim of this chapter is twofold. On one hand, we introduce two more notions of
convergence for sequences of random variables and compare them with those studied in the
first chapter about measure theory. On the other hand, we will prove classical convergence
results for the average of sequences of independent identically distributed random variables.
In all this chapter, unless specified otherwise, we consider a probability space (Ω, F , P )
which will not be systematically recalled in the sequel. The spaces Lp , 1 ≤ p ≤ +∞ are the
spaces Lp (P ) defined in the first chapter.
Convention of notation In all of this chapter, if (Xn, n ≥ 1) denotes a sequence of
real random variables or of random vectors, we denote by Sn = Σ_{k=1}^n Xk the sum and by
X̄n = Sn/n the average of the first n random variables. Finally, a sequence of independent
identically distributed random variables is written i.i.d.
Recall that by the Schwarz inequality (resp. Hölder inequality), if a sequence of random
variables (Xn ) converges to X in L2 (resp. in Lp with 1 < p < +∞), then (Xn ) converges
to X in L1 . More generally, if 1 ≤ p1 < p2 ≤ +∞ and if the sequence (Xn ) converges to X
in Lp2 , then (Xn ) converges to X in Lp1 .
The converse is false, as the following example shows: for Ω = ]0, 1[ endowed with the Borel
σ-algebra and the Lebesgue measure, consider Xn = n^a 1_{]0,1/n[}. If a p1 < 1, the sequence (Xn)
converges to 0 in L^{p1}, while for all p2 > p1, if a p2 > 1, the sequence kXn k_{p2} tends to +∞
and (Xn) then does not converge to 0 in L^{p2}. In the same way, if a > 0, the sequence (Xn)
is not bounded in L^∞, while for p < +∞, if a p < 1, it converges to 0 in L^p.
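For this example, kXn kp = n^{a − 1/p} can be computed in closed form, which makes the dichotomy a p < 1 versus a p > 1 easy to observe numerically. A small sketch (the value a = 0.3 is an arbitrary illustrative choice):

```python
def lp_norm(n, a, p):
    """||n^a 1_(0,1/n)||_p = (n^(a p) * (1/n))^(1/p) = n^(a - 1/p)."""
    return n ** (a - 1.0 / p)

a = 0.3
for n in (10, 100, 1000):
    print(n, lp_norm(n, a, 2), lp_norm(n, a, 5))
# a p = 0.6 < 1 for p = 2: the L^2 norms n^(-0.2) tend to 0;
# a p = 1.5 > 1 for p = 5: the L^5 norms n^(0.1) tend to +infinity.
```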
The following definition introduces a notion of convergence which is weaker than that of
convergence in L1 and of almost sure convergence. We write kxk for the Euclidean norm of
a vector x ∈ Rd .
(iv) If the sequence (Xn ) converges to X in probability and is equiintegrable, then the
sequence (Xn ) converges to X in L1 .
Proof.
(i) The Markov inequality (1.5) shows that for all ε > 0,
P(kXn − Xk ≥ ε) ≤ (1/ε) E(kXn − Xk),
which proves that convergence in L1 implies convergence in probability.
(ii) For every integer N ≥ 1 and all ε > 0, we write ΩN = {ω : supn≥N kXn − Xk ≥ ε}.
The sequence of sets ΩN is decreasing and the almost sure convergence of (Xn ) to X shows
that P (ΩN ) → 0 when N → +∞. Since {kXN − Xk ≥ ε} ⊂ ΩN , we deduce that almost
sure convergence implies convergence in probability.
(iii) For every integer k ≥ 1, let Nk be an integer such that P(kXn − Xk ≥ 1/k) ≤ 1/k² for
all n ≥ Nk. We then construct a sequence (nk, k ≥ 1) such that for all k ≥ 1, nk+1 > nk
and nk ≥ Nk. We write Ak = {kXnk − Xk ≥ 1/k}. The series Σk P(Ak) ≤ Σk k⁻² is thus
convergent and the Borel-Cantelli lemma 2.2 implies that P(lim sup Ak) = 0. We deduce
that, for almost all ω, there exists an integer K(ω) such that for k ≥ K(ω), kXnk − Xk(ω) ≤ 1/k,
which shows that the sequence (Xnk(ω) − X(ω), k ≥ 1) converges to 0.
(iv) It suffices to reason on each component of Xn, that is to say we may assume that
d = 1. From (iii), there exists a subsequence (nk) such that
(Xnk , k ≥ 1) converges to X almost surely. The equiintegrability of the sequence (Xn ) shows
that supk E(|Xnk |) < +∞ (cf. the Proposition 2.6) and the Fatou lemma 1.33 thus shows
that X is integrable. Moreover, for every ε > 0 and a > 0,
E(kXn − Xk) ≤ ε + E(kXn − Xk 1_{kXn−Xk≥ε}) ≤ ε + E(kXn k 1_{kXn k≥a} + kXk 1_{kXk≥a}) + E[kXn − Xk 1_{kXn−Xk≥ε} 1_{kXn−Xk≤2a}].
From Proposition 2.6, for every ε > 0 there exists a > 0 such that E(kXk 1_{kXk≥a}) ≤ ε
and for all n, E(kXn k 1_{kXn k≥a}) ≤ ε. We deduce E(kXn − Xk) ≤ 3ε + 2a P(kXn − Xk ≥ ε) ≤ 4ε if n is large enough. 2
Let Ω = [0, 1[ be endowed with the Borel σ-algebra and the Lebesgue measure. For
all a > 0, the sequence Xn = n^a 1_{[0,1/n[} converges in probability to 0, while if a > 1 the
sequence (Xn) does not converge to 0 in L1. Next, for every k ≥ 1 and for n = 0, · · · , 2^k − 1, let
Xk,n = 1_{[n2^{−k},(n+1)2^{−k}[}. By ordering the pairs (k, n) so that (k1, n1) ≤ (k2, n2)
if k1 < k2, or if k1 = k2 and n1 ≤ n2, we obtain a sequence which converges to 0 in
L1, and thus also in probability, but not almost surely. We then see that the converses of
properties (i) and (ii) are false (without extracting a subsequence or adding further
conditions).
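The second example (often called the "typewriter" sequence) can be illustrated numerically: the L1 norms 2^−k tend to 0, while at any fixed point ω the sequence takes the value 1 once at every level k, so it cannot converge almost surely. A sketch under illustrative assumptions (the point ω = 0.3 and the 6 levels are arbitrary choices):

```python
def typewriter(num_levels):
    """Enumerate the indices (k, n) of X_{k,n} = 1_[n 2^-k, (n+1) 2^-k) in the order of the text."""
    for k in range(1, num_levels + 1):
        for n in range(2 ** k):
            yield k, n

omega = 0.3  # an arbitrary sample point of Omega = [0, 1)
l1_norms = []
values_at_omega = []
for k, n in typewriter(6):
    l1_norms.append(2.0 ** -k)  # E|X_{k,n}| = 2^-k -> 0: L^1 convergence
    in_interval = n * 2.0 ** -k <= omega < (n + 1) * 2.0 ** -k
    values_at_omega.append(1 if in_interval else 0)
print(l1_norms[-1], sum(values_at_omega))
# The value 1 recurs once at every level k, so X_{k,n}(omega) does not converge.
```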
The convergence in probability is metrizable, as the following proposition shows.
Proposition 5.3 For random vectors X, Y : Ω → Rd, set d(X, Y) = E[ kX − Y k/(1 + kX − Y k) ].
Then d is a distance (on the set of random vectors quotiented by the a.s. null random variables,
that is to say those such that kXk = 0 a.s.), and the set of (equivalence classes of) random
vectors is complete for d.
Furthermore, for every sequence of random vectors (Xn ) with values in Rd , the sequence
(Xn ) converges to X in probability if and only if the sequence d(Xn , X) converges to 0.
Proof. (i) First of all we show that d is a distance. The function f : [0, +∞[→ [0, +∞[
defined by f(x) = x/(1 + x) is increasing and the triangle inequality for the norm on Rd implies
that for X, Y, Z : Ω → Rd,
d(X, Z) ≤ E( f(kX − Y k + kY − Zk) ) ≤ E[ kX − Y k/(1 + kX − Y k) + kY − Zk/(1 + kY − Zk) ],
which implies the triangle inequality for d. In the evident way d(X, Y) = d(Y, X). Finally,
if d(X, Y) = 0, we have kX − Y k/(1 + kX − Y k) = 0 a.s., which implies X = Y a.s.
(ii) We show that the set of random vectors endowed with the distance d is complete.
First of all let (Xn) be a sequence of random vectors such that Σn d(Xn, Xn+1) < +∞.
We first show that (Xn) converges almost surely to a random vector X. In fact, the monotone
convergence theorem 1.23 (or the Fubini-Tonelli theorem 1.53 works as well) shows that
E( Σn kXn − Xn+1k/(1 + kXn − Xn+1k) ) = Σn d(Xn, Xn+1) < +∞,
which implies that Σn kXn − Xn+1k/(1 + kXn − Xn+1k) < +∞ a.s. Because the convergence
of a series of positive terms Σn an is equivalent to that of the series Σn an/(1 + an), we deduce
that Σn kXn − Xn+1k < +∞ a.s., and thus that the sequence (Xn) converges almost surely to
a random vector, denoted X.
The sequence kXn − Xk/(1 + kXn − Xk) then converges almost surely to 0 and is dominated by 1. The
dominated convergence theorem yields that the sequence d(Xn, X) converges to 0.
This shows that the metric space of the (equivalence classes of) random vectors endowed
with the distance d is complete. Indeed, if the sequence (Yn) is Cauchy, we may extract
a subsequence Xk = Ynk such that Σk d(Xk, Xk+1) < +∞. Hence there exists X such that
d(Xk, X) → 0 when k → +∞. A Cauchy sequence with a subsequence converging to X is
such that the whole sequence converges to X. This concludes the proof of the completeness
of the metric space.
(iii) It remains to show that convergence in probability is equivalent to convergence for d. Let
(Xn) be a sequence that converges to X in probability. Then
E[ kXn − Xk/(1 + kXn − Xk) ] ≤ ε/(1 + ε) + ∫_{kXn−Xk≥ε} kXn − Xk/(1 + kXn − Xk) dP ≤ ε + P(kXn − Xk ≥ ε) ≤ 2ε
for n large enough, so that d(Xn, X) → 0. Conversely, since x → x/(1 + x) is increasing, the
Markov inequality gives P(kXn − Xk ≥ ε) = P( kXn − Xk/(1 + kXn − Xk) ≥ ε/(1 + ε) ) ≤ ((1 + ε)/ε) d(Xn, X),
which tends to 0 when d(Xn, X) → 0. 2
The following proposition shows, in particular, that the sum (and the product) of sequences
of real random variables which converge in probability converges to the sum (resp. the product)
of the limits. Its proof is left as an exercise. (Note that one can reduce to a compact subset of
R² and use the uniform continuity of a continuous function on a compact set.)
Proposition 5.4 (i) Let (Xn ) and (Yn ) be sequences of real random variables such that Xn
(resp. Yn ) converges in probability to X (resp. to Y ) and let f : R2 → R be a continuous
function. Then the sequence f (Xn , Yn ) converges in probability to f (X, Y ).
(ii) If P(X = 0) = 0, the sequence (1/Xn) converges in probability to 1/X.
First we show that the average of a sequence of square integrable i.i.d. random variables
converges in probability. This result, whose proof is very simple, suffices in many concrete
situations. We first of all prove a little more general result, of which the weak law of large
numbers is an immediate consequence.
For every real random variable Z ∈ L2 and λ > 0, the following version of the Markov
inequality is called the Bienaymé-Chebychev inequality:
P(|Z − E(Z)| ≥ λ) ≤ Var(Z)/λ². (5.3)
It follows immediately from the Markov inequality applied to the random variable |Z − E(Z)|²
and the constant λ² (and shows directly that convergence in L2 implies convergence in probability).
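For a random variable taking finitely many values, both sides of (5.3) can be computed exactly, which gives a direct check of the inequality. The values and probabilities below are arbitrary illustrative choices, not from the notes:

```python
# Z takes finitely many values; both sides of (5.3) are computed exactly.
vals = [0.0, 1.0, 4.0, 9.0]     # arbitrary values of Z
probs = [0.4, 0.3, 0.2, 0.1]    # arbitrary probabilities summing to 1
ez = sum(v * p for v, p in zip(vals, probs))
var = sum((v - ez) ** 2 * p for v, p in zip(vals, probs))
for lam in (0.5, 1.0, 2.0, 5.0):
    lhs = sum(p for v, p in zip(vals, probs) if abs(v - ez) >= lam)
    print(lam, lhs, var / lam ** 2)
    assert lhs <= var / lam ** 2   # the Bienayme-Chebychev bound holds
```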
Theorem 5.5 Let (Xn) be a sequence of independent square integrable real random
variables; for all n ≥ 1, we write mn = E(Xn) and σn² = Var(Xn). We suppose that there
exists m ∈ R such that
lim_n (1/n) Σ_{k=1}^n mk = m   and   Σ_{k=1}^n σk² = o(n²).
Then the sequence (X̄n) converges to m in L2, and hence in probability.
Proof. Because (a + b)² ≤ 2(a² + b²), the independence of the sequence (Xn) implies that
for all n ≥ 1,
E(|X̄n − m|²) ≤ 2E( |X̄n − (1/n) Σ_{k=1}^n mk|² ) + 2 |(1/n) Σ_{k=1}^n mk − m|² ≤ (2/n²) Σ_{k=1}^n σk² + 2 |(1/n) Σ_{k=1}^n mk − m|².
Both terms on the right-hand side tend to 0 by hypothesis, which proves that (X̄n) converges to m in L2. 2
However, in certain situations, the weak law of large numbers is insufficient. For example,
the Monte-Carlo method consists in approximating the expectation of a random variable by
the average X̄n (ω) for a single realization (xn = Xn (ω) , n ≥ 1) of the sequence (Xn , n ≥ 1),
that is to say for a single value of ω. It is clear that a result of convergence in probability is
insufficient for proving that for almost every realization, the sequence X̄n (ω) approximates
E(X). On the other hand, we would like to weaken the square integrability condition by the
more natural integrability condition on the i.i.d. sequence (Xn ). These two improvements
are achieved in the following theorem.
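Before turning to that theorem, the Monte-Carlo method just described can be sketched numerically: one computes the average X̄n(ω) along a single realization of an i.i.d. sequence. The target E(U²) = 1/3 for U uniform on [0, 1], the sample size and the seed are arbitrary illustrative choices:

```python
import random

random.seed(42)

def monte_carlo_mean(sample, n):
    """Average of n i.i.d. draws: one realization of the empirical mean X̄_n(omega)."""
    return sum(sample() for _ in range(n)) / n

# Estimate E(U^2) = 1/3 for U uniform on [0, 1] from a single realization.
estimate = monte_carlo_mean(lambda: random.random() ** 2, 100_000)
print(estimate)  # close to 1/3 for this realization
```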
Theorem 5.7 Let (Xn) be a sequence of independent identically distributed integrable random vectors. Then the sequence (X̄n) converges almost surely to E(X1).
Proof. (1) By reasoning on each component of the sequence (Xn ), we return to the case of
i.i.d. real integrable random variables. Furthermore,
replacing the i.i.d. sequence (Xn ) by the
centered i.i.d. sequence Yn = Xn − E(Xn ), n ≥ 1 , we return to the case where E(X1 ) = 0.
We thus consider an i.i.d. sequence of integrable, centered real random variables (Xn ).
(2) We will only prove the theorem in the particular case of square integrable, centered,
i.i.d. real random variables (Xn). We first show that the sequence (X̄n²) converges a.s. to
0. Indeed, for all ε > 0, the Bienaymé-Chebychev inequality (5.3) shows that
P(|X̄n²| ≥ ε) ≤ (1/ε²) Var(X̄n²) = (1/ε²) n² Var(X1)/n⁴ = Var(X1)/(n² ε²).
Taking ε = n^{−1/4} gives P(|X̄n²| ≥ n^{−1/4}) ≤ Var(X1)/n^{3/2}, which is summable, and
the Borel-Cantelli lemma (2.2) shows that P(lim sup{|X̄n²| ≥ n^{−1/4}}) = 0. For almost all ω,
there then exists an integer N(ω) such that for n ≥ N(ω), |X̄n²(ω)| ≤ n^{−1/4}, which shows
the almost sure convergence of (X̄n²) to 0.
(3) We now suppose that (Xn ) is square integrable, and we show that the sequence X̄n
converges a.s. For every integer n ≥ 1 and every ε > 0, we write Aεn = { max{ |X̄k − X̄n²| : n² ≤ k < (n + 1)² } ≥ ε }.
Then the Markov inequality (1.5), the trivial inequality (a + b)² ≤ 2a² + 2b² and the
independence of the Xn imply that for all ε > 0,
P(Aεn) ≤ Σ_{k=n²}^{(n+1)²−1} P(|X̄k − X̄n²| ≥ ε) ≤ (1/ε²) Σ_{k=n²}^{(n+1)²−1} E(|X̄k − X̄n²|²)
≤ (1/ε²) Σ_{k=n²}^{(n+1)²−1} E( | (1/k − 1/n²) Σ_{i=1}^{n²} Xi + (1/k) Σ_{i=n²+1}^{k} Xi |² )
≤ (2/ε²) Σ_{k=n²}^{(n+1)²−1} ( (k − n²)²/(k n²)² ) Var( Σ_{i=1}^{n²} Xi ) + (2/ε²) Σ_{k=n²}^{(n+1)²−1} (1/k²) Var( Σ_{i=n²+1}^{k} Xi )
≤ (2/ε²) Σ_{k=n²}^{(n+1)²−1} ( (2n)²/n⁸ ) n² Var(X1) + (2/ε²) Σ_{k=n²}^{(n+1)²−1} (1/n⁴) 2n Var(X1)
≤ (C/ε²) (1/n³ + 1/n²) ≤ C/(ε² n²).
Because Σn P(A_n^{n^{−1/4}}) < +∞, the Borel-Cantelli lemma implies that P(lim sup A_n^{n^{−1/4}}) = 0.
We deduce that for almost all ω and all ε > 0, there exists N1 (ω) such that for all
n ≥ N1 (ω), and for all k ∈ {n2 , · · · , (n + 1)2 − 1}, |X̄k − X̄n2 |(ω) ≤ ε. Moreover, for almost
all ω there exists N2 (ω) such that for all n ≥ N2 (ω), |X̄n2 (ω)| ≤ ε. The sequence (X̄n ) then
converges almost surely to 0. 2
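The almost sure convergence in Theorem 5.7 can be observed along a single simulated path, for instance for centered coin flips with P(X = 1) = P(X = −1) = 1/2. This is only a sketch: the path length, the seed and the tail cutoff are arbitrary illustrative choices.

```python
import random

random.seed(1)

# One realization of the running averages of centered coin flips.
n_max = 200_000
s = 0.0
running = []
for n in range(1, n_max + 1):
    s += random.choice((-1.0, 1.0))
    running.append(s / n)

# Along this single path, the tail of the sequence of averages stays near 0.
tail_sup = max(abs(x) for x in running[100_000:])
print(tail_sup)
```

The supremum of |X̄n| over the tail of the path is small, as the strong law of large numbers predicts for almost every realization.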
The following notion corresponds to the weak convergence of the distributions (which
are probabilities on the Borel σ-field). It is the weakest of the convergences studied and,
contrary to the preceding ones, it does not require that all the random variables be defined
on the same probability space.
5.3.1 Definition
As its name indicates, this notion does not depend on the probability space on which
the random variables are defined (as in all the preceding convergences). It is defined as the
weak convergence of the pushforward of the probabilities by the random variables.
Definition 5.8 (i) A sequence of probabilities (µn) on Rd converges weakly to the probability
µ if for every bounded, continuous function f : Rd → R the sequence (∫ f dµn, n ≥ 1)
converges to ∫ f dµ.
(ii) Let (Ωn, Fn, Pn) be a sequence of probability spaces and (Ω, F, P) a probability space.
A sequence of random variables Xn : (Ωn, Fn) → (Rd, Rd) converges in distribution to the
random variable X : (Ω, F) → (Rd, Rd) if the sequence (Pn)Xn = Pn ◦ Xn⁻¹ of distributions
of Xn converges weakly to the distribution PX = P ◦ X⁻¹ of X. If all the probability spaces are
the same, the convergence in distribution of the sequence (Xn) to X is then defined by the
convergence of the sequence (E[f(Xn)], n ≥ 1) to E[f(X)] for every bounded continuous
function f : Rd → R.
A probability µ on Rd was characterized by the family of integrals ∫ f dµ for continuous
bounded functions f (from the functional version of the monotone class theorem); we thus
see that a sequence (Xn) of random vectors converges in distribution to X if and only if the
sequence of distributions (PXn, n ≥ 1) converges weakly to the distribution PX. The limiting
distribution is unique (but neither the random variable nor the probability space on which it
is defined is).
The following proposition allows us to characterize convergence in distribution of a
sequence of random variables (Xn) by restricting the class of functions f in the definition.
If instead we only require the convergence for continuous functions with compact support,
we obtain the notion of vague convergence.
Convention of notation In order to simplify the notations, we will suppose in the sequel
that all of the probability spaces on which the random variables are defined are the same
space (Ω, F , P ). Many results remain valid in the case of a sequence of random variables
defined on different probability spaces.
Proof. The convergence of the integrals of continuous bounded functions implies that of
continuous functions with compact support (or which tend to zero at infinity). Conversely,
let f be a bounded continuous function, and (hk, k ≥ 1) a sequence of continuous functions
with compact support which increases to 1 (for example defined by hk(x) = 1 if kxk ≤ k,
hk(x) = 0 if kxk ≥ k + 1 and hk(x) = k + 1 − kxk if k < kxk < k + 1). Then
|E[f (Xn ) − f (X)]| ≤ |E[(f hk )(Xn ) − (f hk )(X)]|
+|E[f (Xn )(1 − hk (Xn ))]| + |E[f (X)(1 − hk (X))]|
≤ |E[(f hk )(Xn ) − (f hk )(X)]|
+kf k∞ (1 − E[hk (Xn )]) + kf k∞ (1 − E[hk (X)]) .
For all ε > 0, the dominated convergence theorem 1.36 shows that we may choose k such that
1 − E[hk (X)] < ε. The function hk being continuous with compact support, the sequence
E[hk (Xn )] converges to E[hk (X)] when n → ∞ and we may find N1 such that for n ≥ N1 ,
1 − E(hk (Xn )) < 2ε. Finally, the support of the function (f hk ) is compact and we may find
N2 such that for all n ≥ N2 , |E[(f hk )(Xn )−(f hk )(X)]| ≤ ε. We deduce that for n ≥ N1 ∨N2 ,
|E[f (Xn ) − f (X)]| ≤ (1 + 3kf k∞ )ε. 2
The example µn = δn provides us with a sequence of probabilities such that for every
continuous function f which tends to 0 at infinity (in particular for every continuous function
with compact support), the sequence (∫ f dµn, n ≥ 1) converges to 0. There is vague
convergence of the sequence (δn) to the null measure, but there is a « loss of total mass »:
the sequence of random variables (Xn = n) does not converge in distribution. When we
require that the limit measure also be a probability (for example the distribution of a
random variable) we avoid the problem of loss of mass. Another more technical notion is
that of tightness, which will not be addressed in these notes.
Proposition 5.10 If a sequence of random vectors (Xn ) with values in Rd converges in
distribution to a random vector X with values in Rd , the sequence PXn of distributions of
Xn is tight, that is to say that for all ε > 0 there exists a constant K > 0 such that
sup_n P(kXn k ≥ K) ≤ ε. (5.4)
Proof. For all ε > 0, there exists K1 such that P (kXk ≥ K1 ) ≤ ε. We may construct
a continuous function f such that f (x) = 0 if kxk ≤ K1 , 0 ≤ f ≤ 1 and f (x) = 1 if
kxk ≥ K1 + 1. Then there exists N such that for n ≥ N , |E(f (X)) − E(f (Xn ))| ≤ ε and
P (kXn k ≥ K1 + 1) ≤ E[f (Xn )] ≤ E[f (X)] + ε ≤ P (kXk ≥ K1 ) + ε ≤ 2ε.
It remains to choose K2 such that for n = 1, · · · , N − 1, P(kXn k ≥ K2) ≤ ε to deduce that
for all n ≥ 1 and K = (K1 + 1) ∨ K2, the inequality P(kXn k ≥ K) ≤ 2ε holds. 2
The following proposition shows that convergence in distribution is the weakest of all of the
notions of convergence that we have studied.
Proposition 5.11 Let (Xn ) be a sequence of random vectors with values in Rd defined on
the same probability space.
(i) If the sequence (Xn ) converges in probability to X, then (Xn ) converges in distribution
to X.
(ii) If the sequence (Xn ) converges in distribution to an a.s. constant random variable
X, then (Xn ) converges to X in probability.
The convergence of P (kXn −Xk ≥ α) to 0 then shows that the sequence E[f (Xn )] converges
to E[f (X)] and Proposition 5.9 concludes the proof.
(ii) Let (Xn ) be a sequence that converges in distribution to the constant vector a ∈ Rd .
For all λ > 0 let f be a continuous function such that 0 ≤ f ≤ 1, f (x) = 1 if kx − ak ≥ λ
and f (x) = 0 if kx − ak ≤ λ2 . Then, when n → ∞
The following example shows that, except in the case of a constant limit, the convergence
in distribution is strictly weaker than that in probability.
The following table gathers the links between the different types of convergence.
     L²             L¹             Proba           Law
Xn −→ X   ⇒   Xn −→ X   ⇒   Xn −→ X   ⇒   Xn −→ X
                                ⇑
                               a.s.
                            Xn −→ X
If the random variables Xn and X take discrete values, we easily characterize the conver-
gence in distribution of (Xn ) to X.
Proposition 5.13 Let (Xn ) be a sequence of random variables which take their values on
a finite set I ⊂ R or in I = N. Then the sequence (Xn ) converges in distribution to the
random variable X : Ω → I if and only if for all i ∈ I, the sequence P (Xn = i), n ≥ 1
converges to P (X = i).
One can show as an exercise that if (Xn ) is a sequence of random variables with Binomial
distribution B(n, pn ) such that npn → λ > 0 when n tends to +∞, then the sequence (Xn )
converges in distribution to a random variable X with Poisson distribution of parameter λ.
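Using Proposition 5.13, this exercise can be checked numerically by comparing the Binomial B(n, λ/n) and Poisson probabilities pointwise. A sketch with λ = 3; the truncation at i < 20 is an arbitrary illustrative choice:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, i):
    """P(X = i) for X ~ B(n, p)."""
    return comb(n, i) * p ** i * (1.0 - p) ** (n - i)

def poisson_pmf(lam, i):
    """P(X = i) for X Poisson with parameter lam."""
    return exp(-lam) * lam ** i / factorial(i)

lam = 3.0
errs = []
for n in (10, 100, 10_000):
    p = lam / n  # so that n p -> lambda
    errs.append(max(abs(binom_pmf(n, p, i) - poisson_pmf(lam, i)) for i in range(20)))
print(errs)  # the maximal pointwise gap shrinks as n grows
```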
The hypothesis made on the boundary of A and the dominated convergence theorem imply
that for all ε > 0, there exists an integer K such that E[gK (X) − fK (X)] ≤ ε. The conver-
gence in distribution of (Xn ) to X then yields the existence of an integer N such that for
all n ≥ N , |E[gK (Xn ) − gK (X)]| ≤ ε and |E[fK (Xn ) − fK (X)]| ≤ ε. We then deduce that
for all n ≥ N ,
|P (Xn ∈ A) − P (X ∈ A)| ≤ 3ε,
which concludes the proof. 2
For special sets like A =] − ∞, t], we connect the convergence in distribution of real
random variables to the simple convergence of their distribution functions. We recall that
for every random variable X : Ω → R the distribution function F of X is defined by
F (t) = P (X ≤ t).
Theorem 5.16 Let (Xn) be a sequence of real random variables with distribution functions
Fn and X be a real random variable with distribution function F. Then the sequence (Xn)
converges to X in distribution if and only if
for all t ∈ R such that P(X = t) = 0, Fn(t) = P(Xn ≤ t) → F(t) = P(X ≤ t).
Proof. The forward implication comes from the preceding theorem, because the boundary
of ]−∞, t] is {t}. The converse implication is admitted. This theorem implies in particular that if
the random variable X has a density, then (Xn) converges to X in distribution if and only
if the sequence (Fn) of distribution functions of Xn converges simply to F. 2
Example 5.17 Theorem 5.16 is well suited for showing the convergence in distribution
of a sequence of random variables defined as the supremum or infimum of sequences of
independent random variables. For instance, let (Xn) be a sequence of independent random
variables with the same uniform distribution on the interval [a, b], a < b (defined on the
same probability space). Then the sequences In = inf(X1, · · · , Xn) and Mn = sup(X1, · · · , Xn)
converge in probability, respectively, to a and b.
Indeed, by Proposition 5.11, it suffices to prove that the sequences (In ) and (Mn ) converge
in distribution respectively to the constants a and b. The distribution function of the constant
b is the function F = 1_{[b,+∞[}. Then Theorem 5.16 shows that the sequence (Mn) converges
in distribution to b if and only if for all t ≠ b, the sequence P(Mn ≤ t) converges to
F(t). For all t ≤ a, P(Mn ≤ t) = 0; for all t ≥ b, P(Mn ≤ t) = 1; and for all t ∈]a, b[,
P(Mn ≤ t) = ((t − a)/(b − a))ⁿ → 0 when n → ∞. The proof of the convergence in distribution of
(In), which is similar, is left as an exercise.
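Since the distribution function of Mn is explicit, the concentration of Mn near b can be checked directly. A sketch with the illustrative choices a = 0, b = 1 and the threshold b − 0.01:

```python
def cdf_max_uniform(t, a, b, n):
    """P(M_n <= t) = ((t - a)/(b - a))^n for M_n = max of n i.i.d. uniforms on [a, b]."""
    if t <= a:
        return 0.0
    if t >= b:
        return 1.0
    return ((t - a) / (b - a)) ** n

a, b = 0.0, 1.0
for n in (1, 10, 100, 1000):
    # P(M_n <= b - 0.01) -> 0: the maximum concentrates near b.
    print(n, cdf_max_uniform(b - 0.01, a, b, n))
```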
5.3.3 Convergence in distribution and characteristic function
The characteristic function is again a very powerful tool for characterizing the convergence
in distribution. Furthermore, the convergence in distribution and the convolution by
Gaussian distributions of variance tending to 0 allow us to obtain the distribution of a
random variable from its characteristic function.
The following result is important, because it is the base of the injectivity of the Fourier
transform and of the Fourier inversion formula which were stated in the previous chapter
(Theorems 4.4 and 4.8). We recall that we write ⟨x, y⟩ for the scalar product of the vectors
x and y and kxk² = ⟨x, x⟩ for the square of the Euclidean norm of x.
Theorem 5.18 (i) For all n, let Xn be a Gaussian random vector of Rd with distribution
N(0, Idd), i.e. with independent N(0, 1) components. Then for every sequence (σn) which
converges to 0, the sequence of random variables (σn Xn) converges in probability to 0.
(ii) Furthermore, for every probability µ on Rd, if we write
µ̂(t) = ∫_{Rd} e^{i⟨t,x⟩} µ(dx)
for the Fourier transform of µ and
gσ(x) = (σ√2π)^{−d} exp( −kxk²/(2σ²) )
for the density of the centered Gaussian vector of covariance matrix σ²Idd, then for every
sequence (σn) which converges to 0, the sequence of probabilities with density defined by
hσn(x) = (µ ∗ gσn)(x) = ∫ gσn(x − y) µ(dy) = (2π)^{−d} ∫_{Rd} µ̂(t) e^{−i⟨t,x⟩} exp( −σn² ktk²/2 ) dt  (5.6)
converges weakly to µ. This means that for every random variable Y of distribution µ
independent of the sequence (Xn), the sequence of random variables (σn Xn + Y) converges in
distribution to Y.
(iii) Finally, for every function f integrable with respect to the Lebesgue measure λd, if
f̂(t) = ∫_{Rd} e^{i⟨t,x⟩} f(x) dx
denotes the Fourier transform of f, then when the sequence (σn) tends to 0, the sequence of
functions
(f ∗ gσn)(x) = (2π)^{−d} ∫_{Rd} f̂(t) e^{−i⟨t,x⟩} exp( −σn² ktk²/2 ) dt
converges to f in L1(λd). Moreover, if the functions f and f̂ are both integrable for the Lebesgue
measure, we have the Fourier inversion formula
f(x) = (2π)^{−d} ∫_{Rd} e^{−i⟨t,x⟩} f̂(t) dt, for almost all x ∈ Rd. (5.7)
Proof. (i) For all λ > 0, P(kσn Xn k ≥ λ) = P(kY k ≥ λ/σn) if Y follows a distribution N(0, Idd).
Because kY k < +∞ a.s., we deduce that P(kσn Xn k ≥ λ) → 0.
(ii) A change of variables and the Fubini-Tonelli theorem 1.53 show that if Xn and Y
are independent random variables, respectively, Gaussian N (0, σn2 Id) and of distribution µ,
then for every positive Borel function φ : Rd → [0, +∞[, we have
E[φ(Xn + Y)] = ∫_{Rd} ∫_{Rd} φ(x + y) gσn(x) dx µ(dy) = ∫_{Rd} φ(z) ( ∫_{Rd} gσn(z − y) µ(dy) ) dz,
hσ(x) = (σ√2π)^{−d} ∫_{Rd} ĝ_{1/σ}(y − x) µ(dy)
= (σ√2π)^{−d} ∫_{Rd} [ σ^d (√2π)^{−d} ∫_{Rd} exp( i⟨t, y − x⟩ − σ²ktk²/2 ) dt ] µ(dy).
Since |exp( i⟨t, y − x⟩ − σ²ktk²/2 )| = exp( −σ²ktk²/2 ), this function is integrable with respect
to the product measure dt ⊗ µ and the Fubini-Lebesgue theorem 1.54 shows that
hσ(x) = (2π)^{−d} ∫_{Rd} ( ∫_{Rd} e^{i⟨t,y⟩} µ(dy) ) e^{−i⟨t,x⟩ − σ²ktk²/2} dt = (2π)^{−d} ∫_{Rd} µ̂(t) e^{−i⟨t,x⟩ − σ²ktk²/2} dt.
the Fubini-Lebesgue theorem 1.54 applied to the measure µ(dy) ⊗ dx and to the function
gσ(x − y)f(x) shows that
Dσ ≤ ∫_{Rd} | ∫_{Rd} gσ(x − y) f(x) dx − f(y) | µ(dy).
For all ε > 0 there exists λ > 0 such that for all σ > 0,
∫_{kx−yk≥σλ} gσ(x − y) dx = ∫_{kzk≥λ} g1(z) dz ≤ ε.
Considering the integral on the sets {kx − yk < σλ} and {kx − yk ≥ σλ}, and using the
continuity of f at the point y, we deduce that for all y, when σ → 0, ∫_{Rd} gσ(x − y)f(x)dx →
f(y). Furthermore, this difference is bounded by 2kf k∞ and the dominated convergence
theorem 1.36 yields that Dσn → 0 when σn → 0.
and remains dominated by (2π)^{−d} |f̂| ∈ L1(λd). The dominated convergence theorem 1.36
then shows that for all x ∈ Rd, when σn → 0, the sequence ((f ∗ gσn)(x), n ≥ 1) converges
to (2π)^{−d} ∫_{Rd} e^{−i⟨t,x⟩} f̂(t) dt. By again applying the dominated convergence theorem we deduce
that for every bounded continuous function ϕ : Rd → R, when n → +∞,
∫_{Rd} (f ∗ gσn)(x) ϕ(x) dx → ∫_{Rd} ( (2π)^{−d} ∫_{Rd} e^{−i⟨t,x⟩} f̂(t) dt ) ϕ(x) dx.
We write µ for the measure with density f with respect to the Lebesgue measure. Then,
µ ∗ gσn = f ∗ gσn and from (ii), the sequence of measures with density f ∗ gσn converges
weakly to the measure with density f (with respect to the Lebesgue measure), that is to
say for every bounded continuous function ϕ : Rd → R,
∫_{Rd} (f ∗ gσn)(x) ϕ(x) dx → ∫_{Rd} ϕ(x) f(x) dx.
Writing g(x) = (2π)^{−d} ∫_{Rd} e^{−i⟨t,x⟩} f̂(t) dt for the limit obtained above, the equality
∫_{Rd} ϕ(x)f(x)dx = ∫_{Rd} ϕ(x)g(x)dx for every bounded continuous function
ϕ allows us to deduce f(x) = g(x) for λd-almost all x, which shows the Fourier inversion
formula (5.7). 2
This theorem proves a part of the results announced in Theorems 4.4 and 4.8, in particular
the injectivity of the Fourier transform. Indeed, if µ and ν are two
probabilities on Rd whose Fourier transforms µ̂ and ν̂ are equal (that is to say let X and Y
be two random vectors of distribution µ and ν respectively whose characteristic functions
ΦX and ΦY are equal), then from the formula (5.6), the measures of density µ ∗ gσn = ν ∗ gσn
are equal and when σn → 0, they converge weakly, respectively, to µ and ν. The uniqueness
of the weak limit of this sequence of probabilities shows that µ = ν.
Theorem 5.19 Let (Xn ) be a sequence of random vectors with values in Rd and X a random
vector with values in Rd . Then the sequence (Xn ) converges in distribution to X if and only
if the sequence of characteristic functions of Xn converges to the characteristic function of
X, that is to say if :
E(e^{i⟨t,Xn⟩}) → E(e^{i⟨t,X⟩}) for all t ∈ Rd.
Proof. For all t ∈ Rd, the function x → e^{i⟨t,x⟩} is bounded and continuous, and the convergence
in distribution of (Xn) to X thus implies the simple convergence of the sequence (ΦXn) to
ΦX . One may show besides that the sequence (ΦXn ) converges to ΦX uniformly on every
compact set.
Conversely the equation (5.6) shows that for all n and all σ > 0,
(PXn ∗ gσ)(x) = (2π)^{−d} ∫_{Rd} ΦXn(t) e^{−i⟨t,x⟩ − σ²ktk²/2} dt.
We recall that for every random vector Y, |ΦY| ≤ 1. Since for all σ > 0 and x ∈ Rd the sequence
of functions t → ΦXn(t) e^{−i⟨t,x⟩ − σ²ktk²/2} converges to t → ΦX(t) e^{−i⟨t,x⟩ − σ²ktk²/2} and is dominated
by the function t → e^{−σ²ktk²/2} ∈ L1(λd), the dominated convergence theorem shows that
(PXn ∗ gσ)(x) converges to (PX ∗ gσ)(x) for all x.
We may rewrite this convergence in the following form : Let V be the vector space of
continuous functions that tend to 0 at infinity, and generated by
E = {y → gσ (x − y) : x ∈ Rd , σ > 0}.
≤ 2ε + |E[h(Xn )] − E[h(X)]| ≤ 3ε
if n is large enough. Proposition 5.9 then allows us to conclude that the sequence (Xn )
converges in distribution to X. 2
The following theorem is a refinement of one of the implications of the preceding theorem,
since it does not require that the limit of the characteristic functions of Xn be a characteristic
function. We state it without proof.
Theorem 5.20 (Lévy's theorem) Let (Xn) be a sequence of real random variables whose
characteristic functions (ΦXn) converge simply to a function Φ which is continuous at 0.
Then Φ is the characteristic function of a real random variable X and the sequence (Xn)
converges to X in distribution.
Example 5.21 Let (Xn , n ≥ 1) be a sequence of independent real random variables such
that P(Xn = 2^{−n}) = P(Xn = −2^{−n}) = 1/2 for every integer n ≥ 1. The sequence Sn =
Σ_{k=1}^n Xk converges in distribution. Indeed, for all t ∈ R the characteristic function of Sn is
ΦSn(t) = Π_{k=1}^n ( e^{it2^{−k}} + e^{−it2^{−k}} )/2 = Π_{k=1}^n cos(t 2^{−k}).
The trigonometric formula sin(2a) = 2 sin(a) cos(a) shows that for all n ≥ 1 and t ≠ 0,
ΦSn(t) = sin(t)/( 2ⁿ sin(t 2^{−n}) ).
When n → ∞, 2ⁿ sin(t 2^{−n}) converges to t. Then for all t ≠ 0, the sequence ΦSn(t) converges
to sin(t)/t, whereas ΦSn(0) = 1 converges to 1. The limit function Φ defined by Φ(t) = sin(t)/t if
t ≠ 0 and Φ(0) = 1 is continuous at 0 and the Lévy theorem shows that it is the characteristic
function of a random variable X such that (Sn) converges in distribution to X.
We may then identify the distribution of X as the uniform distribution on the interval ]−1, 1[.
Indeed, if f(x) = (1/2) 1_{]−1,+1[}(x), for t ≠ 0,
f̂(t) = (1/2) ∫_{−1}^{1} e^{itx} dx = (1/(2it)) ( e^{it} − e^{−it} ) = sin(t)/t,
while f̂(0) = 1. We deduce that Φ(t) = f̂(t), that is to say that X follows a uniform
distribution on ]−1, +1[.
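For finite n the distribution of Sn is supported on 2ⁿ equally likely dyadic points, so its distance to the uniform distribution function can be computed exactly by enumeration. A small sketch; the choice n = 12 is arbitrary:

```python
from itertools import product

n = 12
# All 2^n equally likely values of S_n = sum_{k=1}^n eps_k 2^{-k}, eps_k = +-1.
values = sorted(sum(e * 2.0 ** -(k + 1) for k, e in enumerate(signs))
                for signs in product((-1, 1), repeat=n))
# Distance between the CDF of S_n and the uniform CDF F(t) = (t + 1)/2 on [-1, 1],
# evaluated at the atoms of S_n.
worst = max(abs((i + 1) / len(values) - (v + 1.0) / 2.0)
            for i, v in enumerate(values))
print(worst)  # of order 2^-n: S_n is already very close to uniform
```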
This theorem shows the central role that Gaussian random variables and Gaussian random
vectors play in probability. Indeed, properly renormalized, a sum of square integrable
independent identically distributed real random variables converges in distribution to a
« universal » Gaussian distribution N(0, 1). Moreover, this theorem indicates that the speed
of convergence of the average X̄n to E(X) in the strong law of large numbers is of order 1/√n.
We will show this theorem in the classic setting of a square integrable i.i.d. sequence,
but it has a great number of extensions.
Conventions of notation In this section, all of the random variables must be defined on
the same probability space (Ω, F , P ).
For every sequence of random vectors (Xn, n ≥ 1), we write
Sn = Σ_{k=1}^n Xk,   X̄n = Sn/n.
Proof. We immediately reduce to the case where E(X) = 0. Indeed, the sequence Yn =
Xn − E(X) is also independent, square integrable, with covariance matrix Γ, and Ȳn = X̄n − E(X);
clearly, E(Yn) = E(Ȳn) = 0.
We thus suppose that E(X) = 0. We write Φ for the characteristic function of X1 ;
the theorem 4.9 shows that for a vector t ∈ Rd whose norm ktk is small enough, Φ(t) =
exp − 2 t̃Γt + o(ktk ) . For all t ∈ R , we deduce that for n large enough,
1 2 d
n "
!#
t n gt t
t
2
Φ√nX̄n (t) = Φ √ = exp − √ Γ √ + n o
√n
.
n 2 n n
√
Therefore, the sequence of characteristic functions of nX̄n converges simply to the cha-
racteristic function of a Gaussian vector N (0, Γ) from the equation (4.8). The theorem 5.19
allows us to conclude. 2
For real random variables, we may renormalize by the square root of the variance of
the distribution and obtain a limit distribution N (0, 1) which does not depend on the initial
common distribution.
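This renormalization can be illustrated numerically. The sketch below uses uniform(0,1) summands (an arbitrary choice for illustration, not taken from the text) and checks that √n (X̄_n − E(X))/σ behaves like an N(0, 1) variable:

```python
import random, math

# For i.i.d. uniform(0,1) variables, sqrt(n) * (Xbar_n - 1/2) / sqrt(1/12)
# should be approximately N(0, 1) for large n, whatever the common
# distribution of the summands.
random.seed(1)
n, reps = 400, 20_000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and std of uniform(0,1)

z = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    z.append(math.sqrt(n) * (xbar - mu) / sigma)

# Compare with P(|N(0,1)| <= 1.96) ~ 0.95.
freq = sum(1 for v in z if abs(v) <= 1.96) / reps
print(freq)  # close to 0.95
```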
Example 5.24 1) Let (X_n) be an i.i.d. sequence of Bernoulli random variables with
parameter p ∈ ]0, 1[. Then the sequence (S_n − np)/\sqrt{np(1−p)} converges in distribution
to a Gaussian random variable N(0, 1). Since for all n ≥ 1 the random variable S_n follows
a binomial distribution B(n, p), this immediately yields the Gaussian approximation of a
binomial distribution for n large enough.
2) We assume that a computer rounds all the numbers to within 10^{-9}, that is to say that it
keeps 9 digits after the decimal point. It computes the sum of 10^6 elementary operations for
which every rounding error follows a uniform distribution on the interval [−\frac{1}{2}10^{-9}, \frac{1}{2}10^{-9}].
The rounding errors are independent and the error in the final result is the sum of the
errors made in each operation. We want to find the probability that the absolute value of
the final error is less than \frac{1}{2}10^{-6}. We introduce a sequence (X_n, 1 ≤ n ≤ 10^6) of independent
random variables with the same uniform distribution on [−\frac{1}{2}10^{-9}, \frac{1}{2}10^{-9}]. Then E(X_1) = 0
and Var(X_1) = \frac{1}{12}10^{-18}. We deduce that if S = \sum_{k=1}^{10^6} X_k, the distribution of
S / \big( 10^3 \sqrt{10^{-18}/12} \big) is
close to the Gaussian distribution N(0, 1). If Y follows a Gaussian distribution N(0, 1),
\[
P\Big( |S| \le \frac{10^{-6}}{2} \Big)
= P\Big( \frac{2\sqrt 3\,|S|}{10^{-6}} \le \frac{2\sqrt 3 \times 10^{-6}}{2 \times 10^{-6}} \Big)
\sim P(|Y| \le \sqrt 3) = 2F(\sqrt 3) - 1 \sim 0.91674.
\]
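The numerical value above can be checked with the identity 2F(√3) − 1 = erf(√(3/2)), where F is the N(0, 1) distribution function:

```python
import math

# 2*F(sqrt(3)) - 1 = erf(sqrt(3)/sqrt(2)) = erf(sqrt(1.5)),
# since F(x) = (1 + erf(x/sqrt(2))) / 2 for the N(0,1) distribution function.
p = math.erf(math.sqrt(1.5))
print(p)  # ~ 0.9167
```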
The vector central limit theorem implies the following result which is the basis of the χ2
test.
Let (Y_n) be a sequence of independent identically distributed random variables taking
their values in a finite state space I consisting of k elements, written 1, · · · , k. The
distribution of each Y_n is thus the vector p = (p_i, 1 ≤ i ≤ k) ∈ R^k where P(Y_1 = i) = p_i.
We write (e_i, 1 ≤ i ≤ k) for the canonical basis vectors of R^k and for all n, we write
X_n = (X_n^1, · · · , X_n^k) : Ω → R^k for the random variable defined by
\[
X_n(\omega) = e_i \quad \text{if and only if} \quad Y_n(\omega) = i.
\]
The ith component S_n^i(ω) of the vector S_n(ω) = \sum_{j=1}^{n} X_j(\omega) is then the number of draws
Y_j(ω), 1 ≤ j ≤ n, which take the value i, and the vector X̄_n gives the frequency with which
we observe the different values i ∈ I.
The random variables X_n are clearly independent identically distributed. Furthermore,
E(X_1) = p and for all i, j ∈ {1, · · · , k},
\[
\mathrm{Cov}(X_1^i, X_1^j) = p_i \delta_{i,j} - p_i p_j,
\quad \text{where } \delta_{i,i} = 1 \text{ and } \delta_{i,j} = 0 \text{ if } i \neq j.
\]
If we write Γ for the covariance matrix of X_1, the central limit theorem 5.22 shows that the
sequence √n (S_n/n − p) converges in distribution to a Gaussian vector N(0, Γ). In order to
obtain a limit which does not depend on the p_i, it is necessary to weight the various
components of X̄_n differently.
Theorem 5.25 Under these hypotheses and with the preceding notation, the sequence
\[
T_n = n \sum_{i=1}^{k} \frac{1}{p_i} \Big( \frac{S_n(i)}{n} - p_i \Big)^{2}
= -n + \sum_{i=1}^{k} \frac{S_n(i)^2}{n p_i}
\quad \text{converges in distribution to a } \chi^2_{k-1}. \quad (5.9)
\]
Proof. Let f : R^k → R be the continuous function defined by f(x) = \sum_{i=1}^{k} x_i^2 / p_i, so that
T_n = f(√n (X̄_n − p)). We deduce that the sequence (T_n) converges in distribution to the
random variable T = f(N(0, Γ)). It then remains to verify that the image by f of a Gaussian
vector N(0, Γ) is a χ²_{k−1}.
The vector v_1 = (√p_1, · · · , √p_k) ∈ R^k is normed and may be completed to an ortho-
normal basis (v_1, · · · , v_k) of R^k. Let A : R^k → R^k be an orthogonal transformation such
that A(v_1) = e_1. We write N for a Gaussian vector N(0, Γ) and N/√p for the vector whose
components are N_i/√p_i for 1 ≤ i ≤ k, and Z = A(N/√p). Then Z is a centered Gaussian
vector of covariance matrix
\[
\Gamma_Z = A\, \Gamma_{N/\sqrt p}\, \tilde A
= A \big( \delta_{i,j} - \sqrt{p_i p_j},\ 1 \le i, j \le k \big) \tilde A
= \mathrm{Id}_k - e_1 \tilde e_1.
\]
The covariance matrix of Z is diagonal and the components of Z are thus independent by Theo-
rem 4.14. For i = 2, · · · , k, we have Var(Z_i) = 1 whereas Var(Z_1) = 0. By construction,
T = f(N) = ‖N/√p‖² = ‖Z‖² because, from Theorem 4.15, the orthogonal transformation
A preserves the Euclidean norm. Furthermore, T = \sum_{i=1}^{k} Z_i^2 = \sum_{i=2}^{k} Z_i^2 is the sum of the
squares of k − 1 independent Gaussian N(0, 1) random variables, which completes the proof. □
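This last step can also be checked by simulation. A standard representation (assumed here, not taken from the text) builds N ∼ N(0, Γ) from i.i.d. standard Gaussians W_i as N_i = √p_i W_i − p_i Σ_j √p_j W_j; one verifies directly that this vector has covariance diag(p) − p p̃. Then f(N) = Σ N_i²/p_i should have mean k − 1, as a χ²_{k−1} does:

```python
import random, math

# Build N ~ N(0, diag(p) - p p~) from i.i.d. N(0,1) variables W_i via
# N_i = sqrt(p_i) W_i - p_i * sum_j sqrt(p_j) W_j (assumed representation),
# then check that f(N) = sum N_i^2 / p_i has mean k - 1.
random.seed(2)
p = [0.2, 0.3, 0.1, 0.4]          # hypothetical distribution on k = 4 states
k = len(p)
sq = [math.sqrt(pi) for pi in p]

def f_of_N():
    w = [random.gauss(0.0, 1.0) for _ in range(k)]
    s = sum(sq[j] * w[j] for j in range(k))
    return sum((sq[i] * w[i] - p[i] * s) ** 2 / p[i] for i in range(k))

draws = [f_of_N() for _ in range(100_000)]
mean = sum(draws) / len(draws)
print(mean)  # close to k - 1 = 3
```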
We deduce that:
• if the sequence (Y_n) follows the distribution p = (p_1, · · · , p_k), the sequence (T_n)
converges in distribution to a χ²_{k−1};
• if on the contrary the distribution of the sequence (Y_n) is p̄ = (p̄_1, · · · , p̄_k) ≠ p, there
exists at least one index i = 1, · · · , k such that p̄_i ≠ p_i. Hence S_n(i)/n → p̄_i a.s.
according to the strong law of large numbers, and in that case T_n ≥ (n/p_i) (S_n(i)/n − p_i)²
converges almost surely to +∞.
This gives the rejection region for the χ² test (for the adequacy of the distribution p for
the Y_n): {T_n ≥ a}, where the value of a is given by the level of the test and the table of the
distribution function of a χ²_{k−1} random variable.
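A small sketch of the statistic (5.9) on simulated data (the distribution p and the sample size are hypothetical choices): it counts the occurrences S_n(i) and checks that the two expressions for T_n in (5.9) coincide.

```python
import random

# Draw n observations from the distribution p by inverse-transform sampling,
# count occurrences, and compute T_n with both formulas of (5.9).
random.seed(3)
p = [0.2, 0.3, 0.1, 0.4]          # hypothetical distribution on k = 4 states
k, n = len(p), 10_000

counts = [0] * k
for _ in range(n):
    u, i = random.random(), 0
    while i < k - 1 and u > p[i]:  # locate the sampled state
        u -= p[i]
        i += 1
    counts[i] += 1

Tn_1 = n * sum((counts[i] / n - p[i]) ** 2 / p[i] for i in range(k))
Tn_2 = -n + sum(counts[i] ** 2 / (n * p[i]) for i in range(k))
print(Tn_1, Tn_2)  # equal, as the algebraic identity in (5.9) predicts
```

Under the null hypothesis this statistic is usually below the 95% quantile of a χ²_3 (about 7.815 from a table), which is the rejection threshold a at level 5%.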
Acknowledgments: I wish to thank Wayne Tarrant for his help with the translation of
this manuscript from French to English.