APPUNTI
LECTURE NOTES
Luigi Ambrosio, Giuseppe Da Prato and Andrea Mennucci
Scuola Normale Superiore
Piazza dei Cavalieri, 7
56126 Pisa, Italy

Introduction to Measure Theory and Integration



© 2011 Scuola Normale Superiore Pisa

ISBN: 978-88-7642-385-7
e-ISBN: 978-88-7642-386-4
Contents

Preface

Introduction

1 Measure spaces
1.1 Notation and preliminaries
1.2 Rings, algebras and σ–algebras
1.3 Additive and σ–additive functions
1.4 Measurable spaces and measure spaces
1.5 The basic extension theorem
1.5.1 Dynkin systems
1.5.2 The outer measure
1.6 The Lebesgue measure in R
1.7 Inner and outer regularity of measures on metric spaces

2 Integration
2.1 Inverse image of a function
2.2 Measurable and Borel functions
2.3 Partitions and simple functions
2.4 Integral of a nonnegative ℰ–measurable function
2.4.1 Integral of simple functions
2.4.2 The repartition function
2.4.3 The archimedean integral
2.4.4 Integral of a nonnegative measurable function
2.5 Integral of functions with a variable sign
2.6 Convergence of integrals
2.6.1 Uniform integrability and Vitali convergence theorem
2.7 A characterization of Riemann integrable functions

3 Spaces of integrable functions
3.1 Spaces ℒ^p(X, ℰ, μ) and L^p(X, ℰ, μ)
3.2 The L^p norm
3.2.1 Hölder and Minkowski inequalities
3.3 Convergence in L^p(X, ℰ, μ) and completeness
3.4 The space L^∞(X, ℰ, μ)
3.5 Dense subsets of L^p(X, ℰ, μ)

4 Hilbert spaces
4.1 Scalar products, pre-Hilbert and Hilbert spaces
4.2 The projection theorem
4.3 Linear continuous functionals
4.4 Bessel inequality, Parseval identity and orthonormal systems
4.5 Hilbert spaces on C

5 Fourier series
5.1 Pointwise convergence of the Fourier series
5.2 Completeness of the trigonometric system
5.3 Uniform convergence of the Fourier series

6 Operations on measures
6.1 The product measure and Fubini–Tonelli theorem
6.2 The Lebesgue measure on R^n
6.3 Countable products
6.4 Comparison of measures
6.5 Signed measures
6.6 Measures in R
6.7 Convergence of measures on R
6.8 Fourier transform
6.8.1 Fourier transform of a measure

7 The fundamental theorem of the integral calculus

8 Measurable transformations
8.1 Image measure
8.2 Change of variables in multiple integrals
8.3 Image measure of ℒ^n by a C^1 diffeomorphism

A
A.1 Continuity and differentiability of functions depending on a parameter
A.2 The dual space of continuous functions

References
Preface

This textbook collects the notes for an introductory course in measure theory and integration taught by the authors to undergraduate students of Scuola Normale Superiore in the last 10 years.
The goal of the course was to present, in a quick but rigorous way, the modern point of view on measure theory and integration, putting Lebesgue’s theory in R^n into a more general context and presenting the basic applications to Fourier series, calculus and real analysis. The text can also pave the way to more advanced courses in probability, stochastic processes or geometric measure theory.
Prerequisites for the book are a basic knowledge of calculus in one and
several variables, metric spaces and linear algebra.
All results presented here, as well as their proofs, are classical. We
claim some originality only in the presentation and in the choice of the
exercises. Detailed solutions to the exercises are provided in the final part
of the book.
Pisa, July 2011
Luigi Ambrosio, Giuseppe Da Prato
and Andrea Mennucci
Introduction

This course consists of an introduction to the modern theories of measure and of integration. Historically, this has been motivated by the necessity to go beyond the classical theory of Riemann’s integration, usually taught in elementary Calculus courses on the real line. It is therefore useful to describe the reasons that motivate this extension.
(1) It is not possible to give a simple, handy characterization of the class of Riemann integrable functions within Riemann’s theory. This is indeed possible within the stronger theory, due essentially to Lebesgue, that we are going to introduce.
(2) The extensions of Riemann’s theory to multiple integrals are very cumbersome. This extension, useful to compute areas, volumes, etc., is known as Peano–Jordan theory, and it is sometimes taught in elementary courses of integration in more than one variable. In addition to that, important heuristic principles like Cavalieri’s principle can be proved only under technical and basically unnecessary regularity assumptions on the domains of integration.
(3) Many constructive processes typical of Analysis (limits, series, integrals depending on a parameter, etc.) cannot be handled well within Riemann’s theory of integration. For instance, the following statement is true (it is a particular case of the so-called dominated convergence theorem):
Theorem 1. Let f_h : [−1, 1] → R be continuous functions pointwise converging to a continuous function f. Assume the existence of a constant M satisfying |f_h(x)| ≤ M for all x ∈ [−1, 1] and all h ∈ N. Then
\[ \lim_{h\to\infty} \int_{-1}^{1} f_h(x)\,dx = \int_{-1}^{1} f(x)\,dx. \]

Even though this statement makes perfect sense within Riemann’s theory, any attempt to prove this result within the theory (try it, if you are not convinced!) seems to fail, and leads (more or less explicitly, see [2]) to a larger theory. In addition to that, the continuity assumption on the limit function f is not natural, because a pointwise limit of continuous functions need not be continuous, and we would like to give a sense to ∫_{−1}^{1} f(x) dx even without this assumption. This necessity emerges for instance in the study of the convergence of Fourier series
\[ f(x) = \sum_{i=0}^{\infty} a_i \cos(i\pi x) + b_i \sin(i\pi x), \qquad x \in [-1, 1]. \]
In this case the uniform convergence of the series, which implies the continuity of f as well, is ensured by the condition Σ_i |a_i| + |b_i| < ∞. On the other hand, we will see that the “natural” condition for the convergence (in a suitable sense) of the series is much weaker:
\[ \sum_{i=0}^{\infty} a_i^2 + b_i^2 < \infty. \]
Under this condition the limit function f need not be continuous: for instance, if f(x) = 1 for x ∈ [−1/2, 1/2] and f(x) = 0 otherwise, then we will see that the coefficients of the Fourier series are given by b_i = 0 for all i (because f is even) and by
\[ a_i = \begin{cases} \dfrac{1}{2} & \text{if } i = 0; \\[1ex] \dfrac{2}{\pi i}\,\sin(\pi i/2) & \text{if } i > 0. \end{cases} \]
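As a numerical sanity check of this example, the following Python sketch evaluates the cosine coefficients of the indicator function of [−1/2, 1/2] and compares them with a midpoint-rule approximation. It assumes the normalization a_0 = (1/2)∫ f(x) dx and a_i = ∫ f(x) cos(iπx) dx for i > 0 (integrals over [−1, 1]), which is the one compatible with the series written above; the function names are illustrative and do not come from the text.

```python
import math

def a(i: int) -> float:
    """Cosine coefficient of the indicator of [-1/2, 1/2] on [-1, 1] (assumed normalization)."""
    if i == 0:
        return 0.5
    return 2.0 * math.sin(math.pi * i / 2) / (math.pi * i)

def midpoint_coefficient(i: int, steps: int = 20000) -> float:
    """Approximate the same coefficient with a midpoint rule on [-1/2, 1/2]."""
    h = 1.0 / steps
    total = h * sum(math.cos(i * math.pi * (-0.5 + (k + 0.5) * h)) for k in range(steps))
    return total / 2 if i == 0 else total

# Partial sums of the two summability conditions discussed in the text.
sum_of_squares = sum(a(i) ** 2 for i in range(2000))
sum_of_abs = sum(abs(a(i)) for i in range(2000))
print(sum_of_squares, sum_of_abs)
```

The partial sums of a_i^2 stay below 3/4, while those of |a_i| grow like a harmonic series, in line with the two conditions compared above.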

(4) The spaces of integrable functions, as for instance
\[ H := \Bigl\{ f : [-1, 1] \to \mathbb{R} \;:\; \int_{-1}^{1} f^2(x)\,dx < \infty \Bigr\} \]
endowed with the scalar product
\[ \langle f, g \rangle := \int_{-1}^{1} f(x)\,g(x)\,dx \]
and with the induced (pseudo) distance d(f, g) = ⟨f − g, f − g⟩^{1/2}, are not complete, if we restrict ourselves to Riemann integrable functions only. In this sense, the path from Riemann’s to Lebesgue’s theory is the same one that led from the (incomplete) set of rational numbers Q to the (complete) real line R.

Lebesgue’s theory extends Riemann’s theory in two independent directions. The first one is concerned, as we already said, with more general classes of functions, not necessarily continuous or piecewise continuous (the so-called Borel or measurable functions). The second direction can be better understood if we recall the very definition of Riemann’s integral
\[ \int_{-1}^{1} f(x)\,dx \sim \sum_{i=1}^{n-1} (t_{i+1} - t_i)\, f(t_i) \]
where t_1 = −1, t_n = 1 and the approximation is better and better as the parameter sup_{i<n} (t_{i+1} − t_i) tends to 0. More generally, instead of integrating with respect to the “length” measure, we can integrate with respect to a generic measure μ and define
\[ \int_{-1}^{1} f(x)\,d\mu(x) \sim \sum_{i=1}^{n} \mu(A_i)\, f(x_i) \tag{1} \]
where A_1, . . . , A_n is a partition of [−1, 1], x_i ∈ A_i and the approximation is expected to be better and better as the parameter sup_i diam(A_i) tends to 0. We may think, for instance, of [−1, 1] as a possibly non-homogeneous bar, and of μ(A) as the “mass” of the subset A of the bar: because of non-homogeneity, μ(A) need not be proportional to the length of A.
Once we adopt this viewpoint, we will see that it is not hard to obtain a theory of integration in general metric spaces, and even in more general classes of spaces. On the other hand, the approximation (1), which in any case clarifies the intuitive meaning of the integral, will remain valid for continuous functions only.
Chapter 1
Measure spaces

In this chapter we shall introduce all basic concepts of measure theory, adopting the point of view of measures as set functions. The domains
of measures may have different stability properties, and this leads to the
concepts of ring, algebra and σ –algebra. The most basic tool developed
in the chapter is Carathéodory’s theorem, which ensures in many cases
the existence and the uniqueness of a σ –additive measure having some
prescribed values on a set of generators of the σ –algebra. In the final
part of the chapter we will apply these abstract tools to the problem of
constructing a “length” measure on the real line, the so-called Lebesgue
measure, and we will study its main properties.

1.1. Notation and preliminaries


We denote by N = {0, 1, 2, . . .} the set of natural numbers, and by N∗ the set of positive natural numbers. Unless otherwise stated, sequences will always be indexed by natural numbers.
We shall denote by X a non-empty set, by 𝒫(X) the set of all subsets of X and by ∅ the empty set. For any subset A of X we shall denote by A^c its complement A^c := {x ∈ X : x ∉ A}. If A, B ∈ 𝒫(X) we denote by A \ B the relative complement A ∩ B^c, and by A Δ B the symmetric difference (A \ B) ∪ (B \ A).
Let (A_n) be a sequence in 𝒫(X). Then the following De Morgan identity holds,
\[ \Bigl( \bigcup_{n=0}^{\infty} A_n \Bigr)^c = \bigcap_{n=0}^{\infty} A_n^c. \]
Moreover, we define (1)
\[ \limsup_{n\to\infty} A_n := \bigcap_{n=0}^{\infty} \bigcup_{m=n}^{\infty} A_m, \qquad \liminf_{n\to\infty} A_n := \bigcup_{n=0}^{\infty} \bigcap_{m=n}^{\infty} A_m. \]

(1) Notice the analogy with liminf and limsup limits for a sequence (a_n) of real numbers. We have lim sup_{n→∞} a_n = inf_{n∈N} sup_{m≥n} a_m and lim inf_{n→∞} a_n = sup_{n∈N} inf_{m≥n} a_m. This is something more than an analogy, see Exercise 1.1.

As can be easily checked, lim sup_n A_n (respectively lim inf_n A_n) consists of those elements of X that belong to infinitely many A_n (respectively that belong to all but finitely many A_n).
It is easy to check that if (A_n) is nondecreasing (i.e. A_n ⊂ A_{n+1}, n ∈ N), we have
\[ \liminf_{n\to\infty} A_n = \limsup_{n\to\infty} A_n = \bigcup_{n=0}^{\infty} A_n, \]
whereas if (A_n) is nonincreasing (i.e. A_n ⊃ A_{n+1}, n ∈ N), we have
\[ \liminf_{n\to\infty} A_n = \limsup_{n\to\infty} A_n = \bigcap_{n=0}^{\infty} A_n. \]
Denoting by L the common value of lim inf and lim sup, in the first case we shall write A_n ↑ L, and in the second one A_n ↓ L.
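The description of lim sup and lim inf in terms of "infinitely many" and "all but finitely many" indices can be tried out on a small example. The Python sketch below truncates the infinite unions and intersections at a finite horizon, which is exact only under the assumption that the sequence is eventually periodic and the horizon covers at least one full period after every starting index; the sequence A and all function names are hypothetical.

```python
def tail_union(seq, n, horizon):
    """Union of seq(m) for n <= m < horizon."""
    return set().union(*(seq(m) for m in range(n, horizon)))

def tail_intersection(seq, n, horizon):
    """Intersection of seq(m) for n <= m < horizon."""
    return set.intersection(*(set(seq(m)) for m in range(n, horizon)))

def limsup_sets(seq, n_max, horizon):
    """Truncated intersection over n < n_max of the tail unions."""
    return set.intersection(*(tail_union(seq, n, horizon) for n in range(n_max)))

def liminf_sets(seq, n_max, horizon):
    """Truncated union over n < n_max of the tail intersections."""
    return set().union(*(tail_intersection(seq, n, horizon) for n in range(n_max)))

def A(n):
    # Hypothetical sequence: one transient set, then period 2.
    if n == 0:
        return {9}
    return {0, 1} if n % 2 == 0 else {1, 2}

print(limsup_sets(A, n_max=4, horizon=10))
print(liminf_sets(A, n_max=4, horizon=10))
```

In this example the point 9 belongs to only one set of the sequence and so is excluded from both limits, the points 0 and 2 belong to infinitely many sets but not to all large indices, and the point 1 belongs to all but finitely many sets.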

1.2. Rings, algebras and σ –algebras


Definition 1.1 (Rings and Algebras). A non-empty subset 𝒜 of 𝒫(X) is said to be a ring if:
(i) ∅ belongs to 𝒜;
(ii) A, B ∈ 𝒜 ⇒ A ∪ B, A ∩ B ∈ 𝒜;
(iii) A, B ∈ 𝒜 ⇒ A \ B ∈ 𝒜.
We say that a ring is an algebra if X ∈ 𝒜.
Notice that rings are stable only with respect to relative complement, whereas algebras are stable under complement in X.
Let 𝒦 ⊂ 𝒫(X). As the intersection of any family of algebras is still an algebra, the minimal algebra including 𝒦 (that is the intersection of all algebras including 𝒦) is well defined, and called the algebra generated by 𝒦. A constructive characterization of the algebra generated by 𝒦 can be easily achieved as follows: set ℱ^(0) = 𝒦 ∪ {∅} and
\[ \mathcal{F}^{(i+1)} := \bigl\{ A \cup B,\ A^c \;:\; A, B \in \mathcal{F}^{(i)} \bigr\} \qquad \forall i \ge 0. \]
Then, the algebra 𝒜 generated by 𝒦 is given by ∪_i ℱ^(i). Indeed, it is immediate to check by induction on i that 𝒜 ⊃ ℱ^(i), and therefore the union of the ℱ^(i)’s is contained in 𝒜. On the other hand, this union is easily seen to be an algebra, so the minimality of 𝒜 provides the opposite inclusion.
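On a finite set X the iteration above stabilizes after finitely many steps, so it can be run directly. The following Python sketch is one possible rendering of the construction under that finiteness assumption; the function name generated_algebra and the use of frozenset are choices made here, not notation from the text.

```python
from itertools import product

def generated_algebra(X, K):
    """Iterate 'add pairwise unions and complements' until nothing new appears."""
    X = frozenset(X)
    family = {frozenset(A) for A in K} | {frozenset()}   # F^(0) = K ∪ {∅}
    while True:
        # Since A ∪ A = A, each step contains the previous family.
        step = {A | B for A, B in product(family, repeat=2)} | {X - A for A in family}
        if step <= family:
            return family
        family |= step

algebra = generated_algebra({1, 2, 3, 4}, [{1}, {2}])
print(sorted(sorted(A) for A in algebra))
```

Here the generated algebra consists of all unions of the three "atoms" {1}, {2} and {3, 4}, so it has 2^3 = 8 elements.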
Definition 1.2 (σ–algebras). A non-empty subset ℰ of 𝒫(X) is said to be a σ–algebra if:
(i) ℰ is an algebra;
(ii) if (A_n) is a sequence of elements of ℰ then ∪_{n=0}^{∞} A_n ∈ ℰ.
If ℰ is a σ–algebra and (A_n) ⊂ ℰ we have ∩_n A_n ∈ ℰ by the De Morgan identity. Moreover, both sets
\[ \liminf_{n\to\infty} A_n, \qquad \limsup_{n\to\infty} A_n \]
belong to ℰ.
Obviously, {∅, X} and 𝒫(X) are σ–algebras, respectively the smallest and the largest ones. Let 𝒦 be a subset of 𝒫(X). As the intersection of any family of σ–algebras is still a σ–algebra, the minimal σ–algebra including 𝒦 (that is the intersection of all σ–algebras including 𝒦) is well defined, and called the σ–algebra generated by 𝒦. It is denoted by σ(𝒦).
In contrast with the case of generated algebras, it is quite hard to give a constructive characterization of the generated σ–algebras: this requires transfinite induction and is illustrated in Exercise 1.18.
Definition 1.3 (Borel σ–algebra). If (E, d) is a metric space, the σ–algebra generated by all open subsets of E is called the Borel σ–algebra of E and it is denoted by ℬ(E).
In the case when E = R the Borel σ–algebra has a particularly simple class of generators.
Example 1.4 (ℬ(R)). Let ℐ be the set of all semi–closed intervals [a, b) with a ≤ b. Then σ(ℐ) coincides with ℬ(R). In fact σ(ℐ) contains all open intervals (a, b) since
\[ (a, b) = \bigcup_{n=n_0}^{\infty} \Bigl[ a + \frac{1}{n},\, b \Bigr) \qquad \text{with } n_0 > \frac{1}{b-a}. \]
Moreover, any open set A in R is a countable union of open intervals. (2)
An analogous argument proves that ℬ(R) is generated by semi-closed intervals (a, b], by open intervals, by closed intervals and even by open or closed half-lines.

(2) Indeed, let (a_k) be a sequence including all rational numbers of A and denote by I_k the largest open interval contained in A and containing a_k. We clearly have A ⊃ ∪_{k=0}^{∞} I_k, but also the opposite inclusion holds: it suffices to consider, for any x ∈ A, r > 0 such that (x − r, x + r) ⊂ A, and k such that a_k ∈ (x − r, x + r) to obtain (x − r, x + r) ⊂ I_k, by the maximality of I_k, and then x ∈ I_k.

1.3. Additive and σ –additive functions


Let 𝒜 ⊂ 𝒫(X) be a ring and let μ be a mapping from 𝒜 into [0, +∞] such that μ(∅) = 0. We say that μ is additive if
\[ A, B \in \mathcal{A},\ A \cap B = \emptyset \ \Longrightarrow\ \mu(A \cup B) = \mu(A) + \mu(B). \]
If μ is additive, A, B ∈ 𝒜 and A ⊃ B, we have μ(A) = μ(B) + μ(A \ B), so that μ(A) ≥ μ(B). Therefore any additive function is nondecreasing with respect to set inclusion. Moreover, by applying repeatedly the additivity property, additive functions satisfy
\[ \mu\Bigl( \bigcup_{k=1}^{n} A_k \Bigr) = \sum_{k=1}^{n} \mu(A_k) \]
for n ∈ N∗ and mutually disjoint sets A_1, . . . , A_n ∈ 𝒜.
A set function μ on 𝒜 is called σ–additive if μ(∅) = 0 and for any sequence (A_n) ⊂ 𝒜 of mutually disjoint sets such that ∪_n A_n ∈ 𝒜 we have
\[ \mu\Bigl( \bigcup_{n=0}^{\infty} A_n \Bigr) = \sum_{n=0}^{\infty} \mu(A_n). \]
Obviously σ–additive functions are additive, because we can consider countable unions in which only finitely many A_n are nonempty.
Another useful concept is the σ–subadditivity: we say that μ is σ–subadditive if
\[ \mu(B) \le \sum_{n=0}^{\infty} \mu(A_n), \]
for any B ∈ 𝒜 and any sequence (A_n) ⊂ 𝒜 such that B ⊂ ∪_n A_n. Notice that, unlike the definition of σ–additivity, the sets A_n need not be disjoint here.
Remark 1.5 (σ–additivity and σ–subadditivity). Let μ be additive on a ring 𝒜 and let (A_n) ⊂ 𝒜 be mutually disjoint and such that ∪_n A_n ∈ 𝒜. Then by monotonicity we have
\[ \mu\Bigl( \bigcup_{n=0}^{\infty} A_n \Bigr) \ge \mu\Bigl( \bigcup_{n=0}^{k} A_n \Bigr) = \sum_{n=0}^{k} \mu(A_n) \qquad \text{for all } k \in \mathbb{N}. \]
Therefore, letting k ↑ ∞ we get
\[ \mu\Bigl( \bigcup_{n=0}^{\infty} A_n \Bigr) \ge \sum_{n=0}^{\infty} \mu(A_n). \]
Thus, to show that an additive function is σ–additive, it is enough to prove that it is σ–subadditive.
Conversely, it is not difficult to show that σ–additive set functions are σ–subadditive: indeed, if B ⊂ ∪_n A_n we can define A′_0 := B ∩ A_0 and A′_n := B ∩ A_n \ ∪_{m<n} A_m for n ∈ N∗, so that B is the disjoint union of the sets A′_n, to obtain
\[ \mu(B) = \sum_{n=0}^{\infty} \mu(A'_n) \le \sum_{n=0}^{\infty} \mu(A_n). \]
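The disjointification used in the last step of Remark 1.5 can be sketched for a finite list of sets as follows. This is only an illustration with finitely many sets and with the counting measure (len) standing in for μ; the function name disjointify is hypothetical.

```python
def disjointify(B, covers):
    """Return the pieces B ∩ A_n minus the union of the earlier A_m, in order."""
    seen = set()      # union of the sets A_m already processed
    pieces = []
    for A in covers:
        pieces.append((set(B) & set(A)) - seen)
        seen |= set(A)
    return pieces

B = {1, 2, 3}
covers = [{1, 2}, {2, 3}, {3, 4}]
pieces = disjointify(B, covers)
print(pieces)
```

With the counting measure, the sizes of the pieces add up to the size of B, and each piece is no larger than the corresponding covering set, which is exactly the inequality chain of the remark.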

Let μ be additive on 𝒜. Then σ–additivity of μ is equivalent to continuity of μ in the sense of the following proposition.
Proposition 1.6 (Continuity on nondecreasing sequences). If μ is additive on a ring 𝒜, then (i) ⇐⇒ (ii), where:
(i) μ is σ–additive;
(ii) (A_n) ⊂ 𝒜 and A ∈ 𝒜, A_n ↑ A ⇒ μ(A_n) ↑ μ(A).
Proof. (i)⇒(ii). In the proof of this implication we can assume with no loss of generality that μ(A_n) < ∞ for all n ∈ N. Let (A_n) ⊂ 𝒜, A ∈ 𝒜, A_n ↑ A. Then
\[ A = A_0 \cup \bigcup_{n=0}^{\infty} (A_{n+1} \setminus A_n), \]
the unions being disjoint. Since μ is σ–additive, we deduce that
\[ \mu(A) = \mu(A_0) + \sum_{n=0}^{\infty} \bigl( \mu(A_{n+1}) - \mu(A_n) \bigr) = \lim_{n\to\infty} \mu(A_n), \]
and (ii) follows.
(ii)⇒(i). Let (A_n) ⊂ 𝒜 be mutually disjoint and such that A := ∪_n A_n ∈ 𝒜. Set
\[ B_m := \bigcup_{k=0}^{m} A_k. \]
Then B_m ↑ A and μ(B_m) = Σ_{k=0}^{m} μ(A_k) ↑ μ(A) by the assumption. This implies (i).

Proposition 1.7 (Continuity on nonincreasing sequences). Let μ be σ–additive on a ring 𝒜. Then
\[ (A_n) \subset \mathcal{A} \text{ and } A \in \mathcal{A},\ A_n \downarrow A,\ \mu(A_0) < \infty \ \Longrightarrow\ \mu(A_n) \downarrow \mu(A). \tag{1.1} \]
Proof. Setting B_n := A_0 \ A_n, B := A_0 \ A, we have B_n ↑ B, therefore the previous proposition gives μ(B_n) ↑ μ(B). As μ(A_n) = μ(A_0) − μ(B_n) and μ(A) = μ(A_0) − μ(B) the proof is achieved.

Corollary 1.8 (Upper and lower semicontinuity of the measure). Let μ be σ–additive on a σ–algebra ℰ and let (A_n) ⊂ ℰ. Then we have
\[ \mu\Bigl( \liminf_{n\to\infty} A_n \Bigr) \le \liminf_{n\to\infty} \mu(A_n) \tag{1.2} \]
and, if μ(X) < ∞, we have also
\[ \limsup_{n\to\infty} \mu(A_n) \le \mu\Bigl( \limsup_{n\to\infty} A_n \Bigr). \tag{1.3} \]
Proof. Set L := lim sup_n A_n. Then we can write
\[ L = \bigcap_{n=0}^{\infty} B_n \qquad \text{where } B_n := \bigcup_{m=n}^{\infty} A_m. \tag{1.4} \]
Now, assuming μ(X) < ∞, by Proposition 1.7 it follows that
\[ \mu(L) = \lim_{n\to\infty} \mu(B_n) = \inf_{n\in\mathbb{N}} \mu(B_n) \ge \inf_{n\in\mathbb{N}} \sup_{m\ge n} \mu(A_m) = \limsup_{n\to\infty} \mu(A_n). \]
Thus, we have proved (1.3). The inequality (1.2) can be proved similarly using Proposition 1.6, thus without using the assumption μ(X) < ∞.
The following result is very useful to estimate the measure of a lim sup of sets.
Lemma 1.9. Let μ be σ–additive on a σ–algebra ℰ and let (A_n) ⊂ ℰ. Assume that Σ_{n=0}^{∞} μ(A_n) < ∞. Then
\[ \mu\Bigl( \limsup_{n\to\infty} A_n \Bigr) = 0. \]
Proof. Set L = lim sup_{n→∞} A_n and define B_n as in (1.4). Then the inclusion L ⊂ B_n gives
\[ \mu(L) \le \mu(B_n) \le \sum_{m=n}^{\infty} \mu(A_m) \qquad \text{for all } n \in \mathbb{N}. \]
As n → ∞ we find μ(L) = 0.

1.4. Measurable spaces and measure spaces


Let ℰ be a σ–algebra of subsets of X. Then we say that the pair (X, ℰ) is a measurable space. Let μ : ℰ → [0, +∞] be a σ–additive function. Then we call μ a measure on (X, ℰ), and we call the triple (X, ℰ, μ) a measure space.
The measure μ is said to be finite if μ(X) < ∞, σ–finite if there exists a sequence (A_n) ⊂ ℰ such that ∪_n A_n = X and μ(A_n) < ∞ for all n ∈ N. Finally, μ is called a probability measure if μ(X) = 1.
The simplest (but fundamental) example of a probability measure is the Dirac mass δ_x, defined by
\[ \delta_x(B) := \begin{cases} 1 & \text{if } x \in B \\ 0 & \text{if } x \notin B. \end{cases} \]
This example can be generalized as follows, see also Exercise 1.5 and Exercise 1.23.
Example 1.10 (Discrete measures). Assume that Y ⊂ X is a finite or countable set. Given c : Y → [0, +∞] we can define a measure on (X, 𝒫(X)) as follows:
\[ \mu(B) := \sum_{x \in B \cap Y} c(x) \qquad \forall B \subset X. \]
Clearly μ = Σ_{x∈Y} c(x)δ_x is a finite measure if and only if Σ_{x∈Y} c(x) < ∞, and it is σ–finite if and only if c(x) ∈ [0, +∞) for all x ∈ Y.
More generally, the construction above works even when Y is uncountable, by replacing the sum with
\[ \sup_{Y'} \sum_{x \in B \cap Y'} c(x), \]
where the supremum is made among the finite subsets Y′ of Y. The measures arising in the previous example are called atomic, and clearly if X is either finite or countable then any measure μ in (X, 𝒫(X)) is atomic: it suffices to notice that
\[ \mu = \sum_{x \in X} c(x)\,\delta_x \qquad \text{with } c(x) := \mu(\{x\}). \]
In the next section we will introduce a fundamental tool for the construction of non-atomic measures.
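For a finite set Y of atoms, the Dirac mass and the discrete measures of Example 1.10 can be written as ordinary functions on sets. The sketch below is only an illustration: the names dirac and discrete_measure, and the representation of the weight function c as a dictionary, are assumptions made here.

```python
import math

def dirac(x):
    """Dirac mass at x: a set B gets mass 1 if it contains x, and 0 otherwise."""
    return lambda B: 1 if x in B else 0

def discrete_measure(c):
    """Measure B -> sum of c(y) over the atoms y in B; c maps atoms to weights in [0, +inf]."""
    def mu(B):
        return sum(weight for y, weight in c.items() if y in B)
    return mu

mu = discrete_measure({"a": 0.5, "b": 1.5, "c": 2.0})
print(mu({"a", "b"}), mu({"c", "z"}), mu(set()))

# A single atom of infinite weight: the measure is not sigma-finite.
nu = discrete_measure({0: math.inf})
print(nu({0}), nu({1}))
```

Additivity on disjoint sets is immediate from the definition, since each atom contributes to the sum for exactly one of the two sets.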
Definition 1.11 (μ–negligible sets and μ–almost everywhere). Given a measure space (X, ℰ, μ), we say that B ∈ ℰ is μ–negligible if μ(B) = 0, and we say that a property P(x) holds μ–almost everywhere if the set
\[ \{ x \in X : P(x) \text{ is false} \} \]
is contained in a μ–negligible set.
Notice that the class of μ–negligible sets is stable under finite or countable unions. It is sometimes convenient to know that any subset of a μ–negligible set is still μ–negligible.
Definition 1.12 (μ–completion of a σ–algebra and μ–measurable sets). Let (X, ℰ, μ) be a measure space. We define
\[ \mathcal{E}_\mu := \{ A \in \mathcal{P}(X) : \text{for some } B, C \in \mathcal{E} \text{ with } \mu(C) = 0,\ A \,\Delta\, B \subset C \}. \]
It is easy to check that ℰ_μ is still a σ–algebra, the so-called completion of ℰ with respect to μ. The elements of ℰ_μ are called μ–measurable sets.
It is also easy to check that μ can be extended to all A ∈ ℰ_μ simply by setting μ(A) = μ(B), where B ∈ ℰ is any set such that A Δ B is contained in a μ–negligible set of ℰ. This extension is well defined (i.e. independent of the choice of B) and still σ–additive, and its μ–negligible sets coincide with those sets that are contained in some B ∈ ℰ with μ(B) = 0. As a consequence, any subset of a μ–negligible set is μ–negligible as well.
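On a finite set the completion of Definition 1.12 can be computed by brute force. The following Python sketch does this for a hypothetical three-point example in which the set {2, 3} is negligible; the function names and the encoding of the measure as a dictionary keyed by frozenset are assumptions of this illustration.

```python
from itertools import combinations

def powerset(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def completion(X, mu):
    """Map each A in the completion to its extended measure mu(B), with A Δ B inside a null set."""
    null_sets = [C for C in mu if mu[C] == 0]
    extended = {}
    for A in powerset(X):
        for B in mu:                       # B ranges over the sigma-algebra
            if any((A ^ B) <= C for C in null_sets):
                extended[A] = mu[B]
                break
    return extended

X = {1, 2, 3}
mu = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 0, frozenset(X): 1}
extended = completion(X, mu)
print(len(extended), extended[frozenset({1, 2})], extended[frozenset({2})])
```

In this example every subset of X turns out to be μ–measurable: the sets {2} and {3} are not in the original σ–algebra, but they are contained in the negligible set {2, 3}.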

1.5. The basic extension theorem


The following result, due to Carathéodory, allows one to extend a σ–additive function on a ring 𝒜 to a σ–additive function on σ(𝒜). It is one of the basic tools in the construction of non-trivial measures in many cases of interest, as we will see.

Theorem 1.13 (Carathéodory). Let 𝒜 ⊂ 𝒫(X) be a ring, and let ℰ be the σ–algebra generated by 𝒜. Let μ : 𝒜 → [0, +∞] be σ–additive. Then μ can be extended to a measure on ℰ. If μ is σ–finite, i.e. there exist A_n ∈ 𝒜 with A_n ↑ X and μ(A_n) < ∞ for any n, then the extension is unique.

To prove this theorem we need some preliminaries: for the uniqueness, the Dynkin theorem, and for the existence, the concepts of outer measure and additive set.

1.5.1. Dynkin systems


A non-empty subset 𝒦 of 𝒫(X) is called a π–system if
\[ A, B \in \mathcal{K} \ \Longrightarrow\ A \cap B \in \mathcal{K}. \]
A non-empty subset 𝒟 of 𝒫(X) is called a Dynkin system if
(i) X, ∅ ∈ 𝒟;
(ii) A ∈ 𝒟 ⇒ A^c ∈ 𝒟;
(iii) (A_i) ⊂ 𝒟 mutually disjoint ⇒ ∪_i A_i ∈ 𝒟.
Obviously any σ–algebra is a Dynkin system. Moreover, if 𝒟 is both a Dynkin system and a π–system then it is a σ–algebra. In fact, if (A_i) is a sequence in 𝒟 of not necessarily disjoint sets we have
\[ \bigcup_{i=0}^{\infty} A_i = A_0 \cup (A_1 \setminus A_0) \cup ((A_2 \setminus A_1) \setminus A_0) \cup \cdots \]
and so ∪_i A_i ∈ 𝒟 by (ii) and (iii).
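The two notions are genuinely different: a Dynkin system need not be a π–system. The Python sketch below checks this on a standard four-point example. It assumes X is finite, so that stability under countable disjoint unions reduces to stability under unions of two disjoint members; the function names are illustrative.

```python
def is_pi_system(family):
    """Closed under pairwise intersection."""
    return all(A & B in family for A in family for B in family)

def is_dynkin_system(X, family):
    """Finite-X check of properties (i), (ii), (iii)."""
    X = frozenset(X)
    has_extremes = X in family and frozenset() in family
    closed_complement = all(X - A in family for A in family)
    closed_disjoint_union = all(
        A | B in family for A in family for B in family if not (A & B)
    )
    return has_extremes and closed_complement and closed_disjoint_union

X = {1, 2, 3, 4}
D = {frozenset(s) for s in [(), (1, 2), (3, 4), (1, 3), (2, 4), (1, 2, 3, 4)]}
print(is_dynkin_system(X, D), is_pi_system(D))
```

The family D is a Dynkin system, but {1, 2} ∩ {1, 3} = {1} does not belong to it, so it is neither a π–system nor a σ–algebra.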
Let us prove now the following important result.

Theorem 1.14 (Dynkin). Let 𝒦 be a π–system and let 𝒟 ⊃ 𝒦 be a Dynkin system. Then σ(𝒦) ⊂ 𝒟.

Proof. Let 𝒟_0 be the minimal Dynkin system including 𝒦. We are going to show that 𝒟_0 is a σ–algebra, which will prove the theorem. For this it is enough to show, as remarked before, that the following implication holds:
\[ A, B \in \mathcal{D}_0 \ \Longrightarrow\ A \cap B \in \mathcal{D}_0. \tag{1.5} \]
For any B ∈ 𝒟_0 we set
\[ \mathcal{H}(B) = \{ F \in \mathcal{D}_0 : B \cap F \in \mathcal{D}_0 \}. \]
We claim that ℋ(B) is a Dynkin system. In fact properties (i) and (iii) are clear. It remains to show that if F ∩ B ∈ 𝒟_0 then F^c ∩ B ∈ 𝒟_0 or, equivalently, F ∪ B^c ∈ 𝒟_0. In fact, since F ∪ B^c = (F \ B^c) ∪ B^c = (F ∩ B) ∪ B^c and F ∩ B and B^c are disjoint, we have that F ∪ B^c ∈ 𝒟_0 as required.
Notice first that if K ∈ 𝒦 we have 𝒦 ⊂ ℋ(K) since 𝒦 is a π–system. Therefore ℋ(K) = 𝒟_0, by the minimality of 𝒟_0. Consequently, the following implication holds
\[ K \in \mathcal{K},\ B \in \mathcal{D}_0 \ \Longrightarrow\ K \cap B \in \mathcal{D}_0, \]
which implies 𝒦 ⊂ ℋ(B) for all B ∈ 𝒟_0. Again, the fact that ℋ(B) is a Dynkin system and the minimality of 𝒟_0 give that ℋ(B) = 𝒟_0 for all B ∈ 𝒟_0. By the definition of ℋ(B), this proves (1.5).
The uniqueness part in Carathéodory’s theorem is a direct consequence of the following coincidence criterion for measures; in turn, the proof of the criterion relies on Theorem 1.14.
Proposition 1.15 (Coincidence criterion). Let μ_1, μ_2 be measures in (X, ℰ) and assume that:
(i) the coincidence set
\[ \mathcal{D} := \{ A \in \mathcal{E} : \mu_1(A) = \mu_2(A) \} \]
contains a π–system 𝒦 with σ(𝒦) = ℰ;
(ii) there exists a nondecreasing sequence (X_i) ⊂ 𝒦 with μ_1(X_i) = μ_2(X_i) < ∞ and X_i ↑ X.
Then μ_1 = μ_2.
Proof. We first assume that μ_1(X) = μ_2(X) is finite. Under this assumption 𝒟 is a Dynkin system including the π–system 𝒦 (stability of 𝒟 under complement is ensured precisely by the finiteness assumption). Thus, by the Dynkin theorem, 𝒟 = ℰ, which implies that μ_1 = μ_2.
Assume now that we are in the general case and let X_i be given by assumption (ii). Fix i ∈ N and define the σ–algebra ℰ_i of subsets of X_i by
\[ \mathcal{E}_i := \{ A \subset X_i : A \in \mathcal{E} \}. \]
We may obviously consider μ_1 and μ_2 as finite measures in the measurable space (X_i, ℰ_i). Since these measures coincide on the π–system
\[ \mathcal{K}_i := \{ A \subset X_i : A \in \mathcal{K} \} \]
we obtain, by the previous step, that μ_1 and μ_2 coincide on σ(𝒦_i) ⊂ 𝒫(X_i).
Now, let us prove the inclusion
\[ \{ B \in \mathcal{E} : B \subset X_i \} \subset \sigma(\mathcal{K}_i). \tag{1.6} \]
Indeed
\[ \{ B \subset X : B \cap X_i \in \sigma(\mathcal{K}_i) \} \]
is a σ–algebra containing 𝒦 (here we use the fact that X_i ∈ 𝒦), and therefore contains ℰ. Hence any element of ℰ contained in X_i belongs to σ(𝒦_i).
By (1.6) we obtain μ_1(B ∩ X_i) = μ_2(B ∩ X_i) for all B ∈ ℰ and all i ∈ N. Passing to the limit as i → ∞, since B is arbitrary we obtain that μ_1 = μ_2.

1.5.2. The outer measure


Let μ be a set function defined on 𝒜 ⊂ 𝒫(X). For any E ∈ 𝒫(X) we define:
\[ \mu^*(E) := \inf \Bigl\{ \sum_{i=0}^{\infty} \mu(A_i) \;:\; A_i \in \mathcal{A},\ E \subset \bigcup_{i=0}^{\infty} A_i \Bigr\}. \]
μ∗ is called the outer measure induced by μ. We can easily show that μ∗ is a nondecreasing set function, namely μ∗(E) ≤ μ∗(F) whenever E ⊂ F ⊂ X.
We will obtain the proof of the existence part of Carathéodory’s theorem by showing in the proposition below that μ∗ extends μ if μ is σ–subadditive, and that (Theorem 1.17) μ∗ is σ–additive on a σ–algebra containing σ(𝒜) if 𝒜 is a ring and μ is additive on 𝒜. In particular if μ is σ–additive on 𝒜 we see that μ∗ provides the desired σ–additive extension to σ(𝒜).
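When the family 𝒜 is finite, the infimum defining μ∗ is attained on a finite subfamily, because repeating a set in a cover can only increase the sum. The Python sketch below computes μ∗ by brute force under that assumption; the example measure and the name outer_measure are hypothetical, and the empty cover is allowed as a shortcut equivalent to covering by ∅ when μ(∅) = 0.

```python
from itertools import combinations
import math

def outer_measure(E, mu):
    """Minimum of sum(mu(A_i)) over subfamilies of mu's domain that cover E."""
    E = frozenset(E)
    sets = list(mu)
    best = math.inf                        # inf of the empty set: E cannot be covered
    for r in range(len(sets) + 1):
        for cover in combinations(sets, r):
            if E <= frozenset().union(*cover):
                best = min(best, sum(mu[A] for A in cover))
    return best

# Hypothetical additive function on the algebra {∅, {1,2}, {3,4}, X}.
mu = {
    frozenset(): 0,
    frozenset({1, 2}): 1,
    frozenset({3, 4}): 2,
    frozenset({1, 2, 3, 4}): 3,
}
print(outer_measure({1}, mu), outer_measure({3}, mu), outer_measure({1, 3}, mu))
```

In this example μ∗ agrees with μ on 𝒜, and a set such as {1} receives the measure of the cheapest element of 𝒜 containing it.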
Proposition 1.16. The set function μ∗ is σ–subadditive on 𝒫(X) and extends μ if μ is σ–subadditive on 𝒜 and μ(∅) = 0.
Proof. Let (E_i) ⊂ 𝒫(X) and set E := ∪_i E_i. Assume that Σ_i μ∗(E_i) is finite (otherwise the assertion is trivial). Then, since μ∗(E_i) is finite for any i ∈ N, for any ε > 0 there exist A_{i,j} ∈ 𝒜 such that
\[ \sum_{j=0}^{\infty} \mu(A_{i,j}) < \mu^*(E_i) + \frac{\varepsilon}{2^{i+1}}, \qquad E_i \subset \bigcup_{j=0}^{\infty} A_{i,j}, \qquad i \in \mathbb{N}. \]
Consequently
\[ \sum_{i,j=0}^{\infty} \mu(A_{i,j}) \le \sum_{i=0}^{\infty} \mu^*(E_i) + \varepsilon. \]
Since E ⊂ ∪_{i,j=0}^{∞} A_{i,j} we have
\[ \mu^*(E) \le \sum_{i,j=0}^{\infty} \mu(A_{i,j}) \le \sum_{i=0}^{\infty} \mu^*(E_i) + \varepsilon \]
and the first part of the statement follows from the arbitrariness of ε.
Now, let us assume that μ is σ–subadditive on 𝒜 and choose E ∈ 𝒜; whenever (A_i) ⊂ 𝒜 satisfies E ⊂ ∪_i A_i we have μ(E) ≤ Σ_i μ(A_i), so we deduce μ∗(E) ≥ μ(E); but, by choosing A_0 = E and A_n = ∅ for n ≥ 1, we obtain that μ∗(E) = μ(E). This proves that μ∗ extends μ.

Let us now define the additive sets, according to Carathéodory. A set A ∈ 𝒫(X) is called additive if
\[ \mu^*(E) = \mu^*(E \cap A) + \mu^*(E \cap A^c) \qquad \text{for all } E \in \mathcal{P}(X). \tag{1.7} \]
We denote by 𝒢 the family of all additive sets.
Notice that, since μ∗ is subadditive, (1.7) is equivalent to
\[ \mu^*(E) \ge \mu^*(E \cap A) + \mu^*(E \cap A^c) \qquad \text{for all } E \in \mathcal{P}(X). \tag{1.8} \]
Obviously, the class 𝒢 of additive sets is stable under complement; moreover, by taking E = A ∪ B with A ∈ 𝒢 and A ∩ B = ∅, we obtain the additivity property
\[ \mu^*(A \cup B) = \mu^*(A) + \mu^*(B). \tag{1.9} \]
Other important properties of 𝒢 are listed in the next theorem.
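The Carathéodory condition (1.7) can be tested exhaustively on a small finite example. The sketch below recomputes the outer measure by brute force (assuming a finite family 𝒜, so that finite covers suffice) and then collects the sets satisfying (1.7) for every test set E; the three-point example and all function names are hypothetical.

```python
from itertools import combinations
import math

def powerset(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def outer_measure(E, mu):
    """Cheapest cover of E by members of mu's (finite) domain."""
    sets = list(mu)
    best = math.inf
    for r in range(len(sets) + 1):
        for cover in combinations(sets, r):
            if frozenset(E) <= frozenset().union(*cover):
                best = min(best, sum(mu[A] for A in cover))
    return best

def additive_sets(X, mu):
    """All A with mu*(E) = mu*(E ∩ A) + mu*(E \\ A) for every E ⊂ X."""
    subsets = powerset(X)
    star = {E: outer_measure(E, mu) for E in subsets}
    return {A for A in subsets
            if all(star[E] == star[E & A] + star[E - A] for E in subsets)}

X = {1, 2, 3}
mu = {frozenset(): 0, frozenset({1}): 1, frozenset({2, 3}): 2, frozenset(X): 3}
G = additive_sets(X, mu)
print(sorted(sorted(A) for A in G))
```

In this example the additive sets are exactly the four members of 𝒜: the set {2} fails the test with E = {2, 3}, because μ∗ cannot distinguish the points 2 and 3 and assigns mass 2 to each of {2}, {3} and {2, 3}.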
Theorem 1.17. Assume that 𝒜 is a ring and that μ is additive. Then 𝒢 is a σ–algebra containing 𝒜 and μ∗ is σ–additive on 𝒢.
Proof. We proceed in three steps: we show that 𝒢 contains 𝒜, that 𝒢 is an algebra on which μ∗ is additive, and that 𝒢 is a σ–algebra. As pointed out in Remark 1.5, since μ∗ is σ–subadditive and additive on the σ–algebra 𝒢, it follows that μ∗ is σ–additive on 𝒢.
Step 1. 𝒜 ⊂ 𝒢. Let A ∈ 𝒜 and E ∈ 𝒫(X); we have to show (1.8). Assume μ∗(E) < ∞ (otherwise (1.8) trivially holds), fix ε > 0 and choose (B_i) ⊂ 𝒜 such that
\[ E \subset \bigcup_{i=0}^{\infty} B_i, \qquad \mu^*(E) + \varepsilon > \sum_{i=0}^{\infty} \mu(B_i). \]
Then, by the additivity of μ and the definition of μ∗, it follows that
\[ \mu^*(E) + \varepsilon > \sum_{i=0}^{\infty} \mu(B_i) = \sum_{i=0}^{\infty} \bigl[ \mu(B_i \cap A) + \mu(B_i \cap A^c) \bigr] \ge \mu^*(E \cap A) + \mu^*(E \cap A^c). \]
Since ε is arbitrary we have μ∗(E) ≥ μ∗(E ∩ A) + μ∗(E ∩ A^c), and (1.8) follows.
Step 2. 𝒢 is an algebra and μ∗ is additive on 𝒢. We already know that A ∈ 𝒢 implies A^c ∈ 𝒢. Let us prove now that if A, B ∈ 𝒢 then A ∪ B ∈ 𝒢. For any E ∈ 𝒫(X) we have
\[ \begin{aligned} \mu^*(E) &= \mu^*(E \cap A) + \mu^*(E \cap A^c) \\ &= \mu^*(E \cap A) + \mu^*(E \cap A^c \cap B) + \mu^*(E \cap A^c \cap B^c) \\ &= \bigl[ \mu^*(E \cap A) + \mu^*(E \cap A^c \cap B) \bigr] + \mu^*(E \cap (A \cup B)^c). \end{aligned} \tag{1.10} \]
Since
\[ (E \cap A) \cup (E \cap A^c \cap B) = E \cap (A \cup B), \]
we have by the subadditivity of μ∗,
\[ \mu^*(E \cap A) + \mu^*(E \cap A^c \cap B) \ge \mu^*(E \cap (A \cup B)). \]
So, by (1.10) it follows that
\[ \mu^*(E) \ge \mu^*(E \cap (A \cup B)) + \mu^*(E \cap (A \cup B)^c), \]
and A ∪ B ∈ 𝒢 as required. The additivity of μ∗ on 𝒢 follows directly from (1.9).
Step 3. 𝒢 is a σ–algebra. Let (A_n) ⊂ 𝒢. We are going to show that S := ∪_n A_n ∈ 𝒢. Since we know that 𝒢 is an algebra, it is not restrictive to assume that all sets A_n are mutually disjoint. Set S_n := ∪_{i=0}^{n} A_i, for n ∈ N.
For any E ∈ 𝒫(X), by using the σ–subadditivity of μ∗ and by applying (1.7) repeatedly, we get
\[ \begin{aligned} \mu^*(E \cap S^c) + \mu^*(E \cap S) &\le \mu^*(E \cap S^c) + \sum_{i=0}^{\infty} \mu^*(E \cap A_i) \\ &= \lim_{n\to\infty} \Bigl[ \mu^*(E \cap S^c) + \sum_{i=0}^{n} \mu^*(E \cap A_i) \Bigr] \\ &= \lim_{n\to\infty} \bigl[ \mu^*(E \cap S^c) + \mu^*(E \cap S_n) \bigr]. \end{aligned} \]
Since S^c ⊂ S_n^c it follows that
\[ \mu^*(E \cap S^c) + \mu^*(E \cap S) \le \limsup_{n\to\infty} \bigl[ \mu^*(E \cap S_n) + \mu^*(E \cap S_n^c) \bigr] = \mu^*(E). \]
So, S ∈ 𝒢 and 𝒢 is a σ–algebra.
Remark 1.18. We have proved that
\[ \sigma(\mathcal{A}) \subset \mathcal{G} \subset \mathcal{P}(X). \tag{1.11} \]
One can show that the inclusions above are strict in general, for instance when μ is the Lebesgue measure we shall consider in the next section. In fact, in the case when X = R and σ(𝒜) is the Borel σ–algebra, Exercise 1.19 shows that σ(𝒜) has the cardinality of continuum, while 𝒢 has the cardinality of 𝒫(R), since it contains all subsets of Cantor’s middle third set (see Exercise 1.8). An example of a non-additive set will be built in Remark 1.23, so that also the second inclusion in (1.11) is strict.

1.6. The Lebesgue measure in R


In this section we build the Lebesgue measure on the real line R. To this
aim, we consider first the set I of all bounded right open intervals of R
I := {(a, b] : a, b ∈ R, a < b}
and the collection A containing ∅ and the finite unions of elements of
I . Our choice of half-open intervals ensures that A is a ring, because
I is stable under intersection and relative complement (the families of
open and closed intervals, instead, do not have this property).
We define
length((a, b]) := b − a.
More generally, any non-empty A ∈ A can be written, possibly in many
ways, as a disjoint finite union of intervals I_i , i = 1, . . . , N ; we define
\[
\lambda(A) := \sum_{i=1}^{N} \operatorname{length}(I_i). \tag{1.12}
\]
Setting λ(∅) = 0, it is not hard to show by elementary methods that λ
is well defined (i.e. λ(A) does not depend on the chosen decomposition)
and additive on A .
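The independence of λ(A) from the chosen decomposition can be checked numerically on a small case. The following is only a sketch: the helper names `length` and `lam` and the two sample decompositions are illustrative assumptions, and a half-open interval (a, b] is represented by the pair (a, b) with exact rational endpoints.

```python
from fractions import Fraction

def length(interval):
    """Length b - a of a half-open interval (a, b], stored as the pair (a, b)."""
    a, b = interval
    return b - a

def lam(disjoint_intervals):
    """Formula (1.12): sum of the lengths over a disjoint decomposition."""
    return sum(length(i) for i in disjoint_intervals)

# Two disjoint decompositions of the same set A = (0, 1] ∪ (2, 5] (illustrative data).
coarse = [(Fraction(0), Fraction(1)), (Fraction(2), Fraction(5))]
fine = [
    (Fraction(0), Fraction(1, 2)),
    (Fraction(1, 2), Fraction(1)),
    (Fraction(2), Fraction(3)),
    (Fraction(3), Fraction(5)),
]

lam_coarse = lam(coarse)
lam_fine = lam(fine)
print(lam_coarse, lam_fine)  # both equal 4
```

The second decomposition is finer than the first, which is exactly the situation the well-definedness argument reduces to.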
In the next definition we introduce the notion of characteristic function,
which can be used to turn set-theoretic operations into algebraic ones:
for instance the intersection corresponds to the product, when seen at the
level of characteristic functions (see also Exercise 1.1).
Definition 1.19 (Characteristic function of a set). Let A ⊂ X. The
characteristic function 1_A : X → {0, 1} is defined by
\[
1_A(x) :=
\begin{cases}
1 & \text{if } x \in A; \\
0 & \text{if } x \in X \setminus A.
\end{cases}
\]
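As an illustration of how characteristic functions turn set operations into algebraic ones, here is a minimal sketch on a finite set X. The function name `indicator` and the sample sets are assumptions made only for this example.

```python
def indicator(A):
    """Return the characteristic function 1_A as a Python function."""
    return lambda x: 1 if x in A else 0

# A small finite universe and two subsets (illustrative data).
X = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

one_A, one_B = indicator(A), indicator(B)
one_AB = indicator(A & B)    # intersection
one_AuB = indicator(A | B)   # union
one_Ac = indicator(X - A)    # complement in X

# Intersection corresponds to the product, union to the maximum,
# complement to 1 - 1_A, checked pointwise on X.
product_ok = all(one_AB(x) == one_A(x) * one_B(x) for x in X)
max_ok = all(one_AuB(x) == max(one_A(x), one_B(x)) for x in X)
complement_ok = all(one_Ac(x) == 1 - one_A(x) for x in X)
```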

The reader already acquainted with Riemann’s theory of integration can
also notice that λ(A) is the Riemann integral of the characteristic function
1 A of A, and deduce the additivity property of λ directly by the additivity
properties of the Riemann integral. In the next theorem we shall rigor-
ously prove these facts, and more. We first state an auxiliary lemma, a
simple consequence of the Bolzano-Weierstrass compactness theorem on
the real line.
Lemma 1.20. Any bounded and closed interval J contained in the union
of a sequence {An }n∈N of open sets is contained in the union of finitely
many of them.
Proof. Assume with no loss of generality that A_n ⊂ A_{n+1} (otherwise
replace A_n by A_0 ∪ · · · ∪ A_n ),
and assume by contradiction that there exist x_n ∈ J \ A_n for all n; by the
Bolzano–Weierstrass theorem there exists a subsequence (x_{n(k)} ) converging
to some x ∈ J . If n̄ is such that x ∈ A_n̄ , for k large enough x_{n(k)}
belongs to A_n̄ , because A_n̄ is open. But this is not possible, as soon as
n(k) ≥ n̄, because x_{n(k)} ∉ A_{n(k)} and A_{n(k)} ⊃ A_n̄ .

Theorem 1.21. The set function λ defined in (1.12) is σ –additive on A .

Proof. (λ is well defined) Given disjoint partitions I_1 , . . . , I_n and
J_1 , . . . , J_m of A ∈ A , we say that J_1 , . . . , J_m is finer than I_1 , . . . , I_n if any
interval I_i is the disjoint union of some of the intervals J_j . Obviously,
given any two partitions, there exists a third partition finer than both: it
suffices to take all intersections of elements of the first partition with
elements of the second partition, neglecting the empty intersections. Given
these remarks, to show that λ is well defined, it suffices to show that
Σ_i λ(I_i ) = Σ_j λ(J_j ) if J_1 , . . . , J_m is finer than I_1 , . . . , I_n . This
statement reduces to the fact that λ(I ) = Σ_k λ(F_k ) if I ∈ I is the disjoint
union of some elements F_k ∈ I ; this last statement can be easily proved,
starting from the identity (a, b] = (a, c] ∪ (c, b], by induction on the
number of the intervals F_k .
(λ is additive) If F, G ∈ A and F ∩ G = ∅, any disjoint decomposition
of F in intervals I_1 , . . . , I_n ∈ I and any disjoint decomposition of G in
intervals J_1 , . . . , J_m ∈ I provide a decomposition I_1 , . . . , I_n , J_1 , . . . , J_m
of F ∪ G in intervals belonging to I . Using this decomposition to
compute λ(F ∪ G) the additivity easily follows.
(λ is σ –additive) Let (F_n ) ⊂ A be a sequence of disjoint sets in A and
assume that
\[
F := \bigcup_{n=0}^{\infty} F_n \tag{1.13}
\]
also belongs to A .
We prove the additivity property first in the case when F = (x, y] ∈
I . It is also not restrictive to assume that the series Σ_n λ(F_n ) is
convergent. As any F_n is a finite union of intervals, say N_n , we can find,
given any ε > 0, a finite union F'_n ⊃ F_n of intervals in I such that
λ(F'_n ) ≤ λ(F_n ) + ε/2^n and the interior of F'_n contains F_n (just shift
the endpoints of each interval in F_n by a small amount, to obtain a larger
interval in I , increasing the length at most by ε/(N_n 2^n )). Let also
F''_n be the interior of F'_n , that still includes F_n , and let x' ∈ (x, y].
Then, since [x', y] ⊂ ⋃_n F''_n , Lemma 1.20 provides an integer k such
that [x', y] ⊂ ⋃_{n=0}^k F''_n . Hence, the additivity of λ in A gives
\[
y - x' \le \lambda\Big( \bigcup_{n=0}^{k} F'_n \Big)
\le \sum_{n=0}^{k} \lambda(F'_n)
\le \sum_{n=0}^{k} \Big( \lambda(F_n) + \frac{\varepsilon}{2^n} \Big)
\le 2\varepsilon + \sum_{n=0}^{\infty} \lambda(F_n).
\]
By letting first ε ↓ 0 and then letting x' ↓ x we obtain that λ(F) ≤
Σ_{n=0}^∞ λ(F_n ). The opposite inequality simply follows by the monotonicity
and the additivity of λ, because the finite unions of the sets F_n are
contained in F.
In the general case, let
\[
F = \bigcup_{i=1}^{k} I_i,
\]
where I_1 , . . . , I_k are disjoint sets in I . Then, since for any i ∈ {1, . . . , k}
we have that I_i is the disjoint union of I_i ∩ F_n , we know by the previous
step that
\[
\lambda(I_i) = \sum_{n=0}^{\infty} \lambda(I_i \cap F_n).
\]
Adding these identities for i = 1, . . . , k, commuting the sums on the
right hand side and eventually using the additivity of λ on A we obtain
\[
\lambda(F) = \sum_{i=1}^{k} \lambda(I_i \cap F)
= \sum_{n=0}^{\infty} \sum_{i=1}^{k} \lambda(I_i \cap F_n)
= \sum_{n=0}^{\infty} \lambda(F_n).
\]

We say that a measure μ in (R, B (R)) is translation invariant if
μ(A + h) = μ(A) for all A ∈ B (R) and h ∈ R (notice that, by Exercise 1.2,
the class of Borel sets is translation invariant as well). We say also that μ
is locally finite if μ(I ) < ∞ for all bounded intervals I ⊂ R.

Theorem 1.22 (Lebesgue measure in R). There exists a unique, up
to multiplication with constants, translation invariant and locally finite
measure λ in (R, B (R)). The unique such measure λ satisfying
λ([0, 1]) = 1 is called Lebesgue measure.

Proof. (Existence) Let A be the class of finite unions of intervals and
let λ : A → [0, +∞) be the σ –additive set function defined in (1.12).
According to Theorem 1.21 λ admits a unique extension, that we still
denote by λ, to σ (A ) = B (R). Clearly λ is locally finite, and we can use
the uniqueness of the extension to prove translation invariance: indeed,
for any h ∈ R also the σ –additive measure A ↦ λ(A + h) is an extension
of λ|A . As a consequence λ(A) = λ(A + h) for all h ∈ R.
(Uniqueness) Let ν be a translation invariant and locally finite measure
in (R, B (R)) and set c := ν([0, 1]). Notice first that the set of atoms of
ν is at most countable (Exercise 1.5), and since R is uncountable there
exists at least one x such that ν({x}) = 0. By translation invariance this
holds for all x, i.e., ν has no atom.
Excluding the trivial case c = 0 (that gives ν ≡ 0 by translation in-
variance and σ –additivity), we are going to show that ν = cλ on the
class A of finite unions of intervals; by the uniqueness of the extension
in Carathéodory theorem this would imply that ν = cλ on B (R).
By finite additivity and translation invariance it suffices to show that
ν([0, t)) = ct for any t ≥ 0 (by the absence of atoms the same holds for
the intervals (0, t), (0, t], [0, t]). Notice first that, for any integer q ≥ 1,
[0, 1) is the union of q disjoint intervals all congruent to [0, 1/q); as a
consequence, additivity and translation invariance give
\[
\nu([0, 1/q)) = \frac{\nu([0, 1))}{q} = \frac{c}{q}.
\]
Similarly, for any integer p ≥ 1 the interval [0, p/q) is the union of p
disjoint intervals all congruent to [0, 1/q); again additivity and
translation invariance give
\[
\nu([0, p/q)) = p\,\nu([0, 1/q)) = c\,\frac{p}{q}.
\]
By approximation we eventually obtain that ν([0, t)) = ct for all
t ≥ 0.
The completion of the Borel σ –algebra with respect to λ is the so-
called σ -algebra of Lebesgue measurable sets. It coincides with the
class C of additive sets with respect to λ∗ considered in the proof of
Carathéodory theorem (see Exercise 1.12).
Remark 1.23 (Outer Lebesgue measure and non-measurable sets).
The measure λ∗ used in the proof of Carathéodory’s theorem is also called
outer Lebesgue measure, and it is defined on all parts of R. The termin-
ology is slightly misleading here, since λ∗ , though σ –subadditive, fails
to be σ –additive. In particular, there exist subsets of R that are not Le-
besgue measurable. To see this, let us consider the equivalence relation
in R defined by x ∼ y if x − y ∈ Q and let us pick a single element
x ∈ [0, 1] in any equivalence class induced by this relation, thus forming
a set A ⊂ [0, 1]. Were this set Lebesgue measurable, all the sets A + h
would still be measurable, by translation invariance, and the family of
sets {A + h}_{h∈Q} would be a countable and measurable partition of R,
with λ∗ (A + h) = c independent of h ∈ Q. Now, if c = 0 we reach a
contradiction with the fact that λ∗ (R) = ∞, while if c > 0 we consider
all sets A + h with h ∈ Q ∩ [0, 1] to obtain
\[
2 = \lambda^*([0, 2]) \ge \sum_{h \in \mathbb{Q} \cap [0,1]} \lambda^*(A + h) = \infty,
\]
reaching again a contradiction.


Notice that this example is not constructive and strongly requires the
axiom of choice (also the arguments based on cardinality, see Exercise
1.19 and Exercise 1.20, have this limitation). On the other hand, one
can give constructive examples of Lebesgue measurable sets that are not
Borel (see for instance 2.2.11 in [3]).
The construction done in the previous remark rules out the existence of
locally finite and translation invariant σ –additive measures defined on all
parts of R. In Rn , with n ≥ 3, the famous Banach–Tarski paradox (see
for instance [6]) shows that it is also impossible to have a locally finite,
invariant under rigid motions and finitely additive measure defined on all
parts of Rn .

1.7. Inner and outer regularity of measures on metric spaces


Let (E, d) be a metric space and let μ be a finite measure on (E, B (E)).
We shall prove a regularity property of μ.
Proposition 1.24. For any B ∈ B (E) we have

\[
\mu(B) = \sup\{\mu(C) : C \subset B,\ C \text{ closed}\}
= \inf\{\mu(A) : A \supset B,\ A \text{ open}\}. \tag{1.14}
\]
Proof. Let us set

K = {B ∈ B (E) : (1.14) holds}.

It is enough to show that K is a σ –algebra of parts of E including the
open sets of E. Obviously K contains E and ∅. Moreover, if B ∈ K
then its complement B^c belongs to K. Let us prove now that (B_n ) ⊂ K
implies ⋃_n B_n ∈ K . Fix ε > 0. We are going to show that there exist a
closed set C and an open set A such that
\[
C \subset \bigcup_{n=0}^{\infty} B_n \subset A, \qquad \mu(A \setminus C) \le 2\varepsilon. \tag{1.15}
\]
Let n ∈ N. Since B_n ∈ K there exist an open set A_n and a closed set C_n
such that C_n ⊂ B_n ⊂ A_n and
\[
\mu(A_n \setminus C_n) \le \frac{\varepsilon}{2^{n+1}}.
\]
Setting A := ⋃_n A_n , S := ⋃_n C_n we have S ⊂ ⋃_n B_n ⊂ A and
μ(A \ S) ≤ ε. However, A is open but S is not necessarily closed. So, we
approximate S by setting S_n := ⋃_{k=0}^n C_k . The set S_n is obviously closed,
S_n ↑ S and consequently μ(S_n ) ↑ μ(S). Therefore there exists n_ε ∈ N
such that μ(S \ S_{n_ε} ) < ε. Now, setting C = S_{n_ε} we have C ⊂ ⋃_n B_n ⊂ A
and, since C ⊂ S ⊂ A, μ(A \ C) = μ(A \ S) + μ(S \ C) < 2ε. Therefore
⋃_n B_n ∈ K . We
have proved that K is a σ –algebra. It remains to show that K contains
the open subsets of E. In fact, let A be open and set


\[
C_n = \Big\{ x \in E : d(x, A^c) \ge \frac{1}{n} \Big\},
\]

where d(x, Ac ) := inf y∈Ac d(x, y) is the distance function from Ac . Then
Cn are closed subsets of A, and moreover Cn ↑ A, which implies μ(A \
Cn ) ↓ 0. Thus the conclusion follows.
Notice that inner and outer approximation hold for μ–measurable sets
B as well: one has just to notice that there exist Borel sets B1 , B2 such
that B1 ⊂ B ⊂ B2 with μ(B2 \ B1 ) = 0, and apply inner approximation
to B1 and outer approximation to B2 .
Remark 1.25 (Inner and outer approximation for σ-finite measures).
It is possible to extend the inner approximation property to σ -finite
measures: it suffices to assume the existence of a sequence of closed sets C_n with
finite measure such that μ(X \ ⋃_n C_n ) = 0. Indeed, assuming with no loss
of generality that Cn ⊂ Cn+1 , we know that for any Borel set B and any
n ∈ N it holds

μ(B ∩ Cn ) = sup {μ(C) : C closed, C ⊂ B ∩ Cn } ,

so that
μ(B ∩ Cn ) ≤ sup {μ(C) : C closed, C ⊂ B} .
Letting n ↑ ∞ we recover the inner approximation property.
Analogously, if we assume the existence of a sequence of open sets A_n
with finite measure satisfying X = ⋃_n A_n , we have the outer approximation
property: indeed, for any n and any ε > 0 we can find (assuming
with no loss of generality μ(B) < +∞) open sets B_n ⊂ A_n containing
B ∩ A_n and such that μ(B_n \ (B ∩ A_n )) < ε 2^{−n} . It follows that ⋃_n B_n
contains B and
\[
\mu\Big( \bigcup_{n \in \mathbb{N}} B_n \setminus B \Big) < 2\varepsilon.
\]
Since B_n are also open in X, the set ⋃_n B_n is open and since ε is arbitrary
we get the outer approximation property.
We conclude this chapter with the following result, whose proof is a
straightforward consequence of Proposition 1.24 (alternatively, one can
use Dynkin’s argument, since the class of closed sets is a π-system and
generates the Borel σ -algebra).
Corollary 1.26. Let μ, ν be finite measures in (E, B (E)), such that
μ(C) = ν(C) for any closed subset C of E. Then μ = ν.

Exercises
1.1 Given A ⊂ X, denote by 1 A : X → {0, 1} its characteristic function, equal
to 1 on A and equal to 0 on Ac . Show that

1 A∪B = max{1 A , 1 B }, 1 A∩B = min{1 A , 1 B }, 1 Ac = 1 X − 1 A

and that
\[
\limsup_{n\to\infty} A_n = A \iff \limsup_{n\to\infty} 1_{A_n} = 1_A,
\qquad
\liminf_{n\to\infty} A_n = A \iff \liminf_{n\to\infty} 1_{A_n} = 1_A.
\]

1.2 Let A ⊂ Rn be a Borel set. Show that for h ∈ Rn and t ∈ R the sets

A + h := {a + h : a ∈ A} , t A := {ta : a ∈ A}

are Borel as well.


1.3 Find an example of a σ –additive measure μ on a σ –algebra A such that
there exist An ∈ A with An ↓ A and infn μ(An ) > μ(A).
1.4 Let μ be additive and finite, on an algebra A . Show that μ is σ –additive if
and only if it is continuous along nonincreasing sequences.
1.5 Let μ be a finite measure on (X, E ). Show that the set of atoms of μ, defined
by
Aμ := {x ∈ X : {x} ∈ E and μ({x}) > 0}
is at most countable. Show that the same is true for σ –finite measures, and
provide an example of a measure space for which this property fails.
1.6 Let (X, E , μ) be a measure space, with μ finite. We say that μ is diffuse if
for all A ∈ E with μ(A) > 0 there exists B ⊂ A with 0 < μ(B) < μ(A). Show
that, if μ is diffuse, then μ(E ) = [0, μ(X)].
1.7 Show that if X is a separable metric space and E is the Borel σ –algebra,
then a σ –additive measure μ : E → [0, +∞) is diffuse if and only if μ has no
atom.

1.8 Let λ be the Lebesgue measure in [0, 1]. Show the existence of a λ–negli-
gible set having the cardinality of the continuum. Hint: consider the classical
Cantor’s middle third set, obtained by removing the interval (1/3, 2/3) from
[0, 1], then by removing the intervals (1/9, 2/9) and (7/9, 8/9), and so on.
1.9 Let λ be the Lebesgue measure in [0, 1]. Show the existence, for any ε > 0,
of a closed set C ⊂ [0, 1] containing no interval and such that λ(C) > 1 − ε.
Hint: remove from [0, 1] a sequence of open intervals, centered on the rational
points of [0, 1].
1.10 Using the previous exercise, write [0, 1] = A ∪ B where A is negligible
in the measure-theoretic sense (i.e. λ(A) = 0) and B is negligible in the Baire
category sense (i.e. it is the union of countably many closed sets with empty
interior). So, the two concepts of negligible should never be used at the same
time.
1.11 Let λ be the Lebesgue measure in [0, 1]. Construct a Borel set E ⊂ (0, 1)
such that
0 < λ(E ∩ I ) < λ(I )
for any open interval I ⊂ (0, 1).
1.12 Let (X, E , μ) be a measure space and let μ∗ : P (X) → [0, +∞] be
the outer measure induced by μ. Show that the completed σ –algebra E μ is
contained in the class C of additive sets with respect to μ∗ .
1.13 Let (X, E , μ) be a measure space and let μ∗ : P (X) → [0, +∞] be the
outer measure induced by μ. Show that for all A ⊂ X there exists B ∈ E
containing A with μ(B) = μ∗ (A).
1.14 Let (X, E , μ) be a measure space. Check the following statements, made
in Definition 1.12:
(i) E_μ is a σ –algebra;
(ii) the extension μ(A) := μ(B), where B ∈ E is any set such that A Δ B is
contained in a μ–negligible set of E , is well defined and σ –additive on
E_μ ;
(iii) μ–negligible sets of E_μ are characterized by the property of being
contained in a μ–negligible set of E .
1.15 Let (X, E , μ) be a measure space and let μ∗ : P (X) → [0, +∞] be
the outer measure induced by μ. Show that if μ(X) is finite, the class C of
additive sets with respect to μ∗ coincides with the class of E μ –measurable sets.
Hint: one inclusion is provided by Exercise 1.12. For the other one, given an
additive set A, by applying Exercise 1.13 twice, find first a set B ∈ E with
μ∗ (B \ A) = 0, and then a set C ∈ E with μ(C) = 0 and B \ A ⊂ C.
1.16 Find a σ –algebra E ⊂ P (N) containing infinitely many sets and such that
any B ∈ E different from ∅ has an infinite cardinality.
1.17 Find μ : P(N) → {0, +∞} that is additive, but not σ –additive.
1.18 Let ω be the first uncountable ordinal and, for K ⊂ P (X), define by
transfinite induction a family F^(i) , i ∈ ω, as follows: F^(0) := K ∪ {∅},
\[
\mathcal{F}^{(i)} := \Big\{ \bigcup_{k=0}^{\infty} A_k,\ B^c :
(A_k) \subset \mathcal{F}^{(j)},\ B \in \mathcal{F}^{(j)} \Big\},
\]
if i is the successor of j, and F^(i) := ⋃_{j∈i} F^(j) otherwise.
Show that ⋃_{i∈ω} F^(i) = σ (K ).
1.19 Show that B (R) has the cardinality of the continuum. Hint: use the
construction of the previous exercise, and the fact that ω has at most the cardinality
of continuum.
1.20 Show that the σ –algebra L of Lebesgue measurable sets has the same
cardinality of P(R), thus strictly greater than the continuum. Hint: consider all
subsets of Cantor’s middle third set.
1.21 Show that the cardinality of any σ –algebra is either finite or uncountable.
1.22 Let X be a set and let A ⊂ P (X) be an algebra with finite cardinality.
Show that its cardinality is equal to 2^n for some integer n ≥ 1.
1.23 Let (X, E , μ) be a measure space and suppose that X is finite or countable.
Show the existence of a measure μ̃ on P (X) that extends μ, that is,
μ(A) = μ̃(A) for all A ∈ E .
1.24 Find an example of an additive set function μ : P (N) → {0, 1}, with
μ(N) = 1 and μ({n}) = 0 for all n ∈ N (in particular μ is not σ –additive; the
construction of this example requires Zorn’s lemma).
1.25 Let C ∈ B ([0, 1]) with λ(C) > 0. Without using the continuum hypothesis,
show that C has the cardinality of continuum.
1.26 Let (K , d) be a compact metric space and let μ be as in Exercise 1.24.
Let’s say that a sequence (x_n ) ⊂ K μ-converges to x ∈ K if
\[
\mu(\{ n \in \mathbb{N} : d(x_n, x) > \varepsilon \}) = 0 \qquad \forall \varepsilon > 0.
\]
Show that any sequence (x_n ) ⊂ K is μ-convergent and that the μ-limit is
unique.
Chapter 2
Integration

This chapter is devoted to the construction of the integral of E –measurable
functions in general measure spaces (X, E , μ), and its main continuity
and lower semicontinuity properties. Having built in the previous
chapter the Lebesgue measure in the real line R, we obtain as a byproduct
the Lebesgue integral on R; in the last section we compare Lebesgue and
Riemann integral.
In the construction of the integral we prefer to emphasize two viewpoints:
the first, more traditional one
\[
\int_X f \, d\mu = \sum_{z \in \operatorname{Im}(f)} z \, \mu(\{ f = z \})
\]
is appropriate to deal with simple functions (i.e. functions whose range is
finite) and useful to show the additivity of the integral with respect to f .
The second one, for nonnegative functions, is summarized by the formula
\[
\int_X f \, d\mu = \int_0^{\infty} \mu(\{ f > t \}) \, dt.
\]
This second viewpoint is more appropriate to show the continuity
properties of the integral with respect to f (the integral on the right side can
be elementarily defined, since t ↦ μ({ f > t}) is nonincreasing, see
Section 2.4.3). Of course we show that the two viewpoints are consistent
if we restrict ourselves to the class of simple functions.

2.1. Inverse image of a function


Let X be a non empty set. For any function ϕ : X → Y and any I ∈
P (Y ) we set

ϕ −1 (I ) := {x ∈ X : ϕ(x) ∈ I } = {ϕ ∈ I }.

The set ϕ −1 (I ) is called the inverse image of I .


Let us recall some elementary properties of ϕ −1 (the easy proofs are left
to the reader as an exercise):
(i) ϕ −1 (I c ) = (ϕ −1 (I ))c for all I ∈ P (Y );
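(ii) if {J_i }_{i∈I} ⊂ P (Y ) we have
\[
\bigcup_{i \in I} \varphi^{-1}(J_i) = \varphi^{-1}\Big( \bigcup_{i \in I} J_i \Big),
\qquad
\bigcap_{i \in I} \varphi^{-1}(J_i) = \varphi^{-1}\Big( \bigcap_{i \in I} J_i \Big).
\]
In particular, if I ∩ J = ∅ we have ϕ^{−1} (I ) ∩ ϕ^{−1} (J ) = ∅. Also, if
E ⊂ P (Y ) and we consider the family ϕ^{−1} (E ) of subsets of X defined
by
\[
\varphi^{-1}(\mathcal{E}) := \{ \varphi^{-1}(I) : I \in \mathcal{E} \}, \tag{2.1}
\]
we have that ϕ^{−1} (E ) is a σ –algebra whenever E is a σ –algebra.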

2.2. Measurable and Borel functions


We are given measurable spaces (X, E ) and (Y, F ). We say that a func-
tion ϕ : X → Y is (E , F )–measurable if ϕ −1 (F ) ⊂ E . If (Y, F ) =
(R, B (R)), we say that ϕ is a real valued E –measurable function, and if
(X, d) is a metric space and E is the Borel σ –algebra, we say that ϕ is a
real valued Borel function.
The following simple but useful proposition shows that the measurab-
ility condition needs to be checked only on a class of generators.
Proposition 2.1 (Measurability criterion). Let G ⊂ F be such that
σ (G ) = F . Then ϕ : X → Y is (E , F )–measurable if and only if
ϕ −1 (I ) ∈ E for all I ∈ G (equivalently, iff ϕ −1 (G ) ⊂ E ).
Proof. Consider the family D := {I ∈ F : ϕ −1 (I ) ∈ E }. By the
above-mentioned properties of ϕ −1 as an operator between P (Y ) and
P (X), it follows that D is a σ –algebra including G . So, it coincides
with σ (G ) = F .
A simple consequence of the previous proposition is the fact that any
continuous function is a Borel function: more precisely, assume that ϕ :
X → Y is continuous and that E = B (X) and F = B (Y ). Then, the
σ –algebra
\[
\{ A \subset Y : \varphi^{-1}(A) \in \mathcal{B}(X) \}
\]
contains the open subsets of Y (as, by the continuity of ϕ, ϕ^{−1} (A) is
open in X, and in particular Borel, whenever A is open in Y ), and then it
contains the generated σ –algebra, i.e. B (Y ).
The following proposition, whose proof is straightforward, shows that
the class of measurable functions is stable under composition.
Proposition 2.2. Let (X, E ), (Y, F ), (Z , G ) be measurable spaces and
let ϕ : X → Y and ψ : Y → Z be respectively (E , F )–measurable and
(F , G )–measurable. Then ψ ◦ ϕ is (E , G )–measurable.
It is often convenient to consider functions with values in the extended
space R̄ := R ∪ {+∞, −∞}, the so-called extended functions. We say
that a mapping ϕ : X → R̄ is E –measurable if
\[
\varphi^{-1}(\{-\infty\}),\ \varphi^{-1}(\{+\infty\}) \in \mathcal{E}
\quad \text{and} \quad
\varphi^{-1}(I) \in \mathcal{E} \quad \forall I \in \mathcal{B}(\mathbb{R}). \tag{2.2}
\]
This condition can also be interpreted in terms of measurability between
E and a suitable Borel σ –algebra in R̄, see Exercise 2.3. Analogously,
when (X, d) is a metric space and E is the Borel σ –algebra, we say that
ϕ : X → R̄ is Borel whenever the conditions above hold.
The following proposition shows that extended E –measurable func-
tions are stable under pointwise limits and countable supremum and in-
fimum.
Proposition 2.3. Let (ϕ_n ) be a sequence of extended E –measurable
functions. Then the following functions are E –measurable:
\[
\sup_{n \in \mathbb{N}} \varphi_n(x), \qquad
\inf_{n \in \mathbb{N}} \varphi_n(x), \qquad
\limsup_{n \to \infty} \varphi_n(x), \qquad
\liminf_{n \to \infty} \varphi_n(x).
\]
Proof. Let us prove that ϕ(x) := sup_n ϕ_n (x) is E –measurable (all other
cases can be deduced from this one, or directly proved by similar
arguments). For any a ∈ R we have
\[
\varphi^{-1}([-\infty, a]) = \bigcap_{n=0}^{\infty} \varphi_n^{-1}([-\infty, a]) \in \mathcal{E}.
\]
In particular {ϕ = −∞} ∈ E , so that ϕ^{−1} ((−∞, a]) ∈ E for all a ∈ R;
by letting a ↑ ∞ we get ϕ^{−1} (R) ∈ E . As a consequence, the class
\[
\{ I \in \mathcal{B}(\mathbb{R}) : \varphi^{-1}(I) \in \mathcal{E} \}
\]
is a σ –algebra containing the intervals of the form (−∞, a] with a ∈ R,
and therefore coincides with B (R). Eventually, {ϕ = +∞} = X \
[ϕ^{−1} (R) ∪ {ϕ = −∞}] belongs to E as well.

2.3. Partitions and simple functions


Let (X, E ) be a measurable space. A function ϕ : X → R is said to be
simple if its range ϕ(X) is a finite set. The class of simple functions is
obviously a real vector space, as the range of ϕ + ψ is contained in
{a + b : a ∈ range(ϕ), b ∈ range(ψ)} .
If ϕ(X) = {a_1 , . . . , a_n }, with a_i ≠ a_j if i ≠ j, setting A_i = ϕ^{−1} ({a_i }),
i = 1, . . . , n we can canonically represent ϕ as
\[
\varphi(x) = \sum_{k=1}^{n} a_k 1_{A_k}(x), \qquad x \in X. \tag{2.3}
\]
Moreover, A_1 , . . . , A_n is a finite partition of X (i.e. A_i are mutually
disjoint and their union is equal to X). However, a simple function ϕ has
many representations of the form
\[
\varphi(x) = \sum_{k=1}^{N} a_k 1_{A_k}(x), \qquad x \in X,
\]
where A_1 , . . . , A_N need not be mutually disjoint and a_k need not be in
the range of ϕ. For instance
\[
1_{[0,1)} + 3 \cdot 1_{[1,2]} = 1_{[0,2]} + 2 \cdot 1_{[1,2]}.
\]

It is easy to check that a simple function is E –measurable if, and only if,
all level sets Ak in (2.3) are E –measurable; in this case we shall also say
that {Ak } is a finite E –measurable partition of X.
Now we show that any nonnegative E –measurable function can be ap-
proximated by simple functions; a variant of this result, with a different
construction, is proposed in Exercise 2.8.
Proposition 2.4. Let ϕ be a nonnegative extended E –measurable
function. For any n ∈ N∗ , define
\[
\varphi_n(x) =
\begin{cases}
\dfrac{i-1}{2^n} & \text{if } \dfrac{i-1}{2^n} \le \varphi(x) < \dfrac{i}{2^n},\ i = 1, 2, \dots, n2^n; \\[1ex]
n & \text{if } \varphi(x) \ge n.
\end{cases}
\tag{2.4}
\]
Then ϕ_n are simple and E –measurable, (ϕ_n ) is nondecreasing and
convergent to ϕ. If in addition ϕ is bounded the convergence is uniform.
Proof. It is not difficult to check that (ϕ_n ) is nondecreasing. Moreover,
we have
\[
0 \le \varphi(x) - \varphi_n(x) \le \frac{1}{2^n} \quad \text{if } \varphi(x) < n,\ x \in X,
\]
and
\[
0 \le \varphi(x) - \varphi_n(x) = \varphi(x) - n \quad \text{if } \varphi(x) \ge n,\ x \in X.
\]
So, the conclusion easily follows.
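The approximation (2.4) acts pointwise on the values of ϕ, so it can be sketched as a function of a single value. This is only an illustration: the name `phi_n` and the sample value 1.7 are assumptions, and floating point numbers stand in for real values.

```python
import math

def phi_n(value, n):
    """Value of the n-th approximant (2.4) at a point where phi equals `value` >= 0."""
    if value >= n:  # also covers value == +inf
        return float(n)
    # Largest dyadic number (i - 1) / 2**n with (i - 1) / 2**n <= value < i / 2**n.
    return math.floor(value * 2**n) / 2**n

# Approximants of the value 1.7 for n = 1, ..., 5: nondecreasing, never above 1.7.
approximants = [phi_n(1.7, n) for n in range(1, 6)]
print(approximants)
```

For n larger than the value, the error is at most 2^{-n}, which is the estimate used in the proof.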

2.4. Integral of a nonnegative E –measurable function


We are given a measure space (X, E , μ). We start to define the integral
for simple nonnegative functions.

2.4.1. Integral of simple functions


Let ϕ be a nonnegative simple E –measurable function, and let us represent
it as
\[
\varphi(x) = \sum_{k=1}^{N} a_k 1_{A_k}(x), \qquad x \in X,
\]
with N ∈ N, a_1 , . . . , a_N ≥ 0 and A_1 , . . . , A_N in E . Then we define
(using the standard convention in the theory of integration that 0·∞ = 0),
\[
\int_X \varphi \, d\mu := \sum_{k=1}^{N} a_k \mu(A_k).
\]
It is easy to see that the definition does not depend on the choice of the
representation formula for ϕ. Indeed, let {b_1 , . . . , b_M } be the range of ϕ
and let ϕ = Σ_{j=1}^M b_j 1_{B_j} , with B_j := ϕ^{−1} (b_j ), be the canonical
representation of ϕ. We have to prove that
\[
\sum_{k=1}^{N} a_k \mu(A_k) = \sum_{j=1}^{M} b_j \mu(B_j). \tag{2.5}
\]
As the B_j ’s are pairwise disjoint, (2.5) follows by adding the M identities
\[
\sum_{k=1}^{N} a_k \mu(A_k \cap B_j) = b_j \mu(B_j), \qquad j = 1, \dots, M. \tag{2.6}
\]
In order to show (2.6) we fix j and consider, for I ⊂ {1, . . . , N }, the sets
\[
A_I := \{ x \in B_j : x \in A_i \text{ iff } i \in I \},
\]
so that {A_I } are an E –measurable partition of B_j and x ∈ A_I iff the set
of i’s for which x ∈ A_i coincides with I . Then, using first the fact that
A_I ⊂ A_i if i ∈ I , and A_i ∩ A_I = ∅ otherwise, and then the fact that
Σ_{k∈I} a_k = b_j whenever A_I ≠ ∅ (because Σ_{k=1}^N a_k 1_{A_k} coincides with b_j , the
constant value of ϕ on B_j ), we have
\[
\begin{aligned}
\sum_{k=1}^{N} a_k \mu(A_k \cap B_j)
&= \sum_{k=1}^{N} \sum_{I} a_k \mu(A_k \cap A_I)
= \sum_{I} \sum_{k=1}^{N} a_k \mu(A_k \cap A_I) \\
&= \sum_{I} \sum_{k \in I} a_k \mu(A_I)
= \sum_{I} b_j \mu(A_I) = b_j \mu(B_j).
\end{aligned}
\]
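The independence from the representation, identity (2.5), can be checked on a finite example. In this sketch the measure is given by point weights on X = {0, ..., 5}; the weights, the representation `rep` and the helper names are all illustrative assumptions.

```python
from collections import defaultdict

# A measure on the finite set X = {0, ..., 5}, given by point weights (illustrative values).
weights = {0: 2, 1: 1, 2: 3, 3: 1, 4: 4, 5: 2}
X = set(weights)

def mu(A):
    """Measure of a subset A of X."""
    return sum(weights[x] for x in A)

def integral(representation):
    """Sum of a_k * mu(A_k) over a representation [(a_k, A_k), ...]."""
    return sum(a * mu(A) for a, A in representation)

def canonical(representation):
    """Canonical representation: pairs (b_j, B_j) with B_j = {phi = b_j}."""
    levels = defaultdict(set)
    for x in X:
        value = sum(a for a, A in representation if x in A)  # phi(x)
        levels[value].add(x)
    return list(levels.items())

# A non-canonical representation with overlapping sets.
rep = [(1, {0, 1, 2, 3}), (2, {2, 3, 4}), (3, {3})]

value_rep = integral(rep)
value_canon = integral(canonical(rep))
print(value_rep, value_canon)  # both equal 26
```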
Proposition 2.5. Let ϕ, ψ be simple nonnegative E –measurable
functions on X and let α, β ≥ 0. Then αϕ + βψ is simple, E –measurable
and we have
\[
\int_X (\alpha\varphi + \beta\psi) \, d\mu
= \alpha \int_X \varphi \, d\mu + \beta \int_X \psi \, d\mu.
\]
Proof. Let
\[
\varphi = \sum_{k=1}^{n} a_k 1_{A_k}, \qquad \psi = \sum_{h=1}^{m} b_h 1_{B_h}
\]
with {A_k }, {B_h } finite E –measurable partitions of X. Then {A_k ∩ B_h } is a
finite E –measurable partition of X and αϕ + βψ is constant (and equal
to αa_k + βb_h ) on any element A_k ∩ B_h of the partition. Therefore the
level sets of αϕ + βψ are finite unions of elements of this partition and
the E –measurability of αϕ + βψ follows (see also Exercise 2.2). Then,
writing
\[
\varphi(x) = \sum_{k=1}^{n} \sum_{h=1}^{m} a_k 1_{A_k \cap B_h}(x), \qquad
\psi(x) = \sum_{k=1}^{n} \sum_{h=1}^{m} b_h 1_{A_k \cap B_h}(x), \qquad x \in X,
\]
we arrive at the conclusion.

2.4.2. The repartition function


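Let ϕ : X → R̄ be E –measurable. The repartition function F of ϕ,
relative to μ, is defined by
\[
F(t) := \mu(\{\varphi > t\}), \qquad t \in \mathbb{R}.
\]
The function F is nonincreasing and satisfies
\[
\lim_{t \to -\infty} F(t) = \lim_{n \to -\infty} F(n)
= \lim_{n \to \infty} \mu(\{\varphi > -n\}) = \mu(\{\varphi > -\infty\}),
\]
and, if μ is finite,
\[
\lim_{t \to +\infty} F(t) = \lim_{n \to \infty} F(n)
= \lim_{n \to \infty} \mu(\{\varphi > n\}) = \mu(\{\varphi = +\infty\}),
\]
since
\[
\{\varphi > -\infty\} = \bigcup_{n=1}^{\infty} \{\varphi > -n\}, \qquad
\{\varphi = +\infty\} = \bigcap_{n=1}^{\infty} \{\varphi > n\}.
\]
Other important properties of F are provided by the following result.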

Proposition 2.6. Let ϕ : X → R̄ be E –measurable and let F be its
repartition function.
(i) For any t_0 ∈ R we have lim_{t→t_0^+} F(t) = F(t_0 ), that is, F is right
continuous.
(ii) If μ is finite, for any t_0 ∈ R we have lim_{t→t_0^−} F(t) = μ({ϕ ≥ t_0 }), that
is, F has left limits (1).
Proof. Let us prove (i). We have
\[
\lim_{t \to t_0^+} F(t) = \lim_{n \to +\infty} F\Big(t_0 + \frac{1}{n}\Big)
= \lim_{n \to +\infty} \mu\Big(\Big\{\varphi > t_0 + \frac{1}{n}\Big\}\Big)
= \mu(\{\varphi > t_0\}) = F(t_0),
\]
since
\[
\{\varphi > t_0\} = \bigcup_{n=1}^{\infty} \Big\{\varphi > t_0 + \frac{1}{n}\Big\}
= \lim_{n \to \infty} \Big\{\varphi > t_0 + \frac{1}{n}\Big\}.
\]
So, (i) follows. We prove now (ii). We have
\[
\lim_{t \to t_0^-} F(t) = \lim_{n \to +\infty} F\Big(t_0 - \frac{1}{n}\Big)
= \lim_{n \to +\infty} \mu\Big(\Big\{\varphi > t_0 - \frac{1}{n}\Big\}\Big)
= \mu(\{\varphi \ge t_0\}),
\]
since
\[
\{\varphi \ge t_0\} = \bigcap_{n=1}^{\infty} \Big\{\varphi > t_0 - \frac{1}{n}\Big\}
= \lim_{n \to \infty} \Big\{\varphi > t_0 - \frac{1}{n}\Big\}
\]
and (ii) follows.
From Proposition 2.6 it follows that, in the case when μ is finite, F is
continuous at t_0 iff μ({ϕ = t_0 }) = 0.
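The jump behaviour of F described above can be observed on a finite measure made of point masses. This is a rough numerical sketch only: the weights, the function `phi` and the small offset `h` (used in place of a true limit) are assumptions chosen for illustration.

```python
# A finite measure on X = {"a", "b", "c"} given by point masses (illustrative values).
weights = {"a": 1, "b": 2, "c": 4}
# A function on X taking the value 1.0 on a set of positive measure.
phi = {"a": 0.5, "b": 1.0, "c": 1.0}

def mu(A):
    """Measure of a subset A of X."""
    return sum(weights[x] for x in A)

def F(t):
    """Repartition function F(t) = mu({phi > t})."""
    return mu({x for x in phi if phi[x] > t})

t0, h = 1.0, 1e-9
right_limit = F(t0 + h)                          # stands in for the limit from the right
left_limit = F(t0 - h)                           # stands in for the limit from the left
mass_ge = mu({x for x in phi if phi[x] >= t0})   # mu({phi >= t0})
jump = left_limit - right_limit                  # equals mu({phi = t0})
```

Here F is right continuous at t_0 = 1 but not continuous, because the level set {ϕ = 1} has positive measure.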
(1) In the literature F is called a càdlàg function.

Now we want to extend the integral operator to nonnegative E –measurable
functions. Let ϕ be a nonnegative, simple and E –measurable
function and let
\[
\varphi(x) = \sum_{k=0}^{n} a_k 1_{A_k}(x), \qquad x \in X,
\]
with n ∈ N∗ , 0 = a_0 < a_1 < a_2 < · · · < a_n < ∞. Then the repartition
function F of ϕ is given by
\[
F(t) =
\begin{cases}
\mu(A_1) + \mu(A_2) + \dots + \mu(A_n) = F(0) & \text{if } 0 \le t < a_1 \\
\mu(A_2) + \mu(A_3) + \dots + \mu(A_n) = F(a_1) & \text{if } a_1 \le t < a_2 \\
\qquad \vdots & \\
\mu(A_n) = F(a_{n-1}) & \text{if } a_{n-1} \le t < a_n \\
0 = F(a_n) & \text{if } t \ge a_n.
\end{cases}
\]
Consequently, we can write
\[
\begin{aligned}
\int_X \varphi(x) \, d\mu(x)
&= \sum_{k=1}^{n} a_k \mu(A_k)
= \sum_{k=1}^{n} a_k \big( F(a_{k-1}) - F(a_k) \big) \\
&= \sum_{k=1}^{n} a_k F(a_{k-1}) - \sum_{k=1}^{n} a_k F(a_k) \\
&= \sum_{k=0}^{n-1} a_{k+1} F(a_k) - \sum_{k=0}^{n-1} a_k F(a_k) \\
&= \sum_{k=0}^{n-1} (a_{k+1} - a_k) F(a_k) = \int_0^{\infty} F(t) \, dt.
\end{aligned}
\tag{2.7}
\]
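Identity (2.7) can be verified on concrete numbers. In the sketch below the values a_k and the measures μ(A_k) are illustrative assumptions; only the measures of the level sets are needed, so the sets themselves are not modelled.

```python
# Values 0 = a_0 < a_1 < ... < a_n and the measures of the level sets (illustrative numbers).
a = [0, 1, 3, 4]
mu_A = [5, 2, 1, 3]  # mu_A[k] = mu(A_k); mu_A[0] is the measure of the zero level set

def F(t):
    """Repartition function F(t) = mu({phi > t}) of the simple function."""
    return sum(m for value, m in zip(a, mu_A) if value > t)

n = len(a) - 1
# Left side of (2.7): sum of a_k * mu(A_k).
lhs = sum(a[k] * mu_A[k] for k in range(1, n + 1))
# Right side of (2.7): sum of (a_{k+1} - a_k) * F(a_k), the area under the graph of F.
rhs = sum((a[k + 1] - a[k]) * F(a[k]) for k in range(n))
print(lhs, rhs)  # both equal 17
```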

Example 2.7. We set X = R, μ = λ,
A_1 = [1, 2] ∪ [10, 11], A_2 = [2, 3], A_3 = [3, 4],
A_4 = [4, 6], A_5 = [7, 10],
a_1 = 5, a_2 =, a_3 = 10, a_4 = 7, a_5 = 2
and ϕ := Σ_{k=1}^5 a_k 1_{A_k} to be the simple function shown in Figure 2.1. It is
easy to verify that F has the graph shown in the right picture in Figure 2.1.

[Figure 2.1. A simple function ϕ, and its repartition F.]

The color scheme used for the areas below the two graphs in Figure 2.1 shows
graphically that the areas are identical.
Now, we want to define the integral of any nonnegative extended E –
measurable function by generalizing formula (2.7). For this, we need
first to define the integral of any nonnegative nonincreasing function in
(0, +∞).

2.4.3. The archimedean integral


We generalize here the (inner) Riemann integral to any nonincreasing
function f : [0, +∞) → [0, +∞]. The strategy is to consider the
supremum of the areas of piecewise constant minorants of f .
Let Σ be the set of all finite decompositions σ = {t_0 , t_1 , . . . , t_N } of
[0, +∞), where N ∈ N∗ and 0 = t_0 ≤ t_1 < · · · < t_N < +∞.
Let now f : [0, +∞) → [0, +∞] be a nonincreasing function. For
any σ = {t_0 , t_1 , . . . , t_N } ∈ Σ we consider the partial sum
\[
I_f(\sigma) := \sum_{k=0}^{N-1} f(t_{k+1})(t_{k+1} - t_k). \tag{2.8}
\]
We define
\[
\int_0^{\infty} f(t) \, dt := \sup\{ I_f(\sigma) : \sigma \in \Sigma \}. \tag{2.9}
\]
The integral ∫_0^∞ f (t) dt is called the archimedean integral of f . It enjoys
the usual properties of the Riemann integral (see Exercise 2.5) but, among
these, we will need only the monotonicity with respect to f in the sequel.
For our purposes the most relevant property of the archimedean integral
is instead the continuity under monotonically nondecreasing sequences.
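The partial sums (2.8) are lower sums, so they approach the archimedean integral from below as the decomposition becomes finer and longer. A minimal sketch, assuming the test function f(t) = e^{-t} (whose integral over [0, +∞) is 1) and uniform decompositions; the helper names are hypothetical.

```python
import math

def partial_sum(f, ts):
    """I_f(sigma) from (2.8) for sigma = [t_0, ..., t_N] with t_0 = 0."""
    return sum(f(ts[k + 1]) * (ts[k + 1] - ts[k]) for k in range(len(ts) - 1))

def uniform_decomposition(T, N):
    """N + 1 equally spaced points 0 = t_0 < t_1 < ... < t_N = T."""
    return [T * k / N for k in range(N + 1)]

def f(t):
    """A nonincreasing function on [0, +inf)."""
    return math.exp(-t)

# Three decompositions, each longer and finer than the previous one.
sums = [partial_sum(f, uniform_decomposition(T, N))
        for T, N in [(1, 10), (5, 100), (20, 10000)]]
print(sums)
```

Each partial sum stays below 1, and the last one is within 0.01 of it, in line with the definition (2.9) as a supremum.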
Proposition 2.8. Let f n ↑ f , with f n : [0, +∞) → [0, +∞] nonincreasing. Then

$$\int_0^\infty f_n(t)\,dt \uparrow \int_0^\infty f(t)\,dt.$$

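Proof. By the monotonicity of the archimedean integral with respect to f , for every n we have

$$\int_0^\infty f_n(t)\,dt \le \int_0^\infty f(t)\,dt.$$

To prove the converse inequality, fix $L < \int_0^\infty f(t)\,dt$. Then there exists
$\sigma = \{t_0, t_1, \dots, t_N\} \in \Sigma$ such that

$$\sum_{k=0}^{N-1} f(t_{k+1})\,(t_{k+1} - t_k) > L.$$

Since $f_n(t_{k+1}) \uparrow f(t_{k+1})$ for each of the finitely many indices k, for n large enough

$$\int_0^\infty f_n(t)\,dt \ge \sum_{k=0}^{N-1} f_n(t_{k+1})\,(t_{k+1} - t_k) > L,$$

and letting n → ∞ we find that

$$\sup_{n \in \mathbb{N}} \int_0^\infty f_n(t)\,dt \ge L.$$

Since L is arbitrary, this implies

$$\sup_{n \in \mathbb{N}} \int_0^\infty f_n(t)\,dt \ge \int_0^\infty f(t)\,dt$$

and the conclusion follows.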

2.4.4. Integral of a nonnegative measurable function


We are given a measure space (X, E , μ) and an extended nonnegative
E –measurable function ϕ. Having the identity (2.7) in mind, we define

$$\int_X \varphi\,d\mu := \int_0^\infty \mu(\{\varphi > t\})\,dt. \tag{2.10}$$

Notice that the function t → μ({ϕ > t}) ∈ [0, +∞] is nonnegative and
nonincreasing in [0, +∞), so that its archimedean integral is well defined
and (2.10) extends, by the remarks made at the end of Section 2.4.2, the
integral elementarily defined on simple functions. If the integral is finite
we say that ϕ is μ–integrable.
It follows directly from the analogous properties of the archimedean
integral that the integral so defined is monotone, i.e.

$$\varphi \ge \psi \implies \int_X \varphi\,d\mu \ge \int_X \psi\,d\mu.$$

Indeed, ϕ ≥ ψ implies {ϕ > t} ⊃ {ψ > t} and μ({ϕ > t}) ≥ μ({ψ >
t}) for all t > 0. Furthermore, the integral is invariant under modifications
of ϕ in μ–negligible sets, that is

$$\varphi = \psi \ \ \mu\text{-a.e. in } X \implies \int_X \varphi\,d\mu = \int_X \psi\,d\mu.$$

To show this fact it suffices to notice that ϕ = ψ μ–a.e. in X implies that


the sets {ϕ > t} and {ψ > t} differ in a μ–negligible set for all t > 0,
therefore μ({ϕ > t}) = μ({ψ > t}) for all t > 0.
Let us prove the following basic Markov inequality.

Proposition 2.9. For any a ∈ (0, +∞) we have

$$\mu(\{\varphi \ge a\}) \le \frac{1}{a} \int_X \varphi(x)\,d\mu(x). \tag{2.11}$$

Proof. For any a ∈ (0, +∞) we have, recalling the inclusion {ϕ ≥ a} ⊂
{ϕ > t} for any t ∈ (0, a), that μ({ϕ > t}) ≥ μ({ϕ ≥ a}) for all
t ∈ (0, a). The monotonicity of the archimedean integral gives

$$\int_X \varphi(x)\,d\mu(x) = \int_0^\infty \mu(\{\varphi > t\})\,dt \ge \int_0^\infty 1_{(0,a)}(t)\,\mu(\{\varphi > t\})\,dt \ge a\,\mu(\{\varphi \ge a\}).$$
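The Markov inequality (2.11) is easy to test on a measure carried by finitely many points. The sketch below assumes a hypothetical measure `mu` given by point masses and the function ϕ(x) = x²; neither comes from the text.

```python
def integral(phi, mu):
    """Integral of phi >= 0 against a measure on a finite set (dict: point -> mass)."""
    return sum(phi(x) * m for x, m in mu.items())

def measure_of(pred, mu):
    """mu({x : pred(x)})."""
    return sum(m for x, m in mu.items() if pred(x))

mu = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}   # hypothetical point masses
phi = lambda x: x * x

def markov_holds(a):
    """Check mu({phi >= a}) <= (1/a) * integral of phi, for a > 0."""
    return measure_of(lambda x: phi(x) >= a, mu) <= integral(phi, mu) / a

print([markov_holds(a) for a in (0.5, 1, 4, 9)])
```

For a larger than the maximum of ϕ the left hand side vanishes, so the inequality is informative only for thresholds within the range of ϕ.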

The Markov inequality has some important consequences.


Proposition 2.10. Let ϕ : X → [0, +∞] be an extended E –measurable
function.

(i) If ϕ is μ–integrable then the set {ϕ = +∞} has μ–measure 0, that


is, ϕ is finite μ–a.e. in X.
(ii) The integral of ϕ vanishes iff ϕ is equal to 0 μ–a.e. in X.

Proof. (i) Since $\int_X \varphi\,d\mu < \infty$ we deduce from (2.11) that

$$\lim_{a \to +\infty} \mu(\{\varphi > a\}) = 0.$$

Since

$$\{\varphi = \infty\} = \bigcap_{n=1}^{\infty} \{\varphi > n\},$$

by applying the continuity along decreasing sequences in the space {ϕ > 1}
(which has finite μ–measure by (2.11)) we obtain

$$\mu(\{\varphi = \infty\}) = \lim_{n \to +\infty} \mu(\{\varphi > n\}) = 0.$$

(ii) If $\int_X \varphi\,d\mu = 0$ we deduce from (2.11) that μ({ϕ > a}) = 0 for all
a > 0. Since

$$\mu(\{\varphi > 0\}) = \lim_{n \to +\infty} \mu(\{\varphi > 1/n\}) = 0,$$

the conclusion follows. The other implication follows by the invariance
of the integral.

Proposition 2.11 (Monotone convergence). Let (ϕn ) be a nondecreasing
sequence of extended nonnegative E –measurable functions and set
$\varphi(x) := \lim_{n \to \infty} \varphi_n(x)$ for any x ∈ X. Then

$$\int_X \varphi_n(x)\,d\mu(x) \uparrow \int_X \varphi(x)\,d\mu(x).$$
Proof. It suffices to notice that μ({ϕn > t}) ↑ μ({ϕ > t}) for all t > 0,
and then to apply Proposition 2.8.
Now, by Proposition 2.4 we obtain the following important approximation
property.
Proposition 2.12. Let ϕ : X → [0, +∞] be an extended E –measurable
function. Then there exist simple E –measurable functions ϕn : X →
[0, +∞) such that ϕn ↑ ϕ, so that

$$\int_X \varphi_n(x)\,d\mu(x) \uparrow \int_X \varphi(x)\,d\mu(x).$$
Remark 2.13 (Construction of Lebesgue and Riemann integrals).
Proposition 2.12 could be used as an alternative, and equivalent, defini-
tion of the Lebesgue integral: we can just define it as the supremum of the
integral of minorant simple functions. This alternative definition is closer
to the definitions of Archimedean integrals and of inner Riemann integ-
ral: the only (fundamental) difference is due to the choice of the family of
“simple” functions. In all cases simple functions take finitely many val-
ues, but within the Lebesgue theory their level sets belong to a σ –algebra,
and so the family of simple functions is much richer, in comparison with
the other theories.
We can now prove the additivity property of the integral.
Proposition 2.14. Let ϕ, ψ : X → [0, ∞] be E –measurable functions.
Then

$$\int_X (\varphi + \psi)\,d\mu = \int_X \varphi\,d\mu + \int_X \psi\,d\mu.$$

Proof. Let ϕn , ψn be simple functions with ϕn ↑ ϕ and ψn ↑ ψ (they exist
by Proposition 2.12). Then, the additivity of the integral on simple functions gives

$$\int_X (\varphi_n + \psi_n)\,d\mu = \int_X \varphi_n\,d\mu + \int_X \psi_n\,d\mu.$$

We conclude passing to the limit as n → ∞ and using the monotone
convergence theorem.
The following Fatou's lemma, providing a semicontinuity property of
the integral, is of basic importance.

Lemma 2.15 (Fatou). Let ϕn : X → [0, +∞] be extended E –measurable
functions. Then we have

$$\int_X \liminf_{n \to \infty} \varphi_n(x)\,d\mu(x) \le \liminf_{n \to \infty} \int_X \varphi_n(x)\,d\mu(x). \tag{2.12}$$

Proof. Setting $\varphi(x) := \liminf_n \varphi_n(x)$ and $\psi_n(x) := \inf_{m \ge n} \varphi_m(x)$, we
have that ψn (x) ↑ ϕ(x). Consequently, by the monotone convergence
theorem,

$$\int_X \varphi(x)\,d\mu(x) = \lim_{n \to \infty} \int_X \psi_n(x)\,d\mu(x).$$

On the other hand

$$\int_X \psi_n(x)\,d\mu(x) \le \int_X \varphi_n(x)\,d\mu(x),$$

so that

$$\int_X \varphi(x)\,d\mu(x) \le \liminf_{n \to \infty} \int_X \varphi_n(x)\,d\mu(x).$$

In particular, if ϕn are pointwise converging to ϕ, we have

$$\int_X \varphi(x)\,d\mu(x) \le \liminf_{n \to \infty} \int_X \varphi_n(x)\,d\mu(x).$$

2.5. Integral of functions with a variable sign


Let $\varphi : X \to \overline{\mathbb{R}}$ be an extended E –measurable function. We say that ϕ
is μ–integrable if both the positive part $\varphi^+(x) := \max\{\varphi(x), 0\}$ and the
negative part $\varphi^-(x) := \max\{-\varphi(x), 0\}$ of ϕ are μ–integrable in X. As
$\varphi = \varphi^+ - \varphi^-$, in this case it is natural to define

$$\int_X \varphi(x)\,d\mu(x) := \int_X \varphi^+(x)\,d\mu(x) - \int_X \varphi^-(x)\,d\mu(x).$$

As $|\varphi| = \varphi^+ + \varphi^-$, the additivity properties of the integral give that

$$\varphi \text{ is } \mu\text{-integrable if and only if } \int_X |\varphi|\,d\mu < \infty.$$

Let $\varphi : X \to \overline{\mathbb{R}}$ and let A ∈ E be such that $1_A \varphi$ is μ–integrable. We define
also

$$\int_A \varphi(x)\,d\mu(x) := \int_X 1_A(x)\,\varphi(x)\,d\mu(x).$$
In the following proposition we summarize the main properties of the
integral.

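Proposition 2.16. Let ϕ, ψ : X → R be μ–integrable functions.

(i) For any α, β ∈ R we have that αϕ + βψ is μ–integrable and

$$\int_X (\alpha\varphi + \beta\psi)\,d\mu = \alpha \int_X \varphi\,d\mu + \beta \int_X \psi\,d\mu.$$

(ii) If ϕ ≤ ψ in X we have $\int_X \varphi\,d\mu \le \int_X \psi\,d\mu$.
(iii) $\left| \int_X \varphi\,d\mu \right| \le \int_X |\varphi|\,d\mu$.

Proof. (i). The function αϕ + βψ is μ–integrable because |αϕ + βψ| ≤ |α||ϕ| + |β||ψ|.
Since $(-\varphi)^+ = \varphi^-$ and $(-\varphi)^- = \varphi^+$, we have $\int_X -\varphi\,d\mu = -\int_X \varphi\,d\mu$.
So, possibly replacing ϕ by −ϕ and ψ by −ψ we can assume
that α ≥ 0 and β ≥ 0. We have

$$(\alpha\varphi + \beta\psi)^+ + \alpha\varphi^- + \beta\psi^- = (\alpha\varphi + \beta\psi)^- + \alpha\varphi^+ + \beta\psi^+,$$

so that we can integrate both sides and use the additivity on nonnegative
functions to obtain

$$\int_X (\alpha\varphi + \beta\psi)^+\,d\mu + \alpha \int_X \varphi^-\,d\mu + \beta \int_X \psi^-\,d\mu = \int_X (\alpha\varphi + \beta\psi)^-\,d\mu + \alpha \int_X \varphi^+\,d\mu + \beta \int_X \psi^+\,d\mu.$$

Rearranging terms we obtain (i).
(ii). It follows by the monotonicity of the integral on nonnegative functions
and from the inequalities $\varphi^+ \le \psi^+$ and $\varphi^- \ge \psi^-$.
(iii). Since −|ϕ| ≤ ϕ ≤ |ϕ| the conclusion follows from (ii).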
Another consequence of the additivity property of the integral is the
additivity of the real-valued map

$$A \in \mathcal{E} \mapsto \int_A \varphi\,d\mu$$

whenever ϕ is μ–integrable. We will see in the next section that, as a
consequence of the dominated convergence theorem, this map is even
σ –additive.

2.6. Convergence of integrals


In this section we study the problem of commuting limit and integral;
we have already seen that this can be done in some particular cases, as
when the functions are nonnegative and monotonically converge to their
supremum, and now we investigate some more general cases, relevant for
the applications.

Proposition 2.17 (Lebesgue dominated convergence theorem). Let
(ϕn ) be a sequence of E –measurable functions pointwise converging to
ϕ. Assume that there exists a nonnegative μ–integrable function ψ such
that

$$|\varphi_n(x)| \le \psi(x) \qquad \forall x \in X,\ n \in \mathbb{N}.$$

Then the functions ϕn and the function ϕ are μ–integrable and

$$\lim_{n \to \infty} \int_X \varphi_n\,d\mu = \int_X \varphi\,d\mu.$$

Proof. Passing to the limit as n → ∞ we obtain that ϕ is E –measurable
and |ϕ| ≤ ψ in X. In particular ϕ is μ–integrable. Since the functions ϕn + ψ are
nonnegative, by the Fatou lemma we have

$$\int_X (\varphi + \psi)\,d\mu \le \liminf_{n \to \infty} \int_X (\varphi_n + \psi)\,d\mu.$$

Consequently,

$$\int_X \varphi\,d\mu \le \liminf_{n \to \infty} \int_X \varphi_n\,d\mu. \tag{2.13}$$

In a similar way, since the functions ψ − ϕn are nonnegative, we have

$$\int_X (\psi - \varphi)\,d\mu \le \liminf_{n \to \infty} \int_X (\psi - \varphi_n)\,d\mu.$$

Consequently,

$$\int_X \varphi\,d\mu \ge \limsup_{n \to \infty} \int_X \varphi_n\,d\mu. \tag{2.14}$$

Now the conclusion follows by (2.13) and (2.14).

An important consequence of the dominated convergence theorem is
the absolute continuity property of the integral of μ–integrable functions ϕ:

$$\text{for any } \varepsilon > 0 \text{ there exists } \delta > 0 \text{ such that } \mu(A) < \delta \implies \int_A |\varphi|\,d\mu < \varepsilon. \tag{2.15}$$

The proof of this property is sketched in Exercise 2.9.

2.6.1. Uniform integrability and Vitali convergence theorem


In this subsection we assume for simplicity that the measure μ is finite.
A family $\{\varphi_i\}_{i \in I}$ of R–valued μ–integrable functions is said to be
μ–uniformly integrable if

$$\lim_{\mu(A) \to 0} \int_A |\varphi_i(x)|\,d\mu(x) = 0, \quad \text{uniformly in } i \in I.$$

This means that for any ε > 0 there exists δ > 0 such that

$$\mu(A) < \delta \implies \int_A |\varphi_i(x)|\,d\mu(x) \le \varepsilon \qquad \forall\, i \in I.$$

This property obviously extends from single functions to families of functions
the absolute continuity property of the integral.
Notice that any family $\{\varphi_i\}_{i \in I}$ dominated by a single μ–integrable
function ϕ (i.e. such that $|\varphi_i| \le |\varphi|$ for any i ∈ I ) is obviously
μ–uniformly integrable. Taking this remark into account, we are going
to prove the following extension of the dominated convergence theorem,
known as Vitali Theorem.
Theorem 2.18 (Vitali). Assume that μ is a finite measure and let (ϕn ) be
a μ–uniformly integrable sequence of functions pointwise converging to
a real valued function ϕ. Then ϕ is μ–integrable and

$$\lim_{n \to \infty} \int_X \varphi_n\,d\mu = \int_X \varphi\,d\mu.$$

To prove the Vitali theorem we need the following Egorov Lemma.

Lemma 2.19 (Egorov). Assume that μ is a finite measure and let (ϕn )
be a sequence of E –measurable functions pointwise converging to a real
valued function ϕ. Then for any δ > 0 there exists a set $A_\delta \in \mathcal{E}$ such that
$\mu(A_\delta) < \delta$ and ϕn → ϕ uniformly in $X \setminus A_\delta$.
Proof. For any integer m ≥ 1 we write X as the union, increasing with respect to n, of the
sets $B_{n,m}$, where

$$B_{n,m} := \left\{ x \in X : |\varphi_i(x) - \varphi(x)| < \frac{1}{m} \ \ \forall i \ge n \right\}.$$

Since μ is finite there exists n(m) such that $\mu(B_{n(m),m}) > \mu(X) - 2^{-m}\delta$.
We denote by $A_\delta$ the union over m ≥ 1 of the sets $X \setminus B_{n(m),m}$, so that

$$\mu(A_\delta) \le \sum_{m=1}^{\infty} \mu(X \setminus B_{n(m),m}) < \sum_{m=1}^{\infty} \frac{\delta}{2^m} = \delta.$$

Now, given any ε > 0, we can choose m > 1/ε to obtain that

$$|\varphi_n(x) - \varphi(x)| \le \frac{1}{m} < \varepsilon \quad \text{for all } x \in B_{n(m),m},\ n \ge n(m).$$

As $X \setminus A_\delta \subset B_{n(m),m}$, this proves the uniform convergence of ϕn to ϕ on
$X \setminus A_\delta$.
Proof of the Vitali Theorem. Fix ε > 0 and find δ > 0 such that
$\int_A |\varphi_n|\,d\mu < \varepsilon$ for all n whenever μ(A) < δ. Fatou's Lemma yields that
$\int_A |\varphi|\,d\mu \le \varepsilon$ whenever μ(A) < δ.
Assume now that A, with μ(A) < δ, is given by Egorov Lemma, so that ϕn → ϕ
uniformly on X \ A. Then, writing

$$\int_X (\varphi - \varphi_n)\,d\mu = \int_{X \setminus A} (\varphi - \varphi_n)\,d\mu + \int_A (\varphi - \varphi_n)\,d\mu$$

and using the fact that $\lim_n \sup_{X \setminus A} |\varphi_n - \varphi| = 0$ we obtain

$$\left| \int_X (\varphi - \varphi_n)\,d\mu \right| \le 3\varepsilon$$

for n large enough. The statement follows letting ε ↓ 0.

2.7. A characterization of Riemann integrable functions



The integrals $\int_J f\,d\lambda$, with J = [a, b] closed interval of the real line and
λ equal to the Lebesgue measure in R, are traditionally denoted with the
classical notation $\int_a^b f\,dx$ or with $\int_J f\,dx$. This is due to the fact that
Riemann's and Lebesgue's integrals coincide on the class of Riemann
integrable functions.
We denote by $I_*(f)$ and $I^*(f)$ the lower and upper Riemann integral
of f respectively, the former defined by taking the supremum of the sums
$\sum_{i=1}^{n-1} a_i (t_{i+1} - t_i)$ in correspondence of all step functions

$$h = \sum_{i=1}^{n-1} a_i 1_{[t_i, t_{i+1})} \le f, \qquad a = t_1 < \dots < t_n = b, \tag{2.16}$$

and the latter considering the infimum in correspondence of all step functions
h ≥ f . We denote by I ( f ) the Riemann integral, equal to the upper
and lower integral whenever the two integrals coincide.
As the Lebesgue integral of the function h in (2.16) coincides with
$\sum_{i=1}^{n-1} a_i (t_{i+1} - t_i)$, we have

$$\int_J g\,d\lambda = I(g) \quad \text{for any step function } g : J \to \mathbb{R}.$$

Now, if f : J → R is continuous, we can choose a uniformly bounded
sequence of step functions $g_h$ converging pointwise to f (for instance
splitting J into h equal intervals $[x_i, x_{i+1})$ and setting $a_i = \min_{[x_i, x_{i+1}]} f$)
whose Riemann integrals converge to I ( f ). Therefore, passing to the
limit in the identity above with $g = g_h$, and using the dominated convergence
theorem we get

$$\int_J f\,d\lambda = I(f) \quad \text{for any continuous function } f : J \to \mathbb{R}.$$

We are going to generalize this fact, providing a full characterization,
within the Lebesgue theory, of Riemann integrable functions.

Theorem 2.20. Let f : J = [a, b] → R be a bounded function. Then
f is Riemann integrable if and only if the set of its discontinuity points
is Lebesgue negligible. If this is the case, we have that f is $\mathcal{B}(J)_\lambda$–measurable and

$$\int_J f\,d\lambda = I(f). \tag{2.17}$$

Proof. Let

$$
\begin{cases}
f_*(x) := \inf\left\{ \liminf\limits_{h \to \infty} f(x_h) : x_h \to x \right\}\\[2mm]
f^*(x) := \sup\left\{ \limsup\limits_{h \to \infty} f(x_h) : x_h \to x \right\}.
\end{cases}
\tag{2.18}
$$

It is not hard to show (see Exercise 2.6 and Exercise 2.7) that $f_*$ is lower
semicontinuous and $f^*$ is upper semicontinuous, therefore both $f_*$ and
$f^*$ are Borel functions.
We are going to show that $I_*(f) = \int_J f_*\,d\lambda$ and $I^*(f) = \int_J f^*\,d\lambda$.
These two equalities yield the conclusion, as f is continuous at λ–a.e.
point in J iff $f^* - f_* = 0$ λ–a.e. in J , and this holds iff (because $f^* - f_* \ge 0$)

$$\int_J (f^* - f_*)\,d\lambda = 0.$$

Furthermore, if the set of discontinuity points of f is λ–negligible, the
Borel function $f_*$ differs from f only in a λ–negligible set, thus f is
$\mathcal{B}(J)_\lambda$–measurable (because { f > t} differs from the Borel set $\{f_* > t\}$
only in a λ–negligible set, see also Exercise 2.4) and its integral coincides
with $\int_J f_*\,d\lambda = \int_J f^*\,d\lambda$; this leads to (2.17).

Since $I^*(f) = -I_*(-f)$ and $f^* = -(-f)_*$, we need only to prove the
first of the two equalities, i.e.

$$\int_J f_*\,d\lambda = I_*(f). \tag{2.19}$$

In order to check the inequality ≤ in (2.19) we apply Exercise 2.11, finding
a sequence of continuous functions $g_h \uparrow f_* \le f$ and obtaining,
thanks to the monotone convergence theorem,

$$\int_J f_*\,d\lambda = \sup_{h \in \mathbb{N}} \int_J g_h\,d\lambda = \sup_{h \in \mathbb{N}} I(g_h) = \sup_{h \in \mathbb{N}} I_*(g_h) \le I_*(f).$$

In order to prove ≥ in (2.19) we fix a step function h ≤ f in [a, b) as in
(2.16) and we notice that $f \ge a_i = h$ in $(t_i, t_{i+1})$ implies $f_* \ge a_i$ in the
same interval. Hence $f_* \ge h$ in $J \setminus \{t_1, \dots, t_n\}$ and, being the set of the
$t_i$'s Lebesgue negligible, we have

$$\int_J f_*\,d\lambda \ge \int_J h\,d\lambda = I(h).$$

Since h is arbitrary the inequality is achieved.

Exercises
2.1 Show that any of the conditions listed below is equivalent to the E –measurability
of ϕ : X → R.
(i) $\varphi^{-1}((-\infty, t]) \in \mathcal{E}$ for all t ∈ R;
(ii) $\varphi^{-1}((-\infty, t)) \in \mathcal{E}$ for all t ∈ R;
(iii) $\varphi^{-1}([a, b]) \in \mathcal{E}$ for all a, b ∈ R;
(iv) $\varphi^{-1}([a, b)) \in \mathcal{E}$ for all a, b ∈ R;
(v) $\varphi^{-1}((a, b)) \in \mathcal{E}$ for all a, b ∈ R.
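2.2 Let ϕ, ψ : X → R be E –measurable. Show that ϕ + ψ and ϕψ are E –measurable.
Hint: prove that

$$\{\varphi + \psi < t\} = \bigcup_{r \in \mathbb{Q}} \bigl[ \{\varphi < r\} \cap \{\psi < t - r\} \bigr]$$

and

$$\{\varphi^2 > a\} = \{\varphi > \sqrt{a}\} \cup \{\varphi < -\sqrt{a}\}, \qquad a \ge 0.$$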

2.3 Let us define a distance d in $\overline{\mathbb{R}}$ by

$$d(x, y) := |\arctan x - \arctan y|$$

where, by convention, arctan(±∞) = ±π/2.
(i) Show that $(\overline{\mathbb{R}}, d)$ is a compact metric space (the so-called compactification
of R) and that A ⊂ R is open relative to the Euclidean distance if, and only
if, it is open relative to d;
(ii) use (i) to show that, given a measurable space (X, E ), $f : X \to \overline{\mathbb{R}}$ is E –measurable
according to (2.2) if and only if it is measurable between E and
the Borel σ –algebra of $(\overline{\mathbb{R}}, d)$.
2.4 Let (X, E , μ) be a measure space and let $\mathcal{E}_\mu$ be the completion of E induced
by μ. Show that f : X → R is $\mathcal{E}_\mu$–measurable iff there exists a E –measurable
function g such that $\{f \ne g\}$ is contained in a μ–negligible set of E .
2.5 Let us define $I_f$ as in (2.8) and let us endow $\Sigma$ with the usual partial ordering
$\sigma = \{t_1, \dots, t_N\} \le \zeta = \{s_1, \dots, s_M\}$ if and only if σ ⊂ ζ . Show that
$\sigma \mapsto I_f(\sigma)$ is nondecreasing. Use this fact to show that $f \mapsto \int_0^\infty f(t)\,dt$ is additive.
2.6 Let f : R → R be a function. Show that the functions $f_*$, $f^*$ defined in
(2.18) are respectively lower semicontinuous and upper semicontinuous.
2.7 Let f : R → R be a bounded function. Using Exercise 2.6 show that
$\{f_* \le t\}$ and $\{f^* \ge t\}$ are closed for all t ∈ R. In particular deduce that

$$\Sigma = \{x \in \mathbb{R} : f \text{ is continuous at } x\}$$

belongs to B (R).
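2.8 Let $(a_n) \subset (0, \infty)$ with

$$\sum_{i=0}^{\infty} a_i = \infty, \qquad \lim_{i \to \infty} a_i = 0.$$

Show that for any ϕ : X → [0, +∞] E –measurable there exist $A_i \in \mathcal{E}$ such that
$\varphi = \sum_i a_i 1_{A_i}$. Hint: set $\varphi_0 := \varphi$, $A_0 := \{\varphi_0 \ge a_0\}$ and $\varphi_1 := \varphi_0 - a_0 1_{A_0} \ge 0$.
Then, set $A_1 := \{\varphi_1 \ge a_1\}$ and $\varphi_2 := \varphi_1 - a_1 1_{A_1}$ and so on.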
2.9 Let ϕ : X → R be μ–integrable. Show that the property (2.15) holds. Hint:
assume by contradiction its failure for some ε > 0 and find $A_i$ with $\mu(A_i) < 2^{-i}$
and $\int_{A_i} |\varphi|\,d\mu \ge \varepsilon$. Then, notice that $B := \limsup_i A_i$ is μ–negligible, consider

$$B_n := \bigcup_{i \ge n} A_i \setminus B \downarrow \emptyset$$

and apply the dominated convergence theorem.


2.10 Prove that if ϕn → ϕ in L 1 ( , E , μ), then (ϕn ) is μ–uniformly integrable.
In addition, find a space (X, E , μ) and a sequence (ϕn ) that is μ–uniformly
integrable, for which there is no g ∈ L 1 (X, E , μ) satisfying |ϕn | ≤ g for all
n ∈ N.
2.11 Let (X, d) be a metric space and let g : X → [0, ∞] be lower semicontinuous
and not identically equal to ∞. For any λ > 0 define

$$g_\lambda(x) := \inf_{y \in X} \{ g(y) + \lambda\, d(x, y) \}.$$

Check that:
(a) $|g_\lambda(x) - g_\lambda(x')| \le \lambda\, d(x, x')$ for all x, x' ∈ X;
(b) $g_\lambda \uparrow g$ as λ ↑ ∞.
2.12 Let $f : \mathbb{R}^2 \to \mathbb{R}$ be satisfying the following two properties:
(i) $x \mapsto f(x, y)$ is continuous in R for all y ∈ R;
(ii) $y \mapsto f(x, y)$ is continuous in R for all x ∈ R.
Show that f is a Borel function. Hint: first reduce to the case when f is
bounded. Then, for ε > 0 consider the functions

$$f_\varepsilon(x, y) := \frac{1}{2\varepsilon} \int_{x - \varepsilon}^{x + \varepsilon} f(x', y)\,dx',$$

proving that $f_\varepsilon$ are continuous and $f_\varepsilon \to f$ as ε ↓ 0.


Chapter 3
Spaces of integrable functions

This chapter is devoted to the properties of the so-called $L^p$ spaces, the
spaces of measurable functions whose p-th power is integrable. Throughout
the chapter a measure space (X, E , μ) will be fixed.

3.1. Spaces $\mathcal{L}^p(X, \mathcal{E}, \mu)$ and $L^p(X, \mathcal{E}, \mu)$

Let Y be a real vector space. We recall that a norm $\|\cdot\|$ on Y is a nonnegative
map defined on Y satisfying:
(i) $\|y\| = 0$ if and only if y = 0;
(ii) $\|\alpha y\| = |\alpha|\,\|y\|$ for all α ∈ R and y ∈ Y ;
(iii) $\|y_1 + y_2\| \le \|y_1\| + \|y_2\|$ for all $y_1, y_2 \in Y$.
The space Y , endowed with the norm $\|\cdot\|$, is called a normed space.
Y is also a metric space when endowed with the distance $d(y_1, y_2) = \|y_1 - y_2\|$
(the triangle inequality is a direct consequence of (iii)). If
(Y, d) is a complete metric space, we say that $(Y, \|\cdot\|)$ is a Banach space.
We denote by $\mathcal{L}^1(X, \mathcal{E}, \mu)$ the real vector space of all μ–integrable
functions on (X, E ). We define

$$\|\varphi\|_1 := \int_X |\varphi(x)|\,d\mu(x), \qquad \varphi \in \mathcal{L}^1(X, \mathcal{E}, \mu).$$

We have clearly

$$\|\alpha\varphi\|_1 = |\alpha|\,\|\varphi\|_1 \qquad \forall \alpha \in \mathbb{R},\ \forall \varphi \in \mathcal{L}^1(X, \mathcal{E}, \mu),$$

and

$$\|\varphi + \psi\|_1 \le \|\varphi\|_1 + \|\psi\|_1 \qquad \forall \varphi, \psi \in \mathcal{L}^1(X, \mathcal{E}, \mu),$$

so that conditions (ii) and (iii) in the definition of the norm are fulfilled.
However, $\|\cdot\|_1$ is not a norm in general, since $\|\varphi\|_1 = 0$ if and only if
ϕ = 0 μ–a.e. in X, so (i) fails.


Then, we can consider the following equivalence relation R on
$\mathcal{L}^1(X, \mathcal{E}, \mu)$,

$$\varphi \sim \psi \iff \varphi = \psi \ \ \mu\text{-a.e. in } X \tag{3.1}$$

and denote by $L^1(X, \mathcal{E}, \mu)$ the quotient space of $\mathcal{L}^1(X, \mathcal{E}, \mu)$ with respect
to R . In other words, $L^1(X, \mathcal{E}, \mu)$ is the quotient vector space
of $\mathcal{L}^1(X, \mathcal{E}, \mu)$ with respect to the vector subspace made by functions
vanishing μ–a.e. in X.
For any $\varphi \in \mathcal{L}^1(X, \mathcal{E}, \mu)$ we denote by $\tilde\varphi$ the equivalence class determined
by ϕ and we set

$$\tilde\varphi + \tilde\psi := \widetilde{\varphi + \psi}, \qquad \alpha\tilde\varphi := \widetilde{\alpha\varphi}. \tag{3.2}$$

It is easily seen that these definitions do not depend on the choice of
representatives in the equivalence class, and endow $L^1(X, \mathcal{E}, \mu)$ with the
structure of a real vector space, whose origin is the equivalence class of
functions vanishing μ–a.e. in X. Furthermore, setting

$$\|\tilde\varphi\|_1 = \|\varphi\|_1, \qquad \tilde\varphi \in L^1(X, \mathcal{E}, \mu),$$

it is also easy to see that this definition does not depend on the particular
element ϕ chosen in $\tilde\varphi$, and that (ii), (iii) still hold. Now, if $\|\tilde\varphi\|_1 = 0$
we have that the integral of |ϕ| is zero, and therefore $\tilde\varphi = 0$. Therefore
$L^1(X, \mathcal{E}, \mu)$, endowed with the norm $\|\cdot\|_1$, is a normed space.
To simplify the notation typically $\tilde\varphi$ is identified with ϕ whenever the
formula does not depend on the choice of the function in the equivalence
class: for instance, quantities as μ({ϕ > t}) or $\int_X \varphi\,d\mu$ have this
independence, as well as most statements and results in Measure Theory
and Probability, so this slight abuse of notation is justified. It should be
noted, however, that formulas like $\varphi(\bar{x}) = 0$, for some fixed $\bar{x} \in X$, do
not make sense in $L^1(X, \mathcal{E}, \mu)$, since they depend on the representative
chosen (unless $\mu(\{\bar{x}\}) > 0$).
More generally, if an exponent p ∈ (0, ∞) is given, we can apply a
similar construction to the space

$$\mathcal{L}^p(X, \mathcal{E}, \mu) := \left\{ \varphi : \varphi \text{ is } \mathcal{E}\text{-measurable and } \int_X |\varphi|^p\,d\mu < \infty \right\}.$$

Since $|x + y|^p \le |x|^p + |y|^p$ if p ≤ 1, and $|x + y|^p \le 2^{p-1}(|x|^p + |y|^p)$
if p ≥ 1, it turns out that $\mathcal{L}^p(X, \mathcal{E}, \mu)$ is a vector space, and we
shall denote by $L^p(X, \mathcal{E}, \mu)$ the quotient vector space, with respect to the
equivalence relation (3.1). Still we can define the sum and product by a
real number as in (3.2), to obtain that $L^p(X, \mathcal{E}, \mu)$ has the structure of a
real vector space. The case p = 2 is particularly relevant for the theory,
as we will see.
Sometimes we will omit either E or μ, writing $L^p(X, \mu)$ or even $L^p(X)$.
This typically happens when (X, d) is a metric space, and E is the Borel
σ -algebra, or when X ⊂ R and μ is the Lebesgue measure.

3.2. The $L^p$ norm

For any $\varphi \in L^p(X, \mathcal{E}, \mu)$ we define

$$\|\varphi\|_p := \left( \int_X |\varphi|^p\,d\mu \right)^{1/p}.$$

We are going to show that $\|\cdot\|_p$ is a norm for any p ∈ [1, +∞). Notice
that we already checked this fact when p = 1, and that the homogeneity
condition (ii) trivially holds, whatever the value of p is. Furthermore,
condition (i) holds precisely because $L^p(X, \mathcal{E}, \mu)$ consists, strictly
speaking, of equivalence classes induced by (3.1). So, the only condition
that needs to be checked is the subadditivity condition (iii), and in the
sequel we can assume p > 1.
The concept of Legendre transform will be useful. Let f : R → R be
a function; we define its Legendre transform $f^* : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ by

$$f^*(y) = \sup_{x \in \mathbb{R}} \{xy - f(x)\}, \qquad y \in \mathbb{R}.$$

Then the following inequality clearly holds:

$$xy \le f(x) + f^*(y) \qquad \forall x, y \in \mathbb{R}, \tag{3.3}$$

and actually $f^*$ could be equivalently defined as the smallest function
with this property.
Example 3.1. Let p > 1 and let

$$f(x) = \begin{cases} \dfrac{x^p}{p} & \text{if } x \ge 0,\\[2mm] 0 & \text{if } x < 0. \end{cases}$$

Then, by an elementary computation, we find that

$$f^*(y) = \begin{cases} \dfrac{y^q}{q} & \text{if } y \ge 0,\\[2mm] +\infty & \text{if } y < 0, \end{cases}$$

where q = p/( p − 1) (equivalently, $\frac{1}{p} + \frac{1}{q} = 1$). Consequently, the
following inequality, known as Young inequality, holds:

$$xy \le \frac{x^p}{p} + \frac{y^q}{q}, \qquad x, y \ge 0. \tag{3.4}$$

Motivated by the previous example, we say that p and q are dual (or
conjugate) exponents if $\frac{1}{p} + \frac{1}{q} = 1$, i.e. q = p/( p − 1). The duality
relation is symmetric in (1, +∞), and obviously 2 is self-dual.
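Example 3.2. Let $f(x) = e^x$, x ∈ R. Then

$$f^*(y) := \sup_{x \in \mathbb{R}} \{xy - e^x\} = \begin{cases} +\infty & \text{if } y < 0,\\ 0 & \text{if } y = 0,\\ y \log y - y & \text{if } y > 0. \end{cases}$$

Consequently, the following inequality holds (with the convention 0 log 0 = 0):

$$xy \le e^x + y \log y - y, \qquad x, y \ge 0. \tag{3.5}$$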
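The Legendre transform of Example 3.1 can be approximated numerically by restricting the supremum to a grid. The sketch below is only a sanity check: the exponent p = 3, the interval [-5, 5] and the grid size are assumptions chosen so that the maximizer lies well inside the grid.

```python
def legendre(f, y, lo=-5.0, hi=5.0, steps=10_000):
    """Grid approximation of f*(y) = sup_x (x*y - f(x)), with x restricted to [lo, hi]."""
    xs = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return max(x * y - f(x) for x in xs)

p = 3.0
q = p / (p - 1)                      # dual exponent, here 1.5
f = lambda x: x**p / p if x >= 0 else 0.0

approx = legendre(f, 2.0)            # grid value of f*(2)
exact = 2.0**q / q                   # closed form y^q / q from Example 3.1

def young_gap(x, y):
    """x^p/p + y^q/q - x*y, which (3.4) asserts is nonnegative for x, y >= 0."""
    return x**p / p + y**q / q - x * y

print(approx, exact, young_gap(1.0, 1.0))
```

The grid value can only underestimate the true supremum, and the Young gap vanishes exactly when x^p = y^q, which is where the supremum defining f* is attained.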

3.2.1. Hölder and Minkowski inequalities


Proposition 3.3 (Hölder inequality). Assume that $\varphi \in L^p(X, \mathcal{E}, \mu)$ and
$\psi \in L^q(X, \mathcal{E}, \mu)$, with p and q dual exponents in (1, +∞). Then
$\varphi\psi \in L^1(X, \mathcal{E}, \mu)$ and

$$\|\varphi\psi\|_1 \le \|\varphi\|_p \|\psi\|_q. \tag{3.6}$$

Proof. If either $\|\varphi\|_p = 0$ or $\|\psi\|_q = 0$ then one of the two functions
vanishes μ–a.e. in X, hence ϕψ vanishes μ–a.e. and the inequality is
trivial. If both $\|\varphi\|_p$ and $\|\psi\|_q$ are strictly positive, by the 1–homogeneity
of both sides in (3.6) with respect to ϕ and ψ, we can assume with no
loss of generality that the two norms are equal to 1.
Now we apply (3.4) to |ϕ(x)| and |ψ(x)| to obtain

$$|\varphi(x)\psi(x)| \le \frac{|\varphi(x)|^p}{p} + \frac{|\psi(x)|^q}{q}.$$

Integrating over X with respect to μ yields

$$\int_X |\varphi(x)\psi(x)|\,d\mu(x) \le \frac{1}{p} + \frac{1}{q} = 1.$$

A particular case of the Hölder inequality is

$$\left| \int_X \varphi(x)\psi(x)\,d\mu(x) \right| \le \left( \int_X \varphi^2(x)\,d\mu(x) \right)^{1/2} \left( \int_X \psi^2(x)\,d\mu(x) \right)^{1/2}.$$

It also follows, as we shall see, from the Cauchy-Schwarz inequality of
scalar products.

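Proposition 3.4 (Minkowski inequality). Assume that p ∈ [1, +∞)
and $\varphi, \psi \in L^p(X, \mathcal{E}, \mu)$. Then $\varphi + \psi \in L^p(X, \mathcal{E}, \mu)$ and

$$\|\varphi + \psi\|_p \le \|\varphi\|_p + \|\psi\|_p. \tag{3.7}$$

Proof. The case p = 1 is obvious. Assume that p ∈ (1, +∞). Then we
have

$$\int_X |\varphi + \psi|^p\,d\mu \le \int_X |\varphi + \psi|^{p-1} |\varphi|\,d\mu + \int_X |\varphi + \psi|^{p-1} |\psi|\,d\mu.$$

Since $|\varphi + \psi|^{p-1} \in L^q(X, \mathcal{E}, \mu)$ where q = p/( p − 1), using the Hölder
inequality we find that

$$\int_X |\varphi + \psi|^p\,d\mu \le \left( \int_X |\varphi + \psi|^p\,d\mu \right)^{1/q} \bigl( \|\varphi\|_p + \|\psi\|_p \bigr).$$

If the left hand side vanishes there is nothing to prove; otherwise we divide both
sides by the first factor on the right, which is finite, and the conclusion follows
because 1 − 1/q = 1/ p.

By the previous proposition it follows that $\|\cdot\|_p$ is a norm on $L^p(X, \mathcal{E}, \mu)$.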
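Both inequalities can be checked on a measure with finitely many atoms, where the integrals reduce to weighted sums. The masses and the two functions below are hypothetical sample data, and `norm` is a helper introduced only for this sketch.

```python
def norm(phi, mu, p):
    """L^p norm of phi with respect to a measure given as a list of point masses."""
    return sum(abs(v) ** p * m for v, m in zip(phi, mu)) ** (1.0 / p)

mu = [0.5, 1.0, 2.0, 0.25]           # hypothetical masses of four atoms
phi = [1.0, -2.0, 0.5, 3.0]
psi = [-1.0, 0.5, 2.0, 1.0]

p = 3.0
q = p / (p - 1)                      # dual exponent

product = [a * b for a, b in zip(phi, psi)]
total = [a + b for a, b in zip(phi, psi)]

# Hoelder (3.6): ||phi*psi||_1 <= ||phi||_p * ||psi||_q
holder_ok = norm(product, mu, 1) <= norm(phi, mu, p) * norm(psi, mu, q)
# Minkowski (3.7): ||phi+psi||_p <= ||phi||_p + ||psi||_p
minkowski_ok = norm(total, mu, p) <= norm(phi, mu, p) + norm(psi, mu, p)

print(holder_ok, minkowski_ok)
```

A single example of course proves nothing; the sketch is meant only to make the roles of p, q and the weights concrete.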

3.3. Convergence in $L^p(X, \mathcal{E}, \mu)$ and completeness

We have seen in the previous section that $L^p(X, \mathcal{E}, \mu)$ is a normed space
for all p ∈ [1, +∞). In this section we prove some properties of the convergence
in these spaces, obtaining as a byproduct the following result.

Theorem 3.5. $L^p(X, \mathcal{E}, \mu)$ is a Banach space for any p ∈ [1, +∞).

This theorem will be a direct consequence of the following proposition,
that provides also a relation between convergence in $L^p$ and convergence
μ–a.e. in X.

Proposition 3.6. Let p ∈ [1, +∞) and let (ϕn ) be a Cauchy sequence in
$L^p(X, \mathcal{E}, \mu)$. Then:

(i) there exists a subsequence $(\varphi_{n(k)})$ converging μ–a.e. to a function ϕ
in $L^p(X, \mathcal{E}, \mu)$;
(ii) (ϕn ) is converging to ϕ in $L^p(X, \mathcal{E}, \mu)$, so that $L^p(X, \mathcal{E}, \mu)$ is a
Banach space.

Proof. Let (ϕn ) be a Cauchy sequence in $L^p(X, \mathcal{E}, \mu)$. Choose a subsequence
$(\varphi_{n(k)})$ such that

$$\|\varphi_{n(k+1)} - \varphi_{n(k)}\|_p < 2^{-k} \qquad \forall k \in \mathbb{N}.$$

Next, set

$$g(x) := \sum_{k=0}^{\infty} |\varphi_{n(k+1)}(x) - \varphi_{n(k)}(x)|, \qquad x \in X.$$

By the monotone convergence theorem and the subadditivity of the $L^p$
norm it follows that

$$\left( \int_X g^p(x)\,d\mu(x) \right)^{1/p} = \lim_{N \to \infty} \left( \int_X \left| \sum_{k=0}^{N-1} |\varphi_{n(k+1)} - \varphi_{n(k)}| \right|^p d\mu \right)^{1/p} \le \lim_{N \to \infty} \sum_{k=0}^{N-1} 2^{-k} = 2 < \infty.$$

Therefore, g is finite μ–a.e., that is, there exists B ∈ E such that μ(B) =
0 and g(x) < ∞ for all $x \in B^c$. Set now

$$\varphi(x) := \varphi_{n(0)}(x) + \sum_{k=0}^{\infty} \bigl( \varphi_{n(k+1)}(x) - \varphi_{n(k)}(x) \bigr), \qquad x \in B^c.$$

The series above is absolutely convergent for any $x \in B^c$; moreover, replacing
the series in the definition of ϕ by the finite sum
$\sum_{k=0}^{N-1} (\varphi_{n(k+1)}(x) - \varphi_{n(k)}(x))$ we obtain the telescoping value $\varphi_{n(N)}(x)$, so that
$\varphi(x) = \lim_k \varphi_{n(k)}(x)$. Therefore, if we define (for
instance) ϕ = 0 on the μ–negligible set B, we obtain that $\varphi_{n(k)} \to \varphi$
μ–a.e. on X.
The inequality $|\varphi| \le |\varphi_{n(0)}| + g$ gives that $|\varphi|^p$ is μ–integrable, so that
$\varphi \in L^p(X, \mathcal{E}, \mu)$. So, (i) is proved.
In order to prove (ii), we first claim that $\varphi_{n(k)} \to \varphi$ in $L^p(X, \mathcal{E}, \mu)$ as
k → ∞. In fact, since

$$|\varphi(x) - \varphi_{n(h)}(x)| \le \sum_{k=h}^{\infty} |\varphi_{n(k+1)}(x) - \varphi_{n(k)}(x)| \qquad \text{for } \mu\text{-a.e. } x \in X,$$

we have, again by monotone convergence and subadditivity of the norm,

$$\left( \int_X |\varphi(x) - \varphi_{n(h)}(x)|^p\,d\mu(x) \right)^{1/p} \le \sum_{k=h}^{\infty} \left( \int_X |\varphi_{n(k+1)}(x) - \varphi_{n(k)}(x)|^p\,d\mu(x) \right)^{1/p} \le \sum_{k=h}^{\infty} 2^{-k},$$

and the claim follows because the right hand side tends to 0 as h → ∞.



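Since (ϕn ) is Cauchy, for any ε > 0 there exists $n_\varepsilon \in \mathbb{N}$ such that

$$n, m > n_\varepsilon \implies \|\varphi_n - \varphi_m\|_p < \varepsilon.$$

Now choose k ∈ N such that $n(k) > n_\varepsilon$ and $\|\varphi - \varphi_{n(k)}\|_p < \varepsilon$. For any
$n > n_\varepsilon$ we have

$$\|\varphi - \varphi_n\|_p \le \|\varphi - \varphi_{n(k)}\|_p + \|\varphi_{n(k)} - \varphi_n\|_p \le 2\varepsilon.$$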

Remark 3.7 ($L^p$ convergence versus μ–a.e. convergence). The argument
used in the previous proof applies also to converging sequences
(as these sequences are obviously Cauchy), and proves that any sequence
(ϕn ) strongly converging to ϕ in $L^p(X, \mathcal{E}, \mu)$ admits a subsequence $(\varphi_{n(k)})$
converging μ–a.e. to ϕ: precisely, this happens whenever

$$\sum_{k=0}^{\infty} \|\varphi_{n(k+1)} - \varphi_{n(k)}\|_p < \infty.$$

In general, however, convergence in $L^p$ does not imply convergence
μ–a.e.: the functions

$$
\begin{cases}
\varphi_0 = 1_{[0,1]}\\
\varphi_1 = 1_{[0,1/2]},\ \varphi_2 = 1_{[1/2,1]}\\
\varphi_3 = 1_{[0,1/3]},\ \varphi_4 = 1_{[1/3,2/3]},\ \varphi_5 = 1_{[2/3,1]}\\
\dots
\end{cases}
$$

converge to 0 in $L^p(0, 1)$, but are nowhere pointwise converging.
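The sequence of sliding indicator functions above can be generated explicitly. In the sketch below the indexing function `interval` is a hypothetical helper reproducing the enumeration of the remark: block m consists of the m intervals [j/m, (j+1)/m], j = 0, ..., m-1.

```python
from fractions import Fraction

def interval(n):
    """Endpoints of the interval whose indicator is phi_n, n = 0, 1, 2, ..."""
    m = 1
    while n >= m:          # skip the block made of m intervals of length 1/m
        n -= m
        m += 1
    return Fraction(n, m), Fraction(n + 1, m)

def phi(n, x):
    """Value of phi_n at the point x of [0, 1]."""
    a, b = interval(n)
    return 1 if a <= x <= b else 0

def lp_norm(n, p):
    """L^p(0,1) norm of phi_n: (length of the interval)^(1/p)."""
    a, b = interval(n)
    return float(b - a) ** (1.0 / p)

x = Fraction(2, 7)
print([phi(n, x) for n in range(15)], lp_norm(5000, 2))
```

Every block covers [0, 1], so at any fixed x the values 1 and 0 both occur in each block with m ≥ 3, while the norms decay like m^{-1/p}.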


The previous remark shows that we can expect to infer pointwise con-
vergence from convergence in L p only modulo the extraction of a sub-
sequence. Now, we ask ourselves about the converse implication: given
a sequence (ϕn ) in L p (X, E , μ) pointwise converging to a function ϕ ∈
L p (X, E , μ), we want to find conditions ensuring the convergence of
(ϕn ) to ϕ in L p (X, E , μ). This is not true in general, as the following
example shows.
Example 3.8. Let X = [0, 1], E = B ([0, 1]) and let μ = λ be the
Lebesgue measure. Set

$$\varphi_n(x) = \begin{cases} n & \text{if } x \in [0, 1/n],\\ 0 & \text{if } x \in (1/n, 1]. \end{cases}$$

Then $\varphi_n(x) \to 0$ for all x ∈ (0, 1] but $\|\varphi_n\|_1 = 1$.



In the next proposition we assume that μ is a finite measure, since we


defined μ–uniform integrability only for finite measures μ.

Proposition 3.9. Let (ϕn ) be a sequence in $L^p(X, \mathcal{E}, \mu)$ pointwise convergent
to a function $\varphi \in L^p(X, \mathcal{E}, \mu)$, with $(|\varphi_n|^p)$ μ–uniformly integrable.
Then $\varphi_n \to \varphi$ in $L^p(X, \mathcal{E}, \mu)$.

Proof. The functions $h_n := |\varphi_n - \varphi|^p$ are pointwise converging to 0 and,
because of the inequality

$$h_n \le 2^{p-1} (|\varphi_n|^p + |\varphi|^p),$$

they are also easily seen to be μ–uniformly integrable. Therefore, by
applying Vitali Theorem 2.18 to $h_n$ we obtain the conclusion.

3.4. The space $L^\infty(X, \mathcal{E}, \mu)$

Let ϕ : X → R be a E –measurable function. We say that ϕ is μ–essentially
bounded if there exists a real number M > 0 such that

$$\mu(\{|\varphi| > M\}) = 0.$$

If ϕ is μ–essentially bounded there exists a nonnegative number, denoted
by $\|\varphi\|_\infty$, such that

$$\|\varphi\|_\infty = \min\{ t \ge 0 : \mu(\{|\varphi| > t\}) = 0 \}. \tag{3.8}$$

This easily follows from the fact that the function $t \mapsto \mu(\{|\varphi| > t\})$ is
right continuous (Proposition 2.6), so the infimum is attained.
Notice also that $\|\varphi\|_\infty$ is characterized by the property

$$\|\varphi\|_\infty \le M \iff |\varphi| \le M \ \ \mu\text{-a.e. in } X. \tag{3.9}$$

We shall denote by $L^\infty(X, \mathcal{E}, \mu)$ the space of all equivalence classes of
μ–essentially bounded functions with respect to the equivalence relation
∼ in (3.1), thus identifying functions that coincide μ–a.e. in X.
Several properties of the $L^p$ spaces extend up to the case p = ∞: first
of all $L^\infty(X, \mathcal{E}, \mu)$ is a real vector space and we have the Minkowski
inequality

$$\|\varphi + \psi\|_\infty \le \|\varphi\|_\infty + \|\psi\|_\infty. \tag{3.10}$$

Indeed, by (3.9) and the triangle inequality, $|\varphi(x) + \psi(x)| \le \|\varphi\|_\infty + \|\psi\|_\infty$
μ–a.e. in X, therefore (3.9) provides (3.10). As a consequence,
$L^\infty(X, \mathcal{E}, \mu)$ endowed with the norm $\|\cdot\|_\infty$, is a normed space.

The Hölder inequality takes the form

$$\int_X |\varphi\psi|\,d\mu \le \|\varphi\|_\infty \int_X |\psi|\,d\mu. \tag{3.11}$$

Indeed, we have just to notice that $|\varphi(x)\psi(x)| \le \|\varphi\|_\infty |\psi(x)|$ for μ–a.e.
x ∈ X, and then integrate with respect to μ. This inequality can be still
written as (3.6), provided we agree that q = 1 is the dual exponent of
p = ∞ (and conversely).
For finite measures we can apply Hölder's inequality to obtain that the
$L^p$ spaces are nested; in particular $L^\infty$ is the smallest one and $L^1$ is the
largest one.
Remark 3.10 (Inclusions between $L^p$ spaces). Assume that μ is finite.
Then, if 1 ≤ r ≤ s ≤ ∞ we have

$$L^r(X, \mathcal{E}, \mu) \supset L^s(X, \mathcal{E}, \mu).$$

In fact, if r < s and $\varphi \in L^s(X, \mathcal{E}, \mu)$ we have, in view of the Hölder
inequality (with p = s/r and q = s/(s − r)),

$$\int_X |\varphi(x)|^r\,d\mu(x) \le \left( \int_X |\varphi(x)|^s\,d\mu(x) \right)^{r/s} \left( \int_X 1_X\,d\mu(x) \right)^{1 - r/s},$$

and so

$$\|\varphi\|_r \le (\mu(X))^{(s-r)/rs}\, \|\varphi\|_s. \tag{3.12}$$

By (3.12) we obtain that $p \mapsto \mu(X)^{-1/p} \|\varphi\|_p$ is nondecreasing for ϕ in
the intersection of the spaces $L^p(X, \mathcal{E}, \mu)$, so that it has a limit as p → ∞.
Since $\mu(X)^{-1/p} \to 1$ as p → ∞ we obtain that $\lim_{p \to \infty} \|\varphi\|_p$ exists,
finite or infinite. The following proposition characterizes $L^\infty(X, \mathcal{E}, \mu)$
and the $L^\infty$ norm in terms of this limit.
Proposition 3.11. Assume that μ is finite and let $\varphi$ be in the intersection
$$\bigcap_{p<\infty} L^p(X,\mathscr{E},\mu).$$
Then $\varphi \in L^\infty(X,\mathscr{E},\mu)$ if and only if the limit $\lim_{p\to\infty}\|\varphi\|_p$ is finite. If this is the case, $\|\varphi\|_\infty$ coincides with the value of the limit.
Proof. If p ≥ 1 and a > 0 we have by the Markov inequality

$$\mu(\{|\varphi|\ge a\}) = \mu(\{|\varphi|^p \ge a^p\}) \le a^{-p}\,\|\varphi\|_p^p.$$

Consequently, $\|\varphi\|_p \ge a\,\mu(\{|\varphi|\ge a\})^{1/p}$, which yields $\lim_p \|\varphi\|_p \ge a$ whenever $\mu(\{|\varphi|\ge a\}) > 0$. So, if the limit is finite, we have $\varphi\in L^\infty(X,\mathscr{E},\mu)$ and $\|\varphi\|_\infty \le \lim_p\|\varphi\|_p$. The converse inequality follows directly from (3.11) (applied to $|\varphi|^p$ and $1_X$, it gives $\|\varphi\|_p \le \mu(X)^{1/p}\|\varphi\|_\infty$); the same inequality also proves that if the limit is not finite, then $\varphi \notin L^\infty(X,\mathscr{E},\mu)$.
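The statement of Proposition 3.11 can be observed numerically on a toy example. The following is only a sketch under stated assumptions: the measure space is a hypothetical finite one with four atoms of total mass 1 (one of them negligible), and the helper names `lp_norm` and `linf_norm` are introduced here for illustration.

```python
def lp_norm(values, weights, p):
    """L^p norm of a simple function equal to values[i] on a set of measure weights[i]."""
    total = sum(w * abs(v) ** p for v, w in zip(values, weights) if w > 0)
    return total ** (1.0 / p)

def linf_norm(values, weights):
    """Essential supremum: the largest |value| taken on a set of positive measure."""
    return max(abs(v) for v, w in zip(values, weights) if w > 0)

# Hypothetical data: total mass 1, the value 10.0 sits on a negligible set.
values = [0.5, -2.0, 3.0, 10.0]
weights = [0.4, 0.5, 0.1, 0.0]

# Since mu(X) = 1, p -> ||phi||_p is nondecreasing and tends to ||phi||_inf = 3.
norms = [lp_norm(values, weights, p) for p in (1, 2, 8, 64, 256)]
print(norms, linf_norm(values, weights))
```

The value taken on the negligible set plays no role, in agreement with the definition (3.8).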

In the next remark we characterize the convergence in $L^\infty$, proving also that $L^\infty(X,\mathscr{E},\mu)$ is a Banach space: as a matter of fact, convergence in $L^\infty(X,\mathscr{E},\mu)$ differs from convergence in the supremum norm only because a μ–negligible set is neglected.
Remark 3.12 ($L^\infty(X,\mathscr{E},\mu)$ is a Banach space). Assume that $(\varphi_n)\subset L^\infty(X,\mathscr{E},\mu)$ is a Cauchy sequence, and let us consider the μ–negligible set

$$B := \bigcup_{n,m=0}^{\infty} \{x\in X : |\varphi_n(x)-\varphi_m(x)| > \|\varphi_n-\varphi_m\|_\infty\}.$$

Then $\sup_{B^c}|\varphi_n-\varphi_m| \le \|\varphi_n-\varphi_m\|_\infty$; as a consequence, the completeness of the space of bounded functions defined in $B^c$ provides a bounded function $\varphi : B^c\to\mathbb{R}$ such that $\varphi_n\to\varphi$ uniformly in $B^c$. Extending $\varphi$ in an arbitrary $\mathscr{E}$–measurable way (for instance with the 0 value) to the whole of X, we get $\varphi_n\to\varphi$ in $L^\infty(X,\mathscr{E},\mu)$.
A similar argument proves that $\varphi_n\to\varphi$ in $L^\infty(X,\mathscr{E},\mu)$ if and only if there exists a μ–negligible set $B\in\mathscr{E}$ satisfying $\varphi_n\to\varphi$ uniformly in $B^c$.
We know that $\big|\int_X \varphi\,d\mu\big|$ does not exceed $\int_X|\varphi|\,d\mu$. A nice and useful generalization of this fact is the so-called Jensen inequality.
Recall that, if $J\subset\mathbb{R}$ is an interval, a continuous function $g : J\to\mathbb{R}$ is said to be convex if
$$g\Big(\frac{x+y}{2}\Big) \le \frac{g(x)+g(y)}{2}\qquad \forall x,y\in J. \qquad (3.13)$$

By several approximations (see Exercise 3.7) one can prove that a convex function g satisfies $g(tx+(1-t)y)\le tg(x)+(1-t)g(y)$ for all x, y ∈ J and t ∈ [0, 1], and even that

$$g\Big(\sum_{i=1}^n t_i x_i\Big) \le \sum_{i=1}^n t_i\, g(x_i)\quad\text{whenever } t_i\ge 0,\ x_i\in J \text{ and } \sum_{i=1}^n t_i = 1. \qquad (3.14)$$
In the proof we use an elementary property of convex functions g : R →
R satisfying g(t) → +∞ as |t| → +∞, namely the existence of a
minimum point t0 ; moreover, the function g is nondecreasing in [t0 , +∞)
and nonincreasing in (−∞, t0 ] (see Exercise 3.8).

Proposition 3.13 (Jensen). Assume that μ is a probability measure. Let $g : \mathbb{R}\to\mathbb{R}$ be convex and bounded from below and let $\varphi\in L^1(X,\mathscr{E},\mu)$. Then we have
$$g\Big(\int_X\varphi\,d\mu\Big)\le\int_X g(\varphi)\,d\mu. \qquad (3.15)$$

Proof. Let us first show (3.15) when $\varphi$ is simple. Let

$$\varphi=\sum_{i=1}^n\alpha_i 1_{A_i},$$

where n ≥ 1 is an integer, $\alpha_1,\dots,\alpha_n\in\mathbb{R}$ and $A_1,\dots,A_n$ are mutually disjoint sets in $\mathscr{E}$ whose union is X, so that

$$\sum_{i=1}^n \mu(A_i)=1.$$

Then, from (3.14) we infer

$$g\Big(\int_X\varphi\,d\mu\Big)=g\Big(\sum_{i=1}^n\alpha_i\mu(A_i)\Big)\le\sum_{i=1}^n g(\alpha_i)\mu(A_i)=\int_X g(\varphi)\,d\mu.$$

In the general case, let us first assume that g(t) → +∞ as |t| → +∞. Then, by Exercise 3.8 we know that g has a minimum point $t_0$, and that g is nondecreasing in $[t_0,+\infty)$ and nonincreasing in $(-\infty,t_0]$. We can assume with no loss of generality (possibly replacing g(t) by $g(t+t_0)$ and $\varphi$ by $\varphi-t_0$, which changes neither side of (3.15)) that g attains its minimum value at $t_0=0$, and that $\int_X g(\varphi)\,d\mu$ is finite. Furthermore, replacing g by g − g(0), we can assume that the minimum value of g is 0.
Let $\varphi_n^\pm$ be nonnegative simple functions satisfying $\varphi_n^\pm\uparrow\varphi^\pm$; the simple functions $\varphi_n^+-\varphi_n^-$ converge to $\varphi^+-\varphi^-=\varphi$ in $L^1(X,\mathscr{E},\mu)$. In addition, since g is monotone in (−∞, 0] and [0, +∞), the monotone convergence theorem gives
$$\int_X g(\varphi_n^+)\,d\mu\uparrow\int_X g(\varphi^+)\,d\mu,\qquad \int_X g(-\varphi_n^-)\,d\mu\uparrow\int_X g(-\varphi^-)\,d\mu,$$
so that (since g(0) = 0, $\varphi_n^+\varphi_n^-=0$ and $\varphi^+\varphi^-=0$)
$$\int_X g(\varphi_n^+-\varphi_n^-)\,d\mu=\int_X g(\varphi_n^+)\,d\mu+\int_X g(-\varphi_n^-)\,d\mu$$
converges to $\int_X g(\varphi^+)\,d\mu+\int_X g(-\varphi^-)\,d\mu=\int_X g(\varphi)\,d\mu$. Passing to the limit as n → ∞ in Jensen's inequality for the simple functions $\varphi_n^+-\varphi_n^-$,
$$g\Big(\int_X(\varphi_n^+-\varphi_n^-)\,d\mu\Big)\le\int_X g(\varphi_n^+-\varphi_n^-)\,d\mu,$$
we get (3.15).

Finally, the assumption that g(t) → +∞ as |t| → +∞ can be removed by considering the functions $g_\varepsilon(t) := g(t)+\varepsilon|t|$, which converge to +∞ as |t| → ∞, thanks to the fact that g is bounded from below: we obtain
$$g\Big(\int_X\varphi\,d\mu\Big)+\varepsilon\Big|\int_X\varphi\,d\mu\Big|\le\int_X g(\varphi)\,d\mu+\varepsilon\int_X|\varphi|\,d\mu,$$
and Jensen's inequality follows by letting ε ↓ 0.
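The first step of the proof (simple functions on a probability space) is easy to test numerically. The sketch below assumes a hypothetical three-atom probability measure; `integral` and `jensen_gap` are illustrative names, not notation from the text.

```python
import math

def integral(values, probs):
    """Integral of a simple function with respect to a discrete probability measure."""
    return sum(v * p for v, p in zip(values, probs))

def jensen_gap(g, phi_values, probs):
    """Return  int g(phi) dmu - g(int phi dmu), expected to be >= 0 for convex g."""
    mean = integral(phi_values, probs)
    return integral([g(v) for v in phi_values], probs) - g(mean)

probs = [0.2, 0.3, 0.5]     # mu(A_1), mu(A_2), mu(A_3), summing to 1
phi = [-1.0, 0.5, 2.0]      # values alpha_i of phi on A_i

gap_square = jensen_gap(lambda t: t * t, phi, probs)   # this gap is the variance of phi
gap_exp = jensen_gap(math.exp, phi, probs)
gap_abs = jensen_gap(abs, phi, probs)
print(gap_square, gap_exp, gap_abs)
```

For g(t) = |t| the inequality reduces to the elementary fact recalled before (3.13).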


An alternative proof of Jensen's inequality is based on another viewpoint, namely the representation of g as the supremum of a family $\{L_i\}_{i\in I}$ of affine functions. Since μ is a probability measure, for all i ∈ I it is easy to check that $L_i\big(\int_X\varphi\,d\mu\big)=\int_X L_i(\varphi)\,d\mu$, so that, since $L_i\le g$,

$$L_i\Big(\int_X\varphi\,d\mu\Big)\le\int_X g(\varphi)\,d\mu\qquad\forall i\in I.$$

Taking the supremum over i ∈ I in the left hand side we obtain Jensen's inequality.
Both viewpoints are important in the theory of convex functions.
To be more precise, Jensen’s inequality holds provided g is convex on
an interval containing the image of ϕ. The next example is very important
in Probability and Information theory.
Example 3.14 (Entropy functional). By applying Jensen's inequality with the convex function g(z) = z ln z in [0, +∞) we obtain
$$\int_X\varphi\ln\varphi\,d\mu\ge\Big(\int_X\varphi\,d\mu\Big)\ln\Big(\int_X\varphi\,d\mu\Big) \qquad (3.16)$$
for all nonnegative $\varphi\in L^1(X,\mathscr{E},\mu)$. If $\int_X\varphi\,d\mu=1$ we obtain that $\int_X\varphi\ln\varphi\,d\mu\ge 0$, even though the function g has a variable sign (it attains the minimum value −1/e at z = 1/e).
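As a small illustration of (3.16), one can evaluate the entropy functional for a density on a discrete probability space. This is a sketch with hypothetical data; the convention 0 ln 0 = 0 used in `xlogx` is an assumption made explicit here (it is the continuous extension of z ln z to z = 0).

```python
import math

def xlogx(z):
    """g(z) = z ln z on [0, +inf), with the convention g(0) = 0."""
    return 0.0 if z == 0 else z * math.log(z)

def entropy_functional(phi_values, probs):
    """Integral of phi ln phi with respect to a discrete probability measure."""
    return sum(xlogx(v) * p for v, p in zip(phi_values, probs))

probs = [0.25, 0.25, 0.5]
phi = [2.0, 0.0, 1.0]          # nonnegative, with integral 0.5 + 0 + 0.5 = 1

value = entropy_functional(phi, probs)   # equals 0.5 * ln 2, hence nonnegative
print(value)
```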

3.5. Dense subsets of L p (X, E , μ)


Proposition 3.15. For any p ∈ [1, +∞], the space of all simple μ–
integrable functions is dense in L p (X, E , μ).
Proof. Let f ∈ L p (X, E , μ) with f ≥ 0. Then the conclusion follows
from Proposition 2.12 (by Proposition 2.4 in the case p = ∞) and the
dominated convergence theorem. In the general case we write f as f + −
f − and approximate in L p both parts by simple functions.
We consider now the special situation when X is a metric space, E is
the σ –algebra of all Borel subsets of X and μ is any finite measure on
(X, E ).
We denote by Cb (X) the space of all continuous bounded functions on
X. Clearly, Cb (X) ⊂ L p (X, E , μ) for all p ∈ [1, +∞].

Proposition 3.16. For any p ∈ [1, +∞) and any finite measure μ, Cb (X)
is dense in L p (X, E , μ).

Proof. Let C be the closure of Cb (X) in L p (X, E , μ); obviously C is a


vector space, as Cb (X) is a vector space. In view of Proposition 3.15 it is
enough to show that for any Borel set I ∈ B (X) there exists a sequence
(ϕn ) ⊂ Cb (X) such that ϕn → 1 I in L p (X, E , μ).
Assume first that I is closed. Set
$$\varphi_n(x)=\begin{cases}1-n\,d(x,I)&\text{if } d(x,I)\le\frac1n,\\[2pt] 0&\text{if } d(x,I)\ge\frac1n,\end{cases}$$
where
$$d(x,I):=\inf\{d(x,y): y\in I\}.$$
It is easy to see that ϕn are continuous, that 0 ≤ ϕn ≤ 1 and that ϕn (x) →
1 I (x), hence the dominated convergence theorem implies that ϕn → 1 I in
L p (X, E , μ).
Now, let
G := {I ∈ B (X) : 1 I ∈ C }.
It is easy to see that G is a Dynkin system (which includes the π–system
of closed sets), so that by the Dynkin theorem we have G = B (X).
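The approximation of the indicator of a closed set by the functions $\varphi_n$ can be visualized in a concrete case. The sketch below assumes X = [−1, 2] with the Lebesgue measure and I = [0, 1] (choices made here only for illustration), and computes the $L^p$ distance between $\varphi_n$ and $1_I$ with a midpoint rule; in this case the exact value is $(2/(n(p+1)))^{1/p}$.

```python
def dist_to_interval(x, a, b):
    """Distance d(x, I) from x to the closed interval I = [a, b]."""
    return max(a - x, 0.0, x - b)

def phi(n, x, a=0.0, b=1.0):
    """Continuous function equal to 1 on I and to 0 where d(x, I) >= 1/n."""
    return max(0.0, 1.0 - n * dist_to_interval(x, a, b))

def lp_error(n, p, lo=-1.0, hi=2.0, steps=6000):
    """Midpoint-rule approximation of ||phi_n - 1_I||_p on (lo, hi)."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        indicator = 1.0 if 0.0 <= x <= 1.0 else 0.0
        total += abs(phi(n, x) - indicator) ** p * h
    return total ** (1.0 / p)

errors = [lp_error(n, 2.0) for n in (1, 4, 16)]
print(errors)   # decreasing, of order n ** (-1/2)
```

The same computation with the supremum norm in place of the $L^p$ norm would give distance 1 for every n, which is consistent with the restriction p < ∞ in Proposition 3.16.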
Remark 3.17. $C_b(X)$ (or more precisely, the set of equivalence classes of continuous bounded functions) is a closed subspace of $L^\infty(X,\mathscr{E},\mu)$, and therefore it is not dense in general. Indeed, if $(\varphi_n)\subset C_b(X)$ is Cauchy in $L^\infty(X,\mathscr{E},\mu)$, then it converges uniformly, up to a μ–negligible set B (just take as B, as in Remark 3.12, the union of the μ–negligible sets $\{|\varphi_n-\varphi_m|>\|\varphi_n-\varphi_m\|_\infty\}$). Therefore $(\varphi_n)$ converges uniformly on $B^c$ and on its closure K. Denoting by $\varphi\in C_b(K)$ its uniform limit, by Tietze's extension theorem we may extend $\varphi$ to a function, that we still denote by $\varphi$, in $C_b(X)$. As $X\setminus K\subset B$ is μ–negligible, it follows that $\varphi_n\to\varphi$ in $L^\infty(X,\mathscr{E},\mu)$.

Exercises
3.1 Assume that μ is σ–finite, but not finite. Provide examples showing that no inclusion holds between the spaces $L^p(X,\mathscr{E},\mu)$ in general. Nevertheless, show that for any $\mathscr{E}$–measurable function $\varphi : X\to\mathbb{R}$ the set
$$\{p\in[1,\infty] : \varphi\in L^p(X,\mathscr{E},\mu)\}$$
is an interval. Hint: consider for instance the Lebesgue measure on $\mathbb{R}$.
3.2 Let 1 ≤ p ≤ q < ∞ and $f\in L^q(X,\mathscr{E},\mu)$. Show that for any δ ∈ (0, 1) we can write $f=g+\tilde f$, with $g\in L^q(X,\mathscr{E},\mu)$, $\tilde f\in L^p(X,\mathscr{E},\mu)$ and $\|g\|_q\le\delta\|f\|_q$ (notice that if μ is finite we can take g = 0).
3.3 Let p ∈ (1, ∞), $\varphi\in L^p$ and $\psi\in L^q$, with q = p′, be such that $\|\varphi\psi\|_1=\|\varphi\|_p\|\psi\|_q$. Show that either ψ = 0 or there exists a constant λ ∈ [0, +∞) such that $|\varphi|=\lambda|\psi|^{q-1}$ μ–a.e. in X. Hint: first investigate the case of equality in Young's inequality.
3.4 Prove the following variant of Hölder's inequality, known as Young's inequality: if $\varphi\in L^p$, $\psi\in L^q$ and $\frac1p+\frac1q=\frac1r$, with r ≥ 1, we have that $\varphi\psi\in L^r$ and $\|\varphi\psi\|_r\le\|\varphi\|_p\|\psi\|_q$.
3.5 Let $(\varphi_n)\subset L^1(X,\mathscr{E},\mu)$ be nonnegative and satisfying $\liminf_n\varphi_n\ge\varphi$ μ–a.e. in X. Show that
$$\int_X\varphi_n\,d\mu=\int_X\varphi\,d\mu=1\quad\Longrightarrow\quad\int_X|\varphi-\varphi_n|\,d\mu\to0.$$
Hint: notice that the positive part and the negative part of $\varphi-\varphi_n$ have the same integral to obtain
$$\int_X|\varphi-\varphi_n|\,d\mu=2\int_X(\varphi-\varphi_n)^+\,d\mu.$$
Then, apply the dominated convergence theorem.


3.6 Show the following extension of Fatou's lemma: if $\varphi_n\ge-\psi_n$, with $\psi_n\in L^1(X)$ nonnegative, $\psi_n\to\psi$ in $L^1(X)$, then
$$\liminf_{n\to\infty}\int_X\varphi_n\,d\mu\ \ge\ \int_X\liminf_{n\to\infty}\varphi_n\,d\mu.$$
Hint: prove first the statement under the additional assumption that $\psi_n\to\psi$ μ–a.e. in X.
3.7 Show that (3.13) implies $g(tx+(1-t)y)\le tg(x)+(1-t)g(y)$ for all x, y ∈ J and t ∈ [0, 1]. Then, deduce (3.14) from this property. Hint: it is useful to consider dyadic numbers $t=k/2^m$, with $k\le 2^m$ integer.
3.8 Let g : R → R be a convex function such that g(z) → +∞ as |z| → +∞.
Show the existence of z 0 ∈ R where g attains its minimum value. Then, show
that g is nondecreasing in [z 0 , +∞) and nonincreasing in (−∞, z 0 ].
3.9 Let $(\varphi_n)\subset L^1(X,\mathscr{E},\mu)$ be nonnegative functions. Show that the conditions
$$\liminf_{n\to\infty}\varphi_n\ge\varphi\ \ \mu\text{–a.e. in } X,\qquad \limsup_{n\to\infty}\int_X\varphi_n\,d\mu\le\int_X\varphi\,d\mu<\infty$$
imply the convergence of $\varphi_n$ to $\varphi$ in $L^1(X,\mathscr{E},\mu)$. Hint: use Exercise 3.5.

3.10 Let $\{\varphi_i\}_{i\in I}$ be a family of functions satisfying
$$\sup_{i\in I}\int_X\Phi(|\varphi_i|)\,d\mu=M<+\infty$$
and assume that $\Phi(c)/c$ is nondecreasing and tends to +∞ as c → +∞. Show that $\{\varphi_i\}_{i\in I}$ is μ–uniformly integrable. Hint: use the inequalities
$$\int_A|\varphi_i|\,d\mu\le\int_{A\cap\{|\varphi_i|\ge c\}}\frac{\Phi(|\varphi_i|)}{\Psi(c)}\,d\mu+\int_{A\cap\{|\varphi_i|<c\}}|\varphi_i|\,d\mu\le\frac{M}{\Psi(c)}+c\,\mu(A),$$
with $\Psi(c):=\Phi(c)/c$, and then choose c sufficiently large, such that $M/\Psi(c)<\varepsilon/2$.
3.11 Assuming that (X, d) is a metric space, $\mathscr{E}=\mathscr{B}(X)$ and μ is finite, prove Lusin's theorem: for any ε > 0 and any $f\in L^1(X,\mathscr{E},\mu)$, there exists a closed set C ⊂ X such that μ(X \ C) < ε and $f|_C$ is continuous and bounded. Hint: use the density of $C_b(X)$ in $L^1$ and Egorov's theorem.
Chapter 4
Hilbert spaces

In this chapter we recall the basic facts regarding real vector spaces en-
dowed with a scalar product. We introduce the concept of Hilbert space
and show that, even for the infinite-dimensional ones, continuous linear
functionals are induced by the scalar product. Moreover, we see that even
in some classes of infinite dimensional spaces (the so-called separable
ones) there exists a well-defined notion of basis (the so-called complete
orthonormal systems), obtained replacing finite sums with converging
series. Even though the presentation will be self-contained, we assume
that the reader has already some familiarity with these concepts (basis,
scalar product, representation of linear functionals) in finite-dimensional
spaces.

4.1. Scalar products, pre-Hilbert and Hilbert spaces


A real pre-Hilbert space is a real vector space H endowed with a mapping
$$H\times H\to\mathbb{R},\qquad (x,y)\mapsto\langle x,y\rangle,$$
called scalar product, such that:
(i) $\langle x,x\rangle\ge0$ for all x ∈ H and $\langle x,x\rangle=0$ if and only if x = 0;
(ii) $\langle x,y\rangle=\langle y,x\rangle$ for all x, y ∈ H;
(iii) $\langle\alpha x+\beta y,z\rangle=\alpha\langle x,z\rangle+\beta\langle y,z\rangle$ for all x, y, z ∈ H and α, β ∈ $\mathbb{R}$.
In the following H represents a real pre-Hilbert space.
The scalar product allows us to introduce the concept of orthogonality. We say that two elements x and y of H are orthogonal if $\langle x,y\rangle=0$.
We are going to prove that the function
$$\|x\|:=\sqrt{\langle x,x\rangle},\qquad x\in H,$$
is a norm in H. For this we need the following Cauchy–Schwarz inequality.


Proposition 4.1. For any x, y ∈ H we have
$$|\langle x,y\rangle|\le\|x\|\,\|y\|. \qquad (4.1)$$
In (4.1) equality holds if and only if x and y are linearly dependent.
Proof. Set
$$F(\lambda)=\|x+\lambda y\|^2=\lambda^2\|y\|^2+2\lambda\langle x,y\rangle+\|x\|^2,\qquad\lambda\in\mathbb{R}.$$
Since F(λ) ≥ 0 for all λ ∈ $\mathbb{R}$ we have
$$|\langle x,y\rangle|^2-\|x\|^2\|y\|^2\le0,$$
which yields (4.1).
If x and y are linearly dependent, it is clear that $|\langle x,y\rangle|=\|x\|\,\|y\|$. Assume conversely that $\langle x,y\rangle=\pm\|x\|\,\|y\|$ and that y ≠ 0. Then we have $F(\lambda)=(\|x\|\pm\lambda\|y\|)^2$ so that, choosing $\lambda=\mp\|x\|/\|y\|$, we find F(λ) = 0. This implies x + λy = 0, so that x and y are linearly dependent.
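The role of the discriminant of F in the proof can be checked on concrete vectors. This is a minimal sketch in $\mathbb{R}^3$ with the canonical scalar product; the helper names and the sample vectors are assumptions made for illustration.

```python
def dot(x, y):
    """Canonical scalar product in R^n."""
    return sum(a * b for a, b in zip(x, y))

def discriminant(x, y):
    """<x,y>^2 - ||x||^2 ||y||^2, a quarter of the discriminant of F(lambda)."""
    return dot(x, y) ** 2 - dot(x, x) * dot(y, y)

x = [1.0, 2.0, 3.0]
y = [-1.0, 0.5, 4.0]

d_generic = discriminant(x, y)                           # strictly negative
d_dependent = discriminant(x, [2.0 * a for a in x])      # zero: equality case
print(d_generic, d_dependent)
```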
Now we can prove easily that $\|\cdot\|$ is a norm in H. In fact, it is clear that $\|\alpha x\|=|\alpha|\,\|x\|$ for all α ∈ $\mathbb{R}$ and all x ∈ H. Moreover, taking into account (4.1), we have for all x, y ∈ H,
$$\|x+y\|^2=\langle x+y,x+y\rangle=\|x\|^2+\|y\|^2+2\langle x,y\rangle\le\|x\|^2+\|y\|^2+2\|x\|\,\|y\|=(\|x\|+\|y\|)^2,$$
so that $\|x+y\|\le\|x\|+\|y\|$.
Therefore a pre-Hilbert space H is a normed space and, in particular, a metric space. If H, endowed with the distance induced by the norm, is complete we say that H is a Hilbert space.
Example 4.2. (i). $\mathbb{R}^n$ is a Hilbert space with the canonical scalar product
$$\langle x,y\rangle:=\sum_{k=1}^n x_ky_k,$$
inducing the Euclidean distance, where $x=(x_1,\dots,x_n)$, $y=(y_1,\dots,y_n)\in\mathbb{R}^n$.
(ii). Let $(X,\mathscr{E},\mu)$ be a measure space. Then $L^2(X,\mathscr{E},\mu)$, endowed with the scalar product
$$\langle\varphi,\psi\rangle:=\int_X\varphi(x)\psi(x)\,d\mu(x),\qquad\varphi,\psi\in L^2(X,\mathscr{E},\mu),$$
is a Hilbert space (completeness follows from Proposition 3.5).

(iii). Let $\ell^2$ be the space of all sequences of real numbers $x=(x_k)$ such that $\sum_{k=0}^\infty x_k^2<\infty$. $\ell^2$ is a vector space with the usual operations,
$$a(x_k)=(ax_k),\ \ a\in\mathbb{R},\qquad (x_k)+(y_k)=(x_k+y_k),\ \ (x_k),(y_k)\in\ell^2.$$
The space $\ell^2$, endowed with the scalar product
$$\langle x,y\rangle:=\sum_{k=0}^\infty x_ky_k,\qquad x=(x_k),\ y=(y_k)\in\ell^2,$$
is a Hilbert space. This follows from (ii) taking $X=\mathbb{N}$, $\mathscr{E}=\mathscr{P}(X)$ and μ({x}) = 1 for all x ∈ X.
(iv). Let X = C([0, 1]) be the linear space of all real continuous functions on [0, 1]. X is a pre-Hilbert space with the scalar product
$$\langle f,g\rangle:=\int_0^1 f(t)g(t)\,dt.$$
However, X is not a Hilbert space: indeed, X is dense, but strictly contained, in $L^2(0,1)$.
Finite-dimensional pre-Hilbert spaces H are always Hilbert spaces: indeed, if $\{v_1,\dots,v_n\}$, with n = dim H, is a basis of H, the Gram–Schmidt orthonormalization process (recalled in Exercise 4.3) provides an orthonormal basis $\{e_1,\dots,e_n\}$ of H (i.e. $\|e_i\|=1$ and $e_i$ is orthogonal to $e_j$ for i ≠ j), and the map
$$x=\sum_{i=1}^n\langle x,e_i\rangle e_i\ \mapsto\ (\langle x,e_1\rangle,\langle x,e_2\rangle,\dots,\langle x,e_n\rangle)$$
(mapping x to the Euclidean vector of its coordinates with respect to this basis) is easily seen to provide an isometry with $\mathbb{R}^n$: indeed,
$$\Big\|\sum_{i=1}^n\langle x,e_i\rangle e_i\Big\|^2=\sum_{i,j=1}^n\langle x,e_i\rangle\langle x,e_j\rangle\langle e_i,e_j\rangle=\sum_{i=1}^n(\langle x,e_i\rangle)^2.$$
Thus, $\mathbb{R}^n$ being complete, H is complete.
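The Gram–Schmidt process and the coordinate isometry described above can be sketched as follows. The code works in $\mathbb{R}^3$ with the canonical scalar product, assumes the input vectors are linearly independent, and uses illustrative names (`gram_schmidt`, `coordinates`) that do not come from the text.

```python
import math

def dot(x, y):
    """Canonical scalar product in R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors, one at a time."""
    basis = []
    for v in vectors:
        f = list(v)
        for e in basis:
            c = dot(v, e)                       # component of v along e
            f = [fi - c * ei for fi, ei in zip(f, e)]
        norm = math.sqrt(dot(f, f))             # nonzero by linear independence
        basis.append([fi / norm for fi in f])
    return basis

def coordinates(x, basis):
    """Coordinates (<x, e_1>, ..., <x, e_n>) of x in an orthonormal basis."""
    return [dot(x, e) for e in basis]

vs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(vs)
x = [3.0, -1.0, 2.0]
coords = coordinates(x, es)
print(sum(c * c for c in coords), dot(x, x))   # both close to 14: the map is an isometry
```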

4.2. The projection theorem


It is useful to notice that for any x, y ∈ H the following parallelogram identity holds:
$$\|x+y\|^2+\|x-y\|^2=2\|x\|^2+2\|y\|^2,\qquad x,y\in H. \qquad (4.2)$$
One can show that identity (4.2) characterizes pre-Hilbert spaces among normed spaces, and Hilbert spaces among Banach spaces, see Exercise 4.1.

Theorem 4.3 (Projection on closed subspaces). Let H be a Hilbert space and let Y be a closed subspace of H. Then for any x ∈ H there exists a unique y ∈ Y, called projection of x on Y and denoted by $\pi_Y(x)$, such that
$$\|x-y\|=\min_{z\in Y}\|x-z\|.$$
Moreover, y is characterized by the property

$$\langle x-y,z\rangle=0\quad\text{for all } z\in Y. \qquad (4.3)$$

Proof. Set $d:=\inf_{z\in Y}\|x-z\|$ and choose $(y_n)\subset Y$ such that $\|x-y_n\|\downarrow d$. We are going to show that $(y_n)$ is a Cauchy sequence.
For any m, n ∈ $\mathbb{N}$ we have, by the parallelogram identity (4.2),
$$\|(x-y_n)+(x-y_m)\|^2+\|(x-y_n)-(x-y_m)\|^2=2\|x-y_n\|^2+2\|x-y_m\|^2.$$
Consequently
$$\|y_n-y_m\|^2=2\|x-y_n\|^2+2\|x-y_m\|^2-4\Big\|x-\frac{y_n+y_m}{2}\Big\|^2.$$
Taking into account that $(y_n+y_m)/2\in Y$ we find
$$\|y_n-y_m\|^2\le2\|x-y_n\|^2+2\|x-y_m\|^2-4d^2,$$
so that $\|y_n-y_m\|\to0$ as n, m → ∞. Thus, $(y_n)$ is a Cauchy sequence and, since the space is complete and Y is closed, it is convergent to an element y ∈ Y. Since $\|x-y_n\|\to\|x-y\|$ we find that $\|x-y\|=d$. Existence is thus proved. Uniqueness follows again by the parallelogram identity, that gives
$$\|y-y'\|^2\le2\|x-y\|^2+2\|x-y'\|^2-4\Big\|x-\frac{y+y'}{2}\Big\|^2\le2d^2+2d^2-4d^2=0$$
whenever y and y′ are minimizers.
Let us prove (4.3). Given z ∈ Y, define
$$F(\lambda)=\|x-y-\lambda z\|^2=\lambda^2\|z\|^2-2\lambda\langle x-y,z\rangle+\|x-y\|^2,\qquad\lambda\in\mathbb{R}.$$
Since F attains a minimum at λ = 0, we have $F'(0)=-2\langle x-y,z\rangle=0$, as claimed.
Conversely, if (4.3) holds for all z ∈ Y, we have
$$\|x-y-z\|^2=\|z\|^2+\|x-y\|^2\ge\|x-y\|^2.$$

Remark 4.4 (Projection on convex closed sets). The previous proof works, with absolutely no modification, to show that for any convex closed set K ⊂ H and any x ∈ H there exists a unique solution $y=\pi_K(x)$ to the problem
$$\min_{z\in K}\|x-z\|.$$
In this case, however, $\pi_K(x)$ is not characterized by (4.3), but by a one-sided condition, namely $\langle x-\pi_K(x),z-\pi_K(x)\rangle\le0$ for all z ∈ K, see Exercise 4.2.
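The one-sided condition of Remark 4.4 can be tested in a simple case. The sketch below assumes $H=\mathbb{R}^3$ and takes as K the cube $[0,1]^3$, for which the projection is the componentwise clipping; this choice of K and the function names are illustrative assumptions.

```python
def dot(x, y):
    """Canonical scalar product in R^n."""
    return sum(a * b for a, b in zip(x, y))

def project_box(x, lo=0.0, hi=1.0):
    """Projection of x on the closed convex set K = [lo, hi]^n (componentwise clipping)."""
    return [min(max(xi, lo), hi) for xi in x]

def variational_gap(x, z):
    """<x - pi_K(x), z - pi_K(x)>, expected to be <= 0 for every z in K."""
    p = project_box(x)
    return dot([xi - pi for xi, pi in zip(x, p)], [zi - pi for zi, pi in zip(z, p)])

x = [1.5, -0.3, 0.4]
p = project_box(x)
test_points = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.2, 0.9], [1.0, 0.0, 0.0]]
gaps = [variational_gap(x, z) for z in test_points]
print(p, gaps)
```

Equality in the one-sided condition occurs for the last test point, which lies in the face of K containing the projection.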
Corollary 4.5. Let Y be a closed proper subspace of H. Then there exists $x_0\in H\setminus\{0\}$ such that $\langle x_0,y\rangle=0$ for all y ∈ Y.
Proof. It is enough to choose an element $z_0$ in H which does not belong to Y and set $x_0=z_0-\pi_Y(z_0)$.
Fix an integer n ≥ 1, an n-dimensional subspace $H_n\subset H$ and an orthonormal basis $\{e_1,\dots,e_n\}$ of it. The following result characterizes the projection on $H_n$, giving the best approximation of an element x by a linear combination of $\{e_1,\dots,e_n\}$.
Proposition 4.6. The projection of an element x ∈ H on $H_n$ is given by
$$\pi_{H_n}(x)=\sum_{k=1}^n\langle x,e_k\rangle e_k.$$
Proof. We have to show that for any $y_1,\dots,y_n\in\mathbb{R}$ we have
$$\Big\|x-\sum_{k=1}^n x_ke_k\Big\|^2\le\Big\|x-\sum_{k=1}^n y_ke_k\Big\|^2, \qquad (4.4)$$
where $x_k=\langle x,e_k\rangle$. We have in fact
$$\Big\|x-\sum_{k=1}^n y_ke_k\Big\|^2=\|x\|^2+\sum_{k=1}^n y_k^2-2\sum_{k=1}^n x_ky_k=\|x\|^2-\sum_{k=1}^n x_k^2+\sum_{k=1}^n(x_k-y_k)^2.$$
This quantity is clearly minimal when $x_k=y_k$, and
$$\Big\|x-\sum_{k=1}^n x_ke_k\Big\|^2=\|x\|^2-\sum_{k=1}^n x_k^2. \qquad (4.5)$$
An alternative proof of the Proposition, based on the characterization (4.3) of $\pi_{H_n}(x)$, is proposed in Exercise 4.4.
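Formulas (4.4) and (4.5) can be verified on a small example. The following sketch assumes $H=\mathbb{R}^3$ and a hypothetical orthonormal pair spanning a two-dimensional subspace; all names are introduced here for illustration only.

```python
def dot(x, y):
    """Canonical scalar product in R^n."""
    return sum(a * b for a, b in zip(x, y))

def combination(coeffs, basis):
    """Return the linear combination sum_k coeffs[k] * basis[k]."""
    return [sum(c * e[i] for c, e in zip(coeffs, basis)) for i in range(len(basis[0]))]

def dist_sq(x, y):
    """Squared Euclidean distance between x and y."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

basis = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]    # orthonormal pair in R^3
x = [2.0, 1.0, 3.0]

fourier = [dot(x, e) for e in basis]           # coefficients x_k = <x, e_k>
projection = combination(fourier, basis)
best = dist_sq(x, projection)                  # ||x||^2 - sum x_k^2, as in (4.5)
other = dist_sq(x, combination([1.5, 3.5], basis))   # any other choice of y_k is worse
print(best, other)
```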

4.3. Linear continuous functionals


A linear functional F on H is a mapping $F:H\to\mathbb{R}$ such that
$$F(\alpha x+\beta y)=\alpha F(x)+\beta F(y)\qquad\forall x,y\in H,\ \forall\alpha,\beta\in\mathbb{R}.$$
F is said to be bounded if there exists K ≥ 0 such that
$$|F(x)|\le K\|x\|\quad\text{for all } x\in H.$$
Proposition 4.7. A linear functional F is continuous if, and only if, it is bounded.
Proof. It is obvious that if F is bounded then it is continuous (even Lipschitz continuous). Assume conversely that F is continuous and, by contradiction, that it is not bounded. Then for any n ∈ $\mathbb{N}$ there exists $x_n\in H$ such that $|F(x_n)|\ge n^2\|x_n\|$. Setting $y_n=\frac1n x_n/\|x_n\|$ we have $\|y_n\|=\frac1n\to0$, whereas $|F(y_n)|\ge n$, which is a contradiction.
The following basic Riesz theorem gives an intrinsic representation formula of all linear continuous functionals.
Proposition 4.8. Let F be a linear continuous functional on H. Then there exists a unique $x_0\in H$ such that
$$F(x)=\langle x,x_0\rangle\qquad\forall x\in H. \qquad (4.6)$$
Proof. If F = 0 it suffices to take $x_0=0$. Assume that F ≠ 0 and let $Y=F^{-1}(0)=\operatorname{Ker}F$. Then Y ≠ H is closed (because F is continuous) and a vector space (because F is linear), so that by Corollary 4.5 there exists $z_0\in H$ such that $F(z_0)=1$ and
$$\langle z_0,z\rangle=0\quad\text{for all } z\in\operatorname{Ker}F.$$
On the other hand, for any x ∈ H the element $z=x-F(x)z_0$ belongs to Ker F since $F(z)=F(x)-F(x)F(z_0)=0$. Therefore
$$\langle z_0,x-F(x)z_0\rangle=0\quad\text{for all } x\in H,$$
so that
$$\langle x,z_0\rangle-F(x)\|z_0\|^2=0$$
and (4.6) follows setting $x_0=z_0/\|z_0\|^2$.
It remains to prove the uniqueness. Let $y_0\in H$ be such that
$$F(x)=\langle x,x_0\rangle=\langle x,y_0\rangle,\qquad x\in H.$$
Then, choosing $x=x_0-y_0$ we find that $\|x_0-y_0\|^2=0$, so that $x_0=y_0$.

4.4. Bessel inequality, Parseval identity and orthonormal systems

Let us discuss the concept of basis in a Hilbert space H, assuming with no loss of generality that the dimension of H is not finite. We use Kronecker's notation $\delta_{hk}$, equal to 1 for h = k and equal to 0 if h ≠ k.
Definition 4.9 (Orthonormal system). A sequence $(e_k)_{k\in\mathbb{N}}\subset H$ is called an orthonormal system if

$$\langle e_h,e_k\rangle=\delta_{hk},\qquad h,k\in\mathbb{N}.$$

Proposition 4.10. Let $(e_k)_{k\in\mathbb{N}}$ be an orthonormal system in H.

(i) For any x ∈ H we have
$$\sum_{k=0}^\infty|\langle x,e_k\rangle|^2\le\|x\|^2. \qquad (4.7)$$
(ii) For any x ∈ H the series $\sum_{k=0}^\infty\langle x,e_k\rangle e_k$ is convergent in H (1).
(iii) Equality holds in (4.7) if and only if
$$x=\sum_{k=0}^\infty\langle x,e_k\rangle e_k. \qquad (4.8)$$

Inequality (4.7) is called Bessel inequality and, when the equality holds, Parseval identity.
Proof. (i) Let n ∈ $\mathbb{N}$. Then by (4.5) we have
$$\Big\|x-\sum_{k=0}^n\langle x,e_k\rangle e_k\Big\|^2=\|x\|^2-\sum_{k=0}^n|\langle x,e_k\rangle|^2, \qquad (4.9)$$
so that (4.7) follows by the arbitrariness of n.
(ii) Let n, p ∈ $\mathbb{N}$ and set
$$s_n=\sum_{k=0}^n\langle x,e_k\rangle e_k.$$
Then
$$\|s_{n+p}-s_n\|^2=\Big\|\sum_{k=n+1}^{n+p}\langle x,e_k\rangle e_k\Big\|^2=\sum_{k=n+1}^{n+p}|\langle x,e_k\rangle|^2.$$
Since the series $\sum_{k=0}^\infty|\langle x,e_k\rangle|^2$ is convergent by (i), the sequence $(s_n)$ is Cauchy and the conclusion follows.
(iii) Passing to the limit as n → ∞ in (4.9) we find
$$\Big\|x-\sum_{k=0}^\infty\langle x,e_k\rangle e_k\Big\|^2=\|x\|^2-\sum_{k=0}^\infty|\langle x,e_k\rangle|^2.$$
This proves statement (iii).

(1) A series $\sum_{k=0}^\infty x_k$ of vectors in a Banach space E is said to be convergent if the sequence of the finite sums $\sum_{k=0}^n x_k$ is convergent in E.

Definition 4.11 (Complete orthonormal system). An orthonormal system $(e_k)_{k\in\mathbb{N}}$ is called complete if
$$x=\sum_{k=0}^\infty\langle x,e_k\rangle e_k\qquad\forall x\in H.$$

Example 4.12. Let $H=\ell^2$ as in Example 4.2(iii). Then, it is easy to see that the system $(e_k)$, where

$$e_k:=(0,0,\dots,0,1,0,0,\dots)\quad\text{(with the digit 1 in the k-th position)}$$

is complete. Indeed, if $x=(x_k)\in\ell^2$ we have that $\langle x,e_i\rangle=x_i$ (the i-th component of the sequence x), so that
$$\Big\|x-\sum_{i=0}^n\langle x,e_i\rangle e_i\Big\|^2=\sum_{k=n+1}^\infty x_k^2\to0.$$

We already noticed that $\mathbb{R}^n$ is the canonical model of n-dimensional Hilbert spaces H, because any choice of an orthonormal basis $\{e_1,\dots,e_n\}$ of H induces the linear isometry
$$a\mapsto\sum_{i=1}^n a_ie_i$$
from $\mathbb{R}^n$ to H (which, as a consequence, preserves also the scalar product, by the parallelogram identity). For similar reasons, $\ell^2$ is the canonical model of all spaces H having a complete orthonormal system $(e_k)_{k\in\mathbb{N}}$: in this case, the linear map from $\ell^2$ to H given by
$$a\mapsto\sum_{i=0}^\infty a_ie_i$$
is an isometry, thanks to Parseval's identity.

Proposition 4.13 (Completeness criterion). Let $(e_n)$ be an orthonormal system. Then $(e_n)$ is complete if and only if the vector space E spanned by $(e_n)$ is dense in H.

Proof. If $(e_n)$ is complete we have that any x ∈ H is the limit of the finite sums $\sum_{i=1}^N\langle x,e_i\rangle e_i$, which all belong to E, therefore E is dense. Conversely, if E is dense, for any x ∈ H and any ε > 0 we can find a vector $z=\sum_{i=1}^n a_ie_i$ with $\|z-x\|<\varepsilon$. By applying Proposition 4.6 twice (first to the vector space spanned by $\{e_1,\dots,e_m\}$, and then to the vector space spanned by $\{e_1,\dots,e_n\}$) we get
$$\Big\|x-\sum_{i=1}^m\langle x,e_i\rangle e_i\Big\|\le\Big\|x-\sum_{i=1}^n\langle x,e_i\rangle e_i\Big\|\le\Big\|x-\sum_{i=1}^n a_ie_i\Big\|<\varepsilon$$
for m ≥ n. Since ε is arbitrary this proves that the sum of the series is equal to x.

The following proposition provides a necessary and sufficient condition for the existence of a complete orthonormal system. We recall that a metric space (X, d) is said to be separable if there exists a countable dense subset D ⊂ X.

Theorem 4.14. A Hilbert space H admits a complete orthonormal system $(e_k)_{k\in\mathbb{N}}$ if and only if H, as a metric space, is separable.

Proof. If H admits a complete orthonormal system $(e_k)_{k\in\mathbb{N}}$ then H is separable, because the collection D of finite sums with rational coefficients
of the vectors ek provides a countable dense subset (indeed, the closure
of D contains the finite linear combinations of the vectors ek and then the
whole space).
Conversely, assume that H is separable and let (vn ) be a dense se-
quence. We define e0 = v0 , e1 = vk1 where k1 is the first k > k0 = 0
such that vk is linearly independent from v0 , e2 = vk2 where k2 is the
first k > k1 such that vk is linearly independent from {e0 , e1 }, and so on.
In this way we have built a sequence (ei ) of linearly independent vectors

generating the same vector space generated by (vn ). Let S be this vec-
tor space, and let us represent it as ∪n Sn , where Sn is the vector space
generated by {e0 , . . . , en }. Notice that S is dense, as all vn belong to S.
By applying the Gram-Schmidt process to ei , an operation that does not
change the vector spaces Sn generated by the vectors e0 , . . . , en , we can
also assume that (ei ) is an orthonormal system. Then, Proposition 4.13
gives that (ei ) is complete.

4.5. Hilbert spaces on C


In this section we illustrate briefly how the concepts introduced so far extend to complex vector spaces H. A pre-Hilbert space is a complex vector space H endowed with a mapping
$$H\times H\to\mathbb{C},\qquad (x,y)\mapsto\langle x,y\rangle,$$
called scalar product, such that:

(i) $\langle x,x\rangle\ge0$ for all x ∈ H and $\langle x,x\rangle=0$ if and only if x = 0;
(ii) $\langle x,y\rangle=\overline{\langle y,x\rangle}$ for all x, y ∈ H;
(iii) $\langle\alpha x+\beta y,z\rangle=\alpha\langle x,z\rangle+\beta\langle y,z\rangle$ for all x, y, z ∈ H and α, β ∈ $\mathbb{C}$.

It turns out that $\|x\|:=\sqrt{\langle x,x\rangle}$ is still a norm, because the Cauchy–Schwarz inequality still holds. Hence, we can define Hilbert spaces as those spaces for which the norm induces a complete distance.
The canonical model of n-dimensional Hilbert space is $\mathbb{C}^n$. Given a measure space $(X,\mathscr{F},\mu)$, a basic example of Hilbert space is the space of $\mathscr{F}$-measurable and square integrable functions $f:X\to\mathbb{C}$. In this context $\mathscr{F}$-measurable means that both the real and the imaginary part of f are $\mathscr{F}$-measurable. In this space one can define the scalar product
$$\langle f,g\rangle:=\int_X f(x)\overline{g(x)}\,d\mu(x)$$
and prove that it induces a Hilbert space structure. The space $\ell^2(\mathbb{C})$ of complex-valued sequences $(z_n)$ with $(|z_n|)\in\ell^2(\mathbb{R})$ is a particular case.
The norm still satisfies the parallelogram identity, so that we can still prove the existence of orthogonal projections on closed subspaces and their characterization in terms of
$$\operatorname{Re}\langle x-\pi_Y(x),z\rangle=0\qquad\forall z\in Y.$$
Analogously, in Remark 4.4, one has to replace the scalar product by its real part.
The Riesz representation theorem still holds (now for continuous and $\mathbb{C}$-linear functionals) and the concepts of orthonormal system and complete orthonormal system make sense. We have Bessel's inequality for orthonormal systems and Parseval's identity for complete orthonormal systems. Finally, $\ell^2(\mathbb{C})$ is the canonical model of all separable Hilbert spaces; as in the real case the correspondence is induced by the choice of a complete orthonormal system, which provides coordinates of a vector.
We conclude this chapter providing a natural example, considered in the literature, of non-separable Hilbert space.
Example 4.15 (Quasi-periodic functions). We define the space $AP(\mathbb{R})$ of almost periodic functions as the closure, with respect to uniform convergence in $\mathbb{R}$, of the vector space generated by complex-valued periodic functions (of arbitrary period). This space has been extensively studied by Bochner and Bohr. It is easy to show that the space of almost periodic functions is not only a vector space (it is a subspace of $C(\mathbb{R},\mathbb{C})$), but also an algebra, i.e. $fg\in AP(\mathbb{R})$ whenever $f,g\in AP(\mathbb{R})$.
If f is almost periodic one can also show (by approximation, taking into account that this property is linear with respect to f and holds for periodic functions) that there exists the limit
$$M(f):=\lim_{T\to+\infty}\frac{1}{2T}\int_{-T}^{T}f(x+t)\,dt.$$
In addition, it is easily seen that the limit is independent of x.
The space $AP(\mathbb{R})$ of all almost periodic functions is a pre-Hilbert space when endowed with the following inner product
$$\langle f,g\rangle_{AP}:=M(f\bar g),\qquad f,g\in AP(\mathbb{R}).$$
For any λ ∈ $\mathbb{R}$ define
$$e_\lambda(t)=e^{i\lambda t},\qquad t\in\mathbb{R}.$$
Then $e_\lambda\in AP(\mathbb{R})$, $\langle e_\lambda,e_\lambda\rangle_{AP}=1$ and
$$\langle e_\lambda,e_\nu\rangle_{AP}=\lim_{T\to+\infty}\frac{e^{iT(\lambda-\nu)}-e^{-iT(\lambda-\nu)}}{2Ti(\lambda-\nu)}=0\qquad\text{whenever }\lambda\ne\nu,$$
so that $(e_\lambda)_{\lambda\in\mathbb{R}}$ is an orthonormal system in $AP(\mathbb{R})$ having the cardinality of continuum. One can also characterize the (abstract) Hilbert completion of $AP(\mathbb{R})$ (the so-called Bohr almost periodic functions) and prove that the system $\{e_\lambda\}_{\lambda\in\mathbb{R}}$ is complete. For more details see e.g. [4].
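The orthogonality of $e_\lambda$ and $e_\nu$ for λ ≠ ν can be observed numerically. The sketch below compares a midpoint-rule approximation of the mean over [−T, T] with the closed form $\sin(T(\lambda-\nu))/(T(\lambda-\nu))$, obtained by integrating $e^{i(\lambda-\nu)t}$ directly; the function names and the sample frequencies are assumptions made for illustration.

```python
import cmath
import math

def mean_value_closed_form(lam, nu, T):
    """(1/2T) * integral over [-T, T] of e^{i(lam - nu)t} dt, computed exactly."""
    d = lam - nu
    if d == 0:
        return 1.0
    return math.sin(T * d) / (T * d)

def mean_value_numeric(lam, nu, T, steps=20000):
    """Midpoint-rule approximation of (1/2T) * integral of e_lam * conj(e_nu) on [-T, T]."""
    h = 2.0 * T / steps
    total = 0j
    for k in range(steps):
        t = -T + (k + 0.5) * h
        total += cmath.exp(1j * lam * t) * cmath.exp(-1j * nu * t) * h
    return total / (2.0 * T)

print(mean_value_numeric(1.0, 3.0, 50.0), mean_value_closed_form(1.0, 3.0, 50.0))
print(mean_value_closed_form(1.0, 3.0, 1e6))   # tends to 0 as T grows
```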

Exercises
4.1 Let $(X,\|\cdot\|)$ be a normed space, and assume that the norm satisfies the parallelogram identity (4.2). Set
$$\langle x,y\rangle:=\frac14\|x+y\|^2-\frac14\|x-y\|^2,\qquad x,y\in X.$$
Show that $\langle\cdot,\cdot\rangle$ is a scalar product whose induced norm is $\|\cdot\|$. Use this identity to show that any linear isometry between pre-Hilbert spaces preserves also the scalar product.
4.2 Show that, in the situation considered in Remark 4.4, $\pi_K(x)$ is characterized by the property
$$\langle x-\pi_K(x),z-\pi_K(x)\rangle\le0\qquad\forall z\in K.$$
4.3 Let H be a finite dimensional pre-Hilbert space and let $\{v_1,\dots,v_n\}$, with n = dim H, be a basis of it. Define
$$f_1=v_1,\qquad f_2=v_2-\frac{\langle v_2,f_1\rangle}{\langle f_1,f_1\rangle}f_1,\qquad f_3=v_3-\frac{\langle v_3,f_1\rangle}{\langle f_1,f_1\rangle}f_1-\frac{\langle v_3,f_2\rangle}{\langle f_2,f_2\rangle}f_2,\qquad\dots$$
Show that $e_i=f_i/\|f_i\|$ is an orthonormal system in H (notice that $v_k-f_k$ is the projection of $v_k$ on the vector space generated by $\{v_1,\dots,v_{k-1}\}$).
4.4 Let H be a Hilbert space, and let X be an infinite-dimensional separable subspace. Show that
$$\pi_X(x)=\sum_{k=0}^\infty\langle x,e_k\rangle e_k\qquad\forall x\in H,$$
where $(e_k)$ is any complete orthonormal system of X. Hint: show that the vector $x-\sum_k\langle x,e_k\rangle e_k$ is orthogonal to all vectors of X.
4.5 Let X be the space of functions $f:[0,1]\to\mathbb{R}$ such that f(x) ≠ 0 for at most countably many x, and $\sum_x f^2(x)<+\infty$. Show that X, endowed with the scalar product
$$\langle f,g\rangle:=\sum_{x\in[0,1]}f(x)g(x),$$
is a non-separable Hilbert space.
4.6 Let $(e_k)_{k\in\mathbb{N}}$ be a complete orthonormal system of H. Show that, for any x, y ∈ H we have
$$\sum_{k=0}^\infty\langle x,e_k\rangle\langle y,e_k\rangle=\langle x,y\rangle. \qquad (4.10)$$
4.7 Show that for any Hilbert space H there exists a family (not necessarily finite or countable) of vectors $\{e_i\}_{i\in I}$ such that:
(i) $\langle e_i,e_j\rangle$ is equal to 1 if i = j, and to 0 otherwise;
(ii) for any vector x ∈ H there exists a countable set J ⊂ I with
$$x=\sum_{i\in J}\langle x,e_i\rangle e_i.$$
Hint: use Zorn's lemma.
Chapter 5
Fourier series

In this chapter we study the problem of representing a given T-periodic function as a superposition, for a suitable choice of the coefficients, of more "elementary" ones. This problem was first studied by J. Fourier in the case when the elementary functions are the trigonometric ones (nowadays we know that many different choices are indeed possible). Thanks to the theory of $L^2$ spaces and of Hilbert spaces developed in the previous chapters, the problem can be formalized by looking for complete orthonormal systems in $L^2$ made by trigonometric functions.
We shall mostly be concerned with the case of 2π-periodic functions, but a simple change of scale (see Remark 5.1) easily provides the translation of the results to arbitrary periods.
We are concerned with the measure space $\big((-\pi,\pi),\mathscr{B}((-\pi,\pi)),\lambda\big)$, where λ is the Lebesgue measure. As usual, we shall write for brevity $L^2(-\pi,\pi)$. We shall denote by $\langle\cdot,\cdot\rangle$ the canonical scalar product given by
$$\langle f,g\rangle:=\int_{(-\pi,\pi)}f(x)g(x)\,d\lambda=\int_{-\pi}^{\pi}f(x)g(x)\,dx,\qquad f,g\in L^2(-\pi,\pi).$$

Let us consider, as a family of elementary functions, the trigonometric


system, given by:
1 1 1
√ ; √ cos kx, k ∈ N, k ≥ 1; √ sin kx, k ∈ N, k ≥ 1.
2π π π
(5.1)
It is easy to check with integration by parts that this is an orthonormal
system in $L^2(-\pi,\pi)$, see Exercise 5.1. Thus, in view of Proposition 4.10,
the series of functions
$$S(x) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k\cos kx + b_k\sin kx), \tag{5.2}$$


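is convergent in $L^2(-\pi,\pi)$ for any $f \in L^2(-\pi,\pi)$, where
$$a_k := \frac{1}{\pi}\int_{-\pi}^{\pi} f(y)\cos ky\,dy, \qquad k\in\mathbb{N},$$
and
$$b_k := \frac{1}{\pi}\int_{-\pi}^{\pi} f(y)\sin ky\,dy, \qquad k\in\mathbb{N},\ k\ge 1.$$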

Notice that $a_0/2$ is the mean value of f on (−π, π), in agreement with
the fact that all the other terms in the series (5.2) have mean value 0 on (−π, π).
To recognize (5.2) in terms of scalar products, we see that the term $a_0/2$
corresponds to
$$\Big\langle f, \frac{1}{\sqrt{2\pi}}\Big\rangle \frac{1}{\sqrt{2\pi}}$$
and the terms $a_k\cos kx$, $b_k\sin kx$ for k ≥ 1 correspond respectively to
$$\Big\langle f, \frac{1}{\sqrt{\pi}}\cos kx\Big\rangle \frac{1}{\sqrt{\pi}}\cos kx, \qquad \Big\langle f, \frac{1}{\sqrt{\pi}}\sin kx\Big\rangle \frac{1}{\sqrt{\pi}}\sin kx.$$

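Formula (5.2) is called the trigonometric Fourier series of f.
The Bessel inequality (4.7) reads, in this context, as follows:
$$\frac{1}{\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx \ge \frac{1}{2}a_0^2 + \sum_{k=1}^{\infty} (a_k^2 + b_k^2). \tag{5.3}$$
Indeed, it is easily seen that $a_0^2\pi/2 = \langle f, 1/\sqrt{2\pi}\rangle^2$ and, for k ≥ 1,
$$a_k^2\pi = \Big\langle f, \frac{1}{\sqrt{\pi}}\cos kx\Big\rangle^2, \qquad b_k^2\pi = \Big\langle f, \frac{1}{\sqrt{\pi}}\sin kx\Big\rangle^2.$$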
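As a numerical illustration of the coefficients and of the Bessel inequality (5.3), the following Python sketch approximates $a_k$ and $b_k$ with a midpoint rule. The sample function f(x) = x, the quadrature rule, the number of nodes and the helper names `midpoint_integral` and `fourier_coefficients` are assumptions made for this example only; they do not come from the text.

```python
import math

def midpoint_integral(g, a, b, n=4000):
    """Approximate the integral of g over [a, b] with the midpoint rule on n cells."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def fourier_coefficients(f, K):
    """Return the lists [a_0, ..., a_K] and [b_1, ..., b_K] of the coefficients in (5.2)."""
    a = [midpoint_integral(lambda y: f(y) * math.cos(k * y), -math.pi, math.pi) / math.pi
         for k in range(K + 1)]
    b = [midpoint_integral(lambda y: f(y) * math.sin(k * y), -math.pi, math.pi) / math.pi
         for k in range(1, K + 1)]
    return a, b

f = lambda x: x          # sample function (an assumption of this sketch)
K = 10
a, b = fourier_coefficients(f, K)

# Left-hand side of (5.3) and the right-hand side truncated at k = K.
energy = midpoint_integral(lambda x: f(x) ** 2, -math.pi, math.pi) / math.pi
bessel_sum = 0.5 * a[0] ** 2 + sum(a[k] ** 2 + b[k - 1] ** 2 for k in range(1, K + 1))
print(energy, bessel_sum)
```

For this choice of f the truncated sum stays below the left-hand side, as (5.3) predicts; the gap shrinks as K grows.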

First, we shall find sufficient conditions on f ensuring the pointwise
convergence of the series S(x) to f(x) in (−π, π). Then, we shall show
that the trigonometric system is complete, so that the inequality above
is actually an equality. As shown in Exercise 5.4 and Exercise 5.5, the
trigonometric system, the trigonometric series and the form of the
coefficients become much nicer and more symmetric in the complex-valued
Hilbert space $L^2((-\pi,\pi);\mathbb{C})$:
$$f(x) = \sum_{n\in\mathbb{Z}} a_n e^{inx} \qquad\text{where}\qquad a_n := \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)e^{-inx}\,dx.$$

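Remark 5.1 (2T-periodic functions). If $f \in L^2(-T, T)$ we can write
instead
$$f(x) = \frac{a_0}{2} + \sum_{k=1}^{\infty}\Big(a_k\cos\frac{k\pi x}{T} + b_k\sin\frac{k\pi x}{T}\Big)$$
with
$$a_k := \begin{cases} \dfrac{1}{T}\displaystyle\int_{-T}^{T} f(x)\,dx & \text{if } k = 0;\\[2mm] \dfrac{1}{T}\displaystyle\int_{-T}^{T} f(x)\cos\frac{k\pi x}{T}\,dx & \text{if } k > 0, \end{cases} \qquad b_k := \frac{1}{T}\int_{-T}^{T} f(x)\sin\frac{k\pi x}{T}\,dx.$$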

5.1. Pointwise convergence of the Fourier series


For any integer N ≥ 1 we consider the partial sum
$$S_N(x) := \frac{1}{2}a_0 + \sum_{k=1}^{N} (a_k\cos kx + b_k\sin kx), \qquad x\in[-\pi,\pi).$$
Since the functions cos kx and sin kx are 2π–periodic, it is natural to
extend f to the whole of R as a 2π–periodic function $\tilde f$, setting
$$\tilde f(x + 2\pi n) = f(x), \qquad x\in[-\pi,\pi),\ n = \pm 1, \pm 2, \ldots. \tag{5.4}$$
We shall denote in the sequel by $H_{l,r}(z)$ the "Heaviside" function
$$H_{l,r}(z) := \begin{cases} l & \text{if } z\le 0;\\ r & \text{if } z > 0. \end{cases}$$
Lemma 5.2. For any integer N ≥ 1 and x, l, r ∈ R we have
$$S_N(x) - \frac{l+r}{2} = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{\tilde f(x+\tau) - H_{l,r}(\tau)}{\sin(\tau/2)}\,\sin\Big(\big(N+\tfrac{1}{2}\big)\tau\Big)\,d\tau. \tag{5.5}$$

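Proof. Write
$$\begin{aligned} S_N(x) &= \frac{1}{2}a_0 + \sum_{k=1}^{N} (a_k\cos kx + b_k\sin kx)\\ &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(y)\Big[\frac{1}{2} + \sum_{k=1}^{N} (\cos kx\cos ky + \sin kx\sin ky)\Big]\,dy\\ &= \frac{1}{\pi}\int_{-\pi}^{\pi} f(y)\Big[\frac{1}{2} + \sum_{k=1}^{N} \cos k(x-y)\Big]\,dy. \end{aligned}$$
To evaluate the sum, we notice that for any z ∈ R
$$\Big(\frac{1}{2} + \sum_{k=1}^{N}\cos kz\Big)\sin\tfrac{1}{2}z = \frac{1}{2}\sin\tfrac{1}{2}z + \frac{1}{2}\sum_{k=1}^{N}\Big[\sin\big(k+\tfrac{1}{2}\big)z - \sin\big(k-\tfrac{1}{2}\big)z\Big] = \frac{1}{2}\sin\big(N+\tfrac{1}{2}\big)z.$$
Therefore
$$\frac{1}{2} + \sum_{k=1}^{N}\cos kz = \frac{1}{2}\,\frac{\sin\big(N+\frac{1}{2}\big)z}{\sin\frac{1}{2}z} \tag{5.6}$$
and so,
$$S_N(x) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(y)\,\frac{\sin\big(N+\frac{1}{2}\big)(x-y)}{\sin\frac{1}{2}(x-y)}\,dy. \tag{5.7}$$
Now, setting τ = y − x we get
$$S_N(x) = \frac{1}{2\pi}\int_{-\pi-x}^{\pi-x} \tilde f(x+\tau)\,\frac{\sin\big(N+\frac{1}{2}\big)\tau}{\sin\frac{1}{2}\tau}\,d\tau = \frac{1}{2\pi}\int_{-\pi}^{\pi} \tilde f(x+\tau)\,\frac{\sin\big(N+\frac{1}{2}\big)\tau}{\sin\frac{1}{2}\tau}\,d\tau$$
since the function under the integral is 2π–periodic. Now, integrating
(5.6) over [−π, π] yields
$$1 = \frac{1}{2\pi}\int_{-\pi}^{\pi} \frac{\sin\big(N+\frac{1}{2}\big)\tau}{\sin\frac{1}{2}\tau}\,d\tau,$$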

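so that, the integrand being even,
$$\frac{1}{\pi}\int_{0}^{\pi} \frac{\sin\big(N+\frac{1}{2}\big)\tau}{\sin\frac{1}{2}\tau}\,d\tau = 1 = \frac{1}{\pi}\int_{-\pi}^{0} \frac{\sin\big(N+\frac{1}{2}\big)\tau}{\sin\frac{1}{2}\tau}\,d\tau.$$
If we multiply the first identity by r/2 and the second one by l/2, and
subtract the resulting identities from (5.7), rewritten in the variable τ
as above, (5.5) follows.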
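The identity (5.6) and the normalization of the kernel used at the end of the proof can be checked numerically. The Python sketch below is only an illustration: the function names, the test values of N and z, and the midpoint rule used for the integral are assumptions of this example.

```python
import math

def cosine_sum(N, z):
    """Left-hand side of (5.6): 1/2 + sum_{k=1}^{N} cos(kz)."""
    return 0.5 + sum(math.cos(k * z) for k in range(1, N + 1))

def kernel_closed_form(N, z):
    """Right-hand side of (5.6); z must not be a multiple of 2*pi."""
    return 0.5 * math.sin((N + 0.5) * z) / math.sin(0.5 * z)

def kernel_mean(N, n=2000):
    """Midpoint-rule value of (1/2pi) * integral over [-pi, pi] of sin((N+1/2)t)/sin(t/2)."""
    h = 2 * math.pi / n
    # With n even, no midpoint node falls on t = 0, where the quotient is singular.
    nodes = (-math.pi + (i + 0.5) * h for i in range(n))
    return h * sum(2 * kernel_closed_form(N, t) for t in nodes) / (2 * math.pi)

for z in (0.3, 1.0, -2.5, 3.0):
    print(z, cosine_sum(7, z), kernel_closed_form(7, z))
print(kernel_mean(5))
```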
Proposition 5.3 (Dini's test). Let x, l, r ∈ R be such that
$$\int_{-\pi}^{\pi} \frac{|\tilde f(x+\tau) - H_{l,r}(\tau)|}{|\sin(\tau/2)|}\,d\tau < \infty. \tag{5.8}$$
Then the Fourier series of f converges to (l + r)/2 at x.
Dini's test shows a remarkable property of the Fourier series: while the
specific value of the coefficients $a_k$ and $b_k$ depends on the behaviour of f
on the whole interval (−π, π), and the same holds for the Fourier series,
the character of the series (convergent or not) at a given point x depends
only on the behaviour of f in the neighbourhood of x: indeed, it is this
behaviour that influences the integrability of $(\tilde f(x+\tau) - H_{l,r}(\tau))/\sin(\tau/2)$
(the only singularity being at τ = 0).
In the next example we provide sufficient conditions for the conver-
gence of the Fourier series.
Example 5.4. Assume that f : [−π, π] → R is L-Lipschitz continuous,
i.e.
$$|f(x) - f(y)| \le L|x-y| \qquad \forall\, x, y\in[-\pi,\pi]$$
for some L ≥ 0. Then Dini's test is fulfilled at any x ∈ R \ Zπ choosing
$l = r = \tilde f(x)$, and at any x ∈ Zπ choosing $l = \tilde f(x_-)$ and $r = \tilde f(x_+)$ (1).
Indeed, with these choices of l and r, the quotient
$$\frac{\tilde f(x+\tau) - H_{l,r}(\tau)}{\sin(\tau/2)}$$
is bounded in a neighbourhood of 0.
The same conclusions hold when f is α–Hölder continuous for some
α ∈ (0, 1], i.e.
$$|f(x) - f(y)| \le L|x-y|^{\alpha} \qquad \forall\, x, y\in[-\pi,\pi]$$
for some L ≥ 0: in this case the quotient is bounded from above, near 0,
by the function $L|\tau|^{\alpha}/|\sin(\tau/2)| \sim 2L|\tau|^{\alpha-1}$, which is integrable.

(1) Here we denote by $g(x_-)$, $g(x_+)$ the left and right limits of g at x.

More generally, the argument of the previous example can be used to
show that the Fourier series is pointwise convergent for piecewise $C^1$
functions f: at continuity points x the series converges to f(x), and at
(jump) discontinuity points x it converges to $(f(x_-) + f(x_+))/2$. However,
the mere continuity of f is not sufficient to ensure pointwise
convergence of the Fourier series.
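The behaviour at continuity points and at jumps can be observed on a concrete case. The sketch below uses f(x) = x on [−π, π), whose coefficients are $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$ (a standard integration by parts, assumed here rather than taken from the text). The periodic extension is continuous at x = 1 and jumps from π to −π at x = π, so the expected limits are 1 and 0 respectively. The function name `partial_sum` is hypothetical.

```python
import math

def partial_sum(N, x):
    """S_N(x) for f(x) = x on [-pi, pi): a_k = 0 and b_k = 2(-1)^(k+1)/k."""
    return sum(2 * (-1) ** (k + 1) / k * math.sin(k * x) for k in range(1, N + 1))

for N in (10, 100, 1000):
    # x = 1 is a continuity point; x = pi is a jump of the periodic extension.
    print(N, partial_sum(N, 1.0), partial_sum(N, math.pi))
```

The convergence at x = 1 is slow (the error decays roughly like 1/N), consistent with the fact that the coefficients only decay like 1/k.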
In order to prove Proposition 5.3, we need the following Riemann–Lebesgue
lemma, a tool interesting in itself.
Lemma 5.5. Let $(e_k)$ be an orthonormal system in $L^2(-\pi,\pi)$. Assume
that there exists M > 0 such that $\|e_k\|_\infty \le M$ for all k ∈ N. Then for
any $f \in L^1(-\pi,\pi)$ we have
$$\lim_{k\to\infty}\int_{-\pi}^{\pi} f(x)e_k(x)\,dx = 0. \tag{5.9}$$

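Proof. Notice first that if $f \in L^2(-\pi,\pi)$ the conclusion of the lemma is
trivial. We have in fact in this case
$$\int_{-\pi}^{\pi} f(x)e_k(x)\,dx = \langle f, e_k\rangle$$
and, since by Bessel's inequality the series $\sum_{k=1}^{\infty} |\langle f, e_k\rangle|^2$ is convergent,
we have $\lim_k \langle f, e_k\rangle = 0$.
Let us now consider the general case. We know that bounded continuous
functions are dense in $L^1(-\pi,\pi)$, hence for any ε > 0 we can find
$g \in C_b(-\pi,\pi)$ such that $\|f - g\|_1 < \varepsilon$. As a consequence, still
writing $\langle f, e_k\rangle$ for $\int_{-\pi}^{\pi} f e_k\,dx$,
$$|\langle f, e_k\rangle| \le |\langle f - g, e_k\rangle| + |\langle g, e_k\rangle| \le M\varepsilon + |\langle g, e_k\rangle|$$
and letting k → ∞ we obtain $\limsup_k |\langle f, e_k\rangle| \le M\varepsilon$. Since ε is
arbitrary the proof is achieved.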
Proof of Proposition 5.3. Set
$$g(\tau) := \frac{\tilde f(x+\tau) - H_{l,r}(\tau)}{\sin(\tau/2)} \in L^1(-\pi,\pi). \tag{5.10}$$
Then, writing
$$\sin\big[(N+\tfrac{1}{2})t\big] = \sin Nt\,\cos\tfrac{1}{2}t + \cos Nt\,\sin\tfrac{1}{2}t$$
and applying the Riemann–Lebesgue lemma to g cos(t/2) (with $e_N = \sin Nt$)
and to g sin(t/2) (with $e_N = \cos Nt$) we obtain from (5.5) that
$S_N(x)$ converges to (l + r)/2.

5.2. Completeness of the trigonometric system


Proposition 5.6. The trigonometric system (5.1) is complete. In particular
equality holds in (5.3) and
$$\lim_{N\to\infty}\int_{-\pi}^{\pi} |f(x) - S_N f(x)|^2\,dx = 0 \qquad \forall f \in L^2(-\pi,\pi). \tag{5.11}$$

Proof. We show that the vector space E generated by the trigonometric
system is dense in $L^2(-\pi,\pi)$. Let H′ be the closure, in the $L^2(-\pi,\pi)$
norm, of E, which is easily seen to be still a vector space. We will
prove in a series of steps that H′ contains larger and larger classes of
functions.
Let f : [−π, π] → [0, +∞) be a Lipschitz function, and let us prove
that it belongs to H′. Indeed, we know from Example 5.4 that $S_N \to f$
pointwise in (−π, π). On the other hand, we already know from
Proposition 4.10(ii) that the Fourier series is convergent in $L^2(-\pi,\pi)$ to some
function g (which is indeed, by Exercise 4.4, the orthogonal projection of
f on H′), therefore a subsequence $(S_{N(k)})$ is converging λ-almost
everywhere to g. It follows that g = f and $S_N \to f$ in $L^2(-\pi,\pi)$.
If now g : [−π, π] → [0, +∞) is continuous, we know that g can be
monotonically approximated by the Lipschitz functions
$$g_\lambda(x) := \min_{y\in[-\pi,\pi]} \big(g(y) + \lambda|x-y|\big), \qquad x\in[-\pi,\pi]$$
(see Exercise 2.11), that converge to g also in $L^2(-\pi,\pi)$ by the
dominated convergence theorem. As a consequence also g belongs to H′.
Since H′ is invariant by addition of constants, we proved that all
continuous functions in [−π, π] belong to H′. We conclude using the density of
this class of functions in $L^2(-\pi,\pi)$.
Remark 5.7. Let $f \in L^2(-\pi,\pi)$. Then, the Parseval identity reads as
follows
$$\frac{1}{\pi}\int_{-\pi}^{\pi} |f(x)|^2\,dx = \frac{1}{2}a_0^2 + \sum_{k=1}^{\infty} (a_k^2 + b_k^2). \tag{5.12}$$
For instance, taking f(x) = x one finds the following nice relation
between π and the harmonic series with exponent 2:
$$\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}.$$
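The computation behind this relation is short. The following is a sketch of it, in which the closed form of $b_k$ comes from an integration by parts that the text does not carry out.

```latex
% Fourier coefficients of f(x) = x on (-\pi,\pi): f is odd, so a_k = 0 for all k, and
b_k = \frac{1}{\pi}\int_{-\pi}^{\pi} x\sin kx\,dx
    = \frac{1}{\pi}\Big[-\frac{x\cos kx}{k}\Big]_{-\pi}^{\pi}
      + \frac{1}{\pi k}\int_{-\pi}^{\pi}\cos kx\,dx
    = \frac{2(-1)^{k+1}}{k}.
% Left-hand side of (5.12):
\frac{1}{\pi}\int_{-\pi}^{\pi} x^2\,dx = \frac{2\pi^2}{3}.
% Parseval identity (5.12) with a_k = 0 and b_k^2 = 4/k^2:
\frac{2\pi^2}{3} = \sum_{k=1}^{\infty}\frac{4}{k^2}
\quad\Longrightarrow\quad
\sum_{k=1}^{\infty}\frac{1}{k^2} = \frac{\pi^2}{6}.
```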

Notice that (5.11) provides, for any $f \in L^2(-\pi,\pi)$, the existence of
a subsequence N(k) such that $S_{N(k)} f(x) \to f(x)$ for $\mathscr{L}^1$–a.e. x ∈
(−π, π). Is it true that the whole sequence $S_N f$ converges a.e. to f?
This problem, surprisingly difficult, was solved by L. Carleson only
in 1966, see [1].
Finally, we notice that there exist other important examples of com-
plete orthonormal systems, besides the trigonometric one. Some of them
are illustrated in the exercises.

5.3. Uniform convergence of the Fourier series


We conclude by studying the uniform convergence of the Fourier series.
We recall that a series $\sum_{n=0}^{\infty} x_n$ in a Banach space E is said to be totally
convergent if the numerical series $\sum_{n=0}^{\infty} \|x_n\|$ is convergent. Using the
completeness of E it is not difficult to check (see Exercise 5.2) that any
totally convergent series is convergent (as we have seen in the previous
chapter, this means that the finite sums $\sum_{n=0}^{N} x_n$ converge in E to a vector,
denoted by $\sum_{n=0}^{\infty} x_n$).
Now we show that the Fourier series of $C^1$ functions f with f(−π) =
f(π) are uniformly convergent: the proof highlights two important
principles, whose validity extends to higher order derivatives (see Exercise
5.9) and to Fourier transforms: first, the Fourier coefficients of the
derivative of a function are linked to the Fourier coefficients of the function;
second, higher regularity of f implies a faster decay of the Fourier
coefficients, and therefore a convergence in stronger norms of the Fourier
series.

Proposition 5.8. Assume that $f \in C^1([-\pi,\pi])$ and that f(−π) =
f(π). Then the Fourier series of f converges uniformly to f in [−π, π].

Proof. We first notice that $\tilde f$ in (5.4) is Lipschitz continuous, so that by
Proposition 5.3 we have
$$f(x) = \frac{1}{2}a_0 + \sum_{k=1}^{\infty} (a_k\cos kx + b_k\sin kx) \qquad \forall x\in[-\pi,\pi].$$
Let us consider the Fourier series of the derivative f′ of f,
$$\sum_{k=1}^{\infty} (a'_k\cos kx + b'_k\sin kx), \qquad x\in[-\pi,\pi],$$
where, for k ≥ 1 integer,
$$a'_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f'(y)\cos ky\,dy, \qquad b'_k = \frac{1}{\pi}\int_{-\pi}^{\pi} f'(y)\sin ky\,dy. \tag{5.13}$$

Notice that $a'_0 = 0$ because f(−π) = f(π) implies that the mean value
of f′ on (−π, π) is 0. As easily checked through an integration by parts
(using again the fact that f(−π) = f(π)), we have $a'_k = k b_k$ and $b'_k =
-k a_k$. Then, by the Bessel inequality it follows that
$$\sum_{k=1}^{\infty} k^2 (a_k^2 + b_k^2) = \sum_{k=1}^{\infty} \big((a'_k)^2 + (b'_k)^2\big) \le \frac{1}{\pi}\int_{-\pi}^{\pi} |f'(x)|^2\,dx < \infty. \tag{5.14}$$
Therefore the Fourier series of f is totally convergent in C([−π, π]) and
hence uniformly convergent. We have indeed, by the Cauchy–Schwarz
inequality and (5.14),
$$\sum_{k=1}^{\infty} \max_{x\in[-\pi,\pi]} |a_k\cos kx + b_k\sin kx| \le \sum_{k=1}^{\infty} (|a_k| + |b_k|) \le \Big(\sum_{k=1}^{\infty} k^2 (|a_k| + |b_k|)^2\Big)^{1/2} \Big(\sum_{k=1}^{\infty} k^{-2}\Big)^{1/2} < \infty.$$

Exercises
5.1 Check that the trigonometric system (5.1) is orthonormal.

5.2 Let E be a Banach space. Show that any totally convergent series $\sum_n x_n$,
with $(x_n) \subset E$, is convergent. Moreover,
$$\Big\|\sum_{n=0}^{\infty} x_n\Big\| \le \sum_{n=0}^{\infty} \|x_n\|. \tag{5.15}$$
Hint: estimate $\big\|\sum_{n=0}^{N} x_n - \sum_{n=0}^{M} x_n\big\|$ with the triangle inequality.
5.3 Prove that the following systems on $L^2(0,\pi)$ are orthonormal and complete
$$\sqrt{\frac{2}{\pi}}\sin kx, \qquad k\ge 1,$$
and
$$\frac{1}{\sqrt{\pi}}; \qquad \sqrt{\frac{2}{\pi}}\cos kx, \qquad k\ge 1.$$
5.4 Show that
$$e_k(x) := \frac{1}{\sqrt{2\pi}}e^{ikx}, \qquad k\in\mathbb{Z}$$
is a complete orthonormal system in $L^2((-\pi,\pi);\mathbb{C})$. Hint: in order to show
completeness, consider first the cases where f is real-valued or if is real-valued.

5.5 Let $(e_k)$ be as in Exercise 5.4. Using the Parseval identity show that
$$\int_{-\pi}^{\pi} |f(x)|^2\,dx = \frac{1}{2\pi}\sum_{k\in\mathbb{Z}} \Big|\int_{-\pi}^{\pi} f(x)e^{-ikx}\,dx\Big|^2 \qquad \forall f \in L^2((-\pi,\pi);\mathbb{C}).$$

5.6 Let $f \in L^2((-\pi,\pi);\mathbb{C})$ and let $S_N f = \sum_{k=-N}^{N} \langle f, e_k\rangle e_k$, with N ≥ 1, be the
Fourier sums corresponding to the complete orthonormal system in Exercise 5.4.
Show that
$$f(x) - S_N f(x) = \int_{-\pi}^{\pi} G_N(x-y)\big(f(x) - f(y)\big)\,dy$$
with
$$G_N(z) := \frac{\sin((N+1/2)z)}{\sin(z/2)}.$$
Hint: use the identities $\sum_{k=0}^{N} e^{iky} = \sum_{k=0}^{N} (e^{iy})^k = (e^{i(N+1)y} - 1)/(e^{iy} - 1)$.

5.7 Arguing as in Remark 5.7, show that $\sum_{k=1}^{\infty} k^{-4} = \pi^4/90$. Hint: consider the
function $f(x) = x^2$.

5.8 Chebyschev polynomials $C_n$ in $L^2(a,b)$, with (a, b) bounded interval, are
the ones obtained by applying the Gram-Schmidt procedure to the vectors 1, x,
$x^2$, $x^3$, .... They are also called Legendre polynomials when (a, b) = (−1, 1).
(a) Compute explicitly the first three Legendre polynomials.
(b) Show that $\{C_n\}_{n\in\mathbb{N}}$ is a complete orthonormal system. Hint: use the density
of polynomials in C([a, b]).
(c) Show that the n-th Legendre polynomial $P_n$ is given by
$$P_n(x) = \sqrt{\frac{2n+1}{2}}\,\frac{1}{2^n n!}\,\frac{d^n}{dx^n}(x^2-1)^n.$$

5.9 Let $f \in C^m([-\pi,\pi];\mathbb{C})$ with $f^{(j)}(-\pi) = f^{(j)}(\pi)$ for all j = 0, ..., m − 1.
Show that $c_k^{(m)}$, the k-th Fourier coefficient of $f^{(m)}$, is linked to $c_k$, the k-th
Fourier coefficient of f, by $c_k^{(m)} = (ik)^m c_k$.
Chapter 6
Operations on measures

In this chapter we collect many useful tools in Analysis and Probability
that will be widely used in the following chapters. We will study the
product of measures (both finite and countable), the product of measures
by $L^1$ functions, the Radon–Nikodým theorem, the convergence of
measures on the real line R and the Fourier transform.

6.1. The product measure and Fubini–Tonelli theorem


Let (X, F ) and (Y, G ) be measurable spaces. Let us consider the product
space X ×Y . A set of the form A× B, where A ∈ F and B ∈ G , is called
a measurable rectangle. We denote by R the family of all measurable
rectangles. R is obviously a π–system. The σ –algebra generated by R
is called the product σ –algebra of F and G . It is denoted by F × G .
Given σ –finite measures μ in (X, F ) and ν in (Y, G ), we are going to
define the product measure μ × ν in (X × Y, F × G ).
First, for any E ∈ F × G we define the sections of E, setting for
x ∈ X and y ∈ Y,
$$E_x := \{y\in Y : (x,y)\in E\}, \qquad E^y := \{x\in X : (x,y)\in E\}.$$

Proposition 6.1. Assume that μ and ν are σ–finite and let E ∈ F × G.
Then the following statements hold.
(i) $E_x \in \mathscr{G}$ for all x ∈ X and $E^y \in \mathscr{F}$ for all y ∈ Y.
(ii) The functions
$$x \mapsto \nu(E_x), \qquad y \mapsto \mu(E^y),$$
are F–measurable and G–measurable respectively. Moreover,
$$\int_X \nu(E_x)\,d\mu(x) = \int_Y \mu(E^y)\,d\nu(y). \tag{6.1}$$


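Proof. Assume first that E = A × B is a measurable rectangle. Then, if
(x, y) ∈ X × Y we have
$$E_x = \begin{cases} B & \text{if } x\in A\\ \emptyset & \text{if } x\notin A, \end{cases} \qquad E^y = \begin{cases} A & \text{if } y\in B\\ \emptyset & \text{if } y\notin B. \end{cases}$$
Consequently,
$$\nu(E_x) = 1_A(x)\nu(B), \qquad \mu(E^y) = 1_B(y)\mu(A),$$
so that (6.1) clearly holds.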


Now, let D be the family of all E ∈ F × G such that (i) is fulfilled.
Clearly, D is a Dynkin system including the π–system R. Therefore, (i)
follows from the Dynkin theorem.
Now, if both μ and ν are finite, let D be the family of all E ∈ F × G
such that (ii) is fulfilled. Clearly, D is a Dynkin system including
the π–system R (stability under complement follows by the identities
$\nu((E^c)_x) = \nu(Y) - \nu(E_x)$ and $\mu((E^c)^y) = \mu(X) - \mu(E^y)$). Therefore,
(ii) follows from the Dynkin theorem as well.
In the general σ–finite case we argue by approximation: if E ∈ F × G,
and $\mathscr{F} \ni X_h \uparrow X$ and $\mathscr{G} \ni Y_h \uparrow Y$ satisfy $\mu(X_h) < \infty$ and $\nu(Y_h) < \infty$, we
define the finite measures
$$\mu_h(A) = \mu(A\cap X_h), \qquad \nu_h(B) = \nu(B\cap Y_h)$$
to obtain that $x \mapsto \nu_h(E_x)$ is F–measurable and $y \mapsto \mu_h(E^y)$ is
G–measurable for all E ∈ F × G. Passing to the limit as h → ∞ in the
identity
$$\int_{X_h} \nu_h(E_x)\,d\mu(x) = \int_X \nu_h(E_x)\,d\mu_h(x) = \int_Y \mu_h(E^y)\,d\nu_h(y) = \int_{Y_h} \mu_h(E^y)\,d\nu(y)$$
the continuity properties of measures and integrals give (6.1) as well.
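On finite sets the identity (6.1) reduces to summing a 0/1 table by rows or by columns, and can be checked by hand or by machine. The Python sketch below does this with exact rational arithmetic; the two weighted sets, the set E and the helper names are hypothetical choices made for this illustration.

```python
from fractions import Fraction

# Hypothetical finite measure spaces: point masses on X = {0, 1, 2} and Y = {'a', 'b'}.
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
nu = {'a': Fraction(2), 'b': Fraction(5)}

# An arbitrary subset of X x Y that is not a rectangle.
E = {(0, 'a'), (0, 'b'), (1, 'b'), (2, 'a')}

def section_x(E, x):
    """E_x = {y : (x, y) in E}."""
    return {y for (x0, y) in E if x0 == x}

def section_y(E, y):
    """E^y = {x : (x, y) in E}."""
    return {x for (x, y0) in E if y0 == y}

def measure(weights, A):
    """Measure of a finite set A under the point masses `weights`."""
    return sum((weights[p] for p in A), Fraction(0))

# Both sides of (6.1): integrate the measure of the sections.
lhs = sum(measure(nu, section_x(E, x)) * mu[x] for x in mu)
rhs = sum(measure(mu, section_y(E, y)) * nu[y] for y in nu)
print(lhs, rhs)
```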

Theorem 6.2 (Product measure). If μ and ν are σ–finite, there exists a
unique measure λ in (X × Y, F × G) satisfying
$$\lambda(A\times B) = \mu(A)\nu(B) \qquad \text{for all } A\in\mathscr{F},\ B\in\mathscr{G}.$$
The measure λ is σ–finite and denoted by μ × ν. Furthermore μ × ν
is finite (resp. a probability measure) if both μ and ν are finite (resp.
probability measures).

Proof. Existence is easy: we set
$$\lambda(E) = \int_X \nu(E_x)\,d\mu(x) = \int_Y \mu(E^y)\,d\nu(y), \qquad E\in\mathscr{F}\times\mathscr{G}. \tag{6.2}$$
Using the continuity and additivity properties of the integral, it is
immediate to check that λ is a measure on (X × Y, F × G). In the case of σ–finite
measures, uniqueness follows by the coincidence criterion for positive
measures stated in Proposition 1.15: indeed, the value of the product
measure is uniquely determined on the π–system K made by rectangles
A × B with μ(A) and ν(B) finite, and thanks to the σ–finiteness
assumption there exist $E_n = A_n\times B_n \in \mathscr{K}$ with $E_n \uparrow X\times Y$.

Corollary 6.3. Let E ∈ F × G be such that μ × ν(E) = 0. Then
$\mu(E^y) = 0$ for ν–almost all y ∈ Y and $\nu(E_x) = 0$ for μ–almost all
x ∈ X.
Proof. It follows directly from (6.2).
We consider here the measure space (X × Y, F × G, λ), where λ =
μ × ν and μ and ν are σ–finite.
Theorem 6.4 (Fubini–Tonelli). Let F : X × Y → [0, +∞] be a F × G–
measurable map. Then the following statements hold.
(i) For any x ∈ X (respectively y ∈ Y), the function $y \mapsto F(x,y)$
(respectively $x \mapsto F(x,y)$) is G–measurable (resp. F–measurable).
(ii) The functions
$$x \mapsto \int_Y F(x,y)\,d\nu(y), \qquad y \mapsto \int_X F(x,y)\,d\mu(x)$$
are respectively F–measurable and G–measurable.
(iii) We have
$$\int_{X\times Y} F(x,y)\,d\lambda(x,y) = \int_X\Big[\int_Y F(x,y)\,d\nu(y)\Big]d\mu(x) = \int_Y\Big[\int_X F(x,y)\,d\mu(x)\Big]d\nu(y). \tag{6.3}$$

Proof. Assume first that $F = 1_E$, with E ∈ F × G. Then we have
$$F(x,y) = 1_{E_x}(y), \quad x\in X, \qquad F(x,y) = 1_{E^y}(x), \quad y\in Y,$$
so (i), (ii) and (iii) follow from Proposition 6.1. Consequently, by
linearity, (i)–(iii) hold when F is a simple function. If F is general, it
is enough to approximate it by a monotonically increasing sequence of
simple functions and then pass to the limit using the monotone
convergence theorem.
Remark 6.5 (The definition of integral revisited). We noticed in
Remark 2.13 that the integral of nonnegative functions can also be defined
without using the archimedean integral, by considering minorant simple
functions. If we follow this approach, the identity that we used to define
the integral can be derived by applying the Fubini–Tonelli theorem to the
subgraph
$$E := \{(x,t)\in X\times\mathbb{R} : 0 < t < f(x)\},$$
with the product measure μ × λ, λ being the Lebesgue measure. Indeed,
it is not difficult to show that E is F × B(R)–measurable whenever f
is F–measurable, so that
$$\int_0^{\infty} \mu(\{f > t\})\,dt = \int_0^{\infty} \mu(E^t)\,dt = \mu\times\lambda(E) = \int_X \lambda(E_x)\,d\mu(x) = \int_X f(x)\,d\mu(x).$$
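For a function taking finitely many values on a finite measure space, both ends of this chain of equalities can be computed exactly. The sketch below is an illustration under those assumptions: the dictionaries `mu` and `f` and the helper names are hypothetical, and the level sets are integrated by hand since $t \mapsto \mu(\{f > t\})$ is piecewise constant.

```python
from fractions import Fraction

def integral(f, mu):
    """Integral of f with respect to the point masses mu (a finite sum)."""
    return sum((f[x] * mu[x] for x in mu), Fraction(0))

def layer_cake(f, mu):
    """Integral over t > 0 of mu({f > t}), computed exactly for a finitely-valued f >= 0."""
    levels = sorted({Fraction(0)} | {f[x] for x in mu})
    total = Fraction(0)
    for lower, upper in zip(levels, levels[1:]):
        # For t in [lower, upper) the set {f > t} equals {f >= upper}.
        mass = sum((mu[x] for x in mu if f[x] >= upper), Fraction(0))
        total += (upper - lower) * mass
    return total

mu = {'p': Fraction(1, 2), 'q': Fraction(3), 'r': Fraction(1, 4)}
f = {'p': Fraction(2), 'q': Fraction(1, 3), 'r': Fraction(2)}
print(integral(f, mu), layer_cake(f, mu))
```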

Of course, splitting F in positive and negative parts, also the case of
extended real valued maps can be considered:
Corollary 6.6. Let F : X × Y → [−∞, +∞] be a F × G–measurable
map. Then F is μ × ν–integrable if and only if:
(i) for μ–a.e. x ∈ X the function $y \mapsto F(x,y)$ is ν–integrable;
(ii) the function $x \mapsto \int_Y |F(x,y)|\,d\nu(y)$ is μ–integrable.
If these conditions hold, we have
$$\int_{X\times Y} F(x,y)\,d(\mu\times\nu)(x,y) = \int_X\Big[\int_Y F(x,y)\,d\nu(y)\Big]d\mu(x). \tag{6.4}$$
Notice that, strictly speaking, the function in (ii) is defined only out of
a μ–negligible set; by μ–integrability of it we mean μ–integrability of
any F–measurable extension of it (for instance we may set it equal to 0
wherever $\int_Y |F(x,y)|\,d\nu(y)$ is not finite).
Remark 6.7 (Finite products). The previous constructions extend without
any difficulty to finite products of measurable spaces $(X_i, \mathscr{F}_i)$. Namely,
the product σ–algebra $\mathscr{F} := \mathop{\times}_{i=1}^{n} \mathscr{F}_i$ in the cartesian product
$X := \mathop{\times}_{i=1}^{n} X_i$ is generated by the rectangles
$$\{A_1\times\cdots\times A_n : A_i\in\mathscr{F}_i,\ 1\le i\le n\}.$$
Furthermore, if $\mu_i$ are σ–finite measures in $(X_i, \mathscr{F}_i)$, integrals with
respect to the product measure $\mu = \mathop{\times}_{i=1}^{n} \mu_i$ are defined by
$$\int_X F(x)\,d\mu(x) = \int_{X_1}\int_{X_2}\cdots\int_{X_n} F(x_1,\ldots,x_n)\,d\mu_n(x_n)\cdots d\mu_2(x_2)\,d\mu_1(x_1),$$
and any permutation in the order of the integrals would produce the same
result. Finally, the product measure is uniquely determined, in the
σ–finite case, by the product rule
$$\mu(A_1\times\cdots\times A_n) = \prod_{i=1}^{n} \mu_i(A_i), \qquad A_i\in\mathscr{F}_i,\ 1\le i\le n.$$
It is also not hard to show that the product is associative, both at the level
of σ–algebras and measures, see Exercise 6.1.

6.2. The Lebesgue measure on Rn


This section is devoted to the construction, the characterization and the
main properties of the Lebesgue measure in Rn , i.e. the length measure
in R1 , the area measure in R2 , the volume measure in R3 and so on.
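Definition 6.8 (Lebesgue measure in $\mathbb{R}^n$). Let us consider the measure
space $(\mathbb{R}, \mathscr{B}(\mathbb{R}), \mathscr{L}^1)$, where $\mathscr{L}^1$ is the Lebesgue measure on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$.
Then, we can define the measure space $\big(\mathbb{R}^n, \mathop{\times}_{i=1}^{n} \mathscr{B}(\mathbb{R}), \mathscr{L}^n\big)$ with
$\mathscr{L}^n := \mathop{\times}_{i=1}^{n} \mathscr{L}^1$. We say that $\mathscr{L}^n$ is the Lebesgue measure on $\mathbb{R}^n$.
Since (see Exercise 6.2)
$$\mathscr{B}(\mathbb{R}^n) = \mathop{\times}_{i=1}^{n} \mathscr{B}(\mathbb{R}),$$
we can equivalently consider $\mathscr{L}^n$ as a measure in $(\mathbb{R}^n, \mathscr{B}(\mathbb{R}^n))$,
forgetting its construction as a product measure (indeed, there exist alternative
and direct constructions of $\mathscr{L}^n$ independent of the concept of product
measure).
As in the one-dimensional case, we will keep using the classical notation
$$\int_E f(x)\,dx := \int_{\mathbb{R}^n} f\,1_E\,d\mathscr{L}^n, \qquad E\in\mathscr{B}(\mathbb{R}^n),\ f : \mathbb{R}^n\to\mathbb{R} \text{ Borel}$$
for integrals with respect to Lebesgue measure $\mathscr{L}^n$ (or Riemann integrals
in more than one independent variable).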

In the computation of Lebesgue integrals, a particular role is sometimes
played by the dimensional constant $\omega_n = \mathscr{L}^n(B(0,1))$ (so that $\omega_1 = 2$,
$\omega_2 = \pi$, $\omega_3 = 4\pi/3$, ...). A general formula for the computation of $\omega_n$
can be given using Euler's Γ function:
$$\Gamma(z) := \int_0^{\infty} t^{z-1}e^{-t}\,dt, \qquad z > 0.$$
Indeed, we have
$$\omega_n = \frac{\pi^{n/2}}{\Gamma(\frac{n}{2}+1)}. \tag{6.5}$$
A proof of this formula, based on the identity Γ(z + 1) = zΓ(z) (which
gives also Γ(n) = (n − 1)! for n ≥ 1 integer) is proposed in Exercise 6.7.
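Formula (6.5) is easy to evaluate numerically. The following Python sketch uses the standard library function `math.gamma`; the function name `unit_ball_volume` is a hypothetical choice for this illustration.

```python
import math

def unit_ball_volume(n):
    """omega_n = pi^(n/2) / Gamma(n/2 + 1), formula (6.5)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

for n in range(1, 6):
    print(n, unit_ball_volume(n))
```

The first three values agree, up to rounding, with $\omega_1 = 2$, $\omega_2 = \pi$ and $\omega_3 = 4\pi/3$ quoted above.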
We are going to show that $\mathscr{L}^n$ is invariant under translations and
rotations. For this we need some notation. For any $a \in \mathbb{R}^n$ and any δ > 0 we
set
$$Q(a,\delta) := \{x\in\mathbb{R}^n : a_i\le x_i < a_i+\delta,\ \forall\, i = 1,\ldots,n\} = \mathop{\times}_{i=1}^{n} [a_i, a_i+\delta).$$
Q(a, δ) is called the δ–box with corner at a. For all N ∈ N we consider
the family
$$\mathscr{Q}_N = \{Q(2^{-N}k, 2^{-N}) : k = (k_1,\ldots,k_n)\in\mathbb{Z}^n\}.$$
It is clear that each box in $\mathscr{Q}_N$ is Borel and that its Lebesgue measure
is $2^{-nN}$. Now we set
$$\mathscr{Q} = \bigcup_{N=0}^{\infty} \mathscr{Q}_N.$$
It is also clear that all boxes in $\mathscr{Q}_N$ are mutually disjoint and that their union
is $\mathbb{R}^n$. Furthermore, if N < M, $Q \in \mathscr{Q}_N$ and $Q' \in \mathscr{Q}_M$, then either
Q′ ⊂ Q or Q ∩ Q′ = ∅. It follows that if Q, Q′ ∈ $\mathscr{Q}$ intersect, then
one of the two sets is contained in the other one.
Lemma 6.9. Let U be a nonempty open set in $\mathbb{R}^n$. Then U is the disjoint
union of boxes in $\mathscr{Q}$.
Proof. For any x ∈ U, let $Q_x \in \mathscr{Q}$ be the biggest box such that x ∈
$Q_x$ ⊂ U. This box is uniquely defined: indeed, fix an x; for any m there
is only one box $Q_{x,m} \in \mathscr{Q}_m$ such that $x \in Q_{x,m}$; moreover, since U is
open, for m large enough $Q_{x,m} \subset U$; we can then define $Q_x = Q_{x,\tilde m}$
where $\tilde m$ is the smallest integer m such that $Q_{x,m} \subset U$.
This family $\{Q_x\}_{x\in U}$ is a partition of U, that is, for any x, y ∈ U, either
$Q_x = Q_y$ or $Q_x \cap Q_y = \emptyset$; indeed, if we suppose that $Q_x \cap Q_y \ne \emptyset$,
then one of the two boxes is contained in the other, say $Q_x \subset Q_y$. This
leads to $x \in Q_x \subset Q_y \subset U$, contradicting the definition of $Q_x$ unless
$Q_x = Q_y$.
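The construction in the proof can be carried out explicitly in dimension n = 1 when U is an open interval. The Python sketch below lists the maximal dyadic boxes of U = (a, b) up to a fixed level, using the observation that a box is maximal exactly when it lies in U and its parent box does not. The restriction to one dimension, the truncation at `max_level`, the sample interval and the helper names are assumptions of this illustration.

```python
from fractions import Fraction
import math

def is_inside(k, N, a, b):
    """True if the dyadic box [k/2^N, (k+1)/2^N) is contained in the open interval (a, b)."""
    return a < Fraction(k, 2 ** N) and Fraction(k + 1, 2 ** N) <= b

def maximal_boxes(a, b, max_level):
    """Boxes Q_x of levels 0..max_level for U = (a, b), returned as pairs (k, N)."""
    boxes = []
    for N in range(max_level + 1):
        side = Fraction(1, 2 ** N)
        for k in range(math.floor(a / side), math.ceil(b / side) + 1):
            if not is_inside(k, N, a, b):
                continue
            # Keep the box only if its parent in Q_{N-1} is not contained in U.
            if N > 0 and is_inside(k // 2, N - 1, a, b):
                continue
            boxes.append((k, N))
    return boxes

a, b = Fraction(1, 3), Fraction(5, 2)
boxes = maximal_boxes(a, b, max_level=12)
covered = sum(Fraction(1, 2 ** N) for _, N in boxes)
print(len(boxes), float(covered), float(b - a))
```

Since a = 1/3 is not a dyadic number, infinitely many boxes accumulate at a; truncating at level 12 leaves uncovered only a piece of length less than $2^{-12}$.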
From Lemma 6.9 it follows easily that the σ–algebra generated by $\mathscr{Q}$
coincides with $\mathscr{B}(\mathbb{R}^n)$.
Proposition 6.10 (Properties of the Lebesgue measure). The following
statements hold.
(i) (translation invariance) For any $E \in \mathscr{B}(\mathbb{R}^n)$, $x \in \mathbb{R}^n$ we have
$\mathscr{L}^n(E + x) = \mathscr{L}^n(E)$, where
$$E + x = \{y + x : y\in E\}.$$
(ii) If μ is a translation invariant measure on $(\mathbb{R}^n, \mathscr{B}(\mathbb{R}^n))$ such that
μ(K) < ∞ for any compact set K, there exists a number $C_\mu \ge 0$
such that
$$\mu(E) = C_\mu \mathscr{L}^n(E) \qquad \forall\, E\in\mathscr{B}(\mathbb{R}^n).$$
(iii) (rotation invariance) For any orthogonal matrix $R \in L(\mathbb{R}^n;\mathbb{R}^n)$ we
have
$$\mathscr{L}^n(R(E)) = \mathscr{L}^n(E) \qquad \forall\, E\in\mathscr{B}(\mathbb{R}^n).$$
(iv) For any $T \in L(\mathbb{R}^n;\mathbb{R}^n)$ we have
$$\mathscr{L}^n(T(E)) = |\det T|\,\mathscr{L}^n(E) \qquad \forall\, E\in\mathscr{B}(\mathbb{R}^n).$$

Proof. Let us prove (i). Fix $x \in \mathbb{R}^n$. The measures $\mathscr{L}^n(E)$ and $\mathscr{L}^n(E + x)$ coincide on
the π–system of boxes; thanks to Lemma 6.9, this π–system generates
the Borel σ–algebra, so that the coincidence criterion for measures stated
in Proposition 1.15 gives that $\mathscr{L}^n(E) = \mathscr{L}^n(E + x)$ for all Borel sets E.
Let us prove (ii). Let $Q_0 \in \mathscr{Q}_0$ and set $C_\mu = \mu(Q_0)$. Since $Q_0$ is
included in a compact set, we have $C_\mu < \infty$. Since μ is translation
invariant, all boxes in $\mathscr{Q}_0$ have the same μ measure. Now, let $Q_N \in \mathscr{Q}_N$.
Since $Q_0$ is the disjoint union of $2^{nN}$ boxes in $\mathscr{Q}_N$ which have all the
same μ measure (again by the translation invariance) we have that
$$\mu(Q_N) = 2^{-nN} C_\mu = C_\mu \mathscr{L}^n(Q_N).$$
So, Lemma 6.9 gives that $\mu(A) = C_\mu \mathscr{L}^n(A)$ for any open set A, and
therefore for any Borel set.

Let us now prove (iii). By the translation invariance of $\mathscr{L}^n$, the
measure $\mu(E) = \mathscr{L}^n(R(E))$ is easily seen to be translation invariant (because
R(E + z) = R(E) + R(z)), hence $\mathscr{L}^n(R(E)) = C\mathscr{L}^n(E)$ for some
constant C. We can identify the constant C choosing E equal to the unit ball,
finding C = 1.
Finally, let us prove (iv). By polar decomposition we can write T =
R ∘ S with $S = \sqrt{T^*\circ T}$ symmetric and nonnegative definite, and R
orthogonal. Notice that on one hand |det T| = det S (because det R ∈
{−1, 1}) and on the other hand, by (iii) we have
$$\mathscr{L}^n(T(E)) = \mathscr{L}^n(R(S(E))) = \mathscr{L}^n(S(E)).$$
Hence, it suffices to show that $\mathscr{L}^n(S(E)) = \det S\,\mathscr{L}^n(E)$ for any
symmetric and nonnegative definite matrix S. By the translation invariance of
$E \mapsto \mathscr{L}^n(S(E))$ there exists a constant C such that $\mathscr{L}^n(S(E)) = C\mathscr{L}^n(E)$ for
any Borel set E. In this case we can identify the constant C choosing as
E a suitable n-dimensional cube: denoting by $(e_i)$ an orthonormal basis
of eigenvectors of S, with eigenvalues $\alpha_i \ge 0$ (whose product is det S),
choosing
$$E = \Big\{\sum_{i=1}^{n} c_i e_i : |c_i|\le 1\Big\}, \quad\text{so that}\quad S(E) = \Big\{\sum_{i=1}^{n} \alpha_i c_i e_i : |c_i|\le 1\Big\},$$
the rotation invariance of $\mathscr{L}^n$ gives $\mathscr{L}^n(E) = 2^n$ and $\mathscr{L}^n(S(E)) =
2^n\alpha_1\cdots\alpha_n$. Therefore C = det S and the proof is complete.

6.3. Countable products


We are here concerned with a sequence $(X_i, \mathscr{F}_i, \mu_i)$, i = 1, 2, ..., of
probability spaces. We denote by X the product space
$$X := \mathop{\times}_{k=1}^{\infty} X_k$$
and by $x = (x_k)$ the generic element of X.
We are going to define a σ–algebra of subsets of X. Let us first
introduce the cylindrical sets in X. A cylindrical set $I_{n,A}$ is a set of the
following form
$$I_{n,A} = \{x : (x_1,\ldots,x_n)\in A\},$$
where n ≥ 1 is an integer and $A \in \mathop{\times}_{k=1}^{n} \mathscr{F}_k$. This representation is not
unique; however, since
$$I_{n,A} = A\times\mathop{\times}_{k=n+1}^{\infty} X_k$$
we have that $I_{n,A} = I_{m,B}$ with n < m implies $B = A\times X_{n+1}\times\cdots\times X_m$.


We denote by $\mathscr{C}$ the family of all cylindrical sets of X. Notice also that
$$I_{n,A}^c = I_{n,A^c},$$
so that $\mathscr{C}$ is stable under complement. If $I_{n,A}$ and $I_{m,B}$ belong to $\mathscr{C}$ we can
assume by the previous remarks that m = n, so that $I_{n,A}\cup I_{n,B} = I_{n,A\cup B}$
belongs to $\mathscr{C}$. Therefore $\mathscr{C}$ is an algebra.
The σ–algebra generated by $\mathscr{C}$ is called the product σ–algebra of the
σ–algebras $\mathscr{F}_i$. It is denoted by
$$\mathop{\times}_{k=1}^{\infty} \mathscr{F}_k.$$
Now we define a function μ on $\mathscr{C}$, setting
$$\mu(I_{n,A}) = \Big(\mathop{\times}_{k=1}^{n} \mu_k\Big)(A), \qquad I_{n,A}\in\mathscr{C}. \tag{6.6}$$
This definition is well posed, again thanks to the fact that $I_{n,A} = I_{m,B}$
with n < m implies $B = A\times X_{n+1}\times\cdots\times X_m$. It is easy to check that μ is
additive: indeed, if $I_{n,A}$ and $I_{m,B}$ are disjoint, using the previous remark
we can assume with no loss of generality that n = m, and therefore the
equality $\mu(I_{n,A}\cup I_{n,B}) = \mu(I_{n,A}) + \mu(I_{n,B})$ follows by
$$\Big(\mathop{\times}_{k=1}^{n} \mu_k\Big)(A\cup B) = \Big(\mathop{\times}_{k=1}^{n} \mu_k\Big)(A) + \Big(\mathop{\times}_{k=1}^{n} \mu_k\Big)(B).$$
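The fact that (6.6) does not depend on the representation of the cylinder can be seen concretely when each $X_k$ is a finite set. In the Python sketch below every $X_k$ is {0, 1} with a (hypothetical) biased coin measure, and the same cylinder is described first by a set A of 2-tuples and then by B = A × X_3. The weights and the helper names are assumptions of this illustration.

```python
from fractions import Fraction

def finite_product_measure(weights, A):
    """Product measure of a set A of n-tuples; weights[k] holds the point masses of mu_(k+1)."""
    total = Fraction(0)
    for point in A:
        mass = Fraction(1)
        for k, coordinate in enumerate(point):
            mass *= weights[k][coordinate]
        total += mass
    return total

def extend(A, next_space):
    """Return A x next_space, the base of the same cylinder with one more coordinate."""
    return {point + (x,) for point in A for x in next_space}

# Three probability measures on {0, 1} (hypothetical biased coins).
weights = [
    {0: Fraction(1, 2), 1: Fraction(1, 2)},
    {0: Fraction(1, 3), 1: Fraction(2, 3)},
    {0: Fraction(1, 4), 1: Fraction(3, 4)},
]

A = {(0, 1), (1, 1)}        # base of the cylinder I_{2,A}
B = extend(A, [0, 1])       # base of the same cylinder written as I_{3,B}
print(finite_product_measure(weights, A), finite_product_measure(weights, B))
```

The two values coincide because each $\mu_k$ has total mass 1, which is exactly where the assumption of probability spaces enters.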

Theorem 6.11. The set function μ defined in (6.6) is σ–additive on $\mathscr{C}$
and therefore, by the Carathéodory theorem, it has a unique extension to
a probability measure on $\big(X, \mathop{\times}_{k=1}^{\infty} \mathscr{F}_k\big)$ that is denoted by
$$\mathop{\times}_{k=1}^{\infty} \mu_k.$$
Proof. To prove the σ–additivity of μ it is enough to show the continuity
of μ at ∅, or equivalently the implication
$$(E_j)\subset\mathscr{C},\ (E_j) \text{ nonincreasing},\ \mu(E_j)\ge\varepsilon_0 > 0 \quad\Longrightarrow\quad \bigcap_{j=1}^{\infty} E_j \ne \emptyset. \tag{6.7}$$
In the following we are given a nonincreasing sequence $(E_j)$ in $\mathscr{C}$ such
that $\mu(E_j) \ge \varepsilon_0 > 0$. To prove (6.7), we need some more notation. We
set
$$X^{(n)} = \mathop{\times}_{k=n+1}^{\infty} X_k$$

and we define $\mu^{(n)}$ on cylindrical sets of $X^{(n)}$ as in (6.6). Then, we
consider the sections of $E_j$ defined as
$$E_j(x_1) = \{x^{(1)}\in X^{(1)} : (x_1, x^{(1)})\in E_j\}, \qquad x_1\in X_1.$$
$E_j(x_1)$ is a cylindrical subset of $X^{(1)}$ and by the Fubini theorem we have
$$\mu(E_j) = \int_{X_1} \mu^{(1)}(E_j(x_1))\,d\mu_1(x_1) \ge \varepsilon_0 > 0, \qquad j\ge 1. \tag{6.8}$$
Set now
$$F_{j,1} = \Big\{x_1\in X_1 : \mu^{(1)}(E_j(x_1)) \ge \frac{\varepsilon_0}{2}\Big\}, \qquad j\ge 1.$$
Then $F_{j,1}$ is not empty and by (6.8) we have
$$\mu(E_j) = \int_{F_{j,1}} \mu^{(1)}(E_j(x_1))\,d\mu_1(x_1) + \int_{F_{j,1}^c} \mu^{(1)}(E_j(x_1))\,d\mu_1(x_1) \le \mu_1(F_{j,1}) + \frac{\varepsilon_0}{2}.$$
Therefore $\mu_1(F_{j,1}) \ge \varepsilon_0/2$ for all j ≥ 1.
Obviously $(F_{j,1})$ is a nonincreasing sequence of subsets of $X_1$. Since
$\mu_1$ is σ–additive, it is continuous at ∅. Therefore, there exists
$\alpha_1 \in \bigcap_{j=1}^{\infty} F_{j,1}$ and so
$$\mu^{(1)}(E_j(\alpha_1)) \ge \frac{\varepsilon_0}{2}, \qquad j\ge 1. \tag{6.9}$$
Consequently we have
$$E_j(\alpha_1) \ne \emptyset, \qquad j\ge 1. \tag{6.10}$$
Now we iterate the procedure: for any $x_2 \in X_2$ we consider the section
$$E_j(\alpha_1, x_2) = \{x^{(2)}\in X^{(2)} : (\alpha_1, x_2, x^{(2)})\in E_j\}, \qquad j\ge 1.$$
By the Fubini theorem we have
$$\mu^{(1)}(E_j(\alpha_1)) = \int_{X_2} \mu^{(2)}(E_j(\alpha_1, x_2))\,d\mu_2(x_2). \tag{6.11}$$
We set
$$F_{j,2} = \Big\{x_2\in X_2 : \mu^{(2)}(E_j(\alpha_1, x_2)) \ge \frac{\varepsilon_0}{4}\Big\}, \qquad j\ge 1.$$

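Then by (6.9) and (6.11) we have
$$\begin{aligned} \frac{\varepsilon_0}{2} \le \mu^{(1)}(E_j(\alpha_1)) &= \int_{X_2} \mu^{(2)}(E_j(\alpha_1, x_2))\,d\mu_2(x_2)\\ &= \int_{F_{j,2}} \mu^{(2)}(E_j(\alpha_1, x_2))\,d\mu_2(x_2) + \int_{[F_{j,2}]^c} \mu^{(2)}(E_j(\alpha_1, x_2))\,d\mu_2(x_2)\\ &\le \mu_2(F_{j,2}) + \frac{\varepsilon_0}{4}. \end{aligned}$$
Therefore $\mu_2(F_{j,2}) \ge \varepsilon_0/4$. Since $(F_{j,2})$ is nonincreasing and $\mu_2$ is
σ–additive, there exists $\alpha_2 \in X_2$ such that
$$\mu^{(2)}(E_j(\alpha_1, \alpha_2)) \ge \frac{\varepsilon_0}{4}, \qquad j\ge 1,$$
and consequently we have
$$E_j(\alpha_1, \alpha_2) \ne \emptyset. \tag{6.12}$$
Arguing in a similar way we see that there exists a sequence $(\alpha_k) \in X$
such that
$$E_j(\alpha_1,\ldots,\alpha_n) \ne \emptyset \qquad \text{for all } j, n\ge 1, \tag{6.13}$$
where
$$E_j(\alpha_1,\ldots,\alpha_n) = \{x^{(n)}\in X^{(n)} : (\alpha_1,\ldots,\alpha_n, x^{(n)})\in E_j\}, \qquad j, n\ge 1.$$
Since the $E_j$ are cylindrical, this easily implies that $(\alpha_n) \in \bigcap_{j=1}^{\infty} E_j$.
Therefore $\bigcap_{j=1}^{\infty} E_j$ is not empty, as required.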

Exercises
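6.1 Let $(X_1, \mathscr{F}_1)$, $(X_2, \mathscr{F}_2)$, $(X_3, \mathscr{F}_3)$ be measurable spaces. Show that
$$(\mathscr{F}_1\times\mathscr{F}_2)\times\mathscr{F}_3 = \mathscr{F}_1\times(\mathscr{F}_2\times\mathscr{F}_3).$$
If we are given measures $\mu_i$ in $\mathscr{F}_i$, i = 1, 2, 3, show also that $(\mu_1\times\mu_2)\times\mu_3 =
\mu_1\times(\mu_2\times\mu_3)$.
6.2 Let us consider the measurable spaces $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$, $(\mathbb{R}^n, \mathscr{B}(\mathbb{R}^n))$. Show that
$$\mathscr{B}(\mathbb{R}^n) = \mathop{\times}_{i=1}^{n} \mathscr{B}(\mathbb{R}).$$
Hint: to show the inclusion ⊂, use Lemma 6.9.

6.3 Let $\mathscr{L}_n$ be the σ–algebra of Lebesgue measurable sets in $\mathbb{R}^n$. Show that
$$\mathscr{L}_1\times\mathscr{L}_1 \subsetneq \mathscr{L}_2.$$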

Hint: to show the strict inclusion, consider the set E = F × {0}, where F ⊂ R
is not Lebesgue measurable.
6.4 Show that the product σ –algebra is also generated by the family of products
× ∞
1 Ai where Ai ∈ F i and Ai  = X i only for finitely many i.
6.5 Writing properly L 3 as a product measure, compute L 3 (T ), where
+ ,
T = (x, y, z) : x 2 + y 2 < r 2 and y 2 + z 2 < r 2 .

6.6 [Computation of $\omega_n$] Find a recursive formula linking $\omega_n$ to $\omega_{n-2}$, and use it to show that $\omega_{2k} = \pi^k/k!$ and $\omega_{2k+1} = 2^{k+1}\pi^k/(2k+1)!!$, where $(2k+1)!!$ is the product of all odd integers between 1 and $2k+1$. Hint: use the Fubini–Tonelli theorem.

6.7 Use Exercise 6.6 and the identities $\Gamma(1) = 1$, $\Gamma(1/2) = \sqrt{\pi}$ and $\Gamma(z+1) = z\Gamma(z)$ to show (6.5).
6.8 Let μ and ν be σ –finite measures on (X, F ) and (Y, G ) respectively and let
λ = μ × ν. Let E = (F × G )λ , as defined in Definition 1.12, and let ζ be
the extension of λ to E . Show this version of the Fubini–Tonelli Theorem 6.4:
for any E –measurable function F : X × Y → [0, +∞] the following statements
hold:
(i) for μ–a.e. $x \in X$ the function $y \mapsto F(x, y)$ is ν–measurable;
(ii) the function $x \mapsto \int_Y F(x, y)\, d\nu(y)$, set to zero at all points $x$ such that $y \mapsto F(x, y)$ is not ν–integrable, is μ–measurable;
(iii) $\int_X \int_Y F(x, y)\, d\nu(y)\, d\mu(x) = \int_{X \times Y} F(x, y)\, d\zeta(x, y)$.

6.9 Using the notation of the Fubini–Tonelli theorem, let $X = Y = [0, 1]$, $\mathscr{F} = \mathscr{G} = \mathscr{B}([0, 1])$, let μ be the Lebesgue measure and let ν be the counting measure. Let $D = \{(x, x) : x \in [0, 1]\}$ be the diagonal in $X \times Y$; check that
$$\int_X \nu(D_x)\, d\mu(x) \ne \int_Y \mu(D^y)\, d\nu(y).$$
6.10 Let $(f_h)$ be converging to $f$ in $L^1(X \times Y, \mu \times \nu)$. Show the existence of a subsequence $h(k)$ such that $f_{h(k)}(x, \cdot)$ converge to $f(x, \cdot)$ in $L^1(Y, \nu)$ for μ–a.e. $x \in X$. Show by an example that, in general, this property is not true for the whole sequence.
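
As a numerical sanity check on the closed forms stated in Exercise 6.6, the following sketch compares them with the expression π^{n/2}/Γ(n/2 + 1). We assume here that ω_n denotes the Lebesgue measure of the unit ball of R^n and that (6.5) is this Gamma-function formula; neither is restated in this part of the notes, and the function names are ours.

```python
import math

def odd_double_factorial(m):
    """Product of all odd integers between 1 and m (m odd, m >= 1)."""
    result = 1
    for j in range(1, m + 1, 2):
        result *= j
    return result

def omega_closed_form(n):
    """omega_n from the two closed forms of Exercise 6.6 (n >= 1)."""
    k = n // 2
    if n % 2 == 0:
        return math.pi ** k / math.factorial(k)
    return 2 ** (k + 1) * math.pi ** k / odd_double_factorial(2 * k + 1)

def omega_gamma(n):
    """Assumed form of (6.5): pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

# The two expressions agree (up to rounding) in low dimensions.
agreement = all(
    math.isclose(omega_closed_form(n), omega_gamma(n), rel_tol=1e-10)
    for n in range(1, 11)
)
```

This is only a floating-point comparison, not a proof; the exercises ask for the derivation via the Fubini–Tonelli theorem.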

6.4. Comparison of measures


In this section we study some relations between measures in a measurable
space (X, F ).
The first (immediate) one is the order relation: viewing measures as
set functions, we say that μ ≤ ν if μ(B) ≤ ν(B) for all B ∈ F . It is not
hard to see that the space of measures endowed with this order relation is
a complete lattice (see Exercise 6.13): in particular

μ∨ν(B) = sup {μ(A1 ) + ν(A2 ) : A1 , A2 ∈ F , (A1 , A2 ) partition of B}



and

μ∧ν(B) = inf {μ(A1 ) + ν(A2 ) : A1 , A2 ∈ F , (A1 , A2 ) partition of B} .

Another relation between measures is linked to the concept of product of a function by a measure.
Definition 6.12. Let μ be a measure in $(X, \mathscr{F})$ and let $f \in L^1(X, \mathscr{F}, \mu)$ be nonnegative. We set

$$f\mu(B) := \int_B f\, d\mu \qquad \forall B \in \mathscr{F}. \tag{6.14}$$

It is immediate to check, using the additivity and the continuity properties of the integral, that $f\mu$ is a finite measure. Furthermore, the following simple rule provides a way for the computation of integrals with respect to $f\mu$:
$$\int_X h\, d(f\mu) = \int_X h f\, d\mu, \tag{6.15}$$
whenever h is F –measurable and nonnegative (or h f is μ–integrable,
see Exercise 6.11). It suffices to check the identity (6.15) on character-
istic functions h = 1 B (and in this case it reduces to (6.14)), and then
for simple functions. The monotone convergence theorem then gives the
general result.
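
On a finite set, where every measure is a list of point masses, the rule (6.15) can be checked by direct computation. The sketch below uses a hypothetical four-point space with masses, density and test function chosen by us purely for illustration; exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction as Fr

# Hypothetical finite measure space: X = {0, 1, 2, 3}, all subsets measurable.
X = [0, 1, 2, 3]
mu = {0: Fr(1, 2), 1: Fr(1, 4), 2: Fr(0), 3: Fr(2)}   # point masses mu({x})
f = {0: Fr(3), 1: Fr(0), 2: Fr(5), 3: Fr(1, 2)}        # nonnegative density
h = {0: Fr(1), 1: Fr(7), 2: Fr(2), 3: Fr(4)}           # nonnegative test function

def integral(g, measure, B=X):
    """Integral of g over B with respect to a measure given by point masses."""
    return sum((g[x] * measure[x] for x in B), Fr(0))

def f_mu(B):
    """(f mu)(B), defined as in (6.14)."""
    return integral(f, mu, B)

# Point masses of the measure f mu.
f_mu_masses = {x: f_mu([x]) for x in X}

lhs = integral(h, f_mu_masses)                      # integral of h d(f mu)
rhs = integral({x: h[x] * f[x] for x in X}, mu)     # integral of h f d mu
```

In this example the point 2 is μ–negligible and, accordingly, also negligible for f μ even though f(2) > 0.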
Notice also that, by definition, f μ(B) = 0 whenever μ(B) = 0. We
formalize this relation between measures in the next definition.
Definition 6.13 (Absolute continuity). Let μ, ν be measures in F . We
say that ν is absolutely continuous with respect to μ, and write ν ≪ μ, if
all μ–negligible sets are ν–negligible, i.e.

μ(A) = 0 ⇒ ν(A) = 0.

For finite measures, the absolute continuity property can also be given in
a (seemingly) stronger way, see Exercise 6.14.
The following theorem shows that absolute continuity of ν with respect
to μ is not only necessary, but also sufficient to ensure the representation
ν = f μ.
Theorem 6.14 (Radon–Nikodým). Let μ and ν be finite measures on $(X, \mathscr{F})$ such that ν ≪ μ. Then there exists a unique nonnegative $\rho \in L^1(X, \mathscr{F}, \mu)$ such that

$$\nu(E) = \int_E \rho(x)\, d\mu(x) \qquad \forall E \in \mathscr{F}. \tag{6.16}$$

We are going to show a more general result, whose statement needs two
more definitions. We say that a measure μ is concentrated on a F –
measurable set A if μ(X \ A) = 0. For instance, the Dirac measure δa
is concentrated on {a}, and the Lebesgue measure in R is concentrated
on the irrational numbers, and f μ is concentrated (whatever μ is) on
{ f ≠ 0}.
Definition 6.15 (Singular measures). Let μ, ν be measures in (X, F ).
We say that μ is singular with respect to ν, and write μ ⊥ ν, if there exist
disjoint F –measurable sets A, B such that μ is concentrated on A and
ν is concentrated on B.
The relation of singularity, as stated, is clearly symmetric. However, it
can also be stated in a (seemingly) asymmetric way, by saying that μ ⊥ ν
if μ is concentrated on a ν–negligible set A (just take B = Ac to see the
equivalence with the previous definition).
Example 6.16. Let X = R, F = B (R), μ the Lebesgue measure on
(X, F ) and ν = δx0 the Dirac measure at x0 ∈ R. Then μ is concentrated
on A := R \ {x0 }, whereas ν is concentrated on B := {x0 }. So, μ and ν
are singular.
Theorem 6.17 (Lebesgue). Let μ and ν be measures on (X, F ), with μ
σ –finite and ν finite. Then the following assertions hold.
(i) There exist two unique finite measures νa and νs on (X, F ) such that

ν = νa + νs , νa ≪ μ, νs ⊥ μ. (6.17)

(ii) There exists a unique ρ ∈ L 1 (X, F , μ) such that νa = ρμ.


(6.17) is called the Lebesgue decomposition of ν with respect to μ. The function ρ in (ii) is called the density of ν with respect to μ and it is sometimes denoted by

$$\rho := \frac{d\nu}{d\mu}.$$

The Radon–Nikodým theorem follows from the Lebesgue theorem by noticing that, when ν ≪ μ, the uniqueness of the decomposition gives $\nu_a = \nu$ and $\nu_s = 0$, so that $\nu = \nu_a = \rho\mu$.
Proof of Theorem 6.17. We assume first that also μ is finite. Set λ = μ + ν and notice that, obviously, μ ≪ λ and ν ≪ λ. Define a linear functional $F$ on $L^2(X, \mathscr{F}, \lambda)$ setting

$$F(\varphi) := \int_X \varphi(x)\, d\nu(x), \qquad \varphi \in L^2(X, \mathscr{F}, \lambda).$$

The functional $F$ is well defined and bounded (and consequently continuous) since, in view of the Hölder inequality, we have
$$|F(\varphi)| \le \int_X |\varphi(x)|\, d\nu(x) \le \int_X |\varphi(x)|\, d\lambda(x) \le [\lambda(X)]^{1/2}\, \|\varphi\|_{L^2(X, \mathscr{F}, \lambda)}.$$

Now, thanks to the Riesz theorem, there exists a unique function $f \in L^2(X, \mathscr{F}, \lambda)$ such that
$$\int_X \varphi(x)\, d\nu(x) = \int_X f(x)\varphi(x)\, d\lambda(x) \qquad \forall \varphi \in L^2(X, \mathscr{F}, \lambda). \tag{6.18}$$

Setting $\varphi = 1_E$, with $E \in \mathscr{F}$, yields
$$\nu(E) = \int_E f(x)\, d\lambda(x) \ge 0,$$

which implies, by the arbitrariness of $E$, $f(x) \ge 0$ λ–a.e. and, in particular, both μ–a.e. and ν–a.e. In the sequel we shall assume, possibly modifying $f$ in a λ–negligible set, and preserving the validity of (6.18), that $f \ge 0$ everywhere. By (6.18) it follows
$$\int_X \varphi(x)(1 - f(x))\, d\nu(x) = \int_X f(x)\varphi(x)\, d\mu(x) \qquad \forall \varphi \in L^2(X, \mathscr{F}, \lambda). \tag{6.19}$$
Setting $\varphi = 1_E$, with $E \in \mathscr{F}$, yields
$$\int_E (1 - f(x))\, d\nu(x) = \int_E f(x)\, d\mu(x) \ge 0$$

because $f \ge 0$. Thus, $E$ being arbitrary, we obtain that $f(x) \le 1$ for ν–a.e. $x \in X$. Set now

A := {x ∈ X : 0 ≤ f (x) < 1}, B := {x ∈ X : f (x) ≥ 1},

so that (A, B) is a F –measurable partition of X, and

νa (E) := ν(E ∩ A), νs (E) := ν(E ∩ B) ∀E ∈ F ,

so that $\nu_a = 1_A \nu$ is concentrated on $A$, $\nu_s = 1_B \nu$ is concentrated on $B$ and $\nu = \nu_a + \nu_s$.
Then, setting in (6.19) $\varphi = 1_B$, we see that
$$\mu(B) \le \int_B f\, d\mu = \int_B (1 - f)\, d\nu = 0$$

because $f = 1$ ν–a.e. on $B$. It follows that $\nu_s$ is singular with respect to μ.
We now show the existence of ρ such that $\nu_a = \rho\mu$. Heuristically, this can be obtained choosing in (6.19) the function $\varphi = (1 - f)^{-1} 1_{E \cap A}$, but since this function need not be in $L^2(X, \mathscr{F}, \lambda)$ we argue by approximation: set in (6.19)

$$\varphi(x) = (1 + f(x) + \cdots + f^n(x))\, 1_{E \cap A}(x)$$

where $n \ge 1$ and $E \in \mathscr{F}$. Then we obtain
$$\int_{E \cap A} (1 - f^{n+1}(x))\, d\nu(x) = \int_{E \cap A} [f(x) + f^2(x) + \cdots + f^{n+1}(x)]\, d\mu(x).$$

Set $\rho(x) = 0$ for $x \in B$ and
$$\rho(x) := \lim_{n \to \infty} [f(x) + f^2(x) + \cdots + f^{n+1}(x)] = \frac{f(x)}{1 - f(x)}, \qquad x \in A.$$

Then, by the monotone convergence theorem it follows that
$$\nu_a(E) = \nu(E \cap A) = \int_{E \cap A} \rho(x)\, d\mu(x) = \int_E \rho(x)\, d\mu(x).$$

Setting $E = X$ we see that $\rho \in L^1(X, \mathscr{F}, \mu)$, and the arbitrariness of $E$ gives that $\nu_a = \rho\mu$.
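
The construction in the finite case can be traced step by step on a finite set, where the function f of (6.18) is simply the ratio of point masses ν({x})/λ({x}). The following sketch, on a hypothetical four-point space of our choosing, builds f, the partition (A, B), the density ρ = f/(1 − f) and the two parts of ν; the helper name `lebesgue_decomposition` is ours.

```python
from fractions import Fraction as Fr

def lebesgue_decomposition(mu, nu):
    """Follow the proof on a finite set; mu, nu map points to point masses."""
    lam = {x: mu[x] + nu[x] for x in mu}                       # lambda = mu + nu
    # f is the density of nu with respect to lambda (set to 0 where lambda = 0).
    f = {x: (nu[x] / lam[x] if lam[x] > 0 else Fr(0)) for x in mu}
    A = {x for x in mu if f[x] < 1}                            # A = {0 <= f < 1}
    B = set(mu) - A                                            # B = {f >= 1}
    rho = {x: (f[x] / (1 - f[x]) if x in A else Fr(0)) for x in mu}
    nu_a = {x: (nu[x] if x in A else Fr(0)) for x in mu}       # nu restricted to A
    nu_s = {x: (nu[x] if x in B else Fr(0)) for x in mu}       # nu restricted to B
    return rho, nu_a, nu_s, A, B

# Hypothetical data: mu charges a and b, nu charges a and c.
mu = {"a": Fr(1), "b": Fr(2), "c": Fr(0), "d": Fr(0)}
nu = {"a": Fr(3), "b": Fr(0), "c": Fr(5), "d": Fr(0)}
rho, nu_a, nu_s, A, B = lebesgue_decomposition(mu, nu)
```

Here B = {c} is μ–negligible and carries the singular part, while on A the density reduces to the quotient of the point masses, ρ(a) = 3.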
Now we consider the case when μ is σ–finite. In this case there exists a sequence of pairwise disjoint sets $(X_n) \subset \mathscr{F}$ such that
$$X = \bigcup_{n=0}^{\infty} X_n \qquad \text{with } \mu(X_n) < \infty.$$

Let us apply the case of Theorem 6.17 proved above to the finite measures $\mu_n = 1_{X_n}\mu$, $\nu_n = 1_{X_n}\nu$. For any $n \in \mathbb{N}$ let $\nu_n = (\nu_n)_a + (\nu_n)_s = \rho_n \mu_n + (\nu_n)_s$ be the Lebesgue decomposition of $\nu_n$ with respect to $\mu_n$. Now, set
$$\nu_a := \sum_{n=0}^{\infty} (\nu_n)_a, \qquad \nu_s := \sum_{n=0}^{\infty} (\nu_n)_s, \qquad \rho := \sum_{n=0}^{\infty} \rho_n 1_{X_n}.$$

Since
$$\sum_{n=0}^{k} (\nu_n)_a + \sum_{n=0}^{k} (\nu_n)_s = \sum_{n=0}^{k} \nu_n = 1_{\bigcup_{n=0}^{k} X_n}\, \nu,$$
we can pass to the limit as $k \to \infty$ to obtain that $\nu_a$ and $\nu_s$ are finite measures, and $\nu = \nu_a + \nu_s$. Moreover, for any $E \in \mathscr{F}$ we have, using the monotone convergence theorem,
$$\nu_a(E) = \sum_{n=0}^{\infty} (\nu_n)_a(E) = \sum_{n=0}^{\infty} \int_E \rho_n(x)\, d\mu_n(x) = \int_E \sum_{n=0}^{\infty} \rho_n(x) 1_{X_n}(x)\, d\mu(x) = \int_E \rho(x)\, d\mu(x).$$

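So, $\nu_a \ll \mu$, and setting $E = X$ we see that ρ is integrable with respect to μ. Finally, it is easy to see that $\nu_s \perp \mu$, because if we denote by $B_n \in \mathscr{F}$ μ–negligible sets where $(\nu_n)_s$ are concentrated, we have that $\nu_s$ is concentrated on the μ–negligible set $\bigcup_n B_n$.
Finally, let us prove the uniqueness of $\nu_a$ and $\nu_s$: assume that

$$\nu = \nu_a + \nu_s = \nu_a' + \nu_s'$$

and let $B$, $B'$ be μ–negligible sets where $\nu_s$ and $\nu_s'$ are respectively concentrated. Then, as $B \cup B'$ is μ–negligible and both $\nu_s$ and $\nu_s'$ are concentrated on $B \cup B'$, for any set $E \in \mathscr{F}$ we have

$$\nu_s(E) = \nu_s(E \cap (B \cup B')) = \nu(E \cap (B \cup B')) = \nu_s'(E \cap (B \cup B')) = \nu_s'(E),$$

where the two middle equalities hold because $\nu_a$ and $\nu_a'$, being absolutely continuous with respect to μ, vanish on the μ–negligible set $B \cup B'$. It follows that $\nu_s = \nu_s'$ and therefore $\nu_a = \nu_a'$.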


The interested reader can have a look at a different proof of Theorem 6.17 independent of Hilbert space theory, and based on three auxiliary variational principles; it turns out that the density $f$ of $\nu_a$ is the maximizer in the problem
$$\sup \left\{ \int_X f\, d\mu : f\mu \le \nu \right\}. \tag{6.20}$$

See Exercise 6.17 and Exercise 6.18 for more details.


Remark 6.18. If μ is not σ–finite then the Lebesgue decomposition does not hold in general. Consider for instance the case when $X = [0, 1]$, $\mathscr{F} = \mathscr{B}([0, 1])$, μ is the counting measure and $\nu = \mathscr{L}^1$. Then ν ≪ μ (as the only μ–negligible set is the empty set) but there is no $\rho : [0, 1] \to [0, \infty]$ satisfying
$$\nu(E) = \int_E \rho\, d\mu.$$
Indeed, such a function would be μ–integrable and therefore nonzero only on an at most countable set; ν would then be concentrated on a countable set, which is impossible because countable sets are $\mathscr{L}^1$–negligible.

Exercises
6.11 Show that a F –measurable function h is f μ–integrable if and only if f h
is μ–integrable.
6.12 Show that ( f μ)∨(gμ) = ( f ∨g)μ and ( f μ)∧(gμ) = ( f ∧g)μ whenever
f, g ∈ L 1 (X, F , μ) are nonnegative.
6.13 Let $\{\mu_i\}_{i \in I}$ be a family of measures in $(X, \mathscr{F})$. Show that
$$\mu(B) := \inf \left\{ \sum_{k=0}^{\infty} \mu_{i(k)}(B_k) : i : \mathbb{N} \to I,\ (B_k) \text{ countable } \mathscr{F}\text{–measurable partition of } B \right\}$$

is the greatest lower bound of the family $\{\mu_i\}_{i \in I}$, i.e. $\mu \le \mu_i$ for all $i \in I$ and it is the largest measure with this property. Show also that
$$\mu(B) := \sup \left\{ \sum_{k=0}^{\infty} \mu_{i(k)}(B_k) : i : \mathbb{N} \to I,\ (B_k) \text{ countable } \mathscr{F}\text{–measurable partition of } B \right\}$$

is the smallest upper bound of the family $\{\mu_i\}_{i \in I}$, i.e. $\mu \ge \mu_i$ for all $i \in I$ and it is the smallest measure with this property.
6.14 Let μ, ν be measures in (X, F ) with ν finite. Then ν ≪ μ if and only if
for all ε > 0 there exists δ > 0 such that

A ∈ F , μ(A) < δ ⇒ ν(A) < ε.

6.15 Assume that ν ≪ μ and that ν ⊥ μ. Show that ν = 0.


6.16 Assume that σ ≤ μ + ν and that σ ⊥ ν. Show that σ ≤ μ.
6.17 Prove Theorem 6.14 in the following two steps:
(1) Show that a maximizer f in (6.20) exists.
(2) Setting σ = ν − f μ ≥ 0, σ satisfies

t > 0, B ∈ F, t1 B μ ≤ σ ⇒ μ(B) = 0. (6.21)

Then, apply Exercise 6.18 to conclude that σ ⊥ μ.


6.18 Let μ, σ be nonnegative finite measures satisfying (6.21). Show that σ ⊥ μ. Hint: first show that

$$\inf \left\{ \mu(A) : A \in \mathscr{F},\ \sigma \text{ is concentrated on } A \right\}$$

has a solution $A$. Assuming by contradiction that $\mu(A) > 0$ (otherwise we are done), show that

$$\mathscr{F} \ni B \subseteq A,\ \mu(B) > 0 \ \Rightarrow\ \sigma(B) > 0. \tag{6.22}$$

Then, show that the numbers
$$\xi_h := \sup \left\{ \mu(B) : \mathscr{F} \ni B \subseteq A,\ 1_B \mu \ge 2^h 1_B \sigma \right\}$$

are infinitesimal as $h \to \infty$, that the supremum is attained at $B_h$, and that

$$\mu(C) \le 2^h \sigma(C) \qquad \text{for all sets } C \subset A \setminus B_h. \tag{6.23}$$

Finally choose $t = 2^{-h}$, with $h$ sufficiently large so that $\xi_h < \mu(A)$ and $B = A \setminus B_h$, to get a contradiction with (6.21).

6.5. Signed measures


Let (X, F ) be a measurable space. In this section we see how the concept
of measure, still viewed as a set function, can be extended dropping the
nonnegativity assumption on $A \mapsto \mu(A)$.
We recall that a sequence $(E_i) \subset \mathscr{F}$ of pairwise disjoint sets such that $\bigcup_{i=0}^{\infty} E_i = E$ is called a countable $\mathscr{F}$–measurable partition of $E$.
Definition 6.19 (Signed measures and total variation). A signed measure μ in $(X, \mathscr{F})$ is a map $\mu : \mathscr{F} \to \mathbb{R}$ such that
$$\mu(E) = \sum_{i=0}^{\infty} \mu(E_i)$$

for all countable $\mathscr{F}$–measurable partitions $(E_i)$ of $E$.


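Notice that the series above is absolutely convergent by the arbitrariness of $(E_i)$: indeed, if $\sigma : \mathbb{N} \to \mathbb{N}$ is a permutation, then $(E_{\sigma(i)})$ is still a partition of $E$, hence
$$\sum_{i=0}^{\infty} \mu(E_i) = \sum_{i=0}^{\infty} \mu(E_{\sigma(i)}).$$

Since a real series all of whose rearrangements converge is absolutely convergent, this implies that the series is absolutely convergent.

Let μ be a signed measure. Then we define the total variation $|\mu|$ of μ as follows:
$$|\mu|(E) = \sup \left\{ \sum_{i=0}^{\infty} |\mu(E_i)| : (E_i)\ \mathscr{F}\text{–measurable partition of } E \right\}, \qquad E \in \mathscr{F}.$$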

Proposition 6.20. Let μ be a signed measure and let |μ| be its total vari-
ation. Then |μ| is a finite measure on (X, F ).
Proof. It is immediate to check that $|\mu|$ is a nondecreasing set function.

Step 1. If $A, B \in \mathscr{F}$ are disjoint, we have

$$|\mu|(A \cup B) = |\mu|(A) + |\mu|(B).$$

Indeed, let $E = A \cup B$ and let $(E_j)$ be a countable $\mathscr{F}$–measurable partition of $E$. Set

$$A_j = A \cap E_j, \qquad B_j = B \cap E_j, \qquad j \in \mathbb{N}.$$

Then $(A_j)$ is a countable $\mathscr{F}$–measurable partition of $A$ and $(B_j)$ a countable $\mathscr{F}$–measurable partition of $B$ and we have $E_j = A_j \cup B_j$. Moreover,
$$\sum_{j=0}^{\infty} |\mu(E_j)| \le \sum_{j=0}^{\infty} |\mu(A_j)| + \sum_{j=0}^{\infty} |\mu(B_j)| \le |\mu|(A) + |\mu|(B),$$

which yields $|\mu|(A \cup B) \le |\mu|(A) + |\mu|(B)$.


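Let us prove the converse inequality, assuming with no loss of generality that $|\mu|(A \cup B) < \infty$. Since both $|\mu|(A)$ and $|\mu|(B)$ are finite, for any $\varepsilon > 0$ there exist countable $\mathscr{F}$–measurable partitions $(A_k^\varepsilon)$ of $A$ and $(B_k^\varepsilon)$ of $B$ such that
$$\sum_{k=0}^{\infty} |\mu(A_k^\varepsilon)| \ge |\mu|(A) - \frac{\varepsilon}{2}, \qquad \sum_{k=0}^{\infty} |\mu(B_k^\varepsilon)| \ge |\mu|(B) - \frac{\varepsilon}{2}.$$

Since $(A_k^\varepsilon, B_k^\varepsilon)$ is a countable $\mathscr{F}$–measurable partition of $A \cup B$, we have that
$$|\mu|(A \cup B) \ge \sum_{k=0}^{\infty} \left( |\mu(A_k^\varepsilon)| + |\mu(B_k^\varepsilon)| \right) \ge |\mu|(A) + |\mu|(B) - \varepsilon.$$

By the arbitrariness of ε we have $|\mu|(A \cup B) \ge |\mu|(A) + |\mu|(B)$.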


Step 2. $|\mu|$ is σ–additive. Since $|\mu|$ is additive by Step 1, it is enough to show that $|\mu|$ is σ–subadditive, i.e. $|\mu|(A) \le \sum_{i=0}^{\infty} |\mu|(A_i)$ whenever $(A_i) \subset \mathscr{F}$ is a partition of $A$. This can be proved arguing as in the first part of Step 1, i.e. building from a partition $(E_j)$ of $A$ partitions $(E_j \cap A_i)$ of all sets $A_i$.
Step 3. $|\mu|(X) < \infty$. Assume by contradiction that $|\mu|(X) = \infty$. Then we claim that

$$\text{there exists a partition } X = A \cup B \text{ such that } |\mu(A)| \ge 1 \text{ and } |\mu|(B) = \infty. \tag{6.24}$$

By the claim the conclusion follows since we can use it to construct by recurrence (replacing $X$ with $B$ and so on), a disjoint sequence $(A_n) \subset \mathscr{F}$ such that $|\mu(A_n)| \ge 1$. Assume, to fix the ideas, that $\mu(A_n) \ge 1$ for infinitely many $n$, and denote by $E$ the union of these sets: then, the σ–additivity of μ forces $\mu(E) = +\infty$, a contradiction. Analogously, if $\mu(A_n) \le -1$ for infinitely many $n$, we find a set $E$ such that $\mu(E) = -\infty$.
Let us prove (6.24). By the assumption $|\mu|(X) = \infty$ there exists a partition $(X_n)$ of $X$ such that
$$\sum_{n=0}^{\infty} |\mu(X_n)| > 2(1 + |\mu(X)|).$$

Then either the sum of those $\mu(X_n)$ which are nonnegative or the absolute value of the sum of those $\mu(X_n)$ which are nonpositive is greater than $1 + |\mu(X)|$. To fix the ideas, assume that for a subsequence $(X_{n(k)})$ we have $\mu(X_{n(k)}) \ge 0$ and
$$\sum_{k=0}^{\infty} \mu(X_{n(k)}) > 1 + |\mu(X)|.$$
Set $A = \bigcup_{k=0}^{\infty} X_{n(k)}$ and $B = A^c$. Then we have $|\mu(A)| > 1 + |\mu(X)|$ and

$$|\mu(B)| = |\mu(X) - \mu(A)| \ge |\mu(A)| - |\mu(X)| > 1.$$

Since
|μ|(X) = |μ|(A) + |μ|(B) = ∞,
either |μ|(B) = +∞ or |μ|(A) = +∞. In the first case we are done, in
the second one we exchange A and B. So, the claim is proved and the
proof is complete.

Let μ be a signed measure on $(X, \mathscr{F})$. We define

$$\mu^+ := \frac{1}{2}(|\mu| + \mu), \qquad \mu^- := \frac{1}{2}(|\mu| - \mu),$$
so that
$$\mu = \mu^+ - \mu^- \qquad \text{and} \qquad |\mu| = \mu^+ + \mu^-. \tag{6.25}$$
The measure $\mu^+$ (respectively $\mu^-$) is called the positive part (respectively negative part) of μ and the first equation in (6.25) is called the Jordan representation of μ.

Remark 6.21. It is easy to check that Theorems 6.17 and 6.14 hold when
ν is a signed measure: it suffices to split it into its positive and negative
part, see also Exercise 6.19.
The following theorem proves also that μ+ and μ− are singular, and
provides a canonical representation of μ± as suitable restrictions of ±μ.

Theorem 6.22 (Hahn decomposition). Let μ be a signed measure on $(X, \mathscr{F})$ and let $\mu^+$ and $\mu^-$ be its positive and negative parts. Then there exists a $\mathscr{F}$–measurable partition $(A, B)$ of $X$ such that

$$\mu^+(E) = \mu(A \cap E) \quad \text{and} \quad \mu^-(E) = -\mu(B \cap E) \qquad \forall E \in \mathscr{F}. \tag{6.26}$$

Proof. Let us first notice that $\mu \ll |\mu|$. Thus, by the Radon–Nikodým theorem, there exists $h \in L^1(X, \mathscr{F}, |\mu|)$ such that

$$\mu(E) = \int_E h\, d|\mu| \qquad \forall E \in \mathscr{F}. \tag{6.27}$$

Let us prove that $|h(x)| = 1$ for $|\mu|$–a.e. $x \in X$. Indeed, set

$$E_1 := \{x \in X : h(x) > 1\}, \qquad F_1 := \{x \in X : h(x) < -1\}.$$

We first show that $|\mu|(E_1) = |\mu|(F_1) = 0$. Since we have
$$|\mu|(E_1) \ge \mu(E_1) = \int_{E_1} h\, d|\mu| \ge |\mu|(E_1),$$

and the second inequality is strict if $|\mu|(E_1) > 0$, we have that $|\mu|(E_1) = 0$. In a similar way one can prove that $|\mu|(F_1) = 0$, so that $|h| \le 1$ $|\mu|$–a.e. in $X$. Now, let $r \in (0, 1)$ and set

$$G_r := \{x \in X : |h(x)| < r\}.$$

Let $(G_{r,k})$ be a countable $\mathscr{F}$–measurable partition of $G_r$. Then we have
$$|\mu(G_{r,k})| = \left| \int_{G_{r,k}} h\, d|\mu| \right| \le \int_{G_{r,k}} |h|\, d|\mu| \le r\, |\mu|(G_{r,k}).$$

Therefore
$$\sum_{k=0}^{\infty} |\mu(G_{r,k})| \le r\, |\mu|(G_r),$$

which yields, by the arbitrariness of the partition of $G_r$, $|\mu|(G_r) \le r\, |\mu|(G_r)$. Thus $|\mu|(G_r) = 0$ and letting $r \uparrow 1$ we obtain that $|\mu|(\{|h| < 1\}) = 0$. Hence, possibly modifying $h$ in a $|\mu|$–negligible set, we can assume with no loss of generality that $h$ takes its values in $\{-1, 1\}$.
Now, to conclude the proof, we set
$$A := \{x \in X : h(x) = 1\}, \qquad B := \{x \in X : h(x) = -1\}.$$
Then for any $E \in \mathscr{F}$ we have
$$\mu^+(E) = \frac{1}{2}(|\mu|(E) + \mu(E)) = \frac{1}{2} \int_E (1 + h)\, d|\mu| = \int_{E \cap A} h\, d|\mu| = \mu(E \cap A),$$
and
$$\mu^-(E) = \frac{1}{2}(|\mu|(E) - \mu(E)) = \frac{1}{2} \int_E (1 - h)\, d|\mu| = -\int_{E \cap B} h\, d|\mu| = -\mu(E \cap B).$$

Exercises
6.19 Using the decomposition of ν in positive and negative part, show that Le-
besgue decomposition is still possible when μ is σ –finite and ν is a signed meas-
ure. Using the Hahn decomposition extend this result to the case when even μ
is a signed measure. Are these decompositions unique?
6.20 Show that | f μ| = | f |μ for any f ∈ L 1 (X, E , μ).

6.6. Measures in R
In this section we establish a 1-1 correspondence between finite Borel
measures in R and a suitable class of nondecreasing functions. In one
direction this correspondence is elementary, and based on the concept of
repartition function.
Given a finite measure μ in $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$, we call repartition function of μ the function $F : \mathbb{R} \to [0, +\infty)$ defined by
$$F(x) := \mu((-\infty, x]), \qquad x \in \mathbb{R}.$$
Notice that obviously (1) $F$ is nondecreasing, right continuous, and satisfies
$$\lim_{x \to -\infty} F(x) = 0, \qquad \lim_{x \to +\infty} F(x) \in [0, +\infty). \tag{6.28}$$
Moreover, $F$ is continuous at $x$ if and only if $x$ is not an atom of μ.

(1) The arguments are similar to those used in Section 2.4.2, in connection with the properties of the function $t \mapsto \mu(\{\varphi > t\})$.

The following result shows that this list of properties characterizes the
functions that are repartition functions of some finite measure μ; in addi-
tion the measure is uniquely determined by its repartition function.
Theorem 6.23. Let F : R → [0, +∞) be a nondecreasing and right
continuous function satisfying (6.28). Then there exists a unique finite
measure μ in (R, B (R)) such that F is the repartition function of μ.
Proof. The proof follows the same lines of the construction of the Le-
besgue measure in Section 1.6, with a simplification due to the fact that
we can also consider unbounded intervals (because we are dealing with
finite measures). We set
I := {(a, b] : a ∈ [−∞, +∞), b ∈ R, a < b}
and denote by A the ring generated by I : it consists, as it can be easily
checked, of all finite disjoint unions of intervals in I . We define, with
the convention F(−∞) = 0,
μ((a, b]) := F(b) − F(a) ∀(a, b] ∈ I . (6.29)
This definition is justified by the fact that, if μ were a measure and $F$ were its repartition function, (6.29) would be valid, because $(a, b] = (-\infty, b] \setminus (-\infty, a]$. Then we extend μ to $\mathscr{A}$ with the same mechanism used in the proof of Theorem 1.21, and check that μ is additive on $\mathscr{A}$. Also, the same argument used in that proof shows that μ is even σ–additive: in order to prove that $\mu(F) = \sum_i \mu(F_i)$ whenever $F$ and all $F_i$ belong to $\mathscr{A}$ one first reduces to the case when $F = (a, b]$ belongs to $\mathscr{I}$; then, one enlarges $F_i$ to $F_i' \in \mathscr{A}$ with $\mu(F_i') < \mu(F_i) + \delta 2^{-i}$ and, using the fact that all intervals $[a', b]$ with $a' > a$ are contained in a finite union of the sets $F_i'$, obtains
$$\mu((a', b]) \le \sum_{i=0}^{\infty} \mu(F_i') \le 2\delta + \sum_{i=0}^{\infty} \mu(F_i).$$

Letting first $\delta \downarrow 0$ and then $a' \downarrow a$ we obtain the σ–subadditivity property $\mu(F) \le \sum_i \mu(F_i)$, and the opposite inequality follows by monotonicity.
By the Carathéodory theorem μ has a unique extension, that we still
denote by μ, to B (R) = σ (A ). Setting a = −∞ and letting b tend to
+∞ in the identity (6.29) we obtain that μ(R) = F(+∞) ∈ R. From
(6.29) with a = −∞ we obtain that the repartition function of μ is F.
Given a nondecreasing and right continuous function $F$ satisfying (6.28), the Stieltjes integral
$$\int_{\mathbb{R}} f\, dF$$
is defined as $\int_{\mathbb{R}} f\, d\mu_F$, where $\mu_F$ is the finite measure built in the previous theorem. The notation $dF$ is justified by the fact that, when $f = \sum_i z_i 1_{(a_i, b_i]}$, we have (by the very definition of $\mu_F$)
$$\int_{\mathbb{R}} f\, dF = \int_{\mathbb{R}} f\, d\mu_F = \sum_i z_i (F(b_i) - F(a_i)).$$

This approximation of the Stieltjes integral will play a role in the proof of Theorem 6.28.
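
The formula for step functions can be tested on a purely atomic measure, for which both the repartition function and the integral are finite sums. In the sketch below the atoms, their weights and the step function are hypothetical choices of ours; the Stieltjes sum is compared with the integral computed atom by atom.

```python
from fractions import Fraction as Fr

# Hypothetical purely atomic finite measure: atoms[x] = mu({x}).
atoms = {Fr(0): Fr(1, 2), Fr(1): Fr(1, 4), Fr(3): Fr(1, 4)}

def F(x):
    """Repartition function F(x) = mu((-inf, x])."""
    return sum((w for a, w in atoms.items() if a <= x), Fr(0))

def stieltjes_step(pieces):
    """Sum of z * (F(b) - F(a)) for f = sum of z * 1_{(a, b]}; pieces = [(z, a, b)]."""
    return sum((z * (F(b) - F(a)) for z, a, b in pieces), Fr(0))

def integral_over_atoms(pieces):
    """Integral of the same step function, computed as sum of f(x) * mu({x})."""
    total = Fr(0)
    for x, w in atoms.items():
        fx = sum((z for z, a, b in pieces if a < x <= b), Fr(0))
        total += fx * w
    return total

# f = 2 on (-1, 0], 5 on (0, 2], -1 on (2, 3].
pieces = [(Fr(2), Fr(-1), Fr(0)), (Fr(5), Fr(0), Fr(2)), (Fr(-1), Fr(2), Fr(3))]
```

Note that the half-open intervals (a, b] matter: the atom at 0 is counted by the first piece and not by the second, in agreement with the right continuity of F.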

6.7. Convergence of measures on R


In this section we study a notion of convergence for measures on the
real line that is quite useful, both from the analytic and the probabilistic
viewpoints.
Definition 6.24 (Weak convergence). Let (μh ) be a sequence of finite
measures on R. We say that (μh ) weakly converges to a finite measure μ
on R if the repartition functions Fh of μh are pointwise converging to the
repartition function F of μ on a co-countable set, i.e. if

$$\lim_{h \to \infty} \mu_h((-\infty, x]) = \mu((-\infty, x]) \qquad \text{with at most countably many exceptions.} \tag{6.30}$$
Since the repartition function is right continuous, it is uniquely determined by (6.30). Then, since the measure is uniquely determined by its repartition function, we obtain that the weak limit, if it exists, is unique.
The following fundamental example shows why we admit at most count-
ably many exceptions in the convergence of the repartition functions.
Example 6.25. [Convergence to the Dirac mass] Let $\rho \in C^\infty(\mathbb{R})$ be a nonnegative function such that $\int_{\mathbb{R}} \rho\, dx = 1$ (an important example is the Gauss function $(2\pi)^{-1/2} e^{-x^2/2}$). We consider the rescaled functions $\rho_h(x) = h\rho(hx)$ and the induced measures $\mu_h = \rho_h \mathscr{L}^1$, all probability measures. Then, it is immediate to check that $\mu_h$ weakly converge to $\delta_0$: for $x > 0$ we have indeed
$$\mu_h((-\infty, x]) = \int_{-\infty}^{x} \rho_h(y)\, dy = \int_{-\infty}^{hx} \rho(y)\, dy \to 1$$

because $hx \to +\infty$ as $h \to +\infty$. An analogous argument shows that $\mu_h((-\infty, x]) \to 0$ for any $x < 0$. If ρ is even, at $x = 0$ we don't have pointwise convergence of the repartition functions: all the repartition functions $F_h$ satisfy $F_h(0) = 1/2$, while $F(0) = 1$.
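
For the Gauss function the repartition functions in this example are available in closed form through the error function, so the behaviour at x > 0, x < 0 and x = 0 can be observed numerically. The sketch below takes ρ to be the standard Gaussian density (one admissible choice, made by us) and uses the standard identity expressing its distribution function via `math.erf`.

```python
import math

def gauss_cdf(t):
    """Integral of the Gauss function (2*pi)^(-1/2) * exp(-y^2/2) over (-inf, t]."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def F_h(h, x):
    """Repartition function of mu_h = rho_h L^1, with rho_h(x) = h * rho(h * x)."""
    return gauss_cdf(h * x)

# Values at a point to the right of 0, to the left of 0, and at 0 itself.
right = [F_h(h, 0.01) for h in (1, 10, 100, 1000)]
left = [F_h(h, -0.01) for h in (1, 10, 100, 1000)]
at_zero = [F_h(h, 0.0) for h in (1, 10, 100, 1000)]
```

The lists `right` and `left` approach 1 and 0 respectively, while `at_zero` stays at 1/2; the single exceptional point x = 0 is exactly the atom of the limit measure.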

Weak convergence is a quite flexible tool, because it allows also an opposite behaviour, the approximation of continuous measures (i.e. with no atom) by purely atomic ones, see for instance Exercise 6.21.
From now on we will consider only, for the sake of simplicity, the
case of weak convergence of probability measures. Before stating a com-
pactness theorem for the weak convergence of probability measures, we
introduce the following terminology.
Definition 6.26 (Tightness). We say that a family of probability measures $\{\mu_i\}_{i \in I}$ in $\mathbb{R}$ is tight if for any $\varepsilon > 0$ there exists a bounded closed interval $J \subset \mathbb{R}$ such that

$$\mu_i(\mathbb{R} \setminus J) \le \varepsilon \qquad \forall i \in I.$$

Clearly any finite family of probability measures is tight. One can also check (see Exercise 6.24) that $\{\mu_i\}_{i \in I}$ is tight if and only if

$$\lim_{x \to -\infty} F_i(x) = 0, \qquad \lim_{x \to +\infty} F_i(x) = 1 \qquad \text{uniformly with respect to } i \in I, \tag{6.31}$$
where Fi are the repartition functions of μi . Furthermore, (see Exer-
cise 6.25) any weakly converging sequence is tight. Conversely, we have
the following compactness result for tight sequences:

Theorem 6.27 (Compactness). Let $(\mu_h)$ be a tight sequence of probability measures on $\mathbb{R}$. Then there exists a subsequence $(\mu_{h(k)})$ weakly converging to a probability measure μ.

Proof. We denote by $F_h$ the repartition functions of $\mu_h$. By a diagonal argument we can find a subsequence $(F_{h(k)})$ pointwise converging on $\mathbb{Q}$. We denote by $G$ the pointwise limit, obviously a nondecreasing function. We extend $G$ by monotonicity setting

$$G(x) := \sup \{ G(q) : q \in \mathbb{Q},\ q \le x \}, \qquad x \in \mathbb{R},$$

and let $E$ be the at most countable set of the discontinuity points of $G$.

Let us check that $F_{h(k)}$ is pointwise converging to $G$ on $\mathbb{R} \setminus E$: for $x \notin E$ we have indeed

$$\limsup_{k \to \infty} F_{h(k)}(x) \le \inf_{q \in \mathbb{Q},\ q > x} \limsup_{k \to \infty} F_{h(k)}(q) = \inf_{q \in \mathbb{Q},\ q > x} G(q) = G(x),$$

and analogously

$$\liminf_{k \to \infty} F_{h(k)}(x) \ge \sup_{q \in \mathbb{Q},\ q < x} \liminf_{k \to \infty} F_{h(k)}(q) = \sup_{q \in \mathbb{Q},\ q < x} G(q) = G(x).$$

Since $(\mu_h)$ is tight, we have also

$$\lim_{x \to -\infty} F_h(x) = 0, \qquad \lim_{x \to +\infty} F_h(x) = 1$$

uniformly with respect to $h$, hence $G(-\infty) = 0$ and $G(+\infty) = 1$.

Notice now that the nondecreasing function

$$F(x) := \lim_{y \downarrow x} G(y)$$

is right continuous, and still satisfies $F(-\infty) = 0$ and $F(+\infty) = 1$, therefore (according to Theorem 6.23) $F$ is the repartition function of a probability measure μ. Since $F = G$ on $\mathbb{R} \setminus E$, we have $F_{h(k)} \to F$ pointwise on $\mathbb{R} \setminus E$, and this proves the weak convergence of $\mu_{h(k)}$ to μ.

The following theorem provides a characterization of the weak convergence in terms of convergence of the integrals of continuous and bounded functions.

Theorem 6.28. Let $\mu_h$, μ be probability measures in $\mathbb{R}$. Then $\mu_h$ weakly converge to μ if and only if
$$\lim_{h \to \infty} \int_{\mathbb{R}} g\, d\mu_h = \int_{\mathbb{R}} g\, d\mu \qquad \forall g \in C_b(\mathbb{R}). \tag{6.32}$$

Proof. Assuming that $\mu_h \to \mu$ weakly, we denote by $F_h$ and $F$ the corresponding repartition functions and fix $g \in C_b(\mathbb{R})$. Let $M = \sup |g|$ and $\varepsilon > 0$. By Exercise 6.25 the sequence $(\mu_h)$ is tight, so that we can find $t > 0$ satisfying $\mu_h(\mathbb{R} \setminus (-t, t]) < \varepsilon$ for any $h \in \mathbb{N}$; we may assume (possibly choosing a larger $t$) that also $\mu(\mathbb{R} \setminus (-t, t]) < \varepsilon$ and that both $-t$ and $t$ are points where the repartition functions are converging. Thanks to the uniform continuity of $g$ in $[-t, t]$ we can find $\delta > 0$ such that

$$x, y \in [-t, t],\ |x - y| < \delta \ \Rightarrow\ |g(x) - g(y)| < \varepsilon. \tag{6.33}$$

Hence, we can find points $t_1, \ldots, t_n$ in $[-t, t]$ such that $t_1 = -t$, $t_n = t$, there is convergence of the repartition functions in all points $t_i$, and $t_{i+1} - t_i < \delta$ for $i = 1, \ldots, n - 1$. By (6.33) it follows that $\sup_{(-t, t]} |g - f| < \varepsilon$, where
$$f := \sum_{i=1}^{n-1} g(t_i)\, 1_{(t_i, t_{i+1}]}.$$

Splitting the integrals on $\mathbb{R}$ as the sum of an integral on $(-t, t]$ and an integral on $(-t, t]^c$ we have
$$\left| \int_{\mathbb{R}} g\, d\mu_h - \int_{(-t, t]} f\, d\mu_h \right| \le M\varepsilon + \varepsilon = (M + 1)\varepsilon \qquad \forall h \in \mathbb{N}, \tag{6.34}$$

and analogously
$$\left| \int_{\mathbb{R}} g\, d\mu - \int_{(-t, t]} f\, d\mu \right| \le M\varepsilon + \varepsilon = (M + 1)\varepsilon. \tag{6.35}$$

Since
$$\int_{(-t, t]} f\, d\mu_h = \sum_{i=1}^{n-1} g(t_i)\, [F_h(t_{i+1}) - F_h(t_i)] \to \sum_{i=1}^{n-1} g(t_i)\, [F(t_{i+1}) - F(t_i)] = \int_{(-t, t]} f\, d\mu,$$

adding and subtracting $\int_{(-t, t]} f\, d\mu_h$ and $\int_{(-t, t]} f\, d\mu$, and using (6.34) and (6.35), we conclude that
$$\limsup_{h \to \infty} \left| \int_{\mathbb{R}} g\, d\mu_h - \int_{\mathbb{R}} g\, d\mu \right| \le 2(M + 1)\varepsilon.$$

Since ε is arbitrary, (6.32) is proved.


Conversely, assume that (6.32) holds. Given $x \in \mathbb{R}$, define the open set $A = (-\infty, x)$; we can easily find $(g_k) \subseteq C_b(\mathbb{R})$ monotonically converging to $1_A$ and deduce from (6.32) the inequality
$$\liminf_{h \to \infty} \mu_h(A) \ge \sup_{k \in \mathbb{N}} \liminf_{h \to \infty} \int_{\mathbb{R}} g_k\, d\mu_h = \sup_{k \in \mathbb{N}} \int_{\mathbb{R}} g_k\, d\mu = \mu(A).$$

Analogously, using a sequence $(g_k) \subseteq C_b(\mathbb{R})$ such that $g_k \downarrow 1_C$, with $C = (-\infty, x]$, we deduce from (6.32) the inequality
$$\limsup_{h \to \infty} \mu_h(C) \le \inf_{k \in \mathbb{N}} \limsup_{h \to \infty} \int_{\mathbb{R}} g_k\, d\mu_h = \inf_{k \in \mathbb{N}} \int_{\mathbb{R}} g_k\, d\mu = \mu(C).$$

Therefore we have convergence of the repartition functions for any $x \in \mathbb{R}$ such that $\mu(A) = \mu(C)$, i.e. for any $x$ that is not an atom of μ. We conclude thanks to Exercise 1.5.
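
Both characterizations of weak convergence can be observed numerically on the atomic measures that put mass 1/h at each of the points i/h, i = 1, …, h, whose expected weak limit is the Lebesgue measure restricted to [0, 1]. In the sketch below the test function g = cos and the sample point x = 0.3 are our own choices; integrals against these measures are plain finite averages.

```python
import math

def integral_mu_h(g, h):
    """Integral of g with respect to mu_h = (1/h) * sum_{i=1}^h delta_{i/h}."""
    return sum(g(i / h) for i in range(1, h + 1)) / h

def F_h(h, x):
    """Repartition function mu_h((-inf, x]): fraction of atoms i/h with i/h <= x."""
    return sum(1 for i in range(1, h + 1) if i / h <= x) / h

g = math.cos                 # a bounded continuous test function (our choice)
limit = math.sin(1.0)        # integral of cos over [0, 1]
errors = [abs(integral_mu_h(g, h) - limit) for h in (10, 100, 1000, 10000)]
cdf_gap = [abs(F_h(h, 0.3) - 0.3) for h in (10, 100, 1000)]
```

The entries of `errors` decay roughly like 1/h, and `cdf_gap` stays below 1/h, consistent with convergence of the integrals in (6.32) and of the repartition functions at the continuity points of the limit.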
Notice that in (6.32) there is no mention of the order structure of $\mathbb{R}$, and only the metric structure (i.e. the space $C_b(\mathbb{R})$) comes into play. In a general context, of probability measures on a metric space $(X, d)$ endowed with the Borel σ–algebra $\mathscr{B}(X)$, we say that $\mu_h$ weakly converge to μ if
$$\lim_{h \to \infty} \int_X g\, d\mu_h = \int_X g\, d\mu \qquad \text{for any function } g \in C_b(X).$$

Exercises
6.21 Show that the probability measures
$$\mu_h := \frac{1}{h} \sum_{i=1}^{h} \delta_{i/h}$$
weakly converge to the probability measure $1_{[0,1]} \mathscr{L}^1$.


6.22 Let Fh : R → R be nondecreasing functions pointwise converging to
a nondecreasing function F : R → R on a dense set D ⊂ R. Show that
Fh (x) → F(x) at all points x where F is continuous.
6.23 Consider all atomic measures of the form
$$\sum_{i=-h^2}^{h^2} a_i\, \delta_{i/h},$$
where $h \in \mathbb{N}$ and $a_{-h^2}, \ldots, a_{h^2} \ge 0$. Show that for any finite Borel measure μ in $\mathbb{R}$ there exists a sequence of measures $(\mu_h)$ of the previous form that weakly converges to μ.
6.24 Show that a family {μi }i∈I of probability measures in R is tight if and only
if (6.31) holds.
6.25 Show that any sequence (μh ) of probability measures weakly convergent
to a probability measure is tight. Hint: if μ is the weak limit and ε > 0 is
given, choose an integer n ≥ 1 such that μ([1 − n, n − 1]) > 1 − ε and points
x ∈ (−n, 1 − n) and y ∈ (n − 1, n) where the repartition functions of μh are
converging to the repartition function of μ.
6.26 We want to extend what was shown in this section from the realm of prob-
ability measures to that of finite measures. Let (μh ), μ be finite positive Borel
measures on R, and let Fh , F be their repartition functions. Consider the fol-
lowing implications:
 
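(a) $\lim_h \int_{\mathbb{R}} g\, d\mu_h = \int_{\mathbb{R}} g\, d\mu$ $\forall g \in C_b(\mathbb{R})$ (that is, (6.32));
(b) $\lim_h \int_{\mathbb{R}} g\, d\mu_h = \int_{\mathbb{R}} g\, d\mu$ $\forall g \in C_c(\mathbb{R})$;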
(c) Fh converge to F at all points where F is continuous;
(d) Fh converge to F on a dense subset of R;
(e) limh μh (R) = μ(R);
(f) (μh ) is tight.
Find an example where (b) holds but (a), (c), (e) do not hold and prove the
following implications: a ⇒ b, e, a ⇒ c, d ⇔ c, b ∧ e ⇒ c, d ∧ e ⇒ f ,
d ∧ f ⇒ e, d ∧ f ⇒ a. As a corollary, if (e) holds (as it happens in the case when
all μh and μ are probability measures) we obtain that a ⇔ b ⇔ c ⇔ d ⇒ f .

6.8. Fourier transform


The Fourier transform is a basic tool in Pure and Applied Mathematics,
Physics and Engineering. Here we just mention a few basic facts, focus-
sing on the use of this transform in Measure Theory and Probability.
Definition 6.29 (Fourier transform of a function). Let $f \in L^1(\mathbb{R}, \mathbb{C})$. We set
$$\hat{f}(\xi) := \int_{\mathbb{R}} f(x) e^{-ix\xi}\, dx \qquad \forall \xi \in \mathbb{R}.$$

The function $\hat{f}$ is called Fourier transform of $f$.

Since the map $\xi \mapsto f(x) e^{-i\xi x}$ is continuous, and bounded by $|f(x)|$, the dominated convergence theorem gives that $\hat{f}(\xi)$ is continuous. The same upper bound also shows that $\hat{f}$ is bounded, and $\sup |\hat{f}| \le \|f\|_1$. More generally, the following result holds:

Theorem 6.30. Let $k \in \mathbb{N}$ be such that $\int_{\mathbb{R}} |x|^k |f|(x)\, dx < \infty$. Then $\hat{f} \in C^k(\mathbb{R}, \mathbb{C})$ and
$$D^p \hat{f}(\xi) = (-i)^p\, \widehat{x^p f}(\xi) \qquad \forall p = 0, \ldots, k.$$
The proof of Theorem 6.30 is a straightforward consequence of the differentiation theorem for integrals depending on a parameter (in this case, the ξ variable, see the Appendix):
$$D_\xi^p \int_{\mathbb{R}} f(x) e^{-ix\xi}\, dx = \int_{\mathbb{R}} D_\xi^p\, f(x) e^{-ix\xi}\, dx = (-i)^p \int_{\mathbb{R}} x^p f(x) e^{-ix\xi}\, dx.$$

According to the previous result, the Fourier transform allows one to transform differentiations (in the ξ variable) into multiplications (in the $x$ variable), thus allowing an algebraic solution of many linear differential equations.
In the sequel we need an explicit expression of the Fourier transform of a Gaussian function. For $\sigma > 0$, let
$$\rho_\sigma(x) := \frac{e^{-|x|^2/(2\sigma^2)}}{(2\pi\sigma^2)^{1/2}} \tag{6.36}$$
be the rescaled Gaussian functions, already considered in Example 6.25. Then
$$\int_{\mathbb{R}} \rho_\sigma(x)e^{-i\xi x}\,dx = e^{-\xi^2\sigma^2/2} \qquad \forall \xi \in \mathbb{R}. \tag{6.37}$$
The proof of this identity is sketched in Exercise 6.27.
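The identity (6.37) can be sanity-checked numerically. The following is only an illustrative sketch, not part of the proof: the helper names `rho` and `fourier_gaussian`, the truncation of the integral to $[-12\sigma, 12\sigma]$ and the midpoint rule are all choices made here for the example.

```python
import cmath
import math

def rho(x, sigma):
    """Rescaled Gaussian density from (6.36)."""
    return math.exp(-x * x / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)

def fourier_gaussian(xi, sigma, cutoff=12.0, steps=20000):
    """Midpoint-rule approximation of the integral of rho(x) * exp(-i*xi*x)
    over the truncated interval [-cutoff*sigma, cutoff*sigma]."""
    a = -cutoff * sigma
    h = 2 * cutoff * sigma / steps
    total = 0j
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += rho(x, sigma) * cmath.exp(-1j * xi * x)
    return total * h

sigma, xi = 0.7, 1.3
approx = fourier_gaussian(xi, sigma)
exact = math.exp(-xi**2 * sigma**2 / 2)   # right-hand side of (6.37)
print(abs(approx - exact))
```

The printed difference should be at the level of rounding errors, since the Gaussian tail beyond twelve standard deviations is negligible.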

Remark 6.31 (Discrete Fourier transform). If $f : \mathbb{R} \to \mathbb{R}$ is a $2T$-periodic function, then we can write the Fourier series (corresponding, up to a linear change of variables, to those considered in Chapter 5 for $2\pi$-periodic functions)
$$f = \sum_{n \in \mathbb{Z}} a_n e^{in\frac{\pi}{T}x} \quad \text{in } L^2((-T, T); \mathbb{C}), \tag{6.38}$$
with
$$a_n = \frac{1}{2T}\int_{-T}^{T} f(x)e^{-in\frac{\pi}{T}x}\,dx, \qquad e^{in\frac{\pi}{T}x} = \cos\Big(n\frac{\pi}{T}x\Big) + i\sin\Big(n\frac{\pi}{T}x\Big). \tag{6.39}$$
Remark 6.32 (Inverse Fourier transform). For $g \in L^1(\mathbb{R}, \mathbb{C})$ we define the inverse Fourier transform of $g$ as the function
$$\tilde g(x) := \frac{1}{2\pi}\int_{\mathbb{R}} g(\xi)e^{ix\xi}\,d\xi, \qquad x \in \mathbb{R}.$$
It can be shown (see for instance Chapter VI.1 in [7]) that the maps $f \mapsto \hat f$ and $g \mapsto \tilde g$ are each the inverse of the other in the so-called Schwartz space $\mathcal{S}(\mathbb{R}, \mathbb{C})$ of smooth functions rapidly decreasing at infinity:
$$\mathcal{S}(\mathbb{R}, \mathbb{C}) := \Big\{ f \in C^\infty(\mathbb{R}, \mathbb{C}) : \lim_{|x| \to \infty} |x|^k |D^i f|(x) = 0 \ \ \forall k, i \in \mathbb{N} \Big\}.$$

In particular we have
$$f(x) = 2\pi\, \widetilde{\Big(\frac{\hat f}{2\pi}\Big)}(x) = \int_{\mathbb{R}} a_\xi e^{ix\xi}\,d\xi \qquad \text{with} \qquad a_\xi := \frac{1}{2\pi}\int_{\mathbb{R}} f(x)e^{-i\xi x}\,dx.$$
These formulas can be viewed as the continuous counterpart of the discrete Fourier transform (6.38), (6.39). In this sense, the $a_\xi$ are generalized Fourier coefficients, corresponding to the "frequency" $\xi$. The difference with Fourier series is that any frequency is allowed, not only the integer multiples $n\pi/T$ of a given one.

6.8.1. Fourier transform of a measure


In this section we are concerned in particular with the concept of Fourier
transform of a measure.

Definition 6.33 (Fourier transform of a measure). Let $\mu$ be a finite measure on $\mathbb{R}$. We set
$$\hat\mu(\xi) := \int_{\mathbb{R}} e^{-ix\xi}\,d\mu(x) \qquad \forall \xi \in \mathbb{R}.$$
The function $\hat\mu : \mathbb{R} \to \mathbb{C}$ is called the Fourier transform of $\mu$.
Notice that Definition 6.29 is consistent with Definition 6.33, because $\hat\mu = \hat f$ whenever $\mu = f\mathscr{L}^1$. Notice also that, by the dominated convergence theorem, the function $\hat\mu$ is continuous. Moreover $\hat\mu(0) = \mu(\mathbb{R})$ and, by estimating from above the modulus of the integral with the integral of the modulus (see also Exercise 6.29), we obtain that $|\hat\mu(\xi)| \le \mu(\mathbb{R})$ for all $\xi \in \mathbb{R}$. Still using the differentiation theorems under the integral sign, one can check that for $k \in \mathbb{N}$ the following implication holds:
$$\int_{\mathbb{R}} |x|^k\,d\mu(x) < \infty \ \Longrightarrow\ \hat\mu \in C^k(\mathbb{R}, \mathbb{C}) \ \text{ and } \ D^p\hat\mu(\xi) = (-i)^p\,\widehat{x^p\mu}(\xi) \quad \forall p = 0, \dots, k. \tag{6.40}$$

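Let us see other basic examples of Fourier transforms of probability measures:
Example 6.34. (1) If $\mu = \delta_{x_0}$ then $\hat\mu(\xi) = e^{-ix_0\xi}$.
(2) If $\mu = p\delta_1 + q\delta_0$ (with $p + q = 1$) is the Bernoulli measure with parameter $p$, then $\hat\mu(\xi) = q + pe^{-i\xi}$.
(3) If
$$\mu = \sum_{i=0}^{n} \binom{n}{i} p^i q^{n-i}\,\delta_i$$
is the binomial measure with parameters $n$, $p$, then
$$\hat\mu(\xi) = (q + pe^{-i\xi})^n \qquad \forall \xi \in \mathbb{R}.$$
(4) If $\mu = e^{-x}1_{(0,\infty)}(x)\mathscr{L}^1$ is the exponential measure, then
$$\hat\mu(\xi) = \frac{1}{1 + i\xi} \qquad \forall \xi \in \mathbb{R}.$$
(5) If $\mu = (2a)^{-1}1_{(-a,a)}\mathscr{L}^1$ is the uniform measure in $[-a, a]$, then
$$\hat\mu(\xi) = \frac{\sin(a\xi)}{a\xi} \qquad \forall \xi \in \mathbb{R} \setminus \{0\}.$$
(6) If $\mu = [\pi(1 + x^2)]^{-1}\mathscr{L}^1$ is the Cauchy measure, then (see footnote 2)
$$\hat\mu(\xi) = e^{-|\xi|} \qquad \forall \xi \in \mathbb{R}.$$

Footnote 2: This computation can be done using the residue theorem in complex analysis.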
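For the discrete measures in Example 6.34 the Fourier transform is a finite sum, so the closed forms can be compared directly with the definition. The sketch below is an illustration only; the helper names `fourier_discrete` and `binomial_atoms` and the sample parameters are assumptions made for this example.

```python
import cmath
import math

def fourier_discrete(atoms, xi):
    """Fourier transform of a finite sum of weighted Dirac masses.

    `atoms` is a list of (point, weight) pairs; by Definition 6.33 the
    transform is the sum of weight * exp(-i * point * xi).
    """
    return sum(w * cmath.exp(-1j * x * xi) for x, w in atoms)

def binomial_atoms(n, p):
    """Atoms (i, C(n, i) p^i q^(n-i)) of the binomial measure with parameters n, p."""
    q = 1 - p
    return [(i, math.comb(n, i) * p**i * q**(n - i)) for i in range(n + 1)]

n, p, xi = 6, 0.3, 0.9
direct = fourier_discrete(binomial_atoms(n, p), xi)   # from the definition
closed = ((1 - p) + p * cmath.exp(-1j * xi)) ** n      # closed form in item (3)
print(abs(direct - closed))
```

The agreement is just the binomial theorem applied to $(q + pe^{-i\xi})^n$; taking $n = 1$ recovers the Bernoulli case in item (2).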

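Theorem 6.35. Any finite measure $\mu$ in $\mathbb{R}$ is uniquely determined by its Fourier transform $\hat\mu$.
Proof. For $\sigma > 0$ we denote by $\rho_\sigma$ the rescaled Gaussian functions in (6.36). According to Exercise 6.27 we have
$$e^{-z^2\sigma^2/2} = \int_{\mathbb{R}} \rho_\sigma(w)e^{-izw}\,dw.$$

Setting $z = (x - y)/\sigma^2$ and dividing both sides by $(2\pi\sigma^2)^{1/2}$ we deduce that
$$\rho_\sigma(x - y) = \frac{1}{(2\pi\sigma^2)^{1/2}}\int_{\mathbb{R}} \rho_\sigma(w)e^{-iw(x-y)/\sigma^2}\,dw.$$

Using the Fubini-Tonelli theorem we obtain
$$\begin{aligned}
\int_{\mathbb{R}} \rho_\sigma(x - y)\,d\mu(x) &= \frac{1}{(2\pi\sigma^2)^{1/2}}\int_{\mathbb{R}}\Big(\int_{\mathbb{R}} \rho_\sigma(w)e^{-iw(x-y)/\sigma^2}\,dw\Big)\,d\mu(x) \\
&= \int_{\mathbb{R}} \frac{\rho_\sigma(w)}{(2\pi\sigma^2)^{1/2}}\,\hat\mu\Big(\frac{w}{\sigma^2}\Big)e^{iyw/\sigma^2}\,dw.
\end{aligned} \tag{6.41}$$
As a consequence, the integrals $h_\sigma(y) = \int_{\mathbb{R}} \rho_\sigma(y - x)\,d\mu(x)$ are uniquely determined by $\hat\mu$. But, still using the Fubini-Tonelli theorem, one can check the identity
$$\int_{\mathbb{R}}\Big(\int_{\mathbb{R}} g(y)\rho_\sigma(x - y)\,dy\Big)\,d\mu(x) = \int_{\mathbb{R}} h_\sigma(y)g(y)\,dy \qquad \forall g \in C_b(\mathbb{R}). \tag{6.42}$$
Passing to the limit as $\sigma \downarrow 0$ and noticing that (by Example 6.25, which provides the weak convergence of $\rho_\sigma\lambda$ to $\delta_0$ as $\sigma \downarrow 0$, or by a direct verification)
$$\int_{\mathbb{R}} g(y)\rho_\sigma(x - y)\,dy = \int_{\mathbb{R}} g(x - z)\rho_\sigma(z)\,dz \to g(x) \qquad \forall x \in \mathbb{R},$$
from the dominated convergence theorem we obtain that all integrals $\int_{\mathbb{R}} g\,d\mu$, for $g \in C_b(\mathbb{R})$, are uniquely determined. Hence $\mu$ is uniquely determined by its Fourier transform.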
Remark 6.36. It is also possible to show an explicit inversion formula for the Fourier transform. Indeed, (6.42) holds not only for continuous functions, but also for bounded Borel functions; choosing $a < b$ that are not atoms of $\mu$ and $g = 1_{(a,b)}$, we have that $\int_{\mathbb{R}} g(y)\rho_\sigma(x - y)\,dy \to g(x)$ for $\mu$-a.e. $x$ (precisely for $x \notin \{a, b\}$), so that (6.42) and (6.41) give
$$\mu((a, b)) = \lim_{\sigma \downarrow 0}\int_a^b h_\sigma(y)\,dy = \lim_{\sigma \downarrow 0}\int_a^b\int_{\mathbb{R}} \frac{e^{-w^2/(2\sigma^2)}}{2\pi\sigma^2}\,\hat\mu\Big(\frac{w}{\sigma^2}\Big)e^{iyw/\sigma^2}\,dw\,dy.$$
The change of variables $w = t\sigma^2$ and the Fubini theorem give
$$\mu((a, b)) = \lim_{\sigma \downarrow 0}\frac{1}{2\pi}\int_{\mathbb{R}} e^{-t^2\sigma^2/2}\,\frac{e^{itb} - e^{ita}}{it}\,\hat\mu(t)\,dt, \tag{6.43}$$
for all points $a < b$ that are not atoms of $\mu$.
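As an illustration of (6.43), one can evaluate its right-hand side numerically for the standard Gaussian measure, whose transform $e^{-t^2/2}$ is given by (6.37) with $\sigma = 1$, and compare with the Gaussian mass of $(a, b)$. This is a sketch under stated assumptions: the function names, the small fixed value of $\sigma$ standing in for the limit, the truncation to $[-12, 12]$ and the midpoint rule are all choices made for this example.

```python
import cmath
import math

def mu_hat(t):
    """Fourier transform of the standard Gaussian measure ((6.37) with sigma = 1)."""
    return math.exp(-t * t / 2)

def inversion(a, b, sigma, cutoff=12.0, steps=24000):
    """Midpoint-rule approximation of the integral in (6.43) for a fixed sigma > 0."""
    h = 2 * cutoff / steps
    total = 0j
    for k in range(steps):
        t = -cutoff + (k + 0.5) * h          # midpoints never hit t = 0
        kernel = (cmath.exp(1j * t * b) - cmath.exp(1j * t * a)) / (1j * t)
        total += math.exp(-t * t * sigma**2 / 2) * kernel * mu_hat(t)
    return (total * h / (2 * math.pi)).real

a, b, sigma = -1.0, 2.0, 0.01
approx = inversion(a, b, sigma)
# Gaussian mass of (a, b), written through the error function
exact = 0.5 * (math.erf(b / math.sqrt(2)) - math.erf(a / math.sqrt(2)))
print(approx, exact)
```

Since the Gaussian measure has no atoms, every pair $a < b$ is admissible; the two printed values should agree up to an error that shrinks with $\sigma$.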
According to Theorem 6.28 we have the implication:

μh → μ weakly ⇒ μ̂h → μ̂ pointwise in R. (6.44)

The following theorem, due to Lévy, gives essentially the converse implication, allowing one to deduce the weak convergence from the convergence of the Fourier transforms.
Theorem 6.37 (Lévy). Let $(\mu_h)$ be probability measures in $\mathbb{R}$. If $f_h = \hat\mu_h$ converge pointwise in $\mathbb{R}$ to some function $f$, and if $f$ is continuous at $0$, then $f = \hat\mu$ for some probability measure $\mu$ in $\mathbb{R}$ and $\mu_h \to \mu$ weakly.
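Proof. Let us show first that $(\mu_h)$ is tight. For fixed $a > 0$, taking into account that $\sin\xi$ is an odd function and using the Fubini theorem we get
$$\int_{-a}^{a} \hat\sigma(\xi)\,d\xi = \int_{-a}^{a}\int_{\mathbb{R}} e^{-ix\xi}\,d\sigma(x)\,d\xi = \int_{\mathbb{R}}\int_{-a}^{a} \cos(x\xi)\,d\xi\,d\sigma(x) = \int_{\mathbb{R}} \frac{2}{x}\sin(ax)\,d\sigma(x)$$
for any probability measure $\sigma$. Hence, using the inequalities $|\sin t| \le |t|$ for all $t$ and $|\sin t| \le |t|/2$ for $|t| \ge 2$, we get
$$\begin{aligned}
\frac{1}{a}\int_{-a}^{a} \big(1 - \hat\sigma(\xi)\big)\,d\xi &= 2 - 2\int_{\mathbb{R}} \frac{\sin(ax)}{ax}\,d\sigma(x) \\
&= 2\int_{\mathbb{R}}\Big(1 - \frac{\sin(ax)}{ax}\Big)\,d\sigma(x) \\
&\ge \sigma\Big(\mathbb{R} \setminus \Big[-\frac{2}{a}, \frac{2}{a}\Big]\Big).
\end{aligned} \tag{6.45}$$
For $\varepsilon > 0$ we can find, by the continuity of $f$ at $0$, $a > 0$ such that
$$\int_{-a}^{a} \big(1 - f(\xi)\big)\,d\xi < \varepsilon a.$$
By the dominated convergence theorem we get $h_0 \in \mathbb{N}$ such that
$$\int_{-a}^{a} \big(1 - \hat\mu_h(\xi)\big)\,d\xi < \varepsilon a \qquad \forall h \ge h_0. \tag{6.46}$$
As $a^{-1}\int_{-a}^{a} (1 - \hat\mu_h(\xi))\,d\xi \to 0$ as $a \downarrow 0$ for any fixed $h$, we infer that we can find $b \in (0, a]$ such that (6.46) holds with $b$ replacing $a$ for all $h \in \mathbb{N}$. From (6.45) we get $\mu_h(\mathbb{R} \setminus [-n, n]) < \varepsilon$ for all $h \in \mathbb{N}$, as soon as $n > 2/b$.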
Since the sequence is tight, we can extract a subsequence $(\mu_{h(k)})$ weakly converging to a probability measure $\mu$ and deduce from (6.44) that $f = \hat\mu$. It remains to show that the whole sequence $(\mu_h)$ weakly converges to $\mu$: if this is not the case there exist $\varepsilon > 0$, $g \in C_b(\mathbb{R})$ and a subsequence $h'(k)$ such that
$$\Big|\int_{\mathbb{R}} g\,d\mu_{h'(k)} - \int_{\mathbb{R}} g\,d\mu\Big| \ge \varepsilon \qquad \forall k \in \mathbb{N}.$$

But, possibly extracting one more subsequence, we can assume that $\mu_{h'(k)}$ weakly converge to a probability measure $\sigma$; in particular
$$\Big|\int_{\mathbb{R}} g\,d\sigma - \int_{\mathbb{R}} g\,d\mu\Big| \ge \varepsilon > 0. \tag{6.47}$$

As we are assuming that $f_h = \hat\mu_h$ converge pointwise to $f = \hat\mu$, we obtain that $\hat\sigma = \lim_k \hat\mu_{h'(k)} = \hat\mu$, hence $\hat\mu = \hat\sigma$. From Theorem 6.35 we obtain that $\mu = \sigma$, contradicting (6.47).
Notice that pointwise convergence of the Fourier transforms alone is not enough to conclude the weak convergence, unless we know that the limit function is continuous at $0$: consider, for instance, the rescaled Gaussian kernels used in the proof of Theorem 6.35 and the behaviour of the Gaussian measures $\mu_\sigma = \rho_\sigma\mathscr{L}^1$ as $\sigma \uparrow \infty$; in this case, from Exercise 6.27 we infer that the Fourier transforms converge pointwise in $\mathbb{R}$ to the discontinuous function equal to $1$ at $\xi = 0$ and equal to $0$ elsewhere. In this case we do not have weak convergence of the measures: we have, instead, the so-called phenomenon of dispersion of the whole mass at infinity
$$\lim_{\sigma \uparrow \infty} \mu_\sigma(\mathbb{R} \setminus [-n, n]) = \lim_{\sigma \uparrow \infty} \mu_1\Big(\mathbb{R} \setminus \Big[-\frac{n}{\sigma}, \frac{n}{\sigma}\Big]\Big) = \mu_1(\mathbb{R} \setminus \{0\}) = 1 \qquad \forall n \in \mathbb{N}$$
and the family of measures $\mu_\sigma$ is far from being tight as $\sigma \uparrow \infty$.
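The dispersion of mass can be made concrete with a few numbers. The sketch below is only an illustration: the names `mass_outside` and `gaussian_hat` are introduced here, and the Gaussian tail is expressed through the complementary error function from the standard library.

```python
import math

def mass_outside(n, sigma):
    """Mass that the centered Gaussian measure mu_sigma gives to the complement of [-n, n]."""
    return math.erfc(n / (sigma * math.sqrt(2)))

def gaussian_hat(xi, sigma):
    """Fourier transform of mu_sigma, from (6.37)."""
    return math.exp(-xi**2 * sigma**2 / 2)

for sigma in (1.0, 10.0, 100.0, 1000.0):
    # columns: sigma, mass outside [-5, 5], transform at xi = 0.1, transform at xi = 0
    print(sigma, mass_outside(5, sigma), gaussian_hat(0.1, sigma), gaussian_hat(0.0, sigma))
```

As $\sigma$ grows, the second column approaches $1$ (no fixed interval retains the mass), while the transform tends to $0$ at $\xi = 0.1$ and stays equal to $1$ at $\xi = 0$, which is the discontinuous limit described above.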

Exercises
6.27 Check the identity (6.37).
6.28 Show that $\hat\mu$ is uniformly continuous in $\mathbb{R}$ for any finite measure $\mu$.

6.29 Let $\mu$ be a probability measure in $\mathbb{R}$. Show that if $|\hat\mu|$ attains its maximum at $\xi_0 \ne 0$, then there exist $x_0 \in \mathbb{R}$ and $c_n \in [0, \infty)$ such that
$$\mu = \sum_{n \in \mathbb{Z}} c_n\delta_{x_n} \qquad \text{with } x_n = x_0 + \frac{2n\pi}{\xi_0}.$$

Use this fact to show that $|\hat\mu| \equiv 1$ in $\mathbb{R}$ if and only if $\mu$ is a Dirac mass.
Chapter 7
The fundamental theorem of the integral
calculus

In this section we take a closer look at a classical theme, namely the fundamental theorem of the integral calculus, looking for optimal conditions on $f$ ensuring the validity of the formula
$$f(x) - f(y) = \int_y^x f'(s)\,ds.$$

Notice indeed that in the classical theory of the Riemann integration there is a gap between the conditions imposed to give a meaning to the integral $\int_a^x g(s)\,ds$ (i.e. Riemann integrability of $g$) and those that ensure its differentiability as a function of $x$ (for instance, typically one requires the continuity of $g$). We will see that this gap basically disappears in Lebesgue's theory, and that there is a precise characterization of the class of functions representable as $c + \int_a^x g(s)\,ds$ for a suitable (Lebesgue) integrable function $g$ and for some constant $c$.
The following definition is due to Vitali.
Definition 7.1 (Absolutely continuous functions). Let $I \subset \mathbb{R}$ be an interval. We say that $f : I \to \mathbb{R}$ is absolutely continuous if for any $\varepsilon > 0$ there exists $\delta > 0$ for which the implication
$$\sum_{i=1}^{n} (b_i - a_i) < \delta \ \Longrightarrow\ \sum_{i=1}^{n} |f(b_i) - f(a_i)| < \varepsilon \tag{7.1}$$
holds for any finite family $\{(a_i, b_i)\}_{1 \le i \le n}$ of pairwise disjoint intervals contained in $I$.
An absolutely continuous function is obviously uniformly continuous, but the converse is not true, see Example 7.7.
Let $f : [a, b] \to \mathbb{R}$ be absolutely continuous. For any $x \in [a, b]$ define
$$F(x) = \sup_{\sigma \in \Sigma_{a,x}} \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|,$$
where $\Sigma_{a,x}$ is the set of all decompositions $\sigma = \{a = x_0 < x_1 < \cdots < x_n = x\}$ of $[a, x]$. $F$ is called the total variation of $f$. Let us check that $F$ is finite: let $\delta > 0$ satisfy the implication (7.1) with $\varepsilon = 1$ and let us estimate from above a sum in the definition of $F$. Without loss of generality we can assume that $|x_i - x_{i-1}| < \delta/2$ for all $i$, possibly adding more points (which does not decrease the sum). Then, we can split the sum into families of consecutive intervals with total length larger than $\delta/2$ and less than $\delta$ (just keep adding a new interval to a family as long as the total length does not exceed $\delta$, and notice that when adding one more interval would exceed $\delta$, the total length is already at least $\delta/2$); only the last family may be shorter than $\delta/2$. The number of these families is at most $\frac{2}{\delta}(x - a) + 1$ and, as a consequence, (7.1) gives
$$F(x) \le \frac{2}{\delta}(x - a) + 1.$$
We set
$$f^+(x) = \frac{1}{2}\big(F(x) + f(x)\big), \qquad f^-(x) = \frac{1}{2}\big(F(x) - f(x)\big),$$
so that
$$f(x) = f^+(x) - f^-(x), \qquad F(x) = f^+(x) + f^-(x), \qquad x \in [a, b].$$
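To see the decomposition $f = f^+ - f^-$ at work, one can approximate $F$ from below by the variation along a fine uniform partition. This is a hedged numerical sketch, not the definition itself: a single partition only gives a lower bound for the supremum, and the names `variation_along` and `uniform_partition`, the choice $f = \sin$ on $[0, 2\pi]$ and the grid sizes are assumptions made for this example.

```python
import math

def variation_along(f, points):
    """Sum of |f(x_i) - f(x_{i-1})| along the partition `points`."""
    return sum(abs(f(points[i]) - f(points[i - 1])) for i in range(1, len(points)))

def uniform_partition(a, x, n):
    """Partition of [a, x] into n equal subintervals."""
    return [a + (x - a) * i / n for i in range(n + 1)]

f = math.sin
a, n = 0.0, 4000
xs = uniform_partition(a, 2 * math.pi, 40)

# F(x) approximated from below by a fine uniform partition of [a, x]
F = [variation_along(f, uniform_partition(a, x, n)) for x in xs]
f_plus = [(Fx + f(x)) / 2 for Fx, x in zip(F, xs)]
f_minus = [(Fx - f(x)) / 2 for Fx, x in zip(F, xs)]

print(F[-1])   # total variation of sin on [0, 2*pi]
```

Up to discretization error, `f_plus` increases only where $\sin$ increases and `f_minus` only where it decreases, in line with the monotonicity statement of Lemma 7.2.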

Lemma 7.2. Let $f : [a, b] \to \mathbb{R}$ be absolutely continuous and let $F$ be its total variation. Then $F$, $f^+$, $f^-$ are nondecreasing and absolutely continuous.

Proof. Let $x \in [a, b)$, $y \in (x, b]$ and $\sigma = \{a = x_0 < x_1 < \cdots < x_n = x\}$. Then we have
$$F(y) \ge |f(y) - f(x)| + \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|.$$

Taking the supremum over all $\sigma \in \Sigma_{a,x}$ yields
$$F(y) \ge |f(y) - f(x)| + F(x),$$
which implies that $F$, $f^+$, $f^-$ are nondecreasing. It remains to show that $F$ is absolutely continuous (the same then holds for $f^\pm$). Let $\varepsilon > 0$ and let $\delta = \delta(\varepsilon) > 0$ be such that the implication (7.1) holds for all finite families $(a_i, b_i)$ of pairwise disjoint intervals with $\sum_i (b_i - a_i) < \delta$. Let now $(a_i, b_i)$, $1 \le i \le n$, be a family of disjoint intervals with $\sum_i (b_i - a_i) < \delta$ and let us prove that $\sum_i |F(b_i) - F(a_i)| < 2\varepsilon$. For any $i = 1, \dots, n$ we can find $\sigma_i = \{a_i = x_{0,i} < x_{1,i} < \cdots < x_{n_i,i} = b_i\}$ such that
$$F(b_i) - F(a_i) < \frac{\varepsilon}{n} + \sum_{k=1}^{n_i} |f(x_{k,i}) - f(x_{k-1,i})|, \qquad 1 \le i \le n. \tag{7.2}$$

Indeed, if $a = y_0 < y_1 < \cdots < y_{m_i} = b_i$ is a partition such that
$$F(b_i) < \frac{\varepsilon}{n} + \sum_{k=1}^{m_i} |f(y_k) - f(y_{k-1})|$$
we can assume with no loss of generality (adding one more element to the partition if necessary) that $y_k = a_i$ for some $k$; then, it suffices to estimate the first $k$ terms of the above sum with $F(a_i)$, and to call $x_{0,i} = y_k, \dots, x_{m_i-k,i} = y_{m_i}$ to obtain (7.2) with $n_i = m_i - k$. Adding the inequalities (7.2) and taking into account that the union of the disjoint intervals $(x_{k-1,i}, x_{k,i})$ (for $1 \le i \le n$, $1 \le k \le n_i$) has length less than $\delta$, from the absolute continuity property of $f$ we get
$$\sum_{i=1}^{n} \big(F(b_i) - F(a_i)\big) < \varepsilon + \varepsilon = 2\varepsilon.$$

This proves that $F$ is absolutely continuous.


The absolute continuity property characterizes integral functions, as
the following theorem shows.

Theorem 7.3. Let $I = [a, b] \subset \mathbb{R}$. A function $f : I \to \mathbb{R}$ is representable as
$$f(x) = f(a) + \int_a^x g(t)\,dt \qquad \forall x \in I \tag{7.3}$$
for some $g \in L^1(I)$ if and only if $f$ is absolutely continuous.

Proof. (Sufficiency) If $f$ is representable as in (7.3), we have
$$|f(x) - f(y)| \le \int_x^y |g(s)|\,ds \qquad \forall x, y \in I,\ x \le y.$$

Hence, setting $A = \cup_i (a_i, b_i)$, the absolute continuity property follows from the implication
$$\mathscr{L}^1(A) < \delta \ \Longrightarrow\ \int_A |g|\,ds < \varepsilon.$$

The existence, given $\varepsilon > 0$, of $\delta > 0$ with this property is ensured by Exercise 6.14 (with $\mu = \mathscr{L}^1$ and $\nu = g\mathscr{L}^1$).
(Necessity) According to Lemma 7.2, we can write $f$ as the difference of two nondecreasing absolutely continuous functions. Hence, we can assume with no loss of generality that $f$ is nondecreasing, and possibly adding to $f$ a constant we shall assume that $f(a) = 0$. We extend $f$ to the whole of $\mathbb{R}$ setting $f \equiv 0$ in $(-\infty, a)$ and $f \equiv f(b)$ in $(b, \infty)$. It is clear that this extension, which we still denote by $f$, retains the monotonicity and absolute continuity properties.
By Theorem 6.23 we obtain a unique finite measure $\nu$ on $(\mathbb{R}, \mathscr{B}(\mathbb{R}))$ without atoms (because $f$ is continuous) such that $f$ is the repartition function of $\nu$. Since $f$ is constant on $(-\infty, a)$ and on $(b, +\infty)$, we obtain that $\nu$ is concentrated on $I$, so that
$$f(x) = \nu((-\infty, x]) = \nu((a, x]) \qquad \forall x \in \mathbb{R}. \tag{7.4}$$

Now, if we were able to show that $\nu \ll 1_I\mathscr{L}^1$, by the Radon-Nikodym theorem we would find $g \in L^1(I)$ such that $\nu = g\mathscr{L}^1$, so that (7.4) would give
$$f(x) = \int_a^x g(s)\,ds \qquad \forall x \in I.$$

Hence, it remains to show that $\nu \ll 1_I\mathscr{L}^1$. Taking into account the identity $\nu((a, b)) = f(b) - f(a)$, the absolute continuity property can be rewritten as follows: for any $\varepsilon > 0$ there exists $\delta > 0$ such that
$$\mathscr{L}^1(A) < \delta \ \Longrightarrow\ \nu(A) \le \varepsilon$$
for any finite union of open intervals $A \subset I$. But, by approximation, the same implication holds for all open sets, because any such set is the countable union of open intervals. By Proposition 1.24, ensuring an approximation from above with open sets, the same implication holds for Borel sets $B \subset I$ as well. This proves that $\nu \ll 1_I\mathscr{L}^1$ and concludes the proof.
We will need the following nice and elementary covering theorem.

Theorem 7.4 (Vitali covering theorem). Let $\{B_{r_i}(x_i)\}_{i \in I}$ be a finite family of balls in a metric space $(X, d)$. Then there exists $J \subset I$ such that the balls $\{B_{r_i}(x_i)\}_{i \in J}$ are pairwise disjoint, and
$$\bigcup_{i \in I} B_{r_i}(x_i) \subset \bigcup_{i \in J} B_{3r_i}(x_i). \tag{7.5}$$

Proof. We proceed as follows: first we pick a ball with largest radius, then we remove all balls that intersect the first chosen ball and choose a second ball of largest radius among the remaining ones. We continue removing all balls that intersect the second chosen ball and picking a third ball of largest radius among the remaining ones, and so on. The process stops when there is no ball left, i.e. when every ball that has not been chosen intersects at least one of the already chosen balls. The family of chosen balls is disjoint by construction. If $x \in B_{r_i}(x_i)$ and the ball $B_{r_i}(x_i)$ has not been chosen, then there is a chosen ball $B_{r_j}(x_j)$ intersecting it, so that $d(x_i, x_j) < r_i + r_j$. Moreover, if $B_{r_j}(x_j)$ is the first chosen ball with this property, then $r_j \ge r_i$ (otherwise, if $r_i > r_j$, either the ball $B_{r_i}(x_i)$ or a ball with larger radius would have been chosen, instead of $B_{r_j}(x_j)$), so that $d(x_i, x_j) < 2r_j$. It follows that
$$d(x, x_j) \le d(x, x_i) + d(x_i, x_j) < r_i + 2r_j \le 3r_j.$$

As $x$ is arbitrary, this proves (7.5).
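The selection procedure in the proof is a greedy algorithm, and it can be sketched in a few lines for intervals on the real line. The code below is an illustration under stated assumptions: balls are open intervals given as (center, radius) pairs, and the function names `vitali_select` and `covered_by_enlarged` are introduced here for the example.

```python
def vitali_select(balls):
    """Greedy selection from the proof of the Vitali covering theorem, on the real line.

    `balls` is a list of (center, radius) pairs describing the open intervals
    (center - radius, center + radius). Returns the indices of the chosen balls.
    """
    # Visiting the balls by decreasing radius and keeping those disjoint from
    # all previously kept ones is the same as "pick the largest, remove what
    # it intersects, repeat".
    order = sorted(range(len(balls)), key=lambda i: balls[i][1], reverse=True)
    chosen = []
    for i in order:
        xi, ri = balls[i]
        # open balls are disjoint exactly when |x_i - x_j| >= r_i + r_j
        if all(abs(xi - balls[j][0]) >= ri + balls[j][1] for j in chosen):
            chosen.append(i)
    return chosen

def covered_by_enlarged(balls, chosen):
    """Check that every ball lies inside some chosen ball enlarged by the factor 3."""
    return all(
        any(abs(x - balls[j][0]) + r <= 3 * balls[j][1] for j in chosen)
        for x, r in balls
    )

balls = [(0.0, 1.0), (1.5, 0.8), (3.0, 0.5), (-2.0, 0.4), (0.2, 0.3)]
J = vitali_select(balls)
print(J, covered_by_enlarged(balls, J))
```

In a general metric space only the distance function changes; the inclusion test used in `covered_by_enlarged` is the sufficient condition $d(x_i, x_j) + r_i \le 3r_j$ that the proof establishes.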


It is natural to think that the function $g$ in (7.3) is, as in the classical fundamental theorem of integral calculus, the derivative of $f$. This is true, but far from being trivial, and it follows from the following weak continuity result (due to Lebesgue) for integrable functions. We state the result in more than one variable, as the proof in this case does not involve any extra difficulty.

Theorem 7.5 (Continuity in mean). Let $f \in L^1(\mathbb{R}^n)$. Then, for $\mathscr{L}^n$-a.e. $x \in \mathbb{R}^n$ we have
$$\lim_{r \downarrow 0} \frac{1}{\omega_n r^n}\int_{B_r(x)} |f(y) - f(x)|\,dy = 0.$$

The terminology "continuity in mean" can be explained as follows: it is easy to show that the integral means
$$\frac{1}{\omega_n r^n}\int_{B_r(x)} f(y)\,dy$$
of a continuous function $f$ converge to $f(x)$ as $r \downarrow 0$ for any $x \in \mathbb{R}^n$, because they belong to the interval
$$\Big[\min_{\overline{B}_r(x)} f,\ \max_{\overline{B}_r(x)} f\Big].$$

The previous theorem tells us that the same convergence occurs, for $\mathscr{L}^n$-a.e. $x \in \mathbb{R}^n$, for any integrable function $f$. This simply follows from the inequality
$$\Big|\frac{1}{\omega_n r^n}\int_{B_r(x)} f(y)\,dy - f(x)\Big| = \frac{1}{\omega_n r^n}\Big|\int_{B_r(x)} \big(f(y) - f(x)\big)\,dy\Big| \le \frac{1}{\omega_n r^n}\int_{B_r(x)} |f(y) - f(x)|\,dy.$$

By the local nature of this statement, the same property holds for locally integrable functions.
Proof of Theorem 7.5. Given $\varepsilon, \delta > 0$ and an open ball $B = B_R(0)$, it suffices to check that the set
$$A := \Big\{x \in B : \limsup_{r \downarrow 0} \frac{1}{\omega_n r^n}\int_{B_r(x)} |f(y) - f(x)|\,dy > 2\varepsilon\Big\}$$
has Lebesgue measure less than $(3^n + 1)\delta$. To this aim, we write $f$ as the sum of a "good" part $g$ and a "bad", but small, part $h$, i.e. $f = g + h$ with $g : B' \to \mathbb{R}$ bounded and continuous, and $\|h\|_{L^1(B')} < \varepsilon\delta$, with $B' = B_{R+1}(0)$; this decomposition is possible, because Proposition 3.16 ensures the density of bounded continuous functions in $L^1(B')$.
The continuity of $g$ gives
$$\lim_{r \downarrow 0} \frac{1}{\omega_n r^n}\int_{B_r(x)} |g(y) - g(x)|\,dy = 0 \qquad \forall x \in B.$$

Hence, as $f = g + h$, we have $A \subset A_1$, where
$$A_1 := \Big\{x \in B : \limsup_{r \downarrow 0} \frac{1}{\omega_n r^n}\int_{B_r(x)} |h(y) - h(x)|\,dy > 2\varepsilon\Big\}.$$

Then, it suffices to show that $\mathscr{L}^n(A_1) \le (3^n + 1)\delta$. By the triangle inequality, we have also $A_1 \subset A_2 \cup A_3$ with
$$A_2 := \{x \in B : |h(x)| > \varepsilon\}$$
and
$$A_3 := \Big\{x \in B : \sup_{r \in (0,1)} \frac{1}{\omega_n r^n}\int_{B_r(x)} |h(y)|\,dy > \varepsilon\Big\}.$$

The Markov inequality ensures that $\mathscr{L}^n(A_2) \le \|h\|_{L^1(B)}/\varepsilon < \delta$, so that we need only to show that $\mathscr{L}^n(A_3) \le 3^n\delta$.
Since $x \mapsto \int_{B_r(x)} |h(y)|\,dy$ is continuous, we have that
$$x \mapsto \sup_{r \in (0,1)} \frac{1}{\omega_n r^n}\int_{B_r(x)} |h(y)|\,dy$$
is lower semicontinuous, hence $A_3$ is open. Notice also that for any $x \in A_3$ there exists $r \in (0, 1)$, depending on $x$, such that
$$\int_{B_r(x)} |h(y)|\,dy > \varepsilon\omega_n r^n.$$

Let $K \subset A_3$ be a compact set and let $\{B_{r_i}(x_i)\}_{i \in I}$ be a finite family of these balls whose union covers $K$. By applying Vitali's covering theorem to this family of balls, we can find a disjoint subfamily $\{B_{r_i}(x_i)\}_{i \in J}$ such that the union of the enlarged balls $B_{3r_i}(x_i)$ still covers $K$. Using the previous inequalities with $x = x_i$ and $r = r_i$ and summing over $i \in J$, since all balls $B_{r_i}(x_i)$ are contained in $B'$ we get
$$\mathscr{L}^n(K) \le \sum_{i \in J} \omega_n(3r_i)^n \le \frac{3^n}{\varepsilon}\sum_{i \in J}\int_{B_{r_i}(x_i)} |h(y)|\,dy \le \frac{3^n}{\varepsilon}\int_{B'} |h(y)|\,dy \le 3^n\delta.$$
As $K$ is arbitrary we obtain that $\mathscr{L}^n(A_3) \le 3^n\delta$.
Since the continuity in mean is a local property, it is not difficult to extend the previous result to locally integrable functions. By applying this extended theorem to a characteristic function $f = 1_E$ we get
$$\begin{cases}
\displaystyle\lim_{r \downarrow 0} \frac{\mathscr{L}^n(E \cap B_r(x))}{\omega_n r^n} = 1 & \text{for } \mathscr{L}^n\text{-a.e. } x \in E, \\[2ex]
\displaystyle\lim_{r \downarrow 0} \frac{\mathscr{L}^n(E \cap B_r(x))}{\omega_n r^n} = 0 & \text{for } \mathscr{L}^n\text{-a.e. } x \in \mathbb{R}^n \setminus E
\end{cases}$$
for any $E \in \mathscr{B}(\mathbb{R}^n)$; points of the first type are called density points, whereas points of the second type are called rarefaction points.
Using the continuity in mean of integrable functions we obtain the fundamental theorem of calculus within the (natural) class of absolutely continuous functions.
Theorem 7.6. Let $I \subset \mathbb{R}$ be an interval and let $f : I \to \mathbb{R}$ be absolutely continuous. Then $f$ is differentiable at $\mathscr{L}^1$-a.e. point of $I$. In addition $f'$ is Lebesgue integrable in $I$ and
$$f(x) = f(a) + \int_a^x f'(s)\,ds \qquad \forall x \in I. \tag{7.6}$$

Proof. Let $g$ be as in (7.3), let $x_0 \in I$ be a point where
$$\lim_{r \downarrow 0} \frac{1}{r}\int_{x_0 - r}^{x_0 + r} |g(s) - g(x_0)|\,ds = 0 \tag{7.7}$$
and notice that
$$\frac{f(x_0 + r) - f(x_0)}{r} = \frac{1}{r}\int_{x_0}^{x_0 + r} g(s)\,ds = g(x_0) + \frac{1}{r}\int_{x_0}^{x_0 + r} \big(g(s) - g(x_0)\big)\,ds$$
for $r > 0$. Hence, passing to the limit as $r \downarrow 0$, from (7.7) we get $f'_+(x_0) = g(x_0)$; a similar argument shows that $f'_-(x_0) = g(x_0)$. As, according to the previous theorem, $\mathscr{L}^1$-a.e. point $x_0$ satisfies (7.7), we obtain that $f$ is differentiable, with derivative equal to $g$, $\mathscr{L}^1$-a.e. in $I$. It suffices to replace $g$ with $f'$ in (7.3) to obtain (7.6).
One might think that differentiability L 1 –a.e. and integrability of the
derivative are sufficient for the validity of (7.6) (these are the minimal
requirements to give a meaning to the formula). However, this is not
true, as the Heaviside function 1(0,∞) fulfils these conditions but fails to be
(absolutely) continuous. Then, one might think that one should require
also the continuity of f to have (7.6). It turns out that not even this
is enough: we build in the next example the Cantor-Vitali function, also
called devil’s staircase: a continuous function having derivative equal to 0
L 1 –a.e., but not constant. This example shows why a stronger condition,
namely the absolute continuity, is needed.
Example 7.7 (Cantor-Vitali function). Let
$$X := \{f \in C([0, 1]) : f(0) = 0,\ f(1) = 1\}.$$

This is a closed subset of the complete metric space $C([0, 1])$, hence $X$ is complete as well. For any $f : [0, 1] \to \mathbb{R}$ we set
$$Tf(x) := \begin{cases}
f(3x)/2 & \text{if } 0 \le 3x \le 1, \\
1/2 & \text{if } 1 < 3x < 2, \\
1/2 + f(3x - 2)/2 & \text{if } 2 \le 3x \le 3.
\end{cases} \tag{7.8}$$

It is easy to see that $T$ maps $X$ into $X$, and that $T$ is a contraction (with Lipschitz constant equal to $1/2$). Hence, by the contraction principle, there is a unique $f \in X$ such that $Tf = f$.
Let us check that $f$ has zero derivative $\mathscr{L}^1$-a.e. in $[0, 1]$. As $f = Tf$, $f$ is constant, and equal to $1/2$, in $(1/3, 2/3)$. Inserting this information again in the identity $f = Tf$ we obtain that $f$ is locally constant (equal to $1/4$ and to $3/4$) on $(1/9, 2/9) \cup (7/9, 8/9)$. Continuing in this way, one finds that for every $n \ge 1$ the function $f$ is locally constant on a union of $2^{n-1}$ intervals, each of length $3^{-n}$. The complement $C = [0, 1] \setminus A$ of the union $A$ of all these intervals is Cantor's middle third set (see also Exercise 1.8), and since
$$\mathscr{L}^1(A) = \sum_{n=1}^{\infty} \frac{2^{n-1}}{3^n} = \frac{1}{2}\sum_{n=1}^{\infty}\Big(\frac{2}{3}\Big)^n = 1$$
we know that $\mathscr{L}^1(C) = 0$. At any point of $A$ the derivative of $f$ is obviously $0$.
In connection with the previous example, notice also that $f$ maps $A$, a set of full Lebesgue measure in $[0, 1]$, into the countable set $\{k2^{-n} : n \ge 1,\ 0 < k < 2^n\}$ of dyadic rationals. On the other hand, it maps $C$, a Lebesgue negligible set, onto $[0, 1]$, a set with strictly positive Lebesgue measure.
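The contraction principle used in Example 7.7 is constructive: iterating $T$ from any starting function in $X$ converges uniformly to the Cantor-Vitali function, with error at most $2^{-k}$ after $k$ steps when starting from $f_0(x) = x$. The sketch below illustrates this; the starting function, the number of iterations and the name `cantor_vitali` are choices made for the example.

```python
def T(f):
    """The operator T from (7.8), acting on a function f: [0, 1] -> R."""
    def Tf(x):
        if 3 * x <= 1:
            return f(3 * x) / 2
        if 3 * x < 2:
            return 0.5
        return 0.5 + f(3 * x - 2) / 2
    return Tf

def cantor_vitali(iterations=30):
    """Approximate the fixed point of T by iterating from f_0(x) = x."""
    f = lambda x: x
    for _ in range(iterations):
        f = T(f)
    return f

f = cantor_vitali()
# values on the removed middle thirds, where f is locally constant
print(f(0.5), f(0.15), f(0.8))
# a point of the Cantor set: 1/4 has ternary expansion 0.020202...
print(f(0.25))
```

Each evaluation follows a single branch of (7.8) per level, so the cost is linear in the number of iterations.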

Exercises
7.1 Let $H : \mathbb{R} \to \mathbb{R}$ satisfy the Lipschitz condition
$$|H(x) - H(y)| \le C|x - y| \qquad \forall x, y \in \mathbb{R}$$
and let $f : [a, b] \to \mathbb{R}$ be an absolutely continuous function. Show that $H \circ f$ is absolutely continuous in $[a, b]$.
7.2 Let $E \subseteq \mathbb{R}$ be a Borel set and assume that any $t \in \mathbb{R}$ is either a point of density or a point of rarefaction of $E$. Show that either $\mathscr{L}^1(E) = 0$ or $\mathscr{L}^1(\mathbb{R} \setminus E) = 0$. (Remark: the same result is true in $\mathbb{R}^n$, but with a much harder proof, see [3], 4.5.11).
7.3 [Lipschitz change of variables] Let $f : I = [a, b] \to \mathbb{R}$ be absolutely continuous (resp. Lipschitz). Show that
$$\int_{f(a)}^{f(b)} \varphi(y)\,dy = \int_a^b \varphi(f(x))f'(x)\,dx$$
for any bounded (resp. integrable) Borel function $\varphi : f(I) \to \mathbb{R}$.
7.4 Use the previous exercise to show that, for any Lipschitz function $f : \mathbb{R} \to \mathbb{R}$ and any $\mathscr{L}^1$-negligible set $N \in \mathscr{B}(\mathbb{R})$, the derivative $f'$ vanishes $\mathscr{L}^1$-a.e. on $f^{-1}(N)$.
Chapter 8
Measurable transformations

In this chapter we study the classical problem of the change of variables


in the integral from a new viewpoint. We will compute how the Lebesgue
measure in Rn changes under a sufficiently regular transformation, gen-
eralizing what we have already seen for linear, or affine, maps. As a
byproduct we obtain a quite general change of variables formula for in-
tegrals with respect to the Lebesgue measure.

8.1. Image measure


We are given two measurable spaces $(X, \mathscr{E})$ and $(Y, \mathscr{F})$, a measure $\mu$ on $(X, \mathscr{E})$ and an $(\mathscr{E}, \mathscr{F})$-measurable mapping $F : X \to Y$. We define a measure $F_\#\mu$ in $(Y, \mathscr{F})$ by setting
$$F_\#\mu(I) := \mu(F^{-1}(I)), \qquad I \in \mathscr{F}. \tag{8.1}$$
It is easy to see that $F_\#\mu$ is well defined, by the measurability assumption on $F$, and $\sigma$-additive on $\mathscr{F}$. $F_\#\mu$ is called the image measure of $\mu$ by $F$.
The following change of variable formula is simple, but of basic importance.
Proposition 8.1. Let $\varphi : Y \to [0, \infty]$ be an $\mathscr{F}$-measurable function. Then we have
$$\int_X \varphi(F(x))\,d\mu(x) = \int_Y \varphi(y)\,dF_\#\mu(y). \tag{8.2}$$
Proof. By monotone approximation it is enough to prove (8.2) when $\varphi$ is a simple function. By linearity of both sides we need only to consider functions $\varphi$ of the form $\varphi = 1_I$, where $I \in \mathscr{F}$. In this case we have $\varphi \circ F = 1_{F^{-1}(I)}$, hence (8.2) reduces to (8.1).
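For a finitely supported measure both sides of (8.2) are finite sums, which makes the formula easy to check by hand or by machine. The following is a minimal sketch assuming measures are stored as dictionaries from points to masses; the names `push_forward` and `integrate` and the sample data are introduced only for this illustration.

```python
from collections import defaultdict

def push_forward(mu, F):
    """Image measure F#mu of a finitely supported measure.

    `mu` maps points x to the masses mu({x}); the result maps each y to
    mu(F^{-1}({y})), as in (8.1).
    """
    nu = defaultdict(float)
    for x, w in mu.items():
        nu[F(x)] += w
    return dict(nu)

def integrate(phi, measure):
    """Integral of phi with respect to a finitely supported measure."""
    return sum(phi(x) * w for x, w in measure.items())

mu = {-2: 0.1, -1: 0.2, 0: 0.3, 1: 0.25, 2: 0.15}
F = lambda x: x * x            # not injective, so masses get merged
phi = lambda y: 1.0 + 3.0 * y  # a nonnegative function on the image

lhs = integrate(lambda x: phi(F(x)), mu)    # left-hand side of (8.2)
rhs = integrate(phi, push_forward(mu, F))   # right-hand side of (8.2)
print(lhs, rhs)
```

Note that no derivative of $F$ appears: the map only relabels where the mass sits.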
In the following example we discuss the relation between the change of
variables formula (8.2), that even on the real line involves no derivative,
and the classical one. The difference is due to the fact that in (8.2) we are
not using the density of F# μ with respect to L 1 . It is precisely in this
density that the derivative of F shows up.


Example 8.2. Let $F : \mathbb{R} \to \mathbb{R}$ be of class $C^1$ and such that $F'(t) > 0$ for all $t \in \mathbb{R}$. Let $A$ be the image of $F$ (an open interval, by the assumptions made on $F$) and let $\psi : A \to \mathbb{R}$ be continuous. Then for any interval $[a, b] \subset A$ the following elementary formula of change of variables holds (just put $y = F(x)$ in the right integral):
$$\int_{F^{-1}(a)}^{F^{-1}(b)} \psi(F(x))\,dx = \int_a^b \psi(y)\,\frac{1}{F'(F^{-1}(y))}\,dy.$$
On the other hand, choosing $\varphi = \psi 1_I$ with $I = [a, b]$ in (8.2), we have
$$\int_{F^{-1}(a)}^{F^{-1}(b)} \psi(F(x))\,dx = \int_a^b \psi(y)\,dF_\#\mathscr{L}^1.$$

Hence, comparing the two expressions, we find
$$\int_a^b \psi(y)\,\frac{1}{F'(F^{-1}(y))}\,dy = \int_a^b \psi(y)\,dF_\#\mathscr{L}^1. \tag{8.3}$$

Since $a$, $b$ and $\psi$ are arbitrary, (8.3) can be interpreted by saying that $F_\#\mathscr{L}^1 \ll \mathscr{L}^1$ and
$$F_\#\mathscr{L}^1 = \frac{1}{F' \circ F^{-1}}\,\mathscr{L}^1.$$
In the next section, we shall generalize this formula to $\mathbb{R}^n$, and even in one space dimension we will see that the assumption that $F' > 0$ everywhere can be weakened (see also Exercise 8.3).
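The elementary change of variables formula of Example 8.2 can be tested numerically on a concrete map. The sketch below uses $F = \sinh$, for which $F' = \cosh > 0$ and the image is the whole real line; this choice of $F$, the test function $\psi = \cos$, the interval and the midpoint rule are all assumptions made for the illustration.

```python
import math

def midpoint(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

F = math.sinh        # C^1 with F' = cosh > 0, image A = R
F_inv = math.asinh
F_prime = math.cosh
psi = math.cos
a, b = -1.0, 2.0

# integral of psi(F(x)) over [F^{-1}(a), F^{-1}(b)]
lhs = midpoint(lambda x: psi(F(x)), F_inv(a), F_inv(b))
# integral of psi(y) / F'(F^{-1}(y)) over [a, b]
rhs = midpoint(lambda y: psi(y) / F_prime(F_inv(y)), a, b)
print(lhs, rhs)
```

Here the density $1/(F' \circ F^{-1})$ of $F_\#\mathscr{L}^1$ is explicitly $1/\sqrt{1 + y^2}$, since $\cosh(\operatorname{asinh} y) = \sqrt{1 + y^2}$.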

8.2. Change of variables in multiple integrals


We consider here the measure space (Rn , B (Rn ), L n ), where L n is the
Lebesgue measure.
We recall a few basic facts from calculus in several variables: given an open set $U \subset \mathbb{R}^n$ and a mapping $F : U \to \mathbb{R}^n$, $F$ is said to be differentiable at $x \in U$ if there exists a linear operator $DF(x) \in L(\mathbb{R}^n; \mathbb{R}^n)$ (see footnote 1) such that
$$\lim_{|h| \to 0} \frac{|F(x + h) - F(x) - DF(x)h|}{|h|} = 0.$$
The operator $DF(x)$, if it exists, is unique, and is called the differential of $F$ at $x$. If $F$ is affine, i.e. $F(x) = Tx + a$ for some $T \in L(\mathbb{R}^n; \mathbb{R}^n)$ and $a \in \mathbb{R}^n$, we have $DF(x) = T$ for all $x \in U$.

Footnote 1: $L(\mathbb{R}^n; \mathbb{R}^m)$ is the Banach space of all linear mappings $T : \mathbb{R}^n \to \mathbb{R}^m$ endowed with the sup norm $\|T\| = \sup\{|Tx| : x \in \mathbb{R}^n,\ |x| = 1\}$.

If $F$ is differentiable at $x \in U$ we define the Jacobian determinant $J_F(x)$ of $F$ at $x$ by setting
$$J_F(x) = \det DF(x).$$
If $F$ is differentiable at any $x \in U$ and if the mapping $DF : U \to L(\mathbb{R}^n; \mathbb{R}^n)$ is continuous, we say that $F$ is of class $C^1$. If, in addition, $F$ is bijective between $U$ and an open domain $A$ and $F^{-1}$ is of class $C^1$ in $A$, we say that $F$ is a $C^1$ diffeomorphism of $U$ onto $A$. In this case we have that $DF(x)$ is invertible and
$$D(F^{-1})(F(x)) = (DF(x))^{-1} \qquad \forall x \in U.$$

Finally, by Proposition 6.10 we know that if $T \in L(\mathbb{R}^n; \mathbb{R}^n)$ we have
$$\mathscr{L}^n(T(E)) = |\det T|\,\mathscr{L}^n(E) \qquad \forall E \in \mathscr{B}(\mathbb{R}^n). \tag{8.4}$$
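Formula (8.4) can be verified exactly in the plane when $E$ is a polygon, since a linear map sends polygons to polygons and their areas are given by the shoelace formula. The sketch below is an illustration only; the matrix, the unit square and the helper names `det2`, `apply` and `polygon_area` are assumptions made for this example.

```python
def det2(T):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = T
    return a * d - b * c

def apply(T, v):
    """Image of the vector v under the linear map with matrix T."""
    (a, b), (c, d) = T
    x, y = v
    return (a * x + b * y, c * x + d * y)

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:] + vertices[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2

T = ((2.0, 1.0), (0.5, -3.0))
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]   # E, with area 1
image = [apply(T, v) for v in square]                        # vertices of T(E)

print(polygon_area(image), abs(det2(T)) * polygon_area(square))
```

The determinant here is negative, which shows why the absolute value is needed in (8.4): orientation is reversed but the measure is still scaled by $|\det T|$.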

8.3. Image measure of L n by a C 1 diffeomorphism


In this section we study how the Lebesgue measure changes under the action of a $C^1$ map $F$. The relevant quantity will be the function $|J_F|$, which really corresponds to the distortion factor of the measure.
Let $U \subset \mathbb{R}^n$ be open. The critical set $C_F$ of $F \in C^1(U; \mathbb{R}^n)$ is defined by
$$C_F := \{x \in U : J_F(x) = 0\}.$$

Lemma 8.3. The image $F(C_F)$ of the critical set is Lebesgue negligible.

Proof. Let $K \subset C_F$ be a compact set and $\varepsilon > 0$; for any $x \in K$ the set $DF(x)(\overline{B}_1(0))$ is Lebesgue negligible (because $DF$ is singular at $x$, so that $DF(x)(\mathbb{R}^n)$ is contained in an $(n - 1)$-dimensional subspace of $\mathbb{R}^n$), hence we can find $\delta = \delta(\varepsilon, x) > 0$ such that
$$\mathscr{L}^n\big(\{z \in \mathbb{R}^n : \operatorname{dist}(z - F(x), DF(x)(\overline{B}_1(0))) < \delta\}\big) < \varepsilon.$$

By a scaling argument we get
$$\mathscr{L}^n\big(\{z \in \mathbb{R}^n : \operatorname{dist}(z - F(x), DF(x)(\overline{B}_r(0))) < \delta r\}\big) < \varepsilon r^n \qquad \forall r > 0.$$

On the other hand, since $|F(y) - F(x) - DF(x)(y - x)| < \delta r$ in $B_r(x)$, provided $r$ is small enough, we get
$$F(B_r(x)) \subset \big\{z \in \mathbb{R}^n : \operatorname{dist}(z - F(x), DF(x)(\overline{B}_r(0))) < \delta r\big\}.$$

It follows that $B_r(x) \subset U$ and $\mathscr{L}^n(F(B_r(x))) < \varepsilon r^n$ for $r > 0$ small enough, depending on $x$.

Since the family of balls $\{B_{r/3}(x)\}_{x \in K}$ covers the compact set $K$, we can find a finite family $\{B_{r_i/3}(x_i)\}_{i \in I}$ whose union still covers $K$ and extract from it, thanks to Vitali's covering theorem, a subfamily $\{B_{r_i/3}(x_i)\}_{i \in J}$ made of pairwise disjoint balls such that the union of the enlarged balls $\{B_{r_i}(x_i)\}_{i \in J}$ covers $K$. In particular, covering $F(K)$ by the union of $F(B_{r_i}(x_i))$ for $i \in J$, we get
$$\mathscr{L}^n(F(K)) \le \sum_{i \in J} \varepsilon r_i^n = \frac{3^n\varepsilon}{\omega_n}\sum_{i \in J} \omega_n\Big(\frac{r_i}{3}\Big)^n \le \frac{3^n\varepsilon}{\omega_n}\,\mathscr{L}^n(U).$$

Letting $\varepsilon \downarrow 0$ we obtain that $\mathscr{L}^n(F(K)) = 0$. Since $K$ is arbitrary, by approximation (recall that $C_F$, being a closed subset of $U$, can be written as the countable union of compact subsets of $U$) we obtain that $\mathscr{L}^n(F(C_F)) = 0$.
The following theorem provides a necessary and sufficient condition
for the absolute continuity of F# L n with respect to L n , assuming a C 1
regularity of F.
Theorem 8.4. Let U ⊂ Rn be an open set and let F : U → Rn be of
class C 1 , whose restriction to U \ C F is injective. Then:
(i) F# (1U L n ) is absolutely continuous with respect to L n if and only if
C F is Lebesgue negligible.
(ii) If F# (1U L n ) # L n we have
1
F# (1U L n ) = 1 L n. (8.5)
|JF |(F −1 ) F(U \C F )
Proof. (i) If L n (C F ) > 0, we have F# (1U L n )(F(C F )) ≥ L n (C F ) >
0 and F# (1U L n ) fails to be absolutely continuous with respect to L n ,
because we proved in Lemma 8.3 that F(C F ) is Lebesgue negligible.
Let G be the inverse of the restriction of F to the open set U \ C F . The
local invertibility theorem ensures that the domain A = F(U \C F ) of G is
an open set, that G is of class C 1 in A and that DG(y) = (D F)−1 (G(y))
for all y ∈ A. Let us assume now that C F is Lebesgue negligible and
let us show that F −1 (E) is Lebesgue negligible whenever E ⊂ F(U ) is
Lebesgue negligible. Since we already know that C F is L n –negligible
set, we can assume with no loss of generality that E ⊂ A and show that
G(E) is Lebesgue negligible. Let A M be the open sets
A M := {y ∈ A : DG(y) < M} .
We will prove that
L n (G(K )) ≤ (3M)n L n (K ) (8.6)
133 Introduction to Measure Theory and Integration

for any compact set K ⊂ A M . So, F# L n ≤ (3M)n L n on the compact


sets of A M and therefore on the Borel sets; in particular
L n (G(E ∩ A M )) ≤ (3M)n L n (E ∩ A M ) = 0,
and letting M ↑ ∞ we obtain that L n (G(E)) = 0, because E ⊂ A.
In order to show (8.6) we consider a bounded open set B contained in
A M and containing K , and the family of balls Br (y) ⊂ B with y ∈ K
and r > 0. For any of these balls the mean value theorem gives (with
t = t (y, z) ∈ (0, 1))
|G(z)−G(y)| = |DG((1−t)y +t z)(z − y)| ≤ M|z − y| ∀z ∈ Br (y),
therefore G(Br (y)) ⊂ B Mr (G(y)) for any of these balls. Since the family
of balls {Br/3 (y)} y∈K covers K , we can find a finite family {Bri /3 (yi )}i∈I
whose union still covers K and extract from it, thanks to Vitali’s covering
theorem, a subfamily {Bri /3 (yi )}i∈J made by pairwise disjoint balls such
that the union of the enlarged balls {Bri (yi )}i∈J covers K . In particular,
by our choice of the radii of the balls, the family {B Mri (G(yi ))}i∈J covers
G(K ). We have then
\[
\mathscr{L}^n(G(K)) \le \sum_{i\in J} \omega_n (M r_i)^n = (3M)^n \sum_{i\in J} \omega_n \Big(\frac{r_i}{3}\Big)^n \le (3M)^n \mathscr{L}^n(B).
\]
Letting B ↓ K we obtain (8.6).
Let us prove (ii). We denote by h the Radon–Nikodym derivative of
F# (1U L n ) with respect to L n ; by Theorem 7.5 we have that
\[
h(y) = \lim_{r\downarrow 0} \frac{1}{\omega_n r^n} \int_{B_r(y)} h(z)\,dz = \lim_{r\downarrow 0} \frac{\mathscr{L}^n(G(B_r(y)))}{\omega_n r^n},
\]
for L n –a.e. y ∈ A.
Taking into account that F# (1U L n ) is concentrated on A, and that
1/|JF | ◦ F −1 = |JG |, it remains to prove that for all y0 ∈ A we have
\[
\lim_{r\downarrow 0} \frac{\mathscr{L}^n(G(B_r(y_0)))}{\omega_n r^n} = |JG|(y_0). \tag{8.7}
\]
For the sake of simplicity we only consider the case when y0 = 0 and
G(0) = 0 (this is not restrictive, up to a translation in the domain and in
the codomain). We divide the rest of the proof in two steps.
Step 1. We assume in addition that DG(0) = I and show that
\[
\lim_{r\downarrow 0} \frac{\mathscr{L}^n(G(B_r(0)))}{\omega_n r^n} = 1, \tag{8.8}
\]
which is equivalent to (8.7) in this case.

Since D F(0) = DG(0) = I we have, by the definition of derivative,
\[
\lim_{|x|\to 0} \frac{|F(x)-x|}{|x|} = 0, \qquad \lim_{|y|\to 0} \frac{|G(y)-y|}{|y|} = 0.
\]
So, for any ε ∈ (0, 1) there exists δε > 0 such that if |x| < δε we have
x ∈ U \C F and |F(x)−x| < ε|x| and if |y| < δε we have y ∈ F(U \C F )
and |G(y) − y| < ε|y|. It follows that r < δε implies

\[
|F(x)| < r \quad \forall x \in B_{(1-\varepsilon)r}(0), \qquad |G(y)| < (1+\varepsilon) r \quad \forall y \in B_r(0). \tag{8.9}
\]
In particular

B(1−ε)r (0) ⊂ G(Br (0)) ⊂ B(1+ε)r (0) ∀r < δε . (8.10)

Now, by (8.10) it follows that
\[
(1-\varepsilon)^n \le \frac{\mathscr{L}^n(G(B_r(0)))}{\omega_n r^n} \le (1+\varepsilon)^n,
\]
provided r < δε , and this proves that (8.8) holds.
Step 2. Set T = DG(0) and H (x) = T −1 G(x), so that D H (0) = I .
Then we have G(Br (0)) = T (H (Br (0))) and so, thanks to (8.4),

L n (G(Br (0))) = L n (T (H (Br (0)))) = | det T | L n (H (Br (0))),

which implies
\[
\lim_{r\downarrow 0} \frac{\mathscr{L}^n(G(B_r(0)))}{\omega_n r^n} = |\det T| \lim_{r\downarrow 0} \frac{\mathscr{L}^n(H(B_r(0)))}{\omega_n r^n} = |\det T|.
\]
The proof is complete.
Example 8.5 (Polar and spherical coordinates). Let us consider the
polar coordinates
(ρ, θ) → (ρ cos θ, ρ sin θ).
Here U = (0, ∞) × (0, 2π) and the critical set is empty, because the
modulus of the Jacobian determinant is ρ.
In the case of the spherical coordinates

(ρ, θ, φ) → (ρ cos θ sin φ, ρ sin θ sin φ, ρ cos φ)

we have U = (0, ∞) × (0, 2π) × (0, π) and the critical set is empty,
because the Jacobian determinant is −ρ² sin φ, whose modulus ρ² sin φ is
strictly positive in U .
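The two Jacobian determinants of Example 8.5 can be checked numerically. The following is only a sketch: the helper names `jacobian_det` and `det`, the sample point and the step size are our own illustrative choices, and central finite differences replace exact differentiation.

```python
import math

def det(m):
    """Determinant by Laplace expansion along the first row (fine for 2x2, 3x3)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def jacobian_det(F, p, h=1e-6):
    """Jacobian determinant of F at p, approximated by central differences."""
    cols = []
    for j in range(len(p)):
        plus, minus = list(p), list(p)
        plus[j] += h
        minus[j] -= h
        fp, fm = F(*plus), F(*minus)
        # cols[j][i] approximates dF_i/dx_j; transposition does not change det
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return det(cols)

def polar(rho, theta):
    return (rho * math.cos(theta), rho * math.sin(theta))

def spherical(rho, theta, phi):
    return (rho * math.cos(theta) * math.sin(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(phi))

rho, theta, phi = 1.7, 0.9, 2.1            # arbitrary sample point in U
det_polar = jacobian_det(polar, (rho, theta))
det_spherical = jacobian_det(spherical, (rho, theta, phi))
```

At the sample point the first value should be close to ρ and the second close to −ρ² sin φ; the sign of the latter depends on the order (ρ, θ, φ) of the variables, while its modulus does not.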

Theorem 8.6 (Change of variables formula). Let U ⊂ Rn be an open
set and let F : U → Rn be of class C 1 , injective on U \ C F . Then
\[
\int_{F(U)} \varphi(y)\,dy = \int_U \varphi(F(x))\,|JF|(x)\,dx \tag{8.11}
\]

for any Borel function ϕ : F(U ) → [0, +∞].

Proof. We first see that it is not restrictive to assume that C F = ∅; indeed,
F(C F ) is Lebesgue negligible and so images of points in C F do not affect
the left hand side, while obviously points in C F do not affect the right
hand side. So, possibly replacing U with U \ C F , we can assume that
C F = ∅.
By (8.2) and (8.5) we have
\[
\int_{F(U)} \frac{\psi(y)}{|JF|(F^{-1}(y))}\,dy = \int_U \psi(F(x))\,dx
\]
for any nonnegative Borel function ψ. We conclude choosing ψ(y) =
ϕ(y)|JF |(F −1 (y)).
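As a numerical illustration of (8.11), we can take F to be the polar coordinates of Example 8.5 and ϕ(y) = exp(−|y|²), whose integral over R² equals π. The sketch below rests on assumptions of ours: both integrals are truncated at radius R = 6 (the neglected tails are of order exp(−36)), and a plain midpoint rule with hand-picked grid sizes replaces the Lebesgue integral.

```python
import math

def midpoint_2d(g, a, b, c, d, n, m):
    """Midpoint-rule approximation of the integral of g over [a,b] x [c,d]."""
    hx, hy = (b - a) / n, (d - c) / m
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx
        for j in range(m):
            y = c + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy

def phi(y1, y2):
    return math.exp(-(y1 * y1 + y2 * y2))

def F(rho, theta):
    return (rho * math.cos(theta), rho * math.sin(theta))

R = 6.0
# Left-hand side of (8.11): integral of phi in Cartesian coordinates.
lhs = midpoint_2d(phi, -R, R, -R, R, 300, 300)
# Right-hand side of (8.11): integral of phi(F(x)) |JF|(x), with |JF| = rho.
rhs = midpoint_2d(lambda rho, theta: phi(*F(rho, theta)) * rho,
                  0.0, R, 0.0, 2 * math.pi, 600, 60)
```

Both `lhs` and `rhs` should agree with π up to the discretization error of the midpoint rule.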

Exercises
8.1 Let (X, F ), (Y, G ) and (Z , H ) be measurable spaces and let f : X → Y ,
g : Y → Z be measurable maps. Show that

g# ( f # μ) = (g ◦ f )# μ

for any measure μ in (X, F ).


8.2 Let f : {0, 1}^N → [0, 1] be the map associating to a sequence (a_i ) ⊂ {0, 1}
the real number Σ_i a_i 2^{−i−1} ∈ [0, 1]. Show that
\[
f_\#\Big( \mathop{\times}_{i=0}^{\infty} \big( \tfrac12 \delta_0 + \tfrac12 \delta_1 \big) \Big) = 1_{[0,1]}\, \mathscr{L}^1.
\]

8.3 Show the existence of a strictly increasing and C 1 function F : R → R
such that F# L 1 is not absolutely continuous with respect to L 1 .
8.4 Remove the injectivity assumption in Theorem 8.4, showing that
\[
F_\#(1_U \mathscr{L}^n) = \Big( \sum_{x \in F^{-1}(y)\setminus C_F} \frac{1}{|JF|(x)} \Big)\, 1_{F(U\setminus C_F)}\, \mathscr{L}^n
\]

for any C 1 function F : U → Rn with Lebesgue negligible critical set.
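A finite check related to Exercise 8.2 may help to guess why the statement holds; it is not a proof. In the sketch below (the name `prefix_interval` and the depth k = 4 are our own choices) we list the 2^k cylinders of sequences with prescribed first k digits: each has product measure 2^{−k}, and its image under f is a closed dyadic interval of length 2^{−k}.

```python
from fractions import Fraction
from itertools import product

def prefix_interval(bits):
    """Image under f of the cylinder of sequences starting with `bits`."""
    k = len(bits)
    # digits are indexed from i = 0, and a_i carries the weight 2^(-i-1)
    left = sum(Fraction(a, 2 ** (i + 1)) for i, a in enumerate(bits))
    return left, left + Fraction(1, 2 ** k)

k = 4
cylinders = list(product((0, 1), repeat=k))
intervals = sorted(prefix_interval(b) for b in cylinders)
mass = Fraction(1, 2 ** k)   # product measure of each cylinder
```

The intervals tile [0, 1] with overlaps only at endpoints, so the image measure and the Lebesgue measure agree on all dyadic intervals of generation k.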


Appendix A

A.1. Continuity and differentiability of functions depending on a parameter
In this section we consider the following problem: we are given a metric
space (X, d) and a measure space (Y, F , μ). Given f : X × Y → R, we
assume that for all x ∈ X the function f (x, ·) is μ–integrable, so that the
function F : X → R given by
\[
F(x) := \int_Y f(x,y)\,d\mu(y), \qquad x \in X,
\]
is well defined. We would like to understand under which conditions F,
an integral depending on the parameter x, is continuous. When X is an
open subset of Rn endowed with the Euclidean distance, it is also natural
to investigate the differentiability properties of F.
Theorem A.1 (Continuity of F). Assume that f (·, y) is continuous in
X for μ-almost all y ∈ Y and that there exists m ∈ L 1 (Y, μ) satisfying

\[
\sup_{x\in X} |f(x,y)| \le m(y) \qquad \text{for } \mu\text{-a.e. } y \in Y. \tag{A.1}
\]
Then F is bounded and continuous in X.


Proof. It is clear that |F(x)| ≤ ‖m‖_1 for all x ∈ X. Continuity is a
simple consequence of the dominated convergence theorem: indeed, if
xn ∈ X converge to x, then f (xn , y) converge to f (x, y) for μ-almost
every y and the convergence is dominated because of (A.1). It follows
that F(xn ) → F(x).
A more expressive way to state the continuity of F is to say that limit
and integral commute, namely
\[
\lim_{h\to\infty} \int_Y f(x_h,y)\,d\mu(y) = \int_Y \lim_{h\to\infty} f(x_h,y)\,d\mu(y).
\]

The following example shows that if no uniform upper bound is imposed
on f , then continuity might fail:
Example A.2. Let X = Y = R, μ = L 1 and
\[
f(x,y) := \begin{cases} |x|\,(1 - |y||x|) & \text{if } |y||x| < 1; \\ 0 & \text{if } |y||x| \ge 1. \end{cases}
\]
Then F(x) = 1 for all x ≠ 0, while F(0) = 0. In this case the smallest
possible function satisfying (A.1) is sup_x f (x, y) = (4|y|)^{−1} , which is not
integrable.
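The discontinuity of F at 0 in Example A.2 is easy to observe numerically. The sketch below is an illustration under assumptions of ours: the midpoint rule with an even number n of nodes (which is exact on each linear piece of the tent function f (x, ·)) replaces the integral, and the sample values of x are arbitrary.

```python
def f(x, y):
    a = abs(x) * abs(y)
    return abs(x) * (1.0 - a) if a < 1.0 else 0.0

def F(x, n=2000):
    """Midpoint approximation of the integral of f(x, .) over R."""
    if x == 0:
        return 0.0                 # f(0, y) = 0 for every y
    L = 1.0 / abs(x)               # f(x, .) vanishes outside [-L, L]
    h = 2 * L / n
    return sum(f(x, -L + (k + 0.5) * h) for k in range(n)) * h

values = {x: F(x) for x in (1.0, 0.1, 0.001, 0.0)}
```

The computed values stay equal to 1 (up to rounding) as x approaches 0, while the value at x = 0 is 0.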
Next, we assume that X is an open set of Rn endowed with the Euclidean
distance and we investigate the differentiability of F. Under suitable
assumptions, we can commute derivative and integral, namely
\[
\frac{\partial}{\partial x_i} \int_Y f(x,y)\,d\mu(y) = \int_Y \frac{\partial f}{\partial x_i}(x,y)\,d\mu(y) \qquad \forall x \in X,\ i = 1, \ldots, n. \tag{A.2}
\]
Theorem A.3 (Differentiability of F). Assume that for μ-almost all y ∈
Y the function f (·, y) is differentiable in X with a continuous gradient
∇x f (x, y) and that, for any ball Br (x0 ) ⊂ X, there exists m ∈ L 1 (Y, μ)
satisfying

\[
|f(x_0,y)| + \sup_{x\in B_r(x_0)} |\nabla_x f|(x,y) \le m(y) \qquad \text{for } \mu\text{-a.e. } y \in Y. \tag{A.3}
\]
Then F ∈ C 1 (X) and (A.2) holds.


Proof. We fix x0 ∈ X, i ∈ {1, . . . , n} and a sequence of real numbers
t_h ≠ 0 with t_h → 0. The mean value theorem, applied for any y such that
f (·, y) ∈ C 1 (X), gives θ_h (y) ∈ (0, 1) satisfying
\[
\frac{F(x_0 + t_h e_i) - F(x_0)}{t_h} = \int_Y \frac{\partial f}{\partial x_i}(x_0 + \theta_h(y)\, t_h e_i, y)\,d\mu(y).
\]
For h large enough (as soon as |t_h | < r) the functions of y inside the
integral are dominated by the function m in (A.3), hence we can pass to
the limit with the dominated convergence theorem to get (notice that the
measurability of ∂ f /∂ xi (x0 , ·) follows by the same limiting process)
\[
\frac{\partial F}{\partial x_i}(x_0) = \int_Y \frac{\partial f}{\partial x_i}(x_0,y)\,d\mu(y).
\]
Finally, continuity of partial derivatives of F is a consequence of the
previous theorem.
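Formula (A.2) can be tested on a concrete case. In the sketch below the choices are ours and purely illustrative: Y = [0, 1] with the Lebesgue measure, f (x, y) = cos(xy), so that F(x) = sin(x)/x and (A.3) holds with m ≡ 2; integrals are approximated by the midpoint rule and the derivative of F by a central difference quotient.

```python
import math

def integrate(g, a, b, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) for k in range(n)) * h

def F(x):
    return integrate(lambda y: math.cos(x * y), 0.0, 1.0)

def dF_under_integral(x):
    # integral of the partial derivative of f with respect to x
    return integrate(lambda y: -y * math.sin(x * y), 0.0, 1.0)

x0, t = 1.3, 1e-5
difference_quotient = (F(x0 + t) - F(x0 - t)) / (2 * t)
closed_form = (x0 * math.cos(x0) - math.sin(x0)) / x0 ** 2   # derivative of sin(x)/x
```

The difference quotient of F, the integral of ∂ f /∂ x and the closed form of the derivative should agree up to discretization error.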

Of course similar statements can be given for k-th order derivatives of F,
provided f (·, y) is k times differentiable and, for any ball Br (x0 ) ⊂ Rn
there exists m ∈ L 1 (Y, μ) satisfying
\[
|f(x_0,y)| + \sup_{x\in B_r(x_0)}\ \sup_{|p|\le k} |D^p f|(x,y) \le m(y) \qquad \text{for } \mu\text{-a.e. } y \in Y
\]
(here p = ( p1 , . . . , pn ) and | p| = p1 + · · · + pn ). Under this assumption
one obtains that
\[
D^p \int_Y f(x,y)\,d\mu(y) = \int_Y D_x^p f(x,y)\,d\mu(y) \qquad \text{whenever } |p| \le k.
\]

A.2. The dual space of continuous functions


In this section we want to characterize the space (C(X))∗ , dual space of
C(X), with (X, d) compact metric space. Recall that C(X) is a Banach
space, when endowed with the sup norm, regardless of any assumption
on (X, d). Some knowledge of the basic terminology of Banach spaces
(dual space, dual norm) is needed for this section.
We start with some notation: we shall denote by M (X) the space of
signed measures μ, i.e. the real-valued and σ -additive set functions μ,
defined on B (X), of the form μ = μ+ − μ− with μ± positive and finite
Borel measures satisfying μ+ ⊥ μ− .
This orthogonality condition ensures uniqueness of the decomposition
of μ, as we will see in a moment; existence, instead, is just a consequence
of the σ -additivity (see Section 6.5), but we shall not use this fact in the
sequel.
For μ ∈ M (X) we denote |μ| = μ+ + μ− its total variation measure,
as in Section 6.5, and set
\[
\|\mu\| := |\mu|(X) = \mu^+(X) + \mu^-(X). \tag{A.4}
\]
In the next proposition we show that the decomposition μ = μ+ − μ−
is unique, so that (A.4) is well posed, and that M (X) is a normed space.
The completeness of M (X) will be a consequence of Theorem A.6, since
any dual space is complete.
Proposition A.4. For any μ ∈ M (X) the decomposition μ = μ+ − μ−
is unique. In addition M (X), endowed with the norm (A.4), is a normed
space.
Proof. Assume that μ = μ+ − μ− = μ̃+ − μ̃− , with orthogonal decom-
positions. Let A be a Borel set where μ+ is concentrated, so that μ− is
concentrated on X \ A, and let à be an analogous Borel set for μ̃± . Since

μ ≥ 0 (respectively μ ≤ 0) on the subsets of A (respectively of X \ A)


and the same property holds for Ã, we obtain that μ (and therefore μ±
and μ̃± ) vanishes on subsets of A \ à and of à \ A. On the other hand, if
B ⊂ A ∩ Ã we have μ− (B) = μ̃− (B) = 0 and

μ+ (B) = μ(B) = μ̃+ (B).

Analogously, if B ⊂ (X \ A) ∩ (X \ Ã) we have μ+ (B) = μ̃+ (B) = 0


and μ− (B) = μ̃− (B). This proves that μ± = μ̃± .
Now, stability of M (X) under multiplication with real constants and
1-homogeneity of the norm are obvious. Let us prove stability under
addition and subadditivity of the norm: if μ = μ+ −μ− and ν = ν + −ν −
we can write as before μ = f |μ| and ν = g|ν| with f, g : X → [−1, 1].
Then, setting σ = |μ|+|ν|, the Radon–Nikodým theorem gives |μ| = aσ
and |ν| = bσ for suitable a, b : X → [0, 1], so that

μ + ν = f |μ| + g|ν| = ( f a + gb)σ

and we may take ( f a + gb)± σ as positive and negative parts of μ + ν.


We obtain also
\[
\|\mu + \nu\| = \int_X |fa + gb|\,d\sigma \le \int_X \big(|a| + |b|\big)\,d\sigma = \|\mu\| + \|\nu\|.
\]

This completes the proof of the proposition.


We shall also denote by A (X) the collection of open subsets of X
and use the following characterization of set functions defined on A (X)
which are restrictions of σ -additive measures defined on the Borel σ -
algebra.
Proposition A.5. Let (X, d) be a compact metric space and let α :
A (X) → [0, +∞] be a nondecreasing set function satisfying α(∅) = 0
and:
(i) (continuity) if An ∈ A (X), n ∈ N, monotonically converge from
below to A, then α(An ) ↑ α(A);
(ii) (subadditivity) α(A1 ∪ A2 ) ≤ α(A1 ) + α(A2 ) for all A1 , A2 ∈
A (X);
(iii) (additivity on disjoint sets) α(A1 ∪ A2 ) = α(A1 ) + α(A2 ) whenever
A1 ∈ A (X) and A2 ∈ A (X) are disjoint.
Then
α̃(B) := inf {α(A) : A ∈ A (X), A ⊃ B} (A.5)
is a σ -additive extension of α to B (X).

Proof. Notice first that α is σ –subadditive on A (X): indeed, if A ⊂ ∪_i Ai
and B is an open set with compact closure in A, then B is contained in
the union of finitely many Ai ’s, so that (ii) gives
\[
\alpha(B) \le \sum_{i=1}^{\infty} \alpha(A_i).
\]
Since B is arbitrary, (i) gives α(A) ≤ Σ_i α(Ai ).
Now, if we take (A.5) as the definition of α̃ for all subsets of X, Proposition
1.16 gives that α̃ extends α and is σ –subadditive. Then, Theorem 1.17
gives that α̃ is σ –additive on the Borel σ –algebra, provided
we are able to show that any Borel set is α̃–additive. Since the class of
additive sets is a σ –algebra, it suffices to show that any closed set is
α̃–additive.
To this aim, we first show that α̃ is additive on distant sets, namely
(recall that dist(U, V ) is the infimum of the distances d(x, y) for x ∈ U
and y ∈ V )

α̃(B1 ∪ B2 ) = α̃(B1 ) + α̃(B2 ) whenever dist(B1 , B2 ) > 0. (A.6)

Indeed, if A ⊃ B1 ∪ B2 is open we can consider the disjoint open sets

A1 := {x ∈ A : dist(x, B1 ) < dist(x, B2 )} ,


A2 := {x ∈ A : dist(x, B2 ) < dist(x, B1 )}

containing B1 and B2 respectively to get

α(A) ≥ α(A1 ∪ A2 ) = α(A1 ) + α(A2 ) ≥ α̃(B1 ) + α̃(B2 ).

Since A is arbitrary the inequality ≥ in (A.6) follows, while the converse


one is a consequence of subadditivity.
Let F ⊂ X be closed, B ⊂ X and let us prove that α̃(B ∩ F) + α̃(B \
F) ≤ α̃(B) (the opposite inequality follows by subadditivity). Assuming
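with no loss of generality α̃(B) < ∞ and setting
\[
B_h := \big\{ x \in B : 2^h > \mathrm{dist}(x,F) \ge 2^{h-1} \big\}, \qquad h \in \mathbb{Z},
\]
the additivity on distant sets gives
\[
\sum_{h\in\mathbb{Z}} \tilde\alpha(B_{2h}) \le \tilde\alpha(B) < \infty, \qquad \sum_{h\in\mathbb{Z}} \tilde\alpha(B_{2h+1}) \le \tilde\alpha(B) < \infty
\]
because all finite sums are made on distant sets, all contained in B. We
have then that Σ_{h∈Z} α̃(Bh ) is convergent and, since the sets Bh are a
partition of B \ F, using once more the additivity on distant sets we get
\[
\begin{aligned}
\tilde\alpha(B \cap F) + \tilde\alpha(B \setminus F)
&\le \tilde\alpha(B \cap F) + \tilde\alpha\Big( \bigcup_{h=-\infty}^{N} B_h \Big) + \sum_{h=N+1}^{\infty} \tilde\alpha(B_h) \\
&= \tilde\alpha\Big( (B \cap F) \cup \bigcup_{h=-\infty}^{N} B_h \Big) + \sum_{h=N+1}^{\infty} \tilde\alpha(B_h) \\
&\le \tilde\alpha(B) + \sum_{h=N+1}^{\infty} \tilde\alpha(B_h)
\end{aligned}
\]
for any N ≥ 1. Letting N → ∞ the inequality follows.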


For g ∈ C(X) we can define
\[
\int_X g\,d\mu := \int_X g\,d\mu^+ - \int_X g\,d\mu^-.
\]

In this way ∫_X g dμ is linear w.r.t. g; in addition, since
\[
\int_X h\,d\mu = \int_0^\infty \mu^+(\{h > t\})\,dt - \int_0^\infty \mu^-(\{h > t\})\,dt = \int_0^\infty \mu(\{h > t\})\,dt
\]
whenever h is nonnegative, splitting g in positive and negative part we
obtain that ∫_X g dμ is also linear w.r.t. μ. Since
\[
\Big| \int_X g\,d\mu \Big| \le \int_X |g|\,d\mu^+ + \int_X |g|\,d\mu^- \le \max|g|\,\|\mu\| = \|g\|\,\|\mu\| \qquad \forall g \in C(X),
\]
the functional
\[
L_\mu(g) := \int_X g\,d\mu, \qquad g \in C(X), \tag{A.7}
\]
belongs to (C(X))∗ and satisfies ‖L μ ‖ ≤ ‖μ‖. The remarkable fact


is that any element in the dual is representable in this form, and that
equality holds. This will also prove that M (X) is a Banach space (with
the definition of M (X) given above, independent of Section 6.5, it is not
even totally obvious that it is a linear space!).

Theorem A.6 (Riesz). Let (X, d) be a compact metric space. The space
(C(X))∗ is, via (A.7), isomorphic and isometric to M (X). That is: all
functionals L μ belong to (C(X))∗ and, for any L ∈ (C(X))∗ , there exists
a unique μ ∈ M (X) satisfying L = L μ . Finally, ‖L μ ‖ = ‖μ‖.

Proof. The proof will be achieved in three steps. In the first one we build
an auxiliary positive finite measure μ∗ and prove in the second one that
μ∗ provides the desired representation of L when L is nondecreasing.
In the last one we achieve the general case and provide equality of the
norms.
Step 1. Let α ∗ : A (X) → [0, +∞) be defined by

α ∗ (A) := sup {|L(g)| : |g| ≤ 1, supp g ⊂ A} .

Notice that α ∗ (X) ≤ ‖L‖ and that α ∗ (∅) = 0. Notice also that we can
equivalently replace |L(g)| with L(g) inside the supremum and that a
simple approximation argument gives

α ∗ (A) ≥ |L(g)| whenever |g| ≤ 1 A . (A.8)

Indeed, if |g| ≤ 1 A we can find continuous functions gn : X → [−1, 1]


convergent to g and with support contained in A. In addition, if L is
monotone we have also

α ∗ (A) ≤ L(χ) whenever 1 A ≤ χ. (A.9)

We claim that α ∗ satisfies all the assumptions of Proposition A.5. Indeed,
if the open sets Ai monotonically converge from below to A and g ∈ C(X),
with |g| ≤ 1, has support K contained in A, since K is compact
we have K ⊂ Ai for i large enough; it follows that L(g) ≤ α ∗ (Ai ) ≤
sup j α ∗ (A j ) and since g is arbitrary the continuity follows. In order to
prove the subadditivity, given a continuous g : X → [−1, 1] with support
K contained in A1 ∪ A2 , we can consider the disjoint compact sets K \
A1 and K \ A2 and a continuous function χ : X → [0, 1] identically
equal to 1 in a neighbourhood of K \ A1 and identically equal to 0 in a
neighbourhood of K \ A2 . It follows that (1 − χ)g has support contained
in A1 and χg has support contained in A2 , hence

L(g) = L((1 − χ)g) + L(χg) ≤ α ∗ (A1 ) + α ∗ (A2 ).

Since g is arbitrary, the subadditivity of α ∗ follows. Finally, to prove the


additivity on disjoint sets it suffices to notice that, given gi with support
in Ai and |gi | ≤ 1, the function g = g1 + g2 has support in A1 ∪ A2 and
satisfies L(g) = L(g1 ) + L(g2 ) and |g| ≤ 1.

By Proposition A.5 we obtain that α ∗ is the restriction to A (X) of a


positive measure μ∗ . Notice also that μ∗ is finite, since
μ∗ (X) = α ∗ (X) = ‖L‖. (A.10)
Step 2. Now we claim that L μ∗ ≥ |L|, namely L μ∗ (g) ≥ |L(g)| for any
nonnegative g ∈ C(X). Also, we shall prove that if L is nondecreasing,
namely L(g) ≥ 0 whenever g ∈ C(X) is nonnegative, then L μ∗ coincides
with L. This proves already Riesz theorem for positive functionals.
By homogeneity, in the proof of the inequality L μ∗ (g) ≥ |L(g)|, it is
not restrictive to assume 0 ≤ g ≤ 1. Given an integer N ≥ 1, let us
consider the open sets Ai := {g > i/N }, i = 0, . . . , N − 1, and notice
that
\[
\frac1N + \frac1N \sum_{i=1}^{N-1} 1_{A_i} \ge g \ge \frac1N \sum_{i=1}^{N-1} 1_{A_i}. \tag{A.11}
\]

Now, given continuous functions χi : X → [0, 1] satisfying 1 Ai ≤ χi ≤
1 Ai−1 , i = 1, . . . , N , we can use (A.8) to estimate
\[
L_{\mu^*}(g) \ge \frac1N \sum_{i=1}^{N-1} \mu^*(A_i) \ge \frac1N \sum_{i=1}^{N-1} |L(\chi_{i+1})| \ge \Big| L\Big( \frac1N \sum_{i=2}^{N} \chi_i \Big) \Big|.
\]

But, since
\[
\frac1N + \frac1N \sum_{i=1}^{N-1} \chi_i \ge g \ge \frac1N \sum_{i=1}^{N-1} \chi_{i+1} \tag{A.12}
\]
we can let N → ∞ and use the continuity of L to get L μ∗ (g) ≥ |L(g)|.
If L is also monotone we can use the inequality (A.9) to get
\[
L_{\mu^*}(g) - \frac1N \le \frac1N \sum_{i=1}^{N-1} \mu^*(A_i) \le \frac1N \sum_{i=1}^{N-1} L(\chi_i) = L\Big( \frac1N \sum_{i=1}^{N-1} \chi_i \Big).
\]
Again we can let N → ∞ and use (A.12) to get L μ∗ (g) = L(g).


Step 3. Now we define linear continuous functionals L ± : C(X) → R
by
\[
L^+(g) := \frac{L_{\mu^*}(g) + L(g)}{2}, \qquad L^-(g) := \frac{L_{\mu^*}(g) - L(g)}{2}.
\]
We have L + + L − = L μ∗ and L + − L − = L. In addition, by Step 2, L ±
are monotone.
Now we can apply the construction of Step 1 and use monotonicity in
Step 2 to find positive finite measures μ± such that L ± = L μ± . It follows
that
\[
L = L^+ - L^- = L_{\mu^+} - L_{\mu^-} = L_\mu
\]
and the representation of L follows. Analogously, we obtain that
\[
L_{\mu^*} = L^+ + L^- = L_{\mu^+} + L_{\mu^-} = L_{\mu^+ + \mu^-}
\]
so that μ∗ = μ+ + μ− . To conclude, we identify ‖μ‖ with ‖L‖ and show
that μ+ and μ− are orthogonal. The bound on ‖μ‖ follows by (A.10):
\[
\|\mu\| = \mu^+(X) + \mu^-(X) = L_{\mu^+}(1) + L_{\mu^-}(1) = L_{\mu^*}(1) = \|L\|.
\]

In order to show that μ+ ⊥ μ− , write μ± = a ± μ∗ and use the identity
μ∗ = μ+ + μ− to get a + + a − = 1 μ∗ –a.e. in X. On the other hand
the density of C(X) in L 2 (X, μ∗ ) and a truncation argument provide
a sequence of continuous functions gn : X → [−1, 1] convergent in
L 2 (X, μ∗ ) to the sign of a + − a − , so that
\[
\|L\| = \sup_{|g|\le 1} |L_\mu(g)| = \sup_{|g|\le 1} \Big| \int_X (a^+ - a^-)\, g\,d\mu^* \Big| = \int_X |a^+ - a^-|\,d\mu^*.
\]
Hence
\[
\int_X \big(1 - |a^+ - a^-|\big)\,d\mu^* = \mu^*(X) - \|L\| \le 0.
\]
Since |a + − a − | ≤ 1 it must be |a + − a − | = 1 μ∗ –a.e. in X. Since
a ± ∈ [0, 1] μ∗ –a.e., this can only happen if a + a − = 0 μ∗ –a.e. in X,
which means that μ+ is orthogonal to μ− .
Remark A.7. A similar result holds, with minor changes in the proof,
if (X, d) is locally compact and separable, namely there exists a
nondecreasing sequence of open sets with compact closure whose union is
the whole of X. In this case C(X) has to be replaced by C0 (X), namely
the closure in C(X) of the space Cc (X) of compactly supported func-
tions, while M (X) remains unchanged.
Solutions of some exercises

In this chapter we provide solutions to the main exercises proposed in the


text, and in particular of those marked with one or two
.

Chapter 1
Exercise 1.1. All verifications are very simple and we omit them.
Exercise 1.2. We prove the statement for the translations, the proof for
the dilations being similar. Fix h ∈ R and consider the class

F := {A ∈ B (R) : A + h ∈ B (R)} .

Then F is a σ –algebra containing the intervals, because the class I of


intervals is invariant under translations. Therefore F ⊃ σ (I ) = B (R).
This proves that A + h is Borel whenever A is Borel.

Exercise 1.3. Set X = N and μ := Σ_n δn . Then the sets An := {n, n +
1, . . .} satisfy μ(An ) = +∞, but their intersection is empty.
Exercise 1.4. Let An ↑ A with An , A ∈ A . Then the sets Bn := A \ An
satisfy Bn ↓ ∅, so that by assumption μ(Bn ) ↓ μ(∅) = 0. Since μ is
finite, μ(Bn ) = μ(A) − μ(An ), so that μ(An ) ↑ μ(A).
Exercise 1.5. For any n ∈ N∗ the set An of all atoms x such that μ({x}) ≥
1/n has at most cardinality nμ(X): indeed, if we choose k elements
x1 , . . . , xk in this set, adding the inequalities μ({xi }) ≥ 1/n we find
k/n ≤ μ(X), whence the upper bound on the cardinality of An follows.
If μ is σ –finite, we choose X i ↑ X with X i ∈ E and μ(X i ) < ∞
and repeat the previous argument with the sets Ai,n := {x ∈ Aμ ∩ X i :
μ({x}) ≥ 1/n}, whose union gives Aμ . If no finiteness assumption is
made, the statement fails: take X = R, E = P (R) and μ(A) = 0 if
A = ∅ and μ(A) = +∞ otherwise.
Exercise 1.6. Let μ be diffuse. First we prove that for all τ ∈ (0, 1) and
all A ∈ E there exists a subset B ∈ E with 0 < μ(B) < τ μ(A). Indeed,

if this property fails for some τ and A, for all subsets B either μ(B) = 0
or μ(B) ≥ τ μ(A). Now, choose B1 ⊂ A with μ(B1 ) ∈ (0, μ(A)) (this is
possible by assumption), then B2 ⊂ A \ B1 with μ(B2 ) ∈ (0, μ(B1 )) and
so on. Since all these sets are contained in A, we have μ(Bi ) ≥ τ μ(A),
and this contradicts the fact that they are disjoint.
Now, given t ∈ (0, μ(X)) we define a sequence of pairwise disjoint
sets Bi and numbers si as follows: first set

s1 := sup {μ(B) : μ(B) ≤ t}

and then choose B1 with t ≥ μ(B1 ) > s1 /2; then recursively set
\[
s_{n+1} := \sup\big\{ \mu(B) : B \subset B_n^c,\ \mu(B) \le t - \mu(B_n) \big\}
\]
and choose Bn+1 ⊂ Bnc with t − μ(Bn ) ≥ μ(Bn+1 ) > sn+1 /2. We now
claim that μ(∪i Bi ) = t. If this property fails, then Σ_i μ(Bi ) < t and the
convergence of the series implies that si → 0. On the other hand
\[
s_i \ge \sup\Big\{ \mu(B) : B \subset X \setminus \bigcup_i B_i,\ \mu(B) \le t - \sum_i \mu(B_i) \Big\}.
\]
The previous property with A = X \ ∪i Bi and τ = (t − Σ_i μ(Bi ))/μ(A)
shows that the supremum in the right hand side (independent of i) is
positive, contradicting the fact that si → 0.
Exercise 1.7. Let X be a separable metric space and let E = B (X). If
μ({x}) > 0 for some x ∈ X, obviously μ is not diffuse. Conversely,
if A ∈ B (X) is given, with μ(A) > 0 and μ(B) ∈ {0, μ(A)} for all
B ⊂ A, we can fix a countable dense set (xi ) ⊂ X and define
\[
r_0 := \sup\big\{ r \ge 0 : \mu(A \cap \overline{B}_r(x_0)) = 0 \big\}.
\]
Since r ↦ μ(A ∩ B̄r (x0 )) is right continuous, the maximality of r0 easily
implies that μ(A ∩ B̄r0 (x0 )) > 0, and therefore μ(A ∩ B̄r0 (x0 )) = μ(A).
Now we iterate this construction, setting A1 := A ∩ B̄r0 (x0 ), defining
\[
r_1 := \sup\big\{ r \ge 0 : \mu(A_1 \cap \overline{B}_r(x_1)) = 0 \big\},
\]
so that μ(A1 ∩ B̄r1 (x1 )) = μ(A1 ) = μ(A). Continuing in this way, we
have a nonincreasing family of sets (Ai ) with μ(Ai ) = μ(A); it follows
that μ(∩_i Ai ) = μ(A) > 0. On the other hand, any point x ∈ ∩_i Ai
satisfies
\[
d(x, x_i) = r_i \qquad \forall i \in \mathbb{N}.
\]

By the density of the family (xi ), this intersection contains at most one
point (and at least one, because the measure is positive). It follows that
this point is an atom of μ.
Exercise 1.8. Cantor’s middle third set can be obtained as follows: let
C0 = [0, 1], let C1 be the set obtained from C0 by removing the interval
(1/3, 2/3), let C2 be the set obtained from C1 by removing the intervals
(1/9, 2/9) and (7/9, 8/9), and so on. Each set Cn consists of 2^n disjoint
closed intervals with length 3^{−n} , so that λ(Cn ) = (2/3)^n → 0. It follows
that the intersection C of all sets Cn is a closed and λ–negligible set.
In order to show that C has the cardinality of continuum (at this stage
it is not even obvious that C ≠ ∅!) we recall that numbers x ∈ [0, 1]
can be represented with a ternary, instead of a decimal, expansion: this
means that we can write
\[
x = \sum_{i\ge 1} a_i 3^{-i} = 0.a_1 a_2 a_3 \ldots
\]

with the ternary digits ai ∈ {0, 1, 2}. As for decimal expansions, this
representation is not unique; for instance 1/3 can be written either as 0.1
or as 0.0222 . . ., and 2/3 can be written either as 0.2 or as 0.1222 . . ..
It is easy to check that C1 corresponds to the set of numbers that can
be expressed by a ternary representation not having 1 as first digit, C2
corresponds to the set of numbers that admit a representation not having
1 as a first or second digit, and so on. It follows that C is the set of
numbers that admit a ternary representation not using the digit 1: since
the map
\[
(a_1, a_2, \ldots) \in \{0,2\}^{\mathbb{N}^*} \mapsto x = \sum_{i=1}^{\infty} a_i 3^{-i}
\]
provides a bijection of {0, 2}^{N∗} with C, and the cardinality of {0, 2}^{N∗} is
the continuum, this proves that C has the cardinality of continuum.
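The first steps of the construction of the Cantor set can be reproduced with exact rational arithmetic. This is only a sketch with names of our own (`cantor_step`, `cantor_level`, `from_digits`); it checks the count and total length of the intervals of Cn and the membership of a point with ternary digits in {0, 2}.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def cantor_level(n):
    """The closed intervals whose union is C_n."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = cantor_step(intervals)
    return intervals

def from_digits(digits):
    """Point of [0, 1] with the finite ternary expansion 0.a_1 a_2 ... a_k."""
    return sum(Fraction(a, 3 ** (i + 1)) for i, a in enumerate(digits))

n = 5
C_n = cantor_level(n)
total_length = sum(b - a for a, b in C_n)
x = from_digits([2, 0, 2, 2, 0])              # digits in {0, 2} only
in_C_n = any(a <= x <= b for a, b in C_n)
half_in_C_1 = any(a <= Fraction(1, 2) <= b for a, b in cantor_level(1))
```

Here `total_length` equals (2/3)^5 exactly, the point x belongs to C_5, and 1/2 (whose only ternary expansion is 0.111 . . .) is already excluded from C_1.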
Exercise 1.9. Let {qn }n∈N be an enumeration of the rational numbers in
[0, 1], and set
\[
A := \bigcup_{n=0}^{\infty} \Big( q_n - \frac{\varepsilon}{4}\, 2^{-n},\ q_n + \frac{\varepsilon}{4}\, 2^{-n} \Big).
\]
Then A ⊂ R is open and λ(A) < Σ_n ε2^{−n−1} = ε (why is the inequality
strict?). Therefore [0, 1] \ A has Lebesgue measure strictly greater than
1 − ε and an empty interior, because [0, 1] \ A does not intersect Q.
Exercise 1.11. Let {In }n∈N be an enumeration of the open intervals with
rational endpoints of (0, 1). By the construction in Exercise 1.9, for any

interval I and any δ ∈ (0, λ(I )) we can find a compact set C ⊂ I with
an empty interior such that 0 < λ(C) < δ. We will define
\[
E := \bigcup_{i=0}^{\infty} C_i
\]
where Cn ⊂ In are compact sets with an empty interior, λ(Cn ) > 0 and
λ(Cn ) < δn . The choice of Cn and δn will be done recursively. Notice
first that
\[
\lambda(E \cap I_n) \ge \lambda(C_n) > 0 \qquad \forall n \in \mathbb{N},
\]
so we have only to take care of the condition λ(E ∩ In ) < λ(In ). Set
βn = λ(In \ ∪_{i=0}^{n} Ci ) and notice that βn > 0 because all Ci have an empty
interior. Since
\[
\lambda(I_n \cap E) \le \lambda\Big( I_n \cap \bigcup_{i=0}^{n} C_i \Big) + \sum_{i=n+1}^{\infty} \delta_i = \lambda(I_n) - \beta_n + \sum_{i=n+1}^{\infty} \delta_i
\]
it suffices to choose δn (and Cn ) in such a way that Σ_{i=n+1}^{∞} δi < βn . This
is possible, choosing for instance δn+1 > 0 satisfying
\[
\delta_{n+1} < \min\Big\{ \frac12 \beta_n,\ \frac14 \beta_{n-1},\ \ldots,\ \frac{1}{2^{n+1}} \beta_0 \Big\},
\]
to get δi < 2^{n−i} βn for i > n.


Exercise 1.12. Let A be μ–measurable and let B, C ∈ E satisfy
A Δ B ⊂ C and μ(C) = 0. For any set D ⊂ X we have, by monotonicity
of μ∗ ,

μ∗ (D ∩ A) + μ∗ (D \ A) ≤ μ∗ (D ∩ (B ∪ C)) + μ∗ ((D \ B) ∪ C).

Since μ∗ (D ∩C) ≤ μ∗ (C) = μ(C) = 0, by using twice the subadditivity


of μ∗ and then the additivity of B we get

μ∗ (D ∩ A) + μ∗ (D \ A) ≤ μ∗ (D ∩ B) + μ∗ (D \ B) = μ∗ (D).

Since D is arbitrary, this proves that A is additive.


Exercise 1.13. The statement is trivial if μ∗ (A) = ∞. If not, for any
n ∈ N∗ we can find, by the definition of μ∗ , a countable union An of
sets ofN such that An ⊃ A and μ(An ) ≤ μ∗ (A) + 1/n. Then, setting
B := ∩_n An we have B ⊃ A and μ(B) ≤ infn μ∗ (A) + 1/n = μ∗ (A).
The inequality μ(B) ≥ μ∗ (A) follows by the monotonicity of μ∗ , taking
into account that μ∗ (B) = μ(B).

Exercise 1.14. E μ is a σ –algebra: stability under complement is immediate,
because A^c Δ B^c = A Δ B; if Ai Δ Bi ⊂ Ci , then (∪_i Ai ) Δ (∪_i Bi ) ⊂
∪_i Ci , and since μ–negligible sets are stable under countable unions, this
proves that E μ is stable under countable unions.
The extension μ(A) := μ(B), where B ∈ E is any set such that A Δ B
is contained in a μ–negligible set of E , is well defined and σ –additive on
E μ : if A Δ B ⊂ C and A Δ B′ ⊂ C′ , then B Δ B′ ⊂ C ∪ C′ ; consequently,
if μ(C) = μ(C′ ) = 0 it must be μ(B) = μ(B′ ). The σ –additivity can be
proven with an argument analogous to the one used to show that E μ is a
σ –algebra.
μ–negligible sets of E μ are characterized by the property of being con-
tained in a μ–negligible set of E : if A ∈ E μ is μ–negligible, there exist
μ–negligible sets B, C ∈ E with A Δ B ⊂ C; as a consequence A is
contained in the μ–negligible set B ∪ C ∈ E . Conversely, if A ⊂ X is
contained in a μ–negligible set C ∈ E we may take B = ∅ to conclude
that A ∈ E μ and μ(A) = 0.
Exercise 1.15. Let A be additive; by Exercise 1.13 we can find a set
B ∈ E containing A with μ(B) = μ∗ (A). The additivity of A and the
equality μ∗ (B) = μ(B) give

μ(B) = μ∗ (A) + μ∗ (B \ A).

As a consequence μ∗ (B \ A) = 0. Now we apply Exercise 1.13 again, to
find a μ–negligible set C ∈ E containing B \ A. It follows that A Δ B is
contained in C, and therefore A is μ–measurable.
Exercise 1.16. Let us first build a family of pairwise disjoint sets {Ai }i∈I
⊂ P (N), with I and all sets Ai having an infinite cardinality and ∪_i Ai =
N (the construction of the σ –algebra will be more clear if we keep I and
N distinct). The family {Ai } can be obtained, for instance, through a
bijective correspondence S between N × N and N, setting Ai := S({i} ×
N). Then, we define π : N → I by

π(n) = i, where i ∈ I is the unique index such that n ∈ Ai

and (with the convention π −1 (∅) = ∅)
\[
\mathscr{F} := \big\{ \pi^{-1}(J) : J \subset I \big\}.
\]
It is immediate to check that F is a σ –algebra, that Ai = π −1 ({i}) ∈ F
and that any nonempty set in F contains one of the sets Ai . Therefore
F contains infinitely many sets, and all of them except ∅ have an infinite
cardinality.

Exercise 1.17. It suffices to define μ(A) = 0 if A has a finite cardinality,
and +∞ otherwise. A finite union of sets has an infinite cardinality if and
only if at least one of the sets has an infinite cardinality, and this shows
that μ is additive.
The solutions of the next exercises require a more advanced knowledge
of set theory, and in particular the theory of ordinals, the transfinite in-
duction, the behavior of cardinality under unions and products, and Zorn
lemma. We shall denote by ω the smallest uncountable ordinal and by χ
the cardinality of continuum.
Exercise 1.18. Notice that F ( j) ⊂ σ (K ) implies
\[
\Big\{ \bigcup_{k=0}^{\infty} A_k,\ B^c : (A_k) \subset \mathscr{F}^{(j)},\ B \in \mathscr{F}^{(j)} \Big\} \subset \sigma(\mathscr{K}).
\]
k=0

Therefore, if i is the successor of j, we obtain F (i) ⊂ σ (K ); analogously,
if i has no predecessor, and F ( j) ⊂ σ (K ) for all j ∈ i, then
∪_{j∈i} F ( j) , namely F (i) , is contained in σ (K ). Using these two facts,
one obtains by transfinite induction that F (i) ⊂ σ (K ) for all i ∈ ω. An
analogous induction argument shows that F (i) ⊂ F ( j) whenever i ∈ j.
So, the union U := ∪_{i∈ω} F (i) is contained in σ (K ) and, to prove
that equality holds, it suffices to show that this union is a σ –algebra. Let
(Bk ) ⊂ U and let i k ∈ ω be such that Bk ∈ F (ik ) . Since i k are countable
and ω is uncountable we have i := ∪_k i k ∈ ω and all sets Bk belong to
F (i) . It follows that their union belongs to F ( j) , where j is the successor
of i, and therefore to U . An analogous (and simpler) argument proves
that U is stable under complement.
Exercise 1.19. Obviously B (R) has at least the cardinality of continuum,
so we need only to show an upper bound on the cardinality of B (R). The

proof is based on the fact that a union ∪_{i∈J} X i and a product ×_{i∈J} X i
have cardinality not greater than χ if the index set J and all sets X i have
cardinality not greater than χ. Let F (i) be defined as in Exercise 1.18,
with K having at most the cardinality of continuum. Using the previous
property of products, with J even countable, one can prove by transfinite
induction that, for all i ∈ ω, F (i) has at most cardinality χ. If we choose
as K the class of intervals, whose cardinality is (at most) χ, we find
\[
\mathscr{B}(\mathbb{R}) = \sigma(\mathscr{K}) = \bigcup_{i\in\omega} \mathscr{F}^{(i)}.
\]
Now we use the above mentioned property of unions, with J = ω and
X i = F (i) , to conclude that B (R) has at most the cardinality of continuum.

Exercise 1.20. Obviously $\mathcal{L}$ has cardinality not greater than the
cardinality of $\mathcal{P}(\mathbb{R})$; by the Bernstein theorem (1) it suffices to show that the
cardinality of $\mathcal{P}(\mathbb{R})$ is not greater than the cardinality of $\mathcal{L}$. If $C$ is the
Cantor set of Exercise 1.8, we know that $\mathcal{P}(\mathbb{R})$ is in one-to-one
correspondence with $\mathcal{P}(C)$, because $C$ has the cardinality of the continuum; on the
other hand, any subset of $C$ belongs to $\mathcal{L}$, because $C$ has null
Lebesgue measure.
Exercise 1.21. Let $\mathcal{E} \subset \mathcal{P}(X)$ be a σ–algebra. Assume by contradiction
that $\mathcal{E}$ is infinite and countable. We define the equivalence relation
\[
y \sim y' \quad \text{if and only if} \quad (y \in B \Leftrightarrow y' \in B) \ \ \forall B \in \mathcal{E}
\]
and let $\mathcal{F}$ be the partition of $X$ in equivalence classes. We now prove that
$\mathcal{F} \subset \mathcal{E}$. Indeed, let $F \in \mathcal{F}$ and fix $f \in F$; for any $x \notin F$ we have $f \not\sim x$,
so there must be $B \in \mathcal{E}$ such that $f \in B$ and $x \notin B$ (or the opposite, but
then we may consider $B^c$); given this set $B$, for any $g \in F$ we have that
$g \sim f$ implies $g \in B$, so that $F \subset B$. Since $x$ is arbitrary we conclude
that
\[
F = \bigcap_{B \in \mathcal{E},\ F \subset B} B.
\]
Now, since $\mathcal{E}$ is countable, it follows that $F \in \mathcal{E}$. We eventually note
that any set in $\mathcal{E}$ is a union of sets in $\mathcal{F}$: but then, if $\mathcal{F}$ were finite then $\mathcal{E}$
would be finite, whereas if $\mathcal{F}$ were infinite then $\mathcal{E}$ would be uncountable.
Exercise 1.22. We define $\mathcal{F}$ as in the solution of the previous exercise; in
this case it has finite cardinality, say $n$; consequently, there are $2^n$ sets in
$\mathcal{E}$.
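As an illustrative check (not part of the original solution), the following Python sketch enumerates all unions of the atoms of a finite partition and counts them. The function name `sigma_algebra_from_atoms` and the sample partition are assumptions made only for this example.

```python
from itertools import combinations


def sigma_algebra_from_atoms(atoms):
    """Return every union of atoms of a finite partition (as frozensets).

    Hypothetical helper: on a finite set, these unions are exactly the sets
    of the sigma-algebra whose atoms are the given partition classes.
    """
    sets = set()
    for r in range(len(atoms) + 1):
        for chosen in combinations(atoms, r):
            # The union of the chosen atoms; the empty choice gives the empty set.
            sets.add(frozenset().union(*chosen))
    return sets


# Sample partition of X = {0, ..., 5} into n = 3 atoms (an arbitrary choice).
atoms = [frozenset({0, 1}), frozenset({2}), frozenset({3, 4, 5})]
algebra = sigma_algebra_from_atoms(atoms)
```

With three atoms the enumeration produces eight sets, in agreement with the count $2^n$.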
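Exercise 1.23. We define $\mathcal{F}$ as in the solution of Exercise 1.21; we also
adapt the above argument to show again that $\mathcal{F} \subset \mathcal{E}$. Indeed, let $F \in \mathcal{F}$ and
fix $f \in F$; for any $x \notin F$ we have $f \not\sim x$, so there must be $B = B_{F,x} \in \mathcal{E}$
such that $f \in B$ and $x \notin B$; and again $F \subset B_{F,x}$. Hence
\[
F = \bigcap_{x \in X,\ x \notin F} B_{F,x}
\]
and this proves that $\mathcal{F} \subset \mathcal{E}$, since $X$ is countable. We then use the
Axiom of Choice to define a function $\phi : \mathcal{F} \to X$ such that $\phi(F) \in F$,
and eventually define $\tilde\mu = \sum_{F \in \mathcal{F}} \mu(F)\,\delta_{\phi(F)}$.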

(1) If A has cardinality not greater than B, and B has cardinality not greater than A, then there exists
a bijection between A and B

Exercise 1.24. We begin our construction with an algebra $\tau_0$ in $\mathcal{P}(\mathbb{N})$
and $\mu_0 : \tau_0 \to \{0, 1\}$ which is additive but not σ–additive. For instance
we may take as $\tau_0$ the algebra generated by the singletons $\{x\}$ with $x \in \mathbb{N}$
(i.e. the sets $A \subset \mathbb{N}$ such that either $A$ or $A^c$ is finite) and set
\[
\mu_0(A) := \begin{cases} 0 & \text{if $A$ is finite;} \\ 1 & \text{if $A^c$ is finite.} \end{cases}
\]
We will extend $\mu_0$ to an additive function, that we still denote by $\mu_0$,
defined on the whole of $\mathcal{P}(\mathbb{N})$. If such an extension exists, it cannot be
σ–additive, because $\mu_0(\{n\}) = 0$ for all $n \in \mathbb{N}$, while $\mu_0(\mathbb{N}) = 1$.
In the class $\mathcal{C}$ of pairs $(\tau, \mu)$ with $\tau$ an algebra and $\mu : \tau \to \{0, 1\}$
additive, we define the partial order relation $(\tau, \mu) \le (\tau', \mu')$ by $\tau \subset \tau'$
and $\mu'|_{\tau} = \mu$; then we consider the class $\mathcal{C}_0$ of all $(\tau, \mu)$ satisfying
$(\tau, \mu) \ge (\tau_0, \mu_0)$. By Zorn's lemma, we can find a maximal $(\bar\tau, \bar\mu)$ in this
class: indeed, it is easy to check that any totally ordered chain $I \subset \mathcal{C}_0$
has an upper bound $(\tau', \mu')$, defined by
\[
\tau' := \bigcup_{(\tau,\mu) \in I} \tau \quad \text{and} \quad \mu'(A) := \mu(A) \ \ \text{where } A \in \tau,\ (\tau, \mu) \in I.
\]
We will show that the maximality of $(\bar\tau, \bar\mu)$ forces $\bar\tau$ to coincide with
$\mathcal{P}(\mathbb{N})$, so that $\bar\mu$ will be the desired extension of $\mu_0$.
Let us assume by contradiction that $\bar\tau \subsetneq \mathcal{P}(\mathbb{N})$ and choose $Z \subset \mathbb{N}$
with $Z \notin \bar\tau$. We notice that
\[
\big\{ (A_1 \cap Z) \cup (A_2 \cap Z^c) : A_1, A_2 \in \bar\tau \big\}
\]
is the algebra generated by $\bar\tau \cup \{Z\}$. Moreover, either $Z$ or $Z^c$ satisfies the
following property:
\[
\text{for all } A \in \bar\tau \text{ with } \bar\mu(A) = 1, \quad Z \cap A \neq \emptyset. \tag{A.13}
\]
If not, we would be able to find $A_1, A_2 \in \bar\tau$ with $A_1 \cap Z = A_2 \cap Z^c = \emptyset$
and $\bar\mu(A_1) = \bar\mu(A_2) = 1$, so that $A_1$ and $A_2$ would be disjoint and
$\bar\mu(A_1 \cup A_2) = 2$, contradicting the fact that $\bar\mu$ maps $\bar\tau$ into $\{0, 1\}$. Possibly
replacing $Z$ by its complement we shall assume that $Z$ fulfils (A.13).
Now we extend $\bar\mu$ to the algebra generated by $\bar\tau \cup \{Z\}$, as follows:
\[
\tilde\mu(B) := \bar\mu(A_1) \quad \text{whenever } A_1, A_2 \in \bar\tau \text{ and } B = (A_1 \cap Z) \cup (A_2 \cap Z^c). \tag{A.14}
\]
Let us check that $\tilde\mu$ is well defined and additive.

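1. $\tilde\mu$ is well defined: if
\[
B = (A_1 \cap Z) \cup (A_2 \cap Z^c) = (A_3 \cap Z) \cup (A_4 \cap Z^c)
\]
then $A_1 \cap Z = A_3 \cap Z$, and if $\bar\mu(A_1) \neq \bar\mu(A_3)$ then one of the
two numbers, say $\bar\mu(A_1)$, equals 1, while $\bar\mu(A_3) = 0$. Defining $A :=
A_1 \setminus A_3$ we have $\bar\mu(A) = 1$ and $A \cap Z = \emptyset$, contradicting (A.13).
2. $\tilde\mu$ is additive: suppose that $B$, $B'$ are disjoint sets in the algebra generated
by $\bar\tau \cup \{Z\}$. Let $B = (A_1 \cap Z) \cup (A_2 \cap Z^c)$
and $B' = (A_1' \cap Z) \cup (A_2' \cap Z^c)$. Then $A_1 \cap A_1' \cap Z = \emptyset$. Setting
$A_1'' := A_1' \setminus A_1$ we still have $B' = (A_1'' \cap Z) \cup (A_2' \cap Z^c)$, and then we
can use the additivity of $\bar\mu$ to conclude that
\[
\tilde\mu(B \cup B') = \bar\mu(A_1 \cup A_1'') = \bar\mu(A_1) + \bar\mu(A_1'') = \tilde\mu(B) + \tilde\mu(B').
\]
If $B \in \bar\tau$ we can choose $A_1 = A_2 = B$ in (A.14) to obtain that $\tilde\mu(B) =
\bar\mu(B)$, so that $\tilde\mu$ extends $\bar\mu$ to the algebra generated by $\bar\tau \cup \{Z\}$. This
violates the maximality of $(\bar\tau, \bar\mu)$.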
Exercise 1.25. We obviously need only show that the cardinality of
$C$ is at least equal to the continuum. By the inner regularity of $\lambda$ we
can assume with no loss of generality that $C$ is closed. Now, we define
$A = (0, 1) \setminus C$ and
\[
g(t) := \lambda\big([0, t] \cap C\big), \qquad t \in [0, 1].
\]
This function maps $[0, 1]$ continuously onto $[0, \lambda(C)]$, and
it is constant on any connected component of $A$, so that $g(A)$ is at most
countable. Since $g(C)$ contains $[0, \lambda(C)] \setminus g(A)$ we obtain that $C$ has
cardinality at least equal to the continuum (one can actually see that
$g(C) = g([0, 1])$).
Exercise 1.26. Since $K$ is totally bounded, for all $\varepsilon > 0$ there exist
finitely many balls $B_1, \ldots, B_N$ with radius $\varepsilon$ whose union covers $K$. The
properties of $\mu$ imply the existence of an index $i$ such that $\mu(\{n : x_n \in
B_i\}) = 1$. Now we start with $\varepsilon = 1$ and find a closed ball $B^{(1)}$ with
radius 1 such that $\mu(\{n : x_n \in B^{(1)}\}) = 1$. Repeating this construction
in $B^{(1)}$ we find a closed ball $B^{(2)}$ with radius 1/2 contained in $B^{(1)}$ with
$\mu(\{n : x_n \in B^{(2)}\}) = 1$. Continuing in this way, if $z$ is the common point
of the balls $B^{(i)}$, we find that $x_n$ μ-converges to $z$.

Chapter 2
Exercise 2.1 The verification is straightforward and is omitted.

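Exercise 2.2. Let $\varphi, \psi : X \to \mathbb{R}$ be $\mathcal{E}$–measurable. If $\varphi(x) + \psi(x) < t$
we can find a rational number $r$ such that $\varphi(x) < r$ and $\psi(x) < t - r$,
hence
\[
\{\varphi + \psi < t\} = \bigcup_{r \in \mathbb{Q}} \big[\{\varphi < r\} \cap \{\psi < t - r\}\big].
\]
This proves that $\varphi + \psi$ is $\mathcal{E}$–measurable. Analogously, since
\[
\{\varphi^2 > a\} = \{\varphi > \sqrt{a}\} \cup \{\varphi < -\sqrt{a}\}, \qquad a \ge 0,
\]
we obtain that $\varphi^2$ is measurable. Considering the difference $(\varphi + \psi)^2 -
(\varphi - \psi)^2 = 4\varphi\psi$ we obtain that $\varphi\psi$ is $\mathcal{E}$–measurable.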
Exercise 2.3. (i) The verification of the axioms of distance is immediate.
In order to prove the compactness of $\overline{\mathbb{R}}$, let us consider a sequence $(x_n) \subset
\overline{\mathbb{R}}$. If $\sup_n x_n = +\infty$ we can find for any $k$ an index $n(k)$ such that
$x_{n(k)} \ge k$; it follows that $d(x_{n(k)}, +\infty) = |\arctan x_{n(k)} - \pi/2|$ tends to
0, so that $x_{n(k)} \to +\infty$ in the metric space. Analogously, if $\inf_n x_n =
-\infty$ we can find a subsequence converging to $-\infty$ in $(\overline{\mathbb{R}}, d)$. Finally, if
both $\sup_n x_n$ and $\inf_n x_n$ are finite, the sequence $(x_n)$ is bounded and we
can extract, thanks to the Bolzano–Weierstrass theorem, a subsequence
$x_{n(k)}$ converging to $x \in \mathbb{R}$. The continuity of $z \mapsto \arctan z$ implies that
$x_{n(k)} \to x$ in $(\overline{\mathbb{R}}, d)$. To prove the equivalence of the two topologies,
let us work with closed sets: if $C \subset \mathbb{R}$ is closed with respect to the
$(\overline{\mathbb{R}}, d)$ topology, then it is closed with respect to the Euclidean topology,
because $|x_n - x| \to 0$ implies $|\arctan x_n - \arctan x| \to 0$. On the other
hand, if $|\arctan x_n - \arctan x| \to 0$ then for $n$ large enough $\arctan x_n$
belongs to an interval $I := (\arctan x - \varepsilon, \arctan x + \varepsilon) \subset (-\pi/2, \pi/2)$;
the continuity of $y \mapsto \tan y$ in $I$ implies that $x_n \to x$. This proves the
converse implication, and the equivalence of the two topologies.
(ii) We notice first that, according to (i), the sets of $\mathcal{B}(\mathbb{R})$ and $\{-\infty\}$, $\{+\infty\}$ belong
to $\mathcal{B}(\overline{\mathbb{R}})$. Therefore, if $f$ is measurable between $\mathcal{E}$ and the Borel σ–algebra
of $(\overline{\mathbb{R}}, d)$, then it is $\mathcal{E}$–measurable according to (2.2). According
to the measurability criterion, in order to prove the converse implication
it suffices to show that $\mathcal{B}(\overline{\mathbb{R}})$ is generated by $\mathcal{B}(\mathbb{R}) \cup \{\{-\infty\}\} \cup \{\{+\infty\}\}$:
this follows by the fact that if $C \subset \overline{\mathbb{R}}$ is closed, then
\[
C = (C \cap \mathbb{R}) \cup (C \cap \{-\infty\}) \cup (C \cap \{+\infty\})
\]
(again by (i)) belongs to the algebra generated by $\mathcal{B}(\mathbb{R}) \cup \{\{-\infty\}\} \cup \{\{+\infty\}\}$,
therefore the σ–algebra generated by this family of sets contains $\mathcal{B}(\overline{\mathbb{R}})$.
Exercise 2.4. If $\{f \neq g\}$ is contained in a μ–negligible set $C$ of $\mathcal{E}$,
for some $\mathcal{E}$–measurable function $g$, then $\{f > t\} \,\Delta\, \{g > t\} \subset C$ for all
$t \in \mathbb{R}$, and since $\{g > t\} \in \mathcal{E}$ it follows that $\{f > t\} \in \mathcal{E}_\mu$; this means
that $f$ is $\mathcal{E}_\mu$–measurable. Conversely, assume that $f$ is $\mathcal{E}_\mu$–measurable
and find for all $q \in \mathbb{Q}$ a set $B_q \in \mathcal{E}$ and a μ–negligible set $C_q \in \mathcal{E}$ with
$\{f > q\} \,\Delta\, B_q \subset C_q$. We define
\[
g(x) := \sup\{q \in \mathbb{Q} : x \in B_q\}, \qquad C := \bigcup_{q \in \mathbb{Q}} C_q.
\]
Since $\{g > t\} = \bigcup_{q \in \mathbb{Q},\, q > t} B_q$ we have that $g$ is $\mathcal{E}$–measurable. Let us prove
that $f(x) = g(x)$ for all $x \notin C$: for any such $x$ we have $x \in B_q$ for
all $q < f(x)$, therefore $g(x) \ge f(x)$; if the inequality were strict, there
would exist $q \in \mathbb{Q}$ with $x \in B_q$ and $q > f(x)$, therefore $x$ would be in
$B_q \setminus \{f > q\} \subset C_q \subset C$.
Exercise 2.5. If $\sigma \le \tau$ we can find a nondecreasing family of partitions
$\sigma_1, \ldots, \sigma_n$ with $\sigma_1 = \sigma$, $\sigma_n = \tau$ and $\sigma_{i+1} \setminus \sigma_i$ containing just one point.
Therefore, in the proof of the monotonicity of $\sigma \mapsto I_\sigma(f)$ we need only
show that $I_\sigma(f) \le I_{\sigma \cup \{t\}}(f)$ whenever $t \in (0, \infty) \setminus \sigma$. Let $\sigma =
\{t_0, \ldots, t_N\}$ and let $i$ be the last index such that $t_i < t$. If $i < N$ we use
the inequality
\[
(t_{i+1} - t_i) f(t_{i+1}) = (t_{i+1} - t) f(t_{i+1}) + (t - t_i) f(t_{i+1})
\le (t_{i+1} - t) f(t_{i+1}) + (t - t_i) f(t);
\]
adding to both sides $\sum_{j \neq i} (t_{j+1} - t_j) f(t_{j+1})$ we obtain $I_\sigma(f) \le I_{\sigma \cup \{t\}}(f)$.
If $i = N$ the argument is even easier, because the difference $I_{\sigma \cup \{t\}}(f) -
I_\sigma(f)$ is given by $(t - t_N) f(t)$.
Now, let $f, g : (0, +\infty) \to [0, +\infty)$ be given; since $I_\sigma(f + g) =
I_\sigma(f) + I_\sigma(g)$ we get $I_\sigma(f + g) \le \int_0^\infty f(t)\,dt + \int_0^\infty g(t)\,dt$. Since
the partition $\sigma$ is arbitrary, this proves that
\[
\int_0^\infty f(t) + g(t)\,dt \le \int_0^\infty f(t)\,dt + \int_0^\infty g(t)\,dt.
\]
In order to prove the converse inequality, fix $L < \int_0^\infty f(t)\,dt$, $M <
\int_0^\infty g(t)\,dt$ and find partitions $\sigma$, $\eta$ with $I_\sigma(f) > L$ and $I_\eta(g) > M$; then
\[
\int_0^\infty f(t) + g(t)\,dt \ge I_{\sigma \cup \eta}(f + g) = I_{\sigma \cup \eta}(f) + I_{\sigma \cup \eta}(g)
\ge I_\sigma(f) + I_\eta(g) > L + M.
\]
Letting $L \uparrow \int_0^\infty f(t)\,dt$ and $M \uparrow \int_0^\infty g(t)\,dt$ the inequality is proved.
Exercise 2.6. We will prove that $f_*$ is lower semicontinuous, the proof
of the upper semicontinuity of $f^*$ being analogous. Let $(x_n) \subset \mathbb{R}$ be
converging to $x$ and use the definition of $f_*(x_n)$ to find $y_n \in \mathbb{R}$ such that
\[
|x_n - y_n| < \frac{1}{n} \quad \text{and} \quad f(y_n) \le f_*(x_n) + \frac{1}{n}.
\]
Then $(y_n)$ still converges to $x$, so that
\[
f_*(x) \le \liminf_{n \to \infty} f(y_n) \le \liminf_{n \to \infty} \Big( f_*(x_n) + \frac{1}{n} \Big) = \liminf_{n \to \infty} f_*(x_n).
\]

Exercise 2.7. Let $t \in \mathbb{R}$ and let $(x_n) \subset \{f_* \le t\}$ be convergent to $x$.
Then, the lower semicontinuity of $f_*$ gives
\[
f_*(x) \le \liminf_{n \to \infty} f_*(x_n) \le t.
\]
This proves that $x \in \{f_* \le t\}$, so that $\{f_* \le t\}$ is closed. The proof for $f^*$
is similar. Since the Borel σ–algebra is generated by halflines, it follows
that $f_*$ and $f^*$ are Borel, and the same is true for the set $\{f_* = f^*\}$, which
coincides with the set of continuity points of $f$.
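Exercise 2.8. Set $\varphi_0 := \varphi$, $A_0 := \{\varphi_0 \ge a_0\}$ and $\varphi_1 := \varphi - a_0 1_{A_0} \ge
0$. Then, set $A_1 := \{\varphi_1 \ge a_1\}$ and $\varphi_2 := \varphi_1 - a_1 1_{A_1}$ and so on. If
$\varphi(x) = +\infty$ then $\varphi_n(x) = +\infty$ for all $n$, so that $x$ belongs to all sets
$A_i$ and $\sum_{i=0}^{\infty} a_i 1_{A_i}(x) = +\infty$. We then assume that $\varphi(x) < +\infty$ in the
following. By construction we have that $0 \le \varphi_{i+1} \le \varphi_i \le \cdots \le \varphi_0 = \varphi$,
hence
\[
\varphi = \varphi_{n+1} + \sum_{i=0}^{n} (\varphi_i - \varphi_{i+1}) = \varphi_{n+1} + \sum_{i=0}^{n} a_i 1_{A_i}.
\]
This proves that $\varphi \ge \sum_i a_i 1_{A_i}$. If the inequality were strict for some
$x \in X$ with $\varphi(x) < +\infty$, we could find $\varepsilon > 0$ such that $\varphi_i(x) \ge \varepsilon$ for
all $i \in \mathbb{N}$, and since $a_i < \varepsilon$ for $i$ large enough, we would get $x \in A_i$ for $i$
large enough. But since the series $\sum_i a_i$ is not convergent, we would get
$\sum_i a_i 1_{A_i}(x) = \infty$, a contradiction.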
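The construction above is a greedy algorithm, and a small numerical experiment may help to visualize it. The sketch below runs it at a single point, under the assumption (made only for this example) that $a_i = 1/(i+1)$, a sequence that tends to 0 and has a divergent series; the name `greedy_remainder` is hypothetical.

```python
def greedy_remainder(phi, n_terms):
    """Run the greedy construction at one point x, where phi stands for phi(x).

    Assumption for illustration: a_i = 1/(i+1). Returns the partial sum of
    a_i * 1_{A_i}(x) for i < n_terms together with the remainder phi_{n}(x).
    """
    remainder = phi
    partial_sum = 0.0
    for i in range(n_terms):
        a_i = 1.0 / (i + 1)
        if remainder >= a_i:  # this is the condition "x belongs to A_i"
            remainder -= a_i
            partial_sum += a_i
    return partial_sum, remainder


partial_sum, remainder = greedy_remainder(2.5, 10000)
```

In this run the remainder stays nonnegative and becomes small, consistently with the identity $\varphi = \varphi_{n+1} + \sum_{i \le n} a_i 1_{A_i}$ and with the convergence of the series to $\varphi(x)$.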
Exercise 2.9. Assume by contradiction that the absolute continuity property
fails. Then, for some $\varepsilon > 0$ we can find $A_i$ with $\mu(A_i) < 2^{-i}$ and
$\int_{A_i} |\varphi|\,d\mu \ge \varepsilon$. It follows that the set $B := \limsup_i A_i$ is μ–negligible,
and
\[
B_n := \bigcup_{i \ge n} A_i \setminus B \downarrow \emptyset.
\]
Since $\int_{B_n} |\varphi|\,d\mu \ge \int_{A_n} |\varphi|\,d\mu \ge \varepsilon$ we find a contradiction with the
dominated convergence theorem applied to the functions $1_{B_n} |\varphi|$, pointwise
converging to 0.

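Exercise 2.10. Let $\varepsilon > 0$ be given and let $\delta > 0$ be such that $\int_A |\varphi|\,d\mu <
\varepsilon/2$ whenever $A \in \mathcal{E}$ and $\mu(A) < \delta$. The triangle inequality gives,
with the same choice of $A$, $\int_A |\varphi_n|\,d\mu < \varepsilon$ for $n > n_0$, provided $\|\varphi_n -
\varphi\|_1 < \varepsilon/2$ for $n > n_0$. Since $\varphi_1, \ldots, \varphi_{n_0}$ are integrable, we can find
$\delta_i > 0$ such that $\int_A |\varphi_i|\,d\mu < \varepsilon$ whenever $A \in \mathcal{E}$ and $\mu(A) < \delta_i$. If
$\delta_0 = \min\{\delta, \min_i \delta_i\}$, we have $\int_A |\varphi_n|\,d\mu < \varepsilon$ whenever $n \in \mathbb{N}$, $A \in \mathcal{E}$
and $\mu(A) < \delta_0$.
A possible example for the second question is the interval $[0, 1]$, with $\mu = \lambda$ the
Lebesgue measure, and $\varphi_n = \frac{2^n}{n} 1_{[2^{-n}, 2^{1-n})}$. The uniform integrability is a
direct consequence of the convergence of $\varphi_n$ to 0 in $L^1$. If $\varphi_n \le g$ for all $n$, then,
since the functions $\varphi_n$ have disjoint supports,
\[
\sum_{n=1}^{\infty} \varphi_n = \sup_{n \ge 1} \varphi_n \le g,
\]
but $\int \sum_{n=1}^{\infty} \varphi_n \,d\lambda = \sum_{n=1}^{\infty} 1/n = +\infty$.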
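The arithmetic behind this counterexample can be checked exactly. The following sketch, with hypothetical helper names, computes $\int \varphi_n \, d\lambda$ with rational arithmetic and sums the first integrals to illustrate the harmonic growth of the integral of the supremum.

```python
from fractions import Fraction


def integral_phi(n):
    """Exact integral over [0, 1] of phi_n = (2**n / n) * 1_[2**-n, 2**(1-n))."""
    height = Fraction(2**n, n)
    width = Fraction(1, 2 ** (n - 1)) - Fraction(1, 2**n)
    return height * width


def integral_of_partial_sup(n_max):
    """Integral of phi_1 + ... + phi_{n_max}; the supports are disjoint,
    so this is also the integral of the supremum of the first n_max functions."""
    return sum(integral_phi(n) for n in range(1, n_max + 1))
```

Each integral equals $1/n$, so the integrals tend to 0 while their partial sums are the harmonic numbers, which are unbounded.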
Exercise 2.11. (a) For any $y \in X$ we have
\[
g_\lambda(x) \le g(y) + \lambda d(x, y) \le g(y) + \lambda d(x', y) + \lambda d(x, x').
\]
Since $y$ is arbitrary we get $g_\lambda(x) \le g_\lambda(x') + \lambda d(x, x')$. Reversing the
roles of $x$ and $x'$ the inequality is achieved.
(b) Clearly the family $(g_\lambda)$ is monotone with respect to $\lambda$, and since we
can always choose $y = x$ in the minimization problem we have $g_\lambda(x) \le
g(x)$. Assume that $\sup_\lambda g_\lambda(x)$ is finite (otherwise the statement is trivial)
and let $x_\lambda$ be such that $g_\lambda(x) + \lambda^{-1} \ge g(x_\lambda) + \lambda d(x, x_\lambda)$. This inequality
implies that $x_\lambda \to x$ as $\lambda \to \infty$ and, now neglecting the term $\lambda d(x, x_\lambda)$,
that
\[
g_\lambda(x) + \frac{1}{\lambda} \ge g(x_\lambda).
\]
Passing to the limit in this inequality as $\lambda \to \infty$ and using the lower
semicontinuity of $g$ we get $\sup_\lambda g_\lambda(x) \ge g(x)$.
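To see the regularization $g_\lambda(x) = \inf_y \{ g(y) + \lambda d(x, y) \}$ at work, here is a minimal discrete sketch. It assumes that $X$ is replaced by a finite grid in $[-1, 1]$ with $d(x, y) = |x - y|$ and that $g$ is the lower semicontinuous step function equal to 0 on $(-\infty, 0]$ and to 1 on $(0, +\infty)$; the function name is hypothetical.

```python
def lipschitz_regularization(xs, g_vals, lam):
    """Discrete version of g_lambda(x) = min over y of g(y) + lam * |x - y|."""
    return [
        min(gy + lam * abs(x - y) for y, gy in zip(xs, g_vals))
        for x in xs
    ]


# Grid on [-1, 1] and a lower semicontinuous step function (sample choices).
xs = [i / 100 for i in range(-100, 101)]
g_vals = [0.0 if x <= 0 else 1.0 for x in xs]
regularized = {lam: lipschitz_regularization(xs, g_vals, lam) for lam in (1, 5, 25)}
```

On this grid the three properties used in the solution can be observed directly: $g_\lambda \le g$, the family is nondecreasing in $\lambda$, and each $g_\lambda$ is $\lambda$-Lipschitz.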
Exercise 2.12. Let us first assume that $f$ is bounded. For $\varepsilon > 0$ we
consider the functions
\[
f_\varepsilon(x, y) := \frac{1}{2\varepsilon} \int_{x - \varepsilon}^{x + \varepsilon} f(x', y)\,dx'.
\]
Since $x' \mapsto f(x', y)$ is continuous, we can apply the mean value theorem
to obtain that $f_\varepsilon(x, y) \to f(x, y)$ as $\varepsilon \downarrow 0$. So, in order to show that $f$
is a Borel function, we need only show that the functions $f_\varepsilon$ are Borel.
We will prove indeed that the $f_\varepsilon$ are continuous: let $x_n \to x$ and $y_n \to y$;
since $f(x', y_n) \to f(x', y)$ for all $x' \in \mathbb{R}$, we have
\[
1_{[x_n - \varepsilon, x_n + \varepsilon]}(x') f(x', y_n) \to 1_{[x - \varepsilon, x + \varepsilon]}(x') f(x', y)
\]
for all $x' \in \mathbb{R} \setminus \{x - \varepsilon, x + \varepsilon\}$. Therefore, since $f$ is bounded, the
dominated convergence theorem yields
\[
f_\varepsilon(x, y) = \frac{1}{2\varepsilon} \int_{\mathbb{R}} 1_{[x - \varepsilon, x + \varepsilon]}(x') f(x', y)\,dx'
= \frac{1}{2\varepsilon} \lim_{n \to \infty} \int_{\mathbb{R}} 1_{[x_n - \varepsilon, x_n + \varepsilon]}(x') f(x', y_n)\,dx' = \lim_{n \to \infty} f_\varepsilon(x_n, y_n).
\]
In the general case when $f$ is not bounded we approximate it by the
bounded functions $f_h(x) := \max\{-h, \min\{f(x), h\}\}$, with $h \in \mathbb{N}$, that
are still separately continuous, and therefore Borel.

Chapter 3
Exercise 3.1. On the real line, endowed with the Lebesgue measure,
the function $(1 + |x|)^{-1}$ belongs to $L^2$, but not to $L^1$, and the function
$|x|^{-1/2} 1_{(0,1)}(x)$ belongs to $L^1$, but not to $L^2$. Turning back to the general
case, if $\varphi \in L^{p_1} \cap L^{p_2}$ with $p_1 \le p_2$, from the inequality
\[
|\varphi|^p \le \max\{|\varphi|^{p_1}, |\varphi|^{p_2}\} \le |\varphi|^{p_1} + |\varphi|^{p_2} \qquad \forall p \in [p_1, p_2]
\]
(that can be verified considering separately the cases $|\varphi| \le 1$ and $|\varphi| > 1$)
we get that $\varphi \in L^p$ for all $p \in [p_1, p_2]$.
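Exercise 3.2. The statement is trivial if $\|f\|_q = 0$, so we assume that
$\|f\|_q > 0$. For $\varepsilon > 0$ the set $X_\varepsilon := \{|f| > \varepsilon\}$ has finite μ–measure, by
the Markov inequality, hence the inclusion between $L^r$ spaces for finite
measures gives that $|f| 1_{X_\varepsilon} \in L^p(X, \mathcal{E}, \mu)$. Since the dominated
convergence theorem gives
\[
\lim_{\varepsilon \downarrow 0} \int_X |f - f 1_{X_\varepsilon}|^q \,d\mu = \lim_{\varepsilon \downarrow 0} \int_{X \setminus X_\varepsilon} |f|^q \,d\mu = 0
\]
we can choose $\tilde f = f 1_{X_\varepsilon}$ for $\varepsilon > 0$ small enough.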


Exercise 3.3. By homogeneity we can assume that $\|\varphi\|_p = 1$ and $\|\psi\|_q =
1$. Since
\[
\int_X \Big( \frac{|\varphi|^p}{p} + \frac{|\psi|^q}{q} - |\varphi||\psi| \Big) d\mu = \frac{\|\varphi\|_p^p}{p} + \frac{\|\psi\|_q^q}{q} - 1 = 0
\]
and the function among parentheses is nonnegative, it follows that it
vanishes μ–a.e. In particular, for μ–a.e. $x$, $|\varphi(x)|$ is a minimizer of
\[
y \mapsto \frac{y^p}{p} - |\psi(x)|\,y
\]
in $[0, +\infty)$. But this problem has a unique minimizer, given by $|\psi(x)|^{q-1}$,
and we conclude.
Exercise 3.4. It suffices to apply Hölder's inequality to the functions $|\varphi|^r$
and $|\psi|^r$, with the dual exponents $p/r$ and $q/r$, to obtain
\[
\|\varphi\psi\|_r^r \le \||\varphi|^r\|_{p/r} \, \||\psi|^r\|_{q/r} = \|\varphi\|_p^r \, \|\psi\|_q^r.
\]



Exercise 3.5. The positive part and the negative part of $\varphi - \varphi_n$ have the
same integral, hence
\[
\int_X |\varphi - \varphi_n| \,d\mu = 2 \int_X (\varphi - \varphi_n)^+ \,d\mu.
\]
The condition $\liminf_n \varphi_n \ge \varphi$ ensures that $(\varphi - \varphi_n)^+$ is pointwise
convergent to 0; in addition, since the $\varphi_n$ are nonnegative, these functions are
dominated by $\varphi^+$. Therefore the dominated convergence theorem gives the
result.
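Exercise 3.6. If $\psi_n \to \psi$ μ–a.e. we apply Fatou's lemma to the functions
$\psi_n + \varphi_n$ to obtain
\[
\liminf_{n \to \infty} \int_X \psi_n + \varphi_n \,d\mu \ge \int_X \psi + \varphi \,d\mu.
\]
Therefore
\[
\limsup_{n \to \infty} \int_X \psi_n \,d\mu + \liminf_{n \to \infty} \int_X \varphi_n \,d\mu \ge \int_X \varphi \,d\mu + \int_X \psi \,d\mu.
\]
Subtracting $\int_X \psi \,d\mu$ from both sides the statement is achieved. In the
general case, let $n(k)$ be a subsequence such that $\lim_k \int_X \varphi_{n(k)} \,d\mu =
\liminf_n \int_X \varphi_n \,d\mu$, and let $n(k(s))$ be a further subsequence such that $\psi_{n(k(s))}$
converges to $\psi$ μ–a.e. Then
\[
\liminf_{n \to \infty} \int_X \varphi_n \,d\mu = \lim_{s \to \infty} \int_X \varphi_{n(k(s))} \,d\mu \ge \int_X \liminf_{s \to \infty} \varphi_{n(k(s))} \,d\mu
\ge \int_X \liminf_{n \to \infty} \varphi_n \,d\mu.
\]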

Exercise 3.7. We show only how (3.13) implies $g(tx + (1 - t)y) \le
tg(x) + (1 - t)g(y)$ for all $x, y \in J$ and $t \in [0, 1]$. We prove first, by
induction on $m$, that
\[
g\Big( \sum_{i=1}^{2^m} \frac{1}{2^m} x_i \Big) \le \sum_{i=1}^{2^m} \frac{1}{2^m} g(x_i)
\]
for all $x_1, \ldots, x_{2^m} \in J$. The case $m = 1$ is (3.13) and the induction step
can be achieved grouping the terms as follows:
\[
\sum_{i=1}^{2^m} \frac{1}{2^m} x_i = \frac{1}{2} \sum_{i=1}^{2^{m-1}} \frac{1}{2^{m-1}} x_i + \frac{1}{2} \sum_{i=1}^{2^{m-1}} \frac{1}{2^{m-1}} x_{2^{m-1}+i}.
\]
Now, considering the case when $x_i = x$ for $1 \le i \le k$ and $x_i = y$
otherwise, we get
\[
g(tx + (1 - t)y) \le tg(x) + (1 - t)g(y) \qquad \text{with } t = \frac{k}{2^m}.
\]
Since $g$ is continuous, by approximation we get $g(tx + (1 - t)y) \le
tg(x) + (1 - t)g(y)$ for all $x, y \in J$ and $t \in [0, 1]$.
Exercise 3.8. Let us first show the existence of $z_0$. Let $A = g(\mathbb{R})$ and
let $u_n = g(z_n)$ with $u_n \downarrow \inf A$. Since $u_n$ is uniformly bounded from
above, our assumption on $g$ ensures that $(z_n)$ is bounded. By the
Bolzano–Weierstrass theorem we can find a subsequence $z_{n(k)}$ convergent to $z \in
\mathbb{R}$. The continuity of $g$ gives that $u_{n(k)} = g(z_{n(k)})$ converges to $g(z)$. It
follows that $\inf A$ is finite and coincides with $g(z)$, so that we can take $z_0 := z$.
Now, by applying the
convexity inequality of the previous exercise with $x = z_2$, $y = z_0$ and
$t = (z_1 - z_0)/(z_2 - z_0)$, we get
\[
\frac{g(z_2) - g(z_1)}{z_2 - z_1} \ge \frac{g(z_1) - g(z_0)}{z_1 - z_0} \ge 0
\]
for $z_0 < z_1 < z_2$, proving the monotonicity of $g$ in $[z_0, +\infty)$. The
argument in $(-\infty, z_0]$ is analogous.
 
Exercise 3.9. Fatou's lemma gives $\liminf_n \int \varphi_n \,d\mu \ge \int \liminf_n \varphi_n \,d\mu \ge
\int \varphi \,d\mu$. Therefore $t_n := \int \varphi_n \,d\mu \to t := \int \varphi \,d\mu$; we can apply
Exercise 3.5 to the functions $\varphi_n/t_n$ to obtain that $\varphi_n/t_n \to \varphi/t$ in $L^1$. From
this, taking into account that $t_n \to t$, the convergence of $\varphi_n$ to $\varphi$ in $L^1$
follows.
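Exercise 3.10. Let $\Psi(c) := \Phi(c)/c$ and notice that $|\varphi_i| \le c\,\Phi(|\varphi_i|)/\Phi(c)
= \Phi(|\varphi_i|)/\Psi(c)$ on $\{|\varphi_i| \ge c\}$. Therefore
\[
\int_A |\varphi_i| \,d\mu \le \int_{A \cap \{|\varphi_i| \ge c\}} \frac{\Phi(|\varphi_i|)}{\Psi(c)} \,d\mu + \int_{A \cap \{|\varphi_i| < c\}} |\varphi_i| \,d\mu \le \frac{M}{\Psi(c)} + c\,\mu(A).
\]
Let us choose $c$ sufficiently large, such that $M/\Psi(c) < \varepsilon/2$, and then
$\delta > 0$ such that $c\delta < \varepsilon/2$. The inequality above yields $\int_A |\varphi_i| \,d\mu < \varepsilon$
whenever $\mu(A) < \delta$.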
Exercise 3.11. Let ( f n ) ⊂ Cb (X) be converging in L 1 to f , and let f n(k)
be a subsequence pointwise convergent μ–a.e. to f . Then, given any
ε > 0, by Egorov theorem we can find a Borel set B ⊂ X with μ(B) < ε
and f n(k) → f uniformly on B c . By the inner regularity of the measure
we can find a closed set C ⊂ B c such that μ(X \ C) < ε. The function f
restricted to C, being the uniform limit of bounded continuous functions,
is bounded and continuous.

Chapter 4
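Exercise 4.1. Notice that $\langle \cdot, \cdot \rangle$ is obviously symmetric, that $\langle x, -y \rangle =
-\langle x, y \rangle = \langle -x, y \rangle$ and that $\langle x, x \rangle = \|x\|^2 \ge 0$, with equality only if
$x = 0$. Notice that the parallelogram identity gives
\[
\|x + x' + 2y\|^2 + \|x - x'\|^2 = 2\|x + y\|^2 + 2\|x' + y\|^2
= 8\langle x, y \rangle + 8\langle x', y \rangle + 2\|x - y\|^2 + 2\|x' - y\|^2
\]
and
\[
\|x + x' - 2y\|^2 + \|x - x'\|^2 = 2\|x - y\|^2 + 2\|x' - y\|^2
= 8\langle x, -y \rangle + 8\langle x', -y \rangle + 2\|x + y\|^2 + 2\|x' + y\|^2.
\]
Subtracting and dividing by 4 we get
\[
\langle x + x', 2y \rangle = 4\langle x, y \rangle + 4\langle x', y \rangle - 2\langle x, y \rangle - 2\langle x', y \rangle.
\]
So, we proved that $\langle x + x', 2y \rangle = 2\langle x, y \rangle + 2\langle x', y \rangle$. Using the relation
$\langle u, 2v \rangle = 4\langle u/2, v \rangle$ (due to the definition of $\langle \cdot, \cdot \rangle$ and the homogeneity
of $\| \cdot \|$), we get
\[
\Big\langle \frac{x + x'}{2}, y \Big\rangle = \frac{1}{2} \langle x, y \rangle + \frac{1}{2} \langle x', y \rangle.
\]
Setting $x = t_1 v$, $x' = t_2 v$, and defining the continuous function $\phi(t) =
\langle tv, y \rangle$, we get
\[
\phi\Big( \frac{t_1 + t_2}{2} \Big) = \frac{1}{2} \phi(t_1) + \frac{1}{2} \phi(t_2).
\]
This means that $\phi$ and $-\phi$ are convex in $\mathbb{R}$, so that $\phi$ is an affine function,
and since $\phi(0) = 0$ we get $\phi(t) = t\phi(1)$, i.e. $\langle tv, y \rangle = t\langle v, y \rangle$. Coming
back to the identity above, we get $\langle x + x', y \rangle = \langle x, y \rangle + \langle x', y \rangle$.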
Exercise 4.2. Assume that $y = \pi_K(x)$. For all $z \in K$ and $t \in [0, 1]$ we
have that $y + t(z - y)$ belongs to $K$, so that
\[
\|y + t(z - y) - x\|^2 \ge \|y - x\|^2.
\]
Expanding the squares we get
\[
t^2 \|z - y\|^2 + 2t \langle z - y, y - x \rangle \ge 0 \qquad \forall t \in [0, 1].
\]
This implies (either dividing by $t > 0$ and passing to the limit as $t \downarrow 0$,
or computing the right derivative at $t = 0$) that $\langle z - y, x - y \rangle \le 0$.
Conversely, if for some $y \in K$ this condition holds for all $z \in K$, the
argument can be reversed to get $\|y + t(z - y) - x\| \ge \|y - x\|$ for
all $t \ge 0$. Choosing $t = 1$ we get $\|z - x\| \ge \|y - x\|$, proving that
$y = \pi_K(x)$.
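A quick numerical illustration of the variational inequality $\langle z - y, x - y \rangle \le 0$ can be given in a concrete case. The sketch below assumes that $H = \mathbb{R}^3$ and that $K$ is the unit cube $[0, 1]^3$, for which the projection is the coordinatewise clipping; all function names are hypothetical.

```python
import random


def project_onto_box(x, lo=0.0, hi=1.0):
    """Projection onto the closed convex set K = [lo, hi]^n (clipping)."""
    return [min(max(xi, lo), hi) for xi in x]


def inner(u, v):
    """Euclidean scalar product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def max_variational_gap(x, samples=1000, seed=0):
    """Largest value of <z - y, x - y> over random points z of K, with y = pi_K(x)."""
    rng = random.Random(seed)
    y = project_onto_box(x)
    x_minus_y = [xi - yi for xi, yi in zip(x, y)]
    worst = float("-inf")
    for _ in range(samples):
        z = [rng.uniform(0.0, 1.0) for _ in x]
        z_minus_y = [zi - yi for zi, yi in zip(z, y)]
        worst = max(worst, inner(z_minus_y, x_minus_y))
    return worst
```

For a point such as `[1.7, -0.4, 0.3]` outside the cube, the sampled values of the scalar product never become positive, as the characterization predicts.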
Exercise 4.3. Let $Y_k$ be the vector space spanned by $\{f_1, \ldots, f_k\}$ and
let us prove by induction on $k \ge 1$ that $f_i$ is orthogonal to $f_j$ whenever
$1 \le i < j \le k$. First we observe that if this property holds for some $k$,
then $Y_k$ is $k$-dimensional and coincides with the vector space spanned by
$\{v_1, \ldots, v_k\}$ (being contained in it, and with the same dimension).
The orthogonality of the vectors $f_i$ can be obtained just noticing that
\[
f_k = v_k - \sum_{i=1}^{k-1} \langle v_k, e_i \rangle e_i.
\]
So, $f_k = v_k - \pi_{Y_{k-1}}(v_k)$ is orthogonal to all vectors in $Y_{k-1}$. It follows
that $\langle e_k, e_i \rangle = 0$ for all $i < k$.
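The orthonormalization procedure discussed in this exercise can be sketched in a few lines of code. The example below works in $\mathbb{R}^n$ with the Euclidean scalar product and assumes that the input vectors are linearly independent; the names `gram_schmidt` and `inner` are choices made for this illustration.

```python
import math


def inner(u, v):
    """Euclidean scalar product."""
    return sum(ui * vi for ui, vi in zip(u, v))


def gram_schmidt(vectors):
    """Return e_1, ..., e_k built from linearly independent v_1, ..., v_k.

    At step k we set f_k = v_k - sum_{i<k} <v_k, e_i> e_i and e_k = f_k / |f_k|.
    """
    basis = []
    for v in vectors:
        f = list(v)
        for e in basis:
            coefficient = inner(v, e)
            f = [fi - coefficient * ei for fi, ei in zip(f, e)]
        norm = math.sqrt(inner(f, f))
        basis.append([fi / norm for fi in f])
    return basis


basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

The resulting vectors have unit norm and are mutually orthogonal up to rounding errors.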

Exercise 4.4. Let $y = x - \sum_k \langle x, e_k \rangle e_k$; we know that the series converges
in $H$ by Bessel's inequality. In order to show that $\sum_k \langle x, e_k \rangle e_k = \pi_X(x)$
it suffices to prove that $y$ is orthogonal to all vectors in $X$. But since
any vector $v \in X$ can be represented as a series, it suffices to show that
$\langle y, e_i \rangle = 0$ for all $i$. The continuity and linearity of the scalar product
give
\[
\langle y, e_i \rangle = \langle x, e_i \rangle - \sum_{k=0}^{\infty} \langle x, e_k \rangle \langle e_k, e_i \rangle = \langle x, e_i \rangle - \langle x, e_i \rangle = 0.
\]

Exercise 4.5. Since $X$ and its scalar product coincide with $L^2([0, 1],
\mathcal{P}([0, 1]), \mu)$, where $\mu$ is the counting measure in $[0, 1]$, we obtain
that $X$ is a Hilbert space. Let us prove by contradiction that $X$ is not
separable. If $S = \{f_n\}_{n \ge 1}$ were a dense subset, it would be possible to
find a countable set $D \subset [0, 1]$ such that $f_n(x) = 0$ for all $n$ and all
$x \in [0, 1] \setminus D$. Since $[0, 1]$ is not countable we can find $x_0 \in [0, 1] \setminus D$
and define $g_0(x)$ equal to 1 if $x = x_0$ and equal to 0 if $x \neq x_0$. We
claim that $g_0$ does not belong to the closure of $S$. If this property fails,
we can find a sequence $(f_{n(k)}) \subset S$ convergent to $g_0$ μ–a.e. in $[0, 1]$;
but convergence μ–a.e. corresponds to pointwise convergence, and since
$g_0(x_0) \neq 0$, while $f_{n(k)}(x_0) = 0$ for all $k$, we obtain a contradiction.
Exercise 4.6. By the Parseval identity we know that $x \mapsto (\langle x, e_i \rangle)$ is a
linear isometry from $H$ to $\ell^2$. As a consequence, taking the parallelogram
identity into account, the scalar product is preserved.
Exercise 4.7. We consider the class of orthonormal systems $\{e_i\}_{i \in I}$ of $H$,
ordered by inclusion. Zorn's lemma ensures the existence of a maximal
system $\{e_i\}_{i \in I}$. Let $V$ be the subspace spanned by the $e_i$, let $Y$ be its closure
(still a subspace) and let us prove that $Y = H$. Indeed, if $Y$ were a proper
subspace of $H$, we would be able to find, thanks to Corollary 4.5, a unit
vector $e$ orthogonal to all vectors in $Y$, and in particular to all vectors
$e_i$. Adding $e$ to the family $\{e_i\}_{i \in I}$ the maximality of the family would be
violated. Now, by the just proved density of $V$ in $H$, given any $x \in H$
we can find a sequence of vectors $(v_n)$, finite combinations of vectors $e_i$,
such that $\|x - v_n\| \to 0$. If we denote by $J_n \subset I$ the set of indices used
to build the vectors $\{v_1, \ldots, v_n\}$, and by $H_n$ the vector space spanned by
$\{e_i\}_{i \in J_n}$, we know by Proposition 4.6 that
\[
\Big\| x - \sum_{i \in J_n} \langle x, e_i \rangle e_i \Big\| \le \|x - v_n\| \to 0.
\]
As a consequence, setting $J = \bigcup_n J_n$, we have $x = \sum_{i \in J} \langle x, e_i \rangle e_i$.

Chapter 5
Exercise 5.1. The functions $\sin mx \cos lx$ are odd, therefore their integral
on $(-\pi, \pi)$ vanishes. To show that $\sin mx$ is orthogonal to $\sin lx$ when
$l \neq m$, we integrate twice by parts to get
\[
\int_{-\pi}^{\pi} \sin mx \sin lx \,dx = \frac{m}{l} \int_{-\pi}^{\pi} \cos mx \cos lx \,dx
= \frac{m^2}{l^2} \int_{-\pi}^{\pi} \sin mx \sin lx \,dx.
\]
Since $m^2 \neq l^2$, the integral vanishes.
The integrals of products $\cos mx \cos lx$ can be handled analogously.


Exercise 5.2. Since for $N < M$ we have
\[
\Big\| \sum_{n=0}^{N} x_n - \sum_{n=0}^{M} x_n \Big\| \le \Big\| \sum_{i=N+1}^{M} x_i \Big\| \le \sum_{i=N+1}^{M} \|x_i\|
\]
we obtain that $(\sum_0^N x_i)$ is a Cauchy sequence in $E$. Therefore the
completeness of $E$ provides the convergence of the series. Passing to the
limit as $N \to \infty$ in the inequality $\| \sum_0^N x_i \| \le \sum_0^N \|x_i\|$ and using the
continuity of the norm we obtain (5.15).

Exercise 5.3. We consider only the first system $g_k = \sqrt{2/\pi} \sin kx$, the
proof for the second one being analogous. The fact that $(g_k)$ is orthonormal
can be easily checked noticing that the $g_k$ are restrictions to $(0, \pi)$ of
odd functions, and using the orthogonality of $\sin kx$ in $L^2(-\pi, \pi)$.
Analogously, if $f \in L^2(0, \pi)$ let us consider its extension $\tilde f$ to $(-\pi, \pi)$ as an
odd function and its Fourier series, which obviously contains no cosine terms.
In $(0, \pi)$ we have
\[
\sum_{k=1}^{N} b_k \sin kx = \sum_{k=1}^{N} \langle f, g_k \rangle g_k,
\]
where the scalar products are understood in $L^2(0, \pi)$. Therefore, from
the convergence of the Fourier series in $L^2(-\pi, \pi)$ to $\tilde f$, which implies
convergence in $L^2(0, \pi)$ to $f$, the completeness follows.
Exercise 5.4. Clearly $\langle e_k, e_k \rangle = 1$, while
\[
\int_{-\pi}^{\pi} e^{ikx} e^{-ilx} \,dx = \Big[ \frac{1}{i(k - l)} e^{i(k-l)x} \Big]_{-\pi}^{\pi} = 0 \qquad \text{whenever } k \neq l.
\]
As a consequence $(e_k)$ is an orthonormal system.
Since the Fourier series $S_N f = \sum_{-N}^{N} \langle f, e_k \rangle e_k$ of $f$ depends linearly on
$f$, in order to show completeness we need only show $S_N f \to f$ when
$f$ is real-valued and when $f$ is imaginary-valued (i.e. $if$ is real-valued).
We consider only the first case, the second one being analogous. Setting
$c_k = \langle f, e_k \rangle$, we have
\[
c_k = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(x) \cos kx - i f(x) \sin kx \,dx.
\]
As a consequence, for $k \ge 1$ we have $\sqrt{2/\pi}\, c_k = a_k - ib_k$, where $a_k$ and
$b_k$ are the coefficients of the real Fourier series of $f$, and for $k \le -1$ we
have $\sqrt{2/\pi}\, c_k = a_{-k} + ib_{-k}$. For $k = 0$, instead, we have $\sqrt{2/\pi}\, c_0 = a_0$.
Taking into account these relations and setting $b_0 = 0$, we have
\[
\sum_{k=-N}^{N} c_k \frac{e^{ikx}}{\sqrt{2\pi}} = \frac{1}{2} \sum_{k=0}^{N} (\cos kx + i \sin kx)(a_k - ib_k)
+ \frac{1}{2} \sum_{k=-N}^{-1} (\cos kx + i \sin kx)(a_{-k} + ib_{-k})
\]
\[
= \frac{a_0}{2} + \mathrm{Re} \sum_{k=1}^{N} (\cos kx + i \sin kx)(a_k - ib_k)
= \frac{a_0}{2} + \sum_{k=1}^{N} a_k \cos kx + b_k \sin kx,
\]
and the convergence of $S_N f$ to $f$ follows by the convergence in the
real-valued case.

Exercise 5.5. It suffices to note that
\[
\frac{1}{2\pi} \Big| \int_{-\pi}^{\pi} f(x) e^{-ikx} \,dx \Big|^2 = |\langle f, e_k \rangle|^2,
\]
where $(e_k)$ is the orthonormal system of Exercise 5.4 and to use its
completeness.
Exercise 5.6. From the identity $\sum_{k=0}^{2N} e^{ikz} = (e^{i(2N+1)z} - 1)/(e^{iz} - 1)$,
we get
\[
\sum_{k=-N}^{N} e^{ikz} = e^{-iNz} \sum_{k=0}^{2N} e^{ikz} = e^{-iNz} \frac{e^{i(2N+1)z} - 1}{e^{iz} - 1}
= \frac{e^{i(N+1/2)z} - e^{-i(N+1/2)z}}{e^{iz/2} - e^{-iz/2}} = \frac{\sin((N + 1/2)z)}{\sin(z/2)} \tag{A.15}
\]
and we call this term $G_N(z)$. Hence
\[
S_N f(x) = \sum_{k=-N}^{N} \Big( \frac{1}{2\pi} \int_{-\pi}^{\pi} f(y) e^{-iky} \,dy \Big) e^{ikx}
= \frac{1}{2\pi} \sum_{k=-N}^{N} \int_{-\pi}^{\pi} f(y) e^{ik(x-y)} \,dy
= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(y) G_N(x - y) \,dy.
\]
Using the fact that $\sin((N + 1/2)z)/\sin(z/2)$ has, still because of (A.15),
mean value 1 on $(-\pi, \pi)$, we get
\[
f(x) - S_N f(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} (f(x) - f(y)) G_N(x - y) \,dy.
\]
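The closed form (A.15) of the kernel $G_N$ is easy to test numerically. The following sketch compares the exponential sum with the quotient of sines at a few sample points where $\sin(z/2) \neq 0$; the function names are hypothetical.

```python
import cmath
import math


def dirichlet_sum(N, z):
    """Sum of exp(i k z) for k = -N, ..., N."""
    return sum(cmath.exp(1j * k * z) for k in range(-N, N + 1))


def dirichlet_closed_form(N, z):
    """sin((N + 1/2) z) / sin(z / 2), valid when sin(z / 2) is not zero."""
    return math.sin((N + 0.5) * z) / math.sin(z / 2)
```

At the sampled points the two expressions agree up to rounding errors, and the imaginary part of the sum is negligible, as expected since the terms with indices $k$ and $-k$ are conjugate.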

Exercise 5.7. We apply the Parseval identity to the function $f(x) =
x^2$, whose Fourier series contains no sine terms. It is simple to check, by
integration by parts, that $a_0 = 2\pi^2/3$ and that $a_k = 4k^{-2} \cos k\pi$ for
$k \ge 1$. We have then
\[
\frac{1}{\pi} \int_{-\pi}^{\pi} x^4 \,dx = \frac{2}{5} \pi^4 = \frac{a_0^2}{2} + \sum_{k=1}^{\infty} a_k^2 = \frac{4}{18} \pi^4 + \sum_{k=1}^{\infty} \frac{16}{k^4}.
\]
Rearranging terms, we get $\sum_1^\infty k^{-4} = \pi^4/90$.
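As a sanity check of the value just obtained, one can compare a partial sum of the series with $\pi^4/90$. The helper name below is hypothetical; the tail of the series after $n$ terms is smaller than $1/(3n^3)$, which justifies the tolerance used with $n = 1000$.

```python
import math


def partial_zeta4(n_terms):
    """Partial sum of 1/k**4 for k = 1, ..., n_terms."""
    return sum(1.0 / k**4 for k in range(1, n_terms + 1))


difference = abs(partial_zeta4(1000) - math.pi**4 / 90)
```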

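Exercise 5.8. The polynomials $P_n$ are given by $Q_n / \|Q_n\|_2$, where the $Q_n$
are recursively defined by $Q_0 = 1$ and
\[
Q_n(x) := x^n - \sum_{k=0}^{n-1} \frac{\langle x^n, Q_k \rangle}{\langle Q_k, Q_k \rangle} Q_k(x) = x^n - \sum_{k=0}^{n-1} \langle x^n, P_k \rangle P_k(x) \qquad \forall n \ge 1.
\]
(a) Since $Q_0 = 1$, we have $P_0 = 1/\sqrt{2}$ and $Q_1 = x - \langle x, P_0 \rangle P_0 = x$, because
$\langle x, P_0 \rangle = 0$. As a consequence $P_1(x) = \sqrt{3/2}\, x$. Since $\langle x^2, P_1 \rangle = 0$, we
have also
\[
Q_2(x) = x^2 - \langle x^2, P_0 \rangle P_0 - \langle x^2, P_1 \rangle P_1 = x^2 - \frac{1}{3}
\]
and this leads, with simple calculations, to $P_2(x) = \sqrt{45/8}\,(x^2 - 1/3)$.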
(b) Let H be the closure of the vector space spanned by Cn . This space
contains all monomials x n , and therefore all polynomials. Since the poly-
nomials are dense in C([a, b]), for the sup norm, they are also dense in
L 2 (a, b). It follows that H = L 2 (a, b). By Proposition 4.13 we conclude
that (Cn ) is complete.
(c) Set
\[
z_n := \sqrt{\frac{2n + 1}{2}}, \qquad \tilde P_n(x) := z_n \frac{1}{2^n n!} \frac{d^n}{dx^n} (x^2 - 1)^n.
\]
Clearly the polynomial $\tilde P_n$ has degree $n$. So, in order to show that $\tilde P_n =
P_n$, we have to show that $\tilde P_n$ is orthogonal to all monomials $x^k$, $k =
0, \ldots, n - 1$, and that $\|\tilde P_n\|_2 = 1$. Since $(x^2 - 1)^n$ has zeros at $\pm 1$ with
multiplicity $n$, all its derivatives at $\pm 1$ with order less than $n$ are zero. Therefore,
for $k < n$ we have
\[
\langle \tilde P_n, x^k \rangle = \frac{z_n}{2^n n!} \Big[ x^k \frac{d^{n-1}}{dx^{n-1}} (x^2 - 1)^n \Big]_{-1}^{1} - \frac{z_n}{2^n n!}\, k \int_{-1}^{1} x^{k-1} \frac{d^{n-1}}{dx^{n-1}} (x^2 - 1)^n \,dx
\]
\[
= \cdots = (-1)^k k! \frac{z_n}{2^n n!} \Big[ \frac{d^{n-k-1}}{dx^{n-k-1}} (x^2 - 1)^n \Big]_{-1}^{1} = 0.
\]
In order to prove that $\|\tilde P_n\|_2 = 1$, still integrating by parts we have
\[
\langle \tilde P_n, \tilde P_n \rangle = -\frac{z_n^2}{2^{2n} (n!)^2} \int_{-1}^{1} \frac{d^{n-1}}{dx^{n-1}} (x^2 - 1)^n \frac{d^{n+1}}{dx^{n+1}} (x^2 - 1)^n \,dx = \cdots
= \frac{z_n^2}{2^{2n} (n!)^2} \int_{-1}^{1} (1 - x^2)^n \frac{d^{2n}}{dx^{2n}} (x^2 - 1)^n \,dx. \tag{A.16}
\]
On the other hand
\[
\int_{-1}^{1} (1 - x^2)^n \,dx = 2n \int_{-1}^{1} (1 - x^2)^{n-1} x^2 \,dx
= -2n \int_{-1}^{1} (1 - x^2)^n \,dx + 2n \int_{-1}^{1} (1 - x^2)^{n-1} \,dx,
\]
so that
\[
\int_{-1}^{1} (1 - x^2)^n \,dx = \frac{2n}{2n + 1} \int_{-1}^{1} (1 - x^2)^{n-1} \,dx = \cdots
= \frac{(2n)!!}{(2n + 1)!!} \int_{-1}^{1} (1 - x^2)^0 \,dx = \frac{2(2n)!!}{(2n + 1)!!}.
\]
Taking into account that
\[
\frac{d^{2n}}{dx^{2n}} (x^2 - 1)^n = (2n)! = (2n)!!\,(2n - 1)!! = 2^n n!\,(2n - 1)!!
\]
from (A.16) we get
\[
\langle \tilde P_n, \tilde P_n \rangle = \frac{2n + 1}{2} \cdot \frac{1}{2^{2n} (n!)^2} \cdot \frac{2(2n)!!}{(2n + 1)!!} \cdot 2^n n!\,(2n - 1)!! = 1.
\]
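The computation in (c) can be cross-checked with exact rational arithmetic. The sketch below, whose helper names are all hypothetical, represents polynomials as coefficient lists, builds $R_n := \frac{1}{2^n n!} \frac{d^n}{dx^n}(x^2 - 1)^n$ (so that $\tilde P_n = z_n R_n$) and integrates products over $(-1, 1)$.

```python
from fractions import Fraction
from math import factorial


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r


def poly_diff(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [i * a for i, a in enumerate(p)][1:] or [Fraction(0)]


def integral_on_symmetric_interval(p):
    """Integral over (-1, 1): x**i contributes 2/(i+1) for even i and 0 for odd i."""
    return sum(a * Fraction(2, i + 1) for i, a in enumerate(p) if i % 2 == 0)


def rodrigues(n):
    """Coefficients of (1 / (2**n n!)) d^n/dx^n (x**2 - 1)**n."""
    p = [Fraction(1)]
    for _ in range(n):
        p = poly_mul(p, [Fraction(-1), Fraction(0), Fraction(1)])
    for _ in range(n):
        p = poly_diff(p)
    return [a / (2**n * factorial(n)) for a in p]


def inner(n, m):
    """Scalar product in L^2(-1, 1) of the polynomials R_n and R_m."""
    return integral_on_symmetric_interval(poly_mul(rodrigues(n), rodrigues(m)))
```

For small indices one finds $\langle R_n, R_m \rangle = 0$ when $n \neq m$ and $\langle R_n, R_n \rangle = 2/(2n + 1) = 1/z_n^2$, in agreement with $\|\tilde P_n\|_2 = 1$.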

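Exercise 5.9. Recall that
\[
c_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-ikx} \,dx.
\]
Integrating by parts once and using that $f(-\pi) = f(\pi)$ we get
\[
c_k = \frac{1}{ik} \frac{1}{2\pi} \int_{-\pi}^{\pi} f'(x) e^{-ikx} \,dx.
\]
Continuing in this way, in $m$ steps we get
\[
c_k = \frac{1}{(ik)^m} \frac{1}{2\pi} \int_{-\pi}^{\pi} f^{(m)}(x) e^{-ikx} \,dx.
\]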

Chapter 6
Exercise 6.1. Let us prove the inclusion
\[
(\mathcal{F}_1 \times \mathcal{F}_2) \times \mathcal{F}_3 \subset \mathcal{F}_1 \times (\mathcal{F}_2 \times \mathcal{F}_3),
\]
the proof of the converse one being analogous. We have to show that all
products $A \times B$, with $A \in \mathcal{F}_1 \times \mathcal{F}_2$ and $B \in \mathcal{F}_3$, belong to $\mathcal{F}_1 \times
(\mathcal{F}_2 \times \mathcal{F}_3)$. Keeping $B$ fixed, the class of sets $A$ for which this property
holds is a σ–algebra that contains the π–system of measurable rectangles
$A = A_1 \times A_2$ (because $A \times B = A_1 \times (A_2 \times B)$ and $A_2 \times B \in \mathcal{F}_2 \times \mathcal{F}_3$),
and therefore the whole product σ–algebra $\mathcal{F}_1 \times \mathcal{F}_2$.
For all $A$ in the product σ–algebra we have
\[
(\mu_1 \times \mu_2) \times \mu_3 (A) = \int_{X_1 \times X_2} \mu_3(A_{x_1 x_2}) \,d\mu_1 \times \mu_2 (x_1, x_2)
= \int_{X_1} \int_{X_2} \mu_3(A_{x_1 x_2}) \,d\mu_2(x_2) \,d\mu_1(x_1)
\]
\[
= \int_{X_1} \mu_2 \times \mu_3 (A_{x_1}) \,d\mu_1(x_1) = \mu_1 \times (\mu_2 \times \mu_3)(A).
\]

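Exercise 6.2. Obviously the cubes belong to $\times_1^n \mathcal{B}(\mathbb{R})$, and thanks to
Lemma 6.9 the same is true for the open sets. It follows that $\mathcal{B}(\mathbb{R}^n)$ is
contained in $\times_1^n \mathcal{B}(\mathbb{R})$. Let us consider the class
\[
\mathcal{M} := \big\{ B \subset \mathbb{R} : B \times \mathbb{R} \times \cdots \times \mathbb{R} \in \mathcal{B}(\mathbb{R}^n) \big\}.
\]
This class contains the open sets (because the product of open sets is
open) and it is a σ–algebra, so it contains $\mathcal{B}(\mathbb{R})$. We have thus proved
that all rectangles $B_1 \times \mathbb{R} \times \cdots \times \mathbb{R}$, with $B_1$ Borel, belong to $\mathcal{B}(\mathbb{R}^n)$. By
a similar argument we can show that all rectangles
\[
\mathbb{R} \times \cdots \times \mathbb{R} \times B_i \times \mathbb{R} \times \cdots \times \mathbb{R}
\]
are Borel. Intersecting rectangles in these families we obtain that all
rectangles with Borel sides belong to $\mathcal{B}(\mathbb{R}^n)$ and we conclude.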
Exercise 6.3. Assume that A, B ∈ L 1 ; then there exist Borel sets
$A'$, $B'$ and Borel Lebesgue negligible sets $N_A$, $N_B$ with $A\Delta A' \subset N_A$
and $B\Delta B' \subset N_B$. Since $A'\times B' \in \mathscr{B}(\mathbb{R}^2)$, by the previous exercise,
\[
(A\times B)\Delta(A'\times B') \subset (N_A\times\mathbb{R})\cup(\mathbb{R}\times N_B)
\]
and $N_A\times\mathbb{R}$ and $\mathbb{R}\times N_B$ are L 2 –negligible, we obtain that A × B ∈ L 2 .


This proves that L 2 contains the generators of L 1 × L 1 , and therefore
the whole σ –algebra. In order to show the strict inclusion, we consider
the set E = F × {0}, where F ⊂ R is not Lebesgue measurable. Since
E is L 2 –negligible we have E ∈ L 2 . On the other hand, since the 0
section $E_0$ coincides with F, and therefore it does not belong to L 1 , the
set E can’t belong to the product of the two σ –algebras.
Exercise 6.4. Let $\mathscr{A}$ be the σ –algebra generated by these sets; since these
sets are obviously cylindrical, $\mathscr{A}$ is contained in the product σ –algebra.
The class of sets $B \subset \times_{1}^{n} X_i$ such that $B \times X_{n+1}\times X_{n+2}\times\cdots \in \mathscr{A}$
is a σ –algebra containing the measurable rectangles $A_1\times\cdots\times A_n$, and
therefore contains the product σ –algebra $\times_{1}^{n}\mathscr{F}_i$. Therefore $\mathscr{A}$ contains
the cylindrical sets and, by definition, the whole product σ –algebra.
Exercise 6.5. The sections $T_y := \{(x, z) : (x, y, z) \in T\}$ are squares
with side length $2\sqrt{r^2 - |y|^2}$ for $0 \le |y| \le r$, hence
\[
\mathscr{L}^3(T) = \int_{-r}^{r} \mathscr{L}^2(T_y)\,dy = 8\int_0^r (r^2 - y^2)\,dy = 8\Big(r^3 - \frac{1}{3}r^3\Big) = \frac{16}{3}r^3.
\]
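The slicing computation above can be reproduced numerically. The following is a small sketch, assuming only what the solution states (each section is a square of side $2\sqrt{r^2-y^2}$); the function name and the midpoint quadrature are illustrative choices.

```python
import math


def sliced_volume(r, slices=10000):
    """Midpoint-rule approximation of the integral of L^2(T_y) over [-r, r]."""
    dy = 2.0 * r / slices
    volume = 0.0
    for j in range(slices):
        y = -r + (j + 0.5) * dy
        side = 2.0 * math.sqrt(r * r - y * y)  # side length of the square T_y
        volume += side * side * dy
    return volume
```

For r = 1 the returned value should be close to 16/3, and scaling r by a factor 2 should multiply it by 8, in agreement with the closed formula.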

Exercise 6.6. For $x \in \mathbb{R}^n$ (with n ≥ 3) let
\[
r := (x_1^2 + x_2^2)^{1/2}, \qquad A_r := \{(x_3,\dots,x_n) : x_3^2+\cdots+x_n^2 < 1 - r^2\}.
\]
Then, using polar coordinates we get
\[
\omega_n = \int_{\{r<1\}} \mathscr{L}^{n-2}(A_r)\,dx_1\,dx_2 = 2\pi\omega_{n-2}\int_0^1 r(1-r^2)^{(n-2)/2}\,dr = \frac{2\pi}{n}\,\omega_{n-2}.
\]
Therefore
\[
\omega_{2k} = \frac{2^{k-1}\pi^{k-1}}{2k(2k-2)\cdots 4}\,\omega_2 = \frac{\pi^k}{k!}
\]
and an analogous argument gives $\omega_{2k+1} = 2^{k+1}\pi^k/(2k+1)!!$.
Exercise 6.7. In order to show that $\omega_n = \pi^{n/2}/\Gamma(\frac{n}{2}+1)$ we show that the right
hand side satisfies the same recursion formula as in the previous exercise.
Since (thanks to the identities $\Gamma(1) = 1$, $\Gamma(1/2) = \sqrt{\pi}$) the formula
holds when n = 1, 2, this will prove that the identity holds for all n. For
n ≥ 2 we have
\[
\frac{\pi^{n/2}}{\Gamma(\frac{n}{2}+1)} = \frac{\pi\cdot\pi^{(n-2)/2}}{\frac{n}{2}\,\Gamma(\frac{n}{2})} = \frac{2\pi}{n}\,\frac{\pi^{(n-2)/2}}{\Gamma(\frac{n-2}{2}+1)}.
\]
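The agreement between the recursion of Exercise 6.6 and the closed formula of Exercise 6.7 can be illustrated with a short computation. This is only a sketch: the starting values ω_1 = 2 and ω_2 = π are the standard ones, and the function names are hypothetical.

```python
import math


def omega_recursive(n):
    """Volume of the unit ball in R^n from the recursion ω_n = (2π/n) ω_{n-2}."""
    if n == 1:
        return 2.0
    if n == 2:
        return math.pi
    return 2.0 * math.pi / n * omega_recursive(n - 2)


def omega_gamma(n):
    """Closed form π^{n/2} / Γ(n/2 + 1)."""
    return math.pi ** (n / 2.0) / math.gamma(n / 2.0 + 1.0)
```

For instance `omega_recursive(3)` reproduces 4π/3, and the two functions should agree up to rounding for every n ≥ 1.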

Exercise 6.8. We know, by Exercise 2.4, that there exist a λ–negligible
set N ∈ F × G and a F × G –measurable function F̃ : X × Y → [0, +∞]
such that {F ≠ F̃} is contained in N . By applying the Fubini–Tonelli
theorem to $1_N$ we obtain that $N_x$ is ν–negligible in Y for μ–a.e. x ∈ X.
Since {F(x, ·) ≠ F̃(x, ·)} ⊂ $N_x$, still Exercise 2.4 gives that F(x, ·) is
ν–measurable for μ–a.e. x ∈ X. This proves statement (i). Since, still for
μ–a.e. x ∈ X, the integral on Y (with respect to ν) of F(x, ·) coincides
with the integral of F̃(x, ·), statements (ii) and (iii) follow by applying
the Fubini–Tonelli theorem to F̃.
Exercise 6.9. Indeed, $\mu(D^y) = \mu(\{y\}) = 0$ for all y ∈ Y , so that
$\int_Y \mu(D^y)\,d\nu(y) = 0$. On the other hand, $\nu(D_x) = \nu(\{x\}) = 1$ for all
x ∈ X, so that $\int_X \nu(D_x)\,d\mu(x) = 1$.

Exercise 6.10. Let (h(k)) be a subsequence such that $\sum_k \|f_{h(k)} - f\|_1$ is
convergent. Then the Fubini–Tonelli theorem gives
\[
\int_X \sum_{k=0}^{\infty} \int_Y |f_{h(k)}(x,y) - f(x,y)|\,d\nu(y)\,d\mu(x)
= \sum_{k=0}^{\infty} \int_{X\times Y} |f_{h(k)}(x,y) - f(x,y)|\,d\mu\times\nu < \infty.
\]
It follows that $\sum_k \|f_{h(k)}(x,\cdot) - f(x,\cdot)\|_{L^1(\nu)}$ is finite for μ–a.e. x ∈ X,
and for any such x the functions $f_{h(k)}(x,\cdot)$ converge to $f(x,\cdot)$ in $L^1(\nu)$.
Choosing Y = { ȳ} and ν = δ ȳ , to provide a counterexample it is suffi-
cient to consider any example (see Remark 3.7) of a sequence converging
in L 1 but not μ–almost everywhere.

Exercise 6.11. It suffices to apply (6.15) to |h| to show that $\int |h|\,d(f\mu)$ is
finite if and only if $\int |h| f\,d\mu$ is finite.
Exercise 6.12. We prove the property for the sup, the property for the inf
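being analogous. If A = B1 ∪ B2 with B1 ∈ F and B2 ∈ F disjoint, we
have
\[
\begin{aligned}
f\mu(B_1) + g\mu(B_2) &= \int_{B_1} f\,d\mu + \int_{B_2} g\,d\mu \\
&\le \int_{B_1} f\vee g\,d\mu + \int_{B_2} f\vee g\,d\mu \\
&= \int_{A} f\vee g\,d\mu.
\end{aligned}
\]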

The arbitrariness of this decomposition proves that [( f μ) ∨ (gμ)](A) ≤
( f ∨ g)μ(A). The converse inequality can be obtained noticing that, in the
chain of equalities and inequalities above, the inequality becomes an equality
if we choose B1 = A ∩ { f ≥ g} and B2 = A ∩ { f < g}.
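On a finite set the identity ( f μ) ∨ (gμ) = ( f ∨ g)μ can be verified by brute force. The sketch below assumes, as in the argument above, that the supremum of the two measures on A is computed over all splittings A = B1 ∪ B2 into two disjoint sets; the data structures (dictionaries indexed by points) and function names are hypothetical.

```python
from itertools import product


def sup_measure(f, g, mu, points):
    """Brute-force [(fμ) ∨ (gμ)](A) for A = points on a finite space.

    Each point is assigned either to B1 (label 0) or to B2 (label 1), and we
    maximise (fμ)(B1) + (gμ)(B2) over all such splittings of A.
    """
    best = float("-inf")
    for labels in product((0, 1), repeat=len(points)):
        value = 0.0
        for x, label in zip(points, labels):
            density = f[x] if label == 0 else g[x]
            value += density * mu[x]
        best = max(best, value)
    return best


def max_density_measure(f, g, mu, points):
    """((f ∨ g)μ)(A): sum of max(f(x), g(x)) μ({x}) over x in A."""
    return sum(max(f[x], g[x]) * mu[x] for x in points)


# Hypothetical data on a four-point space.
POINTS = ["a", "b", "c", "d"]
MU = {"a": 0.5, "b": 1.0, "c": 2.0, "d": 0.25}
F = {"a": 1.0, "b": 3.0, "c": 0.0, "d": 2.0}
G = {"a": 2.0, "b": 1.0, "c": 0.5, "d": 2.0}
```

With these data both `sup_measure(F, G, MU, POINTS)` and `max_density_measure(F, G, MU, POINTS)` evaluate to 5.5, the optimal splitting being B1 = { f ≥ g}, B2 = { f < g}.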
Exercise 6.13. Here $\underline{\mu}$ and $\overline{\mu}$ denote respectively the infimum and the
supremum of the family. It is easy to check that $\underline{\mu} \le \mu_i$ (respectively, $\overline{\mu} \ge \mu_i$) for
all i ∈ I , and that any measure ν with this property is less than $\underline{\mu}$ (resp.
greater than $\overline{\mu}$): just write $\nu(B) = \sum_k \nu(B_k) \le \sum_k \mu_{i(k)}(B_k)$ (resp.
$\ge \sum_k \mu_{i(k)}(B_k)$). So, it remains to show that $\underline{\mu}$ and $\overline{\mu}$ are σ –additive.
For any map i : N → I , A1 , A2 ∈ F disjoint and any countable F –measurable
partition $(B_k)$ of A1 ∪ A2 we have
\[
\sum_{k=0}^{\infty} \mu_{i(k)}(B_k) = \sum_{k=0}^{\infty} \mu_{i(k)}(B_k\cap A_1) + \sum_{k=0}^{\infty} \mu_{i(k)}(B_k\cap A_2).
\]
Estimating the right hand side from below with $\underline{\mu}(A_1) + \underline{\mu}(A_2)$ we get
(because $(B_k)$ is arbitrary) that $\underline{\mu}$ is superadditive, i.e. $\underline{\mu}(A_1\cup A_2) \ge
\underline{\mu}(A_1) + \underline{\mu}(A_2)$. With a similar argument one can prove not only that $\overline{\mu}$
is subadditive, but also that $\overline{\mu}$ is σ –subadditive (it suffices to consider a
countable F –measurable family, instead of 2 sets).
Now, let us prove that $\underline{\mu}$ is subadditive and $\overline{\mu}$ is superadditive. Let
A1 , A2 ∈ F be disjoint and let $(B_k^1)$, $(B_k^2)$ be countable F –measurable
partitions of A1 and A2 respectively. If $i_1, i_2 : \mathbb{N} \to I$ we define $i(2k) =
i_1(k)$, $B_{2k} = B_k^1$ and $i(2k+1) = i_2(k)$, $B_{2k+1} = B_k^2$, so that
\[
\underline{\mu}(A_1\cup A_2) \le \sum_{k=0}^{\infty} \mu_{i(k)}(B_k) = \sum_{k=0}^{\infty} \mu_{i_1(k)}(B_k^1) + \sum_{k=0}^{\infty} \mu_{i_2(k)}(B_k^2).
\]
By the arbitrariness of $(B_k^1)$, $(B_k^2)$, $i_1$ and $i_2$ we conclude that $\underline{\mu}(A_1\cup A_2) \le
\underline{\mu}(A_1) + \underline{\mu}(A_2)$. With a similar argument one can prove that $\underline{\mu}$ is even
σ –subadditive (one has to use a bijection between N × N and N) and that
$\overline{\mu}$ is superadditive.
Exercise 6.14. If for all ε > 0 there exists δ > 0 satisfying

A ∈ F , μ(A) < δ ⇒ ν(A) < ε

then ν ≪ μ: indeed, if μ(A) = 0 the implication above holds for all
ε > 0, hence ν(A) = 0. If ν is finite, to prove the converse we argue by
contradiction. Assume that, for some $\varepsilon_0$, we can find sets $A_n \in \mathscr{F}$ with
$\mu(A_n) < 2^{-n}$ and $\nu(A_n) \ge \varepsilon_0$. Then, by the Borel–Cantelli lemma the
set $A := \limsup_n A_n$ is μ–negligible. On the other hand, we have
\[
\nu\Big(\bigcup_{m=n}^{\infty} A_m\Big) \ge \nu(A_n) \ge \varepsilon_0
\]
and therefore (here we use the assumption that ν is finite) $\nu(A) \ge \varepsilon_0$,
contradicting the absolute continuity of ν with respect to μ.
Exercise 6.15. Let B ∈ F be a μ–negligible set where ν is concentrated.
Then ν(E) = ν(E ∩ B) for all E ∈ F . But, by the absolute continuity
of ν with respect to μ, we have ν(E ∩ B) = 0 because E ∩ B ⊂ B is
μ–negligible.
Exercise 6.16. Let B ∈ F be a ν–negligible set where σ is concentrated.
Then

σ (E) = σ (E ∩ B) ≤ μ(E ∩ B) + ν(E ∩ B) ≤ μ(E) ∀E ∈ F ,

where we used the fact that ν(E ∩ B) = 0 because E ∩ B ⊂ B is
ν–negligible.
Exercise 6.17. It is easy to check that the class of functions f satisfying
f μ ≤ ν is a lattice. Hence, given a maximizing sequence $(f_h)$ in (6.20),
possibly replacing $f_h$ by $\max_{i\le h} f_i$, we can assume that $f_h \uparrow f$. The
monotone convergence theorem gives that f is a maximizer.
In order to show that ν = f μ we set σ = ν − f μ ≥ 0 and notice that
σ satisfies the following property:
\[
t > 0,\quad B \in \mathscr{F},\quad t 1_B \mu \le \sigma \quad\Longrightarrow\quad \mu(B) = 0. \tag{A.17}
\]
Indeed, the integrals $\int_X (f + t 1_B)\,d\mu$ and $\int_X f\,d\mu$ have to coincide,
because $(f + t 1_B)\mu \le \nu$.
Exercise 6.18. We have to prove that any measure σ satisfying (A.17)
is concentrated on a μ-negligible set. To this aim, let us consider the
problem

inf {μ(A) : A ∈ F, σ is concentrated on A} .

By taking the intersection of a minimizing sequence it is easy to check
that also this problem has a solution A; we have to show that μ(A) = 0.
By the minimality of A, the implication
\[
\mathscr{F} \ni B \subset A,\quad \mu(B) > 0 \quad\Longrightarrow\quad \sigma(B) > 0 \tag{A.18}
\]
holds. Let us consider the numbers
\[
\xi_h := \sup\big\{\mu(B) : \mathscr{F} \ni B \subset A,\ 1_B\mu \ge 2^h 1_B\sigma\big\}
\]
and let us prove that $\xi_h \to 0$ as h → ∞. Given maximizers $B_h \subset A$,
whose existence is easy to check, we have $\mu(B_h) \ge 2^h\sigma(B_h)$ and in
particular $\sum_h \sigma(B_h) < \infty$. Hence
\[
\sigma\Big(\limsup_{h\to\infty} B_h\Big) = 0
\]
and (A.18) tells us that necessarily
\[
0 = \mu\Big(\limsup_{h\to\infty} B_h\Big) \ge \limsup_{h\to\infty} \mu(B_h).
\]

Let us show now that the maximality of $B_h$ implies that $\mu(C) \le 2^h\sigma(C)$
for any set $C \subset A\setminus B_h$, i.e. $t 1_{A\setminus B_h}\mu \le \sigma$. Indeed, if there is $C_0 \subset A\setminus B_h$
with $\mu(C_0) > 2^h\sigma(C_0)$, the maximality of $B_h$ provides a minimal integer
$h_1 \ge 1$ and $C_1 \subset C_0$ satisfying $\mu(C_1) \le 2^h\sigma(C_1) - 1/h_1$. Let us
consider $C_0\setminus C_1$; we still have $\mu(C_0\setminus C_1) > 2^h\sigma(C_0\setminus C_1)$ and the
maximality of $B_h$ provides a minimal integer $h_2 \ge h_1$ and $C_2 \subset C_0\setminus C_1$
satisfying $\mu(C_2) \le 2^h\sigma(C_2) - 1/h_2$. Continuing in this way we
have a nondecreasing sequence $(h_i)$ of integers and $(C_i) \subset \mathscr{F}$ such that
$\mu(C_i) \le 2^h\sigma(C_i) - 1/h_i$ and $C_i \subset C_0\setminus\bigcup_{j=1}^{i-1} C_j$ for all i ≥ 2; moreover
$h_i$ is the least integer for which there is such $C_i$. Now $\lim_i h_i = \infty$, since
the $C_i$ are pairwise disjoint. Setting $C = C_0\setminus\bigcup_{1}^{\infty} C_i$, for every measurable
set $F \subset C$, since $F \subset C_0\setminus\bigcup_{1}^{i-1} C_j$ for all i ≥ 2, we have
$\mu(F) \ge 2^h\sigma(F) - 1/(h_i - 1)$ (if $h_i \ge 2$) and then $\mu(F) \ge 2^h\sigma(F)$. Hence $B_h\cup C$
is an admissible set for the maximum problem defining $\xi_h$, against the
maximality of $B_h$.
We choose h in such a way that $\xi_h < \mu(A)$ and set $t = 2^{-h}$, $B = A\setminus B_h$
in (A.17). From (A.17) we conclude that μ(B) = 0, contradicting the fact
that $\mu(B) = \mu(A) - \xi_h > 0$.
Exercise 6.19. Let $\nu = \nu^+ - \nu^-$ and let $\nu^+ = \nu_a^+ + \nu_s^+$, $\nu^- = \nu_a^- + \nu_s^-$ be
the Lebesgue decompositions with respect to μ of $\nu^+$ and $\nu^-$ respectively.
Then, $\nu_a := \nu_a^+ - \nu_a^-$ and $\nu_s := \nu_s^+ - \nu_s^-$ provide a decomposition
$\nu = \nu_a + \nu_s$ with $\nu_a$, $\nu_s$ signed, $|\nu_a| \ll \mu$ and $|\nu_s| \perp \mu$.
If μ is signed and A provides a Hahn decomposition of μ (i.e. $\mu^+(E) =
\mu(E\cap A)$ and $\mu^-(E) = -\mu(E\cap A^c)$), we repeat the decomposition above
in A, relative to ν and $\mu^+$, and in $B = A^c$, relative to ν and $\mu^-$. Denoting
by $\nu_a^A + \nu_s^A$ and $\nu_a^B + \nu_s^B$ the two decompositions obtained,
\[
\nu_a(E) := \nu_a^A(E\cap A) + \nu_a^B(E\cap B), \qquad \nu_s(E) := \nu_s^A(E\cap A) + \nu_s^B(E\cap B)
\]
provides the desired decomposition $\nu = \nu_a + \nu_s$ with $|\nu_a| \ll |\mu|$ and
$|\nu_s| \perp |\mu|$.
The uniqueness of these decompositions can be proved with the same
argument used in the case of nonnegative measures.
Exercise 6.20. Let B ∈ F and let $(B_i)$ be a F –measurable partition of
B; since
\[
\sum_{i=0}^{\infty} |f\mu(B_i)| = \sum_{i=0}^{\infty} \Big|\int_{B_i} f\,d\mu\Big| \le \sum_{i=0}^{\infty} \int_{B_i} |f|\,d\mu = \int_B |f|\,d\mu,
\]
we obtain that $|f\mu|(B) \le |f|\mu(B)$. To prove the converse inequality fix
ε > 0 and define $B_i = B\cap f^{-1}(I_i)$, where $I_i = \varepsilon[i, i+1)$, i ∈ Z. Since
the oscillation of $|f - \varepsilon i|$ and $||f| - \varepsilon|i||$ in $f^{-1}(I_i)$ are less than ε, we
get
\[
\Big|\int_{B_i} f\,d\mu - \varepsilon i\,\mu(B_i)\Big| \le \varepsilon\mu(B_i), \qquad
\Big|\int_{B_i} |f|\,d\mu - \varepsilon|i|\,\mu(B_i)\Big| \le \varepsilon\mu(B_i),
\]
hence
\[
\Big|\int_{B_i} |f|\,d\mu - \Big|\int_{B_i} f\,d\mu\Big|\Big| \le 2\varepsilon\mu(B_i).
\]
It follows that
\[
\sum_{i\in\mathbb{Z}} |f\mu(B_i)| = \sum_{i\in\mathbb{Z}} \Big|\int_{B_i} f\,d\mu\Big| \ge \sum_{i\in\mathbb{Z}} \Big(\int_{B_i} |f|\,d\mu - 2\varepsilon\mu(B_i)\Big)
= \int_B |f|\,d\mu - 2\varepsilon\mu(B).
\]

Since ε is arbitrary the converse inequality follows.


Exercise 6.21. If x < 0 or x ≥ 1 all repartition functions are respectively
equal to 0 or 1, so we need to consider only the case x ∈ [0, 1). The
repartition function of $1_{[0,1]}\mathscr{L}^1$ obviously is equal to x, while
\[
\mu_h((-\infty, x]) = \frac{\#\{i \in [1,h] : i \le hx\}}{h} = \frac{[hx]}{h},
\]
where [s] denotes the integer part of s. Using the inequalities s − 1 <
[s] ≤ s with s = hx we obtain that μh ((−∞, x]) → x.
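The counting formula and the convergence rate can be illustrated with exact rational arithmetic. The sketch below assumes that $\mu_h$ is the uniform measure on the points i/h, i = 1, . . . , h, which is what the counting expression above describes; the function name is hypothetical.

```python
import math
from fractions import Fraction


def empirical_cdf(h, x):
    """μ_h((-∞, x]) for μ_h = (1/h) Σ_{i=1}^{h} δ_{i/h}, by direct counting."""
    count = sum(1 for i in range(1, h + 1) if Fraction(i, h) <= x)
    return Fraction(count, h)
```

For x in [0, 1) the value returned equals [hx]/h, and the inequalities s − 1 < [s] ≤ s give 0 ≤ x − μ_h((−∞, x]) < 1/h.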
Exercise 6.22. The argument is similar to the one used in the proof of
Theorem 6.27: if $y < x < y'$ and $y, y' \in D$ we have
\[
F(y) = \lim_{h\to\infty} F_h(y) \le \liminf_{h\to\infty} F_h(x) \le \limsup_{h\to\infty} F_h(x) \le \lim_{h\to\infty} F_h(y') = F(y').
\]
Letting $y \uparrow x$ and $y' \downarrow x$, we conclude.


Exercise 6.23. We define $a_{-h^2} = \mu((-\infty, -h])$ and, for $-h^2 < i \le h^2$,
$a_i = \mu(((i-1)/h, i/h])$. Let us denote by $\mu_h$ the measure obtained in
this way. If $x \in (-h, h]$ and i is the smallest integer in $(-h^2, h^2]$ such
that $x \le i/h$, we have
\[
\mu\Big(\Big(-\infty, x - \frac{1}{h}\Big]\Big) \le \mu\Big(\Big(-\infty, \frac{i-1}{h}\Big]\Big) = \sum_{j=-h^2}^{i-1} a_j \le \mu_h((-\infty, x]).
\]
If x is not an atom of μ, this proves that
\[
\liminf_{h} \mu_h((-\infty, x]) \ge \mu((-\infty, x]).
\]
Analogously
\[
\mu\Big(\Big(-\infty, x + \frac{1}{h}\Big]\Big) \ge \mu\Big(\Big(-\infty, \frac{i}{h}\Big]\Big) = \sum_{j=-h^2}^{i} a_j \ge \mu_h((-\infty, x]).
\]
If x is not an atom of μ, this proves that
\[
\limsup_{h} \mu_h((-\infty, x]) \le \mu((-\infty, x]).
\]

Exercise 6.24. Let us assume that (6.31) holds. If $F_i(x) \to 1$ as $x \to +\infty$
uniformly in i ∈ I , for any ε > 0 we can find x such that $1 - F_i(x) < \varepsilon/2$
for all i ∈ I . Analogously, we can find y < x such that $F_i(y) < \varepsilon/2$
for all i ∈ I . Then, the interval I = (y, x] satisfies μi (I ) > 1 − ε for all
i ∈ I , because I c = (−∞, y] ∪ (x, +∞).
Exercise 6.25. If μ is the weak limit and ε > 0 is given, let us choose
an integer n ≥ 1 such that μ([1 − n, n − 1]) > 1 − ε and points
x ∈ (−n, 1 − n) and y ∈ (n − 1, n) where the repartition functions of $\mu_h$
are converging to the repartition function of μ. Then, since $\mu((-\infty, x]) +
1 - \mu((-\infty, y]) = \mu(\mathbb{R}\setminus(x, y]) < \varepsilon$, there exists $n_\varepsilon \in \mathbb{N}$ such that
$\sup_{n\ge n_\varepsilon} \mu_n((-\infty, x]) + 1 - \mu_n((-\infty, y]) < \varepsilon$. Let now $x'$ and $y'$ be such that
\[
\mu_n((-\infty, x']) + 1 - \mu_n((-\infty, y']) < \varepsilon \qquad \forall n = 0, \dots, n_\varepsilon - 1.
\]
Then, the interval $I = [\min\{x, x'\}, \max\{y, y'\}]$ satisfies $\inf_n \mu_n(I) > 1 - \varepsilon$.
Exercise 6.26.
(a) $\lim_h \int_{\mathbb{R}} g\,d\mu_h = \int_{\mathbb{R}} g\,d\mu$ ∀g ∈ $C_b(\mathbb{R})$ (that is, (6.32));
(b) $\lim_h \int_{\mathbb{R}} g\,d\mu_h = \int_{\mathbb{R}} g\,d\mu$ ∀g ∈ $C_c(\mathbb{R})$;
(c) $F_h$ converge to F on all points where F is continuous;
(d) $F_h$ converge to F on a dense subset of R;
(e) $\lim_h \mu_h(\mathbb{R}) = \mu(\mathbb{R})$;
(f) $(\mu_h)$ is tight.
We consider the functions $\rho_h(x) := \rho(x+h)$, where $\rho(x) = (2\pi)^{-1/2}e^{-x^2/2}$
is the Gaussian, and $\mu_h = \rho_h\lambda$ (λ being the Lebesgue measure), μ = 0.
In this case (c), (d) do not hold, because $F_h(x) \to 1 \ne 0 = F(x)$ for all
x ∈ R, (e) does not hold and (b) holds.
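The behaviour of the repartition functions in this example can be made concrete: since $\mu_h$ has density ρ(· + h), one has $F_h(x) = \Phi(x + h)$, where Φ is the standard Gaussian distribution function. The sketch below expresses Φ through the error function; the function name is hypothetical.

```python
import math


def shifted_gaussian_cdf(h, x):
    """F_h(x) = μ_h((-∞, x]) for μ_h = ρ(· + h) λ, i.e. Φ(x + h)."""
    return 0.5 * (1.0 + math.erf((x + h) / math.sqrt(2.0)))
```

For any fixed x the values `shifted_gaussian_cdf(h, x)` increase to 1 as h grows, so the limit differs from F(x) = 0 at every point, while the total masses $\mu_h(\mathbb{R}) = 1$ do not converge to μ(R) = 0.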
a ⇒ b, e. This is easy, because Cc (R) ⊂ Cb (R) and 1R ∈ Cb (R).
a ⇒ c. This follows by second part of the proof of Theorem 6.28.
d ⇔ c. This is Exercise 6.22.
b∧e ⇒ c. This follows by the same argument used in the proof of second
part of Theorem 6.28: the sequence (gk ) monotonically convergent to 1 A
can be chosen in Cc (R), and this shows that lim infh μh (A) ≥ μ(A) for
all A ⊂ R open. Using (e) and passing to the complementary sets, we
obtain lim suph μh (C) ≤ μ(C) for all C ⊂ R closed.
d ⇒ f . This follows by the same argument used in the solution of
Exercise 6.25.
d ∧ f ⇒ e. For all x ∈ D, with D dense, we have limh μh ((−∞, x]) =
μ((−∞, x]). Since μh ((−∞, x]) → μh (R) as x → +∞ uniformly in
h, we can pass to the limit as x ∈ D → +∞ to obtain limh μh (R) =
limx→+∞ μ((−∞, x]) = μ(R).
d ∧ f ⇒ a. This follows by the same argument used in the first part of
the proof of Theorem 6.28, choosing the points ti in the partitions to be
in the dense set where convergence occurs.
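Exercise 6.27. Set
\[
g(\xi) := \frac{1}{\sqrt{2\pi}\,\sigma}\int_{\mathbb{R}} e^{i\xi x}e^{-x^2/(2\sigma^2)}\,dx.
\]
Notice that g(0) = 1, and that differentiation theorems under the integral
sign (2) and an integration by parts give
\[
\begin{aligned}
g'(\xi) &= \frac{1}{\sqrt{2\pi}\,\sigma}\int_{\mathbb{R}} i e^{i\xi x}\big(x e^{-x^2/(2\sigma^2)}\big)\,dx \\
&= -\frac{\sigma^2}{\sqrt{2\pi}\,\sigma}\int_{\mathbb{R}} i e^{i\xi x}\,\frac{d}{dx} e^{-x^2/(2\sigma^2)}\,dx \\
&= -\frac{\xi\sigma^2}{\sqrt{2\pi}\,\sigma}\int_{\mathbb{R}} e^{i\xi x}e^{-x^2/(2\sigma^2)}\,dx.
\end{aligned}
\]
Therefore g satisfies the linear differential equation $g'(\xi) = -\sigma^2\xi g(\xi)$,
whose general solution is $g(\xi) = ce^{-\sigma^2\xi^2/2}$. Taking into account that
g(0) = 1, we get c = 1.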
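The closed formula $g(\xi) = e^{-\sigma^2\xi^2/2}$ can be compared with a direct numerical evaluation of the integral defining g. The following is a minimal sketch, assuming a truncation of the integral to a hypothetical window of twelve standard deviations and a midpoint quadrature; names and parameters are illustrative.

```python
import cmath
import math


def gaussian_characteristic(xi, sigma, half_width=12.0, samples=20000):
    """Midpoint-rule approximation of g(ξ) on [-half_width·σ, half_width·σ]."""
    limit = half_width * sigma
    dx = 2.0 * limit / samples
    total = 0j
    for j in range(samples):
        x = -limit + (j + 0.5) * dx
        total += cmath.exp(1j * xi * x) * math.exp(-x * x / (2.0 * sigma * sigma))
    return total * dx / (math.sqrt(2.0 * math.pi) * sigma)
```

The returned complex number should have negligible imaginary part and real part close to `math.exp(-sigma**2 * xi**2 / 2)`.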

(2) In this case, the application of the theorem is justified by the fact that
$\sup_{\xi\in I}\big|\frac{d}{d\xi}\,e^{i\xi x}e^{-x^2/(2\sigma^2)}\big|$ is Lebesgue integrable for all bounded intervals I

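Exercise 6.28. Let us approximate μ by $\mu_n = 1_{(-n,n)}\mu$; using the inequality
\[
|e^{i\xi x} - e^{i\eta x}| \le |x||\xi - \eta|, \qquad x, \xi, \eta \in \mathbb{R}
\]
we obtain that
\[
|\hat\mu_n(\xi) - \hat\mu_n(\eta)| \le |\xi - \eta|\int_{\mathbb{R}} |x|\,d\mu_n(x) \le n|\xi - \eta|,
\]
therefore $\hat\mu_n$ is uniformly continuous. Since $|\hat\mu_n(\xi) - \hat\mu(\xi)| \le \mu(\mathbb{R}\setminus(-n, n))$,
we have that $\hat\mu_n \to \hat\mu$ uniformly as n → ∞, therefore $\hat\mu$ is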
uniformly continuous (indeed, given ε > 0, find n such that sup |μ̂n −
μ̂| < ε/2 and δ = ε/(2n) to obtain |μ̂n (ξ ) − μ̂n (η)| ≤ ε/2 whenever
|ξ − η| < δ, and then |μ̂(ξ ) − μ̂(η)| < ε).
Exercise 6.29. Obviously $|\hat\mu(\xi_0)| = 1$, and we set $c = \hat\mu(\xi_0) = e^{i\theta}$ for
some θ ∈ R. Since
\[
\int_{\mathbb{R}} |1 - \bar{c}\,e^{ix\xi_0}|^2\,d\mu(x) = 2 - \bar{c}c - c\bar{c} = 0,
\]
we obtain that $e^{ix\xi_0} = c$ for μ–a.e. x ∈ R. This implies that $x\xi_0 - \theta \in 2\pi\mathbb{Z}$
for μ–a.e. x ∈ R, so that μ is concentrated on the set of points
{(2nπ + θ)/ξ0 }n∈N , and it suffices to set x0 = θ/ξ0 to obtain the stated
representation of μ as a sum of Dirac masses.
Obviously $|\hat\mu| \equiv 1$ if μ is a Dirac mass. Conversely, if $|\hat\mu| \equiv 1$, we find $x_0$
with $\mu(\{x_0\}) > 0$ and $\xi_0, \xi_0' \in \mathbb{R}\setminus\{0\}$ with $\xi_0/\xi_0' \notin \mathbb{Q}$ to obtain that μ is
concentrated on the set $\{2n\pi/\xi_0 + x_0\}_{n\in\mathbb{N}}$ and on the set $\{2n\pi/\xi_0' + x_0\}_{n\in\mathbb{N}}$.
By our choice of $\xi_0$ and $\xi_0'$, the intersection of the two sets is the singleton
$\{x_0\}$, and this proves that $\mu = \delta_{x_0}$.

Chapter 7
Exercise 7.1. Let C > 0 be such that |H (x) − H (y)| ≤ C|x − y| for all
x, y ∈ R. Let ε > 0 and let δ > 0 be such that $\sum_i |f(b_i) - f(a_i)| < \varepsilon/C$
whenever $\sum_i (b_i - a_i) < \delta$. We have $\sum_i |H(f(b_i)) - H(f(a_i))| \le
C\sum_i |f(b_i) - f(a_i)| < \varepsilon$ whenever $\sum_i (b_i - a_i) < \delta$. In particular, choosing
f (t) = t, we see that Lipschitz functions are absolutely continuous.
Exercise 7.2. We assume that both L 1 (E) > 0 and L 1 (R \ E) > 0. Let
a ∈ R be such that L 1 ((a, ∞) ∩ E) > 0 and L 1 ((a, ∞) \ E) > 0, and
define F(t) = L 1 (E ∩(a, t)). By our choice of a, F(t) and (t −a)− F(t)
are not identically 0 in (a, +∞).
If t > a is a rarefaction point of E, we have
\[
F'_+(t) = \lim_{h\downarrow 0}\frac{F(t+h) - F(t)}{h} = \lim_{h\downarrow 0}\frac{\mathscr{L}^1((t, t+h)\cap E)}{h} = 0.
\]
Analogously, $F'_-(t) = 0$ and we find that $F'$ is equal to 0 at all rarefaction
points. A similar argument proves that $F' = 1$ at all density points. Let
now $t_0 \in (a, \infty)$ where $0 < F(t_0) < (t_0 - a)$ and apply the mean value
theorem to obtain $t_0' \in (a, t_0)$ such that
\[
F(t_0) = (t_0 - a)F'(t_0').
\]
By our choice of $t_0$ it follows that $F'(t_0') \in (0, 1)$, a contradiction (because
either $t_0'$ is a density point or a rarefaction point).
Exercise 7.3. Assume first that ϕ is continuous and bounded. Let
$H(z) := \int_{f(a)}^{z} \varphi(y)\,dy$. By the (classical) fundamental theorem of the integral
calculus, H is differentiable and $H'(z) = \varphi(z)$ for all z ∈ f (I ). By the
chain rule and Exercise 7.1, the function
\[
F(t) := \int_{f(a)}^{f(t)} \varphi(y)\,dy = H(f(t))
\]
is absolutely continuous and it has derivative equal to $H'(f(t))f'(t) =
\varphi(f(t))f'(t)$ at all points t where f is differentiable. On the other hand,
still by the fundamental theorem of the integral calculus, the function
\[
G(t) := \int_a^t \varphi(f(x))f'(x)\,dx
\]
has derivative equal to $(\varphi\circ f)f'$ $\mathscr{L}^1$–a.e. in [a, b]. Since both F and G
vanish at t = a, they coincide.
By the dominated convergence theorem, the identity of the two functions
persists if $\varphi = 1_A$, with A open (because $1_A$ is the pointwise limit of
continuous functions). By applying Dynkin’s theorem to the class $\mathscr{M}$ of
the sets $E \in \mathscr{B}(f(I))$ such that $\int_{f(a)}^{f(t)} 1_E(y)\,dy = \int_a^t 1_E(f(x))f'(x)\,dx$
we obtain that the formula holds for all $\varphi = 1_E$ with E Borel. Finally
we obtain it for simple functions and, by uniform approximation, for
bounded Borel functions.
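The identity F = G can be tested numerically even when f is not monotone. The sketch below assumes the hypothetical choices f(x) = x³ − x, ϕ = cos and a = 0, none of which come from the text, and approximates both integrals by a composite midpoint rule.

```python
import math


def midpoint_integral(func, lower, upper, steps=20000):
    """Composite midpoint rule for the oriented integral of func from lower to upper."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (j + 0.5) * width) for j in range(steps))


def both_sides(t, a=0.0):
    """Return (F(t), G(t)) for the hypothetical choice f(x) = x**3 - x, φ = cos."""
    f = lambda x: x ** 3 - x
    f_prime = lambda x: 3.0 * x ** 2 - 1.0
    phi = math.cos
    left = midpoint_integral(phi, f(a), f(t))
    right = midpoint_integral(lambda x: phi(f(x)) * f_prime(x), a, t)
    return left, right
```

With these choices both values should be close to sin(t³ − t), the primitive of cos evaluated at f(t), although f decreases on part of the interval.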
Exercise 7.4. Choosing $g = 1_N$, by Exercise 7.3 we get $\int_a^b 1_{f^{-1}(N)} f'\,dx = 0$,
because $1_N\circ f = 1_{f^{-1}(N)}$. Let $h^+$ and $h^-$ be respectively the positive
and negative part of $f' 1_{f^{-1}(N)}$. Since
\[
\int_a^b h^+\,dx - \int_a^b h^-\,dx = \int_a^b f' 1_{f^{-1}(N)}\,dx = 0
\]
for all intervals (a, b), it follows that $h^+ = h^-$ $\mathscr{L}^1$–a.e. in R. As a
consequence, $f' = 0$ $\mathscr{L}^1$–a.e. in $f^{-1}(N)$.

Chapter 8
Exercise 8.1. Both are measures in $(Z, \mathscr{H})$. If $B \in \mathscr{H}$ then $(g\circ f)_\#\mu(B) =
\mu(f^{-1}(g^{-1}(B)))$, because $(g\circ f)^{-1} = f^{-1}\circ g^{-1}$. On the other hand,
\[
g_\#(f_\#\mu)(B) = f_\#\mu(g^{-1}(B)) = \mu(f^{-1}(g^{-1}(B))).
\]

Exercise 8.2. Let n ≥ 1 integer, $0 \le k < 2^n$ and let us consider the
interval $I = [k/2^n, (k+1)/2^n)$. Then, $f^{-1}(I)$ is the cylindrical set of all
binary sequences $a_0a_1\cdots$ such that $a_0\cdots a_{n-1}$ is the binary expression
of k. It follows that
\[
\Big(\times_{i=0}^{\infty}\Big(\frac{1}{2}\delta_0 + \frac{1}{2}\delta_1\Big)\Big)(f^{-1}(I)) = \mathscr{L}^1(I),
\]
because their common value is $2^{-n}$. On the other hand, $f^{-1}(\{1\})$ consists
of a single point and therefore the identity above holds for I = {1}, the
common value being 0. By additivity the identity holds for finite unions
of sets of this type, a family stable under finite intersections. By the
coincidence criterion the two measures coincide.
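The value $2^{-n}$ of the cylindrical set can be checked by enumeration. The sketch below assumes that f maps a binary sequence $a_0a_1\cdots$ to $\sum_i a_i 2^{-(i+1)}$ (an assumption consistent with the solution, the statement of the exercise not being reproduced here) and truncates sequences to a hypothetical finite length, which does not affect membership in a dyadic interval of order n not exceeding that length.

```python
from fractions import Fraction
from itertools import product


def binary_value(bits):
    """Value of the finite binary string bits: sum of a_i 2^{-(i+1)}."""
    return sum(Fraction(bit, 2 ** (i + 1)) for i, bit in enumerate(bits))


def cylinder_mass(k, n, length=8):
    """Mass of f^{-1}([k/2^n, (k+1)/2^n)) under fair coin tosses, by counting."""
    lower = Fraction(k, 2 ** n)
    upper = Fraction(k + 1, 2 ** n)
    hits = sum(
        1
        for bits in product((0, 1), repeat=length)
        if lower <= binary_value(bits) < upper
    )
    return Fraction(hits, 2 ** length)
```

For every n between 1 and the chosen length and every 0 ≤ k < 2^n, `cylinder_mass(k, n)` should equal `Fraction(1, 2**n)`, the Lebesgue measure of the interval.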
Exercise 8.3. Let A ⊂ R be a dense open set whose complement C has
strictly positive Lebesgue measure (Exercise 1.9), and let

\[
\varphi(t) := \min\{1, \mathrm{dist}(t, C)\}, \qquad t \in \mathbb{R}.
\]
By construction the function ϕ is continuous, nonnegative, bounded by
1, and vanishes precisely on C. Then, set
\[
F(t) :=
\begin{cases}
\displaystyle\int_0^t \varphi(s)\,ds & \text{if } t \ge 0; \\[2mm]
\displaystyle -\int_t^0 \varphi(s)\,ds & \text{if } t < 0.
\end{cases}
\]
We have $F' = \varphi$, so that $F \in C^1$ and its critical set $C_F = C$ has positive
Lebesgue measure. It follows that $F_\#\mathscr{L}^1$ is not absolutely continuous
with respect to $\mathscr{L}^1$. Finally, since $\int_a^b \varphi\,dt > 0$ whenever a < b (because
A ∩ (a, b) ≠ ∅) we obtain that F is strictly increasing.
Exercise 8.4. Recall that $F(C_F)$ is always Lebesgue negligible, regardless
of any injectivity assumption on U . Hence, possibly replacing U by
$U\setminus C_F$ we can assume with no loss of generality that $C_F = \emptyset$, i.e. D F(x)
is nonsingular at any x ∈ U . Recall that, according to the local invertibility
theorem, for any x ∈ U there exists a ball $B_r(x)$ contained in U
such that the restriction of F to $B_r(x)$ is injective. Now, following the strategy of
Lemma 6.9 we can cover U by a sequence of right open cubes $\{Q_i\}_{i\in I}$,
pairwise disjoint, such that the restriction of F to a neighbourhood of
$Q_i$ is injective (we keep dividing a cube until this property is achieved).
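Let $Q_i = \times_{j=1}^{n}[a_j, a_j + \delta)$; for $b_j < a_j$ sufficiently close to $a_j$ and
$\tilde{Q}_i = \times_{j=1}^{n}(b_j, b_j + \delta)$ we have (by injectivity of F on $\tilde{Q}_i$)
\[
F_\#(1_{\tilde{Q}_i}\mathscr{L}^n) = \frac{1}{|JF|\circ F^{-1}}\,1_{F(\tilde{Q}_i)}\mathscr{L}^n
\]
and therefore we can pass to the limit to get
\[
F_\#(1_{Q_i}\mathscr{L}^n) = \frac{1}{|JF|\circ F^{-1}(y)}\,1_{F(Q_i)}\mathscr{L}^n.
\]
If we add both sides with respect to i ∈ I we get
\[
F_\#(1_U\mathscr{L}^n) = \sum_{i\in I}\frac{1}{|JF|\circ F^{-1}(y)}\,1_{F(Q_i)}\mathscr{L}^n
= \sum_{x\in F^{-1}(y)}\frac{1}{|JF|(x)}\,1_{F(U)}\mathscr{L}^n.
\]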