Measure Theory
Steve Cheng
August 5, 2004
Contents
1 Motivation for the Lebesgue integral 2
2 Basic measure theory 4
3 Measurable functions 8
4 Denition of the Lebesgue Integral 11
5 Convergence theorems 15
6 Some Results of Integration Theory 18
7 L
p
spaces 23
8 Construction of Lebesgue Measure 28
9 Lebesgue Measure in R
n
32
10 Riemann integrability implies Lebesgue integrability 34
11 Product measures and Fubinis Theorem 36
12 Change of variables in R
n
39
13 Vectorvalued integrals 41
14 C
0
functions are dense in L
p
(R
n
) 42
15 Other examples of measures 47
16 Egorovs Theorem 50
17 Exercises 51
1
18 Bibliography 52
Preface
This article develops the basics of the Lebesgue integral and measure theory.
In terms of content, it adds nothing new to any of the existing textbooks on
the subject. But our approach here will be to avoid unduly abstractness and
absolute generality, instead focusing on producing proofs of useful results as
quickly as possible.
Much of the material here comes from lecture notes from a short real analysis
course I had taken, and the rest are wellknown results whose proofs I had
worked out myself with hints from various sources. I typed this up mainly for
my own benet, but I hope it will be interesting for anyone curious about the
Lebesgue integral (or higher mathematics in general).
I will be providing proofs of every theorem. If you are bored reading them,
you are invited to do your own proofs. The bibliography outlines the background
you need to understand this article.
Copyright matters
Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.2 or any later
version published by the Free Software Foundation; with no Invariant Sections,
with no FrontCover Texts, and with no BackCover Texts.
1 Motivation for the Lebesgue integral
If you have followed the rigorous denition of the Riemann integral in R or R
n
,
you may be wondering why do we need to study yet another integral. After all,
why should we even care to integrate nasty functions like:
D(x) =
_
1 , x Q
0 , x R Q
Rephrased in another way, D(x) is actually the indicator function
1
of the
set
S = x Q R,
and we want to nd its length. Continuing to rephrase this question, sup
pose we are taking many realvalued measurements x of a particular physical
phenomenon. What is the probability, say, that x is rational? If we assume
1
For any set S, this is the function
S
dened by
S
(x) = 1 if x S and
S
(x) = 0 if
x / S. Math people call this the characteristic function, while probability people call it the
indicator function instead.
2
the measurements are distributed normally with a mean of and a standard
deviation of , then this is given by:
Pr[X Q] =
_
S
1
2
2
e
1
2
(
x
)
2
dx
So wild sets like S are theoretically worth considering, and it does not work to
use Riemann integral to evaluate the above probability.
Another limitation to the Riemann integral is with limits. If a sequence of
functions f
n
is uniformly convergent (on a closed interval, or more generally a
compact set A R
n
), then we can interchange limits for the Riemann integral:
lim
n
_
A
f
n
(x) dx =
_
A
lim
n
f
n
(x) dx,
but the criterion of uniform convergence is often too restrictive, e.g. when
integrating Fourier series. On the other hand, it can be proven with the Lebesgue
integral that the interchange is valid under weaker conditions (e.g. the functions
f
n
is bounded above somehow, and they converge pointwise).
As an added benet, some sophisticated results concerning the Riemann
integral, such as the Change of Variables Theorem in R
n
, are more easily proven
using the Lebesgue integral, with its arsenal of limit theorems.
Finally, the Riemann integral does not deal with integration over innite
bounds very well. For example, the standard way to compute the probability
integral
_
+
1
2
x
2
dx
goes like this:
_
_
+
1
2
x
2
dx
_
2
=
_
+
1
2
x
2
dx
_
+
1
2
y
2
dy
=
_
+
_
+
1
2
(x
2
+y
2
)
dxdy
=
_
R
2
e
1
2
(x
2
+y
2
)
dxdy
=
_
2
0
_
0
e
1
2
r
2
r dr d (using polar coordinates)
= 2
_
e
1
2
r
2
_
r=
r=0
= 2 .
So
_
+
1
2
x
2
dx =
2 .
The above computation seems easy, and although it can be justied using the
Riemann integral alone, it is not entirely trivial, but it is with the Lebesgue
3
integral. (For example, why should
_
+
_
+
be the same as
_
R
2
? Note that
in the Riemann theory, the iterated integral and the area integral are proven to
be equal only for bounded sets of integration.)
You will probably be able to nd other sorts of limitations with the Riemann
integral.
2 Basic measure theory
The setting of abstract integration is measure theory, which tells us what the
areas or volumes of various sets are. Essentially we are given some function
of sets which returns the area or volume formally called the measure
of the given set. i.e. We assume at the beginning that such a function has
already been dened for us. The abstract approach of the Lebesgue integral has
the obvious advantage that the theory can be applied to many other measures
besides volume in R
n
.
We begin with the axioms of measure theory.
Denition 2.1. Let X be any nonempty set. A sigma algebra
2
of subsets of
X is a family / of subsets of X, with the properties:
1. / is nonempty.
2. If E /, then X E /.
3. If E
n
nN
is a sequence of sets in /, then their union is in /. That is,
/ is closed under countable unions.
The pair (X, /) is called a measurable space, and the sets in / are called
the measurable sets.
Notice that the axioms always imply that X /. Also, by De Morgans
laws, / is closed under countable intersections as well as countable union.
Needless to say, we cannot insist that / is closed under arbitrary unions
or intersections, as that would force / = 2
X
if / contains all the singleton
sets. That would be uninteresting. On the other hand, we want closure under
countable set operations, rather than just nite ones, as we will want to take
countable limits.
Example 2.1. Let X be any (nonempty) set. Then / = 2
X
is a sigma algebra.
Example 2.2. Let X be any (nonempty) set. Then / = X, is a sigma
algebra.
To get nontrivial sigma algebras to work with we need the following, a very
unconstructive(!) construction:
If we have a family of sigma algebras on X, then the intersection of all the
sigma algebras from this family is also a sigma algebra on X. If all of the sigma
2
I do not know why it has such a ridiculous name, other than the fact that it is often
denoted by the Greek letter.
4
algebras from the family contains some xed ( 2
X
, then the intersection of
all the sigma algebras from the family, of course, is a sigma algebra containing
(.
Now if we are given (, and we take all the sigma algebras on X that contain
(, and intersect all of them, we get the smallest sigma algebra that contains (.
Denition 2.2. The smallest sigma algebra containing any given ( 2
X
, as
constructed above, is denoted (), and is also called the sigma algebra generated
by (.
The following is an oftenused sigma algebra.
Denition 2.3. If X is a topological space, we can construct the sigma algebra
T ), where T is the set of all open sets. This is called the Borel sigma algebra
and is denoted B(X). When topological spaces are involved, we will always take
the sigma algebra to be the Borel sigma algebra unless stated otherwise.
B(X), being generated by the open sets, then contains all open sets, all
closed sets, and countable unions and intersections of open sets and closed sets.
It seems unlikely, however, that every set in B(X) is expressible as a countable
union and/or intersection of open sets and closed sets, although it is tempting
to think that.
By the way, Theorem 9.3 shows the Borel sigma algebra is generally not all
of 2
X
.
Sigma algebras are the domain on which measures are dened.
Denition 2.4. Let (X, /) be a measurable space. A positive measure on this
space is a function : / [0, ] such that
1. () = 0
2. Countable additivity: For any sequence of mutually disjoint sets E
n
/,
_
_
n=1
E
n
_
=
n=1
(E
n
) .
The set (X, /, ) will be called a measure space. Whenever convenient we
will abbreviate this expression, as in let X be a measure space, etc. Also, in
this article, when we say measure, we will be dealing with positive measures
only. (There are also theories about signed measures and complex measures.)
Example 2.3. Let X be an arbitrary set, and / be a sigma algebra on X. Dene
: / [0, ] as
(A) =
_
[A[ , if A is a nite set
, if A is an innite set .
This is called the counting measure.
We will be able to model the innite series
n=1
a
n
in Lebesgue integration
theory by using X = N and the counting measure, since integrals are essentially
sums of the integrand values weighted by areas or measures.
5
Example 2.4. X = R
n
, and / = B(R
n
). We can construct the Lebesgue measure
which assigns to the rectangle [a
1
, b
1
] [a
n
, b
n
] in R
n
its expected n
dimensional volume (b
1
a
1
) (b
n
a
n
). Of course this measure should also
assign the correct volumes to the usual geometric gures, as well as for all the
other sets in /.
The existence of such a measure will be demonstrated later.
Intuitively, dening the volume of the rectangle only should suce to uniquely
also determine the volume of the other sets, since the volume of every set can
be approximated by the volume of many small rectangles. Indeed, we will later
show this intuition to be true. In fact, you will see that most theorems using
Lebesgue measure really depend only on the denition of the volume of the
rectangle.
Example 2.5. Any probability measure (as dened by the usual axioms of prob
ability) is actually a measure in our sense. For example,
Pr[Z B] = (B) =
_
B
1
2
e
1
2
t
2
dt ,
3
where the integration, of course, is with respect to the Lebesgue measure on
the real line. Other examples include the uniform distribution, the Poisson
distribution, and so forth.
Before we begin the prove more theorems, I must mention that we will be
operating on the quantity as if it were a number, even though you may have
been told this is wrong by some teachers. It is true, of course, that certain
algebraic properties of R would fail to hold with included (i.e. R,
is not a eld), but the crucial point in real analysis is that obeys the usual
ordering rules when used in inequalities. The rules we adopt are the following:
a
+=
a = (a ,= 0)
0 = 0
The rst three rules are selfexplanatory. The last rule may need explaining:
when integrating functions, we often want to ignore isolated singularities, e.g.
at zero for
_
1
0
dx/
k=1
(E
k
E
k1
) = lim
n
n
k=1
(E
k
E
k1
) = lim
n
(E
n
) .
7
Theorem 2.3. For any E
n
/,
_
_
n=1
E
n
_
n=1
(E
n
) .
Proof.
_
_
n=1
E
n
_
=
_
_
n=1
E
n
(E
1
E
2
E
n1
)
_
=
n=1
_
E
n
(E
1
E
2
E
n1
)
_
n=1
(E
n
) .
Theorem 2.4. Let E
n
E (that is, E
n
are decreasing and their intersection
is E), and (E
1
) < . Then
lim
n
(E
n
) = (E) =
_
n=1
E
n
_
.
Proof. We have E
1
E
n
(E
1
E). So
(E
1
E) =
_
_
n=1
(E
1
E
n
)
_
= lim
n
(E
1
E
n
) ,
(E
1
) (E) = lim
n
[(E
1
) (E
n
)] = (E
1
) lim
n
(E
n
) ,
and cancel (E
1
) on both sides.
3 Measurable functions
To do integration theory, we of course need functions to integrate. You should
not expect that arbitrary functions can be integrated, but only the measurable
ones. The following denition is not dicult to motivate.
Denition 3.1. Let (X, /) and (Y, B) be measurable spaces. A map f : X Y
is measurable if
for all B B, the set f
1
(B) = [f B] is in /.
Example 3.1. A constant map is always measurable, for f
1
(B) is either or
X.
Theorem 3.1. The composition of two measurable functions is measurable.
Proof. Immediate from the denition.
8
Theorem 3.2. Let (X, /) and (Y, B) be measurable spaces, and suppose H
generates the sigma algebra B: H) = B. A function f : X Y is measurable
if and only if for every V H, f
1
(V ) is in /.
Proof. The only if part is just the denition of measurability. For the if
direction, dene ( = f
1
(V ) : V H, and also ( = V B : f
1
(V ) ().
It is easily checked that ( is a sigma algebra on Y , and it contains H, and hence it
is actually equal to B. That is, for every V B, f
1
(V ) is in () /) = /.
Corollary 3.3. All continuous functions (between topological spaces) are mea
surable.
A comment about innities again. There is a natural topology on [, +]
and [0, ] that make them look like closed intervals. Some denote [, +] by
R (the extended real numbers). However, for convenience, I will just denote it
as plain R. So keep in mind that when we prove our theorems, we have to make
sure that they work (or do not work) when innite quantities are introduced.
Theorem 3.4. Let (X, /) be a measurable space. A map f : X R is mea
surable if and only if [f > c] = f
1
((c, +]) / for all c R.
Proof. Let B be the set of all open intervals (a, b), along with , +. Let
H be the set of all intervals (c, +]. (a, b, c are nite.) Evidently H generates
B:
(a, b) = [, b) (a, +] ,
[, b) =
_
n=1
[, b
1
n
] =
_
n=1
R (b
1
n
, +] .
+ =
n=1
(n, +] .
= R
_
n=1
(n, +] .
In turn, B generates the Borel sigma algebra on R. Applying Theorem 3.2 to
the generator H gives the result. (As B H) and H B) together mean
H) = B).)
Remark 3.5. We can replace (c, +], in the statement of the theorem, by
[c, +], [, c], etc. and there is no essential dierence.
The countable union with 1/n trick used in the proof is widely applicable.
It may be of interest to note that the Archimedean property of the real num
bers is being used here the same proof will not work with nonArchimedean
ordered elds.
9
Theorem 3.6. Let f
n
be a sequence of measurable Rvalued functions. Then
the functions
sup
n
f
n
, inf
n
f
n
, max
n
f
n
, min
n
f
n
, limsup
n
f
n
, liminf
n
f
n
(the limits are pointwise) are all measurable.
Proof. If g(x) = sup
n
f
n
(x), then [g > c] =
n
[f
n
> c], and we apply Theorem
3.4. Similarly, if g(x) = inf
n
f
n
(x), then [g < c] =
n
[f
n
< c]. The rest can
be expressed as in terms of supremums and inmums (over a countable set), so
they are measurable also.
Not surprisingly, we will need to do arithmetic in integration theory, so we
better know that
Theorem 3.7. If f, g : X R are measurable, then so are f +g, fg, and f/g.
Proof. Consider the countable union
[f +g < c] =
_
rQ
[f < c r] [g < r] .
The set equality is justied as follows: Clearly f(x) < cr and g(x) < r together
imply f(x) + g(x) < c. Conversely, if we set g(x) = t, then f(x) < c t, and
we can increase t slightly to a rational number r such that f(x) < c r, and
g(x) < t < r. This shows that f +g is measurable (by Theorem 3.4).
Since [f < c] = [f > c], we see that f is measurable.
Therefore the functions
f
+
(x) = max+f(x), 0 (positive part of f)
f
, f is
measurable if f
+
and f
)(g
+
g
) = f
+
g
+
f
+
g
g
+
+f
,
to prove that fg is measurable, it suces to assume that f and g are both
nonnegative. Then just as with the sum,
[fg < c] =
_
rQ
[f < c/r] [g < r] .
Finally, for 1/g,
[1/g < c] =
_
[1/c < g, cg > 0] [1/c > g, cg > 0] , c ,= 0
[g < 0] , c = 0 .
10
Remark 3.8. You probably have already noticed there may be diculty in den
ing what the arithmetic operations mean when the operands are innite (or
when dividing by zero). The usual way to deal with these problems is to simply
redene the functions whenever they are innite to be some xed value. In
particular, if the function g is obtained by changing the original measurable
function f on a measurable set A to be a constant c, we have:
[g B] =
_
[g B] A
_
_
[g B] A
c
_
[g B] A =
_
A, c B
, c / B
[g B] A
c
= [f B] A
c
,
so the resultant function g is also measurable. Very conveniently, any sets like
A = [f = +], [f = 0] are automatically measurable. Thus the gaps in the
previous proof with respect to innite values can be repaired with this device.
As a nal note, one intermediate result from the proof is quite useful and
should be formally recognized:
Theorem 3.9. An Rvalued function f is measurable if and only if f
+
and f
.
Remark 3.10. Of course the converse to the second statement is not true. You
may construct a counterexample to convince yourself of this fact.
4 Denition of the Lebesgue Integral
The idea behind Riemann integration is to try to measure the sums of area of
the rectangles below a graph of a function and then take some sort of limit.
The Lebesgue integral uses a similar approach: we perform integration on the
simple functions rst:
Denition 4.1. A function is simple if its range is a nite set.
An Rvalued simple function always has a representation
=
n
k=1
a
k
E
k
,
where a
k
are the distinct values of , and E
k
=
1
(a
k
). Conversely, any
expression of the above form, where a
k
need not be distinct, and E
k
is not neces
sarily
1
(a
k
), also denes a simple function. For the purposes of integration,
however, we will require that E
k
be measurable, and that they partition X. It
should be mentioned that
S
is measurable if and only if S is.
11
Denition 4.2. Let (X, ) be a measure space. The Lebesgue integral, over
X, of a R
+
valued measurable simple function is dened as
_
X
d =
_
X
n
k=1
a
k
E
k
d =
n
k=1
a
k
(E
k
) .
(We restrict to being nonnegative for now, to avoid mixed +, on
the righthand side.) Needless to say, the quantity on the right represents the
sum of the areas below the graph of .
It had better be the case that the value of the integral does not depend
on the representation of . If =
i
a
i
Ai
=
j
b
j
Bj
, where A
i
and B
j
partition X (so A
i
B
j
partition X), then
i
a
i
(A
i
) =
i
a
i
(A
i
B
j
) =
i
b
j
(A
i
B
j
) =
j
b
j
(B
j
) .
The second equality follows because the value of is a
i
= b
j
on A
i
B
j
, so
a
i
= b
j
whenever A
i
B
j
,= . So fortunately the integral is welldened.
Using the same algebraic manipulations just now, you can prove that if we
have two simple functions , then
_
X
d
_
X
d (monotonicity of the
integral).
Theorem 4.1. The Lebesgue integral (for nonnegative simple functions) is
linear.
Proof. Clearly
_
X
cd = c
_
X
d. And if =
i
a
i
Ai
, =
j
b
j
Bj
, we
have
_
X
d +
_
X
d =
i
a
i
(A
i
) +
j
b
j
(B
j
)
=
j
a
i
(A
i
B
j
) +
i
b
j
(B
j
A
i
)
=
j
(a
i
+b
j
) (A
i
B
j
)
=
_
X
( +) d.
Next, we integrate nonsimple measurable functions like this:
Denition 4.3. Let f : X [0, +] be measurable. Consider te set S
f
of all
measurable simple functions 0 f, and dene the integral of f over X as
_
X
f d = sup
S
f
_
X
d.
12
Intuitively, the simple functions in S
f
are supposed to approximate f as
close as we like, and we nd the integral of f by computing the integrals of
these approximations. But logically we need to know that these approximations
really do exist. This is the essence of the following theorem.
Theorem 4.2 (Approximation Theorem). Let f : X [0, ] be measur
able. Then there exists a sequence of nonnegative functions
n
f, meaning
n
are increasing pointwise and converging pointwise to f. Moreover, if f is
bounded, it becomes possible for the
n
to converge to f uniformly.
Proof. We prove the second statement rst. Let N be any integer > sup f, and
set
n
=
N2
n
k=1
k 1
2
n
E
n,k
, E
n,k
= f
1
__
k 1
2
n
,
k
2
n
__
.
We have 0 f
n
< 2
n
uniformly. The detailed verication is left to the
reader.
The construction for the rst statement is very similar. Set
n
=
n2
n
k=1
k 1
2
n
E
n,k
+
Fn
, F
n
= [f n] .
I leave it to you to check that 0 f(x)
n
(x) < 2
n
whenever n > f(x), and
n
(x) = n whenever f(x) = .
In case you were worrying about whether this new denition of the integral
agrees with the old one in the case of the nonnegative simple functions, well, it
does. Use monotonicity to prove this.
Denition 4.4. If f is not necessarily nonnegative, we dene
_
X
f d =
_
X
f
+
d
_
X
f
d,
provided that the two integrals on the right are not both .
Of course we will want to integrate over subsets of X also. This can be
accomplished in two ways. Let A be a measurable subset of X. Either we
simply consider integrating over the measure space restricted to subsets of A,
or we dene _
A
f d =
_
X
f
A
d.
If is nonnegative simple, a simple working out of the two denitions of the
integral over A shows that they are equivalent. To prove this for the case of
arbitrary measurable functions, we will need the tools of the next section.
Let us note some other basic properties of our integral:
13
Let f, g be nonnegative. Since
S
cf
= c S
f
= c : S
f
, 0 c < .
we have (we freely omit the d and/or the integration limit X when they
are implied by the context)
_
cf = c
_
f .
This rule about constant multiplication also holds for f and c not necessarily
nonnegative, as you can easily check, but proving linearity requires the tools of
the next section.
Moreover, if 0 f g, then S
f
S
g
, and therefore
_
f
_
g. In particular,
if A B, then f
A
f
B
, so
_
A
f
_
B
f .
Unsurprisingly
_
f
_
g also holds if f, g are not necessarily nonnegative, and
that is proven by considering the positive and negative parts of f, g separately.
Then since [f[ f [f[, we also obtain
_
[f[
_
f
_
[f[, i.e.
_
f
_
[f[ .
(This last inequality is sometimes called the generalized triangle inequality,
as integrals can be viewed as an advanced form of summing.)
Next, we make one more denition related to integrals.
Denition 4.5. A ()measurable set is said to have ()measurable zero if
(E) = 0.
Typical examples of a measurezero set are the singleton points in R
n
, and
lines and curves in R
n
, n 2. By countable additivity, any countable set in R
n
has measure zero also.
A particular property is said to hold almost everywhere if the set of points
for which the property fails to hold is a set of measure zero. For example, a
function vanishes almost everywhere.
Clearly, if you integrate anything on a set of measure zero, you get zero.
Assuming that linearity of the integral has been proved, we can demonstrate
the following intuitive result.
Theorem 4.3. A measurable function f : X [0, ] vanishes almost every
where if and only if
_
X
f = 0.
Proof. Let A = [f = 0], and (A
c
) = 0. Then
_
X
f =
_
X
f (
A
+
A
c ) =
_
X
f
A
+
_
X
f
A
c =
_
A
f +
_
A
c
f = 0 + 0 .
14
Conversely, if
_
X
f = 0, consider [f > 0] =
n
[f >
1
n
]. We have
[f >
1
n
] =
_
[f>
1
n
]
1 = n
_
[f>
1
n
]
1
n
n
_
X
f = 0 .
Hence [f > 0] = 0.
5 Convergence theorems
The following theorems are another feature of the Lebesgue integral that make
it so much better than the Riemann denition.
Theorem 5.1 (Monotone Convergence Theorem). Let (X, ) be a mea
sure space. Let f
n
be nonnegative measurable functions increasing pointwise to
f. Then
_
X
f d =
_
X
_
lim
n
f
n
_
d = lim
n
_
X
f
n
d.
Proof. f is measurable because it is a limit of measurable functions. Since f
n
is
an increasing sequence of functions bounded by f, their integrals is an increasing
sequence of numbers bounded by
_
X
f; thus the following limit exists:
lim
n
_
X
f
n
d
_
X
f d.
Next we show the inequality in the other direction.
Take any 0 < t < 1. Given a xed S
f
, let A
n
= [f
n
t 0]. The A
n
are obviously increasing.
If for a particular x X, we have (x) = 0, then x A
n
, for all n.
Otherwise, (x) > 0, so f(x) (x) > t(x), and there is going to be some n
for which f
n
(x) t(x), i.e. x A
n
. Hence X =
n
A
n
.
For all measurable sets E, dene a new measure on X by
(E) =
_
E
td.
Then
_
X
td = (X) =
_
_
n
A
n
_
= lim
n
(A
n
) = lim
n
_
An
td
lim
n
_
An
f
n
d, since on A
n
we have t f
n
lim
n
_
X
f
n
d.
t
_
X
d lim
n
_
X
f
n
d, and take limit t 1 .
_
X
d lim
n
_
X
f
n
d, and take sup over S
f
.
15
Using this theorem, we are now in the position to prove linearity of the
Lebesgue integral for nonsimple functions. Given any two nonnegative mea
surable functions f, g, by the approximation theorem (Theorem 4.2), we know
that are nonnegative simple functions
n
f, and
n
g. Then
n
+
n
f +g, and so
_
f +g = lim
n
_
n
+
n
= lim
n
_
n
+
_
n
=
_
f +
_
g .
(The second equality follows because we already know the integral is linear
for simple functions. For the rst and third equality we apply the Monotone
Convergence Theorem.)
And if f, g not necessarily nonnegative, then
_
f +g =
_
(f
+
f
) + (g
+
g
)
=
_
f
+
+g
+
(f
+g
)
=
_
f
+
+g
+
_
f
+g
=
_
f
+
+
_
g
+
(
_
f
+
_
g
) =
_
f +
_
g .
(Only at the fourth equality we apply what we had just proved for nonnegative
functions. The third and last equality are just by the denition of the integral.)
Here is another application of the Monotone Convergence Theorem.
Theorem 5.2 (Beppo Levi). Let f
n
: X [0, ] be measurable. Then
_
n=1
f
n
=
n=1
_
f
n
.
Proof. Let g
N
=
N
n=1
f
n
, and g =
n=1
f
n
. The Monotone Convergence
Theorem applies to g
N
, and:
_
g =
_
lim
N
g
N
= lim
N
_
g
N
= lim
N
N
n=1
_
f
n
=
n=1
_
f
n
.
There are many more applications like this. We postpone those for now,
since you will probably be even more amazed by the next convergence theorem.
We rst need a lemma.
Lemma 5.3 (Fatous Lemma). Let f
n
: X [0, ] be measurable. Then
_
liminf
n
f
n
liminf
n
_
f
n
.
16
Proof. Set g
n
= inf
kn
f
k
, so that g
n
f
n
, and g
n
liminf
n
f
n
. Then
_
liminf
n
f
n
=
_
lim
n
g
n
= lim
n
_
g
n
= liminf
n
_
g
n
liminf
n
_
f
n
.
Remark 5.4. By adding and subtracting a constant, the hypotheses may be
weakened to allow functions that are bounded below by any xed number, not
just nonnegative functions. (This lower bound condition cannot be dropped.)
The same considerations apply to the Monotone Convergence Theorem.
The following denition is used to formulate a crucial hypothesis of the
theorem that is about to follow.
Denition 5.1. A function f : X R is called integrable if it is measurable
and
_
X
[f[ < .
It is immediate that f is integrable if and only if f
+
and f
are both
integrable. It is also helpful to know, that
_
[f[ < must imply [f[ <
almost everywhere.
Theorem 5.5 (Dominated Convergence Theorem). Let (X, ) be a mea
sure space. Let f
n
: X R be a sequence of measurable functions converging
pointwise to f. Moreover, suppose that there is an integrable function g such
that [f
n
[ g, for all n. Then f
n
and f are also integrable, and
lim
n
_
X
[f
n
f[ d = 0 .
Proof. Obviously f
n
and f are integrable. Also, 2g [f
n
f[ is measurable and
nonnegative. By Fatous lemma,
_
liminf
n
(2g [f
n
f[) liminf
n
_
(2g [f
n
f[) .
Since f
n
converges to f, the lefthand quantity is just
_
2g. The righthand
quantity is:
liminf
n
__
2g
_
[f
n
f[
_
=
_
2g + liminf
n
_
_
[f
n
f[
_
=
_
2g limsup
n
_
[f
n
f[ .
Since
_
2g is nite, it may be cancelled from both sides. Then we obtain
limsup
n
_
[f
n
f[ 0 , i.e. lim
n
_
[f
n
f[ = 0 .
Remark 5.6. It obviously suces to only require that f
n
converge to f pointwise
almost everywhere, or that [f
n
[ is bounded above by g almost everywhere. (Of
course, if f
n
only converges to f almost everywhere, then the theorem would
not automatically say that f is measurable.)
17
Remark 5.7. By the generalized triangle inequality, we conclude from the hy
potheses that also
lim
n
_
X
f
n
d =
_
X
f d,
which is usually how this theorem is applied.
Remark 5.8. The theorem also holds for continuous limits of functions, not just
countable limits. That is, if we have a continuous sequence of functions, say f
t
,
0 t < 1, we can also say
lim
t1
_
X
[f
t
f[ d = 0 ;
for given any sequence a
n
convergent to 1, we can apply the theorem to f
an
.
Since this can be done for any sequence convergent to 1, the above limit is
established.
6 Some Results of Integration Theory
This section contains some nice applications proven using the convergence the
orems from the last section.
Theorem 6.1 (Generalization of Beppo Levi). Let X be a measure space,
and f
n
: X R be measurable functions, with
_
[f
n
[ =
_
[f
n
[ < . Then
n=1
_
f
n
=
_
n=1
f
n
.
Proof. Let g
N
=
N
n=1
f
n
, g = limsup
N
g
N
, and h =
n=1
[f
n
[. Then
[g
N
[ h = [h[. Since
_
[h[ < by hypothesis, we have [h[ < almost
everywhere, so
n=1
f
n
is absolutely convergent almost everywhere. That is,
g
N
converges pointwise to g almost everywhere.
By the Dominated Convergence Theorem, lim
N
_
g
N
=
_
g, whence
n=1
_
f
n
= lim
N
N
n=1
_
f
n
= lim
N
_
N
n=1
f
n
=
_
n=1
f
n
.
Example 6.1. Heres a perhaps unexpected application. Suppose we have a
countable set of real numbers a
n,m
, n, m N. Let be the counting measure on
N. Then
_
mN
a
n,m
d =
m=1
a
n,m
. Moreover, Theorem 6.1 says that we can
sum either along n rst or m rst and get the same results (
n=1
m=1
a
n,m
=
m=1
n=1
a
n,m
) if the double sum is absolutely convergent. Of course, this
fact can also be proven in an entirely elementary way.
Theorem 6.2. Let g : X [0, ] be measurable in the measure space (X, /, ).
Let
(E) =
_
E
g d, E /.
18
Then is a measure on (X, /), and for any measurable function f on X,
_
X
f d =
_
X
fg d,
often written as d = g d.
Proof. We prove is a measure. () = 0 is trivial. For countable additivity,
let E
n
be measurable with union E, so that
E
=
n=1
En
, and
(E) =
_
E
g d =
_
X
g
E
d
=
_
X
n=1
g
En
d =
n=1
_
X
g
En
d =
n=1
(E
n
) .
Next, if f =
E
for some E /, then
_
X
f d =
_
X
E
d = (E) =
_
X
E
g d =
_
X
fg d.
By linearity, we see that
_
f d =
_
fg d whenever f is nonnegative simple.
For general nonnegative f, we use a sequence of simple approximations
n
f, so
n
g fg. Then by the monotone convergence,
_
X
f d = lim
n
_
X
n
d = lim
n
_
X
n
g d =
_
X
lim
n
n
g d =
_
X
fg d.
Finally, for f not necessarily nonnegative, we apply the above to its positive
and negative parts, and use linearity.
The procedure of proving some fact about integrals by rst reducing to the
case of simple functions and nonnegative functions is used quite often. (It will
get quite monotonous if we had to detail the procedure every time we use it, so
we wont anymore if the circumstances permit.)
Also, we should note that if f is only measurable but not integrable, then
the integrals of f
+
or f
B
d = (B) = (g
1
(B)) = (A) =
_
X
(f g) d.
Since both sides of the equation are linear in f, the equation holds whenever f
is simple. Applying the standard procedure mentioned above, the equation is
then proved for all measurable f.
Remark 6.4. The change of variables theorem can also be applied in reverse.
Suppose we want to compute
_
Y
f d, where is already given to us. Further
assume that g is bijective and its inverse is measurable. Then we can dene
(A) = (g(A)), and it follows that
_
Y
f d =
_
X
(f g) d.
Our theorem (especially when stated in the reverse form) is clearly related
to the usual change of variables theorem in calculus. If g : X Y is a bijec
tion between open subsets of R
n
, and both it and its inverse are continuously
dierentiable (i.e. g is a dieomorphism), and = is the Lebesgue measure
in R
n
, then (as we shall prove rigorously in Lemma 12.1),
(A) = (g(A)) =
_
A
[det Dg[ d.
Appealing to Theorems 6.2 and 6.3, we obtain:
Theorem 6.5 (Dierential change of variables in R
n
). Let g : X Y be
a dieomorphism of open sets in R
n
. If A X is measurable, and f : Y R
is measurable, then
_
g(A)
f d =
_
A
(f g) d =
_
A
(f g) [det Dg[ d.
The next two theorems are the Lebesgue versions of wellknown results about
the Riemann integral.
Theorem 6.6 (First Fundamental Theorem of Calculus). Let I R be
an interval, and f : I R be integrable (with Lebesgue measure in R). Then
the function
F(x) =
_
x
a
f(t) dt
is continuous. Furthermore, if f is continuous at x, then F
t
(x) = f(x).
Proof. To prove continuity, we compute:
F(x +h) F(x) =
_
x+h
x
f(t) dt =
_
I
f(t)
[x,x+h]
(t) dt .
20
(Naturally, when h < 0,
[x,x+h]
should be interpreted as
[x+h,x]
here.) Since
[f
[x,x+h]
[ [f[, by the Dominated Convergence Theorem (along with remark
5.8),
lim
h0
F(x +h) F(x) = lim
h0
_
I
f(t)
[x,x+h]
(t) dt
=
_
I
lim
h0
f(t)
[x,x+h]
(t) dt
=
_
I
f(t)
x]
(t) dt = 0 .
The proof of dierentiability is the same as for the Riemann integral:
_
x+h
x
(f(t) f(x)) dt
h
_
[x,x+h]
[f(t) f(x)[ dt
[h[
sup
t[x,x+h]
[f(t) f(x)[ [h[
[h[
,
which goes to zero as h does.
The First Fundamental Theorem was easy, but the Second Fundamental
Theorem (which states that
_
b
a
f
t
= f(b) f(a)) is not entirely trivial. The
diculty is that we should not assume as hypotheses that f
t
is continuous, or
even that it is Lebesgueintegrable. It turns out that a theorem without such
strong hypotheses is possible; we will not reproduce its proof here, but just
settle for a weaker version:
Theorem 6.7 (Second Fundamental Theorem of Calculus). Suppose
f : [a, b] R is measurable and bounded above and below. If f = g
t
for some
g, then
_
b
a
f(x) dx = g(b) g(a) .
Proof. We rst note that
g(x+h)g(x)
h
t
f(x, t)
if the following conditions are satised:
22
1. For each x X,
t
f(x, t) exists.
2. There is an integrable function g such that
t
f(x, t)
t
f(x, t)
(noting that
f(x,t+h)f(x,t)
h
is bounded by g(x)).
Remark 6.12. It is easy to see that we may generalize Theorem 6.11 to T being
any open set in R
n
, taking partial derivatives. I wont write it out in full because
the notation is somewhat complicated.
Example 6.2. Check that the function (x) =
_
0
e
t
t
x1
dt, x > 0 is continu
ous, and dierentiable with the obvious formula for the derivative.
7 L
p
spaces
The contents in this section do not have applications in this article, but they
are so well known that it would not do justice to omit them.
Denition 7.1. Let X be a measure space, and let p [1, ). The space L
p
consists of all measurable functions f : X R such that
_
[f[
p
< .
Denition 7.2. For each f L
p
, dene
f
p
=
_
_
[f[
p
_
1/p
.
(If f / L
p
, this quantity is of course dened as .)
As suggested by the notation, 
p
is a real norm on the vector space L
p
,
provided that we declare two functions to be equivalent if they dier only on
a set of measure zero (so that f
p
= 0 if and only if f = 0 as equivalence
classes). Only the verication of the triangle inequality presents any diculties
this will be solved by the theorems below.
Denition 7.3. Two numbers p, q (1, ) are called conjugate exponents
when
1
p
+
1
q
= 1.
23
Theorem 7.1 (Holders inequality). For Rvalued measurable functions f
and g,
_
fg
_
[f[[g[ f
p
g
q
.
Proof. The rst inequality is trivial. For the second inequality, since it only
involves absolute values, for the rest of the proof we may assume that f, g are
nonnegative.
If f
p
= 0, then [f[
p
= 0 almost everywhere, and so f = 0 and fg = 0
almost everywhere too. Thus the inequality is valid in this case. (Similarly
when g
q
= 0.)
If f
p
or g
q
is innite, the inequality is trivial.
So we now assume these two quantities are both nite and nonzero. Dene
F = f/f
p
, G = g/g
q
, so that F
p
= G
q
= 1. We must then show that
_
FG 1.
To do this, we employ the fact that log is concave:
1
p
log s +
1
q
log t log
_
s
p
+
t
q
_
, 0 s, t
or,
s
1
p
t
1
q
s
p
+
t
q
.
Substitute s = F
p
, t = G
q
, and integrate both sides:
_
FG
1
p
_
F
p
+
1
q
_
G
q
=
1
p
F
p
p
+
1
q
G
q
q
=
1
p
+
1
q
= 1.
You may have seen a special case of this theorem, for p = q = 2, as the
CauchySchwarz inequality.
Theorem 7.2 (Minkowskis inequality). For Rvalued measurable functions
f and g,
f +g
p
f
p
+g
p
.
Proof. The inequality is trivial when p = 1 or when f + g
p
= 0. Also, since
f +g
p
[f[ +[g[
p
, it again suces to consider only the case when f, g are
nonnegative.
Next, we employ the convexity of t t
p
, p > 1,
_
s +t
2
_
p
s
p
+t
p
2
, 0 s, t .
When we substitute s = f, t = g, we get (f + g)
p
2
p1
(f
p
+ g
p
). This
inequality shows that if f + g
p
is innite, then one of f
p
or g
p
is also
innite, so Minkowskis inequality holds true in that case.
We may now assume f +g
p
is nite. We write:
_
(f +g)
p
=
_
f(f +g)
p1
+
_
g(f +g)
p1
.
24
By Holders inequality, and noting that (p 1)q = p for conjugate exponents,
_
f(f +g)
p1
f
p
_
_
(f +g)
p1
_
_
q
= f
p
_
_
[f +g[
p
_
1/q
= f
p
f +g
p/q
p
.
A similar inequality holds for
_
g(f +g)
p1
. Putting these together:
f +g
p
p
(f
p
+g
p
)f +g
p/q
p
.
Dividing by f +g
p/q
p
yields the desired result.
Denition 7.4. A measure space (X, ) has nite measure if (X) is nite.
Theorem 7.3. Let (X, ) have nite measure. Then whenever 1 r < p < ,
L
p
L
r
. Moreover, the inclusion map from L
p
to L
r
is continuous.
Proof. If f L
p
, apply the Holder inequality with conjugate exponents
p
r
and
s =
p
pr
:
f
r
r
=
_
[f[
r
_
_
[f[
r
p
r
_
r/p
_
_
1
s
_
1/s
= f
r
p
(X)
1/s
,
and so
f
r
f
p
(X)
1/rs
= f
p
(X)
1
r
1
p
< .
To show continuity of the inclusion map, replace f with f g above where
f g
p
< .
Example 7.1.
_
1
0
x
1
2
dx = 2 < , so automatically
_
1
0
x
1
4
dx < . On the
other hand, the condition that (X) < is indeed necessary:
_
1
x
2
dx < ,
but
_
1
x
1
dx = .
Theorem 7.4. Let f
n
: X R be measurable functions converging (almost
everywhere) pointwise to f, and [f
n
[ g for some g L
p
. Then f, f
n
L
p
,
and f
n
converges to f in the L
p
norm, meaning:
lim
n
_
_
[f
n
f[
p
_
1/p
= lim
n
f
n
f
p
= 0.
Proof. [f
n
f[
p
converges to 0 and [f
n
f[
p
(2g)
p
L
1
. Apply the Dominated
Convergence Theorem on these functions.
One wonders whether the converse is true: if f
n
converges to f in the L
p
norm, do the functions f
n
themselves converge pointwise to f? The answer is
no (the counterexamples are not dicult), but we do have the following.
Theorem 7.5. Let f
n
be a Cauchy sequence in L
p
. Then it has a subsequence
converging pointwise almost everywhere.
25
Corollary 7.6. If lim
n
f
n
f
p
= 0 then there is a subsequence f
n(k)
con
verging to f pointwise almost everywhere.
Corollary 7.6 is used in the following result.
Theorem 7.7. L
p
is a complete metric space. (This means every Cauchy
sequence in L
p
converges.)
The proofs of these theorems are collected in the next section.
It is also possible to dene L
:
Denition 7.5. Let X be a measure space, and f : X R be measurable. A
number M [0, ] is an almosteverywhere upper bound for [f[ if [f[ M
almost everywhere. The inmum of all almosteverywhere upper bounds for [f[
is denoted by f
.
Denition 7.6. L
< .
Its norm is given by f
.
The use of the subscript is explained by the following theorem, whose
proof is left to the reader:
Theorem 7.8. If (X) < , then lim
p
f
p
= f
.
Remark 7.9. Observe that there is a similar thing for vectors a = (a
1
, . . . , a
n
)
R
n
:
lim
p
a
p
= lim
p
_
[a
1
[
p
+ +[a
n
[
p
_
1/p
= max([a
1
[, . . . , [a
n
[) = a
.
This remark may serve as a hint.
If we (naturally) dene the conjugate exponent of p = 1 to be q = ,
the Holder inequality remains valid, since [fg[ [f[g
almost everywhere.
Integrating both sides gives fg
1
f
1
g
.
Before closing, we mention two more results. From Theorem 7.4, it is clear
that
Theorem 7.10. Let f : X R L
p
(X), 1 p < . Then for any > 0,
there exist simple functions : X R such that
 f
p
=
_
_
R
n
[ f[
p
d
_
1/p
< .
(This is also true for p = .) This may be summarized by saying that the
set of all simple functions is dense in L
p
(in the topological sense).
When X = R
n
with Lebesgue measure , we have another result. The set of
innitely dierentiable functions on R
n
with compact support
4
, denoted C
0
,
4
The support of a function : X R is the closure of the set {x X : (x) = 0}. has
compact support means that the support of is compact (when X = R
n
, same as closed
and bounded).
26
is also dense in L
p
(R
n
, ), 1 p < . (This fact will be fully proven in Section
14.)
These two facts are typically used as tools to prove other theorems about
functions f L
p
. One rst proves that a certain theorem holds for all simple
(or C
0
), and then prove that the same result holds for arbitrary f L
p
by approximating such f by . An example of this procedure follows.
Theorem 7.11 (RiemannLebesgue Lemma). For all f L
1
(R),
lim
]]
_
R
f(x) sin(x)dx = 0 .
Proof. For convenience, assume > 0.
Suppose rst that f = C
0
. Since has compact support, integrating
(x) sin(x) over R is the same as integrating over some compact interval [a, b]
containing the support of . And since every function involved is innitely
dierentiable, we may use integration by parts (u = (x), dv = sin(x) dx):
_
b
a
(x) sin(x) dx = (x)
cos(x)
b
a
+
_
b
a
cos(x)
t
(x) dx.
As [[ and [
t
[ are continuous on [a, b], they are bounded by constants M and
M
t
respectively. We have,
_
b
a
(x) sin(x) dx
_
b
a
[cos(x)
t
(x)[ dx
2M
+
(b a)M
t
0 , .
Thus we have proven the result when f = C
0
. Now suppose f is
arbitrary. By the denseness of C
0
, for every > 0 we can nd C
0
such
that f 
1
< . Then
_
R
f(x) sin(x) dx
_
R
(x) sin(x) dx
_
R
_
f(x) (x)
_
sin(x)
dx
_
R
[f(x) (x)[ dx
< ,
or,
_
R
f(x) sin(x) dx
< +
_
R
(x) sin(x) dx
.
We take limsup
of both sides:
limsup
_
R
f(x) sin(x) dx
+ 0 .
But > 0 is arbitrary, so we must have:
lim
_
R
f(x) sin(x) dx
= limsup
_
R
f(x) sin(x) dx
= 0 .
27
8 Construction of Lebesgue Measure
We now come to actually construct Lebesgue measure, as promised. The idea
is to extend an existing measure which has been only partially dened, to an
outer measure
(E) = inf
A1,A2,...,
E
S
n
An
n
(A
n
) , for all E X .
(One interesting point to note: the denition of
() = 0.
2. It is monotone:
(E)
(F) when E F X.
3.
is countably subadditive: if E
1
, E
2
, . . . X, then
n
E
n
)
n
(E
n
).
Proof. The rst three properties are obvious. For the fourth, rst observe
that if any of the
(E
n
) is innite, there is nothing to prove. Otherwise,
let > 0. For each E
n
, by the denition of
m
/
covering E
n
, with
m
(A
n,m
)
(E
n
)+/2
n
. All of the A
n,m
together cover
n
E
n
, and
n,m
(A
n,m
)
n
(E
n
) + . Since was arbitrary, we have
n,m
(A
n,m
)
n
(E
n
).
There is of course no guarantee that
is a proper
measure on the following subcollection of 2
X
:
/ = B 2
X
[
(B E) +
(B
c
E) =
(B E) +
(B
c
E)
(B E) +
(B
c
E)
(E).)
Lemma 8.2. / is a sigma algebra.
Proof. It is immediate from the denition that / is closed under taking com
plements, and that /. We rst show / is closed under nite intersection
28
(and hence under nite union). Let B, C /.
_
(B C) E
_
+
_
(B C)
c
E
_
=
(B C E) +
_
(B
c
C E) (B C
c
E) (B
c
C
c
E)
_
(B C E) +
(B
c
C E) +
(B C
c
E) +
(B
c
C
c
E)
=
(C E) +
(C
c
E) , by denition of B /
=
(E) , by denition of C /.
Thus B C /.
We now have to show that if B
1
, B
2
, . . . /,
n
B
n
/. We may assume
that B
n
are disjoint, for otherwise we just consider B
t
n
= B
n
(B
1
B
n1
),
which are in / by the previous paragraph.
We shall need to know that, for all N 1, and all E X,
_
N
_
n=1
B
n
E
_
=
N
n=1
(B
n
E) .
The proof will be by induction on N. The statement is trivial for N = 1. For
the induction step, let D
N
=
N
n=1
B
n
which are increasing and are all in /.
Then
(D
N+1
E) =
_
D
N
(D
N+1
E)
_
+
_
D
c
N
(D
N+1
E)
_
=
(D
N
E) +
(B
N+1
E)
=
N+1
n=1
(B
n
E) (induction hypothesis).
Using the fact just proven, we now have:
(E) =
(D
N
E) +
(D
c
N
E)
=
N
n=1
(B
n
E) +
(D
c
N
E)
n=1
(B
n
E) +
_
_
_
n=1
B
n
_
c
E
_
(monotonicity).
Taking N , we obtain:
(E)
n=1
(B
n
E) +
_
_
_
n=1
B
n
_
c
E
_
_
_
_
n=1
B
n
_
E
_
+
_
_
_
n=1
B
n
_
c
E
_
(subadditivity).
But this shows
n=1
B
n
/.
29
Lemma 8.3. If B
1
, B
2
, . . . / are disjoint, then
n
B
n
) =
n
(B
n
).
Proof. The nite case
N
n=1
B
n
) =
N
n=1
(B
n
) was proven in the previous
lemma (set E = X). By monotonicity,
N
n=1
(B
n
)
n=1
B
n
). Taking
N gives
n=1
(B
n
)
n=1
B
n
). Inequality in the other direction
is implied by subadditivity of
.
We now know
i
A
i
) =
i
(A
i
). It follows that
is monotone and nitely subadditive.
3. Also, if A
1
, A
2
, . . . / are disjoint, and
i
A
i
happens to be in /, the
previous equation must also hold. (This is equivalent to requiring that
(
i
A
i
)
i
(A
i
).)
Then we have the following important result. (By the way, the name of
Caratheodory Extension Process is often used to refer to the constructions in
this section.)
Theorem 8.4. / /, and
(A E) +
(A
c
E)
_
A
_
n
A
n
_
+
_
A
c
_
n
A
n
_
(A A
n
) +
(A
c
A
n
)
n
(A A
n
) +
n
(A
c
A
n
)
=
n
(A
n
) (nite additivity of )
(E) + .
being arbitrary,
(A E) +
(A
c
E)
n
(A A
n
)
n
(A
n
) .
This implies (A)
.
Our nal results for this section concern the uniqueness of this extension.
Theorem 8.5. Assume (X) < . Let B be the sigma algebra generated from
the algebra /. If is another measure on B, which agrees with
on /, then
(B) = inf
A1,A2,...,
B
S
n
An
n
(A
n
) = inf
n
(A
n
) inf
_
_
n
A
n
_
inf (B) .
So
(B
c
) (B
c
), so that
(X)
(B)
(X) (B) =
(X) (B), or
(B X
n
) = (B X
n
). Taking limits as
n gives
(B) = (B).
The restriction that X be sigmanite is not too severe, since the usual spaces
such as R
n
are sigmanite. Sigmaniteness also comes back in the theorems
of Section 11.
Finally, the corollary below is just Theorem 8.6 restated without reference
to the outer measure:
Corollary 8.7 (Uniqueness of measures). Let B be the sigma algebra gen
erated by the algebra /. Then if two measures and agree on /, and X is
sigmanite (under either or ), then = on B.
31
9 Lebesgue Measure in R
n
In this section we rigorously construct the ndimensional volume measure on R
n
.
As hinted before, the idea is to dene the measure for rectangles and then use
the extension process in Section 8. Unfortunately, there is some grunt work to
do in order to verify that the hypotheses of those theorems are indeed satised.
Our setting will be the collection of rectangles I
1
I
n
in R
n
, where
I
k
is any open, halfopen or closed, bounded or unbounded, interval in R. The
following denition is merely an abstracted version of the formal facts we need
about these rectangles.
Denition 9.1. Let X be any set. A semialgebra is any 2
X
with the
following properties:
1. The empty set is in .
2. The intersection of any two sets in is also in .
3. For any set in , its complement is expressible as a nite disjoint union
of other elements of .
It is easy to see, although tiresome to write down formally, that the collection
of all rectangles is indeed a semialgebra. But immediately from this, we can
automatically construct an algebra /:
Theorem 9.1. The set / of all nite disjoint unions of elements of a semi
algebra is an algebra on X.
Proof. We check the properties for an algebra:
1. The empty set is trivially in /.
2. If A =
i
R
i
, and B =
j
S
j
, where R
i
and S
i
denote a nite number of
sets chosen from , then A B =
i
R
i
j
S
j
=
i,j
R
i
S
j
/.
3. If A =
i
R
i
, then A
c
=
i
R
c
i
=
i
j
S
i,j
for some S
i,j
. The nite
intersection belongs to / by the previous step. So A
c
/.
4. Finally, given A
i
=
j
R
i,j
, for a nite number of i, let D
0
= , and D
i
=
D
i1
(A
i
D
i1
) /. Then
i
A
i
=
i
D
i
=
i
(D
i
D
i1
) /.
By the way, the sigma algebra generated by / will contain the Borel sigma
algebra: every open set U R
n
obviously can be written as a union of open
rectangles, and in fact we can use a countable union of open rectangles. For,
given an arbitrary collection of rectangles covering U R
n
, there always exists
a countable subcover
5
.
The volume of A = I
1
I
n
is naturally dened as (A) = (I
1
) (I
n
),
(I
k
) being the length of the interval I
k
, with the usual rules about multiplying
zeroes and innities together in force.
5
If you are not aware of this theorem, you are invited to prove it yourself.
32
Now lets suppose that the rectangle A has been partitioned into a disjoint
smaller rectangles. Then the sum of the volumes of the smaller rectangles, as we
have dened it, should equal the volume of A. This is true, of course, although
it is again tedious to write down formally. Essentially, one draws a rectangular
grid on A using the boundaries of the smaller rectangles, and show that the sum
of the volumes of each cell in the grid equals the volume of A, by applying the
distributive property of multiplication over addition.
In the even more general case, suppose A /is a disjoint union of rectangles,
but A is not necessarily in the semialgebra of rectangles. The volume of A is
dened as the sum of the volumes of the component rectangles. This is obvious
we mention it only to note that, although A may certainly have dierent
decompositions (A =
i
R
i
=
j
S
j
), the volume sum is always the same.
To see this, simply take the common renement R
i
S
j
. Then
i
(R
i
) =
j
(R
i
S
j
) by applying the result of the previous paragraph on each
rectangle R
i
. But
j
(R
j
) equals this double sum also.
It follows easily then, that is nitely additive. Thus there is only one
nal thing left to show: if the disjoint union of A
1
, A
2
, . . . / is C /,
then (C) =
i=1
(A
i
). From monotonicity and taking limits we always have
(C)
i=1
(A
i
).
We show (C)
i=1
(A
i
). Observe that this also ought to be true if C
is contained in, but not necessarily equal to, the union of the A
i
, and A
i
need
not be disjoint at all. Henceforth these are our new hypotheses.
Suppose rst that C happens to be compact, and A
i
are all open. In other
words, A
i
form an open cover of the compact set C. So there is a nite
subcover A
1
, . . . , A
n
. By nite subadditivity, we have (C)
n
i=1
(A
i
)
i=1
(A
i
).
Now continue to assume that C is compact, but A
i
are not open. But
it is easy to make the A
i
open and still cover C, by slightly expanding each
A
i
. In particular, stipulate that the volume of each new A
i
grows by at most
/2
i
. (We can assume the A
i
are plain rectangles, rather than disjoint unions of
them, and that they are bounded, since C is bounded.) Then we have (C)
i=1
(A
i
) +, and > 0 is arbitrary.
All that remains is the case that C is not compact. If C is bounded, so is its
closure, and hence by the HeineBorel theorem, C is compact. Similarly take
the closure of the A
i
. But taking closures of elements of / does not change
their volumes.
Finally consider C unbounded. But C [N, N]
n
is bounded for each N,
so from the previous case we have (C [N, N]
n
)
i=1
(A
i
). It is easily
checked that the limit as N of the left side is exactly (C).
Thus, using the theorems of Section 8, we can conclude:
Theorem 9.2. Lebesgue measure in R
n
exists, and it is uniquely determined,
given our hypotheses.
It is obvious from our constructions that Lebesgue measure is invariant under
translations of sets. It is also invariant under other rigid motions (rotations,
reections); this will be a consequence of Lemma 12.1 and some linear algebra.
33
An interesting question to ask is whether there are any sets that are not
Borel, or that cannot be assigned any volume. The following theorem gives a
classic example (and should also serve to convince you why our strenuous eorts
are necessary).
Theorem 9.3 (Vitali). There exists a nonmeasurable set in [0, 1] using Lebesgue
measure. In other words, Lebesgue measure cannot be dened consistently for
all subsets of [0, 1].
Proof. The key fact in this proof is translationinvariance. In particular, given
any measurable H [0, 1], dene its shift with wraparound:
H x = h +x : h H, h +x 1 h +x 1 : h H, h +x > 1 .
Then (H x) = (H).
Dene two real numbers to be equivalent if their dierence is rational. The
interval [0, 1] is partitioned by this equivalence relation. Compose a set H
[0, 1] consisting of exactly one element from each equivalence class, and also say
0 / H. Then (0, 1] equals the disjoint union of all H r, for r [0, 1) Q.
Consequently, by countable additivity,
1 = ((0, 1]) =
r[0,1)Q
(H r) =
r[0,1)Q
(H) ,
a contradiction, because the sum on the right can only be 0 or . Hence H
cannot be measurable.
A fact related to these matters is that Lebesgue measure is complete, mean
ing if (A) = 0, then every B A is Lebesguemeasurable and (B) = 0. (This
follows directly from the construction of the outer measure in the previous sec
tion.) On the other hand, one can show that the Lebesgue measure restricted
to the Borel sets in R
n
is not complete. This means a slight complication in
the theorems we prove about Lebesgue measure, but fortunately the extension
process allows us to complete any (sigmanite) measure if necessary.
10 Riemann integrability implies Lebesgue in
tegrability
You have probably already suspected that any function that any Riemann
integrable function is also Lebesgueintegrable, and certainly with the same
values for the two integrals. We shall prove this fact here. Let us rst review
the denition of the Riemann integral.
Let A R
m
be a (bounded) rectangle. Usually a bounded function f : A
R is said to be (proper) Riemannintegrable if the supremum of its lower sums
and the inmum of its upper sums are equal:
sup
1
_
R1
(R) inf
xR
f(x)
_
= inf
1
_
R1
(R) sup
xR
f(x)
_
34
(where T denotes a rectangular partition of A).
We can rephrase the denition by considering not just lower sums and upper
sums for f, but the integral of any simple function s f or s f (simple with
respect to a rectangular partition). Such simple functions are obviously both
Riemann and Lebesgue integrable with the same values for the integral. It is
also easily seen that for every such s f there exists some lower sum for f (in
the usual sense) such that the integral of s is less than or equal to that lower
sum. Similarly for the upper simple functions and the upper sums. Therefore
sup
all simple sf
_
_
A
s
_
= sup
1
_
R1
(R) inf
xR
f(x)
_
,
inf
all simple sf
_
_
A
s
_
= inf
1
_
R1
(R) sup
xR
f(x)
_
,
and we may equivalently dene f to be Riemannintegrable if the supremum
of the integrals of the lower simple functions is equal to the inmum of the
integrals of the upper simple functions.
It follows from the usual arguments, that if s
1
and s
2
are simple with s
1
f s
2
, then
_
A
s
1
_
A
s
2
, and that f is Riemannintegrable if and only if there
exists a sequence of lower simple functions l
n
f, and upper simple functions
u
n
f such that
lim
n
_
A
l
n
=
_
A
f = lim
n
_
A
u
n
.
This more relaxed denition of Riemann integrability is easier to work with
in the proof of the following theorem.
Theorem 10.1. Let A R
m
be a rectangle. If f : A R is proper Riemann
integrable, then it is also Lebesgueintegrable (with respect to Lebesgue measure)
with the same value for the integral.
Proof. f is Riemannintegrable, so choose a sequence of simple functions l
n
f u
n
with lim
n
_
A
l
n
=
_
A
f = lim
n
_
A
u
n
. Let g
n
(x) = max
kn
l
k
(x), so that
g
n
(x) increase to g(x) = sup
n
g
n
(x). By our construction, we have
l
n
g
n
g f u
n
.
Since the integrals of l
n
and u
n
converge onto each other, we know that g
is Riemannintegrable. Riemannintegrating and applying limits to the above
inequality,
lim
n
_
A
l
n
lim
n
_
A
g
n
_
A
g
_
A
f lim
n
_
A
u
n
.
Thus the nonnegative function f g has a Riemann integral of zero, and so
f g = 0 almost everywhere with respect to Lebesgue measure
6
. In turn, f g
must be measurable. (This follows from Lebesgue measure being complete.)
6
This follows from a very famous theorem about Riemann integrability, whose proof you
can nd in [Spivak2] or [Munkres].
35
On the other hand, g is measurable, because g
n
and l
n
are, so f is measur
able. Since [f[ is bounded and A has nite measure, f must also be Lebesgue
integrable. Then the same inequality above with the Riemann integrals changed
to Lebesgue integrals shows that the Lebesgue and Riemann integrals of f are
the same.
The following theorem concerns the absolutely convergent improper Rie
mann integral as dened in [Munkres].
Theorem 10.2. Let A R
m
be open, and f : A R be locally bounded on A
and continuous almost everywhere on A. If f is improperRiemannintegrable
(i.e.
_
A
[f[ < ), then its Lebesgue integral exists with the same value for the
integral. Also
_
A
[f[ diverges simultaneously for the improper Riemann integral
and the Lebesgue integral.
Proof. Suppose rst that f 0. Let C
n
be a sequence of compact Jordan
measurable subsets of A whose union is A and C
n
interior C
n+1
. Then
_
A
f = lim
n
_
Cn
f .
Note that
_
Cn
f is valid as both a Riemann and Lebesgue integral, by Theo
rem 10.1, and it can also be written as the Lebesgue integral
_
A
f
Cn
which
converges monotonically, as n , to the Lebesgue integral
_
A
f. This must
of course be equal to the left side of the equation above, which is the improper
Riemann integral.
For general f, repeating the same reasoning for the nonnegative functions
[f[, f
+
, f
E
n
)
y
=
E
n
y
/. So
E
n
T.
3. If E T, and F = E
c
, then F
y
= x X : (x, y) F = x X :
(x, y) / E = X E
y
/. So E
c
T.
Thus T is a sigma algebra containing the measurable rectangles, i.e. T is all of
/B.
Theorem 11.2. Let (X Y, /B) and (Z, () be measurable spaces.
If f : X Y Z is measurable, then the functions f
y
: X Z, f
x
: Y Z
obtained by holding one variable xed are also measurable.
Proof. Again we consider only f
y
. Let I
y
: X X Y be dened by I
y
(x) =
(x, y). Then given E /B, by Theorem 11.1. I
1
y
(E) = E
y
/ , so I
y
is a
measurable function. But f
y
= f I
y
.
The next theorem on multiple integration requires the following technical
tool, which comes equipped with a denition.
Denition 11.2. A family / of subsets of X is a monotone class if it is closed
under increasing unions and decreasing intersections.
The intersection of any set of monotone classes is a monotone class. The
smallest monotone class containing a given set ( is the intersection of all mono
tone classes containing (. This construction is analogous to the one for sigma
algebras, and the result is also said to be the monotone class generated by (.
Theorem 11.3 (Monotone Class Theorem). If / be an algebra on X, then
the monotone class generated by / is the same as the sigma algebra generated
by /.
Proof. Since a sigma algebra is a monotone class, the generated sigma algebra
contains the generated monotone class /. So we only need to show / is a
sigma algebra.
We rst claim that / is actually closed under complementation. Let /
t
=
S / : X S / /. This is a monotone class, and it contains the
algebra /. So / = /
t
as desired.
To prove that / is closed under countable unions, we only need to prove
that it is closed under nite unions, for it is already closed under countable
increasing unions.
First let A /, and ^(A) = B / : A B / /. Again this is a
monotone class containing the algebra /; thus ^(A) = /.
37
Finally, let S /, with the same denition of ^(S). The last paragraph,
rephrased, says that / ^(S). And ^(S) is a monotone class containing /
by the same arguments as the last paragraph. Thus ^(S) = / as desired.
Theorem 11.4. Let (X, /, ), (Y, B, ) be sigmanite measure spaces.
If E /B, then
1. (E
x
) is a measurable function of x X.
2. (E
y
) is a measurable function of y Y .
Proof. We concentrate on (E
y
). Let X
m
X, with (X
m
) < . Fix m
for now and let T = E /B : (E
y
X
m
) is a measurable function of y.
T is equal to /B, because:
1. If E = AB is a measurable rectangle, then (E
y
X
m
) = (A X
m
)
B
(y) which is a measurable function of y.
If E is a nite disjoint union of measurable rectangles E
n
, then (E
y
X
m
) =
n
(E
n
y
X
m
) which is also measurable.
The measurable rectangles form a semialgebra (just like the rectangles in
R
n
). Therefore, applying Theorem 9.1, T contains the algebra of nite
disjoint unions of measurable rectangles.
2. If E
n
are increasing sets in T (not necessarily measurable rectangles), then
((
E
n
)
y
X
m
) = (
E
n
y
X
m
) = lim
n
(E
n
y
X
m
) is measurable,
so
E
n
T.
Similarly, if E
n
are decreasing sets in T, then using limits we see that
E
n
T. (Here it is crucial that E
n
y
X
m
have nite measure, for the
limiting process to be valid.)
3. These arguments show that T is a monotone class, and it contains the
monotone class generated by the algebra of nite unions of measurable
rectangles. By the Monotone Class Theorem, T must therefore be the
same as the sigma algebra /B.
We now know that for each E /B, (E
y
X
m
) is measurable, for every
m. Taking limits as m , we conclude that (E
y
) is also measurable.
One thing has been deliberately left out of our discussion so far: the con
struction of the product measure , which, as in the case of R
n
, should
assign a measure (A)(B) to the measurable rectangle AB. The problem is
that we cannot prove countable additivity of dened this way. We take
an indirect route instead, dening it by iterated integrals:
Theorem 11.5. Let (X, /, ), (Y, B, ) be sigmanite measure spaces. There
exists a unique product measure : /B [0, ], with
( )(E) =
_
xX
_
yY
E
(x, y) d
. .
(Ex)
d =
_
yY
_
xX
E
(x, y) d
. .
(Ey)
d.
38
Proof. Let
1
(E) denote the double integral on the left, and
2
(E) denote the
one on the right. (These integrals exist by Theorem 11.4.) It is obvious that
both
1
and
2
are countably additive, so they are both measures on / B.
Moreover, if E = A B, then just expanding the two integrals shows
1
(E) =
(A)(B) =
2
(E). Now XY is sigmanite if X and Y are, so by uniqueness
of measures
7
(Corollary 8.7),
1
=
2
on all of /B.
Theres not much work left for our nal theorems:
Theorem 11.6 (Fubini). Let (X, /, ) and (Y, B, ) be sigmanite measure
spaces. If f : X Y R is integrable, then
_
XY
f d( ) =
_
xX
_
_
yY
f(x, y) d
_
d =
_
yY
_
_
xX
f(x, y) d
_
d .
This equation also holds when f 0 (if it is merely measurable, not integrable).
Proof. The case f =
E
is just Theorem 11.5. Since all three integrals are
additive, they are equal for nonnegative simple f, and hence also for all other
nonnegative f, by approximation and monotone convergence. From linearity
on f = f
+
f
, we see that they are equal for any integrable f. (We will allow
to occur on a set of measure zero.)
Theorem 11.7 (Tonelli). Let (X, /, ) and (Y, B, ) be sigmanite measure
spaces, and f : X Y R be measurable. Then f is integrable if
and only if
_
xX
_
_
yY
[f(x, y)[ d
_
d <
(or with X and Y reversed).
Consequently, if any one of these conditions hold, then it is valid to switch
the order of integration when integrating f.
Proof. Immediate from Fubinis theorem applied to the function [f[.
Remark 11.8. Note that it is possible that
_
yY
[f(x, y)[ d = on a set of
measure zero in X, and still have integrability, and conversely.
12 Change of variables in R
n
This section will be devoted to completing the proof of the dierential change
of variables formula, Theorem 6.5. As we noted in the remarks preceding that
theorem, it suces to prove the following.
7
If you prefer, you can also prove this theorem using the Monotone Class Theorem instead.
39
Lemma 12.1. Let g : X Y be a dieomorphism between open sets in R
n
.
Then for all measurable sets A X,
(g(A)) =
_
g(A)
1 =
_
A
[det Dg[ .
Proof. We rst begin with two simple reductions.
(I) It suces to prove the lemma locally.
That is, suppose there exists an open cover of X, U
, so that the
equation of the lemma holds for measurable A contained inside one of the
U
i
(A E
i
) =
i
(A E
i
) = (A) .
(II) Suppose the lemma holds for two dieomorphisms g and h, and all mea
surable sets. Then it holds for the composition dieomorphism g h (and
all measurable sets).
Proof. For any measurable A,
_
g(h(A))
1 =
_
h(A)
[det Dg[ =
_
A
[(det Dg) h[ [det Dh[ =
_
A
[det D(g h)[ .
The second equality follows from Theorem 6.5 applied to the dieomor
phism h, which is valid once we know (h(B)) =
_
B
[det Dh[ for all mea
surable B.
We proceed to prove the lemma by induction, on the dimension n.
Base case n = 1. Cover X by a countable set of bounded intervals I
k
in R. By
Reduction I, it suces to prove the lemma for measurable sets contained
in each of the I
k
individually. By the uniqueness of measures (Corollary
8.7), it also suces to show = only for the intervals [a, b], (a, b), etc.
But this is just the Fundamental Theorem of Calculus:
_
g([a,b])
1 = [g(b) g(a)[ = [
_
b
a
g
t
[ =
_
b
a
[g
t
[ .
40
(For the last equality, remember that the g
t
must be either positive on all
of [a, b] or negative on all of [a, b]. If the interval is open or halfopen, we
may not be able to apply the Fundamental Theorem, but the preceding
equation can still be obtained via a limiting procedure.)
Induction step. Locally (i.e. on a suciently small open set around each
point x X), g can always be factored
8
as g = h
k
h
2
h
1
, where
each h
i
is a dieomorphism and xes one coordinate of R
n
. By Reduction
I, it suces to consider this local case only. By Reduction II, it suces to
prove the lemma for each of the dieomorphisms h
i
.
So suppose g xes one coordinate. For convenience in notation, assume g
xes the last coordinate: g(u, v) = (h
v
(u), v), for u R
n1
, v R, and h
v
are functions on (open subsets of) R
n1
, in fact dieomorphisms. Clearly
h
v
are onetoone, and most importantly, det Dh
v
(u) = det Dg(u, v) ,= 0.
Next, let a measurable set A be given, and consider its projection V =
v R : (u, v) A, and its crosssection U
v
= u R
n1
: (u, v) A.
We now apply Fubinis theorem and the induction hypothesis on h
v
:
_
g(A)
1 =
_
vV
_
hv(Uv)
1
=
_
vV
_
uUv
[det Dh
v
(u)[
=
_
vV
_
uUv
[det Dg(u, v)[ =
_
A
[det Dg[ .
13 Vectorvalued integrals
This section, in short, is a remark that everything we have done so far generalizes
to vectorvalued functions, the vectors being from real nitedimensional spaces
(or C). These often occur in applications.
Given f : X R
n
measurable, let e
k
denote the standard basis vectors
in R
n
, and f
k
the components of f with respect to this basis. Of course we
dene
_
f =
n
k=1
_
_
f
k
_
e
k
,
provided the integrals on the right exist. Generally, to say that
_
X
f exists, we
do not allow any one of the components to be innite, for this is usually not
useful when n 2.
It is not hard to see that f is measurable if and only if each f
k
is.
It follows immediately that this integral is linear, and hence, the denition
is independent of the basis, as it ought to be. That is, if e
k
is any basis of
R
n
, the sum in the denition does not change.
8
The proof of this fact can be found in [Munkres], but it is not dicult to prove it yourself.
Hint: Inverse Function Theorem.
41
Since by our convention that the components of the integral are not allowed
to be innite, it seems we do not need to dene separately what it means for f
to be integrable. But we dene it, because we want to take note of some facts
(admittedly they are not very interesting): f being integrable means
_
f < .
The norm can be arbitrary, for in R
n
, every norm is equivalent: if 
1
and

2
are any two given norms, then there always exist constants , > 0 such
that 
2

1

2
. Thus being integrable in one norm implies integra
bility in another norm. In particular, by using the norm x
=
n
k=1
[x
k
[, we
see that f is integrable if and only if its components f
k
are integrable.
We want to show that 
_
f
_
f; this basic inequality will enable us
to make estimates without having to separate components. As a start, this is
clearly true if f is a simple function, i.e. f =
m
j=1
a
j
Ej
for a
j
R
n
. It is
also trivial if
_
f = . To prove the inequality for the other f, we use the
following easy lemma.
Lemma 13.1. Let f : X R
n
be integrable. There exists a sequence of simple
functions
j
: X R
n
converging pointwise to f, with
lim
j
_

j
f = 0 .
Proof. By equivalence of norms, it suces to prove this only for the norm 
as dened above.
For each f
k+
, by the approximation theorem in R, there exists measurable
simple
k+
j
increasing to f
k+
. Similarly for f
k
. Let
k
j
=
k+
j
k
j
. Using
the Dominated Convergence Theorem applied to each component,
lim
j
_

j
f
= lim
j
_
n
k=1
[
k
j
f
k
[ = 0 .
We also note that by the Monotone Convergence Theorem, the
j
satisfy
lim
j
_
j
=
_
f.
We nish our demonstration of the generalized triangle inequality. Let
j
as in the lemma. Then
_
_
_
_
j
_
_
_
_

j

_
f +
_

j
f ,
so that
lim
j
_
_
_
_
j
_
_
_ =
_
_
_ lim
j
_
j
_
_
_ =
_
_
_
_
f
_
_
_
_
f + lim
j
_

j
f =
_
f.
14 C
0
functions are dense in L
p
(R
n
)
This section is devoted to the result that the space of C
0
functions is dense in
L
p
(R
n
), which was discussed at the end of Section 7.
42
Theorem 14.1. Let f : R
n
R L
p
(R
n
), 1 p < . Then for any > 0,
there exists C
0
such that
 f
p
=
_
_
R
n
[ f[
p
d
_
1/p
< .
Our strategy for proving this theorem is straightforward. Since we already
know that the simple functions =
i
a
i
Ei
are dense in L
p
, we should try
approximating
Ei
by C
0
functions. Since C
0
functions are nonzero on com
pact sets, it stands to reason that we should approximate the sets E
i
by compact
sets K
i
. If this can be done, then it suces to construct the C
0
functions on
the sets K
i
.
Our constructions start with this last step. You might even have seen some
of these constructions before.
Lemma 14.2. Let A be a compact rectangle in R
n
. Then there exists C
0
which is positive on the interior of A and zero elsewhere.
Proof. Consider the innitely dierentiable function
f(x) =
_
e
1/x
2
, x > 0
0 , x 0 .
If A = [0, 1], then
(
x) = f(x) f(1 x) is the desired function of C
0
. (Draw
pictures!)
If A = [a
1
, b
1
] [a
n
, b
n
], then we let
A
(x) =
_
x
1
a
1
b
1
a
1
_
_
x
n
a
n
b
n
a
n
_
.
Lemma 14.3. For any > 0, there exists an innitely dierentiable function
h: R [0, 1] such that h(x) = 0 for x 0 and h(x) = 1 for x .
Proof. Take the function from Lemma 14.2 for the rectangle [0, ], and let
h(x) =
_
x
(t) dt
_
(t) dt
.
Theorem 14.4. Let U be open, and K U compact. Then there exists
C
0
which is positive on K and vanishes outside some other compact set L,
K L U.
Proof. For each x U, let A
x
U be a bounded open rectangle containing x,
whose closure A
x
lies in U. The A
x
together form an open cover of K. Take
a nite subcover A
xi
. Then the compact rectangles A
xi
also cover K.
From Lemma 14.2, obtain functions
i
C
0
that are positive on A
xi
and
vanish outside A
xi
. Let =
i
i
C
0
. vanishes outside L =
i
A
xi
,
which is compact.
43
Corollary 14.5. In Theorem 14.4, it is even possible to require in addition that
0 (x) 1 for all x R
n
and (x) = 1 for x K.
Proof. Let be from Theorem 14.4. Since is positive on the compact set K,
it has a positive minimum there. Take the function h of Lemma 14.3 for this
. The new candidate function is h .
As we have said, we must now approximate arbitrary Borel sets B B(R
n
)
by compact sets. (We will also need approximation by open sets.) It turns out
that this part of the proof is purely topological, and generalizes to other metric
spaces X besides R
n
. Henceforth we consider the more general case.
Let d denote the metric for the metric space X.
Theorem 14.6. Let (X, B(X), ) be a nite measure space, and let B B(X).
For every > 0, there exists a closed set V and an open set U such that
V B U and (U V ) < .
Proof. Let / be the set of all B B(X) for which the statement is true. We
show that / is a sigma algebra containing all the open sets in X.
1. Let B / with V and U as above. Then V
c
open B
c
U
c
closed,
and (V
c
) (U
c
) < . This shows B
c
/.
2. Let B
n
/. Choose V
n
and U
n
for each B
n
such that (U
n
V
n
) < /2
n
.
Let U =
n
U
n
which is open, and V =
n
V
n
, so that V
n
B
n
U.
Of course V is not necessarily closed, but W
N
=
N
n=1
V
n
are, and these
W
N
increase to V . Hence (V W
N
) 0 as N , meaning that for
large enough N, (V W
N
) < .
Next, we have
U W
N
= (U V ) (V W
N
)
_
n
(U
n
V
n
) (V W
N
) ,
(U W
N
) = (U V ) +(V W
N
)
n
(U
n
V
n
) +(V W
N
) < + .
This shows that
n
B
n
/.
3. Let B be open, and A = B
c
. Also let d(x, A) = inf
yA
d(x, y) be the
distance from x X to A. Set D
n
= x X : d(x, A) 1/n. D
n
is
closed, because d(, A) is a continuous function, and [1/n, ] is closed.
Clearly d(x, A) 1/n > 0 implies x A
c
= B, but since A is closed, the
converse is also true: for every x A
c
= B, d(x, A) > 0. Obviously the
D
n
are increasing, so we have just shown that they in fact increase to B.
Hence (B D
n
) < for large enough n. Thus B /.
44
The case that is not a nite measure is taken care of, as you would expect,
by taking limits like we did for sigmanite measures in Section 8. But since
compact and open sets are involved, we need stronger hypotheses:
1. There exists K
n
X, with K
n
compact and (K
n
) < .
2. There exists X
n
X, with X
n
open and (X
n
) < .
It is easily seen that these properties are satised by X = R
n
and the
Lebesgue measure , as well as many other reasonable measures on B(R
n
).
We will discuss this more later.
We assume henceforth that X and have the properties just listed.
Theorem 14.7. Let B B(X) with (B) < . For every > 0, there exists
a compact set V and an open set U such that K B U and (U K) < .
Proof. It suces to show that (U B) < and (B K) < separately.
Existence of K. Since B K
n
B, there exists some n such that (B)
(B K
n
) < /2.
For this n, dene the nite measure
Kn
(E) = (E K
n
), for E B(X).
By Theorem 14.6, there are sets V B U, V closed, and
Kn
(BV )
Kn
(U V ) < /2. Since X
n
is compact, it is closed. Then K = V K
n
is also closed, and hence compact, because it is contained in the compact
set K
n
. We have,
(B K) = (B) (B K
n
) +(B K
n
) (K)
= (B) (B K
n
) +
Kn
(B)
Kn
(V ) <
2
+
2
.
Existence of U. For every n, dene the nite measure
Xn
(E) = (E X
n
),
for E B(X). By Theorem 14.6, there are sets V
n
B U
n
, U
n
open,
and
Xn
(U
n
B)
Xn
(U
n
V
n
) < /2
n
.
Let U =
n
U
n
X
n
B. We have,
(U B)
n
(U
n
X
n
B) =
Xn
(U
n
B) < .
We return to the case of X = R
n
.
Theorem 14.8. Let B B(R
n
) with (B) < . For every > 0, there exists
C
0
such that

B

p
=
_
_
R
n
[
B
[
p
d
_
1/p
< .
45
Proof. By Theorem 14.7, there is compact K and open U, K B U, (U
K) < . From Corollary 14.5, there is C
0
such that = 1 on K, = 0
outside U, and 0 1. Then
_
R
n
[
B
[
p
=
_
R
n
\U
0 +
_
U\B
p
+
_
B\K
(1 )
p
+
_
K
0
(U B) +(B K)
= (U) (B) +(B) (K)
< .
Proof of Theorem 14.1. Let =
i
a
i
Ei
, a
i
,= 0 be a simple function such
that  f
p
< /2. Let
i
C
0
such that 
i
Ei

p
< /2[a
i
[. (Note
that E
i
must have nite measure; otherwise would not be integrable.) Let
=
i
a
i
i
. Then (Minkowskis inequality),
f 
p
f 
p
+ 
p
f 
p
+
i
[a
i
[ 
Ei
i

p
< .
Actually, even the last part of theorem can be generalized to spaces other
than R
n
: instead of innitely dierentiable functions with compact support, we
consider continuous functions, dened on the metric space X, with compact
support. In this case, a topological argument must be found to replace Lemma
14.2. This is easy:
Lemma 14.9. Let A be any compact set in X. Then there exists a continuous
function : X R which is positive on the interior of A and zero elsewhere.
Proof. Let C = X interior A, so C is closed. Then (x) = d(x, C) works.
(d(x, C) was dened in the proof of Theorem 14.6.)
The proof of Theorem 14.4 goes through verbatim for metric spaces X,
provided that X is locally compact. This means: given any x X and an open
neighborhood U of x, there exists another open neighborhood V of x, such that
V is compact and V U.
Finally, we need to consider when properties (1) and (2) (in the remarks
preceding Theorem 14.7) are satised. These properties are somewhat awkward
to state, so we will introduce some new conditions instead.
Denition 14.1. A measure on a topological space X is locally nite if for
each x X, there is an open neighborhood U of x such that (U) < .
It is easily seen that when is locally nite, then (K) < for every
compact set K.
Denition 14.2. A topological space X is strongly sigmacompact if there
exists a sequence of open sets X
n
with compact closure, and X
n
X.
46
If X is strongly sigmacompact, and is locally nite, then properties (1) and
(2) are automatically satised. It is even true that strong sigmacompactness
implies local compactness in a metric space. (The proof requires some topology
and is left as an exercise.) Then we have the following theorem:
Theorem 14.10. Let X be a strongly sigmacompact metric space, and be
any locally nite measure on B(X). Then the space of continuous functions with
compact support is dense in L
p
(X, B(X), ), 1 p < .
15 Other examples of measures
Since so far we have chiey worked only in R
n
with Lebesgue measure, it should
be of interest to give a few more useful examples of measures.
kdimensional volume of a kdimensional manifold
A manifold is a generalization of curves and surfaces to higher dimensions, and
sometimes even to spaces other than R
n
. But here we shall concentrate on dier
entiable manifolds inside R
n
; the theory is elucidated in [Spivak2] or [Munkres].
Here we give a denition of the kdimensional volume for kdimensional mani
folds which does not require those dreaded partitions of unity.
Suppose a kdimensional manifold M R
n
is covered by a single coordinate
chart : U M, U R
k
open. Let D denote the nbyk matrix
D =
_
d
dt
1
d
dt
2
. . .
d
dt
k
_
.
(More precisely, each vector
d
dti
is represented by a column vector in the standard
basis of R
n
. Actually our denition works using any orthonormal basis also.)
Dene, for any vectors v
1
, . . . , v
k
R
n
(again represented in an orthnormal
basis):
V(v
1
, . . . , v
k
) =
_
det
_
v
1
v
2
. . . v
k
tr
_
v
1
v
2
. . . v
k
=
_
det [v
i
v
j
]
i,j=1,...,k
,
This is the kdimensional volume of a kdimensional parallelopiped spanned by
the vectors v
1
, . . . , v
k
in R
n
. One easily shows that this volume is invariant under
orthogonal transformations, and that it agrees with the usual kdimensional
volume (as dened by the Lebesgue measure) when the parallelopiped lies in
the subspace R
k
0 R
n
.
The kdimensional volume of any E B(M) is dened as:
(E) =
_
1
(E)
V(D) d.
(Since is continuous,
1
(E) B(R
k
).)
47
The integrand, of course, is supposed to represent innitesimal elements of
surface area (kdimensional volume), or approximations of the surface area of E
by polygons that are close to E. As indicated by the quotation marks, these
assertions about surface area are completely nonrigorous, and we wont be
labour to prove them, since the equation above is our denition of kdimensional
volume. But it should be pointed out that there are better theories of k
dimensional volume available, which are intrinsic to the sets being measured,
instead of our computational theory. (I dont know these other theories well
enough though.)
Back to our denitions. If M is not covered by a single coordinate chart,
but more than one, say
i
: U
i
M, i = 1, 2, . . . , then partition M with
V
1
=
1
(U
1
), V
i
=
i
(U
i
) V
i1
, and dene
(E) =
i
_
1
i
(EVi)
V(D) d.
It is left as an exercise to show that (E) is welldened: it is independent of
the coordinate charts
i
used for M.
Finally, the scalar integral of f : M R over M is simply
_
M
f d .
And the integral of a dierential form on an oriented manifold M is
_
pM
_
p ; T(p)
_
d ,
where T(p) is an orthonormal frame of the tangent space of M at p, oriented
according to the given orientation of M. (If you dont know what Im talking
about, just ignore this denition essentially it generalizes the line and surface
integrals in calculus.)
Again it is not hard to show that the formulae I have given are exactly
equivalent to the classical ones for evaluating scalar integrals and integrals of
dierential forms, which are of course needed for actual computations. But there
are several advantages to our new denitions. First is that they are elegant: they
are mostly coordinatefree, and all the dierent integrals studied in calculus have
been unied to the Lebesgue integral by employing dierent measures. In turn,
this means that the nice properties and convergence theorems we have proven
all carry over to integrals on manifolds.
For example, everybody knows that on a sphere, any circular arc C has
measure zero, and so may be ignored when integrating over the sphere. To
prove this rigorously using our denitions, we only have to remark that (C) =
0, since (
1
(C)) = 0 for a coordinate chart for the sphere.
Stieltjes measure
The denition of the Stieltjes measure is best motivated by probability theory.
Suppose we have a random variable Z with distribution , and the cumulative
48
distribution function F : R [0, 1] by denition, they satisfy F(z) = Pr[Z
z] =
_
[, z]
_
. It follows that F is (nonstrict) increasing, and ((a, b]) =
F(b) F(a).
The idea here is to try to reverse this procedure: given any increasing func
tion F, can we construct a measure on B(R) that assigns, to any interval
(a, b], a length of F(b) F(a)?
Actually we will need to impose some conditions on F rst. Since F is in
creasing, it always has only a countable number of discontinuities, and these
discontinuities must all be jump discontinuities. At these jumps, we will insist
that F is rightcontinuous, i.e. lim
xa
F(x) = F(a). Otherwise, taking count
able limits may fail: for example, if at the point a, F is leftcontinuous instead
of rightcontinuous, then
_
(a, b]
_
= F(b) F(a)
,= lim
xa
(F(b) F(x))
= lim
xa
_
(a, x]
_
.
(Of course, the preference of right over left comes from our convention that
we used intervals (a, b] that are open on the left and closed on the right.)
We will also insist that F(x) < for x ,= , +, so that the subtraction
F(b)F(a) makes sense. However, it should be allowed that, say, F() =
(and F(x) does not have to be in [0, 1] either). This allows sigmanite measures
to be constructed.
With the necessary conditions now stated, we can begin the construction of
, which is not much dierent from the construction of the Lebesgue measure
on R not surprising, since F(x) = x is exactly the Lebesgue measure on R.
First, it is easily checked, by drawing pictures of the intervals (a, b], that
they actually form a semialgebra on R. (Just ignore the point for now.)
The measure on the generated algebra / is dened in the obvious way, and
it follows from the same arguments as in Section 9 that is nitely additive.
Countable additivity requires the typical approximation arguments. Suppose
that we have J
n
/ with innite disjoint union I /. By monotonicity we
automatically have
n=1
(J
n
) (I), so we only have to prove the other
inequality. We can assume that J
n
are simple intervals (a
n
, b
n
], instead of nite
disjoint unions of intervals. Also assume, for now, that I is the nite interval
(a, b].
Since F is rightcontinuous, for every > 0, there exists > 0 such that
0 F(b) F(a) < F(b) F(a +) + .
Also there exists
n
> 0 such that
0 F(b
n
+
n
) F(a
n
) < F(b
n
) F(a
n
) +
2
n
.
There exists a nite set n
1
, . . . , n
k
such that (a+, b]
k
i=1
(a
ni
, b
ni
+
ni
],
49
since the open sets (a
n
, b
n
+) cover the compact set [a +, b]. Therefore,
F(b) F(a +)
k
i=1
F(b
ni
+
ni
) F(a
ni
)
n=1
F(b
n
+
n
) F(a
n
) ,
F(b) F(a) 2 +
n=1
F(b
n
) F(a
n
) .
and we take 0.
Finally, the proof for general I = I
1
I
k
/ just follows from nite
additivity and that the nite sum of limits equals the limit of the nite sum.
Innite intervals are handled in the same way as in Section 9.
Thus using the theorems of Section 8, can thereby be extended to a measure
on B(R). This is called the Stieltjes measure on R, and the integral
_
R
g d =
_
R
g dF
is the Stieltjes integral, and it generalizes the RiemannStieltjes integral that
is sometimes studied in real analysis courses. (The RiemannStieltjes integral
is dened by taking limits of Riemannlike sums
i
g(
i
) (F(x
i
) F(x
i1
)).
Showing this limit exists requires some eort, however.)
Of course, if F is dierentiable, the Stieltjes integral just reduces to
_
R
g dF =
_
R
g F
t
d.
Lastly, we should mention that if we admit signed measures, which are dif
ferences of two (positive) measures, then the condition that F be increasing can
even be relaxed. We will not pursue that theory here though.
16 Egorovs Theorem
The following theorem does not really belong in a rst course, but it is quite a
surprising and interesting result, and I want to record its proof.
Theorem 16.1 (Egorov). Let (X, ) be a measure space of nite measure, and
f
n
: X R be a sequence of measurable functions convergent almost everywhere
to f. Then given any > 0, there exists a measurable subset A X such that
(X A) < and the sequence f
n
converges uniformly to f on A.
Proof. First dene
B
n,m
=
k=n
_
[f f
k
[ <
1
m
_
.
50
Fix m. For most x X, f
n
(x) converges to f(x), so there exists n such that
[f
k
(x) f(x)[ < 1/m for all k n, so x B
n,m
. Thus we see B
n,m
n
XC
(C is some set of measure zero).
We construct the set A inductively as follows. Set A
0
= X C. For each
m > 0, since A
m1
B
n,m
n
A
m1
, we have (A
m1
B
n,m
) 0, so we
can choose n(m) such that
(A
m1
B
n(m),m
) <
2
m
.
Furthermore set
A
m
= A
m1
B
n(m),m
.
Since A
m
(A
m1
B
n(m),m
) = A
m1
, we have
(A
m
) > (A
m1
)
2
m
> (X)
2
4
2
m
(X) .
The sets A
m
are decreasing, so letting
A =
m=1
A
m
=
m=1
B
n(m),m
,
we have (A) (X) , or (X A) . Finally, for x A, x B
n(m),m
for
all m, showing that [f(x) f
k
(x)[ < 1/m whenever k n(m). This condition
is uniform for all x A.
17 Exercises
I have been suggested to provide some more exercises to this text. Here they
are.
1. In R
n
with Lebesgue measure, nd an uncountable set of measure zero.
2. Show that if f : R
n
R is continuous and equal to zero almost everywhere,
then f is in fact equal to zero everywhere.
3. Find a sequence of integrable functions f
n
such that f
n
(x) 0 for every
x but
_
f
n
.
4. Let f L
1
(R
n
), 1 p < . Compute the limits
lim
h0
_
[f(x +h) f(x)[
p
dx, lim
h
_
[f(x +h) f(x)[
p
dx.
5. Let f L
1
(R
n
). Show that
lim
y
_
R
n
f(x)e
ix,y)
dx = 0 .
51
6. Let f
n
L
p
, 1 < p < be a sequence of functions converging to f L
p
almost everywhere, and suppose there is a constant M such that f
n

p
M for all n. Then for each g L
q
,
_
fg = lim
n
_
f
n
g .
Hint: Use Egovovs Theorem and a density argument. Is this also true for
p = 1?
7. Let f
n
L
p
, 1 p < be a sequence of functions converging f Le
p
almost everywhere. Prove that f
n
converges to f in the Le
p
norm if and
only if f
n

p
f
p
.
8. Let f L
p
(R), g L
q
(R), with 1 p, q . Show that the function
F(x) =
_
x
0
f(t) dt is dened and continuous for all x R, and that the
function h(x) = ([x[ + 1)
a
F(x)g(x) is in L
1
(R), the constant a being
larger than 2
1
p
1
q
.
9. Let f L
1
(R). Show that the series
n=1
1
n
f(x
n)
is convergent for almost all x R.
10. Let f be in L
1
(R), and g be a continuous periodic function with period 1.
Show that
lim
n
_
f(x)g(nx) dx =
_
f(x) dx
_
1
0
g(y) dy .
18 Bibliography
The following outlines the prerequesites for this article. (Although Im not
suggesting that you must rst know everything here before you read this article;
you could be learning these as you go along.)
First, you need a respectable rstyear calculus course, dealing with limits
rigorously. The course I took used [Spivak1], possibly the best math book ever.
You probably should be at least somewhat familiar with multidimensional
calculus, if only to have a motivation for the theorems we prove (e.g. Fu
binis Theorem, Change of Variables). I learned multidimensional calculus
from [Spivak2] and [Munkres]. As youd expect, these are theoretical books,
and not very practical, but we will need a few elementary results that these
books prove.
Pointset topology is also introduced in the study of multidimensional cal
culus. We will not need a deep understanding of that subject here, but just the
52
basic denitions and facts about open sets, closed sets, compact sets, continous
maps between topological spaces, and metric spaces. I dont have particular
references for these, as it has become popular to learn topology with Moores
method (as I have done), where you are given lists of theorems that you are
supposed to prove alone.
The last book, [Rosenthal], (not a prerequesite) is what I mostly referred
to while writing up Section 8. It contains applications to probability of the
abstract measure stu we do here, and it is not overly abstract. I recommend
it, and its cheap too.
I dont mention any of the standard real analysis or measure theory books
here, since I dont have them handy, and this text is supposed to supplant a fair
portion of these books anyway. But surely you can nd references elsewhere.
References
[Spivak1] Michael Spivak, Calculus (3rd ed.). Publish or Perish, 1994; ISBN
0914098896.
[Spivak2] Michael Spivak, Calculus on Manifolds. Perseus, 1965; ISBN 0
805390219.
[Munkres] James R. Munkres, Analysis on Manifolds. Westview Press, 1991;
ISBN 0201510359.
[Rosenthal] Jerey S. Rosenthal, A First Look at Rigorous Probability Theory.
World Scientic, 2000; ISBN 9810243030.
53