FA Lecture

Functional Analysis Notes
M. Einsiedler, T. Ward Draft July 2, 2012
Draft version: comments to t.ward@uea.ac.uk please
Acknowledgements
We are grateful to several people for their comments on drafts of sections, including Anthony Flatters, Thomas Hille, Alex Maier, Andrea Riva, and Rene Rühr. We also thank Emmanuel Kowalski for making available notes on spectral theory and allowing us to raid them.
Contents
1 Motivation
  1.1 From Even and Odd Functions to Group Representations
  1.2 (Equi-)distribution of Points and Measures
  1.3 Ordinary Differential Equations
    1.3.1 A Second-Order Linear Initial Value Problem
    1.3.2 The Volterra Equation
    1.3.3 The Sturm-Liouville Equation
  1.4 Partial Differential Equations and the Laplace Operator
    1.4.1 The Heat Equation
    1.4.2 The Wave Equation
  1.5 Distributions as Generalized Functions
  1.6 Highly Connected Networks: Expanders
  1.7 What is spectral theory?
  1.8 Further Topics
2 Norms, Banach Spaces, and Hilbert Spaces
  2.1 Norms and Semi-Norms
    2.1.1 Normed Vector Spaces
    2.1.2 Semi-Norms and Quotient Norms
    2.1.3 A Comment on Notation
  2.2 Banach Spaces
    2.2.1 Proofs of Completeness
    2.2.2 The Completion of a Normed Vector Space
    2.2.3 Non-Compactness of the Unit Ball
  2.3 The space of continuous functions
    2.3.1 The Arzelà-Ascoli theorem
    2.3.2 The Stone-Weierstrass Theorem
    2.3.3 Continuous Functions in $L^p$ Spaces
  2.4 Bounded Operators and Functionals
    2.4.1 The Volterra Equation
    2.4.2 The Norm of Continuous Functionals on $C(X)$
    2.4.3 Banach Algebras
  2.5 Hilbert Spaces
    2.5.1 Definitions and Elementary Properties
    2.5.2 Isometries are Affine
    2.5.3 Convex Sets in Uniformly Convex Spaces
    2.5.4 Two Applications to Measure Theory
    2.5.5 Orthonormal Bases and Gram-Schmidt
    2.5.6 The Non-Separable Case
  2.6 Further Topics
3 From Fourier Series to Dirichlet Boundary Value Problems
  3.1 Fourier Series on Compact Abelian Groups
  3.2 Fourier Series on $\mathbb{T}^d$
    3.2.1 Convolution on the Torus
    3.2.2 Dirichlet and Fejér Kernels
    3.2.3 Differentiability and Fourier Series
  3.3 Spectral Theory for Group Actions on $\mathbb{T}^d$
    3.3.1 Group Actions and Unitary Representations
    3.3.2 Measure-Preserving Actions of Compact Groups
    3.3.3 Unitary Representations of Compact Abelian Groups
    3.3.4 Integrating Hilbert Space-valued Functions
    3.3.5 Proof of the Weight Decomposition
  3.4 Sobolev Spaces and Embedding on the Torus
    3.4.1 $L^2$-Sobolev Spaces on $\mathbb{T}^d$
    3.4.2 The Sobolev Embedding Theorem on $\mathbb{T}^d$
  3.5 Sobolev Spaces and Embedding Theorem on Open Sets
    3.5.1 $L^2$-Sobolev Spaces on Open Subsets
    3.5.2 Examples
    3.5.3 Restriction Operators and Traces
    3.5.4 Sobolev Embedding in the Interior
  3.6 The Dirichlet Boundary Value Problem and Elliptic Regularity
    3.6.1 The Pre-Inner Product
    3.6.2 Elliptic Regularity for the Laplace Operator
    3.6.3 Dirichlet's Boundary Value Problem in two dimensions
  3.7 Further Topics
4 Compact Self-Adjoint Operators and Laplace Eigenfunctions
  4.1 The Goal
  4.2 Compact Operators
    4.2.1 Definition and Basic Properties
    4.2.2 Integral Operators are often Compact
  4.3 Spectral Theory of Self-Adjoint Compact Operators
    4.3.1 The Adjoint Operator
    4.3.2 The Spectral Theorem
    4.3.3 Proof of the Spectral Theorem
  4.4 Eigenfunctions for the Laplace Operator
    4.4.1 A Compact Right Inverse on the Torus
    4.4.2 A Self-Adjoint Right Inverse on Open Subsets
    4.4.3 Compactness of the Right-Inverse
5 Uniform Boundedness and Open Mapping Theorem
  5.1 Uniform Boundedness
    5.1.1 Uniform Boundedness and Fourier Series
  5.2 Open Mapping and Closed Graph Theorems
    5.2.1 Baire Category
    5.2.2 Proof of Open Mapping Theorem
    5.2.3 Consequences: Bounded Inverses and Closed Graphs
  5.3 Further Topics
6 Dual Spaces
  6.1 Hahn-Banach Theorem and its Consequences
    6.1.1 Hahn-Banach Lemma
    6.1.2 The Hahn-Banach Theorem Consequences
    6.1.3 An Application of the Spanning Criterion
    6.1.4 The Bidual
    6.1.5 Banach Limits and Amenable Groups
  6.2 The Duals of $L^p(X)$
    6.2.1 The Dual of $L^1(X)$
    6.2.2 The Dual of $L^p(X)$ for $p > 1$
  6.3 Riesz Representation, The Dual of $C(X)$
    6.3.1 Uniqueness
    6.3.2 Totally Disconnected Compact Spaces
    6.3.3 Compact Spaces
    6.3.4 Locally Compact $\sigma$-Compact Metric Spaces
    6.3.5 Continuous Linear Functionals on $C(X)$
  6.4 Further Topics
7 Locally Convex Vector Spaces
  7.1 Weak Topologies and Tychonoff-Alaoglu
    7.1.1 Weak* Compactness of the Unit Ball
    7.1.2 More Properties of the Weak and Weak* Topologies
    7.1.3 Analytic Functions and the Weak Topology
  7.2 Applications of Weak* Compactness
    7.2.1 Equidistribution
    7.2.2 Elliptic Regularity at the Boundary
  7.3 Topologies on $B(X, Y)$
  7.4 Locally Convex Vector Spaces
  7.5 Convex Sets
    7.5.1 Applications of the Hahn-Banach Lemma
    7.5.2 Extremal Points and the Krein-Milman Theorem
  7.6 Further Topics
8 Spectral Theory of Unitary Operators, Fourier Transforms
  8.1 Spectral Theory of Unitary Operators
    8.1.1 Bochner's Theorem for Positive-Definite Sequences
    8.1.2 Cyclic Representations and the Spectral Theorem
    8.1.3 Proof of Bochner's theorem
    8.1.4 Projection-valued Measures
  8.2 Fourier Transform
    8.2.1 Fourier Transform on $L^1(\mathbb{R}^d)$
    8.2.2 Fourier Transform on $L^2(\mathbb{R}^d)$
    8.2.3 Fourier transform and smoothness, Schwartz space
9 Banach Algebras and Spectrum
  9.1 Spectrum and Spectral Radius
    9.1.1 The Geometric Series and its Consequences
    9.1.2 Using Cauchy Integration
  9.2 $C^*$-algebras
  9.3 Commutative Banach Algebras and their Gelfand duals
    9.3.1 Commutative Unital Banach Algebras
    9.3.2 Commutative Banach Algebras without a Unit
    9.3.3 The Gelfand Transform
    9.3.4 The Gelfand Transform for Commutative $C^*$-algebras
  9.4 Further Topics
10 Functional Calculus and Spectral Theory
  10.1 Definitions, Basic Lemmas, Main Goals
    10.1.1 Discrete, Continuous, and Residual Spectrum
    10.1.2 Numerical Range
    10.1.3 Main Goals: Spectral Theorem and Functional Calculus
  10.2 Continuous Functional Calculus for Self-Adjoint Operators
    10.2.1 Corollaries to the Continuous Functional Calculus
  10.3 Spectral measures
    10.3.1 The Spectral Theorem for Self-Adjoint Operators
  10.4 Spectral Measures and the Measurable Functional Calculus
    10.4.1 Non-Diagonal Spectral Measures
    10.4.2 The Measurable Functional Calculus
  10.5 Commuting Normal Operators
  10.6 Projection-valued measures
  10.7 The spectral theorem for normal operators
  10.8 Some Facts on the Spectrum of a Tree
    10.8.1 The Correct Upper Bound for the Summing Operator
    10.8.2 Chebyshev Polynomials of the Second Kind
11 Spectral Theory of Self-Adjoint Unbounded Operators
  11.1 Examples, Definitions, and the Main Theorem
  11.2 Operators of the form $T^*T$
  11.3 Self-Adjoint Operators
Appendix A: Topology
  A.1 Set Theory and Axiom of Choice
  A.2 Basic Definitions
  A.3 Convergence and Continuity
  A.4 Inducing Topologies
  A.5 Compact Sets and Tychonoff Theorem
  A.6 Normal Spaces
Appendix B: Measure Theory
  B.1 Basic Definitions and Measurability
    B.1.1 Measure and Integral
  B.2 Properties of the Integral
  B.3 The p-Norm
  B.4 Near-continuity of Measurable Functions
Hints for Selected Problems
References
Author Index
Notation
General Index
Introduction
1 Motivation
We start by discussing some seemingly disparate topics that are all intimately linked to notions from functional analysis. Some of these topics were important motivations for the development of the theory that came to be called functional analysis in the first place, and some concern more recent applications of the theory. We hope that the variety of topics helps to clarify the central role of functional analysis in mathematics.
1.1 From Even and Odd Functions to Group Representations

We recall the following elementary but useful notions of symmetry and antisymmetry for functions. A function $f : \mathbb{R} \to \mathbb{R}$ is said to be even if $f(-x) = f(x)$ for all $x \in \mathbb{R}$, and odd if $f(-x) = -f(x)$ for all $x \in \mathbb{R}$. Every function $f : \mathbb{R} \to \mathbb{R}$ can be split into an even and an odd component, since
\[ f(x) = \underbrace{\frac{f(x) + f(-x)}{2}}_{\text{the even part}} + \underbrace{\frac{f(x) - f(-x)}{2}}_{\text{the odd part}}. \tag{1.1} \]
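The decomposition (1.1) can be tried out directly. The following is a minimal Python sketch, not part of the original notes; the helper names `even_part` and `odd_part` are hypothetical and chosen only for illustration. It splits the exponential function into its even and odd parts, which are the hyperbolic cosine and sine.

```python
import math

def even_part(f):
    """Return the even part x -> (f(x) + f(-x)) / 2 from (1.1)."""
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    """Return the odd part x -> (f(x) - f(-x)) / 2 from (1.1)."""
    return lambda x: (f(x) - f(-x)) / 2

# Illustration: exp splits into cosh (even) and sinh (odd).
e, o = even_part(math.exp), odd_part(math.exp)
for x in (-1.5, 0.0, 0.7):
    assert math.isclose(e(x), math.cosh(x))
    assert math.isclose(o(x), math.sinh(x))
    assert math.isclose(e(x) + o(x), math.exp(x))   # the two parts sum back to f
    assert math.isclose(e(-x), e(x))                # even symmetry
    assert math.isclose(o(-x), -o(x))               # odd symmetry
```

The sketch only checks the identities at a few sample points; it says nothing about uniqueness, which is the subject of Exercise 1.1.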
This chapter is atypical for these notes. The reader may and the lecturer should skip, or return later to, this chapter for convenience. In fact the (sometimes informal) discussions here are not needed for the formal development of the theory, which starts in Chapter 2, but they may help to motivate some of the later developments. We also apologize for the trivial first two pages, but we hope that these help to clarify how natural the discussed decompositions are.
Exercise 1.1. Is the decomposition of a function into odd and even parts in (1.1) unique? That is, if $f = e + o$ with $e$ an even function and $o$ an odd function, is $e(x) = \frac{f(x) + f(-x)}{2}$?
Behind the definition of even and odd functions, and the decomposition in (1.1), is the group $\mathbb{Z}/2\mathbb{Z} = \{0, 1\}$ acting on $\mathbb{R}$ via the map $x \mapsto (-1)^n x$ for $n \in \mathbb{Z}/2\mathbb{Z}$. In order to generalize this observation, recall that an action of a group $G$ on a set $X$ is a map
\[ G \times X \to X, \qquad (g, x) \mapsto g \cdot x, \]
with the properties $g \cdot (h \cdot x) = (gh) \cdot x$ for all $g, h \in G$ and $x \in X$, and $e \cdot x = x$ for all $x \in X$, where $e \in G$ is the identity element.

Having associated the decomposition of a function into odd and even parts with the action of the group $\mathbb{Z}/2\mathbb{Z}$ on $\mathbb{R}$, the notion of group action suggests many generalizations of the decomposition. We begin this by discussing functions on $\mathbb{R}^2$. We could once again consider the action of $\mathbb{Z}/2\mathbb{Z}$ via scalar multiplication by $(-1)^n$ for $n \in \mathbb{Z}/2\mathbb{Z}$ on $\mathbb{R}^2$. Alternatively, we could treat the two components independently, and allow $(\mathbb{Z}/2\mathbb{Z})^2$ to act via the action
\[ (n_1, n_2) \cdot (x_1, x_2) = \bigl((-1)^{n_1} x_1, (-1)^{n_2} x_2\bigr) \]
for $n_1, n_2 \in \mathbb{Z}/2\mathbb{Z}$ and $x_1, x_2 \in \mathbb{R}$. Notice that the action of this group of order four leads to four different types of functions, namely those that are:
• even with respect to both variables;
• even with respect to $x_1$ and odd with respect to $x_2$;
• odd with respect to $x_1$ and even with respect to $x_2$;
• odd with respect to both variables.
Once more every function can be decomposed into a sum of four components, one of each type (see Exercise 1.2).
Exercise 1.2. (a) Show that every function $f : \mathbb{R}^2 \to \mathbb{R}$ can be decomposed into a sum of four functions with the symmetry properties listed above. (b) Consider the group action of $\mathbb{Z}/n\mathbb{Z}$ on $\mathbb{R}^2$ by letting $k + n\mathbb{Z}$ act by rotation by the angle $\frac{2\pi k}{n}$ (using (1.2) and the matrix $k\bigl(\frac{2\pi k}{n}\bigr)$ as below). Generalize the above decompositions to this case.
Here we are using $n \in \mathbb{Z}$ as a shorthand for the coset $n + 2\mathbb{Z} \in \mathbb{Z}/2\mathbb{Z}$.
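However, there are other natural actions on $\mathbb{R}^2$ (as e.g. in the above exercise). Let $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ be the one-dimensional circle group or 1-torus, and define an action of $\mathbb{T}$ on $\mathbb{R}^2$ by the rotation
\[ \mathbb{T} \times \mathbb{R}^2 \to \mathbb{R}^2, \qquad \left(\phi, \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right) \mapsto k(2\pi\phi) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \tag{1.2} \]
where
\[ k(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]
is the matrix of anti-clockwise rotation through the angle $\theta$ on $\mathbb{R}^2$. In studying any situation with rotational symmetry on $\mathbb{R}^2$ one is naturally led to this action. What is the corresponding decomposition of functions for this action? How many different classes of functions will appear in the corresponding decomposition?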
Exercise 1.3. Verify that (1.2) defines an action of $\mathbb{T}$ on $\mathbb{R}^2$.
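As a numerical sanity check (not a proof, and not a substitute for the exercise), the defining properties of an action can be tested for the rotation action (1.2) at sample points. The following Python sketch assumes the convention that $k(\theta)$ is anti-clockwise rotation through $\theta$; the function names `k`, `act` and `close` are illustrative choices.

```python
import math

def k(theta):
    """Matrix of anti-clockwise rotation through the angle theta, as nested tuples."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))

def act(phi, v):
    """Action (1.2): a representative phi of an element of T = R/Z sends v to k(2*pi*phi) v."""
    m = k(2 * math.pi * phi)
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

def close(v, w, tol=1e-12):
    """Compare two points of R^2 up to floating-point rounding."""
    return abs(v[0] - w[0]) < tol and abs(v[1] - w[1]) < tol

v = (0.3, -1.1)          # arbitrary sample point
phi, psi = 0.37, 0.81    # arbitrary representatives of elements of T

assert close(act(phi, act(psi, v)), act(phi + psi, v))  # g.(h.x) = (gh).x
assert close(act(0.0, v), v)                            # the identity acts trivially
assert close(act(phi + 1.0, v), act(phi, v))            # independent of the coset representative
```

The last assertion reflects that the action is defined on the quotient $\mathbb{R}/\mathbb{Z}$ rather than on $\mathbb{R}$.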
Clearly one distinguished class of functions is given by the functions invariant under rotation, that is, functions satisfying
\[ f\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = f\left(k(2\pi\phi) \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\right) \]
for all $\phi \in \mathbb{T}$. The graph of such a function is the surface obtained by rotating a graph of a function $[0, \infty) \to \mathbb{R}$ or $[0, \infty) \to \mathbb{C}$ about the $z$-axis. To guess what the other classes of function should be, notice that all of the symmetries of functions considered above can be phrased naturally in terms of the possible continuous group homomorphisms of the acting group to the group $\mathbb{S}^1 = \{z \in \mathbb{C} \mid |z| = 1\}$. It is easy to show (see Exercise 1.4 for the third, non-trivial, statement) that
(1) any homomorphism $\mathbb{Z}/2\mathbb{Z} \to \mathbb{S}^1$ has the form $n \mapsto (\pm 1)^n$, so there are two such homomorphisms;
(2) any homomorphism $(\mathbb{Z}/2\mathbb{Z})^2 \to \mathbb{S}^1$ has the form $(n_1, n_2) \mapsto (\pm 1)^{n_1} (\pm 1)^{n_2}$, so there are four such homomorphisms; and
Once again we will sometimes write $t$ as shorthand for the coset $t + \mathbb{Z} \in \mathbb{R}/\mathbb{Z}$; in particular the interval $[0, 1)$ may be identified with $\mathbb{T}$ using addition modulo 1. Notice that this makes $\mathbb{T}$ into a topological group by declaring elements to be close if they (or rather their representatives) can be chosen to be close in $\mathbb{R}$.
(3) any continuous homomorphism $\mathbb{T} \to \mathbb{S}^1$ has the form $\chi_n(\phi) = e^{2\pi i n \phi}$ for some $n \in \mathbb{Z}$.
For any topological group $G$, we call the continuous homomorphisms from $G$ to $\mathbb{S}^1$ the characters of $G$. Notice that the characters in (1) and (2) correspond exactly to the even and odd functions in (1.1), respectively to the four types of functions for the action of $(\mathbb{Z}/2\mathbb{Z})^2$ on $\mathbb{R}^2$ considered above.
Exercise 1.4. Show that any continuous homomorphism $\mathbb{T} \to \mathbb{S}^1$ has the form $\chi_n(\phi)$ for some $n \in \mathbb{Z}$.
Turning to (3), we say that a function $f : \mathbb{R}^2 \to \mathbb{C}$ has weight $n$ (is of type $n$) if
\[ f(k(2\pi\phi)v) = \chi_n(\phi) f(v) \]
for all $\phi \in \mathbb{T}$ and $v \in \mathbb{R}^2$. One might now guess (and we will see in Chapter 3 that this indeed is the case) that any reasonable function $f : \mathbb{R}^2 \to \mathbb{C}$ can be written as a linear combination
\[ f = \sum_{n \in \mathbb{Z}} f_n \tag{1.3} \]
where $f_n$ has weight $n$. However, in contrast to (1.1) this is an infinite sum, so we are no longer talking about a purely algebraic phenomenon. The decomposition (1.3), its existence and its properties, lies both in algebra and in analysis. We therefore have to become concerned both with the algebraic structure and with questions of convergence. Depending on the notion of convergence used, the class of reasonable functions turns out to vary. These classes of reasonable functions will provide us with important examples of Banach spaces to be defined in Chapter 2.

Notice that we are now allowing functions to be complex-valued, and that we have simplified the notation for points in $\mathbb{R}^2$. This helps to clarify the underlying structure, and reflects one of the themes of functional analysis: thinking of progressively more complicated objects (numbers, then vectors, then functions, then operators) as points in a larger space allows the real structures to be seen more clearly.

The discussion above on decompositions into sums of functions of different weights will later be part of the treatment of Fourier analysis (see Chapter 3). For this we will initially study the mathematically simpler situation of the action of $\mathbb{T}$ on $\mathbb{T}$ by translation, $(x, y) \mapsto x + y$ for $x, y \in \mathbb{T}$. Adjusting the definitions above appropriately, we say that $f : \mathbb{T} \to \mathbb{C}$ has weight $n \in \mathbb{Z}$ if and only if $f$ is a multiple of $\chi_n$ itself. We therefore seek, for a reasonable function $f : \mathbb{T} \to \mathbb{C}$, constants $c_n$ for $n \in \mathbb{Z}$ with
\[ f = \sum_{n \in \mathbb{Z}} c_n \chi_n. \tag{1.4} \]
The right-hand side of (1.4) is called the Fourier series of $f$. We will see later that it is relatively straightforward (at least in the abstract sense) to find the Fourier coefficients $c_n$ via the identity
\[ c_n = \int_0^1 f(x) \overline{\chi_n(x)} \, dx \]
for all $n \in \mathbb{Z}$.
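For a trigonometric polynomial the coefficient formula can be tested numerically. The sketch below is an illustration only: it assumes the coefficient is the integral of $f$ against the complex conjugate of $\chi_n$, approximates that integral by an equally spaced Riemann sum, and uses the hypothetical test function $f(x) = 3 + 2\cos(2\pi x) = 3\chi_0 + \chi_1 + \chi_{-1}$.

```python
import cmath
import math

def chi(n, x):
    """The character chi_n(x) = exp(2*pi*i*n*x) of T = R/Z."""
    return cmath.exp(2j * math.pi * n * x)

def fourier_coefficient(f, n, samples=1024):
    """Approximate c_n = int_0^1 f(x) * conj(chi_n(x)) dx by a Riemann sum."""
    total = sum(f(j / samples) * chi(n, j / samples).conjugate()
                for j in range(samples))
    return total / samples

# Hypothetical test function: f = 3*chi_0 + chi_1 + chi_{-1}.
f = lambda x: 3 + 2 * math.cos(2 * math.pi * x)

for n in range(-2, 3):
    # Up to rounding: c_0 = 3, c_1 = c_{-1} = 1, and c_n = 0 otherwise.
    print(n, fourier_coefficient(f, n))
```

For trigonometric polynomials of low degree the equally spaced Riemann sum reproduces the integral up to rounding error; for general functions it is only an approximation, and the question of whether the series (1.4) converges is exactly what the later chapters address.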
Exercise 1.5. Use de Moivre's formula $e^{2\pi i n \phi} = \cos(2\pi n \phi) + i \sin(2\pi n \phi)$ to show that the discussion on Fourier series culminating in (1.4) has a purely real analog. Find formulas for the Fourier coefficients of the analogous decomposition into sums of sines and cosines (assuming the formula for the complex version).
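Similarly, we will show that for a reasonable function $f : \mathbb{R}^2 \to \mathbb{C}$ the function
\[ f_n(v) = \int_0^1 f(k(2\pi\phi)v) \, \overline{\chi_n(\phi)} \, d\phi \]
for $n \in \mathbb{Z}$ has weight $n$, and that
\[ f = \sum_{n \in \mathbb{Z}} f_n. \tag{1.5} \]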
Exercise 1.6. Show that if the function $f_n(v) = \int_0^1 f(k(2\pi\phi)v) \, \overline{\chi_n(\phi)} \, d\phi$ is well-defined (that is, if the integral exists for all, or for almost every, $v \in \mathbb{R}^2$), then it has weight $n$.
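The weight components can also be explored numerically. The following Python sketch is an illustration under stated assumptions rather than a proof: it takes the integrand to involve the complex conjugate of $\chi_n$, replaces the integral by an equally spaced Riemann sum, and uses the hypothetical test function $f(x_1, x_2) = x_1 + x_1^2 + x_2^2$, which has only the weights $-1$, $0$ and $1$.

```python
import cmath
import math

def rotate(phi, v):
    """Apply k(2*pi*phi), anti-clockwise rotation by 2*pi*phi, to v = (x1, x2)."""
    c = math.cos(2 * math.pi * phi)
    s = math.sin(2 * math.pi * phi)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def chi(n, phi):
    """The character chi_n(phi) = exp(2*pi*i*n*phi)."""
    return cmath.exp(2j * math.pi * n * phi)

def weight_component(f, n, v, samples=256):
    """Riemann-sum approximation of f_n(v) = int_0^1 f(k(2 pi phi) v) conj(chi_n(phi)) dphi."""
    total = 0j
    for j in range(samples):
        phi = j / samples
        total += f(rotate(phi, v)) * chi(n, phi).conjugate()
    return total / samples

def f(v):
    """Hypothetical test function with weights -1, 0 and 1 only."""
    return v[0] + v[0] ** 2 + v[1] ** 2

v = (0.3, -1.1)   # arbitrary sample point
psi = 0.2         # arbitrary rotation parameter

for n in (-1, 0, 1):
    lhs = weight_component(f, n, rotate(psi, v))
    rhs = chi(n, psi) * weight_component(f, n, v)
    assert abs(lhs - rhs) < 1e-12   # f_n(k(2 pi psi) v) = chi_n(psi) f_n(v)

# The three components sum back to f at the sample point, as in (1.5).
total = sum(weight_component(f, n, v) for n in (-1, 0, 1))
assert abs(total - f(v)) < 1e-12
```

Writing $z = x_1 + i x_2$, the three components found here are $\bar z / 2$, $|z|^2$ and $z / 2$, which makes the weight property visible by hand as well.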
To summarize, we will introduce classes of functions (which will be examples of Banach spaces), and determine whether for functions in these classes the Fourier series (1.4) or the weight decomposition (1.5) converges, and in what sense the convergence does or does not happen.

For functions $f : \mathbb{R}^3 \to \mathbb{C}$ one can generalize the discussion above in many different ways, by considering the actions of various different groups as follows.
• $\mathbb{Z}/2\mathbb{Z}$, giving the familiar generalization of even and odd functions.
• $(\mathbb{Z}/2\mathbb{Z})^3$, giving a decomposition into eight functions defined by their odd- or even-ness with respect to each of the three variables.
• $\mathbb{T} \cong \mathrm{SO}(2)$ acting by rotations in the $x, y$-plane about the $z$-axis. This gives a simple generalization of our discussion of functions $\mathbb{R}^2 \to \mathbb{C}$, and we will be able to treat this case in a similar way to the two-dimensional case.
• $\mathrm{SO}(3)$, the full group of orientation-preserving rotations of $\mathbb{R}^3$.
The last case in this list is more difficult to analyze than any of the cases discussed above. The additional complications in this case are much deeper than might at first appear. For example, the group $\mathrm{SO}(3)$ is simple, and as a result there are no non-trivial continuous homomorphisms $\mathrm{SO}(3) \to \mathbb{S}^1$, so this cannot be used to define classes of functions in the same way. In fact the case of $\mathrm{SO}(3)$ requires the theory of harmonic analysis and unitary representations of compact groups. We will not reach this important topic in these notes, but we lay the ground for it and refer to the excellent treatment in ??.
Exercise 1.7. Prove that the group $\mathrm{SO}(3)$ of rotations of $\mathbb{R}^3$ forms a compact group in its natural topology when viewed as a set of $3 \times 3$ real matrices. Show that this group is simple, and deduce that a continuous homomorphism $\mathrm{SO}(3) \to \mathbb{S}^1$ must be trivial.
1.2 (Equi-)distribution of Points and Measures

A sequence $(x_n)_n$ of elements of a metric space $X$ is dense if for every $x \in X$ there is a subsequence $(x_{n_k})_k$ that converges to $x$. A much finer property is given by equidistribution, which we now define for $X = [0, 1]$. A sequence $(x_n)_{n \ge 1}$ of points in $[0, 1]$ is said to be equidistributed or uniformly distributed if any one of the following equivalent conditions is satisfied:
(1) $\frac{1}{K} \bigl|\{k \in [1, K] \mid x_k \in [a, b]\}\bigr| \to b - a$ as $K \to \infty$ for any $0 \le a < b \le 1$.
(2) $\frac{1}{K} \sum_{k=1}^{K} f(x_k) \to \int_0^1 f(x) \, dx$ as $K \to \infty$ for any continuous function $f \in C([0, 1])$.
(3) $\frac{1}{K} \sum_{k=1}^{K} f(x_k) \to \int_0^1 f(x) \, dx$ as $K \to \infty$ for any Riemann-integrable $f \in R([0, 1])$.
(4) $\frac{1}{K} \sum_{k=1}^{K} \chi_n(x_k) \to \int_0^1 \chi_n(x) \, dx = \begin{cases} 1 & \text{if } n = 0; \\ 0 & \text{if } n \neq 0 \end{cases}$ as $K \to \infty$ for any $n \in \mathbb{Z}$ ($\chi_n$ is defined in Section 1.1).

We will now sketch some of the implications between these equivalent statements (see Exercise 1.8, Kuipers and Niederreiter [23] or [12, Sect. 4.4.1] for a detailed treatment). We will develop all of the theorems needed later in the text, and will return to the topic of equidistribution in Chapter 7 from a more general point of view.

Almost a proof of (4) $\Longrightarrow$ (2). Consider the algebra of trigonometric polynomials
\[ A = \left\{ \sum_{n=-N}^{N} c_n \chi_n \;\middle|\; c_n \in \mathbb{C},\ N \in \mathbb{N} \right\}. \]
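Using the complex version of the Stone-Weierstrass theorem (Theorem 2.34), it may be seen that $A$ is dense in $C(\mathbb{T})$ with respect to the uniform metric (see Proposition 3.19). Given $f \in C(\mathbb{T})$ and $\varepsilon > 0$, there is some $g \in A$ with
\[ \|f - g\|_\infty = \sup_{x \in \mathbb{T}} |f(x) - g(x)| < \varepsilon, \]
which implies that
\[ \left| \int f - \int g \right| < \varepsilon \quad \text{and} \quad \left| \frac{1}{K} \sum_{k=1}^{K} f(x_k) - \frac{1}{K} \sum_{k=1}^{K} g(x_k) \right| < \varepsilon \]
for any $K \ge 1$. (Here we use $\int f$ as shorthand for $\int_0^1 f(x) \, dx$ for convenience.) If $K$ is sufficiently large then, by assumption (applied to each of the finitely many characters appearing in $g$),
\[ \left| \frac{1}{K} \sum_{k=1}^{K} g(x_k) - \int g \right| < \varepsilon. \]
It follows that
\[ \left| \frac{1}{K} \sum_{k=1}^{K} f(x_k) - \int f \right| < 3\varepsilon, \]
which is not quite the claim in (2) since $C(\mathbb{T})$ and $C([0, 1])$ differ slightly. Any function $f : \mathbb{T} \to \mathbb{C}$ gives rise to a function $\tilde{f} : \mathbb{R} \to \mathbb{C}$ via the diagram
\[ \begin{array}{ccc} \mathbb{R} & \xrightarrow{\ \tilde{f}\ } & \mathbb{C} \\ \downarrow & \nearrow_{f} & \\ \mathbb{T} & & \end{array} \]
which we can restrict to $[0, 1]$, defining an element $g \in C([0, 1])$. If $f : \mathbb{T} \to \mathbb{C}$ is continuous then $g$ is also, but $g$ satisfies $g(0) = g(1)$. We will handle this issue below in the proof that (2) implies (1), where we will only assume (2) for all $f \in C(\mathbb{T})$.

Proof of (2) $\Longrightarrow$ (1). Suppose first that $0 < a < b < 1$ and write $\mathbf{1}_{[a,b]}$ for the characteristic function of the interval $[a, b]$. Fix $\varepsilon > 0$ and choose continuous functions $f_-, f_+ : [0, 1] \to \mathbb{R}$ with
(a) $0 \le f_-(x) \le \mathbf{1}_{[a,b]}(x) \le f_+(x) \le 1$ for all $x \in [0, 1]$,
(b) $\int_0^1 (f_+ - f_-) < \varepsilon$, and
(c) $f_+(0) = f_+(1) = f_-(0) = f_-(1) = 0$.
For example, the functions $f_+$ and $f_-$ could be chosen to be piecewise linear, as illustrated in Figure 1.1. In this case the shaded region can easily be chosen to have total area bounded above by $\varepsilon$, as required in (b).

Fig. 1.1. The function $\mathbf{1}_{[a,b]}$ and the approximations $f_-$ (dots) and $f_+$ (dashes).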
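By (c), the functions $f_-$ and $f_+$ also define continuous functions on $\mathbb{T}$. Since
\[ \frac{1}{K} \sum_{k=1}^{K} f_-(x_k) \le \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}_{[a,b]}(x_k) \le \frac{1}{K} \sum_{k=1}^{K} f_+(x_k) \]
for all $K$, and
\[ \frac{1}{K} \sum_{k=1}^{K} f_-(x_k) \to \int f_- \ge (b - a) - \varepsilon, \qquad \frac{1}{K} \sum_{k=1}^{K} f_+(x_k) \to \int f_+ \le (b - a) + \varepsilon \]
as $K \to \infty$, we obtain
\[ (b - a) - \varepsilon \le \liminf_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}_{[a,b]}(x_k) \le \limsup_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}_{[a,b]}(x_k) \le (b - a) + \varepsilon, \]
which implies the claim in (1) for $0 < a < b < 1$, since $\varepsilon > 0$ was arbitrary. The formula in (1) holds trivially for $[a, b] = [0, 1]$ (that is, for $f \equiv 1$), so
\[ \frac{1}{K} \sum_{k=1}^{K} \bigl( \mathbf{1}_{[0,a)}(x_k) + \mathbf{1}_{(b,1]}(x_k) \bigr) \to 1 - (b - a) \]
as $K \to \infty$ by taking the difference. Suppose now that $a = 0 < b < 1$. Then, for any sufficiently small $\varepsilon > 0$, we have
\[ f_- = \mathbf{1}_{[\varepsilon, b]} \le \mathbf{1}_{[0,b]} \le \mathbf{1}_{[0, b + \varepsilon)} + \mathbf{1}_{(1 - \varepsilon, 1]} = f_+ \]
and $\int (f_+ - f_-) \le 3\varepsilon$, and the formula in (1) already holds for $f_-$ and $f_+$. As before, this implies the claim for $\mathbf{1}_{[0,b]}$. The case of $0 < a < b = 1$ is similar.

As in many proofs in analysis, approximation played a crucial role in the above argument. In fact, two notions of approximation were used: uniform approximation and approximation by functions that differ in integral very little. We will study these and related notions of approximation throughout these notes.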
Exercise 1.8. Prove the remaining implications to show that the four characterizations of equidistribution at the start of Section 1.2 are indeed equivalent.
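Example 1.9. A simple example of an equidistributed sequence may be obtained as follows. Fix $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ and define $x_k = \{k\alpha\} \in [0, 1)$ for $k \in \mathbb{N}$, where $\{t\}$ denotes the fractional part of the real number $t$. To see that this defines an equidistributed sequence, the characterization in (4) is the most convenient to use. For $n = 0$, the function $\chi_n$ is identically 1, so
\[ \frac{1}{K} \sum_{k=1}^{K} \chi_0(x_k) = 1 \]
for all $K$. If $n \neq 0$, then $e^{2\pi i n \alpha} \neq 1$ since $\alpha$ is irrational, and so
\[ \frac{1}{K} \sum_{k=1}^{K} \chi_n(x_k) = \frac{1}{K} \sum_{k=1}^{K} \bigl( e^{2\pi i n \alpha} \bigr)^k = \frac{1}{K} \cdot \frac{e^{2\pi i n \alpha (K+1)} - e^{2\pi i n \alpha}}{e^{2\pi i n \alpha} - 1} \longrightarrow 0 \]
as $K \to \infty$.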
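The behaviour described in Example 1.9 is easy to observe numerically. The sketch below is illustrative only and uses floating-point arithmetic: it takes the hypothetical choice $\alpha = \sqrt{2}$, computes the first $K$ fractional parts, and compares the interval frequency from characterization (1) and the character average from characterization (4) with their limits.

```python
import cmath
import math

def fractional_orbit(alpha, K):
    """The points x_k = {k * alpha} for k = 1, ..., K."""
    return [(k * alpha) % 1.0 for k in range(1, K + 1)]

def interval_frequency(points, a, b):
    """Proportion of the points lying in [a, b], as in characterization (1)."""
    return sum(1 for x in points if a <= x <= b) / len(points)

def character_average(points, n):
    """Average of chi_n over the points, as in characterization (4)."""
    return sum(cmath.exp(2j * math.pi * n * x) for x in points) / len(points)

alpha = math.sqrt(2)                 # hypothetical irrational rotation number
points = fractional_orbit(alpha, 20000)

print(interval_frequency(points, 0.2, 0.5))   # close to b - a = 0.3
print(abs(character_average(points, 1)))      # close to 0
print(abs(character_average(points, 0)))      # equal to 1 up to rounding
```

The geometric-series bound in the example predicts that the character averages decay like a constant multiple of $1/K$, which is consistent with what the sketch prints.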
An amusing consequence of this example is a special case of Benford's law.

Exercise 1.10. Use the equidistribution from Example 1.9 to show the following. Write $\ell_n$ for the leading digit of $2^n$ written in decimal (so the sequence $(\ell_n)$ begins $(2, 4, 8, 1, 3, 6, 1, 2, 5, \dots)$). Then
\[ \frac{1}{K} \bigl| \{ k \mid 1 \le k \le K,\ \ell_k = 1 \} \bigr| \to \log_{10} 2 \]
as $K \to \infty$. Using Exercise 1.12 below, generalize this to a statement about powers of 2 and 3 with the same exponent.
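The statement of Exercise 1.10 can be checked empirically with exact integer arithmetic. The following Python sketch is an illustration, not a solution of the exercise; the function names are hypothetical, and the cut-off $K = 2000$ is an arbitrary choice.

```python
import math

def leading_digit_of_power_of_two(n):
    """Leading decimal digit of 2**n, computed with exact integers."""
    return int(str(2 ** n)[0])

def frequency_of_leading_one(K):
    """Proportion of n in 1..K for which 2**n has leading digit 1."""
    hits = sum(1 for n in range(1, K + 1)
               if leading_digit_of_power_of_two(n) == 1)
    return hits / K

# The start of the sequence of leading digits quoted in the exercise.
print([leading_digit_of_power_of_two(n) for n in range(1, 10)])
# prints [2, 4, 8, 1, 3, 6, 1, 2, 5]

# Empirical frequency next to the claimed limit log10(2).
print(frequency_of_leading_one(2000), math.log10(2))
```

The link to Example 1.9 is that $2^n$ has leading digit 1 exactly when the fractional part of $n \log_{10} 2$ lies in $[0, \log_{10} 2)$.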
Clearly there is some notion of convergence of measures to the Lebesgue measure in the discussion above. In order to formulate this precisely, we will need to dene an appropriate topology on a space of measures. This topology will be called the weak*-topology (see Chapter 7), and as we will show the space of probability measures on a compact metric space is itself a compact metric space in this topology. This result helps to provide a coherent setting for many equidistribution results.
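Exercise 1.11. Assume that $\alpha_1, \dots, \alpha_d \in \mathbb{R}$ are linearly independent over $\mathbb{Q}$. Show that
\[ \frac{1}{T} \int_0^T f\bigl( t(\alpha_1, \dots, \alpha_d) \ (\mathrm{mod}\ \mathbb{Z}^d) \bigr) \, dt \to \int_{\mathbb{T}^d} f(x) \, dx \]
as $T \to \infty$, for any $f \in C(\mathbb{T}^d)$.

Exercise 1.12. Assume that $1, \alpha_1, \dots, \alpha_d \in \mathbb{R}$ are linearly independent over $\mathbb{Q}$. Show that
\[ \frac{1}{N} \sum_{n=0}^{N-1} f\bigl( n(\alpha_1, \dots, \alpha_d) \ (\mathrm{mod}\ \mathbb{Z}^d) \bigr) \to \int_{\mathbb{T}^d} f(x) \, dx \]
as $N \to \infty$, for any $f \in C(\mathbb{T}^d)$.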
1.3 Ordinary Differential Equations

There is no need to motivate the study of dierential equations, as they are of central importance across all sciences that deal with measurable quantities that change with respect to other variables of the system studied. Here we want to briey indicate how even the simplest dierential equations can lead directly to the study of integral operators, which may be analyzed using tools from functional analysis . 1.3.1 A Second-Order Linear Initial Value Problem Consider rst the dierential equation f (x) + f (x) = g (x) with the initial values f (0) = 1, f (0) = 0. Let us recall briey the familiar approach to solving such an equation. First one nds all solutions to the homogeneous equation
(1.6)
A reader familiar with the theorem of Picard and Lindel of on the existence and uniqueness of solutions to certain initial value problems and its proof will not be surprised by this connection. However there are further connections, which we will begin to expose here.
13
f (x) + f (x) = 0, giving f (x) = A sin x + B cos x (1.7) for constants A and B . Then one moves on to the problem of nding one particular solution fp to the equation
(x) + fp (x) = g (x), fp
(1.8)
ignoring the initial values, which may be done by a sophisticated guess if g is suciently simple, or by using the method of variation of parameters (that is, treating A and B as functions of x rather than constants). Finally, taking the sum of f from (1.7) and a solution to (1.8), one chooses the constants A and B in the solution to the homogeneous equation to satisfy the initial values. Rather than going through this in detail, we claim that the function
\[ f(x) = \cos(x) + \int_0^x \sin(x - t)\, g(t)\, dt \]
is a solution to the initial value problem. This is easily checked by a calculation: $f(0) = 1$ clearly, and
\[ f'(x) = -\sin x + \sin(x - x)\, g(x) + \int_0^x \cos(x - t)\, g(t)\, dt, \]
so $f'(0) = 0$. Finally,
\[ f''(x) = -\cos x + \cos(x - x)\, g(x) - \int_0^x \sin(x - t)\, g(t)\, dt = -f(x) + g(x) \]
as required.

1.3.2 The Volterra Equation
If the original differential equation in (1.6) is changed slightly, to take the form
\[ f''(x) + f(x) = \sigma(x) f(x), \tag{1.9} \]
with the same initial values $f(0) = 1$ and $f'(0) = 0$, then the argument used above does not solve the equation. Nonetheless, the ideas are still useful, since they suggest transforming the equation into the integral equation
\[ f(x) = \cos(x) + \int_0^x \sin(x - t)\, \underbrace{\sigma(t) f(t)}_{g(t)}\, dt. \tag{1.10} \]
Now define $k(x, t) = \sin(x - t)\, \sigma(t)$ so that (1.10) takes the form
\[ f = u + K(f), \tag{1.11} \]
where $u(x) = \cos x$ and
\[ K(f)(x) = \int_0^x k(x, t) f(t)\, dt. \]
Here $K$ is a linear map, defined on some space of nice functions. We will therefore call $K$ an operator, and due to its nature it is an integral operator. Solving the perturbed equation (1.9) with initial values will turn out to be very straightforward at the level of abstraction we aim at in functional analysis. We can rewrite the equation (1.11) as a Volterra equation
\[ (I - K) f = u, \]
where $I$ is the identity map. The solution $f$ is then given by applying the inverse operator $(I - K)^{-1}$, which we may calculate (in this particular case) using an operator form of the geometric series:
\[ (I - K)^{-1} = \sum_{n=0}^{\infty} K^n, \]
and hence
\[ f = \sum_{n=0}^{\infty} K^n u. \]
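The geometric series for $(I - K)^{-1}$ can be tried out numerically. The sketch below is only an illustration under assumptions that are not part of the text: the coefficient function multiplying $f$ on the right-hand side of (1.9) (called `sigma` in the code) is taken to be the constant $-3$, so that the exact solution of the initial value problem is $\cos(2x)$; the interval is $[0, 2]$; the integral operator $K$ is replaced by a trapezoid-rule matrix; and the series is truncated after 40 terms.

```python
import numpy as np

def solve_volterra(sigma, x_max=2.0, n=801, terms=40):
    """Approximate f = sum_n K^n u on a grid, where u(x) = cos(x) and
    K(f)(x) = integral_0^x sin(x - t) sigma(t) f(t) dt (trapezoid rule)."""
    x = np.linspace(0.0, x_max, n)
    h = x[1] - x[0]
    X, T = np.meshgrid(x, x, indexing="ij")
    # Kernel k(x, t) = sin(x - t) sigma(t) for t <= x, and 0 for t > x.
    kernel = np.where(T <= X, np.sin(X - T) * sigma(T), 0.0)
    # Trapezoid weights for the integral over [0, x_i]: h/2 at both endpoints.
    weights = np.full((n, n), h)
    weights[:, 0] = h / 2
    weights[np.arange(n), np.arange(n)] = h / 2
    K = kernel * weights
    u = np.cos(x)
    f = np.zeros_like(u)
    term = u.copy()
    for _ in range(terms):  # partial sum u + K u + K^2 u + ...
        f += term
        term = K @ term
    return x, f

# Hypothetical test case: sigma = -3, for which f'' + f = -3 f gives f = cos(2x).
x, f = solve_volterra(lambda t: -3.0 * np.ones_like(t))
max_error = np.max(np.abs(f - np.cos(2.0 * x)))
print(max_error)
```

In this example the partial sums settle quickly, which reflects the factorial decay of the terms $K^n u$ for kernels supported on $t \le x$; the precise convergence statement is the subject of the later discussion referred to in the text.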
Clearly we will have to study convergence of these infinite series of powers of operators, and also make precise the classes of functions on which these arguments make sense (see Section 2.4.1).

1.3.3 The Sturm–Liouville Equation

Finally, we make another small change to the differential equation (1.6). Fix a parameter $\lambda > 0$ and consider the Sturm–Liouville equation
\[ f'' + \lambda^2 f = g, \tag{1.12} \]
with the boundary conditions $f(0) = f(1) = 0$. We may proceed just as before. The functions of the form $f(x) = A \cos \lambda x + B \sin \lambda x$ give all solutions to the homogeneous differential equation $f'' + \lambda^2 f = 0$. Next one needs to find a particular solution $f_p$ to
\[ f_p'' + \lambda^2 f_p = g \]
(ignoring the boundary conditions). After this, one would use the solutions to the homogeneous differential equation to satisfy the boundary conditions. Explicitly, given $f_p$ we can calculate the vector
\[ \begin{pmatrix} f_p(0) \\ f_p(1) \end{pmatrix} \tag{1.13} \]
and try to express it as a linear combination of the two vectors
\[ \begin{pmatrix} \cos 0 \\ \cos \lambda \end{pmatrix} = \begin{pmatrix} 1 \\ \cos \lambda \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} \sin 0 \\ \sin \lambda \end{pmatrix} = \begin{pmatrix} 0 \\ \sin \lambda \end{pmatrix}. \]
If
\[ \det \begin{pmatrix} 1 & 0 \\ \cos \lambda & \sin \lambda \end{pmatrix} = \sin \lambda \]
is non-zero, then this is always possible and we find a unique solution to the boundary value problem. However, if $\lambda \in \pi\mathbb{Z}$ then $\sin \lambda = 0$ and we may be unlucky with the value of the vector (1.13): if $\begin{pmatrix} f_p(0) \\ f_p(1) \end{pmatrix}$ and $\begin{pmatrix} 1 \\ \cos \lambda \end{pmatrix}$ are linearly independent, then there will not be a solution to the boundary value problem.

This obstruction to being able to find a solution to the boundary value problem may be phrased in terms of an integral operator. At first sight this connection (and this example) may appear contrived, but in fact it opens a door to the important topic of the spectral theory of operators, which is crucial for many other problems. Define the continuous function (the Green function) on $[0, 1]^2$ by
\[ G(s, t) = \begin{cases} s(t - 1) & \text{for } 0 \le s \le t \le 1; \\ t(s - 1) & \text{for } 0 \le t \le s \le 1. \end{cases} \]
We claim that the conditions
\[ f(0) = f(1) = 0, \qquad f'' = h \tag{1.14} \]
are equivalent to $f = Kh$, where $K$ is the operator defined by
\[ K(h)(s) = \int_0^1 G(s, t)\, h(t)\, dt. \tag{1.15} \]
In order to justify the claim, assume first that $f = Kh$. Then
\[ f(0) = \int_0^1 \underbrace{G(0, t)}_{=0}\, h(t)\, dt = 0, \]
and $f(1) = 0$ for the same reason. Moreover,
\[ f(s) = \int_0^s t(s - 1)\, h(t)\, dt + \int_s^1 s(t - 1)\, h(t)\, dt, \]
\[ f'(s) = s(s - 1) h(s) + \int_0^s t\, h(t)\, dt - s(s - 1) h(s) + \int_s^1 (t - 1)\, h(t)\, dt, \]
where the two terms $\pm s(s - 1) h(s)$ cancel, and
\[ f''(s) = s\, h(s) - (s - 1)\, h(s) = h(s), \]
so $f$ is a solution of the boundary value problem (1.14). To see the converse, notice that the boundary value problem has a solution (by the argument above). However, our previous discussion of the boundary value problem associated to the Sturm–Liouville equation (1.12) (which needs to be modified for the case $\lambda = 0$) shows that in this case the solution is unique. Thus the equivalence of (1.14) and $f = Kh$ is established.
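The claim that $f = Kh$ solves $f'' = h$ with $f(0) = f(1) = 0$ can be sanity-checked numerically. The following sketch is an illustration only: it approximates the integral in (1.15) with the trapezoid rule on a uniform grid (grid size chosen arbitrarily) and compares with two right-hand sides whose solutions are known in closed form, namely $h = 1$ with $f(s) = s(s-1)/2$ and $h(t) = \sin(\pi t)$ with $f(s) = -\sin(\pi s)/\pi^2$. The function names are hypothetical.

```python
import numpy as np

def green(s, t):
    """Green function on [0, 1]^2: s(t - 1) for s <= t and t(s - 1) for t <= s."""
    return np.where(s <= t, s * (t - 1.0), t * (s - 1.0))

def apply_K(h, n=1001):
    """Trapezoid-rule approximation of K(h)(s) = integral_0^1 G(s, t) h(t) dt."""
    t = np.linspace(0.0, 1.0, n)
    w = np.full(n, t[1] - t[0])
    w[0] = w[-1] = (t[1] - t[0]) / 2
    G = green(t[:, None], t[None, :])
    return t, G @ (w * h(t))

# h = 1: the solution of f'' = 1, f(0) = f(1) = 0 is f(s) = s(s - 1)/2.
s, f_const = apply_K(lambda t: np.ones_like(t))
err_const = np.max(np.abs(f_const - s * (s - 1.0) / 2.0))

# h = sin(pi t): the solution is f(s) = -sin(pi s) / pi^2.
s, f_sin = apply_K(lambda t: np.sin(np.pi * t))
err_sin = np.max(np.abs(f_sin + np.sin(np.pi * s) / np.pi**2))
print(err_const, err_sin)
```

Both errors should be small, and the computed functions vanish at $s = 0$ and $s = 1$ because $G(0, t) = G(1, t) = 0$.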
Exercise 1.13. Modify the argument for the Sturm–Liouville equation for the case $\lambda = 0$, and show that the solution is always unique.
In particular, the fact that $s_n(x) = \sin(\pi n x)$ for any $n \in \mathbb{Z}$ satisfies
\[ s_n(0) = s_n(1) = 0, \qquad s_n'' = -(\pi n)^2 s_n \]
implies that $s_n = -(\pi n)^2 K(s_n)$. In other words, the values $\mu_n = -(\pi n)^{-2}$ for $n = 1, 2, \dots$ are eigenvalues of the linear map (or integral operator) $K$.* Thus we can rephrase our earlier observation regarding the equivalent formulations
\[ \left\{ \begin{aligned} f'' + \lambda^2 f &= g \\ f(0) = f(1) &= 0 \end{aligned} \right\} \iff f = K(-\lambda^2 f + g) \iff (I + \lambda^2 K) f = K(g) \]
by saying that this differential equation always has a unique solution for any $g$ unless $\lambda = \pi n$ corresponds to one of the eigenvalues $\mu_n = -(\pi n)^{-2} = -\lambda^{-2}$ of $K$.

* Actually these are all the eigenvalues of $K$ (see Exercise 1.14).
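The eigenvalues $-(\pi n)^{-2}$ of the integral operator $K$ from (1.15) can also be observed numerically. The sketch below is an illustration under an assumption made here: $K$ is replaced by the symmetric matrix $h\,G(t_i, t_j)$ on the interior points of a uniform grid with spacing $h$ (the number of subintervals is an arbitrary choice), and the most negative eigenvalues of that matrix are compared with $-(\pi n)^{-2}$ for $n = 1, 2, 3$.

```python
import numpy as np

n = 500                                   # number of subintervals (arbitrary)
h = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)[1:-1]    # interior grid points
S, T = np.meshgrid(t, t, indexing="ij")
G = np.where(S <= T, S * (T - 1.0), T * (S - 1.0))
K = h * G                                 # symmetric matrix approximating K

# eigvalsh returns eigenvalues in ascending order, so the first three entries
# are the three most negative eigenvalues.
computed = np.linalg.eigvalsh(K)[:3]
expected = np.array([-1.0 / (np.pi * m) ** 2 for m in (1, 2, 3)])
print(computed, expected)
```

The agreement improves as the grid is refined; the discretization is a convenience for this check and is not the method used in the text.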
This discussion gives some hope that the notion of eigenvalues and eigenvectors (which might themselves be functions) of operators may make sense, and can be useful in the study of ordinary differential equations. (In fact, these questions also turn out to be useful for the study of partial differential equations; see the discussion in the next section.) However, as we will see later, some care must be taken because the operators arising act on infinite-dimensional spaces: eigenvectors may not exist, and the spectral theory of operators will be found to contain many new possibilities and phenomena involving generalized eigenvalues and eigenvectors when compared with the familiar theory of eigenvectors and eigenvalues of matrices (that is, operators on finite-dimensional vector spaces). We will start this topic in Chapter 4.
Exercise 1.14. Suppose that $f \in L^1([0, 1])$ and that $Kf = \mu f$ for some $\mu \in \mathbb{R} \setminus \{0\}$, where $K$ is the operator (1.15) discussed in connection with the Sturm–Liouville problem. Show that $f$ must be smooth on $(0, 1)$,* and deduce that $f$ and $\mu$ must satisfy the conditions found above.

* This is the first case of the phenomenon called elliptic regularity, which we will return to in Chapter 3.

Exercise 1.15. In this exercise we generalize the connection between the Sturm–Liouville boundary value problem and integral operators. Let $a < b$ be real numbers, and assume that $p \in C^1([a, b])$ and $q \in C([a, b])$ are real-valued functions with $p > 0$ and $q > 0$. We define the second order differential operator
\[ L(f) = (p f')' + q f. \]
Also let $\alpha_1, \alpha_2, \beta_1, \beta_2 \in \mathbb{R}$ and define the boundary conditions
\[ B_1(f) = \alpha_1 f(a) + \alpha_2 f'(a) = 0, \qquad B_2(f) = \beta_1 f(b) + \beta_2 f'(b) = 0. \]
Assume that $f_1$ and $f_2$ are fundamental solutions† of the differential equation $L(f) = 0$ such that we also have $B_1(f_1) = B_2(f_2) = 0$. Show that
\[ p\,(f_1 f_2' - f_1' f_2) = c \]
is a constant. Using this, define an associated Green function
\[ G(s, t) = \begin{cases} \frac{1}{c} f_1(s) f_2(t) & \text{for } a \le s \le t \le b, \\ \frac{1}{c} f_1(t) f_2(s) & \text{for } a \le t \le s \le b, \end{cases} \]
and show that for $h \in C([a, b])$ the boundary value problem
\[ B_1(f) = B_2(f) = 0, \qquad L(f) = h \]
is equivalent to the equation
\[ f(s) = K(h)(s) = \int_a^b G(s, t)\, h(t)\, dt. \]
Calculate $G$ explicitly for the equation given by $L(f) = f''$, $B_1(f) = f(a)$ and $B_2(f) = f(b)$.

† That is, the functions $f_1, f_2$ form a basis of the vector space of all solutions.
1.4 Partial Differential Equations and the Laplace Operator

We would like to discuss two particular partial differential equations. As we will see later, the mathematical background needed for this, most of which comes from functional analysis, is much more interesting* than that needed for ordinary differential equations. One of the objectives for this book is to make the informal discussion in this section more formal and rigorous. We will start this in Chapter 3 and Chapter 4.

* Here interesting is a synonym for difficult.

In both of the partial differential equations that we will discuss, we will need to understand and express the difference between the value of a function at a point and its values in a neighborhood of the point. One might try to do this using an average over some nearby values, but we would like to have an infinitesimal version of this difference. This desire brings the Laplace operator†
\[ \Delta f = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_d^2} \]
for a smooth function $f : \mathbb{R}^d \to \mathbb{R}$ into the picture because of the following simple observation.

† The same operator is sometimes written as $\nabla^2$.

Proposition 1.16 (Laplace and neighborhood averages). Let $\Omega \subseteq \mathbb{R}^d$ be an open set, and suppose that $f : \Omega \to \mathbb{R}$ is a $C^2$ function. Then
\[ \lim_{r \to 0} \frac{1}{r^2 \operatorname{vol}(B_r(x))} \int_{B_r(x)} \bigl(f(y) - f(x)\bigr)\, dy = c\, \Delta f(x) \]
for any $x \in \Omega$, where $c = \frac{1}{2(d+2)}$.

Proof. Suppose for simplicity of notation that $x = 0$, and apply Taylor approximation to obtain
\[ f(y) = f(0) + f'(0)\, y + \frac{1}{2} \sum_{i,j=1}^{d} \frac{\partial^2 f}{\partial x_i \partial x_j}(0)\, y_i y_j + o\bigl(\|y\|^2\bigr), \]
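where $f'(0)$ is the total derivative of $f$ at 0 and we used the notation $o(\cdot)$ from p. 389. Now in the integral over the $r$-ball $B_r(0) = \{y \in \mathbb{R}^d \mid \|y\| < r\}$ the linear terms (and the mixed quadratic terms) cancel out due to the symmetry of the ball. Thus
\[ \int_{B_r(0)} f(y)\, dy = \operatorname{vol}(B_r(0))\, f(0) + \frac{1}{2} \sum_{i=1}^{d} \frac{\partial^2 f}{\partial x_i^2}(0) \int_{B_r(0)} y_i^2\, dy + \operatorname{vol}(B_r(0))\, o(r^2). \tag{1.16} \]
Next notice that
\[ \int_{B_r(0)} y_i^2\, dy = \int_{B_r(0)} y_j^2\, dy \]
for all $1 \le i, j \le d$ and
\[ \int_{B_r(0)} \|y\|^2\, dy = \int_{B_1(0)} r^2 \|z\|^2\, r^d\, dz \]
using the substitution $y = rz$. It follows that
\[ \int_{B_r(0)} y_i^2\, dy = \frac{1}{d} \sum_{j=1}^{d} \int_{B_r(0)} y_j^2\, dy = \frac{1}{d} \int_{B_r(0)} \|y\|^2\, dy = \frac{r^{d+2}}{d} \underbrace{\int_{B_1(0)} \|z\|^2\, dz}_{=:C}. \]
Combining this with (1.16) gives
\[ \frac{1}{r^2 \operatorname{vol}(B_r(0))} \int_{B_r(0)} \bigl(f(y) - f(0)\bigr)\, dy = \frac{1}{r^2 \operatorname{vol}(B_r(0))} \cdot \frac{1}{2} \cdot \frac{r^{d+2}}{d}\, C\, \Delta f(0) + o(1) = \underbrace{\frac{C}{2d \operatorname{vol}(B_1(0))}}_{=c}\, \Delta f(0) + o(1) \]
as $r \to 0$, since $\operatorname{vol}(B_r(0)) = r^d \operatorname{vol}(B_1(0))$.

For completeness, we calculate the value of $c$ using $d$-dimensional spherical coordinates. Every point $z \in \mathbb{R}^d$ is of the form $z = rv$ for some $r \ge 0$ and $v \in S^{d-1} = \{w \in \mathbb{R}^d \mid \|w\| = 1\}$. Then using this substitution we have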
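\[ \operatorname{vol}(B_1(0)) = \int_{S^{d-1}} \int_0^1 r^{d-1}\, dr\, dv = \frac{1}{d} \operatorname{vol}(S^{d-1}), \]
where the integration with respect to $v$ uses $(d-1)$-dimensional volume measure on the sphere $S^{d-1}$, and so
\[ C = \int_{B_1(0)} \|z\|^2\, dz = \int_{S^{d-1}} \int_0^1 r^{d+1}\, dr\, dv = \frac{1}{d+2} \operatorname{vol}(S^{d-1}). \]
Thus
\[ c = \frac{C}{2d \operatorname{vol}(B_1(0))} = \frac{\frac{1}{d+2} \operatorname{vol}(S^{d-1})}{2d \cdot \frac{1}{d} \operatorname{vol}(S^{d-1})} = \frac{1}{2(d+2)}. \]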
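Proposition 1.16 can be checked numerically in a simple case. The sketch below is an illustration with choices made here, none of which come from the text: the dimension is $d = 2$ (so $c = 1/8$), the test function is $f(x, y) = e^x \cos(2y) + 3x^2$ with $\Delta f(0) = 1 - 4 + 6 = 3$, and the integral over the disc $B_r(0)$ is approximated by a midpoint rule in polar coordinates.

```python
import numpy as np

def ball_average_quotient(f, r, n_rho=200, n_theta=200):
    """Approximate (1 / (r^2 vol(B_r))) * integral over B_r(0) of f(y) - f(0)
    in dimension d = 2, using polar coordinates and a midpoint rule."""
    rho = (np.arange(n_rho) + 0.5) * r / n_rho          # radial midpoints
    theta = np.arange(n_theta) * 2.0 * np.pi / n_theta  # uniform angles
    R, TH = np.meshgrid(rho, theta, indexing="ij")
    vals = f(R * np.cos(TH), R * np.sin(TH)) - f(0.0, 0.0)
    # Area element in polar coordinates is rho d(rho) d(theta).
    integral = np.sum(vals * R) * (r / n_rho) * (2.0 * np.pi / n_theta)
    return integral / (r**2 * np.pi * r**2)

def f(x, y):
    # Hypothetical test function with Laplacian 3 at the origin.
    return np.exp(x) * np.cos(2.0 * y) + 3.0 * x**2

c = 1.0 / (2 * (2 + 2))          # c = 1/(2(d+2)) with d = 2
quotient = ball_average_quotient(f, r=0.05)
print(quotient, c * 3.0)
```

For small $r$ the quotient should be close to $c\,\Delta f(0) = 3/8$, with a discrepancy that shrinks as $r \to 0$.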
1.4.1 The Heat Equation

The heat equation describes how the temperature in a region $\Omega \subseteq \mathbb{R}^d$ (representing a physical medium) evolves given an initial temperature distribution and some prescribed behavior of the heat at the boundary $\partial\Omega$. Inside the medium we expect the flow of heat to be controlled by the difference between the temperature at each point and the temperature in a neighborhood of the point. If we write $u(x, t)$ for the temperature of the medium at the point $x$ at the time $t$, then this suggests a relationship
\[ \frac{\partial u}{\partial t} = \underbrace{\text{constant}}_{>0} \times\, \Delta_x u, \tag{1.17} \]
where
\[ \Delta_x u = \Delta u = \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_d^2} \]
is the Laplace operator with respect to the space variables $x_1, \dots, x_d$ only. We call (1.17) the heat equation.

If we take the physical interpretation of this equation for granted, then we can use it to give heuristic explanations of some of the mathematical phenomena that arise. Suppose first that we prescribe a time-independent temperature distribution at the boundary $\partial\Omega$ of the medium $\Omega$, and then wait until the system has settled into thermal equilibrium. Experience (that is, physical intuition) suggests that in the long run (as time goes to infinity) the temperature distribution inside $\Omega$ will reach a stable (time-independent) configuration. That is, for any prescribed boundary value $b : \partial\Omega \to \mathbb{R}$ we expect the heat equation on $\Omega$ to have a time-independent solution. More formally, we expect there to be a function $u : \overline{\Omega} \to \mathbb{R}$ with
\[ \Delta u = 0 \ \text{in } \Omega, \qquad u|_{\partial\Omega} = b. \tag{1.18} \]
The boundary value problem (1.18) is the Dirichlet boundary value problem. Proving what the physical intuition suggests, namely that the Dirichlet boundary value problem does indeed have a (smooth) solution, will take us into the theory of Sobolev spaces. The case $d = 2$ will be proved in Chapter 3.

Leaving the Dirichlet problem to one side for now, we continue with the heat equation. Motivated by the experience of ordinary differential equations, we would like to know how we can find other solutions to the partial differential equation (ignoring the boundary values for now). A simple kind of solution to seek would be those with separated variables, that is, solutions of the form $u(x, t) = F(x) G(t)$ with $x \in \mathbb{R}^d$ and $t \in \mathbb{R}$. The heat equation would then imply that
\[ F(x)\, G'(t) = \frac{\partial u}{\partial t} = c\, \bigl(\Delta F(x)\bigr)\, G(t), \]
and so (we may as well choose all physical constants to make $c = 1$) the quotient
\[ \frac{G'(t)}{G(t)} = \frac{\Delta F(x)}{F(x)} \]
is independent of $x$ and of $t$, and therefore is a constant $\lambda$ (as this is not really a proof, we will not worry about the division). In summary, $u(x, t) = F(x) G(t)$ solves the equation
\[ \frac{\partial u}{\partial t} = \Delta_x u \]
if $G(t) = e^{\lambda t}$ and $\Delta F = \lambda F$ for some constant $\lambda$, which one can quickly check (rigorously).

Ignoring for the moment the values of $F$ on the boundary $\partial\Omega$, it is easy to find functions with $\Delta F = \lambda F$ for any $\lambda \in \mathbb{R}$ by using suitable exponential and trigonometric functions. However, these turn out not to be particularly useful for the general approach. Only those special functions $F : \overline{\Omega} \to \mathbb{R}$ with
\[ \Delta F = \lambda F \ \text{inside } \Omega, \qquad F|_{\partial\Omega} = 0 \]
turn out to be useful in the general case. However, it is not clear that such functions exist, nor for which values of $\lambda$ they may exist. Suppose now that the following non-trivial result, the existence of a basis of eigenfunctions (which we will be able to prove in many special cases in Chapter 4), is known for the region $\Omega \subseteq \mathbb{R}^d$.
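Claim. Every sufficiently nice function $f : \Omega \to \mathbb{R}$ can be decomposed into a sum $f = \sum_n F_n$ of functions $F_n : \overline{\Omega} \to \mathbb{R}$ satisfying
\[ \Delta F_n = \lambda_n F_n \ \text{for some } \lambda_n < 0, \qquad F_n|_{\partial\Omega} = 0. \]

We may then solve the partial differential equation
\[ \frac{\partial u}{\partial t} = \Delta_x u \]
with boundary values
\[ u|_{\partial\Omega \times \{t\}} = 0 \ \text{for all } t, \qquad u|_{\Omega \times \{0\}} = f \]
using the principle of superposition to obtain the general solution
\[ u(x, t) = \sum_n F_n(x)\, e^{\lambda_n t}. \tag{1.19} \]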
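The superposition formula (1.19) can be illustrated in the simplest case. The sketch below makes assumptions that are not in the text: the region is the interval $(0, 1)$, the eigenfunctions are $\sin(n\pi x)$ with eigenvalues $-(n\pi)^2$, and the initial temperature is the hypothetical choice $f(x) = \sin(\pi x) + \tfrac12 \sin(3\pi x)$. It checks by finite differences that the resulting $u$ approximately satisfies $\partial u/\partial t = \partial^2 u/\partial x^2$, vanishes at the boundary, and decays for large $t$.

```python
import numpy as np

# Hypothetical initial data f = sin(pi x) + 0.5 sin(3 pi x), stored as {n: b_n}.
coeffs = {1: 1.0, 3: 0.5}

def u(x, t):
    """Superposition sum_n b_n sin(n pi x) exp(lambda_n t), lambda_n = -(n pi)^2."""
    return sum(b * np.sin(n * np.pi * x) * np.exp(-(n * np.pi) ** 2 * t)
               for n, b in coeffs.items())

def heat_residual(x, t, dx=1e-3, dt=1e-5):
    """Finite-difference approximation of u_t - u_xx at the points (x, t)."""
    u_t = (u(x, t + dt) - u(x, t - dt)) / (2.0 * dt)
    u_xx = (u(x + dx, t) - 2.0 * u(x, t) + u(x - dx, t)) / dx**2
    return u_t - u_xx

xs = np.linspace(0.1, 0.9, 9)
max_residual = np.max(np.abs(heat_residual(xs, 0.05)))
boundary_values = (u(0.0, 0.05), u(1.0, 0.05))
late_size = np.max(np.abs(u(xs, 2.0)))
print(max_residual, boundary_values, late_size)
```

The decay for large $t$ comes entirely from the negative signs of the eigenvalues, which is the point made in the discussion of (1.19).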
Since $\lambda_n < 0$ for each $n \ge 1$, the series (1.19) converges to 0 as $t \to \infty$ if it is absolutely convergent, in accordance with our physical intuition, since the boundary condition has temperature 0 for all $t > 0$. We conclude by mentioning that the claim above will follow from the study of the spectral theory of an operator (much like the discussion in Section 1.3.3), but the operator involved will not have a concrete definition as an integral operator.

1.4.2 The Wave Equation

The wave equation describes how an elastic membrane moves. We let $u(x, t)$ be the vertical position of the membrane at time $t$ above the point with coordinate $x$. As the membrane has mass (and hence inertia), our assumption is that the vertical acceleration (a second derivative of position with respect to time $t$) of the membrane at time $t$ above $x$ will be related to the difference between the position of the membrane at that point and at nearby points. Hence we call
\[ \frac{\partial^2 u}{\partial t^2} = c\, \Delta_x u \tag{1.20} \]
the wave equation. As in the case of the heat equation, we may as well choose physical units to arrange that $c = 1$.

Once more we may argue from physical intuition that the Dirichlet boundary problem for the wave equation always has a solution. Consider a wire loop above the boundary $\partial\Omega$ (notice that even at this vague level we are imposing some smoothness: our physical image of a wire loop may be very distorted but will certainly be piecewise smooth) and imagine a soap film whose edge is the wire. Then, after some initial oscillations,* we expect the soap film to stabilize, giving a solution to the boundary value problem defined by the shape of the wire.

In this context, what is the meaning of eigenfunctions of the Laplace operator that vanish on the boundary? To see this, imagine a drum whose skin has the shape $\Omega$ so that the vibrating membrane is fixed along the boundary $\partial\Omega$, which is simply a flat loop. Suppose now that $F : \overline{\Omega} \to \mathbb{R}$ satisfies
\[ \Delta F = \lambda F \ \text{in } \Omega, \qquad F|_{\partial\Omega} = 0 \]
for some $\lambda < 0$. Then we claim that $u(x, t) = F(x) \cos(\sqrt{-\lambda}\, t)$ solves the wave equation:
\[ \frac{\partial^2}{\partial t^2} u(x, t) = F(x) \bigl(-(\sqrt{-\lambda})^2\bigr) \cos(\sqrt{-\lambda}\, t) = \lambda F(x) \cos(\sqrt{-\lambda}\, t) = \Delta_x \bigl(F \cos(\sqrt{-\lambda}\, t)\bigr). \]
In other words, if we start the drum at time $t = 0$ with the prescribed shape given by the function $F$, then the drum will produce a pure tone† of frequency $\frac{\sqrt{-\lambda}}{2\pi}$.

* In the real world there would also be a friction term, and the model for this is a modified wave equation, but we will ignore these subtleties.

† This preferred frequency for certain physical objects is part of the phenomenon of resonance, and the design of large structures like buildings or bridges tries to prevent resonances that may lead to reinforcement of oscillations by wind, for example.

Exercise 1.17. Assume that $\Omega$ satisfies the basis of eigenfunctions claim from Section 1.4.1, and that the Dirichlet boundary value problem always has a solution on $\Omega$.
(a) Combine our two discussions of the heat equation to produce a non-rigorous general procedure along the lines of this section to solve the boundary value problem (no rigorous proof is expected)
\[ \frac{\partial u}{\partial t} = \Delta u \ \text{in } \Omega \times [0, \infty), \qquad u|_{\partial\Omega \times \{t\}} = b, \qquad u|_{\Omega \times \{0\}} = f. \]
(b) Repeat (a) for the wave equation.
Exercise 1.18. For the wave equation over $\mathbb{T}$ the basis of eigenfunctions claim is precisely the claim that every nice function can be represented by its Fourier series. Assuming that this holds, show the basis of eigenfunctions claim for the domain $\Omega = (0, 1) \subseteq \mathbb{R}$ (the clamped vibrating string). (In fact we have already encountered the eigenfunctions $x \mapsto \sin(\pi n x)$ with $n = 1, 2, \dots$; no rigorous proof is expected, but explore the connection.)
1.5 Distributions as Generalized Functions

Both in applications and within mathematics it is often useful to have a generalized notion of function to allow, for example, a function $F$ on $\mathbb{R}$ with the property that
\[ \int_{\mathbb{R}} \phi(x) F(x)\, dx = \phi(0) \tag{1.21} \]
for any nice function $\phi : \mathbb{R} \to \mathbb{R}$. Such an $F$ might represent a point mass (a dimensionless object of mass 1 located at 0), or be a mathematical representation of an impulse in physics. Since $F$ is certainly not a function (see Exercise 1.20), one needs to develop a new theory that includes such objects(1). The theory of distributions allows for such generalized functions, and permits them to be differentiated, multiplied by smooth functions, and so on. Of course if we were only interested in expressions of the form in (1.21) then we could simply study measures, since (1.21) is simply the integral against the Dirac measure $\delta_0$ at the origin. However, within the space of measures it does not normally make sense to take derivatives (and this is the case for $\delta_0$), and we will be able to allow a derivative map in the space of distributions.

The most direct approach to distributions superficially seems to be a cheat: we declare a distribution to be a linear continuous functional (that is, a linear continuous map to the base field $\mathbb{R}$ or $\mathbb{C}$) on a space of nice test functions $\{\phi\}$. Here the definition of nice may vary, to give different classes of distributions. For example, one could consider all smooth functions with compact support on $\mathbb{R}^d$ as the nice test functions. Requiring continuity of the linear functional is natural but needs a topology to be defined on the space of test functions. In the case of smooth functions of compact support on $\mathbb{R}^d$ the natural topology does not come from a Banach space (that is, a linear space complete with respect to a norm; see Section 2.2), so we will need to study more general classes of topological vector spaces (that is, vector spaces equipped with a topology making all the vector space operations continuous). We will start the discussion of these locally convex vector spaces in Chapter 7.
This definition of a distribution is a cheat because we have finessed the problem that no function $F$ satisfies (1.21) by simply declaring $F$ to be the distribution (that is, continuous linear functional) which sends the test function $\phi$ to $\phi(0)$, without giving a more direct generalization of functions on $\mathbb{R}$. We may write this formally as
\[ \langle F, \phi \rangle = \phi(0), \]
where we write $\langle F, \phi \rangle$ for the action of the functional $F$ on the test function $\phi$. One sometimes also writes $\int_{\mathbb{R}} F \phi$ for $\langle F, \phi \rangle$, especially if we continue to think of $F$ as a generalized function, but whenever we want to prove something about $F$ we will go back to the formal definition of $F$ as a functional on the space of allowed test functions. Even though this may look dubious at first sight, the intuition provided by the viewpoint that $F$ is a generalized function is often useful, and will stay consistent with the formal treatment of $F$ as a linear functional. As indicated above, however, we can only treat this theory rigorously after more preparatory material has been developed.

Exercise 1.19. Show that any integrable function gives rise to a distribution. That is, any $f$ in $L^1(\mathbb{R})$ defines a linear functional $F_f$ on the space $C_c^\infty(\mathbb{R})$ of smooth compactly supported functions via
\[ \langle F_f, \phi \rangle = \int f(x) \phi(x)\, dx. \]
Moreover, show that the resulting map $f \mapsto F_f$ is linear and injective. Actually it is sufficient to assume that $f \in L^1_{\mathrm{loc}}(\mathbb{R})$, the space of locally integrable functions, those that are measurable and have the property that their restriction to any compact set is integrable.

Exercise 1.20. Show that no measurable and locally integrable function $f : \mathbb{R} \to \mathbb{R}$ has the property (1.21) as $\phi$ ranges over all smooth compactly supported functions.

1.6 Highly Connected Networks: Expanders

In designing large connected networks (for example, connecting many computers and servers) one is often confronted with two competing constraints:
• (High connectivity) Starting from any vertex, it should be easy to reach any other vertex quickly (that is, in few steps);
• (Sparsity) The network should be economical, meaning that there should not be an unnecessarily large number of edges in the network.

Clearly it is easy to achieve the first at the expense of the second by using a complete graph (in which every pair of vertices has an edge joining them), and it is easy to achieve the second at the expense of the first (by arranging the edges so that the vertices are strung along a single line, so as to achieve connectivity at the lowest possible cost).*

* Of course there is another option of creating a center vertex with a direct connection to each of the existing vertices, but the center vertex created in this way would be very costly (or even technically impossible) and would defeat the objective of achieving sparsity.

Exercise 1.21. Analyze the number of edges as a function of the number of vertices in the two extreme constructions of connected networks from above.

The notion of expander graphs is an attempt to achieve a balance between the two constraints. In order to describe expanders, we will need some basic notation from graph theory. A graph $G = (V, E)$ is a set of vertices $V$ (the nodes of the network) and edges $E \subseteq V \times V$ giving the list of direct connections between nodes. We will always assume that the graph is undirected, so each edge goes both ways and the set $E$ is symmetric. We will also assume that the graph is simple, i.e. that a pair of vertices is connected by at most one edge and that there is never an edge from a vertex to itself.

The requirement of sparsity is achieved by requiring that the graph $G$ be $k$-regular for a fixed $k$. A graph $G = (V, E)$ is said to be $k$-regular if, for any vertex $v \in V$, there are exactly $k$ edges from $v$ to other vertices in $V$. We will fix $k$ and look for $k$-regular graphs with a large number of vertices. Notice that this will impose a sparsity condition on the graph, since the number of edges $|E|$ will be a linear function of the number of vertices $|V|$ (in contrast to the case of a complete graph, a simple undirected graph in which every pair of distinct vertices is connected by a unique edge, for which $|E| = \frac{1}{2} |V| (|V| - 1)$).

In order to define the notion of high connectivity, we will need some preparations. A graph $G = (V, E)$ is called connected if for any two $v, w \in V$ there exists a path from $v$ to $w$ in the sense that there is a list $v_0 = v, v_1, v_2, \dots, v_n = w$ of vertices in $V$ with $(v_i, v_{i+1}) \in E$ for $i = 0, \dots, n - 1$. Such a path may consist of a singleton, so each vertex is connected to itself by a path of length zero.
Notice that there is a natural metric on any connected graph: we may define $d(v, w)$ to be the minimal length of a path from $v$ to $w$ (that is, the minimal number of edges in a path joining $v$ to $w$; see Figure 1.2 and Exercise 1.22). The diameter of a connected graph $G$ is the minimal $N \in \mathbb{N}$ with the property that for any two vertices $v$ and $w$ there is a path of length no more than $N$ connecting $v$ to $w$.

Exercise 1.22. Verify that the notion of distance on a graph illustrated in Figure 1.2 defines a metric on the set of vertices of a connected graph.
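The graph metric and the diameter defined above are easy to compute by breadth-first search. The following sketch is an illustration only; the adjacency-list representation and all function names are choices made here. It evaluates the diameter for the two extreme constructions mentioned earlier (vertices strung along a line, and a complete graph) and for vertices arranged around a circle.

```python
from collections import deque

def distances_from(adj, source):
    """Graph distances d(source, w) for all w reachable from source (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Largest distance between two vertices; assumes the graph is connected."""
    return max(max(distances_from(adj, v).values()) for v in adj)

def path_graph(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def cycle_graph(n):
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def complete_graph(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}

def edge_count(adj):
    """Number of undirected edges (each edge appears in two adjacency lists)."""
    return sum(len(neighbors) for neighbors in adj.values()) // 2

for name, g in [("path", path_graph(10)), ("cycle", cycle_graph(10)),
                ("complete", complete_graph(10))]:
    print(name, edge_count(g), diameter(g))
```

The trade-off between the number of edges and the diameter in these three families is the tension that expander graphs are meant to resolve.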
The smaller the diameter is in comparison with $|V|$, the better the connectivity of the graph is. The worst case with the vertices strung out on a line (or if we seek a 2-regular graph, arranged around a circle) has diameter $|V| - 1$ (or $\lfloor |V|/2 \rfloor$). The other extreme case of a complete graph has diameter 1. In the case of expander graphs we will see that families of graphs may be found with diameter $N \ll \log |V|$.

Fig. 1.2. Two points $v, w$ at distance 3 in a connected graph.

Formally, the set of edges is viewed as a subset of $(V \times V) \setminus \{(a, a) : a \in V\}$, and in this sense symmetry means that $(a, b) \in E$ if and only if $(b, a) \in E$. We will think of a single edge joining vertex $a$ to vertex $b$ if $(a, b) \in E$; that is, $(a, b)$ and $(b, a)$ will be viewed as a single element of the set of edges $E$. In particular, $|E|$ will be the total number of edges drawn in the graph, each of which is viewed as a two-way connection.

Definition 1.23 (Expanders). A sequence of finite $k$-regular graphs $(G_i = (V_i, E_i))_i$ is an expander family if there exists a constant $\eta > 0$ (independent of $i$) with
\[ \frac{|\partial S|}{\min\{|S|, |V_i \setminus S|\}} \ge \eta \]
for any subset $S \subseteq V_i$ for any $i \ge 1$, where
\[ \partial S = \{v \in S \mid \text{there exists } w \in V_i \setminus S \text{ with } (v, w) \in E\} \cup \{v \in V_i \setminus S \mid \text{there exists } w \in S \text{ with } (v, w) \in E\} \]
is the boundary of $S$.

A few comments are in order.
• We may always assume that $\eta \in (0, 1)$.
• Any finite collection of finite $k$-regular connected graphs (formally, a sequence as in Definition 1.23 that repeats these) is automatically an expander family. As this is not at all interesting, and in particular does not achieve the real benefit of the slower growth rate from the logarithmic bound on the diameter, one usually requires in addition that $|V_i| \to \infty$ as $i \to \infty$.
• Notice that we must also have $k \ge 3$, because $k = 2$ corresponds to a regular $|V|$-gon, which we quickly see cannot be an expander family.

An expander family consists of connected graphs, but much more is true.

Proposition 1.24 (Small diameter). For an expander family $(G_i)_i$, we have $\operatorname{diam} G_i \ll \log |V_i|$.*

* We write $A \ll B$ for functions $A$ and $B$ defined on $\mathbb{N}$ to mean that there is some constant $c$ for which $A(n) \le c B(n)$ for all $n \ge 1$. In the current case the constant $c$ will depend on $k$ and on $\eta$ (as in Definition 1.23) but is not allowed to depend on the particular graph $G_i$.

Proof. Given some vertex $v \in V_i$ we claim that the metric ball
\[ B_a(v) = \{w \in V_i \mid d(v, w) \le a\} \]
has more than $\frac{|V_i|}{2}$ elements if the integer $a$ satisfies
\[ a \ge D = \frac{\log(|V_i|/2)}{\log(1 + \eta/(k+1))}. \]
Assuming the claim, suppose that $v, w \in V_i$ are any pair of vertices. Then, by the claim, each of $|B_a(v)|$ and $|B_a(w)|$ is greater than $\frac{|V_i|}{2}$, so that these two balls must have non-empty intersection. By the triangle inequality, it follows that
\[ d(v, w) \le 2(D + 1) \ll_{\eta, k} \log |V_i|, \]
giving the proposition.

To prove the claim, notice that if $S = B_n(v)$ then
\[ B_{n+1}(v) \setminus B_n(v) = \partial S \setminus S. \]
Moreover,
\[ |\partial S \setminus S| \ge \frac{1}{k+1} |\partial S|, \]
since every element of $\partial S \cap S$ must connect to one element of $\partial S \setminus S$ and at most $k$ elements of $\partial S \cap S$ can connect to the same element of $\partial S \setminus S$. Together with the defining property of expander graphs, and assuming $\eta \in (0, 1)$ say, we deduce that
\[ |B_0(v)| = 1, \qquad |B_1(v)| = k + 1 \ge 2 > 1 + \frac{\eta}{k+1}, \]
and, by induction,
\[ |B_{n+1}(v)| = |B_n(v)| + |\partial B_n(v) \setminus B_n(v)| \ge \Bigl(1 + \frac{\eta}{k+1}\Bigr) |B_n(v)| > \Bigl(1 + \frac{\eta}{k+1}\Bigr)^{n+1} \]
for all $n$ with $|B_n(v)| \le \frac{|V_i|}{2}$. Since for $n = a \ge D$ the lower bound is at least $\frac{|V_i|}{2}$, this proves the claim.
Thus expander families achieve a balance between the two constraints of high connectivity (with logarithmic growth of the diameter) and sparsity of the graph (with only linear growth of the number of edges and a fixed number of edges for its neighbors). However, several questions remain, the most pressing of which are the following.
• Do expander families exist?
• What is their connection to functional analysis?
The first examples of expander families were found by Pinsker [38] (translated in [39]) using a non-constructive probabilistic argument. The same year Margulis [29] (translation in [30]) was able to give an explicit construction using Každan's Property (T) of the group $\mathrm{SL}_3(\mathbb{Z})$. Margulis showed that the family of quotients $\mathrm{SL}_3(\mathbb{Z})/\Gamma$ by finite index subgroups $\Gamma$ are (via a standard graph structure on them) an expander family. To prove this, we will discuss(2) unitary representations of the group $\mathrm{SL}_3(\mathbb{Z})$ (i.e. actions of $\mathrm{SL}_3(\mathbb{Z})$ by unitary transformations on a Hilbert space).

To prepare some of the ground for the proof by Margulis, we can already exhibit a connection between the expander property and properties of eigenvalues of linear maps associated to the graphs. Let $G = (V, E)$ be a finite graph and identify $V$ with the set $\{1, 2, \dots, |V|\}$. The adjacency matrix $A_G$ of the graph $G$ is the matrix with $|V|$ rows and $|V|$ columns and with entries in $\{0, 1\}$ so that $(A_G)_{ij} = 1$ if and only if there is an edge from vertex $i$ to vertex $j$. A simple graph $G$ with adjacency matrix
\[ A_G = \begin{pmatrix} 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 \end{pmatrix} \]
is shown in Figure 1.3.

Fig. 1.3. A connected graph on 5 vertices.
Several properties of the graph are reflected in the properties of the adjacency matrix. The matrix $A_G$ is symmetric by our standing assumption on the graph $G$. For a $k$-regular graph $G$ we also define
\[ M_G = \frac{1}{k} A_G, \]
which is an averaging operator in the following sense. A vector $x \in \mathbb{R}^{|V|}$ may be thought of as a function on the set of vertices, and applying $M_G$ to $x$ gives a new function which at the vertex $i$ is equal to the mean of the values of the function $x$ at all the neighbors of $i$. By analogy with the material in Section 1.4, one also studies the graph Laplace operator
\[ \Delta_G = I - M_G. \]
Since $M_G$ is symmetric it is diagonalizable and has only real eigenvalues. Moreover,
\[ \sum_i |(M_G x)_i| = \sum_i \Bigl| \sum_j (M_G)_{ij} x_j \Bigr| \le \sum_{i,j} (M_G)_{ij} |x_j| = \sum_j |x_j|, \]
since
\[ \sum_i (M_G)_{ij} = 1 \tag{1.22} \]
for all $j$ by construction. Therefore, any eigenvalue $\lambda$ of $M_G$ has $|\lambda| \le 1$, and by (1.22) we see that $\lambda_1 = 1$ is an eigenvalue (with the constant vectors as eigenvectors). The relationship between the eigenvalues and connectivity is illustrated by the following elementary lemma.

Lemma 1.25 (Connectivity). A $k$-regular graph is connected if and only if 1 is a simple eigenvalue of $M_G$.
Exercise 1.26. Prove Lemma 1.25.
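Lemma 1.25 can be observed on small examples before proving it. The sketch below is an illustration only, with graphs and function names chosen here: it builds the averaging operator for two 2-regular graphs on six vertices, a hexagon (connected) and a disjoint union of two triangles (not connected), and counts how often 1 occurs as an eigenvalue.

```python
import numpy as np

def averaging_operator(edges, n, k):
    """Return M = A / k for the undirected graph on {0, ..., n-1} with the
    given edge list; raises if the graph is not k-regular."""
    A = np.zeros((n, n))
    for a, b in edges:
        A[a, b] = A[b, a] = 1.0
    if not np.all(A.sum(axis=1) == k):
        raise ValueError("graph is not k-regular")
    return A / k

def multiplicity_of_one(M, tol=1e-9):
    """Number of eigenvalues of the symmetric matrix M within tol of 1."""
    eigenvalues = np.linalg.eigvalsh(M)
    return int(np.sum(np.abs(eigenvalues - 1.0) < tol))

hexagon = [(i, (i + 1) % 6) for i in range(6)]                 # connected
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]  # two components

M_hex = averaging_operator(hexagon, n=6, k=2)
M_tri = averaging_operator(two_triangles, n=6, k=2)
print(multiplicity_of_one(M_hex), multiplicity_of_one(M_tri))
```

In these examples the multiplicity of the eigenvalue 1 equals the number of connected components, which is consistent with the lemma; all eigenvalues also lie in $[-1, 1]$, as shown in the text.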
What we need next is a quantitative version of this relationship, the first step of which is given by the following proposition.

Proposition 1.27 (Eigenvalues and expanders). Let $(G_i = (V_i, E_i))_{i \ge 1}$ be a sequence of $k$-regular graphs. For each $i$, let $M_i = M_{G_i}$ be the averaging operator associated to $G_i$, and order its eigenvalues as
\[ \lambda_1(M_i) = 1 \ge \lambda_2(M_i) \ge \cdots \ge \lambda_{|V_i|}(M_i). \]
Suppose that there exists some $\varepsilon > 0$ with
\[ \lambda_2(M_i) \le 1 - \varepsilon \quad \text{for all } i \ge 1. \tag{1.23} \]
Then the sequence of graphs is an expander family.
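The uniform estimate in (1.23) is called a spectral gap for the sequence of graphs. The converse of Proposition 1.27 also holds, but we will not need this direction (we refer to Lubotzky [28] for the proof).
Proof of Proposition 1.27. Let $\varepsilon > 0$ be as in the statement of the proposition. Let $G = G_i$ and $M = M_i$ for some fixed $i$, so that $\lambda_2(M) \le 1 - \varepsilon$. Also let $S \subseteq V$ be any subset with $|S| \le \frac{1}{2}|V|$. We again think of vectors in $\mathbb{R}^{|V|}$ as functions on $V$, and notice that
\[ M(\mathbf{1}_S) = \mathbf{1}_S + f_S, \]
where $f_S$ is a vector that vanishes outside of $\partial S$ and has absolute value at most $1$ on the elements of $\partial S$. We will estimate $\|f_S\|_2$ from above and below, and the resulting estimate will prove the claim. In fact, it follows that
\[ \|M(\mathbf{1}_S) - \mathbf{1}_S\|_2^2 = \|f_S\|_2^2 \le |\partial S|. \]
On the other hand, $M$ is diagonalizable and so we may write $\mathbf{1}_S$ as a sum of vectors $v_j$ with $M v_j = \lambda_j v_j$, giving
\[ \mathbf{1}_S = \sum_j v_j, \qquad M(\mathbf{1}_S) = \sum_j \lambda_j v_j, \]
and finally
\[ M(\mathbf{1}_S) - \mathbf{1}_S = \sum_{j=2}^{|V|} (\lambda_j - 1) v_j. \]
Furthermore, since $M$ is symmetric we can assume that the vectors $v_j$ are orthogonal to each other, so
\[ \|M(\mathbf{1}_S) - \mathbf{1}_S\|_2^2 = \sum_{j=2}^{|V|} |\lambda_j - 1|^2 \|v_j\|_2^2 \ge \min_{j \ge 2} |\lambda_j - 1|^2 \, \Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2^2 \ge \varepsilon^2 \Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2^2. \]
Thus we need to relate the last norm to the size of $S$. To this end, notice that
\[ \langle \mathbf{1}_V, \mathbf{1}_V \rangle = |V| \quad\text{and}\quad \langle \mathbf{1}_S, \mathbf{1}_V \rangle = |S|, \]
so the orthogonal projection of $\mathbf{1}_S$ onto $\mathbf{1}_V$ is $v_1 = \frac{|S|}{|V|} \mathbf{1}_V$. Therefore,
\[ \Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2^2 = \Bigl\| \mathbf{1}_S - \frac{|S|}{|V|} \mathbf{1}_V \Bigr\|_2^2 \ge \Bigl( 1 - \frac{|S|}{|V|} \Bigr)^2 |S| \ge \frac{1}{4} |S|, \]
where the first inequality follows by restricting the sum to $S$ and the second since $\frac{|S|}{|V|} \le \frac{1}{2}$. Putting these inequalities together gives
\[ |\partial S| \ge \|M(\mathbf{1}_S) - \mathbf{1}_S\|_2^2 \ge \varepsilon^2 \Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2^2 \ge \frac{\varepsilon^2}{4} |S|. \]
Thus the sequence $(G_i)_i$ is an expander family with expansion constant $\frac{\varepsilon^2}{4}$.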
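The one step of the proof that does not involve the graph at all is the lower bound for the norm of the indicator function of $S$ minus its projection onto the constants. As a sanity check, here is a small sketch that verifies this bound exactly for subsets of a 10-element set (the set size and the helper name `projection_defect_sq` are assumptions made for illustration).

```python
from fractions import Fraction

def projection_defect_sq(n, S):
    """Squared Euclidean norm of 1_S - (|S|/|V|) 1_V on V = {0, ..., n-1}."""
    ratio = Fraction(len(S), n)
    return sum(((1 if v in S else 0) - ratio) ** 2 for v in range(n))

n = 10
checks = []
for s in range(1, n // 2 + 1):
    S = set(range(s))                          # any subset of size s gives the same value
    lhs = projection_defect_sq(n, S)
    restricted = (1 - Fraction(s, n)) ** 2 * s  # contribution of the vertices in S only
    # Chain used in the proof: lhs >= (1 - |S|/|V|)^2 |S| >= |S|/4 when |S| <= |V|/2.
    checks.append(lhs >= restricted >= Fraction(s, 4))

print(all(checks))
```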
1.7 What is spectral theory?

As we will see later, the topics considered in Section 1.1, Section 1.3, Section 1.4, and Section 1.6 are all connected to spectral theory. The goal of spectral theory, at its broadest, might be described as an attempt to classify all linear operators. The restriction to Hilbert space is natural for two reasons. First, it is much easier than the general case of operators on Banach spaces (indeed, the general picture for Banach spaces is barely understood today). Secondly, many of the most important applications belong to this simpler setting of operators on Hilbert spaces. This is more than a happy coincidence, since Hilbert spaces are distinguished among Banach spaces as being the spaces most closely linked to the notions of distance, angle, and orthogonality in Euclidean geometry. Euclidean geometry in turn seems to be a sufficiently accurate mathematical model for the physical universe on many different size scales, so it is not so surprising that some of the most useful infinite-dimensional arguments remain close to this geometric intuition.
How might one set about classifying linear operators? Finite-dimensional linear algebra suggests that linear maps $T_1, T_2 : H_1 \to H_2$ which are linked by a relation of the form
\[ T_2 U_1 = U_2 T_1, \tag{1.24} \]
where $U_i : H_i \to H_i$ for $i = 1, 2$ are invertible linear maps, will share many properties. In the finite-dimensional case, this is because the map $U_i$, being invertible, may be thought of as corresponding to a change of basis in the space $H_i$, which is an operation that does not affect the intrinsic properties of the operators. This interpretation fails in general for infinite-dimensional spaces where no good theory of bases exists, but the definition still has interest, and one may try to describe all operators $H_1 \to H_2$ up to such equivalence. If $H_1 = H_2 = H$ is a single Hilbert space, then one can specialize this notion of equivalence, saying that operators $T_1, T_2 : H \to H$ are equivalent if there is an invertible linear map $U : H \to H$ with
\[ T_2 U = U T_1, \tag{1.25} \]
or equivalently if $T_2 = U T_1 U^{-1}$. Once again, the interpretation of $U$ as a change of basis is not available in the infinite-dimensional setting, but the notion is natural.
In linear algebra, the classification problem is successfully solved by the theory of eigenvalues, eigenspaces, minimal and characteristic polynomials, which leads to a canonical normal form for any linear operator $\mathbb{C}^n \to \mathbb{C}^n$ for any $n \ge 1$. We won't be able to get such a general theory if $H$ is infinite-dimensional, but it turns out that many operators of greatest interest have properties which, in the finite-dimensional case, ensure an even simpler description. They may belong to any of the special classes of operators defined on a Hilbert space by
means of the adjoint operation $T \mapsto T^*$: normal operators, self-adjoint operators, positive operators, or unitary operators. For these classes, if $\dim H = n$, then there is an orthonormal basis $(e_1, \dots, e_n)$ of eigenvectors of $T$ with corresponding eigenvalues $(\lambda_1, \dots, \lambda_n)$, and in this basis, we can write
\[ T\Bigl( \sum_{i=1}^n \alpha_i e_i \Bigr) = \sum_{i=1}^n \alpha_i \lambda_i e_i, \tag{1.26} \]
corresponding to a diagonal matrix representation. There is one interpretation of this representation which turns out to be amenable to generalization (in general, we will not be able to use bases in the same way in the infinite-dimensional setting). Consider the linear map $U : H \to \mathbb{C}^n$ defined by linearly extending the map with
\[ U : e_i \mapsto (0, \dots, 0, 1, 0, \dots, 0), \]
where there is a $1$ in the $i$th position. This map is a bijective isometry, by definition of an orthonormal basis, if $\mathbb{C}^n$ has the standard inner product. If
\[ T_1 : \mathbb{C}^n \to \mathbb{C}^n, \qquad (\alpha_i) \mapsto (\alpha_i \lambda_i), \]
then (1.26) becomes
\[ T_1 U = U T. \tag{1.27} \]
This is obvious, but we may interpret this as follows, which gives a slightly different view of the classification problem. For any finite-dimensional Hilbert space $H$, and normal operator $T$, we have found a model space and operator $(\mathbb{C}^n, T_1)$, such that in the sense of (1.27) $(H, T)$ is equivalent to $(\mathbb{C}^n, T_1)$ (in fact, unitarily equivalent, since $U$ is isometric). The theory we will describe later will be a generalization of this type of normal form reduction, a point of view emphasized in the work of Reed and Simon [40, Ch. VII]. This is successful because the model spaces and operators are indeed quite simple: they are of the type $L^2(X, \mu)$ for some measure space $(X, \mu)$ (the finite-dimensional case of $\mathbb{C}^n$ corresponding to $X = \{1, \dots, n\}$ with the counting measure), and the operators are multiplication operators
\[ T_g : f \mapsto g f \]
for some suitable function $g : X \to \mathbb{C}$.
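The relation between an operator with an orthonormal eigenbasis, the coordinate map $U$, and the diagonal model operator can be checked numerically in a small case. The sketch below assumes a specific symmetric $2 \times 2$ real matrix with known eigenvectors (our own example, with real scalars in place of complex ones); it is an illustration of the finite-dimensional statement only.

```python
import math

# Assumed example: a symmetric (hence normal) operator T on H = R^2.
T = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
basis = [[s, s], [s, -s]]      # orthonormal eigenvectors e_1, e_2 of T
eigenvalues = [3.0, 1.0]       # T e_1 = 3 e_1 and T e_2 = 1 e_2

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def apply_T(x):
    return [dot(row, x) for row in T]

def U(x):
    """Coordinates of x in the eigenbasis, so that U(e_i) is the i-th unit vector."""
    return [dot(e, x) for e in basis]

def T1(alpha):
    """The model operator (alpha_i) -> (alpha_i * lambda_i)."""
    return [a * lam for a, lam in zip(alpha, eigenvalues)]

x = [0.3, -1.7]
lhs = T1(U(x))          # T_1 U x
rhs = U(apply_T(x))     # U T x
intertwines = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(lhs, rhs))
isometric = math.isclose(dot(U(x), U(x)), dot(x, x))

print(intertwines, isometric)
```

In the language of multiplication operators, `T1` multiplies a function on the two-point set $\{1, 2\}$ by the function $g(i) = \lambda_i$.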
1.8 Further Topics

We list here a few more topics that we will be able to discuss while developing the theory called functional analysis.
Inheriting Smoothness: Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is continuous and the partial derivatives
\[ \frac{\partial^k}{\partial x_1^k} f, \qquad \frac{\partial^k}{\partial x_2^k} f \]
exist and are continuous for all $k \ge 1$. Then $f$ is smooth, that is, all mixed derivatives also exist and are continuous. How is it that the existence of the partial derivatives along the $x_1$ and $x_2$ axes alone can guarantee smoothness? We will provide the necessary background to answer this question in Chapter 3 (see Exercise 3.22).
Generalized Limits: How can one construct a generalized limit notion that assigns to every bounded sequence a limit, and still has many of the usual expected properties? One such property is translation invariance with respect to the underlying group (for a sequence in the normal sense, this group would be $\mathbb{Z}$). Do all groups have similar notions of generalized limits? We will answer the first question in Section 6.1.5, where we will also start a discussion of the second question, which leads to the topic of amenable groups. For the construction of expander graphs we will discuss groups that are in some sense diametrically opposite to amenable groups. These are the groups with property (T) introduced by Kazhdan in 1967.
2 Norms, Banach Spaces, and Hilbert Spaces
In this chapter we start the more formal treatment of functional analysis, giving the fundamental definitions and introducing some of the basic examples and their properties.
2.1 Norms and Semi-Norms

We will assume familiarity with the following concepts from linear algebra: vector spaces, subspaces, quotient spaces, dimension (which may be infinite), linear maps, image and kernel of linear maps. The notion of basis of a vector space will only be used to distinguish finite-dimensional vector spaces from infinite-dimensional ones. We will not usually try to describe the vector spaces that arise in functional analysis, or the linear maps between them, in terms of bases. An exception will arise in the study of Hilbert spaces (see Section 2.5) and in the study of certain (important but nonetheless special) operators on them (see Chapter 4). Also recall that a subset $K \subseteq V$ of a vector space is said to be convex if for $k_1, k_2 \in K$ and $t \in [0, 1]$ we have $(1 - t) k_1 + t k_2 \in K$.
2.1.1 Normed Vector Spaces
Throughout these notes we will be working with real or complex vector spaces $(V, +, \cdot)$ (here $+$ is vector addition, and $\cdot$ is scalar multiplication). We will call the elements of the field simply scalars if we want to avoid making the distinction between the real and complex case. For instance, in the fundamental definitions to come in this section, we treat the real and complex cases simultaneously.
Definition 2.1. Let $V$ be a real or complex vector space. A norm is a map
\[ \|\cdot\| : V \to \mathbb{R} \]
with the following properties:
(1) $\|v\| \ge 0$ for any $v \in V$, and $\|v\| = 0$ if and only if $v = 0$ (Strict positivity);
(2) $\|\alpha v\| = |\alpha| \, \|v\|$ for all $v \in V$ and scalars $\alpha$ (Homogeneity); and
(3) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$ (Triangle inequality).
If $\|\cdot\|$ is a norm on $V$, then $(V, \|\cdot\|)$ is called a normed vector space.
It is easy to give examples of normed vector spaces, and we list a few standard examples here (more will appear throughout these notes).
Example 2.2. The following are examples of normed real vector spaces, in which we write $v = (v_1, \dots, v_d)^t$ for elements of $\mathbb{R}^d$.
(1) $V = \mathbb{R}^d$ with $\|v\| = \|v\|_2 = \sqrt{v_1^2 + \cdots + v_d^2}$.
(2) $V = \mathbb{R}^d$ with $\|v\| = \|v\|_\infty = \max_{1 \le i \le d} |v_i|$.
(3) $V = \mathbb{R}^d$ with $\|v\| = \|v\|_1 = |v_1| + \cdots + |v_d|$.
(4) $V = \mathbb{R}^d$ with norm defined by
\[ \|v\|_B = \inf\{ \alpha > 0 \mid \tfrac{1}{\alpha} v \in B \}, \]
where $B$ is an open, centrally symmetric (that is, with $B = -B$), convex, bounded (with respect to the Euclidean norm) subset of $\mathbb{R}^d$.
(5) Let $X$ be any topological space (for example, a metric space), and let
\[ V = C_b(X) = \{ f : X \to \mathbb{R} \mid f \text{ is continuous and bounded} \} \]
with the uniform or supremum norm
\[ \|f\| = \|f\|_\infty = \sup_{x \in X} |f(x)|. \]
Notice that if $X$ is compact, then $C_b(X)$ coincides with $C(X)$, the space of continuous functions $X \to \mathbb{R}$.
(6) A special case of (5) makes $C([0, 1])$, and so also the subspace
\[ C^1([0, 1]) = \{ f : [0, 1] \to \mathbb{R} \mid f \text{ has a continuous derivative on } [0, 1] \}, \]
into a normed vector space. A different norm on $C^1([0, 1])$ may be obtained by setting
\[ \|f\|_{C^1([0,1])} = \max\{ \|f\|_\infty, \|f'\|_\infty \}. \]
(7) Finally, consider the vector space of real polynomials
\[ \mathbb{R}[x] = \Bigl\{ f = \sum_{k=0}^N c_f(k) x^k \;\Big|\; N \in \mathbb{N},\ c_f(k) \in \mathbb{R} \Bigr\}, \]
on which we can define any of the following norms (thinking of $f \in \mathbb{R}[x]$ really as the vector of its coefficients):
a) $\|f\|_1 = \sum_{k \ge 0} |c_f(k)|$,
b) $\|f\|_2 = \bigl( \sum_{k \ge 0} |c_f(k)|^2 \bigr)^{1/2}$, or
c) $\|f\|_\infty = \max_{k \ge 0} \{ |c_f(k)| \}$.
We could also think of polynomials as defining continuous functions on $[0, 1]$, thus embedding $\mathbb{R}[x] \subseteq C^1([0, 1]) \subseteq C([0, 1])$, so that the norm $\|\cdot\|_{C^1([0,1])}$ or $\|\cdot\|_{C([0,1])}$ may also be used.
The examples in Example 2.2 all generalize in the obvious way to form normed complex vector spaces, with the exception of (4), where additional requirements on the set $B$ are needed (see Exercise 2.5).
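To make the first three norms concrete, here is a minimal sketch that evaluates them on a sample vector and spot-checks homogeneity and the triangle inequality. The vectors, the scalar and the function names are arbitrary choices made for illustration, and a numerical spot check is of course not a proof.

```python
import math

def norm_1(v):
    return sum(abs(x) for x in v)

def norm_2(v):
    return math.sqrt(sum(x * x for x in v))

def norm_inf(v):
    return max(abs(x) for x in v)

v = [3.0, -4.0, 12.0]
w = [1.0, 2.0, -2.0]
alpha = -2.5

checks = []
for norm in (norm_1, norm_2, norm_inf):
    total = [a + b for a, b in zip(v, w)]
    scaled = [alpha * a for a in v]
    # Homogeneity: ||alpha v|| = |alpha| ||v||.
    checks.append(math.isclose(norm(scaled), abs(alpha) * norm(v)))
    # Triangle inequality: ||v + w|| <= ||v|| + ||w|| (small tolerance for rounding).
    checks.append(norm(total) <= norm(v) + norm(w) + 1e-12)

print(norm_1(v), norm_2(v), norm_inf(v), all(checks))
```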
Exercise 2.3. Verify that Example 2.2(1), (2), (3), (5), (6), and (7) define normed vector spaces over $\mathbb{R}$ or $\mathbb{C}$.
Exercise 2.4. Show that Example 2.2(4) defines a real normed vector space.
Exercise 2.5. (a) Show that for a complex normed vector space $(V, \|\cdot\|)$ the open unit ball
\[ B = B_1^V(0) = \{ v \in V \mid \|v\| < 1 \} \]
has the property that $\alpha B = B$ for any $\alpha \in \mathbb{C}$ with $|\alpha| = 1$. (b) Show that if $B \subseteq \mathbb{C}^d$ is open, convex, bounded, and satisfies $\alpha B = B$ for any $\alpha \in \mathbb{C}$ with $|\alpha| = 1$, then there exists a norm on $\mathbb{C}^d$ whose open unit ball is $B$.
Lemma 2.6 (Associated metric). Suppose that $(V, \|\cdot\|)$ is a normed vector space. Then for every $v, w \in V$ we have
\[ \bigl| \|v\| - \|w\| \bigr| \le \|v - w\|. \tag{2.1} \]
Moreover, writing
\[ d(v, w) = \|v - w\| \]
for $v, w \in V$ defines a metric $d$ on $V$ such that the norm function $\|\cdot\| : V \to \mathbb{R}$ is continuous with respect to the topology induced by the metric $d$.
Proof. For any $v, w \in V$,
\[ \|v\| = \|v - w + w\| \le \|v - w\| + \|w\| \]
and
\[ \|w\| = \|w - v + v\| \le \|v - w\| + \|v\| \]
by Definition 2.1(2) and (3), and the two inequalities together give (2.1). To see that $d$ is a metric we need to check the following defining properties for a metric.
(1) Strict positivity: That $d(v, w) \ge 0$ for all $v, w \in V$ and $d(v, w) = 0$ if and only if $v = w$ is clear by Definition 2.1(1).
(2) Symmetry: $d(v, w) = d(w, v)$ for all $v, w \in V$ by Definition 2.1(2) with $\alpha = -1$.
(3) Triangle inequality: we have
\[ d(u, w) = \|u - w\| = \|u - v + v - w\| \le \|u - v\| + \|v - w\| = d(u, v) + d(v, w). \]
Finally, the norm is continuous at $v \in V$ if for every $\varepsilon > 0$ there exists some $\delta > 0$ such that $w \in B_\delta(v) = \{ u \in V \mid d(u, v) < \delta \}$ implies that $\bigl| \|w\| - \|v\| \bigr| < \varepsilon$. By (2.1), we may choose $\delta = \varepsilon$ to see this.
Notice that the triangle inequality makes addition continuous. If we write
\[ B_\varepsilon(v) = \{ w \in V \mid \|w - v\| < \varepsilon \} \]
for the ball of radius $\varepsilon$ around $v \in V$, then we have
\[ B_{\varepsilon/2}(v_1) + B_{\varepsilon/2}(v_2) \subseteq B_\varepsilon(v_1 + v_2) \]
for every $\varepsilon > 0$. This means that $(v, w) \mapsto v + w$ is continuous at $(v_1, v_2)$ and, since $v_1, v_2 \in V$ were arbitrary, shows that addition is continuous.
Scalar multiplication is also continuous. To see this, notice that
\[ \beta w - \alpha v = (\beta - \alpha) w - \alpha (v - w), \]
so if $|\beta - \alpha| < \varepsilon$ and $\|w - v\| < \varepsilon$ for some $\varepsilon \in (0, 1)$, then
\[ \|w\| < \|v\| + 1 \]
and so
\[ \|\beta w - \alpha v\| < \varepsilon (\|v\| + 1 + |\alpha|). \]
This gives continuity of scalar multiplication at $(\alpha, v)$.
We now turn to the sense in which the topology induced by a norm determines the norm.
Lemma 2.7 (Equivalence of norms). Two norms $\|\cdot\|$ and $\|\cdot\|'$ on the same vector space induce the same topology if and only if there exists a (Lipschitz) constant $c > 0$ such that
\[ \frac{1}{c} \|v\| \le \|v\|' \le c \|v\| \tag{2.2} \]
for all $v \in V$. In this case we call the norms equivalent.
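Proof. If (2.2) holds, then the standard neighborhoods of $v \in V$,
\[ B_\varepsilon^{\|\cdot\|}(v) = \{ w \in V \mid \|w - v\| < \varepsilon \} \]
and
\[ B_\varepsilon^{\|\cdot\|'}(v) = \{ w \in V \mid \|w - v\|' < \varepsilon \} \]
with respect to the two norms satisfy
\[ B_{\frac{1}{c}\varepsilon}^{\|\cdot\|}(v) \subseteq B_\varepsilon^{\|\cdot\|'}(v) \subseteq B_{c\varepsilon}^{\|\cdot\|}(v). \]
This implies that the topologies have the same notion of neighborhood, and so are identical.
Suppose now that the two topologies are the same, so that $B_1^{\|\cdot\|'}(0)$ is a neighborhood of $0$ in this topology. Then there must be some $\varepsilon > 0$ with
\[ B_\varepsilon^{\|\cdot\|}(0) \subseteq B_1^{\|\cdot\|'}(0). \]
Equivalently, $\|v\| < \varepsilon$ implies that $\|v\|' < 1$. For any $v \in V \setminus \{0\}$, if $w = \frac{\varepsilon}{2\|v\|} v$ then
\[ \|w\| = \frac{\varepsilon}{2} < \varepsilon, \]
so
\[ \|w\|' = \frac{\varepsilon}{2\|v\|} \|v\|' < 1. \]
This implies that
\[ \|v\|' \le \frac{2}{\varepsilon} \|v\| \]
for all $v \in V$, giving the second inequality in (2.2). Reversing the roles of $\|\cdot\|$ and $\|\cdot\|'$ gives the first inequality also.
The phenomenon seen in the proof of Lemma 2.7, where a property on all of $V$ is determined by the local behavior at $0$, is something that will occur frequently. For $\mathbb{R}^d$ the notion of equivalence of norms has the following property.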
Proposition 2.8 (Equivalence in finite dimensions). If $V = \mathbb{R}^d$ then any two norms on $V$ are equivalent.
As we will see in the proof, this is related to the compactness of the closed unit ball in $\mathbb{R}^d$.
Proof of Proposition 2.8. Let $\|\cdot\|_1$ be the norm on $\mathbb{R}^d$ from Example 2.2(3), and let $\|\cdot\|$ be an arbitrary norm on $\mathbb{R}^d$. It is enough to show that these two norms are equivalent. Write $e_1, \dots, e_d$ for the standard basis of $\mathbb{R}^d$, and let $M = \max_{1 \le i \le d} \|e_i\|$. Then
\[ \|v\| = \Bigl\| \sum_{i=1}^d v_i e_i \Bigr\| \le \sum_{i=1}^d |v_i| \, \|e_i\| \le M \|v\|_1, \tag{2.3} \]
where we have used the triangle inequality generalized by induction to finite sums and homogeneity of the norm. This gives one of the inequalities in (2.2). To obtain the reverse inequality, notice first that
\[ S_1 = \{ v \in \mathbb{R}^d \mid \|v\|_1 = 1 \} \]
is a compact set in the standard topology of $\mathbb{R}^d$ (by the Heine-Borel theorem). Also, (2.3) shows that $v \mapsto \|v\|$ is a continuous function in the standard topology: (2.1) for $\|\cdot\|$ and (2.3) together give
\[ \bigl| \|v\| - \|w\| \bigr| \le \|v - w\| \le M \|v - w\|_1, \]
giving the continuity claimed just as in the end of the proof of Lemma 2.6 with $\delta = \varepsilon / M$. Together this implies that
\[ m = \min_{v \in S_1} \|v\| = \|v_0\| \]
is attained for some $v_0 \in S_1$. By definition of $S_1$ we have $v_0 \ne 0$, so that $m > 0$ by the strict positivity of the norm $\|\cdot\|$. Therefore, $v \in V \setminus \{0\}$ implies that
\[ \Bigl\| \frac{v}{\|v\|_1} \Bigr\| \ge m, \]
or $v \in V$ implies that
\[ \|v\| \ge m \|v\|_1, \]
as required.
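The two constants appearing in this proof can be made explicit in a simple case. As a sketch, take the arbitrary norm to be the Euclidean norm on $\mathbb{R}^5$ (an assumption made purely for illustration): then the maximum over the basis vectors is $1$, and by the Cauchy-Schwarz inequality the minimum on the unit sphere of the 1-norm is $1/\sqrt{d}$. The code checks the resulting two-sided bound on seeded random samples.

```python
import math
import random

def norm_1(v):
    return sum(abs(x) for x in v)

def norm_2(v):
    return math.sqrt(sum(x * x for x in v))

d = 5
# M = max_i ||e_i||_2 = 1, as in (2.3).
M = 1.0
# Minimum of ||v||_2 on S_1 = {||v||_1 = 1}: equals 1/sqrt(d),
# attained at v_0 = (1/d, ..., 1/d).
m = 1.0 / math.sqrt(d)

random.seed(0)
samples = [[random.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(1000)]
tol = 1e-12
within_bounds = all(
    m * norm_1(v) - tol <= norm_2(v) <= M * norm_1(v) + tol for v in samples
)
v0 = [1.0 / d] * d

print(within_bounds, norm_1(v0), norm_2(v0))
```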
This might suggest that the equivalence of norms is a widespread phenomenon. However, once we leave the setting of finite-dimensional normed spaces, we will quickly see that a given normed space may have many inequivalent norms.
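A quick way to see inequivalence is to find vectors on which the ratio of two norms is unbounded. The sketch below does this on the space of polynomials, viewed as coefficient vectors, for the sum of the absolute values of the coefficients and the maximum absolute coefficient; the test polynomials $1 + x + \cdots + x^N$ and the function names are our own choices for illustration.

```python
import math

def coeff_norm_1(c):
    """Sum of the absolute values of the coefficients."""
    return sum(abs(a) for a in c)

def coeff_norm_2(c):
    """Square root of the sum of the squared coefficients."""
    return math.sqrt(sum(a * a for a in c))

def coeff_norm_inf(c):
    """Largest absolute value of a coefficient."""
    return max(abs(a) for a in c)

def all_ones(N):
    """Coefficient vector of 1 + x + ... + x^N."""
    return [1.0] * (N + 1)

# The ratio of the first norm to the third grows linearly in N, so no
# constant c can bound one by the other on all polynomials.
ratios = [
    (N, coeff_norm_1(all_ones(N)) / coeff_norm_inf(all_ones(N)))
    for N in (1, 10, 100, 1000)
]
print(ratios)
```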
Exercise 2.9. Show that no two of the norms on $\mathbb{R}[x]$ from Example 2.2(7) are equivalent. However, some of the pairs of norms do satisfy an inequality of the form
\[ \|f\| \le c \|f\|' \]
for some fixed $c > 0$ and any $f \in \mathbb{R}[x]$. Find those that do and identify the relevant constant $c$ in each case.
Exercise 2.10. Let $V, W$ be normed vector spaces. Show that $V \times W$ with its canonical inherited vector space structure can be made into a normed vector space using either of the norms
\[ \|(v, w)\|_p = \bigl( \|v\|_V^p + \|w\|_W^p \bigr)^{1/p} \]
for some $p \in [1, \infty)$, or
\[ \|(v, w)\|_\infty = \max\{ \|v\|_V, \|w\|_W \}. \]
Show that all of these norms are equivalent.
2.1.2 Semi-Norms and Quotient Norms
The following weakening of Definition 2.1 is often useful. (The construction in this section is satisfying and useful at times but, with the exception of Definition 2.11, is not critical for later developments.)
Definition 2.11. A non-negative function $\|\cdot\| : V \to \mathbb{R}_{\ge 0}$ on a vector space $V$ is called a semi-norm (or a pseudo-norm) if $\|\cdot\|$ satisfies properties (2) and (3) of a norm.
Thus a semi-norm is allowed to have a non-trivial subset (which will automatically be a subspace, see below) on which it vanishes. A semi-norm gives rise to a pseudo-metric, which in turn gives rise to a topology on $V$. The resulting topology is Hausdorff if and only if the original semi-norm is a norm in the usual sense. Indeed, if $v \in V$ has $\|v\| = 0$, then $v$ will belong to every neighborhood of $0$ in the topology defined by $\|\cdot\|$.
Example 2.12. Let $(X, \mathcal{B}, \mu)$ be a measure space, and define
\[ \mathscr{L}^1_\mu(X) = \{ f : X \to \mathbb{R} \mid f \text{ is measurable and Lebesgue-integrable w.r.t. } \mu \}. \]
On this space we can define a semi-norm
\[ \|f\|_1 = \int_X |f| \, d\mu, \]
and this is not a norm (except in degenerate situations; see Exercise 2.14).
A reader familiar with measure theory may misread Example 2.12, so we should emphasize that $\mathscr{L}^1_\mu(X)$ denotes the space that contains genuine functions defined at each point of $X$. The usual solution to the problems created by the many functions on which the semi-norm vanishes is to define an equivalence class of a function $f$ to consist of all functions that differ from $f$ on a null set. This generalizes to a construction that allows any semi-norm on a vector space to be modified to give a norm on a (related) vector space. To describe this construction, we first prove a simple lemma that starts to connect the algebraic properties to the topological properties of normed spaces.
Lemma 2.13 (Kernel of a semi-norm). The kernel
\[ V_0 = \{ v \in V \mid \|v\| = 0 \} \]
of a semi-norm $\|\cdot\|$ on a vector space is a closed subspace in the topology induced by the semi-norm.
Proof. For $v, w \in V_0$ and any scalar $\alpha$ we have
\[ 0 \le \|\alpha v + w\| \le |\alpha| \, \|v\| + \|w\| = 0, \]
so $V_0$ is a subspace. By the argument used in Lemma 2.6, we see that the semi-norm $\|\cdot\| : v \mapsto \|v\|$ is continuous with respect to the induced topology. It follows that $V_0 = \|\cdot\|^{-1}(\{0\})$ is also closed.
Exercise 2.14. Characterize those measure spaces $(X, \mathcal{B}, \mu)$ on which the semi-norm $\|\cdot\|_1$ from Example 2.12 on the space $\mathscr{L}^1_\mu(X)$ of Lebesgue-integrable functions is a norm.
Returning to Example 2.12, recall that for $f \in \mathscr{L}^1_\mu(X)$, $\|f\|_1 = 0$ is equivalent to the statement that $f = 0$ almost everywhere with respect to $\mu$. Thus the usual equivalence class of a function $f$ is precisely the coset $f + V_0$ defined by $f$ with respect to the kernel $V_0 \subseteq \mathscr{L}^1_\mu(X)$ of the semi-norm. We define, as is standard, the quotient space
\[ L^1_\mu(X) = \mathscr{L}^1_\mu(X) / V_0, \]
and note that the semi-norm $\|\cdot\|_1$ on $\mathscr{L}^1_\mu(X)$ gives rise to a norm, also denoted $\|\cdot\|_1$, on $L^1_\mu(X)$. This construction is a special case of the following.
Lemma 2.15 (Quotient norm). For any vector space $V$ equipped with a semi-norm $\|\cdot\|$, and any closed subspace $W \subseteq V$, the expression
\[ \|v + W\|_{V/W} = \inf_{w \in W} \|v + w\| \]
for $v \in V$ defines a norm on the quotient space $V/W = \{ v + W \mid v \in V \}$. For the kernel $W = V_0$ we have
\[ \|v + V_0\|_{V/V_0} = \|v\| \]
for $v \in V$.
Proof. This is simply a matter of chasing the definitions through the statements. Let $v_1, v_2 \in V$ and $\varepsilon > 0$ be given. Then there exist $w_1, w_2 \in W$ with
\[ \|v_i + w_i\| \le \|v_i + W\|_{V/W} + \varepsilon \]
for $i = 1, 2$. Hence
\[ \|v_1 + v_2 + W\|_{V/W} \le \|v_1 + v_2 + w_1 + w_2\| \le \|v_1 + w_1\| + \|v_2 + w_2\| \le \|v_1 + W\|_{V/W} + \|v_2 + W\|_{V/W} + 2\varepsilon, \]
and so, as $\varepsilon > 0$ was arbitrary, the triangle inequality holds for $\|\cdot\|_{V/W}$. Similarly, for any scalar $\alpha$,
\[ \|\alpha v_1 + W\|_{V/W} \le \|\alpha v_1 + \alpha w_1\| = |\alpha| \, \|v_1 + w_1\| \le |\alpha| \, \|v_1 + W\|_{V/W} + |\alpha| \varepsilon, \]
which gives
\[ \|\alpha v_1 + W\|_{V/W} \le |\alpha| \, \|v_1 + W\|_{V/W}. \]
If $\alpha = 0$ then this is clearly an equality, and if $\alpha \ne 0$ then we may apply the above to $\alpha v_1$ and the scalar $\alpha^{-1}$ to give
\[ \|v_1 + W\|_{V/W} \le |\alpha|^{-1} \|\alpha v_1 + W\|_{V/W}. \]
However, this is the remaining half of the homogeneity property.
It remains to check that $\|\cdot\|_{V/W}$ is indeed a norm and not simply a semi-norm. Assume therefore that
\[ \|v + W\|_{V/W} = 0. \]
Then for every $\varepsilon > 0$ there exists some $w \in W$ with $\|v - w\| < \varepsilon$. This shows that $v$ belongs to the closure $\overline{W}$ of $W$. As $W = \overline{W}$ is closed by assumption, we see that $v \in W$, so $v + W = W$ is the zero element in the quotient space $V/W$.
Notice that we cannot expect the infimum in Lemma 2.15 to be a minimum in general (see, for example, Exercise 2.16).
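The quotient norm has a simple geometric meaning in a toy case, which may help with intuition. As a sketch, take $V = \mathbb{R}^2$ with the Euclidean norm and $W$ the line spanned by a fixed vector (both are assumptions chosen for illustration): the quotient norm of $v + W$ is then the distance from $v$ to the line, and the infimum is attained at the orthogonal projection.

```python
import math

def quotient_norm(v, w):
    """Quotient norm of v + W in R^2/W for W = span{w}, Euclidean norm.

    The infimum of ||v + t w||_2 over real t is attained at
    t = -<v, w> / <w, w> (orthogonal projection onto the line W).
    """
    t = -(v[0] * w[0] + v[1] * w[1]) / (w[0] ** 2 + w[1] ** 2)
    return math.hypot(v[0] + t * w[0], v[1] + t * w[1])

w = (1.0, 1.0)
v1, v2 = (3.0, 1.0), (-2.0, 5.0)

q1 = quotient_norm(v1, w)
q2 = quotient_norm(v2, w)
# Triangle inequality in the quotient.
q_sum = quotient_norm((v1[0] + v2[0], v1[1] + v2[1]), w)
# The value depends only on the coset: v1 and v1 + 5w give the same result.
q_shifted = quotient_norm((v1[0] + 5.0, v1[1] + 5.0), w)

print(q1, q2, q_sum, q_shifted)
```

In this finite-dimensional Euclidean example the infimum is always a minimum, in contrast with the situation in Exercise 2.16.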
Exercise 2.16. Let $(C([-1, 1]), \|\cdot\|_\infty)$ be the normed vector space defined as in Example 2.2(5). Define
\[ W = \Bigl\{ f \in C([-1, 1]) \;\Big|\; \int_{-1}^0 f(x) \, dx = \int_0^1 f(x) \, dx = 0 \Bigr\}. \]
Show that $W$ is a closed subspace. Now let $f(x) = x \in C([-1, 1])$, calculate $\|f\|_{C([-1,1])/W}$, and show that the infimum is not achieved.
2.1.3 A Comment on Notation
On several occasions in this section we considered different norms on the same vector space. This will happen less frequently in the remainder of these notes, and most of the time the normed vector space $(V, \|\cdot\|)$ will be equipped with a particular norm. Where we are dealing with a single norm, we will write
\[ B_r = B_r^V = B_r^V(0) = \{ w \in V \mid \|w\| < r \} \]
for the open ball of radius $r$ around $0$, and
\[ B_r(v) = B_r^V(v) = \{ w \in V \mid \|w - v\| < r \} = B_r^V + v \]
for the open ball of radius $r$ around $v \in V$. We will also frequently write $\|\cdot\|_V$ for the natural norm on $V$. This convention will be particularly useful in situations where $V$ has a subspace on which one may also consider different norms. For example, in Example 2.2(6) we may write $\|f\|_{C^1([0,1])}$ for the natural norm of $f \in C^1([0, 1])$, but may also write $\|f\|_{C([0,1])} = \|f\|_\infty$ for the supremum norm of $f \in C^1([0, 1])$ thought of as an element of the larger space $C([0, 1])$. At this point it is reasonable to ask what makes a norm be naturally associated to a given space, and this is partially explained in the next section.
2.2 Banach Spaces

We start by recalling a basic definition from analysis on metric spaces.
Definition 2.17. A sequence $(x_n)$ in a metric space $(X, d)$ is said to be a Cauchy sequence if for any $\varepsilon > 0$ there is an $N = N(\varepsilon)$ such that $d(x_n, x_m) < \varepsilon$ for any $m, n > N$. The metric space is called complete if every Cauchy sequence converges to an element of $X$.
This notion gives rise to one of the fundamental types of normed space in functional analysis.
Definition 2.18. A normed vector space $(V, \|\cdot\|)$ is a Banach space if $V$ is complete with respect to (the metric induced by) the norm $\|\cdot\|$.
Once again there are many familiar examples of Banach spaces. As we will see there is often an almost canonical choice of norm $\|\cdot\|_V$ which makes a linear space $V$ into a Banach space $(V, \|\cdot\|_V)$. (It is clear that this property of a norm does not define it uniquely, as any equivalent norm would induce the same topology and therefore also make $V$ into a Banach space.)
Example 2.19. We start with a small number of examples, and postpone the proof that these are indeed Banach spaces to Section 2.2.1.
(1) $V = \mathbb{R}^d$ with any of the norms from Example 2.2(1)-(4) from Section 2.1.1 forms a Banach space.
(2) Let $X$ be any set. Then
\[ B(X) = \{ f : X \to \mathbb{R} \mid f \text{ is bounded} \}, \]
equipped with the norm
\[ \|f\|_\infty = \sup_{x \in X} |f(x)|, \]
is a Banach space. Convergence of a sequence of functions in this space is also called uniform convergence. If $X = \mathbb{N}$, then one often writes $\ell^\infty = \ell^\infty(\mathbb{N}) = B(\mathbb{N})$.
(3) Let $X$ be a topological space. Then
\[ C_b(X) = \{ f \in B(X) \mid f \text{ is continuous} \} \]
is a closed subspace of $B(X)$ and so is also a Banach space. Notice that if $X$ is compact then $C_b(X) = C(X)$.
(4) Let $X$ be a locally compact topological space (that is, a topological space in which every point has a compact neighborhood). Then
\[ C_0(X) = \{ f \in C_b(X) \mid \lim_{x \to \infty} f(x) = 0 \} \]
is a closed subspace of $C_b(X)$ and hence a Banach space. The notion of the limit of $f(x)$ as $x \to \infty$ used here is defined as follows: $\lim_{x \to \infty} f(x) = A$ if and only if for every $\varepsilon > 0$ there exists some compact set $K \subseteq X$ with $|f(x) - A| < \varepsilon$ for all $x \in X \setminus K$. If $X = \mathbb{N}$ (with the discrete topology), one often writes $c_0 = c_0(\mathbb{N}) = C_0(\mathbb{N})$ for this subspace of $\ell^\infty(\mathbb{N})$.
(5) The space $C^1([0, 1])$ of continuously differentiable functions on $[0, 1]$ with the norm
\[ \|f\|_{C^1([0,1])} = \max\{ \|f\|_\infty, \|f'\|_\infty \} \]
is a Banach space.
(6) Let $\Omega \subseteq \mathbb{R}^d$ be open, and fix $k \ge 1$. Then the space $C_b^k(\Omega)$ of functions $\Omega \to \mathbb{R}$ for which all partial derivatives up to order $k$ exist and are continuous and bounded on $\Omega$, equipped with the norm
\[ \|f\|_{C_b^k(\Omega)} = \max_{\deg(\alpha) \le k} \|\partial_\alpha f\|_\infty, \]
is a Banach space, where for $\alpha \in \mathbb{N}_0^d$ the symbol $\partial_\alpha$ stands for the partial differential operator
\[ \partial_\alpha (f) = \frac{\partial^{\|\alpha\|_1}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} f \]
with $\|\alpha\|_1 = \deg(\alpha) = \alpha_1 + \cdots + \alpha_d$.
(7) Fix $p \in [1, \infty)$ and let $(X, \mathcal{B}, \mu)$ be a measure space. Then
\[ \|f\|_p = \Bigl( \int_X |f|^p \, d\mu \Bigr)^{1/p} \]
defines a semi-norm on the vector space
\[ \mathscr{L}^p_\mu(X) = \{ f : X \to \mathbb{R} \mid f \text{ is measurable and } \|f\|_p < \infty \}. \]
The associated space of equivalence classes, equal to the quotient
\[ L^p_\mu(X) = \mathscr{L}^p_\mu(X) / V_0 \]
by the kernel $V_0$ of the semi-norm $\|\cdot\|_p$, is a Banach space. Important special cases of this construction include the following:
a) $(X, \mathcal{B}, \mu) = (\Omega, \mathcal{B}, m)$ where $\Omega$ is a Borel subset of $\mathbb{R}^d$, $\mathcal{B}$ is the Borel $\sigma$-algebra, and $m$ is $d$-dimensional Lebesgue measure on $\Omega$.
b) $(X, \mathcal{B}, \mu) = (\mathbb{N}, \mathcal{P}(\mathbb{N}), \lambda_{\mathrm{counting}})$, where $\lambda_{\mathrm{counting}}$ denotes the counting measure defined on any subset of $\mathbb{N}$. In this case we will write $\ell^p = \ell^p(\mathbb{N}) = L^p_{\lambda_{\mathrm{counting}}}(\mathbb{N})$.
(8) The analog of (7) with $p = \infty$ is constructed slightly differently. Let $(X, \mathcal{B}, \mu)$ be a measure space. Then
\[ \mathscr{L}^\infty(X) = \{ f : X \to \mathbb{R} \mid f \text{ is measurable},\ f \in B(X) \} \]
is already a Banach space with respect to $\|f\|_\infty$, but one also defines
\[ L^\infty_\mu(X) = \mathscr{L}^\infty(X) / W_\mu, \]
where
\[ W_\mu = \{ f \in \mathscr{L}^\infty(X) \mid f = 0 \ \mu\text{-almost everywhere} \}, \]
equipped with the essential supremum norm defined by
\[ \|f\|_{\mathrm{esssup}} = \operatorname{esssup}_{x \in X} |f(x)| = \inf\{ \alpha > 0 \mid \mu(\{ x \mid |f(x)| > \alpha \}) = 0 \}. \]
All of these also have natural complex analogs.
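The difference between the supremum norm and the essential supremum norm is already visible on a finite set carrying a measure with some points of mass zero. The following is a small sketch under that assumption; the four-point space, the masses and the function values are invented for illustration.

```python
def sup_norm(f):
    """Supremum norm of a real function on a finite set, given as a list of values."""
    return max(abs(value) for value in f)

def esssup_norm(f, mu):
    """Essential supremum on a finite set with point masses mu[x] >= 0.

    inf{alpha > 0 : mu({|f| > alpha}) = 0} equals the largest |f(x)| over
    points x with mu[x] > 0, since the points of mass zero form a null set.
    """
    return max((abs(value) for value, mass in zip(f, mu) if mass > 0), default=0)

mu = [1.0, 0.0, 2.0, 0.0]          # points 1 and 3 have measure zero
f = [1.0, 100.0, -3.0, 7.0]
g = [1.0, -5.0, -3.0, 0.0]         # agrees with f outside the null set {1, 3}

print(sup_norm(f), esssup_norm(f, mu), esssup_norm(g, mu))
```

The functions `f` and `g` differ only on a null set, so they define the same equivalence class and have the same essential supremum, while their supremum norms differ.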

Exercise 2.20. Show that a product of two normed vector spaces $V \times W$ is complete with respect to one of the norms from Exercise 2.10 if and only if both $V$ and $W$ are complete with respect to their own norms. Thus the product of two Banach spaces is a Banach space.
(Footnote to Example 2.19(8): As is customary we will quickly stop being too careful about the distinction between an element of $\mathscr{L}^\infty(X)$ and the equivalence class defined by it in $L^\infty_\mu(X)$. For example, $|f|$ defined by $|f|(x) = |f(x)|$ for all $x \in X$ really depends on $f \in \mathscr{L}^\infty(X)$ and not just on the equivalence class, but (as we will see later) the norm defined here is independent of the representative chosen for a given equivalence class.)
2.2.1 Proofs of Completeness
In this subsection we will explain why the examples from Example 2.19 are indeed Banach spaces. Depending on the background of the reader, parts of this section may be skipped. In each case it is proving completeness that really takes up what effort is required.
Proof for Example 2.19(1). If two norms are equivalent then they define the same notion of convergence and of Cauchy sequence. Thus it is enough to consider $\mathbb{R}^d$ with the norm $\|\cdot\|_\infty$ by Proposition 2.8. Now a Cauchy sequence $(v_n)$ in $\mathbb{R}^d$ has the property that each component sequence $(v_n^{(i)})$ for a fixed $i$, where
\[ v_n = \bigl( v_n^{(1)}, \dots, v_n^{(d)} \bigr)^t \]
for all $n$, is itself a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$ is complete, there exists a limit
\[ v^{(i)} = \lim_{n \to \infty} v_n^{(i)} \]
for each $i$. These limits together define a vector
\[ v = \bigl( v^{(1)}, \dots, v^{(d)} \bigr)^t, \]
and it is easy to see that $v$ is the limit of $v_n$ in $\mathbb{R}^d$.
Example 2.19(2). (If $|X| = d$ then this example reduces to the previous one.) Let $X$ be any set and let $(f_n)$ be a Cauchy sequence in $B(X)$ with respect to $\|\cdot\|_\infty$. Then for any fixed $x \in X$ the sequence $(f_n(x))$ is a Cauchy sequence in $\mathbb{R}$, which therefore has a limit $f(x)$. This defines a function $f : X \to \mathbb{R}$, and we need to show that $f \in B(X)$ and that $f_n \to f$ as $n \to \infty$ with respect to $\|\cdot\|_\infty$. Since $(f_n)$ is Cauchy, for any $\varepsilon > 0$ there is some $N(\varepsilon)$ with $\|f_m - f_n\|_\infty < \varepsilon$ for all $m, n \ge N(\varepsilon)$, and so in particular $|f_m(x) - f_n(x)| < \varepsilon$ for any $x \in X$ and $m, n \ge N(\varepsilon)$. Now let $m \to \infty$ to see that $|f(x) - f_n(x)| \le \varepsilon$ for all $n \ge N(\varepsilon)$. Setting $\varepsilon = 1$ and $n = N(1)$ gives
\[ |f(x)| \le |f(x) - f_n(x)| + |f_n(x)| \le 1 + \|f_n\|_\infty, \]
showing that $f \in B(X)$. Using any $\varepsilon > 0$, the inequalities above show that there is some $N(\varepsilon)$ for which $\|f - f_n\|_\infty \le \varepsilon$ for all $n \ge N(\varepsilon)$. This shows that
\[ f = \lim_{n \to \infty} f_n \in B(X) \]
as required.
Example 2.19(3). By definition $C_b(X)$ is a subspace of $B(X)$, and we use the same norm on both spaces. Thus if $(f_n)$ is a Cauchy sequence in $C_b(X)$ then there exists by (2) a limit $f = \lim_{n \to \infty} f_n \in B(X)$. It remains to show that $f \in C_b(X)$, that is, to show that $C_b(X)$ is a closed subspace of $B(X)$. This is a familiar argument from real analysis. Fix $x \in X$. Given any $\varepsilon > 0$ there exists some $n$ with $\|f_n - f\|_\infty < \varepsilon$. Since $f_n \in C_b(X)$ is continuous at $x$ there is a neighborhood $U \subseteq X$ of $x$ with $|f_n(y) - f_n(x)| < \varepsilon$ for all $y \in U$. Therefore,
\[ |f(y) - f(x)| \le \underbrace{|f(y) - f_n(y)|}_{\le \|f - f_n\|_\infty < \varepsilon} + \underbrace{|f_n(y) - f_n(x)|}_{< \varepsilon} + \underbrace{|f_n(x) - f(x)|}_{\le \|f - f_n\|_\infty < \varepsilon} < 3\varepsilon \]
for all $y \in U$. As the existence of such a neighborhood holds for all $\varepsilon > 0$ and $x \in X$, we see that $f \in C_b(X)$ as required.
Example 2.19(4). If $(f_n)$ is a Cauchy sequence in $C_0(X)$, then
\[ f = \lim_{n \to \infty} f_n \in C_b(X) \]
exists by (3). We only need to show that $f \in C_0(X)$. For this, let $\varepsilon > 0$ and choose $n \in \mathbb{N}$ with $\|f_n - f\|_\infty < \varepsilon$. Since $f_n \in C_0(X)$, there exists some compact set $K \subseteq X$ with $|f_n(x)| < \varepsilon$ for all $x \in X \setminus K$. Thus
\[ |f(x)| \le |f(x) - f_n(x)| + |f_n(x)| < 2\varepsilon \]
for all $x \in X \setminus K$. This implies that $f \in C_0(X)$ as required.
Example 2.19(5). Let $(f_n)$ be a Cauchy sequence in $C^1([0, 1])$ with respect to the norm
\[ \|f\|_{C^1([0,1])} = \max\{ \|f\|_\infty, \|f'\|_\infty \}. \]
Then each $f_n \in C([0, 1])$, and $(f_n)$ is a Cauchy sequence with respect to $\|\cdot\|_\infty$. Thus (3) applies and shows that $f_n$ converges uniformly to some $f \in C([0, 1])$. The same argument applies to the sequence $(f_n')$ of derivatives, showing that $(f_n')$ converges uniformly to some $g \in C([0, 1])$. All that remains is to verify that
\[ f' = g. \tag{2.4} \]
In order to show (2.4), it is convenient to rephrase the statement as an integral equation: we need to show that
\[ f(x) = f(0) + \int_0^x g(t) \, dt. \tag{2.5} \]
By continuity of $g$, this is equivalent to (2.4). Since $f_n$ is continuously differentiable, we certainly have
\[ f_n(x) = f_n(0) + \int_0^x f_n'(t) \, dt. \]
Now let $\varepsilon > 0$ be given, and choose $n \ge 1$ such that
\[ \max\{ \|f_n - f\|_\infty, \|f_n' - g\|_\infty \} < \varepsilon. \]
This implies that
\[ \Bigl| f(x) - f(0) - \int_0^x g(t) \, dt \Bigr| \le \underbrace{|f(x) - f_n(x)|}_{< \varepsilon} + \underbrace{\Bigl| f_n(x) - f_n(0) - \int_0^x f_n'(t) \, dt \Bigr|}_{= 0} + \underbrace{|f_n(0) - f(0)|}_{< \varepsilon} + \underbrace{\Bigl| \int_0^x (f_n'(t) - g(t)) \, dt \Bigr|}_{\le \varepsilon x} < 3\varepsilon \]
for any $x \in [0, 1]$. As $\varepsilon > 0$ was arbitrary, (2.5) follows. Now that we know (2.4), it is clear that
\[ \|f_n - f\|_{C^1([0,1])} = \max\{ \|f_n - f\|_\infty, \|f_n' - g\|_\infty \} \to 0 \]
as $n \to \infty$, as required.
Example 2.19(6). Let $\Omega \subseteq \mathbb{R}^d$ be an open subset, and let $(f_n)$ be a Cauchy sequence in $C_b^k(\Omega)$ with respect to the norm $\|\cdot\|_{C_b^k(\Omega)}$. Just as in the argument for (5), we know from (3) that for any $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 = \alpha_1 + \cdots + \alpha_d \le k$, the sequence $(\partial_\alpha f_n)$ in $C_b(\Omega)$ has a uniform limit $g_\alpha \in C_b(\Omega)$. All that remains is to show that
\[ g_\alpha = \partial_\alpha f = \partial_\alpha g_0. \tag{2.6} \]
Suppose therefore that $x \in \Omega$, $i \in \{1, \dots, d\}$, and $\|\alpha\|_1 < k$. It is enough (by induction) to show that
\[ \frac{\partial}{\partial x_i} g_\alpha(x) = g_{\alpha + e_i}(x). \tag{2.7} \]
We already know that
\[ \partial_\alpha f_n(x + h e_i) = \partial_\alpha f_n(x) + \int_0^h \partial_{\alpha + e_i} f_n(x + t e_i) \, dt \]
for all sufficiently small $h$, so letting $n \to \infty$ and using the known uniform convergence and the triangle inequality just as in (5) gives
\[ g_\alpha(x + h e_i) = g_\alpha(x) + \int_0^h g_{\alpha + e_i}(x + t e_i) \, dt, \]
which implies (2.7) and hence (2.6) by induction on $\|\alpha\|_1$. As in (5) it now follows that $f_n \to g_0$ with respect to $\|\cdot\|_{C_b^k(\Omega)}$ as $n \to \infty$.
Exercise 2.21. Generalize Example 2.19(6) to give a Banach space over $\mathbb{C}$ in two different ways as follows. (a) Let $\Omega \subseteq \mathbb{R}^d$ be open and consider $\mathbb{C}$-valued differentiable functions with bounded derivative (here there is little difference from the real case). (b) Let $\Omega \subseteq \mathbb{C}$ (or in $\mathbb{C}^d$) be open, and consider the space of complex differentiable functions with bounded derivative.
For Examples 2.19(7) and (8) regarding integrable functions and bounded measurable functions, we will use two lemmas that we formulate more generally. The usual definitions of convergence and absolute convergence of series extend easily to normed vector spaces as follows. A series $\sum_{n=1}^{\infty} v_n$ converges if the sequence of partial sums $(s_N)_{N \geq 1}$ converges, where $s_N = \sum_{n=1}^{N} v_n$ for all $N \geq 1$, and converges absolutely if the real-valued series $\sum_{n=1}^{\infty} \|v_n\|$ converges.

Lemma 2.22 (Absolute convergence). A normed vector space $(V, \|\cdot\|)$ is a Banach space if and only if any absolutely convergent series in $V$ is convergent.

Proof. If $V$ is a Banach space and a series $\sum_{n=1}^{\infty} v_n$ is absolutely convergent, which means that

$$\sum_{n=1}^{\infty} \|v_n\| < \infty,$$

then the sequence of partial sums $(s_n)$ defined by $s_n = \sum_{k=1}^{n} v_k$ is a Cauchy sequence (and therefore converges, since $V$ is complete), since for $m > n$ we have

$$\|s_m - s_n\| = \Big\| \sum_{k=n+1}^{m} v_k \Big\| \leq \sum_{k=n+1}^{m} \|v_k\|,$$

and the last sum can be made arbitrarily small by requiring $n$ to be sufficiently large.

Assume now for the converse that $(V, \|\cdot\|)$ is a normed vector space in which every absolutely convergent series is convergent, and let $(v_n)$ be a Cauchy sequence in $V$. In order to render the Cauchy property more uniform, we choose a subsequence of $(v_n)$ as follows. For each $k \geq 1$ there exists some $N_k$ such that

$$\|v_m - v_n\| < \frac{1}{2^k}$$

for all $m, n \geq N_k$. Now define inductively a sequence $(n_k)$ by $n_1 = N_1$, $n_2 = \max\{n_1 + 1, N_2\}$, ..., $n_k = \max\{n_{k-1} + 1, N_k\}$. The corresponding subsequence $(v_{n_k})_{k \geq 1}$ satisfies

$$\|v_{n_{k+1}} - v_{n_k}\| < \frac{1}{2^k}.$$

Now define $w_k = v_{n_{k+1}} - v_{n_k}$ for all $k \geq 1$, so that

$$\sum_{k=1}^{\infty} \|w_k\| < \sum_{k=1}^{\infty} \frac{1}{2^k} = 1$$

converges, and hence the infinite sum

$$\sum_{k=1}^{\infty} w_k = w \in V$$

converges by our assumption on the normed space $(V, \|\cdot\|)$. Now the $\ell$th partial sum of this series is

$$\sum_{k=1}^{\ell} w_k = (v_{n_2} - v_{n_1}) + (v_{n_3} - v_{n_2}) + \cdots + (v_{n_{\ell+1}} - v_{n_\ell}) = v_{n_{\ell+1}} - v_{n_1},$$

so the subsequence $(v_{n_k})$ satisfies

$$\lim_{k \to \infty} v_{n_k} = w + v_{n_1} = v.$$

Now any Cauchy sequence with a convergent subsequence must converge: for any $\epsilon > 0$ choose $N$ with $\|v_m - v_n\| < \epsilon$ for $m, n \geq N$ and choose $K$ with $\|v_{n_k} - v\| < \epsilon$ for $k \geq K$. Then if $k \geq K$ has $n_k \geq N$ we have

$$\|v_m - v\| \leq \|v_m - v_{n_k}\| + \|v_{n_k} - v\| < 2\epsilon$$

for all $m \geq N$, showing that the sequence converges.
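The estimate used in the first half of the proof of Lemma 2.22 can be checked numerically. The following is only a sketch under assumptions that are not in the notes: we work in $\mathbb{R}^2$ with the Euclidean norm and use the hypothetical terms $v_k = 2^{-k}(\cos k, \sin k)$, for which $\|v_k\| = 2^{-k}$.

```python
import math

def norm(v):
    """Euclidean norm on R^2 (an assumed choice of normed space)."""
    return math.hypot(v[0], v[1])

def term(k):
    """Hypothetical term v_k = 2^{-k} (cos k, sin k), so that ||v_k|| = 2^{-k}."""
    return (math.cos(k) / 2**k, math.sin(k) / 2**k)

def partial_sum(n):
    """The partial sum s_n = v_1 + ... + v_n."""
    x = sum(term(k)[0] for k in range(1, n + 1))
    y = sum(term(k)[1] for k in range(1, n + 1))
    return (x, y)

def tail_bound(n, m):
    """The sum of ||v_k|| over k = n+1, ..., m."""
    return sum(norm(term(k)) for k in range(n + 1, m + 1))

def gap(n, m):
    """The distance ||s_m - s_n|| between two partial sums."""
    sm, sn = partial_sum(m), partial_sum(n)
    return norm((sm[0] - sn[0], sm[1] - sn[1]))
```

For any $n < m$ the value `gap(n, m)` should not exceed `tail_bound(n, m)`, and the latter is smaller than $2^{-n}$, which is how the Cauchy property of the partial sums arises.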
Exercise 2.23. Recall two facts from real analysis: a series of real numbers may be conditionally convergent, and in this case the series may be rearranged to obtain any sum in $\mathbb{R} \cup \{\pm\infty\}$ for the rearranged series; in contrast, if a series of real numbers is absolutely convergent, then the sum is independent of any rearrangement of the series.
Generalize the second result to show that the sum of an absolutely convergent series in a Banach space is independent of the ordering of the terms. (We note that the first one does not generalize to the context of Banach spaces, see e.g. Section 2.5.5.)
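Lemma 2.24 (Quotients of Banach spaces). If $(V, \|\cdot\|)$ is a Banach space and $W \subseteq V$ is a closed subspace then $(V/W, \|\cdot\|_{V/W})$ is a Banach space.

Proof. Assume that $(v_n)$ is a sequence with

$$\sum_{n=1}^{\infty} \|v_n + W\|_{V/W} < \infty.$$

We will show that

$$\sum_{n=1}^{\infty} (v_n + W) \in V/W$$

exists; by Lemma 2.22 this is enough. For this, choose for each $n \geq 1$ some $w_n \in W$ with

$$\|v_n + w_n\| \leq \|v_n + W\|_{V/W} + \frac{1}{2^n}.$$

Then

$$\sum_{n=1}^{\infty} \|v_n + w_n\| < \infty,$$

so the limit

$$v = \sum_{n=1}^{\infty} (v_n + w_n)$$

exists in $V$. This implies that

$$\sum_{n=1}^{\infty} (v_n + W) = v + W$$

converges also. Indeed, if $\epsilon > 0$ and $N$ is chosen with

$$\Big\| v - \sum_{k=1}^{n} (v_k + w_k) \Big\| < \epsilon$$

for $n \geq N$ then

$$\Big\| (v + W) - \sum_{k=1}^{n} (v_k + W) \Big\|_{V/W} < \epsilon$$

for $n \geq N$.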
We refer to Appendix B for basic properties of $\|\cdot\|_p$ on $L^p_\mu(X)$, and in particular for the triangle inequality.

Example 2.19(7). Let $(f_n)$ be a sequence in $L^p_\mu(X)$ with

$$M = \sum_{n=1}^{\infty} \|f_n\|_p < \infty.$$

By Lemma 2.22, it is enough to show that $\sum_{n=1}^{\infty} f_n$ converges in $L^p_\mu(X)$. For this, define a sequence of functions $(g_n)$ by

$$g_n(x) = \sum_{k=1}^{n} |f_k(x)|.$$

Clearly $g_n(x) \nearrow g(x)$ for some measurable function $g : X \to [0, \infty]$. Note that

$$\int |g_n|^p \, d\mu = \|g_n\|_p^p \leq \Big( \sum_{k=1}^{n} \|f_k\|_p \Big)^p \leq M^p$$

by the triangle inequality for $\|\cdot\|_p$. By monotone convergence, this implies that

$$\|g\|_p^p = \lim_{n \to \infty} \|g_n\|_p^p \leq M^p,$$

and so $g(x) < \infty$ for $\mu$-almost every $x \in X$. Therefore,

$$f(x) = \sum_{n=1}^{\infty} f_n(x)$$

exists for $\mu$-almost every $x \in X$, and hence defines a measurable function $f : X \to \mathbb{R}$. [Footnote: Strictly speaking we have only defined $f$ on the complement of a null set, but we simplify the notation by ignoring this distinction here.] Since we also have $|f(x)| \leq g(x)$ for all $x$, we also have $f \in L^p_\mu(X)$. It remains to show that

$$\Big\| \sum_{k=1}^{n} f_k - f \Big\|_p \longrightarrow 0$$

as $n \to \infty$. For this, notice that

$$\Big| \sum_{k=1}^{n} f_k - f \Big|^p \leq (2g)^p$$

and

$$\sum_{k=1}^{n} f_k - f \longrightarrow 0$$

$\mu$-almost everywhere as $n \to \infty$, so that we may apply dominated convergence to the sequence of integrals defined by

$$\Big\| \sum_{k=1}^{n} f_k - f \Big\|_p^p = \int_X \Big| \sum_{k=1}^{n} f_k - f \Big|^p \, d\mu$$
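to obtain the conclusion required.

Example 2.19(8). Since a pointwise limit (and a fortiori a uniform limit) of a sequence of measurable functions is a measurable function, the subspace $\mathscr{L}^\infty(X) \subseteq B(X)$ is closed. Therefore, by (2) we see that $\mathscr{L}^\infty(X)$ is a Banach space with respect to the norm $\|\cdot\|_\infty$. Now let

$$W = \{ f \in \mathscr{L}^\infty \mid f = 0 \ \mu\text{-almost everywhere} \}.$$

Clearly $W$ is closed, since if $f_n \in W$ for all $n \geq 1$ and $f_n \to f$ uniformly, then

$$\{x \in X \mid f(x) \neq 0\} \subseteq \bigcup_{n \geq 1} \{x \in X \mid f_n(x) \neq 0\}$$

is a $\mu$-null set. Therefore $L^\infty_\mu = \mathscr{L}^\infty / W$ is a Banach space with respect to the quotient norm $\|\cdot\|_{\mathscr{L}^\infty/W}$ by Lemma 2.24. It remains to show that

$$\|f\|_{\mathscr{L}^\infty/W} = \inf_{g \in W} \|f + g\|_\infty$$

coincides as claimed with the essential supremum norm

$$\|f\|_{\mathrm{esssup}} = \inf \{ \alpha > 0 \mid \mu(\{x \in X \mid |f|(x) > \alpha\}) = 0 \}$$

as given in Example 2.19(8). For this, assume first that $\alpha > \|f\|_{\mathrm{esssup}}$ so that $N_\alpha = \{x \in X \mid |f|(x) > \alpha\}$ is a $\mu$-null set, and hence $g = -f \mathbb{1}_{N_\alpha} \in W$. It follows that $\|f\|_{\mathscr{L}^\infty/W} \leq \|f + g\|_\infty \leq \alpha$, so

$$\|f\|_{\mathscr{L}^\infty/W} \leq \|f\|_{\mathrm{esssup}}.$$

If, on the other hand, $\alpha > \|f\|_{\mathscr{L}^\infty/W}$ then there exists some $g \in W$ with $\|f + g\|_\infty < \alpha$, and so

$$\{x \in X \mid |f|(x) > \alpha\} \subseteq \{x \in X \mid g(x) \neq 0\}$$

is a null set. Therefore

$$\|f\|_{\mathrm{esssup}} \leq \|f\|_{\mathscr{L}^\infty/W}.$$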
Exercise 2.25. Show that in the definition of $\|\cdot\|_{\mathrm{esssup}}$ and of $\|\cdot\|_{\mathscr{L}^\infty/W}$ (from the proof that Example 2.19(8) is a Banach space on p. 54) the infima are actually minima.
2.2.2 The Completion of a Normed Vector Space

Even though we have seen several examples of Banach spaces above, there are many natural normed vector spaces that are not Banach spaces. For example, $\mathbb{R}[x]$ is not a Banach space with respect to any of the five norms discussed in Example 2.19(7) (see also Exercise 2.49). As a result it is useful to know that any normed vector space has a completion.

Theorem 2.26 (Existence of a Completion). Let $(V, \|\cdot\|)$ be a normed vector space. Then there exists a Banach space $(B, \|\cdot\|_B)$ which contains $V$ as a dense subspace, and the indicated norm on $B$ restricts to the original norm on the image of $V$ in $B$.

[Footnote: The proof in this section can be skipped, as many natural normed vector spaces are already Banach spaces, and we will be able to give another shorter construction in Chapter 6 on p. 209.]

Proof. Let

$$W = \{ (v_n)_{n \geq 1} \in V^{\mathbb{N}} \mid (v_n) \text{ is a Cauchy sequence} \}.$$

It is straightforward to check that $W$ is a vector space. We also define the semi-norm

$$\|(v_n)_{n \geq 1}\| = \lim_{n \to \infty} \|v_n\|,$$

which is well-defined since $(\|v_n\|)_{n \geq 1}$ is a Cauchy sequence in $\mathbb{R}$ whenever $(v_n)_{n \geq 1}$ is a Cauchy sequence in $V$ (due to (2.1)). The kernel of this semi-norm is the space

$$W_0 = \{ (v_n)_{n \geq 1} \mid v_n \to 0 \text{ as } n \to \infty \}$$

of null sequences (that is, sequences converging to $0$) in $V$. We define $B = W/W_0$ and

$$\|b\|_B = \lim_{n \to \infty} \|v_n\|$$

where $b = (v_n)_{n \geq 1} + W_0$, and it may then be checked that $(B, \|\cdot\|_B)$ is a normed vector space. Moreover, $B$ contains an isometric copy of $V$, since an element $v \in V$ can be identified with the equivalence class of the constant sequence $\phi(v) = (v, v, \ldots) + W_0$, with the norm of this coset being

$$\|\phi(v)\|_B = \lim_{n \to \infty} \|v\| = \|v\|$$

by definition. We claim that (the image of) $V$ is dense in $B$. Given an equivalence class $b = (v_1, v_2, \ldots) + W_0 \in B$ of a Cauchy sequence $(v_n)$, for every $\epsilon > 0$ there exists some $N$ with $\|v_m - v_n\| < \epsilon$ for $m, n \geq N$. Then

$$\|(v_1, v_2, \ldots) + W_0 - \phi(v_N)\|_B = \lim_{n \to \infty} \|v_n - v_N\| \leq \epsilon.$$

Using this for any $\epsilon > 0$ shows that the image of $V$ is dense in $B$.

It remains to show that $B$ is complete with respect to $\|\cdot\|_B$. For this, assume that $(b_n)_{n \geq 1}$ is a Cauchy sequence in $B$. Since the image of $V$ is dense in $B$ we can find an associated sequence $(v_n)$ in $V$ with

$$\|b_n - \phi(v_n)\|_B < \frac{1}{n}$$

for each $n \in \mathbb{N}$. Then for every $\epsilon > 0$ there exists some $N(\epsilon)$ with $\|b_m - b_n\|_B < \epsilon$ and $\frac{1}{m}, \frac{1}{n} < \epsilon$ for $m, n \geq N(\epsilon)$, so that

$$\|v_m - v_n\| \leq \underbrace{\|\phi(v_m) - b_m\|_B}_{< \frac{1}{m} < \epsilon} + \underbrace{\|b_m - b_n\|_B}_{< \epsilon} + \underbrace{\|b_n - \phi(v_n)\|_B}_{< \frac{1}{n} < \epsilon} < 3\epsilon$$

for $m, n \geq N(\epsilon)$, and so $(v_n)$ is a Cauchy sequence in $V$. We define

$$b = (v_1, v_2, \ldots) + W_0 \in B \qquad (2.8)$$

and see that

$$\|b - b_m\|_B \leq \|b - \phi(v_m)\|_B + \|\phi(v_m) - b_m\|_B < \underbrace{\lim_{n \to \infty} \|v_n - v_m\|}_{\leq 3\epsilon} + \frac{1}{m} < 4\epsilon$$

for $m \geq N(\epsilon)$. Thus $b \in B$ defined by (2.8) is the limit of $(b_n)$ and so $B$ is a Banach space.
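To make the need for a completion concrete, here is a small numerical sketch. It rests on assumptions not fixed in the notes: we take $\mathbb{R}[x]$ with the supremum norm on $[0,1]$ and the polynomials $p_n(x) = \sum_{k=0}^{n} x^k/k!$, and we approximate the supremum on a uniform grid. The sequence $(p_n)$ is Cauchy, but its uniform limit is the exponential function, which is not a polynomial.

```python
import math

def exp_partial_sum(n):
    """Coefficients of p_n(x) = sum_{k=0}^{n} x^k / k!, a polynomial in R[x]."""
    return [1.0 / math.factorial(k) for k in range(n + 1)]

def evaluate(coeffs, x):
    """Evaluate a polynomial from its coefficient list using Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result

def sup_distance(n, m, grid_size=1000):
    """Grid approximation of the sup-norm distance of p_n and p_m on [0, 1]."""
    pn, pm = exp_partial_sum(n), exp_partial_sum(m)
    return max(
        abs(evaluate(pn, i / grid_size) - evaluate(pm, i / grid_size))
        for i in range(grid_size + 1)
    )
```

Since all coefficients are positive, the distance between $p_n$ and $p_m$ is attained at $x = 1$ and equals $\sum_{k=n+1}^{m} 1/k!$, which tends to $0$ as $n \to \infty$.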
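Exercise 2.27. Let $C_c(\mathbb{R})$ be the vector space of continuous functions $f : \mathbb{R} \to \mathbb{R}$ with $\operatorname{Supp}(f) = \overline{\{x \in \mathbb{R} \mid f(x) \neq 0\}}$ compact, with the norm $\|\cdot\|_\infty$. Show that this space is not complete, and find a Banach space containing $C_c(\mathbb{R})$ so that the induced norm obtained by restriction is $\|\cdot\|_\infty$. Can you do the same for the norm $\|f\|_\rho = \|f\rho\|_\infty$, where $\rho : \mathbb{R} \to \mathbb{R}_{>0}$ is a fixed continuous function (for example, $\rho(x) = e^{x}$)?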
2.2.3 Non-Compactness of the Unit Ball

Many properties of finite-dimensional Banach spaces are consequences of the Heine-Borel theorem, which implies that the closed unit ball in a finite-dimensional Banach space is compact. Correspondingly, many interesting problems in infinite-dimensional Banach spaces are a consequence of the opposite phenomenon.

Proposition 2.28 (Non-compactness of the unit ball). If a normed vector space $(V, \|\cdot\|)$ is not finite-dimensional, then the closed unit ball $\overline{B_1^V}$ in $V$ is not compact.

[Footnote: The proofs in the section could be skipped: the result is negative and will only be used in a much more concrete situation where the proof is a simple exercise.]

Proof. It is enough to construct a sequence $(v_n)$ in $V$ with

$$\|v_n\| = 1 \qquad (2.9)$$

for all $n \geq 1$ and with

$$\|v_m - v_n\| \geq \tfrac{1}{2} \qquad (2.10)$$

for all $m \neq n$ (for then such a sequence has no Cauchy subsequence, and therefore no convergent subsequence). Choose $v_1 \in V$ with norm $1$ (this is always possible by homogeneity). Suppose that we have already found $v_1, \ldots, v_k \in V$ with (2.9) and (2.10) for $1 \leq n \leq k$ and $1 \leq m \neq n \leq k$. The subspace

$$W = \langle v_1, \ldots, v_k \rangle$$

is finite-dimensional, and therefore complete with respect to the induced norm (see Proposition 2.8). Thus $W$ is a closed subspace, and so we may consider the quotient norm $\|\cdot\|_{V/W}$ on the normed vector space $V/W$ as in Lemma 2.15. Since $V$ is not finite-dimensional, the quotient space $V/W$ is non-trivial, and so we may choose some $v + W \in V/W$ with $v + W \neq 0$. Thus

$$d = \|v + W\|_{V/W} > 0.$$

It follows that there exists some $w \in W$ with $\|v + w\| < 2d$. Define

$$v_{k+1} = \frac{1}{\|v + w\|} (v + w),$$

so that $\|v_{k+1}\| = 1$. Also, for $1 \leq n \leq k$, we have $v_n \in W$ and so

$$\|v_{k+1} - v_n\| \geq \|v_{k+1} + W\|_{V/W} = \frac{1}{\|v + w\|} \|v + W\|_{V/W} > \frac{d}{2d} = \frac{1}{2}$$

as required. Thus by induction we obtain a sequence $(v_n)$ with the claimed properties, and hence the proposition.

Given the negative statement in Proposition 2.28, a natural question to ask is how to characterize compact subsets of a Banach space. This depends on the space concerned (see Exercise 2.29). A vague principle is that one tries to extract topological and geometrical properties of finite subsets of the Banach space, and then compact subsets are sometimes characterized by suitable versions of those properties. We will illustrate this in the next section, where we will prove the Arzelà-Ascoli theorem.
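The kind of sequence constructed in the proof of Proposition 2.28 is easy to exhibit in a concrete space. The sketch below is an illustration under an assumption of our own: we use the standard basis vectors of $\ell^2$, truncated to finitely many coordinates so that they can be stored, rather than the inductive quotient-norm construction of the proof.

```python
import math

def basis_vector(n, dim):
    """The n-th standard basis vector e_n (1-indexed), truncated to dim coordinates."""
    return [1.0 if i == n else 0.0 for i in range(1, dim + 1)]

def l2_norm(x):
    """The 2-norm of a finite coordinate list."""
    return math.sqrt(sum(t * t for t in x))

def l2_distance(x, y):
    """The 2-norm distance between two coordinate lists of equal length."""
    return l2_norm([a - b for a, b in zip(x, y)])

dim = 8  # hypothetical truncation length, chosen only for illustration
vectors = [basis_vector(n, dim) for n in range(1, dim + 1)]
pairwise = [
    l2_distance(vectors[i], vectors[j])
    for i in range(dim)
    for j in range(i + 1, dim)
]
```

Every vector has norm $1$ and any two distinct ones are at distance $\sqrt{2}$, so (2.9) and (2.10) hold and no subsequence can be Cauchy.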
Exercise 2.29. Characterize compact subsets of the following Banach spaces. (a) The space $c_0$ of null sequences (that is, sequences $(x_n)$ of scalars with $|x_n| \to 0$ as $n \to \infty$) with the norm $\|(x_n)\|_\infty = \sup_{n \geq 1} |x_n| = \max_{n \geq 1} |x_n|$. (b) The space $\ell^p$ of $p$-summable sequences of scalars with $p \in [1, \infty)$. That is,

$$\ell^p = \Big\{ (x_n) \ \Big|\ \sum_{n=1}^{\infty} |x_n|^p < \infty \Big\}$$

with the $p$-norm

$$\|(x_n)\|_p = \Big( \sum_{n=1}^{\infty} |x_n|^p \Big)^{1/p}.$$
2.3 The space of continuous functions

2.3.1 The Arzelà-Ascoli theorem

To illustrate the failure of compactness of the closed unit ball in a Banach space, we now discuss the Banach space of continuous functions $C(X)$ on a compact metric space $(X, d)$. A subset $K \subseteq C(X)$ is said to be equicontinuous if for every $\epsilon > 0$ there is a $\delta > 0$ such that

$$d(x, y) < \delta \implies |f(x) - f(y)| < \epsilon$$

for all $x, y \in X$ and $f \in K$. The key uniformity here is that a single $\delta$ may be used for all the functions $f \in K$. In the spirit of the discussion after the proof of Proposition 2.28, notice that for a finite set $K$ this uniformity is trivial.
Exercise 2.30. Recall that a function $f : X \to \mathbb{R}$ on a metric space $(X, d)$ is uniformly continuous if for any $\epsilon > 0$ there is some $\delta > 0$ for which $d(x_1, x_2) < \delta \implies |f(x_1) - f(x_2)| < \epsilon$ for all $x_1, x_2 \in X$. Show that any continuous function on a compact metric space is uniformly continuous.
Theorem 2.31 (Arzelà-Ascoli). Let $(X, d)$ be a compact metric space, and let $C(X)$ be the Banach space of continuous (real- or complex-valued) functions on $X$. A subset $K \subseteq C(X)$ is compact if and only if $K$ is closed, bounded, and equicontinuous.

Proof. Suppose that $K \subseteq C(X)$ is compact. Then it is closed and bounded (if it is not closed or not bounded, then it is easy to write down a sequence with no subsequence converging to an element of $K$). We will now show that it is also equicontinuous. Fix $\epsilon > 0$, then there exist finitely many functions $f_1, \ldots, f_n \in K$ such that

$$K \subseteq \bigcup_{i=1}^{n} B_\epsilon(f_i) \qquad (2.11)$$

by compactness, since

$$K \subseteq \bigcup_{f \in K} B_\epsilon(f)$$

is an open cover of $K$. Each $f_i$ is continuous and, since $X$ is compact, each $f_i$ is also uniformly continuous by Exercise 2.30. Since the family $\{f_i\}$ is finite, we can conclude that there is a $\delta > 0$ with

$$d(x, y) < \delta \implies |f_i(x) - f_i(y)| < \epsilon \qquad (2.12)$$

for $i = 1, \ldots, n$. We now combine (2.11) and (2.12) for the given value of $\epsilon > 0$. Fix some $f \in K$. By (2.11), there exists some $i$ with $\|f - f_i\|_\infty < \epsilon$. If $x, y \in X$ have $d(x, y) < \delta$, then

$$|f(x) - f(y)| \leq \underbrace{|f(x) - f_i(x)|}_{< \epsilon} + \underbrace{|f_i(x) - f_i(y)|}_{< \epsilon \text{ by (2.12)}} + \underbrace{|f_i(y) - f(y)|}_{< \epsilon} < 3\epsilon,$$

showing equicontinuity.
Now suppose that $K \subseteq C(X)$ is closed, bounded, and equicontinuous. To show that $K$ is compact with respect to $\|\cdot\|_\infty$, let $(f_n)$ be an arbitrary sequence in $K$. It will be enough to exhibit a Cauchy subsequence of $(f_n)$, since by Example 2.19(3) such a subsequence will converge in $C(X)$, and by our assumption that $K$ is closed the limit will be in $K$.

First notice that $X$ contains a dense countable subset $D$ since $X$ is a compact metric space. [Footnote: A topological space is separable if it contains a dense countable subset. The argument given here shows that a compact metric space is separable.] In fact, for every $m \in \mathbb{N}$ we have

$$X \subseteq \bigcup_{x \in X} B_{1/m}(x),$$

which implies that

$$X \subseteq \bigcup_{i=1}^{a(m)} B_{1/m}(x_{m,i}) \qquad (2.13)$$

for some $x_{m,i} \in X$, so that the set $D = \{x_{m,i} \mid m \in \mathbb{N}, i = 1, \ldots, a(m)\}$ is countable and dense. Next notice that by our assumption on $K$ there is some $M$ with

$$\|f_n\|_\infty \leq M$$

for all $n \geq 1$. Let us write $I_M = \overline{B^{\mathbb{R}}(M)} = [-M, M] \subseteq \mathbb{R}$ or $I_M = \overline{B^{\mathbb{C}}(M)} \subseteq \mathbb{C}$ depending on whether the field of scalars is $\mathbb{R}$ or $\mathbb{C}$. Then by Tychonoff's theorem (Theorem A.17) $I_M^D$ is a compact metric space with respect to the product topology. Define $\phi_n \in I_M^D$ by $\phi_n = f_n|_D$. By compactness of $I_M^D$ there exists a subsequence $(\phi_{n_k})$ which converges in $I_M^D$. This convergence is precisely the statement that $f_{n_k}(x) \to \phi(x)$ as $k \to \infty$ for all $x \in D$ and some function $\phi : D \to I_M$. Note, however, that at this point no uniformity of the convergence is known.

We now upgrade the argument above to give the desired statement that $(f_{n_k})$ is a Cauchy sequence. Fix $\epsilon > 0$. Then there exists some $\delta > 0$ with

$$d(x, y) < \delta \implies |f_{n_k}(y) - f_{n_k}(x)| < \epsilon \qquad (2.14)$$

for all $k \geq 1$ (this is possible by equicontinuity of $K$). We may assume that $\delta = \frac{1}{m}$ for some $m \in \mathbb{N}$. Since $f_{n_k}(x_{m,i}) \to \phi(x_{m,i})$ for $i = 1, \ldots, a(m)$, each of the sequences $(f_{n_k}(x_{m,i}))$ in $I_M$ (with $k$ varying) is a Cauchy sequence. Since $m$ is fixed, there are only finitely many sequences concerned, so there exists some $N(\epsilon)$ such that $k, \ell \geq N(\epsilon)$ implies that

$$|f_{n_k}(x_{m,i}) - f_{n_\ell}(x_{m,i})| < \epsilon \qquad (2.15)$$

for $i = 1, \ldots, a(m)$. Now we combine (2.14) and (2.15) as follows. Given $x \in X$, by (2.13) there is some $i \in \{1, \ldots, a(m)\}$ with $d(x, x_{m,i}) < \delta = \frac{1}{m}$. Therefore, for $k, \ell \geq N(\epsilon)$,

$$|f_{n_k}(x) - f_{n_\ell}(x)| \leq \underbrace{|f_{n_k}(x) - f_{n_k}(x_{m,i})|}_{< \epsilon \text{ by (2.14)}} + \underbrace{|f_{n_k}(x_{m,i}) - f_{n_\ell}(x_{m,i})|}_{< \epsilon \text{ by (2.15)}} + \underbrace{|f_{n_\ell}(x_{m,i}) - f_{n_\ell}(x)|}_{< \epsilon \text{ by (2.14)}} < 3\epsilon.$$

Thus $\|f_{n_k} - f_{n_\ell}\|_\infty < 3\epsilon$ for all $k, \ell \geq N(\epsilon)$, showing that the subsequence is Cauchy as required.
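The role of equicontinuity in Theorem 2.31 can be seen in a standard example that is not taken from the notes: the functions $f_n(x) = x^n$ on $[0,1]$ form a bounded subset of $C([0,1])$ that is not equicontinuous, and the sequence has no uniformly Cauchy subsequence. The following sketch estimates the relevant quantities on a grid, so the suprema are only approximated.

```python
def f(n, x):
    """The function f_n(x) = x^n on [0, 1]."""
    return x ** n

def sup_gap(n, grid_size=100000):
    """Grid approximation of ||f_n - f_{2n}||_inf on [0, 1]."""
    return max(
        f(n, i / grid_size) - f(2 * n, i / grid_size)
        for i in range(grid_size + 1)
    )

def oscillation_near_one(n, delta):
    """The difference |f_n(1) - f_n(1 - delta)| over a short interval near 1."""
    return abs(f(n, 1.0) - f(n, 1.0 - delta))
```

Writing $t = x^n$, the gap $x^n - x^{2n} = t - t^2$ has maximum $\frac{1}{4}$ for every $n$, so $\|f_n - f_{2n}\|_\infty$ does not tend to $0$. Likewise, for fixed $\delta$ the oscillation of $f_n$ on $[1 - \delta, 1]$ approaches $1$ as $n$ grows, so no single $\delta$ works for all $n$.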
Exercise 2.32. Prove the Arzelà-Ascoli theorem for any compact space (that is, without assuming that the space is a metric space). To do this, define a set $K \subseteq C(X)$ to be equicontinuous if for every $\epsilon > 0$ and every $x \in X$ there exists a neighborhood $U$ of $x$ with $|f(y) - f(x)| < \epsilon$ for all $y \in U$ and all $f \in K$.

Exercise 2.33. Extend the Arzelà-Ascoli theorem to the space $C_0(X)$ of continuous functions vanishing at infinity with the uniform norm $\|f\|_\infty = \sup_{x \in X} |f(x)|$, where $X$ is a locally compact metric (or locally compact non-metric) space.
2.3.2 The Stone-Weierstrass Theorem

Let $X$ be a compact topological space. We now prove a useful criterion for a subset of functions to be dense in $C(X)$. However, for this we will need to distinguish between the space $C_{\mathbb{R}}(X)$ of real-valued, and the space $C_{\mathbb{C}}(X)$ of complex-valued, continuous functions on $X$.

Theorem 2.34 (Stone-Weierstrass). Let $(X, \mathcal{T})$ be a compact topological space.
(a) Suppose that $A \subseteq C_{\mathbb{R}}(X)$ is a collection of functions that satisfy the following properties:
• $A$ is a subalgebra, meaning that $A$ is a linear subspace of $C_{\mathbb{R}}(X)$ and $fg \in A$ for any $f, g \in A$ (algebra);
• the constant function $\mathbb{1} \in A$ (constants);
• the algebra $A$ separates points: for any $x, y \in X$ with $x \neq y$, there is some function $f \in A$ with $f(x) \neq f(y)$.
Then $A$ is dense in $C_{\mathbb{R}}(X)$ with respect to $\|\cdot\|_\infty$.
(b) Suppose that $A \subseteq C_{\mathbb{C}}(X)$ satisfies all of the properties in (a) and, in addition,
• $A$ is closed under conjugation, meaning that if $f \in A$ then $\overline{f} \in A$ (complex conjugation).
Then $A$ is dense in $C_{\mathbb{C}}(X)$ with respect to $\|\cdot\|_\infty$.
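We will start the proof of Theorem 2.34 with the following lemma, and will write $\overline{A}$ for the closure of $A$ with respect to $\|\cdot\|_\infty$.

Lemma 2.35. Let $A \subseteq C_{\mathbb{R}}(X)$ be a subalgebra. Then $\overline{A}$ is also a subalgebra, and $|f|, \max\{f, g\}, \min\{f, g\} \in \overline{A}$ for any $f, g \in \overline{A}$.

Proof. It is easy to check that the operations $(f, g) \mapsto f + \alpha g$ and $(f, g) \mapsto fg$ are continuous for $f, g \in \overline{A}$ and $\alpha \in \mathbb{R}$. [Footnote: We will check this in greater generality in Section 2.4.3.] Therefore $\overline{A}$ is also an algebra. Recall that

$$\sqrt{1 + u} = (1 + u)^{1/2} = \sum_{n=0}^{\infty} \binom{1/2}{n} u^n$$

is a power series with radius of convergence $1$. In particular, for any $\delta > 0$ this series converges uniformly for $|u| \leq 1 - \delta$. Suppose now that $f \in \overline{A}$, $\epsilon > 0$, and $M = \|f\|_\infty$. Then the function

$$g_\epsilon = \frac{1}{M^2 + \epsilon} (f^2 + \epsilon)$$

takes on values in $[\epsilon/(M^2 + \epsilon), 1]$, and so

$$\sum_{n=0}^{\infty} \Big| \binom{1/2}{n} \Big| \, \|g_\epsilon - 1\|_\infty^n < \infty,$$

which implies that

$$\sqrt{g_\epsilon} = (1 + (g_\epsilon - 1))^{1/2} = \sum_{n=0}^{\infty} \binom{1/2}{n} (g_\epsilon - 1)^n$$

converges with respect to $\|\cdot\|_\infty$ by Example 2.19(3) and Lemma 2.22. We deduce that $\sqrt{f^2 + \epsilon} \in \overline{A}$. Now

$$\Big| \sqrt{f^2 + \epsilon} - |f| \Big| = \sqrt{f^2 + \epsilon} - \sqrt{f^2} = \frac{(f^2 + \epsilon) - f^2}{\sqrt{f^2 + \epsilon} + \sqrt{f^2}} \leq \sqrt{\epsilon},$$

so the fact that $\sqrt{f^2 + \epsilon} \in \overline{A}$ for all $\epsilon > 0$ implies that $|f| \in \overline{A}$ also. The identities

$$\max\{f, g\} = \tfrac{1}{2}(f + g) + \tfrac{1}{2}|f - g|$$

and

$$\min\{f, g\} = \tfrac{1}{2}(f + g) - \tfrac{1}{2}|f - g|$$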
give the other parts of the lemma.

Proof of Theorem 2.34. We start with the real case of an algebra $A \subseteq C_{\mathbb{R}}(X)$. Notice that by Lemma 2.35 the algebra $\overline{A}$ is closed under taking finitely many maxima or minima: if $f_1, \ldots, f_n \in \overline{A}$ then $\max\{f_1, \ldots, f_n\}, \min\{f_1, \ldots, f_n\} \in \overline{A}$. We will use this property for a given $f \in C_{\mathbb{R}}(X)$ and $\epsilon > 0$ to find a function $f_\epsilon \in \overline{A}$ with $\|f - f_\epsilon\|_\infty < \epsilon$. This then implies that $\overline{A} = C_{\mathbb{R}}(X)$. The construction has three steps.

First step: achieving the correct value at two points. Let $x_0, x \in X$ be (not necessarily distinct) points. Then there exists some $h_{x_0,x} \in A$ with

$$h_{x_0,x}(x_0) = f(x_0), \qquad h_{x_0,x}(x) = f(x). \qquad (2.16)$$

Indeed, if $x_0 = x$ then we simply take $h_{x_0,x} = f(x_0)\mathbb{1} \in A$. If $x \in X \setminus \{x_0\}$ we know that $A$ contains a function $h \in A$ with $h(x) \neq h(x_0)$ since the algebra separates points. In this case, we may find a linear combination $h_{x_0,x}$ of $h \in A$ and the constant function $\mathbb{1} \in A$ with the desired property.

Second step: correct value at one point with a function nowhere much smaller. Let $x_0 \in X$. As our next step we claim that there exists a function $g_{x_0} \in \overline{A}$ with

$$g_{x_0}(x_0) = f(x_0), \qquad g_{x_0}(y) > f(y) - \epsilon \text{ for all } y \in X. \qquad (2.17)$$

That is, $g_{x_0}$ is chosen to have the correct value at $x_0$ for the objective of approximating $f$, and to be not much smaller than $f$ at every other point, as illustrated in Figure 2.1. We will construct $g_{x_0}$ as a maximum after finding a finite subcover for the following open cover of $X$. For any $x \in X$ (including $x_0$) there exists an open neighborhood $O_x$ of $x$ with

$$y \in O_x \implies h_{x_0,x}(y) > f(y) - \epsilon, \qquad (2.18)$$

where $h_{x_0,x} \in A$ is as in (2.16). This defines an open cover $\{O_x \mid x \in X\}$ of $X$. By compactness there exists some finite subcover

$$X = O_{x_1} \cup \cdots \cup O_{x_n}. \qquad (2.19)$$

We define

$$g_{x_0} = \max\{h_{x_0,x_1}, \ldots, h_{x_0,x_n}\} \in \overline{A},$$

and notice that $g_{x_0}$ satisfies $g_{x_0}(x_0) = \max\{f(x_0), \ldots, f(x_0)\} = f(x_0)$ by (2.16), and by (2.19) for every $y \in X$ there is some $i \in \{1, \ldots, n\}$ with $y \in O_{x_i}$, and hence

$$g_{x_0}(y) \geq h_{x_0,x_i}(y) > f(y) - \epsilon$$

by (2.18).

Fig. 2.1. The function $g_{x_0}$ is constructed by finding $x_1, \ldots, x_n$ (in this case, $x_0$ and $x$) with the property that $g_{x_0} = \max\{h_{x_0,x_1}, \ldots, h_{x_0,x_n}\} > f - \epsilon$.

Third step: nowhere too much smaller, nowhere too much bigger. The claim (2.17) above takes care of one half of the need to find an approximation to $f$ within $\overline{A}$. For every $x \in X$ we found some $g_x \in \overline{A}$ that is nowhere much smaller than $f$, and is equal to $f$ at $x$. We now vary the point $x$, and essentially repeat the argument to find an $\epsilon$-approximation to $f$ within $\overline{A}$. Indeed, for every $x \in X$ there is an open neighborhood $U_x$ for which

$$y \in U_x \implies g_x(y) < f(y) + \epsilon. \qquad (2.20)$$

By allowing $x \in X$ to vary this gives an open cover $\{U_x \mid x \in X\}$ of $X$, and once again by compactness there is a finite subcover

$$X = U_{x_1} \cup \cdots \cup U_{x_m}. \qquad (2.21)$$

We define $f_\epsilon = \min\{g_{x_1}, \ldots, g_{x_m}\} \in \overline{A}$, and claim that $\|f_\epsilon - f\|_\infty < \epsilon$, as illustrated in Figure 2.2. For every $y \in X$ we have

$$g_{x_i}(y) > f(y) - \epsilon$$

by the property of $g_{x_i}$ in (2.17), and so $f_\epsilon(y) > f(y) - \epsilon$. By (2.21) every $y \in X$ lies in some $U_{x_i}$ and so (2.20) implies that

$$f_\epsilon(y) \leq g_{x_i}(y) < f(y) + \epsilon.$$

Since $f_\epsilon \in \overline{A}$ and $\epsilon > 0$ was arbitrary, we deduce that $f \in \overline{A}$.

Fig. 2.2. The function $f_\epsilon = \min\{g_{x_1}, \ldots, g_{x_m}\}$ is constructed to have $\|f - f_\epsilon\|_\infty < \epsilon$.

In the case of a complex subalgebra $A \subseteq C_{\mathbb{C}}(X)$ that is closed under conjugation we may consider $A_{\mathbb{R}} = A \cap C_{\mathbb{R}}(X)$. This is again a subalgebra that separates points if $A$ separates points. Indeed, if $x, y \in X$ have $x \neq y$ then there is (by the assumption on $A$) some $f \in A$ with $f(x) \neq f(y)$. Let $u = \Re(f)$ and $v = \Im(f)$, so that

$$u = \frac{f + \overline{f}}{2} \in A_{\mathbb{R}}, \qquad v = \frac{f - \overline{f}}{2i} \in A_{\mathbb{R}}$$

by our assumption on $A$. Since $f = u + iv$ takes different values at $x$ and $y$, at least one of $u$ and $v$ does too, and thus $A_{\mathbb{R}}$ also contains a function that separates $x$ and $y$. By the real case, $A_{\mathbb{R}}$ is dense in $C_{\mathbb{R}}(X)$, so by splitting an arbitrary function in $C_{\mathbb{C}}(X)$ into real and imaginary parts and approximating each of these with elements of $A_{\mathbb{R}} \subseteq A$ the theorem is proved.
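Applied to the algebra of polynomials on $[0,1]$, Theorem 2.34 gives the classical Weierstrass approximation theorem. The sketch below illustrates this special case numerically. It does not follow the three-step construction of the proof above; instead it uses Bernstein polynomials, a different classical and explicit approximation scheme, and the target function $|x - \tfrac12|$ and the grid are hypothetical choices made for illustration.

```python
import math

def bernstein(f, n, x):
    """Value at x of the n-th Bernstein polynomial of f on [0, 1]."""
    return sum(
        f(k / n) * math.comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

def target(x):
    """A continuous but non-differentiable function on [0, 1]."""
    return abs(x - 0.5)

def sup_error(n, grid_size=200):
    """Grid approximation of the sup-norm error of the n-th Bernstein polynomial."""
    return max(
        abs(bernstein(target, n, i / grid_size) - target(i / grid_size))
        for i in range(grid_size + 1)
    )
```

Comparing `sup_error(50)` with `sup_error(200)` shows the uniform error shrinking as the degree grows, as density of the polynomials in $C_{\mathbb{R}}([0,1])$ requires.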
Exercise 2.36. Let $X$ be a locally compact space. Extend the Stone-Weierstrass theorem to $C_0(X)$ by considering a subalgebra $A \subseteq C_0(X)$ that separates points.

Exercise 2.37. Define for every $s > 0$ a norm

$$\|p\|_s = \|p\|_{C([0,s])} = \sup_{x \in [0,s]} |p(x)|$$

on $\mathbb{R}[x]$. Show that $\|\cdot\|_s$ and $\|\cdot\|_t$ are inequivalent norms if $s \neq t$.
2.3.3 Continuous Functions in $L^p$ Spaces

Another important feature of continuous functions (of compact support) is that they form a dense subset of the $L^p$ spaces.

Proposition 2.38 (Density of $C_c(X)$ in $L^p_\mu(X)$). Let $X$ be a locally compact $\sigma$-compact metric space equipped with a locally finite measure $\mu$ on the Borel $\sigma$-algebra $\mathcal{B}(X)$. Then, for any $p \in [1, \infty)$, $C_c(X)$ is dense in $L^p_\mu(X)$.

[Footnote: The reader may be familiar with this result for the Lebesgue measure (for example), and this case is sufficient for much of the material that will follow. Thus the reader may skip the general proof and return to it at a later stage if needed.]

Proof. We start with some preparatory observations. Fix $p \in [1, \infty)$ and let $f \in L^p_\mu(X)$. Then

$$f = \Re f + i \Im f,$$

and it is enough to show that each of $\Re f$ and $\Im f$ can be approximated in $L^p$ by elements of $C_c(X)$. We may therefore assume that $f$ is real-valued, and by writing $f = f^+ - f^-$ we may also assume that it takes values in $[0, \infty)$. Now notice that such a real-valued, non-negative function $f$ is the pointwise limit of the simple functions

$$f_n(x) = \min\Big\{ n, \frac{1}{2^n} \lfloor 2^n f(x) \rfloor \Big\} \leq f(x)$$

as $n \to \infty$, which implies that

$$\|f_n - f\|_p = \Big( \int_X |f - f_n|^p \, d\mu \Big)^{1/p} \longrightarrow 0$$

as $n \to \infty$, by dominated convergence. Thus it is sufficient to show that any simple function $f = \sum_{i=1}^{N} a_i \mathbb{1}_{B_i}$ (where $a_i \in \mathbb{R}$ and $B_i \in \mathcal{B}(X)$ have $\mu(B_i) < \infty$ for $i = 1, \ldots, N$) can be approximated by elements of $C_c(X)$. This in turn will follow if we can show that the characteristic function of any Borel set can be approximated by elements of $C_c(X)$ in the $\|\cdot\|_p$ norm.

Having made these initial reductions, we can now turn to the heart of the argument. Fix a sequence $(X_m)$ of open subsets of $X$ with $\overline{X_m}$ compact and $X_m \subseteq X_{m+1}$ for all $m \geq 1$ and with

$$X = \bigcup_{m=1}^{\infty} X_m.$$

[Footnote: The reader may also first assume that $X$ is compact and simply ignore the sequence $(X_m)$ (i.e. only work with one $X_m = X$). In the general case the sequence $(X_m)$ can be constructed as follows. First set $X_0 = \emptyset$. Since $X$ is $\sigma$-compact, there is a sequence of compact sets $(Q_n)$ with $X = \bigcup_{n=1}^{\infty} Q_n$. Fix $n \geq 1$. By assumption every point $x \in Q_n$ has an open neighborhood $U_x$ with compact closure. By compactness, we get $Q_n \subseteq X_n = X_{n-1} \cup U_{x_1} \cup \cdots \cup U_{x_{m(n)}}$ for finitely many points $x_1, \ldots, x_{m(n)} \in Q_n$. This sequence satisfies all required properties.]

Using this sequence, we define the family

$$\mathcal{A} = \big\{ B \in \mathcal{B} \mid \mathbb{1}_{B \cap X_m} \in \overline{C_c(X)} \text{ for all } m \geq 1 \big\}$$

of all Borel sets whose characteristic function can be approximated by elements of $C_c(X)$ once restricted to any of the sets $X_m$. We claim that $\mathcal{A} = \mathcal{B}$, and will prove this by showing that $\mathcal{A}$ contains any open subset of $X$, and $\mathcal{A}$ is a $\sigma$-algebra.

Open subsets: Let $O \subseteq X$ be open, and fix $m \geq 1$. Define the closed set $A = X \setminus (X_m \cap O)$ and the distance function

$$d(x, A) = \inf_{y \in A} d(x, y). \qquad (2.22)$$

This distance function satisfies

$$|d(x_1, A) - d(x_2, A)| \leq d(x_1, x_2). \qquad (2.23)$$

Indeed, for $\epsilon > 0$ there exists some $y \in A$ for which $d(x_2, y) \leq d(x_2, A) + \epsilon$, and so

$$d(x_1, y) \leq d(x_1, x_2) + d(x_2, y) \leq d(x_1, x_2) + d(x_2, A) + \epsilon,$$

which implies that

$$d(x_1, A) \leq d(x_1, x_2) + d(x_2, A),$$

and hence (2.23) by the symmetry between $x_1$ and $x_2$. This shows continuity and since clearly $\operatorname{Supp}(d(\cdot, A)) \subseteq \overline{X_m}$ is compact by construction, it follows that $f_n(\cdot) = \min\{1, n\, d(\cdot, A)\} \in C_c(X)$. Moreover, if $x \in A = X \setminus (X_m \cap O)$ then $f_n(x) = \mathbb{1}_{X_m \cap O}(x) = 0$, while if $x \in X_m \cap O$ then $d(x, A) > 0$ and $f_n(x) \nearrow 1 = \mathbb{1}_{X_m \cap O}(x)$. Thus in either case $f_n \to \mathbb{1}_{X_m \cap O}$ as $n \to \infty$ on $X$, and so

$$\|f_n - \mathbb{1}_{X_m \cap O}\|_p = \Big( \int_{X_m \cap O} |f_n - \mathbb{1}_{X_m \cap O}|^p \, d\mu \Big)^{1/p} \longrightarrow 0$$
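as $n \to \infty$ by dominated convergence. As this holds for all $m \geq 1$, this shows that $O \in \mathcal{A}$ by definition. In particular, $X \in \mathcal{A}$.

Intersections: Suppose that $A, B \in \mathcal{A}$ and $m \geq 1$. Then there exist sequences of functions $(f_n)$, $(g_n)$ in $C_c(X)$ with

$$\|f_n - \mathbb{1}_{X_m \cap A}\|_p \longrightarrow 0 \quad \text{and} \quad \|g_n - \mathbb{1}_{X_m \cap B}\|_p \longrightarrow 0$$

as $n \to \infty$. We may assume that $f_n$ and $g_n$ take on values in $[0, 1]$, for if not we can replace $f_n$ by $\tilde{f}_n = \max\{0, \min\{1, f_n\}\}$, and $g_n$ by $\tilde{g}_n$ similarly. Then $f_n g_n \in C_c(X)$ and

$$f_n g_n - \mathbb{1}_{X_m \cap (A \cap B)} = f_n g_n - \mathbb{1}_{X_m \cap A} \mathbb{1}_{X_m \cap B} = (f_n - \mathbb{1}_{X_m \cap A})\, g_n + \mathbb{1}_{X_m \cap A}\, (g_n - \mathbb{1}_{X_m \cap B}),$$

so

$$\|f_n g_n - \mathbb{1}_{X_m \cap (A \cap B)}\|_p \leq \|f_n - \mathbb{1}_{X_m \cap A}\|_p \, \|g_n\|_\infty + \|g_n - \mathbb{1}_{X_m \cap B}\|_p \longrightarrow 0$$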
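as $n \to \infty$. As this holds for all $m \geq 1$, this implies that $A \cap B \in \mathcal{A}$ as desired.

Complements: Suppose that $A \in \mathcal{A}$ and fix $m \geq 1$. Then there exists a sequence of functions $(f_n)$ in $C_c(X)$ with $\|f_n - \mathbb{1}_{X_m \cap A}\|_p \to 0$ as $n \to \infty$. There also exists a sequence $(g_n)$ in $C_c(X)$ with $\|g_n - \mathbb{1}_{X_m}\|_p \to 0$ as $n \to \infty$ since $X \in \mathcal{A}$. Thus

$$\|g_n - f_n - \mathbb{1}_{X_m \cap (X \setminus A)}\|_p = \|g_n - f_n - (\mathbb{1}_{X_m} - \mathbb{1}_{X_m \cap A})\|_p \leq \|g_n - \mathbb{1}_{X_m}\|_p + \|f_n - \mathbb{1}_{X_m \cap A}\|_p \longrightarrow 0$$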
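shows that $X \setminus A \in \mathcal{A}$.

Finite unions: Let $A, B \in \mathcal{A}$. Then $X \setminus A \in \mathcal{A}$ and $B \setminus A \in \mathcal{A}$ by the two steps above. For each $m \geq 1$ choose sequences $(f_n)$ and $(g_n)$ in $C_c(X)$ with $\|f_n - \mathbb{1}_{X_m \cap A}\|_p \to 0$ and $\|g_n - \mathbb{1}_{X_m \cap (B \setminus A)}\|_p \to 0$ as $n \to \infty$. Then

$$\|f_n + g_n - \mathbb{1}_{X_m \cap (A \cup B)}\|_p = \|f_n + g_n - (\mathbb{1}_{X_m \cap A} + \mathbb{1}_{X_m \cap (B \setminus A)})\|_p \leq \|f_n - \mathbb{1}_{X_m \cap A}\|_p + \|g_n - \mathbb{1}_{X_m \cap (B \setminus A)}\|_p \longrightarrow 0$$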
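as $n \to \infty$. Thus $A \cup B \in \mathcal{A}$, and by induction this extends to finite unions.

Countable unions: Now suppose that $A_1, A_2, \ldots$ all lie in $\mathcal{A}$, and fix $m \geq 1$ and $\epsilon > 0$. Then

$$\mu\Big( X_m \cap \bigcup_{k=1}^{\infty} A_k \Big) = \lim_{\ell \to \infty} \mu\Big( X_m \cap \bigcup_{k=1}^{\ell} A_k \Big).$$

By assumption, $\mu(X_m) < \infty$, and so there exists some $\ell$ so that

$$\mu\Big( X_m \cap \Big( \bigcup_{k=1}^{\infty} A_k \setminus \bigcup_{k=1}^{\ell} A_k \Big) \Big) < \epsilon.$$

Thus

$$\Big\| \mathbb{1}_{X_m \cap \bigcup_{k=1}^{\infty} A_k} - \mathbb{1}_{X_m \cap \bigcup_{k=1}^{\ell} A_k} \Big\|_p = \mu\Big( X_m \cap \Big( \bigcup_{k=1}^{\infty} A_k \setminus \bigcup_{k=1}^{\ell} A_k \Big) \Big)^{1/p} < \epsilon^{1/p}.$$

However, since $\bigcup_{k=1}^{\ell} A_k \in \mathcal{A}$ for any $\ell \geq 1$, we already know that there exists some $f \in C_c(X)$ with

$$\Big\| f - \mathbb{1}_{X_m \cap \bigcup_{k=1}^{\ell} A_k} \Big\|_p < \epsilon,$$

and so

$$\Big\| f - \mathbb{1}_{X_m \cap \bigcup_{k=1}^{\infty} A_k} \Big\|_p < \epsilon^{1/p} + \epsilon.$$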
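Since $\epsilon > 0$ and $m \geq 1$ were arbitrary, we deduce that $\bigcup_{k=1}^{\infty} A_k \in \mathcal{A}$.

Concluding the proof: By the arguments above, $\mathcal{A}$ is a $\sigma$-algebra containing all the open subsets of $X$. By definition, $\mathcal{A} \subseteq \mathcal{B}$ and so $\mathcal{A} = \mathcal{B}$ by definition of the Borel $\sigma$-algebra $\mathcal{B}$. If $B \in \mathcal{B}$ has finite measure, then $\mathbb{1}_B \in L^p_\mu(X)$ and for every $\epsilon > 0$ there exists some $m \geq 1$ with $\mu(B \setminus X_m) < \epsilon$ so that

$$\|\mathbb{1}_{B \cap X_m} - \mathbb{1}_B\|_p < \epsilon^{1/p}.$$

Choosing $f \in C_c(X)$ with $\|f - \mathbb{1}_{X_m \cap B}\|_p < \epsilon$, we again get

$$\|f - \mathbb{1}_B\|_p < \epsilon^{1/p} + \epsilon.$$

Therefore, every simple function (and hence, by the reduction argument at the beginning of the proof, every function in $L^p_\mu(X)$) lies in the $\|\cdot\|_p$-closure of $C_c(X)$.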
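The cut-off functions $f_n = \min\{1, n\, d(\cdot, A)\}$ from the step on open subsets can be examined in the simplest case. The sketch below assumes, for illustration only, that $X = \mathbb{R}$ with Lebesgue measure and $O = (0,1)$, and it approximates the integral by a midpoint rule. For $n \geq 2$ a direct computation gives $\|f_n - \mathbb{1}_O\|_p^p = \frac{2}{n(p+1)}$.

```python
def dist_to_complement(x):
    """d(x, A) where A is the complement of O = (0, 1) in R."""
    return max(0.0, min(x, 1.0 - x))

def f_n(n, x):
    """The cut-off function min{1, n d(x, A)} used in the proof."""
    return min(1.0, n * dist_to_complement(x))

def lp_error(n, p, cells=100000):
    """Midpoint-rule approximation of ||f_n - 1_O||_p for Lebesgue measure.

    Outside (0, 1) both functions vanish, so we only integrate over [0, 1].
    """
    h = 1.0 / cells
    total = sum((1.0 - f_n(n, (i + 0.5) * h)) ** p for i in range(cells)) * h
    return total ** (1.0 / p)
```

The error decays like $n^{-1/p}$, so the continuous functions $f_n$ converge to the discontinuous function $\mathbb{1}_O$ in $L^p$ even though they cannot converge to it uniformly.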
2.4 Bounded Operators and Functionals

Just as in linear algebra, linear maps are of fundamental importance in functional analysis. However, in infinite-dimensional normed vector spaces continuity of linear maps is not guaranteed. We will use familiar notions from the theory of metric spaces. A map $f : V \to W$ between normed spaces is continuous at $v_0 \in V$ if for any $\epsilon > 0$ there is a $\delta > 0$ such that $\|v - v_0\| < \delta \implies \|f(v) - f(v_0)\| < \epsilon$; is continuous if it is continuous at each point, and is an isometry if $\|f(v)\| = \|v\|$ for all $v \in V$.

Lemma 2.39 (Continuity and Boundedness). Let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be normed vector spaces and let $L : V \to W$ be a linear map. Then $L$ is continuous if and only if the operator norm

$$\|L\| = \|L\|_{\mathrm{operator}} = \sup_{v \in V,\ \|v\|_V \leq 1} \|Lv\|_W$$

is finite.

Definition 2.40. A continuous linear map $L : V \to W$ between normed vector spaces is called a bounded linear operator. We denote the space of all bounded operators from $V$ to $W$ by $B(V, W)$. For brevity we write $B(V)$ for $B(V, V)$. If $W = \mathbb{R}$ (or $W = \mathbb{C}$ if the field of scalars is $\mathbb{C}$) then we also call $B(V, \mathbb{R}) = V^*$ (respectively $B(V, \mathbb{C}) = V^*$) the dual space of $V$, and elements of the dual space are called linear functionals.
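Lemma 2.41 (Space of operators). Let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be normed vector spaces. Then the space $B(V, W)$ of bounded linear maps from $V$ to $W$ is also a normed vector space with addition and scalar multiplication defined pointwise as in any space of functions, and with the operator norm from Lemma 2.39. If $W$ is a Banach space, then so is $B(V, W)$, and in particular $V^*$ is always a Banach space.

Proof of Lemma 2.39. The case $L = 0$ is trivial, so we may assume that $L \neq 0$. Suppose that $\|L\|_{\mathrm{operator}} < \infty$. Then for any $v_0 \in V$ and $\epsilon > 0$ we have

$$L\big( v_0 + B^V_{\epsilon / \|L\|_{\mathrm{operator}}} \big) \subseteq L(v_0) + B^W_\epsilon$$

since $v \in B^V_{\epsilon / \|L\|_{\mathrm{operator}}} \setminus \{0\}$ implies that

$$\|Lv\|_W = \|v\|_V \, \Big\| L\Big( \frac{v}{\|v\|_V} \Big) \Big\|_W \leq \|v\|_V \, \|L\|_{\mathrm{operator}} < \epsilon,$$

where we used that $\frac{v}{\|v\|_V}$ has norm one. Hence $\|L\|_{\mathrm{operator}} < \infty$ implies that $L$ is continuous.

Suppose now that $L$ is continuous. Then there exists some $\delta > 0$ such that

$$L\big( B^V_\delta \big) \subseteq B^W_1.$$

In particular, $\|v\|_V \leq 1$ implies that $\|L(\tfrac{\delta}{2} v)\|_W \leq 1$, and $\|Lv\|_W \leq \tfrac{2}{\delta}$. As this holds for all $v$ with $\|v\|_V \leq 1$, we deduce that $\|L\|_{\mathrm{operator}} \leq \tfrac{2}{\delta} < \infty$.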
Exercise 2.42. Show that the operator $I : C([0, 1]) \to C([0, 1])$ defined as the integral

$$I(f)(x) = \int_0^x f(t) \, dt$$

is continuous. Use this to shorten the argument in the proof for Example 2.19(5) on p. 48. Show also that the operator $D : C^1([0, 1]) \to C([0, 1])$ defined as the derivative $D(f) = f'$ is not continuous if we use the norm $\|\cdot\|_\infty$ on both spaces.
Notice that the definition of the operator norm immediately gives the following general inequality,

$$\|Lv\|_W \leq \|L\|_{\mathrm{operator}} \|v\|_V$$

for all $v \in V$, and the operator norm may be characterized as being the smallest number $C$ with the property that

$$\|Lv\|_W \leq C \|v\|_V \qquad (2.24)$$

for all $v \in V$. We will use both these statements frequently in the sequel without comment.
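In finite dimensions the operator norm can be computed explicitly, which gives a quick way to test the inequality above. The sketch below assumes a setting not used in the notes: $V = W = \mathbb{R}^n$ with the supremum norm, where the operator norm of a matrix is known to be its largest absolute row sum. The matrix `L` is a hypothetical example.

```python
def sup_norm(v):
    """The supremum norm of a vector in R^n."""
    return max(abs(t) for t in v)

def apply(matrix, v):
    """The matrix-vector product, i.e. the linear map v -> matrix v."""
    return [sum(a * t for a, t in zip(row, v)) for row in matrix]

def operator_norm_sup(matrix):
    """Operator norm on (R^n, sup norm): the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in matrix)

# Hypothetical example operator on R^2.
L = [[1.0, -2.0], [3.0, 0.5]]
```

Here `operator_norm_sup(L)` is $3.5$, and the vector $(1, 1)$ of norm one is mapped to $(-1, 3.5)$, so the supremum in the definition of the operator norm is attained. No smaller constant $C$ can therefore satisfy (2.24).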
Exercise 2.43. Prove that the operator norm of a bounded operator $L : V \to W$ between two normed vector spaces is the smallest constant $C \geq 0$ such that (2.24) holds for all $v \in V$.
Proof of Lemma 2.41. As indicated in the lemma, for $L_1, L_2 \in B(V, W)$ and a scalar $\alpha$ we define $\alpha L_1 + L_2$ by $(\alpha L_1 + L_2)(v) = \alpha L_1(v) + L_2(v)$ for all $v \in V$. This is clearly another linear map. In order to bound its operator norm, let $v \in V$ have $\|v\|_V \leq 1$. Then

$$\|(\alpha L_1 + L_2)(v)\|_W = \|\alpha L_1(v) + L_2(v)\|_W \leq |\alpha| \|L_1(v)\|_W + \|L_2(v)\|_W \leq |\alpha| \|L_1\|_{\mathrm{operator}} + \|L_2\|_{\mathrm{operator}},$$

and so

$$\|\alpha L_1 + L_2\|_{\mathrm{operator}} \leq |\alpha| \|L_1\|_{\mathrm{operator}} + \|L_2\|_{\mathrm{operator}}.$$

That is, the operator norm satisfies the triangle inequality and one half of the homogeneity property. The reverse inequality for homogeneity of the operator norm follows easily by considering the cases $\alpha = 0$ and $\alpha \neq 0$ separately. Strict positivity is clear, so we have shown that $B(V, W)$ is a normed vector space with the operator norm.

Now suppose that $W$ is a Banach space and that $(L_n)$ is a Cauchy sequence in $B(V, W)$. We claim that

$$L(v) = \lim_{n \to \infty} L_n(v)$$

defines an element $L$ of $B(V, W)$ which is the limit of the sequence with respect to the operator norm. To see that $L(v)$ is well-defined it is enough to check that $(L_n(v))$ is a Cauchy sequence, which follows at once from the bound

$$\|L_m(v) - L_n(v)\|_W = \|(L_m - L_n)v\|_W \leq \|L_m - L_n\|_{\mathrm{operator}} \|v\|_V,$$

which (for fixed $v$) may be made as small as we please for $m, n$ large by the Cauchy property for the sequence $(L_n)$. To see that the limit $L$ is a bounded operator one has to show that it is linear (which we leave as an exercise) and that it is bounded. For the latter, assume that $v \in V$ has $\|v\| \leq 1$ and choose $N(\epsilon)$ from the Cauchy property for $(L_n)$, so that

$$\|L_n(v) - L(v)\|_W = \lim_{m \to \infty} \underbrace{\|L_n(v) - L_m(v)\|_W}_{< \epsilon \text{ for } m \geq N(\epsilon)} \leq \epsilon$$

for $n \geq N(\epsilon)$. Taking $\epsilon = 1$ and $n = N(1)$, this shows that $L$ is bounded, and for general $\epsilon > 0$ this shows precisely that $L_n \to L$ as $n \to \infty$.

A word about notation: where the spaces concerned are clear, or where we wish to emphasize certain aspects of the spaces, we will for brevity often use $\|\cdot\|$ or $\|\cdot\|_X$ to mean the appropriate norm in that situation. Thus, for example, depending on context the symbols $\|L\|$, $\|L\|_{\mathrm{operator}}$ and $\|L\|_{B(V,W)}$ all mean the same thing. A good exercise for the reader is to ensure that they can identify the norms in each case.

Lemma 2.44 (Sub-multiplicativity of operator norms). Let $V, W, Z$ be three normed vector spaces, and let $R : V \to W$ and $S : W \to Z$ be bounded operators. Then $S \circ R : V \to Z$ is also a bounded operator, and $\|S \circ R\| \leq \|S\| \|R\|$. In particular, if $L : V \to V$ is a bounded operator then $\|L^n\| \leq \|L\|^n$ for all $n \geq 1$.

Proof. For any $v \in V$ with $\|v\| \leq 1$ we have

$$\|S \circ R(v)\| \leq \|S\| \|R(v)\| \leq \|S\| \|R\| \|v\| \leq \|S\| \|R\|.$$
Exercise 2.45. Compute the operator norm of the continuous map $f \mapsto f$ when viewed: (a) as a map $C^1([0, 1]) \to C([0, 1])$; (b) as a map $C([0, 1]) \to L^1_\lambda([0, 1])$, where $\lambda$ is Lebesgue measure on $[0, 1]$. (c) Compute the operator norm of the composition of the maps from (a) and from (b). (d) Now restrict the maps in (a), (b) and (c) to the space of functions $f$ with $f(0) = 0$, and compute the operator norms again.
The following result is both quite easy and extremely useful for the theory to come.

Proposition 2.46 (Unique extension to completion). Let $V$ be a normed vector space, let $V_0 \subseteq V$ be a dense subspace, and assume that $L_0 : V_0 \to W$ is a bounded operator into a Banach space $W$. Then $L_0$ has a unique bounded extension $L : V \to W$, that is, a bounded linear map $L \in B(V, W)$ with $L|_{V_0} = L_0$. Moreover

$$\|L\|_{B(V,W)} = \|L_0\|_{B(V_0,W)}.$$

Proof. For any $v \in V$ there is a sequence $(v_n)$ in $V_0$ with $v_n \to v$ as $n \to \infty$. In particular, this implies that $(v_n)$ is a Cauchy sequence in $V_0$, and since $L_0 : V_0 \to W$ is bounded (and so Lipschitz), it follows that $(L_0(v_n))$ is a Cauchy sequence in $W$. If $(v_n')$ is another sequence in $V_0$ with $v_n' \to v$ as $n \to \infty$ then $v_n - v_n' \to 0$ as $n \to \infty$ and so $L_0(v_n) - L_0(v_n') \to 0$ as $n \to \infty$ since $L_0$ is bounded (and so continuous at $0$). Thus it makes sense to define an operator $L$ on $V$ by

$$L(v) = \lim_{n \to \infty} L_0(v_n) \in W,$$

where the limit exists because $W$ is a Banach space. Notice that by density and the desired continuity of the extension, this is the only possible definition of a bounded operator that extends $L_0$. One can quickly check that $L$ is a linear map from $V$ to $W$. Moreover, if $v \in V$ and $(v_n)$ is a sequence in $V_0$ with $v_n \to v$ as $n \to \infty$, then

$$\|L(v)\| = \lim_{n \to \infty} \|L_0(v_n)\| \leq \|L_0\| \lim_{n \to \infty} \|v_n\| = \|L_0\| \|v\|,$$

showing that $L$ is bounded, with $\|L\| \leq \|L_0\|$. On the other hand $L|_{V_0} = L_0$, so $\|L\| \geq \|L_0\|$.
Corollary 2.47. Any two completions $B_1$ and $B_2$ of a given normed vector space $V$ are isometrically isomorphic.

Here a completion of a normed vector space $V$ is a Banach space $B$ containing an isometric dense copy of $V$, just as in the construction of the Banach space in Section 2.2.2.

Proof of Corollary 2.47. Suppose that $\phi_1 : V \to B_1$ and $\phi_2 : V \to B_2$ are isometric embeddings associated to the two completions, as illustrated in Figure 2.3.

Fig. 2.3. The two given completions $\phi_1, \phi_2$ and the maps $\Phi_1, \Phi_2$ to be constructed.

Since $\phi_1$ and $\phi_2$ are isometries, the map
\[ \Phi_0 : \phi_1(V) \to \phi_2(V) \subseteq B_2, \qquad \phi_1(v) \mapsto \phi_2(v) \]
is a well-defined bounded operator defined on a dense subset $\phi_1(V) \subseteq B_1$. By Proposition 2.46 there is an extension $\Phi_1 : B_1 \to B_2$ with norm
\[ \|\Phi_1\| = \|\phi_2 \circ \phi_1^{-1}\| = 1. \]
Similarly there exists an extension $\Phi_2 : B_2 \to B_1$ which extends $\phi_1 \circ \phi_2^{-1}$ and which also has norm $\|\Phi_2\| = 1$. It follows that $\Phi_2 \circ \Phi_1$ and $\Phi_1 \circ \Phi_2$ are extensions of the identity map on $\phi_1(V)$ and on $\phi_2(V)$ respectively. By uniqueness of the extension in Proposition 2.46 we must have $\Phi_2 \circ \Phi_1 = \mathrm{Id}_{B_1}$ and $\Phi_1 \circ \Phi_2 = \mathrm{Id}_{B_2}$. We also see that
\[ \|b\| = \|\Phi_2(\Phi_1(b))\| \le \|\Phi_1(b)\| \le \|b\| \]
for any $b \in B_1$, so that $\Phi_1$ is an isometry from $B_1$ to $B_2$ with $\Phi_2$ its inverse.

Exercise 2.48. Let $D \subseteq \mathbb{C}$ be open and bounded. Assume in addition that $\partial D$ is the union of the images of finitely many $C^1$ curves $\gamma_1, \dots, \gamma_k : [0,1] \to \mathbb{C}$, which we choose to parameterize with unit speed so that $|\gamma_i'(t)| = 1$ for $i = 1, \dots, k$ and all $t \in [0,1]$. Let $V$ be the space of all functions $f \in C(\overline{D})$ such that the complex derivative $\frac{df}{dz}$ exists and extends continuously to $\overline{D}$. Fix $p \in [1, \infty)$.
(a) Equip $V$ with the norm
\[ \|f\|_{H^p(D)} = \|f|_{\partial D}\|_{L^p(\partial D)} = \left( \sum_{i=1}^{k} \int_0^1 |f(\gamma_i(t))|^p \, dt \right)^{1/p}. \]
Show, for all $z \in D$, that the linear map $E_z : f \mapsto f(z)$ is continuous with respect to $\|\cdot\|_{L^p(\partial D)}$. Also show that if $O \subseteq D$ is open with compact closure $\overline{O} \subseteq D$, then
\[ V \ni f \mapsto f|_{\overline{O}} \in C(\overline{O}) \]
is a bounded operator with respect to $\|\cdot\|_{L^p(\partial D)}$ and $\|\cdot\|_\infty$ on $C(\overline{O})$. In particular, conclude that there exists a canonical map from the completion $H^p(D)$, known as a Hardy space, to the space of holomorphic functions on $D$.
(b) Equip $V$ with the norm
\[ \|f\|_{A^p(D)} = \|f\|_{L^p(D)}, \]
and repeat the problems from (a) to obtain the Bergman space $A^p(D)$.
(c) Let $D = B_1^{\mathbb{C}}(0)$ and describe the spaces $H^p(D)$ and $A^p(D)$ in terms of the sequence of Taylor coefficients $(a_n)$ of the Taylor expansion $f(z) = \sum_{n \ge 0} a_n z^n$ of elements of the space.

Exercise 2.49. For each of the five norms on $\mathbb{R}[x]$ given in Example 2.19(7), find a Banach space containing $\mathbb{R}[x]$ for which the induced norm obtained by restriction coincides with the given norm on $\mathbb{R}[x]$.
2.4.1 The Volterra Equation
We describe in this section how the theory above can be used to solve the Volterra equation.

Footnote: This section will justify one of the claims from Section 1.3, but is not required for later material.

Recall from Section 1.3 that the initial value problem
\[ f'' + f = \sigma f, \qquad f(0) = 1, \quad f'(0) = 0 \tag{2.25} \]
on $[0,1]$ is equivalent to the integral equation
\[ (I - K) f = u \tag{2.26} \]
where $u(x) = \cos x$, $k(x,t) = \sin(x - t)\,\sigma(t)$, and the integral operator $K$ is defined by
\[ K(f)(x) = \int_0^x k(x,t) f(t) \, dt. \tag{2.27} \]
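The integral equation (2.26) can be explored numerically by iterating $f \mapsto u + Kf$ starting from $f_0 = u$, which produces the partial sums of $\sum_n K^n u$. The sketch below is only an illustration under stated assumptions: the integral in (2.27) is replaced by the trapezoidal rule on a uniform grid, and the function name `solve_volterra` and its parameters are hypothetical. For the constant coefficient $\sigma \equiv 1$ the initial value problem (2.25) reads $f'' = 0$, $f(0) = 1$, $f'(0) = 0$, so the exact solution is $f \equiv 1$, which gives a convenient check.

```python
import math


def solve_volterra(sigma, n_grid=200, n_iter=30):
    """Approximate the solution of f = u + K f on [0, 1] by fixed-point iteration.

    Here u(x) = cos x and k(x, t) = sin(x - t) * sigma(t). The integral over
    [0, x] is approximated by the trapezoidal rule (an assumption of this sketch).
    """
    h = 1.0 / n_grid
    xs = [i * h for i in range(n_grid + 1)]
    u = [math.cos(x) for x in xs]
    f = u[:]  # start the iteration at f_0 = u
    for _ in range(n_iter):
        new_f = []
        for i, x in enumerate(xs):
            # Trapezoidal approximation of (K f)(x_i).
            integral = 0.0
            for j in range(i + 1):
                weight = 0.5 * h if j in (0, i) else h
                integral += weight * math.sin(x - xs[j]) * sigma(xs[j]) * f[j]
            new_f.append(u[i] + integral)
        f = new_f
    return xs, f


xs, f = solve_volterra(lambda t: 1.0)
# Deviation from the exact solution f = 1; small, limited by the quadrature error.
print(max(abs(value - 1.0) for value in f))
```

The iteration converges quickly because the terms $K^n u$ decay factorially, which is the content of the estimate on $\|K^n\|$ in Lemma 2.50.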
Lemma 2.50. Suppose that $k \in C([0,1] \times [0,1])$. Then (2.27) defines a bounded linear operator $K : C([0,1]) \to C([0,1])$ with
\[ \|K\| \le \|k\|_\infty, \]
and more generally with
\[ \|K^n\| \le \frac{\|k\|_\infty^n}{n!}. \]
In particular, the geometric series
\[ (I - K)^{-1} = \sum_{n=0}^{\infty} K^n \]
converges in $B(C([0,1]))$. It follows that (2.26) has a unique solution for any $\sigma \in C([0,1])$, this solution belongs to $C^2([0,1])$, and solves (2.25).

Proof. As $k$ is uniformly continuous, it is easy to check that $K(f) \in C([0,1])$ for every $f \in C([0,1])$. Indeed, if $\varepsilon > 0$ then there exists some $\delta > 0$ for which
\[ |x_1 - x_2| < \delta \implies |k(x_1, t) - k(x_2, t)| < \varepsilon \]
for all $t \in [0,1]$. Multiplying by $f(t)$ and integrating from $0$ to $x$ shows that
\[ |x_1 - x_2| < \delta \implies |K(f)(x_1) - K(f)(x_2)| < \varepsilon \|f\|_\infty + \delta \|k\|_\infty \|f\|_\infty \]
as required. Also, $K$ is linear, and
\[ \|Kf\|_\infty = \sup_{x \in [0,1]} \left| \int_0^x k(x,t) f(t) \, dt \right| \le \|k\|_\infty \|f\|_\infty, \]
so $K$ defines a bounded linear operator with $\|K\| \le \|k\|_\infty$. To prove the estimate on $\|K^n\|$ we need to be a bit more careful. For every $x \in [0,1]$ we have
\[ |(Kf)(x)| \le \int_0^x |k(x,t) f(t)| \, dt \le x \|k\|_\infty \|f\|_\infty. \]
Suppose we have already shown that
\[ |(K^n f)(x)| \le \frac{x^n}{n!} \|k\|_\infty^n \|f\|_\infty. \tag{2.28} \]
Then
\[ |(K^{n+1} f)(x)| \le \int_0^x |k(x,t)| \, |(K^n f)(t)| \, dt \le \int_0^x \frac{t^n}{n!} \|k\|_\infty^{n+1} \|f\|_\infty \, dt = \frac{x^{n+1}}{(n+1)!} \|k\|_\infty^{n+1} \|f\|_\infty. \]
By induction on $n$, it follows that (2.28) holds for all $n \ge 1$, and so
\[ \|K^n\| \le \frac{\|k\|_\infty^n}{n!} \]
for all $n \ge 1$. By Lemma 2.41, $B(C([0,1]))$ is a Banach space. By Lemma 2.22, it follows that the absolutely convergent series $\sum_{n=0}^{\infty} K^n$ also converges in $B(C([0,1]))$. However,
\[ (I - K) \sum_{n=0}^{\infty} K^n = \sum_{n=0}^{\infty} K^n (I - K) = \sum_{n=0}^{\infty} K^n - \sum_{n=1}^{\infty} K^n = I, \]
so the sum $\sum_{n=0}^{\infty} K^n$ is the inverse of $I - K$. The additional claim in the lemma regarding (2.26) follows from the discussion in Section 1.3.2.

2.4.2 The Norm of Continuous Functionals on C(X)
Let $X$ be a locally compact metric space, and let $\mu$ be a finite Borel measure. Then
\[ \int \cdot \, d\mu : f \mapsto \int f \, d\mu \]
is a continuous functional on $C_0(X)$. Indeed,
\[ \left| \int f \, d\mu \right| \le \int |f| \, d\mu \le \mu(X) \|f\|_\infty \]
shows the continuity by Lemma 2.39.

Footnote: The result of Section 2.4.2 will initially only be used in more concrete settings (and will even be proved there, as in Chapter 3), where it is easier to calculate the norm than in the general case considered here. The reader may therefore skip the proof and return to it later if needed.

More generally, if $g \in L^1_\mu(X)$ then
\[ \int \cdot \, g \, d\mu : f \mapsto \int f g \, d\mu \tag{2.29} \]
is also a continuous functional on $C_0(X)$. Again this is easy to see since
\[ \left| \int f g \, d\mu \right| \le \int |f| |g| \, d\mu \le \|f\|_\infty \|g\|_{L^1_\mu}. \tag{2.30} \]
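In fact a more precise statement holds, but this takes a little more work.

Lemma 2.51 (Operator norm of integration). Suppose that $\mu$ is a locally finite measure on a locally compact $\sigma$-compact metric space $X$. Let $g \in L^1_\mu(X)$. Then the norm of the functional on $C_0(X)$ defined in (2.29) is precisely $\|g\|_{L^1_\mu}$.

Proof. Let
\[ h(x) = \operatorname{sign}(g(x)) = \begin{cases} \dfrac{\overline{g(x)}}{|g(x)|} & \text{if } g(x) \ne 0, \\ 0 & \text{if } g(x) = 0. \end{cases} \]
Clearly $h \in L^\infty_\mu(X)$ and
\[ \int h g \, d\mu = \int |g| \, d\mu = \|g\|_{L^1_\mu}. \]
We wish to approximate $h$ by continuous functions. Fix $\varepsilon > 0$. Then by Lebesgue integrability of $g$ (see Lemma B.12) there exists a $\delta > 0$ such that
\[ \mu(Z) < \delta \implies \int_Z |g| \, d\mu < \varepsilon. \]
Also, by Lusin's theorem (Proposition B.16) applied to the finite measure $\nu$ defined by $d\nu = |g| \, d\mu$ there exists a compact set $K \subseteq X$ such that the restriction $h|_K$ of $h$ to $K$ is continuous, and $\int_{X \setminus K} |g| \, d\mu < \varepsilon$. By Tietze's extension theorem (Proposition A.25) the restriction $h|_K$ can be extended to a continuous function $f \in C_c(X)$ of compact support. We may assume that $\|f\|_\infty \le 1$, as if this is not the case we may replace $f$ by the continuous function
\[ x \mapsto \begin{cases} f(x) & \text{if } |f(x)| \le 1, \\ \dfrac{f(x)}{|f(x)|} & \text{if } |f(x)| \ge 1. \end{cases} \]
Thus
\[ \left\| \int \cdot \, g \, d\mu \right\|_{\mathrm{operator}} \ge \left| \int_X f g \, d\mu \right| \ge \left| \int_K f g \, d\mu \right| - \left| \int_{X \setminus K} f g \, d\mu \right| \ge \int_K |g| \, d\mu - \int_{X \setminus K} |g| \, d\mu = \|g\|_{L^1_\mu} - 2 \int_{X \setminus K} |g| \, d\mu \ge \|g\|_{L^1_\mu} - 2\varepsilon. \]
Since $\varepsilon > 0$ was arbitrary, this shows that
\[ \|g\|_{L^1_\mu} \le \left\| \int \cdot \, g \, d\mu \right\|_{\mathrm{operator}}, \]
and the reverse inequality follows from (2.30).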
2.4.3 Banach Algebras

In many situations it makes sense to multiply elements of a normed vector space with each other.

Definition 2.52. Let $A$ be a Banach space, and assume there is a multiplication operation $(x, y) \mapsto xy$ from $A \times A \to A$ such that addition and multiplication make $A$ into a ring, with the property that
\[ \|xy\| \le \|x\| \|y\| \]
for all $x, y \in A$. Then $A$ is called a Banach algebra.

Recall that a ring does not need to have a unit; if a ring $A$ has a unit $1_A$ then it is called unital. The additional axiom on the norm makes the product operation continuous by the following argument. Fix $\varepsilon \in (0,1)$ and $x, y \in A$. Then $\|x' - x\| < \varepsilon < 1$ and $\|y' - y\| < \varepsilon$ together imply that
\[ \|x'y' - xy\| \le \|x'(y' - y)\| + \|(x' - x)y\| \le \varepsilon (\|x'\| + \|y\|) \le \varepsilon (\|x\| + 1 + \|y\|). \]
Since $\varepsilon \in (0,1)$ was arbitrary, this shows the continuity of the product map at $(x, y) \in A^2$.

Example 2.53. (1) The continuous functions $C(X)$ on a compact topological space $X$ with the supremum norm form a Banach algebra with respect to the pointwise multiplication operation $(fg)(x) = f(x)g(x)$ for all $x \in X$. Notice that the constant function $1$ is a unit in this ring.
(2) Let $X$ be a non-compact topological space. Then $C_0(X)$ is a Banach algebra with respect to the supremum norm and pointwise multiplication as in (1) above, but it does not have a unit.
(3) If $V$ is any Banach space, then $B(V) = B(V, V)$ is a Banach algebra with respect to composition:
\[ \|ST\| = \sup_{\|v\| = 1} \|(ST)v\| = \sup_{\|v\| = 1} \|S(Tv)\| \le \|S\| \sup_{\|v\| = 1} \|Tv\| = \|S\| \|T\|. \]
The algebra has a unit, namely $I(v) = v$ for all $v \in V$.
(4) A special case of (3) above is the case $V = \mathbb{R}^n$. By choosing a basis for $\mathbb{R}^n$ we may identify $B(\mathbb{R}^n)$ with the space of $n \times n$ real matrices.

In a Banach algebra with unit, we can apply many well-known functions to its elements and obtain new elements of the Banach algebra. For example, if $a$ is any element of a Banach algebra, then we may define
\[ \exp a = \sum_{n=0}^{\infty} \frac{a^n}{n!} \]
where $a^0$ is the unit in $A$. The series defines an element of $A$ by Lemma 2.22. We will return to the topic of Banach algebras in Section 2.4.3.
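To make the exponential series concrete, here is a minimal sketch in the Banach algebra of $2 \times 2$ real matrices from Example 2.53(4). The helper names `mat_mul` and `mat_exp` and the truncation at 30 terms are choices made for this illustration only. For a nilpotent matrix with $a^2 = 0$ the series terminates, so $\exp a = I + a$, and for the generator of rotations the series reproduces cosine and sine.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(a, terms=30):
    """Partial sum of the series sum_{m < terms} a^m / m!."""
    n = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = identity  # holds a^m / m! for the current m
    for m in range(1, terms):
        term = [[entry / m for entry in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


nilpotent = [[0.0, 1.0], [0.0, 0.0]]
print(mat_exp(nilpotent))  # a^2 = 0, so this is I + a = [[1.0, 1.0], [0.0, 1.0]]

rotation_generator = [[0.0, -1.0], [1.0, 0.0]]
print(mat_exp(rotation_generator))  # close to [[cos 1, -sin 1], [sin 1, cos 1]]
```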
2.5 Hilbert Spaces

The notion of a Hilbert space is another fundamental idea in functional analysis. We will see in this section that a Hilbert space is a Banach space of a special sort, and the additional structure entailed by the extra hypothesis turns out to be highly significant.

2.5.1 Definitions and Elementary Properties

Definition 2.54. An inner product space or a pre-Hilbert space is a vector space $V$ over $\mathbb{R}$ or $\mathbb{C}$ with an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ (or $\mathbb{C}$) with the following properties:
$\langle v, v \rangle > 0$ for all $v \in V \setminus \{0\}$ (Positivity);
$\langle v, w \rangle = \overline{\langle w, v \rangle}$ for all $v, w \in V$ ((Skew-)Symmetry); and
for any fixed $w \in V$ the map $v \mapsto \langle v, w \rangle$ is linear (Linearity).

Notice that over $\mathbb{R}$ a consequence is linearity of the map $w \mapsto \langle v, w \rangle$ in the second variable for fixed $v$; in the complex case we have sesqui-linearity instead:
\[ \langle v, \alpha w_1 + w_2 \rangle = \overline{\alpha} \langle v, w_1 \rangle + \langle v, w_2 \rangle. \]
In an inner product space, we will see shortly that defining
\[ \|v\| = \sqrt{\langle v, v \rangle} \tag{2.31} \]
gives a norm on $V$.
Proposition 2.55 (Cauchy-Schwarz). Let $(V, \langle \cdot, \cdot \rangle)$ be an inner product space. Then we have the Cauchy-Schwarz inequality,
\[ |\langle v, w \rangle| \le \|v\| \|w\| \tag{2.32} \]
for all $v, w \in V$, where equality holds if and only if $v$ and $w$ are linearly dependent. Moreover, the function $\|\cdot\|$ defined in (2.31) is a norm on $V$, so that every inner product space is also a normed space.

Definition 2.56. A Hilbert space is an inner product space $(H, \langle \cdot, \cdot \rangle)$ which is complete with respect to the norm defined using the inner product in (2.31).

Proof of Proposition 2.55. We start by proving the Cauchy-Schwarz inequality (2.32), which holds trivially if $w = 0$. So assume that $w \ne 0$. By definition, we have
\[ \begin{aligned} 0 \le \|v + tw\|^2 &= \langle v + tw, v + tw \rangle \\ &= \langle v, v \rangle + \langle tw, v \rangle + \langle v, tw \rangle + \langle tw, tw \rangle \\ &= \|v\|^2 + t \langle w, v \rangle + \overline{t \langle w, v \rangle} + |t|^2 \|w\|^2 \\ &= \|v\|^2 + 2 \Re\left( \overline{t} \langle v, w \rangle \right) + |t|^2 \|w\|^2 \end{aligned} \tag{2.33} \]
for any scalar $t$ by linearity and skew-symmetry of the inner product. We set
\[ t = -\frac{\langle v, w \rangle}{\|w\|^2}. \]
Footnote: This choice is made both because it works for the argument at hand, and because it is a geometrically interesting choice in its own right: $-tw$ is the orthogonal projection of $v$ onto the subspace spanned by $w$.

Then the inequality (2.33) becomes
\[ 0 \le \|v\|^2 - 2 \frac{|\langle v, w \rangle|^2}{\|w\|^2} + \frac{|\langle v, w \rangle|^2}{\|w\|^4} \|w\|^2, \]
or
\[ 0 \le \|v\|^2 \|w\|^2 - |\langle v, w \rangle|^2, \]
giving (2.32). Equality in (2.32) gives
\[ \|v + tw\|^2 = \langle v + tw, v + tw \rangle = 0, \]
which forces $v + tw = 0$ by the positivity property. If, on the other hand, $v$ and $w$ are linearly dependent with $v = \alpha w$ for some scalar $\alpha$, then
\[ |\langle v, w \rangle| = |\langle \alpha w, w \rangle| = |\alpha| \|w\|^2 = \|v\| \|w\| \]
by the homogeneity property (which we will now prove). To see that $\|\cdot\| : V \to \mathbb{R}_{\ge 0}$ from (2.31) defines a norm, we need to check the following properties:

Positivity of $\|\cdot\|$, which follows at once from the positivity property of the inner product;
Homogeneity of $\|\cdot\|$, which follows from the linearity and (skew-)symmetry, since $\|\alpha v\|^2 = \langle \alpha v, \alpha v \rangle = |\alpha|^2 \|v\|^2$;
The triangle inequality, which may be seen as follows:
\[ \begin{aligned} \|v + w\|^2 &= \langle v + w, v + w \rangle \\ &= \|v\|^2 + 2 \Re \langle v, w \rangle + \|w\|^2 \\ &\le \|v\|^2 + 2 \|v\| \|w\| + \|w\|^2 \\ &= (\|v\| + \|w\|)^2 \end{aligned} \]
by the Cauchy-Schwarz inequality.
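The key step of the proof of Proposition 2.55 can be checked numerically. The sketch below works in $\mathbb{C}^3$ with the standard inner product (linear in the first argument); the vectors and the helper names `inner` and `norm` are hypothetical choices for this illustration. It uses the scalar $t = -\langle v, w \rangle / \|w\|^2$ from the proof and confirms that $v + tw$ is orthogonal to $w$ and that $\|v\|^2 \|w\|^2 - |\langle v, w \rangle|^2 = \|v + tw\|^2 \|w\|^2 \ge 0$.

```python
def inner(v, w):
    """Standard inner product on C^d: linear in v, conjugate-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))


def norm(v):
    """Norm induced by the inner product, as in (2.31)."""
    return abs(inner(v, v)) ** 0.5


v = [1 + 2j, -1j, 3 + 0j]
w = [2 - 1j, 1 + 1j, -1 + 0.5j]

t = -inner(v, w) / norm(w) ** 2             # the scalar chosen in the proof
residual = [a + t * b for a, b in zip(v, w)]  # the vector v + t w

gap = norm(v) ** 2 * norm(w) ** 2 - abs(inner(v, w)) ** 2
print(gap >= 0)                              # the Cauchy-Schwarz inequality
print(abs(inner(residual, w)) < 1e-12)       # v + t w is orthogonal to w
```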
Exercise 2.57. Show that an inner product on an inner product space is jointly continuous with respect to the induced norm: if $v_n \to v$ and $w_n \to w$ as $n \to \infty$, then $\langle v_n, w_n \rangle \to \langle v, w \rangle$ as $n \to \infty$.
We record a few elementary properties of inner product spaces.

The parallelogram identity,
\[ \|v + w\|^2 + \|v - w\|^2 = 2 \|v\|^2 + 2 \|w\|^2 \tag{2.34} \]
for all $v, w \in V$.
The relationship with linear functionals: for fixed $w \in V$ the map $\phi_w$ defined by $\phi_w(v) = \langle v, w \rangle$ is a linear functional with norm $\|\phi_w\| = \|w\|$.

Both are easy to check. For the first, expand the left-hand side to obtain
\[ \begin{aligned} \|v + w\|^2 + \|v - w\|^2 &= \langle v + w, v + w \rangle + \langle v - w, v - w \rangle \\ &= \|v\|^2 + 2 \Re \langle v, w \rangle + \|w\|^2 + \|v\|^2 - 2 \Re \langle v, w \rangle + \|w\|^2. \end{aligned} \]
The second claim is a consequence of the linearity of the inner product, the Cauchy-Schwarz inequality and the definition of the operator norm.
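The parallelogram identity (2.34) is a genuine restriction on a norm. As a hedged illustration (the function names `p_norm` and `parallelogram_defect` are invented for this sketch), the following code evaluates the difference between the two sides of (2.34) for the standard basis vectors of $\mathbb{R}^2$, once for the 2-norm and once for the 1-norm.

```python
def p_norm(v, p):
    """The p-norm of a finite real vector."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def parallelogram_defect(v, w, p):
    """Left-hand side minus right-hand side of (2.34) for the p-norm."""
    plus = [a + b for a, b in zip(v, w)]
    minus = [a - b for a, b in zip(v, w)]
    return (p_norm(plus, p) ** 2 + p_norm(minus, p) ** 2
            - 2 * p_norm(v, p) ** 2 - 2 * p_norm(w, p) ** 2)


e1, e2 = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_defect(e1, e2, 2))  # zero up to rounding
print(parallelogram_defect(e1, e2, 1))  # 4.0, so the 1-norm violates (2.34)
```

In particular the 1-norm on $\mathbb{R}^2$ cannot arise from any inner product.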
Exercise 2.58. Show that the parallelogram identity (2.34) characterizes real Hilbert spaces among the real Banach spaces in the following sense. If a real normed vector space satisfies the parallelogram identity, then an inner product can be defined in such a way that the norm arises from the inner product. Notice that if $H$ is a real Hilbert space, then the polarization identity expresses the inner product in terms of the norm as
\[ \langle x, y \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right). \]

Exercise 2.59. Generalize the polarization identity to complex Hilbert spaces, and show the complex analog of Exercise 2.58.
Example 2.60. We have already seen several Hilbert spaces without making explicit the underlying inner product.
(1) $V = \mathbb{R}^d$ (or $V = \mathbb{C}^d$) with
\[ \langle v, w \rangle = \sum_{i=1}^{d} v_i \overline{w_i}, \]
resulting in the 2-norm
\[ \|v\|_2 = \left( \sum_{i=1}^{d} |v_i|^2 \right)^{1/2}. \]
(2) $V = \ell^2(\mathbb{N}) = \ell^2$, the space of square-summable sequences of scalars, with inner product
\[ \langle v, w \rangle = \sum_{i=1}^{\infty} v_i \overline{w_i}, \]
with the 2-norm
\[ \|v\|_2 = \left( \sum_{i=1}^{\infty} |v_i|^2 \right)^{1/2}. \]
Footnote: This space could more consistently be written $L^2_\lambda(\mathbb{N})$ where $\lambda$ is counting measure on $\mathbb{N}$, but the convention is to write these sequence spaces using $\ell$.
(3) $V = L^2_\mu(X)$ for a measure space $(X, \mathcal{B}, \mu)$ with the inner product
\[ \langle f, g \rangle = \int f \overline{g} \, d\mu, \]
giving the 2-norm
\[ \|f\|_2 = \left( \int_X |f|^2 \, d\mu \right)^{1/2}. \]

Notice that in Example 2.60(2) and (3), the spaces are themselves defined as the set of sequences or functions with finite 2-norm. The Cauchy-Schwarz inequality then shows that the inner product is well-defined (that is, finite) on the space. We recall this argument for (3) which gives (2) as a special case.

Lemma 2.61. If $(X, \mathcal{B}, \mu)$ is a measure space and $f, g \in L^2_\mu(X)$, then
\[ \int_X f \overline{g} \, d\mu = \langle f, g \rangle \]
is well-defined.

Proof. Recall that a function is simple if it is a finite linear combination of characteristic functions of measurable sets of finite measure. Clearly the space of all such simple functions is an inner product space with the inner product $\langle f, g \rangle$ defined in (3), and the derived norm is precisely the 2-norm. If $f, g \in L^2_\mu(X)$ are positive, then there exist sequences $(f_n)$ and $(g_n)$ of simple functions with $f_n \nearrow f$ and $g_n \nearrow g$ as $n \to \infty$, so that $f_n g_n \nearrow f g$ as $n \to \infty$. It follows by monotone convergence that
\[ \int f g \, d\mu = \lim_{n \to \infty} \int f_n g_n \, d\mu \le \lim_{n \to \infty} \|f_n\|_2 \|g_n\|_2 = \|f\|_2 \|g\|_2 \]
is finite. If $f, g \in L^2_\mu$ are any two functions, then the argument above applies to the measurable functions $|f|$ and $|g|$ so that
\[ \int |f \overline{g}| \, d\mu \le \|f\|_2 \|g\|_2, \]
and hence $\langle f, g \rangle = \int f \overline{g} \, d\mu$ exists and satisfies the Cauchy-Schwarz inequality.
2.5.2 Isometries are Affine

Footnote: The results in Section 2.5.2 are interesting, but will not be needed later.

Definition 2.62. A norm $\|\cdot\|$ on a vector space $V$ is strictly sub-additive if $\|v + w\| < \|v\| + \|w\|$ for $v, w \in V$ except when $v$ or $w$ is a real non-negative scalar multiple of $w$ or $v$ respectively.

From the equality case of the Cauchy-Schwarz inequality, which is itself used in the proof of the triangle inequality, it follows quickly that a norm in an inner product space is strictly sub-additive. Thus the following result applies in particular to Hilbert spaces.

Exercise 2.63. Show that the norm in a Hilbert space is strictly sub-additive: that is, $\|v + w\| < \|v\| + \|w\|$ for all $v, w \in V$, unless $v$ or $w$ is a real non-negative scalar multiple of the other (see Definition 2.62).

Theorem 2.64 (Mazur-Ulam [31]). Let $V, W$ be two normed vector spaces with strictly sub-additive norms over $\mathbb{R}$, and let $M : V \to W$ be an isometry. Then $M$ is affine, that is $M(v) = M_{\mathrm{linear}}(v) + M(0)$, where $M_{\mathrm{linear}} : V \to W$ is a linear isometry.

Proof. Clearly the map $v \mapsto M(v) - M(0)$ is an isometry if $M$ is an isometry, so we may assume that $M(0) = 0$ without loss of generality. Let $v_1, v_2 \in V$ be any points, and define $z = \frac{v_1 + v_2}{2} \in V$ to be the mid-point of $v_1$ and $v_2$. Now
\[ \|v_1 - z\| = \|z - v_2\| = \tfrac{1}{2} \|v_1 - v_2\|. \]
Since $M$ is an isometry, this implies that
\[ \|M(v_1) - M(z)\| = \|M(z) - M(v_2)\| = \tfrac{1}{2} \|M(v_1) - M(v_2)\|. \]
Moreover,
\[ M(v_1) - M(v_2) = (M(v_1) - M(z)) + (M(z) - M(v_2)). \]
Thus strict subadditivity implies that $M(v_1) - M(z)$ and $M(z) - M(v_2)$ must be real non-negative multiples of each other, but as they have the same norm this forces them to be equal. In other words, we have shown that mid-points are sent to mid-points by $M$:
\[ M\left( \frac{v_1 + v_2}{2} \right) = \frac{M(v_1) + M(v_2)}{2} \tag{2.35} \]
for all $v_1, v_2 \in V$. For a given $v \in V$ we can apply this to the pairs $v$ and $0$, then $\tfrac{1}{2} v$ and $0$, $v$ and $\tfrac{1}{2} v$, $2v$ and $v$, $v$ and $-v$, and so on. This shows that
\[ M\left( \frac{k}{2^n} v \right) = \frac{k}{2^n} M(v) \]
for any $k \in \mathbb{Z}$, $n \in \mathbb{N}$ and $v \in V$, and so by continuity $M(av) = aM(v)$ for all $a \in \mathbb{R}$ and $v \in V$. This, combined with (2.35), gives $M(v_1 + v_2) = M(v_1) + M(v_2)$ for $v_1, v_2 \in V$ also.

2.5.3 Convex Sets in Uniformly Convex Spaces

While the emphasis in this section is on Hilbert spaces, it is useful to isolate a more abstract property which is precisely what is needed for several proofs in this section.

Definition 2.65. A normed vector space $(V, \|\cdot\|)$ is called uniformly convex if
\[ \|x\|, \|y\| \le 1 \implies \left\| \frac{x + y}{2} \right\| \le 1 - \eta(\|x - y\|), \]
where $\eta : [0, 2] \to [0, 1]$ is a monotonically increasing function with $\eta(r) > 0$ for all $r > 0$.
Lemma 2.66. A Hilbert space $(H, \langle \cdot, \cdot \rangle)$ is uniformly convex.

Proof. For $x, y \in H$ with $\|x\|, \|y\| \le 1$ we have by the parallelogram identity
\[ \left\| \frac{x + y}{2} \right\| = \sqrt{ \tfrac{1}{2} \|x\|^2 + \tfrac{1}{2} \|y\|^2 - \tfrac{1}{4} \|x - y\|^2 } \le \sqrt{ 1 - \tfrac{1}{4} \|x - y\|^2 } = 1 - \eta(\|x - y\|) \]
as required, with $\eta(r) = 1 - \sqrt{1 - \tfrac{1}{4} r^2}$.

Heuristically, we can think of Definition 2.65 as having the following geometrical meaning, illustrated in Figure 2.4. If vectors $x$ and $y$ have norm (length) one, then their mid-point $\frac{x + y}{2}$ has much smaller norm unless $x$ and $y$ are very close together. This accords closely with the geometrical intuition from finite-dimensional spaces with Euclidean distance.

Fig. 2.4. The mid-point is uniformly closer to zero.
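The modulus $\eta(r) = 1 - \sqrt{1 - r^2/4}$ from Lemma 2.66 can be tested numerically. The following sketch samples pairs of points in the Euclidean unit ball of $\mathbb{R}^3$ and records the largest violation of the inequality in Definition 2.65; the sampling scheme, the seed and the helper names are assumptions made for this illustration. For contrast, it also evaluates a pair of unit vectors in the maximum norm on $\mathbb{R}^2$, where the mid-point still has norm one even though the two vectors are far apart.

```python
import math
import random


def eta(r):
    """Modulus from Lemma 2.66: eta(r) = 1 - sqrt(1 - r^2 / 4) on [0, 2]."""
    return 1.0 - math.sqrt(max(0.0, 1.0 - 0.25 * r * r))


def euclid_norm(v):
    return math.sqrt(sum(c * c for c in v))


def max_norm(v):
    return max(abs(c) for c in v)


random.seed(0)
worst = -1.0
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(3)]
    y = [random.uniform(-1, 1) for _ in range(3)]
    # Rescale into the closed unit ball if necessary.
    x = [c / max(1.0, euclid_norm(x)) for c in x]
    y = [c / max(1.0, euclid_norm(y)) for c in y]
    mid = euclid_norm([(a + b) / 2 for a, b in zip(x, y)])
    dist = euclid_norm([a - b for a, b in zip(x, y)])
    worst = max(worst, mid - (1.0 - eta(dist)))
print(worst <= 1e-12)  # no violation of uniform convexity beyond rounding

# In the max-norm the mid-point of two distant unit vectors can still have norm 1.
p, q = [1.0, 1.0], [1.0, -1.0]
print(max_norm([(a + b) / 2 for a, b in zip(p, q)]),
      max_norm([a - b for a, b in zip(p, q)]))
```

The second print shows a mid-point of norm 1 for two vectors at distance 2, so the maximum norm on $\mathbb{R}^2$ is not uniformly convex.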
The following theorem, whose conclusion is illustrated in Figure 2.5, will have many important consequences for the study of Hilbert spaces.

Theorem 2.67 (Unique approximation within a convex set). Let $(V, \|\cdot\|)$ be a Banach space with a uniformly convex norm, let $K \subseteq V$ be a closed convex subset, and assume that $v_0 \in V$. Then there exists a unique element $w \in K$ that is closest to $v_0$ in the sense that $w$ is the only element of $K$ with
\[ \|w - v_0\| = \inf_{k \in K} \|k - v_0\|. \]

Fig. 2.5. The unique closest element of $K$ to $v_0$.

Proof of Theorem 2.67. By translating both the set $K$ and the point $v_0$ by $-v_0$, we may assume without loss of generality that $v_0 = 0$. We define
\[ s = \inf_{k \in K} \|k - v_0\| = \inf_{k \in K} \|k\|. \]
If $s = 0$, then we must have $0 \in K$ since $K$ is closed, and the only choice is then $w = v_0 = 0$ (the uniqueness of $w$ is a consequence of the strict positivity of the norm). So assume that $s > 0$. By multiplying by the scalar $\frac{1}{s}$ we may also assume without loss of generality that $s = 1$. Notice that once we have found a point $w \in K$ with norm $1$, then its uniqueness is an immediate consequence of the uniform convexity: if $w_1, w_2 \in K$ have $\|w_1\| = \|w_2\| = 1$, then $\frac{w_1 + w_2}{2} \in K$ because $K$ is convex. Also, $\left\| \frac{w_1 + w_2}{2} \right\| = 1$ by the triangle inequality and since $s = 1$. By uniform convexity this implies that $w_1 = w_2$.

Turning to the existence, let us first sketch the argument. Choose a sequence $(k_n)$ in $K$ with $\|k_n\| \to 1$ as $n \to \infty$. Then the mid-points $\frac{k_n + k_m}{2}$ also lie in $K$, since $K$ is convex. However, this shows that the mid-point must have norm greater than or equal to $1$, since $s = 1$. Therefore $k_n$ and $k_m$ must be close together by uniform convexity. Making this precise, we will see that $(k_n)$ is a Cauchy sequence. Since $V$ is complete and $K$ is closed, this will give a point $w \in K$ with $\|w\| = 1 = s$ as required.

To make this more precise, we apply uniform convexity to the normalized vectors
\[ x_n = \frac{1}{s_n} k_n, \]
where $s_n = \|k_n\|$. The mid-point of $x_n$ and $x_m$ can now be expressed as
\[ \frac{x_m + x_n}{2} = \frac{1}{2 s_m} k_m + \frac{1}{2 s_n} k_n = \left( \frac{1}{2 s_m} + \frac{1}{2 s_n} \right) (a k_m + b k_n) \]
with
\[ a = \frac{\frac{1}{2 s_m}}{\frac{1}{2 s_m} + \frac{1}{2 s_n}} \ge 0, \qquad b = \frac{\frac{1}{2 s_n}}{\frac{1}{2 s_m} + \frac{1}{2 s_n}} \ge 0 \]
and $a + b = 1$. Therefore $a k_m + b k_n \in K$ by convexity, and so
\[ \left\| \frac{x_m + x_n}{2} \right\| = \left( \frac{1}{2 s_m} + \frac{1}{2 s_n} \right) \|a k_m + b k_n\| \ge \frac{1}{2 s_m} + \frac{1}{2 s_n}. \]
Fix $\varepsilon > 0$ and choose $N = N(\varepsilon)$ such that $m \ge N$ implies that
\[ \frac{1}{s_m} > 1 - \eta(\varepsilon), \]
which is possible since $s_m \to 1$ and $\eta(\varepsilon) > 0$. Then $m, n \ge N$ implies that
\[ \frac{1}{2 s_m} + \frac{1}{2 s_n} > 1 - \eta(\varepsilon), \]
which together with the definition of uniform convexity gives
\[ 1 - \eta(\|x_m - x_n\|) \ge \left\| \frac{x_m + x_n}{2} \right\| > 1 - \eta(\varepsilon). \]
By monotonicity of the function $\eta$ this implies that
\[ \|x_m - x_n\| < \varepsilon, \]
showing that $(x_n)$ is a Cauchy sequence. Since $s_n \to 1$ as $n \to \infty$ it follows that $(k_n)$ with $k_n = s_n x_n$ for all $n \ge 1$ is also a Cauchy sequence, since
\[ k_m - k_n = (s_m - s_n) x_m + s_n (x_m - x_n). \]
The limit $w = \lim_{n \to \infty} k_n$ must lie in $K$ and by construction is an (and hence is the) element closest to $0$.

Definition 2.68. Let $H$ be a Hilbert space, and $A \subseteq H$ any subset. Then the orthogonal complement of $A$ is defined as
\[ A^\perp = \{ h \in H \mid \langle h, a \rangle = 0 \text{ for all } a \in A \}. \]

Corollary 2.69 (Orthogonal decomposition). Let $H$ be a Hilbert space, and let $Y \subseteq H$ be a closed subspace. Then $Y^\perp$ is a closed subspace with $H = Y \oplus Y^\perp$, meaning that every element $h \in H$ can be written in the form $h = y + z$ with $y \in Y$ and $z \in Y^\perp$, and $y$ and $z$ are unique with these properties. Moreover, $(Y^\perp)^\perp = Y$ and
\[ \|h\|^2 = \|y\|^2 + \|z\|^2 \tag{2.36} \]
if $h = y + z$ with $y \in Y$ and $z \in Y^\perp$.

In a two-dimensional real vector space, (2.36) is more familiar as Pythagoras' theorem.

Proof of Corollary 2.69. As $h \mapsto \langle h, y \rangle$ is a (continuous linear) functional for each $y \in Y$, the set $Y^\perp$ is an intersection of closed subspaces and hence is a closed subspace. Using positivity of the inner product, it is easy to see that $Y \cap Y^\perp = \{0\}$, and from this the uniqueness of the decomposition $h = y + z$ with $y \in Y$ and $z \in Y^\perp$ follows at once. So it remains to show the existence of this decomposition. Fix $h \in H$, and apply Theorem 2.67 with $K = Y$ to find a point $y \in Y$ that is closest to $h$. Let $z = h - y$, so that for any $v \in Y$ and any scalar $t$ we have
\[ \|z\|^2 \le \|h - (tv + y)\|^2 = \|z - tv\|^2 = \|z\|^2 - 2 \Re(t \langle v, z \rangle) + |t|^2 \|v\|^2, \]
where we used that $tv + y \in Y$. However, this shows that $\Re(t \langle v, z \rangle) = 0$ for all scalars $t$ and $v \in Y$, and so $\langle v, z \rangle = 0$ for all $v \in Y$. Thus $z \in Y^\perp$, and hence
\[ \|h\|^2 = \langle h, h \rangle = \langle y + z, y + z \rangle = \|y\|^2 + \|z\|^2, \]
showing (2.36). It is clear from the definitions that $Y \subseteq (Y^\perp)^\perp$. If $v \in (Y^\perp)^\perp$ then $v = y + z$ for some $y \in Y$ and $z \in Y^\perp$ by the first part of the proof. However, $0 = \langle v, z \rangle = \|z\|^2$ implies that $v = y$ and so $(Y^\perp)^\perp = Y$.

An immediate consequence of Corollary 2.69 is the following.

Corollary 2.70 (Orthogonal projection). For a closed subspace $Y$ of a Hilbert space $H$, the orthogonal projection onto $Y$, defined by
\[ P : H \to Y, \qquad h \mapsto y, \]
where $y$ is the unique element of $Y$ with $h - y \in Y^\perp$, is a bounded linear operator with $\|P\| \le 1$ satisfying $\langle h, y \rangle = \langle Ph, y \rangle$ for all $h \in H$ and $y \in Y$.

Recall that we write $V^* = B(V, \mathbb{R})$ or $B(V, \mathbb{C})$ for the dual space of a normed vector space $V$, equipped with the operator norm.

Corollary 2.71 (Fréchet-Riesz Representation). For a Hilbert space $H$ the map sending $h \in H$ to $\phi(h) \in H^*$ defined by $\phi(h)(x) = \langle x, h \rangle$ is a linear (resp. sesqui-linear in the complex case) isometric isomorphism between $H$ and its dual space $H^*$.
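Proof. By the axioms of the inner product, we know that $\phi$ is (sesqui-)linear. By the Cauchy-Schwarz inequality and since $\phi(h)(h) = \|h\|^2$ we also know that $\phi$ is isometric. It remains to show that $\phi$ is onto. So suppose that $\ell : H \to \mathbb{R}$ (or $\mathbb{C}$) is a continuous linear functional. Then $Y = \ker(\ell)$ is a closed linear subspace of $H$ (since $\ell$ is continuous). If $Y = H$ then $\ell = 0$ and so $\phi(0) = \ell$. So suppose that $Y \ne H$, in which case we can choose a non-zero element $z \in Y^\perp$. We claim that
\[ \phi\left( \frac{\overline{\ell(z)}}{\|z\|^2} z \right) = \ell. \]
Indeed let $x \in H$; then $\ell(z) x - \ell(x) z \in \ker \ell = Y$ and so
\[ \langle \ell(z) x - \ell(x) z, z \rangle = 0 \]
by choice of $z$. In other words, we have shown that
\[ \ell(z) \langle x, z \rangle = \ell(x) \|z\|^2, \]
which is equivalent to
\[ \ell(x) = \left\langle x, \frac{\overline{\ell(z)}}{\|z\|^2} z \right\rangle \]
for $x \in H$ as claimed.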
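The construction in the proof of Corollary 2.71 can be followed step by step in $\mathbb{C}^3$. In the sketch below the functional $\ell$ is given by hypothetical coefficients `coeffs`, the vector $z$ is one particular non-zero element orthogonal to $\ker \ell$, and the representing vector is computed as $\overline{\ell(z)} \, z / \|z\|^2$, exactly as in the proof. All names are assumptions made for this illustration.

```python
def inner(x, h):
    """Inner product on C^d, linear in x and conjugate-linear in h."""
    return sum(a * b.conjugate() for a, b in zip(x, h))


# Hypothetical functional ell(x) = sum_i coeffs[i] * x[i] on C^3.
coeffs = [2 - 1j, 0.5j, -3 + 0j]


def ell(x):
    return sum(c * a for c, a in zip(coeffs, x))


# A non-zero vector orthogonal to ker(ell); any non-zero multiple would do.
z = [2j * c.conjugate() for c in coeffs]

# Representing vector from the proof: conj(ell(z)) / ||z||^2 * z.
scale = ell(z).conjugate() / abs(inner(z, z))
h = [scale * a for a in z]

x = [1 + 1j, -2 + 0j, 0.25 - 4j]
print(abs(ell(x) - inner(x, h)) < 1e-12)  # ell(x) agrees with <x, h>
```

As expected, the representing vector does not depend on the scaling of $z$: here it is the entrywise complex conjugate of `coeffs`.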
Exercise 2.72. The following is known as the Lax-Milgram lemma. (a) Suppose that $H$ is a Hilbert space, and suppose that $B : H \times H \to \mathbb{R}$ (or $\mathbb{C}$) is linear in both coordinates (or sesquilinear in the second in the complex case). Finally assume that $B$ is bounded in the sense that there is some $M > 0$ with
\[ |B(x, y)| \le M \|x\| \|y\| \]
for all $x, y \in H$. Show that there exists a linear operator $T : H \to H$ with $B(x, y) = \langle Tx, y \rangle$ for which $\|T\|_{\mathrm{operator}} \le M$. (b) Assume in addition that $B$ is coercive, meaning that there exists some $c > 0$ such that $|B(x, x)| \ge c \|x\|^2$ for all $x \in H$. Show in this case that the operator $T$ from (a) has an inverse, and that $\|T^{-1}\|_{\mathrm{operator}} \le \frac{1}{c}$.

Exercise 2.73. Recall the definition of the Hardy space $H^2(D)$ (or the Bergman space $A^2(D)$) from Exercise 2.48. Show that for every $a \in D$ there is a function $k_a \in H^2(D)$ (respectively $k_a \in A^2(D)$) with
\[ f(a) = \langle f, k_a \rangle_{H^2(D)} \quad \text{or} \quad f(a) = \langle f, k_a \rangle_{A^2(D)} \]
respectively. The function $D \times D \ni (a, w) \mapsto k_a(w)$ is called a reproducing kernel. Determine the two functions for $D = B_1^{\mathbb{C}}(0)$.

Footnote (to the proof of Corollary 2.71): We note that $Y^\perp$ is one-dimensional, since $\ell|_{Y^\perp}$ has trivial kernel. Hence the choice of $z$ will not matter below.

Exercise 2.74. Use Corollary 2.71 to show that if $H$ is a Hilbert space, then $H^*$ is also a Hilbert space, and exhibit a natural isometric isomorphism between $H$ and $H^{**}$.

Exercise 2.75. Show that every inner product space has a completion which is a Hilbert space.

We recall (and extend) the definition of linear hull as follows.

Definition 2.76. Let $(V, \|\cdot\|)$ be a normed vector space, and let $S \subseteq V$ be a subset. The linear hull of $S$, written $\langle S \rangle$, is the smallest subspace of $V$ containing $S$. Thus $\langle S \rangle$ consists of all linear sums $\sum_{s \in F} c_s s$ for $F \subseteq S$ finite and scalars $c_s$. The closed linear hull $\overline{\langle S \rangle}$ is the smallest closed subspace of $V$ containing $S$; it is the closure of the linear hull.

Corollary 2.77 (Characterization of the closed linear hull). Let $H$ be a Hilbert space and $S \subseteq H$ a subset. Then
\[ \overline{\langle S \rangle} = (S^\perp)^\perp = \{ h \in H \mid \langle h, x \rangle = 0 \text{ for all } x \in S^\perp \}, \]
where $S^\perp = \{ x \in H \mid \langle x, s \rangle = 0 \text{ for all } s \in S \}$.

Proof. Let $Y = \overline{\langle S \rangle}$ be the closed linear hull. By Corollary 2.69, $Y = (Y^\perp)^\perp$. We claim that $Y^\perp = S^\perp$, which together with the last statement gives the corollary. To see the claim, notice that for any $x \in H$ we have
\[ x \in S^\perp \iff \langle x, s \rangle = 0 \text{ for } s \in S \iff \langle x, y \rangle = 0 \text{ for } y \in \langle S \rangle \iff \langle x, y \rangle = 0 \text{ for } y \in \overline{\langle S \rangle} \iff x \in Y^\perp \]
by (sesqui-)linearity and continuity of the inner product in the second argument.
2.5.4 Two Applications to Measure Theory

We will show in this section two ways in which the results from Section 2.5.3 can be used in measure theory.

Footnote: These applications are not essential for what follows, but they do give an explanation of the results in Section 2.5.3 in the context of measure theory.

Proposition 2.78 (Conditional expectation). Let $(X, \mathcal{B}, \mu)$ be a probability space, and let $\mathcal{A} \subseteq \mathcal{B}$ be a sub-$\sigma$-algebra. Then there exists a bounded operator, called the conditional expectation,
\[ E(\cdot \mid \mathcal{A}) : L^1(X, \mathcal{B}, \mu) \to L^1(X, \mathcal{A}, \mu), \qquad f \mapsto E(f \mid \mathcal{A}), \]
such that
\[ \int_A f \, d\mu = \int_A E(f \mid \mathcal{A}) \, d\mu \tag{2.37} \]
for all $A \in \mathcal{A}$.

This notion of conditional expectation with respect to a sub-$\sigma$-algebra is a powerful one, and finds wide applications in probability (see Loève [26], [27]) and ergodic theory (see [12], for example). The next example illustrates the meaning of the conditional expectation.

Example 2.79. Let $X = [0,1]^2$ and let $\mathcal{B}$ be the Borel $\sigma$-algebra, and let
\[ \mathcal{A} = \mathcal{B}([0,1]) \times \{\emptyset, [0,1]\} = \{ B \times [0,1] \mid B \in \mathcal{B}_{[0,1]} \} \]
be the sub-$\sigma$-algebra corresponding to the first coordinate $x_1$ of $(x_1, x_2) \in X = [0,1]^2$. Let $\mu$ be a Borel probability measure on $X$, and $f \in L^1(X, \mathcal{B}, \mu)$. Then $(x_1, x_2) \mapsto E(f \mid \mathcal{A})(x_1, x_2)$ is independent of $x_2$; the property (2.37) may be thought of as meaning that $E(f \mid \mathcal{A})(x_1, x_2)$ is the average of $f$ over the line segment $\{x_1\} \times [0,1]$. Notice that because this line segment $\{x_1\} \times [0,1]$ may well be a $\mu$-null set, this averaging cannot be carried out naively. If $\mu$ happens to be a product measure, so $\mu = \mu_1 \times \mu_2$, then
\[ E(f \mid \mathcal{A})(x_1, x_2) = \int f(x_1, t) \, d\mu_2(t) \]
by Fubini's theorem. However, for the existence of the conditional expectation operator we did not need to know anything about the measure $\mu$, and in particular $\mu$ need not be a product measure. One of the necessary prices paid for this level of generality is that $E(f \mid \mathcal{A})(x_1, x_2)$ is only defined for almost every $(x_1, x_2)$ and not everywhere (but this already appears in Fubini's theorem).
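On a finite probability space the conditional expectation can be written down explicitly, which may help to fix ideas. The sketch below assumes that the sub-$\sigma$-algebra $\mathcal{A}$ is generated by a partition of $X = \{0, \dots, 5\}$ into blocks of positive measure; then $E(f \mid \mathcal{A})$ is constant on each block and equal to the $\mu$-weighted average of $f$ there. The weights, the function values and the helper names are hypothetical, and exact rational arithmetic is used so that (2.37) can be checked without rounding.

```python
from fractions import Fraction as F

# Point masses of a probability measure mu on X = {0, ..., 5} (total mass 1).
mu = [F(1, 10), F(2, 10), F(3, 10), F(1, 10), F(2, 10), F(1, 10)]
f = [F(4), F(-1), F(2), F(0), F(5), F(3)]
blocks = [[0, 1], [2, 3, 4], [5]]  # partition generating the sub-sigma-algebra


def conditional_expectation(f, mu, blocks):
    """Return E(f | A) as a list: the mu-weighted average of f on each block."""
    result = [None] * len(f)
    for block in blocks:
        mass = sum(mu[i] for i in block)  # assumed positive in this sketch
        average = sum(f[i] * mu[i] for i in block) / mass
        for i in block:
            result[i] = average
    return result


def integral(g, mu, subset):
    """Integral of g over the given subset with respect to mu."""
    return sum(g[i] * mu[i] for i in subset)


e = conditional_expectation(f, mu, blocks)
for block in blocks:
    # Property (2.37) on the generating sets of the sub-sigma-algebra.
    print(integral(f, mu, block) == integral(e, mu, block))
```

Since every set in $\mathcal{A}$ is a union of blocks, checking (2.37) on the blocks suffices in this finite setting.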
Exercise 2.80. In the notation of Example 2.79, let $\mu = \frac{1}{2}(\lambda_1 + \lambda_2)$, where $\lambda_1$ is the one-dimensional probability measure supported on the diagonal $\{(x, x) \mid x \in [0,1]\}$, and let $\lambda_2$ be the two-dimensional Lebesgue measure supported on the set
\[ \{ (x, y) \in [0,1]^2 \mid 0 \le y \le x^2 \} \]
and normalized to be a probability measure. Compute $E(f \mid \mathcal{A})$ for $f \in L^1(X)$.
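Proof of Proposition 2.78. By Example 2.60(3), we know that $L^2(X, \mathcal{B}, \mu)$ and $L^2(X, \mathcal{A}, \mu)$ are both Hilbert spaces. In particular, we know that $L^2(X, \mathcal{A}, \mu) \subseteq L^2(X, \mathcal{B}, \mu)$ is closed (since it is complete). It follows that there is an orthogonal projection
\[ P : L^2(X, \mathcal{B}, \mu) \to L^2(X, \mathcal{A}, \mu) \]
just as in Corollary 2.70. Let us also write $E_0 = P$. For any $f \in L^2(X, \mathcal{B}, \mu)$ and $A \in \mathcal{A}$ we have $\chi_A \in L^2(X, \mathcal{A}, \mu)$ and so
\[ \int_A E_0(f) \, d\mu = \langle P(f), \chi_A \rangle = \langle f, \chi_A \rangle = \int_A f \, d\mu. \]
Assume first that $f$ is real-valued, in which case the identity above shows that $E_0(f)$ is also real-valued. Let
\[ A^+ = \{ x \in X \mid E_0(f) > 0 \} \in \mathcal{A}, \qquad A^- = \{ x \in X \mid E_0(f) < 0 \} \in \mathcal{A}. \]
Then
\[ \int_X |E_0(f)| \, d\mu = \int_{A^+} E_0(f) \, d\mu - \int_{A^-} E_0(f) \, d\mu = \int_{A^+} f \, d\mu - \int_{A^-} f \, d\mu \le \int |f| \, d\mu. \]
If $f$ is complex-valued, then we may apply the above to $\Re(f)$ and $\Im(f)$ separately to see that
\[ \|E_0(f)\|_1 \le \|E_0(\Re(f))\|_1 + \|E_0(\Im(f))\|_1 \le \|\Re(f)\|_1 + \|\Im(f)\|_1 \le 2 \|f\|_1. \]
Hence
\[ E_0 : L^2(X, \mathcal{B}, \mu) \to L^2(X, \mathcal{A}, \mu) \subseteq L^1(X, \mathcal{A}, \mu) \]
satisfies the hypotheses of Proposition 2.46 with respect to the dense subspace $V_0 = L^2(X, \mathcal{B}, \mu)$ of $V = L^1(X, \mathcal{B}, \mu)$ with respect to the norm $\|\cdot\|_1$. It follows that there is an extension
\[ E = E(\cdot \mid \mathcal{A}) : L^1(X, \mathcal{B}, \mu) \to L^1(X, \mathcal{A}, \mu). \]
It remains to show that the extension $E$ still satisfies (2.37). For this, let $f \in L^1(X, \mathcal{B}, \mu)$, $A \in \mathcal{A}$ and let $(f_n)$ be a sequence in $L^2(X, \mathcal{B}, \mu)$ with $\|f_n - f\|_1 \to 0$ as $n \to \infty$. Then
\[ \left| \int_A f_n \, d\mu - \int_A f \, d\mu \right| \le \int_A |f_n - f| \, d\mu \le \|f_n - f\|_1 \to 0 \]
as $n \to \infty$, and similarly
\[ \int_A E(f_n \mid \mathcal{A}) \, d\mu \to \int_A E(f \mid \mathcal{A}) \, d\mu \]
as $n \to \infty$. Since each $f_n \in L^2(X, \mathcal{B}, \mu)$ satisfies (2.37) by construction (via the orthogonal projection $P = E_0$), the same must hold for $f \in L^1(X, \mathcal{B}, \mu)$.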
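Exercise 2.81. (a) Show that (2.37) uniquely characterizes $E(f \mid \mathcal{A}) \in L^1(X, \mathcal{A}, \mu)$ as an equivalence class. (b) Show that $f \in L^1(X, \mathcal{B}, \mu)$ and $g \in L^\infty(X, \mathcal{A})$ implies that $E(fg \mid \mathcal{A}) = g E(f \mid \mathcal{A})$. (c) Show that
\[ \|E(f \mid \mathcal{A})\|_1 \le \|f\|_1 \]
even for complex-valued $f$.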
Before stating the next result, we recall some denitions from measure theory. A measure is absolutely continuous with respect to another measure , written if there exists some measurable f 0 with d = f d, that is if there is a nite measurable f 0 with (B ) =
B
f d
for all B B . Two measures and are singular with respect to each other if there exist disjoint measurable sets X1 , X2 X with X = X1 X2 and with (X1 ) = 0 = (X2 ). Finally, recall that a measure is -nite if there is a decomposition of X into measurable sets, X= with (Xi ) < . Proposition 2.82 (RadonNikodym derivative). Let and be two nite measures on a measurable space (X, B ). Then can be decomposed as = abs + sing into the sum of two -nite measures with abs being absolutely continuous with respect to , and with sing and being singular to each other (which will be written sing ).
i=1
Xi
2.5 Hilbert Spaces
95
The proposition implies that there is another, more practical way of checking whether a given $\sigma$-finite measure $\nu$ is absolutely continuous with respect to another $\sigma$-finite measure $\mu$. If $\mu(N) = 0$ implies that $\nu(N) = 0$ for every measurable $N \subseteq X$, then $\nu = \nu_{\mathrm{abs}}$ is absolutely continuous. We also note that the density function $f$ with $f\,d\mu = d\nu$ is called the Radon–Nikodym derivative and is often written $\frac{d\nu}{d\mu}$.

Proof of Proposition 2.82. Suppose that $\mu$ and $\nu$ are both finite measures (the general case can be reduced to this case by using the assumption that $\mu$ and $\nu$ are both $\sigma$-finite; see Exercise 2.84). We define a new measure $m = \mu + \nu$ and will work with the real Hilbert space $H = L^2_m(X)$. On this Hilbert space we define a linear functional $\phi$ by
\[ \phi(g) = \int g\,d\nu \]
for $g \in H = L^2_m(X)$. For $g$ a simple function on $X$, this is clearly well-defined and satisfies
\[ |\phi(g)| = \left|\int g\,d\nu\right| \le \int |g|\,d\nu \le \int |g|\,dm \le \|g\|_{L^2_m}\,\|\mathbb{1}\|_{L^2_m}, \]
where we have used the fact that $m = \mu + \nu$, that $\mu$ is a positive measure, and the Cauchy–Schwarz inequality on $H$. Since the simple functions are dense in $H$, the functional $\phi$ extends to a functional on all of $H$. By Fréchet–Riesz representation (Corollary 2.71) there is some $k \in H = L^2_m$ such that
\[ \int g\,d\nu = \phi(g) = \int gk\,dm. \tag{2.38} \]
We claim that $k$ takes values in $[0, 1]$ almost surely with respect to $m$. Indeed, for any $B \in \mathcal{B}$ we have $0 \le \nu(B) \le m(B)$, so (using $g = \mathbb{1}_B$),
\[ 0 \le \int_B k\,dm \le m(B). \]
Using the choices $B = \{x \in X \mid k(x) < 0\}$ and $B = \{x \in X \mid k(x) > 1\}$ implies the claim that $k$ takes $m$-almost surely values in $[0, 1]$. Since $m = \mu + \nu$, we can reformulate (2.38) as
\[ \int g(1 - k)\,d\nu = \int gk\,d\mu \tag{2.39} \]
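(this holds by construction for all simple functions $g$, and hence for all non-negative measurable functions by monotone convergence). Now define $\nu_{\mathrm{sing}}$ to be $\nu|_{X_1}$ where $X_1 = \{x \in X \mid k(x) = 1\}$. By definition, $\nu_{\mathrm{sing}}(X \setminus X_1) = 0$, and by (2.39) applied with $g = \mathbb{1}_{X_1}$ we also have $\mu(X_1) = 0$. Therefore $\nu_{\mathrm{sing}} \perp \mu$. We also define
\[ \nu_{\mathrm{abs}} = \nu|_{X \setminus X_1} = \nu|_{\{x \mid k(x) < 1\}} \]
so that $\nu = \nu_{\mathrm{sing}} + \nu_{\mathrm{abs}}$. Define the function $f = \frac{k}{1-k} \ge 0$ on $X \setminus X_1$, and let $g \ge 0$ be measurable. Then by (2.39) we have
\[ \int_{X \setminus X_1} gf\,d\mu = \int_{X \setminus X_1} \frac{g}{1-k}\,k\,d\mu = \int_{X \setminus X_1} \frac{g}{1-k}\,(1-k)\,d\nu = \int_{X \setminus X_1} g\,d\nu_{\mathrm{abs}}, \]
which shows that $d\nu_{\mathrm{abs}} = f\,d\mu$, and so $\nu_{\mathrm{abs}} \ll \mu$.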

Exercise 2.83. Use Lemma 2.51 and Proposition 2.82 to calculate the norm of the functional $C_c(X) \ni f \mapsto \int f\,d\mu - \int f\,d\nu$ for two finite Borel measures $\mu$ and $\nu$ on $X$ (which may or may not be mutually singular).

Exercise 2.84. Extend Proposition 2.82 to the $\sigma$-finite case (for example, by using the result for the case of a finite measure space and constructing for each of the two measures a strictly positive integrable function).
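The construction in the proof of Proposition 2.82 can be traced by hand on a finite set, where every measure is a sum of point masses. The following is a minimal sketch under that assumption: the function name `lebesgue_decomposition` and the dictionary encoding of measures are illustrative choices, not notation from the notes. It computes $k = d\nu/dm$ with $m = \mu + \nu$, puts the points with $k = 1$ into $X_1$, and uses $f = k/(1-k)$ elsewhere.

```python
from fractions import Fraction


def lebesgue_decomposition(mu, nu):
    """Split nu into (nu_abs, nu_sing, density) relative to mu on a finite set.

    mu and nu are dicts mapping points to non-negative weights (point masses).
    This mirrors the proof: m = mu + nu, k = d(nu)/d(m), X1 = {k = 1},
    and f = k / (1 - k) on the complement of X1.
    """
    points = sorted(set(mu) | set(nu))
    nu_abs, nu_sing, density = {}, {}, {}
    for x in points:
        mu_x = Fraction(mu.get(x, 0))
        nu_x = Fraction(nu.get(x, 0))
        m_x = mu_x + nu_x
        if m_x == 0:
            continue  # an m-null point carries no mass of either measure
        k = nu_x / m_x  # value of the Riesz representative k at x
        if k == 1:
            nu_sing[x] = nu_x  # x lies in X1, and mu({x}) = 0 there
        else:
            nu_abs[x] = nu_x
            density[x] = k / (1 - k)  # the Radon-Nikodym derivative f
    return nu_abs, nu_sing, density


# Hypothetical example data: mu vanishes at "c", nu vanishes at "b".
mu = {"a": 1, "b": 2, "c": 0}
nu = {"a": 3, "b": 0, "c": 5}
nu_abs, nu_sing, f = lebesgue_decomposition(mu, nu)
```

In this example the singular part lives on the single point where $\mu$ has no mass, and on the remaining points the identity $\nu_{\mathrm{abs}}(\{x\}) = f(x)\,\mu(\{x\})$ can be checked directly.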
2.5.5 Orthonormal Bases and Gram–Schmidt

Definition 2.85. A set $\{x_j \mid j \in J\}$ of vectors is called orthonormal if
\[ \langle x_j, x_k \rangle = \delta_{jk} = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{if } j \ne k. \end{cases} \]
In other words, we require that all the vectors have length one and are mutually orthogonal. As one might expect, this notion is fundamental for Hilbert spaces, and gives rise to the following satisfying abstract result, which as we will see generalizes part of Fourier analysis.
Proposition 2.86 (The closed linear hull of an orthonormal sequence). Let $H$ be a Hilbert space and $I$ a finite or countable index set. Then the closed linear hull of an orthonormal set $\{x_j \mid j \in I\}$ is given by
\[ \overline{\langle \{x_j \mid j \in I\} \rangle} = \Big\{ x = \sum_{j \in I} a_j x_j \;\Big|\; \text{the sum converges in } H \Big\}. \]
Moreover, $x = \sum_{j \in I} a_j x_j$ converges in $H$ if and only if $\sum_{j \in I} |a_j|^2 < \infty$. In this case,
\[ \|x\|^2 = \sum_{j \in I} |a_j|^2 \]
and $a_j = \langle x, x_j \rangle$ for all $j \in I$.

Notice that the result above also implies that the sum $\sum_{j=1}^{\infty} a_j x_j$ is independent of the order of the terms $a_j x_j$. In fact, if we know that $\{x_j \mid j \in J\}$ consists of orthonormal elements, then if $x = \sum_{j \in J} a_j x_j$ we must have $a_j = \langle x, x_j \rangle$ and so the inner product of $x$ with any element of the linear hull, or even with any element of the closed linear hull of $\{x_j \mid j \in J\}$, is determined by the coefficients only, and is independent of the order of the sum. Since $x$ is determined by its associated functional $\phi_x(h) = \langle h, x \rangle$ in Corollary 2.71, the claim follows. In general the series $\sum_{j \in J} a_j x_j$ arising may not be absolutely convergent, so Exercise 2.23 would not apply.

Proof of Proposition 2.86. If the index set is finite, then
\[ \Big\| \sum_{j \in I} a_j x_j \Big\|^2 = \sum_{j \in I} |a_j|^2 \]
by the assumption of orthonormality, the Pythagoras formula (2.36), and induction. Also by orthonormality we have
\[ \Big\langle \sum_{j \in I} a_j x_j, x_k \Big\rangle = a_k \]
for all $k \in I$. This proves the statements in the case of finite $I$.

Now suppose that $I = \mathbb{N}$, and define the subspace
\[ F = \{(a_n) \in \ell^2(\mathbb{N}) \mid \text{there exists some } N \text{ with } a_n = 0 \text{ for } n > N\} \]
of the space $\ell^2(\mathbb{N})$ from Example 2.60(2). Also define a linear map
\[ \phi : F \to H, \qquad (a_n) \mapsto \sum_{n=1}^{\infty} a_n x_n, \]
where the sum is actually finite by the definition of the subspace $F$. Notice that $\phi(F)$ is the linear hull of the set $\{x_n \mid n \in \mathbb{N}\}$. The case of $I$ finite now shows that
\[ \|\phi((a_n))\| = \|(a_n)\|_2 \tag{2.40} \]
and
\[ \langle \phi((a_n)), x_j \rangle = a_j \tag{2.41} \]
for all $j \in \mathbb{N}$. Now notice that we may think of $F \subseteq \ell^2(\mathbb{N})$ as the space of simple functions, which is dense, since if $(a_n) \in \ell^2(\mathbb{N})$ then
\[ \big\| \underbrace{(a_1, a_2, \dots, a_N, 0, \dots)}_{\in F} - (a_1, a_2, \dots) \big\|_2^2 = \sum_{n=N+1}^{\infty} |a_n|^2 \to 0 \]
as $N \to \infty$. Hence $\phi$ can be extended to all of $\ell^2(\mathbb{N})$ by Proposition 2.46. Now the properties (2.40) and (2.41) extend to all of $\ell^2(\mathbb{N})$ by continuity. Moreover, as $\phi$ is (the extension of $\phi$ and hence) an isometry from $\ell^2(\mathbb{N})$ to its image $\phi(\ell^2(\mathbb{N})) \subseteq H$, the image is complete and hence closed. This proves all the claims in the proposition: the closed linear hull must contain all convergent sums $x = \sum_{n=1}^{\infty} a_n x_n$. However, the closed linear hull is also equal to $\phi(\ell^2(\mathbb{N}))$; for any converging sum $x = \sum_{n=1}^{\infty} a_n x_n$ we must have $a_n = \langle x, x_n \rangle$, and the coefficients are uniquely determined, so we must have $\sum_{n=1}^{\infty} |a_n|^2 < \infty$.

A list of orthonormal vectors in a Hilbert space $H$ is said to be complete (or to be an orthonormal basis) if its closed linear hull is $H$.

Theorem 2.87 (Gram–Schmidt). Every separable Hilbert space $H$ has an orthonormal basis. If $H$ is $n$-dimensional, then $H$ is isomorphic to $\mathbb{R}^n$ or $\mathbb{C}^n$. If $H$ is not finite-dimensional, then $H$ is isomorphic to $\ell^2(\mathbb{N})$.

Here isomorphic means isomorphic as Hilbert spaces, so there is a linear bijection between the spaces that preserves the inner product. The proof of Theorem 2.87 is simply an interpretation of the familiar Gram–Schmidt orthonormalization procedure.

Proof of Theorem 2.87. Let $\{y_1, y_2, \dots\} \subseteq H$ be a dense countable subset. We are going to use the vectors $\{y_n\}$ to construct an orthonormal list of vectors which has the same linear hull. This is built up from the simple geometrical observation that if a vector $v$ does not lie in the linear span of a finite set of vectors, then something from the linear span may be added to $v$ to produce a non-zero vector orthogonal to the linear span.
If $y_1 \ne 0$, define $x_1 = \frac{y_1}{\|y_1\|}$ (a vanishing $y_1$ is simply skipped). Suppose now that we have already constructed orthonormal vectors $x_1, \dots, x_n$ by using the vectors $y_1, \dots, y_k$ with $k \ge n$ in such a way that
\[ V_n = \langle x_1, \dots, x_n \rangle = \langle y_1, \dots, y_k \rangle. \]
Now decompose $y_{k+1}$ into a sum $v_{k+1} + w_{k+1}$ with $v_{k+1} \in V_n$ and $w_{k+1} \in V_n^{\perp}$ using Theorem 2.70. If $w_{k+1} = 0$, then we increase $k$ but do not define a new vector $x_{n+1}$, as in this case we already have
\[ V_n = \langle x_1, \dots, x_n \rangle = \langle y_1, \dots, y_{k+1} \rangle. \]
If $w_{k+1} \ne 0$ then we define $x_{n+1} = \frac{w_{k+1}}{\|w_{k+1}\|}$. Once again we have
\[ V_{n+1} = \langle x_1, \dots, x_{n+1} \rangle = \langle y_1, \dots, y_{k+1} \rangle. \]
Continuing this construction, we either see that $H$ is $n$-dimensional for some $n \ge 0$ and we have produced an orthonormal basis $\{x_1, \dots, x_n\}$ for $H$, or we see that $H$ is infinite-dimensional, and we construct an infinite list $x_1, x_2, \dots$ in $H$ of orthonormal vectors. In this case the linear hull of $\{x_1, x_2, \dots\}$ contains the original dense set $\{y_1, y_2, \dots\}$ and so the closed linear hull must be all of $H$. Proposition 2.86 and its proof now complete the proof of the theorem.

2.5.6 The Non-Separable Case

While the motivation generated by natural examples, and the notational convenience of thinking of countable collections as sequences, incline one strongly to the separable case, there is no reason to restrict attention completely to separable Hilbert spaces.

Example 2.88. Let $I$ be a set, equipped with the discrete topology and the counting measure $\lambda_{\mathrm{count}}$ defined on the $\sigma$-algebra $\mathcal{P}(I)$ of all subsets of $I$. Then $\ell^2(I) = L^2(I, \mathcal{P}(I), \lambda_{\mathrm{count}})$ is a Hilbert space, and it comprises all functions $a : I \to \mathbb{R}$ (or $\mathbb{C}$) for which the support $F = \{i \in I \mid a(i) \ne 0\}$ is finite or countable, and for which
\[ \sum_{i \in I} |a_i|^2 = \sum_{i \in F} |a_i|^2 < \infty. \]
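The orthonormalization step in the proof of Theorem 2.87 can be sketched numerically in a finite-dimensional real inner product space. This is only an illustration under stated assumptions: vectors are plain Python lists in $\mathbb{R}^3$, the helper names `inner` and `gram_schmidt` are hypothetical, and a small tolerance `tol` stands in for the exact test $w_{k+1} = 0$, which floating-point arithmetic cannot decide.

```python
import math


def inner(u, v):
    """Standard inner product on R^n (real case only)."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list with the same linear hull as `vectors`.

    For each y we subtract its components along the vectors found so far,
    leaving the part w orthogonal to their span. If w is (numerically) zero,
    y already lies in the span and no new vector is added.
    """
    basis = []
    for y in vectors:
        w = list(y)
        for x in basis:
            c = inner(w, x)
            w = [wi - c * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(inner(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


# Hypothetical input: a zero vector and a repeated direction are both skipped.
ys = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [1.0, 0.0, 1.0]]
xs = gram_schmidt(ys)

# Coefficients a_j = <x, x_j> of a vector in the span, as in Proposition 2.86.
x = [1.0, 0.0, 1.0]
coefficients = [inner(x, xj) for xj in xs]
```

For a vector in the span, the sum of the squared coefficients should agree with $\|x\|^2$ up to rounding, which is the finite case of Proposition 2.86.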
Theorem 2.89 (Non-separable Gram–Schmidt). Let $H$ be a non-separable Hilbert space. Then there exists an uncountable set $I$ such that $H \cong \ell^2(I)$. Indeed, for every $i \in I$ there is an element $x_i \in H$ so that the set $\{x_i \mid i \in I\}$ is orthonormal. The isomorphism between $H$ and $\ell^2(I)$ is given by
\[ \ell^2(I) \ni a \longmapsto \sum_{i \in I} a(i) x_i, \]
where the sum on the right is countable and convergent.
Proof. We will construct a maximal orthonormal set of vectors by using Zorn's lemma (see Appendix A.1). Define a partially ordered set
\[ \mathcal{F} = \{(I, x_{\cdot}) \mid \text{the function } x_{\cdot} : I \to H \text{ has orthonormal image}\}, \]
with partial order defined by
\[ (I, x_{\cdot}) \preccurlyeq (J, y_{\cdot}) \]
if $I \subseteq J$ and $x_{\cdot} = y_{\cdot}|_I$. In this partially ordered set every linear chain has an upper bound, which can be found by simply taking the union of the index sets and the natural extension of the partially defined functions to the union. It follows that there exists a maximal element $(I, x_{\cdot})$ of this partially ordered set by Zorn's lemma. Using this, define
\[ \phi : \ell^2(I) \to H, \qquad (a_i \mid i \in I) \longmapsto \sum_{i \in I} a(i) x_i, \]
first on the subset of all elements $a \in \ell^2(I)$ with $|\{i \in I \mid a(i) \ne 0\}| < \infty$, and then (by applying Proposition 2.46) on all of $\ell^2(I)$. This again defines an isomorphism from $\ell^2(I)$ to $Y = \phi(\ell^2(I)) \subseteq H$. We claim that $Y = H$, for otherwise there would exist some $x \in Y^{\perp}$ of norm one by Corollary 2.69, and using this element $x$ we can define a new element of $\mathcal{F}$ which is strictly bigger than the maximal element $(I, x_{\cdot})$ in the partial order. This contradiction shows the claim and hence the theorem.
2.6 Further Topics

The material above represents the basic language and some of the main examples of functional analysis and its first immediate applications. Let us mention briefly some directions in which the theory continues.

Hilbert spaces are at the heart of many developments. We will start to see this in the context of Fourier series, Sobolev spaces and the Laplace differential operator in Chapters 3 and 4. In Chapter 5 we will see why we insisted on completeness in the definition of Banach and Hilbert spaces.

Footnote (to the definition of the partially ordered set $\mathcal{F}$ in the proof of Theorem 2.89): In order to ensure that this definition does indeed define a set, we could choose $I$ as a subset of $H$ and let $x_{\cdot}$ be the identity, or insist that $I$ belongs to the cardinal of the power set (set of all subsets) $\mathcal{P}(H)$.
We have seen the definition of dual spaces, but only found a description of the dual of a Hilbert space. This will be corrected in Chapter 6, where we will describe the dual spaces to many Banach spaces that we discussed here. Some natural spaces (examples include $C_c(X)$ and $C^\infty([0, 1])$) do not fit into the framework of Banach spaces, but do fit into the more general context of locally convex spaces. These will be introduced in Chapter 7.

Banach algebras will be discussed in greater detail in Chapter 9, which lays the foundations for the more advanced spectral theory in Chapters 10 and 11.

The first three of the topics above are, apart from a few examples and applications, independent of each other, so the reader may continue with any of Chapter 3 (followed for example by Chapter 4), Chapter 5, or Chapter 6. However, Chapters 7 and 8 depend more heavily on Chapters 5 and 6. Some topics that we touched on in this chapter we will not return to, but let us give a few references for further reading. The Bergman spaces are useful in complex analysis, see e.g. ??. (others)
3 From Fourier Series to Dirichlet Boundary Value Problems
In this chapter we will pick up some of the informal claims from Chapter 1, prove them, and locate them within the theory developed in Chapter 2. In particular, we will introduce Fourier series (in two different settings, the first abstract and the second being the familiar setting of the torus) and will use them as a tool to study the Dirichlet boundary value problem. To bridge the gap between these two topics we will develop some of the general theory of functional analysis along the way. The emphasis is on seeing some of the abstract ideas in the relatively concrete setting of more familiar objects.
3.1 Fourier Series on Compact Abelian Groups

Definition 3.1. A topological group is a group $G$ that carries a topology with respect to which the maps $(g, h) \mapsto gh$ and $g \mapsto g^{-1}$ are continuous as maps $G \times G \to G$ and $G \to G$ respectively. A compact ($\sigma$-compact, locally compact, and so on) group is a topological group for which the topological space is compact ($\sigma$-compact, locally compact, and so on).

We similarly extend other topological and algebraic properties to topological groups. For example, a metric compact abelian group is a compact metric topological space with an abelian group structure satisfying the continuity conditions above. Below we will be largely concerned with specific metric abelian groups, in which the circle or 1-torus $\mathbb{T} = \mathbb{R}/\mathbb{Z} \cong \mathbb{S}^1$ and the $d$-torus
\[ \mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d \cong (\mathbb{S}^1)^d \]
are the main examples (which will also be discussed in Section 3.2 from a slightly different, more concrete, point of view). The notation $\mathbb{T}$ will be used for the additive circle and $\mathbb{S}^1 = \{z \in \mathbb{C} \mid |z| = 1\}$ for the multiplicative circle. Here compact and abelian are necessary assumptions for the type of result we prove, and dropping either of these two assumptions changes the theory significantly. However, we assume metrizability mainly for convenience, because it gives rise to the following useful observation.

Footnote (to the chapter title): This chapter will be concerned with applications of the basic theory of Hilbert spaces given in Section 2.5, and is not part of the core development of the theory given later. However, skipping both Sections 3.1 and 3.2 would remove the main examples from the theory of Hilbert spaces.

Lemma 3.2. Let $(X, d)$ be a compact metric space. Then $C(X)$ is separable with respect to the uniform metric $d(f, g) = \|f - g\|_\infty$.

Proof. The space $X$ is separable (this may be seen, for example, from the proof of Theorem 2.31) so we may choose a countable dense set $\{x_1, x_2, \dots\} \subseteq X$. We now define $f_n(x) = d(x, x_n)$, and claim that these functions separate points in $X$. That is, if $x \ne y$ then there exists some $n$ for which $f_n(x) \ne f_n(y)$. To see this, notice that by density there is some $n$ with $d(x, x_n) = f_n(x) < \frac{1}{2} d(x, y)$, which implies that $f_n(y) = d(y, x_n) > \frac{1}{2} d(x, y)$ by the triangle inequality. Now let $A_{\mathbb{Q}} = \mathbb{Q}[f_0 = 1, f_1, f_2, \dots]$ be the $\mathbb{Q}$-algebra generated by the functions $f_1, f_2, \dots$ together with the constant function $f_0 = 1$. Clearly $A_{\mathbb{Q}}$ is countable, and the closure of $A_{\mathbb{Q}}$ contains the algebra $A = \mathbb{R}[f_0, f_1, f_2, \dots]$. Since $A$ is an algebra that separates points, it is dense in $C_{\mathbb{R}}(X)$ (and $A + iA$ is dense in $C_{\mathbb{C}}(X)$) by the Stone–Weierstrass theorem (Theorem 2.34).

We will use the following facts about compact abelian groups as black boxes: we will not need to know how they are proved. However, we will also see that these are often easy to prove if the group is given concretely.

Fact 1 (Existence of Haar measure). Every locally compact group $G$ has a left Haar measure $m_G$, satisfying (and characterized by) the properties: $m_G(K) < \infty$ for any compact set $K \subseteq G$; $m_G(O) > 0$ for any non-empty open set $O \subseteq G$; and $m_G(gB) = m_G(B)$ for all measurable $B \subseteq G$ and $g \in G$.
We will usually be dealing with metrizable groups, which simplifies the measure theory needed, but the existence of Haar measure only requires the group to be locally compact. For compact metric abelian groups we will be able to give an independent proof of Fact 1 (see Exercise 7.13). For $G = \mathbb{T}^d$, which as a measurable space can be identified with $[0, 1)^d$, the Haar measure is simply the $d$-dimensional Lebesgue measure restricted to $[0, 1)^d$.
This will be discussed further in Section ??.
Exercise 3.3. Show that the Lebesgue measure on $[0, 1)^d$ considered as a measure on $\mathbb{T}^d$ satisfies all the properties of Fact 1.
Before stating the next fact, we recall that a character on a topological group $G$ is a continuous homomorphism
\[ \chi : G \to \mathbb{S}^1 = \{z \in \mathbb{C} \mid |z| = 1\}. \tag{3.1} \]
The trivial character is the character defined by $\chi(g) = 1$ for all $g \in G$. A collection $\mathcal{F}$ of functions on $G$ (and, in particular, a collection of characters) is said to separate points if for any $g, h \in G$ with $g \ne h$ there is some $f \in \mathcal{F}$ with $f(g) \ne f(h)$.

Fact 2 (Completeness of characters). On every locally compact abelian group $G$ there are enough characters to separate points.

Footnote (to Fact 2): This will be established in Section ??.

For $G = \mathbb{T}$ this is trivial, because the single character $\chi$ defined by $\chi(x + \mathbb{Z}) = e^{2\pi i x}$ for $x \in \mathbb{R}$ already separates points since it is an isomorphism between $\mathbb{T}$ and $\mathbb{S}^1$. For $G = \mathbb{T}^d$ the characters $\chi_1, \dots, \chi_d$, where $\chi_j(x + \mathbb{Z}^d) = e^{2\pi i x_j}$ for $x = (x_1, \dots, x_d) \in \mathbb{R}^d$, separate points since if $x \ne y$ in $\mathbb{T}^d$ we must have some $j \in \{1, \dots, d\}$ with $x_j \ne y_j$ in $\mathbb{T}$, and then $\chi_j(x) \ne \chi_j(y)$.

In general discussions about characters we will often need to parameterize the collection of all characters abstractly using symbols, and then write the characters with these symbols as indices. For example, we will see shortly that the characters on $\mathbb{T}^d$ are parameterized by elements $n \in \mathbb{Z}^d$, and so we may write characters on $\mathbb{T}^d$ as $\chi_n(x + \mathbb{Z}^d) = e^{2\pi i n \cdot x}$. We will also write $x \in \mathbb{T}^d$ as shorthand for the element $x + \mathbb{Z}^d \in \mathbb{T}^d$, and whenever convenient we identify $x \in \mathbb{T}^d$ with $x \in [0, 1)^d$.

Assuming Facts 1 and 2 for a compact metric abelian group $G$, we will now describe the theory of Fourier series on $G$. This will give a complete description of $L^2_m(G)$ where $m$ is the Haar measure on $G$. For convenience we normalize $m$ to satisfy $m(G) = 1$.

Theorem 3.4 (Fourier series). Assume that a metric compact abelian group $G$ satisfies Facts 1 and 2. Then the set of characters forms an orthonormal basis of $L^2_m(G)$. That is, the set of characters is an orthonormal set and any $f \in L^2_m(G)$ may be written
\[ f = \sum_{\chi} a_\chi \chi, \]
where the sum, which runs over all the characters of $G$, is convergent(3) with respect to $\|\cdot\|_2$, the equality is meant as elements of $L^2_m(G)$, the coefficients are given by $a_\chi = \langle f, \chi \rangle$, and they satisfy
\[ \sum_{\chi} |a_\chi|^2 = \|f\|_2^2. \]
Exercise 3.5. Rephrase Theorem 3.4 for $\mathbb{T}^d$ for real-valued functions using sine and cosine functions (and notice as a result how much easier it is notationally to use characters for $d > 1$).
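To make the statement of Theorem 3.4 concrete, it can be checked numerically on the finite cyclic group $\mathbb{Z}/N\mathbb{Z}$, where the normalized Haar measure gives each point mass $1/N$. The sketch below assumes the functions $x \mapsto e^{2\pi i k x/N}$ for $k = 0, \dots, N-1$ as the characters; the names `character` and `inner` and the sample function `f` are illustrative only. A numerical check of this kind is of course not a proof.

```python
import cmath


def character(k, N):
    """Values of x -> exp(2*pi*i*k*x/N) on Z/NZ, listed for x = 0, ..., N-1."""
    return [cmath.exp(2j * cmath.pi * k * x / N) for x in range(N)]


def inner(f, g):
    """Inner product in L^2 of Z/NZ with normalized counting measure."""
    return sum(a * b.conjugate() for a, b in zip(f, g)) / len(f)


N = 6
chars = [character(k, N) for k in range(N)]

# Gram matrix of the characters: should be close to the identity matrix.
gram = [[inner(chars[j], chars[k]) for k in range(N)] for j in range(N)]

# An arbitrary (hypothetical) complex-valued function on Z/NZ.
f = [complex(x * x, -x) for x in range(N)]
coeffs = [inner(f, chi) for chi in chars]  # a_chi = <f, chi>

# Fourier series of f evaluated pointwise, and the gap in Parseval's identity.
reconstructed = [sum(coeffs[k] * chars[k][x] for k in range(N)) for x in range(N)]
parseval_gap = abs(sum(abs(a) ** 2 for a in coeffs) - inner(f, f).real)
```

Here `reconstructed` should agree with `f` and `parseval_gap` should vanish, both up to rounding error.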
Proof of Theorem 3.4. Let $\chi$ be a non-trivial character on $G$, so that there is some element $g \in G$ with $\chi(g) \ne 1$. Since $\chi$ is continuous and $m(G) = 1$, the function $\chi$ is integrable, and
\[ \int_G \chi(x)\,dm(x) = \int_G \chi(g + x)\,dm(x) \tag{3.2} \]
\[ \phantom{\int_G \chi(x)\,dm(x)} = \chi(g) \int_G \chi(x)\,dm(x), \tag{3.3} \]
where in (3.2) we used the defining rotation-invariance property of the Haar measure extended to integration in the form
\[ \int_G f(x)\,dm(x) = \int_G f(g + x)\,dm(x) \]
for integrable functions, and in (3.3) we used the fact that a character is in particular a homomorphism. However, we have chosen $g$ with $\chi(g) \ne 1$ so (3.2) and (3.3) give
\[ \int_G \chi\,dm = 0. \]
Now let $\chi_1, \chi_2$ be any characters, and write $\chi = \chi_1 \overline{\chi_2}$. Then $\chi$ is also a character, and since $\overline{\chi_2(g)} = (\chi_2(g))^{-1}$, we see that $\chi$ is trivial if and only if $\chi_1 = \chi_2$. Therefore the calculation above gives
\[ \langle \chi_1, \chi_2 \rangle = \int_G \chi_1 \overline{\chi_2}\,dm = \delta_{\chi_1, \chi_2} = \begin{cases} m(G) = 1 & \text{if } \chi_1 = \chi_2; \\ 0 & \text{if } \chi_1 \ne \chi_2, \end{cases} \]
so the characters form an orthonormal set (and this is a consequence of Fact 1).

Next we claim that there are only countably many characters on $G$. For this, notice that by orthonormality of the set of characters, the $L^2$ distance between any two distinct characters is $\sqrt{2}$. By Lemma 3.2, $C(G)$ is separable
with respect to the $\|\cdot\|_\infty$ norm. This extends to $L^2(G)$ with respect to $\|\cdot\|_2$ since the bound
\[ \|f\|_2 = \left( \int |f(g)|^2\,dm(g) \right)^{1/2} \le \|f\|_\infty \]
shows that there is a continuous embedding $C(G) \to L^2(G)$, which we know has dense image by Proposition 2.38. It follows that there can be only countably many distinct characters, since an uncountable collection would give rise to an uncountable collection of disjoint open balls of radius $\frac{1}{2}\sqrt{2}$, contradicting separability.

In order to show completeness we will use Fact 2. Define the complex linear hull
\[ A = \langle \chi \mid \chi \text{ a character on } G \rangle, \]
and notice that $A$ is an algebra since the product of two characters is another character. Also notice that $A$ is closed under conjugation, since $\overline{\chi}(g) = \overline{\chi(g)}$ for $g \in G$ defines another character if $\chi$ is a character. Since by Fact 2 the algebra $A$ separates points in $G$, the complex version of the Stone–Weierstrass theorem now implies that $A$ is dense in $C_{\mathbb{C}}(G)$ with respect to $\|\cdot\|_\infty$. However, by the continuity of the embedding from $C(G)$ to $L^2(G)$ the closed linear hull of $A$ in $L^2_m(G)$ contains $C_{\mathbb{C}}(G)$ and so by Proposition 2.38 must be all of $L^2_m(G)$. Now Theorem 3.4 follows from Theorem 2.87.

Exercise 3.6. Show that any character on $\mathbb{T}^d$ is of the form $\chi_n$ for some $n \in \mathbb{Z}^d$ by using the proof of Theorem 3.4.

Exercise 3.7. Find all the characters on $G = \mathbb{Z}/N\mathbb{Z}$ and prove Theorem 3.4 directly for this case.

Exercise 3.8. Describe all the characters on $G = \mathbb{Z}_p = \varprojlim \mathbb{Z}/p^n\mathbb{Z}$, the compact group of $p$-adic integers.

3.2 Fourier Series on $\mathbb{T}^d$

The discussion in Section 3.1 applies in particular to the torus $G = \mathbb{T}^d$, giving the basic theory of Fourier series of $L^2$ functions there, but this case is so important that we will treat it in greater detail here. Along the way, we will give an independent proof of Theorem 3.4 for the torus in this section. For this section, we will define a character on $\mathbb{T}^d$ to be a function of the form
\[ \chi_n(x) = e^{2\pi i n \cdot x} = e^{2\pi i (n_1 x_1 + \cdots + n_d x_d)} \]
for all $x \in \mathbb{T}^d$, for some $n \in \mathbb{Z}^d$. We will see in Corollary 3.20 that these are indeed all the characters on $\mathbb{T}^d$ in the sense of Section 3.1 (also see Exercise 3.6). A trigonometric polynomial is a finite linear combination
\[ p = \sum_{n \in F} a_n \chi_n \]
of characters, so $F \subseteq \mathbb{Z}^d$ is a finite set and $a_n \in \mathbb{C}$ for all $n \in F$. As in the proof of Theorem 3.4, one can use the complex version of the Stone–Weierstrass theorem to show that every continuous function can be approximated by a trigonometric polynomial. In this section we will give another proof of this using convolution.

Theorem 3.9 (Fourier series on the torus). The characters $\chi_n$ with $n \in \mathbb{Z}^d$ form a complete orthonormal basis for $L^2(\mathbb{T}^d)$, so that every $f \in L^2(\mathbb{T}^d)$ is given by an $L^2$ convergent Fourier series
\[ f = \sum_{n \in \mathbb{Z}^d} a_n \chi_n, \tag{3.4} \]
where the $a_n$ are the Fourier coefficients defined by
\[ a_n = a_n(f) = \langle f, \chi_n \rangle = \int_{\mathbb{T}^d} f(t) \overline{\chi_n(t)}\,dt \]
for $n \in \mathbb{Z}^d$. Moreover,
\[ \|f\|_2^2 = \sum_{n \in \mathbb{Z}^d} |a_n|^2. \tag{3.5} \]

The relation (3.5) is Parseval's formula, and it may be viewed as an infinite-dimensional form of Pythagoras' theorem. We will see later (see Theorem 5.7 in Section 5.1.1) that it is too much to ask for the Fourier series of a continuous function to converge uniformly, or even pointwise. However, some additional smoothness assumptions do imply uniform convergence of the Fourier series, and this will be the starting point for our excursion into the theory of Sobolev spaces in Sections 3.4, 3.5 and 4.4.

Theorem 3.10 (Differentiability and Fourier series). Suppose that $f \in C^k(\mathbb{T}^d)$ for some $k \ge 1$. Let $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$. Then the Fourier coefficient $a_n(\partial_\alpha f)$ of $\partial_\alpha f$ is given by
\[ a_n(\partial_\alpha f) = (2\pi i n_1)^{\alpha_1} \cdots (2\pi i n_d)^{\alpha_d}\, a_n(f). \tag{3.6} \]
If $k > d/2$, then the Fourier series on the right-hand side of (3.4) converges absolutely, and
\[ \|f\|_\infty \ll_d \sqrt{ \|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2 }. \]
In order to prove this result we will need to discuss convolution.

Footnote (to "converges absolutely" in Theorem 3.10): The reader may wonder in what sense, and the answer is in all of them: with respect to $\|\cdot\|_2$, pointwise at every point, and moreover with respect to $\|\cdot\|_\infty$.
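3.2.1 Convolution on the Torus

Definition 3.11. Let $f, g \in L^2(\mathbb{T}^d)$, or more generally let $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$ with $\frac{1}{p} + \frac{1}{q} = 1$ and $p, q \ge 1$. Then the convolution of $f$ and $g$ is the function
\[ f * g(x) = \int_{\mathbb{T}^d} f(t) g(x - t)\,dt. \]
A pair of numbers $p, q$ related as in Definition 3.11 are called Hölder conjugate or conjugate exponents due to Hölder's inequality $\|fg\|_1 \le \|f\|_p \|g\|_q$ for $f \in L^p$ and $g \in L^q$. This implies that the integral defining $f * g(x)$ exists for all $x \in \mathbb{T}^d$.

Lemma 3.12. Let $g \in L^q(\mathbb{T}^d)$ with $1 \le q < \infty$. Then the shifted function $g^x \in L^q(\mathbb{T}^d)$ defined by $g^x(t) = g(t - x)$ depends continuously on $x \in \mathbb{T}^d$ in the $\|\cdot\|_q$ norm. Moreover, if $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$ with $1 \le p \le \infty$ and $\frac{1}{p} + \frac{1}{q} = 1$, then $f * g \in C(\mathbb{T}^d)$.

Proof. As noted above, the convolution $f * g(x)$ is well-defined for all $x \in \mathbb{T}^d$ by Hölder's inequality. For $1 \le q < \infty$, $C(\mathbb{T}^d)$ is dense in $L^q(\mathbb{T}^d)$ by Proposition 2.38. Now fix $\varepsilon > 0$ and choose $F \in C(\mathbb{T}^d)$ with $\|g - F\|_q < \varepsilon$. Then there exists some $\delta > 0$ for which
\[ d(x, y) < \delta \implies \|F^x - F^y\|_\infty < \varepsilon \implies \|F^x - F^y\|_q < \varepsilon. \]
Since shifting functions preserves their integrals and their $q$-norms, we deduce that
\[ d(x, y) < \delta \implies \|g^x - g^y\|_q \le \|g^x - F^x\|_q + \|F^x - F^y\|_q + \|F^y - g^y\|_q < 3\varepsilon, \tag{3.7} \]
showing continuity. Now suppose that $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$ with $q < \infty$, fix $\varepsilon > 0$ and choose $\delta > 0$ so that (3.7) holds. Then for $d(x, y) < \delta$ we have
\[ |f * g(x) - f * g(y)| = \left| \int_{\mathbb{T}^d} f(t) \big( g(x - t) - g(y - t) \big)\,dt \right| \le \int_{\mathbb{T}^d} |f(t)|\,|g(x - t) - g(y - t)|\,dt \le \|f\|_p \|g^x - g^y\|_q \le 3\varepsilon \|f\|_p \]
by the Hölder inequality and (3.7). The case $p = 1$ (corresponding to $q = \infty$) follows by switching $f$ and $g$ (and hence $p$ and $q$), which is permitted by (1) in the next lemma.

Lemma 3.13. Let $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$. Then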
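(1) $f * g = g * f$;
(2) $f * \chi_n = \left( \int f(t) \overline{\chi_n(t)}\,dt \right) \chi_n$; and
(3) $\langle \chi_m, \chi_n \rangle = \delta_{m,n}$.

Proof. The first formula follows by a simple substitution (see Exercises 3.3 and 3.14), writing $u = x - t$:
\[ f * g(x) = \int_{\mathbb{T}^d} f(t) g(x - t)\,dt = \int_{\mathbb{T}^d} f(x - u) g(u)\,du = g * f(x). \]
The second formula follows from the definition, since
\[ f * \chi_n(x) = \int_{\mathbb{T}^d} f(t) \chi_n(x - t)\,dt = \int_{\mathbb{T}^d} f(t) \chi_n(x) \overline{\chi_n(t)}\,dt = \left( \int_{\mathbb{T}^d} f(t) \overline{\chi_n(t)}\,dt \right) \chi_n(x). \]
For the last identity (which is a general property of characters, as we have seen in Section 3.1), note that
\[ \chi_m(t) \overline{\chi_n(t)} = \chi_{m-n}(t) = e^{2\pi i ((m_1 - n_1) t_1 + \cdots + (m_d - n_d) t_d)}, \]
and integrate this character over $\mathbb{T}^d$ to see the result.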
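Parts (1) and (2) of Lemma 3.13 can be checked numerically for $d = 1$. The sketch below makes two assumptions that are not in the notes: integrals over $\mathbb{T}$ are replaced by Riemann sums over `M` equally spaced points, and the test functions `f` and `g` are trigonometric polynomials of low degree, for which such sums are exact up to rounding. All names are illustrative.

```python
import cmath

M = 64  # grid points t_j = j / M on T = R/Z (discretization assumption)


def chi(n, t):
    """The character chi_n(t) = exp(2*pi*i*n*t) on the circle."""
    return cmath.exp(2j * cmath.pi * n * t)


def convolve(f, g, x):
    """Riemann-sum approximation of (f * g)(x) = integral of f(t) g(x - t) dt."""
    return sum(f(j / M) * g(x - j / M) for j in range(M)) / M


def fourier_coefficient(f, n):
    """Riemann-sum approximation of a_n(f) = integral of f(t) conj(chi_n(t)) dt."""
    return sum(f(j / M) * chi(n, j / M).conjugate() for j in range(M)) / M


def f(t):
    # Hypothetical test function with a_1 = 2, a_{-3} = -0.5i, a_0 = 1.
    return 2 * chi(1, t) - 0.5j * chi(-3, t) + 1.0


def g(t):
    # Hypothetical test function with a_2 = 1, a_1 = 3.
    return chi(2, t) + 3 * chi(1, t)


x = 0.3
commutator_gap = abs(convolve(f, g, x) - convolve(g, f, x))
character_gap = abs(
    convolve(f, lambda t: chi(1, t), x) - fourier_coefficient(f, 1) * chi(1, x)
)
```

Both gaps should be at the level of rounding error. Only the frequency $n = 1$ is shared by `f` and `g`, so $f * g$ reduces to $6\chi_1$ here, in line with part (2) applied term by term.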
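Exercise 3.14. Identify a function $h \in C(\mathbb{T}^d)$ with a $\mathbb{Z}^d$-periodic function on $\mathbb{R}^d$ defined by $h(x) = h(x + \mathbb{Z}^d)$. Using this and the definition
\[ \int_{\mathbb{T}^d} h(t)\,dt = \int_{[0,1)^d} h(t)\,dt \]
show that
\[ \int_{\mathbb{T}^d} h(t)\,dt = \int_{\mathbb{T}^d} h(-t)\,dt \]
and
\[ \int_{\mathbb{T}^d} h(t)\,dt = \int_{\mathbb{T}^d} h(t + v)\,dt \]
for all $v \in \mathbb{T}^d$.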
3.2.2 Dirichlet and Fejér Kernels

Let us assume first that $d = 1$. By Lemma 3.13(2) the $n$th term in the Fourier series of $f$ is given by $a_n(f)\chi_n = f * \chi_n$.
Thus the partial sums of the Fourier series satisfy
\[ \sum_{n=-N}^{N} a_n(f) \chi_n = f * \sum_{n=-N}^{N} \chi_n. \]
This observation motivates the following definition.

Definition 3.15. The $N$th Dirichlet kernel is the function
\[ D_N = \sum_{n=-N}^{N} \chi_n. \]
The 8th Dirichlet kernel is illustrated in Figure 3.1.

Fig. 3.1. The 8th Dirichlet kernel on the interval $[-\frac{1}{2}, \frac{1}{2}]$.
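Lemma 3.16.
\[ D_N(x) = \begin{cases} \dfrac{e^{2\pi i (N+1) x} - e^{-2\pi i N x}}{e^{2\pi i x} - 1} = \dfrac{\sin\big((N + \frac{1}{2}) 2\pi x\big)}{\sin \pi x} & \text{if } x \ne 0 \in \mathbb{T}, \\[2ex] 2N + 1 & \text{if } x = 0, \end{cases} \]
and
\[ \int_0^1 D_N(x)\,dx = 1. \]

Proof. The case $x = 0$ and the integral calculation follow immediately from the definitions. To check the formula for $x \ne 0$ notice that the Dirichlet kernel is a geometric series and use the relation $\sin \phi = \frac{e^{i\phi} - e^{-i\phi}}{2i}$:
\begin{align*}
D_N(x) &= \sum_{n=-N}^{N} e^{2\pi i n x} \\
&= e^{-2\pi i N x} \big( 1 + \cdots + e^{2\pi i \cdot 2N x} \big) \\
&= e^{-2\pi i N x}\, \frac{e^{2\pi i (2N+1) x} - 1}{e^{2\pi i x} - 1} \\
&= \frac{e^{2\pi i (N+1) x} - e^{-2\pi i N x}}{e^{2\pi i x} - 1} \\
&= \frac{e^{2\pi i (N + \frac{1}{2}) x} - e^{-2\pi i (N + \frac{1}{2}) x}}{e^{\pi i x} - e^{-\pi i x}} \\
&= \frac{\sin\big((N + \frac{1}{2}) 2\pi x\big)}{\sin \pi x}.
\end{align*}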
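The closed formula in Lemma 3.16 is easy to compare against the defining sum. The following sketch is only a numerical sanity check; the function names are illustrative, and the integral over $[0, 1]$ is approximated by a midpoint rule with `M` points, which is an assumption of the sketch rather than part of the lemma.

```python
import cmath
import math


def dirichlet_sum(N, x):
    """D_N(x) computed directly from Definition 3.15 as a sum of characters."""
    return sum(cmath.exp(2j * math.pi * n * x) for n in range(-N, N + 1))


def dirichlet_closed(N, x):
    """D_N(x) from the closed formula sin((N + 1/2) * 2*pi*x) / sin(pi*x)."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-15:  # x represents 0 in T
        return 2.0 * N + 1.0
    return math.sin((2 * N + 1) * math.pi * x) / s


def integral_midpoint(N, M=64):
    """Midpoint-rule approximation of the integral of D_N over [0, 1]."""
    return sum(dirichlet_closed(N, (j + 0.5) / M) for j in range(M)) / M


sample_points = [0.0, 0.1, 0.25, 0.37, 0.5, 0.9]
max_gap = max(abs(dirichlet_sum(8, x) - dirichlet_closed(8, x)) for x in sample_points)
value_at_tenth = dirichlet_closed(8, 0.1)  # a point where D_8 is negative
```

The negative value at $x = 0.1$ illustrates that the Dirichlet kernel changes sign, which is the reason for passing to averaged kernels.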
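The Dirichlet kernel is real-valued (this may be seen from the definition or from the reformulation in the lemma), but takes on both positive and negative values by the reformulation in the lemma. By averaging we obtain another kernel which only takes on non-negative values, which will be crucial later.

Definition 3.17. The $M$th Fejér kernel is defined by
\[ F_M = \frac{1}{M} \sum_{m=0}^{M-1} D_m. \]
The 8th Fejér kernel is shown in Figure 3.2 (on the same scale as Figure 3.1).

Fig. 3.2. The 8th Fejér kernel on the interval $[-\frac{1}{2}, \frac{1}{2}]$.

Lemma 3.18. The $M$th Fejér kernel is given by
\[ F_M(x) = \begin{cases} M & \text{if } x = 0, \\[1ex] \dfrac{1}{M} \left( \dfrac{\sin(M \pi x)}{\sin(\pi x)} \right)^2 & \text{if } x \ne 0, \end{cases} \]
and satisfies the following properties:
$F_M(x) \ge 0$ for all $x \in \mathbb{R}$;
$\int_0^1 F_M(x)\,dx = 1$;
$F_M(x) \to 0$ as $M \to \infty$ uniformly on every set of the form $[\delta, 1 - \delta]$ for $\delta > 0$.

Proof. We first verify the formula claimed for $F_M$. If $x = 0$, then
\[ F_M(0) = \frac{1}{M} \sum_{m=0}^{M-1} (2m + 1) = M. \]
For $x \ne 0$ we have
\begin{align*}
F_M(x) &= \frac{1}{M} \sum_{m=0}^{M-1} \frac{e^{2\pi i (m+1) x} - e^{-2\pi i m x}}{e^{2\pi i x} - 1} \\
&= \frac{1}{M (e^{2\pi i x} - 1)} \left( e^{2\pi i x} \sum_{m=0}^{M-1} e^{2\pi i m x} - \sum_{m=0}^{M-1} e^{-2\pi i m x} \right) \\
&= \frac{1}{M (e^{2\pi i x} - 1)} \left( e^{2\pi i x}\, \frac{e^{2\pi i M x} - 1}{e^{2\pi i x} - 1} - \frac{e^{-2\pi i M x} - 1}{e^{-2\pi i x} - 1} \right) \\
&= \frac{e^{2\pi i x}}{M (e^{2\pi i x} - 1)^2} \left( e^{2\pi i M x} - 2 + e^{-2\pi i M x} \right) \\
&= \frac{1}{M}\, \frac{e^{2\pi i M x} - 2 + e^{-2\pi i M x}}{(e^{\pi i x} - e^{-\pi i x})^2} \\
&= \frac{1}{M}\, \frac{(e^{\pi i M x} - e^{-\pi i M x})^2}{(e^{\pi i x} - e^{-\pi i x})^2} = \frac{1}{M}\, \frac{\sin^2 M \pi x}{\sin^2 \pi x}.
\end{align*}
Now it is clear that $F_M(x) \ge 0$. Since $\int_0^1 D_m(x)\,dx = 1$ we also have $\int_0^1 F_M(x)\,dx = 1$. Finally, for $x \in [\delta, 1 - \delta]$ we have $\sin \pi x \gg \delta$ and hence
\[ F_M(x) \ll \frac{1}{M \delta^2} \to 0 \]
uniformly as $M \to \infty$.

Proposition 3.19. For $f \in C(\mathbb{T})$ we have
\[ f * F_M \to f \]
as $M \to \infty$ with respect to $\|\cdot\|_\infty$. In particular, trigonometric polynomials are dense in $C(\mathbb{T})$.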
This behavior of the sequence of functions $(F_M)$ with respect to convolution is also described by saying that the sequence $(F_M)$ is an approximate identity.

Proof of Proposition 3.19. Let $f \in C(\mathbb{T})$ and fix $\varepsilon > 0$. Then there exists some $\delta > 0$ for which
\[ d(x, y) < \delta \implies |f(x) - f(y)| < \varepsilon. \]
Now we estimate the difference $f * F_M(x) - f(x)$ as follows. By commutativity of convolution and the facts that $F_M \ge 0$ and $\int_0^1 F_M(t)\,dt = 1$ we have
\[ |f * F_M(x) - f(x)| = |F_M * f(x) - f(x)| = \left| \int F_M(t) f(x - t)\,dt - \int F_M(t) f(x)\,dt \right| \le \int F_M(t)\,|f(x - t) - f(x)|\,dt. \]
Now we split the range of integration into the interval $[\delta, 1 - \delta]$ and its complement:
\[ |f * F_M(x) - f(x)| \le \int_{\delta}^{1 - \delta} F_M(t) \underbrace{|f(x - t) - f(x)|}_{\le 2\|f\|_\infty}\,dt + \int_{-\delta}^{\delta} F_M(t) \underbrace{|f(x - t) - f(x)|}_{< \varepsilon}\,dt. \]
The first integral goes to zero as $M \to \infty$ since $F_M \to 0$ uniformly as $M \to \infty$ on $[\delta, 1 - \delta]$. Hence for large enough $M$, the first integral is smaller than $\varepsilon$. The second integral is bounded above by $\varepsilon$ since $\int_0^1 F_M(t)\,dt = 1$. As $\delta$ is independent of $x$, the same is true for the choice of $M$, and we see that $f * F_M$ converges uniformly to $f$.

Let us briefly describe how the definition of a character in Section 3.1 relates to the characters $\chi_n$ for $n \in \mathbb{Z}^d$ discussed in this section.

Corollary 3.20. Every character of $\mathbb{T}^d$ in the sense of the definition given in (3.1) is of the form $\chi = \chi_n$ for some $n \in \mathbb{Z}^d$.

Proof. Let $\chi : \mathbb{T}^d \to \mathbb{S}^1$ be a continuous homomorphism. By Proposition 3.19, $\chi$ can be approximated uniformly by a trigonometric polynomial $f$. If $\chi$ does not appear in this trigonometric polynomial, then $f$ is orthogonal to $\chi$ (by the orthogonality argument in the proof of Theorem 3.4) and so cannot be close to $\chi$. However, by definition of the Fejér kernel, the characters appearing in $f$ are those of the form $\chi_n$ for $n \in \mathbb{Z}^d$.
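The approximate identity property in Proposition 3.19 can be observed numerically. The sketch below is an illustration under stated assumptions: the test function $f(x) = |x - \frac{1}{2}|$ on $[0, 1)$ (continuous on $\mathbb{T}$, with a kink at $\frac{1}{2}$) is a hypothetical choice, the convolution integral is replaced by a Riemann sum over `K` grid points, and the supremum is taken over the same grid only.

```python
import math


def dirichlet(m, x):
    """Closed formula for the Dirichlet kernel D_m from Lemma 3.16."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return 2.0 * m + 1.0
    return math.sin((2 * m + 1) * math.pi * x) / s


def fejer(M, x):
    """Closed formula for the Fejer kernel F_M from Lemma 3.18."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return float(M)
    return math.sin(M * math.pi * x) ** 2 / (M * s * s)


def f(x):
    """Hypothetical continuous test function on T with a kink at x = 1/2."""
    return abs((x % 1.0) - 0.5)


def sup_error(M, K=256):
    """Grid approximation of the sup norm of f * F_M - f."""
    kernel = [fejer(M, j / K) for j in range(K)]
    worst = 0.0
    for i in range(K):
        x = i / K
        conv = sum(kernel[j] * f(x - j / K) for j in range(K)) / K
        worst = max(worst, abs(conv - f(x)))
    return worst


# F_M as the average of D_0, ..., D_{M-1} (Definition 3.17), at one sample point.
average_gap = abs(sum(dirichlet(m, 0.13) for m in range(5)) / 5 - fejer(5, 0.13))

errors = {M: sup_error(M) for M in (4, 32)}
```

The error for $M = 32$ should be clearly smaller than for $M = 4$, and for this test function the largest deviation is expected near the kink, where $f$ is least smooth.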
Proof of Theorem 3.9. We start with the case $d = 1$. Proposition 3.19 shows that the linear hull $A$ of the characters (that is, the space of trigonometric polynomials) is dense with respect to $\|\cdot\|_\infty$ in $C(\mathbb{T})$. Therefore the same holds with respect to $\|\cdot\|_2$ in $L^2(\mathbb{T})$. By Lemma 3.13, the characters on $\mathbb{T}$ form an orthonormal set. Thus Theorem 2.87 applies and proves the theorem for $d = 1$.

The case of $d \ge 2$ is similar once we have shown that the space of trigonometric polynomials is dense in the continuous functions. For this, notice first that
\[ F_M(x_1, \dots, x_d) = F_M(x_1) \cdots F_M(x_d) \]
is a trigonometric polynomial satisfying $F_M \ge 0$,
\[ \int_{\mathbb{T}^d} F_M(x)\,dx = 1, \quad \text{and} \quad \int_{[-\delta, \delta]^d} F_M(t)\,dt > 1 - \varepsilon \]
for $\varepsilon, \delta > 0$ and large enough $M$ (how large depending on $\varepsilon$ and $\delta$). Next notice that $f * F_M$ is a trigonometric polynomial for any $f \in C(\mathbb{T}^d)$. The argument is now similar to the case $d = 1$: we again show that the sequence is an approximate identity. Given $f \in C(\mathbb{T}^d)$ and $\varepsilon > 0$ we can choose $\delta > 0$ such that $|f(x - t) - f(x)| < \varepsilon$ for $x \in \mathbb{T}^d$ and $t \in [-\delta, \delta]^d$. This implies that
\begin{align*}
|f * F_M(x) - f(x)| &\le \int_{\mathbb{T}^d} |f(x - t) - f(x)|\,F_M(t)\,dt \\
&= \int_{[-\delta, \delta]^d} \underbrace{|f(x - t) - f(x)|}_{< \varepsilon} F_M(t)\,dt + \int_{\mathbb{T}^d \setminus [-\delta, \delta]^d} \underbrace{|f(x - t) - f(x)|}_{\le 2\|f\|_\infty} F_M(t)\,dt \\
&< \varepsilon + 2\|f\|_\infty \varepsilon
\end{align*}
as required.
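Exercise 3.21. (Approximate identities on $\mathbb{R}$) Define a function $\varphi : \mathbb{R} \to \mathbb{R}$ by
\[ \varphi(x) = \begin{cases} k e^{-1/(1 - x^2)} & \text{for } |x| < 1, \\ 0 & \text{for } |x| \ge 1, \end{cases} \]
where $k > 0$ is chosen so that
\[ \int_{\mathbb{R}} \varphi(x)\,dx = \int_{-1}^{1} \varphi(x)\,dx = 1. \]
For $\varepsilon > 0$ also define $\varphi_\varepsilon(x) = \frac{1}{\varepsilon} \varphi\left(\frac{x}{\varepsilon}\right)$ for $x \in \mathbb{R}$. Show that
(a) $\varphi_\varepsilon \in C_c^\infty(\mathbb{R})$,
(b) the function $f_\varepsilon$ defined by
\[ f_\varepsilon(x) = \varphi_\varepsilon * f(x) = \int_{\mathbb{R}} \varphi_\varepsilon(x - y) f(y)\,dy \]
converges uniformly to $f$ as $\varepsilon \to 0$ on any compact subset of $\mathbb{R}$, for any $f \in C(\mathbb{R})$,
(c) $f_\varepsilon \in C^\infty(\mathbb{R})$ for any $f \in C(\mathbb{R})$, and
(d) $\operatorname{Supp} f_\varepsilon \subseteq \operatorname{Supp} f + [-\varepsilon, \varepsilon]$.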
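3.2.3 Differentiability and Fourier Series

Suppose that $f \in C^1(\mathbb{T}^d)$ and $j \in \{1, \dots, d\}$. Notice that
\begin{align*}
\int_0^1 \partial_{x_j} f(x) \overline{\chi_n(x)}\,dx_j &= \Big[ f(x) \overline{\chi_n(x)} \Big]_{x_j = 0}^{x_j = 1} - \int_0^1 f(x)\,\partial_{x_j} \overline{\chi_n(x)}\,dx_j \\
&= -\int_0^1 f(x)\,\partial_{x_j} \overline{\chi_n(x)}\,dx_j \tag{3.8}
\end{align*}
by integration by parts and periodicity, and
\[ \partial_{x_j} \overline{\chi_n(x)} = \partial_{x_j} e^{-2\pi i (n_1 x_1 + \cdots + n_d x_d)} = -2\pi i n_j e^{-2\pi i (n_1 x_1 + \cdots + n_d x_d)} = -2\pi i n_j \overline{\chi_n(x)}. \tag{3.9} \]
Thus the Fourier coefficients of $f$ and of the partial derivative $\partial_{x_j} f$ satisfy the relation
\begin{align*}
a_n(\partial_{x_j} f) &= \int_{\mathbb{T}^d} (\partial_{x_j} f)(x) \overline{\chi_n(x)}\,dx \\
&= -\int_{\mathbb{T}^d} f(x) \underbrace{\partial_{x_j} \overline{\chi_n(x)}}_{= -2\pi i n_j \overline{\chi_n(x)} \text{ by (3.9)}}\,dx \quad \text{(by (3.8))} \\
&= 2\pi i n_j\, a_n(f). \tag{3.10}
\end{align*}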
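The relation (3.10) can be checked numerically in the case $d = 1$. The sketch below uses a hypothetical smooth periodic test function $f(x) = e^{\cos 2\pi x}$ with its derivative computed by hand, and approximates the Fourier coefficients by Riemann sums over `K` grid points; for smooth periodic integrands such sums are very accurate, but this remains an approximation and an assumption of the sketch.

```python
import cmath
import math

K = 256  # number of grid points on [0, 1) used for the Riemann sums


def coefficient(h, n):
    """Riemann-sum approximation of a_n(h) = integral of h(t) exp(-2*pi*i*n*t) dt."""
    return sum(h(j / K) * cmath.exp(-2j * math.pi * n * j / K) for j in range(K)) / K


def f(x):
    """Hypothetical smooth 1-periodic test function."""
    return math.exp(math.cos(2 * math.pi * x))


def df(x):
    """Derivative of f, computed by hand with the chain rule."""
    return -2 * math.pi * math.sin(2 * math.pi * x) * f(x)


# Gap between a_n(f') and 2*pi*i*n*a_n(f) for a few frequencies n.
gaps = [
    abs(coefficient(df, n) - 2j * math.pi * n * coefficient(f, n))
    for n in range(-5, 6)
]
```

All entries of `gaps` should be at the level of rounding error, and for $n = 0$ the check reduces to the statement that a derivative of a periodic function integrates to zero.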
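Proof of Theorem 3.10. The formula (3.6) follows from (3.10) by induction on $k$. To prove the last claim of the theorem, we will show that
$$\sum_{n \in \mathbb{Z}^d} |a_n(f)| \ll_d \sqrt{\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2} \tag{3.11}$$
for $f \in C^k(\mathbb{T}^d)$ and $k > \frac{d}{2}$. Assuming (3.11) for the moment, we see that
$$\sum_{n \in \mathbb{Z}^d} a_n(f)\chi_n$$
is an absolutely convergent series with respect to $\|\cdot\|_\infty$, and so converges to some limit $F$ in $C(\mathbb{T}^d)$ with respect to $\|\cdot\|_\infty$. However, since $\|\cdot\|_2 \leq \|\cdot\|_\infty$ the same function $F$ is also a limit with respect to $\|\cdot\|_2$. Hence by Theorem 3.9 we have $F = f$, first as an identity in $L^2(\mathbb{T}^d)$ (and hence almost everywhere), but as both functions are continuous, also in $C(\mathbb{T}^d)$ (and hence everywhere).

To prove (3.11), we start by expressing the right-hand side in terms of the Fourier coefficients $a_n = a_n(f)$. By Parseval's theorem in (3.5) applied to $\partial_{e_j}^k f$ we have
$$\|\partial_{e_j}^k f\|_2^2 = \sum_{n \in \mathbb{Z}^d} |a_n(\partial_{e_j}^k f)|^2 = \sum_{n \in \mathbb{Z}^d} (2\pi n_j)^{2k} |a_n(f)|^2,$$
where we have used (3.6) in the last step. Therefore we can simplify the sum under the square root to give the estimate
$$\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2 = \sum_{n \in \mathbb{Z}^d} |a_n(f)|^2 \Big(1 + (2\pi)^{2k} \sum_{j=1}^d n_j^{2k}\Big) \gg_d \sum_{n \in \mathbb{Z}^d} \big(1 + \|n\|_2^{2k}\big) |a_n(f)|^2,$$
since $\|n\|_2 \leq \sqrt{d}\,\max_{1 \leq j \leq d} |n_j|$ and hence $\|n\|_2^{2k} \ll_d \sum_{j=1}^d |n_j|^{2k}$. We claim that
$$\Big\| \Big( \big(1 + \|n\|_2^{2k}\big)^{-1/2} \Big)_{n \in \mathbb{Z}^d} \Big\|_{\ell^2(\mathbb{Z}^d)} < \infty \tag{3.12}$$
for $k > \frac{d}{2}$. From this claim the inequality (3.11) follows quickly by the Cauchy–Schwarz inequality:
$$\sum_{n \in \mathbb{Z}^d} |a_n(f)| = \sum_{n \in \mathbb{Z}^d} \big(1 + \|n\|_2^{2k}\big)^{-1/2} \big(1 + \|n\|_2^{2k}\big)^{1/2} |a_n(f)| \leq \Big\| \Big( \big(1 + \|n\|_2^{2k}\big)^{-1/2} \Big)_n \Big\|_{\ell^2(\mathbb{Z}^d)} \Big( \sum_{n \in \mathbb{Z}^d} \big(1 + \|n\|_2^{2k}\big) |a_n(f)|^2 \Big)^{1/2} \ll_d \sqrt{\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2}.$$
To see the claim that
$$\sum_{n \in \mathbb{Z}^d} \frac{1}{1 + \|n\|_2^{2k}} < \infty$$
we split the sum up. Firstly, by running through the possibilities of the signs of the $n_j$, it is sufficient to show convergence for $n_1, \dots, n_d \geq 0$. Secondly, using the symmetry of the summands with respect to permutation of the variables, we may restrict the sum to those $n \in \mathbb{Z}^d$ for which $n_2, n_3, \dots, n_d \leq n_1$, and we may also assume that $n_1 \geq 1$. Now
$$1 + \|n\|_2^{2k} \geq n_1^{2k},$$
so
$$\sum_{n \in \mathbb{Z}^d} \frac{1}{1 + \|n\|_2^{2k}} \ll_d \sum_{n_1 = 1}^{\infty} \sum_{n_2, \dots, n_d = 0}^{n_1} \frac{1}{n_1^{2k}} = \sum_{n_1 = 1}^{\infty} \frac{(n_1 + 1)^{d-1}}{n_1^{2k}} \ll_d \sum_{n_1 = 1}^{\infty} \frac{1}{n_1^{2k + 1 - d}},$$
and the last sum converges if $2k > d$. This implies the claim above, the inequality (3.11), and hence the theorem.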
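The role of the condition $2k > d$ in (3.12) can be seen numerically for $d = 2$. The sketch below is an illustration only (the function names and box sizes are assumptions): it compares the mass that the lattice sum picks up between the boxes $\max|n_j| \leq 100$ and $\max|n_j| \leq 200$ for $k = 2$ (where $2k > d$) and for the borderline case $k = 1$ (where $2k = d$).

```python
def lattice_partial_sum(k, box):
    """Sum of 1/(1 + |n|_2^(2k)) over n in Z^2 with max(|n1|, |n2|) <= box."""
    total = 0.0
    for n1 in range(-box, box + 1):
        for n2 in range(-box, box + 1):
            total += 1.0 / (1.0 + float(n1 * n1 + n2 * n2) ** k)
    return total


def ring_mass(k, box):
    """Contribution of the lattice points lying between the boxes of size box and 2*box."""
    return lattice_partial_sum(k, 2 * box) - lattice_partial_sum(k, box)


print("k = 2:", ring_mass(2, 100))  # small: the tail of a convergent sum
print("k = 1:", ring_mass(1, 100))  # stays of order one: logarithmic divergence
```

For $k = 1$ each doubling of the box adds roughly the same amount, which is the discrete counterpart of the divergence of $\sum_{n_1} n_1^{-1}$ in the last step of the proof.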
Exercise 3.22. Let $f$ be a real-valued function defined on an open subset of $\mathbb{R}^2$. Suppose that $f$ is continuous and that $\partial_{x_1}^4 f$, $\partial_{x_2}^4 f$ exist and are continuous also. Show that $\partial_{x_1}\partial_{x_2} f$ exists and is continuous.
3.3 Spectral Theory for Group Actions on $\mathbb{T}^d$

We are going to describe in this section how functions on $\mathbb{R}^2$ can be decomposed into functions that have special rotational symmetries, as alluded to in Section 1.1. The same argument will also apply to symmetries with respect to rotations in the $x,y$-plane in $\mathbb{R}^3$ (but not to all rotations in $\mathbb{R}^3$), and to many other situations. A convenient framework that allows all of these examples and more (like the quotient of the hyperbolic plane by the action of $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$) is the following set-up.

3.3.1 Group Actions and Unitary Representations

For later use, we will define the following notion in a more general setting than we need immediately.

Definition 3.23. Let $G$ be a topological group, and let $X$ be a topological space. A continuous group action of $G$ on $X$ is a continuous map
$$\cdot : G \times X \to X, \qquad (g, x) \mapsto g \cdot x$$
with
$$g \cdot (h \cdot x) = (gh) \cdot x$$
for $g, h \in G$ and $x \in X$, and
$$e \cdot x = x$$
for all $x \in X$, where $e \in G$ is the identity element. In this definition we have used multiplicative notation for the group operation in $G$, but as usual if $G$ is abelian we will often use additive notation.
Definition 3.24. Let $G$ be a topological group with a continuous action on a topological space $X$, and let $\mu$ be a measure on $X$ defined on the Borel sets. Then we say that the $G$-action is measure-preserving if $\mu(g \cdot B) = \mu(B)$ for all $g \in G$ and any measurable set $B \subseteq X$.

It is straightforward (see Exercise 3.25) to see that a measure-preserving action also preserves integration with respect to $\mu$ in the sense that
$$\int_X f^g \, d\mu = \int_X f(g^{-1} \cdot x) \, d\mu(x) = \int_X f(x) \, d\mu(x) = \int_X f \, d\mu \tag{3.13}$$
for all integrable functions $f$ and $g \in G$, where we define $f^g(x) = f(g^{-1} \cdot x)$ for all $x \in X$ (the inverse in the definition of $f^g$ is only necessary if $G$ is non-abelian). In particular, if $f_1, f_2 \in L^2_\mu(X)$ then
$$\|f_1^g\|_2 = \|f_1\|_2$$
and, more generally,
$$\langle f_1^g, f_2^g \rangle = \langle f_1, f_2 \rangle \tag{3.14}$$
for all $g \in G$, where the inner product is defined by
$$\langle f_1, f_2 \rangle = \int_X f_1 \overline{f_2} \, d\mu.$$
We can associate to the action of $g$ on $X$ an operator $\pi_g$ on $L^2_\mu$ defined by $\pi_g(f) = f^g = f \circ g^{-1}$, which is unitary by (3.14), and since it is invertible we also have $\pi_g \pi_{g^{-1}} = \pi_{g^{-1}} \pi_g = I$, the identity on $L^2_\mu$.

Exercise 3.25. Prove (3.13).
Definition 3.26. A unitary representation of a topological group $G$ on a Hilbert space $\mathcal{H}$ is a homomorphism $\pi$ from $G$ into the group of unitary operators on $\mathcal{H}$, written $\pi_g f$ (or $\pi(g)f$, $g \cdot f$, or $f^g$), such that
• $\pi_e = I$, the identity operator on $\mathcal{H}$;
• $\pi_{g_1 g_2} = \pi_{g_1} \pi_{g_2}$ for $g_1, g_2 \in G$;
• for any given $f \in \mathcal{H}$, the map $G \ni g \mapsto \pi_g f \in \mathcal{H}$ is continuous.

Lemma 3.27. Let $G$ be a locally compact metric group acting continuously on a locally compact metric space $X$. Suppose that the action is measure-preserving with respect to a locally finite measure $\mu$ on the Borel sets of $X$ (that is, a measure with the property that every point $x \in X$ has an open neighbourhood $B_\varepsilon(x)$ with finite $\mu$-measure). Then the induced action of $G$ on $\mathcal{H} = L^2_\mu(X)$ is a unitary representation of $G$ on $\mathcal{H}$. More generally, for any $p \in [1, \infty)$ and $f \in L^p_\mu(X)$ we have that $G \ni g \mapsto f^g \in L^p_\mu(X)$ satisfies $\|f^g\|_p = \|f\|_p$ and depends continuously on $g \in G$.
Lemma 3.27 is an important motivation for the study of unitary representations in general. We will return to this topic in several settings.

Proof of Lemma 3.27. By Exercise 3.25 (see the hints on p. 391), $\pi_g$ defined by
$$\pi_g(f)(x) = f(g^{-1} \cdot x)$$
is unitary for every $g \in G$, and it is also easy to see that the first two properties of a unitary representation hold. For the continuity we essentially have to repeat the argument from the proof of Lemma 3.12. So let $p \in [1, \infty)$ and $f \in L^p_\mu(X)$, and fix $\varepsilon > 0$. By Proposition 2.38, there exists some $F \in C_c(X)$ with
$$\|f - F\|_p < \varepsilon. \tag{3.15}$$
Let $U = U^{-1}$ be a compact symmetric neighborhood of $e \in G$, so that $K = U \cdot \operatorname{Supp} F \subseteq X$ is compact and hence has finite measure. The map
$$(g, x) \mapsto F(g^{-1} \cdot x)$$
is continuous and so is uniformly continuous on the set $U \times K$. Hence there exists a $\delta > 0$ for which
$$d(g, e) < \delta \implies |F(g^{-1} \cdot y) - F(y)| < \varepsilon / \mu(K)^{1/p}$$
for all $y \in K$. Since both $F$ and $F^g$ vanish outside $K$ for $g \in U$, we obtain
$$\|F^g - F\|_p^p = \int_X |F(g^{-1} \cdot x) - F(x)|^p \, d\mu(x) = \int_K \underbrace{|F(g^{-1} \cdot x) - F(x)|^p}_{< \varepsilon^p / \mu(K)} \, d\mu(x) < \varepsilon^p.$$
Together with (3.15) and $\|f^g - F^g\|_p = \|f - F\|_p$, this gives
$$\|f^g - f\|_p \leq \|f^g - F^g\|_p + \|F^g - F\|_p + \|F - f\|_p < 3\varepsilon$$
for all $g \in U$ with $d(g, e) < \delta$. Therefore, $g \mapsto f^g$ is continuous at $g = e$. However, this gives the general case since
$$\|f^g - f^{g_0}\|_p = \|f^{g_0^{-1} g} - f\|_p,$$
and if we apply the argument above we see that for $g$ sufficiently close to $g_0$ we again have $\|f^g - f^{g_0}\|_p < 3\varepsilon$, and hence the desired continuity.
3.3.2 Measure-Preserving Actions of Compact Groups

Assume now that $G$ is a compact group, where our main example in mind is $G = \mathbb{T}^d$ (and in many cases $G = \mathbb{T}$). Assume that $G$ acts on a locally compact metric space $X$, preserving a locally finite measure $\mu$ (a good example to have in mind is $X = \mathbb{R}^2$ equipped with rotation about the origin, or $\mathbb{R}^3$ equipped with rotation about the $z$-axis). Then one can associate to $G$ and $X$ a locally compact quotient metric space
$$Y = G \backslash X = \{G \cdot x \mid x \in X\}$$
which consists of the compact orbits $G \cdot x$ of the $G$-action, equipped with the metric
$$d_Y(G \cdot x, G \cdot y) = \min_{g, h \in G} d(g \cdot x, h \cdot y), \tag{3.16}$$
and equipped with a locally finite quotient measure $\nu$ defined by
$$\nu(B_Y) = \mu\big(p^{-1}(B_Y)\big),$$
where $p : X \to Y$ is the quotient map $p : x \mapsto G \cdot x \in Y$, and $B_Y \subseteq Y$ is measurable. We will not prove this here.

For $G = \mathbb{T}$ and $X = \mathbb{R}^2$, with the rotation action defined in (1.2) on p. 5, we have $Y = [0, \infty)$ and each element of $Y$ corresponds to the radius of an orbit (a circle with center the origin in $\mathbb{R}^2$). Thus we may think of elements of $Y$ as the radius of any point on an orbit when written in polar coordinates. For the Lebesgue measure $dx\,dy = r\,dr\,d\theta$ on $\mathbb{R}^2$ the quotient measure on $[0, \infty)$ is given by $2\pi r\,dr$.

Turning this construction around, one can also start with a locally finite measure $\nu$ on a locally compact metric space $Y$, and define $X = Y \times G$ and $\mu = \nu \times m_G$, where $m_G$ is a left Haar measure on $G$. Then the action of $G$ on $X$ defined by
$$g \cdot (y, h) = (y, h g^{-1})$$
gives rise to a measure-preserving action of $G$. Note that the unitary representation now takes the form $\pi_g(f)(y, h) = f(y, hg)$ for any $f \in L^2_\mu(X)$, $y \in Y$ and $g \in G$. If $X = \mathbb{R}^2$ with the rotation action of $\mathbb{T}$, then $X \setminus \{0\}$ fits into this framework.
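The statement that Lebesgue measure on $\mathbb{R}^2$ has quotient measure $2\pi r\,dr$ on $[0, \infty)$ can be sanity-checked on a rotation-invariant function. The sketch below is illustrative only: the Gaussian test function, the truncation radius and the step sizes are assumptions, and both integrals should be close to $\pi$.

```python
import math


def plane_integral(f, radius=6.0, step=0.05):
    """Midpoint rule for the integral of f(x, y) over the square [-radius, radius]^2."""
    count = int(round(2 * radius / step))
    total = 0.0
    for i in range(count):
        x = -radius + (i + 0.5) * step
        for j in range(count):
            y = -radius + (j + 0.5) * step
            total += f(x, y)
    return total * step * step


def radial_integral(profile, radius=6.0, step=0.001):
    """Midpoint rule for the integral of profile(r) against the quotient measure 2*pi*r dr."""
    count = int(round(radius / step))
    total = 0.0
    for i in range(count):
        r = (i + 0.5) * step
        total += profile(r) * 2 * math.pi * r
    return total * step


plane_value = plane_integral(lambda x, y: math.exp(-(x * x + y * y)))
radial_value = radial_integral(lambda r: math.exp(-r * r))
print(plane_value, radial_value, math.pi)
```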
Exercise 3.28. Prove that $d_Y$ defined in (3.16) is a metric.
3.3.3 Unitary Representations of Compact Abelian Groups

We now describe the decomposition of elements in a Hilbert space into elements of special types according to a unitary representation of a compact rotation group $G$, where we will have $G = \mathbb{T}^d$ in mind. For the argument it makes little difference whether we restrict attention to $G = \mathbb{T}$ only, or discuss compact abelian groups (though in this case we will need to assume Facts 1 and 2 from p. 104 for $G$).

Definition 3.29. Let $\mathcal{H}$ be a Hilbert space, and let $G$ be a topological group acting unitarily on $\mathcal{H}$. Let $\chi : G \to \mathbb{S}^1$ be a character. Then $v \in \mathcal{H}$ is of type $\chi$ or has weight $\chi$ (type or weight $n \in \mathbb{Z}^d$ if $\chi = \chi_n$ in the case $G = \mathbb{T}^d$) if $\pi_g v = \chi(g) v$ for all $g \in G$. We also define the weight space
$$\mathcal{H}_\chi = \{v \in \mathcal{H} \mid v \text{ has weight } \chi\}.$$

The following result gives as a special case the decomposition of functions on $\mathbb{R}^2$ into components of different weights.

Theorem 3.30. Let $G$ be a compact metric abelian group that acts unitarily on a Hilbert space $\mathcal{H}$. Then
$$\mathcal{H} = \bigoplus_\chi \mathcal{H}_\chi,$$
where the sum is over all characters $\chi$ of $G$. More concretely, $\mathcal{H}_\chi$ and $\mathcal{H}_\eta$ are orthogonal subspaces for any characters $\chi \neq \eta$, and every $v \in \mathcal{H}$ can be written as the limit of a convergent sum $v = \sum_\chi v_\chi$ in $\mathcal{H}$, where
$$v_\chi = \overline{\chi} * v = \int_G \overline{\chi(g)}\, \pi_g(v) \, dm_G(g) \tag{3.17}$$
has weight $\chi$.

3.3.4 Integrating Hilbert Space-valued Functions

We need to explain the meaning of the convolution by $\overline{\chi}$ in (3.17) by extending the notion of Riemann and Lebesgue integration to functions taking values in a Hilbert space. We discuss three increasingly general definitions of such integrals.

If $\mathcal{H} = L^2_\mu(X)$ where $X = Y \times G$ and $\mu = \nu \times m_G$, and the action is $g \cdot (y, h) = (y, h - g)$ as in Section 3.3.2, with the induced unitary representation, then for $f$ in $L^2_\mu(Y \times G)$ we have by Fubini's theorem that the function $f(y, \cdot) : G \to \mathbb{C}$ lies in $L^2(G)$ for $\nu$-almost every $y \in Y$. It follows that the integral
$$f_\chi(y, h) = \int_G \chi(g) f(y, h - g) \, dm_G(g)$$
(which agrees with (3.17) after the substitution $g \mapsto -g$) is well-defined for $\nu$-almost every $y$, and that $f_\chi \in L^2_\mu(Y \times G)$. We refer to Exercise 3.31 for this approach.

Exercise 3.31. (a) Let $G$ be a compact abelian group acting continuously on a locally compact space $X$ and preserving a locally finite measure $\mu$. Show that
$$f_\chi(x) = \int_G \overline{\chi(g)} f(g^{-1} \cdot x) \, dm_G(g)$$
exists for $\mu$-almost every $x \in X$ and defines an element in $L^2_\mu(X)$ for any $f \in L^2_\mu(X)$ and character $\chi$.
(b) Use Theorem 3.9 to deduce Theorem 3.30 in the setting of (a).
Another way to interpret $f_\chi$ is to generalize the Riemann integral to this context. This approach applies to any unitary representation. For a fixed $v \in \mathcal{H}$ we have assumed that $\pi_g(v) \in \mathcal{H}$ depends continuously on $g \in G$, and since $g \mapsto \chi(g)$ is continuous we also see that $\overline{\chi(g)}\pi_g(v) \in \mathcal{H}$ depends continuously on $g \in G$. Now one can define Riemann sums of the form
$$\sum_{P \in \xi} \overline{\chi(g_P)}\, \pi_{g_P}(v)\, m_G(P),$$
where $\xi = \{P_1, \dots, P_n\}$ is a partition of $G$ into finitely many measurable sets, and $g_P \in P$ is an arbitrary sample point chosen in the partition element $P$ of $\xi$. Requiring that
$$\max_{P \in \xi} \operatorname{diam}(P) \to 0$$
along a sequence of partitions forces the associated Riemann sums to be Cauchy, and hence convergent. The limit is the Riemann integral. We refer to Exercise 3.32 for the details of integrating functions taking values in a Banach space.
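As an illustration of these Riemann sums, take $G = \mathbb{T}$ acting on $\mathbb{R}^2$ by rotation and evaluate the weight components pointwise. This is a sketch under assumptions that are not in the text: the rotation by $\theta \in \mathbb{T}$ is taken to be the counter-clockwise rotation by the angle $2\pi\theta$, the partition consists of equal arcs with left endpoints as sample points, and the polynomial sample function is chosen only because its components are easy to compute by hand.

```python
import cmath
import math


def rotate(theta, x):
    """Rotate x in R^2 by the angle 2*pi*theta (assumed action of theta in T = R/Z)."""
    c, s = math.cos(2 * math.pi * theta), math.sin(2 * math.pi * theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


def weight_component(f, n, x, num_cells=64):
    """Riemann sum for f_n(x) = int_T conj(chi_n(theta)) f(theta^{-1}.x) d theta."""
    total = 0j
    for m in range(num_cells):
        theta = m / num_cells  # sample point in the cell [m/M, (m+1)/M)
        total += cmath.exp(-2j * math.pi * n * theta) * f(rotate(-theta, x)) / num_cells
    return total


def f(x):
    # Sample function with weights 0, +-1 and +-2 only (hypothetical example).
    return x[0] + x[0] * x[1] + 3.0


point = (0.3, -0.7)
components = {n: weight_component(f, n, point) for n in range(-2, 3)}
print(components[1])              # equals (x1 - i*x2)/2 at the chosen point
print(sum(components.values()))   # recovers f(point)
```

For this $f$ the Riemann sums are exact up to rounding, since the integrand is a trigonometric polynomial in $\theta$ of low degree; for general continuous $f$ one only gets convergence as the partition is refined.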
Exercise 3.32. Let $(X, d)$ be a compact metric space and let $\mu$ be a finite Borel measure on $X$. Let $V$ be a Banach space, and let $f : X \to V$ be a continuous function. Prove the existence of a Riemann integral
$$\mathrm{R}\text{-}\!\int_X f \, d\mu \in V$$
as indicated above.
Finally, we describe a third notion of integral for functions taking values in a Hilbert space; this is also the approach that requires the least amount of structure for the function. On the other hand, this third approach does not allow integration of functions taking values in a general Banach space, but it works similarly for any dual space and so also for the class of reflexive Banach spaces to be defined later.

Lemma 3.33. Let $(Z, \mu)$ be a measure space, and let $f : Z \to \mathcal{H}$ be a function with the properties that $z \mapsto \|f(z)\|$ is measurable and integrable, and that for any $v \in \mathcal{H}$ the map $z \mapsto \langle v, f(z) \rangle$ is measurable. Then there exists a unique element $\int_Z f \, d\mu$ of $\mathcal{H}$ with
$$\Big\langle v, \int_Z f \, d\mu \Big\rangle = \int_Z \langle v, f(z) \rangle \, d\mu(z) \tag{3.18}$$
for all $v \in \mathcal{H}$.

Proof. To see this we only have to show that the right-hand side of (3.18) defines a continuous functional on $\mathcal{H}$, for then the Fréchet–Riesz representation theorem (Corollary 2.71) implies the claimed existence and uniqueness. That the integral converges, and so gives a well-defined map, follows quickly from the Cauchy–Schwarz inequality and our assumptions:
$$\Big| \int_Z \langle v, f(z) \rangle \, d\mu(z) \Big| \leq \int_Z |\langle v, f(z) \rangle| \, d\mu(z) \leq \|v\| \int_Z \|f(z)\| \, d\mu(z).$$
Moreover, for any scalar $\alpha$ and $v, w \in \mathcal{H}$ we have
$$\langle \alpha v + w, f(z) \rangle = \alpha \langle v, f(z) \rangle + \langle w, f(z) \rangle$$
for any $z \in Z$, and so
$$\int_Z \langle \alpha v + w, f(z) \rangle \, d\mu(z) = \alpha \int_Z \langle v, f(z) \rangle \, d\mu(z) + \int_Z \langle w, f(z) \rangle \, d\mu(z),$$
showing linearity. This shows that $\int_Z f \, d\mu \in \mathcal{H}$ exists, and that
$$\Big\| \int_Z f \, d\mu \Big\| \leq \int_Z \|f\| \, d\mu.$$

One can now check that this definition depends linearly on $f$. In the context of unitary representations of a group $G$ on a Hilbert space $\mathcal{H}$, we can use the notions of integration above to define convolution with measures or with $L^1$ functions as follows.

Definition 3.34. Let $G$ be a topological group acting unitarily on a Hilbert space $\mathcal{H}$, via the homomorphism $\pi$ from $G$ into the group of unitary operators of $\mathcal{H}$. Let $\mu$ be a finite measure on $G$. Then for $v \in \mathcal{H}$ we define the convolution
$$\mu * v = \int_G \pi_g v \, d\mu(g).$$
If $G$ has a left Haar measure $m_G$ and $\psi \in L^1_{m_G}(G)$, then for $v \in \mathcal{H}$ we define
$$\psi * v = \int_G \psi(g)\, \pi_g v \, dm_G(g).$$
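3.3.5 Proof of the Weight Decomposition

We are now ready to prove Theorem 3.30 which, apart from the generalized context, is a simple extension of Theorem 3.4. We will use the assumptions of the theorem in this section without further remark.

Lemma 3.35. $v_\chi$ has weight $\chi$.

Proof. We need to prove that
$$\pi_g v_\chi = \chi(g) v_\chi.$$
For this, let $w \in \mathcal{H}$ and then
$$\begin{aligned}
\langle w, \pi_g v_\chi \rangle = \langle \pi_g^{-1}(w), v_\chi \rangle
&= \int_G \big\langle \pi_g^{-1}(w), \overline{\chi(h)}\, \pi_h v \big\rangle \, dm_G(h) \\
&= \int_G \big\langle w, \overline{\chi(h)}\, \pi_{g+h} v \big\rangle \, dm_G(h) \\
&= \int_G \big\langle w, \overline{\chi(h' - g)}\, \pi_{h'}(v) \big\rangle \, dm_G(h') \\
&= \overline{\chi(g)} \int_G \big\langle w, \overline{\chi(h')}\, \pi_{h'}(v) \big\rangle \, dm_G(h') \\
&= \overline{\chi(g)}\, \langle w, v_\chi \rangle = \langle w, \chi(g) v_\chi \rangle.
\end{aligned}$$

Lemma 3.36. $\mathcal{H}_\chi \perp \mathcal{H}_\eta$ for any two characters $\chi \neq \eta$.

Proof. Let $v \in \mathcal{H}_\chi$, $w \in \mathcal{H}_\eta$ and $g \in G$. Then
$$\chi(g) \langle v, w \rangle = \langle \chi(g) v, w \rangle = \langle \pi_g v, w \rangle = \langle v, \pi_g^{-1} w \rangle = \big\langle v, \overline{\eta(g)} w \big\rangle = \eta(g) \langle v, w \rangle.$$
However, for some $g \in G$ we have $\chi(g) \neq \eta(g)$ and so $\langle v, w \rangle = 0$.

Lemma 3.37. If $v \in \mathcal{H}_\chi$ then $v_\chi = v$. If $v \in \mathcal{H}_\eta$ for $\eta \neq \chi$, then $v_\chi = 0$.

Proof. If $v \in \mathcal{H}_\chi$ then
$$v_\chi = \int_G \overline{\chi(g)}\, \pi_g v \, dm_G(g) = \int_G \overline{\chi(g)} \chi(g) v \, dm_G(g) = v,$$
since we assume that $m_G(G) = 1$. We know that $v_\chi \in \mathcal{H}_\chi$ by Lemma 3.35, but for $v \in \mathcal{H}_\eta$ we have
$$\pi_g v_\chi = \pi_g \int_G \overline{\chi(h)}\, \pi_h(v) \, dm_G = \int_G \overline{\chi(h)}\, \pi_h(\pi_g v) \, dm_G = (\eta(g) v)_\chi = \eta(g) v_\chi,$$
so $v_\chi \in \mathcal{H}_\eta$. By Lemma 3.36 we know that $\mathcal{H}_\chi \perp \mathcal{H}_\eta$, so $v_\chi = 0$.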
Proof of Theorem 3.30. Apply Theorem 2.87 to each weight space $\mathcal{H}_\chi$. This produces, for each character, a list of orthonormal vectors whose closed linear hull is its weight space. Applying Lemma 3.36, we see that the union of these lists is also a list of orthonormal vectors in $\mathcal{H}$. It remains to show that this union is a complete orthonormal basis of $\mathcal{H}$, or equivalently that the closed linear hull of the union of the subspaces $\mathcal{H}_\chi$ for all characters $\chi$ is all of $\mathcal{H}$.

To see this, we wish to approximate a given vector $v \in \mathcal{H}$ by a finite linear combination of elements from weight spaces. Fix $\varepsilon > 0$. By continuity of the unitary representation there exists some $\delta > 0$ such that
$$\|\pi_g(v) - v\| < \varepsilon$$
for $g \in B_\delta^G(0)$. By Theorem 3.4 (or the more concrete argument from Section 3.2.2 in the case of the torus) there is a trigonometric polynomial $F$ (that is, a finite linear combination of characters) with the following properties:
• $F \geq 0$;
• $\int_G F \, dm_G = 1$; and
• $\int_{B_\delta^G(0)} F \, dm_G > 1 - \varepsilon$.
Then $F * v$ is a finite linear combination of elements from weight spaces by Lemma 3.35. However, we also claim that
$$\|F * v - v\| \leq (1 + 2\|v\|)\varepsilon. \tag{3.19}$$
To see this, let $w \in \mathcal{H}$, and then notice that
$$\begin{aligned}
|\langle w, F * v - v \rangle|
&= \Big| \Big\langle w, \int_G F(g)\, \pi_g v \, dm_G \Big\rangle - \Big\langle w, \int_G F \, dm_G \; v \Big\rangle \Big| \\
&= \Big| \int_G \langle w, F(g)(\pi_g v - v) \rangle \, dm_G \Big| \\
&\leq \int_{B_\delta^G(0)} |\langle w, \pi_g v - v \rangle| F(g) \, dm_G(g) + \int_{G \setminus B_\delta^G(0)} |\langle w, \pi_g v - v \rangle| F(g) \, dm_G(g) \\
&\leq \|w\| \varepsilon + 2 \|w\| \|v\| \varepsilon = \|w\| (1 + 2\|v\|)\varepsilon,
\end{aligned}$$
which implies (3.19) since we may apply the inequality with $w = F * v - v$. Thus the closed linear hull of the subspaces $\mathcal{H}_\chi$ is all of $\mathcal{H}$, and the union of the orthonormal bases forms a complete orthonormal basis for $\mathcal{H}$. Therefore every vector $v \in \mathcal{H}$ has a decomposition as a direct sum $v = \sum_\chi w_\chi$ with $w_\chi \in \mathcal{H}_\chi$. By Lemma 3.37, we also deduce that $w_\chi = v_\chi$.
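A finite analogue may help to make Theorem 3.30 and Lemmas 3.35 to 3.37 concrete. In the sketch below the compact abelian group is replaced by the finite cyclic group $\mathbb{Z}/N$ acting on $\mathbb{C}^N$ by cyclic shifts, with Haar measure the normalized counting measure; this setting, the vector `v` and all function names are assumptions made for illustration and do not appear in the notes.

```python
import cmath
import math


def shift(g, v):
    """pi_g on C^N for G = Z/N: (pi_g v)[x] = v[(x - g) mod N]."""
    n = len(v)
    return [v[(x - g) % n] for x in range(n)]


def character(k, g, n):
    """The character chi_k(g) = exp(2*pi*i*k*g/N) of Z/N."""
    return cmath.exp(2j * math.pi * k * g / n)


def weight_component(k, v):
    """v_chi = (1/N) * sum_g conj(chi_k(g)) pi_g v, the finite version of (3.17)."""
    n = len(v)
    out = [0j] * n
    for g in range(n):
        coeff = character(k, g, n).conjugate() / n
        shifted = shift(g, v)
        for x in range(n):
            out[x] += coeff * shifted[x]
    return out


def inner(u, w):
    """Inner product that is linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(u, w))


v = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
components = [weight_component(k, v) for k in range(len(v))]
recovered = [sum(c[x] for c in components) for x in range(len(v))]
print(recovered)
```

Here `recovered` agrees with `v` up to rounding, each component is an eigenvector of every shift with eigenvalue $\chi_k(g)$, and components of different weights are orthogonal.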
Exercise 3.38. Let $G$ be a locally compact group with left and right Haar measure $m_G$.
(a) For $f_1, f_2 \in L^1_{m_G}(G)$ define their convolution in $G$ by
$$f_1 * f_2(g) = \int_G f_1(gh^{-1}) f_2(h) \, dm_G(h).$$
Show that $f_1 * f_2(g)$ exists for $m_G$-almost every $g \in G$, and defines an element $f_1 * f_2 \in L^1_{m_G}(G)$. Show that this makes $L^1_{m_G}(G)$ into a Banach algebra.
(b) Let $\pi$ be a unitary representation of $G$ on a Hilbert space $\mathcal{H}$. Show that
$$f_1 * (f_2 * v) = (f_1 * f_2) * v,$$
where $*$ is defined as in Definition 3.34. In other words, $\mathcal{H}$ is a module for the algebra $L^1_{m_G}(G)$.
(c) Extend (a) and (b) to the space $M(G)$ of finite signed measures on $G$.

Exercise 3.39. Let $X = \mathbb{R}^2$ and let $\mathbb{T}$ act on $X$ by
$$\theta \cdot \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \cos 2\pi\theta & -\sin 2\pi\theta \\ \sin 2\pi\theta & \cos 2\pi\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
Let $f \in C^\infty(\mathbb{R}^2)$. Show that the decomposition of $f$ given by Theorem 3.30 converges uniformly on compact subsets of $\mathbb{R}^2$ to $f$. How much smoothness is needed to arrive at this uniform convergence?
3.4 Sobolev Spaces and Embedding on the Torus

Using the theory of Fourier series developed in Sections 3.1 and 3.2, we will now develop the notion of Sobolev spaces and prove the Sobolev embedding theorem. Sobolev spaces combine familiar notions of smoothness (that is, differentiability) with bounds on $L^p$ norms. We will set $p = 2$, but the theory can be extended to all $p$. The embedding theorem, or more exactly its extension in Section 3.5, will be used in Section 3.6.2 to formulate and prove elliptic regularity for the Laplace operator, and to prove existence and uniqueness of solutions to the Dirichlet boundary value problem.

3.4.1 $L^2$-Sobolev Spaces on $\mathbb{T}^d$

In this section we are going to construct Hilbert spaces of functions on $\mathbb{T}^d$ depending on a parameter $k \in \mathbb{N}_0$. Unlike the equivalence classes $f \in L^2(\mathbb{T}^d)$, these functions may be continuous or even differentiable, depending on $k$ (see Section 3.4.2).

Definition 3.40. Let $k \geq 0$ be an integer. We define the ($L^2$-)Sobolev space $H^k(\mathbb{T}^d)$ to be the closure of $C^\infty(\mathbb{T}^d)$ inside
$$V = \big(L^2(\mathbb{T}^d)\big)^{K(k)},$$
where $K(k)$ is the number of multi-indices $\alpha \in (\mathbb{N}_0)^d$ with $\|\alpha\|_1 \leq k$ and a function $f \in C^\infty(\mathbb{T}^d)$ is identified with the tuple
$$\phi_k(f) = (\partial_\alpha f)_{\|\alpha\|_1 \leq k} \in V.$$

In order to make this definition more palatable, we now describe some special cases.

(1) If $k = 0$, then $K(k) = 1$ and so $H^0(\mathbb{T}^d)$ is the closure of $C^\infty(\mathbb{T}^d)$ in $L^2(\mathbb{T}^d)$ with respect to $\|\cdot\|_2$. As we have seen, $C^\infty(\mathbb{T}^d)$ is dense in $C(\mathbb{T}^d)$ with respect to $\|\cdot\|_\infty$ (indeed, the trigonometric polynomials are already dense) and $C(\mathbb{T}^d)$ is dense in $L^2(\mathbb{T}^d)$ with respect to $\|\cdot\|_2$, so we obtain $H^0(\mathbb{T}^d) = L^2(\mathbb{T}^d)$. We will also write $\|f\|_{H^0} = \|f\|_2$ for $f \in H^0(\mathbb{T}^d)$.

(2) Now let $k = 1$, so $K = 1 + d$ and $H^1(\mathbb{T}^d) = \overline{\phi_1(C^\infty(\mathbb{T}^d))}$ is the closure of $\phi_1(C^\infty(\mathbb{T}^d)) \subseteq \big(L^2(\mathbb{T}^d)\big)^K$, where we used the embedding
$$\phi_1 : f \mapsto (f, \partial_{x_1} f, \dots, \partial_{x_d} f) \in \big(L^2(\mathbb{T}^d)\big)^{1+d}.$$
So, by our definition, elements of $H^1$ are $(d+1)$-tuples of functions on $\mathbb{T}^d$. In order to be able to think of these as single functions on $\mathbb{T}^d$ (which is how we will think of Sobolev spaces), notice that the last $d$ terms of the $(d+1)$-tuple are uniquely determined by the first term. This is clear for $\phi_1(f)$ with $f \in C^\infty(\mathbb{T}^d)$, but also remains true in the closure $H^1(\mathbb{T}^d)$.
Lemma 3.41. Suppose that $(f, f_1, \dots, f_d) \in H^1(\mathbb{T}^d)$ and the Fourier series of $f$ is given by
$$f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n.$$
Then
$$f_j = \sum_{n \in \mathbb{Z}^d} 2\pi i n_j c_n \chi_n. \tag{3.20}$$

Proof. We start with the formula
$$a_n(\partial_{x_j} f) = \langle \partial_{x_j} f, \chi_n \rangle = 2\pi i n_j \langle f, \chi_n \rangle = 2\pi i n_j a_n(f)$$
for all $n \in \mathbb{Z}^d$ and all $f \in C^\infty(\mathbb{T}^d)$, see (3.10). Using continuity and the definition of $H^1(\mathbb{T}^d)$, this formula extends to all $(f, f_1, \dots, f_d) \in H^1(\mathbb{T}^d)$. Theorem 3.9 gives the lemma.

As mentioned above, the first component $f$ of any element $(f, f_1, \dots, f_d) \in H^1(\mathbb{T}^d)$ determines all the other components. We will therefore identify an element of $H^1(\mathbb{T}^d)$ with the associated element $f \in L^2(\mathbb{T}^d)$. We will also write $f \in H^1(\mathbb{T}^d)$ and $\partial_j f = f_j \in L^2(\mathbb{T}^d)$ for $j = 1, \dots, d$ for the other components (this notation will be further justified in Section 3.5). If $f \in C^\infty(\mathbb{T}^d)$ this definition of $\partial_j f$ matches the usual one by Lemma 3.41 and formula (3.10). However, the norm associated to $f \in H^1(\mathbb{T}^d)$ is
$$\|f\|_{H^1} = \sqrt{\|f\|_2^2 + \sum_{j=1}^d \|\partial_j f\|_2^2},$$
which is the norm induced from $\big(L^2(\mathbb{T}^d)\big)^{d+1}$. This discussion generalizes to any $k \geq 1$ as follows.

Proposition 3.42. Fix integers $k$ and $\ell$ with $0 \leq \ell < k$. Then the identity map on $C^\infty(\mathbb{T}^d)$ uniquely extends to an injective continuous operator
$$\iota_{k,\ell} : H^k(\mathbb{T}^d) \to H^\ell(\mathbb{T}^d)$$
of norm one. Moreover, for every $j \in \{1, \dots, d\}$ the partial derivative
$$\partial_j : C^\infty(\mathbb{T}^d) \to C^\infty(\mathbb{T}^d)$$
extends to a continuous operator
$$\partial_j : H^k(\mathbb{T}^d) \to H^{k-1}(\mathbb{T}^d)$$
with norm less than or equal to one. Finally, if
$$(f_\alpha)_{\|\alpha\|_1 \leq k} \in H^k(\mathbb{T}^d)$$
and $f = f_0$ has the Fourier series
$$f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n, \tag{3.21}$$
then the other components $f_\alpha$ for $\alpha \in \mathbb{N}_0^d \setminus \{(0, \dots, 0)\}$ with $\|\alpha\|_1 \leq k$ are determined by their Fourier series as
$$f_\alpha = \sum_{n \in \mathbb{Z}^d} (2\pi i n)^\alpha c_n \chi_n, \tag{3.22}$$
where we have used the shorthand
$$(2\pi i n)^\alpha = (2\pi i n_1)^{\alpha_1} \cdots (2\pi i n_d)^{\alpha_d}.$$
K (k ) k
L2 (Td ) (f ) 1
K ( )
to the image of C (Td ) under the embedding that denes H k (Td ). Therefore the extended map is simply the restriction of this projection to H k (Td ) and so has norm less than or equal to one. Injectivity will follow from the last claim of the proposition. For the second claim, regarding the operator j : H k (Td ) H k1 (Td ), we modify the argument above as follows. Consider the projection map j : L2 (Td ) (f )
K (k )
f+ej
L2 (Td )
K (k1)
1
k 1
which clearly has norm one. Figure 3.3 illustrates the dierence between the projection k, and the projection j in a simple example. Restricting the projection j to C (Td ) we see that j (k (f )) = k1 xj (f ) since, for Nd 0 with
1
k 1,
(j (k (f ))) = (k (f ))+ej = +ej (f ) = ej f by denition. Therefore, the continuous extension j of j : C (Td ) C (Td )
131
Fig. 3.3. With d = 2, = 3, k = 4 and j = 1 the projection dening corresponds to the multi-index on the left. The projection dening j corresponds to the on the right.
to a map from H k (Td ) to H k1 (Td ) exists, and is equal to the restriction of j to H k (Td ) and so has norm less than or equal to one. The nal claim of the proposition follows from Lemma 3.41 for the case k = 1 and induction as follows. For k = 0 there is nothing to prove, Lemma 3.41 gives the case k = 1. Suppose that the claim holds for some k 1, let ( f )
k+1
H k+1 (Td ),
and suppose that f = f0 has Fourier series (3.21). Then by the rst claim of the proposition we have ( f )
H k (Td ),
and so (3.22) already holds for all with 1 k by the inductive hypothesis. d Now let Nd 0 have 1 = k + 1, so = + ej for some j and some N0 with 1 = k . Applying the second part of the proposition, we deduce that f+ej Applying Lemma 3.41 to f gives fej =
nZd
1
H k (Td ).
2 in j c n n ,
which together with the inductive hypothesis gives f+ej =

nZd
(2 in) 2 inj cn n .
=(2 in)+ej
Setting = gives the claim for , nishing the inductive step. Now justied by Proposition 3.42, we again write f H k (Td ), f L2 (Td ) for
1
k , and f
Hk
=
k
2. 2
132
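3.4.2 The Sobolev Embedding Theorem on $\mathbb{T}^d$

As we have seen in the discussion above, each of the spaces $H^k(\mathbb{T}^d)$ consists of certain $L^2$ functions on $\mathbb{T}^d$. For $k = 0$ we have $H^0(\mathbb{T}^d) = L^2(\mathbb{T}^d)$. A natural question arises, namely to ask which functions of $L^2(\mathbb{T}^d)$ actually belong to $H^k(\mathbb{T}^d)$ for $k = 1, 2, \dots$. Using Fourier series we can give a formal answer to the question, and this will have interesting and important extensions discussed below.

Lemma 3.43. Let $k \geq 0$ be an integer and let $f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n \in L^2(\mathbb{T}^d)$. Then $f \in H^k(\mathbb{T}^d)$ if and only if
$$\sum_{n \in \mathbb{Z}^d} |c_n|^2 \|n\|_2^{2k} < \infty.$$

Proof. If $f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n \in H^k(\mathbb{T}^d)$ then, by Proposition 3.42 and Theorem 3.9,
$$(n^\alpha c_n)_{n \in \mathbb{Z}^d} \in \ell^2(\mathbb{Z}^d)$$
for all $\alpha$ with $\|\alpha\|_1 \leq k$. We apply this to $\alpha = ke_1, ke_2, \dots, ke_d$ and we see that
$$\sum_{j=1}^d \sum_{n \in \mathbb{Z}^d} n_j^{2k} |c_n|^2 < \infty.$$
Using
$$\sum_{j=1}^d n_j^{2k} \gg_d \|n\|_2^{2k}$$
we get
$$\sum_{n \in \mathbb{Z}^d} |c_n|^2 \|n\|_2^{2k} < \infty$$
as required. Conversely, assume that
$$\sum_{n \in \mathbb{Z}^d} |c_n|^2 \|n\|_2^{2k} < \infty.$$
Then for any $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq k$ and any $n \neq 0$ we have
$$|n^\alpha| = |n_1|^{\alpha_1} \cdots |n_d|^{\alpha_d} \leq \|n\|_2^k,$$
and so
$$\sum_{n \in \mathbb{Z}^d} |(2\pi i n)^\alpha c_n|^2 < \infty.$$
From this it follows that the partial sums
$$\sum_{\|n\|_2 \leq N} c_n \chi_n$$
of the Fourier series of $f$ converge to $f$ even in the $H^k$ norm induced from $\big(L^2(\mathbb{T}^d)\big)^{K(k)}$. It follows that $f \in H^k(\mathbb{T}^d)$.

The following theorem shows how special the elements of the subset $H^k(\mathbb{T}^d)$ within $L^2(\mathbb{T}^d)$ become once $k$ is sufficiently large. If $k > \frac{d}{2}$, then any element of $H^k(\mathbb{T}^d)$ agrees almost surely (and will be identified) with a continuous function. Increasing $k$ further also gives some differentiability of this continuous function.

Theorem 3.44 (Sobolev Embedding). Let $k$ and $\ell$ be non-negative integers with $k > \ell + \frac{d}{2}$. Then the inclusion map from $C^\infty(\mathbb{T}^d)$ to $C^\ell(\mathbb{T}^d)$ has a continuous extension from $H^k(\mathbb{T}^d)$ to $C^\ell(\mathbb{T}^d)$ so, in particular, any $f \in H^k(\mathbb{T}^d)$ has a representative $f \in C^\ell(\mathbb{T}^d)$ with $\|f\|_{C^\ell} \ll_d \|f\|_{H^k}$.

The proof will show that most of the work needed has already been done.

Proof of Theorem 3.44. Let us start with the case $\ell = 0$. In this case we already know that
$$\|f\|_\infty \ll_d \sqrt{\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2}$$
for $f \in C^\infty(\mathbb{T}^d)$ by Theorem 3.10. However, the square root on the right is bounded above by $\|f\|_{H^k}$, which shows that the inclusion map
$$\iota : \big(C^\infty(\mathbb{T}^d), \|\cdot\|_{H^k}\big) \to \big(C(\mathbb{T}^d), \|\cdot\|_\infty\big)$$
is a bounded operator. Since $C(\mathbb{T}^d)$ is a Banach space, this operator extends to $H^k(\mathbb{T}^d)$ by Proposition 2.46. Moreover, the composition of the inclusion maps
$$C^\infty(\mathbb{T}^d) \to C(\mathbb{T}^d) \to L^2(\mathbb{T}^d)$$
is the inclusion map from $C^\infty(\mathbb{T}^d)$ to $L^2(\mathbb{T}^d)$ considered earlier, so that
$$\iota(f) = \iota_{k,0}(f)$$
for all $f \in H^k(\mathbb{T}^d)$. That is, $f \in H^k(\mathbb{T}^d)$ agrees almost everywhere with the continuous function $\iota(f) \in C(\mathbb{T}^d)$.

Now let $\ell \geq 1$ satisfy $\ell + \frac{d}{2} < k$. Recall that $C^\ell(\mathbb{T}^d)$ is a Banach space with the norm
$$\|f\|_{C^\ell} = \max_{\|\alpha\|_1 \leq \ell} \|\partial_\alpha f\|_\infty.$$
Applying Theorem 3.10 to $\partial_\alpha f$ (as done in the discussion above) and Proposition 3.42 gives
$$\|\partial_\alpha f\|_\infty \ll_d \|\partial_\alpha f\|_{H^{k-\ell}} \leq \|f\|_{H^k}$$
for $f \in C^\infty(\mathbb{T}^d)$ and $\|\alpha\|_1 \leq \ell$. Therefore,
$$\|f\|_{C^\ell} \ll_d \|f\|_{H^k}$$
for $f \in C^\infty(\mathbb{T}^d)$, and the identity map once again gives rise to a bounded operator
$$\iota_\ell : H^k(\mathbb{T}^d) \to C^\ell(\mathbb{T}^d).$$
Composing with the inclusion from $C^\ell(\mathbb{T}^d)$ to $C^0(\mathbb{T}^d)$ we see (from the discussion above concerning $\ell = 0$) that $\iota_\ell(f) = \iota_{k,0}(f)$, showing that $f \in H^k(\mathbb{T}^d)$ agrees almost everywhere with $\iota_\ell(f) \in C^\ell(\mathbb{T}^d)$.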
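The mechanism behind the case $\ell = 0$, $d = 1$, $k = 1$ is the chain $\|f\|_\infty \leq \sum_n |c_n| \leq \big(\sum_n (1 + n^{2k})^{-1}\big)^{1/2} \big(\sum_n (1 + n^{2k}) |c_n|^2\big)^{1/2}$ from the proof of Theorem 3.10. The sketch below checks this chain for one trigonometric polynomial; the coefficients, the cutoff and the grid size are hypothetical choices made only for illustration.

```python
import cmath
import math

CUTOFF = 50  # the sample function is a trigonometric polynomial of this degree


def coefficient(n):
    # Hypothetical Fourier coefficients with |c_n| = (1 + |n|)^(-2) and an arbitrary phase.
    return cmath.exp(1j * n * n) / (1 + abs(n)) ** 2


def evaluate(x):
    """f(x) = sum over |n| <= CUTOFF of c_n * exp(2*pi*i*n*x)."""
    return sum(
        coefficient(n) * cmath.exp(2j * math.pi * n * x)
        for n in range(-CUTOFF, CUTOFF + 1)
    )


def sup_norm_on_grid(grid=400):
    """Maximum of |f| over an equispaced grid (a lower bound for the true sup norm)."""
    return max(abs(evaluate(m / grid)) for m in range(grid))


def l1_norm():
    return sum(abs(coefficient(n)) for n in range(-CUTOFF, CUTOFF + 1))


def cauchy_schwarz_bound(k):
    """sqrt(sum 1/(1+|n|^(2k))) * sqrt(sum (1+|n|^(2k)) |c_n|^2), as in the proof of (3.11)."""
    weights = sum(1.0 / (1 + abs(n) ** (2 * k)) for n in range(-CUTOFF, CUTOFF + 1))
    energy = sum(
        (1 + abs(n) ** (2 * k)) * abs(coefficient(n)) ** 2
        for n in range(-CUTOFF, CUTOFF + 1)
    )
    return math.sqrt(weights * energy)


print(sup_norm_on_grid(), l1_norm(), cauchy_schwarz_bound(1))
```

The three printed numbers should be non-decreasing from left to right. The last one is comparable to the $H^1$ norm of the sample function, by the identity for $\|\partial_{e_j}^k f\|_2^2$ used in the proof of Theorem 3.10.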
3.5 Sobolev Spaces and Embedding Theorem on Open Sets

Much of the discussion in Section 3.4.1 regarding Sobolev spaces on $\mathbb{T}^d$ can be quickly generalized to open subsets of $\mathbb{R}^d$. However, in Section 3.4.2 particularly we frequently made use of Fourier series in the arguments, so we will go through the definitions and elementary properties once again, without making use of Fourier series arguments.

3.5.1 $L^2$-Sobolev Spaces on Open Subsets

In this subsection we will define spaces of functions on an open subset $U \subseteq \mathbb{R}^d$ that form Hilbert spaces in which the elements may be continuous or even differentiable depending on a regularity parameter $k$ (and the dimension). Here we have a choice regarding the behavior of the functions on the boundary $\partial U$ of $U \subseteq \mathbb{R}^d$, giving rise to two different Sobolev spaces for every $k \geq 1$.

Definition 3.45. Let $d \geq 1$ and $k \geq 0$ be integers, and let $U \subseteq \mathbb{R}^d$ be an open subset. Then the $L^2$-Sobolev space $H^k(U)$ is defined to be the closure of
$$\big\{ (\partial_\alpha f)_{\|\alpha\|_1 \leq k} \mid f \in C^\infty(U),\ \partial_\alpha f \in L^2(U) \text{ for } \|\alpha\|_1 \leq k \big\} \tag{3.23}$$
inside $\big(L^2(U)\big)^{K(k)}$, where as usual
$$K(k) = \big| \{ \alpha \in \mathbb{N}_0^d \mid \|\alpha\|_1 \leq k \} \big|.$$
(In the literature another notation that is used is $W^{k,2}$. The more general case of $W^{k,p}$ is defined similarly using $L^p(U)$ instead of $L^2(U)$.)

Even though the closure $H^k(U)$ contains many more functions that are not in $C^\infty(U)$, those new elements
$$(f_\alpha)_{\|\alpha\|_1 \leq k} \in H^k(U) \tag{3.24}$$
still have some of the properties of the elements in the subspace (3.23) that defined $H^k(U)$. In fact, as we will show below, $f = f_0$ determines all the other components $f_\alpha$ of the vector (3.24), and these are derivatives in the following weaker sense.
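Definition 3.46. Suppose that $\alpha \in \mathbb{N}_0^d$ and $f, g \in L^2(U)$. Then $g$ is called a weak $\alpha$-partial derivative of $f$, written $g = \partial_\alpha f$, if
$$\int_U f\, \partial_\alpha \phi \, dx = (-1)^{\|\alpha\|_1} \int_U g\, \phi \, dx$$
for all $\phi \in C_c^\infty(U)$.

We view the functions $\phi$ appearing in this definition as test functions.

Example 3.47. Let $U = (-1, 1) \subseteq \mathbb{R}$. The function
$$f(x) = \begin{cases} x & \text{if } x \geq 0 \\ 0 & \text{if } x \leq 0 \end{cases}$$
has the weak $e_1$-partial derivative
$$g(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x < 0. \end{cases}$$
In fact
$$\int_{-1}^1 f(x) \phi'(x) \, dx = \int_0^1 x \phi'(x) \, dx = \underbrace{x \phi(x) \Big|_0^1}_{= 0} - \int_0^1 \phi(x) \, dx = -\int_{-1}^1 g(x) \phi(x) \, dx$$
as required.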
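The identity in Example 3.47 can be tested numerically against one concrete test function. In the following sketch the test function $\phi(x) = (1 + x)\,e^{-1/(1 - x^2)}$ on $(-1, 1)$, the midpoint rule, and all names are assumptions chosen for illustration; any other $\phi \in C_c^\infty((-1, 1))$ would serve equally well.

```python
import math


def bump(x):
    """The standard smooth bump exp(-1/(1 - x^2)) on (-1, 1), extended by zero."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0


def phi(x):
    # A non-symmetric test function supported in [-1, 1] (hypothetical choice).
    return (1.0 + x) * bump(x)


def dphi(x):
    """Derivative of phi, computed by hand with the product and chain rules."""
    if abs(x) >= 1:
        return 0.0
    dbump = bump(x) * (-2.0 * x / (1.0 - x * x) ** 2)
    return bump(x) + (1.0 + x) * dbump


def f(x):
    return x if x >= 0 else 0.0


def g(x):
    return 1.0 if x > 0 else 0.0


def midpoint(func, a, b, cells=20000):
    """Midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / cells
    return sum(func(a + (i + 0.5) * h) for i in range(cells)) * h


lhs = midpoint(lambda x: f(x) * dphi(x), -1.0, 1.0)
rhs = -midpoint(lambda x: g(x) * phi(x), -1.0, 1.0)
print(lhs, rhs)
```

The two printed values should agree up to the discretization error of the midpoint rule.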
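Lemma 3.48 (Weak derivatives). A weak $\alpha$-partial derivative of an $L^2$ function $f$ is uniquely determined as an element of $L^2$ if it exists. If $f \in C^\infty(U)$ and $\partial_\alpha f \in L^2(U)$, then $\partial_\alpha f$ is a weak $\alpha$-partial derivative. If
$$(f_\alpha)_{\|\alpha\|_1 \leq k} \in H^k(U)$$
then $f = f_0$ has $f_\alpha$ as a weak $\alpha$-partial derivative for $\alpha$ with $\|\alpha\|_1 \leq k$. In particular, $f = f_0$ determines all the elements of the vector in $H^k(U)$.

Proof. If $g$ is a weak $\alpha$-partial derivative then the inner product
$$\langle g, \phi \rangle = \int_U g\, \overline{\phi} \, dx = (-1)^{\|\alpha\|_1} \int_U f\, \partial_\alpha \overline{\phi} \, dx$$
is determined by $f$ for all $\phi \in C_c^\infty(U)$. As $C_c^\infty(U)$ is dense in $L^2(U)$, we see that $g$ is uniquely determined by $f$.

If $f \in C^1(U)$, $\partial_{e_j} f \in L^2(U)$, and $\phi \in C_c^\infty(U)$ then integration by parts shows that
$$\int_{U \cap (y + \mathbb{R} e_j)} f(x)\, \partial_{e_j} \phi(x) \, dx_j = -\int_{U \cap (y + \mathbb{R} e_j)} \partial_{e_j} f(x)\, \phi(x) \, dx_j,$$
where the boundary terms vanish since $\phi \in C_c^\infty(U)$. Integrating over the remaining variables shows that $\partial_{e_j} f$ is indeed also a weak $e_j$-partial derivative. By induction on $\|\alpha\|_1$, this implies the second claim in the lemma. In particular, we have shown that
$$\langle f_0, \partial_\alpha \phi \rangle = (-1)^{\|\alpha\|_1} \langle f_\alpha, \phi \rangle,$$
first for all $(f_\alpha)_{\|\alpha\|_1 \leq k}$ in the subspace (3.23) inside $\big(L^2(U)\big)^{K(k)}$ obtained from $C^\infty(U) \cap H^k(U)$, and then by continuity for all $(f_\alpha)_{\|\alpha\|_1 \leq k} \in H^k(U)$. This implies the lemma.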
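Exercise 3.49. Show that $f \in L^2(\mathbb{T}^d)$ belongs to $H^k(\mathbb{T}^d)$ if and only if there exists, for every $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq k$, a weak partial derivative $\partial_\alpha f \in L^2(\mathbb{T}^d)$ (defined using $C^\infty(\mathbb{T}^d)$ as the space of test functions). Here the partial derivative is defined in terms of smooth functions $\phi \in C^\infty(\mathbb{T}^d)$.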
Definition 3.50 (Modifying Definition 3.45). Lemma 3.48 justifies the following notational convention. We identify the elements of $H^k(U)$ with functions $f \in L^2(U)$ and equip the space $H^k(U)$ with the norm
$$\|f\|_{H^k(U)} = \sqrt{\sum_{\|\alpha\|_1 \leq k} \|\partial_\alpha f\|_{L^2(U)}^2}.$$
(More precisely, in our definition, $H^k(U)$ consists of those $f \in L^2(U)$ that have weak $\alpha$-partial derivatives $\partial_\alpha f \in L^2(U)$ for $\|\alpha\|_1 \leq k$ and such that the vector $(\partial_\alpha f)_{\|\alpha\|_1 \leq k} \in (L^2(U))^{K(k)}$ can be approximated by the vectors corresponding to elements of $C^\infty(U) \cap H^k(U)$. We remark that for suitable open sets $U \subseteq \mathbb{R}^d$ the analogue of Exercise 3.49 also holds. However, due to the existence of $\partial U$ this is a bit harder to prove, as, for example, direct convolution with a smooth function would result in a function that is not defined in a neighborhood of $\partial U$. We will not use this possible alternative definition of the Sobolev spaces.)

Proposition 3.51. For $k > \ell \geq 0$ there is a natural injection
$$\iota_{k,\ell} : H^k(U) \to H^\ell(U)$$
of norm one, extending the identity on $C^\infty(U)$. For any multi-index $\alpha$ with $\|\alpha\|_1 \leq k$ there is a natural operator
$$\partial_\alpha : H^k(U) \to H^{k - \|\alpha\|_1}(U), \qquad f \mapsto \partial_\alpha f$$
of norm less than or equal to one, which extends $\partial_\alpha$ on $C^\infty(U) \cap H^k(U)$.
This may be proved in the same way as Proposition 3.42. We may obtain other Sobolev spaces which will be subspaces of $H^k(U)$ by requiring additional decay properties on the boundary $\partial U$.

Definition 3.52. We define
$$H_0^k(U) = \overline{C_c^\infty(U)} \subseteq H^k(U)$$
to be the closure of all smooth compactly supported functions in $H^k(U)$.

We will see later that elements of $H_0^k(U)$ vanish in the square mean norm at $\partial U$ if $k \geq 1$.

Exercise 3.53. Let $V \subseteq U \subseteq \mathbb{R}^d$ be open subsets. Let $k \geq 0$.
(a) Show that the restriction $\cdot|_V : H^k(U) \to H^k(V)$ is a bounded operator.
(b) Show that the extension operator from $H_0^k(V)$ to $H_0^k(U)$, defined by extending the functions to be zero on $U \setminus V$, is a bounded operator.
(c) Show that for $k \geq 1$ in general $H_0^k(U)|_V$ is not contained in $H_0^k(V)$. Show that one cannot define an extension operator from $H^k(V)$ to $H^k(U)$ by simply extending the functions to be zero on $U \setminus V$.
3.5.2 Examples

We illustrate the theory above with some simple examples, which will be justified later.

Example 3.54. Let $d = 1$, $U = (0, 1)$ and $k = 1$. Then every $f \in H^1((0, 1))$ can be continuously extended to a function $\iota(f) \in C([0, 1])$ with
$$\|\iota(f)\|_\infty \leq 2 \|f\|_{H^1}. \tag{3.25}$$
Also see Exercise 3.56 and Exercise 3.57.

Example 3.55. Let $d = 2$, $U = B_{1/2}(0) \setminus \{0\}$ and $k = 1$. Then the function $f$ defined by
$$f(x) = \log \big| \log \|x\| \big|$$
lies in $H^1(U)$ and cannot be extended to an element of $C(\overline{U})$. Also see Exercise 3.58 and Exercise 3.59.

Justification of Example 3.54. For $x, y \in U$ and $f \in C^\infty(U) \cap H^1(U)$ we clearly have
$$f(y) = f(x) + \int_x^y f'(s) \, ds.$$
Notice that for fixed $x$ and $y$ the integral on the right is a continuous functional on $H^1(U)$, but for the terms $f(y)$ and $f(x)$ this is not clear yet. Now integrate over $x \in (0, 1)$ to get
$$f(y) = \int_0^1 f(x) \, dx + \int_0^1 \int_x^y f'(s) \, ds \, dx = \int_0^1 f(x) \, dx + \int_0^1 \int_0^1 f'(s)\, \kappa(y, x, s) \, ds \, dx,$$
where
$$\kappa(y, x, s) = \begin{cases} 1 & \text{if } x < s < y; \\ -1 & \text{if } y < s < x; \\ 0 & \text{if } s \text{ is not between } x \text{ and } y. \end{cases}$$
Applying Fubini's theorem we get
$$f(y) = \int_0^1 f(x) \, dx + \int_0^1 f'(s)\, k(y, s) \, ds, \tag{3.26}$$
where
$$k(y, s) = \int_0^1 \kappa(y, x, s) \, dx = \begin{cases} s & \text{if } s < y; \\ -(1 - s) & \text{if } s > y. \end{cases}$$
Hence (3.26) expresses the value of $f$ at $y$ in a way which is clearly continuous on $H^1((0, 1))$ and satisfies (3.25). Moreover, we may use (3.26) for $y = 0$ and $y = 1$ as a definition of $f(0)$ and $f(1)$, and then
$$|f(y_1) - f(y_2)| \leq \Big| \int_{y_2}^{y_1} |f'(s)| \, ds \Big| \leq \sqrt{|y_1 - y_2|}\, \|f'\|_{L^2} \tag{3.27}$$
for all $y_1, y_2 \in [0, 1]$. Applying Proposition 2.46 to the map so described from $C^\infty(U) \cap H^1(U)$ to $C(\overline{U})$, the claims in Example 3.54 follow.
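The representation formula (3.26) can be checked on a concrete smooth function. The sketch below uses $f(x) = x^3$ and a midpoint rule; the sample function, the number of cells and the names are assumptions for illustration. The integral against $k(y, \cdot)$ is split at $s = y$ so that the quadrature never straddles the jump of the kernel.

```python
def midpoint(func, a, b, cells=2000):
    """Midpoint rule for the integral of func over [a, b]."""
    h = (b - a) / cells
    return sum(func(a + (i + 0.5) * h) for i in range(cells)) * h


def kernel(y, s):
    """k(y, s) from (3.26): s for s < y, and -(1 - s) for s > y."""
    return s if s < y else s - 1.0


def reconstruct(f, df, y):
    """Right-hand side of (3.26): the mean of f plus the integral of f'(s) k(y, s)."""
    mean = midpoint(f, 0.0, 1.0)
    left = midpoint(lambda s: df(s) * kernel(y, s), 0.0, y)
    right = midpoint(lambda s: df(s) * kernel(y, s), y, 1.0)
    return mean + left + right


def f(x):
    return x ** 3


def df(x):
    return 3 * x ** 2


for y in (0.0, 0.3, 1.0):
    print(y, f(y), reconstruct(f, df, y))
```

For $y = 0$ and $y = 1$ one of the two pieces is empty, which matches the use of (3.26) to define the boundary values $f(0)$ and $f(1)$.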
Exercise 3.56. Let $U = (0, 1)$ and let $f : U \to \mathbb{C}$ be a function continuous on $U$ and continuously differentiable at all points of $U$ apart from finitely many points. Assume also that $f' \in L^2(U)$. Show that $f \in H^1(U)$.

Exercise 3.57. Show directly that Example 3.54 can be generalized to the statement that any $f \in H^k((0, 1))$ (continuously extended) actually belongs to $C^{k-1}([0, 1])$. Show also that $H_0^1((0, 1))$ is mapped under the embedding from Example 3.54 into $\{f \in C([0, 1]) \mid f(0) = f(1) = 0\}$.
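Justification of Example 3.55. It is easy to check that $f \in L^2(U)$, so we only have to check that $\partial_{x_j} f \in L^2(U)$. By the chain rule we have
$$\partial_{x_j}\bigl(\log|\log\|x\||\bigr) = \frac{1}{\log\|x\|} \cdot \frac{1}{\|x\|} \cdot \frac{x_j}{\|x\|}.$$
Taking the square and integrating with respect to $dx\,dy = r\,dr\,d\theta$ we get
$$\|\partial_{x_j} f\|_{L^2(U)}^2 \le \int_0^{1/2}\!\int_0^{2\pi} \frac{1}{(r \log r)^2}\, r\,d\theta\,dr = 2\pi \int_0^{1/2} \frac{1}{r(\log r)^2}\,dr < \infty.$$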
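The finiteness of the last integral, and the unboundedness of $f$ near the origin, can be illustrated numerically. The sketch below assumes the substitution $r = e^{-t}$, which turns $\int 1/(r (\log r)^2)\,dr$ into $\int t^{-2}\,dt$ with antiderivative $-1/t$; the truncation level `T` and the sample radii are arbitrary choices made for illustration.

```python
import math

def truncated_integral(T, n=200000):
    """Midpoint rule for int_{log 2}^{T} t**-2 dt, which equals the integral of
    1 / (r (log r)^2) over r in (exp(-T), 1/2) after substituting r = exp(-t)."""
    a = math.log(2.0)
    h = (T - a) / n
    return sum(h / (a + (i + 0.5) * h) ** 2 for i in range(n))

T = 50.0
numeric = truncated_integral(T)
exact = 1.0 / math.log(2.0) - 1.0 / T   # from the antiderivative -1/t
limit = 1.0 / math.log(2.0)             # value of the full integral (T -> infinity)

# f(x) = log|log ||x||| depends only on r = ||x|| and grows without bound as r -> 0.
f = lambda r: math.log(abs(math.log(r)))
values = [f(10.0 ** (-k)) for k in (1, 10, 100, 300)]
print(numeric, exact, limit, values)
```

The growth of `values` is extremely slow, which is consistent with $f$ having square-integrable derivatives while failing to be bounded.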
Exercise 3.58. Extend Example 3.55 by showing that $f(x) = \log|\log\|x\|| \in H^1(B_{1/2}(0))$.

Exercise 3.59. Let $U = B_1^{\mathbb{R}^d}(0)$, and let $f(x) = \|x\|^\alpha$ for $x \in U$. For which values of $\alpha$ do we have $f \in H^k(U)$?
3.5.3 Restriction Operators and Traces

In order to get a better geometric understanding of what it means for a function to belong to $H^1(U)$, we will now show that an element $f \in H^1(U)$, when restricted to any hyperplane, still belongs to $L^2$. Notice that since a hyperplane is a null set, any property claimed for the restriction to a hyperplane cannot be demanded (indeed, does not even make sense) for a function (that is, an equivalence class of functions) $f \in L^2(U)$. For notational simplicity we start with the case $U = (0, 1)^d$.

Proposition 3.60 (Trace on the standard cube). Let $U = (0, 1)^d$, $S = (0, 1)^{d-1}$ and write $S_y = S \times \{y\} \subseteq \overline{U}$ for $y \in [0, 1]$. For every $y \in [0, 1]$ there is a natural restriction operator to $L^2(S_y)$, called the trace on $S_y$,
$$H^1(U) \ni f \longmapsto f|_{S_y} \in L^2(S_y),$$
which for $y \in (0, 1)$ is the extension of the restriction operator
$$C^\infty(U) \cap H^1(U) \ni f \longmapsto f|_{S_y}.$$
Moreover, if we identify $L^2(S_y)$ with $L^2(S)$ for all $y \in [0, 1]$ (by simply identifying $S_y$ with $S$) then we also have
$$\|f|_{S_{y_1}} - f|_{S_{y_2}}\|_{L^2(S)} \le \sqrt{|y_1 - y_2|}\;\|f\|_{H^1(U)}.$$

Footnote to Proposition 3.60: This result could be skipped, as we will only need the very similar Proposition 3.64.
Proof. We write $x$ for elements of $S$, so that $U = \{(x, y) \mid x \in S, y \in (0, 1)\}$. We will use this decomposition and Fubini's theorem to extend the definition of $f|_{S_y}$, as illustrated in Figure 3.4.

Fig. 3.4. We obtain $f(x, y)$ for almost every $x$ by integrating vertically both $f$ and $\partial_y f$.

Let $f \in C^\infty(U) \cap H^1(U)$. Then $f, \partial_y f \in L^2(U)$ and so by Fubini's theorem
$$f|_{\{x\} \times (0,1)},\ \partial_y f|_{\{x\} \times (0,1)} \in L^2(\{x\} \times (0, 1)) \tag{3.28}$$
for almost every $x$. Note that we assumed $f$ to be smooth and so $\partial_y f$ is defined pointwise by the limit of difference quotients as usual, but $f$ or $\partial_y f$ may tend to infinity as $y \to 0$ or $y \to 1$, and that is the reason we apply Fubini's theorem here. In other words, for those $x$ we know that
$$f|_{\{x\} \times (0,1)} \in H^1(\{x\} \times (0, 1)),$$
and we may apply Example 3.54. More precisely, we will use (3.26) from the justification of the statements in that example, which in our case says that
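$$f(x, y) = \int_0^1 f(x, s)\,ds + \int_0^y s\,\partial_y f(x, s)\,ds + \int_y^1 (s - 1)\,\partial_y f(x, s)\,ds \tag{3.29}$$
holds for any $y \in (0, 1)$ and for all of the almost every $x$ satisfying (3.28). Applying the Cauchy–Schwarz inequality we deduce that
$$|f(x, y)| \le \|f(x, \cdot)\|_{L^2(\{x\} \times (0,1))} + \|\partial_y f(x, \cdot)\|_{L^2(\{x\} \times (0,1))} \le \sqrt{2}\,\sqrt{\|f(x, \cdot)\|_{L^2(\{x\} \times (0,1))}^2 + \|\partial_y f(x, \cdot)\|_{L^2(\{x\} \times (0,1))}^2}.$$
Now take the square and integrate over $x \in (0, 1)^{d-1}$ to obtain for every $f \in C^\infty(U) \cap H^1(U)$
$$\|f(\cdot, y)\|_{L^2(S_y)}^2 \le 2\left(\|f\|_{L^2(U)}^2 + \|\partial_y f\|_{L^2(U)}^2\right) \le 2\,\|f\|_{H^1(U)}^2. \tag{3.30}$$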
Applying Proposition 2.46 shows that the restriction map
$$C^\infty(U) \cap H^1(U) \ni f \longmapsto f|_{S_y} \in L^2(S_y)$$
has a continuous extension to all of $H^1(U)$. Just as in Example 3.54, we can now use (3.29) to define $f|_{S_0}(x)$ and $f|_{S_1}(x)$ for all $x$ satisfying (3.28). By the argument above, (3.30) then also holds for $f|_{S_0}$ and $f|_{S_1}$. Finally we have, for any $f \in C^\infty(U) \cap H^1(U)$,
$$|f(x, y_1) - f(x, y_2)| \le \sqrt{|y_1 - y_2|}\;\|\partial_y f\|_{L^2(\{x\} \times (0,1))}$$
for all $x$ satisfying (3.28), by (3.27). Taking the square and integrating over $x \in (0, 1)^{d-1}$ we get (using the identification of $L^2(S_{y_1})$ with $L^2(S_{y_2})$ as in the proposition) that
$$\|f|_{S_{y_1}} - f|_{S_{y_2}}\|_{L^2(S)} \le \sqrt{|y_1 - y_2|}\;\|\partial_y f\|_{L^2(U)} \le \sqrt{|y_1 - y_2|}\;\|f\|_{H^1(U)}.$$
By density of $C^\infty(U) \cap H^1(U)$ this extends to all $f \in H^1(U)$ and so completes the proof.

Exercise 3.61. Prove an extension of Proposition 3.60 for an arbitrary bounded open set $U \subseteq \mathbb{R}^d$ and the image of $[0, 1]^{d-1} \times \{0\}$ for a regular map $\phi : [0, 1]^d \to \overline{U}$.
We now repeat the argument above in a simplified way for elements of $H_0^1(U)$. For the statement that such functions vanish in the square-mean sense on $\partial U$ we want to assume that $U$ has a smooth boundary in the following sense.

Definition 3.62. Let $U \subseteq \mathbb{R}^d$ be an open set. We say that $U$ has a $C^k$-smooth boundary if for every $z^{(0)} \in \partial U$ there exists a neighborhood $B_\varepsilon(z^{(0)})$, a rotated coordinate system which we denote by $x_1, \dots, x_{d-1}, y$ so that $z^{(0)}$ corresponds to $(x_1^{(0)}, \dots, x_{d-1}^{(0)}, y^{(0)})$, and a function $\phi \in C^k\bigl(B_\varepsilon(x_1^{(0)}, \dots, x_{d-1}^{(0)})\bigr)$ such that
$$U \cap B_\varepsilon(z^{(0)}) = \{(x, y) \in B_\varepsilon(z^{(0)}) \mid y < \phi(x)\},$$
as illustrated in Figure 3.5.

Fig. 3.5. A set with smooth boundary.
This includes examples like $U = B_r(x)$, but excludes $U = (0, 1)^d$ if $k > 1$. Notice that an open set with a $C^k$-smooth boundary need not be connected, simply connected, or bounded. Also note that the rotation within Definition 3.62 does not affect the assumption of whether a function belongs to $H^k(U)$. In fact, since a rotation $R$ preserves the Lebesgue measure, a convergent sequence $(f_n)$ in $C^\infty(U) \cap H^k(U)$ is mapped to another convergent sequence $(f_n \circ R)$ in $H^k(R^{-1} U)$.
Exercise 3.63. Let $U \subseteq \mathbb{R}^d$ be open and bounded, and let $\Phi$ be a diffeomorphism (e.g. a rotation) defined on a neighborhood of $\overline{U}$. Let $k \ge 0$ be an integer. Show that $f \in H^k(U) \mapsto f \circ \Phi \in H^k(\Phi^{-1}(U))$ is an isomorphism (in the case of a rotation an isometry) between $H^k(U)$ and $H^k(\Phi^{-1}(U))$.
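Proposition 3.64. Let $U \subseteq \mathbb{R}^d$ be open and bounded. Then for every hyperplane $P$ there is a natural restriction operator, the trace on $P$,
$$H_0^1(U) \ni f \longmapsto f|_P \in L^2(P),$$
which is the extension of the restriction operator
$$C_c^\infty(U) \ni f \longmapsto f|_P$$
(see Figure 3.6).

Fig. 3.6. The trace on $P$.

More generally, if we denote the variables $(z_1, \dots, z_d)$ by $(x_1, \dots, x_{d-1}, y)$ and
$$\phi : B_\varepsilon^{\mathbb{R}^{d-1}}(x^{(0)}) \to \mathbb{R}$$
is continuous, then there is a natural restriction operator, the trace on the graph $\operatorname{Graph}(\phi) = \{(x, \phi(x))\}$ of $\phi$,
$$H_0^1(U) \ni f \longmapsto f|_{\operatorname{Graph}(\phi)} \in L^2_{dx_1 \cdots dx_{d-1}}(\operatorname{Graph}(\phi)),$$
which satisfies
$$\|f|_{\operatorname{Graph}(\phi)}\|_{L^2(\operatorname{Graph}(\phi))} \le \sqrt{\delta}\;\|\partial_y f\|_{L^2(U)},$$
where $\delta > 0$ is chosen with the property that $(x, \phi(x) + \delta) \notin U$ for all $x \in B_\varepsilon(x^{(0)})$ (see Figure 3.7).

Fig. 3.7. The trace on the graph of $\phi$.

Footnote to Definition 3.62: The local description of the boundary as a graph is much like the statement of the implicit function theorem.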
Consider now the case of a bounded open set $U \subseteq \mathbb{R}^d$ with $C^0$-smooth boundary and the functions $\phi_\delta(x) = \phi(x) - \delta$ with $\phi$ as in Definition 3.62. Then, by Proposition 3.64, the trace of an element $f \in H_0^1(U)$ on this translated portion of the boundary has norm $O(\sqrt{\delta}\,\|f\|_{H^1(U)})$. This explains the earlier claim that $f \in H_0^1(U)$ vanishes in the square-mean sense on $\partial U$.

Proof of Proposition 3.64. Recall that we write $x$ for the first $(d - 1)$ variables and $y$ for the last variable. For $f \in C_c^\infty(U)$ we have
$$f(x, y) = -\int_y^Y \partial_y f(x, s)\,ds, \tag{3.31}$$
where $Y \in \mathbb{R}$ is larger than the last component of any $z \in U$. Taking the square and applying the Cauchy–Schwarz inequality gives
$$|f(x, y)|^2 = \left|\int_y^Y \partial_y f(x, s)\,ds\right|^2 \le (Y - y) \int_y^Y |\partial_y f(x, s)|^2\,ds \ll_U \int_y^Y |\partial_y f(x, s)|^2\,ds. \tag{3.32}$$
Assuming $P = \operatorname{Graph}(L)$ for a linear map $L : \mathbb{R}^{d-1} \to \mathbb{R}$, we can integrate (3.32) with $y = L(x)$ with respect to $x$ to obtain
$$\|f|_P\|_{L^2(P)} \ll_U \|\partial_y f\|_{L^2(U)} \le \|f\|_{H^1(U)}. \tag{3.33}$$
Just as in the proof of Proposition 3.60 this implies the first part of the proposition. For the second part, we set $y = \phi(x)$ and note that in (3.31) we can use $Y = y + \delta$ so that (3.32) becomes
$$|f(x, \phi(x))|^2 \le \delta \int_{\phi(x)}^{\phi(x) + \delta} |\partial_y f(x, s)|^2\,ds,$$
which when integrated over $x$ gives
$$\|f|_{\operatorname{Graph}(\phi)}\|_{L^2(\operatorname{Graph}(\phi))} \le \sqrt{\delta}\;\|\partial_y f\|_{L^2(U)}.$$
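The pointwise estimate (3.32) is just the fundamental theorem of calculus combined with Cauchy–Schwarz, and can be tested in one variable. The sketch below suppresses the variable $x$, takes $U = (0, 1)$ and $Y = 1$, and uses the hypothetical function $f(s) = \sin^2(\pi s)$, which vanishes at both endpoints; these choices are assumptions made purely for illustration.

```python
import math

# Hypothetical one-dimensional stand-in for an element vanishing on the boundary.
f = lambda s: math.sin(math.pi * s) ** 2
df = lambda s: math.pi * math.sin(2.0 * math.pi * s)

def energy(y, Y=1.0, n=4000):
    """Midpoint rule for int_y^Y |f'(s)|^2 ds."""
    h = (Y - y) / n
    return sum(df(y + (i + 0.5) * h) ** 2 for i in range(n)) * h

# (3.32) with Y = 1: |f(y)|^2 <= (1 - y) * int_y^1 |f'(s)|^2 ds.
ys = [i / 50.0 for i in range(50)]
gaps = [(1.0 - y) * energy(y) - f(y) ** 2 for y in ys]
min_gap = min(gaps)
print(min_gap)
```

The factor $1 - y$ shrinks as $y$ approaches the boundary point $Y = 1$, which is the one-dimensional shadow of the factor $\sqrt{\delta}$ in the trace estimate on a translated graph.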
3.5.4 Sobolev Embedding in the Interior

We now extend the Sobolev embedding theorem (Theorem 3.44) to open subsets $U \subseteq \mathbb{R}^d$ (but leave open the question regarding the behavior of $f$ on $\partial U$).

Theorem 3.65 (Sobolev embedding on open subsets of $\mathbb{R}^d$). Let $U \subseteq \mathbb{R}^d$ be open and let $\ell \ge 0$ and $k > \frac{d}{2} + \ell$ be integers. Then any function in $H^k(U)$ (has a continuous representative that) also lies in $C^\ell(U)$.
Exercise 3.66. (a) Extend Proposition 3.60 by showing that $f \in H^k(U)$ implies that $f|_{S_y} \in H^{k-1}(S_y)$. (b) Use (a) to prove the following weak version of the Sobolev embedding theorem. For any $\ell \ge 0$ there is a natural map from $H^{d+\ell}(U)$ to $C^\ell(U)$ that extends the identity inclusion on the functions in $C^\infty(U) \cap H^{d+\ell}(U)$. (c) Extend the arguments from (b) to the boundary, i.e. show that there is a natural map from $H^{d+\ell}(U)$ to $C^\ell(\overline{U})$. (d) Now improve the needed regularity in (c) in the following way: Show that there is a natural map from $H^k(U)$ to $C^\ell(S_0)$ if $k > 1 + \frac{d-1}{2} + \ell$ (by also applying the Sobolev embedding theorem on $S_0$).
As we will see, Theorem 3.65 follows from Theorem 3.44 by translating the problem from an open set $U \subseteq \mathbb{R}^d$ to the $d$-torus $\mathbb{T}^d$.

Proposition 3.67 (Transfer, part 1). A theorem regarding the existence of an extension of the identity
$$\iota : H^k(U) \to C^\ell(U), \qquad f \longmapsto f \tag{3.34}$$
holds for any open set $U \subseteq \mathbb{R}^d$ if it holds for the case $U = \mathbb{T}^d$.

Footnote to (3.34): By this we mean that the map $\iota$ maps every $f \in H^k(U)$ to a continuous representative $\iota(f) \in C^\ell(U)$ of the equivalence class $f$. We are not claiming any continuity of $\iota$ or boundedness of $\iota(f)$ for $f \in H^k(U)$.
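Proof. Fix a point $x_0 \in U$. Choose $\varepsilon \in (0, \frac{1}{2})$ with $B_\varepsilon(x_0) \subseteq U$, and let $\chi \in C_c^\infty(B_\varepsilon(x_0))$ be a function with
$$\chi \equiv 1 \text{ on } B_{\varepsilon/2}(x_0). \tag{3.35}$$
Using property (3.35), any $f \in C^\infty(U)$ gives rise to a function $\chi f \in C_c^\infty(B_\varepsilon(x_0))$ that agrees with $f$ on $B_{\varepsilon/2}(x_0)$. By the product rule we have
$$\partial_{e_j}(\chi f) = \chi(\partial_{e_j} f) + (\partial_{e_j} \chi) f,$$
which extends (by induction) to a formula for $\partial_\alpha(\chi f)$ for all $\alpha$ with $\|\alpha\|_1 \le k$ which expresses $\partial_\alpha(\chi f)$ as a linear combination of products of $\partial_\beta(\chi)$ and $\partial_\gamma(f)$ with $\|\beta\|_1, \|\gamma\|_1 \le k$. This implies that
$$\|\chi f\|_{H^k(B_\varepsilon(x_0))} \ll \max_{\|\beta\|_1 \le k} \|\partial_\beta \chi\|_\infty\, \|f\|_{H^k(B_\varepsilon(x_0))}.$$
We will now think of $\chi f$ as being defined on all of $\mathbb{R}^d$, simply by setting $\chi f$ to be zero outside $B_\varepsilon(x_0)$. Since the support of $\chi f$ lies in $B_\varepsilon(x_0) \subseteq x_0 + (-\frac{1}{2}, \frac{1}{2})^d$, which maps injectively to $\mathbb{T}^d$, $\chi f$ agrees on $B_\varepsilon(x_0)$ with its $\mathbb{Z}^d$-periodic continuation
$$p_\chi(f)(x) = \sum_{n \in \mathbb{Z}^d} (\chi f)(x + n).$$
For $f \in C^\infty(U) \cap H^k(U)$, $p_\chi(f) \in C^\infty(\mathbb{T}^d)$ and satisfies
$$\|p_\chi(f)\|_{H^k(\mathbb{T}^d)} = \|\chi f\|_{H^k(B_\varepsilon(x_0))} \ll_\chi \|f\|_{H^k(U)}.$$
Therefore, the operator $f \mapsto p_\chi(f)$ can be continuously extended to give a map
$$H^k(U) \ni f \longmapsto p_\chi(f) \in H^k(\mathbb{T}^d)$$
which has the property that $p_\chi(f)|_{B_{\varepsilon/2}(x_0)} = f|_{B_{\varepsilon/2}(x_0)}$. Assuming (3.34) for $U = \mathbb{T}^d$, we know that $p_\chi(f) \in C^\ell(\mathbb{T}^d)$ for all $f \in H^k(U)$, which implies that $f|_{B_{\varepsilon/2}(x_0)} \in C^\ell(B_{\varepsilon/2}(x_0))$. By varying the point $x_0 \in U$ we deduce (3.34) for the open set $U$. To summarize, the theorem follows from the following chain of operators
$$H^k(U) \xrightarrow{\ p_\chi(\cdot)\ } H^k(\mathbb{T}^d) \longrightarrow C^\ell(\mathbb{T}^d) \xrightarrow{\ \cdot|_{B_{\varepsilon/2}(x_0)}\ } C_b(B_{\varepsilon/2}(x_0)) \longrightarrow L^2(B_{\varepsilon/2}(x_0)),$$
since the composition of these applied to $f \in C^\infty(U) \cap H^k(U)$ simply gives the restriction of $f$ to $B_{\varepsilon/2}(x_0)$.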
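The cutoff-and-periodize construction in this proof can be made concrete in dimension $d = 1$. The following sketch assumes one standard way of building a smooth cutoff (via $e^{-1/t}$); the names `smooth_step`, `make_cutoff` and `periodize`, the centre `x0`, the radius `eps` and the test function `f` are all hypothetical choices, not taken from the notes.

```python
import math

def smooth_step(t):
    """C-infinity step: 0 for t <= 0, 1 for t >= 1."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    a = math.exp(-1.0 / t)
    b = math.exp(-1.0 / (1.0 - t))
    return a / (a + b)

def make_cutoff(x0, eps):
    """chi with chi = 1 on B_{eps/2}(x0) and chi = 0 outside B_eps(x0), for d = 1."""
    def chi(x):
        r = abs(x - x0)
        return 1.0 - smooth_step((r - eps / 2.0) / (eps / 2.0))
    return chi

def periodize(f, chi, x, shifts=range(-3, 4)):
    """p_chi(f)(x) = sum over n of (chi f)(x + n); only finitely many terms are nonzero."""
    total = 0.0
    for n in shifts:
        c = chi(x + n)
        if c != 0.0:            # f is only evaluated where chi is nonzero
            total += c * f(x + n)
    return total

x0, eps = 0.3, 0.4              # eps < 1/2, as in the proof
chi = make_cutoff(x0, eps)
f = lambda x: math.exp(x) * math.cos(5.0 * x)

# p_chi(f) agrees with f on the ball of radius eps/2 around x0 ...
inner = [x0 - eps / 2 + eps * i / 20.0 for i in range(21)]
agree = max(abs(periodize(f, chi, x) - f(x)) for x in inner)
# ... and is 1-periodic.
periodic = abs(periodize(f, chi, 0.25) - periodize(f, chi, 1.25))
print(agree, periodic)
```

Because $\varepsilon < \frac{1}{2}$, the translates of the support of $\chi$ by integers do not overlap, so at most one term of the sum is nonzero at any point.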
Exercise 3.68. To be precise, the above proof is not yet complete since we have only shown that for every point $x$ there exists a neighborhood $B_{\varepsilon/2}(x)$ such that we can find a $C^\ell$-version of the restriction of $f$ to the given neighborhood. However, we actually claimed that there is a version of $f$ on all of $U$ which is in $C^\ell(U)$. To complete the proof, prove or recall the following statements: (a) (Merging lemma for continuous functions) Suppose $U$ is covered by a family of open subsets $B_\tau$ for $\tau \in T$ and for every $\tau \in T$ we are given some $f_\tau \in C(B_\tau)$. Assume that $f_{\tau_1}|_{B_{\tau_2}} = f_{\tau_2}|_{B_{\tau_1}}$ (as functions on $B_{\tau_1} \cap B_{\tau_2}$) for every $\tau_1, \tau_2 \in T$. Then there exists some $f \in C(U)$ with $f_\tau = f|_{B_\tau}$ for all $\tau \in T$. (b) Using that $U$ is $\sigma$-compact, construct a countable cover $B_n = B_{\varepsilon_n/2}(x_n)$ of $U$ with $B_{\varepsilon_n}(x_n) \subseteq U$. (c) Combine the proof of Proposition 3.67, (a), and (b) above to complete the proof of Proposition 3.67.
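The transfer method applied in the proof of Proposition 3.67 has additional properties, which we state here for use in the next section.

Proposition 3.69 (Transfer, part 2). Let $U \subseteq \mathbb{R}^d$ be open, and let $\chi \in C_c^\infty(U)$ have $\operatorname{Supp} \chi \subseteq B_M(0)$. Then for every $k \ge 0$ there is a bounded operator
$$p_\chi(\cdot) : H^k(U) \to H^k(\mathbb{T}^d_M),$$
where $\mathbb{T}^d_M = \mathbb{R}^d / (2M\mathbb{Z})^d$. If some $f \in H^k(U)$ has the property that $p_\chi(f) \in H^\ell(\mathbb{T}^d_M)$ for some $\ell > k$, then also $\chi f \in H^\ell(U)$.

Proof. The existence of the operator $p_\chi(\cdot)$ has essentially been shown in the proof of Proposition 3.67: simply replace $\varepsilon$ by $M$ and $\mathbb{T}^d$ by $\mathbb{T}^d_M$. For the second part of the proposition we may argue in a similar way. Let $\psi \in C_c^\infty(U)$ be chosen with $\psi \equiv 1$ on a neighborhood of $\operatorname{Supp} \chi$ (see Exercise 3.70), and $\operatorname{Supp} \psi \subseteq U \cap B_M(0)$. Then any $g \in C^\infty(\mathbb{T}^d_M)$ can be identified with a $(2M\mathbb{Z})^d$-periodic smooth function $g$ on $\mathbb{R}^d$, and so
$$C^\infty(\mathbb{T}^d_M) \ni g \longmapsto \psi g \in C_c^\infty(U)$$
is well-defined. Arguing just as in the proof of Proposition 3.67, we get
$$\|\psi g\|_{H^\ell(U)} \ll_\psi \|g\|_{H^\ell(\mathbb{T}^d_M)}.$$
Applying this map to $g = p_\chi(f) \in H^k(\mathbb{T}^d_M)$ we get $\psi\, p_\chi(f) = \chi f$, since this holds trivially for $f \in C^\infty(U)$ by the properties of $\chi$ and $\psi$. Hence the existence of the operator $\psi\,\cdot : H^\ell(\mathbb{T}^d_M) \to H^\ell(U)$ shows that $p_\chi(f) \in H^\ell(\mathbb{T}^d_M)$ implies that $\chi f \in H^\ell(U)$ for $\ell \ge k$.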
The existence of functions in $C_c^\infty(U)$ that are equal to one on large subsets of $U$ (as used in the above proof) will be used more frequently in the next section. We outline a construction in the following exercise.

Exercise 3.70. Let $K \subseteq U$ be a compact subset of an open subset $U \subseteq \mathbb{R}^d$. Find a smooth function $\chi \in C_c^\infty(U)$ with $\chi|_K \equiv 1$, e.g. by following this outline. (a) Find an open cover of $K$ by finitely many balls of the form $B_{\varepsilon_j/2}(x_j)$ such that $x_j \in K$ and $B_{\varepsilon_j}(x_j) \subseteq U$. (b) Find for every $j$ a function $\chi_j \in C_c^\infty(B_{\varepsilon_j}(x_j))$ with $\chi_j|_{B_{\varepsilon_j/2}(x_j)} \equiv 1$. (c) Find a function $h \in C^\infty(\mathbb{R})$ such that $\chi = h \circ \bigl(\sum_j \chi_j\bigr) \in C_c^\infty(U)$ has all the desired properties.

3.6 The Dirichlet Boundary Value Problem and Elliptic Regularity

In this section we will combine the discussion of Sobolev spaces from Section 3.5, the orthogonal decomposition theorem (Theorem 2.87) and the following orthogonality relation to solve the Dirichlet boundary value problem introduced and motivated in Section 1.4.1. Recall that a function $g$ is said to be harmonic if $\Delta g = 0$, where
$$\Delta g = \frac{\partial^2 g}{\partial x_1^2} + \cdots + \frac{\partial^2 g}{\partial x_d^2} = \partial_1^2 g + \cdots + \partial_d^2 g$$
is the Laplacian of $g$. We will also use the notion of a pre-inner product, which satisfies all the requirements of an inner product from Definition 2.54 apart from strict positivity, which is replaced by the requirement that $\langle v, v \rangle \ge 0$ for all $v$.
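Lemma 3.71. Let $U \subseteq \mathbb{R}^d$ be open, let $\phi \in C_c^\infty(U)$, and let $g \in C^2(U) \cap H^1(U)$ be a harmonic function. Then $\phi$ and $g$ are orthogonal with respect to the pre-inner product
$$\langle u, v \rangle_1 = \sum_{j=1}^d \langle \partial_j u, \partial_j v \rangle. \tag{3.36}$$

Proof. Fix $j \in \{1, \dots, d\}$. Since $\phi \in C_c^\infty(U)$, integration by parts gives
$$\langle \partial_j g, \partial_j \phi \rangle_{L^2(U)} = \int_U (\partial_j g)\,\overline{(\partial_j \phi)}\,dx = -\int_U (\partial_j^2 g)\,\overline{\phi}\,dx = -\langle \partial_j^2 g, \phi \rangle_{L^2(U)}$$
since the boundary terms vanish. Summing over all $j$, we get
$$\langle g, \phi \rangle_1 = -\langle \Delta g, \phi \rangle_{L^2(U)} = 0$$
by the assumption on $g$.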
Motivated by Lemma 3.71, the approach is to decompose a function $f \in C^1(\overline{U})$ as
$$f = g + v,$$
where $v \in H_0^1(U)$ and $g$ is orthogonal to $H_0^1(U)$ in a sense that will be made precise below. Then $v$ will vanish at $\partial U$ in the square-mean sense and so
$$f|_{\partial U} = g|_{\partial U}$$
at least in the square-mean sense, and there is some hope that $g$ is harmonic. As we wish to use the pre-inner product from (3.36) in the definition of the orthogonal complement, we will have to discuss properties of this pre-inner product. We will then show that $g$ is smooth and harmonic, and it is this step that relies on a general phenomenon called elliptic regularity, the Laplace operator being an example of an elliptic differential operator. Finally, we will show for $d = 2$ that $g$ extends continuously to the boundary $\partial U$ and agrees with $f|_{\partial U}$ there.

3.6.1 The Pre-Inner Product

Let $U \subseteq \mathbb{R}^d$ be an open bounded set.
Lemma 3.72. The pre-inner product $\langle \cdot, \cdot \rangle_1$ restricted to $C_c^\infty(U)$ is an inner product, and the norm defined by this inner product is equivalent to $\|\cdot\|_{H^1(U)}$. The semi-norm induced by $\langle \cdot, \cdot \rangle_1$ on $C^\infty(U) \cap H^1(U)$ has as its kernel the subspace of all locally constant functions.
Proof. By the proof of Proposition 3.64 (more precisely, by (3.33)) we have
$$\|\phi|_P\|_{L^2(P)} \le \sqrt{\operatorname{diam} U}\;\|\partial_{x_d} \phi\|_{L^2(U)}$$
for any $\phi \in C_c^\infty(U)$ and any hyperplane $P$ (which, for example, we may take to be of the form $P = \mathbb{R}^{d-1} \times \{x_d\}$). Squaring and integrating over $x_d$ gives
$$\|\phi\|_{L^2(U)} \le \operatorname{diam} U\;\|\partial_{x_d} \phi\|_{L^2(U)}.$$
It follows that, for $\phi \in C_c^\infty(U)$,
$$\langle \phi, \phi \rangle_1 = \sum_{j=1}^d \|\partial_{x_j} \phi\|_{L^2(U)}^2 \le \|\phi\|_{H^1(U)}^2 = \|\phi\|_{L^2(U)}^2 + \sum_{j=1}^d \|\partial_{x_j} \phi\|_{L^2(U)}^2 \ll_U \langle \phi, \phi \rangle_1,$$
proving the first statement in the lemma. If $\phi$ is locally constant then it is clear that $\langle \phi, \phi \rangle_1 = 0$. On the other hand, if $\phi \in C^\infty(U) \cap H^1(U)$ has $\langle \phi, \phi \rangle_1 = 0$, then $\partial_{x_j} \phi = 0$ for all $j$, and so $\phi$ is locally constant.

Footnote on $C^1(\overline{U})$: Here $C^1(\overline{U})$ is the space of all functions on $U$ that are continuously differentiable such that the function and the derivative extend continuously to the closure of $U$. We will mostly be interested in the restriction of the extension of the function $f$ to the boundary $\partial U$ of $U$, even though the described method will use the actual $f$ on $U$. Assume that $\partial U$ is $C^1$-smooth. In that case, given a function in $C^1(\partial U)$ it is possible to extend it continuously to $\overline{U}$ so that we get an element $f \in C^1(\overline{U})$ as above with the given function as the restriction to $\partial U$. For this one only needs a smooth partition of unity on a neighborhood of $\partial U$ and the assumed smoothness of $\partial U$.

Footnote to Lemma 3.72: Here the kernel is the subspace of all functions $f$ with $\langle f, f \rangle_1 = 0$. A function $f : U \to \mathbb{R}$ is called locally constant if for every $x \in U$ there is a neighborhood $V$ of $x$ such that $f|_V$ is constant. If $U$ is connected, then any locally constant function is constant.

We are now ready to exhibit the desired orthogonal decomposition.

Proposition 3.73. Let $U \subseteq \mathbb{R}^d$ be an open bounded set with $C^1$-smooth boundary, and let $f \in C^1(\overline{U})$ (that is, $f$ and all of its first-order partial derivatives are continuous and extend continuously to $\overline{U}$). Then there exists some $v \in H_0^1(U)$ such that $g = f - v \in H^1(U)$ is weakly harmonic in the sense that
$$\langle g, \Delta \phi \rangle_{L^2(U)} = 0$$
for all $\phi \in C_c^\infty(U)$.
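As before with weak partial derivatives, we will think of this statement as giving meaning to $\Delta g = 0$ by writing
$$\langle \Delta g, \phi \rangle_{L^2(U)} = \langle g, \Delta \phi \rangle_{L^2(U)} = 0$$
for all $\phi \in C_c^\infty(U)$. If $\Delta g = 0$ in this sense, then we say that $g$ is weakly harmonic.

Proof of Proposition 3.73. We equip $C_c^\infty(U)$ with the inner product $\langle \cdot, \cdot \rangle_1$, and write $\|\cdot\|_1$ for the norm derived from this inner product. By Lemma 3.72, $\|\cdot\|_1$ is equivalent to $\|\cdot\|_{H^1(U)}$ on $C_c^\infty(U)$ and so $(H_0^1(U), \langle \cdot, \cdot \rangle_1)$ is a Hilbert space. Let $f \in C^1(\overline{U})$ be as in the statement of the proposition, and notice that
$$\ell(\phi) = \langle \phi, f \rangle_1 = \sum_{j=1}^d \langle \partial_j \phi, \partial_j f \rangle_{L^2(U)} \tag{3.37}$$
defines a continuous linear functional on $H_0^1(U)$. Applying the Fréchet–Riesz representation theorem (Corollary 2.71) we find some $v \in H_0^1(U)$ with
$$\ell(\phi) = \langle \phi, v \rangle_1 \tag{3.38}$$
for all $\phi \in H_0^1(U)$. This implies, for $\phi \in C_c^\infty(U)$,
$$\begin{aligned}
\langle \Delta \phi, f - v \rangle_{L^2(U)} &= \sum_{j=1}^d \langle \partial_j^2 \phi, f - v \rangle_{L^2(U)} && \text{(by definition of } \Delta\text{)} \\
&= -\sum_{j=1}^d \langle \partial_j \phi, \partial_j f - \partial_j v \rangle_{L^2(U)} && \text{(by Lemma 3.48)} \\
&= -\ell(\phi) + \ell(\phi) = 0 && \text{(by (3.37) and (3.38))}
\end{aligned}$$
completing the proof.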
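In dimension $d = 1$ with $U = (0, 1)$ the decomposition $f = g + v$ can be written down by hand, since harmonic functions are exactly the affine ones. The sketch below assumes real-valued functions and a hypothetical choice of `f`; it takes $g$ to be the affine interpolant of the boundary values and checks numerically that $v = f - g$ vanishes at the endpoints and that $g$ is orthogonal, with respect to $\langle \cdot, \cdot \rangle_1$, to functions vanishing at the endpoints.

```python
import math

# Hypothetical function on [0, 1] (an assumption for illustration) and its derivative.
f = lambda x: math.sin(3.0 * x) + x * x
df = lambda x: 3.0 * math.cos(3.0 * x) + 2.0 * x

slope = f(1.0) - f(0.0)
g = lambda x: f(0.0) + slope * x     # affine, hence harmonic in d = 1
dg = lambda x: slope
v = lambda x: f(x) - g(x)            # vanishes at x = 0 and x = 1
dv = lambda x: df(x) - slope

def pre_inner(du, dw, n=20000):
    """<u, w>_1 = int_0^1 u'(x) w'(x) dx by the midpoint rule (real-valued case)."""
    h = 1.0 / n
    return sum(du((i + 0.5) * h) * dw((i + 0.5) * h) for i in range(n)) * h

ortho_v = pre_inner(dv, dg)          # <v, g>_1
ortho_sin = [
    pre_inner(lambda x, k=k: k * math.pi * math.cos(k * math.pi * x), dg)
    for k in (1, 2, 3)
]                                    # <sin(k pi x), g>_1
boundary = (v(0.0), v(1.0))
print(ortho_v, ortho_sin, boundary)
```

In higher dimensions no such explicit formula is available, which is why the proof obtains $v$ abstractly from the Fréchet–Riesz representation theorem.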
3.6.2 Elliptic Regularity for the Laplace Operator

In this section we will upgrade the conclusion from the previous section to show that the weakly harmonic function $g$ is actually smooth and harmonic. The principle at work here is much more general, and is called elliptic regularity. We will again rely on Fourier series in the argument, and this will only give the result in the interior of $U$ and not on the boundary $\partial U$. For this reason, it is natural to start with functions that have little structure on $\partial U$ as in the following definition.

Definition 3.74. A measurable function $f$ on $U$ is called locally $L^p$ for some $p \in [1, \infty]$ if $f \mathbf{1}_K \in L^p(U)$ for every compact set $K \subseteq U$. In this case we write $f \in L^p_{\mathrm{loc}}(U)$. A measurable function $f$ on $U$ is called locally $H^k$ for some $k \in \mathbb{N}_0$ if $\chi f \in H^k(U)$ for all $\chi \in C_c^\infty(U)$. In this case we write $f \in H^k_{\mathrm{loc}}(U)$.

Exercise 3.75. Let $g$ be a measurable function on an open subset $U \subseteq \mathbb{R}^d$. Show that $g \in H^k_{\mathrm{loc}}(U)$ if and only if $g|_K \in H^k(K^\circ)$ for every compact $K \subseteq U$.
Notice that the characteristic function $\mathbf{1}_K$ localizes $f$ and removes the values of $f$ near the boundary $\partial U$; in the second case $\chi$ has the same effect but is chosen to be $C^\infty$ so as to not disturb any of the smoothness properties of $f$.

Theorem 3.76 (Elliptic regularity for $\Delta$ inside open subsets of $\mathbb{R}^d$). Suppose that $U \subseteq \mathbb{R}^d$ is open and $g \in H^1_{\mathrm{loc}}(U)$. Assume that $\Delta g \in H^k_{\mathrm{loc}}(U)$, in the sense that there exists some $u \in H^k_{\mathrm{loc}}(U)$ with
$$\langle \Delta g, \phi \rangle_{L^2(U)} = \langle g, \Delta \phi \rangle_{L^2(U)} = \langle u, \phi \rangle_{L^2(U)}$$
for all $\phi \in C_c^\infty(U)$. Then $g \in H^{k+2}_{\mathrm{loc}}(U)$.

Footnote to Exercise 3.75: We have not used this as our definition as we will need the functions $\chi$ as in Definition 3.74 in the proofs anyway.
Roughly speaking, the theorem says that if $\Delta g$ exists, then the Sobolev regularity of $\Delta g$ is precisely two less than that of $g$. In other words, any non-smoothness of $g$ will be visible also in $\Delta g$, or there is no cancellation of singularities when $\Delta g$ is calculated out of $g$. This remarkable result has many consequences, a few of which we list here.
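Corollary 3.77. If $g \in H^1_{\mathrm{loc}}(U)$ has $\Delta g = u \in C^\infty(U)$ (or $g$ is weakly harmonic in the sense that $\Delta g = 0$), then $g \in C^\infty(U)$ (and is harmonic).

Proof. Since $u \in H^k_{\mathrm{loc}}(U)$ for all $k \in \mathbb{N}_0$, Theorem 3.76 implies that $g \in H^{k+2}_{\mathrm{loc}}(U)$ for all $k \ge 0$. Hence $\chi g \in H^{k+2}(U)$ for all $k \ge 0$ and all functions $\chi \in C_c^\infty(U)$. By Theorem 3.65, this implies that $\chi g \in C^\infty(U)$ for all $\chi \in C_c^\infty(U)$. Choosing $\chi \in C_c^\infty(U)$ equal to 1 on a neighborhood of a given $x \in U$ shows that $g \in C^\infty(U)$ since it is $C^\infty$ in a neighborhood of each point.

Corollary 3.78. If $g \in H^1_{\mathrm{loc}}(U)$ is a weak eigenfunction of $\Delta$ in the sense that there exists $\lambda \in \mathbb{C}$ with
$$\langle \Delta g, \phi \rangle_{L^2(U)} = \langle g, \Delta \phi \rangle_{L^2(U)} = \lambda \langle g, \phi \rangle_{L^2(U)}$$
for all $\phi \in C_c^\infty(U)$, then $g \in C^\infty(U)$.

Proof. By assumption, $\Delta g = \lambda g \in H^1_{\mathrm{loc}}(U)$ and so by Theorem 3.76 we have $g \in H^3_{\mathrm{loc}}(U)$ also. However, this shows that $\Delta g = \lambda g \in H^3_{\mathrm{loc}}(U)$ and Theorem 3.76 may be applied again to see that $g \in H^5_{\mathrm{loc}}(U)$, and so on. It follows that $g \in H^k_{\mathrm{loc}}(U)$ for all $k \ge 0$, and arguing as in the proof of Corollary 3.77 we see that $g \in C^\infty(U)$.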
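We will prove Theorem 3.76 in two steps (which should be a familiar strategy by now): first we deal with the case of functions on $\mathbb{T}^d$ (which turns out to be easy because of Fourier series), and second we show how to transfer the theorem from $\mathbb{T}^d$ to open subsets $U$ of $\mathbb{R}^d$.

Lemma 3.79 (Elliptic regularity on $\mathbb{T}^d$). Let $g \in L^2(\mathbb{T}^d)$, and assume that $\Delta g \in H^k(\mathbb{T}^d)$ in the sense that there is some function $u \in H^k(\mathbb{T}^d)$ with
$$\langle \Delta g, \phi \rangle_{L^2(\mathbb{T}^d)} = \langle g, \Delta \phi \rangle_{L^2(\mathbb{T}^d)} = \langle u, \phi \rangle_{L^2(\mathbb{T}^d)}$$
for all $\phi \in C^\infty(\mathbb{T}^d)$. Then $g \in H^{k+2}(\mathbb{T}^d)$.

Proof. If $\Delta g = u$ as in the lemma, then $u$ is uniquely determined by $g$. Indeed, if
$$g = \sum_{n \in \mathbb{Z}^d} c_n \chi_n$$
is the Fourier series of $g$ then
$$u = -\sum_{n \in \mathbb{Z}^d} c_n (2\pi)^2 \|n\|_2^2\, \chi_n$$
is the Fourier series of $u$. This follows from Theorem 3.9 since the characters $\chi_n$ are eigenfunctions of the Laplace operator $\Delta$:
$$\Delta \chi_n = (2\pi i)^2 \|n\|_2^2\, \chi_n = -(2\pi)^2 \|n\|_2^2\, \chi_n$$
and so
$$\langle u, \chi_n \rangle_{L^2(\mathbb{T}^d)} = \langle g, \Delta \chi_n \rangle_{L^2(\mathbb{T}^d)} = -(2\pi)^2 \|n\|_2^2\, \underbrace{\langle g, \chi_n \rangle_{L^2(\mathbb{T}^d)}}_{= c_n}.$$
By assumption $u \in H^k(\mathbb{T}^d)$, which shows that
$$\sum_{n \in \mathbb{Z}^d} \bigl|c_n (2\pi)^2 \|n\|_2^2\bigr|^2\, \|n\|_2^{2k} < \infty$$
by Lemma 3.43. However, this is equivalent to
$$\sum_{n \in \mathbb{Z}^d} |c_n|^2\, \|n\|_2^{2(k+2)} < \infty,$$
which implies that $g \in H^{k+2}(\mathbb{T}^d)$ by Lemma 3.43.

Before extending Lemma 3.79 to the proof of Theorem 3.76, we first explain how the localizing step in the definition of $H^\ell_{\mathrm{loc}}(U)$, in which $g$ is replaced by $\chi g$, affects the assumption regarding $\Delta g$.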
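The key identity in the proof of Lemma 3.79, that each character is an eigenfunction of $\Delta$ with eigenvalue $-(2\pi)^2 \|n\|_2^2$, can be sanity-checked with finite differences. The sketch below assumes $d = 1$ and the normalization $\chi_n(x) = e^{2\pi i n x}$; the evaluation point and step size are arbitrary.

```python
import cmath
import math

def character(n):
    """chi_n(x) = exp(2 pi i n x) on the one-dimensional torus R/Z."""
    return lambda x: cmath.exp(2j * math.pi * n * x)

def second_difference(func, x, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

x = 0.37
ratios = {}
for n in (1, 2, 5):
    chi = character(n)
    # (Laplacian of chi_n) / chi_n should be close to -(2 pi n)^2.
    ratios[n] = second_difference(chi, x) / chi(x)
expected = {n: -(2.0 * math.pi * n) ** 2 for n in ratios}
print(ratios, expected)
```

The quadratic growth of the eigenvalue in $n$ is what converts the weight $\|n\|_2^{2k}$ for $u$ into the weight $\|n\|_2^{2(k+2)}$ for $g$.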
Lemma 3.80. Suppose that $U \subseteq \mathbb{R}^d$ is open and $g \in H^\ell_{\mathrm{loc}}(U)$ with $\ell \ge 1$. Then for every $j = 1, \dots, d$ there exists a weak partial derivative
$$\partial_j g \in H^{\ell-1}_{\mathrm{loc}}(U) \subseteq L^2_{\mathrm{loc}}(U)$$
for which
$$\langle \partial_j g, \phi \rangle = -\langle g, \partial_j \phi \rangle$$
for all $\phi \in C_c^\infty(U)$.

Footnote to Theorem 3.76: At first it is a bit hard to pinpoint where the non-cancellation of singularities in Theorem 3.76 really comes from. However, if one is determined to find one step within the proof to blame for this, then it would be the fact that the eigenvalue of the character $\chi_n$ grows with the rate $\|n\|_2^2$. This would for instance not be true for the non-elliptic (hyperbolic) partial differential operator $D = \partial_1^2 - \partial_2^2$ corresponding to the wave equation in two dimensions. For this operator the eigenvalue on characters can cancel (i.e. be zero or much smaller than $\|n\|_2^2$) and there are also many non-smooth solutions to $Dg = 0$.

Footnote to Lemma 3.80: As the statements of the next two lemmas should be easy to believe and the proofs rely only on the definitions and are maybe a bit tedious, the reader may skip these proofs initially and see how the lemmas become useful in the proof of Theorem 3.76 below.
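Proof. Fix some $j \in \{1, \dots, d\}$. Let $V \subseteq U$ be an open subset with $\overline{V} \subseteq U$ compact, and choose $\chi \in C_c^\infty(U)$ with $\chi \equiv 1$ on $V$. Then $\chi g \in H^\ell(U)$ has a weak partial derivative along $x_j$, which we will denote by $g_{\chi,j} \in H^{\ell-1}(U)$. By definition, we now have
$$\langle g_{\chi,j}, \phi \rangle = -\langle \chi g, \partial_j \phi \rangle = -\langle g, \partial_j \phi \rangle$$
for all $\phi \in C_c^\infty(V) \subseteq C_c^\infty(U)$. This shows that $g_{\chi,j}|_V \in H^{\ell-1}(V)$ is the weak partial $\partial_j$-derivative of $g|_V$ and by Lemma 3.48 is therefore uniquely determined by $g|_V$ (and independent of $\chi$). Now we may write
$$U = \bigcup_{n \ge 1} V_n$$
for an increasing sequence of open subsets of $U$ with compact closures in $U$ (e.g. $V_n = \{x : \|x\| < n,\ d(x, \mathbb{R}^d \setminus U) > \frac{1}{n}\}$), and define $g_j(x)$ to be $g_{\chi_n,j}(x)$ for $x \in V_n$, where $\chi_n$ and $g_{\chi_n,j}$ are the functions as above corresponding to the set $V = V_n$. By Lemma 3.48 the function $g_j$ is well-defined almost everywhere. Let $\phi \in C_c^\infty(U)$; then by compactness there exists some $n$ with $\operatorname{Supp} \phi \subseteq V_n$. Since $g_j|_{V_n} = g_{\chi_n,j}|_{V_n} \in H^{\ell-1}(V_n)$ we obtain
$$\langle g_j, \phi \rangle = \langle g_{\chi_n,j}, \phi \rangle = -\langle g, \partial_j \phi \rangle.$$
Moreover, $\phi g_j = \phi g_{\chi_n,j} \in H^{\ell-1}(U)$. As these two facts hold for every $\phi \in C_c^\infty(U)$ we see that $g_j \in H^{\ell-1}_{\mathrm{loc}}(U)$ is a weak partial derivative of $g$ along $x_j$.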
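Lemma 3.81. If $g \in H^\ell_{\mathrm{loc}}(U)$ with $\ell \ge 1$, $\Delta g = u \in H^k_{\mathrm{loc}}(U)$ with $k \ge 0$, and $\chi \in C_c^\infty(U)$, then
$$\Delta(\chi g) = \underbrace{\chi u}_{\in H^k(U)} + \underbrace{g\,\Delta\chi}_{\in H^\ell(U)} + 2 \sum_{j=1}^d \underbrace{(\partial_j g)(\partial_j \chi)}_{\in H^{\ell-1}(U)} \in H^{\min\{k, \ell-1\}}(U).$$

Notice that if $g$ happened to be smooth, then the formula in the lemma calculates $\Delta(\chi g)$ since in that case
$$\Delta(\chi g) = \sum_{j=1}^d \partial_j^2(\chi g) = \sum_{j=1}^d \partial_j\bigl(\chi(\partial_j g) + g(\partial_j \chi)\bigr) = \sum_{j=1}^d \Bigl(\chi(\partial_j^2 g) + 2(\partial_j g)(\partial_j \chi) + g(\partial_j^2 \chi)\Bigr).$$

Proof. By assumption $u \in H^k_{\mathrm{loc}}(U)$ and so $\chi u \in H^k(U)$. Similarly, $g \in H^\ell_{\mathrm{loc}}(U)$ and so $g\,\Delta\chi \in H^\ell(U)$. Finally, by assumption, $g \in H^\ell_{\mathrm{loc}}(U)$ with $\ell \ge 1$ and so $\partial_j g \in H^{\ell-1}_{\mathrm{loc}}(U)$ by Lemma 3.80, which gives
$$(\partial_j g)(\partial_j \chi) \in H^{\ell-1}(U).$$
Therefore
$$\chi u + g\,\Delta\chi + 2 \sum_{j=1}^d (\partial_j g)(\partial_j \chi) \in H^{\min\{k, \ell-1\}}(U)$$
and it remains to show that this function is equal to $\Delta(\chi g)$ weakly. For this, let $\phi \in C_c^\infty(U)$ and calculate (all the inner products are taken in $L^2(U)$, and $\chi$ is taken to be real-valued)
$$\begin{aligned}
\Bigl\langle \chi u + g\,\Delta\chi + 2 \sum_{j=1}^d (\partial_j g)(\partial_j \chi), \phi \Bigr\rangle
&= \langle u, \chi\phi \rangle + \langle g, (\Delta\chi)\phi \rangle + 2 \sum_{j=1}^d \langle \partial_j g, (\partial_j \chi)\phi \rangle \\
&= \langle g, \Delta(\chi\phi) \rangle + \langle g, (\Delta\chi)\phi \rangle - 2 \sum_{j=1}^d \langle g, \partial_j((\partial_j \chi)\phi) \rangle \\
&= \Bigl\langle g,\ (\Delta\chi)\phi + 2 \sum_{j=1}^d (\partial_j \chi)(\partial_j \phi) + \chi\,\Delta\phi + (\Delta\chi)\phi - 2 \sum_{j=1}^d (\partial_j^2 \chi)\phi - 2 \sum_{j=1}^d (\partial_j \chi)(\partial_j \phi) \Bigr\rangle \\
&= \langle g, \chi\,\Delta\phi \rangle = \langle \chi g, \Delta\phi \rangle,
\end{aligned}$$
which we now interpret as the statement that
$$\chi u + g\,\Delta\chi + 2 \sum_{j=1}^d (\partial_j g)(\partial_j \chi)$$
equals $\Delta(\chi g)$.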
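For smooth functions the formula of Lemma 3.81 is the ordinary product rule for the Laplacian, and this can be checked numerically. The sketch below assumes $d = 1$, uses central finite differences, and takes hypothetical smooth functions `chi` and `g`; the Gaussian `chi` is only a stand-in, since the pointwise identity does not need compact support.

```python
import math

chi = lambda x: math.exp(-x * x)            # hypothetical smooth stand-in for the cutoff
g = lambda x: math.sin(2.0 * x) + x ** 3    # hypothetical smooth g

def d1(func, x, h=1e-4):
    """Central finite-difference approximation of the first derivative."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

def d2(func, x, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

def lhs(x):
    """Laplacian of the product chi * g (d = 1)."""
    return d2(lambda t: chi(t) * g(t), x)

def rhs(x):
    """chi * Laplacian(g) + g * Laplacian(chi) + 2 * chi' * g'."""
    return chi(x) * d2(g, x) + g(x) * d2(chi, x) + 2.0 * d1(chi, x) * d1(g, x)

points = [-1.0, -0.3, 0.2, 0.8, 1.5]
max_gap = max(abs(lhs(x) - rhs(x)) for x in points)
print(max_gap)
```

The content of the lemma is that the same identity persists weakly when $g$ is only in $H^\ell_{\mathrm{loc}}(U)$, with the mixed term costing one derivative.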
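Proof of Theorem 3.76. Let $U \subseteq \mathbb{R}^d$ be open and suppose that $g \in H^1_{\mathrm{loc}}(U)$ has
$$\Delta g = u \in H^k_{\mathrm{loc}}(U).$$
We wish to show that $g \in H^{k+2}_{\mathrm{loc}}(U)$. We will do this by showing that $g \in H^\ell_{\mathrm{loc}}(U)$ by induction on $\ell \in \{1, \dots, k + 2\}$. The case $\ell = 1$ is the assumption in the theorem. So suppose that $1 \le \ell \le k + 1$ and $g \in H^\ell_{\mathrm{loc}}(U)$. Fix some $\chi \in C_c^\infty(U)$. Then
$$\Delta(\chi g) = u_1 \in H^{\ell-1}(U)$$
weakly by Lemma 3.81. This means that
$$\langle u_1, \phi \rangle_{L^2(U)} = \langle \chi g, \Delta\phi \rangle_{L^2(U)} \tag{3.39}$$
for all $\phi \in C_c^\infty(U)$.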
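Using the fact that $\chi \in C_c^\infty(U)$ also, we can now make a switch in this formula to $\mathbb{T}^d_M$ as follows. Let $\psi \in C_c^\infty(U)$ be such that $\psi \equiv 1$ on $\operatorname{Supp} \chi$. Then $\psi\phi \in C_c^\infty(U)$ for any $\phi \in C^\infty(\mathbb{T}^d_M)$, where $\mathbb{T}^d_M = \mathbb{R}^d / (2M\mathbb{Z})^d$ and functions on $\mathbb{T}^d_M$ are identified with $(2M\mathbb{Z})^d$-periodic functions on $\mathbb{R}^d$. We choose $M > 0$ with $\operatorname{Supp} \psi \subseteq U \cap [-M, M]^d$. Using the notation from Proposition 3.69 and applying (3.39) to $\psi\phi$ we get
$$\langle p_\psi(u_1), \phi \rangle_{L^2(\mathbb{T}^d_M)} = \langle \psi u_1, \phi \rangle_{L^2(U)} = \langle u_1, \psi\phi \rangle_{L^2(U)} = \langle \chi g, \Delta(\psi\phi) \rangle_{L^2(U)}.$$
Since $\psi$ is one and its derivatives are zero at any point of $\operatorname{Supp} \chi$, we can remove $\psi$ on the right hand side. In particular, we have
$$\langle p_\psi(u_1), \phi \rangle_{L^2(\mathbb{T}^d_M)} = \langle \chi g, \Delta\phi \rangle_{L^2(U)} = \langle p_\psi(\chi g), \Delta\phi \rangle_{L^2(\mathbb{T}^d_M)}$$
for any $\phi \in C^\infty(\mathbb{T}^d_M)$. In other words we have shown that $p_\psi(u_1) \in H^{\ell-1}(\mathbb{T}^d_M)$ equals $\Delta(p_\psi(\chi g))$. By Lemma 3.79 it follows that $p_\psi(\chi g) \in H^{\ell+1}(\mathbb{T}^d_M)$, which by Proposition 3.69 allows us to pull the statement back to $U$ and deduce that $\chi g \in H^{\ell+1}(U)$. Since this holds for all $\chi \in C_c^\infty(U)$ we see that $g \in H^{\ell+1}_{\mathrm{loc}}(U)$. Repeating the argument and increasing $\ell$ each time, we eventually reach $\ell = k + 1$ and then $g \in H^{\ell+1}_{\mathrm{loc}}(U) = H^{k+2}_{\mathrm{loc}}(U)$.

3.6.3 Dirichlet's Boundary Value Problem in two dimensions

Theorem 3.82. Let $U \subseteq \mathbb{R}^2$ be open and bounded with $C^1$-smooth boundary and let $f \in C^1(\partial U)$. Then there exists a function $g \in C(\overline{U})$ which satisfies $g|_{\partial U} = f$ and for which $g|_U \in C^\infty(U)$ is harmonic.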
Proof. Since $\partial U$ is $C^1$-smooth, we can find an extension of $f$, again denoted by $f$, to a function in $C^1(\overline{U})$ (see the footnote on $C^1(\overline{U})$ attached to the discussion following Lemma 3.71). Now apply Proposition 3.73 to $f$, giving functions $v \in H_0^1(U)$ and $g \in H^1(U)$ such that $f = g + v$ with $g$ weakly harmonic in $U$. Now Corollary 3.77 implies that $g \in C^\infty(U)$, so that $\Delta g$ (in the usual sense) is well-defined. By integration by parts, it follows that
$$\langle \Delta g, \phi \rangle = -\langle g, \phi \rangle_1 = \langle g, \Delta\phi \rangle = 0$$
for all $\phi \in C_c^\infty(U)$, showing that $\Delta g = 0$. That is, $g$ is a harmonic function. Moreover, $v$ vanishes at $\partial U$ in the mean-square sense, so $g|_{\partial U} = f|_{\partial U}$ in $L^2(\partial U)$. Using the averaging property of harmonic functions (Proposition 3.85) and the lemma below (which only works in two dimensions), we will upgrade this to give the theorem.
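Lemma 3.83. Suppose that $U \subseteq \mathbb{R}^2$ is open and bounded with $C^1$-smooth boundary. Let $v \in H_0^1(U)$, which we extend to a function on $\mathbb{R}^2$ by setting it equal to 0 outside $U$. Choose for some $x^{(0)} \in \partial U$ the coordinate system such that $x^{(0)} = 0$ and
$$U \cap B_\varepsilon(x^{(0)}) = \{(x, y) \in B_\varepsilon(x^{(0)}) \mid y < \phi(x)\}$$
where $\phi : (-\varepsilon, \varepsilon) \to \mathbb{R}$ is $C^1$ with $\phi(0) = 0$. Then
$$\frac{1}{4\delta^2} \int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta} |v|\,dx\,dy \longrightarrow 0 \tag{3.40}$$
as $\delta \to 0$, uniformly for all $x^{(0)} \in \partial U$.

Proof. To prove (3.40) we first show that
$$\int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta} |v|\,dx\,dy \ll \delta^2 \left(\int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta} |\partial_y v|^2\,dx\,dy\right)^{1/2}, \tag{3.41}$$
first for $v \in C_c^\infty(U)$, which then extends by continuity to all $v \in H_0^1(U)$. As $\partial_y v \in L^2(U)$, this then proves (3.40), and does so uniformly since for every $\eta > 0$ we can find a $\delta > 0$ so that the $L^2$-norm of $\partial_{x_1} v$ and of $\partial_{x_2} v$ on the $\delta$-neighborhood of $\partial U$ is less than $\eta$. To prove (3.41) we integrate the absolute value of
$$v(x, y) = -\int_y^{\delta} \partial_y v(x, s)\,ds$$
with respect to $x$ and $y$ to get
$$\begin{aligned}
\int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta} |v(x, y)|\,dx\,dy
&\le \int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta}\!\int_y^{\delta} |\partial_y v(x, s)|\,ds\,dx\,dy \\
&= \int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta} |\partial_y v(x, s)|\,|s + \delta|\,ds\,dx \\
&\le \left(\int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta} |\partial_y v|^2\,dx\,dy\right)^{1/2} \left(\int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta} |s + \delta|^2\,ds\,dx\right)^{1/2} \\
&\ll \delta^2 \left(\int_{-\delta}^{\delta}\!\int_{-\delta}^{\delta} |\partial_y v|^2\,dx\,dy\right)^{1/2}
\end{aligned}$$
as claimed.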
Exercise 3.84. Describe what prevents the proof of Lemma 3.83 from extending to higher dimensions by trying to emulate the calculations involved.
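Proposition 3.85 (Mean value principle). Let $U \subseteq \mathbb{R}^d$ be open, and let $\phi \in C^\infty(U)$ be a harmonic function. Let $x_0 \in U$ and $r > 0$ be chosen with $\overline{B_r(x_0)} \subseteq U$. Then the value of the harmonic function at $x_0$ is equal to the average over the sphere of radius $r$ around $x_0$,
$$\phi(x_0) = \frac{1}{\operatorname{Area}(r\,\mathbb{S}^{d-1})} \int_{x_0 + r\,\mathbb{S}^{d-1}} \phi(x)\,d\sigma(x),$$
where $d\sigma$ denotes the natural area measure on the sphere $x_0 + r\,\mathbb{S}^{d-1}$.

Proof. Without loss of generality $x_0 = 0$. The proof consists of applying the $d$-dimensional divergence theorem to the vector field
$$f(x) = \phi(x) \nabla v(x) - v(x) \nabla \phi(x),$$
where $v : \mathbb{R}^d \setminus \{0\} \to \mathbb{R}$ is an auxiliary function. In fact $v$ is defined by
$$v(x) = \begin{cases} \log \|x\|_2 & \text{for } d = 2, \\ -\dfrac{1}{\|x\|_2^{d-2}} & \text{for } d > 2. \end{cases}$$
A direct calculation shows that it satisfies
$$\nabla v(x) = \begin{cases} \dfrac{1}{\|x\|_2} \dfrac{x}{\|x\|_2} & \text{for } x \ne 0 \text{ and } d = 2, \\[2ex] \dfrac{d-2}{\|x\|_2^{d-1}} \dfrac{x}{\|x\|_2} & \text{for } x \ne 0 \text{ and } d > 2, \end{cases}$$
and $\Delta v = \operatorname{div} \nabla v = 0$ for $x \ne 0$. Let $\varepsilon \in (0, r)$. Then the divergence theorem applied to $f$ on the annulus $B_r(0) \setminus B_\varepsilon(0)$ has the form
$$\int_{\partial B_r(0)} f \cdot n\,d\sigma - \int_{\partial B_\varepsilon(0)} f \cdot n\,d\sigma = \int_{B_r(0) \setminus B_\varepsilon(0)} \operatorname{div} f\,dx_1 \cdots dx_d, \tag{3.42}$$
where $n = \frac{x}{\|x\|_2}$ is the normalized normal vector to the sphere of radius $\|x\|_2$ at $x$. For the specific $f$ as above, we have for $\|x\|_2 = r$
$$f \cdot n = \bigl(\phi(x) \nabla v(x) - v(x) \nabla \phi(x)\bigr) \cdot n = \phi(x) \frac{c_1}{\|x\|_2^{d-1}} \frac{x}{\|x\|_2} \cdot \frac{x}{\|x\|_2} - v(x) \nabla \phi(x) \cdot n = \phi(x) \frac{c_1}{r^{d-1}} - c_2 \nabla \phi(x) \cdot n,$$
where $c_1 = 1$ (or $c_1 = d - 2$, depending on $d$) and $c_2 = v(x)$ for $\|x\|_2 = r$. Furthermore, we notice that
$$c_2 \int_{\partial B_r(0)} \nabla \phi(x) \cdot n\,d\sigma = c_2 \int_{B_r(0)} \underbrace{\operatorname{div} \nabla \phi}_{= \Delta\phi = 0}\,dx_1 \cdots dx_d = 0.$$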
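Using this and the analogous formula for $\|x\|_2 = \varepsilon$ we see that the left-hand side of (3.42) equals
$$\int_{\partial B_r(0)} f \cdot n\,d\sigma - \int_{\partial B_\varepsilon(0)} f \cdot n\,d\sigma = \frac{c_1}{r^{d-1}} \int_{r\,\mathbb{S}^{d-1}} \phi(x)\,d\sigma - \frac{c_1}{\varepsilon^{d-1}} \int_{\varepsilon\,\mathbb{S}^{d-1}} \phi(x)\,d\sigma.$$
For the right-hand side, we calculate
$$\operatorname{div} f = \operatorname{div}(\phi \nabla v - v \nabla \phi) = \nabla\phi \cdot \nabla v + \phi\,\Delta v - \nabla v \cdot \nabla\phi - v\,\Delta\phi = 0.$$
Therefore, (3.42) becomes the equation
$$\frac{1}{r^{d-1}} \int_{r\,\mathbb{S}^{d-1}} \phi(x)\,d\sigma = \frac{1}{\varepsilon^{d-1}} \int_{\varepsilon\,\mathbb{S}^{d-1}} \phi(x)\,d\sigma.$$
Now divide by the area of $\mathbb{S}^{d-1}$ and notice that we get
$$\frac{1}{\operatorname{Area}(r\,\mathbb{S}^{d-1})} \int_{r\,\mathbb{S}^{d-1}} \phi\,d\sigma = \frac{1}{\operatorname{Area}(\varepsilon\,\mathbb{S}^{d-1})} \int_{\varepsilon\,\mathbb{S}^{d-1}} \phi\,d\sigma \longrightarrow \phi(0)$$
as $\varepsilon \to 0$ by continuity.

Returning to the proof of Theorem 3.82, let $z \in U$ be a point of distance $\delta$ from $\partial U$, and write
$$f_\delta(z) = \frac{4}{\pi\delta^2} \int_{z + B_{\delta/2}} f(w)\,dw, \qquad g_\delta(z) = \frac{4}{\pi\delta^2} \int_{z + B_{\delta/2}} g(w)\,dw, \qquad v_\delta(z) = \frac{4}{\pi\delta^2} \int_{z + B_{\delta/2}} v(w)\,dw$$
for the averages of $f$, $g$, and $v$ over the ball of radius $\delta/2$ with center $z$. By Proposition 3.85 (integrated over the radii) we have $g_\delta(z) = g(z)$. By uniform continuity of $f$ we also have $f_\delta(z) - f(z) = o(1)$ as $\delta \to 0$. Finally, by Lemma 3.83, $v_\delta(z) = o(1)$ as $\delta \to 0$. Thus
$$g(z) - f(z) = \underbrace{g_\delta(z) - f_\delta(z) + v_\delta(z)}_{= 0} + o(1)$$
shows the theorem.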
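The mean value principle of Proposition 3.85, on which this last step rests, is easy to test numerically in $d = 2$. The sketch below uses a hypothetical harmonic function `u` and a non-harmonic comparison function `w`, and approximates circle averages by an equally spaced sum; all names and sample values are assumptions for illustration.

```python
import math

# Harmonic in the plane: exp(x) cos(y) and x^2 - y^2 are both harmonic.
u = lambda x, y: math.exp(x) * math.cos(y) + x * x - y * y

def circle_average(func, x0, y0, r, n=2000):
    """Average of func over the circle of radius r around (x0, y0)."""
    total = 0.0
    for i in range(n):
        theta = 2.0 * math.pi * i / n
        total += func(x0 + r * math.cos(theta), y0 + r * math.sin(theta))
    return total / n

center = (0.4, -0.2)
averages = [circle_average(u, center[0], center[1], r) for r in (0.1, 0.5, 2.0)]
value = u(*center)

# For comparison: w(x, y) = x^2 + y^2 has Laplacian 4 and fails the mean value property.
w = lambda x, y: x * x + y * y
w_gap = circle_average(w, 0.0, 0.0, 1.0) - w(0.0, 0.0)
print(averages, value, w_gap)
```

For the harmonic function the averages do not depend on the radius, while for `w` the average over the unit circle exceeds the central value by $r^2 = 1$.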
3.7 Further Topics

The topic of Fourier series on $\mathbb{T}^d$ leads naturally to the study of the Fourier integral on $\mathbb{R}^d$ (see Section ??). The concepts of Fourier series and Fourier integrals on $\mathbb{T}^d$ and $\mathbb{R}^d$ respectively find a common generalization in the theory of Pontryagin duality (see Section ??). The case of unitary representations for compact abelian groups considered in this chapter was quite straightforward and is only the beginning of the important theory of unitary representations of locally compact groups. For locally compact abelian groups Pontryagin duality is the main step in a complete description (see Folland [14] for the details). For compact groups the main theorem in this direction is the Peter–Weyl theorem [37] (which is also covered in Folland [14]). For many other groups that are neither abelian nor compact this topic is also important and can have many interesting surprises. We will continue our excursion into applications of Hilbert space techniques and Sobolev spaces in Chapter 4, where we will prove the existence of an orthonormal basis consisting of eigenfunctions of the Laplace operator (see Section 4.4).
4 Compact Self-Adjoint Operators and Laplace Eigenfunctions
4.1 The Goal

There is no doubt that eigenvalues and eigenvectors are of fundamental importance in linear algebra and in its applications, both within and outside mathematics. In finite dimensions eigenvectors always exist over $\mathbb{C}$ because the corresponding eigenvalues arise as zeros of a polynomial. However, even in finite dimensions the eigenvectors are not guaranteed to give a basis of the space (because there may be non-trivial Jordan blocks). On the other hand, if the linear map is self-adjoint (over $\mathbb{R}$ or over $\mathbb{C}$; see Definition 4.18) or unitary (over $\mathbb{C}$) then there is an orthonormal basis consisting of eigenvectors. That is, in these cases the linear map can be diagonalized. In infinite dimensions entirely new phenomena are added to the inherent complications of finite-dimensional linear algebra, as illustrated by the following exercise.
Exercise 4.1. (a) Let $H = \ell^2(\mathbb{Z})$ and define the operator $U : H \to H$ by $(U((x_n)))_k = x_{k+1}$, which simply shifts the sequence by one step to the left. Show that $U$ has no eigenvectors. (b) Let $H = \ell^2(\mathbb{N})$ and define the operator $S : H \to H$ by $(S((x_n)))_k = x_{k+1}$, which again simply shifts the sequence one step to the left, but now forgets the first entry of the sequence. Show that $S$ has uncountably many different eigenvalues. In particular, deduce that there are too many eigenvalues to hope for a diagonalization. (Since the space $H$ is separable, a diagonal map would only have countably many eigenvalues.)

There is an important class of operators for which some of the difficulties illustrated in Exercise 4.1 do not arise. These are the compact operators, which will be defined in Section 4.2. In Section 4.3 we then prove that compact self-adjoint operators can be diagonalized using an orthonormal basis, and relate this to the Sturm–Liouville equation from Section 1.3.3. Using this, and results from Chapter 3, we will prove in Section 4.4 a version of the existence of a basis of eigenfunctions claimed in Section 1.4 for a bounded domain $\Omega \subseteq \mathbb{R}^d$ with smooth boundary. At first sight this should be surprising, since the Laplace operator $\Delta$ is not even bounded on $L^2(\Omega)$, but we will find a compact operator defined on all of $L^2(\Omega)$ whose eigenfunctions are precisely the eigenfunctions of $\Delta$.
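The phenomenon in Exercise 4.1(b) can be explored numerically. The following is a minimal sketch rather than part of the notes: it assumes a sequence in $\ell^2(\mathbb{N})$ is represented by its first `N` entries, and the helper names `geometric`, `left_shift`, `max_defect` and `squared_norm` are hypothetical. It checks that the geometric sequence $(1, \lambda, \lambda^2, \dots)$ with $|\lambda| < 1$ satisfies $Sx = \lambda x$ on the stored entries and has bounded partial sums of squares.

```python
# Sketch: the left shift S on l^2(N) applied to a truncated geometric sequence.
# Assumption: a sequence is stored as a finite list of its first N entries.

def geometric(lam, N):
    """First N entries of (1, lam, lam^2, ...)."""
    return [lam ** k for k in range(N)]

def left_shift(x):
    """Drop the first entry: (S x)_k = x_{k+1}."""
    return x[1:]

def max_defect(lam, N=50):
    """max_k |(S x)_k - lam * x_k| over the entries available after shifting."""
    x = geometric(lam, N)
    Sx = left_shift(x)
    return max(abs(Sx[k] - lam * x[k]) for k in range(len(Sx)))

def squared_norm(lam, N=50):
    """Partial sum of |x_k|^2; bounded by 1 / (1 - |lam|^2) when |lam| < 1."""
    return sum(abs(v) ** 2 for v in geometric(lam, N))

if __name__ == "__main__":
    for lam in (0.5, -0.3, 0.25 + 0.25j):
        print(lam, max_defect(lam), squared_norm(lam))
```

Any $\lambda$ in the open unit disc can be tried in place of the three sample values, which is the numerical shadow of the uncountable family of eigenvalues asked for in the exercise.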
(The Laplace operator is also not well-defined on all of $L^2(\Omega)$, which is a related issue; see also the Closed Graph Theorem (Theorem 5.21).)

4.2 Compact Operators

4.2.1 Definition and Basic Properties

Definition 4.2. Let $V$ and $W$ be normed vector spaces, and let $L : V \to W$ be a linear operator. Then $L$ is said to be a compact operator if the closure
\[ \overline{L\bigl(B_1^V(0)\bigr)} \subseteq W \]
of the image of the unit ball is compact in $W$. We will sometimes write $K(V, W)$ for the space of compact operators, and if $V = W$ will write $K(V)$ for the space of compact operators from $V$ to $V$.

Since compact sets are bounded, every compact operator is also bounded, but the converse does not hold. For example, the identity operator $V \to V$ on an infinite-dimensional normed vector space is not a compact operator by Proposition 2.28. However, if $L : V \to W$ is a bounded operator and $L(V)$ is finite-dimensional, then $L$ is a compact operator. We will see many more examples after we prove a few basic properties of compact operators.

Lemma 4.3. Let $V_1, V_2, V_3$ be normed vector spaces, and let $L_1 : V_1 \to V_2$ and $L_2 : V_2 \to V_3$ be bounded operators. If $L_1$ or $L_2$ is a compact operator, then so is $L_2 \circ L_1$.

Proof. Suppose that $L_1$ is compact. Then
\[ L_2 \circ L_1\bigl(B_1^{V_1}\bigr) \subseteq L_2\Bigl(\overline{L_1\bigl(B_1^{V_1}\bigr)}\Bigr) \]
and the latter is compact, as it is the continuous image of a compact set. This shows that $\overline{L_2 \circ L_1\bigl(B_1^{V_1}\bigr)}$ is compact, and so $L_2 \circ L_1$ is a compact operator. If $L_2$ is compact, then
\[ \overline{L_2 \circ L_1\bigl(B_1^{V_1}\bigr)} \subseteq \overline{L_2\bigl(\|L_1\|_{\mathrm{operator}} B_1^{V_2}\bigr)} = \|L_1\|_{\mathrm{operator}}\, \overline{L_2\bigl(B_1^{V_2}\bigr)}, \]
which is compact, and so $L_2 \circ L_1$ is again compact.

Exercise 4.4. Let $V$ and $W$ be two normed vector spaces. Show that $K(V, W) = \{L : V \to W \mid L \text{ is a compact operator}\}$ is a linear subspace. Deduce that if $V = W$ is a Banach space, then $K(V) = K(V, V) \subseteq B(V)$ is a two-sided ideal of the Banach algebra $B(V)$. That is, if $L \in K(V)$ and $A \in B(V)$, then $A \circ L$ and $L \circ A$ lie in $K(V)$.
Example 4.5. (a) The inclusion map
\[ \iota : \bigl(C^1([0, 1]), \|\cdot\|_{C^1}\bigr) \to \bigl(C([0, 1]), \|\cdot\|_\infty\bigr) \]
is a compact operator. This follows from the Arzelà–Ascoli theorem (Theorem 2.31), since
\[ \iota\bigl(B_1^{C^1}(0)\bigr) \subseteq \{ f \in C([0, 1]) \mid \|f\|_\infty \le 1,\ f \text{ is 1-Lipschitz} \}. \]
This example shows that it is important to take the closure of the image and not just the image of the closed ball: for example, the function $f(x) = |x - \tfrac{1}{2}|$ belongs to the closure but not to the image of $\iota$.

(b) The integral operator
\[ C([0, 1]) \ni f \longmapsto \int_0^x f(t)\, dt \in C([0, 1]) \]
is compact, since it is the composition of the continuous operator
\[ C([0, 1]) \to C^1([0, 1]), \qquad f \longmapsto \int_0^x f(t)\, dt, \]
and the compact inclusion operator in (a).

Exercise 4.6. Is the operator
\[ L^1([0, 1]) \ni f \longmapsto \int_0^x f(t)\, dt \in C([0, 1]) \]
a compact operator?
Many of the compact operators that we will encounter have a similar flavor to the example above. They either map from a space of functions with more regularity properties (in this instance, differentiability) to a space of functions with fewer regularity properties (in this case, continuity), or are integral operators. The next lemma is a useful tool for proving compactness of bounded operators.
Lemma 4.7 (Uniform approximation). Let $V$ be a normed vector space, and let $W$ be a Banach space. Suppose that $(L_n)$ is a sequence of compact operators $V \to W$, and suppose that $L_n \to L \in B(V, W)$ as $n \to \infty$ with respect to the operator norm. Then $L$ is a compact operator as well.

Lemma 4.7 improves the claim from Exercise 4.4: $K(V)$ is a closed two-sided ideal in $B(V)$ for any Banach space $V$.

Proof of Lemma 4.7. Let
\[ M = \overline{L\bigl(B_1^V(0)\bigr)} \subseteq W. \]
Since $W$ is assumed to be a Banach space, $M$ is complete. It remains to show that $M$ is totally bounded (see Section A.5 for the notion and for the equivalence to compactness). Let $\varepsilon > 0$ and choose $L_n$ with $\|L_n - L\| < \varepsilon$. Since $L_n$ is compact, we know that
\[ \overline{L_n\bigl(B_1^V(0)\bigr)} \]
is compact and hence is totally bounded. It follows that there exist elements $w_1, \dots, w_m \in \overline{L_n\bigl(B_1^V(0)\bigr)}$ with
\[ \overline{L_n\bigl(B_1^V(0)\bigr)} \subseteq \bigcup_{i=1}^m B_\varepsilon^W(w_i). \]
For each $w_i$ there exists some $v_i \in B_1^V(0)$ with $\|w_i - L_n(v_i)\| < \varepsilon$. If now $v \in B_1^V(0)$, then for some $i$ we have
\[ \|L_n(v) - L_n(v_i)\| < 2\varepsilon. \]
Now $\|L_n - L\| < \varepsilon$ and $\|v\|, \|v_i\| < 1$, so $\|L(v) - L(v_i)\| < 4\varepsilon$. It follows that
\[ L\bigl(B_1^V(0)\bigr) \subseteq \bigcup_{i=1}^m B_{4\varepsilon}\bigl(L(v_i)\bigr), \]
which implies that the points $L(v_i)$ for $i = 1, \dots, m$ are $5\varepsilon$-dense in $M = \overline{L\bigl(B_1^V(0)\bigr)}$. As $\varepsilon$ was arbitrary, $M$ is therefore totally bounded and so $M$ is a compact set and $L$ is a compact operator.
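Lemma 4.7 is typically applied with finite-rank approximations. The sketch below is an illustration under stated assumptions, not a construction from the notes: it takes the diagonal operator $(Dx)_n = x_n / n$ on $\ell^2$, truncates it to $D_m$ (keeping only the first $m$ diagonal entries), and uses the standard fact that a diagonal operator has operator norm equal to the supremum of the absolute values of its entries. The function names and the cut-off `N` are hypothetical.

```python
# Sketch: finite-rank truncations of a diagonal operator converge in operator norm.
# Assumption: D acts on l^2 by (D x)_n = x_n / n; only indices 1..N are inspected.

def diagonal_entries(N):
    """Diagonal entries 1/n for n = 1, ..., N."""
    return [1.0 / n for n in range(1, N + 1)]

def truncation_error(m, N=10_000):
    """Operator norm of D - D_m restricted to indices 1..N.

    D - D_m is diagonal with entries 0 for n <= m and 1/n for n > m, and the
    norm of a diagonal operator is the supremum of the absolute entries.
    """
    tail = diagonal_entries(N)[m:]
    return max(tail) if tail else 0.0

if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        print(m, truncation_error(m))
```

The printed errors equal $1/(m+1)$ and tend to $0$, so $D$ is a norm limit of finite-rank (hence compact) operators and is compact by Lemma 4.7.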
Exercise 4.8. In each case, justify your claim. (a) Is the inclusion map
\[ \iota_{k+1,k} : H^{k+1}(\mathbb{T}^d) \to H^k(\mathbb{T}^d), \qquad f \longmapsto f, \]
from Proposition 3.42 a compact operator? (b) Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Show that the inclusion
\[ \iota_{k+1,k} : C_b^{k+1}(\Omega) \to C_b^k(\Omega), \qquad f \longmapsto f, \]
is a compact operator if $\overline{\Omega} \subseteq \mathbb{R}^d$ is compact. Show that for $\Omega = \mathbb{R}$ and $k = 0$ (or for any $k \ge 0$), the inclusion map $\iota_{1,0}$ (or $\iota_{k+1,k}$) is not a compact operator. (c) Is the inclusion map $C(\mathbb{T}^d) \to L^2(\mathbb{T}^d)$ a compact operator?
4.2.2 Integral Operators are often Compact

We explore here briefly the realm of integral operators and show that many (but not all) are in fact compact operators.

Lemma 4.9. Let $(X, d_X)$ and $(Y, d_Y)$ be compact metric spaces. Let $\mu$ be a finite Borel measure on $X$, and let $k \in C(X \times Y)$. Then the operator $K : C(X) \to C(Y)$ defined by
\[ (Kf)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is a compact operator.

Proof. Notice that
\[ |(Kf)(y)| \le \int_X |f(x)||k(x, y)|\, d\mu \le \|f\|_\infty \|k\|_\infty\, \mu(X). \]
Since $X \times Y$ is compact, $k$ is uniformly continuous. Then for any $\varepsilon > 0$ there is a $\delta > 0$ for which
\[ d_Y(y_1, y_2) < \delta \implies |k(x, y_1) - k(x, y_2)| < \varepsilon \]
for all $x \in X$. Therefore $d_Y(y_1, y_2) < \delta$ implies that
\[ |Kf(y_1) - Kf(y_2)| \le \int_X |f(x)||k(x, y_1) - k(x, y_2)|\, d\mu \le \varepsilon \|f\|_\infty\, \mu(X). \]
Hence $Kf \in C(Y)$ and the image of the unit ball $B_1^{C(X)}(0)$ is an equicontinuous bounded family of functions. By the Arzelà–Ascoli theorem (Theorem 2.31) the closure of $K\bigl(B_1^{C(X)}(0)\bigr)$ is a compact subset of $C(Y)$, and so $K$ is a compact operator.
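Proposition 4.10 (Hilbert–Schmidt [18]). Let $(X, \mathcal{B}_X, \mu)$ and $(Y, \mathcal{B}_Y, \nu)$ be $\sigma$-finite measure spaces. Let $k \in L^2_{\mu \times \nu}(X \times Y)$. Then the Hilbert–Schmidt operator $K : L^2_\mu(X) \to L^2_\nu(Y)$ defined by
\[ Kf(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
$\nu$-almost everywhere defines a compact operator.

Proof. Note first that
\[ \int_X |f(x) k(x, y)|\, d\mu \le \|f\|_{L^2_\mu} \left( \int_X |k(x, y)|^2\, d\mu(x) \right)^{1/2}. \tag{4.1} \]
Squaring and integrating over $Y$ gives
\[ \int_Y \left( \int_X |f(x) k(x, y)|\, d\mu \right)^2 d\nu \le \|f\|_{L^2_\mu}^2 \|k\|_{L^2_{\mu \times \nu}}^2 < \infty \]
by Fubini's theorem. Thus (4.1) is finite almost everywhere, so $Kf(y)$ is well-defined for $\nu$-almost every $y$. The bound above also shows that
\[ \|Kf\|_{L^2_\nu} \le \|f\|_{L^2_\mu} \|k\|_{L^2_{\mu \times \nu}}. \]
Hence $K : L^2_\mu(X) \to L^2_\nu(Y)$ is well-defined, clearly linear, and
\[ \|K\|_{\mathrm{operator}} \le \|k\|_{L^2_{\mu \times \nu}}. \]
If $k$ is a simple function of the form
\[ k(x, y) = \sum_{i=1}^n c_i \mathbb{1}_{A_i \times B_i} \tag{4.2} \]
for some measurable sets $A_i \subseteq X$, $B_i \subseteq Y$ of finite measure and constants $c_i \in \mathbb{C}$, then
\[ Kf = \sum_{i=1}^n c_i \left( \int_{A_i} f\, d\mu \right) \mathbb{1}_{B_i} \]
is a bounded operator with finite-dimensional range and so is also compact. We wish to apply Lemma 4.7 to show that the compactness extends to all operators of the form described in the proposition. Since we already showed that the operator norm is bounded above by the $L^2$-norm of the kernel, we only have to show that any $k \in L^2_{\mu \times \nu}(X \times Y)$ can be written as the limit in $L^2_{\mu \times \nu}(X \times Y)$ of a sequence of functions $(k_n)$ with each $k_n$ of the form (4.2). Indeed, if $K_n$ is the operator associated to $k_n$ then
\[ \|K - K_n\|_{\mathrm{operator}} \le \|k - k_n\|_{L^2_{\mu \times \nu}} \to 0 \]
as $n \to \infty$, which together with the previous discussion and Lemma 4.7 gives the compactness of $K$.

In order to show that $k \in L^2_{\mu \times \nu}$ can be obtained as the limit of simple functions as in (4.2), note first that simple functions are dense in $L^2_{\mu \times \nu}$. Hence it is sufficient to show that a characteristic function $\mathbb{1}_D$ for a measurable set $D \subseteq X \times Y$ of finite measure can be approximated by functions of the form $\sum_{i=1}^n \mathbb{1}_{A_i \times B_i}$ where $A_1 \times B_1, \dots, A_n \times B_n$ are all disjoint and have finite $\mu \times \nu$-measure. The proof of this claim is similar to the argument in the proof of Proposition 2.38. Indeed, write $X = \bigcup_{n=1}^\infty X_n$ and $Y = \bigcup_{n=1}^\infty Y_n$ with $X_1 \subseteq X_2 \subseteq \cdots$, $Y_1 \subseteq Y_2 \subseteq \cdots$ and with $\mu(X_n) < \infty$ and $\nu(Y_n) < \infty$ for all $n \ge 1$. Then
\[ \mathcal{A} = \{ D \in \mathcal{B}_X \otimes \mathcal{B}_Y \mid \text{the claim above holds for } D \cap (X_n \times Y_n) \text{ for all } n \ge 1 \} \]
is a $\sigma$-algebra containing all rectangles $A \times B$ for $A \in \mathcal{B}_X$ and $B \in \mathcal{B}_Y$. It follows that $\mathcal{A} = \mathcal{B}_X \otimes \mathcal{B}_Y$. Finally, if $D \subseteq X \times Y$ has finite measure, then
\[ \bigl\| \mathbb{1}_D - \mathbb{1}_{D \cap (X_n \times Y_n)} \bigr\|_{L^2} \to 0 \]
as $n \to \infty$. Therefore the simple functions as in (4.2) are indeed dense, which gives the proposition.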
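Exercise 4.11. Prove that the collection $\mathcal{A}$ of sets defined in the proof of Proposition 4.10 is a $\sigma$-algebra.

Exercise 4.12. Let $g \in L^2(\mathbb{T}^d)$ and define the convolution operator
\[ L^2(\mathbb{T}^d) \ni f \longmapsto f * g \in C(\mathbb{T}^d). \]
Show that this defines a compact operator from $(L^2(\mathbb{T}^d), \|\cdot\|_2)$ to $(C(\mathbb{T}^d), \|\cdot\|_\infty)$.

Not all integral operators are compact, as shown by the Holmgren operators.

Proposition 4.13 (Holmgren). Let $(X, \mathcal{B}_X, \mu)$ and $(Y, \mathcal{B}_Y, \nu)$ be $\sigma$-finite measure spaces. Let $k : X \times Y \to \mathbb{R}$ be measurable on $X \times Y$, with
\[ \sup_{x \in X} \int_Y |k(x, y)|\, d\nu(y) < \infty \quad \text{and} \quad \sup_{y \in Y} \int_X |k(x, y)|\, d\mu(x) < \infty. \]
Then the integral operator $K$ defined by
\[ Kf(y) = \int_X f(x) k(x, y)\, d\mu(x) \tag{4.3} \]
is a bounded operator
\[ K : L^2(X, \mathcal{B}_X, \mu) \to L^2(Y, \mathcal{B}_Y, \nu). \]
Moreover,
\[ \|K\| \le \left( \sup_{x \in X} \int_Y |k(x, y)|\, d\nu(y) \right)^{1/2} \left( \sup_{y \in Y} \int_X |k(x, y)|\, d\mu(x) \right)^{1/2} < \infty. \]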
Proof. The proof that the integral in (4.3) makes sense for $\nu$-almost every $y \in Y$, and defines an element in $L^2(Y, \mathcal{B}_Y, \nu)$, is less direct than the proof of Proposition 4.10, and uses the Fréchet–Riesz representation theorem (Corollary 2.71). For this, suppose that $f \in L^2(X, \mathcal{B}_X, \mu)$ and $g \in L^2(Y, \mathcal{B}_Y, \nu)$, and consider the integral
\[ I = \int_{X \times Y} |f(x) k(x, y) g(y)|\, d\mu(x)\, d\nu(y). \]
Notice that for any real numbers $a, b \ge 0$ and $c > 0$, we always have
\[ ab \le \frac{c a^2}{2} + \frac{b^2}{2c}, \]
as a special case of the arithmetic-geometric mean inequality. Applying this to the definition of $I$ with $a = |f(x)|$ and $b = |g(y)|$ gives
\[ \begin{aligned} I &\le \int_{X \times Y} |k(x, y)| \left( \frac{c}{2} |f(x)|^2 + \frac{1}{2c} |g(y)|^2 \right) d\mu(x)\, d\nu(y) \\ &= \frac{c}{2} \int_X \int_Y |k(x, y)|\, d\nu(y)\, |f(x)|^2\, d\mu(x) + \frac{1}{2c} \int_Y \int_X |k(x, y)|\, d\mu(x)\, |g(y)|^2\, d\nu(y) \\ &\le \frac{c}{2} \|f\|_2^2 \underbrace{\sup_{x \in X} \int_Y |k(x, y)|\, d\nu(y)}_{s_X} + \frac{1}{2c} \|g\|_2^2 \underbrace{\sup_{y \in Y} \int_X |k(x, y)|\, d\mu(x)}_{s_Y}. \end{aligned} \]
Optimize the parameter $c$ by setting
\[ c = \frac{\sqrt{s_Y}\, \|g\|_2}{\sqrt{s_X}\, \|f\|_2} \]
and obtain
\[ \int_{X \times Y} |f(x) k(x, y) g(y)|\, d\mu(x)\, d\nu(y) \le \sqrt{s_X s_Y}\, \|f\|_2 \|g\|_2. \]
It follows that
\[ g \longmapsto \int_Y g(y) \int_X f(x) k(x, y)\, d\mu(x)\, d\nu(y) \]
is a continuous functional on $L^2(Y, \mathcal{B}_Y, \nu)$ and so
\[ Kf(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is finite $\nu$-almost everywhere, with $\|Kf\|_2 \le \sqrt{s_X s_Y}\, \|f\|_2$.

The main difference between Hilbert–Schmidt integral operators and Holmgren integral operators is that the latter are not automatically compact.

Exercise 4.14. Let $X = Y = \mathbb{R}$ and $\mu = \nu = \lambda$, the Lebesgue measure on $\mathbb{R}$. Define
\[ k(x, y) = \begin{cases} 1 & \text{for } |x - y| \le 1, \\ 0 & \text{otherwise.} \end{cases} \]
Show that the corresponding Holmgren operator $K$ as defined in Proposition 4.13 is not a compact operator on $L^2(\mathbb{R}, \lambda)$.

4.3 Spectral Theory of Self-Adjoint Compact Operators

There is a general spectral theory of compact operators $L : V \to V$ on Banach spaces. However, as we will discuss later, our applications do not need that level of generality and the statement and proof for the simpler case of self-adjoint operators is significantly easier. For these reasons we will restrict to that case below and refer to Lax [25, Chapter 21] for the general result.

4.3.1 The Adjoint Operator

Let $H_1, H_2$ be Hilbert spaces, and let $A : H_1 \to H_2$ be a bounded operator. For any fixed $v_2 \in H_2$ the map $H_1 \ni v_1 \mapsto \langle A v_1, v_2 \rangle$ is linear and bounded since
\[ |\langle A v_1, v_2 \rangle| \le \|A v_1\| \|v_2\| \le \bigl( \|A\|_{\mathrm{operator}} \|v_2\|_{H_2} \bigr) \|v_1\|. \]
operator
v2 .
(4.5)
170

v1 , A (v2 + v2 ) = Av1 , v2 + v2 = Av1 , v2 + Av1 , v2 = v1 , A v2 + v1 , A v2 = v1 , A v2 + A v2
for v1 H1 , v2 , v2 H2 and any scalar , and is bounded with
operator
operator
by (4.5). Moreover, (4.4) also implies that A = A and hence A

operator
= A
operator .
Finally, it is easy to check that the map A A is sesquilinear.

Exercise 4.15. Let $A : H_1 \to H_2$ and $B : H_2 \to H_3$ be bounded operators between Hilbert spaces. Show that $(BA)^* = A^* B^*$.

Definition 4.16. An operator $U : H_1 \to H_2$ between two Hilbert spaces is unitary if $U^* U = I_{H_1}$ and $U U^* = I_{H_2}$, which we also write as $U^* = U^{-1}$.

Exercise 4.17. (a) Show that an operator $U : H_1 \to H_2$ is unitary if and only if it is a bijective isometry (that is, a bijection with $\|Uv\|_{H_2} = \|v\|_{H_1}$ for all $v \in H_1$). (b) Suppose that $U : H_1 \to H_2$ is an isometry. Show that $U^* U = I_{H_1}$ and that $U U^* = P_{\operatorname{Im} U}$ is the orthogonal projection from $H_2$ onto the closed subspace $\operatorname{Im} U \subseteq H_2$.

Definition 4.18. A bounded operator $A : H \to H$ of a Hilbert space $H$ is called self-adjoint if $A^* = A$.

The next exercise revisits the maps introduced in Exercise 4.1.

Exercise 4.19. (a) Let $H = \ell^2(\mathbb{Z})$ and define the operator $U : H \to H$ by $(U((x_n)))_k = x_{k+1}$. Show that $U$ is unitary. (b) Let $H = \ell^2(\mathbb{N})$ and define the operator $S : H \to H$ by $(S((x_n)))_k = x_{k+1}$. Show that $\|S\|_{\mathrm{operator}} = 1$, but that $S$ is not an isometry. (c) Let $H = \ell^2(\mathbb{N})$ once again, and define $T : H \to H$ by $T((x_n)) = (0, x_1, x_2, \dots)$, which shifts the sequence to the right and fills in the first entry of the new sequence with a $0$. Show that $\|T\|_{\mathrm{operator}} = 1$, that $T$ is an isometry, is not surjective, and has no eigenvectors.
The next exercise is not simply another example; it turns out to be the basis of the powerful theory of self-adjoint operators, both bounded and unbounded.

Exercise 4.20. Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $H = L^2_\mu(X)$. Also let $g : X \to \mathbb{C}$ be a measurable function. Characterize the following properties of the multiplication operator $M_g : f \mapsto gf$ for $f \in H$. (a) What properties of $g$ ensure that $M_g : H \to H$ is well-defined and bounded? What is $\|M_g\|_{\mathrm{operator}}$? (b) When is $M_g$ a bounded self-adjoint operator? That is, what property of $g$ ensures that $\langle M_g f_1, f_2 \rangle = \langle f_1, M_g f_2 \rangle$ holds for all $f_1, f_2 \in H$? What property of $g$ ensures that $M_g$ is unitary, that is, surjective with $\langle M_g f_1, M_g f_2 \rangle = \langle f_1, f_2 \rangle$? (c) When does $M_g$ have $\lambda \in \mathbb{C}$ as an eigenvalue? (d) Suppose that $X = \mathbb{R}$ and let $g(x) = x$, and assume that $\mu$ is an arbitrary finite compactly supported Borel measure on $\mathbb{R}$. Characterize in terms of $\mu$ the property that $M_g$ can be diagonalized. That is, when does $H$ have an orthonormal basis $\{e_n \mid n \in \mathbb{N}\}$ and a sequence $(\lambda_n)$ such that
\[ M_g \left( \sum_{n=1}^\infty x_n e_n \right) = \sum_{n=1}^\infty \lambda_n x_n e_n\, ? \]

Exercise 4.21. Let $H = \mathbb{C}^n$ be a finite-dimensional Hilbert space with respect to the usual inner product. Show that the linear operator defined by a matrix $A = (a_{ij})$ is self-adjoint if and only if it is equal to its own conjugate transpose (that is, $a_{ij} = \overline{a_{ji}}$ for all $i, j$). Such matrices are also called Hermitian.
4.3.2 The Spectral Theorem

The spectral theorem presented here generalizes to an infinite-dimensional setting the familiar fact that a Hermitian matrix has real eigenvalues and can be diagonalized using a unitary matrix.

Theorem 4.22 (Spectral theorem for compact self-adjoint operators). Let $H$ be a separable infinite-dimensional Hilbert space, and let $A : H \to H$ be a compact self-adjoint operator. Then there exists a sequence of real eigenvalues $(\lambda_n)$ with $\lambda_n \to 0$ as $n \to \infty$, and an orthonormal basis $\{v_n\}$ of eigenvectors with $A v_n = \lambda_n v_n$ for all $n \ge 1$.

(Separability will be assumed in this section in order to make use of an orthonormal basis that consists of a sequence. Properly formulated, the result holds more generally, and in particular allows the kernel of $A$ to be a non-separable space. Both the finite-dimensional and the inseparable case can easily be extracted from the proof we give.)

In other words, a compact self-adjoint operator is diagonalizable, each non-zero eigenvalue has finite multiplicity, and $0$ is the only possible accumulation point of the set of eigenvalues. Given these properties (which will turn out to be extremely useful) it is worth asking if there are any such operators. Clearly such operators exist in the following sense. If $\{e_n\}$ is an orthonormal basis of a Hilbert space $H$, and $(\lambda_n)$ is a sequence of real numbers with $\lambda_n \to 0$ as $n \to \infty$, then we may define an operator $A : H \to H$ by
\[ A \left( \sum_{n=1}^\infty x_n e_n \right) = \sum_{n=1}^\infty \lambda_n x_n e_n \]
for any convergent series $\sum_{n=1}^\infty x_n e_n$. It may then be checked that $A$ is compact and self-adjoint. Of course Theorem 4.22 does not tell us anything we did not already know for such an operator.

A more interesting kind of example is found among the integral operators. Let $H = L^2_\mu(X)$ where $(X, \mathcal{B}, \mu)$ is a $\sigma$-finite measure space, and suppose that $k \in L^2_{\mu \times \mu}(X \times X)$ satisfies $k(x, y) = \overline{k(y, x)}$ for $\mu \times \mu$-almost every $(x, y) \in X \times X$. Then the operator $K$ defined by
\[ Kf(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is compact by Proposition 4.10, and is self-adjoint by the following application of Fubini's theorem:
\[ \begin{aligned} \langle f_1, K^* f_2 \rangle = \langle K f_1, f_2 \rangle &= \int_X \int_X f_1(x) k(x, y)\, d\mu(x)\, \overline{f_2(y)}\, d\mu(y) \\ &= \int_X f_1(x)\, \overline{\int_X f_2(y) \underbrace{\overline{k(x, y)}}_{= k(y, x)}\, d\mu(y)}\, d\mu(x) \\ &= \langle f_1, K f_2 \rangle \end{aligned} \]
for all $f_1, f_2 \in L^2_\mu(X)$. Hence Theorem 4.22 applies, but in this case it is a priori not at all clear how one could find the eigenvalues for the operator.

Finally, notice that the integral operator from Section 1.3.3 defined by the kernel
\[ G(s, t) = \begin{cases} s(t - 1) & \text{if } 0 \le s \le t \le 1, \\ t(s - 1) & \text{if } 0 \le t \le s \le 1 \end{cases} \]
satisfies the conditions above, and so the eigenfunctions found in Section 1.3.3 (resp. in Exercise 1.17, which now stands on the firmer ground provided by Section 3.2) must coincide with the eigenvectors which must exist by Theorem 4.22.
Exercise 4.23. Let $K$ be the Hilbert–Schmidt integral operator on $L^2_\mu(X)$ defined by a kernel $k \in L^2_{\mu \times \mu}(X \times X)$ with $k(x, y) = \overline{k(y, x)}$ as above. Prove that the generalized Fredholm integral equation of the second kind
\[ f = \lambda K f + \varphi \]
has, for any $\varphi \in L^2_\mu(X)$, a solution if and only if $\lambda \ne \frac{1}{\lambda_n}$ for any $n$, where $\{\lambda_n\}$ is the set of eigenvalues of $K$ on $L^2_\mu(X)$.

We will see another class of compact self-adjoint operators in Section 4.4.

4.3.3 Proof of the Spectral Theorem

Lemma 4.24 (Invariance of orthogonal complement). Let $A : H \to H$ be a bounded operator on a Hilbert space. If $V \subseteq H$ is an $A$-invariant subspace (that is, a subspace with $A(V) \subseteq V$), then $V^\perp$ is $A^*$-invariant.

Proof. If $v \in V$ and $v' \in V^\perp$, then $\langle A^* v', v \rangle = \langle v', A v \rangle = 0$. As this holds for all $v \in V$, we must have $A^* v' \in V^\perp$.

As we will see, Lemma 4.24 reduces the proof of Theorem 4.22 mostly to finding a single eigenvector $e_1$, as we can then apply the lemma to $V = \langle e_1 \rangle$ and $A^* = A$ to see that $V^\perp$ is $A$-invariant. We now approach the central statement concerning the existence of an eigenvector. Before doing this, it is useful to recall how one proves the complete diagonalizability of self-adjoint operators on $\mathbb{R}^d$. By compactness, we may choose $e \in S^{d-1} = \{v \in \mathbb{R}^d \mid \|v\|_2 = 1\}$ such that $\langle Ax, x \rangle^2$ achieves its maximum at $x = e$. Using Lagrange multipliers one can then check that $e$ is an eigenvector of $A$, with eigenvalue $\lambda \in \mathbb{R}$ of absolute value $|\lambda| = \|A\|_{\mathrm{operator}}$. This relies in an essential way on the compactness of the unit sphere $S^{d-1}$, which as we know fails in infinite-dimensional Hilbert spaces, and it is here that the additional assumptions on $A$ will become important. The first step towards using the compactness of $A$ is the following (which only relies on self-adjointness).

Lemma 4.25 (The norm of self-adjoint operators). Let $A : H \to H$ be a bounded self-adjoint operator on a Hilbert space. Then
\[ \|A\| = \sup_{\|x\| \le 1} |\langle Ax, x \rangle|. \tag{4.6} \]
Notice that if $A$ is self-adjoint, then $\langle Ax, x \rangle \in \mathbb{R}$ for all $x \in H$, since
\[ \overline{\langle Ax, x \rangle} = \langle x, Ax \rangle = \langle A^* x, x \rangle = \langle Ax, x \rangle. \]
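Proof of Lemma 4.25. Let us write
\[ s(A) = \sup_{\|x\| \le 1} |\langle Ax, x \rangle| \]
for the right-hand side of (4.6). Then, by the Cauchy–Schwarz inequality,
\[ |\langle Ax, x \rangle| \le \|Ax\| \|x\| \le \|A\| \|x\|^2 \le \|A\| \]
for all $x \in H$ with $\|x\| \le 1$. Hence $s(A) \le \|A\|$. The opposite inequality is slightly more involved. For $\lambda > 0$,
\[ \begin{aligned} 4 \|Ax\|^2 &= \bigl\langle A(\lambda x + \lambda^{-1} Ax), \lambda x + \lambda^{-1} Ax \bigr\rangle - \bigl\langle A(\lambda x - \lambda^{-1} Ax), \lambda x - \lambda^{-1} Ax \bigr\rangle \\ &\le s(A) \Bigl( \bigl\| \lambda x + \lambda^{-1} Ax \bigr\|^2 + \bigl\| \lambda x - \lambda^{-1} Ax \bigr\|^2 \Bigr) \end{aligned} \]
since the two inner products appearing are of the form $\langle Au, u \rangle$ and satisfy $|\langle Au, u \rangle| \le s(A) \|u\|^2$. Now we apply the parallelogram identity (2.34) to obtain
\[ 4 \|Ax\|^2 \le 2 s(A) \bigl( \lambda^2 \|x\|^2 + \lambda^{-2} \|Ax\|^2 \bigr). \]
Assuming that $Ax \ne 0$, we set $\lambda^2 = \frac{\|Ax\|}{\|x\|}$ and get
\[ 4 \|Ax\|^2 \le 2 s(A) \bigl( \|Ax\| \|x\| + \|x\| \|Ax\| \bigr) = 4 s(A) \|Ax\| \|x\|, \]
so $\|Ax\| \le s(A) \|x\|$. This shows that $\|A\| \le s(A)$.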
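Lemma 4.25 can be sanity-checked in finite dimensions. The sketch below is only an illustration: it assumes the symmetric matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ (eigenvalues $1$ and $3$, an arbitrary choice made here), samples unit vectors $x = (\cos t, \sin t)$, and compares the sampled supremum of $|\langle Ax, x \rangle|$ with the sampled supremum of $\|Ax\|$. All function names are hypothetical.

```python
import math

# Sketch: for a real symmetric 2x2 matrix, compare sup |<Ax, x>| over unit
# vectors with sup ||Ax||. Assumption: A below is an illustrative choice with
# eigenvalues 1 and 3, so that its operator norm is 3.

A = [[2.0, 1.0], [1.0, 2.0]]

def apply(M, x):
    """Matrix-vector product for a 2x2 matrix."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]

def inner(x, y):
    """Euclidean inner product on R^2."""
    return x[0] * y[0] + x[1] * y[1]

def unit_vectors(samples):
    """Evenly spaced points (cos t, sin t) on the unit circle."""
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        yield [math.cos(t), math.sin(t)]

def sup_quadratic_form(M, samples=3600):
    """Sampled approximation of sup |<Mx, x>| over unit vectors."""
    return max(abs(inner(apply(M, x), x)) for x in unit_vectors(samples))

def sup_norm_image(M, samples=3600):
    """Sampled approximation of sup ||Mx|| over unit vectors."""
    return max(math.sqrt(inner(apply(M, x), apply(M, x))) for x in unit_vectors(samples))

if __name__ == "__main__":
    print(sup_quadratic_form(A), sup_norm_image(A))
```

For a non-symmetric matrix such as a rotation by a right angle the two quantities differ (the quadratic form vanishes identically), which is why self-adjointness is needed in the lemma.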
We are now ready to prove the existence of an eigenvector.

Lemma 4.26 (Main step: finding the first eigenvector). Let $A : H \to H$ be a compact self-adjoint operator on a non-trivial Hilbert space. Then there exists an eigenvector of $A$ for the eigenvalue $\|A\|$ or $-\|A\|$.

Proof. If $\|A\| = 0$ then $A = 0$ and there is nothing to prove, so we may assume that $\|A\| > 0$. By Lemma 4.25 there exists a scalar $\lambda$ with $|\lambda| = \|A\|$ and a sequence $(x_n)$ in $H$ with $\|x_n\| = 1$ for all $n \ge 1$ and with $\langle A x_n, x_n \rangle \to \lambda$ as $n \to \infty$. By the remark before the proof of Lemma 4.25, we have $\langle A x_n, x_n \rangle \in \mathbb{R}$ and so $\lambda \in \{\|A\|, -\|A\|\}$. Now notice that
\[ \begin{aligned} 0 \le \|A x_n - \lambda x_n\|^2 &= \|A x_n\|^2 - 2 \lambda \operatorname{Re} \langle A x_n, x_n \rangle + \lambda^2 \|x_n\|^2 \\ &\le \|A\|^2 - 2 \lambda \langle A x_n, x_n \rangle + \lambda^2 \\ &= 2 \|A\|^2 - 2 \lambda \langle A x_n, x_n \rangle \longrightarrow 2 \|A\|^2 - 2 \lambda^2 = 0 \end{aligned} \]
as $n \to \infty$. In particular, this shows that $(A x_n)$ converges if and only if $(\lambda x_n)$ converges, and that the limits agree. However, since $\|x_n\| = 1$ and $A$ is a compact operator there exists a subsequence $(x_{n_k})$ for which $A x_{n_k}$ converges, say
\[ A x_{n_k} \to x \tag{4.7} \]
as $k \to \infty$ for some $x \in H$. Therefore, $\lambda x_{n_k} \to x$ as $k \to \infty$ as well, and hence $x_{n_k} \to \lambda^{-1} x$ as $k \to \infty$. Since $A$ is continuous, we deduce that $A x_{n_k} \to \lambda^{-1} A x$ as $k \to \infty$. Together with (4.7) we have $Ax = \lambda x$, and since $\|x_{n_k}\| = 1$ for all $k \ge 1$ and $\lambda x_{n_k} \to x$ as $k \to \infty$ we also have $\|x\| = |\lambda| > 0$ and hence $x \ne 0$.

Now we combine the arguments above to prove the spectral theorem for compact self-adjoint operators.

Proof of Theorem 4.22. By assumption, $H$ is an infinite-dimensional Hilbert space and $A : H \to H$ is a compact self-adjoint operator. By Lemma 4.26 there exists an eigenvector $e_1$ with eigenvalue $\lambda_1 \in \mathbb{R}$, and $|\lambda_1| = \|A\|$. We may assume without loss of generality that $\|e_1\| = 1$. Suppose now, for the purposes of an induction argument, that we already have found orthonormal eigenvectors $e_1, \dots, e_n$ with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$. Let $V_n = \langle e_1, \dots, e_n \rangle$ be the linear span of these vectors, and notice that $A(V_n) \subseteq V_n$ since they are eigenvectors for $A$. By Lemma 4.24 we have $A^*(V_n^\perp) \subseteq V_n^\perp$, but since $A^* = A$ we also have $A(V_n^\perp) \subseteq V_n^\perp$. Write
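\[ A_n = A|_{V_n^\perp} : V_n^\perp \to V_n^\perp \]
for the restriction of $A$ to $V_n^\perp$. Then $A_n$ is a compact operator because $A$ is compact, and is self-adjoint because $A$ is self-adjoint. (If $w_1, w_2 \in V_n^\perp$ then $\langle A_n^* w_1, w_2 \rangle = \langle w_1, A_n w_2 \rangle = \langle w_1, A w_2 \rangle = \langle A w_1, w_2 \rangle$ and $A w_1 \in V_n^\perp$, so we have $A_n^* = A|_{V_n^\perp} = A_n$.) Therefore, we may apply Lemma 4.26 again to the operator $A_n : V_n^\perp \to V_n^\perp$ to find another eigenvector $e_{n+1}$ orthogonal to $e_1, \dots, e_n$ with eigenvalue $\lambda_{n+1}$, $|\lambda_{n+1}| = \|A_n\|$, and $\|e_{n+1}\| = 1$. Repeating the argument, we find an orthonormal sequence $(e_n)$ of eigenvectors with $A e_n = \lambda_n e_n$ and $\lambda_n \in \mathbb{R}$. We need to show that $\lambda_n \to 0$ as $n \to \infty$. By construction we have
\[ |\lambda_{n+1}| = \|A_n\| = \bigl\| A|_{V_n^\perp} \bigr\| \le \bigl\| A|_{V_{n-1}^\perp} \bigr\| = \|A_{n-1}\| = |\lambda_n|, \]
so that
\[ |\lambda_1| \ge |\lambda_2| \ge \cdots. \tag{4.8} \]
If $\lambda_n \not\to 0$ as $n \to \infty$, then there is some $\varepsilon > 0$ such that $|\lambda_n| > \varepsilon$ for all $n \ge 1$ by (4.8). This shows that
\[ e_n = A\bigl( \lambda_n^{-1} e_n \bigr) \in \varepsilon^{-1} A\bigl( B_1(0) \bigr) \]
for all $n \ge 1$, and since $e_n \perp e_m$ for $n \ne m$ we must have
\[ \|e_n - e_m\| = \sqrt{2} \]
for $n \ne m$. This shows that the sequence $(e_n)$ lies in the set $\varepsilon^{-1} \overline{A(B_1(0))}$ (which is compact because $A$ is a compact operator) and cannot have a convergent subsequence, which is a contradiction. Also, since $|\lambda_{n+1}| = \bigl\| A|_{V_n^\perp} \bigr\| \ge \bigl\| A|_{V^\perp} \bigr\|$ where $V = \overline{\langle e_1, e_2, \dots \rangle}$, we see that $A|_{V^\perp} = 0$.

Thus far we have not used the assumption that $H$ is separable, and the statement at the end of the last paragraph is the general result. Assuming now that $H$ is separable, we can choose an orthonormal basis of $V^\perp$ (which might be zero, in which case the theorem is already proved, or might be finite-dimensional). Listing this orthonormal basis of $V^\perp$ together with the basis of $V$ already constructed proves the theorem.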
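The inductive structure of the proof (find an eigenvector of largest absolute eigenvalue, restrict to the orthogonal complement, repeat) can be mimicked for a symmetric matrix. The sketch below is a finite-dimensional analogue under stated assumptions and not the argument of the notes: power iteration is swapped in for the compactness argument of Lemma 4.26, the matrix $A$ and starting vector are arbitrary illustrative choices, and all function names are hypothetical. Power iteration can fail when two eigenvalues have the same absolute value or when the starting vector lies in the span of the vectors already found.

```python
import math

# Sketch of the inductive step in finite dimensions. Power iteration stands in
# for Lemma 4.26 (a swapped-in technique, not the method of the notes).
# Assumption: A is an illustrative symmetric matrix with eigenvalues 3 and 1.

A = [[2.0, 1.0], [1.0, 2.0]]

def apply(M, x):
    """Matrix-vector product."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def inner(x, y):
    """Euclidean inner product."""
    return sum(a * b for a, b in zip(x, y))

def normalize(x):
    """Scale x to unit length (x must be non-zero)."""
    n = math.sqrt(inner(x, x))
    return [a / n for a in x]

def project_out(x, basis):
    """Remove the components of x along the orthonormal vectors in basis."""
    for e in basis:
        c = inner(x, e)
        x = [a - c * b for a, b in zip(x, e)]
    return x

def next_eigenpair(M, found, start, steps=200):
    """Power iteration restricted to the orthogonal complement of found."""
    x = normalize(project_out(start, found))
    for _ in range(steps):
        x = normalize(project_out(apply(M, x), found))
    return inner(apply(M, x), x), x

def eigenpairs(M, start):
    """Eigenvalues in decreasing absolute value with orthonormal eigenvectors."""
    found, values = [], []
    for _ in range(len(M)):
        lam, e = next_eigenpair(M, found, start)
        values.append(lam)
        found.append(e)
    return values, found

if __name__ == "__main__":
    values, vectors = eigenpairs(A, [1.0, 0.0])
    print(values, vectors)
```

The eigenvalues come out ordered by decreasing absolute value, matching (4.8), and the eigenvectors are orthonormal by construction.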
4.4 Eigenfunctions for the Laplace Operator

We start by discussing the case of the $d$-dimensional torus, even though we already have an orthonormal basis consisting of eigenfunctions of the Laplacian in this setting, namely the characters. Using the characters, we will define a right-inverse $S$ of the Laplacian on
\[ L^2_0(\mathbb{T}^d) = \left\{ f \in L^2(\mathbb{T}^d) \;\middle|\; \int_{\mathbb{T}^d} f\, dx = 0 \right\}, \]
and it will be easy to show that $S : L^2_0(\mathbb{T}^d) \to L^2_0(\mathbb{T}^d)$ is a compact operator. In Section 4.4.2 we will generalize this construction to open subsets $\Omega \subseteq \mathbb{R}^d$, a setting in which we do not know the eigenfunctions of the Laplacian. In Section 4.4.3 we will assume that $\Omega$ is a bounded set with smooth boundary, and will be able to deduce the existence of an orthonormal basis of eigenfunctions of the Laplacian. We start by stating the main theorem, which will be proved in Section 4.4.3.

Theorem 4.27. Let $\Omega \subseteq \mathbb{R}^d$ be an open bounded set with smooth boundary. Then there exists an orthonormal basis $\{f_n\}$ of $L^2(\Omega)$ of functions in $H_0^1(\Omega)$ which are smooth in $\Omega$ and are eigenfunctions of the Laplace operator, with $\Delta f_n = \lambda_n f_n$, with $\lambda_n < 0$ for all $n \ge 1$ and $\lambda_n \to -\infty$ as $n \to \infty$.
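4.4.1 A Compact Right Inverse on the Torus

Lemma 4.28. There exists a compact self-adjoint operator
\[ S : L^2_0(\mathbb{T}^d) \to L^2_0(\mathbb{T}^d) \]
with the property that $\Delta S f = f$ for all $f \in L^2_0(\mathbb{T}^d)$. Each character $\chi_n$ for $n \in \mathbb{Z}^d \setminus \{0\}$ is an eigenfunction of $S$ with eigenvalue $-\frac{1}{4 \pi^2 \|n\|_2^2}$.

Proof. By Theorem 3.9 we have an isomorphism of Hilbert spaces
\[ L^2_0(\mathbb{T}^d) \cong \ell^2(\mathbb{Z}^d \setminus \{0\}) \]
defined by
\[ f = \sum_{n \in \mathbb{Z}^d \setminus \{0\}} a_n \chi_n \longmapsto (a_n)_{n \in \mathbb{Z}^d \setminus \{0\}}. \]
We define $S$ on $\ell^2(\mathbb{Z}^d \setminus \{0\})$ by
\[ S : (a_n)_{n \in \mathbb{Z}^d \setminus \{0\}} \longmapsto \left( -\frac{1}{4 \pi^2 \|n\|_2^2}\, a_n \right)_{n \in \mathbb{Z}^d \setminus \{0\}}, \]
which again belongs to $\ell^2(\mathbb{Z}^d \setminus \{0\})$. Since $-\frac{1}{4 \pi^2 \|n\|_2^2} \in \mathbb{R}$ for all $n \in \mathbb{Z}^d \setminus \{0\}$ we have
\[ \langle Sa, b \rangle = \sum_{n \in \mathbb{Z}^d \setminus \{0\}} -\frac{1}{4 \pi^2 \|n\|_2^2}\, a_n \overline{b_n} = \langle a, Sb \rangle \]
for all $a, b \in \ell^2(\mathbb{Z}^d \setminus \{0\})$ and so $S = S^*$ is self-adjoint. For each $m \ge 1$ we define a truncated operator $S_m : \ell^2(\mathbb{Z}^d \setminus \{0\}) \to \ell^2(\mathbb{Z}^d \setminus \{0\})$ by
\[ (S_m(a))_n = \begin{cases} -\frac{1}{4 \pi^2 \|n\|_2^2}\, a_n & \text{for } \|n\|_2 \le m, \\ 0 & \text{for } \|n\|_2 > m, \end{cases} \]
and notice that
\[ \|(S - S_m) a\| = \left( \sum_{\|n\|_2 > m} \frac{1}{(4 \pi^2)^2 \|n\|_2^4} |a_n|^2 \right)^{1/2} \le \frac{1}{4 \pi^2 m^2} \|a\|, \]
so that $S_m \to S$ with respect to $\|\cdot\|_{\mathrm{operator}}$ as $m \to \infty$. As $S_m$ has finite-dimensional range (and is therefore automatically compact), Lemma 4.7 shows that $S$ is a compact operator.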
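The eigenvalue relation behind this construction can be checked numerically in the case $d = 1$. The following sketch is an illustration only: it assumes $\chi_n(x) = e^{2 \pi i n x}$ on the circle and approximates the second derivative by a central second difference with a hypothetical step size `h`; the function names are also hypothetical. It confirms that $\chi_n'' / \chi_n \approx -4 \pi^2 n^2$, so multiplying by $-\frac{1}{4 \pi^2 n^2}$ on the $n$-th Fourier coefficient inverts the Laplacian on that mode.

```python
import cmath
import math

# Sketch for d = 1: the character chi_n(x) = exp(2*pi*i*n*x) on the circle.
# Assumption: the Laplacian is approximated by a central second difference.

def chi(n, x):
    """The n-th character evaluated at x."""
    return cmath.exp(2j * math.pi * n * x)

def laplacian_ratio(n, x=0.3, h=1e-4):
    """Approximate chi_n''(x) / chi_n(x); the exact value is -4*pi^2*n^2."""
    second = (chi(n, x + h) - 2 * chi(n, x) + chi(n, x - h)) / h ** 2
    return second / chi(n, x)

def s_eigenvalue(n):
    """Eigenvalue of S on chi_n as in Lemma 4.28, specialised to d = 1."""
    return -1.0 / (4 * math.pi ** 2 * n ** 2)

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, laplacian_ratio(n), s_eigenvalue(n))
```

The product `laplacian_ratio(n) * s_eigenvalue(n)` is close to $1$ for each non-zero $n$, the one-mode version of $\Delta S f = f$.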
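Using the isomorphism between $\ell^2(\mathbb{Z}^d \setminus \{0\})$ and $L^2_0(\mathbb{T}^d)$, we can and will identify $S$ with a compact self-adjoint operator on $L^2_0(\mathbb{T}^d)$, which by construction will have the property that
\[ S \chi_n = -\frac{1}{4 \pi^2 \|n\|_2^2}\, \chi_n \]
for all $n \in \mathbb{Z}^d \setminus \{0\}$. Now let $f \in L^2_0(\mathbb{T}^d)$ and $\varphi \in C^\infty(\mathbb{T}^d)$ be given, with Fourier expansions
\[ f = \sum_{n \in \mathbb{Z}^d \setminus \{0\}} a_n \chi_n \]
as an equality in $L^2_0$ and
\[ \varphi = \sum_{n \in \mathbb{Z}^d} b_n \chi_n \]
as an identity that holds pointwise. Then
\[ \langle Sf, \Delta \varphi \rangle = \sum_{n \in \mathbb{Z}^d \setminus \{0\}} \left( -\frac{1}{4 \pi^2 \|n\|_2^2}\, a_n \right) \overline{\bigl( -4 \pi^2 \|n\|_2^2\, b_n \bigr)} = \langle f, \varphi \rangle, \]
which shows that $\Delta S f = f$ by definition of $\Delta$.

Essentially the same argument that gave compactness of the operator $S$ in Lemma 4.28 also proves the following lemma (whose proof we leave as an exercise; see also Exercise 3.49 and Exercise 4.8).

Lemma 4.29. The operator $\iota_{1,0} : H^1(\mathbb{T}^d) \to H^0(\mathbb{T}^d) = L^2(\mathbb{T}^d)$ is compact. In fact,
\[ K = \{ f \in L^2(\mathbb{T}^d) \mid \|f\|_2 \le 1 \text{ and, for } j = 1, \dots, d,\ \partial_j f \text{ exists with } \|\partial_j f\|_2 \le 1 \} \]
is a compact subset of $L^2(\mathbb{T}^d)$.

4.4.2 A Self-Adjoint Right Inverse on Open Subsets

Proposition 4.30 (Self-adjoint right inverse). Let $\Omega \subseteq \mathbb{R}^d$ be a bounded and open subset. Using Lemma 3.72 we may equip $H_0^1(\Omega)$ with the inner product $\langle \cdot, \cdot \rangle_{H_0^1}$. Then the map
\[ \iota = \iota_{1,0} : H_0^1(\Omega) \to H^0(\Omega) = L^2(\Omega) \]
has the property that
\[ \Delta\bigl( \iota(\iota^* f) \bigr) \in L^2(\Omega) \]
exists for all $f \in L^2(\Omega)$ and equals $-f$. In other words, $\Delta \circ (-\iota \circ \iota^*) = I$ equals the identity on $L^2(\Omega)$. Finally, $S = -\iota \circ \iota^* : L^2(\Omega) \to L^2(\Omega)$ is a self-adjoint operator.

Proof. Recall the map
\[ \iota : H_0^1(\Omega) \to H^0(\Omega), \qquad f \longmapsto f \]
from Proposition 3.51. The adjoint is therefore a map
\[ \iota^* : H^0(\Omega) \to H_0^1(\Omega), \]
and so the composition $\iota \circ \iota^*$ is indeed a map from $L^2(\Omega)$ to $L^2(\Omega)$. By Exercise 4.15, $\iota \circ \iota^*$ (and hence $S$) is self-adjoint. Now let $\varphi \in C_c^\infty(\Omega)$ and $f \in L^2(\Omega)$. Then
\[ \begin{aligned} \langle \iota \iota^* f, \Delta \varphi \rangle_{L^2} &= \sum_j \bigl\langle \iota \iota^* f, \partial_j^2 \varphi \bigr\rangle_{L^2} = -\sum_j \langle \partial_j \iota^* f, \partial_j \varphi \rangle_{L^2} \\ &= -\langle \iota^* f, \varphi \rangle_{H_0^1} = -\langle f, \iota \varphi \rangle_{L^2} = -\langle f, \varphi \rangle_{L^2} \end{aligned} \]
shows that $\Delta \circ (-\iota \circ \iota^*) = I$ as claimed.

4.4.3 Compactness of the Right-Inverse

We will now assume that $\Omega \subseteq \mathbb{R}^d$ is bounded and has smooth boundary, and prove that in this case the operator $S$ defined in Proposition 4.30 is compact.

Proposition 4.31 (Compactness on open subsets). Let $\Omega \subseteq \mathbb{R}^d$ be bounded and open with smooth boundary. Then
\[ \iota : H_0^1(\Omega) \to H^0(\Omega) \]
is a compact operator.

As a first step toward this general result, we prove a weaker claim for cubes in $\mathbb{R}^d$.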
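Lemma 4.32 (Weak compactness for a cube). Let $\Omega = (a, b)^d$ with $a < b$. Then for any $\delta \in \bigl(0, \frac{b - a}{2}\bigr)$ the set
\[ K_\delta = \{ f \in L^2(\Omega) \mid \|f\|_2 \le 1,\ \partial_j f \in L^2(\Omega) \text{ exists},\ \|\partial_j f\|_2 \le 1, \text{ and } \operatorname{Supp}(f) \subseteq [a + \delta, b - \delta]^d \} \]
is compact. The condition on the support of the functions in $K_\delta$ is to be understood as meaning that there is a representative function in the equivalence class of $f$ whose support lies in $[a + \delta, b - \delta]^d$.

Proof of Lemma 4.32. For simplicity of notation and without loss of generality we set $a = 0$ and $b = 1$. Fix $\delta \in (0, \frac{1}{2})$ and $f \in K_\delta$. Then, by definition of $\partial_j$, we see that $f_j = \partial_j f \in L^2((0, 1)^d)$ satisfies
\[ \langle f_j, \varphi \rangle = -\langle f, \partial_j \varphi \rangle \tag{4.9} \]
for all $\varphi \in C_c^\infty((0, 1)^d)$.

Switching to $\mathbb{T}^d$. Notice that we could also consider $f, f_j$ as elements of $L^2(\mathbb{T}^d)$ by identifying $D = [0, 1)^d$ with $\mathbb{T}^d$. Then (4.9) also holds for $\varphi \in C^\infty(\mathbb{T}^d)$. To prove this, choose some $\psi \in C_c^\infty((0, 1)^d)$ with $\psi \equiv 1$ on a neighborhood of $[\delta, 1 - \delta]^d \subseteq (0, 1)^d$. Now let $\varphi \in C^\infty(\mathbb{T}^d)$, which we may identify with a $\mathbb{Z}^d$-periodic function $\varphi \in C^\infty(\mathbb{R}^d)$, and in particular may think of as being defined on $[0, 1]^d$. Notice that $\varphi|_D$ may not have compact support in $(0, 1)^d$, so that we cannot apply (4.9) directly to $\varphi|_{(0,1)^d}$. However, $\psi \varphi|_{(0,1)^d} \in C_c^\infty((0, 1)^d)$, and so
\[ \begin{aligned} \langle f, \partial_j \varphi \rangle_{L^2(\mathbb{T}^d)} &= \langle f, \psi\, \partial_j \varphi|_D \rangle_{L^2(D)} \\ &= \bigl\langle f, \psi\, \partial_j \varphi|_D + \varphi|_{(0,1)^d}\, \partial_j \psi \bigr\rangle_{L^2(D)} \\ &= \langle f, \partial_j(\psi \varphi|_D) \rangle_{L^2(D)} \end{aligned} \]
since $\operatorname{Supp}(f) \subseteq [\delta, 1 - \delta]^d$ and $\psi \equiv 1$ on a neighborhood of $[\delta, 1 - \delta]^d$. Applying (4.9) we see that
\[ \langle f, \partial_j \varphi \rangle = -\langle f_j, \psi \varphi|_D \rangle_{L^2(D)} = -\langle f_j, \varphi|_D \rangle_{L^2(D)} = -\langle f_j, \varphi \rangle_{L^2(\mathbb{T}^d)} \]
since we also have $\operatorname{Supp}(f_j) \subseteq [\delta, 1 - \delta]^d$ (see Exercise 4.33). It follows now from Lemma 4.29 that $K_\delta$ considered as a subset of $L^2(\mathbb{T}^d)$ is compact. However, since $L^2(\mathbb{T}^d)$ is naturally identified with $L^2(D)$, we obtain the lemma.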
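Exercise 4.33. Suppose that $f \in L^2(D)$, $\partial_j f \in L^2(D)$ exists, and $\operatorname{Supp}(f) \subseteq M$ for some closed set $M \subseteq D$ (as usual, this means that there is a function in the equivalence class of $f$ with $\operatorname{Supp}(f) \subseteq M$). Show that $\operatorname{Supp} \partial_j f \subseteq M$.

Lemma 4.32 shows that it is desirable to replace a function $f \in H_0^1(\Omega)$ by a function which has support contained in $\Omega$. The function provided by the next lemma will help us to achieve this.

Lemma 4.34 (Smooth almost characteristic function). Let $\Omega \subseteq \mathbb{R}^d$ be open, bounded, and with smooth boundary. Then for every sufficiently small $\delta > 0$ there exists a function $\psi_\delta \in C_c^\infty(\Omega)$ with
\[ \psi_\delta(x) \begin{cases} = 0 & \text{if } \operatorname{dist}(x, \partial\Omega) \le \delta, \\ \in [0, 1] & \text{if } \operatorname{dist}(x, \partial\Omega) \in [\delta, 2\delta], \\ = 1 & \text{if } \operatorname{dist}(x, \partial\Omega) \ge 2\delta, \end{cases} \]
for which
\[ \max_{j = 1, \dots, d} \|\partial_j \psi_\delta\|_\infty \ll \frac{1}{\delta}, \]
where the implied constant only depends on $\Omega$.

Here we say that $\Omega$ has smooth boundary if, at every point $x \in \partial\Omega$, the set $\Omega$ and its boundary can be described using a smooth function in the following sense. After reordering the variables if necessary, there exists some $\varepsilon > 0$ and a smooth function $b : B_\varepsilon(x_1, \dots, x_{d-1}) \to \mathbb{R}$ such that
\[ \Omega \cap B_\varepsilon(x) = \{ y \in B_\varepsilon(x) \mid y_d < b(y_1, \dots, y_{d-1}) \}. \]
Note that since $\partial\Omega$ is compact, it can be covered by finitely many such balls (cf. Definition 3.62). We postpone the proof of this lemma from analysis to the end of the section, and continue by showing how it may be used for part of the proof of Proposition 4.31.

Lemma 4.35 (Product rule). Let $\Omega \subseteq \mathbb{R}^d$ be an open set, and let $\psi \in C_c^\infty(\Omega)$. Then
\[ H^1(\Omega) \ni f \longmapsto \psi f \in H_0^1(\Omega) \]
is a continuous operator, and
\[ \partial_j(\psi f) = \psi (\partial_j f) + f (\partial_j \psi). \]
If $\Omega \subseteq D$, then $\psi f \in H_0^1(D)$ also.

Proof. For $\varphi \in C^\infty(\Omega) \cap H^1(\Omega)$ it is clear that $\psi \varphi \in C_c^\infty(\Omega)$ and so
\[ \partial_j(\psi \varphi) = \psi (\partial_j \varphi) + \varphi (\partial_j \psi) \in C_c^\infty(\Omega) \subseteq L^2(\Omega) \]
for $j = 1, \dots, d$. Hence $\psi \varphi \in H_0^1(\Omega)$ and
\[ \begin{aligned} \|\psi \varphi\|_{H^1} &\le \sqrt{d + 1}\, \max\{ \|\psi \varphi\|_2, \|\partial_1(\psi \varphi)\|_2, \dots, \|\partial_d(\psi \varphi)\|_2 \} \\ &\le \sqrt{d + 1}\, \max\{ \|\psi\|_\infty \|\varphi\|_2,\ \|\psi\|_\infty \|\partial_1 \varphi\|_2 + \|\partial_1 \psi\|_\infty \|\varphi\|_2,\ \dots,\ \|\psi\|_\infty \|\partial_d \varphi\|_2 + \|\partial_d \psi\|_\infty \|\varphi\|_2 \} \\ &\ll_\psi \|\varphi\|_{H^1}, \end{aligned} \]
which shows that multiplication with $\psi$ continuously extends to a bounded operator from $H^1(\Omega)$ to $H_0^1(\Omega)$ by Proposition 2.46.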
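Next we need to study how $f$ and $\chi_\delta f$ for $f \in H_0^1(\Omega)$ and $\chi_\delta$ as in Lemma 4.34 differ as elements of $L^2(\Omega)$.

Lemma 4.36 (Estimates for $\chi_\delta f$). Let $\Omega \subseteq \mathbb{R}^d$ be open and bounded with smooth boundary. For $\delta > 0$ define $\partial_\delta\Omega = \{x \in \Omega \mid \operatorname{dist}(x, \partial\Omega) < \delta\}$. Then
\[
\|f|_{\partial_\delta\Omega}\|_{L^2(\partial_\delta\Omega)} \ll \delta\,\|f\|_{H_0^1}. \tag{4.10}
\]
Moreover, if we let $\chi_\delta$ be as in Lemma 4.34 for a sufficiently small $\delta$, then $f \in H_0^1(\Omega)$ implies that
\[
\|f - \chi_\delta f\|_{L^2} \ll \delta\,\|f\|_{H_0^1} \tag{4.11}
\]
and
\[
\|\chi_\delta f\|_{H_0^1} \ll \|f\|_{H_0^1}, \tag{4.12}
\]
where the implied constants only depend on $\Omega$ as in Lemma 4.34 but not on $\delta$.

Proof. By Definition 3.62 and the comment after it, we may find for every $z^{(0)} \in \partial\Omega$ some $\varepsilon > 0$, a rotated coordinate system denoted by $(x_1, \dots, x_{d-1}, y)$ with $z^{(0)} = (x^{(0)}, y^{(0)})$, and a $C^1$-smooth function $\psi$ so that
\[
\Omega \cap B_\varepsilon(z^{(0)}) = \{(x, y) \in B_\varepsilon(z^{(0)}) \mid y < \psi(x)\}.
\]
Without loss of generality (rotating the coordinates further if necessary), we may also assume that $\nabla\psi(x^{(0)}) = 0$, and (by decreasing $\varepsilon$ if necessary) that $\|\nabla\psi(x)\| \leq \frac{1}{10}$ for all $x \in B_{10\varepsilon}(x^{(0)})$ since $\psi$ is $C^1$. By compactness, $\partial\Omega$ can be covered by finitely many balls $B_{\varepsilon_i}(z^{(i)})$ with $\varepsilon_i > 0$ and $z^{(i)} \in \partial\Omega$, and with each ball having the properties above.

Let $B_{2\varepsilon}(z^{(0)}) \subseteq B_{10\varepsilon}(z^{(0)})$ with $\varepsilon > 0$ and $z^{(0)} \in \partial\Omega$ be one of the balls. We claim that
\[
\operatorname{dist}(z, \partial\Omega) \leq |\psi(x) - y| \leq \tfrac{11}{10}\operatorname{dist}(z, \partial\Omega) \tag{4.13}
\]
for all $z = (x, y) \in \Omega \cap B_{2\varepsilon}(z^{(0)})$ (where we use the function $\psi$ and the coordinate system as above). The first inequality is trivial, since $(x, \psi(x)) \in \partial\Omega$. For the second, let $(x', \psi(x')) \in \partial\Omega$ be chosen so that $\operatorname{dist}(z, \partial\Omega) = \|(x, y) - (x', \psi(x'))\|$, and notice that $\|x - x'\| \leq \operatorname{dist}(z, \partial\Omega)$ and
\[
|y - \psi(x)| \leq |y - \psi(x')| + |\psi(x') - \psi(x)| \leq \operatorname{dist}(z, \partial\Omega) + \tfrac{1}{10}\|x - x'\| \leq \tfrac{11}{10}\operatorname{dist}(z, \partial\Omega)
\]
by the assumption that $\|\nabla\psi\| \leq \frac{1}{10}$.

Now fix $\delta < \frac{10}{11}\varepsilon$ and define for any $\tau \in (0, \frac{11}{10}\delta)$ the function $\psi_\tau = \psi - \tau$. Applying Proposition 3.64 to $\psi_\tau|_{B_{2\varepsilon}(x^{(0)})}$ we get
\[
\int_{B_{2\varepsilon}(x^{(0)})} |f(x, \psi(x) - \tau)|^2 \, dx_1 \cdots dx_{d-1} \ll \tau\,\|f\|_{H^1}^2.
\]
Finally, we integrate over $\tau \in (0, \frac{11}{10}\delta)$ to get
\[
\int_0^{\frac{11}{10}\delta} \int_{B_{2\varepsilon}(x^{(0)})} |f(x, \psi(x) - \tau)|^2 \, dx_1 \cdots dx_{d-1} \, d\tau \ll \delta^2\,\|f\|_{H^1}^2.
\]
Together with (4.13) we see that
\[
\|f|_{\partial_\delta\Omega \cap B_{2\varepsilon}(z^{(0)})}\|_2^2 \ll \delta^2\,\|f\|_{H^1}^2.
\]
Now for $\delta \leq \min_i \varepsilon_i$ we have
\[
\partial_\delta\Omega \subseteq \bigcup_i B_{2\varepsilon_i}(z^{(i)}),
\]
and the first part (4.10) of the lemma follows.

Now note that (4.11) follows at once from the first part of the lemma and the properties of $\chi_\delta$ in Lemma 4.34:
\[
\|f - \chi_\delta f\|_2 \leq \|f|_{\partial_{2\delta}\Omega}\|_{L^2(\partial_{2\delta}\Omega)} \ll \delta\,\|f\|_{H_0^1}.
\]
For (4.12), note that $\|\chi_\delta f\|_2 \leq \|f\|_2 \leq \|f\|_{H_0^1}$, so that it remains to show a suitable inequality for $\partial_j(\chi_\delta f)$ for $j = 1, \dots, d$. Using the first part of the lemma again, together with the property of $\partial_j\chi_\delta$, we get
\[
\|\partial_j(\chi_\delta f)\|_2 \leq \|\chi_\delta\,\partial_j f\|_2 + \|(\partial_j\chi_\delta) f\|_2 \leq \|f\|_{H_0^1} + \|\partial_j\chi_\delta\|_\infty \|f|_{\partial_{2\delta}\Omega}\|_{L^2(\partial_{2\delta}\Omega)} \ll \|f\|_{H_0^1} + \frac{1}{\delta}\,\delta\,\|f\|_{H_0^1} \ll \|f\|_{H_0^1},
\]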
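which proves the lemma.

Proof of Proposition 4.31. We wish to show that
\[
K = \{f \in L^2(\Omega) \mid f = \iota(f) \in H_0^1 \text{ and } \|f\|_{H_0^1} \leq 1\}
\]
is a compact subset of $L^2(\Omega)$. Let $\varepsilon > 0$ and apply Lemma 4.34 to $\delta = \varepsilon$. By Lemma 4.36 every $f \in K$ lies within distance $\ll \varepsilon$ of $\chi_\delta f \in \chi_\delta K$. Choose a square $D = (a, b)^d \supseteq \Omega$. Then every $\chi_\delta f \in L^2(\Omega)$ can be considered as an element of $H_0^1(D)$ by Lemma 4.35 (notice that this would not be true for $f \in H_0^1(\Omega)$). By Lemma 4.36 we must have
\[
\chi_\delta K \subseteq B_M^{H_0^1(D)}(0)
\]
for some $M$. Moreover, the elements of $\chi_\delta K$ satisfy the support assumption of the functions in Lemma 4.32. It follows that $\chi_\delta K \subseteq L^2(D)$ has compact closure. In particular, $\chi_\delta K$ is totally bounded, so we can find an $\varepsilon$-dense finite set $\{\chi_\delta f_j \mid 1 \leq j \leq m\} \subseteq \chi_\delta K$ with respect to $\|\cdot\|_2$. Therefore, for any $f \in K$ there exists some $r \in \{1, \dots, m\}$ with $\|\chi_\delta f - \chi_\delta f_r\|_2 \leq \varepsilon$, which by Lemma 4.36 gives
\[
\|f - f_r\|_2 \leq \|f - \chi_\delta f\|_2 + \|\chi_\delta f - \chi_\delta f_r\|_2 + \|\chi_\delta f_r - f_r\|_2 \ll \varepsilon.
\]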
It follows that $K$ is totally bounded and hence compact.

Proof of Theorem 4.27. Let $S = \iota \circ \iota^* : L^2(\Omega) \to L^2(\Omega)$ be the self-adjoint operator from Proposition 4.28. By Proposition 4.31,
\[
\iota : H_0^1(\Omega) \to L^2(\Omega)
\]
is compact since $\Omega$ is open and bounded with smooth boundary. By Lemma 4.3 this shows that $S$ is compact also. Thus Theorem 4.22 applies, giving an orthonormal basis $f_1, f_2, \dots$ of eigenvectors and corresponding eigenvalues $\mu_1, \mu_2, \dots$ with $\mu_n \to 0$ as $n \to \infty$ and with $Sf_n = \mu_n f_n$. By Proposition 4.30 we have $\Delta(Sf_n) = -f_n$, so that $\Delta(\mu_n f_n) = -f_n$. It follows that $\mu_n \neq 0$ and $\Delta f_n = \lambda_n f_n$ with $\lambda_n = -\frac{1}{\mu_n}$, and so $|\lambda_n| \to \infty$ as $n \to \infty$. Since $S = \iota\iota^*$ we have
\[
\mu_n = \langle Sf_n, f_n\rangle = \langle \iota^* f_n, \iota^* f_n\rangle \geq 0,
\]
so that $\lambda_n \to -\infty$ as $n \to \infty$. It remains to show that $f_n$ is smooth in $\Omega$ and $\Delta f_n = \lambda_n f_n$ for all $n \geq 1$, and this is precisely the statement of Corollary 3.78.
There are very few examples of domains for which one can write down the eigenfunctions of the Laplace operator explicitly. Important exceptions are rectangles
\[
\Omega = (0, a_1) \times (0, a_2) \times \cdots \times (0, a_d)
\]
and balls
\[
\Omega = B_M^{\mathbb{R}^d}(0).
\]
On the disc the eigenfunctions are Bessel functions(4).
Exercise 4.37. Let $\Omega = (0, a_1) \times \cdots \times (0, a_d)$. Show that the functions $f_n$ arising as eigenfunctions of the Laplace operator can be chosen to take the form
\[
f_n(x) = \sin(\alpha_n^{(1)} x_1) \cdots \sin(\alpha_n^{(d)} x_d).
\]
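Proof of Lemma 4.34. We will make use of the cover constructed in the proof of Lemma 4.36: $\partial\Omega$ is covered by finitely many balls
\[
B_{\varepsilon_1}(z^{(1)}), \dots, B_{\varepsilon_k}(z^{(k)}) \tag{4.14}
\]
with $z^{(i)} \in \partial\Omega$. For each of the balls, which we denote $B_\varepsilon(z^{(0)})$, there is a rotated coordinate system and a $C^1$-smooth function $\psi : B_{10\varepsilon}(x^{(0)}) \to \mathbb{R}$ with
\[
\Omega \cap B_{10\varepsilon}(z^{(0)}) = \{(x, y) \in B_{10\varepsilon}(z^{(0)}) \mid y < \psi(x)\}
\]
and the estimate (4.13) holds for all $(x, y) \in \Omega \cap B_{2\varepsilon}(z^{(0)})$. Now choose $\eta \in C^\infty(\mathbb{R})$ to be a function satisfying
\[
\eta(t) \begin{cases} = 0 & \text{if } t \leq 0; \\ \in (0, 1) & \text{if } t \in (0, 1); \\ = 1 & \text{if } t \geq 1. \end{cases}
\]
(Footnote: Such a function can readily be constructed explicitly, starting for example from the smooth function $\eta_0(t) = 0$ if $t \leq 0$ and $\eta_0(t) = e^{-1/t}$ if $t > 0$.)

Now cover $\mathbb{R}^d$ by the sets $\Omega$, $\mathbb{R}^d \setminus \overline{\Omega}$, and the balls $B_{\varepsilon_1}(z^{(1)}), \dots, B_{\varepsilon_k}(z^{(k)})$. For this cover there exists a partition of unity (see Lemma A.24 in Appendix A), that is, a collection of continuous functions $\rho_\Omega$, $\rho_{\mathbb{R}^d \setminus \overline{\Omega}}$ and $\rho_i$ for $1 \leq i \leq k$ with each function supported on its corresponding set and with
\[
\rho_\Omega + \rho_{\mathbb{R}^d \setminus \overline{\Omega}} + \sum_{i=1}^k \rho_i = 1.
\]
This may be improved to a smooth partition of unity $\rho_\Omega$, $\rho_{\mathbb{R}^d \setminus \overline{\Omega}}$ and so on by replacing each function with its convolution with a smooth function $\kappa \geq 0$ with $\operatorname{Supp}\kappa \subseteq B_{\gamma/2}(0)$ and $\int\kappa = 1$, where $\gamma$ is a Lebesgue number for the cover. (Footnote: That is, some $\gamma > 0$ with the property that $B_\gamma(x)$ lies in some element of the cover for every $x$.)

Using this smooth partition of unity, we can now give a formula for $\chi_\delta(x)$ as a function of $\delta > 0$ and $x \in \mathbb{R}^d$. Within any of the balls $B_{\varepsilon_i}(z^{(i)})$ the function can be written down using the rotated coordinate system and $\psi$. In fact, in this case we can define
\[
\chi_\delta^{(i)}(x, y) = \eta\Big(\tfrac{10}{9\delta}\big(\psi(x) - y - \tfrac{11}{10}\delta\big)\Big)
\]
for $(x, y) \in B_{\varepsilon_i}(z^{(i)})$. By construction $\chi_\delta^{(i)} \in C^\infty(B_{\varepsilon_i}(z^{(i)}))$ and
\[
\max_{1 \leq j \leq d} \|\partial_j\chi_\delta^{(i)}\|_\infty \ll \frac{1}{\delta}
\]
for fixed $\eta$ and $\psi$. If now $(x, y) \in \Omega$ and $\operatorname{dist}((x, y), \partial\Omega) \leq \delta$, then $|y - \psi(x)| \leq \frac{11}{10}\delta$ by (4.13), and so
\[
\psi(x) - y - \tfrac{11}{10}\delta \leq 0,
\]
which also holds if $(x, y) \in B_{\varepsilon_i}(z^{(i)}) \setminus \Omega$. Hence $\chi_\delta^{(i)}(x, y) = 0$ in these cases. Suppose therefore that $(x, y) \in \Omega$ and $\operatorname{dist}((x, y), \partial\Omega) \geq 2\delta$. In this case
\[
\psi(x) - y - \tfrac{11}{10}\delta \geq \tfrac{9}{10}\delta,
\]
and so $\chi_\delta^{(i)}(x, y) = 1$. Now define
\[
\chi_\delta = \rho_\Omega + \sum_{i=1}^k \rho_i\,\chi_\delta^{(i)}.
\]
(Footnote: For $x \notin B_{\varepsilon_i}(z^{(i)})$ we have $\rho_i(x) = 0$ and so it does not matter that $\chi_\delta^{(i)}$ is not defined there; we simply define the product $\rho_i\chi_\delta^{(i)}$ to be zero. Thus $\chi_\delta$ is defined on $\mathbb{R}^d$.)

We claim that $\chi_\delta$ satisfies all the properties claimed in the lemma. Indeed,
\[
\partial_j\chi_\delta = \partial_j\rho_\Omega + \sum_{i=1}^k (\partial_j\rho_i)\,\chi_\delta^{(i)} + \sum_{i=1}^k \rho_i\,\partial_j\chi_\delta^{(i)}
\]
has $\|\partial_j\chi_\delta\|_\infty \ll \frac{1}{\delta}$ since $|\partial_j\chi_\delta^{(i)}(x)| \ll \frac{1}{\delta}$ and $|\chi_\delta^{(i)}(x)| \leq 1$ wherever this is defined, and the remaining terms are independent of $\delta$. If $x \notin \Omega$ then $\chi_\delta(x) = 0$ as each term vanishes in this case. If $x \in \Omega$ and $\operatorname{dist}(x, \partial\Omega) \leq \delta$ then $\rho_\Omega(x) = 0$ if $\delta < \gamma/2$ and $\chi_\delta^{(i)}(x)$ is either zero or is undefined for all $i$, so (by our convention) $\chi_\delta(x) = 0$. If $x \in \Omega$ and $\operatorname{dist}(x, \partial\Omega) \geq 2\delta$ then $\chi_\delta^{(i)}(x) = 1$ or is undefined, so
\[
\chi_\delta(x) = \rho_\Omega(x) + \sum_{i=1}^k \chi_\delta^{(i)}(x)\rho_i(x) = \rho_\Omega(x) + \sum_{i=1}^k \rho_i(x) + \rho_{\mathbb{R}^d \setminus \overline{\Omega}}(x) = 1.
\]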
5 Uniform Boundedness and Open Mapping Theorem
In this chapter we present the main consequences of completeness for Banach spaces.
5.1 Uniform Boundedness

Our first result is the principle of uniform boundedness or the Banach–Steinhaus theorem.

Theorem 5.1 (Banach–Steinhaus). Let $X$ be a Banach space and let $Y$ be a normed vector space. Let $\{T_\alpha \mid \alpha \in A\}$ be a family of bounded linear operators from $X$ to $Y$. Suppose that for each $x \in X$, the set $\{T_\alpha x \mid \alpha \in A\}$ is a bounded subset of $Y$. Then the set $\{\|T_\alpha\| \mid \alpha \in A\}$ is bounded.

Proof. Assume first that there is a ball $B_\varepsilon(x_0)$ on which $\{T_\alpha x \mid \alpha \in A\}$ is uniformly bounded: that is, there is a constant $K$ such that
\[
\|x - x_0\| < \varepsilon \implies \|T_\alpha x\| \leq K. \tag{5.1}
\]
Then it is possible to find a uniform bound on the whole family $\{\|T_\alpha\| \mid \alpha \in A\}$ of the norms of the operators. For any $y \neq 0$ define
\[
z = \frac{\varepsilon}{2\|y\|}\,y + x_0.
\]
Then $z \in B_\varepsilon(x_0)$ by construction, so (5.1) implies that $\|T_\alpha z\| \leq K$. Now by linearity of $T_\alpha$ the triangle inequality shows that
\[
\frac{\varepsilon}{2\|y\|}\|T_\alpha y\| - \|T_\alpha x_0\| \leq \Big\|\frac{\varepsilon}{2\|y\|}T_\alpha y + T_\alpha x_0\Big\| = \|T_\alpha z\| \leq K,
\]
which can be solved for $\|T_\alpha y\|$ to give
\[
\|T_\alpha y\| \leq \frac{2(K + \|T_\alpha x_0\|)}{\varepsilon}\|y\| \leq \frac{2(K + K')}{\varepsilon}\|y\|,
\]
where $K' = \sup_\alpha \|T_\alpha x_0\| < \infty$. It follows that
\[
\|T_\alpha\| \leq \frac{2(K + K')}{\varepsilon}
\]
as required.

To finish the proof we have to show that there is a ball on which property (5.1) holds. This is proved by contradiction. Assume that there is no ball on which (5.1) holds. Fix an arbitrary ball $B_0$. By assumption there is a point $x_1 \in B_0$ such that $\|T_{\alpha_1}x_1\| > 1$ for some index $\alpha_1 \in A$ say. Since each $T_\alpha$ is continuous, there is a ball $B_{\varepsilon_1}(x_1) \subseteq B_0$ with $\|T_{\alpha_1}(y)\| > 1$ for all $y \in B_{\varepsilon_1}(x_1)$. Assume without loss of generality that $\varepsilon_1 < 1$. By assumption, in this new ball the family $\{T_\alpha x \mid \alpha \in A\}$ is not bounded, so there is a point $x_2 \in B_{\varepsilon_1}(x_1)$ with $\|T_{\alpha_2}x_2\| > 2$ for some index $\alpha_2 \in A$. We continue in the same way. By continuity of $T_{\alpha_2}$ there is a ball $B_{\varepsilon_2}(x_2) \subseteq B_{\varepsilon_1}(x_1)$ with $\|T_{\alpha_2}y\| > 2$ for all $y \in B_{\varepsilon_2}(x_2)$. Assume without loss of generality that $\varepsilon_2 < \frac{1}{2}$.

Repeating this process produces points $x_1, x_2, \dots$, indices $\alpha_1, \alpha_2, \dots$, and positive numbers $\varepsilon_1, \varepsilon_2, \dots$ such that $B_{\varepsilon_n}(x_n) \subseteq B_{\varepsilon_{n-1}}(x_{n-1})$, $\varepsilon_n < \frac{1}{n}$, and $\|T_{\alpha_n}y\| > n$ for all $y \in B_{\varepsilon_n}(x_n)$, for all $n \geq 1$. Now the sequence $(x_n)$ is clearly Cauchy (since $x_m \in B_{\varepsilon_n}(x_n)$ for $m \geq n$, and so $d(x_m, x_n) < 2\varepsilon_n < 2/n$), and therefore converges to some $z \in X$. By construction, $z$ lies in the closure of $B_{\varepsilon_n}(x_n)$, so by continuity $\|T_{\alpha_n}z\| \geq n$ for all $n \geq 1$, which contradicts the hypothesis that the set $\{T_\alpha z \mid \alpha \in A\}$ is bounded.

Corresponding to the operator norm defined in Lemma 2.39 there is a notion of convergence in the space $B(X, Y)$ of bounded linear operators from $X$ to $Y$. A sequence $(T_n)$ in $B(X, Y)$ is uniformly convergent to $T \in B(X, Y)$ if $\|T_n - T\| \to 0$ as $n \to \infty$ (so uniform convergence of a sequence of operators is simply convergence in the operator norm). A different (and weaker, despite the name) notion of convergence for a sequence of operators is given by the following definition. We will discuss this notion of convergence again in Section 7.3.

Definition 5.2. A sequence $(T_n)$ in $B(X, Y)$ is strongly convergent if, for any $x \in X$, the sequence $(T_n x)$ converges in $Y$. If there is a $T \in B(X, Y)$ with $\lim_{n\to\infty} T_n x = Tx$ for all $x \in X$, then $(T_n)$ is strongly convergent to $T$.
Corollary 5.3. Let $X$ be a Banach space, and $Y$ any normed vector space. If a sequence $(T_n)$ in $B(X, Y)$ is strongly convergent, then there exists $T \in B(X, Y)$ such that $(T_n)$ is strongly convergent to $T$.

Proof. For each $x \in X$ the sequence $(T_n x)$ is bounded since it is convergent. By the uniform boundedness principle (Theorem 5.1), there is a constant $K$ such that $\|T_n\| \leq K$ for all $n$. Hence
\[
\|T_n x\| \leq K\|x\| \quad \text{for all } x \in X. \tag{5.2}
\]
Define $T$ by requiring that $Tx = \lim_{n\to\infty} T_n x$ for all $x \in X$. It is clear that $T$ is linear, and (5.2) shows that $\|Tx\| \leq K\|x\|$ for all $x \in X$, showing that $T$ is bounded. The construction of $T$ means that $(T_n)$ converges strongly to $T$.
Exercise 5.4. Prove that uniform convergence implies strong convergence, and find an example to show that strong convergence does not imply uniform convergence.
5.1.1 Uniform Boundedness and Fourier Series

This section is an application of Theorem 5.1 to classical Fourier analysis on $\mathbb{T}$ (see Section 3.2 for the background). Recall that if $f \in C(\mathbb{T})$ then the Fourier coefficients of $f$ are defined by $a_m = \langle f, \chi_m\rangle$ where $\chi_m(x) = e^{2\pi i m x}$, for $m \in \mathbb{Z}$. The $n$th partial sum of the Fourier series is
\[
s_n(x) = \sum_{m=-n}^{n} a_m e^{2\pi i m x}.
\]
Recall that one of the basic questions of Fourier analysis is to clarify the relationship between $(s_n)$ and $f$. That is, to understand in what sense the function $s_n$ approximates $f$ for large $n$ (if it does at all). We now ask if the sequence of functions $(s_n)$ converges uniformly or pointwise to $f$ for $f \in C(\mathbb{T})$. Recall from Definition 3.15 that the Dirichlet kernel $D_n$ is defined by
\[
D_n(x) = \sum_{j=-n}^{n} e^{2\pi i j x} = \frac{\sin\big((n + \tfrac{1}{2})2\pi x\big)}{\sin \pi x}.
\]
By the discussion in Section 3.2.2,
\[
s_n(y) = \int_0^1 f(y - x) D_n(x)\,dx.
\]
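The two expressions for the Dirichlet kernel above, and the size of its $L^1$ norm, can be explored numerically. The following is a minimal sketch, not part of the original notes: the function names and the midpoint-rule quadrature are illustrative assumptions. It compares the exponential sum with the closed form and approximates $\int_0^1 |D_n(x)|\,dx$ for a few values of $n$.

```python
import cmath
import math


def dirichlet_sum(n: int, x: float) -> float:
    """D_n(x) as the exponential sum over j = -n, ..., n (the imaginary parts cancel)."""
    return sum(cmath.exp(2j * math.pi * j * x) for j in range(-n, n + 1)).real


def dirichlet_closed(n: int, x: float) -> float:
    """D_n(x) via the closed form sin((n + 1/2) 2 pi x) / sin(pi x), for x not an integer."""
    return math.sin((n + 0.5) * 2 * math.pi * x) / math.sin(math.pi * x)


def l1_norm(n: int, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of |D_n| over [0, 1] (illustrative quadrature)."""
    h = 1.0 / steps
    # Midpoints (k + 1/2) h never hit the integers 0 or 1, so the closed form is safe here.
    return h * sum(abs(dirichlet_closed(n, (k + 0.5) * h)) for k in range(steps))


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, l1_norm(n))
```

In such an experiment the approximate norms increase slowly with $n$, which is the behaviour that Lemma 5.6 makes precise.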
Lemma 5.5. The linear functional $T_n : X \to \mathbb{R}$ defined by
\[
T_n(f) = \int_0^1 f(x) D_n(x)\,dx
\]
is bounded, with
\[
\|T_n\| = \int_0^1 |D_n(x)|\,dx.
\]
This is a very special case of the general argument in Lemma 2.51, but we include it for the case at hand as this is easier to prove.

Proof. For any function $f \in X$ we have
\[
|T_n(f)| \leq \int_0^1 |f(x)||D_n(x)|\,dx \leq \|f\|_\infty \int_0^1 |D_n(x)|\,dx,
\]
so
\[
\|T_n\| \leq \int_0^1 |D_n(x)|\,dx.
\]
Fix $\varepsilon > 0$. Then, since $|D_n(x)| \leq M_n$ is bounded for each $n \geq 1$, we may find a continuous (this could be chosen to be piecewise-linear for example) function $f_n$ with $\|f_n\|_\infty \leq 1$ that differs from $\operatorname{sign}(D_n(x))$ on a finite union of intervals whose total length is less than $\frac{1}{M_n}\varepsilon$. Since $|f_n - \operatorname{sign}(D_n)| \leq 2$ on these intervals, the triangle inequality for integrals now gives
\[
\int_0^1 f_n(x) D_n(x)\,dx > \int_0^1 |D_n(x)|\,dx - 2\varepsilon,
\]
which proves the lemma as $\varepsilon > 0$ was arbitrary.

Lemma 5.6. The Dirichlet kernel $D_n$ from Definition 3.15 satisfies
\[
\int_0^1 |D_n(x)|\,dx = \int_0^1 \left|\frac{\sin\big((n + \tfrac{1}{2})2\pi x\big)}{\sin \pi x}\right| dx \longrightarrow \infty
\]
as $n \to \infty$.

Proof. Recall that $|\sin t| \leq |t|$ for all $t \in \mathbb{R}$. It follows that
\[
\int_0^1 |D_n(x)|\,dx \geq \int_0^1 \frac{1}{\pi x}\,|\sin((2n+1)\pi x)|\,dx.
\]
Now $|\sin t| \geq \frac{1}{2}$ for all $t \in \pi\mathbb{Z} + [\frac{\pi}{6}, \frac{5\pi}{6}]$. In particular, it follows that if
\[
(2n+1)\pi x \in \bigcup_{k=1}^{2n} \big[(k + \tfrac{1}{6})\pi, (k + \tfrac{5}{6})\pi\big]
\]
then $|\sin((2n+1)\pi x)| \geq \frac{1}{2}$. Together this gives
\[
\int_0^1 |D_n(x)|\,dx \geq \sum_{k=1}^{2n} \int_{(k+\frac{1}{6})/(2n+1)}^{(k+\frac{5}{6})/(2n+1)} \frac{1}{\pi x}\cdot\frac{1}{2}\,dx \geq \sum_{k=1}^{2n} \frac{1}{2\pi}\cdot\frac{2n+1}{k + \frac{5}{6}}\cdot\frac{4/6}{2n+1} = \frac{1}{3\pi}\sum_{k=1}^{2n} \frac{1}{k + \frac{5}{6}} \longrightarrow \infty
\]
as $n \to \infty$.

Theorem 5.7. There exists a continuous function $f \in C(\mathbb{T})$ whose Fourier series diverges at $x = 0$.

Proof. By the discussion in Section 3.2, we have $T_n(f) = s_n(0)$ for all $f \in C(\mathbb{T})$. Moreover, for a fixed $f \in C(\mathbb{T})$, if the Fourier series of $f$ converges at $0$, then the family $\{T_n f \mid n \geq 1\}$ is bounded (since each element is just a partial sum of a convergent series). Thus if the Fourier series of $f$ converges at $0$ for all $f \in C(\mathbb{T})$, then for each $f \in C(\mathbb{T})$ the set $\{T_n f \mid n \geq 1\}$ is bounded. By Theorem 5.1, this implies that the set $\{\|T_n\| \mid n \geq 1\}$ is bounded, which contradicts Lemmas 5.5 and 5.6. It follows that there must be some $f \in C(\mathbb{T})$ whose Fourier series does not converge at $0$.

In principle the proofs of Theorem 5.1 and Theorem 5.7 allow one to construct more concretely the function $f$ as in Theorem 5.7, at least as the limit of a concrete Cauchy sequence of continuous functions. Comparing Theorem 5.7 with Theorem 3.10 (resp. Theorem 3.44) and Proposition 3.19, we see that this limit function is not continuously differentiable (respectively, not in $H^1(\mathbb{T})$), and that the partial sums $(s_n(0))$ of its Fourier series at $0$ form an oscillating, divergent sequence whose Cesàro averages nevertheless converge to $f(0)$.

5.2 Open Mapping and Closed Graph Theorems
Recall that a continuous map has the property that the preimage of any open set is open, but in general the image of an open set is not open. We now show that bounded linear maps between Banach spaces on the other hand have the following special property. Theorem 5.8 (Open mapping theorem). Let X and Y be Banach spaces, and let T be a bounded linear map from X onto Y . Then T maps open sets in X onto open sets in Y .
The assumption that $T$ maps $X$ onto $Y$ is essential. For example, the projection $(x, y) \mapsto (x, 0)$ from $\mathbb{R}^2 \to \mathbb{R}^2$ is bounded and linear, but not onto, and certainly cannot send open sets to open sets. The proof of Theorem 5.8 uses the Baire category theorem(5), which states that a complete metric space cannot be written as a countable union of nowhere dense subsets.

5.2.1 Baire Category

A subset $S \subseteq X$ of a metric space $(X, d)$ is said to be nowhere dense if for every point $x \in \overline{S}$, and for every $\varepsilon > 0$, $B_\varepsilon(x) \cap (X \setminus \overline{S})$ is nonempty (equivalently, if $(\overline{S})^o = \emptyset$). A set is called meager or first category if it is a countable union of nowhere dense sets. The next result is a version of the Baire category theorem.

Theorem 5.9 (Baire category theorem). A complete metric space cannot be written as a countable union of nowhere dense sets. Indeed, the complement of a countable union of nowhere dense sets is dense.

This is often described by saying that a complete metric space is of second category.

Proof of Theorem 5.9. Let $X$ be a complete metric space, and suppose that $(X_j)$ is a sequence of nowhere dense subsets of $X$ (that is, the sets $\overline{X_j}$ all have empty interior for $j = 1, 2, \dots$). Fix an arbitrary ball $B_\varepsilon(x_0)$ with $\varepsilon > 0$ and $x_0 \in X$. Since $\overline{X_1}$ does not contain $B_\varepsilon(x_0)$, there must be a point $x_1 \in B_\varepsilon(x_0)$ with $x_1 \notin \overline{X_1}$. It follows that there is an open ball $B_{r_1}(x_1)$ with $\overline{B_{r_1}(x_1)} \subseteq B_\varepsilon(x_0)$ and with $\overline{B_{r_1}(x_1)} \cap X_1 = \emptyset$. Assume without loss of generality that $r_1 < 1$. Similarly, there is a point $x_2$ and a radius $r_2 > 0$ such that $\overline{B_{r_2}(x_2)} \subseteq B_{r_1}(x_1)$, and $\overline{B_{r_2}(x_2)} \cap X_2 = \emptyset$, and without loss of generality $r_2 < \frac{1}{2}$. Notice that $\overline{B_{r_2}(x_2)} \cap X_1 = \emptyset$ automatically since $\overline{B_{r_2}(x_2)} \subseteq B_{r_1}(x_1)$. Inductively, we construct a sequence of decreasing closed balls $\overline{B_{r_n}(x_n)}$ such that $\overline{B_{r_n}(x_n)} \cap X_j = \emptyset$ for $1 \leq j \leq n$, and $r_n \to 0$ as $n \to \infty$. It follows that $(x_n)$ is a Cauchy sequence, and the limit $z$ lies in the intersection of all the closed balls $\overline{B_{r_n}(x_n)}$, so $z \notin X_j$ for all $j \geq 1$. This implies that
\[
z \in B_\varepsilon(x_0) \setminus \bigcup_{j \geq 1} X_j \neq \emptyset,
\]
which gives the result since $\varepsilon > 0$ and $x_0 \in X$ were arbitrary.

Exercise 5.10. Prove the Baire category theorem for compact Hausdorff topological spaces (that is, without the assumption that the space is metric).
By taking complements we can also phrase the Baire category theorem in terms of $G_\delta$-sets.

Definition 5.11. A countable intersection of open sets in a topological space is called a $G_\delta$-set.

Corollary 5.12 (Baire category theorem). Let $(X, d)$ be a complete metric space, and assume that $G_n \subseteq X$ is a dense $G_\delta$-set for each $n \geq 1$. Then the intersection
\[
\bigcap_{n=1}^\infty G_n
\]
is also a dense $G_\delta$-set.

Proof. By assumption we can write each $G_n$ in the form
\[
G_n = \bigcap_{k=1}^\infty O_{n,k}
\]
where each $O_{n,k}$ is open and dense. It follows that
\[
\bigcap_{n=1}^\infty G_n = \bigcap_{n=1}^\infty \bigcap_{k=1}^\infty O_{n,k}
\]
is a $G_\delta$-set, and also that it is sufficient to consider the case where each $G_n = O_n$ is open and dense. In that case, $X_n = X \setminus G_n$ is closed and nowhere dense for each $n \geq 1$. That is, for every $x \in X$ and $\varepsilon > 0$ we have
\[
B_\varepsilon(x) \cap G_n = B_\varepsilon(x) \cap (X \setminus X_n) \neq \emptyset.
\]
By Theorem 5.9 there exists, for every open ball $B_\varepsilon(x)$, some
\[
y \in B_\varepsilon(x) \setminus \bigcup_{n=1}^\infty X_n,
\]
or equivalently
\[
y \in B_\varepsilon(x) \cap \bigcap_{n=1}^\infty G_n.
\]
This shows that $\bigcap_{n=1}^\infty G_n$ is dense.
Exercise 5.13. Prove the Banach–Steinhaus theorem (Theorem 5.1) using the Baire category theorem (Theorem 5.9).

Let us mention that the notion of a dense $G_\delta$-set is the topological version of being a large set, while a set is measure-theoretically large if its complement is a null set. Both notions of being large share similar features, and in particular a countable intersection of large sets in either sense is also large(6). However, these two notions are quite different. Example 5.14 shows how to construct topologically large sets that are measure-theoretically small, and vice versa.
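Example 5.14. For every $\varepsilon > 0$ there exists an open set $O_\varepsilon \subseteq \mathbb{R}$ which contains $\mathbb{Q}$ and has Lebesgue measure less than $\varepsilon$. This may be found, for example, by listing the elements of $\mathbb{Q}$ as $\{x_1, x_2, \dots\}$ and setting
\[
O_\varepsilon = \bigcup_{k \geq 1} B_{\varepsilon/2^{k+2}}(x_k).
\]
Then $G = \bigcap_{n \geq 1} O_{1/n}$ is a dense $G_\delta$ and a null set, and its complement $\mathbb{R} \setminus G$ is meager and a set of full measure.

5.2.2 Proof of Open Mapping Theorem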
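Recall that we write $B_r^X$ and $B_r^Y$ for the open balls of radius $r$ and center $0$ in $X$ and $Y$ respectively.

Lemma 5.15. Assume that $T : X \to Y$ is a bounded, surjective linear map, $X$ is a normed vector space, and $Y$ is a Banach space. For any $\varepsilon > 0$, there is a $\delta > 0$ such that
\[
B_\delta^Y \subseteq \overline{T B_\varepsilon^X}. \tag{5.3}
\]

Proof. Since
\[
X = \bigcup_{n=1}^\infty n B_\varepsilon^X
\]
and $T$ is onto, we have $Y = T(X) = \bigcup_{n=1}^\infty nTB_\varepsilon^X$. By the Baire category theorem (Theorem 5.9 applied to $Y$) it follows that, for some $n$, the set $\overline{nTB_\varepsilon^X}$ contains some ball $B_r^Y(z)$ in $Y$. Then, by linearity, $\overline{TB_\varepsilon^X}$ must contain the ball $B_\delta^Y(y)$, where $y = \frac{1}{n}z$ and $\delta = \frac{1}{n}r$. It follows that the set
\[
\{y_1 - y_2 \mid y_1 \in B_\delta^Y(y),\ y_2 \in B_\delta^Y(y)\} = B_{2\delta}^Y
\]
is contained in the closure $\overline{TQ}$, where
\[
Q = \{x_1 - x_2 \mid x_1 \in B_\varepsilon^X,\ x_2 \in B_\varepsilon^X\} = B_{2\varepsilon}^X.
\]
Thus, $B_{2\delta}^Y \subseteq \overline{TB_{2\varepsilon}^X}$ and (5.3) follows.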
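The above lemma only used that $Y$ is a Banach space. Using that also $X$ is a Banach space we are able to prove the main step towards the theorem in the following lemma.

Lemma 5.16. Let $T : X \to Y$ be as in Theorem 5.8. For any $\varepsilon > 0$ there is a $\delta > 0$ such that
\[
B_\delta^Y \subseteq T B_\varepsilon^X. \tag{5.4}
\]

Proof. Choose a sequence $(\varepsilon_n)$ with each $\varepsilon_n > 0$ and with $\sum_{n=1}^\infty \varepsilon_n < \varepsilon$. By Lemma 5.15 there is a sequence $(\delta_n)$ of positive numbers such that
\[
B_{\delta_n}^Y \subseteq \overline{T B_{\varepsilon_n}^X} \tag{5.5}
\]
for all $n \geq 1$. Without loss of generality, assume that $\delta_n \to 0$ as $n \to \infty$. (Actually this holds automatically unless $Y$ is very special indeed.) Now let $\delta = \delta_1$.

Let $y$ be any point in $B_\delta^Y = B_{\delta_1}^Y$. By (5.5) there is a point $x_1 \in B_{\varepsilon_1}^X$ such that $Tx_1$ is as close to $y$ as we wish, say with $\|y - Tx_1\| < \delta_2$. Since $(y - Tx_1) \in B_{\delta_2}^Y$, (5.5) with $n = 2$ implies that there exists a point $x_2 \in B_{\varepsilon_2}^X$ such that $\|y - Tx_1 - Tx_2\| < \delta_3$. Continuing, we obtain a sequence $(x_n)$ in $X$ such that $\|x_n\| < \varepsilon_n$ for all $n$, and
\[
\Big\|y - T\Big(\sum_{k=1}^n x_k\Big)\Big\| < \delta_{n+1}. \tag{5.6}
\]
Since $\|x_n\| < \varepsilon_n$, the series $\sum_n x_n$ is absolutely convergent, so by Lemma 2.22 it is convergent; write $x = \sum_n x_n$. Then
\[
\|x\| \leq \sum_{n=1}^\infty \|x_n\| \leq \sum_{n=1}^\infty \varepsilon_n < \varepsilon.
\]
The map $T$ is continuous, so (5.6) shows that $y = Tx$, since $\delta_n \to 0$. That is, for any $y \in B_\delta^Y$ we have found a point $x \in B_\varepsilon^X$ such that $Tx = y$, proving (5.4).

Proof of Theorem 5.8. Let $G \subseteq X$ be open, and let $y = Tx$ with $x \in G$. Since $G$ is open, there is a ball $B_\varepsilon^X$ such that $x + B_\varepsilon^X \subseteq G$. By Lemma 5.16, $T(B_\varepsilon^X) \supseteq B_\delta^Y$ for some $\delta > 0$. Hence
\[
T(G) \supseteq T(x + B_\varepsilon^X) = T(x) + T(B_\varepsilon^X) \supseteq y + B_\delta^Y,
\]
so $T(G)$ contains a ball around each of its points and is therefore open.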
5.2.3 Consequences: Bounded Inverses and Closed Graphs

As an application of Theorem 5.8, we establish a general property of inverse maps. While we generally write $T : X \to Y$ to mean that $T$ is defined on all of $X$, it is sometimes convenient to permit $T$ to be only defined on a domain $D_T$ which is then a (possibly proper) subspace of $X$.
This argument may be paraphrased as follows. At each stage we approximate the current element $y - T\sum_{k=1}^{n-1} x_k$ in $Y$ up to an error $\delta_{n+1}$ that we know can be dealt with later. This pushes the problem along until it ultimately vanishes in the limit.
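The successive-approximation idea paraphrased here (correct the current residual only approximately, then repeat) can be imitated in finite dimensions. The sketch below is an illustration under stated assumptions and is not taken from the notes: the matrix `T`, the crude approximate inverse `diagonal_guess`, and the function names are all hypothetical choices. Each step picks $x_n$ so that $Tx_n$ is only close to the current residual, and the corrections are summed.

```python
def mat_vec(a, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def norm(v):
    """Euclidean norm of a vector."""
    return sum(t * t for t in v) ** 0.5


def solve_by_corrections(t, approx_inverse, y, steps=60):
    """Build x = x_1 + x_2 + ... where each x_n only approximately removes the residual."""
    x = [0.0] * len(y)
    residual = list(y)  # plays the role of y - T(x_1 + ... + x_{n-1})
    history = []
    for _ in range(steps):
        x_n = approx_inverse(residual)  # T x_n is close to, not equal to, the residual
        x = [a + b for a, b in zip(x, x_n)]
        residual = [r - s for r, s in zip(residual, mat_vec(t, x_n))]
        history.append(norm(residual))
    return x, history


# Hypothetical data: a 2x2 matrix, with an approximate inverse that ignores off-diagonal entries.
T = [[4.0, 1.0], [2.0, 5.0]]


def diagonal_guess(r):
    return [r[0] / T[0][0], r[1] / T[1][1]]


if __name__ == "__main__":
    x, history = solve_by_corrections(T, diagonal_guess, [1.0, 2.0])
    print(x, history[-1])
```

For this particular choice the residual shrinks by at least a factor of two at each step, so the corrections form an absolutely convergent series, mirroring the role of $\sum_n \varepsilon_n < \varepsilon$ in the proof.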
Definition 5.17. Let $T : X \to Y$ be an injective linear operator. Define the inverse of $T$, denoted $T^{-1}$, by requiring that $T^{-1}y = x$ if and only if $Tx = y$. Then the domain of $T^{-1}$ is the linear subspace $TX \subseteq Y$, and $T^{-1}$ is a linear operator on its domain.

Clearly $T^{-1}Tx = x$ for all $x \in X$, and $TT^{-1}y = y$ for all $y$ in the domain of $T^{-1}$. We also say that $T^{-1}$ is a left-inverse of $T$.

Proposition 5.18 (Bounded inverse). Let $X$ and $Y$ be Banach spaces, and let $T$ be a bijective bounded operator from $X$ to $Y$. Then $T^{-1}$ is a bounded operator also.

Proof. Since $T^{-1}$ is a linear operator, we only need to show it is continuous (which is equivalent to boundedness by Lemma 2.39). By Theorem 5.8, $T$ maps open sets onto open sets. For the map $T^{-1}$ this shows that the preimage $(T^{-1})^{-1}(O) = T(O)$ of an open set $O \subseteq X$ is open in $Y$. Therefore, $T^{-1}$ is continuous.

Corollary 5.19 (Equivalent norms). If $X$ is a Banach space with respect to two norms $\|\cdot\|_{(1)}$ and $\|\cdot\|_{(2)}$ and there is a constant $K$ such that
\[
\|x\|_{(1)} \leq K\|x\|_{(2)}
\]
for all $x \in X$, then the two norms are equivalent. That is, there is another constant $K' > 0$ with $\|x\|_{(2)} \leq K'\|x\|_{(1)}$ for all $x \in X$.

Proof. Consider the map $T : x \mapsto x$ from $(X, \|\cdot\|_{(2)})$ to $(X, \|\cdot\|_{(1)})$. By assumption, $T$ is bounded, so by Proposition 5.18, $T^{-1}$ is also bounded, giving the bound in the other direction.

Definition 5.20. Let $T$ be a linear operator from a normed linear space $X$ into a normed vector space $Y$, with domain a linear subspace $D_T \subseteq X$. The graph of $T$ is the set
\[
G_T = \{(x, Tx) \mid x \in D_T\} \subseteq X \times Y.
\]
If $G_T$ is a closed set in $X \times Y$ then $T$ is a closed operator.

Notice as usual that this notion becomes trivial in finite dimensions in the following sense. If $X$ and $Y$ are finite-dimensional, then the graph of $T$ is simply some linear subspace, which is automatically closed. Also it is easy to see that a continuous operator has a closed graph. The next theorem, the converse, is called the closed graph theorem.
We note that the converse is not a purely topological fact. For instance the set consisting of the hyperbola xy = 1 and the origin is the closed graph of a discontinuous function f : R R.
Theorem 5.21 (Closed graph theorem). Let $X$ and $Y$ be Banach spaces, and $T : X \to Y$ a linear operator with $D_T = X$. If $T$ is closed, then it is continuous.

Proof. Fix the norm $\|(x, y)\| = \|x\|_X + \|y\|_Y$ on $X \times Y$. The graph $G_T$ is, by linearity of $T$, a closed linear subspace in $X \times Y$, so $G_T$ is itself a Banach space. Consider the projection $P : G_T \to X$ defined by $P(x, Tx) = x$. Then $P$ is clearly bounded, linear, and bijective. It follows by Proposition 5.18 that $P^{-1}$ is a bounded linear operator from $X$ into $G_T$, so
\[
\|(x, Tx)\| = \|P^{-1}x\| \leq K\|x\|_X \quad \text{for all } x \in X,
\]
for some constant $K$. It follows that $\|x\|_X + \|Tx\|_Y \leq K\|x\|_X$ for all $x \in X$, so $T$ is bounded and therefore $T$ is continuous by Lemma 2.39.
Exercise 5.22. Suppose that $A : H \to H$ is a linear operator on a Hilbert space $H$ that is self-adjoint in the sense that $\langle Ax, y\rangle = \langle x, Ay\rangle$ for all $x, y \in H$. Show that this implies that $A$ is bounded.
Corollary 5.23. Let $(X, \mu)$ be a $\sigma$-finite measure space and $g : X \to \mathbb{C}$ a measurable function. If $T : f \mapsto gf$ maps $L^2_\mu(X)$ to $L^2_\mu(X)$, then $g \in L^\infty_\mu(X)$.

Proof. Notice that the hypotheses in the statement do not require that the map is continuous, but simply ask that the range lie in $L^2_\mu(X)$. However, if $(f_n, gf_n)$ is a convergent sequence in the graph of $T$, so that $f_n \to f$ and $gf_n \to h$ as $n \to \infty$ in $L^2(X, \mu)$, then we can extract a subsequence along which both convergences hold $\mu$-almost everywhere. Then along this subsequence $gf_n$ converges almost everywhere to $gf$ and to $h$, so that $gf = h \in L^2_\mu(X)$, so $(f, h)$ also lies in the graph of $T$. It follows that the map is closed, and hence continuous by Theorem 5.21.

Knowing now that $T$ is bounded, there is a constant $C \geq 0$ such that $\|gf\|_2 \leq C\|f\|_2$ for any $f \in L^2_\mu(X)$. Let $X_C = \{x \mid |g(x)| > C\}$, which we claim is a null set. Assuming the opposite, let $B \subseteq X_C$ be a measurable subset of positive finite measure and let $f = \mathbb{1}_B$ be its characteristic function. Then
\[
C^2\mu(B) < \int_B |g|^2\,d\mu = \int |g|^2|f|^2\,d\mu = \|gf\|_2^2 \leq C^2\|f\|_2^2 = C^2\mu(B)
\]
gives a contradiction, which implies that $\mu(X_C) = 0$. Hence $|g|$ is almost everywhere less than or equal to $C$, and in particular $g \in L^\infty_\mu(X)$.

Some very important and natural operators are unbounded. For example, the derivative operator
\[
D_0 : f \mapsto f',
\]
considered on $L^2((0, 1))$, is originally only defined on the dense subset
\[
\{f \in C^1((0, 1)) \mid f, f' \in L^2((0, 1))\}.
\]
As continuity does not hold, the definition above of a closed operator gives a more general framework to consider such unbounded operators. In fact, one can consider the closure of the graph of $D_0$, and obtain a closed operator $D$ defined on a dense subset of $L^2((0, 1))$. Implicitly we already carried this out: the domain is $H^1((0, 1))$ and the operator is the weak differential operator considered in Proposition 3.51, but now considered as an unbounded operator from a dense subset of $L^2((0, 1))$ to $L^2((0, 1))$. The closed graph theorem therefore says that these generalized operators, namely closed operators, are usually only defined on a subset of the first Banach space unless they actually are bounded.
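That the derivative operator is unbounded with respect to the $L^2$ norm can be seen on a simple family of test functions. The following sketch is an illustration only: the family $f_n(x) = \sin(n\pi x)$ and the midpoint-rule quadrature are assumptions made for the example, not part of the notes. It approximates the ratio $\|f_n'\|_2 / \|f_n\|_2$, which equals $n\pi$ and so cannot be bounded by a single constant.

```python
import math


def l2_norm(func, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the L^2((0, 1)) norm of func."""
    h = 1.0 / steps
    return math.sqrt(h * sum(func((k + 0.5) * h) ** 2 for k in range(steps)))


def derivative_ratio(n: int) -> float:
    """Ratio ||f_n'||_2 / ||f_n||_2 for the illustrative choice f_n(x) = sin(n pi x)."""
    f = lambda x: math.sin(n * math.pi * x)
    df = lambda x: n * math.pi * math.cos(n * math.pi * x)  # derivative of f_n
    return l2_norm(df) / l2_norm(f)


if __name__ == "__main__":
    for n in (1, 5, 25):
        print(n, derivative_ratio(n), n * math.pi)
```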
5.3 Further Topics

The above consequences of completeness are very useful; let us indicate a few applications and extensions of these results. As we have seen, the Banach–Steinhaus theorem (Theorem 5.1) has interesting consequences for the notion of strong convergence (Corollary 5.3). We will discuss the corresponding topology again in Section 7.3. The notion of closed operator is the starting point of the theory of self-adjoint unbounded operators, see Chapter ??. The closed graph theorem (Theorem 5.21) has interesting consequences for irreducible unitary representations, see ??.
6 Dual Spaces
Let $X$ be a real or complex normed vector space. A bounded linear operator from $X$ into the normed space $\mathbb{R}$ is a (continuous) linear functional on $X$. Recall that the space of all continuous linear functionals is denoted $B(X, \mathbb{R}) = X^*$, and it is called the dual or conjugate space of $X$. Notice that Lemma 2.41 shows that $X^*$ is a Banach space with respect to the operator norm. One of the most important questions one may ask of $X^*$ is the following: are there enough elements in $X^*$? For example, are there enough elements to separate points? This is answered in great generality using the Hahn–Banach theorem (Theorem 6.3 below); see Corollary 6.4. In Section 6.1 we also discuss several further consequences of the Hahn–Banach theorem concerned with the relationship between $X$ and $X^*$. In Sections 6.2 and 6.3 we will identify the duals to many important Banach spaces, leading to examples and counterexamples to the property of reflexivity. These examples then also give concrete cases of the weak and the weak* topology in Chapter 7.
6.1 Hahn–Banach Theorem and its Consequences

6.1.1 Hahn–Banach Lemma

We first prove the Hahn–Banach lemma, which is the following slightly more abstract result. We will see an application of this more abstract form in Section 6.3.

Lemma 6.1 (Hahn–Banach lemma). Let $X$ be a real vector space, and $p : X \to \mathbb{R}$ a norm-like function with the properties
\[
p(x_1 + x_2) \leq p(x_1) + p(x_2)
\]
and
\[
p(\lambda x_1) = \lambda p(x_1)
\]
for all $\lambda \geq 0$ and $x_1, x_2 \in X$. Let $Y$ be a subspace of $X$, and $f : Y \to \mathbb{R}$ a linear function with $f(y) \leq p(y)$ for all $y \in Y$. Then there exists a linear functional $F : X \to \mathbb{R}$ such that $F(y) = f(y)$ for $y \in Y$, and
\[
F(x) \leq p(x)
\]
for all $x \in X$.

Proof. Let $\mathcal{K}$ be the set of all pairs $(Y_\alpha, g_\alpha)$ in which $Y_\alpha$ is a linear subspace of $X$ containing $Y$, and $g_\alpha$ is a real linear functional on $Y_\alpha$ with the properties that $g_\alpha(y) = f(y)$ for all $y \in Y$, and
\[
g_\alpha(x) \leq p(x)
\]
for all $x \in Y_\alpha$. We make $\mathcal{K}$ into a partially ordered set by defining the relation $(Y_\alpha, g_\alpha) \preceq (Y_\beta, g_\beta)$ if $Y_\alpha \subseteq Y_\beta$ and $g_\alpha = g_\beta|_{Y_\alpha}$. It is clear that any totally ordered subset $\{(Y_\lambda, g_\lambda) : \lambda \in I\}$ has an upper bound given by the subspace
\[
Y' = \bigcup_{\lambda \in I} Y_\lambda
\]
and the functional defined by $g'(y) = g_\lambda(y)$ for $y \in Y_\lambda$ and $\lambda \in I$. That $Y'$ is a subspace and that $g'$ is well-defined both follow since $\{(Y_\lambda, g_\lambda) : \lambda \in I\}$ is linearly ordered. All of this is to prepare the use of Zorn's lemma, which roughly speaking allows us to make a transfinite induction with choices (the heart of the argument follows in the next paragraph). Indeed by Zorn's lemma (see Section A.1), there is a maximal element $(Y_0, g_0)$ in $\mathcal{K}$. All that remains is to check that $Y_0$ is all of $X$ (so we may take $F$ to be $g_0$).

Assume that $x \in X \setminus Y_0$. Let $Y_1$ be the vector space spanned by $Y_0$ and $x$: each element $z \in Y_1$ may be expressed uniquely in the form $z = y + \lambda x$ with $y \in Y_0$ and $\lambda \in \mathbb{R}$, because $x$ is assumed not to be in the vector space $Y_0$. Define a linear functional $g_1$ on $Y_1$ by setting $g_1(y + \lambda x) = g_0(y) + \lambda c$. Now we choose the constant $c$ carefully. Note that if $y_1, y_2 \in Y_0$, then
\[
g_0(y_1) - g_0(y_2) = g_0(y_1 - y_2) \leq p(y_1 - y_2) \leq p(y_1 + x) + p(-x - y_2),
\]
so
\[
-p(-x - y_2) - g_0(y_2) \leq p(y_1 + x) - g_0(y_1).
\]
It follows that
\[
A = \sup_{y \in Y_0}\{-p(-x - y) - g_0(y)\} \leq \inf_{y \in Y_0}\{p(y + x) - g_0(y)\} = B.
\]
Choose $c$ to be any number in the interval $[A, B]$. Then, by construction of $A$ and $B$,
\[
c \leq p(y + x) - g_0(y) \tag{6.1}
\]
for all $y \in Y_0$, and
\[
-p(-x - y) - g_0(y) \leq c \tag{6.2}
\]
for all $y \in Y_0$. Multiply (6.1) by $\lambda > 0$ and substitute $\frac{y}{\lambda}$ for $y$ to obtain
\[
\lambda c \leq p(y + \lambda x) - g_0(y) \tag{6.3}
\]
from the assumed (positive) homogeneity. Multiply (6.2) by $\lambda < 0$, substitute $\frac{y}{\lambda}$ for $y$ and use the homogeneity assumption on $p$ to obtain (6.3) again. Since (6.3) is clear for $\lambda = 0$, we deduce that
\[
g_1(y + \lambda x) = g_0(y) + \lambda c \leq p(y + \lambda x)
\]
for all $\lambda \in \mathbb{R}$ and $y \in Y_0$. That is, $(Y_1, g_1) \in \mathcal{K}$ and $(Y_0, g_0) \preceq (Y_1, g_1)$ with $Y_0 \neq Y_1$. This contradicts the maximality of $(Y_0, g_0)$ and hence $F = g_0$ is defined on all of $X$ and satisfies the conclusion of the lemma.
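The one-dimensional extension step in this proof can be made concrete in a small example. The sketch below rests on illustrative assumptions that are not in the notes: $X = \mathbb{R}^2$, $p$ the $\ell^1$ norm, $Y_0 = \{(t, 0)\}$ with $g_0((t, 0)) = t$, and $x = (0, 1)$. It approximates $A$ and $B$ by a supremum and infimum over a finite sample of $Y_0$ (for this choice a short calculation gives $A = -1$ and $B = 1$) and then checks the domination $g_1 \leq p$ on sampled points of $Y_1$.

```python
def p(v):
    """Sublinear function on R^2 (here the l^1 norm, an illustrative choice)."""
    return abs(v[0]) + abs(v[1])


def g0(t):
    """Functional on Y_0 = {(t, 0)}: g0((t, 0)) = t, which satisfies g0 <= p on Y_0."""
    return t


X_NEW = (0.0, 1.0)                        # the vector x outside Y_0
GRID = [k / 2 for k in range(-100, 101)]  # sample parameters t for y = (t, 0)


def admissible_interval():
    """Approximate A and B from the proof by a sup/inf over the sample grid."""
    a = max(-p((-X_NEW[0] - t, -X_NEW[1])) - g0(t) for t in GRID)
    b = min(p((t + X_NEW[0], X_NEW[1])) - g0(t) for t in GRID)
    return a, b


def g1(t, lam, c):
    """Extension to Y_1 = Y_0 + R x: g1(y + lam * x) = g0(y) + lam * c."""
    return g0(t) + lam * c


def dominated(c):
    """Check g1 <= p on the sampled part of Y_1 (small tolerance for rounding)."""
    return all(g1(t, lam, c) <= p((t + lam * X_NEW[0], lam * X_NEW[1])) + 1e-12
               for t in GRID for lam in GRID)


if __name__ == "__main__":
    a, b = admissible_interval()
    print(a, b, dominated((a + b) / 2), dominated(b + 0.5))
```

Any $c$ in $[A, B]$ passes the check, while a value outside the interval fails it, which is the content of (6.1) and (6.2) in this toy setting.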
Exercise 6.2. Let $X$ be a vector space over $\mathbb{R}$ and let $K \subseteq X$ be a convex subset. Suppose that $0 \in K$ and for every $x \in X$ there is some $t > 0$ with $tx \in K$. Define the gauge function
\[
p_K(x) = \inf\{t > 0 \mid \tfrac{1}{t}x \in K\}.
\]
Show that $p_K$ is norm-like in the sense that it is positive, homogeneous, and satisfies the triangle inequality (the latter two being assumptions in Lemma 6.1).
6.1.2 The Hahn–Banach Theorem: Consequences

For real vector spaces, the Hahn–Banach theorem follows at once (for complex spaces a little more work is needed).

Theorem 6.3 (Hahn–Banach theorem). Let $X$ be a real or complex normed space, and $Y$ a linear subspace. Then for any $y^* \in Y^*$ there exists an $x^* \in X^*$ such that $\|x^*\| = \|y^*\|$ and $x^*(y) = y^*(y)$ for all $y \in Y$.

That is, any linear functional defined on a subspace may be extended to a linear functional on the whole space, without increasing the norm.

Proof of Theorem 6.3. Assume first that $X$ is a real normed space. Let $p(x) = \|y^*\|\|x\|$ and $f(x) = y^*(x)$. Apply the Hahn–Banach lemma (Lemma 6.1) to find $x^* = F$. To check that $\|x^*\| \leq \|y^*\|$, write $|x^*(x)| = \theta x^*(x)$ for $\theta = \pm 1$. Then
\[
|x^*(x)| = \theta x^*(x) = x^*(\theta x) \leq p(\theta x) = \|y^*\|\|\theta x\| = \|y^*\|\|x\|.
\]
The reverse inequality is clear, so $\|x^*\| = \|y^*\|$.

Now let $X$ be a complex normed vector space, let $Y \subseteq X$ be a complex linear subspace, let $y^* \in Y^*$, and define a real linear functional $y_1^*$ by
\[
y_1^*(y) = \Re\, y^*(y)
\]
for $y \in Y$. Let $x_1^* : X \to \mathbb{R}$ be an extension of $y_1^*$ with $\|x_1^*\| = \|y_1^*\|$ (by the real case above). Now define
\[
x^*(x) = x_1^*(x) - i x_1^*(ix),
\]
which is once again an $\mathbb{R}$-linear map from $X$ to $\mathbb{C}$. It is also $\mathbb{C}$-linear since
\[
x^*(ix) = x_1^*(ix) - i x_1^*(i^2 x) = x_1^*(ix) + i x_1^*(x)
\]
and
\[
i x^*(x) = i x_1^*(x) - i^2 x_1^*(ix) = x^*(ix).
\]
Moreover, for $y \in Y$ we have
\[
x^*(y) = \Re\, y^*(y) - i\,\Re\, y^*(iy) = \Re\, y^*(y) + i\,\Im\, y^*(y) = y^*(y).
\]
Finally, $|x^*(x)| = a x^*(x)$ for some $a \in \mathbb{C}$ with $|a| = 1$, and so
\[
|x^*(x)| = a x^*(x) = x^*(ax) = x_1^*(ax) \leq \|y_1^*\|\|ax\| \leq \|y^*\|\|x\|,
\]
which shows that $\|x^*\| = \|y^*\|$ and hence the complex case of the theorem.

Many useful results follow from the Hahn–Banach theorem.

Corollary 6.4 (Separation). Let $X$ be a normed vector space. Then, for any $x \in X$ there is a functional $x^* \in X^*$ with $\|x^*\| = 1$ and $x^*(x) = \|x\|$. Hence, if $z \neq y \in X$ then there exists $x^* \in X^*$ such that $x^*(y) \neq x^*(z)$.

Proof. We assume without loss of generality that $x \neq 0$. Apply Theorem 6.3 with $Y$ being the linear hull of $x$ to find an extension of the linear map $y^*(ax) = a\|x\|$ on $Y$. Since $|y^*(ax)| = |a|\|x\| = \|ax\|$ we have $\|y^*\| = 1$, and so we find an $x^* \in X^*$ with $\|x^*\| = 1$ and $x^*(x) = \|x\|$. For the last part, take $x = y - z$.
Exercise 6.5. Show that every finite-dimensional subspace of a normed vector space has a closed linear complement.
Notice finally that linear functionals allow us to decompose a vector space: let $X$ be a normed vector space, and $x^* \in X^*$. The null space or kernel of $x^*$ is the linear subspace $N_{x^*} = \{x \in X \mid x^*(x) = 0\}$. If $x^* \neq 0$, then there is a point $x_0 \neq 0$ such that $x^*(x_0) = 1$. Any element $x \in X$ can then be written $x = z + \lambda x_0$, with $\lambda = x^*(x)$ and $z = x - \lambda x_0 \in N_{x^*}$. Thus, $X = N_{x^*} \oplus Y$, where $Y$ is the one-dimensional space spanned by $x_0$.
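The decomposition $x = z + \lambda x_0$ can be made concrete in finite dimensions. The sketch below is only an illustration under stated assumptions: the functional is taken to be $x \mapsto \sum_i c_i x_i$ on $\mathbb{Q}^3$ for a hypothetical coefficient vector `c`, and $x_0 = c / (c \cdot c)$ is one of many possible choices with $x^*(x_0) = 1$. Exact rational arithmetic is used so that membership of $z$ in the kernel can be checked exactly.

```python
from fractions import Fraction


def functional(c):
    """Return the linear functional x -> sum_i c_i * x_i."""
    return lambda x: sum(ci * xi for ci, xi in zip(c, x))


def decompose(c, x):
    """Split x = z + lam * x0 with z in ker(x*) and x*(x0) = 1."""
    x_star = functional(c)
    norm_sq = x_star(c)                 # c . c, nonzero whenever c != 0
    x0 = [ci / norm_sq for ci in c]     # one choice satisfying x*(x0) = 1
    lam = x_star(x)                     # lam = x*(x)
    z = [xi - lam * x0i for xi, x0i in zip(x, x0)]
    return z, lam, x0


# Illustrative data (not from the notes).
c = [Fraction(1), Fraction(2), Fraction(-1)]
x = [Fraction(3), Fraction(0), Fraction(5)]
z, lam, x0 = decompose(c, x)

print(lam)               # value of the functional at x
print(functional(c)(z))  # z lies in the kernel of the functional
```

Any other vector $x_0$ with $x^*(x_0) = 1$ gives a different complement $Y$ but the same kernel $N_{x^*}$.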
The reader should compare the following result for a general normed vector space to Corollary 2.77 for Hilbert spaces.

Corollary 6.6 (Closed linear hull). Let $S \subseteq X$ be a subset of a normed vector space. Then the closed linear hull of $S$ is precisely the set of all $x \in X$ that satisfy $x^*(x) = 0$ for all $x^* \in X^*$ with $x^*(S) = \{0\}$. Equivalently,
\[ \overline{\langle S \rangle} = \bigcap_{x^* \in X^* ;\ x^*(S) = \{0\}} \ker x^*. \]

Proof. The inclusion of the left-hand side in the right-hand side is clear. Suppose that $x_0 \notin \overline{\langle S \rangle}$, and let $Y = \langle x_0 \rangle + \langle S \rangle$. Then the functional $y^*$ defined by $y^*(\lambda x_0 + z) = \lambda$ for $z \in \langle S \rangle$ is bounded. For otherwise there would exist, for every $n \geq 1$, some scalar $\lambda_n \neq 0$ and some $z_n \in \langle S \rangle$ with
\[ |\lambda_n| \geq n \| \lambda_n x_0 + z_n \|, \]
which implies that
\[ \left\| x_0 + \tfrac{1}{\lambda_n} z_n \right\| \leq \tfrac{1}{n}, \]
forcing $x_0 \in \overline{\langle S \rangle}$. Therefore, $y^*$ can be extended to a continuous linear functional $x^*$ on $X$ which satisfies $x^*(S) = \{0\}$ but $x^*(x_0) = 1$. This shows that $x_0$ also does not lie in the intersection of the kernels on the right-hand side.
Exercise 6.7. Let $Y \subseteq X$ be a subspace of a normed linear space $X$. Show that
\[ \max_{\|x^*\| \leq 1,\ x^*(Y) = \{0\}} |x^*(x)| = \inf_{y \in Y} \|x - y\|. \]
Exercise 6.8. Let $Y \subseteq X$ be a closed subspace of a normed vector space.
(a) Show that $Y^\perp = \{x^* \in X^* \mid x^*(Y) = \{0\}\}$ is a closed subspace.
(b) Show that $(X/Y)^* \cong Y^\perp$ (that is, that there is a natural isometric isomorphism between the two).
(c) Show that $Y^* \cong X^*/Y^\perp$.
(d) Conclude that $Y$ is reflexive if $X$ is reflexive.

Exercise 6.9. (a) Prove that if the dual space $X^*$ of a real normed vector space $X$ is strictly convex, then the Hahn–Banach extension of a continuous functional on a subspace to all of $X$ is unique.
(b) Give an explicit example of a situation in which the extension defined by the Hahn–Banach theorem is not unique.
6.1.3 An Application of the Spanning Criterion

The spanning criterion is a powerful tool, surprisingly often even without a complete description of the dual space. The following theorem of Müntz generalizes the Stone–Weierstrass theorem on the unit interval.

Theorem 6.10 (Müntz [32]). Suppose that $(n_k)$ is a sequence in $\mathbb{N}$ with $n_1 < n_2 < n_3 < \cdots$ and with $\sum_{k=1}^{\infty} \frac{1}{n_k} = \infty$, and let $p_n(x) = x^n$ for $n \in \mathbb{N}$. Then the linear hull of $\{1, p_{n_1}, p_{n_2}, \ldots\}$ is dense in $C([0, 1])$.

Proof. Let $Y$ be the closed linear hull of the set $\{1, p_{n_1}, p_{n_2}, \ldots\}$ in $C([0, 1])$. By Corollary 6.6 we have to show that if $\ell \in C([0, 1])^*$ has
\[ \ell(1) = \ell(p_{n_k}) = 0 \tag{6.4} \]
for all $k \geq 1$, then $\ell = 0$. In fact, it is enough to show that if $\ell \in C([0, 1])^*$ has (6.4) for all $k \geq 1$, then $\ell(p_n) = 0$ for all integers $n \geq 1$. This is because Corollary 6.6 then shows that $\mathbb{C}[x] \subseteq Y$, after which the Stone–Weierstrass theorem (Theorem 2.34) applies and gives $Y = C([0, 1])$.

So assume that $\ell \in C([0, 1])^*$ satisfies (6.4) for all $k \geq 1$, and assume for now that there is some $n \in \mathbb{N}$ with $\ell(p_n) \neq 0$. We will show that this implies that $\sum_{j=1}^{\infty} \frac{1}{n_j} < \infty$. Notice that for $\zeta \in \mathbb{C}$ with $\Re(\zeta) > 0$, we have $p_\zeta \in C([0, 1])$ where $p_\zeta(t) = t^\zeta$, and
\[ \lim_{\mathbb{C} \ni \eta \to 0} \frac{t^{\zeta + \eta} - t^\zeta}{\eta} = \lim_{\mathbb{C} \ni \eta \to 0} t^\zeta \, \frac{t^\eta - 1}{\eta} = t^\zeta \log t, \]
where the convergence is with respect to the $\|\cdot\|_\infty$ norm (check this claimed uniformity). Now define
\[ f(\zeta) = \ell(p_\zeta) \]
for $\zeta \in \mathbb{C}$ with $\Re(\zeta) > 0$, and notice that $|f(\zeta)| \leq \|\ell\|$. Furthermore, $f$ is analytic for $\Re(\zeta) > 0$ since
\[ \lim_{\mathbb{C} \ni \eta \to 0} \frac{f(\zeta + \eta) - f(\zeta)}{\eta} = \ell(t^\zeta \log t) \]
exists by the above observation regarding uniform convergence. Finally, $f(n_k) = 0$ for $k \geq 1$ by assumption. Now define the Blaschke product
\[ B_K(\zeta) = \prod_{k=1}^{K} \frac{\zeta - n_k}{\zeta + n_k}, \]
so that we have the simple zeros $B_K(n_k) = 0$ for $k = 1, \ldots, K$, $B_K(\zeta) \neq 0$ for $\Re(\zeta) > 0$ and $\zeta \notin \{n_1, \ldots, n_K\}$, and the asymptotic formulas $|B_K(\zeta)| \to 1$ as $\Re(\zeta) \to 0$ or as $|\zeta| \to \infty$. Together these show that
\[ g_K(\zeta) = \frac{f(\zeta)}{B_K(\zeta)} \]
is analytic for $\Re(\zeta) > 0$, and has the asymptotic properties
\[ |g_K(\zeta)| \leq (1 + \varepsilon) \|\ell\| \]
whenever $|\zeta| \geq R$ or $\Re(\zeta) \leq \delta$, where $R > R(\varepsilon)$ and $\delta < \delta(\varepsilon)$, the quantities $R(\varepsilon) > 0$ and $\delta(\varepsilon) > 0$ depend on $\varepsilon$, and $\varepsilon > 0$ is arbitrary. Applying the maximum principle for $g_K$ on the half-circle
\[ \{ \zeta \in \mathbb{C} \mid |\zeta| \leq R,\ \Re(\zeta) \geq \delta \} \]
in Figure 6.1, the analytic function $g_K$ must attain its maximum on the boundary of the half-circle. As this holds for any $\delta < \delta(\varepsilon)$ and $R > R(\varepsilon)$ as well as for any $\varepsilon$, we obtain $\|g_K\|_\infty \leq \|\ell\|$.
[Fig. 6.1. Applying the maximum principle.]
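For $\zeta = n$ this shows that
\[ \prod_{k=1}^{K} \left| 1 + \frac{2n}{n_k - n} \right| = \prod_{k=1}^{K} \left| \frac{n + n_k}{n - n_k} \right| = |B_K(n)|^{-1} \leq \frac{\|\ell\|}{|f(n)|} < \infty, \]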
i.e. we found an upper bound independent of $K$. Taking the logarithm, it follows that the sum
\[ \sum_{k=1}^{K} \frac{1}{n_k - n} \]
has an upper bound independent of $K$. Multiplying the series term-by-term with $\frac{n_k - n}{n_k}$ (and noticing that $\frac{n_k - n}{n_k} \to 1$ as $k \to \infty$), it follows that $\sum_{k=1}^{\infty} \frac{1}{n_k} < \infty$ as claimed. This contradicts our assumption, and the theorem follows.

6.1.4 The Bidual

Corollary 6.11 (Isometric embedding into the bidual). Let $X$ be a normed vector space. Then
\[ \|x\| = \max_{x^* \in X^*,\ \|x^*\| \leq 1} |x^*(x)| \]
for any $x \in X$. In particular, the natural map $\iota : x \mapsto \iota(x) \in X^{**} = (X^*)^*$ from $X$ into the bidual of $X$ that sends $x \in X$ to the linear functional $\iota(x)$ defined by $\iota(x)(x^*) = x^*(x)$ for $x^* \in X^*$, is an isometric embedding.

Definition 6.12. A Banach space is called reflexive if the isometry $\iota$ in Corollary 6.11 is a bijection from $X$ to $X^{**}$.

As we will see in the next section, some Banach spaces which we have already encountered are reflexive, but some are not.

Proof of Corollary 6.11. By definition,
\[ |x^*(x_0)| \leq \|x^*\| \|x_0\| \leq \|x_0\| \]
for all $x^* \in X^*$ with $\|x^*\| \leq 1$ and $x_0 \in X$. For the converse, apply Corollary 6.4 to obtain some functional $x^* \in X^*$ of norm one with $x^*(x_0) = \|x_0\|$ as required. Now notice that
\[ \sup_{x^* \in X^*,\ \|x^*\| \leq 1} |x^*(x_0)| \]
is by definition precisely the norm of $\iota(x_0) \in X^{**}$. Hence we have shown that $\iota$ is an isometry.
Exercise 6.13. Let $X$ be a normed vector space and suppose that the dual $X^*$ is separable. Show that $X$ is also separable. In particular, if $X$ is separable but $X^*$ is not, then $X$ cannot be reflexive. Find an example of a Banach space that is not reflexive for that reason.
The results developed above give another approach to Theorem 2.26.

Shorter proof of Theorem 2.26. Let $X$ be a normed vector space. By Corollary 6.11, $X$ is isometric to $\iota(X) \subseteq X^{**}$. Set $B = \overline{\iota(X)}$, which is a Banach space by Lemma 2.41.

6.1.5 Banach Limits and Amenable Groups

On the space
\[ c = \{ (x_n)_{n \in \mathbb{N}} \in \ell^\infty(\mathbb{N}) \mid \lim_{n \to \infty} x_n \text{ exists} \} \]
we have the natural linear functional $\lim$ defined by
\[ c \ni (x_n)_{n \in \mathbb{N}} \longmapsto \lim((x_n)_{n \in \mathbb{N}}) = \lim_{n \to \infty} x_n. \]
A natural question is to ask if this rather obvious functional, taking the limit of sequences that do have a limit, might have an extension to all of the much larger space $\ell^\infty(\mathbb{N})$. The Hahn–Banach theorem is built for just such situations, and using it we readily find such a generalized functional.

Corollary 6.14. There exists a linear functional $\mathrm{LIM} \in (\ell^\infty(\mathbb{N}))^*$ with norm one, which may be thought of as a generalized limit since it satisfies the following properties:
• $\mathrm{LIM}((a_n)) = \lim_{n \to \infty} a_n$ if the latter limit exists;
• $\mathrm{LIM}((a_n)) \in [\liminf_{n \to \infty} a_n, \limsup_{n \to \infty} a_n]$ if $a_n \in \mathbb{R}$ for all $n \geq 1$;
• $\mathrm{LIM}((a_n)) = \mathrm{LIM}((a_{n+1}))$.
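The functional $\mathrm{LIM}$ is called a Banach limit.

Proof. Let $c \subseteq \ell^\infty(\mathbb{N})$ and $\lim \in c^*$ be as given before the statement of the corollary. Notice that $\|\lim\| = 1$ since $|\lim_{n \to \infty} a_n| \leq \sup_{n \geq 1} |a_n|$. Let $L_1 \in (\ell^\infty(\mathbb{N}))^*$ be an extension as in Theorem 6.3, with $\|L_1\| = \|\lim\|$. We now define
\[ L((a_n)) = L_1\left( a_1, \frac{a_1 + a_2}{2}, \frac{a_1 + a_2 + a_3}{3}, \ldots \right). \]
Clearly $L$ is linear and extends $\lim$ on the subspace $c$, since the Cesàro averages of a convergent sequence converge to the same limit. This functional also has norm one, since
\[ \left| \frac{a_1 + \cdots + a_n}{n} \right| \leq \|(a_n)\|_\infty \]
for all $n \geq 1$, which implies that
\[ \left| L_1\left( a_1, \frac{a_1 + a_2}{2}, \ldots \right) \right| \leq \|(a_n)\|_\infty. \]
Moreover,
\[ L\bigl( (a_n)_n - (a_{n+1})_n \bigr) = L_1\left( a_1 - a_2, \frac{a_1 - a_3}{2}, \frac{a_1 - a_4}{3}, \ldots \right) = 0, \]
which implies the last claim in the corollary. If $a_n \in \mathbb{R}$ for all $n \geq 1$, $I = \inf_{n \geq 1} a_n$ and $S = \sup_{n \geq 1} a_n$, then
\[ \left| a_n - \frac{I + S}{2} \right| \leq \frac{S - I}{2} \]
for all $n \geq 1$, and so
\[ \left| L((a_n)) - \frac{I + S}{2} \right| \leq \frac{S - I}{2}, \]
which implies that $I \leq L((a_n)) \leq S$. Together with the already established translation-invariance, we obtain
\[ \inf_{n \geq k} a_n \leq L((a_n)) \leq \sup_{n \geq k} a_n \]
for every $k \geq 1$, and so also
\[ \liminf_{n \to \infty} a_n \leq L((a_n)) \leq \limsup_{n \to \infty} a_n. \]
Hence $\mathrm{LIM} = L$ has all the claimed properties.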
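The two computations with Cesàro averages in the proof of Corollary 6.14 can be checked on a concrete sequence. The sketch below is only an illustration: it uses the hypothetical test sequence $a_n = (-1)^n$ (which has no limit), truncated to $N$ terms, and exact rational arithmetic. It shows that the averages of $(a_n)$ tend to $0$ and that the averages of $(a_n - a_{n+1})$ equal $(a_1 - a_{n+1})/n$, as used for translation-invariance.

```python
from fractions import Fraction
from itertools import accumulate


def cesaro_means(a):
    """Return the averages (a_1 + ... + a_n) / n for n = 1, ..., len(a)."""
    return [Fraction(s, n) for n, s in enumerate(accumulate(a), start=1)]


N = 1000
alternating = [(-1) ** n for n in range(1, N + 1)]   # a_n = (-1)^n
shifted = alternating[1:] + [(-1) ** (N + 1)]        # a_{n+1}
difference = [a - b for a, b in zip(alternating, shifted)]

means = cesaro_means(alternating)
diff_means = cesaro_means(difference)

print(means[:4])        # -1, 0, -1/3, 0: bounded by 1/n in absolute value
print(diff_means[-1])   # equals (a_1 - a_{N+1}) / N
```

Since the averaged sequence of $((-1)^n)$ converges to $0$ and $L_1$ extends $\lim$, the functional $L$ constructed in the proof assigns the value $0$ to this non-convergent sequence.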
By pre-composing with the restriction operator
\[ \ell^\infty(\mathbb{Z}) \longrightarrow \ell^\infty(\mathbb{N}), \qquad (a_n)_{n \in \mathbb{Z}} \longmapsto (a_n)_{n \in \mathbb{N}}, \]
we can view $\mathrm{LIM}$ also as a translation-invariant linear functional on $\ell^\infty(\mathbb{Z})$. A natural and important question to ask is which other groups $G$ have a similar invariant functional defined on all of $\ell^\infty(G)$. We assume here that $G$ is countable (and we endow the group $G$ with the discrete topology).

Definition 6.15. A countable group $G$ is called amenable if there exists a finitely additive (left-)invariant mean on $G$. That is, a function $m : \mathcal{P}(G) \to [0, 1]$ defined on all subsets of $G$ with the following properties:
• $m(A) \geq 0$ for all $A \subseteq G$ and $m(G) = 1$;
• $m(A_1 \sqcup A_2) = m(A_1) + m(A_2)$ for disjoint sets $A_1, A_2 \subseteq G$; and
• $m(gA) = m(A)$ for all $g \in G$ and $A \subseteq G$.

One may think of a mean (which is only required to be finitely additive) as a poor substitute for a measure (which is countably additive) when a measure with the desired properties does not exist. This is the case if $G$ is a countable infinite group, as the only translation-invariant measure in that case is the counting measure (or a scalar multiple of counting measure), which is infinite. However, the invariant mean discussed here takes values in $[0, 1]$.
Example 6.16. As Corollary 6.14 together with the next lemma show, $G = \mathbb{Z}$ is amenable. Moreover, any invariant mean on $\mathbb{Z}$ (there will turn out to be many; see Exercise 6.22) will have some reassuringly familiar properties. For example if $E = \{n \in \mathbb{Z} \mid n \text{ is even}\}$, then for any invariant mean $m$ we must have $m(E) = \frac{1}{2}$ since $2m(E) = m(E) + m(E + 1) = m(\mathbb{Z}) = 1$.

Lemma 6.17. A countable group $G$ is amenable if and only if there exists a positive (left-)translation invariant functional $\mathrm{LIM}$ in $(\ell^\infty(G))^*$ of norm one. Here positivity is the requirement that
\[ a(g) \geq 0 \text{ for all } g \in G \text{ implies that } \mathrm{LIM}(a) \geq 0, \]
and translation invariance is the requirement that $\mathrm{LIM}(a^{(h)}) = \mathrm{LIM}(a)$ for all $h \in G$ and $a \in \ell^\infty(G)$, where $a^{(h)}$ is the shifted map $g \mapsto a(hg)$.
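Sketch of proof. If $\mathrm{LIM}$ is given on $\ell^\infty(G)$ then we can define $m(A)$ to be $\mathrm{LIM}(\mathbb{1}_A)$ for all $A \subseteq G$, and then it is easy to check that $m$ is a left-invariant mean on $G$. On the other hand, if a left-invariant mean $m$ is given, then we may obtain every $a \in \ell^\infty(G)$ as the limit of a sequence of finite sums of the form
\[ \sum_{i=1}^{\ell_n} r_i^{(n)} \mathbb{1}_{A_i^{(n)}} \]
as $n \to \infty$, where $r_i^{(n)} \in \mathbb{C}$ and $A_i^{(n)} \subseteq G$ for all $n \geq 1$ and $i$. For example, we may partition the set $B^{\mathbb{C}}_{\|a\|_\infty}(0)$ into finitely many sets $B_1, \ldots, B_{\ell_n}$ each of diameter less than $\frac{1}{n}$, choose $r_i^{(n)} \in B_i$ and then define $A_i^{(n)} = a^{-1}(B_i)$ so that
\[ \left\| a - \sum_{i=1}^{\ell_n} r_i^{(n)} \mathbb{1}_{A_i^{(n)}} \right\|_\infty \leq \frac{1}{n}. \]
Now we can define $\mathrm{LIM}$ by
\[ \mathrm{LIM}(a) = \lim_{n \to \infty} \sum_{i=1}^{\ell_n} r_i^{(n)} m(A_i^{(n)}), \]
and check that $\mathrm{LIM}$ is well-defined, linear, bounded of norm one, positive, and left-invariant.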
Exercise 6.18. Provide a detailed proof of Lemma 6.17.
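Definition 6.19. A sequence $(F_n)_n$ of finite subsets of a countable group $G$ is called a Følner sequence if the elements of the sequence are asymptotically translation invariant in the sense that
\[ \lim_{n \to \infty} \frac{|F_n \,\triangle\, hF_n|}{|F_n|} = 0 \tag{6.5} \]
for all $h \in G$.

Lemma 6.20. If a countable group has a Følner sequence, then it is amenable.

Proof. Let $\mathrm{LIM}_{\mathbb{N}}$ be the Banach limit from Corollary 6.14 and let $(F_n)$ be a Følner sequence. Then for any $a \in \ell^\infty(G)$ we can define
\[ \mathrm{LIM}_G(a) = \mathrm{LIM}_{\mathbb{N}}\left( \frac{1}{|F_n|} \sum_{g \in F_n} a(g) \right), \]
which is linear, positive, and of norm one. Moreover, for any $h \in G$ write $a_h \in \ell^\infty(G)$ for the element with $a_h(g) = a(hg)$ for all $g \in G$. Then
\[ \mathrm{LIM}_G(a_h) = \mathrm{LIM}_{\mathbb{N}}\left( \frac{1}{|F_n|} \sum_{g \in F_n} a(hg) \right) = \mathrm{LIM}_{\mathbb{N}}\left( \frac{1}{|F_n|} \sum_{g \in hF_n} a(g) \right) = \mathrm{LIM}_{\mathbb{N}}\left( \frac{1}{|F_n|} \sum_{g \in F_n} a(g) \right) = \mathrm{LIM}_G(a), \]
where the third equality holds by (6.5): the two averaged sequences differ by at most $\|a\|_\infty \frac{|F_n \triangle hF_n|}{|F_n|}$, which tends to $0$, and $\mathrm{LIM}_{\mathbb{N}}$ vanishes on sequences converging to $0$. This shows left-invariance.

Example 6.21. (1) More generally, $G = \mathbb{Z}^d$ is amenable. This may be seen by noting that a sequence $(F_n)$ of large boxes, for example $F_n = [-n, n]^d \cap \mathbb{Z}^d$, is a Følner sequence.
(2) The free group $G = F_{\sigma, \tau}$, generated by two elements $\sigma, \tau$, is not amenable. To see this, suppose that $m$ is an invariant mean on $G$. Clearly the singletons $\{e\}, \{\sigma\}, \{\tau\}, \ldots$ are all disjoint and are left-translates of each other, so we must have $m(\{e\}) = 0$. Now define
\[ S_\sigma = \{ g \in G \mid \text{the reduced form of } g \text{ starts on the left with } \sigma \}, \]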
6.2 The Duals of Lp (X )
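and similarly define sets $S_{\sigma^{-1}}$, $S_\tau$ and $S_{\tau^{-1}}$. Since
\[ G = S_\sigma \sqcup S_{\sigma^{-1}} \sqcup S_\tau \sqcup S_{\tau^{-1}} \sqcup \{e\} \]
and $m(\{e\}) = 0$, we must have
\[ 1 = m(S_\sigma) + m(S_{\sigma^{-1}}) + m(S_\tau) + m(S_{\tau^{-1}}). \tag{6.6} \]
However,
\[ \sigma^{-1} S_\sigma = S_\sigma \sqcup S_\tau \sqcup S_{\tau^{-1}} \sqcup \{e\}, \]
so $m(S_\sigma) = m(S_\sigma) + m(S_\tau) + m(S_{\tau^{-1}})$, and hence $m(S_\tau) = m(S_{\tau^{-1}}) = 0$ by positivity. Exchanging the roles of $\sigma$ and $\tau$ also shows that $m(S_\sigma) = m(S_{\sigma^{-1}}) = 0$, which together contradict (6.6). It follows that $F_{\sigma, \tau}$ cannot be amenable.

Exercise 6.22. Show that the invariant means on $\mathbb{Z}$ constructed by using the Følner sequences defined by
(1) $F_n = [0, n]$,
(2) $F_n = [-n, 0]$, and
(3) $F_n = [n^2, n^2 + n]$
are all different. Can you construct infinitely many different invariant means on $\mathbb{Z}$?

Exercise 6.23. (a) Show that any countable abelian group is amenable.
(b) Show that $\mathrm{SL}_2(\mathbb{Z})$ is not amenable. [You may use the fact that $\mathrm{PSL}_2(\mathbb{Z}) = \mathrm{SL}_2(\mathbb{Z})/\{\pm I\}$ is isomorphic to the free product of $\mathbb{Z}/2\mathbb{Z}$ and $\mathbb{Z}/3\mathbb{Z}$.]

Exercise 6.24. Prove that the discrete Heisenberg group
\[ H = \left\{ \begin{pmatrix} 1 & k & \ell \\ & 1 & m \\ & & 1 \end{pmatrix} \;\middle|\; k, \ell, m \in \mathbb{Z} \right\} \]
with the usual matrix multiplication is amenable.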
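The Følner condition (6.5) for the boxes of Example 6.21(1) can be checked by brute force for small parameters. The following sketch is an illustration only, with hypothetical helper names `box`, `translate` and `folner_ratio`; the group $\mathbb{Z}^d$ is written additively, so the translate $hF_n$ becomes $h + F_n$. For $d = 2$ and $h = (1, 0)$ the symmetric difference consists of two columns of $2n + 1$ points each, so the ratio is $2/(2n+1)$.

```python
from itertools import product


def box(n, d):
    """The box F_n = [-n, n]^d intersected with Z^d, as a set of tuples."""
    return set(product(range(-n, n + 1), repeat=d))


def translate(h, F):
    """The translate h + F of a finite subset F of Z^d."""
    return {tuple(hi + gi for hi, gi in zip(h, g)) for g in F}


def folner_ratio(n, h):
    """|F_n symmetric-difference (h + F_n)| / |F_n| for the box F_n."""
    F = box(n, len(h))
    return len(F ^ translate(h, F)) / len(F)


for n in (1, 10, 40):
    print(n, folner_ratio(n, (1, 0)))   # decreases like 2 / (2n + 1)
```

The same computation with other translates $h$ gives a ratio of order $1/n$, which is the quantitative content of the claim that large boxes form a Følner sequence.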

6.2 The Duals of $L^p_\mu(X)$

We will present isomorphisms between spaces and their duals using a bilinear pairing. If $X$ and $Y$ are vector spaces and each $y \in Y$ induces a linear functional on $X$, then we often write $\langle x, y \rangle$ for the value of the functional associated to $y \in Y$ evaluated on $x \in X$. We always assume that the linear functional depends linearly on $y \in Y$, and so $\langle \cdot, \cdot \rangle$ is a bilinear functional on $X \times Y$. We use the word pairing here to signify that, for example, the map in Exercise 6.25 may be thought of in two ways. It defines on the one hand a family of functionals parameterized by elements of $\ell^1(\mathbb{N})$ defined on sequences in $c_0(\mathbb{N})$ and on the other a family of functionals parameterized by elements of $c_0(\mathbb{N})$ defined on sequences in $\ell^1(\mathbb{N})$. Even though we will prove this in many cases (indeed, it will often be the key step in an argument), when we use this notation and terminology we do not assume that $Y$ is indeed the whole dual to $X$ or vice versa. The reader may start with the following as a warm-up exercise to start thinking about how dual spaces may be found.
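Exercise 6.25. (a) Recall that
\[ c_0(\mathbb{N}) = \{ (a_n) \mid \lim_{n \to \infty} a_n = 0 \} \subseteq \ell^\infty(\mathbb{N}) \]
is a Banach space with respect to the supremum norm $\|\cdot\|_\infty$. Show that $(c_0(\mathbb{N}))^* = \ell^1(\mathbb{N})$, where the dual pairing is given by
\[ \langle (a_n), (b_n) \rangle = \sum_{n=1}^{\infty} a_n b_n \]
for $(a_n) \in c_0(\mathbb{N})$ and $(b_n) \in \ell^1(\mathbb{N})$.
(b) Show that $\ell^1(\mathbb{N})^* = \ell^\infty(\mathbb{N})$, with the same formula for the pairing.
(c) Show that the Banach limit $\mathrm{LIM} \in (\ell^\infty(\mathbb{N}))^*$ as in Corollary 6.14 is not in the canonical image of $\ell^1(\mathbb{N})$ in $(\ell^\infty(\mathbb{N}))^*$.
(d) Conclude that neither $c_0(\mathbb{N})$ nor $\ell^1(\mathbb{N})$ is reflexive.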
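The pairing in Exercise 6.25(a) can be experimented with on truncated sequences. The sketch below is only an illustration over the reals, with hypothetical helper names and a hypothetical test sequence $b_n = (-1)^n / n^2$; it evaluates the pairing against the finitely supported sign sequence of $b$, which has supremum norm one and for which the general bound $|\langle a, b \rangle| \leq \|a\|_\infty \|b\|_1$ becomes an equality on the truncation.

```python
def pairing(a, b):
    """<a, b> = sum_n a_n * b_n for finitely supported sequences."""
    return sum(an * bn for an, bn in zip(a, b))


def sup_norm(a):
    return max(abs(an) for an in a)


def one_norm(b):
    return sum(abs(bn) for bn in b)


def sign(t):
    return (t > 0) - (t < 0)


N = 50
b = [(-1) ** n / n ** 2 for n in range(1, N + 1)]  # truncation of an l^1 sequence
a = [sign(bn) for bn in b]                          # finitely supported, hence in c_0

print(sup_norm(a))                  # 1
print(pairing(a, b), one_norm(b))   # the two values agree
```

This is the finite-dimensional shadow of the norm computation asked for in the exercise; it does not address why every functional on $c_0(\mathbb{N})$ arises in this way.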
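6.2.1 The Dual of $L^1_\mu(X)$

We start by generalizing the second part of Exercise 6.25.

Proposition 6.26. Let $(X, \mathcal{B}, \mu)$ be a $\sigma$-finite measure space. Then $L^1_\mu(X)^* \cong L^\infty_\mu(X)$ under the pairing
\[ \langle f, g \rangle = \int_X f g \, d\mu \]
for $f \in L^1_\mu(X)$ and $g \in L^\infty_\mu(X)$. The operator norm of the functional defined by $g \in L^\infty_\mu(X)$ is precisely $\|g\|_{\mathrm{esssup}}$ (the essential supremum norm, defined on p. 46).

Proof. As indicated, we associate to every $g \in L^\infty_\mu(X)$ the functional
\[ \phi(g) : f \longmapsto \int_X f g \, d\mu \]
which satisfies
\[ \|\phi(g)\|_{\mathrm{operator}} = \sup_{\|f\|_1 \leq 1} \left| \int_X f g \, d\mu \right| \leq \sup_{\|f\|_1 \leq 1} \int_X |f| |g| \, d\mu \leq \|g\|_{\mathrm{esssup}}. \]
For the converse, let $\varepsilon > 0$ and choose a measurable set
\[ A \subseteq \{ x \in X \mid |g(x)| > \|g\|_{\mathrm{esssup}} - \varepsilon \} \]
with $\mu(A) > 0$ (which is possible by definition of the essential supremum) and with $\mu(A) < \infty$ (which is possible since $\mu$ is $\sigma$-finite). Now define
\[ f = \frac{1}{\mu(A)} \frac{|g(x)|}{g(x)} \mathbb{1}_A, \]
and notice that $\|f\|_1 = 1$ and
\[ \phi(g)(f) = \frac{1}{\mu(A)} \int_A |g| \, d\mu \geq \|g\|_{\mathrm{esssup}} - \varepsilon. \]
This shows that
\[ \|\phi(g)\|_{\mathrm{operator}} = \|g\|_{\mathrm{esssup}}, \]
so $\phi : L^\infty_\mu(X) \to L^1_\mu(X)^*$ is an isometric embedding. It remains to show that $\phi$ is onto (this is often the most interesting part of the identification of a dual space).

For this, assume first that $\mu$ is finite. Then $L^2_\mu(X) \subseteq L^1_\mu(X)$ and $\|f\|_1 \leq (\mu(X))^{1/2} \|f\|_2$ for all $f \in L^2_\mu(X)$ by the Cauchy–Schwarz inequality. Therefore, a functional $\ell \in L^1_\mu(X)^*$ also induces a functional $\ell|_{L^2_\mu(X)} \in L^2_\mu(X)^*$. Since $L^2_\mu(X)$ is a Hilbert space, the Fréchet–Riesz representation theorem (Corollary 2.71) now shows that there exists some $g \in L^2_\mu(X)$ with
\[ \ell(f) = \int_X f g \, d\mu \]
for all $f \in L^2_\mu(X)$. We now show that $g \in L^\infty_\mu(X)$. Let
\[ A = \{ x \in X \mid |g(x)| > \|\ell\|_{\mathrm{operator}} \}, \]
so that $f = \mathbb{1}_A \frac{|g|}{g} \in L^2_\mu(X) \subseteq L^1_\mu(X)$. If $\mu(A) > 0$ then
\[ \|\ell\|_{\mathrm{operator}} \, \mu(A) < \int_A |g| \, d\mu = \int_X f g \, d\mu = |\ell(f)| \leq \|\ell\|_{\mathrm{operator}} \, \mu(A) \]
gives a contradiction. Thus $\mu(A) = 0$ and so $\|g\|_{\mathrm{esssup}} \leq \|\ell\|_{\mathrm{operator}}$. Since $\ell$ and $\phi(g)$ agree on the dense subset $L^2_\mu(X) \subseteq L^1_\mu(X)$ we have $\ell = \phi(g)$ as required.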
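If $\mu$ is $\sigma$-finite with
\[ X = \bigsqcup_{n=1}^{\infty} Y_n \]
with $\mu(Y_n) < \infty$, then we may apply the argument above to $\ell|_{L^1_\mu(Y_n)}$ to find some $g_n \in L^\infty_\mu(Y_n)$ with
\[ \|g_n\|_{\mathrm{esssup}} = \left\| \ell|_{L^1_\mu(Y_n)} \right\|_{\mathrm{operator}} \leq \|\ell\|_{\mathrm{operator}}. \]
We define $g(x) = g_n(x)$ for $x \in Y_n$, and obtain $g \in L^\infty_\mu(X)$ with
\[ \|g\|_{\mathrm{esssup}} \leq \|\ell\|_{\mathrm{operator}}. \]
Since the subspace
\[ V = \sum_{n=1}^{\infty} L^1_\mu(Y_n) \subseteq L^1_\mu(X) \]
contains all simple functions in its closure, we have $\overline{V} = L^1_\mu(X)$. As we have shown, $\ell|_{L^1_\mu(Y_n)} = \phi(g)|_{L^1_\mu(Y_n)}$ for all $n \geq 1$, so once again $\ell = \phi(g)$ as required.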
6.2.2 The Dual of $L^p_\mu(X)$ for $p > 1$

Exercise 6.27. Let $p, q \in (1, \infty)$ have $\frac{1}{p} + \frac{1}{q} = 1$. Show that $(\ell^p(\mathbb{N}))^* = \ell^q(\mathbb{N})$ (in the sense that there is an isometric linear bijection between the two spaces).
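The following provides us with many examples of reflexive spaces.

Proposition 6.28. Let $(X, \mathcal{B}, \mu)$ be a $\sigma$-finite measure space, and let $p \in (1, \infty)$ with Hölder conjugate $q$. Then $(L^p_\mu(X))^* = L^q_\mu(X)$ via the pairing
\[ \langle f, g \rangle = \int_X f g \, d\mu \]
for $f \in L^p_\mu(X)$ and $g \in L^q_\mu(X)$. The operator norm of the functional determined by $g$ is precisely $\|g\|_q$.

Proof. For $f \in L^p_\mu(X)$ and $g \in L^q_\mu(X)$ with $\frac{1}{p} + \frac{1}{q} = 1$ we have
\[ |\langle f, g \rangle| \leq \|f\|_p \|g\|_q \]
by the Hölder inequality. It follows that the linear functional defined by $g$ on $L^p_\mu(X)$ is bounded, with norm less than or equal to $\|g\|_q$. If we set
\[ f = |g|^{q/p} \, \frac{|g|}{g} \]
then
\[ \|f\|_p = \left( \int_X |g|^q \, d\mu \right)^{1/p} = \|g\|_q^{q/p} < \infty \]
and
\[ \langle f, g \rangle = \int_X |g|^{1 + \frac{q}{p}} \, d\mu = \int_X |g|^q \, d\mu = \|f\|_p \|g\|_q \]
shows that the norm of the functional determined by $g$ must be equal to $\|g\|_q$.

It remains to show that every bounded linear functional $\ell \in (L^p_\mu(X))^*$ is determined as above by some $g \in L^q_\mu(X)$. So let $\ell \in (L^p_\mu(X))^*$. Replacing $\ell$ by $\Re(\ell(\cdot))$, respectively by $\Im(\ell(\cdot))$, if necessary, we may restrict to real-valued functions on $X$ and to $\mathbb{R}$-linear functionals, as the complex case then follows by putting the descriptions of $\Re(\ell(\cdot))$ and $\Im(\ell(\cdot))$ in terms of functions in $L^q_\mu(X)$ together. So we work over the reals and define
\[ \nu^+(B) = \sup \{ \ell(\mathbb{1}_A) \mid A \subseteq B \text{ measurable},\ \mu(A) < \infty \} \]
for any measurable set $B \subseteq X$. Notice that if $\ell$ is defined by $g$, then $\nu^+(B)$ would be given by
\[ \nu^+(B) = \int_A g \, d\mu = \int_B g^+ \, d\mu \]
for $A = \{x \in B \mid g(x) > 0\}$. Thus for a general $\ell$ we would like to show that $\nu^+$ is an absolutely continuous measure on $X$ (which then will give us $g^+$ as a Radon–Nikodym derivative).

Clearly $\nu^+(B_2) \geq \nu^+(B_1) \geq 0$ for any measurable sets $B_1 \subseteq B_2 \subseteq X$. For measurable disjoint $B_1, B_2 \subseteq X$ and $A_1 \subseteq B_1$, $A_2 \subseteq B_2$ as in the definition of $\nu^+$ we have
\[ \ell(\mathbb{1}_{A_1}) + \ell(\mathbb{1}_{A_2}) = \ell(\mathbb{1}_{A_1 \sqcup A_2}) \leq \nu^+(B_1 \sqcup B_2), \]
and taking the supremum over $A_1$ and $A_2$ gives
\[ \nu^+(B_1) + \nu^+(B_2) \leq \nu^+(B_1 \sqcup B_2). \]
If, on the other hand, $A \subseteq B_1 \sqcup B_2$ we define $A_i = A \cap B_i$ for $i = 1, 2$ and see that
\[ \ell(\mathbb{1}_A) = \ell(\mathbb{1}_{A_1}) + \ell(\mathbb{1}_{A_2}) \leq \nu^+(B_1) + \nu^+(B_2) \]
and so by taking the supremum over $A$ we get
\[ \nu^+(B_1) + \nu^+(B_2) = \nu^+(B_1 \sqcup B_2). \]
Suppose now that
\[ B = \bigsqcup_{n=1}^{\infty} B_n. \]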
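Then monotonicity and the above finite additivity imply that
\[ \sum_{n=1}^{N} \nu^+(B_n) \leq \nu^+(B) \]
for all $N \geq 1$ and so
\[ \sum_{n=1}^{\infty} \nu^+(B_n) \leq \nu^+(B). \]
To see the converse, let $A \subseteq B$ be measurable with finite measure, and define $A_n = A \cap B_n$. Clearly
\[ \mathbb{1}_A = \sum_{n=1}^{\infty} \mathbb{1}_{A_n} \]
pointwise, but by dominated convergence this also holds in $L^p_\mu(X)$ for $p < \infty$. Since $\ell$ is continuous, it follows that
\[ \ell(\mathbb{1}_A) = \sum_{n=1}^{\infty} \ell(\mathbb{1}_{A_n}) \leq \sum_{n=1}^{\infty} \nu^+(B_n). \]
As this holds for all $A \subseteq B$ we get
\[ \nu^+\left( \bigsqcup_{n=1}^{\infty} B_n \right) \leq \sum_{n=1}^{\infty} \nu^+(B_n), \]
and so $\nu^+$ is a measure on $X$. Finally, if $B \subseteq X$ has finite $\mu$-measure, then
\[ \ell(\mathbb{1}_A) \leq \|\ell\| \, \|\mathbb{1}_A\|_p = \|\ell\| (\mu(A))^{1/p} \leq \|\ell\| (\mu(B))^{1/p} \]
for all $A \subseteq B$, which shows that $\nu^+(B) \leq \|\ell\| (\mu(B))^{1/p}$. It follows that $\nu^+$ is absolutely continuous and is $\sigma$-finite since $\mu$ is assumed to be $\sigma$-finite. By the Radon–Nikodym theorem (Proposition 2.82) we have $d\nu^+ = g^+ \, d\mu$ for some measurable function $g^+ \geq 0$.

We claim that $g^+ \in L^q_\mu(X)$ is the positive part of the element $g \in L^q_\mu(X)$ we are looking for. For this we first have to check that $g^+ \in L^q_\mu(X)$, which we do by estimating the $L^q$-norms of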
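\[ g_n^+ = \min\{ n, g^+ \mathbb{1}_{X_n} \}, \]
where $(X_n)$ with $X_n \subseteq X_{n+1}$ for all $n \geq 1$ is a sequence of sets of finite $\mu$-measure with $X = \bigcup_{n=1}^{\infty} X_n$. Notice that $g_n^+ \nearrow g^+$ as $n \to \infty$. Now let $h \geq 0$ be a simple function of the form $h = \sum_{j=1}^{m} \beta_j \mathbb{1}_{B_j}$, where $\beta_j \geq 0$ for all $j$ and with the sets $B_j$ measurable and pairwise disjoint. Then
\[ \int h g_n^+ \, d\mu \leq \int h g^+ \, d\mu = \sum_{j=1}^{m} \beta_j \sup \{ \ell(\mathbb{1}_{A_j}) \mid A_j \subseteq B_j \} = \sup \left\{ \ell\left( \sum_{j=1}^{m} \beta_j \mathbb{1}_{A_j} \right) \;\middle|\; A_j \subseteq B_j \right\}, \]
but the expressions inside the last supremum we may estimate by
\[ \ell\left( \sum_{j=1}^{m} \beta_j \mathbb{1}_{A_j} \right) \leq \|\ell\|_{\mathrm{operator}} \left\| \sum_{j=1}^{m} \beta_j \mathbb{1}_{A_j} \right\|_p \leq \|\ell\|_{\mathrm{operator}} \|h\|_p \]
and so
\[ \int h g_n^+ \, d\mu \leq \|\ell\|_{\mathrm{operator}} \|h\|_p. \]
Applying the argument (for $g_n^+ \in L^q_\mu(X)$) from the beginning of the proof this shows that
\[ \|g_n^+\|_q \leq \|\ell\|_{\mathrm{operator}} \]
and letting $n \to \infty$ also shows that
\[ \|g^+\|_q \leq \|\ell\|_{\mathrm{operator}} \]
by monotone convergence. Now define
\[ \ell^- = \phi(g^+) - \ell \in L^p_\mu(X)^*, \]
where $\phi(g^+)$ is the functional determined by $g^+ \in L^q_\mu(X)$. Notice that for all $B \subseteq X$ measurable with $\mu(B) < \infty$ we have
\[ \ell^-(\mathbb{1}_B) = \int_B g^+ \, d\mu - \ell(\mathbb{1}_B) \geq 0. \]
Hence if we apply the construction above of $\nu^+$ and $g^+$ to $\ell^-$ then we obtain a new measure
\[ \nu^-(B) = \sup \{ \ell^-(\mathbb{1}_A) \mid A \subseteq B,\ \mu(A) < \infty \} \]
which, since $\ell^-(\mathbb{1}_A) \geq 0$ for all sets $A$ of finite measure, satisfies
\[ \nu^-(B) = \ell^-(\mathbb{1}_B) = \int_B g^- \, d\mu \]
for all measurable $B \subseteq X$ with $\mu(B) < \infty$. By the same argument, we have $g^- \in L^q_\mu(X)$. Together, we see that
\[ \ell(\mathbb{1}_B) = \int_B (g^+ - g^-) \, d\mu = \int_B g \, d\mu \]
for $g = g^+ - g^- \in L^q_\mu(X)$ and all measurable $B \subseteq X$ of finite measure. By the density of simple functions in $L^p_\mu(X)$ we conclude that $\ell = \phi(g)$ as required.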
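The norm computation at the start of the proof of Proposition 6.28 can be checked numerically in the simplest setting. The sketch below is only an illustration under stated assumptions: it takes counting measure on a four-point set, real scalars, and a hypothetical test vector `g`, and it forms the extremal function $f = |g|^{q/p} \, \frac{|g|}{g}$ (with $f = 0$ where $g = 0$) to confirm that Hölder's inequality is attained.

```python
import math


def p_norm(v, p):
    """The l^p norm of a finite real vector (counting measure)."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def extremal(g, p):
    """f = |g|^(q/p) * sign(g), where q is the Hoelder conjugate of p."""
    q = p / (p - 1.0)
    return [math.copysign(abs(t) ** (q / p), t) if t != 0 else 0.0 for t in g]


p = 3.0
q = p / (p - 1.0)                 # Hoelder conjugate, here 1.5
g = [2.0, -1.0, 0.0, 0.5]         # illustrative data only
f = extremal(g, p)

lhs = sum(fi * gi for fi, gi in zip(f, g))   # the pairing <f, g>
rhs = p_norm(f, p) * p_norm(g, q)            # ||f||_p * ||g||_q

print(lhs, rhs)   # equal up to rounding
```

The identity relies on $1 + \frac{q}{p} = q$, which is the same arithmetic used in the displayed computation of $\langle f, g \rangle$ in the proof.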
6.3 Riesz Representation, The Dual of $C(X)$

The next result is useful in many ways. It will allow us to completely describe $C(X)^*$ in Section 6.3.5, but it is more often used directly in the form presented here. In this section, apart from Section 6.3.5, we work over the reals only.

Definition 6.29. Let $F(X)$ be a space of real-valued functions on some space $X$. Then a positive linear functional on $F(X)$ is a linear map $\Lambda : F(X) \to \mathbb{R}$ with the property that $f \geq 0$ implies that $\Lambda(f) \geq 0$ for all $f \in F(X)$.

Theorem 6.30. Let $X$ be a locally compact, $\sigma$-compact metric space, and suppose that $\Lambda : C_c(X) \to \mathbb{R}$ is a positive linear functional. Then there exists a uniquely determined locally finite (positive) measure $\mu$ such that
\[ \Lambda(f) = \int_X f \, d\mu \]
for all $f \in C_c(X)$. Recall that a measure is locally finite if every point has a neighborhood of finite measure, or equivalently if every compact subset of $X$ has finite measure.
Exercise 6.31. Let $X$ be a $\sigma$-compact, locally compact metric space. Let $\mu$ be a locally finite measure on $X$. Show that $\mu$ is regular, meaning that
\[ \mu(B) = \sup \{ \mu(K) \mid K \subseteq B \text{ is compact} \} = \inf \{ \mu(O) \mid O \supseteq B \text{ is open} \} \]
for any Borel set $B \subseteq X$.
We will prove Theorem 6.30 in several steps, first showing the claimed uniqueness of the measure, then showing existence in the totally disconnected compact case, then the compact case and finally the general case.

6.3.1 Uniqueness

We will use here the density of $C_c(X)$ in $L^2_\mu(X)$ from Proposition 2.38 and the Radon–Nikodym derivative in Proposition 2.82.

Proof of uniqueness in Theorem 6.30. Let $\Lambda : C_c(X) \to \mathbb{R}$ be a positive linear functional and suppose $\mu$ and $\nu$ are two positive measures with
\[ \int_X f \, d\mu = \Lambda(f) = \int_X f \, d\nu \]
for $f \in C_c(X)$. We note that this implies that $\mu$ and $\nu$ are locally finite, since for every compact set $K \subseteq X$ there exists some function $f \in C_c(X)$ with $f \geq \mathbb{1}_K$ by Urysohn's lemma (Lemma A.23), which shows that $\mu(K), \nu(K) \leq \Lambda(f) < \infty$. Define $m = \mu + \nu$ so that $\mu \ll m$ and $\nu \ll m$ by Section 2.5.4. By Proposition 2.82 there exist measurable functions $f_\mu, f_\nu \geq 0$ with $d\mu = f_\mu \, dm$, $d\nu = f_\nu \, dm$ and $f_\mu + f_\nu = 1$ $m$-almost everywhere. Therefore
\[ \int_X g f_\mu \, dm = \int_X g \, d\mu = \Lambda(g) = \int_X g \, d\nu = \int_X g f_\nu \, dm \]
for all $g \in C_c(X)$. Let $O \subseteq X$ be open with compact closure and restrict the formula above to $g \in C_c(O)$. Since $C_c(O) \subseteq L^2_m(O)$ is dense by Proposition 2.38, and the linear functional determined by $f_\mu|_O \in L^2_m(O)$ (respectively, $f_\nu|_O$) determines $f_\mu|_O$ (respectively $f_\nu|_O$) we get $f_\mu|_O = f_\nu|_O$ $m$-almost everywhere. As $O$ was arbitrary and $X$ is $\sigma$-compact we deduce that $f_\mu = f_\nu$ $m$-almost everywhere, which shows that $\mu = \nu$.

6.3.2 Totally Disconnected Compact Spaces

As our first step towards the existence of the measure representing a positive linear functional we consider the following kind of spaces, where the proof is quite simple.

Definition 6.32. Let $X$ be a topological space. A set $C \subseteq X$ is called clopen if it is both open and closed in $X$. The space $X$ is called totally disconnected if every open set in $X$ is a union of clopen sets, so the topology has a basis consisting of clopen sets.

Example 6.33. Before we give the proof, let us mention that it is easy to find compact metric totally disconnected spaces.
(1) The space $X = \{1, \ldots, a\}^{\mathbb{N}}$ is compact and metric with respect to the product topology using the discrete topology on $\{1, \ldots, a\}$. It is also totally disconnected, since for any finite collection $F_1, \ldots, F_n \subseteq \{1, \ldots, a\}$ the set $\pi_1^{-1}(F_1) \cap \cdots \cap \pi_n^{-1}(F_n)$ is both open and closed (here $\pi_j : X \to \{1, \ldots, a\}$ is the projection onto the $j$th coordinate).
(2) More generally, we can also take the product $X = \prod_{n=1}^{\infty} A_n$ with the product topology, where each $A_n$ is a finite set equipped with the discrete topology. Moreover, any closed subset $Y \subseteq X$ is again a totally disconnected compact metric space.
Lemma 6.34. Let $X$ be a totally disconnected compact metric space. Then the Borel $\sigma$-algebra is generated by the clopen sets.

Proof. Since the Borel $\sigma$-algebra is (by definition) the smallest $\sigma$-algebra containing the open sets, it is enough to show that there is a countable basis $\mathcal{F}$ for the topology consisting of clopen sets. Since the clopen sets form a basis for the topology, we have, for any $n \geq 1$ and $x \in X$,
\[ B_{1/n}(x) = \bigcup_{\alpha \in I(x)} C_\alpha^{(x)} \]
for some clopen sets $C_\alpha^{(x)}$, $\alpha \in I(x)$. Hence
\[ X = \bigcup_{x \in X} \bigcup_{\alpha \in I(x)} C_\alpha^{(x)}, \]
and by compactness of $X$ we can find a finite subcover
\[ X = \bigcup_{\beta \in J(n)} D_\beta^{(n)}, \]
consisting of clopen sets with diameter less than $\frac{2}{n}$. We define
\[ \mathcal{F} = \{ D_\beta^{(n)} \mid \beta \in J(n),\ n \geq 1 \}. \]
By construction, $\mathcal{F}$ is countable, and for every open set $O \subseteq X$ and $x \in O$ we have $d(x, X \setminus O) > 0$, and so $\frac{2}{n} < d(x, X \setminus O)$ for some $n \geq 1$. Then there exists some $\beta \in J(n)$ with $x \in D_\beta^{(n)}$, which also satisfies $D_\beta^{(n)} \subseteq O$. Since $x$ was arbitrary, $O$ is indeed a union of elements of $\mathcal{F}$. Finally, since $\mathcal{F}$ is countable, every open set is a countable union of elements of $\mathcal{F}$. It follows that $\mathcal{F}$ generates the same $\sigma$-algebra as does the collection of all open sets, showing the lemma.
Exercise 6.35. Show that in a compact totally disconnected space, there are only countably many clopen sets.
Proof of Theorem 6.30 for totally disconnected compact metric spaces. Let $X$ be a totally disconnected compact metric space, and let
\[ \mathcal{C} = \{ C \subseteq X \mid C \text{ is open and closed} \}. \]
Notice that $\mathcal{C}$ is an algebra. Let $\Lambda : C(X) \to \mathbb{R}$ be a positive linear functional. Using $\Lambda$ we can already define a content $\mu_{\mathcal{C}}$ on the algebra $\mathcal{C}$. In fact, for $C \in \mathcal{C}$ we define
\[ \mu_{\mathcal{C}}(C) = \Lambda(\mathbb{1}_C). \]
This is possible since $\mathbb{1}_C \in C(X)$ as $C$ is both open and closed. It follows that
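• $\mu_{\mathcal{C}}(C) \geq 0$ for $C \in \mathcal{C}$ (Positivity);
• $\mu_{\mathcal{C}}(C_1 \sqcup C_2) = \mu_{\mathcal{C}}(C_1) + \mu_{\mathcal{C}}(C_2)$ for disjoint $C_1, C_2 \in \mathcal{C}$ (Finite additivity).

By Carathéodory's extension theorem (see Theorem B.4) and Lemma 6.34 we can extend $\mu_{\mathcal{C}}$ to a measure on the Borel $\sigma$-algebra $\mathcal{B}$ of $X$ if
\[ \mu_{\mathcal{C}}\left( \bigsqcup_{n=1}^{\infty} C_n \right) \leq \sum_{n=1}^{\infty} \mu_{\mathcal{C}}(C_n) \]
for any disjoint sets $C_1, C_2, \ldots$ in $\mathcal{C}$ with $\bigsqcup_{n=1}^{\infty} C_n \in \mathcal{C}$. In the totally disconnected compact setting this is quite easy to check. Suppose that $C_n \in \mathcal{C}$ are disjoint for $n \geq 1$ and $C = \bigsqcup_{n=1}^{\infty} C_n \in \mathcal{C}$. Then $C$ is compact since it is a closed subset of the compact space $X$. On the other hand the sets $C_n \in \mathcal{C}$ are open, so $C = \bigsqcup_{n=1}^{\infty} C_n$ is an open cover of a compact set. It follows that $C = \bigsqcup_{n=1}^{N} C_n$ for some $N \geq 1$, and hence $C_n = \emptyset$ for $n > N$. It follows that
\[ \mu_{\mathcal{C}}(C) = \sum_{n=1}^{N} \mu_{\mathcal{C}}(C_n) = \sum_{n=1}^{\infty} \mu_{\mathcal{C}}(C_n) \]
as required. Therefore, $\mu_{\mathcal{C}}$ can be extended to a measure $\mu$, defined on the Borel $\sigma$-algebra $\mathcal{B}_X$ of $X$. By construction
\[ \int_X \mathbb{1}_C \, d\mu = \Lambda(\mathbb{1}_C) \]
for $C \in \mathcal{C}$. We wish to extend this formula to all continuous functions. So fix $f \in C(X)$ and $\varepsilon > 0$. Then for every $x \in X$ there exists some $C_x \in \mathcal{C}$ with $x \in C_x$ such that
\[ |f(y) - f(x)| < \varepsilon \tag{6.7} \]
for $y \in C_x$. This defines an open cover $\{C_x \mid x \in X\}$ of $X$, so there is a finite subcover $X = C_{x_1} \cup \cdots \cup C_{x_n}$. Now define sets
\[ C'_{x_1} = C_{x_1}, \quad C'_{x_2} = C_{x_2} \setminus C_{x_1}, \quad \ldots, \quad C'_{x_n} = C_{x_n} \setminus \bigcup_{k=1}^{n-1} C_{x_k} \]
so that
\[ X = C'_{x_1} \sqcup \cdots \sqcup C'_{x_n}. \]
We may now reinterpret the bound (6.7) as
\[ g - \varepsilon \mathbb{1}_X = \sum_{k=1}^{n} f(x_k) \mathbb{1}_{C'_{x_k}} - \varepsilon \mathbb{1}_X < f < \sum_{k=1}^{n} f(x_k) \mathbb{1}_{C'_{x_k}} + \varepsilon \mathbb{1}_X = g + \varepsilon \mathbb{1}_X. \tag{6.8} \]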
We already know that integration against $\mu$ and application of the functional $\Lambda$ agree on the functions on the left and right-hand side of (6.8). Hence we may apply $\Lambda$ and $\int \cdot \, d\mu$ to (6.8) and obtain from positivity of both these functionals the bounds
\[ \Lambda(f),\ \int f \, d\mu \in \bigl[ \Lambda(g) - \varepsilon \Lambda(\mathbb{1}_X),\ \Lambda(g) + \varepsilon \Lambda(\mathbb{1}_X) \bigr] \]
and so
\[ \left| \int f \, d\mu - \Lambda(f) \right| \leq 2 \varepsilon \Lambda(\mathbb{1}_X). \]
As this holds for all $\varepsilon > 0$ and all $f \in C(X)$, the theorem follows.

6.3.3 Compact Spaces

We now boost the result from Section 6.3.2 to the case of a general compact metric space. For this we are going to use the Hahn–Banach lemma (Lemma 6.1) and the following lemma.

Lemma 6.36 (Symbolic cover). Let $X$ be a compact metric space. Then there exists a totally disconnected compact metric space $Y$ and a continuous surjective map $\phi : Y \to X$.

Example 6.37. A few cases of this lemma do not need a proof, and should help explain why one can think of $Y$ as a symbolic cover.
• If $X = [0, 1]$ (a similar argument applies to any compact interval in $\mathbb{R}$) then we may take $Y = \{0, 1\}^{\mathbb{N}}$ to be the space of all binary sequences with the map
\[ \phi((a_n)) = \sum_{n=1}^{\infty} a_n 2^{-n} \]
sending the binary sequence to the real number with that binary expansion.
• Let $X \subseteq [-M, M]^d$ be a compact subset of $\mathbb{R}^d$. By composing with an affine map, we can assume without loss of generality that $X \subseteq [0, 1]^d = X'$. Define $Y' = \left( \{0, 1\}^{\mathbb{N}} \right)^d$ and a continuous surjective map just as above,
\[ \phi' : Y' \longrightarrow X' = [0, 1]^d, \qquad \bigl( (a_n^{(1)}), \ldots, (a_n^{(d)}) \bigr) \longmapsto \left( \sum_{n=1}^{\infty} a_n^{(1)} 2^{-n}, \ldots, \sum_{n=1}^{\infty} a_n^{(d)} 2^{-n} \right), \]
and finally
\[ Y = \{ y \in Y' \mid \phi'(y) \in X \} \]
with $\phi = \phi'|_Y$. Then $Y \subseteq Y'$ is closed and so again is a totally disconnected compact metric space, and $\phi : Y \to X$ is continuous and surjective.
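The binary expansion map of Example 6.37 is surjective but not injective: the sequences $1000\ldots$ and $0111\ldots$ both represent $\frac{1}{2}$. The sketch below is only an illustration with a hypothetical helper `phi` acting on finite binary strings (regarded as sequences continued by zeros); it shows the values of the truncations of $0111\ldots$ approaching $\frac{1}{2}$ from below, using exact rational arithmetic.

```python
from fractions import Fraction


def phi(bits):
    """Value sum_n a_n * 2^(-n) of a finite binary string (a_1, ..., a_N),
    regarded as the infinite sequence that continues with zeros."""
    return sum(Fraction(a, 2 ** n) for n, a in enumerate(bits, start=1))


N = 30
upper = [1] + [0] * (N - 1)   # 0.1000...   has value 1/2
lower = [0] + [1] * (N - 1)   # 0.0111...1  has value 1/2 - 2^(-N)

print(phi(upper))                # 1/2
print(phi(upper) - phi(lower))   # 2^(-N), which tends to 0 as N grows
```

This loss of injectivity at dyadic rationals is exactly what allows a totally disconnected space to map continuously onto the connected interval.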
Exercise 6.38. Suppose that $X$ is a compact $d$-dimensional manifold. Construct $Y$ and $\phi$ as in Lemma 6.36.
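We postpone the proof of the lemma until after we have seen why it is useful for the problem at hand.

Proof of Theorem 6.30 for compact metric spaces. Let $X$ be a compact metric space, and let $Y$ and $\phi : Y \to X$ be as in Lemma 6.36. Let $\Lambda : C(X) \to \mathbb{R}$ be a positive linear functional. For $f \in C(X)$ we have
\[ \left( \sup_{x \in X} f(x) \right) \mathbb{1}_X - f \geq 0, \]
so
\[ \Lambda\left( \left( \sup_{x \in X} f(x) \right) \mathbb{1}_X - f \right) \geq 0 \]
by positivity, or equivalently
\[ \Lambda(f) \leq \Lambda(\mathbb{1}_X) \sup_{x \in X} f(x). \]
Now let
\[ V = \{ f \circ \phi \mid f \in C(X) \} \subseteq C(Y), \]
where we used continuity of $\phi$, and notice that if $f_1 \circ \phi = f_2 \circ \phi$ for $f_1, f_2 \in C(X)$ then $f_1 = f_2$ since $\phi$ is surjective. Thus we may define
\[ \Lambda_V(f \circ \phi) = \Lambda(f), \]
which is linear and satisfies
\[ \Lambda_V(f \circ \phi) = \Lambda(f) \leq \Lambda(\mathbb{1}_X) \sup_{x \in X} f(x) = p(f \circ \phi) \]
for $p : C(Y) \to \mathbb{R}$ defined by
\[ p(F) = \Lambda(\mathbb{1}_X) \sup_{y \in Y} F(y). \]
Moreover, $p$ satisfies
\[ p(F_1 + F_2) \leq p(F_1) + p(F_2) \]
for $F_1, F_2 \in C(Y)$ and
\[ p(\lambda F) = \lambda p(F) \]
for $F \in C(Y)$ and $\lambda \geq 0$. These are precisely the hypotheses for Lemma 6.1, so we conclude that $\Lambda_V$ can be extended to a functional $\Lambda_Y : C(Y) \to \mathbb{R}$ which still satisfies
\[ \Lambda_Y(F) \leq \Lambda(\mathbb{1}_X) \sup_{y \in Y} F(y). \]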
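If $F \geq 0$ then $-F \leq 0$ and so
\[ \Lambda_Y(-F) \leq \Lambda(\mathbb{1}_X) \sup_{y \in Y} (-F(y)) \leq 0, \]
or $\Lambda_Y(F) \geq 0$. Hence $\Lambda_Y$ is a positive linear functional on $C(Y)$. By the totally disconnected compact case in Section 6.3.2 we conclude that there is a measure $\mu_Y$ on $Y$ with
\[ \Lambda_Y(F) = \int_Y F \, d\mu_Y. \]
Applying this to $F = f \circ \phi$, we see
\[ \Lambda(f) = \Lambda_Y(f \circ \phi) = \int_Y f \circ \phi \, d\mu_Y. \]
We now define $\mu = \phi_* \mu_Y$ by the formula $\mu(B) = \mu_Y(\phi^{-1}(B))$ for $B \in \mathcal{B}_X$. Note that by this definition we have
\[ \int_X \mathbb{1}_B \, d\mu = \mu(B) = \mu_Y(\phi^{-1}(B)) = \int_Y \mathbb{1}_B \circ \phi \, d\mu_Y, \]
which extends by linearity to all simple functions, then by monotone convergence to all positive measurable functions, and then to all integrable functions. In particular,
\[ \Lambda(f) = \int_Y f \circ \phi \, d\mu_Y = \int_X f \, d\mu \]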
for all $f \in C(X)$, proving the theorem for a compact metric space $X$. We note that the argument above actually proves the following abstract principle. If $\phi : Y \to X$ is a continuous surjective map between two compact spaces, and the Riesz representation theorem holds for $Y$, then it also holds for $X$. It remains to construct the totally disconnected symbolic cover.

Proof of Lemma 6.36. Recall that since $X$ is a compact metric space, it is also totally bounded, so for every $m \geq 1$ there exist finitely many points $x_1^{(m)}, \dots, x_{n(m)}^{(m)} \in X$ with
\[ X = \bigcup_{i=1}^{n(m)} B_{1/m}\bigl(x_i^{(m)}\bigr). \tag{6.9} \]
We define
\[ Z = \prod_{m=1}^{\infty} \{1, \dots, n(m)\} \]
with the product topology from the discrete topologies on each of the spaces $\{1, \dots, n(m)\}$. By Sections A.4 and A.5, $Z$ is a compact metric space. We will define $Y$ as a closed subset of $Z$, and will define $\phi : Y \to X$ by
\[ \phi(y) = \lim_{m \to \infty} x_{y(m)}^{(m)}, \]
where $y(m) \in \{1, \dots, n(m)\}$ is the $m$th coordinate of $y$ and $x_{y(m)}^{(m)}$ is the corresponding center of the $y(m)$th ball in the cover (6.9). Our definition of $Y$ will ensure that $\phi$ is well-defined (that is, the limit defining $\phi$ exists), continuous, and surjective.

The closed set $Y$. Now define
\[ Y = \Bigl\{ y \in Z \Bigm| B_{1/1}\bigl(x_{y(1)}^{(1)}\bigr) \cap \dots \cap B_{1/m}\bigl(x_{y(m)}^{(m)}\bigr) \neq \emptyset \text{ for all } m \geq 1 \Bigr\}. \]
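We will show that $Y$ is closed by proving that its complement $Z \setminus Y$ is open. So suppose that $z \in Z \setminus Y$, so that
\[ B_{1/1}\bigl(x_{z(1)}^{(1)}\bigr) \cap \dots \cap B_{1/m}\bigl(x_{z(m)}^{(m)}\bigr) = \emptyset \]
for some $m \geq 1$. However, this means that all other sequences with the same first $m$ coordinates also lie in $Z \setminus Y$. That is,
\[ \pi_1^{-1}\{z(1)\} \cap \dots \cap \pi_m^{-1}\{z(m)\} \subseteq Z \setminus Y, \]
where $\pi_k$ denotes the projection onto the $k$th coordinate, and the set on the left is an open neighborhood of $z$ by definition, so $Z \setminus Y$ is open.

The limit defining $\phi$ exists. Let $y \in Y$ and $m > \ell$. Then there exists a point $x \in B_{1/\ell}\bigl(x_{y(\ell)}^{(\ell)}\bigr) \cap B_{1/m}\bigl(x_{y(m)}^{(m)}\bigr)$ and so
\[ d\bigl(x_{y(\ell)}^{(\ell)}, x_{y(m)}^{(m)}\bigr) \leq d\bigl(x_{y(\ell)}^{(\ell)}, x\bigr) + d\bigl(x, x_{y(m)}^{(m)}\bigr) < \frac{1}{\ell} + \frac{1}{m} < \frac{2}{\ell}. \tag{6.10} \]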
This shows that $\bigl(x_{y(m)}^{(m)}\bigr)$ is a Cauchy sequence in $X$ and so has a limit in $X$.

Continuity of $\phi$. Let $y \in Y$ and $\varepsilon > 0$. Now choose $\ell$ with $\frac{4}{\ell} < \varepsilon$. Suppose that $z \in Y$ belongs to the neighborhood $\pi_\ell^{-1}\{y(\ell)\}$ defined by the $\ell$th coordinate of $y$. Applying the limit as $m \to \infty$ in (6.10) we see that
\[ d\bigl(x_{y(\ell)}^{(\ell)}, \phi(y)\bigr) \leq \frac{2}{\ell} \]
and similarly
\[ d\bigl(x_{z(\ell)}^{(\ell)}, \phi(z)\bigr) \leq \frac{2}{\ell}. \]
However, by choice of $z$ we have $y(\ell) = z(\ell)$ and so
\[ d(\phi(z), \phi(y)) \leq \frac{4}{\ell} < \varepsilon. \]
This shows continuity of $\phi$.

Surjectivity. Let $x \in X$, and choose for every $m \geq 1$ an index $y(m) \in \{1, \dots, n(m)\}$ with
\[ x \in B_{1/m}\bigl(x_{y(m)}^{(m)}\bigr), \]
which is possible by (6.9). It follows directly from the definitions that $y \in Y$ and that $\phi(y) = x$.

6.3.4 Locally Compact $\sigma$-Compact Metric Spaces

Knowing Theorem 6.30 for compact metric spaces, we now extend it to $\sigma$-compact locally compact metric spaces using suitable patchworking.

Proof of Theorem 6.30. Let $X$ be a $\sigma$-compact locally compact metric space, and let $\Lambda : C_c(X) \to \mathbb{R}$ be a positive linear functional. Since $X$ is $\sigma$-compact, there is a sequence of compact sets $(Q_n)$ with $X = \bigcup_{n=1}^{\infty} Q_n$. We claim that there also exists a sequence of compact subsets $K_n$ with $X = \bigcup_{n=1}^{\infty} K_n$ and $K_n \subseteq K_{n+1}^o$ for $n \geq 1$. We first define $K_1 = Q_1$ and then construct $K_n$ inductively as follows. Suppose $K_n \supseteq Q_n$ has already been constructed. Notice that every point $x \in K_n$ has an open neighborhood $U_x$ with compact closure. By compactness, we get $K_n \subseteq U_n = U_{x_1} \cup \dots \cup U_{x_{m(n)}}$. Now define $K_{n+1} = \overline{U_n} \cup Q_{n+1}$. The sequence constructed in this way satisfies all desired properties. By Urysohn's lemma (Lemma A.23) there exists a function $f_n \in C_c(X)$ with $\mathbb{1}_{K_n} \leq f_n \leq 1$ for each $n \geq 1$. If $f \in C_c(K_n^o)$ then
\[ \Bigl(\sup_{x \in K_n^o} f(x)\Bigr) f_n - f \geq 0, \]
so
\[ \Lambda(f) \leq \Lambda(f_n) \sup_{x \in K_n^o} f(x). \]
We now consider $C_c(K_n^o)$ as a subspace of the space of continuous functions $C(K_n)$ on $K_n$. The norm-like function
\[ p_n(f) = \Lambda(f_n) \sup_{x \in K_n^o} f(x) \]
for $f \in C(K_n)$ has all the properties needed to apply Lemma 6.1, so $\Lambda|_{C_c(K_n^o)}$ extends to some $\Lambda_n$ defined on $C(K_n)$ and is again positive (use the argument from Section 6.3.2 to check this), and can be represented by a measure $\widetilde{\mu}_n$ defined on the Borel sets in $K_n$. Restricting this measure $\widetilde{\mu}_n$ to $K_n^o$, we obtain a measure $\mu_n = \widetilde{\mu}_n|_{K_n^o}$ on $K_n^o$ with
\[ \Lambda(f) = \int_{K_n^o} f \, d\mu_n \]
for all $f \in C_c(K_n^o)$. We claim that these measures can be patched together to define a locally finite measure $\mu$ on $X$ with the desired properties. For this, notice that $\mu_{n+1}$ is a measure on $K_{n+1}^o$ which satisfies
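\[ \Lambda(f) = \int_{K_{n+1}^o} f \, d\mu_{n+1} = \int_{K_n^o} f \, d\mu_{n+1} \]
for all $f \in C_c(K_n^o) \subseteq C_c(K_{n+1}^o)$. By the uniqueness of the measure in Theorem 6.30 (see Section 6.3.1) this shows that
\[ \mu_n = \mu_{n+1}|_{K_n^o} \tag{6.11} \]
for all $n \geq 1$. For a Borel set $B \subseteq X$, define
\[ \mu(B) = \lim_{n \to \infty} \mu_n(B \cap K_n^o), \]
which exists because $n \mapsto \mu_n(B \cap K_n^o)$ is an increasing function. By the compatibility established in (6.11), $\mu(B) = \mu_n(B)$ if $B \subseteq K_n^o$. It is now straightforward to check that $\mu$ is a Borel measure on $X$ as follows. It is clear that $\mu$ is positive and monotone; the main step is to check that $\mu$ is countably additive. If
\[ B = \bigsqcup_{\ell=1}^{\infty} B_\ell \]
then, for any $L \geq 1$,
\[ \bigsqcup_{\ell=1}^{L} B_\ell \subseteq B, \]
\[ \sum_{\ell=1}^{L} \mu_n(B_\ell \cap K_n^o) = \mu_n\Bigl(\bigsqcup_{\ell=1}^{L} B_\ell \cap K_n^o\Bigr) \leq \mu_n(B \cap K_n^o), \]
and so
\[ \sum_{\ell=1}^{L} \mu(B_\ell) \leq \mu(B). \]
Since this holds for all $L \geq 1$, we must have
\[ \sum_{\ell=1}^{\infty} \mu(B_\ell) \leq \mu(B). \tag{6.12} \]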
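To check that there is equality in (6.12), assume that $\mu(B) > 0$ and choose $\kappa$ with $0 < \kappa < \mu(B)$. Then there exists some $n \geq 1$ with
\[ \kappa < \mu_n(B \cap K_n^o) = \mu_n\Bigl(\bigsqcup_{\ell=1}^{\infty} B_\ell \cap K_n^o\Bigr) = \sum_{\ell=1}^{\infty} \mu_n(B_\ell \cap K_n^o), \]
and so there exists some $L$ with
\[ \kappa < \sum_{\ell=1}^{L} \mu_n(B_\ell \cap K_n^o). \]
This implies that
\[ \kappa < \sum_{\ell=1}^{L} \mu(B_\ell) \]
by monotonicity. Since $\kappa < \mu(B)$ was arbitrary, we have equality in (6.12) and so $\mu$ is a measure. Since every compact set $K$ is contained in some $K_n^o$, we have $\mu(K) \leq \mu_n(K_n^o) < \infty$, so $\mu$ is locally finite. By the same argument any $f \in C_c(X)$ belongs to some $C_c(K_n^o)$ and hence
\[ \Lambda(f) = \int_{K_n^o} f \, d\mu_n = \int_{K_n^o} f \, d\mu = \int_X f \, d\mu \]
as required.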
Exercise 6.39. Let $X$ be a $\sigma$-compact locally compact metric space, and let $\Lambda : C_0(X) \to \mathbb{R}$ be a positive linear functional. Show that $\Lambda(f) = \int f \, d\mu$ for a finite measure $\mu$ on $X$.
6.3.5 Continuous Linear Functionals on $C(X)$

In the remainder of this section we again treat the real and the complex cases simultaneously. The following result describes the dual of $C(X)$.

Theorem 6.40 (Riesz representation on $C(X)$). Let $X$ be a compact metric space, and let $\Lambda \in (C(X))^*$ be a continuous linear functional on the space $C(X)$ of continuous functions on $X$. Then there exists a signed measure representing $\Lambda$. That is, there exists a positive finite measure $|\mu|$ and some measurable $g$ with $\|g\|_\infty = 1$ such that $d\mu = g \, d|\mu|$ defines a signed measure with
\[ \Lambda(f) = \int_X f \, d\mu = \int_X f g \, d|\mu| \]
for all $f \in C(X)$.
(Footnote to the proof of Theorem 6.30: By construction $\{K_n^o \mid n \in \mathbb{N}\}$ is an open cover of $X$, and hence of $K$, so there is a finite subcover.)
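To make the statement of Theorem 6.40 concrete, here is a minimal sketch for a finite space $X$ with four points, so that $C(X)$ is just $\mathbb{R}^4$ and a functional is given by a weight vector. The weights `w` and all function names are hypothetical choices for illustration. In this toy case the positive functional is given by the absolute values of the weights, and the density $g$ is the sign of each weight.

```python
import random

# Hypothetical finite model: X = {0, 1, 2, 3}, so C(X) is R^4 and
# Lambda(f) = sum_i w[i] * f[i] for a fixed weight vector w (a signed measure).
w = [2.0, -1.5, 0.0, 0.5]


def Lambda(f):
    return sum(wi * fi for wi, fi in zip(w, f))


def abs_Lambda(f):
    """|Lambda|(f) for f >= 0: in this finite model the supremum is sum_i |w[i]| * f[i]."""
    return sum(abs(wi) * fi for wi, fi in zip(w, f))


def sign(t):
    return (t > 0) - (t < 0)


f = [1.0, 2.0, 3.0, 0.5]

# The supremum over |g| <= f is attained at g = sign(w) * f.
g_opt = [sign(wi) * fi for wi, fi in zip(w, f)]
attained = Lambda(g_opt)

# Any other g with |g| <= f gives a value no larger than |Lambda|(f).
random.seed(0)
trials = [[random.uniform(-fi, fi) for fi in f] for _ in range(1000)]
largest_trial = max(Lambda(g) for g in trials)

# Density with |density| = 1 wherever |mu| has mass, and the total mass |mu|(X).
density = [sign(wi) for wi in w]
total_variation = abs_Lambda([1.0] * len(w))
```

Here `total_variation` plays the role of $|\mu|(X)$, which in this model is also the norm of the functional with respect to the supremum norm.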
In the proof below we first construct from the linear functional $\Lambda$ a positive linear functional $|\Lambda|$ which will give rise to the positive finite measure $|\mu|$. The existence of $g$ will then follow from Proposition 6.26. At first sight the construction of $|\Lambda|$ is surprising: we will force positivity, and then linearity is a minor miracle. Comparing this construction to Lemma 2.51 and its proof should make this less surprising.

Proof of Theorem 6.40. Let $\Lambda$ be a continuous linear functional on $C(X)$, and let $f \in C(X)$ be non-negative. We define
\[ |\Lambda|(f) = \sup \{ \Re \Lambda(g) \mid g \in C(X), |g| \leq f \}. \]
Clearly $|\Lambda|(f) \geq 0$ and $|\Lambda|(\alpha f) = \alpha |\Lambda|(f)$ for $\alpha \geq 0$. In order to extend $|\Lambda|$ to an $\mathbb{R}$-linear functional on $C_{\mathbb{R}}(X)$ we first consider $f_1, f_2 \in C(X)$ with $f_1 \geq 0$ and $f_2 \geq 0$, and claim that
\[ |\Lambda|(f_1 + f_2) = |\Lambda|(f_1) + |\Lambda|(f_2). \tag{6.13} \]
One inequality is quite easy. If $g_i \in C(X)$ satisfies $|g_i| \leq f_i$ for $i = 1, 2$, then
\[ |g_1 + g_2| \leq |g_1| + |g_2| \leq f_1 + f_2 \]
and so
\[ \Re \Lambda(g_1) + \Re \Lambda(g_2) = \Re\bigl(\Lambda(g_1 + g_2)\bigr) \leq |\Lambda|(f_1 + f_2), \]
which shows that $|\Lambda|(f_1) + |\Lambda|(f_2) \leq |\Lambda|(f_1 + f_2)$. To show the reverse inequality, we need to take some $g \in C(X)$ with $|g| \leq f_1 + f_2$ and split it into two continuous functions $g = g_1 + g_2$ with $|g_1| \leq f_1$ and $|g_2| \leq f_2$. We define
\[ g_1(x) = \begin{cases} g(x) & \text{if } |g(x)| \leq f_1(x), \\ \dfrac{g(x)}{|g(x)|} f_1(x) & \text{if } |g(x)| \geq f_1(x), \end{cases} \]
which we claim is a continuous function satisfying $|g_1| \leq f_1$. At the cross-over of the two formulas that define $g_1(x)$ the two formulas agree, and so this does not create a problem for continuity. Hence, the only points $x \in X$ where continuity is not clear are those where $g(x)/|g(x)|$ is not defined (and possibly cannot be extended to a continuous function) and we are in the second case of the definition of $g_1$. In this case we interpret the definition as $\frac{g(x)}{|g(x)|} f_1(x) = 0$ since we have $0 \leq f_1(x) \leq |g(x)| = 0$. Now notice that $|g_1| \leq f_1$ everywhere, and hence continuity of $g_1$ also holds at these points since $f_1$ is continuous. We also define $g_2 = g - g_1$ and notice that
\[ |g_2(x)| = \begin{cases} 0 & \text{if } |g(x)| \leq f_1(x), \\ |g(x)| - f_1(x) & \text{if } |g(x)| \geq f_1(x), \end{cases} \]
so that $|g_2| \leq f_2$ by the assumption on $g$. Hence
\[ \Re \Lambda(g) = \Re \Lambda(g_1) + \Re \Lambda(g_2) \leq |\Lambda|(f_1) + |\Lambda|(f_2), \]
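which proves the reverse inequality and hence (6.13). Now let $f$ be any function in $C_{\mathbb{R}}(X)$. We extend the definition of $|\Lambda|$ by the formula
\[ |\Lambda|(f) = |\Lambda|(f^+) - |\Lambda|(f^-), \tag{6.14} \]
where $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$ are non-negative continuous functions. We now have $|\Lambda|(\alpha f) = \alpha |\Lambda|(f)$ for all $\alpha \in \mathbb{R}$ and $f \in C_{\mathbb{R}}(X)$. For linearity it remains to show that
\[ |\Lambda|(f_1 + f_2) = |\Lambda|(f_1) + |\Lambda|(f_2) \tag{6.15} \]
for $f_1, f_2 \in C_{\mathbb{R}}(X)$. To see this, notice first that
\[ f_1 + f_2 = (f_1 + f_2)^+ - (f_1 + f_2)^- = f_1^+ - f_1^- + f_2^+ - f_2^- \]
and so
\[ (f_1 + f_2)^+ + f_1^- + f_2^- = (f_1 + f_2)^- + f_1^+ + f_2^+. \]
We may apply $|\Lambda|$ to the latter equation and use the additivity for non-negative functions obtained in (6.13) to give
\[ |\Lambda|\bigl((f_1 + f_2)^+\bigr) + |\Lambda|(f_1^-) + |\Lambda|(f_2^-) = |\Lambda|\bigl((f_1 + f_2)^-\bigr) + |\Lambda|(f_1^+) + |\Lambda|(f_2^+). \]
Rearranging the terms again we get
\[ |\Lambda|\bigl((f_1 + f_2)^+\bigr) - |\Lambda|\bigl((f_1 + f_2)^-\bigr) = |\Lambda|(f_1^+) - |\Lambda|(f_1^-) + |\Lambda|(f_2^+) - |\Lambda|(f_2^-), \]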
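which is precisely (6.15) by definition of $|\Lambda|$ in (6.14). We have shown that $|\Lambda|$ is a positive linear functional on $C_{\mathbb{R}}(X)$. By Theorem 6.30 there exists a positive measure $|\mu|$ with
\[ |\Lambda|(f) = \int_X f \, d|\mu| \]
for $f \in C_{\mathbb{R}}(X)$. Note that
\[ |\mu|(X) = |\Lambda|(\mathbb{1}_X) = \sup\{ \Re \Lambda(f) \mid f \in C(X), \|f\|_\infty \leq 1 \}. \]
Moreover, we claim that
\[ |\Lambda(f)| \leq \int_X |f| \, d|\mu| \]
for all $f \in C(X)$. Indeed, by definition of $|\Lambda|$ and $|\mu|$ we have
\[ |\Lambda(f)| = \alpha \Lambda(f) = \Re\bigl(\Lambda(\alpha f)\bigr) \leq |\Lambda|(|f|) = \int_X |f| \, d|\mu| \tag{6.16} \]
for some $\alpha \in \mathbb{C}$ with $|\alpha| = 1$. The claim shows that $\Lambda$ is continuous with respect to $\|\cdot\|_{L^1_{|\mu|}(X)}$ and so extends to $L^1_{|\mu|}(X)$. By Proposition 6.26, $\Lambda$ must therefore be of the form
\[ \Lambda(f) = \int_X f g \, d|\mu| \]
for some $g \in L^\infty_{|\mu|}(X)$. Moreover, the supremum norm $\|g\|_\infty$ of $g$ equals the norm of the functional $\Lambda$ with respect to $\|\cdot\|_{L^1_{|\mu|}(X)}$, which by (6.16) is less than or equal to one.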
Exercise 6.41. Show that the signed measure $\mu$ in Theorem 6.40 is uniquely characterized by the functional $\Lambda \in C(X)^*$.

Exercise 6.42. In the notation of Theorem 6.40 (and of its proof) show that $|g(x)| = 1$ for $|\mu|$-almost every $x \in X$. Express the norm of $\Lambda$ in terms of $g$ and $|\mu|$.

Exercise 6.43. State and prove a version of Theorem 6.40 for $C_0(X)$ for a $\sigma$-compact locally compact metric space $X$.

Exercise 6.44. Let $X = [0, 1] \subseteq \mathbb{R}$ (though the reader will notice that the same conclusions hold for the space of continuous functions on most compact metric spaces).
(a) Notice that every finite signed measure on $X$ defines a linear functional on
\[ \mathscr{L}^\infty(X) = \{ f : X \to \mathbb{R} \mid \|f\|_\infty < \infty, f \text{ measurable} \} \]
but that $\mathscr{L}^\infty(X)^*$ contains other functionals as well.
(b) Notice that every function $f \in \mathscr{L}^\infty(X)$ defines a linear functional on the space of finite signed measures $\mathcal{M}(X) = C(X)^*$. Deduce that $C(X)$ is not reflexive. Show that $\mathcal{M}(X)^*$ contains more functionals than those arising from $\mathscr{L}^\infty(X)$.

Exercise 6.45. Find a description of the dual of $C^n([0, 1])$ for $n = 1, 2, \dots$.
6.4 Further Topics

We list here a few topics that use or extend the material of this chapter.

As we will see in Section 7.5 the Hahn-Banach lemma (Lemma 6.1) is very useful in the study of closed convex sets (even in the more general setting of locally convex spaces introduced in Section 7.4).

There are more connections between the spaces $L^p(X)$ for various $p \in [1, \infty]$. For example, there are interesting interpolation theorems between these spaces; see the Riesz-Thorin Interpolation Theorem in Folland, Real Analysis, Section 6.5 ??.

The Riesz representation theorem (Theorem 6.30) has numerous applications. It plays a crucial role in obtaining a point in a convex set as a generalized convex combination of extremal points of the convex set (see Section ??), in the spectral theory of bounded (and unbounded) self-adjoint operators on Hilbert spaces (see Chapters ??), in the construction of the Haar measure of a locally compact group (see Fact ? and ?), and also in the spectral theory of unitary representations of locally compact abelian groups (see Bochner's Theorem in ?).
7 Locally Convex Vector Spaces
In this chapter we introduce the important weak and weak* topologies on Banach spaces and their duals, prove an important compactness result, introduce two more topologies on $B(V, W)$, and put these into the general context of locally convex vector spaces. Finally we also discuss convex sets in locally convex vector spaces.
7.1 Weak Topologies and Tychonoff-Alaoglu

As we have seen in Proposition 2.28, the unit ball in an infinite-dimensional Banach space is not compact in the topology induced by the norm (which is often called the norm or strong topology). Given the central importance of compactness in much of analysis, this is a significant problem. In general this is simply something that must be lived with as a price to pay for the additional power of doing analysis in infinite-dimensional spaces, but we can also improve the chance of finding compactness by studying weaker topologies than the norm topology. We also refer to Appendix A.4 since many definitions of topologies in this chapter are special cases of more general constructions there.

Definition 7.1. Let $X$ be a normed vector space with dual space $X^*$. (As usual, $X^*$ consists of the linear functionals that are continuous with respect to the norm topology on $X$.) The weak topology on $X$ is the weakest (coarsest) topology on $X$ for which all the elements of $X^*$ (which are functions on $X$) are continuous.

Notice that in the weak topology a neighborhood of $x_0 \in X$ is a set containing a set of the form
\[ N_{\ell_1, \dots, \ell_n; \varepsilon}(x_0) = \bigcap_{i=1}^{n} \{ x \in X \mid |\ell_i(x) - \ell_i(x_0)| < \varepsilon \} \]
for some $\varepsilon > 0$ and functionals $\ell_1, \dots, \ell_n \in X^*$. Note that a sequence $(x_n)$ in $X$ converges in the weak topology to $x \in X$ if $\ell(x_n) \to \ell(x)$ for every $\ell \in X^*$. If $X$ is infinite-dimensional, then the weak topology and the norm topology are automatically different. To see this notice that
\[ \bigcap_{i=1}^{n} \ker \ell_i \subseteq N_{\ell_1, \dots, \ell_n; \varepsilon}(0), \]
and that the subspace on the left is non-trivial when $X$ is infinite-dimensional, which implies that no neighborhood of $0$ in the weak topology can be bounded in the norm of $X$.

Definition 7.2. Let $X$ be a normed vector space with dual space $X^*$. The weak* topology on $X^*$ is the weakest (or coarsest) topology on $X^*$ for which the evaluation maps $x^* \mapsto x^*(x)$ corresponding to $x \in X$ are all continuous.

Once again we can describe the weak* topology by saying that a neighborhood of $x_0^* \in X^*$ is a set containing a set of the form
\[ N_{x_1, \dots, x_n; \varepsilon}(x_0^*) = \bigcap_{i=1}^{n} \{ x^* \in X^* \mid |x^*(x_i) - x_0^*(x_i)| < \varepsilon \} \]
for some $\varepsilon > 0$ and $x_1, \dots, x_n \in X$. As before, we can show that the weak* topology and the norm topology on $X^*$ are different if $X$ (and hence $X^*$) is infinite-dimensional.

Example 7.3. (a) For a Hilbert space $H$, the weak and weak* topologies are identical. The same holds for any reflexive Banach space. However, in general there is no definition of a weak* topology on a given Banach space $X$ as there may not exist a canonical pre-dual of $X$, meaning a Banach space $Y$ with $X = Y^*$ (see Example 7.51).
(b) Let $X = [0, 1]$ and consider the sequence of measures $(\mu_n)$ where
\[ \mu_n = \frac{1}{n} \bigl( \delta_{1/n} + \delta_{2/n} + \dots + \delta_{1} \bigr), \]
viewed (via integration) as functionals on $C(X)$ (see Theorem 6.40). Then $(\mu_n)$ converges in the weak* topology to the Lebesgue measure $\lambda$, which we also identify with the functional it induces. Notice that this statement is equivalent to the beginning of the theory of the Riemann integral for continuous functions. Notice however that $(\mu_n)$ does not converge in the weak topology, nor in the norm topology. To see the former, notice that every function $f \in \mathscr{L}^\infty([0, 1])$ induces a linear functional on the space $\mathcal{M}([0, 1])$ of finite signed measures on $[0, 1]$, and that for $f = \mathbb{1}_{\mathbb{Q} \cap [0,1]}$ we have $\int f \, d\mu_n = 1$ for all $n \geq 1$ while $\int f \, d\lambda = 0$. Thus the weak and weak* topologies on $\mathcal{M}([0, 1])$ are different.
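The contrast in Example 7.3(b) can be checked by direct computation. The sketch below is an illustration under stated assumptions: the function names are hypothetical, exact rational arithmetic is used for the sample points, and the indicator of the rationals is modelled by the fact that every sample point $k/n$ is stored as a `Fraction`. Integrals of a continuous test function against the averaged point masses approach the Lebesgue integral, while the indicator of the rationals always integrates to $1$.

```python
from fractions import Fraction


def integrate_against_average(f, n):
    """Integral of f against (1/n) * (delta_{1/n} + delta_{2/n} + ... + delta_1)."""
    return sum(f(Fraction(k, n)) for k in range(1, n + 1)) / n


def square(t):
    return t * t


def indicator_rationals(t):
    # Modelling assumption: every sample point k/n is a Fraction, hence rational.
    return 1 if isinstance(t, Fraction) else 0


# Continuous test function t -> t^2: the integrals approach 1/3.
errors = [abs(integrate_against_average(square, n) - Fraction(1, 3))
          for n in (10, 100, 1000)]

# Bounded measurable but discontinuous test function: the integral is 1
# for every n, whereas its Lebesgue integral is 0.
stuck = [integrate_against_average(indicator_rationals, n) for n in (10, 100, 1000)]
```

For the square function the error works out to $(3n+1)/(6n^2)$, which is the usual error of a right-endpoint Riemann sum.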
(Footnotes: In general convergence of sequences is not sufficient to describe a topology (see Exercise 7.12), while convergence of nets or filters is sufficient. Here $\delta_t$ denotes the point measure defined by $\delta_t(A) = 1$ if $t \in A$ and $0$ if not.)
7.1.1 Weak* Compactness of the Unit Ball

The importance of the weak* topology comes from the following theorem.

Theorem 7.4 (Tychonoff-Alaoglu). The closed unit ball
\[ \overline{B_1^{X^*}(0)} = \{ \ell \in X^* \mid \|\ell\|_{\mathrm{operator}} \leq 1 \} \]
in the dual $X^*$ of a normed vector space $X$ is compact in the weak* topology.

Proof. Let $\overline{B(r)}$ be the closed (and hence compact) ball of radius $r > 0$ in $\mathbb{R}$ or $\mathbb{C}$ depending on the field of scalars. By Tychonoff's theorem (see Theorem A.17) the space
\[ Y = \prod_{x \in X} \overline{B(\|x\|)} \]
is compact with respect to the product topology (see Definition A.14). Now define the embedding
\[ \phi : \overline{B_1^{X^*}(0)} \to Y, \qquad \ell \mapsto (\ell(x))_{x \in X} \in \prod_{x \in X} \overline{B(\|x\|)}. \]
Let $\pi_x : Y \to \overline{B(\|x\|)}$, $y \mapsto y(x)$, be the projection operator corresponding to $x \in X$. Then the neighborhoods of some $y = \phi(\ell_0)$ in the product topology are sets containing sets of the form
\[ N = \bigcap_{i=1}^{n} \pi_{x_i}^{-1}\bigl(B_\varepsilon(\ell_0(x_i))\bigr). \]
Now notice that the pre-image of such a set under $\phi$ takes the form
\[ \phi^{-1}(N) = \bigcap_{i=1}^{n} \bigl\{ \ell \in \overline{B_1^{X^*}(0)} \bigm| |\ell(x_i) - \ell_0(x_i)| < \varepsilon \bigr\} = N_{x_1, \dots, x_n; \varepsilon}(\ell_0) \cap \overline{B_1^{X^*}(0)}, \]
which is precisely the restriction to the unit ball of one of the neighborhoods of $\ell_0 \in X^*$ defining the weak* topology on $X^*$. Therefore, $\phi$ is a homeomorphism from $\overline{B_1^{X^*}(0)}$ (with the weak* topology) to a subset of $Y$ (with the product topology). We claim that $\phi\bigl(\overline{B_1^{X^*}(0)}\bigr) \subseteq Y$ is closed, which then implies the theorem, since any closed subset of the compact space $Y$ is compact. To see the claim, notice that $\phi\bigl(\overline{B_1^{X^*}(0)}\bigr)$ consists of all linear maps in $Y$. This is because any element $y$ of $Y$ is a scalar-valued function on $X$ with $y(x) \in \overline{B(\|x\|)}$, and so if $y$ is linear then $\|y\| \leq 1$.
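The claim now follows easily since linearity is defined by equations (rather than inequalities) and so is a closed condition, as we will now show. It is enough to show that $Y \setminus \phi\bigl(\overline{B_1^{X^*}(0)}\bigr)$ is open. So suppose that
\[ y \in Y \setminus \phi\bigl(\overline{B_1^{X^*}(0)}\bigr). \]
Then by construction $y$ is not linear, and so there exist scalars $\alpha_1, \alpha_2$ and elements $x_1, x_2 \in X$ with
\[ y(\alpha_1 x_1 + \alpha_2 x_2) \neq \alpha_1 y(x_1) + \alpha_2 y(x_2). \]
Now choose $\varepsilon > 0$ such that
\[ B_\varepsilon\bigl(y(\alpha_1 x_1 + \alpha_2 x_2)\bigr) \cap \bigl( \alpha_1 B_\varepsilon(y(x_1)) + \alpha_2 B_\varepsilon(y(x_2)) \bigr) = \emptyset, \]
so that
\[ z \in \pi_{\alpha_1 x_1 + \alpha_2 x_2}^{-1}\bigl(B_\varepsilon(y(\alpha_1 x_1 + \alpha_2 x_2))\bigr) \cap \pi_{x_1}^{-1}\bigl(B_\varepsilon(y(x_1))\bigr) \cap \pi_{x_2}^{-1}\bigl(B_\varepsilon(y(x_2))\bigr) \]
implies that
\[ \underbrace{z(\alpha_1 x_1 + \alpha_2 x_2)}_{\in B_\varepsilon(y(\alpha_1 x_1 + \alpha_2 x_2))} \neq \underbrace{\alpha_1 z(x_1) + \alpha_2 z(x_2)}_{\in \alpha_1 B_\varepsilon(y(x_1)) + \alpha_2 B_\varepsilon(y(x_2))}. \]
This shows that $Y \setminus \phi\bigl(\overline{B_1^{X^*}(0)}\bigr)$ is open, and the theorem follows.

The weak and weak* topologies are never metrizable for infinite-dimensional Banach spaces (see Exercise 7.6), but when restricted to the unit ball the situation is better.

Proposition 7.5. Let $D \subseteq X$ be a dense subset of a normed vector space. Then the weak* topology restricted to $\overline{B_1^{X^*}(0)}$ is the weakest topology on $\overline{B_1^{X^*}(0)}$ for which the evaluation maps $\ell \mapsto \ell(x)$ are continuous for all $x \in D$. In particular, if $X$ is separable, then the weak* topology restricted to $\overline{B_1^{X^*}(0)}$ is metrizable.

Proof. Suppose that $D \subseteq X$ is dense, and suppose that
\[ N_{x, \varepsilon}(\ell_0) = \{ \ell \in X^* \mid |\ell(x) - \ell_0(x)| < \varepsilon \} \]
is a neighborhood of $\ell_0 \in \overline{B_1^{X^*}(0)}$ defined by $\varepsilon > 0$ and some arbitrary $x \in X$. Choose some $x' \in D$ with $\|x - x'\| < \frac{\varepsilon}{3}$, and notice that for $\ell \in \overline{B_1^{X^*}(0)}$ we have $|\ell(x) - \ell(x')| < \frac{\varepsilon}{3}$ and so
\[ N_{x', \varepsilon/3}(\ell_0) \cap \overline{B_1^{X^*}(0)} \subseteq N_{x, \varepsilon}(\ell_0) \cap \overline{B_1^{X^*}(0)} \]
by a simple application of the triangle inequality (check this). Thus the topologies defined on $\overline{B_1^{X^*}(0)}$ by $D$ or by $X$ (the latter being the weak* topology by definition) agree.

For the last claim of the proposition, notice that if $X$ is separable, then by definition there exists a countable dense set $D = \{x_1, x_2, \dots\} \subseteq X$. For every $x_n \in D$ the weakest topology for which $\ell \mapsto \ell(x_n)$ is continuous is the topology induced by the semi-norm $\|\ell\|_{x_n} = |\ell(x_n)|$, and so the weak* topology is the weakest topology on $\overline{B_1^{X^*}(0)}$ that is stronger than all the topologies induced by the semi-norms $\|\cdot\|_{x_n}$ for $n \in \mathbb{N}$. By Definition A.14, this topology is metrizable.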
Exercise 7.6. Let $X$ be an infinite-dimensional Banach space.
(a) Show that $X$ is not the span of countably many elements of $X$. That is, show that for any $x_1, x_2, \dots \in X$ we have $X \neq \langle x_n \mid n \in \mathbb{N} \rangle$.
(b) Use part (a) to show that the weak* topology does not have a countable basis of neighborhoods of $0$. Conclude that the weak* topology on $X^*$ is not metrizable.
(c) Generalize parts (a) and (b) to the weak topology on $X$.
7.1.2 More Properties of the Weak and Weak* Topologies

Notice that we already have some interesting examples of weak* convergence. In fact, Proposition 3.19 can be interpreted as showing that for every $x \in \mathbb{T}$ the measure defined by $F_M(x - t) \, dt$ on $\mathbb{T}$ converges in the weak* topology to the Dirac measure $\delta_x$ corresponding to the unit point mass at $x$. The reader can analyze the proof of Proposition 3.19 and the material of this section to prove the following theorem due to Toeplitz.

Theorem 7.7 (Toeplitz). Suppose that $(k_n)$ is a sequence of integrable functions $[0, 1] \to \mathbb{C}$, and let $x$ be a point in $[0, 1]$. Then the measures defined by $k_n(t) \, dt$ converge in the weak* topology to $\delta_x$ if and only if all of the following conditions hold:
(1) $\|k_n\|_1 \leq C$ for some constant $C$ independent of $n$;
(2) $\int_0^1 k_n(t) \, dt \to 1$ as $n \to \infty$; and
(3) $\int_0^1 k_n(t) g(t) \, dt \to 0$ for all $g \in C([0, 1])$ with $x \notin \operatorname{Supp}(g)$.
Exercise 7.8. Prove Theorem 7.7.
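The three conditions of Theorem 7.7 can be observed numerically for a concrete kernel. The sketch below assumes that $F_M$ is the Fejer kernel in its standard closed form $F_M(t) = \frac{1}{M+1} \bigl( \sin(\pi (M+1) t) / \sin(\pi t) \bigr)^2$ (this formula is an assumption here, since the section only refers to Proposition 3.19), takes $k_M(t) = F_M(x - t)$ with the interior point $x = 0.3$, and approximates the integrals by the midpoint rule. All helper names and test functions are hypothetical.

```python
import math


def fejer(M, t):
    """Fejer kernel F_M(t) = (1/(M+1)) * (sin(pi (M+1) t) / sin(pi t))**2."""
    s = math.sin(math.pi * t)
    if abs(s) < 1e-12:
        return float(M + 1)  # limiting value at integer t
    return (math.sin(math.pi * (M + 1) * t) / s) ** 2 / (M + 1)


def midpoint_integral(func, steps=20000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / steps
    return h * sum(func((k + 0.5) * h) for k in range(steps))


x = 0.3
M = 50


def g(t):
    return math.cos(2 * math.pi * t)


def g_away(t):
    # Continuous tent function supported in [0.6, 0.9], so x is not in its support.
    return max(0.0, 1.0 - abs(t - 0.75) / 0.15)


l1_norm = midpoint_integral(lambda t: abs(fejer(M, x - t)))           # condition (1)
mass = midpoint_integral(lambda t: fejer(M, x - t))                   # condition (2)
far_part = midpoint_integral(lambda t: fejer(M, x - t) * g_away(t))   # condition (3)
smoothed = midpoint_integral(lambda t: fejer(M, x - t) * g(t))        # compare with g(x)
```

For this kernel `l1_norm` and `mass` coincide because the kernel is non-negative, and `smoothed` is already close to `g(x)` for moderate `M`.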
(Footnote to Exercise 7.6(a): Of course we may very well have $X = \overline{\langle x_n \mid n \in \mathbb{N} \rangle}$.)

Let us finish with the following two questions for a Banach space $X$:
Does $X^*$ as a vector space with the weak* topology uniquely characterize $X$?
If the weak and weak* topologies on $X^*$ agree, does it follow that $X$ is reflexive?
Both have positive answers, as a consequence of the following lemma.
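Lemma 7.9. Let $X$ be a Banach space. A linear functional on $X^*$ is continuous with respect to the weak* topology if and only if it is an evaluation map, that is a map of the form $f : x^* \mapsto x^*(x)$ for some $x \in X$.

Proof. Evaluation maps are weak* continuous by the definition of the weak* topology. Conversely, suppose $f$ is continuous. Then $f^{-1}\bigl(B_1^{\mathbb{C}}(0)\bigr)$ is a neighborhood of $0 \in X^*$, and so there exist $x_1, \dots, x_n \in X$ and $\varepsilon > 0$ with
\[ N_{x_1, \dots, x_n; \varepsilon}(0) \subseteq f^{-1}\bigl(B_1^{\mathbb{C}}(0)\bigr). \]
If now $x^* \in X^*$ satisfies $x^*(x_1) = \dots = x^*(x_n) = 0$ then any multiple of $x^*$ belongs to $N_{x_1, \dots, x_n; \varepsilon}(0)$ and therefore $|f(M x^*)| < 1$ for all $M$. This implies that $f(x^*) = 0$, or in other words that $f$ induces a functional on
\[ Y = X^* \Big/ \bigcap_{i=1}^{n} \ker \iota(x_i) \cong \operatorname{Im} \begin{pmatrix} \iota(x_1) \\ \iota(x_2) \\ \vdots \\ \iota(x_n) \end{pmatrix}, \]
where $\iota(x)(x^*) = x^*(x)$. However, the space $Y$ is finite-dimensional and the maps $\iota(x_i)$ generate the dual of $Y$, so that $f$ must be a linear combination of the form
\[ f = \sum_{i=1}^{n} \alpha_i \iota(x_i) = \iota\Bigl( \sum_{i=1}^{n} \alpha_i x_i \Bigr) \]
as claimed.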
Exercise 7.10. Let $X$ be a Banach space with a sequence $(x_n)$ that converges to some $x \in X$ in the weak topology. Show that $\sup_{n \geq 1} \|x_n\| < \infty$.

Exercise 7.11. Let $X$ be a reflexive Banach space. Let $(x_n)$ in $X$ be a bounded sequence. Show that $(x_n)$ has a weakly convergent subsequence. Notice that this follows immediately from Theorem 7.4 and Proposition 7.5 if $X^*$ is separable; the exercise is to show this in general.

Exercise 7.12. We know that the weak topology and the norm topology on infinite-dimensional Banach spaces are different. In contrast to this, show that a sequence in $\ell^1(\mathbb{N})$ converges in the weak topology if and only if it converges in the norm topology.

Exercise 7.13. Let $G$ be a compact metric abelian group. Show that there exists a $G$-invariant positive linear functional $\Lambda : C_{\mathbb{R}}(G) \to \mathbb{R}$, and deduce the existence of a Haar measure on $G$ (see Section 3.1).

Exercise 7.14. Fix $p \in (1, \infty)$.
(a) Prove that a sequence $(f_n)$ in $\ell^p(\mathbb{N})$ converges weakly to $f \in \ell^p(\mathbb{N})$ if there is some $M$ with $\|f_n\|_p \leq M$ for all $n \geq 1$ and $f_n(k) \to f(k)$ as $n \to \infty$ for each $k \in \mathbb{N}$.
(b) Find a sequence in $\ell^p(\mathbb{N})$ that converges weakly but not in norm.

Exercise 7.15. Let $X, Y$ be normed vector spaces, and let $T : X \to Y$ be linear. Show that $T$ is a bounded operator if and only if $x_n \to x$ weakly in $X$ as $n \to \infty$ implies that $T x_n \to T x$ weakly in $Y$ as $n \to \infty$.

Exercise 7.16. Let $X$ be an infinite-dimensional normed vector space. Show that the weak closure of the unit sphere $S = \{x \in X \mid \|x\| = 1\}$ is the closed unit ball $\overline{B_1(0)} = \{x \in X \mid \|x\| \leq 1\}$.
7.1.3 Analytic Functions and the Weak Topology

As we have seen, weak convergence and norm convergence are in general quite different. There are, however, situations in which weak convergence can be upgraded to norm convergence. Analytic functions taking values in a Banach space provide one such setting where this phenomenon is seen.

Definition 7.17. Let $G \subseteq \mathbb{C}$ be an open set, and let $X$ be a complex Banach space. A function $f : G \to X$ is called (strongly) analytic if for every $\zeta \in G$ the limit
\[ f'(\zeta) = \lim_{h \to 0} \frac{f(\zeta + h) - f(\zeta)}{h} \]
exists in the norm topology. Also $f$ is called weakly analytic if for every $\ell \in X^*$ and $\zeta \in G$ the limit
\[ (\ell \circ f)'(\zeta) = \lim_{h \to 0} \frac{\ell(f(\zeta + h)) - \ell(f(\zeta))}{h} \]
exists.

Notice that in the definition of weak analyticity we do not see immediately whether we can associate to $f$ and $\zeta$ a weak limit of the difference quotient defining $f'(\zeta)$. What we can associate to $f$ and $\zeta \in G$ in terms of a derivative is a weak* limit in $X^{**}$,
\[ \ell \mapsto \lim_{h \to 0} \frac{\ell(f(\zeta + h)) - \ell(f(\zeta))}{h}, \]
which is bounded by the Banach-Steinhaus theorem (Corollary 5.3). However, much more is true.
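Theorem 7.18 (Dunford). Let $G \subseteq \mathbb{C}$ be an open set and let $X$ be a Banach space. A weakly analytic function $f : G \to X$ is analytic.

Proof. Let $\ell \in X^*$, so that by assumption $\ell \circ f : G \to \mathbb{C}$ is analytic, and so
\[ \ell \circ f(\zeta) = \frac{1}{2\pi i} \oint_{|z - \zeta| = \rho} \frac{\ell \circ f(z)}{z - \zeta} \, dz \]
for sufficiently small $\rho > 0$, where the notation denotes the contour integral over a circular path with positive orientation winding once around $\zeta$ with radius $\rho$. For $h \in B_\rho^{\mathbb{C}}(0) \setminus \{0\}$ we therefore have
\[ \ell \circ f(\zeta + h) - \ell \circ f(\zeta) = \frac{1}{2\pi i} \oint_{|z - \zeta| = \rho} \ell \circ f(z) \Bigl( \frac{1}{z - (\zeta + h)} - \frac{1}{z - \zeta} \Bigr) dz = \frac{h}{2\pi i} \oint_{|z - \zeta| = \rho} \frac{\ell \circ f(z)}{(z - (\zeta + h))(z - \zeta)} \, dz. \tag{7.1} \]
For $h \neq h'$ in $B_\rho^{\mathbb{C}}(0) \setminus \{0\}$ we write
\[ x(h, h') = \frac{1}{h - h'} \Bigl( \frac{f(\zeta + h) - f(\zeta)}{h} - \frac{f(\zeta + h') - f(\zeta)}{h'} \Bigr) \]
for the second-order difference quotient. We claim that $x(h, h')$ is uniformly bounded for $h \neq h'$ in $B_{\rho/2}^{\mathbb{C}}(0) \setminus \{0\}$. Assuming this for the moment, we see that
\[ h \mapsto \frac{f(\zeta + h) - f(\zeta)}{h} \]
is a Lipschitz function on $B_{\rho/2}^{\mathbb{C}}(0) \setminus \{0\}$ and so, since $X$ is complete, has a limit as $h \to 0$. Now let $\ell \in X^*$ and use (7.1) to calculate that
\[ \ell(x(h, h')) = \frac{1}{2\pi i (h - h')} \oint_{|z - \zeta| = \rho} \Bigl( \frac{\ell \circ f(z)}{(z - (\zeta + h))(z - \zeta)} - \frac{\ell \circ f(z)}{(z - (\zeta + h'))(z - \zeta)} \Bigr) dz \]
\[ = \frac{1}{2\pi i (h - h')} \oint_{|z - \zeta| = \rho} \frac{\ell \circ f(z) (h - h')}{(z - \zeta)(z - (\zeta + h))(z - (\zeta + h'))} \, dz = \frac{1}{2\pi i} \oint_{|z - \zeta| = \rho} \frac{\ell \circ f(z)}{(z - \zeta)(z - (\zeta + h))(z - (\zeta + h'))} \, dz. \tag{7.2} \]
Notice that the denominator in (7.2) is uniformly bounded away from zero, and the numerator is bounded above by $M_\ell$ for some constant $M_\ell$ depending only on $\ell \circ f$. It follows that $|\ell(x(h, h'))|$ is uniformly bounded for every $\ell \in X^*$. By the Banach-Steinhaus theorem (Theorem 5.1) this shows that $x(h, h')$ is uniformly bounded, which proves the claim.

Another instance where weak convergence can be upgraded to strong convergence arises in the proof of a version of a mean ergodic theorem for a measure-preserving group action. We refer to [12, Sec. 8.7], where a simple version of an argument due to Greschonig and Schmidt [16] is presented.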
7.2 Applications of Weak* Compactness

7.2.1 Equidistribution

The combination of the Riesz representation theorem for functionals on $C(X)$ in Theorem 6.40 and the compactness of the unit ball in the weak* topology in the Tychonoff-Alaoglu theorem (Theorem 7.4) provides the basic tools for studying sequences of probability measures(7).

Proposition 7.19. Let $X$ be a compact metric space. Then the space $\mathcal{P}(X)$ of probability measures defined on the Borel $\sigma$-algebra of $X$ forms a compact metric space in the weak* topology.

Proof. By the Riesz representation theorem (Theorem 6.40) we have $C(X)^* = \mathcal{M}(X) \supseteq \mathcal{P}(X)$ where $\mathcal{M}(X)$ is the space of signed measures defined on the Borel $\sigma$-algebra of $X$. To see that $\mathcal{P}(X)$ is a closed subset, notice that
\[ \mathcal{P}(X) = \Bigl\{ \mu \in \mathcal{M}(X) \Bigm| \int f \, d\mu \geq 0 \text{ for all } f \in C(X) \text{ with } f \geq 0 \text{ and } \int \mathbb{1}_X \, d\mu = 1 \Bigr\} \subseteq \overline{B_1^{C(X)^*}(0)} \]
is defined as an intersection of weak* closed sets. By the Tychonoff-Alaoglu theorem (Theorem 7.4) this implies that $\mathcal{P}(X)$ is compact in the weak* topology. By Lemma 3.2, $C(X)$ is separable, and by Proposition 7.5 the weak* topology on $\mathcal{P}(X)$ is metrizable.

Definition 7.20. Let $X$ be a compact metric space, and let $(\mu_n)$ be a sequence of probability measures in $\mathcal{P}(X)$. We say that $(\mu_n)$ equidistributes with respect to a probability measure $m \in \mathcal{P}(X)$ if $\mu_n \to m$ as $n \to \infty$ in the weak* topology; that is, if
\[ \int_X f \, d\mu_n \to \int_X f \, dm \]
as $n \to \infty$ for all $f \in C(X)$. A sequence $(x_n)$ equidistributes with respect to $m \in \mathcal{P}(X)$ if the averages
\[ \mu_n = \frac{1}{n} \bigl( \delta_{x_1} + \dots + \delta_{x_n} \bigr) \]
of the Dirac measures of $x_1, \dots, x_n$ equidistribute with respect to $m$.

One is often interested in equidistribution with respect to a natural given measure like the Lebesgue measure on $\mathbb{T}$, and in that case the natural measure is often not mentioned, and we simply talk about a sequence of measures being equidistributed. For the case of the Lebesgue measure on $\mathbb{T}^d$ the following provides a characterization of equidistribution.
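Lemma 7.21. A sequence $(\mu_n)$ of probability measures on $\mathbb{T}^d$ equidistributes if and only if
\[ \int_{\mathbb{T}^d} \chi_m \, d\mu_n \to \int_{\mathbb{T}^d} \chi_m \, dx = \begin{cases} 1 & \text{if } m = 0; \\ 0 & \text{if } m \neq 0 \end{cases} \]
for all $m \in \mathbb{Z}^d$, where $\chi_m(x) = e^{2\pi i m \cdot x}$ denotes the character corresponding to $m$.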
Exercise 7.22. Prove Lemma 7.21 using the fact that the trigonometric polynomials are dense in $C(\mathbb{T}^d)$.

Lemma 7.21 already gives some examples of equidistributed sequences, generalizing Example 1.9, and gives a new approach to Exercise 1.11.

Exercise 7.23. Show that the sequence $(x_n)$ in $\mathbb{T}^d$ defined by
\[ x_n = n(\alpha_1, \dots, \alpha_d) \pmod{\mathbb{Z}^d} \in \mathbb{T}^d \]
for some $\alpha_1, \dots, \alpha_d \in \mathbb{R}$ is equidistributed in $\mathbb{T}^d$ if and only if $1, \alpha_1, \dots, \alpha_d$ are linearly independent over $\mathbb{Q}$.
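The criterion of Lemma 7.21 lends itself to a quick numerical experiment in dimension $d = 1$. The following sketch is illustrative only: the function name `character_average`, the sample size, and the choice of $\sqrt{2}$ as irrational rotation number are assumptions. It averages the character $e^{2\pi i m x}$ over the first $N$ points $x_n = n\alpha \bmod 1$; the averages are small for an irrational $\alpha$ and $m \neq 0$, but stay at $1$ for a rational $\alpha$ whose denominator divides $m$.

```python
import cmath
import math


def character_average(alpha, m, N):
    """Average of exp(2 pi i m x_n) over x_n = n * alpha mod 1 for n = 1, ..., N."""
    total = sum(cmath.exp(2j * math.pi * m * ((n * alpha) % 1.0))
                for n in range(1, N + 1))
    return total / N


N = 5000
irrational = math.sqrt(2)
rational = 0.25  # x_n only visits 0, 1/4, 1/2, 3/4

# Irrational rotation: the averages for m = 1, 2, 3 are close to 0.
small = [abs(character_average(irrational, m, N)) for m in (1, 2, 3)]

# Rational rotation: the character with m = 4 equals 1 at every multiple of 1/4.
stuck = abs(character_average(rational, 4, N))
```

The smallness in the irrational case reflects the geometric-sum bound $\bigl|\sum_{n=1}^{N} e^{2\pi i m n \alpha}\bigr| \leq 1/|\sin(\pi m \alpha)|$.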
Equidistribution results like this are a starting point for more general results obtained by Weyl [50]. We will only discuss a special case, and outline a proof along the lines of a slightly more recent approach due to Furstenberg [15].

Proposition 7.24. If $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, then the sequence $(x_n)$ defined by $x_n = n^2 \alpha$ modulo $\mathbb{Z}$ is equidistributed in $\mathbb{T}$.

The approach of Furstenberg is to study not just $n^2 \alpha$ modulo $\mathbb{Z}$ but in fact orbits of points $(x, y) \in \mathbb{T}^2$ under the map $T : \mathbb{T}^2 \to \mathbb{T}^2$ defined by
\[ T(x, y) = (x + \alpha, y + 2x + \alpha). \tag{7.3} \]
Notice that
\[ T(0, 0) = (\alpha, \alpha), \quad T^2(0, 0) = (2\alpha, 4\alpha), \quad \dots, \quad T^n(0, 0) = (n\alpha, n^2\alpha), \]
so that Proposition 7.24 follows from the stronger result that the orbit $(T^n(0, 0))$ is equidistributed in $\mathbb{T}^2$. Dynamical questions of this sort concerning equidistribution of an orbit under iteration of a map are part of ergodic theory. We will briefly outline how one can use the Tychonoff-Alaoglu theorem (Theorem 7.4) to prove Proposition 7.24 using ideas from ergodic theory without developing this theory further, and refer to [12] for a more thorough treatment.
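The orbit formula for the map in (7.3) follows by induction from $(n+1)^2 = n^2 + 2n + 1$, and it can also be confirmed by direct iteration. The sketch below uses exact rational arithmetic; the particular value of `alpha` is a hypothetical choice made only so that the comparison is exact, and the identity itself does not depend on it.

```python
from fractions import Fraction


def T(point, alpha):
    """One step of the map (x, y) -> (x + alpha, y + 2x + alpha) on the torus."""
    x, y = point
    return ((x + alpha) % 1, (y + 2 * x + alpha) % 1)


def orbit_of_origin(alpha, steps):
    """Return [T(0,0), T^2(0,0), ..., T^steps(0,0)]."""
    point = (Fraction(0), Fraction(0))
    points = []
    for _ in range(steps):
        point = T(point, alpha)
        points.append(point)
    return points


# A rational alpha is used only so that the arithmetic is exact.
alpha = Fraction(3, 17)
orbit = orbit_of_origin(alpha, 50)
expected = [((n * alpha) % 1, (n * n * alpha) % 1) for n in range(1, 51)]
```

Comparing `orbit` with `expected` checks the identity $T^n(0,0) = (n\alpha, n^2\alpha)$ modulo $\mathbb{Z}^2$ for the first fifty iterates.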
Definition 7.25. Let $X$ be a compact metric space, and let $T : X \to X$ be a continuous transformation. A probability measure $\mu$ on $X$ is called $T$-invariant if $\mu(T^{-1} B) = \mu(B)$ for all measurable sets $B \subseteq X$. The triple $(X, T, \mu)$ is called a measure-preserving system. A $T$-invariant probability measure $\mu$ is ergodic if every measurable set $B \subseteq X$ with $\mu(T^{-1} B \triangle B) = 0$ satisfies $\mu(B) \in \{0, 1\}$.

Ergodicity is the natural notion of indecomposability for ergodic theory (which includes the study of measure-preserving systems). To see this, notice that if $B \subseteq X$ has $\mu(T^{-1} B \triangle B) = 0$ and $\mu(B) \in (0, 1)$, then we can decompose the measure $\mu$ into a convex combination
\[ \mu = \mu(B) \Bigl( \frac{1}{\mu(B)} \mu|_B \Bigr) + \mu(X \setminus B) \Bigl( \frac{1}{\mu(X \setminus B)} \mu|_{X \setminus B} \Bigr), \]
where one can quickly check that
\[ \frac{1}{\mu(B)} \mu|_B \quad \text{and} \quad \frac{1}{\mu(X \setminus B)} \mu|_{X \setminus B} \]
are two different $T$-invariant probability measures. Thus a non-ergodic measure-preserving system can be decomposed into two disjoint measure-preserving systems. Pursuing the idea that a non-ergodic measure is one that can be decomposed in this way leads to the following alternate characterization of ergodicity.

Proposition 7.26. The space
\[ \mathcal{P}^T(X) = \{ \mu \in \mathcal{M}(X) \mid \mu \text{ is a } T\text{-invariant probability measure} \} \]
is a weak* compact convex subset of $\mathcal{M}(X)$. The extremal points of $\mathcal{P}^T(X)$ are precisely the ergodic measures in $\mathcal{P}^T(X)$. (Here the extremal points are those elements $\mu$ that cannot be expressed as a proper convex combination $\mu = s\mu_1 + (1 - s)\mu_2$ with $s \in (0, 1)$ and $\mu_1, \mu_2$ distinct elements of $\mathcal{P}^T(X)$. We will discuss extremal points from an abstract point of view in Section 7.5.2.)

This characterization is interesting because it relates an intrinsic property of a $T$-invariant probability measure (ergodicity, as in Definition 7.25) with a property regarding the relative position of this measure in the space of all $T$-invariant probability measures.

Sketch of Proof of Proposition 7.26. By Riesz representation (Theorem 6.30) a signed measure $\mu$ on $X$ is a measure if and only if $\int_X f \, d\mu \geq 0$ for all $f \in C(X)$ with $f \geq 0$. By the uniqueness part of Riesz representation, a measure $\mu$ is $T$-invariant if and only if
\[ \int_X f \circ T \, d\mu = \int_X f \, d\mu \]
for all $f \in C(X)$. Hence $\mathcal{P}^T(X) \subseteq \mathcal{P}(X) \subseteq \overline{B_1^{\mathcal{M}(X)}(0)}$ is weak* closed, and so compact by the Tychonoff-Alaoglu theorem (Theorem 7.4). It is easy to see that $\mathcal{P}^T(X)$ is convex, and the discussion before the proposition shows that a non-ergodic invariant measure is not extremal.

Suppose now that $\mu$ is not extremal, and write $\mu = s\mu_1 + (1 - s)\mu_2$ with $s \in (0, 1)$ and $\mu_1, \mu_2$ distinct measures in $\mathcal{P}^T(X)$. Clearly $\mu_1 \ll \mu$ since $s \in (0, 1)$, so there is a non-negative measurable function $f_1$ with $d\mu_1 = f_1 \, d\mu$. We claim that $f_1$ is $T$-invariant in the sense that $f_1 \circ T = f_1$ almost everywhere with respect to $\mu$. To see this let $B \subseteq X$ be a measurable set and note that by $T$-invariance of $\mu$ we have
\[ \mu_1(B) = \int_B f_1 \, d\mu = \int_X (\mathbb{1}_B f_1) \circ T \, d\mu = \int_{T^{-1} B} f_1 \circ T \, d\mu, \]
and by definition
\[ \mu_1(T^{-1} B) = \int_{T^{-1} B} f_1 \, d\mu. \]
Since $\mu_1$ is $T$-invariant, these two quantities agree. Let us assume now that $T$ has a continuous inverse (which is the case for the map on $\mathbb{T}^2$ considered above to which this result will be applied; this assumption is not necessary, and we refer to [12] for the general case). This implies that $f_1 = f_1 \circ T$ almost everywhere, since all measurable sets are pre-images as we have $T^{-1}(TB) = B$. Since $\mu_1 \neq \mu$ the function $f_1$ is not equal to a constant function almost everywhere with respect to $\mu$, and has
\[ \int_X f_1 \, d\mu = \mu_1(X) = 1. \]
Therefore, $B = f_1^{-1}([0, 1))$ satisfies $\mu(B \triangle T^{-1} B) = 0$ and has $\mu(B) \in (0, 1)$, so $\mu$ is not ergodic.

The compactness of $\mathcal{P}(X)$ can be used to find elements of $\mathcal{P}^T(X)$ from sequences of approximately invariant measures.

Exercise 7.27. For any sequence $(\nu_n)$ in $\mathcal{P}(X)$, define a sequence $(\mu_n)$ by
\[ \mu_n = \frac{1}{n} \sum_{j=1}^{n} T_*^j \nu_n \]
for all $n \geq 1$, where
\[ S_* \nu(B) = \nu(S^{-1} B) \]
for any map $S : X \to X$ and measure $\nu$ is called the push-forward of $\nu$ under $S$. Show that any weak* limit of a subsequence of $(\mu_n)$ is $T$-invariant, and deduce that $\mathcal{P}^T(X)$ is non-empty(8).
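With these general facts about continuous transformations on compact metric spaces at our disposal, we return to a consideration of the transformation (7.3). We start this by explaining Example 1.9 in this language. Clearly the map $R_\alpha : \mathbb{T} \to \mathbb{T}$ defined by $R_\alpha(x) = x + \alpha$ preserves the Lebesgue measure $\lambda_{\mathbb{T}}$, so $(\mathbb{T}, R_\alpha, \lambda_{\mathbb{T}})$ is a measure-preserving system.

Lemma 7.28. If $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ then $\lambda_{\mathbb{T}}$ is ergodic for $R_\alpha$.

Proof. Suppose that $B \subseteq \mathbb{T}$ is a measurable set with
$$\lambda_{\mathbb{T}}(B \triangle R_\alpha^{-1}B) = 0.$$
Then the characteristic function $\mathbb{1}_B$ satisfies
$$\mathbb{1}_B = \mathbb{1}_B \circ R_\alpha = \mathbb{1}_{R_\alpha^{-1}B}$$
as elements of $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$. Thus for the Fourier series expansion
$$\mathbb{1}_B = \sum_{m \in \mathbb{Z}} c_m \chi_m,$$
which converges in $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$, we have
$$\mathbb{1}_B \circ R_\alpha = \sum_{m \in \mathbb{Z}} c_m \chi_m \circ R_\alpha = \sum_{m \in \mathbb{Z}} c_m \chi_m,$$
where we have used the fact that $U_{R_\alpha} : f \mapsto f \circ R_\alpha$ is an isometry of $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$ and hence maps a convergent series to a convergent series. Notice that
$$\chi_m \circ R_\alpha(x) = e^{2\pi i m(x + \alpha)} = e^{2\pi i m\alpha} \chi_m(x),$$
so that (by uniqueness of Fourier coefficients) we must have $c_m e^{2\pi i m\alpha} = c_m$ for all $m \in \mathbb{Z}$. Since $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ this implies that $c_m = 0$ for $m \in \mathbb{Z} \setminus \{0\}$, so $\mathbb{1}_B = c_0$ in $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$, which implies that $\lambda_{\mathbb{T}}(B) \in \{0, 1\}$ as required.

Indeed, a stronger statement is true: $\lambda_{\mathbb{T}}$ is the only probability measure invariant under $R_\alpha$ if $\alpha \in \mathbb{R} \setminus \mathbb{Q}$. This implies Lemma 7.28 by Proposition 7.26. To see this stronger result, let $\mu$ be any $R_\alpha$-invariant probability measure, and calculate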
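$$\int_{\mathbb{T}} \chi_m \, d\mu = \int_{\mathbb{T}} \chi_m \circ R_\alpha \, d\mu = e^{2\pi i m\alpha} \int_{\mathbb{T}} \chi_m \, d\mu,$$
which implies that
$$\int_{\mathbb{T}} \chi_m \, d\mu = \begin{cases} 1 & \text{for } m = 0; \\ 0 & \text{for } m \neq 0. \end{cases}$$
Since this is a property shared by $\lambda_{\mathbb{T}}$, and the trigonometric polynomials are dense in $C(\mathbb{T})$ by Proposition 3.19, we deduce that $\mu = \lambda_{\mathbb{T}}$. Using Exercise 7.27 together with the exercise below gives an alternative approach to Example 1.9. Of course this approach is more complicated, but it can also be used in situations where a direct calculation of the sort used in Example 1.9 is not feasible.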
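Exercise 7.29. Let $Z$ be a compact metric space, let $(x_n)$ be a sequence in $Z$, and let $z \in Z$. Show that the following are equivalent:
- $\lim_{n \to \infty} x_n = z$;
- for every subsequence $(x_{n_k})$ there is a further subsequence $(x_{n_{k_\ell}})$ such that $\lim_{\ell \to \infty} x_{n_{k_\ell}} = z$.
Assume now that $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, and use this, together with the fact that $P^{R_\alpha}(\mathbb{T}) = \{\lambda_{\mathbb{T}}\}$ and Exercise 7.27, to show the equidistribution of $(n\alpha)$ in $\mathbb{T}$.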
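The equidistribution of $(n\alpha)$ can be checked numerically through the averages of the characters $\chi_m$ along the orbit, which should tend to $0$ for every $m \neq 0$. The following is only a minimal sketch: the function name `character_average`, the choice $\alpha = \sqrt{2}$, and the orbit length are illustrative assumptions, and a finite computation is of course no substitute for the argument requested in the exercise.

```python
import cmath
import math


def character_average(alpha: float, m: int, n: int) -> complex:
    """Return (1/n) * sum_{j=0}^{n-1} chi_m(j * alpha), where chi_m(x) = exp(2*pi*i*m*x)."""
    total = 0j
    for j in range(n):
        # Reduce j * alpha modulo 1 before multiplying by m to keep the phase accurate.
        total += cmath.exp(2j * math.pi * m * ((j * alpha) % 1.0))
    return total / n


alpha = math.sqrt(2)  # an irrational rotation number (illustrative choice)
for m in (1, 2, 5):
    # For irrational alpha these moduli are small; for m = 0 the average is exactly 1.
    print(m, abs(character_average(alpha, m, 10_000)))
```

For a rational $\alpha$ the averages need not decay: with $\alpha = 1/2$ and $m = 2$ every term equals $1$, matching the fact that $R_{1/2}$ has invariant measures other than $\lambda_{\mathbb{T}}$.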
We now describe the procedure for obtaining equidistribution of the orbits of the map $T$ defined by $T(x, y) = (x + \alpha, y + 2x + \alpha)$ on $\mathbb{T}^2$ discussed earlier, leaving some of the steps as exercises.

Exercise 7.30. Show that the Lebesgue measure $\lambda_{\mathbb{T}^2}$ is $T$-invariant and ergodic.
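As a numerical sanity check of the equidistribution claim for $T(x, y) = (x + \alpha, y + 2x + \alpha)$, one can average the characters $(x, y) \mapsto e^{2\pi i(ax + by)}$ of $\mathbb{T}^2$ along an orbit. The sketch below is an illustration under assumptions: the helper names `skew_step` and `orbit_character_average`, the starting point $(0, 0)$, and the choice $\alpha = \sqrt{2}$ are not from the text.

```python
import cmath
import math
from typing import Tuple


def skew_step(x: float, y: float, alpha: float) -> Tuple[float, float]:
    """One step of T(x, y) = (x + alpha, y + 2x + alpha), with coordinates reduced mod 1."""
    return (x + alpha) % 1.0, (y + 2.0 * x + alpha) % 1.0


def orbit_character_average(a: int, b: int, alpha: float, n: int,
                            x: float = 0.0, y: float = 0.0) -> complex:
    """Average of exp(2*pi*i*(a*x_j + b*y_j)) over the first n orbit points (x_j, y_j)."""
    total = 0j
    for _ in range(n):
        total += cmath.exp(2j * math.pi * (a * x + b * y))
        x, y = skew_step(x, y, alpha)
    return total / n


alpha = math.sqrt(2)  # illustrative irrational parameter
for a, b in ((1, 0), (0, 1), (1, 1)):
    print((a, b), abs(orbit_character_average(a, b, alpha, 2_000)))
```

Starting from $(0, 0)$ the second coordinate of $T^j(0,0)$ is $j^2\alpha$, so the case $(a, b) = (0, 1)$ is a quadratic exponential sum; its average decays more slowly than the linear sums for the rotation $R_\alpha$, but it is still small for long orbits.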
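Sketch of Proof of Proposition 7.24. As discussed just after the statement of the proposition, it is enough to show that every $T$-orbit $(x, y), T(x, y), T^2(x, y), \ldots$ is equidistributed with respect to $\lambda_{\mathbb{T}^2}$. Notice that the first coordinates of the points in the orbit are precisely the points in the orbit
$$x, R_\alpha(x), R_\alpha^2(x), \ldots$$
in $\mathbb{T}$ for the transformation $R_\alpha$. We already know that this sequence is equidistributed with respect to $\lambda_{\mathbb{T}}$. Write $\delta_x$ for the Dirac measure at $x \in \mathbb{T}$, so that the equidistribution of $R_\alpha$-orbits is equivalent to the statement
$$\frac{1}{n} \sum_{j=0}^{n-1} T_*^j(\delta_x \times \lambda_{\mathbb{T}}) \longrightarrow \lambda_{\mathbb{T}^2} \qquad (7.4)$$
as $n \to \infty$. Suppose that
$$\frac{1}{n} \sum_{j=0}^{n-1} T_*^j(\delta_x \times \delta_y) \not\longrightarrow \lambda_{\mathbb{T}^2}$$
as $n \to \infty$. Then there exists some $f \in C_{\mathbb{R}}(\mathbb{T}^2)$ and $\varepsilon > 0$ such that
$$\left| \frac{1}{n_k} \sum_{j=0}^{n_k - 1} f(T^j(x, y)) - \int f \, d\lambda_{\mathbb{T}^2} \right| > 2\varepsilon \qquad (7.5)$$
for all $k \geq 1$ for some subsequence $(n_k)$ with $n_k \to \infty$ as $k \to \infty$. By continuity there is some $\delta > 0$ such that
$$d((x_1, y_1), (x_2, y_2)) < \delta \implies |f(x_1, y_1) - f(x_2, y_2)| < \varepsilon$$
(where $d$ denotes the usual metric on $\mathbb{T}^2$). Write $\lambda_{y,\delta} = \lambda_{\mathbb{T}}|_{B_\delta(y)}$ for the Lebesgue measure restricted to the $\delta$-ball $B_\delta(y) = (y - \delta, y + \delta) \subseteq \mathbb{T}$ around $y \in \mathbb{T}$, and consider the average
$$\frac{1}{n_k} \sum_{j=0}^{n_k - 1} \frac{1}{2\delta} T_*^j(\delta_x \times \lambda_{y,\delta}).$$
By Proposition 7.19 there exists a convergent subsequence (which we again denote by $(n_k)$) with limit $\mu_1$. Using the convergence in (7.4), we also see that
$$\frac{1}{n_k(1 - 2\delta)} \sum_{j=0}^{n_k - 1} T_*^j\bigl(\delta_x \times (\lambda_{\mathbb{T}} - \lambda_{y,\delta})\bigr)$$
converges to some $\mu_2$ as $k \to \infty$. By Exercise 7.27 we have $\mu_1, \mu_2 \in P^T(\mathbb{T}^2)$. We also have
$$\lambda_{\mathbb{T}^2} = 2\delta \mu_1 + (1 - 2\delta) \mu_2$$
by (7.4). Together with Exercise 7.30 and Proposition 7.26 this implies that $\lambda_{\mathbb{T}^2} = \mu_1 = \mu_2$. We claim that this is essentially what we wanted to prove. By the choice of $\delta$ we have
$$\left| \frac{1}{n_k} \sum_{j=0}^{n_k - 1} f(T^j(x, y)) - \frac{1}{2\delta n_k} \sum_{j=0}^{n_k - 1} \int_{-\delta}^{\delta} f(T^j(x, y + z)) \, dz \right| < \varepsilon$$
since $T^j(x, y + z) = T^j(x, y) + (0, z)$ has distance less than $\delta$ from $T^j(x, y)$ for all $z \in (-\delta, \delta)$. The second average converges to $\int f \, d\mu_1 = \int f \, d\lambda_{\mathbb{T}^2}$, and it follows that
$$\int_{\mathbb{T}^2} f \, d\lambda_{\mathbb{T}^2} - \varepsilon \leq \liminf_{k \to \infty} \frac{1}{n_k} \sum_{j=0}^{n_k - 1} f(T^j(x, y)) \leq \limsup_{k \to \infty} \frac{1}{n_k} \sum_{j=0}^{n_k - 1} f(T^j(x, y)) \leq \int_{\mathbb{T}^2} f \, d\lambda_{\mathbb{T}^2} + \varepsilon.$$
However, this contradicts the assumption (7.5) about the subsequence $(n_k)$. It follows that
$$\frac{1}{n} \sum_{j=0}^{n-1} T_*^j(\delta_x \times \delta_y) \longrightarrow \lambda_{\mathbb{T}^2}$$
as $n \to \infty$.

7.2.2 Elliptic Regularity at the Boundary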
7.3 Topologies on B (X, Y )

Let $X$ and $Y$ be Banach spaces. Then we have seen that the space $B(X, Y)$ of bounded linear operators from $X$ to $Y$ together with the operator norm is again a Banach space.

Definition 7.31. Let $X$ and $Y$ be Banach spaces. The topology on $B(X, Y)$ induced by the operator norm is called the uniform operator topology.

Since any Banach space has a weak topology, there is of course also a weak topology on $B(X, Y)$. There are, however, further topologies that make special use of the fact that $B(X, Y)$ is a space of maps.

Definition 7.32. Let $X$ and $Y$ be Banach spaces. The strong operator topology on $B(X, Y)$ is the weakest topology for which the evaluation maps
$$B(X, Y) \ni L \longmapsto Lx \in Y$$
are continuous for every $x \in X$, where we use the norm topology on $Y$. In other words, a neighborhood of $L_0 \in B(X, Y)$ in the strong operator topology is a set containing a set of the form
$$N_{x_1, \ldots, x_n; \varepsilon}(L_0) = \{L \in B(X, Y) \mid \|Lx_i - L_0 x_i\| < \varepsilon \text{ for } i = 1, \ldots, n\}.$$
Equivalently we could define the strong operator topology by using all neighborhoods defined by all of the semi-norms
$$\|L\|_{x_1, \ldots, x_n} = \max(\|Lx_1\|, \ldots, \|Lx_n\|).$$
The strong operator topology is in many situations more natural than the uniform topology, and the study of unitary representations (see Definition 3.26 for the general definition) is an example.

Example 7.33. Let $H = L^2(\mathbb{R})$ and define for $x \in \mathbb{R}$ the unitary map $U_x : H \to H$ by $U_x f(t) = f(t + x)$ for $t \in \mathbb{R}$ and $f \in H$. We claim that
$$\|U_x - U_y\| = \begin{cases} 2 & \text{if } x \neq y; \\ 0 & \text{if } x = y. \end{cases}$$
In fact, if $x < y$ and $M > 0$ then we can define a function $f \in H$ (illustrated in Figure 7.1) by
$$f(t) = \begin{cases} 0 & \text{for } t < 0; \\ (-1)^m & \text{for } t \in (m(y - x), (m + 1)(y - x)) \text{ with } 0 \leq m < M; \\ 0 & \text{for } t > M(y - x). \end{cases}$$

Fig. 7.1. The function $f$ in Example 7.33.

Then $U_{y-x} f$ satisfies
$$|(f - U_{y-x} f)(t)| = |2f(t)| = 2$$
for $t \in (0, (M - 1)(y - x))$, so that
$$\|f - U_{y-x} f\|_2 > 2\sqrt{(M - 1)(y - x)}$$
while
$$\|f\|_2 \leq \sqrt{M(y - x)}.$$
Letting $M \to \infty$ shows that $\|I - U_{y-x}\| \geq 2$, and hence
$$\|U_x - U_y\| = \|U_x(I - U_{y-x})\| = \|I - U_{y-x}\| = 2,$$
since $U_x$ is unitary and $\|U_x\| = \|U_y\| = 1$ gives the upper bound $\|U_x - U_y\| \leq 2$. The claim implies that the map $\mathbb{R} \ni x \mapsto U_x \in B(H, H)$ is not continuous with respect to the uniform operator topology. However, it is continuous with respect to the strong operator topology, for if $f_1, \ldots, f_n \in H$ and $\varepsilon > 0$ are given, then for $y$ sufficiently close to $x$ we have $\|U_y f_i - U_x f_i\|_2 < \varepsilon$ (this follows easily from the density of $C_c(\mathbb{R})$ in $L^2(\mathbb{R})$; see Proposition 2.38 and Lemma 3.27 for the details). Thus, for $y$ sufficiently close to $x$ we will have $U_y \in N_{f_1, \ldots, f_n; \varepsilon}(U_x)$ as required.
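The norm estimate in Example 7.33 can be reproduced exactly for the alternating step function, since both $f$ and its translate by one step width are constant on the same cells. The sketch below is an illustration only: `shift_ratio` and the parameter `h` (standing for $y - x$) are names introduced here, not notation from the text.

```python
import math


def shift_ratio(M: int, h: float = 1.0) -> float:
    """Return ||f - U_h f||_2 / ||f||_2 for the alternating step function with M steps of width h."""
    # Cell k is the interval (k*h, (k+1)*h); f equals (-1)^k on cells 0, ..., M-1 and 0 elsewhere.
    f = {k: (-1.0) ** k for k in range(M)}

    def value(k: int) -> float:
        return f.get(k, 0.0)

    # (U_h f)(t) = f(t + h), so on cell k the translate takes the value of f on cell k + 1.
    diff_sq = sum((value(k) - value(k + 1)) ** 2 for k in range(-1, M)) * h
    norm_sq = sum(value(k) ** 2 for k in range(M)) * h
    return math.sqrt(diff_sq / norm_sq)


for M in (1, 10, 1000):
    print(M, shift_ratio(M))
```

The ratio equals $\sqrt{(4M - 2)/M}$, independent of the step width, so it approaches $2$ as $M$ grows no matter how small $y - x$ is. This is the failure of continuity in the uniform operator topology.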
Another topology on $B(X, Y)$ is built up using functionals on $Y$.

Definition 7.34. Let $X$ and $Y$ be Banach spaces. The weak operator topology on $B(X, Y)$ is the weakest topology with respect to which the map
$$B(X, Y) \ni L \longmapsto y^*(L(x))$$
is continuous for all $x \in X$ and $y^* \in Y^*$. Equivalently, the weak operator topology can be defined using the neighborhoods defined by the semi-norms
$$\|L\|_{x_1, y_1^*; x_2, y_2^*; \ldots; x_n, y_n^*} = \max(|y_1^*(Lx_1)|, \ldots, |y_n^*(Lx_n)|).$$

Exercise 7.35. Assume that $X$ and $Y$ are infinite-dimensional Banach spaces. Show that the uniform topology, the weak topology, the strong operator topology, and the weak operator topology are all different.
7.4 Locally Convex Vector Spaces

Even if we were initially only interested in Banach spaces, the last few sections should be convincing enough to accept that the next definition is natural and unavoidable. It gives a class of vector spaces generalizing normed vector spaces.

Definition 7.36. Let $X$ be a vector space (over $\mathbb{R}$ or $\mathbb{C}$) and suppose that $\{\|\cdot\|_\alpha \mid \alpha \in A\}$ is a family of semi-norms on $X$ such that for every $x \in X \setminus \{0\}$ there is some $\alpha \in A$ with $\|x\|_\alpha > 0$. Then the locally convex topology on $X$ induced by the semi-norms is the topology for which a neighborhood of $x_0 \in X$ is a set containing a set of the form
$$N_{\alpha_1, \ldots, \alpha_n; \varepsilon}(x_0) = \bigcap_{i=1}^{n} B_\varepsilon^{\|\cdot\|_{\alpha_i}}(x_0) = \Bigl\{x \in X \Bigm| \max_{i=1,\ldots,n} \|x - x_0\|_{\alpha_i} < \varepsilon\Bigr\}.$$
The vector space $X$ together with this topology is called a locally convex vector space.

Equivalently, a locally convex topology is the weakest topology that is stronger than those defined by a collection of semi-norms. Enlarging the collection of semi-norms if necessary, we may assume that for $\alpha_1, \ldots, \alpha_n \in A$ the semi-norm
$$\|x\| = \max_{i=1,\ldots,n} \|x\|_{\alpha_i}$$
also belongs to the collection (that is, coincides with some $\|\cdot\|_\alpha$). If this is the case, then the neighborhoods are sets containing a ball of the form
$$B_\varepsilon^{\|\cdot\|_\alpha}(x_0) = \{x \in X \mid \|x - x_0\|_\alpha < \varepsilon\}$$
for some $\alpha \in A$ and $\varepsilon > 0$.

An equivalent definition of locally convex vector spaces is obtained by requiring that a topology on the vector space $X$ makes addition and scalar multiplication continuous and has a basis of neighborhoods of $0 \in X$ consisting of absorbent balanced convex sets: A convex set $C \subseteq X$ is balanced if for any $x \in C$ and $\lambda$ with $|\lambda| \leq 1$, we also have $\lambda x \in C$, and is absorbent if any $x \in X$ has the form $\lambda x'$ for some $x' \in C$ and scalar $\lambda$. We refer to Conway [8, Sec. IV.1] for the equivalence, as we will not need it in this form (also see Exercise 7.37 and Exercise 7.44).
Exercise 7.37. Show that a locally convex vector space (as in Definition 7.36) has the property that addition and scalar multiplication are continuous, and that $0 \in X$ has a neighborhood basis consisting of absorbent balanced convex sets.

As the next exercise shows, even if a locally convex vector space topology cannot be described using a norm, the locally convex structure is enough for the Hahn-Banach theorem (Theorem 6.3).

Exercise 7.38. Let $X$ be a locally convex vector space. Show that the space $X^*$ of continuous linear functionals on $X$ separates points.
We have seen many examples of locally convex vector spaces. These include normed vector spaces with their norm or weak topology, duals of Banach spaces with the weak* topology, and the space $B(X, Y)$ of operators between two Banach spaces with any of the topologies discussed in Section 7.3. However, there are further spaces that we have neglected so far because they do not fit well (or at all) into the framework of normed spaces and spaces constructed from normed spaces.

Example 7.39. (1) The space $C^\infty([0, 1])$ is a locally convex vector space with the semi-norms
$$\|f\|_{C^n([0,1])} = \max_{j=0,\ldots,n} \|f^{(j)}\|_\infty$$
for all $n \in \mathbb{N}$. Notice that even though each of these semi-norms is a norm, we still have to use all of them to define the locally convex topology on $C^\infty([0, 1])$ we are interested in, namely the topology of uniform convergence of all derivatives. Notice that differentiation
$$C^\infty([0, 1]) \ni f \longmapsto f' \in C^\infty([0, 1])$$
is a continuous operator on $C^\infty([0, 1])$.

(2) Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Then
$$C_b^\infty(\Omega) = \bigcap_{k \geq 0} C_b^k(\Omega),$$
with $C_b^k(\Omega)$ defined as in Example 2.19(6), is another example of a locally convex vector space if we use all of the norms $\|\cdot\|_{C_b^k(\Omega)}$ for $k \geq 1$.

(3) Let $\Omega \subseteq \mathbb{R}^d$ be an open set. Another important notion of convergence in analysis for functions on $\Omega$ is the notion of uniform convergence on compact subsets. For example, on the space $C(\Omega)$ this notion is captured if we use the collection of semi-norms
$$\{\|\cdot\|_{K,\infty} \mid K \subseteq \Omega \text{ compact}\}$$
where $\|f\|_{K,\infty} = \sup_{x \in K} |f(x)|$ for $f \in C(\Omega)$ is the supremum norm of the restriction to $K$.

(4) Let $\Omega \subseteq \mathbb{R}^d$ be an open set. We can also make $C_c(\Omega)$ into a locally convex space in a natural way by endowing it with the collection of semi-norms
$$\{\|\cdot\|_F \mid F \in C(\Omega)\}$$
where $\|f\|_F = \|fF\|_\infty$ for $f \in C_c(\Omega)$ is the supremum norm taken after multiplication by $F \in C(\Omega)$. The corresponding notion of convergence is less familiar but is natural for elements of $C_c(\Omega)$ (see Exercise 7.40 below). The convergence is uniform across $\Omega$, and this remains true after multiplication with any continuous function that increases rapidly towards $\partial\Omega$.

Exercise 7.40. Use the notation from Example 7.39 in this exercise. (a) Show that $f \in C(\Omega)$ belongs to $C_c(\Omega)$ if and only if $\|f\|_F < \infty$ for all $F \in C(\Omega)$. (b) Suppose that $(f_n)$ is a sequence of functions in $C_c(\Omega)$ that converges in $C_c(\Omega)$ to some $f \in C_c(\Omega)$. Show that there exists a compact set $K \subseteq \Omega$ such that $\operatorname{Supp}(f_n), \operatorname{Supp}(f) \subseteq K$ for all $n \geq 1$.
In general the topology of a locally convex vector space is not metrizable. One important situation in which it is metrizable is when it is sufficient to use countably many semi-norms. This is the case in Example 7.39(1), (2) and (3) (for the latter recall that an open set $\Omega \subseteq \mathbb{R}^d$ is $\sigma$-compact), but is not for Example 7.39(4). If the locally convex topology on $X$ is given by the semi-norms $\|\cdot\|_n$ for $n \in \mathbb{N}$, then we can define a metric on $X$ as in Lemma A.15, leading to the following definition.

Definition 7.41. A Fréchet space is a locally convex vector space $X$ whose topology is defined by countably many semi-norms $\|\cdot\|_n$ for $n \in \mathbb{N}$, such that $X$ is complete with respect to the metric
$$d(x, y) = \sum_{n=1}^{\infty} \frac{1}{2^n} \frac{\|x - y\|_n}{1 + \|x - y\|_n}. \qquad (7.6)$$
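To get a feeling for the metric (7.6), it helps to evaluate it in a case where the semi-norms are known in closed form. The sketch below is an illustration under assumptions: it takes $X = C(\mathbb{R})$ with the semi-norms $\|f\|_n = \sup_{|x| \leq n} |f(x)|$ (a special case of Example 7.39(3)), uses the test functions $f_k(x) = x/k$, and truncates the series after finitely many terms. The names `frechet_distance` and `distance_to_zero` are hypothetical.

```python
from typing import Sequence


def frechet_distance(seminorm_values: Sequence[float]) -> float:
    """Truncation of (7.6): sum over n >= 1 of 2^{-n} * s_n / (1 + s_n), where s_n = ||x - y||_n."""
    return sum(s / (1.0 + s) / 2.0 ** n for n, s in enumerate(seminorm_values, start=1))


def distance_to_zero(k: int, terms: int = 60) -> float:
    """Approximate d(f_k, 0) for f_k(x) = x / k, using ||f_k||_n = sup_{|x| <= n} |x / k| = n / k."""
    return frechet_distance([n / k for n in range(1, terms + 1)])


for k in (1, 10, 100, 1000):
    print(k, distance_to_zero(k))
```

Each term of (7.6) is at most $2^{-n}$, so dropping the terms beyond `terms` changes the value by at most $2^{-\text{terms}}$. The functions $f_k$ are unbounded on $\mathbb{R}$, yet $d(f_k, 0) \to 0$, which reflects uniform convergence on compact subsets.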
Exercise 7.42. Suppose that the topology of a locally convex vector space $X$ is induced by countably many semi-norms $\|\cdot\|_n$ for $n \in \mathbb{N}$. (a) Show that a sequence $(x_k)$ in $X$ is a Cauchy sequence with respect to the metric in (7.6) if and only if $(x_k)$ is a Cauchy sequence with respect to all of the semi-norms $\|\cdot\|_n$ for $n \in \mathbb{N}$. (b) Show that Example 7.39(1), (2) and (3) are Fréchet spaces.
7.5 Convex Sets

7.5.1 Applications of the Hahn-Banach Lemma

A set $K \subseteq X$ in a vector space is called absorbent if for any $x \in X$ there exists some $\lambda > 0$ with $\lambda x \in K$. Note that if $K = B_1^{\|\cdot\|}(0)$ for some semi-norm $\|\cdot\|$ on $X$, then $K$ is an absorbent convex set. A partial converse is given by the following result, which gives one solution to Exercise 6.2.

Lemma 7.43. Let $K \subseteq X$ be an absorbent convex set in a vector space. Define the gauge function $p_K : X \to \mathbb{R}_{\geq 0}$ by
$$p_K(x) = \inf\{t > 0 \mid \tfrac{1}{t} x \in K\}.$$
Then $p_K(\lambda x) = \lambda p_K(x)$ and $p_K(x + y) \leq p_K(x) + p_K(y)$ for all $\lambda \geq 0$ and $x, y \in X$.

Proof. The positive homogeneity follows directly from the definition. Suppose now that $x, y \in X$ and $t_x, t_y > 0$ have
$$\frac{1}{t_x} x, \ \frac{1}{t_y} y \in K. \qquad (7.7)$$
Then
$$\frac{1}{t_x + t_y}(x + y) = \frac{t_x}{t_x + t_y} \cdot \frac{1}{t_x} x + \frac{t_y}{t_x + t_y} \cdot \frac{1}{t_y} y$$
also lies in $K$, since $K$ is convex. Thus
$$p_K(x + y) \leq t_x + t_y,$$
and since this holds for all $t_x, t_y$ with (7.7), the triangle inequality follows.
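The gauge function of Lemma 7.43 can be approximated whenever membership in $K$ can be tested. The following sketch uses bisection in $\mathbb{R}^2$; it assumes $K$ is convex and absorbent, so that $\{t > 0 \mid \frac{1}{t}x \in K\}$ is an unbounded interval. The names `gauge` and `in_diamond`, and the example set $K = \{(u, v) \mid |u| + 2|v| < 1\}$ (for which $p_K(u, v) = |u| + 2|v|$), are illustrative choices.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def gauge(in_K: Callable[[Point], bool], x: Point, tol: float = 1e-12) -> float:
    """Approximate p_K(x) = inf{t > 0 : x/t in K} by bisection, for K convex and absorbent."""
    if x == (0.0, 0.0):
        return 0.0
    hi = 1.0
    while not in_K((x[0] / hi, x[1] / hi)):  # absorbency guarantees that this loop stops
        hi *= 2.0
    lo = 0.0
    while hi - lo > tol:  # invariant: x/hi lies in K, and x/lo does not (or lo == 0)
        mid = 0.5 * (lo + hi)
        if in_K((x[0] / mid, x[1] / mid)):
            hi = mid
        else:
            lo = mid
    return hi


def in_diamond(p: Point) -> bool:
    """Membership test for the open convex set K = {(u, v) : |u| + 2|v| < 1}."""
    return abs(p[0]) + 2.0 * abs(p[1]) < 1.0


x, y = (0.3, 0.2), (-1.0, 0.5)
print(gauge(in_diamond, x), gauge(in_diamond, y),
      gauge(in_diamond, (x[0] + y[0], x[1] + y[1])))
```

On such samples one can observe both conclusions of the lemma: scaling the input by $\lambda \geq 0$ scales the gauge by $\lambda$, and the gauge of a sum does not exceed the sum of the gauges.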
Exercise 7.44. Use Lemma 7.43 to prove the converse to Exercise 7.37.
The next result strengthens Corollary 6.4 and Exercise 7.38, and is readily explained using Figure 7.2.

Theorem 7.45. Let $X$ be a locally convex vector space over $\mathbb{R}$. Let $K \subseteq X$ be a closed convex set, and suppose that $z \in X \setminus K$. Then there exists a continuous linear functional $\ell \in X^*$ and a constant $c \in \mathbb{R}$ such that
$$\ell(y) \leq c < \ell(z)$$
for all $y \in K$.

Fig. 7.2. The point $z \notin K$ can be separated from the convex set $K$ by a closed hyperplane $\ell(x) = c$.
Proof. Since $z \notin K$ and $K$ is closed, we see that $X \setminus K$ is a neighborhood of $z$, and in particular we have
$$N_{\alpha_1, \ldots, \alpha_n; \varepsilon}(z) \subseteq X \setminus K$$
for some $\alpha_1, \ldots, \alpha_n \in A$ and $\varepsilon > 0$ (see Definition 7.36 for the notation). We define $U = N_{\alpha_1, \ldots, \alpha_n; \varepsilon/2}(0)$, so that $z + 2U \subseteq X \setminus K$. Without loss of generality we may assume that $0 \in K$ (for otherwise we just translate both $K$ and $z$ by the negative of an element of $K$). Define
$$M = K + U = \{y + u \mid y \in K, u \in U\}$$
and notice that $M$ is convex because both $K$ and $U$ are (check this) and that $M \supseteq U$ is absorbent. We now apply Lemma 7.43 to find the norm-like function $p_M$. By definition, we have
$$p_M \leq \frac{2}{\varepsilon} \max\{\|\cdot\|_{\alpha_1}, \ldots, \|\cdot\|_{\alpha_n}\} \qquad (7.8)$$
since $U \subseteq M$. We claim that $p_M(z) > 1$. For otherwise there exists a sequence $(\lambda_j)$ with $\lambda_j \to 1$ as $j \to \infty$ and with
$$\frac{1}{\lambda_j} z = k_j + u_j \in M = K + U$$
for all $j \geq 1$. Clearly $\frac{1}{\lambda_j} z$ and $u_j$ are bounded in the semi-norms $\|\cdot\|_{\alpha_1}, \ldots, \|\cdot\|_{\alpha_n}$, so the same holds also for $k_j$. Now rewrite the above equation as
$$z = k_j + (\lambda_j - 1) k_j + \lambda_j u_j$$
and notice that for large enough $j$ we have
$$(\lambda_j - 1) k_j + \lambda_j u_j \in 2U$$
since $\lambda_j \to 1$ as $j \to \infty$. However, this contradicts our assumption on $U$ that $z + 2U \subseteq X \setminus K$. Therefore we must have $p_M(z) > 1$.

Now define $Z = \mathbb{R}z$ and a functional $\ell$ on $Z$ by $\ell(\lambda z) = \lambda p_M(z)$. On $Z$ we have $\ell(\lambda z) \leq p_M(\lambda z)$ for all $\lambda \geq 0$ (since $p_M$ is non-negative this holds trivially for $\lambda < 0$ also). By the Hahn-Banach lemma (Lemma 6.1) there exists an extension $\ell$ to all of $X$ such that $\ell(x) \leq p_M(x)$ for all $x \in X$. This implies that $\ell(y) \leq 1$ for $y \in K \subseteq M$. Moreover, we also get continuity of $\ell$ since (7.8) implies that
$$\ell(x) \leq \frac{2}{\varepsilon} \max\{\|x\|_{\alpha_1}, \ldots, \|x\|_{\alpha_n}\},$$
which upgrades to
$$|\ell(x)| \leq \frac{2}{\varepsilon} \max\{\|x\|_{\alpha_1}, \ldots, \|x\|_{\alpha_n}\}$$
by linearity of $\ell$ and since the right-hand side is a semi-norm. Since $\ell(z) = p_M(z) > 1$, this gives the theorem with $c = 1$.

Since the weak topology is, for infinite-dimensional vector spaces, strictly coarser than the norm topology, there is no reason why a set that is closed in the norm topology should be closed in the weak topology. However, for convex sets the situation is better.

Corollary 7.46. Suppose that $K$ is a norm-closed convex set in a real Banach space $X$. Then $K$ is also closed in the weak topology.

Proof. Suppose that $z \notin K$ and apply Theorem 7.45 to find a continuous linear functional $\ell : X \to \mathbb{R}$ with
$$\ell(y) \leq c < \ell(z)$$
for all $y \in K$ and some $c \in \mathbb{R}$. Therefore, $\ell^{-1}((c, \infty)) \subseteq X \setminus K$ is a neighborhood of $z$ in the weak topology. Hence $X \setminus K$ is open in the weak topology and the corollary follows.
Exercise 7.47. Suppose that $K, L \subseteq X$ are disjoint convex sets in a locally convex vector space $X$ over $\mathbb{R}$. Suppose one of them has non-empty interior. Then there exists a continuous linear functional $\ell$ and a constant $c$ such that
$$\ell(x) \leq c \leq \ell(y)$$
for all $x \in K$ and $y \in L$.

Exercise 7.48. Let $X$ be a normed vector space, and let $K \subseteq X$ be a closed and convex subset. Show that
$$\inf_{y \in K} \|x - y\| = \sup_{\|\ell\| = 1} \Bigl( \ell(x) - \sup_{y \in K} \ell(y) \Bigr)$$
for any $x \in X$.
7.5.2 Extremal Points and the Krein-Milman Theorem

An important concept for convex sets, both abstractly and for many concrete applications, is the notion of extremal points.

Definition 7.49. Let $X$ be a locally convex space and let $K \subseteq X$ be a convex subset. An element $x \in K$ is an extremal point of $K$ if $x$ cannot be expressed as a proper convex combination of points of $K$ (that is, if $x = sy + (1 - s)z$ with $y, z \in K$ and $s \in (0, 1)$ then we must have $x = y = z$).

Theorem 7.50 (Krein-Milman). Let $X$ be a locally convex space, and let $K \subseteq X$ be a compact convex subset. Then $K$ is the closed convex hull of its extremal points (that is, the intersection of all closed convex sets containing the set of extremal points), or equivalently the closure of the convex hull of this set.

For the proof the following extension of the definition of extremal points will be useful. A subset $E \subseteq K$ of a convex set is called an extremal subset of $K$ if $E$ is convex, non-empty, and if $x = sy + (1 - s)z$ for $x \in E$, $y, z \in K$ and $s \in (0, 1)$ forces $y, z \in E$. To better understand this notion, it may be helpful to the reader to find all extremal subsets of a polygon in $\mathbb{R}^2$ or of a polytope in $\mathbb{R}^3$.

Proof of Theorem 7.50. The proof uses Zorn's lemma, applied to the set
$$\mathcal{F} = \{E \subseteq K \mid E \text{ is an extremal closed subset of } K\}$$
and the partial order on $\mathcal{F}$ defined by $E_1 \preceq E_2$ if $E_1 \supseteq E_2$. We need to show that for every linearly ordered chain $\{E_\iota \mid \iota \in I\}$ there exists an element $E \in \mathcal{F}$ with $E \succeq E_\iota$ for every $\iota \in I$. We claim that
$$E = \bigcap_{\iota \in I} E_\iota$$
is such an element. For this we only need to show that $E \in \mathcal{F}$, as the fact that $E \succeq E_\iota$ for every $\iota \in I$ is automatic from the definition of $\preceq$. Since each $E_\iota$ is closed and convex, the same holds for the intersection $E$. Since each $E_\iota$ is non-empty and $\{E_\iota \mid \iota \in I\}$ is linearly ordered, we see that every finite intersection $E_{\iota_1} \cap \cdots \cap E_{\iota_n}$ is non-empty, because it must coincide with one of the sets $E_{\iota_1}, \ldots, E_{\iota_n}$. Since $K$ is compact, we see that the intersection $E$ is non-empty (see Appendix A.5). It remains to show that $E$ is an extremal subset. Suppose therefore that $x = sy + (1 - s)z \in E$ with $y, z \in K$ and $s \in (0, 1)$. Then we must have $y, z \in E_\iota$ for all $\iota \in I$ as $x \in E \subseteq E_\iota$ and $E_\iota$ is an extremal subset. Therefore $y, z \in E = \bigcap_{\iota \in I} E_\iota$ as required.

In summary, we have shown that we are in a position to use Zorn's lemma, and that there must therefore be a maximal element of $\mathcal{F}$. In other words there exists a minimal closed extremal subset $E$ of $K$. We claim that $E = \{x\}$ is a singleton, which then implies that $x$ must be an extremal point of $K$. Indeed, if $E$ contains two points $x_0, y_0$, then by Theorem 7.45 there exists a continuous linear functional $\ell$ on $X$ with $\ell(x_0) < \ell(y_0)$. However, by compactness this implies that
$$E' = \{z \in E \mid \ell(z) = \max \ell|_E\} \subsetneq E$$
is a non-empty proper closed convex subset. It is also an extremal subset, since if $x = sy + (1 - s)z \in E'$ with $y, z \in K$ and $s \in (0, 1)$, then we must have $y, z \in E$ as $E$ is extremal and so
$$\ell(x) = s\ell(y) + (1 - s)\ell(z) \quad \text{and} \quad \ell(y), \ell(z) \leq \ell(x) = \max \ell|_E,$$
which implies that $y, z \in E'$ as required. However, this is a contradiction since $E \subseteq K$ was supposed to be a minimal closed extremal subset of $K$. Therefore, $E = \{x_0\}$ is a singleton and we have shown that the set of extremal points of $K$ is non-empty.

Now let $M$ denote the closed convex hull of the set of all extremal points of $K$. Clearly $M \subseteq K$ and we need to show that $M = K$. Suppose that $x_0 \in K \setminus M$. By Theorem 7.45 there exists a continuous linear functional $\ell$ with
$$\ell(y) \leq c < \ell(x_0)$$
for all $y \in M$. Now let
$$E = \{x \in K \mid \ell(x) = \max \ell|_K\},$$
and notice that $E \subseteq K \setminus M$ is a closed convex subset of $K$. Therefore, $E$ is compact and by the above argument there exists an extremal point $x \in E$. We claim that $x$ is also an extremal point of $K$, which then gives a contradiction since then $x \in M \subseteq K \setminus E$ also, by definition of $M$. So suppose that $x = sy + (1 - s)z$ with $y, z \in K$ and $s \in (0, 1)$. Then
$$\ell(x) = s\ell(y) + (1 - s)\ell(z) \quad \text{and} \quad \ell(y), \ell(z) \leq \max \ell|_K = \ell(x),$$
which implies that $y, z \in E$ and hence $x = y = z$ by extremality of $x$ in $E$. This contradiction shows that $K = M$ is the closed convex hull of the extremal points.

The Krein-Milman theorem, together with the Tychonoff-Alaoglu theorem (Theorem 7.4), can produce some striking consequences.

Example 7.51. Let us show that $c_0(\mathbb{N})$ has no pre-dual. In other words, there is no Banach space $X$ with the property that $X^*$ is isometrically isomorphic to $c_0(\mathbb{N})$. Indeed, suppose that there is such a Banach space. Then, by the Tychonoff-Alaoglu theorem, the unit ball of $c_0(\mathbb{N})$ would be weak* compact. Thus, by the Krein-Milman theorem, the unit ball would have to contain some extremal point $(a_n)_{n \geq 1}$. We complete the argument by showing that there cannot be such an extremal point of the unit ball. By definition, $|a_n| \leq 1$ for all $n \geq 1$ and $\lim_{n \to \infty} a_n = 0$. Therefore, there exists some $n_0$ with $|a_{n_0}| < \frac{1}{2}$ and then the sequences $(b_n)$ and $(c_n)$ defined by
$$b_n = \begin{cases} a_n & \text{for } n \neq n_0, \\ 2a_n & \text{for } n = n_0 \end{cases} \quad \text{and} \quad c_n = \begin{cases} a_n & \text{for } n \neq n_0, \\ 0 & \text{for } n = n_0 \end{cases}$$
both belong to the unit ball by construction and we have
$$(a_n) = \tfrac{1}{2}(b_n) + \tfrac{1}{2}(c_n),$$
which shows that $(a_n)$ is not an extremal point. (If $a_{n_0} = 0$ this choice gives $(b_n) = (c_n) = (a_n)$; in that case take $b_{n_0} = \frac{1}{2}$ and $c_{n_0} = -\frac{1}{2}$ instead, so that the combination is proper.)
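The decomposition in Example 7.51 can be carried out exactly on a finite prefix of a sequence, using rational arithmetic. The sketch below is an illustration under assumptions: the function name `split_point` is hypothetical, the prefix is assumed to contain an entry of modulus less than $\frac{1}{2}$, and the case $a_{n_0} = 0$ (where doubling and zeroing the entry would return the original sequence twice) is handled by moving the entry to $\pm\frac{1}{2}$ instead.

```python
from fractions import Fraction
from typing import List, Tuple


def split_point(a: List[Fraction]) -> Tuple[List[Fraction], List[Fraction]]:
    """Write a (a finite prefix with sup norm <= 1) as the midpoint of two distinct points b, c."""
    # First index n0 with |a_{n0}| < 1/2; raises StopIteration if the prefix has no such entry.
    n0 = next(i for i, value in enumerate(a) if abs(value) < Fraction(1, 2))
    b, c = list(a), list(a)
    if a[n0] != 0:
        b[n0], c[n0] = 2 * a[n0], Fraction(0)          # the choice made in Example 7.51
    else:
        b[n0], c[n0] = Fraction(1, 2), Fraction(-1, 2)  # needed when a_{n0} = 0
    return b, c


a = [Fraction(1, n) for n in range(1, 8)]  # prefix of the null sequence (1/n)
b, c = split_point(a)
print(b != c, all((bn + cn) / 2 == an for an, bn, cn in zip(a, b, c)))
```

Both outputs stay in the unit ball because only one entry changes and its new modulus is below $1$, so no point of the unit ball of $c_0(\mathbb{N})$ passes the extremality test.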

Exercise 7.52. In Example 7.51 we showed that $V^*$ is never isometrically isomorphic to $c_0(\mathbb{N})$. Generalize the result in two ways as follows. Show that there is no Banach space $V$ with the property that $V^*$ is isomorphic to $C_0(X)$, where $X$ is a $\sigma$-compact, locally compact, non-compact space and the isomorphism is only assumed to be a linear homeomorphism.
This assumes we are working over R, but the argument extends easily to C (check this).
Exercise 7.53. (a) Let $X$ be a $\sigma$-compact, locally compact metric space. Find the extremal points of the closed unit ball $\overline{B_1^{M(X)}(0)}$ in the space of signed measures on $X$, and the extremal points of the convex set $P(X)$ of all probability measures on $X$. (b) Assume in addition that $X$ is compact and infinite. Show that the assumptions of the Krein-Milman theorem (Theorem 7.50) hold, but that $P(X)$ is not the convex hull of its extremal points. In other words, taking the closure of the convex hull is important in infinite dimensions. (c) Assume now instead that $X$ is in addition non-compact. Show that the conclusion of the Krein-Milman theorem (Theorem 7.50) holds for $P(X)$ (despite the fact that the assumptions do not).
In many applications where convex subsets of Banach spaces or locally convex spaces appear, the extremal points play a special role. One instance of this arose in our brief excursion into ergodic theory (see Section 7.2.1), where the ergodic measures (which may now be seen to exist in great generality due to the Krein-Milman theorem) are precisely the extremal points of the convex set of invariant probability measures. The following exercise (or rather part (c) of it) shows how badly intuition can fail for convex sets in infinite dimensions.

Exercise 7.54. Let $K = \{f \in C([0, 1]) \mid f(0) = 0, f \text{ is } 1\text{-Lipschitz}\}$. (a) Show that $K$ is convex and compact in the norm topology. (b) Show that any function $f \in K$ which is piecewise linear and has slope $\pm 1$ wherever $f$ is differentiable is extremal. (c) Show that the extremal points in $K$ are dense in $K$. (d) Describe all the extremal points of $K$.
7.6 Further Topics

Many proofs and theories depend on weak* compactness, the notion of locally convex vector spaces, or the study of extremal points of convex subsets. We only mention a few samples and give further references.

- Decay of Matrix Coefficients for Simple Lie Groups (the Howe-Moore Theorem): If a simple non-compact Lie group $G$ acts unitarily on a Hilbert space $H$ without non-zero $G$-fixed vectors, then the matrix coefficients $\langle \pi(g)v, w \rangle$ decay to zero as $g \to \infty$ in $G$, for any $v, w \in H$. In the language of ergodic theory, this means that every measure-preserving ergodic $G$-action is automatically mixing. This may sound complicated but the proof for $\mathrm{SL}_d(\mathbb{R})$ only needs as inputs the equality case of the Cauchy-Schwarz inequality, the Tychonoff-Alaoglu theorem, and the ability to multiply matrices. We refer to [12, Sec. 11.4] for a discussion of the easier special case $G = \mathrm{SL}_2(\mathbb{R})$, and to [11] for the general case. The weak* compactness is here used on the Hilbert space $H$.
- In the study of von Neumann algebras two more topologies on $B(X, Y)$ are used (particularly in the case where $X = Y$ is a Hilbert space), the ultra-strong operator topology and the ultra-weak operator topology. We refer to von Neumann [33] for the original formulation of the ultra-strong topology, and to the monograph of Takesaki [46, Ch. II] for a full treatment.
- The locally convex vector space $C_c^\infty(\Omega) = \mathcal{D}(\Omega)$ is the space of test functions for distributions on $\Omega$ in the sense that we can define distributions as continuous linear functions on $\mathcal{D}(\Omega)$ (see Section ??).
- The Fréchet space $\mathcal{S}(\mathbb{R}^d)$ of Schwartz functions on $\mathbb{R}^d$ has important connections to Fourier transforms, and is the space of test functions for tempered distributions on $\mathbb{R}^d$ (see Section ??).
- There are further general classes of locally convex vector spaces. Among these are the nuclear spaces (examples include $C^\infty([0, 1])$, $C_b^\infty(\Omega)$ and $\mathcal{S}(\mathbb{R}^d)$) and the LF-spaces (these are strict inductive limits of Fréchet spaces; examples include $C_c(\Omega)$ and $C_c^\infty(\Omega)$). We refer to Bourbaki [4] or Trèves [47] for more details.
- If $G$ is a locally compact abelian group, then one can define the notion of positive-definite functions (see Weil [49, pp. 1223], where Bochner's theorem for locally compact abelian groups is deduced from the Plancherel theorem, which in turn is shown using a structure theorem for such groups) and these form a positive cone in $L^\infty_m(G)$
8 Spectral Theory of Unitary Operators, Fourier Transforms
8.1 Spectral Theory of Unitary Operators

Let $H$ be a Hilbert space. Recall that a linear operator $U : H_1 \to H_2$ is said to be unitary if $U$ is surjective and
$$\|Uv\|_{H_2} = \|v\|_{H_1}$$
for all $v \in H_1$ (or, equivalently, if $U^* = U^{-1}$). In contrast to the spectral theory of compact self-adjoint operators, it is not in general true that unitary operators on a Hilbert space are diagonalizable. This may be seen in the next model example.

Example 8.1. Let $\mu$ be a finite measure on $\mathbb{T}$, and let $H = L^2(\mathbb{T}, \mu)$. Define the unitary multiplication operator $U = M_{\chi_1} : H \to H$ by
$$f \longmapsto \chi_1 f : x \longmapsto e^{2\pi i x} f(x).$$
(This construction is a special case of Exercise 4.20(b).) This unitary operator $U$ has $\lambda = e^{2\pi i x_0}$ as an eigenvalue if and only if the function $f(x) = \mathbb{1}_{\{x_0\}}$ is non-zero as an element of $H = L^2(\mathbb{T}, \mu)$. That is, $\lambda = e^{2\pi i x_0}$ is an eigenvalue if and only if $x_0$ is an atom of $\mu$. Moreover, $U$ is diagonalizable if and only if $\mu$ is atomic.

Operators of the type seen in Example 8.1 are not difficult to deal with even though they are usually not diagonalizable. Having abandoned the false hope that all unitary operators will be diagonalizable (that is, describable ultimately in terms of only countably many multiplications on the ground field), the next best hope one might have is that any unitary operator can be described in terms of multiplication by characters as in Example 8.1 (at the expense of allowing the underlying measure to vary). This is the content of the spectral theory of unitary operators.

Theorem 8.2 (Spectral theory of unitary operators). Let $H$ be a separable Hilbert space, and let $U : H \to H$ be a unitary operator. Then $H$ can be split into a countable direct sum
$$H = \bigoplus_{n \geq 1} H_n$$
of closed mutually orthogonal subspaces $H_n$, invariant under $U$ and $U^*$, such that for each $n \geq 1$ the unitary operator $U_n = U|_{H_n} : H_n \to H_n$ is unitarily isomorphic to the multiplication operator $M_{\chi_1} : L^2(\mathbb{T}, \mu_n) \to L^2(\mathbb{T}, \mu_n)$ for some finite measure $\mu_n$ on $\mathbb{T}$.

Definition 8.3. If a unitary operator $U : H \to H$ (or its restriction to a closed subspace as above) is unitarily isomorphic to a multiplication operator $M_{\chi_1} : L^2(\mathbb{T}, \mu) \to L^2(\mathbb{T}, \mu)$ for some finite measure $\mu$ on $\mathbb{T}$, then $\mu$ is called a spectral measure.

As indicated before Theorem 8.2, spectral measures should be thought of as a replacement for, or a generalization of, eigenvalues (which correspond to the case of a measure with a single atom).

8.1.1 Bochner's Theorem for Positive-Definite Sequences

Although this is not immediately apparent, a useful tool for the proof of Theorem 8.2 is the notion of a positive-definite sequence.

Definition 8.4. A sequence $(a_n)_{n \in \mathbb{Z}}$ of complex numbers is called positive-definite if for any finite sequence $(c_n)_{n \in \mathbb{Z}}$ of complex numbers (that is, a sequence for which $c_n = 0$ for all but finitely many $n \in \mathbb{Z}$) we have
$$\sum_{m, n \in \mathbb{Z}} c_m \overline{c_n} a_{m-n} \geq 0,$$
meaning that the sum is real and non-negative.
It is not immediately obvious that non-trivial positive-definite sequences exist. There are two ways to construct examples.

Example 8.5 (First basic construction). Let $\mu$ be a finite measure on $\mathbb{T}$. Then the Fourier coefficients $a_n(\mu)$ of $\mu$ defined by
$$a_n(\mu) = \int_{\mathbb{T}} \chi_n(x) \, d\mu(x)$$
for $n \in \mathbb{Z}$ form a positive-definite sequence.

Example 8.6 (Second basic construction). Let $U : H \to H$ be a unitary operator on a Hilbert space, and fix some $v \in H$. Then the inner products $a_n(v) = \langle U^n v, v \rangle$ for $n \in \mathbb{Z}$ form a positive-definite sequence.

Both these claims require justification.

Proof of statement in Examples 8.5 and 8.6. Notice first that Example 8.5 is a special case of Example 8.6. If $H = L^2(\mathbb{T}, \mu)$, $U = M_{\chi_1}$, and $v = \mathbb{1}_{\mathbb{T}} = \chi_0$ is the constant function $1$, then $U^n v = \chi_n$ and so
$$a_n(\mu) = \int_{\mathbb{T}} \chi_n \, d\mu = \langle U^n v, v \rangle = a_n(v)$$
for all $n \in \mathbb{Z}$. Thus it is enough to consider the sequence $(a_n(v))$ from Example 8.6. Let $(c_n)$ be a finite complex sequence as in Definition 8.4. Then
$$\sum_{m, n \in \mathbb{Z}} c_m \overline{c_n} a_{m-n}(v) = \sum_{m, n \in \mathbb{Z}} c_m \overline{c_n} \langle U^m v, U^n v \rangle = \Bigl\langle \sum_{m \in \mathbb{Z}} c_m U^m v, \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr\rangle \geq 0,$$
since the inner product is positive-definite.

The main step needed towards the proof of Theorem 8.2 is the following description of all positive-definite sequences.

Theorem 8.7 (Bochner's theorem for sequences). Let $(a_n)_{n \in \mathbb{Z}}$ be a positive-definite sequence. Then there exists a finite measure $\mu$ on $\mathbb{T}$ for which
$$a_n = a_n(\mu) = \int_{\mathbb{T}} \chi_n \, d\mu$$
for all $n \in \mathbb{Z}$.

We postpone the proof to Section 8.1.3, and show next how Bochner's theorem may be used to prove the spectral theorem for unitary operators.
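The first basic construction can be tested numerically for a purely atomic measure, where the Fourier coefficients are finite sums. The sketch below is an illustration under assumptions: the measure $\mu = \sum_k w_k \delta_{x_k}$ is represented as a list of pairs `(x_k, w_k)` with non-negative weights, and the names `fourier_coefficient` and `quadratic_form` are introduced here for the purpose of the example.

```python
import cmath
import math
import random
from typing import Dict, List, Tuple

Atoms = List[Tuple[float, float]]  # pairs (x_k, w_k) describing mu = sum_k w_k * delta_{x_k}


def fourier_coefficient(atoms: Atoms, n: int) -> complex:
    """a_n(mu) = integral of chi_n d(mu), with chi_n(x) = exp(2*pi*i*n*x)."""
    return sum(w * cmath.exp(2j * math.pi * n * x) for x, w in atoms)


def quadratic_form(atoms: Atoms, c: Dict[int, complex]) -> complex:
    """Sum over m, n of c_m * conj(c_n) * a_{m-n}(mu) for a finitely supported sequence c."""
    return sum(cm * cn.conjugate() * fourier_coefficient(atoms, m - n)
               for m, cm in c.items() for n, cn in c.items())


random.seed(0)
atoms = [(0.1, 0.5), (0.37, 1.25), (0.9, 0.25)]
c = {m: complex(random.uniform(-1, 1), random.uniform(-1, 1)) for m in range(-3, 4)}
print(quadratic_form(atoms, c))  # real part non-negative, imaginary part zero up to rounding
```

The value agrees with $\int_{\mathbb{T}} \bigl| \sum_m c_m \chi_m \bigr|^2 d\mu$, which is the computation in the proof above specialized to $U = M_{\chi_1}$ and $v = \chi_0$.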
8.1.2 Cyclic Representations and the Spectral Theorem

Notice first that a unitary operator $U$ defines (and is defined by) an associated unitary representation $\pi$ of the group $\mathbb{Z}$ simply by defining $\pi_n$ to be $U^n$ for $n \in \mathbb{Z}$.

Definition 8.8. A unitary representation $(H, \pi)$ of a group $G$ is called cyclic if
$$H = H_v = \overline{\langle \pi_g v \mid g \in G \rangle}$$
for some $v \in H$. If $G = \mathbb{Z}$ then we also refer to a cyclic representation of $\mathbb{Z}$ as a cyclic Hilbert space with respect to the unitary operator $\pi_1 : H \to H$.

Note that if $(H, \pi)$ is an arbitrary unitary representation of a group $G$ on a separable Hilbert space $H$, then we can write $H$ in the following form
$$H = \bigoplus_{n \geq 1} H_n,$$
where the sum is an orthogonal sum of closed $\pi$-invariant subspaces $H_n$, each of which is a cyclic representation. Indeed, if $w_1, w_2, \ldots \in H$ is an orthonormal basis as in Theorem 2.87 and
$$H_1 = H_{w_1} = \overline{\langle \pi_g(w_1) \mid g \in G \rangle},$$
then $H_1$ is $\pi$-invariant (that is, $\pi_g(H_1) \subseteq H_1$ for all $g \in G$). This, together with the fact that $\pi_g^* = \pi_{g^{-1}}$ and Lemma 4.24, implies that $H_1^\perp$ is $\pi$-invariant. Define $H_2 = H_{w_2'}$, where $w_2' \in H_1^\perp$ is the orthogonal projection of $w_2$ onto $H_1^\perp$. Again $H_1 \oplus H_2$ and $(H_1 \oplus H_2)^\perp$ are $\pi$-invariant, and we can continue the process by defining $H_3 = H_{w_3'}$, where $w_3' \in (H_1 \oplus H_2)^\perp$ is the orthogonal projection of $w_3$ onto $(H_1 \oplus H_2)^\perp$. Clearly $w_1 \in H_1$, $w_2 \in H_1 \oplus H_2$, and $w_3 \in H_1 \oplus H_2 \oplus H_3$. Repeating this construction gives a sequence of orthogonal, closed, $\pi$-invariant, cyclic subspaces $(H_n)$ with $H = \bigoplus_{n \geq 1} H_n$ (see Exercise 8.9). This construction, together with the description of cyclic subspaces with respect to a unitary representation in the following corollary to Theorem 8.7, proves Theorem 8.2.

Exercise 8.9. Let $H_n \subseteq H$ for $n \geq 1$ be closed mutually orthogonal subspaces of a Hilbert space $H$. Prove that the closed linear hull of the union $\bigcup_{n \geq 1} H_n$ is precisely the set of all convergent sums
$$\sum_{n \in \mathbb{N}} v_n$$
with $v_n \in H_n$ for all $n \geq 1$. Moreover, show that such a sum converges if and only if $\sum_{n \geq 1} \|v_n\|^2 < \infty$.
267
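Corollary 8.10 (Cyclic spaces). Let $U : H \to H$ be a unitary operator on a Hilbert space such that $H$ is cyclic with respect to $U$, with $H = H_v = \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle}$ for some $v \in H$. Then there exists a finite spectral measure $\mu_v$ on $\mathbb{T}$ such that there is a unitary isomorphism $\phi : H \to L^2(\mathbb{T}, \mu_v)$ with $\phi(v) = \mathbb{1}$ and $\phi(U w) = M_{\chi_1} \phi(w)$ for all $w \in H$. In other words, for cyclic Hilbert spaces we have a commutative diagram
\[
\begin{array}{ccc}
H & \xrightarrow{\;U\;} & H \\
\downarrow{\scriptstyle \phi} & & \downarrow{\scriptstyle \phi} \\
L^2(\mathbb{T}, \mu_v) & \xrightarrow{\;M_{\chi_1}\;} & L^2(\mathbb{T}, \mu_v)
\end{array}
\]
of unitary maps.

Proof of Corollary 8.10 assuming Theorem 8.7. Let $v \in H$ be chosen with $H = H_v = \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle}$. By Example 8.6, we know that $a_n(v) = \langle U^n v, v \rangle$ for $n \in \mathbb{Z}$ is a positive-definite sequence. By Theorem 8.7 there exists a finite measure $\mu_v$ on $\mathbb{T}$ with
\[
a_n(v) = a_n(\mu_v) = \int_{\mathbb{T}} \chi_n \, d\mu_v
\]
for all $n \in \mathbb{Z}$. We now define a unitary map $\phi : \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle} \to L^2(\mathbb{T}, \mu_v)$ to be the unique extension of the map that sends any finite sum to the corresponding trigonometric polynomial,
\[
\phi : H_v \ni \sum_{|n| \leq N} c_n U^n v \longmapsto \sum_{|n| \leq N} c_n \chi_n \in L^2(\mathbb{T}, \mu_v).
\]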
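This natural attempt at defining a map raises several questions. Why is the map defined as above well-defined? How can it be extended to all of $H_v$ from the dense subset of finite linear combinations? Why is the extension unitary? Why is it surjective? Leaving the last question to one side for the moment, the others all follow from the following calculation. For any finite complex sequence $(c_n)$, we have
\[
\begin{aligned}
\Bigl\| \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr\|_H^2
&= \sum_{m,n \in \mathbb{Z}} \langle c_m U^m v, c_n U^n v \rangle
 = \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \underbrace{\langle U^{m-n} v, v \rangle}_{a_{m-n}(v) = a_{m-n}(\mu_v)} \\
&= \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \int_{\mathbb{T}} \chi_{m-n} \, d\mu_v
 = \int_{\mathbb{T}} \sum_{m \in \mathbb{Z}} c_m \chi_m \overline{\sum_{n \in \mathbb{Z}} c_n \chi_n} \, d\mu_v \\
&= \Bigl\| \sum_{n \in \mathbb{Z}} c_n \chi_n \Bigr\|_{L^2(\mathbb{T}, \mu_v)}^2 .
\end{aligned}
\]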
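We now show that the map $\phi$ defined on the set of finite linear combinations is well-defined by the following argument. If
\[
\sum_{n \in \mathbb{Z}} c_n U^n v = \sum_{n \in \mathbb{Z}} c_n' U^n v
\]
for finite complex sequences $(c_n)$ and $(c_n')$, then
\[
0 = \Bigl\| \sum_{n \in \mathbb{Z}} (c_n - c_n') U^n v \Bigr\|_H^2 = \Bigl\| \sum_{n \in \mathbb{Z}} (c_n - c_n') \chi_n \Bigr\|_{L^2(\mathbb{T}, \mu_v)}^2,
\]
and so
\[
\sum_{n \in \mathbb{Z}} c_n \chi_n = \sum_{n \in \mathbb{Z}} c_n' \chi_n
\]
in $L^2(\mathbb{T}, \mu_v)$. Clearly the map $\phi$ so defined is now an isometry on a dense subspace of $H_v$, and so extends by Proposition 2.46 to an isometry from $H_v$ into $L^2(\mathbb{T}, \mu_v)$. Furthermore, the image of $\phi$ contains all trigonometric polynomials on $\mathbb{T}$, and the trigonometric polynomials form a dense subset of $C(\mathbb{T})$ by Proposition 3.19 (and so also form a dense subset of $L^2(\mathbb{T}, \mu_v)$). Since $H_v = H$ is a Hilbert space by assumption, and $\phi$ is an isometry, we see that $\phi(H_v) \subseteq L^2(\mathbb{T}, \mu_v)$ is complete and dense, and so is equal to $L^2(\mathbb{T}, \mu_v)$.

It remains to check that $\phi \circ U = M_{\chi_1} \circ \phi$. Let $(c_n)$ be a finite complex sequence. Then
\[
U \Bigl( \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr) = \sum_{n \in \mathbb{Z}} c_n U^{n+1} v,
\]
and so
\[
\phi \Bigl( U \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr) = \sum_{n \in \mathbb{Z}} c_n \chi_{n+1} = \chi_1 \sum_{n \in \mathbb{Z}} c_n \chi_n = \chi_1 \, \phi \Bigl( \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr).
\]
That is, the desired formula holds on a dense subset of $H_v$ and by continuity therefore also holds on all of $H_v$. This proves the corollary, and by the discussion above the corollary also proves Theorem 8.2.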
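8.1.3 Proof of Bochner's Theorem

Knowing that Bochner's theorem (Theorem 8.7) gives us the spectral theorem for unitary operators, we now have a strong motivation to prove it. For this we need two short lemmas, the first of which gives some simple consequences of positive-definiteness.

Lemma 8.11 (Elementary properties of positive-definite sequences). Let $(a_n)$ be a positive-definite sequence. Then $a_0 \geq 0$, $a_{-n} = \overline{a_n}$ for all $n \in \mathbb{Z}$, and $|a_n| \leq a_0$ for all $n \in \mathbb{Z}$. In particular, $(a_n) \in \ell^\infty(\mathbb{Z})$.

Proof. Let
\[
c_n = \delta_{n0} = \begin{cases} 1 & \text{if } n = 0, \\ 0 & \text{if } n \neq 0. \end{cases}
\]
Then
\[
a_0 = \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n}\, a_{m-n} \geq 0
\]
by definition. Now let $x \in \mathbb{C}$, $n \in \mathbb{Z} \setminus \{0\}$, and define
\[
c_k = \begin{cases} 1 & \text{if } k = 0, \\ x & \text{if } k = n, \\ 0 & \text{if } k \notin \{0, n\}. \end{cases}
\]
We apply the definition again to see that
\[
\sum_{k,\ell \in \mathbb{Z}} c_k \overline{c_\ell}\, a_{k-\ell}
= \underbrace{a_0}_{(k = \ell = 0)} + \underbrace{x a_n}_{(k = n,\ \ell = 0)} + \underbrace{\overline{x} a_{-n}}_{(k = 0,\ \ell = n)} + \underbrace{|x|^2 a_0}_{(k = \ell = n)} \geq 0. \tag{8.1}
\]
Setting $x = 1$ we see that $a_n + a_{-n} \in \mathbb{R}$ since $a_0 \in \mathbb{R}$, so $\Im(a_n) = -\Im(a_{-n})$. Setting $x = i$ we see that $i a_n - i a_{-n} \in \mathbb{R}$, and so $\Re(a_n) = \Re(a_{-n})$. Thus $a_{-n} = \overline{a_n}$. Now write $a_n = |a_n| e^{i\theta}$ for some $\theta \in \mathbb{R}$, and set $x = r e^{-i\theta}$ with $r \in \mathbb{R}$. With this, (8.1) simplifies to give
\[
a_0 + 2 r |a_n| + r^2 a_0 \geq 0
\]
for all $r \in \mathbb{R}$. If $a_0 = 0$, this can only hold if we also have $a_n = 0$. If $a_0 \neq 0$, then we can set $r = -\frac{|a_n|}{a_0}$ to get
\[
a_0 - 2 \frac{|a_n|^2}{a_0} + \frac{|a_n|^2}{a_0} \geq 0.
\]
Equivalently we have shown that $|a_n| \leq a_0$.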
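The two constructions and Lemma 8.11 can be checked numerically in a small case. The following is a minimal sketch, assuming a hypothetical atomic measure on $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ (the atoms, weights and test coefficients below are illustrative choices, not taken from the text) and the characters $\chi_n(x) = e^{2\pi i n x}$. It evaluates the quadratic form of Definition 8.4 for the Fourier coefficients of Example 8.5.

```python
import cmath
import math

# Hypothetical atomic measure on T = R/Z: atoms x_j carrying weights w_j > 0.
atoms = [0.1, 0.35, 0.8]
weights = [0.5, 1.25, 2.0]

def a(n):
    """Fourier coefficient a_n(mu) = integral of chi_n d(mu) for the atomic measure."""
    return sum(w * cmath.exp(2j * math.pi * n * x) for x, w in zip(atoms, weights))

def quadratic_form(c):
    """Sum over m, n of c_m * conj(c_n) * a_{m-n}; c maps an index to its coefficient."""
    return sum(cm * cn.conjugate() * a(m - n)
               for m, cm in c.items() for n, cn in c.items())

# An arbitrary finite sequence (all other coefficients are zero).
c = {-2: 1 - 2j, 0: 0.5j, 1: -3.0, 4: 2 + 1j}
q = quadratic_form(c)
print("quadratic form:", q)           # real and non-negative up to rounding
print("a_0 =", a(0), " |a_3| =", abs(a(3)))
```

For an atomic measure the form equals $\sum_j w_j \bigl| \sum_m c_m \chi_m(x_j) \bigr|^2$, which is why it cannot be negative; the printed values also illustrate $|a_n| \leq a_0$.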
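Lemma 8.12 (Fourier coefficients of a product). If $f, g \in L^2(\mathbb{T}^d)$, then the Fourier coefficients
\[
c_n(fg) = \int_{\mathbb{T}^d} f g \chi_{-n} \, dx
\]
for $n \in \mathbb{Z}^d$ are given by the convergent convolution product
\[
c_n(fg) = \sum_{m \in \mathbb{Z}^d} c_m(f) c_{n-m}(g)
\]
of the Fourier coefficients $(c_n(f))$ and $(c_n(g))$ of $f$ and $g$.

Proof. Let us start with the case $n = 0$, where
\[
c_0(fg) = \int_{\mathbb{T}^d} f g \, dx = \langle f, \overline{g} \rangle_{L^2(\mathbb{T}^d)} = \bigl\langle (c_m(f))_m, (c_m(\overline{g}))_m \bigr\rangle_{\ell^2(\mathbb{Z}^d)}
\]
by using the isomorphism between $L^2(\mathbb{T}^d)$ and $\ell^2(\mathbb{Z}^d)$. Now notice that
\[
c_m(\overline{g}) = \int_{\mathbb{T}^d} \overline{g} \chi_{-m} \, dx = \overline{\int_{\mathbb{T}^d} g \chi_m \, dx} = \overline{c_{-m}(g)},
\]
and so
\[
c_0(fg) = \sum_{m \in \mathbb{Z}^d} c_m(f) \overline{c_m(\overline{g})} = \sum_{m \in \mathbb{Z}^d} c_m(f) c_{-m}(g),
\]
which proves the claimed formula in the case $n = 0$. For any $n \in \mathbb{Z}^d$ we have $c_n(f) = c_0(f \chi_{-n})$, so that
\[
c_n(fg) = c_0(f g \chi_{-n}) = \sum_{m \in \mathbb{Z}^d} c_m(f) c_{-m}(g \chi_{-n}) = \sum_{m \in \mathbb{Z}^d} c_m(f) c_{n-m}(g),
\]
as claimed in the lemma.

Proof of Theorem 8.7. We are given a positive-definite sequence $(a_n)_{n \in \mathbb{Z}}$ and need to construct from it a finite measure $\mu$ on $\mathbb{T}$ so that
\[
a_n = \int_{\mathbb{T}} \chi_n \, d\mu
\]
for all $n \in \mathbb{Z}$. There is a one-to-one correspondence between finite Borel measures on $\mathbb{T}$ and positive linear functionals on $C(\mathbb{T})$ by the Riesz representation theorem (Theorem 6.30), so it is enough to construct the corresponding positive linear functional. Roughly speaking, we would like to associate to the positive-definite sequence $(a_n)$ a positive linear functional $\Lambda$ via the requirement that
\[
\Lambda \Bigl( \sum_{n \in \mathbb{Z}} c_n \chi_n \Bigr) = \sum_{n \in \mathbb{Z}} c_n a_n, \tag{8.2}
\]
so that
\[
a_n = \Lambda(\chi_n) = \int_{\mathbb{T}} \chi_n \, d\mu.
\]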
In order to do this, we need to ensure that $\Lambda$ is well-defined and positive. However, the definition of $\Lambda$ using (8.2) only makes sense a priori for finite sums and not for all continuous functions, because we do not have convergence of the sum $\sum_{n \in \mathbb{Z}} c_n a_n$. Indeed, for $f \in C(\mathbb{T})$ we only know that the sequence of Fourier coefficients $(c_n)$ lies in $\ell^2(\mathbb{Z})$; we do not have any reason to expect $(c_n)$ to lie in $\ell^1(\mathbb{Z})$.

To overcome this problem we define $\Lambda$ first on the smaller space $C^1(\mathbb{T})$ of continuously differentiable functions. If $f \in C^1(\mathbb{T})$ then $(c_n(f)) \in \ell^1(\mathbb{Z})$ (by the inequality (3.11) in the proof of Theorem 3.10). Together with the bound $(a_n) \in \ell^\infty(\mathbb{Z})$ from Lemma 8.11 we deduce that
\[
\Lambda(f) = \sum_{n \in \mathbb{Z}} c_n(f) a_n
\]
converges, so $\Lambda : C^1(\mathbb{T}) \to \mathbb{C}$ is well-defined. We need to show that $\Lambda$ extends from $C^1(\mathbb{T})$ to $C(\mathbb{T})$, and that the extension is also positive. In these two steps we will rely on the positive-definite property of the sequence $(a_n)$.

Weak positivity: We claim as a first step towards positivity that
\[
f \in C^1(\mathbb{T}),\ f > 0 \implies \Lambda(f) \geq 0.
\]
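To see the claim, assume that $f \in C^1(\mathbb{T})$ and $f > 0$, let $g = \sqrt{f} \in C^1(\mathbb{T})$, and let $(c_n(g))$ be the sequence of Fourier coefficients of $g$. Since $g$ is real-valued, we have
\[
c_{-n}(g) = \int_{\mathbb{T}} g \chi_n \, dx = \overline{\int_{\mathbb{T}} g \chi_{-n} \, dx} = \overline{c_n(g)}
\]
for all $n \in \mathbb{Z}$. By Lemma 8.12, we can also express the Fourier coefficients of $f = gg$ using those of $g$, since
\[
c_\ell(f) = \sum_{m \in \mathbb{Z}} c_m(g) c_{\ell - m}(g) = \sum_{m \in \mathbb{Z}} c_m(g) \overline{c_{m-\ell}(g)}
\]
for all $\ell \in \mathbb{Z}$. Therefore,
\[
\Lambda(f) = \sum_{\ell \in \mathbb{Z}} c_\ell(f) a_\ell = \sum_{\ell, m \in \mathbb{Z}} c_m(g) \overline{c_{m-\ell}(g)}\, a_\ell = \sum_{m,n \in \mathbb{Z}} c_m(g) \overline{c_n(g)}\, a_{m-n}
\]
by the substitution $n = m - \ell$ and the fact that the double sum is absolutely convergent. Restricting the sum to $m, n \in F$ for finite subsets $F \subseteq \mathbb{Z}$, the original definition (8.2) applies and shows that these partial sums are all non-negative. It follows that the same holds for the limit, and hence $\Lambda(f) \geq 0$ as claimed.

Positivity: We now improve the previous claim to show that $\Lambda$ is positive in the sense that
\[
f \in C^1(\mathbb{T}),\ f \geq 0 \implies \Lambda(f) \geq 0.
\]
Suppose that $f \in C^1(\mathbb{T})$ with $f \geq 0$. Then for any $\varepsilon > 0$ we have $f + \varepsilon > 0$ and so $\Lambda(f + \varepsilon) = \Lambda(f) + \Lambda(\varepsilon) \geq 0$. Since this holds for any $\varepsilon > 0$, we see that $\Lambda(f) \geq 0$ as required.

Boundedness: By definition, $\Lambda$ is defined on $C^1(\mathbb{T})$ (and also continuous with respect to $\| \cdot \|_{C^1(\mathbb{T})}$). However, $\Lambda$ is also bounded with respect to $\| \cdot \|_\infty$. Indeed, $|\Lambda(f)| \leq a_0 \|f\|_\infty$ for $f \in C^1_{\mathbb{R}}(\mathbb{T})$ since
\[
\|f\|_\infty \pm f \geq 0 \implies \pm \Lambda(f) + \Lambda(\|f\|_\infty) = \pm \Lambda(f) + \|f\|_\infty a_0 \geq 0.
\]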
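The measure: By the argument above, $\Lambda$ extends continuously to a bounded linear functional on $C_{\mathbb{R}}(\mathbb{T})$. Moreover, if $f \in C_{\mathbb{R}}(\mathbb{T})$ and $f \geq 0$ then there exists a sequence $(f_n)$ of functions in $C^1_{\mathbb{R}}(\mathbb{T})$ with $f_n \to f$ uniformly as $n \to \infty$, so that $f_n \geq -\|f_n - f\|_\infty$ implies that
\[
\Lambda(f) = \lim_{n \to \infty} \Lambda(f_n) \geq \lim_{n \to \infty} -a_0 \|f_n - f\|_\infty = 0.
\]
By the Riesz representation theorem (Theorem 6.30) for positive functionals, there exists a finite measure $\mu$ on $\mathbb{T}$ with
\[
\Lambda(f) = \int_{\mathbb{T}} f \, d\mu
\]
for all $f \in C(\mathbb{T})$. In particular,
\[
\int_{\mathbb{T}} \chi_n \, d\mu = \Lambda(\chi_n) = a_n,
\]
which proves the theorem.

For simplicity we have been working in this section mostly on $\mathbb{T}$, but this would not have been necessary, as the next exercise shows.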
Exercise 8.13. (a) Define positive-definite functions on $\mathbb{Z}^d$ (so that the sequence case corresponds to $d = 1$), and generalize Bochner's theorem to $\mathbb{T}^d$. (b) State and prove a corollary to part (a) regarding the spectral theory of $d$ commuting unitary operators, so that Theorem 8.2 corresponds to the case $d = 1$.

The spectral theory of unitary operators has the spectral theory of self-adjoint operators as a consequence, as the next exercise shows. However, we will also give an independent and much more detailed treatment of this case in Chapter ??.
Exercise 8.14. (a) For any bounded operator $A : V \to V$ on a Banach space $V$ and any power series $f(z) = \sum_{n=0}^\infty c_n z^n$ whose radius of convergence is bigger than $\|A\|$, show that the natural definition of $f(A)$ as the limit of the sequence of operators obtained as partial sums makes sense. Show that if $g(z) = \sum_{n=0}^\infty d_n (z - c_0)^n$ is the inverse function to $f(z)$ defined in a neighborhood of $f(0) = c_0$ (and represented by another power series) and $\|f(A) - c_0\|$ is smaller than the radius of convergence of $g(z)$, then we have $g(f(A)) = A$. (b) Let $A : H \to H$ be a self-adjoint operator. Apply part (a) to $\frac{A}{4\|A\|}$ and the power series corresponding to $e^{iz}$ to obtain a unitary operator $U : H \to H$. Show that $\frac{A}{4\|A\|}$ (and hence $A$) can be recovered from $U$ via the power series representing $\frac{1}{i} \log(z)$ in a neighborhood of $1$. (c) Apply Theorem 8.2 to $U$ and show that one can describe $A$ on $H$ by a direct sum of multiplication operators as in Exercise 4.20(b). In fact, for each of the direct summands the measure space can be chosen to be a copy of $\mathbb{R}$ together with a compactly supported measure and the multiplication operator can be chosen to be $M_{\mathrm{id}}(f)(x) = x f(x)$.
8.1.4 Projection-valued Measures

As in Exercise 8.14 it is relatively straightforward to obtain a definition of $h(A)$ for an analytic function $h$ (defined by a power series) and bounded operator $A$ (whose norm is less than the radius of convergence of the power series). For a multiplication operator $M_g$ as in Exercise 4.20 one can go much further. It is easy to define $h(M_g)$ by setting it equal to $M_{h \circ g}$ for any measurable function $h$. (The reader should verify at this point that this definition does indeed generalize the prior definition for analytic functions to measurable functions.) Since Theorem 8.2 and Exercise 8.14 describe unitary and self-adjoint operators in terms of multiplication operators, this allows one to also define the operators obtained by applying $h$ to these. However, from this definition it is not clear whether the result is independent of the choices made to describe the operator on $H$ as a sum of multiplication operators. As it turns out this is the case, and we will return to this functional calculus in Chapter 10.

We wish to show that we can indeed define $h(U)$ for a measurable function $h : \mathbb{T} \to \mathbb{C}$ which is equal to $M_{h \circ \chi_1}$ if $U = M_{\chi_1} : L^2(\mathbb{T}, \mu) \to L^2(\mathbb{T}, \mu)$ but is independent of the isomorphism in Theorem 8.2. For this the most important step is to consider the case of characteristic functions $h = \mathbb{1}_B$ for a measurable set $B \subseteq \mathbb{T}$. We will define $\mathbb{1}_B(U)$, which we will denote by $E(B)$ (dropping the symbol $U$ since we will fix the operator $U : H \to H$), using the inner products $\langle E(B) v, w \rangle$. For this the trick is to define more general spectral measures.

Definition 8.15. Let $U : H \to H$ be a unitary operator. A complex-valued measure $\mu_{v,w}$ is the spectral measure of $v, w \in H$ if
\[
\int_{\mathbb{T}} \chi_n \, d\mu_{v,w} = \langle U^n v, w \rangle \tag{8.3}
\]
for all $n \in \mathbb{Z}$.

Corollary 8.16. Let $U : H \to H$ be a unitary operator. The spectral measure $\mu_{v,w}$ exists and is uniquely determined by $v$ and $w$. Moreover, it depends linearly on $v$ and conjugate-linearly on $w$. For every Borel subset $B \subseteq \mathbb{T}$ there exists an orthogonal projection operator $E(B) : H \to H$ with $\langle E(B) v, w \rangle = \mu_{v,w}(B)$. Moreover, if $H = \bigoplus_{n \geq 1} L^2(\mathbb{T}, \mu_n)$ and $v$ corresponds to
\[
(f_n) \in \bigoplus_{n \geq 1} L^2(\mathbb{T}, \mu_n),
\]
then $E(B) v$ corresponds to $(\mathbb{1}_B f_n)$.

The idea behind the proof is simple, and relies on the polarization identity
\[
\langle U^n v, w \rangle = \frac{1}{4} \sum_{\ell=0}^{3} i^\ell \bigl\langle U^n (v + i^\ell w), v + i^\ell w \bigr\rangle, \tag{8.4}
\]
which gives the existence
\[
\mu_{v,w} = \frac{1}{4} \sum_{\ell=0}^{3} i^\ell \mu_{v + i^\ell w}. \tag{8.5}
\]

Definition 8.17. The function $E : \mathcal{B}(\mathbb{T}) \to B(H)$ is called a projection-valued measure.
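The polarization identity (8.4) is purely algebraic, so it can be sanity-checked in a finite-dimensional toy case. The sketch below assumes a hypothetical diagonal unitary operator on $\mathbb{C}^2$ (the angles `ALPHA`, `BETA` and the vectors are illustrative choices) and the inner product that is linear in the first and conjugate-linear in the second argument.

```python
import cmath

# Hypothetical eigen-angles of a diagonal unitary U = diag(e^{i*ALPHA}, e^{i*BETA}) on C^2.
ALPHA, BETA = 0.7, 2.1

def U_pow(n, x):
    """Apply U^n to the vector x; n may be any integer since U is diagonal."""
    return [cmath.exp(1j * n * ALPHA) * x[0], cmath.exp(1j * n * BETA) * x[1]]

def inner(x, y):
    """Inner product <x, y>, linear in x and conjugate-linear in y."""
    return sum(xk * yk.conjugate() for xk, yk in zip(x, y))

def polarized(n, v, w):
    """Right-hand side of (8.4): (1/4) * sum over l of i^l <U^n(v + i^l w), v + i^l w>."""
    total = 0j
    for l in range(4):
        u = [vk + (1j ** l) * wk for vk, wk in zip(v, w)]
        total += (1j ** l) * inner(U_pow(n, u), u)
    return total / 4

v = [1 + 2j, -0.5j]
w = [0.3 + 0j, 2 - 1j]
for n in (-3, 0, 1, 5):
    print(n, inner(U_pow(n, v), w), polarized(n, v, w))
```

Both printed columns agree up to rounding, which is exactly the statement that the "diagonal" quantities $\langle U^n u, u \rangle$ determine the mixed ones.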
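Proof of Corollary 8.16. Since the spectral measures $\mu_{v + i^\ell w}$ for the vectors $v + i^\ell w \in H$ exist by Corollary 8.10, equation (8.4) shows that $\mu_{v,w}$ as defined in (8.5) satisfies the desired relationship. Since trigonometric polynomials are dense in $C(\mathbb{T})$ by Proposition 3.19, and complex-valued measures are naturally identified with linear functionals on $C(\mathbb{T})$ (by Theorem 6.40 and Exercise 6.41), it follows that the spectral measure $\mu_{v,w}$ is uniquely determined by (8.3). This implies the claimed sesquilinearity.

Now that we have established a definition of $\mu_{v,w}$ we switch our viewpoint and describe $H$ by $\bigoplus_{n \geq 1} L^2(\mathbb{T}, \mu_n)$ for a sequence of finite measures $(\mu_n)$. Let $v, w \in H$ correspond to $(f_n), (g_n) \in \bigoplus_{n \geq 1} L^2(\mathbb{T}, \mu_n)$ respectively, so that for any $m \in \mathbb{Z}$ we have
\[
\langle U^m v, w \rangle_H = \sum_{n \geq 1} \bigl\langle M_{\chi_1}^m f_n, g_n \bigr\rangle_{L^2(\mathbb{T}, \mu_n)} = \sum_{n \geq 1} \int_{\mathbb{T}} \chi_m f_n \overline{g_n} \, d\mu_n = \int_{\mathbb{T}} \chi_m \, d\mu_{v,w}.
\]
Now notice that
\[
\int_{\mathbb{T}} |f_n \overline{g_n}| \, d\mu_n \leq \|f_n\|_2 \|g_n\|_2,
\]
and so
\[
\sum_{n \geq 1} \int_{\mathbb{T}} |f_n \overline{g_n}| \, d\mu_n \leq \sum_{n \geq 1} \|f_n\|_2 \|g_n\|_2 \leq \Bigl( \sum_{n \geq 1} \|f_n\|_2^2 \Bigr)^{1/2} \Bigl( \sum_{n \geq 1} \|g_n\|_2^2 \Bigr)^{1/2} = \|v\|_H \|w\|_H.
\]
This implies that
\[
\sum_{n \geq 1} f_n \overline{g_n} \, d\mu_n
\]
defines a finite complex-valued measure, which must be equal to $\mu_{v,w}$. Moreover, $\|\mu_{v,w}\| \leq \|v\|_H \|w\|_H$. Therefore, for any Borel subset $B \subseteq \mathbb{T}$ we have that $\mu_{v,w}(B)$ is sesquilinear in $v$ and $w$, and $|\mu_{v,w}(B)| \leq \|v\|_H \|w\|_H$. Therefore by the Fréchet-Riesz representation theorem (Corollary 2.71) there exists an operator $E(B) : H \to H$ with $\mu_{v,w}(B) = \langle E(B) v, w \rangle$.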
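Switching to $\bigoplus_{n \geq 1} L^2(\mathbb{T}, \mu_n)$ and using the same notation we can now check that $E(B) v$ corresponds to
\[
(\mathbb{1}_B f_n) \in \bigoplus_{n \geq 1} L^2(\mathbb{T}, \mu_n).
\]
Indeed
\[
\bigl\langle (\mathbb{1}_B f_n), (g_n) \bigr\rangle_{\bigoplus_{n \geq 1} L^2(\mathbb{T}, \mu_n)} = \sum_{n \geq 1} \int_{\mathbb{T}} \mathbb{1}_B f_n \overline{g_n} \, d\mu_n = \mu_{v,w}(B)
\]
as required. This also implies that $E(B)$ is self-adjoint (check this).

Exercise 8.18. Given a function $h \in L^\infty(\mathbb{T})$ show that there exists an operator $h(U) \in B(H)$ with
\[
\langle h(U) v, w \rangle = \int_{\mathbb{T}} h \, d\mu_{v,w}.
\]
When is $h(U)$ unitary or self-adjoint? What is the norm $\|h(U)\|_{\mathrm{operator}}$?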
8.2 Fourier Transform

The Fourier transform generalizes the (important and satisfying) theory of Fourier series on $\mathbb{T}^d$ to an (equally important and satisfying) theory for functions on $\mathbb{R}^d$. The analog of the Fourier coefficient will be the Fourier transform $\hat{f}$ of a function $f$ on $\mathbb{R}^d$, defined by
\[
\hat{f}(t) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx, \tag{8.6}
\]
where $x, t \in \mathbb{R}^d$ and $x \cdot t = x_1 t_1 + \cdots + x_d t_d$ is the usual inner product. The analog of the Fourier series will be the Fourier back transform (or reverse transform) $\check{h}$ of a function $h$ on $\mathbb{R}^d$, defined by
\[
\check{h}(x) = \int_{\mathbb{R}^d} h(t) e^{2\pi i x \cdot t} \, dt. \tag{8.7}
\]
The analog of the fact that the Fourier series represents the original function (where this is true) will be a Fourier inversion formula $f = (\hat{f})^\vee$. However, the way in which the optimistic identity $f = (\hat{f})^\vee$ needs to be interpreted as a mathematical theorem is more involved. For example, if $f \in L^2(\mathbb{R}^d)$ then there is no reason to expect the Fourier transform defined in (8.6) as a Lebesgue integral to exist. Thus one has to find another interpretation of $\hat{f}$ for $f \in L^2(\mathbb{R}^d)$, using a unique continuous extension of a densely-defined operator as in Proposition 2.46.

We also note that we will think of $x \in \mathbb{R}^d$ as the space variable and of $t \in \mathbb{R}^d$ as the frequency variable. In fact, any $t \in \mathbb{R}^d$ defines the wave
function $x \mapsto e^{2\pi i x \cdot t}$ for which $t$ gives the frequency and the direction of the wave. In that sense, $\hat{f}(t)$ should be interpreted as the correlation of the wave with frequency $t$ and the function $f$. The formula $f = (\hat{f})^\vee$ then shows that one can reconstruct the original function $f$ by a suitable superposition of the waves with frequency $t$ and amplitude $\hat{f}(t)$. We start with a concrete example.

Example 8.19. Let $f(x) = e^{-\pi \|x\|^2}$. Then we claim that $\hat{f}(t) = e^{-\pi \|t\|^2}$.

Proof of claim. Suppose first that $d = 1$, and start by calculating $\hat{f}(0)$. By definition,
\[
\hat{f}(0) = \int_{\mathbb{R}} e^{-\pi x^2} \, dx.
\]
Thus
\[
\begin{aligned}
|\hat{f}(0)|^2 &= \iint_{\mathbb{R}^2} e^{-\pi x^2} e^{-\pi y^2} \, dx \, dy && \text{(by Fubini)} \\
&= \int_0^\infty \int_0^{2\pi} e^{-\pi r^2} r \, d\theta \, dr && \text{(in polar coordinates)} \\
&= 2\pi \int_0^\infty e^{-\pi r^2} r \, dr \\
&= \int_0^\infty e^{-s} \, ds = 1 && \text{(where } \pi r^2 = s\text{)}.
\end{aligned}
\]
To verify the claimed formula for $\hat{f}(t)$ for $t \in \mathbb{R}$ we will use Cauchy's integral theorem for complex path integrals applied to the holomorphic function $z \mapsto e^{-\pi z^2}$, with $z = x + it$. We integrate over a rectangular path $\gamma$ with corners at $\pm M$ and $\pm M + it$ as illustrated in Figure 8.1.

[Fig. 8.1. The contour $\gamma$: the rectangle with corners $-M$, $M$, $M + it$, $-M + it$.]

By Cauchy's integral theorem,
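\[
0 = \oint_\gamma e^{-\pi z^2} \, dz = \int_{-M}^{M} e^{-\pi x^2} \, dx + i \int_0^t e^{-\pi (M + is)^2} \, ds + \int_{M}^{-M} e^{-\pi (x + it)^2} \, dx + i \int_t^0 e^{-\pi (-M + is)^2} \, ds. \tag{8.8}
\]
Now notice that
\[
\bigl| e^{-\pi (x + is)^2} \bigr| = \bigl| e^{-\pi (x^2 - s^2 + 2isx)} \bigr| = e^{-\pi (x^2 - s^2)},
\]
which implies that
\[
\bigl| e^{-\pi (\pm M + is)^2} \bigr| \leq e^{-\pi (M^2 - t^2)}
\]
converges to zero uniformly as $M \to \infty$ with $|s| \leq |t|$ for any fixed $t \in \mathbb{R}$. Since $e^{-\pi (x + it)^2} = e^{-\pi x^2} e^{-2\pi i t x} e^{\pi t^2}$, letting $M \to \infty$ in (8.8) gives
\[
\int_{\mathbb{R}} e^{-\pi x^2} \, dx = 1 = \underbrace{\int_{\mathbb{R}} e^{-\pi x^2} e^{-2\pi i t x} \, dx}_{= \hat{f}(t)} \; e^{\pi t^2},
\]
so $\hat{f}(t) = e^{-\pi t^2}$ for all $t \in \mathbb{R}$.

For $d \geq 1$ notice that
\[
f(x) = e^{-\pi \|x\|^2} = e^{-\pi x_1^2} \cdots e^{-\pi x_d^2} = f_1(x_1) \cdots f_1(x_d)
\]
is a product of $d$ copies of the function $f_1(x_1) = e^{-\pi x_1^2}$ discussed above, so
\[
\begin{aligned}
\hat{f}(t) &= \int_{\mathbb{R}^d} f_1(x_1) \cdots f_1(x_d) e^{-2\pi i (x_1 t_1 + \cdots + x_d t_d)} \, dx \\
&= \underbrace{\int_{\mathbb{R}} f_1(x_1) e^{-2\pi i x_1 t_1} \, dx_1}_{= \hat{f}_1(t_1)} \cdots \int_{\mathbb{R}} f_1(x_d) e^{-2\pi i x_d t_d} \, dx_d \\
&= e^{-\pi t_1^2} \cdots e^{-\pi t_d^2} = e^{-\pi \|t\|^2}
\end{aligned}
\]
by Fubini's theorem and the case $d = 1$.

The next result is the first of many duality principles involving Fourier transforms.

Proposition 8.20 (Duality between shift and multiplication by characters). For $x_0, t_0 \in \mathbb{R}^d$ we define the shift operator $\lambda_{x_0}$ and the multiplication operator $M_{t_0}$ on $L^1(\mathbb{R}^d)$ by
\[
\lambda_{x_0}(f) : x \mapsto f(x - x_0)
\]
and
\[
M_{t_0}(f) : x \mapsto e^{2\pi i x \cdot t_0} f(x).
\]
Then $\widehat{\lambda_{x_0}(f)} = M_{-x_0} \hat{f}$ and $\widehat{M_{t_0} f} = \lambda_{t_0} \hat{f}$. (Multiplication by a character might also be referred to as a phase shift.)

Proof. By definition,
\[
\widehat{\lambda_{x_0} f}(t) = \int_{\mathbb{R}^d} f(x - x_0) e^{-2\pi i x \cdot t} \, dx = \int_{\mathbb{R}^d} f(y) e^{-2\pi i (y + x_0) \cdot t} \, dy = e^{-2\pi i x_0 \cdot t} \hat{f}(t)
\]
and
\[
\widehat{M_{t_0} f}(t) = \int_{\mathbb{R}^d} e^{2\pi i x \cdot t_0} f(x) e^{-2\pi i x \cdot t} \, dx = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot (t - t_0)} \, dx = \hat{f}(t - t_0).
\]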
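Example 8.19 and Proposition 8.20 can be tested numerically for $d = 1$. The following is a minimal sketch, assuming a plain trapezoid rule on a truncated interval as the quadrature (the helper `fourier`, the interval, the step count, and the values of `x0` and `t0` are illustrative choices, not part of the text). It compares the numerical transform of the Gaussian, of its shift, and of its modulation against the closed forms derived above.

```python
import cmath
import math

def fourier(f, t, lo=-10.0, hi=10.0, steps=4000):
    """Trapezoid approximation of the integral of f(x) * exp(-2*pi*i*x*t) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(x) * cmath.exp(-2j * math.pi * x * t)
    return total * h

def gauss(x):
    """The function f(x) = exp(-pi * x^2) of Example 8.19 (d = 1)."""
    return math.exp(-math.pi * x * x)

x0, t0 = 0.75, -0.4  # hypothetical shift and frequency

def shifted(x):
    """lambda_{x0} f : x -> f(x - x0)."""
    return gauss(x - x0)

def modulated(x):
    """M_{t0} f : x -> exp(2*pi*i*x*t0) * f(x)."""
    return cmath.exp(2j * math.pi * x * t0) * gauss(x)

for t in (0.0, 0.5, 1.2):
    print(t, fourier(gauss, t), fourier(shifted, t), fourier(modulated, t))
```

Because the Gaussian decays so quickly, truncating to $[-10, 10]$ loses nothing visible in double precision; for slowly decaying functions the interval and step count would need more care.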
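Proposition 8.21 (Duality of linear transformations). Let $f \in L^1(\mathbb{R}^d)$ and let $A \in \mathrm{GL}_d(\mathbb{R})$ be an invertible matrix. Then $f \circ A \in L^1(\mathbb{R}^d)$, and
\[
\widehat{(f \circ A)}(t) = \frac{1}{|\det A|} \hat{f}\bigl( (A^t)^{-1} t \bigr).
\]

Proof. We use the definition and substitute $y = Ax$ to obtain
\[
\widehat{f \circ A}(t) = \int_{\mathbb{R}^d} f(Ax) e^{-2\pi i x \cdot t} \, dx = \frac{1}{|\det A|} \int_{\mathbb{R}^d} f(Ax) |\det A| e^{-2\pi i (Ax) \cdot ((A^t)^{-1} t)} \, dx = \frac{1}{|\det A|} \hat{f}\bigl( (A^t)^{-1} t \bigr).
\]

The impatient reader may use the three propositions above to show that the Fourier transform extends to an isometry from $L^2(\mathbb{R}^d)$ to $L^2(\mathbb{R}^d)$ via the steps of the following exercise.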
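Exercise 8.22. (a) Show that
\[
\mathcal{A} = \Bigl\{ x \mapsto \sum_{\text{finite}} c_i e^{-a_i \|x - x_i\|^2 + 2\pi i x \cdot t_i} \Bigm| c_i \in \mathbb{C},\ a_i > 0,\ x_i, t_i \in \mathbb{R}^d \Bigr\}
\]
is a subalgebra of $C_0(\mathbb{R}^d)$ that separates points and is closed under conjugation. (b) Show that $\hat{\mathcal{A}} = \mathcal{A}$ and that $f = (\hat{f})^\vee$ for all $f \in \mathcal{A}$. (c) Show that $\mathcal{A} \subseteq C_0(\mathbb{R}^d)$ is dense with respect to $\| \cdot \|_\infty$. (d) Show that $\mathcal{A} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ is dense in both $L^1(\mathbb{R}^d)$ and in $L^2(\mathbb{R}^d)$ with respect to the norms $\| \cdot \|_1$ and $\| \cdot \|_2$ respectively (which is not an immediate consequence of (c) since $\mathbb{R}^d$ has infinite Lebesgue measure). Moreover, show that if $g \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ and $\varepsilon > 0$ then there exists a single function $f \in \mathcal{A}$ with $\|f - g\|_1 < \varepsilon$ and $\|f - g\|_2 < \varepsilon$. (e) Show that $\|\hat{f}\|_2 = \|f\|_2$ for all $f \in \mathcal{A}$ so that the Fourier transform extends to a unitary map on $L^2(\mathbb{R}^d)$ with its inverse given by the Fourier back transform.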
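8.2.1 Fourier Transform on $L^1(\mathbb{R}^d)$

Proposition 8.23 (Basic inequality and Riemann-Lebesgue lemma). The Fourier transform maps $L^1(\mathbb{R}^d)$ into $C_0(\mathbb{R}^d)$, and satisfies the basic inequality
\[
\|\hat{f}\|_\infty \leq \|f\|_1
\]
for all $f \in L^1(\mathbb{R}^d)$.

Proof. If $f \in L^1(\mathbb{R}^d)$ then
\[
|\hat{f}(t)| = \Bigl| \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx \Bigr| \leq \|f\|_1.
\]
If $t_n \to t$ in $\mathbb{R}^d$ as $n \to \infty$, then $f(x) e^{-2\pi i x \cdot t_n} \to f(x) e^{-2\pi i x \cdot t}$ as $n \to \infty$ and so
\[
\hat{f}(t_n) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t_n} \, dx \longrightarrow \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx = \hat{f}(t)
\]
by the dominated convergence theorem. Therefore $\hat{f}$ is a bounded continuous function on $\mathbb{R}^d$. It remains to show that $\hat{f} \in C_0(\mathbb{R}^d)$, which we will prove by an approximation argument. Suppose first that $f = \mathbb{1}_{[a_1, b_1] \times \cdots \times [a_d, b_d]}$ is the characteristic function of a rectangle. Then, by Fubini's theorem,
\[
\hat{f}(t) = \int_{\mathbb{R}^d} \mathbb{1}_{[a_1, b_1]}(x_1) \cdots \mathbb{1}_{[a_d, b_d]}(x_d) e^{-2\pi i x \cdot t} \, dx = \int_{a_1}^{b_1} e^{-2\pi i x_1 t_1} \, dx_1 \cdots \int_{a_d}^{b_d} e^{-2\pi i x_d t_d} \, dx_d.
\]
Each factor can be calculated explicitly, and
\[
\int_a^b e^{-2\pi i x t} \, dx = \begin{cases} \dfrac{e^{-2\pi i b t} - e^{-2\pi i a t}}{-2\pi i t} & \text{for } t \neq 0, \text{ and} \\[1ex] b - a & \text{for } t = 0, \end{cases}
\]
so each factor lies in $C_0(\mathbb{R})$. It follows that $\hat{f} \in C_0(\mathbb{R}^d)$ if $f$ is the characteristic function of a rectangle. By linearity the same holds for any finite linear combinations of such functions. Furthermore, the same holds for any element $f \in L^1(\mathbb{R}^d)$ that can be approximated by such finite linear combinations, which is all of $L^1(\mathbb{R}^d)$ (see the argument on p. 167).

Proposition 8.24 (First half of duality of convolution and multiplication). For $f, g \in L^1(\mathbb{R}^d)$ the convolution
\[
f * g(x) = \int_{\mathbb{R}^d} f(y) g(x - y) \, dy
\]
defines another element of $L^1(\mathbb{R}^d)$ satisfying
\[
\|f * g\|_1 \leq \|f\|_1 \|g\|_1
\]
(so $L^1(\mathbb{R}^d)$ is a Banach algebra). The Fourier transform of $f * g$ satisfies $\widehat{f * g} = \hat{f} \hat{g}$.

Proof. Applying Fubini's theorem and a substitution we see that
\[
\int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f(y) g(x - y)| \, dy \, dx = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f(y) g(x - y)| \, dx \, dy = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f(y)| |g(z)| \, dz \, dy = \|f\|_1 \|g\|_1.
\]
Thus the integral defining $f * g(x)$ is finite for almost every $x \in \mathbb{R}^d$, and
\[
\|f * g\|_1 \leq \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f(y) g(x - y)| \, dy \, dx = \|f\|_1 \|g\|_1.
\]
Now let $t \in \mathbb{R}^d$ and apply Fubini's theorem to the definition of $\widehat{f * g}(t)$ to see that
\[
\begin{aligned}
\widehat{f * g}(t) &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(y) g(x - y) \, dy \; e^{-2\pi i x \cdot t} \, dx \\
&= \int_{\mathbb{R}^d} f(y) \underbrace{\int_{\mathbb{R}^d} g(x - y) e^{-2\pi i (x - y) \cdot t} \, dx}_{= \hat{g}(t)} \; e^{-2\pi i y \cdot t} \, dy \\
&= \hat{f}(t) \hat{g}(t).
\end{aligned}
\]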
As mentioned above, we will show that the Fourier back transform is the inverse of the Fourier transform. However, as we will see, this requires additional assumptions on $f \in L^1(\mathbb{R}^d)$, since the hypothesis $f \in L^1(\mathbb{R}^d)$ only implies that $\hat{f} \in C_0(\mathbb{R}^d)$ (as seen in the proof of Proposition 8.23), so there is no reason to expect that the Fourier back transform will be defined on $\hat{f}$.

Theorem 8.25 (Fourier inversion theorem). Suppose that $f \in L^1(\mathbb{R}^d)$ also has $\hat{f} \in L^1(\mathbb{R}^d)$. Then $f$ agrees almost everywhere with the continuous function $(\hat{f})^\vee \in C_0(\mathbb{R}^d)$.

Even with the additional assumption, Theorem 8.25 implies immediately that $\hat{f}$ uniquely determines $f$ as an element of $L^1(\mathbb{R}^d)$.

Corollary 8.26 (Injectivity). If $f_1, f_2 \in L^1(\mathbb{R}^d)$ satisfy $\hat{f}_1 = \hat{f}_2$, then $f_1 = f_2$.

Proof. Given $f_1, f_2 \in L^1(\mathbb{R}^d)$ as in the corollary, the function $f = f_1 - f_2$ satisfies $\hat{f} = 0 \in L^1(\mathbb{R}^d)$. Applying Theorem 8.25, this implies that $f = 0$ and shows the corollary.

In order to prove Theorem 8.25 we need a preparatory lemma.

Lemma 8.27. If $f, g \in L^1(\mathbb{R}^d)$ then
\[
\int_{\mathbb{R}^d} \hat{f} g \, dx = \int_{\mathbb{R}^d} f \hat{g} \, dy.
\]

Proof. Once again, this is a simple application of Fubini's theorem, as
\[
\int_{\mathbb{R}^d} \hat{f}(x) g(x) \, dx = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(y) e^{-2\pi i y \cdot x} \, dy \; g(x) \, dx = \int_{\mathbb{R}^d} f(y) \int_{\mathbb{R}^d} g(x) e^{-2\pi i y \cdot x} \, dx \, dy = \int_{\mathbb{R}^d} f(y) \hat{g}(y) \, dy.
\]

Proof of Theorem 8.25. Let $f \in L^1(\mathbb{R}^d)$ also have $\hat{f} \in L^1(\mathbb{R}^d)$. We need to show that $(\hat{f})^\vee$ agrees with $f$ almost everywhere. To achieve this, we will use Lemma 8.27 for $f$ and the phase-shifted stretched Gaussian distribution
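\[
\phi_{r, x_0}(t) = e^{2\pi i t \cdot x_0} \phi_r(t) \quad \text{where} \quad \phi_r(t) = e^{-\pi \|r t\|^2}
\]
for $x_0 \in \mathbb{R}^d$ and $r > 0$ (as a smoothing tool). Indeed, Lemma 8.27 gives
\[
f_r(x_0) := \int_{\mathbb{R}^d} \hat{f}(t) \phi_{r, x_0}(t) \, dt = \int_{\mathbb{R}^d} f(x) \hat{\phi}_{r, x_0}(x) \, dx \tag{8.9}
\]
for all $x_0 \in \mathbb{R}^d$. We will show that $f_r \in L^1(\mathbb{R}^d)$ for $r > 0$ and also establish the following claims:

Claim 1: $f_r \to (\hat{f})^\vee$ pointwise as $r \to 0$ (which will use the left-hand integral in (8.9)), and

Claim 2: $f_r \to f$ in $L^1(\mathbb{R}^d)$ as $r \to 0$ (which will use the right-hand integral in (8.9)).

Notice first that $\hat{\phi}_r$ is an approximate identity as $r \to 0$ (with properties similar to the Fejér kernel in Section 3.2.2): $\hat{\phi}_r(x) \geq 0$,
\[
\int_{\mathbb{R}^d} \hat{\phi}_r(x) \, dx = \phi_r(0) = 1,
\]
and for any $\delta > 0$
\[
\int_{\mathbb{R}^d \setminus B_\delta(0)} \hat{\phi}_r(x) \, dx = \int_{\mathbb{R}^d \setminus B_\delta(0)} r^{-d} e^{-\pi \|x / r\|^2} \, dx = \int_{\mathbb{R}^d \setminus B_{\delta / r}(0)} e^{-\pi \|y\|^2} \, dy \longrightarrow 0
\]
as $r \to 0$.

Proof of Claim 1. Since
\[
f_r(x_0) = \int_{\mathbb{R}^d} \hat{f}(t) e^{2\pi i t \cdot x_0} e^{-\pi \|r t\|^2} \, dt
\]
and $e^{-\pi \|r t\|^2} \to 1$ as $r \to 0$, we obtain
\[
f_r(x_0) \longrightarrow \int_{\mathbb{R}^d} \hat{f}(t) e^{2\pi i t \cdot x_0} \, dt = (\hat{f})^\vee(x_0)
\]
by the dominated convergence theorem.

Proof of Claim 2. By Example 8.19 and Proposition 8.21 we have
\[
\hat{\phi}_r(x) = r^{-d} e^{-\pi \|x / r\|^2},
\]
and by Proposition 8.20 we also have
\[
\hat{\phi}_{r, x_0}(x) = r^{-d} e^{-\pi \|(x - x_0) / r\|^2}.
\]
This gives
\[
f_r(x_0) = \int_{\mathbb{R}^d} f(x) r^{-d} e^{-\pi \|(x_0 - x) / r\|^2} \, dx = f * \hat{\phi}_r(x_0)
\]
and $f * \hat{\phi}_r \in L^1(\mathbb{R}^d)$ by Proposition 8.24. Now
\[
f * \hat{\phi}_r(x_0) - f(x_0) = \int_{\mathbb{R}^d} \bigl( f(x_0 - x) - f(x_0) \bigr) r^{-d} e^{-\pi \|x / r\|^2} \, dx = \int_{\mathbb{R}^d} \bigl( f(x_0 - r z) - f(x_0) \bigr) e^{-\pi \|z\|^2} \, dz
\]
by using the substitution $z = x / r$, and so on taking the norm we obtain
\[
\|f * \hat{\phi}_r - f\|_1 \leq \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f(x_0 - r z) - f(x_0)| e^{-\pi \|z\|^2} \, dz \, dx_0 = \int_{\mathbb{R}^d} \|\lambda_{r z}(f) - f\|_1 e^{-\pi \|z\|^2} \, dz.
\]
Now notice that $\|\lambda_{r z}(f) - f\|_1 \to 0$ as $r \to 0$ for every $z \in \mathbb{R}^d$. Therefore
\[
\|f * \hat{\phi}_r - f\|_1 \longrightarrow 0
\]
by dominated convergence, proving Claim 2.

Since every sequence that converges in $L^1(\mathbb{R}^d)$ has a subsequence that converges almost everywhere (see, for example, the proof of completeness of $L^p$ spaces), the two claims prove the theorem.

8.2.2 Fourier Transform on $L^2(\mathbb{R}^d)$

As mentioned before, the Fourier transform behaves quite well on $L^2(\mathbb{R}^d)$ (with the minor obstacle that the defining integral does not make sense).

Theorem 8.28 (Plancherel formula). If $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, then $\hat{f} \in L^2(\mathbb{R}^d)$ and the map $f \mapsto \hat{f}$ extends continuously to a unitary map on $L^2(\mathbb{R}^d)$ (whose inverse is the continuous extension of $f \mapsto \check{f}$).

Proof. We define the space of functions
\[
V = \{ f \in L^1(\mathbb{R}^d) \mid \hat{f} \in L^1(\mathbb{R}^d) \}.
\]
By Theorem 8.25 and Proposition 8.23 we have $f = (\hat{f})^\vee \in C_0(\mathbb{R}^d)$ almost everywhere for $f \in V$. Hence $V \subseteq L^\infty(\mathbb{R}^d)$ consists of essentially bounded functions and so $|f|^2 = f \overline{f} \in L^1(\mathbb{R}^d)$ for all $f \in V$. Therefore, $V \subseteq L^2(\mathbb{R}^d)$. We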
claim that $V$ is dense in $L^2(\mathbb{R}^d)$. For this, notice first that $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ is dense in $L^2(\mathbb{R}^d)$ (this may be seen using simple functions), so that it is enough to approximate a given function $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ by an element of $V$ with respect to $\| \cdot \|_2$. Using the same notation as in the proof of Theorem 8.25, we already know that $f * \hat{\phi}_r$ approximates $f$ in $L^1(\mathbb{R}^d)$ as $r \to 0$. By the same argument, $f * \hat{\phi}_r$ also approximates $f$ in $L^2(\mathbb{R}^d)$ as $r \to 0$. By Proposition 8.24 we see that
\[
\widehat{f * \hat{\phi}_r} = \hat{f} \phi_r
\]
is a product of an element in $C_0(\mathbb{R}^d)$ with an element of $L^1(\mathbb{R}^d)$, which shows that $f * \hat{\phi}_r \in V$. This gives the claimed density.

We will now show that the Fourier transform preserves the inner product. For this let $f, g \in V$ and define $h = \overline{\hat{g}}$. Then
\[
\hat{h}(x) = \int_{\mathbb{R}^d} \overline{\hat{g}(t)} e^{-2\pi i x \cdot t} \, dt = \overline{\int_{\mathbb{R}^d} \hat{g}(t) e^{2\pi i x \cdot t} \, dt} = \overline{(\hat{g})^\vee(x)} = \overline{g(x)}
\]
almost everywhere by Theorem 8.25. Applying Lemma 8.27 we see that
\[
\langle f, g \rangle_{L^2(\mathbb{R}^d)} = \int_{\mathbb{R}^d} f \overline{g} \, dx = \int_{\mathbb{R}^d} f \hat{h} \, dx = \int_{\mathbb{R}^d} \hat{f} h \, dt = \int_{\mathbb{R}^d} \hat{f} \overline{\hat{g}} \, dt = \langle \hat{f}, \hat{g} \rangle_{L^2(\mathbb{R}^d)}.
\]
In other words we have shown that the Fourier transform preserves the inner product for elements in $V$. It follows that the Fourier transform extends to an isometry from $L^2(\mathbb{R}^d)$ to itself, which we again denote $L^2(\mathbb{R}^d) \ni f \mapsto \hat{f} \in L^2(\mathbb{R}^d)$. Since $\hat{V} = V$ (which follows directly from Theorem 8.25) is dense in $L^2(\mathbb{R}^d)$ (by the above), the extension is surjective. Clearly the same holds for the Fourier back transform. Moreover, since $(\hat{f})^\vee = f$ for $f \in V$ by Theorem 8.25, the same holds for all $f \in L^2(\mathbb{R}^d)$.

As we will use the same symbol $\hat{f}$ for the Fourier transform of $f$, defined for $f \in L^1(\mathbb{R}^d)$ by (8.6) and defined for $f \in L^2(\mathbb{R}^d)$ by the unique continuous extension from $V \subseteq L^2(\mathbb{R}^d)$, we still need to check that for $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$
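these definitions agree. The tools for this argument have been used before. Let $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, so that $f * \hat{\phi}_r \to f$ as $r \to 0$ both in $L^1(\mathbb{R}^d)$ and in $L^2(\mathbb{R}^d)$. For $f * \hat{\phi}_r \in V$ we used the same definition for the Fourier transform, and by Proposition 8.24 we have
\[
\widehat{f * \hat{\phi}_r} = \hat{f}^{L^1} \phi_r,
\]
where we write $\hat{f}^{L^1}$ for the Fourier transform defined using (8.6). Now let $r \to 0$, so that $\phi_r(x) = e^{-\pi \|r x\|^2} \to 1$ pointwise, and hence
\[
\widehat{f * \hat{\phi}_r} = \hat{f}^{L^1} \phi_r \longrightarrow \hat{f}^{L^1}
\]
pointwise as $r \to 0$. Also for $r \to 0$ we have by definition
\[
\widehat{f * \hat{\phi}_r} \longrightarrow \hat{f}^{L^2}
\]
in $L^2(\mathbb{R}^d)$, where we write $\hat{f}^{L^2}$ for the Fourier transform obtained by continuous extension. There is a subsequence $r_n \to 0$ as $n \to \infty$ such that
\[
\widehat{f * \hat{\phi}_{r_n}} \longrightarrow \hat{f}^{L^2}
\]
almost everywhere as $n \to \infty$, which implies that $\hat{f}^{L^1} = \hat{f}^{L^2}$ almost everywhere as desired.

Using the Plancherel formula we can give the reverse duality claim to Proposition 8.24.

Corollary 8.29 (Second half of duality between convolution and multiplication). For $f, g \in L^2(\mathbb{R}^d)$ the pointwise product $fg \in L^1(\mathbb{R}^d)$ has Fourier transform $\widehat{fg} = \hat{f} * \hat{g}$.

Proof of Corollary 8.29. The proof is similar to the proof of Lemma 8.12. Note that since $\hat{f}, \hat{g} \in L^2(\mathbb{R}^d)$, the integral in
\[
\hat{f} * \hat{g}(t) = \int_{\mathbb{R}^d} \hat{f}(s) \hat{g}(t - s) \, ds
\]
exists by the Cauchy-Schwarz inequality. Since the map $f \mapsto \hat{f}$ is unitary on $L^2(\mathbb{R}^d)$, we have
\[
\hat{f} * \hat{g}(0) = \langle \hat{f}, \hat{\overline{g}} \rangle = \langle f, \overline{g} \rangle = \int_{\mathbb{R}^d} f g \, dx = \widehat{fg}(0).
\]
Using Proposition 8.24 we can extend this to
\[
\hat{f} * \hat{g}(t) = \widehat{fg}(t)
\]
for all $t \in \mathbb{R}^d$.

Exercise 8.30. Show that the unitary operator $L^2(\mathbb{R}^d) \ni f \mapsto \hat{f} \in L^2(\mathbb{R}^d)$ is completely diagonalizable and has only four eigenvalues.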
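8.2.3 Fourier Transform and Smoothness, Schwartz Space

As with Fourier series in Section 3.2, smoothness and decay properties of the Fourier transform are closely related.

Proposition 8.31 (Duality between differentiation and multiplication by monomials). If $x \mapsto x^\alpha f(x)$ lies in $L^1(\mathbb{R}^d)$ for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq k$, then $\hat{f} \in C^k(\mathbb{R}^d)$, and
\[
\partial_\alpha \hat{f} = (-2\pi i)^{\|\alpha\|_1} \widehat{x^\alpha f(x)}
\]
for all $\alpha$ with $\|\alpha\|_1 \leq k$. If $f \in C^k(\mathbb{R}^d)$ and $\partial_\alpha f \in L^1(\mathbb{R}^d)$ for $\|\alpha\|_1 \leq k$ and $\partial_\alpha f \in C_0(\mathbb{R}^d)$ for $\|\alpha\|_1 \leq k - 1$, then
\[
\widehat{(\partial_\alpha f)}(t) = (2\pi i)^{\|\alpha\|_1} t^\alpha \hat{f}(t).
\]

Proof. Suppose that $f$ and $x \mapsto x_j f(x)$ lie in $L^1(\mathbb{R}^d)$. Then
\[
\begin{aligned}
\partial_j \hat{f}(t) &= \lim_{h \to 0} \frac{\hat{f}(t + h e_j) - \hat{f}(t)}{h} \\
&= \lim_{h \to 0} \int_{\mathbb{R}^d} f(x) \frac{e^{-2\pi i x \cdot (t + h e_j)} - e^{-2\pi i x \cdot t}}{h} \, dx \\
&= \lim_{h \to 0} \int_{\mathbb{R}^d} f(x) \frac{e^{-2\pi i h x_j} - 1}{h} e^{-2\pi i x \cdot t} \, dx
\end{aligned}
\]
if the limit exists. Now notice that
\[
\frac{e^{-2\pi i h x_j} - 1}{h} \longrightarrow -2\pi i x_j
\]
as $h \to 0$ and is bounded in norm by $2\pi |x_j|$ by the two-dimensional mean-value theorem for differentiation. Applying the dominated convergence theorem we deduce that
\[
\partial_j \hat{f}(t) = \int_{\mathbb{R}^d} f(x) (-2\pi i x_j) e^{-2\pi i x \cdot t} \, dx = \widehat{(-2\pi i x_j f(x))}(t),
\]
and so the first part of the proposition now follows by induction on $k$.

Now suppose $f \in C^1(\mathbb{R}^d)$, $f \in C_0(\mathbb{R}^d)$ and $f, \partial_j f \in L^1(\mathbb{R}^d)$. Then
\[
\widehat{\partial_j f}(t) = \int_{\mathbb{R}^d} \frac{\partial f}{\partial x_j}(x) e^{-2\pi i x \cdot t} \, dx.
\]
By applying Fubini's theorem we may concentrate on the one-dimensional integral in $x_j$. Using integration by parts,
\[
\int_{\mathbb{R}} \frac{\partial f}{\partial x_j}(x) e^{-2\pi i x \cdot t} \, dx_j = \lim_{M \to \infty} \Bigl[ f(x) e^{-2\pi i x \cdot t} \Bigr]_{x_j = -M}^{x_j = M} - \lim_{M \to \infty} \int_{-M}^{M} f(x) (-2\pi i t_j) e^{-2\pi i x \cdot t} \, dx_j,
\]
where the boundary term vanishes since $f \in C_0(\mathbb{R}^d)$. Integrating over the remaining variables gives $\widehat{\partial_j f}(t) = 2\pi i t_j \hat{f}(t)$ by all of our assumptions on $f$. The proposition now follows by induction.

Proposition 8.31 says in particular that a smooth function in $C_0(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$ whose derivatives also lie in $C_0(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$ (for example, any element of $C_c^\infty(\mathbb{R}^d)$) has a Fourier transform with super-polynomial decay. (That is, the Fourier transform $\hat{f}$ multiplied by any polynomial is bounded, and still decays.) Similarly, a function that has super-polynomial decay has a smooth Fourier transform. Given these observations, the next definition describes a natural class of functions invariant under differentiation and under Fourier transforms.

Definition 8.32. The Schwartz space on $\mathbb{R}^d$ is defined by
\[
\mathcal{S}(\mathbb{R}^d) = \{ f : \mathbb{R}^d \to \mathbb{C} \mid f \text{ is smooth and } \|x^\alpha \partial_\beta f\|_\infty < \infty \text{ for all } \alpha, \beta \in \mathbb{N}_0^d \}.
\]
The following exercises describe the main properties of $\mathcal{S}(\mathbb{R}^d)$ and of the Fourier transform on $\mathcal{S}(\mathbb{R}^d)$.

Exercise 8.33. (a) Show that $\mathcal{S}(\mathbb{R}^d)$ is a Fréchet space (see Definition 7.41) with the seminorms $\|f\|_{\alpha, \beta} = \|x^\alpha \partial_\beta f\|_\infty$ for $f \in \mathcal{S}(\mathbb{R}^d)$ and $\alpha, \beta \in \mathbb{N}_0^d$. (b) If the seminorms
\[
\|f\|'_{\alpha, \beta} = \|\partial_\beta (x^\alpha f(x))\|_\infty
\]
are used instead, do you get the same Fréchet space?

Exercise 8.34. Show that the Fourier transform $\hat{\ } : \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}(\mathbb{R}^d)$ is a continuous operator with the Fourier back transform $\check{\ } : \mathcal{S}(\mathbb{R}^d) \to \mathcal{S}(\mathbb{R}^d)$ being its continuous inverse.

Exercise 8.35. Prove the Poisson summation formula: for $f \in \mathcal{S}(\mathbb{R}^d)$,
\[
\sum_{n \in \mathbb{Z}^d} f(n) = \sum_{n \in \mathbb{Z}^d} \hat{f}(n).
\]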
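The Poisson summation formula of Exercise 8.35 can be observed numerically before it is proved. The sketch below assumes $d = 1$ and the scaled Gaussian $f(x) = e^{-\pi a x^2}$ with $a > 0$, whose transform is $\hat{f}(t) = a^{-1/2} e^{-\pi t^2 / a}$ by Example 8.19 and Proposition 8.21; the function name `poisson_sides`, the cutoff, and the sample values of `a` are illustrative choices.

```python
import math

def poisson_sides(a, cutoff=50):
    """Return (sum of f(n), sum of f_hat(n)) over |n| <= cutoff for f(x) = exp(-pi*a*x^2)."""
    lhs = sum(math.exp(-math.pi * a * n * n) for n in range(-cutoff, cutoff + 1))
    rhs = sum(math.exp(-math.pi * n * n / a) / math.sqrt(a)
              for n in range(-cutoff, cutoff + 1))
    return lhs, rhs

for a in (0.5, 1.0, 3.0):
    lhs, rhs = poisson_sides(a)
    print(a, lhs, rhs)
```

The two columns agree to rounding error even though, for small or large $a$, the individual terms on the two sides look very different; this is a numerical illustration only and does not replace the proof asked for in the exercise.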
9 Banach Algebras and Spectrum

In this chapter we will study the Banach algebras introduced in Section 2.4.3. For most of the discussion we will assume that the Banach algebra $A$ is unital, meaning that there is a multiplicative unit $\mathbb{1}_A \neq 0$. Even though this excludes at first sight the important example $(L^1(\mathbb{R}^d), +, *)$, this may be overcome by the simple construction in the following exercise.

Exercise 9.1. Let $A$ be a Banach algebra, and define the algebra $\widetilde{A} = A \oplus \mathbb{C}$ with the convention that we write the elements of $\widetilde{A}$ in the form $(a, \lambda) = a + \lambda \mathbb{1}$, use the norm $\|a + \lambda \mathbb{1}\| = \|a\|_A + |\lambda|$ and the obvious linear structure as a vector space over $\mathbb{C}$, and use the multiplication
\[
(a + \lambda \mathbb{1})(b + \mu \mathbb{1}) = (ab + \lambda b + \mu a) + \lambda \mu \mathbb{1}.
\]
Show that with these definitions $\widetilde{A}$ is a unital Banach algebra.
9.1 Spectrum and Spectral Radius

We say that an element $a$ of a unital Banach algebra $A$ has an inverse if there exists some $b\in A$ with $ab = ba = \mathbb{1}$.
Definition 9.2. Let $A$ be a unital Banach algebra over $\mathbb{C}$. The spectrum of $a\in A$ is the set $\sigma(a) = \{\lambda\in\mathbb{C} \mid a - \lambda\mathbb{1} \text{ has no inverse in } A\}$. The complement of the spectrum is the resolvent set $\rho(a) = \{\lambda\in\mathbb{C} \mid a - \lambda\mathbb{1} \text{ has an inverse in } A\}$.
In the literature one sometimes also sees the assumption $\|\mathbb{1}_A\| = 1$, but we will not need it.
Let us note that the above generalizes the notion of an eigenvalue: if $A$ equals the algebra of linear maps on $\mathbb{C}^d$, the spectrum equals the set of eigenvalues.
Exercise 9.3. In this exercise we describe the spectrum for multiplication operators as in Exercise 4.20. (a) Let $\mu$ be a compactly supported finite measure on $\mathbb{C}$, and let $(M(f))(z) = zf(z)$ for $f\in L^2_\mu(\mathbb{C})$ be the multiplication operator corresponding to the identity map on $\mathbb{C}$. Show that the spectrum $\sigma(M)$ within the algebra of bounded operators $B(L^2_\mu(\mathbb{C}))$ equals the support of $\mu$. (b) Let $(X,\mathcal{B},\mu)$ be a $\sigma$-finite measure space, and let $H = L^2_\mu(X)$. Let $g: X\to\mathbb{C}$ be a bounded measurable function. Show that the spectrum of the multiplication operator $M_g$ within the algebra of bounded operators consists of the essential range of $g$, that is, all $\lambda\in\mathbb{C}$ such that $\mu(g^{-1}(U)) > 0$ for all neighborhoods $U$ of $\lambda$.
Exercise 9.4. Let $X$ be a compact topological space, and let $A = C(X)$. Find the spectrum of $f\in C(X)$ as an element of the Banach algebra $C(X)$.
The following theorem will show that the spectrum is always non-empty, and hence provides us with generalized eigenvalues. Since even in finite dimensions eigenvalues may be complex, we will (as in the above definition) only consider Banach algebras over $\mathbb{C}$. Using Cauchy integration on the complex plane and convergent geometric series in $A$ we will show the following important result.
Theorem 9.5 (Spectral radius formula). Let $A$ be a unital Banach algebra. Then for every $a\in A$ the spectrum $\sigma(a)$ is a non-empty compact subset of $\mathbb{C}$. Moreover, the spectral radius $\max_{\lambda\in\sigma(a)}|\lambda|$ satisfies
$$\max_{\lambda\in\sigma(a)}|\lambda| = \lim_{n\to\infty}\sqrt[n]{\|a^n\|}. \tag{9.1}$$
This theorem is the first of many that relate the algebraic to the topological structure in Banach algebras. The spectrum and the spectral radius of an element are defined in purely algebraic terms, whereas the limit is defined in terms of the norm. One surprising consequence of this is the following observation: if $A$ is a unital Banach algebra contained in a larger Banach algebra $B$ (with compatible structures), then it is possible for an element $a\in A$ to be non-invertible in $A$ but to be invertible in $B$. Thus the spectrum of an element depends on the algebra it is viewed in, and we have $\sigma_B(a)\subseteq\sigma_A(a)$ with strict inclusion a possibility (see Exercise 9.6). Despite this, the spectral radius of $a\in A$ is not changed if it is viewed as an element of $B$, since Theorem 9.5 expresses it in terms of the norms of powers of $a$, which are not affected by switching from $A$ to $B$ (by the implicit compatibility assumption).
Exercise 9.6. Let $U:\ell^2(\mathbb{Z})\to\ell^2(\mathbb{Z})$ be the unitary shift operator from Exercise 4.19(a), defined by $(U((x_n)))_k = x_{k+1}$. (a) Show that the spectrum of $U$ considered within the algebra $B$ of all bounded operators on $\ell^2(\mathbb{Z})$ consists of $\mathbb{S}^1 = \{\lambda\in\mathbb{C} : |\lambda| = 1\}$. (b) Now consider the Banach algebra $A$ generated by $U$ (that is, obtained by taking the closed linear hull of $U^0 = \mathbb{1}, U, U^2, \ldots$). Show that the spectrum of $U$ within $A$ consists of $\{\lambda\in\mathbb{C} : |\lambda|\le 1\}$.
The above-mentioned dependence of the spectrum of an element on the ambient algebra will not cause any confusion: in the abstract setting considered here we will only have one algebra at a time, and in the application of these results in the context of operators on a Hilbert space $H$ we will always consider the algebra $B(H)$ of all bounded linear operators on $H$.
Exercise 9.7. (There are no Banach fields.) Use Theorem 9.5 to show that $\mathbb{C}$ is the only Banach algebra over $\mathbb{C}$ that is also a field.
Finally, let us comment on the precise shape of the spectral radius formula (9.1). It will be relatively straightforward to show that $\lambda\in\mathbb{C}$ with $|\lambda| > \|a\|$ cannot belong to the spectrum of $a\in A$. However, it is also clear that in general the norm may be much larger than the spectral radius. In fact, even in the elementary case of the algebra of two-by-two matrices (equipped with any operator norm) the norm of the matrix
$$a = \begin{pmatrix} 1 & C \\ 0 & 1 \end{pmatrix}$$
can be made arbitrarily large by increasing the value of $C$, but the spectrum always consists simply of $\{1\}$. The right-hand side of the spectral radius formula (9.1) essentially ignores the original size of the matrix $a$ and instead looks at the exponential growth rate of the norm of $a^n$. In the case at hand the norm of $a^n$ grows linearly, which makes the right-hand side equal to one (and so equal to the left-hand side).
9.1.1 The Geometric Series and its Consequences
As usual in a ring with a unit, the inverse of an invertible element $a$ of a Banach algebra is uniquely determined by (and uniquely determines) $a$.
Proposition 9.8. Let $A$ be a unital Banach algebra. Then the set $U = \{a\in A \mid a \text{ is invertible}\}$ is non-empty and open. Moreover, for any $a\in A$ the resolvent set $\rho(a)$ is open in $\mathbb{C}$, and so the spectrum is a closed set.
Self-adjoint and normal operators on Hilbert spaces will form a nice exception to this.
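Proof. If $a\in A$ and $\|a\| < 1$, then (with the convention $a^0 = \mathbb{1}$ for any $a\in A$)
$$(\mathbb{1}-a)^{-1} = \sum_{n=0}^{\infty} a^n,$$
since the right-hand side converges absolutely and
$$(\mathbb{1}-a)\sum_{n=0}^{\infty} a^n = \sum_{n=0}^{\infty} a^n(\mathbb{1}-a) = \sum_{n=0}^{\infty} a^n - \sum_{n=1}^{\infty} a^n = \mathbb{1}.$$
This shows that $B_1(\mathbb{1})\subseteq U$. Now let $a_0\in U$ be any invertible element, and assume that $\|a-a_0\| < \|a_0^{-1}\|^{-1}$. Then we claim that $a$ is also invertible, which will then show that $U\subseteq A$ is open. To see the claim, notice that
$$a = a_0 + (a-a_0) = a_0\bigl(\mathbb{1} + a_0^{-1}(a-a_0)\bigr), \qquad \mathbb{1} + a_0^{-1}(a-a_0)\in B_1(\mathbb{1}),$$
is a product of two elements of $U$ and so lies in $U$. Finally, for any $a\in A$ the resolvent set $\rho(a) = \{\lambda\in\mathbb{C} \mid a-\lambda\mathbb{1}\in U\}$ is the pre-image of an open set under a continuous mapping, and so is open. Therefore the spectrum $\sigma(a) = \mathbb{C}\setminus\rho(a)$ is closed.
Proposition 9.9. Let $A$ be a unital Banach algebra over $\mathbb{C}$, and let $a\in A$. If $\lambda\in\mathbb{C}$ has $|\lambda| > \sqrt[m]{\|a^m\|}$ for some $m\ge 1$, then $\lambda\in\rho(a)$. In particular, $\sigma(a)\subseteq\overline{B^{\mathbb{C}}_{\|a\|}(0)}$ is compact.
Proof. Let $a\in A$ and $\lambda\in\mathbb{C}$ be as in the proposition. Then we claim that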
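$$(\lambda\mathbb{1}-a)^{-1} = \lambda^{-1}(\mathbb{1}-\lambda^{-1}a)^{-1} = \lambda^{-1}\sum_{n=0}^{\infty}\lambda^{-n}a^n = \lambda^{-1}\bigl(\mathbb{1} + \lambda^{-1}a + \cdots + \lambda^{-(m-1)}a^{m-1}\bigr)\sum_{n=0}^{\infty}\lambda^{-mn}a^{mn}.$$
Here and below we will sometimes prefer to study $\lambda\mathbb{1}-a$ instead of $a-\lambda\mathbb{1}$; obviously this does not make any difference.
For this notice first that, by assumption,
$$\sum_{n=0}^{\infty}\|\lambda^{-mn}a^{mn}\| \le \sum_{n=0}^{\infty}\bigl(|\lambda|^{-m}\|a^m\|\bigr)^n < \infty \qquad\text{since } |\lambda|^{-m}\|a^m\| < 1,$$
so the series $\sum_{n=0}^{\infty}\lambda^{-mn}a^{mn}$ converges absolutely. Moreover,
$$(\lambda\mathbb{1}-a)\,\lambda^{-1}\bigl(\mathbb{1} + \lambda^{-1}a + \cdots + \lambda^{-(m-1)}a^{m-1}\bigr)\sum_{n=0}^{\infty}\lambda^{-mn}a^{mn} = \bigl(\mathbb{1}-\lambda^{-m}a^m\bigr)\sum_{n=0}^{\infty}\lambda^{-mn}a^{mn} = \mathbb{1},$$
which proves the claim. This implies that $\lambda\in\rho(a)$, and using the case $m=1$ also shows that $\sigma(a)\subseteq\overline{B^{\mathbb{C}}_{\|a\|}(0)}$.
9.1.2 Using Cauchy Integration
We have shown that $\sigma(a)\subseteq\mathbb{C}$ is compact for any element $a\in A$, but have yet to show that $\sigma(a)$ is non-empty. This existence theorem uses Cauchy integration, and to prepare for this we need the following lemma concerning the resolvent.
Lemma 9.10. Let $a$ be an element of a unital Banach algebra $A$ over $\mathbb{C}$. Then the resolvent function $R:\rho(a)\to A$ defined by $R(\lambda) = (\lambda\mathbb{1}-a)^{-1}$ is an analytic function in the sense that for any $\lambda_0\in\rho(a)$ there is an open neighborhood of $\lambda_0$ on which $R$ is given by an absolutely convergent power series
$$R(\lambda) = \sum_{n=0}^{\infty} b_n(\lambda-\lambda_0)^n$$
with coefficients $b_n\in A$.
Proof. We use essentially the same formulas as arising in the proof of Proposition 9.8. Let $a\in A$ and $\lambda_0\in\rho(a)$ be as in the lemma. Suppose that $\lambda\in\mathbb{C}$ satisfies $|\lambda-\lambda_0| < \|(\lambda_0\mathbb{1}-a)^{-1}\|^{-1}$. Then
$$\lambda\mathbb{1}-a = (\lambda_0\mathbb{1}-a) - (\lambda_0-\lambda)\mathbb{1} = (\lambda_0\mathbb{1}-a)\bigl(\mathbb{1} - (\lambda_0-\lambda)(\lambda_0\mathbb{1}-a)^{-1}\bigr),$$
which shows that
$$R(\lambda) = (\lambda_0\mathbb{1}-a)^{-1}\sum_{n=0}^{\infty}\bigl((\lambda_0\mathbb{1}-a)^{-1}\bigr)^n(\lambda_0-\lambda)^n$$
is, for $|\lambda-\lambda_0| < \|(\lambda_0\mathbb{1}-a)^{-1}\|^{-1}$, an absolutely convergent power series as claimed.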
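The geometric series used in Propositions 9.8 and 9.9 and again in Lemma 9.10 can be tried out numerically. The following Python sketch is only an illustration under our own assumptions: it works in the algebra of real two-by-two matrices with the maximum-row-sum operator norm, and the names `mat_mul`, `mat_add`, `row_sum_norm` and `neumann_inverse` are hypothetical helpers rather than notation from these notes. It checks that the partial sums of the geometric series approximately invert the unit minus $a$ when the norm of $a$ is less than one.

```python
def mat_mul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_add(x, y):
    """Add two 2x2 matrices entrywise."""
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]


def row_sum_norm(x):
    """Operator norm induced by the max-norm on the plane (maximal absolute row sum)."""
    return max(sum(abs(v) for v in row) for row in x)


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def neumann_inverse(a, terms=200):
    """Partial sum 1 + a + ... + a^(terms-1) of the geometric series."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    power = IDENTITY
    for _ in range(terms):
        total = mat_add(total, power)
        power = mat_mul(power, a)
    return total


# An element of norm 0.7 < 1, so the series converges absolutely.
a = [[0.2, 0.5], [0.1, 0.3]]
approx = neumann_inverse(a)

# Multiply (1 - a) by the partial sum and measure the distance to the unit.
one_minus_a = [[IDENTITY[i][j] - a[i][j] for j in range(2)] for i in range(2)]
product = mat_mul(one_minus_a, approx)
residual = row_sum_norm(
    [[product[i][j] - IDENTITY[i][j] for j in range(2)] for i in range(2)]
)
print(residual)  # tiny: only rounding error and the discarded tail remain
```

The residual is bounded by the norm of the first discarded power of `a` plus rounding error, which mirrors the telescoping identity in the proof of Proposition 9.8.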
With this analyticity we are ready to prove the first part of Theorem 9.5.
Proof that $\sigma(a)$ is non-empty. Let $a$ be an element of a unital Banach algebra, and suppose that $\sigma(a)$ is empty. We first sketch an argument that produces a contradiction from this assumption, and then fill in the details. Since $\sigma(a)$ is empty, the resolvent $R(\lambda) = (\lambda\mathbb{1}-a)^{-1}$ is an entire function (that is, a function analytic at each point of $\mathbb{C}$). It follows by Cauchy's integral formula that
$$\oint_{\gamma} R(z)\,dz = 0$$
for any closed piecewise differentiable path $\gamma$ in $\mathbb{C}$. In particular, if $\gamma$ is the closed circular path with center $0$ and radius $\|a\|+1$, then
$$R(z) = (z\mathbb{1}-a)^{-1} = z^{-1}(\mathbb{1}-z^{-1}a)^{-1} = \sum_{n=0}^{\infty} z^{-n-1}a^n \tag{9.2}$$
for any $z$ on the path $\gamma$, and the sum is absolutely convergent. Therefore,
$$0 = \oint_{\gamma} R(z)\,dz = \oint_{|z|=\|a\|+1}\sum_{n=0}^{\infty} z^{-n-1}a^n\,dz = \sum_{n=0}^{\infty} a^n\oint_{|z|=\|a\|+1} z^{-n-1}\,dz = 2\pi i\,\mathbb{1}, \tag{9.3}$$
since
$$\oint_{|z|=\|a\|+1} z^{-n-1}\,dz = \begin{cases} 2\pi i & \text{if } n = 0,\\ 0 & \text{if } n\neq 0.\end{cases}$$
Now $\mathbb{1}\neq 0$ (unless the Banach algebra is the trivial one $A = \{0\}$, which is a case we want to ignore), and so (9.3) shows that the assumption $\sigma(a) = \emptyset$ leads to a contradiction.
The alert reader may notice that this usage of Cauchy integration is a bit unorthodox, but should read on: this will be resolved below.
The difficulty with the argument sketched above is that most of the integrals are integrals of $A$-valued functions. Even though it is possible to make sense of integration for $A$-valued functions (just as we have seen how to make sense of integration for functions taking values in a Hilbert space in Section 3.3.4, see Exercise 3.32), we do not need to extend the Cauchy integral formula for $A$-valued functions because of the following argument (which could be used to prove such an extension). Let $\ell\in A^*$ be a linear functional with $\ell(\mathbb{1})\neq 0$ (such a functional is guaranteed to exist by Theorem 6.3) and consider the function
$$\ell\circ R:\rho(a) = \mathbb{C}\to\mathbb{C}.$$
By Lemma 9.10, $R(z)$ can be represented as a power series for $z$ sufficiently close to any element $z_0$ of $\rho(a) = \mathbb{C}$. By continuity of $\ell$, the same holds for $\ell\circ R$. It follows that $\ell\circ R:\mathbb{C}\to\mathbb{C}$ is an entire function (in the usual sense of complex analysis). Using this entire function in the calculation in (9.3) we see that
$$0 = \oint_{|z|=\|a\|+1}\ell\circ R(z)\,dz = \sum_{n=0}^{\infty}\ell(a^n)\oint_{|z|=\|a\|+1} z^{-n-1}\,dz = 2\pi i\,\ell(\mathbb{1})\neq 0.$$
It follows that $R$ cannot be defined on all of $\mathbb{C}$, and so $\sigma(a)$ is non-empty.
For the spectral radius formula in Theorem 9.5 we also need the following elementary property of sub-additive and sub-multiplicative real-valued sequences.
Definition 9.11. A real sequence $(\alpha_n)$ is sub-additive if $\alpha_{m+n}\le\alpha_m+\alpha_n$ for all $m,n\ge 1$, and is sub-multiplicative if $\alpha_n\ge 0$ and $\alpha_{m+n}\le\alpha_m\alpha_n$ for all $m,n\ge 1$.
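Lemma 9.12 (Fekete's lemma). Let $(\alpha_n)$ be a real sequence.
(1) If $(\alpha_n)$ is sub-additive then
$$\lim_{n\to\infty}\frac{\alpha_n}{n} = \inf_{n\ge 1}\frac{\alpha_n}{n}.$$
(2) If $(\alpha_n)$ is sub-multiplicative then
$$\lim_{n\to\infty}\sqrt[n]{\alpha_n} = \inf_{n\ge 1}\sqrt[n]{\alpha_n}.$$
Proof. Suppose first that $\alpha_n\ge 0$ is sub-multiplicative. If $\alpha_{n_0} = 0$ for some $n_0$, then $\alpha_n = 0$ for all $n\ge n_0$ and in this case the claim is trivial. So suppose $\alpha_n > 0$ for all $n\ge 1$, in which case the second statement of the lemma follows from the first by taking the logarithm of the sequence. So consider now a real-valued sub-additive sequence $(\alpha_n)$. Let
$$\alpha = \inf_{n\in\mathbb{N}}\frac{\alpha_n}{n},$$
so that $\frac{\alpha_n}{n}\ge\alpha$ for all $n\ge 1$. Given $\varepsilon > 0$, pick $k\ge 1$ such that $\frac{\alpha_k}{k} < \alpha+\frac{\varepsilon}{2}$. By the sub-additive property, for any $m\ge 1$ and $j$ with $0\le j<k$,
$$\frac{\alpha_{mk+j}}{mk+j} \le \frac{\alpha_{mk}}{mk+j} + \frac{\alpha_j}{mk+j} \le \frac{\alpha_{mk}}{mk} + \frac{\alpha_j}{mk} \le \frac{m\alpha_k}{mk} + \frac{j\alpha_1}{mk} \le \frac{\alpha_k}{k} + \frac{\alpha_1}{m} < \alpha + \frac{\varepsilon}{2} + \frac{\alpha_1}{m}.$$
So if $n = mk+j$ is large enough to ensure that $\frac{\alpha_1}{m} < \frac{\varepsilon}{2}$, then $\frac{\alpha_n}{n} < \alpha+\varepsilon$ as required.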
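Fekete's lemma is easy to watch in action on a concrete sequence. The Python sketch below is a small illustration under our own choice of example: the sequence `alpha(n) = n + sqrt(n)` and the helper `is_subadditive` are hypothetical and do not come from these notes. The sequence is sub-additive because the square root is, and the quotients `alpha(n) / n` equal `1 + 1/sqrt(n)`, so both their limit and their infimum are one.

```python
import math


def alpha(n):
    """A sub-additive sequence: alpha_n = n + sqrt(n)."""
    return n + math.sqrt(n)


def is_subadditive(seq, limit):
    """Check seq(m + n) <= seq(m) + seq(n) for 1 <= m, n < limit (with a rounding margin)."""
    return all(
        seq(m + n) <= seq(m) + seq(n) + 1e-12
        for m in range(1, limit)
        for n in range(1, limit)
    )


print(is_subadditive(alpha, 200))

# The quotients alpha_n / n decrease towards their infimum 1.
ratios = [alpha(n) / n for n in (1, 10, 100, 10_000, 1_000_000)]
for value in ratios:
    print(value)
```

In this example the infimum is not attained by any single index, which shows why the lemma is phrased in terms of an infimum and a limit rather than a minimum.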
Proof of Theorem 9.5. Notice first that the sequence $(\alpha_n)$ defined by $\alpha_n = \|a^n\|$ for all $n\ge 1$ is sub-multiplicative, since
$$\alpha_{m+n} = \|a^{m+n}\| \le \|a^m\|\,\|a^n\| = \alpha_m\alpha_n$$
for all $m,n\ge 1$. Thus
$$\lim_{m\to\infty}\sqrt[m]{\|a^m\|} = \inf_{m\ge 1}\sqrt[m]{\|a^m\|}$$
exists by Lemma 9.12. Now by Proposition 9.9, if $\lambda\in\mathbb{C}$ has $|\lambda| > \inf_{m\ge 1}\sqrt[m]{\|a^m\|}$ then $\lambda\in\rho(a)$. Thus
$$\max_{\lambda\in\sigma(a)}|\lambda| \le \inf_{m\ge 1}\sqrt[m]{\|a^m\|}.$$
The reverse inequality is more involved. It involves a refinement of the claim that $\sigma(a)$ is non-empty, and we will use the Cauchy integral formula again. Let $s = \max_{\lambda\in\sigma(a)}|\lambda|$, so that the resolvent function $R(z) = (z\mathbb{1}-a)^{-1}$ is analytic on $\{z\in\mathbb{C} \mid |z| > s\}\subseteq\rho(a)$.
Pick $\ell\in A^*$ with $\|\ell\|\le 1$, and fix $\varepsilon > 0$. We now use the closed path around the circle of radius $s+\varepsilon$ to see that
$$\oint_{|z|=s+\varepsilon}\ell\bigl((z\mathbb{1}-a)^{-1}\bigr)z^m\,dz = O_{a,\varepsilon}\bigl((s+\varepsilon)^m\bigr),$$
where the implicit constant only depends on $R(z)$ restricted to $\{z\in\mathbb{C} \mid |z| = s+\varepsilon\}$, and in particular does not depend on $m$ and $\ell$. Expanding the circle to the radius $\|a\|+1$ does not change the integral, but we may use (9.2) again to see that
$$\oint_{|z|=s+\varepsilon}\ell\bigl((z\mathbb{1}-a)^{-1}\bigr)z^m\,dz = \oint_{|z|=\|a\|+1}\ell\bigl((z\mathbb{1}-a)^{-1}\bigr)z^m\,dz = \sum_{n=0}^{\infty}\ell(a^n)\oint_{|z|=\|a\|+1} z^{-n+m-1}\,dz = 2\pi i\,\ell(a^m).$$
Together this gives
$$|\ell(a^m)| \ll_{a,\varepsilon} (s+\varepsilon)^m$$
for all $m\ge 1$ and $\ell\in A^*$ with $\|\ell\|\le 1$. By Corollary 6.4 we deduce that
$$\|a^m\| \ll_{a,\varepsilon} (s+\varepsilon)^m.$$
Taking the $m$th root and the limit we see that the implicit constant disappears, and we get
$$\lim_{m\to\infty}\sqrt[m]{\|a^m\|} \le s+\varepsilon.$$
Since this holds for any $\varepsilon > 0$ the theorem follows.
The results of this section will be used in the next chapter to derive the spectral theory of self-adjoint operators. For normal operators (or, more generally, a collection of bounded self-adjoint or normal operators) the material of the next section will also be useful.
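The spectral radius formula (9.1) can be checked by hand on the two-by-two matrix discussed after Exercise 9.7, whose spectrum is $\{1\}$ although its norm is large. The Python sketch below is an illustration under our own assumptions: we fix the value 1000 for the off-diagonal entry, use the maximum-row-sum operator norm, and introduce the hypothetical helpers `mat_mul`, `row_sum_norm` and `nth_root_of_power_norm`. For this matrix the $n$th power has off-diagonal entry $n$ times the original one, so the norm of the $n$th power grows only linearly in $n$.

```python
def mat_mul(x, y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def row_sum_norm(x):
    """Operator norm induced by the max-norm on the plane (maximal absolute row sum)."""
    return max(sum(abs(v) for v in row) for row in x)


def nth_root_of_power_norm(a, n):
    """Return the n-th root of the norm of a^n, the quantity appearing in (9.1)."""
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        power = mat_mul(power, a)
    return row_sum_norm(power) ** (1.0 / n)


C = 1000.0                      # assumed size of the off-diagonal entry
a = [[1.0, C], [0.0, 1.0]]      # spectrum {1}, but norm 1 + C

roots = {n: nth_root_of_power_norm(a, n) for n in (1, 10, 100, 1000)}
for n, value in roots.items():
    print(n, value)
```

The first value is the norm of the matrix itself, while the later values decrease towards the spectral radius one, since the $n$th root of a linearly growing quantity tends to one.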
9.2 C*-algebras
Definition 9.13. A Banach algebra $A$ over $\mathbb{C}$ is a $C^*$-algebra if it has, in addition to the multiplication operation (as an algebra), a star operator ${}^*: A\to A$ with the following properties: ${}^*$ is anti-linear; $(ab)^* = b^*a^*$ for $a,b\in A$; $(a^*)^* = a$ for $a\in A$; $\|a^*\| = \|a\|$ for $a\in A$; and $\|a^*a\| = \|a\|^2$ for $a\in A$ (the $C^*$-property of the norm).
Example 9.14. (a) The algebra of bounded operators $B(H)$ on a Hilbert space $H$ has a star operator, namely the map that sends $A\in B(H)$ to its adjoint $A^*\in B(H)$. For this operator we already know all the desired properties with the exception of the last (critical) property. To see this last property, let $A\in B(H)$, and notice that $A^*A$ is self-adjoint since $(A^*A)^* = A^*(A^*)^* = A^*A$. Hence, by Lemma 4.25,
$$\|A^*A\| = \sup_{\|x\|\le 1}|\langle A^*Ax, x\rangle| = \sup_{\|x\|\le 1}\langle Ax, Ax\rangle = \|A\|^2$$
as required. Therefore $B(H)$ is a $C^*$-algebra.
(b) The spaces of bounded functions $B(X)$, of continuous bounded functions $C_b(X)$, of measurable bounded functions $\mathscr{L}^\infty(X)$, or of measurable essentially bounded functions $L^\infty_\mu(X)$ are all commutative unital $C^*$-algebras. For these, multiplication is defined pointwise, and the star operator is simply pointwise complex conjugation.
Definition 9.15. Let $A$ be a $C^*$-algebra. Then an element $a\in A$ is called self-adjoint if $a^* = a$, and is called normal if $a^*a = aa^*$.
Exercise 9.16. Let $A$ be a unital $C^*$-algebra. Show that the unit $\mathbb{1}$ is self-adjoint and has $\|\mathbb{1}\| = 1$.
For normal elements in a $C^*$-algebra the spectral radius formula is quite easy.
Proposition 9.17. Let $A$ be a unital $C^*$-algebra, and let $a\in A$ be a normal element. Then the spectral radius satisfies
$$\max_{\lambda\in\sigma(a)}|\lambda| = \|a\|.$$
Proof. We will prove by induction on $n$ that
$$\|a^{2^n}\| = \|a\|^{2^n}. \tag{9.4}$$
The case $n = 0$ is trivial. For $n = 1$ we have
$$\|a^2\|^2 = \|(a^2)^*a^2\| = \|(a^*a)(a^*a)\| = \|a^*a\|^2 = \|a\|^4,$$
where we used the $C^*$-property of the norm for $a^2$, normality of $a$, self-adjointness of $a^*a$ and the $C^*$-property of the norm for $a^*a$, and finally also the $C^*$-property of the norm for $a$. Now suppose that (9.4) holds for a given $n\ge 1$ and set $b = a^{2^n}$. Then
$$\|a^{2^{n+1}}\| = \|b^2\| = \|b\|^2 = \|a^{2^n}\|^2 = \|a\|^{2^{n+1}},$$
where we used the definition of $b$, the case $n = 1$ for $b$, the definition of $b$, and the inductive hypothesis. This concludes the induction, and using Theorem 9.5 this gives the proposition.
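Proposition 9.17 can be compared with the non-normal matrix from Section 9.1 in the $C^*$-algebra of complex two-by-two matrices. The Python sketch below is an illustration under our own assumptions: the helper names are hypothetical, eigenvalues are computed from the characteristic polynomial, and the operator norm for the Euclidean norm is computed by the standard formula as the square root of the largest eigenvalue of the adjoint times the matrix (which is the $C^*$-property combined with the proposition applied to a self-adjoint element). For the normal matrix the norm and the spectral radius agree, while for the shear matrix they do not.

```python
import cmath
import math


def mat_mul(x, y):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def adjoint(x):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[x[j][i].conjugate() for j in range(2)] for i in range(2)]


def eigenvalues(x):
    """Roots of the characteristic polynomial t^2 - tr(x) t + det(x)."""
    tr = x[0][0] + x[1][1]
    det = x[0][0] * x[1][1] - x[0][1] * x[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + disc) / 2, (tr - disc) / 2)


def spectral_radius(x):
    return max(abs(ev) for ev in eigenvalues(x))


def operator_norm(x):
    """Euclidean operator norm: square root of the largest eigenvalue of x* x."""
    return math.sqrt(spectral_radius(mat_mul(adjoint(x), x)))


normal = [[1 + 0j, 1j], [1j, 1 + 0j]]       # commutes with its adjoint
shear = [[1 + 0j, 3 + 0j], [0j, 1 + 0j]]    # does not commute with its adjoint

print(spectral_radius(normal), operator_norm(normal))
print(spectral_radius(shear), operator_norm(shear))
```

The normal matrix has eigenvalues $1\pm i$ and norm $\sqrt{2}$, whereas the shear matrix has spectral radius one but norm larger than three.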
9.3 Commutative Banach Algebras and their Gelfand duals

Recall that the dual space $B^*$ of a Banach algebra consists of all bounded linear functionals $B\to\mathbb{C}$. If $B = A$ is a commutative Banach algebra (one in which $ab = ba$ for all $a,b\in B$) then it also makes sense, and is useful, to study algebra homomorphisms. The trivial map defined by $\varphi(b) = 0$ for all $b\in B$ may also be considered an algebra homomorphism, but we will exclude this trivial map in the discussion below.
Definition 9.18. Let $A$ be a commutative Banach algebra over $\mathbb{C}$. Then the Gelfand dual $A^o$ is the set of all non-trivial (that is, surjective) continuous algebra homomorphisms $\varphi: A\to\mathbb{C}$ (which are also called characters). That is,
$$A^o = \{\varphi\in A^*\setminus\{0\} \mid \varphi(ab) = \varphi(a)\varphi(b) \text{ for all } a,b\in A\}.$$
9.3.1 Commutative Unital Banach Algebras
If the Banach algebra that we consider also has a unit, then we can link the notion of algebra homomorphisms to the spectrum of the elements of the algebra. The following result establishes this link and a great deal more.
Theorem 9.19 (Properties of the Gelfand dual). Let $A$ be a commutative unital Banach algebra over $\mathbb{C}$. Then $A^o\subseteq\overline{B_1^{A^*}(0)}$ is non-empty and weak* compact, and $\sigma(a) = \{\varphi(a) \mid \varphi\in A^o\}$ for every $a\in A$.
In the course of the proof of Theorem 9.19 we will also see that in fact any algebra homomorphism $\varphi: A\to\mathbb{C}$ is automatically continuous(9) and satisfies $\|\varphi\|\le 1$. This latter result is straightforward.
Lemma 9.20. Let $A$ be a commutative Banach algebra, and let $\varphi\in A^o$. Then $\|\varphi\|\le 1$.
Proof. Suppose that $\|\varphi\| > 1$, so that there is an element $a\in A$ with $\|a\| < 1$ and with $\varphi(a) = 1$. Then $b = \sum_{n=1}^{\infty} a^n$ converges, and $a + ab = b$, so $\varphi(a) + \varphi(a)\varphi(b) = \varphi(b)$, and hence $1 + \varphi(b) = \varphi(b)$, a contradiction.
For the next steps we will need to use some terminology from basic algebra. Recall that an ideal $J$ of a commutative algebra $A$ is a subspace such that $AJ = \{ab \mid a\in A, b\in J\}\subseteq J$, and that for any ideal $J$ the quotient $A/J$ is also an algebra with multiplication given by $(a+J)(b+J) = ab+J$ for all $a,b\in A$. An ideal $J\subseteq A$ is proper if $J\neq A$. A maximal ideal $M$ in a unital algebra $A$ is a proper ideal such that if $J$ is an ideal with $M\subseteq J\subseteq A$ then $J = M$ or $J = A$. The quotient by a maximal ideal $M$ is always a field, for if $a+M\in A/M\setminus\{0\}$ then $J = Aa+M\supsetneq M$ is a bigger ideal, and so must be $A$. Since $A$ has a unit $\mathbb{1}$, we have $ba+m = \mathbb{1}$ for some $b\in A$ and $m\in M$, so every non-zero element of $A/M$ has a multiplicative inverse. The next lemma examines these general notions for Banach algebras.
Lemma 9.21. Let $A$ be a commutative unital Banach algebra. The closure of any ideal in $A$ is an ideal, and a maximal ideal is closed.
Proof. The multiplication map $A\times A\to A$ is continuous by the discussion in Section 2.4.3. This implies the first claim in the lemma, for if $J\subseteq A$ is an ideal, $a\in A$ and $b_n\to b\in\overline{J}$ as $n\to\infty$ with $b_n\in J$ for all $n\ge 1$, then $ab_n\in J$ for all $n\ge 1$, and by continuity $ab_n\to ab\in\overline{J}$ as required.
For the second claim, notice that a proper ideal $J\subseteq A$ cannot contain $\mathbb{1}$ or indeed any invertible element. By Proposition 9.8, this implies that $\mathbb{1}\notin\overline{J}$. Since a maximal ideal $M$ is proper, and its closure $\overline{M}$ is also a proper ideal, we see that $M = \overline{M}$ is closed.
Lemma 9.21 already implies that any algebra homomorphism $\varphi: A\to\mathbb{C}$ on a commutative unital Banach algebra is continuous. Indeed, $M = \ker\varphi$ is a maximal ideal, and so is closed. Since $A/M\cong\mathbb{C}$ is finite-dimensional, this shows that
$$\varphi: A\to A/M\cong\mathbb{C}$$
is a continuous map. For the proof of Theorem 9.19 we need one more algebraic result.
Lemma 9.22. Let $R$ be a commutative ring with a unit, and let $J_0\subseteq R$ be a proper ideal. Then there exists a maximal ideal $M\subseteq R$ containing $J_0$.
Proof. This is a direct application of Zorn's lemma. Define a set
$$\mathcal{S} = \{J\subseteq R \mid J \text{ is an ideal and } J_0\subseteq J\subsetneq R\}$$
with the partial order defined by inclusion. If $(J_\iota \mid \iota\in I)$ is a linearly ordered chain in $\mathcal{S}$, then
$$J = \bigcup_{\iota\in I} J_\iota$$
is again an ideal. Moreover, since each $J_\iota$ is proper, we have $\mathbb{1}\notin J_\iota$ for all $\iota\in I$, and so $J$ is also proper. By Zorn's lemma it follows that $\mathcal{S}$ contains a maximal element, which by construction is a maximal ideal containing $J_0$.
Proof of Theorem 9.19. By the definition and Lemma 9.20 we have
$$A^o = \{\varphi\in\overline{B_1^{A^*}(0)} \mid \varphi(ab) = \varphi(a)\varphi(b) \text{ for all } a,b\in A \text{ and } \varphi(\mathbb{1}) = 1\} = \overline{B_1^{A^*}(0)}\cap\bigcap_{a,b\in A}\{\varphi\in A^* \mid \varphi(ab) = \varphi(a)\varphi(b)\}\cap\{\varphi\in A^* \mid \varphi(\mathbb{1}) = 1\}.$$
Here the sets $\{\varphi\in A^* \mid \varphi(ab) = \varphi(a)\varphi(b)\}$ and $\{\varphi\in A^* \mid \varphi(\mathbb{1}) = 1\}$ are closed in the weak* topology. Hence $A^o$ is weak* compact by the Tychonoff–Alaoglu theorem (Theorem 7.4).
Now let $a\in A$ be non-invertible, so that $J = Aa$ is a proper ideal. By Lemma 9.22 there is a maximal ideal $M\subseteq A$ containing $a$. By Lemma 9.21, $M$ is closed. We claim that $B = A/M$ is also a Banach algebra. To see this, we equip $B$ with the quotient norm (from Section 2.1.2), which makes $B$ into a Banach space by Lemma 2.24. Since $M$ is an ideal, multiplication is well-defined on $A/M$. Finally,
$$\|ab+M\|_{A/M}\le\|ab\|_A,$$
which implies that
$$\|(a+M)(b+M)\|_{A/M}\le\|a+M\|_{A/M}\,\|b+M\|_{A/M}$$
as required. Thus $A/M$ is a Banach algebra and a field (since $M$ is maximal). We claim this implies that $A/M = \mathbb{C}(\mathbb{1}+M)\cong\mathbb{C}$. Indeed (see Exercise 9.7), if $a+M\in A/M$, then $\sigma(a+M)\neq\emptyset$ by Theorem 9.5, and so $a-\lambda\mathbb{1}+M$ is non-invertible for some $\lambda\in\mathbb{C}$. However, since $A/M$ is a field this implies that $a+M = \lambda\mathbb{1}+M$, and so the claim.
Together we have shown that if $a\in A$ is non-invertible, then there exists a non-trivial algebra homomorphism $\varphi: A\to A/M\cong\mathbb{C}$ with $\varphi(a) = 0$. Applying this to $a-\lambda\mathbb{1}$ for $\lambda\in\sigma(a)$, we see that for any such $\lambda$ there is some $\varphi\in A^o$ with $\varphi(a) = \lambda$. On the other hand, if $\varphi(a) = \lambda$ for some $a\in A$, $\lambda\in\mathbb{C}$ and $\varphi\in A^o$, then $\varphi(a-\lambda\mathbb{1}) = 0$ and $a-\lambda\mathbb{1}$ cannot be invertible (since $\varphi(\mathbb{1})\neq 0$). Together we have shown the theorem.
Example 9.23 (Stone–Čech compactification). Let $A = \ell^\infty(\mathbb{N})$, which is a Banach algebra with respect to the pointwise product. Clearly for any $n_0\in\mathbb{N}$ the map defined by $\varphi_{n_0}((a_n)) = a_{n_0}$ is an algebra homomorphism, and this assignment defines a map from $\mathbb{N}$ to $A^o$. The compact (but non-metrizable) topological space $A^o$ is called the Stone–Čech compactification of $\mathbb{N}$ and is denoted $\beta\mathbb{N}$.
Exercise 9.24. (a) Show that $\mathbb{N}$ is dense in $\beta\mathbb{N}$. (b) Show that $\ell^\infty(\mathbb{N})$ can be canonically identified with $C(\beta\mathbb{N})$. (c) Show that $\beta\mathbb{N}$ is non-metrizable.
9.3.2 Commutative Banach Algebras without a Unit
While the notions of invertibility and spectrum are linked to the existence of a unit, the definition of the Gelfand dual does not require a unit. However, the topological properties of $A^o$ are changed by the absence of a unit.
Corollary 9.25 (Properties of the Gelfand Dual). Let $A$ be a commutative Banach algebra over $\mathbb{C}$. Then $A^o\subseteq\overline{B_1^{A^*}(0)}$ is locally compact (and $A^o\cup\{0\}$ is compact) in the weak* topology on $A^*$. For any $a\in A$ we also have
$$\max_{\varphi\in A^o\cup\{0\}}|\varphi(a)| = \lim_{n\to\infty}\sqrt[n]{\|a^n\|}.$$
Proof. If $A$ has a unit, then Theorem 9.19 applies. So assume that $A$ does not have a unit, and consider the algebra $A_{\mathbb{1}} = A\oplus\mathbb{C}$ with the multiplication and norm as in Exercise 9.1. Clearly $\varphi((0,1)) = 1$ for any $\varphi\in(A_{\mathbb{1}})^o$, so there is a one-to-one correspondence between $\varphi\in(A_{\mathbb{1}})^o$ and $\varphi|_{A\times\{0\}}\in A^o\cup\{0\}$. Moreover, this correspondence is a homeomorphism with respect to the respective weak* topologies. Applying Theorem 9.19 (and noticing that $\varphi\in(A_{\mathbb{1}})^o$ may become trivial when restricted to $A = A\times\{0\}\subseteq A_{\mathbb{1}}$) proves the result.
9.3.3 The Gelfand Transform
Definition 9.26. Let $A$ be a commutative Banach algebra with Gelfand dual $A^o$. Then the map $(\cdot)^o: A\to C(A^o)$ defined by $f^o(\varphi) = \varphi(f)$ for $f\in A$ and $\varphi\in A^o$ is called the Gelfand transform.
Just as in Theorem 9.19 we will always use the weak* topology on $A^o$.
Proposition 9.27. The Gelfand transform maps $A$ into $C_0(A^o)$ (or into $C(A^o)$ if $A$ has a unit), and satisfies $\|f^o\|_\infty\le\|f\|$ for all $f\in A$.
Proof. By definition of the weak* topology, $f^o(\varphi) = \varphi(f)$ depends continuously on $\varphi\in A^o$ for each $f\in A$. By Lemma 9.20, $|f^o(\varphi)| = |\varphi(f)|\le\|\varphi\|\,\|f\|\le\|f\|$ for all $\varphi\in A^o$, and so $\|f^o\|_\infty\le\|f\|$. Finally, $0\in A^*$ plays the role of infinity in the one-point compactification of $A^o$ in Corollary 9.25. This gives $f^o\in C_0(A^o)$ as required.
An important special case of the construction above is given by the following.
Proposition 9.28 (Algebra homomorphisms on $L^1(G)$). Let $G$ be a locally compact metrizable abelian group, which we equip with a Haar measure $m$ as in Fact 1 from p. 104. Then $L^1(G)$ is a commutative Banach algebra with respect to the convolution defined by
$$f_1 * f_2(g) = \int_G f_1(h)f_2(g-h)\,dm(h)$$
for $f_1,f_2\in L^1(G)$. The Gelfand dual $(L^1(G))^o$ of all non-trivial algebra homomorphisms can be identified with the Pontryagin dual $\widehat{G}$, which by definition consists of all continuous homomorphisms $G\to\mathbb{S}^1$. The Gelfand transform can be identified with the Fourier transform.
The reader may make this more familiar by assuming that $G = \mathbb{R}^d$ or $G = \mathbb{T}^d$.
We now explain the two identifications in more detail. If $\chi\in\widehat{G}$ is a continuous group homomorphism $\chi: G\to\mathbb{S}^1 = \{z\in\mathbb{C} \mid |z| = 1\}$, then it also gives rise to an algebra homomorphism $\varphi_\chi$ defined by
$$\varphi_\chi(f) = \int_G f(g)\chi(g)\,dm(g),$$
which is well-defined since $f\in L^1(G)$ and $\chi\in L^\infty(G)$. The first identification claimed is the statement that every algebra homomorphism on $L^1(G)$ has this shape. This also explains the second identification as follows. The Fourier transform of an element $f\in L^1(G)$ is the function $\widehat{f}$ on $\widehat{G}$ defined by
$$\widehat{f}(\chi) = \int_G f(g)\chi(g)\,dm(g)$$
for $\chi\in\widehat{G}$. Since we identify the Pontryagin dual $\widehat{G}$ with the Gelfand dual $(L^1(G))^o$, we see that $\widehat{f}(\chi) = \varphi_\chi(f) = f^o(\varphi_\chi)$ is the Fourier transform and is at the same time also the Gelfand transform.
Proof of Proposition 9.28. The proof that $L^1(G)$ is a commutative Banach algebra, and that every continuous group homomorphism $\chi: G\to\mathbb{S}^1$ gives rise to an algebra homomorphism
$$\varphi_\chi: f\in L^1(G)\longmapsto\int_G f\chi\,dm,$$
is very similar to the proof for the case $G = \mathbb{R}^d$ in Proposition 8.24 and is therefore left to the reader. The main claim of the proposition is the claim that every non-trivial algebra homomorphism $\varphi: L^1(G)\to\mathbb{C}$ has the form $\varphi = \varphi_\chi$ for some continuous group homomorphism $\chi: G\to\mathbb{S}^1$. So let $\varphi\in(L^1(G))^o$. Then by Lemma 9.20 we have $\|\varphi\|\le 1$. By Proposition 6.26 there is an element $\chi\in L^\infty(G)$ with $\|\chi\|_{\mathrm{esssup}}\le 1$, so that
$$\varphi(f) = \int_G f\chi\,dm.$$
We have to show that $\chi$ can be chosen in $C_b(G)$ such that $\chi(g+h) = \chi(g)\chi(h)$ for all $g,h\in G$. By assumption $\varphi\neq 0$, so there exists some $f_0\in L^1(G)$ with $\varphi(f_0)\neq 0$. Also, by assumption on $\varphi$ and Fubini's theorem,
$$\varphi(f)\varphi(f_0) = \varphi(f*f_0) = \int_G\int_G f(h)f_0(g-h)\,dm(h)\,\chi(g)\,dm(g) = \int_G f(h)\int_G f_0(g-h)\chi(g)\,dm(g)\,dm(h)$$
for all $f\in L^1(G)$. We now define
$$\chi_0(h) = (\varphi(f_0))^{-1}\int_G f_0(g-h)\chi(g)\,dm(g),$$
so that $\varphi(f) = \int_G f\chi_0\,dm$ for all $f\in L^1(G)$. Notice that $\chi_0$ is defined using $f_0$ and $\chi$ essentially by convolution. This can be used to show that $\chi_0\in C_b(G)$. Indeed, writing $f_0^h(g) = f_0(g-h)$, we have
$$|\chi_0(h)-\chi_0(h_0)| = \Bigl|(\varphi(f_0))^{-1}\int_G\bigl(f_0(g-h)-f_0(g-h_0)\bigr)\chi(g)\,dm(g)\Bigr| \le |\varphi(f_0)|^{-1}\int_G|f_0(g-h)-f_0(g-h_0)|\,dm(g) = |\varphi(f_0)|^{-1}\|f_0^h-f_0^{h_0}\|_1\longrightarrow 0$$
as $h\to h_0$ by Lemma 3.27.
Now choose a sequence $(B_n)$ of decreasing open neighborhoods of $0\in G$ that form a basis of the neighborhoods at $0$. Then the sequence $(\psi_n)$ defined by
$$\psi_n = \frac{1}{m(B_n)}\mathbb{1}_{B_n}$$
for all $n\ge 1$ forms an approximate identity. Now let $g_1,g_2\in G$ be arbitrary. Then
$$\varphi(\psi_n^{g_1}) = \frac{1}{m(B_n)}\int_G\mathbb{1}_{B_n}(h-g_1)\chi_0(h)\,dm(h) = \frac{1}{m(B_n)}\int_{B_n+g_1}\chi_0(h)\,dm(h)\longrightarrow\chi_0(g_1)$$
and $\varphi(\psi_n^{g_2})\to\chi_0(g_2)$ as $n\to\infty$. Moreover,
$$\varphi(\psi_n^{g_2}*\psi_n^{g_1}) = \varphi(\psi_n^{g_2})\varphi(\psi_n^{g_1})$$
and
$$\varphi(\psi_n^{g_2}*\psi_n^{g_1}) = \frac{1}{m(B_n)^2}\int_G\int_G\mathbb{1}_{B_n}(h_1-g_1)\mathbb{1}_{B_n}(h_2-h_1-g_2)\,dm(h_1)\,\chi_0(h_2)\,dm(h_2) = \frac{1}{m(B_n)^2}\int_{B_n+g_2}\int_{B_n+g_1}\chi_0(h_1+h_2')\,dm(h_1)\,dm(h_2'),$$
where we have used the substitution $h_2' = h_2-h_1$. By continuity of the addition operation, this is an average of the values of $\chi_0$ over smaller and smaller neighborhoods of $g_1+g_2$, so
$$\varphi(\psi_n^{g_2}*\psi_n^{g_1})\longrightarrow\chi_0(g_1+g_2)$$
as $n\to\infty$. Thus
$$\chi_0(g_1+g_2) = \chi_0(g_1)\chi_0(g_2)$$
for $g_1,g_2\in G$. In other words, $\chi_0: G\to\mathbb{C}$ is a homomorphism to the multiplicative structure of $\mathbb{C}$. Since $\chi_0$ is bounded and non-zero, it follows that $\chi_0$ is non-zero everywhere, and that $\chi_0: G\to\mathbb{S}^1$ takes values in $\mathbb{S}^1$. This completes the proof, since $\varphi = \varphi_{\chi_0}$ for $\chi_0\in\widehat{G}$.
The next example shows that the Fourier transform (or in general the Gelfand transform) is not an isometry.
Example 9.29. Let $G = \mathbb{R}$ and $f_1 = \mathbb{1}_{[0,1]}\in L^1(\mathbb{R})$. Then
$$\widehat{f_1}(t) = \begin{cases} 1 & \text{for } t = 0,\\[2pt] \dfrac{e^{-2\pi it}-1}{-2\pi it} = e^{-\pi it}\,\dfrac{e^{-\pi it}-e^{\pi it}}{-2\pi it} = e^{-\pi it}\,\dfrac{\sin\pi t}{\pi t} & \text{for } t\neq 0,\end{cases}$$
so $\|\widehat{f_1}\|_\infty = 1 = \|f_1\|_1$, but the maximum value of $|\widehat{f_1}(t)|$ is attained precisely at the point $t = 0$. Now consider $f_2(x) = \mathbb{1}_{[0,1]}(x)-\mathbb{1}_{[-1,0]}(x) = f_1(x)-f_1(-x)$ with $\widehat{f_2}(t) = \widehat{f_1}(t)-\widehat{f_1}(-t)$. Clearly $|\widehat{f_2}(t)|$ achieves its maximum for some $t_0\neq 0$, so that
$$\|\widehat{f_2}\|_\infty = |\widehat{f_2}(t_0)|\le|\widehat{f_1}(t_0)|+|\widehat{f_1}(-t_0)| < 2\|f_1\|_1 = \|f_2\|_1,$$
showing that the Fourier transform (and hence a general Gelfand transform) need not be an isometry.
9.3.4 The Gelfand Transform for Commutative C*-algebras
The Gelfand transform has good additional properties for commutative $C^*$-algebras.
Corollary 9.30. Let $A$ be a unital commutative $C^*$-algebra over $\mathbb{C}$. Then the Gelfand transform is an isometry from $A$ onto $C(A^o)$.
Proof. For $a\in A$ the norm $\|a^o\|_\infty$ of the Gelfand transform equals the spectral radius of $a$ (see Theorem 9.19). By Proposition 9.17 we get $\|a^o\|_\infty = \|a\|$, since in a commutative $C^*$-algebra every element is normal. This shows that $(\cdot)^o: A\to C(A^o)$ is an isomorphism between $A$ and a complete subalgebra of $C(A^o)$ which also contains the unit. Moreover, the image separates points since $\varphi_1\neq\varphi_2\in A^o$ implies that there exists some $a\in A$ with $a^o(\varphi_1) = \varphi_1(a)\neq\varphi_2(a) = a^o(\varphi_2)$. Since $A^o$ is a compact space we can apply the Stone–Weierstrass theorem (Theorem 2.34) if we also know that the image is closed under conjugation. Assuming this for now, we conclude that the image of $(\cdot)^o$ is both dense in $C(A^o)$ and complete, and therefore must be all of $C(A^o)$.
Since $A$ is closed under ${}^*$, it suffices to prove that $\varphi(a^*) = \overline{\varphi(a)}$. This in turn follows if we know that $a = a^*\in A$ implies that $\varphi(a)\in\mathbb{R}$ for $\varphi\in A^o$. In fact, any $a\in A$ can be written as
$$a = \frac{a+a^*}{2} + i\,\frac{a-a^*}{2i},$$
where both $\frac{a+a^*}{2}$ and $\frac{a-a^*}{2i}$ are self-adjoint. Assuming that $\varphi\bigl(\frac{a+a^*}{2}\bigr)$ and $\varphi\bigl(\frac{a-a^*}{2i}\bigr)$ are real, we deduce that
$$\varphi(a^*) = \varphi\Bigl(\frac{a+a^*}{2} - i\,\frac{a-a^*}{2i}\Bigr) = \varphi\Bigl(\frac{a+a^*}{2}\Bigr) - i\,\varphi\Bigl(\frac{a-a^*}{2i}\Bigr) = \overline{\varphi\Bigl(\frac{a+a^*}{2}\Bigr) + i\,\varphi\Bigl(\frac{a-a^*}{2i}\Bigr)} = \overline{\varphi(a)}.$$
The following lemma finishes the proof of the corollary.
Lemma 9.31. Let $a = a^*\in A$ be a self-adjoint element of a unital $C^*$-algebra. Then $\sigma(a)\subseteq\mathbb{R}$.
As we will see in the course of the proof, this can be deduced from Proposition 9.17. This might be a little confusing initially: how can a property like $\max_{\lambda\in\sigma(a)}|\lambda|\le\|a\|$ imply that $\sigma(a)\subseteq\mathbb{R}$? One way of viewing the situation is to use a vertical translation of the set $\sigma(a)$, as illustrated in Figure 9.1.
[Figure 9.1 shows the set $\sigma(a)$ inside the disc $\overline{B^{\mathbb{C}}_{\|a\|}(0)}$ together with its vertical translate $\sigma(a+i\lambda)$ inside the disc of radius $\|a+i\lambda\|$.]
Fig. 9.1. Many possible $\mu\in\mathbb{C}$ that satisfy the constraint $|\mu|\le\|a\|$ might not satisfy the constraint $|\mu+i\lambda|\le\|a+i\lambda\|$ if the norm of $a+i\lambda$ for $\lambda\in\mathbb{R}$ is $\sqrt{\|a\|^2+|\lambda|^2}$, as Figure 9.1 suggests. Taking $\lambda\to\infty$ and $\lambda\to-\infty$ gives $\sigma(a)\subseteq\mathbb{R}$.
Proof of Lemma 9.31. By Proposition 9.17 we know that the spectral radius of $a$ is $\|a\|$. We will use this for translates of $a$ by elements of $\mathbb{R}i$ to show that $\sigma(a)\subseteq\mathbb{R}$. So let $\lambda\in\sigma(a)$ and $\mu = iy$ for $y\in\mathbb{R}$. Then $\lambda-iy\in\sigma(a-iy\mathbb{1})$, and
$$|\lambda-iy|^2\le\|a-iy\mathbb{1}\|^2 = \|(a-iy\mathbb{1})^*(a-iy\mathbb{1})\| = \|(a+iy\mathbb{1})(a-iy\mathbb{1})\| = \|a^2+y^2\mathbb{1}\|\le\|a^2\|+y^2 = \|a\|^2+y^2,$$
where we have used the fact that $\|\mathbb{1}\| = 1$ (see Exercise 9.16). If $\lambda = x_0+iy_0\in\mathbb{C}$ with $x_0,y_0\in\mathbb{R}$, then the calculation above gives
$$x_0^2+(y-y_0)^2 = x_0^2+y^2-2yy_0+y_0^2\le\|a\|^2+y^2$$
for all $y\in\mathbb{R}$. However, this shows that $y_0 = 0$, and so $\sigma(a)\subseteq\mathbb{R}$ as claimed.
Exercise 9.32. Let $A$ be a commutative $C^*$-algebra. (a) Show that the Gelfand transform is an isometry onto $C_0(A^o)$. (b) Show that $A^o$ is compact if and only if $A$ is unital. (c) Assume now that $A$ is not unital. Show that it is possible to define a norm on $A_{\mathbb{1}} = A\oplus\mathbb{C}$ so that $A_{\mathbb{1}}$ is again a $C^*$-algebra. (The norm from Exercise 9.1 may not do this.)
9.4 Further Topics

The study of $L^1(G)$ for a locally compact abelian group can lead to a vast generalization of the theory of Fourier series and the Fourier transform to all such groups. This is known as Pontryagin duality or harmonic analysis for locally compact abelian groups. We refer to Hewitt and Ross [17] for the details. Another important class of Banach algebras with additional structure are the von Neumann algebras. These are special subalgebras of $B(H)$ for a Hilbert space $H$, and so in particular are also $C^*$-algebras. We refer to Blackadar [3] for an overview.
10 Functional Calculus and Spectral Theory
10.1 Definitions, Basic Lemmas, Main Goals

In this chapter we will study the spectrum (as defined for abstract algebras in Section 9.1) in the context of bounded operators on a Hilbert space. More precisely, we fix a complex Hilbert space $H$, let $A = B(H)$ be the Banach algebra of bounded operators, and study the spectrum of some $T\in A$. In fact, as we have seen in Section 9.2, $B(H)$ is also a $C^*$-algebra and we will use the strengthening of Theorem 9.5 given in Proposition 9.17.
10.1.1 Discrete, Continuous, and Residual Spectrum
Since an operator with non-trivial kernel cannot be invertible, it is clear that any eigenvalue of $T\in B(H)$ belongs to the spectrum of $T$. It is usual to call the set of eigenvalues the discrete or point spectrum.
Definition 10.1 (Discrete spectrum). We say that $\lambda\in\mathbb{C}$ belongs to the discrete spectrum of $T\in B(H)$, and write $\lambda\in\sigma_{\mathrm{discrete}}(T)$, if $\ker(T-\lambda)\neq\{0\}$.
As we have already seen in Exercise 4.20 and Example 8.1, there may not be any discrete spectrum for a given bounded operator. For these examples the notion of eigenvalue has to be replaced by a sequence of approximate eigenvalues in the following sense.
Definition 10.2 (Continuous spectrum). We say that $\lambda\in\mathbb{C}$ belongs to the continuous spectrum of $T\in B(H)$, and write $\lambda\in\sigma_{\mathrm{continuous}}(T)$, if there is a sequence of approximate eigenvectors $(v_n)$ with $v_n\in(\ker(T-\lambda))^\perp$ and $\|v_n\| = 1$ for all $n\ge 1$, and with $(T-\lambda)v_n\to 0$ as $n\to\infty$.
Exercise 10.3. (a) Show that $\sigma_{\mathrm{continuous}}(T)\subseteq\mathbb{C}$ is a closed set for any $T\in B(H)$. (b) Find an example of an operator $T\in B(H)$ for which $\sigma_{\mathrm{discrete}}(T)$ is not a closed subset of $\mathbb{C}$. (c) More specifically, find an example for which $\sigma_{\mathrm{discrete}}(T)$ is countable and dense in $\sigma_{\mathrm{continuous}}(T) = [0,1]$.
Roughly speaking, for multiplication operators the discrete spectrum corresponds to atoms, and the continuous spectrum corresponds to the continuous part of the measure, as shown in the following refinement of Exercise 9.3. See also part (c) of the following exercise to see why the last sentence is wrong.
Exercise 10.4. (a) Let $\mu$ be a compactly supported finite measure on $\mathbb{C}$, and let $(M(f))(z) = zf(z)$ for $f\in L^2_\mu(\mathbb{C})$ be the multiplication operator corresponding to the identity map on $\mathbb{C}$. Show that $\sigma_{\mathrm{discrete}}(M) = \{\lambda\in\mathbb{C} \mid \mu(\{\lambda\}) > 0\}$ and $\sigma_{\mathrm{continuous}}(M) = \{\lambda\in\mathbb{C} \mid \mu(U\setminus\{\lambda\}) > 0 \text{ for every neighborhood } U \text{ of } \lambda\}$. (b) Let $(X,\mathcal{B},\mu)$ be a $\sigma$-finite measure space, and let $H = L^2_\mu(X)$. Let $g: X\to\mathbb{C}$ be a bounded measurable function with corresponding multiplication operator $M_g$. Show that $\sigma_{\mathrm{discrete}}(M_g) = \{\lambda\in\mathbb{C} \mid \mu(g^{-1}\{\lambda\}) > 0\}$ and $\sigma_{\mathrm{continuous}}(M_g) = \{\lambda\in\mathbb{C} \mid \mu(g^{-1}(U\setminus\{\lambda\})) > 0 \text{ for every neighborhood } U \text{ of } \lambda\}$. (c) Let $X = [-1,1]\subseteq\mathbb{R}$, and let $\mu$ be the counting measure on $\mathbb{Q}\cap[-1,1]$ considered as a $\sigma$-finite measure on $X$. Let $M$ be as in part (a). Show that $\sigma_{\mathrm{discrete}}(M) = \mathbb{Q}\cap[-1,1]$ and $\sigma_{\mathrm{continuous}}(M) = [-1,1]$ (even though the measure has no continuous part).
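In general the discrete and continuous spectrum may not describe the whole spectrum, giving rise to the next definition.

Definition 10.5 (Residual spectrum). We say that $\lambda \in \mathbb{C}$ belongs to the residual spectrum of $T \in B(H)$, and write $\lambda \in \sigma_{\mathrm{residual}}(T)$, if $\lambda \notin \sigma_{\mathrm{discrete}}(T)$ and $\overline{\operatorname{Im}(T - \lambda)} \neq H$.

In the literature one also sees a different definition of the continuous spectrum, namely that $\lambda \notin \sigma_{\mathrm{discrete}}(T)$ and there exists a sequence $(v_n)$ in $H$ with $\|v_n\| = 1$ for all $n \geq 1$ and with $(T - \lambda)v_n \to 0$ as $n \to \infty$. We choose Definition 10.2 as it ensures that $\sigma_{\mathrm{continuous}}(T)$ is a closed subset of $\mathbb{C}$ (see Exercise 10.3).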
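Example 10.6. Let $T : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N})$ be the operator defined by $T((x_n)) = (0, x_1, x_2, \ldots)$ from Exercise 4.19(c). Then $\|Tx\| = \|x\|$ for any $x \in \ell^2(\mathbb{N})$, and so $0 \notin \sigma_{\mathrm{discrete}}(T) \cup \sigma_{\mathrm{continuous}}(T)$. However, the image of $T$ is the proper closed subspace $\{x \in \ell^2(\mathbb{N}) \mid x_1 = 0\}$, so $0 \in \sigma_{\mathrm{residual}}(T)$.

Exercise 10.7. For the operator $T$ from Example 10.6, show that $\sigma_{\mathrm{discrete}}(T) = \emptyset$, $\sigma_{\mathrm{continuous}}(T) = \mathbb{S}^1 = \{\lambda \in \mathbb{C} \mid |\lambda| = 1\}$, and
\[ \sigma_{\mathrm{residual}}(T) = B_1^{\mathbb{C}}(0) = \{\lambda \in \mathbb{C} \mid |\lambda| < 1\}. \]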
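The next lemma gives the main relationship between the three parts of the spectrum from this section and the spectrum in the sense of Definition 9.2 for $A = B(H)$.

Lemma 10.8 (Decomposition of spectrum). Let $H$ be a complex Hilbert space, and let $T \in B(H)$. Then
\[ \sigma(T) = \sigma_{\mathrm{discrete}}(T) \cup \sigma_{\mathrm{continuous}}(T) \cup \sigma_{\mathrm{residual}}(T). \]
Moreover, if $T$ is normal, then the residual spectrum is empty.

Proof. If $\lambda \notin \sigma(T)$ then, by definition, $(T - \lambda)^{-1} \in B(H)$ and so $T - \lambda$ is injective, and thus $\lambda \notin \sigma_{\mathrm{discrete}}(T)$. Also, if $v_n \in H$ has $\|v_n\| = 1$ for all $n \geq 1$ then
\[ 1 = \|v_n\| = \|(T - \lambda)^{-1}(T - \lambda)v_n\| \leq \|(T - \lambda)^{-1}\| \, \|(T - \lambda)v_n\| \]
shows that $(v_n)$ cannot be a sequence of approximate eigenvectors, and hence $\lambda \notin \sigma_{\mathrm{continuous}}(T)$. Finally, if $\lambda \notin \sigma(T)$ then $T - \lambda$ is onto and so $\lambda \notin \sigma_{\mathrm{residual}}(T)$.

The reverse inclusion can be shown almost as directly. Suppose that $\lambda \notin \sigma_{\mathrm{discrete}}(T) \cup \sigma_{\mathrm{continuous}}(T) \cup \sigma_{\mathrm{residual}}(T)$. Then $T - \lambda$ is injective, since in particular $\lambda \notin \sigma_{\mathrm{discrete}}(T)$, and there exists some $\varepsilon > 0$ with
\[ \varepsilon \|v\| \leq \|(T - \lambda)v\| \tag{10.1} \]
for all $v \in H$, since $\lambda \notin \sigma_{\mathrm{continuous}}(T)$. Therefore $(T - \lambda) : H \to \operatorname{Im}(T - \lambda)$ is bijective and has an inverse
\[ (T - \lambda)^{-1} : \operatorname{Im}(T - \lambda) \to H \]
that is continuous by (10.1). This implies that $\operatorname{Im}(T - \lambda)$ is complete (check this), and so is a closed subspace of $H$. It follows that $\operatorname{Im}(T - \lambda) = H$ because $\lambda \notin \sigma_{\mathrm{residual}}(T)$, and hence that $\lambda \notin \sigma(T)$.

Suppose now that $T : H \to H$ is a normal operator, and that $\lambda \in \mathbb{C}$ has $V = \overline{\operatorname{Im}(T - \lambda)} \neq H$. By normality
\[ T^*((T - \lambda)v) = (T - \lambda)(T^* v) \]
for $v \in H$, which implies in particular that $V$ is $T^*$-invariant. By Lemma 4.24 we deduce that $V^{\perp}$ is $T$-invariant. Now let $v \in V^{\perp} \setminus \{0\}$. Then $(T - \lambda)v \in V^{\perp}$ by $T$-invariance, and $(T - \lambda)v \in V$ by definition. This forces $(T - \lambda)v = 0$, and so $\lambda \in \sigma_{\mathrm{discrete}}(T)$. It follows that $\sigma_{\mathrm{residual}}(T) = \emptyset$.

10.1.2 Numerical Range

The following definition is useful because it gives an upper bound for the spectrum.

Definition 10.9. The numerical range of $T \in B(H)$ is the set $N(T)$ defined by
\[ N(T) = \{\langle Tv, v \rangle \mid v \in H, \ \|v\| = 1\}. \]

Lemma 10.10. The spectrum of $T \in B(H)$ is contained in the closure of the numerical range of $T$.

Proof. We have to show that $\lambda \in \mathbb{C} \setminus \overline{N(T)}$ implies that $\lambda \notin \sigma(T)$. By assumption,
\[ |\langle (T - \lambda)v, v \rangle| = |\langle Tv, v \rangle - \lambda| \geq \varepsilon \]
for some fixed $\varepsilon > 0$ and all $v \in H$ with $\|v\| = 1$. This shows that $T - \lambda$ is injective, so $\lambda \notin \sigma_{\mathrm{discrete}}(T)$, that $\|(T - \lambda)v\| \geq \varepsilon$ for $\|v\| = 1$ (by Cauchy-Schwarz), so $\lambda \notin \sigma_{\mathrm{continuous}}(T)$, and that any $v \in H$ with $\|v\| = 1$ is not orthogonal to $\operatorname{Im}(T - \lambda)$, so that $\lambda \notin \sigma_{\mathrm{residual}}(T)$. By Lemma 10.8, we deduce that $\lambda \notin \sigma(T)$.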
Exercise 10.11. Show that $\overline{N(T)}$ is really only an upper bound for the spectrum of $T \in B(H)$ by showing that $N(T)$ is the convex hull of the eigenvalues of $T$ if $T$ is a diagonalizable map with orthogonal eigenvectors.
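The following is a useful easy consequence of Lemma 10.10 (giving an easy alternative to the argument used in Lemma 9.31).

Lemma 10.12. If $T \in B(H)$ is self-adjoint then $\sigma(T) \subseteq \mathbb{R}$.

Proof. For any $v \in H$ we have
\[ \langle Tv, v \rangle = \langle v, T^* v \rangle = \overline{\langle T^* v, v \rangle} = \overline{\langle Tv, v \rangle}, \]
so $\langle Tv, v \rangle \in \mathbb{R}$ and therefore $\sigma(T) \subseteq \overline{N(T)} \subseteq \mathbb{R}$ by Lemma 10.10.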
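As a numerical illustration of Lemma 10.10 and Lemma 10.12, here is a minimal sketch under the assumption that $H = \mathbb{C}^3$ and $T$ is a diagonal matrix with real eigenvalues. The eigenvalues and the helper name `numerical_range_sample` are hypothetical choices made for illustration only; they do not come from the notes. The code samples values of $\langle Tv, v \rangle$ over random unit vectors $v$.

```python
import math
import random

def numerical_range_sample(eigenvalues, rng):
    """Return <Tv, v> for T = diag(eigenvalues) and a random unit vector v."""
    # Draw a random complex vector and normalise it to norm 1.
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in eigenvalues]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    v = [z / norm for z in v]
    # For diagonal T we have <Tv, v> = sum_i lambda_i |v_i|^2,
    # which is a convex combination of the eigenvalues.
    return sum(lam * abs(z) ** 2 for lam, z in zip(eigenvalues, v))

rng = random.Random(0)
eigenvalues = [-1.0, 0.5, 2.0]  # illustrative real spectrum of a self-adjoint T
samples = [numerical_range_sample(eigenvalues, rng) for _ in range(1000)]
print(min(samples), max(samples))
```

Each sampled value is a real number lying between the smallest and the largest eigenvalue (up to rounding), which is consistent with $\sigma(T) \subseteq \overline{N(T)} \subseteq \mathbb{R}$ for this self-adjoint example.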
10.1.3 Main Goals: Spectral Theorem and Functional Calculus

In this chapter we will establish two related theorems about normal operators, the first of which gives a complete classification of normal operators in terms of operators as in the next example (which featured in exercises before).

Example 10.13. This example is both very simple and extremely important; indeed, in view of the spectral theorem (see Theorem 10.14 below) proved in this chapter, this example describes all self-adjoint bounded operators on a Hilbert space. Let $H = L^2(X, \mu)$ for a finite (or slightly more generally a $\sigma$-finite) measure space $(X, \mu)$, and let $g \in L^\infty_\mu(X)$ be a real-valued bounded function. The multiplication operator $M_g$ is then self-adjoint on $H$. We claim that the spectrum $\sigma(M_g)$ is the essential range of $g$,
\[ \sigma(M_g) = \{x \in \mathbb{R} \mid \mu\bigl(g^{-1}((x - \varepsilon, x + \varepsilon))\bigr) > 0 \text{ for all } \varepsilon > 0\}. \]
Note first that we have $M_g - \lambda = M_{g - \lambda}$. If $X_\lambda = \{x \mid g(x) = \lambda\}$ has positive measure (which clearly implies that $\lambda$ belongs to the essential range), then $\lambda \in \sigma_{\mathrm{discrete}}(M_g)$ since, for example, $f = \chi_B \in \ker(M_g - \lambda)$ for any measurable $B \subseteq X_\lambda$ of positive finite measure. So suppose now that $g(x) \neq \lambda$ almost everywhere. Then we can solve the equation $(M_g - \lambda)\varphi = \psi$, formally, for any $\psi \in L^2(X, \mu)$, by putting
\[ \varphi = \frac{1}{g - \lambda}\,\psi, \]
and this is in fact the only solution as a set-theoretic function on $X$. It follows that $\lambda \notin \sigma(M_g)$ if and only if the operator $\psi \mapsto (g - \lambda)^{-1}\psi$ is a bounded linear map on $L^2(X, \mu)$. By Corollary 5.23, we know this is equivalent to asking that $(g - \lambda)^{-1}$ be an $L^\infty$ function on $X$. This translates to the condition that there exist some $C > 0$ such that
\[ \mu(\{x \in X \mid |(g(x) - \lambda)^{-1}| > C\}) = 0, \]
or equivalently that
\[ \mu\bigl(\{x \in X \mid |g(x) - \lambda| < \tfrac{1}{C}\}\bigr) = 0, \]
which means that $\lambda$ is not in the essential range of $g$. This chain of implications is easily reversed so that the claim follows. It is convenient to observe that the essential range can also be identified with the support of the image measure $\nu = g_*(\mu)$ on $\mathbb{R}$,
\[ \sigma(M_g) = \operatorname{Supp} g_*(\mu). \]
In particular, if $X$ is a bounded subset of $\mathbb{R}$ and $g(x) = x$, then the spectrum of $M_x$ is the support of $\mu$.
Theorem 10.14 (Spectral theorem for normal operators). Let $H$ be a separable complex Hilbert space, and let $T \in B(H)$ be a normal operator on $H$. Then there exists a finite measure space $(X, \mu)$, a bounded measurable $g \in L^\infty_\mu(X)$, and a unitary isomorphism $\phi : H \to L^2_\mu(X)$ such that the diagram
\[ \begin{array}{ccc} H & \xrightarrow{\ T\ } & H \\ \downarrow{\scriptstyle\phi} & & \downarrow{\scriptstyle\phi} \\ L^2_\mu(X) & \xrightarrow{\ M_g\ } & L^2_\mu(X) \end{array} \]
commutes.

As we will see we can take $X = \sigma(T) \times \mathbb{N}$, which we will identify with the countable disjoint union
\[ X = \bigsqcup_{n \in \mathbb{N}} \sigma(T). \]
Moreover, the measure $\mu$ on $X$ will be obtained from countably many spectral measures which we will define using the continuous functional calculus. Finally, $g$ will be the bounded map $g(z, n) = z$ on $X$.

The second goal is to establish the measurable functional calculus, which allows us to obtain normal operators $f(T)$ from any normal $T \in B(H)$ and any bounded measurable $f \in L^\infty(\sigma(T))$. For a given normal $T \in B(H)$ this assignment
\[ L^\infty(\sigma(T)) \ni f \longmapsto f(T) \in B(H) \tag{10.2} \]
has many natural functorial properties:

(FC1) If $f(z) = \sum_{j=0}^{n} a_j z^j$ for all $z \in \sigma(T)$, then $f(T) = \sum_{j=0}^{n} a_j T^j$.

(FC2) The map in (10.2) is an algebra homomorphism. In particular, $f_1(T)$ commutes with $f_2(T)$ for $f_1, f_2 \in L^\infty(\sigma(T))$. Moreover $(f(T))^* = \overline{f}(T)$ for $f \in L^\infty(\sigma(T))$.

(FC3) The map in (10.2) is continuous, with
\[ \|f(T)\|_{\mathrm{operator}} \leq \|f\|_\infty \]
for $f \in L^\infty(\sigma(T))$, and is an isometry on $C(\sigma(T))$, i.e.
\[ \|f(T)\|_{\mathrm{operator}} = \|f\|_\infty \tag{10.3} \]
for $f \in C(\sigma(T))$.

Example 10.15. (a) Suppose $T \in B(H)$ is a normal operator on a complex Hilbert space $H$, and suppose that $f(z) = \sum_{n \geq 0} a_n z^n$ is a power series with radius of convergence $R$ such that $\sigma(T) \subseteq B_R^{\mathbb{C}}(0)$. Then we may restrict the function $f$ to $\sigma(T)$ so that the power series converges uniformly and absolutely. By combining (FC1) with (FC3), it follows that $f(T) = \sum_{n \geq 0} a_n T^n$ is also defined by an absolutely converging power series.
(b) Let $(X, \mu)$ be a finite (or $\sigma$-finite) measure space, $g \in L^\infty_\mu(X)$, and $M_g : L^2_\mu(X) \to L^2_\mu(X)$ be the multiplication operator as in Example 10.13. For a polynomial $f(z) \in \mathbb{C}[z]$ it follows from (FC1) that $f(M_g) = M_{f \circ g}$. It is reasonable to expect that this holds more generally for $f \in C(\sigma(M_g))$ or even $f \in L^\infty(\sigma(M_g))$. Here, the composition $f \circ g$ is well-defined in $L^\infty_\mu(X)$, although the image of $g$ might not lie entirely in $\sigma(M_g)$ (and so in the domain of the continuous or measurable function), because the description of the spectrum $\sigma(M_g)$ in Example 10.13 shows that
\[ \mu(\{x \mid g(x) \notin \sigma(M_g)\}) = \nu(\mathbb{R} \setminus \sigma(M_g)) = 0 \]
(the complement of the support being the largest open set with measure $0$), so that, for almost every $x \in X$, $g(x)$ lies in $\sigma(M_g)$ and therefore $f(g(x))$ is defined for almost every $x$ (we can define the function on the zero-measure subset where $g(x) \notin \sigma(M_g)$ using some arbitrary measurable function, and this does not change the resulting multiplication operator denoted $M_{f \circ g}$).

The next property summarizes this discussion.

(FC4) If $H$ is unitarily isomorphic to $L^2_\mu(X)$, and $T \in B(H)$ corresponds (via $\phi$ and the commutative diagram above) to $M_g$ on $L^2_\mu(X)$ as in Theorem 10.14, then $f \circ g$ is defined a.e. and $f(T)$ corresponds to $M_{f \circ g}$.

(FC5) If $f \in L^\infty(\sigma(T))$ and $g \in L^\infty(f(\sigma(T)))$, then $g(f(T)) = (g \circ f)(T)$.

(FC6) If $V \subseteq H$ is a closed $T$-invariant subspace, then $V$ is also $f(T)$-invariant for all $f \in L^\infty(\sigma(T))$. Moreover, if $S \in B(H)$ commutes with the normal operator $T \in B(H)$ and its adjoint, then $S$ commutes also with $f(T)$ for all $f \in L^\infty(\sigma(T))$.

It is tempting to interpret (FC4) as saying that the existence of the map in (10.2) is simply a consequence of Theorem 10.14. However, if we really use (FC4) as the definition of the functional calculus then we would not know whether it is canonical, that is, independent of the isomorphism $\phi$ in Theorem 10.14. The fact that we will give a definition of the functional calculus in (10.2) independent of the isomorphism $\phi$, but nonetheless obtain (FC4) as one of the properties of the functional calculus, demonstrates that there is only one reasonable way to define $f(T)$ for $f \in L^\infty(\sigma(T))$.

Theorem 10.16 (Measurable functional calculus for normal operators). Let $H$ be a complex Hilbert space, and $T \in B(H)$ a normal operator. Then there exists a functional calculus for $T$, that is, a map as in (10.2) with the properties (FC1)-(FC6).

For simplicity we will start with the case of a self-adjoint operator, which only needs the material from Section 9.1 and Section 9.2. In compensation, we will treat in Section ?? the case of several commuting normal operators.
10.2 Continuous Functional Calculus for Self-Adjoint Operators

The goal of this section is to show how to define an operator $f(T)$ where $T \in B(H)$ is a self-adjoint operator and $f \in C(\sigma(T))$. Theorem 10.14 will be deduced in the next section for these operators. For certain functions $f$ the definition of $f(T)$ is clear. For example, if
\[ p(z) = \sum_{j=0}^{d} \alpha_j z^j \]
is a polynomial in $\mathbb{C}[X]$ restricted to $\sigma(T)$, then the only reasonable definition for $p(T)$ is
\[ p(T) = \sum_{j=0}^{d} \alpha_j T^j \in B(H). \]
In fact, this polynomial definition makes sense for any $T \in B(H)$, not only for $T$ normal, but there is a technical point which explains why only normal operators are really suitable here. If $\sigma(T)$ is finite, then a polynomial of unknown degree is not uniquely determined by its restriction to $\sigma(T)$. Thus in this case the definition above gives a map $\mathbb{C}[z] \to B(H)$, not one defined on $C(\sigma(T))$. We cannot hope to have a functional calculus only depending on the spectrum if this dependency is real, and simple examples show that it sometimes is. Consider the operator $A \in B(\mathbb{C}^2)$ given by the matrix
\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]
Clearly $\sigma(A) = \{0\}$, so the polynomials $p_1(z) = z$ and $p_2(z) = z^2$ coincide when restricted to $\sigma(A)$, but $p_1(A) = A \neq p_2(A) = 0$. However, if we assume that $T$ is normal, this problem does not arise, because (as we will show) for every $p \in \mathbb{C}[z]$ we have
\[ \|p(T)\| = \|p\|_{\infty, \sigma(T)} \]
as claimed in (10.3). This suggests a general definition, using the approximation principle from Proposition 2.46 and the Stone-Weierstrass theorem (Theorem 2.34) applied to $X = \sigma(T) \subseteq \mathbb{R}$ for a self-adjoint operator $T$. We know in particular that any function $f \in C(\sigma(T))$ can be approximated uniformly by polynomials. This suggests that we should attempt to define
\[ f(T) = \lim_{n \to \infty} p_n(T), \tag{10.4} \]
where $(p_n)$ is a sequence of polynomials with $\|f - p_n\|_{\infty, X} \to 0$ as $n \to \infty$.
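The failure of spectrum-only dependence for the non-normal matrix $A$ above can be checked directly. The following is a small sketch in plain Python; the helper names `mat_mul` and `poly_of_matrix` are hypothetical and introduced only for this illustration. It evaluates $p_1(z) = z$ and $p_2(z) = z^2$ at $A$.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, a):
    """Evaluate sum_j coeffs[j] * a^j, where a^0 is the identity matrix."""
    n = len(a)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # a^0
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        # Add c * a^j to the running sum, then move on to a^(j+1).
        result = [[result[i][j] + c * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, a)
    return result

A = [[0, 1], [0, 0]]                     # nilpotent, so its spectrum is {0}
p1_of_A = poly_of_matrix([0, 1], A)      # p1(z) = z
p2_of_A = poly_of_matrix([0, 0, 1], A)   # p2(z) = z^2
print(p1_of_A, p2_of_A)
```

The two polynomials agree on $\sigma(A) = \{0\}$, yet the resulting matrices differ, so no assignment $f \mapsto f(A)$ depending only on $f|_{\sigma(A)}$ can extend the polynomial definition for this $A$.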
This definition is indeed sensible and possible, and the basic properties of this construction are given in the following theorem. Roughly speaking, this means that any operation on (or property of) the function $f$ which is reasonable corresponds to an analogous operation on (or property of) $f(T)$.

Theorem 10.17 (Continuous functional calculus). Let $H$ be a Hilbert space and $T \in B(H)$ a self-adjoint bounded operator. Then there exists a unique linear map $\Phi = \Phi_T : C(\sigma(T)) \to B(H)$, denoted $f \mapsto f(T)$, with the following properties:

(1) For any polynomial $p(z) = \sum_{j=0}^{d} \alpha_j z^j \in \mathbb{C}[z]$ we have
\[ \Phi(p) = p(T) = \sum_{j=0}^{d} \alpha_j T^j \]
(that is, $\Phi$ extends the definition above).

(2) The map $\Phi$ is an isometric Banach algebra homomorphism, meaning that $\Phi(f_1 f_2) = \Phi(f_1)\Phi(f_2)$ for $f_1, f_2 \in C(\sigma(T))$, $\Phi(1) = I$ (where $1$ denotes the constant function), and
\[ \|\Phi(f_1)\| = \|f_1\|_{\infty, \sigma(T)}. \tag{10.5} \]

(3) For any $f \in C(\sigma(T))$, we have $\Phi(f)^* = \Phi(\overline{f})$ (that is, $f(T)^* = \overline{f}(T)$), and in particular $f(T)$ is normal.

(4) If $\lambda \in \sigma(T)$ is in the point spectrum and $v \in \ker(T - \lambda)$, then $v \in \ker(f(T) - f(\lambda))$.

Notice that property (2) and $\Phi(z \mapsto z) = T$ imply property (1) since $\Phi$ is a $\mathbb{C}$-linear map. As already observed, the essence of the proof of existence of $\Phi$ is to show that (10.4) is a valid definition.

Lemma 10.18. Let $H$ be a Hilbert space.

(1) For $T \in B(H)$ and a polynomial $p \in \mathbb{C}[z]$, define $p(T) \in B(H)$ as before. Then
\[ \sigma(p(T)) = p(\sigma(T)). \tag{10.6} \]

(2) Let $T \in B(H)$ be normal and let $p \in \mathbb{C}[z]$ be a polynomial. Then
\[ \|p(T)\| = \|p\|_{\infty, \sigma(T)}. \tag{10.7} \]
Proof. For (1), fix $\lambda \in \mathbb{C}$ and factor the polynomial $p(X) - \lambda$ in $\mathbb{C}[X]$ to give
\[ p(z) - \lambda = \alpha \prod_{1 \leq i \leq d} (z - \lambda_i), \]
for some $\alpha \in \mathbb{C} \setminus \{0\}$ and complex numbers $\lambda_1, \ldots, \lambda_d \in \mathbb{C}$ (not necessarily distinct). Since $p \mapsto p(T)$ is an algebra homomorphism, it follows that
\[ p(T) - \lambda = \alpha \prod_{1 \leq i \leq d} (T - \lambda_i). \]
If $\lambda \notin p(\sigma(T))$, then the solutions $\lambda_i$ to the equation $p(z) = \lambda$ are not in $\sigma(T)$, so each term $T - \lambda_i$ is invertible, and hence $p(T) - \lambda$ is invertible. It follows that $\sigma(p(T)) \subseteq p(\sigma(T))$. Conversely, if $\lambda \in p(\sigma(T))$, then one of the $\lambda_i$ must lie in $\sigma(T)$. Because the factors commute, we can assume without loss of generality that either $i = 1$ if $T - \lambda_i$ is not surjective, in which case $p(T) - \lambda$ is not surjective either, or $i = d$ if $T - \lambda_i$ is not injective, in which case neither is $p(T) - \lambda$. In all situations, $\lambda \in \sigma(p(T))$, proving the reverse inclusion.

For (2), we note first that $\Phi(p) = p(T)$ is normal if $T$ is. By the improved spectral radius formula (Proposition 9.17), we have
\[ \|p(T)\| = \max_{\lambda \in \sigma(p(T))} |\lambda|, \]
and by (10.6), we get
\[ \|p(T)\| = \max_{\lambda \in p(\sigma(T))} |\lambda| = \max_{\lambda \in \sigma(T)} |p(\lambda)|, \]
as desired.

Notice that this argument uses Proposition 5.18, but we also note that this can be avoided.

Proof of Theorem 10.17. Let $T \in B(H)$ be self-adjoint. By Lemma 10.18, we deduce that the map
\[ \Phi : (\mathbb{C}[z], \|\cdot\|_{\infty, \sigma(T)}) \to B(H) \]
sending $p$ to $p(T)$ is linear and continuous (indeed, is an isometry). Hence it extends uniquely (using Theorem 2.34 and Proposition 2.46) to a map defined on $C(\sigma(T))$, and the extension remains isometric. By continuity, the properties $\Phi(f_1 f_2) = \Phi(f_1)\Phi(f_2)$ and $\Phi(f)^* = \Phi(\overline{f})$, which are valid for polynomials (using $T = T^*$ for the latter), pass to the limit and are true for all $f \in C(\sigma(T))$. It follows that
\[ f(T)^* f(T) = \Phi(\overline{f})\Phi(f) = \Phi(\overline{f} f) = \Phi(f \overline{f}) = \Phi(f)\Phi(\overline{f}) = f(T) f(T)^*, \]
so $f(T)$ is normal (and is self-adjoint if $f$ is real-valued). It remains to check that the additional property (4) of the continuous functional calculus $f \mapsto f(T)$ holds. So let $v \in \ker(T - \lambda)$. Write $f$ as a uniform limit of a sequence of polynomials $(p_n)$. Since $T(v) = \lambda v$, we have by induction and linearity
\[ p_n(T)v = p_n(\lambda)v \]
for all $n \geq 1$, and by continuity we deduce that $f(T)(v) = f(\lambda)v$.
The following definition will help us to introduce spectral measures in the next section.

Definition 10.19. Let $T \in B(H)$ be a bounded operator on a Hilbert space. We say that $T$ is a positive operator, written $T \geq 0$, if it is self-adjoint and has $\langle Tv, v \rangle \geq 0$ for all $v \in H$.

Exercise 10.20. Let $H$ be a complex Hilbert space. Show that any $T \in B(H)$ with $\langle Tv, v \rangle \in \mathbb{R}$ for all $v \in H$ is self-adjoint.

Corollary 10.21. Let $H$ be a complex Hilbert space, and let $T \in B(H)$ be self-adjoint. If $f \in C(\sigma(T))$ is non-negative, then $f(T)$ is a positive operator.

Proof. If $f \in C(\sigma(T))$ satisfies $f \geq 0$, then we can write
\[ f = (\sqrt{f})^2 = g^2 \]
where $g = \sqrt{f} \geq 0$ is also continuous on $\sigma(T)$. Then $g(T)$ is well-defined, self-adjoint (because $g$ is real-valued), and
\[ \langle f(T)v, v \rangle = \langle g(T)^2 v, v \rangle = \langle g(T)v, g(T)v \rangle \geq 0 \]
for all $v \in H$, which shows that $f(T) \geq 0$.
10.2.1 Corollaries to the Continuous Functional Calculus
The results of this subsection help to explain the functional calculus and how it can be used further, but will not be needed later.

The following generalizes Lemma 10.18(1) to any continuous function.

Corollary 10.22. Let $H$ be a complex Hilbert space, and let $T \in B(H)$ be a self-adjoint operator. Then for any $f \in C(\sigma(T))$ we have
\[ \sigma(f(T)) = f(\sigma(T)). \]
Proof. We first recall that for a compact topological space $X$, $C(X)$ is itself a Banach algebra with respect to the supremum norm, for which the spectrum is given by $\sigma(f) = f(X)$ for $f \in C(X)$. So for $f \in C(\sigma(T))$, we have $\sigma(f) = f(\sigma(T))$.

We prove first that $\lambda \in \sigma(f(T))$ implies that $\lambda \in f(\sigma(T))$. Indeed, assume that $\lambda \notin f(\sigma(T))$; then the function
\[ g(x) = \frac{1}{f(x) - \lambda} \]
is a continuous function on $\sigma(T)$, and so the bounded operator $S = g(T)$ exists. The relations $g \cdot (f - \lambda) = (f - \lambda) \cdot g = 1$, valid in $C(\sigma(T))$, imply by Theorem 10.17(2) that
\[ g(T)(f(T) - \lambda) = (f(T) - \lambda)g(T) = I, \]
that is, $S = (f(T) - \lambda)^{-1}$, so that $\lambda$ is not in the spectrum of $f(T)$, as expected.

Conversely, let $\lambda = f(\lambda_1)$ with $\lambda_1 \in \sigma(T)$. We need to check that $\lambda \in \sigma(f(T))$. We argue according to the type of $\lambda_1$. If $\lambda_1 \in \sigma_{\mathrm{discrete}}(T)$, then Theorem 10.17(4) shows that $f(\lambda_1) \in \sigma_{\mathrm{discrete}}(f(T))$, as desired. Since $\sigma_{\mathrm{residual}}(T) = \emptyset$ for $T$ self-adjoint, we are left with the case $\lambda_1 \in \sigma_{\mathrm{continuous}}(T)$. We use the existence of approximate eigenvectors as in Definition 10.2, applied first to $T - \lambda_1$, and then transfer it to $f(T) - \lambda$. Let $v$ be a vector with $\|v\| = 1$ and let $p \in \mathbb{C}[X]$. We write
\[ \|(f(T) - \lambda)v\| \leq \|(f(T) - p(T))v\| + \|(p(T) - p(\lambda_1))v\| + \|(p(\lambda_1) - \lambda)v\| \]
\[ \leq \|f(T) - p(T)\| + \|(p(T) - p(\lambda_1))v\| + |p(\lambda_1) - \lambda|. \]
Now write $p(z) - p(\lambda_1) = q(z)(z - \lambda_1)$, and use Theorem 10.17(2) so that
\[ \|(f(T) - \lambda)v\| \leq \|f - p\|_{C(\sigma(T))} + \|q\|_{C(\sigma(T))} \, \|(T - \lambda_1)v\| + |p(\lambda_1) - \lambda|, \]
for all $v$ in the unit sphere of $H$. For any $\varepsilon > 0$ we can find a polynomial $p$ such that $\|f - p\|_{\infty, \sigma(T)} < \frac{\varepsilon}{3}$ and, since $\lambda = f(\lambda_1)$, also $|p(\lambda_1) - \lambda| < \frac{\varepsilon}{3}$. Then for all $v$ with $\|v\| = 1$, we have
\[ \|(f(T) - \lambda)v\| \leq \frac{2\varepsilon}{3} + \|q\|_{C(\sigma(T))} \, \|(T - \lambda_1)v\|, \]
where $q$ is now fixed by the choice of $p$. Since $\lambda_1 \in \sigma_{\mathrm{continuous}}(T)$, we can find (see Definition 10.2) a vector $v$ with $\|v\| = 1$ and with
\[ \|(T - \lambda_1)v\| < \frac{\varepsilon}{3\|q\|_{C(\sigma(T))}} \]
and then deduce that
\[ \|(f(T) - \lambda)v\| < \varepsilon. \]
As $\varepsilon > 0$ was arbitrary, this implies that $\lambda \in \sigma_{\mathrm{discrete}}(f(T)) \cup \sigma_{\mathrm{continuous}}(f(T))$. At this point, we have shown that $f(\sigma(T)) \subseteq \sigma(f(T))$.

Corollary 10.23. Let $T \in B(H)$ be a positive operator. For any $n \geq 1$, there exists a positive normal operator, denoted $T^{1/n}$, with the property that $(T^{1/n})^n = T$.

We note that such an operator is unique, but we will prove this only a bit later. At this stage, we do not know enough about how the spectrum helps describe how the operator acts.

Proof. Since $T \geq 0$, we have $\sigma(T) \subseteq [0, \infty)$, so the function $f : x \mapsto x^{1/n}$ is defined and continuous on $\sigma(T)$. Since $f(x)^n = x$ for all $x \in [0, \infty)$, the functional calculus implies that $f(T)^n = T$. Moreover $f \geq 0$, and hence $f(T) \geq 0$ by Corollary 10.21.

The next corollary, which will be generalized later, starts to show how the functional calculus can be used to provide detailed information about the spectrum.

Corollary 10.24. Let $H$ be a Hilbert space and let $T \in B(H)$ be a bounded self-adjoint operator. Let $\lambda \in \sigma(T)$ be an isolated point, so that there is some $\varepsilon > 0$ for which $\sigma(T) \cap (\lambda - \varepsilon, \lambda + \varepsilon) = \{\lambda\}$. Then $\lambda \in \sigma_{\mathrm{discrete}}(T)$.

Proof. The fact that $\lambda$ is isolated implies that the function $f = \chi_{\{\lambda\}} : \sigma(T) \to \mathbb{C}$ which maps $\lambda$ to $1$ and $\sigma(T) \setminus \{\lambda\}$ to $0$ is a continuous function on $\sigma(T)$. Hence we can define an operator $P = f(T) \in B(H)$. We claim that $P$ is non-zero, and is a projection to $\ker(T - \lambda)$. This will show that $\lambda$ is in the discrete spectrum. Firstly, $P \neq 0$ because $\|P\| = \|f\|_{\infty, \sigma(T)} = 1$, by the functional calculus. Clearly $f = f^2$ in $C(\sigma(T))$, so $P = f(T) = f(T)^2 = P^2$, which shows that $P$ is a projection. We have an identity of continuous functions
\[ (z - \lambda) f(z) = 0 \]
for all $z \in \sigma(T)$, so by the functional calculus we get
\[ (T - \lambda)P = 0, \]
which shows that $0 \neq \operatorname{Im}(P) \subseteq \ker(T - \lambda)$.
Example 10.25. Let $H$ be a separable Hilbert space, and let $T \in K(H)$ be a compact self-adjoint operator. Writing
\[ T(v) = \sum_{n \geq 1} \lambda_n \langle v, e_n \rangle e_n, \]
where $(\lambda_n)$ is the sequence of non-zero (real) eigenvalues of $T$ with $(e_n)$ the sequence of corresponding eigenvectors, the operator $f(T)$ is given for $f \in C(\sigma(T))$, where $\sigma(T) = \{0\} \cup \{\lambda_1, \lambda_2, \ldots\}$, by
\[ f(T)v = f(0)P_0(v) + \sum_{n \geq 1} f(\lambda_n) \langle v, e_n \rangle e_n, \]
where $P_0 \in B(H)$ is the orthogonal projection onto $\ker(T)$.
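The finite-dimensional analogue of this formula can be tried out numerically. The sketch below assumes $H = \mathbb{R}^2$ and a real symmetric $2 \times 2$ matrix with non-zero off-diagonal entry; the helper names `eigen_decomposition`, `apply_function` and `mat_mul`, and the particular matrix, are hypothetical choices for illustration. It forms $f(T) = \sum_n f(\lambda_n) e_n e_n^{t}$ and uses $f(x) = \sqrt{x}$ to produce a square root in the sense of Corollary 10.23.

```python
import math

def eigen_decomposition(t):
    """Eigenvalues and orthonormal eigenvectors of a real symmetric 2x2 matrix.

    Assumes the off-diagonal entry b is non-zero.
    """
    (a, b), (_, c) = t
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    pairs = []
    for lam in (mean - radius, mean + radius):
        # (b, lam - a) spans ker(t - lam) when b != 0.
        x, y = b, lam - a
        norm = math.hypot(x, y)
        pairs.append((lam, (x / norm, y / norm)))
    return pairs

def apply_function(f, t):
    """Finite-dimensional functional calculus: sum over n of f(lam_n) e_n e_n^t."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    for lam, e in eigen_decomposition(t):
        for i in range(2):
            for j in range(2):
                result[i][j] += f(lam) * e[i] * e[j]
    return result

def mat_mul(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

T = [[2.0, 1.0], [1.0, 2.0]]         # positive, with eigenvalues 1 and 3
root = apply_function(math.sqrt, T)  # plays the role of T^(1/2)
square = mat_mul(root, root)
print(root, square)
```

Since this $T$ is invertible, the term $f(0)P_0$ does not appear; up to rounding, `square` reproduces `T`, as the identity $f(x)^2 = x$ for $f(x) = \sqrt{x}$ predicts.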
10.3 Spectral measures

Using the functional calculus, we can clarify now how the spectrum represents an operator $T$ and its action on vectors $v \in H$.

Proposition 10.26. Let $H$ be a Hilbert space, let $T \in B(H)$ be a self-adjoint operator and let $v \in H$ be a fixed vector. There exists a unique measure $\mu$ on $\sigma(T)$, depending on $T$ and on $v$, such that
\[ \int_{\sigma(T)} f(x) \, d\mu(x) = \langle f(T)v, v \rangle \]
for all $f \in C(\sigma(T))$. In particular, we have
\[ \mu(\sigma(T)) = \|v\|^2, \tag{10.8} \]
so $\mu$ is a finite measure. This measure is called the spectral measure associated to $v$ and $T$.

Proof. This is a direct application of Theorem 6.30. Indeed, the linear functional
\[ \ell : C(\sigma(T)) \to \mathbb{C}, \qquad f \mapsto \langle f(T)v, v \rangle \]
is well-defined and positive, since if $f \geq 0$, we have $f(T) \geq 0$ by Corollary 10.21, hence $\langle f(T)v, v \rangle \geq 0$ by definition. Hence there exists a unique positive locally finite measure $\mu$ on $\sigma(T)$ such that
\[ \ell(f) = \langle f(T)v, v \rangle = \int_{\sigma(T)} f(x) \, d\mu(x) \]
for all $f \in C(\sigma(T))$. Moreover, taking $f(x) = 1$ for all $x$, we obtain (10.8) (which also means that $\|\ell\| = \|v\|^2$).
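In finite dimensions the spectral measure of Proposition 10.26 is a finite sum of point masses and can be computed by hand. The following sketch assumes $T$ is a real diagonal matrix acting on $\mathbb{R}^4$; the names `spectral_measure`, `integrate` and `quadratic_form`, and the sample data, are hypothetical and chosen only for illustration. It checks the defining identity $\int f \, d\mu = \langle f(T)v, v \rangle$ together with (10.8).

```python
from collections import defaultdict

def spectral_measure(eigenvalues, v):
    """Point masses of the spectral measure of v for T = diag(eigenvalues).

    Maps each eigenvalue lam to the squared norm of the projection of v
    onto ker(T - lam); coordinates with equal eigenvalues are merged.
    """
    weights = defaultdict(float)
    for lam, coord in zip(eigenvalues, v):
        weights[lam] += abs(coord) ** 2
    return dict(weights)

def integrate(f, measure):
    """Integral of f against a finitely supported measure."""
    return sum(f(lam) * w for lam, w in measure.items())

def quadratic_form(f, eigenvalues, v):
    """<f(T)v, v> for T = diag(eigenvalues), using f(T) = diag(f(eigenvalues))."""
    return sum(f(lam) * abs(coord) ** 2 for lam, coord in zip(eigenvalues, v))

eigenvalues = [0.0, 1.0, 1.0, 3.0]   # the eigenvalue 1 has multiplicity two
v = [1.0, 2.0, -1.0, 0.5]
mu = spectral_measure(eigenvalues, v)

def f(x):
    """A sample continuous function on the spectrum."""
    return x * x - 2 * x + 5

print(mu, integrate(f, mu), quadratic_form(f, eigenvalues, v))
```

The total mass `sum(mu.values())` equals $\|v\|^2$, and the mass at the repeated eigenvalue collects the contributions of both coordinates in that eigenspace.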
Example 10.27. Let $H$ be a separable, infinite-dimensional Hilbert space, and let $T \in K(H)$ be a compact self-adjoint operator expressed as in Theorem 4.22 in an orthonormal basis $(e_n)$ of $\ker(T)^{\perp}$ consisting of eigenvectors for the non-zero eigenvalues $\lambda_n \neq 0$ of $T$. We have then, by Example 10.25, the formula
\[ f(T)v = f(0)P_0(v) + \sum_{n \geq 1} f(\lambda_n) \langle v, e_n \rangle e_n \]
for all $v \in H$, where $P_0$ is the orthogonal projection on $\ker(T)$. Thus, by definition, we have
\[ \int_{\sigma(T)} f(x) \, d\mu(x) = f(0) \|P_0(v)\|^2 + \sum_{n \geq 1} f(\lambda_n) |\langle v, e_n \rangle|^2 \]
for all continuous functions $f$ on $\sigma(T)$. Notice that
\[ \sigma(T) = \{0\} \cup \{\lambda_n \mid n \geq 1\}, \]
and that, since $\lambda_n \to 0$ as $n \to \infty$, $f$ is thus entirely described by the sequence $(f(\lambda_n))$, with
\[ f(0) = \lim_{n \to \infty} f(\lambda_n). \]
Hence the formula above means that, as a measure on $\sigma(T)$, $\mu$ is a series of Dirac measures at the eigenvalues (including $0$) with $\mu(\{0\}) = \|P_0(v)\|^2$, and
\[ \mu(\{\lambda_n\}) = \sum_{\lambda_m = \lambda_n} |\langle v, e_m \rangle|^2 \]
(the sum is needed in case there is an eigenvalue with multiplicity). Equivalently, to be concise in all cases: for all $\lambda \in \sigma(T)$, $\mu(\{\lambda\})$ is given by $\|v_\lambda\|^2$, where $v_\lambda$ is the orthogonal projection of $v$ onto the eigenspace $\ker(T - \lambda)$.

This example indicates how, roughly speaking, one can think of $\mu$ in general. The spectral measure indicates how the vector $v$ is spread out among the spectrum; in general, any individual point $\lambda \in \sigma(T)$ carries a vanishing proportion of the vector, because $\mu(\{\lambda\})$ is often zero. However, $\mu(U) > 0$ for a subset $U \subseteq \sigma(T)$ indicates that a positive density of the vector is in generalized eigenspaces corresponding to that part of the spectrum.

Example 10.28. Let $(X, \mu)$ be a finite measure space and let $T = M_g$ be the multiplication operator by a real-valued bounded function on $L^2(X, \mu)$. Consider $\varphi \in L^2_\mu(X)$. What is the associated spectral measure? According to Example 10.13, the functional calculus is defined by $f(T) = M_{f \circ g}$ for $f \in C(\sigma(T))$ (where $\sigma(T)$ is the support of the measure $\nu = g_*(\mu)$ on $\mathbb{R}$). We therefore have
\[ \langle f(T)\varphi, \varphi \rangle = \int_X f(g(x)) |\varphi(x)|^2 \, d\mu(x) = \int_{\mathbb{R}} f(y) \, d\tilde{\nu}(y), \]
where
\[ \tilde{\nu} = g_*(|\varphi|^2 \, d\mu), \]
by the standard change of variable formula for image measures. Since the support of $\tilde{\nu}$ is contained in the support of $\nu$, this can be written as
\[ \langle f(T)\varphi, \varphi \rangle = \int_{\sigma(T)} f(y) \, d\tilde{\nu}(y), \]
which means that the spectral measure associated with $\varphi$ is the measure $\tilde{\nu}$, restricted to $\sigma(T)$. Notice the following interesting special cases. If $\varphi = 1$, then the spectral measure is simply $\nu$. If, in addition, $X \subseteq \mathbb{R}$ is a bounded subset of the real numbers and $g(x) = x$, then the spectral measure is simply $\mu$ itself.

10.3.1 The Spectral Theorem for Self-Adjoint Operators

Using spectral measures, we can now understand how the spectrum and the functional calculus interact to give a complete description of a self-adjoint operator in $B(H)$. To see how this works, consider first $v \in H$ and the associated spectral measure $\mu_v$, so that
\[ \langle f(T)v, v \rangle = \int_{\sigma(T)} f(x) \, d\mu_v(x) \]
for all continuous functions $f$ defined on the spectrum of $T$. In particular, if we apply this to $|f|^2 = f \overline{f}$ and use the properties of the functional calculus, we get
\[ \|f(T)v\|^2 = \langle f(T)v, f(T)v \rangle = \langle (\overline{f} f)(T)v, v \rangle = \int_{\sigma(T)} |f(x)|^2 \, d\mu_v(x) = \|f\|^2_{L^2(\sigma(T), \mu_v)}. \]

L2 )
sending f to f (T )v is an isometry. The fact that v is a positive locally nite measure implies that continuous functions are dense in the Hilbert space L2 v ( (T )), and so there is a continuous (isometric) extension : L2 v ( (T )) H. In general, there is no reason that should be surjective (think, for example, of the case v = 0). However, if we let Hv = Im() H , then the subspace Hv is closed, and it is stable under T . Indeed, the closedness comes from the fact that is an isometry, and to show that T (Hv ) Hv , it is enough to show that T ((f )) Hv
325
for f C ( (T )), or even for f a polynomial function, since the image of the those functions is dense in Hv . However, T ((f )) = T (f (T )v ) = T
j
(j )T j (v ) =
j
(j )T j +1 v = (xf )(T )(v ),
(10.9) which lies in Hv . We see even more from this last calculation. Denote by Tv (for clarity) the restriction of T to Hv , so Tv : Hv Hv and is now an isometric isomorphism : L2 ( (Tv ), v ) Hv . Thus Tv is, via , unitarily equivalent to the operator S = 1 Tv
on L2 v ( (T )), and since we have
Tv ((f )) = (xf )(T )(v ) = (xf )
(10.10)
by (10.9), extended by continuity from polynomials to L2 v ( (T )), it follows that S (f )(x) = xf (x) (in L2 v ( (T ))). That is, S is simply the multiplication operator Mx dened on L2 v ( (T )). This is therefore a special case of Theorem 10.14, which we have now proved, for the case where T is self-adjoint and there exists some vector v such that Hv = H . It is important in this reasoning to keep track of the measure v , which depends on the vector v , and to remember that L2 functions are dened up to functions which are zero almost everywhere. Indeed, it could well be that v has support which is much smaller than the spectrum, and then the values of a continuous function f outside the support are irrelevant in seeing f as an element of L2 v . In particular, the map C ( (T )) L2 v ( (T )) is not necessarily injective. Denition 10.29. Let H be a Hilbert space and let T B (H ). A vector v in H is called a cyclic vector for T if the vectors T n (v ), for n 0, span a dense subspace of H . In particular, H is then separable.
We write x for the function x x to simplify notation.
326
By the denseness of polynomials in $C(\sigma(T))$, a vector $v$ is cyclic for a self-adjoint operator if and only if $H_v = H$ in the notation above. It is not always the case that $T$ admits a cyclic vector. However, we have the following lemma which allows us to reduce many questions to the cyclic case.

Lemma 10.30. Let $H$ be a Hilbert space, and let $T \in B(H)$ be a self-adjoint operator. Then there exists a family $(H_i)_{i \in I}$ of non-zero, pairwise orthogonal, closed subspaces of $H$ such that $H$ is the orthogonal direct sum of the $H_i$, $T(H_i) \subseteq H_i$ for all $i$, and $T$ restricted to $H_i$ is, for all $i$, a self-adjoint bounded operator in $B(H_i)$ with a cyclic vector.

Proof. This involves iterating the construction of $H_v$ above (since $T_v$, by definition, admits $v$ as cyclic vector). Since this needs to be done potentially infinitely often, a suitable application of Zorn's lemma is needed. We sketch this argument, since the details are straightforward and close to many other applications of Zorn's lemma.

First of all, we dispense with the case $H = 0$ (in which case one can take $I = \emptyset$). Note also that since $v \in H_v$ as defined above, we have $H_v = 0$ if and only if $v = 0$.

Let $\mathcal{O}$ be the set of subsets $I \subseteq H \setminus \{0\}$ such that the spaces $H_v$ for $v \in I$ are pairwise orthogonal, ordered by inclusion. We can apply Zorn's Lemma to $(\mathcal{O}, \subseteq)$. Indeed, if $\mathcal{T}$ is a totally ordered subset of $\mathcal{O}$, then we define as usual
\[ J = \bigcup_{I \in \mathcal{T}} I \subseteq H, \]
and if $v$, $w$ are in $J$, they belong to some $I_1$, $I_2$ in $\mathcal{T}$, respectively, and one of $I_1 \subseteq I_2$ or $I_2 \subseteq I_1$ must hold. In either case, the definition of $\mathcal{O}$ shows that $H_v$ and $H_w$ are (non-zero) orthogonal subspaces. Consequently, $J$ is an upper bound for $\mathcal{T}$ in $\mathcal{O}$.

Applying Zorn's lemma, we get a maximal element $I \in \mathcal{O}$. Let
\[ H_1 = \bigoplus_{v \in I} H_v, \]
where the direct sum is orthogonal and taken in the Hilbert space sense, so elements of $H_1$ are sums
\[ v = \sum_{i \in I} v_i, \qquad v_i \in H_i, \]
with
\[ \|v\|^2 = \sum_{i \in I} \|v_i\|^2 < \infty. \]
To conclude the proof, we must show that $H_1 = H$. Since $H_1$ is closed (by definition of the Hilbert sum), if $H_1 \neq H$ then there exists some $v_0 \in H_1^{\perp} \setminus \{0\}$, and then $I' = I \cup \{v_0\}$ is strictly larger than $I$ and lies in $\mathcal{O}$. Indeed, by construction each $H_v$ for $v \in I$ is $T$-invariant, and hence $H_1$ is also. By Lemma 4.24 and $T = T^*$ this shows that $H_1^{\perp}$ is $T$-invariant and hence $H_{v_0} \subseteq H_1^{\perp}$ and $I' \in \mathcal{O}$ as claimed. This contradicts maximality, so we must have $H_1 = H$ as required.

Notice that if $H$ is separable, the index set in the above result is either finite or countable, since each $H_i$ is non-zero. We can now prove Theorem 10.14 for self-adjoint operators.

Theorem 10.31 (Spectral theorem for self-adjoint operators). Let $H$ be a separable Hilbert space and $T \in B(H)$ a continuous self-adjoint operator. Then there exists a finite measure space $(X, \mu)$, a unitary operator $\phi : H \to L^2_\mu(X)$ and a bounded function $g \in L^\infty_\mu(X)$, such that
\[ M_g \circ \phi = \phi \circ T. \]

Proof. Consider a family $(H_n)_{n \geq 1}$ (possibly with finitely many elements only) of pairwise orthogonal non-zero closed subspaces of $H$, spanning $H$, for which $T(H_n) \subseteq H_n$ and $T$ has a cyclic vector $v_n \neq 0$ on $H_n$. By replacing $v_n$ with $2^{-n/2} \|v_n\|^{-1} v_n$, we can assume that $\|v_n\|^2 = 2^{-n}$ (without changing $H_n$). Let $\mu_n = \mu_{v_n}$ be the spectral measure associated with $v_n$ (and $T$), so that
\[ \mu_n(\sigma(T)) = \|v_n\|^2 = 2^{-n}. \]

1 such that n T n = Mx , the operator of multiplication by x. Now dene
X = {1, 2, . . . , n, . . .} (T ), with the product topology, and the locally nite positive measure dened by ({n} A) = n (A) for n 1 and A (T ) measurable. It is easily checked that this is indeed a measure. In fact, functions on X correspond to sequences of functions (fn ) on (T ) by mapping f to (fn ) with fn (x) = f (n, x), and
328
f (x) d(x) =
X n 1 (T )
fn (x) dn (x)
whenever this makes sense (for example, if f 0, which is equivalent to fn 0 for all n, or if f is integrable, which is equivalent with fn being n -integrable for all n and the sum being convergent). In particular (X ) =
n 1
n ( (T )) =
n 1
2n < ,
so (X, ) is a nite measure space. Moreover, the map : L2 (X, ) L2 ( (T ), n )

n 1
f (fn ) is a surjective isometry. We construct by dening

n 1
wn = 1
n 1
1 n (wn )
for all wn Hn . Since all the Hn together span H , this is a linear map dened on all of H , and it is a unitary map with inverse 1 (f ) =
n 1
n (fn ).
Now consider the map g : X C sending (n, x) to x, which is bounded and measurable, and nally observe that the nth component of (v ), for v expressed as v= n (fn ),
n 1 1 is n (n (fn )) = fn , hence the nth component of (T (v )) is
T (fn ) = xfn , which means exactly that T = Mg . This spectral theorem is extremely useful. It immediately implies a number of results which could also be proved directly from the continuous functional calculus, but less transparently. Note that the method of proof (treating rst the case of cyclic operators, and then extend using Zorns Lemma) may also be a shorter approach to other corollaries, since in the cyclic case one knows that the multiplication function can be taken to be the identity on the spectrum.
10.4 Spectral Measures and the Measurable Functional Calculus
Example 10.32. We continue with the example of a multiplication operator $T = M_g$ associated with a bounded function $g$, acting on $H = L^2_\mu(X)$ for a finite measure space $(X,\mu)$. For a given $\varphi \in H$, it follows from the previous examples that $H_\varphi$ is the subspace of functions of the type $x \mapsto f(g(x))\varphi(x)$ for $f \in C(\sigma(T))$, the spectrum being the support of $g_*(\mu)$. If we select the special vector $\varphi = 1$, this is the space of functions $f(g(x))$. This may or may not be dense; for instance, if $X \subseteq \mathbb{R}$ and $g(x) = x$, this space is of course dense in $H$; if, say, $X = [-1,1]$, $\mu$ is Lebesgue measure and $g(x) = x^2$, this is the space of even functions in $L^2$, which is not dense, so $\varphi$ is not a cyclic vector in this case.

Corollary 10.33 (Positivity). Let $H$ be a separable Hilbert space and let $T \in B(H)$ be a self-adjoint operator. For $f \in C(\sigma(T))$, we have $f(T) \ge 0$ if and only if $f \ge 0$.

Proof. Because of Corollary 10.21, we only need to check that $f(T) \ge 0$ implies that $f \ge 0$. Now two unitarily equivalent operators are simultaneously either positive or not, so it suffices to consider an operator $T = M_g$ acting on $L^2_\mu(X)$ for a finite measure space $(X,\mu)$. Then we have $f(M_g) = M_{f \circ g}$ by Example 10.13, hence
\[ \langle f(M_g)\varphi, \varphi \rangle = \int_X f(g(x))\,|\varphi(x)|^2\,d\mu(x) \]
for all vectors $\varphi \in L^2(X,\mu)$. The non-negativity of this for all $\varphi$ implies that $f \ge 0$ everywhere, as desired. To see this, take $\varphi(x) = \chi(g(x))$, where $\chi$ is the characteristic function of $A = \{y \mid f(y) < 0\}$, to get
\[ \int_{\sigma(T)} f(y)\chi(y)\,d\nu(y) = \int_A f(y)\,d\nu(y) \ge 0, \qquad \nu = g_*(\mu), \]
since $\sigma(T)$ is the support of the image measure. It follows that $\nu(A) = 0$, so $f$ is non-negative almost everywhere on $\sigma(T)$, and since it is continuous (and the support of $\nu$ is the whole spectrum), this means in fact that $f \ge 0$ everywhere.
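The test vector used in this proof can be tried out in a finite model. The following is only a sketch under assumptions that are not part of the notes: $X$ is a five-point set with point masses `mu`, the function `g` is real-valued, and the helper names `quadratic_form` and `indicator_of_negative_set` are hypothetical. It evaluates $\langle f(M_g)\varphi,\varphi\rangle = \sum_x f(g(x))|\varphi(x)|^2\mu(\{x\})$ and shows that $\varphi = \chi_A \circ g$ detects negativity of $f$ on the spectrum.

```python
# Finite model: X = {0,...,4} with point masses mu, g a real function on X,
# and M_g acting on L^2(X, mu) by pointwise multiplication (all assumed data).
mu = [0.5, 0.25, 0.125, 0.0625, 0.0625]
g = [-1.0, -0.5, 0.0, 0.5, 1.0]


def quadratic_form(f, phi):
    """<f(M_g) phi, phi> = sum_x f(g(x)) |phi(x)|^2 mu({x})."""
    return sum(f(gx) * abs(p) ** 2 * m for gx, p, m in zip(g, phi, mu))


def indicator_of_negative_set(f):
    """phi = chi_A o g with A = {y : f(y) < 0}, as in the proof above."""
    return [1.0 if f(gx) < 0 else 0.0 for gx in g]


def f_nonneg(y):
    return y * y      # non-negative on the spectrum {-1, -0.5, 0, 0.5, 1}


def f_mixed(y):
    return y          # negative on part of the spectrum


phi_test = indicator_of_negative_set(f_mixed)
witness = quadratic_form(f_mixed, phi_test)   # strictly negative
```

In this model the spectrum is just the set of values of `g`, so the only thing illustrated is the choice of test vector, not the continuity argument at the end of the proof.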

For the proof of Theorem 10.16 we now discuss some more general spectral measures.
10.4.1 Non-Diagonal Spectral Measures

Definition 10.34. Let $H$ be a complex Hilbert space and let $T \in B(H)$ be a normal bounded operator. For $v, w \in H$ a non-diagonal spectral measure is a finite complex-valued measure $\mu_{v,w}$ on $\sigma(T)$ with
\[ \int_{\sigma(T)} f\,d\mu_{v,w} = \langle f(T)v, w \rangle \tag{10.11} \]
for all $f \in C(\sigma(T))$.

Proposition 10.35. Let $H$ be a complex Hilbert space, and let $T \in B(H)$ be a self-adjoint bounded operator. Then for every $v, w \in H$ there exists a uniquely determined spectral measure $\mu_{v,w}$ with (10.11). Moreover, $\mu_{v,w}$ depends sesquilinearly on $v, w \in H$ and satisfies $\|\mu_{v,w}\| \le \|v\|\,\|w\|$.

Proof. Since bounded linear functionals on $C(\sigma(T))$ can be uniquely identified with complex-valued measures by Theorem 6.40, the spectral measure is uniquely determined if it exists. To obtain the existence, we may consider by the spectral theorem (Theorem 10.14) a multiplication operator $M_g : L^2_\mu(X) \to L^2_\mu(X)$ for some $g \in L^\infty_\mu(X)$. So let $v, w \in L^2_\mu(X)$. By Example 10.13 we have $f(M_g) = M_{f \circ g}$ for all $f \in C(\sigma(M_g))$, so that
\[ \langle f(M_g)v, w \rangle = \int_X f(g(x))\,\underbrace{v(x)\overline{w(x)}\,d\mu(x)}_{d\nu(x)} = \int_{\sigma(M_g)} f(z)\,dg_*\nu(z). \tag{10.12} \]
Also notice that
\[ \Big| \int_X f(g(x))\,v(x)\overline{w(x)}\,d\mu(x) \Big| \le \int_X |f(g(x))|\,|v(x)\overline{w(x)}|\,d\mu(x) \le \|f\|_\infty \|v\|_2 \|w\|_2 \]
by the Cauchy–Schwarz inequality, so that $g_*\nu = \mu_{v,w}$ is a finite complex-valued measure with $\|g_*\nu\| \le \|v\|_2 \|w\|_2$. Since uniqueness and existence are now shown, the sesqui-linearity follows easily from the sesqui-linearity of the inner product on $H$.
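For a diagonal operator on $\mathbb{C}^n$ the measure $\mu_{v,w}$ is a finite sum of point masses, which makes (10.11) and the bound $\|\mu_{v,w}\| \le \|v\|\,\|w\|$ easy to check by hand. The sketch below assumes a diagonal self-adjoint $T$ with the (arbitrarily chosen) eigenvalues in `lam`; the function names are illustrative and do not come from the notes.

```python
import math

# Assumed example: T e_i = lam[i] e_i on C^4 (note the repeated eigenvalue 0.5).
lam = [-1.0, 0.5, 0.5, 2.0]


def spectral_measure(v, w):
    """Atoms of mu_{v,w}: the mass at an eigenvalue is the sum of v_i * conj(w_i)."""
    atoms = {}
    for l, vi, wi in zip(lam, v, w):
        atoms[l] = atoms.get(l, 0) + vi * wi.conjugate()
    return atoms


def integrate(f, atoms):
    """Integral of f against a finite atomic measure."""
    return sum(f(l) * mass for l, mass in atoms.items())


def inner_f_of_T(f, v, w):
    """<f(T)v, w>, computed directly from the diagonal form of T."""
    return sum(f(l) * vi * wi.conjugate() for l, vi, wi in zip(lam, v, w))


def norm(v):
    return math.sqrt(sum(abs(x) ** 2 for x in v))


def f(x):
    return x ** 3 - x


v = [1 + 1j, 2, -1j, 0.5]
w = [0.5, 1j, 1, -2]
mu_vw = spectral_measure(v, w)
total_variation = sum(abs(m) for m in mu_vw.values())
```

Swapping `v` and `w` conjugates every atom, which is the finite-dimensional form of the symmetry $\mu_{v,w} = \overline{\mu_{w,v}}$ used later in the proof of (FC2).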
10.4.2 The Measurable Functional Calculus

Using the spectral measures from above, we can now define $f(T)$ for a function $f \in L^\infty(\sigma(T))$.

Proposition 10.36. Let $H$ be a complex Hilbert space and let $T \in B(H)$ be a self-adjoint bounded operator. For any $f \in L^\infty(\sigma(T))$ there exists a bounded operator $f(T)$ which is uniquely characterized by the property
\[ \langle f(T)v, w \rangle = \int_{\sigma(T)} f\,d\mu_{v,w} \tag{10.13} \]
for all $v, w \in H$.

Proof. Since $\mu_{v,w}$ is a finite complex-valued measure, and $f \in L^\infty(\sigma(T))$ is bounded, the integral $\int_{\sigma(T)} f\,d\mu_{v,w}$ exists. Moreover,
\[ \Big| \int_{\sigma(T)} f\,d\mu_{v,w} \Big| \le \int_{\sigma(T)} |f|\,d|\mu_{v,w}| \le \|f\|_\infty \|\mu_{v,w}\| \le \|f\|_\infty \|v\|\,\|w\| \]
by Proposition 10.35. For a fixed $v \in H$ the map
\[ w \mapsto \overline{\int_{\sigma(T)} f\,d\mu_{v,w}} \]
is linear and bounded. Therefore, by Fréchet–Riesz representation (Corollary 2.71) there exists some $v_f$ with
\[ \|v_f\| \le \|f\|_\infty \|v\| \tag{10.14} \]
for which
\[ \langle v_f, w \rangle = \overline{\langle w, v_f \rangle} = \int_{\sigma(T)} f\,d\mu_{v,w} \]
for all $w \in H$. By linearity of $v \mapsto \mu_{v,w}$ and the bound (10.14), we see that $v \mapsto v_f = f(T)v$ defines a bounded operator $f(T)$ with (10.13).

We now have all the ingredients needed to prove all the properties of the measurable functional calculus defined by Proposition 10.36.

Proof of Theorem 10.16 for self-adjoint operators. Recall that we have already shown in Theorem 10.17 the existence of an operator $f(T)$ for $f \in C(\sigma(T))$ such that (FC1) holds and
\[ C(\sigma(T)) \ni f \mapsto f(T) \in B(H) \]
is an isometric Banach algebra homomorphism with $\overline{f}(T) = f(T)^*$.
Also recall that in Proposition 10.35 we derived the existence of the family of finite complex-valued measures $\{\mu_{v,w}\}$ on $\sigma(T)$ with
\[ \langle f(T)v, w \rangle = \int_{\sigma(T)} f\,d\mu_{v,w}, \tag{10.15} \]
which in Proposition 10.36 we turned around to use (10.15) as the definition of $f(T)$ for $f \in L^\infty(\sigma(T))$. Hence this definition of the measurable functional calculus extends the definition of the continuous functional calculus, and hence satisfies (FC1). By Theorem 10.17 and Proposition 10.36 above we also have (FC3).

To prove (FC2) we argue in the following way. First, by Theorem 10.17 we already know (FC2) for continuous functions. We will use this to encode the properties of (FC2) into a property of the non-diagonal measure $\mu_{v,w}$, which in turn will give the same property for measurable functions. Let us start with $\overline{f}(T) = f(T)^*$, which we know by Theorem 10.17 for $f \in C(\sigma(T))$. We claim that this implies that $\mu_{v,w} = \overline{\mu_{w,v}}$. To see this, let $f \in C(\sigma(T))$ and notice that
\[ \int \overline{f}\,d\mu_{v,w} = \langle f(T)^* v, w \rangle = \langle v, f(T)w \rangle = \overline{\langle f(T)w, v \rangle} = \overline{\int f\,d\mu_{w,v}} = \int \overline{f}\,d\overline{\mu_{w,v}}, \]
and since this holds for all $f \in C(\sigma(T))$ the claim follows. Now we use basically the same identity (in a slightly different order, and with a different logic) to deduce that $\overline{f}(T) = f(T)^*$ for $f \in L^\infty(\sigma(T))$. So let $f \in L^\infty(\sigma(T))$. Then
\[ \langle f(T)^* v, w \rangle = \langle v, f(T)w \rangle = \overline{\langle f(T)w, v \rangle} = \overline{\int f\,d\mu_{w,v}} = \int \overline{f}\,d\mu_{v,w} = \langle \overline{f}(T)v, w \rangle \]
as required. We now show that
\[ (f_1 f_2)(T) = f_1(T) f_2(T) \tag{10.16} \]
for $f_1, f_2 \in L^\infty(\sigma(T))$. Again we know this property for $f_1, f_2 \in C(\sigma(T))$. We claim that this implies
\[ d\mu_{f_2(T)v,w} = f_2\,d\mu_{v,w} \tag{10.17} \]
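for $f_2 \in C(\sigma(T))$ and for $f_2 \in L^\infty(\sigma(T))$. Assuming this for now, we derive (10.16) from it for $f_1, f_2 \in L^\infty(\sigma(T))$. Indeed,
\[ \langle f_1(T) f_2(T) v, w \rangle = \int_{\sigma(T)} f_1\,d\mu_{f_2(T)v,w} = \int_{\sigma(T)} f_1 f_2\,d\mu_{v,w} = \langle (f_1 f_2)(T) v, w \rangle. \]
It remains to prove the claim (10.17). Using $f_1, f_2 \in C(\sigma(T))$ we have
\[ \int_{\sigma(T)} f_1\,d\mu_{f_2(T)v,w} = \langle f_1(T) f_2(T) v, w \rangle = \langle (f_1 f_2)(T) v, w \rangle = \int_{\sigma(T)} f_1 f_2\,d\mu_{v,w}. \]
As this holds for all $f_1 \in C(\sigma(T))$ we obtain $d\mu_{f_2(T)v,w} = f_2\,d\mu_{v,w}$ for $f_2 \in C(\sigma(T))$. Using $\mu_{v,w} = \overline{\mu_{w,v}}$ this also shows that
\[ d\mu_{v,f_1(T)w} = d\overline{\mu_{f_1(T)w,v}} = \overline{f_1}\,d\overline{\mu_{w,v}} = \overline{f_1}\,d\mu_{v,w} \]
for all $f_1 \in C(\sigma(T))$. Now let $f_2 \in L^\infty(\sigma(T))$ so that
\[ \int f_1\,d\mu_{f_2(T)v,w} = \langle f_1(T) f_2(T) v, w \rangle = \langle f_2(T) v, f_1(T)^* w \rangle = \int f_2\,d\mu_{v,f_1(T)^* w} = \int f_1 f_2\,d\mu_{v,w} \]
for all $f_1 \in C(\sigma(T))$, which implies the claim and hence (FC2).

To prove (FC4), suppose first that $S \in B(H)$ commutes with $T$. We note first that this implies that $S$ commutes with $f(T)$ for all $f \in C(\sigma(T))$. Indeed, for a polynomial $f \in \mathbb{C}[z]$ this is clear, and for $f \in C(\sigma(T))$ this follows quickly by density of $\mathbb{C}[z]|_{\sigma(T)}$ in $C(\sigma(T))$ since if $(f_n)$ is a sequence in $\mathbb{C}[z]$ with $f_n \to f$ in $C(\sigma(T))$, then $f(T)S = Sf(T)$ since $f_n(T)S = Sf_n(T)$ for all $n \ge 1$. We extend this again to $f \in L^\infty(\sigma(T))$ using the spectral measures. For these, we have that $\mu_{Sv,w} = \mu_{v,S^*w}$ since
\[ \int_{\sigma(T)} f\,d\mu_{Sv,w} = \langle f(T)Sv, w \rangle = \langle f(T)v, S^*w \rangle = \int_{\sigma(T)} f\,d\mu_{v,S^*w} \]
for all $f \in C(\sigma(T))$ by the previous step. Now let $f \in L^\infty(\sigma(T))$, and notice that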
\[ \langle f(T)Sv, w \rangle = \int_{\sigma(T)} f\,d\mu_{Sv,w} = \int_{\sigma(T)} f\,d\mu_{v,S^*w} = \langle Sf(T)v, w \rangle \]
shows that $f(T)S = Sf(T)$. To complete the proof of (FC4), we still have to consider an invariant subspace $V \subseteq H$. By Lemma 4.24 the closed subspace $V^\perp$ is $T^* = T$-invariant. This implies that $T$ commutes with the orthogonal projection $P_V : H \to H$ onto $V$, since for $v \in V$
\[ T P_V(v) = Tv = P_V(\underbrace{Tv}_{\in V}), \]
and
\[ T P_V(w) = 0 = P_V(\underbrace{Tw}_{\in V^\perp}) \]
for $w \in V^\perp$. Therefore $P_V$ commutes with $f(T)$ for all $f \in L^\infty(\sigma(T))$. If now $v \in V$ then
\[ f(T)v = f(T)P_V(v) = P_V f(T)(v) \in V \]
shows the remaining claim in (FC4).

To prove (FC5), suppose now that $T = M_g : L^2_\mu(X) \to L^2_\mu(X)$ for some $g \in L^\infty_\mu(X)$, and let $f \in L^\infty(\sigma(T))$. Then $f \circ g$ is defined almost everywhere (specifically, on $g^{-1}(\sigma(M_g))$; see Example 10.13). For $v, w \in L^2_\mu(X)$ we see that $d\mu_{v,w}$ is the push-forward of $v\overline{w}\,d\mu$ under $g$ by (10.12) and so
\[ \langle f(M_g)v, w \rangle = \int_{\sigma(T)} f\,d\mu_{v,w} = \int_X (f \circ g)\,v\overline{w}\,d\mu = \langle M_{f \circ g}v, w \rangle, \]
which proves (FC5), and hence the theorem.
10.5 Commuting Normal Operators

The following is a natural generalization of the spectral theorem for normal operators (Theorem 10.14).

Theorem 10.37 (Spectral theorem for commuting normal operators). Let $H$ be a separable complex Hilbert space, and let $T_1, T_2, \ldots \in B(H)$ be normal operators that commute with each other and with each other's adjoints. Then there exists a $\sigma$-finite measure space $(X,\mu)$, a unitary isomorphism $\Phi : H \to L^2_\mu(X)$ and for every $T_n$ a bounded function $g_n \in L^\infty_\mu(X)$ such that
\[ \begin{array}{ccc} H & \xrightarrow{\;T_n\;} & H \\ \Phi \downarrow & & \downarrow \Phi \\ L^2_\mu(X) & \xrightarrow{\;M_{g_n}\;} & L^2_\mu(X) \end{array} \]
commutes.
Clearly Theorem 10.14 is a special case of Theorem 10.37.

Proof of Theorem 10.37. Let
\[ \mathcal{A} = \overline{\langle T_1, T_1^*, T_2, T_2^*, \ldots \rangle} \subseteq B(H) \]
be the closure of the unital algebra generated by $T_1, T_2, \ldots$ and their adjoints. By assumption, $\mathcal{A}$ is commutative. In fact $\mathcal{A}$ is a commutative unital $C^*$-algebra, since $B(H)$ is itself a $C^*$-algebra and $\mathcal{A}^* = \mathcal{A}$ by construction. By Corollary 9.30 the Gelfand transform
\[ \mathcal{A} \ni a \mapsto a^o \in C(\mathcal{A}^o) \]
is an isometry satisfying $(a^*)^o = \overline{a^o}$ for all $a \in \mathcal{A}$. (Note that Lemma 9.31 in the proof of Corollary 9.30 can here be replaced by Lemma 10.12.) Notice that the inverse map $a^o \mapsto a$ is a generalized functional calculus.

Now fix $v \in H$ and $a \in \mathcal{A}$ with $a^o \ge 0$. Then there exists some $b = b^* \in \mathcal{A}$ (defined using the Gelfand transform by $b^o = \sqrt{a^o}$) with $b^2 = a$. It follows that the linear functional $\ell : C(\mathcal{A}^o) \to \mathbb{C}$ defined by $\ell(a^o) = \langle av, v \rangle$ is positive, since
\[ a^o \ge 0 \implies \ell(a^o) = \langle av, v \rangle = \langle bv, bv \rangle \ge 0. \]
By Riesz representation (Theorem 6.30) there exists a positive finite measure $\mu_{v,v}$ on $\mathcal{A}^o$ such that
\[ \langle av, v \rangle = \int_{\mathcal{A}^o} a^o\,d\mu_{v,v}. \]
Just as in Section 10.3.1 and Corollary 8.10 this induces a unitary isomorphism $\Phi_{H_v}$ between the cyclic subspace $H_v = \overline{\mathcal{A}v}$ and $L^2_{\mu_{v,v}}(\mathcal{A}^o)$ which sends $av \in \mathcal{A}v$ to $a^o \in C(\mathcal{A}^o)$. In particular for $a, b \in \mathcal{A}$ we have
\[ \Phi_{H_v}(abv) = (ab)^o = a^o b^o = a^o \Phi_{H_v}(bv). \]
Fixing $a \in \mathcal{A}$, this extends by continuity to the statement $\Phi_{H_v}(aw) = M_{a^o}\Phi_{H_v}(w)$ for all $w \in H_v$.

As in Sections 10.3 and 8.1.2 this extends to a proof of Theorem 10.37 as follows. If $w_1, w_2, \ldots$ is an orthonormal basis of $H$ then we define $H_1 = H_{w_1}$, $H_2 = H_{w_2'}$ where $w_2' \in H_1^\perp$ is the orthogonal projection of $w_2$ to $H_1^\perp$, similarly $H_3 = H_{w_3'}$ with $w_3' \in (H_1 \oplus H_2)^\perp$, and so on. Define
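\[ X = \bigsqcup_{m \in \mathbb{N}} \mathcal{A}^o \]
with measure
\[ \mu = \sum_{m \in \mathbb{N}} \mu_{w_m'}, \]
where $\mu_{w_m'} = \mu_{w_m',w_m'}$ lives on the $m$th copy of $\mathcal{A}^o$ (and $w_1' = w_1$), so that
\[ L^2_\mu(X) = \bigoplus_{m \in \mathbb{N}} L^2(X, \mu_{w_m'}) \cong \bigoplus_{m \in \mathbb{N}} H_{w_m'} = H, \tag{10.18} \]
and application of $T_n \in \mathcal{A}$ leaves each subspace $H_{w_m'}$ invariant and corresponds to multiplication by $T_n^o \in C(\mathcal{A}^o)$ on
\[ L^2(\mathcal{A}^o, \mu_{w_m'}) \subseteq L^2(X,\mu). \]
As this holds for all $m \in \mathbb{N}$ and the map in (10.18) is a unitary isomorphism $\Phi : H \to L^2_\mu(X)$, this gives the result.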
10.6 Projection-valued measures

In this section, we describe another version of Theorem 10.31, still for self-adjoint operators, which is essentially equivalent but sometimes more convenient. Moreover, it allows us to examine some concepts from Section 8.1.4 in a well-motivated way. The idea is to generalize the following interpretation of the spectral theorem for a compact self-adjoint operator $T \in K(H)$. If we denote by $P_\lambda$ the orthogonal projection onto $\ker(T - \lambda)$ for $\lambda \in \mathbb{R}$, then we have
\[ v = \sum_{\lambda \in \mathbb{R}} P_\lambda(v), \qquad T(v) = \sum_{\lambda \in \mathbb{R}} \lambda P_\lambda(v), \]
for all $v \in H$, where the series are well-defined because $P_\lambda = 0$ for $\lambda \notin \sigma(T)$.

To generalize this, it is natural to expect that one must replace the summation with appropriate integrals. Thus some form of integration for functions taking values in $H$ or $B(H)$ is needed. Moreover, $\ker(T - \lambda)$ may be zero for all $\lambda$, and the projections must be generalized. We start by considering these two questions abstractly.

Definition 10.38 (Projection-valued measure). Let $H$ be a Hilbert space and let $P(H)$ denote the set of orthogonal projections in $B(H)$. A (finite) projection valued measure $\Pi$ on $H$ is a map
\[ \Pi : \mathcal{B} \to P(H), \qquad A \mapsto \Pi_A \]
from the Borel $\sigma$-algebra $\mathcal{B}$ on $\mathbb{R}$ to the set of projections, such that the following holds:
(1) $\Pi_\emptyset = 0$ and $\Pi_{\mathbb{R}} = \mathrm{Id}$.
(2) There is some $R > 0$ for which $\Pi_{[-R,R]} = \mathrm{Id}$.
(3) If $(A_n)$ is a sequence of pairwise disjoint Borel subsets of $\mathbb{R}$, and $A = \bigcup_{n \ge 1} A_n$, then
\[ \Pi_A = \sum_{n \ge 1} \Pi_{A_n} \tag{10.19} \]
where the series converges in the strong operator topology (this is the topology on $B(H)$ defined using the seminorms $p_v : B(H) \to [0,\infty)$, $T \mapsto \|T(v)\|$, for $v \in H$). If the sequence $(A_n)$ is actually a finite list of sets $A_1, \ldots, A_N$, then
\[ \Pi_A = \sum_{n=1}^{N} \Pi_{A_n}. \]

Exercise 10.39. Show that the strong operator topology has the following properties. (1) It is Hausdorff; (2) it is weaker than the Banach-space topology given by the operator norm; (3) a sequence $(T_n)$ converges to $T$ as $n \to \infty$ in the strong operator topology if and only if $T_n(v) \to T(v)$ as $n \to \infty$ for all $v$.

In particular (10.19) means that
\[ \Pi_A(v) = \sum_{n \ge 1} \Pi_{A_n}(v) \]
for all $v \in H$. If $(e_n)_{n \ge 1}$ is an orthonormal basis of a separable Hilbert space $H$, then the projection $P_n$ onto $\mathbb{C}e_n \subseteq H$ satisfies
\[ P_n(v) = \langle v, e_n \rangle e_n \to 0 \]
as $n \to \infty$ for any $v \in H$. This may be seen as follows: since
\[ \|v\|^2 = \sum_{n \ge 1} |\langle v, e_n \rangle|^2, \]
the coefficients $\langle v, e_n \rangle$ converge to $0$ as $n \to \infty$. Thus $P_n$ converges strongly to $0$, but we also have $\|P_n\| = 1$ for all $n$, so $(P_n)$ does not converge in $B(H)$ for the operator norm.

Definition 10.38 resembles in some ways the definition of a (finite) Borel measure on $\mathbb{R}$. The following elementary properties are therefore not surprising.

Lemma 10.40. Let $H$ be a Hilbert space and $\Pi$ a projection-valued measure on $H$.
(1) For $A \subseteq B$ measurable, $\Pi_A \le \Pi_B$ and $\Pi_A \Pi_B = \Pi_B \Pi_A = \Pi_A$.
(2) For measurable sets $A, B \subseteq \mathbb{R}$, $\Pi_{A \cap B} = \Pi_A \Pi_B = \Pi_B \Pi_A$. In particular, all projections $\Pi_A$ commute, and if $A \cap B = \emptyset$, we have $\Pi_A \Pi_B = 0$.

Proof. Write $B = A \cup (B \setminus A)$, a disjoint union, so that by (10.19) we have $\Pi_B = \Pi_A + \Pi_{B \setminus A}$. Now $\Pi_{B \setminus A}$ is an orthogonal projection, so in particular $\Pi_{B \setminus A} \ge 0$ (since $\langle P(v), v \rangle = \|P(v)\|^2$ for any orthogonal projection $P$). Moreover, we recall that whenever $P_1, P_2$ are orthogonal projections on $H_1, H_2$, respectively, we have
\[ P_1 \ge P_2 \iff H_2 \subseteq H_1 \implies P_1 P_2 = P_2 P_1 = P_2. \]
(Indeed, if $P_1 \ge P_2$ and $v \in H_2$, then $\|v\|^2 = \langle P_2 v, v \rangle \le \langle P_1 v, v \rangle = \|P_1 v\|^2 \le \|v\|^2$, so $\|P_1 v\| = \|v\|$ and hence $v \in H_1$; thus $H_2 \subseteq H_1$. From $H_2 \subseteq H_1$ we have directly that $P_2(P_1(v)) = P_2(v)$ and $P_1(P_2(v)) = P_2(v)$ for all $v \in H$.)

In our case, with $P_1 = \Pi_B$, $P_2 = \Pi_A$, this gives $\Pi_A \Pi_B = \Pi_B \Pi_A = \Pi_A$, completing the proof of (1). For (2), we start with the case $A \cap B = \emptyset$. Then $\Pi_{A \cup B} = \Pi_A + \Pi_B$, and multiplying by $\Pi_A$ (since $A \subseteq A \cup B$) the first part gives
\[ \Pi_A = \Pi_A \Pi_{A \cup B} = \Pi_A^2 + \Pi_A \Pi_B = \Pi_A + \Pi_A \Pi_B, \]
so that $\Pi_A \Pi_B = 0$. Similarly we have $\Pi_B \Pi_A = 0$. Next, for any $A$ and $B$, notice that we have a disjoint union $A = (A \setminus B) \cup (A \cap B)$, hence
\[ \Pi_A = \Pi_{A \setminus B} + \Pi_{A \cap B}, \]
and multiplying by $\Pi_B$ this time gives
\[ \Pi_B \Pi_A = \Pi_B \Pi_{A \setminus B} + \Pi_B \Pi_{A \cap B} = \Pi_{A \cap B} \]
because $B \cap (A \setminus B) = \emptyset$ (allowing the previous case to be used), and $A \cap B \subseteq B$ (and we apply (1) again). Similarly, we get $\Pi_A \Pi_B = \Pi_{A \cap B}$.

As expected, the point of projection-valued measures is that one can integrate with respect to them, and construct operators in $B(H)$ using this formalism.

Proposition 10.41. Let $H$ be a Hilbert space and let $\Pi$ be a projection-valued measure on $H$. Then for any bounded Borel function $f : \mathbb{R} \to \mathbb{C}$, there exists a unique operator $T \in B(H)$ such that
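\[ \langle T(v), v \rangle = \int_{\mathbb{R}} f(\lambda)\,d\mu_v(\lambda) \tag{10.20} \]
for all $v \in H$, where $\mu_v$ is the finite Borel measure given by
\[ \mu_v(A) = \langle \Pi_A(v), v \rangle \tag{10.21} \]
for $A \in \mathcal{B}$. This operator is denoted
\[ T = \int_{\mathbb{R}} f(\lambda)\,d\Pi(\lambda) = \int_{\mathbb{R}} f(\lambda)\,d\Pi_\lambda. \]
Moreover,
\[ T^* = \int_{\mathbb{R}} \overline{f(\lambda)}\,d\Pi(\lambda), \]
and the operator $T$ is normal. Finally, it is self-adjoint if $f$ is real-valued, and non-negative if $f$ is non-negative.

If $T$ is the self-adjoint operator associated to a projection valued measure $\Pi$, it is also customary to write
\[ f(T) = \int_{\mathbb{R}} f(\lambda)\,d\Pi(\lambda). \tag{10.22} \]
If $f$ is continuous, this coincides with the functional calculus for $T$.

Proof of Proposition 10.41. Let $\Pi$ be a projection valued measure on $H$, and let $v$ be any fixed vector. We define $\mu_v$ on Borel subsets of $\mathbb{R}$ as indicated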
by (10.21), and we first check that it is indeed a Borel measure. Since any orthogonal projection is positive, $\mu_v$ is non-negative. Moreover, $\mu_v(\emptyset) = 0$ and $\mu_v(\mathbb{R}) = \|v\|^2$ by the first property defining projection-valued measures, and if $(A_n)$ is a sequence of disjoint Borel subsets with union $A$, then
\[ \mu_v(A) = \langle \Pi_A(v), v \rangle = \sum_n \langle \Pi_{A_n} v, v \rangle = \sum_n \mu_v(A_n), \]
by (10.19) and the definition of strong convergence of sequences. So $\mu_v$ is a finite measure. In particular, if $f$ is a bounded measurable function on $\mathbb{R}$, the integral
\[ \int_{\mathbb{R}} f(\lambda)\,d\mu_v(\lambda) \]
exists for all $v \in H$. If there exists an operator $T \in B(H)$ such that
\[ \langle T(v), v \rangle = \int_{\mathbb{R}} f(\lambda)\,d\mu_v(\lambda) \]
for all $v$, we know that $T$ is uniquely determined by those integrals, and this gives the uniqueness part of the statement.

To show the existence, we simply parallel the construction of integration with respect to a measure (one could be more direct by showing directly that the right-hand side of the equality above is of the form $\langle T(v), v \rangle$ for some $T \in B(H)$, but the longer construction is instructive for other reasons anyway). We start by defining
\[ \int_{\mathbb{R}} \chi_A(\lambda)\,d\Pi(\lambda) = \Pi_A \]
for any Borel subset $A \subseteq \mathbb{R}$, where $\chi_A$ is the characteristic function of $A$. The definition (10.21) exactly means that this definition is compatible with our desired statement (10.20) for $f = \chi_A$, i.e., we have
\[ \Big\langle \Big( \int_{\mathbb{R}} f\,d\Pi \Big) v, v \Big\rangle = \int_{\mathbb{R}} f(\lambda)\,d\mu_v(\lambda), \tag{10.23} \]
for all $v \in H$. We then extend the definition by linearity for step functions
\[ f = \sum_{1 \le i \le N} \alpha_i \chi_{A_i} \]
where $A_i \subseteq \mathbb{R}$ are disjoint measurable sets, namely
\[ \int_{\mathbb{R}} f\,d\Pi = \sum_{1 \le i \le N} \alpha_i \Pi_{A_i}; \]
again, linearity ensures that (10.23) holds for such $f$, and uniqueness ensures that the resulting operator does not depend on the representation of $f$ as a sum of characteristic functions. Next, for $f \ge 0$, bounded and measurable, it is well-known that we can find step functions $s_n \ge 0$, $n \ge 1$, such that $(s_n)$ converges uniformly to $f$: indeed, if $0 \le f \le B$, one can define
\[ s_n(x) = \frac{iB}{n}, \quad \text{where } 0 \le i \le n-1 \text{ is such that } f(x) \in [iB/n, (i+1)B/n) \]
(and $s_n(x) = B$ if $f(x) = B$), so that $|f(x) - s_n(x)| \le B/n$ for all $x$.

We will show that
\[ T_n = \int_{\mathbb{R}} s_n\,d\Pi \]
converges in $B(H)$ to an operator $T$ such that (10.23) holds. Indeed, for any step function $s$, we can write
\[ s = \sum_{1 \le i \le N} \alpha_i \chi_{A_i}, \qquad s^2 = \sum_{1 \le i \le N} \alpha_i^2 \chi_{A_i}, \]
with $A_i$ disjoint, and by definition we get
\[ \Big\| \Big( \int_{\mathbb{R}} s\,d\Pi \Big) v \Big\|^2 = \sum_{1 \le i \le N} |\alpha_i|^2 \|\Pi_{A_i}(v)\|^2 \le \max_i |\alpha_i|^2 \, \|v\|^2, \tag{10.24} \]
for all $v$ (using Lemma 10.40, (2)), so
\[ \Big\| \int_{\mathbb{R}} s\,d\Pi \Big\|_{B(H)} \le \|s\|_{L^\infty}. \]
Applied to $s = s_n - s_m$, this shows that the sequence $(T_n)$ is a Cauchy sequence in $B(H)$, hence it does converge to some operator $T \in B(H)$. We can then argue that, by continuity, we have
\[ \langle T(v), v \rangle = \lim_{n \to \infty} \langle T_n(v), v \rangle = \lim_{n \to \infty} \int_{\mathbb{R}} s_n\,d\mu_v = \int_{\mathbb{R}} f\,d\mu_v \]
by the dominated convergence theorem (since $0 \le s_n \le f$ which is bounded, hence integrable with respect to a finite measure). This means that $T$ satisfies (10.23), as desired.

We are now essentially done: given a bounded complex-valued function $f : \mathbb{R} \to \mathbb{C}$, we write
\[ f = \big( (\Re f)^+ - (\Re f)^- \big) + i \big( (\Im f)^+ - (\Im f)^- \big), \]
where each of the four terms is $\ge 0$, and we define $\int f\,d\Pi$ by linearity from this expression. Again, (10.23) holds trivially. To conclude the proof of the proposition, we note first that
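\[ \langle T^*(v), v \rangle = \overline{\langle T(v), v \rangle} = \overline{\int_{\mathbb{R}} f(\lambda)\,d\mu_v(\lambda)} = \int_{\mathbb{R}} \overline{f(\lambda)}\,d\mu_v(\lambda), \]
which shows that
\[ T^* = \int_{\mathbb{R}} \overline{f(\lambda)}\,d\Pi(\lambda). \]
Generalizing (10.24) one shows that, for all $f$, we have
\[ \Big\| \Big( \int_{\mathbb{R}} f\,d\Pi \Big) v \Big\|^2 = \int_{\mathbb{R}} |f|^2\,d\mu_v = \Big\| \Big( \int_{\mathbb{R}} \overline{f}\,d\Pi \Big) v \Big\|^2 \]
and since $T$ is normal if and only if $\|T(v)\|^2 = \|T^*(v)\|^2$ for all $v$, we deduce that any of the operators of the type $\int_{\mathbb{R}} f\,d\Pi$ is normal. Finally, the self-adjointness for $f$ real-valued, and the positivity for $f \ge 0$, are clear from the construction.

Example 10.42. Let $\Pi$ be a projection-valued measure. We then have
\[ \mathrm{Id} = \int_{\mathbb{R}} d\Pi(\lambda). \]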
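In finite dimensions a projection-valued measure is just a family of coordinate projections, and the integral of Proposition 10.41 becomes a finite sum. The sketch below assumes a diagonal self-adjoint operator on $\mathbb{C}^5$ with the eigenvalues listed in `eigenvalues` (an arbitrary choice), and stores every diagonal operator as the list of its diagonal entries; the names `projection`, `integral` and `product` are illustrative only.

```python
# Assumed model: T is diagonal with these eigenvalues, and Pi_A is the
# coordinate projection onto the entries whose eigenvalue lies in A.
eigenvalues = [-2.0, -2.0, 0.0, 1.0, 3.0]


def projection(in_A):
    """Diagonal of Pi_A, where the Borel set A is given by the predicate in_A."""
    return [1.0 if in_A(l) else 0.0 for l in eigenvalues]


def integral(f):
    """Diagonal of int f dPi, i.e. the sum of f(l) * Pi_{{l}} over distinct eigenvalues l."""
    total = [0.0] * len(eigenvalues)
    for l in set(eigenvalues):
        p = projection(lambda x, l=l: x == l)
        total = [t + f(l) * pi for t, pi in zip(total, p)]
    return total


def product(p, q):
    """Product of two diagonal operators."""
    return [a * b for a, b in zip(p, q)]


def in_A(x):
    return x < 0.5


def in_B(x):
    return x > -1.0


pi_A, pi_B = projection(in_A), projection(in_B)
pi_A_cap_B = projection(lambda x: in_A(x) and in_B(x))

identity = integral(lambda l: 1.0)   # Example 10.42 in this model
T_diag = integral(lambda l: l)       # recovers the operator itself
```

Here `product(pi_A, pi_B)` agrees with `pi_A_cap_B`, which is Lemma 10.40 (2) in this toy setting, and `integral(lambda l: l * l)` returns the diagonal of $T^2$.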
Corollary 10.43. Let $H$ be a Hilbert space, and $\Pi$ a finite projection-valued measure on $H$. Let $(f_n)$ be a sequence of bounded measurable functions $\mathbb{R} \to \mathbb{C}$ such that $f_n(x) \to f(x)$ for all $x \in \mathbb{R}$, with $f$ measurable and bounded, and with $\|f_n\|_\infty$ bounded for $n \ge 1$. Then
\[ \int_{\mathbb{R}} f_n(\lambda)\,d\Pi(\lambda) \to \int_{\mathbb{R}} f(\lambda)\,d\Pi(\lambda) \]
in the strong topology.

Proof. Let $T$ be the self-adjoint operator defined by $\Pi$. The left- and right-hand sides are well-defined operators by Proposition 10.41, given by $f_n(T)$ and $f(T)$ respectively (using the notation (10.22)), and it is enough to prove that, for any vector $v \in H$, we have $\langle f_n(T)v, v \rangle \to \langle f(T)v, v \rangle$ as $n \to \infty$. By definition, the left-hand side is
\[ \int_{\mathbb{R}} f_n(\lambda)\,d\mu_v(\lambda) \]
where the measure $\mu_v$, as in (10.21), is uniquely determined by $\Pi$ and $v$. Now the assumptions imply that $f_n(\lambda) \to f(\lambda)$ as $n \to \infty$ for each $\lambda$, and that $|f_n| \le C$ for some constant $C$ independent of $n$. Since $\mu_v$ is a finite
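measure on $\mathbb{R}$, this constant is $\mu_v$-integrable so we can apply the dominated convergence theorem to conclude that
\[ \int_{\mathbb{R}} f_n(\lambda)\,d\mu_v(\lambda) \to \int_{\mathbb{R}} f(\lambda)\,d\mu_v(\lambda) = \langle f(T)v, v \rangle, \]
as desired. The same argument applied to $|f_n - f|^2$, together with the identity $\|(\int_{\mathbb{R}} h\,d\Pi)v\|^2 = \int_{\mathbb{R}} |h|^2\,d\mu_v$ from the proof of Proposition 10.41, gives $\|f_n(T)v - f(T)v\| \to 0$, which is the claimed strong convergence.

We can now state a new form of the spectral theorem.

Theorem 10.44 (Spectral theorem in projection-valued measure form). Let $H$ be a separable Hilbert space and let $T = T^*$ be a bounded self-adjoint operator on $H$. Then there exists a unique projection-valued measure $\Pi_T$ such that
\[ T = \int_{\mathbb{R}} \lambda\,d\Pi_T(\lambda), \]
where the integral is extended to the unbounded function $\lambda$ by defining it as
\[ \int_{\mathbb{R}} \lambda\,d\Pi_T(\lambda) = \int_{\mathbb{R}} \lambda \chi_I\,d\Pi_T, \]
where $\chi_I$ is the characteristic function of some interval $I = [-R,R]$ for which $\Pi_{T,I} = \mathrm{Id}$. Moreover, if $f$ is any continuous function on $\sigma(T)$, then
\[ f(T) = \int_{\sigma(T)} f(\lambda)\,d\Pi_T(\lambda). \]

Proof. By the spectral theorem (Theorem 10.31), we can assume that $T = M_g$ is a multiplication operator acting on $H = L^2_\mu(X)$ for some finite measure space $(X,\mu)$, and $g$ a real-valued function in $L^\infty_\mu(X)$. For $A \subseteq \mathbb{R}$ a Borel subset, we then define
\[ \Pi_{T,A} = M_{\chi_A \circ g}, \]
the multiplication operator by $\chi_A \circ g \in L^\infty_\mu(X)$. We now check that $A \mapsto \Pi_{T,A}$ is a projection-valued measure. It is clear that
\[ \Pi_{T,A}^2 \varphi = \chi_A(g(x))^2 \varphi(x) = \chi_A(g(x)) \varphi(x) = \Pi_{T,A} \varphi, \]
for every $\varphi \in L^2_\mu(X)$, so each $\Pi_{T,A}$ is a projection operator, and since it is self-adjoint (each $\chi_A \circ g$ being real-valued), it is an orthogonal projection. The properties $\Pi_{T,\emptyset} = 0$, and $\Pi_{T,\mathbb{R}} = \Pi_{T,[-R,R]} = \mathrm{Id}$ are clear if $|g| \le R$. If $(A_n)$ is a sequence of pairwise disjoint Borel subsets of $\mathbb{R}$ with union $A$, then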
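\[ \chi_A(y) = \sum_{n \ge 1} \chi_{A_n}(y) \]
for any $y \in \mathbb{R}$, where the series contains at most a single non-zero term since the sets are disjoint. Hence
\[ \Pi_{T,A}\varphi(x) = \chi_A(g(x))\varphi(x) = \sum_{n \ge 1} \chi_{A_n}(g(x))\varphi(x), \]
for any $\varphi \in L^2_\mu(X)$, showing that
\[ \Pi_{T,A} = \sum_{n \ge 1} \Pi_{T,A_n}, \]
in the strong topology on $B(H)$. So we have constructed a projection-valued measure from $T$. Consider now the operator
\[ S = \int_{\mathbb{R}} \lambda \chi_I\,d\Pi_T \in B(H), \]
where $I$ is as described in the statement of the theorem. We claim that $S = T$. To see this, fix $\varphi \in H$ so that (by (10.20)) we have
\[ \langle S(\varphi), \varphi \rangle = \int_{\mathbb{R}} \lambda \chi_I(\lambda)\,d\mu_\varphi(\lambda) \]
where the measure $\mu_\varphi$ is defined by
\[ \mu_\varphi(A) = \langle \Pi_{T,A}\varphi, \varphi \rangle = \int_X \chi_A(g(x))\,|\varphi(x)|^2\,d\mu(x) = \int_{\mathbb{R}} \chi_A(y)\,d\nu(y), \]
where $\nu = g_*(|\varphi|^2\,d\mu)$ is the spectral measure associated to $T$ and $\varphi$ (see Example 10.28). On the other hand, we have
\[ \langle T(\varphi), \varphi \rangle = \int_X g(x)\,|\varphi(x)|^2\,d\mu(x) = \int_{\mathbb{R}} \lambda\,d\nu(\lambda). \]
The support of $\nu$ is contained in the essential range of $g$, and so by the choice of $I$, we have
\[ \langle S(\varphi), \varphi \rangle = \int_{\mathbb{R}} \lambda \chi_I(\lambda)\,d\nu(\lambda) = \int_{\mathbb{R}} \lambda\,d\nu(\lambda) = \langle T(\varphi), \varphi \rangle \]
for all $\varphi \in H$. This means that $S = T$, as desired.

Intuitively (as the proof illustrates), $\Pi_{T,A}$ is the orthogonal projection onto the subspace of $H$ which is the direct sum of those subspaces on which $T$ acts by multiplication by some $\lambda \in A$.

The following lemma will be useful in the next section.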
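Lemma 10.45. Let $H$ be a separable Hilbert space, and let $T_1, T_2$ be self-adjoint operators in $B(H)$ which commute, with associated projection valued measures $\Pi_1$ and $\Pi_2$. Then, for bounded measurable functions $f$ and $g$, the operators $S_1$ and $S_2$ defined by
\[ S_1 = \int_{\mathbb{R}} f\,d\Pi_1, \qquad S_2 = \int_{\mathbb{R}} g\,d\Pi_2 \]
also commute.

Proof. It is enough to consider the case where $g$ is the identity, so $S_2 = T_2$, because if that case holds, then we deduce first that $S_1$ commutes with $T_2$, and then the same argument with $(T_1, T_2, f)$ replaced by $(T_2, S_1, g)$ gives the desired conclusion. Next, a simple limiting argument shows that it is enough to consider the case where $f$ is the characteristic function of a measurable set, so $S_1 = \Pi_{1,A}$ is a projection, and we must show that $\Pi_{1,A} T_2 = T_2 \Pi_{1,A}$. Now we can argue as follows. The assumption implies immediately, by induction, that
\[ T_1^n T_2 = T_2 T_1^n \]
for all $n \ge 0$, so $T_2$ commutes with any polynomial $p(T_1)$. By continuity of multiplication in $B(H)$, $T_2$ commutes with all operators $\varphi(T_1)$, $\varphi \in C(\sigma(T_1))$. If $A$ is closed, there exists a uniformly bounded sequence $(\varphi_n)$ of such continuous functions with $\varphi_n(x) \to \chi_A(x)$ as $n \to \infty$ for all $x$ (for example $\varphi_n(x) = \max(0, 1 - n\,d(x,A))$). By strong convergence (Corollary 10.43), it follows that
\[ \varphi_n(T_1) = \int_{\mathbb{R}} \varphi_n\,d\Pi_1 \to \Pi_{1,A} \]
strongly. Thus
\[ T_2(\Pi_{1,A}(v)) = \lim_{n \to \infty} T_2(\varphi_n(T_1)v) = \lim_{n \to \infty} \varphi_n(T_1)(T_2(v)) = \Pi_{1,A}(T_2(v)). \]
The family of Borel sets $A$ for which $\Pi_{1,A}$ commutes with $T_2$ is closed under complements and, by (10.19), under countable disjoint unions; since the closed sets are stable under finite intersections and generate the Borel $\sigma$-algebra, this family contains all Borel sets.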
10.7 The spectral theorem for normal operators

Using the following simple lemma, we are now in a position to extend the spectral theorem and the continuous functional calculus to normal operators via this approach.

Lemma 10.46. Let $H$ be a Hilbert space and $T \in B(H)$ a normal bounded operator. Then there exist self-adjoint operators $T_1, T_2 \in B(H)$ such that $T = T_1 + iT_2$, and $T_1 T_2 = T_2 T_1$.

Proof. Write
\[ T_1 = \frac{T + T^*}{2} \quad \text{and} \quad T_2 = \frac{T - T^*}{2i}, \]
so that $T = T_1 + iT_2$. Notice first that both $T_1$ and $T_2$ are self-adjoint, and then that
\[ T_1 T_2 = T_2 T_1 = \frac{T^2 - (T^*)^2}{4i} \]
because $T$ is normal.

We now can state the basic result for normal operators.

Proposition 10.47. Let $H$ be a separable Hilbert space and let $T \in B(H)$ be a normal bounded operator. There exists a finite measure space $(X,\mu)$, a bounded measurable function $g \in L^\infty_\mu(X)$, and a unitary isomorphism $U : H \to L^2_\mu(X)$ such that $M_g \circ U = U \circ T$.

Sketch proof of Proposition 10.47. Write $T = T_1 + iT_2$ with $T_1, T_2$ self-adjoint bounded operators which commute, as in the lemma. Let $\Pi_1$ (resp. $\Pi_2$) denote the projection valued measure for $T_1$ (resp. $T_2$). The idea will be to first construct a suitable projection-valued measure associated with $T$, which must be defined on $\mathbb{C}$ since $\sigma(T)$ is not (in general) a subset of $\mathbb{R}$. We first claim that all projections $\Pi_{1,A}$ and $\Pi_{2,B}$ commute, since $T_1$ and $T_2$ commute (see Lemma 10.45). This allows us to define
\[ \tilde{\Pi}_{A \times B} = \Pi_{1,A}\Pi_{2,B} = \Pi_{2,B}\Pi_{1,A}, \]
which are orthogonal projections. By basic limiting procedures, it follows that the mapping $A \times B \mapsto \tilde{\Pi}_{A \times B}$ extends to a map
\[ \mathcal{B}(\mathbb{C}) \to P(H) \]
which is a (finite) projection valued measure defined on the Borel subsets of $\mathbb{C}$. Repeating the argument in the previous section allows us to define normal operators
\[ \int_{\mathbb{C}} f(\lambda)\,d\tilde{\Pi}(\lambda) \in B(H), \]
for $f$ bounded and measurable defined on $\mathbb{C}$. In particular, one finds again that
\[ T = \int_{\mathbb{C}} \lambda\,d\tilde{\Pi}(\lambda), \]
where the integral is again defined by truncating outside a sufficiently large compact set. This gives the spectral theorem for $T$, expressed in the language of projection-valued measures. Next one gets, for any $f \in C(\sigma(T))$ and $v \in H$, the fundamental relation
\[ \Big\| \Big( \int f\,d\tilde{\Pi} \Big) v \Big\|^2 = \int |f|^2\,d\mu_v, \]
where $\mu_v$ is the associated spectral measure. This allows us once again to show that if $T$ has a cyclic vector $v$ (defined now as a vector for which the span of the vectors $T^n (T^*)^m v$ for $m, n \ge 0$ is dense), then the unitary map $L^2_{\mu_v}(\sigma(T)) \to H$ represents $T$ as a multiplication operator $M_z$ on $L^2_{\mu_v}(\sigma(T))$. Zorn's lemma allows us to get the general case.

10.8 Some Facts on the Spectrum of a Tree

In this section we want to study the spectrum of the random walk (or equivalently, the Laplace operator) on a $(p+1)$-regular tree. Let us recall that a graph is a set of vertices $V$ together with a set of edges $E \subseteq V \times V$. We will assume that the graph is undirected, meaning that $(v,w) \in E$ if and only if $(w,v) \in E$ for all $v, w \in V$. More concretely, we fix an integer $p \ge 2$ (the case $p = 1$ is quite different and much easier, see Exercise 10.49) and suppose that $(V,E)$ is a $(p+1)$-regular tree. This means that $V$ is countably infinite and every vertex $v \in V$ is connected to exactly $(p+1)$ further vertices by edges in $E$ and there are no loops (see Figure 10.1). We write $v \sim w$ if there is an edge joining $v$ to $w$.

At first sight there are three natural operators (the proof will involve a fourth) that we can define on $\ell^2(V)$ using the tree structure. In the following we always fix $p \ge 2$ and a $(p+1)$-regular tree $(V,E)$.

Definition 10.48. The averaging operator on $\ell^2(V)$ is defined by
\[ T(f)(v) = \frac{1}{p+1} \sum_{w \sim v} f(w), \]
for $f \in \ell^2(V)$. It replaces the value of a function at a vertex $v$ by the average $T(f)(v)$ of all values $f(w)$ at the direct neighbors $w \sim v$ in the tree. The summing operator is defined by
\[ S = (p+1)T, \]
simply summing the values at the immediate neighbors. Finally the Laplace operator
\[ \Delta = I - T \]
compares the value at each vertex with the average over all its immediate neighbors. Clearly $T$, $S$, and $\Delta$ are essentially equivalent. If one is understood well, then the same applies to the other two.

Fig. 10.1. The 3-regular tree is illustrated here by showing all vertices of distance no more than 4 from a given vertex $v_0$, but the pattern needs to be repeated indefinitely, from $w$ and all the other vertices at distance 4 from $v_0$.
Exercise 10.49. Set $p = 1$, so that we may think of the $(p+1)$-regular graph as having vertex set $V = \mathbb{Z}$ and edge set $E = \{(n, n \pm 1) \mid n \in \mathbb{Z}\}$. Show that the summing operator $S$ is self-adjoint, and describe its spectrum.

Exercise 10.50. Show that the summing operator $S : \ell^2(V) \to \ell^2(V)$ on a $(p+1)$-regular tree is a self-adjoint bounded operator with $\|S\| \le p+1$. Show that there is no eigenvalue $\lambda \in \sigma_{\mathrm{discrete}}(S)$ of absolute value $|\lambda| = p+1$.
10.8.1 The Correct Upper Bound for the Summing Operator

While it is not difficult to see that $\|S\| \le p+1$, one might also guess that this upper bound is not the real value of $\|S\|$. Indeed the proof of the last statement of Exercise 10.50 already hints at this. Due to the very rapid growth in the number of vertices in balls $B_n^V(v_0)$ (measured with respect to the natural path length on the tree), elements of $\ell^2(V)$ must decay rather rapidly. In this section we will not give a complete description of the spectrum of $S$ on $\ell^2(V)$, but we will at least prove the correct upper bound for $\|S\|$.

Theorem 10.51. Let $p \ge 2$ and let $(V,E)$ be a $(p+1)$-regular tree. The summing operator $S : \ell^2(V) \to \ell^2(V)$ satisfies
\[ \|S\| \le 2\sqrt{p} < p+1. \]
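The bound can be explored numerically on a finite piece of the tree. The following is a rough sketch only, and does not replace the proof: it builds the ball of radius 6 in the 3-regular tree ($p = 2$), restricts the summing operator to functions supported on that ball, and estimates the norm of the restriction by power iteration on its square. The helper names `tree_ball`, `apply_S` and `norm_estimate`, the radius, and the iteration count are all choices made for this illustration. Restricting to a finite set of vertices cannot increase the norm, so the estimate should stay below $2\sqrt{p}$.

```python
import math


def tree_ball(p, depth):
    """Adjacency lists of the ball of the given radius in the (p+1)-regular tree."""
    adj = [[]]                      # vertex 0 is the centre v0
    frontier = [0]
    for _ in range(depth):
        new_frontier = []
        for v in frontier:
            children = p + 1 if v == 0 else p
            for _ in range(children):
                w = len(adj)
                adj.append([v])
                adj[v].append(w)
                new_frontier.append(w)
        frontier = new_frontier
    return adj


def apply_S(adj, f):
    """Summing operator restricted to the ball: sum the values at the neighbours."""
    return [sum(f[w] for w in nbrs) for nbrs in adj]


def norm_estimate(adj, iterations=200):
    """Power iteration on S^2; returns sqrt(||S^2 f|| / ||f||), a lower bound for the norm."""
    f = [1.0] * len(adj)
    est = 0.0
    for _ in range(iterations):
        g = apply_S(adj, apply_S(adj, f))
        est = math.sqrt(math.sqrt(sum(x * x for x in g) / sum(x * x for x in f)))
        scale = max(abs(x) for x in g)
        f = [x / scale for x in g]
    return est


p = 2
adj = tree_ball(p, 6)
estimate = norm_estimate(adj)
```

Squaring the operator is a design choice: the finite tree is bipartite, so its spectrum is symmetric about $0$ and plain power iteration on $S$ itself would oscillate.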
As implied by the intermediate inequality in the theorem, in fact $\|S\| = 2\sqrt{p}$, but we will not prove this here(10). For the proof we will use yet another normalization of the averaging and summing operators. We refer to this as the unitarily normalized summation
\[ U_1 = \frac{1}{\sqrt{p}} S. \]
In fact we will also need the operators $U_n$ for $n \ge 0$ as defined in the next lemma.

Lemma 10.52. For any $n \ge 0$, let $U_n$ be the operator that maps any function $f$ on a $(p+1)$-regular tree to the function $U_n(f)$ defined by
\[ U_n(f)(v) = \frac{1}{p^{n/2}} \sum_{\substack{k \le n, \\ k \equiv n \ (\mathrm{mod}\ 2)}} \ \sum_{w \sim_k v} f(w), \]
where $w \sim_k v$ means that $w$ and $v$ have distance $k$ in the $(p+1)$-regular tree. Then the sequence of operators $(U_n)$ satisfies
\[ U_0 = I, \qquad U_1 = \frac{p+1}{\sqrt{p}}\,T = \frac{1}{\sqrt{p}}\,S, \qquad \text{and} \qquad U_{n+1} = U_1 U_n - U_{n-1} \ \text{for } n \ge 1. \]

Proof. The cases $n = 0$ and $n = 1$ hold trivially by definition. For $n \ge 1$, we need to calculate the product
\[ \frac{1}{\sqrt{p}} S(U_n(f))(v) = \frac{1}{\sqrt{p}} \sum_{v' \sim v} U_n(f)(v') = \frac{1}{p^{(n+1)/2}} \sum_{v' \sim v} \ \sum_{\substack{k \le n, \\ k \equiv n \ (\mathrm{mod}\ 2)}} \ \sum_{w \sim_k v'} f(w). \]
Here there are two possibilities for the distance from $w$ to $v$. The distance could be $k+1$, in which case $v'$ is the unique element with distance $1$ to $v$ in the direction of the path to $w$. The distance could be $k-1$, in which case $v'$ could be any of the $p$ neighbors of $v$ away from the direction of the path to $w$ (see Figure 10.2 for the case $p = 2$). There is one exceptional case in this description: If $k = 1$ and $w = v$ then all $(p+1)$ choices of neighbors $v'$ of $v$ give rise to $w = v$. We reorder the summation, and then split the inner summation into two sums depending on the two cases, keeping track of the multiplicities coming from the choices for $v'$ in the second case, giving
\[ \frac{1}{p^{(n+1)/2}} \sum_{\substack{k \le n, \\ k \equiv n \ (\mathrm{mod}\ 2)}} \Big( \sum_{w \sim_{k+1} v} f(w) + p \sum_{w \sim_{k-1} v} f(w) + \delta_{k,1} f(v) \Big). \]
In the case $k = 0$ we adopt the convention that the sum over $w \sim_{-1} v$ sums over an empty set and so can be ignored. The extra term involving
\[ \delta_{k,1} = \begin{cases} 0 & k \ne 1, \\ 1 & k = 1 \end{cases} \]
corrects the multiplicity as discussed above. Shifting the summation over $k$ to a summation over $\ell = k+1$ (respectively $\ell = k-1$), we get
\[ \frac{1}{p^{(n+1)/2}} \sum_{\substack{\ell \le n+1, \\ \ell \equiv n+1 \ (\mathrm{mod}\ 2)}} \ \sum_{w \sim_\ell v} f(w) + \frac{1}{p^{(n-1)/2}} \sum_{\substack{\ell \le n-1, \\ \ell \equiv n-1 \ (\mathrm{mod}\ 2)}} \ \sum_{w \sim_\ell v} f(w) \]
which is equal to $U_{n+1}(f)(v) + U_{n-1}(f)(v)$, proving the lemma.

Fig. 10.2. In the $(p+1)$-regular tree there is always a unique geodesic path from $v$ to $w$. Here $v_0$ is the unique neighbor of $v$ on this path, and $v_1$, $v_2$ are further from $w$ than $v$ is.

The recurrence relations above are classical ones.

10.8.2 Chebyshev Polynomials of the Second Kind

Definition 10.53. The Chebyshev polynomials of the second kind are the polynomials $U_n \in \mathbb{Z}[x]$ defined recursively by $U_0(x) = 1$, $U_1(x) = 2x$, and $U_{n+1}(x) = 2xU_n(x) - U_{n-1}(x)$ for $n \ge 1$.
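The recursion in Definition 10.53 can be unrolled mechanically. As a small illustration (the function name `chebyshev_second_kind` and the representation of a polynomial as a list of integer coefficients, constant term first, are choices made here), the following sketch computes the coefficients of $U_n$.

```python
def chebyshev_second_kind(n):
    """Coefficients (constant term first) of U_n via U_{n+1} = 2x U_n - U_{n-1}."""
    prev, curr = [1], [0, 2]          # U_0 = 1, U_1 = 2x
    if n == 0:
        return prev
    for _ in range(n - 1):
        shifted = [0] + [2 * c for c in curr]               # coefficients of 2x * U_n
        padded = prev + [0] * (len(shifted) - len(prev))    # U_{n-1}, padded to the same length
        prev, curr = curr, [a - b for a, b in zip(shifted, padded)]
    return curr


U2 = chebyshev_second_kind(2)   # 4x^2 - 1
U3 = chebyshev_second_kind(3)   # 8x^3 - 4x
U4 = chebyshev_second_kind(4)   # 16x^4 - 12x^2 + 1
```

Summing the coefficients evaluates $U_n$ at $x = 1$ and gives $n + 1$, which is the borderline value between the trigonometric regime $|x| \le 1$ and the hyperbolic regime $x > 1$.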
This sequence of polynomials has the following very concrete connection to trigonometric and hyperbolic functions.

Lemma 10.54. If $x = \cos\theta$, then
\[ U_n(x) = \frac{\sin[(n+1)\theta]}{\sin\theta}, \tag{10.25} \]
and if $x = \cosh\theta$, then
\[ U_n(x) = \frac{\sinh[(n+1)\theta]}{\sinh\theta}. \]
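Both identities in Lemma 10.54 are easy to test numerically. The sketch below is only an illustration: the angles $0.7$ and $0.3$ and the function names are arbitrary choices. It evaluates $U_n$ by the defining recurrence and compares the result with the trigonometric and hyperbolic closed forms.

```python
import math


def U(n, x):
    """Evaluate U_n(x) by the recurrence U_{n+1} = 2x U_n - U_{n-1}."""
    prev, curr = 1.0, 2.0 * x
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, curr = curr, 2.0 * x * curr - prev
    return curr


def trig_form(n, theta):
    return math.sin((n + 1) * theta) / math.sin(theta)


def hyperbolic_form(n, theta):
    return math.sinh((n + 1) * theta) / math.sinh(theta)


theta_trig, theta_hyp = 0.7, 0.3
trig_errors = [abs(U(n, math.cos(theta_trig)) - trig_form(n, theta_trig)) for n in range(8)]
hyp_errors = [abs(U(n, math.cosh(theta_hyp)) - hyperbolic_form(n, theta_hyp)) for n in range(8)]
```

For $x = \cosh\theta > 1$ the values $U_n(x)$ are at least $n + 1$ and grow without bound, whereas for $x = \cos\theta$ they stay bounded by $1/|\sin\theta|$.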
The lemma is easily checked using the standard addition formulas for sin and sinh (which are identical). Proof of Lemma 10.54. First notice that if x = cos , then 2 sin cos sin 2 = = 2x, sin sin which proves the case n = 1. Now assume that the lemma holds for n 1 and for n, for some n 2. Then sin[(n + 1) + ] sin[(n + 1) ] sin[(n + 2)] = + Un1 (x) sin sin sin sin[(n + 1)] cos cos[(n + 1)] sin + = sin sin sin[(n + 1)] cos() cos[(n + 1)] sin() + + Un1 (x) sin sin = 2xUn (x) Un1 (x) = Un+1 (x), giving the case n + 1. As the addition formula for sinh is the same as the addition formula for sin , the proof of the second case is identical. The following might help to explain why we call the operators Un unitarily normalized, even though the precise normalization was to match the recurrence within Denition 10.53. Lemma 10.55. Let v0 be any vertex and let v0 2 (V ) be the standard basis vector dened by v0 (v ) = vv0 for all v V . Then 1 Un (v0 ) 1. Proof. Clearly pn/2 Un (v0 ) =
k n, w k v0 kn(mod2)
This formula relies on the assumption that we are using a tree. If there were loops then the orthogonality of the sum would not hold.
352
is a sum of orthogonal vectors, let us say Np many of them. Thus pn/2 Un (v0 ) = If n is even, then (since p + 1 pn 2p), Np .
Np = 1 + (p + 1)p + (p + 1)p3 + (p + 1)pn1 2(1 + p2 + + pn ) = 2 pn+1 1 pn . p1
If n is odd, then pn Np = (p + 1)(1 + p2 + + pn1 ) = (p + 1)(pn 1) pn p1
also, giving the lemma. Proof of Theorem 10.51. If p 2 then 2 p < p + 1 so we have to show that S 2 p, or equivalently that U1 2. Notice that this bound is precisely the constant arising in the transition from trigonometric to hyperbolic functions in Lemma 10.54. Now let (X, ) be a nite measure space, g L (X ) a real-valued function, and : 2 (V ) L2 ( X ) be a unitary isomorphism such that U1 = Mg as in Theorem 10.14. Let v0 V be arbitrary, and let f0 = (v0 ) L2 (X ) be the element corresponding to v0 2 (V ) as in Lemma 10.55. From that lemma we have Un v0 = gn f0 1, where gn is dened using the same recurrence as we used for the Chebyshev polynomials: g0 = 1, g1 = g , and gn+1 = g1 gn gn1 for all n 1. We claim that this implies that A = {x X | f0 (x) = 0} {x X | |g1 | > 2} is a null set. For the claim, notice that sinh[(n + 1)] sinh n+1 n + 1 whenever g1 > 2.
for all > 0. By Lemma 10.54 this implies that gn Therefore
This may be seen, for example, from the fact that sinh is convex for > 0, and so 1 1 (n + 1) + nn sinh[(n + 1)]. sinh = sinh n+1 +1 n+1
353
1 gn f0
{g1 >2}
2 gn |f0 |2 d
(n + 1)
{g1 >2}
2 |f 0 | d
for all n 1. However, this gives f0 = 0 almost everywhere on {g1 > 2}. On the set {g1 < 2} we can use the function (1)n gn which satises the same recurrence relation as before. This proves the claim. As we vary v0 V the functions v0 form an orthonormal basis of 2 (V ). As all of these gives rise to functions (v0 ) that vanish almost everywhere on {|g1 | > 2}, but also give a basis of L2 (X ) we deduce that ({|g1 | > 2}) = 0. g1 L 2 as required. Therefore, U1 = Mg1
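The counting in the proof of Lemma 10.55 can be checked directly for small parameters. The sketch below assumes only that the sphere of radius $k \geq 1$ in the $(p+1)$-regular tree has $(p+1)p^{k-1}$ vertices; the helper names are hypothetical. It computes $N_p$ and the resulting norm $p^{-n/2}\sqrt{N_p}$, and the explicit constant $2$ in the tested bound $N_p \leq 2p^n$ is an observation about this computation, consistent with the $\ll$ in the lemma.

```python
def sphere_size(p, k):
    """Number of vertices at distance k from a fixed vertex of the (p+1)-regular tree."""
    return 1 if k == 0 else (p + 1) * p ** (k - 1)


def count_np(p, n):
    """N_p: number of vertices w with d(w, v0) <= n and d(w, v0) = n (mod 2)."""
    return sum(sphere_size(p, k) for k in range(n % 2, n + 1, 2))


def norm_un_delta(p, n):
    """Norm of U_n applied to a basis vector, namely sqrt(N_p) / p^(n/2)."""
    return (count_np(p, n) / p ** n) ** 0.5
```

For example, `count_np(2, 2)` counts the root together with the six vertices at distance two in the $3$-regular tree.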
11 Spectral Theory of Self-Adjoint Unbounded Operators
11.1 Examples, Definitions, and the Main Theorem

In this chapter we will generalize the spectral theorem from Chapter 8 to the case of unbounded self-adjoint operators (the formal definition will be given below). The model case for such an operator is again a multiplication operator.

Example 11.1. Let $(X, \mathcal{B}, \mu)$ be a $\sigma$-finite measure space, and let $g : X \to \mathbb{R}$ be measurable. The multiplication operator $M_g : f \mapsto gf$ has natural domain
\[
D_g = D(M_g) = \{f \in L^2_\mu(X) \mid gf \in L^2_\mu(X)\}.
\]
Clearly
\[
\langle M_g(f_1), f_2 \rangle = \int_X g f_1 \overline{f_2} \, d\mu = \langle f_1, M_g(f_2) \rangle
\]
for $f_1, f_2 \in D_g$. This suggests that $M_g$ is a self-adjoint operator which is unbounded if $g \notin L^\infty_\mu(X)$ (though this statement requires a proof after we have seen the formal definitions).

Example 11.2. Let $\frac{d}{dx} : C_c^\infty(\mathbb{R}) \to H = L^2(\mathbb{R})$ be the differentiation operator, and define an operator $T$ by
\[
\operatorname{Graph}(T) = \overline{\operatorname{Graph}\bigl(\tfrac{d}{dx}\bigr)}.
\]
By Definition 3.45 and Lemma 3.48 (applied with $d = k = 1$) this defines a map $T : D(T) = H^1(\mathbb{R}) \to L^2(\mathbb{R})$.

The map $iT : D(iT) \to L^2(\mathbb{R})$ is an unbounded self-adjoint operator which is conjugate to an unbounded self-adjoint multiplication operator as in Example 11.1. The unitary isomorphism is given by the Fourier transform; indeed, by Proposition 8.31 we have for $f \in C_c^\infty(\mathbb{R})$ that
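\[
\widehat{\tfrac{d}{dx} f}(t) = 2\pi i t \widehat{f}(t).
\]
From this one can deduce (check this) that
\[
D(T) = H^1(\mathbb{R}) = \{f \in L^2(\mathbb{R}) \mid \|t \widehat{f}(t)\|_2 < \infty\}
\]
and so for $g(t) = -2\pi t$ the diagram
\[
\begin{array}{ccc}
D(iT) = D(T) & \xrightarrow{\ iT\ } & L^2(\mathbb{R}) \\
\downarrow & & \downarrow \\
D_g = \widehat{D(T)} & \xrightarrow{\ M_g\ } & L^2(\mathbb{R})
\end{array}
\]
(with the Fourier transform as vertical maps) commutes and completely describes $iT$ (and hence $T$) in terms of a multiplication operator.

The following example shows that we have to be more careful about the domain of unbounded operators, in contrast to the discussion of bounded operators.

Exercise 11.3. Let $X = (0, 1)$, $D_0 = C_c^\infty((0, 1))$, and consider again the operator
\[
\tfrac{d}{dx} : D_0 \to L^2((0, 1)).
\]
(a) Recall from Exercise 1.18 or 4.37 that the functions $x \mapsto \sin n\pi x$ for $n = 1, 2, \ldots$ form an orthogonal basis (orthonormal after multiplying by $\sqrt{2}$). Using this, show that
\[
T_0 : D(T_0) = H_0^1((0, 1)) \to L^2((0, 1)), \qquad \sum_{n=1}^\infty a_n \sin n\pi x \mapsto \sum_{n=1}^\infty \pi n a_n \cos n\pi x,
\]
defined on
\[
D(T_0) = \Bigl\{ \sum_{n=1}^\infty a_n \sin n\pi x \Bigm| \sum_{n=1}^\infty n^2 |a_n|^2 < \infty \Bigr\},
\]
extends the operator $\frac{d}{dx}$ so as to make $\operatorname{Graph}(T_0)$ a closed set.
(b) Using the fact that the family $\{x \mapsto e^{2\pi i n x} \mid n \in \mathbb{Z}\}$ forms a basis of $L^2((0, 1))$, show that
\[
T : D(T) = H^1((0, 1)) \to L^2((0, 1)), \qquad \sum_{n \in \mathbb{Z}} a_n e^{2\pi i n x} \mapsto \sum_{n \in \mathbb{Z}} 2\pi i n a_n e^{2\pi i n x},
\]
defined on
\[
D(T) = \Bigl\{ \sum_{n \in \mathbb{Z}} a_n e^{2\pi i n x} \Bigm| \sum_{n \in \mathbb{Z}} n^2 |a_n|^2 < \infty \Bigr\},
\]
extends the operator $\frac{d}{dx}$ so as to make $\operatorname{Graph}(T)$ a closed set.
(c) Show that $T_0 \neq T$ even though they agree on a dense subset of $L^2((0, 1))$.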
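To make the role of the domain in Example 11.1 concrete, here is a small sketch under an illustrative assumption that is not taken from the text: $X = \mathbb{N}$ with counting measure and $g(n) = n$. The sequence $f(n) = 1/n$ lies in $\ell^2$, but $gf$ is the constant sequence $1$, so $f \notin D_g$. The code only computes partial sums, which suggest (but of course do not prove) convergence and divergence respectively.

```python
def partial_norm_sq(seq, n_terms):
    """Partial sum sum_{n=1}^{n_terms} |seq(n)|^2 of the squared l^2 norm."""
    return sum(abs(seq(n)) ** 2 for n in range(1, n_terms + 1))


def g(n):
    """Unbounded multiplier g(n) = n (an illustrative choice)."""
    return float(n)


def f(n):
    """Square-summable sequence f(n) = 1/n."""
    return 1.0 / n


def gf(n):
    """The product g f, which is the constant sequence 1."""
    return g(n) * f(n)
```

The partial sums of $\|f\|^2$ stay below $\pi^2/6$, while those of $\|gf\|^2$ grow linearly in the number of terms.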
The examples above, and Exercise 5.22, clearly show that unbounded self-adjoint operators cannot reasonably be required to be defined on the whole Hilbert space. We will proceed by recalling Definition 5.20 and then giving related definitions. In contrast to Definition 5.20 we will in this chapter always assume that $X = H$ and $Y = H'$ are Hilbert spaces, and that the domain $D_T \subseteq H$ is dense.

Definition 11.4. Let $H$ and $H'$ be complex Hilbert spaces, let $D_T \subseteq H$ be a dense subspace, and let $T : D_T \to H'$ be a linear operator. Then we say that $T$ is a densely defined operator from $H$ to $H'$, and write $(D_T, T) : H \to H'$.

We say that $T$ is closable if $\overline{\operatorname{Graph}(T)}$ is again the graph of a densely defined operator $(D_{\overline{T}}, \overline{T}) : H \to H'$. We say that $T$ is closed if $\operatorname{Graph}(T)$ is closed. If $(D_T, T) : H \to H'$ and $(D_S, S) : H \to H'$ are densely defined operators, then we say that $T$ is equal to $S$ if $D_T = D_S$ and $T = S$, and say that $S$ is an extension of $T$, written $T \subseteq S$, if $D_T \subseteq D_S$ and $S|_{D_T} = T$. The adjoint of a densely defined operator will be defined in the next lemma.

Lemma 11.5 (Adjoint operator). Let $(D_T, T) : H \to H'$ be a densely defined closable operator of complex Hilbert spaces. Then there exists a densely defined closed operator $(D_{T^*}, T^*) : H' \to H$, called the adjoint, satisfying
\[
\langle Tv, w \rangle_{H'} = \langle v, T^* w \rangle_H
\]
for all $v \in D_T$ and $w \in D_{T^*}$. Moreover, the adjoint of the adjoint, $(T^*)^*$, is equal to the closure $\overline{T}$ of the operator $T$.

Definition 11.6. Let $(D_T, T) : H \to H$ be a densely defined operator on a complex Hilbert space. If $T^* = T$ then $T$ is said to be self-adjoint. (As usual this also entails equality of the domains, so $D(T^*) = D_T$.)

We will base our discussion of the spectral theory of self-adjoint operators on the following lemma.

Lemma 11.7 (Orthogonal decomposition into two graphs). Let $(D_T, T) : H \to H'$ be a closed densely defined operator between two complex Hilbert spaces. The orthogonal complement of the closed set $\operatorname{Graph}(T) \subseteq H \times H'$ is given by $\sigma \operatorname{Graph}(-T^*)$, where $\sigma : H' \times H \to H \times H'$ is the map $(w, v) \mapsto (v, w)$.
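Proof of Lemma 11.5 and 11.7. Let $(D_T, T) : H \to H'$ be a densely defined closable operator as in Lemma 11.5. We define
\[
D_{T^*} = \{w \in H' \mid D_T \ni v \mapsto \langle Tv, w \rangle_{H'} \text{ is bounded}\}.
\]
Notice that if $w \in H'$ and the linear map $v \mapsto \langle Tv, w \rangle_{H'}$ is bounded then it can be uniquely extended from the dense subset to $H$, so that by Fréchet-Riesz representation (Corollary 2.71) there is a uniquely defined $T^* w \in H$ with
\[
\langle Tv, w \rangle_{H'} = \langle v, T^* w \rangle_H \tag{11.1}
\]
for all $v \in D_T$. It is easy to check that $D_{T^*}$ is a linear subspace and that $T^* : D_{T^*} \to H$ is linear. Now let us prove that
\[
\operatorname{Graph}(T)^\perp = \sigma \operatorname{Graph}(-T^*), \tag{11.2}
\]
which in particular will imply Lemma 11.5 (by Corollary 2.77). Let $w \in D_{T^*}$ so that (11.1) holds for all $v \in D_T$. By definition, $(-T^* w, w) \in \sigma \operatorname{Graph}(-T^*)$ and
\[
\langle (v, Tv), (-T^* w, w) \rangle_{H \times H'} = -\langle v, T^* w \rangle_H + \langle Tv, w \rangle_{H'} = 0
\]
for all $v \in D_T$. On the other hand, if $(v', w) \in \operatorname{Graph}(T)^\perp$ so that
\[
\langle (v, Tv), (v', w) \rangle_{H \times H'} = \langle v, v' \rangle_H + \langle Tv, w \rangle_{H'} = 0
\]
for all $v \in D_T$, then $D_T \ni v \mapsto \langle Tv, w \rangle_{H'}$ is bounded and so $w \in D_{T^*}$, $v' = -T^* w$, and
\[
(v', w) = (-T^* w, w) \in \sigma \operatorname{Graph}(-T^*).
\]
Now let us prove that $(D_{T^*}, T^*) : H' \to H$ is also densely defined. Suppose it is not, so that $\overline{D_{T^*}}$ is a proper subspace of $H'$. Then there exists some non-zero $w_0 \in (D_{T^*})^\perp$. Notice that this implies that
\[
\langle (0, w_0), (-T^* w, w) \rangle_{H \times H'} = 0
\]
for all $w \in D_{T^*}$, or equivalently $(0, w_0) \in (\sigma \operatorname{Graph}(-T^*))^\perp$. By (11.2) and Corollary 2.77 this gives
\[
(0, w_0) \in (\operatorname{Graph}(T)^\perp)^\perp = \overline{\operatorname{Graph}(T)} = \operatorname{Graph}(\overline{T}).
\]
Since $w_0 \neq 0$, this contradicts the assumption that the closed operator $\overline{T}$ exists. For the final remark, note that by (11.2) we have
\[
\operatorname{Graph}(\overline{T})^\perp = \operatorname{Graph}(T)^\perp = \sigma \operatorname{Graph}(-T^*).
\]
Applying this to $T^*$ (and noting that the operator $\sigma$ is unitary and an involution), we see that
\[
\operatorname{Graph}((T^*)^*) = \sigma\bigl(\operatorname{Graph}(-T^*)^\perp\bigr) = \bigl(\sigma \operatorname{Graph}(-T^*)\bigr)^\perp = \bigl(\operatorname{Graph}(T)^\perp\bigr)^\perp = \overline{\operatorname{Graph}(T)} = \operatorname{Graph}(\overline{T}).
\]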
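In finite dimensions the orthogonality in (11.2) reduces to a one-line computation, which the following sketch carries out. The matrix `T` (a real $3 \times 2$ matrix, so that $H = \mathbb{R}^2$, $H' = \mathbb{R}^3$ and the adjoint is the transpose) and all helper names are illustrative assumptions; the unbounded case of course needs the domain considerations from the proof.

```python
def mat_vec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    """Transpose, which is the adjoint of a real matrix."""
    return [list(col) for col in zip(*m)]


def dot(u, v):
    """Euclidean inner product, here on H x H' = R^2 x R^3."""
    return sum(a * b for a, b in zip(u, v))


T = [[1, 2], [0, 3], [-1, 4]]  # sample operator R^2 -> R^3


def graph_point(v):
    """The point (v, Tv) of Graph(T)."""
    return list(v) + mat_vec(T, v)


def flipped_adjoint_point(w):
    """The point (-T^* w, w), that is, sigma applied to (w, -T^* w)."""
    return [-c for c in mat_vec(transpose(T), w)] + list(w)
```

Since $\operatorname{Graph}(T)$ has dimension $2$ and the flipped graph of $-T^*$ has dimension $3$ inside $\mathbb{R}^5$, orthogonality of all such pairs already forces the two subspaces to be orthogonal complements of each other.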
11.2 Operators of the form $T^*T$

We mention (just as motivation for what follows) that differentiation can often be used to define a closed operator $T$ which sends a function to its total derivative. Moreover, $T^*$ is then often the negative of the divergence on vector fields, so that $T^*T$ is often some kind of Laplace operator. This observation motivates the following construction.

Theorem 11.8 (Spectral theory of $T^*T$). Let $(D_T, T) : H \to H'$ be a densely defined closed linear operator. Then $(D_{T^*T}, T^*T) : H \to H$ is a self-adjoint operator which is unitarily isomorphic to a multiplication operator
\[
(D_{M_h}, M_h) : L^2_\mu(X) \to L^2_\mu(X)
\]
for some finite measure space $(X, \mu)$ and some measurable function $h : X \to [0, \infty)$.

Proof. The proof of the theorem essentially comprises a careful analysis of Figure 11.1. Let us write $P_{\operatorname{Graph}(T)} : H \times H' \to H \times H'$ for the orthogonal projection onto the closed subspace $\operatorname{Graph}(T) \subseteq H \times H'$, $\iota_H : H \to H \times H'$ for the embedding map $v \mapsto (v, 0)$, and $P_H : H \times H' \to H$ for the projection map $(v, w) \mapsto v$. Note that
\[
\langle \iota_H(v), (v', w') \rangle_{H \times H'} = \langle v, v' \rangle_H = \langle v, P_H(v', w') \rangle_H
\]
so that $\iota_H^* = P_H$. Also note that $P_{\operatorname{Graph}(T)}^* = P_{\operatorname{Graph}(T)}$.

Fig. 11.1. We obtain a bounded operator $B : H \to H$ sending $v$ to $w$ by two orthogonal projections (and one embedding). (The figure shows $\operatorname{Graph}(T)$ with the point $(w, Tw)$, the points $(w, 0)$ and $(v, 0)$, and the subspace $\operatorname{Graph}(T)^\perp = \sigma \operatorname{Graph}(-T^*)$.)

Now define $B = P_H P_{\operatorname{Graph}(T)} \iota_H$, so that
\[
B^* = \iota_H^* P_{\operatorname{Graph}(T)}^* P_H^* = P_H P_{\operatorname{Graph}(T)} \iota_H = B
\]
is self-adjoint. Moreover,
\[
\|B\| \leq \|P_H\| \, \|P_{\operatorname{Graph}(T)}\| \, \|\iota_H\| = 1.
\]
Also, by definition,
\[
\langle Bv, v \rangle = \langle P_H P_{\operatorname{Graph}(T)} \iota_H(v), v \rangle = \langle P_{\operatorname{Graph}(T)} \iota_H(v), P_{\operatorname{Graph}(T)} \iota_H(v) \rangle \geq 0.
\]
To summarize, $B : H \to H$ is a self-adjoint bounded operator with spectrum in $[0, 1]$. We now relate $B$ to $T^*T$, after which we can simply apply Theorem 10.31 to $B$ and obtain the spectral theorem for $T^*T$. In fact, we claim that $B = (I + T^*T)^{-1}$, or more precisely that
(a) $(I + T^*T)B = I$, and
(b) $B(I + T^*T) = I|_{D_{T^*T}}$ and, in particular, $D_{T^*T} \subseteq \operatorname{Im} B$.

Together this implies that $D_{T^*T} = \operatorname{Im} B$, that $B$ is injective, and finally that $T^*T = B^{-1} - I$ is completely determined by the operator $B$.

To prove (a) we chase the equations defining $B$ (see Figure 11.1). Let $v \in H$ and $w = Bv$ so that (by definition) $w \in D_T$, $(w, Tw) \in \operatorname{Graph}(T)$, and
\[
(w, Tw) - (v, 0) \in \operatorname{Graph}(T)^\perp = \sigma \operatorname{Graph}(-T^*).
\]
This gives
\[
(w, Tw) - (v, 0) = (w - v, Tw) = (-T^*Tw, Tw),
\]
so $w \in D_{T^*T}$ and
\[
(I + T^*T)Bv = w + T^*Tw = v.
\]
To prove (b), we essentially use the same formulas. Fix $w \in D_{T^*T}$ and define $v = w + T^*Tw$. Then $(w, Tw) \in \operatorname{Graph}(T)$, $(T^*Tw, -Tw) \in \sigma \operatorname{Graph}(-T^*)$, and
\[
(v, 0) = (w, Tw) + (T^*Tw, -Tw),
\]
which implies that $w = Bv = B(I + T^*T)w$ as claimed.

Now apply Theorem 10.31 to $B$ to find a finite measure space $(X, \mu)$, some function $g \in L^\infty_\mu(X)$, and a unitary isomorphism $\phi : H \to L^2_\mu(X)$ so that $B$ and $M_g$ are unitarily isomorphic via $\phi$. Since $B$ is injective, $g \neq 0$ $\mu$-almost everywhere. Using the same isomorphism we claim that $T^*T$ is isomorphic to $M_h$ for $h = \frac{1}{g} - 1$. Indeed,
\[
\phi(D_{T^*T}) = \phi(\operatorname{Im} B) = \operatorname{Im}(M_g) = \{f \in L^2_\mu(X) \mid g^{-1} f \in L^2_\mu(X)\} = D_{M_h},
\]
and since $T^*Tw = B^{-1}w - w$ for all $w \in D_{T^*T}$ we also see that
\[
\phi(T^*Tw) = \phi(B^{-1}w) - \phi(w) = M_g^{-1}\phi(w) - \phi(w) = M_h \phi(w).
\]

Exercise 11.9. (The influence of the domain) Let $X = (0, 1)$ and $H = L^2((0, 1))$.
(a) Let $(D_{T_0}, T_0) = \bigl(H_0^1((0, 1)), \frac{d}{dx}\bigr)$ be the weak derivative map restricted to $H_0^1((0, 1))$. Show that $T_0^* T_0$ coincides with $-\Delta = -\frac{d^2}{dx^2}$ on $C_c^\infty((0, 1))$, and that its eigenfunctions are the functions $x \mapsto \sin n\pi x$ for $n = 1, 2, \ldots$. More precisely, $D_{T_0^* T_0} = H_0^1((0, 1)) \cap H^2((0, 1))$ (these are the Dirichlet boundary conditions).
(b) Let $(D_T, T) = \bigl(H^1((0, 1)), \frac{d}{dx}\bigr)$. Show that $T^*T$ coincides with $-\Delta$ on $C_c^\infty((0, 1))$, and that its eigenfunctions are the functions $x \mapsto \cos n\pi x$ for $n = 0, 1, 2, \ldots$. More precisely,
\[
D_{T^*T} = \{f \in H^2((0, 1)) \mid f' \in H_0^1((0, 1))\},
\]
which are the Neumann boundary conditions.
(c) Let $(D_{T_p}, T_p) = \bigl(H^1(\mathbb{T}), \frac{d}{dx}\bigr)$. Show that $T_p^* T_p$ coincides with $-\Delta$ on $H^2(\mathbb{T})$.

11.3 Self-Adjoint Operators
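Theorem 11.10. Let $(D_T, T) : H \to H$ be a self-adjoint densely defined operator. Then there exists a finite measure space $(X, \mu)$ and a real-valued measurable function $h : X \to \mathbb{R}$ such that $(D_T, T)$ is unitarily isomorphic to $(D_{M_h}, M_h)$, meaning that there is a unitary isomorphism $\phi : H \to L^2_\mu(X)$ with $\phi(D_T) = D_{M_h}$ such that the diagram
\[
\begin{array}{ccc}
D_T \subseteq H & \xrightarrow{\ T\ } & H \\
\downarrow \phi & & \downarrow \phi \\
D_{M_h} \subseteq L^2_\mu(X) & \xrightarrow{\ M_h\ } & L^2_\mu(X)
\end{array}
\]
commutes.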
Since a self-adjoint operator T as in Theorem 11.10 is also closed, it is clear that we can also apply the method of the previous section to T . Note, however, that a simple application of Theorem 11.8 only gives a description of T 2 , which does not allow a description of T . In fact T 2 has a potentially smaller domain, and may have lost some information about T (namely the sign of eigenvalues or approximate eigenvalues). To compensate for that we will study two operators: B as in the previous section, and A = T B , as in Figure 11.2.
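Fig. 11.2. For the proof of Theorem 11.10 we study the operators $A$ and $B$. (The figure shows $\operatorname{Graph}(T)$ with the point $(w, Tw)$ and its two coordinate projections $(w, 0) = (Bv, 0)$ and $(0, Tw) = (0, Av)$, together with the point $(v, 0)$.)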
Proof of Theorem 11.10. Let $B = (I + T^*T)^{-1} = (I + T^2)^{-1}$ be as in the proof of Theorem 11.8. We also define
\[
A = TB = P_{H,2} P_{\operatorname{Graph}(T)} \iota_H,
\]
where $P_{H,2}(v, w) = w$ is the projection to the second copy of $H$ in $H \times H$, see Figure 11.2. Below we will apply Theorem 10.37 to the bounded operators $A$ and $B$, and to do this we first have to show that $A$ is normal (in fact it is self-adjoint; this is something we already know for $B$ by the proof of Theorem 11.8), and that $A$ and $B$ commute.

To prepare for this, we first claim that $TB \supseteq BT$. To prove the claim, fix $w \in D_T$ and define $v = Bw$ so that $(I + T^2)v = w$. Since $w \in D_T$ this shows that $v \in D_{T^3}$ and
\[
(I + T^2)Tv = Tw,
\]
or equivalently that $Tv = BTw$. Since $v = Bw$, we have shown that $TBw = BTw$ for all $w \in D_T$, and hence the claim.

To prove that $A^* = A$ for $A = TB$ we argue as follows. For $w \in D_T$ and $v \in H$ we have
\[
\langle Av, w \rangle = \langle TBv, w \rangle = \langle Bv, Tw \rangle = \langle v, BTw \rangle = \langle v, Aw \rangle
\]
by the claim above. Thus $A^* w = Aw$ for all $w \in D_T$, which is a dense subset of $H$. It follows that $A^* = A$. Moreover,
\[
AB = TBB \supseteq BTB = BA
\]
by the claim. Since both $AB$ and $BA$ are defined on all of $H$ this shows that $AB = BA$.

Next we have to show that $A$ and $B$ together uniquely determine $T$ (so that when $A$, $B$ are realized as multiplication operators we have some hope of deducing a similar realization for $T$). We claim that
\[
T = B^{-1}A \quad \text{and} \quad D_T = \{v \in H \mid Av \in \operatorname{Im} B\}.
\]
To see this, suppose first that $v \in D_T$ so that
\[
B^{-1}Av = B^{-1}TBv = B^{-1}BTv = Tv
\]
by the previous claim that $TB \supseteq BT$. The converse is more involved, and relies more directly on the assumption that $T$ is self-adjoint. Let $v \in D_{B^{-1}A}$. By definition of $\operatorname{Graph}(T)$ and the construction of $B$ in the proof of Theorem 11.8 we have
\[
(Bv, TBv) \in \operatorname{Graph}(T), \tag{11.3}
\]
\[
(Bv - v, TBv) \in \operatorname{Graph}(T)^\perp
\]
for any $v \in H$. Replacing the latter instance of $v$ with $B^{-1}Av$ (which exists by the assumption that $v \in D_{B^{-1}A}$) shows that
\[
(Av - B^{-1}Av, TAv) \in \operatorname{Graph}(T)^\perp.
\]
Now recall that $\operatorname{Graph}(T)^\perp = \sigma \operatorname{Graph}(-T)$ so that we have equivalently
\[
(T^2 Bv, -TBv + B^{-1}Av) \in \operatorname{Graph}(T). \tag{11.4}
\]
Fig. 11.3. As the proof of Theorem 11.10 shows, the two marked segments are translates of each other. (The figure shows $\operatorname{Graph}(T)$ and $\operatorname{Graph}(T)^\perp$ together with the points $(Bv, 0)$, $(v, 0)$, $(0, Av)$, and $(0, B^{-1}Av)$.)
Taking the sum of (11.3) and (11.4) and using the identity $(I + T^2)B = I$ gives
\[
(v, B^{-1}Av) \in \operatorname{Graph}(T).
\]
Thus $v \in D_T$ and $Tv = B^{-1}Av$ as claimed (see Figure 11.3).

Now we apply Theorem 10.37 to $A$ and $B$ to obtain a finite measure space $(X, \mu)$, a unitary isomorphism $\phi : H \to L^2_\mu(X)$, and two functions $g_A : X \to \mathbb{R}$ and $g_B : X \to (0, \infty)$ such that $A$ and $B$ are conjugated to $M_{g_A}$ and $M_{g_B}$ respectively. Since we have shown that $D_T$ and $T$ are purely defined in terms of $A$ and $B$, we can use the same unitary isomorphism to describe $(D_T, T)$ as follows:
\[
\phi(D_T) = \phi(\{v \in H \mid Av \in \operatorname{Im} B\}) = \{f \in L^2_\mu(X) \mid M_{g_A}(f) \in \operatorname{Im} M_{g_B}\} = \Bigl\{f \in L^2_\mu(X) \Bigm| \frac{g_A}{g_B} f \in L^2_\mu(X)\Bigr\} = D_{M_h},
\]
where we set $h = \frac{g_A}{g_B}$. Moreover,
\[
\phi(Tv) = \phi(B^{-1}Av) = M_{g_B}^{-1} M_{g_A} \phi(v) = M_h \phi(v)
\]
for all $v \in D_T$.
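The relations between $T$, $A$, $B$ and the multipliers $g_A$, $g_B$, $h$ become transparent at a single spectral value. The sketch below assumes, purely for illustration, that $T$ acts as multiplication by a real number $\lambda$ (for example on one eigenvector); then $B = (I + T^2)^{-1}$ and $A = TB$ act as multiplication by $g_B = 1/(1+\lambda^2)$ and $g_A = \lambda/(1+\lambda^2)$. The function names are hypothetical.

```python
from fractions import Fraction


def multipliers(lam):
    """Return (g_A, g_B) for a spectral value lam of T, using exact arithmetic."""
    lam = Fraction(lam)
    g_b = 1 / (1 + lam ** 2)   # multiplier of B = (I + T^2)^{-1}
    g_a = lam * g_b            # multiplier of A = T B
    return g_a, g_b


def recover_h(g_a, g_b):
    """Recover the multiplier h = g_A / g_B of T, as in the proof of Theorem 11.10."""
    return g_a / g_b


def recover_h_squared(g_b):
    """Recover the multiplier 1/g - 1 of T^*T = T^2, as in the proof of Theorem 11.8."""
    return 1 / g_b - 1
```

Note that $g_B$ alone only determines $\lambda^2$, which matches the remark that $T^2$ loses the sign of $\lambda$, while the pair $(g_A, g_B)$ recovers $\lambda$ itself.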
Appendix A: Set Theory and Topology
A.1 Set Theory and Axiom of Choice

We will be using naive set theory, and in particular will use without specific reference the axioms ZF, ZFC of set theory. This does entail some caution. For example, it does not permit there to be a set that contains all sets, for if there were such a universal set $V$ then its subset
\[
C = \{A \in V \mid A \notin A\}
\]
forces the contradictory statement $C \in C \iff C \notin C$. Here are some basic properties of sets that we will use without comment.
(1) A set will never contain itself.
(2) For every set $S$ of sets there is a set $\bigcup_{A \in S} A$, the union, containing all elements that are contained in some $A \in S$.
(3) For every set $A$ there is a power set $\mathcal{P}(A)$ containing all subsets of $A$.
(4) Any subset of a set is a set.

Examples of sets include the empty set $\emptyset$, the natural numbers $\mathbb{N}$, the real numbers $\mathbb{R}$, the set of functions $\mathbb{R} \to \mathbb{C}$, which may also be written as $\mathbb{C}^{\mathbb{R}}$, and so on. The following axiom of set theory is more controversial than those above, but it plays a central role in analysis.

Axiom of Choice. Suppose that $A_\alpha$ is a non-empty set for all $\alpha \in I$. Then there is a function
\[
f : I \to \bigcup_{\alpha \in I} A_\alpha
\]
with $f(\alpha) \in A_\alpha$ for all $\alpha \in I$. In other words, the product space $\prod_{\alpha \in I} A_\alpha$ (which by definition contains all such functions) is non-empty.
While this axiom appears quite innocent (indeed, it appears almost obvious), it turns out to have a number of exotic consequences(11). The axiom of choice has many equivalent formulations, one of which is Zorn's lemma, which is particularly useful in analysis. In order to state this, recall that a partial order on a set $S$ is a relation $\prec$ with the property that
\[
a \prec b, \ b \prec c \implies a \prec c
\]
for all $a, b, c \in S$. A partial order is a linear order if for every pair $a, b \in S$ we have either $a = b$ or $a \prec b$ or $b \prec a$. A maximal element in a partially ordered set $(S, \prec)$ is an element $m \in S$ for which there is no $a \in S$ with $m \prec a$.

Zorn's lemma. Let $(S, \prec)$ be a partially ordered set, and suppose that for every linearly ordered subset $L \subseteq S$ without a maximal element there exists an element $m \in S$ with $\ell \prec m$ for all $\ell \in L$. Then there exists a maximal element $m \in S$.

One might imagine setting out to prove Zorn's lemma inductively along the following lines. Starting with a single element (which certainly forms a linearly ordered set) one can build larger and larger linearly ordered subsets. If the current linearly ordered subset $L$ has a maximal element, then it may also be a maximal element for $S$, in which case we are done. Otherwise, one can add an element to $L$ which is bigger than the maximal element of $L$. If $L$ has no maximal element then we can (by assumption) add an element larger than every element of $L$. Repeating this inductively (by transfinite induction, and noting that this procedure only ends once a maximal element in $S$ is found), Zorn's lemma follows. However, in the course of the proof one has to make (potentially uncountably) many choices, and doing this carefully reveals that the argument needs the axiom of choice.
A.2 Basic Definitions

The notion of open set is fundamental for defining continuity and convergence. For any set $X$ we write $\mathcal{P}(X)$ for the set of all subsets of $X$.

Definition A.1. Let $X$ be a set. A family $\mathcal{T} \subseteq \mathcal{P}(X)$ of subsets of $X$ is called a topology on $X$ if
- $\emptyset, X \in \mathcal{T}$;
- if $O_1, O_2 \in \mathcal{T}$ then $O_1 \cap O_2 \in \mathcal{T}$;
- if $O_i \in \mathcal{T}$ for all $i \in I$, where $I$ is an arbitrary index set, then $\bigcup_{i \in I} O_i \in \mathcal{T}$.

The pair $(X, \mathcal{T})$ is called a topological space. The elements of a topology are called open sets and a set $A \subseteq X$ with $X \setminus A \in \mathcal{T}$ is called closed. A set that is both open and closed is called a clopen set.
Given a point $x$ in a topological space, a neighborhood of $x$ is a set $V$ containing an open set $U$ that contains $x$. We will usually want to assume that neighborhoods are open sets, and with this convention a neighborhood of $x$ is an open set containing $x$.

Many of the topological spaces that we will study are particularly well-behaved ones arising from a metric space structure.

Definition A.2. A function $d : X \times X \to \mathbb{R}$ is called a metric if it satisfies the following properties:
- $d(x, y) \geq 0$ and $d(x, y) = 0$ if and only if $x = y$, for all $x, y \in X$ (strict positivity);
- $d(x, y) = d(y, x)$ for all $x, y \in X$ (symmetry);
- $d(x, y) \leq d(x, z) + d(z, y)$ for all $x, y, z \in X$ (triangle inequality).

The pair $(X, d)$ is called a metric space. A set $O \subseteq X$ in a metric space is called open if for any $x \in O$ there is some $\varepsilon > 0$ such that
\[
B_\varepsilon(x) = \{y \in X \mid d(x, y) < \varepsilon\} \subseteq O.
\]
The set $B_\varepsilon(x)$ is called an open $\varepsilon$-ball around $x$. It is easy to check that the collection of all open sets in a metric space defines a topology on the metric space. If instead of strict positivity we only have
\[
d(x, y) \geq 0 \text{ for all } x, y \in X \quad \text{(positivity)}
\]
(together with $d(x, x) = 0$ for all $x \in X$), then we say that $d$ is a pseudo-metric.

Definition A.3. A function $f : X \to Y$ between two topological spaces $(X, \mathcal{T}_X)$ and $(Y, \mathcal{T}_Y)$ is continuous if $f^{-1}(O) \in \mathcal{T}_X$ for all $O \in \mathcal{T}_Y$.

Definition A.4. Let $X$ be a set and suppose that $\mathcal{T}_1$ and $\mathcal{T}_2$ are two topologies on $X$. If the identity map $\mathrm{id} : X \to X$ viewed as a map from $(X, \mathcal{T}_1)$ to $(X, \mathcal{T}_2)$ is continuous, then $\mathcal{T}_2$ is said to be weaker or coarser than $\mathcal{T}_1$, and $\mathcal{T}_1$ is called stronger or finer than $\mathcal{T}_2$.

Notice that we can describe the same relation between topologies by saying that $\mathcal{T}_2$ is weaker than $\mathcal{T}_1$ if $\mathcal{T}_2 \subseteq \mathcal{T}_1$.
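On a finite set the definitions of this section can be checked mechanically. The following sketch (with hypothetical helper names) tests the axioms of Definition A.1, continuity in the sense of Definition A.3, and the coarser relation of Definition A.4. For a finite family, closure under pairwise unions already gives closure under arbitrary unions, which is the only reason the check is this short.

```python
from itertools import combinations


def is_topology(points, opens):
    """Check the axioms of a topology for a finite family of subsets of points."""
    X = frozenset(points)
    T = {frozenset(o) for o in opens}
    if frozenset() not in T or X not in T:
        return False
    if any(not o <= X for o in T):
        return False
    # Pairwise closure suffices because the family is finite.
    return all((a & b) in T and (a | b) in T for a, b in combinations(T, 2))


def is_continuous(f, points_x, opens_x, opens_y):
    """Check that the preimage under f of every open set of Y is open in X."""
    TX = {frozenset(o) for o in opens_x}
    return all(frozenset(x for x in points_x if f(x) in o) in TX for o in opens_y)


def is_coarser(t2, t1):
    """Check that the topology t2 is weaker (coarser) than t1, that is, t2 is a subset of t1."""
    return {frozenset(o) for o in t2} <= {frozenset(o) for o in t1}
```

For instance, on $X = \{1, 2, 3\}$ the identity map from the chain topology $\{\emptyset, \{1\}, \{1,2\}, X\}$ to the trivial topology is continuous, while the identity in the other direction is not.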
A.3 Convergence and Continuity

As is well-known from analysis, we say that a sequence $(x_n)$ in a topological space $X$ converges to $x$, written $\lim_{n \to \infty} x_n = x$, if for every neighborhood $U$ of $x$ there exists some $N$ such that $x_n \in U$ for all $n > N$. (If the topology is given by a metric $d$, then this is equivalent to the property that for any $\varepsilon > 0$ there is some $N$ such that $n > N \implies d(x_n, x) < \varepsilon$.) While this notion is sufficient for metric spaces (sufficiency means, for example, that we can characterize continuity for functions between metric spaces using convergence of sequences), convergence of sequences is not adequate for more general topological spaces. The following notions of filters and convergent filters are sufficient for any topological space. Recall that we write $\mathcal{P}(X)$ for the set of all subsets of $X$ (this is also called the power set of $X$).

Definition A.5. Let $X$ be a set. A family $\mathcal{F} \subseteq \mathcal{P}(X)$ of subsets of $X$ is a filter if
- $X \in \mathcal{F}$ but $\emptyset \notin \mathcal{F}$;
- if $F_1, F_2 \in \mathcal{F}$ then $F_1 \cap F_2 \in \mathcal{F}$; and
- if $F \in \mathcal{F}$ and $F \subseteq B \subseteq X$, then $B \in \mathcal{F}$.

Example A.6. (a) Let $X = (X, \mathcal{T})$ be a topological space and $x \in X$. Then
\[
\mathcal{U}_x = \{U \in \mathcal{T} \mid x \in U\}
\]
is a filter, called the neighborhood filter.
(b) Let $X = \mathbb{N}$ and set
\[
\mathcal{F} = \{B \subseteq \mathbb{N} \mid \text{there exists some } N > 0 \text{ with } n \in B \text{ for all } n > N\}.
\]
Then $\mathcal{F}$ is a filter, called the tail filter.
(c) While this is not needed here, we mention that a directed set (as in the definition of nets) gives rise to a generalization of tail filters.

Definition A.7. Let $\mathcal{F}_1, \mathcal{F}_2 \subseteq \mathcal{P}(X)$ be filters on a set $X$. Then $\mathcal{F}_1$ is finer than $\mathcal{F}_2$, or $\mathcal{F}_2$ is coarser than $\mathcal{F}_1$, if $\mathcal{F}_1 \supseteq \mathcal{F}_2$.

Definition A.8. Let $X$ be a topological space, and let $\mathcal{F} \subseteq \mathcal{P}(X)$ be a filter. We say that $\mathcal{F}$ converges to $x \in X$, written $x = \lim \mathcal{F}$, if $\mathcal{F}$ is finer than the neighborhood filter $\mathcal{U}_x$.

Exercise A.9. Let $X$ be a Hausdorff topological space, and let $\mathcal{F} \subseteq \mathcal{P}(X)$ be a filter. Show that the limit $\lim \mathcal{F}$ is unique if it exists.

Definition A.10. Let $M$ be a set, $\mathcal{F} \subseteq \mathcal{P}(M)$ a filter, $X$ a topological space, and $f : M \to X$ a map. We say that $f$ converges along $\mathcal{F}$ to $x \in X$, written as $\lim_{\mathcal{F}} f = x$, if the image filter
\[
f(\mathcal{F}) = \{B \subseteq X \mid \text{there exists some } A \in \mathcal{F} \text{ with } f(A) \subseteq B\}
\]
is finer than $\mathcal{U}_x$ (that is, the image filter converges to $x$).
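The filter axioms of Definition A.5 and the finer relation of Definition A.7 can be explored on a small finite set. The sketch below uses the family of all supersets of a fixed non-empty set (often called a principal filter; this example and the helper names are illustrative additions, not taken from the text).

```python
from itertools import combinations


def powerset(points):
    """All subsets of a finite set, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def is_filter(points, family):
    """Check the three filter axioms for a family of subsets of a finite set."""
    X = frozenset(points)
    F = {frozenset(a) for a in family}
    if X not in F or frozenset() in F:
        return False
    if any((a & b) not in F for a in F for b in F):
        return False
    # Upward closure: every superset of a member must be a member.
    return all(b in F for a in F for b in powerset(X) if a <= b)


def principal_filter(points, core):
    """The family of all subsets of points that contain the set core."""
    core = frozenset(core)
    return {b for b in powerset(points) if core <= b}


def is_finer(f1, f2):
    """Check that f1 is finer than f2, that is, f1 contains f2."""
    return set(f2) <= set(f1)
```

On $X = \{1, 2, 3\}$ the filter of supersets of $\{1\}$ is finer than the filter of supersets of $\{1, 2\}$, since every set containing $\{1, 2\}$ also contains $\{1\}$.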
Exercise A.11. Let $M = \mathbb{N}$, let $X$ be a topological space, and let $f : \mathbb{N} \to X$ be the function corresponding to the sequence $(f(n))$. Show that $\lim_{n \to \infty} f(n) = \lim_{\mathcal{F}} f$, where $\mathcal{F}$ is the tail filter from Example A.6(b).

Exercise A.12. Let $X, Y$ be topological spaces, and let $f : X \to Y$ be a map. Show that $f$ is continuous if for all $x \in X$ we have $\lim_{\mathcal{U}_x} f = f(x)$, where $\mathcal{U}_x$ is the neighborhood filter from Example A.6(a).

A.4 Inducing Topologies
If $(X, \mathcal{T})$ is a topological space and $Y \subseteq X$ is any subset, then the topology on $Y$ induced from the topology on $X$ is the weakest topology on $Y$ for which the inclusion map $Y \to X$ is continuous. Equivalently, the induced topology on $Y$ is $\{Y \cap O \mid O \in \mathcal{T}\}$.

Suppose that $f : X \to Y$ is a map between two sets. If $Y$ has a topology $\mathcal{T}_Y$, then there is a weakest topology on $X$ which makes $f$ continuous. This topology is given by
\[
f^{-1}(\mathcal{T}_Y) = \{f^{-1}(O) \mid O \in \mathcal{T}_Y\}.
\]
If on the other hand $X$ has a topology $\mathcal{T}_X$, then there is a strongest topology on $Y$ which makes $f$ continuous. It is given by
\[
\{O \subseteq Y \mid f^{-1}(O) \in \mathcal{T}_X\}.
\]
The former case has an important generalization as follows.

Definition A.13. Let $X$ be a set, and let $f_\alpha : X \to Y_\alpha$ for $\alpha \in I$ be a family of maps from $X$ to topological spaces $(Y_\alpha, \mathcal{T}_{Y_\alpha})$. Then the initial topology induced by these maps is the weakest topology for which all of the maps are continuous. (This topology is also known as the weak, limit, or projective topology.)

The open sets in the initial topology are arbitrary unions of finite intersections of elements of $f_\alpha^{-1}(\mathcal{T}_{Y_\alpha})$ for various $\alpha \in I$. The initial topology can also be characterized by the following universal property. A function $g : Z \to X$ is continuous if and only if $f_\alpha \circ g : Z \to Y_\alpha$ is continuous for each $\alpha \in I$.

A particular case of the initial topology is the product topology.

Definition A.14. Suppose that $(Y_\alpha, \mathcal{T}_\alpha)$ for $\alpha \in I$ is a collection of topological spaces. Define
\[
X = \prod_{\alpha \in I} Y_\alpha.
\]
The product topology on $X$ is the initial topology induced by the projection maps
\[
\pi_\alpha : (y_\beta)_{\beta \in I} \mapsto y_\alpha
\]
from $X$ to $Y_\alpha$ for all $\alpha \in I$.

Another case is given by the topology generated by a family of topologies. Suppose that $X$ is a set, and for all $\alpha \in I$ we have a topology $\mathcal{T}_\alpha$ on $X$. Then we may consider the identity map $\mathrm{id} : X \to X$ as a map from $X$ to the topological space $(X, \mathcal{T}_\alpha)$ for each $\alpha \in I$, and associate to $X$ the weakest topology that is finer than all the topologies $\mathcal{T}_\alpha$ for $\alpha \in I$.

Notice that the product topology, or the weakest topology that is finer than a given family of topologies, may not be metric (that is, derived from a metric) even if the original topologies were metric. However, there is a special situation in which the metric property is preserved by taking products.

Lemma A.15. Let $X$ be a set and suppose that $d_n : X \times X \to \mathbb{R}$ is a sequence of pseudo-metrics. Then the weakest topology that is finer than the topologies induced by $d_n$ for $n \in \mathbb{N}$ is itself induced by a pseudo-metric. In particular, the countable product of metric spaces is a metric space in the product topology.

Proof. For the main part of the argument it is important to know that we may assume that $d_n$ only takes on values in $[0, 1]$. To see this, we claim that if $d_n$ is any pseudo-metric then
\[
d_n' = \frac{d_n}{1 + d_n}
\]
is a pseudo-metric that defines the same topology as the topology defined by $d_n$. Positivity and symmetry of $d_n'$ are clear since they hold for $d_n$. Hence it is enough to check the triangle inequality for $d_n'$. For this, notice first that the function $u \mapsto \frac{u}{1+u}$ maps from $[0, \infty)$ to $[0, 1)$, is monotone, and satisfies
\[
\frac{u + v}{1 + u + v} \leq \frac{u}{1 + u} + \frac{v}{1 + v} \tag{A.1}
\]
for $u, v \in [0, \infty)$. The inequality (A.1) follows from the inequality
\[
(u + v)(1 + u)(1 + v) = (u + v)(1 + u + v + uv) \leq (1 + u + v)(u + v + 2uv) = (1 + u + v)\bigl(u(1 + v) + v(1 + u)\bigr)
\]
after dividing by $(1 + u + v)(1 + u)(1 + v)$. It follows that if $x, y, z \in X$, then
\[
d_n'(x, y) = \frac{d_n(x, y)}{1 + d_n(x, y)} \leq \frac{d_n(x, z) + d_n(z, y)}{1 + d_n(x, z) + d_n(z, y)} \leq \frac{d_n(x, z)}{1 + d_n(x, z)} + \frac{d_n(z, y)}{1 + d_n(z, y)} = d_n'(x, z) + d_n'(z, y)
\]
as required. So suppose that $d_n : X \times X \to [0, 1)$ is the given pseudo-metric for each $n \geq 1$. We define
\[
d(x, y) = \sum_{n=1}^\infty \frac{1}{2^n} d_n(x, y).
\]
Since this sum converges uniformly on $X \times X$, it defines another pseudo-metric on $X$. We claim that the topology induced by $d$ is precisely the weakest topology that is finer than all the topologies induced by $d_n$ for $n \geq 1$.

Suppose first that $O \subseteq X$ is an open set with respect to $d$, and let $x \in O$. By definition there exists $\varepsilon > 0$ with
\[
B_\varepsilon^d(x) = \{y \in X \mid d(x, y) < \varepsilon\} \subseteq O.
\]
Now choose $N$ with $\sum_{n=N+1}^\infty \frac{1}{2^n} < \frac{\varepsilon}{2}$. Then
\[
\bigcap_{n=1}^N B_{\varepsilon/2N}^{d_n}(x) \subseteq B_\varepsilon^d(x) \subseteq O
\]
since if $y \in X$ satisfies $d_n(y, x) < \frac{\varepsilon}{2N}$ for $n = 1, \ldots, N$ then
\[
d(x, y) \leq \sum_{n=1}^N d_n(x, y) + \sum_{n=N+1}^\infty \frac{1}{2^n} < \varepsilon.
\]
As this holds for all $x \in O$, we see that $O$ is a union of finite intersections of sets that are open with respect to the topology induced by $d_n$.

The converse is similar. Suppose $O$ is a union of finite intersections of sets that are open with respect to $d_n$. Let $x \in O$ and suppose that
\[
x \in \bigcap_{n=1}^N O_n \subseteq O,
\]
where $O_n$ is open with respect to $d_n$ for $n = 1, \ldots, N$. Then we may as well assume $O_n = B_\varepsilon^{d_n}(x)$. We claim that
\[
B_{\varepsilon/2^N}^d(x) \subseteq \bigcap_{n=1}^N O_n \subseteq O,
\]
which then implies that $O$ is open with respect to the pseudo-metric $d$. So suppose $y \in X$ satisfies
\[
d(y, x) = \sum_{n=1}^\infty \frac{1}{2^n} d_n(y, x) < \frac{\varepsilon}{2^N};
\]
then $\frac{1}{2^n} d_n(y, x) < \frac{\varepsilon}{2^N}$, and for $n \leq N$ this implies that $d_n(y, x) < \varepsilon$, so $y \in O_n$ and
\[
y \in \bigcap_{n=1}^N O_n.
\]
The first part of the lemma follows.

Now suppose that
\[
X = \prod_{n=1}^\infty X_n
\]
where each $(X_n, d_n)$ is a metric space, and we define
\[
d\bigl((x_n), (y_n)\bigr) = \sum_{n=1}^\infty \frac{1}{2^n} \frac{d_n(x_n, y_n)}{1 + d_n(x_n, y_n)}.
\]
Then $d$ is a pseudo-metric by the argument above. However,
\[
d\bigl((x_n), (y_n)\bigr) = 0 \implies d_n(x_n, y_n) = 0 \text{ for all } n \geq 1 \implies (x_n) = (y_n),
\]
so $d$ is a metric on $X$. The topology induced by $((x_k)_k, (y_k)_k) \mapsto d_n(x_n, y_n)$ is precisely the weakest topology for which the projection to $X_n$ is continuous. By the first part of the lemma, this shows that the topology induced by $d$ is the weakest topology for which all the projections are continuous, so $d$ induces the product topology.
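The two constructions in the proof of Lemma A.15 are short enough to experiment with. The following sketch (helper names are hypothetical, and the infinite sum is truncated to finitely many coordinates as a simplifying assumption) builds the bounded pseudo-metric $d/(1+d)$ and the weighted sum $\sum_n 2^{-n} d_n'$, and tests the triangle inequality on sample points. A finite test of this kind only illustrates inequality (A.1); it does not replace the proof.

```python
def bounded(d):
    """Return the bounded pseudo-metric d' = d / (1 + d), with values in [0, 1)."""
    return lambda x, y: d(x, y) / (1.0 + d(x, y))


def weighted_sum(metrics):
    """Finite truncation of d(x, y) = sum_n 2^{-n} d_n'(x_n, y_n) on a product."""
    def d(x, y):
        return sum(bounded(dn)(xn, yn) / 2 ** n
                   for n, (dn, xn, yn) in enumerate(zip(metrics, x, y), start=1))
    return d


def satisfies_triangle(d, points, tol=1e-12):
    """Test d(x, y) <= d(x, z) + d(z, y) for all triples from a finite sample."""
    return all(d(x, y) <= d(x, z) + d(z, y) + tol
               for x in points for y in points for z in points)
```

With the usual distance on $\mathbb{R}$ in each coordinate, `weighted_sum` gives a bounded metric on (a finite truncation of) the product $\mathbb{R} \times \mathbb{R} \times \cdots$.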
A.5 Compact Sets and Tychonoff Theorem

Compactness is a fundamental notion for all of analysis, and in particular for functional analysis. It plays a role in topology a little like finiteness does in combinatorics.

Definition A.16. Let $(X, \mathcal{T})$ be a topological space. A family of sets $\mathcal{U}$ is called an open cover if $\mathcal{U}$ consists of open sets and
\[
X \subseteq \bigcup_{O \in \mathcal{U}} O.
\]
The space $(X, \mathcal{T})$ is called compact if every open cover has a finite subcover, that is, a finite subset $\mathcal{V} \subseteq \mathcal{U}$ which is also an open cover.

An alternative and equivalent condition for compactness can be given in terms of closed sets. A collection of sets $\{A_\alpha \mid \alpha \in I\}$ has the finite intersection property if
\[
\bigcap_{\ell=1}^k A_{\alpha_\ell} \neq \emptyset
\]
for any finite subset $\{\alpha_1, \ldots, \alpha_k\} \subseteq I$, and has the infinite intersection property if
\[
\bigcap_{\alpha \in I} A_\alpha \neq \emptyset.
\]
Then a topological space $(X, \mathcal{T})$ is compact if and only if every family of closed sets with the finite intersection property also has the infinite intersection property.

Recall that a metric space $(X, d)$ is called complete if every sequence $(x_n)$ with the Cauchy property that for every $\varepsilon > 0$ there is some $N = N(\varepsilon)$ for which
\[
m, n \geq N \implies d(x_m, x_n) < \varepsilon
\]
is convergent, meaning that there is some $x \in X$ with the property that for any $\varepsilon > 0$ there is some $N = N(\varepsilon)$ such that
\[
n \geq N \implies d(x_n, x) < \varepsilon.
\]
For metric spaces there are further equivalent properties characterizing compactness. A metric space $(X, d)$ is sequentially compact if any sequence $(x_n)$ in $X$ has a convergent subsequence, and a metric space is compact if and only if it is sequentially compact. A metric space $(X, d)$ is also compact if and only if it is complete and totally bounded, meaning that for every $\varepsilon > 0$ there is a finite set of points $\{x_1, \ldots, x_n\}$ with
\[
X = \bigcup_{i=1}^n B_\varepsilon(x_i).
\]
Compactness is preserved under taking products in the following sense.

Theorem A.17 (Tychonoff). Let $I$ be an index set, and suppose that $(X_\alpha, \mathcal{T}_\alpha)$ is a compact topological space for all $\alpha \in I$. Then $\prod_{\alpha \in I} X_\alpha$ is compact with respect to the product topology.

The notion of compactness has many useful extensions and generalizations. We will only need two of these.

Definition A.18. A topological space is called locally compact if every point has a neighborhood which is compact in the induced topology. A topological space is called $\sigma$-compact if it can be written as $\bigcup_{n=1}^\infty K_n$ with each $K_n$ compact in the induced topology.

Compactness can also be characterized in terms of filters, and for this another notion is useful.

Definition A.19. Let $X$ be a set and $\mathcal{F} \subseteq \mathcal{P}(X)$ a filter. Then $\mathcal{F}$ is an ultrafilter if for every $B \subseteq X$ we have $B \in \mathcal{F}$ or $X \setminus B \in \mathcal{F}$.
374
Proposition A.20. Let X be a Hausdor topological space. Then the following are equivalent. (1) X is compact. (2) Every lter on X has a ner lter that converges to some x X . (3) Every ultralter converges. The implication (3) = (1) once again uses the Axiom of Choice in the form of Zorns lemma.
Exercise A.21. (a) Use Zorns lemma to show that every lter has a ner lter that is an ultralter. (b) Prove Proposition A.20. (c) Use Proposition A.20 to prove Tychonos theorem.
A.6 Normal Spaces

A circle of useful constructions concerns ways to approximate functions with continuous functions. The appropriate level of generality is provided by normal spaces; as the name suggests, many of the topological spaces that arise in functional analysis have this property (in particular, any metric space is normal and any compact Hausdorff space is normal).

Definition A.22. A topological space $X = (X, \mathcal{T})$ is said to be normal if for any closed sets $A, B$ in $X$ with $A \cap B = \emptyset$ there are open sets $U \supseteq A$ and $V \supseteq B$ with $U \cap V = \emptyset$.

This definition, which says that disjoint closed sets can be separated by open sets, may be thought of as requiring that there are enough open sets. An important consequence is that there are enough continuous functions in the following sense (this elegant presentation is taken from Tao's blog).

Lemma A.23 (Urysohn's lemma). Let $X = (X, \mathcal{T})$ be a topological space. Then the following properties of $X$ are equivalent.
(1) $X$ is a normal space.
(2) For every closed set $K \subseteq X$ and every open set $U \supseteq K$, there is an open set $V$ and a closed set $L$ with $U \supseteq L \supseteq V \supseteq K$.
(3) For every pair of closed sets $K$ and $L$ in $X$ with $K \cap L = \emptyset$, there exists a continuous function $f : X \to [0, 1]$ with
\[ f(x) = \begin{cases} 1 & \text{if } x \in K, \\ 0 & \text{if } x \in L. \end{cases} \]
(4) For every closed set $K \subseteq X$ and every open set $U \supseteq K$, there exists a continuous function $f : X \to [0, 1]$ with $\chi_K(x) \leq f(x) \leq \chi_U(x)$ for all $x \in X$.
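Proof. The implications (3) $\iff$ (4) and (1) $\iff$ (2) are clear, since a set is closed if and only if its complement is open. Assume now that (3) holds. Given disjoint closed sets $K, L \subseteq X$, let $f$ be the function given by (3). Then the open sets
\[ U = \{x \in X \mid f(x) > 0.9\} \quad\text{and}\quad V = \{x \in X \mid f(x) < 0.1\} \]
show (1).

Assume next that (2) holds, let $K = K_1$ be a closed set, and let $U = U_0$ be an open set with $K_1 \subseteq U_0$. By (2), we can find a closed set $K_{1/2}$ and an open set $U_{1/2}$ with
\[ U_0 \supseteq K_{1/2} \supseteq U_{1/2} \supseteq K_1. \]
Applying (2) again (twice) gives closed sets $K_{1/4}, K_{3/4}$ and open sets $U_{1/4}, U_{3/4}$ with
\[ U_0 \supseteq K_{1/4} \supseteq U_{1/4} \supseteq K_{1/2} \supseteq U_{1/2} \supseteq K_{3/4} \supseteq U_{3/4} \supseteq K_1. \]
Continuing in exactly the same way, we construct for every rational
\[ q \in D = \left\{ \tfrac{a}{2^n} \;\middle|\; n \geq 0,\ a \in \mathbb{Z},\ 0 \leq a \leq 2^n \right\} \]
a closed set $K_q$ and an open set $U_q$ with $K_q \supseteq U_q$ for all $q \in D \cap (0, 1)$ and with $U_{q_1} \supseteq K_{q_2}$ for all $q_1, q_2 \in D$ with $q_1 < q_2$. Now define
\[ f(x) = \sup\{q \in D \mid x \in U_q\} = \inf\{q \in D \mid x \notin K_q\} \]
with the convention that $\sup \emptyset = 0$ and $\inf \emptyset = 1$ (so that $f = 0$ outside $U_0$ and $f = 1$ on $K_1$). It is easy to check that
\[ \{x \in X \mid f(x) > s\} = \bigcup_{q > s} U_q \quad\text{and}\quad \{x \in X \mid f(x) < s\} = X \setminus \bigcap_{q < s} K_q \]
are open sets for any real $s$, so $f$ is continuous and (4) follows.

Lemma A.24 (Partition of unity). Let $X$ be a normal topological space, and let $\{K_\alpha \mid \alpha \in A\}$ be a collection of closed sets that cover $X$. Let $\{U_\alpha \mid \alpha \in A\}$ be an open cover of $X$, with $U_\alpha \supseteq K_\alpha$ for each $\alpha \in A$ and with the property that each $x \in X$ has an open neighborhood that intersects nontrivially with only finitely many $U_\alpha$. Then for each $\alpha \in A$ there exists a continuous function $f_\alpha : X \to [0, 1]$ supported on $U_\alpha$ such that
\[ \sum_{\alpha \in A} f_\alpha(x) = 1 \]
for all $x \in X$.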
Proof. By Urysohn's lemma (Lemma A.23), for each $\alpha \in A$ there is a continuous function $g_\alpha : X \to [0, 1]$ which is supported on $U_\alpha$ and is equal to $1$ on $K_\alpha$. Then
\[ g(x) = \sum_{\alpha \in A} g_\alpha(x) \]
is well-defined and continuous (by the local finiteness assumption on the cover) and is bounded below by $1$ (since the sets $K_\alpha$ cover $X$). Setting $f_\alpha = g_\alpha / g$ for all $\alpha \in A$ gives the result.

Proposition A.25 (Tietze extension theorem). Let $X$ be a normal topological space, $A \subseteq X$ a closed subset, and let $f : A \to \mathbb{R}$ (or $\mathbb{C}$) be a bounded continuous function. (The assumption of boundedness is not essential, but does simplify the proof and is sufficient for our purposes.) Then there exists a bounded continuous function $F : X \to \mathbb{R}$ (or $\mathbb{C}$) with $F|_A = f$. If in addition $X$ is locally compact and $A$ is compact, then we can find such an extension $F$ in $C_c(X)$.

Proof. If $f$ is complex-valued then we may use the following argument for $\Re(f)$ and $\Im(f)$ separately, so it is enough to consider the real-valued case. If $|f(x)| \leq M$ for all $x \in A$ then we may also apply the following argument to $\frac{1}{M} f$, so we may assume without loss of generality that $f$ is a continuous function from $A$ to $[-1, 1]$.

Define sets $B_- = f^{-1}([-1, -\tfrac13])$ and $B_+ = f^{-1}([\tfrac13, 1])$. By definition and by continuity of $f$, $B_- \subseteq A$ and $B_+ \subseteq A$ are disjoint closed sets. By Urysohn's lemma (Lemma A.23) there exists a continuous function $g : X \to [0, 1]$ with $g|_{B_-} = 0$ and $g|_{B_+} = 1$. Define $h_1 = \tfrac23 (g - \tfrac12)$, which takes values in $[-\tfrac13, \tfrac13]$.

We claim that $\|f - h_1|_A\|_\infty \leq \tfrac23$, and show this by considering each possibility in turn. If $x \in B_-$ then $f(x) \in [-1, -\tfrac13]$ and $h_1(x) = -\tfrac13$, so $|f(x) - h_1(x)| \leq \tfrac23$. If $x \in B_+$ then $f(x) \in [\tfrac13, 1]$ and $h_1(x) = \tfrac13$, so $|f(x) - h_1(x)| \leq \tfrac23$. Finally, if $x \in A \setminus (B_- \cup B_+)$, then $f(x) \in (-\tfrac13, \tfrac13)$ and $|h_1(x)| \leq \tfrac13$, so $|f(x) - h_1(x)| \leq \tfrac23$ again.

We interpret the argument above as follows. Every continuous function $f : A \to [-1, 1]$ has an approximation $h_1|_A$ which is the restriction of a continuous function $h_1 : X \to [-1, 1]$ to $A$, with $\|f - h_1|_A\|_\infty \leq \tfrac23$. Applying this general statement to $f_2 = \tfrac32 (f - h_1|_A)$ we find some continuous function $h_2 : X \to [-1, 1]$ with $\|f_2 - h_2|_A\|_\infty \leq \tfrac23$ or, equivalently, with
\[ \left\| f - \left(h_1 + \tfrac23 h_2\right)\big|_A \right\|_\infty = \tfrac23 \left\| \tfrac32 (f - h_1|_A) - h_2|_A \right\|_\infty \leq \left(\tfrac23\right)^2. \]
Continuing inductively, starting with $f_3 = \tfrac32 (f_2 - h_2|_A)$, we find functions $h_1, h_2, \dots, h_n : X \to [-1, 1]$ with
\[ \left\| f - \left(h_1 + \tfrac23 h_2 + \cdots + \left(\tfrac23\right)^{n-1} h_n\right)\big|_A \right\|_\infty \leq \left(\tfrac23\right)^n. \tag{A.2} \]
We set
\[ F = \sum_{n=1}^{\infty} \left(\tfrac23\right)^{n-1} h_n, \]
and notice that
\[ \left| F(x) - \sum_{n=1}^{m} \left(\tfrac23\right)^{n-1} h_n(x) \right| \leq \left(\tfrac23\right)^m \]
for any $m \geq 1$ and $x \in X$ (since each $h_n$ takes values in $[-\tfrac13, \tfrac13]$ by construction), so the convergence is uniform and $F \in C(X)$ (see the proof of Example 2.19 (3) on page 47). By (A.2) we have $f = F|_A$ as required.

If $X$ is also assumed to be locally compact and $A \subseteq X$ is compact, then there exists an open set $O \supseteq A$ with compact closure. (To see this, for each $x \in A$ let $O_x$ be an open neighborhood of $x$ with compact closure. Then the open cover $\{O_x \mid x \in A\}$ of $A$ has a finite subcover $A \subseteq O_{x_1} \cup \cdots \cup O_{x_n} = O$, and the union of the elements of that finite cover gives such an open set.) Now extend $f$, first by using the definition
\[ \tilde{f}(x) = \begin{cases} f(x) & \text{for } x \in A, \\ 0 & \text{for } x \in X \setminus O. \end{cases} \]
Then $\tilde{A} = A \cup (X \setminus O)$ is closed, and $\tilde{f}$ is a continuous function on $\tilde{A}$ which (by the argument above) can be extended to a continuous function $F \in C(X)$. By construction $\operatorname{Supp}(F) \subseteq \overline{O} \subseteq X$ is compact, so $F \in C_c(X)$ as required.
Appendix B: Measure Theory
Measure theory is one approach to making rigorous the idea of the size (or length, volume, and so on) of a set in an abstract setting. By carefully controlling the complexity of the sets allowed in the theory, the basic intuition (for example, that the volume of the disjoint union of two sets is the sum of their volumes) can be developed into a powerful tool for studying diverse fields, including functional analysis and probability.
B.1 Basic Definitions and Measurability

The path to the definition of the Lebesgue integral starts with a discussion about which sets (and hence which functions) are allowed in the theory.

Definition B.1. Let $X$ be a set. A family $\mathcal{A} \subseteq \mathcal{P}(X)$ of subsets of $X$ is called an algebra if it satisfies the following properties: $\emptyset, X \in \mathcal{A}$; if $A \in \mathcal{A}$ then $A^c = X \setminus A \in \mathcal{A}$; if $A_1, \dots, A_n \in \mathcal{A}$ then $\bigcup_{i=1}^{n} A_i \in \mathcal{A}$. If, in addition, $A_1, A_2, \dots \in \mathcal{A}$ implies that
\[ \bigcup_{n=1}^{\infty} A_n \in \mathcal{A}, \]
then $\mathcal{A}$ is a $\sigma$-algebra. If $\mathcal{A}$ is a $\sigma$-algebra, then we call the pair $(X, \mathcal{A})$ a measurable space and the elements of $\mathcal{A}$ measurable sets or $\mathcal{A}$-measurable sets.

It is straightforward to check that the intersection of any family of $\sigma$-algebras on $X$ is also a $\sigma$-algebra. Hence for any family $\mathcal{C} \subseteq \mathcal{P}(X)$ of subsets there is a unique smallest $\sigma$-algebra containing $\mathcal{C}$, called the $\sigma$-algebra generated by $\mathcal{C}$, and denoted $\sigma(\mathcal{C})$. If $X$ is a topological space, then the $\sigma$-algebra generated by all open subsets of $X$ is called the Borel $\sigma$-algebra, and is denoted $\mathcal{B}$ or $\mathcal{B}(X)$.
Definition B.2. A function $\phi : X \to Y$ between two measurable spaces $(X, \mathcal{A}_X)$ and $(Y, \mathcal{A}_Y)$ is called measurable if $\phi^{-1}(A) \in \mathcal{A}_X$ for all $A \in \mathcal{A}_Y$.

If $Y$ is a topological space and $\phi : X \to Y$ is a map from a measurable space $(X, \mathcal{A})$, then we will (unless explicitly indicated otherwise) use the Borel $\sigma$-algebra on $Y$ to define measurability of $\phi$. In particular, in such a setting $\phi$ is measurable if and only if $\phi^{-1}(O) \in \mathcal{A}$ for every open set $O \subseteq Y$. This applies in particular to the cases $Y = \mathbb{R}$ and $Y = \mathbb{C}$.

Pointwise limits of sequences of measurable functions are measurable in the following sense. If $(f_n)$ is a sequence of measurable functions with $f_n : X \to Y$ for each $n \geq 1$, where $Y$ is a topological space, and for each $x \in X$ the sequence $(f_n(x))$ converges to some $f(x)$ in $Y$, then $f : X \to Y$ is measurable.

B.1.1 Measure and Integral

In order to define the integral of a measurable function one needs a precise notion of size or measure of a measurable set.

Definition B.3. A function $\mu : \mathcal{A} \to \mathbb{R}_{\geq 0} \cup \{\infty\}$ defined on a $\sigma$-algebra $\mathcal{A}$ of subsets of a set $X$ is called a (positive) measure if it has the following properties: $\mu(A) \geq 0$ for $A \in \mathcal{A}$ (positivity); and if $A_n \in \mathcal{A}$ for all $n \geq 1$ and $A_n \cap A_m = \emptyset$ for all $m \neq n$, then
\[ \mu\left( \bigsqcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} \mu(A_n), \]
where the sum on the right may or may not converge ($\sigma$-additivity).

Theorem B.4 (Carathéodory extension [6]). Let $\mathcal{A}$ be an algebra of subsets of $X$, and assume that $\mu_0 : \mathcal{A} \to [0, \infty]$ is a function satisfying the following properties:
(1) $\mu_0(\emptyset) = 0$;
(2) if $A_1, A_2, \dots$ are disjoint members of $\mathcal{A}$ with $\bigsqcup_{n=1}^{\infty} A_n \in \mathcal{A}$, then $\mu_0\left(\bigsqcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu_0(A_n)$;
(3) there is a countable collection $\{A_n \mid n \in \mathbb{N}\}$ with $A_n \in \mathcal{A}$ and $\mu_0(A_n) < \infty$ for all $n \geq 1$, and with $X = \bigcup_{n=1}^{\infty} A_n$.
Then there is a measure $\mu$ on the smallest $\sigma$-algebra containing $\mathcal{A}$ that extends $\mu_0$ in the sense that $\mu(A) = \mu_0(A)$ for any $A \in \mathcal{A}$.

The triple $(X, \mathcal{A}, \mu)$ is called a measure space and is called a probability space if $\mu(X) = 1$. We will assume from now on that we are given some measure $\mu$ on a measurable space $(X, \mathcal{B})$.
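Definition B.5. A measurable function $f : X \to \mathbb{C}$ is called simple if $\mu(\{x \in X \mid f(x) \neq 0\}) < \infty$ and $f$ has finite range, $|f(X)| < \infty$. In other words, $f$ is simple if
\[ f = \sum_{n=1}^{N} a_n \chi_{B_n} \tag{B.1} \]
for some constants $a_n \in \mathbb{C}$ and $B_n \in \mathcal{B}$ with $\mu(B_n) < \infty$ for $1 \leq n \leq N$. The integral of the function $f$ in (B.1) is defined to be
\[ \int_X f \, d\mu = \sum_{n=1}^{N} a_n \mu(B_n). \tag{B.2} \]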
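To get a feel for (B.2), one can experiment on a finite measure space. The sketch below is only an illustration: the four-point space, the point masses in `mu`, and the helper names `integrate_simple` and `evaluate` are assumptions made here, not notation from the notes. It writes one simple function in two different ways as in (B.1) and evaluates (B.2) for each description.

```python
from fractions import Fraction

def integrate_simple(terms, mu):
    """Evaluate (B.2): sum of a_n * mu(B_n) over the terms (a_n, B_n)."""
    return sum(a * sum(mu[x] for x in B) for a, B in terms)

def evaluate(terms, x):
    """Value at x of the simple function sum of a_n * chi_{B_n}."""
    return sum(a for a, B in terms if x in B)

# Hypothetical measure on X = {0, 1, 2, 3}, given by point masses.
mu = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6), 3: Fraction(2)}

# Two descriptions of the same function: value 3 on {0, 1}, 5 on {2}, 0 on {3}.
rep1 = [(Fraction(3), {0, 1}), (Fraction(5), {2})]
rep2 = [(Fraction(3), {0}), (Fraction(1), {1, 2}),
        (Fraction(2), {1}), (Fraction(4), {2})]

same_function = all(evaluate(rep1, x) == evaluate(rep2, x) for x in mu)
print(same_function, integrate_simple(rep1, mu), integrate_simple(rep2, mu))
```

Exact rational arithmetic is used so that the two values can be compared without rounding; both descriptions give the same integral, as the independence claim for (B.2) predicts.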
One can show rather easily that the integral defined in (B.2) is independent of the particular description of $f$ as a finite sum in (B.1). For the next definition, the analogous claim is an important step in the theory (this is essentially the monotone convergence theorem discussed below).

Definition B.6. Suppose that $f : X \to \mathbb{R}_{\geq 0}$ is measurable. We define the integral of $f$ as the limit
\[ \int_X f \, d\mu = \lim_{n \to \infty} \int_X f_n \, d\mu, \]
where $(f_n)$ is a sequence of simple measurable functions $f_n : X \to \mathbb{R}_{\geq 0}$ with
\[ f_m(x) \leq f_n(x) \leq f(x) \]
for $m \leq n$ and $x \in X$, and
\[ f(x) = \lim_{n \to \infty} f_n(x) \]
for all $x \in X$.

Implicit in this definition is the fact that any non-negative measurable function is a pointwise limit of simple functions. Notice also that we permit sets to have infinite measure and functions to have infinite integral. If $f : X \to [0, \infty] = \mathbb{R}_{\geq 0} \cup \{\infty\}$, then we define
\[ \int_X f \, d\mu = \infty \]
if $\mu(\{x \in X \mid f(x) = \infty\}) > 0$, and
\[ \int_X f \, d\mu = \int_X f \chi_{\{x \in X \mid f(x) < \infty\}} \, d\mu \]
otherwise. Here the product $\infty \cdot 0$ is defined to be $0$. The function $f$ is called integrable if
\[ \int_X f \, d\mu < \infty. \]
If $f : X \to \mathbb{R}$ and
\[ f^+ = \max\{0, f\}, \qquad f^- = \max\{0, -f\} \]
are integrable, then we define
\[ \int_X f \, d\mu = \int_X f^+ \, d\mu - \int_X f^- \, d\mu \]
and say that $f$ is integrable. Finally, if $f : X \to \mathbb{C}$ and $\Re(f)$, $\Im(f)$ are integrable, then we define
\[ \int_X f \, d\mu = \int_X \Re(f) \, d\mu + i \int_X \Im(f) \, d\mu \]
and once again say that $f$ is integrable.
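The approximation by simple functions in Definition B.6 can be made concrete with the standard dyadic choice $f_n = \min\{n, 2^{-n} \lfloor 2^n f \rfloor\}$. The following sketch assumes, purely for illustration, $X = [0, 1]$ with Lebesgue measure and $f(x) = x^2$; the function name is hypothetical. Since $f \leq 1 \leq n$ the cap by $n$ plays no role here, and the integral of $f_n$ is computed from the measures of the level sets $\{f \geq k 2^{-n}\}$.

```python
import math

def integral_of_dyadic_approximation(n):
    """Integral over [0, 1] of f_n = 2^-n * floor(2^n * f) for f(x) = x^2."""
    step = 2.0 ** -n
    # f_n equals step times the number of k >= 1 with f >= k * step, and the
    # Lebesgue measure of {x in [0, 1] : x^2 >= k * step} is 1 - sqrt(k * step).
    return sum(step * (1.0 - math.sqrt(k * step)) for k in range(1, 2 ** n + 1))

values = [integral_of_dyadic_approximation(n) for n in range(1, 13)]
for n, v in enumerate(values, start=1):
    print(n, v)
```

The printed values increase with $n$ and approach $\int_0^1 x^2 \, dx = \tfrac13$ from below, with error at most $2^{-n}$ because $f - 2^{-n} < f_n \leq f$.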
B.2 Properties of the Integral

The space of integrable functions forms a vector space, and the integral is a linear functional on that vector space. Moreover, the integral satisfies the following fundamental continuity properties, each of which is a consequence of the $\sigma$-additivity of the measure $\mu$.

Theorem B.7 (Monotone convergence). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $(f_n)$ be a sequence of measurable functions $f_n : X \to \mathbb{R}_{\geq 0} \cup \{\infty\}$ with $f_n \nearrow f$ as $n \to \infty$. That is, $f_m(x) \leq f_n(x)$ for $m \leq n$ and $x \in X$, and $f(x) = \lim_{n \to \infty} f_n(x)$ for all $x \in X$. Then $f$ is measurable and
\[ \int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu. \]

Theorem B.8 (Dominated convergence). Let $(X, \mathcal{B}, \mu)$ be a measure space, and let $(f_n)$ be a sequence of measurable functions with $f(x) = \lim_{n \to \infty} f_n(x)$ for all $x \in X$. Assume that there is an integrable function $g : X \to \mathbb{R}_{\geq 0}$ with $|f_n(x)| \leq g(x)$ for all $x \in X$ and $n \geq 1$. Then $f$ is integrable and
\[ \int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu. \]
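A standard example shows why some domination hypothesis is needed in Theorem B.8. The sketch below is an illustration under assumptions made here: $X = [0, 1]$ with Lebesgue measure and $f_n = n \chi_{(0, 1/n)}$, with hypothetical helper names. Each $f_n$ is simple, so its integral is given directly by (B.2).

```python
from fractions import Fraction

def f(n, x):
    """f_n = n * indicator of the open interval (0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0

def integral(n):
    """Integral of the simple function f_n: n times the length of (0, 1/n)."""
    return n * Fraction(1, n)

# At a fixed point x the sequence f_n(x) is eventually 0 ...
x = Fraction(1, 50)
early_value = f(10, x)
tail_values = [f(n, x) for n in range(50, 60)]

# ... but every f_n has integral 1, so the integrals do not converge to 0.
integrals = [integral(n) for n in range(1, 100)]
print(early_value, tail_values, set(integrals))
```

Here $f_n \to 0$ pointwise while $\int f_n \, d\mu = 1$ for all $n$. This does not contradict the theorem: any $g$ with $|f_n| \leq g$ for all $n$ must dominate $\sup_n f_n(x) \geq \tfrac{1}{x} - 1$, which is not integrable on $(0, 1]$.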
Definition B.9. We write
\[ \mathscr{L}^1_\mu(X) = \{f : X \to \mathbb{C} \mid f \text{ is integrable}\} \]
for the space of integrable functions on a measure space $(X, \mathcal{B}, \mu)$, and define
\[ \|f\|_1 = \int |f| \, d\mu \]
for any measurable function $f : X \to \mathbb{C}$. Notice that
\[ f \in \mathscr{L}^1_\mu(X) \iff \|f\|_1 < \infty, \]
and
\[ \left| \int f \, d\mu \right| \leq \|f\|_1 \]
for all $f \in \mathscr{L}^1_\mu(X)$. It is easy to check that $\|\alpha f\|_1 = |\alpha| \|f\|_1$ and $\|f + g\|_1 \leq \|f\|_1 + \|g\|_1$ for all $f, g \in \mathscr{L}^1_\mu(X)$ and $\alpha \in \mathbb{C}$.

Definition B.10. A set $N \subseteq X$ is called a null set if $\mu(N) = 0$. We say that a property holds almost everywhere (also written a.e., or where the measure is not obvious from the context, $\mu$-almost everywhere) if it holds on the complement of a null set. (It is implicit in this definition that $N$ is required to be measurable. It is often convenient to relax this requirement and call any set $N$ a null set if there is a measurable set $N'$ with $\mu(N') = 0$ and $N \subseteq N'$.)

Thus, for example, if $f \in \mathscr{L}^1_\mu(X)$ then
\[ \|f\|_1 = 0 \iff f = 0 \ \mu\text{-a.e.} \]

Definition B.11. We define
\[ L^1_\mu(X) = \mathscr{L}^1_\mu(X) / \sim, \]
where the equivalence relation $\sim$ is defined as follows. For $f, g \in \mathscr{L}^1_\mu(X)$ we have $f \sim g$ if $f = g$ almost everywhere.

While the natural notation for the element of $L^1_\mu(X)$ containing $f \in \mathscr{L}^1_\mu(X)$ is $[f]_\sim$, it is conventional to simply write $f \in L^1_\mu(X)$, with the understanding that such a function $f$ is only defined up to equivalence under $\sim$. Even though integrable functions do not have to be bounded, integration nonetheless has the following property (which is trivial for bounded functions).

Lemma B.12 (Continuity of Lebesgue integration). Let $(X, \mathcal{B}, \mu)$ be a measure space, and $f \in L^1_\mu(X)$. Then for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[ \mu(B) < \delta \implies \int_B |f| \, d\mu < \varepsilon \]
for $B \in \mathcal{B}$.
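Proof. Notice that if $f$ is bounded, then the statement is indeed trivial, since in that case
\[ \int_B |f| \, d\mu \leq \mu(B) \sup_{x \in X} |f(x)|. \]
In the general case, suppose without loss of generality that $f \geq 0$, and fix $\varepsilon > 0$. Then by definition of $L^1_\mu(X)$ there exists a simple function $g$ with $0 \leq g \leq f$ and with
\[ \int g \, d\mu > \int f \, d\mu - \frac{\varepsilon}{2}. \]
Clearly $g$ is bounded (because it is a simple function), so there is some $M > 0$ with $g(x) < M$ for all $x \in X$. Now let $\delta = \frac{\varepsilon}{2M}$. Then for any $B \in \mathcal{B}$ with $\mu(B) < \delta$ we have
\[ \int_B f \, d\mu = \int_B (f - g) \, d\mu + \int_B g \, d\mu \leq \int_X (f - g) \, d\mu + \int_B g \, d\mu < \frac{\varepsilon}{2} + M\delta = \varepsilon \]
as required.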
B.3 The p-Norm

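Definition B.13. For any $p \in [1, \infty)$ we define
\[ \|f\|_p = \left( \int_X |f|^p \, d\mu \right)^{1/p} \]
for any measurable $f : X \to \mathbb{C}$,
\[ \mathscr{L}^p_\mu(X) = \{f : X \to \mathbb{C} \mid \|f\|_p < \infty\}, \]
and
\[ L^p_\mu(X) = \mathscr{L}^p_\mu(X) / \sim, \]
where $f \sim g$ if $f = g$ $\mu$-almost everywhere.

Once again it is clear that $\|\alpha f\|_p = |\alpha| \|f\|_p$, and we also have $\|f + g\|_p \leq \|f\|_p + \|g\|_p$ for any measurable $f, g$ and $\alpha \in \mathbb{C}$. We will review the proof of this triangle inequality, starting with the following important step.

Theorem B.14 (Hölder inequality). Let $p, q \in (1, \infty)$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$ (in which case $q$ is called the conjugate exponent of $p$). Then
\[ \int |f g| \, d\mu \leq \|f\|_p \|g\|_q \]
for any measurable functions $f, g : X \to \mathbb{C}$.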
For $p = q = 2$ this is the Cauchy–Schwarz inequality (see also Proposition 2.55).

Proof. For $\|f\|_p = 0$ or $\|g\|_q = 0$ we have $fg = 0$ $\mu$-almost everywhere, and so $\int |fg| \, d\mu = 0$. So assume that $\|f\|_p > 0$ and $\|g\|_q > 0$. If either is infinite, then the inequality holds trivially. So it is enough to consider the case $\|f\|_p, \|g\|_q \in (0, \infty)$. Dividing through by $\|f\|_p$ and by $\|g\|_q$ we may also assume that $\|f\|_p = \|g\|_q = 1$. Suppose now that $x \in X$ satisfies $|f(x)| > 0$ and $|g(x)| > 0$. Then we may choose $s, t \in \mathbb{R}$ with $|f(x)| = e^{s/p}$ and $|g(x)| = e^{t/q}$. By convexity of the function $v \mapsto e^v$ for $v \in \mathbb{R}$, we see that
\[ |fg|(x) = e^{s/p + t/q} \leq \tfrac{1}{p} e^s + \tfrac{1}{q} e^t = \tfrac{1}{p} |f(x)|^p + \tfrac{1}{q} |g(x)|^q, \tag{B.3} \]
and the inequality between the left-hand side and the right-hand side of (B.3) also holds trivially if $f(x) = 0$ or $g(x) = 0$. Integrating (B.3) over $x \in X$ gives
\[ \int |fg| \, d\mu \leq \tfrac{1}{p} \|f\|_p^p + \tfrac{1}{q} \|g\|_q^q = 1, \]
proving the theorem.

Theorem B.15 (Triangle inequality). For measurable functions $f$ and $g$ from $X$ to $\mathbb{C}$ we have $\|f + g\|_p \leq \|f\|_p + \|g\|_p$ for any $p \in [1, \infty)$.

Proof. If $\|f\|_p = \infty$, $\|g\|_p = \infty$ or $\|f + g\|_p = 0$ then there is nothing to show. So we may assume $\|f + g\|_p > 0$, $\|f\|_p < \infty$ and $\|g\|_p < \infty$. Then
\[ |f + g|^p \leq (|f| + |g|)^p \]
and
\[ \left( \tfrac12 |f| + \tfrac12 |g| \right)^p \leq \tfrac12 |f|^p + \tfrac12 |g|^p \]
by convexity of the function $u \mapsto u^p$ on $[0, \infty)$ (this is where the assumption $p \geq 1$ is required). Together these inequalities imply that
\[ \|f + g\|_p^p \leq \int (|f| + |g|)^p \, d\mu \leq \frac{2^p}{2} \left( \|f\|_p^p + \|g\|_p^p \right), \]
which shows that $\|f + g\|_p < \infty$. The case of $p = 1$ follows easily from the standard triangle inequality for integrals, so assume that $p > 1$. Write
\[ (|f| + |g|)^p = |f| (|f| + |g|)^{p-1} + |g| (|f| + |g|)^{p-1}, \]
integrate over $X$, and apply Hölder's inequality to get
\[ \bigl\| |f| + |g| \bigr\|_p^p = \int (|f| + |g|)^p \, d\mu \leq \|f\|_p \bigl\| (|f| + |g|)^{p-1} \bigr\|_q + \|g\|_p \bigl\| (|f| + |g|)^{p-1} \bigr\|_q, \tag{B.4} \]
where $\frac{1}{p} + \frac{1}{q} = 1$. Notice that $p = q(p - 1)$, so
\[ \bigl\| (|f| + |g|)^{p-1} \bigr\|_q = \left( \int (|f| + |g|)^{q(p-1)} \, d\mu \right)^{1/q} = \bigl\| |f| + |g| \bigr\|_p^{p/q}, \]
which by the argument above is positive and finite. Dividing (B.4) by $\bigl\| |f| + |g| \bigr\|_p^{p/q}$ gives the triangle inequality since $p - \frac{p}{q} = 1$.
B.4 Near-continuity of Measurable Functions

Even though measurable functions are typically very far from being continuous, if the underlying space is a metric space equipped with a finite Borel measure then they are nearly continuous in the following sense.

Proposition B.16 (Lusin's theorem: near continuity of measurable functions). Let $X$ be a metric space, let $\mu$ be a finite measure on the Borel $\sigma$-algebra of $X$, let $Y$ be a separable metric space, and let $f : X \to Y$ be (Borel) measurable. Then for every $\varepsilon > 0$ there exists a closed set $K \subseteq X$ with $\mu(X \setminus K) < \varepsilon$ such that $f|_K$ is continuous. If $X$ is $\sigma$-compact, then $K$ can be chosen to be compact.

As the proof will show, we will in essence produce the continuity of $f|_K$ by removing very small open subsets around every possible discontinuity. To do this we will use the following regularity property of measures on metric spaces.

Lemma B.17 (Regularity of Measures). Let $X = (X, d)$ be a metric space and let $\mu$ be a finite measure on the Borel $\sigma$-algebra $\mathcal{B}$ of $X$. Then for every Borel set $B \subseteq X$ and every $\varepsilon > 0$ there exists a closed set $K \subseteq X$ and an open set $O \subseteq X$ with $K \subseteq B \subseteq O$ and $\mu(O \setminus K) < \varepsilon$.

Proof. Consider the family $\mathcal{A} \subseteq \mathcal{B}$ of sets $B \in \mathcal{B}$ with the property that for every $\varepsilon > 0$ there exists a closed set $K \subseteq B$ and an open set $O \supseteq B$ with $\mu(O \setminus K) < \varepsilon$. The statement of the lemma is then $\mathcal{A} = \mathcal{B}$, which we will prove in stages. By definition of the Borel $\sigma$-algebra $\mathcal{B}$, it is enough to show that $\mathcal{A}$ is a $\sigma$-algebra containing all the open sets.

Closure under complements: Since taking complements switches open and closed sets, $\mathcal{A}$ is closed under taking complements. Explicitly, if $B \in \mathcal{A}$ and for a given $\varepsilon > 0$ we have $K \subseteq B \subseteq O$ as in the definition of $\mathcal{A}$, then $X \setminus O \subseteq X \setminus B \subseteq X \setminus K$ and
\[ \mu\bigl((X \setminus K) \setminus (X \setminus O)\bigr) = \mu(O \setminus K) < \varepsilon \]
shows that $X \setminus B \in \mathcal{A}$.

Open sets: Using the distance function $x \mapsto d(x, A)$ for a closed subset $A \subseteq X$ from (2.22) it follows that
\[ A = \bigcap_{n \geq 1} \underbrace{\{x \in X \mid d(x, A) < \tfrac{1}{n}\}}_{= O_n, \text{ an open set}} \]
is a countable intersection of open sets (that is, a $G_\delta$-set). Since $\mu$ is finite, continuity of the measure along the decreasing sequence $(O_n)$ gives $\mu(O_n \setminus A) \to 0$, so $B = A$ satisfies the claim of the lemma with $K = A$ and $O = O_n$ with $n$ depending on $\varepsilon > 0$. Since closed sets belong to $\mathcal{A}$, open sets also belong to $\mathcal{A}$ by the previous step.

Finite unions: Suppose $B_1, B_2 \in \mathcal{A}$ and $\varepsilon > 0$. Then there exist $K_1 \subseteq B_1 \subseteq O_1$ and $K_2 \subseteq B_2 \subseteq O_2$ as in the definition of $\mathcal{A}$, with $\mu(O_1 \setminus K_1) < \varepsilon$ and $\mu(O_2 \setminus K_2) < \varepsilon$. Now define $K = K_1 \cup K_2$, $B = B_1 \cup B_2$ and $O = O_1 \cup O_2$ so that $K$ is closed, $O$ is open, and $K \subseteq B \subseteq O$. Moreover,
\[ \mu(O \setminus K) \leq \mu(O_1 \setminus K_1) + \mu(O_2 \setminus K_2) < 2\varepsilon, \]
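and since $\varepsilon > 0$ was arbitrary we deduce that $B = B_1 \cup B_2 \in \mathcal{A}$. By induction, the same holds for any finite union.

Countable unions: Now suppose that $B_1, B_2, \dots \in \mathcal{A}$. By the steps above, $B_1 \cup \cdots \cup B_n \in \mathcal{A}$ for all $n \geq 1$. Therefore, and since we are interested in the union of these sets, we may assume that $B_1, B_2, \dots \in \mathcal{A}$ satisfy $B_n \subseteq B_{n+1}$ for all $n \geq 1$. Define $B'_1 = B_1$ and $B'_{n+1} = B_{n+1} \setminus B_n \in \mathcal{A}$ for all $n \geq 1$ so that
\[ \mu\left( \bigcup_{n=1}^{\infty} B_n \right) = \sum_{n=1}^{\infty} \mu(B'_n) < \infty \]
by our assumption that $\mu$ is a finite measure. Therefore, for any $\varepsilon > 0$ there exists some $m \geq 1$ with
\[ \sum_{n=m}^{\infty} \mu(B'_{n+1}) < \varepsilon. \]
Since $B_m \in \mathcal{A}$, there exists some closed $K \subseteq B_m$ and open $O' \supseteq B_m$ with $\mu(O' \setminus K) < \varepsilon$. Since $B'_n \in \mathcal{A}$ there exists some open $O_n \supseteq B'_n$ with $\mu(O_n \setminus B'_n) < \varepsilon / 2^n$ for all $n > m$. Now define
\[ O = O' \cup \bigcup_{n=m+1}^{\infty} O_n \]
and notice that
\[ K \subseteq \bigcup_{n=1}^{\infty} B_n \subseteq O \]
and
\[ \mu(O \setminus K) \leq \mu(O' \setminus K) + \mu\left( \bigcup_{n=m+1}^{\infty} O_n \right) < \varepsilon + \sum_{n=m+1}^{\infty} \mu(O_n \setminus B'_n) + \sum_{n=m+1}^{\infty} \mu(B'_n) < 3\varepsilon. \]
It follows that
\[ \bigcup_{n=1}^{\infty} B_n \in \mathcal{A}. \]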
Conclusion: By the above $\mathcal{A}$ is a $\sigma$-algebra that contains all open sets, and hence $\mathcal{A} = \mathcal{B}$.

Proof of Proposition B.16. Let $f : X \to Y$ be as in the statement of the proposition. By definition of measurability and of the Borel $\sigma$-algebra, the pre-image of every open set is Borel measurable in $X$. We wish to find, for every $\varepsilon > 0$, a closed set $K \subseteq X$ with $\mu(X \setminus K) < \varepsilon$ such that
\[ (f|_K)^{-1}(U) = K \cap f^{-1}(U) \]
is open in $K$ for every open set $U \subseteq Y$. By our assumptions on $Y$, there exists a countable basis of the topology of $Y$ (for example, using all balls of radius $\frac{1}{n}$ for $n \geq 1$ with centers at the points of a countable dense subset). Let $\{U_n \mid n \geq 1\}$ be such a basis. Now apply Lemma B.17 to each of the sets $f^{-1}(U_n) \subseteq X$ to find a closed set $K_n$ and an open set $O_n$ with $K_n \subseteq f^{-1}(U_n) \subseteq O_n$ and with $\mu(O_n \setminus K_n) < \varepsilon / 2^n$ for $n \geq 1$. Now define
\[ K = \bigcap_{n=1}^{\infty} \bigl( K_n \cup (X \setminus O_n) \bigr). \]
Notice first that $K_n \cup (X \setminus O_n)$ is closed for all $n \geq 1$, and so $K$ is also closed. Second, we have
\[ \mu(X \setminus K) = \mu\left( X \setminus \bigcap_{n=1}^{\infty} \bigl( K_n \cup (X \setminus O_n) \bigr) \right) \leq \sum_{n=1}^{\infty} \mu\bigl( \underbrace{X \setminus ( K_n \cup (X \setminus O_n) )}_{= O_n \setminus K_n} \bigr) < \varepsilon \]
by construction of $K_n$ and $O_n$. Finally, notice that
\[ (f|_K)^{-1}(U_n) = K \cap f^{-1}(U_n) = K \cap O_n \]
is an open subset of $K$ (in the induced topology). Since this holds for all the sets $U_n$ in the basis, it follows that $f|_K$ is continuous.

If now in addition
\[ X = \bigcup_{n=1}^{\infty} L_n \]
is a countable union of compact sets, then
\[ K' = K \cap \bigcup_{n=1}^{N} L_n \]
satisfies the final claim of the proposition if $N$ is sufficiently large.
Appendix Z: To do list
On page 60, do we want to refer forward to the construction of the metric in an appendix (for countable product of metric spaces). Notation: Usual things, little-oh and big-oh, Lp spaces. Less usual things: weak partials etc. are initially going to be and , background, measure theory, real analysis, metric spaces. Thanks: Anthony Flatters, Alex Maier, Andrea Riva, Thomas Hille; specifically Emanuel Kowalski for spectral theory notes Summability-kernels / approximate identities should appear also by name at some point. Notion of Lebesgue integration in appendix on measure theory. Number of exercises is 179 Number of gures is 25 Where does this one belong to?
Exercise B.18. Show that for any $A \subseteq \mathbb{Z}$ we have
\[ \limsup_{k \to \infty} \frac{1}{k} \sup_{M \in \mathbb{Z}} \bigl| A \cap [M, M + k - 1) \bigr| = \sup_{m} m(A) = \max_{m} m(A), \]
where the supremum (resp. maximum) is taken over all the invariant means $m$ on $\mathbb{Z}$. This quantity is often called the Banach upper density and is often also written in the abbreviated form
\[ \limsup_{N - M \to \infty} \frac{1}{N - M} \bigl| A \cap [M, N) \bigr|. \]
Hints for Selected Problems
Exercise 1.10 (p. 11): Express this quantity in terms of the orbit of $0$ under the map $t \mapsto t + \log_{10} 2$ modulo $1$.

Exercise 1.18 (p. 24): Extend the given function first to an odd function on $(-1, 1)$ and then by periodicity to a function on $\mathbb{R}/2\mathbb{Z}$. Then use the Fourier series for the function.

Exercise 2.48 (p. 75): Use the Cauchy integral formula (assuming that the paths $\gamma_i$ are oriented correctly) to see that
\[ f(a) = \frac{1}{2\pi i} \sum_{i=1}^{k} \int_{\gamma_i} \frac{f(z)}{z - a} \, dz \]
and deduce that $f \mapsto f(a)$ is continuous with respect to $\|\cdot\|_{H^p(D)}$.
Exercise 2.72 (p. 90): For (a) apply Corollary 2.71 to the linear functional $x \mapsto B(x, y)$ for a fixed $y \in H$. For (b), notice that $\|Tx\| \geq c \|x\|$ and show that this implies that $T(H) \subseteq H$ is closed. Now notice that if $x \in T(H)^\perp$ then $\langle Tx, x \rangle = 0 \geq c \|x\|^2$.

Exercise 2.75 (p. 91): Either use Section 2.2.2, or define an inner product on the dual (or double dual) of $H$.

Exercise 3.6 (p. 107): Recall that $e_1, \dots, e_d$ were sufficient to separate points, and that the group of characters generated by these consists of all the characters of the stated form.

Exercise 3.22 (p. 118): Localize to a small open subset $B_\varepsilon(x)$ by multiplying by a function in $C_c^\infty(B_\varepsilon(x))$ which is equal to $1$ on $B_{\varepsilon/2}(x)$. Treat the new localized function as an element on $\mathbb{T}^2$. Now generalize Theorem 3.10 to give an inequality concerning (and as a result, the existence of) $\partial_{x_1} \partial_{x_2} f$ at $x$. This exercise should become easier after reading Theorem 3.44.
Exercise 3.25 (p. 119): Do this via a familiar sequence of approximations, rst for indicator functions of measurable sets, then for simple functions, then for non-negative functions by monotone convergence, and nally for all integrable functions. Notice that the property (3.13) also gives a characterization of a group action being measure-preserving, simply by considering f = B . Exercise 3.31 (p. 123):(a) Use Fubinis theorem to show that (g, x) (g )f (g 1 x) belongs to L2 (mG ). Then apply Fubinis theorem a second time. (b) Apply Theorem 3.9 to the map g f (g 1 x) for all x X with the property that g f (g 1 x) lies in L2 mG . Now apply the dominated convergence theorem for
2 2
(x)2 d(x).
Exercise 3.38 (p. 127): For (a) use Fubini, and for (b) start with the formal calculation f1 (f2 v ) =

f1 (h)h
f2 (g )g (v ) dmG (g ) dmG (h)
f1 (h)f2 (g )hg (v ) dmG (g ) dmG (h)
and then substitute k = hg and argue using Lemma 3.33. Exercise 3.39 (p. 127): Use integration by parts just as in the proof of Theorem 3.10 to bound n f uniformly on compact subsets of R2 .
Exercise 3.49 (p. 136): Describe the relationship between the Fourier coefcients of f and of f and use Lemma 3.43. Exercise 3.59 (p. 139): Either convolve with an approximate identity (that is, a function 0 of small support with Cc (R2 ) and = 1) or show rst that the sequence of functions dened by fn = min{n, f } all lie in H 1 (B1 (0)). Exercise 3.61 (p. 141): Use the regular map to pull back any function f C (U ) H 1 (U ) (or f H 1 (U )) to an element f C (0, 1)d H 1 (0, 1)d (or f H 1 (0, 1)d ) and then apply Proposition 3.60. Exercise 4.14 (p. 169): Consider the images under K of the functions fn = [3n,3n+1] , all of which have L2 -norm one. Exercise 4.23 (p. 173): Expand and f in terms of the orthonormal basis of Theorem 4.22, and compare coecients.
Exercise 6.5 (p. 205): Construct the complement as a kernel of a linear map, using the HahnBanach theorem. Exercise 6.13 (p. 209): Consider a dense countable subset {1 , 2 , . . . } and choose for every n some xn X with xn = 1 and |n (xn )| n /2. Now take the Q-linear (or Q(i)-linear) hull of {xn }, which is countable, and show that it is dense. Exercise 6.23 (p. 213):(b) Emulate the strategy used to show that a free group is not amenable in Example 6.21. Exercise 6.35 (p. 222): This should become clear by combining Lemma 6.34 with the compactness argument used in the proof of Theorem 6.30 for totally disconnected spaces, since N< = nN Nn . Exercise 6.31 (p. 220): Check the claim rst for open subsets, and then argue along the lines used to prove Proposition 2.38. Exercise 6.39 (p. 230): Apply Theorem 6.30 to obtain a locally nite measure. Assuming that (X ) = , nd some f C0 (X ) for which X f d = , and then use positivity to obtain a contradiction. Exercise 6.44 (p. 233): For (a), notice that (N) can be embedded 1 | into L (X ) using functions that vanish everywhere except on the set { n n N}. Now extend the Banach limit from (N) to L (X ) and show that it does not arise from a signed measure on X . Extend the Banach limit from (N) and how that it cannot arise from a signed measure on X . (b) If f : X R is a non-measurable bounded function on X then f induces a linear functional on the space {{ M(X ) | (X D) = 0 for some countable set D X }, since for each such measure one can dene extend this functional to all of M(X ). f d as a countable sum. Now
Exercise 6.24 (p. 213): Construct a box-like (not cube-like) Flner sequence. Exercise 7.13 (p. 241): For every g G show that the map Lg : CR (G) CR (G) dened by (Lg f )(x) = f (gx) is an isometry, and that { | = Lg , 0, () = 1}
is a closed subset of the unit ball in C (G) . Exercise 7.6 (p. 239): For (a) use the Baire category theorem (Theorem 5.9). For (b) assume that the neighborhoods of the form Nx1 ,...,xn ;1/n (0) form a basis of the weak* topology neighborhoods of 0 X and conclude that X is
the linear hull of {x1 , x2 , . . . } by using the same argument as was used in the proof of Lemma 7.9. Exercise 7.12 (p. 241): Suppose that there is a sequence that converges weakly but not in norm. Show that this implies that there is a sequence (fn ) in 1 (N) such that fn = 1 for all n 1 but for which fn converges weakly to 0 as n . Use this to construct a partition N = [1, I1 ] [I1 + 1, I2 ] of N into subintervals and a subsequence (fnj ) such that
Ij 1 k=1
|fnj (k )|
1 5
and
k=Ij +1
|fnj (k )|
1 5
for all j
1. Using this partition, construct an element h (N) for which

k=1
fnj (k )h(k ) 0
as j . Exercise 7.30 (p. 248): For ergodicity, use Fourier series as in the proof of Lemma 7.28. Exercise 7.35 (p. 252): The uniform operator topology is the only topology that has neighborhoods that are bounded with respect to the operator norm. If xn X and yn Y have norm one, then L 1 y (Lxn ) 2n n n=1
is a continuous functional on B (X, Y ) and so also continuous with respect to the weak topology. Choosing the sequence (xn ) carefully makes this functional not continuous with respect to the strong, nor the weak, operator topology. Finally, notice that for the strong operator topology and x X {0} there exists a neighborhood, namely Nx,1 (0) such that {Lx | L Nx,1 (0)} Y is bounded while this is not true for the weak operator topology. Exercise 7.38 (p. 253): Apply the HahnBanach lemma (Lemma 6.1).
Exercise 8.22 (p. 280):(c) Apply the StoneWeierstrass theorem to the onepoint compactication of Rd . (d) First approximate g simultaneously in L1 (Rd ) and L2 (Rd ) by some func2 tion f0 Cc (Rd ). Then approximate e x f0 (x) by some function f1 A 2 with respect to , and notice that f = e x f1 will then approximate g with respect to 1 and 2 . (e) Consider f, g A and f, g = f g dx = (f g)(0)
in terms of the Fourier transform of f and g . It may be helpful to prove and then use the identity (f g ) = f g. Exercise 8.35 (p. 288): Consider the associated (well-dened) function g C (Td ) dened by f (n + x). g (x) =
nZd
Exercise 9.16 (p. 298): Start with the identity star operator and then use the C -property. Exercise 10.7 (p. 311): Prove that
= = , apply the
(Im(T )) = ker (T )
and then use this together with an explicit description of T . Exercise 10.49 (p. 348): Use Fourier series and the resulting isomorphism 2 (Z) = L2 (T). Exercise 9.6 (p. 290):(b) You may show that A is, as a Banach algebra, isomorphic to the algebra generated by S with S as in Exercise 4.19(b), see also Exercise 4.1.
Notes
(1) (Page 24) The additional flexibility afforded by the theory of distributions allows, for example, locally integrable functions to be differentiated in the sense of distributions. The theory is of central importance in partial differential equations, where it sometimes allows solutions to be found in the sense of distributions when they cannot be readily found in the classical sense (we will see some of these ideas in Chapters 3 and 4). The theory of generalized functions was initiated by Sobolev [45] to provide weak solutions to certain partial differential equations, and then developed systematically by Schwartz [42], [43].
(2) (Page 29) There is also a family of certain finite quotients of $\mathrm{SL}_2(\mathbb{Z})$ that give an expander family, but the proof of this lies deeper and goes beyond what we will be able to cover. We refer to the monographs of Sarnak [41] and Lubotzky [28] for the details.
(3) (Page 105) In particular, the convergence in $L^2$ does not imply convergence of the Fourier series at any given point, and a priori does not even imply convergence almost everywhere. In the classical setting $G = \mathbb{T}$, these questions have been of central importance. Dirichlet proved that the Fourier series converges at each point if $f \in C^1(\mathbb{T})$, and Paul du Bois-Reymond showed that there is a function $f \in C(\mathbb{T})$ whose Fourier series diverges at one point. Luzin conjectured that the Fourier series converges almost everywhere to the function for $f \in L^2(\mathbb{T})$, and Kolmogorov [21] found a function in $L^1(\mathbb{T})$ whose Fourier series diverges almost everywhere. Carleson [7] proved the convergence almost everywhere for $f \in L^2(\mathbb{T})$, an extremely difficult result later extended to $f \in L^p(\mathbb{T})$ for $p \in (1, \infty)$ by Hunt [19]. We refer to Lacey [24] for a modern, approachable account. The situation is more complicated for functions on compact abelian groups, because there is no canonical way to sum over the group of characters.
(4) (Page 185) We refer to Courant and Hilbert [9] for a thorough classical treatment of Bessel functions.
(5) (Page 194) The Baire category theorem is a powerful tool across much of topology and analysis. It was shown by Osgood [34] for $\mathbb{R}$, and independently by Baire [1] for $\mathbb{R}^d$. It was later applied in functional analysis by Banach and Steinhaus [2].
(6) (Page 195) This analogy is pursued in a monograph by Oxtoby [35], motivated by work of Sierpiński [44] and Erdős [13], who showed that under the assumption of the continuum hypothesis there is an injective function $f : \mathbb{R} \to \mathbb{R}$ with $f = f^{-1}$ with the property that $f(A)$ is a null set if and only if $A$ is of first category.
(7) (Page 243) We refer to Parthasarathy [36] for a more detailed treatment of the theory of probability measures on compact metric spaces, and to [12, Ch. 4] for material on equidistribution from a dynamical point of view.
(8) (Page 247) This is the Kryloff–Bogoliouboff Theorem [22], and it means that a continuous transformation on a compact metric space always gives rise to one (and perhaps to many) measure-preserving systems.
(9) (Page 299) This is the simplest result in the topic of automatic continuity, which asks for algebraic conditions on Banach algebras $A$ and $B$ that ensure that any algebra homomorphism from $A$ to $B$ is continuous. We refer to the monograph of Dales [10] for a thorough account.
(10) (Page 349) We refer to the monograph of Lubotzky [28, Sec. 4.5] and the papers of Kesten [20] and Buck [5] for the proof (and for generalizations to other Cayley graphs).
(11) (Page 366) Some of these are well explained in the monograph of Wagon [48].
References
1. R. Baire, Sur les fonctions de variables réelles, Annali di Mat. (3) III (1899), 1–123.
2. S. Banach and H. Steinhaus, Sur le principe de la condensation de singularités, Fundamenta 9 (1927), 50–61 (French).
3. B. Blackadar, Operator algebras, in Encyclopaedia of Mathematical Sciences 122 (Springer-Verlag, Berlin, 2006). Theory of C*-algebras and von Neumann algebras, Operator Algebras and Non-commutative Geometry, III.
4. N. Bourbaki, Sur certains espaces vectoriels topologiques, Ann. Inst. Fourier Grenoble 2 (1950), 5–16 (1951).
5. M. W. Buck, Expanders and diffusers, SIAM J. Algebraic Discrete Methods 7 (1986), no. 2, 282–304. http://dx.doi.org/10.1137/0607032.
6. C. Carathéodory, Vorlesungen über reelle Funktionen, Third (corrected) edition (Chelsea Publishing Co., New York, 1968).
7. L. Carleson, On convergence and growth of partial sums of Fourier series, Acta Math. 116 (1966), 135–157.
8. J. B. Conway, A course in functional analysis, in Graduate Texts in Mathematics 96 (Springer-Verlag, New York, second ed., 1990).
9. R. Courant and D. Hilbert, Methods of mathematical physics. Vol. I (Interscience Publishers, Inc., New York, N.Y., 1953).
10. H. G. Dales, Banach algebras and automatic continuity, in London Mathematical Society Monographs. New Series 24 (The Clarendon Press, Oxford University Press, New York, 2000). Oxford Science Publications.
11. M. Einsiedler and T. Ward, Homogeneous dynamics and applications. In preparation.
12. M. Einsiedler and T. Ward, Ergodic theory with a view towards number theory, in Graduate Texts in Mathematics 259 (Springer-Verlag London Ltd., London, 2011).
13. P. Erdős, Some remarks on set theory, Ann. of Math. (2) 44 (1943), 643–646.
14. G. B. Folland, A course in abstract harmonic analysis, in Studies in Advanced Mathematics (CRC Press, Boca Raton, FL, 1995).
15. H. Furstenberg, Strict ergodicity and transformation of the torus, Amer. J. Math. 83 (1961), 573–601.
16. G. Greschonig and K. Schmidt, Ergodic decomposition of quasi-invariant probability measures, Colloq. Math. 84/85 (2000), no. 2, 495–514.
17. E. Hewitt and K. A. Ross, Abstract harmonic analysis. Vol. I, in Grundlehren der Mathematischen Wissenschaften 115 (Springer-Verlag, Berlin, second ed., 1979).
18. D. Hilbert and E. Schmidt, Integralgleichungen und Gleichungen mit unendlich vielen Unbekannten, in Teubner-Archiv zur Mathematik [Teubner Archive on Mathematics], 11 (BSB B. G. Teubner Verlagsgesellschaft, Leipzig, 1989). Edited and with a foreword and afterword by A. Pietsch, With English, French and Russian summaries.
19. R. A. Hunt, On the convergence of Fourier series, in Orthogonal Expansions and their Continuous Analogues (Proc. Conf., Edwardsville, Ill., 1967), pp. 235–255 (Southern Illinois Univ. Press, Carbondale, Ill., 1968).
20. H. Kesten, Symmetric random walks on groups, Trans. Amer. Math. Soc. 92 (1959), 336–354.
21. A. Kolmogorov, Une série de Fourier-Lebesgue divergente presque partout, Fundamenta math. 4 (1923), 324–328 (French).
22. N. Kryloff and N. Bogoliouboff, La théorie générale de la mesure dans son application à l'étude des systèmes dynamiques de la mécanique non linéaire, Ann. of Math. (2) 38 (1937), no. 1, 65–113.
23. L. Kuipers and H. Niederreiter, Uniform distribution of sequences (Wiley-Interscience [John Wiley & Sons], New York, 1974). Pure and Applied Mathematics.
24. M. T. Lacey, Carleson's theorem: proof, complements, variations, Publ. Mat. 48 (2004), no. 2, 251–307.
25. P. D. Lax, Functional analysis, in Pure and Applied Mathematics (New York) (Wiley-Interscience [John Wiley & Sons], New York, 2002).
26. M. Loève, Probability theory. I, in Graduate Texts in Mathematics 45 (Springer-Verlag, New York, fourth ed., 1977).
27. M. Loève, Probability theory. II, in Graduate Texts in Mathematics 46 (Springer-Verlag, New York, fourth ed., 1978).
28. A. Lubotzky, Discrete groups, expanding graphs and invariant measures, in Modern Birkhäuser Classics (Birkhäuser Verlag, Basel, 2010). With an appendix by Jonathan D. Rogawski, Reprint of the 1994 edition.
29. G. A. Margulis, Explicit constructions of expanders, Problemy Peredači Informacii 9 (1973), no. 4, 71–80.
30. G. A. Margulis, Explicit constructions of expanders, Problems of Information Transmission 9 (1975), no. 4.
31. S. Mazur and S. Ulam, Sur les transformations isométriques d'espaces vectoriels normés, C. R. Math. Acad. Sci. Paris 194 (1932), 946–948.
32. C. H. Müntz, Über den Approximationssatz von Weierstraß, Schwarz-Festschr. (1914), 303–312.
33. J. v. Neumann, On a certain topology for rings of operators, Ann. of Math. (2) 37 (1936), no. 1, 111–115.
34. W. F. Osgood, Non-uniform convergence and the integration of series term by term, Amer. J. Math. 19 (1897), 155–190.
35. J. C. Oxtoby, Measure and category. A survey of the analogies between topological and measure spaces (Springer-Verlag, New York, 1971). Graduate Texts in Mathematics, Vol. 2.
36. K. R. Parthasarathy, Probability measures on metric spaces, in Probability and Mathematical Statistics, No. 3 (Academic Press Inc., New York, 1967).
37. F. Peter and H. Weyl, Die Vollständigkeit der primitiven Darstellungen einer geschlossenen kontinuierlichen Gruppe, Math. Ann. 97 (1927), no. 1, 737–755.
38. M. S. Pinsker, On the complexity of a concentrator, in Proceedings of the Seventh International Teletraffic Congress (Stockholm, 1973), 318 (1973), 318/1–318/4. Unpublished.
39. M. S. Pinsker, On the complexity of a concentrator, Problems of Information Transmission 9 (1975), no. 4, 325–332.
40. M. Reed and B. Simon, Methods of modern mathematical physics. I. Functional analysis (Academic Press, New York, 1972).
41. P. Sarnak, Some applications of modular forms, in Cambridge Tracts in Mathematics 99 (Cambridge University Press, Cambridge, 1990).
42. L. Schwartz, Théorie des distributions. Tome I, in Actualités Sci. Ind., no. 1091 = Publ. Inst. Math. Univ. Strasbourg 9 (Hermann & Cie., Paris, 1950).
43. L. Schwartz, Théorie des distributions. Tome II, in Actualités Sci. Ind., no. 1122 = Publ. Inst. Math. Univ. Strasbourg 10 (Hermann & Cie., Paris, 1951).
44. W. Sierpiński, Sur les fonctions jouissant de la propriété de Baire de fonctions continues, Ann. of Math. (2) 35 (1934), no. 2, 278–283.
45. S. Soboleff, Méthode nouvelle à résoudre le problème de Cauchy pour les équations linéaires hyperboliques normales, Rec. Math. [Mat. Sbornik] N.S. 1(43) (1936), no. 1, 39–72.
46. M. Takesaki, Theory of operator algebras. I (Springer-Verlag, New York, 1979).
47. F. Trèves, Topological vector spaces, distributions and kernels (Academic Press, New York, 1967).
48. S. Wagon, The Banach-Tarski paradox, in Encyclopedia of Mathematics and its Applications 24 (Cambridge University Press, Cambridge, 1985). With a foreword by Jan Mycielski.
49. A. Weil, L'intégration dans les groupes topologiques et ses applications, in Actual. Sci. Ind., no. 869 (Hermann et Cie., Paris, 1940).
50. H. Weyl, Über die Gleichverteilung von Zahlen mod Eins, Math. Ann. 77 (1916), 313–352.
Author Index
Alaoglu, 237, 243 Arzelà, 59 Ascoli, 59 Baire, 194, 397 Banach, 6, 44, 189, 201, 209, 242, 289, 389, 397 Benford, 11 Bergmann, 75, 90 Bessel, 185 Blackadar, 308 Blaschke, 206 Bochner, 265 Bogoliouboff, 398 Bourbaki, 262 Buck, 398 Carathéodory, 223, 380 Carleson, 397 Cauchy, 44, 81, 277, 293 Cesàro, 209 Conway, 253 Dales, 398 Dirac, 243 Dirichlet, 21, 22, 103, 111, 147, 155 Dunford, 242 Erdős, 398 Fejér, 112 Fekete, 295 Folland, 159 Følner, 212 Fourier, 6, 7, 24, 105, 193, 303 Fréchet, 89, 262, 288, 331 Furstenberg, 244 Gauss, 277 Gelfand, 298, 301 Green, 15, 17 Greschonig, 242 Haar, 104 Hahn, 201 Hardy, 75, 90 Hewitt, 308 Hilbert, 80, 81, 88, 166 Hölder, 109, 216, 384 Holmgren, 167 Jordan, 161 Každan, 29 Kesten, 398 Kolmogorov, 397 Krein, 258 Kryloff, 398 Kuipers, 8 Lacey, 397 Laplace, 18, 29, 359 Lax, 90, 169 Lax–Milgram lemma, 90 Lebesgue, 12 Liouville, 14, 17 Lipschitz, 38 Lubotzky, 30, 397, 398 Lusin, 78 Luzin, 397 Margulis, 28 Mazur, 84 Milgram, 90 Milman, 258 de Moivre, 7 Müntz, 206 von Neumann, 262, 307, 308 Niederreiter, 8 Nikodym, 95 Osgood, 397 Oxtoby, 398
Parseval, 108 Parthasarathy, 398 Peter, 159 Pinsker, 28 Plancherel, 284, 286 Poisson, 288 Pontryagin, 159, 303 Radon, 95 Reed, 33 Riesz, 89, 220, 243, 331 Ross, 308 Sarnak, 397 Schmidt, 166, 242 Schwartz, 262 Schwarz, 81, 397 Sierpiński, 398 Simon, 33 Sobolev, 128, 133, 397 Steinhaus, 189, 242, 397
Stone, 9, 61, 104, 107, 301 Sturm, 14, 17 Takesaki, 262 Taylor, 18 Tietze, 78 Toeplitz, 239 Trèves, 262 Tychonoff, 60, 237, 243, 373 Ulam, 84 Urysohn, 374 Volterra, 13, 14 Wagon, 398 Weierstrass, 9, 61, 104, 107 Weil, 262 Weyl, 159, 244 Zorn, 100, 202, 326, 366, 374
Notation
k (), matrix of rotation through 2 on $\mathbb{R}^2$, 5 n , character e2in , 6 $C([0, 1])$, continuous complex-valued functions on $[0, 1]$, 8 $R([0, 1])$, Riemann-integrable complex-valued functions on $[0, 1]$, 8 A , indicator function of the set $A$, 9 $\{\cdot\}$, fractional part of a real number, 11 $\Delta$, Laplace operator, 18 $S^{d-1}$, $(d-1)$ unit sphere in $\mathbb{R}^d$, 19 $L^1$, space of integrable functions, 25 $C_c^\infty$, space of smooth compactly supported functions, 25 $L^1_{\mathrm{loc}}$, space of locally integrable functions, 25 $A_G$, adjacency matrix for a graph, 29 $M_G$, averaging operator for a graph, 29 $c_0$, space of null sequences, 58 $C_{\mathbb{R}}(X)$, $C_{\mathbb{C}}(X)$, real- and complex-valued continuous functions, 61 $B(V, W)$, bounded linear operators from $V$ to $W$, 70 $B(V)$, bounded linear operators from $V$ to $V$, 70 $V^*$, continuous linear functionals on $V$, 70 $H^p(D)$, Hardy space, 75 $A^p(D)$, Bergmann space, 75 $Y^\perp$, orthogonal complement in a Hilbert space, 88 S , linear hull of $S$, 91 1 2 , mutually singular measures, 94 $P(I)$, set of all subsets of $I$, 99 P(), set of all subsets, 100 H , weight space associated to character , 122 $H^k(\mathbb{T}^d)$, 128 $K(V, W)$, $K(V)$, space of compact operators, 162 $K(V, W)$, space of compact operators, 163 $A^*$, adjoint of operator $A$, 169 $G_\delta$-set, 195 S , push-forward of a measure, 247 D ( ), space of test functions, 262 S , space of Schwartz functions, 262 $S(\mathbb{R}^d)$, Schwartz space on $\mathbb{R}^d$, 288 $\rho(a)$, resolvent set, 289 $B(H)$, algebra of bounded operators on a Hilbert space, 297 $\beta\mathbb{N}$, Stone–Čech compactification of $\mathbb{N}$, 301 $\sigma_{\mathrm{discrete}}$, discrete spectrum, 309 $\sigma_{\mathrm{continuous}}$, continuous spectrum, 310 $P(X)$, set of subsets of $X$, 366 B (), ball in a metric space, 367 lim F , limit of a convergent filter, 368 limF f , convergence along a filter, 368 $\sigma(C)$, $\sigma$-algebra generated by $C$, 379 1 L , space of integrable functions, 383
General Index
absorbent, 253, 255 adjacency matrix, 29 adjoint operator densely defined, 357 affine map, 84 algebra, 104 Banach, 79, 80, 101, 127, 289, 291, 292, 294, 301, 302 automatic continuity, 398 bounded operators, 309 C*-star, 297 continuous functions, 79 dual space, 298 Gelfand dual, 298, 301 Gelfand transform, 302 homomorphism, 317 ideal, 163 integrable functions, 281 maximal ideal, 299 von Neumann, 308 spectral radius, 290 spectrum, 289, 309 unital, 289 C*, 297 normal element, 298 self-adjoint element, 298 star operator, 297 commutative, 299 homomorphism, 298 von Neumann, 262, 308 almost everywhere, 42, 117, 383, 397 amenable group, 210 analytic strongly, 241 weakly, 241 approximate eigenvector, 310, 320 approximate identity, 114, 115, 283, 304, 392 arithmetic-geometric mean inequality, 168 Arzelà–Ascoli theorem, 59 atom, 310 automatic continuity, 398 axiom of choice, 365, 366 Baire category theorem, 194 balanced, 253 ball, 38 Banach algebra, 79, 80, 101, 289, 301, 302 automatic continuity, 398 bounded operators, 309 C*-star, 297 continuous functions, 79 dual space, 298 examples, 79 field, 291 Gelfand dual, 298, 301 Gelfand transform, 302 generated, 291 homomorphism, 317 ideal, 163 integrable functions, 127, 281 inverse of an element, 291 maximal ideal, 299 von Neumann, 308 resolvent, 291 spectral radius, 290 spectrum, 289, 292, 294, 309 unital, 289 without a unit, 301 limit, 209 space, 6, 44 reflexive, 123, 208 upper density, 389 Banach–Steinhaus theorem, 189, 242 application to Fourier analysis, 191
base field, 24 Bergmann space, 75, 90 Bessel function, 185 bi-dual, 208 Blaschke product, 206 Bochner theorem, 265 boundary of a set in a graph, 27 boundary conditions, 14 boundary value problem Dirichlet, 21 bounded linear operator extension, 73 C-star C*-algebra, 297 normal element, 298 self-adjoint element, 298 star operator, 297 C*-algebra normal element spectral radius formula, 298 Carathéodory extension theorem, 223, 380 category first, 194 second, 194 Cauchy formula, 277 sequence, 44 Cauchy–Schwarz inequality, 81, 385 Čech, 301 Cesàro average, 209 character, 6, 105, 298 separate points, 105 weight, 122 cheating, 24 Chebyshev polynomial of the second kind, 350 circle rotation, 247 clopen, 221 set, 366
closable operator, 357 closed linear hull, 91, 205 operator, 198 set, 366 coarser filter, 368 coarser topology, 367 coercive, 90 commutative algebra, 299 ideal, 299 quotient, 299 compact, 372 integral operator, 165, 172 intersection property, 373 operator, 161, 162 Hilbert–Schmidt, 166 ideal in a Banach algebra, 163 preserved by limits, 164 regularity property, 163 spectral theorem, 172 sequentially, 373 totally bounded, 373 Tychonoff, 373 complete, 98 conditional expectation, 91 conjugate exponent, 109, 384 space, 201 connected graph, 26 content, 222 continuous, 70 addition, 38 function, 367 functions dense in $L^p$, 66 group action, 118 scalar multiplication, 38 uniformly, 59 convergence uniform, 44 convergent
filter, 368 sequence, 368 convex, 35 set, 85 absorbent, 253, 255 balanced, 253 space, 85 convolution, 108, 109 operator, 124 cover finite subcover, 372 open, 372 cyclic vector, 325 decay super-polynomial, 288 dense, 8 densely defined operator, 357 adjoint, 357 self-adjoint, 357 diameter graph, 26 differential equation fundamental solutions, 17 ordinary, 12 Dirichlet boundary problem, 103 wave equation, 22 boundary value problem, 21, 147, 155 kernel, 111 discrete spectrum, 309 distribution, 24 divergence theorem, 157 dual Banach algebra, 298 Gelfand, 298 space, 201 edge, 26 eigenvector approximate, 310, 320
elliptic differential operator, 148 regularity, 128, 148, 150 regularity on the torus, 151 entire function, 294 equicontinuous, 59 equidistributed, 8 equidistributes, 243 equivalent norm, 38 ergodic, 245 circle rotation, 247 relation to indecomposable, 245 essential range, 313 even function, 3 part, 3 expander family, 27 graph, 26 logarithmically small diameter, 27 expectation conditional, 91 extension bounded linear operator, 73 extremal point, 258 Fejér kernel, 112 Fekete's lemma, 295 filter, 368 compactness, 373 convergence, 368 convergence along, 368 finer, coarser, 368 neighborhood, 368 tail, 368 finer filter, 368 finer topology, 367 finite intersection property, 259, 372 first category, 194
Følner sequence, 212 Fourier analysis, 6 coefficient, 7 measure, 265 inversion theorem, 282 series, 7, 24, 105 convergence almost everywhere, 397 diverges almost everywhere, 397 nonconvergent, 193 transform, 276, 303 Gaussian, 277 not an isometry, 305 frequency variable, 276 Fréchet space, 254, 288 Fréchet–Riesz theorem, 89 function continuous, 367 even, 3 even, odd part, 3 uniqueness, 4 generalized, 24 odd, 3 simple, 381 integral, 381 test, 24 weight, 6 functional, 24, 201 calculus, 274 measurable, 314 gauge function, 203, 255 $G_\delta$-set, 195 Gelfand dual, 298, 301 and Pontryagin dual, 303 transform, 302 not an isometry, 305 generalized function, 24 graph, 26, 347 adjacency matrix, 29
boundary of a set, 27 connected, 26, 30 diameter, 26 edge, vertex, 26 expander family, 30 k-regular, 26 Laplace operator, 29 metric, 26 path, 26 regular tree, 347 simple, 26 sparsity, 26 spectral gap, 30 undirected, 26, 347 graph of a linear operator, 198 Green function, 15, 17 group action, 4 associated unitary operator, 119 measure-preserving, 119 amenable, 34, 210 character, 6 continuous action, 118 topological, 103 unitary representation, 119 Haar measure, 104 Hahn–Banach lemma, 201 theorem, 201, 203 Hardy space, 75, 90 harmonic function, 147, 155 mean value principle, 157 weak, 149 weakly, 149 heat equation, 20 Heine–Borel theorem, 57 Hilbert space, 80, 81 norm is strictly sub-additive, 84 orthogonal complement, 88
orthogonal projection, 89 Hilbert–Schmidt operator, 166 Hölder conjugate, 109 inequality, 109, 384 homogeneous, 12 homomorphism algebra, 298 hull closed linear, 91 linear, 91 ideal, 299 commutative algebra, 299 maximal, 299 proper, 299 inequality Cauchy–Schwarz, 81, 385 Hölder, 384 triangle, 367, 385 infinite intersection property, 372 initial topology, 369 values, 12 inner product, 80 space, 80 sesqui-linear, 80 integral operator, 12, 14, 165 compact, 165 simple function, 381 invariant measure, 245 ergodic, 245 isometry, 70 isomorphism isometric, 74 Jordan block, 161 Krein–Milman theorem, 258 Laplace operator, 18, 348, 359 graph, 29
Laplace operator, 347 Lebesgue integral, 379 LF space, 262 limit Banach, 209 topology, 369 linear functional, 70, 201 hull, 91 closed, 91 operator, 70 extension, 73 order, 366 Lipschitz constant, 38 locally convex extremal point, 258 topology, 252 vector space, 252 finite measure, 119 $H^k$, 150 $L^p$, 150 Lusin theorem, 78 maximal element, 366 Mazur–Ulam theorem, 84 meager, 194 mean value principle, 157 measure Fourier coefficient, 265 invariant, 245 ergodic, 245 locally finite, 119, 220 preserving, 119 preserving system, 245 regular, 220, 386 $\sigma$-finite, 94 space, 380 spectral, 264 metric, 367 pseudo, 367
space, 367 separable, 60 miracle minor, 231 mixing, 261 de Moivre formula, 7 multiplication operator spectrum, 290, 310 Müntz theorem, 206 neighborhood, 367 filter, 368 von Neumann algebra, 262, 308 non-diagonal spectral measure, 330 norm, 35 defines a metric, 38 equivalent, 38 operator, 70 pseudo-, 41 semi-, 41, 42, 45, 56, 148, 239, 250, 252, 254, 255 normal space, 374 topological space, 374 normed space bi-dual, 208 vector space inner product, 80 normed linear space dual, 201 normed vector space conjugate, 201 nowhere dense, 194 nuclear space, 262 null set, 383 odd function, 3 part, 3 open cover, 372
mapping theorem, 193 set, 366 operator, 14 averaging, 347 closable, 357 compact, 162 spectral theorem, 172 conditional expectation, 91 densely defined, 357 adjoint, 357 self-adjoint, 357 eigenvalues, 16 Hilbert–Schmidt, 166 integral compact, self-adjoint, 172 Laplace, 348 multiplication, 274, 290, 310 norm, 70 positive, 319 self-adjoint, 199 spectral theorem, 172 summing, 347 unbounded, 355 unitary, 170, 263 spectral theory, 264 unitary multiplication, 263 order linear, 366 partial, 366 ordinary differential equation, 12 homogeneous, 12 initial value, 12 Sturm–Liouville, 14 Volterra, 14 orthogonal complement, 88 projection, 89 orthonormal, 96 basis, 98 parallelogram identity characterizes Hilbert space, 82 Parseval formula, 108 partial
differential equation heat equation, 20 wave equation, 22 order, 366 maximal element, 366 partition of unity, 375 path, 26 Plancherel formula, 284, 286 Poisson summation formula, 288 polarization identity, 82, 274 Pontryagin dual, 159, 303 positive definite sequence, 264 Bochner theorem, 265 operator, 319 power set, 100, 368 pre-dual, 236, 260 pre-Hilbert space, 80 probability space, 380 product topology, 369 projection orthogonal, 89 valued measure, 274, 336 projective topology, 369 pseudo metric, 41 -norm, 41 pseudo-metric, 367 Pythagoras theorem, 88 Radon–Nikodym derivative, 94, 95 $\sigma$-finite case, 96 random walk, 347 reflexive, 123, 201, 208 regular measure, 220 reproducing kernel, 90 residual spectrum, 310 resolvent, 291, 293
function, 293 resonance, 23 Riesz representation, 220 Schwartz space, 288 second category, 194 self-adjoint, 161 integral operator, 172 operator, 199 densely defined, 357 spectral theorem, 172 semi-norm, 41, 42, 45, 56, 148, 239 absorbent set, 255 continuous, 42 defines a norm, 42 Fréchet space, 254 kernel, 41 locally convex topology, 252 strong operator topology, 250 sequence, 368 Cauchy, 44 dense, 8 equidistributed, 8 sequentially compact, 373 sesqui-linearity, 80 set clopen, 366 closed, 366 open, 366 theory, 365 simple function, 381 Sobolev embedding theorem, 128, 133 space, 21, 128 space variable, 276 spectral gap, 30 measure, 264, 274, 322 non-diagonal, 330 radius, 290, 298 theorem, 171, 343 projection-valued measure form, 343
proof, 175 theory, 101 unbounded self-adjoint, 355 unitary operator, 264 unitary operators, 263 spectrum, 290 discrete, point, 309 essential range, 313 residual, 310 star operator, 297 Stone–Weierstrass theorem, 61, 104, 107 Stone–Čech compactification, 301 strong analytic, 241 convergence, 190 operator topology, 250, 337 stronger topology, 367 Sturm–Liouville boundary value problem integral operator, 17 equation, 14 sub-additive sequence, 295 strictly, 84 sub-multiplicative sequence, 295 super-polynomial decay, 288 superposition, 277 tail filter, 368 Taylor approximation, 18 test function, 24, 135 thermal equilibrium, 20 Tietze extension theorem, 78 Toeplitz theorem, 239 topological group, 103 abelian, 103 character, 105 space normal, 374 vector space, 24 topological space compact, 372
neighborhood, 367 normal, 374 topology, 366 coarser, weaker, 367 induced from a subset, 369 initial, limit, projective, weak, 369 locally convex, 252 product, 369 strong operator, 250, 337 stronger, finer, 367 uniform operator, 250 weak, 235 weak operator, 252 weak*, 236 total derivative, 18 totally bounded, 373 totally disconnected, 221 trace, 139 transfer method, 146 tree averaging operator, 347 Laplace operator, 348 regular, 347 summing operator, 347 triangle inequality, 367 $L^p$-norm, 385 trigonometric polynomial, 8 Tychonoff Alaoglu theorem, 237 theorem, 60, 373 ultrafilter, 373 uniform boundedness, 189 convergence, 44 convergence on compact sets, 254 operator topology, 250 uniformly continuous, 59 convergent, 190 convex space, 85
distributed, 8 unit ball compact in weak* topology, 237 ball is non-compact, 57 unital, 79 Banach algebra, 289 unitary, 161 operator, 119, 170 representation, 119 cyclic, 266 Urysohn lemma, 374 vector space inner product, 80 norm, 35 topological, 24 vertex, 26 Volterra equation, 13, 14 wave
equation, 22 Dirichlet boundary problem, 22 vibrating string, 24 weak analytic, 241 analytic function is analytic, 242 derivative, 135 harmonic, 149 operator topology, 252 topology, 201, 235, 369 weak* topology, 201, 236 weaker topology, 367 weakly harmonic function, 149 weight, 122 of a function, 6 space, 122 Zorn's lemma, 100, 366

