
Lecture Notes
Kuttler
July 6, 2010
Contents
I Preliminary Material 11
1 Set Theory 13
1.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.2 The Schroder Bernstein Theorem . . . . . . . . . . . . . . . . . . . . 16
1.3 Equivalence Relations . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.4 Partially Ordered Sets . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2 The Riemann Stieltjes Integral 21
2.1 Upper And Lower Riemann Stieltjes Sums . . . . . . . . . . . . . . . 21
2.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3 Functions Of Riemann Integrable Functions . . . . . . . . . . . . . . 26
2.4 Properties Of The Integral . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 Fundamental Theorem Of Calculus . . . . . . . . . . . . . . . . . . . 33
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3 Important Linear Algebra 39
3.1 Algebra in $\mathbb{F}^n$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2 Subspaces Spans And Bases . . . . . . . . . . . . . . . . . . . . . . . 42
3.3 An Application To Matrices . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 The Mathematical Theory Of Determinants . . . . . . . . . . . . . . 48
3.5 The Cayley Hamilton Theorem . . . . . . . . . . . . . . . . . . . . . 61
3.6 An Identity Of Cauchy . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.7 Block Multiplication Of Matrices . . . . . . . . . . . . . . . . . . . . 63
3.8 Schur's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
3.9 The Right Polar Decomposition . . . . . . . . . . . . . . . . . . . . . 73
3.10 The Space $\mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$ . . . . . . . . . . . . . . . . . . . . . . . 77
3.11 The Operator Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4 The Frechet Derivative 81
4.1 $C^1$ Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.2 $C^k$ Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3 Mixed Partial Derivatives . . . . . . . . . . . . . . . . . . . . . . . . 89
4.4 Implicit Function Theorem . . . . . . . . . . . . . . . . . . . . . . . 91
4.5 More Continuous Partial Derivatives . . . . . . . . . . . . . . . . . . 95
II Lecture Notes For Math 641 and 642 97
5 Metric Spaces And General Topological Spaces 99
5.1 Metric Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.2 Compactness In Metric Space . . . . . . . . . . . . . . . . . . . . . . 101
5.3 Some Applications Of Compactness . . . . . . . . . . . . . . . . . . . 104
5.4 Ascoli Arzela Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.5 General Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . 110
5.6 Connected Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6 Approximation Theorems 123
6.1 The Bernstein Polynomials . . . . . . . . . . . . . . . . . . . . . . . 123
6.2 Stone Weierstrass Theorem . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.1 The Case Of Compact Sets . . . . . . . . . . . . . . . . . . . 125
6.2.2 The Case Of Locally Compact Sets . . . . . . . . . . . . . . . 128
6.2.3 The Case Of Complex Valued Functions . . . . . . . . . . . . 129
6.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7 Abstract Measure And Integration 135
7.1 $\sigma$ Algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
7.2 The Abstract Lebesgue Integral . . . . . . . . . . . . . . . . . . . . . 143
7.2.1 Preliminary Observations . . . . . . . . . . . . . . . . . . . . 143
7.2.2 Definition Of The Lebesgue Integral For Nonnegative Measurable Functions . . . . . . . . . . . . . . . . . . . . . . . . . 145
7.2.3 The Lebesgue Integral For Nonnegative Simple Functions . . 147
7.2.4 Simple Functions And Measurable Functions . . . . . . . . . 150
7.2.5 The Monotone Convergence Theorem . . . . . . . . . . . . . 151
7.2.6 Other Definitions . . . . . . . . . . . . . . . . . . . . . . . . 152
7.2.7 Fatou's Lemma . . . . . . . . . . . . . . . . . . . . . . . . . 153
7.2.8 The Righteous Algebraic Desires Of The Lebesgue Integral . 155
7.3 The Space $L^1$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
7.4 Vitali Convergence Theorem . . . . . . . . . . . . . . . . . . . . . . . 162
7.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
8 The Construction Of Measures 169
8.1 Outer Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
8.2 Urysohn's Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
8.3 Positive Linear Functionals . . . . . . . . . . . . . . . . . . . . . . . 186
8.4 One Dimensional Lebesgue Measure . . . . . . . . . . . . . . . . . . 193
8.5 The Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . 193
8.6 Completion Of Measures . . . . . . . . . . . . . . . . . . . . . . . . . 195
8.7 Product Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.7.1 General Theory . . . . . . . . . . . . . . . . . . . . . . . . . . 199
8.7.2 Completion Of Product Measure Spaces . . . . . . . . . . . . 203
8.8 Disturbing Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
8.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
9 Lebesgue Measure 211
9.1 Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
9.2 The Vitali Covering Theorem . . . . . . . . . . . . . . . . . . . . . . 215
9.3 The Vitali Covering Theorem (Elementary Version) . . . . . . . . . . 217
9.4 Vitali Coverings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
9.5 Change Of Variables For Linear Maps . . . . . . . . . . . . . . . . . 223
9.6 Change Of Variables For $C^1$ Functions . . . . . . . . . . . . . . . . 227
9.7 Mappings Which Are Not One To One . . . . . . . . . . . . . . . . . 231
9.8 Lebesgue Measure And Iterated Integrals . . . . . . . . . . . . . . . 232
9.9 Spherical Coordinates In Many Dimensions . . . . . . . . . . . . . . 234
9.10 The Brouwer Fixed Point Theorem . . . . . . . . . . . . . . . . . . . 236
9.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
10 The $L^p$ Spaces 247
10.1 Basic Inequalities And Properties . . . . . . . . . . . . . . . . . . . . 247
10.2 Hilbert Space And Riesz Representation Theorem . . . . . . . . . . 253
10.3 Minkowski's Inequality . . . . . . . . . . . . . . . . . . . . . . . . . 258
10.4 Density Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . 260
10.5 Separability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
10.5.1 An Algebra Of Special Functions . . . . . . . . . . . . . . . . 264
10.6 Continuity Of Translation . . . . . . . . . . . . . . . . . . . . . . . . 265
10.7 Mollifiers And Density Of Smooth Functions . . . . . . . . . . . . 266
10.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
11 Fourier Transforms 275
11.1 An Algebra Of Special Functions . . . . . . . . . . . . . . . . . . . . 275
11.2 Fourier Transforms Of Functions In $\mathcal{G}$ . . . . . . . . . . . . . . . . 276
11.3 Fourier Transforms Of Just About Anything . . . . . . . . . . . . . . 279
11.3.1 Fourier Transforms Of $\mathcal{G}^*$ . . . . . . . . . . . . . . . . . . . 279
11.3.2 Fourier Transforms Of Functions In $L^1(\mathbb{R}^n)$ . . . . . . . . . . 283
11.3.3 Fourier Transforms Of Functions In $L^2(\mathbb{R}^n)$ . . . . . . . . . . 286
11.3.4 The Schwartz Class . . . . . . . . . . . . . . . . . . . . . . . 291
11.3.5 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
11.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295
12 Banach Spaces 299
12.1 Theorems Based On Baire Category . . . . . . . . . . . . . . . . . . 299
12.1.1 Baire Category Theorem . . . . . . . . . . . . . . . . . . . . . 299
12.1.2 Uniform Boundedness Theorem . . . . . . . . . . . . . . . . . 303
12.1.3 Open Mapping Theorem . . . . . . . . . . . . . . . . . . . . . 304
12.1.4 Closed Graph Theorem . . . . . . . . . . . . . . . . . . . . . 306
12.2 Hahn Banach Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 308
12.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
13 Hilbert Spaces 321
13.1 Basic Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
13.2 Approximations In Hilbert Space . . . . . . . . . . . . . . . . . . . . 327
13.3 Orthonormal Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
13.4 Fourier Series, An Example . . . . . . . . . . . . . . . . . . . . . . . 332
13.5 General Theory Of Continuous Semigroups . . . . . . . . . . . . . . 334
13.5.1 An Evolution Equation . . . . . . . . . . . . . . . . . . . . . 343
13.5.2 Adjoints, Hilbert Space . . . . . . . . . . . . . . . . . . . . . 346
13.5.3 Adjoints, Reflexive Banach Space . . . . . . . . . . . . . . . 350
13.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 354
14 Representation Theorems 359
14.1 Radon Nikodym Theorem . . . . . . . . . . . . . . . . . . . . . . . . 359
14.2 Vector Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
14.3 Representation Theorems For The Dual Space Of $L^p$ . . . . . . . . . 372
14.4 The Dual Space Of $C_0(X)$ . . . . . . . . . . . . . . . . . . . . . . . . 380
14.5 The Dual Space Of $C_0(X)$, Another Approach . . . . . . . . . . . . 385
14.6 More Attractive Formulations . . . . . . . . . . . . . . . . . . . . . . 386
14.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
15 Integrals And Derivatives 393
15.1 The Fundamental Theorem Of Calculus . . . . . . . . . . . . . . . . 393
15.2 Absolutely Continuous Functions . . . . . . . . . . . . . . . . . . . . 398
15.3 Differentiation Of Measures With Respect To Lebesgue Measure . . 403
15.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
16 Differentiation With Respect To General Radon Measures 415
16.1 Besicovitch Covering Theorem . . . . . . . . . . . . . . . . . . . . . 415
16.2 Fundamental Theorem Of Calculus For Radon Measures . . . . . . . 420
16.3 Slicing Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
III Complex Analysis 431
17 The Complex Numbers 433
17.1 The Extended Complex Plane . . . . . . . . . . . . . . . . . . . . . . 435
17.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
18 Riemann Stieltjes Integrals 437
18.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 447
19 Fundamentals Of Complex Analysis 449
19.1 Analytic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
19.1.1 Cauchy Riemann Equations . . . . . . . . . . . . . . . . . . . 451
19.1.2 An Important Example . . . . . . . . . . . . . . . . . . . . . 453
19.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
19.3 Cauchy's Formula For A Disk . . . . . . . . . . . . . . . . . . . . . 455
19.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
19.5 Zeros Of An Analytic Function . . . . . . . . . . . . . . . . . . . . . 465
19.6 Liouville's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 467
19.7 The General Cauchy Integral Formula . . . . . . . . . . . . . . . . . 468
19.7.1 The Cauchy Goursat Theorem . . . . . . . . . . . . . . . . . 468
19.7.2 A Redundant Assumption . . . . . . . . . . . . . . . . . . . . 471
19.7.3 Classification Of Isolated Singularities . . . . . . . . . . . . 472
19.7.4 The Cauchy Integral Formula . . . . . . . . . . . . . . . . . . 475
19.7.5 An Example Of A Cycle . . . . . . . . . . . . . . . . . . . . . 482
19.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
20 The Open Mapping Theorem 489
20.1 A Local Representation . . . . . . . . . . . . . . . . . . . . . . . . . 489
20.1.1 Branches Of The Logarithm . . . . . . . . . . . . . . . . . . . 491
20.2 Maximum Modulus Theorem . . . . . . . . . . . . . . . . . . . . . . 493
20.3 Extensions Of Maximum Modulus Theorem . . . . . . . . . . . . . . 495
20.3.1 Phragmén Lindelöf Theorem . . . . . . . . . . . . . . . . . . 495
20.3.2 Hadamard Three Circles Theorem . . . . . . . . . . . . . . . 497
20.3.3 Schwarz's Lemma . . . . . . . . . . . . . . . . . . . . . . . . 498
20.3.4 One To One Analytic Maps On The Unit Ball . . . . . . . . 499
20.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 500
20.5 Counting Zeros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
20.6 An Application To Linear Algebra . . . . . . . . . . . . . . . . . . . 506
20.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
21 Residues 513
21.1 Rouché's Theorem And The Argument Principle . . . . . . . . . . 516
21.1.1 Argument Principle . . . . . . . . . . . . . . . . . . . . . . . 516
21.1.2 Rouché's Theorem . . . . . . . . . . . . . . . . . . . . . . . 519
21.1.3 A Different Formulation . . . . . . . . . . . . . . . . . . . . 520
21.2 Singularities And The Laurent Series . . . . . . . . . . . . . . . . . . 521
21.2.1 What Is An Annulus? . . . . . . . . . . . . . . . . . . . . . . 521
21.2.2 The Laurent Series . . . . . . . . . . . . . . . . . . . . . . . . 524
21.2.3 Contour Integrals And Evaluation Of Integrals . . . . . . . . 528
21.3 The Spectral Radius Of A Bounded Linear Transformation . . . . . 537
21.4 Analytic Semigroups . . . . . . . . . . . . . . . . . . . . . . . . . . . 539
21.4.1 Sectorial Operators And Analytic Semigroups . . . . . . . . . 539
21.4.2 The Numerical Range . . . . . . . . . . . . . . . . . . . . . . 551
21.4.3 An Interesting Example . . . . . . . . . . . . . . . . . . . . . 553
21.4.4 Fractional Powers Of Sectorial Operators . . . . . . . . . . . 556
21.4.5 A Scale Of Banach Spaces . . . . . . . . . . . . . . . . . . . . 571
21.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
22 Complex Mappings 577
22.1 Conformal Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
22.2 Fractional Linear Transformations . . . . . . . . . . . . . . . . . . . 578
22.2.1 Circles And Lines . . . . . . . . . . . . . . . . . . . . . . . . 578
22.2.2 Three Points To Three Points . . . . . . . . . . . . . . . . . . 580
22.3 Riemann Mapping Theorem . . . . . . . . . . . . . . . . . . . . . . . 581
22.3.1 Montel's Theorem . . . . . . . . . . . . . . . . . . . . . . . 582
22.3.2 Regions With Square Root Property . . . . . . . . . . . . . . 584
22.4 Analytic Continuation . . . . . . . . . . . . . . . . . . . . . . . . . . 588
22.4.1 Regular And Singular Points . . . . . . . . . . . . . . . . . . 588
22.4.2 Continuation Along A Curve . . . . . . . . . . . . . . . . . . 590
22.5 The Picard Theorems . . . . . . . . . . . . . . . . . . . . . . . . . . 591
22.5.1 Two Competing Lemmas . . . . . . . . . . . . . . . . . . . . 593
22.5.2 The Little Picard Theorem . . . . . . . . . . . . . . . . . . . 596
22.5.3 Schottky's Theorem . . . . . . . . . . . . . . . . . . . . . . 597
22.5.4 A Brief Review . . . . . . . . . . . . . . . . . . . . . . . . . . 601
22.5.5 Montel's Theorem . . . . . . . . . . . . . . . . . . . . . . . 603
22.5.6 The Great Big Picard Theorem . . . . . . . . . . . . . . . . . 604
22.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 606
23 Approximation By Rational Functions 609
23.1 Runge's Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
23.1.1 Approximation With Rational Functions . . . . . . . . . . . . 609
23.1.2 Moving The Poles And Keeping The Approximation . . . . . 611
23.1.3 Mertens' Theorem . . . . . . . . . . . . . . . . . . . . . . . . 611
23.1.4 Runge's Theorem . . . . . . . . . . . . . . . . . . . . . . . . 616
23.2 The Mittag-Leffler Theorem . . . . . . . . . . . . . . . . . . . . . . 618
23.2.1 A Proof From Runge's Theorem . . . . . . . . . . . . . . . . 618
23.2.2 A Direct Proof Without Runge's Theorem . . . . . . . . . . . 620
23.2.3 Functions Meromorphic On $\widehat{\mathbb{C}}$ . . . . . . . . . . . . . . . . . 622
23.2.4 A Great And Glorious Theorem About Simply Connected Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 622
23.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 626
24 Infinite Products 627
24.1 Analytic Function With Prescribed Zeros . . . . . . . . . . . . . . . 631
24.2 Factoring A Given Analytic Function . . . . . . . . . . . . . . . . . . 636
24.2.1 Factoring Some Special Analytic Functions . . . . . . . . . . 638
24.3 The Existence Of An Analytic Function With Given Values . . . . . 640
24.4 Jensen's Formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . 644
24.5 Blaschke Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
24.5.1 The Müntz-Szász Theorem . . . . . . . . . . . . . . . . . . . 650
24.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 652
25 Elliptic Functions 661
25.1 Periodic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
25.1.1 The Unimodular Transformations . . . . . . . . . . . . . . . . 666
25.1.2 The Search For An Elliptic Function . . . . . . . . . . . . . . 669
25.1.3 The Differential Equation Satisfied By $\wp$ . . . . . . . . . . . 672
25.1.4 A Modular Function . . . . . . . . . . . . . . . . . . . . . . . 674
25.1.5 A Formula For . . . . . . . . . . . . . . . . . . . . . . . . . 680
25.1.6 Mapping Properties Of . . . . . . . . . . . . . . . . . . . . 682
25.1.7 A Short Review And Summary . . . . . . . . . . . . . . . . . 690
25.2 The Picard Theorem Again . . . . . . . . . . . . . . . . . . . . . . . 694
25.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695
A The Hausdorff Maximal Theorem 697
A.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 701
Copyright © 2005,
Part I
Preliminary Material
Set Theory
1.1 Basic Definitions
A set is a collection of things called elements of the set. For example, the set of
integers is the collection of signed whole numbers such as 1, 2, -4, etc. This set, whose
existence will be assumed, is denoted by $\mathbb{Z}$. Other sets could be the set of people in
a family or the set of donuts in a display case at the store. Sometimes braces,
$\{\ \}$, specify a set by listing the things which are in the set between the braces.
For example, the set of integers between -1 and 2, including these numbers, could
be denoted as $\{-1, 0, 1, 2\}$. The notation signifying that $x$ is an element of a set $S$ is
written as $x \in S$. Thus, $1 \in \{-1, 0, 1, 2, 3\}$. Here are some axioms about sets.
Axioms are statements which are accepted, not proved.
1. Two sets are equal if and only if they have the same elements.
2. To every set, A, and to every condition S (x) there corresponds a set, B, whose
elements are exactly those elements x of A for which S (x) holds.
3. For every collection of sets there exists a set that contains all the elements
that belong to at least one set of the given collection.
4. The Cartesian product of a nonempty family of nonempty sets is nonempty.
5. If $A$ is a set there exists a set, $\mathcal{P}(A)$, such that $\mathcal{P}(A)$ is the set of all subsets
of $A$. This is called the power set.
These axioms are referred to as the axiom of extension, axiom of specification,
axiom of unions, axiom of choice, and axiom of powers respectively.
It seems fairly clear you should want to believe in the axiom of extension. It is
merely saying, for example, that $\{1, 2, 3\} = \{2, 3, 1\}$ since these two sets have the
same elements in them. Similarly, it would seem you should be able to specify a
new set from a given set using some "condition" which can be used as a test to
determine whether the element in question is in the set. For example, the set of all
integers which are multiples of 2. This set could be specified as follows.
$$\{x \in \mathbb{Z} : x = 2y \text{ for some } y \in \mathbb{Z}\}.$$
In this notation, the colon is read as "such that" and in this case the condition is
being a multiple of 2.
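As a small illustration, the specification above can be mimicked with a Python set comprehension. This is only a sketch: a finite window of integers stands in for $\mathbb{Z}$ (an assumption, since a program cannot hold all of $\mathbb{Z}$), and the names `window`, `evens` and `evens_by_remainder` are hypothetical.

```python
# A finite window of the integers stands in for Z (assumption: Z itself is infinite).
window = set(range(-10, 11))

# {x in window : x = 2y for some y in window}, written as a set comprehension.
evens = {x for x in window if any(x == 2 * y for y in window)}

# The same set, specified by an equivalent condition on the remainder.
evens_by_remainder = {x for x in window if x % 2 == 0}

print(sorted(evens))
```

By the axiom of extension, the two conditions specify the same set because they select the same elements.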
Another example of political interest could be the set of all judges who are not
judicial activists. I think you can see this last is not a very precise condition since
there is no way to determine to everyone's satisfaction whether a given judge is an
activist. Also, just because something is grammatically correct does not mean it
makes any sense. For example consider the following nonsense.
$$S = \{x \in \text{set of dogs} : \text{it is colder in the mountains than in the winter}\}.$$
So what is a condition?
We will leave these sorts of considerations and assume our conditions make sense.
The axiom of unions states that for any collection of sets, there is a set consisting
of all the elements in each of the sets in the collection. Of course this is also open to
further consideration. What is a collection? Maybe it would be better to say "set
of sets" or, given a set whose elements are sets there exists a set whose elements
consist of exactly those things which are elements of at least one of these sets. If $\mathcal{S}$
is such a set whose elements are sets,
$$\cup \{A : A \in \mathcal{S}\} \text{ or } \cup \mathcal{S}$$
signify this union.
Something is in the Cartesian product of a set or "family" of sets if it consists
of a single thing taken from each set in the family. Thus
$(1, 2, 3) \in \{1, 4, .2\} \times \{1, 2, 7\} \times \{4, 3, 7, 9\}$ because it consists of exactly one element from each of the sets
which are separated by $\times$. Also, this is the notation for the Cartesian product of
finitely many sets. If $\mathcal{S}$ is a set whose elements are sets,
$$\prod_{A \in \mathcal{S}} A$$
signifies the Cartesian product.
The Cartesian product is the set of choice functions, a choice function being a
function which selects exactly one element of each set of $\mathcal{S}$. You may think the axiom
of choice, stating that the Cartesian product of a nonempty family of nonempty sets
is nonempty, is innocuous but there was a time when many mathematicians were
ready to throw it out because it implies things which are very hard to believe, things
which never happen without the axiom of choice.
$A$ is a subset of $B$, written $A \subseteq B$, if every element of $A$ is also an element of
$B$. This can also be written as $B \supseteq A$. $A$ is a proper subset of $B$, written $A \subset B$
or $B \supset A$, if $A$ is a subset of $B$ but $A$ is not equal to $B$, $A \neq B$. $A \cap B$ denotes the
intersection of the two sets $A$ and $B$, and it means the set of elements of $A$ which
are also elements of $B$. The axiom of specification shows this is a set. The empty
set is the set which has no elements in it, denoted as $\emptyset$. $A \cup B$ denotes the union
of the two sets $A$ and $B$, and it means the set of all elements which are in either of
the sets. It is a set because of the axiom of unions.
The complement of a set (the set of things which are not in the given set) must
be taken with respect to a given set called the universal set, which is a set which
contains the one whose complement is being taken. Thus, the complement of $A$,
denoted as $A^C$ (or more precisely as $X \setminus A$), is a set obtained from using the axiom
of specification to write
$$A^C \equiv \{x \in X : x \notin A\}.$$
The symbol $\notin$ means: "is not an element of". Note the axiom of specification takes
place relative to a given set. Without this universal set it makes no sense to use
the axiom of specification to obtain the complement.
Words such as "all" or "there exists" are called quantifiers and they must be
understood relative to some given set. For example, the set of all integers larger
than 3. Or there exists an integer larger than 7. Such statements have to do with a
given set, in this case the integers. Failure to have a reference set when quantifiers
are used turns out to be illogical even though such usage may be grammatically
correct. Quantifiers are used often enough that there are symbols for them. The
symbol $\forall$ is read as "for all" or "for every" and the symbol $\exists$ is read as "there
exists". Thus $\forall \forall \exists \exists$ could mean for every upside down A there exists a backwards
E.
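De Morgan's laws are very useful in mathematics. Let $\mathcal{S}$ be a set of sets each of
which is contained in some universal set, $U$. Then
$$\cup \{A^C : A \in \mathcal{S}\} = \left(\cap \{A : A \in \mathcal{S}\}\right)^C$$
and
$$\cap \{A^C : A \in \mathcal{S}\} = \left(\cup \{A : A \in \mathcal{S}\}\right)^C.$$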
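These laws follow directly from the definitions. Also following directly from the
definitions are:
Let $\mathcal{S}$ be a set of sets. Then
$$B \cup \cup \{A : A \in \mathcal{S}\} = \cup \{B \cup A : A \in \mathcal{S}\},$$
and: Let $\mathcal{S}$ be a set of sets. Then
$$B \cap \cup \{A : A \in \mathcal{S}\} = \cup \{B \cap A : A \in \mathcal{S}\}.$$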
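De Morgan's laws and the two identities involving a fixed set $B$ can be checked on a small example. The following is a minimal sketch, assuming a finite universal set and a nonempty finite family; the particular sets `U`, `S`, `B` and the helper names are hypothetical, and a check on one example is of course not a proof.

```python
from functools import reduce

U = set(range(10))                      # universal set (hypothetical example)
S = [{0, 1, 2}, {2, 3, 4}, {2, 4, 6}]   # a nonempty family of subsets of U
B = {1, 2, 9}

def union(family):
    """Union of all sets in the family."""
    return reduce(set.union, family, set())

def intersection(family, universe):
    """Intersection of all sets in the family, taken inside the universe."""
    return reduce(set.intersection, family, set(universe))

def complement(A, universe):
    """Complement of A relative to the universe."""
    return universe - A

complements = [complement(A, U) for A in S]

# De Morgan's laws
de_morgan_1 = union(complements) == complement(intersection(S, U), U)
de_morgan_2 = intersection(complements, U) == complement(union(S), U)

# The two identities involving the fixed set B
dist_union = (B | union(S)) == union([B | A for A in S])
dist_inter = (B & union(S)) == union([B & A for A in S])

print(de_morgan_1, de_morgan_2, dist_union, dist_inter)
```

The complement is always computed relative to `U`, matching the remark that a complement only makes sense with respect to a given universal set.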
Unfortunately, there is no single universal set which can be used for all sets.
Here is why: Suppose there were. Call it $S$. Then you could consider $A$, the set
of all elements of $S$ which are not elements of themselves, this from the axiom of
specification. If $A$ is an element of itself, then it fails to qualify for inclusion in $A$.
Therefore, it must not be an element of itself. However, if this is so, it qualifies for
inclusion in $A$ so it is an element of itself and so this can't be true either. Thus
the most basic of conditions you could imagine, that of being an element of, is
meaningless and so allowing such a set causes the whole theory to be meaningless.
The solution is to not allow a universal set. As mentioned by Halmos in Naive
Set Theory, "Nothing contains everything". Always beware of statements involving
quantifiers wherever they occur, even this one.
1.2 The Schroder Bernstein Theorem
It is very important to be able to compare the size of sets in a rational way. The
most useful theorem in this context is the Schroder Bernstein theorem which is the
main result to be presented in this section. The Cartesian product is discussed
above. The next definition reviews this and defines the concept of a function.
Definition 1.1 Let $X$ and $Y$ be sets.
$$X \times Y \equiv \{(x, y) : x \in X \text{ and } y \in Y\}$$
A relation is defined to be a subset of $X \times Y$. A function $f$, also called a mapping,
is a relation which has the property that if $(x, y)$ and $(x, y_1)$ are both elements of
$f$, then $y = y_1$. The domain of $f$ is defined as
$$D(f) \equiv \{x : (x, y) \in f\},$$
written as $f : D(f) \to Y$.
It is probably safe to say that most people do not think of functions as a type
of relation which is a subset of the Cartesian product of two sets. A function is like
a machine which takes inputs, $x$, and makes them into a unique output, $f(x)$. Of
course, that is what the above definition says with more precision. An ordered pair
$(x, y)$ which is an element of the function or mapping has an input, $x$, and a unique
output, $y$, denoted as $f(x)$, while the name of the function is $f$. "Mapping" is often
a noun meaning function. However, it also is a verb as in "$f$ is mapping $A$ to $B$".
That which a function is thought of as doing is also referred to using the word
"maps" as in: $f$ maps $X$ to $Y$. However, a set of functions may be called a set of
maps so this word might also be used as the plural of a noun. There is no help for
it. You just have to suffer with this nonsense.
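The view of a function as a set of ordered pairs can be made concrete. The sketch below, under the assumption of small finite sets, represents relations as Python sets of tuples; the helpers `is_function` and `domain` and the example sets are hypothetical names introduced only for illustration.

```python
def is_function(relation):
    """True if no x is paired with two different y values."""
    seen = {}
    for x, y in relation:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True

def domain(relation):
    """D(f) = {x : (x, y) in f}."""
    return {x for x, _ in relation}

X = {1, 2, 3}
Y = {"a", "b"}
product = {(x, y) for x in X for y in Y}      # the Cartesian product X x Y

f = {(1, "a"), (2, "a"), (3, "b")}            # a function from X to Y
r = {(1, "a"), (1, "b"), (2, "a")}            # a relation which is not a function

print(is_function(f), is_function(r), domain(f), domain(r))
```

Both `f` and `r` are subsets of `product`, so both are relations, but only `f` assigns a unique output to each input.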
The following theorem which is interesting for its own sake will be used to prove
the Schroder Bernstein theorem.
Theorem 1.2 Let $f : X \to Y$ and $g : Y \to X$ be two functions. Then there exist
sets $A, B, C, D$ such that
$$A \cup B = X, \quad C \cup D = Y, \quad A \cap B = \emptyset, \quad C \cap D = \emptyset,$$
$$f(A) = C, \quad g(D) = B.$$
The following picture illustrates the conclusion of this theorem.
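[Figure: $X$ is split into $A$ and $B = g(D)$, and $Y$ is split into $C = f(A)$ and $D$; $f$ carries $A$ onto $C$ and $g$ carries $D$ onto $B$.]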
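Proof: Consider the empty set, $\emptyset \subseteq X$. If $y \in Y \setminus f(\emptyset)$, then $g(y) \notin \emptyset$ because $\emptyset$
has no elements. Also, if $A, B, C,$ and $D$ are as described above, $A$ also would
have this same property that the empty set has. However, $A$ is probably larger.
Therefore, say $A_0 \subseteq X$ satisfies $\mathcal{P}$ if whenever $y \in Y \setminus f(A_0)$, $g(y) \notin A_0$.
$$\mathcal{A} \equiv \{A_0 \subseteq X : A_0 \text{ satisfies } \mathcal{P}\}.$$
Let $A = \cup \mathcal{A}$. If $y \in Y \setminus f(A)$, then for each $A_0 \in \mathcal{A}$, $y \in Y \setminus f(A_0)$ and so
$g(y) \notin A_0$. Since $g(y) \notin A_0$ for all $A_0 \in \mathcal{A}$, it follows $g(y) \notin A$. Hence $A$ satisfies
$\mathcal{P}$ and is the largest subset of $X$ which does so. Now define
$$C \equiv f(A), \quad D \equiv Y \setminus C, \quad B \equiv X \setminus A.$$
It only remains to verify that $g(D) = B$.
First, if $y \in D = Y \setminus f(A)$, then $g(y) \notin A$ because $A$ satisfies $\mathcal{P}$, so $g(D) \subseteq B$.
Next suppose $x \in B = X \setminus A$. Then $A \cup \{x\}$ does not satisfy $\mathcal{P}$ and so there exists
$y \in Y \setminus f(A \cup \{x\}) \subseteq D$ such that $g(y) \in A \cup \{x\}$. But $y \notin f(A)$ and so since $A$
satisfies $\mathcal{P}$, it follows $g(y) \notin A$. Hence $g(y) = x$ and so $x \in g(D)$ and this proves
the theorem.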
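For finite sets, the construction in this proof can be carried out by brute force: take the union of every subset of $X$ which satisfies $\mathcal{P}$. The following is only a sketch under that finiteness assumption; functions are stored as dictionaries, and the names `image`, `satisfies_P`, `decompose` and the example sets are hypothetical.

```python
from itertools import combinations

def image(func, subset):
    """Image of a subset under a function given as a dict."""
    return {func[x] for x in subset}

def satisfies_P(A0, f, g, Y):
    """A0 satisfies P if g(y) is not in A0 whenever y is in Y minus f(A0)."""
    return all(g[y] not in A0 for y in Y - image(f, A0))

def decompose(X, Y, f, g):
    """Return (A, B, C, D) as in Theorem 1.2, by brute force over subsets of X."""
    A = set()
    elements = sorted(X)
    for r in range(len(elements) + 1):
        for combo in combinations(elements, r):
            if satisfies_P(set(combo), f, g, Y):
                A |= set(combo)          # A is the union of all sets satisfying P
    C = image(f, A)
    D = Y - C
    B = X - A
    return A, B, C, D

X = {0, 1, 2, 3}
Y = {"p", "q", "r"}
f = {0: "p", 1: "p", 2: "q", 3: "q"}     # f : X -> Y
g = {"p": 0, "q": 2, "r": 3}             # g : Y -> X

A, B, C, D = decompose(X, Y, f, g)
print(A, B, C, D)
```

The search is exponential in the size of $X$, so this is useful only for checking the statement on toy examples, not as a practical algorithm.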
Theorem 1.3 (Schroder Bernstein) If $f : X \to Y$ and $g : Y \to X$ are one to one,
then there exists $h : X \to Y$ which is one to one and onto.
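Proof: Let $A, B, C, D$ be the sets of Theorem 1.2 and define
$$h(x) \equiv \begin{cases} f(x) & \text{if } x \in A \\ g^{-1}(x) & \text{if } x \in B \end{cases}$$
Here $g^{-1}(x)$ makes sense for $x \in B$ because $B = g(D)$ and $g$ is one to one. Since $f$ maps
$A$ one to one onto $C$ and $g^{-1}$ maps $B$ one to one onto $D$,
$h$ is the desired one to one and onto mapping.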
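Here is a hedged sketch of the map $h$ in one concrete case which is not in the text: take $X = Y = \{0, 1, 2, \cdots\}$ and $f(n) = g(n) = n + 1$, both one to one but not onto. For this choice one can check by hand that $A$ = the even numbers and $B$ = the odd numbers work in Theorem 1.2, since $C = f(A)$ is the odd numbers, $D$ is the even numbers and $g(D) = B$. All function names below are hypothetical.

```python
def f(n):
    """f : X -> Y, one to one but not onto (0 is missed)."""
    return n + 1

def g(n):
    """g : Y -> X, one to one but not onto (0 is missed)."""
    return n + 1

def g_inverse(x):
    """Inverse of g on its range {1, 2, 3, ...}."""
    return x - 1

def in_A(x):
    """For this f and g, A can be taken to be the even numbers, so B is the odd numbers."""
    return x % 2 == 0

def h(x):
    """h = f on A and g^{-1} on B."""
    return f(x) if in_A(x) else g_inverse(x)

N = 1000                                  # even, so h maps {0, ..., N-1} onto itself
values = [h(x) for x in range(N)]
print(values[:6])
```

Because `N` is even, $h$ swaps each pair $2k, 2k+1$ inside the window, which makes the one to one and onto property easy to test on a finite piece of the infinite sets.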
Recall that the Cartesian product may be considered as the collection of choice
functions.
Definition 1.4 Let $I$ be a set and let $X_i$ be a set for each $i \in I$. $f$ is a choice
function, written as
$$f \in \prod_{i \in I} X_i,$$
if $f(i) \in X_i$ for each $i \in I$.
The axiom of choice says that if $X_i \neq \emptyset$ for each $i \in I$, for $I$ a set, then
$$\prod_{i \in I} X_i \neq \emptyset.$$
Sometimes the two functions, f and g are onto but not one to one. It turns out
that with the axiom of choice, a similar conclusion to the above may be obtained.
Corollary 1.5 If $f : X \to Y$ is onto and $g : Y \to X$ is onto, then there exists
$h : X \to Y$ which is one to one and onto.
Proof: For each $y \in Y$, $f^{-1}(y) \equiv \{x \in X : f(x) = y\} \neq \emptyset$. Therefore, by the
axiom of choice, there exists $f_0^{-1} \in \prod_{y \in Y} f^{-1}(y)$ which is the same as saying that
for each $y \in Y$, $f_0^{-1}(y) \in f^{-1}(y)$. Similarly, there exists $g_0^{-1}(x) \in g^{-1}(x)$ for all
$x \in X$. Then $f_0^{-1}$ is one to one because if $f_0^{-1}(y_1) = f_0^{-1}(y_2)$, then
$$y_1 = f\left(f_0^{-1}(y_1)\right) = f\left(f_0^{-1}(y_2)\right) = y_2.$$
Similarly $g_0^{-1}$ is one to one. Therefore, by the Schroder Bernstein theorem, there
exists $h : X \to Y$ which is one to one and onto.
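The choice step in this proof can be sketched for a finite onto map, where no axiom of choice is needed because an explicit rule (here `min`) selects one element of each $f^{-1}(y)$. The helper names and the example map are hypothetical, and the code assumes `f` really is onto `Y`.

```python
def preimages(f, Y):
    """Map each y in Y to the set f^{-1}(y) = {x : f(x) = y}."""
    result = {y: set() for y in Y}
    for x, y in f.items():
        result[y].add(x)
    return result

def choose_right_inverse(f, Y):
    """Pick one element of each f^{-1}(y); min() plays the role of the choice function."""
    return {y: min(xs) for y, xs in preimages(f, Y).items()}

X = {0, 1, 2, 3, 4}
Y = {"a", "b", "c"}
f = {0: "a", 1: "a", 2: "b", 3: "c", 4: "c"}   # f : X -> Y is onto

f0_inv = choose_right_inverse(f, Y)
print(f0_inv)
```

The resulting map satisfies $f(f_0^{-1}(y)) = y$ for every $y$, which is exactly what forces it to be one to one.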
Definition 1.6 A set $S$ is finite if there exists a natural number $n$ and a map $\theta$
which maps $\{1, \cdots, n\}$ one to one and onto $S$. $S$ is infinite if it is not finite. A
set $S$ is called countable if there exists a map $\theta$ mapping $\mathbb{N}$ one to one and onto
$S$. (When $\theta$ maps a set $A$ to a set $B$, this will be written as $\theta : A \to B$ in the future.)
Here $\mathbb{N} \equiv \{1, 2, \cdots\}$, the natural numbers. $S$ is at most countable if there exists a map
$\theta : \mathbb{N} \to S$ which is onto.
The property of being at most countable is often referred to as being countable
because the question of interest is normally whether one can list all elements of the
set, designating a first, second, third etc. in such a way as to give each element of
the set a natural number. The possibility that a single element of the set may be
counted more than once is often not important.
Theorem 1.7 If $X$ and $Y$ are both at most countable, then $X \times Y$ is also at most
countable. If either $X$ or $Y$ is countable, then $X \times Y$ is also countable.
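Proof: It is given that there exists a mapping $\eta : \mathbb{N} \to X$ which is onto. Define
$\eta(i) \equiv x_i$ and consider $X$ as the set $\{x_1, x_2, x_3, \cdots\}$. Similarly, consider $Y$ as the
set $\{y_1, y_2, y_3, \cdots\}$. It follows the elements of $X \times Y$ are included in the following
rectangular array.
$$\begin{array}{ccccl}
(x_1, y_1) & (x_1, y_2) & (x_1, y_3) & \cdots & \leftarrow \text{Those which have } x_1 \text{ in first slot.} \\
(x_2, y_1) & (x_2, y_2) & (x_2, y_3) & \cdots & \leftarrow \text{Those which have } x_2 \text{ in first slot.} \\
(x_3, y_1) & (x_3, y_2) & (x_3, y_3) & \cdots & \leftarrow \text{Those which have } x_3 \text{ in first slot.} \\
\vdots & \vdots & \vdots & & \vdots
\end{array}$$
Follow a path through this array as follows.
$$\begin{array}{cccccc}
(x_1, y_1) & \rightarrow & (x_1, y_2) & & (x_1, y_3) & \rightarrow \\
& \swarrow & & \nearrow & & \\
(x_2, y_1) & & (x_2, y_2) & & & \\
\downarrow & \nearrow & & & & \\
(x_3, y_1) & & & & &
\end{array}$$
Thus the first element of $X \times Y$ is $(x_1, y_1)$, the second element of $X \times Y$ is $(x_1, y_2)$,
the third element of $X \times Y$ is $(x_2, y_1)$ etc. This assigns a number from $\mathbb{N}$ to each
element of $X \times Y$. Thus $X \times Y$ is at most countable.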
It remains to show the last claim. Suppose without loss of generality that $X$
is countable. Then there exists $\alpha : \mathbb{N} \to X$ which is one to one and onto. Let
$\beta : X \times Y \to \mathbb{N}$ be defined by $\beta((x, y)) \equiv \alpha^{-1}(x)$. Thus $\beta$ is onto $\mathbb{N}$. By the first
part there exists a function from $\mathbb{N}$ onto $X \times Y$. Therefore, by Corollary 1.5, there
exists a one to one and onto mapping from $X \times Y$ to $\mathbb{N}$. This proves the theorem.
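The path through the array in the first part of this proof can be sketched as a generator of index pairs $(i, j)$, standing for $(x_i, y_j)$. This is one possible reading of the picture, assuming the path sweeps each anti-diagonal $i + j = s$ in alternating directions; the name `zigzag_pairs` is hypothetical.

```python
from itertools import islice

def zigzag_pairs():
    """Yield all index pairs (i, j), i, j >= 1, one anti-diagonal at a time."""
    s = 2                                   # s = i + j is constant on each anti-diagonal
    while True:
        rows = range(1, s)                  # possible values of i on this diagonal
        if s % 2 == 0:
            rows = reversed(rows)           # alternate direction to trace a connected path
        for i in rows:
            yield (i, s - i)
        s += 1

first_six = list(islice(zigzag_pairs(), 6))
print(first_six)
```

Every pair $(i, j)$ lies on the anti-diagonal $s = i + j$, which is finite, so each pair is reached after finitely many steps. That is the content of the claim that the path assigns a natural number to each element of the array.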
Theorem 1.8 If $X$ and $Y$ are at most countable, then $X \cup Y$ is at most countable.
If either $X$ or $Y$ is countable, then $X \cup Y$ is countable.
Proof: As in the preceding theorem, X = x
1
, x
2
, x
3
, and Y = y
1
, y
2
, y
3
, .
Consider the following array consisting of X Y and path through it.
x
1
x
2
x
3

y
1
y
2
Thus the rst element of X Y is x
1
, the second is x
2
the third is y
1
the fourth is
y
2
etc.
Consider the second claim. By the rst part, there is a map from N onto XY .
Suppose without loss of generality that X is countable and : N X is one to one
and onto. Then dene (y) 1, for all y Y ,and (x)
1
(x). Thus, maps
X Y onto N and this shows there exist two onto maps, one mapping X Y onto
N and the other mapping N onto X Y . Then Corollary 1.5 yields the conclusion.
This proves the theorem.
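The path through the array in the proof of Theorem 1.7 can be sketched in Python. This is only an illustration, assuming the elements are represented by their index pairs $(i, j)$ standing for $(x_i, y_j)$; the name `pairs_along_path` is hypothetical and not part of the notes.

```python
from itertools import count, islice

def pairs_along_path():
    """Yield index pairs (i, j), standing for (x_i, y_j), along the zigzag path.

    The pairs with i + j = s form one anti-diagonal of the array. Odd s is
    walked with i increasing and even s with i decreasing, which reproduces
    the order (x_1,y_1), (x_1,y_2), (x_2,y_1), (x_3,y_1), ... of the proof.
    """
    for s in count(2):
        rows = range(1, s) if s % 2 == 1 else range(s - 1, 0, -1)
        for i in rows:
            yield (i, s - i)

# The first six elements of X x Y in the order assigned by the path.
print(list(islice(pairs_along_path(), 6)))
```

Every pair $(i, j)$ lies on the anti-diagonal $i + j = s$ for exactly one $s$, so each pair is reached after finitely many steps, which is the content of "at most countable".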
1.3 Equivalence Relations
There are many ways to compare elements of a set other than to say two elements
are equal or the same. For example, in the set of people let two people be equiv-
alent if they have the same weight. This would not be saying they were the same
person, just that they weighed the same. Often such relations involve considering
one characteristic of the elements of a set and then saying the two elements are
equivalent if they are the same as far as the given characteristic is concerned.
Definition 1.9 Let $S$ be a set. $\sim$ is an equivalence relation on $S$ if it satisfies the
following axioms.
1. $x \sim x$ for all $x \in S$. (Reflexive)
2. If $x \sim y$ then $y \sim x$. (Symmetric)
3. If $x \sim y$ and $y \sim z$, then $x \sim z$. (Transitive)
Definition 1.10 $[x]$ denotes the set of all elements of $S$ which are equivalent to $x$
and $[x]$ is called the equivalence class determined by $x$ or just the equivalence class
of $x$.
With the above definition one can prove the following simple theorem.
Theorem 1.11 Let $\sim$ be an equivalence relation defined on a set, $S$ and let $H$ denote
the set of equivalence classes. Then if $[x]$ and $[y]$ are two of these equivalence classes,
either $x \sim y$ and $[x] = [y]$ or it is not true that $x \sim y$ and $[x] \cap [y] = \emptyset$.
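As a small illustration of Theorem 1.11, here is a Python sketch that splits a finite set into equivalence classes. It assumes the relation is given by a "characteristic", as in the weight example above: $x \sim y$ exactly when `key(x) == key(y)`. The function name and the `key` parameter are illustrative choices, not notation from the notes.

```python
def equivalence_classes(elements, key):
    """Group elements into classes, where x ~ y means key(x) == key(y)."""
    classes = {}
    for x in elements:
        # All elements sharing the same characteristic land in the same class.
        classes.setdefault(key(x), []).append(x)
    return list(classes.values())

# Integers 0..9 with x ~ y when x and y leave the same remainder mod 3.
print(equivalence_classes(range(10), key=lambda x: x % 3))
```

Two of the resulting classes are either identical or share no element, as the theorem asserts.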
1.4 Partially Ordered Sets
Definition 1.12 Let $\mathcal{F}$ be a nonempty set. $\mathcal{F}$ is called a partially ordered set if
there is a relation, denoted here by $\leq$, such that
$x \leq x$ for all $x \in \mathcal{F}$.
If $x \leq y$ and $y \leq z$ then $x \leq z$.
$\mathcal{C} \subseteq \mathcal{F}$ is said to be a chain if every two elements of $\mathcal{C}$ are related. This means that
if $x, y \in \mathcal{C}$, then either $x \leq y$ or $y \leq x$. Sometimes a chain is called a totally ordered
set. $\mathcal{C}$ is said to be a maximal chain if whenever $\mathcal{D}$ is a chain containing $\mathcal{C}$, $\mathcal{D} = \mathcal{C}$.
The most common example of a partially ordered set is the power set of a given
set with $\subseteq$ being the relation. It is also helpful to visualize partially ordered sets
as trees. Two points on the tree are related if they are on the same branch of
the tree and one is higher than the other. Thus two points on different branches
would not be related although they might both be larger than some point on the
trunk. You might think of many other things which are best considered as partially
ordered sets. Think of food for example. You might find it difficult to determine
which of two favorite pies you like better although you may be able to say very
easily that you would prefer either pie to a dish of lard topped with whipped cream
and mustard. The following theorem is equivalent to the axiom of choice. For a
discussion of this, see the appendix on the subject.
Theorem 1.13 (Hausdorff Maximal Principle) Let $\mathcal{F}$ be a nonempty partially
ordered set. Then there exists a maximal chain.
The Riemann Stieltjes
Integral
The integral originated in attempts to find areas of various shapes and the ideas
involved in finding integrals are much older than the ideas related to finding derivatives.
In fact, Archimedes$^1$ was finding areas of various curved shapes about 250
B.C. using the main ideas of the integral. What is presented here is a generalization
of these ideas. The main interest is in the Riemann integral but if it is easy to
generalize to the so called Stieltjes integral in which the length of an interval, $[x, y]$
is replaced with an expression of the form $F(y) - F(x)$ where $F$ is an increasing
function, then the generalization is given. However, there is much more that can
be written about Stieltjes integrals than what is presented here. A good source for
this is the book by Apostol, [3].
2.1 Upper And Lower Riemann Stieltjes Sums
The Riemann integral pertains to bounded functions which are defined on a bounded
interval. Let $[a, b]$ be a closed interval. A set of points in $[a, b]$, $\{x_0, \cdots, x_n\}$ is a
partition if
$$a = x_0 < x_1 < \cdots < x_n = b.$$
Such partitions are denoted by $P$ or $Q$. For $f$ a bounded function defined on $[a, b]$,
let
$$M_i(f) \equiv \sup\{f(x) : x \in [x_{i-1}, x_i]\},$$
$$m_i(f) \equiv \inf\{f(x) : x \in [x_{i-1}, x_i]\}.$$
$^1$ Archimedes 287-212 B.C. found areas of curved regions by stuffing them with simple shapes
which he knew the area of and taking a limit. He also made fundamental contributions to physics.
The story is told about how he determined that a gold smith had cheated the king by giving him
a crown which was not solid gold as had been claimed. He did this by finding the amount of water
displaced by the crown and comparing with the amount of water it should have displaced if it had
been solid gold.
Definition 2.1 Let $F$ be an increasing function defined on $[a, b]$ and let $\Delta F_i \equiv
F(x_i) - F(x_{i-1})$. Then define upper and lower sums as
$$U(f, P) \equiv \sum_{i=1}^{n} M_i(f)\, \Delta F_i \quad \text{and} \quad L(f, P) \equiv \sum_{i=1}^{n} m_i(f)\, \Delta F_i$$
respectively. The numbers, $M_i(f)$ and $m_i(f)$, are well defined real numbers because
$f$ is assumed to be bounded and $\mathbb{R}$ is complete. Thus the set $S = \{f(x) : x \in
[x_{i-1}, x_i]\}$ is bounded above and below.
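Definition 2.1 can be tried out numerically. The following is a minimal Python sketch under an added assumption: $f$ is increasing, so that $M_i(f) = f(x_i)$ and $m_i(f) = f(x_{i-1})$. For a general bounded $f$ the supremum and infimum cannot be read off from endpoint values. The name `upper_lower_sums` is illustrative only.

```python
from fractions import Fraction

def upper_lower_sums(f, F, partition):
    """Return (U(f, P), L(f, P)) for an increasing f and an increasing F.

    Because f is assumed increasing, sup f on [x_{i-1}, x_i] is f(x_i)
    and inf f on the same interval is f(x_{i-1}).
    """
    upper = lower = 0
    for left, right in zip(partition, partition[1:]):
        delta_F = F(right) - F(left)   # Delta F_i = F(x_i) - F(x_{i-1})
        upper += f(right) * delta_F    # M_i(f) * Delta F_i
        lower += f(left) * delta_F     # m_i(f) * Delta F_i
    return upper, lower

# f(x) = x^2 on [0, 1] with the partition {0, 1/2, 1}, in exact arithmetic.
P = [Fraction(0), Fraction(1, 2), Fraction(1)]
square = lambda x: x * x
print(upper_lower_sums(square, lambda x: x, P))       # F(x) = x
print(upper_lower_sums(square, lambda x: x ** 3, P))  # F(x) = x^3
```

Exact fractions are used so that the sums can be compared with a hand computation without rounding.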
In the following picture, the sum of the areas of the rectangles in the picture on
the left is a lower sum for the function in the picture and the sum of the areas of the
rectangles in the picture on the right is an upper sum for the same function which
uses the same partition. In these pictures the function, F is given by F (x) = x and
these are the ordinary upper and lower sums from calculus.
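[Figure: the graph of $y = f(x)$ over the partition $x_0 < x_1 < x_2 < x_3$, drawn twice; the rectangles of the lower sum on the left and the rectangles of the upper sum on the right.]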
What happens when you add in more points in a partition? The following
pictures illustrate in the context of the above example. In this example a single
additional point, labeled z has been added in.
[Figure: the same two pictures for $y = f(x)$ with one additional partition point $z$ added to $x_0 < x_1 < x_2 < x_3$.]
Note how the lower sum got larger by the amount of the area in the shaded
rectangle and the upper sum got smaller by the amount in the rectangle shaded by
dots. In general this is the way it works and this is shown in the following lemma.
Lemma 2.2 If $P \subseteq Q$ then
$$U(f, Q) \leq U(f, P), \quad \text{and} \quad L(f, P) \leq L(f, Q).$$
Proof: This is verified by adding in one point at a time. Thus let $P =
\{x_0, \cdots, x_n\}$ and let $Q = \{x_0, \cdots, x_k, y, x_{k+1}, \cdots, x_n\}$. Thus exactly one point, $y$,
is added between $x_k$ and $x_{k+1}$. Now the term in the upper sum which corresponds
to the interval $[x_k, x_{k+1}]$ in $U(f, P)$ is
$$\sup\{f(x) : x \in [x_k, x_{k+1}]\}\,(F(x_{k+1}) - F(x_k)) \tag{2.1}$$
and the term which corresponds to the interval $[x_k, x_{k+1}]$ in $U(f, Q)$ is
$$\sup\{f(x) : x \in [x_k, y]\}\,(F(y) - F(x_k)) \tag{2.2}$$
$$+ \sup\{f(x) : x \in [y, x_{k+1}]\}\,(F(x_{k+1}) - F(y)) \tag{2.3}$$
$$\equiv M_1 (F(y) - F(x_k)) + M_2 (F(x_{k+1}) - F(y)) \tag{2.4}$$
All the other terms in the two sums coincide. Now $\sup\{f(x) : x \in [x_k, x_{k+1}]\} \geq
\max(M_1, M_2)$ and so the expression in 2.2 is no larger than
$$\sup\{f(x) : x \in [x_k, x_{k+1}]\}\,(F(x_{k+1}) - F(y))$$
$$+ \sup\{f(x) : x \in [x_k, x_{k+1}]\}\,(F(y) - F(x_k))$$
$$= \sup\{f(x) : x \in [x_k, x_{k+1}]\}\,(F(x_{k+1}) - F(x_k)),$$
the term corresponding to the interval, $[x_k, x_{k+1}]$ and $U(f, P)$. This proves the
first part of the lemma pertaining to upper sums because if $Q \supseteq P$, one can obtain
$Q$ from $P$ by adding in one point at a time and each time a point is added, the
corresponding upper sum either gets smaller or stays the same. The second part
about lower sums is similar and is left as an exercise.
Lemma 2.3 If $P$ and $Q$ are two partitions, then
$$L(f, P) \leq U(f, Q).$$
Proof: By Lemma 2.2,
$$L(f, P) \leq L(f, P \cup Q) \leq U(f, P \cup Q) \leq U(f, Q).$$
Definition 2.4
$$\overline{I} \equiv \inf\{U(f, Q) \text{ where } Q \text{ is a partition}\}$$
$$\underline{I} \equiv \sup\{L(f, P) \text{ where } P \text{ is a partition}\}.$$
Note that $\underline{I}$ and $\overline{I}$ are well defined real numbers.
Theorem 2.5 $\underline{I} \leq \overline{I}$.
Proof: From Lemma 2.3,
$$\underline{I} = \sup\{L(f, P) \text{ where } P \text{ is a partition}\} \leq U(f, Q)$$
because $U(f, Q)$ is an upper bound to the set of all lower sums and so it is no
smaller than the least upper bound. Therefore, since $Q$ is arbitrary,
$$\underline{I} = \sup\{L(f, P) \text{ where } P \text{ is a partition}\}
\leq \inf\{U(f, Q) \text{ where } Q \text{ is a partition}\} \equiv \overline{I}$$
where the inequality holds because it was just shown that $\underline{I}$ is a lower bound to the
set of all upper sums and so it is no larger than the greatest lower bound of this
set. This proves the theorem.
Definition 2.6 A bounded function $f$ is Riemann Stieltjes integrable, written as
$$f \in R([a, b])$$
if
$$\overline{I} = \underline{I}$$
and in this case,
$$\int_a^b f(x)\, dF \equiv \underline{I} = \overline{I}.$$
When $F(x) = x$, the integral is called the Riemann integral and is written as
$$\int_a^b f(x)\, dx.$$
Thus, in words, the Riemann integral is the unique number which lies between
all upper sums and all lower sums if there is such a unique number.
Recall the following Proposition which comes from the definitions.
Proposition 2.7 Let $S$ be a nonempty set and suppose $\sup(S)$ exists. Then for
every $\delta > 0$,
$$S \cap (\sup(S) - \delta, \sup(S)] \neq \emptyset.$$
If $\inf(S)$ exists, then for every $\delta > 0$,
$$S \cap [\inf(S), \inf(S) + \delta) \neq \emptyset.$$
This proposition implies the following theorem which is used to determine the
question of Riemann Stieltjes integrability.
Theorem 2.8 A bounded function $f$ is Riemann integrable if and only if for all
$\varepsilon > 0$, there exists a partition $P$ such that
$$U(f, P) - L(f, P) < \varepsilon. \tag{2.5}$$
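Proof: First assume $f$ is Riemann integrable. Then let $P$ and $Q$ be two partitions
such that
$$U(f, Q) < \overline{I} + \varepsilon/2, \quad L(f, P) > \underline{I} - \varepsilon/2.$$
Then since $\underline{I} = \overline{I}$,
$$U(f, Q \cup P) - L(f, P \cup Q) \leq U(f, Q) - L(f, P) < \overline{I} + \varepsilon/2 - (\underline{I} - \varepsilon/2) = \varepsilon.$$
Now suppose that for all $\varepsilon > 0$ there exists a partition such that 2.5 holds. Then
for given $\varepsilon$ and partition $P$ corresponding to $\varepsilon$
$$\overline{I} - \underline{I} \leq U(f, P) - L(f, P) \leq \varepsilon.$$
Since $\varepsilon$ is arbitrary, this shows $\underline{I} = \overline{I}$ and this proves the theorem.
The condition described in the theorem is called the Riemann criterion.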
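Not all bounded functions are Riemann integrable. For example, let $F(x) = x$
and
$$f(x) \equiv \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \in \mathbb{R} \setminus \mathbb{Q} \end{cases} \tag{2.6}$$
Then if $[a, b] = [0, 1]$ all upper sums for $f$ equal 1 while all lower sums for $f$ equal
0. Therefore the Riemann criterion is violated for $\varepsilon = 1/2$.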
2.2 Exercises
1. Prove the second half of Lemma 2.2 about lower sums.
2. Verify that for f given in 2.6, the lower sums on the interval [0, 1] are all equal
to zero while the upper sums are all equal to one.
3. Let $f(x) = 1 + x^2$ for $x \in [-1, 3]$ and let $P = \left\{-1, -\frac{1}{3}, 0, \frac{1}{2}, 1, 2\right\}$. Find
$U(f, P)$ and $L(f, P)$ for $F(x) = x$ and for $F(x) = x^3$.
4. Show that if $f \in R([a, b])$ for $F(x) = x$ and $\varepsilon > 0$ is given, there exists a partition,
$\{x_0, \cdots, x_n\}$ such that for any $z_k \in [x_{k-1}, x_k]$,
$$\left| \int_a^b f(x)\, dx - \sum_{k=1}^{n} f(z_k)(x_k - x_{k-1}) \right| < \varepsilon$$
This sum, $\sum_{k=1}^{n} f(z_k)(x_k - x_{k-1})$, is called a Riemann sum and this exercise
shows that the Riemann integral can always be approximated by a Riemann
sum. For the general Riemann Stieltjes case, does anything change?
5. Let $P = \left\{1, 1\frac{1}{4}, 1\frac{1}{2}, 1\frac{3}{4}, 2\right\}$ and $F(x) = x$. Find upper and lower sums for the
function, $f(x) = \frac{1}{x}$ using this partition. What does this tell you about $\ln(2)$?
6. If $f \in R([a, b])$ with $F(x) = x$ and $f$ is changed at finitely many points,
show the new function is also in $R([a, b])$. Is this still true for the general case
where $F$ is only assumed to be an increasing function? Explain.
7. In the case where $F(x) = x$, define a left sum as
$$\sum_{k=1}^{n} f(x_{k-1})(x_k - x_{k-1})$$
and a right sum,
$$\sum_{k=1}^{n} f(x_k)(x_k - x_{k-1}).$$
Also suppose that all partitions have the property that $x_k - x_{k-1}$ equals a
constant, $(b - a)/n$ so the points in the partition are equally spaced, and
define the integral to be the number these right and left sums get close to as
$n$ gets larger and larger. Show that for $f$ given in 2.6, $\int_0^x f(t)\, dt = x$ if $x$ is
rational and $\int_0^x f(t)\, dt = 0$ if $x$ is irrational. It turns out that the correct
answer should always equal zero for that function, regardless of whether $x$ is
rational. This is shown when the Lebesgue integral is studied. This illustrates
why this method of defining the integral in terms of left and right sums is total
nonsense. Show that even though this is the case, it makes no difference if $f$
is continuous.
2.3 Functions Of Riemann Integrable Functions
It is often necessary to consider functions of Riemann integrable functions and a
natural question is whether these are Riemann integrable. The following theorem
gives a partial answer to this question. This is not the most general theorem which
will relate to this question but it will be enough for the needs of this book.
Theorem 2.9 Let $f, g$ be bounded functions and let $f([a, b]) \subseteq [c_1, d_1]$ and $g([a, b]) \subseteq
[c_2, d_2]$. Let $H : [c_1, d_1] \times [c_2, d_2] \to \mathbb{R}$ satisfy,
$$|H(a_1, b_1) - H(a_2, b_2)| \leq K\left[|a_1 - a_2| + |b_1 - b_2|\right]$$
for some constant $K$. Then if $f, g \in R([a, b])$ it follows that $H \circ (f, g) \in R([a, b])$.
Proof: In the following claim, $M_i(h)$ and $m_i(h)$ have the meanings assigned
above with respect to some partition of $[a, b]$ for the function, $h$.
Claim: The following inequality holds.
$$|M_i(H \circ (f, g)) - m_i(H \circ (f, g))| \leq K\left[|M_i(f) - m_i(f)| + |M_i(g) - m_i(g)|\right].$$
Proof of the claim: Let $\eta > 0$. By the above proposition, there exist $x_1, x_2 \in [x_{i-1}, x_i]$
such that
$$H(f(x_1), g(x_1)) + \eta > M_i(H \circ (f, g)),$$
and
$$H(f(x_2), g(x_2)) - \eta < m_i(H \circ (f, g)).$$
Then
$$|M_i(H \circ (f, g)) - m_i(H \circ (f, g))|$$
$$< 2\eta + |H(f(x_1), g(x_1)) - H(f(x_2), g(x_2))|$$
$$\leq 2\eta + K\left[|f(x_1) - f(x_2)| + |g(x_1) - g(x_2)|\right]$$
$$\leq 2\eta + K\left[|M_i(f) - m_i(f)| + |M_i(g) - m_i(g)|\right].$$
Since $\eta > 0$ is arbitrary, this proves the claim.
Now continuing with the proof of the theorem, let $P$ be such that
$$\sum_{i=1}^{n} (M_i(f) - m_i(f))\, \Delta F_i < \frac{\varepsilon}{2K}, \quad \sum_{i=1}^{n} (M_i(g) - m_i(g))\, \Delta F_i < \frac{\varepsilon}{2K}.$$
Then from the claim,
$$\sum_{i=1}^{n} (M_i(H \circ (f, g)) - m_i(H \circ (f, g)))\, \Delta F_i$$
$$\leq \sum_{i=1}^{n} K\left[|M_i(f) - m_i(f)| + |M_i(g) - m_i(g)|\right] \Delta F_i < \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, this shows $H \circ (f, g)$ satisfies the Riemann criterion and
hence $H \circ (f, g)$ is Riemann integrable as claimed. This proves the theorem.
This theorem implies that if $f, g$ are Riemann Stieltjes integrable, then so is
$af + bg$, $|f|$, $f^2$, along with infinitely many other such continuous combinations of
Riemann Stieltjes integrable functions. For example, to see that $|f|$ is Riemann
integrable, let $H(a, b) = |a|$. Clearly this function satisfies the conditions of the
above theorem and so $|f| = H \circ (f, f) \in R([a, b])$ as claimed. The following theorem
gives an example of many functions which are Riemann integrable.
Theorem 2.10 Let $f : [a, b] \to \mathbb{R}$ be either increasing or decreasing on $[a, b]$ and
suppose $F$ is continuous. Then $f \in R([a, b])$.
Proof: Let $\varepsilon > 0$ be given and let
$$x_i = a + i\left(\frac{b - a}{n}\right), \quad i = 0, \cdots, n.$$
Since $F$ is continuous, it follows that it is uniformly continuous. Therefore, if $n$ is
large enough, then for all $i$,
$$F(x_i) - F(x_{i-1}) < \frac{\varepsilon}{f(b) - f(a) + 1}$$
Then since $f$ is increasing,
$$U(f, P) - L(f, P) = \sum_{i=1}^{n} (f(x_i) - f(x_{i-1}))(F(x_i) - F(x_{i-1}))$$
$$\leq \frac{\varepsilon}{f(b) - f(a) + 1} \sum_{i=1}^{n} (f(x_i) - f(x_{i-1}))$$
$$= \frac{\varepsilon}{f(b) - f(a) + 1}\,(f(b) - f(a)) < \varepsilon.$$
Thus the Riemann criterion is satisfied and so the function is Riemann Stieltjes
integrable. The proof for decreasing $f$ is similar.
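The telescoping sum in this proof is easy to check numerically. Below is a minimal Python sketch for the special case $F(x) = x$ with an increasing $f$ and $n$ equally spaced subintervals, where the gap $U(f, P) - L(f, P)$ equals $(f(b) - f(a))(b - a)/n$. The function name `gap_for_increasing` is an illustrative choice.

```python
from fractions import Fraction

def gap_for_increasing(f, a, b, n):
    """U(f, P) - L(f, P) for increasing f, F(x) = x, and n equal subintervals.

    For increasing f the i-th term of U - L is
    (f(x_i) - f(x_{i-1})) * (x_i - x_{i-1}).
    """
    xs = [a + i * (b - a) / n for i in range(n + 1)]
    return sum((f(r) - f(l)) * (r - l) for l, r in zip(xs, xs[1:]))

cube = lambda x: x ** 3
a, b = Fraction(0), Fraction(2)
for n in (8, 80, 800):
    # The gap shrinks like 1/n, so the Riemann criterion is met for large n.
    print(n, gap_for_increasing(cube, a, b, n))
```

Because all subintervals have the same length, the sum telescopes exactly as in the proof, and the gap can be made smaller than any given $\varepsilon$ by taking $n$ large.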
Corollary 2.11 Let $[a, b]$ be a bounded closed interval and let $\phi : [a, b] \to \mathbb{R}$ be
Lipschitz continuous and suppose $F$ is continuous. Then $\phi \in R([a, b])$. Recall that
a function, $\phi$, is Lipschitz continuous if there is a constant, $K$, such that for all
$x, y$,
$$|\phi(x) - \phi(y)| \leq K|x - y|.$$
Proof: Let $f(x) = x$. Then by Theorem 2.10, $f$ is Riemann Stieltjes integrable.
Let $H(a, b) \equiv \phi(a)$. Then by Theorem 2.9 $H \circ (f, f) = \phi \circ f = \phi$ is also Riemann
Stieltjes integrable. This proves the corollary.
In fact, it is enough to assume $\phi$ is continuous, although this is harder. This
is the content of the next theorem which is where the difficult theorems about
continuity and uniform continuity are used. This is the main result on the existence
of the Riemann Stieltjes integral for this book.
Theorem 2.12 Suppose $f : [a, b] \to \mathbb{R}$ is continuous and $F$ is just an increasing
function defined on $[a, b]$. Then $f \in R([a, b])$.
Proof: Since $f$ is continuous, it follows $f$ is uniformly continuous on $[a, b]$.
Therefore, if $\varepsilon > 0$ is given, there exists a $\delta > 0$ such that if $|x_i - x_{i-1}| < \delta$, then
$M_i - m_i < \frac{\varepsilon}{F(b) - F(a) + 1}$. Let
$$P \equiv \{x_0, \cdots, x_n\}$$
be a partition with $|x_i - x_{i-1}| < \delta$. Then
$$U(f, P) - L(f, P) = \sum_{i=1}^{n} (M_i - m_i)(F(x_i) - F(x_{i-1}))$$
$$\leq \frac{\varepsilon}{F(b) - F(a) + 1}\,(F(b) - F(a)) < \varepsilon.$$
By the Riemann criterion, $f \in R([a, b])$. This proves the theorem.
2.4 Properties Of The Integral
The integral has many important algebraic properties. First here is a simple lemma.
Lemma 2.13 Let $S$ be a nonempty set which is bounded above and below. Then if
$-S \equiv \{-x : x \in S\}$,
$$\sup(-S) = -\inf(S) \tag{2.7}$$
and
$$\inf(-S) = -\sup(S). \tag{2.8}$$
Proof: Consider 2.7. Let $x \in S$. Then $-x \leq \sup(-S)$ and so $x \geq -\sup(-S)$. It
follows that $-\sup(-S)$ is a lower bound for $S$ and therefore, $-\sup(-S) \leq \inf(S)$.
This implies $\sup(-S) \geq -\inf(S)$. Now let $-x \in -S$. Then $x \in S$ and so $x \geq \inf(S)$
which implies $-x \leq -\inf(S)$. Therefore, $-\inf(S)$ is an upper bound for $-S$ and
so $-\inf(S) \geq \sup(-S)$. This shows 2.7. Formula 2.8 is similar and is left as an
exercise.
In particular, the above lemma implies that for $M_i(f)$ and $m_i(f)$ defined above
$$M_i(-f) = -m_i(f), \quad \text{and} \quad m_i(-f) = -M_i(f).$$
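Lemma 2.14 If $f \in R([a, b])$ then $-f \in R([a, b])$ and
$$-\int_a^b f(x)\, dF = \int_a^b -f(x)\, dF.$$
Proof: The first part of the conclusion of this lemma follows from Theorem 2.9
since the function $\phi(y) \equiv -y$ is Lipschitz continuous. Now choose $P$ such that
$$\int_a^b -f(x)\, dF - L(-f, P) < \varepsilon.$$
Then since $m_i(-f) = -M_i(f)$,
$$\varepsilon > \int_a^b -f(x)\, dF - \sum_{i=1}^{n} m_i(-f)\, \Delta F_i = \int_a^b -f(x)\, dF + \sum_{i=1}^{n} M_i(f)\, \Delta F_i$$
which implies
$$\varepsilon > \int_a^b -f(x)\, dF + \sum_{i=1}^{n} M_i(f)\, \Delta F_i \geq \int_a^b -f(x)\, dF + \int_a^b f(x)\, dF.$$
Thus, since $\varepsilon$ is arbitrary,
$$\int_a^b -f(x)\, dF \leq -\int_a^b f(x)\, dF$$
whenever $f \in R([a, b])$. For the reverse inequality, choose $P$ such that
$U(-f, P) - \int_a^b -f(x)\, dF < \varepsilon$. Since $M_i(-f) = -m_i(f)$, it follows that
$U(-f, P) = -L(f, P) \geq -\int_a^b f(x)\, dF$ and so
$$-\int_a^b f(x)\, dF - \int_a^b -f(x)\, dF \leq U(-f, P) - \int_a^b -f(x)\, dF < \varepsilon.$$
Since $\varepsilon$ is arbitrary,
$$-\int_a^b f(x)\, dF \leq \int_a^b -f(x)\, dF$$
and this proves the lemma.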
Theorem 2.15 The integral is linear,
$$\int_a^b (\alpha f + \beta g)(x)\, dF = \alpha \int_a^b f(x)\, dF + \beta \int_a^b g(x)\, dF.$$
whenever $f, g \in R([a, b])$ and $\alpha, \beta \in \mathbb{R}$.
Proof: First note that by Theorem 2.9, $\alpha f + \beta g \in R([a, b])$. To begin with,
consider the claim that if $f, g \in R([a, b])$ then
$$\int_a^b (f + g)(x)\, dF = \int_a^b f(x)\, dF + \int_a^b g(x)\, dF. \tag{2.9}$$
Let $P_1, Q_1$ be such that
$$U(f, Q_1) - L(f, Q_1) < \varepsilon/2, \quad U(g, P_1) - L(g, P_1) < \varepsilon/2.$$
Then letting $P \equiv P_1 \cup Q_1$, Lemma 2.2 implies
$$U(f, P) - L(f, P) < \varepsilon/2, \quad \text{and} \quad U(g, P) - L(g, P) < \varepsilon/2.$$
Next note that
$$m_i(f + g) \geq m_i(f) + m_i(g), \quad M_i(f + g) \leq M_i(f) + M_i(g).$$
Therefore,
$$L(g + f, P) \geq L(f, P) + L(g, P), \quad U(g + f, P) \leq U(f, P) + U(g, P).$$
For this partition,
$$\int_a^b (f + g)(x)\, dF \in [L(f + g, P), U(f + g, P)]$$
$$\subseteq [L(f, P) + L(g, P), U(f, P) + U(g, P)]$$
and
$$\int_a^b f(x)\, dF + \int_a^b g(x)\, dF \in [L(f, P) + L(g, P), U(f, P) + U(g, P)].$$
Therefore,
$$\left| \int_a^b (f + g)(x)\, dF - \left( \int_a^b f(x)\, dF + \int_a^b g(x)\, dF \right) \right|$$
$$\leq U(f, P) + U(g, P) - (L(f, P) + L(g, P)) < \varepsilon/2 + \varepsilon/2 = \varepsilon.$$
This proves 2.9 since $\varepsilon$ is arbitrary.
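It remains to show that
$$\alpha \int_a^b f(x)\, dF = \int_a^b \alpha f(x)\, dF.$$
Suppose first that $\alpha \geq 0$. Then
$$\int_a^b \alpha f(x)\, dF \equiv \sup\{L(\alpha f, P) : P \text{ is a partition}\} =$$
$$\alpha \sup\{L(f, P) : P \text{ is a partition}\} \equiv \alpha \int_a^b f(x)\, dF.$$
If $\alpha < 0$, then this and Lemma 2.14 imply
$$\int_a^b \alpha f(x)\, dF = \int_a^b (-\alpha)(-f(x))\, dF$$
$$= (-\alpha) \int_a^b (-f(x))\, dF = \alpha \int_a^b f(x)\, dF.$$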
In the next theorem, suppose $F$ is defined on $[a, b] \cup [b, c]$.
Theorem 2.16 If $f \in R([a, b])$ and $f \in R([b, c])$, then $f \in R([a, c])$ and
$$\int_a^c f(x)\, dF = \int_a^b f(x)\, dF + \int_b^c f(x)\, dF. \tag{2.10}$$
Proof: Let $P_1$ be a partition of $[a, b]$ and $P_2$ be a partition of $[b, c]$ such that
$$U(f, P_i) - L(f, P_i) < \varepsilon/2, \quad i = 1, 2.$$
Let $P \equiv P_1 \cup P_2$. Then $P$ is a partition of $[a, c]$ and
$$U(f, P) - L(f, P)$$
$$= U(f, P_1) - L(f, P_1) + U(f, P_2) - L(f, P_2) < \varepsilon/2 + \varepsilon/2 = \varepsilon. \tag{2.11}$$
Thus, $f \in R([a, c])$ by the Riemann criterion and also for this partition,
$$\int_a^b f(x)\, dF + \int_b^c f(x)\, dF \in [L(f, P_1) + L(f, P_2), U(f, P_1) + U(f, P_2)]$$
$$= [L(f, P), U(f, P)]$$
and
$$\int_a^c f(x)\, dF \in [L(f, P), U(f, P)].$$
Hence by 2.11,
$$\left| \int_a^c f(x)\, dF - \left( \int_a^b f(x)\, dF + \int_b^c f(x)\, dF \right) \right| \leq U(f, P) - L(f, P) < \varepsilon$$
which shows that since $\varepsilon$ is arbitrary, 2.10 holds. This proves the theorem.
Corollary 2.17 Let $F$ be continuous and let $[a, b]$ be a closed and bounded interval
and suppose that
$$a = y_1 < y_2 < \cdots < y_l = b$$
and that $f$ is a bounded function defined on $[a, b]$ which has the property that $f$ is
either increasing on $[y_j, y_{j+1}]$ or decreasing on $[y_j, y_{j+1}]$ for $j = 1, \cdots, l - 1$. Then
$f \in R([a, b])$.
Proof: This follows from Theorem 2.16 and Theorem 2.10.
The symbol, $\int_a^b f(x)\, dF$ when $a > b$ has not yet been defined.
Definition 2.18 Let $[a, b]$ be an interval and let $f \in R([a, b])$. Then
$$\int_b^a f(x)\, dF \equiv -\int_a^b f(x)\, dF.$$
Note that with this definition,
$$\int_a^a f(x)\, dF = -\int_a^a f(x)\, dF$$
and so
$$\int_a^a f(x)\, dF = 0.$$
Theorem 2.19 Assuming all the integrals make sense,
$$\int_a^b f(x)\, dF + \int_b^c f(x)\, dF = \int_a^c f(x)\, dF.$$
Proof: This follows from Theorem 2.16 and Definition 2.18. For example, assume
$$c \in (a, b).$$
Then from Theorem 2.16,
$$\int_a^c f(x)\, dF + \int_c^b f(x)\, dF = \int_a^b f(x)\, dF$$
and so by Definition 2.18,
$$\int_a^c f(x)\, dF = \int_a^b f(x)\, dF - \int_c^b f(x)\, dF$$
$$= \int_a^b f(x)\, dF + \int_b^c f(x)\, dF.$$
The other cases are similar.
The following properties of the integral have either been established or they
follow quickly from what has been shown so far.
$$\text{If } f \in R([a, b]) \text{ then if } c \in [a, b],\ f \in R([a, c]), \tag{2.12}$$
$$\int_a^b \alpha\, dF = \alpha (F(b) - F(a)), \tag{2.13}$$
$$\int_a^b (\alpha f + \beta g)(x)\, dF = \alpha \int_a^b f(x)\, dF + \beta \int_a^b g(x)\, dF, \tag{2.14}$$
$$\int_a^b f(x)\, dF + \int_b^c f(x)\, dF = \int_a^c f(x)\, dF, \tag{2.15}$$
$$\int_a^b f(x)\, dF \geq 0 \text{ if } f(x) \geq 0 \text{ and } a < b, \tag{2.16}$$
$$\left| \int_a^b f(x)\, dF \right| \leq \left| \int_a^b |f(x)|\, dF \right|. \tag{2.17}$$
The only one of these claims which may not be completely obvious is the last one.
To show this one, note that
$$|f(x)| - f(x) \geq 0, \quad |f(x)| + f(x) \geq 0.$$
Therefore, by 2.16 and 2.14, if $a < b$,
$$\int_a^b |f(x)|\, dF \geq \int_a^b f(x)\, dF$$
and
$$\int_a^b |f(x)|\, dF \geq -\int_a^b f(x)\, dF.$$
Therefore,
$$\int_a^b |f(x)|\, dF \geq \left| \int_a^b f(x)\, dF \right|.$$
If $b < a$ then the above inequality holds with $a$ and $b$ switched. This implies 2.17.
2.5 Fundamental Theorem Of Calculus
In this section $F(x) = x$ so things are specialized to the ordinary Riemann integral.
With these properties, it is easy to prove the fundamental theorem of calculus$^2$.
$^2$ This theorem is why Newton and Leibniz are credited with inventing calculus. The integral
had been around for thousands of years and the derivative was by their time well known. However
the connection between these two ideas had not been fully made although Newton's predecessor,
Isaac Barrow had made some progress in this direction.
Let $f \in R([a, b])$. Then by 2.12 $f \in R([a, x])$ for each $x \in [a, b]$. The first version
of the fundamental theorem of calculus is a statement about the derivative of the
function
$$x \to \int_a^x f(t)\, dt.$$
Theorem 2.20 Let $f \in R([a, b])$ and let
$$F(x) \equiv \int_a^x f(t)\, dt.$$
Then if $f$ is continuous at $x \in (a, b)$,
$$F'(x) = f(x).$$
Proof: Let $x \in (a, b)$ be a point of continuity of $f$ and let $h$ be small enough
that $x + h \in [a, b]$. Then by using 2.15,
$$h^{-1}(F(x + h) - F(x)) = h^{-1} \int_x^{x+h} f(t)\, dt.$$
Also, using 2.13,
$$f(x) = h^{-1} \int_x^{x+h} f(x)\, dt.$$
Therefore, by 2.17,
$$\left| h^{-1}(F(x + h) - F(x)) - f(x) \right| = \left| h^{-1} \int_x^{x+h} (f(t) - f(x))\, dt \right|$$
$$\leq \left| h^{-1} \int_x^{x+h} |f(t) - f(x)|\, dt \right|.$$
Let $\varepsilon > 0$ and let $\delta > 0$ be small enough that if $|t - x| < \delta$, then
$$|f(t) - f(x)| < \varepsilon.$$
Therefore, if $|h| < \delta$, the above inequality and 2.13 shows that
$$\left| h^{-1}(F(x + h) - F(x)) - f(x) \right| \leq |h|^{-1} \varepsilon |h| = \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, this shows
$$\lim_{h \to 0} h^{-1}(F(x + h) - F(x)) = f(x)$$
and this proves the theorem.
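The quantity estimated in this proof, $h^{-1} \int_x^{x+h} f(t)\, dt$, can be watched converging to $f(x)$ numerically. The following Python sketch is only an illustration: it approximates the integral by a midpoint Riemann sum, and the name `average_value` and the choice $f = \cos$, $x = 1$ are assumptions made for the example.

```python
import math

def average_value(f, x, h, n=100):
    """Approximate h^{-1} * (integral of f over [x, x + h]).

    The integral is replaced by a midpoint Riemann sum with n subintervals,
    so the result is an approximation of the difference quotient
    (F(x + h) - F(x)) / h, where F is the integral of f from a to x.
    """
    width = h / n
    total = sum(f(x + (k + 0.5) * width) for k in range(n)) * width
    return total / h

x = 1.0
for h in (0.1, 0.01, 0.001):
    # The error shrinks with h, as continuity of f at x predicts.
    print(h, abs(average_value(math.cos, x, h) - math.cos(x)))
```

The printed errors decrease as $h$ decreases, in line with the $\varepsilon$, $\delta$ estimate above.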
Note this gives existence for the initial value problem,
$$F'(x) = f(x), \quad F(a) = 0$$
whenever $f$ is Riemann integrable and continuous.$^3$
The next theorem is also called the fundamental theorem of calculus.
Theorem 2.21 Let $f \in R([a, b])$ and suppose there exists an antiderivative for
$f$, $G$, such that
$$G'(x) = f(x)$$
for every point of $(a, b)$ and $G$ is continuous on $[a, b]$. Then
$$\int_a^b f(x)\, dx = G(b) - G(a). \tag{2.18}$$
Proof: Let $P = \{x_0, \cdots, x_n\}$ be a partition satisfying
$$U(f, P) - L(f, P) < \varepsilon.$$
Then
$$G(b) - G(a) = G(x_n) - G(x_0)$$
$$= \sum_{i=1}^{n} G(x_i) - G(x_{i-1}).$$
By the mean value theorem,
$$G(b) - G(a) = \sum_{i=1}^{n} G'(z_i)(x_i - x_{i-1})$$
$$= \sum_{i=1}^{n} f(z_i)\, \Delta x_i$$
where $z_i$ is some point in $[x_{i-1}, x_i]$. It follows, since the above sum lies between the
upper and lower sums, that
$$G(b) - G(a) \in [L(f, P), U(f, P)],$$
and also
$$\int_a^b f(x)\, dx \in [L(f, P), U(f, P)].$$
Therefore,
$$\left| G(b) - G(a) - \int_a^b f(x)\, dx \right| \leq U(f, P) - L(f, P) < \varepsilon.$$
Since $\varepsilon > 0$ is arbitrary, 2.18 holds. This proves the theorem.
$^3$ Of course it was proved that if $f$ is continuous on a closed interval, $[a, b]$, then $f \in R([a, b])$
but this is a hard theorem using the difficult result about uniform continuity.
The following notation is often used in this context. Suppose $F$ is an antiderivative
of $f$ as just described with $F$ continuous on $[a, b]$ and $F' = f$ on $(a, b)$. Then
$$\int_a^b f(x)\, dx = F(b) - F(a) \equiv F(x)\big|_a^b.$$
Definition 2.22 Let $f$ be a bounded function defined on a closed interval $[a, b]$ and
let $P \equiv \{x_0, \cdots, x_n\}$ be a partition of the interval. Suppose $z_i \in [x_{i-1}, x_i]$ is
chosen. Then the sum
$$\sum_{i=1}^{n} f(z_i)(x_i - x_{i-1})$$
is known as a Riemann sum. Also,
$$||P|| \equiv \max\{|x_i - x_{i-1}| : i = 1, \cdots, n\}.$$
Proposition 2.23 Suppose $f \in R([a, b])$ and let $\varepsilon > 0$. Then there exists a partition, $P \equiv
\{x_0, \cdots, x_n\}$ with the property that for any choice of $z_k \in [x_{k-1}, x_k]$,
$$\left| \int_a^b f(x)\, dx - \sum_{k=1}^{n} f(z_k)(x_k - x_{k-1}) \right| < \varepsilon.$$
Proof: Choose $P$ such that $U(f, P) - L(f, P) < \varepsilon$ and then both $\int_a^b f(x)\, dx$
and $\sum_{k=1}^{n} f(z_k)(x_k - x_{k-1})$ are contained in $[L(f, P), U(f, P)]$ and so the claimed
inequality must hold. This proves the proposition.
It is significant because it gives a way of approximating the integral.
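A short Python sketch of this approximation follows. It assumes $F(x) = x$, the test function $f(x) = x^2$ on $[0, 1]$ (whose integral is $1/3$ by Theorem 2.21 with $G(x) = x^3/3$), and an equally spaced partition; the name `riemann_sum` is illustrative.

```python
def riemann_sum(f, partition, tags):
    """Riemann sum: sum of f(z_k) * (x_k - x_{k-1}), one tag z_k per subinterval."""
    return sum(f(z) * (r - l) for l, r, z in zip(partition, partition[1:], tags))

n = 1000
P = [k / n for k in range(n + 1)]                    # equally spaced partition of [0, 1]
midpoints = [(l + r) / 2 for l, r in zip(P, P[1:])]  # one admissible choice of tags
left_ends = P[:-1]                                   # another admissible choice of tags

square = lambda x: x * x
exact = 1 / 3                                        # G(1) - G(0) with G(x) = x^3 / 3

print(abs(riemann_sum(square, P, midpoints) - exact))
print(abs(riemann_sum(square, P, left_ends) - exact))
```

Since $f$ is increasing here, $U(f, P) - L(f, P) = (f(1) - f(0))/n = 1/n$, so every choice of tags gives an error below $1/n$, as the proposition predicts.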
The definition of Riemann integrability given in this chapter is also called Darboux
integrability and the integral defined as the unique number which lies between
all upper sums and all lower sums which is given in this chapter is called the Darboux
integral. The definition of the Riemann integral in terms of Riemann sums
is given next.
Definition 2.24 A bounded function, $f$ defined on $[a, b]$ is said to be Riemann
integrable if there exists a number, $I$ with the property that for every $\varepsilon > 0$, there
exists $\delta > 0$ such that if
$$P \equiv \{x_0, x_1, \cdots, x_n\}$$
is any partition having $||P|| < \delta$, and $z_i \in [x_{i-1}, x_i]$,
$$\left| I - \sum_{i=1}^{n} f(z_i)(x_i - x_{i-1}) \right| < \varepsilon.$$
The number $\int_a^b f(x)\, dx$ is defined as $I$.
Thus, there are two definitions of the Riemann integral. It turns out they are
equivalent which is the following theorem of Darboux.
Theorem 2.25 A bounded function defined on $[a, b]$ is Riemann integrable in the
sense of Definition 2.24 if and only if it is integrable in the sense of Darboux.
Furthermore the two integrals coincide.
The proof of this theorem is left for the exercises in Problems 10 - 12. It isn't
essential that you understand this theorem so if it does not interest you, leave it
out. Note that it implies that given a Riemann integrable function $f$ in either sense,
it can be approximated by Riemann sums whenever $||P||$ is sufficiently small. Both
versions of the integral are obsolete but entirely adequate for most applications and
as a point of departure for a more up to date and satisfactory integral. The reason
for using the Darboux approach to the integral is that all the existence theorems
are easier to prove in this context.
2.6 Exercises
1. Let $F(x) = \int_{x^2}^{x^3} \frac{t^5 + 7}{t^7 + 87t^6 + 1}\, dt$. Find $F'(x)$.
2. Let $F(x) = \int_2^x \frac{1}{1 + t^4}\, dt$. Sketch a graph of $F$ and explain why it looks the way
it does.
3. Let $a$ and $b$ be positive numbers and consider the function,
$$F(x) = \int_0^{ax} \frac{1}{a^2 + t^2}\, dt + \int_b^{a/x} \frac{1}{a^2 + t^2}\, dt.$$
Show that $F$ is a constant.
4. Solve the following initial value problem from ordinary differential equations
which is to find a function $y$ such that
$$y'(x) = \frac{x^7 + 1}{x^6 + 97x^5 + 7}, \quad y(10) = 5.$$
5. If $F, G \in \int f(x)\, dx$ for all $x \in \mathbb{R}$, show $F(x) = G(x) + C$ for some constant,
$C$. Use this to give a different proof of the fundamental theorem of calculus
which has for its conclusion $\int_a^b f(t)\, dt = G(b) - G(a)$ where $G'(x) = f(x)$.
6. Suppose $f$ is Riemann integrable on $[a, b]$ and continuous. (In fact continuous
implies Riemann integrable.) Show there exists $c \in (a, b)$ such that
$$f(c) = \frac{1}{b - a} \int_a^b f(x)\, dx.$$
Hint: You might consider the function $F(x) \equiv \int_a^x f(t)\, dt$ and use the mean
value theorem for derivatives and the fundamental theorem of calculus.
7. Suppose $f$ and $g$ are continuous functions on $[a, b]$ and that $g(x) \neq 0$ on $(a, b)$.
Show there exists $c \in (a, b)$ such that
$$f(c) \int_a^b g(x)\, dx = \int_a^b f(x) g(x)\, dx.$$
Hint: Define $F(x) \equiv \int_a^x f(t) g(t)\, dt$ and let $G(x) \equiv \int_a^x g(t)\, dt$. Then use
the Cauchy mean value theorem on these two functions.
8. Consider the function
$$f(x) \equiv \begin{cases} \sin\left(\frac{1}{x}\right) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}.$$
Is $f$ Riemann integrable? Explain why or why not.
9. Prove the second part of Theorem 2.10 about decreasing functions.
10. Suppose $f$ is a bounded function defined on $[a, b]$ and $|f(x)| < M$ for all
$x \in [a, b]$. Now let $Q$ be a partition having $n$ points, $\{x_0^*, \cdots, x_n^*\}$ and let $P$
be any other partition. Show that
$$|U(f, P) - L(f, P)| \leq 2Mn\,||P|| + |U(f, Q) - L(f, Q)|.$$
Hint: Write the sum for $U(f, P) - L(f, P)$ and split this sum into two sums,
the sum of terms for which $[x_{i-1}, x_i]$ contains at least one point of $Q$, and
terms for which $[x_{i-1}, x_i]$ does not contain any points of $Q$. In the latter case,
$[x_{i-1}, x_i]$ must be contained in some interval, $[x_{k-1}^*, x_k^*]$. Therefore, the sum
of these terms should be no larger than $|U(f, Q) - L(f, Q)|$.
11. If $\varepsilon > 0$ is given and $f$ is a Darboux integrable function defined on $[a, b]$,
show there exists $\delta > 0$ such that whenever $||P|| < \delta$, then
$$|U(f, P) - L(f, P)| < \varepsilon.$$
12. Prove Theorem 2.25.
Important Linear Algebra
This chapter contains some important linear algebra as distinguished from that
which is normally presented in undergraduate courses consisting mainly of uninteresting
things you can do with row operations.
The notation, $\mathbb{C}^n$ refers to the collection of ordered lists of $n$ complex numbers.
Since every real number is also a complex number, this simply generalizes the usual
notion of $\mathbb{R}^n$, the collection of all ordered lists of $n$ real numbers. In order to avoid
worrying about whether it is real or complex numbers which are being referred to,
the symbol $\mathbb{F}$ will be used. If it is not clear, always pick $\mathbb{C}$.
Definition 3.1 Define $\mathbb{F}^n \equiv \{(x_1, \cdots, x_n) : x_j \in \mathbb{F} \text{ for } j = 1, \cdots, n\}$. $(x_1, \cdots, x_n) =
(y_1, \cdots, y_n)$ if and only if for all $j = 1, \cdots, n$, $x_j = y_j$. When $(x_1, \cdots, x_n) \in \mathbb{F}^n$, it
is conventional to denote $(x_1, \cdots, x_n)$ by the single bold face letter, $\mathbf{x}$. The numbers,
$x_j$ are called the coordinates. The set
$$\{(0, \cdots, 0, t, 0, \cdots, 0) : t \in \mathbb{F}\}$$
for $t$ in the $i^{th}$ slot is called the $i^{th}$ coordinate axis. The point $\mathbf{0} \equiv (0, \cdots, 0)$ is
called the origin.
Thus $(1, 2, 4i) \in \mathbb{F}^3$ and $(2, 1, 4i) \in \mathbb{F}^3$ but $(1, 2, 4i) \neq (2, 1, 4i)$ because, even
though the same numbers are involved, they don't match up. In particular, the
first entries are not equal.
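The same distinction can be seen with Python tuples, which compare coordinate by coordinate just as in Definition 3.1. This is only an illustration; Python writes the imaginary unit as `j`, so `4j` stands for $4i$.

```python
x = (1, 2, 4j)   # the point (1, 2, 4i) in F^3
y = (2, 1, 4j)   # the point (2, 1, 4i) in F^3

# Equality of ordered lists requires x_j = y_j for every j.
print(x == y)

# As unordered collections the two points involve the same numbers.
print(set(x) == set(y))
```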
The geometric significance of $\mathbb{R}^n$ for $n \leq 3$ has been encountered already in
calculus or in precalculus. Here is a short review. First consider the case when
$n = 1$. Then from the definition, $\mathbb{R}^1 = \mathbb{R}$. Recall that $\mathbb{R}$ is identified with the
points of a line. Look at the number line again. Observe that this amounts to
identifying a point on this line with a real number. In other words a real number
determines where you are on this line. Now suppose $n = 2$ and consider two lines
which intersect each other at right angles as shown in the following picture.
[Figure: two perpendicular coordinate axes in the plane with the points $(2, 6)$ and $(-8, 3)$ marked.]
Notice how you can identify a point shown in the plane with the ordered pair,
$(2, 6)$. You go to the right a distance of 2 and then up a distance of 6. Similarly,
you can identify another point in the plane with the ordered pair $(-8, 3)$. Go to
the left a distance of 8 and then up a distance of 3. The reason you go to the left
is that there is a $-$ sign on the eight. From this reasoning, every ordered pair
determines a unique point in the plane. Conversely, taking a point in the plane,
you could draw two lines through the point, one vertical and the other horizontal
and determine unique points, $x_1$ on the horizontal line in the above picture and $x_2$
on the vertical line in the above picture, such that the point of interest is identified
with the ordered pair, $(x_1, x_2)$. In short, points in the plane can be identified with
ordered pairs similar to the way that points on the real line are identified with
real numbers. Now suppose $n = 3$. As just explained, the first two coordinates
determine a point in a plane. Letting the third component determine how far up
or down you go, depending on whether this number is positive or negative, this
determines a point in space. Thus, $(1, 4, -5)$ would mean to determine the point
in the plane that goes with $(1, 4)$ and then to go below this plane a distance of 5
to obtain a unique point in space. You see that the ordered triples correspond to
points in space just as the ordered pairs correspond to points in a plane and single
real numbers correspond to points on a line.
You can't stop here and say that you are only interested in $n \le 3$. What if you were interested in the motion of two objects? You would need three coordinates to describe where the first object is and you would need another three coordinates to describe where the other object is located. Therefore, you would need to be considering $\mathbb{R}^6$. If the two objects moved around, you would need a time coordinate as well. As another example, consider a hot object which is cooling and suppose you want the temperature of this object. How many coordinates would be needed? You would need one for the temperature, three for the position of the point in the object and one more for the time. Thus you would need to be considering $\mathbb{R}^5$. Many other examples can be given. Sometimes $n$ is very large. This is often the case in applications to business when they are trying to maximize profit subject to constraints. It also occurs in numerical analysis when people try to solve hard problems on a computer.
There are other ways to identify points in space with three numbers but the one presented is the most basic. In this case, the coordinates are known as Cartesian coordinates after Descartes¹ who invented this idea in the first half of the seventeenth century. I will often not bother to draw a distinction between the point in $n$ dimensional space and its Cartesian coordinates.
The geometric significance of $\mathbb{C}^n$ for $n > 1$ is not available because each copy of $\mathbb{C}$ corresponds to the plane or $\mathbb{R}^2$.
3.1 Algebra in $\mathbb{F}^n$
There are two algebraic operations done with elements of $\mathbb{F}^n$. One is addition and the other is multiplication by numbers, called scalars. In the case of $\mathbb{C}^n$ the scalars are complex numbers while in the case of $\mathbb{R}^n$ the only allowed scalars are real numbers. Thus, the scalars always come from $\mathbb{F}$ in either case.
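Definition 3.2 If $\mathbf{x} \in \mathbb{F}^n$ and $a \in \mathbb{F}$, also called a scalar, then $a\mathbf{x} \in \mathbb{F}^n$ is defined by
\[ a\mathbf{x} = a(x_1, \dots, x_n) \equiv (ax_1, \dots, ax_n). \tag{3.1} \]
This is known as scalar multiplication. If $\mathbf{x}, \mathbf{y} \in \mathbb{F}^n$ then $\mathbf{x} + \mathbf{y} \in \mathbb{F}^n$ and is defined by
\[ \mathbf{x} + \mathbf{y} = (x_1, \dots, x_n) + (y_1, \dots, y_n) \equiv (x_1 + y_1, \dots, x_n + y_n). \tag{3.2} \]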
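With this definition, the algebraic properties satisfy the conclusions of the following theorem.
Theorem 3.3 For $\mathbf{v}, \mathbf{w}, \mathbf{z} \in \mathbb{F}^n$ and $\alpha, \beta$ scalars (elements of $\mathbb{F}$), the following hold.
\[ \mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}, \tag{3.3} \]
the commutative law of addition,
\[ (\mathbf{v} + \mathbf{w}) + \mathbf{z} = \mathbf{v} + (\mathbf{w} + \mathbf{z}), \tag{3.4} \]
the associative law for addition,
\[ \mathbf{v} + \mathbf{0} = \mathbf{v}, \tag{3.5} \]
the existence of an additive identity,
\[ \mathbf{v} + (-\mathbf{v}) = \mathbf{0}, \tag{3.6} \]
the existence of an additive inverse. Also
\[ \alpha(\mathbf{v} + \mathbf{w}) = \alpha\mathbf{v} + \alpha\mathbf{w}, \tag{3.7} \]
\[ (\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}, \tag{3.8} \]
\[ \alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}, \tag{3.9} \]
\[ 1\mathbf{v} = \mathbf{v}. \tag{3.10} \]
In the above $\mathbf{0} = (0, \dots, 0)$.
¹ René Descartes 1596-1650 is often credited with inventing analytic geometry although it seems the ideas were actually known much earlier. He was interested in many different subjects, physiology, chemistry, and physics being some of them. He also wrote a large book in which he tried to explain the book of Genesis scientifically. Descartes ended up dying in Sweden.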
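You should verify these properties all hold. For example, consider 3.7:
\begin{align*}
\alpha(\mathbf{v} + \mathbf{w}) &= \alpha(v_1 + w_1, \dots, v_n + w_n) \\
&= (\alpha(v_1 + w_1), \dots, \alpha(v_n + w_n)) \\
&= (\alpha v_1 + \alpha w_1, \dots, \alpha v_n + \alpha w_n) \\
&= (\alpha v_1, \dots, \alpha v_n) + (\alpha w_1, \dots, \alpha w_n) \\
&= \alpha\mathbf{v} + \alpha\mathbf{w}.
\end{align*}
As usual subtraction is defined as $\mathbf{x} - \mathbf{y} \equiv \mathbf{x} + (-\mathbf{y})$.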
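The two operations of Definition 3.2 can be tried out numerically. The following is a minimal Python sketch, not part of the original notes; the names `scalar_mul` and `vec_add` are hypothetical, and vectors are modeled as tuples of exact rational numbers so that the check of 3.7 is not disturbed by rounding.

```python
from fractions import Fraction


def scalar_mul(a, x):
    """Scalar multiplication (3.1): multiply every component of x by a."""
    return tuple(a * xi for xi in x)


def vec_add(x, y):
    """Vector addition (3.2): add the two vectors component by component."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of components")
    return tuple(xi + yi for xi, yi in zip(x, y))


v = (Fraction(1), Fraction(-2), Fraction(3))
w = (Fraction(4), Fraction(0), Fraction(-1))
alpha = Fraction(5, 2)

# Property (3.7): alpha (v + w) = alpha v + alpha w
lhs = scalar_mul(alpha, vec_add(v, w))
rhs = vec_add(scalar_mul(alpha, v), scalar_mul(alpha, w))
print(lhs == rhs)
```

A check on one pair of vectors is of course not a proof; it only illustrates what the componentwise argument above establishes in general.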
3.2 Subspaces Spans And Bases
Definition 3.4 Let $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ be vectors in $\mathbb{F}^n$. A linear combination is any expression of the form
\[ \sum_{i=1}^{p} c_i \mathbf{x}_i \]
where the $c_i$ are scalars. The set of all linear combinations of these vectors is called $\operatorname{span}(\mathbf{x}_1, \dots, \mathbf{x}_p)$. If $V \subseteq \mathbb{F}^n$, then $V$ is called a subspace if whenever $\alpha, \beta$ are scalars and $\mathbf{u}$ and $\mathbf{v}$ are vectors of $V$, it follows $\alpha\mathbf{u} + \beta\mathbf{v} \in V$. That is, it is closed under the algebraic operations of vector addition and scalar multiplication. A linear combination of vectors is said to be trivial if all the scalars in the linear combination equal zero. A set of vectors is said to be linearly independent if the only linear combination of these vectors which equals the zero vector is the trivial linear combination. Thus $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ is called linearly independent if whenever
\[ \sum_{k=1}^{p} c_k \mathbf{x}_k = \mathbf{0} \]
it follows that all the scalars $c_k$ equal zero. A set of vectors $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ is called linearly dependent if it is not linearly independent. Thus the set of vectors is linearly dependent if there exist scalars $c_i$, $i = 1, \dots, p$, not all zero such that
\[ \sum_{k=1}^{p} c_k \mathbf{x}_k = \mathbf{0}. \]
Lemma 3.5 A set of vectors $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ is linearly independent if and only if none of the vectors can be obtained as a linear combination of the others.
Proof: Suppose first that $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ is linearly independent. If
\[ \mathbf{x}_k = \sum_{j \neq k} c_j \mathbf{x}_j, \]
then
\[ \mathbf{0} = 1\mathbf{x}_k + \sum_{j \neq k} (-c_j)\mathbf{x}_j, \]
a nontrivial linear combination, contrary to assumption. This shows that if the set is linearly independent, then none of the vectors is a linear combination of the others.
Now suppose no vector is a linear combination of the others. Is $\{\mathbf{x}_1, \dots, \mathbf{x}_p\}$ linearly independent? If it is not, there exist scalars $c_i$, not all zero, such that
\[ \sum_{i=1}^{p} c_i \mathbf{x}_i = \mathbf{0}. \]
Say $c_k \neq 0$. Then you can solve for $\mathbf{x}_k$ as
\[ \mathbf{x}_k = \sum_{j \neq k} (-c_j)/c_k \, \mathbf{x}_j \]
contrary to assumption. This proves the lemma.
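Linear independence of a concrete list of vectors can be tested mechanically. The sketch below is an illustration only and is not the method of the notes: it assumes the standard fact that a list of $p$ vectors is linearly independent exactly when row reduction of the matrix having these vectors as rows produces $p$ pivots. The function names `rank` and `is_linearly_independent` are hypothetical, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def rank(vectors):
    """Number of pivots found by Gaussian elimination on the given rows."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        # Look for a row at or below position rk with a nonzero entry in this column.
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        # Eliminate the entries below the pivot.
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def is_linearly_independent(vectors):
    """True when only the trivial linear combination gives the zero vector."""
    return rank(vectors) == len(vectors)


print(is_linearly_independent([(1, 0, 0), (0, 1, 0)]))   # True
print(is_linearly_independent([(1, 2, 3), (2, 4, 6)]))   # False: second = 2 * first
```

In the second example one vector is a multiple of the other, which is the situation Lemma 3.5 rules out for an independent set.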
The following is called the exchange theorem.
Theorem 3.6 (Exchange Theorem) Let $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ be a linearly independent set of vectors such that each $\mathbf{x}_i$ is in $\operatorname{span}(\mathbf{y}_1, \dots, \mathbf{y}_s)$. Then $r \le s$.
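Proof: Define $\operatorname{span}\{\mathbf{y}_1, \dots, \mathbf{y}_s\} \equiv V$. Since $\mathbf{x}_1 \in V$, it follows there exist scalars $c_1, \dots, c_s$ such that
\[ \mathbf{x}_1 = \sum_{i=1}^{s} c_i \mathbf{y}_i. \tag{3.11} \]
Not all of these scalars can equal zero because if this were the case, it would follow that $\mathbf{x}_1 = \mathbf{0}$ and so $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ would not be linearly independent. Indeed, if $\mathbf{x}_1 = \mathbf{0}$, then $1\mathbf{x}_1 + \sum_{i=2}^{r} 0\mathbf{x}_i = \mathbf{x}_1 = \mathbf{0}$ and so there would exist a nontrivial linear combination of the vectors $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ which equals zero.
Say $c_k \neq 0$. Then solve (3.11) for $\mathbf{y}_k$ and obtain
\[ \mathbf{y}_k \in \operatorname{span}\Big( \mathbf{x}_1, \underbrace{\mathbf{y}_1, \dots, \mathbf{y}_{k-1}, \mathbf{y}_{k+1}, \dots, \mathbf{y}_s}_{s-1 \text{ vectors here}} \Big). \]
Define $\{\mathbf{z}_1, \dots, \mathbf{z}_{s-1}\}$ by
\[ \{\mathbf{z}_1, \dots, \mathbf{z}_{s-1}\} \equiv \{\mathbf{y}_1, \dots, \mathbf{y}_{k-1}, \mathbf{y}_{k+1}, \dots, \mathbf{y}_s\}. \]
Therefore, $\operatorname{span}\{\mathbf{x}_1, \mathbf{z}_1, \dots, \mathbf{z}_{s-1}\} = V$ because if $\mathbf{v} \in V$, there exist constants $c_1, \dots, c_s$ such that
\[ \mathbf{v} = \sum_{i=1}^{s-1} c_i \mathbf{z}_i + c_s \mathbf{y}_k. \]
Now replace the $\mathbf{y}_k$ in the above with a linear combination of the vectors $\{\mathbf{x}_1, \mathbf{z}_1, \dots, \mathbf{z}_{s-1}\}$ to obtain $\mathbf{v} \in \operatorname{span}\{\mathbf{x}_1, \mathbf{z}_1, \dots, \mathbf{z}_{s-1}\}$. The vector $\mathbf{y}_k$, in the list $\{\mathbf{y}_1, \dots, \mathbf{y}_s\}$, has now been replaced with the vector $\mathbf{x}_1$ and the resulting modified list of vectors has the same span as the original list of vectors $\{\mathbf{y}_1, \dots, \mathbf{y}_s\}$.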
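Now suppose that $r > s$ and that $\operatorname{span}(\mathbf{x}_1, \dots, \mathbf{x}_l, \mathbf{z}_1, \dots, \mathbf{z}_p) = V$ where the vectors $\mathbf{z}_1, \dots, \mathbf{z}_p$ are each taken from the set $\{\mathbf{y}_1, \dots, \mathbf{y}_s\}$ and $l + p = s$. This has now been done for $l = 1$ above. Then since $r > s$, it follows that $l \le s < r$ and so $l + 1 \le r$. Therefore, $\mathbf{x}_{l+1}$ is a vector not in the list $\{\mathbf{x}_1, \dots, \mathbf{x}_l\}$ and since $\operatorname{span}\{\mathbf{x}_1, \dots, \mathbf{x}_l, \mathbf{z}_1, \dots, \mathbf{z}_p\} = V$, there exist scalars $c_i$ and $d_j$ such that
\[ \mathbf{x}_{l+1} = \sum_{i=1}^{l} c_i \mathbf{x}_i + \sum_{j=1}^{p} d_j \mathbf{z}_j. \tag{3.12} \]
Now not all the $d_j$ can equal zero because if this were so, it would follow that $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ would be a linearly dependent set because one of the vectors would equal a linear combination of the others. Therefore, (3.12) can be solved for one of the $\mathbf{z}_i$, say $\mathbf{z}_k$, in terms of $\mathbf{x}_{l+1}$ and the other $\mathbf{z}_i$ and just as in the above argument, replace that $\mathbf{z}_k$ with $\mathbf{x}_{l+1}$ to obtain
\[ \operatorname{span}\Big( \mathbf{x}_1, \dots, \mathbf{x}_l, \mathbf{x}_{l+1}, \underbrace{\mathbf{z}_1, \dots, \mathbf{z}_{k-1}, \mathbf{z}_{k+1}, \dots, \mathbf{z}_p}_{p-1 \text{ vectors here}} \Big) = V. \]
Continue this way, eventually obtaining
\[ \operatorname{span}(\mathbf{x}_1, \dots, \mathbf{x}_s) = V. \]
But then $\mathbf{x}_r \in \operatorname{span}(\mathbf{x}_1, \dots, \mathbf{x}_s)$ contrary to the assumption that $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ is linearly independent. Therefore, $r \le s$ as claimed.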
Definition 3.7 A finite set of vectors $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ is a basis for $\mathbb{F}^n$ if $\operatorname{span}(\mathbf{x}_1, \dots, \mathbf{x}_r) = \mathbb{F}^n$ and $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ is linearly independent.
Corollary 3.8 Let $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ and $\{\mathbf{y}_1, \dots, \mathbf{y}_s\}$ be two bases² of $\mathbb{F}^n$. Then $r = s = n$.
Proof: From the exchange theorem, $r \le s$ and $s \le r$. Now note the vectors
\[ \mathbf{e}_i = \overbrace{(0, \dots, 0, 1, 0, \dots, 0)}^{1 \text{ is in the } i^{th} \text{ slot}} \]
for $i = 1, 2, \dots, n$ are a basis for $\mathbb{F}^n$. This proves the corollary.
² This is the plural form of basis. We could say basiss but it would involve an inordinate amount of hissing as in "The sixth sheik's sixth sheep is sick". This is the reason that bases is used instead of basiss.
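Lemma 3.9 Let $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$ be a set of vectors. Then $V \equiv \operatorname{span}(\mathbf{v}_1, \dots, \mathbf{v}_r)$ is a subspace.
Proof: Suppose $\alpha, \beta$ are two scalars and let $\sum_{k=1}^{r} c_k \mathbf{v}_k$ and $\sum_{k=1}^{r} d_k \mathbf{v}_k$ be two elements of $V$. What about
\[ \alpha \sum_{k=1}^{r} c_k \mathbf{v}_k + \beta \sum_{k=1}^{r} d_k \mathbf{v}_k? \]
Is it also in $V$?
\[ \alpha \sum_{k=1}^{r} c_k \mathbf{v}_k + \beta \sum_{k=1}^{r} d_k \mathbf{v}_k = \sum_{k=1}^{r} (\alpha c_k + \beta d_k) \mathbf{v}_k \in V \]
so the answer is yes. This proves the lemma.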
Definition 3.10 A finite set of vectors $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ is a basis for a subspace $V$ of $\mathbb{F}^n$ if $\operatorname{span}(\mathbf{x}_1, \dots, \mathbf{x}_r) = V$ and $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ is linearly independent.
Corollary 3.11 Let $\{\mathbf{x}_1, \dots, \mathbf{x}_r\}$ and $\{\mathbf{y}_1, \dots, \mathbf{y}_s\}$ be two bases for $V$. Then $r = s$.
Proof: From the exchange theorem, $r \le s$ and $s \le r$. This proves the corollary.
Definition 3.12 Let $V$ be a subspace of $\mathbb{F}^n$. Then $\dim(V)$, read as the dimension of $V$, is the number of vectors in a basis.
Of course you should wonder right now whether an arbitrary subspace even has a basis. In fact it does and this is in the next theorem. First, here is an interesting lemma.
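Lemma 3.13 Suppose $\mathbf{v} \notin \operatorname{span}(\mathbf{u}_1, \dots, \mathbf{u}_k)$ and $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$ is linearly independent. Then $\{\mathbf{u}_1, \dots, \mathbf{u}_k, \mathbf{v}\}$ is also linearly independent.
Proof: Suppose $\sum_{i=1}^{k} c_i \mathbf{u}_i + d\mathbf{v} = \mathbf{0}$. It is required to verify that each $c_i = 0$ and that $d = 0$. But if $d \neq 0$, then you can solve for $\mathbf{v}$ as a linear combination of the vectors $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$,
\[ \mathbf{v} = -\sum_{i=1}^{k} \left( \frac{c_i}{d} \right) \mathbf{u}_i \]
contrary to assumption. Therefore, $d = 0$. But then $\sum_{i=1}^{k} c_i \mathbf{u}_i = \mathbf{0}$ and the linear independence of $\{\mathbf{u}_1, \dots, \mathbf{u}_k\}$ implies each $c_i = 0$ also. This proves the lemma.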
Theorem 3.14 Let $V$ be a nonzero subspace of $\mathbb{F}^n$. Then $V$ has a basis.
Proof: Let $\mathbf{v}_1 \in V$ where $\mathbf{v}_1 \neq \mathbf{0}$. If $\operatorname{span}\{\mathbf{v}_1\} = V$, stop. $\{\mathbf{v}_1\}$ is a basis for $V$. Otherwise, there exists $\mathbf{v}_2 \in V$ which is not in $\operatorname{span}\{\mathbf{v}_1\}$. By Lemma 3.13 $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a linearly independent set of vectors. If $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2\} = V$ stop, $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for $V$. If $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2\} \neq V$, then there exists $\mathbf{v}_3 \in V$ with $\mathbf{v}_3 \notin \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2\}$ and $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a larger linearly independent set of vectors. Continuing this way, the process must stop before $n + 1$ steps because if not, it would be possible to obtain $n + 1$ linearly independent vectors contrary to the exchange theorem. This proves the theorem.
In words the following corollary states that any linearly independent set of vectors can be enlarged to form a basis.
Corollary 3.15 Let $V$ be a subspace of $\mathbb{F}^n$ and let $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$ be a linearly independent set of vectors in $V$. Then either it is a basis for $V$ or there exist vectors $\mathbf{v}_{r+1}, \dots, \mathbf{v}_s$ such that $\{\mathbf{v}_1, \dots, \mathbf{v}_r, \mathbf{v}_{r+1}, \dots, \mathbf{v}_s\}$ is a basis for $V$.
Proof: This follows immediately from the proof of Theorem 3.14. You do exactly the same argument except you start with $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$ rather than $\{\mathbf{v}_1\}$.
It is also true that any spanning set of vectors can be restricted to obtain a basis.
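Theorem 3.16 Let $V$ be a subspace of $\mathbb{F}^n$ and suppose $\operatorname{span}(\mathbf{u}_1, \dots, \mathbf{u}_p) = V$ where the $\mathbf{u}_i$ are nonzero vectors. Then there exist vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$ such that $\{\mathbf{v}_1, \dots, \mathbf{v}_r\} \subseteq \{\mathbf{u}_1, \dots, \mathbf{u}_p\}$ and $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$ is a basis for $V$.
Proof: Let $r$ be the smallest positive integer with the property that for some set $\{\mathbf{v}_1, \dots, \mathbf{v}_r\} \subseteq \{\mathbf{u}_1, \dots, \mathbf{u}_p\}$,
\[ \operatorname{span}(\mathbf{v}_1, \dots, \mathbf{v}_r) = V. \]
Then $r \le p$ and it must be the case that $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$ is linearly independent because if it were not so, one of the vectors, say $\mathbf{v}_k$, would be a linear combination of the others. But then you could delete this vector from $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$ and the resulting list of $r - 1$ vectors would still span $V$, contrary to the definition of $r$. This proves the theorem.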
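A basis can be pulled out of a concrete spanning list by a greedy pass. This is a sketch under stated assumptions and it is not the argument used in the proof above, which picks a spanning subset of smallest size. Here a vector is kept exactly when it is not in the span of the vectors already kept, so by Lemma 3.13 the kept list stays linearly independent, and every discarded vector lies in the span of the kept ones. The names `rank` and `extract_basis` are hypothetical, and membership in a span is tested by comparing pivot counts from row reduction.

```python
from fractions import Fraction


def rank(vectors):
    """Number of pivots found by Gaussian elimination on the given rows."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    rk = 0
    for col in range(ncols):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            factor = rows[i][col] / rows[rk][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def extract_basis(spanning):
    """Keep each vector that is not in the span of the vectors kept so far."""
    basis = []
    for u in spanning:
        # The pivot count goes up exactly when u is outside span(basis).
        if rank(basis + [u]) > len(basis):
            basis.append(u)
    return basis


u = [(1, 0, 1), (2, 0, 2), (0, 1, 0), (1, 1, 1)]
print(extract_basis(u))   # [(1, 0, 1), (0, 1, 0)]
```

The result depends on the order of the input list; a different order may produce a different basis, but by Corollary 3.11 it always has the same number of vectors.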
3.3 An Application To Matrices
The following is a theorem of major significance.
Theorem 3.17 Suppose $A$ is an $n \times n$ matrix. Then $A$ is one to one if and only if $A$ is onto. Also, if $B$ is an $n \times n$ matrix and $AB = I$, then it follows $BA = I$.
Proof: First suppose $A$ is one to one. Consider the vectors $\{A\mathbf{e}_1, \dots, A\mathbf{e}_n\}$ where $\mathbf{e}_k$ is the column vector which is all zeros except for a 1 in the $k^{th}$ position. This set of vectors is linearly independent because if
\[ \sum_{k=1}^{n} c_k A\mathbf{e}_k = \mathbf{0}, \]
then since $A$ is linear,
\[ A\left( \sum_{k=1}^{n} c_k \mathbf{e}_k \right) = \mathbf{0} \]
and since $A$ is one to one, it follows
\[ \sum_{k=1}^{n} c_k \mathbf{e}_k = \mathbf{0} \]
which implies each $c_k = 0$. Therefore, $\{A\mathbf{e}_1, \dots, A\mathbf{e}_n\}$ must be a basis for $\mathbb{F}^n$ because if not there would exist a vector $\mathbf{y} \notin \operatorname{span}(A\mathbf{e}_1, \dots, A\mathbf{e}_n)$ and then by Lemma 3.13, $\{A\mathbf{e}_1, \dots, A\mathbf{e}_n, \mathbf{y}\}$ would be an independent set of vectors having $n + 1$ vectors in it, contrary to the exchange theorem. It follows that for $\mathbf{y} \in \mathbb{F}^n$ there exist constants $c_i$ such that
\[ \mathbf{y} = \sum_{k=1}^{n} c_k A\mathbf{e}_k = A\left( \sum_{k=1}^{n} c_k \mathbf{e}_k \right) \]
showing that, since $\mathbf{y}$ was arbitrary, $A$ is onto.
Next suppose $A$ is onto. This means the span of the columns of $A$ equals $\mathbb{F}^n$. If these columns are not linearly independent, then by Lemma 3.5, one of the columns is a linear combination of the others and so the span of the columns of $A$ equals the span of the $n - 1$ other columns. This violates the exchange theorem because $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ would be a linearly independent set of vectors contained in the span of only $n - 1$ vectors. Therefore, the columns of $A$ must be independent and this is equivalent to saying that $A\mathbf{x} = \mathbf{0}$ if and only if $\mathbf{x} = \mathbf{0}$. This implies $A$ is one to one because if $A\mathbf{x} = A\mathbf{y}$, then $A(\mathbf{x} - \mathbf{y}) = \mathbf{0}$ and so $\mathbf{x} - \mathbf{y} = \mathbf{0}$.
Now suppose $AB = I$. Why is $BA = I$? Since $AB = I$ it follows $B$ is one to one since otherwise, there would exist $\mathbf{x} \neq \mathbf{0}$ such that $B\mathbf{x} = \mathbf{0}$ and then $AB\mathbf{x} = A\mathbf{0} = \mathbf{0} \neq I\mathbf{x}$. Therefore, from what was just shown, $B$ is also onto. In addition to this, $A$ must be one to one because if $A\mathbf{y} = \mathbf{0}$, then $\mathbf{y} = B\mathbf{x}$ for some $\mathbf{x}$ and then $\mathbf{x} = AB\mathbf{x} = A\mathbf{y} = \mathbf{0}$, showing $\mathbf{y} = B\mathbf{0} = \mathbf{0}$. Now from what is given to be so, it follows $(AB)A = A$ and so using the associative law for matrix multiplication,
\[ A(BA) - A = A(BA - I) = 0. \]
But this means $(BA - I)\mathbf{x} = \mathbf{0}$ for all $\mathbf{x}$ since otherwise, $A$ would not be one to one. Hence $BA = I$ as claimed. This proves the theorem.
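This theorem shows that if an $n \times n$ matrix $B$ acts like an inverse when multiplied on one side of $A$, it follows that $B = A^{-1}$ and it will act like an inverse on both sides of $A$.
The conclusion of this theorem pertains to square matrices only. For example, let
\[ A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & -1 \end{pmatrix}. \tag{3.13} \]
Then
\[ BA = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
but
\[ AB = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & -1 \\ 1 & 0 & 0 \end{pmatrix}. \]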
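The two products in the example 3.13 are easy to check by machine. The following short Python sketch is an illustration only; `matmul` is a hypothetical helper that multiplies matrices stored as lists of rows.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must agree")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 0],
     [0, 1],
     [1, 0]]
B = [[1, 0, 0],
     [1, 1, -1]]

BA = matmul(B, A)   # 2 x 2
AB = matmul(A, B)   # 3 x 3

print(BA)   # [[1, 0], [0, 1]]
print(AB)   # [[1, 0, 0], [1, 1, -1], [1, 0, 0]]
```

So $B$ is a left inverse of $A$ but not a right inverse, which can only happen because $A$ is not square.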
3.4 The Mathematical Theory Of Determinants
It is assumed the reader is familiar with matrices. However, the topic of determinants is often neglected in linear algebra books these days. Therefore, I will give a fairly quick and grubby treatment of this topic which includes all the main results. Two books which give a good introduction to determinants are Apostol [3] and Rudin [37]. A recent book which also has a good introduction is Baker [7].
Let $(i_1, \dots, i_n)$ be an ordered list of numbers from $\{1, \dots, n\}$. This means the order is important so $(1, 2, 3)$ and $(2, 1, 3)$ are different.
The following Lemma will be essential in the definition of the determinant.
Lemma 3.18 There exists a unique function $\operatorname{sgn}_n$ which maps each list of $n$ numbers from $\{1, \dots, n\}$ to one of the three numbers $0$, $1$, or $-1$ and which also has the following properties.
\[ \operatorname{sgn}_n(1, \dots, n) = 1 \tag{3.14} \]
\[ \operatorname{sgn}_n(i_1, \dots, p, \dots, q, \dots, i_n) = -\operatorname{sgn}_n(i_1, \dots, q, \dots, p, \dots, i_n) \tag{3.15} \]
In words, the second property states that if two of the numbers are switched, the value of the function is multiplied by $-1$. Also, in the case where $n > 1$ and $\{i_1, \dots, i_n\} = \{1, \dots, n\}$ so that every number from $\{1, \dots, n\}$ appears in the ordered list $(i_1, \dots, i_n)$,
\[ \operatorname{sgn}_n(i_1, \dots, i_{\theta-1}, n, i_{\theta+1}, \dots, i_n) \equiv (-1)^{n-\theta} \operatorname{sgn}_{n-1}(i_1, \dots, i_{\theta-1}, i_{\theta+1}, \dots, i_n) \tag{3.16} \]
where $n = i_\theta$ in the ordered list $(i_1, \dots, i_n)$.
Proof: To begin with, it is necessary to show the existence of such a function. This is clearly true if $n = 1$. Define $\operatorname{sgn}_1(1) \equiv 1$ and observe that it works. No switching is possible. In the case where $n = 2$, it is also clearly true. Let $\operatorname{sgn}_2(1, 2) = 1$ and $\operatorname{sgn}_2(2, 1) = -1$ while $\operatorname{sgn}_2(2, 2) = \operatorname{sgn}_2(1, 1) = 0$ and verify it works. Assuming such a function exists for $n$, $\operatorname{sgn}_{n+1}$ will be defined in terms of $\operatorname{sgn}_n$. If there are any repeated numbers in $(i_1, \dots, i_{n+1})$, $\operatorname{sgn}_{n+1}(i_1, \dots, i_{n+1}) \equiv 0$. If there are no repeats, then $n + 1$ appears somewhere in the ordered list. Let $\theta$ be the position of the number $n + 1$ in the list. Thus, the list is of the form $(i_1, \dots, i_{\theta-1}, n + 1, i_{\theta+1}, \dots, i_{n+1})$. From 3.16 it must be that
\[ \operatorname{sgn}_{n+1}(i_1, \dots, i_{\theta-1}, n + 1, i_{\theta+1}, \dots, i_{n+1}) \equiv (-1)^{n+1-\theta} \operatorname{sgn}_n(i_1, \dots, i_{\theta-1}, i_{\theta+1}, \dots, i_{n+1}). \]
It is necessary to verify this satisfies 3.14 and 3.15 with $n$ replaced with $n + 1$. The first of these is obviously true because
\[ \operatorname{sgn}_{n+1}(1, \dots, n, n + 1) \equiv (-1)^{n+1-(n+1)} \operatorname{sgn}_n(1, \dots, n) = 1. \]
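If there are repeated numbers in $(i_1, \dots, i_{n+1})$, then it is obvious 3.15 holds because both sides would equal zero from the above definition. It remains to verify 3.15 in the case where there are no numbers repeated in $(i_1, \dots, i_{n+1})$. Consider
\[ \operatorname{sgn}_{n+1}\left( i_1, \dots, \overset{r}{p}, \dots, \overset{s}{q}, \dots, i_{n+1} \right), \]
where the $r$ above the $p$ indicates the number $p$ is in the $r^{th}$ position and the $s$ above the $q$ indicates that the number $q$ is in the $s^{th}$ position. Suppose first that $r < \theta < s$. Then
\[ \operatorname{sgn}_{n+1}\left( i_1, \dots, \overset{r}{p}, \dots, \overset{\theta}{n+1}, \dots, \overset{s}{q}, \dots, i_{n+1} \right) \equiv (-1)^{n+1-\theta} \operatorname{sgn}_n\left( i_1, \dots, \overset{r}{p}, \dots, \overset{s-1}{q}, \dots, i_{n+1} \right) \]
while
\[ \operatorname{sgn}_{n+1}\left( i_1, \dots, \overset{r}{q}, \dots, \overset{\theta}{n+1}, \dots, \overset{s}{p}, \dots, i_{n+1} \right) = (-1)^{n+1-\theta} \operatorname{sgn}_n\left( i_1, \dots, \overset{r}{q}, \dots, \overset{s-1}{p}, \dots, i_{n+1} \right) \]
and so, by induction, a switch of $p$ and $q$ introduces a minus sign in the result. Similarly, if $\theta > s$ or if $\theta < r$ it also follows that 3.15 holds. The interesting case is when $\theta = r$ or $\theta = s$. Consider the case where $\theta = r$ and note the other case is entirely similar.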
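\[ \operatorname{sgn}_{n+1}\left( i_1, \dots, \overset{r}{n+1}, \dots, \overset{s}{q}, \dots, i_{n+1} \right) = (-1)^{n+1-r} \operatorname{sgn}_n\left( i_1, \dots, \overset{s-1}{q}, \dots, i_{n+1} \right) \tag{3.17} \]
while
\[ \operatorname{sgn}_{n+1}\left( i_1, \dots, \overset{r}{q}, \dots, \overset{s}{n+1}, \dots, i_{n+1} \right) = (-1)^{n+1-s} \operatorname{sgn}_n\left( i_1, \dots, \overset{r}{q}, \dots, i_{n+1} \right). \tag{3.18} \]
By making $s - 1 - r$ switches, move the $q$ which is in the $(s-1)^{th}$ position in 3.17 to the $r^{th}$ position in 3.18. By induction, each of these switches introduces a factor of $-1$ and so
\[ \operatorname{sgn}_n\left( i_1, \dots, \overset{s-1}{q}, \dots, i_{n+1} \right) = (-1)^{s-1-r} \operatorname{sgn}_n\left( i_1, \dots, \overset{r}{q}, \dots, i_{n+1} \right). \]
Therefore,
\begin{align*}
\operatorname{sgn}_{n+1}\left( i_1, \dots, \overset{r}{n+1}, \dots, \overset{s}{q}, \dots, i_{n+1} \right)
&= (-1)^{n+1-r} \operatorname{sgn}_n\left( i_1, \dots, \overset{s-1}{q}, \dots, i_{n+1} \right) \\
&= (-1)^{n+1-r} (-1)^{s-1-r} \operatorname{sgn}_n\left( i_1, \dots, \overset{r}{q}, \dots, i_{n+1} \right) \\
&= (-1)^{n+s} \operatorname{sgn}_n\left( i_1, \dots, \overset{r}{q}, \dots, i_{n+1} \right) \\
&= (-1)^{2s-1} (-1)^{n+1-s} \operatorname{sgn}_n\left( i_1, \dots, \overset{r}{q}, \dots, i_{n+1} \right) \\
&= -\operatorname{sgn}_{n+1}\left( i_1, \dots, \overset{r}{q}, \dots, \overset{s}{n+1}, \dots, i_{n+1} \right).
\end{align*}
This proves the existence of the desired function.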
To see this function is unique, note that you can obtain any ordered list of distinct numbers from a sequence of switches. If there exist two functions $f$ and $g$ both satisfying 3.14 and 3.15, you could start with $f(1, \dots, n) = g(1, \dots, n)$ and applying the same sequence of switches, eventually arrive at $f(i_1, \dots, i_n) = g(i_1, \dots, i_n)$. If any numbers are repeated, then 3.15 gives both functions are equal to zero for that ordered list. This proves the lemma.
In what follows $\operatorname{sgn}$ will often be used rather than $\operatorname{sgn}_n$ because the context supplies the appropriate $n$.
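The inductive construction in the proof of Lemma 3.18 translates directly into a recursive function. The following Python sketch is an illustration under stated assumptions: the name `sgn` and the tuple representation of an ordered list are choices made here, and the recursion follows 3.16 by removing the largest number $n$ from position $\theta$ and multiplying by $(-1)^{n-\theta}$.

```python
def sgn(lst):
    """sgn_n of an ordered list of n numbers taken from {1, ..., n}."""
    n = len(lst)
    if any(not (1 <= i <= n) for i in lst):
        raise ValueError("entries must come from {1, ..., n}")
    if len(set(lst)) < n:
        return 0                      # a repeated number gives 0
    if n <= 1:
        return 1                      # sgn_1(1) = 1
    theta = lst.index(n) + 1          # 1-based position of n in the list
    rest = lst[:theta - 1] + lst[theta:]
    # Formula (3.16): peel off n and recurse on the remaining n - 1 numbers.
    return (-1) ** (n - theta) * sgn(rest)


print(sgn((1, 2, 3)))   # 1
print(sgn((2, 1, 3)))   # -1, one switch away from (1, 2, 3)
print(sgn((2, 3, 1)))   # 1, two switches away from (1, 2, 3)
print(sgn((1, 1, 3)))   # 0, repeated number
```

Since there are no repeats when the recursion is reached, the number $n$ is guaranteed to appear in the list, so the call to `index` cannot fail.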
Definition 3.19 Let $f$ be a real valued function which has the set of ordered lists of numbers from $\{1, \dots, n\}$ as its domain. Define
\[ \sum_{(k_1, \dots, k_n)} f(k_1 \cdots k_n) \]
to be the sum of all the $f(k_1 \cdots k_n)$ for all possible choices of ordered lists $(k_1, \dots, k_n)$ of numbers of $\{1, \dots, n\}$. For example,
\[ \sum_{(k_1, k_2)} f(k_1, k_2) = f(1, 2) + f(2, 1) + f(1, 1) + f(2, 2). \]
Definition 3.20 Let $(a_{ij}) = A$ denote an $n \times n$ matrix. The determinant of $A$, denoted by $\det(A)$, is defined by
\[ \det(A) \equiv \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \, a_{1k_1} \cdots a_{nk_n} \]
where the sum is taken over all ordered lists of numbers from $\{1, \dots, n\}$. Note it suffices to take the sum over only those ordered lists in which there are no repeats because if there are, $\operatorname{sgn}(k_1, \dots, k_n) = 0$ and so that term contributes 0 to the sum.
Let $A$ be an $n \times n$ matrix, $A = (a_{ij})$, and let $(r_1, \dots, r_n)$ denote an ordered list of $n$ numbers from $\{1, \dots, n\}$. Let $A(r_1, \dots, r_n)$ denote the matrix whose $k^{th}$ row is the $r_k^{th}$ row of the matrix $A$. Thus
\[ \det(A(r_1, \dots, r_n)) = \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n} \tag{3.19} \]
and
\[ A(1, \dots, n) = A. \]
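Proposition 3.21 Let $(r_1, \dots, r_n)$ be an ordered list of numbers from $\{1, \dots, n\}$. Then
\begin{align*}
\operatorname{sgn}(r_1, \dots, r_n) \det(A) &= \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n} \tag{3.20} \\
&= \det(A(r_1, \dots, r_n)). \tag{3.21}
\end{align*}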
Proof: Let $(1, \dots, n) = (1, \dots, r, \dots, s, \dots, n)$ so $r < s$.
\[ \det(A(1, \dots, r, \dots, s, \dots, n)) = \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_r, \dots, k_s, \dots, k_n) \, a_{1k_1} \cdots a_{rk_r} \cdots a_{sk_s} \cdots a_{nk_n}, \tag{3.22} \]
and renaming the variables, calling $k_s$, $k_r$ and $k_r$, $k_s$, this equals
\begin{align*}
&= \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_s, \dots, k_r, \dots, k_n) \, a_{1k_1} \cdots a_{rk_s} \cdots a_{sk_r} \cdots a_{nk_n} \\
&= \sum_{(k_1, \dots, k_n)} -\operatorname{sgn}\Big( k_1, \dots, \overbrace{k_r, \dots, k_s}^{\text{These got switched}}, \dots, k_n \Big) \, a_{1k_1} \cdots a_{sk_r} \cdots a_{rk_s} \cdots a_{nk_n} \\
&= -\det(A(1, \dots, s, \dots, r, \dots, n)). \tag{3.23}
\end{align*}
Consequently,
\[ \det(A(1, \dots, s, \dots, r, \dots, n)) = -\det(A(1, \dots, r, \dots, s, \dots, n)) = -\det(A). \]
Now letting $A(1, \dots, s, \dots, r, \dots, n)$ play the role of $A$, and continuing in this way, switching pairs of numbers,
\[ \det(A(r_1, \dots, r_n)) = (-1)^p \det(A) \]
where it took $p$ switches to obtain $(r_1, \dots, r_n)$ from $(1, \dots, n)$. By Lemma 3.18, this implies
\[ \det(A(r_1, \dots, r_n)) = (-1)^p \det(A) = \operatorname{sgn}(r_1, \dots, r_n) \det(A) \]
and proves the proposition in the case when there are no repeated numbers in the ordered list $(r_1, \dots, r_n)$. However, if there is a repeat, say the $r^{th}$ row equals the $s^{th}$ row, then the reasoning of 3.22-3.23 shows that $\det(A(r_1, \dots, r_n)) = 0$ and also $\operatorname{sgn}(r_1, \dots, r_n) = 0$ so the formula holds in this case also.
Observation 3.22 There are $n!$ ordered lists of distinct numbers from $\{1, \dots, n\}$.
To see this, consider $n$ slots placed in order. There are $n$ choices for the first slot. For each of these choices, there are $n - 1$ choices for the second. Thus there are $n(n - 1)$ ways to fill the first two slots. Then for each of these ways there are $n - 2$ choices left for the third slot. Continuing this way, there are $n!$ ordered lists of distinct numbers from $\{1, \dots, n\}$ as stated in the observation.
With the above, it is possible to give a more symmetric description of the determinant from which it will follow that $\det(A) = \det(A^T)$.
Corollary 3.23 The following formula for $\det(A)$ is valid.
\[ \det(A) = \frac{1}{n!} \sum_{(r_1, \dots, r_n)} \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(r_1, \dots, r_n) \operatorname{sgn}(k_1, \dots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n}. \tag{3.24} \]
And also $\det(A^T) = \det(A)$ where $A^T$ is the transpose of $A$. (Recall that for $A^T = (a^T_{ij})$, $a^T_{ij} = a_{ji}$.)
Proof: From Proposition 3.21, if the $r_i$ are distinct, then $\operatorname{sgn}(r_1, \dots, r_n)^2 = 1$ and so
\[ \det(A) = \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(r_1, \dots, r_n) \operatorname{sgn}(k_1, \dots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n}. \]
Summing over all ordered lists $(r_1, \dots, r_n)$ where the $r_i$ are distinct (if the $r_i$ are not distinct, $\operatorname{sgn}(r_1, \dots, r_n) = 0$ and so there is no contribution to the sum),
\[ n! \det(A) = \sum_{(r_1, \dots, r_n)} \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(r_1, \dots, r_n) \operatorname{sgn}(k_1, \dots, k_n) \, a_{r_1 k_1} \cdots a_{r_n k_n}. \]
This proves the corollary since the formula gives the same number for $A$ as it does for $A^T$.
Corollary 3.24 If two rows or two columns in an $n \times n$ matrix $A$ are switched, the determinant of the resulting matrix equals $(-1)$ times the determinant of the original matrix. If $A$ is an $n \times n$ matrix in which two rows are equal or two columns are equal then $\det(A) = 0$. Suppose the $i^{th}$ row of $A$ equals $(xa_1 + yb_1, \dots, xa_n + yb_n)$. Then
\[ \det(A) = x \det(A_1) + y \det(A_2) \]
where the $i^{th}$ row of $A_1$ is $(a_1, \dots, a_n)$ and the $i^{th}$ row of $A_2$ is $(b_1, \dots, b_n)$, all other rows of $A_1$ and $A_2$ coinciding with those of $A$. In other words, $\det$ is a linear function of each row of $A$. The same is true with the word "row" replaced with the word "column".
Proof: By Proposition 3.21 when two rows are switched, the determinant of the resulting matrix is $(-1)$ times the determinant of the original matrix. By Corollary 3.23 the same holds for columns because the columns of the matrix equal the rows of the transposed matrix. Thus if $A_1$ is the matrix obtained from $A$ by switching two columns,
\[ \det(A) = \det(A^T) = -\det(A_1^T) = -\det(A_1). \]
If $A$ has two equal columns or two equal rows, then switching them results in the same matrix. Therefore, $\det(A) = -\det(A)$ and so $\det(A) = 0$.
It remains to verify the last assertion.
\begin{align*}
\det(A) &\equiv \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \, a_{1k_1} \cdots (xa_{k_i} + yb_{k_i}) \cdots a_{nk_n} \\
&= x \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \, a_{1k_1} \cdots a_{k_i} \cdots a_{nk_n} + y \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \, a_{1k_1} \cdots b_{k_i} \cdots a_{nk_n} \\
&\equiv x \det(A_1) + y \det(A_2).
\end{align*}
The same is true of columns because $\det(A^T) = \det(A)$ and the rows of $A^T$ are the columns of $A$.
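Definition 3.25 A vector $\mathbf{w}$ is a linear combination of the vectors $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$ if there exist scalars $c_1, \dots, c_r$ such that $\mathbf{w} = \sum_{k=1}^{r} c_k \mathbf{v}_k$. This is the same as saying $\mathbf{w} \in \operatorname{span}\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$.
The following corollary is also of great use.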
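Corollary 3.26 Suppose $A$ is an $n \times n$ matrix and some column (row) is a linear combination of $r$ other columns (rows). Then $\det(A) = 0$.
Proof: Let $A = \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_n \end{pmatrix}$ be the columns of $A$ and suppose the condition that one column is a linear combination of $r$ of the others is satisfied. Then by using Corollary 3.24 you may rearrange the columns to have the $n^{th}$ column a linear combination of the first $r$ columns, which changes the determinant by at most a sign. Thus $\mathbf{a}_n = \sum_{k=1}^{r} c_k \mathbf{a}_k$ and so
\[ \det(A) = \det \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_r & \cdots & \mathbf{a}_{n-1} & \sum_{k=1}^{r} c_k \mathbf{a}_k \end{pmatrix}. \]
By Corollary 3.24
\[ \det(A) = \sum_{k=1}^{r} c_k \det \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_r & \cdots & \mathbf{a}_{n-1} & \mathbf{a}_k \end{pmatrix} = 0, \]
since each matrix in the sum has two equal columns. The case for rows follows from the fact that $\det(A) = \det(A^T)$. This proves the corollary.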
Recall the following definition of matrix multiplication.
Definition 3.27 If $A$ and $B$ are $n \times n$ matrices, $A = (a_{ij})$ and $B = (b_{ij})$, then $AB = (c_{ij})$ where
\[ c_{ij} \equiv \sum_{k=1}^{n} a_{ik} b_{kj}. \]
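One of the most important rules about determinants is that the determinant of a product equals the product of the determinants.
Theorem 3.28 Let $A$ and $B$ be $n \times n$ matrices. Then
\[ \det(AB) = \det(A) \det(B). \]
Proof: Let $c_{ij}$ be the $ij^{th}$ entry of $AB$. Then by Proposition 3.21,
\begin{align*}
\det(AB) &= \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \, c_{1k_1} \cdots c_{nk_n} \\
&= \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \left( \sum_{r_1} a_{1r_1} b_{r_1 k_1} \right) \cdots \left( \sum_{r_n} a_{nr_n} b_{r_n k_n} \right) \\
&= \sum_{(r_1, \dots, r_n)} \sum_{(k_1, \dots, k_n)} \operatorname{sgn}(k_1, \dots, k_n) \, b_{r_1 k_1} \cdots b_{r_n k_n} \, (a_{1r_1} \cdots a_{nr_n}) \\
&= \sum_{(r_1, \dots, r_n)} \operatorname{sgn}(r_1 \cdots r_n) \, a_{1r_1} \cdots a_{nr_n} \det(B) = \det(A) \det(B).
\end{align*}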
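The product rule can be spot-checked on small integer matrices. The sketch below is only a numerical illustration of Theorem 3.28, not a substitute for the proof; the helpers `sgn`, `det`, and `matmul` are hypothetical names, and `det` evaluates the defining sum over ordered lists without repeats, using the inversion count for the sign.

```python
from itertools import permutations
from math import prod


def sgn(perm):
    """Sign of an ordered list of distinct numbers, via its inversion count."""
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(A):
    """Determinant as the signed sum over ordered lists without repeats."""
    n = len(A)
    return sum(sgn(k) * prod(A[i][k[i] - 1] for i in range(n))
               for k in permutations(range(1, n + 1)))


def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

print(det(A), det(B), det(matmul(A, B)))   # -2 -5 10
```

Here $\det(A) = -2$, $\det(B) = -5$, and $\det(AB) = 10$, in agreement with the theorem.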
Lemma 3.29 Suppose a matrix is of the form
\[ M = \begin{pmatrix} A & * \\ \mathbf{0} & a \end{pmatrix} \tag{3.25} \]
or
\[ M = \begin{pmatrix} A & \mathbf{0} \\ * & a \end{pmatrix} \tag{3.26} \]
where $a$ is a number and $A$ is an $(n-1) \times (n-1)$ matrix and $*$ denotes either a column or a row having length $n - 1$ and the $\mathbf{0}$ denotes either a column or a row of length $n - 1$ consisting entirely of zeros. Then
\[ \det(M) = a \det(A). \]
Proof: Denote $M$ by $(m_{ij})$. Thus in the first case, $m_{nn} = a$ and $m_{ni} = 0$ if $i \neq n$ while in the second case, $m_{nn} = a$ and $m_{in} = 0$ if $i \neq n$. From the definition of the determinant,
\[ \det(M) \equiv \sum_{(k_1, \dots, k_n)} \operatorname{sgn}_n(k_1, \dots, k_n) \, m_{1k_1} \cdots m_{nk_n}. \]
Letting $\theta$ denote the position of $n$ in the ordered list $(k_1, \dots, k_n)$, then using the earlier conventions used to prove Lemma 3.18, $\det(M)$ equals
\[ \sum_{(k_1, \dots, k_n)} (-1)^{n-\theta} \operatorname{sgn}_{n-1}\left( k_1, \dots, k_{\theta-1}, \overset{\theta}{k_{\theta+1}}, \dots, \overset{n-1}{k_n} \right) m_{1k_1} \cdots m_{nk_n}. \]
Now suppose 3.26. Then if $k_n \neq n$, the number $n$ occurs as $k_\theta$ for some $\theta \neq n$, so the term involving $m_{nk_n}$ in the above expression contains the factor $m_{\theta n} = 0$ and equals zero. Therefore, the only terms which survive are those for which $\theta = n$ or in other words, those for which $k_n = n$. Therefore, the above expression reduces to
\[ a \sum_{(k_1, \dots, k_{n-1})} \operatorname{sgn}_{n-1}(k_1, \dots, k_{n-1}) \, m_{1k_1} \cdots m_{(n-1)k_{n-1}} = a \det(A). \]
To get the assertion in the situation of 3.25 use Corollary 3.23 and 3.26 to write
\[ \det(M) = \det(M^T) = \det \begin{pmatrix} A^T & \mathbf{0} \\ * & a \end{pmatrix} = a \det(A^T) = a \det(A). \]
This proves the lemma.
In terms of the theory of determinants, arguably the most important idea is that of Laplace expansion along a row or a column. This will follow from the above definition of a determinant.
Definition 3.30 Let $A = (a_{ij})$ be an $n \times n$ matrix. Then a new matrix called the cofactor matrix, $\operatorname{cof}(A)$, is defined by $\operatorname{cof}(A) = (c_{ij})$ where to obtain $c_{ij}$ delete the $i^{th}$ row and the $j^{th}$ column of $A$, take the determinant of the $(n-1) \times (n-1)$ matrix which results (this is called the $ij^{th}$ minor of $A$), and then multiply this number by $(-1)^{i+j}$. To make the formulas easier to remember, $\operatorname{cof}(A)_{ij}$ will denote the $ij^{th}$ entry of the cofactor matrix.
The following is the main result. Earlier this was given as a definition and the outrageous totally unjustified assertion was made that the same number would be obtained by expanding the determinant along any row or column. The following theorem proves this assertion.
Theorem 3.31 Let $A$ be an $n \times n$ matrix where $n \ge 2$. Then
\[ \det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} = \sum_{i=1}^{n} a_{ij} \operatorname{cof}(A)_{ij}. \tag{3.27} \]
The first formula consists of expanding the determinant along the $i^{th}$ row and the second expands the determinant along the $j^{th}$ column.
Proof: Let $(a_{i1}, \dots, a_{in})$ be the $i^{th}$ row of $A$. Let $B_j$ be the matrix obtained from $A$ by leaving every row the same except the $i^{th}$ row which in $B_j$ equals $(0, \dots, 0, a_{ij}, 0, \dots, 0)$. Then by Corollary 3.24,
\[ \det(A) = \sum_{j=1}^{n} \det(B_j). \]
Denote by $A^{ij}$ the $(n-1) \times (n-1)$ matrix obtained by deleting the $i^{th}$ row and the $j^{th}$ column of $A$. Thus $\operatorname{cof}(A)_{ij} \equiv (-1)^{i+j} \det(A^{ij})$. At this point, recall that from Proposition 3.21, when two rows or two columns in a matrix $M$ are switched, this results in multiplying the determinant of the old matrix by $-1$ to get the determinant of the new matrix. Therefore, by Lemma 3.29,
\begin{align*}
\det(B_j) &= (-1)^{n-j} (-1)^{n-i} \det \begin{pmatrix} A^{ij} & * \\ \mathbf{0} & a_{ij} \end{pmatrix} \\
&= (-1)^{i+j} \det \begin{pmatrix} A^{ij} & * \\ \mathbf{0} & a_{ij} \end{pmatrix} = a_{ij} \operatorname{cof}(A)_{ij}.
\end{align*}
Therefore,
\[ \det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} \]
which is the formula for expanding $\det(A)$ along the $i^{th}$ row. Also,
\[ \det(A) = \det(A^T) = \sum_{j=1}^{n} a^T_{ij} \operatorname{cof}(A^T)_{ij} = \sum_{j=1}^{n} a_{ji} \operatorname{cof}(A)_{ji} \]
which is the formula for expanding $\det(A)$ along the $i^{th}$ column. This proves the theorem.
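The row expansion of Theorem 3.31 gives a simple recursive way to compute a determinant. The following Python sketch is an illustration under stated assumptions: `minor` and `det_laplace` are hypothetical names, rows and columns are indexed from 0 (which does not change the parity of $i + j$), and the cofactors are computed by expanding along the first row of each smaller matrix.

```python
def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det_laplace(A, i=0):
    """Expand det(A) along row i: sum over j of a_ij * cof(A)_ij."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum(A[i][j] * (-1) ** (i + j) * det_laplace(minor(A, i, j))
               for j in range(n))


A = [[2, 0, 1],
     [1, 3, -1],
     [0, 5, 4]]

# The theorem says every row gives the same number.
print([det_laplace(A, i) for i in range(3)])   # [39, 39, 39]
```

The running time of this recursion grows like $n!$, so it is a teaching device for small matrices rather than a practical algorithm.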
Note that this gives an easy way to write a formula for the inverse of an $n \times n$ matrix.
Theorem 3.32 $A^{-1}$ exists if and only if $\det(A) \neq 0$. If $\det(A) \neq 0$, then $A^{-1} = (a^{-1}_{ij})$ where
\[ a^{-1}_{ij} = \det(A)^{-1} \operatorname{cof}(A)_{ji} \]
for $\operatorname{cof}(A)_{ij}$ the $ij^{th}$ cofactor of $A$.
Proof: By Theorem 3.31 and letting $(a_{ir}) = A$, if $\det(A) \neq 0$,
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ir} \det(A)^{-1} = \det(A) \det(A)^{-1} = 1. \]
Now consider
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} \]
when $k \neq r$. Replace the $k^{th}$ column with the $r^{th}$ column to obtain a matrix $B_k$ whose determinant equals zero by Corollary 3.24. However, expanding this matrix along the $k^{th}$ column yields
\[ 0 = \det(B_k) \det(A)^{-1} = \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1}. \]
Summarizing,
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} = \delta_{rk}. \]
Using the other formula in Theorem 3.31, and similar reasoning,
\[ \sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} \det(A)^{-1} = \delta_{rk}. \]
This proves that if $\det(A) \neq 0$, then $A^{-1}$ exists with $A^{-1} = (a^{-1}_{ij})$, where
\[ a^{-1}_{ij} = \operatorname{cof}(A)_{ji} \det(A)^{-1}. \]
Now suppose $A^{-1}$ exists. Then by Theorem 3.28,
\[ 1 = \det(I) = \det(AA^{-1}) = \det(A) \det(A^{-1}) \]
so $\det(A) \neq 0$. This proves the theorem.
The next corollary points out that if an $n \times n$ matrix $A$ has a right or a left inverse, then it has an inverse.
Corollary 3.33 Let $A$ be an $n \times n$ matrix and suppose there exists an $n \times n$ matrix $B$ such that $BA = I$. Then $A^{-1}$ exists and $A^{-1} = B$. Also, if there exists $C$, an $n \times n$ matrix, such that $AC = I$, then $A^{-1}$ exists and $A^{-1} = C$.
Proof: Since $BA = I$, Theorem 3.28 implies
\[ \det B \det A = 1 \]
and so $\det A \neq 0$. Therefore from Theorem 3.32, $A^{-1}$ exists. Therefore,
\[ A^{-1} = (BA) A^{-1} = B(AA^{-1}) = BI = B. \]
The case where $AC = I$ is handled similarly.
The conclusion of this corollary is that left inverses, right inverses and inverses are all the same in the context of $n \times n$ matrices.
Theorem 3.32 says that to find the inverse, take the transpose of the cofactor matrix and divide by the determinant. The transpose of the cofactor matrix is called the adjugate or sometimes the classical adjoint of the matrix $A$. It is an abomination to call it the adjoint although you do sometimes see it referred to in this way. In words, $A^{-1}$ is equal to one over the determinant of $A$ times the adjugate matrix of $A$.
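The recipe "transpose of the cofactor matrix divided by the determinant" can be carried out directly for small matrices. The sketch below is an illustration under stated assumptions: the names `minor`, `det`, and `inverse` are hypothetical, the entries of the matrix are assumed to be integers so that exact fractions can be used, and the determinant is computed by expansion along the first row.

```python
from fractions import Fraction


def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion along the first row; the empty matrix gives 1."""
    if len(A) == 0:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def inverse(A):
    """Inverse from Theorem 3.32: entry (i, j) is cof(A)_ji / det(A)."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
           for i in range(n)]
    # Transpose the cofactor matrix (the adjugate) and divide by det(A).
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


A = [[2, 1],
     [5, 3]]
print(inverse(A) == [[3, -1], [-5, 2]])   # True, since det(A) = 1
```

For anything beyond small matrices, row reduction is far cheaper than computing all $n^2$ cofactors, so this formula is mainly of theoretical interest.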
In case you are solving a system of equations, $A\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$, it follows that if $A^{-1}$ exists,
\[ \mathbf{x} = (A^{-1}A)\mathbf{x} = A^{-1}(A\mathbf{x}) = A^{-1}\mathbf{y} \]
thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for $A^{-1}$ given above. Using this formula,
\[ x_i = \sum_{j=1}^{n} a^{-1}_{ij} y_j = \sum_{j=1}^{n} \frac{1}{\det(A)} \operatorname{cof}(A)_{ji} \, y_j. \]
By the formula for the expansion of a determinant along a column,
\[ x_i = \frac{1}{\det(A)} \det \begin{pmatrix} * & \cdots & y_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & y_n & \cdots & * \end{pmatrix}, \]
where here the $i^{th}$ column of $A$ is replaced with the column vector $(y_1, \dots, y_n)^T$, and the determinant of this modified matrix is taken and divided by $\det(A)$. This formula is known as Cramer's rule.
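Cramer's rule is short enough to code directly. The following Python sketch is an illustration under stated assumptions: the names `minor`, `det`, and `cramer` are hypothetical, the entries of $A$ and $\mathbf{y}$ are assumed to be integers so the answer can be given as exact fractions, and determinants are computed by expansion along the first row.

```python
from fractions import Fraction


def minor(A, i, j):
    """Matrix obtained from A by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Determinant by expansion along the first row; the empty matrix gives 1."""
    if len(A) == 0:
        return 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))


def cramer(A, y):
    """Solve A x = y: x_i = det(A with column i replaced by y) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule requires det(A) != 0")
    x = []
    for i in range(len(A)):
        # Replace the i-th column of A by the right-hand side y.
        Ai = [list(row[:i]) + [y[r]] + list(row[i + 1:]) for r, row in enumerate(A)]
        x.append(Fraction(det(Ai), d))
    return x


A = [[2, 1],
     [1, 3]]
y = [3, 5]
print(cramer(A, y))   # [Fraction(4, 5), Fraction(7, 5)]
```

Substituting back, $2 \cdot \tfrac{4}{5} + \tfrac{7}{5} = 3$ and $\tfrac{4}{5} + 3 \cdot \tfrac{7}{5} = 5$, as required.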
Definition 3.34 A matrix $M$ is upper triangular if $M_{ij} = 0$ whenever $i > j$. Thus such a matrix equals zero below the main diagonal, the entries of the form $M_{ii}$, as shown.
\[ \begin{pmatrix} * & * & \cdots & * \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * \end{pmatrix} \]
A lower triangular matrix is defined similarly as a matrix for which all entries above the main diagonal are equal to zero.
With this definition, here is a simple corollary of Theorem 3.31.
Corollary 3.35 Let $M$ be an upper (lower) triangular matrix. Then $\det(M)$ is obtained by taking the product of the entries on the main diagonal.
Denition 3.36 A submatrix of a matrix A is the rectangular array of numbers
obtained by deleting some rows and columns of A. Let A be an mn matrix. The
determinant rank of the matrix equals r where r is the largest number such that
some r r submatrix of A has a non zero determinant. The row rank is dened
to be the dimension of the span of the rows. The column rank is dened to be the
dimension of the span of the columns.
Theorem 3.37 If $A$ has determinant rank $r$, then there exist $r$ rows of the matrix such that every other row is a linear combination of these $r$ rows.
Proof: Suppose the determinant rank of $A = (a_{ij})$ equals $r$. If rows and columns are interchanged, the determinant rank of the modified matrix is unchanged. Thus rows and columns can be interchanged to produce an $r \times r$ matrix in the upper left corner of the matrix which has nonzero determinant. Now consider the $(r+1) \times (r+1)$ matrix $M$,
\[ \begin{pmatrix} a_{11} & \cdots & a_{1r} & a_{1p} \\ \vdots & & \vdots & \vdots \\ a_{r1} & \cdots & a_{rr} & a_{rp} \\ a_{l1} & \cdots & a_{lr} & a_{lp} \end{pmatrix} \]
where $C$ will denote the $r \times r$ matrix in the upper left corner which has nonzero determinant. I claim $\det(M) = 0$.
There are two cases to consider in verifying this claim. First, suppose $p > r$. Then the claim follows from the assumption that $A$ has determinant rank $r$. On the other hand, if $p \le r$, then the determinant is zero because there are two identical columns. Expand the determinant along the last column and divide by $\det(C)$ to obtain
\[ a_{lp} = -\sum_{i=1}^{r} \frac{\operatorname{cof}(M)_{ip}}{\det(C)}\, a_{ip}. \]
Now note that $\operatorname{cof}(M)_{ip}$ does not depend on $p$. Therefore the above sum is of the form
\[ a_{lp} = \sum_{i=1}^{r} m_i a_{ip} \]
which shows the $l^{th}$ row is a linear combination of the first $r$ rows of $A$. Since $l$ is arbitrary, this proves the theorem.
Corollary 3.38 The determinant rank equals the row rank.
Proof: From Theorem 3.37, the row rank is no larger than the determinant rank. Could the row rank be smaller than the determinant rank? If so, there exist $p$ rows for $p < r$ such that the span of these $p$ rows equals the row space. But this implies that the $r \times r$ submatrix whose determinant is nonzero also has row rank no larger than $p$, which is impossible if its determinant is to be nonzero because at least one row is a linear combination of the others.
Corollary 3.39 If $A$ has determinant rank $r$, then there exist $r$ columns of the matrix such that every other column is a linear combination of these $r$ columns. Also the column rank equals the determinant rank.
Proof: This follows from the above by considering $A^T$. The rows of $A^T$ are the columns of $A$ and the determinant rank of $A^T$ and $A$ are the same. Therefore, from Corollary 3.38, column rank of $A$ = row rank of $A^T$ = determinant rank of $A^T$ = determinant rank of $A$.
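The equality of the three ranks can be checked on a small example. The following is a minimal sketch, assuming hypothetical helper names `determinant_rank` and `row_rank`; the determinant rank is found by brute force over all square submatrices, and the row rank by Gaussian elimination in exact arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def determinant_rank(a):
    """Largest r such that some r x r submatrix has nonzero determinant."""
    rows, cols = len(a), len(a[0])
    for r in range(min(rows, cols), 0, -1):
        for ri in combinations(range(rows), r):
            for ci in combinations(range(cols), r):
                if det([[a[i][j] for j in ci] for i in ri]) != 0:
                    return r
    return 0

def row_rank(a):
    """Dimension of the span of the rows, by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in a]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [x - factor * p for x, p in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical example: the second row is twice the first.
A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
A_T = [list(col) for col in zip(*A)]
ranks = (determinant_rank(A), row_rank(A), row_rank(A_T))
```

Here `row_rank(A_T)` plays the role of the column rank of `A`, as in the proof of Corollary 3.39.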
The following theorem is of fundamental importance and ties together many of the ideas presented above.
Theorem 3.40 Let $A$ be an $n \times n$ matrix. Then the following are equivalent.
1. $\det(A) = 0$.
2. $A, A^T$ are not one to one.
3. $A$ is not onto.
Proof: Suppose $\det(A) = 0$. Then the determinant rank of $A$ is $r < n$. Therefore, there exist $r$ columns such that every other column is a linear combination of these columns by Corollary 3.39. In particular, it follows that for some $m$, the $m^{th}$ column is a linear combination of all the others. Thus letting $A = \begin{pmatrix} a_1 & \cdots & a_m & \cdots & a_n \end{pmatrix}$ where the columns are denoted by $a_i$, there exist scalars $\alpha_i$ such that
\[ a_m = \sum_{k \neq m} \alpha_k a_k. \]
Now consider the column vector $x \equiv \begin{pmatrix} \alpha_1 & \cdots & -1 & \cdots & \alpha_n \end{pmatrix}^T$, where the $-1$ is in the $m^{th}$ position. Then
\[ Ax = -a_m + \sum_{k \neq m} \alpha_k a_k = 0. \]
Since also $A0 = 0$, it follows $A$ is not one to one. Similarly, $A^T$ is not one to one by the same argument applied to $A^T$. This verifies that 1.) implies 2.).
Now suppose 2.). Then since $A^T$ is not one to one, it follows there exists $x \neq 0$ such that
\[ A^T x = 0. \]
Taking the transpose of both sides yields
\[ x^T A = 0 \]
where the $0$ is a $1 \times n$ matrix or row vector. Now if $Ay = x$, then
\[ |x|^2 = x^T(Ay) = \left(x^T A\right) y = 0y = 0 \]
contrary to $x \neq 0$. Consequently there can be no $y$ such that $Ay = x$ and so $A$ is not onto. This shows that 2.) implies 3.).
Finally, suppose 3.). If 1.) does not hold, then $\det(A) \neq 0$ but then from Theorem 3.32 $A^{-1}$ exists and so for every $y \in \mathbb{F}^n$ there exists a unique $x \in \mathbb{F}^n$ such that $Ax = y$. In fact $x = A^{-1}y$. Thus $A$ would be onto contrary to 3.). This shows 3.) implies 1.) and proves the theorem.
Corollary 3.41 Let $A$ be an $n \times n$ matrix. Then the following are equivalent.
1. $\det(A) \neq 0$.
2. $A$ and $A^T$ are one to one.
3. $A$ is onto.
Proof: This follows immediately from the above theorem.
3.5 The Cayley Hamilton Theorem
Definition 3.42 Let $A$ be an $n \times n$ matrix. The characteristic polynomial is defined as
\[ p_A(t) \equiv \det(tI - A) \]
and the solutions to $p_A(t) = 0$ are called eigenvalues. For $A$ a matrix and $p(t) = t^n + a_{n-1}t^{n-1} + \cdots + a_1 t + a_0$, denote by $p(A)$ the matrix defined by
\[ p(A) \equiv A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I. \]
The explanation for the last term is that $A^0$ is interpreted as $I$, the identity matrix.
The Cayley Hamilton theorem states that every matrix satisfies its characteristic equation, that equation defined by $p_A(t) = 0$. It is one of the most important theorems in linear algebra. The following lemma will help with its proof.
Lemma 3.43 Suppose for all $|\lambda|$ large enough,
\[ A_0 + A_1\lambda + \cdots + A_m\lambda^m = 0, \]
where the $A_i$ are $n \times n$ matrices. Then each $A_i = 0$.
Proof: Multiply by $\lambda^{-m}$ to obtain
\[ A_0\lambda^{-m} + A_1\lambda^{-m+1} + \cdots + A_{m-1}\lambda^{-1} + A_m = 0. \]
Now let $|\lambda| \to \infty$ to obtain $A_m = 0$. With this, multiply by $\lambda$ to obtain
\[ A_0\lambda^{-m+1} + A_1\lambda^{-m+2} + \cdots + A_{m-1} = 0. \]
Now let $|\lambda| \to \infty$ to obtain $A_{m-1} = 0$. Continue multiplying by $\lambda$ and letting $|\lambda| \to \infty$ to obtain that all the $A_i = 0$. This proves the lemma.
With the lemma, here is a simple corollary.
Corollary 3.44 Let $A_i$ and $B_i$ be $n \times n$ matrices and suppose
\[ A_0 + A_1\lambda + \cdots + A_m\lambda^m = B_0 + B_1\lambda + \cdots + B_m\lambda^m \]
for all $|\lambda|$ large enough. Then $A_i = B_i$ for all $i$. Consequently if $\lambda$ is replaced by any $n \times n$ matrix, the two sides will be equal. That is, for $C$ any $n \times n$ matrix,
\[ A_0 + A_1 C + \cdots + A_m C^m = B_0 + B_1 C + \cdots + B_m C^m. \]
Proof: Subtract and use the result of the lemma.
With this preparation, here is a relatively easy proof of the Cayley Hamilton theorem.
Theorem 3.45 Let $A$ be an $n \times n$ matrix and let $p(\lambda) \equiv \det(\lambda I - A)$ be the characteristic polynomial. Then $p(A) = 0$.
Proof: Let $C(\lambda)$ equal the transpose of the cofactor matrix of $(\lambda I - A)$ for $|\lambda|$ large. (If $|\lambda|$ is large enough, then $\lambda$ cannot be in the finite list of eigenvalues of $A$ and so for such $\lambda$, $(\lambda I - A)^{-1}$ exists.) Therefore, by Theorem 3.32
\[ C(\lambda) = p(\lambda)(\lambda I - A)^{-1}. \]
Note that each entry in $C(\lambda)$ is a polynomial in $\lambda$ having degree no more than $n-1$. Therefore, collecting the terms,
\[ C(\lambda) = C_0 + C_1\lambda + \cdots + C_{n-1}\lambda^{n-1} \]
for $C_j$ some $n \times n$ matrix. It follows that for all $|\lambda|$ large enough,
\[ (\lambda I - A)\left(C_0 + C_1\lambda + \cdots + C_{n-1}\lambda^{n-1}\right) = p(\lambda) I \]
and so Corollary 3.44 may be used. It follows the matrix coefficients corresponding to equal powers of $\lambda$ are equal on both sides of this equation. Therefore, if $\lambda$ is replaced with $A$, the two sides will be equal. Thus
\[ 0 = (A - A)\left(C_0 + C_1 A + \cdots + C_{n-1}A^{n-1}\right) = p(A) I = p(A). \]
This proves the Cayley Hamilton theorem.
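The theorem is easy to test numerically. The sketch below is an illustration under stated assumptions: it computes the coefficients of $\det(tI - A)$ with the Faddeev-LeVerrier recursion, a standard method that is not the one used in these notes, and then evaluates the polynomial at $A$ by Horner's rule. The names `char_poly` and `poly_at_matrix` are hypothetical.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def char_poly(a):
    """Coefficients [c_0, ..., c_n] of det(tI - A) = c_0 + c_1 t + ... + c_n t^n,
    computed with the Faddeev-LeVerrier recursion (an assumption of this sketch)."""
    n = len(a)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        # M_k = A M_{k-1} + c_{n-k+1} I
        m = [[am[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        am = matmul(a, m)
        c[n - k] = -sum(am[i][i] for i in range(n)) / k
    return c

def poly_at_matrix(c, a):
    """Evaluate c_0 I + c_1 A + ... + c_n A^n by Horner's rule."""
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        result = matmul(result, a)
        for i in range(n):
            result[i][i] += coeff
    return result

# Hypothetical sample matrix.
A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
coeffs = char_poly(A)
p_of_A = poly_at_matrix(coeffs, A)
```

For this sample matrix the characteristic polynomial is $t^3 - 9t^2 + 24t - 18$, and `p_of_A` is the zero matrix, as the theorem predicts.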
3.6 An Identity Of Cauchy
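There is a very interesting identity for determinants due to Cauchy.
Theorem 3.46 The following identity holds.
\[ \prod_{i,j}(a_i + b_j) \begin{vmatrix} \frac{1}{a_1+b_1} & \cdots & \frac{1}{a_1+b_n} \\ \vdots & & \vdots \\ \frac{1}{a_n+b_1} & \cdots & \frac{1}{a_n+b_n} \end{vmatrix} = \prod_{j<i} (a_i - a_j)(b_i - b_j). \quad (3.28) \]
Proof: What is the exponent of $a_2$ on the right side of 3.28? It occurs in the factor $(a_2 - a_1)$ and in the factors $(a_j - a_2)$ for $j > 2$. Therefore, there are exactly $n-1$ factors which contain $a_2$. Therefore, $a_2$ has an exponent of $n-1$. Similarly, each $a_k$ is raised to the $n-1$ power and the same holds for the $b_k$ as well. Therefore, the right side of 3.28 is of the form
\[ c\, a_1^{n-1} a_2^{n-1} \cdots a_n^{n-1} b_1^{n-1} \cdots b_n^{n-1} \]
where $c$ is some constant. Now consider the left side of 3.28. This is of the form
\[ \frac{1}{n!} \prod_{i,j}(a_i + b_j) \sum_{i_1 \cdots i_n,\, j_1 \cdots j_n} \operatorname{sgn}(i_1 \cdots i_n) \operatorname{sgn}(j_1 \cdots j_n) \frac{1}{a_{i_1}+b_{j_1}} \frac{1}{a_{i_2}+b_{j_2}} \cdots \frac{1}{a_{i_n}+b_{j_n}}. \]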
For a given $i_1 \cdots i_n, j_1 \cdots j_n$, let $S(i_1 \cdots i_n, j_1 \cdots j_n) \equiv \{(i_1,j_1), (i_2,j_2), \cdots, (i_n,j_n)\}$. The left side of 3.28 equals
\[ \frac{1}{n!} \sum_{i_1 \cdots i_n,\, j_1 \cdots j_n} \operatorname{sgn}(i_1 \cdots i_n) \operatorname{sgn}(j_1 \cdots j_n) \prod_{(i,j) \notin \{(i_1,j_1),(i_2,j_2),\cdots,(i_n,j_n)\}} (a_i + b_j) \]
where you can assume the $i_k$ are all distinct and the $j_k$ are also all distinct because otherwise sgn will produce a 0. Therefore, in
\[ \prod_{(i,j) \notin \{(i_1,j_1),(i_2,j_2),\cdots,(i_n,j_n)\}} (a_i + b_j), \]
there are exactly $n-1$ factors which contain $a_k$ for each $k$ and similarly, there are exactly $n-1$ factors which contain $b_k$ for each $k$. Therefore, the left side of 3.28 is of the form
\[ d\, a_1^{n-1} a_2^{n-1} \cdots a_n^{n-1} b_1^{n-1} \cdots b_n^{n-1} \]
and it remains to verify that $c = d$. Using the properties of determinants, the left side of 3.28 is of the form
\[ \prod_{i \neq j}(a_i + b_j) \begin{vmatrix} 1 & \frac{a_1+b_1}{a_1+b_2} & \cdots & \frac{a_1+b_1}{a_1+b_n} \\ \frac{a_2+b_2}{a_2+b_1} & 1 & \cdots & \frac{a_2+b_2}{a_2+b_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{a_n+b_n}{a_n+b_1} & \frac{a_n+b_n}{a_n+b_2} & \cdots & 1 \end{vmatrix}. \]
Let $a_k \to -b_k$. Then this converges to $\prod_{i \neq j}(-b_i + b_j)$. The right side of 3.28 converges to
\[ \prod_{j<i} (-b_i + b_j)(b_i - b_j) = \prod_{i \neq j} (-b_i + b_j). \]
Therefore, $d = c$ and this proves the identity.
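A quick numerical check of 3.28 is sketched below for $n = 3$, using exact rational arithmetic. The helper names and the sample values of $a_i$, $b_j$ are assumptions chosen for illustration; any values with all $a_i + b_j \neq 0$ would do.

```python
from fractions import Fraction
from itertools import product
from math import prod

def det(m):
    """Determinant by cofactor expansion along the first row (small n only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def cauchy_sides(a, b):
    """Return (left side, right side) of the identity 3.28."""
    n = len(a)
    matrix = [[Fraction(1, a[i] + b[j]) for j in range(n)] for i in range(n)]
    left = prod(a[i] + b[j] for i, j in product(range(n), repeat=2)) * det(matrix)
    # Product over all pairs with j < i.
    right = prod((a[i] - a[j]) * (b[i] - b[j]) for i in range(n) for j in range(i))
    return left, right

left, right = cauchy_sides([1, 2, 3], [1, 3, 7])
```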
3.7 Block Multiplication Of Matrices
Consider the following problem
\[ \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix}. \]
You know how to do this. You get
\[ \begin{pmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{pmatrix}. \]
Now what if instead of numbers, the entries $A, B, C, D, E, F, G, H$ are matrices of a size such that the multiplications and additions needed in the above formula all make sense? Would the formula be true in this case? I will show below that this is true.
Suppose $A$ is a matrix of the form
\[ A = \begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{r1} & \cdots & A_{rm} \end{pmatrix} \quad (3.29) \]
where $A_{ij}$ is an $s_i \times p_j$ matrix where $s_i$ is constant for $j = 1, \cdots, m$ for each $i = 1, \cdots, r$. Such a matrix is called a block matrix, also a partitioned matrix. How do you get the block $A_{ij}$? Here is how for $A$ an $m \times n$ matrix:
\[ \overbrace{\begin{pmatrix} 0 & I_{s_i \times s_i} & 0 \end{pmatrix}}^{s_i \times m} A \overbrace{\begin{pmatrix} 0 \\ I_{p_j \times p_j} \\ 0 \end{pmatrix}}^{n \times p_j}. \quad (3.30) \]
In the block column matrix on the right, you need to have $c_j - 1$ rows of zeros above the small $p_j \times p_j$ identity matrix where the columns of $A$ involved in $A_{ij}$ are $c_j, \cdots, c_j + p_j - 1$, and in the block row matrix on the left, you need to have $r_i - 1$ columns of zeros to the left of the $s_i \times s_i$ identity matrix where the rows of $A$ involved in $A_{ij}$ are $r_i, \cdots, r_i + s_i - 1$. An important observation to make is that the matrix on the right specifies columns to use in the block and the one on the left specifies the rows used. There is no overlap between the blocks of $A$. Thus the $n \times n$ identity matrix corresponding to multiplication on the right of $A$ is of the form
\[ \begin{pmatrix} I_{p_1 \times p_1} & & 0 \\ & \ddots & \\ 0 & & I_{p_m \times p_m} \end{pmatrix} \]
where these little identity matrices don't overlap. A similar conclusion follows from consideration of the matrices $I_{s_i \times s_i}$.
Next consider the question of multiplication of two block matrices. Let $B$ be a block matrix of the form
\[ \begin{pmatrix} B_{11} & \cdots & B_{1p} \\ \vdots & \ddots & \vdots \\ B_{r1} & \cdots & B_{rp} \end{pmatrix} \quad (3.31) \]
and $A$ a block matrix of the form
\[ \begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{p1} & \cdots & A_{pm} \end{pmatrix} \quad (3.32) \]
and suppose that for all $i, j$, it makes sense to multiply $B_{is}A_{sj}$ for all $s \in \{1, \cdots, p\}$ (that is, the two matrices $B_{is}$ and $A_{sj}$ are conformable) and that for fixed $i, j$, it follows $B_{is}A_{sj}$ is the same size for each $s$ so that it makes sense to write $\sum_s B_{is}A_{sj}$.
The following theorem says essentially that when you take the product of two matrices, you can do it two ways. One way is to simply multiply them forming $BA$. The other way is to partition both matrices, formally multiply the blocks to get another block matrix and this one will be $BA$ partitioned. Before presenting this theorem, here is a simple lemma which is really a special case of the theorem.
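Lemma 3.47 Consider the following product.
\[ \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I & 0 \end{pmatrix} \]
where the first is $n \times r$ and the second is $r \times n$. The small identity matrix $I$ is an $r \times r$ matrix and there are $l$ zero rows above $I$ and $l$ zero columns to the left of $I$ in the right matrix. Then the product of these matrices is a block matrix of the form
\[ \begin{pmatrix} 0 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]
Proof: From the definition of the way you multiply matrices, the product is
\[ \begin{pmatrix} \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} 0 & \cdots & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} 0 & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} e_1 & \cdots & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} e_r & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} 0 & \cdots & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} 0 \end{pmatrix} \]
which yields the claimed result. In the formula $e_j$ refers to the column vector of length $r$ which has a 1 in the $j^{th}$ position. This proves the lemma.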
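Theorem 3.48 Let $B$ be a $q \times p$ block matrix as in 3.31 and let $A$ be a $p \times n$ block matrix as in 3.32 such that $B_{is}$ is conformable with $A_{sj}$ and each product $B_{is}A_{sj}$ for $s = 1, \cdots, p$ is of the same size so they can be added. Then $BA$ can be obtained as a block matrix such that the $ij^{th}$ block is of the form
\[ \sum_s B_{is}A_{sj}. \quad (3.33) \]
Proof: From 3.30,
\[ B_{is}A_{sj} = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} \]
where here it is assumed $B_{is}$ is $r_i \times p_s$ and $A_{sj}$ is $p_s \times q_j$. The product involves the $s^{th}$ block in the $i^{th}$ row of blocks for $B$ and the $s^{th}$ block in the $j^{th}$ column of $A$. Thus there are the same number of rows above the $I_{p_s \times p_s}$ as there are columns to the left of $I_{p_s \times p_s}$ in those two inside matrices. Then from Lemma 3.47
\[ \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & I_{p_s \times p_s} & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]
Since the blocks of small identity matrices do not overlap,
\[ \sum_s \begin{pmatrix} 0 & 0 & 0 \\ 0 & I_{p_s \times p_s} & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} I_{p_1 \times p_1} & & 0 \\ & \ddots & \\ 0 & & I_{p_p \times p_p} \end{pmatrix} = I \]
and so
\[ \sum_s B_{is}A_{sj} = \sum_s \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} \]
\[ = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B I A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} BA \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix}. \]
Hence the $ij^{th}$ block of $BA$ equals the formal multiplication according to matrix multiplication,
\[ \sum_s B_{is}A_{sj}. \]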
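Example 3.49 Let an $n \times n$ matrix have the form
\[ A = \begin{pmatrix} a & b \\ c & P \end{pmatrix} \]
where $P$ is $(n-1) \times (n-1)$. Multiply it by
\[ B = \begin{pmatrix} p & q \\ r & Q \end{pmatrix} \]
where $B$ is also an $n \times n$ matrix and $Q$ is $(n-1) \times (n-1)$.
You use block multiplication
\[ \begin{pmatrix} a & b \\ c & P \end{pmatrix} \begin{pmatrix} p & q \\ r & Q \end{pmatrix} = \begin{pmatrix} ap + br & aq + bQ \\ pc + Pr & cq + PQ \end{pmatrix}. \]
Note that this all makes sense. For example, $b$ is $1 \times (n-1)$ and $r$ is $(n-1) \times 1$ so $br$ is a $1 \times 1$. Similar considerations apply to the other blocks.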
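The partition in this example can be checked directly for $n = 3$. The following is a small sketch with hypothetical helper names (`split`, `join`) and made-up integer matrices; it multiplies blockwise and compares the result with the ordinary product.

```python
def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def add(x, y):
    """Entrywise sum of two matrices of the same size."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]

def split(m):
    """Partition an n x n matrix as [[a, b], [c, P]] with a of size 1 x 1."""
    a = [m[0][:1]]
    b = [m[0][1:]]
    c = [row[:1] for row in m[1:]]
    p = [row[1:] for row in m[1:]]
    return a, b, c, p

def join(top_left, top_right, bottom_left, bottom_right):
    """Reassemble a matrix from its four blocks."""
    top = [r1 + r2 for r1, r2 in zip(top_left, top_right)]
    bottom = [r1 + r2 for r1, r2 in zip(bottom_left, bottom_right)]
    return top + bottom

# Hypothetical sample matrices.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
a, b, c, P = split(A)
p, q, r, Q = split(B)
blockwise = join(add(matmul(a, p), matmul(b, r)), add(matmul(a, q), matmul(b, Q)),
                 add(matmul(c, p), matmul(P, r)), add(matmul(c, q), matmul(P, Q)))
direct = matmul(A, B)
```

The lower left block is computed as `matmul(c, p)` plus `matmul(P, r)`; since $p$ is $1 \times 1$ this agrees with the term $pc$ written in the example.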
Here is an interesting and significant application of block multiplication. In this theorem, $p_M(t)$ denotes the characteristic polynomial, $\det(tI - M)$. Thus the zeros of this polynomial are the eigenvalues of the matrix $M$.
Theorem 3.50 Let $A$ be an $m \times n$ matrix and let $B$ be an $n \times m$ matrix for $m \le n$. Then
\[ p_{BA}(t) = t^{n-m} p_{AB}(t), \]
so the eigenvalues of $BA$ and $AB$ are the same including multiplicities except that $BA$ has $n - m$ extra zero eigenvalues.
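Proof: Use block multiplication to write
\[ \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix} \]
\[ \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix}. \]
Therefore,
\[ \begin{pmatrix} I & A \\ 0 & I \end{pmatrix}^{-1} \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}. \]
Since the two matrices above are similar it follows that
\[ \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix} \text{ and } \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \]
have the same characteristic polynomials. Therefore, noting that $BA$ is an $n \times n$ matrix and $AB$ is an $m \times m$ matrix,
\[ t^m \det(tI - BA) = t^n \det(tI - AB) \]
and so $\det(tI - BA) = p_{BA}(t) = t^{n-m}\det(tI - AB) = t^{n-m}p_{AB}(t)$. This proves the theorem.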
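Theorem 3.50 can be illustrated with $m = 2$ and $n = 3$. The sketch below is only an illustration: the function `char_poly` computes the coefficients of $\det(tI - M)$ by the Faddeev-LeVerrier recursion, which is an assumption of this sketch and not a method from the notes, and the sample matrices are made up.

```python
from fractions import Fraction

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def char_poly(a):
    """Coefficients [c_0, ..., c_n] of det(tI - A), lowest power first,
    via the Faddeev-LeVerrier recursion (an assumption of this sketch)."""
    n = len(a)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        m = [[am[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        am = matmul(a, m)
        c[n - k] = -sum(am[i][i] for i in range(n)) / k
    return c

# Hypothetical sample: A is 2 x 3 and B is 3 x 2.
A = [[1, 2, 0], [0, 1, 1]]
B = [[1, 0], [2, 1], [0, 3]]
p_AB = char_poly(matmul(A, B))   # polynomial of degree 2
p_BA = char_poly(matmul(B, A))   # polynomial of degree 3
```

Multiplying a polynomial by $t^{n-m}$ shifts its coefficient list by $n - m$ places, so the theorem says `p_BA` is `p_AB` with one zero coefficient placed in front.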
3.8 Schur's Theorem
Every matrix is related to an upper triangular matrix in a particularly significant way. This is Schur's theorem and it is the most important theorem in the spectral theory of matrices.
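Lemma 3.51 Let $\{x_1, \cdots, x_n\}$ be a basis for $\mathbb{F}^n$. Then there exists an orthonormal basis for $\mathbb{F}^n$, $\{u_1, \cdots, u_n\}$, which has the property that for each $k \le n$,
\[ \operatorname{span}(x_1, \cdots, x_k) = \operatorname{span}(u_1, \cdots, u_k). \]
Proof: Let $\{x_1, \cdots, x_n\}$ be a basis for $\mathbb{F}^n$. Let $u_1 \equiv x_1/|x_1|$. Thus for $k = 1$, $\operatorname{span}(u_1) = \operatorname{span}(x_1)$ and $\{u_1\}$ is an orthonormal set. Now suppose for some $k < n$, $u_1, \cdots, u_k$ have been chosen such that $(u_j \cdot u_l) = \delta_{jl}$ and $\operatorname{span}(x_1, \cdots, x_k) = \operatorname{span}(u_1, \cdots, u_k)$. Then define
\[ u_{k+1} \equiv \frac{x_{k+1} - \sum_{j=1}^{k}(x_{k+1} \cdot u_j)\, u_j}{\left| x_{k+1} - \sum_{j=1}^{k}(x_{k+1} \cdot u_j)\, u_j \right|}, \quad (3.34) \]
where the denominator is not equal to zero because the $x_j$ form a basis and so
\[ x_{k+1} \notin \operatorname{span}(x_1, \cdots, x_k) = \operatorname{span}(u_1, \cdots, u_k). \]
Thus by induction,
\[ u_{k+1} \in \operatorname{span}(u_1, \cdots, u_k, x_{k+1}) = \operatorname{span}(x_1, \cdots, x_k, x_{k+1}). \]
Also, $x_{k+1} \in \operatorname{span}(u_1, \cdots, u_k, u_{k+1})$ which is seen easily by solving 3.34 for $x_{k+1}$ and it follows
\[ \operatorname{span}(x_1, \cdots, x_k, x_{k+1}) = \operatorname{span}(u_1, \cdots, u_k, u_{k+1}). \]
If $l \le k$, then letting $C$ denote the reciprocal of the denominator in 3.34,
\[ (u_{k+1} \cdot u_l) = C\left( (x_{k+1} \cdot u_l) - \sum_{j=1}^{k}(x_{k+1} \cdot u_j)(u_j \cdot u_l) \right) \]
\[ = C\left( (x_{k+1} \cdot u_l) - \sum_{j=1}^{k}(x_{k+1} \cdot u_j)\,\delta_{lj} \right) = C\left( (x_{k+1} \cdot u_l) - (x_{k+1} \cdot u_l) \right) = 0. \]
The vectors $\{u_j\}_{j=1}^{n}$ generated in this way are therefore an orthonormal basis because each vector has unit length.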
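The construction in 3.34 translates almost line by line into code. The following is a minimal sketch for real vectors only (in the complex case the dot product would need a conjugate); the function name `gram_schmidt`, the tolerance, and the sample basis are assumptions of the sketch.

```python
import math

def dot(x, y):
    """Real dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(xs):
    """Orthonormalize a linearly independent list of real vectors as in 3.34:
    subtract from x_{k+1} its components along u_1, ..., u_k, then normalize."""
    us = []
    for x in xs:
        r = list(x)
        for u in us:
            coeff = dot(x, u)
            r = [ri - coeff * ui for ri, ui in zip(r, u)]
        norm = math.sqrt(dot(r, r))
        if norm < 1e-12:
            raise ValueError("vectors are (numerically) linearly dependent")
        us.append([ri / norm for ri in r])
    return us

# Hypothetical sample basis of R^3.
basis = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
us = gram_schmidt(basis)
```

The span property of the lemma shows up as follows: each `basis[k]` is orthogonal to every later `us[j]` with `j > k`, because it lies in the span of the first `k + 1` orthonormal vectors.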
The process by which these vectors were generated is called the Gram Schmidt process. Recall the following definition.
Definition 3.52 An $n \times n$ matrix $U$ is unitary if $UU^* = I = U^*U$ where $U^*$ is defined to be the transpose of the conjugate of $U$.
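Theorem 3.53 Let $A$ be an $n \times n$ matrix. Then there exists a unitary matrix $U$ such that
\[ U^*AU = T, \quad (3.35) \]
where $T$ is an upper triangular matrix having the eigenvalues of $A$ on the main diagonal listed according to multiplicity as roots of the characteristic equation.
Proof: Let $v_1$ be a unit eigenvector for $A$. Then there exists $\lambda_1$ such that
\[ Av_1 = \lambda_1 v_1, \quad |v_1| = 1. \]
Extend $\{v_1\}$ to a basis and then use Lemma 3.51 to obtain $\{v_1, \cdots, v_n\}$, an orthonormal basis in $\mathbb{F}^n$. Let $U_0$ be a matrix whose $i^{th}$ column is $v_i$. Then from the above, it follows $U_0$ is unitary. Then $U_0^*AU_0$ is of the form
\[ \begin{pmatrix} \lambda_1 & * & \cdots & * \\ 0 & & & \\ \vdots & & A_1 & \\ 0 & & & \end{pmatrix} \]
where $A_1$ is an $(n-1) \times (n-1)$ matrix. Repeat the process for the matrix $A_1$ above. There exists a unitary matrix $\widetilde{U}_1$ such that $\widetilde{U}_1^* A_1 \widetilde{U}_1$ is of the form
\[ \begin{pmatrix} \lambda_2 & * & \cdots & * \\ 0 & & & \\ \vdots & & A_2 & \\ 0 & & & \end{pmatrix}. \]
Now let $U_1$ be the $n \times n$ matrix of the form
\[ \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix}. \]
This is also a unitary matrix because by block multiplication,
\[ \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix}^* \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1^* \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1^*\widetilde{U}_1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & I \end{pmatrix}. \]
Then using block multiplication, $U_1^*U_0^*AU_0U_1$ is of the form
\[ \begin{pmatrix} \lambda_1 & * & * & \cdots & * \\ 0 & \lambda_2 & * & \cdots & * \\ 0 & 0 & & & \\ \vdots & \vdots & & A_2 & \\ 0 & 0 & & & \end{pmatrix} \]
where $A_2$ is an $(n-2) \times (n-2)$ matrix. Continuing in this way, there exists a unitary matrix $U$ given as the product of the $U_i$ in the above construction such that
\[ U^*AU = T \]
where $T$ is some upper triangular matrix. Since the matrix is upper triangular, the characteristic polynomial is $\prod_{i=1}^{n}(\lambda - \lambda_i)$ where the $\lambda_i$ are the diagonal entries of $T$. Therefore, the $\lambda_i$ are the eigenvalues.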
What if $A$ is a real matrix and you only want to consider real unitary matrices?
Theorem 3.54 Let $A$ be a real $n \times n$ matrix. Then there exists a real unitary matrix $Q$ and a matrix $T$ of the form
\[ T = \begin{pmatrix} P_1 & \cdots & * \\ & \ddots & \vdots \\ 0 & & P_r \end{pmatrix} \quad (3.36) \]
where $P_i$ equals either a real $1 \times 1$ matrix or $P_i$ equals a real $2 \times 2$ matrix having two complex eigenvalues of $A$ such that $Q^TAQ = T$. The matrix $T$ is called the real Schur form of the matrix $A$.
Proof: Suppose
\[ Av_1 = \lambda_1 v_1, \quad |v_1| = 1 \]
where $\lambda_1$ is real. Then let $\{v_1, \cdots, v_n\}$ be an orthonormal basis of vectors in $\mathbb{R}^n$. Let $Q_0$ be a matrix whose $i^{th}$ column is $v_i$. Then $Q_0^TAQ_0$ is of the form
\[ \begin{pmatrix} \lambda_1 & * & \cdots & * \\ 0 & & & \\ \vdots & & A_1 & \\ 0 & & & \end{pmatrix} \]
where $A_1$ is a real $(n-1) \times (n-1)$ matrix. This is just like the proof of Theorem 3.53 up to this point.
Now in case $\lambda_1 = \alpha + i\beta$, it follows since $A$ is real that $v_1 = z_1 + iw_1$ and that $\overline{v}_1 = z_1 - iw_1$ is an eigenvector for the eigenvalue $\alpha - i\beta$. Here $z_1$ and $w_1$ are real vectors. It is clear that $\{z_1, w_1\}$ is an independent set of vectors in $\mathbb{R}^n$. Indeed, $\{v_1, \overline{v}_1\}$ is an independent set and it follows $\operatorname{span}(v_1, \overline{v}_1) = \operatorname{span}(z_1, w_1)$. Now using the Gram Schmidt theorem in $\mathbb{R}^n$, there exists $\{u_1, u_2\}$, an orthonormal set of real vectors such that $\operatorname{span}(u_1, u_2) = \operatorname{span}(v_1, \overline{v}_1)$. Now let $\{u_1, u_2, \cdots, u_n\}$ be an orthonormal basis in $\mathbb{R}^n$ and let $Q_0$ be a unitary matrix whose $i^{th}$ column is $u_i$. Then $Au_j$ are both in $\operatorname{span}(u_1, u_2)$ for $j = 1, 2$ and so $u_k^TAu_j = 0$ whenever $k \ge 3$. It follows that $Q_0^TAQ_0$ is of the form
\[ \begin{pmatrix} * & * & * & \cdots & * \\ * & * & * & \cdots & * \\ 0 & 0 & & & \\ \vdots & \vdots & & A_1 & \\ 0 & 0 & & & \end{pmatrix} \]
where $A_1$ is now an $(n-2) \times (n-2)$ matrix. In this case, find $\widetilde{Q}_1$, an $(n-2) \times (n-2)$ matrix, to put $A_1$ in an appropriate form as above and come up with $A_2$, either an $(n-4) \times (n-4)$ matrix or an $(n-3) \times (n-3)$ matrix. Then the only other difference is to let
\[ Q_1 = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & & & \\ \vdots & \vdots & & \widetilde{Q}_1 & \\ 0 & 0 & & & \end{pmatrix} \]
thus putting a $2 \times 2$ identity matrix in the upper left corner rather than a one. Repeating this process with the above modification for the case of a complex eigenvalue leads eventually to 3.36 where $Q$ is the product of real unitary matrices $Q_i$ above. Finally,
\[ \lambda I - T = \begin{pmatrix} \lambda I_1 - P_1 & \cdots & * \\ & \ddots & \vdots \\ 0 & & \lambda I_r - P_r \end{pmatrix} \]
where $I_k$ is the $2 \times 2$ identity matrix in the case that $P_k$ is $2 \times 2$ and is the number 1 in the case where $P_k$ is a $1 \times 1$ matrix. Now, it follows that $\det(\lambda I - T) = \prod_{k=1}^{r} \det(\lambda I_k - P_k)$. Therefore, $\lambda$ is an eigenvalue of $T$ if and only if it is an eigenvalue of some $P_k$. This proves the theorem since the eigenvalues of $T$ are the same as those of $A$ because they have the same characteristic polynomial due to the similarity of $A$ and $T$.
Definition 3.55 When a linear transformation $A$, mapping a linear space $V$ to $V$, has a basis of eigenvectors, the linear transformation is called non defective. Otherwise it is called defective. An $n \times n$ matrix $A$ is called normal if $AA^* = A^*A$. An important class of normal matrices is that of the Hermitian or self adjoint matrices. An $n \times n$ matrix $A$ is self adjoint or Hermitian if $A = A^*$.
The next lemma is the basis for concluding that every normal matrix is unitarily similar to a diagonal matrix.
Lemma 3.56 If $T$ is upper triangular and normal, then $T$ is a diagonal matrix.
Proof: Since $T$ is normal, $T^*T = TT^*$. Writing this in terms of components and using the description of the adjoint as the transpose of the conjugate, yields the following for the $ik^{th}$ entry of $T^*T = TT^*$.
\[ \sum_j t_{ij}t^*_{jk} = \sum_j t_{ij}\overline{t_{kj}} = \sum_j t^*_{ij}t_{jk} = \sum_j \overline{t_{ji}}\,t_{jk}. \]
Now use the fact that $T$ is upper triangular and let $i = k = 1$ to obtain the following from the above.
\[ \sum_j |t_{1j}|^2 = \sum_j |t_{j1}|^2 = |t_{11}|^2 \]
You see, $t_{j1} = 0$ unless $j = 1$ due to the assumption that $T$ is upper triangular. This shows $T$ is of the form
\[ \begin{pmatrix} * & 0 & \cdots & 0 \\ 0 & * & \cdots & * \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & * \end{pmatrix}. \]
Now do the same thing only this time take $i = k = 2$ and use the result just established. Thus, from the above,
\[ \sum_j |t_{2j}|^2 = \sum_j |t_{j2}|^2 = |t_{22}|^2, \]
showing that $t_{2j} = 0$ if $j > 2$ which means $T$ has the form
\[ \begin{pmatrix} * & 0 & 0 & \cdots & 0 \\ 0 & * & 0 & \cdots & 0 \\ 0 & 0 & * & \cdots & * \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ 0 & 0 & 0 & 0 & * \end{pmatrix}. \]
Next let $i = k = 3$ and obtain that $T$ looks like a diagonal matrix in so far as the first 3 rows and columns are concerned. Continuing in this way it follows $T$ is a diagonal matrix.
Theorem 3.57 Let $A$ be a normal matrix. Then there exists a unitary matrix $U$ such that $U^*AU$ is a diagonal matrix.
Proof: From Theorem 3.53 there exists a unitary matrix $U$ such that $U^*AU$ equals an upper triangular matrix. The theorem is now proved if it is shown that the property of being normal is preserved under unitary similarity transformations. That is, verify that if $A$ is normal and if $B = U^*AU$, then $B$ is also normal. But this is easy.
\[ B^*B = U^*A^*UU^*AU = U^*A^*AU = U^*AA^*U = U^*AUU^*A^*U = BB^*. \]
Therefore, $U^*AU$ is a normal and upper triangular matrix and by Lemma 3.56 it must be a diagonal matrix. This proves the theorem.
Corollary 3.58 If $A$ is Hermitian, then all the eigenvalues of $A$ are real and there exists an orthonormal basis of eigenvectors.
Proof: Since $A$ is normal, there exists unitary $U$ such that $U^*AU = D$, a diagonal matrix whose diagonal entries are the eigenvalues of $A$. Therefore, $D^* = U^*A^*U = U^*AU = D$ showing $D$ is real.
Finally, let
\[ U = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \]
where the $u_i$ denote the columns of $U$ and
\[ D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}. \]
The equation $U^*AU = D$ implies
\[ AU = \begin{pmatrix} Au_1 & Au_2 & \cdots & Au_n \end{pmatrix} = UD = \begin{pmatrix} \lambda_1u_1 & \lambda_2u_2 & \cdots & \lambda_nu_n \end{pmatrix} \]
where the entries denote the columns of $AU$ and $UD$ respectively. Therefore, $Au_i = \lambda_iu_i$ and since the matrix is unitary, the $ij^{th}$ entry of $U^*U$ equals $\delta_{ij}$ and so
\[ \delta_{ij} = \overline{u}_i^{\,T}u_j = \overline{u_i^T\overline{u}_j} = \overline{u_i \cdot u_j}. \]
This proves the corollary because it shows the vectors $\{u_i\}$ form an orthonormal basis.
Corollary 3.59 If $A$ is a real symmetric matrix, then $A$ is Hermitian and there exists a real unitary matrix $U$ such that $U^TAU = D$ where $D$ is a diagonal matrix.
Proof: This follows from Theorem 3.54 and Corollary 3.58.
3.9 The Right Polar Decomposition
The right polar decomposition involves writing a matrix as a product of two other
matrices, one which preserves distances and the other which stretches and distorts.
First here are some lemmas.
Lemma 3.60 Let $A$ be a Hermitian matrix such that all its eigenvalues are nonnegative. Then there exists a Hermitian matrix $A^{1/2}$ such that $A^{1/2}$ has all nonnegative eigenvalues and $\left(A^{1/2}\right)^2 = A$.
Proof: Since $A$ is Hermitian, there exists a diagonal matrix $D$ having all real nonnegative entries and a unitary matrix $U$ such that $A = U^*DU$. Then denote by $D^{1/2}$ the matrix which is obtained by replacing each diagonal entry of $D$ with its square root. Thus $D^{1/2}D^{1/2} = D$. Then define
\[ A^{1/2} \equiv U^*D^{1/2}U. \]
Then
\[ \left(A^{1/2}\right)^2 = U^*D^{1/2}UU^*D^{1/2}U = U^*DU = A. \]
Since $D^{1/2}$ is real,
\[ \left(U^*D^{1/2}U\right)^* = U^*\left(D^{1/2}\right)^*\left(U^*\right)^* = U^*D^{1/2}U \]
so $A^{1/2}$ is Hermitian. This proves the lemma.
There is also a useful observation about orthonormal sets of vectors which is stated in the next lemma.
Lemma 3.61 Suppose $\{x_1, x_2, \cdots, x_r\}$ is an orthonormal set of vectors. Then if $c_1, \cdots, c_r$ are scalars,
\[ \left| \sum_{k=1}^{r} c_kx_k \right|^2 = \sum_{k=1}^{r} |c_k|^2. \]
Proof: This follows from the definition. From the properties of the dot product and using the fact that the given set of vectors is orthonormal,
\[ \left| \sum_{k=1}^{r} c_kx_k \right|^2 = \left( \sum_{k=1}^{r} c_kx_k, \sum_{j=1}^{r} c_jx_j \right) = \sum_{k,j} c_k\overline{c_j}\,(x_k, x_j) = \sum_{k=1}^{r} |c_k|^2. \]
Next it is helpful to recall the Gram Schmidt algorithm and observe a certain property stated in the next lemma.
Lemma 3.62 Suppose $\{w_1, \cdots, w_r, v_{r+1}, \cdots, v_p\}$ is a linearly independent set of vectors such that $\{w_1, \cdots, w_r\}$ is an orthonormal set of vectors. Then when the Gram Schmidt process is applied to the vectors in the given order, it will not change any of the $w_1, \cdots, w_r$.
Proof: Let $\{u_1, \cdots, u_p\}$ be the orthonormal set delivered by the Gram Schmidt process. Then $u_1 = w_1$ because by definition, $u_1 \equiv w_1/|w_1| = w_1$. Now suppose $u_j = w_j$ for all $j \le k \le r$. Then if $k < r$, consider the definition of $u_{k+1}$.
\[ u_{k+1} \equiv \frac{w_{k+1} - \sum_{j=1}^{k}(w_{k+1}, u_j)\, u_j}{\left| w_{k+1} - \sum_{j=1}^{k}(w_{k+1}, u_j)\, u_j \right|} \]
By induction, $u_j = w_j$ for $j \le k$, so each $(w_{k+1}, u_j) = 0$ and this reduces to $w_{k+1}/|w_{k+1}| = w_{k+1}$. This proves the lemma.
This lemma immediately implies the following lemma.
Lemma 3.63 Let $V$ be a subspace of dimension $p$ and let $\{w_1, \cdots, w_r\}$ be an orthonormal set of vectors in $V$. Then this orthonormal set of vectors may be extended to an orthonormal basis for $V$,
\[ \{w_1, \cdots, w_r, y_{r+1}, \cdots, y_p\}. \]
Proof: First extend the given linearly independent set $\{w_1, \cdots, w_r\}$ to a basis for $V$ and then apply the Gram Schmidt theorem to the resulting basis. Since $\{w_1, \cdots, w_r\}$ is orthonormal it follows from Lemma 3.62 the result is of the desired form, an orthonormal basis extending $\{w_1, \cdots, w_r\}$. This proves the lemma.
Here is another lemma about preserving distance.
Lemma 3.64 Suppose $R$ is an $m \times n$ matrix with $m \ge n$ and $R$ preserves distances. Then $R^*R = I$.
Proof: Since $R$ preserves distances, $|Rx| = |x|$ for every $x$. Therefore from the axioms of the dot product,
\[ |x|^2 + |y|^2 + (x, y) + (y, x) = |x + y|^2 = (R(x+y), R(x+y)) \]
\[ = (Rx, Rx) + (Ry, Ry) + (Rx, Ry) + (Ry, Rx) = |x|^2 + |y|^2 + (R^*Rx, y) + (y, R^*Rx) \]
and so for all $x, y$,
\[ (R^*Rx - x, y) + (y, R^*Rx - x) = 0. \]
Hence for all $x, y$,
\[ \operatorname{Re}\,(R^*Rx - x, y) = 0. \]
Now for $x, y$ given, choose $\alpha \in \mathbb{C}$ such that
\[ \alpha\,(R^*Rx - x, y) = |(R^*Rx - x, y)|. \]
Then
\[ 0 = \operatorname{Re}\,(R^*Rx - x, \overline{\alpha}y) = \operatorname{Re}\,\alpha\,(R^*Rx - x, y) = |(R^*Rx - x, y)|. \]
Thus $|(R^*Rx - x, y)| = 0$ for all $x, y$ because the given $x, y$ were arbitrary. Let $y = R^*Rx - x$ to conclude that for all $x$,
\[ R^*Rx - x = 0 \]
which says $R^*R = I$ since $x$ is arbitrary. This proves the lemma.
With this preparation, here is the big theorem about the right polar decomposition.
Theorem 3.65 Let $F$ be an $m \times n$ matrix where $m \ge n$. Then there exists a Hermitian $n \times n$ matrix $U$ which has all nonnegative eigenvalues and an $m \times n$ matrix $R$ which preserves distances and satisfies $R^*R = I$ such that
\[ F = RU. \]
Proof: Consider $F^*F$. This is a Hermitian matrix because
\[ (F^*F)^* = F^*(F^*)^* = F^*F. \]
Also the eigenvalues of the $n \times n$ matrix $F^*F$ are all nonnegative. This is because if $x$ is an eigenvector with eigenvalue $\lambda$,
\[ \lambda\,(x, x) = (F^*Fx, x) = (Fx, Fx) \ge 0. \]
Therefore, by Lemma 3.60, there exists an $n \times n$ Hermitian matrix $U$ having all nonnegative eigenvalues such that
\[ U^2 = F^*F. \]
Consider the subspace $U(\mathbb{F}^n)$. Let $\{Ux_1, \cdots, Ux_r\}$ be an orthonormal basis for $U(\mathbb{F}^n) \subseteq \mathbb{F}^n$. Note that $U(\mathbb{F}^n)$ might not be all of $\mathbb{F}^n$. Using Lemma 3.63, extend to an orthonormal basis for all of $\mathbb{F}^n$,
\[ \{Ux_1, \cdots, Ux_r, y_{r+1}, \cdots, y_n\}. \]
Next observe that $\{Fx_1, \cdots, Fx_r\}$ is also an orthonormal set of vectors in $\mathbb{F}^m$. This is because
\[ (Fx_k, Fx_j) = (F^*Fx_k, x_j) = \left(U^2x_k, x_j\right) = (Ux_k, U^*x_j) = (Ux_k, Ux_j) = \delta_{jk}. \]
Therefore, from Lemma 3.63 again, this orthonormal set of vectors can be extended to an orthonormal basis for $\mathbb{F}^m$,
\[ \{Fx_1, \cdots, Fx_r, z_{r+1}, \cdots, z_m\}. \]
Thus there are at least as many $z_k$ as there are $y_j$. Now for $x \in \mathbb{F}^n$, since $\{Ux_1, \cdots, Ux_r, y_{r+1}, \cdots, y_n\}$ is an orthonormal basis for $\mathbb{F}^n$, there exist unique scalars $c_1, \cdots, c_r, d_{r+1}, \cdots, d_n$ such that
\[ x = \sum_{k=1}^{r} c_kUx_k + \sum_{k=r+1}^{n} d_ky_k. \]
Define
\[ Rx \equiv \sum_{k=1}^{r} c_kFx_k + \sum_{k=r+1}^{n} d_kz_k. \quad (3.37) \]
Then also there exist scalars $b_k$ such that
\[ Ux = \sum_{k=1}^{r} b_kUx_k \]
and so from 3.37, applied to $Ux$ in place of $x$,
\[ RUx = \sum_{k=1}^{r} b_kFx_k = F\left( \sum_{k=1}^{r} b_kx_k \right). \]
Is $F\left( \sum_{k=1}^{r} b_kx_k \right) = F(x)$?
\[ \left( F\left( \sum_{k=1}^{r} b_kx_k \right) - F(x),\ F\left( \sum_{k=1}^{r} b_kx_k \right) - F(x) \right) \]
\[ = \left( (F^*F)\left( \sum_{k=1}^{r} b_kx_k - x \right),\ \left( \sum_{k=1}^{r} b_kx_k - x \right) \right) \]
\[ = \left( U^2\left( \sum_{k=1}^{r} b_kx_k - x \right),\ \left( \sum_{k=1}^{r} b_kx_k - x \right) \right) \]
\[ = \left( U\left( \sum_{k=1}^{r} b_kx_k - x \right),\ U\left( \sum_{k=1}^{r} b_kx_k - x \right) \right) \]
\[ = \left( \sum_{k=1}^{r} b_kUx_k - Ux,\ \sum_{k=1}^{r} b_kUx_k - Ux \right) = 0 \]
Therefore, $F\left( \sum_{k=1}^{r} b_kx_k \right) = F(x)$ and this shows
\[ RUx = Fx. \]
From 3.37 and Lemma 3.61, $R$ preserves distances. Therefore, by Lemma 3.64, $R^*R = I$. This proves the theorem.
3.10 The Space $\mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$
Definition 3.66 The symbol $\mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$ will denote the set of linear transformations mapping $\mathbb{F}^n$ to $\mathbb{F}^m$. Thus $L \in \mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$ means that for $\alpha, \beta$ scalars and $x, y$ vectors in $\mathbb{F}^n$,
\[ L(\alpha x + \beta y) = \alpha L(x) + \beta L(y). \]
It is convenient to give a norm for the elements of $\mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$. This will allow the consideration of questions such as whether a function having values in this space of linear transformations is continuous.
3.11 The Operator Norm
How do you measure the distance between linear transformations defined on $\mathbb{F}^n$? It turns out there are many ways to do this but I will give the most common one here.
Definition 3.67 $\mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$ denotes the space of linear transformations mapping $\mathbb{F}^n$ to $\mathbb{F}^m$. For $A \in \mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$, the operator norm is defined by
\[ \|A\| \equiv \max\left\{ |Ax|_{\mathbb{F}^m} : |x|_{\mathbb{F}^n} \le 1 \right\} < \infty. \]
Theorem 3.68 Denote by $|\cdot|$ the norm on either $\mathbb{F}^n$ or $\mathbb{F}^m$. Then $\mathcal{L}(\mathbb{F}^n, \mathbb{F}^m)$ with this operator norm is a complete normed linear space of dimension $nm$ with
\[ |Ax| \le \|A\|\,|x|. \]
Here completeness means that every Cauchy sequence converges.
Proof: It is necessary to show the norm dened on L(F
n
, F
m
) really is a norm.
This means it is necessary to verify
[[A[[ 0 and equals zero if and only if A = 0.
For a scalar,
[[A[[ = [[ [[A[[ ,
and for A, B L(F
n
, F
m
) ,
[[A+B[[ [[A[[ +[[B[[
The rst two properties are obvious but you should verify them. It remains to verify
the norm is well dened and also to verify the triangle inequality above. First if
[x[ 1, and (A
ij
) is the matrix of the linear transformation with respect to the
usual basis vectors, then
[[A[[ = max
_
_
_
_
i
[(Ax)
i
[
2
_
1/2
: [x[ 1
_
_
_
= max
_
_
_
_
_
j
A
ij
x
j
2
_
_
_
1/2
: [x[ 1
_
_
which is a nite number by the extreme value theorem.
It is clear that a basis for L(F
n
, F
m
) consists of linear transformations whose
matrices are of the form E
ij
where E
ij
consists of the mn matrix having all zeros
except for a 1 in the ij
th
position. In eect, this considers L(F
n
, F
m
) as F
nm
. Think
of the mn matrix as a long vector folded up.
If x ,= 0,
[Ax[
1
[x[
=
A
x
[x[
[[A[[ (3.38)
It only remains to verify completeness. Suppose then that $\{A_k\}$ is a Cauchy sequence in $\mathcal{L}(\mathbb{F}^n,\mathbb{F}^m)$. Then from 3.38 $\{A_k x\}$ is a Cauchy sequence for each $x \in \mathbb{F}^n$. This follows because
\[ |A_k x - A_l x| \le \|A_k - A_l\|\,|x| \]
which converges to 0 as $k, l \to \infty$. Therefore, by completeness of $\mathbb{F}^m$, there exists $Ax$, the name of the thing to which the sequence $\{A_k x\}$ converges, such that
\[ \lim_{k\to\infty} A_k x = Ax. \]
Then $A$ is linear because
\begin{align*}
A(ax+by) &\equiv \lim_{k\to\infty} A_k(ax+by) \\
&= \lim_{k\to\infty} (aA_k x + bA_k y) \\
&= a\lim_{k\to\infty} A_k x + b\lim_{k\to\infty} A_k y \\
&= aAx + bAy.
\end{align*}
By the first part of this argument, $\|A\| < \infty$ and so $A \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^m)$. This proves the theorem.
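As a numerical illustration of the operator norm and of the inequality $|Ax| \le \|A\|\,|x|$, here is a minimal Python sketch for real matrices only. It estimates $\|A\|$ by power iteration on $A^T A$, which is a technique chosen for this example and not something used in the proof above. The helper names `mat_vec`, `euclid` and `operator_norm` are hypothetical, and the random starting vector is assumed not to be orthogonal to the dominant eigenvector of $A^T A$.

```python
import math
import random

def mat_vec(A, x):
    """Return the product Ax for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def euclid(v):
    """Euclidean norm |v|."""
    return math.sqrt(sum(c * c for c in v))

def operator_norm(A, iters=200, seed=0):
    """Estimate ||A|| = max{|Ax| : |x| <= 1} by power iteration on A^T A.

    Heuristic: assumes the random start vector is not orthogonal to the
    dominant eigenvector of A^T A, which holds with probability one.
    """
    rng = random.Random(seed)
    n = len(A[0])
    At = [list(col) for col in zip(*A)]          # transpose of A
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    scale = euclid(x)
    x = [c / scale for c in x]
    for _ in range(iters):
        y = mat_vec(At, mat_vec(A, x))           # y = A^T A x
        norm_y = euclid(y)
        if norm_y == 0.0:                        # A^T A x = 0, e.g. A = 0
            return 0.0
        x = [c / norm_y for c in y]
    return euclid(mat_vec(A, x))                 # |Ax| for the unit vector x

A = [[3.0, 0.0], [4.0, 5.0]]
norm_A = operator_norm(A)                        # exact value is sqrt(45)

# Spot-check |Ax| <= ||A|| |x| on random vectors.
rng = random.Random(1)
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
bound_holds = all(
    euclid(mat_vec(A, x)) <= norm_A * euclid(x) + 1e-9 for x in samples
)
```

For this particular matrix, $A^T A$ has eigenvalues 45 and 5, so the estimate should agree with $\sqrt{45}$ up to rounding.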
Proposition 3.69 Let $A(x) \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^m)$ for each $x \in U \subseteq \mathbb{F}^p$. Then letting $(A_{ij}(x))$ denote the matrix of $A(x)$ with respect to the standard basis, it follows $A_{ij}$ is continuous at $x$ for each $i, j$ if and only if for all $\varepsilon > 0$, there exists a $\delta > 0$ such that if $|x - y| < \delta$, then $\|A(x) - A(y)\| < \varepsilon$. That is, $A$ is a continuous function having values in $\mathcal{L}(\mathbb{F}^n,\mathbb{F}^m)$ at $x$.
Proof: Suppose first the second condition holds. Then from the material on linear transformations,
\begin{align*}
|A_{ij}(x) - A_{ij}(y)| &= |e_i \cdot (A(x) - A(y))\,e_j| \\
&\le |e_i|\,|(A(x) - A(y))\,e_j| \\
&\le \|A(x) - A(y)\|.
\end{align*}
Therefore, the second condition implies the first.
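Now suppose the first condition holds. That is, each $A_{ij}$ is continuous at $x$. Let $|v| \le 1$.
\begin{align}
|(A(x) - A(y))(v)| &= \left( \sum_i \Big| \sum_j (A_{ij}(x) - A_{ij}(y))\,v_j \Big|^2 \right)^{1/2} \tag{3.39} \\
&\le \left( \sum_i \Big( \sum_j |A_{ij}(x) - A_{ij}(y)|\,|v_j| \Big)^2 \right)^{1/2}. \notag
\end{align}
By continuity of each $A_{ij}$, there exists a $\delta > 0$ such that for each $i, j$
\[ |A_{ij}(x) - A_{ij}(y)| < \frac{\varepsilon}{n\sqrt{m}} \]
whenever $|x - y| < \delta$. Then from 3.39, if $|x - y| < \delta$,
\[ |(A(x) - A(y))(v)| < \left( \sum_i \Big( \sum_j \frac{\varepsilon}{n\sqrt{m}}\,|v| \Big)^2 \right)^{1/2} \le \left( \sum_i \Big( \frac{\varepsilon}{\sqrt{m}} \Big)^2 \right)^{1/2} = \varepsilon. \]
This proves the proposition.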
The Frechet Derivative
Let $U$ be an open set in $\mathbb{F}^n$, and let $f : U \to \mathbb{F}^m$ be a function.
Definition 4.1 A function $g$ is $o(v)$ if
\[ \lim_{|v|\to 0} \frac{g(v)}{|v|} = 0 \tag{4.1} \]
A function $f : U \to \mathbb{F}^m$ is differentiable at $x \in U$ if there exists a linear transformation $L \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^m)$ such that
\[ f(x+v) = f(x) + Lv + o(v). \]
This linear transformation $L$ is the definition of $Df(x)$. This derivative is often called the Frechet derivative.
Usually no harm is occasioned by thinking of this linear transformation as its
matrix taken with respect to the usual basis vectors.
The definition 4.1 means that the error,
\[ f(x+v) - f(x) - Lv \]
converges to 0 faster than $|v|$. Thus the above definition is equivalent to saying
\[ \lim_{|v|\to 0} \frac{|f(x+v) - f(x) - Lv|}{|v|} = 0 \tag{4.2} \]
or equivalently,
\[ \lim_{y\to x} \frac{|f(y) - f(x) - Df(x)(y-x)|}{|y-x|} = 0. \tag{4.3} \]
Now it is clear this is just a generalization of the notion of the derivative of a function of one variable because in this more specialized situation,
\[ \lim_{|v|\to 0} \frac{|f(x+v) - f(x) - f'(x)\,v|}{|v|} = 0, \]
due to the definition which says
\[ f'(x) = \lim_{v\to 0} \frac{f(x+v) - f(x)}{v}. \]
For functions of $n$ variables, you can't define the derivative as the limit of a difference quotient like you can for a function of one variable because you can't divide by a vector. That is why there is a need for a more general definition.
The term o (v) is notation that is descriptive of the behavior in 4.1 and it is
only this behavior that is of interest. Thus, if t and k are constants,
o (v) = o (v) +o (v) , o (tv) = o (v) , ko (v) = o (v)
and other similar observations hold. The sloppiness built in to this notation is
useful because it ignores details which are not important. It may help to think of
o (v) as an adjective describing what is left over after approximating f (x +v) by
f (x) +Df (x) v.
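To see the $o(v)$ behavior numerically, the following Python sketch evaluates the ratio $|f(x+v) - f(x) - Df(x)v| / |v|$ for shrinking $v$. The sample map $f(x_1,x_2) = (x_1^2 x_2, \sin x_1 + x_2)$, its hand-computed Jacobian, and the names `jacobian` and `remainder_ratio` are assumptions made only for this illustration.

```python
import math

def f(x):
    """Sample map from R^2 to R^2 (an assumption for this illustration)."""
    x1, x2 = x
    return [x1 * x1 * x2, math.sin(x1) + x2]

def jacobian(x):
    """Matrix of Df(x) for the sample map, computed by hand."""
    x1, x2 = x
    return [[2.0 * x1 * x2, x1 * x1], [math.cos(x1), 1.0]]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def remainder_ratio(x, v):
    """Return |f(x+v) - f(x) - Df(x)v| / |v|."""
    J = jacobian(x)
    fx = f(x)
    fxv = f([a + b for a, b in zip(x, v)])
    Jv = [sum(J[i][j] * v[j] for j in range(2)) for i in range(2)]
    err = [fxv[i] - fx[i] - Jv[i] for i in range(2)]
    return norm(err) / norm(v)

x = [1.0, 2.0]
# Shrink v along a fixed direction and watch the ratio tend to zero.
ratios = [remainder_ratio(x, [t, -2.0 * t]) for t in (1e-1, 1e-2, 1e-3, 1e-4)]
```

Because this sample map is smooth, the ratios shrink roughly in proportion to $|v|$, which is stronger than the definition requires.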
Theorem 4.2 The derivative is well defined.
Proof: First note that for a fixed vector $v$, $o(tv) = o(t)$. Now suppose both $L_1$ and $L_2$ work in the above definition. Then let $v$ be any vector and let $t$ be a real scalar which is chosen small enough that $tv + x \in U$. Then
\[ f(x+tv) = f(x) + L_1 tv + o(tv), \qquad f(x+tv) = f(x) + L_2 tv + o(tv). \]
Therefore, subtracting these two yields $(L_2 - L_1)(tv) = o(tv) = o(t)$. Therefore, dividing by $t$ yields $(L_2 - L_1)(v) = \frac{o(t)}{t}$. Now let $t \to 0$ to conclude that $(L_2 - L_1)(v) = 0$. Since this is true for all $v$, it follows $L_2 = L_1$. This proves the theorem.
Lemma 4.3 Let $f$ be differentiable at $x$. Then $f$ is continuous at $x$ and in fact, there exists $K > 0$ such that whenever $|v|$ is small enough,
\[ |f(x+v) - f(x)| \le K|v|. \]
Proof: From the definition of the derivative, $f(x+v) - f(x) = Df(x)v + o(v)$. Let $|v|$ be small enough that $\frac{|o(v)|}{|v|} < 1$ so that $|o(v)| \le |v|$. Then for such $v$,
\begin{align*}
|f(x+v) - f(x)| &\le |Df(x)v| + |v| \\
&\le (\|Df(x)\| + 1)\,|v|.
\end{align*}
This proves the lemma with $K = \|Df(x)\| + 1$.
Theorem 4.4 (The chain rule) Let $U$ and $V$ be open sets, $U \subseteq \mathbb{F}^n$ and $V \subseteq \mathbb{F}^m$. Suppose $f : U \to V$ is differentiable at $x \in U$ and suppose $g : V \to \mathbb{F}^q$ is differentiable at $f(x) \in V$. Then $g \circ f$ is differentiable at $x$ and
\[ D(g \circ f)(x) = D(g(f(x)))\,D(f(x)). \]
Proof: This follows from a computation. Let $B(x,r) \subseteq U$ and let $r$ also be small enough that for $|v| \le r$, it follows that $f(x+v) \in V$. Such an $r$ exists because $f$ is continuous at $x$. For $|v| < r$, the definition of differentiability of $g$ and $f$ implies
\begin{align}
g(f(x+v)) - g(f(x)) &= Dg(f(x))\,(f(x+v) - f(x)) + o(f(x+v) - f(x)) \notag \\
&= Dg(f(x))\,[Df(x)v + o(v)] + o(f(x+v) - f(x)) \notag \\
&= D(g(f(x)))\,D(f(x))\,v + o(v) + o(f(x+v) - f(x)). \tag{4.4}
\end{align}
It remains to show $o(f(x+v) - f(x)) = o(v)$.
By Lemma 4.3, with $K$ given there, letting $\varepsilon > 0$, it follows that for $|v|$ small enough,
\[ |o(f(x+v) - f(x))| \le (\varepsilon/K)\,|f(x+v) - f(x)| \le (\varepsilon/K)\,K|v| = \varepsilon|v|. \]
Since $\varepsilon > 0$ is arbitrary, this shows $o(f(x+v) - f(x)) = o(v)$ because whenever $|v|$ is small enough,
\[ \frac{|o(f(x+v) - f(x))|}{|v|} \le \varepsilon. \]
By 4.4, this shows
\[ g(f(x+v)) - g(f(x)) = D(g(f(x)))\,D(f(x))\,v + o(v) \]
which proves the theorem.
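The chain rule can be checked numerically on a small example. The sketch below compares the product of two hand-computed Jacobians with a central-difference Jacobian of the composition. The maps `f` and `g`, the step size `h`, and the helper names `mat_mul` and `numerical_jacobian` are all assumptions for illustration; finite differences only approximate the derivative.

```python
def f(x):
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2]

def Df(x):
    """Matrix of Df(x), computed by hand for the sample f."""
    x1, x2 = x
    return [[x2, x1], [1.0, 2.0 * x2]]

def g(y):
    y1, y2 = y
    return [y1 ** 2 + y2, y1 * y2]

def Dg(y):
    """Matrix of Dg(y), computed by hand for the sample g."""
    y1, y2 = y
    return [[2.0 * y1, 1.0], [y2, y1]]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numerical_jacobian(F, x, h=1e-6):
    """Central-difference Jacobian; column j approximates dF/dx_j."""
    m = len(F(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        Fp, Fm = F(xp), F(xm)
        for i in range(m):
            J[i][j] = (Fp[i] - Fm[i]) / (2.0 * h)
    return J

x = [1.5, -0.5]
chain = mat_mul(Dg(f(x)), Df(x))                     # Dg(f(x)) Df(x)
numeric = numerical_jacobian(lambda z: g(f(z)), x)   # D(g o f)(x), approximately
max_gap = max(abs(chain[i][j] - numeric[i][j])
              for i in range(2) for j in range(2))
```

The order of the factors matters: the derivative of the outer map, evaluated at $f(x)$, multiplies the derivative of the inner map from the left.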
The derivative is a linear transformation. What is the matrix of this linear transformation taken with respect to the usual basis vectors? Let $e_i$ denote the vector of $\mathbb{F}^n$ which has a one in the $i^{th}$ entry and zeroes elsewhere. Then the matrix of the linear transformation is the matrix whose $i^{th}$ column is $Df(x)e_i$. What is this? Let $t \in \mathbb{R}$ such that $|t|$ is sufficiently small.
\begin{align*}
f(x+te_i) - f(x) &= Df(x)\,te_i + o(te_i) \\
&= Df(x)\,te_i + o(t).
\end{align*}
Then dividing by $t$ and taking a limit,
\[ Df(x)e_i = \lim_{t\to 0} \frac{f(x+te_i) - f(x)}{t} \equiv \frac{\partial f}{\partial x_i}(x). \]
Thus the matrix of $Df(x)$ with respect to the usual basis vectors is the matrix of the form
\[ \begin{pmatrix} f_{1,x_1}(x) & f_{1,x_2}(x) & \cdots & f_{1,x_n}(x) \\ \vdots & \vdots & & \vdots \\ f_{m,x_1}(x) & f_{m,x_2}(x) & \cdots & f_{m,x_n}(x) \end{pmatrix}. \]
As mentioned before, there is no harm in referring to this matrix as $Df(x)$ but it may also be referred to as $Jf(x)$.
This is summarized in the following theorem.
Theorem 4.5 Let $f : \mathbb{F}^n \to \mathbb{F}^m$ and suppose $f$ is differentiable at $x$. Then all the partial derivatives $\frac{\partial f_i(x)}{\partial x_j}$ exist and if $Jf(x)$ is the matrix of the linear transformation with respect to the standard basis vectors, then the $ij^{th}$ entry is given by $f_{i,j}$ or $\frac{\partial f_i}{\partial x_j}(x)$.
What if all the partial derivatives of $f$ exist? Does it follow that $f$ is differentiable? Consider the following function.
\[ f(x,y) = \begin{cases} \dfrac{xy}{x^2+y^2} & \text{if } (x,y) \ne (0,0) \\ 0 & \text{if } (x,y) = (0,0) \end{cases}. \]
Then from the definition of partial derivatives,
\[ \lim_{h\to 0} \frac{f(h,0) - f(0,0)}{h} = \lim_{h\to 0} \frac{0 - 0}{h} = 0 \]
and
\[ \lim_{h\to 0} \frac{f(0,h) - f(0,0)}{h} = \lim_{h\to 0} \frac{0 - 0}{h} = 0. \]
However $f$ is not even continuous at $(0,0)$ which may be seen by considering the behavior of the function along the line $y = x$ and along the line $x = 0$. By Lemma 4.3 this implies $f$ is not differentiable. Therefore, it is necessary to consider the correct definition of the derivative given above if you want to get a notion which generalizes the concept of the derivative of a function of one variable in such a way as to preserve continuity whenever the function is differentiable.
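The failure of continuity in this example is easy to observe directly. The short Python sketch below evaluates the difference quotients along the axes and the function values along the line $y = x$. The variable names are chosen only for this illustration.

```python
def f(x, y):
    """The function from the example: xy / (x^2 + y^2), with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

steps = [10.0 ** (-k) for k in range(1, 9)]

# Difference quotients defining the two partial derivatives at the origin.
quotients_x = [(f(h, 0.0) - f(0.0, 0.0)) / h for h in steps]
quotients_y = [(f(0.0, h) - f(0.0, 0.0)) / h for h in steps]

# Values along the line y = x, approaching the origin.
diagonal = [f(t, t) for t in steps]
```

Every difference quotient is zero, yet $f(t,t) = 1/2$ for all $t \ne 0$, so the values along the diagonal do not approach $f(0,0) = 0$.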
4.1 $C^1$ Functions
However, there are theorems which can be used to get differentiability of a function based on existence of the partial derivatives.
Definition 4.6 When all the partial derivatives exist and are continuous the function is called a $C^1$ function.
Because of Proposition 3.69 and Theorem 4.5 which identifies the entries of $Jf$ with the partial derivatives, the following definition is equivalent to the above.
Definition 4.7 Let $U \subseteq \mathbb{F}^n$ be an open set. Then $f : U \to \mathbb{F}^m$ is $C^1(U)$ if $f$ is differentiable and the mapping
\[ x \to Df(x), \]
is continuous as a function from $U$ to $\mathcal{L}(\mathbb{F}^n,\mathbb{F}^m)$.
The following is an important abstract generalization of the familiar concept of
partial derivative.
Definition 4.8 Let $g : U \subseteq \mathbb{F}^n \times \mathbb{F}^m \to \mathbb{F}^q$, where $U$ is an open set in $\mathbb{F}^n \times \mathbb{F}^m$. Denote an element of $\mathbb{F}^n \times \mathbb{F}^m$ by $(x,y)$ where $x \in \mathbb{F}^n$ and $y \in \mathbb{F}^m$. Then the map $x \to g(x,y)$ is a function from the open set in $\mathbb{F}^n$,
\[ \{x : (x,y) \in U\} \]
to $\mathbb{F}^q$. When this map is differentiable, its derivative is denoted by
\[ D_1 g(x,y), \text{ or sometimes by } D_x g(x,y). \]
Thus,
\[ g(x+v,y) - g(x,y) = D_1 g(x,y)\,v + o(v). \]
A similar definition holds for the symbol $D_y g$ or $D_2 g$. The special case seen in beginning calculus courses is where $g : U \to \mathbb{F}^q$ and
\[ g_{x_i}(x) \equiv \frac{\partial g(x)}{\partial x_i} \equiv \lim_{h\to 0} \frac{g(x+he_i) - g(x)}{h}. \]
The following theorem will be very useful in much of what follows. It is a version
of the mean value theorem.
Theorem 4.9 Suppose $U$ is an open subset of $\mathbb{F}^n$ and $f : U \to \mathbb{F}^m$ has the property that $Df(x)$ exists for all $x$ in $U$ and that $x + t(y-x) \in U$ for all $t \in [0,1]$. (The line segment joining the two points lies in $U$.) Suppose also that for all points on this line segment,
\[ \|Df(x + t(y-x))\| \le M. \]
Then
\[ |f(y) - f(x)| \le M|y-x|. \]
Proof: Let $\varepsilon > 0$ be given and let
\[ S \equiv \{ t \in [0,1] : \text{for all } s \in [0,t],\ |f(x+s(y-x)) - f(x)| \le (M+\varepsilon)\,s\,|y-x| \}. \]
Then $0 \in S$ and by continuity of $f$, it follows that if $t \equiv \sup S$, then $t \in S$ and if $t < 1$,
\[ |f(x+t(y-x)) - f(x)| = (M+\varepsilon)\,t\,|y-x|. \tag{4.5} \]
If $t < 1$, then there exists a sequence of positive numbers $\{h_k\}_{k=1}^{\infty}$ converging to 0 such that
\[ |f(x+(t+h_k)(y-x)) - f(x)| > (M+\varepsilon)(t+h_k)\,|y-x| \]
which implies that
\[ |f(x+(t+h_k)(y-x)) - f(x+t(y-x))| + |f(x+t(y-x)) - f(x)| > (M+\varepsilon)(t+h_k)\,|y-x|. \]
By 4.5, this inequality implies
\[ |f(x+(t+h_k)(y-x)) - f(x+t(y-x))| > (M+\varepsilon)\,h_k\,|y-x| \]
which yields upon dividing by $h_k$ and taking the limit as $h_k \to 0$,
\[ |Df(x+t(y-x))(y-x)| \ge (M+\varepsilon)\,|y-x|. \]
Now by the definition of the norm of a linear operator,
\[ M|y-x| \ge \|Df(x+t(y-x))\|\,|y-x| \ge |Df(x+t(y-x))(y-x)| \ge (M+\varepsilon)\,|y-x|, \]
a contradiction. Therefore, $t = 1$ and so
\[ |f(x+(y-x)) - f(x)| \le (M+\varepsilon)\,|y-x|. \]
Since $\varepsilon > 0$ is arbitrary, this proves the theorem.
The next theorem proves that if the partial derivatives exist and are continuous, then the function is differentiable.
Theorem 4.10 Let $g : U \subseteq \mathbb{F}^n \times \mathbb{F}^m \to \mathbb{F}^q$. Then $g$ is $C^1(U)$ if and only if $D_1 g$ and $D_2 g$ both exist and are continuous on $U$. In this case,
\[ Dg(x,y)(u,v) = D_1 g(x,y)\,u + D_2 g(x,y)\,v. \]
Proof: Suppose first that $g \in C^1(U)$. Then if $(x,y) \in U$,
\[ g(x+u,y) - g(x,y) = Dg(x,y)(u,0) + o(u). \]
Therefore, $D_1 g(x,y)\,u = Dg(x,y)(u,0)$. Then
\begin{align*}
|(D_1 g(x,y) - D_1 g(x',y'))(u)| &= |(Dg(x,y) - Dg(x',y'))(u,0)| \\
&\le \|Dg(x,y) - Dg(x',y')\|\,|(u,0)|.
\end{align*}
Therefore,
\[ \|D_1 g(x,y) - D_1 g(x',y')\| \le \|Dg(x,y) - Dg(x',y')\|. \]
A similar argument applies for $D_2 g$ and this proves the continuity of the function $(x,y) \to D_i g(x,y)$ for $i = 1, 2$. The formula follows from
\begin{align*}
Dg(x,y)(u,v) &= Dg(x,y)(u,0) + Dg(x,y)(0,v) \\
&\equiv D_1 g(x,y)\,u + D_2 g(x,y)\,v.
\end{align*}
Now suppose $D_1 g(x,y)$ and $D_2 g(x,y)$ exist and are continuous.
\begin{align}
g(x+u,y+v) - g(x,y) &= g(x+u,y+v) - g(x,y+v) + g(x,y+v) - g(x,y) \notag \\
&= g(x+u,y) - g(x,y) + g(x,y+v) - g(x,y) \notag \\
&\quad + [g(x+u,y+v) - g(x+u,y) - (g(x,y+v) - g(x,y))] \notag \\
&= D_1 g(x,y)\,u + D_2 g(x,y)\,v + o(v) + o(u) \notag \\
&\quad + [g(x+u,y+v) - g(x+u,y) - (g(x,y+v) - g(x,y))]. \tag{4.6}
\end{align}
Let $h(x,u) \equiv g(x+u,y+v) - g(x+u,y)$. Then the expression in $[\ ]$ is of the form
\[ h(x,u) - h(x,0). \]
Also
\[ D_2 h(x,u) = D_1 g(x+u,y+v) - D_1 g(x+u,y) \]
and so, by continuity of $(x,y) \to D_1 g(x,y)$,
\[ \|D_2 h(x,u)\| < \varepsilon \]
whenever $\|(u,v)\|$ is small enough. By Theorem 4.9, there exists $\delta > 0$ such that if $\|(u,v)\| < \delta$, the norm of the last term in 4.6 satisfies the inequality
\[ \|g(x+u,y+v) - g(x+u,y) - (g(x,y+v) - g(x,y))\| < \varepsilon\,\|u\|. \tag{4.7} \]
Therefore, this term is $o((u,v))$. It follows from 4.7 and 4.6 that
\begin{align*}
g(x+u,y+v) &= g(x,y) + D_1 g(x,y)\,u + D_2 g(x,y)\,v + o(u) + o(v) + o((u,v)) \\
&= g(x,y) + D_1 g(x,y)\,u + D_2 g(x,y)\,v + o((u,v)),
\end{align*}
showing that $Dg(x,y)$ exists and is given by
\[ Dg(x,y)(u,v) = D_1 g(x,y)\,u + D_2 g(x,y)\,v. \]
The continuity of $(x,y) \to Dg(x,y)$ follows from the continuity of $(x,y) \to D_i g(x,y)$. This proves the theorem.
Not surprisingly, it can be generalized to many more factors.
Definition 4.11 Let $g : U \subseteq \prod_{i=1}^{n} \mathbb{F}^{r_i} \to \mathbb{F}^q$, where $U$ is an open set. Then the map $x_i \to g(x)$ is a function from the open set in $\mathbb{F}^{r_i}$,
\[ \{x_i : x \in U\} \]
to $\mathbb{F}^q$. When this map is differentiable, its derivative is denoted by $D_i g(x)$. To aid in the notation, for $v \in \mathbb{F}^{r_i}$, let $\theta_i v \in \prod_{i=1}^{n} \mathbb{F}^{r_i}$ be the vector $(0, \cdots, v, \cdots, 0)$ where the $v$ is in the $i^{th}$ slot and for $v \in \prod_{i=1}^{n} \mathbb{F}^{r_i}$, let $v_i$ denote the entry in the $i^{th}$ slot of $v$. Thus by saying $x_i \to g(x)$ is differentiable is meant that for $v \in \mathbb{F}^{r_i}$ sufficiently small,
\[ g(x + \theta_i v) - g(x) = D_i g(x)\,v + o(v). \]
Here is a generalization of Theorem 4.10.
Theorem 4.12 Let $g$, $U$, $\prod_{i=1}^{n} \mathbb{F}^{r_i}$ be given as in Definition 4.11. Then $g$ is $C^1(U)$ if and only if $D_i g$ exists and is continuous on $U$ for each $i$. In this case,
\[ Dg(x)(v) = \sum_k D_k g(x)\,v_k \tag{4.8} \]
Proof: The only if part of the proof is left for you. Suppose then that $D_i g$ exists and is continuous for each $i$. Note that $\sum_{j=1}^{k} \theta_j v_j = (v_1, \cdots, v_k, 0, \cdots, 0)$. Thus $\sum_{j=1}^{n} \theta_j v_j = v$ and define $\sum_{j=1}^{0} \theta_j v_j \equiv 0$. Therefore,
\[ g(x+v) - g(x) = \sum_{k=1}^{n} \left[ g\Big(x + \sum_{j=1}^{k} \theta_j v_j\Big) - g\Big(x + \sum_{j=1}^{k-1} \theta_j v_j\Big) \right] \tag{4.9} \]
Consider the terms in this sum.
\begin{align}
g\Big(x + \sum_{j=1}^{k} \theta_j v_j\Big) - g\Big(x + \sum_{j=1}^{k-1} \theta_j v_j\Big) &= g(x + \theta_k v_k) - g(x) \tag{4.10} \\
&\quad + \left[ g\Big(x + \sum_{j=1}^{k} \theta_j v_j\Big) - g(x + \theta_k v_k) \right] - \left[ g\Big(x + \sum_{j=1}^{k-1} \theta_j v_j\Big) - g(x) \right] \tag{4.11}
\end{align}
and the expression in 4.11 is of the form $h(v_k) - h(0)$ where for small $w \in \mathbb{F}^{r_k}$,
\[ h(w) \equiv g\Big(x + \sum_{j=1}^{k-1} \theta_j v_j + \theta_k w\Big) - g(x + \theta_k w). \]
Therefore,
\[ Dh(w) = D_k g\Big(x + \sum_{j=1}^{k-1} \theta_j v_j + \theta_k w\Big) - D_k g(x + \theta_k w) \]
and by continuity, $\|Dh(w)\| < \varepsilon$ provided $|v|$ is small enough. Therefore, by Theorem 4.9, whenever $|v|$ is small enough, $|h(v_k) - h(0)| \le \varepsilon|\theta_k v_k| \le \varepsilon|v|$ which shows that since $\varepsilon$ is arbitrary, the expression in 4.11 is $o(v)$. Now in 4.10 $g(x + \theta_k v_k) - g(x) = D_k g(x)\,v_k + o(v_k) = D_k g(x)\,v_k + o(v)$. Therefore, referring to 4.9,
\[ g(x+v) - g(x) = \sum_{k=1}^{n} D_k g(x)\,v_k + o(v) \]
which shows $Dg$ exists and equals the formula given in 4.8.
The way this is usually used is in the following corollary, a case of Theorem 4.12 obtained by letting $\mathbb{F}^{r_j} = \mathbb{F}$ in the above theorem.
Corollary 4.13 Let $U$ be an open subset of $\mathbb{F}^n$ and let $f : U \to \mathbb{F}^m$ be $C^1$ in the sense that all the partial derivatives of $f$ exist and are continuous. Then $f$ is differentiable and
\[ f(x+v) = f(x) + \sum_{k=1}^{n} \frac{\partial f}{\partial x_k}(x)\,v_k + o(v). \]
4.2 $C^k$ Functions
Recall the notation for partial derivatives in the following definition.
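Definition 4.14 Let $g : U \to \mathbb{F}^n$. Then
\[ g_{x_k}(x) \equiv \frac{\partial g}{\partial x_k}(x) \equiv \lim_{h\to 0} \frac{g(x+he_k) - g(x)}{h}. \]
Higher order partial derivatives are defined in the usual way.
\[ g_{x_k x_l}(x) \equiv \frac{\partial^2 g}{\partial x_l \partial x_k}(x) \]
and so forth.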
To deal with higher order partial derivatives in a systematic way, here is a useful definition.
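Definition 4.15 $\alpha = (\alpha_1, \cdots, \alpha_n)$ for $\alpha_1 \cdots \alpha_n$ nonnegative integers is called a multi-index. For $\alpha$ a multi-index, $|\alpha| \equiv \alpha_1 + \cdots + \alpha_n$ and if $x \in \mathbb{F}^n$,
\[ x = (x_1, \cdots, x_n), \]
and $f$ a function, define
\[ x^{\alpha} \equiv x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}, \qquad D^{\alpha} f(x) \equiv \frac{\partial^{|\alpha|} f(x)}{\partial x_1^{\alpha_1} \partial x_2^{\alpha_2} \cdots \partial x_n^{\alpha_n}}. \]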
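Multi-index notation is compact enough that a few lines of Python make the bookkeeping concrete. In this sketch the helper names `order`, `monomial` and `multi_indices` are hypothetical, and entries equal to zero are allowed, which is the usual convention and the one needed for conditions of the form $|\alpha| \le k$.

```python
from itertools import product
from math import prod

def order(alpha):
    """|alpha| = alpha_1 + ... + alpha_n."""
    return sum(alpha)

def monomial(x, alpha):
    """x^alpha = x_1^alpha_1 * ... * x_n^alpha_n."""
    return prod(xi ** ai for xi, ai in zip(x, alpha))

def multi_indices(n, k):
    """All multi-indices with n nonnegative entries and |alpha| <= k."""
    return [a for a in product(range(k + 1), repeat=n) if sum(a) <= k]

alpha = (1, 2)
value = monomial((2.0, 3.0), alpha)   # 2^1 * 3^2
indices = multi_indices(2, 2)         # the derivatives a C^2 function of two variables must have
```

For two variables and $k = 2$ the list contains six multi-indices, one for each partial derivative of order at most two when the order of differentiation is ignored.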
The following is the definition of what is meant by a $C^k$ function.
Definition 4.16 Let $U$ be an open subset of $\mathbb{F}^n$ and let $f : U \to \mathbb{F}^m$. Then for $k$ a nonnegative integer, $f$ is $C^k$ if for every $|\alpha| \le k$, $D^{\alpha} f$ exists and is continuous.

4.3 Mixed Partial Derivatives
Under certain conditions the mixed partial derivatives will always be equal.
This astonishing fact is due to Euler in 1734.
Theorem 4.17 Suppose $f : U \subseteq \mathbb{F}^2 \to \mathbb{R}$ where $U$ is an open set on which $f_x$, $f_y$, $f_{xy}$ and $f_{yx}$ exist. Then if $f_{xy}$ and $f_{yx}$ are continuous at the point $(x,y) \in U$, it follows
\[ f_{xy}(x,y) = f_{yx}(x,y). \]
Proof: Since $U$ is open, there exists $r > 0$ such that $B((x,y),r) \subseteq U$. Now let $|t|, |s| < r/2$, $t, s$ real numbers and consider
\[ \Delta(s,t) \equiv \frac{1}{st} \Big\{ \overbrace{f(x+t,y+s) - f(x+t,y)}^{h(t)} - \overbrace{(f(x,y+s) - f(x,y))}^{h(0)} \Big\}. \tag{4.12} \]
Note that $(x+t,y+s) \in U$ because
\[ |(x+t,y+s) - (x,y)| = |(t,s)| = (t^2+s^2)^{1/2} \le \Big( \frac{r^2}{4} + \frac{r^2}{4} \Big)^{1/2} = \frac{r}{\sqrt{2}} < r. \]
As implied above, $h(t) \equiv f(x+t,y+s) - f(x+t,y)$. Therefore, by the mean value theorem from calculus and the (one variable) chain rule,
\begin{align*}
\Delta(s,t) &= \frac{1}{st}(h(t) - h(0)) = \frac{1}{st}\,h'(\alpha t)\,t \\
&= \frac{1}{s}\,(f_x(x+\alpha t, y+s) - f_x(x+\alpha t, y))
\end{align*}
for some $\alpha \in (0,1)$. Applying the mean value theorem again,
\[ \Delta(s,t) = f_{xy}(x+\alpha t, y+\beta s) \]
where $\alpha, \beta \in (0,1)$.
If the terms $f(x+t,y)$ and $f(x,y+s)$ are interchanged in 4.12, $\Delta(s,t)$ is unchanged and the above argument shows there exist $\gamma, \delta \in (0,1)$ such that
\[ \Delta(s,t) = f_{yx}(x+\gamma t, y+\delta s). \]
Letting $(s,t) \to (0,0)$ and using the continuity of $f_{xy}$ and $f_{yx}$ at $(x,y)$,
\[ \lim_{(s,t)\to(0,0)} \Delta(s,t) = f_{xy}(x,y) = f_{yx}(x,y). \]
The following is obtained from the above by simply fixing all the variables except for the two of interest.
Corollary 4.18 Suppose $U$ is an open subset of $\mathbb{F}^n$ and $f : U \to \mathbb{R}$ has the property that for two indices $k, l$, $f_{x_k}$, $f_{x_l}$, $f_{x_l x_k}$, and $f_{x_k x_l}$ exist on $U$ and $f_{x_k x_l}$ and $f_{x_l x_k}$ are both continuous at $x \in U$. Then $f_{x_k x_l}(x) = f_{x_l x_k}(x)$.
By considering the real and imaginary parts of $f$ in the case where $f$ has values in $\mathbb{F}$ you obtain the following corollary.
Corollary 4.19 Suppose $U$ is an open subset of $\mathbb{F}^n$ and $f : U \to \mathbb{F}$ has the property that for two indices $k, l$, $f_{x_k}$, $f_{x_l}$, $f_{x_l x_k}$, and $f_{x_k x_l}$ exist on $U$ and $f_{x_k x_l}$ and $f_{x_l x_k}$ are both continuous at $x \in U$. Then $f_{x_k x_l}(x) = f_{x_l x_k}(x)$.
Finally, by considering the components of $f$ you get the following generalization.
Corollary 4.20 Suppose $U$ is an open subset of $\mathbb{F}^n$ and $f : U \to \mathbb{F}^m$ has the property that for two indices $k, l$, $f_{x_k}$, $f_{x_l}$, $f_{x_l x_k}$, and $f_{x_k x_l}$ exist on $U$ and $f_{x_k x_l}$ and $f_{x_l x_k}$ are both continuous at $x \in U$. Then $f_{x_k x_l}(x) = f_{x_l x_k}(x)$.
It is necessary to assume the mixed partial derivatives are continuous in order
to assert they are equal. The following is a well known example [5].
Example 4.21 Let
\[ f(x,y) = \begin{cases} \dfrac{xy(x^2-y^2)}{x^2+y^2} & \text{if } (x,y) \ne (0,0) \\ 0 & \text{if } (x,y) = (0,0) \end{cases} \]
From the definition of partial derivatives it follows immediately that $f_x(0,0) = f_y(0,0) = 0$. Using the standard rules of differentiation, for $(x,y) \ne (0,0)$,
\[ f_x = y\,\frac{x^4 - y^4 + 4x^2y^2}{(x^2+y^2)^2}, \qquad f_y = x\,\frac{x^4 - y^4 - 4x^2y^2}{(x^2+y^2)^2}. \]
Now
\[ f_{xy}(0,0) \equiv \lim_{y\to 0} \frac{f_x(0,y) - f_x(0,0)}{y} = \lim_{y\to 0} \frac{-y^4}{(y^2)^2} = -1 \]
while
\[ f_{yx}(0,0) \equiv \lim_{x\to 0} \frac{f_y(x,0) - f_y(0,0)}{x} = \lim_{x\to 0} \frac{x^4}{(x^2)^2} = 1 \]
showing that although the mixed partial derivatives do exist at $(0,0)$, they are not equal there.
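The two limits in this example can be approximated with nested finite differences. In the Python sketch below the inner step `h` and the outer step `k` are assumptions chosen so that `h` is much smaller than `k`; the helper names `f_x` and `f_y` denote central-difference approximations, not the exact partial derivatives.

```python
def f(x, y):
    """The function of the example, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def f_x(x, y, h=1e-6):
    """Central-difference approximation of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2.0 * h)

def f_y(x, y, h=1e-6):
    """Central-difference approximation of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2.0 * h)

k = 1e-3   # outer step, deliberately much larger than the inner step h

# f_xy(0,0): differentiate f_x in the y direction at the origin.
f_xy_at_origin = (f_x(0.0, k) - f_x(0.0, 0.0)) / k
# f_yx(0,0): differentiate f_y in the x direction at the origin.
f_yx_at_origin = (f_y(k, 0.0) - f_y(0.0, 0.0)) / k
```

The first quotient comes out close to $-1$ and the second close to $1$, in agreement with the limits computed by hand.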
4.4 Implicit Function Theorem
The implicit function theorem is one of the greatest theorems in mathematics. There are many versions of this theorem. However, I will give a very simple proof valid in finite dimensional spaces.
Theorem 4.22 (implicit function theorem) Suppose $U$ is an open set in $\mathbb{R}^n \times \mathbb{R}^m$. Let $f : U \to \mathbb{R}^n$ be in $C^1(U)$ and suppose
\[ f(x_0,y_0) = 0, \qquad D_1 f(x_0,y_0)^{-1} \in \mathcal{L}(\mathbb{R}^n,\mathbb{R}^n). \tag{4.13} \]
Then there exist positive constants $\delta, \eta$ such that for every $y \in B(y_0,\eta)$ there exists a unique $x(y) \in B(x_0,\delta)$ such that
\[ f(x(y),y) = 0. \tag{4.14} \]
Furthermore, the mapping $y \to x(y)$ is in $C^1(B(y_0,\eta))$.
Proof: Let
\[ f(x,y) = \begin{pmatrix} f_1(x,y) \\ f_2(x,y) \\ \vdots \\ f_n(x,y) \end{pmatrix}. \]
Define for $(x^1, \cdots, x^n) \in \overline{B(x_0,\delta)}^{\,n}$ and $y \in \overline{B(y_0,\eta)}$ the following matrix.
\[ J(x^1, \cdots, x^n, y) \equiv \begin{pmatrix} f_{1,x_1}(x^1,y) & \cdots & f_{1,x_n}(x^1,y) \\ \vdots & & \vdots \\ f_{n,x_1}(x^n,y) & \cdots & f_{n,x_n}(x^n,y) \end{pmatrix}. \]
Then by the assumption of continuity of all the partial derivatives, there exist $\delta_0 > 0$ and $\eta_0 > 0$ such that if $\delta < \delta_0$ and $\eta < \eta_0$, it follows that for all $(x^1, \cdots, x^n) \in \overline{B(x_0,\delta)}^{\,n}$ and $y \in \overline{B(y_0,\eta)}$,
\[ |\det(J(x^1, \cdots, x^n, y))| > r > 0 \tag{4.15} \]
and $\overline{B(x_0,\delta_0)} \times \overline{B(y_0,\eta_0)} \subseteq U$. Pick $y \in B(y_0,\eta)$ and suppose there exist $x, z \in \overline{B(x_0,\delta)}$ such that $f(x,y) = f(z,y) = 0$. Consider $f_i$ and let
\[ h(t) \equiv f_i(x + t(z-x), y). \]
Then $h(1) = h(0)$ and so by the mean value theorem, $h'(t_i) = 0$ for some $t_i \in (0,1)$. Therefore, from the chain rule and for this value of $t_i$,
\[ h'(t_i) = Df_i(x + t_i(z-x), y)(z-x) = 0. \tag{4.16} \]
Then denote by $x^i$ the vector $x + t_i(z-x)$. It follows from 4.16 that
\[ J(x^1, \cdots, x^n, y)(z-x) = 0 \]
and so from 4.15 $z - x = 0$. Now it will be shown that if $\eta$ is chosen sufficiently small, then for all $y \in B(y_0,\eta)$, there exists a unique $x(y) \in B(x_0,\delta)$ such that $f(x(y),y) = 0$.
Claim: If $\eta$ is small enough, then the function $h_y(x) \equiv |f(x,y)|^2$ achieves its minimum value on $\overline{B(x_0,\delta)}$ at a point of $B(x_0,\delta)$.
Proof of claim: Suppose this is not the case. Then there exists a sequence $\eta_k \to 0$ and for some $y_k$ having $|y_k - y_0| < \eta_k$, the minimum of $h_{y_k}$ occurs on a point of the boundary of $\overline{B(x_0,\delta)}$, $x_k$ such that $|x_0 - x_k| = \delta$. Now taking a subsequence, still denoted by $k$, it can be assumed that $x_k \to x$ with $|x - x_0| = \delta$ and $y_k \to y_0$. Let $\varepsilon > 0$. Then for $k$ large enough, $h_{y_k}(x_0) < \varepsilon$ because $f(x_0,y_0) = 0$. Therefore, from the definition of $x_k$, $h_{y_k}(x_k) < \varepsilon$. Passing to the limit yields $h_{y_0}(x) \le \varepsilon$. Since $\varepsilon > 0$ is arbitrary, it follows that $h_{y_0}(x) = 0$ which contradicts the first part of the argument in which it was shown that for $y \in B(y_0,\eta)$ there is at most one point $x$ of $\overline{B(x_0,\delta)}$ where $f(x,y) = 0$. Here two have been obtained, $x_0$ and $x$. This proves the claim.
Choose $\eta < \eta_0$ and also small enough that the above claim holds and let $x(y)$ denote a point of $B(x_0,\delta)$ at which the minimum of $h_y$ on $\overline{B(x_0,\delta)}$ is achieved. Since $x(y)$ is an interior point, you can consider $h_y(x(y) + tv)$ for $|t|$ small and conclude this function of $t$ has a zero derivative at $t = 0$. Thus
\[ Dh_y(x(y))\,v = 0 = 2 f(x(y),y)^T D_1 f(x(y),y)\,v \]
for every vector $v$. But from 4.15 and the fact that $v$ is arbitrary, it follows $f(x(y),y) = 0$. This proves the existence of the function $y \to x(y)$ such that $f(x(y),y) = 0$ for all $y \in B(y_0,\eta)$.
It remains to verify this function is a $C^1$ function. To do this, let $y_1$ and $y_2$ be points of $B(y_0,\eta)$. Then as before, consider the $i^{th}$ component of $f$ and consider the same argument using the mean value theorem to write
\begin{align*}
0 &= f_i(x(y_1),y_1) - f_i(x(y_2),y_2) \\
&= f_i(x(y_1),y_1) - f_i(x(y_2),y_1) + f_i(x(y_2),y_1) - f_i(x(y_2),y_2) \\
&= D_1 f_i(x^i,y_1)\,(x(y_1) - x(y_2)) + D_2 f_i(x(y_2),y^i)\,(y_1 - y_2).
\end{align*}
Therefore,
\[ J(x^1, \cdots, x^n, y_1)\,(x(y_1) - x(y_2)) = -M(y_1 - y_2) \tag{4.17} \]
where $M$ is the matrix whose $i^{th}$ row is $D_2 f_i(x(y_2),y^i)$. Then from 4.15 there exists a constant $C$ independent of the choice of $y \in B(y_0,\eta)$ such that
\[ \left\| J(x^1, \cdots, x^n, y)^{-1} \right\| < C \]
whenever $(x^1, \cdots, x^n) \in \overline{B(x_0,\delta)}^{\,n}$. By continuity of the partial derivatives of $f$ it also follows there exists a constant $C_1$ such that $\|D_2 f_i(x,y)\| < C_1$ whenever $(x,y) \in \overline{B(x_0,\delta)} \times \overline{B(y_0,\eta)}$. Hence $\|M\|$ must also be bounded independent of the choice of $y_1$ and $y_2$ in $B(y_0,\eta)$. From 4.17, it follows there exists a constant $C$ such that for all $y_1, y_2$ in $B(y_0,\eta)$,
\[ |x(y_1) - x(y_2)| \le C|y_1 - y_2|. \tag{4.18} \]
It follows as in the proof of the chain rule that
\[ o(x(y+v) - x(y)) = o(v). \tag{4.19} \]
Now let $y \in B(y_0,\eta)$ and let $|v|$ be sufficiently small that $y + v \in B(y_0,\eta)$. Then
\begin{align*}
0 &= f(x(y+v),y+v) - f(x(y),y) \\
&= f(x(y+v),y+v) - f(x(y+v),y) + f(x(y+v),y) - f(x(y),y) \\
&= D_2 f(x(y+v),y)\,v + D_1 f(x(y),y)\,(x(y+v) - x(y)) + o(|x(y+v) - x(y)|) \\
&= D_2 f(x(y),y)\,v + D_1 f(x(y),y)\,(x(y+v) - x(y)) \\
&\quad + o(|x(y+v) - x(y)|) + (D_2 f(x(y+v),y)\,v - D_2 f(x(y),y)\,v) \\
&= D_2 f(x(y),y)\,v + D_1 f(x(y),y)\,(x(y+v) - x(y)) + o(v).
\end{align*}
Therefore,
\[ x(y+v) - x(y) = -D_1 f(x(y),y)^{-1} D_2 f(x(y),y)\,v + o(v) \]
which shows that $Dx(y) = -D_1 f(x(y),y)^{-1} D_2 f(x(y),y)$ and $y \to Dx(y)$ is continuous. This proves the theorem.
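The derivative formula $Dx(y) = -D_1 f(x(y),y)^{-1} D_2 f(x(y),y)$ can be tested on the simplest possible case, $n = m = 1$. The Python sketch below uses the assumed example $f(x,y) = x^2 + y^2 - 1$ near $(x_0,y_0) = (1,0)$ and finds $x(y)$ by Newton's method in $x$. Newton's method is a convenience for this illustration only; the proof above obtains $x(y)$ by minimizing $|f(x,y)|^2$ instead. The names `solve_x` and `dx_formula` are hypothetical.

```python
import math

def f(x, y):
    """Sample function with f(1, 0) = 0 (an assumption for this illustration)."""
    return x * x + y * y - 1.0

def d1f(x, y):
    """Partial derivative of f in x; nonzero near x = 1."""
    return 2.0 * x

def d2f(x, y):
    """Partial derivative of f in y."""
    return 2.0 * y

def solve_x(y, x0=1.0, tol=1e-14, max_iter=50):
    """Solve f(x, y) = 0 for x near x0 by Newton's method in x."""
    x = x0
    for _ in range(max_iter):
        step = f(x, y) / d1f(x, y)
        x -= step
        if abs(step) < tol:
            break
    return x

def dx_formula(y):
    """Dx(y) = -D_1 f(x(y), y)^{-1} D_2 f(x(y), y), here a quotient of numbers."""
    x = solve_x(y)
    return -d2f(x, y) / d1f(x, y)

y = 0.3
x_y = solve_x(y)                                     # should equal sqrt(1 - y^2)
h = 1e-6
finite_diff = (solve_x(y + h) - solve_x(y - h)) / (2.0 * h)
formula = dx_formula(y)
```

Here $x(y) = \sqrt{1 - y^2}$ is known in closed form, so both the implicit solution and the derivative $-y/x(y)$ can be compared against exact values.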
In practice, how do you verify the condition $D_1 f(x_0,y_0)^{-1} \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^n)$? Write $f$ in terms of its components,
\[ f(x,y) = \begin{pmatrix} f_1(x_1, \cdots, x_n, y_1, \cdots, y_n) \\ \vdots \\ f_n(x_1, \cdots, x_n, y_1, \cdots, y_n) \end{pmatrix}. \]
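The matrix of the linear transformation $D_1 f(x_0,y_0)$ is then
\[ \begin{pmatrix} \dfrac{\partial f_1(x_1, \cdots, x_n, y_1, \cdots, y_n)}{\partial x_1} & \cdots & \dfrac{\partial f_1(x_1, \cdots, x_n, y_1, \cdots, y_n)}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_n(x_1, \cdots, x_n, y_1, \cdots, y_n)}{\partial x_1} & \cdots & \dfrac{\partial f_n(x_1, \cdots, x_n, y_1, \cdots, y_n)}{\partial x_n} \end{pmatrix} \]
and from linear algebra, $D_1 f(x_0,y_0)^{-1} \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^n)$ exactly when the above matrix has an inverse. In other words when
\[ \det \begin{pmatrix} \dfrac{\partial f_1(x_1, \cdots, x_n, y_1, \cdots, y_n)}{\partial x_1} & \cdots & \dfrac{\partial f_1(x_1, \cdots, x_n, y_1, \cdots, y_n)}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial f_n(x_1, \cdots, x_n, y_1, \cdots, y_n)}{\partial x_1} & \cdots & \dfrac{\partial f_n(x_1, \cdots, x_n, y_1, \cdots, y_n)}{\partial x_n} \end{pmatrix} \ne 0 \]
at $(x_0,y_0)$. The above determinant is important enough that it is given special notation. Letting $z = f(x,y)$, the above determinant is often written as
\[ \frac{\partial(z_1, \cdots, z_n)}{\partial(x_1, \cdots, x_n)}. \]
Of course you can replace $\mathbb{R}$ with $\mathbb{F}$ in the above by applying the above to the situation in which each $\mathbb{F}$ is replaced with $\mathbb{R}^2$.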
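Corollary 4.23 (implicit function theorem) Suppose $U$ is an open set in $\mathbb{F}^n \times \mathbb{F}^m$. Let $f : U \to \mathbb{F}^n$ be in $C^1(U)$ and suppose
\[ f(x_0,y_0) = 0, \qquad D_1 f(x_0,y_0)^{-1} \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^n). \tag{4.20} \]
Then there exist positive constants $\delta, \eta$ such that for every $y \in B(y_0,\eta)$ there exists a unique $x(y) \in B(x_0,\delta)$ such that
\[ f(x(y),y) = 0. \tag{4.21} \]
Furthermore, the mapping $y \to x(y)$ is in $C^1(B(y_0,\eta))$.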
The next theorem is a very important special case of the implicit function theorem known as the inverse function theorem. Actually one can also obtain the implicit function theorem from the inverse function theorem. It is done this way in [32] and in [3].
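Theorem 4.24 (inverse function theorem) Let $x_0 \in U \subseteq \mathbb{F}^n$ and let $f : U \to \mathbb{F}^n$. Suppose
\[ f \text{ is } C^1(U), \text{ and } Df(x_0)^{-1} \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^n). \tag{4.22} \]
Then there exist open sets $W$ and $V$ such that
\[ x_0 \in W \subseteq U, \tag{4.23} \]
\[ f : W \to V \text{ is one to one and onto}, \tag{4.24} \]
\[ f^{-1} \text{ is } C^1. \tag{4.25} \]
Proof: Apply the implicit function theorem to the function
\[ F(x,y) \equiv f(x) - y \]
where $y_0 \equiv f(x_0)$. Thus the function $y \to x(y)$ defined in that theorem is $f^{-1}$. Now let
\[ W \equiv B(x_0,\delta) \cap f^{-1}(B(y_0,\eta)) \]
and
\[ V \equiv B(y_0,\eta). \]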
4.5 More Continuous Partial Derivatives
Corollary 4.23 will now be improved slightly. If $f$ is $C^k$, it follows that the function which is implicitly defined is also in $C^k$, not just $C^1$. Since the inverse function theorem comes as a case of the implicit function theorem, this shows that the inverse function also inherits the property of being $C^k$.
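Theorem 4.25 (implicit function theorem) Suppose $U$ is an open set in $\mathbb{F}^n \times \mathbb{F}^m$. Let $f : U \to \mathbb{F}^n$ be in $C^k(U)$ and suppose
\[ f(x_0,y_0) = 0, \qquad D_1 f(x_0,y_0)^{-1} \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^n). \tag{4.26} \]
Then there exist positive constants $\delta, \eta$ such that for every $y \in B(y_0,\eta)$ there exists a unique $x(y) \in B(x_0,\delta)$ such that
\[ f(x(y),y) = 0. \tag{4.27} \]
Furthermore, the mapping $y \to x(y)$ is in $C^k(B(y_0,\eta))$.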
Proof: From Corollary 4.23 $y \to x(y)$ is $C^1$. It remains to show it is $C^k$ for $k > 1$ assuming that $f$ is $C^k$. From 4.27
\[ \frac{\partial x}{\partial y^l} = -D_1 f(x,y)^{-1}\,\frac{\partial f}{\partial y^l}. \]
Thus the following formula holds for $q = 1$ and $|\alpha| = q$.
\[ D^{\alpha} x(y) = \sum_{|\beta| \le q} M_{\beta}(x,y)\,D^{\beta} f(x,y) \tag{4.28} \]
where $M_{\beta}$ is a matrix whose entries are differentiable functions of $D^{\gamma}(x)$ for $|\gamma| < q$ and $D^{\tau} f(x,y)$ for $|\tau| \le q$. This follows easily from the description of $D_1 f(x,y)^{-1}$ in terms of the cofactor matrix and the determinant of $D_1 f(x,y)$. Suppose 4.28 holds for $|\alpha| = q < k$. Then by induction, this yields $x$ is $C^q$. Then
\[ \frac{\partial D^{\alpha} x(y)}{\partial y^p} = \sum_{|\beta| \le |\alpha|} \frac{\partial M_{\beta}(x,y)}{\partial y^p}\,D^{\beta} f(x,y) + M_{\beta}(x,y)\,\frac{\partial D^{\beta} f(x,y)}{\partial y^p}. \]
By the chain rule $\frac{\partial M_{\beta}(x,y)}{\partial y^p}$ is a matrix whose entries are differentiable functions of $D^{\tau} f(x,y)$ for $|\tau| \le q+1$ and $D^{\gamma}(x)$ for $|\gamma| < q+1$. It follows since $y^p$ was arbitrary that for any $|\alpha| = q+1$, a formula like 4.28 holds with $q$ being replaced by $q+1$. By induction, $x$ is $C^k$. This proves the theorem.
As a simple corollary this yields an improved version of the inverse function
theorem.
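Theorem 4.26 (inverse function theorem) Let $x_0 \in U \subseteq \mathbb{F}^n$ and let $f : U \to \mathbb{F}^n$. Suppose for $k$ a positive integer,
\[ f \text{ is } C^k(U), \text{ and } Df(x_0)^{-1} \in \mathcal{L}(\mathbb{F}^n,\mathbb{F}^n). \tag{4.29} \]
Then there exist open sets $W$ and $V$ such that
\[ x_0 \in W \subseteq U, \tag{4.30} \]
\[ f : W \to V \text{ is one to one and onto}, \tag{4.31} \]
\[ f^{-1} \text{ is } C^k. \tag{4.32} \]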
Part II
Lecture Notes For Math 641 and 642
Metric Spaces And General Topological Spaces
5.1 Metric Space
Definition 5.1 A metric space is a set $X$ and a function $d : X \times X \to [0,\infty)$ which satisfies the following properties.
\[ d(x,y) = d(y,x) \]
\[ d(x,y) \ge 0 \text{ and } d(x,y) = 0 \text{ if and only if } x = y \]
\[ d(x,y) \le d(x,z) + d(z,y). \]
You can check that $\mathbb{R}^n$ and $\mathbb{C}^n$ are metric spaces with $d(x,y) = |x-y|$. However, there are many others. The definitions of open and closed sets are the same for a metric space as they are for $\mathbb{R}^n$.
Definition 5.2 A set $U$ in a metric space is open if whenever $x \in U$, there exists $r > 0$ such that $B(x,r) \subseteq U$. As before, $B(x,r) \equiv \{y : d(x,y) < r\}$. Closed sets are those whose complements are open. A point $p$ is a limit point of a set $S$ if for every $r > 0$, $B(p,r)$ contains infinitely many points of $S$. A sequence $\{x_n\}$ converges to a point $x$ if for every $\varepsilon > 0$ there exists $N$ such that if $n \ge N$, then $d(x,x_n) < \varepsilon$. $\{x_n\}$ is a Cauchy sequence if for every $\varepsilon > 0$ there exists $N$ such that if $m, n \ge N$, then $d(x_n,x_m) < \varepsilon$.
Lemma 5.3 In a metric space $X$ every ball $B(x,r)$ is open. A set is closed if and only if it contains all its limit points. If $p$ is a limit point of $S$, then there exists a sequence of distinct points of $S$, $\{x_n\}$, such that $\lim_{n\to\infty} x_n = p$.
Proof: Let $z \in B(x,r)$. Let $\delta = r - d(x,z)$. Then if $w \in B(z,\delta)$,
\[ d(w,x) \le d(x,z) + d(z,w) < d(x,z) + r - d(x,z) = r. \]
Therefore, $B(z,\delta) \subseteq B(x,r)$ and this shows $B(x,r)$ is open.
The properties of balls are presented in the following theorem.
Theorem 5.4 Suppose $(X,d)$ is a metric space. Then the sets $\{B(x,r) : r > 0, x \in X\}$ satisfy
\[ \cup\,\{B(x,r) : r > 0, x \in X\} = X \tag{5.1} \]
If $p \in B(x,r_1) \cap B(z,r_2)$, there exists $r > 0$ such that
\[ B(p,r) \subseteq B(x,r_1) \cap B(z,r_2). \tag{5.2} \]
Proof: Observe that the union of these balls includes the whole space $X$ so 5.1 is obvious. Consider 5.2. Let $p \in B(x,r_1) \cap B(z,r_2)$. Consider
\[ r \equiv \min(r_1 - d(x,p),\ r_2 - d(z,p)) \]
and suppose $y \in B(p,r)$. Then
\[ d(y,x) \le d(y,p) + d(p,x) < r_1 - d(x,p) + d(x,p) = r_1 \]
and so $B(p,r) \subseteq B(x,r_1)$. By similar reasoning, $B(p,r) \subseteq B(z,r_2)$. This proves the theorem.
Let $K$ be a closed set. This means $K^C \equiv X \setminus K$ is an open set. Let $p$ be a limit point of $K$. If $p \in K^C$, then since $K^C$ is open, there exists $B(p,r) \subseteq K^C$. But this contradicts $p$ being a limit point because there are no points of $K$ in this ball. Hence all limit points of $K$ must be in $K$.
Suppose next that $K$ contains its limit points. Is $K^C$ open? Let $p \in K^C$. Then $p$ is not a limit point of $K$. Therefore, there exists $B(p,r)$ which contains at most finitely many points of $K$. Since $p \notin K$, it follows that by making $r$ smaller if necessary, $B(p,r)$ contains no points of $K$. That is $B(p,r) \subseteq K^C$ showing $K^C$ is open. Therefore, $K$ is closed.
Suppose now that $p$ is a limit point of $S$. Let $x_1 \in (S \setminus \{p\}) \cap B(p,1)$. If $x_1, \cdots, x_k$ have been chosen, let
\[ r_{k+1} \equiv \min\left\{ d(p,x_i),\ i = 1, \cdots, k,\ \frac{1}{k+1} \right\}. \]
Let $x_{k+1} \in (S \setminus \{p\}) \cap B(p,r_{k+1})$. This proves the lemma.
Lemma 5.5 If $\{x_n\}$ is a Cauchy sequence in a metric space $X$ and if some subsequence $\{x_{n_k}\}$ converges to $x$, then $\{x_n\}$ converges to $x$. Also if a sequence converges, then it is a Cauchy sequence.
Proof: Note first that $n_k \ge k$ because in a subsequence, the indices $n_1, n_2, \cdots$ are strictly increasing. Let $\varepsilon > 0$ be given and let $N$ be such that for $k > N$, $d(x,x_{n_k}) < \varepsilon/2$ and for $m, n \ge N$, $d(x_m,x_n) < \varepsilon/2$. Pick $k > n$. Then if $n > N$,
\[ d(x_n,x) \le d(x_n,x_{n_k}) + d(x_{n_k},x) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Finally, suppose $\lim_{n\to\infty} x_n = x$. Then there exists $N$ such that if $n > N$, then $d(x_n,x) < \varepsilon/2$. It follows that for $m, n > N$,
\[ d(x_n,x_m) \le d(x_n,x) + d(x,x_m) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
5.2 Compactness In Metric Space
Many existence theorems in analysis depend on some set being compact. Therefore,
it is important to be able to identify compact sets. The purpose of this section is
to describe compact sets in a metric space.
Definition 5.6 Let $A$ be a subset of $X$. $A$ is compact if whenever $A$ is contained in the union of a set of open sets, there exist finitely many of these open sets whose union contains $A$. (Every open cover admits a finite subcover.) $A$ is sequentially compact means every sequence has a convergent subsequence converging to an element of $A$.
In a metric space compact is not the same as closed and bounded!
Example 5.7 Let $X$ be any infinite set and define $d(x,y) = 1$ if $x \ne y$ while $d(x,y) = 0$ if $x = y$.
You should verify the details that this is a metric space because it satisfies the axioms of a metric. The set $X$ is closed because its complement is $\emptyset$ which is clearly open because every point of $\emptyset$ is an interior point. (There are none.) Also $X$ is bounded because $X = B(x,2)$. However, $X$ is clearly not compact because $\left\{ B\left(x,\frac{1}{2}\right) : x \in X \right\}$ is a collection of open sets whose union contains $X$ but since they are all disjoint and nonempty, there is no finite subset of these whose union contains $X$. In fact $B\left(x,\frac{1}{2}\right) = \{x\}$.
From this example it is clear something more than closed and bounded is needed.
If you are not familiar with the issues just discussed, ignore them and continue.
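The claims of Example 5.7 can be checked by machine on a finite sample. The following is only a sketch: the names `discrete_metric`, `ball` and `sample` are hypothetical, and a ten point sample stands in for the infinite set $X$, so it illustrates the metric axioms and the fact that $B\left(x, \frac{1}{2}\right) = \{x\}$ rather than the failure of compactness itself.

```python
from itertools import product

def discrete_metric(x, y):
    """d(x, y) = 1 if x != y and 0 if x == y, as in Example 5.7."""
    return 0 if x == y else 1

def ball(center, radius, points):
    """Open ball B(center, radius), restricted to the finite list `points`."""
    return {y for y in points if discrete_metric(center, y) < radius}

# Assumption for illustration: a finite sample standing in for the infinite set X.
sample = list(range(10))

# Check the metric axioms on every triple of sample points.
for x, y, z in product(sample, repeat=3):
    assert discrete_metric(x, y) == discrete_metric(y, x)
    assert (discrete_metric(x, y) == 0) == (x == y)
    assert discrete_metric(x, z) <= discrete_metric(x, y) + discrete_metric(y, z)

# Balls of radius 1/2 are singletons, while a ball of radius 2 is everything.
half_balls = [ball(x, 0.5, sample) for x in sample]
whole = ball(0, 2, sample)
```

Since the balls of radius $1/2$ are disjoint singletons, any subfamily which covers the sample must use all of them, which is the finite shadow of the argument in the example.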
Definition 5.8 In any metric space, a set $E$ is totally bounded if for every $\varepsilon > 0$ there exists a finite set of points $\{x_1, \cdots, x_n\}$ such that
$$E \subseteq \cup_{i=1}^n B(x_i, \varepsilon).$$
This finite set of points is called an $\varepsilon$ net.
The following proposition tells which sets in a metric space are compact. First here is an interesting lemma.
Lemma 5.9 Let $X$ be a metric space and suppose $D$ is a countable dense subset of $X$. In other words, it is being assumed $X$ is a separable metric space. Consider the open sets of the form $B(d,r)$ where $r$ is a positive rational number and $d \in D$. Denote this countable collection of open sets by $\mathcal{B}$. Then every open set is the union of sets of $\mathcal{B}$. Furthermore, if $\mathcal{C}$ is any collection of open sets, there exists a countable subset $\{U_n\} \subseteq \mathcal{C}$ such that $\cup_n U_n = \cup \mathcal{C}$.
Proof: Let $U$ be an open set and let $x \in U$. Let $B(x,\delta) \subseteq U$. Then by density of $D$, there exists $d \in D \cap B(x, \delta/4)$. Now pick $r \in \mathbb{Q} \cap (\delta/4, 3\delta/4)$ and consider $B(d,r)$. Clearly, $B(d,r)$ contains the point $x$ because $r > \delta/4$. Is $B(d,r) \subseteq B(x,\delta)$? If so, this proves the lemma because $x$ was an arbitrary point of $U$. Suppose $z \in B(d,r)$. Then
$$d(z,x) \le d(z,d) + d(d,x) < r + \frac{\delta}{4} < \frac{3\delta}{4} + \frac{\delta}{4} = \delta.$$
Now let $\mathcal{C}$ be any collection of open sets. Each set in this collection is the union of countably many sets of $\mathcal{B}$. Let $\widetilde{\mathcal{B}}$ denote the sets of $\mathcal{B}$ which are contained in some set of $\mathcal{C}$. Thus $\cup \widetilde{\mathcal{B}} = \cup \mathcal{C}$. Then for each $B \in \widetilde{\mathcal{B}}$, pick $U_B \in \mathcal{C}$ such that $B \subseteq U_B$. Then $\{U_B : B \in \widetilde{\mathcal{B}}\}$ is a countable collection of sets of $\mathcal{C}$ whose union equals $\cup \mathcal{C}$. Therefore, this proves the lemma.
Proposition 5.10 Let (X, d) be a metric space. Then the following are equivalent.
(X, d) is compact, (5.3)
(X, d) is sequentially compact, (5.4)
(X, d) is complete and totally bounded. (5.5)
Proof: Suppose 5.3 and let $\{x_k\}$ be a sequence. Suppose $\{x_k\}$ has no convergent subsequence. If this is so, then no value of the sequence is repeated more than finitely many times. Also $\{x_k\}$ has no limit point because if it did, there would exist a subsequence which converges. To see this, suppose $p$ is a limit point of $\{x_k\}$. Then in $B(p,1)$ there are infinitely many points of $\{x_k\}$. Pick one called $x_{k_1}$. Now if $x_{k_1}, x_{k_2}, \cdots, x_{k_n}$ have been picked with $x_{k_i} \in B(p, 1/i)$, consider $B(p, 1/(n+1))$. There are infinitely many points of $\{x_k\}$ in this ball also. Pick $x_{k_{n+1}}$ such that $k_{n+1} > k_n$. Then $\{x_{k_n}\}_{n=1}^{\infty}$ is a subsequence which converges to $p$ and it is assumed this does not happen. Thus $\{x_k\}$ has no limit points. It follows the set
$$C_n = \{x_k : k \ge n\}$$
is a closed set because it has no limit points and if
$$U_n = C_n^C,$$
then
$$X = \cup_{n=1}^{\infty} U_n$$
but there is no finite subcovering, because no value of the sequence is repeated more than finitely many times. Note $x_k$ is not in $U_n$ whenever $k > n$. This contradicts compactness of $(X,d)$. This shows 5.3 implies 5.4.
Now suppose 5.4 and let $\{x_n\}$ be a Cauchy sequence. Is $\{x_n\}$ convergent? By sequential compactness $x_{n_k} \to x$ for some subsequence. By Lemma 5.5 it follows that $\{x_n\}$ also converges to $x$ showing that $(X,d)$ is complete. If $(X,d)$ is not totally bounded, then there exists $\varepsilon > 0$ for which there is no $\varepsilon$ net. Hence there exists a sequence $\{x_k\}$ with $d(x_k, x_l) \ge \varepsilon$ for all $l \ne k$. By Lemma 5.5 again, this contradicts 5.4 because no subsequence can be a Cauchy sequence and so no subsequence can converge. This shows 5.4 implies 5.5.
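Now suppose 5.5. What about 5.4? Let $\{p_n\}$ be a sequence and let $\{x_i^n\}_{i=1}^{m_n}$ be a $2^{-n}$ net for $n = 1, 2, \cdots$. Let
$$B_n \equiv B\left(x_{i_n}^n, 2^{-n}\right)$$
be such that $B_n$ contains $p_k$ for infinitely many values of $k$ and $B_n \cap B_{n+1} \ne \emptyset$. To do this, suppose $B_n$ contains $p_k$ for infinitely many values of $k$. Then one of the sets which intersect $B_n$, $B\left(x_i^{n+1}, 2^{-(n+1)}\right)$, must contain $p_k$ for infinitely many values of $k$ because all these indices of points from $\{p_n\}$ contained in $B_n$ must be accounted for in one of finitely many sets, $B\left(x_i^{n+1}, 2^{-(n+1)}\right)$. Thus there exists a strictly increasing sequence of integers $n_k$ such that
$$p_{n_k} \in B_k.$$
Then if $k > l$,
$$d(p_{n_k}, p_{n_l}) \le \sum_{i=l}^{k-1} d\left(p_{n_{i+1}}, p_{n_i}\right) < \sum_{i=l}^{k-1} 2^{-(i-2)} < 2^{-(l-3)}.$$
Here each term is estimated by picking $z \in B_i \cap B_{i+1}$, so that $d\left(p_{n_{i+1}}, p_{n_i}\right) \le d\left(p_{n_{i+1}}, z\right) + d\left(z, p_{n_i}\right) < 2 \cdot 2^{-(i+1)} + 2 \cdot 2^{-i} < 2^{-(i-2)}$. Consequently $\{p_{n_k}\}$ is a Cauchy sequence. Hence it converges because the metric space is complete. This proves 5.4.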
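Now suppose 5.4 and 5.5 which have now been shown to be equivalent. Let $D_n$ be a $n^{-1}$ net for $n = 1, 2, \cdots$ and let
$$D = \cup_{n=1}^{\infty} D_n.$$
Thus $D$ is a countable dense subset of $(X,d)$.
Now let $\mathcal{C}$ be any set of open sets such that $\cup \mathcal{C} \supseteq X$. By Lemma 5.9, there exists a countable subset of $\mathcal{C}$,
$$\widetilde{\mathcal{C}} = \{U_n\}_{n=1}^{\infty}$$
such that $\cup \widetilde{\mathcal{C}} = \cup \mathcal{C}$. If $\mathcal{C}$ admits no finite subcover, then neither does $\widetilde{\mathcal{C}}$ and there exists $p_n \in X \setminus \cup_{k=1}^n U_k$. Then since $X$ is sequentially compact, there is a subsequence $\{p_{n_k}\}$ such that $\{p_{n_k}\}$ converges. Say
$$p = \lim_{k\to\infty} p_{n_k}.$$
All but finitely many points of $\{p_{n_k}\}$ are in $X \setminus \cup_{k=1}^n U_k$. Therefore, because this set is closed, $p \in X \setminus \cup_{k=1}^n U_k$ for each $n$. Hence
$$p \notin \cup_{k=1}^{\infty} U_k$$
contradicting the construction of $\{U_n\}_{n=1}^{\infty}$ which required that $\cup_{n=1}^{\infty} U_n \supseteq X$. Hence $X$ is compact. This proves the proposition.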
Consider $\mathbb{R}^n$. In this setting totally bounded and bounded are the same. This will yield a proof of the Heine Borel theorem from advanced calculus.
Lemma 5.11 A subset of $\mathbb{R}^n$ is totally bounded if and only if it is bounded.
Proof: Let $A$ be totally bounded. Is it bounded? Let $x_1, \cdots, x_p$ be a 1 net for $A$. Now consider the ball $B(0, r+1)$ where $r > \max(|x_i| : i = 1, \cdots, p)$. If $z \in A$, then $z \in B(x_j, 1)$ for some $j$ and so by the triangle inequality,
$$|z - 0| \le |z - x_j| + |x_j| < 1 + r.$$
Thus $A \subseteq B(0, r+1)$ and so $A$ is bounded.
Now suppose $A$ is bounded and suppose $A$ is not totally bounded. Then there exists $\varepsilon > 0$ such that there is no $\varepsilon$ net for $A$. Therefore, there exists a sequence of points $\{a_i\}$ with $|a_i - a_j| \ge \varepsilon$ if $i \ne j$. Since $A$ is bounded, there exists $r > 0$ such that
$$A \subseteq [-r, r)^n.$$
($x \in [-r, r)^n$ means $x_i \in [-r, r)$ for each $i$.) Now define $\mathcal{S}$ to be all cubes of the form
$$\prod_{k=1}^n [a_k, b_k)$$
where
$$a_k = -r + i 2^{-p} r, \quad b_k = -r + (i+1) 2^{-p} r,$$
for $i \in \{0, 1, \cdots, 2^{p+1} - 1\}$. Thus $\mathcal{S}$ is a collection of $\left(2^{p+1}\right)^n$ non overlapping cubes whose union equals $[-r, r)^n$ and whose diameters are all equal to $2^{-p} r \sqrt{n}$. Now choose $p$ large enough that the diameter of these cubes is less than $\varepsilon$. This yields a contradiction because one of the cubes must contain infinitely many points of $\{a_i\}$.
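To make the choice of $p$ in the proof of Lemma 5.11 concrete, here is a small numerical sketch. The function names `smallest_p` and `cube_count` and the sample values $r = 1$, $n = 2$, $\varepsilon = 0.1$ are assumptions made only for illustration; the code simply searches for the first $p$ with $2^{-p} r \sqrt{n} < \varepsilon$ and counts the resulting cubes.

```python
import math

def smallest_p(r, n, eps):
    """Smallest nonnegative integer p with 2**(-p) * r * sqrt(n) < eps."""
    p = 0
    while 2.0 ** (-p) * r * math.sqrt(n) >= eps:
        p += 1
    return p

def cube_count(p, n):
    """Number of cubes of side 2**(-p) * r needed to tile [-r, r)**n."""
    return (2 ** (p + 1)) ** n

# Hypothetical sample values: the square [-1, 1)^2 and eps = 0.1.
p = smallest_p(r=1.0, n=2, eps=0.1)
count = cube_count(p, n=2)
diameter = 2.0 ** (-p) * 1.0 * math.sqrt(2)
```

With these values the search stops at $p = 4$, giving $32^2 = 1024$ cubes, each of diameter less than $0.1$. Any infinite sequence in the square must then place two of its points in a single cube, which is the pigeonhole step of the proof.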
The next theorem is called the Heine Borel theorem and it characterizes the compact sets in $\mathbb{R}^n$.
Theorem 5.12 A subset of $\mathbb{R}^n$ is compact if and only if it is closed and bounded.
Proof: Since a set in $\mathbb{R}^n$ is totally bounded if and only if it is bounded, this theorem follows from Proposition 5.10 and the observation that a subset of $\mathbb{R}^n$ is closed if and only if it is complete. This proves the theorem.
5.3 Some Applications Of Compactness
The following corollary is an important existence theorem which depends on com-
pactness.
Corollary 5.13 Let $X$ be a compact metric space and let $f : X \to \mathbb{R}$ be continuous. Then $\max\{f(x) : x \in X\}$ and $\min\{f(x) : x \in X\}$ both exist.
Proof: First it is shown $f(X)$ is compact. Suppose $\mathcal{C}$ is a set of open sets whose union contains $f(X)$. Then since $f$ is continuous $f^{-1}(U)$ is open for all $U \in \mathcal{C}$. Therefore, $\{f^{-1}(U) : U \in \mathcal{C}\}$ is a collection of open sets whose union contains $X$. Since $X$ is compact, it follows finitely many of these, $\{f^{-1}(U_1), \cdots, f^{-1}(U_p)\}$, contain $X$ in their union. Therefore, $f(X) \subseteq \cup_{k=1}^p U_k$ showing $f(X)$ is compact as claimed.
Now since $f(X)$ is compact, Theorem 5.12 implies $f(X)$ is closed and bounded. Therefore, it contains its inf and its sup. Thus $f$ achieves both a maximum and a minimum.
Definition 5.14 Let $X, Y$ be metric spaces and $f : X \to Y$ a function. $f$ is uniformly continuous if for all $\varepsilon > 0$ there exists $\delta > 0$ such that whenever $x_1$ and $x_2$ are two points of $X$ satisfying $d(x_1, x_2) < \delta$, it follows that $d(f(x_1), f(x_2)) < \varepsilon$.
A very important theorem is the following.
Theorem 5.15 Suppose $f : X \to Y$ is continuous and $X$ is compact. Then $f$ is uniformly continuous.
Proof: Suppose this is not true and that $f$ is continuous but not uniformly continuous. Then there exists $\varepsilon > 0$ such that for all $\delta > 0$ there exist points $p_\delta$ and $q_\delta$ such that $d(p_\delta, q_\delta) < \delta$ and yet $d(f(p_\delta), f(q_\delta)) \ge \varepsilon$. Let $p_n$ and $q_n$ be the points which go with $\delta = 1/n$. By Proposition 5.10 $\{p_n\}$ has a convergent subsequence $\{p_{n_k}\}$ converging to a point $x \in X$. Since $d(p_n, q_n) < \frac{1}{n}$, it follows that $q_{n_k} \to x$ also. Therefore,
$$\varepsilon \le d(f(p_{n_k}), f(q_{n_k})) \le d(f(p_{n_k}), f(x)) + d(f(x), f(q_{n_k}))$$
but by continuity of $f$, both $d(f(p_{n_k}), f(x))$ and $d(f(x), f(q_{n_k}))$ converge to 0 as $k \to \infty$ contradicting the above inequality. This proves the theorem.
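A numerical illustration of Theorem 5.15 may help. It is only a sketch under stated assumptions: the helper `worst_gap`, the grid, and the two test functions are my own choices, not part of the notes. The function $\sqrt{x}$ on the compact set $[0,1]$ is uniformly continuous, while $1/x$ on the non compact set $(0,1]$ admits pairs $p_n = 1/n$, $q_n = 1/(2n)$ exactly like those in the proof.

```python
import math

def worst_gap(f, xs, delta):
    """Largest |f(x) - f(y)| over sample points xs with |x - y| < delta."""
    return max(abs(f(x) - f(y)) for x in xs for y in xs if abs(x - y) < delta)

# sqrt on the compact interval [0, 1]: since |sqrt(x) - sqrt(y)| <= sqrt(|x - y|),
# delta = eps**2 / 2 is one (not optimal) choice which works for every x, y.
eps = 0.1
delta = eps ** 2 / 2
grid = [i / 200 for i in range(201)]
sqrt_gap = worst_gap(math.sqrt, grid, delta)

# 1/x on (0, 1], which is not compact: the pairs get closer and closer,
# yet the values of the function stay at least 1 apart.
pairs = [(1.0 / n, 1.0 / (2 * n)) for n in range(1, 11)]
distances = [abs(p - q) for p, q in pairs]
gaps = [abs(1.0 / p - 1.0 / q) for p, q in pairs]
```

A finite grid can only suggest uniform continuity, it cannot prove it; the point is to see the pairs $p_n, q_n$ of the proof appear when compactness is dropped.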
Another important property of compact sets in a metric space concerns the finite intersection property.
Definition 5.16 If every finite subset of a collection of sets has nonempty intersection, the collection has the finite intersection property.
Theorem 5.17 Suppose $\mathcal{F}$ is a collection of compact sets in a metric space $X$ which has the finite intersection property. Then each of these sets is closed and there exists a point in their intersection. ($\cap \mathcal{F} \ne \emptyset$).
Proof: First I show each compact set is closed. Let $K$ be a nonempty compact set and suppose $p \notin K$. Then for each $x \in K$, let $V_x = B(x, d(p,x)/3)$ and $U_x = B(p, d(p,x)/3)$ so that $U_x$ and $V_x$ have empty intersection. Then since $K$ is compact, there are finitely many $V_x$ which cover $K$, say $V_{x_1}, \cdots, V_{x_n}$. Then let $U = \cap_{i=1}^n U_{x_i}$. It follows $p \in U$ and $U$ has empty intersection with $K$. In fact $U$ has empty intersection with $\cup_{i=1}^n V_{x_i}$. Since $U$ is an open set and $p \in K^C$ is arbitrary, it follows $K^C$ is an open set.
Consider now the claim about the intersection. If this were not so, $\cup\{F^C : F \in \mathcal{F}\} = X$ and so, in particular, picking some $F_0 \in \mathcal{F}$, $\{F^C : F \in \mathcal{F}\}$ would be an open cover of $F_0$. Since $F_0$ is compact, some finite subcover $F_1^C, \cdots, F_m^C$ exists. But then $F_0 \subseteq \cup_{k=1}^m F_k^C$ which means $\cap_{k=0}^m F_k = \emptyset$, contrary to the finite intersection property. To see this, note that if $x \in F_0$, then it must fail to be in some $F_k$ and so it is not in $\cap_{k=0}^m F_k$. Since this is true for every $x$ it follows $\cap_{k=0}^m F_k = \emptyset$.
Theorem 5.18 Let $X_i$ be a compact metric space with metric $d_i$. Then $\prod_{i=1}^m X_i$ is also a compact metric space with respect to the metric $d(x,y) \equiv \max_i (d_i(x_i, y_i))$.
Proof: This is most easily seen from sequential compactness. Let $\{x^k\}_{k=1}^{\infty}$ be a sequence of points in $\prod_{i=1}^m X_i$. Consider the $i^{th}$ component of $x^k$, $x_i^k$. It follows $\{x_i^k\}$ is a sequence of points in $X_i$ and so it has a convergent subsequence. Compactness of $X_1$ implies there exists a subsequence of $x^k$, denoted by $\{x^{k_1}\}$, such that
$$\lim_{k_1 \to \infty} x_1^{k_1} = x_1 \in X_1.$$
Now there exists a further subsequence, denoted by $\{x^{k_2}\}$, such that in addition to this, $x_2^{k_2} \to x_2 \in X_2$. After taking $m$ such subsequences, there exists a subsequence $\{x^l\}$ such that $\lim_{l\to\infty} x_i^l = x_i \in X_i$ for each $i$. Therefore, letting $x \equiv (x_1, \cdots, x_m)$, $x^l \to x$ in $\prod_{i=1}^m X_i$.
5.4 Ascoli Arzela Theorem
Definition 5.19 Let $(X,d)$ be a complete metric space. Then it is said to be locally compact if the closed ball $\overline{B(x,r)}$ is compact for each $x \in X$ and each $r > 0$.
Thus if you have a locally compact metric space, then if $\{a_n\}$ is a bounded sequence, it must have a convergent subsequence.
Let $K$ be a compact subset of $\mathbb{R}^n$ and consider the continuous functions which have values in a locally compact metric space $(X,d)$ where $d$ denotes the metric on $X$. Denote this space as $C(K,X)$.
Definition 5.20 For $f, g \in C(K,X)$, where $K$ is a compact subset of $\mathbb{R}^n$ and $X$ is a locally compact complete metric space, define
$$\rho_K(f,g) \equiv \sup\{d(f(x), g(x)) : x \in K\}.$$
Then $\rho_K$ provides a distance which makes $C(K,X)$ into a metric space.
The Ascoli Arzela theorem is a major result which tells which subsets of $C(K,X)$ are sequentially compact.
Definition 5.21 Let $A \subseteq C(K,X)$ for $K$ a compact subset of $\mathbb{R}^n$. Then $A$ is said to be uniformly equicontinuous if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that whenever $x, y \in K$ with $|x - y| < \delta$ and $f \in A$,
$$d(f(x), f(y)) < \varepsilon.$$
The set $A$ is said to be uniformly bounded if for some $M < \infty$ and $a \in X$,
$$f(x) \in B(a, M)$$
for all $f \in A$ and $x \in K$.
Uniform equicontinuity is like saying that the whole set of functions $A$ is uniformly continuous on $K$ uniformly for $f \in A$. The version of the Ascoli Arzela theorem I will present here is the following.
Theorem 5.22 Suppose $K$ is a nonempty compact subset of $\mathbb{R}^n$ and $A \subseteq C(K,X)$ is uniformly bounded and uniformly equicontinuous. Then if $\{f_k\} \subseteq A$, there exists a function $f \in C(K,X)$ and a subsequence $f_{k_l}$ such that
$$\lim_{l\to\infty} \rho_K(f_{k_l}, f) = 0.$$
To give a proof of this theorem, I will first prove some lemmas.
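Lemma 5.23 If $K$ is a compact subset of $\mathbb{R}^n$, then there exists $D \equiv \{x_k\}_{k=1}^{\infty} \subseteq K$ such that $D$ is dense in $K$. Also, for every $\varepsilon > 0$ there exists a finite set of points $\{x_1, \cdots, x_m\} \subseteq K$, called an $\varepsilon$ net, such that
$$\cup_{i=1}^m B(x_i, \varepsilon) \supseteq K.$$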
Proof: For $m \in \mathbb{N}$, pick $x_1^m \in K$. If every point of $K$ is within $1/m$ of $x_1^m$, stop. Otherwise, pick
$$x_2^m \in K \setminus B(x_1^m, 1/m).$$
If every point of $K$ is contained in $B(x_1^m, 1/m) \cup B(x_2^m, 1/m)$, stop. Otherwise, pick
$$x_3^m \in K \setminus (B(x_1^m, 1/m) \cup B(x_2^m, 1/m)).$$
If every point of $K$ is contained in $B(x_1^m, 1/m) \cup B(x_2^m, 1/m) \cup B(x_3^m, 1/m)$, stop. Otherwise, pick
$$x_4^m \in K \setminus (B(x_1^m, 1/m) \cup B(x_2^m, 1/m) \cup B(x_3^m, 1/m)).$$
Continue this way until the process stops, say at $N(m)$. It must stop because if it didn't, there would be a convergent subsequence due to the compactness of $K$. Ultimately all terms of this convergent subsequence would be closer than $1/m$, violating the manner in which they are chosen. Then $D = \cup_{m=1}^{\infty} \cup_{k=1}^{N(m)} \{x_k^m\}$. This is countable because it is a countable union of countable sets. If $y \in K$ and $\varepsilon > 0$, then for some $m$, $2/m < \varepsilon$ and so $B(y, \varepsilon)$ must contain some point of $\{x_k^m\}$ since otherwise, the process stopped too soon. You could have picked $y$. This proves the lemma.
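The selection process in the proof of Lemma 5.23 is a greedy algorithm, and it can be sketched in code for a finite sample of $K$. The function name `greedy_net` and the grid standing in for $K$ are assumptions made for illustration; for a finite sample the process obviously stops, whereas in the lemma it is compactness which forces it to stop.

```python
import math

def greedy_net(points, eps):
    """Keep a point whenever it is at distance >= eps from every point kept so far.

    This mirrors picking x_{k+1} in K outside the union of the balls
    B(x_1, eps), ..., B(x_k, eps) until no such point remains.
    """
    net = []
    for p in points:
        if all(math.dist(p, c) >= eps for c in net):
            net.append(p)
    return net

# Hypothetical sample: a grid of points standing in for K = [0, 1]^2.
points = [(i / 20, j / 20) for i in range(21) for j in range(21)]
eps = 0.25
net = greedy_net(points, eps)
```

Every sample point is then within `eps` of some point of `net`, and distinct points of `net` are at least `eps` apart, which is exactly the property used to show the process must terminate.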
Lemma 5.24 Suppose $D$ is defined above and $\{g_m\}$ is a sequence of functions of $A$ having the property that for every $x_k \in D$,
$$\lim_{m\to\infty} g_m(x_k) \text{ exists.}$$
Then there exists $g \in C(K,X)$ such that
$$\lim_{m\to\infty} \rho(g_m, g) = 0.$$
Proof: Define $g$ first on $D$.
$$g(x_k) \equiv \lim_{m\to\infty} g_m(x_k).$$
Next I show that $\{g_m\}$ converges at every point of $K$. Let $x \in K$ and let $\varepsilon > 0$ be given. Choose $x_k$ such that for all $f \in A$,
$$d(f(x_k), f(x)) < \frac{\varepsilon}{3}.$$
I can do this by the equicontinuity. Now if $p, q$ are large enough, say $p, q \ge M$,
$$d(g_p(x_k), g_q(x_k)) < \frac{\varepsilon}{3}.$$
Therefore, for $p, q \ge M$,
$$d(g_p(x), g_q(x)) \le d(g_p(x), g_p(x_k)) + d(g_p(x_k), g_q(x_k)) + d(g_q(x_k), g_q(x)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
It follows that $\{g_m(x)\}$ is a Cauchy sequence having values in $X$. Therefore, it converges. Let $g(x)$ be the name of the thing it converges to.
Let $\varepsilon > 0$ be given and pick $\delta > 0$ such that whenever $x, y \in K$ and $|x - y| < \delta$, it follows $d(f(x), f(y)) < \frac{\varepsilon}{3}$ for all $f \in A$. Now let $\{x_1, \cdots, x_m\}$ be a $\delta$ net for $K$ as in Lemma 5.23. Since there are only finitely many points in this $\delta$ net, it follows that there exists $N$ such that for all $p, q \ge N$,
$$d(g_q(x_i), g_p(x_i)) < \frac{\varepsilon}{3}$$
for all $\{x_1, \cdots, x_m\}$. Therefore, for arbitrary $x \in K$, pick $x_i \in \{x_1, \cdots, x_m\}$ such that $|x_i - x| < \delta$. Then
$$d(g_q(x), g_p(x)) \le d(g_q(x), g_q(x_i)) + d(g_q(x_i), g_p(x_i)) + d(g_p(x_i), g_p(x)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
Since $N$ does not depend on the choice of $x$, it follows this sequence $\{g_m\}$ is uniformly Cauchy. That is, for every $\varepsilon > 0$, there exists $N$ such that if $p, q \ge N$, then
$$\rho(g_p, g_q) < \varepsilon.$$
Next, I need to verify that the function $g$ is a continuous function. Let $N$ be large enough that whenever $p, q \ge N$, the above holds with $\varepsilon$ replaced by $\varepsilon/3$. Then for all $x \in K$,
$$d(g(x), g_p(x)) \le \frac{\varepsilon}{3} \quad (5.6)$$
whenever $p \ge N$. This follows from observing that for $p, q \ge N$,
$$d(g_q(x), g_p(x)) < \frac{\varepsilon}{3}$$
and then taking the limit as $q \to \infty$ to obtain 5.6. In passing to the limit, you can use the following simple claim.
Claim: In a metric space, if $a_n \to a$, then $d(a_n, b) \to d(a, b)$.
Proof of the claim: You note that by the triangle inequality, $d(a_n, b) - d(a, b) \le d(a_n, a)$ and $d(a, b) - d(a_n, b) \le d(a_n, a)$ and so
$$|d(a_n, b) - d(a, b)| \le d(a_n, a).$$
Now let $p$ satisfy 5.6 for all $x$ whenever $p > N$. Also pick $\delta > 0$ such that if $|x - y| < \delta$, then
$$d(g_p(x), g_p(y)) < \frac{\varepsilon}{3}.$$
Then if $|x - y| < \delta$,
$$d(g(x), g(y)) \le d(g(x), g_p(x)) + d(g_p(x), g_p(y)) + d(g_p(y), g(y)) < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.$$
Since $\varepsilon$ was arbitrary, this shows that $g$ is continuous.
It only remains to verify that $\rho(g, g_k) \to 0$. But this follows from 5.6. This proves the lemma.
With these lemmas, it is time to prove Theorem 5.22.
Proof of Theorem 5.22: Let $D = \{x_k\}$ be the countable dense set of $K$ guaranteed by Lemma 5.23 and let $\{(1,1), (1,2), (1,3), (1,4), (1,5), \cdots\}$ be a subsequence of $\mathbb{N}$ such that
$$\lim_{k\to\infty} f_{(1,k)}(x_1) \text{ exists.}$$
This is where the local compactness of $X$ is being used. Now let
$$\{(2,1), (2,2), (2,3), (2,4), (2,5), \cdots\}$$
be a subsequence of $\{(1,1), (1,2), (1,3), (1,4), (1,5), \cdots\}$ which has the property that
$$\lim_{k\to\infty} f_{(2,k)}(x_2) \text{ exists.}$$
Thus it is also the case that
$$f_{(2,k)}(x_1) \text{ converges to } \lim_{k\to\infty} f_{(1,k)}(x_1)$$
because every subsequence of a convergent sequence converges to the same thing as the convergent sequence. Continue this way and consider the array
$$\begin{array}{l}
f_{(1,1)}, f_{(1,2)}, f_{(1,3)}, f_{(1,4)}, \cdots \text{ converges at } x_1 \\
f_{(2,1)}, f_{(2,2)}, f_{(2,3)}, f_{(2,4)}, \cdots \text{ converges at } x_1 \text{ and } x_2 \\
f_{(3,1)}, f_{(3,2)}, f_{(3,3)}, f_{(3,4)}, \cdots \text{ converges at } x_1, x_2, \text{ and } x_3 \\
\vdots
\end{array}$$
Now let $g_k \equiv f_{(k,k)}$. Thus $g_k$ is ultimately a subsequence of $\{f_{(m,k)}\}$ whenever $k > m$ and therefore, $\{g_k\}$ converges at each point of $D$. By Lemma 5.24 it follows there exists $g \in C(K,X)$ such that
$$\lim_{k\to\infty} \rho(g, g_k) = 0.$$
This proves the Ascoli Arzela theorem.
Actually there is an if and only if version of it but the most useful case is what is presented here. The process used to get the subsequence in the proof is called the Cantor diagonalization procedure.
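The bookkeeping in the Cantor diagonalization procedure can be seen in a toy example. Everything in the sketch below is hypothetical and chosen only so the indices can be written down explicitly: the "function values" $f_n(x_j)$ are taken to be binary digit $j-1$ of $n$, and the $m^{th}$ nested subsequence consists of the multiples of $2^m$, along which the values at $x_1, \cdots, x_m$ are constant.

```python
def f(n, j):
    """Toy values: f_n(x_j) is binary digit j-1 of n (a made-up example)."""
    return (n >> (j - 1)) & 1

def row_index(m, k):
    """The index (m, k): k-th term of the m-th nested subsequence, k * 2**m.

    Row m+1 is a subsequence of row m, and along row m the values
    f_n(x_1), ..., f_n(x_m) are all 0, so they converge.
    """
    return k * 2 ** m

def diagonal_index(k):
    """The diagonal choice g_k = f_(k,k)."""
    return row_index(k, k)

diag = [diagonal_index(k) for k in range(1, 9)]

# The original sequence does not converge at x_1: it alternates 1, 0, 1, 0, ...
original_at_x1 = [f(n, 1) for n in range(1, 5)]
```

For each fixed $j$, the diagonal indices with $k \ge j$ all lie in row $j$, so the diagonal sequence converges at $x_j$ even though no single row was chosen with every $x_j$ in mind.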
5.5 General Topological Spaces
It turns out that metric spaces are not sufficiently general for some applications. This section is a brief introduction to general topology. In making this generalization, the properties of balls which are the conclusion of Theorem 5.4 on Page 100 are stated as axioms for a subset of the power set of a given set which will be known as a basis for the topology. More can be found in [31] and the references listed there.
Definition 5.25 Let $X$ be a nonempty set and suppose $\mathcal{B} \subseteq \mathcal{P}(X)$. Then $\mathcal{B}$ is a basis for a topology if it satisfies the following axioms.
1.) Whenever $p \in A \cap B$ for $A, B \in \mathcal{B}$, it follows there exists $C \in \mathcal{B}$ such that $p \in C \subseteq A \cap B$.
2.) $\cup \mathcal{B} = X$.
Then a subset $U$ of $X$ is an open set if for every point $x \in U$, there exists $B \in \mathcal{B}$ such that $x \in B \subseteq U$. Thus the open sets are exactly those which can be obtained as a union of sets of $\mathcal{B}$. Denote these subsets of $X$ by the symbol $\tau$ and refer to $\tau$ as the topology or the set of open sets.
Note that this is simply the analog of saying a set is open exactly when every point is an interior point.
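Proposition 5.26 Let $X$ be a set and let $\mathcal{B}$ be a basis for a topology as defined above and let $\tau$ be the set of open sets determined by $\mathcal{B}$. Then
$$\emptyset \in \tau, \ X \in \tau, \quad (5.7)$$
$$\text{If } \mathcal{C} \subseteq \tau, \text{ then } \cup \mathcal{C} \in \tau \quad (5.8)$$
$$\text{If } A, B \in \tau, \text{ then } A \cap B \in \tau. \quad (5.9)$$
Proof: If $p \in \emptyset$ then there exists $B \in \mathcal{B}$ such that $p \in B \subseteq \emptyset$ because there are no points in $\emptyset$. Therefore, $\emptyset \in \tau$. Now if $p \in X$, then by part 2.) of Definition 5.25 $p \in B \subseteq X$ for some $B \in \mathcal{B}$ and so $X \in \tau$.
If $\mathcal{C} \subseteq \tau$, and if $p \in \cup \mathcal{C}$, then there exists a set $B \in \mathcal{C}$ such that $p \in B$. However, $B$ is itself a union of sets from $\mathcal{B}$ and so there exists $C \in \mathcal{B}$ such that $p \in C \subseteq B \subseteq \cup \mathcal{C}$. This verifies 5.8.
Finally, if $A, B \in \tau$ and $p \in A \cap B$, then since $A$ and $B$ are themselves unions of sets of $\mathcal{B}$, it follows there exist $A_1, B_1 \in \mathcal{B}$ such that $A_1 \subseteq A$, $B_1 \subseteq B$, and $p \in A_1 \cap B_1$. Therefore, by 1.) of Definition 5.25 there exists $C \in \mathcal{B}$ such that $p \in C \subseteq A_1 \cap B_1 \subseteq A \cap B$, showing that $A \cap B \in \tau$ as claimed. Of course if $A \cap B = \emptyset$, then $A \cap B \in \tau$. This proves the proposition.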
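On a finite set, the passage from a basis to its topology can be carried out by brute force, which gives a way to test 5.7 - 5.9 on small examples. The following is a sketch only; the names `is_basis` and `topology_from_basis` and the three point example are assumptions made for illustration.

```python
from itertools import combinations

def is_basis(X, basis):
    """Check axioms 1.) and 2.) of Definition 5.25 for a finite family of frozensets."""
    covers = set().union(*basis) == set(X)
    refines = all(
        any(p in C and C <= (A & B) for C in basis)
        for A in basis for B in basis for p in (A & B)
    )
    return covers and refines

def topology_from_basis(basis):
    """All unions of subfamilies of the basis; the empty union gives the empty set."""
    opens = {frozenset()}
    for r in range(1, len(basis) + 1):
        for family in combinations(basis, r):
            opens.add(frozenset().union(*family))
    return opens

# Hypothetical example on a three point set.
X = {1, 2, 3}
basis = [frozenset({1}), frozenset({2}), frozenset({1, 2, 3})]
tau = topology_from_basis(basis)
```

Here `tau` consists of $\emptyset$, $\{1\}$, $\{2\}$, $\{1,2\}$ and $\{1,2,3\}$, and one can check directly that it contains $\emptyset$ and $X$ and is closed under unions and under pairwise intersections, as Proposition 5.26 asserts.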
Definition 5.27 A set $X$ together with such a collection of its subsets satisfying 5.7-5.9 is called a topological space. $\tau$ is called the topology or set of open sets of $X$.
Definition 5.28 A topological space is said to be Hausdorff if whenever $p$ and $q$ are distinct points of $X$, there exist disjoint open sets $U, V$ such that $p \in U$, $q \in V$. In other words points can be separated with open sets.
[Figure: Hausdorff. The point $p$ lies in $U$ and the point $q$ lies in $V$, with $U$ and $V$ disjoint.]
Definition 5.29 A subset of a topological space is said to be closed if its complement is open. Let $p$ be a point of $X$ and let $E \subseteq X$. Then $p$ is said to be a limit point of $E$ if every open set containing $p$ contains a point of $E$ distinct from $p$.
Note that if the topological space is Hausdorff, then this definition is equivalent to requiring that every open set containing $p$ contains infinitely many points from $E$. Why?
Theorem 5.30 A subset $E$ of $X$ is closed if and only if it contains all its limit points.
Proof: Suppose first that $E$ is closed and let $x$ be a limit point of $E$. Is $x \in E$? If $x \notin E$, then $E^C$ is an open set containing $x$ which contains no points of $E$, a contradiction. Thus $x \in E$.
Now suppose $E$ contains all its limit points. Is the complement of $E$ open? If $x \in E^C$, then $x$ is not a limit point of $E$ because $E$ has all its limit points and so there exists an open set $U$ containing $x$ such that $U$ contains no point of $E$ other than $x$. Since $x \notin E$, it follows that $x \in U \subseteq E^C$ which implies $E^C$ is an open set because this shows $E^C$ is the union of open sets.
Theorem 5.31 If $(X, \tau)$ is a Hausdorff space and if $p \in X$, then $\{p\}$ is a closed set.
Proof: If $x \ne p$, there exist open sets $U$ and $V$ such that $x \in U$, $p \in V$ and $U \cap V = \emptyset$. Therefore, $\{p\}^C$ is an open set so $\{p\}$ is closed.
Note that the Hausdorff axiom was stronger than needed in order to draw the conclusion of the last theorem. In fact it would have been enough to assume that if $x \ne y$, then there exists an open set containing $x$ which does not intersect $\{y\}$.
Definition 5.32 A topological space $(X, \tau)$ is said to be regular if whenever $C$ is a closed set and $p$ is a point not in $C$, there exist disjoint open sets $U$ and $V$ such that $p \in U$, $C \subseteq V$. Thus a closed set can be separated from a point not in the closed set by two disjoint open sets.
[Figure: Regular. The point $p$ lies in $U$ and the closed set $C$ lies in $V$, with $U$ and $V$ disjoint.]
Definition 5.33 The topological space $(X, \tau)$ is said to be normal if whenever $C$ and $K$ are disjoint closed sets, there exist disjoint open sets $U$ and $V$ such that $C \subseteq U$, $K \subseteq V$. Thus any two disjoint closed sets can be separated with open sets.
[Figure: Normal. The closed set $C$ lies in $U$ and the closed set $K$ lies in $V$, with $U$ and $V$ disjoint.]
Definition 5.34 Let $E$ be a subset of $X$. $\overline{E}$ is defined to be the smallest closed set containing $E$.
Lemma 5.35 The above definition is well defined.
Proof: Let $\mathcal{C}$ denote all the closed sets which contain $E$. Then $\mathcal{C}$ is nonempty because $X \in \mathcal{C}$.
$$(\cap\{A : A \in \mathcal{C}\})^C = \cup\{A^C : A \in \mathcal{C}\},$$
an open set which shows that $\cap \mathcal{C}$ is a closed set and is the smallest closed set which contains $E$.
Theorem 5.36 $\overline{E} = E \cup \{\text{limit points of } E\}$.
Proof: Let $x \in \overline{E}$ and suppose that $x \notin E$. If $x$ is not a limit point either, then there exists an open set $U$ containing $x$ which does not intersect $E$. But then $U^C$ is a closed set which contains $E$ which does not contain $x$, contrary to the definition that $\overline{E}$ is the intersection of all closed sets containing $E$. Therefore, $x$ must be a limit point of $E$ after all.
Now $E \subseteq \overline{E}$ so suppose $x$ is a limit point of $E$. Is $x \in \overline{E}$? If $H$ is a closed set containing $E$ which does not contain $x$, then $H^C$ is an open set containing $x$ which contains no points of $E$ other than $x$, negating the assumption that $x$ is a limit point of $E$.
The following is the definition of continuity in terms of general topological spaces. It is really just a generalization of the $\varepsilon$ - $\delta$ definition of continuity given in calculus.
Definition 5.37 Let $(X, \tau)$ and $(Y, \eta)$ be two topological spaces and let $f : X \to Y$. $f$ is continuous at $x \in X$ if whenever $V$ is an open set of $Y$ containing $f(x)$, there exists an open set $U \in \tau$ such that $x \in U$ and $f(U) \subseteq V$. $f$ is continuous if $f^{-1}(V) \in \tau$ whenever $V \in \eta$.
You should prove the following.
Proposition 5.38 In the situation of Definition 5.37 $f$ is continuous if and only if $f$ is continuous at every point of $X$.
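Definition 5.39 Let $(X_i, \tau_i)$ be topological spaces. $\prod_{i=1}^n X_i$ is the Cartesian product. Define a product topology as follows. Let $\mathcal{B}$ be the collection of all sets of the form $\prod_{i=1}^n A_i$ where $A_i \in \tau_i$. Then $\mathcal{B}$ is a basis for the product topology.
Theorem 5.40 The set $\mathcal{B}$ of Definition 5.39 is a basis for a topology.
Proof: Suppose $x \in \prod_{i=1}^n A_i \cap \prod_{i=1}^n B_i$ where $A_i$ and $B_i$ are open sets. Say
$$x = (x_1, \cdots, x_n).$$
Then $x_i \in A_i \cap B_i$ for each $i$. Therefore, $x \in \prod_{i=1}^n A_i \cap B_i \in \mathcal{B}$ and $\prod_{i=1}^n A_i \cap B_i \subseteq \prod_{i=1}^n A_i$, and similarly this set is contained in $\prod_{i=1}^n B_i$.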
The definition of compactness is also considered for a general topological space. This is given next.
Definition 5.41 A subset $E$ of a topological space $(X, \tau)$ is said to be compact if whenever $\mathcal{C} \subseteq \tau$ and $E \subseteq \cup \mathcal{C}$, there exists a finite subset of $\mathcal{C}$, $\{U_1 \cdots U_n\}$, such that $E \subseteq \cup_{i=1}^n U_i$. (Every open covering admits a finite subcovering.) $E$ is precompact if $\overline{E}$ is compact. A topological space is called locally compact if it has a basis $\mathcal{B}$ with the property that $\overline{B}$ is compact for each $B \in \mathcal{B}$.
In general topological spaces there may be no concept of bounded. Even if there is, closed and bounded is not necessarily the same as compactness. However, in any Hausdorff space every compact set must be a closed set.
Theorem 5.42 If $(X, \tau)$ is a Hausdorff space, then every compact subset must also be a closed set.
Proof: Let $K$ be a compact subset and suppose $p \notin K$. For each $x \in K$, there exist open sets $U_x$ and $V_x$ such that
$$x \in U_x, \ p \in V_x,$$
and
$$U_x \cap V_x = \emptyset.$$
Since $K$ is assumed to be compact, there are finitely many of these sets $U_{x_1}, \cdots, U_{x_m}$ which cover $K$. Then let $V \equiv \cap_{i=1}^m V_{x_i}$. It follows that $V$ is an open set containing $p$ which has empty intersection with each of the $U_{x_i}$. Consequently, $V$ contains no points of $K$ and $p$ is therefore not a limit point of $K$. Thus $K$ contains all its limit points and so it is closed by Theorem 5.30. This proves the theorem.
A useful construction when dealing with locally compact Hausdorff spaces is the notion of the one point compactification of the space.
Definition 5.43 Suppose $(X, \tau)$ is a locally compact Hausdorff space. Then let $\widetilde{X} \equiv X \cup \{\infty\}$ where $\infty$ is just the name of some point which is not in $X$ which is called the point at infinity. A basis for the topology $\widetilde{\tau}$ for $\widetilde{X}$ is
$$\tau \cup \left\{ K^C \text{ where } K \text{ is a compact subset of } X \right\}.$$
The complement is taken with respect to $\widetilde{X}$ and so the open sets $K^C$ are basic open sets which contain $\infty$.
The reason this is called a compactification is contained in the next lemma.
Lemma 5.44 If $(X, \tau)$ is a locally compact Hausdorff space, then $(\widetilde{X}, \widetilde{\tau})$ is a compact Hausdorff space. Also if $U$ is an open set of $\widetilde{\tau}$, then $U \setminus \{\infty\}$ is an open set of $\tau$.
Proof: Since $(X, \tau)$ is a locally compact Hausdorff space, it follows $(\widetilde{X}, \widetilde{\tau})$ is a Hausdorff topological space. The only case which needs checking is the one of $p \in X$ and $\infty$. Since $(X, \tau)$ is locally compact, there exists an open set of $\tau$, $U$, having compact closure which contains $p$. Then $p \in U$ and $\infty \in \overline{U}^C$ and these are disjoint open sets containing the points $p$ and $\infty$ respectively. Now let $\mathcal{C}$ be an open cover of $\widetilde{X}$ with sets from $\widetilde{\tau}$. Then $\infty$ must be in some set $U_\infty$ from $\mathcal{C}$, which must contain a set of the form $K^C$ where $K$ is a compact subset of $X$. Then there exist sets from $\mathcal{C}$, $U_1, \cdots, U_r$, which cover $K$. Therefore, a finite subcover of $\widetilde{X}$ is $U_1, \cdots, U_r, U_\infty$.
To see the last claim, suppose $U$ contains $\infty$ since otherwise there is nothing to show. Notice that if $C$ is a compact set, then $X \setminus C$ is an open set. Therefore, if $x \in U \setminus \{\infty\}$, and if $\widetilde{X} \setminus C$ is a basic open set contained in $U$ containing $\infty$, then if $x$ is in this basic open set of $\widetilde{X}$, it is also in the open set $X \setminus C \subseteq U \setminus \{\infty\}$. If $x$ is not in any basic open set of the form $\widetilde{X} \setminus C$ then $x$ is contained in an open set of $\tau$ which is contained in $U \setminus \{\infty\}$. Thus $U \setminus \{\infty\}$ is indeed open in $\tau$. This proves the lemma.
Definition 5.45 If every finite subset of a collection of sets has nonempty intersection, the collection has the finite intersection property.
Theorem 5.46 Let $\mathcal{K}$ be a set whose elements are compact subsets of a Hausdorff topological space $(X, \tau)$. Suppose $\mathcal{K}$ has the finite intersection property. Then $\emptyset \ne \cap \mathcal{K}$.
Proof: Suppose to the contrary that $\emptyset = \cap \mathcal{K}$. Then consider
$$\mathcal{C} \equiv \left\{ K^C : K \in \mathcal{K} \right\}.$$
It follows $\mathcal{C}$ is an open cover of $K_0$ where $K_0$ is any particular element of $\mathcal{K}$. But then there are finitely many $K \in \mathcal{K}$, $K_1, \cdots, K_r$, such that $K_0 \subseteq \cup_{i=1}^r K_i^C$ implying that $\cap_{i=0}^r K_i = \emptyset$, contradicting the finite intersection property.
Lemma 5.47 Let $(X, \tau)$ be a topological space and let $\mathcal{B}$ be a basis for $\tau$. Then $X$ is compact if and only if every open cover of basic open sets admits a finite subcover.
Proof: Suppose first that $X$ is compact. Then if $\mathcal{C}$ is an open cover consisting of basic open sets, it follows it admits a finite subcover because these are open sets in $\mathcal{C}$.
Next suppose that every basic open cover admits a finite subcover and let $\mathcal{C}$ be an open cover of $X$. Then define $\widetilde{\mathcal{C}}$ to be the collection of basic open sets which are contained in some set of $\mathcal{C}$. It follows $\widetilde{\mathcal{C}}$ is a basic open cover of $X$ and so it admits a finite subcover $U_1, \cdots, U_p$. Now each $U_i$ is contained in an open set of $\mathcal{C}$. Let $O_i$ be a set of $\mathcal{C}$ which contains $U_i$. Then $O_1, \cdots, O_p$ is an open cover of $X$. This proves the lemma.
In fact, much more can be said than Lemma 5.47. However, this is all which I will present here.
5.6 Connected Sets
Stated informally, connected sets are those which are in one piece. More precisely,
Definition 5.48 A set $S$ in a general topological space is separated if there exist sets $A, B$ such that
$$S = A \cup B, \ A, B \ne \emptyset, \text{ and } \overline{A} \cap B = \overline{B} \cap A = \emptyset.$$
In this case, the sets $A$ and $B$ are said to separate $S$. A set is connected if it is not separated.
One of the most important theorems about connected sets is the following.
Theorem 5.49 Suppose $U$ and $V$ are connected sets having nonempty intersection. Then $U \cup V$ is also connected.
Proof: Suppose $U \cup V = A \cup B$ where $\overline{A} \cap B = \overline{B} \cap A = \emptyset$. Consider the sets $A \cap U$ and $B \cap U$. Since
$$\overline{(A \cap U)} \cap (B \cap U) = (A \cap U) \cap \overline{(B \cap U)} = \emptyset,$$
it follows one of these sets must be empty since otherwise, $U$ would be separated. It follows that $U$ is contained in either $A$ or $B$. Similarly, $V$ must be contained in either $A$ or $B$. Since $U$ and $V$ have nonempty intersection, it follows that both $V$ and $U$ are contained in one of the sets $A, B$. Therefore, the other must be empty and this shows $U \cup V$ cannot be separated and is therefore connected.
The intersection of connected sets is not necessarily connected as is shown by the following picture.
[Figure: two connected sets $U$ and $V$ whose intersection is not connected.]
Theorem 5.50 Let $f : X \to Y$ be continuous where $X$ and $Y$ are topological spaces and $X$ is connected. Then $f(X)$ is also connected.
Proof: To do this you show $f(X)$ is not separated. Suppose to the contrary that $f(X) = A \cup B$ where $A$ and $B$ separate $f(X)$. Then consider the sets $f^{-1}(A)$ and $f^{-1}(B)$. If $z \in f^{-1}(B)$, then $f(z) \in B$ and so $f(z)$ is not a limit point of $A$. Therefore, there exists an open set $U$ containing $f(z)$ such that $U \cap A = \emptyset$. But then, the continuity of $f$ implies that $f^{-1}(U)$ is an open set containing $z$ such that $f^{-1}(U) \cap f^{-1}(A) = \emptyset$. Therefore, $f^{-1}(B)$ contains no limit points of $f^{-1}(A)$. Similar reasoning implies $f^{-1}(A)$ contains no limit points of $f^{-1}(B)$. It follows that $X$ is separated by $f^{-1}(A)$ and $f^{-1}(B)$, contradicting the assumption that $X$ was connected.
An arbitrary set can be written as a union of maximal connected sets called connected components. This is the concept of the next definition.
Definition 5.51 Let $S$ be a set and let $p \in S$. Denote by $C_p$ the union of all connected subsets of $S$ which contain $p$. This is called the connected component determined by $p$.
Theorem 5.52 Let $C_p$ be a connected component of a set $S$ in a general topological space. Then $C_p$ is a connected set and if $C_p \cap C_q \ne \emptyset$, then $C_p = C_q$.
Proof: Let $\mathcal{C}$ denote the connected subsets of $S$ which contain $p$. If $C_p = A \cup B$ where
$$\overline{A} \cap B = \overline{B} \cap A = \emptyset,$$
then $p$ is in one of $A$ or $B$. Suppose without loss of generality $p \in A$. Then every set of $\mathcal{C}$ must also be contained in $A$ since otherwise, as in Theorem 5.49, the set would be separated. But this implies $B$ is empty. Therefore, $C_p$ is connected. From this, and Theorem 5.49, the second assertion of the theorem is proved.
This shows the connected components of a set are equivalence classes and partition the set.
A set $I$ is an interval in $\mathbb{R}$ if and only if whenever $x, y \in I$ then $(x, y) \subseteq I$. The following theorem is about the connected sets in $\mathbb{R}$.
Theorem 5.53 A set, C in 1 is connected if and only if C is an interval.
Proof: Let C be connected. If C consists of a single point, p, there is nothing
to prove. The interval is just [p, p] . Suppose p < q and p, q C. You need to show
(p, q) C. If
x (p, q) C
let C (, x) A, and C (x, ) B. Then C = AB and the sets, A and B
separate C contrary to the assumption that C is connected.
Conversely, let I be an interval. Suppose I is separated by A and B. Pick x A
and y B. Suppose without loss of generality that x < y. Now dene the set,
S t [x, y] : [x, t] A
and let l be the least upper bound of S. Then l A so l / B which implies l A.
But if l / B, then for some > 0,
(l, l +) B =
contradicting the denition of l as an upper bound for S. Therefore, l B which
implies l / A after all, a contradiction. It follows I must be connected.
The following theorem is a very useful description of the open sets in $\mathbb{R}$.
Theorem 5.54 Let $U$ be an open set in $\mathbb{R}$. Then there exist countably many disjoint open sets $\{(a_i, b_i)\}_{i=1}^{\infty}$ such that $U = \cup_{i=1}^{\infty} (a_i, b_i)$.
Proof: Let $p \in U$ and let $z \in C_p$, the connected component determined by $p$. Since $U$ is open, there exists $\delta > 0$ such that $(z - \delta, z + \delta) \subseteq U$. It follows from Theorem 5.49 that
$$(z - \delta, z + \delta) \subseteq C_p.$$
This shows $C_p$ is open. By Theorem 5.53, this shows $C_p$ is an open interval $(a, b)$ where $a, b \in [-\infty, \infty]$. There are therefore at most countably many of these connected components because each must contain a rational number and the rational numbers are countable. Denote by $\{(a_i, b_i)\}_{i=1}^{\infty}$ the set of these connected components. This proves the theorem.
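When the open set is given as a finite union of open intervals, the connected components of Theorem 5.54 can be computed directly. The following Python sketch is only an illustration under that finiteness assumption; the function name `components` is hypothetical and not part of the notes.

```python
def components(intervals):
    """Return the disjoint open intervals whose union equals the union of `intervals`.

    Each pair (a, b) with a < b stands for the open interval (a, b).
    """
    merged = []
    for a, b in sorted(intervals):
        # Two open intervals (c, d) and (a, b) with c <= a overlap only if a < d.
        # If a == d, the point d lies in neither interval, so they stay separate.
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


result = components([(0, 1), (0.5, 2), (2, 3), (5, 6), (5.5, 5.7)])
print(result)
```

Note that $(0, 2)$ and $(2, 3)$ are kept apart: the point $2$ is not in the union, so it separates the two components.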
Definition 5.55 A topological space $E$ is arcwise connected if for any two points $p, q \in E$, there exists a closed interval $[a, b]$ and a continuous function $\gamma : [a, b] \to E$ such that $\gamma(a) = p$ and $\gamma(b) = q$. $E$ is locally connected if it has a basis of connected open sets. $E$ is locally arcwise connected if it has a basis of arcwise connected open sets.
An example of an arcwise connected topological space would be any subset of $\mathbb{R}^n$ which is the continuous image of an interval. Locally connected is not the same as connected. A well known example is the following.
$$\left\{ \left( x, \sin \frac{1}{x} \right) : x \in (0, 1] \right\} \cup \{ (0, y) : y \in [-1, 1] \} \quad (5.10)$$
You can verify that this set of points considered as a metric space with the metric from $\mathbb{R}^2$ is not locally connected or arcwise connected but is connected.
Proposition 5.56 If a topological space is arcwise connected, then it is connected.
Proof: Let $X$ be an arcwise connected space and suppose it is separated. Then $X = A \cup B$ where $A, B$ are two separated sets. Pick $p \in A$ and $q \in B$. Since $X$ is given to be arcwise connected, there must exist a continuous function $\gamma : [a, b] \to X$ such that $\gamma(a) = p$ and $\gamma(b) = q$. But then we would have $\gamma([a, b]) = (\gamma([a, b]) \cap A) \cup (\gamma([a, b]) \cap B)$ and the two sets $\gamma([a, b]) \cap A$ and $\gamma([a, b]) \cap B$ are separated, thus showing that $\gamma([a, b])$ is separated and contradicting Theorem 5.53 and Theorem 5.50. It follows that $X$ must be connected as claimed.
Theorem 5.57 Let $U$ be an open subset of a locally arcwise connected topological space $X$. Then $U$ is arcwise connected if and only if $U$ is connected. Also the connected components of an open set in such a space are open sets, hence arcwise connected.
Proof: By Proposition 5.56 it is only necessary to verify that if $U$ is connected and open in the context of this theorem, then $U$ is arcwise connected. Pick $p \in U$. Say $x \in U$ satisfies $\mathcal{P}$ if there exists a continuous function $\gamma : [a, b] \to U$ such that $\gamma(a) = p$ and $\gamma(b) = x$.
$$A \equiv \{ x \in U \text{ such that } x \text{ satisfies } \mathcal{P} \}.$$
If $x \in A$, there exists, according to the assumption that $X$ is locally arcwise connected, an open set $V$ containing $x$ and contained in $U$ which is arcwise connected. Thus letting $y \in V$, there exist intervals $[a, b]$ and $[c, d]$ and continuous functions $\gamma, \eta$ having values in $U$ such that $\gamma(a) = p$, $\gamma(b) = x$, $\eta(c) = x$, and $\eta(d) = y$. Then let $\gamma_1 : [a, b + d - c] \to U$ be defined as
$$\gamma_1(t) \equiv \begin{cases} \gamma(t) & \text{if } t \in [a, b] \\ \eta(t + c - b) & \text{if } t \in [b, b + d - c] \end{cases}$$
Then it is clear that $\gamma_1$ is a continuous function mapping $p$ to $y$ and showing that $V \subseteq A$. Therefore, $A$ is open. $A \neq \emptyset$ because there is an open set $V$ containing $p$ which is contained in $U$ and is arcwise connected.
Now consider $B \equiv U \setminus A$. This is also open. If $B$ is not open, there exists a point $z \in B$ such that every open set containing $z$ is not contained in $B$. Therefore, letting $V$ be one of the basic open sets chosen such that $z \in V \subseteq U$, there exist points of $A$ contained in $V$. But then, a repeat of the above argument shows $z \in A$ also. Hence $B$ is open and so if $B \neq \emptyset$, then $U = B \cup A$ and so $U$ is separated by the two sets $B$ and $A$, contradicting the assumption that $U$ is connected.
It remains to verify the connected components are open. Let $z \in C_p$ where $C_p$ is the connected component determined by $p$. Then picking $V$ an arcwise connected open set which contains $z$ and is contained in $U$, $C_p \cup V$ is connected and contained in $U$ and so it must also be contained in $C_p$. Hence $z \in V \subseteq C_p$, which shows $C_p$ is open.
As an application, consider the following corollary.
Corollary 5.58 Let $f : \Omega \to \mathbb{Z}$ be continuous where $\Omega$ is a connected open set. Then $f$ must be a constant.
Proof: Suppose not. Then it achieves two different values, $k$ and $l \neq k$. Then $\Omega = f^{-1}(l) \cup f^{-1}(\{m \in \mathbb{Z} : m \neq l\})$ and these are disjoint nonempty open sets which separate $\Omega$. To see they are open, note
$$f^{-1}(\{m \in \mathbb{Z} : m \neq l\}) = f^{-1}\left( \bigcup_{m \neq l} \left( m - \frac{1}{6}, m + \frac{1}{6} \right) \right)$$
which is the inverse image of an open set.
5.7 Exercises
1. Let $V$ be an open set in $\mathbb{R}^n$. Show there is an increasing sequence of open sets $U_m$ such that for all $m \in \mathbb{N}$, $\overline{U_m} \subseteq U_{m+1}$, $\overline{U_m}$ is compact, and $V = \cup_{m=1}^{\infty} U_m$.
2. Completeness of $\mathbb{R}$ is an axiom. Using this, show $\mathbb{R}^n$ and $\mathbb{C}^n$ are complete metric spaces with respect to the distance given by the usual norm.
3. Let $X$ be a metric space. Can we conclude $\overline{B(x, r)} = \{ y : d(x, y) \le r \}$? Hint: Try letting $X$ consist of an infinite set and let $d(x, y) = 1$ if $x \neq y$ and $d(x, y) = 0$ if $x = y$.
4. The usual version of completeness in $\mathbb{R}$ involves the assertion that a nonempty set which is bounded above has a least upper bound. Show this is equivalent to saying that every Cauchy sequence converges.
5. If $(X, d)$ is a metric space, prove that whenever $K, H$ are disjoint nonempty closed sets, there exists $f : X \to [0, 1]$ such that $f$ is continuous, $f(K) = 0$, and $f(H) = 1$.
6. Consider $\mathbb{R}$ with the usual metric, $d(x, y) = |x - y|$ and the metric,
$$\rho(x, y) = |\arctan x - \arctan y|$$
Thus we have two metric spaces here although they involve the same sets of points. Show the identity map is continuous and has a continuous inverse. Show that $\mathbb{R}$ with the metric $\rho$ is not complete while $\mathbb{R}$ with the usual metric is complete. The first part of this problem shows the two metric spaces are homeomorphic. (That is what it is called when there is a one to one onto continuous map having continuous inverse between two topological spaces.) Thus completeness is not a topological property although it will likely be referred to as such.
7. If $M$ is a separable metric space and $T \subseteq M$, then $T$ is also a separable metric space with the same metric.
8. Prove the Heine Borel theorem as follows. First show $[a, b]$ is compact in $\mathbb{R}$. Next show that $\prod_{i=1}^{n} [a_i, b_i]$ is compact. Use this to verify that compact sets are exactly those which are closed and bounded.
9. Give an example of a metric space in which closed and bounded subsets are not necessarily compact. Hint: Let $X$ be any infinite set and let $d(x, y) = 1$ if $x \neq y$ and $d(x, y) = 0$ if $x = y$. Show this is a metric space. What about $B(x, 2)$?
10. If $f : [a, b] \to \mathbb{R}$ is continuous, show that $f$ is Riemann integrable. Hint: Use the theorem that a function which is continuous on a compact set is uniformly continuous.
11. Give an example of a set $X \subseteq \mathbb{R}^2$ which is connected but not arcwise connected. Recall arcwise connected means for every two points $p, q \in X$ there exists a continuous function $f : [0, 1] \to X$ such that $f(0) = p$, $f(1) = q$.
12. Let $(X, d)$ be a metric space where $d$ is a bounded metric. Let $\mathcal{C}$ denote the collection of closed subsets of $X$. For $A, B \in \mathcal{C}$, define
$$\rho(A, B) \equiv \inf \{ \delta > 0 : A_\delta \supseteq B \text{ and } B_\delta \supseteq A \}$$
where for a set $S$,
$$S_\delta \equiv \{ x : \operatorname{dist}(x, S) \equiv \inf \{ d(x, s) : s \in S \} \le \delta \}.$$
Show $S_\delta$ is a closed set containing $S$. Also show that $\rho$ is a metric on $\mathcal{C}$. This is called the Hausdorff metric.
13. Using 12, suppose $(X, d)$ is a compact metric space. Show $(\mathcal{C}, \rho)$ is a complete metric space. Hint: Show first that if $W_n \downarrow W$ where $W_n$ is closed, then $\rho(W_n, W) \to 0$. Now let $\{A_n\}$ be a Cauchy sequence in $\mathcal{C}$. Then if $\varepsilon > 0$
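there exists $N$ such that when $m, n \ge N$, then $\rho(A_n, A_m) < \varepsilon$. Therefore, for each $n \ge N$,
$$(A_n)_\varepsilon \supseteq \overline{\cup_{k=n}^{\infty} A_k}.$$
Let $A \equiv \cap_{n=1}^{\infty} \overline{\cup_{k=n}^{\infty} A_k}$. By the first part, there exists $N_1 > N$ such that for $n \ge N_1$,
$$\rho\left( \overline{\cup_{k=n}^{\infty} A_k}, A \right) < \varepsilon, \text{ and } (A_n)_\varepsilon \supseteq \overline{\cup_{k=n}^{\infty} A_k}.$$
Therefore, for such $n$ (with $W_n$ denoting $\overline{\cup_{k=n}^{\infty} A_k}$), $A_\varepsilon \supseteq W_n \supseteq A_n$ and $(W_n)_\varepsilon \supseteq (A_n)_\varepsilon \supseteq A$ because
$$(A_n)_\varepsilon \supseteq \overline{\cup_{k=n}^{\infty} A_k} \supseteq A.$$
14. In the situation of the last two problems, let $X$ be a compact metric space. Show $(\mathcal{C}, \rho)$ is compact. Hint: Let $\mathcal{D}_n$ be a $2^{-n}$ net for $X$. Let $\mathcal{K}_n$ denote finite unions of sets of the form $\overline{B(p, 2^{-n})}$ where $p \in \mathcal{D}_n$. Show $\mathcal{K}_n$ is a $2^{-(n-1)}$ net for $(\mathcal{C}, \rho)$.
15. Suppose $U$ is an open connected subset of $\mathbb{R}^n$ and $f : U \to \mathbb{N}$ is continuous. That is, $f$ has values only in $\mathbb{N}$. Also $\mathbb{N}$ is a metric space with respect to the usual metric on $\mathbb{R}$. Show that $f$ must actually be constant.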
Approximation Theorems
6.1 The Bernstein Polynomials
To begin with I will give a famous theorem due to Weierstrass which shows that
every continuous function can be uniformly approximated by polynomials on an
interval. The proof I will give is not the one Weierstrass used. That proof is found
in [37] and also in [31].
The following estimate will be the basis for the Weierstrass approximation theorem. It is actually a statement about the variance of a binomial random variable.
Lemma 6.1 The following estimate holds for $x \in [0, 1]$.
$$\sum_{k=0}^{m} \binom{m}{k} (k - mx)^2 x^k (1 - x)^{m-k} \le \frac{1}{4} m$$
Proof: By the Binomial theorem,
$$\sum_{k=0}^{m} \binom{m}{k} \left( e^t x \right)^k (1 - x)^{m-k} = \left( 1 - x + e^t x \right)^m. \quad (6.1)$$
Differentiating both sides with respect to $t$ and then evaluating at $t = 0$ yields
$$\sum_{k=0}^{m} \binom{m}{k} k x^k (1 - x)^{m-k} = mx.$$
Now doing two derivatives of 6.1 with respect to $t$ yields
$$\sum_{k=0}^{m} \binom{m}{k} k^2 (e^t x)^k (1 - x)^{m-k} = m(m-1) (1 - x + e^t x)^{m-2} e^{2t} x^2 + m (1 - x + e^t x)^{m-1} x e^t.$$
Evaluating this at $t = 0$,
$$\sum_{k=0}^{m} \binom{m}{k} k^2 x^k (1 - x)^{m-k} = m(m-1) x^2 + mx.$$
Therefore,
$$\sum_{k=0}^{m} \binom{m}{k} (k - mx)^2 x^k (1 - x)^{m-k} = m(m-1) x^2 + mx - 2m^2 x^2 + m^2 x^2 = m\left( x - x^2 \right) \le \frac{1}{4} m.$$
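The identity and the bound in Lemma 6.1 are easy to check numerically. The following Python sketch evaluates the sum directly for a few values of $m$ and $x$; the helper name `second_moment` is an illustrative choice, not notation from the notes.

```python
from math import comb


def second_moment(m, x):
    """Evaluate sum_k C(m, k) (k - m x)^2 x^k (1 - x)^(m - k) term by term."""
    return sum(comb(m, k) * (k - m * x) ** 2 * x ** k * (1 - x) ** (m - k)
               for k in range(m + 1))


for m in (1, 5, 20):
    for x in (0.0, 0.25, 0.5, 0.9, 1.0):
        lhs = second_moment(m, x)
        assert abs(lhs - m * (x - x * x)) < 1e-9  # closed form m (x - x^2)
        assert lhs <= m / 4 + 1e-9                # the bound of the lemma
```

The maximum of $m(x - x^2)$ over $[0, 1]$ occurs at $x = 1/2$, which is where the bound $m/4$ is attained.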
Definition 6.2 Let $f \in C([0, 1])$. Then the following polynomials are known as the Bernstein polynomials.
$$p_n(x) \equiv \sum_{k=0}^{n} \binom{n}{k} f\left( \frac{k}{n} \right) x^k (1 - x)^{n-k}.$$
Theorem 6.3 Let $f \in C([0, 1])$ and let $p_n$ be given in Definition 6.2. Then
$$\lim_{n \to \infty} \| f - p_n \|_\infty = 0.$$
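Proof: Since $f$ is continuous on the compact $[0, 1]$, it follows $f$ is uniformly continuous there and so if $\varepsilon > 0$ is given, there exists $\delta > 0$ such that if
$$|y - x| \le \delta,$$
then
$$|f(x) - f(y)| < \varepsilon/2.$$
By the Binomial theorem,
$$f(x) = \sum_{k=0}^{n} \binom{n}{k} f(x) x^k (1 - x)^{n-k}$$
and so
$$|p_n(x) - f(x)| \le \sum_{k=0}^{n} \binom{n}{k} \left| f\left( \frac{k}{n} \right) - f(x) \right| x^k (1 - x)^{n-k}$$
$$\le \sum_{|k/n - x| > \delta} \binom{n}{k} \left| f\left( \frac{k}{n} \right) - f(x) \right| x^k (1 - x)^{n-k} + \sum_{|k/n - x| \le \delta} \binom{n}{k} \left| f\left( \frac{k}{n} \right) - f(x) \right| x^k (1 - x)^{n-k}$$
$$< \varepsilon/2 + 2 \|f\|_\infty \sum_{(k - nx)^2 > n^2 \delta^2} \binom{n}{k} x^k (1 - x)^{n-k}$$
$$\le \frac{2 \|f\|_\infty}{n^2 \delta^2} \sum_{k=0}^{n} \binom{n}{k} (k - nx)^2 x^k (1 - x)^{n-k} + \varepsilon/2.$$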
By the lemma,
$$\le \frac{4 \|f\|_\infty}{\delta^2 n} + \varepsilon/2 < \varepsilon$$
whenever $n$ is large enough. This proves the theorem.
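The convergence in Theorem 6.3 can be observed numerically. The Python sketch below builds the Bernstein polynomial of Definition 6.2 and estimates the sup norm error on a uniform grid, so the reported numbers are sampled estimates rather than exact norms. The names `bernstein` and `sup_error` and the test function $|x - 1/2|$ are illustrative assumptions.

```python
from math import comb


def bernstein(f, n):
    """Return the n-th Bernstein polynomial of f as a function on [0, 1]."""
    def p(x):
        return sum(comb(n, k) * f(k / n) * x ** k * (1 - x) ** (n - k)
                   for k in range(n + 1))
    return p


def sup_error(f, n, samples=201):
    """Estimate ||f - p_n||_inf by sampling a uniform grid on [0, 1]."""
    p = bernstein(f, n)
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - p(x)) for x in grid)


f = lambda x: abs(x - 0.5)  # continuous, but not differentiable at 1/2
errors = [sup_error(f, n) for n in (4, 16, 64, 256)]
print(errors)
```

The errors shrink slowly as $n$ grows, which is typical: Bernstein polynomials are convenient for the proof but are not an efficient approximation scheme.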
The next corollary is called the Weierstrass approximation theorem.
Corollary 6.4 The polynomials are dense in $C([a, b])$.
Proof: Let $f \in C([a, b])$ and let $h : [0, 1] \to [a, b]$ be linear and onto. Then $f \circ h$ is a continuous function defined on $[0, 1]$ and so there exists a polynomial $p_n$ such that
$$|f(h(t)) - p_n(t)| < \varepsilon$$
for all $t \in [0, 1]$. Therefore for all $x \in [a, b]$,
$$\left| f(x) - p_n\left( h^{-1}(x) \right) \right| < \varepsilon.$$
Since $h$ is linear, $p_n \circ h^{-1}$ is a polynomial. This proves the theorem.
The next result is the key to the profound generalization of the Weierstrass
theorem due to Stone in which an interval will be replaced by a compact or locally
compact set and polynomials will be replaced with elements of an algebra satisfying
certain axioms.
Corollary 6.5 On the interval $[-M, M]$, there exist polynomials $p_n$ such that
$$p_n(0) = 0$$
and
$$\lim_{n \to \infty} \| p_n - |\cdot| \|_\infty = 0.$$
Proof: Let $\tilde{p}_n \to |\cdot|$ uniformly and let
$$p_n \equiv \tilde{p}_n - \tilde{p}_n(0).$$
This proves the corollary.
6.2 Stone Weierstrass Theorem
6.2.1 The Case Of Compact Sets
There is a profound generalization of the Weierstrass approximation theorem due
to Stone.
Definition 6.6 $\mathcal{A}$ is an algebra of functions if $\mathcal{A}$ is a vector space and if whenever $f, g \in \mathcal{A}$ then $fg \in \mathcal{A}$.
To begin with assume that the field of scalars is $\mathbb{R}$. This will be generalized later.
Definition 6.7 An algebra of functions $\mathcal{A}$ defined on $A$ annihilates no point of $A$ if for all $x \in A$, there exists $g \in \mathcal{A}$ such that $g(x) \neq 0$. The algebra separates points if whenever $x_1 \neq x_2$, then there exists $g \in \mathcal{A}$ such that $g(x_1) \neq g(x_2)$.
The following generalization is known as the Stone Weierstrass approximation theorem.
Theorem 6.8 Let $A$ be a compact topological space and let $\mathcal{A} \subseteq C(A; \mathbb{R})$ be an algebra of functions which separates points and annihilates no point. Then $\mathcal{A}$ is dense in $C(A; \mathbb{R})$.
Proof: First here is a lemma.
Lemma 6.9 Let $c_1$ and $c_2$ be two real numbers and let $x_1 \neq x_2$ be two points of $A$. Then there exists a function $f_{x_1 x_2}$ such that
$$f_{x_1 x_2}(x_1) = c_1, \quad f_{x_1 x_2}(x_2) = c_2.$$
Proof of the lemma: Let $g \in \mathcal{A}$ satisfy
$$g(x_1) \neq g(x_2).$$
Such a $g$ exists because the algebra separates points. Since the algebra annihilates no point, there exist functions $h$ and $k$ such that
$$h(x_1) \neq 0, \quad k(x_2) \neq 0.$$
Then let
$$u \equiv gh - g(x_2) h, \quad v \equiv gk - g(x_1) k.$$
It follows that $u(x_1) \neq 0$ and $u(x_2) = 0$ while $v(x_2) \neq 0$ and $v(x_1) = 0$. Let
$$f_{x_1 x_2} \equiv \frac{c_1 u}{u(x_1)} + \frac{c_2 v}{v(x_2)}.$$
This proves the lemma. Now continue the proof of Theorem 6.8.
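The construction in the proof of Lemma 6.9 is explicit enough to run. The Python sketch below follows it step by step; the function name `interpolant` and the sample algebra (polynomials with no constant term on $[1, 2]$, generated by $x \mapsto x$) are illustrative assumptions, not part of the notes.

```python
def interpolant(g, h, k, x1, x2, c1, c2):
    """Build f with f(x1) = c1 and f(x2) = c2 as in the proof of Lemma 6.9.

    Requires g(x1) != g(x2), h(x1) != 0 and k(x2) != 0.
    """
    u = lambda x: g(x) * h(x) - g(x2) * h(x)  # u(x2) = 0 and u(x1) != 0
    v = lambda x: g(x) * k(x) - g(x1) * k(x)  # v(x1) = 0 and v(x2) != 0
    return lambda x: c1 * u(x) / u(x1) + c2 * v(x) / v(x2)


# On [1, 2] the function x -> x separates points and vanishes nowhere,
# so it can play the role of g, h and k at once.
identity = lambda x: x
f = interpolant(identity, identity, identity, x1=1.0, x2=2.0, c1=3.0, c2=-5.0)
print(f(1.0), f(2.0))
```

Only products and linear combinations of $g$, $h$, $k$ are used, so the resulting function stays inside the algebra.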
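First note that $\overline{\mathcal{A}}$ satisfies the same axioms as $\mathcal{A}$ but in addition to these axioms, $\overline{\mathcal{A}}$ is closed. The closure of $\mathcal{A}$ is taken with respect to the usual norm on $C(A)$,
$$\|f\|_\infty \equiv \max \{ |f(x)| : x \in A \}.$$
Suppose $f \in \overline{\mathcal{A}}$ and suppose $M$ is large enough that
$$\|f\|_\infty < M.$$
Using Corollary 6.5, there exists $\{p_n\}$, a sequence of polynomials such that
$$\| p_n - |\cdot| \|_\infty \to 0, \quad p_n(0) = 0.$$
It follows that $p_n \circ f \in \overline{\mathcal{A}}$ and so $|f| \in \overline{\mathcal{A}}$ whenever $f \in \overline{\mathcal{A}}$. Also note that
$$\max(f, g) = \frac{|f - g| + (f + g)}{2}$$
$$\min(f, g) = \frac{(f + g) - |f - g|}{2}.$$
Therefore, this shows that if $f, g \in \overline{\mathcal{A}}$ then
$$\max(f, g), \ \min(f, g) \in \overline{\mathcal{A}}.$$
By induction, if $f_i$, $i = 1, 2, \cdots, m$ are in $\overline{\mathcal{A}}$ then
$$\max(f_i, i = 1, 2, \cdots, m), \ \min(f_i, i = 1, 2, \cdots, m) \in \overline{\mathcal{A}}.$$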
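The two formulas expressing the maximum and minimum through sums, differences, and absolute values can be checked pointwise. In the Python sketch below the names `fmax` and `fmin` and the two sample functions are illustrative assumptions.

```python
def fmax(f, g):
    """Pointwise maximum written using only +, - and the absolute value."""
    return lambda x: (abs(f(x) - g(x)) + (f(x) + g(x))) / 2


def fmin(f, g):
    """Pointwise minimum written using only +, - and the absolute value."""
    return lambda x: ((f(x) + g(x)) - abs(f(x) - g(x))) / 2


f = lambda x: x * x
g = lambda x: 1 - x
points = [i / 10 for i in range(-20, 21)]
ok = all(abs(fmax(f, g)(x) - max(f(x), g(x))) < 1e-12 and
         abs(fmin(f, g)(x) - min(f(x), g(x))) < 1e-12 for x in points)
print(ok)
```

This is why closure under $f \mapsto |f|$ is the key step: once absolute values stay in the closed algebra, so do finite maxima and minima.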
Now let $h \in C(A; \mathbb{R})$ and let $x \in A$. Use Lemma 6.9 to obtain $f_{xy}$, a function of $\overline{\mathcal{A}}$ which agrees with $h$ at $x$ and $y$. Letting $\varepsilon > 0$, there exists an open set $U(y)$ containing $y$ such that
$$f_{xy}(z) > h(z) - \varepsilon \text{ if } z \in U(y).$$
Since $A$ is compact, let $U(y_1), \cdots, U(y_l)$ cover $A$. Let
$$f_x \equiv \max(f_{xy_1}, f_{xy_2}, \cdots, f_{xy_l}).$$
Then $f_x \in \overline{\mathcal{A}}$ and
$$f_x(z) > h(z) - \varepsilon$$
for all $z \in A$ and $f_x(x) = h(x)$. This implies that for each $x \in A$ there exists an open set $V(x)$ containing $x$ such that for $z \in V(x)$,
$$f_x(z) < h(z) + \varepsilon.$$
Let $V(x_1), \cdots, V(x_m)$ cover $A$ and let
$$f \equiv \min(f_{x_1}, \cdots, f_{x_m}).$$
Therefore,
$$f(z) < h(z) + \varepsilon$$
for all $z \in A$ and since $f_x(z) > h(z) - \varepsilon$ for all $z \in A$, it follows
$$f(z) > h(z) - \varepsilon$$
also and so
$$|f(z) - h(z)| < \varepsilon$$
for all $z$. Since $\varepsilon$ is arbitrary, this shows $h \in \overline{\mathcal{A}}$ and proves $\overline{\mathcal{A}} = C(A; \mathbb{R})$. This proves the theorem.
6.2.2 The Case Of Locally Compact Sets
Definition 6.10 Let $(X, \tau)$ be a locally compact Hausdorff space. $C_0(X)$ denotes the space of real or complex valued continuous functions defined on $X$ with the property that if $f \in C_0(X)$, then for each $\varepsilon > 0$ there exists a compact set $K$ such that $|f(x)| < \varepsilon$ for all $x \notin K$. Define
$$\|f\|_\infty = \sup \{ |f(x)| : x \in X \}.$$
Lemma 6.11 For $(X, \tau)$ a locally compact Hausdorff space with the above norm, $C_0(X)$ is a complete space.
Proof: Let $\left( \tilde{X}, \tilde{\tau} \right)$ be the one point compactification described in Lemma 5.44.
$$D \equiv \left\{ f \in C\left( \tilde{X} \right) : f(\infty) = 0 \right\}.$$
Then $D$ is a closed subspace of $C\left( \tilde{X} \right)$. For $f \in C_0(X)$,
$$\tilde{f}(x) \equiv \begin{cases} f(x) & \text{if } x \in X \\ 0 & \text{if } x = \infty \end{cases}$$
and let $\theta : C_0(X) \to D$ be given by $\theta f = \tilde{f}$. Then $\theta$ is one to one and onto and also satisfies $\|f\|_\infty = \|\theta f\|_\infty$. Now $D$ is complete because it is a closed subspace of a complete space and so $C_0(X)$ with $\|\cdot\|_\infty$ is also complete. This proves the lemma.
The above refers to functions which have values in $\mathbb{C}$ but the same proof works for functions which have values in any complete normed linear space.
In the case where the functions in $C_0(X)$ all have real values, I will denote the resulting space by $C_0(X; \mathbb{R})$ with similar meanings in other cases.
With this lemma, the generalization of the Stone Weierstrass theorem to locally
compact sets is as follows.
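Theorem 6.12 Let $\mathcal{A}$ be an algebra of functions in $C_0(X; \mathbb{R})$ where $(X, \tau)$ is a locally compact Hausdorff space which separates the points and annihilates no point. Then $\mathcal{A}$ is dense in $C_0(X; \mathbb{R})$.
Proof: Let $\left( \tilde{X}, \tilde{\tau} \right)$ be the one point compactification as described in Lemma 5.44. Let $\tilde{\mathcal{A}}$ denote all finite linear combinations of the form
$$\left\{ \sum_{i=1}^{n} c_i \tilde{f}_i + c_0 : f_i \in \mathcal{A}, \ c_i \in \mathbb{R} \right\}$$
where for $f \in C_0(X; \mathbb{R})$,
$$\tilde{f}(x) \equiv \begin{cases} f(x) & \text{if } x \in X \\ 0 & \text{if } x = \infty \end{cases}.$$
Then $\tilde{\mathcal{A}}$ is obviously an algebra of functions in $C\left( \tilde{X}; \mathbb{R} \right)$. It separates points because this is true of $\mathcal{A}$. Similarly, it annihilates no point because of the inclusion of $c_0$, an arbitrary element of $\mathbb{R}$, in the definition above. Therefore from Theorem 6.8, $\tilde{\mathcal{A}}$ is dense in $C\left( \tilde{X}; \mathbb{R} \right)$. Letting $f \in C_0(X; \mathbb{R})$, it follows $\tilde{f} \in C\left( \tilde{X}; \mathbb{R} \right)$ and so there exists a sequence $\{h_n\} \subseteq \tilde{\mathcal{A}}$ such that $h_n$ converges uniformly to $\tilde{f}$. Now $h_n$ is of the form $\sum_{i=1}^{n} c_i^n \tilde{f}_i^n + c_0^n$ and since $\tilde{f}(\infty) = 0$, you can take each $c_0^n = 0$ and so this has shown the existence of a sequence of functions in $\mathcal{A}$ such that it converges uniformly to $f$. This proves the theorem.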
6.2.3 The Case Of Complex Valued Functions
What about the general case where $C_0(X)$ consists of complex valued functions and the field of scalars is $\mathbb{C}$ rather than $\mathbb{R}$? The following is the version of the Stone Weierstrass theorem which applies to this case. You have to assume that for $f \in \mathcal{A}$ it follows $\overline{f} \in \mathcal{A}$. Such an algebra is called self adjoint.
Theorem 6.13 Suppose $\mathcal{A}$ is an algebra of functions in $C_0(X)$, where $X$ is a locally compact Hausdorff space, which separates the points, annihilates no point, and has the property that if $f \in \mathcal{A}$, then $\overline{f} \in \mathcal{A}$. Then $\mathcal{A}$ is dense in $C_0(X)$.
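Proof: Let $\operatorname{Re} \mathcal{A} \equiv \{ \operatorname{Re} f : f \in \mathcal{A} \}$, $\operatorname{Im} \mathcal{A} \equiv \{ \operatorname{Im} f : f \in \mathcal{A} \}$. First I will show that $\mathcal{A} = \operatorname{Re} \mathcal{A} + i \operatorname{Im} \mathcal{A} = \operatorname{Im} \mathcal{A} + i \operatorname{Re} \mathcal{A}$. Let $f \in \mathcal{A}$. Then
$$f = \frac{1}{2} \left( f + \overline{f} \right) + \frac{1}{2} \left( f - \overline{f} \right) = \operatorname{Re} f + i \operatorname{Im} f \in \operatorname{Re} \mathcal{A} + i \operatorname{Im} \mathcal{A}$$
and so $\mathcal{A} \subseteq \operatorname{Re} \mathcal{A} + i \operatorname{Im} \mathcal{A}$. Also
$$f = \frac{1}{2i} \left( if - \overline{if} \right) - \frac{i}{2} \left( if + \overline{if} \right) = \operatorname{Im}(if) + i \operatorname{Re}(-if) \in \operatorname{Im} \mathcal{A} + i \operatorname{Re} \mathcal{A}$$
This proves one half of the desired equality. Now suppose $h \in \operatorname{Re} \mathcal{A} + i \operatorname{Im} \mathcal{A}$. Then $h = \operatorname{Re} g_1 + i \operatorname{Im} g_2$ where $g_i \in \mathcal{A}$. Then since $\operatorname{Re} g_1 = \frac{1}{2} \left( g_1 + \overline{g_1} \right)$, it follows $\operatorname{Re} g_1 \in \mathcal{A}$. Similarly $\operatorname{Im} g_2 \in \mathcal{A}$. Therefore, $h \in \mathcal{A}$. The case where $h \in \operatorname{Im} \mathcal{A} + i \operatorname{Re} \mathcal{A}$ is similar. This establishes the desired equality.
Now $\operatorname{Re} \mathcal{A}$ and $\operatorname{Im} \mathcal{A}$ are both real algebras. I will show this now. First consider $\operatorname{Im} \mathcal{A}$. It is obvious this is a real vector space. It only remains to verify that the product of two functions in $\operatorname{Im} \mathcal{A}$ is in $\operatorname{Im} \mathcal{A}$. Note that from the first part, $\operatorname{Re} \mathcal{A}$, $\operatorname{Im} \mathcal{A}$ are both subsets of $\mathcal{A}$ because, for example, if $u \in \operatorname{Im} \mathcal{A}$ then $u + 0 \in \operatorname{Im} \mathcal{A} + i \operatorname{Re} \mathcal{A} = \mathcal{A}$. Therefore, if $v, w \in \operatorname{Im} \mathcal{A}$, both $iv$ and $w$ are in $\mathcal{A}$ and so $\operatorname{Im}(ivw) = vw$ and $ivw \in \mathcal{A}$. Similarly, $\operatorname{Re} \mathcal{A}$ is an algebra.
Both $\operatorname{Re} \mathcal{A}$ and $\operatorname{Im} \mathcal{A}$ must separate the points. Here is why: If $x_1 \neq x_2$, then there exists $f \in \mathcal{A}$ such that $f(x_1) \neq f(x_2)$. If $\operatorname{Im} f(x_1) \neq \operatorname{Im} f(x_2)$, this shows there is a function in $\operatorname{Im} \mathcal{A}$, namely $\operatorname{Im} f$, which separates these two points. If $\operatorname{Im} f$ fails to separate the two points, then $\operatorname{Re} f$ must separate the points and so you could consider $\operatorname{Im}(if)$ to get a function in $\operatorname{Im} \mathcal{A}$ which separates these points. This shows $\operatorname{Im} \mathcal{A}$ separates the points. Similarly $\operatorname{Re} \mathcal{A}$ separates the points.
Neither $\operatorname{Re} \mathcal{A}$ nor $\operatorname{Im} \mathcal{A}$ annihilates any point. This is easy to see because if $x$ is a point there exists $f \in \mathcal{A}$ such that $f(x) \neq 0$. Thus either $\operatorname{Re} f(x) \neq 0$ or $\operatorname{Im} f(x) \neq 0$. If $\operatorname{Im} f(x) \neq 0$, this shows this point is not annihilated by $\operatorname{Im} \mathcal{A}$. If $\operatorname{Im} f(x) = 0$, consider $\operatorname{Im}(if)(x) = \operatorname{Re} f(x) \neq 0$. Similarly, $\operatorname{Re} \mathcal{A}$ does not annihilate any point.
It follows from Theorem 6.12 that $\operatorname{Re} \mathcal{A}$ and $\operatorname{Im} \mathcal{A}$ are dense in the real valued functions of $C_0(X)$. Let $f \in C_0(X)$. Then there exists $\{h_n\} \subseteq \operatorname{Re} \mathcal{A}$ and $\{g_n\} \subseteq \operatorname{Im} \mathcal{A}$ such that $h_n \to \operatorname{Re} f$ uniformly and $g_n \to \operatorname{Im} f$ uniformly. Therefore, $h_n + i g_n \in \mathcal{A}$ and it converges to $f$ uniformly. This proves the theorem.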
6.3 Exercises
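1. Let $X$ be a finite dimensional normed linear space, real or complex. Show that $X$ is separable. Hint: Let $\{v_i\}_{i=1}^{n}$ be a basis and define a map $\theta$ from $\mathbb{F}^n$ to $X$ as follows. $\theta\left( \sum_{k=1}^{n} x_k e_k \right) \equiv \sum_{k=1}^{n} x_k v_k$. Show $\theta$ is continuous and has a continuous inverse. Now let $D$ be a countable dense set in $\mathbb{F}^n$ and consider $\theta(D)$.
2. Let $B(X; \mathbb{R}^n)$ be the space of functions $\mathbf{f}$, mapping $X$ to $\mathbb{R}^n$ such that
$$\sup \{ |\mathbf{f}(x)| : x \in X \} < \infty.$$
Show $B(X; \mathbb{R}^n)$ is a complete normed linear space if we define
$$\|\mathbf{f}\| \equiv \sup \{ |\mathbf{f}(x)| : x \in X \}.$$
3. Let $\alpha \in (0, 1]$. We define, for $X$ a compact subset of $\mathbb{R}^p$,
$$C^\alpha(X; \mathbb{R}^n) \equiv \{ \mathbf{f} \in C(X; \mathbb{R}^n) : \rho_\alpha(\mathbf{f}) + \|\mathbf{f}\| \equiv \|\mathbf{f}\|_\alpha < \infty \}$$
where
$$\|\mathbf{f}\| \equiv \sup \{ |\mathbf{f}(x)| : x \in X \}$$
and
$$\rho_\alpha(\mathbf{f}) \equiv \sup \left\{ \frac{|\mathbf{f}(x) - \mathbf{f}(y)|}{|x - y|^\alpha} : x, y \in X, \ x \neq y \right\}.$$
Show that $(C^\alpha(X; \mathbb{R}^n), \|\cdot\|_\alpha)$ is a complete normed linear space. This is called a Holder space. What would this space consist of if $\alpha > 1$?
4. Let $\{\mathbf{f}_n\}_{n=1}^{\infty} \subseteq C^\alpha(X; \mathbb{R}^n)$ where $X$ is a compact subset of $\mathbb{R}^p$ and suppose
$$\|\mathbf{f}_n\|_\alpha \le M$$
for all $n$. Show there exists a subsequence $n_k$ such that $\mathbf{f}_{n_k}$ converges in $C(X; \mathbb{R}^n)$. We say the given sequence is precompact when this happens. (This also shows the embedding of $C^\alpha(X; \mathbb{R}^n)$ into $C(X; \mathbb{R}^n)$ is a compact embedding.) Hint: You might want to use the Ascoli Arzela theorem.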
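5. Let $\mathbf{f} : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ be continuous and bounded and let $\mathbf{x}_0 \in \mathbb{R}^n$. If $\mathbf{x} : [0, T] \to \mathbb{R}^n$ and $h > 0$, let
$$\tau_h \mathbf{x}(s) \equiv \begin{cases} \mathbf{x}_0 & \text{if } s \le h, \\ \mathbf{x}(s - h), & \text{if } s > h. \end{cases}$$
For $t \in [0, T]$, let
$$\mathbf{x}_h(t) = \mathbf{x}_0 + \int_0^t \mathbf{f}(s, \tau_h \mathbf{x}_h(s)) \, ds.$$
Show using the Ascoli Arzela theorem that there exists a sequence $h \to 0$ such that
$$\mathbf{x}_h \to \mathbf{x}$$
in $C([0, T]; \mathbb{R}^n)$. Next argue
$$\mathbf{x}(t) = \mathbf{x}_0 + \int_0^t \mathbf{f}(s, \mathbf{x}(s)) \, ds$$
and conclude the following theorem. If $\mathbf{f} : \mathbb{R} \times \mathbb{R}^n \to \mathbb{R}^n$ is continuous and bounded, and if $\mathbf{x}_0 \in \mathbb{R}^n$ is given, there exists a solution to the following initial value problem.
$$\mathbf{x}' = \mathbf{f}(t, \mathbf{x}), \ t \in [0, T]$$
$$\mathbf{x}(0) = \mathbf{x}_0.$$
This is the Peano existence theorem for ordinary differential equations.
6. Let $H$ and $K$ be disjoint closed sets in a metric space $(X, d)$, and let
$$g(x) \equiv \frac{2}{3} h(x) - \frac{1}{3}$$
where
$$h(x) \equiv \frac{\operatorname{dist}(x, H)}{\operatorname{dist}(x, H) + \operatorname{dist}(x, K)}.$$
Show $g(x) \in \left[ -\frac{1}{3}, \frac{1}{3} \right]$ for all $x \in X$, $g$ is continuous, and $g$ equals $-\frac{1}{3}$ on $H$ while $g$ equals $\frac{1}{3}$ on $K$.
7. Suppose $M$ is a closed set in $X$ where $X$ is the metric space of problem 6 and suppose $f : M \to [-1, 1]$ is continuous. Show there exists $g : X \to [-1, 1]$ such that $g$ is continuous and $g = f$ on $M$. Hint: Show there exists
$$g_1 \in C(X), \quad g_1(x) \in \left[ -\frac{1}{3}, \frac{1}{3} \right],$$
and $|f(x) - g_1(x)| \le \frac{2}{3}$ for all $x \in M$. To do this, consider the disjoint closed sets
$$H \equiv f^{-1}\left( \left[ -1, -\frac{1}{3} \right] \right), \quad K \equiv f^{-1}\left( \left[ \frac{1}{3}, 1 \right] \right)$$
and use Urysohn's lemma or something to obtain a continuous function $g_1$ defined on $X$ such that $g_1(H) = -1/3$, $g_1(K) = 1/3$ and $g_1$ has values in $[-1/3, 1/3]$. When this has been done, let
$$\frac{3}{2} (f(x) - g_1(x))$$
play the role of $f$ and let $g_2$ be like $g_1$. Obtain
$$\left| f(x) - \sum_{i=1}^{n} \left( \frac{2}{3} \right)^{i-1} g_i(x) \right| \le \left( \frac{2}{3} \right)^n$$
and consider
$$g(x) \equiv \sum_{i=1}^{\infty} \left( \frac{2}{3} \right)^{i-1} g_i(x).$$
8. Let $M$ be a closed set in a metric space $(X, d)$ and suppose $f \in C(M)$. Show there exists $g \in C(X)$ such that $g(x) = f(x)$ for all $x \in M$ and if $f(M) \subseteq [a, b]$, then $g(X) \subseteq [a, b]$. This is a version of the Tietze extension theorem.
9. This problem gives an outline of the way Weierstrass originally proved the theorem. Choose $a_n$ such that $\int_{-1}^{1} \left( 1 - x^2 \right)^n a_n \, dx = 1$. Show $a_n < \frac{n+1}{2}$ or something like this. Now show that for $\delta \in (0, 1)$,
$$\lim_{n \to \infty} \left( \int_{\delta}^{1} \left( 1 - x^2 \right)^n a_n \, dx + \int_{-1}^{-\delta} a_n \left( 1 - x^2 \right)^n dx \right) = 0.$$
Next for $f$ a continuous function defined on $\mathbb{R}$, define the polynomial $p_n(x)$ by
$$p_n(x) \equiv \int_{x-1}^{x+1} a_n \left( 1 - (x - t)^2 \right)^n f(t) \, dt = \int_{-1}^{1} a_n f(x - t) \left( 1 - t^2 \right)^n dt.$$
Then show $\lim_{n \to \infty} \|p_n - f\|_\infty = 0$, where $\|f\|_\infty = \max \{ |f(x)| : x \in [-1, 1] \}$.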
10. Suppose $f : \mathbb{R} \to \mathbb{R}$ and $f \ge 0$ on $[-1, 1]$ with $f(-1) = f(1) = 0$ and $f(x) < 0$ for all $x \notin [-1, 1]$. Can you use a modification of the proof of the Weierstrass approximation theorem given in Problem 9 to show that for all $\varepsilon > 0$ there exists a polynomial $p$ such that $|p(x) - f(x)| < \varepsilon$ for $x \in [-1, 1]$ and $p(x) \le 0$ for all $x \notin [-1, 1]$? Hint: Let $f_\varepsilon(x) = f(x) - \frac{\varepsilon}{2}$. Thus there exists $\delta$ such that $1 > \delta > 0$ and $f_\varepsilon < 0$ on $(-1, -1 + \delta)$ and $(1 - \delta, 1)$. Now consider
k
(x) = a
k
_
_
x
_
2
_
k
and try something similar to the proof given for the
Weierstrass approximation theorem above.
11. Suppose $f \in C_0([0, \infty))$ and also $|f(t)| \le C e^{-rt}$. Let $\mathcal{A}$ denote the algebra of linear combinations of functions of the form $e^{-st}$ for $s$ sufficiently large. Thus $\mathcal{A}$ is dense in $C_0([0, \infty))$. Show that if
$$\int_0^\infty e^{-st} f(t) \, dt = 0$$
for each $s$ sufficiently large, then $f(t) = 0$. Next consider only $|f(t)| \le C e^{rt}$ for some $r$. That is, $f$ has exponential growth. Show the same conclusion holds for $f$ if
$$\int_0^\infty e^{-st} f(t) \, dt = 0$$
for all $s$ sufficiently large. This justifies the Laplace transform procedure of differential equations where if the Laplace transforms of two functions are equal, then the two functions are considered to be equal. More can be said about this. Hint: For the last part, consider $g(t) \equiv e^{-2rt} f(t)$ and apply the first part to $g$. If $g(t) = 0$ then so is $f(t)$.
Abstract Measure And
Integration
7.1 $\sigma$ Algebras
This chapter is on the basics of measure theory and integration. A measure is a real valued mapping from some subset of the power set of a given set which has values in $[0, \infty]$. Many apparently different things can be considered as measures and also there is an integral defined. By discussing this in terms of axioms and in a very abstract setting, many different topics can be considered in terms of one general theory. For example, it will turn out that sums are included as an integral of this sort. So is the usual integral as well as things which are often thought of as being in between sums and integrals.
Let $\Omega$ be a set and let $\mathcal{F}$ be a collection of subsets of $\Omega$ satisfying
$$\emptyset \in \mathcal{F}, \ \Omega \in \mathcal{F}, \quad (7.1)$$
$$E \in \mathcal{F} \text{ implies } E^C \equiv \Omega \setminus E \in \mathcal{F},$$
$$\text{If } \{E_n\}_{n=1}^{\infty} \subseteq \mathcal{F}, \text{ then } \cup_{n=1}^{\infty} E_n \in \mathcal{F}. \quad (7.2)$$
Definition 7.1 A collection of subsets of a set $\Omega$ satisfying Formulas 7.1-7.2 is called a $\sigma$ algebra.
As an example, let $\Omega$ be any set and let $\mathcal{F} = \mathcal{P}(\Omega)$, the set of all subsets of $\Omega$ (power set). This obviously satisfies Formulas 7.1-7.2.
Lemma 7.2 Let $\mathcal{C}$ be a set whose elements are $\sigma$ algebras of subsets of $\Omega$. Then $\cap \mathcal{C}$ is a $\sigma$ algebra also.
Be sure to verify this lemma. It follows immediately from the above definitions but it is important for you to check the details.
Example 7.3 Let $\tau$ denote the collection of all open sets in $\mathbb{R}^n$ and let $\sigma(\tau) \equiv$ intersection of all $\sigma$ algebras that contain $\tau$. $\sigma(\tau)$ is called the $\sigma$ algebra of Borel sets. In general, for a collection of sets $\Sigma$, $\sigma(\Sigma)$ is the smallest $\sigma$ algebra which contains $\Sigma$.
This is a very important $\sigma$ algebra and it will be referred to frequently as the Borel sets. Attempts to describe a typical Borel set are more trouble than they are worth and it is not easy to do so. Rather, one uses the definition just given in the example. Note, however, that all countable intersections of open sets and countable unions of closed sets are Borel sets. Such sets are called $G_\delta$ and $F_\sigma$ respectively.
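Although a typical Borel set cannot be written down, on a finite set the smallest $\sigma$ algebra containing a given collection can be computed by brute force. The Python sketch below assumes a finite $\Omega$, where closing under complements and pairwise unions is enough because every countable union reduces to a finite one. The function name `generated_sigma_algebra` is illustrative.

```python
def generated_sigma_algebra(omega, generators):
    """Smallest sigma algebra on the finite set `omega` containing `generators`."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for e in current:
            complement = omega - e          # closure under complements
            if complement not in family:
                family.add(complement)
                changed = True
        for e in current:
            for f in current:
                union = e | f               # closure under (finite) unions
                if union not in family:
                    family.add(union)
                    changed = True
    return family


sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {2}])
print(sorted(sorted(s) for s in sigma))
```

Here the generated family has three atoms, $\{1\}$, $\{2\}$ and $\{3, 4\}$, so it contains $2^3 = 8$ sets and cannot tell $3$ from $4$.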
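Definition 7.4 Let $\mathcal{F}$ be a $\sigma$ algebra of sets of $\Omega$ and let $\mu : \mathcal{F} \to [0, \infty]$. $\mu$ is called a measure if
$$\mu\left( \bigcup_{i=1}^{\infty} E_i \right) = \sum_{i=1}^{\infty} \mu(E_i) \quad (7.3)$$
whenever the $E_i$ are disjoint sets of $\mathcal{F}$. The triple $(\Omega, \mathcal{F}, \mu)$ is called a measure space and the elements of $\mathcal{F}$ are called the measurable sets. $(\Omega, \mathcal{F}, \mu)$ is a finite measure space when $\mu(\Omega) < \infty$.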
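Note that the above definition immediately implies that if $E_i \in \mathcal{F}$ and the sets $E_i$ are not necessarily disjoint,
$$\mu\left( \bigcup_{i=1}^{\infty} E_i \right) \le \sum_{i=1}^{\infty} \mu(E_i).$$
To see this, let $F_1 \equiv E_1$, $F_2 \equiv E_2 \setminus E_1$, $\cdots$, $F_n \equiv E_n \setminus \cup_{i=1}^{n-1} E_i$, then the sets $F_i$ are disjoint sets in $\mathcal{F}$ and
$$\mu\left( \bigcup_{i=1}^{\infty} E_i \right) = \mu\left( \bigcup_{i=1}^{\infty} F_i \right) = \sum_{i=1}^{\infty} \mu(F_i) \le \sum_{i=1}^{\infty} \mu(E_i)$$
because of the fact that each $E_i \supseteq F_i$ and so
$$\mu(E_i) = \mu(F_i) + \mu(E_i \setminus F_i)$$
which implies $\mu(E_i) \ge \mu(F_i)$.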
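The passage from the $E_i$ to the disjoint sets $F_i$ is a concrete procedure. The Python sketch below carries it out for finitely many finite sets and uses the counting measure (the number of elements) as a stand-in for $\mu$; the function name `disjointify` and the sample sets are illustrative assumptions.

```python
def disjointify(sets):
    """Return F_1 = E_1 and F_n = E_n minus the union of E_1, ..., E_{n-1}."""
    seen = set()
    pieces = []
    for e in sets:
        pieces.append(set(e) - seen)  # remove everything already covered
        seen |= set(e)
    return pieces


E = [{1, 2, 3}, {2, 3, 4}, {4, 5}, {1, 5}]
F = disjointify(E)
mu = len  # counting measure on finite sets

union = set().union(*E)
print(F, mu(union), sum(mu(f) for f in F), sum(mu(e) for e in E))
```

The pieces have the same union as the original sets, their measures add up exactly to the measure of the union, and that total is at most the sum of the measures of the $E_i$.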
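The following theorem is the basis for most of what is done in the theory of measure and integration. It is a very simple result which follows directly from the above definition.
Theorem 7.5 Let $\{E_m\}_{m=1}^{\infty}$ be a sequence of measurable sets in a measure space $(\Omega, \mathcal{F}, \mu)$. Then if $\cdots E_n \subseteq E_{n+1} \subseteq E_{n+2} \subseteq \cdots$,
$$\mu\left( \cup_{i=1}^{\infty} E_i \right) = \lim_{n \to \infty} \mu(E_n) \quad (7.4)$$
and if $\cdots E_n \supseteq E_{n+1} \supseteq E_{n+2} \supseteq \cdots$ and $\mu(E_1) < \infty$, then
$$\mu\left( \cap_{i=1}^{\infty} E_i \right) = \lim_{n \to \infty} \mu(E_n). \quad (7.5)$$
Stated more succinctly, $E_k \uparrow E$ implies $\mu(E_k) \uparrow \mu(E)$ and $E_k \downarrow E$ with $\mu(E_1) < \infty$ implies $\mu(E_k) \downarrow \mu(E)$.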
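Proof: First note that $\cap_{i=1}^{\infty} E_i = \left( \cup_{i=1}^{\infty} E_i^C \right)^C \in \mathcal{F}$ so $\cap_{i=1}^{\infty} E_i$ is measurable. Also note that for $A$ and $B$ sets of $\mathcal{F}$, $A \setminus B \equiv \left( A^C \cup B \right)^C \in \mathcal{F}$. To show 7.4, note that 7.4 is obviously true if $\mu(E_k) = \infty$ for any $k$. Therefore, assume $\mu(E_k) < \infty$ for all $k$. Thus
$$\mu(E_{k+1} \setminus E_k) + \mu(E_k) = \mu(E_{k+1})$$
and so
$$\mu(E_{k+1} \setminus E_k) = \mu(E_{k+1}) - \mu(E_k).$$
Also,
$$\bigcup_{k=1}^{\infty} E_k = E_1 \cup \bigcup_{k=1}^{\infty} (E_{k+1} \setminus E_k)$$
and the sets in the above union are disjoint. Hence by 7.3,
$$\mu\left( \cup_{i=1}^{\infty} E_i \right) = \mu(E_1) + \sum_{k=1}^{\infty} \mu(E_{k+1} \setminus E_k) = \mu(E_1) + \sum_{k=1}^{\infty} \left( \mu(E_{k+1}) - \mu(E_k) \right)$$
$$= \mu(E_1) + \lim_{n \to \infty} \sum_{k=1}^{n} \left( \mu(E_{k+1}) - \mu(E_k) \right) = \lim_{n \to \infty} \mu(E_{n+1}).$$
This shows part 7.4.
To verify 7.5,
$$\mu(E_1) = \mu\left( \cap_{i=1}^{\infty} E_i \right) + \mu\left( E_1 \setminus \cap_{i=1}^{\infty} E_i \right)$$
since $\mu(E_1) < \infty$, it follows $\mu\left( \cap_{i=1}^{\infty} E_i \right) < \infty$. Also, $E_1 \setminus \cap_{i=1}^{n} E_i \uparrow E_1 \setminus \cap_{i=1}^{\infty} E_i$ and so by 7.4,
$$\mu(E_1) - \mu\left( \cap_{i=1}^{\infty} E_i \right) = \mu\left( E_1 \setminus \cap_{i=1}^{\infty} E_i \right) = \lim_{n \to \infty} \mu\left( E_1 \setminus \cap_{i=1}^{n} E_i \right)$$
$$= \mu(E_1) - \lim_{n \to \infty} \mu\left( \cap_{i=1}^{n} E_i \right) = \mu(E_1) - \lim_{n \to \infty} \mu(E_n),$$
Hence, subtracting $\mu(E_1)$ from both sides,
$$\lim_{n \to \infty} \mu(E_n) = \mu\left( \cap_{i=1}^{\infty} E_i \right).$$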
It is convenient to allow functions to take the value $+\infty$. You should think of $+\infty$, usually referred to as $\infty$, as something out at the right end of the real line and its only importance is the notion of sequences converging to it. $x_n \to \infty$ exactly when for all $l \in \mathbb{R}$, there exists $N$ such that if $n \ge N$, then
$$x_n > l.$$
This is what it means for a sequence to converge to $\infty$. Don't think of $\infty$ as a number. It is just a convenient symbol which allows the consideration of some limit operations more simply. Similar considerations apply to $-\infty$ but this value is not of very great interest. In fact the set of most interest is the complex numbers or some vector space. Therefore, this topic is not considered.
Lemma 7.6 Let $f : \Omega \to (-\infty, \infty]$ where $\mathcal{F}$ is a $\sigma$ algebra of subsets of $\Omega$. Then the following are equivalent.
$$f^{-1}((d, \infty]) \in \mathcal{F} \text{ for all finite } d,$$
$$f^{-1}((-\infty, d)) \in \mathcal{F} \text{ for all finite } d,$$
$$f^{-1}([d, \infty]) \in \mathcal{F} \text{ for all finite } d,$$
$$f^{-1}((-\infty, d]) \in \mathcal{F} \text{ for all finite } d,$$
$$f^{-1}((a, b)) \in \mathcal{F} \text{ for all } a < b, \ -\infty < a < b < \infty.$$
Proof: First note that the first and the third are equivalent. To see this, observe
$$f^{-1}([d, \infty]) = \cap_{n=1}^{\infty} f^{-1}((d - 1/n, \infty]),$$
and so if the first condition holds, then so does the third.
$$f^{-1}((d, \infty]) = \cup_{n=1}^{\infty} f^{-1}([d + 1/n, \infty]),$$
and so if the third condition holds, so does the first.
Similarly, the second and fourth conditions are equivalent. Now
$$f^{-1}((-\infty, d]) = \left( f^{-1}((d, \infty]) \right)^C$$
so the first and fourth conditions are equivalent. Thus the first four conditions are equivalent and if any of them hold, then for $-\infty < a < b < \infty$,
$$f^{-1}((a, b)) = f^{-1}((-\infty, b)) \cap f^{-1}((a, \infty]) \in \mathcal{F}.$$
Finally, if the last condition holds,
$$f^{-1}([d, \infty]) = \left( \bigcup_{k=1}^{\infty} f^{-1}((-k + d, d)) \right)^C \in \mathcal{F}$$
and so the third condition holds. Therefore, all five conditions are equivalent. This proves the lemma.
This lemma allows for the following definition of a measurable function having values in $(-\infty, \infty]$.
Definition 7.7 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and let $f : \Omega \to (-\infty, \infty]$. Then $f$ is said to be measurable if any of the equivalent conditions of Lemma 7.6 hold. When the $\sigma$ algebra $\mathcal{F}$ equals the Borel $\sigma$ algebra $\mathcal{B}$, the function is called Borel measurable. More generally, if $f : \Omega \to X$ where $X$ is a topological space, $f$ is said to be measurable if $f^{-1}(U) \in \mathcal{F}$ whenever $U$ is open.
You should verify that this last condition holds in the special cases considered above.
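Theorem 7.8 Let $f_n$ and $f$ be functions mapping $\Omega$ to $(-\infty, \infty]$ where $\mathcal{F}$ is a $\sigma$ algebra of measurable sets of $\Omega$. Then if $f_n$ is measurable, and $f(\omega) = \lim_{n \to \infty} f_n(\omega)$, it follows that $f$ is also measurable. (Pointwise limits of measurable functions are measurable.)
Proof: First it is shown $f^{-1}((a, b)) \in \mathcal{F}$. Let $V_m \equiv \left( a + \frac{1}{m}, b - \frac{1}{m} \right)$ and $\overline{V}_m = \left[ a + \frac{1}{m}, b - \frac{1}{m} \right]$. Then for all $m$, $\overline{V}_m \subseteq (a, b)$ and
$$(a, b) = \cup_{m=1}^{\infty} V_m = \cup_{m=1}^{\infty} \overline{V}_m.$$
Note that $V_m \neq \emptyset$ for all $m$ large enough. Since $f$ is the pointwise limit of $f_n$,
$$f^{-1}(V_m) \subseteq \{ \omega : f_k(\omega) \in V_m \text{ for all } k \text{ large enough} \} \subseteq f^{-1}\left( \overline{V}_m \right).$$
You should note that the expression in the middle is of the form
$$\cup_{n=1}^{\infty} \cap_{k=n}^{\infty} f_k^{-1}(V_m).$$
Therefore,
$$f^{-1}((a, b)) = \cup_{m=1}^{\infty} f^{-1}(V_m) \subseteq \cup_{m=1}^{\infty} \cup_{n=1}^{\infty} \cap_{k=n}^{\infty} f_k^{-1}(V_m) \subseteq \cup_{m=1}^{\infty} f^{-1}\left( \overline{V}_m \right) = f^{-1}((a, b)).$$
It follows $f^{-1}((a, b)) \in \mathcal{F}$ because it equals the expression in the middle which is measurable. This shows $f$ is measurable.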
Theorem 7.9 Let $\mathcal{B}$ consist of open cubes of the form
$$Q_x \equiv \prod_{i=1}^{n} (x_i - \delta, x_i + \delta)$$
where $\delta$ is a positive rational number and $x \in \mathbb{Q}^n$. Then every open set in $\mathbb{R}^n$ can be written as a countable union of open cubes from $\mathcal{B}$. Furthermore, $\mathcal{B}$ is a countable set.
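Proof: Let $U$ be an open set and let $y \in U$. Since $U$ is open, $B(y, r) \subseteq U$ for some $r > 0$ and it can be assumed $r / \sqrt{n} \in \mathbb{Q}$. Let
$$x \in B\left( y, \frac{r}{10 \sqrt{n}} \right) \cap \mathbb{Q}^n$$
and consider the cube $Q_x \in \mathcal{B}$ defined by
$$Q_x \equiv \prod_{i=1}^{n} (x_i - \delta, x_i + \delta)$$
where $\delta = r / (4 \sqrt{n})$. The following picture is roughly illustrative of what is taking place.
[Figure: the point $y$, the nearby rational point $x$, and the cube $Q_x$ inside the ball $B(y, r)$.]
Then the diameter of $Q_x$ equals
$$\left( n \left( \frac{r}{2 \sqrt{n}} \right)^2 \right)^{1/2} = \frac{r}{2}$$
and so, if $z \in Q_x$, then
$$|z - y| \le |z - x| + |x - y| < \frac{r}{2} + \frac{r}{2} = r.$$
Consequently, $Q_x \subseteq U$. Now also,
$$\left( \sum_{i=1}^{n} (x_i - y_i)^2 \right)^{1/2} < \frac{r}{10 \sqrt{n}}$$
and so it follows that for each $i$,
$$|x_i - y_i| < \frac{r}{4 \sqrt{n}}$$
since otherwise the above inequality would not hold. Therefore, $y \in Q_x \subseteq U$. Now let $\mathcal{B}_U$ denote those sets of $\mathcal{B}$ which are contained in $U$. Then $\cup \mathcal{B}_U = U$.
To see $\mathcal{B}$ is countable, note there are countably many choices for $x$ and countably many choices for $\delta$. This proves the theorem.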
Recall that $g:\mathbb{R}^n\to\mathbb{R}$ is continuous means $g^{-1}(\text{open set})=$ an open set. In particular $g^{-1}((a,b))$ must be an open set.
Theorem 7.10 Let $f_i:\Omega\to\mathbb{R}$ for $i=1,\cdots,n$ be measurable functions and let $g:\mathbb{R}^n\to\mathbb{R}$ be continuous where $\mathbf{f}\equiv(f_1\cdots f_n)^T$. Then $g\circ\mathbf{f}$ is a measurable function from $\Omega$ to $\mathbb{R}$.
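Proof: First it is shown
$$(g\circ\mathbf{f})^{-1}((a,b))\in\mathcal{F}.$$
Now $(g\circ\mathbf{f})^{-1}((a,b))=\mathbf{f}^{-1}\left(g^{-1}((a,b))\right)$ and since $g$ is continuous, it follows that $g^{-1}((a,b))$ is an open set which is denoted as $U$ for convenience. Now by Theorem 7.9 above, it follows there are countably many open cubes $Q_k$ such that
$$U=\bigcup_{k=1}^{\infty}Q_k$$
where each $Q_k$ is a cube of the form
$$Q_k=\prod_{i=1}^{n}(x_i-\delta,\,x_i+\delta).$$
Now
$$\mathbf{f}^{-1}\left(\prod_{i=1}^{n}(x_i-\delta,\,x_i+\delta)\right)=\bigcap_{i=1}^{n}f_i^{-1}((x_i-\delta,\,x_i+\delta))\in\mathcal{F}$$
and so
$$(g\circ\mathbf{f})^{-1}((a,b))=\mathbf{f}^{-1}\left(g^{-1}((a,b))\right)=\mathbf{f}^{-1}(U)=\mathbf{f}^{-1}\left(\bigcup_{k=1}^{\infty}Q_k\right)=\bigcup_{k=1}^{\infty}\mathbf{f}^{-1}(Q_k)\in\mathcal{F}.$$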
Corollary 7.11 Sums, products, and linear combinations of real valued measurable functions are measurable.
Proof: To see the product of two measurable functions is measurable, let $g(x,y)=xy$, a continuous function defined on $\mathbb{R}^2$. Thus if you have two measurable functions, $f_1$ and $f_2$ defined on $\Omega$,
$$g\circ(f_1,f_2)(\omega)=f_1(\omega)f_2(\omega)$$
and so $\omega\to f_1(\omega)f_2(\omega)$ is measurable. Similarly you can show the sum of two measurable functions is measurable by considering $g(x,y)=x+y$ and you can show a linear combination of two measurable functions is measurable by considering $g(x,y)=ax+by$. More than two functions can be handled in the same way.
The message of this corollary is that starting with measurable real valued functions you can combine them in pretty much any way you want and you end up with a measurable function.
Here is some notation which will be used whenever convenient.
Definition 7.12 Let $f:\Omega\to[-\infty,\infty]$. Define
$$[\alpha<f]\equiv\{\omega\in\Omega: f(\omega)>\alpha\}\equiv f^{-1}((\alpha,\infty])$$
with obvious modifications for the symbols $[\alpha\le f]$, $[\alpha\ge f]$, $[\alpha\ge f\ge\beta]$, etc.
Definition 7.13 For a set $E$,
$$\mathcal{X}_E(\omega)=\begin{cases}1&\text{if }\omega\in E,\\0&\text{if }\omega\notin E.\end{cases}$$
This is called the characteristic function of $E$. Sometimes this is called the indicator function which I think is better terminology since the term characteristic function has another meaning. Note that this "indicates" whether a point $\omega$ is contained in $E$. It is exactly when the function has the value 1.
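Theorem 7.14 (Egoroff) Let $(\Omega,\mathcal{F},\mu)$ be a finite measure space,
$$(\mu(\Omega)<\infty)$$
and let $f_n$, $f$ be complex valued functions such that $\operatorname{Re}f_n$, $\operatorname{Im}f_n$ are all measurable and
$$\lim_{n\to\infty}f_n(\omega)=f(\omega)$$
for all $\omega\notin E$ where $\mu(E)=0$. Then for every $\varepsilon>0$, there exists a set
$$F\supseteq E,\quad\mu(F)<\varepsilon,$$
such that $f_n$ converges uniformly to $f$ on $F^C$.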
Proof: First suppose $E=\emptyset$ so that convergence is pointwise everywhere. It follows then that $\operatorname{Re}f$ and $\operatorname{Im}f$ are pointwise limits of measurable functions and are therefore measurable. Let $E_{km}=\{\omega\in\Omega:|f_n(\omega)-f(\omega)|\ge1/m\text{ for some }n>k\}$. Note that
$$|f_n(\omega)-f(\omega)|=\sqrt{(\operatorname{Re}f_n(\omega)-\operatorname{Re}f(\omega))^2+(\operatorname{Im}f_n(\omega)-\operatorname{Im}f(\omega))^2}$$
and so by Theorem 7.10, $\left[|f_n-f|\ge\frac1m\right]$ is measurable. Hence $E_{km}$ is measurable because
$$E_{km}=\bigcup_{n=k+1}^{\infty}\left[|f_n-f|\ge\frac1m\right].$$
For fixed $m$, $\bigcap_{k=1}^{\infty}E_{km}=\emptyset$ because $f_n$ converges to $f$. Indeed, if $\omega\in\Omega$, there exists $k$ such that if $n>k$, $|f_n(\omega)-f(\omega)|<\frac1m$, which means $\omega\notin E_{km}$. Note also that
$$E_{km}\supseteq E_{(k+1)m}.$$
Since $\mu(E_{1m})<\infty$, Theorem 7.5 on Page 136 implies
$$0=\mu\left(\bigcap_{k=1}^{\infty}E_{km}\right)=\lim_{k\to\infty}\mu(E_{km}).$$
Let $k(m)$ be chosen such that $\mu(E_{k(m)m})<\varepsilon2^{-m}$ and let
$$F=\bigcup_{m=1}^{\infty}E_{k(m)m}.$$
Then $\mu(F)<\varepsilon$ because
$$\mu(F)\le\sum_{m=1}^{\infty}\mu\left(E_{k(m)m}\right)<\sum_{m=1}^{\infty}\varepsilon2^{-m}=\varepsilon.$$
Now let $\eta>0$ be given and pick $m_0$ such that $m_0^{-1}<\eta$. If $\omega\in F^C$, then
$$\omega\in\bigcap_{m=1}^{\infty}E_{k(m)m}^C.$$
Hence $\omega\in E_{k(m_0)m_0}^C$ so
$$|f_n(\omega)-f(\omega)|<1/m_0<\eta$$
for all $n>k(m_0)$. This holds for all $\omega\in F^C$ and so $f_n$ converges uniformly to $f$ on $F^C$.
Now if $E\neq\emptyset$, consider $\{\mathcal{X}_{E^C}f_n\}_{n=1}^{\infty}$. Each $\mathcal{X}_{E^C}f_n$ has real and imaginary parts measurable and the sequence converges pointwise to $\mathcal{X}_{E^C}f$ everywhere. Therefore, from the first part, there exists a set $F$ of measure less than $\varepsilon$ such that on $F^C$, $\{\mathcal{X}_{E^C}f_n\}$ converges uniformly to $\mathcal{X}_{E^C}f$. Therefore, on $(E\cup F)^C$, $f_n$ converges uniformly to $f$. This proves the theorem.
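A small numerical illustration may help here. The sketch below uses the standard example $f_n(x)=x^n$ on $[0,1]$ with Lebesgue measure, which is an assumption chosen for illustration and is not taken from the text above. The limit is 0 on $[0,1)$, the convergence is not uniform there, but it becomes uniform once the set $(1-\delta,1]$ of measure $\delta$ is removed. The helper names `sup_error` and `first_uniform_index` are hypothetical.

```python
from fractions import Fraction

def sup_error(n, delta):
    """sup of |x**n - 0| over [0, 1 - delta]; it is attained at x = 1 - delta."""
    return (1 - delta) ** n

def first_uniform_index(delta, eta):
    """Smallest n with sup_error(n, delta) < eta (exists since 0 <= 1 - delta < 1)."""
    n = 1
    while sup_error(n, delta) >= eta:
        n += 1
    return n

delta = Fraction(1, 20)   # measure of the removed set (1 - delta, 1]
eta = Fraction(1, 100)    # required uniform accuracy on the complement
N = first_uniform_index(delta, eta)
print(N, float(sup_error(N, delta)))
```

Because `sup_error(n, delta)` decreases in `n`, the bound holds for every index from `N` on, which is exactly uniform convergence off the removed set.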
Finally here is a comment about notation.
Definition 7.15 Something happens for $\mu$ a.e. $\omega$, said as $\mu$ almost everywhere, if there exists a set $E$ with $\mu(E)=0$ and the thing takes place for all $\omega\notin E$. Thus $f(\omega)=g(\omega)$ a.e. if $f(\omega)=g(\omega)$ for all $\omega\notin E$ where $\mu(E)=0$. A measure space $(\Omega,\mathcal{F},\mu)$ is $\sigma$ finite if there exist measurable sets $\Omega_n$ such that $\mu(\Omega_n)<\infty$ and $\Omega=\bigcup_{n=1}^{\infty}\Omega_n$.
7.2 The Abstract Lebesgue Integral
7.2.1 Preliminary Observations
This section is on the Lebesgue integral and the major convergence theorems which are the reason for studying it. In all that follows $\mu$ will be a measure defined on a $\sigma$ algebra $\mathcal{F}$ of subsets of $\Omega$. The product $0\cdot\infty$ is always defined to equal zero. This is a meaningless expression and so it can be defined arbitrarily but a little thought will soon demonstrate that this is the right definition in the context of measure theory. To see this, consider the zero function defined on $\mathbb{R}$. What should the integral of this function equal? Obviously, by an analogy with the Riemann integral, it should equal zero. Formally, it is zero times the length of the set, which is infinity. This is why this convention will be used.
Lemma 7.16 Let $f(a,b)\in[-\infty,\infty]$ for $a\in A$ and $b\in B$ where $A,B$ are sets. Then
$$\sup_{a\in A}\sup_{b\in B}f(a,b)=\sup_{b\in B}\sup_{a\in A}f(a,b).$$
Proof: Note that for all $a,b$, $f(a,b)\le\sup_{b\in B}\sup_{a\in A}f(a,b)$ and therefore, for all $a$,
$$\sup_{b\in B}f(a,b)\le\sup_{b\in B}\sup_{a\in A}f(a,b).$$
Therefore,
$$\sup_{a\in A}\sup_{b\in B}f(a,b)\le\sup_{b\in B}\sup_{a\in A}f(a,b).$$
Repeating the same argument interchanging $a$ and $b$ gives the conclusion of the lemma.
Lemma 7.17 If $\{A_n\}$ is an increasing sequence in $[-\infty,\infty]$, then $\sup\{A_n\}=\lim_{n\to\infty}A_n$.
The following lemma is useful also and this is a good place to put it. First, $\{b_j\}_{j=1}^{\infty}$ is an enumeration of the $a_{ij}$ if
$$\bigcup_{j=1}^{\infty}\{b_j\}=\bigcup_{i,j}\{a_{ij}\}.$$
In other words, the countable set $\{a_{ij}\}_{i,j=1}^{\infty}$ is listed as $b_1,b_2,\cdots$.
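Lemma 7.18 Let $a_{ij}\ge0$. Then $\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}a_{ij}=\sum_{j=1}^{\infty}\sum_{i=1}^{\infty}a_{ij}$. Also if $\{b_j\}_{j=1}^{\infty}$ is any enumeration of the $a_{ij}$, then $\sum_{j=1}^{\infty}b_j=\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}a_{ij}$.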
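Proof: First note there is no trouble in defining these sums because the $a_{ij}$ are all nonnegative. If a sum diverges, it only diverges to $\infty$ and so $\infty$ is written as the answer.
$$\sum_{j=1}^{\infty}\sum_{i=1}^{\infty}a_{ij}\ge\sup_n\sum_{j=1}^{\infty}\sum_{i=1}^{n}a_{ij}=\sup_n\lim_{m\to\infty}\sum_{j=1}^{m}\sum_{i=1}^{n}a_{ij}$$
$$=\sup_n\lim_{m\to\infty}\sum_{i=1}^{n}\sum_{j=1}^{m}a_{ij}=\sup_n\sum_{i=1}^{n}\sum_{j=1}^{\infty}a_{ij}=\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}a_{ij}.\qquad(7.6)$$
Interchanging the $i$ and $j$ in the above argument, the first part of the lemma is proved.
Finally, note that for all $p$,
$$\sum_{j=1}^{p}b_j\le\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}a_{ij}$$
and so $\sum_{j=1}^{\infty}b_j\le\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}a_{ij}$. Now let $m,n>1$ be given. Then
$$\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}\le\sum_{j=1}^{p}b_j$$
where $p$ is chosen large enough that $\{b_1,\cdots,b_p\}\supseteq\{a_{ij}:i\le m\text{ and }j\le n\}$. Therefore, since such a $p$ exists for any choice of $m,n$, it follows that for any $m,n$,
$$\sum_{i=1}^{m}\sum_{j=1}^{n}a_{ij}\le\sum_{j=1}^{\infty}b_j.$$
Therefore, taking the limit as $n\to\infty$,
$$\sum_{i=1}^{m}\sum_{j=1}^{\infty}a_{ij}\le\sum_{j=1}^{\infty}b_j$$
and finally, taking the limit as $m\to\infty$,
$$\sum_{i=1}^{\infty}\sum_{j=1}^{\infty}a_{ij}\le\sum_{j=1}^{\infty}b_j$$
proving the lemma.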
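To see why the hypothesis $a_{ij}\ge0$ matters, here is a minimal Python sketch. The two arrays are assumptions chosen for illustration and do not come from the text: `b` is a nonnegative array, while `c` is a standard signed counterexample with $+1$ on the diagonal and $-1$ just to its right. The function names are hypothetical. For `c` every inner sum has at most two nonzero terms, so truncating the inner index at `outer + 1` computes the inner infinite sums exactly.

```python
from fractions import Fraction

def b(i, j):
    """Nonnegative terms, so the hypothesis of Lemma 7.18 holds."""
    return Fraction(1, 2**i * 3**j)

def c(i, j):
    """Signed terms: +1 on the diagonal, -1 immediately to its right."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0

def iterated_rows(term, outer, inner):
    """Sum over i = 1..outer of (sum over j = 1..inner of term(i, j))."""
    return sum(sum(term(i, j) for j in range(1, inner + 1))
               for i in range(1, outer + 1))

def iterated_cols(term, outer, inner):
    """Sum over j = 1..outer of (sum over i = 1..inner of term(i, j))."""
    return sum(sum(term(i, j) for i in range(1, inner + 1))
               for j in range(1, outer + 1))

n = 10
# Nonnegative case: both orders agree and increase toward 1/2.
print(iterated_rows(b, n, n) == iterated_cols(b, n, n))
# Signed case: every row sums to 0, but the first column sums to 1.
print(iterated_rows(c, n, n + 1), iterated_cols(c, n, n + 1))
```

In the signed case the two iterated sums disagree for every `n`, so the conclusion of the lemma can fail once negative terms are allowed.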
7.2.2 Definition Of The Lebesgue Integral For Nonnegative Measurable Functions
The following picture illustrates the idea used to define the Lebesgue integral to be like the area under a curve.
[Figure: the region under the graph of $f$ cut into horizontal slabs at heights $h$, $2h$, $3h$, with areas $h\mu([h<f])$, $h\mu([2h<f])$, $h\mu([3h<f])$.]
You can see that by following the procedure illustrated in the picture and letting $h$ get smaller, you would expect to obtain better approximations to the area under the curve[1] although all these approximations would likely be too small. Therefore, define
$$\int f\,d\mu\equiv\sup_{h>0}\sum_{i=1}^{\infty}h\mu([ih<f]).$$
[1] Note the difference between this picture and the one usually drawn in calculus courses where the little rectangles are upright rather than on their sides. This illustrates a fundamental philosophical difference between the Riemann and the Lebesgue integrals. With the Riemann integral intervals are measured. With the Lebesgue integral, it is inverse images of intervals which are measured.
Lemma 7.19 The following inequality holds.
$$\sum_{i=1}^{\infty}h\mu([ih<f])\le\sum_{i=1}^{\infty}\frac h2\mu\left(\left[i\frac h2<f\right]\right).$$
Also, it suffices to consider only $h$ smaller than a given positive number in the above definition of the integral.
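Proof: Let $N\in\mathbb{N}$.
$$\sum_{i=1}^{2N}\frac h2\mu\left(\left[i\frac h2<f\right]\right)=\sum_{i=1}^{2N}\frac h2\mu([ih<2f])$$
$$=\sum_{i=1}^{N}\frac h2\mu([(2i-1)h<2f])+\sum_{i=1}^{N}\frac h2\mu([(2i)h<2f])$$
$$=\sum_{i=1}^{N}\frac h2\mu\left(\left[\frac{(2i-1)}{2}h<f\right]\right)+\sum_{i=1}^{N}\frac h2\mu([ih<f])$$
$$\ge\sum_{i=1}^{N}\frac h2\mu([ih<f])+\sum_{i=1}^{N}\frac h2\mu([ih<f])=\sum_{i=1}^{N}h\mu([ih<f]).$$
Now letting $N\to\infty$ yields the claim of the lemma.
To verify the last claim, suppose $M<\int f\,d\mu$ and let $\delta>0$ be given. Then there exists $h>0$ such that
$$M<\sum_{i=1}^{\infty}h\mu([ih<f])\le\int f\,d\mu.$$
By the first part of this lemma,
$$M<\sum_{i=1}^{\infty}\frac h2\mu\left(\left[i\frac h2<f\right]\right)\le\int f\,d\mu$$
and continuing to apply the first part,
$$M<\sum_{i=1}^{\infty}\frac{h}{2^n}\mu\left(\left[i\frac{h}{2^n}<f\right]\right)\le\int f\,d\mu.$$
Choose $n$ large enough that $h/2^n<\delta$. It follows
$$M<\sup_{\delta>h>0}\sum_{i=1}^{\infty}h\mu([ih<f])\le\int f\,d\mu.$$
Since $M$ is arbitrary, this proves the last claim.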
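The horizontal slab sums can be computed directly on a small example. The sketch below is only an illustration under stated assumptions: the measure space is a hypothetical three point set with point masses `mu`, the function `f` is made up, and `layer_sum` is a hypothetical helper name. It evaluates $\sum_{i\ge1}h\mu([ih<f])$ exactly with rational arithmetic and shows the sums growing as $h$ is halved, as in Lemma 7.19, while staying below $\sum_\omega f(\omega)\mu(\{\omega\})$.

```python
from fractions import Fraction

def layer_sum(f, mu, h):
    """Return sum over i >= 1 of h * mu([i*h < f]) on a finite measure space.

    f and mu are dicts keyed by the points of the space; terms with
    i*h >= max f vanish, so the loop stops there.
    """
    total = Fraction(0)
    top = max(f.values())
    i = 1
    while i * h < top:
        level_measure = sum(mu[w] for w in f if f[w] > i * h)
        total += h * level_measure
        i += 1
    return total

f = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(3)}
mu = {"a": Fraction(1), "b": Fraction(1, 2), "c": Fraction(2)}
exact = sum(f[w] * mu[w] for w in f)   # value of the integral on this space

for k in range(5):
    h = Fraction(1, 2**k)
    print(h, layer_sum(f, mu, h), exact)
```

Each slab sum is an underestimate, which matches the remark that the approximations are too small and explains why the definition takes a supremum over $h$.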
7.2.3 The Lebesgue Integral For Nonnegative Simple Functions
Definition 7.20 A function $s$ is called simple if it is a measurable real valued function and has only finitely many values. These values will never be $\pm\infty$. Thus a simple function is one which may be written in the form
$$s(\omega)=\sum_{i=1}^{n}c_i\mathcal{X}_{E_i}(\omega)$$
where the sets $E_i$ are disjoint and measurable. $s$ takes the value $c_i$ on $E_i$.
Note that by taking the union of some of the $E_i$ in the above definition, you can assume that the numbers $c_i$ are the distinct values of $s$. Simple functions are important because it will turn out to be very easy to take their integrals as shown in the following lemma.
Lemma 7.21 Let $s(\omega)=\sum_{i=1}^{p}a_i\mathcal{X}_{E_i}(\omega)$ be a nonnegative simple function with the $a_i$ the distinct non zero values of $s$. Then
$$\int s\,d\mu=\sum_{i=1}^{p}a_i\mu(E_i).\qquad(7.7)$$
Also, for any nonnegative measurable function $f$, if $\lambda\ge0$, then
$$\int\lambda f\,d\mu=\lambda\int f\,d\mu.\qquad(7.8)$$
Proof: Consider 7.7 first. Without loss of generality, you can assume $0<a_1<a_2<\cdots<a_p$ and that $\mu(E_i)<\infty$. Let $\varepsilon>0$ be given and let $\delta_1>0$ satisfy
$$\delta_1\sum_{i=1}^{p}\mu(E_i)<\varepsilon.$$
Pick $\delta<\delta_1$ such that for $h<\delta$ it is also true that
$$h<\frac12\min(a_1,\,a_2-a_1,\,a_3-a_2,\,\cdots,\,a_p-a_{p-1}).$$
Then for $0<h<\delta$,
$$\sum_{k=1}^{\infty}h\mu([s>kh])=\sum_{k=1}^{\infty}h\sum_{i=k}^{\infty}\mu([ih<s\le(i+1)h])$$
$$=\sum_{i=1}^{\infty}\sum_{k=1}^{i}h\mu([ih<s\le(i+1)h])$$
$$=\sum_{i=1}^{\infty}ih\mu([ih<s\le(i+1)h]).\qquad(7.9)$$
Because of the choice of $h$ there exist positive integers $i_k$ such that $i_1<i_2<\cdots<i_p$ and
$$i_1h<a_1\le(i_1+1)h<\cdots<i_2h<a_2\le(i_2+1)h<\cdots<i_ph<a_p\le(i_p+1)h.$$
Then in the sum of 7.9 the only terms which are nonzero are those for which $i\in\{i_1,i_2,\cdots,i_p\}$. To see this, you might consider the following picture.
[Figure: a number line showing $i_1h$, $i_2h$, $i_3h$ lying just below the values $a_1$, $a_2$, $a_3$ respectively.]
When $ih$ and $(i+1)h$ are both in between two of the $a_i$, the set $[ih<s\le(i+1)h]$ must be empty because the only values of the function are the $a_i$. At an $i_k$, $i_kh$ is smaller than $a_k$ while $(i_k+1)h$ is at least as large. Therefore, the set $[i_kh<s\le(i_k+1)h]$ equals $E_k$ and so
$$\mu([i_kh<s\le(i_k+1)h])=\mu(E_k).$$
Therefore,
$$\sum_{k=1}^{\infty}h\mu([s>kh])=\sum_{k=1}^{p}i_kh\mu(E_k).$$
It follows that for all $h$ this small,
$$0\le\sum_{k=1}^{p}a_k\mu(E_k)-\sum_{k=1}^{\infty}h\mu([s>kh])=\sum_{k=1}^{p}a_k\mu(E_k)-\sum_{k=1}^{p}i_kh\mu(E_k)\le h\sum_{k=1}^{p}\mu(E_k)<\varepsilon.$$
Taking the inf for $h$ this small and using Lemma 7.19,
$$0\le\sum_{k=1}^{p}a_k\mu(E_k)-\sup_{\delta>h>0}\sum_{k=1}^{\infty}h\mu([s>kh])=\sum_{k=1}^{p}a_k\mu(E_k)-\int s\,d\mu\le\varepsilon.$$
Since $\varepsilon>0$ is arbitrary, this proves the first part.
To verify 7.8, note the formula is obvious if $\lambda=0$ because then $[ih<\lambda f]=\emptyset$ for all $i>0$. Assume $\lambda>0$. Then
$$\int\lambda f\,d\mu\equiv\sup_{h>0}\sum_{i=1}^{\infty}h\mu([ih<\lambda f])=\sup_{h>0}\sum_{i=1}^{\infty}h\mu([ih/\lambda<f])$$
$$=\sup_{h>0}\lambda\sum_{i=1}^{\infty}(h/\lambda)\mu([i(h/\lambda)<f])=\lambda\int f\,d\mu.$$
Lemma 7.22 Let the nonnegative simple function $s$ be defined as
$$s(\omega)=\sum_{i=1}^{n}c_i\mathcal{X}_{E_i}(\omega)$$
where the $c_i$ are not necessarily distinct but the $E_i$ are disjoint. It follows that
$$\int s=\sum_{i=1}^{n}c_i\mu(E_i).$$
Proof: Let the values of $s$ be $\{a_1,\cdots,a_m\}$. Therefore, since the $E_i$ are disjoint, each $a_i$ is equal to one of the $c_j$. Let $A_i\equiv\cup\{E_j:c_j=a_i\}$. Then from Lemma 7.21 it follows that
$$\int s=\sum_{i=1}^{m}a_i\mu(A_i)=\sum_{i=1}^{m}a_i\sum_{\{j:c_j=a_i\}}\mu(E_j)=\sum_{i=1}^{m}\sum_{\{j:c_j=a_i\}}c_j\mu(E_j)=\sum_{i=1}^{n}c_i\mu(E_i).$$
Note that $\int s$ could equal $+\infty$ if $\mu(A_k)=\infty$ and $a_k>0$ for some $k$, but $\int s$ is well defined because $s\ge0$. Recall that $0\cdot\infty=0$.
Lemma 7.23 If $a,b\ge0$ and if $s$ and $t$ are nonnegative simple functions, then
$$\int as+bt=a\int s+b\int t.$$
Proof: Let
$$s(\omega)=\sum_{i=1}^{n}\alpha_i\mathcal{X}_{A_i}(\omega),\quad t(\omega)=\sum_{j=1}^{m}\beta_j\mathcal{X}_{B_j}(\omega)$$
where the $\alpha_i$ are the distinct values of $s$ and the $\beta_j$ are the distinct values of $t$. Clearly $as+bt$ is a nonnegative simple function because it is measurable and has finitely many values. Also,
$$(as+bt)(\omega)=\sum_{j=1}^{m}\sum_{i=1}^{n}(a\alpha_i+b\beta_j)\mathcal{X}_{A_i\cap B_j}(\omega)$$
where the sets $A_i\cap B_j$ are disjoint. By Lemma 7.22,
$$\int as+bt=\sum_{j=1}^{m}\sum_{i=1}^{n}(a\alpha_i+b\beta_j)\mu(A_i\cap B_j)=a\sum_{i=1}^{n}\alpha_i\mu(A_i)+b\sum_{j=1}^{m}\beta_j\mu(B_j)=a\int s+b\int t.$$
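The common refinement $A_i\cap B_j$ used in this proof is easy to mimic on a finite example. The following sketch is a hedged illustration, not part of the text: the four point space, the point masses `mu`, and the helper names `integral` and `combine` are all assumptions. A simple function is represented as a list of (value, set) pairs whose sets partition the space, so the value 0 is listed explicitly.

```python
from fractions import Fraction

def integral(simple, mu):
    """Sum of c * mu(E) over the (value, set) pairs of a simple function."""
    return sum(c * sum(mu[w] for w in E) for c, E in simple)

def combine(a, s, b, t):
    """Represent a*s + b*t on the nonempty intersections A_i & B_j."""
    out = []
    for alpha, A in s:
        for beta, B in t:
            common = A & B
            if common:
                out.append((a * alpha + b * beta, common))
    return out

mu = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(2), 4: Fraction(1, 4)}
s = [(0, frozenset({1})), (2, frozenset({2, 3})), (5, frozenset({4}))]
t = [(1, frozenset({1, 2})), (3, frozenset({3, 4}))]

a, b = 2, 3
lhs = integral(combine(a, s, b, t), mu)
rhs = a * integral(s, mu) + b * integral(t, mu)
print(lhs, rhs)
```

The values of the combined function need not be distinct on different pieces, which is why the proof appeals to Lemma 7.22 rather than Lemma 7.21.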
7.2.4 Simple Functions And Measurable Functions
There is a fundamental theorem about the relationship of simple functions to measurable functions given in the next theorem.
Theorem 7.24 Let $f\ge0$ be measurable. Then there exists a sequence of simple functions $\{s_n\}$ satisfying
$$0\le s_n(\omega)\qquad(7.10)$$
$$\cdots\ s_n(\omega)\le s_{n+1}(\omega)\ \cdots$$
$$f(\omega)=\lim_{n\to\infty}s_n(\omega)\text{ for all }\omega\in\Omega.\qquad(7.11)$$
If $f$ is bounded the convergence is actually uniform.
Proof: Letting $I\equiv\{\omega:f(\omega)=\infty\}$, define
$$t_n(\omega)=\sum_{k=0}^{2^n}\frac kn\mathcal{X}_{[k/n\le f<(k+1)/n]}(\omega)+n\mathcal{X}_I(\omega).$$
Then $t_n(\omega)\le f(\omega)$ for all $\omega$ and $\lim_{n\to\infty}t_n(\omega)=f(\omega)$ for all $\omega$. This is because $t_n(\omega)=n$ for $\omega\in I$ and if $f(\omega)\in\left[0,\frac{2^n+1}{n}\right)$, then
$$0\le f(\omega)-t_n(\omega)\le\frac1n.\qquad(7.12)$$
Thus whenever $\omega\notin I$, the above inequality will hold for all $n$ large enough. Let
$$s_1=t_1,\ s_2=\max(t_1,t_2),\ s_3=\max(t_1,t_2,t_3),\ \cdots.$$
Then the sequence $\{s_n\}$ satisfies 7.10-7.11.
To verify the last claim, note that in this case the term $n\mathcal{X}_I(\omega)$ is not present. Therefore, for all $n$ large enough, 7.12 holds for all $\omega$. Thus the convergence is uniform. This proves the theorem.
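The construction in this proof can be tried out pointwise. The sketch below is a minimal illustration under stated assumptions: it works with a single value `y` standing for $f(\omega)$, uses exact rational arithmetic, and the function names `t` and `s` simply mirror $t_n$ and $s_n$ from the proof. The value `math.inf` plays the role of a point in $I$.

```python
import math
from fractions import Fraction

def t(n, y):
    """Value of t_n at a point where f equals y (y >= 0, or math.inf)."""
    if y == math.inf:
        return Fraction(n)            # the term n * X_I
    k = math.floor(n * y)             # k/n <= y < (k+1)/n
    return Fraction(k, n) if k <= 2**n else Fraction(0)

def s(n, y):
    """Value of s_n = max(t_1, ..., t_n) at the same point."""
    return max(t(j, y) for j in range(1, n + 1))

y = Fraction(7, 3)
for n in range(1, 8):
    print(n, s(n, y), y - s(n, y))
```

For a large value such as `y = 10` the approximations stay at 0 until $2^n\ge ny$, which is why inequality 7.12 only holds for $n$ large enough at each point.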
7.2.5 The Monotone Convergence Theorem
The following is called the monotone convergence theorem. This theorem and related convergence theorems are the reason for using the Lebesgue integral.
Theorem 7.25 (Monotone Convergence theorem) Let $f$ have values in $[0,\infty]$ and suppose $\{f_n\}$ is a sequence of nonnegative measurable functions having values in $[0,\infty]$ and satisfying
$$\lim_{n\to\infty}f_n(\omega)=f(\omega)\text{ for each }\omega,$$
$$\cdots\ f_n(\omega)\le f_{n+1}(\omega)\ \cdots$$
Then $f$ is measurable and
$$\int f\,d\mu=\lim_{n\to\infty}\int f_n\,d\mu.$$
Proof: From Lemmas 7.16 and 7.17,
$$\int f\,d\mu\equiv\sup_{h>0}\sum_{i=1}^{\infty}h\mu([ih<f])$$
$$=\sup_{h>0}\sup_k\sum_{i=1}^{k}h\mu([ih<f])$$
$$=\sup_{h>0}\sup_k\sup_m\sum_{i=1}^{k}h\mu([ih<f_m])$$
$$=\sup_m\sup_{h>0}\sum_{i=1}^{\infty}h\mu([ih<f_m])$$
$$\equiv\sup_m\int f_m\,d\mu=\lim_{m\to\infty}\int f_m\,d\mu.$$
The third equality follows from the observation that
$$\lim_{m\to\infty}\mu([ih<f_m])=\mu([ih<f])$$
which follows from Theorem 7.5 since the sets $[ih<f_m]$ are increasing in $m$ and their union equals $[ih<f]$. This proves the theorem.
To illustrate what goes wrong without the Lebesgue integral, consider the following example.
Example 7.26 Let $\{r_n\}$ denote the rational numbers in $[0,1]$ and let
$$f_n(t)\equiv\begin{cases}1&\text{if }t\in\{r_1,\cdots,r_n\}\\0&\text{otherwise}\end{cases}$$
Then $f_n(t)\uparrow f(t)$ where $f$ is the function which is one on the rationals and zero on the irrationals. Each $f_n$ is Riemann integrable (why?) but $f$ is not Riemann integrable. Therefore, you can't write $\int f\,dx=\lim_{n\to\infty}\int f_n\,dx$.
A meta-mathematical observation related to this type of example is this. If you can choose your functions, you don't need the Lebesgue integral. The Riemann integral is just fine. It is when you can't choose your functions and they come to you as pointwise limits that you really need the superior Lebesgue integral or at least something more general than the Riemann integral. The Riemann integral is entirely adequate for evaluating the seemingly endless lists of boring problems found in calculus books.
7.2.6 Other Definitions
To review and summarize the above, if $f\ge0$ is measurable,
$$\int f\,d\mu\equiv\sup_{h>0}\sum_{i=1}^{\infty}h\mu([f>ih]).\qquad(7.13)$$
Another way to get the same thing for $\int f\,d\mu$ is to take an increasing sequence of nonnegative simple functions $\{s_n\}$ with $s_n(\omega)\to f(\omega)$ and then by the monotone convergence theorem,
$$\int f\,d\mu=\lim_{n\to\infty}\int s_n$$
where if $s_n(\omega)=\sum_{i=1}^{m}c_i\mathcal{X}_{E_i}(\omega)$,
$$\int s_n\,d\mu=\sum_{i=1}^{m}c_i\mu(E_i).$$
Similarly this also shows that for such a nonnegative measurable function,
$$\int f\,d\mu=\sup\left\{\int s:0\le s\le f,\ s\text{ simple}\right\}$$
which is the usual way of defining the Lebesgue integral for nonnegative measurable functions in most books. I have done it differently because this approach led to such an easy proof of the Monotone convergence theorem. Here is an equivalent definition of the integral. The fact it is well defined has been discussed above.
Definition 7.27 For $s$ a nonnegative simple function, $s(\omega)=\sum_{k=1}^{n}c_k\mathcal{X}_{E_k}(\omega)$, $\int s=\sum_{k=1}^{n}c_k\mu(E_k)$. For $f$ a nonnegative measurable function,
$$\int f\,d\mu=\sup\left\{\int s:0\le s\le f,\ s\text{ simple}\right\}.$$
7.2.7 Fatou's Lemma
Sometimes the limit of a sequence does not exist. There are two more general notions known as lim sup and lim inf which do always exist in some sense. These notions are dependent on the following lemma.
Lemma 7.28 Let $\{a_n\}$ be an increasing (decreasing) sequence in $[-\infty,\infty]$. Then $\lim_{n\to\infty}a_n$ exists.
Proof: Suppose first $\{a_n\}$ is increasing. Recall this means $a_n\le a_{n+1}$ for all $n$. If the sequence is bounded above, then it has a least upper bound and so $a_n\to a$ where $a$ is its least upper bound. If the sequence is not bounded above, then for every $l\in\mathbb{R}$, it follows $l$ is not an upper bound and so eventually, $a_n>l$. But this is what is meant by $a_n\to\infty$. The situation for decreasing sequences is completely similar.
Now take any sequence $\{a_n\}\subseteq[-\infty,\infty]$ and consider the sequence $\{A_n\}$ where $A_n\equiv\inf\{a_k:k\ge n\}$. Then as $n$ increases, the set of numbers whose inf is being taken is getting smaller. Therefore, $A_n$ is an increasing sequence and so it must converge. Similarly, if $B_n\equiv\sup\{a_k:k\ge n\}$, it follows $B_n$ is decreasing and so $\{B_n\}$ also must converge. With this preparation, the following definition can be given.
Definition 7.29 Let $\{a_n\}$ be a sequence of points in $[-\infty,\infty]$. Then define
$$\liminf_{n\to\infty}a_n\equiv\lim_{n\to\infty}\inf\{a_k:k\ge n\}$$
and
$$\limsup_{n\to\infty}a_n\equiv\lim_{n\to\infty}\sup\{a_k:k\ge n\}.$$
In the case of functions having values in $[-\infty,\infty]$,
$$\left(\liminf_{n\to\infty}f_n\right)(\omega)\equiv\liminf_{n\to\infty}(f_n(\omega)).$$
A similar definition applies to $\limsup_{n\to\infty}f_n$.
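The tail quantities $A_n$ and $B_n$ can be computed on a finite window of a sequence. The sketch below is only an illustration: the sequence $a_k=(-1)^k(1+1/k)$ is an assumption chosen because its lim inf is $-1$ and its lim sup is $1$ while the limit does not exist, and the helper names are hypothetical. On a finite window the tails are truncated, so the printed values only approximate the true tail inf and sup.

```python
from fractions import Fraction

def tail_infs(a):
    """out[n] = min(a[n:]), computed in one pass from right to left."""
    out = [None] * len(a)
    running = a[-1]
    for n in range(len(a) - 1, -1, -1):
        running = min(running, a[n])
        out[n] = running
    return out

def tail_sups(a):
    """out[n] = max(a[n:]), computed in one pass from right to left."""
    out = [None] * len(a)
    running = a[-1]
    for n in range(len(a) - 1, -1, -1):
        running = max(running, a[n])
        out[n] = running
    return out

# a[0] corresponds to k = 1 in a_k = (-1)**k * (1 + 1/k).
a = [(-1) ** k * (1 + Fraction(1, k)) for k in range(1, 201)]
A, B = tail_infs(a), tail_sups(a)
print(float(A[99]), float(B[99]))   # tails starting at k = 100
```

The list `A` is nondecreasing and `B` is nonincreasing, which is the monotonicity that makes the two limits in Definition 7.29 exist.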
Lemma 7.30 Let $\{a_n\}$ be a sequence in $[-\infty,\infty]$. Then $\lim_{n\to\infty}a_n$ exists if and only if
$$\liminf_{n\to\infty}a_n=\limsup_{n\to\infty}a_n$$
and in this case, the limit equals the common value of these two numbers.
Proof: Suppose first $\lim_{n\to\infty}a_n=a\in\mathbb{R}$. Then, letting $\varepsilon>0$ be given, $a_n\in(a-\varepsilon,a+\varepsilon)$ for all $n$ large enough, say $n\ge N$. Therefore, both $\inf\{a_k:k\ge n\}$ and $\sup\{a_k:k\ge n\}$ are contained in $[a-\varepsilon,a+\varepsilon]$ whenever $n\ge N$. It follows $\limsup_{n\to\infty}a_n$ and $\liminf_{n\to\infty}a_n$ are both in $[a-\varepsilon,a+\varepsilon]$, showing
$$\left|\liminf_{n\to\infty}a_n-\limsup_{n\to\infty}a_n\right|\le2\varepsilon.$$
Since $\varepsilon$ is arbitrary, the two must be equal and they both must equal $a$. Next suppose $\lim_{n\to\infty}a_n=\infty$. Then if $l\in\mathbb{R}$, there exists $N$ such that for $n\ge N$,
$$l\le a_n$$
and therefore, for such $n$,
$$l\le\inf\{a_k:k\ge n\}\le\sup\{a_k:k\ge n\}$$
and this shows, since $l$ is arbitrary, that
$$\liminf_{n\to\infty}a_n=\limsup_{n\to\infty}a_n=\infty.$$
The case for $-\infty$ is similar.
Conversely, suppose $\liminf_{n\to\infty}a_n=\limsup_{n\to\infty}a_n=a$. Suppose first that $a\in\mathbb{R}$. Then, letting $\varepsilon>0$ be given, there exists $N$ such that
$$\sup\{a_j:j\ge N\}-\inf\{a_j:j\ge N\}<\varepsilon.$$
Therefore, if $k,m>N$, and $a_k>a_m$,
$$|a_k-a_m|=a_k-a_m\le\sup\{a_j:j\ge N\}-\inf\{a_j:j\ge N\}<\varepsilon$$
showing that $\{a_n\}$ is a Cauchy sequence. Therefore, it converges to some number in $\mathbb{R}$, and by the first part, the lim inf and lim sup both equal this limit, so the limit is $a$. If $\liminf_{n\to\infty}a_n=\limsup_{n\to\infty}a_n=\infty$, then given $l\in\mathbb{R}$, there exists $N$ such that
$$\inf_{n>N}a_n>l.$$
Therefore, $\lim_{n\to\infty}a_n=\infty$. The case for $-\infty$ is similar. This proves the lemma.
The next theorem, known as Fatou's lemma, is another important theorem which justifies the use of the Lebesgue integral.
Theorem 7.31 (Fatou's lemma) Let each $f_n$ be a nonnegative measurable function with values in $[0,\infty]$. Let $g(\omega)=\liminf_{n\to\infty}f_n(\omega)$. Then $g$ is measurable and
$$\int g\,d\mu\le\liminf_{n\to\infty}\int f_n\,d\mu.$$
In other words,
$$\int\left(\liminf_{n\to\infty}f_n\right)d\mu\le\liminf_{n\to\infty}\int f_n\,d\mu.$$
Proof: Let $g_n(\omega)=\inf\{f_k(\omega):k\ge n\}$. Then
$$g_n^{-1}([a,\infty])=\bigcap_{k=n}^{\infty}f_k^{-1}([a,\infty])\in\mathcal{F}.$$
Thus $g_n$ is measurable by Lemma 7.6 on Page 138. Also $g(\omega)=\lim_{n\to\infty}g_n(\omega)$ so $g$ is measurable because it is the pointwise limit of measurable functions. Now the functions $g_n$ form an increasing sequence of nonnegative measurable functions so the monotone convergence theorem applies. This yields
$$\int g\,d\mu=\lim_{n\to\infty}\int g_n\,d\mu\le\liminf_{n\to\infty}\int f_n\,d\mu.$$
The last inequality holds because
$$\int g_n\,d\mu\le\int f_n\,d\mu.$$
(Note that it is not known whether $\lim_{n\to\infty}\int f_n\,d\mu$ exists.) This proves the theorem.
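The inequality in Fatou's lemma can be strict. A minimal sketch, assuming Lebesgue measure on $[0,1]$ and the standard example $f_n=n\mathcal{X}_{(0,1/n)}$ (neither of which is taken from the text above; the function names are hypothetical): each $f_n$ is a simple function with integral $n\cdot\frac1n=1$, yet at every fixed point the values are eventually 0, so the lim inf function is identically 0.

```python
from fractions import Fraction

def f(n, x):
    """Value at x of the simple function n * X_(0, 1/n)."""
    return n if 0 < x < Fraction(1, n) else 0

def integral_f(n):
    """Integral of f_n: the value n times the interval length 1/n."""
    return n * Fraction(1, n)

x = Fraction(1, 10)
print([f(n, x) for n in range(7, 13)])           # nonzero only while 1/n > x
print([integral_f(n) for n in range(1, 6)])      # always 1
```

Here the left side of Fatou's inequality is 0 while the right side is 1, so equality cannot be expected without further hypotheses such as monotonicity or domination.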
7.2.8 The Righteous Algebraic Desires Of The Lebesgue Integral
The monotone convergence theorem shows the integral wants to be linear. This is the essential content of the next theorem.
Theorem 7.32 Let $f,g$ be nonnegative measurable functions and let $a,b$ be nonnegative numbers. Then
$$\int(af+bg)\,d\mu=a\int f\,d\mu+b\int g\,d\mu.\qquad(7.14)$$
Proof: By Theorem 7.24 on Page 150 there exist sequences of nonnegative simple functions, $s_n\uparrow f$ and $t_n\uparrow g$. Then by the monotone convergence theorem and Lemma 7.23,
$$\int(af+bg)\,d\mu=\lim_{n\to\infty}\int(as_n+bt_n)\,d\mu=\lim_{n\to\infty}\left(a\int s_n\,d\mu+b\int t_n\,d\mu\right)=a\int f\,d\mu+b\int g\,d\mu.$$
As long as you are allowing functions to take the value $+\infty$, you cannot consider something like $f+(-g)$ and so you can't very well expect a satisfactory statement about the integral being linear until you restrict yourself to functions which have values in a vector space. This is discussed next.
7.3 The Space $L^1$
The functions considered here have values in $\mathbb{C}$, a vector space.
Definition 7.33 Let $(\Omega,\mathcal{S},\mu)$ be a measure space and suppose $f:\Omega\to\mathbb{C}$. Then $f$ is said to be measurable if both $\operatorname{Re}f$ and $\operatorname{Im}f$ are measurable real valued functions.
Definition 7.34 A complex simple function will be a function which is of the form
$$s(\omega)=\sum_{k=1}^{n}c_k\mathcal{X}_{E_k}(\omega)$$
where $c_k\in\mathbb{C}$ and $\mu(E_k)<\infty$. For $s$ a complex simple function as above, define
$$I(s)\equiv\sum_{k=1}^{n}c_k\mu(E_k).$$
Lemma 7.35 The definition, 7.34 is well defined. Furthermore, $I$ is linear on the vector space of complex simple functions. Also the triangle inequality holds,
$$|I(s)|\le I(|s|).$$
Proof: Suppose $\sum_{k=1}^{n}c_k\mathcal{X}_{E_k}(\omega)=0$. Does it follow that $\sum_kc_k\mu(E_k)=0$? The supposition implies
$$\sum_{k=1}^{n}\operatorname{Re}c_k\mathcal{X}_{E_k}(\omega)=0,\quad\sum_{k=1}^{n}\operatorname{Im}c_k\mathcal{X}_{E_k}(\omega)=0.\qquad(7.15)$$
Choose $\lambda$ large and positive so that $\lambda+\operatorname{Re}c_k\ge0$. Then adding $\sum_k\lambda\mathcal{X}_{E_k}$ to both sides of the first equation above,
$$\sum_{k=1}^{n}(\lambda+\operatorname{Re}c_k)\mathcal{X}_{E_k}(\omega)=\sum_{k=1}^{n}\lambda\mathcal{X}_{E_k}$$
and by Lemma 7.23 on Page 149, it follows upon taking $\int$ of both sides that
$$\sum_{k=1}^{n}(\lambda+\operatorname{Re}c_k)\mu(E_k)=\sum_{k=1}^{n}\lambda\mu(E_k)$$
which implies $\sum_{k=1}^{n}\operatorname{Re}c_k\mu(E_k)=0$. Similarly, $\sum_{k=1}^{n}\operatorname{Im}c_k\mu(E_k)=0$ and so $\sum_{k=1}^{n}c_k\mu(E_k)=0$. Thus if
$$\sum_jc_j\mathcal{X}_{E_j}=\sum_kd_k\mathcal{X}_{F_k}$$
then $\sum_jc_j\mathcal{X}_{E_j}+\sum_k(-d_k)\mathcal{X}_{F_k}=0$ and so the result just established verifies
$$\sum_jc_j\mu(E_j)-\sum_kd_k\mu(F_k)=0$$
which proves $I$ is well defined.
That $I$ is linear is now obvious. It only remains to verify the triangle inequality.
Let $s$ be a simple function,
$$s=\sum_jc_j\mathcal{X}_{E_j}$$
where, since $I$ is well defined, the $E_j$ can be taken to be disjoint. Then pick $\theta\in\mathbb{C}$ such that $\theta I(s)=|I(s)|$ and $|\theta|=1$. Then from the triangle inequality for sums of complex numbers,
$$|I(s)|=\theta I(s)=I(\theta s)=\sum_j\theta c_j\mu(E_j)=\left|\sum_j\theta c_j\mu(E_j)\right|\le\sum_j|c_j|\mu(E_j)=I(|s|).$$
With this lemma, the following is the definition of $L^1(\Omega)$.
Definition 7.36 $f\in L^1(\Omega)$ means there exists a sequence of complex simple functions $\{s_n\}$ such that
$$s_n(\omega)\to f(\omega)\text{ for all }\omega\in\Omega,$$
$$\lim_{m,n\to\infty}I(|s_n-s_m|)=\lim_{n,m\to\infty}\int|s_n-s_m|\,d\mu=0.\qquad(7.16)$$
Then
$$I(f)\equiv\lim_{n\to\infty}I(s_n).\qquad(7.17)$$
Lemma 7.37 Definition 7.36 is well defined.
Proof: There are several things which need to be verified. First suppose 7.16. Then by Lemma 7.35
$$|I(s_n)-I(s_m)|=|I(s_n-s_m)|\le I(|s_n-s_m|)$$
and for $m,n$ large enough this last is given to be small so $\{I(s_n)\}$ is a Cauchy sequence in $\mathbb{C}$ and so it converges. This verifies the limit in 7.17 at least exists. It remains to consider another sequence $\{t_n\}$ having the same properties as $\{s_n\}$ and to verify that $I(f)$ determined by this other sequence is the same. By Lemma 7.35 and Fatou's lemma, Theorem 7.31 on Page 154,
$$|I(s_n)-I(t_n)|\le I(|s_n-t_n|)=\int|s_n-t_n|\,d\mu\le\int\left(|s_n-f|+|f-t_n|\right)d\mu$$
$$\le\liminf_{k\to\infty}\int|s_n-s_k|\,d\mu+\liminf_{k\to\infty}\int|t_n-t_k|\,d\mu<\varepsilon$$
whenever $n$ is large enough. Since $\varepsilon$ is arbitrary, this shows the limit from using the $t_n$ is the same as the limit from using the $s_n$.
What if $f$ has values in $[0,\infty)$? Earlier $\int f\,d\mu$ was defined for such functions and now $I(f)$ has been defined. Are they the same? If so, $I$ can be regarded as an extension of $\int d\mu$ to a larger class of functions.
Lemma 7.38 Suppose $f$ has values in $[0,\infty)$ and $f\in L^1(\Omega)$. Then $f$ is measurable and
$$I(f)=\int f\,d\mu.$$
Proof: Since $f$ is the pointwise limit of a sequence of complex simple functions $\{s_n\}$ having the properties described in Definition 7.36, it follows $f(\omega)=\lim_{n\to\infty}\operatorname{Re}s_n(\omega)$ and so $f$ is measurable. Also
$$\int\left|(\operatorname{Re}s_n)^+-(\operatorname{Re}s_m)^+\right|d\mu\le\int|\operatorname{Re}s_n-\operatorname{Re}s_m|\,d\mu\le\int|s_n-s_m|\,d\mu$$
where $x^+\equiv\frac12(|x|+x)$, the positive part of the real number $x$.[2] Thus there is no loss of generality in assuming $\{s_n\}$ is a sequence of complex simple functions having values in $[0,\infty)$. Then since for such complex simple functions, $I(s)=\int s\,d\mu$,
$$\left|I(f)-\int f\,d\mu\right|\le|I(f)-I(s_n)|+\left|\int s_n\,d\mu-\int f\,d\mu\right|$$
$$<\varepsilon+\left|\int_{[s_n-f\ge0]}s_n\,d\mu-\int_{[s_n-f\ge0]}f\,d\mu+\int_{[s_n-f<0]}s_n\,d\mu-\int_{[s_n-f<0]}f\,d\mu\right|$$
$$=\varepsilon+\left|\int_{[s_n-f\ge0]}(s_n-f)\,d\mu-\int_{[s_n-f<0]}(f-s_n)\,d\mu\right|$$
$$\le\varepsilon+\int_{[s_n-f\ge0]}|s_n-f|\,d\mu+\int_{[s_n-f<0]}|s_n-f|\,d\mu=\varepsilon+\int|s_n-f|\,d\mu$$
whenever $n$ is large enough. But by Fatou's lemma, Theorem 7.31 on Page 154, the last term is no larger than
$$\liminf_{k\to\infty}\int|s_n-s_k|\,d\mu<\varepsilon$$
whenever $n$ is large enough. Since $\varepsilon$ is arbitrary, this shows $I(f)=\int f\,d\mu$ as claimed.
As explained above, $I$ can be regarded as an extension of $\int d\mu$ so the usual symbol, $\int d\mu$ can be used. It is now easy to verify $\int d\mu$ is linear on $L^1(\Omega)$.
[2] The negative part of the real number $x$ is defined to be $x^-\equiv\frac12(|x|-x)$. Thus $|x|=x^++x^-$ and $x=x^+-x^-$.
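Theorem 7.39 $\int d\mu$ is linear on $L^1(\Omega)$ and $L^1(\Omega)$ is a complex vector space. If $f\in L^1(\Omega)$, then $\operatorname{Re}f$, $\operatorname{Im}f$, and $|f|$ are all in $L^1(\Omega)$. Furthermore, for $f\in L^1(\Omega)$,
$$\int f\,d\mu=\int(\operatorname{Re}f)^+d\mu-\int(\operatorname{Re}f)^-d\mu+i\left[\int(\operatorname{Im}f)^+d\mu-\int(\operatorname{Im}f)^-d\mu\right].$$
Also the triangle inequality holds,
$$\left|\int f\,d\mu\right|\le\int|f|\,d\mu.$$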
Proof: First it is necessary to verify that $L^1(\Omega)$ is really a vector space because it makes no sense to speak of linear maps without having these maps defined on a vector space. Let $f,g$ be in $L^1(\Omega)$ and let $a,b\in\mathbb{C}$. Then let $\{s_n\}$ and $\{t_n\}$ be sequences of complex simple functions associated with $f$ and $g$ respectively as described in Definition 7.36. Consider $\{as_n+bt_n\}$, another sequence of complex simple functions. Then $as_n(\omega)+bt_n(\omega)\to af(\omega)+bg(\omega)$ for each $\omega$. Also, from Lemma 7.35
$$\int|as_n+bt_n-(as_m+bt_m)|\,d\mu\le|a|\int|s_n-s_m|\,d\mu+|b|\int|t_n-t_m|\,d\mu$$
and the sum of the two terms on the right converges to zero as $m,n\to\infty$. Thus $af+bg\in L^1(\Omega)$. Also
$$\int(af+bg)\,d\mu=\lim_{n\to\infty}\int(as_n+bt_n)\,d\mu=\lim_{n\to\infty}\left(a\int s_n\,d\mu+b\int t_n\,d\mu\right)$$
$$=a\lim_{n\to\infty}\int s_n\,d\mu+b\lim_{n\to\infty}\int t_n\,d\mu=a\int f\,d\mu+b\int g\,d\mu.$$
If $\{s_n\}$ is a sequence of complex simple functions described in Definition 7.36 corresponding to $f$, then $\{|s_n|\}$ is a sequence of complex simple functions satisfying the conditions of Definition 7.36 corresponding to $|f|$. This is because $|s_n(\omega)|\to|f(\omega)|$ and
$$\int\big||s_n|-|s_m|\big|\,d\mu\le\int|s_m-s_n|\,d\mu$$
with this last expression converging to 0 as $m,n\to\infty$. Thus $|f|\in L^1(\Omega)$. Also, by similar reasoning, $\{\operatorname{Re}s_n\}$ and $\{\operatorname{Im}s_n\}$ correspond to $\operatorname{Re}f$ and $\operatorname{Im}f$ respectively in the manner described by Definition 7.36, showing that $\operatorname{Re}f$ and $\operatorname{Im}f$ are in $L^1(\Omega)$. Now $(\operatorname{Re}f)^+=\frac12(|\operatorname{Re}f|+\operatorname{Re}f)$ and $(\operatorname{Re}f)^-=\frac12(|\operatorname{Re}f|-\operatorname{Re}f)$ so both of these functions are in $L^1(\Omega)$. Similar formulas establish that $(\operatorname{Im}f)^+$ and $(\operatorname{Im}f)^-$ are in $L^1(\Omega)$.
The formula follows from the observation that
$$f=(\operatorname{Re}f)^+-(\operatorname{Re}f)^-+i\left((\operatorname{Im}f)^+-(\operatorname{Im}f)^-\right)$$
and the fact shown first that $\int d\mu$ is linear.
To verify the triangle inequality, let $\{s_n\}$ be complex simple functions for $f$ as in Definition 7.36. Then
$$\left|\int f\,d\mu\right|=\lim_{n\to\infty}\left|\int s_n\,d\mu\right|\le\lim_{n\to\infty}\int|s_n|\,d\mu=\int|f|\,d\mu.$$
Now here is an equivalent description of $L^1(\Omega)$ which is the version which will be used more often than not.
Corollary 7.40 Let $(\Omega,\mathcal{S},\mu)$ be a measure space and let $f:\Omega\to\mathbb{C}$. Then $f\in L^1(\Omega)$ if and only if $f$ is measurable and $\int|f|\,d\mu<\infty$.
Proof: Suppose $f\in L^1(\Omega)$. Then from Definition 7.36, it follows both real and imaginary parts of $f$ are measurable. Just take real and imaginary parts of $s_n$ and observe the real and imaginary parts of $f$ are limits of the real and imaginary parts of $s_n$ respectively. Why is $\int|f|\,d\mu<\infty$? It follows from Theorem 7.39. Recall why this was so. Let $\{s_n\}$ be a sequence of simple functions attached to $f$ as in the definition of what it means to be in $L^1$. Then from the definition of $I(s)$ for $s$ simple,
$$|I(|s_n|)-I(|s_m|)|\le I(|s_n-s_m|)$$
which converges to 0. Since $\{I(|s_n|)\}$ is a Cauchy sequence, it is bounded by a constant $C$ and also $\{|s_n|\}$ is a sequence of simple functions of the right sort which converges pointwise to $|f|$ and so by definition,
$$\int|f|\,d\mu=I(|f|)=\lim_{n\to\infty}I(|s_n|)\le C.$$
This shows the only if part.
The more interesting part is the if part. Suppose then that $f$ is measurable and $\int|f|\,d\mu<\infty$. Suppose first that $f$ has values in $[0,\infty)$. It is necessary to obtain the sequence of complex simple functions. By Theorem 7.24, there exists a sequence of nonnegative simple functions $\{s_n\}$ such that $s_n(\omega)\uparrow f(\omega)$. Then by the monotone convergence theorem,
$$\lim_{n\to\infty}\int(2f-(f-s_n))\,d\mu=\int2f\,d\mu$$
and so
$$\lim_{n\to\infty}\int(f-s_n)\,d\mu=0.$$
Letting $m$ be large enough, it follows $\int(f-s_m)\,d\mu<\varepsilon$ and so if $n>m$
$$\int|s_m-s_n|\,d\mu\le\int|f-s_m|\,d\mu<\varepsilon.$$
Therefore, $f\in L^1(\Omega)$ because $\{s_n\}$ is a suitable sequence.
The general case follows from considering positive and negative parts of real and imaginary parts of $f$. These are each measurable and nonnegative and their integral is finite so each is in $L^1(\Omega)$ by what was just shown. Thus
$$f=\operatorname{Re}f^+-\operatorname{Re}f^-+i\left(\operatorname{Im}f^+-\operatorname{Im}f^-\right)$$
and so $f\in L^1(\Omega)$. This proves the corollary.
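Theorem 7.41 (Dominated Convergence theorem) Let $f_n\in L^1(\Omega)$ and suppose
$$f(\omega)=\lim_{n\to\infty}f_n(\omega),$$
and there exists a measurable function $g$, with values in $[0,\infty]$,[3] such that
$$|f_n(\omega)|\le g(\omega)\text{ and }\int g(\omega)\,d\mu<\infty.$$
Then $f\in L^1(\Omega)$ and
$$0=\lim_{n\to\infty}\int|f_n-f|\,d\mu=\lim_{n\to\infty}\left|\int f\,d\mu-\int f_n\,d\mu\right|.$$
Proof: $f$ is measurable by Theorem 7.8. Since $|f|\le g$, it follows that
$$f\in L^1(\Omega)\text{ and }|f-f_n|\le2g.$$
By Fatou's lemma (Theorem 7.31),
$$\int2g\,d\mu\le\liminf_{n\to\infty}\int\left(2g-|f-f_n|\right)d\mu=\int2g\,d\mu-\limsup_{n\to\infty}\int|f-f_n|\,d\mu.$$
Subtracting $\int2g\,d\mu$,
$$0\le-\limsup_{n\to\infty}\int|f-f_n|\,d\mu.$$
Hence
$$0\ge\limsup_{n\to\infty}\int|f-f_n|\,d\mu\ge\liminf_{n\to\infty}\int|f-f_n|\,d\mu\ge\liminf_{n\to\infty}\left|\int f\,d\mu-\int f_n\,d\mu\right|\ge0.$$
This proves the theorem by Lemma 7.30 on Page 153 because the lim sup and lim inf are equal.
[3] Note that, since $g$ is allowed to have the value $\infty$, it is not known that $g\in L^1(\Omega)$.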
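Corollary 7.42 Suppose $f_n\in L^1(\Omega)$ and $f(\omega)=\lim_{n\to\infty}f_n(\omega)$. Suppose also there exist measurable functions $g_n$, $g$ with values in $[0,\infty]$ such that
$$\lim_{n\to\infty}\int g_n\,d\mu=\int g\,d\mu,\quad g_n(\omega)\to g(\omega)\ \mu\text{ a.e.}$$
and both $\int g_n\,d\mu$ and $\int g\,d\mu$ are finite. Also suppose $|f_n(\omega)|\le g_n(\omega)$. Then
$$\lim_{n\to\infty}\int|f-f_n|\,d\mu=0.$$
Proof: It is just like the above. This time $g+g_n-|f-f_n|\ge0$ and so by Fatou's lemma,
$$\int2g\,d\mu-\limsup_{n\to\infty}\int|f-f_n|\,d\mu=\liminf_{n\to\infty}\int(g_n+g)\,d\mu-\limsup_{n\to\infty}\int|f-f_n|\,d\mu$$
$$=\liminf_{n\to\infty}\int\left((g_n+g)-|f-f_n|\right)d\mu\ge\int2g\,d\mu$$
and so $-\limsup_{n\to\infty}\int|f-f_n|\,d\mu\ge0$.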
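Definition 7.43 Let $E$ be a measurable subset of $\Omega$.
$$\int_Ef\,d\mu\equiv\int f\mathcal{X}_E\,d\mu.$$
If $L^1(E)$ is written, the $\sigma$ algebra is defined as
$$\{E\cap A:A\in\mathcal{F}\}$$
and the measure is $\mu$ restricted to this smaller $\sigma$ algebra. Clearly, if $f\in L^1(\Omega)$, then
$$f\mathcal{X}_E\in L^1(E)$$
and if $f\in L^1(E)$, then letting $\tilde f$ be the 0 extension of $f$ off of $E$, it follows $\tilde f\in L^1(\Omega)$.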
7.4 Vitali Convergence Theorem
The Vitali convergence theorem is a convergence theorem which in the case of a finite measure space is superior to the dominated convergence theorem.
Definition 7.44 Let $(\Omega,\mathcal{F},\mu)$ be a measure space and let $S\subseteq L^1(\Omega)$. $S$ is uniformly integrable if for every $\varepsilon>0$ there exists $\delta>0$ such that for all $f\in S$
$$\left|\int_Ef\,d\mu\right|<\varepsilon\text{ whenever }\mu(E)<\delta.$$
Lemma 7.45 If S is uniformly integrable, then [S[ [f[ : f S is uniformly
integrable. Also S is uniformly integrable if S is nite.
Proof: Let > 0 be given and suppose S is uniformly integrable. First suppose
the functions are real valued. Let be such that if (E) < , then
_
E
fd
<

2
for all f S. Let (E) < . Then if f S,
_
E
[f[ d
_
E[f0]
(f) d +
_
E[f>0]
fd
=
_
E[f0]
fd
_
E[f>0]
fd
<

2
+

2
= .
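In general, if $S$ is a uniformly integrable set of complex valued functions, the inequalities
$$\left|\int_E \operatorname{Re} f\,d\mu\right| \le \left|\int_E f\,d\mu\right|, \qquad \left|\int_E \operatorname{Im} f\,d\mu\right| \le \left|\int_E f\,d\mu\right|,$$
imply $\operatorname{Re} S \equiv \{\operatorname{Re} f : f \in S\}$ and $\operatorname{Im} S \equiv \{\operatorname{Im} f : f \in S\}$ are also uniformly integrable. Therefore, applying the above result for real valued functions to these sets of functions, it follows $|S|$ is uniformly integrable also.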
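For the last part, it suffices to verify a single function in $L^1(\Omega)$ is uniformly integrable. To do so, note that from the dominated convergence theorem,
$$\lim_{R\to\infty}\int_{[|f|>R]} |f|\,d\mu = 0.$$
Let $\varepsilon > 0$ be given and choose $R$ large enough that $\int_{[|f|>R]} |f|\,d\mu < \frac{\varepsilon}{2}$. Now let $\mu(E) < \frac{\varepsilon}{2R}$. Then
$$\begin{aligned}
\int_E |f|\,d\mu &= \int_{E\cap[|f|\le R]} |f|\,d\mu + \int_{E\cap[|f|>R]} |f|\,d\mu \\
&< R\mu(E) + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{aligned}$$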
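To see how the choice of $R$ and $\delta$ in the last part of the proof plays out, here is a worked instance. It assumes $\Omega = (0,1]$ with Lebesgue measure and the particular function $f(x) = x^{-1/2}$, neither of which appears in the text above; they are chosen only for illustration, and $0 < \varepsilon \le 1$ is assumed so that $R \ge 1$.

```latex
% Illustrative assumptions: \Omega = (0,1], \mu = Lebesgue measure, f(x) = x^{-1/2}.
% For R \ge 1 the set [|f| > R] is (0, R^{-2}), so
\int_{[|f|>R]} |f| \, d\mu = \int_0^{R^{-2}} x^{-1/2}\, dx = \frac{2}{R}.
% This is < \varepsilon/2 as soon as R > 4/\varepsilon; take R = 5/\varepsilon.
% The proof then prescribes
\delta = \frac{\varepsilon}{2R} = \frac{\varepsilon^{2}}{10}.
% Direct check on the worst case E = (0, \delta'), \delta' < \delta:
\int_E |f| \, d\mu = 2\sqrt{\delta'} < 2\sqrt{\delta} = \frac{2\varepsilon}{\sqrt{10}} < \varepsilon .
```

The direct computation gives a smaller bound than $\varepsilon$, which is consistent with the proof's estimate being a convenient one rather than a sharp one.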
The following theorem is Vitali's convergence theorem.
Theorem 7.46 Let $\{f_n\}$ be a uniformly integrable set of complex valued functions, $\mu(\Omega) < \infty$, and $f_n(x) \to f(x)$ a.e. where $f$ is a measurable complex valued function. Then $f \in L^1(\Omega)$ and
$$\lim_{n\to\infty}\int_\Omega |f_n - f|\,d\mu = 0. \tag{7.18}$$
Proof: First it will be shown that $f \in L^1(\Omega)$. By uniform integrability, there exists $\delta > 0$ such that if $\mu(E) < \delta$, then
$$\int_E |f_n|\,d\mu < 1$$
for all $n$. By Egoroff's theorem, there exists a set $E$ of measure less than $\delta$ such that on $E^C$, $\{f_n\}$ converges uniformly. Therefore, for $p$ large enough, and $n > p$,
$$\int_{E^C} |f_p - f_n|\,d\mu < 1$$
which implies
$$\int_{E^C} |f_n|\,d\mu < 1 + \int_\Omega |f_p|\,d\mu.$$
Then since there are only finitely many functions $f_n$ with $n \le p$, there exists a constant $M_1$ such that for all $n$,
$$\int_{E^C} |f_n|\,d\mu < M_1.$$
But also,
$$\int_\Omega |f_m|\,d\mu = \int_{E^C} |f_m|\,d\mu + \int_E |f_m|\,d\mu \le M_1 + 1 \equiv M.$$
Therefore, by Fatou's lemma,
$$\int_\Omega |f|\,d\mu \le \liminf_{n\to\infty}\int |f_n|\,d\mu \le M,$$
showing that $f \in L^1$ as hoped.
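Now $S \cup \{f\}$, where $S \equiv \{f_n\}$, is uniformly integrable so there exists $\delta_1 > 0$ such that if $\mu(E) < \delta_1$, then $\int_E |g|\,d\mu < \varepsilon/3$ for all $g \in S \cup \{f\}$. By Egoroff's theorem, there exists a set $F$ with $\mu(F) < \delta_1$ such that $f_n$ converges uniformly to $f$ on $F^C$. Therefore, there exists $N$ such that if $n > N$, then
$$\int_{F^C} |f - f_n|\,d\mu < \frac{\varepsilon}{3}.$$
It follows that for $n > N$,
$$\begin{aligned}
\int_\Omega |f - f_n|\,d\mu &\le \int_{F^C} |f - f_n|\,d\mu + \int_F |f|\,d\mu + \int_F |f_n|\,d\mu \\
&< \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon,
\end{aligned}$$
which verifies 7.18.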
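The following standard example suggests why uniform integrability cannot simply be dropped from Theorem 7.46. It assumes $\Omega = [0,1]$ with Lebesgue measure, which is an illustrative choice and not something fixed by the text above.

```latex
% Illustrative assumptions: \Omega = [0,1], \mu = Lebesgue measure.
f_n \equiv n\,\mathcal{X}_{(0,1/n)}, \qquad f \equiv 0 .
% Pointwise convergence: for each x \in [0,1], f_n(x) = 0 once 1/n \le x, so f_n(x) \to 0.
% But the L^1 distance does not go to zero:
\int_\Omega |f_n - f| \, d\mu = n \cdot \mu\big((0,1/n)\big) = 1 \quad \text{for every } n .
% The family is not uniformly integrable: with E_n = (0,1/n),
\mu(E_n) = \frac{1}{n} \to 0 \quad \text{while} \quad \left| \int_{E_n} f_n \, d\mu \right| = 1 .
```

So no single $\delta$ works for $\varepsilon = 1$, and the conclusion 7.18 fails even though $\mu(\Omega) < \infty$ and $f_n \to f$ everywhere.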
7.5 Exercises
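1. Let $\Omega = \mathbb{N} = \{1, 2, \cdots\}$. Let $\mathcal{F} = \mathcal{P}(\mathbb{N})$, the set of all subsets of $\mathbb{N}$, and let $\mu(S) =$ number of elements in $S$. Thus $\mu(\{1\}) = 1 = \mu(\{2\})$, $\mu(\{1,2\}) = 2$, etc. Show $(\Omega, \mathcal{F}, \mu)$ is a measure space. It is called counting measure. What functions are measurable in this case?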
2. For a measurable nonnegative function $f$, the integral was defined as
$$\sup_{\delta > h > 0} \sum_{i=1}^\infty h\,\mu([f > ih]).$$
Show this is the same as
$$\int_0^\infty \mu([f > t])\,dt$$
where this integral is just the improper Riemann integral defined by
$$\int_0^\infty \mu([f > t])\,dt = \lim_{R\to\infty}\int_0^R \mu([f > t])\,dt.$$
3. Using Problem 2, show that for $s$ a nonnegative simple function, $s(\omega) = \sum_{i=1}^n c_i \mathcal{X}_{E_i}(\omega)$ where $0 < c_1 < c_2 \cdots < c_n$ and the sets $E_k$ are disjoint,
$$\int s\,d\mu = \sum_{i=1}^n c_i\,\mu(E_i).$$
Give a really easy proof of this.
4. Let $\Omega$ be any uncountable set and let $\mathcal{F} = \{A \subseteq \Omega : \text{either } A \text{ or } A^C \text{ is countable}\}$. Let $\mu(A) = 1$ if $A$ is uncountable and $\mu(A) = 0$ if $A$ is countable. Show $(\Omega, \mathcal{F}, \mu)$ is a measure space. This is a well known bad example.
5. Let $\mathcal{F}$ be a $\sigma$ algebra of subsets of $\Omega$ and suppose $\mathcal{F}$ has infinitely many elements. Show that $\mathcal{F}$ is uncountable. Hint: You might try to show there exists a countable sequence of disjoint sets of $\mathcal{F}$, $\{A_i\}$. It might be easiest to verify this by contradiction if it doesn't exist rather than a direct construction; however, I have seen this done several ways. Once this has been done, you can define a map $\theta$ from $\mathcal{P}(\mathbb{N})$ into $\mathcal{F}$ which is one to one by $\theta(S) = \cup_{i\in S} A_i$. Then argue $\mathcal{P}(\mathbb{N})$ is uncountable and so $\mathcal{F}$ is also uncountable.
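6. An algebra $\mathcal{A}$ of subsets of $\Omega$ is a subset of the power set such that $\Omega$ is in the algebra and for $A, B \in \mathcal{A}$, $A \setminus B$ and $A \cup B$ are both in $\mathcal{A}$. Let $\mathcal{C} \equiv \{E_i\}_{i=1}^\infty$ be a countable collection of sets and let $\Omega_1 \equiv \cup_{i=1}^\infty E_i$. Show there exists an algebra of sets $\mathcal{A}$ such that $\mathcal{A} \supseteq \mathcal{C}$ and $\mathcal{A}$ is countable. Note the difference between this problem and Problem 5. Hint: Let $\mathcal{C}_1$ denote all finite unions of sets of $\mathcal{C}$ and $\Omega_1$. Thus $\mathcal{C}_1$ is countable. Now let $\mathcal{B}_1$ denote all complements with respect to $\Omega_1$ of sets of $\mathcal{C}_1$. Let $\mathcal{C}_2$ denote all finite unions of sets of $\mathcal{B}_1 \cup \mathcal{C}_1$. Continue in this way, obtaining an increasing sequence $\mathcal{C}_n$, each of which is countable. Let
$$\mathcal{A} \equiv \cup_{i=1}^\infty \mathcal{C}_i.$$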
7. We say $g$ is Borel measurable if whenever $U$ is open, $g^{-1}(U)$ is a Borel set. Let $f : \Omega \to X$ and let $g : X \to Y$ where $X$ is a topological space and $Y$ equals $\mathbb{C}$, $\mathbb{R}$, or $(-\infty, \infty]$ and $\mathcal{F}$ is a $\sigma$ algebra of sets of $\Omega$. Suppose $f$ is measurable and $g$ is Borel measurable. Show $g \circ f$ is measurable.
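8. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space. Define $\overline{\mu} : \mathcal{P}(\Omega) \to [0, \infty]$ by
$$\overline{\mu}(A) = \inf\{\mu(B) : B \supseteq A,\ B \in \mathcal{F}\}.$$
Show $\overline{\mu}$ satisfies
$$\overline{\mu}(\emptyset) = 0, \quad \text{if } A \subseteq B,\ \overline{\mu}(A) \le \overline{\mu}(B),$$
$$\overline{\mu}(\cup_{i=1}^\infty A_i) \le \sum_{i=1}^\infty \overline{\mu}(A_i), \quad \mu(A) = \overline{\mu}(A) \text{ if } A \in \mathcal{F}.$$
If $\overline{\mu}$ satisfies these conditions, it is called an outer measure. This shows every measure determines an outer measure on the power set. Outer measures are discussed more later.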
9. Let $\{E_i\}$ be a sequence of measurable sets with the property that
$$\sum_{i=1}^\infty \mu(E_i) < \infty.$$
Let $S = \{\omega \in \Omega \text{ such that } \omega \in E_i \text{ for infinitely many values of } i\}$. Show $\mu(S) = 0$ and $S$ is measurable. This is part of the Borel Cantelli lemma. Hint: Write $S$ in terms of intersections and unions. Something is in $S$ means that for every $n$ there exists $k > n$ such that it is in $E_k$. Remember the tail of a convergent series is small.
10. Let $\{f_n\}$, $f$ be measurable functions with values in $\mathbb{C}$. $\{f_n\}$ converges in measure if
$$\lim_{n\to\infty} \mu(\{x \in \Omega : |f(x) - f_n(x)| \ge \varepsilon\}) = 0$$
for each fixed $\varepsilon > 0$. Prove the theorem of F. Riesz. If $f_n$ converges to $f$ in measure, then there exists a subsequence $\{f_{n_k}\}$ which converges to $f$ a.e. Hint: Choose $n_1$ such that
$$\mu(\{x : |f(x) - f_{n_1}(x)| \ge 1\}) < 1/2.$$
Choose $n_2 > n_1$ such that
$$\mu(\{x : |f(x) - f_{n_2}(x)| \ge 1/2\}) < 1/2^2,$$
$n_3 > n_2$ such that
$$\mu(\{x : |f(x) - f_{n_3}(x)| \ge 1/3\}) < 1/2^3,$$
etc. Now consider what it means for $f_{n_k}(x)$ to fail to converge to $f(x)$. Then use Problem 9.
11. Let $\Omega = \mathbb{N} = \{1, 2, \cdots\}$ and $\mu(S) =$ number of elements in $S$. If $f : \Omega \to \mathbb{C}$, what is meant by $\int f\,d\mu$? Which functions are in $L^1(\Omega)$? Which functions are measurable? See Problem 1.
12. For the measure space of Problem 11, give an example of a sequence of nonnegative measurable functions $\{f_n\}$ converging pointwise to a function $f$, such that strict inequality is obtained in Fatou's lemma.
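13. Suppose $(\Omega, \mu)$ is a finite measure space ($\mu(\Omega) < \infty$) and $S \subseteq L^1(\Omega)$. Show $S$ is uniformly integrable and bounded in $L^1(\Omega)$ if there exists an increasing function $h$ which satisfies
$$\lim_{t\to\infty} \frac{h(t)}{t} = \infty, \qquad \sup\left\{\int_\Omega h(|f|)\,d\mu : f \in S\right\} < \infty.$$
$S$ is bounded if there is some number $M$ such that
$$\int |f|\,d\mu \le M$$
for all $f \in S$.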
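14. Let $(\Omega, \mathcal{F}, \mu)$ be a measure space and suppose $f, g : \Omega \to (-\infty, \infty]$ are measurable. Prove the sets
$$\{\omega : f(\omega) < g(\omega)\} \text{ and } \{\omega : f(\omega) = g(\omega)\}$$
are measurable. Hint: The easy way to do this is to write
$$\{\omega : f(\omega) < g(\omega)\} = \cup_{r\in\mathbb{Q}} [f < r] \cap [g > r].$$
Note that $l(x, y) = x - y$ is not continuous on $(-\infty, \infty]$ so the obvious idea doesn't work.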
15. Let $\{f_n\}$ be a sequence of real or complex valued measurable functions. Let
$$S = \{\omega : \{f_n(\omega)\} \text{ converges}\}.$$
Show $S$ is measurable. Hint: You might try to exhibit the set where $f_n$ converges in terms of countable unions and intersections using the definition of a Cauchy sequence.
16. Suppose $u_n(t)$ is a differentiable function for $t \in (a, b)$ and suppose that for $t \in (a, b)$,
$$|u_n(t)|,\ |u_n'(t)| < K_n$$
where $\sum_{n=1}^\infty K_n < \infty$. Show
$$\Big(\sum_{n=1}^\infty u_n(t)\Big)' = \sum_{n=1}^\infty u_n'(t).$$
Hint: This is an exercise in the use of the dominated convergence theorem and the mean value theorem from calculus.
17. Suppose $\{f_n\}$ is a sequence of nonnegative measurable functions defined on a measure space $(\Omega, \mathcal{S}, \mu)$. Show that
$$\int \sum_{k=1}^\infty f_k\,d\mu = \sum_{k=1}^\infty \int f_k\,d\mu.$$
Hint: Use the monotone convergence theorem along with the fact the integral is linear.
18. Show $\lim_{n\to\infty} \frac{n}{2^n}\sum_{k=1}^n \frac{2^k}{k} = 2$. This problem was shown to me by Shane Tang, a former student. It is a nice exercise in the dominated convergence theorem if you massage it a little.
The Construction Of
Measures
8.1 Outer Measures
What are some examples of measure spaces? In this chapter, a general procedure
is discussed called the method of outer measures. It is due to Caratheodory (1918).
This approach shows how to obtain measure spaces starting with an outer mea-
sure. This will then be used to construct measures determined by positive linear
functionals.
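Definition 8.1 Let $\Omega$ be a nonempty set and let $\mu : \mathcal{P}(\Omega) \to [0, \infty]$ satisfy
$$\mu(\emptyset) = 0,$$
$$\text{If } A \subseteq B, \text{ then } \mu(A) \le \mu(B),$$
$$\mu(\cup_{i=1}^\infty E_i) \le \sum_{i=1}^\infty \mu(E_i).$$
Such a function is called an outer measure. For $E \subseteq \Omega$, $E$ is $\mu$ measurable if for all $S \subseteq \Omega$,
$$\mu(S) = \mu(S \setminus E) + \mu(S \cap E). \tag{8.1}$$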
To help in remembering 8.1, think of a measurable set $E$ as a process which divides a given set into two pieces, the part in $E$ and the part not in $E$ as in 8.1. In the Bible, there are four incidents recorded in which a process of division resulted in more stuff than was originally present.$^1$ Measurable sets are exactly those for which no such miracle occurs. You might think of the measurable sets as the nonmiraculous sets. The idea is to show that they form a $\sigma$ algebra on which the outer measure $\mu$ is a measure.

$^1$1 Kings 17, 2 Kings 4, Matthew 14, and Matthew 15 all contain such descriptions. The stuff involved was either oil, bread, flour or fish. In mathematics such things have also been done with sets. In the book by Bruckner, Bruckner, and Thompson there is an interesting discussion of the Banach Tarski paradox which says it is possible to divide a ball in $\mathbb{R}^3$ into five disjoint pieces and assemble the pieces to form two disjoint balls of the same size as the first. The details can be found in: The Banach Tarski Paradox by Wagon, Cambridge University Press, 1985. It is known that all such examples must involve the axiom of choice.
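On a finite set, condition 8.1 can be checked by brute force, which may help make the definition concrete. The sketch below is only an illustration: the three point set `OMEGA`, the outer measure `mu_star`, and the helper names are all hypothetical choices, not part of the notes. It tests every candidate $E$ against every test set $S$.

```python
from itertools import chain, combinations

def powerset(omega):
    """All subsets of the finite set omega, as frozensets."""
    items = sorted(omega)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def caratheodory_measurable(omega, mu):
    """Sets E with mu(S) == mu(S minus E) + mu(S intersect E) for every S, as in 8.1."""
    subsets = powerset(omega)
    return {E for E in subsets
            if all(mu(S) == mu(S - E) + mu(S & E) for S in subsets)}

OMEGA = frozenset({0, 1, 2})

def mu_star(A):
    """Hypothetical outer measure: 0 on the empty set, 2 on OMEGA, 1 otherwise."""
    if not A:
        return 0
    return 2 if A == OMEGA else 1

def counting(A):
    """Counting measure, which is also an outer measure on the power set."""
    return len(A)

measurable_for_mu_star = caratheodory_measurable(OMEGA, mu_star)
measurable_for_counting = caratheodory_measurable(OMEGA, counting)
```

For `mu_star` only the empty set and `OMEGA` pass the test: for instance, $E = \{0\}$ splits $S = \{0, 1\}$ into two pieces of outer measure 1 each, more than the outer measure 1 of $S$ itself. For counting measure every subset passes.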
First here is a definition and a lemma.
Definition 8.2 $(\mu\lfloor S)(A) \equiv \mu(S \cap A)$ for all $A \subseteq \Omega$. Thus $\mu\lfloor S$ is the name of a new outer measure, called $\mu$ restricted to $S$.
The next lemma indicates that the property of measurability is not lost by considering this restricted measure.
Lemma 8.3 If $A$ is $\mu$ measurable, then $A$ is $\mu\lfloor S$ measurable.
Proof: Suppose $A$ is $\mu$ measurable. It is desired to show that for all $T \subseteq \Omega$,
$$(\mu\lfloor S)(T) = (\mu\lfloor S)(T \cap A) + (\mu\lfloor S)(T \setminus A).$$
Thus it is desired to show
$$\mu(S \cap T) = \mu(T \cap A \cap S) + \mu(T \cap S \cap A^C). \tag{8.2}$$
But 8.2 holds because $A$ is $\mu$ measurable. Apply Definition 8.1 to $S \cap T$ instead of $S$.
If $A$ is $\mu\lfloor S$ measurable, it does not follow that $A$ is $\mu$ measurable. Indeed, if you believe in the existence of non measurable sets, you could let $A = S$ for such a $\mu$ non measurable set and verify that $S$ is $\mu\lfloor S$ measurable.
The next theorem is the main result on outer measures. It is a very general
result which applies whenever one has an outer measure on the power set of any
set. This theorem will be referred to as Caratheodory's procedure in the rest of the
book.
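Theorem 8.4 The collection of $\mu$ measurable sets, $\mathcal{S}$, forms a $\sigma$ algebra and
$$\text{If } F_i \in \mathcal{S},\ F_i \cap F_j = \emptyset, \text{ then } \mu(\cup_{i=1}^\infty F_i) = \sum_{i=1}^\infty \mu(F_i). \tag{8.3}$$
If $\cdots F_n \subseteq F_{n+1} \subseteq \cdots$, then if $F = \cup_{n=1}^\infty F_n$ and $F_n \in \mathcal{S}$, it follows that
$$\mu(F) = \lim_{n\to\infty} \mu(F_n). \tag{8.4}$$
If $\cdots F_n \supseteq F_{n+1} \supseteq \cdots$, and if $F = \cap_{n=1}^\infty F_n$ for $F_n \in \mathcal{S}$ then if $\mu(F_1) < \infty$,
$$\mu(F) = \lim_{n\to\infty} \mu(F_n). \tag{8.5}$$
Also, $(\mathcal{S}, \mu)$ is complete. By this it is meant that if $F \in \mathcal{S}$ and if $E \subseteq \Omega$ with $\mu(E \setminus F) + \mu(F \setminus E) = 0$, then $E \in \mathcal{S}$.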
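Proof: First note that $\emptyset$ and $\Omega$ are obviously in $\mathcal{S}$. Now suppose $A, B \in \mathcal{S}$. I will show $A \setminus B \equiv A \cap B^C$ is in $\mathcal{S}$. To do so, consider the following picture.
[Figure: the set $S$ cut by $A$ and $B$ into the four pieces $S \cap A^C \cap B^C$, $S \cap A^C \cap B$, $S \cap A \cap B$, and $S \cap A \cap B^C$.]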
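Since $\mu$ is subadditive,
$$\mu(S) \le \mu(S \cap A \cap B^C) + \mu(A \cap B \cap S) + \mu(S \cap B \cap A^C) + \mu(S \cap A^C \cap B^C).$$
Now using $A, B \in \mathcal{S}$,
$$\begin{aligned}
\mu(S) &\le \mu(S \cap A \cap B^C) + \mu(S \cap A \cap B) + \mu(S \cap A^C \cap B) + \mu(S \cap A^C \cap B^C) \\
&= \mu(S \cap A) + \mu(S \cap A^C) = \mu(S).
\end{aligned}$$
It follows equality holds in the above. Now observe, using the picture if you like, that
$$(A \cap B \cap S) \cup (S \cap B \cap A^C) \cup (S \cap A^C \cap B^C) = S \setminus (A \setminus B)$$
and therefore,
$$\begin{aligned}
\mu(S) &= \mu(S \cap A \cap B^C) + \mu(A \cap B \cap S) + \mu(S \cap B \cap A^C) + \mu(S \cap A^C \cap B^C) \\
&\ge \mu(S \cap (A \setminus B)) + \mu(S \setminus (A \setminus B)).
\end{aligned}$$
Therefore, since $S$ is arbitrary, this shows $A \setminus B \in \mathcal{S}$.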
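Since $\Omega \in \mathcal{S}$, this shows that $A \in \mathcal{S}$ if and only if $A^C \in \mathcal{S}$. Now if $A, B \in \mathcal{S}$, $A \cup B = (A^C \cap B^C)^C = (A^C \setminus B)^C \in \mathcal{S}$. By induction, if $A_1, \cdots, A_n \in \mathcal{S}$, then so is $\cup_{i=1}^n A_i$. If $A, B \in \mathcal{S}$, with $A \cap B = \emptyset$,
$$\mu(A \cup B) = \mu((A \cup B) \cap A) + \mu((A \cup B) \setminus A) = \mu(A) + \mu(B).$$
By induction, if $A_i \cap A_j = \emptyset$ and $A_i \in \mathcal{S}$, $\mu(\cup_{i=1}^n A_i) = \sum_{i=1}^n \mu(A_i)$.
Now let $A = \cup_{i=1}^\infty A_i$ where $A_i \cap A_j = \emptyset$ for $i \ne j$.
$$\sum_{i=1}^\infty \mu(A_i) \ge \mu(A) \ge \mu(\cup_{i=1}^n A_i) = \sum_{i=1}^n \mu(A_i).$$
Since this holds for all $n$, you can take the limit as $n \to \infty$ and conclude,
$$\sum_{i=1}^\infty \mu(A_i) = \mu(A)$$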
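which establishes 8.3. Part 8.4 follows from part 8.3 just as in the proof of Theorem 7.5 on Page 136. That is, letting $F_0 \equiv \emptyset$, use part 8.3 to write
$$\begin{aligned}
\mu(F) &= \mu(\cup_{k=1}^\infty (F_k \setminus F_{k-1})) = \sum_{k=1}^\infty \mu(F_k \setminus F_{k-1}) \\
&= \lim_{n\to\infty} \sum_{k=1}^n (\mu(F_k) - \mu(F_{k-1})) = \lim_{n\to\infty} \mu(F_n).
\end{aligned}$$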
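In order to establish 8.5, let the $F_n$ be as given there. Then from what was just shown,
$$\mu(F_1 \setminus F_n) + \mu(F_n) = \mu(F_1).$$
Then, since $(F_1 \setminus F_n)$ increases to $(F_1 \setminus F)$, 8.4 implies
$$\lim_{n\to\infty} \mu(F_1 \setminus F_n) = \lim_{n\to\infty} (\mu(F_1) - \mu(F_n)) = \mu(F_1 \setminus F).$$
Now I don't know whether $F \in \mathcal{S}$ and so all that can be said is that
$$\mu(F_1 \setminus F) + \mu(F) \ge \mu(F_1)$$
but this implies
$$\mu(F_1 \setminus F) \ge \mu(F_1) - \mu(F).$$
Hence
$$\lim_{n\to\infty} (\mu(F_1) - \mu(F_n)) = \mu(F_1 \setminus F) \ge \mu(F_1) - \mu(F)$$
which implies
$$\lim_{n\to\infty} \mu(F_n) \le \mu(F).$$
But since $F \subseteq F_n$,
$$\mu(F) \le \lim_{n\to\infty} \mu(F_n)$$
and this establishes 8.5. Note that it was assumed $\mu(F_1) < \infty$ because $\mu(F_1)$ was subtracted from both sides.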
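A short example may show that the hypothesis $\mu(F_1) < \infty$ in 8.5 is doing real work. It uses counting measure on $\mathbb{N}$ as in Problem 1 of Section 7.5, for which every subset is measurable; the particular sets $F_n$ are an illustrative choice.

```latex
% Counting measure \mu on \Omega = \mathbb{N}; the tails F_n are an illustrative choice.
F_n \equiv \{ n, n+1, n+2, \cdots \}, \qquad F_n \supseteq F_{n+1}, \qquad
F = \cap_{n=1}^{\infty} F_n = \emptyset .
% Every F_n is infinite, so
\mu(F_n) = \infty \ \text{ for all } n, \qquad \text{hence } \lim_{n \to \infty} \mu(F_n) = \infty \neq 0 = \mu(F).
```

Here $\mu(F_1) = \infty$, so the subtraction step in the proof is not available, and the conclusion of 8.5 fails.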
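It remains to show $\mathcal{S}$ is closed under countable unions. Recall that if $A \in \mathcal{S}$, then $A^C \in \mathcal{S}$ and $\mathcal{S}$ is closed under finite unions. Let $A_i \in \mathcal{S}$, $A = \cup_{i=1}^\infty A_i$, $B_n = \cup_{i=1}^n A_i$. Then
$$\begin{aligned}
\mu(S) &= \mu(S \cap B_n) + \mu(S \setminus B_n) \qquad (8.6) \\
&= (\mu\lfloor S)(B_n) + (\mu\lfloor S)(B_n^C).
\end{aligned}$$
By Lemma 8.3 $B_n$ is $(\mu\lfloor S)$ measurable and so is $B_n^C$. I want to show $\mu(S) \ge \mu(S \setminus A) + \mu(S \cap A)$. If $\mu(S) = \infty$, there is nothing to prove. Assume $\mu(S) < \infty$. Then apply Parts 8.5 and 8.4 to the outer measure $\mu\lfloor S$ in 8.6 and let $n \to \infty$. Thus
$$B_n \uparrow A, \qquad B_n^C \downarrow A^C$$
and this yields
$$\mu(S) = (\mu\lfloor S)(A) + (\mu\lfloor S)(A^C) = \mu(S \cap A) + \mu(S \setminus A).$$
Therefore $A \in \mathcal{S}$ and this proves Parts 8.3, 8.4, and 8.5. It remains to prove the last assertion about the measure being complete.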
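Let $F \in \mathcal{S}$ and let $\mu(E \setminus F) + \mu(F \setminus E) = 0$. Consider the following picture.
[Figure: the sets $E$, $F$, and $S$.]
Then referring to this picture and using $F \in \mathcal{S}$,
$$\begin{aligned}
\mu(S) &\le \mu(S \cap E) + \mu(S \setminus E) \\
&\le \mu(S \cap E \cap F) + \mu((S \cap E) \setminus F) + \mu(S \setminus F) + \mu(F \setminus E) \\
&\le \mu(S \cap F) + \mu(E \setminus F) + \mu(S \setminus F) + \mu(F \setminus E) \\
&= \mu(S \cap F) + \mu(S \setminus F) = \mu(S).
\end{aligned}$$
Hence $\mu(S) = \mu(S \cap E) + \mu(S \setminus E)$ and so $E \in \mathcal{S}$. This shows that $(\mathcal{S}, \mu)$ is complete and proves the theorem.
Completeness usually occurs in the following form. $E \subseteq F \in \mathcal{S}$ and $\mu(F) = 0$. Then $E \in \mathcal{S}$.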
In the case of a Hausdorff topological space, the following lemma gives conditions under which the $\sigma$ algebra of $\mu$ measurable sets for an outer measure $\mu$ contains the Borel sets. In words, it assumes the outer measure is inner regular on open sets and outer regular on all sets. Also it assumes you can approximate the measure of an open set with a compact set and the measure of a compact set with an open set.
Lemma 8.5 Let $\Omega$ be a Hausdorff space and suppose $\mu$ is an outer measure satisfying $\mu$ is finite on compact sets and the following conditions,
1. $\mu(E) = \inf\{\mu(V) : V \supseteq E,\ V \text{ open}\}$ for all $E$. (Outer regularity.)
2. For every open set $V$, $\mu(V) = \sup\{\mu(K) : K \subseteq V,\ K \text{ compact}\}$. (Inner regularity on open sets.)
3. If $A, B$ are compact disjoint sets, then $\mu(A \cup B) = \mu(A) + \mu(B)$.
Then the following hold.
1. If $\varepsilon > 0$ and if $K$ is compact, there exists $V$ open such that $V \supseteq K$ and
$$\mu(V \setminus K) < \varepsilon.$$
2. If $\varepsilon > 0$ and if $V$ is open with $\mu(V) < \infty$, there exists a compact subset $K$ of $V$ such that
$$\mu(V \setminus K) < \varepsilon.$$
3. Then the $\mu$ measurable sets $\mathcal{S}$ contain the Borel sets and also $\mu$ is inner regular on every open set and for every $E \in \mathcal{S}$ with $\mu(E) < \infty$.
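Proof: First we establish 1 and 2 and use them to establish the last assertion. Consider 2. Suppose it is not true. Then there exists an open set $V$ having $\mu(V) < \infty$ but for all compact $K \subseteq V$, $\mu(V \setminus K) \ge \varepsilon$ for some $\varepsilon > 0$. By inner regularity on open sets, there exists $K_1 \subseteq V$, $K_1$ compact, such that $\mu(K_1) \ge \varepsilon/2$. Now by assumption, $\mu(V \setminus K_1) \ge \varepsilon$ and so by inner regularity on open sets again, there exists compact $K_2 \subseteq V \setminus K_1$ such that $\mu(K_2) \ge \varepsilon/2$. Continuing this way, there is a sequence of disjoint compact sets contained in $V$, $\{K_i\}$, such that $\mu(K_i) \ge \varepsilon/2$.
[Figure: the open set $V$ containing disjoint compact sets $K_1, K_2, K_3, K_4$.]
Now this is an obvious contradiction because by 3,
$$\mu(V) \ge \mu(\cup_{i=1}^n K_i) = \sum_{i=1}^n \mu(K_i) \ge n\frac{\varepsilon}{2}$$
for each $n$, contradicting $\mu(V) < \infty$.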
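Next consider 1. By outer regularity, there exists an open set $W \supseteq K$ such that $\mu(W) < \mu(K) + 1$. By 2, there exists compact $K_1 \subseteq W \setminus K$ such that $\mu((W \setminus K) \setminus K_1) < \varepsilon$. Then consider $V \equiv W \setminus K_1$. This is an open set containing $K$ and from what was just shown,
$$\mu((W \setminus K_1) \setminus K) = \mu((W \setminus K) \setminus K_1) < \varepsilon.$$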
Now consider the last assertion.
Define
$$\mathcal{S}_1 = \{E \in \mathcal{P}(\Omega) : E \cap K \in \mathcal{S} \text{ for all compact } K\}.$$
First it will be shown the compact sets are in $\mathcal{S}$. From this it will follow the closed sets are in $\mathcal{S}_1$. Then you show $\mathcal{S}_1 = \mathcal{S}$. Thus $\mathcal{S}_1 = \mathcal{S}$ is a $\sigma$ algebra and so it contains the Borel sets. Finally you show the inner regularity assertion.
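Claim 1: Compact sets are in $\mathcal{S}$.
Proof of claim: Let $V$ be an open set with $\mu(V) < \infty$. I will show that for $C$ compact,
$$\mu(V) \ge \mu(V \setminus C) + \mu(V \cap C).$$
Here is a diagram to help keep things straight.
[Figure: the open set $V$ with the compact sets $H$, $C$, and $K$.]
By 2, there exists a compact set $K \subseteq V \setminus C$ such that
$$\mu((V \setminus C) \setminus K) < \varepsilon$$
and a compact set $H \subseteq V$ such that
$$\mu(V \setminus H) < \varepsilon.$$
Then
$$\begin{aligned}
\mu(V) &\le \mu(H) + \varepsilon \le \mu(H \cap C) + \mu(H \setminus C) + \varepsilon \\
&\le \mu(V \cap C) + \mu(V \setminus C) + \varepsilon \le \mu(H \cap C) + \mu(K) + 3\varepsilon.
\end{aligned}$$
By 3,
$$\mu(H \cap C) + \mu(K) + 3\varepsilon = \mu((H \cap C) \cup K) + 3\varepsilon \le \mu(V) + 3\varepsilon.$$
Since $\varepsilon$ is arbitrary, this shows that
$$\mu(V) = \mu(V \setminus C) + \mu(V \cap C). \tag{8.7}$$
Of course 8.7 is exactly what needs to be shown for arbitrary $S$ in place of $V$. It suffices to consider only $S$ having $\mu(S) < \infty$. If $S \subseteq \Omega$, with $\mu(S) < \infty$, let $V \supseteq S$ be open with $\mu(S) + \varepsilon > \mu(V)$. Then from what was just shown, if $C$ is compact,
$$\varepsilon + \mu(S) > \mu(V) = \mu(V \setminus C) + \mu(V \cap C) \ge \mu(S \setminus C) + \mu(S \cap C).$$
Since $\varepsilon$ is arbitrary, this shows the compact sets are in $\mathcal{S}$. This proves the claim.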
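As discussed above, this verifies the closed sets are in $\mathcal{S}_1$. If $\mathcal{S}_1$ is a $\sigma$ algebra, this will show that $\mathcal{S}_1$ contains the Borel sets. Thus I first show $\mathcal{S}_1$ is a $\sigma$ algebra.
To see that $\mathcal{S}_1$ is closed with respect to taking complements, let $E \in \mathcal{S}_1$ and $K$ a compact set.
$$K = (E^C \cap K) \cup (E \cap K).$$
Then from the fact, just established, that the compact sets are in $\mathcal{S}$,
$$E^C \cap K = K \setminus (E \cap K) \in \mathcal{S}.$$
$\mathcal{S}_1$ is closed under countable unions because if $K$ is a compact set and $E_n \in \mathcal{S}_1$,
$$K \cap \cup_{n=1}^\infty E_n = \cup_{n=1}^\infty K \cap E_n \in \mathcal{S}$$
because it is a countable union of sets of $\mathcal{S}$. Thus $\mathcal{S}_1$ is a $\sigma$ algebra.
Therefore, if $E \in \mathcal{S}$ and $K$ is a compact set, just shown to be in $\mathcal{S}$, it follows $K \cap E \in \mathcal{S}$ because $\mathcal{S}$ is a $\sigma$ algebra which contains the compact sets and so $\mathcal{S}_1 \supseteq \mathcal{S}$.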
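It remains to verify $\mathcal{S}_1 \subseteq \mathcal{S}$. Recall that
$$\mathcal{S}_1 \equiv \{E : E \cap K \in \mathcal{S} \text{ for all } K \text{ compact}\}.$$
Let $E \in \mathcal{S}_1$ and let $V$ be an open set with $\mu(V) < \infty$ and choose compact $K \subseteq V$ such that $\mu(V \setminus K) < \varepsilon$. Then since $E \in \mathcal{S}_1$, it follows $E \cap K,\ E^C \cap K \in \mathcal{S}$ and so
$$\begin{aligned}
\mu(V) &\le \mu(V \setminus E) + \mu(V \cap E) \le \mu(K \setminus E) + \mu(K \cap E) + 2\varepsilon \\
&= \mu(K) + 2\varepsilon \le \mu(V) + 3\varepsilon,
\end{aligned}$$
where the equality holds because the two sets $K \setminus E$ and $K \cap E$ are disjoint and in $\mathcal{S}$. Since $\varepsilon$ is arbitrary, this shows
$$\mu(V) = \mu(V \setminus E) + \mu(V \cap E)$$
which would show $E \in \mathcal{S}$ if $V$ were an arbitrary set.
Now let $S \subseteq \Omega$ be such an arbitrary set. If $\mu(S) = \infty$, then
$$\mu(S) = \mu(S \cap E) + \mu(S \setminus E).$$
If $\mu(S) < \infty$, let
$$V \supseteq S, \qquad \mu(S) + \varepsilon \ge \mu(V).$$
Then
$$\mu(S) + \varepsilon \ge \mu(V) = \mu(V \setminus E) + \mu(V \cap E) \ge \mu(S \setminus E) + \mu(S \cap E).$$
Since $\varepsilon$ is arbitrary, this shows that $E \in \mathcal{S}$ and so $\mathcal{S}_1 = \mathcal{S}$. Thus $\mathcal{S} \supseteq$ Borel sets as claimed.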
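From 2, $\mu$ is inner regular on all open sets. It remains to show that
$$\mu(F) = \sup\{\mu(K) : K \subseteq F,\ K \text{ compact}\}$$
for all $F \in \mathcal{S}$ with $\mu(F) < \infty$. It might help to refer to the following crude picture to keep things straight. It also might not help. I am not sure.
[Figure: the sets $U$, $F$, $K$, $V$, and $V^C$, with two regions marked as having measure $< \varepsilon$.]
Let $\mu(F) < \infty$ and let $U$ be an open set, $U \supseteq F$, $\mu(U) < \infty$. Let $V$ be open, $V \supseteq U \setminus F$, and
$$\mu(V \setminus (U \setminus F)) < \varepsilon.$$
(This can be obtained as follows, because $\mu$ is a measure on $\mathcal{S}$.
$$\mu(V) = \mu(U \setminus F) + \mu(V \setminus (U \setminus F)).$$
Thus from the outer regularity of $\mu$, 1 above, there exists $V$ such that it contains $U \setminus F$ and
$$\mu(U \setminus F) + \varepsilon > \mu(V)$$
and so
$$\mu(V \setminus (U \setminus F)) = \mu(V) - \mu(U \setminus F) < \varepsilon.)$$
Also,
$$\begin{aligned}
V \setminus (U \setminus F) &= V \cap (U \cap F^C)^C = V \cap [U^C \cup F] \\
&= (V \cap F) \cup (V \cap U^C) \supseteq V \cap F
\end{aligned}$$
and so
$$\mu(V \cap F) \le \mu(V \setminus (U \setminus F)) < \varepsilon.$$
Since $V \supseteq U \cap F^C$, $V^C \subseteq U^C \cup F$ so $U \cap V^C \subseteq U \cap F = F$. Hence $U \cap V^C$ is a subset of $F$. Now let $K \subseteq U$ be compact with $\mu(U \setminus K) < \varepsilon$. Thus $K \cap V^C$ is a compact subset of $F$ and
$$\begin{aligned}
\mu(F) &= \mu(V \cap F) + \mu(F \setminus V) \\
&< \varepsilon + \mu(F \setminus V) \le \varepsilon + \mu(U \cap V^C) \le 2\varepsilon + \mu(K \cap V^C).
\end{aligned}$$
Since $\varepsilon$ is arbitrary, this proves the second part of the lemma. This proves the lemma.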
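Where do outer measures come from? One way to obtain an outer measure is to start with a measure $\mu$, defined on a $\sigma$ algebra of sets $\mathcal{S}$, and use the following definition of the outer measure induced by the measure.
Definition 8.6 Let $\mu$ be a measure defined on a $\sigma$ algebra of sets, $\mathcal{S} \subseteq \mathcal{P}(\Omega)$. Then the outer measure induced by $\mu$, denoted by $\overline{\mu}$, is defined on $\mathcal{P}(\Omega)$ as
$$\overline{\mu}(E) = \inf\{\mu(F) : F \in \mathcal{S} \text{ and } F \supseteq E\}.$$
A measure space $(\mathcal{S}, \Omega, \mu)$ is $\sigma$ finite if there exist measurable sets $\Omega_i$ with $\mu(\Omega_i) < \infty$ and $\Omega = \cup_{i=1}^\infty \Omega_i$.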
You should prove the following lemma.
Lemma 8.7 If $(\mathcal{S}, \Omega, \mu)$ is $\sigma$ finite then there exist disjoint measurable sets $\{B_n\}$ such that $\mu(B_n) < \infty$ and $\cup_{n=1}^\infty B_n = \Omega$.
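The following lemma deals with the outer measure generated by a measure which is $\sigma$ finite. It says that if the given measure is $\sigma$ finite and complete then no new measurable sets are gained by going to the induced outer measure and then considering the measurable sets in the sense of Caratheodory.
Lemma 8.8 Let $(\Omega, \mathcal{S}, \mu)$ be any measure space and let $\overline{\mu} : \mathcal{P}(\Omega) \to [0, \infty]$ be the outer measure induced by $\mu$. Then $\overline{\mu}$ is an outer measure as claimed and if $\overline{\mathcal{S}}$ is the set of $\overline{\mu}$ measurable sets in the sense of Caratheodory, then $\overline{\mathcal{S}} \supseteq \mathcal{S}$ and $\overline{\mu} = \mu$ on $\mathcal{S}$. Furthermore, if $\mu$ is $\sigma$ finite and $(\Omega, \mathcal{S}, \mu)$ is complete, then $\overline{\mathcal{S}} = \mathcal{S}$.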
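Proof: It is easy to see that $\overline{\mu}$ is an outer measure. Let $E \in \mathcal{S}$. The plan is to show $E \in \overline{\mathcal{S}}$ and $\overline{\mu}(E) = \mu(E)$. To show this, let $S \subseteq \Omega$ and then show
$$\overline{\mu}(S) \ge \overline{\mu}(S \cap E) + \overline{\mu}(S \setminus E). \tag{8.8}$$
This will verify that $E \in \overline{\mathcal{S}}$. If $\overline{\mu}(S) = \infty$, there is nothing to prove, so assume $\overline{\mu}(S) < \infty$. Thus there exists $T \in \mathcal{S}$, $T \supseteq S$, and
$$\begin{aligned}
\overline{\mu}(S) &> \mu(T) - \varepsilon = \mu(T \cap E) + \mu(T \setminus E) - \varepsilon \\
&\ge \overline{\mu}(T \cap E) + \overline{\mu}(T \setminus E) - \varepsilon \\
&\ge \overline{\mu}(S \cap E) + \overline{\mu}(S \setminus E) - \varepsilon.
\end{aligned}$$
Since $\varepsilon$ is arbitrary, this proves 8.8 and verifies $\mathcal{S} \subseteq \overline{\mathcal{S}}$. Now if $E \in \mathcal{S}$ and $V \supseteq E$ with $V \in \mathcal{S}$, $\mu(E) \le \mu(V)$. Hence, taking inf, $\mu(E) \le \overline{\mu}(E)$. But also $\mu(E) \ge \overline{\mu}(E)$ since $E \in \mathcal{S}$ and $E \supseteq E$. Hence
$$\overline{\mu}(E) \le \mu(E) \le \overline{\mu}(E).$$
Next consider the claim about not getting any new sets from the outer measure in the case the measure space is $\sigma$ finite and complete.
Suppose first $F \in \overline{\mathcal{S}}$ and $\overline{\mu}(F) < \infty$. Then there exists $E \in \mathcal{S}$ such that $E \supseteq F$ and $\mu(E) = \overline{\mu}(F)$. Since $\overline{\mu}(F) < \infty$,
$$\overline{\mu}(E \setminus F) = \overline{\mu}(E) - \overline{\mu}(F) = 0.$$
Then there exists $D \supseteq E \setminus F$ such that $D \in \mathcal{S}$ and $\mu(D) = \overline{\mu}(E \setminus F) = 0$. Then by completeness of $\mathcal{S}$, it follows $E \setminus F \in \mathcal{S}$ and so
$$E = (E \setminus F) \cup F.$$
Hence $F = E \setminus (E \setminus F) \in \mathcal{S}$. In the general case where $\overline{\mu}(F)$ is not known to be finite, let $\mu(B_n) < \infty$, with $B_n \cap B_m = \emptyset$ for all $n \ne m$ and $\cup_n B_n = \Omega$. Apply what was just shown to $F \cap B_n$, obtaining each of these is in $\mathcal{S}$. Then $F = \cup_n F \cap B_n \in \mathcal{S}$.
This proves the Lemma.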
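The statement of Lemma 8.8 can be checked by hand on a very small example, and the following sketch does that check by brute force. Everything in it is a hypothetical illustration: the four point set `OMEGA`, the $\sigma$ algebra `SIGMA` generated by the partition $\{0,1\}, \{2,3\}$, and the weights 1 and 2. This measure space is finite and complete, since its only set of measure zero is the empty set.

```python
from itertools import chain, combinations

def powerset(omega):
    """All subsets of the finite set omega, as frozensets."""
    items = sorted(omega)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

OMEGA = frozenset({0, 1, 2, 3})

# Hypothetical measure space: the sigma algebra generated by {0,1} and {2,3},
# stored as a map from each measurable set to its measure.
SIGMA = {
    frozenset(): 0,
    frozenset({0, 1}): 1,
    frozenset({2, 3}): 2,
    OMEGA: 3,
}

def induced_outer(E):
    """Outer measure of Definition 8.6: inf of mu(F) over measurable F containing E."""
    return min(m for F, m in SIGMA.items() if F >= E)

def caratheodory_measurable(omega, mu):
    """Sets E satisfying 8.1 against every test set S."""
    subsets = powerset(omega)
    return {E for E in subsets
            if all(mu(S) == mu(S - E) + mu(S & E) for S in subsets)}

measurable = caratheodory_measurable(OMEGA, induced_outer)
```

In this example `measurable` comes out equal to the original $\sigma$ algebra, and `induced_outer` agrees with the measure on it, as the lemma predicts. A set such as $\{0\}$ gets outer measure 1 but fails 8.1 with $S = \{0, 1\}$, so it is not gained as a new measurable set.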
Usually $\Omega$ is not just a set. It is also a topological space. It is very important to consider how the measure is related to this topology. The following definition is of what it means for a measure to be a regular measure.
Definition 8.9 Let $\mu$ be a measure on a $\sigma$ algebra $\mathcal{S}$ of subsets of $\Omega$, where $(\Omega, \tau)$ is a topological space. $\mu$ is a Borel measure if $\mathcal{S}$ contains all Borel sets. $\mu$ is called outer regular if $\mu$ is Borel and for all $E \in \mathcal{S}$,
$$\mu(E) = \inf\{\mu(V) : V \text{ is open and } V \supseteq E\}.$$
$\mu$ is called inner regular if $\mu$ is Borel and
$$\mu(E) = \sup\{\mu(K) : K \subseteq E, \text{ and } K \text{ is compact}\}.$$
If the measure is both outer and inner regular, it is called regular.
It will be assumed in what follows that $(\Omega, \tau)$ is a locally compact Hausdorff space. This means it is Hausdorff: If $p, q \in \Omega$ such that $p \ne q$, there exist open sets $U_p$ and $U_q$ containing $p$ and $q$ respectively such that $U_p \cap U_q = \emptyset$; and locally compact: There exists a basis of open sets for the topology, $\mathcal{B}$, such that for each $U \in \mathcal{B}$, $\overline{U}$ is compact. Recall $\mathcal{B}$ is a basis for the topology if $\cup \mathcal{B} = \Omega$ and if every open set in $\tau$ is the union of sets of $\mathcal{B}$. Also recall a Hausdorff space is normal if whenever $H$ and $C$ are two disjoint closed sets, there exist disjoint open sets $U_H$ and $U_C$ containing $H$ and $C$ respectively. A regular space is one which has the property that if $p$ is a point not in $H$, a closed set, then there exist disjoint open sets $U_p$ and $U_H$ containing $p$ and $H$ respectively.
8.2 Urysohn's Lemma
Urysohn's lemma, which characterizes normal spaces, is a very important result which is useful in general topology and in the construction of measures. Because it is somewhat technical, a proof is given for the part which is needed.
Theorem 8.10 (Urysohn) Let $(X, \tau)$ be normal and let $H \subseteq U$ where $H$ is closed and $U$ is open. Then there exists $g : X \to [0, 1]$ such that $g$ is continuous, $g(x) = 1$ on $H$ and $g(x) = 0$ if $x \notin U$.
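Proof: Let $D \equiv \{r_n\}_{n=1}^\infty$ be the rational numbers in $(0, 1)$. Choose $V_{r_1}$ an open set such that
$$H \subseteq V_{r_1} \subseteq \overline{V}_{r_1} \subseteq U.$$
This can be done by applying the assumption that $X$ is normal to the disjoint closed sets $H$ and $U^C$, to obtain open sets $V$ and $W$ with
$$H \subseteq V, \quad U^C \subseteq W, \quad \text{and } V \cap W = \emptyset.$$
Then
$$H \subseteq V \subseteq \overline{V}, \qquad \overline{V} \cap U^C = \emptyset$$
and so let $V_{r_1} = V$.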
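Suppose $V_{r_1}, \cdots, V_{r_k}$ have been chosen and list the rational numbers $r_1, \cdots, r_k$ in order,
$$r_{l_1} < r_{l_2} < \cdots < r_{l_k} \text{ for } \{l_1, \cdots, l_k\} = \{1, \cdots, k\}.$$
If $r_{k+1} > r_{l_k}$ then letting $p = r_{l_k}$, let $V_{r_{k+1}}$ satisfy
$$\overline{V}_p \subseteq V_{r_{k+1}} \subseteq \overline{V}_{r_{k+1}} \subseteq U.$$
If $r_{k+1} \in (r_{l_i}, r_{l_{i+1}})$, let $p = r_{l_i}$ and let $q = r_{l_{i+1}}$. Then let $V_{r_{k+1}}$ satisfy
$$\overline{V}_p \subseteq V_{r_{k+1}} \subseteq \overline{V}_{r_{k+1}} \subseteq V_q.$$
If $r_{k+1} < r_{l_1}$, let $p = r_{l_1}$ and let $V_{r_{k+1}}$ satisfy
$$H \subseteq V_{r_{k+1}} \subseteq \overline{V}_{r_{k+1}} \subseteq V_p.$$
Thus there exist open sets $V_r$ for each $r \in \mathbb{Q} \cap (0, 1)$ with the property that if $r < s$,
$$H \subseteq V_r \subseteq \overline{V}_r \subseteq V_s \subseteq \overline{V}_s \subseteq U.$$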
Now let
$$f(x) = \inf\{t \in D : x \in V_t\}, \qquad f(x) \equiv 1 \text{ if } x \notin \bigcup_{t\in D} V_t.$$
(Recall $D = \mathbb{Q} \cap (0, 1)$.) I claim $f$ is continuous.
$$f^{-1}([0, a)) = \cup\{V_t : t < a,\ t \in D\},$$
an open set.
Next consider $x \in f^{-1}([0, a])$ so $f(x) \le a$. If $t > a$, then $x \in V_t$ because if not, then
$$\inf\{t \in D : x \in V_t\} > a.$$
Thus
$$f^{-1}([0, a]) = \cap\{V_t : t > a\} = \cap\{\overline{V}_t : t > a\}$$
which is a closed set. If $a = 1$, $f^{-1}([0, 1]) = f^{-1}([0, a]) = X$. Therefore,
$$f^{-1}((a, 1]) = X \setminus f^{-1}([0, a]) = \text{open set}.$$
It follows $f$ is continuous. Clearly $f(x) = 0$ on $H$. If $x \in U^C$, then $x \notin V_t$ for any $t \in D$ so $f(x) = 1$ on $U^C$. Let $g(x) = 1 - f(x)$. This proves the theorem.
In any metric space there is a much easier proof of the conclusion of Urysohn's lemma which applies.
Lemma 8.11 Let $S$ be a nonempty subset of a metric space $(X, d)$. Define
$$f(x) \equiv \operatorname{dist}(x, S) \equiv \inf\{d(x, y) : y \in S\}.$$
Then $f$ is continuous.
Proof: Consider $|f(x) - f(x_1)|$ and suppose without loss of generality that $f(x_1) \ge f(x)$. Then choose $y \in S$ such that $f(x) + \varepsilon > d(x, y)$. Then
$$\begin{aligned}
|f(x_1) - f(x)| &= f(x_1) - f(x) \le f(x_1) - d(x, y) + \varepsilon \\
&\le d(x_1, y) - d(x, y) + \varepsilon \\
&\le d(x, x_1) + d(x, y) - d(x, y) + \varepsilon \\
&= d(x_1, x) + \varepsilon.
\end{aligned}$$
Since $\varepsilon$ is arbitrary, it follows that $|f(x_1) - f(x)| \le d(x_1, x)$ and this proves the lemma.
Theorem 8.12 (Urysohn's lemma for metric space) Let $H$ be a closed subset of an open set $U$ in a metric space $(X, d)$. Then there exists a continuous function $g : X \to [0, 1]$ such that $g(x) = 1$ for all $x \in H$ and $g(x) = 0$ for all $x \notin U$.
Proof: If $x \notin C$, a closed set, then $\operatorname{dist}(x, C) > 0$ because if not, there would exist a sequence of points of $C$ converging to $x$ and it would follow that $x \in C$. Therefore, $\operatorname{dist}(x, H) + \operatorname{dist}(x, U^C) > 0$ for all $x \in X$. Now define a continuous function $g$ as
$$g(x) \equiv \frac{\operatorname{dist}(x, U^C)}{\operatorname{dist}(x, H) + \operatorname{dist}(x, U^C)}.$$
It is easy to see this verifies the conclusions of the theorem and this proves the theorem.
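The formula for $g$ in the proof of Theorem 8.12 is explicit enough to compute. The sketch below evaluates it on the real line with the usual metric, for the hypothetical choice $H = [0, 1]$ and $U = (-1, 2)$; the sets and the function names are illustrative assumptions, not taken from the notes.

```python
def dist_to_H(x):
    """Distance from x to the closed set H = [0, 1] (illustrative choice)."""
    return max(0.0 - x, x - 1.0, 0.0)

def dist_to_U_complement(x):
    """Distance from x to U^C, where U = (-1, 2), so U^C = (-inf, -1] union [2, inf)."""
    return max(min(x + 1.0, 2.0 - x), 0.0)

def g(x):
    """The Urysohn function dist(x, U^C) / (dist(x, H) + dist(x, U^C))."""
    d_h = dist_to_H(x)
    d_uc = dist_to_U_complement(x)
    # The denominator is positive for every x because H and U^C are disjoint closed sets.
    return d_uc / (d_h + d_uc)

# Sample values: 1 on H, 0 off U, strictly between in the gap.
samples = {x: g(x) for x in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0)}
```

On this example $g$ equals 1 on $[0, 1]$, equals 0 outside $(-1, 2)$, and takes the value 0.5 at $-0.5$ and $1.5$, the midpoints of the two gaps. The distance functions are also 1-Lipschitz, matching the estimate $|f(x_1) - f(x)| \le d(x_1, x)$ of Lemma 8.11.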
Theorem 8.13 Every compact Hausdorff space is normal.
Proof: First it is shown that $X$ is regular. Let $H$ be a closed set and let $p \notin H$. Then for each $h \in H$, there exists an open set $U_h$ containing $p$ and an open set $V_h$ containing $h$ such that $U_h \cap V_h = \emptyset$. Since $H$ must be compact, it follows there are finitely many of the sets $V_h$, $V_{h_1}, \cdots, V_{h_n}$, such that $H \subseteq \cup_{i=1}^n V_{h_i}$. Then letting $U = \cap_{i=1}^n U_{h_i}$ and $V = \cup_{i=1}^n V_{h_i}$, it follows that $p \in U$, $H \subseteq V$ and $U \cap V = \emptyset$. Thus $X$ is regular as claimed.
Next let $K$ and $H$ be disjoint nonempty closed sets. Using regularity of $X$, for every $k \in K$, there exists an open set $U_k$ containing $k$ and an open set $V_k$ containing $H$ such that these two open sets have empty intersection. Thus $H \cap \overline{U}_k = \emptyset$. Finitely many of the $U_k$, $U_{k_1}, \cdots, U_{k_p}$, cover $K$ and so $\cup_{i=1}^p \overline{U}_{k_i}$ is a closed set which has empty intersection with $H$. Therefore, $K \subseteq \cup_{i=1}^p U_{k_i}$ and $H \subseteq \left(\cup_{i=1}^p \overline{U}_{k_i}\right)^C$. This proves the theorem.
A useful construction when dealing with locally compact Hausdorff spaces is the notion of the one point compactification of the space discussed earlier. However, it is reviewed here for the sake of convenience or in case you have not read the earlier treatment.
Definition 8.14 Suppose $(X, \tau)$ is a locally compact Hausdorff space. Then let $\widetilde{X} \equiv X \cup \{\infty\}$ where $\infty$ is just the name of some point which is not in $X$ which is called the point at infinity. A basis for the topology $\widetilde{\tau}$ for $\widetilde{X}$ is
$$\tau \cup \{K^C \text{ where } K \text{ is a compact subset of } X\}.$$
The complement is taken with respect to $\widetilde{X}$ and so the open sets $K^C$ are basic open sets which contain $\infty$.
The reason this is called a compactification is contained in the next lemma.
Lemma 8.15 If $(X, \tau)$ is a locally compact Hausdorff space, then $(\widetilde{X}, \widetilde{\tau})$ is a compact Hausdorff space. Also if $U$ is an open set of $\widetilde{\tau}$, then $U \setminus \{\infty\}$ is an open set of $\tau$.
Proof: Since $(X, \tau)$ is a locally compact Hausdorff space, it follows $(\widetilde{X}, \widetilde{\tau})$ is a Hausdorff topological space. The only case which needs checking is the one of $p \in X$ and $\infty$. Since $(X, \tau)$ is locally compact, there exists an open set $U$ of $\tau$ having compact closure which contains $p$. Then $p \in U$ and $\infty \in \overline{U}^C$ and these are disjoint open sets containing the points $p$ and $\infty$ respectively. Now let $\mathcal{C}$ be an open cover of $\widetilde{X}$ with sets from $\widetilde{\tau}$. Then $\infty$ must be in some set $U_\infty$ from $\mathcal{C}$, which must contain a set of the form $K^C$ where $K$ is a compact subset of $X$. Then there exist sets from $\mathcal{C}$, $U_1, \cdots, U_r$, which cover $K$. Therefore, a finite subcover of $\widetilde{X}$ is $U_1, \cdots, U_r, U_\infty$.
To see the last claim, suppose $U$ contains $\infty$ since otherwise there is nothing to show. Notice that if $C$ is a compact set, then $X \setminus C$ is an open set. Therefore, if $x \in U \setminus \{\infty\}$, and if $\widetilde{X} \setminus C$ is a basic open set contained in $U$ containing $\infty$, then if $x$ is in this basic open set of $\widetilde{X}$, it is also in the open set $X \setminus C \subseteq U \setminus \{\infty\}$. If $x$ is not in any basic open set of the form $\widetilde{X} \setminus C$ then $x$ is contained in an open set of $\tau$ which is contained in $U \setminus \{\infty\}$. Thus $U \setminus \{\infty\}$ is indeed open in $\tau$.
Theorem 8.16 Let $X$ be a locally compact Hausdorff space, and let $K$ be a compact subset of the open set $V$. Then there exists a continuous function $f : X \to [0, 1]$ such that $f$ equals 1 on $K$ and $\overline{\{x : f(x) \ne 0\}} \equiv \operatorname{spt}(f)$ is a compact subset of $V$.
Proof: Let $\widetilde{X}$ be the space just described. Then $K$ and $V$ are respectively closed and open in $\widetilde{\tau}$. By Theorem 8.13 there exist open sets in $\widetilde{\tau}$, $U$ and $W$, such that $K \subseteq U$, $\infty \in V^C \subseteq W$, and $U \cap W = U \cap (W \setminus \{\infty\}) = \emptyset$.
[Figure: $K \subseteq U$ and $V^C \subseteq W$.]
Thus $W \setminus \{\infty\}$ is an open set in the original topological space which contains $V^C$. $U$ is an open set in the original topological space which contains $K$, and $W \setminus \{\infty\}$ and $U$ are disjoint.
Now for each $x \in K$, let $U_x$ be a basic open set whose closure is compact and such that
$$x \in U_x \subseteq U.$$
Thus $\overline{U}_x$ must have empty intersection with $V^C$ because the open set $W \setminus \{\infty\}$ contains no points of $U_x$. Since $K$ is compact, there are finitely many of these sets, $U_{x_1}, U_{x_2}, \cdots, U_{x_n}$, which cover $K$. Now let $H \equiv \cup_{i=1}^n U_{x_i}$.
Claim: $\overline{H} = \cup_{i=1}^n \overline{U}_{x_i}$.
Proof of claim: Suppose $p \in \overline{H}$. If $p \notin \cup_{i=1}^n \overline{U}_{x_i}$ then it follows $p \notin \overline{U}_{x_i}$ for each $i$. Therefore, there exists an open set $R_i$ containing $p$ such that $R_i$ contains no other points of $U_{x_i}$. Therefore, $R \equiv \cap_{i=1}^n R_i$ is an open set containing $p$ which contains no other points of $\cup_{i=1}^n U_{x_i} = H$, a contradiction. Therefore, $\overline{H} \subseteq \cup_{i=1}^n \overline{U}_{x_i}$. On the other hand, if $p \in \overline{U}_{x_i}$ then $p$ is obviously in $\overline{H}$ so this proves the claim.
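From the claim, $K \subseteq H \subseteq \overline{H} \subseteq V$ and $\overline{H}$ is compact because it is the finite union of compact sets. By Urysohn's lemma, there exists $f_1$ continuous on $\overline{H}$ which has values in $[0, 1]$ such that $f_1$ equals 1 on $K$ and equals 0 off $H$. Let $f$ denote the function which extends $f_1$ to be 0 off $\overline{H}$. Then for $\alpha > 0$, the continuity of $f_1$ implies there exists $U$ open in the topological space such that
$$f^{-1}((-\infty, \alpha)) = f_1^{-1}((-\infty, \alpha)) \cup \overline{H}^C = (U \cap \overline{H}) \cup \overline{H}^C = U \cup \overline{H}^C,$$
an open set. If $\alpha \le 0$,
$$f^{-1}((-\infty, \alpha)) = \emptyset,$$
an open set. If $\alpha \ge 0$, there exists an open set $U$ such that
$$f^{-1}((\alpha, \infty)) = f_1^{-1}((\alpha, \infty)) = U \cap \overline{H} = U \cap H$$
because $U \cap \overline{H}$ must be a subset of $H$ since by definition $f = 0$ off $H$. If $\alpha < 0$, then
$$f^{-1}((\alpha, \infty)) = X,$$
an open set. Thus $f$ is continuous and $\operatorname{spt}(f) \subseteq \overline{H}$, a compact subset of $V$. This proves the theorem.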
In fact, the conclusion of the above theorem could be used to prove that the
topological space is locally compact. However, this is not needed here.
In case you would like a more elementary proof which does not use the one point
compactification idea, here is such a proof.
Theorem 8.17 Let $X$ be a locally compact Hausdorff space, and let $K$ be a compact subset of the open set $V$. Then there exists a continuous function $f : X \to [0, 1]$ such that $f$ equals 1 on $K$ and $\overline{\{x : f(x) \ne 0\}} \equiv \operatorname{spt}(f)$ is a compact subset of $V$.
Proof: To begin with, here is a claim. This claim is obvious in the case of a metric space but requires some proof in this more general case.
Claim: If $k \in K$ then there exists an open set $U_k$ containing $k$ such that $\overline{U}_k$ is contained in $V$.
Proof of claim: Since $X$ is locally compact, there exists a basis of open sets whose closures are compact, $\mathcal{U}$. Denote by $\mathcal{C}$ the set of all $U \in \mathcal{U}$ which contain $k$ and let $\mathcal{C}'$ denote the set of all closures of these sets of $\mathcal{C}$ intersected with the closed set $V^C$. Thus $\mathcal{C}'$ is a collection of compact sets. I will argue that there are finitely many of the sets of $\mathcal{C}'$ which have empty intersection. If not, then $\mathcal{C}'$ has the finite intersection property and so there exists a point $p$ in all of them. Since $X$ is a Hausdorff space, there exist disjoint basic open sets from $\mathcal{U}$, $A, B$, such that $k \in A$ and $p \in B$. Therefore, $p \notin \overline{A}$ contrary to the above requirement that $p$ be in all such sets. It follows there are sets $A_1, \cdots, A_m$ in $\mathcal{C}$ such that
$$V^C \cap \overline{A}_1 \cap \cdots \cap \overline{A}_m = \emptyset.$$
Let $U_k = A_1 \cap \cdots \cap A_m$. Then $\overline{U}_k \subseteq \overline{A}_1 \cap \cdots \cap \overline{A}_m$ and so it has empty intersection with $V^C$. Thus it is contained in $V$. Also $\overline{U}_k$ is a closed subset of the compact set $\overline{A}_1$ so it is compact. This proves the claim.
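Now to complete the proof of the theorem, since $K$ is compact, there are finitely many $U_k$ of the sort just described which cover $K$, $U_{k_1}, \cdots, U_{k_r}$. Let
$$H = \bigcup_{i=1}^{r} U_{k_i}$$
so it follows
$$\overline{H} = \bigcup_{i=1}^{r} \overline{U_{k_i}}$$
and so $K \subseteq H \subseteq \overline{H} \subseteq V$ and $\overline{H}$ is a compact set. By Urysohn's lemma, there exists $f_1$ continuous on $\overline{H}$ which has values in $[0,1]$ such that $f_1$ equals 1 on $K$ and equals 0 off $H$. Let $f$ denote the function which extends $f_1$ to be 0 off $\overline{H}$. Then for $\alpha > 0$, the continuity of $f_1$ implies there exists $U$ open in the topological space such that
$$f^{-1}((-\infty,\alpha)) = f_1^{-1}((-\infty,\alpha)) \cup \overline{H}^C = (U \cap \overline{H}) \cup \overline{H}^C = U \cup \overline{H}^C,$$
an open set. If $\alpha \le 0$,
$$f^{-1}((-\infty,\alpha)) = \emptyset,$$
an open set. If $\alpha \ge 0$, there exists an open set $U$ such that
$$f^{-1}((\alpha,\infty)) = f_1^{-1}((\alpha,\infty)) = U \cap \overline{H} = U \cap H$$
because $U \cap \overline{H}$ must be a subset of $H$ since by definition $f = 0$ off $H$. If $\alpha < 0$, then
$$f^{-1}((\alpha,\infty)) = X,$$
an open set. Thus $f$ is continuous and $\mathrm{spt}(f) \subseteq \overline{H}$, a compact subset of $V$. This proves the theorem.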
Definition 8.18 Define $\mathrm{spt}(f)$ (support of $f$) to be the closure of the set $\{x : f(x) \ne 0\}$. If $V$ is an open set, $C_c(V)$ will be the set of continuous functions $f$, defined on $\Omega$ having $\mathrm{spt}(f) \subseteq V$. Thus in Theorem 8.16 or 8.17, $f \in C_c(V)$.
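Definition 8.19 If $K$ is a compact subset of an open set, $V$, then $K \prec \phi \prec V$ if
$$\phi \in C_c(V),\quad \phi(K) = \{1\},\quad \phi(\Omega) \subseteq [0,1],$$
where $\Omega$ denotes the whole topological space considered. Also for $\phi \in C_c(\Omega)$, $K \prec \phi$ if
$$\phi(\Omega) \subseteq [0,1] \text{ and } \phi(K) = \{1\},$$
and $\phi \prec V$ if
$$\phi(\Omega) \subseteq [0,1] \text{ and } \mathrm{spt}(\phi) \subseteq V.$$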
Theorem 8.20 (Partition of unity) Let $K$ be a compact subset of a locally compact Hausdorff topological space satisfying Theorem 8.16 or 8.17 and suppose
$$K \subseteq V = \bigcup_{i=1}^{n} V_i,\quad V_i \text{ open}.$$
Then there exist $\psi_i \prec V_i$ with
$$\sum_{i=1}^{n} \psi_i(x) = 1$$
for all $x \in K$.
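Proof: Let $K_1 = K \setminus \bigcup_{i=2}^{n} V_i$. Thus $K_1$ is compact and $K_1 \subseteq V_1$. Let $K_1 \subseteq W_1 \subseteq \overline{W_1} \subseteq V_1$ with $\overline{W_1}$ compact. To obtain $W_1$, use Theorem 8.16 or 8.17 to get $f$ such that $K_1 \prec f \prec V_1$ and let $W_1 \equiv \{x : f(x) \ne 0\}$. Thus $W_1, V_2, \cdots, V_n$ covers $K$ and $\overline{W_1} \subseteq V_1$. Let $K_2 = K \setminus \left(\bigcup_{i=3}^{n} V_i \cup W_1\right)$. Then $K_2$ is compact and $K_2 \subseteq V_2$. Let $K_2 \subseteq W_2 \subseteq \overline{W_2} \subseteq V_2$, $\overline{W_2}$ compact. Continue this way finally obtaining $W_1, \cdots, W_n$, $K \subseteq W_1 \cup \cdots \cup W_n$, and $\overline{W_i} \subseteq V_i$; $\overline{W_i}$ compact. Now let $\overline{W_i} \subseteq U_i \subseteq \overline{U_i} \subseteq V_i$, $\overline{U_i}$ compact.
[Figure: the nested sets $W_i \subseteq U_i \subseteq V_i$.]
By Theorem 8.16 or 8.17, let $\overline{U_i} \prec \phi_i \prec V_i$ and $\bigcup_{i=1}^{n} \overline{W_i} \prec \gamma \prec \bigcup_{i=1}^{n} U_i$. Define
$$\psi_i(x) = \begin{cases} \gamma(x)\phi_i(x) \Big/ \sum_{j=1}^{n} \phi_j(x) & \text{if } \sum_{j=1}^{n} \phi_j(x) \ne 0, \\ 0 & \text{if } \sum_{j=1}^{n} \phi_j(x) = 0. \end{cases}$$
If $x$ is such that $\sum_{j=1}^{n} \phi_j(x) = 0$, then $x \notin \bigcup_{i=1}^{n} \overline{U_i}$. Consequently $\gamma(y) = 0$ for all $y$ near $x$ and so $\psi_i(y) = 0$ for all $y$ near $x$. Hence $\psi_i$ is continuous at such $x$. If $\sum_{j=1}^{n} \phi_j(x) \ne 0$, this situation persists near $x$ and so $\psi_i$ is continuous at such points. Therefore $\psi_i$ is continuous. If $x \in K$, then $\gamma(x) = 1$ and so $\sum_{j=1}^{n} \psi_j(x) = 1$. Clearly $0 \le \psi_i(x) \le 1$ and $\mathrm{spt}(\psi_j) \subseteq V_j$.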
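The construction in the proof can be tried out numerically on the real line. The following is only a sketch under stated assumptions: the sets $K = [0,3]$, $V_1 = (-1,2)$, $V_2 = (1,4)$ and all the intermediate sets $W_i$, $U_i$ are hypothetical choices, and the helper `trapezoid` stands in for the functions supplied by Theorem 8.16 or 8.17.

```python
def trapezoid(x, a, b, c, d):
    """Continuous function: 0 outside (a, d), 1 on [b, c], linear in between."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)

# Hypothetical data: K = [0, 3] is covered by V1 = (-1, 2) and V2 = (1, 4).
# Assumed intermediate sets: W1 = (-0.4, 1.6), W2 = (1.4, 3.4),
#                            U1 = (-0.6, 1.8), U2 = (1.2, 3.6).

def phi1(x):
    # equals 1 on closure(U1) = [-0.6, 1.8], support [-0.8, 1.9] inside V1
    return trapezoid(x, -0.8, -0.6, 1.8, 1.9)

def phi2(x):
    # equals 1 on closure(U2) = [1.2, 3.6], support [1.1, 3.8] inside V2
    return trapezoid(x, 1.1, 1.2, 3.6, 3.8)

def gamma(x):
    # equals 1 on closure(W1) u closure(W2) = [-0.4, 3.4], support inside U1 u U2
    return trapezoid(x, -0.5, -0.4, 3.4, 3.5)

def psi(i, x):
    """psi_i = gamma * phi_i / (phi_1 + phi_2), and 0 where the denominator vanishes."""
    phis = (phi1(x), phi2(x))
    total = sum(phis)
    return 0.0 if total == 0.0 else gamma(x) * phis[i] / total

# Largest deviation of psi_1 + psi_2 from 1 on a grid of points of K.
grid_K = [3 * j / 300 for j in range(301)]
max_error = max(abs(psi(0, x) + psi(1, x) - 1.0) for x in grid_K)
```

On $K$ the cutoff $\gamma$ equals 1 and at least one $\phi_i$ equals 1, so the quotient is well defined there and the two functions sum to 1 up to rounding.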
The following corollary won't be needed immediately but is of considerable interest later.
Corollary 8.21 If $H$ is a compact subset of $V_i$, there exists a partition of unity such that $\psi_i(x) = 1$ for all $x \in H$ in addition to the conclusion of Theorem 8.20.
Proof: Keep $V_i$ the same but replace $V_j$ with $\widetilde{V_j} \equiv V_j \setminus H$. Now in the proof above, applied to this modified collection of open sets, if $j \ne i$, $\phi_j(x) = 0$ whenever $x \in H$. Therefore, $\psi_i(x) = 1$ on $H$.
8.3 Positive Linear Functionals
Definition 8.22 Let $(\Omega, \tau)$ be a topological space. $L : C_c(\Omega) \to \mathbb{C}$ is called a positive linear functional if $L$ is linear,
$$L(af_1 + bf_2) = aLf_1 + bLf_2,$$
and if $Lf \ge 0$ whenever $f \ge 0$.
Theorem 8.23 (Riesz representation theorem) Let $(\Omega, \tau)$ be a locally compact Hausdorff space and let $L$ be a positive linear functional on $C_c(\Omega)$. Then there exists a $\sigma$ algebra $\mathcal{S}$ containing the Borel sets and a unique measure $\mu$, defined on $\mathcal{S}$, such that
$$\mu \text{ is complete,} \tag{8.9}$$
$$\mu(K) < \infty \text{ for all } K \text{ compact,} \tag{8.10}$$
$$\mu(F) = \sup\{\mu(K) : K \subseteq F,\ K \text{ compact}\},$$
for all $F$ open and for all $F \in \mathcal{S}$ with $\mu(F) < \infty$,
$$\mu(F) = \inf\{\mu(V) : V \supseteq F,\ V \text{ open}\}$$
for all $F \in \mathcal{S}$, and
$$\int f\,d\mu = Lf \text{ for all } f \in C_c(\Omega). \tag{8.11}$$
The plan is to define an outer measure and then to show that it, together with the $\sigma$ algebra of sets measurable in the sense of Caratheodory, satisfies the conclusions of the theorem. Always, $K$ will be a compact set and $V$ will be an open set.
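To see the shape of the statement in the simplest possible setting, consider a finite set with the discrete topology, where every function is continuous with compact support. The sketch below is an illustration only, not the construction used in the proof: the space `POINTS`, the weights hidden inside `L`, and the test function `f` are all hypothetical.

```python
from fractions import Fraction

POINTS = range(4)  # hypothetical finite space with the discrete topology
_HIDDEN_WEIGHTS = [Fraction(1, 2), Fraction(0), Fraction(3), Fraction(1, 4)]

def L(f):
    """A positive linear functional on all functions POINTS -> numbers."""
    return sum(w * f(x) for w, x in zip(_HIDDEN_WEIGHTS, POINTS))

def mu(subset):
    """mu(V) = sup{Lf : f < V}. Every subset is open here, and by positivity
    of L the supremum is attained at the indicator function of V."""
    return L(lambda x: 1 if x in subset else 0)

def integral(f):
    """Integral of f with respect to mu, written as a finite sum over singletons."""
    return sum(f(x) * mu({x}) for x in POINTS)

def f(x):
    # An arbitrary real-valued test function (it takes a negative value at 0).
    return x * x - 1
```

Here the measure is recovered from $L$ alone by evaluating it on indicator functions, and `integral(f)` agrees with `L(f)` as in (8.11).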
Definition 8.24 $\mu(V) \equiv \sup\{Lf : f \prec V\}$ for $V$ open, $\mu(\emptyset) = 0$. $\mu(E) \equiv \inf\{\mu(V) : V \supseteq E\}$ for arbitrary sets $E$.
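Lemma 8.25 $\mu$ is a well-defined outer measure.
Proof: First it is necessary to verify $\mu$ is well defined because there are two descriptions of it on open sets. Suppose then that $\mu_1(V) \equiv \inf\{\mu(U) : U \supseteq V \text{ and } U \text{ is open}\}$. It is required to verify that $\mu_1(V) = \mu(V)$ where $\mu$ is given as $\sup\{Lf : f \prec V\}$. If $U \supseteq V$, then $\mu(U) \ge \mu(V)$ directly from the definition. Hence from the definition of $\mu_1$, it follows $\mu_1(V) \ge \mu(V)$. On the other hand, $V \supseteq V$ and so $\mu_1(V) \le \mu(V)$. This verifies $\mu$ is well defined.
It remains to show that $\mu$ is an outer measure. It is obvious that $\mu$ is increasing. What about the countable subadditivity property? Let $V = \bigcup_{i=1}^{\infty} V_i$ and let $f \prec V$. Then $\mathrm{spt}(f) \subseteq \bigcup_{i=1}^{n} V_i$ for some $n$. Let $\psi_i \prec V_i$, $\sum_{i=1}^{n} \psi_i = 1$ on $\mathrm{spt}(f)$.
$$Lf = \sum_{i=1}^{n} L(f\psi_i) \le \sum_{i=1}^{n} \mu(V_i) \le \sum_{i=1}^{\infty} \mu(V_i).$$
Hence
$$\mu(V) \le \sum_{i=1}^{\infty} \mu(V_i)$$
since $f \prec V$ is arbitrary. Now let $E = \bigcup_{i=1}^{\infty} E_i$. Is $\mu(E) \le \sum_{i=1}^{\infty} \mu(E_i)$? Without loss of generality, it can be assumed $\mu(E_i) < \infty$ for each $i$ since if not so, there is nothing to prove. Let $V_i \supseteq E_i$ with $\mu(E_i) + \varepsilon 2^{-i} > \mu(V_i)$.
$$\mu(E) \le \mu\Big(\bigcup_{i=1}^{\infty} V_i\Big) \le \sum_{i=1}^{\infty} \mu(V_i) \le \varepsilon + \sum_{i=1}^{\infty} \mu(E_i).$$
Since $\varepsilon$ was arbitrary, $\mu(E) \le \sum_{i=1}^{\infty} \mu(E_i)$ which proves the lemma.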
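Lemma 8.26 Let $K$ be compact, $g \ge 0$, $g \in C_c(\Omega)$, and $g = 1$ on $K$. Then $\mu(K) \le Lg$. Also $\mu(K) < \infty$ whenever $K$ is compact.
Proof: Let $\alpha \in (0,1)$ and $V_\alpha = \{x : g(x) > \alpha\}$ so $V_\alpha \supseteq K$ and let $h \prec V_\alpha$.
[Figure: the set $V_\alpha = [g > \alpha]$ containing $K$.]
Then $h \le 1$ on $V_\alpha$ while $g\alpha^{-1} \ge 1$ on $V_\alpha$ and so $g\alpha^{-1} \ge h$ which implies $L(g\alpha^{-1}) \ge Lh$ and that therefore, since $L$ is linear,
$$Lg \ge \alpha Lh.$$
Since $h \prec V_\alpha$ is arbitrary, and $K \subseteq V_\alpha$,
$$Lg \ge \alpha\mu(V_\alpha) \ge \alpha\mu(K).$$
Letting $\alpha \uparrow 1$ yields $Lg \ge \mu(K)$. This proves the first part of the lemma. The second assertion follows from this and Theorem 8.16. If $K$ is given, let
$$K \prec g \prec \Omega$$
and so from what was just shown, $\mu(K) \le Lg < \infty$. This proves the lemma.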
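Lemma 8.27 If $A$ and $B$ are disjoint compact subsets of $\Omega$, then $\mu(A \cup B) = \mu(A) + \mu(B)$.
Proof: By Theorem 8.16 or 8.17, there exists $h \in C_c(\Omega)$ such that $A \prec h \prec B^C$. Let $U_1 = h^{-1}((\tfrac{1}{2}, 1])$, $V_1 = h^{-1}([0, \tfrac{1}{2}))$. Then $A \subseteq U_1$, $B \subseteq V_1$ and $U_1 \cap V_1 = \emptyset$.
[Figure: the disjoint open sets $U_1 \supseteq A$ and $V_1 \supseteq B$.]
From Lemma 8.26 $\mu(A \cup B) < \infty$ and so there exists an open set, $W$ such that
$$W \supseteq A \cup B,\quad \mu(A \cup B) + \varepsilon > \mu(W).$$
Now let $U = U_1 \cap W$ and $V = V_1 \cap W$. Then
$$U \supseteq A,\ V \supseteq B,\ U \cap V = \emptyset, \text{ and } \mu(A \cup B) + \varepsilon \ge \mu(W) \ge \mu(U \cup V).$$
Let $A \prec f \prec U$, $B \prec g \prec V$. Then by Lemma 8.26,
$$\mu(A \cup B) + \varepsilon \ge \mu(U \cup V) \ge L(f + g) = Lf + Lg \ge \mu(A) + \mu(B).$$
Since $\varepsilon > 0$ is arbitrary, this proves the lemma.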
From Lemma 8.26 the following lemma is obtained.
Lemma 8.28 Let $f \in C_c(\Omega)$, $f(\Omega) \subseteq [0,1]$. Then $\mu(\mathrm{spt}(f)) \ge Lf$. Also, every open set, $V$ satisfies
$$\mu(V) = \sup\{\mu(K) : K \subseteq V\}.$$
Proof: Let $V \supseteq \mathrm{spt}(f)$ and let $\mathrm{spt}(f) \prec g \prec V$. Then $Lf \le Lg \le \mu(V)$ because $f \le g$. Since this holds for all $V \supseteq \mathrm{spt}(f)$, $Lf \le \mu(\mathrm{spt}(f))$ by definition of $\mu$.
[Figure: an open set $V$ containing $\mathrm{spt}(f)$.]
Finally, let $V$ be open and let $l < \mu(V)$. Then from the definition of $\mu$, there exists $f \prec V$ such that $L(f) > l$. Therefore, $l < \mu(\mathrm{spt}(f)) \le \mu(V)$ and so this shows the claim about inner regularity of the measure on an open set.
At this point, the hypotheses of Lemma 8.5 have been verified and so $\mathcal{S}$ contains the Borel sets and $\mu$ is inner regular on every set of $\mathcal{S}$ having finite measure.
It remains to show $\mu$ satisfies (8.11).
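Lemma 8.29 $\int f\,d\mu = Lf$ for all $f \in C_c(\Omega)$.
Proof: Let $f \in C_c(\Omega)$, $f$ real-valued, and suppose $f(\Omega) \subseteq [a,b]$. Choose $t_0 < a$ and let $t_0 < t_1 < \cdots < t_n = b$, $t_i - t_{i-1} < \varepsilon$. Let
$$E_i = f^{-1}((t_{i-1}, t_i]) \cap \mathrm{spt}(f). \tag{8.12}$$
Note that $\bigcup_{i=1}^{n} E_i$ is a closed set and in fact
$$\bigcup_{i=1}^{n} E_i = \mathrm{spt}(f) \tag{8.13}$$
since $\Omega = \bigcup_{i=1}^{n} f^{-1}((t_{i-1}, t_i])$. Let $V_i \supseteq E_i$, $V_i$ open, and let $V_i$ satisfy
$$f(x) < t_i + \varepsilon \text{ for all } x \in V_i, \tag{8.14}$$
$$\mu(V_i \setminus E_i) < \varepsilon/n.$$
By Theorem 8.20 there exists $h_i \in C_c(\Omega)$ such that
$$h_i \prec V_i,\quad \sum_{i=1}^{n} h_i(x) = 1 \text{ on } \mathrm{spt}(f).$$
Now note that for each $i$,
$$f(x)h_i(x) \le h_i(x)(t_i + \varepsilon).$$
(If $x \in V_i$, this follows from Formula 8.14. If $x \notin V_i$ both sides equal 0.) Therefore,
$$\begin{aligned} Lf = L\Big(\sum_{i=1}^{n} f h_i\Big) &\le L\Big(\sum_{i=1}^{n} h_i(t_i + \varepsilon)\Big) = \sum_{i=1}^{n} (t_i + \varepsilon)L(h_i) \\ &= \sum_{i=1}^{n} (|t_0| + t_i + \varepsilon)L(h_i) - |t_0|\,L\Big(\sum_{i=1}^{n} h_i\Big). \end{aligned}$$
Now note that $|t_0| + t_i + \varepsilon \ge 0$ and so from the definition of $\mu$ and Lemma 8.26, this is no larger than
$$\begin{aligned} &\sum_{i=1}^{n} (|t_0| + t_i + \varepsilon)\mu(V_i) - |t_0|\mu(\mathrm{spt}(f)) \\ &\le \sum_{i=1}^{n} (|t_0| + t_i + \varepsilon)(\mu(E_i) + \varepsilon/n) - |t_0|\mu(\mathrm{spt}(f)) \\ &\le |t_0|\sum_{i=1}^{n}\mu(E_i) + |t_0|\varepsilon + \sum_{i=1}^{n} t_i\mu(E_i) + \varepsilon(|t_0| + |b|) + \varepsilon\sum_{i=1}^{n}\mu(E_i) + \varepsilon^2 - |t_0|\mu(\mathrm{spt}(f)). \end{aligned}$$
From 8.13 and 8.12, the first and last terms cancel. Therefore this is no larger than
$$\begin{aligned} &(2|t_0| + |b| + \mu(\mathrm{spt}(f)) + \varepsilon)\varepsilon + \sum_{i=1}^{n} t_{i-1}\mu(E_i) + \varepsilon\mu(\mathrm{spt}(f)) \\ &\le \int f\,d\mu + (2|t_0| + |b| + 2\mu(\mathrm{spt}(f)) + \varepsilon)\varepsilon. \end{aligned}$$
Since $\varepsilon > 0$ is arbitrary,
$$Lf \le \int f\,d\mu \tag{8.15}$$
for all $f \in C_c(\Omega)$, $f$ real. Hence equality holds in 8.15 because $L(-f) \le -\int f\,d\mu$ so $L(f) \ge \int f\,d\mu$. Thus $Lf = \int f\,d\mu$ for all $f \in C_c(\Omega)$. Just apply the result for real functions to the real and imaginary parts of $f$. This proves the Lemma.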
This gives the existence part of the Riesz representation theorem.
It only remains to prove uniqueness. Suppose both $\mu_1$ and $\mu_2$ are measures on $\mathcal{S}$ satisfying the conclusions of the theorem. Then if $K$ is compact and $V \supseteq K$ is open, let $K \prec f \prec V$. Then
$$\mu_1(K) \le \int f\,d\mu_1 = Lf = \int f\,d\mu_2 \le \mu_2(V).$$
Thus $\mu_1(K) \le \mu_2(K)$ for all $K$. Similarly, the inequality can be reversed and so it follows the two measures are equal on compact sets. By the assumption of inner regularity on open sets, the two measures are also equal on all open sets. By outer regularity, they are equal on all sets of $\mathcal{S}$. This proves the theorem.
An important example of a locally compact Hausdorff space is any metric space in which the closures of balls are compact. For example, $\mathbb{R}^n$ with the usual metric is an example of this. Not surprisingly, more can be said in this important special case.
Theorem 8.30 Let $(\Omega, \tau)$ be a metric space in which the closures of the balls are compact and let $L$ be a positive linear functional defined on $C_c(\Omega)$. Then there exists a measure representing the positive linear functional which satisfies all the conclusions of Theorem 8.23 and in addition the property that $\mu$ is regular. The same conclusion follows if $(\Omega, \tau)$ is a compact Hausdorff space.
Proof: Let $\mu$ and $\mathcal{S}$ be as described in Theorem 8.23. The outer regularity comes automatically as a conclusion of Theorem 8.23. It remains to verify inner regularity. Let $F \in \mathcal{S}$ and let $l < k < \mu(F)$. Now let $z \in \Omega$ and $\Omega_n = B(z, n)$ for $n \in \mathbb{N}$. Thus $F \cap \Omega_n \uparrow F$. It follows that for $n$ large enough,
$$k < \mu(F \cap \Omega_n) \le \mu(F).$$
Since $\mu(F \cap \Omega_n) < \infty$ it follows there exists a compact set, $K$ such that $K \subseteq F \cap \Omega_n \subseteq F$ and
$$l < \mu(K) \le \mu(F).$$
This proves inner regularity. In case $(\Omega, \tau)$ is a compact Hausdorff space, the conclusion of inner regularity follows from Theorem 8.23. This proves the theorem.
The proof of the above yields the following corollary.
Corollary 8.31 Let $(\Omega, \tau)$ be a locally compact Hausdorff space and suppose $\mu$ defined on a $\sigma$ algebra, $\mathcal{S}$ represents the positive linear functional $L$ where $L$ is defined on $C_c(\Omega)$ in the sense of Theorem 8.23. Suppose also that there exist $\Omega_n \in \mathcal{S}$ such that $\Omega = \bigcup_{n=1}^{\infty} \Omega_n$ and $\mu(\Omega_n) < \infty$. Then $\mu$ is regular.
The following is on the uniqueness of the $\sigma$ algebra in some cases.
Definition 8.32 Let $(\Omega, \tau)$ be a locally compact Hausdorff space and let $L$ be a positive linear functional defined on $C_c(\Omega)$ such that the complete measure defined by the Riesz representation theorem for positive linear functionals is inner regular. Then this is called a Radon measure. Thus a Radon measure is complete, and regular.
Corollary 8.33 Let $(\Omega, \tau)$ be a locally compact Hausdorff space which is also $\sigma$ compact meaning
$$\Omega = \bigcup_{n=1}^{\infty} \Omega_n,\quad \Omega_n \text{ is compact},$$
and let $L$ be a positive linear functional defined on $C_c(\Omega)$. Then if $(\mu_1, \mathcal{S}_1)$ and $(\mu_2, \mathcal{S}_2)$ are two Radon measures, together with their $\sigma$ algebras, which represent $L$, then the two $\sigma$ algebras are equal and the two measures are equal.
Proof: Suppose $(\mu_1, \mathcal{S}_1)$ and $(\mu_2, \mathcal{S}_2)$ both work. It will be shown the two measures are equal on every compact set. Let $K$ be compact and let $V$ be an open set containing $K$. Then let $K \prec f \prec V$. Then
$$\mu_1(K) = \int_K d\mu_1 \le \int f\,d\mu_1 = L(f) = \int f\,d\mu_2 \le \mu_2(V).$$
Therefore, taking the infimum over all $V$ containing $K$ implies $\mu_1(K) \le \mu_2(K)$. Reversing the argument shows $\mu_1(K) = \mu_2(K)$. This also implies the two measures are equal on all open sets because they are both inner regular on open sets. It is being assumed the two measures are regular. Now let $F \in \mathcal{S}_1$ with $\mu_1(F) < \infty$. Then there exist sets, $H, G$ such that $H \subseteq F \subseteq G$, $H$ is the countable union of compact sets, $G$ is a countable intersection of open sets, and $\mu_1(G) = \mu_1(H)$, which implies $\mu_1(G \setminus H) = 0$. Now $G \setminus H$ can be written as the countable intersection of sets of the form $V_k \setminus K_k$ where $V_k$ is open, $\mu_1(V_k) < \infty$ and $K_k$ is compact. From what was just shown, $\mu_2(V_k \setminus K_k) = \mu_1(V_k \setminus K_k)$ so it follows $\mu_2(G \setminus H) = 0$ also. Since $\mu_2$ is complete, and $G$ and $H$ are in $\mathcal{S}_2$, it follows $F \in \mathcal{S}_2$ and $\mu_2(F) = \mu_1(F)$. Now for arbitrary $F$ possibly having $\mu_1(F) = \infty$, consider $F \cap \Omega_n$. From what was just shown, this set is in $\mathcal{S}_2$ and $\mu_2(F \cap \Omega_n) = \mu_1(F \cap \Omega_n)$. Taking the union of these $F \cap \Omega_n$ gives $F \in \mathcal{S}_2$ and also $\mu_1(F) = \mu_2(F)$. This shows $\mathcal{S}_1 \subseteq \mathcal{S}_2$. Similarly, $\mathcal{S}_2 \subseteq \mathcal{S}_1$.
The following lemma is often useful.
Lemma 8.34 Let $(\Omega, \mathcal{F}, \mu)$ be a measure space where $\Omega$ is a topological space. Suppose $\mu$ is a Radon measure and $f$ is measurable with respect to $\mathcal{F}$. Then there exists a Borel measurable function, $g$, such that $g = f$ a.e.
Proof: Assume without loss of generality that $f \ge 0$. Then let $s_n \uparrow f$ pointwise. Say
$$s_n(\omega) = \sum_{k=1}^{P_n} c_k^n \mathcal{X}_{E_k^n}(\omega)$$
where $E_k^n \in \mathcal{F}$. By the outer regularity of $\mu$, there exists a Borel set, $F_k^n \supseteq E_k^n$ such that $\mu(F_k^n) = \mu(E_k^n)$. In fact $F_k^n$ can be assumed to be a $G_\delta$ set. Let
$$t_n(\omega) \equiv \sum_{k=1}^{P_n} c_k^n \mathcal{X}_{F_k^n}(\omega).$$
Then $t_n$ is Borel measurable and $t_n(\omega) = s_n(\omega)$ for all $\omega \notin N_n$ where $N_n \in \mathcal{F}$ is a set of measure zero. Now let $N \equiv \bigcup_{n=1}^{\infty} N_n$. Then $N$ is a set of measure zero and if $\omega \notin N$, then $t_n(\omega) \to f(\omega)$. Let $N' \supseteq N$ where $N'$ is a Borel set and $\mu(N') = 0$. Then $t_n \mathcal{X}_{(N')^C}$ converges pointwise to a Borel measurable function, $g$, and $g(\omega) = f(\omega)$ for all $\omega \notin N'$. Therefore, $g = f$ a.e. and this proves the lemma.
8.4 One Dimensional Lebesgue Measure
To obtain one dimensional Lebesgue measure, you use the positive linear functional $L$ given by
$$Lf = \int f(x)\,dx$$
whenever $f \in C_c(\mathbb{R})$. Lebesgue measure, denoted by $m$, is the measure obtained from the Riesz representation theorem such that
$$\int f\,dm = Lf = \int f(x)\,dx.$$
From this it is easy to verify that
$$m([a,b]) = m((a,b)) = b - a. \tag{8.16}$$
This will be done in general a little later but for now, consider the following picture of functions, $f^k$ and $g^k$. Note that $f^k \le \mathcal{X}_{(a,b)} \le \mathcal{X}_{[a,b]} \le g^k$.
[Figure: graphs of two piecewise linear functions. $f^k$ equals 1 on $[a + 1/k, b - 1/k]$ and 0 outside $(a,b)$; $g^k$ equals 1 on $[a,b]$ and 0 outside $(a - 1/k, b + 1/k)$.]
Then considering lower sums and upper sums in the inequalities on the ends,
$$\begin{aligned} \Big(b - a - \frac{2}{k}\Big) &\le \int f^k\,dx = \int f^k\,dm \le m((a,b)) \le m([a,b]) \\ &= \int \mathcal{X}_{[a,b]}\,dm \le \int g^k\,dm = \int g^k\,dx \le \Big(b - a + \frac{2}{k}\Big). \end{aligned}$$
From this the claim in 8.16 follows.
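The squeeze can be checked numerically. In the following sketch the formulas for `f_k` and `g_k` are one possible choice of piecewise linear functions matching the picture (an assumption, since only the picture is given), the values $a = 0$, $b = 1$, $k = 10$ are hypothetical, and a midpoint Riemann sum stands in for the Riemann integral.

```python
def f_k(x, a, b, k):
    """Trapezoid below the indicator of (a, b): 1 on [a + 1/k, b - 1/k], 0 outside (a, b).
    Assumes b - a >= 2/k."""
    return max(0.0, min(1.0, k * (x - a), k * (b - x)))

def g_k(x, a, b, k):
    """Trapezoid above the indicator of [a, b]: 1 on [a, b], 0 outside (a - 1/k, b + 1/k)."""
    return max(0.0, min(1.0, k * (x - a) + 1.0, k * (b - x) + 1.0))

def midpoint_integral(func, lo, hi, steps=20_000):
    """Midpoint Riemann sum of func over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (j + 0.5) * h) for j in range(steps))

a, b, k = 0.0, 1.0, 10
lower = midpoint_integral(lambda x: f_k(x, a, b, k), a, b)
upper = midpoint_integral(lambda x: g_k(x, a, b, k), a - 1 / k, b + 1 / k)
```

For these functions the exact integrals are $b - a - 1/k$ and $b - a + 1/k$, which lie inside the bounds $b - a \mp 2/k$ used above, and both tend to $b - a$ as $k \to \infty$.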
8.5 The Distribution Function
There is an interesting connection between the Lebesgue integral of a nonnegative
function with something called the distribution function.
Definition 8.35 Let $f \ge 0$ and suppose $f$ is measurable. The distribution function is the function defined by
$$t \to \mu([t < f]).$$
Lemma 8.36 If $\{f_n\}$ is an increasing sequence of functions converging pointwise to $f$ then
$$\mu([f > t]) = \lim_{n \to \infty} \mu([f_n > t]).$$
Proof: The sets, $[f_n > t]$ are increasing and their union is $[f > t]$ because if $f(\omega) > t$, then for all $n$ large enough, $f_n(\omega) > t$ also. Therefore, from Theorem 7.5 on Page 136 the desired conclusion follows.
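Lemma 8.37 Suppose $s \ge 0$ is a measurable simple function,
$$s(\omega) \equiv \sum_{k=1}^{n} a_k \mathcal{X}_{E_k}(\omega)$$
where the $a_k$ are the distinct nonzero values of $s$, $a_1 < a_2 < \cdots < a_n$. Suppose $\phi$ is a $C^1$ function defined on $[0, \infty)$ which has the property that $\phi(0) = 0$, $\phi'(t) > 0$ for all $t$. Then
$$\int_0^{\infty} \phi'(t)\mu([s > t])\,dm = \int \phi(s)\,d\mu.$$
Proof: First note that if $\mu(E_k) = \infty$ for any $k$ then both sides equal $\infty$ and so without loss of generality, assume $\mu(E_k) < \infty$ for all $k$. Letting $a_0 \equiv 0$, the left side equals
$$\begin{aligned} \sum_{k=1}^{n} \int_{a_{k-1}}^{a_k} \phi'(t)\mu([s > t])\,dm &= \sum_{k=1}^{n} \int_{a_{k-1}}^{a_k} \phi'(t) \sum_{i=k}^{n} \mu(E_i)\,dm \\ &= \sum_{k=1}^{n} \sum_{i=k}^{n} \mu(E_i) \int_{a_{k-1}}^{a_k} \phi'(t)\,dm \\ &= \sum_{k=1}^{n} \sum_{i=k}^{n} \mu(E_i)(\phi(a_k) - \phi(a_{k-1})) \\ &= \sum_{i=1}^{n} \mu(E_i) \sum_{k=1}^{i} (\phi(a_k) - \phi(a_{k-1})) \\ &= \sum_{i=1}^{n} \mu(E_i)\phi(a_i) = \int \phi(s)\,d\mu. \end{aligned}$$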
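The identity of Lemma 8.37 is easy to test on a concrete simple function. The sketch below assumes hypothetical values $a_k$, hypothetical measures $\mu(E_k)$, and the choice $\phi(t) = t^2 + t$ (so that $\phi(0) = 0$ and $\phi'(t) = 2t + 1 > 0$); the left side is approximated by a midpoint Riemann sum.

```python
values = [1.0, 2.0, 4.0]     # a_1 < a_2 < a_3, the distinct nonzero values of s
measures = [0.5, 1.5, 0.25]  # mu(E_1), mu(E_2), mu(E_3), all finite

def phi(t):
    return t * t + t

def dphi(t):
    return 2.0 * t + 1.0

def distribution(t):
    """mu([s > t]): the sum of mu(E_i) over those i with a_i > t."""
    return sum(m for a, m in zip(values, measures) if a > t)

def left_side(steps=4000):
    """Midpoint sum for the integral of phi'(t) mu([s > t]) over [0, a_n];
    the integrand vanishes for t >= a_n."""
    h = max(values) / steps
    return h * sum(dphi((j + 0.5) * h) * distribution((j + 0.5) * h) for j in range(steps))

# Right side: the integral of phi(s) d mu is the sum of mu(E_i) phi(a_i).
right_side = sum(m * phi(a) for a, m in zip(values, measures))
```

Because the integrand is linear between consecutive $a_k$ and the grid is aligned with those break points, the midpoint sum reproduces the right side up to rounding.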
With this lemma the next theorem which is the main result follows easily.
Theorem 8.38 Let $f \ge 0$ be measurable and let $\phi$ be a $C^1$ function defined on $[0, \infty)$ which satisfies $\phi'(t) > 0$ for all $t > 0$ and $\phi(0) = 0$. Then
$$\int \phi(f)\,d\mu = \int_0^{\infty} \phi'(t)\mu([f > t])\,dt.$$
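Proof: By Theorem 7.24 on Page 150 there exists an increasing sequence of nonnegative simple functions, $\{s_n\}$ which converges pointwise to $f$. By the monotone convergence theorem and Lemma 8.36,
$$\begin{aligned} \int \phi(f)\,d\mu = \lim_{n \to \infty} \int \phi(s_n)\,d\mu &= \lim_{n \to \infty} \int_0^{\infty} \phi'(t)\mu([s_n > t])\,dm \\ &= \int_0^{\infty} \phi'(t)\mu([f > t])\,dm. \end{aligned}$$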
8.6 Completion Of Measures
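Suppose $(\Omega, \mathcal{F}, \mu)$ is a measure space. Then it is always possible to enlarge the $\sigma$ algebra and define a new measure $\overline{\mu}$ on this larger $\sigma$ algebra $\overline{\mathcal{F}}$ such that $(\Omega, \overline{\mathcal{F}}, \overline{\mu})$ is a complete measure space. Recall this means that if $N \subseteq N' \in \overline{\mathcal{F}}$ and $\overline{\mu}(N') = 0$, then $N \in \overline{\mathcal{F}}$. The following theorem is the main result. The new measure space is called the completion of the measure space.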
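Theorem 8.39 Let $(\Omega, \mathcal{F}, \mu)$ be a $\sigma$ finite measure space. Then there exists a unique measure space, $(\Omega, \overline{\mathcal{F}}, \overline{\mu})$ satisfying
1. $(\Omega, \overline{\mathcal{F}}, \overline{\mu})$ is a complete measure space.
2. $\overline{\mu} = \mu$ on $\mathcal{F}$
3. $\overline{\mathcal{F}} \supseteq \mathcal{F}$
4. For every $E \in \overline{\mathcal{F}}$ there exists $G \in \mathcal{F}$ such that $G \supseteq E$ and $\mu(G) = \overline{\mu}(E)$.
5. For every $E \in \overline{\mathcal{F}}$ there exists $F \in \mathcal{F}$ such that $F \subseteq E$ and $\mu(F) = \overline{\mu}(E)$.
Also for every $E \in \overline{\mathcal{F}}$ there exist sets $G, F \in \mathcal{F}$ such that $G \supseteq E \supseteq F$ and
$$\mu(G \setminus F) = \overline{\mu}(G \setminus F) = 0. \tag{8.17}$$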
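Proof: First consider the claim about uniqueness. Suppose $(\Omega, \mathcal{F}_1, \nu_1)$ and $(\Omega, \mathcal{F}_2, \nu_2)$ both work and let $E \in \mathcal{F}_1$. Also let $\mu(\Omega_n) < \infty$, $\cdots \Omega_n \subseteq \Omega_{n+1} \cdots$, and $\bigcup_{n=1}^{\infty} \Omega_n = \Omega$. Define $E_n \equiv E \cap \Omega_n$. Then pick $G_n \supseteq E_n \supseteq F_n$ such that $\mu(G_n) = \mu(F_n) = \nu_1(E_n)$. It follows $\mu(G_n \setminus F_n) = 0$. Then letting $G = \bigcup_n G_n$, $F \equiv \bigcup_n F_n$, it follows $G \supseteq E \supseteq F$ and
$$\mu(G \setminus F) \le \mu\Big(\bigcup_n (G_n \setminus F_n)\Big) \le \sum_n \mu(G_n \setminus F_n) = 0.$$
It follows that $\nu_2(G \setminus F) = 0$ also. Now $E \setminus F \subseteq G \setminus F$ and since $(\Omega, \mathcal{F}_2, \nu_2)$ is complete, it follows $E \setminus F \in \mathcal{F}_2$. Since $F \in \mathcal{F}_2$, it follows $E = (E \setminus F) \cup F \in \mathcal{F}_2$. Thus $\mathcal{F}_1 \subseteq \mathcal{F}_2$. Similarly $\mathcal{F}_2 \subseteq \mathcal{F}_1$. Now it only remains to verify $\nu_1 = \nu_2$. Thus let $E \in \mathcal{F}_1 = \mathcal{F}_2$ and let $G$ and $F$ be as just described. Since $\nu_i = \mu$ on $\mathcal{F}$,
$$\begin{aligned} \mu(F) \le \nu_1(E) &= \nu_1(E \setminus F) + \nu_1(F) \\ &\le \nu_1(G \setminus F) + \nu_1(F) \\ &= \nu_1(F) = \mu(F). \end{aligned}$$
Similarly $\nu_2(E) = \mu(F)$. This proves uniqueness. The construction has also verified 8.17.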
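Next define an outer measure, $\overline{\mu}$ on $\mathcal{P}(\Omega)$ as follows. For $S \subseteq \Omega$,
$$\overline{\mu}(S) \equiv \inf\{\mu(E) : E \in \mathcal{F},\ E \supseteq S\}.$$
Then it is clear $\overline{\mu}$ is increasing. It only remains to verify $\overline{\mu}$ is subadditive. Then let $S = \bigcup_{i=1}^{\infty} S_i$. If any $\overline{\mu}(S_i) = \infty$, there is nothing to prove so suppose $\overline{\mu}(S_i) < \infty$ for each $i$. Then there exist $E_i \in \mathcal{F}$ such that $E_i \supseteq S_i$ and
$$\overline{\mu}(S_i) + \varepsilon/2^i > \mu(E_i).$$
Then
$$\begin{aligned} \overline{\mu}(S) = \overline{\mu}\Big(\bigcup_i S_i\Big) &\le \mu\Big(\bigcup_i E_i\Big) \le \sum_i \mu(E_i) \\ &\le \sum_i \big(\overline{\mu}(S_i) + \varepsilon/2^i\big) = \sum_i \overline{\mu}(S_i) + \varepsilon. \end{aligned}$$
Since $\varepsilon$ is arbitrary, this verifies $\overline{\mu}$ is subadditive and is an outer measure as claimed.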
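Denote by $\overline{\mathcal{F}}$ the $\sigma$ algebra of measurable sets in the sense of Caratheodory. Then it follows from the Caratheodory procedure, Theorem 8.4, on Page 170 that $(\Omega, \overline{\mathcal{F}}, \overline{\mu})$ is a complete measure space. This verifies 1.
Now let $E \in \mathcal{F}$. Then from the definition of $\overline{\mu}$, it follows
$$\overline{\mu}(E) \equiv \inf\{\mu(F) : F \in \mathcal{F} \text{ and } F \supseteq E\} \le \mu(E).$$
If $F \supseteq E$ and $F \in \mathcal{F}$, then $\mu(F) \ge \mu(E)$ and so $\mu(E)$ is a lower bound for all such $\mu(F)$ which shows that
$$\overline{\mu}(E) \equiv \inf\{\mu(F) : F \in \mathcal{F} \text{ and } F \supseteq E\} \ge \mu(E).$$
This verifies 2.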
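Next consider 3. Let $E \in \mathcal{F}$ and let $S$ be a set. I must show
$$\overline{\mu}(S) \ge \overline{\mu}(S \setminus E) + \overline{\mu}(S \cap E).$$
If $\overline{\mu}(S) = \infty$ there is nothing to show. Therefore, suppose $\overline{\mu}(S) < \infty$. Then from the definition of $\overline{\mu}$ there exists $G \supseteq S$ such that $G \in \mathcal{F}$ and $\mu(G) = \overline{\mu}(S)$. Then from the definition of $\overline{\mu}$,
$$\begin{aligned} \overline{\mu}(S) &\le \overline{\mu}(S \setminus E) + \overline{\mu}(S \cap E) \\ &\le \mu(G \setminus E) + \mu(G \cap E) \\ &= \mu(G) = \overline{\mu}(S). \end{aligned}$$
This verifies 3.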
Claim 4 comes by the definition of $\overline{\mu}$ as used above. The only other case is when $\overline{\mu}(S) = \infty$. However, in this case, you can let $G = \Omega$.
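It only remains to verify 5. Let the $\Omega_n$ be as described above and let $E \in \overline{\mathcal{F}}$ such that $E \subseteq \Omega_n$. By 4 there exists $H \in \mathcal{F}$ such that $H \subseteq \Omega_n$, $H \supseteq \Omega_n \setminus E$, and
$$\mu(H) = \overline{\mu}(\Omega_n \setminus E). \tag{8.18}$$
Then let $F \equiv \Omega_n \cap H^C$. It follows $F \subseteq E$ and
$$E \setminus F = E \cap F^C = E \cap \big(H \cup \Omega_n^C\big) = E \cap H = H \setminus (\Omega_n \setminus E).$$
Hence from 8.18
$$\overline{\mu}(E \setminus F) = \overline{\mu}(H \setminus (\Omega_n \setminus E)) = 0.$$
It follows
$$\overline{\mu}(E) = \overline{\mu}(F) = \mu(F).$$
In the case where $E \in \overline{\mathcal{F}}$ is arbitrary, not necessarily contained in some $\Omega_n$, it follows from what was just shown that there exists $F_n \in \mathcal{F}$ such that $F_n \subseteq E \cap \Omega_n$ and
$$\mu(F_n) = \overline{\mu}(E \cap \Omega_n).$$
Letting $F \equiv \bigcup_n F_n$,
$$\overline{\mu}(E \setminus F) \le \overline{\mu}\Big(\bigcup_n (E \cap \Omega_n \setminus F_n)\Big) \le \sum_n \overline{\mu}(E \cap \Omega_n \setminus F_n) = 0.$$
Therefore, $\overline{\mu}(E) = \mu(F)$ and this proves 5. This proves the theorem.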
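On a finite set the whole construction can be carried out by brute force, which makes it easy to see which new sets the completion picks up. The following sketch assumes a hypothetical five point space whose $\sigma$ algebra is generated by three atoms, one of them a null set; the names `ATOMS`, `outer`, and `caratheodory_measurable` are illustrative only.

```python
from itertools import combinations

OMEGA = frozenset(range(5))
# Hypothetical atoms of the sigma algebra F with their measures; {1, 2} is a null set.
ATOMS = [(frozenset({0}), 1.0), (frozenset({1, 2}), 0.0), (frozenset({3, 4}), 2.0)]

def subsets(items):
    """All subsets of a finite collection, as frozensets."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

# F consists of all unions of atoms; mu is additive over the atoms used.
F = {}
for r in range(len(ATOMS) + 1):
    for chosen in combinations(ATOMS, r):
        F[frozenset().union(*(atom for atom, _ in chosen))] = sum(w for _, w in chosen)

def outer(S):
    """Outer measure: the infimum of mu(E) over E in F with E containing S."""
    return min(m for E, m in F.items() if S <= E)

def caratheodory_measurable(E):
    """E is measurable if it splits every test set S additively."""
    return all(abs(outer(S) - outer(S & E) - outer(S - E)) < 1e-12 for S in subsets(OMEGA))

# The completed sigma algebra: all Caratheodory measurable subsets of OMEGA.
completed = [E for E in subsets(OMEGA) if caratheodory_measurable(E)]
```

In this example the subsets of the null atom $\{1,2\}$ become measurable with measure zero, while the atom $\{3,4\}$ of positive measure still cannot be split.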
Now here is an interesting theorem about complete measure spaces.
Theorem 8.40 Let $(\Omega, \mathcal{F}, \mu)$ be a complete measure space and let $f \le g \le h$ be functions having values in $[0, \infty]$. Suppose also that $f(\omega) = h(\omega)$ a.e. $\omega$ and that $f$ and $h$ are measurable. Then $g$ is also measurable. If $(\Omega, \overline{\mathcal{F}}, \overline{\mu})$ is the completion of a $\sigma$ finite measure space $(\Omega, \mathcal{F}, \mu)$ as described above in Theorem 8.39 then if $f$ is measurable with respect to $\overline{\mathcal{F}}$ having values in $[0, \infty]$, it follows there exists $g$ measurable with respect to $\mathcal{F}$, $g \le f$, and a set $N \in \mathcal{F}$ with $\mu(N) = 0$ and $g = f$ on $N^C$. There also exists $h$ measurable with respect to $\mathcal{F}$ such that $h \ge f$, and a set of measure zero, $M \in \mathcal{F}$ such that $f = h$ on $M^C$.
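Proof: Let $\alpha \in \mathbb{R}$.
$$[f > \alpha] \subseteq [g > \alpha] \subseteq [h > \alpha]$$
Thus
$$[g > \alpha] = [f > \alpha] \cup ([g > \alpha] \setminus [f > \alpha])$$
and $[g > \alpha] \setminus [f > \alpha]$ is a measurable set because it is a subset of the set of measure zero,
$$[h > \alpha] \setminus [f > \alpha].$$
Now consider the last assertion. By Theorem 7.24 on Page 150 there exists an increasing sequence of nonnegative simple functions, $\{s_n\}$ measurable with respect to $\overline{\mathcal{F}}$ which converges pointwise to $f$. Letting
$$s_n(\omega) = \sum_{k=1}^{m_n} c_k^n \mathcal{X}_{E_k^n}(\omega) \tag{8.19}$$
be one of these simple functions, it follows from Theorem 8.39 there exist sets, $F_k^n \in \mathcal{F}$ such that $F_k^n \subseteq E_k^n$ and $\mu(F_k^n) = \overline{\mu}(E_k^n)$. Then let
$$t_n(\omega) \equiv \sum_{k=1}^{m_n} c_k^n \mathcal{X}_{F_k^n}(\omega).$$
Thus $t_n = s_n$ off a set of measure zero, $N_n \in \overline{\mathcal{F}}$, $t_n \le s_n$. Let $N' \equiv \bigcup_n N_n$. Then by Theorem 8.39 again, there exists $N \in \mathcal{F}$ such that $N \supseteq N'$ and $\mu(N) = 0$. Consider the simple functions,
$$s_n'(\omega) \equiv t_n(\omega)\mathcal{X}_{N^C}(\omega).$$
It is an increasing sequence so let $g(\omega) = \lim_{n \to \infty} s_n'(\omega)$. It follows $g$ is measurable with respect to $\mathcal{F}$ and equals $f$ off $N$.
Finally, to obtain the function, $h \ge f$, in 8.19 use Theorem 8.39 to obtain the existence of $F_k^n \in \mathcal{F}$ such that $F_k^n \supseteq E_k^n$ and $\mu(F_k^n) = \overline{\mu}(E_k^n)$. Then let
$$t_n(\omega) \equiv \sum_{k=1}^{m_n} c_k^n \mathcal{X}_{F_k^n}(\omega).$$
Thus $t_n = s_n$ off a set of measure zero, $M_n \in \overline{\mathcal{F}}$, $t_n \ge s_n$, and $t_n$ is measurable with respect to $\mathcal{F}$. Then define
$$s_n' = \max_{k \le n} t_k.$$
It follows $s_n'$ is an increasing sequence of $\mathcal{F}$ measurable nonnegative simple functions. Since each $s_n' \ge s_n$, it follows that if $h(\omega) = \lim_{n \to \infty} s_n'(\omega)$, then $h(\omega) \ge f(\omega)$. Also if $h(\omega) > f(\omega)$, then $\omega \in \bigcup_n M_n \equiv M'$, a set of $\overline{\mathcal{F}}$ having measure zero. By Theorem 8.39, there exists $M \supseteq M'$ such that $M \in \mathcal{F}$ and $\mu(M) = 0$. It follows $h = f$ off $M$. This proves the theorem.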
8.7 Product Measures
8.7.1 General Theory
Given two finite measure spaces, $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{S}, \nu)$, there is a way to define a $\sigma$ algebra of subsets of $X \times Y$, denoted by $\mathcal{F} \times \mathcal{S}$ and a measure, denoted by $\mu \times \nu$ defined on this $\sigma$ algebra such that
$$(\mu \times \nu)(A \times B) = \mu(A)\nu(B)$$
whenever $A \in \mathcal{F}$ and $B \in \mathcal{S}$. This is naturally related to the concept of iterated integrals similar to what is used in calculus to evaluate a multiple integral. The approach is based on something called a $\pi$ system, [14].
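Definition 8.41 Let $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{S}, \nu)$ be two measure spaces. A measurable rectangle is a set of the form $A \times B$ where $A \in \mathcal{F}$ and $B \in \mathcal{S}$.
Definition 8.42 Let $\Omega$ be a set and let $\mathcal{K}$ be a collection of subsets of $\Omega$. Then $\mathcal{K}$ is called a $\pi$ system if $\emptyset \in \mathcal{K}$ and whenever $A, B \in \mathcal{K}$, it follows $A \cap B \in \mathcal{K}$.
Obviously an example of a $\pi$ system is the set of measurable rectangles because
$$(A \times B) \cap (A' \times B') = (A \cap A') \times (B \cap B').$$
The following is the fundamental lemma which shows these $\pi$ systems are useful.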
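Lemma 8.43 Let $\mathcal{K}$ be a $\pi$ system of subsets of $\Omega$, a set. Also let $\mathcal{G}$ be a collection of subsets of $\Omega$ which satisfies the following three properties.
1. $\mathcal{K} \subseteq \mathcal{G}$
2. If $A \in \mathcal{G}$, then $A^C \in \mathcal{G}$
3. If $\{A_i\}_{i=1}^{\infty}$ is a sequence of disjoint sets from $\mathcal{G}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{G}$.
Then $\mathcal{G} \supseteq \sigma(\mathcal{K})$, where $\sigma(\mathcal{K})$ is the smallest $\sigma$ algebra which contains $\mathcal{K}$.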
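Proof: First note that if
$$\mathcal{H} \equiv \{\mathcal{G} : \text{1 - 3 all hold}\}$$
then $\bigcap \mathcal{H}$ yields a collection of sets which also satisfies 1 - 3. Therefore, I will assume in the argument that $\mathcal{G}$ is the smallest collection satisfying 1 - 3. Let $A \in \mathcal{K}$ and define
$$\mathcal{G}_A \equiv \{B \in \mathcal{G} : A \cap B \in \mathcal{G}\}.$$
I want to show $\mathcal{G}_A$ satisfies 1 - 3 because then it must equal $\mathcal{G}$ since $\mathcal{G}$ is the smallest collection of subsets of $\Omega$ which satisfies 1 - 3. This will give the conclusion that for $A \in \mathcal{K}$ and $B \in \mathcal{G}$, $A \cap B \in \mathcal{G}$. This information will then be used to show that if $A, B \in \mathcal{G}$ then $A \cap B \in \mathcal{G}$. From this it will follow very easily that $\mathcal{G}$ is a $\sigma$ algebra which will imply it contains $\sigma(\mathcal{K})$. Now here are the details of the argument.
Since $\mathcal{K}$ is given to be a $\pi$ system, $\mathcal{K} \subseteq \mathcal{G}_A$. Property 3 is obvious because if $\{B_i\}$ is a sequence of disjoint sets in $\mathcal{G}_A$, then
$$A \cap \bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A \cap B_i \in \mathcal{G}$$
because $A \cap B_i \in \mathcal{G}$ and the property 3 of $\mathcal{G}$.
It remains to verify Property 2 so let $B \in \mathcal{G}_A$. I need to verify that $B^C \in \mathcal{G}_A$. In other words, I need to show that $A \cap B^C \in \mathcal{G}$. However,
$$A \cap B^C = \big(A^C \cup (A \cap B)\big)^C \in \mathcal{G}.$$
Here is why. Since $B \in \mathcal{G}_A$, $A \cap B \in \mathcal{G}$ and since $A \in \mathcal{K} \subseteq \mathcal{G}$ it follows $A^C \in \mathcal{G}$ by assumption 2. It follows from assumption 3 the union of the disjoint sets, $A^C$ and $(A \cap B)$ is in $\mathcal{G}$ and then from 2 the complement of their union is in $\mathcal{G}$. Thus $\mathcal{G}_A$ satisfies 1 - 3 and this implies since $\mathcal{G}$ is the smallest such, that $\mathcal{G}_A \supseteq \mathcal{G}$. However, $\mathcal{G}_A$ is constructed as a subset of $\mathcal{G}$. This proves that for every $B \in \mathcal{G}$ and $A \in \mathcal{K}$, $A \cap B \in \mathcal{G}$. Now pick $B \in \mathcal{G}$ and consider
$$\mathcal{G}_B \equiv \{A \in \mathcal{G} : A \cap B \in \mathcal{G}\}.$$
I just proved $\mathcal{K} \subseteq \mathcal{G}_B$. The other arguments are identical to show $\mathcal{G}_B$ satisfies 1 - 3 and is therefore equal to $\mathcal{G}$. This shows that whenever $A, B \in \mathcal{G}$ it follows $A \cap B \in \mathcal{G}$.
This implies $\mathcal{G}$ is a $\sigma$ algebra. To show this, all that is left is to verify $\mathcal{G}$ is closed under countable unions because then it follows $\mathcal{G}$ is a $\sigma$ algebra. Let $\{A_i\} \subseteq \mathcal{G}$. Then let $A_1' = A_1$ and
$$\begin{aligned} A_{n+1}' &\equiv A_{n+1} \setminus \Big(\bigcup_{i=1}^{n} A_i\Big) \\ &= A_{n+1} \cap \Big(\bigcap_{i=1}^{n} A_i^C\Big) \\ &= \bigcap_{i=1}^{n} \big(A_{n+1} \cap A_i^C\big) \in \mathcal{G} \end{aligned}$$
because finite intersections of sets of $\mathcal{G}$ are in $\mathcal{G}$. Since the $A_i'$ are disjoint, it follows
$$\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} A_i' \in \mathcal{G}.$$
Therefore, $\mathcal{G} \supseteq \sigma(\mathcal{K})$ and this proves the Lemma.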
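On a finite set, the smallest collection satisfying 1 - 3 and the smallest $\sigma$ algebra can both be computed by repeatedly closing up a family of sets, so the lemma can be checked directly. The sketch below is an illustration under assumptions: the four point set and the families `pi_system` and `not_pi` are hypothetical, and on a finite set countable unions reduce to iterated pairwise unions.

```python
def closure(family, omega, disjoint_only):
    """Close a family of subsets of a finite omega under complements and
    pairwise unions (unions of disjoint sets only, if disjoint_only is True)."""
    fam = set(family)
    changed = True
    while changed:
        changed = False
        for a in list(fam):
            candidates = [omega - a]
            candidates += [a | b for b in list(fam) if not disjoint_only or not (a & b)]
            for c in candidates:
                if c not in fam:
                    fam.add(c)
                    changed = True
    return fam

fs = frozenset
OMEGA = fs(range(4))

# A pi system: it contains the empty set and {0, 1} & {1, 2} = {1}.
pi_system = [fs(), fs({0, 1}), fs({1, 2}), fs({1})]
# The same family without {1} is not closed under intersections.
not_pi = [fs(), fs({0, 1}), fs({1, 2})]

smallest_G = closure(pi_system, OMEGA, disjoint_only=True)   # properties 1 - 3
sigma_K = closure(pi_system, OMEGA, disjoint_only=False)     # sigma algebra
smallest_G_bad = closure(not_pi, OMEGA, disjoint_only=True)
sigma_bad = closure(not_pi, OMEGA, disjoint_only=False)
```

For the $\pi$ system the two closures coincide, as the lemma asserts. For the family which is not a $\pi$ system, the collection closed only under complements and disjoint unions is strictly smaller than the generated $\sigma$ algebra, which shows the hypothesis is really used.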
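With this lemma, it is easy to define product measure.
Let $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{S}, \nu)$ be two finite measure spaces. Define $\mathcal{K}$ to be the set of measurable rectangles, $A \times B$, $A \in \mathcal{F}$ and $B \in \mathcal{S}$. Let
$$\mathcal{G} \equiv \left\{E \subseteq X \times Y : \int_Y \int_X \mathcal{X}_E\,d\mu\,d\nu = \int_X \int_Y \mathcal{X}_E\,d\nu\,d\mu\right\} \tag{8.20}$$
where in the above, part of the requirement is for all integrals to make sense.
Then $\mathcal{K} \subseteq \mathcal{G}$. This is obvious.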
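Next I want to show that if $E \in \mathcal{G}$ then $E^C \in \mathcal{G}$. Observe $\mathcal{X}_{E^C} = 1 - \mathcal{X}_E$ and so
$$\begin{aligned} \int_Y \int_X \mathcal{X}_{E^C}\,d\mu\,d\nu &= \int_Y \int_X (1 - \mathcal{X}_E)\,d\mu\,d\nu \\ &= \int_X \int_Y (1 - \mathcal{X}_E)\,d\nu\,d\mu \\ &= \int_X \int_Y \mathcal{X}_{E^C}\,d\nu\,d\mu \end{aligned}$$
which shows that if $E \in \mathcal{G}$, then $E^C \in \mathcal{G}$.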
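Next I want to show $\mathcal{G}$ is closed under countable unions of disjoint sets of $\mathcal{G}$. Let $\{A_i\}$ be a sequence of disjoint sets from $\mathcal{G}$. Then
$$\begin{aligned} \int_Y \int_X \mathcal{X}_{\cup_{i=1}^{\infty} A_i}\,d\mu\,d\nu &= \int_Y \int_X \sum_{i=1}^{\infty} \mathcal{X}_{A_i}\,d\mu\,d\nu \\ &= \int_Y \sum_{i=1}^{\infty} \int_X \mathcal{X}_{A_i}\,d\mu\,d\nu \\ &= \sum_{i=1}^{\infty} \int_Y \int_X \mathcal{X}_{A_i}\,d\mu\,d\nu \\ &= \sum_{i=1}^{\infty} \int_X \int_Y \mathcal{X}_{A_i}\,d\nu\,d\mu \\ &= \int_X \sum_{i=1}^{\infty} \int_Y \mathcal{X}_{A_i}\,d\nu\,d\mu \\ &= \int_X \int_Y \sum_{i=1}^{\infty} \mathcal{X}_{A_i}\,d\nu\,d\mu \\ &= \int_X \int_Y \mathcal{X}_{\cup_{i=1}^{\infty} A_i}\,d\nu\,d\mu, \end{aligned} \tag{8.21}$$
the interchanges between the summation and the integral depending on the monotone convergence theorem. Thus $\mathcal{G}$ is closed with respect to countable disjoint unions.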
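From Lemma 8.43, $\mathcal{G} \supseteq \sigma(\mathcal{K})$. Also the computation in 8.21 implies that on $\sigma(\mathcal{K})$ one can define a measure, denoted by $\mu \times \nu$ and that for every $E \in \sigma(\mathcal{K})$,
$$(\mu \times \nu)(E) = \int_Y \int_X \mathcal{X}_E\,d\mu\,d\nu = \int_X \int_Y \mathcal{X}_E\,d\nu\,d\mu. \tag{8.22}$$
Now here is Fubini's theorem.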
Theorem 8.44 Let $f : X \times Y \to [0, \infty]$ be measurable with respect to the $\sigma$ algebra, $\sigma(\mathcal{K})$ just defined and let $\mu \times \nu$ be the product measure of 8.22 where $\mu$ and $\nu$ are finite measures on $(X, \mathcal{F})$ and $(Y, \mathcal{S})$ respectively. Then
$$\int_{X \times Y} f\,d(\mu \times \nu) = \int_Y \int_X f\,d\mu\,d\nu = \int_X \int_Y f\,d\nu\,d\mu.$$
Proof: Let $\{s_n\}$ be an increasing sequence of $\sigma(\mathcal{K})$ measurable simple functions which converges pointwise to $f$. The above equation holds for $s_n$ in place of $f$ from what was shown above. The final result follows from passing to the limit and using the monotone convergence theorem. This proves the theorem.
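For measures given by finitely many point masses, all three integrals in Theorem 8.44 are finite sums and the equality can be checked exactly. The following sketch uses hypothetical point masses `mu` and `nu` and a hypothetical nonnegative function `f`; rational arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite measure spaces described by their point masses.
mu = {"a": Fraction(1, 2), "b": Fraction(3, 2)}
nu = {0: Fraction(2), 1: Fraction(1, 3), 2: Fraction(0)}

def f(x, y):
    """A nonnegative function on the product space."""
    return (1 if x == "a" else 4) + y * y

# Iterated integrals in both orders.
dx_then_dy = sum(nu[y] * sum(mu[x] * f(x, y) for x in mu) for y in nu)
dy_then_dx = sum(mu[x] * sum(nu[y] * f(x, y) for y in nu) for x in mu)

# Integral with respect to the product measure, using (mu x nu)({(x, y)}) = mu({x}) nu({y}).
product_integral = sum(mu[x] * nu[y] * f(x, y) for x, y in product(mu, nu))
```

Here the interchange of the order of integration is just a rearrangement of a finite double sum of nonnegative terms.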
The symbol, $\mathcal{F} \times \mathcal{S}$ denotes $\sigma(\mathcal{K})$.
Of course one can generalize right away to measures which are only $\sigma$ finite.
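Theorem 8.45 Let $f : X \times Y \to [0, \infty]$ be measurable with respect to the $\sigma$ algebra, $\sigma(\mathcal{K})$ just defined and let $\mu \times \nu$ be the product measure of 8.22 where $\mu$ and $\nu$ are $\sigma$ finite measures on $(X, \mathcal{F})$ and $(Y, \mathcal{S})$ respectively. Then
$$\int_{X \times Y} f\,d(\mu \times \nu) = \int_Y \int_X f\,d\mu\,d\nu = \int_X \int_Y f\,d\nu\,d\mu.$$
Proof: Since the measures are $\sigma$ finite, there exist increasing sequences of sets, $\{X_n\}$ and $\{Y_n\}$ such that $\mu(X_n) < \infty$ and $\nu(Y_n) < \infty$. Then $\mu$ and $\nu$ restricted to $X_n$ and $Y_n$ respectively are finite. Then from Theorem 8.44,
$$\int_{Y_n} \int_{X_n} f\,d\mu\,d\nu = \int_{X_n} \int_{Y_n} f\,d\nu\,d\mu.$$
Passing to the limit yields
$$\int_Y \int_X f\,d\mu\,d\nu = \int_X \int_Y f\,d\nu\,d\mu$$
whenever $f$ is as above. In particular, you could take $f = \mathcal{X}_E$ where $E \in \mathcal{F} \times \mathcal{S}$ and define
$$(\mu \times \nu)(E) \equiv \int_Y \int_X \mathcal{X}_E\,d\mu\,d\nu = \int_X \int_Y \mathcal{X}_E\,d\nu\,d\mu.$$
Then just as in the proof of Theorem 8.44, the conclusion of this theorem is obtained.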
It is also useful to note that all the above holds for $\prod_{i=1}^{n} X_i$ in place of $X \times Y$. You would simply modify the definition of $\mathcal{G}$ in 8.20 including all permutations for the iterated integrals and for $\mathcal{K}$ you would use sets of the form $\prod_{i=1}^{n} A_i$ where $A_i$ is measurable. Everything goes through exactly as above. Thus the following is obtained.
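Theorem 8.46 Let $\{(X_i, \mathcal{F}_i, \mu_i)\}_{i=1}^{n}$ be $\sigma$ finite measure spaces and let $\prod_{i=1}^{n} \mathcal{F}_i$ denote the smallest $\sigma$ algebra which contains the measurable boxes of the form $\prod_{i=1}^{n} A_i$ where $A_i \in \mathcal{F}_i$. Then there exists a measure, $\lambda$ defined on $\prod_{i=1}^{n} \mathcal{F}_i$ such that if $f : \prod_{i=1}^{n} X_i \to [0, \infty]$ is $\prod_{i=1}^{n} \mathcal{F}_i$ measurable, and $(i_1, \cdots, i_n)$ is any permutation of $(1, \cdots, n)$, then
$$\int f\,d\lambda = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f\,d\mu_{i_1} \cdots d\mu_{i_n}.$$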
8.7.2 Completion Of Product Measure Spaces
Using Theorem 8.40 it is easy to give a generalization to yield a theorem for the
completion of product spaces.
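Theorem 8.47 Let $\{(X_i, \mathcal{F}_i, \mu_i)\}_{i=1}^{n}$ be $\sigma$ finite measure spaces and let $\prod_{i=1}^{n} \mathcal{F}_i$ denote the smallest $\sigma$ algebra which contains the measurable boxes of the form $\prod_{i=1}^{n} A_i$ where $A_i \in \mathcal{F}_i$. Then there exists a measure, $\lambda$ defined on $\prod_{i=1}^{n} \mathcal{F}_i$ such that if $f : \prod_{i=1}^{n} X_i \to [0, \infty]$ is $\prod_{i=1}^{n} \mathcal{F}_i$ measurable, and $(i_1, \cdots, i_n)$ is any permutation of $(1, \cdots, n)$, then
$$\int f\,d\lambda = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f\,d\mu_{i_1} \cdots d\mu_{i_n}.$$
Let $\left(\prod_{i=1}^{n} X_i, \overline{\prod_{i=1}^{n} \mathcal{F}_i}, \overline{\lambda}\right)$ denote the completion of this product measure space and let
$$f : \prod_{i=1}^{n} X_i \to [0, \infty]$$
be $\overline{\prod_{i=1}^{n} \mathcal{F}_i}$ measurable. Then there exists $N \in \prod_{i=1}^{n} \mathcal{F}_i$ such that $\lambda(N) = 0$ and a nonnegative function, $f_1$ measurable with respect to $\prod_{i=1}^{n} \mathcal{F}_i$ such that $f_1 = f$ off $N$ and if $(i_1, \cdots, i_n)$ is any permutation of $(1, \cdots, n)$, then
$$\int f\,d\overline{\lambda} = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f_1\,d\mu_{i_1} \cdots d\mu_{i_n}.$$
Furthermore, $f_1$ may be chosen to satisfy either $f_1 \le f$ or $f_1 \ge f$.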
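Proof: This follows immediately from Theorem 8.46 and Theorem 8.40. By the second theorem, there exists a function $f_1 \ge f$ such that $f_1 = f$ for all $(x_1, \cdots, x_n) \notin N$, a set of $\prod_{i=1}^{n} \mathcal{F}_i$ having measure zero. Then by Theorem 8.39 and Theorem 8.46
$$\int f\,d\overline{\lambda} = \int f_1\,d\lambda = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f_1\,d\mu_{i_1} \cdots d\mu_{i_n}.$$
Since $f_1 = f$ off a set of measure zero, I will dispense with the subscript. Also it is customary to write
$$\lambda = \mu_1 \times \cdots \times \mu_n$$
and
$$\overline{\lambda} = \overline{\mu_1 \times \cdots \times \mu_n}.$$
Thus in more standard notation, one writes
$$\int f\,d\left(\overline{\mu_1 \times \cdots \times \mu_n}\right) = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f\,d\mu_{i_1} \cdots d\mu_{i_n}.$$
This theorem is often referred to as Fubini's theorem. The next theorem is also called this.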
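Corollary 8.48 Suppose $f \in L^1\left(\prod_{i=1}^{n} X_i, \overline{\prod_{i=1}^{n} \mathcal{F}_i}, \overline{\mu_1 \times \cdots \times \mu_n}\right)$ where each $X_i$ is a $\sigma$ finite measure space. Then if $(i_1, \cdots, i_n)$ is any permutation of $(1, \cdots, n)$, it follows
$$\int f\,d\left(\overline{\mu_1 \times \cdots \times \mu_n}\right) = \int_{X_{i_n}} \cdots \int_{X_{i_1}} f\,d\mu_{i_1} \cdots d\mu_{i_n}.$$
Proof: Just apply Theorem 8.47 to the positive and negative parts of the real and imaginary parts of $f$. This proves the corollary.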
Here is another easy corollary.
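Corollary 8.49 Suppose in the situation of Corollary 8.48, $f = f_1$ off $N$, a set of $\prod_{i=1}^{n} \mathcal{F}_i$ having $\mu_1 \times \cdots \times \mu_n$ measure zero and that $f_1$ is a complex valued function measurable with respect to $\prod_{i=1}^{n} \mathcal{F}_i$. Suppose also that for some permutation of $(1, 2, \cdots, n)$, $(j_1, \cdots, j_n)$,
$$\int_{X_{j_n}} \cdots \int_{X_{j_1}} |f_1|\,d\mu_{j_1} \cdots d\mu_{j_n} < \infty.$$
Then
$$f \in L^1\left(\prod_{i=1}^{n} X_i, \overline{\prod_{i=1}^{n} \mathcal{F}_i}, \overline{\mu_1 \times \cdots \times \mu_n}\right)$$
and the conclusion of Corollary 8.48 holds.
Proof: Since $|f_1|$ is $\prod_{i=1}^{n} \mathcal{F}_i$ measurable, it follows from Theorem 8.46 that
$$\begin{aligned} \infty > \int_{X_{j_n}} \cdots \int_{X_{j_1}} |f_1|\,d\mu_{j_1} \cdots d\mu_{j_n} &= \int |f_1|\,d(\mu_1 \times \cdots \times \mu_n) \\ &= \int |f_1|\,d\left(\overline{\mu_1 \times \cdots \times \mu_n}\right) \\ &= \int |f|\,d\left(\overline{\mu_1 \times \cdots \times \mu_n}\right). \end{aligned}$$
Thus $f \in L^1\left(\prod_{i=1}^{n} X_i, \overline{\prod_{i=1}^{n} \mathcal{F}_i}, \overline{\mu_1 \times \cdots \times \mu_n}\right)$ as claimed and the rest follows from Corollary 8.48. This proves the corollary.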
The following lemma is also useful.
Lemma 8.50 Let $(X, \mathcal{F}, \mu)$ and $(Y, \mathcal{S}, \nu)$ be $\sigma$ finite complete measure spaces and suppose $f \ge 0$ is $\overline{\mathcal{F} \times \mathcal{S}}$ measurable. Then for a.e. $x$,
$$y \to f(x, y)$$
is $\mathcal{S}$ measurable. Similarly for a.e. $y$,
$$x \to f(x, y)$$
is $\mathcal{F}$ measurable.
Proof: By Theorem 8.40, there exist $\mathcal{F} \times \mathcal{S}$ measurable functions, $g$ and $h$ and a set, $N \in \mathcal{F} \times \mathcal{S}$ of $\mu \times \nu$ measure zero such that $g \le f \le h$ and for $(x, y) \notin N$, it follows that $g(x, y) = h(x, y)$. Then
$$\int_X \int_Y g\,d\nu\,d\mu = \int_X \int_Y h\,d\nu\,d\mu$$
and so for a.e. $x$,
$$\int_Y g\,d\nu = \int_Y h\,d\nu.$$
Then it follows that for these values of $x$, $g(x, y) = h(x, y)$ for a.e. $y$ and so by Theorem 8.40 again and the assumption that $(Y, \mathcal{S}, \nu)$ is complete, $y \to f(x, y)$ is $\mathcal{S}$ measurable. The other claim is similar. This proves the lemma.
8.8 Disturbing Examples
There are examples which help to define what can be expected of product measures and Fubini type theorems. Three such examples are given in Rudin [38] and that is where I saw them.
Example 8.51 Let $\{a_n\}$ be an increasing sequence of numbers in $(0, 1)$ which converges to 1. Let $g_n \in C_c(a_n, a_{n+1})$ such that $\int g_n \, dx = 1$. Now for $(x, y) \in [0, 1) \times [0, 1)$ define
\[ f(x, y) \equiv \sum_{n=1}^{\infty} g_n(y) \left( g_n(x) - g_{n+1}(x) \right). \]
Note this is actually a finite sum for each such $(x, y)$. Therefore, this is a continuous function on $[0, 1) \times [0, 1)$. Now for a fixed $y$,
\[ \int_0^1 f(x, y) \, dx = \sum_{n=1}^{\infty} g_n(y) \int_0^1 \left( g_n(x) - g_{n+1}(x) \right) dx = 0 \]
showing that $\int_0^1 \int_0^1 f(x, y) \, dx \, dy = \int_0^1 0 \, dy = 0$. Next fix $x$.
\[ \int_0^1 f(x, y) \, dy = \sum_{n=1}^{\infty} \left( g_n(x) - g_{n+1}(x) \right) \int_0^1 g_n(y) \, dy = g_1(x). \]
Hence $\int_0^1 \int_0^1 f(x, y) \, dy \, dx = \int_0^1 g_1(x) \, dx = 1$. The iterated integrals are not equal. Note the function, $f$, is not nonnegative even though it is measurable. In addition, neither $\int_0^1 \int_0^1 |f(x, y)| \, dx \, dy$ nor $\int_0^1 \int_0^1 |f(x, y)| \, dy \, dx$ is finite and so you can't apply Corollary 8.49. The problem here is the function is not nonnegative and is not absolutely integrable.
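A discrete analogue may make the failure easier to see. Integrating the function of Example 8.51 over the cell $(a_j, a_{j+1}) \times (a_i, a_{i+1})$ (x in the first factor, y in the second) gives $+1$ when $j = i$, $-1$ when $j = i + 1$, and $0$ otherwise, because each $g_n$ has integral 1 and is supported in $(a_n, a_{n+1})$. The Python sketch below sums this array in the two possible orders. The function names are hypothetical and chosen only for this illustration.

```python
def cell(i, j):
    """Integral of f over x in (a_j, a_{j+1}) and y in (a_i, a_{i+1})."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0

def sum_x_then_y(n_rows):
    # For a fixed y-cell i only j = i and j = i + 1 contribute,
    # so each inner sum is the exact (infinite) sum over j.
    return sum(sum(cell(i, j) for j in range(1, i + 2))
               for i in range(1, n_rows + 1))

def sum_y_then_x(n_cols):
    # For a fixed x-cell j only i = j and i = j - 1 contribute,
    # so each inner sum is the exact (infinite) sum over i.
    return sum(sum(cell(i, j) for i in range(1, j + 1))
               for j in range(1, n_cols + 1))
```

Summing over x first gives 0 for every row, while summing over y first leaves the single uncancelled entry at $i = j = 1$, matching the values 0 and 1 of the two iterated integrals.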
Example 8.52 This time let $\mu = m$, Lebesgue measure on $[0, 1]$, and let $\lambda$ be counting measure on $[0, 1]$; in this case, the $\sigma$ algebra is $\mathcal{P}([0, 1])$. Let $l$ denote the line segment in $[0, 1] \times [0, 1]$ which goes from $(0, 0)$ to $(1, 1)$. Thus $l = \{(x, x) : x \in [0, 1]\}$. Consider the outer measure of $l$ in $m \times \lambda$. Let $l \subseteq \cup_k A_k \times B_k$ where $A_k$ is Lebesgue measurable and $B_k$ is a subset of $[0, 1]$. Let $\mathcal{B} \equiv \{k \in \mathbb{N} : \lambda(B_k) = \infty\}$. If $\cup_{k \in \mathcal{B}} A_k$ has measure zero, then there are uncountably many points of $[0, 1]$ outside of $\cup_{k \in \mathcal{B}} A_k$. For $p$ one of these points, $(p, p) \in A_i \times B_i$ and $i \notin \mathcal{B}$. Thus each of these points is in $\cup_{i \notin \mathcal{B}} B_i$, a countable set because these $B_i$ are each finite. But this is a contradiction because there need to be uncountably many of these points as just indicated. Thus $m(A_k) > 0$ for some $k \in \mathcal{B}$ and so $m \times \lambda (A_k \times B_k) = \infty$. It follows $m \times \lambda (l) = \infty$ and so $l$ is $m \times \lambda$ measurable. Thus $\int \mathcal{X}_l(x, y) \, d \, m \times \lambda = \infty$ and so you cannot apply Fubini's theorem, Theorem 8.47. Since $\lambda$ is not $\sigma$ finite, you cannot apply the corollary to this theorem either. Thus there is no contradiction to the above theorems in the following observation.
\[ \int \int \mathcal{X}_l(x, y) \, d\lambda \, dm = \int 1 \, dm = 1, \qquad \int \int \mathcal{X}_l(x, y) \, dm \, d\lambda = \int 0 \, d\lambda = 0. \]
The problem here is that you have neither $\int f \, d \, m \times \lambda < \infty$ nor $\sigma$ finite measure spaces.
The next example is far more exotic. It concerns the case where both iterated integrals make perfect sense but are unequal. In 1877 Cantor conjectured that the cardinality of the real numbers is the next size of infinity after countable infinity. This hypothesis is called the continuum hypothesis and it has never been proved or disproved (see footnote 2). Assuming this continuum hypothesis will provide the basis for the following example. It is due to Sierpinski.
Example 8.53 Let $X$ be an uncountable set. It follows from the well ordering theorem, which says every set can be well ordered and which is presented in the appendix, that $X$ can be well ordered. Let $\omega \in X$ be the first element of $X$ which is preceded by uncountably many points of $X$. Let $\Omega$ denote $\{x \in X : x < \omega\}$. Then $\Omega$ is uncountable but there is no smaller uncountable set. Thus by the continuum hypothesis, there exists a one to one and onto mapping, $j$, which maps $[0, 1]$ onto $\Omega$. Thus, for $x \in [0, 1]$, $j(x)$ is preceded by countably many points. Let
\[ Q \equiv \left\{ (x, y) \in [0, 1]^2 : j(x) < j(y) \right\} \]
and let $f(x, y) = \mathcal{X}_Q(x, y)$. Then
\[ \int_0^1 f(x, y) \, dy = 1, \qquad \int_0^1 f(x, y) \, dx = 0. \]
In each case, the integrals make sense. In the first, for fixed $x$, $f(x, y) = 1$ for all but countably many $y$ so the function of $y$ is Borel measurable. In the second, where $y$ is fixed, $f(x, y) = 0$ for all but countably many $x$. Thus
\[ \int_0^1 \int_0^1 f(x, y) \, dy \, dx = 1, \qquad \int_0^1 \int_0^1 f(x, y) \, dx \, dy = 0. \]
The problem here must be that $f$ is not $m \times m$ measurable.
Footnote 2: In 1940 it was shown by Gödel that the continuum hypothesis cannot be disproved. In 1963 it was shown by Cohen that the continuum hypothesis cannot be proved. These assertions are based on the axiom of choice and the Zermelo-Fraenkel axioms of set theory. This topic is far outside the scope of this book and this is only a hopefully interesting historical observation.
8.9 Exercises
1. Let $\Omega = \mathbb{N}$, the natural numbers, and let $d(p, q) = |p - q|$, the usual distance in $\mathbb{R}$. Show that in $(\Omega, d)$ the closures of the balls are compact. Now let $\Lambda f \equiv \sum_{k=1}^{\infty} f(k)$ whenever $f \in C_c(\Omega)$. Show this is a well defined positive linear functional on the space $C_c(\Omega)$. Describe the measure of the Riesz representation theorem which results from this positive linear functional. What if $\Lambda(f) = f(1)$? What measure would result from this functional? Which functions are measurable?
2. Verify that the set function defined in Lemma 8.8 is an outer measure.
3. Let $F : \mathbb{R} \to \mathbb{R}$ be increasing and right continuous. Let $\Lambda f \equiv \int f \, dF$ where the integral is the Riemann Stieltjes integral of $f \in C_c(\mathbb{R})$. Show the measure $\mu$ from the Riesz representation theorem satisfies
\[ \mu([a, b]) = F(b) - F(a-), \quad \mu((a, b]) = F(b) - F(a), \quad \mu([a, a]) = F(a) - F(a-). \]
Hint: You might want to review the material on Riemann Stieltjes integrals presented in the Preliminary part of the notes.
4. Suppose $\Omega$ is a metric space and $\mu, \nu$ are two Borel measures with the property that they are finite on every ball and that they are equal on every open set. Show they must be equal on every Borel set. Hint: Let $\mathcal{G}$ denote those Borel sets $E$ such that $\mu(E \cap B) = \nu(E \cap B)$ for $B$ an open ball. Show $\mathcal{G}$ is closed with respect to countable disjoint unions and complements and contains the $\pi$ system consisting of the open sets. Then consider the lemma on $\pi$ systems. Let $B = B(p, n)$, $n = 1, 2, \cdots$.
5. Let $\Omega$ be a metric space with the closed balls compact and suppose $\mu$ is a measure defined on the Borel sets of $\Omega$ which is finite on compact sets. Show there exists a unique Radon measure, $\overline{\mu}$, which equals $\mu$ on the Borel sets.
6. Random vectors are measurable functions, $X$, mapping a probability space, $(\Omega, P, \mathcal{F})$, to $\mathbb{R}^n$. Thus $X(\omega) \in \mathbb{R}^n$ for each $\omega \in \Omega$ and $P$ is a probability measure defined on the sets of $\mathcal{F}$, a $\sigma$ algebra of subsets of $\Omega$. For $E$ a Borel set in $\mathbb{R}^n$, define
\[ \mu(E) \equiv P\left( X^{-1}(E) \right) \equiv \text{probability that } X \in E. \]
Show this is a well defined measure on the Borel sets of $\mathbb{R}^n$ and use Problem 5 to obtain a Radon measure, $\lambda_X$, defined on a $\sigma$ algebra of sets of $\mathbb{R}^n$ including the Borel sets such that for $E$ a Borel set, $\lambda_X(E) =$ Probability that $(X \in E)$.
7. Suppose $X$ and $Y$ are metric spaces having compact closed balls. Show
\[ (X \times Y, d_{X \times Y}) \]
is also a metric space which has the closures of balls compact. Here
\[ d_{X \times Y}((x_1, y_1), (x_2, y_2)) \equiv \max \left( d(x_1, x_2), d(y_1, y_2) \right). \]
Let
\[ \mathcal{A} \equiv \{ E \times F : E \text{ is a Borel set in } X, \ F \text{ is a Borel set in } Y \}. \]
Show $\sigma(\mathcal{A})$, the smallest $\sigma$ algebra containing $\mathcal{A}$, contains the Borel sets. Hint: Show every open set in a metric space which has closed balls compact can be obtained as a countable union of compact sets. Next show this implies every open set can be obtained as a countable union of open sets of the form $U \times V$ where $U$ is open in $X$ and $V$ is open in $Y$.
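8. Suppose $(\Omega, \mathcal{S}, \mu)$ is a measure space which may not be complete. Could you obtain a complete measure space, $(\Omega, \overline{\mathcal{S}}, \mu_1)$, by simply letting $\overline{\mathcal{S}}$ consist of all sets of the form $E$ where there exists $F \in \mathcal{S}$ such that $(F \setminus E) \cup (E \setminus F) \subseteq N$ for some $N \in \mathcal{S}$ which has measure zero and then let $\mu(E) = \mu_1(F)$? Explain.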
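9. Let $(\Omega, \mathcal{S}, \mu)$ be a $\sigma$ finite measure space and let $f : \Omega \to [0, \infty)$ be measurable. Define
\[ A \equiv \{ (x, y) : y < f(x) \}. \]
Verify that $A$ is $\mu \times m$ measurable. Show that
\[ \int f \, d\mu = \int \int \mathcal{X}_A(x, y) \, d\mu \, dm = \int \mathcal{X}_A \, d \, \mu \times m. \]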
10. For $f$ a nonnegative measurable function, it was shown that
\[ \int f \, d\mu = \int \mu([f > t]) \, dt. \]
Would it work the same if you used $\int \mu([f \geq t]) \, dt$? Explain.
11. The Riemann integral is only defined for functions which are bounded and which are defined on a bounded interval. If either of these two criteria is not satisfied, then the integral is not the Riemann integral. Suppose $f$ is Riemann integrable on a bounded interval, $[a, b]$. Show that it must also be Lebesgue integrable with respect to one dimensional Lebesgue measure and the two integrals coincide. Give a theorem in which the improper Riemann integral coincides with a suitable Lebesgue integral. (There are many such situations; just find one.) Note that $\int_0^{\infty} \frac{\sin x}{x} \, dx$ is a valid improper Riemann integral but is not a Lebesgue integral. Why?
12. Suppose $\mu$ is a finite measure defined on the Borel subsets of $X$ where $X$ is a separable metric space. Show that $\mu$ is necessarily regular. Hint: First show $\mu$ is outer regular on closed sets in the sense that for $H$ closed,
\[ \mu(H) = \inf \{ \mu(V) : V \supseteq H \text{ and } V \text{ is open} \}. \]
Then show that for every open set, $V$,
\[ \mu(V) = \sup \{ \mu(H) : H \subseteq V \text{ and } H \text{ is closed} \}. \]
Next let $\mathcal{F}$ consist of those sets for which $\mu$ is outer regular and also inner regular with closed replacing compact in the definition of inner regular. Finally show that if $C$ is a closed set, then
\[ \mu(C) = \sup \{ \mu(K) : K \subseteq C \text{ and } K \text{ is compact} \}. \]
To do this, consider a countable dense subset of $C$, $\{a_n\}$, and let
\[ C_n = \cup_{k=1}^{m_n} \overline{B\left( a_k, \frac{1}{n} \right)} \cap C. \]
Show you can choose $m_n$ such that
\[ \mu(C \setminus C_n) < \varepsilon / 2^n. \]
Then consider $K \equiv \cap_n C_n$.
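13. Let $(\Omega, \mathcal{F}, \mu)$ be a finite measure space and suppose $\{f_n\}$ is a sequence of nonnegative functions which satisfy $f_n(\omega) \leq C$ independent of $n, \omega$. Suppose also this sequence converges to 0 in measure. That is, for all $\varepsilon > 0$,
\[ \lim_{n \to \infty} \mu([f_n \geq \varepsilon]) = 0. \]
Show that then
\[ \lim_{n \to \infty} \int_{\Omega} f_n(\omega) \, d\mu = 0. \]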
Lebesgue Measure
9.1 Basic Properties
Definition 9.1 Define the following positive linear functional for $f \in C_c(\mathbb{R}^n)$.
\[ \Lambda f \equiv \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x) \, dx_1 \cdots dx_n. \]
Then the measure representing this functional is Lebesgue measure.
The following lemma will help in understanding Lebesgue measure.
Lemma 9.2 Every open set in $\mathbb{R}^n$ is the countable disjoint union of half open boxes of the form
\[ \prod_{i=1}^n (a_i, a_i + 2^{-k}] \]
where $a_i = l 2^{-k}$ for some integers, $l, k$. The sides of these boxes are equal.
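Proof: Let
\[ \mathcal{C}_k = \left\{ \text{All half open boxes } \prod_{i=1}^n (a_i, a_i + 2^{-k}] \text{ where } a_i = l 2^{-k} \text{ for some integer } l \right\}. \]
Thus $\mathcal{C}_k$ consists of a countable disjoint collection of boxes whose union is $\mathbb{R}^n$. This is sometimes called a tiling of $\mathbb{R}^n$. Think of tiles on the floor of a bathroom and you will get the idea. Note that each box has diameter no larger than $2^{-k} \sqrt{n}$. This is because if
\[ x, y \in \prod_{i=1}^n (a_i, a_i + 2^{-k}], \]
then $|x_i - y_i| \leq 2^{-k}$. Therefore,
\[ |x - y| \leq \left( \sum_{i=1}^n \left( 2^{-k} \right)^2 \right)^{1/2} = 2^{-k} \sqrt{n}. \]
Let $U$ be open and let $\mathcal{B}_1 \equiv$ all sets of $\mathcal{C}_1$ which are contained in $U$. If $\mathcal{B}_1, \cdots, \mathcal{B}_k$ have been chosen, $\mathcal{B}_{k+1} \equiv$ all sets of $\mathcal{C}_{k+1}$ contained in
\[ U \setminus \cup \left( \cup_{i=1}^k \mathcal{B}_i \right). \]
Let $\mathcal{B}_{\infty} = \cup_{i=1}^{\infty} \mathcal{B}_i$. In fact $\cup \mathcal{B}_{\infty} = U$. Clearly $\cup \mathcal{B}_{\infty} \subseteq U$ because every box of every $\mathcal{B}_i$ is contained in $U$. If $p \in U$, let $k$ be the smallest integer such that $p$ is contained in a box from $\mathcal{C}_k$ which is also a subset of $U$. Thus
\[ p \in \cup \mathcal{B}_k \subseteq \cup \mathcal{B}_{\infty}. \]
Hence $\mathcal{B}_{\infty}$ is the desired countable disjoint collection of half open boxes whose union is $U$. This proves the lemma.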
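The selection in this proof is constructive, so it can be tried out directly. The following Python sketch runs it in dimension one for the open interval $U = (lo, hi)$, stopping at a finite level. The function name and the truncation parameter `max_level` are assumptions made for this illustration only; the lemma itself uses all levels.

```python
from fractions import Fraction
from math import ceil, floor

def dyadic_cover(lo, hi, max_level):
    """Half open dyadic intervals (a, b] chosen as in Lemma 9.2 for U = (lo, hi)."""
    chosen = []  # each entry (a, b) stands for the half open interval (a, b]
    for k in range(1, max_level + 1):
        h = Fraction(1, 2 ** k)
        # candidate left endpoints a = l * 2^-k near U
        for l in range(ceil(lo / h), floor(hi / h) + 1):
            a = l * h
            b = a + h
            # (a, b] lies inside the open interval (lo, hi) iff lo <= a and b < hi
            if not (lo <= a and b < hi):
                continue
            # dyadic intervals are nested or disjoint, so it is enough to
            # skip those contained in an interval chosen at a coarser level
            if any(c <= a and b <= d for (c, d) in chosen):
                continue
            chosen.append((a, b))
    return chosen
```

For `dyadic_cover(Fraction(1, 10), Fraction(9, 10), 10)` the chosen intervals are pairwise disjoint, and their total length differs from the length of $U$ by at most $2 \cdot 2^{-10}$, since only a piece shorter than one level-10 interval can be missed at each end.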
Now what does Lebesgue measure do to a rectangle, $\prod_{i=1}^n (a_i, b_i]$?
Lemma 9.3 Let $R = \prod_{i=1}^n [a_i, b_i]$, $R_0 = \prod_{i=1}^n (a_i, b_i)$. Then
\[ m_n(R_0) = m_n(R) = \prod_{i=1}^n (b_i - a_i). \]
Proof: Let $k$ be large enough that
\[ a_i + 1/k < b_i - 1/k \]
for $i = 1, \cdots, n$ and consider functions $g_i^k$ and $f_i^k$ having the following graphs.
[Figure: graph of $f_i^k$ with marked points $a_i$, $a_i + 1/k$, $b_i - 1/k$, $b_i$ and height 1; graph of $g_i^k$ with marked points $a_i - 1/k$, $a_i$, $b_i$, $b_i + 1/k$ and height 1.]
Let
\[ g^k(x) = \prod_{i=1}^n g_i^k(x_i), \qquad f^k(x) = \prod_{i=1}^n f_i^k(x_i). \]
Then by elementary calculus along with the definition of $\Lambda$,
\[ \prod_{i=1}^n (b_i - a_i + 2/k) \geq \Lambda g^k = \int g^k \, dm_n \geq m_n(R) \geq m_n(R_0) \geq \int f^k \, dm_n = \Lambda f^k \geq \prod_{i=1}^n (b_i - a_i - 2/k). \]
Letting $k \to \infty$, it follows that
\[ m_n(R) = m_n(R_0) = \prod_{i=1}^n (b_i - a_i). \]
Lemma 9.4 Let $U$ be an open or closed set. Then $m_n(U) = m_n(x + U)$.
Proof: By Lemma 9.2 there is a sequence of disjoint half open rectangles, $\{R_i\}$, such that $\cup_i R_i = U$. Therefore, $x + U = \cup_i (x + R_i)$ and the $x + R_i$ are also disjoint rectangles which are identical to the $R_i$ but translated. From Lemma 9.3,
\[ m_n(U) = \sum_i m_n(R_i) = \sum_i m_n(x + R_i) = m_n(x + U). \]
It remains to verify the lemma for a closed set. Let $H$ be a closed bounded set first. Then $H \subseteq B(0, R)$ for some $R$ large enough. First note that $x + H$ is a closed set. Thus
\begin{align*}
m_n(B(x, R)) &= m_n(x + H) + m_n((B(0, R) + x) \setminus (x + H)) \\
&= m_n(x + H) + m_n((B(0, R) \setminus H) + x) \\
&= m_n(x + H) + m_n(B(0, R) \setminus H) \\
&= m_n(B(0, R)) - m_n(H) + m_n(x + H) \\
&= m_n(B(x, R)) - m_n(H) + m_n(x + H)
\end{align*}
the last equality because of the first part of the lemma which implies $m_n(B(x, R)) = m_n(B(0, R))$. Therefore, $m_n(x + H) = m_n(H)$ as claimed. If $H$ is not bounded, consider $H_m \equiv \overline{B(0, m)} \cap H$. Then $m_n(x + H_m) = m_n(H_m)$. Passing to the limit as $m \to \infty$ yields the result in general.
Theorem 9.5 Lebesgue measure is translation invariant. That is
\[ m_n(E) = m_n(x + E) \]
for all $E$ Lebesgue measurable.
Proof: Suppose $m_n(E) < \infty$. By regularity of the measure, there exist sets $G, H$ such that $G$ is a countable intersection of open sets, $H$ is a countable union of compact sets, $m_n(G \setminus H) = 0$, and $G \supseteq E \supseteq H$. Now $m_n(G) = m_n(G + x)$ and $m_n(H) = m_n(H + x)$ which follows from Lemma 9.4 applied to the sets which are either intersected to form $G$ or unioned to form $H$. Now
\[ x + H \subseteq x + E \subseteq x + G \]
and both $x + H$ and $x + G$ are measurable because they are either countable unions or countable intersections of measurable sets. Furthermore,
\[ m_n(x + G \setminus x + H) = m_n(x + G) - m_n(x + H) = m_n(G) - m_n(H) = 0 \]
and so by completeness of the measure, $x + E$ is measurable. It follows
\[ m_n(E) = m_n(H) = m_n(x + H) \leq m_n(x + E) \leq m_n(x + G) = m_n(G) = m_n(E). \]
If $m_n(E)$ is not necessarily less than $\infty$, consider $E_m \equiv B(0, m) \cap E$. Then $m_n(E_m) = m_n(E_m + x)$ by the above. Letting $m \to \infty$ it follows $m_n(E) = m_n(E + x)$. This proves the theorem.
Corollary 9.6 Let $D$ be an $n \times n$ diagonal matrix and let $U$ be an open set. Then
\[ m_n(DU) = |\det(D)| \, m_n(U). \]
Proof: If any of the diagonal entries of $D$ equals 0 there is nothing to prove because then both sides equal zero. Therefore, it can be assumed none are equal to zero. Suppose these diagonal entries are $k_1, \cdots, k_n$. From Lemma 9.2 there exist half open boxes, $\{R_i\}$, having all sides equal such that $U = \cup_i R_i$. Suppose one of these is $R_i = \prod_{j=1}^n (a_j, b_j]$, where $b_j - a_j = l_i$. Then $DR_i = \prod_{j=1}^n I_j$ where $I_j = (k_j a_j, k_j b_j]$ if $k_j > 0$ and $I_j = [k_j b_j, k_j a_j)$ if $k_j < 0$. Then the rectangles, $DR_i$, are disjoint because $D$ is one to one and their union is $DU$. Also,
\[ m_n(DR_i) = \prod_{j=1}^n |k_j| \, l_i = |\det D| \, m_n(R_i). \]
Therefore,
\[ m_n(DU) = \sum_{i=1}^{\infty} m_n(DR_i) = |\det(D)| \sum_{i=1}^{\infty} m_n(R_i) = |\det(D)| \, m_n(U) \]
and this proves the corollary.
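As a small sanity check of Corollary 9.6 in the plane, one can take $U$ to be the interior of a simple polygon, whose Lebesgue measure is the elementary area given by the shoelace formula. The Python sketch below compares the area before and after applying a diagonal matrix with entries $k_1, k_2$. The helper names are hypothetical, and the polygon is only one convenient choice of open set.

```python
from fractions import Fraction

def polygon_area(vertices):
    """Area of a simple polygon with vertices (x, y) listed in order (shoelace formula)."""
    n = len(vertices)
    twice_signed = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(Fraction(twice_signed, 2))

def apply_diagonal(k1, k2, vertices):
    """Image of each vertex under the diagonal matrix with diagonal entries k1, k2."""
    return [(k1 * x, k2 * y) for x, y in vertices]
```

With the quadrilateral `[(0, 0), (4, 0), (5, 3), (1, 2)]` and diagonal entries $-3$ and $1/2$, the two areas differ by the factor $|(-3)(1/2)| = 3/2$, as the corollary predicts. A negative entry reverses orientation, which is why the absolute value of the determinant appears.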
From this the following corollary is obtained.
Corollary 9.7 Let $M > 0$. Then $m_n(B(a, Mr)) = M^n m_n(B(0, r))$.
Proof: By Lemma 9.4 there is no loss of generality in taking $a = 0$. Let $D$ be the diagonal matrix which has $M$ in every entry of the main diagonal so $|\det(D)| = M^n$. Note that $DB(0, r) = B(0, Mr)$. By Corollary 9.6
\[ m_n(B(0, Mr)) = m_n(DB(0, r)) = M^n m_n(B(0, r)). \]
There are many norms on $\mathbb{R}^n$. Other common examples are
\[ ||x||_{\infty} \equiv \max \{ |x_k| : x = (x_1, \cdots, x_n) \} \]
or
\[ ||x||_p \equiv \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}. \]
With $||\cdot||$ any norm for $\mathbb{R}^n$ you can define a corresponding ball in terms of this norm.
\[ B(a, r) \equiv \{ x \in \mathbb{R}^n \text{ such that } ||x - a|| < r \} \]
It follows from general considerations involving metric spaces presented earlier that these balls are open sets. Therefore, Corollary 9.7 has an obvious generalization.
Corollary 9.8 Let $||\cdot||$ be a norm on $\mathbb{R}^n$. Then for $M > 0$, $m_n(B(a, Mr)) = M^n m_n(B(0, r))$ where these balls are defined in terms of the norm $||\cdot||$.
9.2 The Vitali Covering Theorem
The Vitali covering theorem is concerned with the situation in which a set is con-
tained in the union of balls. You can imagine that it might be very hard to get
disjoint balls from this collection of balls which would cover the given set. How-
ever, it is possible to get disjoint balls from this collection of balls which have the
property that if each ball is enlarged appropriately, the resulting enlarged balls do
cover the set. When this result is established, it is used to prove another form of
this theorem in which the disjoint balls do not cover the set but they only miss a
set of measure zero.
Recall the Hausdorff maximal principle, Theorem 1.13 on Page 20, which is proved to be equivalent to the axiom of choice in the appendix. For convenience, here it is:
Theorem 9.9 (Hausdorff Maximal Principle) Let $\mathcal{F}$ be a nonempty partially ordered set. Then there exists a maximal chain.
I will use this Hausdorff maximal principle to give a very short and elegant proof of the Vitali covering theorem. This follows the treatment in Evans and Gariepy [19] which they got from another book. I am not sure who first did it this way but it is very nice because it is so short. In the following lemma and theorem, the balls will be either open or closed and determined by some norm on $\mathbb{R}^n$. When pictures are drawn, I shall draw them as though the norm is the usual norm but the results are unchanged for any norm. Also, I will write (in this section only) $B(a, r)$ to indicate a set which satisfies
\[ \{ x \in \mathbb{R}^n : ||x - a|| < r \} \subseteq B(a, r) \subseteq \{ x \in \mathbb{R}^n : ||x - a|| \leq r \} \]
and $\widehat{B}(a, r)$ to indicate the usual ball but with radius 5 times as large,
\[ \{ x \in \mathbb{R}^n : ||x - a|| < 5r \}. \]
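Lemma 9.10 Let $||\cdot||$ be a norm on $\mathbb{R}^n$ and let $\mathcal{F}$ be a collection of balls determined by this norm. Suppose
\[ \infty > M \equiv \sup \{ r : B(p, r) \in \mathcal{F} \} > 0 \]
and $k \in (0, \infty)$. Then there exists $\mathcal{G} \subseteq \mathcal{F}$ such that
\[ \text{if } B(p, r) \in \mathcal{G} \text{ then } r > k, \tag{9.1} \]
\[ \text{if } B_1, B_2 \in \mathcal{G} \text{ then } B_1 \cap B_2 = \emptyset, \tag{9.2} \]
$\mathcal{G}$ is maximal with respect to 9.1 and 9.2.
Note that if there is no ball of $\mathcal{F}$ which has radius larger than $k$ then $\mathcal{G} = \emptyset$.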
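Proof: Let $H = \{ \mathcal{B} \subseteq \mathcal{F} \text{ such that 9.1 and 9.2 hold} \}$. If there are no balls with radius larger than $k$ then $H = \emptyset$ and you let $\mathcal{G} = \emptyset$. In the other case, $H \neq \emptyset$ because there exists $B(p, r) \in \mathcal{F}$ with $r > k$. In this case, partially order $H$ by set inclusion and use the Hausdorff maximal principle (see the appendix on set theory) to let $\mathcal{C}$ be a maximal chain in $H$. Clearly $\cup \mathcal{C}$ satisfies 9.1 and 9.2 because if $B_1$ and $B_2$ are two balls from $\cup \mathcal{C}$ then since $\mathcal{C}$ is a chain, it follows there is some element of $\mathcal{C}$, $\mathcal{B}$, such that both $B_1$ and $B_2$ are elements of $\mathcal{B}$ and $\mathcal{B}$ satisfies 9.1 and 9.2. If $\cup \mathcal{C}$ is not maximal with respect to these two properties, then $\mathcal{C}$ was not a maximal chain because then there would exist $\mathcal{B} \supsetneq \cup \mathcal{C}$, that is, $\mathcal{B}$ contains $\cup \mathcal{C}$ as a proper subset, and $\{ \mathcal{C}, \mathcal{B} \}$ would be a strictly larger chain in $H$. Let $\mathcal{G} = \cup \mathcal{C}$.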
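Theorem 9.11 (Vitali) Let $\mathcal{F}$ be a collection of balls and let
\[ A \equiv \cup \{ B : B \in \mathcal{F} \}. \]
Suppose
\[ \infty > M \equiv \sup \{ r : B(p, r) \in \mathcal{F} \} > 0. \]
Then there exists $\mathcal{G} \subseteq \mathcal{F}$ such that $\mathcal{G}$ consists of disjoint balls and
\[ A \subseteq \cup \{ \widehat{B} : B \in \mathcal{G} \}. \]
Proof: Using Lemma 9.10, there exists $\mathcal{G}_1 \subseteq \mathcal{F} \equiv \mathcal{F}_0$ which satisfies
\[ B(p, r) \in \mathcal{G}_1 \text{ implies } r > \frac{M}{2}, \tag{9.3} \]
\[ B_1, B_2 \in \mathcal{G}_1 \text{ implies } B_1 \cap B_2 = \emptyset, \tag{9.4} \]
$\mathcal{G}_1$ is maximal with respect to 9.3 and 9.4.
Suppose $\mathcal{G}_1, \cdots, \mathcal{G}_m$ have been chosen, $m \geq 1$. Let
\[ \mathcal{F}_m \equiv \{ B \in \mathcal{F} : B \subseteq \mathbb{R}^n \setminus \cup \{ \mathcal{G}_1 \cup \cdots \cup \mathcal{G}_m \} \}. \]
Using Lemma 9.10, there exists $\mathcal{G}_{m+1} \subseteq \mathcal{F}_m$ such that
\[ B(p, r) \in \mathcal{G}_{m+1} \text{ implies } r > \frac{M}{2^{m+1}}, \tag{9.5} \]
\[ B_1, B_2 \in \mathcal{G}_{m+1} \text{ implies } B_1 \cap B_2 = \emptyset, \tag{9.6} \]
$\mathcal{G}_{m+1}$ is a maximal subset of $\mathcal{F}_m$ with respect to 9.5 and 9.6.
Note it might be the case that $\mathcal{G}_{m+1} = \emptyset$ which happens if $\mathcal{F}_m = \emptyset$. Define
\[ \mathcal{G} \equiv \cup_{k=1}^{\infty} \mathcal{G}_k. \]
Thus $\mathcal{G}$ is a collection of disjoint balls in $\mathcal{F}$. I must show $\{ \widehat{B} : B \in \mathcal{G} \}$ covers $A$.
Let $x \in B(p, r) \in \mathcal{F}$ and let
\[ \frac{M}{2^m} < r \leq \frac{M}{2^{m-1}}. \]
Then $B(p, r)$ must intersect some set, $B(p_0, r_0) \in \mathcal{G}_1 \cup \cdots \cup \mathcal{G}_m$ since otherwise, $\mathcal{G}_m$ would fail to be maximal. Then $r_0 > \frac{M}{2^m}$ because all balls in $\mathcal{G}_1 \cup \cdots \cup \mathcal{G}_m$ satisfy this inequality.
[Figure: the ball centered at $p_0$ with radius $r_0$ intersecting the ball centered at $p$ with radius $r$, which contains the point $x$.]
Then for $x \in B(p, r)$, the following chain of inequalities holds because $r \leq \frac{M}{2^{m-1}}$ and $r_0 > \frac{M}{2^m}$
\[ |x - p_0| \leq |x - p| + |p - p_0| \leq r + r_0 + r \leq \frac{2M}{2^{m-1}} + r_0 = \frac{4M}{2^m} + r_0 < 5 r_0. \]
Thus $B(p, r) \subseteq \widehat{B}(p_0, r_0)$ and this proves the theorem.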
9.3 The Vitali Covering Theorem (Elementary Version)
The proof given here is from Basic Analysis [31]. It first considers the case of open balls and then generalizes to balls which may be closed, or neither open nor closed.
Lemma 9.12 Let $\mathcal{F}$ be a countable collection of balls satisfying
\[ \infty > M \equiv \sup \{ r : B(p, r) \in \mathcal{F} \} > 0 \]
and let $k \in (0, \infty)$. Then there exists $\mathcal{G} \subseteq \mathcal{F}$ such that
\[ \text{If } B(p, r) \in \mathcal{G} \text{ then } r > k, \tag{9.7} \]
\[ \text{If } B_1, B_2 \in \mathcal{G} \text{ then } B_1 \cap B_2 = \emptyset, \tag{9.8} \]
\[ \mathcal{G} \text{ is maximal with respect to 9.7 and 9.8.} \tag{9.9} \]
Proof: If no ball of $\mathcal{F}$ has radius larger than $k$, let $\mathcal{G} = \emptyset$. Assume therefore, that some balls have radius larger than $k$. Let $\mathcal{F} \equiv \{B_i\}_{i=1}^{\infty}$. Now let $B_{n_1}$ be the first ball in the list which has radius greater than $k$. If every ball having radius larger than $k$ intersects this one, then stop. The maximal set is just $B_{n_1}$. Otherwise, let $B_{n_2}$ be the next ball having radius larger than $k$ which is disjoint from $B_{n_1}$. Continue this way obtaining $\{B_{n_i}\}_{i=1}^{\infty}$, a finite or infinite sequence of disjoint balls having radius larger than $k$. Then let $\mathcal{G} \equiv \{B_{n_i}\}$. To see that $\mathcal{G}$ is maximal with respect to 9.7 and 9.8, suppose $B \in \mathcal{F}$, $B$ has radius larger than $k$, and $\mathcal{G} \cup \{B\}$ satisfies 9.7 and 9.8. Then at some point in the process, $B$ would have been chosen because it would be the ball of radius larger than $k$ which has the smallest index. Therefore, $B \in \mathcal{G}$ and this shows $\mathcal{G}$ is maximal with respect to 9.7 and 9.8.
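The selection in this proof is a greedy pass through the list, which is easy to try on a finite list of balls. The sketch below does this for open intervals on the line, each given as a pair (center, radius). It is only an illustration under these assumptions: the lemma allows countably infinite lists and any norm on $\mathbb{R}^n$, and the function names are hypothetical.

```python
def meet(b1, b2):
    """True when the open intervals (p - r, p + r) and (q - s, q + s) intersect."""
    (p, r), (q, s) = b1, b2
    return abs(p - q) < r + s

def greedy_disjoint(balls, k):
    """Indices of the balls picked by the procedure in the proof of Lemma 9.12."""
    chosen = []
    for idx, ball in enumerate(balls):
        # take the next ball of radius larger than k that misses all earlier picks
        if ball[1] > k and not any(meet(ball, balls[c]) for c in chosen):
            chosen.append(idx)
    return chosen

def is_maximal(balls, k, chosen):
    """Every ball of radius larger than k is chosen or meets a chosen ball."""
    return all(
        i in chosen or any(meet(b, balls[c]) for c in chosen)
        for i, b in enumerate(balls)
        if b[1] > k
    )
```

For the list `[(0, 1), (1.5, 1), (3, 0.2), (4, 1), (10, 0.05)]` with `k = 0.1`, the pass keeps the first and third balls. The second and fourth each meet a ball already kept, and the last has radius below the threshold.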
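For the next lemma, for an open ball, $B = B(x, r)$, denote by $\widetilde{B}$ the open ball, $B(x, 4r)$.
Lemma 9.13 Let $\mathcal{F}$ be a collection of open balls, and let
\[ A \equiv \cup \{ B : B \in \mathcal{F} \}. \]
Suppose
\[ \infty > M \equiv \sup \{ r : B(p, r) \in \mathcal{F} \} > 0. \]
Then there exists $\mathcal{G} \subseteq \mathcal{F}$ such that $\mathcal{G}$ consists of disjoint balls and
\[ A \subseteq \cup \{ \widetilde{B} : B \in \mathcal{G} \}. \]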
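Proof: Without loss of generality assume $\mathcal{F}$ is countable. This is because there is a countable subset of $\mathcal{F}$, $\widetilde{\mathcal{F}}$, such that $\cup \widetilde{\mathcal{F}} = A$. To see this, consider the set of balls having rational radii and centers having all components rational. This is a countable set of balls and you should verify that every open set is the union of balls of this form. Therefore, you can consider the subset of this set of balls consisting of those which are contained in some open set of $\mathcal{F}$, $\mathcal{G}$, so $\cup \mathcal{G} = A$, and use the axiom of choice to define a subset of $\mathcal{F}$ consisting of a single set from $\mathcal{F}$ containing each set of $\mathcal{G}$. Then this is $\widetilde{\mathcal{F}}$. The union of these sets equals $A$. Then consider $\widetilde{\mathcal{F}}$ instead of $\mathcal{F}$. Therefore, assume at the outset $\mathcal{F}$ is countable. By Lemma 9.12, there exists $\mathcal{G}_1 \subseteq \mathcal{F}$ which satisfies 9.7, 9.8, and 9.9 with $k = \frac{2M}{3}$.
Suppose $\mathcal{G}_1, \cdots, \mathcal{G}_{m-1}$ have been chosen for $m \geq 2$. Let
\[ \mathcal{F}_m = \Big\{ B \in \mathcal{F} : B \subseteq \mathbb{R}^n \setminus \underbrace{\cup \{ \mathcal{G}_1 \cup \cdots \cup \mathcal{G}_{m-1} \}}_{\text{union of the balls in these } \mathcal{G}_j} \Big\} \]
and using Lemma 9.12, let $\mathcal{G}_m$ be a maximal collection of disjoint balls from $\mathcal{F}_m$ with the property that each ball has radius larger than $\left( \frac{2}{3} \right)^m M$. Let $\mathcal{G} \equiv \cup_{k=1}^{\infty} \mathcal{G}_k$. Let $x \in B(p, r) \in \mathcal{F}$. Choose $m$ such that
\[ \left( \frac{2}{3} \right)^m M < r \leq \left( \frac{2}{3} \right)^{m-1} M. \]
Then $B(p, r)$ must have nonempty intersection with some ball from $\mathcal{G}_1 \cup \cdots \cup \mathcal{G}_m$ because if it didn't, then $\mathcal{G}_m$ would fail to be maximal. Denote by $B(p_0, r_0)$ a ball in $\mathcal{G}_1 \cup \cdots \cup \mathcal{G}_m$ which has nonempty intersection with $B(p, r)$. Thus
\[ r_0 > \left( \frac{2}{3} \right)^m M. \]
Consider the picture, in which $w \in B(p_0, r_0) \cap B(p, r)$.
[Figure: the ball centered at $p_0$ with radius $r_0$ and the ball centered at $p$ with radius $r$ containing $x$; the point $w$ lies in their intersection.]
Then
\[ |x - p_0| \leq |x - p| + |p - w| + \overbrace{|w - p_0|}^{< r_0} < r + r + r_0 \leq 2 \overbrace{\left( \frac{2}{3} \right)^{m-1} M}^{< \frac{3}{2} r_0} + r_0 < 2 \left( \frac{3}{2} \right) r_0 + r_0 = 4 r_0. \]
This proves the lemma since it shows $B(p, r) \subseteq B(p_0, 4 r_0)$.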
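The level by level selection in this proof can also be run on a finite collection. The sketch below does so for open intervals on the line, given as (center, radius) pairs with exact rational entries, and then checks that the fourfold enlargements of the chosen intervals contain every interval of the collection. It is an illustration under these assumptions only, and the function names are hypothetical.

```python
from fractions import Fraction

def meet(b1, b2):
    """True when the open intervals (p - r, p + r) and (q - s, q + s) intersect."""
    (p, r), (q, s) = b1, b2
    return abs(p - q) < r + s

def layered_selection(balls):
    """Finite-collection version of the selection in the proof of Lemma 9.13."""
    M = max(r for _, r in balls)
    chosen, remaining, m = [], list(balls), 1
    while remaining:
        k = Fraction(2, 3) ** m * M
        # F_m: the balls that miss every ball chosen at earlier levels
        level = [b for b in remaining if not any(meet(b, c) for c in chosen)]
        for b in level:
            # greedy step of Lemma 9.12 with threshold k
            if b[1] > k and not any(meet(b, c) for c in chosen):
                chosen.append(b)
        remaining = [b for b in level if b not in chosen]
        m += 1
    return chosen

def covered_by_enlargements(balls, chosen, factor=4):
    # B(p, r) lies inside B(p0, factor * r0) exactly when |p - p0| + r <= factor * r0
    return all(
        any(abs(p - p0) + r <= factor * r0 for p0, r0 in chosen)
        for p, r in balls
    )
```

The loop stops once every interval has either been chosen or been found to meet a chosen one, which must happen in a finite collection because the thresholds $(2/3)^m M$ eventually drop below the smallest radius.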
With this Lemma consider a version of the Vitali covering theorem in which the balls do not have to be open. A ball centered at $x$ of radius $r$ will denote something which contains the open ball, $B(x, r)$, and is contained in the closed ball, $\overline{B(x, r)}$. Thus the balls could be open or they could contain some but not all of their boundary points.
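Definition 9.14 Let $B$ be a ball centered at $x$ having radius $r$. Denote by $\widehat{B}$ the open ball, $B(x, 5r)$.
Theorem 9.15 (Vitali) Let $\mathcal{F}$ be a collection of balls, and let
\[ A \equiv \cup \{ B : B \in \mathcal{F} \}. \]
Suppose
\[ \infty > M \equiv \sup \{ r : B(p, r) \in \mathcal{F} \} > 0. \]
Then there exists $\mathcal{G} \subseteq \mathcal{F}$ such that $\mathcal{G}$ consists of disjoint balls and
\[ A \subseteq \cup \{ \widehat{B} : B \in \mathcal{G} \}. \]
Proof: For $B$ one of these balls, say $B(x, r) \subseteq B \subseteq \overline{B(x, r)}$, denote by $B_1$ the ball $B\left( x, \frac{5r}{4} \right)$. Let $\mathcal{F}_1 \equiv \{ B_1 : B \in \mathcal{F} \}$ and let $A_1$ denote the union of the balls in $\mathcal{F}_1$. Apply Lemma 9.13 to $\mathcal{F}_1$ to obtain
\[ A_1 \subseteq \cup \{ \widetilde{B_1} : B_1 \in \mathcal{G}_1 \} \]
where $\mathcal{G}_1$ consists of disjoint balls from $\mathcal{F}_1$. Now let $\mathcal{G} \equiv \{ B \in \mathcal{F} : B_1 \in \mathcal{G}_1 \}$. Thus $\mathcal{G}$ consists of disjoint balls from $\mathcal{F}$ because they are contained in the disjoint open balls, $\mathcal{G}_1$. Then
\[ A \subseteq A_1 \subseteq \cup \{ \widetilde{B_1} : B_1 \in \mathcal{G}_1 \} = \cup \{ \widehat{B} : B \in \mathcal{G} \} \]
because for $B_1 = B\left( x, \frac{5r}{4} \right)$, it follows $\widetilde{B_1} = B(x, 5r) = \widehat{B}$. This proves the theorem.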
9.4 Vitali Coverings
There is another version of the Vitali covering theorem which is also of great importance. In this one, balls from the original set of balls almost cover the set, leaving out only a set of measure zero. It is like packing a truck with stuff. You keep trying to fill in the holes with smaller and smaller things so as to not waste space. It is remarkable that you can avoid wasting any space at all when you are dealing with balls of any sort provided you can use arbitrarily small balls.
Definition 9.16 Let $\mathcal{F}$ be a collection of balls that cover a set, $E$, which have the property that if $x \in E$ and $\varepsilon > 0$, then there exists $B \in \mathcal{F}$, diameter of $B < \varepsilon$ and $x \in B$. Such a collection covers $E$ in the sense of Vitali.
In the following covering theorem, $\overline{m_n}$ denotes the outer measure determined by $n$ dimensional Lebesgue measure.
Theorem 9.17 Let $E \subseteq \mathbb{R}^n$ and suppose $0 < \overline{m_n}(E) < \infty$ where $\overline{m_n}$ is the outer measure determined by $m_n$, $n$ dimensional Lebesgue measure, and let $\mathcal{F}$ be a collection of closed balls of bounded radii such that $\mathcal{F}$ covers $E$ in the sense of Vitali. Then there exists a countable collection of disjoint balls from $\mathcal{F}$, $\{B_j\}_{j=1}^{\infty}$, such that $\overline{m_n}(E \setminus \cup_{j=1}^{\infty} B_j) = 0$.
Proof: From the definition of outer measure there exists a Lebesgue measurable set, $E_1 \supseteq E$, such that $m_n(E_1) = \overline{m_n}(E)$. Now by outer regularity of Lebesgue measure, there exists $U$, an open set which satisfies
\[ m_n(E_1) > (1 - 10^{-n}) m_n(U), \quad U \supseteq E_1. \]
[Figure: the set $E_1$ inside the open set $U$.]
Each point of $E$ is contained in balls of $\mathcal{F}$ of arbitrarily small radii and so there exists a covering of $E$ with balls of $\mathcal{F}$ which are themselves contained in $U$. Therefore, by the Vitali covering theorem, there exist disjoint balls, $\{B_i\}_{i=1}^{\infty} \subseteq \mathcal{F}$, such that
\[ E \subseteq \cup_{j=1}^{\infty} \widehat{B}_j, \quad B_j \subseteq U. \]
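Therefore,
\[ m_n(E_1) = \overline{m_n}(E) \leq m_n\left( \cup_{j=1}^{\infty} \widehat{B}_j \right) \leq \sum_j m_n\left( \widehat{B}_j \right) = 5^n \sum_j m_n(B_j) = 5^n m_n\left( \cup_{j=1}^{\infty} B_j \right). \]
Then both $E_1$ and $\cup_{j=1}^{\infty} B_j$ are contained in $U$ and so
\begin{align*}
m_n(E_1) &> (1 - 10^{-n}) m_n(U) \\
&\geq (1 - 10^{-n}) \left[ m_n(E_1 \setminus \cup_{j=1}^{\infty} B_j) + m_n(\cup_{j=1}^{\infty} B_j) \right] \\
&\geq (1 - 10^{-n}) \Big[ m_n(E_1 \setminus \cup_{j=1}^{\infty} B_j) + 5^{-n} \overbrace{\overline{m_n}(E)}^{= m_n(E_1)} \Big]
\end{align*}
and so
\[ \left( 1 - (1 - 10^{-n}) 5^{-n} \right) m_n(E_1) \geq (1 - 10^{-n}) m_n(E_1 \setminus \cup_{j=1}^{\infty} B_j) \]
which implies
\[ m_n(E_1 \setminus \cup_{j=1}^{\infty} B_j) \leq \frac{1 - (1 - 10^{-n}) 5^{-n}}{1 - 10^{-n}} m_n(E_1). \]
Now a short computation shows
\[ 0 < \frac{1 - (1 - 10^{-n}) 5^{-n}}{1 - 10^{-n}} < 1. \]
Hence, denoting by $\theta_n$ a number such that
\[ \frac{1 - (1 - 10^{-n}) 5^{-n}}{1 - 10^{-n}} < \theta_n < 1, \]
\[ \overline{m_n}\left( E \setminus \cup_{j=1}^{\infty} B_j \right) \leq m_n(E_1 \setminus \cup_{j=1}^{\infty} B_j) < \theta_n m_n(E_1) = \theta_n \overline{m_n}(E). \]
Now using Theorem 7.5 on Page 136 there exists $N_1$ large enough that
\[ \theta_n \overline{m_n}(E) \geq m_n(E_1 \setminus \cup_{j=1}^{N_1} B_j) \geq \overline{m_n}(E \setminus \cup_{j=1}^{N_1} B_j). \tag{9.10} \]
Let $\mathcal{F}_1 = \{ B \in \mathcal{F} : B_j \cap B = \emptyset, \ j = 1, \cdots, N_1 \}$. If $E \setminus \cup_{j=1}^{N_1} B_j = \emptyset$, then $\mathcal{F}_1 = \emptyset$ and
\[ \overline{m_n}\left( E \setminus \cup_{j=1}^{N_1} B_j \right) = 0. \]
Therefore, in this case let $B_k = \emptyset$ for all $k > N_1$. Consider the case where
\[ E \setminus \cup_{j=1}^{N_1} B_j \neq \emptyset. \]
In this case, since the balls are closed and $\mathcal{F}$ is a Vitali cover, $\mathcal{F}_1 \neq \emptyset$ and covers $E \setminus \cup_{j=1}^{N_1} B_j$ in the sense of Vitali. Repeat the same argument, letting $E \setminus \cup_{j=1}^{N_1} B_j$ play the role of $E$. (You pick a different $E_1$ whose measure equals the outer measure of $E \setminus \cup_{j=1}^{N_1} B_j$ and proceed as before.) Then choosing $B_j$ for $j = N_1 + 1, \cdots, N_2$ as in the above argument,
\[ \theta_n \overline{m_n}(E \setminus \cup_{j=1}^{N_1} B_j) \geq \overline{m_n}(E \setminus \cup_{j=1}^{N_2} B_j) \]
and so from 9.10,
\[ \theta_n^2 \overline{m_n}(E) \geq \overline{m_n}(E \setminus \cup_{j=1}^{N_2} B_j). \]
Continuing this way
\[ \theta_n^k \overline{m_n}(E) \geq \overline{m_n}\left( E \setminus \cup_{j=1}^{N_k} B_j \right). \]
If it is ever the case that $E \setminus \cup_{j=1}^{N_k} B_j = \emptyset$, then as in the above argument,
\[ \overline{m_n}\left( E \setminus \cup_{j=1}^{N_k} B_j \right) = 0. \]
Otherwise, the process continues and
\[ \overline{m_n}\left( E \setminus \cup_{j=1}^{\infty} B_j \right) \leq \overline{m_n}\left( E \setminus \cup_{j=1}^{N_k} B_j \right) \leq \theta_n^k \overline{m_n}(E) \]
for every $k \in \mathbb{N}$. Therefore, the conclusion holds in this case also. This proves the Theorem.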
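The "short computation" in the proof amounts to the inequality $2^{-n} + 10^{-n} < 1$: the ratio is less than 1 exactly when $10^{-n} < (1 - 10^{-n}) 5^{-n}$, and multiplying by $5^n$ gives $2^{-n} < 1 - 10^{-n}$. The Python sketch below checks this with exact rational arithmetic for the first few dimensions. The function name is hypothetical.

```python
from fractions import Fraction

def ratio(n):
    """The quantity (1 - (1 - 10^-n) 5^-n) / (1 - 10^-n) from the proof of Theorem 9.17."""
    a = 1 - Fraction(1, 10 ** n)            # 1 - 10^-n
    return (1 - a * Fraction(1, 5 ** n)) / a

def bound_holds(n):
    # the inequality needed to pick theta_n strictly between ratio(n) and 1
    return 0 < ratio(n) < 1
```

In dimension one the ratio is $41/45$. The ratio tends to 1 as $n$ grows, so $\theta_n$ must be taken closer and closer to 1 in high dimensions, but it always stays below 1 and that is all the proof needs.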
There is an obvious corollary which removes the assumption that $0 < \overline{m_n}(E)$.
Corollary 9.18 Let $E \subseteq \mathbb{R}^n$ and suppose $\overline{m_n}(E) < \infty$ where $\overline{m_n}$ is the outer measure determined by $m_n$, $n$ dimensional Lebesgue measure, and let $\mathcal{F}$ be a collection of closed balls of bounded radii such that $\mathcal{F}$ covers $E$ in the sense of Vitali. Then there exists a countable collection of disjoint balls from $\mathcal{F}$, $\{B_j\}_{j=1}^{\infty}$, such that $\overline{m_n}(E \setminus \cup_{j=1}^{\infty} B_j) = 0$.
Proof: If $0 = \overline{m_n}(E)$ you simply pick any ball from $\mathcal{F}$ for your collection of disjoint balls.
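It is also not hard to remove the assumption that $\overline{m_n}(E) < \infty$.
Corollary 9.19 Let $E \subseteq \mathbb{R}^n$ and let $\mathcal{F}$ be a collection of closed balls of bounded radii such that $\mathcal{F}$ covers $E$ in the sense of Vitali. Then there exists a countable collection of disjoint balls from $\mathcal{F}$, $\{B_j\}_{j=1}^{\infty}$, such that $\overline{m_n}(E \setminus \cup_{j=1}^{\infty} B_j) = 0$.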
Proof: Let $R_m \equiv (-m, m)^n$ be the open rectangle having sides of length $2m$ which is centered at $0$ and let $R_0 = \emptyset$. Let $H_m \equiv \overline{R_m} \setminus R_m$. Since both $R_m$ and $\overline{R_m}$ have the same measure, $(2m)^n$, it follows $m_n(H_m) = 0$. Now for all $k \in \mathbb{N}$, $R_k \subseteq \overline{R_k} \subseteq R_{k+1}$. Consider the disjoint open sets, $U_k \equiv R_{k+1} \setminus \overline{R_k}$. Thus $\mathbb{R}^n = \cup_{k=0}^{\infty} U_k \cup N$ where $N$ is a set of measure zero equal to the union of the $H_k$. Let $\mathcal{F}_k$ denote those balls of $\mathcal{F}$ which are contained in $U_k$ and let $E_k \equiv U_k \cap E$. Then from Theorem 9.17, there exists a sequence of disjoint balls, $D_k \equiv \{ B_i^k \}_{i=1}^{\infty}$ of $\mathcal{F}_k$, such that $\overline{m_n}(E_k \setminus \cup_{j=1}^{\infty} B_j^k) = 0$. Letting $\{B_i\}_{i=1}^{\infty}$ be an enumeration of all the balls of $\cup_k D_k$, it follows that
\[ \overline{m_n}(E \setminus \cup_{j=1}^{\infty} B_j) \leq m_n(N) + \sum_{k=0}^{\infty} \overline{m_n}(E_k \setminus \cup_{j=1}^{\infty} B_j^k) = 0. \]
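Also, you don't have to assume the balls are closed.
Corollary 9.20 Let $E \subseteq \mathbb{R}^n$ and let $\mathcal{F}$ be a collection of open balls of bounded radii such that $\mathcal{F}$ covers $E$ in the sense of Vitali. Then there exists a countable collection of disjoint balls from $\mathcal{F}$, $\{B_j\}_{j=1}^{\infty}$, such that $\overline{m_n}(E \setminus \cup_{j=1}^{\infty} B_j) = 0$.
Proof: Let $\overline{\mathcal{F}}$ be the collection of closures of balls in $\mathcal{F}$. Then $\overline{\mathcal{F}}$ covers $E$ in the sense of Vitali and so from Corollary 9.19 there exists a sequence of disjoint closed balls from $\overline{\mathcal{F}}$ satisfying $\overline{m_n}\left( E \setminus \cup_{i=1}^{\infty} \overline{B_i} \right) = 0$. Now boundaries of the balls, $B_i$, have measure zero and so $\{B_i\}$ is a sequence of disjoint open balls satisfying $\overline{m_n}(E \setminus \cup_{i=1}^{\infty} B_i) = 0$. The reason for this is that
\[ (E \setminus \cup_{i=1}^{\infty} B_i) \setminus \left( E \setminus \cup_{i=1}^{\infty} \overline{B_i} \right) \subseteq \cup_{i=1}^{\infty} \overline{B_i} \setminus \cup_{i=1}^{\infty} B_i \subseteq \cup_{i=1}^{\infty} \left( \overline{B_i} \setminus B_i \right), \]
a set of measure zero. Therefore,
\[ E \setminus \cup_{i=1}^{\infty} B_i \subseteq \left( E \setminus \cup_{i=1}^{\infty} \overline{B_i} \right) \cup \left( \cup_{i=1}^{\infty} \overline{B_i} \setminus B_i \right) \]
and so
\[ \overline{m_n}(E \setminus \cup_{i=1}^{\infty} B_i) \leq \overline{m_n}\left( E \setminus \cup_{i=1}^{\infty} \overline{B_i} \right) + m_n\left( \cup_{i=1}^{\infty} \overline{B_i} \setminus B_i \right) = \overline{m_n}\left( E \setminus \cup_{i=1}^{\infty} \overline{B_i} \right) = 0. \]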
This implies you can fill up an open set with balls which cover the open set in the sense of Vitali.
Corollary 9.21 Let $U \subseteq \mathbb{R}^n$ be an open set and let $\mathcal{F}$ be a collection of closed or even open balls of bounded radii contained in $U$ such that $\mathcal{F}$ covers $U$ in the sense of Vitali. Then there exists a countable collection of disjoint balls from $\mathcal{F}$, $\{B_j\}_{j=1}^{\infty}$, such that $m_n(U \setminus \cup_{j=1}^{\infty} B_j) = 0$.
9.5 Change Of Variables For Linear Maps
To begin with, certain kinds of functions map measurable sets to measurable sets. It will be assumed that $U$ is an open set in $\mathbb{R}^n$ and that $h : U \to \mathbb{R}^n$ satisfies
\[ Dh(x) \text{ exists for all } x \in U. \tag{9.11} \]
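Lemma 9.22 Let $h$ satisfy 9.11. If $T \subseteq U$ and $m_n(T) = 0$, then $m_n(h(T)) = 0$.
Proof: Let
\[ T_k \equiv \{ x \in T : ||Dh(x)|| < k \} \]
and let $\varepsilon > 0$ be given. Now by outer regularity, there exists an open set, $V$, containing $T_k$ which is contained in $U$ such that $m_n(V) < \varepsilon$. Let $x \in T_k$. Then by differentiability,
\[ h(x + v) = h(x) + Dh(x) v + o(v) \]
and so there exist arbitrarily small $r_x < 1$ such that $B(x, 5 r_x) \subseteq V$ and whenever $|v| \leq 5 r_x$, $|o(v)| < k |v|$. Thus, since $|Dh(x) v| \leq k |v|$,
\[ h(B(x, 5 r_x)) \subseteq B(h(x), 10 k r_x). \]
From the Vitali covering theorem there exists a countable disjoint sequence of these sets, $\{B(x_i, r_i)\}_{i=1}^{\infty}$, such that $\{B(x_i, 5 r_i)\}_{i=1}^{\infty} = \{ \widehat{B}_i \}_{i=1}^{\infty}$ covers $T_k$. Then letting $\overline{m_n}$ denote the outer measure determined by $m_n$,
\begin{align*}
\overline{m_n}(h(T_k)) &\leq \overline{m_n}\left( h\left( \cup_{i=1}^{\infty} \widehat{B}_i \right) \right) \leq \sum_{i=1}^{\infty} \overline{m_n}\left( h\left( \widehat{B}_i \right) \right) \leq \sum_{i=1}^{\infty} m_n(B(h(x_i), 10 k r_{x_i})) \\
&= \sum_{i=1}^{\infty} m_n(B(x_i, 10 k r_{x_i})) = (10 k)^n \sum_{i=1}^{\infty} m_n(B(x_i, r_{x_i})) \leq (10 k)^n m_n(V) \leq (10 k)^n \varepsilon.
\end{align*}
Since $\varepsilon > 0$ is arbitrary, this shows $m_n(h(T_k)) = 0$. Now
\[ m_n(h(T)) = \lim_{k \to \infty} m_n(h(T_k)) = 0. \]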
Lemma 9.23 Let $h$ satisfy 9.11. If $S$ is a Lebesgue measurable subset of $U$, then $h(S)$ is Lebesgue measurable.
Proof: Let $S_k = S \cap B(0, k)$, $k \in \mathbb{N}$. By inner regularity of Lebesgue measure, there exists a set, $F$, which is the countable union of compact sets and a set $T$ with $m_n(T) = 0$ such that
\[ F \cup T = S_k. \]
Then $h(F) \subseteq h(S_k) \subseteq h(F) \cup h(T)$. By continuity of $h$, $h(F)$ is a countable union of compact sets and so it is Borel. By Lemma 9.22, $m_n(h(T)) = 0$ and so $h(S_k)$ is Lebesgue measurable because of completeness of Lebesgue measure. Now $h(S) = \cup_{k=1}^{\infty} h(S_k)$ and so it is also true that $h(S)$ is Lebesgue measurable. This proves the lemma.
In particular, this proves the following corollary.
Corollary 9.24 Suppose $A$ is an $n \times n$ matrix. Then if $S$ is a Lebesgue measurable set, it follows $AS$ is also a Lebesgue measurable set.
Lemma 9.25 Let $R$ be unitary ($R^* R = R R^* = I$) and let $V$ be an open or closed set. Then $m_n(RV) = m_n(V)$.
Proof: First assume $V$ is a bounded open set. By Corollary 9.21 there is a disjoint sequence of closed balls, $\{B_i\}$, such that $V = \cup_{i=1}^{\infty} B_i \cup N$ where $m_n(N) = 0$. Denote by $x_i$ the center of $B_i$ and let $r_i$ be the radius of $B_i$. Then by Lemma 9.22 $m_n(RV) = \sum_{i=1}^{\infty} m_n(R B_i)$. Now by invariance of translation of Lebesgue measure, this equals $\sum_{i=1}^{\infty} m_n(R B_i - R x_i) = \sum_{i=1}^{\infty} m_n(R B(0, r_i))$. Since $R$ is unitary, it preserves all distances and so $R B(0, r_i) = B(0, r_i)$ and therefore,
\[ m_n(RV) = \sum_{i=1}^{\infty} m_n(B(0, r_i)) = \sum_{i=1}^{\infty} m_n(B_i) = m_n(V). \]
This proves the lemma in the case that $V$ is bounded. Suppose now that $V$ is just an open set. Let $V_k = V \cap B(0, k)$. Then $m_n(R V_k) = m_n(V_k)$. Letting $k \to \infty$, this yields the desired conclusion. This proves the lemma in the case that $V$ is open.
Suppose now that $H$ is a closed and bounded set. Let $B(0, R) \supseteq H$. Then letting $B = B(0, R)$ for short,
\[ m_n(RH) = m_n(RB) - m_n(R(B \setminus H)) = m_n(B) - m_n(B \setminus H) = m_n(H). \]
In general, let $H_m = H \cap \overline{B(0, m)}$. Then from what was just shown, $m_n(R H_m) = m_n(H_m)$. Now let $m \to \infty$ to get the conclusion of the lemma in general. This proves the lemma.
Lemma 9.26 Let $E$ be a Lebesgue measurable set in $\mathbb{R}^n$ and let $R$ be unitary. Then $m_n(RE) = m_n(E)$.
Proof: First suppose $E$ is bounded. Then there exist sets $G$ and $H$ such that $H \subseteq E \subseteq G$ and $H$ is the countable union of closed sets while $G$ is the countable intersection of open sets such that $m_n(G \setminus H) = 0$. By Lemma 9.25 applied to these sets whose union or intersection equals $H$ or $G$ respectively, it follows
$$m_n(RG) = m_n(G) = m_n(H) = m_n(RH).$$
Therefore,
$$m_n(H) = m_n(RH) \le m_n(RE) \le m_n(RG) = m_n(G) = m_n(E) = m_n(H).$$
In the general case, let $E_m = E \cap B(0,m)$, apply what was just shown, and let $m \to \infty$.
Lemma 9.27 Let $V$ be an open or closed set in $\mathbb{R}^n$ and let $A$ be an $n \times n$ matrix. Then $m_n(AV) = |\det(A)|\, m_n(V)$.
Proof: Let $RU$ be the right polar decomposition (Theorem 3.65 on Page 75) of $A$ and let $V$ be an open set. Then from Lemma 9.26,
$$m_n(AV) = m_n(RUV) = m_n(UV).$$
Now $U = Q^*DQ$ where $D$ is a diagonal matrix such that $|\det(D)| = |\det(A)|$ and $Q$ is unitary. Therefore,
$$m_n(AV) = m_n(Q^*DQV) = m_n(DQV).$$
Now $QV$ is an open set and so by Corollary 9.6 on Page 214 and Lemma 9.25,
$$m_n(AV) = |\det(D)|\, m_n(QV) = |\det(D)|\, m_n(V) = |\det(A)|\, m_n(V).$$
This proves the lemma in case $V$ is open.
Now let $H$ be a closed set which is also bounded. First suppose $\det(A) = 0$. Then letting $V$ be an open set containing $H$,
$$m_n(AH) \le m_n(AV) = |\det(A)|\, m_n(V) = 0$$
which shows the desired equation is obvious in the case where $\det(A) = 0$. Therefore, assume $A$ is one to one. Since $H$ is bounded, $H \subseteq B(0,R)$ for some $R > 0$. Then letting $B = B(0,R)$ for short,
$$m_n(AH) = m_n(AB) - m_n(A(B \setminus H)) = |\det(A)|\, m_n(B) - |\det(A)|\, m_n(B \setminus H) = |\det(A)|\, m_n(H).$$
If $H$ is not bounded, apply the result just obtained to $H_m \equiv H \cap \overline{B(0,m)}$ and then let $m \to \infty$.
With this preparation, the main result is the following theorem.
Theorem 9.28 Let $E$ be a Lebesgue measurable set in $\mathbb{R}^n$ and let $A$ be an $n \times n$ matrix. Then $m_n(AE) = |\det(A)|\, m_n(E)$.
Proof: First suppose $E$ is bounded. Then there exist sets $G$ and $H$ such that $H \subseteq E \subseteq G$ and $H$ is the countable union of closed sets while $G$ is the countable intersection of open sets such that $m_n(G \setminus H) = 0$. By Lemma 9.27 applied to these sets whose union or intersection equals $H$ or $G$ respectively, it follows
$$m_n(AG) = |\det(A)|\, m_n(G) = |\det(A)|\, m_n(H) = m_n(AH).$$
Therefore,
$$|\det(A)|\, m_n(E) = |\det(A)|\, m_n(H) = m_n(AH) \le m_n(AE) \le m_n(AG) = |\det(A)|\, m_n(G) = |\det(A)|\, m_n(E).$$
In the general case, let $E_m = E \cap B(0,m)$, apply what was just shown, and let $m \to \infty$.
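The scaling law of Theorem 9.28 can be checked by hand in the plane, where the image of the unit square under a matrix is a parallelogram whose area is given by the shoelace formula. The following is only an illustrative sketch: the matrix `A` and the helper names `det2`, `apply_matrix` and `polygon_area` are chosen here for the example and are not part of the notes.

```python
def det2(a):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def apply_matrix(a, p):
    """Image of the point p = (x, y) under the 2 x 2 matrix a."""
    return (a[0][0] * p[0] + a[0][1] * p[1],
            a[1][0] * p[0] + a[1][1] * p[1])


def polygon_area(pts):
    """Area of a simple polygon (vertices listed in order) by the shoelace formula."""
    s = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0


# Hypothetical sample data: the unit square E has m_2(E) = 1.
A = [[2.0, 1.0], [0.5, 3.0]]
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [apply_matrix(A, p) for p in unit_square]

area_of_image = polygon_area(image)   # m_2(A E)
scale_factor = abs(det2(A))           # |det(A)| * m_2(E)
```

Here `area_of_image` and `scale_factor` agree, and a singular matrix such as `[[1, 2], [2, 4]]` collapses the square to a segment of area zero, which is the case $\det(A) = 0$ treated in Lemma 9.27.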
9.6 Change Of Variables For $C^1$ Functions
In this section theorems are proved which generalize the above to $C^1$ functions. More general versions can be seen in Kuttler [31], Kuttler [32], and Rudin [38]. There is also a very different approach to this theorem given in [31]. The more general version in [31] follows [38] and both are based on the Brouwer fixed point theorem and a very clever lemma presented in Rudin [38]. The proof given here will be based on a sequence of easy lemmas.
Lemma 9.29 Let $U$ and $V$ be bounded open sets in $\mathbb{R}^n$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Also let $f \in C_c(V)$. Then
$$\int_V f(y)\,dy = \int_U f(h(x))\,|\det(Dh(x))|\,dx.$$
Proof: Let $x \in U$. By the assumption that $h$ and $h^{-1}$ are $C^1$,
$$h(x+v) - h(x) = Dh(x)v + o(v) = Dh(x)\big(v + Dh^{-1}(h(x))\,o(v)\big) = Dh(x)(v + o(v))$$
and so if $r > 0$ is small enough then $B(x,r)$ is contained in $U$ and
$$h(B(x,r)) - h(x) = h(x + B(0,r)) - h(x) \subseteq Dh(x)\big(B(0,(1+\varepsilon)r)\big). \tag{9.12}$$
Making $r$ still smaller if necessary, one can also obtain
$$|f(y) - f(h(x))| < \varepsilon \tag{9.13}$$
for any $y \in h(B(x,r))$ and
$$\big|f(h(x_1))\,|\det(Dh(x_1))| - f(h(x))\,|\det(Dh(x))|\big| < \varepsilon \tag{9.14}$$
whenever $x_1 \in B(x,r)$. The collection of such balls is a Vitali cover of $U$. By Corollary 9.21 there is a sequence of disjoint closed balls $\{B_i\}$ such that $U = \bigcup_{i=1}^{\infty} B_i \cup N$ where $m_n(N) = 0$. Denote by $x_i$ the center of $B_i$ and $r_i$ the radius. Then by Lemma 9.22, the monotone convergence theorem, and 9.12 - 9.14,
$$\int_V f(y)\,dy = \sum_{i=1}^{\infty} \int_{h(B_i)} f(y)\,dy$$
$$\le \varepsilon m_n(V) + \sum_{i=1}^{\infty} \int_{h(B_i)} f(h(x_i))\,dy$$
$$\le \varepsilon m_n(V) + \sum_{i=1}^{\infty} f(h(x_i))\, m_n(h(B_i))$$
$$\le \varepsilon m_n(V) + \sum_{i=1}^{\infty} f(h(x_i))\, m_n\big(Dh(x_i)(B(0,(1+\varepsilon)r_i))\big)$$
$$= \varepsilon m_n(V) + (1+\varepsilon)^n \sum_{i=1}^{\infty} \int_{B_i} f(h(x_i))\,|\det(Dh(x_i))|\,dx$$
$$\le \varepsilon m_n(V) + (1+\varepsilon)^n \sum_{i=1}^{\infty} \Big( \int_{B_i} f(h(x))\,|\det(Dh(x))|\,dx + \varepsilon m_n(B_i) \Big)$$
$$\le \varepsilon m_n(V) + (1+\varepsilon)^n \sum_{i=1}^{\infty} \int_{B_i} f(h(x))\,|\det(Dh(x))|\,dx + (1+\varepsilon)^n \varepsilon m_n(U)$$
$$= \varepsilon m_n(V) + (1+\varepsilon)^n \int_U f(h(x))\,|\det(Dh(x))|\,dx + (1+\varepsilon)^n \varepsilon m_n(U).$$
Since $\varepsilon > 0$ is arbitrary, this shows
$$\int_V f(y)\,dy \le \int_U f(h(x))\,|\det(Dh(x))|\,dx \tag{9.15}$$
whenever $f \in C_c(V)$. Now $x \to f(h(x))\,|\det(Dh(x))|$ is in $C_c(U)$ and so using the same argument with $U$ and $V$ switching roles and replacing $h$ with $h^{-1}$,
$$\int_U f(h(x))\,|\det(Dh(x))|\,dx \le \int_V f\big(h(h^{-1}(y))\big)\,\big|\det\big(Dh(h^{-1}(y))\big)\big|\,\big|\det\big(Dh^{-1}(y)\big)\big|\,dy = \int_V f(y)\,dy$$
by the chain rule. This with 9.15 proves the lemma.
Corollary 9.30 Let $U$ and $V$ be open sets in $\mathbb{R}^n$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Also let $f \in C_c(V)$. Then
$$\int_V f(y)\,dy = \int_U f(h(x))\,|\det(Dh(x))|\,dx.$$
Proof: Choose $m$ large enough that $\operatorname{spt}(f) \subseteq B(0,m) \cap V \equiv V_m$. Then let $h^{-1}(V_m) = U_m$. From Lemma 9.29,
$$\int_V f(y)\,dy = \int_{V_m} f(y)\,dy = \int_{U_m} f(h(x))\,|\det(Dh(x))|\,dx = \int_U f(h(x))\,|\det(Dh(x))|\,dx.$$
Corollary 9.31 Let $U$ and $V$ be open sets in $\mathbb{R}^n$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Also let $E \subseteq V$ be measurable. Then
$$\int_V \mathcal{X}_E(y)\,dy = \int_U \mathcal{X}_E(h(x))\,|\det(Dh(x))|\,dx.$$
Proof: Let $E_m = E \cap V_m$ where $V_m$ and $U_m$ are as in Corollary 9.30. By regularity of the measure there exist sets $K_k, G_k$ such that $K_k \subseteq E_m \subseteq G_k$, $G_k$ is open, $K_k$ is compact, and $m_n(G_k \setminus K_k) < 2^{-k}$. Let $K_k \prec f_k \prec G_k$. Then $f_k(y) \to \mathcal{X}_{E_m}(y)$ a.e. because if $y$ is such that convergence fails, it must be the case that $y$ is in $G_k \setminus K_k$ infinitely often and $\sum_k m_n(G_k \setminus K_k) < \infty$. Let
$$N = \bigcap_{j=1}^{\infty} \bigcup_{k=j}^{\infty} G_k \setminus K_k,$$
the set of $y$ which is in infinitely many of the $G_k \setminus K_k$. Then $f_k(h(x))$ must converge to $\mathcal{X}_{E_m}(h(x))$ for all $x \notin h^{-1}(N)$, a set of measure zero by Lemma 9.22. By Corollary 9.30
$$\int_{V_m} f_k(y)\,dy = \int_{U_m} f_k(h(x))\,|\det(Dh(x))|\,dx.$$
By the dominated convergence theorem using a dominating function, $\mathcal{X}_{V_m}$ in the integral on the left and $\mathcal{X}_{U_m}|\det(Dh)|$ on the right,
$$\int_{V_m} \mathcal{X}_{E_m}(y)\,dy = \int_{U_m} \mathcal{X}_{E_m}(h(x))\,|\det(Dh(x))|\,dx.$$
Therefore,
$$\int_V \mathcal{X}_{E_m}(y)\,dy = \int_{V_m} \mathcal{X}_{E_m}(y)\,dy = \int_{U_m} \mathcal{X}_{E_m}(h(x))\,|\det(Dh(x))|\,dx = \int_U \mathcal{X}_{E_m}(h(x))\,|\det(Dh(x))|\,dx.$$
Let $m \to \infty$ and use the monotone convergence theorem to obtain the conclusion of the corollary.
With this corollary, the main theorem follows.
Theorem 9.32 Let $U$ and $V$ be open sets in $\mathbb{R}^n$ and let $h, h^{-1}$ be $C^1$ functions such that $h(U) = V$. Then if $g$ is a nonnegative Lebesgue measurable function,
$$\int_V g(y)\,dy = \int_U g(h(x))\,|\det(Dh(x))|\,dx. \tag{9.16}$$
Proof: From Corollary 9.31, 9.16 holds for any nonnegative simple function in place of $g$. In general, let $\{s_k\}$ be an increasing sequence of simple functions which converges to $g$ pointwise. Then from the monotone convergence theorem
$$\int_V g(y)\,dy = \lim_{k \to \infty} \int_V s_k\,dy = \lim_{k \to \infty} \int_U s_k(h(x))\,|\det(Dh(x))|\,dx = \int_U g(h(x))\,|\det(Dh(x))|\,dx.$$
This is a pretty good theorem but it isn't too hard to generalize it. In particular, it is not necessary to assume $h^{-1}$ is $C^1$.
Lemma 9.33 (Sard) Let $U$ be an open set in $\mathbb{R}^n$ and let $h : U \to \mathbb{R}^n$ be $C^1$. Let
$$Z \equiv \{x \in U : \det Dh(x) = 0\}.$$
Then $m_n(h(Z)) = 0$.
Proof: Let $Z_k$ denote those points $x$ of $Z$ such that $\|Dh(x)\| \le k$ and such that $|x| < k$. Let $\varepsilon > 0$ be given. For $x \in Z_k$,
$$h(x+v) = h(x) + Dh(x)v + o(v)$$
and so whenever $r$ is small enough,
$$h(x + B(0,r)) = h(B(x,r)) \subseteq h(x) + Dh(x)B(0,r) + B(0, r\varepsilon).$$
Note $Dh(x)B(0,r)$ is contained in an $n-1$ dimensional subspace of $\mathbb{R}^n$ due to the fact $Dh(x)$ has rank less than $n$. Now let $Q$ denote an orthogonal transformation preserving all distances,
$$QQ^* = Q^*Q = I,$$
such that
$$QDh(x)B(0,r) \subseteq \mathbb{R}^{n-1}.$$
Then
$$Qh(B(x,r)) \subseteq Qh(x) + QDh(x)B(0,r) + B(0, r\varepsilon)$$
and by translation invariance of Lebesgue measure,
$$m_n(Qh(B(x,r))) \le m_n\big(QDh(x)B(0,r) + B(0,r\varepsilon)\big) \le \big(\|QDh(x)\|\,(2r + 2r\varepsilon)\big)^{n-1}\, 2r\varepsilon \le C\varepsilon(1+\varepsilon)^{n-1} m_n(B(0,r)),$$
where, since $\|QDh(x)\| \le k$, the constant $C$ depends only on $k$ and $n$. These balls give a Vitali cover of $Z_k$ and so there exists a disjoint sequence of them $\{B_i\}$, each contained in $B(0,k)$, which covers $Z_k$ except for a set of measure zero which is mapped by $h$ to a set of measure zero. Therefore using Theorem 9.28, and writing $Q_i$ for the orthogonal transformation chosen for the ball $B_i$,
$$m_n(h(Z_k)) \le m_n\Big(h\Big(\bigcup_{i=1}^{\infty} B_i\Big)\Big) \le \sum_{i=1}^{\infty} m_n(h(B_i)) = \sum_{i=1}^{\infty} m_n(Q_i h(B_i)) \le C\varepsilon(1+\varepsilon)^{n-1} \sum_{i=1}^{\infty} m_n(B_i) \le C\varepsilon(1+\varepsilon)^{n-1} m_n(B(0,k))$$
and since $\varepsilon$ is arbitrary, this shows $m_n(h(Z_k)) = 0$. Now
$$m_n(h(Z)) = \lim_{k \to \infty} m_n(h(Z_k)) = 0.$$
With this important lemma, here is a generalization of Theorem 9.32.
Theorem 9.34 Let $U$ be an open set and let $h$ be a $1-1$, $C^1$ function with values in $\mathbb{R}^n$. Then if $g$ is a nonnegative Lebesgue measurable function,
$$\int_{h(U)} g(y)\,dy = \int_U g(h(x))\,|\det(Dh(x))|\,dx. \tag{9.17}$$
Proof: Let $Z = \{x : \det(Dh(x)) = 0\}$. Then by the inverse function theorem, $h^{-1}$ is $C^1$ on $h(U \setminus Z)$ and $h(U \setminus Z)$ is an open set. Therefore, from Lemma 9.33 and Theorem 9.32,
$$\int_{h(U)} g(y)\,dy = \int_{h(U \setminus Z)} g(y)\,dy = \int_{U \setminus Z} g(h(x))\,|\det(Dh(x))|\,dx = \int_U g(h(x))\,|\det(Dh(x))|\,dx.$$
Of course the next generalization considers the case when $h$ is not even one to one.
9.7 Mappings Which Are Not One To One
Now suppose $h$ is only $C^1$, not necessarily one to one. For
$$U_+ \equiv \{x \in U : |\det Dh(x)| > 0\}$$
and $Z$ the set where $|\det Dh(x)| = 0$, Lemma 9.33 implies $m_n(h(Z)) = 0$. For $x \in U_+$, the inverse function theorem implies there exists an open set $B_x$ such that $x \in B_x \subseteq U_+$ and $h$ is one to one on $B_x$.
Let $\{B_i\}$ be a countable subset of $\{B_x\}_{x \in U_+}$ such that $U_+ = \bigcup_{i=1}^{\infty} B_i$. Let $E_1 = B_1$. If $E_1, \cdots, E_k$ have been chosen, $E_{k+1} = B_{k+1} \setminus \bigcup_{i=1}^{k} E_i$. Thus
$$\bigcup_{i=1}^{\infty} E_i = U_+, \quad h \text{ is one to one on } E_i, \quad E_i \cap E_j = \emptyset,$$
and each $E_i$ is a Borel set contained in the open set $B_i$. Now define
$$n(y) \equiv \sum_{i=1}^{\infty} \mathcal{X}_{h(E_i)}(y) + \mathcal{X}_{h(Z)}(y).$$
The sets $h(E_i)$, $h(Z)$ are measurable by Lemma 9.23. Thus $n(\cdot)$ is measurable.
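Lemma 9.35 Let $F \subseteq h(U)$ be measurable. Then
$$\int_{h(U)} n(y)\,\mathcal{X}_F(y)\,dy = \int_U \mathcal{X}_F(h(x))\,|\det Dh(x)|\,dx.$$
Proof: Using Lemma 9.33 and the Monotone Convergence Theorem or Fubini's Theorem,
$$\int_{h(U)} n(y)\,\mathcal{X}_F(y)\,dy = \int_{h(U)} \Big( \sum_{i=1}^{\infty} \mathcal{X}_{h(E_i)}(y) + \overbrace{\mathcal{X}_{h(Z)}(y)}^{m_n(h(Z)) = 0} \Big) \mathcal{X}_F(y)\,dy$$
$$= \sum_{i=1}^{\infty} \int_{h(U)} \mathcal{X}_{h(E_i)}(y)\,\mathcal{X}_F(y)\,dy$$
$$= \sum_{i=1}^{\infty} \int_{h(U) \cap h(E_i)} \mathcal{X}_F(y)\,dy$$
$$= \sum_{i=1}^{\infty} \int_{h(B_i) \cap h(E_i)} \mathcal{X}_F(y)\,dy$$
$$= \sum_{i=1}^{\infty} \int_{h(B_i)} \mathcal{X}_{h(E_i)}(y)\,\mathcal{X}_F(y)\,dy$$
$$= \sum_{i=1}^{\infty} \int_{B_i} \mathcal{X}_{E_i}(x)\,\mathcal{X}_F(h(x))\,|\det Dh(x)|\,dx$$
$$= \sum_{i=1}^{\infty} \int_U \mathcal{X}_{E_i}(x)\,\mathcal{X}_F(h(x))\,|\det Dh(x)|\,dx$$
$$= \int_U \sum_{i=1}^{\infty} \mathcal{X}_{E_i}(x)\,\mathcal{X}_F(h(x))\,|\det Dh(x)|\,dx$$
$$= \int_{U_+} \mathcal{X}_F(h(x))\,|\det Dh(x)|\,dx = \int_U \mathcal{X}_F(h(x))\,|\det Dh(x)|\,dx.$$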
Definition 9.36 For $y \in h(U)$, define a function, $\#$, according to the formula
$$\#(y) \equiv \text{number of elements in } h^{-1}(y).$$
Observe that
$$\#(y) = n(y) \text{ a.e.} \tag{9.18}$$
because $n(y) = \#(y)$ if $y \notin h(Z)$, a set of measure 0. Therefore, $\#$ is a measurable function.
Theorem 9.37 Let $g \ge 0$, $g$ measurable, and let $h$ be $C^1(U)$. Then
$$\int_{h(U)} \#(y)\,g(y)\,dy = \int_U g(h(x))\,|\det Dh(x)|\,dx. \tag{9.19}$$
Proof: From 9.18 and Lemma 9.35, 9.19 holds for all $g$, a nonnegative simple function. Approximating an arbitrary measurable nonnegative function, $g$, with an increasing pointwise convergent sequence of simple functions and using the monotone convergence theorem, yields 9.19 for an arbitrary nonnegative measurable function, $g$. This proves the theorem.
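A one dimensional instance may make the role of $\#$ concrete. Take $U = (-1,1)$ and $h(x) = x^2$, so that $h(U) = [0,1)$ and $\#(y) = 2$ for every $y \in (0,1)$. The sketch below compares the two sides of 9.19 numerically for the sample choice $g = \cos$; the midpoint rule, the helper name `midpoint` and the choice of $g$ are assumptions made here for illustration only.

```python
import math


def midpoint(f, a, b, n=100000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))


g = math.cos                 # sample nonnegative integrand on [0, 1)


def h(x):
    return x * x             # two-to-one on (-1, 1) away from x = 0


def dh(x):
    return 2.0 * x           # derivative of h


# Left side of 9.19: #(y) = 2 on (0, 1), and the single point y = 0 has measure zero.
lhs = midpoint(lambda y: 2.0 * g(y), 0.0, 1.0)

# Right side of 9.19: integrate g(h(x)) |h'(x)| over U = (-1, 1).
rhs = midpoint(lambda x: g(h(x)) * abs(dh(x)), -1.0, 1.0)
```

Both values approximate $2\sin 1$. Dropping the factor $\#(y) = 2$ on the left would give only half of the right side, which is exactly the defect that the counting function repairs when $h$ is not one to one.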
9.8 Lebesgue Measure And Iterated Integrals
The following is the main result.
Theorem 9.38 Let $f \ge 0$ and suppose $f$ is a Lebesgue measurable function defined on $\mathbb{R}^n$. Then
$$\int_{\mathbb{R}^n} f\,dm_n = \int_{\mathbb{R}^k} \int_{\mathbb{R}^{n-k}} f\,dm_{n-k}\,dm_k.$$
This will be accomplished by Fubini's theorem, Theorem 8.47 on Page 203, and the following lemma.
Lemma 9.39 $m_k \times m_{n-k} = m_n$ on the $m_n$ measurable sets.
Proof: First of all, let $R = \prod_{i=1}^{n} (a_i, b_i]$ be a measurable rectangle and let $R_k = \prod_{i=1}^{k} (a_i, b_i]$, $R_{n-k} = \prod_{i=k+1}^{n} (a_i, b_i]$. Then by Fubini's theorem,
$$\int \mathcal{X}_R\,d(m_k \times m_{n-k}) = \int_{\mathbb{R}^k} \int_{\mathbb{R}^{n-k}} \mathcal{X}_{R_k} \mathcal{X}_{R_{n-k}}\,dm_k\,dm_{n-k}$$
$$= \int_{\mathbb{R}^k} \mathcal{X}_{R_k}\,dm_k \int_{\mathbb{R}^{n-k}} \mathcal{X}_{R_{n-k}}\,dm_{n-k} = \int \mathcal{X}_R\,dm_n$$
and so $m_k \times m_{n-k}$ and $m_n$ agree on every half open rectangle. By Lemma 9.2 these two measures agree on every open set. Now if $K$ is a compact set, then $K = \bigcap_{j=1}^{\infty} U_j$ where $U_j$ is the open set $K + B\left(0, \frac{1}{j}\right)$. Another way of saying this is $U_j \equiv \left\{x : \operatorname{dist}(x,K) < \frac{1}{j}\right\}$ which is obviously open because $x \to \operatorname{dist}(x,K)$ is a continuous function. Since $K$ is the countable intersection of these decreasing open sets, each of which has finite measure with respect to either of the two measures, it follows that $m_k \times m_{n-k}$ and $m_n$ agree on all the compact sets.
Now let $E$ be a bounded Lebesgue measurable set. Then there are sets $H$ and $G$ such that $H$ is a countable union of compact sets, $G$ a countable intersection of open sets, $H \subseteq E \subseteq G$, and $m_n(G \setminus H) = 0$. Then from what was just shown about compact and open sets, the two measures agree on $G$ and on $H$. Therefore,
$$m_n(H) = m_k \times m_{n-k}(H) \le \overline{m_k \times m_{n-k}}(E) \le m_k \times m_{n-k}(G) = m_n(G) = m_n(E) = m_n(H).$$
By completeness of the measure space for $m_k \times m_{n-k}$, it follows that $E$ is $m_k \times m_{n-k}$ measurable and $m_k \times m_{n-k}(E) = m_n(E)$. This proves the lemma.
You could also show that the two $\sigma$ algebras are the same. However, this is not needed for the lemma or the theorem.
Proof of Theorem 9.38: By the lemma and Fubini's theorem, Theorem 8.47,
$$\int_{\mathbb{R}^n} f\,dm_n = \int_{\mathbb{R}^n} f\,d(m_k \times m_{n-k}) = \int_{\mathbb{R}^k} \int_{\mathbb{R}^{n-k}} f\,dm_{n-k}\,dm_k.$$
Not surprisingly, the following corollary follows from this.
Corollary 9.40 Let $f \in L^1(\mathbb{R}^n)$ where the measure is $m_n$. Then
$$\int_{\mathbb{R}^n} f\,dm_n = \int_{\mathbb{R}^k} \int_{\mathbb{R}^{n-k}} f\,dm_{n-k}\,dm_k.$$
Proof: Apply Fubini's theorem to the positive and negative parts of the real and imaginary parts of $f$.
9.9 Spherical Coordinates In Many Dimensions
Sometimes there is a need to deal with spherical coordinates in more than three dimensions. In this section, this concept is defined and formulas are derived for these coordinate systems. Recall polar coordinates are of the form
$$y_1 = \rho \cos\theta, \qquad y_2 = \rho \sin\theta$$
where $\rho > 0$ and $\theta \in [0, 2\pi)$. Here I am writing $\rho$ in place of $r$ to emphasize a pattern which is about to emerge. I will consider polar coordinates as spherical coordinates in two dimensions. I will also simply refer to such coordinate systems as polar coordinates regardless of the dimension. This is also the reason I am writing $y_1$ and $y_2$ instead of the more usual $x$ and $y$. Now consider what happens when you go to three dimensions. The situation is depicted in the following picture.
[Figure: the point $(x_1, x_2, x_3)$ drawn over the plane $\mathbb{R}^2$ with vertical axis $\mathbb{R}$.]
From this picture, you see that $y_3 = \rho \cos\phi_1$. Also the distance between $(y_1, y_2)$ and $(0,0)$ is $\rho \sin(\phi_1)$. Therefore, using polar coordinates to write $(y_1, y_2)$ in terms of $\theta$ and this distance,
$$y_1 = \rho \sin\phi_1 \cos\theta, \qquad y_2 = \rho \sin\phi_1 \sin\theta, \qquad y_3 = \rho \cos\phi_1,$$
where $\phi_1 \in [0, \pi]$. What was done is to replace $\rho$ with $\rho \sin\phi_1$ and then to add in $y_3 = \rho \cos\phi_1$. Having done this, there is no reason to stop with three dimensions. Consider the following picture:
[Figure: the point $(x_1, x_2, x_3, x_4)$ drawn over $\mathbb{R}^3$ with vertical axis $\mathbb{R}$.]
From this picture, you see that $y_4 = \rho \cos\phi_2$. Also the distance between $(y_1, y_2, y_3)$ and $(0,0,0)$ is $\rho \sin(\phi_2)$. Therefore, using polar coordinates to write $(y_1, y_2, y_3)$ in terms of $\theta$, $\phi_1$, and this distance,
$$y_1 = \rho \sin\phi_2 \sin\phi_1 \cos\theta,$$
$$y_2 = \rho \sin\phi_2 \sin\phi_1 \sin\theta,$$
$$y_3 = \rho \sin\phi_2 \cos\phi_1,$$
$$y_4 = \rho \cos\phi_2$$
where $\phi_2 \in [0, \pi]$.
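Continuing this way, given spherical coordinates in $\mathbb{R}^n$, to get the spherical coordinates in $\mathbb{R}^{n+1}$, you let $y_{n+1} = \rho \cos\phi_{n-1}$ and then replace every occurrence of $\rho$ with $\rho \sin\phi_{n-1}$ to obtain $y_1, \cdots, y_n$ in terms of $\phi_1, \phi_2, \cdots, \phi_{n-1}$, $\theta$, and $\rho$.
It is always the case that $\rho$ measures the distance from the point in $\mathbb{R}^n$ to the origin in $\mathbb{R}^n$, $0$. Each $\phi_i \in [0, \pi]$, and $\theta \in [0, 2\pi)$. It can be shown using math induction that these coordinates map $\prod_{i=1}^{n-2} [0, \pi] \times [0, 2\pi) \times (0, \infty)$ one to one onto $\mathbb{R}^n \setminus \{0\}$.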
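The recursive recipe above (multiply every existing coordinate by $\sin\phi$ and append $\rho\cos\phi$) translates directly into a short loop. The following is a minimal sketch under the conventions of this section; the function name `spherical` and the argument order are choices made here, not notation from the notes.

```python
import math


def spherical(phis, theta, rho):
    """Map (phi_1, ..., phi_{n-2}, theta, rho) to y in R^n, where n = len(phis) + 2.

    Starts from polar coordinates in R^2 and applies the step described above:
    every occurrence of rho is replaced by rho * sin(phi) and the new last
    coordinate rho * cos(phi) is appended.
    """
    y = [rho * math.cos(theta), rho * math.sin(theta)]
    for phi in phis:
        # Each existing coordinate is linear in rho, so replacing rho by
        # rho * sin(phi) amounts to multiplying the coordinate by sin(phi).
        y = [c * math.sin(phi) for c in y] + [rho * math.cos(phi)]
    return y


def norm(y):
    """Euclidean length of the vector y."""
    return math.sqrt(sum(c * c for c in y))


# Hypothetical sample point in R^5 (three angles phi, one theta, radius 3).
point = spherical([0.7, 1.2, 2.0], 0.4, 3.0)
```

For any choice of angles, `norm(point)` returns the radius $\rho$ up to rounding, which is the statement that $\rho$ always measures the distance to the origin.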
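Theorem 9.41 Let $y = h(\phi, \theta, \rho)$ be the spherical coordinate transformations in $\mathbb{R}^n$. Then letting $A = \prod_{i=1}^{n-2} [0, \pi] \times [0, 2\pi)$, it follows $h$ maps $A \times (0, \infty)$ one to one onto $\mathbb{R}^n \setminus \{0\}$. Also $|\det Dh(\phi, \theta, \rho)|$ will always be of the form
$$|\det Dh(\phi, \theta, \rho)| = \rho^{n-1}\,\Phi(\phi, \theta) \tag{9.20}$$
where $\Phi$ is a continuous function of $\phi$ and $\theta$. (Actually it is only a function of the first but this is not important in what follows.) Furthermore whenever $f$ is Lebesgue measurable and nonnegative,
$$\int_{\mathbb{R}^n} f(y)\,dy = \int_0^{\infty} \rho^{n-1} \int_A f(h(\phi, \theta, \rho))\,\Phi(\phi, \theta)\,d\phi\,d\theta\,d\rho \tag{9.21}$$
where here $d\phi\,d\theta$ denotes $dm_{n-1}$ on $A$. The same formula holds if $f \in L^1(\mathbb{R}^n)$.
Proof: Formula 9.20 is obvious from the definition of the spherical coordinates. The first claim is also clear from the definition and math induction. It remains to verify 9.21. Let $A_0 \equiv \prod_{i=1}^{n-2} (0, \pi) \times (0, 2\pi)$. Then it is clear that $(A \setminus A_0) \times (0, \infty) \equiv N$ is a set of measure zero in $\mathbb{R}^n$. Therefore, from Lemma 9.22 it follows $h(N)$ is also a set of measure zero. Therefore, using the change of variables theorem, Fubini's theorem, and Sard's lemma,
$$\int_{\mathbb{R}^n} f(y)\,dy = \int_{\mathbb{R}^n \setminus \{0\}} f(y)\,dy = \int_{\mathbb{R}^n \setminus (\{0\} \cup h(N))} f(y)\,dy$$
$$= \int_{A_0 \times (0, \infty)} f(h(\phi, \theta, \rho))\,\rho^{n-1}\,\Phi(\phi, \theta)\,dm_n$$
$$= \int \mathcal{X}_{A \times (0, \infty)}(\phi, \theta, \rho)\, f(h(\phi, \theta, \rho))\,\rho^{n-1}\,\Phi(\phi, \theta)\,dm_n$$
$$= \int_0^{\infty} \rho^{n-1} \Big( \int_A f(h(\phi, \theta, \rho))\,\Phi(\phi, \theta)\,d\phi\,d\theta \Big)\,d\rho.$$
Now the claim about $f \in L^1$ follows routinely from considering the positive and negative parts of the real and imaginary parts of $f$ in the usual way. This proves the theorem.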
Notation 9.42 Often this is written differently. Note that from the spherical coordinate formulas, $f(h(\phi, \theta, \rho)) = f(\rho\omega)$ where $|\omega| = 1$. Letting $S^{n-1}$ denote the unit sphere, $\{\omega \in \mathbb{R}^n : |\omega| = 1\}$, the inside integral in the above formula is sometimes written as
$$\int_{S^{n-1}} f(\rho\omega)\,d\sigma$$
where $\sigma$ is a measure on $S^{n-1}$. See [31] for another description of this measure. It isn't an important issue here. Later in the book when integration on manifolds is discussed, more general considerations will be dealt with. Either 9.21 or the formula
$$\int_0^{\infty} \rho^{n-1} \Big( \int_{S^{n-1}} f(\rho\omega)\,d\sigma \Big)\,d\rho$$
will be referred to as polar coordinates and is very useful in establishing estimates. Here $\sigma(S^{n-1}) \equiv \int_A \Phi(\phi, \theta)\,d\phi\,d\theta$.
Example 9.43 For what values of $s$ is the integral $\int_{B(0,R)} \left(1 + |x|^2\right)^s dx$ bounded independent of $R$? Here $B(0,R)$ is the ball $\{x \in \mathbb{R}^n : |x| \le R\}$.
I think you can see immediately that $s$ must be negative but exactly how negative? It turns out it depends on $n$ and using polar coordinates, you can find just exactly what is needed. From the polar coordinates formula above,
$$\int_{B(0,R)} \left(1 + |x|^2\right)^s dx = \int_0^R \int_{S^{n-1}} \left(1 + \rho^2\right)^s \rho^{n-1}\,d\sigma\,d\rho = C_n \int_0^R \left(1 + \rho^2\right)^s \rho^{n-1}\,d\rho.$$
Now the very hard problem has been reduced to considering an easy one variable problem of finding when
$$\int_0^R \rho^{n-1} \left(1 + \rho^2\right)^s d\rho$$
is bounded independent of $R$. For large $\rho$ the integrand behaves like $\rho^{2s + n - 1}$, so you need $2s + (n-1) < -1$, that is, $s < -n/2$.
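The threshold $s < -n/2$ can be seen numerically in the one variable integral. The sketch below uses a plain midpoint rule for $n = 3$, where the threshold is $s < -3/2$; the function name `radial_integral`, the step count and the sample exponents are assumptions made for this illustration.

```python
import math


def radial_integral(n, s, R, steps=100000):
    """Midpoint-rule approximation of the integral of rho^(n-1) (1 + rho^2)^s over [0, R]."""
    width = R / steps
    total = 0.0
    for i in range(steps):
        rho = (i + 0.5) * width
        total += rho ** (n - 1) * (1.0 + rho * rho) ** s
    return total * width


# n = 3, s = -2 < -3/2: the values stay below pi/4 for every R.
bounded_case = [radial_integral(3, -2, R) for R in (10.0, 100.0)]

# n = 3, s = -1 > -3/2: the values grow roughly like R.
unbounded_case = [radial_integral(3, -1, R) for R in (10.0, 100.0)]
```

For $n = 3$ and $s = -2$ the integral has the closed form $\frac{1}{2}\left(\arctan R - \frac{R}{1+R^2}\right)$, which increases to $\pi/4$, while for $s = -1$ it equals $R - \arctan R$, which is unbounded.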
9.10 The Brouwer Fixed Point Theorem
This proof is based on Lemma 9.44. I found this proof of the Brouwer fixed point theorem or one close to it in Evans [20].
Recall that the mixed partial derivatives of a $C^2$ function are equal. In the following lemma, and elsewhere, a comma followed by an index indicates the partial derivative with respect to the indicated variable. Thus, $f_{,j}$ will mean $\frac{\partial f}{\partial x_j}$. Also, write $Dg$ for the Jacobian matrix which is the matrix of $Dg$ taken with respect to the usual basis vectors in $\mathbb{R}^n$. Recall that for $A$ an $n \times n$ matrix, $\operatorname{cof}(A)_{ij}$ is the determinant of the matrix which results from deleting the $i^{th}$ row and the $j^{th}$ column and multiplying by $(-1)^{i+j}$.
Lemma 9.44 Let $g : U \to \mathbb{R}^n$ be $C^2$ where $U$ is an open subset of $\mathbb{R}^n$. Then
$$\sum_{j=1}^{n} \operatorname{cof}(Dg)_{ij,j} = 0,$$
where here $(Dg)_{ij} \equiv g_{i,j} \equiv \frac{\partial g_i}{\partial x_j}$. Also, $\operatorname{cof}(Dg)_{ij} = \frac{\partial \det(Dg)}{\partial g_{i,j}}$.
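Proof: From the cofactor expansion theorem,
$$\det(Dg) = \sum_{i=1}^{n} g_{i,j} \operatorname{cof}(Dg)_{ij}$$
and so
$$\frac{\partial \det(Dg)}{\partial g_{i,j}} = \operatorname{cof}(Dg)_{ij} \tag{9.22}$$
which shows the last claim of the lemma. Also
$$\delta_{kj} \det(Dg) = \sum_i g_{i,k} (\operatorname{cof}(Dg))_{ij} \tag{9.23}$$
because if $k \ne j$ this is just the cofactor expansion of the determinant of a matrix in which the $k^{th}$ and $j^{th}$ columns are equal. Differentiate 9.23 with respect to $x_j$ and sum on $j$. This yields
$$\sum_{r,s,j} \delta_{kj} \frac{\partial (\det Dg)}{\partial g_{r,s}}\, g_{r,sj} = \sum_{ij} g_{i,kj} (\operatorname{cof}(Dg))_{ij} + \sum_{ij} g_{i,k} \operatorname{cof}(Dg)_{ij,j}.$$
Hence, using $\delta_{kj} = 0$ if $j \ne k$ and 9.22,
$$\sum_{rs} (\operatorname{cof}(Dg))_{rs}\, g_{r,sk} = \sum_{rs} g_{r,ks} (\operatorname{cof}(Dg))_{rs} + \sum_{ij} g_{i,k} \operatorname{cof}(Dg)_{ij,j}.$$
Subtracting the first sum on the right from both sides and using the equality of mixed partials,
$$\sum_i g_{i,k} \Big( \sum_j (\operatorname{cof}(Dg))_{ij,j} \Big) = 0.$$
If $\det(g_{i,k}) \ne 0$ so that $(g_{i,k})$ is invertible, this shows $\sum_j (\operatorname{cof}(Dg))_{ij,j} = 0$. If $\det(Dg) = 0$, let
$$g_k = g + \varepsilon_k I$$
where $\varepsilon_k \to 0$ and $\det(Dg + \varepsilon_k I) \equiv \det(Dg_k) \ne 0$. Then
$$\sum_j (\operatorname{cof}(Dg))_{ij,j} = \lim_{k \to \infty} \sum_j (\operatorname{cof}(Dg_k))_{ij,j} = 0.$$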
Definition 9.45 Let $h$ be a function defined on an open set, $U \subseteq \mathbb{R}^n$. Then $h \in C^k\left(\overline{U}\right)$ if there exists a function $g$ defined on an open set, $W$ containing $\overline{U}$ such that $g = h$ on $U$ and $g$ is $C^k(W)$.
Lemma 9.46 There does not exist $h \in C^2\left(\overline{B(0,R)}\right)$ such that $h : \overline{B(0,R)} \to \partial B(0,R)$ which also has the property that $h(x) = x$ for all $x \in \partial B(0,R)$. Such a function is called a retraction.
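Proof: Suppose such an $h$ exists. Let $\lambda \in [0,1]$ and let $p_\lambda(x) \equiv x + \lambda(h(x) - x)$. This function, $p_\lambda$, is a homotopy of the identity map and the retraction, $h$. Let
$$I(\lambda) \equiv \int_{B(0,R)} \det(Dp_\lambda(x))\,dx.$$
Then using the dominated convergence theorem,
$$I'(\lambda) = \int_{B(0,R)} \sum_{i,j} \frac{\partial \det(Dp_\lambda(x))}{\partial p_{\lambda i,j}} \frac{\partial p_{\lambda i,j}(x)}{\partial \lambda}\,dx$$
$$= \int_{B(0,R)} \sum_i \sum_j \frac{\partial \det(Dp_\lambda(x))}{\partial p_{\lambda i,j}} (h_i(x) - x_i)_{,j}\,dx$$
$$= \int_{B(0,R)} \sum_i \sum_j \operatorname{cof}(Dp_\lambda(x))_{ij} (h_i(x) - x_i)_{,j}\,dx.$$
Now by assumption, $h_i(x) = x_i$ on $\partial B(0,R)$ and so one can integrate by parts and write
$$I'(\lambda) = -\sum_i \int_{B(0,R)} \sum_j \operatorname{cof}(Dp_\lambda(x))_{ij,j} (h_i(x) - x_i)\,dx = 0$$
by Lemma 9.44. Therefore, $I(\lambda)$ equals a constant. However,
$$I(0) = m_n(B(0,R)) > 0$$
but, by Theorem 9.37,
$$|I(1)| \le \int_{B(0,R)} |\det(Dh(x))|\,dm_n = \int_{\partial B(0,R)} \#(y)\,dm_n = 0$$
because from polar coordinates or other elementary reasoning, $m_n(\partial B(0,R)) = 0$.
The following is the Brouwer fixed point theorem for $C^2$ maps.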
Lemma 9.47 If $h \in C^2\left(\overline{B(0,R)}\right)$ and $h : \overline{B(0,R)} \to \overline{B(0,R)}$, then $h$ has a fixed point, $x$ such that $h(x) = x$.
Proof: Suppose the lemma is not true. Then for all $x$, $|x - h(x)| \ne 0$. Then define
$$g(x) = h(x) + \frac{x - h(x)}{|x - h(x)|}\, t(x)$$
where $t(x)$ is nonnegative and is chosen such that $g(x) \in \partial B(0,R)$. This mapping is illustrated in the following picture.
[Figure: the points labeled $f(x)$, $x$, and $g(x)$.]
If $x \to t(x)$ is $C^2$ near $\overline{B(0,R)}$, it will follow $g$ is a $C^2$ retraction onto $\partial B(0,R)$ contrary to Lemma 9.46. Thus $t(x)$ is the nonnegative solution to
$$H(x,t) = |h(x)|^2 + 2\left(h(x), \frac{x - h(x)}{|x - h(x)|}\right) t + t^2 = R^2. \tag{9.24}$$
Then
$$H_t(x,t) = 2\left(h(x), \frac{x - h(x)}{|x - h(x)|}\right) + 2t.$$
If this is nonzero for all $x$ near $\overline{B(0,R)}$, it follows from the implicit function theorem that $t$ is a $C^2$ function of $x$. Then from 9.24
$$2t = -2\left(h(x), \frac{x - h(x)}{|x - h(x)|}\right) \pm \sqrt{4\left(h(x), \frac{x - h(x)}{|x - h(x)|}\right)^2 - 4\left(|h(x)|^2 - R^2\right)}$$
and so
$$H_t(x,t) = 2t + 2\left(h(x), \frac{x - h(x)}{|x - h(x)|}\right) = \pm\sqrt{4\left(R^2 - |h(x)|^2\right) + 4\left(h(x), \frac{x - h(x)}{|x - h(x)|}\right)^2}.$$
If $|h(x)| < R$, this is nonzero. If $|h(x)| = R$, then it is still nonzero unless
$$(h(x), x - h(x)) = 0.$$
But this cannot happen because the angle between $h(x)$ and $x - h(x)$ cannot be $\pi/2$. Alternatively, if the above equals zero, you would need
$$(h(x), x) = |h(x)|^2 = R^2$$
which cannot happen unless $x = h(x)$ which is assumed not to happen. Therefore, $x \to t(x)$ is $C^2$ near $\overline{B(0,R)}$ and so $g(x)$ given above contradicts Lemma 9.46.
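The construction of $g$ in this proof is explicit enough to compute: $t(x)$ is the nonnegative root of the quadratic 9.24. The sketch below evaluates it for a sample map; the function name `push_to_sphere` and the map `sample_h` are hypothetical and serve only to illustrate the formula at points where $h(x) \ne x$.

```python
import math


def push_to_sphere(h, x, R):
    """Follow the ray from h(x) through x until it meets the sphere |y| = R.

    Solves |h(x)|^2 + 2 (h(x), u) t + t^2 = R^2 for the nonnegative root t,
    where u = (x - h(x)) / |x - h(x)|, and returns g(x) = h(x) + t u.
    """
    hx = h(x)
    d = [xi - hi for xi, hi in zip(x, hx)]
    length = math.sqrt(sum(c * c for c in d))
    if length == 0.0:
        raise ValueError("x is a fixed point of h, so the ray is undefined")
    u = [c / length for c in d]
    b = sum(hi * ui for hi, ui in zip(hx, u))      # (h(x), u)
    c = sum(hi * hi for hi in hx) - R * R          # |h(x)|^2 - R^2 <= 0
    t = -b + math.sqrt(b * b - c)                  # the nonnegative root
    return [hi + t * ui for hi, ui in zip(hx, u)]


def sample_h(x):
    """Hypothetical map of the closed unit disk into itself (rotate and halve)."""
    return [0.5 * x[1], -0.5 * x[0]]


inside = push_to_sphere(sample_h, [0.3, 0.4], 1.0)
on_boundary = push_to_sphere(sample_h, [0.6, 0.8], 1.0)
```

The value `inside` has length $R = 1$, and `on_boundary` reproduces the boundary point $(0.6, 0.8)$, which is the property $g(x) = x$ on $\partial B(0,R)$ used to call $g$ a retraction. This particular `sample_h` fixes the origin, in agreement with the lemma, so the construction breaks down there.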
Now it is easy to prove the Brouwer fixed point theorem.
Theorem 9.48 Let $f : \overline{B(0,R)} \to \overline{B(0,R)}$ be continuous. Then $f$ has a fixed point.
Proof: If this is not so, there exists $\varepsilon > 0$ such that for all $x \in \overline{B(0,R)}$,
$$|x - f(x)| > \varepsilon.$$
By the Weierstrass approximation theorem, there exists $h$, a polynomial such that
$$\max\left\{|h(x) - f(x)| : x \in \overline{B(0,R)}\right\} < \frac{\varepsilon}{2}.$$
Then for all $x \in \overline{B(0,R)}$,
$$|x - h(x)| \ge |x - f(x)| - |h(x) - f(x)| > \varepsilon - \frac{\varepsilon}{2} = \frac{\varepsilon}{2}$$
contradicting Lemma 9.47. This proves the theorem.
It is not surprising that the ball does not need to be centered at 0.
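Corollary 9.49 Let $f : \overline{B(a,r)} \to \overline{B(a,r)}$ be continuous. Then there exists $x \in \overline{B(a,r)}$ such that $f(x) = x$.
Proof: Let $g : \overline{B_r} \to \overline{B_r}$ be defined by $g(y) \equiv f(y + a) - a$, where $\overline{B_r} \equiv \overline{B(0,r)}$. Then $g$ is a continuous map from $\overline{B_r}$ to $\overline{B_r}$. Therefore, there exists $y \in \overline{B_r}$ such that $g(y) = y$. Therefore, $f(y + a) - a = y$ and so letting $x = y + a$, $f$ also has a fixed point as claimed.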
9.11 Exercises
1. Let $R \in \mathcal{L}(\mathbb{R}^n, \mathbb{R}^n)$. Show that $R$ preserves distances if and only if $RR^* = R^*R = I$.
2. Let $f$ be a nonnegative strictly decreasing function defined on $[0, \infty)$. For $0 \le y \le f(0)$, let $f^{-1}(y) = x$ where $y \in [f(x+), f(x-)]$. (Draw a picture. $f$ could have jump discontinuities.) Show that $f^{-1}$ is nonincreasing and that
$$\int_0^{\infty} f(t)\,dt = \int_0^{f(0)} f^{-1}(y)\,dy.$$
Hint: Use the distribution function description.
3. Let $X$ be a metric space and let $Y \subseteq X$, so $Y$ is also a metric space. Show the Borel sets of $Y$ are the Borel sets of $X$ intersected with the set $Y$.
4. Consider the following nested sequence of compact sets, $P_n$. We let $P_1 = [0,1]$, $P_2 = \left[0, \frac{1}{3}\right] \cup \left[\frac{2}{3}, 1\right]$, etc. To go from $P_n$ to $P_{n+1}$, delete the open interval which is the middle third of each closed interval in $P_n$. Let $P = \bigcap_{n=1}^{\infty} P_n$. Since $P$ is the intersection of nested nonempty compact sets, it follows from advanced calculus that $P \ne \emptyset$. Show $m(P) = 0$. Show there is a one to one onto mapping of $[0,1]$ to $P$. The set $P$ is called the Cantor set. Thus, although $P$ has measure zero, it has the same number of points in it as $[0,1]$ in the sense that there is a one to one and onto mapping from one to the other. Hint: There are various ways of doing this last part but the most enlightenment is obtained by exploiting the construction of the Cantor set rather than some silly representation in terms of sums of powers of two and three. All you need to do is use the theorems related to the Schroder Bernstein theorem and show there is an onto map from the Cantor set to $[0,1]$. If you do this right and remember the theorems about characterizations of compact metric spaces, you may get a pretty good idea why every compact metric space is the continuous image of the Cantor set which is a really interesting theorem in topology.
5. Consider the sequence of functions defined in the following way. Let $f_1(x) = x$ on $[0,1]$. To get from $f_n$ to $f_{n+1}$, let $f_{n+1} = f_n$ on all intervals where $f_n$ is constant. If $f_n$ is nonconstant on $[a,b]$, let $f_{n+1}(a) = f_n(a)$, $f_{n+1}(b) = f_n(b)$, and let $f_{n+1}$ be piecewise linear and equal to $\frac{1}{2}(f_n(a) + f_n(b))$ on the middle third of $[a,b]$. Sketch a few of these and you will see the pattern. The process of modifying a nonconstant section of the graph of this function is illustrated in the following picture.
Show $\{f_n\}$ converges uniformly on $[0,1]$. If $f(x) = \lim_{n \to \infty} f_n(x)$, show that $f(0) = 0$, $f(1) = 1$, $f$ is continuous, and $f'(x) = 0$ for all $x \notin P$ where $P$ is the Cantor set. This function is called the Cantor function. It is a very important example to remember. Note it has derivative equal to zero a.e. and yet it succeeds in climbing from 0 to 1. Hint: This isn't too hard if you focus on getting a careful estimate on the difference between two successive functions in the list considering only a typical small interval in which the change takes place. The above picture should be helpful.
6. Let $m(W) > 0$, $W$ is measurable, $W \subseteq [a,b]$. Show there exists a nonmeasurable subset of $W$. The ideas involved in getting this nonmeasurable set are due to Vitali. Hint: Let $x \sim y$ if $x - y \in \mathbb{Q}$. Observe that $\sim$ is an equivalence relation on $\mathbb{R}$. See Definition 1.9 on Page 19 for a review of this terminology. Let $\mathcal{C}$ be the set of equivalence classes and let $\mathcal{D} \equiv \{C \cap W : C \in \mathcal{C} \text{ and } C \cap W \ne \emptyset\}$. By the axiom of choice, there exists a set, $A$, consisting of exactly one point from each of the nonempty sets which are the elements of $\mathcal{D}$. Show
$$W \subseteq \bigcup_{r \in \mathbb{Q}} A + r \tag{a.}$$
$$A + r_1 \cap A + r_2 = \emptyset \text{ if } r_1 \ne r_2,\ r_i \in \mathbb{Q}. \tag{b.}$$
Observe that since $A \subseteq [a,b]$, then $A + r \subseteq [a-1, b+1]$ whenever $|r| < 1$. Use this to show that if $m(A) = 0$, or if $m(A) > 0$, a contradiction results. Show there exists some set, $S$, such that $\overline{m}(S) < \overline{m}(S \cap A) + \overline{m}(S \setminus A)$ where $\overline{m}$ is the outer measure determined by $m$.
7. This problem gives a very interesting example found in the book by McShane [35]. Let $g(x) = x + f(x)$ where $f$ is the strange function of Problem 5. Let $P$ be the Cantor set of Problem 4. Let $[0,1] \setminus P = \bigcup_{j=1}^{\infty} I_j$ where $I_j$ is open and $I_j \cap I_k = \emptyset$ if $j \ne k$. These intervals are the connected components of the complement of the Cantor set. Show $m(g(I_j)) = m(I_j)$ so
$$m\Big(g\Big(\bigcup_{j=1}^{\infty} I_j\Big)\Big) = \sum_{j=1}^{\infty} m(g(I_j)) = \sum_{j=1}^{\infty} m(I_j) = 1.$$
Thus $m(g(P)) = 1$ because $g([0,1]) = [0,2]$. By Problem 6 there exists a set, $A \subseteq g(P)$, which is non measurable. Define $\phi(x) = \mathcal{X}_A(g(x))$. Thus $\phi(x) = 0$ unless $x \in P$. Tell why $\phi$ is measurable. (Recall $m(P) = 0$ and Lebesgue measure is complete.) Now show that $\mathcal{X}_A(y) = \phi(g^{-1}(y))$ for $y \in [0,2]$. Tell why $g^{-1}$ is continuous but $\phi \circ g^{-1}$ is not measurable. (This is an example of measurable $\circ$ continuous $\ne$ measurable.) Show there exist Lebesgue measurable sets which are not Borel measurable. Hint: The function $\phi$ is Lebesgue measurable. Now recall that Borel $\circ$ measurable $=$ measurable.
8. If $A$ is $m \lfloor S$ measurable, it does not follow that $A$ is $m$ measurable. Give an example to show this is the case.
9. Let $f(y) = g(y) = |y|^{-1/2}$ if $y \in (-1,0) \cup (0,1)$ and $f(y) = g(y) = 0$ if $y \notin (-1,0) \cup (0,1)$. For which values of $x$ does it make sense to write the integral $\int_{\mathbb{R}} f(x-y)\,g(y)\,dy$?
10. Let $f \in L^1(\mathbb{R})$, $g \in L^1(\mathbb{R})$. Wherever the integral makes sense, define
$$(f * g)(x) \equiv \int_{\mathbb{R}} f(x-y)\,g(y)\,dy.$$
Show the above integral makes sense for a.e. $x$ and that if $f * g(x)$ is defined to equal 0 at every point where the above integral does not make sense, it follows that $|(f * g)(x)| < \infty$ a.e. and
$$\|f * g\|_{L^1} \le \|f\|_{L^1} \|g\|_{L^1}. \text{ Here } \|f\|_{L^1} \equiv \int |f|\,dx.$$
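11. Let $f : [0, \infty) \to \mathbb{R}$ be in $L^1(\mathbb{R}, m)$. The Laplace transform is given by $\widehat{f}(x) = \int_0^{\infty} e^{-xt} f(t)\,dt$. Let $f, g$ be in $L^1(\mathbb{R}, m)$, and let $h(x) = \int_0^x f(x-t)\,g(t)\,dt$. Show $h \in L^1$, and $\widehat{h} = \widehat{f}\,\widehat{g}$.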
12. Show that the function $\sin(x)/x$ is not in $L^1(0, \infty)$. Even though this function is not in $L^1(0, \infty)$, show $\lim_{A \to \infty} \int_0^A \frac{\sin x}{x}\,dx = \frac{\pi}{2}$. This limit is sometimes called the Cauchy principal value and it is often the case that this is what is found when you use methods of contour integrals to evaluate improper integrals. Hint: Use $\frac{1}{x} = \int_0^{\infty} e^{-xt}\,dt$ and Fubini's theorem.
13. Let $E$ be a countable subset of $\mathbb{R}$. Show $m(E) = 0$. Hint: Let the set be $\{e_i\}_{i=1}^{\infty}$ and let $e_i$ be the center of an open interval of length $\varepsilon/2^i$.
14. If S is an uncountable set of irrational numbers, is it necessary that S
has a rational number as a limit point? Hint: Consider the proof of Problem
13 when applied to the rational numbers. (This problem was shown to me by
Lee Erlebach.)
15. Let $f$ be a function defined on an interval, $(a,b)$. The Dini derivates are defined as
$$D_+ f(x) \equiv \liminf_{h \to 0+} \frac{f(x+h) - f(x)}{h}, \qquad D^+ f(x) \equiv \limsup_{h \to 0+} \frac{f(x+h) - f(x)}{h},$$
$$D_- f(x) \equiv \liminf_{h \to 0+} \frac{f(x) - f(x-h)}{h}, \qquad D^- f(x) \equiv \limsup_{h \to 0+} \frac{f(x) - f(x-h)}{h}.$$
Suppose $f$ is continuous on $(a,b)$ and for all $x \in (a,b)$, $D_+ f(x) \ge 0$. Show that then $f$ is increasing on $(a,b)$. Hint: Consider the function $H(x) \equiv f(x)(d-c) - x(f(d) - f(c))$ where $a < c < d < b$. Thus $H(c) = H(d)$. Also it is easy to see that $H$ cannot be constant if $f(d) < f(c)$ due to the assumption that $D_+ f(x) \ge 0$. If there exists $x_1 \in (c,d)$ where $H(x_1) > H(c)$, then let $x_0 \in (c,d)$ be the point where the maximum of $H$ occurs. Consider $D_+ f(x_0)$. If, on the other hand, $H(x) < H(c)$ for all $x \in (c,d)$, then consider $D_+ H(c)$.
16. Suppose in the situation of the above problem we only know
$$D_+ f(x) \ge 0 \text{ a.e.}$$
Does the conclusion still follow? What if we only know $D_+ f(x) \ge 0$ for every $x$ outside a countable set? Hint: In the case of $D_+ f(x) \ge 0$ a.e., consider the bad function described above in Problem 5. In the case where $D_+ f(x) \ge 0$ for all but countably many $x$, by replacing $f(x)$ with $\widetilde{f}(x) \equiv f(x) + \varepsilon x$, consider the situation where $D_+ \widetilde{f}(x) > 0$ for all but countably many $x$. If in this situation, $\widetilde{f}(c) > \widetilde{f}(d)$ for some $c < d$, and $y_0 \in \left(\widetilde{f}(d), \widetilde{f}(c)\right)$, let
$$z \equiv \sup\left\{x \in [c,d] : \widetilde{f}(x) > y_0\right\}.$$
Show that $\widetilde{f}(z) = y_0$ and $D_+ \widetilde{f}(z) \le 0$. Conclude that if $\widetilde{f}$ fails to be increasing, then $D_+ \widetilde{f}(z) \le 0$ for uncountably many points, $z$. Now draw a conclusion about $f$.
17. Let $f : [a,b] \to \mathbb{R}$ be increasing. Show
$$m(N_{pq}) = 0, \qquad N_{pq} \equiv \{x : D^+ f(x) > q > p > D_+ f(x)\} \qquad (9.25)$$
and conclude that aside from a set of measure zero, $D^+ f(x) = D_+ f(x)$.
Similar reasoning will show $D^- f(x) = D_- f(x)$ a.e. and $D^+ f(x) = D^- f(x)$
a.e. and so off some set of measure zero, we have
$$D_- f(x) = D^- f(x) = D^+ f(x) = D_+ f(x)$$
which implies the derivative exists and equals this common value. Hint: To
show 9.25, let U be an open set containing $N_{pq}$ such that $m(N_{pq}) + \varepsilon > m(U)$.
For each $x \in N_{pq}$ there exist y > x arbitrarily close to x such that
$$f(y) - f(x) < p(y-x).$$
Thus the set of such intervals, $\{[x,y]\}$, which are contained in U constitutes a
Vitali cover of $N_{pq}$. Let $\{[x_i,y_i]\}$ be disjoint and
$$m\big(N_{pq} \setminus \cup_i [x_i,y_i]\big) = 0.$$
Now let $V \equiv \cup_i (x_i,y_i)$. Then also
$$m(N_{pq} \setminus V) = 0$$
and so $m(N_{pq} \cap V) = m(N_{pq})$. For each $x \in N_{pq} \cap V$, there exist y > x
arbitrarily close to x such that
$$f(y) - f(x) > q(y-x).$$
Thus the set of such intervals, $\{[x',y']\}$, which are contained in V is a Vitali
cover of $N_{pq} \cap V$. Let $\{[x_i',y_i']\}$ be disjoint and
$$m\big(N_{pq} \cap V \setminus \cup_i [x_i',y_i']\big) = 0.$$
Then verify the following:
$$\sum_i f(y_i') - f(x_i') > q\sum_i (y_i' - x_i') \ge q\,m(N_{pq} \cap V) = q\,m(N_{pq})$$
$$\ge p\,m(N_{pq}) > p(m(U) - \varepsilon) \ge p\sum_i (y_i - x_i) - p\varepsilon$$
$$\ge \sum_i (f(y_i) - f(x_i)) - p\varepsilon \ge \sum_i f(y_i') - f(x_i') - p\varepsilon$$
and therefore, $(q-p)\,m(N_{pq}) \le p\varepsilon$. Since $\varepsilon > 0$ is arbitrary, this proves that
there is a right derivative a.e. A similar argument does the other cases.
The $L^p$ Spaces
10.1 Basic Inequalities And Properties
One of the main applications of the Lebesgue integral is to the study of various
sorts of function spaces. These are vector spaces whose elements are functions of
various types. One of the most important examples of a function space is the space
of measurable functions whose absolute values are $p^{th}$ power integrable where $p \ge 1$.
These spaces, referred to as $L^p$ spaces, are very useful in applications. In this chapter
$(\Omega, \mathcal{S}, \mu)$ will be a measure space.
Definition 10.1 Let $1 \le p < \infty$. Define
$$L^p(\Omega) \equiv \Big\{ f : f \text{ is measurable and } \int_\Omega |f(\omega)|^p\,d\mu < \infty \Big\}$$
In terms of the distribution function,
$$L^p(\Omega) = \Big\{ f : f \text{ is measurable and } \int_0^\infty p\,t^{p-1}\mu([|f| > t])\,dt < \infty \Big\}$$
For each p > 1 define q by
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Often one uses $p'$ instead of q in this context.
$L^p(\Omega)$ is a vector space and has a norm. This is similar to the situation for $\mathbb{R}^n$
but the proof requires the following fundamental inequality.
Theorem 10.2 (Hölder's inequality) If f and g are measurable functions, then if
p > 1,
$$\int |f|\,|g|\,d\mu \le \Big(\int |f|^p\,d\mu\Big)^{1/p}\Big(\int |g|^q\,d\mu\Big)^{1/q}. \qquad (10.1)$$
Proof: First here is a proof of Young's inequality.
Lemma 10.3 If p > 1, and $0 \le a, b$ then $ab \le \frac{a^p}{p} + \frac{b^q}{q}$.
Proof: Consider the following picture:
[Figure: the curve $x = t^{p-1}$, equivalently $t = x^{q-1}$, drawn in the (t, x) plane, with a marked on the t axis and b marked on the x axis.]
From this picture, the sum of the area between the x axis and the curve added to
the area between the t axis and the curve is at least as large as ab. Using beginning
calculus, this is equivalent to the following inequality.
$$ab \le \int_0^a t^{p-1}\,dt + \int_0^b x^{q-1}\,dx = \frac{a^p}{p} + \frac{b^q}{q}.$$
The above picture represents the situation which occurs when p > 2 because the
graph of the function is concave up. If $2 \ge p > 1$ the graph would be concave down
or a straight line. You should verify that the same argument holds in these cases
just as well. In fact, the only thing which matters in the above inequality is that
the function $x = t^{p-1}$ be strictly increasing.
Note equality occurs when $a^p = b^q$.
Here is an alternate proof.
Lemma 10.4 For $a, b \ge 0$,
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}$$
and equality occurs if and only if $a^p = b^q$.
Proof: If b = 0, the inequality is obvious. Fix b > 0 and consider
$f(a) \equiv \frac{a^p}{p} + \frac{b^q}{q} - ab$. Then $f'(a) = a^{p-1} - b$. This is negative when
$a < b^{1/(p-1)}$ and is positive when $a > b^{1/(p-1)}$. Therefore, f has a minimum when
$a = b^{1/(p-1)}$. In other words, when $a^p = b^{p/(p-1)} = b^q$ since 1/p + 1/q = 1. Thus
the minimum value of f is
$$\frac{b^q}{p} + \frac{b^q}{q} - b^{1/(p-1)}\,b = b^q - b^q = 0.$$
It follows $f \ge 0$ and this yields the desired inequality.
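As a numerical sanity check of Young's inequality and its equality case, here is a minimal Python sketch. The helper name `young_gap` and the sampling ranges are illustrative assumptions, not part of the notes; the code only spot-checks the inequality on random inputs and does not replace the proof.

```python
import random

def young_gap(a, b, p):
    """Return a**p/p + b**q/q - a*b, where 1/p + 1/q = 1 (hypothetical helper)."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b

random.seed(0)
samples = [(random.uniform(0, 5), random.uniform(0, 5), random.uniform(1.1, 6))
           for _ in range(1000)]
# Young's inequality says the gap is never negative.
min_gap = min(young_gap(a, b, p) for a, b, p in samples)

# Equality case a**p == b**q, i.e. a = b**(1/(p-1)).
p, b = 3.0, 2.0
a_eq = b ** (1.0 / (p - 1.0))
gap_eq = young_gap(a_eq, b, p)
print(min_gap >= -1e-9, abs(gap_eq) < 1e-9)
```

The tolerance `1e-9` only absorbs floating point rounding.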
Proof of Hölder's inequality: If either $\int |f|^p\,d\mu$ or $\int |g|^q\,d\mu$ equals $\infty$, the
inequality 10.1 is obviously valid because $\infty \ge$ anything. If either $\int |f|^p\,d\mu$ or
$\int |g|^q\,d\mu$ equals 0, then f = 0 a.e. or g = 0 a.e. and so in this case the left side of
the inequality equals 0 and so the inequality is therefore true. Therefore assume both
$\int |f|^p\,d\mu$ and $\int |g|^q\,d\mu$ are less than $\infty$ and not equal to 0. Let
$$\Big(\int |f|^p\,d\mu\Big)^{1/p} = I(f)$$
and let $\big(\int |g|^q\,d\mu\big)^{1/q} = I(g)$. Then using the lemma,
$$\int \frac{|f|}{I(f)}\frac{|g|}{I(g)}\,d\mu \le \frac{1}{p}\int \frac{|f|^p}{I(f)^p}\,d\mu + \frac{1}{q}\int \frac{|g|^q}{I(g)^q}\,d\mu = 1.$$
Hence,
$$\int |f|\,|g|\,d\mu \le I(f)\,I(g) = \Big(\int |f|^p\,d\mu\Big)^{1/p}\Big(\int |g|^q\,d\mu\Big)^{1/q}.$$
This proves Hölder's inequality.
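For counting measure on a finite set, the integrals in 10.1 are finite sums, so the inequality can be spot-checked numerically. The following Python sketch does this; the names `p_norm` and `holder_sides` are hypothetical helpers introduced only for illustration.

```python
import random

def p_norm(v, p):
    """(sum |v_i|**p)**(1/p), the L^p norm for counting measure."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def holder_sides(f, g, p):
    """Return (left side, right side) of Holder's inequality for finite sequences."""
    q = p / (p - 1.0)
    lhs = sum(abs(x) * abs(y) for x, y in zip(f, g))
    rhs = p_norm(f, p) * p_norm(g, q)
    return lhs, rhs

random.seed(1)
ok = True
for _ in range(500):
    n = random.randint(1, 8)
    p = random.uniform(1.1, 5.0)
    f = [random.uniform(-3, 3) for _ in range(n)]
    g = [random.uniform(-3, 3) for _ in range(n)]
    lhs, rhs = holder_sides(f, g, p)
    # Small slack for floating point rounding only.
    ok = ok and lhs <= rhs * (1 + 1e-12) + 1e-12
print(ok)
```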
The following lemma will be needed.
Lemma 10.5 Suppose $x, y \in \mathbb{C}$. Then
$$|x+y|^p \le 2^{p-1}(|x|^p + |y|^p).$$
Proof: The function $f(t) = t^p$ is concave up for $t \ge 0$ because p > 1. Therefore,
the secant line joining two points on the graph of this function must lie above the
graph of the function. This is illustrated in the following picture.
[Figure: graph of $t^p$ with the secant line joining the points over |x| and |y|; the midpoint is $m = (|x|+|y|)/2$.]
Now as shown above,
$$\Big(\frac{|x|+|y|}{2}\Big)^p \le \frac{|x|^p + |y|^p}{2}$$
which implies
$$|x+y|^p \le (|x|+|y|)^p \le 2^{p-1}(|x|^p + |y|^p)$$
Note that if $y = \phi(x)$ is any function for which the graph of $\phi$ is concave up,
you could get a similar inequality by the same argument.
Corollary 10.6 (Minkowski inequality) Let $1 \le p < \infty$. Then
$$\Big(\int |f+g|^p\,d\mu\Big)^{1/p} \le \Big(\int |f|^p\,d\mu\Big)^{1/p} + \Big(\int |g|^p\,d\mu\Big)^{1/p}. \qquad (10.2)$$
Proof: If p = 1, this is obvious because it is just the triangle inequality. Let
p > 1. Without loss of generality, assume
$$\Big(\int |f|^p\,d\mu\Big)^{1/p} + \Big(\int |g|^p\,d\mu\Big)^{1/p} < \infty$$
and $\big(\int |f+g|^p\,d\mu\big)^{1/p} \ne 0$ or there is nothing to prove. Therefore, using the above
lemma,
$$\int |f+g|^p\,d\mu \le 2^{p-1}\Big(\int |f|^p + |g|^p\,d\mu\Big) < \infty.$$
Now $|f(\omega)+g(\omega)|^p \le |f(\omega)+g(\omega)|^{p-1}(|f(\omega)| + |g(\omega)|)$. Also, it follows from the
definition of p and q that $p - 1 = \frac{p}{q}$. Therefore, using this and Hölder's inequality,
$$\int |f+g|^p\,d\mu \le \int |f+g|^{p-1}|f|\,d\mu + \int |f+g|^{p-1}|g|\,d\mu$$
$$= \int |f+g|^{p/q}|f|\,d\mu + \int |f+g|^{p/q}|g|\,d\mu$$
$$\le \Big(\int |f+g|^p\,d\mu\Big)^{1/q}\Big(\int |f|^p\,d\mu\Big)^{1/p} + \Big(\int |f+g|^p\,d\mu\Big)^{1/q}\Big(\int |g|^p\,d\mu\Big)^{1/p}.$$
Dividing both sides by $\big(\int |f+g|^p\,d\mu\big)^{1/q}$ yields 10.2. This proves the corollary.
The following follows immediately from the above.
Corollary 10.7 Let $f_i \in L^p(\Omega)$ for $i = 1, 2, \cdots, n$. Then
$$\Big(\int \Big|\sum_{i=1}^n f_i\Big|^p\,d\mu\Big)^{1/p} \le \sum_{i=1}^n \Big(\int |f_i|^p\,d\mu\Big)^{1/p}.$$
This shows that if $f, g \in L^p$, then $f + g \in L^p$. Also, it is clear that if a is a
constant and $f \in L^p$, then $af \in L^p$ because
$$\int |af|^p\,d\mu = |a|^p\int |f|^p\,d\mu < \infty.$$
Thus $L^p$ is a vector space and
a.) $\big(\int |f|^p\,d\mu\big)^{1/p} \ge 0$, and $\big(\int |f|^p\,d\mu\big)^{1/p} = 0$ if and only if f = 0 a.e.
b.) $\big(\int |af|^p\,d\mu\big)^{1/p} = |a|\big(\int |f|^p\,d\mu\big)^{1/p}$ if a is a scalar.
c.) $\big(\int |f+g|^p\,d\mu\big)^{1/p} \le \big(\int |f|^p\,d\mu\big)^{1/p} + \big(\int |g|^p\,d\mu\big)^{1/p}$.
$f \to \big(\int |f|^p\,d\mu\big)^{1/p}$ would define a norm if $\big(\int |f|^p\,d\mu\big)^{1/p} = 0$ implied f = 0.
Unfortunately, this is not so because if f = 0 a.e. but is nonzero on a set of
measure zero, $\big(\int |f|^p\,d\mu\big)^{1/p} = 0$ and this is not allowed. However, all the other
properties of a norm are available and so a little thing like a set of measure zero
will not prevent the consideration of $L^p$ as a normed vector space if two functions
in $L^p$ which differ only on a set of measure zero are considered the same. That is,
an element of $L^p$ is really an equivalence class of functions where two functions are
equivalent if they are equal a.e. With this convention, here is a definition.
Definition 10.8 Let $f \in L^p(\Omega)$. Define
$$\|f\|_p \equiv \|f\|_{L^p} \equiv \Big(\int |f|^p\,d\mu\Big)^{1/p}.$$
Then with this definition and using the convention that elements in $L^p$ are
considered to be the same if they differ only on a set of measure zero, $\|\cdot\|_p$ is a
norm on $L^p(\Omega)$ because if $\|f\|_p = 0$ then f = 0 a.e. and so f is considered to be
the zero function because it differs from 0 only on a set of measure zero.
The following is an important definition.
Definition 10.9 A complete normed linear space is called a Banach$^1$ space.
$L^p$ is a Banach space. This is the next big theorem.
Theorem 10.10 The following hold for $L^p(\Omega)$.
a.) $L^p(\Omega)$ is complete.
b.) If $\{f_n\}$ is a Cauchy sequence in $L^p(\Omega)$, then there exists $f \in L^p(\Omega)$ and a
subsequence which converges a.e. to $f \in L^p(\Omega)$, and $\|f_n - f\|_p \to 0$.
Proof: Let $\{f_n\}$ be a Cauchy sequence in $L^p(\Omega)$. This means that for every
$\varepsilon > 0$ there exists N such that if $n, m \ge N$, then $\|f_n - f_m\|_p < \varepsilon$. Now select a
subsequence as follows. Let $n_1$ be such that $\|f_n - f_m\|_p < 2^{-1}$ whenever $n, m \ge n_1$.
Let $n_2$ be such that $n_2 > n_1$ and $\|f_n - f_m\|_p < 2^{-2}$ whenever $n, m \ge n_2$. If
$n_1, \cdots, n_k$ have been chosen, let $n_{k+1} > n_k$ and whenever $n, m \ge n_{k+1}$,
$\|f_n - f_m\|_p < 2^{-(k+1)}$. The subsequence just mentioned is $\{f_{n_k}\}$. Thus,
$\|f_{n_k} - f_{n_{k+1}}\|_p < 2^{-k}$. Let
$$g_{k+1} = f_{n_{k+1}} - f_{n_k}.$$
$^1$ These spaces are named after Stefan Banach, 1892-1945. Banach spaces are the basic item of
study in the subject of functional analysis and will be considered later in this book.
There is a recent biography of Banach, R. Kałuża, The Life of Stefan Banach, (A. Kostant and
W. Woyczyński, translators and editors) Birkhäuser, Boston (1996). More information on Banach
can also be found in a recent short article written by Douglas Henderson who is in the department
of chemistry and biochemistry at BYU.
Banach was born in Austria, worked in Poland and died in the Ukraine but never moved. This
is because borders kept changing. There is a rumor that he died in a German concentration camp
which is apparently not true. It seems he died after the war of lung cancer.
He was an interesting character. He hated taking examinations so much that he did not receive
his undergraduate university degree. Nevertheless, he did become a professor of mathematics due
to his important research. He and some friends would meet in a cafe called the Scottish cafe where
they wrote on the marble table tops until Banach's wife supplied them with a notebook which
became the Scottish notebook and was eventually published.
Then by the corollary to Minkowski's inequality,
$$\infty > \sum_{k=1}^\infty \|g_{k+1}\|_p \ge \sum_{k=1}^m \|g_{k+1}\|_p \ge \Big\|\sum_{k=1}^m |g_{k+1}|\Big\|_p$$
for all m. It follows that
$$\int \Big(\sum_{k=1}^m |g_{k+1}|\Big)^p d\mu \le \Big(\sum_{k=1}^\infty \|g_{k+1}\|_p\Big)^p < \infty \qquad (10.3)$$
for all m and so the monotone convergence theorem implies that the sum up to m
in 10.3 can be replaced by a sum up to $\infty$. Thus,
$$\int \Big(\sum_{k=1}^\infty |g_{k+1}|\Big)^p d\mu < \infty$$
which requires
$$\sum_{k=1}^\infty |g_{k+1}(x)| < \infty \text{ a.e. } x.$$
Therefore, $\sum_{k=1}^\infty g_{k+1}(x)$ converges for a.e. x because the functions have values in
a complete space, $\mathbb{C}$, and this shows the partial sums form a Cauchy sequence. Now
let x be such that this sum is finite. Then define
$$f(x) \equiv f_{n_1}(x) + \sum_{k=1}^\infty g_{k+1}(x) = \lim_{m\to\infty} f_{n_m}(x)$$
since $\sum_{k=1}^m g_{k+1}(x) = f_{n_{m+1}}(x) - f_{n_1}(x)$. Therefore there exists a set, E having
measure zero such that
$$\lim_{k\to\infty} f_{n_k}(x) = f(x)$$
for all $x \notin E$. Redefine $f_{n_k}$ to equal 0 on E and let f(x) = 0 for $x \in E$. It then
follows that $\lim_{k\to\infty} f_{n_k}(x) = f(x)$ for all x. By Fatou's lemma, and the Minkowski
inequality,
$$\|f - f_{n_k}\|_p = \Big(\int |f - f_{n_k}|^p\,d\mu\Big)^{1/p} \le \liminf_{m\to\infty}\Big(\int |f_{n_m} - f_{n_k}|^p\,d\mu\Big)^{1/p} = \liminf_{m\to\infty}\|f_{n_m} - f_{n_k}\|_p$$
$$\le \liminf_{m\to\infty}\sum_{j=k}^{m-1}\|f_{n_{j+1}} - f_{n_j}\|_p \le \sum_{i=k}^\infty \|f_{n_{i+1}} - f_{n_i}\|_p \le 2^{-(k-1)}. \qquad (10.4)$$
Therefore, $f \in L^p(\Omega)$ because
$$\|f\|_p \le \|f - f_{n_k}\|_p + \|f_{n_k}\|_p < \infty,$$
and $\lim_{k\to\infty}\|f_{n_k} - f\|_p = 0$. This proves b.).
This has shown $f_{n_k}$ converges to f in $L^p(\Omega)$. It follows the original Cauchy
sequence also converges to f in $L^p(\Omega)$. This is a general fact that if a subsequence
of a Cauchy sequence converges, then so does the original Cauchy sequence. You
should give a proof of this or you could see Lemma 5.5. This proves the theorem.
10.2 Hilbert Space And Riesz Representation Theorem
The special case where p = 2 gives an example of a Hilbert space.
Definition 10.11 Let X be a vector space. An inner product is a mapping from
$X \times X$ to $\mathbb{C}$ if X is complex and from $X \times X$ to $\mathbb{R}$ if X is real, denoted by (x, y)
which satisfies the following.
$$(x,x) \ge 0, \quad (x,x) = 0 \text{ if and only if } x = 0, \qquad (10.5)$$
$$(x,y) = \overline{(y,x)}. \qquad (10.6)$$
For $a, b \in \mathbb{C}$ and $x, y, z \in X$,
$$(ax+by, z) = a(x,z) + b(y,z). \qquad (10.7)$$
Note that 10.6 and 10.7 imply $(x, ay+bz) = \bar a(x,y) + \bar b(x,z)$. Such a vector space
is called an inner product space.
An example of such an inner product space is $L^2(\Omega;\mu)$ with the inner product
given by
$$(f,g)_{L^2} \equiv \int_\Omega f(\omega)\,\overline{g(\omega)}\,d\mu$$
This is with the understanding that f = g if they are equal for a.e. $\omega$. In the context
of $L^2(\Omega;\mu)$, the following fundamental inequality is a special case of Hölder's
inequality. However, I will give an elegant proof of it which depends only on algebra.
Theorem 10.12 (Cauchy Schwarz) In any inner product space
$$|(x,y)| \le \|x\|\,\|y\|$$
where $\|x\| \equiv (x,x)^{1/2}$.
Proof: Let $\omega \in \mathbb{C}$, $|\omega| = 1$, and $\bar\omega(x,y) = |(x,y)| = \operatorname{Re}(x, y\omega)$. Let
$$F(t) = (x + t\omega y, x + t\omega y).$$
If y = 0 there is nothing to prove because
$$(x,0) = (x, 0+0) = (x,0) + (x,0)$$
and so (x, 0) = 0. Thus, it can be assumed $y \ne 0$. Then from the axioms of the
inner product,
$$F(t) = \|x\|^2 + 2t\operatorname{Re}(x, \omega y) + t^2\|y\|^2 \ge 0.$$
This yields
$$\|x\|^2 + 2t|(x,y)| + t^2\|y\|^2 \ge 0.$$
Since this inequality holds for all $t \in \mathbb{R}$, it follows from the quadratic formula that
$$4|(x,y)|^2 - 4\|x\|^2\|y\|^2 \le 0.$$
This yields the conclusion and proves the theorem.
Proposition 10.13 For an inner product space, $\|x\| \equiv (x,x)^{1/2}$ does specify a
norm.
Proof: All the axioms are obvious except the triangle inequality. To verify this,
$$\|x+y\|^2 \equiv (x+y, x+y) = \|x\|^2 + \|y\|^2 + 2\operatorname{Re}(x,y)$$
$$\le \|x\|^2 + \|y\|^2 + 2|(x,y)|$$
$$\le \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| = (\|x\| + \|y\|)^2.$$
The following lemma is called the parallelogram identity.
Lemma 10.14 In an inner product space,
$$\|x+y\|^2 + \|x-y\|^2 = 2\|x\|^2 + 2\|y\|^2.$$
Proof: Starting with the left side,
$$\|x+y\|^2 + \|x-y\|^2 \equiv (x+y, x+y) + (x-y, x-y)$$
Now by the properties of the inner product,
$$= (x,x) + 2\operatorname{Re}(x,y) + (y,y) + (x,x) - 2\operatorname{Re}(x,y) + (y,y) = 2\|x\|^2 + 2\|y\|^2$$
This proves the identity.
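The Cauchy Schwarz inequality and the parallelogram identity can be spot-checked in $\mathbb{C}^n$ with the standard inner product. The sketch below is only an illustration; the helpers `inner`, `norm` and `rand_vec` are hypothetical names, and the inner product is taken linear in the first slot to match 10.7.

```python
import random

def inner(x, y):
    """(x, y) = sum x_i * conj(y_i): linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

def rand_vec(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

random.seed(2)
x, y = rand_vec(4), rand_vec(4)

# Cauchy Schwarz: |(x, y)| <= ||x|| ||y||
cs_ok = abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12

# Parallelogram identity: ||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 = 0
plus = [a + b for a, b in zip(x, y)]
minus = [a - b for a, b in zip(x, y)]
par_gap = norm(plus) ** 2 + norm(minus) ** 2 - 2 * norm(x) ** 2 - 2 * norm(y) ** 2
print(cs_ok, abs(par_gap) < 1e-9)
```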
Lemma 10.15 For $x \in H$, an inner product space,
$$\|x\| = \sup_{\|y\| \le 1} |(x,y)| \qquad (10.8)$$
Proof: By the Cauchy Schwarz inequality, if $x \ne 0$,
$$\|x\| \ge \sup_{\|y\| \le 1} |(x,y)| \ge \Big(x, \frac{x}{\|x\|}\Big) = \|x\|.$$
This proves 10.8 if $x \ne 0$. It is obvious that 10.8 holds in the case that x = 0. This
is because (0, y) = 0 for every y. (Why?)
Definition 10.16 A Hilbert space is an inner product space which is complete.
Thus a Hilbert space is a Banach space in which the norm comes from an inner
product as described above.
Example 10.17 An example of a Hilbert space is also $L^2(\Omega;\mu)$. This follows from
Theorem 10.10 which says this inner product space is complete.
Definition 10.18 Let H be a Hilbert space. Then $H'$ denotes the space of all
bounded linear functionals defined on H. Thus $f \in H'$ means f has values in F, the
field of scalars, f is linear,
$$f(\alpha x + \beta y) = \alpha f(x) + \beta f(y), \text{ for all } \alpha, \beta \in F,$$
and f is bounded. (There exists C > 0 such that for all $x \in H$,
$$|f(x)| \le C\|x\|.)$$
Then $H'$ is also called the dual space of H.
Lemma 10.19 A linear functional f defined on H, a Hilbert space, is bounded if
and only if f is continuous.
Proof: Suppose f is bounded first. Then if $x_n \to x$,
$$|f(x) - f(x_n)| = |f(x - x_n)| \le C\|x - x_n\|$$
which converges to 0. Therefore, f is continuous.
Next suppose f is continuous. Then in particular f is continuous at 0. Since f
is linear, it follows f(0) = 0. Therefore, there exists $\delta > 0$ such that if $\|x\| \le \delta$,
then |f(x)| < 1. Now for arbitrary $x \ne 0$,
$$\Big\|\frac{\delta x}{\|x\|}\Big\| \le \delta$$
and so for any $x \ne 0$,
$$\Big|f\Big(\frac{\delta x}{\|x\|}\Big)\Big| < 1$$
and so
$$|f(x)| \le \frac{1}{\delta}\|x\|.$$
Thus f is bounded. This proves the lemma.
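Proposition 10.20 Let M be a closed subspace of a Hilbert space H and let $x \in H$.
Then there exists a unique $Px \in M$ such that $\|x - Px\| \le \|x - y\|$ for all $y \in M$.
This mapping P is characterized by
$$(y - Px, x - P(x)) = 0$$
for all $y \in M$ and P is linear.
Proof: Let $\{y_n\}$ be a minimizing sequence. That is, $y_n \in M$,
$$\lim_{n\to\infty}\|y_n - x\| = \inf\{\|y - x\| : y \in M\} \equiv \lambda.$$
Then since M is a subspace, $(y_n + y_m)/2 \in M$ and so is $(y_n - y_m)/2$. Then from
the parallelogram identity Lemma 10.14, noting that $\big(\frac{x}{2} - \frac{y_n}{2}\big) + \big(\frac{x}{2} - \frac{y_m}{2}\big) = x - (y_n + y_m)/2$,
$$\Big\|\Big(\frac{x}{2} - \frac{y_n}{2}\Big) + \Big(\frac{x}{2} - \frac{y_m}{2}\Big)\Big\|^2 + \Big\|\frac{y_n}{2} - \frac{y_m}{2}\Big\|^2 = 2\Big\|\frac{x}{2} - \frac{y_n}{2}\Big\|^2 + 2\Big\|\frac{x}{2} - \frac{y_m}{2}\Big\|^2$$
and so
$$\frac{1}{4}\|y_n - y_m\|^2 = \frac{1}{2}\|x - y_n\|^2 + \frac{1}{2}\|x - y_m\|^2 - \Big\|x - \frac{y_n + y_m}{2}\Big\|^2$$
$$\le \frac{1}{2}\|x - y_n\|^2 + \frac{1}{2}\|x - y_m\|^2 - \lambda^2$$
and now since this is a minimizing sequence, it follows that for m, n large enough,
the right side is smaller than $\varepsilon$. Thus $\{y_n\}$ is a Cauchy sequence. Since H is
complete, this converges to some $y \in H$. Since M is closed, it follows $y \in M$. Then, with $Px \equiv y$,
$$\lambda = \lim_{n\to\infty}\|x - y_n\| = \|x - Px\|$$
This proves existence. To verify uniqueness of Px, suppose both $y, y_1$ work. Then
consider $(y + y_1)/2$ and use the parallelogram identity again.
$$\Big\|x - \frac{y + y_1}{2}\Big\|^2 + \Big\|\frac{y - y_1}{2}\Big\|^2 = 2\Big\|\frac{x - y}{2}\Big\|^2 + 2\Big\|\frac{x - y_1}{2}\Big\|^2 = \lambda^2$$
and so if $y \ne y_1$
$$\Big\|x - \frac{y + y_1}{2}\Big\|^2 = \lambda^2 - \Big\|\frac{y - y_1}{2}\Big\|^2 < \lambda^2,$$
a contradiction. Thus Px is unique.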
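Next consider the characterization. Let $z \in M$ and let $t \in \mathbb{R}$ and $y \in M$. Then
for $|\theta| = 1$,
$$f(t) \equiv \|x - (z + t\theta(y - z))\|^2 = \|x - z\|^2 - 2t\operatorname{Re}(x - z, \theta(y - z)) + t^2\|y - z\|^2$$
$$= \|x - z\|^2 - 2t\operatorname{Re}\bar\theta(x - z, y - z) + t^2\|y - z\|^2 \qquad (10.9)$$
When z = Px, the top line of the above shows this function achieves its minimum
when t = 0. Therefore, taking the derivative and plugging in t = 0, it follows for all
$y \in M$,
$$\operatorname{Re}\bar\theta(x - Px, y - Px) = 0$$
Choosing $\theta$ such that
$$\bar\theta(x - Px, y - Px) = |(x - Px, y - Px)|$$
this requires $(x - Px, y - Px) = 0$.
Conversely, if for all $y \in M$, $(x - z, y - z) = 0$, then 10.9 with t = 1 and $\theta = 1$ gives
$\|x - y\|^2 = \|x - z\|^2 + \|y - z\|^2 \ge \|x - z\|^2$, which shows z = Px. Now why is P
linear? Since M is a subspace, the characterization says $(x_i - Px_i, w) = 0$ for every $w \in M$.
Hence for scalars $\alpha, \beta$ and $y \in M$, taking $w = y - (\alpha Px_1 + \beta Px_2) \in M$,
$$(\alpha x_1 + \beta x_2 - (\alpha Px_1 + \beta Px_2), y - (\alpha Px_1 + \beta Px_2))$$
$$= \alpha(x_1 - Px_1, w) + \beta(x_2 - Px_2, w) = \alpha 0 + \beta 0 = 0$$
and so $P(\alpha x_1 + \beta x_2) = \alpha Px_1 + \beta Px_2$ by uniqueness. This proves the Proposition.
Note that since $y, Px \in M$,
$$(x - Px, z) = 0$$
for all $z \in M$. Just let y = z + Px in the conclusion of the proposition.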
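In the finite dimensional Hilbert space $\mathbb{R}^3$ the projection of Proposition 10.20 can be computed explicitly, which makes the orthogonality characterization easy to observe. The following Python sketch is an illustration under the assumption that M is the span of two given independent vectors; `orthonormalize` and `project` are hypothetical helper names (Gram Schmidt followed by summing components), not something taken from the notes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

def scale(c, u):
    return [c * a for a in u]

def orthonormalize(basis):
    """Gram Schmidt on a linearly independent list of vectors."""
    out = []
    for v in basis:
        for e in out:
            v = sub(v, scale(dot(v, e), e))
        out.append(scale(1.0 / dot(v, v) ** 0.5, v))
    return out

def project(x, basis):
    """Px = sum (x, e) e over an orthonormal basis of M = span(basis)."""
    px = [0.0] * len(x)
    for e in orthonormalize(basis):
        px = add(px, scale(dot(x, e), e))
    return px

M = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
x = [3.0, -1.0, 2.0]
Px = project(x, M)
residual = sub(x, Px)
# x - Px should be orthogonal to every vector of M.
orth = [dot(residual, m) for m in M]
print(all(abs(t) < 1e-12 for t in orth))
```

Any other point y of M should then be at least as far from x as Px is, which is what the minimizing property asserts.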
The following theorem is called the Riesz representation theorem for the dual of
a Hilbert space. If $z \in H$ then define an element $f \in H'$ by the rule $(x, z) \equiv f(x)$. It
follows from the Cauchy Schwarz inequality and the properties of the inner product
that $f \in H'$. The Riesz representation theorem says that all elements of $H'$ are of
this form.
Theorem 10.21 Let H be a Hilbert space and let $f \in H'$. Then there exists a
unique $z \in H$ such that
$$f(x) = (x, z) \qquad (10.10)$$
for all $x \in H$.
Proof: Letting $y, w \in H$ the assumption that f is linear implies
$$f(yf(w) - f(y)w) = f(w)f(y) - f(y)f(w) = 0$$
which shows that $yf(w) - f(y)w \in f^{-1}(0)$, which is a closed subspace of H since f is
continuous. If $f^{-1}(0) = H$, then f is the zero map and z = 0 is the unique element
of H which satisfies 10.10. If $f^{-1}(0) \ne H$, pick $u \notin f^{-1}(0)$ and let $w \equiv u - Pu \ne 0$,
where P is the projection onto $f^{-1}(0)$.
Thus Proposition 10.20 implies (y, w) = 0 for all $y \in f^{-1}(0)$. In particular, let
$y = xf(w) - f(x)w$ where $x \in H$ is arbitrary. Therefore,
$$0 = (f(w)x - f(x)w, w) = f(w)(x, w) - f(x)\|w\|^2.$$
Thus, solving for f(x) and using the properties of the inner product,
$$f(x) = \Big(x, \frac{\overline{f(w)}\,w}{\|w\|^2}\Big)$$
Let $z = \overline{f(w)}\,w/\|w\|^2$. This proves the existence of z. If $f(x) = (x, z_i)$, i = 1, 2,
for all $x \in H$, then for all $x \in H$, $(x, z_1 - z_2) = 0$ which implies, upon taking
$x = z_1 - z_2$, that $z_1 = z_2$.
Definition 10.22 If $R : H \to H'$ is defined by $Rx(y) \equiv (y, x)$, the Riesz
representation theorem above states this map is onto. This map is called the Riesz
map.
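The formula $z = \overline{f(w)}\,w/\|w\|^2$ from the proof can be tried out in $\mathbb{C}^3$. In this hedged sketch the functional is assumed to be $f(x) = \sum_i c_i x_i$ for an arbitrarily chosen coefficient list `c`, and w is taken to be the conjugate of `c`, which is orthogonal to the kernel of f; these choices are illustrative and not from the notes.

```python
def inner(x, y):
    """(x, y) = sum x_i * conj(y_i), linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

# An assumed bounded linear functional on C^3: f(x) = sum c_i x_i.
c = [2 + 1j, -1j, 0.5 + 0j]

def f(x):
    return sum(ci * xi for ci, xi in zip(c, x))

# w = conj(c) satisfies (y, w) = f(y) = 0 for every y in the kernel of f.
w = [ci.conjugate() for ci in c]
norm_w_sq = inner(w, w).real

# Representer from the proof: z = conj(f(w)) * w / ||w||^2.
z = [f(w).conjugate() * wi / norm_w_sq for wi in w]

x = [1 - 2j, 3 + 0.5j, -4j]
err = abs(f(x) - inner(x, z))
print(err < 1e-12)
```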
10.3 Minkowski's Inequality
In working with the $L^p$ spaces, the following inequality also known as Minkowski's
inequality is very useful. It is similar to the Minkowski inequality for sums. To
see this, replace the integral, $\int_X$, with a finite summation sign and you will see the
usual Minkowski inequality or rather the version of it given in Corollary 10.7.
To prove this theorem first consider a special case of it in which technical
considerations which shed no light on the proof are excluded.
Lemma 10.23 Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{F}, \lambda)$ be finite complete measure spaces and
let f be product measurable and uniformly bounded. Then the following inequality is
valid for $p \ge 1$.
$$\int_X \Big(\int_Y |f(x,y)|^p\,d\lambda\Big)^{1/p} d\mu \ge \Big(\int_Y \Big(\int_X |f(x,y)|\,d\mu\Big)^p d\lambda\Big)^{1/p}. \qquad (10.11)$$
Proof: Since f is bounded and $\mu(X), \lambda(Y) < \infty$,
$$\Big(\int_Y \Big(\int_X |f(x,y)|\,d\mu\Big)^p d\lambda\Big)^{1/p} < \infty.$$
Let
$$J(y) = \int_X |f(x,y)|\,d\mu.$$
Note there is no problem in writing this for a.e. y because f is product measurable and
Lemma 8.50 on Page 204. Then by Fubini's theorem,
$$\int_Y \Big(\int_X |f(x,y)|\,d\mu\Big)^p d\lambda = \int_Y J(y)^{p-1}\int_X |f(x,y)|\,d\mu\,d\lambda = \int_X \int_Y J(y)^{p-1}|f(x,y)|\,d\lambda\,d\mu$$
Now apply Hölder's inequality in the last integral above and recall $p - 1 = \frac{p}{q}$. This
yields
$$\int_Y \Big(\int_X |f(x,y)|\,d\mu\Big)^p d\lambda \le \int_X \Big(\int_Y J(y)^p\,d\lambda\Big)^{1/q}\Big(\int_Y |f(x,y)|^p\,d\lambda\Big)^{1/p} d\mu$$
$$= \Big(\int_Y J(y)^p\,d\lambda\Big)^{1/q}\int_X \Big(\int_Y |f(x,y)|^p\,d\lambda\Big)^{1/p} d\mu$$
$$= \Big(\int_Y \Big(\int_X |f(x,y)|\,d\mu\Big)^p d\lambda\Big)^{1/q}\int_X \Big(\int_Y |f(x,y)|^p\,d\lambda\Big)^{1/p} d\mu. \qquad (10.12)$$
Therefore, dividing both sides by the first factor in the above expression,
$$\Big(\int_Y \Big(\int_X |f(x,y)|\,d\mu\Big)^p d\lambda\Big)^{1/p} \le \int_X \Big(\int_Y |f(x,y)|^p\,d\lambda\Big)^{1/p} d\mu. \qquad (10.13)$$
Note that 10.13 holds even if the first factor of 10.12 equals zero. This proves the
lemma.
Now consider the case where f is not assumed to be bounded and where the
measure spaces are $\sigma$ finite.
Theorem 10.24 Let $(X, \mathcal{S}, \mu)$ and $(Y, \mathcal{F}, \lambda)$ be $\sigma$-finite measure spaces and let f
be product measurable. Then the following inequality is valid for $p \ge 1$.
$$\int_X \Big(\int_Y |f(x,y)|^p\,d\lambda\Big)^{1/p} d\mu \ge \Big(\int_Y \Big(\int_X |f(x,y)|\,d\mu\Big)^p d\lambda\Big)^{1/p}. \qquad (10.14)$$
Proof: Since the two measure spaces are $\sigma$ finite, there exist measurable sets,
$X_m$ and $Y_k$, with $\cup_m X_m = X$ and $\cup_k Y_k = Y$, such that $X_m \subseteq X_{m+1}$ for all m,
$Y_k \subseteq Y_{k+1}$ for all k, and $\mu(X_m), \lambda(Y_k) < \infty$. Now define
$$f_n(x,y) \equiv \begin{cases} f(x,y) & \text{if } |f(x,y)| \le n \\ n & \text{if } |f(x,y)| > n. \end{cases}$$
Thus $f_n$ is uniformly bounded and product measurable. By the above lemma,
$$\int_{X_m} \Big(\int_{Y_k} |f_n(x,y)|^p\,d\lambda\Big)^{1/p} d\mu \ge \Big(\int_{Y_k} \Big(\int_{X_m} |f_n(x,y)|\,d\mu\Big)^p d\lambda\Big)^{1/p}. \qquad (10.15)$$
Now observe that $|f_n(x,y)|$ increases in n and the pointwise limit is $|f(x,y)|$. Therefore,
using the monotone convergence theorem in 10.15 yields the same inequality
with f replacing $f_n$. Next let $k \to \infty$ and use the monotone convergence theorem
again to replace $Y_k$ with Y. Finally let $m \to \infty$ in what is left to obtain 10.14. This
proves the theorem.
Note that the proof of this theorem depends on two manipulations, the interchange
of the order of integration and Hölder's inequality. Note that there is nothing
to check in the case of double sums. Thus if $a_{ij} \ge 0$, it is always the case that
$$\Big(\sum_j \Big(\sum_i a_{ij}\Big)^p\Big)^{1/p} \le \sum_i \Big(\sum_j a_{ij}^p\Big)^{1/p}$$
because the integrals in this case are just sums and $(i,j) \to a_{ij}$ is measurable.
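The double sum form of Minkowski's inequality is easy to test numerically. In the Python sketch below, `a[i][j]` plays the role of $a_{ij}$ and the function names `lhs` and `rhs` are hypothetical labels for the two sides; random nonnegative arrays are only a spot check.

```python
import random

def lhs(a, p):
    """( sum_j ( sum_i a[i][j] )**p )**(1/p)"""
    cols = len(a[0])
    return sum(sum(row[j] for row in a) ** p for j in range(cols)) ** (1.0 / p)

def rhs(a, p):
    """sum_i ( sum_j a[i][j]**p )**(1/p)"""
    return sum(sum(v ** p for v in row) ** (1.0 / p) for row in a)

random.seed(3)
ok = True
for _ in range(300):
    p = random.uniform(1.0, 4.0)
    a = [[random.uniform(0, 2) for _ in range(4)] for _ in range(3)]
    # Allow a tiny relative slack for floating point rounding.
    ok = ok and lhs(a, p) <= rhs(a, p) * (1 + 1e-12)
print(ok)
```

For the identity array `[[1, 0], [0, 1]]` with p = 2 the left side is $\sqrt{2}$ and the right side is 2, so the inequality can be strict.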
The $L^p$ spaces have many important properties.
10.4 Density Considerations
Theorem 10.25 Let $p \ge 1$ and let $(\Omega, \mathcal{S}, \mu)$ be a measure space. Then the simple
functions are dense in $L^p(\Omega)$.
Proof: Recall that a function, f, having values in $\mathbb{R}$ can be written in the form
$f = f^+ - f^-$ where
$$f^+ = \max(0, f), \quad f^- = \max(0, -f).$$
Therefore, an arbitrary complex valued function, f is of the form
$$f = \operatorname{Re} f^+ - \operatorname{Re} f^- + i\big(\operatorname{Im} f^+ - \operatorname{Im} f^-\big).$$
If each of these nonnegative functions is approximated by a simple function, it
follows f is also approximated by a simple function. Therefore, there is no loss of
generality in assuming at the outset that $f \ge 0$.
Since f is measurable, Theorem 7.24 implies there is an increasing sequence of
simple functions, $\{s_n\}$, converging pointwise to f(x). Now
$$|f(x) - s_n(x)| \le |f(x)|.$$
By the Dominated Convergence theorem,
$$0 = \lim_{n\to\infty}\int |f(x) - s_n(x)|^p\,d\mu.$$
Thus simple functions are dense in $L^p$.
Recall that for $\Omega$ a topological space, $C_c(\Omega)$ is the space of continuous functions
with compact support in $\Omega$. Also recall the following definition.
Definition 10.26 Let $(\Omega, \mathcal{S}, \mu)$ be a measure space and suppose $(\Omega, \tau)$ is also a
topological space. Then $(\Omega, \mathcal{S}, \mu)$ is called a regular measure space if the $\sigma$ algebra
of Borel sets is contained in $\mathcal{S}$ and for all $E \in \mathcal{S}$,
$$\mu(E) = \inf\{\mu(V) : V \supseteq E \text{ and } V \text{ open}\}$$
and if $\mu(E) < \infty$,
$$\mu(E) = \sup\{\mu(K) : K \subseteq E \text{ and } K \text{ is compact}\}$$
and $\mu(K) < \infty$ for any compact set, K.
For example Lebesgue measure is an example of such a measure.
Lemma 10.27 Let $\Omega$ be a metric space in which the closed balls are compact and
let K be a compact subset of V, an open set. Then there exists a continuous function
$f : \Omega \to [0,1]$ such that f(x) = 1 for all $x \in K$ and spt(f) is a compact subset of
V. That is, $K \prec f \prec V$.
Proof: Let $K \subseteq W \subseteq \overline{W} \subseteq V$ where $\overline{W}$ is compact. To obtain this list of
inclusions consider a point in K, x, and take $B(x, r_x)$ a ball containing x such that
$\overline{B(x, r_x)}$ is a compact subset of V. Next use the fact that K is compact to obtain
the existence of a list, $\{B(x_i, r_{x_i}/2)\}_{i=1}^m$, which covers K. Then let
$$W \equiv \cup_{i=1}^m B\Big(x_i, \frac{r_{x_i}}{2}\Big).$$
It follows since this is a finite union that
$$\overline{W} = \cup_{i=1}^m \overline{B\Big(x_i, \frac{r_{x_i}}{2}\Big)}$$
and so $\overline{W}$, being a finite union of compact sets is itself a compact set. Also, from
the construction
$$\overline{W} \subseteq \cup_{i=1}^m B(x_i, r_{x_i}).$$
Define f by
$$f(x) = \frac{\operatorname{dist}(x, W^C)}{\operatorname{dist}(x, K) + \operatorname{dist}(x, W^C)}.$$
It is clear that f is continuous if the denominator is always nonzero. But this is
clear because if $x \in W^C$ there must be a ball B(x, r) such that this ball does not
intersect K. Otherwise, x would be a limit point of K and since K is closed, $x \in K$.
However, $x \notin K$ because $K \subseteq W$.
It is not necessary to be in a metric space to do this. You can accomplish the
same thing using Urysohn's lemma.
Theorem 10.28 Let $(\Omega, \mathcal{S}, \mu)$ be a regular measure space as in Definition 10.26
where the conclusion of Lemma 10.27 holds. Then $C_c(\Omega)$ is dense in $L^p(\Omega)$.
Proof: First consider a measurable set, E where $\mu(E) < \infty$. Let $K \subseteq E \subseteq V$
where $\mu(V \setminus K) < \varepsilon$. Now let $K \prec h \prec V$. Then
$$\int |h - \mathcal{X}_E|^p\,d\mu \le \int \mathcal{X}_{V \setminus K}^p\,d\mu = \mu(V \setminus K) < \varepsilon.$$
It follows that for each s a simple function in $L^p(\Omega)$, there exists $h \in C_c(\Omega)$ such
that $\|s - h\|_p < \varepsilon$. This is because if
$$s(x) = \sum_{i=1}^m c_i \mathcal{X}_{E_i}(x)$$
is a simple function in $L^p$ where the $c_i$ are the distinct nonzero values of s, each
$\mu(E_i) < \infty$ since otherwise $s \notin L^p$ due to the inequality
$$\int |s|^p\,d\mu \ge |c_i|^p \mu(E_i).$$
By Theorem 10.25, simple functions are dense in $L^p(\Omega)$, and so this proves the
Theorem.
10.5 Separability
Theorem 10.29 For $p \ge 1$ and $\mu$ a Radon measure, $L^p(\mathbb{R}^n, \mu)$ is separable. Recall
this means there exists a countable set, $\mathcal{D}$, such that if $f \in L^p(\mathbb{R}^n, \mu)$ and $\varepsilon > 0$,
there exists $g \in \mathcal{D}$ such that $\|f - g\|_p < \varepsilon$.
Proof: Let Q be all functions of the form $c\mathcal{X}_{[a,b)}$ where
$$[a,b) \equiv [a_1,b_1) \times [a_2,b_2) \times \cdots \times [a_n,b_n),$$
and both $a_i, b_i$ are rational, while c has rational real and imaginary parts. Let $\mathcal{D}$ be
the set of all finite sums of functions in Q. Thus, $\mathcal{D}$ is countable. In fact $\mathcal{D}$ is dense in
$L^p(\mathbb{R}^n, \mu)$. To prove this it is necessary to show that for every $f \in L^p(\mathbb{R}^n, \mu)$, there
exists an element of $\mathcal{D}$, s such that $\|s - f\|_p < \varepsilon$. If it can be shown that for every
$g \in C_c(\mathbb{R}^n)$ there exists $h \in \mathcal{D}$ such that $\|g - h\|_p < \varepsilon$, then this will suffice because
if $f \in L^p(\mathbb{R}^n)$ is arbitrary, Theorem 10.28 implies there exists $g \in C_c(\mathbb{R}^n)$ such
that $\|f - g\|_p \le \frac{\varepsilon}{2}$ and then there would exist $h \in \mathcal{D}$ such that $\|h - g\|_p < \frac{\varepsilon}{2}$.
By the triangle inequality,
$$\|f - h\|_p \le \|h - g\|_p + \|g - f\|_p < \varepsilon.$$
Therefore, assume at the outset that $f \in C_c(\mathbb{R}^n)$.
Let $\mathcal{P}_m$ consist of all sets of the form $[a,b) \equiv \prod_{i=1}^n [a_i,b_i)$ where $a_i = j2^{-m}$ and
$b_i = (j+1)2^{-m}$ for j an integer. Thus $\mathcal{P}_m$ consists of a tiling of $\mathbb{R}^n$ into half open
rectangles having diameters $2^{-m}n^{1/2}$. There are countably many of these rectangles;
so, let $\mathcal{P}_m = \{[a_i,b_i)\}_{i=1}^\infty$ and $\mathbb{R}^n = \cup_{i=1}^\infty [a_i,b_i)$. Let $c_i^m$ be complex numbers with
rational real and imaginary parts satisfying
$$|f(a_i) - c_i^m| < 2^{-m}, \quad |c_i^m| \le |f(a_i)|. \qquad (10.16)$$
Let
$$s_m(x) = \sum_{i=1}^\infty c_i^m \mathcal{X}_{[a_i,b_i)}(x).$$
Since $f(a_i) = 0$ except for finitely many values of i, the above is a finite sum. Then
10.16 implies $s_m \in \mathcal{D}$. If $s_m$ converges uniformly to f then it follows $\|s_m - f\|_p \to 0$
because $|s_m| \le |f|$ and so
$$\|s_m - f\|_p = \Big(\int |s_m - f|^p\,d\mu\Big)^{1/p} = \Big(\int_{\operatorname{spt}(f)} |s_m - f|^p\,d\mu\Big)^{1/p} \le [\varepsilon\,\mu(\operatorname{spt}(f))]^{1/p}$$
whenever m is large enough.
Since $f \in C_c(\mathbb{R}^n)$ it follows that f is uniformly continuous and so given $\varepsilon > 0$
there exists $\delta > 0$ such that if $|x - y| < \delta$, $|f(x) - f(y)| < \varepsilon/2$. Now let m be large
enough that every box in $\mathcal{P}_m$ has diameter less than $\delta$ and also that $2^{-m} < \varepsilon/2$.
Then if $[a_i,b_i)$ is one of these boxes of $\mathcal{P}_m$, and $x \in [a_i,b_i)$,
$$|f(x) - f(a_i)| < \varepsilon/2$$
and
$$|f(a_i) - c_i^m| < 2^{-m} < \varepsilon/2.$$
Therefore, using the triangle inequality, it follows that $|f(x) - c_i^m| = |s_m(x) - f(x)| < \varepsilon$
and since x is arbitrary, this establishes uniform convergence. This proves the
theorem.
Here is an easier proof if you know the Weierstrass approximation theorem.
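Theorem 10.30 For $p \ge 1$ and $\mu$ a Radon measure, $L^p(\mathbb{R}^n, \mu)$ is separable. Recall
this means there exists a countable set, $\mathcal{D}$, such that if $f \in L^p(\mathbb{R}^n, \mu)$ and $\varepsilon > 0$,
there exists $g \in \mathcal{D}$ such that $\|f - g\|_p < \varepsilon$.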
Proof: Let $\mathcal{P}$ denote the set of all polynomials which have rational coefficients.
Then $\mathcal{P}$ is countable. Let $\psi_k \in C_c((-(k+1),(k+1))^n)$ such that
$[-k,k]^n \prec \psi_k \prec (-(k+1),(k+1))^n$. Let $\mathcal{D}_k$ denote the functions which are of the
form $p\psi_k$ where $p \in \mathcal{P}$. Thus $\mathcal{D}_k$ is also countable. Let $\mathcal{D} \equiv \cup_{k=1}^\infty \mathcal{D}_k$. It follows
each function in $\mathcal{D}$ is in $C_c(\mathbb{R}^n)$ and so it is in $L^p(\mathbb{R}^n, \mu)$. Let $f \in L^p(\mathbb{R}^n, \mu)$. By
regularity of $\mu$ there exists $g \in C_c(\mathbb{R}^n)$ such that $\|f - g\|_{L^p(\mathbb{R}^n,\mu)} < \frac{\varepsilon}{3}$. Let k be such
that $\operatorname{spt}(g) \subseteq (-k,k)^n$.
Now by the Weierstrass approximation theorem there exists a polynomial q such
that
$$\|g - q\|_{[-(k+1),k+1]^n} \equiv \sup\{|g(x) - q(x)| : x \in [-(k+1),(k+1)]^n\} < \frac{\varepsilon}{3\mu((-(k+1),k+1)^n)}.$$
It follows
$$\|g - \psi_k q\|_{[-(k+1),k+1]^n} = \|\psi_k g - \psi_k q\|_{[-(k+1),k+1]^n} < \frac{\varepsilon}{3\mu((-(k+1),k+1)^n)}.$$
Without loss of generality, it can be assumed this polynomial has all rational
coefficients. Therefore, $\psi_k q \in \mathcal{D}$.
$$\|g - \psi_k q\|^p_{L^p(\mathbb{R}^n)} = \int_{(-(k+1),k+1)^n} |g(x) - \psi_k(x)q(x)|^p\,d\mu$$
$$\le \Big(\frac{\varepsilon}{3\mu((-(k+1),k+1)^n)}\Big)^p \mu((-(k+1),k+1)^n) < \Big(\frac{\varepsilon}{3}\Big)^p.$$
It follows
$$\|f - \psi_k q\|_{L^p(\mathbb{R}^n,\mu)} \le \|f - g\|_{L^p(\mathbb{R}^n,\mu)} + \|g - \psi_k q\|_{L^p(\mathbb{R}^n,\mu)} < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} < \varepsilon.$$
Here is another proof based on the Weierstrass approximation theorem which
also introduces a collection of functions which will be useful in studying the Fourier
transform.
10.5.1 An Algebra Of Special Functions
First recall the following denition of a polynomial.
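Definition 10.31 $\alpha = (\alpha_1, \cdots, \alpha_n)$ for $\alpha_1 \cdots \alpha_n$ nonnegative integers is called a
multi-index. For $\alpha$ a multi-index, $|\alpha| \equiv \alpha_1 + \cdots + \alpha_n$ and if $x \in \mathbb{R}^n$,
$$x = (x_1, \cdots, x_n),$$
define
$$x^\alpha \equiv x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n}.$$
A polynomial in n variables of degree m is a function of the form
$$p(x) = \sum_{|\alpha| \le m} a_\alpha x^\alpha.$$
Here $\alpha$ is a multi-index as just described and $a_\alpha \in \mathbb{C}$. Also define for
$\alpha = (\alpha_1, \cdots, \alpha_n)$ a multi-index
$$D^\alpha f(x) \equiv \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}\partial x_2^{\alpha_2}\cdots\partial x_n^{\alpha_n}}.$$
Definition 10.32 Define $\mathcal{G}_1$ to be the functions of the form $p(x)e^{-a|x|^2}$ where
a > 0 and rational and p(x) is a polynomial having all the $a_\alpha = p + iq$ where both
p, q are rational. Let $\mathcal{G}$ be all finite sums of functions in $\mathcal{G}_1$. Thus $\mathcal{G}$ is an algebra
of functions which has the property that if $f \in \mathcal{G}$ then $\bar f \in \mathcal{G}$. Also, $\mathcal{G}_1$ is countable
and so it follows that so is $\mathcal{G}$.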
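Lemma 10.33 $\mathcal{G}$ is dense in $C_0(\mathbb{R}^n)$ with respect to the norm,
$$\|f\|_\infty \equiv \sup\{|f(x)| : x \in \mathbb{R}^n\}$$
Proof: By the Weierstrass approximation theorem, it suffices to show $\mathcal{G}$
separates the points and annihilates no point. It was already observed in the above
definition that $\bar f \in \mathcal{G}$ whenever $f \in \mathcal{G}$. If $y_1 \ne y_2$ suppose first that $|y_1| \ne |y_2|$.
Then in this case, you can let $f(x) \equiv e^{-|x|^2}$ and $f \in \mathcal{G}$ and $f(y_1) \ne f(y_2)$. If
$|y_1| = |y_2|$, then suppose $y_{1k} \ne y_{2k}$. This must happen for some k because $y_1 \ne y_2$.
Then let $f(x) \equiv x_k e^{-|x|^2}$. Thus $\mathcal{G}$ separates points. Now $e^{-|x|^2}$ is never equal to
zero and so $\mathcal{G}$ annihilates no point of $\mathbb{R}^n$.
Theorem 10.34 For $p \ge 1$ and $\mu$ a Radon measure, $L^p(\mathbb{R}^n, \mu)$ is separable. Recall
this means there exists a countable set, $\mathcal{D}$, such that if $f \in L^p(\mathbb{R}^n, \mu)$ and $\varepsilon > 0$,
there exists $g \in \mathcal{D}$ such that $\|f - g\|_p < \varepsilon$.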
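Proof: Let $\mathcal{D}$ be defined as functions of the form
$$h\mathcal{X}_{(-m,m)^n}$$
where $m \in \mathbb{N}$ and $h \in \mathcal{G}$ given above in Lemma 10.33. I just need to show it is dense
in $L^p(\mathbb{R}^n, \mu)$ because it is obviously countable. Let $f \in L^p(\mathbb{R}^n, \mu)$. By Theorem
10.28 there exists $g \in C_c(\mathbb{R}^n)$ such that
$$\|f - g\|_{L^p(\mathbb{R}^n,\mu)} < \varepsilon$$
Say $\operatorname{spt}(g) \subseteq (-m,m)^n$. There exists $h \in \mathcal{G}$ such that
$$\|h - g\|_\infty < \frac{\varepsilon}{[\mu((-m,m)^n)]^{1/p}}$$
Then
$$\|h\mathcal{X}_{(-m,m)^n} - g\|_{L^p(\mu)} \le \Big(\int_{(-m,m)^n} \|h - g\|_\infty^p\,d\mu\Big)^{1/p} = \|h - g\|_\infty [\mu((-m,m)^n)]^{1/p} < \varepsilon$$
Hence
$$\|f - h\mathcal{X}_{(-m,m)^n}\|_{L^p(\mu)} \le \|f - g\|_{L^p(\mu)} + \|g - h\mathcal{X}_{(-m,m)^n}\|_{L^p(\mu)} < 2\varepsilon.$$
Since $\varepsilon$ is arbitrary, this proves the theorem.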
Corollary 10.35 Let $\Omega$ be any $\mu$ measurable subset of $\mathbb{R}^n$ and let $\mu$ be a Radon
measure. Then $L^p(\Omega, \mu)$ is separable. Here the $\sigma$ algebra of measurable sets will
consist of all intersections of measurable sets with $\Omega$ and the measure will be $\mu$
restricted to these sets.
Proof: Let $\tilde{\mathcal{D}}$ be the restrictions of $\mathcal{D}$ to $\Omega$. If $f \in L^p(\Omega)$, let F be the zero
extension of f to all of $\mathbb{R}^n$. Let $\varepsilon > 0$ be given. By Theorem 10.29 or 10.30 there
exists $s \in \mathcal{D}$ such that $\|F - s\|_p < \varepsilon$. Thus
$$\|s - f\|_{L^p(\Omega,\mu)} \le \|s - F\|_{L^p(\mathbb{R}^n,\mu)} < \varepsilon$$
and so the countable set $\tilde{\mathcal{D}}$ is dense in $L^p(\Omega)$.
10.6 Continuity Of Translation
Denition 10.36 Let f be a function dened on U 1
n
and let w 1
n
. Then
f
w
will be the function dened on w+U by
f
w
(x) = f(x w).
266 THE L
P
SPACES
Theorem 10.37 (Continuity of translation in $L^p$) Let $f \in L^p(\mathbb{R}^n)$ with the measure being Lebesgue measure. Then
\[ \lim_{\|w\|\to 0} \|f_w - f\|_p = 0. \]
Proof: Let $\varepsilon > 0$ be given and let $g \in C_c(\mathbb{R}^n)$ with $\|g-f\|_p < \frac{\varepsilon}{3}$. Since Lebesgue measure is translation invariant ($m_n(w+E) = m_n(E)$),
\[ \|g_w - f_w\|_p = \|g-f\|_p < \frac{\varepsilon}{3}. \]
You can see this from looking at simple functions and passing to the limit or you could use the change of variables formula to verify it.
Therefore
\[ \|f-f_w\|_p \le \|f-g\|_p + \|g-g_w\|_p + \|g_w-f_w\|_p < \frac{2\varepsilon}{3} + \|g-g_w\|_p. \tag{10.17} \]
But $\lim_{|w|\to 0} g_w(x) = g(x)$ uniformly in $x$ because $g$ is uniformly continuous. Now let $B$ be a large ball containing $\operatorname{spt}(g)$ and let $\delta_1$ be small enough that $B(x,\delta_1) \subseteq B$ whenever $x \in \operatorname{spt}(g)$. If $\varepsilon > 0$ is given there exists $\delta < \delta_1$ such that if $|w| < \delta$, it follows that $|g(x-w) - g(x)| < \varepsilon / \left(3\left(1 + m_n(B)^{1/p}\right)\right)$. Therefore,
\[ \|g-g_w\|_p = \left(\int_B |g(x)-g(x-w)|^p \, dm_n\right)^{1/p} \le \varepsilon\,\frac{m_n(B)^{1/p}}{3\left(1+m_n(B)^{1/p}\right)} < \frac{\varepsilon}{3}. \]
Therefore, whenever $|w| < \delta$, it follows $\|g-g_w\|_p < \frac{\varepsilon}{3}$ and so from 10.17 $\|f-f_w\|_p < \varepsilon$.
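As a concrete one dimensional check of Theorem 10.37, take $f = \mathcal{X}_{[0,1]}$. For $|w| \le 1$ the functions $f_w$ and $f$ differ by 1 on a set of measure $2|w|$, so $\|f_w - f\|_p = (2|w|)^{1/p}$, which tends to 0 with $w$. The following Python sketch approximates the norm by a midpoint rule and compares it with this value. The names `indicator` and `translation_distance`, the integration window and the grid size are illustrative choices, not part of the notes.

```python
def indicator(x):
    """Characteristic function of the interval [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def translation_distance(w, p, lo=-2.0, hi=3.0, n=200_000):
    """Midpoint-rule approximation of ||f_w - f||_p, where f_w(x) = f(x - w).

    The window [lo, hi] is assumed to contain the supports of f and f_w.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * h
        total += abs(indicator(x - w) - indicator(x)) ** p * h
    return total ** (1.0 / p)


if __name__ == "__main__":
    p = 2
    for w in (0.5, 0.1, 0.01):
        # Numerical value next to the exact value (2|w|)^(1/p).
        print(w, translation_distance(w, p), (2 * w) ** (1.0 / p))
```

Note that $f$ here is not continuous, so the convergence is a genuinely $L^p$ statement: $f_w$ does not converge to $f$ uniformly.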
10.7 Mollifiers And Density Of Smooth Functions
Definition 10.38 Let $U$ be an open subset of $\mathbb{R}^n$. $C_c^\infty(U)$ is the vector space of all infinitely differentiable functions which equal zero for all $x$ outside of some compact set contained in $U$. Similarly, $C_c^m(U)$ is the vector space of all functions which are $m$ times continuously differentiable and whose support is a compact subset of $U$.
Example 10.39 Let $U = B(z,2r)$ and
\[ \psi(x) = \begin{cases} \exp\left[\left(|x-z|^2 - r^2\right)^{-1}\right] & \text{if } |x-z| < r, \\ 0 & \text{if } |x-z| \ge r. \end{cases} \]
Then a little work shows $\psi \in C_c^\infty(U)$. The following also is easily obtained.
Lemma 10.40 Let $U$ be any open set. Then $C_c^\infty(U) \neq \emptyset$.
Proof: Pick $z \in U$ and let $r$ be small enough that $B(z,2r) \subseteq U$. Then let $\psi \in C_c^\infty(B(z,2r)) \subseteq C_c^\infty(U)$ be the function of the above example.
Definition 10.41 Let $U = \{x \in \mathbb{R}^n : |x| < 1\}$. A sequence $\{\psi_m\} \subseteq C_c^\infty(U)$ is called a mollifier (This is often called an approximate identity if the differentiability is not included.) if
\[ \psi_m(x) \ge 0, \quad \psi_m(x) = 0 \text{ if } |x| \ge \frac{1}{m}, \]
and $\int \psi_m(x)\,dx = 1$. Sometimes it may be written as $\{\psi_\varepsilon\}$ where $\psi_\varepsilon$ satisfies the above conditions except $\psi_\varepsilon(x) = 0$ if $|x| \ge \varepsilon$. In other words, $\varepsilon$ takes the place of $1/m$ and in everything that follows $\varepsilon \to 0$ instead of $m \to \infty$.
As before, $\int f(x,y)\,d\mu(y)$ will mean $x$ is fixed and the function $y \to f(x,y)$ is being integrated. To make the notation more familiar, $dx$ is written instead of $dm_n(x)$.
Example 10.42 Let
\[ \psi \in C_c^\infty(B(0,1)) \quad (B(0,1) = \{x : |x| < 1\}) \]
with $\psi(x) \ge 0$ and $\int \psi\,dm = 1$. Let $\psi_m(x) = c_m\psi(mx)$ where $c_m$ is chosen in such a way that $\int \psi_m\,dm = 1$. By the change of variables theorem $c_m = m^n$.
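The construction in Examples 10.39 and 10.42 can be tried out numerically in one dimension ($n = 1$, $z = 0$, $r = 1$). The sketch below is only an illustration under those assumptions: the names `bump`, `integrate`, `psi` and `mollifier` are hypothetical, and the normalizing constant is found by a midpoint rule rather than exactly. It checks that $c_m = m$ gives $\int \psi_m = 1$ and that $\psi_m$ vanishes for $|x| \ge 1/m$.

```python
import math


def bump(x):
    """exp(1 / (x^2 - 1)) for |x| < 1 and 0 otherwise (Example 10.39 with z = 0, r = 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(1.0 / (x * x - 1.0))


def integrate(f, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h


# Numerical mass of the unnormalized bump, used to normalize psi.
BUMP_MASS = integrate(bump, -1.0, 1.0)


def psi(x):
    """Nonnegative, supported in [-1, 1], with integral (numerically) equal to 1."""
    return bump(x) / BUMP_MASS


def mollifier(m):
    """psi_m(x) = c_m * psi(m x) with c_m = m^n; here n = 1, so c_m = m."""
    return lambda x: m * psi(m * x)


if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        psi_m = mollifier(m)
        # Mass of psi_m over its support, and its value just outside the support.
        print(m, integrate(psi_m, -1.0 / m, 1.0 / m), psi_m(1.5 / m))
```

In $n$ dimensions the same substitution $u = mx$ contributes a factor $m^{-n}$, which is why $c_m = m^n$ in general.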
Definition 10.43 A function, $f$, is said to be in $L^1_{loc}(\mathbb{R}^n,\mu)$ if $f$ is $\mu$ measurable and if $|f|\mathcal{X}_K \in L^1(\mathbb{R}^n,\mu)$ for every compact set, $K$. Here $\mu$ is a Radon measure on $\mathbb{R}^n$. Usually $\mu = m_n$, Lebesgue measure. When this is so, write $L^1_{loc}(\mathbb{R}^n)$ or $L^p(\mathbb{R}^n)$, etc. If $f \in L^1_{loc}(\mathbb{R}^n,\mu)$, and $g \in C_c(\mathbb{R}^n)$,
\[ f*g(x) \equiv \int f(y)g(x-y)\,d\mu. \]
The following lemma will be useful in what follows. It says that one of these very irregular functions in $L^1_{loc}(\mathbb{R}^n,\mu)$ is smoothed out by convolving with a mollifier.
Lemma 10.44 Let $f \in L^1_{loc}(\mathbb{R}^n,\mu)$, and $g \in C_c^\infty(\mathbb{R}^n)$. Then $f*g$ is an infinitely differentiable function. Here $\mu$ is a Radon measure on $\mathbb{R}^n$.
Proof: Consider the difference quotient for calculating a partial derivative of $f*g$.
\[ \frac{f*g(x+te_j) - f*g(x)}{t} = \int f(y)\,\frac{g(x+te_j-y) - g(x-y)}{t}\,d\mu(y). \]
Using the fact that $g \in C_c^\infty(\mathbb{R}^n)$, the quotient,
\[ \frac{g(x+te_j-y) - g(x-y)}{t}, \]
is uniformly bounded. To see this easily, use Theorem 4.9 on Page 85 to get the existence of a constant, $M$ depending on
\[ \max\{\|Dg(x)\| : x \in \mathbb{R}^n\} \]
such that
\[ |g(x+te_j-y) - g(x-y)| \le M|t| \]
for any choice of $x$ and $y$. Therefore, there exists a dominating function for the integrand of the above integral which is of the form $C|f(y)|\mathcal{X}_K$ where $K$ is a compact set depending on the support of $g$. It follows the limit of the difference quotient above passes inside the integral as $t \to 0$ and
\[ \frac{\partial}{\partial x_j}(f*g)(x) = \int f(y)\,\frac{\partial}{\partial x_j}g(x-y)\,d\mu(y). \]
Now letting $\frac{\partial}{\partial x_j}g$ play the role of $g$ in the above argument, partial derivatives of all orders exist. A similar use of the dominated convergence theorem shows all these partial derivatives are also continuous. This proves the lemma.
Theorem 10.45 Let $K$ be a compact subset of an open set, $U$. Then there exists a function, $h \in C_c^\infty(U)$, such that $h(x) = 1$ for all $x \in K$ and $h(x) \in [0,1]$ for all $x$.
Proof: Let $r > 0$ be small enough that $K + B(0,3r) \subseteq U$. The symbol, $K + B(0,3r)$ means
\[ \{k+x : k \in K \text{ and } x \in B(0,3r)\}. \]
Thus this is simply a way to write
\[ \cup\{B(k,3r) : k \in K\}. \]
Think of it as fattening up the set, $K$. Let $K_r = K + B(0,r)$. A picture of what is happening follows.
[Figure: the nested sets $K \subseteq K_r \subseteq U$.]
Consider $\mathcal{X}_{K_r} * \psi_m$ where $\psi_m$ is a mollifier. Let $m$ be so large that $\frac{1}{m} < r$. Then from the definition of what is meant by a convolution, and using that $\psi_m$ has support in $B\left(0,\frac{1}{m}\right)$, $\mathcal{X}_{K_r} * \psi_m = 1$ on $K$ and its support is in $K + B(0,3r)$. Now using Lemma 10.44, $\mathcal{X}_{K_r} * \psi_m$ is also infinitely differentiable. Therefore, let $h = \mathcal{X}_{K_r} * \psi_m$.
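The cutoff function built in this proof can be sketched numerically in one dimension. The example below assumes $K = [a,b]$, so that $K_r = [a-r, b+r]$, and uses the bump of Example 10.39 for $\psi_m$. The function name `make_cutoff` and the discretization (midpoint nodes with weights normalized to sum to 1) are illustrative assumptions, not the notes' construction verbatim.

```python
import math


def bump(y):
    """exp(1 / (y^2 - 1)) on |y| < 1, zero elsewhere."""
    return math.exp(1.0 / (y * y - 1.0)) if abs(y) < 1.0 else 0.0


def make_cutoff(a, b, r, m, n=4_000):
    """Return h = X_{K_r} * psi_m for K = [a, b], K_r = [a - r, b + r], assuming 1/m < r.

    psi_m is sampled at n midpoints of (-1/m, 1/m); the weights are normalized
    to sum to 1, so the discrete convolution is a weighted average.
    """
    step = 2.0 / (m * n)
    nodes = [-1.0 / m + (k + 0.5) * step for k in range(n)]
    raw = [bump(m * y) for y in nodes]
    total = sum(raw)
    weights = [v / total for v in raw]

    def h(x):
        # Add up psi_m(y) dy over those y for which x - y lies in K_r.
        return sum(w for y, w in zip(nodes, weights) if a - r <= x - y <= b + r)

    return h


if __name__ == "__main__":
    h = make_cutoff(a=0.0, b=1.0, r=0.25, m=5)
    for x in (-0.6, -0.3, 0.0, 0.5, 1.0, 1.3, 1.6):
        print(x, round(h(x), 6))
```

With these parameters $h$ equals 1 on $[0,1]$, lies strictly between 0 and 1 in the transition zone, and vanishes outside $K_r + B(0,1/m) \subseteq K + B(0,3r)$.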
The following corollary will be used later.
Corollary 10.46 Let $K$ be a compact set in $\mathbb{R}^n$ and let $\{U_i\}_{i=1}^\infty$ be an open cover of $K$. Then there exist functions, $\psi_i \in C_c^\infty(U_i)$ such that $\psi_i \prec U_i$ and
\[ \sum_{i=1}^\infty \psi_i(x) = 1. \]
If $K_1$ is a compact subset of $U_1$ there exist such functions such that also $\psi_1(x) = 1$ for all $x \in K_1$.
Proof: This follows from a repeat of the proof of Theorem 8.20 on Page 185, replacing the lemma used in that proof with Theorem 10.45.
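Theorem 10.47 For each $p \ge 1$, $C_c^\infty(\mathbb{R}^n)$ is dense in $L^p(\mathbb{R}^n)$. Here the measure is Lebesgue measure.
Proof: Let $f \in L^p(\mathbb{R}^n)$ and let $\varepsilon > 0$ be given. Choose $g \in C_c(\mathbb{R}^n)$ such that $\|f-g\|_p < \frac{\varepsilon}{2}$. This can be done by using Theorem 10.28. Now let
\[ g_m(x) = g*\psi_m(x) \equiv \int g(x-y)\psi_m(y)\,dm_n(y) = \int g(y)\psi_m(x-y)\,dm_n(y) \]
where $\{\psi_m\}$ is a mollifier. It follows from Lemma 10.44 $g_m \in C_c^\infty(\mathbb{R}^n)$. It vanishes if $x \notin \operatorname{spt}(g) + B(0,\frac{1}{m})$.
\[ \|g-g_m\|_p = \left(\int \left|g(x) - \int g(x-y)\psi_m(y)\,dm_n(y)\right|^p dm_n(x)\right)^{1/p} \]
\[ \le \left(\int \left(\int |g(x)-g(x-y)|\psi_m(y)\,dm_n(y)\right)^p dm_n(x)\right)^{1/p} \]
\[ \le \int \left(\int |g(x)-g(x-y)|^p\,dm_n(x)\right)^{1/p}\psi_m(y)\,dm_n(y) \]
\[ = \int_{B(0,\frac{1}{m})} \|g-g_y\|_p\,\psi_m(y)\,dm_n(y) < \frac{\varepsilon}{2} \]
whenever $m$ is large enough. This follows from Corollary ?? (continuity of translation in $L^p$; compare Theorem 10.37). Theorem 10.24 was used to obtain the third inequality. There is no measurability problem because the function
\[ (x,y) \to |g(x)-g(x-y)|\psi_m(y) \]
is continuous. Thus when $m$ is large enough,
\[ \|f-g_m\|_p \le \|f-g\|_p + \|g-g_m\|_p < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]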
This is a very remarkable result. Functions in $L^p(\mathbb{R}^n)$ don't need to be continuous anywhere and yet every such function is very close in the $L^p$ norm to one which is infinitely differentiable having compact support.
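To see the smoothing step of the proof at work, the following sketch mollifies the tent function $g(x) = \max(0, 1-|x|)$, which is continuous with compact support but not differentiable, and reports $\|g - g*\psi_m\|_2$ for increasing $m$. Everything here is an illustrative assumption: one dimension, the bump of Example 10.39 as $\psi$, midpoint-rule quadrature, and the hypothetical names `tent`, `mollify` and `lp_distance`.

```python
import math


def bump(y):
    """exp(1 / (y^2 - 1)) on |y| < 1, zero elsewhere."""
    return math.exp(1.0 / (y * y - 1.0)) if abs(y) < 1.0 else 0.0


def tent(x):
    """A continuous, compactly supported g: g(x) = max(0, 1 - |x|)."""
    return max(0.0, 1.0 - abs(x))


def mollify(g, m, n=200):
    """Return g_m = g * psi_m, with psi_m sampled at n midpoints of (-1/m, 1/m)."""
    step = 2.0 / (m * n)
    nodes = [-1.0 / m + (k + 0.5) * step for k in range(n)]
    raw = [bump(m * y) for y in nodes]
    total = sum(raw)
    weights = [v / total for v in raw]  # discrete stand-in for psi_m(y) dy
    return lambda x: sum(g(x - y) * w for y, w in zip(nodes, weights))


def lp_distance(g, h, p, lo=-2.0, hi=2.0, n=400):
    """Midpoint-rule approximation of ||g - h||_p over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for k in range(n):
        x = lo + (k + 0.5) * dx
        total += abs(g(x) - h(x)) ** p
    return (total * dx) ** (1.0 / p)


if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, lp_distance(tent, mollify(tent, m), p=2))
```

Since the tent function is Lipschitz with constant 1, $|g(x) - g_m(x)| \le 1/m$ pointwise, which gives the crude bound $\|g-g_m\|_2 \le 2/m$ on $[-2,2]$; the printed values should decrease at least this fast.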
Corollary 10.48 Let $U$ be an open set. For each $p \ge 1$, $C_c^\infty(U)$ is dense in $L^p(U)$. Here the measure is Lebesgue measure.
Proof: Let $f \in L^p(U)$ and let $\varepsilon > 0$ be given. Choose $g \in C_c(U)$ such that $\|f-g\|_p < \frac{\varepsilon}{2}$. This is possible because Lebesgue measure restricted to the open set, $U$ is regular. Thus the existence of such a $g$ follows from Theorem 10.28. Now let
\[ g_m(x) = g*\psi_m(x) \equiv \int g(x-y)\psi_m(y)\,dm_n(y) = \int g(y)\psi_m(x-y)\,dm_n(y) \]
where $\{\psi_m\}$ is a mollifier. It follows from Lemma 10.44 $g_m \in C_c^\infty(U)$ for all $m$ sufficiently large. It vanishes if $x \notin \operatorname{spt}(g) + B(0,\frac{1}{m})$. Then
\[ \|g-g_m\|_p = \left(\int \left|g(x) - \int g(x-y)\psi_m(y)\,dm_n(y)\right|^p dm_n(x)\right)^{1/p} \]
\[ \le \left(\int \left(\int |g(x)-g(x-y)|\psi_m(y)\,dm_n(y)\right)^p dm_n(x)\right)^{1/p} \]
\[ \le \int \left(\int |g(x)-g(x-y)|^p\,dm_n(x)\right)^{1/p}\psi_m(y)\,dm_n(y) \]
\[ = \int_{B(0,\frac{1}{m})} \|g-g_y\|_p\,\psi_m(y)\,dm_n(y) < \frac{\varepsilon}{2} \]
whenever $m$ is large enough. This follows from Corollary ?? (continuity of translation in $L^p$; compare Theorem 10.37). Theorem 10.24 was used to obtain the third inequality. There is no measurability problem because the function
\[ (x,y) \to |g(x)-g(x-y)|\psi_m(y) \]
is continuous. Thus when $m$ is large enough,
\[ \|f-g_m\|_p \le \|f-g\|_p + \|g-g_m\|_p < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Another thing should probably be mentioned. If you have had a course in complex analysis, you may be wondering whether these infinitely differentiable functions having compact support have anything to do with analytic functions which also have infinitely many derivatives. The answer is no! Recall that if an analytic function has a limit point in the set of zeros then it is identically equal to zero. Thus these functions in $C_c^\infty(\mathbb{R}^n)$ are not analytic. This is a strictly real analysis phenomenon and has absolutely nothing to do with the theory of functions of a complex variable.
10.8 Exercises
1. Let $E$ be a Lebesgue measurable set in $\mathbb{R}$. Suppose $m(E) > 0$. Consider the set
\[ E - E = \{x-y : x \in E, y \in E\}. \]
Show that $E - E$ contains an interval. Hint: Let
\[ f(x) = \int \mathcal{X}_E(t)\mathcal{X}_E(x+t)\,dt. \]
Note $f$ is continuous at 0 and $f(0) > 0$ and use continuity of translation in $L^p$.
2. Give an example of a sequence of functions in $L^p(\mathbb{R})$ which converges to zero in $L^p$ but does not converge pointwise to 0. Does this contradict the proof of the theorem that $L^p$ is complete? You don't have to be real precise, just describe it.
3. Let $K$ be a bounded subset of $L^p(\mathbb{R}^n)$ and suppose that for each $\varepsilon > 0$ there exists $G$ such that $G$ is compact with
\[ \int_{\mathbb{R}^n\setminus G} |u(x)|^p\,dx < \varepsilon^p \]
and, for all $\varepsilon > 0$, there exists a $\delta > 0$ such that if $|h| < \delta$, then
\[ \int |u(x+h) - u(x)|^p\,dx < \varepsilon^p \]
for all $u \in K$. Show that $K$ is precompact in $L^p(\mathbb{R}^n)$. Hint: Let $\psi_k$ be a mollifier and consider
\[ K_k \equiv \{u*\psi_k : u \in K\}. \]
Verify the conditions of the Ascoli Arzela theorem for these functions defined on $G$ and show there is an $\varepsilon$ net for each $\varepsilon > 0$. Can you modify this to let an arbitrary open set take the place of $\mathbb{R}^n$? This is a very important result.
4. Let $(\Omega,d)$ be a metric space and suppose also that $(\Omega,\mathcal{S},\mu)$ is a regular measure space such that $\mu(\Omega) < \infty$ and let $f \in L^1(\Omega)$ where $f$ has complex values. Show that for every $\varepsilon > 0$, there exists an open set of measure less than $\varepsilon$, denoted here by $V$ and a continuous function, $g$ defined on $\Omega$ such that $f = g$ on $V^C$. Thus, aside from a set of small measure, $f$ is continuous. If $|f(\omega)| \le M$, show that we can also assume $|g(\omega)| \le M$. This is called Lusin's theorem. Hint: Use Theorems 10.28 and 10.10 to obtain a sequence of functions in $C_c(\Omega)$, $\{g_n\}$ which converges pointwise a.e. to $f$ and then use Egoroff's theorem to obtain a small set, $W$ of measure less than $\varepsilon/2$ such that convergence is uniform on $W^C$. Now let $F$ be a closed subset of $W^C$ such that $\mu\left(W^C \setminus F\right) < \varepsilon/2$. Let $V = F^C$. Thus $\mu(V) < \varepsilon$ and on $F = V^C$, the convergence of $\{g_n\}$ is uniform showing that the restriction of $f$ to $V^C$ is continuous. Now use the Tietze extension theorem.
5. Let $\phi : \mathbb{R} \to \mathbb{R}$ be convex. This means
\[ \phi(\lambda x + (1-\lambda)y) \le \lambda\phi(x) + (1-\lambda)\phi(y) \]
whenever $\lambda \in [0,1]$. Verify that if $x < y < z$, then $\frac{\phi(y)-\phi(x)}{y-x} \le \frac{\phi(z)-\phi(y)}{z-y}$ and that $\frac{\phi(z)-\phi(x)}{z-x} \le \frac{\phi(z)-\phi(y)}{z-y}$. Show if $s \in \mathbb{R}$ there exists $\lambda$ such that $\phi(s) \le \phi(t) + \lambda(s-t)$ for all $t$. Show that if $\phi$ is convex, then $\phi$ is continuous.
6. Prove Jensen's inequality. If $\phi : \mathbb{R} \to \mathbb{R}$ is convex, $\mu(\Omega) = 1$, and $f : \Omega \to \mathbb{R}$ is in $L^1(\Omega)$, then $\phi\left(\int_\Omega f\,d\mu\right) \le \int_\Omega \phi(f)\,d\mu$. Hint: Let $s = \int_\Omega f\,d\mu$ and use Problem 5.
7. Let $\frac{1}{p} + \frac{1}{p'} = 1$, $p > 1$, let $f \in L^p(\mathbb{R})$, $g \in L^{p'}(\mathbb{R})$. Show $f*g$ is uniformly continuous on $\mathbb{R}$ and $|(f*g)(x)| \le \|f\|_{L^p}\|g\|_{L^{p'}}$. Hint: You need to consider why $f*g$ exists and then this follows from the definition of convolution and continuity of translation in $L^p$.
8. $B(p,q) = \int_0^1 x^{p-1}(1-x)^{q-1}\,dx$, $\Gamma(p) = \int_0^\infty e^{-t}t^{p-1}\,dt$ for $p, q > 0$. The first of these is called the beta function, while the second is the gamma function. Show a.) $\Gamma(p+1) = p\Gamma(p)$; b.) $\Gamma(p)\Gamma(q) = B(p,q)\Gamma(p+q)$.
9. Let $f \in C_c(0,\infty)$ and define $F(x) = \frac{1}{x}\int_0^x f(t)\,dt$. Show
\[ \|F\|_{L^p(0,\infty)} \le \frac{p}{p-1}\|f\|_{L^p(0,\infty)} \text{ whenever } p > 1. \]
Hint: Argue there is no loss of generality in assuming $f \ge 0$ and then assume this is so. Integrate $\int_0^\infty |F(x)|^p\,dx$ by parts as follows:
\[ \int_0^\infty F^p\,dx = \underbrace{xF^p\big|_0^\infty}_{\text{show } = 0} - p\int_0^\infty xF^{p-1}F'\,dx. \]
Now show $xF' = f - F$ and use this in the last integral. Complete the argument by using Holder's inequality and $p - 1 = p/q$.
10. Now suppose $f \in L^p(0,\infty)$, $p > 1$, and $f$ not necessarily in $C_c(0,\infty)$. Show that $F(x) = \frac{1}{x}\int_0^x f(t)\,dt$ still makes sense for each $x > 0$. Show the inequality of Problem 9 is still valid. This inequality is called Hardy's inequality. Hint: To show this, use the above inequality along with the density of $C_c(0,\infty)$ in $L^p(0,\infty)$.
11. Suppose $f, g \ge 0$. When does equality hold in Holder's inequality?
12. Prove Vitali's Convergence theorem: Let $\{f_n\}$ be uniformly integrable and complex valued, $\mu(\Omega) < \infty$, $f_n(x) \to f(x)$ a.e. where $f$ is measurable. Then $f \in L^1$ and $\lim_{n\to\infty}\int_\Omega |f_n - f|\,d\mu = 0$. Hint: Use Egoroff's theorem to show $\{f_n\}$ is a Cauchy sequence in $L^1(\Omega)$. This yields a different and easier proof than what was done earlier. See Theorem 7.46 on Page 163.
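13. Show the Vitali Convergence theorem implies the Dominated Convergence theorem for finite measure spaces but there exist examples where the Vitali convergence theorem works and the dominated convergence theorem does not.
14. Suppose $f \in L^\infty \cap L^1$. Show $\lim_{p\to\infty}\|f\|_{L^p} = \|f\|_\infty$. Hint:
\[ (\|f\|_\infty - \varepsilon)^p\,\mu([|f| > \|f\|_\infty - \varepsilon]) \le \int_{[|f| > \|f\|_\infty - \varepsilon]} |f|^p\,d\mu \le \int |f|^p\,d\mu = \int |f|^{p-1}|f|\,d\mu \le \|f\|_\infty^{p-1}\int |f|\,d\mu. \]
Now raise both ends to the $1/p$ power and take liminf and limsup as $p \to \infty$. You should get $\|f\|_\infty - \varepsilon \le \liminf \|f\|_p \le \limsup \|f\|_p \le \|f\|_\infty$.
15. Suppose $\mu(\Omega) < \infty$. Show that if $1 \le p < q$, then $L^q(\Omega) \subseteq L^p(\Omega)$. Hint: Use Holder's inequality.
16. Show $L^1(\mathbb{R}) \nsubseteq L^2(\mathbb{R})$ and $L^2(\mathbb{R}) \nsubseteq L^1(\mathbb{R})$ if Lebesgue measure is used. Hint: Consider $1/\sqrt{x}$ and $1/x$.
17. Suppose that $\theta \in [0,1]$ and $r, s, q > 0$ with
\[ \frac{1}{q} = \frac{\theta}{r} + \frac{1-\theta}{s}. \]
Show that
\[ \left(\int |f|^q\,d\mu\right)^{1/q} \le \left(\left(\int |f|^r\,d\mu\right)^{1/r}\right)^{\theta}\left(\left(\int |f|^s\,d\mu\right)^{1/s}\right)^{1-\theta}. \]
If $q, r, s \ge 1$ this says that
\[ \|f\|_q \le \|f\|_r^{\theta}\,\|f\|_s^{1-\theta}. \]
Using this, show that
\[ \ln\left(\|f\|_q\right) \le \theta\ln\left(\|f\|_r\right) + (1-\theta)\ln\left(\|f\|_s\right). \]
Hint:
\[ \int |f|^q\,d\mu = \int |f|^{q\theta}|f|^{q(1-\theta)}\,d\mu. \]
Now note that $1 = \frac{\theta q}{r} + \frac{q(1-\theta)}{s}$ and use Holder's inequality.
18. Suppose $f$ is a function in $L^1(\mathbb{R})$ and $f$ is infinitely differentiable. Does it follow that $f' \in L^1(\mathbb{R})$? Hint: What if $\phi \in C_c^\infty(0,1)$ and $f(x) = \phi(2^n(x-n))$ for $x \in (n,n+1)$, $f(x) = 0$ if $x < 0$?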
Fourier Transforms
11.1 An Algebra Of Special Functions
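First recall the following definition of a polynomial.
Definition 11.1 $\alpha = (\alpha_1, \cdots, \alpha_n)$ for $\alpha_1, \cdots, \alpha_n$ nonnegative integers is called a multi-index. For $\alpha$ a multi-index, $|\alpha| \equiv \alpha_1 + \cdots + \alpha_n$ and if $x \in \mathbb{R}^n$,
\[ x = (x_1, \cdots, x_n), \quad x^\alpha \equiv x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_n^{\alpha_n}. \]
A polynomial in $n$ variables of degree $m$ is a function of the form
\[ p(x) = \sum_{|\alpha| \le m} a_\alpha x^\alpha. \]
Here $\alpha$ is a multi-index as just described and $a_\alpha \in \mathbb{C}$. Also define for $\alpha = (\alpha_1, \cdots, \alpha_n)$ a multi-index
\[ D^\alpha f(x) \equiv \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1}\partial x_2^{\alpha_2}\cdots\partial x_n^{\alpha_n}}. \]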
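The multi-index notation is compact but easy to misread, so here is a small sketch of how $|\alpha|$, $x^\alpha$ and a polynomial $\sum a_\alpha x^\alpha$ could be evaluated. Representing a polynomial as a dictionary from multi-indices (tuples) to coefficients is an assumption made for illustration, as are the function names.

```python
from math import prod


def order(alpha):
    """|alpha| = alpha_1 + ... + alpha_n."""
    return sum(alpha)


def monomial(x, alpha):
    """x^alpha = x_1^alpha_1 * ... * x_n^alpha_n."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    return prod(xi ** ai for xi, ai in zip(x, alpha))


def polynomial(coeffs, x):
    """Evaluate p(x) = sum of a_alpha * x^alpha.

    coeffs maps multi-indices (tuples of nonnegative integers) to coefficients.
    """
    return sum(a * monomial(x, alpha) for alpha, a in coeffs.items())


if __name__ == "__main__":
    # p(x1, x2) = 3 x1^2 x2 - 2 x2 + 1, a polynomial of degree 3 in two variables.
    p = {(2, 1): 3.0, (0, 1): -2.0, (0, 0): 1.0}
    print(max(order(alpha) for alpha in p))
    print(polynomial(p, (2.0, -1.0)))
```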
Definition 11.2 Define $\mathcal{G}_1$ to be the functions of the form $p(x)e^{-a|x|^2}$ where $a > 0$ and $p(x)$ is a polynomial. Let $\mathcal{G}$ be all finite sums of functions in $\mathcal{G}_1$. Thus $\mathcal{G}$ is an algebra of functions which has the property that if $f \in \mathcal{G}$ then $\bar{f} \in \mathcal{G}$.
It is always assumed, unless stated otherwise that the measure will be Lebesgue measure.
Lemma 11.3 $\mathcal{G}$ is dense in $C_0(\mathbb{R}^n)$ with respect to the norm,
\[ \|f\|_\infty \equiv \sup\{|f(x)| : x \in \mathbb{R}^n\}. \]
Proof: By the Weierstrass approximation theorem, it suffices to show $\mathcal{G}$ separates the points and annihilates no point. It was already observed in the above definition that $\bar{f} \in \mathcal{G}$ whenever $f \in \mathcal{G}$. If $y_1 \neq y_2$, suppose first that $|y_1| \neq |y_2|$. Then in this case, you can let $f(x) \equiv e^{-|x|^2}$, and $f \in \mathcal{G}$ and $f(y_1) \neq f(y_2)$. If $|y_1| = |y_2|$, then suppose $y_{1k} \neq y_{2k}$. This must happen for some $k$ because $y_1 \neq y_2$. Then let $f(x) \equiv x_k e^{-|x|^2}$. Thus $\mathcal{G}$ separates points. Now $e^{-|x|^2}$ is never equal to zero and so $\mathcal{G}$ annihilates no point of $\mathbb{R}^n$.
These functions are clearly quite specialized. Therefore, the following theorem
is somewhat surprising.
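Theorem 11.4 For each $p \ge 1$, $p < \infty$, $\mathcal{G}$ is dense in $L^p(\mathbb{R}^n)$.
Proof: Let $f \in L^p(\mathbb{R}^n)$ and let $\varepsilon > 0$. Then there exists $g \in C_c(\mathbb{R}^n)$ such that $\|f-g\|_p < \varepsilon$. Now let $b > 0$ be large enough that
\[ \int_{\mathbb{R}^n}\left(e^{-b|x|^2}\right)^p dx < \varepsilon^p. \]
Then $x \to g(x)e^{b|x|^2}$ is in $C_c(\mathbb{R}^n) \subseteq C_0(\mathbb{R}^n)$. Therefore, from Lemma 11.3 there exists $\psi \in \mathcal{G}$ such that
\[ \left\|\psi - ge^{b|\cdot|^2}\right\|_\infty < 1. \]
Therefore, letting $\phi(x) \equiv e^{-b|x|^2}\psi(x)$ it follows that $\phi \in \mathcal{G}$ and for all $x \in \mathbb{R}^n$,
\[ |g(x) - \phi(x)| < e^{-b|x|^2}. \]
Therefore,
\[ \left(\int_{\mathbb{R}^n}|g(x)-\phi(x)|^p\,dx\right)^{1/p} \le \left(\int_{\mathbb{R}^n}\left(e^{-b|x|^2}\right)^p dx\right)^{1/p} < \varepsilon. \]
It follows
\[ \|f-\phi\|_p \le \|f-g\|_p + \|g-\phi\|_p < 2\varepsilon. \]
Since $\varepsilon > 0$ is arbitrary, this proves the theorem.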
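The following lemma is also interesting even if it is obvious.
Lemma 11.5 For $\psi \in \mathcal{G}$, $p$ a polynomial, and $\alpha, \beta$ multi-indices, $D^\alpha\psi \in \mathcal{G}$ and $p\psi \in \mathcal{G}$. Also
\[ \sup\{|x^\beta D^\alpha\psi(x)| : x \in \mathbb{R}^n\} < \infty. \]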
11.2 Fourier Transforms Of Functions In $\mathcal{G}$
Definition 11.6 For $\psi \in \mathcal{G}$ define the Fourier transform, $F$ and the inverse Fourier transform, $F^{-1}$ by
\[ F\psi(t) \equiv (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}\psi(x)\,dx, \]
\[ F^{-1}\psi(t) \equiv (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{it\cdot x}\psi(x)\,dx, \]
where $t\cdot x \equiv \sum_{i=1}^n t_ix_i$. Note there is no problem with this definition because $\psi$ is in $L^1(\mathbb{R}^n)$ and therefore,
\[ \left|e^{it\cdot x}\psi(x)\right| \le |\psi(x)|, \]
an integrable function.
One reason for using the functions, $\mathcal{G}$ is that it is very easy to compute the Fourier transform of these functions. The first thing to do is to verify $F$ and $F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$ and that $F^{-1}\circ F(\psi) = \psi$.
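Lemma 11.7 The following formulas are true
\[ \int_{\mathbb{R}} e^{-c(x+it)^2}dx = \int_{\mathbb{R}} e^{-c(x-it)^2}dx = \frac{\sqrt{\pi}}{\sqrt{c}}, \tag{11.1} \]
\[ \int_{\mathbb{R}^n} e^{-c(x+it)\cdot(x+it)}dx = \int_{\mathbb{R}^n} e^{-c(x-it)\cdot(x-it)}dx = \left(\frac{\sqrt{\pi}}{\sqrt{c}}\right)^n, \tag{11.2} \]
\[ \int_{\mathbb{R}} e^{-ct^2}e^{-ist}dt = \int_{\mathbb{R}} e^{-ct^2}e^{ist}dt = e^{-\frac{s^2}{4c}}\frac{\sqrt{\pi}}{\sqrt{c}}, \tag{11.3} \]
\[ \int_{\mathbb{R}^n} e^{-c|t|^2}e^{-is\cdot t}dt = \int_{\mathbb{R}^n} e^{-c|t|^2}e^{is\cdot t}dt = e^{-\frac{|s|^2}{4c}}\left(\frac{\sqrt{\pi}}{\sqrt{c}}\right)^n. \tag{11.4} \]
Proof: Consider the first one. Simple manipulations yield
\[ H(t) \equiv \int_{\mathbb{R}} e^{-c(x+it)^2}dx = e^{ct^2}\int_{\mathbb{R}} e^{-cx^2}\cos(2cxt)\,dx. \]
Now using the dominated convergence theorem to justify passing derivatives inside the integral where necessary and using integration by parts,
\[ H'(t) = 2cte^{ct^2}\int_{\mathbb{R}} e^{-cx^2}\cos(2cxt)\,dx - e^{ct^2}\int_{\mathbb{R}} e^{-cx^2}\sin(2cxt)\,2xc\,dx \]
\[ = 2ctH(t) - e^{ct^2}2ct\int_{\mathbb{R}} e^{-cx^2}\cos(2cxt)\,dx = 2ct\left(H(t) - H(t)\right) = 0 \]
and so $H(t) = H(0) = \int_{\mathbb{R}} e^{-cx^2}dx \equiv I$. Thus
\[ I^2 = \int_{\mathbb{R}^2} e^{-c(x^2+y^2)}dx\,dy = \int_0^\infty\int_0^{2\pi} e^{-cr^2}r\,d\theta\,dr = \frac{\pi}{c}. \]
Therefore, $I = \sqrt{\pi}/\sqrt{c}$. Since the sign of $t$ is unimportant, this proves 11.1. This also proves 11.2 after writing as iterated integrals.
Consider 11.3.
\[ \int_{\mathbb{R}} e^{-ct^2}e^{ist}dt = \int_{\mathbb{R}} e^{-c\left(t^2 - \frac{ist}{c} + \left(\frac{is}{2c}\right)^2\right)}e^{-\frac{s^2}{4c}}dt \]
\[ = e^{-\frac{s^2}{4c}}\int_{\mathbb{R}} e^{-c\left(t - \frac{is}{2c}\right)^2}dt = e^{-\frac{s^2}{4c}}\frac{\sqrt{\pi}}{\sqrt{c}}. \]
Changing the variable $t \to -t$ gives the other part of 11.3.
Finally 11.4 follows from using iterated integrals.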
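Formula 11.3 is easy to test numerically. The sketch below approximates the integral by a midpoint rule on a finite interval, which is an assumption justified only by the rapid decay of $e^{-ct^2}$ for the values of $c$ tried here, and compares it with the closed form. The function names and the truncation width are illustrative.

```python
import cmath
import math


def gaussian_fourier_integral(c, s, half_width=12.0, n=24_000):
    """Midpoint-rule approximation of the integral over R of exp(-c t^2) exp(-i s t) dt.

    The integrand is treated as zero outside [-half_width, half_width].
    """
    h = 2.0 * half_width / n
    total = 0.0 + 0.0j
    for k in range(n):
        t = -half_width + (k + 0.5) * h
        total += math.exp(-c * t * t) * cmath.exp(-1j * s * t)
    return total * h


def closed_form(c, s):
    """Right side of (11.3): exp(-s^2 / (4c)) * sqrt(pi) / sqrt(c)."""
    return math.exp(-s * s / (4.0 * c)) * math.sqrt(math.pi / c)


if __name__ == "__main__":
    for c, s in ((1.0, 0.0), (1.0, 2.0), (0.5, -1.5)):
        print(c, s, gaussian_fourier_integral(c, s), closed_form(c, s))
```

The imaginary part of the numerical value should be essentially zero, in line with the reduction to a cosine integral in the proof of 11.1.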
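With these formulas, it is easy to verify $F$, $F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$ and $F\circ F^{-1} = F^{-1}\circ F = id$.
Theorem 11.8 Each of $F$ and $F^{-1}$ map $\mathcal{G}$ to $\mathcal{G}$. Also $F^{-1}\circ F(\psi) = \psi$ and $F\circ F^{-1}(\psi) = \psi$.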
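Proof: The first claim will be shown if it is shown that $F\psi \in \mathcal{G}$ for $\psi(x) = x^\alpha e^{-b|x|^2}$ because an arbitrary function of $\mathcal{G}$ is a finite sum of scalar multiples of functions such as $\psi$. Using Lemma 11.7,
\[ F\psi(t) \equiv \left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}x^\alpha e^{-b|x|^2}dx \]
\[ = \left(\frac{1}{2\pi}\right)^{n/2}(i)^{|\alpha|}D_t^\alpha\left(\int_{\mathbb{R}^n} e^{-it\cdot x}e^{-b|x|^2}dx\right) \]
\[ = \left(\frac{1}{2\pi}\right)^{n/2}(i)^{|\alpha|}D_t^\alpha\left(e^{-\frac{|t|^2}{4b}}\left(\frac{\sqrt{\pi}}{\sqrt{b}}\right)^n\right) \]
and this is clearly in $\mathcal{G}$ because it equals a polynomial times $e^{-\frac{|t|^2}{4b}}$. It remains to verify the other assertion. As in the first case, it suffices to consider $\psi(x) = x^\alpha e^{-b|x|^2}$.
\[ F^{-1}\circ F(\psi)(s) \equiv \left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{is\cdot t}\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}x^\alpha e^{-b|x|^2}dx\,dt \]
\[ = \left(\frac{1}{2\pi}\right)^{n}\int_{\mathbb{R}^n} e^{is\cdot t}(i)^{|\alpha|}D_t^\alpha\left(\int_{\mathbb{R}^n} e^{-it\cdot x}e^{-b|x|^2}dx\right)dt \]
\[ = \left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{is\cdot t}\left(\frac{1}{2\pi}\right)^{n/2}(i)^{|\alpha|}D_t^\alpha\left(e^{-\frac{|t|^2}{4b}}\left(\frac{\sqrt{\pi}}{\sqrt{b}}\right)^n\right)dt \]
\[ = \left(\frac{1}{2\pi}\right)^{n}\left(\frac{\sqrt{\pi}}{\sqrt{b}}\right)^n(i)^{|\alpha|}\int_{\mathbb{R}^n} e^{is\cdot t}D_t^\alpha\left(e^{-\frac{|t|^2}{4b}}\right)dt \]
\[ = \left(\frac{1}{2\pi}\right)^{n}\left(\frac{\sqrt{\pi}}{\sqrt{b}}\right)^n(i)^{|\alpha|}(-1)^{|\alpha|}s^\alpha(i)^{|\alpha|}\int_{\mathbb{R}^n} e^{is\cdot t}e^{-\frac{|t|^2}{4b}}dt \]
\[ = \left(\frac{1}{2\pi}\right)^{n}\left(\frac{\sqrt{\pi}}{\sqrt{b}}\right)^n s^\alpha\int_{\mathbb{R}^n} e^{is\cdot t}e^{-\frac{|t|^2}{4b}}dt \]
\[ = \left(\frac{1}{2\pi}\right)^{n}\left(\frac{\sqrt{\pi}}{\sqrt{b}}\right)^n s^\alpha e^{-\frac{|s|^2}{4(1/(4b))}}\left(\frac{\sqrt{\pi}}{\sqrt{1/(4b)}}\right)^n \]
\[ = \left(\frac{1}{2\pi}\right)^{n}\left(\frac{\sqrt{\pi}}{\sqrt{b}}\right)^n s^\alpha e^{-b|s|^2}\left(\sqrt{\pi}\,2\sqrt{b}\right)^n = s^\alpha e^{-b|s|^2} = \psi(s). \]
Here the fifth equality comes from integrating by parts $|\alpha|$ times, and the next to last line uses 11.4 with $c = 1/(4b)$. This little computation proves the theorem. The other case is entirely similar.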
11.3 Fourier Transforms Of Just About Anything
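11.3.1 Fourier Transforms Of $\mathcal{G}^*$
Definition 11.9 Let $\mathcal{G}^*$ denote the vector space of linear functions defined on $\mathcal{G}$ which have values in $\mathbb{C}$. Thus $T \in \mathcal{G}^*$ means $T : \mathcal{G} \to \mathbb{C}$ and $T$ is linear,
\[ T(a\psi + b\phi) = aT(\psi) + bT(\phi) \text{ for all } a, b \in \mathbb{C},\ \psi, \phi \in \mathcal{G}. \]
Let $\psi \in \mathcal{G}$. Then define $T_\psi \in \mathcal{G}^*$ by
\[ T_\psi(\phi) \equiv \int_{\mathbb{R}^n}\psi(x)\phi(x)\,dx. \]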
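Lemma 11.10 The following is obtained for all $\phi, \psi \in \mathcal{G}$.
\[ T_{F\psi}(\phi) = T_\psi(F\phi), \quad T_{F^{-1}\psi}(\phi) = T_\psi\left(F^{-1}\phi\right). \]
Also if $\psi \in \mathcal{G}$ and $T_\psi = 0$, then $\psi = 0$.
Proof:
\[ T_{F\psi}(\phi) \equiv \int_{\mathbb{R}^n} F\psi(t)\phi(t)\,dt \]
\[ = \int_{\mathbb{R}^n}\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}\psi(x)\,dx\,\phi(t)\,dt \]
\[ = \int_{\mathbb{R}^n}\psi(x)\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}\phi(t)\,dt\,dx \]
\[ = \int_{\mathbb{R}^n}\psi(x)F\phi(x)\,dx \equiv T_\psi(F\phi). \]
The other claim is similar.
Suppose now $T_\psi = 0$. Then
\[ \int_{\mathbb{R}^n}\psi\phi\,dx = 0 \]
for all $\phi \in \mathcal{G}$. Therefore, this is true for $\phi = \bar{\psi}$ and so $\psi = 0$. This proves the lemma.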
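From now on regard $\mathcal{G} \subseteq \mathcal{G}^*$ and for $\psi \in \mathcal{G}$ write $\psi(\phi)$ instead of $T_\psi(\phi)$. It was just shown that with this interpretation (see footnote 1),
\[ F\psi(\phi) = \psi(F(\phi)), \quad F^{-1}\psi(\phi) = \psi\left(F^{-1}\phi\right). \]
This lemma suggests a way to define the Fourier transform of something in $\mathcal{G}^*$.
Definition 11.11 For $T \in \mathcal{G}^*$, define $FT, F^{-1}T \in \mathcal{G}^*$ by
\[ FT(\phi) \equiv T(F\phi), \quad F^{-1}T(\phi) \equiv T\left(F^{-1}\phi\right). \]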
Lemma 11.12 $F$ and $F^{-1}$ are both one to one, onto, and are inverses of each other.
Proof: First note $F$ and $F^{-1}$ are both linear. This follows directly from the definition. Suppose now $FT = 0$. Then $FT(\phi) = T(F\phi) = 0$ for all $\phi \in \mathcal{G}$. But $F$ and $F^{-1}$ map $\mathcal{G}$ onto $\mathcal{G}$ because if $\psi \in \mathcal{G}$, then $\psi = F\left(F^{-1}(\psi)\right)$. Therefore, $T = 0$ and so $F$ is one to one. Similarly $F^{-1}$ is one to one. Now
\[ F^{-1}(FT)(\phi) \equiv (FT)\left(F^{-1}\phi\right) \equiv T\left(F\left(F^{-1}(\phi)\right)\right) = T\phi. \]
Therefore, $F^{-1}\circ F(T) = T$. Similarly, $F\circ F^{-1}(T) = T$. Thus both $F$ and $F^{-1}$ are one to one and onto and are inverses of each other as suggested by the notation.
Probably the most interesting things in $\mathcal{G}^*$ are functions of various kinds. The following lemma has to do with this situation.
Lemma 11.13 If $f \in L^1_{loc}(\mathbb{R}^n)$ and $\int_{\mathbb{R}^n} f\phi\,dx = 0$ for all $\phi \in C_c(\mathbb{R}^n)$, then $f = 0$ a.e.
Proof: First suppose $f \ge 0$. For $r > 0$ let
\[ E \equiv \{x : f(x) \ge r\}, \quad E_R \equiv E \cap B(0,R). \]
Let $K_m$ be an increasing sequence of compact sets and let $V_m$ be a decreasing sequence of open sets satisfying
\[ K_m \subseteq E_R \subseteq V_m, \quad m_n(V_m) \le m_n(K_m) + 2^{-m}, \quad V_1 \subseteq B(0,R). \]
Therefore,
\[ m_n(V_m \setminus K_m) \le 2^{-m}. \]
Let
\[ \phi_m \in C_c(V_m), \quad K_m \prec \phi_m \prec V_m. \]
Then $\phi_m(x) \to \mathcal{X}_{E_R}(x)$ a.e. because the set where $\phi_m(x)$ fails to converge to this function is contained in the set of all $x$ which are in infinitely many of the sets $V_m \setminus K_m$. This set has measure zero because
\[ \sum_{m=1}^\infty m_n(V_m \setminus K_m) < \infty \]
and so, by the dominated convergence theorem,
\[ 0 = \lim_{m\to\infty}\int_{\mathbb{R}^n} f\phi_m\,dx = \lim_{m\to\infty}\int_{V_1} f\phi_m\,dx = \int_{E_R} f\,dx \ge r\,m_n(E_R). \]
Thus, $m_n(E_R) = 0$ and therefore $m_n(E) = \lim_{R\to\infty} m_n(E_R) = 0$. Since $r > 0$ is arbitrary, it follows
\[ m_n([f > 0]) \le \sum_{k=1}^\infty m_n\left(\left[f > k^{-1}\right]\right) = 0. \]
Now suppose $f$ has values in $\mathbb{R}$. Let $E_+ = [f \ge 0]$ and $E_- = [f < 0]$. Thus these are two measurable sets. As in the first part, let $K_m$ and $V_m$ be sequences of compact and open sets such that $K_m \subseteq E_+ \cap B(0,R) \subseteq V_m \subseteq B(0,R)$ and let $K_m \prec \phi_m \prec V_m$ with $m_n(V_m \setminus K_m) < 2^{-m}$. Thus $\phi_m \in C_c(\mathbb{R}^n)$ and the sequence converges pointwise a.e. to $\mathcal{X}_{E_+\cap B(0,R)}$. Then by the dominated convergence theorem, if $\psi$ is any function in $C_c(\mathbb{R}^n)$
\[ 0 = \int f\psi\phi_m\,dm_n \to \int f\psi\,\mathcal{X}_{E_+\cap B(0,R)}\,dm_n. \]
Hence, letting $R \to \infty$,
\[ \int f\psi\,\mathcal{X}_{E_+}\,dm_n = \int f_+\psi\,dm_n = 0. \]
Since $\psi$ is arbitrary, the first part of the argument applies to $f_+$ and implies $f_+ = 0$. Similarly $f_- = 0$. Finally, if $f$ is complex valued, the assumptions mean
\[ \int \operatorname{Re}(f)\phi\,dm_n = 0, \quad \int \operatorname{Im}(f)\phi\,dm_n = 0 \]
for all real valued $\phi \in C_c(\mathbb{R}^n)$ and so both $\operatorname{Re}(f)$, $\operatorname{Im}(f)$ equal zero a.e. This proves the lemma.
Footnote 1: This is not all that different from what was done with the derivative. Remember when you consider the derivative of a function of one variable, in elementary courses you think of it as a number but thinking of it as a linear transformation acting on $\mathbb{R}$ is better because this leads to the concept of a derivative which generalizes to functions of many variables. So it is here. You can think of $\psi \in \mathcal{G}$ as simply an element of $\mathcal{G}$ but it is better to think of it as an element of $\mathcal{G}^*$ as just described.
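Corollary 11.14 Let $f \in L^1(\mathbb{R}^n)$ and suppose
\[ \int_{\mathbb{R}^n} f(x)\phi(x)\,dx = 0 \]
for all $\phi \in \mathcal{G}$. Then $f = 0$ a.e.
Proof: Let $\psi \in C_c(\mathbb{R}^n)$. Then by the Stone Weierstrass approximation theorem, there exists a sequence of functions, $\{\phi_k\} \subseteq \mathcal{G}$ such that $\phi_k \to \psi$ uniformly. Then by the dominated convergence theorem,
\[ \int f\psi\,dx = \lim_{k\to\infty}\int f\phi_k\,dx = 0. \]
By Lemma 11.13 $f = 0$.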
The next theorem is the main result of this sort.
Theorem 11.15 Let $f \in L^p(\mathbb{R}^n)$, $p \ge 1$, or suppose $f$ is measurable and has polynomial growth,
\[ |f(x)| \le K\left(1 + |x|^2\right)^m \]
for some $m \in \mathbb{N}$. Then if
\[ \int f\psi\,dx = 0 \]
for all $\psi \in \mathcal{G}$ then it follows $f = 0$.
Proof: First note that if $f \in L^p(\mathbb{R}^n)$ or has polynomial growth, then it makes sense to write the integral $\int f\psi\,dx$ described above. This is obvious in the case of polynomial growth. In the case where $f \in L^p(\mathbb{R}^n)$ it also makes sense because
\[ \int |f||\psi|\,dx \le \left(\int |f|^p\,dx\right)^{1/p}\left(\int |\psi|^{p'}\,dx\right)^{1/p'} < \infty \]
due to the fact mentioned above that all these functions in $\mathcal{G}$ are in $L^p(\mathbb{R}^n)$ for every $p \ge 1$. Suppose now that $f \in L^p$, $p \ge 1$. The case where $f \in L^1(\mathbb{R}^n)$ was dealt with in Corollary 11.14. Suppose $f \in L^p(\mathbb{R}^n)$ for $p > 1$. Then
\[ |f|^{p-2}\bar{f} \in L^{p'}(\mathbb{R}^n), \quad \left(p' = q,\ \frac{1}{p} + \frac{1}{q} = 1\right) \]
and by density of $\mathcal{G}$ in $L^{p'}(\mathbb{R}^n)$ there exists a sequence $\{g_k\} \subseteq \mathcal{G}$ such that
\[ \left\|g_k - |f|^{p-2}\bar{f}\right\|_{p'} \to 0. \]
Then
\[ \int_{\mathbb{R}^n}|f|^p\,dx = \int_{\mathbb{R}^n} f\left(|f|^{p-2}\bar{f} - g_k\right)dx + \int_{\mathbb{R}^n} fg_k\,dx \]
\[ = \int_{\mathbb{R}^n} f\left(|f|^{p-2}\bar{f} - g_k\right)dx \le \|f\|_{L^p}\left\|g_k - |f|^{p-2}\bar{f}\right\|_{p'} \]
which converges to 0. Hence $f = 0$.
It remains to consider the case where $f$ has polynomial growth. Thus $x \to f(x)e^{-|x|^2} \in L^1(\mathbb{R}^n)$. Therefore, for all $\psi \in \mathcal{G}$,
\[ 0 = \int f(x)e^{-|x|^2}\psi(x)\,dx \]
because $e^{-|x|^2}\psi(x) \in \mathcal{G}$. Therefore, by the first part, $f(x)e^{-|x|^2} = 0$ a.e. This proves the theorem.
The following theorem shows that you can consider most functions you are likely to encounter as elements of $\mathcal{G}^*$.
Theorem 11.16 Let $f$ be a measurable function with polynomial growth,
\[ |f(x)| \le C\left(1 + |x|^2\right)^N \text{ for some } N, \]
or let $f \in L^p(\mathbb{R}^n)$ for some $p \in [1,\infty]$. Then $f \in \mathcal{G}^*$ if
\[ f(\phi) \equiv \int f\phi\,dx. \]
Proof: Let $f$ have polynomial growth first. Then the above integral is clearly well defined and so in this case, $f \in \mathcal{G}^*$.
Next suppose $f \in L^p(\mathbb{R}^n)$ with $\infty > p \ge 1$. Then it is clear again that the above integral is well defined because of the fact that $\phi$ is a sum of polynomials times exponentials of the form $e^{-c|x|^2}$ and these are in $L^{p'}(\mathbb{R}^n)$. Also $\phi \to f(\phi)$ is clearly linear in both cases. This proves the theorem.
This has shown that for nearly any reasonable function, you can define its Fourier transform as described above. Note also you could define the Fourier transform of a finite Radon measure $\mu$ because for such a measure
\[ \psi \to \int_{\mathbb{R}^n}\psi\,d\mu \]
is a linear functional on $\mathcal{G}$. This includes the very important case of probability distribution measures. The theoretical basis for this assertion will be given a little later.
11.3.2 Fourier Transforms Of Functions In $L^1(\mathbb{R}^n)$
First suppose $f \in L^1(\mathbb{R}^n)$.
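Theorem 11.17 Let $f \in L^1(\mathbb{R}^n)$. Then $Ff(\phi) = \int_{\mathbb{R}^n} g\phi\,dt$ where
\[ g(t) = \left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}f(x)\,dx \]
and $F^{-1}f(\phi) = \int_{\mathbb{R}^n} g\phi\,dt$ where $g(t) = \left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{it\cdot x}f(x)\,dx$. In short,
\[ Ff(t) \equiv (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}f(x)\,dx, \]
\[ F^{-1}f(t) \equiv (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{it\cdot x}f(x)\,dx. \]
Proof: From the definition and Fubini's theorem,
\[ Ff(\phi) \equiv \int_{\mathbb{R}^n} f(t)F\phi(t)\,dt = \int_{\mathbb{R}^n} f(t)\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}\phi(x)\,dx\,dt \]
\[ = \int_{\mathbb{R}^n}\left(\left(\frac{1}{2\pi}\right)^{n/2}\int_{\mathbb{R}^n} f(t)e^{-it\cdot x}\,dt\right)\phi(x)\,dx. \]
Since $\phi \in \mathcal{G}$ is arbitrary, it follows from Theorem 11.15 that $Ff(x)$ is given by the claimed formula. The case of $F^{-1}$ is identical.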
Here are interesting properties of these Fourier transforms of functions in $L^1$.
Theorem 11.18 If $f \in L^1(\mathbb{R}^n)$ and $\|f_k - f\|_1 \to 0$, then $Ff_k$ and $F^{-1}f_k$ converge uniformly to $Ff$ and $F^{-1}f$ respectively. If $f \in L^1(\mathbb{R}^n)$, then $F^{-1}f$ and $Ff$ are both continuous and bounded. Also,
\[ \lim_{|x|\to\infty} F^{-1}f(x) = \lim_{|x|\to\infty} Ff(x) = 0. \tag{11.5} \]
Furthermore, for $f \in L^1(\mathbb{R}^n)$ both $Ff$ and $F^{-1}f$ are uniformly continuous.
Proof: The first claim follows from the following inequality.
\[ |Ff_k(t) - Ff(t)| \le (2\pi)^{-n/2}\int_{\mathbb{R}^n}\left|e^{-it\cdot x}f_k(x) - e^{-it\cdot x}f(x)\right|dx \]
\[ = (2\pi)^{-n/2}\int_{\mathbb{R}^n}|f_k(x) - f(x)|\,dx = (2\pi)^{-n/2}\|f - f_k\|_1, \]
with a similar argument holding for $F^{-1}$.
Now consider the second claim of the theorem.
\[ |Ff(t) - Ff(t')| \le (2\pi)^{-n/2}\int_{\mathbb{R}^n}\left|e^{-it\cdot x} - e^{-it'\cdot x}\right||f(x)|\,dx. \]
The integrand is bounded by $2|f(x)|$, a function in $L^1(\mathbb{R}^n)$ and converges to 0 as $t' \to t$ and so the dominated convergence theorem implies $Ff$ is continuous. To see $Ff(t)$ is uniformly bounded,
\[ |Ff(t)| \le (2\pi)^{-n/2}\int_{\mathbb{R}^n}|f(x)|\,dx < \infty. \]
A similar argument gives the same conclusions for $F^{-1}$.
It remains to verify 11.5 and the claim that $Ff$ and $F^{-1}f$ are uniformly continuous.
\[ |Ff(t)| \le \left|(2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}f(x)\,dx\right|. \]
Now let $\varepsilon > 0$ be given and let $g \in C_c^\infty(\mathbb{R}^n)$ such that $(2\pi)^{-n/2}\|g - f\|_1 < \varepsilon/2$. Then
\[ |Ff(t)| \le (2\pi)^{-n/2}\int_{\mathbb{R}^n}|f(x) - g(x)|\,dx + \left|(2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}g(x)\,dx\right| \]
\[ \le \varepsilon/2 + \left|(2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}g(x)\,dx\right|. \]
Now integrating by parts, it follows that for $\|t\|_\infty \equiv \max\{|t_j| : j = 1, \cdots, n\} > 0$
\[ |Ff(t)| \le \varepsilon/2 + (2\pi)^{-n/2}\frac{1}{\|t\|_\infty}\int_{\mathbb{R}^n}\sum_{j=1}^n\left|\frac{\partial g(x)}{\partial x_j}\right|dx \tag{11.6} \]
and the last term converges to zero as $\|t\|_\infty \to \infty$. The reason for this is that if $t_j \neq 0$, integration by parts with respect to $x_j$ gives
\[ (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}g(x)\,dx = (2\pi)^{-n/2}\frac{1}{it_j}\int_{\mathbb{R}^n} e^{-it\cdot x}\frac{\partial g(x)}{\partial x_j}\,dx. \]
Therefore, choose the $j$ for which $\|t\|_\infty = |t_j|$ and the result of 11.6 holds. Therefore, from 11.6, if $\|t\|_\infty$ is large enough, $|Ff(t)| < \varepsilon$. Similarly, $\lim_{\|t\|\to\infty} F^{-1}f(t) = 0$. Consider the claim about uniform continuity. Let $\varepsilon > 0$ be given. Then there exists $R$ such that if $\|t\|_\infty > R$, then $|Ff(t)| < \frac{\varepsilon}{2}$. Since $Ff$ is continuous, it is uniformly continuous on the compact set, $[-R-1, R+1]^n$. Therefore, there exists $\delta_1$ such that if $\|t - t'\|_\infty < \delta_1$ for $t', t \in [-R-1, R+1]^n$, then
\[ |Ff(t) - Ff(t')| < \varepsilon/2. \tag{11.7} \]
Now let $0 < \delta < \min(\delta_1, 1)$ and suppose $\|t - t'\|_\infty < \delta$. If both $t, t'$ are contained in $[-R,R]^n$, then 11.7 holds. If $t \in [-R,R]^n$ and $t' \notin [-R,R]^n$, then both are contained in $[-R-1,R+1]^n$ and so this verifies 11.7 in this case. The other case is that neither point is in $[-R,R]^n$ and in this case,
\[ |Ff(t) - Ff(t')| \le |Ff(t)| + |Ff(t')| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
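The decay in 11.5 can be seen in a simple one dimensional case. For $f = \mathcal{X}_{[-1,1]}$ the integral in Theorem 11.17 can be done by hand, giving $Ff(t) = \sqrt{2/\pi}\,\sin(t)/t$ for $t \neq 0$, which is bounded by $(2\pi)^{-1/2}\|f\|_1 = \sqrt{2/\pi}$ and tends to 0 as $|t| \to \infty$. The sketch below compares a midpoint-rule value of the defining integral with this closed form; the function names and grid size are illustrative assumptions.

```python
import cmath
import math


def fourier_of_indicator(t, n=20_000):
    """Midpoint-rule value of (2 pi)^(-1/2) * integral over [-1, 1] of exp(-i t x) dx."""
    h = 2.0 / n
    total = sum(cmath.exp(-1j * t * (-1.0 + (k + 0.5) * h)) for k in range(n))
    return total * h / math.sqrt(2.0 * math.pi)


def exact(t):
    """Closed form sqrt(2 / pi) * sin(t) / t, valid for t != 0."""
    return math.sqrt(2.0 / math.pi) * math.sin(t) / t


if __name__ == "__main__":
    for t in (1.0, 10.0, 100.0):
        # Numerical modulus, closed-form modulus, and the decay bound sqrt(2/pi)/|t|.
        print(t, abs(fourier_of_indicator(t)), abs(exact(t)), math.sqrt(2.0 / math.pi) / t)
```

Here the decay is only of order $1/|t|$, which reflects the jumps of $f$; the integration by parts argument in the proof gives this rate for a smooth approximant $g$.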
There is a very interesting relation between the Fourier transform and convolutions.
Theorem 11.19 Let $f, g \in L^1(\mathbb{R}^n)$. Then $f*g \in L^1$ and $F(f*g) = (2\pi)^{n/2}Ff\,Fg$.
Proof: Consider
\[ \int_{\mathbb{R}^n}\int_{\mathbb{R}^n}|f(x-y)g(y)|\,dy\,dx. \]
The function, $(x,y) \to |f(x-y)g(y)|$ is Lebesgue measurable and so by Fubini's theorem,
\[ \int_{\mathbb{R}^n}\int_{\mathbb{R}^n}|f(x-y)g(y)|\,dy\,dx = \int_{\mathbb{R}^n}\int_{\mathbb{R}^n}|f(x-y)g(y)|\,dx\,dy = \|f\|_1\|g\|_1 < \infty. \]
It follows that for a.e. $x$, $\int_{\mathbb{R}^n}|f(x-y)g(y)|\,dy < \infty$ and for each of these values of $x$, it follows that $\int_{\mathbb{R}^n} f(x-y)g(y)\,dy$ exists and equals a function of $x$ which is in $L^1(\mathbb{R}^n)$, $f*g(x)$. Now
\[ F(f*g)(t) \equiv (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}f*g(x)\,dx \]
\[ = (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}\int_{\mathbb{R}^n} f(x-y)g(y)\,dy\,dx \]
\[ = (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot y}g(y)\int_{\mathbb{R}^n} e^{-it\cdot(x-y)}f(x-y)\,dx\,dy \]
\[ = (2\pi)^{n/2}Ff(t)\,Fg(t). \]
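The constant $(2\pi)^{n/2}$ in Theorem 11.19 depends on the normalization of $F$, so a numerical check is reassuring. The sketch below works in dimension $n = 1$ with $f = g = e^{-x^2}$, for which completing the square gives $f*g(x) = \sqrt{\pi/2}\,e^{-x^2/2}$. The helper names, the truncation of all integrals to $[-12,12]$ and the midpoint rule are assumptions made for illustration.

```python
import cmath
import math


def integrate(f, lo=-12.0, hi=12.0, n=4_000):
    """Midpoint rule on [lo, hi]; adequate for the rapidly decaying integrands below."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h


def fourier(f, t):
    """Ff(t) = (2 pi)^(-1/2) * integral of exp(-i t x) f(x) dx, in dimension n = 1."""
    return integrate(lambda x: cmath.exp(-1j * t * x) * f(x)) / math.sqrt(2.0 * math.pi)


def gauss(x):
    return math.exp(-x * x)


def convolution(f, g, x):
    """(f * g)(x) = integral of f(x - y) g(y) dy."""
    return integrate(lambda y: f(x - y) * g(y))


def gauss_conv_gauss(x):
    """Closed form of (gauss * gauss)(x): sqrt(pi / 2) * exp(-x^2 / 2)."""
    return math.sqrt(math.pi / 2.0) * math.exp(-x * x / 2.0)


if __name__ == "__main__":
    print(convolution(gauss, gauss, 0.7), gauss_conv_gauss(0.7))
    for t in (0.0, 1.0, 2.5):
        lhs = fourier(gauss_conv_gauss, t)
        rhs = math.sqrt(2.0 * math.pi) * fourier(gauss, t) * fourier(gauss, t)
        print(t, lhs, rhs)
```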
There are many other considerations involving Fourier transforms of functions in $L^1(\mathbb{R}^n)$.
11.3.3 Fourier Transforms Of Functions In $L^2(\mathbb{R}^n)$
Consider $Ff$ and $F^{-1}f$ for $f \in L^2(\mathbb{R}^n)$. First note that the formula given for $Ff$ and $F^{-1}f$ when $f \in L^1(\mathbb{R}^n)$ will not work for $f \in L^2(\mathbb{R}^n)$ unless $f$ is also in $L^1(\mathbb{R}^n)$. Recall that $\overline{a+ib} = a - ib$.
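Theorem 11.20 For $\phi \in \mathcal{G}$, $\|F\phi\|_2 = \|F^{-1}\phi\|_2 = \|\phi\|_2$.
Proof: First note that for $\psi \in \mathcal{G}$,
\[ F(\bar{\psi}) = \overline{F^{-1}(\psi)}, \quad F^{-1}(\bar{\psi}) = \overline{F(\psi)}. \tag{11.8} \]
This follows from the definition. For example,
\[ F\bar{\psi}(t) = (2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{-it\cdot x}\overline{\psi(x)}\,dx = \overline{(2\pi)^{-n/2}\int_{\mathbb{R}^n} e^{it\cdot x}\psi(x)\,dx}. \]
Let $\phi, \psi \in \mathcal{G}$. It was shown above that
\[ \int_{\mathbb{R}^n}(F\phi)\psi(t)\,dt = \int_{\mathbb{R}^n}\phi(F\psi)\,dx. \]
Similarly,
\[ \int_{\mathbb{R}^n}\phi(F^{-1}\psi)\,dx = \int_{\mathbb{R}^n}(F^{-1}\phi)\psi\,dt. \tag{11.9} \]
Now, 11.8 - 11.9 imply
\[ \int_{\mathbb{R}^n}|\phi|^2dx = \int_{\mathbb{R}^n}\phi\,\overline{F^{-1}(F\phi)}\,dx = \int_{\mathbb{R}^n}\phi\,F\left(\overline{F\phi}\right)dx = \int_{\mathbb{R}^n}F\phi\,\overline{F\phi}\,dx = \int_{\mathbb{R}^n}|F\phi|^2dx. \]
Similarly
\[ \|\phi\|_2 = \|F^{-1}\phi\|_2. \]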
Lemma 11.21 Let $f \in L^2(\mathbb{R}^n)$ and let $\phi_k \to f$ in $L^2(\mathbb{R}^n)$ where $\phi_k \in \mathcal{G}$. (Such a sequence exists because of density of $\mathcal{G}$ in $L^2(\mathbb{R}^n)$.) Then $Ff$ and $F^{-1}f$ are both in $L^2(\mathbb{R}^n)$ and the following limits take place in $L^2$.
\[ \lim_{k\to\infty} F(\phi_k) = F(f), \quad \lim_{k\to\infty} F^{-1}(\phi_k) = F^{-1}(f). \]
Proof: Let $\psi \in \mathcal{G}$ be given. Then
\[ Ff(\psi) \equiv f(F\psi) \equiv \int_{\mathbb{R}^n} f(x)F\psi(x)\,dx = \lim_{k\to\infty}\int_{\mathbb{R}^n}\phi_k(x)F\psi(x)\,dx = \lim_{k\to\infty}\int_{\mathbb{R}^n}F\phi_k(x)\psi(x)\,dx. \]
Also by Theorem 11.20 $\{F\phi_k\}_{k=1}^\infty$ is Cauchy in $L^2(\mathbb{R}^n)$ and so it converges to some $h \in L^2(\mathbb{R}^n)$. Therefore, from the above,
\[ Ff(\psi) = \int_{\mathbb{R}^n} h(x)\psi(x)\,dx \]
which shows that $F(f) \in L^2(\mathbb{R}^n)$ and $h = F(f)$. The case of $F^{-1}$ is entirely similar. This proves the lemma.
Since $Ff$ and $F^{-1}f$ are in $L^2(\mathbb{R}^n)$, this also proves the following theorem.
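Theorem 11.22 If $f \in L^2(\mathbb{R}^n)$, $Ff$ and $F^{-1}f$ are the unique elements of $L^2(\mathbb{R}^n)$ such that for all $\phi \in \mathcal{G}$,
\[ \int_{\mathbb{R}^n} Ff(x)\phi(x)\,dx = \int_{\mathbb{R}^n} f(x)F\phi(x)\,dx, \tag{11.10} \]
\[ \int_{\mathbb{R}^n} F^{-1}f(x)\phi(x)\,dx = \int_{\mathbb{R}^n} f(x)F^{-1}\phi(x)\,dx. \tag{11.11} \]
Theorem 11.23 (Plancherel)
\[ \|f\|_2 = \|Ff\|_2 = \|F^{-1}f\|_2. \tag{11.12} \]
Proof: Use the density of $\mathcal{G}$ in $L^2(\mathbb{R}^n)$ to obtain a sequence, $\{\phi_k\}$ converging to $f$ in $L^2(\mathbb{R}^n)$. Then by Lemma 11.21
\[ \|Ff\|_2 = \lim_{k\to\infty}\|F\phi_k\|_2 = \lim_{k\to\infty}\|\phi_k\|_2 = \|f\|_2. \]
Similarly,
\[ \|f\|_2 = \|F^{-1}f\|_2. \]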
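Plancherel's theorem can be checked by hand on a function whose transform is known. In one dimension take $f(x) = e^{-|x|}$, which lies in $L^1 \cap L^2$ with $\|f\|_2 = 1$; its transform under the normalization used here is the standard $Ff(t) = \sqrt{2/\pi}\,/(1+t^2)$. The sketch below confirms the closed form at a few points by quadrature and then compares the two $L^2$ norms. The truncation intervals, grid sizes and function names are illustrative assumptions.

```python
import cmath
import math


def integrate(f, lo, hi, n):
    """Midpoint rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h


def f(x):
    return math.exp(-abs(x))


def fourier_numeric(t):
    """(2 pi)^(-1/2) * integral of exp(-i t x) f(x) dx, truncated to [-40, 40]."""
    value = integrate(lambda x: cmath.exp(-1j * t * x) * f(x), -40.0, 40.0, 80_000)
    return value / math.sqrt(2.0 * math.pi)


def fourier_closed_form(t):
    """Transform of exp(-|x|) in this normalization: sqrt(2 / pi) / (1 + t^2)."""
    return math.sqrt(2.0 / math.pi) / (1.0 + t * t)


def l2_norm(g, lo, hi, n):
    """Midpoint-rule approximation of the L^2 norm of g over [lo, hi]."""
    return math.sqrt(integrate(lambda x: abs(g(x)) ** 2, lo, hi, n))


if __name__ == "__main__":
    for t in (0.0, 1.5, 4.0):
        print(t, fourier_numeric(t), fourier_closed_form(t))
    print(l2_norm(f, -40.0, 40.0, 80_000), l2_norm(fourier_closed_form, -200.0, 200.0, 200_000))
```

Both printed norms should be close to 1, up to quadrature and truncation error.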
The following corollary is a simple generalization of this. To prove this corollary, use the following simple lemma which comes as a consequence of the Cauchy Schwarz inequality.
Lemma 11.24 Suppose $f_k \to f$ in $L^2(\mathbb{R}^n)$ and $g_k \to g$ in $L^2(\mathbb{R}^n)$. Then
\[ \lim_{k\to\infty}\int_{\mathbb{R}^n} f_kg_k\,dx = \int_{\mathbb{R}^n} fg\,dx. \]
Proof:
\[ \left|\int_{\mathbb{R}^n} f_kg_k\,dx - \int_{\mathbb{R}^n} fg\,dx\right| \le \left|\int_{\mathbb{R}^n} f_kg_k\,dx - \int_{\mathbb{R}^n} f_kg\,dx\right| + \left|\int_{\mathbb{R}^n} f_kg\,dx - \int_{\mathbb{R}^n} fg\,dx\right| \]
\[ \le \|f_k\|_2\|g - g_k\|_2 + \|g\|_2\|f_k - f\|_2. \]
Now $\|f_k\|_2$ is a Cauchy sequence and so it is bounded independent of $k$. Therefore, the above expression is smaller than $\varepsilon$ whenever $k$ is large enough. This proves the lemma.
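Corollary 11.25 For $f, g \in L^2(\mathbb{R}^n)$,
\[ \int_{\mathbb{R}^n} f\bar{g}\,dx = \int_{\mathbb{R}^n} Ff\,\overline{Fg}\,dx = \int_{\mathbb{R}^n} F^{-1}f\,\overline{F^{-1}g}\,dx. \]
Proof: First note the above formula is obvious if $f, g \in \mathcal{G}$. To see this, note
\[ \int_{\mathbb{R}^n} Ff\,\overline{Fg}\,dx = \int_{\mathbb{R}^n} Ff(x)\,\overline{\frac{1}{(2\pi)^{n/2}}\int_{\mathbb{R}^n} e^{-ix\cdot t}g(t)\,dt}\,dx \]
\[ = \int_{\mathbb{R}^n}\frac{1}{(2\pi)^{n/2}}\int_{\mathbb{R}^n} e^{ix\cdot t}Ff(x)\,dx\,\overline{g(t)}\,dt \]
\[ = \int_{\mathbb{R}^n}\left(F^{-1}\circ F\right)f(t)\,\overline{g(t)}\,dt = \int_{\mathbb{R}^n} f(t)\,\overline{g(t)}\,dt. \]
The formula with $F^{-1}$ is exactly similar.
Now to verify the corollary, let $\phi_k \to f$ in $L^2(\mathbb{R}^n)$ and let $\psi_k \to g$ in $L^2(\mathbb{R}^n)$ with $\phi_k, \psi_k \in \mathcal{G}$. Then by Lemma 11.21 and Lemma 11.24
\[ \int_{\mathbb{R}^n} Ff\,\overline{Fg}\,dx = \lim_{k\to\infty}\int_{\mathbb{R}^n} F\phi_k\,\overline{F\psi_k}\,dx = \lim_{k\to\infty}\int_{\mathbb{R}^n}\phi_k\,\overline{\psi_k}\,dx = \int_{\mathbb{R}^n} f\bar{g}\,dx. \]
A similar argument holds for $F^{-1}$. This proves the corollary.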
How does one compute $Ff$ and $F^{-1}f$?
Theorem 11.26 For $f \in L^2(\mathbb{R}^n)$, let $f_r = f\,\mathcal{X}_{E_r}$ where $E_r$ is a bounded measurable set with $E_r \uparrow \mathbb{R}^n$. Then the following limits hold in $L^2(\mathbb{R}^n)$.
$$Ff = \lim_{r\to\infty} Ff_r, \qquad F^{-1}f = \lim_{r\to\infty} F^{-1}f_r.$$
Proof: $\|f - f_r\|_2 \to 0$ and so $\|Ff - Ff_r\|_2 \to 0$ and $\|F^{-1}f - F^{-1}f_r\|_2 \to 0$ by Plancherel's Theorem. This proves the theorem.
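What are $Ff_r$ and $F^{-1}f_r$? Let $\phi \in \mathcal{G}$. Then
$$\int_{\mathbb{R}^n} Ff_r\,\phi\,dx = \int_{\mathbb{R}^n} f_r\,F\phi\,dx$$
$$= (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} \int_{\mathbb{R}^n} f_r(x)\,e^{-ix\cdot y}\,\phi(y)\,dy\,dx$$
$$= \int_{\mathbb{R}^n} \left[ (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} f_r(x)\,e^{-ix\cdot y}\,dx \right] \phi(y)\,dy.$$
Since this holds for all $\phi \in \mathcal{G}$, a dense subset of $L^2(\mathbb{R}^n)$, it follows that
$$Ff_r(y) = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} f_r(x)\,e^{-ix\cdot y}\,dx.$$
Similarly
$$F^{-1}f_r(y) = (2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} f_r(x)\,e^{ix\cdot y}\,dx.$$
This shows that to take the Fourier transform of a function in $L^2(\mathbb{R}^n)$, it suffices to take the limit as $r \to \infty$ in $L^2(\mathbb{R}^n)$ of $(2\pi)^{-\frac{n}{2}} \int_{\mathbb{R}^n} f_r(x)\,e^{-ix\cdot y}\,dx$. A similar procedure works for the inverse Fourier transform.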
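As a rough numerical illustration of the formula for $Ff_r$ (not part of the theory), the following Python sketch takes $n = 1$ and $f$ the indicator function of $[-1, 1]$, so that $f_r = f$ once $E_r \supseteq [-1,1]$. The function names, the midpoint quadrature rule, the step counts, and the cutoff are all assumptions made for this example. The transform is compared with the closed form $\sqrt{2/\pi}\,\sin(y)/y$, and the squared $L^2$ norm of the transform over a large interval is compared with $\|f\|_2^2 = 2$, as Plancherel's theorem predicts.

```python
import cmath
import math


def fourier_transform(f, y, a, b, steps=4000):
    """Midpoint-rule approximation of (2*pi)^(-1/2) * int_a^b f(x) e^{-i x y} dx."""
    h = (b - a) / steps
    total = 0j
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * x * y)
    return total * h / math.sqrt(2.0 * math.pi)


def indicator(x):
    """Indicator function of the interval [-1, 1]."""
    return 1.0 if -1.0 <= x <= 1.0 else 0.0


def exact_transform(y):
    """Closed form sqrt(2/pi) * sin(y)/y of the transform of the indicator."""
    if y == 0.0:
        return math.sqrt(2.0 / math.pi)
    return math.sqrt(2.0 / math.pi) * math.sin(y) / y


def squared_norm_of_transform(cutoff=200.0, steps=40000):
    """Midpoint approximation of int_{-cutoff}^{cutoff} |Ff(y)|^2 dy."""
    h = 2.0 * cutoff / steps
    return sum(exact_transform(-cutoff + (k + 0.5) * h) ** 2 for k in range(steps)) * h


approx = fourier_transform(indicator, 1.3, -1.0, 1.0)
print(abs(approx - exact_transform(1.3)))  # quadrature error, expected to be small
print(squared_norm_of_transform())         # expected to be close to ||f||_2^2 = 2
```

The second number falls slightly short of 2 because the integral is truncated at the assumed cutoff; the missing tail is of order $1/\text{cutoff}$.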
Note this reduces to the earlier definition in case $f \in L^1(\mathbb{R}^n)$. Now consider the convolution of a function in $L^2$ with one in $L^1$.
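Theorem 11.27 Let $h \in L^2(\mathbb{R}^n)$ and let $f \in L^1(\mathbb{R}^n)$. Then $h * f \in L^2(\mathbb{R}^n)$,
$$F^{-1}(h * f) = (2\pi)^{n/2}\,F^{-1}h\,F^{-1}f,$$
$$F(h * f) = (2\pi)^{n/2}\,Fh\,Ff,$$
and
$$\|h * f\|_2 \le \|h\|_2\,\|f\|_1. \tag{11.13}$$
Proof: An application of Minkowski's inequality yields
$$\left( \int_{\mathbb{R}^n} \left( \int_{\mathbb{R}^n} |h(x-y)|\,|f(y)|\,dy \right)^2 dx \right)^{1/2} \le \|f\|_1\,\|h\|_2. \tag{11.14}$$
Hence $\int |h(x-y)|\,|f(y)|\,dy < \infty$ for a.e. $x$ and
$$x \to \int h(x-y)\,f(y)\,dy$$
is in $L^2(\mathbb{R}^n)$. Let $E_r \uparrow \mathbb{R}^n$, $m(E_r) < \infty$. Thus,
$$h_r \equiv \mathcal{X}_{E_r} h \in L^2(\mathbb{R}^n) \cap L^1(\mathbb{R}^n),$$
and letting $\phi \in \mathcal{G}$,
$$\int F(h_r * f)\,(\phi)\,dx \equiv \int (h_r * f)\,(F\phi)\,dx$$
$$= (2\pi)^{-n/2} \int \int \int h_r(x-y)\,f(y)\,e^{-ix\cdot t}\,\phi(t)\,dt\,dy\,dx$$
$$= (2\pi)^{-n/2} \int \int \left( \int h_r(x-y)\,e^{-i(x-y)\cdot t}\,dx \right) f(y)\,e^{-iy\cdot t}\,dy\,\phi(t)\,dt$$
$$= \int (2\pi)^{n/2}\,Fh_r(t)\,Ff(t)\,\phi(t)\,dt.$$
Since $\phi$ is arbitrary and $\mathcal{G}$ is dense in $L^2(\mathbb{R}^n)$,
$$F(h_r * f) = (2\pi)^{n/2}\,Fh_r\,Ff.$$
Now by Minkowski's Inequality, $h_r * f \to h * f$ in $L^2(\mathbb{R}^n)$ and also it is clear that $h_r \to h$ in $L^2(\mathbb{R}^n)$; so, by Plancherel's theorem, you may take the limit in the above and conclude
$$F(h * f) = (2\pi)^{n/2}\,Fh\,Ff.$$
The assertion for $F^{-1}$ is similar and 11.13 follows from 11.14.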
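The inequality 11.13 has an exact discrete analogue for finitely supported sequences, $\|h * f\|_{\ell^2} \le \|h\|_{\ell^2}\,\|f\|_{\ell^1}$, which is easy to test. The Python sketch below is only an illustration under that assumption: it replaces integrals by sums, and the helper names and sample sequences are made up for the example.

```python
import math


def convolve(h, f):
    """Full discrete convolution: out[n] = sum over k of h[n - k] * f[k]."""
    out = [0.0] * (len(h) + len(f) - 1)
    for i, h_i in enumerate(h):
        for j, f_j in enumerate(f):
            out[i + j] += h_i * f_j
    return out


def norm1(v):
    """Discrete analogue of the L^1 norm."""
    return sum(abs(t) for t in v)


def norm2(v):
    """Discrete analogue of the L^2 norm."""
    return math.sqrt(sum(t * t for t in v))


h = [1.0, -2.0, 0.5, 3.0]   # plays the role of h in L^2
f = [0.25, -1.0, 2.0]       # plays the role of f in L^1

lhs = norm2(convolve(h, f))
rhs = norm2(h) * norm1(f)
print(lhs <= rhs)  # the discrete form of 11.13 holds for these sequences
```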
11.3.4 The Schwartz Class
The problem with $\mathcal{G}$ is that it does not contain $C_c^\infty(\mathbb{R}^n)$. I have used it in presenting the Fourier transform because the functions in $\mathcal{G}$ have a very specific form which made some technical details work out easier than in any other approach I have seen. The Schwartz class is a larger class of functions which does contain $C_c^\infty(\mathbb{R}^n)$ and also has the same nice properties as $\mathcal{G}$. The functions in the Schwartz class are infinitely differentiable and they vanish very rapidly as $|x| \to \infty$ along with all their partial derivatives. This is the description of these functions, not a specific form involving polynomials times $e^{-\alpha|x|^2}$. To describe this precisely requires some notation.
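Definition 11.28 $f \in \mathfrak{S}$, the Schwartz class, if $f \in C^\infty(\mathbb{R}^n)$ and for all positive integers $N$,
$$\rho_N(f) < \infty$$
where
$$\rho_N(f) = \sup\{(1 + |x|^2)^N\,|D^\alpha f(x)| : x \in \mathbb{R}^n,\ |\alpha| \le N\}.$$
Thus $f \in \mathfrak{S}$ if and only if $f \in C^\infty(\mathbb{R}^n)$ and
$$\sup\{|x^\beta D^\alpha f(x)| : x \in \mathbb{R}^n\} < \infty \tag{11.15}$$
for all multi indices $\alpha$ and $\beta$.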
Also note that if $f \in \mathfrak{S}$, then $p(f) \in \mathfrak{S}$ for any polynomial $p$ with $p(0) = 0$ and that
$$\mathfrak{S} \subseteq L^p(\mathbb{R}^n) \cap L^\infty(\mathbb{R}^n)$$
for any $p \ge 1$. To see this assertion about the $p(f)$, it suffices to consider the case of the product of two elements of the Schwartz class. If $f, g \in \mathfrak{S}$, then $D^\alpha(fg)$ is a finite sum of derivatives of $f$ times derivatives of $g$. Therefore, $\rho_N(fg) < \infty$ for all $N$. You may wonder about examples of things in $\mathfrak{S}$. Clearly any function in $C_c^\infty(\mathbb{R}^n)$ is in $\mathfrak{S}$. However there are other functions in $\mathfrak{S}$. For example $e^{-|x|^2}$ is in $\mathfrak{S}$ as you can verify for yourself and so is any function from $\mathcal{G}$. Note also that the density of $C_c^\infty(\mathbb{R}^n)$ in $L^p(\mathbb{R}^n)$ shows that $\mathfrak{S}$ is dense in $L^p(\mathbb{R}^n)$ for every $p$.
Recall the Fourier transform of a function in $L^1(\mathbb{R}^n)$ is given by
$$Ff(t) \equiv (2\pi)^{-n/2} \int_{\mathbb{R}^n} e^{-it\cdot x}\,f(x)\,dx.$$
Therefore, this gives the Fourier transform for $f \in \mathfrak{S}$. The nice property which $\mathfrak{S}$ has in common with $\mathcal{G}$ is that the Fourier transform and its inverse map $\mathfrak{S}$ one to one onto $\mathfrak{S}$. This means I could have presented the whole of the above theory in terms of $\mathfrak{S}$ rather than in terms of $\mathcal{G}$. However, it is more technical.
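Theorem 11.29 If $f \in \mathfrak{S}$, then $Ff$ and $F^{-1}f$ are also in $\mathfrak{S}$.
Proof: To begin with, let $\alpha = e_j = (0, 0, \cdots, 1, 0, \cdots, 0)$, the 1 in the $j$th slot.
$$\frac{F^{-1}f(t + he_j) - F^{-1}f(t)}{h} = (2\pi)^{-n/2} \int_{\mathbb{R}^n} e^{it\cdot x}\,f(x) \left( \frac{e^{ihx_j} - 1}{h} \right) dx. \tag{11.16}$$
Consider the integrand in 11.16.
$$\left| e^{it\cdot x}\,f(x) \left( \frac{e^{ihx_j} - 1}{h} \right) \right| = |f(x)| \left| \frac{e^{i(h/2)x_j} - e^{-i(h/2)x_j}}{h} \right| = |f(x)| \left| \frac{i\sin((h/2)\,x_j)}{(h/2)} \right| \le |f(x)|\,|x_j|$$
and this is a function in $L^1(\mathbb{R}^n)$ because $f \in \mathfrak{S}$. Therefore by the Dominated Convergence Theorem,
$$\frac{\partial F^{-1}f(t)}{\partial t_j} = (2\pi)^{-n/2} \int_{\mathbb{R}^n} e^{it\cdot x}\,i x_j\,f(x)\,dx = i\,(2\pi)^{-n/2} \int_{\mathbb{R}^n} e^{it\cdot x}\,x^{e_j} f(x)\,dx.$$
Now $x^{e_j} f(x) \in \mathfrak{S}$ and so one can continue in this way and take derivatives indefinitely. Thus $F^{-1}f \in C^\infty(\mathbb{R}^n)$ and from the above argument,
$$D^\alpha F^{-1}f(t) = (2\pi)^{-n/2} \int_{\mathbb{R}^n} e^{it\cdot x}\,(ix)^\alpha f(x)\,dx.$$
To complete showing $F^{-1}f \in \mathfrak{S}$,
$$t^\beta D^\alpha F^{-1}f(t) = (2\pi)^{-n/2} \int_{\mathbb{R}^n} e^{it\cdot x}\,t^\beta (ix)^\alpha f(x)\,dx.$$
Integrate this integral by parts to get
$$t^\beta D^\alpha F^{-1}f(t) = (2\pi)^{-n/2} \int_{\mathbb{R}^n} i^{|\beta|}\,e^{it\cdot x}\,D^\beta\left((ix)^\alpha f(x)\right) dx. \tag{11.17}$$
Here is how this is done.
$$\int_{\mathbb{R}} e^{it_j x_j}\,t_j^{\beta_j} (ix)^\alpha f(x)\,dx_j = \left. \frac{e^{it_j x_j}}{it_j}\,t_j^{\beta_j} (ix)^\alpha f(x) \right|_{-\infty}^{\infty} + i \int_{\mathbb{R}} e^{it_j x_j}\,t_j^{\beta_j - 1} D^{e_j}\left((ix)^\alpha f(x)\right) dx_j$$
where the boundary term vanishes because $f \in \mathfrak{S}$. Returning to 11.17, use the fact that $|e^{ia}| = 1$ to conclude
$$|t^\beta D^\alpha F^{-1}f(t)| \le C \int_{\mathbb{R}^n} \left| D^\beta\left((ix)^\alpha f(x)\right) \right| dx < \infty.$$
It follows $F^{-1}f \in \mathfrak{S}$. Similarly $Ff \in \mathfrak{S}$ whenever $f \in \mathfrak{S}$.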
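Theorem 11.30 Let $\psi \in \mathfrak{S}$. Then $(F \circ F^{-1})(\psi) = \psi$ and $(F^{-1} \circ F)(\psi) = \psi$ whenever $\psi \in \mathfrak{S}$. Also $F$ and $F^{-1}$ map $\mathfrak{S}$ one to one and onto $\mathfrak{S}$.
Proof: The first claim follows from the fact that $F$ and $F^{-1}$ are inverses of each other which was established above. For the second, let $\psi \in \mathfrak{S}$. Then $\psi = F\left(F^{-1}\psi\right)$. Thus $F$ maps $\mathfrak{S}$ onto $\mathfrak{S}$. If $F\psi = 0$, then do $F^{-1}$ to both sides to conclude $\psi = 0$. Thus $F$ is one to one and onto. Similarly, $F^{-1}$ is one to one and onto.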
11.3.5 Convolution
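To begin with it is necessary to discuss the meaning of $\phi f$ where $f \in \mathcal{G}^*$ and $\phi \in \mathcal{G}$. What should it mean? First suppose $f \in L^p(\mathbb{R}^n)$ or measurable with polynomial growth. Then $\phi f$ also has these properties. Hence, it should be the case that $\phi f(\psi) = \int_{\mathbb{R}^n} \phi f \psi\,dx = \int_{\mathbb{R}^n} f(\phi\psi)\,dx$. This motivates the following definition.
Definition 11.31 Let $T \in \mathcal{G}^*$ and let $\phi \in \mathcal{G}$. Then $\phi T \equiv T\phi \in \mathcal{G}^*$ will be defined by
$$\phi T(\psi) \equiv T(\phi\psi).$$
The next topic is that of convolution. It was just shown that
$$F(f * \phi) = (2\pi)^{n/2}\,F\phi\,Ff, \qquad F^{-1}(f * \phi) = (2\pi)^{n/2}\,F^{-1}\phi\,F^{-1}f$$
whenever $f \in L^2(\mathbb{R}^n)$ and $\phi \in \mathcal{G}$ so the same definition is retained in the general case because it makes perfect sense and agrees with the earlier definition.
Definition 11.32 Let $f \in \mathcal{G}^*$ and let $\phi \in \mathcal{G}$. Then define the convolution of $f$ with an element of $\mathcal{G}$ as follows.
$$f * \phi \equiv (2\pi)^{n/2}\,F^{-1}(F\phi\,Ff) \in \mathcal{G}^*$$
There is an obvious question. With this definition, is it true that $F^{-1}(f * \phi) = (2\pi)^{n/2}\,F^{-1}\phi\,F^{-1}f$ as it was earlier?
Theorem 11.33 Let $f \in \mathcal{G}^*$ and let $\phi \in \mathcal{G}$.
$$F(f * \phi) = (2\pi)^{n/2}\,F\phi\,Ff, \tag{11.18}$$
$$F^{-1}(f * \phi) = (2\pi)^{n/2}\,F^{-1}\phi\,F^{-1}f. \tag{11.19}$$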
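Proof: Note that 11.18 follows from Definition 11.32 and both assertions hold for $f \in \mathcal{G}$. Consider 11.19. Here is a simple formula involving a pair of functions in $\mathcal{G}$.
$$\left(\psi * F^{-1}F^{-1}\phi\right)(x) = \left( \int \int \int \psi(x-y)\,e^{iy\cdot y_1}\,e^{iy_1\cdot z}\,\phi(z)\,dz\,dy_1\,dy \right) (2\pi)^{-n}$$
$$= \left( \int \int \int \psi(x-y)\,e^{-iy\cdot \tilde{y}_1}\,e^{-i\tilde{y}_1\cdot z}\,\phi(z)\,dz\,d\tilde{y}_1\,dy \right) (2\pi)^{-n} = (\psi * FF\phi)(x).$$
Now for $\psi \in \mathcal{G}$,
$$(2\pi)^{n/2}\,F\left(F^{-1}\phi\,F^{-1}f\right)(\psi) \equiv (2\pi)^{n/2} \left(F^{-1}\phi\,F^{-1}f\right)(F\psi) \equiv (2\pi)^{n/2}\,F^{-1}f\left(F^{-1}\phi\,F\psi\right)$$
$$\equiv (2\pi)^{n/2}\,f\left(F^{-1}\left(F^{-1}\phi\,F\psi\right)\right) = f\left((2\pi)^{n/2}\,F^{-1}\left(\left(FF^{-1}F^{-1}\phi\right)(F\psi)\right)\right)$$
$$\equiv f\left(\psi * F^{-1}F^{-1}\phi\right) = f(\psi * FF\phi). \tag{11.20}$$
Also
$$(2\pi)^{n/2}\,F^{-1}(F\phi\,Ff)(\psi) \equiv (2\pi)^{n/2}\,(F\phi\,Ff)\left(F^{-1}\psi\right) \equiv (2\pi)^{n/2}\,Ff\left(F\phi\,F^{-1}\psi\right)$$
$$\equiv (2\pi)^{n/2}\,f\left(F\left(F\phi\,F^{-1}\psi\right)\right) = f\left(F\left((2\pi)^{n/2}\left(F\phi\,F^{-1}\psi\right)\right)\right)$$
$$= f\left(F\left((2\pi)^{n/2}\left(F^{-1}FF\phi\,F^{-1}\psi\right)\right)\right) = f\left(F\left(F^{-1}(FF\phi * \psi)\right)\right)$$
$$= f(FF\phi * \psi) = f(\psi * FF\phi). \tag{11.21}$$
The last line follows from the following.
$$\int FF\phi(x-y)\,\psi(y)\,dy = \int F\phi(x-y)\,F\psi(y)\,dy = \int F\psi(x-y)\,F\phi(y)\,dy = \int \psi(x-y)\,FF\phi(y)\,dy.$$
From 11.21 and 11.20, since $\psi$ was arbitrary,
$$(2\pi)^{n/2}\,F\left(F^{-1}\phi\,F^{-1}f\right) = (2\pi)^{n/2}\,F^{-1}(F\phi\,Ff) \equiv f * \phi$$
which shows 11.19.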
11.4 Exercises
1. For $f \in L^1(\mathbb{R}^n)$, show that if $F^{-1}f \in L^1$ or $Ff \in L^1$, then $f$ equals a continuous bounded function a.e.
2. Suppose $f, g \in L^1(\mathbb{R})$ and $Ff = Fg$. Show $f = g$ a.e.
3. Show that if $f \in L^1(\mathbb{R}^n)$, then $\lim_{|x|\to\infty} Ff(x) = 0$.
4. Suppose $f * f = f$ or $f * f = 0$ and $f \in L^1(\mathbb{R})$. Show $f = 0$.
5. For this problem define $\int_a^\infty f(t)\,dt \equiv \lim_{r\to\infty} \int_a^r f(t)\,dt$. Note this coincides with the Lebesgue integral when $f \in L^1(a, \infty)$. Show
(a) $\int_0^\infty \frac{\sin(u)}{u}\,du = \frac{\pi}{2}$
(b) $\lim_{r\to\infty} \int_\delta^\infty \frac{\sin(ru)}{u}\,du = 0$ whenever $\delta > 0$.
(c) If $f \in L^1(\mathbb{R})$, then $\lim_{r\to\infty} \int_{\mathbb{R}} \sin(ru)\,f(u)\,du = 0$.
Hint: For the first two, use $\frac{1}{u} = \int_0^\infty e^{-ut}\,dt$ and apply Fubini's theorem to $\int_0^R \sin u \int_0^\infty e^{-ut}\,dt\,du$. For the last part, first establish it for $f \in C_c^\infty(\mathbb{R})$ and then use the density of this set in $L^1(\mathbb{R})$ to obtain the result. This is sometimes called the Riemann Lebesgue lemma.
6. Suppose that $g \in L^1(\mathbb{R})$ and that at some $x > 0$, $g$ is locally Hölder continuous from the right and from the left. This means
$$\lim_{r\to 0+} g(x + r) \equiv g(x+)$$
exists,
$$\lim_{r\to 0+} g(x - r) \equiv g(x-)$$
exists and there exist constants $K, \delta > 0$ and $r \in (0, 1]$ such that for $|x - y| < \delta$,
$$|g(x+) - g(y)| < K\,|x - y|^r$$
for $y > x$ and
$$|g(x-) - g(y)| < K\,|x - y|^r$$
for $y < x$. Show that under these conditions,
$$\lim_{r\to\infty} \frac{2}{\pi} \int_0^\infty \frac{\sin(ur)}{u} \left( \frac{g(x - u) + g(x + u)}{2} \right) du = \frac{g(x+) + g(x-)}{2}.$$
7. Let $g \in L^1(\mathbb{R})$ and suppose $g$ is locally Hölder continuous from the right and from the left at $x$. Show that then
$$\lim_{R\to\infty} \frac{1}{2\pi} \int_{-R}^{R} e^{ixt} \int_{-\infty}^{\infty} e^{-ity}\,g(y)\,dy\,dt = \frac{g(x+) + g(x-)}{2}.$$
This is very interesting. If $g \in L^2(\mathbb{R})$, this shows $F^{-1}(Fg)(x) = \frac{g(x+) + g(x-)}{2}$, the midpoint of the jump in $g$ at the point $x$. In particular, if $g \in \mathcal{G}$, $F^{-1}(Fg) = g$. Hint: Show the left side of the above equation reduces to
$$\frac{2}{\pi} \int_0^\infty \frac{\sin(ur)}{u} \left( \frac{g(x - u) + g(x + u)}{2} \right) du$$
and then use Problem 6 to obtain the result.
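8. A measurable function $g$ defined on $(0, \infty)$ has exponential growth if $|g(t)| \le Ce^{\eta t}$ for some $\eta$. For $\mathrm{Re}(s) > \eta$, define the Laplace Transform by
$$\mathcal{L}g(s) \equiv \int_0^\infty e^{-su}\,g(u)\,du.$$
Assume that $g$ has exponential growth as above and is Hölder continuous from the right and from the left at $t$. Pick $\gamma > \eta$. Show that
$$\lim_{R\to\infty} \frac{1}{2\pi} \int_{-R}^{R} e^{\gamma t}\,e^{iyt}\,\mathcal{L}g(\gamma + iy)\,dy = \frac{g(t+) + g(t-)}{2}.$$
This formula is sometimes written in the form
$$\frac{1}{2\pi i} \int_{\gamma - i\infty}^{\gamma + i\infty} e^{st}\,\mathcal{L}g(s)\,ds$$
and is called the complex inversion integral for Laplace transforms. It can be used to find inverse Laplace transforms. Hint:
$$\frac{1}{2\pi} \int_{-R}^{R} e^{\gamma t}\,e^{iyt}\,\mathcal{L}g(\gamma + iy)\,dy = \frac{1}{2\pi} \int_{-R}^{R} e^{\gamma t}\,e^{iyt} \int_0^\infty e^{-(\gamma + iy)u}\,g(u)\,du\,dy.$$
Now use Fubini's theorem and do the integral from $-R$ to $R$ to get this equal to
$$\frac{e^{\gamma t}}{\pi} \int_{-\infty}^{\infty} e^{-\gamma u}\,\overline{g}(u)\,\frac{\sin(R(t - u))}{t - u}\,du$$
where $\overline{g}$ is the zero extension of $g$ off $[0, \infty)$. Then this equals
$$\frac{e^{\gamma t}}{\pi} \int_{-\infty}^{\infty} e^{-\gamma(t - u)}\,\overline{g}(t - u)\,\frac{\sin(Ru)}{u}\,du$$
which equals
$$\frac{2e^{\gamma t}}{\pi} \int_0^\infty \frac{\overline{g}(t - u)\,e^{-\gamma(t - u)} + \overline{g}(t + u)\,e^{-\gamma(t + u)}}{2}\,\frac{\sin(Ru)}{u}\,du$$
and then apply the result of Problem 6.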
9. Suppose $f \in \mathfrak{S}$. Show $F(f_{x_j})(t) = it_j\,Ff(t)$.
10. Let $f \in \mathfrak{S}$ and let $k$ be a positive integer.
$$\|f\|_{k,2} \equiv \left( \|f\|_2^2 + \sum_{|\alpha| \le k} \|D^\alpha f\|_2^2 \right)^{1/2}.$$
One could also define
$$|||f|||_{k,2} \equiv \left( \int_{\mathbb{R}^n} |Ff(x)|^2\,(1 + |x|^2)^k\,dx \right)^{1/2}.$$
Show both $\|\cdot\|_{k,2}$ and $|||\cdot|||_{k,2}$ are norms on $\mathfrak{S}$ and that they are equivalent. These are Sobolev space norms. For which values of $k$ does the second norm make sense? How about the first norm?
11. Define $H^k(\mathbb{R}^n)$, $k \ge 0$, by $f \in L^2(\mathbb{R}^n)$ such that
$$\left( \int |Ff(x)|^2\,(1 + |x|^2)^k\,dx \right)^{\frac{1}{2}} < \infty,$$
$$|||f|||_{k,2} \equiv \left( \int |Ff(x)|^2\,(1 + |x|^2)^k\,dx \right)^{\frac{1}{2}}.$$
Show $H^k(\mathbb{R}^n)$ is a Banach space, and that if $k$ is a positive integer, $H^k(\mathbb{R}^n) = \{f \in L^2(\mathbb{R}^n)$ : there exists $\{u_j\} \subseteq \mathcal{G}$ with $\|u_j - f\|_2 \to 0$ and $\{u_j\}$ is a Cauchy sequence in $\|\cdot\|_{k,2}$ of Problem 10$\}$. This is one way to define Sobolev Spaces. Hint: One way to do the second part of this is to define a new measure $\mu$ by
$$\mu(E) \equiv \int_E \left(1 + |x|^2\right)^k dx.$$
Then show $\mu$ is a Radon measure and show there exists $\{g_m\}$ such that $g_m \in \mathcal{G}$ and $g_m \to Ff$ in $L^2(\mu)$. Thus $g_m = Ff_m$, $f_m \in \mathcal{G}$, because $F$ maps $\mathcal{G}$ onto $\mathcal{G}$. Then by Problem 10, $\{f_m\}$ is Cauchy in the norm $\|\cdot\|_{k,2}$.
12. If $2k > n$, show that if $f \in H^k(\mathbb{R}^n)$, then $f$ equals a bounded continuous function a.e. Hint: Show that for $k$ this large, $Ff \in L^1(\mathbb{R}^n)$, and then use Problem 1. To do this, write
$$|Ff(x)| = |Ff(x)|\,(1 + |x|^2)^{\frac{k}{2}}\,(1 + |x|^2)^{-\frac{k}{2}},$$
So
$$\int |Ff(x)|\,dx = \int |Ff(x)|\,(1 + |x|^2)^{\frac{k}{2}}\,(1 + |x|^2)^{-\frac{k}{2}}\,dx.$$
Use the Cauchy Schwarz inequality. This is an example of a Sobolev imbedding Theorem.
13. Let $u \in \mathcal{G}$. Then $Fu \in \mathcal{G}$ and so, in particular, it makes sense to form the integral,
$$\int_{\mathbb{R}} Fu(x', x_n)\,dx_n$$
where $(x', x_n) = x \in \mathbb{R}^n$. For $u \in \mathcal{G}$, define $\gamma u(x') \equiv u(x', 0)$. Find a constant such that $F(\gamma u)(x')$ equals this constant times the above integral. Hint: By the dominated convergence theorem
$$\int_{\mathbb{R}} Fu(x', x_n)\,dx_n = \lim_{\varepsilon\to 0} \int_{\mathbb{R}} e^{-(\varepsilon x_n)^2}\,Fu(x', x_n)\,dx_n.$$
Now use the definition of the Fourier transform and Fubini's theorem as required in order to obtain the desired relationship.
14. Recall the Fourier series of a function in $L^2(-\pi, \pi)$ converges to the function in $L^2(-\pi, \pi)$. Prove a similar theorem with $L^2(-\pi, \pi)$ replaced by $L^2(-m\pi, m\pi)$ and the functions
$$\left\{ (2\pi)^{-(1/2)}\,e^{inx} \right\}_{n \in \mathbb{Z}}$$
used in the Fourier series replaced with
$$\left\{ (2m\pi)^{-(1/2)}\,e^{i\frac{n}{m}x} \right\}_{n \in \mathbb{Z}}$$
Now suppose $f$ is a function in $L^2(\mathbb{R})$ satisfying $Ff(t) = 0$ if $|t| > m\pi$. Show that if this is so, then
$$f(x) = \frac{1}{\pi} \sum_{n \in \mathbb{Z}} f\left( \frac{-n}{m} \right) \frac{\sin(\pi(mx + n))}{mx + n}.$$
Here $m$ is a positive integer. This is sometimes called the Shannon sampling theorem. Hint: First note that since $Ff \in L^2$ and is zero off a finite interval, it follows $Ff \in L^1$. Also
$$f(t) = \frac{1}{\sqrt{2\pi}} \int_{-m\pi}^{m\pi} e^{itx}\,Ff(x)\,dx$$
and you can conclude from this that $f$ has all derivatives and they are all bounded. Thus $f$ is a very nice function. You can replace $Ff$ with its Fourier series. Then consider carefully the Fourier coefficient of $Ff$. Argue it equals $f\left( \frac{-n}{m} \right)$ or at least an appropriate constant times this. When you get this the rest will fall quickly into place if you use $Ff$ is zero off $[-m\pi, m\pi]$.
Banach Spaces
12.1 Theorems Based On Baire Category
12.1.1 Baire Category Theorem
Some examples of Banach spaces that have been discussed up to now are $\mathbb{R}^n$, $\mathbb{C}^n$, and $L^p(\Omega)$. Theorems about general Banach spaces are proved in this chapter. The main theorems to be presented here are the uniform boundedness theorem, the open mapping theorem, the closed graph theorem, and the Hahn Banach Theorem. The first three of these theorems come from the Baire category theorem which is about to be presented. They are topological in nature. The Hahn Banach theorem has nothing to do with topology. Banach spaces are all normed linear spaces and as such, they are all metric spaces because a normed linear space may be considered as a metric space with $d(x, y) \equiv \|x - y\|$. You can check that this satisfies all the axioms of a metric. As usual, if every Cauchy sequence converges, the metric space is called complete.
Definition 12.1 A complete normed linear space is called a Banach space.
The following remarkable result is called the Baire category theorem. To get an
idea of its meaning, imagine you draw a line in the plane. The complement of this
line is an open set and is dense because every point, even those on the line, are limit
points of this open set. Now draw another line. The complement of the two lines
is still open and dense. Keep drawing lines and looking at the complements of the
union of these lines. You always have an open set which is dense. Now what if there
were countably many lines? The Baire category theorem implies the complement
of the union of these lines is dense. In particular it is nonempty. Thus you cannot
write the plane as a countable union of lines. This is a rather rough description of
this very important theorem. The precise statement and proof follow.
Theorem 12.2 Let $(X, d)$ be a complete metric space and let $\{U_n\}_{n=1}^{\infty}$ be a sequence of open subsets of $X$ satisfying $\overline{U_n} = X$ ($U_n$ is dense). Then $D \equiv \cap_{n=1}^{\infty} U_n$ is a dense subset of $X$.
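Proof: Let $p \in X$ and let $r_0 > 0$. I need to show $D \cap B(p, r_0) \ne \emptyset$. Since $U_1$ is dense, there exists $p_1 \in U_1 \cap B(p, r_0)$, an open set. Let $p_1 \in B(p_1, r_1) \subseteq \overline{B(p_1, r_1)} \subseteq U_1 \cap B(p, r_0)$ and $r_1 < 2^{-1}$. This is possible because $U_1 \cap B(p, r_0)$ is an open set and so there exists $r_1$ such that $B(p_1, 2r_1) \subseteq U_1 \cap B(p, r_0)$. But
$$B(p_1, r_1) \subseteq \overline{B(p_1, r_1)} \subseteq B(p_1, 2r_1)$$
because $\overline{B(p_1, r_1)} \subseteq \{x \in X : d(x, p_1) \le r_1\}$. (Why?)
[Figure: the ball $B(p, r_0)$ of radius $r_0$ about $p$, containing the point $p_1$.]
There exists $p_2 \in U_2 \cap B(p_1, r_1)$ because $U_2$ is dense. Let
$$p_2 \in B(p_2, r_2) \subseteq \overline{B(p_2, r_2)} \subseteq U_2 \cap B(p_1, r_1) \subseteq U_1 \cap U_2 \cap B(p, r_0)$$
and let $r_2 < 2^{-2}$. Continue in this way. Thus
$$r_n < 2^{-n},$$
$$\overline{B(p_n, r_n)} \subseteq U_1 \cap U_2 \cap \dots \cap U_n \cap B(p, r_0),$$
$$\overline{B(p_n, r_n)} \subseteq B(p_{n-1}, r_{n-1}).$$
The sequence $\{p_n\}$ is a Cauchy sequence because all terms of $\{p_k\}$ for $k \ge n$ are contained in $B(p_n, r_n)$, a set whose diameter is no larger than $2r_n < 2^{1-n}$. Since $X$ is complete, there exists $p_\infty$ such that
$$\lim_{n\to\infty} p_n = p_\infty.$$
Since all but finitely many terms of $\{p_n\}$ are in $\overline{B(p_m, r_m)}$, it follows that $p_\infty \in \overline{B(p_m, r_m)}$ for each $m$. Therefore,
$$p_\infty \in \cap_{m=1}^{\infty} \overline{B(p_m, r_m)} \subseteq \cap_{i=1}^{\infty} U_i \cap B(p, r_0).$$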
The following corollary is also called the Baire category theorem.
Corollary 12.3 Let $X$ be a complete metric space and suppose $X = \cup_{i=1}^{\infty} F_i$ where each $F_i$ is a closed set. Then for some $i$, interior $F_i \ne \emptyset$.
Proof: If every $F_i$ had empty interior, then each $F_i^C$ would be a dense open set. Therefore, from Theorem 12.2, it would follow that
$$\emptyset = \left( \cup_{i=1}^{\infty} F_i \right)^C = \cap_{i=1}^{\infty} F_i^C \ne \emptyset,$$
a contradiction.
The set $D$ of Theorem 12.2 is called a $G_\delta$ set because it is the countable intersection of open sets. Thus $D$ is a dense $G_\delta$ set.
Recall that a norm satisfies:
a.) $\|x\| \ge 0$, $\|x\| = 0$ if and only if $x = 0$.
b.) $\|x + y\| \le \|x\| + \|y\|$.
c.) $\|cx\| = |c|\,\|x\|$ if $c$ is a scalar and $x \in X$.
From the definition of continuity, it follows easily that a function is continuous if
$$\lim_{n\to\infty} x_n = x$$
implies
$$\lim_{n\to\infty} f(x_n) = f(x).$$
Theorem 12.4 Let $X$ and $Y$ be two normed linear spaces and let $L : X \to Y$ be linear ($L(ax + by) = aL(x) + bL(y)$ for $a, b$ scalars and $x, y \in X$). The following are equivalent
a.) $L$ is continuous at 0
b.) $L$ is continuous
c.) There exists $K > 0$ such that $\|Lx\|_Y \le K\,\|x\|_X$ for all $x \in X$ ($L$ is bounded).
Proof: a.)$\Rightarrow$b.) Let $x_n \to x$. It is necessary to show that $Lx_n \to Lx$. But $(x_n - x) \to 0$ and so from continuity at 0, it follows
$$L(x_n - x) = Lx_n - Lx \to 0$$
so $Lx_n \to Lx$. This shows a.) implies b.).
b.)$\Rightarrow$c.) Since $L$ is continuous, $L$ is continuous at 0. Hence $\|Lx\|_Y < 1$ whenever $\|x\|_X \le \delta$ for some $\delta$. Therefore, suppressing the subscript on the $\|\cdot\|$,
$$\left\| L\left( \frac{\delta x}{\|x\|} \right) \right\| \le 1.$$
Hence
$$\|Lx\| \le \frac{1}{\delta}\,\|x\|.$$
c.)$\Rightarrow$a.) follows from the inequality given in c.).
Definition 12.5 Let $L : X \to Y$ be linear and continuous where $X$ and $Y$ are normed linear spaces. Denote the set of all such continuous linear maps by $\mathcal{L}(X, Y)$ and define
$$\|L\| \equiv \sup\{\|Lx\| : \|x\| \le 1\}. \tag{12.1}$$
This is called the operator norm.
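To make the definition concrete, here is a small Python sketch for a linear map on $\mathbb{R}^2$ with the Euclidean norm. The matrix, the helper names, and the sampling of the unit circle are assumptions made for this illustration. Since $\|Lx\|$ scales with $\|x\|$, the supremum over the unit ball is attained on the unit circle, so sampling unit vectors approximates it. For this particular matrix, $\|Ax\|^2 = 25 + 20\sin(2\theta)$ when $x = (\cos\theta, \sin\theta)$, so the exact value is $\sqrt{45}$.

```python
import math


def apply(A, x):
    """Matrix-vector product A x for a matrix given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def euclid(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))


def operator_norm_sampled(A, samples=3600):
    """Approximate sup{||Ax|| : ||x|| <= 1} by sampling unit vectors in R^2."""
    best = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        best = max(best, euclid(apply(A, [math.cos(theta), math.sin(theta)])))
    return best


A = [[3.0, 0.0], [4.0, 5.0]]          # hypothetical example matrix
norm_A = operator_norm_sampled(A)
print(norm_A, math.sqrt(45.0))        # sampled value next to the exact value

x = [2.0, -7.0]
print(euclid(apply(A, x)) <= norm_A * euclid(x) + 1e-9)  # ||Ax|| <= ||A|| ||x||
```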
Note that from Theorem 12.4 $\|L\|$ is well defined because of part c.) of that Theorem.
The next lemma follows immediately from the definition of the norm and the assumption that $L$ is linear.
Lemma 12.6 With $\|L\|$ defined in 12.1, $\mathcal{L}(X, Y)$ is a normed linear space. Also $\|Lx\| \le \|L\|\,\|x\|$.
Proof: Let $x \ne 0$. Then $x / \|x\|$ has norm equal to 1 and so
$$\left\| L\left( \frac{x}{\|x\|} \right) \right\| \le \|L\|.$$
Therefore, multiplying both sides by $\|x\|$, $\|Lx\| \le \|L\|\,\|x\|$. This is obviously a linear space. It remains to verify the operator norm really is a norm. First of all, if $\|L\| = 0$, then $Lx = 0$ for all $\|x\| \le 1$. It follows that for any $x \ne 0$, $0 = L\left( \frac{x}{\|x\|} \right)$ and so $Lx = 0$. Therefore, $L = 0$. Also, if $c$ is a scalar,
$$\|cL\| = \sup_{\|x\| \le 1} \|cL(x)\| = |c| \sup_{\|x\| \le 1} \|Lx\| = |c|\,\|L\|.$$
It remains to verify the triangle inequality. Let $L, M \in \mathcal{L}(X, Y)$.
$$\|L + M\| \equiv \sup_{\|x\| \le 1} \|(L + M)(x)\| \le \sup_{\|x\| \le 1} \left( \|Lx\| + \|Mx\| \right)$$
$$\le \sup_{\|x\| \le 1} \|Lx\| + \sup_{\|x\| \le 1} \|Mx\| = \|L\| + \|M\|.$$
This shows the operator norm is really a norm as hoped. This proves the lemma.
For example, consider the space of linear transformations defined on $\mathbb{R}^n$ having values in $\mathbb{R}^m$. The fact the transformation is linear automatically imparts continuity to it. You should give a proof of this fact. Recall that every such linear transformation can be realized in terms of matrix multiplication.
Thus, in finite dimensions the algebraic condition that an operator is linear is sufficient to imply the topological condition that the operator is continuous. The situation is not so simple in infinite dimensional spaces such as $C(X; \mathbb{R}^n)$. This explains the imposition of the topological condition of continuity as a criterion for membership in $\mathcal{L}(X, Y)$ in addition to the algebraic condition of linearity.
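Theorem 12.7 If $Y$ is a Banach space, then $\mathcal{L}(X, Y)$ is also a Banach space.
Proof: Let $\{L_n\}$ be a Cauchy sequence in $\mathcal{L}(X, Y)$ and let $x \in X$.
$$\|L_n x - L_m x\| \le \|x\|\,\|L_n - L_m\|.$$
Thus $\{L_n x\}$ is a Cauchy sequence in $Y$, which is complete. Let
$$Lx = \lim_{n\to\infty} L_n x.$$
Then, clearly, $L$ is linear because if $x_1, x_2$ are in $X$, and $a, b$ are scalars, then
$$L(ax_1 + bx_2) = \lim_{n\to\infty} L_n(ax_1 + bx_2) = \lim_{n\to\infty} (aL_n x_1 + bL_n x_2) = aLx_1 + bLx_2.$$
Also $L$ is continuous. To see this, note that $\{\|L_n\|\}$ is a Cauchy sequence of real numbers because $\left| \|L_n\| - \|L_m\| \right| \le \|L_n - L_m\|$. Hence there exists $K > \sup\{\|L_n\| : n \in \mathbb{N}\}$. Thus, if $x \in X$,
$$\|Lx\| = \lim_{n\to\infty} \|L_n x\| \le K\,\|x\|.$$
Finally, $L_n \to L$ in the operator norm. Given $\varepsilon > 0$, choose $N$ such that $\|L_n - L_m\| < \varepsilon$ whenever $n, m \ge N$. Then for $\|x\| \le 1$ and $n \ge N$,
$$\|L_n x - Lx\| = \lim_{m\to\infty} \|L_n x - L_m x\| \le \varepsilon,$$
so $\|L_n - L\| \le \varepsilon$ for all $n \ge N$.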
12.1.2 Uniform Boundedness Theorem
The next big result is sometimes called the Uniform Boundedness theorem, or the Banach-Steinhaus theorem. This is a very surprising theorem which implies that for a collection of bounded linear operators, if they are bounded pointwise, then they are also bounded uniformly. As an example of a situation in which pointwise bounded does not imply uniformly bounded, consider the functions $f_\alpha(x) \equiv \mathcal{X}_{(\alpha, 1)}(x)\,x^{-1}$ for $\alpha \in (0, 1)$. Clearly each function is bounded and the collection of functions is bounded at each point of $(0, 1)$, but there is no bound for all these functions taken together. One problem is that $(0, 1)$ is not a Banach space. Therefore, the functions cannot be linear.
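The example above can be checked numerically. The Python sketch below is only an illustration: the list of values of $\alpha$, the sample point $x$, and the evaluation point just to the right of $\alpha$ are assumptions chosen for the demonstration. At a fixed $x$ the values $f_\alpha(x)$ never exceed $1/x$, while the supremum of a single $f_\alpha$ over $(0,1)$ is $1/\alpha$, which is unbounded as $\alpha \to 0$.

```python
def f(alpha, x):
    """f_alpha(x) = (indicator of (alpha, 1))(x) / x for x in (0, 1)."""
    return 1.0 / x if alpha < x < 1.0 else 0.0


alphas = [10.0 ** (-k) for k in range(1, 7)]   # alpha = 0.1, ..., 1e-6

# Pointwise bound: at a fixed x, every f_alpha(x) is at most 1/x.
x = 0.3
pointwise_sup = max(f(a, x) for a in alphas)
print(pointwise_sup, 1.0 / x)

# No uniform bound: evaluating f_alpha just to the right of alpha gives
# values close to 1/alpha, which grow without bound as alpha shrinks.
near_sup = [f(a, 1.001 * a) for a in alphas]
print(near_sup)
```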
Theorem 12.8 Let $X$ be a Banach space and let $Y$ be a normed linear space. Let $\{L_\alpha\}_{\alpha \in \Lambda}$ be a collection of elements of $\mathcal{L}(X, Y)$. Then one of the following happens.
a.) $\sup\{\|L_\alpha\| : \alpha \in \Lambda\} < \infty$
b.) There exists a dense $G_\delta$ set, $D$, such that for all $x \in D$,
$$\sup\{\|L_\alpha x\| : \alpha \in \Lambda\} = \infty.$$
Proof: For each $n \in \mathbb{N}$, define
$$U_n = \{x \in X : \sup\{\|L_\alpha x\| : \alpha \in \Lambda\} > n\}.$$
Then $U_n$ is an open set because if $x \in U_n$, then there exists $\alpha \in \Lambda$ such that
$$\|L_\alpha x\| > n.$$
But then, since $L_\alpha$ is continuous, this situation persists for all $y$ sufficiently close to $x$, say for all $y \in B(x, \delta)$. Then $B(x, \delta) \subseteq U_n$ which shows $U_n$ is open.
Case b.) is obtained from Theorem 12.2 if each $U_n$ is dense.
The other case is that for some $n$, $U_n$ is not dense. If this occurs, there exists $x_0$ and $r > 0$ such that for all $x \in B(x_0, r)$, $\|L_\alpha x\| \le n$ for all $\alpha$. Now if $y \in B(0, r)$, $x_0 + y \in B(x_0, r)$. Consequently, for all such $y$, $\|L_\alpha(x_0 + y)\| \le n$. This implies that for all $\alpha \in \Lambda$ and $\|y\| < r$,
$$\|L_\alpha y\| \le n + \|L_\alpha(x_0)\| \le 2n.$$
Therefore, if $\|y\| \le 1$, $\left\| \frac{r}{2} y \right\| < r$ and so for all $\alpha$,
$$\left\| L_\alpha\left( \frac{r}{2} y \right) \right\| \le 2n.$$
Now multiplying by $2/r$ it follows that whenever $\|y\| \le 1$, $\|L_\alpha(y)\| \le 4n/r$. Hence case a.) holds.
12.1.3 Open Mapping Theorem
Another remarkable theorem which depends on the Baire category theorem is the
open mapping theorem. Unlike Theorem 12.8 it requires both X and Y to be
Banach spaces.
Theorem 12.9 Let $X$ and $Y$ be Banach spaces, let $L \in \mathcal{L}(X, Y)$, and suppose $L$ is onto. Then $L$ maps open sets onto open sets.
To aid in the proof, here is a lemma.
Lemma 12.10 Let $a$ and $b$ be positive constants and suppose
$$B(0, a) \subseteq \overline{L(B(0, b))}.$$
Then
$$\overline{L(B(0, b))} \subseteq L(B(0, 2b)).$$
Proof of Lemma 12.10: Let $y \in \overline{L(B(0, b))}$. There exists $x_1 \in B(0, b)$ such that $\|y - Lx_1\| < \frac{a}{2}$. Now this implies
$$2y - 2Lx_1 \in B(0, a) \subseteq \overline{L(B(0, b))}.$$
Thus $2y - 2Lx_1 \in \overline{L(B(0, b))}$ just like $y$ was. Therefore, there exists $x_2 \in B(0, b)$ such that $\|2y - 2Lx_1 - Lx_2\| < a/2$. Hence $\|4y - 4Lx_1 - 2Lx_2\| < a$, and there exists $x_3 \in B(0, b)$ such that $\|4y - 4Lx_1 - 2Lx_2 - Lx_3\| < a/2$. Continuing in this way, there exist $x_1, x_2, x_3, x_4, \dots$ in $B(0, b)$ such that
$$\left\| 2^n y - \sum_{i=1}^{n} 2^{n-(i-1)} L(x_i) \right\| < a$$
which implies
$$\left\| y - \sum_{i=1}^{n} 2^{-(i-1)} L(x_i) \right\| = \left\| y - L\left( \sum_{i=1}^{n} 2^{-(i-1)} (x_i) \right) \right\| < 2^{-n} a. \tag{12.2}$$
Now consider the partial sums of the series $\sum_{i=1}^{\infty} 2^{-(i-1)} x_i$.
$$\left\| \sum_{i=m}^{n} 2^{-(i-1)} x_i \right\| \le b \sum_{i=m}^{\infty} 2^{-(i-1)} = b\,2^{-m+2}.$$
Therefore, these partial sums form a Cauchy sequence and so since $X$ is complete, there exists $x = \sum_{i=1}^{\infty} 2^{-(i-1)} x_i$. Letting $n \to \infty$ in 12.2 yields $\|y - Lx\| = 0$. Now
$$\|x\| = \lim_{n\to\infty} \left\| \sum_{i=1}^{n} 2^{-(i-1)} x_i \right\| \le \lim_{n\to\infty} \sum_{i=1}^{n} 2^{-(i-1)} \|x_i\| < \lim_{n\to\infty} \sum_{i=1}^{n} 2^{-(i-1)} b = 2b.$$
Proof of Theorem 12.9: $Y = \cup_{n=1}^{\infty} \overline{L(B(0, n))}$. By Corollary 12.3, the set $\overline{L(B(0, n_0))}$ has nonempty interior for some $n_0$. Thus $B(y, r) \subseteq \overline{L(B(0, n_0))}$ for some $y$ and some $r > 0$. Since $L$ is linear $B(-y, r) \subseteq \overline{L(B(0, n_0))}$ also. Here is why. If $z \in B(-y, r)$, then $-z \in B(y, r)$ and so there exists $x_n \in B(0, n_0)$ such that $Lx_n \to -z$. Therefore, $L(-x_n) \to z$ and $-x_n \in B(0, n_0)$ also. Therefore $z \in \overline{L(B(0, n_0))}$. Then it follows that
$$B(0, r) \subseteq B(y, r) + B(-y, r) \equiv \{y_1 + y_2 : y_1 \in B(y, r) \text{ and } y_2 \in B(-y, r)\} \subseteq \overline{L(B(0, 2n_0))}.$$
The reason for the last inclusion is that from the above, if $y_1 \in B(y, r)$ and $y_2 \in B(-y, r)$, there exist $x_n, z_n \in B(0, n_0)$ such that
$$Lx_n \to y_1, \qquad Lz_n \to y_2.$$
Therefore,
$$\|x_n + z_n\| \le 2n_0$$
and so $(y_1 + y_2) \in \overline{L(B(0, 2n_0))}$.
By Lemma 12.10, $\overline{L(B(0, 2n_0))} \subseteq L(B(0, 4n_0))$ which shows
$$B(0, r) \subseteq L(B(0, 4n_0)).$$
Letting $a = r(4n_0)^{-1}$, it follows, since $L$ is linear, that $B(0, a) \subseteq L(B(0, 1))$. It follows since $L$ is linear,
$$L(B(0, r)) \supseteq B(0, ar). \tag{12.3}$$
Now let $U$ be open in $X$ and let $x + B(0, r) = B(x, r) \subseteq U$. Using 12.3,
$$L(U) \supseteq L(x + B(0, r)) = Lx + L(B(0, r)) \supseteq Lx + B(0, ar) = B(Lx, ar).$$
Hence
$$Lx \in B(Lx, ar) \subseteq L(U),$$
which shows that every point, $Lx \in LU$, is an interior point of $LU$ and so $LU$ is open. This proves the theorem.
This theorem is surprising because it implies that if $|\cdot|$ and $\|\cdot\|$ are two norms with respect to which a vector space $X$ is a Banach space such that $|\cdot| \le K\,\|\cdot\|$, then there exists a constant $k$, such that $\|\cdot\| \le k\,|\cdot|$. This can be useful because sometimes it is not clear how to compute $k$ when all that is needed is its existence. To see the open mapping theorem implies this, consider the identity map $\mathrm{id}\,x = x$. Then $\mathrm{id} : (X, \|\cdot\|) \to (X, |\cdot|)$ is continuous and onto. Hence $\mathrm{id}$ is an open map which implies $\mathrm{id}^{-1}$ is continuous. Theorem 12.4 gives the existence of the constant $k$.
12.1.4 Closed Graph Theorem
Definition 12.11 Let $f : D \to E$. The set of all ordered pairs of the form $\{(x, f(x)) : x \in D\}$ is called the graph of $f$.
Definition 12.12 If $X$ and $Y$ are normed linear spaces, make $X \times Y$ into a normed linear space by using the norm $\|(x, y)\| = \max(\|x\|, \|y\|)$ along with component-wise addition and scalar multiplication. Thus $a(x, y) + b(z, w) \equiv (ax + bz, ay + bw)$.
There are other ways to give a norm for $X \times Y$. For example, you could define $\|(x, y)\| = \|x\| + \|y\|$.
Lemma 12.13 The norm defined in Definition 12.12 on $X \times Y$ along with the definition of addition and scalar multiplication given there make $X \times Y$ into a normed linear space.
Proof: The only axiom for a norm which is not obvious is the triangle inequality. Therefore, consider
$$\|(x_1, y_1) + (x_2, y_2)\| = \|(x_1 + x_2, y_1 + y_2)\|$$
$$= \max\left( \|x_1 + x_2\|, \|y_1 + y_2\| \right)$$
$$\le \max\left( \|x_1\| + \|x_2\|, \|y_1\| + \|y_2\| \right)$$
$$\le \max\left( \|x_1\|, \|y_1\| \right) + \max\left( \|x_2\|, \|y_2\| \right)$$
$$= \|(x_1, y_1)\| + \|(x_2, y_2)\|.$$
It is obvious $X \times Y$ is a vector space from the above definition. This proves the lemma.
Lemma 12.14 If $X$ and $Y$ are Banach spaces, then $X \times Y$ with the norm and vector space operations defined in Definition 12.12 is also a Banach space.
Proof: The only thing left to check is that the space is complete. But this follows from the simple observation that $\{(x_n, y_n)\}$ is a Cauchy sequence in $X \times Y$ if and only if $\{x_n\}$ and $\{y_n\}$ are Cauchy sequences in $X$ and $Y$ respectively. Thus if $\{(x_n, y_n)\}$ is a Cauchy sequence in $X \times Y$, it follows there exist $x$ and $y$ such that $x_n \to x$ and $y_n \to y$. But then from the definition of the norm, $(x_n, y_n) \to (x, y)$.
Lemma 12.15 Every closed subspace of a Banach space is a Banach space.
Proof: If $F \subseteq X$ where $X$ is a Banach space and $\{x_n\}$ is a Cauchy sequence in $F$, then since $X$ is complete, there exists a unique $x \in X$ such that $x_n \to x$. However this means $x \in \overline{F} = F$ since $F$ is closed.
Definition 12.16 Let $X$ and $Y$ be Banach spaces and let $D \subseteq X$ be a subspace. A linear map $L : D \to Y$ is said to be closed if its graph is a closed subspace of $X \times Y$. Equivalently, $L$ is closed if $x_n \to x$ and $Lx_n \to y$ implies $x \in D$ and $y = Lx$.
Note the distinction between closed and continuous. If the operator is closed the assertion that $y = Lx$ only follows if it is known that the sequence $\{Lx_n\}$ converges. In the case of a continuous operator, the convergence of $\{Lx_n\}$ follows from the assumption that $x_n \to x$. It is not always the case that a mapping which is closed is necessarily continuous. Consider the function $f(x) = \tan(x)$ if $x$ is not an odd multiple of $\frac{\pi}{2}$ and $f(x) \equiv 0$ at every odd multiple of $\frac{\pi}{2}$. Then the graph is closed and the function is defined on $\mathbb{R}$ but it clearly fails to be continuous. Of course this function is not linear. You could also consider the map,
$$\frac{d}{dx} : \left\{ y \in C^1([0, 1]) : y(0) = 0 \right\} \equiv D \to C([0, 1]),$$
where the norm is the uniform norm on $C([0, 1])$, $\|y\|_\infty$. If $y \in D$, then
$$y(x) = \int_0^x y'(t)\,dt.$$
Therefore, if $\frac{dy_n}{dx} \to f \in C([0, 1])$ and if $y_n \to y$ in $C([0, 1])$ it follows that, passing to the limit on both sides of
$$y_n(x) = \int_0^x \frac{dy_n(t)}{dx}\,dt,$$
one obtains
$$y(x) = \int_0^x f(t)\,dt$$
and so by the fundamental theorem of calculus $f(x) = y'(x)$ and so the mapping is closed. It is obviously not continuous because it takes $y(x)$ and $y(x) + \frac{1}{n}\sin(nx)$ to two functions which are far from each other even though these two functions are very close in $C([0, 1])$. Furthermore, it is not defined on the whole space, $C([0, 1])$.
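The failure of continuity for $d/dx$ can be seen numerically. The following Python sketch uses the perturbation $y_n(x) = \frac{1}{n}\sin(nx)$ from the discussion above; the grid-based approximation of the uniform norm and the function names are assumptions made for the illustration. The ratio $\|y_n'\|_\infty / \|y_n\|_\infty$ is at least $n$, so no constant $K$ can satisfy $\|y'\|_\infty \le K\,\|y\|_\infty$ on $D$.

```python
import math


def sup_norm(g, points=10001):
    """Approximate the uniform norm on [0, 1] by sampling an evenly spaced grid."""
    return max(abs(g(k / (points - 1))) for k in range(points))


def derivative_ratio(n):
    """Ratio ||y_n'|| / ||y_n|| in the uniform norm for y_n(x) = sin(n x) / n."""
    def y(x):
        return math.sin(n * x) / n

    def dy(x):
        return math.cos(n * x)   # exact derivative of y

    return sup_norm(dy) / sup_norm(y)


for n in (5, 50, 500):
    print(n, derivative_ratio(n))   # the ratio grows roughly like n
```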
The next theorem, the closed graph theorem, gives conditions under which closed
implies continuous.
Theorem 12.17 Let $X$ and $Y$ be Banach spaces and suppose $L : X \to Y$ is closed and linear. Then $L$ is continuous.
Proof: Let $G$ be the graph of $L$. $G = \{(x, Lx) : x \in X\}$. By Lemma 12.15 it follows that $G$ is a Banach space. Define $P : G \to X$ by $P(x, Lx) = x$. $P$ maps the Banach space $G$ onto the Banach space $X$ and is continuous and linear. By the open mapping theorem, $P$ maps open sets onto open sets. Since $P$ is also one to one, this says that $P^{-1}$ is continuous. Thus $\|P^{-1}x\| \le K\,\|x\|$. Hence
$$\|Lx\| \le \max(\|x\|, \|Lx\|) \le K\,\|x\|.$$
By Theorem 12.4, this shows $L$ is continuous and proves the theorem.
The following corollary is quite useful. It shows how to obtain a new norm on
the domain of a closed operator such that the domain with this new norm becomes
a Banach space.
Corollary 12.18 Let $L : D \subseteq X \to Y$ where $X, Y$ are Banach spaces, and $L$ is a closed operator. Then define a new norm on $D$ by
$$\|x\|_D \equiv \|x\|_X + \|Lx\|_Y.$$
Then $D$ with this new norm is a Banach space.
Proof: If $\{x_n\}$ is a Cauchy sequence in $D$ with this new norm, it follows both $\{x_n\}$ and $\{Lx_n\}$ are Cauchy sequences and therefore, they converge. Since $L$ is closed, $x_n \to x$ and $Lx_n \to Lx$ for some $x \in D$. Thus $\|x_n - x\|_D \to 0$.
12.2 Hahn Banach Theorem
The closed graph, open mapping, and uniform boundedness theorems are the three
major topological theorems in functional analysis. The other major theorem is the
Hahn-Banach theorem which has nothing to do with topology. Before presenting
this theorem, here are some preliminaries about partially ordered sets.
Definition 12.19 Let $\mathcal{F}$ be a nonempty set. $\mathcal{F}$ is called a partially ordered set if there is a relation, denoted here by $\le$, such that
$x \le x$ for all $x \in \mathcal{F}$.
If $x \le y$ and $y \le z$ then $x \le z$.
$\mathcal{C} \subseteq \mathcal{F}$ is said to be a chain if every two elements of $\mathcal{C}$ are related. This means that if $x, y \in \mathcal{C}$, then either $x \le y$ or $y \le x$. Sometimes a chain is called a totally ordered set. $\mathcal{C}$ is said to be a maximal chain if whenever $\mathcal{D}$ is a chain containing $\mathcal{C}$, $\mathcal{D} = \mathcal{C}$.
The most common example of a partially ordered set is the power set of a given set with $\subseteq$ being the relation. It is also helpful to visualize partially ordered sets as trees. Two points on the tree are related if they are on the same branch of the tree and one is higher than the other. Thus two points on different branches would not be related although they might both be larger than some point on the trunk. You might think of many other things which are best considered as partially ordered sets. Think of food for example. You might find it difficult to determine which of two favorite pies you like better although you may be able to say very easily that you would prefer either pie to a dish of lard topped with whipped cream and mustard. The following theorem is equivalent to the axiom of choice. For a discussion of this, see the appendix on the subject.
Theorem 12.20 (Hausdorff Maximal Principle) Let $\mathcal{F}$ be a nonempty partially
ordered set. Then there exists a maximal chain.
Definition 12.21 Let X be a real vector space. $\rho : X \to \mathbb{R}$ is called a gauge function
if
$$\rho(x + y) \le \rho(x) + \rho(y),$$
$$\rho(ax) = a\rho(x) \text{ if } a \ge 0. \tag{12.4}$$
Suppose M is a subspace of X and $z \notin M$. Suppose also that f is a linear
real-valued function having the property that $f(x) \le \rho(x)$ for all $x \in M$. Consider
the problem of extending f to $M \oplus \mathbb{R}z$ such that if F is the extended function,
$F(y) \le \rho(y)$ for all $y \in M \oplus \mathbb{R}z$ and F is linear. Since F is to be linear, it suffices
to determine how to define F(z). Letting $a > 0$, it is required to define F(z) such
that the following hold for all $x, y \in M$.
$$\overbrace{F(x)}^{f(x)} + aF(z) = F(x + az) \le \rho(x + az),$$
$$\overbrace{F(y)}^{f(y)} - aF(z) = F(y - az) \le \rho(y - az). \tag{12.5}$$
Now if these inequalities hold for all y/a, they hold for all y because M is given to
be a subspace. Therefore, multiplying by $a^{-1}$, 12.4 implies that what is needed is
to choose F(z) such that for all $x, y \in M$,
$$f(x) + F(z) \le \rho(x + z), \quad f(y) - \rho(y - z) \le F(z)$$
and that if F(z) can be chosen in this way, this will satisfy 12.5 for all x, y and the
problem of extending f will be solved. Hence it is necessary to choose F(z) such
that for all $x, y \in M$
$$f(y) - \rho(y - z) \le F(z) \le \rho(x + z) - f(x). \tag{12.6}$$
Is there any such number between $f(y) - \rho(y - z)$ and $\rho(x + z) - f(x)$ for every
pair $x, y \in M$? This is where $f(x) \le \rho(x)$ on M and that f is linear is used.
For $x, y \in M$,
$$\rho(x + z) - f(x) - [f(y) - \rho(y - z)]$$
$$= \rho(x + z) + \rho(y - z) - (f(x) + f(y))$$
$$\ge \rho(x + y) - f(x + y) \ge 0.$$
Therefore there exists a number between
$$\sup\{f(y) - \rho(y - z) : y \in M\}$$
and
$$\inf\{\rho(x + z) - f(x) : x \in M\}.$$
Choose F(z) to satisfy 12.6. This has proved the following lemma.
Lemma 12.22 Let M be a subspace of X, a real linear space, and let $\rho$ be a gauge
function on X. Suppose $f : M \to \mathbb{R}$ is linear, $z \notin M$, and $f(x) \le \rho(x)$ for all
$x \in M$. Then f can be extended to $M \oplus \mathbb{R}z$ such that, if F is the extended function,
F is linear and $F(x) \le \rho(x)$ for all $x \in M \oplus \mathbb{R}z$.
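The interval of admissible values for F(z) in 12.6 can be made concrete with a small computation. The sketch below is only an illustration under assumed choices that do not come from the notes: X is the plane, the gauge function is the $\ell^1$ norm, M is the first coordinate axis with f(t, 0) = t, and z = (0, 1). The bounds are taken over a finite integer sample of M, so they only suggest the true supremum and infimum.

```python
# Sketch of the one-step extension in Lemma 12.22 on X = R^2.
# Assumed example: rho is the l^1 norm, M = {(t, 0)}, f(t, 0) = t, z = (0, 1).

def rho(v):
    """Gauge function: the l^1 norm on R^2."""
    return abs(v[0]) + abs(v[1])

def f(t):
    """The linear functional on M, written in the coordinate t of (t, 0)."""
    return t

z = (0, 1)
grid = range(-50, 51)  # integer samples of M, so the arithmetic is exact

# Left side of 12.6: sup over y in M of f(y) - rho(y - z).
lower = max(f(t) - rho((t - z[0], 0 - z[1])) for t in grid)
# Right side of 12.6: inf over x in M of rho(x + z) - f(x).
upper = min(rho((t + z[0], 0 + z[1])) - f(t) for t in grid)

def extend(Fz):
    """Return F(t, s) = f(t) + s * Fz, the linear extension to M + Rz."""
    return lambda v: f(v[0]) + v[1] * Fz

print(lower, upper)  # any F(z) between these values is admissible on this sample
```

In this example the interval is not a single point, which illustrates why the extension produced by the lemma need not be unique.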
With this lemma, the Hahn Banach theorem can be proved.
Theorem 12.23 (Hahn Banach theorem) Let X be a real vector space, let M be a
subspace of X, let $f : M \to \mathbb{R}$ be linear, let $\rho$ be a gauge function on X, and suppose
$f(x) \le \rho(x)$ for all $x \in M$. Then there exists a linear function, $F : X \to \mathbb{R}$, such
that
a.) $F(x) = f(x)$ for all $x \in M$
b.) $F(x) \le \rho(x)$ for all $x \in X$.
Proof: Let $\mathcal{F} = \{(V, g) : V \supseteq M$, V is a subspace of X, $g : V \to \mathbb{R}$ is linear,
$g(x) = f(x)$ for all $x \in M$, and $g(x) \le \rho(x)$ for $x \in V\}$. Then $(M, f) \in \mathcal{F}$ so $\mathcal{F} \ne \emptyset$.
Define a partial order by the following rule.
$$(V, g) \le (W, h)$$
means
$$V \subseteq W \text{ and } h(x) = g(x) \text{ if } x \in V.$$
By Theorem 12.20, there exists a maximal chain, $\mathcal{C} \subseteq \mathcal{F}$. Let $Y = \cup\{V : (V, g) \in \mathcal{C}\}$
and let $h : Y \to \mathbb{R}$ be defined by $h(x) = g(x)$ where $x \in V$ and $(V, g) \in \mathcal{C}$. This
is well defined because if $x \in V_1$ and $x \in V_2$ where $(V_1, g_1)$ and $(V_2, g_2)$ are both in the
chain, then since $\mathcal{C}$ is a chain, the two elements are related. Therefore, $g_1(x) = g_2(x)$.
Also h is linear because if $x, y \in Y$ and a, b are scalars, then $x \in V_1$ and $y \in V_2$ where $(V_1, g_1)$
and $(V_2, g_2)$ are elements of $\mathcal{C}$. Therefore, letting V denote the larger of the two $V_i$,
and g be the function that goes with V, it follows $ax + by \in V$ where $(V, g) \in \mathcal{C}$.
Therefore,
$$h(ax + by) = g(ax + by) = a g(x) + b g(y) = a h(x) + b h(y).$$
Also, $h(x) = g(x) \le \rho(x)$ for any $x \in Y$ because for such x, $x \in V$ where $(V, g) \in \mathcal{C}$.
Is Y = X? If not, there exists $z \in X \setminus Y$ and there exists an extension of h to
$Y \oplus \mathbb{R}z$ using Lemma 12.22. Letting $\overline{h}$ denote this extended function contradicts
the maximality of $\mathcal{C}$. Indeed, $\mathcal{C} \cup \{(Y \oplus \mathbb{R}z, \overline{h})\}$ would be a longer chain. This
proves the Hahn Banach theorem.
This is the original version of the theorem. There is also a version of this theorem
for complex vector spaces which is based on a trick.
Corollary 12.24 (Hahn Banach) Let M be a subspace of a complex normed linear
space, X, and suppose $f : M \to \mathbb{C}$ is linear and satisfies $|f(x)| \le K\|x\|$ for all
$x \in M$. Then there exists a linear function, F, defined on all of X such that
$F(x) = f(x)$ for all $x \in M$ and $|F(x)| \le K\|x\|$ for all x.
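Proof: First note $f(x) = \operatorname{Re} f(x) + i \operatorname{Im} f(x)$ and so
$$\operatorname{Re} f(ix) + i \operatorname{Im} f(ix) = f(ix) = i f(x) = i \operatorname{Re} f(x) - \operatorname{Im} f(x).$$
Therefore, $\operatorname{Im} f(x) = -\operatorname{Re} f(ix)$, and
$$f(x) = \operatorname{Re} f(x) - i \operatorname{Re} f(ix).$$
This is important because it shows it is only necessary to consider Re f in understanding
f. Now it happens that Re f is linear with respect to real scalars so the
above version of the Hahn Banach theorem applies. This is shown next.
If c is a real scalar
$$\operatorname{Re} f(cx) - i \operatorname{Re} f(icx) = c f(x) = c \operatorname{Re} f(x) - i c \operatorname{Re} f(ix).$$
Thus $\operatorname{Re} f(cx) = c \operatorname{Re} f(x)$. Also,
$$\operatorname{Re} f(x + y) - i \operatorname{Re} f(i(x + y)) = f(x + y) = f(x) + f(y)$$
$$= \operatorname{Re} f(x) - i \operatorname{Re} f(ix) + \operatorname{Re} f(y) - i \operatorname{Re} f(iy).$$
Equating real parts, $\operatorname{Re} f(x + y) = \operatorname{Re} f(x) + \operatorname{Re} f(y)$. Thus Re f is linear with
respect to real scalars as hoped.
Consider X as a real vector space and let $\rho(x) \equiv K\|x\|$. Then for all $x \in M$,
$$|\operatorname{Re} f(x)| \le |f(x)| \le K\|x\| = \rho(x).$$
From Theorem 12.23, Re f may be extended to a function, h which satisfies
$$h(ax + by) = a h(x) + b h(y) \text{ if } a, b \in \mathbb{R},$$
$$h(x) \le K\|x\| \text{ for all } x \in X.$$
Actually, $|h(x)| \le K\|x\|$. The reason for this is that $h(-x) = -h(x) \le K\|-x\| = K\|x\|$
and therefore, $h(x) \ge -K\|x\|$. Let
$$F(x) \equiv h(x) - i h(ix).$$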
By arguments similar to the above, F is linear.
$$F(ix) = h(ix) - i h(-x) = i h(x) + h(ix) = i(h(x) - i h(ix)) = i F(x).$$
If c is a real scalar,
$$F(cx) = h(cx) - i h(icx) = c h(x) - c i h(ix) = c F(x).$$
Now
$$F(x + y) = h(x + y) - i h(i(x + y)) = h(x) + h(y) - i h(ix) - i h(iy) = F(x) + F(y).$$
Thus
$$F((a + ib)x) = F(ax) + F(ibx) = a F(x) + i b F(x) = (a + ib) F(x).$$
This shows F is linear as claimed.
Now $wF(x) = |F(x)|$ for some $|w| = 1$. Therefore
$$|F(x)| = wF(x) = h(wx) - \overbrace{i h(iwx)}^{\text{must equal zero}} = h(wx) = |h(wx)| \le K\|wx\| = K\|x\|.$$
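The identity $f(x) = \operatorname{Re} f(x) - i \operatorname{Re} f(ix)$ is what lets the real theorem carry the complex case. A small check on $\mathbb{C}^2$ may help; the coefficients and the test vector below are arbitrary assumptions chosen for illustration, not data from the notes.

```python
# Check f(x) = Re f(x) - i Re f(ix) for a complex linear functional on C^2.
# The coefficients below are an arbitrary assumed example.

COEFFS = (2 + 1j, -1 + 3j)

def f(x):
    """Complex linear functional f(x) = sum_k c_k x_k."""
    return sum(c * xk for c, xk in zip(COEFFS, x))

def rebuild(x):
    """Recover f(x) using only the real part of f."""
    ix = tuple(1j * xk for xk in x)
    return f(x).real - 1j * f(ix).real

x = (1 - 2j, 0.5 + 4j)
print(f(x), rebuild(x))  # the two values should agree
```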
Definition 12.25 Let X be a Banach space. Denote by $X'$ the space of continuous
linear functions which map X to the field of scalars. Thus $X' = \mathcal{L}(X, \mathbb{F})$. By
Theorem 12.7 on Page 302, $X'$ is a Banach space. Remember with the norm defined
on $\mathcal{L}(X, \mathbb{F})$,
$$\|f\| = \sup\{|f(x)| : \|x\| \le 1\}.$$
$X'$ is called the dual space.
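Definition 12.26 Let X and Y be Banach spaces and suppose $L \in \mathcal{L}(X, Y)$. Then
define the adjoint map in $\mathcal{L}(Y', X')$, denoted by $L^*$, by
$$L^* y^*(x) \equiv y^*(Lx)$$
for all $y^* \in Y'$.
The following diagram is a good one to help remember this definition.
$$\begin{array}{ccc} X' & \overset{L^*}{\longleftarrow} & Y' \\ X & \underset{L}{\longrightarrow} & Y \end{array}$$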
This is a generalization of the adjoint of a linear transformation on an inner
product space. Recall
$$(Ax, y) = (x, A^* y).$$
What is being done here is to generalize this algebraic concept to arbitrary Banach
spaces. There are some issues which need to be discussed relative to the above
definition. First of all, it must be shown that $L^* y^* \in X'$. Also, it will be useful to
have the following lemma which is a useful application of the Hahn Banach theorem.
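Lemma 12.27 Let X be a normed linear space and let $x \in X$. Then there exists
$x^* \in X'$ such that $\|x^*\| = 1$ and $x^*(x) = \|x\|$.
Proof: Let $f : \mathbb{F}x \to \mathbb{F}$ be defined by $f(\alpha x) = \alpha\|x\|$. Then for $y = \alpha x \in \mathbb{F}x$,
$$|f(y)| = |f(\alpha x)| = |\alpha| \|x\| = \|y\|.$$
By the Hahn Banach theorem, there exists $x^* \in X'$ such that $x^*(\alpha x) = f(\alpha x)$ and
$\|x^*\| \le 1$. Since $x^*(x) = \|x\|$ it follows that $\|x^*\| = 1$ because
$$\|x^*\| \ge \left| x^*\left( \frac{x}{\|x\|} \right) \right| = \frac{\|x\|}{\|x\|} = 1.$$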
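In a concrete space the functional of Lemma 12.27 can often be written down directly, without the Hahn Banach theorem. The sketch below assumes $X = \mathbb{R}^n$ with the $\ell^1$ norm, where the norm of the functional $y \mapsto \sum_k s_k y_k$ is $\max_k |s_k|$; the function names are hypothetical and chosen only for this illustration.

```python
# Norming functional from Lemma 12.27 for X = R^n with the l^1 norm (assumed example).
# The dual norm of y -> sum_k s_k y_k is then max_k |s_k|.

def norm1(x):
    return sum(abs(xk) for xk in x)

def sign(t):
    return (t > 0) - (t < 0)

def norming_functional(x):
    """Return the coefficient vector s of x*(y) = sum_k s_k y_k."""
    s = [sign(xk) for xk in x]
    if all(sk == 0 for sk in s):
        # x = 0: any functional of norm one works; pick the first coordinate
        s[0] = 1
    return s

def apply(s, y):
    return sum(sk * yk for sk, yk in zip(s, y))

x = [3, -4, 0, 2]
s = norming_functional(x)
dual_norm = max(abs(sk) for sk in s)
print(apply(s, x), norm1(x), dual_norm)
```

Here the choice of s is not unique when some coordinate of x vanishes, which matches the lack of uniqueness in Hahn Banach extensions.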
Theorem 12.28 Let $L \in \mathcal{L}(X, Y)$ where X and Y are Banach spaces. Then
a.) $L^* \in \mathcal{L}(Y', X')$ as claimed and $\|L^*\| = \|L\|$.
b.) If L maps one to one onto a closed subspace of Y, then $L^*$ is onto.
c.) If L maps onto a dense subset of Y, then $L^*$ is one to one.
Proof: It is routine to verify $L^* y^*$ and $L^*$ are both linear. This follows immediately
from the definition. As usual, the interesting thing concerns continuity.
$$\|L^* y^*\| = \sup_{\|x\| \le 1} |L^* y^*(x)| = \sup_{\|x\| \le 1} |y^*(Lx)| \le \|y^*\| \|L\|.$$
Thus $L^*$ is continuous as claimed and $\|L^*\| \le \|L\|$.
By Lemma 12.27, there exists $y_x^* \in Y'$ such that $\|y_x^*\| = 1$ and $y_x^*(Lx) = \|Lx\|$. Therefore,
$$\|L^*\| = \sup_{\|y^*\| \le 1} \|L^* y^*\| = \sup_{\|y^*\| \le 1} \sup_{\|x\| \le 1} |L^* y^*(x)|$$
$$= \sup_{\|y^*\| \le 1} \sup_{\|x\| \le 1} |y^*(Lx)| = \sup_{\|x\| \le 1} \sup_{\|y^*\| \le 1} |y^*(Lx)| \ge \sup_{\|x\| \le 1} |y_x^*(Lx)| = \sup_{\|x\| \le 1} \|Lx\| = \|L\|$$
showing that $\|L^*\| \ge \|L\|$ and this shows part a.).
If L is one to one and onto a closed subset of Y, then L(X), being a closed
subspace of a Banach space, is itself a Banach space and so the open mapping
theorem implies $L^{-1} : L(X) \to X$ is continuous. Hence
$$\|x\| = \|L^{-1} L x\| \le \|L^{-1}\| \|Lx\|.$$
Now let $x^* \in X'$ be given. Define $f \in \mathcal{L}(L(X), \mathbb{C})$ by $f(Lx) = x^*(x)$. The function,
f is well defined because if $Lx_1 = Lx_2$, then since L is one to one, it follows $x_1 = x_2$
and so $f(L(x_1)) = x^*(x_1) = x^*(x_2) = f(L(x_2))$. Also, f is linear because
$$f(aL(x_1) + bL(x_2)) = f(L(ax_1 + bx_2)) \equiv x^*(ax_1 + bx_2)$$
$$= a x^*(x_1) + b x^*(x_2) = a f(L(x_1)) + b f(L(x_2)).$$
In addition to this,
$$|f(Lx)| = |x^*(x)| \le \|x^*\| \|x\| \le \|x^*\| \|L^{-1}\| \|Lx\|$$
and so the norm of f on L(X) is no larger than $\|x^*\| \|L^{-1}\|$. By the Hahn Banach
theorem, there exists an extension of f to an element $y^* \in Y'$ such that $\|y^*\| \le \|x^*\| \|L^{-1}\|$. Then
$$L^* y^*(x) = y^*(Lx) = f(Lx) = x^*(x)$$
so $L^* y^* = x^*$ because this holds for all x. Since $x^*$ was arbitrary, this shows $L^*$ is
onto and proves b.).
Consider the last assertion. Suppose $L^* y^* = 0$. Is $y^* = 0$? In other words
is $y^*(y) = 0$ for all $y \in Y$? Pick $y \in Y$. Since L(X) is dense in Y, there exists
a sequence, $\{Lx_n\}$ such that $Lx_n \to y$. But then by continuity of $y^*$,
$$y^*(y) = \lim_{n \to \infty} y^*(Lx_n) = \lim_{n \to \infty} L^* y^*(x_n) = 0.$$
Since $y^*(y) = 0$ for all y, this implies $y^* = 0$ and so $L^*$ is one to one.
Corollary 12.29 Suppose X and Y are Banach spaces, $L \in \mathcal{L}(X, Y)$, and L is one
to one and onto. Then $L^*$ is also one to one and onto.
There exists a natural mapping, called the James map from a normed linear
space, X, to the dual of the dual space which is described in the following definition.
Definition 12.30 Define $J : X \to X''$ by $J(x)(x^*) = x^*(x)$.
Theorem 12.31 The map, J, has the following properties.
a.) J is one to one and linear.
b.) $\|Jx\| = \|x\|$ and $\|J\| = 1$.
c.) J(X) is a closed subspace of $X''$ if X is complete.
Also if $x^* \in X'$,
$$\|x^*\| = \sup\{|x^{**}(x^*)| : \|x^{**}\| \le 1, \ x^{**} \in X''\}.$$
Proof:
$$J(ax + by)(x^*) \equiv x^*(ax + by) = a x^*(x) + b x^*(y) = (aJ(x) + bJ(y))(x^*).$$
Since this holds for all $x^* \in X'$, it follows that
$$J(ax + by) = aJ(x) + bJ(y)$$
and so J is linear. If Jx = 0, then by Lemma 12.27 there exists $x^*$ such that
$x^*(x) = \|x\|$ and $\|x^*\| = 1$. Then
$$0 = J(x)(x^*) = x^*(x) = \|x\|.$$
This shows a.).
To show b.), let $x \in X$ and use Lemma 12.27 to obtain $x^* \in X'$ such that
$x^*(x) = \|x\|$ with $\|x^*\| = 1$. Then
$$\|x\| \ge \sup\{|y^*(x)| : \|y^*\| \le 1\} = \sup\{|J(x)(y^*)| : \|y^*\| \le 1\} = \|Jx\|$$
$$\ge |J(x)(x^*)| = |x^*(x)| = \|x\|.$$
Therefore, $\|Jx\| = \|x\|$ as claimed. Therefore,
$$\|J\| = \sup\{\|Jx\| : \|x\| \le 1\} = \sup\{\|x\| : \|x\| \le 1\} = 1.$$
This shows b.).
To verify c.), use b.). If $Jx_n \to y^{**} \in X''$ then by b.), $\{x_n\}$ is a Cauchy sequence
converging to some $x \in X$ because
$$\|x_n - x_m\| = \|Jx_n - Jx_m\|$$
and $\{Jx_n\}$ is a Cauchy sequence. Then $Jx = \lim_{n \to \infty} Jx_n = y^{**}$.
Finally, to show the assertion about the norm of $x^*$, use what was just shown
applied to the James map from $X'$ to $X'''$ still referred to as J.
$$\|x^*\| = \sup\{|x^*(x)| : \|x\| \le 1\} = \sup\{|J(x)(x^*)| : \|Jx\| \le 1\}$$
$$\le \sup\{|x^{**}(x^*)| : \|x^{**}\| \le 1\} = \sup\{|J(x^*)(x^{**})| : \|x^{**}\| \le 1\}$$
$$\equiv \|Jx^*\| = \|x^*\|.$$
Definition 12.32 When J maps X onto $X''$, X is called reflexive.
It happens the $L^p$ spaces are reflexive whenever $\infty > p > 1$.
12.3 Exercises
1. Is $\mathbb{N}$ a $G_\delta$ set? What about $\mathbb{Q}$? What about a countable dense subset of a
complete metric space?
2. Let $f : \mathbb{R} \to \mathbb{C}$ be a function. Define the oscillation of a function in B(x, r)
by $\omega_r f(x) = \sup\{|f(z) - f(y)| : y, z \in B(x, r)\}$. Define the oscillation of the
function at the point, x by $\omega f(x) = \lim_{r \to 0} \omega_r f(x)$. Show f is continuous
at x if and only if $\omega f(x) = 0$. Then show the set of points where f is
continuous is a $G_\delta$ set (try $U_n = \{x : \omega f(x) < \frac{1}{n}\}$). Does there exist a
function continuous at only the rational numbers? Does there exist a function
continuous at every irrational and discontinuous elsewhere? Hint: Suppose
D is any countable set, $D = \{d_i\}_{i=1}^{\infty}$, and define the function, $f_n(x)$ to equal
zero for every $x \notin \{d_1, \dots, d_n\}$ and $2^{-n}$ for x in this finite set. Then consider
$g(x) \equiv \sum_{n=1}^{\infty} f_n(x)$. Show that this series converges uniformly.
3. Let $f \in C([0, 1])$ and suppose $f'(x)$ exists. Show there exists a constant, K,
such that $|f(x) - f(y)| \le K|x - y|$ for all $y \in [0, 1]$. Let $U_n = \{f \in C([0, 1])$
such that for each $x \in [0, 1]$ there exists $y \in [0, 1]$ such that $|f(x) - f(y)| > n|x - y|\}$.
Show that $U_n$ is open and dense in C([0, 1]) where for $f \in C([0, 1])$,
$$\|f\| \equiv \sup\{|f(x)| : x \in [0, 1]\}.$$
Show that $\cap_n U_n$ is a dense $G_\delta$ set of nowhere differentiable continuous functions.
Thus every continuous function is uniformly close to one which is
nowhere differentiable.
4. Suppose $f(x) = \sum_{k=1}^{\infty} u_k(x)$ where the convergence is uniform and each $u_k$
is a polynomial. Is it reasonable to conclude that $f'(x) = \sum_{k=1}^{\infty} u_k'(x)$? The
answer is no. Use Problem 3 and the Weierstrass approximation theorem to
show this.
5. Let X be a normed linear space. $A \subseteq X$ is weakly bounded if for each $x^* \in X'$,
$\sup\{|x^*(x)| : x \in A\} < \infty$, while A is bounded if $\sup\{\|x\| : x \in A\} < \infty$.
Show A is weakly bounded if and only if it is bounded.
6. Let f be a $2\pi$ periodic locally integrable function on $\mathbb{R}$. The Fourier series for
f is given by
$$\sum_{k=-\infty}^{\infty} a_k e^{ikx} \equiv \lim_{n \to \infty} \sum_{k=-n}^{n} a_k e^{ikx} \equiv \lim_{n \to \infty} S_n f(x)$$
where
$$a_k = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-ikx} f(x)\, dx.$$
Show
$$S_n f(x) = \int_{-\pi}^{\pi} D_n(x - y) f(y)\, dy$$
where
$$D_n(t) = \frac{\sin((n + \frac{1}{2})t)}{2\pi \sin(\frac{t}{2})}.$$
Verify that $\int_{-\pi}^{\pi} D_n(t)\, dt = 1$. Also show that if $g \in L^1(\mathbb{R})$, then
$$\lim_{a \to \infty} \int_{\mathbb{R}} g(x) \sin(ax)\, dx = 0.$$
This last is called the Riemann Lebesgue lemma. Hint: For the last part,
assume first that $g \in C_c^{\infty}(\mathbb{R})$ and integrate by parts. Then exploit density of
the set of functions in $L^1(\mathbb{R})$.
7. It turns out that the Fourier series sometimes converges to the function pointwise.
Suppose f is $2\pi$ periodic and Holder continuous. That is $|f(x) - f(y)| \le K|x - y|^{\theta}$
where $\theta \in (0, 1]$. Show that if f is like this, then the Fourier series
converges to f at every point. Next modify your argument to show that if
at every point, x, $|f(x+) - f(y)| \le K|x - y|^{\theta}$ for y close enough to x and
larger than x and $|f(x-) - f(y)| \le K|x - y|^{\theta}$ for every y close enough to x
and smaller than x, then $S_n f(x) \to \frac{f(x+) + f(x-)}{2}$, the midpoint of the jump
of the function. Hint: Use Problem 6.
8. Let $Y = \{f$ such that f is continuous, defined on $\mathbb{R}$, and $2\pi$ periodic$\}$. Define
$\|f\|_Y = \sup\{|f(x)| : x \in [-\pi, \pi]\}$. Show that $(Y, \|\cdot\|_Y)$ is a Banach space. Let
$x \in \mathbb{R}$ and define $L_n(f) = S_n f(x)$. Show $L_n \in Y'$ but $\lim_{n \to \infty} \|L_n\| = \infty$.
Show that for each $x \in \mathbb{R}$, there exists a dense $G_\delta$ subset of Y such that for f
in this set, $|S_n f(x)|$ is unbounded. Finally, show there is a dense $G_\delta$ subset of
Y having the property that $|S_n f(x)|$ is unbounded on the rational numbers.
Hint: To do the first part, let f(y) approximate $\operatorname{sgn}(D_n(x - y))$. Here $\operatorname{sgn} r = 1$
if r > 0, $-1$ if r < 0 and 0 if r = 0. This rules out one possibility of the
uniform boundedness principle. After this, show the countable intersection of
dense $G_\delta$ sets must also be a dense $G_\delta$ set.
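9. Let $\alpha \in (0, 1]$. Define, for X a compact subset of $\mathbb{R}^p$,
$$C^{\alpha}(X; \mathbb{R}^n) \equiv \{f \in C(X; \mathbb{R}^n) : \rho_{\alpha}(f) + \|f\| \equiv \|f\|_{\alpha} < \infty\}$$
where
$$\|f\| \equiv \sup\{|f(x)| : x \in X\}$$
and
$$\rho_{\alpha}(f) \equiv \sup\left\{ \frac{|f(x) - f(y)|}{|x - y|^{\alpha}} : x, y \in X, \ x \ne y \right\}.$$
Show that $(C^{\alpha}(X; \mathbb{R}^n), \|\cdot\|_{\alpha})$ is a complete normed linear space. This is
called a Holder space. What would this space consist of if $\alpha > 1$?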
10. Let X be the Holder functions which are periodic of period $2\pi$. Define
$L_n f(x) = S_n f(x)$ where $L_n : X \to Y$ for Y given in Problem 8. Show $\|L_n\|$
is bounded independent of n. Conclude that $L_n f \to f$ in Y for all $f \in X$. In
other words, for the Holder continuous and $2\pi$ periodic functions, the Fourier
series converges to the function uniformly. Hint: $L_n f(x)$ is given by
$$L_n f(x) = \int_{-\pi}^{\pi} D_n(y) f(x - y)\, dy$$
where $f(x - y) = f(x) + g(x, y)$ where $|g(x, y)| \le C|y|^{\alpha}$. Use the fact the
Dirichlet kernel integrates to one to write
$$\left| \int_{-\pi}^{\pi} D_n(y) f(x - y)\, dy \right| \le \overbrace{\left| \int_{-\pi}^{\pi} D_n(y) f(x)\, dy \right|}^{= |f(x)|} + C \left| \int_{-\pi}^{\pi} \sin\left(\left(n + \frac{1}{2}\right) y\right) \left( g(x, y) / \sin(y/2) \right) dy \right|.$$
Show the functions, $y \to g(x, y) / \sin(y/2)$ are bounded in $L^1$ independent of
x and get a uniform bound on $\|L_n\|$. Now use a similar argument to show
$\{L_n f\}$ is equicontinuous in addition to being uniformly bounded. In doing
this you might proceed as follows. Show
$$|L_n f(x) - L_n f(x')| \le \left| \int_{-\pi}^{\pi} D_n(y) \left( f(x - y) - f(x' - y) \right) dy \right|$$
$$\le \|f\|_{\alpha} |x - x'|^{\alpha} + \left| \int_{-\pi}^{\pi} \sin\left(\left(n + \frac{1}{2}\right) y\right) \left( \frac{f(x - y) - f(x) - (f(x' - y) - f(x'))}{\sin\left(\frac{y}{2}\right)} \right) dy \right|.$$
Then split this last integral into two cases, one for $|y| < \eta$ and one where
$|y| \ge \eta$. If $L_n f$ fails to converge to f uniformly, then there exists $\varepsilon > 0$ and a
subsequence, $n_k$ such that $\|L_{n_k} f - f\|_{\infty} \ge \varepsilon$ where this is the norm in Y or
equivalently the sup norm on $[-\pi, \pi]$. By the Arzela Ascoli theorem, there is
a further subsequence, $L_{n_{k_l}} f$ which converges uniformly on $[-\pi, \pi]$. But by
Problem 7 $L_n f(x) \to f(x)$.
11. Let X be a normed linear space and let M be a convex open set containing
0. Define
$$\rho(x) = \inf\{t > 0 : \tfrac{x}{t} \in M\}.$$
Show $\rho$ is a gauge function defined on X. This particular example is called a
Minkowski functional. It is of fundamental importance in the study of locally
convex topological vector spaces. A set, M, is convex if $\lambda x + (1 - \lambda)y \in M$
whenever $\lambda \in [0, 1]$ and $x, y \in M$.
12. The Hahn Banach theorem can be used to establish separation theorems. Let
M be an open convex set containing 0. Let $x \notin M$. Show there exists $x^* \in X'$
such that $\operatorname{Re} x^*(x) \ge 1 > \operatorname{Re} x^*(y)$ for all $y \in M$. Hint: If $y \in M$, $\rho(y) < 1$.
Show this. If $x \notin M$, $\rho(x) \ge 1$. Try $f(\alpha x) = \alpha \rho(x)$ for $\alpha \in \mathbb{R}$. Then extend
f to the whole space using the Hahn Banach theorem and call the result F,
show F is continuous, then fix it so F is the real part of $x^* \in X'$.
13. A Banach space is said to be strictly convex if whenever $\|x\| = \|y\|$ and $x \ne y$,
then
$$\left\| \frac{x + y}{2} \right\| < \|x\|.$$
$F : X \to X'$ is said to be a duality map if it satisfies the following: a.)
$\|F(x)\| = \|x\|$. b.) $F(x)(x) = \|x\|^2$. Show that if $X'$ is strictly convex, then
such a duality map exists. The duality map is an attempt to duplicate some
of the features of the Riesz map in Hilbert space. This Riesz map is the map
which takes a Hilbert space to its dual defined as follows.
$$R(x)(y) = (y, x)$$
The Riesz representation theorem for Hilbert space says this map is onto.
Hint: For an arbitrary Banach space, let
$$F(x) \equiv \left\{ x^* : \|x^*\| \le \|x\| \text{ and } x^*(x) = \|x\|^2 \right\}.$$
Show $F(x) \ne \emptyset$ by using the Hahn Banach theorem on $f(\alpha x) = \alpha \|x\|^2$.
Next show F(x) is closed and convex. Finally show that you can replace
the inequality in the definition of F(x) with an equal sign. Now use strict
convexity to show there is only one element in F(x).
14. Prove the following theorem which is an improved version of the open mapping
theorem, [16]. Let X and Y be Banach spaces and let $A \in \mathcal{L}(X, Y)$. Then
the following are equivalent.
AX = Y,
A is an open map.
Note this gives the equivalence between A being onto and A being an open
map. The open mapping theorem says that if A is onto then it is open.
15. Suppose $D \subseteq X$ and D is dense in X. Suppose $L : D \to Y$ is linear and
$\|Lx\| \le K\|x\|$ for all $x \in D$. Show there is a unique extension of L, $\widetilde{L}$, defined
on all of X with $\|\widetilde{L}x\| \le K\|x\|$ and $\widetilde{L}$ is linear. You do not get uniqueness
when you use the Hahn Banach theorem. Therefore, in the situation of this
problem, it is better to use this result.
16. A Banach space is uniformly convex if whenever $\|x_n\|, \|y_n\| \le 1$ and
$\|x_n + y_n\| \to 2$, it follows that $\|x_n - y_n\| \to 0$. Show uniform convexity
implies strict convexity (See Problem 13). Hint: Suppose it is not strictly
convex. Then there exist $x \ne y$ with $\|x\|$ and $\|y\|$ both equal to 1 and
$$\left\| \frac{x + y}{2} \right\| = 1.$$
Consider $x_n \equiv x$ and $y_n \equiv y$, and use the conditions for uniform convexity to
get a contradiction. It can be shown that $L^p$ is uniformly convex whenever
$\infty > p > 1$. See Hewitt and Stromberg [25] or Ray [36].
17. Show that a closed subspace of a reflexive Banach space is reflexive. Hint:
The proof of this is an exercise in the use of the Hahn Banach theorem. Let
Y be the closed subspace of the reflexive space X, let $i : Y \to X$ denote the inclusion map,
and let $y^{**} \in Y''$. Then $i^{**} y^{**} \in X''$ and so $i^{**} y^{**} = Jx$ for some $x \in X$ because X is reflexive.
Now argue that $x \in Y$ as follows. If $x \notin Y$, then there exists $x^*$ such that
$x^*(Y) = 0$ but $x^*(x) \ne 0$. Thus, $i^* x^* = 0$. Use this to get a contradiction.
When you know that $x = y \in Y$, the Hahn Banach theorem implies $i^*$ is onto
$Y'$ and for all $x^* \in X'$,
$$y^{**}(i^* x^*) = i^{**} y^{**}(x^*) = Jx(x^*) = x^*(x) = x^*(iy) = i^* x^*(y).$$
18. $x_n$ converges weakly to x if for every $x^* \in X'$, $x^*(x_n) \to x^*(x)$. $x_n \rightharpoonup x$
denotes weak convergence. Show that if $\|x_n - x\| \to 0$, then $x_n \rightharpoonup x$.
19. Show that if X is uniformly convex, then if $x_n \rightharpoonup x$ and $\|x_n\| \to \|x\|$, it
follows $\|x_n - x\| \to 0$. Hint: Use Lemma 12.27 to obtain $f \in X'$ with $\|f\| = 1$
and $f(x) = \|x\|$. See Problem 16 for the definition of uniform convexity.
Now by the weak convergence, you can argue that if $x \ne 0$, $f(x_n / \|x_n\|) \to f(x / \|x\|)$.
You also might try to show this in the special case where $\|x_n\| = \|x\| = 1$.
20. Suppose $L \in \mathcal{L}(X, Y)$ and $M \in \mathcal{L}(Y, Z)$. Show $ML \in \mathcal{L}(X, Z)$ and that
$(ML)^* = L^* M^*$.
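The Minkowski functional of Problem 11 can be approximated numerically, which gives a quick sanity check on the gauge function properties 12.4. The sketch below assumes a specific set, the open ellipse $x^2/4 + y^2 < 1$ in the plane, and uses bisection; the set, the function names, and the tolerances are illustrative assumptions only.

```python
# Minkowski functional of an open convex set M containing 0, by bisection.
# Assumed example: M is the open ellipse x^2/4 + y^2 < 1 in R^2.
import math

def in_M(v):
    return v[0] ** 2 / 4 + v[1] ** 2 < 1

def minkowski(v, hi=1e6, steps=200):
    """Approximate rho(v) = inf{t > 0 : v/t in M}; assumes v/hi lies in M."""
    lo = 0.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if in_M((v[0] / mid, v[1] / mid)):
            hi = mid   # v/mid is in M, so rho(v) <= mid
        else:
            lo = mid   # v/mid is not in M, so rho(v) >= mid
    return hi

def closed_form(v):
    """For this ellipse the functional is known in closed form."""
    return math.sqrt(v[0] ** 2 / 4 + v[1] ** 2)

x, y = (3.0, 1.0), (-1.0, 2.0)
s = (x[0] + y[0], x[1] + y[1])
print(minkowski(x), closed_form(x))
print(minkowski(s) <= minkowski(x) + minkowski(y) + 1e-9)
```

Bisection is valid here because, for a convex open set containing 0, the set of t > 0 with x/t in M is an interval extending to infinity.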
Hilbert Spaces
13.1 Basic Theory
Definition 13.1 Let X be a vector space. An inner product is a mapping from
$X \times X$ to $\mathbb{C}$ if X is complex and from $X \times X$ to $\mathbb{R}$ if X is real, denoted by (x, y)
which satisfies the following.
$$(x, x) \ge 0, \quad (x, x) = 0 \text{ if and only if } x = 0, \tag{13.1}$$
$$(x, y) = \overline{(y, x)}. \tag{13.2}$$
$$(ax + by, z) = a(x, z) + b(y, z). \tag{13.3}$$
Note that 13.2 and 13.3 imply $(x, ay + bz) = \overline{a}(x, y) + \overline{b}(x, z)$. Such a vector space
is called an inner product space.
The Cauchy Schwarz inequality is fundamental for the study of inner product
spaces.
Theorem 13.2 (Cauchy Schwarz) In any inner product space
$$|(x, y)| \le \|x\| \|y\|.$$
Proof: Let $\omega \in \mathbb{C}$, $|\omega| = 1$, and $\overline{\omega}(x, y) = |(x, y)| = \operatorname{Re}(x, y\omega)$. Let
$$F(t) = (x + t y \omega, x + t \omega y).$$
If y = 0 there is nothing to prove because
$$(x, 0) = (x, 0 + 0) = (x, 0) + (x, 0)$$
and so (x, 0) = 0. Thus, it can be assumed $y \ne 0$. Then from the axioms of the
inner product,
$$F(t) = \|x\|^2 + 2t \operatorname{Re}(x, \omega y) + t^2 \|y\|^2 \ge 0.$$
This yields
$$\|x\|^2 + 2t |(x, y)| + t^2 \|y\|^2 \ge 0.$$
Since this inequality holds for all $t \in \mathbb{R}$, it follows from the quadratic formula that
$$4 |(x, y)|^2 - 4 \|x\|^2 \|y\|^2 \le 0.$$
This yields the conclusion and proves the theorem.
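The discriminant condition at the end of the proof is easy to test numerically. The following sketch samples random vectors in $\mathbb{C}^4$ with the standard inner product, linear in the first slot as in 13.3; the dimension, the seed, and the tolerance are arbitrary assumptions made for the illustration.

```python
# Numerical check of Cauchy Schwarz in C^n with (x, y) = sum_k x_k * conj(y_k).
import math
import random

def inner(x, y):
    """Inner product, linear in the first slot as in 13.3."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

random.seed(0)  # fixed seed so the run is repeatable

def random_vector(n):
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

worst = 0.0
for _ in range(1000):
    x, y = random_vector(4), random_vector(4)
    # discriminant of t -> ||x||^2 + 2t|(x,y)| + t^2 ||y||^2 must be <= 0
    disc = 4 * abs(inner(x, y)) ** 2 - 4 * norm(x) ** 2 * norm(y) ** 2
    worst = max(worst, disc)

print(worst <= 1e-12)
```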
Proposition 13.3 For an inner product space, $\|x\| \equiv (x, x)^{1/2}$ does specify a
norm.
Proof: All the axioms are obvious except the triangle inequality. To verify this,
$$\|x + y\|^2 \equiv (x + y, x + y) = \|x\|^2 + \|y\|^2 + 2 \operatorname{Re}(x, y)$$
$$\le \|x\|^2 + \|y\|^2 + 2 |(x, y)|$$
$$\le \|x\|^2 + \|y\|^2 + 2 \|x\| \|y\| = (\|x\| + \|y\|)^2.$$
The following lemma is called the parallelogram identity.
Lemma 13.4 In an inner product space,
$$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2.$$
The proof, a straightforward application of the inner product axioms, is left to
the reader.
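For readers who want to see the omitted computation, one way it can go is sketched here, using only 13.2, 13.3 and $\|x\|^2 = (x, x)$:

```latex
\begin{align*}
\|x + y\|^2 &= (x + y, x + y) = \|x\|^2 + (x, y) + (y, x) + \|y\|^2
             = \|x\|^2 + 2\operatorname{Re}(x, y) + \|y\|^2, \\
\|x - y\|^2 &= (x - y, x - y) = \|x\|^2 - (x, y) - (y, x) + \|y\|^2
             = \|x\|^2 - 2\operatorname{Re}(x, y) + \|y\|^2.
\end{align*}
% Adding the two lines cancels the cross terms:
\[
\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2 .
\]
```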
Lemma 13.5 For $x \in H$, an inner product space,
$$\|x\| = \sup_{\|y\| \le 1} |(x, y)| \tag{13.4}$$
Proof: By the Cauchy Schwarz inequality, if $x \ne 0$,
$$\|x\| \ge \sup_{\|y\| \le 1} |(x, y)| \ge \left( x, \frac{x}{\|x\|} \right) = \|x\|.$$
It is obvious that 13.4 holds in the case that x = 0.
Definition 13.6 A Hilbert space is an inner product space which is complete. Thus
a Hilbert space is a Banach space in which the norm comes from an inner product
as described above.
In Hilbert space, one can define a projection map onto closed convex nonempty
sets.
Definition 13.7 A set, K, is convex if whenever $\lambda \in [0, 1]$ and $x, y \in K$,
$\lambda x + (1 - \lambda)y \in K$.
Theorem 13.8 Let K be a closed convex nonempty subset of a Hilbert space, H,
and let $x \in H$. Then there exists a unique point $Px \in K$ such that $\|Px - x\| \le \|y - x\|$
for all $y \in K$.
Proof: Consider uniqueness. Suppose that $z_1$ and $z_2$ are two elements of K
such that for i = 1, 2,
$$\|z_i - x\| \le \|y - x\| \tag{13.5}$$
for all $y \in K$. Also, note that since K is convex,
$$\frac{z_1 + z_2}{2} \in K.$$
Therefore, by the parallelogram identity,
$$\|z_1 - x\|^2 \le \left\| \frac{z_1 + z_2}{2} - x \right\|^2 = \left\| \frac{z_1 - x}{2} + \frac{z_2 - x}{2} \right\|^2$$
$$= 2\left( \left\| \frac{z_1 - x}{2} \right\|^2 + \left\| \frac{z_2 - x}{2} \right\|^2 \right) - \left\| \frac{z_1 - z_2}{2} \right\|^2$$
$$= \frac{1}{2} \|z_1 - x\|^2 + \frac{1}{2} \|z_2 - x\|^2 - \left\| \frac{z_1 - z_2}{2} \right\|^2$$
$$\le \|z_1 - x\|^2 - \left\| \frac{z_1 - z_2}{2} \right\|^2,$$
where the last inequality holds because of 13.5 letting $z_i = z_2$ and $y = z_1$. Hence
$z_1 = z_2$ and this shows uniqueness.
Now let $\lambda = \inf\{\|x - y\| : y \in K\}$ and let $y_n$ be a minimizing sequence. This
means $\{y_n\} \subseteq K$ satisfies $\lim_{n \to \infty} \|x - y_n\| = \lambda$. Now the following follows from
properties of the norm.
$$\|y_n - x + y_m - x\|^2 = 4\left( \left\| \frac{y_n + y_m}{2} - x \right\|^2 \right)$$
Then by the parallelogram identity, and convexity of K, $\frac{y_n + y_m}{2} \in K$, and so
$$\|(y_n - x) - (y_m - x)\|^2 = 2\left( \|y_n - x\|^2 + \|y_m - x\|^2 \right) - \overbrace{4\left( \left\| \frac{y_n + y_m}{2} - x \right\|^2 \right)}^{= \|y_n - x + y_m - x\|^2}$$
$$\le 2\left( \|y_n - x\|^2 + \|y_m - x\|^2 \right) - 4\lambda^2.$$
Since $\|x - y_n\| \to \lambda$, this shows $\{y_n - x\}$ is a Cauchy sequence. Thus also $\{y_n\}$ is
a Cauchy sequence. Since H is complete, $y_n \to y$ for some $y \in H$ which must be in
K because K is closed. Therefore
$$\|x - y\| = \lim_{n \to \infty} \|x - y_n\| = \lambda.$$
Let Px = y.
Corollary 13.9 Let K be a closed, convex, nonempty subset of a Hilbert space, H,
and let $x \in H$. Then for $z \in K$, z = Px if and only if
$$\operatorname{Re}(x - z, y - z) \le 0 \tag{13.6}$$
for all $y \in K$.
Before proving this, consider what it says in the case where the Hilbert space is
$\mathbb{R}^n$.
[Figure: the set K with points y and z, the point x, and the angle $\theta$.]
Condition 13.6 says the angle, $\theta$, shown in the diagram is always obtuse. Remember
from calculus, the sign of $x \cdot y$ is the same as the sign of the cosine of the
included angle between x and y. Thus, in finite dimensions, the conclusion of this
corollary says that z = Px exactly when the indicated angle is obtuse.
Surely the picture suggests this is reasonable.
The inequality 13.6 is an example of a variational inequality and this corollary
characterizes the projection of x onto K as the solution of this variational inequality.
Proof of Corollary: Let $z \in K$ and let $y \in K$ also. Since K is convex, it
follows that if $t \in [0, 1]$,
$$z + t(y - z) = (1 - t)z + ty \in K.$$
Furthermore, every point of K can be written in this way. (Let t = 1 and $y \in K$.)
Therefore, z = Px if and only if for all $y \in K$ and $t \in [0, 1]$,
$$\|x - (z + t(y - z))\|^2 = \|(x - z) - t(y - z)\|^2 \ge \|x - z\|^2$$
for all $t \in [0, 1]$ and $y \in K$ if and only if for all $t \in [0, 1]$ and $y \in K$
$$\|x - z\|^2 + t^2 \|y - z\|^2 - 2t \operatorname{Re}(x - z, y - z) \ge \|x - z\|^2$$
if and only if for all $t \in [0, 1]$,
$$t^2 \|y - z\|^2 - 2t \operatorname{Re}(x - z, y - z) \ge 0. \tag{13.7}$$
Now this is equivalent to 13.7 holding for all $t \in (0, 1)$. Therefore, dividing by
$t \in (0, 1)$, 13.7 is equivalent to
$$t \|y - z\|^2 - 2 \operatorname{Re}(x - z, y - z) \ge 0$$
for all $t \in (0, 1)$ which is equivalent to 13.6. This proves the corollary.
Corollary 13.10 Let K be a nonempty convex closed subset of a Hilbert space, H.
Then the projection map, P is continuous. In fact,
$$|Px - Py| \le |x - y|.$$
Proof: Let $x, x' \in H$. Then by Corollary 13.9,
$$\operatorname{Re}(x' - Px', Px - Px') \le 0, \quad \operatorname{Re}(x - Px, Px' - Px) \le 0.$$
Hence
$$0 \le \operatorname{Re}(x - Px, Px - Px') - \operatorname{Re}(x' - Px', Px - Px')$$
$$= \operatorname{Re}(x - x', Px - Px') - |Px - Px'|^2$$
and so
$$|Px - Px'|^2 \le |x - x'| \, |Px - Px'|.$$
The next corollary is a more general form for the Brouwer fixed point theorem.
Corollary 13.11 Let $f : K \to K$ be continuous where K is a convex compact subset of $\mathbb{R}^n$. Then
f has a fixed point.
Proof: Let $K \subseteq \overline{B(0, R)}$ and let P be the projection map onto K. Then
consider the map $f \circ P$ which maps $\overline{B(0, R)}$ to $\overline{B(0, R)}$ and is continuous. By the
Brouwer fixed point theorem for balls, this map has a fixed point. Thus there exists
x such that
$$f \circ P(x) = x.$$
Now the equation also requires $x \in K$ and so P(x) = x. Hence f(x) = x.
Definition 13.12 Let H be a vector space and let U and V be subspaces. $U \oplus V = H$
if every element of H can be written as a sum of an element of U and an element
of V in a unique way.
The case where the closed convex set is a closed subspace is of special importance
and in this case the above corollary implies the following.
Corollary 13.13 Let K be a closed subspace of a Hilbert space, H, and let $x \in H$.
Then for $z \in K$, z = Px if and only if
$$(x - z, y) = 0 \tag{13.8}$$
for all $y \in K$. Furthermore, $H = K \oplus K^{\perp}$ where
$$K^{\perp} \equiv \{x \in H : (x, k) = 0 \text{ for all } k \in K\}$$
and
$$\|x\|^2 = \|x - Px\|^2 + \|Px\|^2. \tag{13.9}$$
Proof: Since K is a subspace, the condition 13.6 implies $\operatorname{Re}(x - z, y) \le 0$
for all $y \in K$. Replacing y with $-y$, it follows $\operatorname{Re}(x - z, -y) \le 0$ which implies
$\operatorname{Re}(x - z, y) \ge 0$ for all y. Therefore, $\operatorname{Re}(x - z, y) = 0$ for all $y \in K$. Now let
$|\alpha| = 1$ and $\alpha(x - z, y) = |(x - z, y)|$. Since K is a subspace, it follows $\overline{\alpha} y \in K$ for
all $y \in K$. Therefore,
$$0 = \operatorname{Re}(x - z, \overline{\alpha} y) = (x - z, \overline{\alpha} y) = \alpha(x - z, y) = |(x - z, y)|.$$
This shows that z = Px, if and only if 13.8.
For $x \in H$, $x = x - Px + Px$ and from what was just shown, $x - Px \in K^{\perp}$
and $Px \in K$. This shows that $K^{\perp} + K = H$. Is there only one way to write
a given element of H as a sum of a vector in K with a vector in $K^{\perp}$? Suppose
$y + z = y_1 + z_1$ where $z, z_1 \in K^{\perp}$ and $y, y_1 \in K$. Then $(y - y_1) = (z_1 - z)$ and
so from what was just shown, $(y - y_1, y - y_1) = (y - y_1, z_1 - z) = 0$ which shows
$y_1 = y$ and consequently $z_1 = z$. Finally, letting z = Px,
$$\|x\|^2 = (x - z + z, x - z + z) = \|x - z\|^2 + (x - z, z) + (z, x - z) + \|z\|^2$$
$$= \|x - z\|^2 + \|z\|^2.$$
The following theorem is called the Riesz representation theorem for the dual of
a Hilbert space. If $z \in H$ then define an element $f \in H'$ by the rule $(x, z) \equiv f(x)$. It
follows from the Cauchy Schwarz inequality and the properties of the inner product
that $f \in H'$. The Riesz representation theorem says that all elements of $H'$ are of
this form.
Theorem 13.14 Let H be a Hilbert space and let $f \in H'$. Then there exists a
unique $z \in H$ such that
$$f(x) = (x, z) \tag{13.10}$$
for all $x \in H$.
Proof: Letting $y, w \in H$ the assumption that f is linear implies
$$f(y f(w) - f(y) w) = f(w) f(y) - f(y) f(w) = 0$$
which shows that $y f(w) - f(y) w \in f^{-1}(0)$, which is a closed subspace of H since f is
continuous. If $f^{-1}(0) = H$, then f is the zero map and z = 0 is the unique element
of H which satisfies 13.10. If $f^{-1}(0) \ne H$, pick $u \notin f^{-1}(0)$ and let $w \equiv u - Pu \ne 0$.
Thus Corollary 13.13 implies (y, w) = 0 for all $y \in f^{-1}(0)$. In particular, let
$y = x f(w) - f(x) w$ where $x \in H$ is arbitrary. Therefore,
$$0 = (f(w) x - f(x) w, w) = f(w)(x, w) - f(x) \|w\|^2.$$
Thus, solving for f(x) and using the properties of the inner product,
$$f(x) = \left( x, \frac{\overline{f(w)} w}{\|w\|^2} \right).$$
Let $z = \overline{f(w)} w / \|w\|^2$. This proves the existence of z. If $f(x) = (x, z_i)$, i = 1, 2,
for all $x \in H$, then for all $x \in H$, $(x, z_1 - z_2) = 0$ which implies, upon taking
$x = z_1 - z_2$, that $z_1 = z_2$.
If $R : H \to H'$ is defined by $Rx(y) \equiv (y, x)$, the Riesz representation theorem
above states this map is onto. This map is called the Riesz map. It is routine to
show R is linear when H is real (conjugate linear when H is complex) and $|Rx| = |x|$.
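In finite dimensions the representing vector can be computed directly from the formula in the proof. The sketch below works in $\mathbb{C}^3$ with the inner product $(x, y) = \sum_k x_k \overline{y_k}$; the coefficients of f and the choice of w are assumptions made for the illustration. Here w is taken to be the conjugate of the coefficient vector, which is orthogonal to the kernel of f because $(x, w) = f(x)$.

```python
# Riesz representation in C^n (assumed finite dimensional example).
# Inner product: (x, y) = sum_k x_k * conj(y_k), linear in the first slot.

def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

COEFFS = [1 + 2j, -3j, 0.5]  # f(x) = sum_k c_k x_k, an arbitrary choice

def f(x):
    return sum(c * xk for c, xk in zip(COEFFS, x))

# Formula from the proof: z = conj(f(w)) w / ||w||^2 with w orthogonal to ker f.
# Here w = conj(c) is orthogonal to ker f, since (x, w) = f(x).
w = [c.conjugate() for c in COEFFS]
scale = f(w).conjugate() / inner(w, w).real
z = [scale * wk for wk in w]

x = [2 - 1j, 1 + 1j, -4j]
print(f(x), inner(x, z))  # the two values should agree
```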
13.2 Approximations In Hilbert Space
The Gram Schmidt process applies in any Hilbert space.
Theorem 13.15 Let $\{x_1, \dots, x_n\}$ be a basis for M a subspace of H a Hilbert space.
Then there exists an orthonormal basis for M, $\{u_1, \dots, u_n\}$ which has the property
that for each $k \le n$, $\operatorname{span}(x_1, \dots, x_k) = \operatorname{span}(u_1, \dots, u_k)$. Also if $\{x_1, \dots, x_n\} \subseteq H$, then
$$\operatorname{span}(x_1, \dots, x_n)$$
is a closed subspace.
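Proof: Let $\{x_1, \dots, x_n\}$ be a basis for M. Let $u_1 \equiv x_1 / |x_1|$. Thus for k = 1,
$\operatorname{span}(u_1) = \operatorname{span}(x_1)$ and $\{u_1\}$ is an orthonormal set. Now suppose for some k < n,
$u_1, \dots, u_k$ have been chosen such that $(u_j \cdot u_l) = \delta_{jl}$ and $\operatorname{span}(x_1, \dots, x_k) = \operatorname{span}(u_1, \dots, u_k)$.
Then define
$$u_{k+1} \equiv \frac{x_{k+1} - \sum_{j=1}^{k} (x_{k+1} \cdot u_j) u_j}{\left| x_{k+1} - \sum_{j=1}^{k} (x_{k+1} \cdot u_j) u_j \right|}, \tag{13.11}$$
where the denominator is not equal to zero because the $x_j$ form a basis and so
$$x_{k+1} \notin \operatorname{span}(x_1, \dots, x_k) = \operatorname{span}(u_1, \dots, u_k).$$
Thus by induction,
$$u_{k+1} \in \operatorname{span}(u_1, \dots, u_k, x_{k+1}) = \operatorname{span}(x_1, \dots, x_k, x_{k+1}).$$
Also, $x_{k+1} \in \operatorname{span}(u_1, \dots, u_k, u_{k+1})$ which is seen easily by solving 13.11 for $x_{k+1}$
and it follows
$$\operatorname{span}(x_1, \dots, x_k, x_{k+1}) = \operatorname{span}(u_1, \dots, u_k, u_{k+1}).$$
If $l \le k$,
$$(u_{k+1} \cdot u_l) = C\left( (x_{k+1} \cdot u_l) - \sum_{j=1}^{k} (x_{k+1} \cdot u_j)(u_j \cdot u_l) \right)$$
$$= C\left( (x_{k+1} \cdot u_l) - \sum_{j=1}^{k} (x_{k+1} \cdot u_j) \delta_{lj} \right)$$
$$= C\left( (x_{k+1} \cdot u_l) - (x_{k+1} \cdot u_l) \right) = 0.$$
The vectors, $\{u_j\}_{j=1}^{n}$, generated in this way are therefore an orthonormal basis
because each vector has unit length.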
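The recursion 13.11 translates almost line for line into code. The sketch below is a minimal version for real vectors stored as Python lists; the function name, the independence tolerance, and the sample basis are assumptions made for the illustration, and no attempt is made at the numerically more stable variants of the process.

```python
# Gram Schmidt following 13.11, for real vectors stored as lists (assumed setting).
import math

def dot(a, b):
    return sum(ak * bk for ak, bk in zip(a, b))

def gram_schmidt(xs):
    """Return orthonormal u_1..u_n with span(u_1..u_k) = span(x_1..x_k) for each k.

    Assumes the input vectors are linearly independent.
    """
    us = []
    for x in xs:
        # subtract the components of x along the u_j found so far
        r = list(x)
        for u in us:
            c = dot(x, u)
            r = [rk - c * uk for rk, uk in zip(r, u)]
        length = math.sqrt(dot(r, r))
        if length < 1e-12:
            raise ValueError("input vectors are not linearly independent")
        us.append([rk / length for rk in r])
    return us

basis = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
us = gram_schmidt(basis)
for u in us:
    print([round(uk, 6) for uk in u])
```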
Consider the second claim about finite dimensional subspaces. Without loss of
generality, assume $\{x_1, \dots, x_n\}$ is linearly independent. If it is not, delete vectors
until a linearly independent set is obtained. Then by the first part, $\operatorname{span}(x_1, \dots, x_n) = \operatorname{span}(u_1, \dots, u_n) \equiv M$
where the $u_i$ are an orthonormal set of vectors. Suppose
$\{y_k\} \subseteq M$ and $y_k \to y \in H$. Is $y \in M$? Let
$$y_k \equiv \sum_{j=1}^{n} c_j^k u_j.$$
Then let $c^k \equiv (c_1^k, \dots, c_n^k)^T$. Then
$$\left| c^k - c^l \right|^2 \equiv \sum_{j=1}^{n} \left| c_j^k - c_j^l \right|^2 = \left( \sum_{j=1}^{n} (c_j^k - c_j^l) u_j, \sum_{j=1}^{n} (c_j^k - c_j^l) u_j \right) = \|y_k - y_l\|^2$$
which shows $\{c^k\}$ is a Cauchy sequence in $\mathbb{F}^n$ and so it converges to $c \in \mathbb{F}^n$. Thus
$$y = \lim_{k \to \infty} y_k = \lim_{k \to \infty} \sum_{j=1}^{n} c_j^k u_j = \sum_{j=1}^{n} c_j u_j \in M.$$
This completes the proof.
Theorem 13.16 Let M be the span of the orthonormal vectors $\{u_1, \dots, u_n\}$ in a Hilbert space, H and let
$y \in H$. Then Py is given by
$$Py = \sum_{k=1}^{n} (y, u_k) u_k \tag{13.12}$$
and the distance is given by
$$\sqrt{|y|^2 - \sum_{k=1}^{n} |(y, u_k)|^2}. \tag{13.13}$$
Proof:
$$\left( y - \sum_{k=1}^{n} (y, u_k) u_k, u_p \right) = (y, u_p) - \sum_{k=1}^{n} (y, u_k)(u_k, u_p) = (y, u_p) - (y, u_p) = 0.$$
It follows that
$$\left( y - \sum_{k=1}^{n} (y, u_k) u_k, u \right) = 0$$
for all $u \in M$ and so by Corollary 13.13 this verifies 13.12.
The square of the distance, d is given by
$$d^2 = \left( y - \sum_{k=1}^{n} (y, u_k) u_k, \ y - \sum_{k=1}^{n} (y, u_k) u_k \right)$$
$$= |y|^2 - 2 \sum_{k=1}^{n} |(y, u_k)|^2 + \sum_{k=1}^{n} |(y, u_k)|^2$$
and this shows 13.13.
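Formulas 13.12 and 13.13 can be compared against a direct computation of the residual. The sketch below assumes a real example, two orthonormal vectors in $\mathbb{R}^3$, and hypothetical helper names; it checks that the length of $y - Py$ agrees with the distance formula.

```python
# Projection onto M = span(u_1, ..., u_n) for orthonormal u_k, as in 13.12 and 13.13.
# Assumed example: two orthonormal vectors in R^3.
import math

def dot(a, b):
    return sum(ak * bk for ak, bk in zip(a, b))

def project(y, us):
    """Py = sum_k (y, u_k) u_k."""
    p = [0.0] * len(y)
    for u in us:
        c = dot(y, u)
        p = [pk + c * uk for pk, uk in zip(p, u)]
    return p

def distance(y, us):
    """sqrt(|y|^2 - sum_k |(y, u_k)|^2), clamped at 0 against rounding."""
    return math.sqrt(max(0.0, dot(y, y) - sum(dot(y, u) ** 2 for u in us)))

us = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8]]
y = [2.0, 1.0, 3.0]
p = project(y, us)
residual = [yk - pk for yk, pk in zip(y, p)]
print(p, distance(y, us), math.sqrt(dot(residual, residual)))
```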
What if the subspace is the span of vectors which are not orthonormal? There is a very interesting formula for the distance between a point of a Hilbert space and a finite dimensional subspace spanned by an arbitrary basis.
Definition 13.17 Let $\{x_1, \cdots, x_n\} \subseteq H$, a Hilbert space. Define
$$\mathcal{G}(x_1, \cdots, x_n) \equiv \begin{pmatrix} (x_1, x_1) & \cdots & (x_1, x_n) \\ \vdots & & \vdots \\ (x_n, x_1) & \cdots & (x_n, x_n) \end{pmatrix} \tag{13.14}$$
Thus the $ij^{th}$ entry of this matrix is $(x_i, x_j)$. This is sometimes called the Gram matrix. Also define $G(x_1, \cdots, x_n)$ as the determinant of this matrix, also called the Gram determinant.
$$G(x_1, \cdots, x_n) \equiv \begin{vmatrix} (x_1, x_1) & \cdots & (x_1, x_n) \\ \vdots & & \vdots \\ (x_n, x_1) & \cdots & (x_n, x_n) \end{vmatrix} \tag{13.15}$$
The theorem is the following.
Theorem 13.18 Let $M = \operatorname{span}(x_1, \cdots, x_n) \subseteq H$, a real Hilbert space, where $\{x_1, \cdots, x_n\}$ is a basis, and let $y \in H$. Then letting $d$ be the distance from $y$ to $M$,
$$d^2 = \frac{G(x_1, \cdots, x_n, y)}{G(x_1, \cdots, x_n)}. \tag{13.16}$$
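Proof: By Theorem 13.15 $M$ is a closed subspace of $H$. Let $\sum_{k=1}^{n}\alpha_k x_k$ be the element of $M$ which is closest to $y$. Then by Corollary 13.13,
$$\left(y - \sum_{k=1}^{n}\alpha_k x_k,\ x_p\right) = 0$$
for each $p = 1, 2, \cdots, n$. This yields the system of equations,
$$(y, x_p) = \sum_{k=1}^{n}(x_p, x_k)\,\alpha_k, \quad p = 1, 2, \cdots, n \tag{13.17}$$
Also by Corollary 13.13,
$$\|y\|^2 = \underbrace{\left\|y - \sum_{k=1}^{n}\alpha_k x_k\right\|^2}_{d^2} + \left\|\sum_{k=1}^{n}\alpha_k x_k\right\|^2$$
and so, using 13.17,
$$\|y\|^2 = d^2 + \sum_{j}\left(\sum_{k}\alpha_k (x_k, x_j)\right)\alpha_j = d^2 + \sum_{j}(y, x_j)\,\alpha_j \tag{13.18}$$
$$\equiv d^2 + y_x^T\alpha \tag{13.19}$$
in which
$$y_x^T \equiv \left((y, x_1), \cdots, (y, x_n)\right), \quad \alpha^T \equiv (\alpha_1, \cdots, \alpha_n).$$
Then 13.17 and 13.18 imply the following system
$$\begin{pmatrix} \mathcal{G}(x_1, \cdots, x_n) & 0 \\ y_x^T & 1 \end{pmatrix}\begin{pmatrix} \alpha \\ d^2 \end{pmatrix} = \begin{pmatrix} y_x \\ \|y\|^2 \end{pmatrix}$$
By Cramer's rule,
$$d^2 = \frac{\det\begin{pmatrix} \mathcal{G}(x_1, \cdots, x_n) & y_x \\ y_x^T & \|y\|^2 \end{pmatrix}}{\det\begin{pmatrix} \mathcal{G}(x_1, \cdots, x_n) & 0 \\ y_x^T & 1 \end{pmatrix}} = \frac{\det\begin{pmatrix} \mathcal{G}(x_1, \cdots, x_n) & y_x \\ y_x^T & \|y\|^2 \end{pmatrix}}{\det\left(\mathcal{G}(x_1, \cdots, x_n)\right)}$$
$$= \frac{\det\left(\mathcal{G}(x_1, \cdots, x_n, y)\right)}{\det\left(\mathcal{G}(x_1, \cdots, x_n)\right)} = \frac{G(x_1, \cdots, x_n, y)}{G(x_1, \cdots, x_n)}.$$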
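As a sanity check of 13.16, here is a small sketch in $\mathbb{R}^3$ which computes the squared distance from a point to the span of two vectors that are not orthonormal. The names `det`, `gram_det` and `dist_squared` are assumptions for illustration, and the determinant is computed by plain Gaussian elimination.

```python
def inner(x, y):
    """Real Euclidean inner product, standing in for the inner product of H."""
    return sum(a * b for a, b in zip(x, y))

def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    result = 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[pivot][i] == 0.0:
            return 0.0
        if pivot != i:
            a[i], a[pivot] = a[pivot], a[i]
            result = -result          # a row swap changes the sign
        result *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return result

def gram_det(vectors):
    """Gram determinant G(x_1, ..., x_n) of 13.15."""
    return det([[inner(x, z) for z in vectors] for x in vectors])

def dist_squared(xs, y):
    """d^2 = G(x_1, ..., x_n, y) / G(x_1, ..., x_n), as in 13.16."""
    return gram_det(xs + [y]) / gram_det(xs)

xs = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]   # a basis of the xy-plane, not orthonormal
y = [3.0, 4.0, 12.0]
d2 = dist_squared(xs, y)
```

The two vectors span the $xy$-plane, so the result should agree with the square of the third coordinate of $y$.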
13.3 Orthonormal Sets
The concept of an orthonormal set of vectors is a generalization of the notion of the standard basis vectors of $\mathbb{R}^n$ or $\mathbb{C}^n$.
Definition 13.19 Let $H$ be a Hilbert space. $S \subseteq H$ is called an orthonormal set if $\|x\| = 1$ for all $x \in S$ and $(x, y) = 0$ if $x, y \in S$ and $x \neq y$. For any set $D$,
$$D^{\perp} \equiv \{x \in H : (x, d) = 0 \text{ for all } d \in D\}.$$
If $S$ is a set, $\operatorname{span}(S)$ is the set of all finite linear combinations of vectors from $S$.
You should verify that $D^{\perp}$ is always a closed subspace of $H$.
Theorem 13.20 In any separable Hilbert space $H$, there exists a countable orthonormal set $S = \{x_i\}$ such that the span of these vectors is dense in $H$. Furthermore, if $\operatorname{span}(S)$ is dense, then for $x \in H$,
$$x = \sum_{i=1}^{\infty}(x, x_i)\,x_i \equiv \lim_{n\to\infty}\sum_{i=1}^{n}(x, x_i)\,x_i. \tag{13.20}$$
Proof: Let $\mathcal{F}$ denote the collection of all orthonormal subsets of $H$. $\mathcal{F}$ is nonempty because $\{x\} \in \mathcal{F}$ where $\|x\| = 1$. The set $\mathcal{F}$ is a partially ordered set with the order given by set inclusion. By the Hausdorff maximal theorem, there exists a maximal chain $\mathcal{C}$ in $\mathcal{F}$. Then let $S \equiv \cup\,\mathcal{C}$. It follows $S$ must be a maximal orthonormal set of vectors. Why? It remains to verify that $S$ is countable, $\operatorname{span}(S)$ is dense, and the condition 13.20 holds. To see $S$ is countable note that if $x, y \in S$ with $x \neq y$, then
$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2\operatorname{Re}(x, y) = \|x\|^2 + \|y\|^2 = 2.$$
Therefore, the open sets $B\left(x, \frac{1}{2}\right)$ for $x \in S$ are disjoint and cover $S$. Since $H$ is assumed to be separable, there exists a point from a countable dense set in each of these disjoint balls showing there can only be countably many of the balls and that consequently, $S$ is countable as claimed.
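It remains to verify 13.20 and that $\operatorname{span}(S)$ is dense. If $\operatorname{span}(S)$ is not dense, then $\overline{\operatorname{span}(S)}$ is a closed proper subspace of $H$ and letting $y \notin \overline{\operatorname{span}(S)}$,
$$z \equiv \frac{y - Py}{\|y - Py\|} \in \overline{\operatorname{span}(S)}^{\perp}.$$
But then $S \cup \{z\}$ would be a larger orthonormal set of vectors contradicting the maximality of $S$.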
It remains to verify 13.20. Let $S = \{x_i\}_{i=1}^{\infty}$ and consider the problem of choosing the constants $c_k$ in such a way as to minimize the expression
$$\left\|x - \sum_{k=1}^{n} c_k x_k\right\|^2 = \|x\|^2 + \sum_{k=1}^{n}|c_k|^2 - \sum_{k=1}^{n} c_k\overline{(x, x_k)} - \sum_{k=1}^{n}\overline{c_k}\,(x, x_k).$$
This equals
$$\|x\|^2 + \sum_{k=1}^{n}|c_k - (x, x_k)|^2 - \sum_{k=1}^{n}|(x, x_k)|^2$$
and therefore, this minimum is achieved when $c_k = (x, x_k)$ and equals
$$\|x\|^2 - \sum_{k=1}^{n}|(x, x_k)|^2.$$
Now since $\operatorname{span}(S)$ is dense, there exists $n$ large enough that for some choice of constants $c_k$,
$$\left\|x - \sum_{k=1}^{n} c_k x_k\right\|^2 < \varepsilon.$$
However, from what was just shown,
$$\left\|x - \sum_{i=1}^{n}(x, x_i)\,x_i\right\|^2 \le \left\|x - \sum_{k=1}^{n} c_k x_k\right\|^2 < \varepsilon$$
showing that $\lim_{n\to\infty}\sum_{i=1}^{n}(x, x_i)\,x_i = x$ as claimed. This proves the theorem.
The proof of this theorem contains the following corollary.
Corollary 13.21 Let $S$ be any orthonormal set of vectors and let
$$\{x_1, \cdots, x_n\} \subseteq S.$$
Then if $x \in H$,
$$\left\|x - \sum_{k=1}^{n} c_k x_k\right\|^2 \ge \left\|x - \sum_{i=1}^{n}(x, x_i)\,x_i\right\|^2$$
for all choices of constants $c_k$. In addition to this, Bessel's inequality holds:
$$\|x\|^2 \ge \sum_{k=1}^{n}|(x, x_k)|^2.$$
If $S$ is countable and $\operatorname{span}(S)$ is dense, then letting $\{x_i\}_{i=1}^{\infty} = S$, 13.20 follows.
13.4 Fourier Series, An Example
In this section consider the Hilbert space $L^2(0, 2\pi)$ with the inner product
$$(f, g) \equiv \int_0^{2\pi} f\,\overline{g}\,dm.$$
This is a Hilbert space because of the theorem which states the $L^p$ spaces are complete, Theorem 10.10 on Page 251. An example of an orthonormal set of functions in $L^2(0, 2\pi)$ is
$$\phi_n(x) \equiv \frac{1}{\sqrt{2\pi}}\,e^{inx}$$
for $n$ an integer. Is it true that the span of these functions is dense in $L^2(0, 2\pi)$?
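Theorem 13.22 Let $S = \{\phi_n\}_{n\in\mathbb{Z}}$. Then $\operatorname{span}(S)$ is dense in $L^2(0, 2\pi)$.
Proof: By regularity of Lebesgue measure, it follows from Theorem 10.28 that $C_c(0, 2\pi)$ is dense in $L^2(0, 2\pi)$. Therefore, it suffices to show that for $g \in C_c(0, 2\pi)$ and every $\varepsilon > 0$ there exists $h \in \operatorname{span}(S)$ such that $\|g - h\|_{L^2(0,2\pi)} < \varepsilon$.
Let $T$ denote the points of $\mathbb{C}$ which are of the form $e^{it}$ for $t \in \mathbb{R}$. Let $\mathcal{A}$ denote the algebra of functions consisting of polynomials in $z$ and $1/z$ for $z \in T$. Thus a typical such function would be one of the form
$$\sum_{k=-m}^{m} c_k z^k$$
for $m$ chosen large enough. This algebra separates the points of $T$ because it contains the function $p(z) = z$. It annihilates no point of $T$ because it contains the constant function 1. Furthermore, it has the property that for $f \in \mathcal{A}$, $\overline{f} \in \mathcal{A}$. By the Stone Weierstrass approximation theorem, Theorem 6.13 on Page 129, $\mathcal{A}$ is dense in $C(T)$. Now for $g \in C_c(0, 2\pi)$, extend $g$ to all of $\mathbb{R}$ to be $2\pi$ periodic. Then letting $G\left(e^{it}\right) \equiv g(t)$, it follows $G$ is well defined and continuous on $T$. Therefore, there exists $H \in \mathcal{A}$ such that for all $t \in \mathbb{R}$,
$$\left|H\left(e^{it}\right) - G\left(e^{it}\right)\right| < \frac{\varepsilon^2}{2\pi}.$$
Thus $H\left(e^{it}\right)$ is of the form
$$H\left(e^{it}\right) = \sum_{k=-m}^{m} c_k\left(e^{it}\right)^k = \sum_{k=-m}^{m} c_k e^{ikt} \in \operatorname{span}(S).$$
Let $h(t) = \sum_{k=-m}^{m} c_k e^{ikt}$. There is no loss of generality in assuming $\varepsilon^2/2\pi \le 1$, so that $|g - h|^2 \le |g - h|$. Then
$$\left(\int_0^{2\pi}|g - h|^2\,dx\right)^{1/2} \le \left(\int_0^{2\pi}\max\{|g(t) - h(t)| : t \in [0, 2\pi]\}\,dx\right)^{1/2}$$
$$= \left(\int_0^{2\pi}\max\left\{\left|G\left(e^{it}\right) - H\left(e^{it}\right)\right| : t \in [0, 2\pi]\right\}dx\right)^{1/2} < \left(\int_0^{2\pi}\frac{\varepsilon^2}{2\pi}\,dx\right)^{1/2} = \varepsilon.$$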
Corollary 13.23 For $f \in L^2(0, 2\pi)$,
$$\lim_{m\to\infty}\left\|f - \sum_{k=-m}^{m}(f, \phi_k)\,\phi_k\right\|_{L^2(0,2\pi)} = 0.$$
Proof: This follows from Theorem 13.20 on Page 331.
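The convergence in Corollary 13.23 can be observed numerically. The sketch below uses the sample function $f(x) = x$, approximates the inner product of $L^2(0, 2\pi)$ by a midpoint rule, and uses the identity $\|f - \sum (f,\phi_k)\phi_k\|^2 = \|f\|^2 - \sum |(f,\phi_k)|^2$ from the proof of Theorem 13.20. The grid size and all function names are assumptions for illustration.

```python
import cmath
import math

N = 2000                                  # number of midpoint nodes (an arbitrary choice)
H = 2 * math.pi / N
NODES = [(j + 0.5) * H for j in range(N)]

def inner(f, g):
    """Midpoint-rule approximation of (f, g) = integral of f * conj(g) over (0, 2*pi)."""
    return sum(f(x) * complex(g(x)).conjugate() for x in NODES) * H

def phi(k):
    """The orthonormal function phi_k(x) = exp(i k x) / sqrt(2 pi)."""
    return lambda x: cmath.exp(1j * k * x) / math.sqrt(2 * math.pi)

def squared_error(f, m):
    """||f||^2 minus the sum of |(f, phi_k)|^2 for |k| <= m."""
    norm_sq = inner(f, f).real
    return norm_sq - sum(abs(inner(f, phi(k))) ** 2 for k in range(-m, m + 1))

f = lambda x: x
errors = [squared_error(f, m) for m in (1, 5, 20)]
```

The entries of `errors` decrease as $m$ grows, though slowly, since the periodic extension of this $f$ has a jump.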
13.5 General Theory Of Continuous Semigroups
Much more on semigroups is available in Yosida [43]. This is just an introduction
to the subject.
Definition 13.24 A strongly continuous semigroup defined on $H$, a Banach space, is a function $S : [0, \infty) \to \mathcal{L}(H, H)$ which satisfies the following for all $x_0 \in H$.
$$S(t) \in \mathcal{L}(H, H), \quad S(t + s) = S(t)\,S(s),$$
$$t \to S(t)\,x_0 \text{ is continuous}, \quad \lim_{t\to 0+} S(t)\,x_0 = x_0.$$
Sometimes such a semigroup is said to be $C_0$. It is said to have the linear operator $A$ as its generator if
$$D(A) \equiv \left\{x : \lim_{h\to 0+}\frac{S(h)\,x - x}{h} \text{ exists}\right\}$$
and for $x \in D(A)$, $A$ is defined by
$$\lim_{h\to 0+}\frac{S(h)\,x - x}{h} \equiv Ax.$$
The assertion that $t \to S(t)\,x_0$ is continuous and that $S(t) \in \mathcal{L}(H, H)$ is not by itself sufficient to say there is a bound on $\|S(t)\|$ for all $t$. Also the assertion that for each $x_0$,
$$\lim_{t\to 0+} S(t)\,x_0 = x_0$$
is not the same as saying that $S(t) \to I$ in $\mathcal{L}(H, H)$. It is a much weaker assertion.
The next theorem gives information on the growth of $\|S(t)\|$. It turns out that $\|S(t)\|$ grows at most exponentially.
Lemma 13.25 Let $M \equiv \sup\{\|S(t)\| : t \in [0, T]\}$. Then $M < \infty$.
Proof: If this is not true, then there exists $t_n \in [0, T]$ such that $\|S(t_n)\| \ge n$. That is, the operators $S(t_n)$ are not uniformly bounded. By the uniform boundedness principle, Theorem 12.8, there exists $x \in H$ such that $\|S(t_n)\,x\|$ is not bounded. However, this is impossible because it is given that $t \to S(t)\,x$ is continuous on $[0, T]$ and so $t \to \|S(t)\,x\|$ must achieve its maximum on this compact set.
Now here is the main result for growth of $\|S(t)\|$.
Theorem 13.26 For $M$ described in Lemma 13.25, there exists $\alpha$ such that
$$\|S(t)\| \le M e^{\alpha t}.$$
In fact, $\alpha$ can be chosen such that $M^{1/T} = e^{\alpha}$.
Proof: Let $t$ be arbitrary. Then $t = mT + r(t)$ where $m$ is a nonnegative integer and $0 \le r(t) < T$. Then by the semigroup property
$$\|S(t)\| = \|S(mT + r(t))\| = \|S(r(t))\,S(T)^m\| \le M^{m+1}.$$
Now $mT \le t = mT + r(t) \le (m + 1)\,T$ and so
$$m \le \frac{t}{T} \le m + 1.$$
Therefore, since $M \ge \|S(0)\| = 1$,
$$\|S(t)\| \le M^{(t/T)+1} = M\left(M^{1/T}\right)^t.$$
Let $M^{1/T} \equiv e^{\alpha}$ and then
$$\|S(t)\| \le M e^{\alpha t}.$$
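To see the estimate of Theorem 13.26 on a concrete example, consider the matrices $S(t) = \begin{pmatrix} 1 & t \\ 0 & 1\end{pmatrix}$ acting on $\mathbb{R}^2$, which satisfy $S(t+s) = S(t)S(s)$. This example is my own choice for illustration and is not taken from the notes. The sketch computes $M$ on $[0, T]$ with $T = 1$ over a grid, sets $\alpha = \ln(M)/T$, and checks the bound at sample times.

```python
import math

def op_norm(t):
    """Operator 2-norm of S(t) = [[1, t], [0, 1]], via the top eigenvalue of S^T S."""
    trace = 2.0 + t * t          # trace of S^T S; its determinant equals 1
    top = (trace + math.sqrt(trace * trace - 4.0)) / 2.0
    return math.sqrt(top)

T = 1.0
# sup of ||S(t)|| over a grid of [0, T]; the norm is increasing in t for this example
M = max(op_norm(j * T / 1000) for j in range(1001))
alpha = math.log(M) / T          # chosen so that M**(1/T) == e**alpha

ts = [0.1 * j for j in range(201)]
bound_holds = all(op_norm(t) <= M * math.exp(alpha * t) + 1e-12 for t in ts)
```

For this semigroup the norm grows only linearly in $t$, so the exponential bound is far from sharp. The theorem only claims an upper bound.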
Definition 13.27 Let $S(t)$ be a continuous semigroup as described above. It is called a contraction semigroup if for all $t \ge 0$
$$\|S(t)\| \le 1.$$
It is called a bounded semigroup if there exists $M$ such that for all $t \ge 0$,
$$\|S(t)\| \le M.$$
Note that for $S(t)$ an arbitrary continuous semigroup satisfying
$$\|S(t)\| \le M e^{\alpha t},$$
it follows that the semigroup
$$T(t) = e^{-\alpha t}\,S(t)$$
is a bounded semigroup which satisfies
$$\|T(t)\| \le M.$$
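Proposition 13.28 Given a continuous semigroup $S(t)$, its generator $A$ exists and is a closed densely defined operator. Furthermore, for
$$\|S(t)\| \le M e^{\alpha t}$$
and $\lambda > \alpha$, $\lambda I - A$ is onto and $(\lambda I - A)^{-1}$ maps $H$ onto $D(A)$ and is in $\mathcal{L}(H, H)$. Also for these values of $\lambda$,
$$(\lambda I - A)^{-1}x = \int_0^{\infty} e^{-\lambda t}\,S(t)\,x\,dt.$$
For $\lambda > \alpha$, the following estimate holds.
$$\left\|(\lambda I - A)^{-1}\right\| \le \frac{M}{|\lambda - \alpha|}$$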
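In the simplest case $H = \mathbb{R}$ and $A$ is multiplication by a number $a$, so $S(t) = e^{at}$ and the proposition says $(\lambda - a)^{-1} = \int_0^\infty e^{-\lambda t} e^{at}\,dt$ for $\lambda > a$. The sketch below checks this with a truncated midpoint rule. The scalar setting, the truncation point and the step count are all assumptions made for illustration.

```python
import math

def laplace_resolvent(a, lam, upper=40.0, steps=40000):
    """Midpoint approximation of the integral of exp(-lam*t) * exp(a*t) over [0, upper]."""
    h = upper / steps
    return sum(math.exp((a - lam) * (j + 0.5) * h) for j in range(steps)) * h

a, lam = 0.5, 2.0                 # here M = 1 and alpha = a, and lam > alpha
numeric = laplace_resolvent(a, lam)
exact = 1.0 / (lam - a)           # the resolvent (lam - a)^(-1)
```

In this scalar case the estimate $M/|\lambda - \alpha|$ holds with equality.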
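Proof: First note $D(A) \neq \emptyset$. In fact $0 \in D(A)$. It follows from Theorem 13.26 that for all $\lambda$ large enough, one can define a Laplace transform,
$$R(\lambda)\,x \equiv \int_0^{\infty} e^{-\lambda t}\,S(t)\,x\,dt \in H.$$
Here the integral is the ordinary improper Riemann integral. I claim each of these is in $D(A)$.
$$\frac{S(h)\int_0^{\infty} e^{-\lambda t}S(t)\,x\,dt - \int_0^{\infty} e^{-\lambda t}S(t)\,x\,dt}{h}$$
Using the semigroup property and changing the variables in the first of the above integrals, this equals
$$= \frac{1}{h}\left(e^{\lambda h}\int_h^{\infty} e^{-\lambda t}S(t)\,x\,dt - \int_0^{\infty} e^{-\lambda t}S(t)\,x\,dt\right)$$
$$= \frac{1}{h}\left(\left(e^{\lambda h} - 1\right)\int_0^{\infty} e^{-\lambda t}S(t)\,x\,dt - e^{\lambda h}\int_0^{h} e^{-\lambda t}S(t)\,x\,dt\right)$$
The limit as $h \to 0$ exists and equals
$$\lambda R(\lambda)\,x - x.$$
Thus $R(\lambda)\,x \in D(A)$ as claimed and
$$AR(\lambda)\,x = \lambda R(\lambda)\,x - x.$$
Hence
$$x = (\lambda I - A)\,R(\lambda)\,x. \tag{13.21}$$
Since $x$ is arbitrary, this shows that for $\lambda$ large enough, $\lambda I - A$ is onto.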
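Why is $D(A)$ dense? It was shown above that $R(\lambda)\,x$ and therefore $\lambda R(\lambda)\,x \in D(A)$. Then for $\lambda > \alpha$ where $\|S(t)\| \le M e^{\alpha t}$,
$$\|\lambda R(\lambda)\,x - x\| = \left\|\int_0^{\infty}\lambda e^{-\lambda t}S(t)\,x\,dt - \int_0^{\infty}\lambda e^{-\lambda t}x\,dt\right\| \le \int_0^{\infty}\left\|\lambda e^{-\lambda t}(S(t)\,x - x)\right\|dt$$
$$= \int_0^{h}\left\|\lambda e^{-\lambda t}(S(t)\,x - x)\right\|dt + \int_h^{\infty}\left\|\lambda e^{-\lambda t}(S(t)\,x - x)\right\|dt$$
$$\le \int_0^{h}\left\|\lambda e^{-\lambda t}(S(t)\,x - x)\right\|dt + \int_h^{\infty}\lambda e^{-(\lambda - \alpha)t}\,dt\,(M + 1)\,\|x\|$$
Now since $S(t)\,x - x \to 0$, it follows that for $h$ sufficiently small this is
$$\le \frac{\varepsilon}{2}\int_0^{h}\lambda e^{-\lambda t}\,dt + \frac{\lambda}{\lambda - \alpha}\,e^{-(\lambda - \alpha)h}(M + 1)\,\|x\| \le \frac{\varepsilon}{2} + \frac{\lambda}{\lambda - \alpha}\,e^{-(\lambda - \alpha)h}(M + 1)\,\|x\| < \varepsilon$$
whenever $\lambda$ is large enough. Thus $D(A)$ is dense as claimed.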
Let $x \in D(A)$. Then for $y^* \in H'$,
$$y^*\left(\int_0^{t} S(s)\,Ax\,ds\right) = \int_0^{t} y^*\left(S(s)\lim_{h\to 0+}\frac{S(h)\,x - x}{h}\right)ds.$$
The difference quotient is given to have a limit and so the difference quotients are bounded. Therefore, one can use the dominated convergence theorem to take the limit outside the integral and write the above equals
$$\lim_{h\to 0+}\int_0^{t} y^*\left(S(s)\,\frac{S(h)\,x - x}{h}\right)ds = \lim_{h\to 0+} y^*\left(\frac{1}{h}\left(\int_h^{t+h} S(s)\,x\,ds - \int_0^{t} S(s)\,x\,ds\right)\right)$$
$$= \lim_{h\to 0+} y^*\left(\frac{1}{h}\int_t^{t+h} S(s)\,x\,ds - \frac{1}{h}\int_0^{h} S(s)\,x\,ds\right) = y^*\left(S(t)\,x - x\right).$$
Thus since $y^*$ is arbitrary, for $x \in D(A)$
$$S(t)\,x = x + \int_0^{t} S(s)\,Ax\,ds.$$
Why is $A$ closed? Suppose $x_n \to x$ and $x_n \in D(A)$ while $Ax_n \to z$. From what was just shown
$$S(t)\,x_n = x_n + \int_0^{t} S(s)\,Ax_n\,ds$$
and so, passing to the limit this yields
$$S(t)\,x = x + \int_0^{t} S(s)\,z\,ds$$
which implies
$$\lim_{t\to 0+}\frac{S(t)\,x - x}{t} = \lim_{t\to 0+}\frac{1}{t}\int_0^{t} S(s)\,z\,ds = z$$
which shows $Ax = z$ and $x \in D(A)$. Thus $A$ is closed.
Because of 13.21 it follows $R(\lambda)\,x = (\lambda I - A)^{-1}x$. Also
$$\|R(\lambda)\,x\| = \left\|\int_0^{\infty} e^{-\lambda t}S(t)\,x\,dt\right\| \le \int_0^{\infty} e^{-\lambda t}M e^{\alpha t}\,dt\,\|x\| \le \frac{M}{|\lambda - \alpha|}\,\|x\|$$
so $R(\lambda) = (\lambda I - A)^{-1} \in \mathcal{L}(H, H)$ and this also proves the last estimate. Also from 13.21, $R(\lambda)$ maps $H$ onto $D(A)$. This proves the proposition.
The linear mapping $(\lambda I - A)^{-1}$ is called the resolvent.
The above proof contains an argument which implies the following corollary.
Corollary 13.29 Let $S(t)$ be a continuous semigroup and let $A$ be its generator. Then for $0 < a < b$ and $x \in D(A)$,
$$S(b)\,x - S(a)\,x = \int_a^{b} S(t)\,Ax\,dt$$
and also for $t > 0$ you can take the derivative from the left,
$$\lim_{h\to 0+}\frac{S(t)\,x - S(t - h)\,x}{h} = S(t)\,Ax.$$
Proof: Letting $y^* \in H'$,
$$y^*\left(\int_a^{b} S(t)\,Ax\,dt\right) = \int_a^{b} y^*\left(S(t)\lim_{h\to 0+}\frac{S(h)\,x - x}{h}\right)dt.$$
The difference quotients are bounded because they converge to $Ax$. Therefore, from the dominated convergence theorem,
$$y^*\left(\int_a^{b} S(t)\,Ax\,dt\right) = \lim_{h\to 0+}\int_a^{b} y^*\left(S(t)\,\frac{S(h)\,x - x}{h}\right)dt = \lim_{h\to 0+} y^*\left(\int_a^{b} S(t)\,\frac{S(h)\,x - x}{h}\,dt\right)$$
$$= \lim_{h\to 0+} y^*\left(\frac{1}{h}\int_{a+h}^{b+h} S(t)\,x\,dt - \frac{1}{h}\int_a^{b} S(t)\,x\,dt\right)$$
$$= \lim_{h\to 0+} y^*\left(\frac{1}{h}\int_b^{b+h} S(t)\,x\,dt - \frac{1}{h}\int_a^{a+h} S(t)\,x\,dt\right) = y^*\left(S(b)\,x - S(a)\,x\right).$$
Since $y^*$ is arbitrary, this proves the first part. Now from what was just shown, if $t > 0$ and $h$ is small enough,
$$\frac{S(t)\,x - S(t - h)\,x}{h} = \frac{1}{h}\int_{t-h}^{t} S(s)\,Ax\,ds$$
which converges to $S(t)\,Ax$ as $h \to 0+$. This proves the corollary.
Given a closed densely defined operator, when is it the generator of a bounded semigroup? This is answered in the following theorem which is called the Hille Yosida theorem.
Theorem 13.30 Suppose $A$ is a densely defined linear operator which has the property that for all $\lambda > 0$,
$$(\lambda I - A)^{-1} \in \mathcal{L}(H, H)$$
which means that $\lambda I - A : D(A) \to H$ is one to one and onto with continuous inverse. Suppose also that for all $n \in \mathbb{N}$,
$$\left\|\left((\lambda I - A)^{-1}\right)^n\right\| \le \frac{M}{\lambda^n}. \tag{13.22}$$
Then there exists a continuous semigroup $S(t)$ which has $A$ as its generator and satisfies $\|S(t)\| \le M$, and $A$ is closed. In fact letting
$$S_\lambda(t) \equiv \exp\left(t\left(-\lambda + \lambda^2(\lambda I - A)^{-1}\right)\right)$$
it follows $\lim_{\lambda\to\infty} S_\lambda(t)\,x = S(t)\,x$ uniformly on finite intervals. Conversely, if $A$ is the generator of $S(t)$, a bounded continuous semigroup having $\|S(t)\| \le M$, then $(\lambda I - A)^{-1} \in \mathcal{L}(H, H)$ for all $\lambda > 0$ and 13.22 holds.
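The approximation in the theorem can be watched in the scalar case, where $A$ is multiplication by a number $a \le 0$ and $S(t) = e^{at}$. Then $-\lambda + \lambda^2(\lambda - a)^{-1} = \lambda a/(\lambda - a)$, which tends to $a$ as $\lambda \to \infty$. The following sketch, with the illustrative choice $a = -1$ and a hypothetical helper `yosida`, compares the approximating exponential with $e^{at}$.

```python
import math

def yosida(a, lam):
    """Scalar version of -lam + lam^2 (lam I - A)^(-1) with A = multiplication by a."""
    return -lam + lam * lam / (lam - a)

a, t = -1.0, 1.0
errors = [
    abs(math.exp(t * yosida(a, lam)) - math.exp(t * a))
    for lam in (10.0, 100.0, 1000.0)
]
```

The entries of `errors` shrink roughly in proportion to $1/\lambda$ in this example.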
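Proof: Consider the operator
$$\lambda(\lambda I - A)^{-1}A.$$
On $D(A)$, this equals
$$-\lambda + \lambda^2(\lambda I - A)^{-1} \tag{13.23}$$
which makes sense on all of $H$, not just on $D(A)$. Also this last expression equals
$$A\lambda(\lambda I - A)^{-1}$$
on all of $H$ because $\lambda I - A$ is given to be onto. Denote this as $A_\lambda$ to save notation. Thus on $D(A)$,
$$\lambda A(\lambda I - A)^{-1} = \lambda(\lambda I - A)^{-1}A.$$
For $x \in D(A)$,
$$\left\|\lambda(\lambda I - A)^{-1}x - x\right\| = \left\|(\lambda I - A)^{-1}(\lambda x - (\lambda I - A)\,x)\right\| = \left\|(\lambda I - A)^{-1}Ax\right\| \le \frac{M}{\lambda}\,\|Ax\|$$
which converges to 0. Therefore, for $x \in D(A)$,
$$\|A_\lambda x - Ax\| = \left\|\lambda(\lambda I - A)^{-1}Ax - Ax\right\| \le \frac{M\,\|Ax\|}{\lambda} \tag{13.24}$$
so it also converges to 0. Because of 13.23, the operator $A\lambda(\lambda I - A)^{-1}$ is continuous. Now using 13.23 define an approximate semigroup
$$S_\lambda(t) \equiv e^{-\lambda t}\sum_{k=0}^{\infty}\frac{t^k\left(\lambda^2(\lambda I - A)^{-1}\right)^k}{k!}.$$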
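The sum converges in $\mathcal{L}(H, H)$ because it converges absolutely and $\mathcal{L}(H, H)$ is complete. Here is why it converges absolutely. It follows from the assumption 13.22 in the theorem.
$$\sum_{k=0}^{\infty}\left\|\frac{t^k\left(\lambda^2(\lambda I - A)^{-1}\right)^k}{k!}\right\| = \sum_{k=0}^{\infty}\frac{t^k\lambda^{2k}\left\|\left((\lambda I - A)^{-1}\right)^k\right\|}{k!} \le \sum_{k=0}^{\infty}\frac{t^k\lambda^k M}{k!} = M e^{\lambda t}$$
Thus
$$\|S_\lambda(t)\| \le e^{-\lambda t}M e^{\lambda t} = M.$$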
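The series converges uniformly on any finite interval thanks to the Weierstrass $M$ test. Thus $t \to S_\lambda(t)$ is continuous and it is also routine to verify the semigroup identity. Clearly $\lim_{t\to 0} S_\lambda(t)\,x = x$. It is also the case that $S_\lambda(t)$ is generated by $-\lambda + \lambda^2(\lambda I - A)^{-1} = A_\lambda$. This is easy to show from differentiating the power series which has a continuous derivative. Thus the derivative of $t \to S_\lambda(t)\,x$ is
$$(-\lambda)\,e^{-\lambda t}\sum_{k=0}^{\infty}\frac{t^k\left(\lambda^2(\lambda I - A)^{-1}\right)^k x}{k!} + e^{-\lambda t}\sum_{k=0}^{\infty}\frac{t^k\left(\lambda^2(\lambda I - A)^{-1}\right)^{k+1}x}{k!}$$
$$= \left(-\lambda + \lambda^2(\lambda I - A)^{-1}\right)S_\lambda(t)\,x = S_\lambda(t)\left(-\lambda + \lambda^2(\lambda I - A)^{-1}\right)x.$$
Now let $t \to 0+$ to obtain
$$\left(-\lambda + \lambda^2(\lambda I - A)^{-1}\right)x = A_\lambda x.$$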
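Claim: For $\lambda, \mu > 0$, $(\lambda I - A)^{-1}$ and $(\mu I - A)^{-1}$ commute.
Proof of claim: Suppose
$$y = (\mu I - A)^{-1}(\lambda I - A)^{-1}x \tag{13.25}$$
$$z = (\lambda I - A)^{-1}(\mu I - A)^{-1}x \tag{13.26}$$
I need to show $y = z$. First note $z \in D(A)$ and
$$(\lambda I - A)\,z = (\mu I - A)^{-1}x \in D(A).$$
Hence
$$(\mu I - A)\,z = (\mu - \lambda)\,z + (\lambda I - A)\,z \in D(A).$$
Similarly
$$(\mu I - A)\,y,\ (\lambda I - A)\,y \in D(A).$$
From 13.25
$$(\lambda I - A)(\mu I - A)\,y = x$$
and using 13.26,
$$\begin{aligned}
x &= (\mu I - A)(\lambda I - A)\,z \\
&= ((\mu - \lambda)\,I + (\lambda I - A))(\lambda I - A)\,z \\
&= (\mu - \lambda)(\lambda I - A)\,z + (\lambda I - A)^2 z \\
&= (\lambda I - A)(\mu - \lambda)\,z + (\lambda I - A)(\lambda I - A)\,z \\
&= (\lambda I - A)(\mu - \lambda)\,z + (\lambda I - A)((\lambda - \mu)\,I + (\mu I - A))\,z \\
&= (\lambda I - A)(\mu - \lambda)\,z + (\lambda I - A)((\lambda - \mu)\,z + (\mu I - A)\,z) \\
&= (\lambda I - A)(\mu - \lambda)\,z + (\lambda I - A)(\lambda - \mu)\,z + (\lambda I - A)(\mu I - A)\,z \\
&= (\lambda I - A)(\mu I - A)\,z
\end{aligned}$$
Thus
$$x = (\lambda I - A)(\mu I - A)\,z = (\lambda I - A)(\mu I - A)\,y$$
and so $z = y$. This proves the claim.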
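It follows from the description of $S_\lambda(t)$ that $S_\lambda(t)$ and $S_\mu(s)$ commute and also $A_\lambda$ commutes with $S_\mu(t)$ for any $t$.
I want to show that for each $x \in D(A)$,
$$\lim_{\lambda\to\infty} S_\lambda(t)\,x \equiv S(t)\,x$$
where $S(t)$ is the desired semigroup. Let $x \in D(A)$.
$$\|S_\lambda(t)\,x - S_\mu(t)\,x\| = \left\|\int_0^{t}\frac{d}{dr}\left(S_\lambda(t - r)\,S_\mu(r)\right)x\,dr\right\|$$
Since $A_\lambda$ commutes with $S_\mu(r)$, the following formula follows from 13.24.
$$= \left\|\int_0^{t}\left(S_\lambda(t - r)\,S_\mu(r)\,A_\mu x - S_\lambda(t - r)\,A_\lambda S_\mu(r)\,x\right)dr\right\| \le \int_0^{t}\left\|S_\lambda(t - r)\,S_\mu(r)\,(A_\mu x - A_\lambda x)\right\|dr$$
$$\le M^2 t\,\|A_\mu x - A_\lambda x\| \le M^2 t\left(\|A_\mu x - Ax\| + \|Ax - A_\lambda x\|\right) \le \left(\frac{M\,\|Ax\|}{\mu} + \frac{M\,\|Ax\|}{\lambda}\right)tM^2$$
Hence whenever $\lambda, \mu$ are large enough, $\|S_\lambda(t)\,x - S_\mu(t)\,x\|$ is small. Thus $S_\lambda(t)\,x$ converges uniformly on finite intervals to something denoted by $S(t)\,x$. Therefore, $t \to S(t)\,x$ is continuous for each $x \in D(A)$ and also
$$\|S(t)\,x\| = \lim_{\lambda\to\infty}\|S_\lambda(t)\,x\| \le M\,\|x\|$$
so that $S(t)$ can be extended to a continuous linear map, still called $S(t)$, defined on all of $H$ which also satisfies $\|S(t)\| \le M$ since $D(A)$ is dense in $H$. If $x$ is arbitrary, let $y \in D(A)$ be close to $x$. Then
$$\|S(t)\,x - S_\lambda(t)\,x\| \le \|S(t)\,x - S(t)\,y\| + \|S(t)\,y - S_\lambda(t)\,y\| + \|S_\lambda(t)\,y - S_\lambda(t)\,x\|$$
$$\le 2M\,\|x - y\| + \|S(t)\,y - S_\lambda(t)\,y\|$$
and so $\lim_{\lambda\to\infty} S_\lambda(t)\,x = S(t)\,x$ for all $x$, uniformly on finite intervals. Thus $t \to S(t)\,x$ is continuous for any $x \in H$.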
It remains to verify $A$ generates $S(t)$ and for all $x$, $S(t)\,x - x \to 0$. From the above,
$$S_\lambda(t)\,x = x + \int_0^{t} S_\lambda(s)\,A_\lambda x\,ds \tag{13.27}$$
and so
$$\lim_{t\to 0+}\|S_\lambda(t)\,x - x\| = 0.$$
By the uniform convergence just shown, there exists $\lambda$ large enough that for all $t \in [0, \delta]$,
$$\|S(t)\,x - S_\lambda(t)\,x\| < \varepsilon.$$
Then
$$\limsup_{t\to 0+}\|S(t)\,x - x\| \le \limsup_{t\to 0+}\left(\|S(t)\,x - S_\lambda(t)\,x\| + \|S_\lambda(t)\,x - x\|\right) \le \limsup_{t\to 0+}\left(\varepsilon + \|S_\lambda(t)\,x - x\|\right) = \varepsilon.$$
It follows $\lim_{t\to 0+} S(t)\,x = x$ because $\varepsilon$ is arbitrary.
Next, $\lim_{\lambda\to\infty} A_\lambda x = Ax$ for all $x \in D(A)$ by 13.24. Therefore, passing to the limit in 13.27 yields from the uniform convergence
$$S(t)\,x = x + \int_0^{t} S(s)\,Ax\,ds$$
and by continuity of $s \to S(s)\,Ax$, it follows
$$\lim_{h\to 0+}\frac{S(h)\,x - x}{h} = \lim_{h\to 0+}\frac{1}{h}\int_0^{h} S(s)\,Ax\,ds = Ax.$$
Thus letting $B$ denote the generator of $S(t)$, $D(A) \subseteq D(B)$ and $A = B$ on $D(A)$. It only remains to verify $D(A) = D(B)$.
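To do this, let $\lambda > 0$ and consider the following where $y \in H$ is arbitrary.
$$(\lambda I - B)^{-1}y = (\lambda I - B)^{-1}\left((\lambda I - A)(\lambda I - A)^{-1}y\right)$$
Now $(\lambda I - A)^{-1}y \in D(A) \subseteq D(B)$ and $A = B$ on $D(A)$ and so
$$(\lambda I - A)(\lambda I - A)^{-1}y = (\lambda I - B)(\lambda I - A)^{-1}y$$
which implies
$$(\lambda I - B)^{-1}y = (\lambda I - B)^{-1}\left((\lambda I - B)(\lambda I - A)^{-1}y\right) = (\lambda I - A)^{-1}y.$$
Recall from Proposition 13.28 that an arbitrary element of $D(B)$ is of the form $(\lambda I - B)^{-1}y$ and this has shown every such vector is in $D(A)$, in fact it equals $(\lambda I - A)^{-1}y$. Hence $D(B) \subseteq D(A)$ which shows $A$ generates $S(t)$ and this proves the first half of the theorem.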
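Next suppose $A$ is the generator of a semigroup $S(t)$ having $\|S(t)\| \le M$. Then by Proposition 13.28 for all $\lambda > 0$, $(\lambda I - A)$ is onto and
$$(\lambda I - A)^{-1} = \int_0^{\infty} e^{-\lambda t}\,S(t)\,dt$$
thus
$$\left\|\left((\lambda I - A)^{-1}\right)^n\right\| = \left\|\int_0^{\infty}\cdots\int_0^{\infty} e^{-\lambda(t_1 + \cdots + t_n)}\,S(t_1 + \cdots + t_n)\,dt_1\cdots dt_n\right\|$$
$$\le \int_0^{\infty}\cdots\int_0^{\infty} e^{-\lambda(t_1 + \cdots + t_n)}\,M\,dt_1\cdots dt_n = \frac{M}{\lambda^n}.$$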
13.5.1 An Evolution Equation
When $\Lambda$ generates a continuous semigroup, one can consider a very interesting theorem about evolution equations of the form
$$y' - \Lambda y = g(t)$$
provided $t \to g(t)$ is $C^1$.
Theorem 13.31 Let $\Lambda$ be the generator of $S(t)$, a continuous semigroup on $H$, a Banach space, and let $t \to g(t)$ be in $C^1(0, \infty; H)$. Then there exists a unique solution to the initial value problem
$$y' - \Lambda y = g, \quad y(0) = y_0 \in D(\Lambda)$$
and it is given by
$$y(t) = S(t)\,y_0 + \int_0^{t} S(t - s)\,g(s)\,ds. \tag{13.28}$$
This solution is continuous having continuous derivative and has values in $D(\Lambda)$.
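Formula 13.28 can be tested in the scalar case, where the generator is multiplication by a number $a$ and $S(t) = e^{at}$. The sketch below takes $a = -1$, $g(s) = s$ and $y_0 = 2$, for which the initial value problem $y' + y = t$, $y(0) = 2$ has the elementary solution $y(t) = t - 1 + 3e^{-t}$. These choices, the midpoint quadrature and the function name `mild_solution` are assumptions for illustration.

```python
import math

def mild_solution(a, g, y0, t, steps=2000):
    """y(t) = S(t) y0 + integral_0^t S(t - s) g(s) ds with S(t) = exp(a t), as in 13.28."""
    h = t / steps
    integral = sum(
        math.exp(a * (t - (j + 0.5) * h)) * g((j + 0.5) * h) for j in range(steps)
    ) * h
    return math.exp(a * t) * y0 + integral

a, y0, t = -1.0, 2.0, 1.5
numeric = mild_solution(a, lambda s: s, y0, t)
exact = t - 1.0 + 3.0 * math.exp(-t)   # solves y' = -y + t with y(0) = 2
```

The two values agree up to the quadrature error.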
Proof: First I show the following claim.
Claim: $\int_0^{t} S(t - s)\,g(s)\,ds \in D(\Lambda)$ and
$$\Lambda\left(\int_0^{t} S(t - s)\,g(s)\,ds\right) = S(t)\,g(0) - g(t) + \int_0^{t} S(t - s)\,g'(s)\,ds.$$
Proof of the claim:
$$\frac{1}{h}\left(S(h)\int_0^{t} S(t - s)\,g(s)\,ds - \int_0^{t} S(t - s)\,g(s)\,ds\right)$$
$$= \frac{1}{h}\left(\int_0^{t} S(t - s + h)\,g(s)\,ds - \int_0^{t} S(t - s)\,g(s)\,ds\right)$$
$$= \frac{1}{h}\left(\int_{-h}^{t-h} S(t - s)\,g(s + h)\,ds - \int_0^{t} S(t - s)\,g(s)\,ds\right)$$
$$= \frac{1}{h}\int_{-h}^{0} S(t - s)\,g(s + h)\,ds + \int_0^{t-h} S(t - s)\,\frac{g(s + h) - g(s)}{h}\,ds - \frac{1}{h}\int_{t-h}^{t} S(t - s)\,g(s)\,ds$$
Using the estimate in Theorem 13.26 on Page 334 and the dominated convergence theorem, the limit as $h \to 0$ of the above equals
$$S(t)\,g(0) - g(t) + \int_0^{t} S(t - s)\,g'(s)\,ds$$
which proves the claim.
Since $y_0 \in D(\Lambda)$,
$$S(t)\,\Lambda y_0 = S(t)\lim_{h\to 0}\frac{S(h)\,y_0 - y_0}{h} = \lim_{h\to 0}\frac{S(t + h) - S(t)}{h}\,y_0 = \lim_{h\to 0}\frac{S(h)\,S(t)\,y_0 - S(t)\,y_0}{h} \tag{13.29}$$
Since this limit exists, the last limit in the above exists and equals
$$\Lambda S(t)\,y_0 \tag{13.30}$$
and so $S(t)\,y_0 \in D(\Lambda)$. Now consider 13.28.
$$\frac{y(t + h) - y(t)}{h} = \frac{S(t + h) - S(t)}{h}\,y_0 + \frac{1}{h}\left(\int_0^{t+h} S(t - s + h)\,g(s)\,ds - \int_0^{t} S(t - s)\,g(s)\,ds\right)$$
$$= \frac{S(t + h) - S(t)}{h}\,y_0 + \frac{1}{h}\int_t^{t+h} S(t - s + h)\,g(s)\,ds + \frac{1}{h}\left(S(h)\int_0^{t} S(t - s)\,g(s)\,ds - \int_0^{t} S(t - s)\,g(s)\,ds\right)$$
From the claim and 13.29, 13.30 the limit of the right side is
$$\Lambda S(t)\,y_0 + g(t) + \Lambda\left(\int_0^{t} S(t - s)\,g(s)\,ds\right) = \Lambda\left(S(t)\,y_0 + \int_0^{t} S(t - s)\,g(s)\,ds\right) + g(t).$$
Hence
$$y'(t) = \Lambda y(t) + g(t)$$
and from the formula, $y'$ is continuous since by the claim and 13.30 it also equals
$$S(t)\,\Lambda y_0 + g(t) + S(t)\,g(0) - g(t) + \int_0^{t} S(t - s)\,g'(s)\,ds$$
which is continuous. The claim and 13.30 also show $y(t) \in D(\Lambda)$. This proves the existence part of the theorem.
It remains to prove the uniqueness part. It suffices to show that if
$$y' - \Lambda y = 0, \quad y(0) = 0$$
and $y$ is $C^1$ having values in $D(\Lambda)$, then $y = 0$. Suppose then that $y$ is this way. Letting $0 < s < t$,
$$\frac{d}{ds}\left(S(t - s)\,y(s)\right) \equiv \lim_{h\to 0}\left(S(t - s - h)\,\frac{y(s + h) - y(s)}{h} - \frac{S(t - s)\,y(s) - S(t - s - h)\,y(s)}{h}\right)$$
provided the limit exists. Since $y'$ exists and $y(s) \in D(\Lambda)$, this equals
$$S(t - s)\,y'(s) - S(t - s)\,\Lambda y(s) = 0.$$
Let $y^* \in H'$. This has shown that on the open interval $(0, t)$ the function $s \to y^*(S(t - s)\,y(s))$ has a derivative equal to 0. Also from continuity of $S$ and $y$, this function is continuous on $[0, t]$. Therefore, it is constant on $[0, t]$ by the mean value theorem. At $s = 0$, this function equals 0. Therefore, it equals 0 on $[0, t]$. Thus for fixed $s > 0$ and letting $t > s$, $y^*(S(t - s)\,y(s)) = 0$. Now let $t$ decrease toward $s$. Then $y^*(y(s)) = 0$ and since $y^*$ was arbitrary, it follows $y(s) = 0$. This proves uniqueness.
13.5.2 Adjoints, Hilbert Space
In Hilbert space, there are some special things which are true.
Definition 13.32 Let $A$ be a densely defined closed operator on $H$, a real Hilbert space. Then $A^*$ is defined as follows.
$$D(A^*) \equiv \{y \in H : |(Ax, y)| \le C\,|x| \text{ for all } x \in D(A)\}$$
Then since $D(A)$ is dense, there exists a unique element of $H$ denoted by $A^*y$ such that
$$(Ax, y) = (x, A^*y)$$
for all $x \in D(A)$.
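Lemma 13.33 Let $A$ be closed and densely defined on $D(A) \subseteq H$, a Hilbert space. Then $A^*$ is also closed and densely defined. Also $(A^*)^* = A$. In addition to this, if $(\lambda I - A)^{-1} \in \mathcal{L}(H, H)$, then $(\lambda I - A^*)^{-1} \in \mathcal{L}(H, H)$ and
$$\left(\left((\lambda I - A)^{-1}\right)^n\right)^* = \left((\lambda I - A^*)^{-1}\right)^n.$$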
Proof: Denote by $[x, y]$ an ordered pair in $H \times H$. Define $\tau : H \times H \to H \times H$ by
$$\tau[x, y] \equiv [-y, x].$$
Then the definition of adjoint implies that for $\mathcal{G}(B)$ equal to the graph of $B$,
$$\mathcal{G}(A^*) = (\tau\mathcal{G}(A))^{\perp} \tag{13.31}$$
In this notation the inner product on $H \times H$ with respect to which $\perp$ is defined is given by
$$([x, y], [a, b]) \equiv (x, a) + (y, b).$$
Here is why this is so. For $[x, A^*x] \in \mathcal{G}(A^*)$ it follows that for all $y \in D(A)$
$$([x, A^*x], [-Ay, y]) = -(Ay, x) + (y, A^*x) = 0$$
and so $[x, A^*x] \in (\tau\mathcal{G}(A))^{\perp}$ which shows
$$\mathcal{G}(A^*) \subseteq (\tau\mathcal{G}(A))^{\perp}.$$
To obtain the other inclusion, let $[a, b] \in (\tau\mathcal{G}(A))^{\perp}$. This means that for all $x \in D(A)$,
$$([a, b], [-Ax, x]) = 0.$$
In other words, for all $x \in D(A)$,
$$(Ax, a) = (x, b)$$
and so $|(Ax, a)| \le C\,|x|$ for all $x \in D(A)$ which shows $a \in D(A^*)$ and
$$(x, A^*a) = (x, b)$$
for all $x \in D(A)$. Therefore, since $D(A)$ is dense, it follows $b = A^*a$ and so $[a, b] \in \mathcal{G}(A^*)$. This shows the other inclusion.
Note that if $V$ is any subspace of the Hilbert space $H \times H$,
$$\left(V^{\perp}\right)^{\perp} = \overline{V}$$
and $S^{\perp}$ is always a closed subspace. Also $\tau$ and $\perp$ commute. The reason for this is that $[x, y] \in (\tau V)^{\perp}$ means that
$$(x, -b) + (y, a) = 0$$
for all $[a, b] \in V$ and $[x, y] \in \tau\left(V^{\perp}\right)$ means $[-y, x] \in V^{\perp}$ so for all $[a, b] \in V$,
$$-(y, a) + (x, b) = 0$$
which says the same thing. It is also clear that $\tau \circ \tau$ has the effect of multiplication by $-1$.
It follows from the above description of the graph of $A^*$ that even if $\mathcal{G}(A)$ were not closed it would still be the case that $\mathcal{G}(A^*)$ is closed.
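Why is $D(A^*)$ dense? Suppose $z \in D(A^*)^{\perp}$. Then for all $y \in D(A^*)$ so that $[y, A^*y] \in \mathcal{G}(A^*)$, it follows $[z, 0] \in \mathcal{G}(A^*)^{\perp} = \left((\tau\mathcal{G}(A))^{\perp}\right)^{\perp} = \tau\mathcal{G}(A)$ but this implies
$$[0, z] \in \mathcal{G}(A)$$
and so $z = A0 = 0$. Thus $D(A^*)$ must be dense since there is no nonzero vector in $D(A^*)^{\perp}$.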
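Since $A$ is a closed operator, meaning $\mathcal{G}(A)$ is closed in $H \times H$, it follows from the above formula that
$$\mathcal{G}\left((A^*)^*\right) = \left(\tau\left((\tau\mathcal{G}(A))^{\perp}\right)\right)^{\perp} = \left(\tau\tau\,(\mathcal{G}(A))^{\perp}\right)^{\perp} = \left(-(\mathcal{G}(A))^{\perp}\right)^{\perp} = \left(\mathcal{G}(A)^{\perp}\right)^{\perp} = \mathcal{G}(A)$$
and so $(A^*)^* = A$.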
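Now consider the final claim. First let $y \in D(A^*) = D(\lambda I - A^*)$. Then letting $x \in H$ be arbitrary,
$$\left(x, \left((\lambda I - A)(\lambda I - A)^{-1}\right)^* y\right) = \left((\lambda I - A)(\lambda I - A)^{-1}x, y\right) = \left(x, \left((\lambda I - A)^{-1}\right)^*(\lambda I - A^*)\,y\right).$$
Thus
$$\left((\lambda I - A)(\lambda I - A)^{-1}\right)^* = I = \left((\lambda I - A)^{-1}\right)^*(\lambda I - A^*) \tag{13.32}$$
on $D(A^*)$. Next let $x \in D(A) = D(\lambda I - A)$ and $y \in H$ arbitrary.
$$(x, y) = \left((\lambda I - A)^{-1}(\lambda I - A)\,x, y\right) = \left((\lambda I - A)\,x, \left((\lambda I - A)^{-1}\right)^* y\right)$$
Now it follows $\left|\left((\lambda I - A)\,x, \left((\lambda I - A)^{-1}\right)^* y\right)\right| \le |y|\,|x|$ for any $x \in D(A)$ and so
$$\left((\lambda I - A)^{-1}\right)^* y \in D(A^*).$$
Hence
$$(x, y) = \left(x, (\lambda I - A^*)\left((\lambda I - A)^{-1}\right)^* y\right).$$
Since $x \in D(A)$ is arbitrary and $D(A)$ is dense, it follows
$$(\lambda I - A^*)\left((\lambda I - A)^{-1}\right)^* = I \tag{13.33}$$
From 13.32 and 13.33 it follows
$$(\lambda I - A^*)^{-1} = \left((\lambda I - A)^{-1}\right)^*$$
and $(\lambda I - A^*)$ is one to one and onto with continuous inverse. Finally, from the above,
$$\left((\lambda I - A^*)^{-1}\right)^n = \left(\left((\lambda I - A)^{-1}\right)^*\right)^n = \left(\left((\lambda I - A)^{-1}\right)^n\right)^*.$$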
With this preparation, here is an interesting result about the adjoint of the generator of a continuous bounded semigroup. I found this in Balakrishnan [8].
Theorem 13.34 Suppose $A$ is a densely defined closed operator which generates a continuous semigroup $S(t)$. Then $A^*$ is also a closed densely defined operator which generates $S^*(t)$ and $S^*(t)$ is also a continuous semigroup.
Proof: First suppose $S(t)$ is also a bounded semigroup, $\|S(t)\| \le M$. From Lemma 13.33 $A^*$ is closed and densely defined. It follows from the Hille Yosida theorem, Theorem 13.30 that
$$\left\|\left((\lambda I - A)^{-1}\right)^n\right\| \le \frac{M}{\lambda^n}.$$
From Lemma 13.33 and the fact the adjoint of a bounded linear operator preserves the norm,
$$\frac{M}{\lambda^n} \ge \left\|\left(\left((\lambda I - A)^{-1}\right)^n\right)^*\right\| = \left\|\left(\left((\lambda I - A)^{-1}\right)^*\right)^n\right\| = \left\|\left((\lambda I - A^*)^{-1}\right)^n\right\|$$
and so by Theorem 13.30 again it follows $A^*$ generates a continuous semigroup $T(t)$ which satisfies $\|T(t)\| \le M$. I need to identify $T(t)$ with $S^*(t)$. However, from the proof of Theorem 13.30 and Lemma 13.33, it follows that for $x \in D(A^*)$ and a suitable sequence $\{\lambda_n\}$,
$$(T(t)\,x, y) = \left(\lim_{n\to\infty} e^{-\lambda_n t}\sum_{k=0}^{\infty}\frac{t^k\left(\lambda_n^2(\lambda_n I - A^*)^{-1}\right)^k}{k!}\,x,\ y\right)$$
$$= \lim_{n\to\infty}\left(e^{-\lambda_n t}\sum_{k=0}^{\infty}\frac{t^k\left(\left(\lambda_n^2(\lambda_n I - A)^{-1}\right)^k\right)^*}{k!}\,x,\ y\right)$$
$$= \lim_{n\to\infty}\left(x,\ e^{-\lambda_n t}\left(\sum_{k=0}^{\infty}\frac{t^k\left(\lambda_n^2(\lambda_n I - A)^{-1}\right)^k}{k!}\right)y\right) = (x, S(t)\,y) = (S^*(t)\,x, y).$$
Therefore, since $y$ is arbitrary, $S^*(t) = T(t)$ on $D(A^*)$, a dense set, and this shows the two are equal. This proves the theorem in the case where $S(t)$ is also bounded.
Next only assume $S(t)$ is a continuous semigroup. Then by Proposition 13.28 there exists $\alpha > 0$ such that
$$\|S(t)\| \le M e^{\alpha t}.$$
Then consider the operator $-\alpha I + A$ and the bounded semigroup $e^{-\alpha t}S(t)$. For $x \in D(A)$
$$\lim_{h\to 0+}\frac{e^{-\alpha h}S(h)\,x - x}{h} = \lim_{h\to 0+}\left(e^{-\alpha h}\,\frac{S(h)\,x - x}{h} + \frac{e^{-\alpha h} - 1}{h}\,x\right) = -\alpha x + Ax.$$
Thus $-\alpha I + A$ generates $e^{-\alpha t}S(t)$ and it follows from the first part that $-\alpha I + A^*$ generates $e^{-\alpha t}S^*(t)$. Thus
$$-\alpha x + A^*x = \lim_{h\to 0+}\frac{e^{-\alpha h}S^*(h)\,x - x}{h} = \lim_{h\to 0+}\left(e^{-\alpha h}\,\frac{S^*(h)\,x - x}{h} + \frac{e^{-\alpha h} - 1}{h}\,x\right) = -\alpha x + \lim_{h\to 0+}\frac{S^*(h)\,x - x}{h}$$
showing that $A^*$ generates $S^*(t)$. It follows from Proposition 13.28 that $A^*$ is closed and densely defined. It is obvious $S^*(t)$ is a semigroup. Why is it continuous? This also follows from the first part of the argument which establishes that
$$e^{-\alpha t}S^*(t)$$
is continuous. This proves the theorem.
13.5.3 Adjoints, Reflexive Banach Space
Here the adjoint of a generator of a semigroup is considered. I will show that the adjoint of the generator generates the adjoint of the semigroup in a reflexive Banach space. This is about as far as you can go although a general but less satisfactory result is given in Yosida [43].
Definition 13.35 Let $A$ be a densely defined closed operator on $H$, a real Banach space. Then $A^*$ is defined as follows.
$$D(A^*) \equiv \{y^* \in H' : |y^*(Ax)| \le C\,\|x\| \text{ for all } x \in D(A)\}$$
Then since $D(A)$ is dense, there exists a unique element of $H'$ denoted by $A^*y^*$ such that
$$A^*(y^*)(x) = y^*(Ax)$$
for all $x \in D(A)$.
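Lemma 13.36 Let $A$ be closed and densely defined on $D(A) \subseteq H$, a reflexive Banach space. Then $A^*$ is also closed and densely defined. Also $(A^*)^* = A$. In addition to this, if $(\lambda I - A)^{-1} \in \mathcal{L}(H, H)$, then $(\lambda I - A^*)^{-1} \in \mathcal{L}(H', H')$ and
$$\left(\left((\lambda I - A)^{-1}\right)^n\right)^* = \left((\lambda I - A^*)^{-1}\right)^n.$$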
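Proof: Denote by $[x, y]$ an ordered pair in $H \times H$. Define $\tau : H \times H \to H \times H$ by
$$\tau[x, y] \equiv [-y, x].$$
A similar notation will apply to $H' \times H'$. Then the definition of adjoint implies that for $\mathcal{G}(B)$ equal to the graph of $B$,
$$\mathcal{G}(A^*) = (\tau\mathcal{G}(A))^{\perp} \tag{13.34}$$
For $S \subseteq H \times H$, define $S^{\perp}$ by
$$\{[a^*, b^*] \in H' \times H' : a^*(x) + b^*(y) = 0 \text{ for all } [x, y] \in S\}.$$
If $S \subseteq H' \times H'$ a similar definition holds.
$$\{[x, y] \in H \times H : a^*(x) + b^*(y) = 0 \text{ for all } [a^*, b^*] \in S\}$$
Here is why 13.34 is so. For $[x^*, A^*x^*] \in \mathcal{G}(A^*)$ it follows that for all $y \in D(A)$
$$x^*(Ay) = A^*x^*(y)$$
and so for all $[y, Ay] \in \mathcal{G}(A)$,
$$-x^*(Ay) + A^*x^*(y) = 0$$
which is what it means to say $[x^*, A^*x^*] \in (\tau\mathcal{G}(A))^{\perp}$. This shows
$$\mathcal{G}(A^*) \subseteq (\tau\mathcal{G}(A))^{\perp}.$$
To obtain the other inclusion, let $[a^*, b^*] \in (\tau\mathcal{G}(A))^{\perp}$. This means that for all $[x, Ax] \in \mathcal{G}(A)$,
$$-a^*(Ax) + b^*(x) = 0.$$
In other words, for all $x \in D(A)$,
$$|a^*(Ax)| \le \|b^*\|\,\|x\|$$
which means by definition, $a^* \in D(A^*)$ and $A^*a^* = b^*$. Thus $[a^*, b^*] \in \mathcal{G}(A^*)$. This shows the other inclusion.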
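Note that if $V$ is any subspace of $H \times H$,
$$\left(V^{\perp}\right)^{\perp} = \overline{V}$$
and $S^{\perp}$ is always a closed subspace. Also $\tau$ and $\perp$ commute. The reason for this is that $[x^*, y^*] \in (\tau V)^{\perp}$ means that
$$-x^*(b) + y^*(a) = 0$$
for all $[a, b] \in V$ and $[x^*, y^*] \in \tau\left(V^{\perp}\right)$ means $[-y^*, x^*] \in -\left(V^{\perp}\right) = V^{\perp}$ so for all $[a, b] \in V$,
$$-y^*(a) + x^*(b) = 0$$
which says the same thing. It is also clear that $\tau \circ \tau$ has the effect of multiplication by $-1$. If $V \subseteq H' \times H'$, the argument for commuting $\perp$ and $\tau$ is similar.
It follows from the above description of the graph of $A^*$ that even if $\mathcal{G}(A)$ were not closed it would still be the case that $\mathcal{G}(A^*)$ is closed.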
Why is $D(A^*)$ dense? If it is not dense, then by a typical application of the Hahn Banach theorem, there exists $y^{**} \in H''$ such that $y^{**}(D(A^*)) = 0$ but $y^{**} \neq 0$. Since $H$ is reflexive, there exists $y \in H$, $y \neq 0$, such that $x^*(y) = 0$ for all $x^* \in D(A^*)$. Thus
$$[y, 0] \in \mathcal{G}(A^*)^{\perp} = \left((\tau\mathcal{G}(A))^{\perp}\right)^{\perp} = \tau\mathcal{G}(A)$$
and so $[0, y] \in \mathcal{G}(A)$ which means $y = A0 = 0$, a contradiction. Thus $D(A^*)$ is indeed dense. Note this is where it was important to assume the space is reflexive. If you consider $C([0, 1])$ it is not dense in $L^{\infty}([0, 1])$ but if $f \in L^1([0, 1])$ satisfies $\int_0^1 fg\,dm = 0$ for all $g \in C([0, 1])$, then $f = 0$. Hence there is no nonzero $f \in C([0, 1])^{\perp}$.
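Since $A$ is a closed operator, meaning $\mathcal{G}(A)$ is closed in $H \times H$, it follows from the above formula that
$$\mathcal{G}\left((A^*)^*\right) = \left(\tau\left((\tau\mathcal{G}(A))^{\perp}\right)\right)^{\perp} = \left(\tau\tau\,(\mathcal{G}(A))^{\perp}\right)^{\perp} = \left(-(\mathcal{G}(A))^{\perp}\right)^{\perp} = \left(\mathcal{G}(A)^{\perp}\right)^{\perp} = \mathcal{G}(A)$$
and so $(A^*)^* = A$.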
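Now consider the final claim. First let $y^* \in D(A^*) = D(\lambda I - A^*)$. Then letting $x \in H$ be arbitrary,
$$y^*(x) = \left((\lambda I - A)(\lambda I - A)^{-1}\right)^* y^*(x) = y^*\left((\lambda I - A)(\lambda I - A)^{-1}x\right).$$
Since $y^* \in D(A^*)$ and $(\lambda I - A)^{-1}x \in D(A)$, this equals
$$(\lambda I - A)^* y^*\left((\lambda I - A)^{-1}x\right).$$
Now by definition, this equals
$$\left((\lambda I - A)^{-1}\right)^*(\lambda I - A)^* y^*(x).$$
It follows that for $y^* \in D(A^*)$,
$$\left((\lambda I - A)^{-1}\right)^*(\lambda I - A)^* y^* = \left((\lambda I - A)^{-1}\right)^*(\lambda I - A^*)\,y^* = y^* \tag{13.35}$$
Next let $y^* \in H'$ be arbitrary and $x \in D(A)$.
$$\begin{aligned}
y^*(x) &= y^*\left((\lambda I - A)^{-1}(\lambda I - A)\,x\right) \\
&= \left((\lambda I - A)^{-1}\right)^* y^*\left((\lambda I - A)\,x\right) \\
&= (\lambda I - A)^*\left((\lambda I - A)^{-1}\right)^* y^*(x)
\end{aligned}$$
In going from the second to the third line, the first line shows $\left((\lambda I - A)^{-1}\right)^* y^* \in D(A^*)$ and so the third line follows. Since $D(A)$ is dense, it follows
$$(\lambda I - A^*)\left((\lambda I - A)^{-1}\right)^* = I \tag{13.36}$$
Then 13.35 and 13.36 show $\lambda I - A^*$ is one to one and onto from $D(A^*)$ to $H'$ and
$$(\lambda I - A^*)^{-1} = \left((\lambda I - A)^{-1}\right)^*.$$
Finally, from the above,
$$\left((\lambda I - A^*)^{-1}\right)^n = \left(\left((\lambda I - A)^{-1}\right)^*\right)^n = \left(\left((\lambda I - A)^{-1}\right)^n\right)^*.$$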
With this preparation, here is an interesting result about the adjoint of the generator of a continuous bounded semigroup.
Theorem 13.37 Suppose $A$ is a densely defined closed operator which generates a continuous semigroup $S(t)$. Then $A^*$ is also a closed densely defined operator which generates $S^*(t)$ and $S^*(t)$ is also a continuous semigroup.
Proof: First suppose $S(t)$ is also a bounded semigroup, $\|S(t)\| \le M$. From Lemma 13.36 $A^*$ is closed and densely defined. It follows from the Hille Yosida theorem, Theorem 13.30 that
$$\left\|\left((\lambda I - A)^{-1}\right)^n\right\| \le \frac{M}{\lambda^n}.$$
From Lemma 13.36 and the fact the adjoint of a bounded linear operator preserves the norm,
$$\frac{M}{\lambda^n} \ge \left\|\left(\left((\lambda I - A)^{-1}\right)^n\right)^*\right\| = \left\|\left(\left((\lambda I - A)^{-1}\right)^*\right)^n\right\| = \left\|\left((\lambda I - A^*)^{-1}\right)^n\right\|$$
and so by Theorem 13.30 again it follows $A^*$ generates a continuous semigroup $T(t)$ which satisfies $\|T(t)\| \le M$. I need to identify $T(t)$ with $S^*(t)$. However, from the proof of Theorem 13.30 and Lemma 13.36, it follows that for $x^* \in D(A^*)$ and a suitable sequence $\{\lambda_n\}$,
$$T(t)\,x^*(y) = \lim_{n\to\infty} e^{-\lambda_n t}\sum_{k=0}^{\infty}\frac{t^k\left(\lambda_n^2(\lambda_n I - A^*)^{-1}\right)^k}{k!}\,x^*(y)$$
$$= \lim_{n\to\infty} e^{-\lambda_n t}\sum_{k=0}^{\infty}\frac{t^k\left(\left(\lambda_n^2(\lambda_n I - A)^{-1}\right)^k\right)^*}{k!}\,x^*(y)$$
$$= \lim_{n\to\infty} x^*\left(e^{-\lambda_n t}\left(\sum_{k=0}^{\infty}\frac{t^k\left(\lambda_n^2(\lambda_n I - A)^{-1}\right)^k}{k!}\right)y\right) = x^*(S(t)\,y) = S^*(t)\,x^*(y).$$
Therefore, since $y$ is arbitrary, $S^*(t) = T(t)$ on $D(A^*)$, a dense set, and this shows the two are equal. In particular, $S^*(t)$ is a semigroup because $T(t)$ is. This proves the theorem in the case where $S(t)$ is also bounded.
Next only assume $S(t)$ is a continuous semigroup. Then by Proposition 13.28 there exists $\alpha > 0$ such that
$$\|S(t)\| \le M e^{\alpha t}.$$
Then consider the operator $-\alpha I + A$ and the bounded semigroup $e^{-\alpha t}S(t)$. For $x \in D(A)$
$$\lim_{h\to 0+}\frac{e^{-\alpha h}S(h)\,x - x}{h} = \lim_{h\to 0+}\left(e^{-\alpha h}\,\frac{S(h)\,x - x}{h} + \frac{e^{-\alpha h} - 1}{h}\,x\right) = -\alpha x + Ax.$$
Thus $-\alpha I + A$ generates $e^{-\alpha t}S(t)$ and it follows from the first part that $-\alpha I + A^*$ generates the semigroup $e^{-\alpha t}S^*(t)$. Thus
$$-\alpha x^* + A^*x^* = \lim_{h\to 0+}\frac{e^{-\alpha h}S^*(h)\,x^* - x^*}{h} = \lim_{h\to 0+}\left(e^{-\alpha h}\,\frac{S^*(h)\,x^* - x^*}{h} + \frac{e^{-\alpha h} - 1}{h}\,x^*\right) = -\alpha x^* + \lim_{h\to 0+}\frac{S^*(h)\,x^* - x^*}{h}$$
showing that $A^*$ generates $S^*(t)$. It follows from Proposition 13.28 that $A^*$ is closed and densely defined. It is obvious $S^*(t)$ is a semigroup. Why is it continuous? This also follows from the first part of the argument which establishes that
$$t \to e^{-\alpha t}S^*(t)\,x^*$$
is continuous. This proves the theorem.
13.6 Exercises
1. For $f, g \in C([0,1])$ let $(f,g) = \int_{0}^{1} f(x)\overline{g(x)}\,dx$. Is this an inner product space? Is it a Hilbert space? What does the Cauchy Schwarz inequality say in this context?
2. Suppose the following conditions hold.
$$(x,x) \ge 0, \tag{13.37}$$
$$(x,y) = \overline{(y,x)}. \tag{13.38}$$
$$(ax + by, z) = a(x,z) + b(y,z). \tag{13.39}$$
These are the same conditions for an inner product except it is no longer required that $(x,x) = 0$ if and only if $x = 0$. Does the Cauchy Schwarz inequality hold in the following form?
$$|(x,y)| \le (x,x)^{1/2}(y,y)^{1/2}.$$
3. Let $S$ denote the unit sphere in a Banach space, $X$,
$$S \equiv \{x \in X : \|x\| = 1\}.$$
Show that if $Y$ is a Banach space, then $A \in \mathcal{L}(X,Y)$ is compact if and only if $A(S)$ is precompact, that is, $\overline{A(S)}$ is compact. $A \in \mathcal{L}(X,Y)$ is said to be compact if whenever $B$ is a bounded subset of $X$, it follows $\overline{A(B)}$ is a compact subset of $Y$. In words, $A$ takes bounded sets to precompact sets.
4. Show that $A \in \mathcal{L}(X,Y)$ is compact if and only if $A^{*}$ is compact. Hint: Use the result of 3 and the Ascoli Arzela theorem to argue that for $S^{*}$ the unit ball in $Y'$, there is a subsequence, $\{y_n^{*}\} \subseteq S^{*}$ such that $y_n^{*}$ converges uniformly on the compact set, $\overline{A(S)}$. Thus $\{A^{*}y_n^{*}\}$ is a Cauchy sequence in $X'$. To get the other implication, apply the result just obtained for the operators $A^{*}$ and $A^{**}$. Then use results about the embedding of a Banach space into its double dual space.
5. Prove the parallelogram identity,
$$|x + y|^{2} + |x - y|^{2} = 2|x|^{2} + 2|y|^{2}.$$
Next suppose $(X, \|\cdot\|)$ is a real normed linear space and the parallelogram identity holds. Can it be concluded there exists an inner product $(\cdot,\cdot)$ such that $\|x\| = (x,x)^{1/2}$?
6. Let $K$ be a closed, bounded and convex set in $\mathbb{R}^{n}$ and let $f : K \to \mathbb{R}^{n}$ be continuous and let $y \in \mathbb{R}^{n}$. Show using the Brouwer fixed point theorem there exists a point $x \in K$ such that $P(y - f(x) + x) = x$. Next show that $(y - f(x), z - x) \le 0$ for all $z \in K$. The existence of this $x$ is known as Browder's lemma and it has great significance in the study of certain types of nonlinear operators. Now suppose $f : \mathbb{R}^{n} \to \mathbb{R}^{n}$ is continuous and satisfies
$$\lim_{|x|\to\infty}\frac{(f(x),x)}{|x|} = \infty.$$
Show using Browder's lemma that $f$ is onto.
7. Show that every inner product space is uniformly convex. This means that if $x_n, y_n$ are vectors whose norms are no larger than 1 and if $\|x_n + y_n\| \to 2$, then $\|x_n - y_n\| \to 0$.
8. Let H be separable and let S be an orthonormal set. Show S is countable.
Hint: How far apart are two elements of the orthonormal set?
9. Suppose $\{x_1, \cdots, x_m\}$ is a linearly independent set of vectors in a normed linear space. Show $\mathrm{span}(x_1, \cdots, x_m)$ is a closed subspace. Also show every orthonormal set of vectors is linearly independent.
10. Show every Hilbert space, separable or not, has a maximal orthonormal set of vectors.
11. Prove Bessel's inequality, which says that if $\{x_n\}_{n=1}^{\infty}$ is an orthonormal set in $H$, then for all $x \in H$, $\|x\|^{2} \ge \sum_{k=1}^{\infty}|(x,x_k)|^{2}$. Hint: Show that if $M = \mathrm{span}(x_1, \cdots, x_n)$, then $Px = \sum_{k=1}^{n} x_k(x,x_k)$. Then observe $\|x\|^{2} = \|x - Px\|^{2} + \|Px\|^{2}$.
12. Show $S$ is a maximal orthonormal set if and only if $\mathrm{span}(S)$ is dense in $H$, where $\mathrm{span}(S)$ is defined as
$$\mathrm{span}(S) \equiv \{\text{all finite linear combinations of elements of } S\}.$$
13. Suppose $\{x_n\}_{n=1}^{\infty}$ is a maximal orthonormal set. Show that
$$x = \sum_{n=1}^{\infty}(x,x_n)x_n \equiv \lim_{N\to\infty}\sum_{n=1}^{N}(x,x_n)x_n$$
and $\|x\|^{2} = \sum_{i=1}^{\infty}|(x,x_i)|^{2}$. Also show $(x,y) = \sum_{n=1}^{\infty}(x,x_n)\overline{(y,x_n)}$. Hint: For the last part of this, you might proceed as follows. Show that
$$((x,y)) \equiv \sum_{n=1}^{\infty}(x,x_n)\overline{(y,x_n)}$$
is a well defined inner product on the Hilbert space which delivers the same norm as the original inner product. Then you could verify that there exists a formula for the inner product in terms of the norm and conclude the two inner products, $(\cdot,\cdot)$ and $((\cdot,\cdot))$ must coincide.
14. Suppose $X$ is an infinite dimensional Banach space and suppose $\{x_1 \cdots x_n\}$ are linearly independent with $\|x_i\| = 1$. By Problem 9 $\mathrm{span}(x_1 \cdots x_n) \equiv X_n$ is a closed linear subspace of $X$. Now let $z \notin X_n$ and pick $y \in X_n$ such that $\|z - y\| \le 2\,\mathrm{dist}(z, X_n)$ and let
$$x_{n+1} = \frac{z - y}{\|z - y\|}.$$
Show the sequence $\{x_k\}$ satisfies $\|x_n - x_k\| \ge 1/2$ whenever $k < n$. Now show the unit ball $\{x \in X : \|x\| \le 1\}$ in a normed linear space is compact if and only if $X$ is finite dimensional. Hint:
$$\left\|\frac{z - y}{\|z - y\|} - x_k\right\| = \left\|\frac{z - y - x_k\|z - y\|}{\|z - y\|}\right\|.$$
15. Show that if $A$ is a self adjoint operator on a Hilbert space and $Ay = \lambda y$ for $\lambda$ a complex number and $y \ne 0$, then $\lambda$ must be real. Also verify that if $A$ is self adjoint and $Ax = \mu x$ while $Ay = \lambda y$, then if $\mu \ne \lambda$, it must be the case that $(x,y) = 0$.
16. Theorem 13.31 gives the existence and uniqueness for an evolution equation of the form
$$y' - \Lambda y = g, \quad y(0) = y_0 \in H$$
where $g$ is in $C^{1}(0,\infty;H)$ for $H$ a Banach space. Recall $\Lambda$ was the generator of a continuous semigroup, $S(h)$. Generalize this to an equation of the form
$$y' - \Lambda y = g + Ly, \quad y(0) = y_0 \in H$$
where $L$ is a continuous linear map. Hint: You might consider $\Lambda + L$ and show it generates a continuous semigroup. Then apply the theorem.
Representation Theorems
14.1 Radon Nikodym Theorem
This chapter is on various representation theorems. The first theorem, the Radon Nikodym Theorem, is a representation theorem for one measure in terms of another. The approach given here is due to Von Neumann and depends on the Riesz representation theorem for Hilbert space, Theorem 13.14 on Page 326.
Definition 14.1 Let $\mu$ and $\lambda$ be two measures defined on a $\sigma$-algebra, $\mathcal{S}$, of subsets of a set, $\Omega$. $\lambda$ is absolutely continuous with respect to $\mu$, written as $\lambda \ll \mu$, if $\lambda(E) = 0$ whenever $\mu(E) = 0$.
It is not hard to think of examples which should be like this. For example, suppose one measure is volume and the other is mass. If the volume of something is zero, it is reasonable to expect the mass of it should also be equal to zero. In this case, there is a function called the density which is integrated over volume to obtain mass. The Radon Nikodym theorem is an abstract version of this notion. Essentially, it gives the existence of the density function.
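On a finite set the density can be written down directly, which may help fix ideas before the general theorem. The sketch below is an illustration only: measures are assumed to be given as dictionaries of point masses, and the names `radon_nikodym_density`, `measure` and `integrate` are hypothetical helpers introduced here, not notation from the notes.

```python
from fractions import Fraction


def radon_nikodym_density(lam: dict, mu: dict) -> dict:
    """Return f with lam(E) = sum over x in E of f(x) * mu({x}).

    Assumes lam and mu are point-mass dictionaries on the same finite set
    and that lam is absolutely continuous with respect to mu.
    """
    density = {}
    for x, mu_x in mu.items():
        lam_x = lam.get(x, Fraction(0))
        if mu_x == 0:
            if lam_x != 0:
                raise ValueError("lam is not absolutely continuous with respect to mu")
            density[x] = Fraction(0)  # any value works on a point of mu measure zero
        else:
            density[x] = lam_x / mu_x
    return density


def measure(weights: dict, E) -> Fraction:
    """Measure of the finite set E for a measure given by point masses."""
    return sum((weights[x] for x in E), Fraction(0))


def integrate(f: dict, mu: dict, E) -> Fraction:
    """Integral of f over E with respect to the point-mass measure mu."""
    return sum((f[x] * mu[x] for x in E), Fraction(0))
```

With mu playing the role of volume and lam the role of mass, `radon_nikodym_density` returns mass per unit volume at each point, and `integrate(f, mu, E)` recovers `measure(lam, E)` for every subset E.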
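Theorem 14.2 (Radon Nikodym) Let $\lambda$ and $\mu$ be finite measures defined on a $\sigma$-algebra, $\mathcal{S}$, of subsets of $\Omega$. Suppose $\lambda \ll \mu$. Then there exists a unique $f \in L^{1}(\Omega,\mu)$ such that $f(x) \ge 0$ and
$$\lambda(E) = \int_{E} f\,d\mu.$$
If it is not necessarily the case that $\lambda \ll \mu$, there are two measures, $\lambda_{\perp}$ and $\lambda_{||}$ such that $\lambda = \lambda_{\perp} + \lambda_{||}$, $\lambda_{||} \ll \mu$ and there exists a set of $\mu$ measure zero, $N$ such that for all $E$ measurable, $\lambda_{\perp}(E) = \lambda(E \cap N) = \lambda_{\perp}(E \cap N)$. In this case the two measures, $\lambda_{\perp}$ and $\lambda_{||}$ are unique and the representation of $\lambda = \lambda_{\perp} + \lambda_{||}$ is called the Lebesgue decomposition of $\lambda$. The measure $\lambda_{||}$ is the absolutely continuous part of $\lambda$ and $\lambda_{\perp}$ is called the singular part of $\lambda$.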
Proof: Let $\Lambda : L^{2}(\Omega, \mu + \lambda) \to \mathbb{C}$ be defined by
$$\Lambda g = \int_{\Omega} g\,d\lambda.$$
By Hölder's inequality,
$$|\Lambda g| \le \left(\int_{\Omega} 1^{2}\,d\lambda\right)^{1/2}\left(\int_{\Omega}|g|^{2}\,d(\lambda + \mu)\right)^{1/2} = \lambda(\Omega)^{1/2}\|g\|_{2}$$
where $\|g\|_{2}$ is the $L^{2}$ norm of $g$ taken with respect to $\mu + \lambda$. Therefore, since $\Lambda$ is bounded, it follows from Theorem 12.4 on Page 301 or Lemma 10.19 on Page 255 that $\Lambda \in (L^{2}(\Omega,\mu+\lambda))'$, the dual space of $L^{2}(\Omega,\mu+\lambda)$. By the Riesz representation theorem in Hilbert space, Theorem 13.14 or Theorem 10.21 on Page 257, there exists a unique $h \in L^{2}(\Omega,\mu+\lambda)$ with
$$\Lambda g = \int_{\Omega} g\,d\lambda = \int_{\Omega} hg\,d(\mu+\lambda). \tag{14.1}$$
The plan is to show $h$ is real and nonnegative at least a.e. Therefore, consider the set where $\operatorname{Im} h$ is positive,
$$E = \{x \in \Omega : \operatorname{Im} h(x) > 0\}.$$
Now let $g = \mathcal{X}_{E}$ and use 14.1 to get
$$\lambda(E) = \int_{E}(\operatorname{Re} h + i\operatorname{Im} h)\,d(\mu+\lambda). \tag{14.2}$$
Since the left side of 14.2 is real, this shows
$$0 = \int_{E}(\operatorname{Im} h)\,d(\mu+\lambda) \ge \int_{E_n}(\operatorname{Im} h)\,d(\mu+\lambda) \ge \frac{1}{n}(\mu+\lambda)(E_n)$$
where
$$E_n \equiv \left\{x : \operatorname{Im} h(x) \ge \frac{1}{n}\right\}.$$
Thus $(\mu+\lambda)(E_n) = 0$ and since $E = \cup_{n=1}^{\infty}E_n$, it follows $(\mu+\lambda)(E) = 0$. A similar argument shows that for
$$E = \{x \in \Omega : \operatorname{Im} h(x) < 0\},$$
$(\mu+\lambda)(E) = 0$. Thus there is no loss of generality in assuming $h$ is real-valued.
The next task is to show $h$ is nonnegative. This is done in the same manner as above. Define the set where it is negative and then show this set has measure zero.
Let $E \equiv \{x : h(x) < 0\}$ and let $E_n \equiv \{x : h(x) < -\frac{1}{n}\}$. Then let $g = \mathcal{X}_{E_n}$. Since $E = \cup_{n}E_n$, it follows that if $(\mu+\lambda)(E) > 0$ then this is also true for $(\mu+\lambda)(E_n)$ for all $n$ large enough. Then from 14.2
$$\lambda(E_n) = \int_{E_n} h\,d(\mu+\lambda) \le -(1/n)(\mu+\lambda)(E_n) < 0,$$
a contradiction. Thus it can be assumed $h \ge 0$.
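At this point the argument splits into two cases.
Case Where $\lambda \ll \mu$. In this case, $h < 1$.
Let $E = [h \ge 1]$ and let $g = \mathcal{X}_{E}$. Then
$$\lambda(E) = \int_{E} h\,d(\mu+\lambda) \ge \mu(E) + \lambda(E).$$
Therefore $\mu(E) = 0$. Since $\lambda \ll \mu$, it follows that $\lambda(E) = 0$ also. Thus it can be assumed
$$0 \le h(x) < 1$$
for all $x$.
From 14.1, whenever $g \in L^{2}(\Omega,\mu+\lambda)$,
$$\int_{\Omega} g(1-h)\,d\lambda = \int_{\Omega} hg\,d\mu. \tag{14.3}$$
Now let $E$ be a measurable set and define
$$g(x) \equiv \sum_{i=0}^{n} h^{i}(x)\mathcal{X}_{E}(x)$$
in 14.3. This yields
$$\int_{E}\left(1 - h^{n+1}(x)\right)d\lambda = \int_{E}\sum_{i=1}^{n+1}h^{i}(x)\,d\mu. \tag{14.4}$$
Let $f(x) = \sum_{i=1}^{\infty}h^{i}(x)$ and use the Monotone Convergence theorem in 14.4 to let $n \to \infty$ and conclude
$$\lambda(E) = \int_{E} f\,d\mu.$$
$f \in L^{1}(\Omega,\mu)$ because $\lambda$ is finite.
The function, $f$ is unique $\mu$ a.e. because, if $g$ is another function which also serves to represent $\lambda$, consider for each $n \in \mathbb{N}$ the set,
$$E_n \equiv \left[f - g > \frac{1}{n}\right]$$
and conclude that
$$0 = \int_{E_n}(f - g)\,d\mu \ge \frac{1}{n}\mu(E_n).$$
Therefore, $\mu(E_n) = 0$. It follows that
$$\mu([f - g > 0]) \le \sum_{n=1}^{\infty}\mu(E_n) = 0.$$
Similarly, the set where $g$ is larger than $f$ has measure zero. This proves the theorem in this case.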
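Case where it is not necessarily true that $\lambda \ll \mu$.
In this case, let $N = [h \ge 1]$ and let $g = \mathcal{X}_{N}$. Then
$$\lambda(N) = \int_{N} h\,d(\mu+\lambda) \ge \mu(N) + \lambda(N)$$
and so $\mu(N) = 0$. Now define a measure, $\lambda_{\perp}$ by
$$\lambda_{\perp}(E) \equiv \lambda(E \cap N)$$
so $\lambda_{\perp}(E \cap N) = \lambda(E \cap N \cap N) \equiv \lambda_{\perp}(E)$ and let $\lambda_{||} \equiv \lambda - \lambda_{\perp}$. Therefore,
$$\mu(E) = \mu\left(E \cap N^{C}\right).$$
Also,
$$\lambda_{||}(E) = \lambda(E) - \lambda_{\perp}(E) \equiv \lambda(E) - \lambda(E \cap N) = \lambda\left(E \cap N^{C}\right).$$
Suppose $\lambda_{||}(E) > 0$. Therefore, since $h < 1$ on $N^{C}$,
$$\lambda_{||}(E) = \lambda\left(E \cap N^{C}\right) = \int_{E \cap N^{C}} h\,d(\mu+\lambda) < \mu\left(E \cap N^{C}\right) + \lambda\left(E \cap N^{C}\right) = \mu(E) + \lambda_{||}(E),$$
which is a contradiction unless $\mu(E) > 0$. Therefore, $\lambda_{||} \ll \mu$ because if $\mu(E) = 0$, the above inequality cannot hold.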
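It only remains to verify the two measures $\lambda_{\perp}$ and $\lambda_{||}$ are unique. Suppose then that $\nu_1$ and $\nu_2$ play the roles of $\lambda_{\perp}$ and $\lambda_{||}$ respectively. Let $N_1$ play the role of $N$ in the definition of $\nu_1$ and let $g_1$ play the role of $g$ for $\nu_2$, where $g$ denotes the density of $\lambda_{||}$ with respect to $\mu$ obtained from the first case. I will show that $g = g_1$ $\mu$ a.e. Let $E_k \equiv [g_1 - g > 1/k]$ for $k \in \mathbb{N}$. Then on observing that $\lambda_{\perp} - \nu_1 = \nu_2 - \lambda_{||}$,
$$\begin{aligned}
0 = (\lambda_{\perp} - \nu_1)\left(E_k \cap (N_1 \cup N)^{C}\right) &= \int_{E_k \cap (N_1 \cup N)^{C}}(g_1 - g)\,d\mu\\
&\ge \frac{1}{k}\mu\left(E_k \cap (N_1 \cup N)^{C}\right) = \frac{1}{k}\mu(E_k)
\end{aligned}$$
and so $\mu(E_k) = 0$. Therefore, $\mu([g_1 - g > 0]) = 0$ because $[g_1 - g > 0] = \cup_{k=1}^{\infty}E_k$. It follows $g_1 \le g$ $\mu$ a.e. Similarly, $g_1 \ge g$ $\mu$ a.e. Therefore, $\nu_2 = \lambda_{||}$ and so $\lambda_{\perp} = \nu_1$ also. This proves the theorem.
The $f$ in the theorem for the absolutely continuous case is sometimes denoted by $\frac{d\lambda}{d\mu}$ and is called the Radon Nikodym derivative.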
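The construction in the proof can be traced step by step for measures on a finite set. The following sketch assumes both measures are dictionaries of point masses with the same keys; the function name `lebesgue_decomposition` is a hypothetical helper. It follows the proof in computing the density h of lam with respect to lam + mu and taking N to be the set where h is at least 1.

```python
from fractions import Fraction


def lebesgue_decomposition(lam: dict, mu: dict):
    """Split lam into (singular, absolutely_continuous, null_set) relative to mu.

    Mirrors the proof: h = d(lam)/d(lam + mu) pointwise, N = [h >= 1],
    singular(E) = lam(E intersect N), absolutely_continuous = lam - singular.
    """
    singular, absolutely_continuous, null_set = {}, {}, set()
    for x in lam:
        total = lam[x] + mu[x]
        # h is only determined up to (lam + mu) null sets; use 0 there.
        h = lam[x] / total if total != 0 else Fraction(0)
        if h >= 1:
            null_set.add(x)
            singular[x] = lam[x]
            absolutely_continuous[x] = Fraction(0)
        else:
            singular[x] = Fraction(0)
            absolutely_continuous[x] = lam[x]
    return singular, absolutely_continuous, null_set
```

In this discrete setting h(x) >= 1 happens exactly at points carrying lam mass but no mu mass, so N has mu measure zero and the remaining part of lam vanishes wherever mu does.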
The next corollary is a useful generalization to $\sigma$ finite measure spaces.
Corollary 14.3 Suppose $\lambda \ll \mu$ and there exist sets $S_n \in \mathcal{S}$ with
$$S_n \cap S_m = \emptyset, \quad \cup_{n=1}^{\infty}S_n = \Omega,$$
and $\lambda(S_n), \mu(S_n) < \infty$. Then there exists $f \ge 0$, where $f$ is $\mu$ measurable, and
$$\lambda(E) = \int_{E} f\,d\mu$$
for all $E \in \mathcal{S}$. The function $f$ is $\mu + \lambda$ a.e. unique.
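Proof: Define the $\sigma$ algebra of subsets of $S_n$,
$$\mathcal{S}_n \equiv \{E \cap S_n : E \in \mathcal{S}\}.$$
Then both $\lambda$ and $\mu$ are finite measures on $\mathcal{S}_n$, and $\lambda \ll \mu$. Thus, by Theorem 14.2, there exists a nonnegative $\mathcal{S}_n$ measurable function $f_n$, with $\lambda(E) = \int_{E} f_n\,d\mu$ for all $E \in \mathcal{S}_n$. Define $f(x) = f_n(x)$ for $x \in S_n$. Since the $S_n$ are disjoint and their union is all of $\Omega$, this defines $f$ on all of $\Omega$. The function, $f$ is measurable because
$$f^{-1}((a,\infty]) = \cup_{n=1}^{\infty}f_n^{-1}((a,\infty]) \in \mathcal{S}.$$
Also, for $E \in \mathcal{S}$,
$$\lambda(E) = \sum_{n=1}^{\infty}\lambda(E \cap S_n) = \sum_{n=1}^{\infty}\int \mathcal{X}_{E \cap S_n}(x)f_n(x)\,d\mu = \sum_{n=1}^{\infty}\int \mathcal{X}_{E \cap S_n}(x)f(x)\,d\mu.$$
By the monotone convergence theorem
$$\begin{aligned}
\sum_{n=1}^{\infty}\int \mathcal{X}_{E \cap S_n}(x)f(x)\,d\mu &= \lim_{N\to\infty}\sum_{n=1}^{N}\int \mathcal{X}_{E \cap S_n}(x)f(x)\,d\mu\\
&= \lim_{N\to\infty}\int \sum_{n=1}^{N}\mathcal{X}_{E \cap S_n}(x)f(x)\,d\mu\\
&= \int \sum_{n=1}^{\infty}\mathcal{X}_{E \cap S_n}(x)f(x)\,d\mu = \int_{E} f\,d\mu.
\end{aligned}$$
This proves the existence part of the corollary.
To see $f$ is unique, suppose $f_1$ and $f_2$ both work and consider for $k \in \mathbb{N}$
$$E_k \equiv \left[f_1 - f_2 > \frac{1}{k}\right].$$
Then
$$0 = \lambda(E_k \cap S_n) - \lambda(E_k \cap S_n) = \int_{E_k \cap S_n} f_1(x) - f_2(x)\,d\mu.$$
Hence $\mu(E_k \cap S_n) = 0$ for all $n$ so, since the $S_n$ are disjoint with union $\Omega$,
$$\mu(E_k) = \sum_{n=1}^{\infty}\mu(E_k \cap S_n) = 0.$$
Hence $\mu([f_1 - f_2 > 0]) \le \sum_{k=1}^{\infty}\mu(E_k) = 0$. Therefore, $\lambda([f_1 - f_2 > 0]) = 0$ also. Similarly
$$(\mu + \lambda)([f_1 - f_2 < 0]) = 0.$$
This version of the Radon Nikodym theorem will suffice for most applications, but more general versions are available. To see one of these, one can read the treatment in Hewitt and Stromberg [25]. This involves the notion of decomposable measure spaces, a generalization of $\sigma$ finite.
Not surprisingly, there is a simple generalization of the Lebesgue decomposition part of Theorem 14.2.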
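Corollary 14.4 Let $(\Omega, \mathcal{S})$ be a set with a $\sigma$ algebra of sets. Suppose $\lambda$ and $\mu$ are two measures defined on the sets of $\mathcal{S}$ and suppose there exists a sequence of disjoint sets of $\mathcal{S}$, $\{\Omega_i\}_{i=1}^{\infty}$ such that $\lambda(\Omega_i), \mu(\Omega_i) < \infty$. Then there is a set of $\mu$ measure zero, $N$ and measures $\lambda_{\perp}$ and $\lambda_{||}$ such that
$$\lambda_{\perp} + \lambda_{||} = \lambda, \quad \lambda_{||} \ll \mu, \quad \lambda_{\perp}(E) = \lambda(E \cap N) = \lambda_{\perp}(E \cap N).$$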
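Proof: Let $\mathcal{S}_i \equiv \{E \cap \Omega_i : E \in \mathcal{S}\}$ and for $E \in \mathcal{S}_i$, let $\lambda^{i}(E) = \lambda(E)$ and $\mu^{i}(E) = \mu(E)$. Then by Theorem 14.2 there exist unique measures $\lambda^{i}_{\perp}$ and $\lambda^{i}_{||}$ such that $\lambda^{i} = \lambda^{i}_{\perp} + \lambda^{i}_{||}$, a set of $\mu^{i}$ measure zero, $N_i \in \mathcal{S}_i$ such that for all $E \in \mathcal{S}_i$, $\lambda^{i}_{\perp}(E) = \lambda^{i}(E \cap N_i)$ and $\lambda^{i}_{||} \ll \mu^{i}$. Define for $E \in \mathcal{S}$
$$\lambda_{\perp}(E) \equiv \sum_{i}\lambda^{i}_{\perp}(E \cap \Omega_i), \quad \lambda_{||}(E) \equiv \sum_{i}\lambda^{i}_{||}(E \cap \Omega_i), \quad N \equiv \cup_{i}N_i.$$
First observe that $\lambda_{\perp}$ and $\lambda_{||}$ are measures.
$$\begin{aligned}
\lambda_{\perp}\left(\cup_{j=1}^{\infty}E_j\right) &\equiv \sum_{i}\lambda^{i}_{\perp}\left(\cup_{j=1}^{\infty}E_j \cap \Omega_i\right) = \sum_{i}\sum_{j}\lambda^{i}_{\perp}(E_j \cap \Omega_i)\\
&= \sum_{j}\sum_{i}\lambda^{i}_{\perp}(E_j \cap \Omega_i) = \sum_{j}\sum_{i}\lambda^{i}(E_j \cap \Omega_i \cap N_i)\\
&= \sum_{j}\sum_{i}\lambda^{i}_{\perp}(E_j \cap \Omega_i) = \sum_{j}\lambda_{\perp}(E_j).
\end{aligned}$$
The argument for $\lambda_{||}$ is similar. Now
$$\mu(N) = \sum_{i}\mu(N_i) = \sum_{i}\mu^{i}(N_i) = 0$$
and
$$\lambda_{\perp}(E) \equiv \sum_{i}\lambda^{i}_{\perp}(E \cap \Omega_i) = \sum_{i}\lambda^{i}(E \cap \Omega_i \cap N_i) = \sum_{i}\lambda(E \cap \Omega_i \cap N) = \lambda(E \cap N).$$
Also if $\mu(E) = 0$, then $\mu^{i}(E \cap \Omega_i) = 0$ and so $\lambda^{i}_{||}(E \cap \Omega_i) = 0$. Therefore,
$$\lambda_{||}(E) = \sum_{i}\lambda^{i}_{||}(E \cap \Omega_i) = 0.$$
The decomposition is unique because of the uniqueness of the $\lambda^{i}_{||}$ and $\lambda^{i}_{\perp}$ and the observation that some other decomposition must coincide with the given one on the $\Omega_i$.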
14.2 Vector Measures
The next topic will use the Radon Nikodym theorem. It is the topic of vector and complex measures. The main interest is in complex measures although a vector measure can have values in any topological vector space. Whole books have been written on this subject. See for example the book by Diestel and Uhl [15] titled Vector measures.
Definition 14.5 Let $(V, \|\cdot\|)$ be a normed linear space and let $(\Omega, \mathcal{S})$ be a measure space. A function $\mu : \mathcal{S} \to V$ is a vector measure if $\mu$ is countably additive. That is, if $\{E_i\}_{i=1}^{\infty}$ is a sequence of disjoint sets of $\mathcal{S}$,
$$\mu\left(\cup_{i=1}^{\infty}E_i\right) = \sum_{i=1}^{\infty}\mu(E_i).$$
Note that it makes sense to take finite sums because it is given that $\mu$ has values in a vector space in which vectors can be summed. In the above, $\mu(E_i)$ is a vector. It might be a point in $\mathbb{R}^{n}$ or in any other vector space. In many of the most important applications, it is a vector in some sort of function space which may be infinite dimensional. The infinite sum has the usual meaning. That is
$$\sum_{i=1}^{\infty}\mu(E_i) = \lim_{n\to\infty}\sum_{i=1}^{n}\mu(E_i)$$
where the limit takes place relative to the norm on $V$.
Definition 14.6 Let $(\Omega, \mathcal{S})$ be a measure space and let $\mu$ be a vector measure defined on $\mathcal{S}$. A subset, $\pi(E)$, of $\mathcal{S}$ is called a partition of $E$ if $\pi(E)$ consists of finitely many disjoint sets of $\mathcal{S}$ and $\cup\pi(E) = E$. Let
$$|\mu|(E) = \sup\left\{\sum_{F \in \pi(E)}\|\mu(F)\| : \pi(E) \text{ is a partition of } E\right\}.$$
$|\mu|$ is called the total variation of $\mu$.
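For a complex measure on a small finite set the supremum in this definition can be computed by brute force over all partitions. This is a sketch for illustration only: the measure is assumed to be a dictionary of complex point masses, and `partitions` and `total_variation` are hypothetical helper names. The enumeration is exponential in the size of the set, so it is only usable for a handful of points.

```python
def partitions(items):
    """Yield every partition of the list `items` into nonempty disjoint blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give `first` a block of its own.
        yield [[first]] + smaller


def total_variation(mu, E):
    """Largest value of sum |mu(F)| over all partitions of the finite set E."""
    return max(
        sum(abs(sum(mu[x] for x in block)) for block in partition)
        for partition in partitions(list(E))
    )
```

For example, with point masses 1, -1 and i, the value of the measure on the whole set has modulus 1 while the total variation is 3, attained by the partition into single points where no cancellation can occur.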
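The next theorem may seem a little surprising. It states that, if finite, the total variation is a nonnegative measure.
Theorem 14.7 If $|\mu|(\Omega) < \infty$, then $|\mu|$ is a measure on $\mathcal{S}$. Even if $|\mu|(\Omega) = \infty$, $|\mu|\left(\cup_{i=1}^{\infty}E_i\right) \le \sum_{i=1}^{\infty}|\mu|(E_i)$. That is $|\mu|$ is subadditive and $|\mu|(A) \le |\mu|(B)$ whenever $A, B \in \mathcal{S}$ with $A \subseteq B$.
Proof: Consider the last claim. Let $a < |\mu|(A)$ and let $\pi(A)$ be a partition of $A$ such that
$$a < \sum_{F \in \pi(A)}\|\mu(F)\|.$$
Then $\pi(A) \cup \{B \setminus A\}$ is a partition of $B$ and
$$|\mu|(B) \ge \sum_{F \in \pi(A)}\|\mu(F)\| + \|\mu(B \setminus A)\| > a.$$
Since this is true for all such $a$, it follows $|\mu|(B) \ge |\mu|(A)$ as claimed.
Let $\{E_j\}_{j=1}^{\infty}$ be a sequence of disjoint sets of $\mathcal{S}$ and let $E_{\infty} = \cup_{j=1}^{\infty}E_j$. Then letting $a < |\mu|(E_{\infty})$, it follows from the definition of total variation there exists a partition of $E_{\infty}$, $\pi(E_{\infty}) = \{A_1, \cdots, A_n\}$ such that
$$a < \sum_{i=1}^{n}\|\mu(A_i)\|.$$
Also,
$$A_i = \cup_{j=1}^{\infty}A_i \cap E_j$$
and so by the triangle inequality, $\|\mu(A_i)\| \le \sum_{j=1}^{\infty}\|\mu(A_i \cap E_j)\|$. Therefore, by the above, and either Fubini's theorem or Lemma 7.18 on Page 144
$$a < \sum_{i=1}^{n}\|\mu(A_i)\| \le \sum_{i=1}^{n}\sum_{j=1}^{\infty}\|\mu(A_i \cap E_j)\| = \sum_{j=1}^{\infty}\sum_{i=1}^{n}\|\mu(A_i \cap E_j)\| \le \sum_{j=1}^{\infty}|\mu|(E_j)$$
because $\{A_i \cap E_j\}_{i=1}^{n}$ is a partition of $E_j$.
Since $a$ is arbitrary, this shows
$$|\mu|\left(\cup_{j=1}^{\infty}E_j\right) \le \sum_{j=1}^{\infty}|\mu|(E_j).$$
If the sets, $E_j$ are not disjoint, let $F_1 = E_1$ and if $F_n$ has been chosen, let $F_{n+1} \equiv E_{n+1} \setminus \cup_{i=1}^{n}E_i$. Thus the sets, $F_i$ are disjoint and $\cup_{i=1}^{\infty}F_i = \cup_{i=1}^{\infty}E_i$. Therefore,
$$|\mu|\left(\cup_{j=1}^{\infty}E_j\right) = |\mu|\left(\cup_{j=1}^{\infty}F_j\right) \le \sum_{j=1}^{\infty}|\mu|(F_j) \le \sum_{j=1}^{\infty}|\mu|(E_j)$$
and proves $|\mu|$ is always subadditive as claimed regardless of whether $|\mu|(\Omega) < \infty$.
Now suppose $|\mu|(\Omega) < \infty$ and let $E_1$ and $E_2$ be sets of $\mathcal{S}$ such that $E_1 \cap E_2 = \emptyset$ and let $\{A_1^{i} \cdots A_{n_i}^{i}\} = \pi(E_i)$, a partition of $E_i$ which is chosen such that
$$|\mu|(E_i) - \varepsilon < \sum_{j=1}^{n_i}\|\mu(A_j^{i})\|, \quad i = 1, 2.$$
Such a partition exists because of the definition of the total variation. Consider the sets which are contained in either of $\pi(E_1)$ or $\pi(E_2)$. It follows this collection of sets is a partition of $E_1 \cup E_2$ denoted by $\pi(E_1 \cup E_2)$. Then by the above inequality and the definition of total variation,
$$|\mu|(E_1 \cup E_2) \ge \sum_{F \in \pi(E_1 \cup E_2)}\|\mu(F)\| > |\mu|(E_1) + |\mu|(E_2) - 2\varepsilon,$$
which shows that since $\varepsilon > 0$ was arbitrary,
$$|\mu|(E_1 \cup E_2) \ge |\mu|(E_1) + |\mu|(E_2). \tag{14.5}$$
Then 14.5 implies that whenever the $E_i$ are disjoint, $|\mu|\left(\cup_{j=1}^{n}E_j\right) \ge \sum_{j=1}^{n}|\mu|(E_j)$. Therefore,
$$\sum_{j=1}^{\infty}|\mu|(E_j) \ge |\mu|\left(\cup_{j=1}^{\infty}E_j\right) \ge |\mu|\left(\cup_{j=1}^{n}E_j\right) \ge \sum_{j=1}^{n}|\mu|(E_j).$$
Since $n$ is arbitrary,
$$|\mu|\left(\cup_{j=1}^{\infty}E_j\right) = \sum_{j=1}^{\infty}|\mu|(E_j)$$
which shows that $|\mu|$ is a measure as claimed. This proves the theorem.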
In the case that $\mu$ is a complex measure, it is always the case that $|\mu|(\Omega) < \infty$.
Theorem 14.8 Suppose $\mu$ is a complex measure on $(\Omega, \mathcal{S})$ where $\mathcal{S}$ is a $\sigma$ algebra of subsets of $\Omega$. That is, whenever, $\{E_i\}$ is a sequence of disjoint sets of $\mathcal{S}$,
$$\mu\left(\cup_{i=1}^{\infty}E_i\right) = \sum_{i=1}^{\infty}\mu(E_i).$$
Then $|\mu|(\Omega) < \infty$.
Proof: First here is a claim.
Claim: Suppose $|\mu|(E) = \infty$. Then there are disjoint subsets of $E$, $A$ and $B$ such that $E = A \cup B$, $|\mu(A)|, |\mu(B)| > 1$ and $|\mu|(B) = \infty$.
Proof of the claim: From the definition of $|\mu|$, there exists a partition of $E$, $\pi(E)$ such that
$$\sum_{F \in \pi(E)}|\mu(F)| > 20(1 + |\mu(E)|). \tag{14.6}$$
Here 20 is just a nice sized number. No effort is made to be delicate in this argument. Also note that $\mu(E) \in \mathbb{C}$ because it is given that $\mu$ is a complex measure. Consider the following picture consisting of two lines in the complex plane having slopes 1 and -1 which intersect at the origin, dividing the complex plane into four closed sets, $R_1$, $R_2$, $R_3$, and $R_4$ as shown.
[Figure: the two lines through the origin and the four closed regions $R_1$, $R_2$, $R_3$, $R_4$ they determine.]
Let $\pi_i$ consist of those sets, $A$ of $\pi(E)$ for which $\mu(A) \in R_i$. Thus, some sets, $A$ of $\pi(E)$ could be in two of the $\pi_i$ if $\mu(A)$ is on one of the intersecting lines. This is not important. The thing which is important is that if $\mu(A) \in R_1$ or $R_3$, then $\frac{\sqrt{2}}{2}|\mu(A)| \le |\operatorname{Re}(\mu(A))|$ and if $\mu(A) \in R_2$ or $R_4$ then $\frac{\sqrt{2}}{2}|\mu(A)| \le |\operatorname{Im}(\mu(A))|$ and $\operatorname{Re}(z)$ has the same sign for $z$ in $R_1$ and $R_3$ while $\operatorname{Im}(z)$ has the same sign for $z$ in $R_2$ or $R_4$. Then by 14.6, it follows that for some $i$,
$$\sum_{F \in \pi_i}|\mu(F)| > 5(1 + |\mu(E)|). \tag{14.7}$$
Suppose $i$ equals 1 or 3. A similar argument using the imaginary part applies if $i$ equals 2 or 4. Then,
$$\left|\sum_{F \in \pi_i}\mu(F)\right| \ge \left|\sum_{F \in \pi_i}\operatorname{Re}(\mu(F))\right| = \sum_{F \in \pi_i}|\operatorname{Re}(\mu(F))| \ge \frac{\sqrt{2}}{2}\sum_{F \in \pi_i}|\mu(F)| > 5\frac{\sqrt{2}}{2}(1 + |\mu(E)|).$$
Now letting $C$ be the union of the sets in $\pi_i$,
$$|\mu(C)| = \left|\sum_{F \in \pi_i}\mu(F)\right| > \frac{5}{2}(1 + |\mu(E)|) > 1. \tag{14.8}$$
Define $D \equiv E \setminus C$.
[Figure: the set $C$ inside $E$.]
Then $\mu(C) + \mu(E \setminus C) = \mu(E)$ and so
$$\frac{5}{2}(1 + |\mu(E)|) < |\mu(C)| = |\mu(E) - \mu(E \setminus C)| = |\mu(E) - \mu(D)| \le |\mu(E)| + |\mu(D)|$$
and so
$$1 < \frac{5}{2} + \frac{3}{2}|\mu(E)| < |\mu(D)|.$$
Now since $|\mu|(E) = \infty$, it follows from Theorem 14.7 that $\infty = |\mu|(E) \le |\mu|(C) + |\mu|(D)$ and so either $|\mu|(C) = \infty$ or $|\mu|(D) = \infty$. If $|\mu|(C) = \infty$, let $B = C$ and $A = D$. Otherwise, let $B = D$ and $A = C$. This proves the claim.
Now suppose $|\mu|(\Omega) = \infty$. Then from the claim, there exist $A_1$ and $B_1$ such that $|\mu|(B_1) = \infty$, $|\mu(B_1)|, |\mu(A_1)| > 1$, and $A_1 \cup B_1 = \Omega$. Let $B_1 \equiv \Omega \setminus A_1$ play the same role as $\Omega$ and obtain $A_2, B_2 \subseteq B_1$ such that $|\mu|(B_2) = \infty$, $|\mu(B_2)|, |\mu(A_2)| > 1$, and $A_2 \cup B_2 = B_1$. Continue in this way to obtain a sequence of disjoint sets, $\{A_i\}$ such that $|\mu(A_i)| > 1$. Then since $\mu$ is a measure,
$$\mu\left(\cup_{i=1}^{\infty}A_i\right) = \sum_{i=1}^{\infty}\mu(A_i)$$
but this is impossible because $\lim_{i\to\infty}\mu(A_i) \ne 0$. This proves the theorem.
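Theorem 14.9 Let $(\Omega, \mathcal{S})$ be a measure space and let $\lambda : \mathcal{S} \to \mathbb{C}$ be a complex vector measure. Thus $|\lambda|(\Omega) < \infty$. Let $\mu : \mathcal{S} \to [0, \mu(\Omega)]$ be a finite measure such that $\lambda \ll \mu$. Then there exists a unique $f \in L^{1}(\Omega)$ such that for all $E \in \mathcal{S}$,
$$\int_{E} f\,d\mu = \lambda(E).$$
Proof: It is clear that $\operatorname{Re}\lambda$ and $\operatorname{Im}\lambda$ are real-valued vector measures on $\mathcal{S}$. Since $|\lambda|(\Omega) < \infty$, it follows easily that $|\operatorname{Re}\lambda|(\Omega)$ and $|\operatorname{Im}\lambda|(\Omega) < \infty$. This is clear because
$$|\lambda(E)| \ge |\operatorname{Re}\lambda(E)|, |\operatorname{Im}\lambda(E)|.$$
Therefore, each of
$$\frac{|\operatorname{Re}\lambda| + \operatorname{Re}\lambda}{2}, \quad \frac{|\operatorname{Re}\lambda| - \operatorname{Re}(\lambda)}{2}, \quad \frac{|\operatorname{Im}\lambda| + \operatorname{Im}\lambda}{2}, \quad \text{and} \quad \frac{|\operatorname{Im}\lambda| - \operatorname{Im}(\lambda)}{2}$$
are finite measures on $\mathcal{S}$. It is also clear that each of these finite measures is absolutely continuous with respect to $\mu$ and so there exist unique nonnegative functions in $L^{1}(\Omega)$, $f_1, f_2, g_1, g_2$ such that for all $E \in \mathcal{S}$,
$$\begin{aligned}
\frac{1}{2}(|\operatorname{Re}\lambda| + \operatorname{Re}\lambda)(E) &= \int_{E} f_1\,d\mu,\\
\frac{1}{2}(|\operatorname{Re}\lambda| - \operatorname{Re}\lambda)(E) &= \int_{E} f_2\,d\mu,\\
\frac{1}{2}(|\operatorname{Im}\lambda| + \operatorname{Im}\lambda)(E) &= \int_{E} g_1\,d\mu,\\
\frac{1}{2}(|\operatorname{Im}\lambda| - \operatorname{Im}\lambda)(E) &= \int_{E} g_2\,d\mu.
\end{aligned}$$
Now let $f = f_1 - f_2 + i(g_1 - g_2)$.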
The following corollary is about representing a vector measure in terms of its total variation. It is like representing a complex number in the form $re^{i\theta}$. The proof requires the following lemma.
Lemma 14.10 Suppose $(\Omega, \mathcal{S}, \mu)$ is a measure space and $f$ is a function in $L^{1}(\Omega, \mu)$ with the property that
$$\left|\int_{E} f\,d\mu\right| \le \mu(E)$$
for all $E \in \mathcal{S}$. Then $|f| \le 1$ a.e.
Proof of the lemma: Consider the following picture.
[Figure: the unit disk centered at $(0,0)$ and a ball $B(p,r)$ lying outside it.]
where $B(p,r) \cap \overline{B(0,1)} = \emptyset$. Let $E = f^{-1}(B(p,r))$. In fact $\mu(E) = 0$. If $\mu(E) \ne 0$ then
$$\left|\frac{1}{\mu(E)}\int_{E} f\,d\mu - p\right| = \left|\frac{1}{\mu(E)}\int_{E}(f - p)\,d\mu\right| \le \frac{1}{\mu(E)}\int_{E}|f - p|\,d\mu < r$$
because on $E$, $|f(x) - p| < r$. Hence
$$\left|\frac{1}{\mu(E)}\int_{E} f\,d\mu\right| > 1$$
because it is closer to $p$ than $r$. (Refer to the picture.) However, this contradicts the assumption of the lemma. It follows $\mu(E) = 0$. Since the set of complex numbers, $z$ such that $|z| > 1$ is an open set, it equals the union of countably many balls, $\{B_i\}_{i=1}^{\infty}$. Therefore,
$$\mu\left(f^{-1}(\{z \in \mathbb{C} : |z| > 1\})\right) = \mu\left(\cup_{k=1}^{\infty}f^{-1}(B_k)\right) \le \sum_{k=1}^{\infty}\mu\left(f^{-1}(B_k)\right) = 0.$$
Thus $|f(x)| \le 1$ a.e. as claimed. This proves the lemma.
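Corollary 14.11 Let $\lambda$ be a complex vector measure with $|\lambda|(\Omega) < \infty$.$^{1}$ Then there exists a unique $f \in L^{1}(\Omega)$ such that $\lambda(E) = \int_{E} f\,d|\lambda|$. Furthermore, $|f| = 1$ for $|\lambda|$ a.e. This is called the polar decomposition of $\lambda$.
Proof: First note that $\lambda \ll |\lambda|$ and so such an $L^{1}$ function exists and is unique. It is required to show $|f| = 1$ a.e. If $|\lambda|(E) \ne 0$,
$$\left|\frac{\lambda(E)}{|\lambda|(E)}\right| = \left|\frac{1}{|\lambda|(E)}\int_{E} f\,d|\lambda|\right| \le 1.$$
Therefore by Lemma 14.10, $|f| \le 1$, $|\lambda|$ a.e. Now let
$$E_n = \left[|f| \le 1 - \frac{1}{n}\right].$$
Let $\{F_1, \cdots, F_m\}$ be a partition of $E_n$. Then
$$\begin{aligned}
\sum_{i=1}^{m}|\lambda(F_i)| &= \sum_{i=1}^{m}\left|\int_{F_i} f\,d|\lambda|\right| \le \sum_{i=1}^{m}\int_{F_i}|f|\,d|\lambda|\\
&\le \sum_{i=1}^{m}\int_{F_i}\left(1 - \frac{1}{n}\right)d|\lambda| = \sum_{i=1}^{m}\left(1 - \frac{1}{n}\right)|\lambda|(F_i)\\
&= |\lambda|(E_n)\left(1 - \frac{1}{n}\right).
\end{aligned}$$
Then taking the supremum over all partitions,
$$|\lambda|(E_n) \le \left(1 - \frac{1}{n}\right)|\lambda|(E_n)$$
which shows $|\lambda|(E_n) = 0$. Hence $|\lambda|([|f| < 1]) = 0$ because $[|f| < 1] = \cup_{n=1}^{\infty}E_n$. This proves Corollary 14.11.
$^{1}$ As proved above, the assumption that $|\lambda|(\Omega) < \infty$ is redundant.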
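For a complex measure given by point masses on a finite set, the polar decomposition is the familiar polar form of each mass. The sketch below assumes that setting; `polar_decomposition` is a hypothetical helper name and the value chosen for f on points of total variation zero is arbitrary.

```python
def polar_decomposition(mu: dict):
    """Return (f, tv) with mu({x}) = f(x) * tv({x}) and |f(x)| = 1 where tv({x}) > 0.

    Assumes `mu` maps each point of a finite set to its complex point mass,
    so that the total variation of a single point is the modulus of its mass.
    """
    f, tv = {}, {}
    for x, value in mu.items():
        tv[x] = abs(value)
        # On a point of total variation zero any unimodular value will do.
        f[x] = value / tv[x] if tv[x] > 0 else 1.0
    return f, tv
```

Here `tv` plays the role of the total variation measure and `f` the role of the unimodular density, the analogue of writing each complex number as its modulus times a point on the unit circle.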
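Corollary 14.12 Suppose $(\Omega, \mathcal{S})$ is a measure space and $\mu$ is a finite nonnegative measure on $\mathcal{S}$. Then for $h \in L^{1}(\mu)$, define a complex measure, $\lambda$ by
$$\lambda(E) \equiv \int_{E} h\,d\mu.$$
Then
$$|\lambda|(E) = \int_{E}|h|\,d\mu.$$
Furthermore, $|h| = \overline{g}h$ where $g\,d|\lambda|$ is the polar decomposition of $\lambda$,
$$\lambda(E) = \int_{E} g\,d|\lambda|.$$
Proof: From Corollary 14.11 there exists $g$ such that $|g| = 1$, $|\lambda|$ a.e. and for all $E \in \mathcal{S}$
$$\lambda(E) = \int_{E} g\,d|\lambda| = \int_{E} h\,d\mu.$$
Let $s_n$ be a sequence of simple functions converging pointwise to $\overline{g}$. Then from the above,
$$\int_{E} g s_n\,d|\lambda| = \int_{E} s_n h\,d\mu.$$
Passing to the limit using the dominated convergence theorem,
$$\int_{E} d|\lambda| = \int_{E}\overline{g}h\,d\mu.$$
It follows $\overline{g}h \ge 0$ a.e. and $|\overline{g}| = 1$. Therefore, $|h| = |\overline{g}h| = \overline{g}h$. It follows from the above, that
$$|\lambda|(E) = \int_{E} d|\lambda| = \int_{E}\overline{g}h\,d\mu = \int_{E}|h|\,d\mu$$
and this proves the corollary.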
14.3 Representation Theorems For The Dual Space Of $L^{p}$
Recall the concept of the dual space of a Banach space in the Chapter on Banach space starting on Page 299. The next topic deals with the dual space of $L^{p}$ for $p \ge 1$ in the case where the measure space is $\sigma$ finite or finite. In what follows $q = \infty$ if $p = 1$ and otherwise, $\frac{1}{p} + \frac{1}{q} = 1$.
Theorem 14.13 (Riesz representation theorem) Let $p > 1$ and let $(\Omega, \mathcal{S}, \mu)$ be a finite measure space. If $\Lambda \in (L^{p}(\Omega))'$, then there exists a unique $h \in L^{q}(\Omega)$ ($\frac{1}{p} + \frac{1}{q} = 1$) such that
$$\Lambda f = \int_{\Omega} hf\,d\mu.$$
This function satisfies $\|h\|_{q} = \|\Lambda\|$ where $\|\Lambda\|$ is the operator norm of $\Lambda$.
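Proof: (Uniqueness) If $h_1$ and $h_2$ both represent $\Lambda$, consider
$$f = |h_1 - h_2|^{q-2}\left(\overline{h_1} - \overline{h_2}\right),$$
where $\overline{h}$ denotes complex conjugation. By Hölder's inequality, it is easy to see that $f \in L^{p}(\Omega)$. Thus
$$\begin{aligned}
0 = \Lambda f - \Lambda f &= \int h_1|h_1 - h_2|^{q-2}\left(\overline{h_1} - \overline{h_2}\right) - h_2|h_1 - h_2|^{q-2}\left(\overline{h_1} - \overline{h_2}\right)d\mu\\
&= \int|h_1 - h_2|^{q}\,d\mu.
\end{aligned}$$
Therefore $h_1 = h_2$ and this proves uniqueness.
Now let $\lambda(E) = \Lambda(\mathcal{X}_{E})$. Since this is a finite measure space $\mathcal{X}_{E}$ is an element of $L^{p}(\Omega)$ and so it makes sense to write $\Lambda(\mathcal{X}_{E})$. In fact $\lambda$ is a complex measure having finite total variation. Let $A_1, \cdots, A_n$ be a partition of $\Omega$.
$$|\Lambda\mathcal{X}_{A_i}| = w_i(\Lambda\mathcal{X}_{A_i}) = \Lambda(w_i\mathcal{X}_{A_i})$$
for some $w_i \in \mathbb{C}$, $|w_i| = 1$. Thus
$$\begin{aligned}
\sum_{i=1}^{n}|\lambda(A_i)| = \sum_{i=1}^{n}|\Lambda(\mathcal{X}_{A_i})| &= \Lambda\left(\sum_{i=1}^{n}w_i\mathcal{X}_{A_i}\right)\\
&\le \|\Lambda\|\left(\int\left|\sum_{i=1}^{n}w_i\mathcal{X}_{A_i}\right|^{p}d\mu\right)^{\frac{1}{p}} = \|\Lambda\|\left(\int_{\Omega}d\mu\right)^{\frac{1}{p}} = \|\Lambda\|\mu(\Omega)^{\frac{1}{p}}.
\end{aligned}$$
This is because if $x \in \Omega$, $x$ is contained in exactly one of the $A_i$ and so the absolute value of the sum in the first integral above is equal to 1. Therefore $|\lambda|(\Omega) < \infty$ because this was an arbitrary partition. Also, if $\{E_i\}_{i=1}^{\infty}$ is a sequence of disjoint sets of $\mathcal{S}$, let
$$F_n = \cup_{i=1}^{n}E_i, \quad F = \cup_{i=1}^{\infty}E_i.$$
Then by the Dominated Convergence theorem,
$$\|\mathcal{X}_{F_n} - \mathcal{X}_{F}\|_{p} \to 0.$$
Therefore, by continuity of $\Lambda$,
$$\lambda(F) = \Lambda(\mathcal{X}_{F}) = \lim_{n\to\infty}\Lambda(\mathcal{X}_{F_n}) = \lim_{n\to\infty}\sum_{k=1}^{n}\Lambda(\mathcal{X}_{E_k}) = \sum_{k=1}^{\infty}\lambda(E_k).$$
This shows $\lambda$ is a complex measure with $|\lambda|$ finite.
It is also clear from the definition of $\lambda$ that $\lambda \ll \mu$. Therefore, by the Radon Nikodym theorem, there exists $h \in L^{1}(\Omega)$ with
$$\lambda(E) = \int_{E} h\,d\mu = \Lambda(\mathcal{X}_{E}).$$
Actually $h \in L^{q}$ and satisfies the other conditions above. Let $s = \sum_{i=1}^{m}c_i\mathcal{X}_{E_i}$ be a simple function. Then since $\Lambda$ is linear,
$$\Lambda(s) = \sum_{i=1}^{m}c_i\Lambda(\mathcal{X}_{E_i}) = \sum_{i=1}^{m}c_i\int_{E_i}h\,d\mu = \int hs\,d\mu. \tag{14.9}$$
Claim: If $f$ is uniformly bounded and measurable, then
$$\Lambda(f) = \int hf\,d\mu.$$
Proof of claim: Since $f$ is bounded and measurable, there exists a sequence of simple functions, $\{s_n\}$ which converges to $f$ pointwise and in $L^{p}(\Omega)$. This follows from Theorem 7.24 on Page 150 upon breaking $f$ up into positive and negative parts of real and imaginary parts. In fact this theorem gives uniform convergence. Then
$$\Lambda(f) = \lim_{n\to\infty}\Lambda(s_n) = \lim_{n\to\infty}\int hs_n\,d\mu = \int hf\,d\mu,$$
the first equality holding because of continuity of $\Lambda$, the second following from 14.9 and the third holding by the dominated convergence theorem.
This is a very nice formula but it still has not been shown that $h \in L^{q}(\Omega)$.
Let $E_n = \{x : |h(x)| \le n\}$. Thus $|h\mathcal{X}_{E_n}| \le n$. Then
$$|h\mathcal{X}_{E_n}|^{q-2}\left(\overline{h}\mathcal{X}_{E_n}\right) \in L^{p}(\Omega).$$
By the claim, it follows that
$$\begin{aligned}
\|h\mathcal{X}_{E_n}\|_{q}^{q} &= \int h|h\mathcal{X}_{E_n}|^{q-2}\left(\overline{h}\mathcal{X}_{E_n}\right)d\mu = \Lambda\left(|h\mathcal{X}_{E_n}|^{q-2}\left(\overline{h}\mathcal{X}_{E_n}\right)\right)\\
&\le \|\Lambda\|\left\||h\mathcal{X}_{E_n}|^{q-2}\left(\overline{h}\mathcal{X}_{E_n}\right)\right\|_{p} = \|\Lambda\|\,\|h\mathcal{X}_{E_n}\|_{q}^{\frac{q}{p}},
\end{aligned}$$
the last equality holding because $q - 1 = q/p$ and so
$$\left(\int\left||h\mathcal{X}_{E_n}|^{q-2}\left(\overline{h}\mathcal{X}_{E_n}\right)\right|^{p}d\mu\right)^{1/p} = \left(\int\left(|h\mathcal{X}_{E_n}|^{q/p}\right)^{p}d\mu\right)^{1/p} = \|h\mathcal{X}_{E_n}\|_{q}^{\frac{q}{p}}.$$
Therefore, since $q - \frac{q}{p} = 1$, it follows that
$$\|h\mathcal{X}_{E_n}\|_{q} \le \|\Lambda\|.$$
Letting $n \to \infty$, the Monotone Convergence theorem implies
$$\|h\|_{q} \le \|\Lambda\|. \tag{14.10}$$
Now that $h$ has been shown to be in $L^{q}(\Omega)$, it follows from 14.9 and the density of the simple functions, Theorem 10.25 on Page 260, that
$$\Lambda f = \int hf\,d\mu$$
for all $f \in L^{p}(\Omega)$.
It only remains to verify the last claim.
$$\|\Lambda\| = \sup\left\{\int hf : \|f\|_{p} \le 1\right\} \le \|h\|_{q} \le \|\Lambda\|$$
by 14.10, and Hölder's inequality. This proves the theorem.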
To represent elements of the dual space of $L^{1}(\Omega)$, another Banach space is needed.
Definition 14.14 Let $(\Omega, \mathcal{S}, \mu)$ be a measure space. $L^{\infty}(\Omega)$ is the vector space of measurable functions such that for some $M > 0$, $|f(x)| \le M$ for all $x$ outside of some set of measure zero ($|f(x)| \le M$ a.e.). Define $f = g$ when $f(x) = g(x)$ a.e. and
$$\|f\|_{\infty} \equiv \inf\{M : |f(x)| \le M \text{ a.e.}\}.$$
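On a finite set with point masses the infimum in this definition is easy to compute, since the only sets of measure zero are sets of points carrying no mass. The sketch below assumes that setting; `essential_supremum` is a hypothetical helper name.

```python
def essential_supremum(f: dict, mu: dict) -> float:
    """Smallest M with |f(x)| <= M outside a set of mu measure zero.

    Assumes `f` and `mu` are dictionaries on the same finite set, with `mu`
    giving nonnegative point masses. Points of mass zero are ignored.
    """
    return max((abs(f[x]) for x in f if mu[x] > 0), default=0.0)
```

A function may take a large value at a point of measure zero without affecting the norm, which is the reason functions equal a.e. are identified.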

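Theorem 14.15 $L^{\infty}(\Omega)$ is a Banach space.
Proof: It is clear that $L^{\infty}(\Omega)$ is a vector space. Is $\|\cdot\|_{\infty}$ a norm?
Claim: If $f \in L^{\infty}(\Omega)$, then $|f(x)| \le \|f\|_{\infty}$ a.e.
Proof of the claim: $\{x : |f(x)| \ge \|f\|_{\infty} + n^{-1}\} \equiv E_n$ is a set of measure zero according to the definition of $\|f\|_{\infty}$. Furthermore, $\{x : |f(x)| > \|f\|_{\infty}\} = \cup_{n}E_n$ and so it is also a set of measure zero. This verifies the claim.
Now if $\|f\|_{\infty} = 0$ it follows that $f(x) = 0$ a.e. Also if $f, g \in L^{\infty}(\Omega)$,
$$|f(x) + g(x)| \le |f(x)| + |g(x)| \le \|f\|_{\infty} + \|g\|_{\infty}$$
a.e. and so $\|f\|_{\infty} + \|g\|_{\infty}$ serves as one of the constants, $M$ in the definition of $\|f + g\|_{\infty}$. Therefore,
$$\|f + g\|_{\infty} \le \|f\|_{\infty} + \|g\|_{\infty}.$$
Next let $c$ be a number. Then $|cf(x)| = |c||f(x)| \le |c|\|f\|_{\infty}$ and so $\|cf\|_{\infty} \le |c|\|f\|_{\infty}$. Therefore since $c \ne 0$ is arbitrary, $\|f\|_{\infty} = \|c(1/c)f\|_{\infty} \le \left|\frac{1}{c}\right|\|cf\|_{\infty}$ which implies $|c|\|f\|_{\infty} \le \|cf\|_{\infty}$. Thus $\|\cdot\|_{\infty}$ is a norm as claimed.
To verify completeness, let $\{f_n\}$ be a Cauchy sequence in $L^{\infty}(\Omega)$ and use the above claim to get the existence of a set of measure zero, $E_{nm}$ such that for all $x \notin E_{nm}$,
$$|f_n(x) - f_m(x)| \le \|f_n - f_m\|_{\infty}.$$
Let $E = \cup_{n,m}E_{nm}$. Thus $\mu(E) = 0$ and for each $x \notin E$, $\{f_n(x)\}_{n=1}^{\infty}$ is a Cauchy sequence in $\mathbb{C}$. Let
$$f(x) = \begin{cases} 0 & \text{if } x \in E\\ \lim_{n\to\infty}f_n(x) & \text{if } x \notin E \end{cases} = \lim_{n\to\infty}\mathcal{X}_{E^{C}}(x)f_n(x).$$
Then $f$ is clearly measurable because it is the limit of measurable functions. If
$$F_n = \{x : |f_n(x)| > \|f_n\|_{\infty}\}$$
and $F = \cup_{n=1}^{\infty}F_n$, it follows $\mu(F) = 0$ and that for $x \notin F \cup E$,
$$|f(x)| \le \liminf_{n\to\infty}|f_n(x)| \le \liminf_{n\to\infty}\|f_n\|_{\infty} < \infty$$
because $\{\|f_n\|_{\infty}\}$ is a Cauchy sequence. ($\left|\|f_n\|_{\infty} - \|f_m\|_{\infty}\right| \le \|f_n - f_m\|_{\infty}$ by the triangle inequality.) Thus $f \in L^{\infty}(\Omega)$. Let $n$ be large enough that whenever $m > n$,
$$\|f_m - f_n\|_{\infty} < \varepsilon.$$
Then, if $x \notin E$,
$$|f(x) - f_n(x)| = \lim_{m\to\infty}|f_m(x) - f_n(x)| \le \liminf_{m\to\infty}\|f_m - f_n\|_{\infty} \le \varepsilon.$$
Hence $\|f - f_n\|_{\infty} \le \varepsilon$ for all $n$ large enough. This proves the theorem.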
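The next theorem is the Riesz representation theorem for $\left(L^{1}(\Omega)\right)'$.
Theorem 14.16 (Riesz representation theorem) Let $(\Omega, \mathcal{S}, \mu)$ be a finite measure space. If $\Lambda \in (L^{1}(\Omega))'$, then there exists a unique $h \in L^{\infty}(\Omega)$ such that
$$\Lambda(f) = \int_{\Omega} hf\,d\mu$$
for all $f \in L^{1}(\Omega)$. If $h$ is the function in $L^{\infty}(\Omega)$ representing $\Lambda \in (L^{1}(\Omega))'$, then $\|h\|_{\infty} = \|\Lambda\|$.
Proof: Just as in the proof of Theorem 14.13, there exists a unique $h \in L^{1}(\Omega)$ such that for all simple functions, $s$,
$$\Lambda(s) = \int hs\,d\mu. \tag{14.11}$$
To show $h \in L^{\infty}(\Omega)$, let $\varepsilon > 0$ be given and let
$$E = \{x : |h(x)| \ge \|\Lambda\| + \varepsilon\}.$$
Let $|k| = 1$ and $hk = |h|$. Since the measure space is finite, $k \in L^{1}(\Omega)$. As in Theorem 14.13 let $\{s_n\}$ be a sequence of simple functions converging to $k$ in $L^{1}(\Omega)$, and pointwise. It follows from the construction in Theorem 7.24 on Page 150 that it can be assumed $|s_n| \le 1$. Therefore
$$\Lambda(k\mathcal{X}_{E}) = \lim_{n\to\infty}\Lambda(s_n\mathcal{X}_{E}) = \lim_{n\to\infty}\int_{E}hs_n\,d\mu = \int_{E}hk\,d\mu$$
where the last equality holds by the Dominated Convergence theorem. Therefore,
$$\|\Lambda\|\mu(E) \ge |\Lambda(k\mathcal{X}_{E})| = \left|\int_{\Omega}hk\mathcal{X}_{E}\,d\mu\right| = \int_{E}|h|\,d\mu \ge (\|\Lambda\| + \varepsilon)\mu(E).$$
It follows that $\mu(E) = 0$. Since $\varepsilon > 0$ was arbitrary, $\|\Lambda\| \ge \|h\|_{\infty}$. Since $h \in L^{\infty}(\Omega)$, the density of the simple functions in $L^{1}(\Omega)$ and 14.11 imply
$$\Lambda f = \int_{\Omega}hf\,d\mu, \quad \|\Lambda\| \ge \|h\|_{\infty}. \tag{14.12}$$
This proves the existence part of the theorem. To verify uniqueness, suppose $h_1$ and $h_2$ both represent $\Lambda$ and let $f \in L^{1}(\Omega)$ be such that $|f| \le 1$ and $f(h_1 - h_2) = |h_1 - h_2|$. Then
$$0 = \Lambda f - \Lambda f = \int(h_1 - h_2)f\,d\mu = \int|h_1 - h_2|\,d\mu.$$
Thus $h_1 = h_2$. Finally,
$$\|\Lambda\| = \sup\left\{\left|\int hf\,d\mu\right| : \|f\|_{1} \le 1\right\} \le \|h\|_{\infty} \le \|\Lambda\|$$
by 14.12.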
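Next these results are extended to the $\sigma$ finite case.
Lemma 14.17 Let $(\Omega, \mathcal{S}, \mu)$ be a measure space and suppose there exists a measurable function, $r$ such that $r(x) > 0$ for all $x$, there exists $M$ such that $|r(x)| < M$ for all $x$, and $\int r\,d\mu < \infty$. Then for
$$\Lambda \in (L^{p}(\Omega, \mu))', \quad p \ge 1,$$
there exists $h \in L^{q}(\Omega, \mu)$, $L^{\infty}(\Omega, \mu)$ if $p = 1$ such that
$$\Lambda f = \int hf\,d\mu.$$
Also $\|h\| = \|\Lambda\|$. ($\|h\| = \|h\|_{q}$ if $p > 1$, $\|h\|_{\infty}$ if $p = 1$). Here
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Proof: Define a new measure $\widetilde{\mu}$, according to the rule
$$\widetilde{\mu}(E) \equiv \int_{E} r\,d\mu. \tag{14.13}$$
Thus $\widetilde{\mu}$ is a finite measure on $\mathcal{S}$. For $\Lambda \in (L^{p}(\mu))'$, define $\widetilde{\Lambda} \in (L^{p}(\widetilde{\mu}))'$ by
$$\widetilde{\Lambda}(g) \equiv \Lambda\left(r^{1/p}g\right).$$
This really is in $(L^{p}(\widetilde{\mu}))'$ because
$$\left|\widetilde{\Lambda}(g)\right| \equiv \left|\Lambda\left(r^{1/p}g\right)\right| \le \|\Lambda\|\left(\int\left|r^{1/p}g\right|^{p}d\mu\right)^{1/p} = \|\Lambda\|\,\|g\|_{L^{p}(\widetilde{\mu})}.$$
Therefore, by Theorems 14.16 and 14.13 there exists a unique $h \in L^{q}(\widetilde{\mu})$ which represents $\widetilde{\Lambda}$. Here $q = \infty$ if $p = 1$ and satisfies $1/q + 1/p = 1$ otherwise. Thus for $g \in L^{p}(\widetilde{\mu})$,
$$\Lambda\left(r^{1/p}g\right) \equiv \widetilde{\Lambda}(g) = \int hgr\,d\mu = \int\left(r^{1/q}h\right)\left(r^{1/p}g\right)d\mu.$$
For $f \in L^{p}(\mu)$, it follows $f = r^{1/p}\left(r^{-1/p}f\right) = r^{1/p}g$ and $r^{-1/p}f \in L^{p}(\widetilde{\mu})$. Thus from the above,
$$\Lambda(f) = \Lambda\left(r^{1/p}\left(r^{-1/p}f\right)\right) = \int\left(r^{1/q}h\right)r^{1/p}\left(r^{-1/p}f\right)d\mu = \int\left(r^{1/q}h\right)f\,d\mu.$$
Since $h \in L^{q}(\widetilde{\mu})$, it follows $r^{1/q}h \in L^{q}(\mu)$. This is true even in the case that $p = 1$ so $q = \infty$ because $r$ is bounded. It follows
$$\begin{aligned}
\left\|r^{1/q}h\right\|_{L^{q}(\mu)}^{q} &= \int r^{1/q}h\left|r^{1/q}h\right|^{q-2}\overline{r^{1/q}h}\,d\mu = \Lambda\left(\left|r^{1/q}h\right|^{q-2}\overline{r^{1/q}h}\right)\\
&\le \|\Lambda\|\left(\int\left(\left|r^{1/q}h\right|^{q/p}\right)^{p}d\mu\right)^{1/p} = \|\Lambda\|\left\|r^{1/q}h\right\|_{L^{q}(\mu)}^{q/p}
\end{aligned}$$
and so
$$\left\|r^{1/q}h\right\|_{L^{q}(\mu)} \le \|\Lambda\|.$$
Now
$$\|\Lambda\| \equiv \sup_{\|f\|_{L^{p}(\mu)} \le 1}\left|\int\left(r^{1/q}h\right)f\,d\mu\right| \le \left\|r^{1/q}h\right\|_{L^{q}(\mu)} \le \|\Lambda\|$$
and so all the conclusions of Theorems 14.16 and 14.13 hold. This proves the lemma.
A situation in which the conditions of the lemma are satisfied is the case where the measure space is $\sigma$ finite. In fact, you should show this is the only case in which the conditions of the above lemma hold.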
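Theorem 14.18 (Riesz representation theorem) Let $(\Omega, \mathcal{S}, \mu)$ be $\sigma$ finite and let
$$\Lambda \in (L^p(\Omega, \mu))', \quad p \ge 1.$$
Then there exists a unique $h \in L^q(\Omega, \mu)$, $L^\infty(\Omega, \mu)$ if $p = 1$, such that
$$\Lambda f = \int hf \, d\mu.$$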
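Also $\|h\| = \|\Lambda\|$. ($\|h\| = \|h\|_q$ if $p > 1$, $\|h\|_\infty$ if $p = 1$.) Here
$$\frac{1}{p} + \frac{1}{q} = 1.$$
Proof: Without loss of generality, assume $\mu(\Omega) = \infty$. Then let $\{\Omega_n\}$ be a sequence of disjoint elements of $\mathcal{S}$ having the property that
$$1 < \mu(\Omega_n) < \infty, \quad \bigcup_{n=1}^\infty \Omega_n = \Omega.$$
Define
$$r(x) = \sum_{n=1}^\infty \frac{1}{n^2} \mathcal{X}_{\Omega_n}(x)\, \mu(\Omega_n)^{-1}, \quad \widetilde{\mu}(E) = \int_E r \, d\mu.$$
Thus
$$\int_\Omega r \, d\mu = \widetilde{\mu}(\Omega) = \sum_{n=1}^\infty \frac{1}{n^2} < \infty$$
so $\widetilde{\mu}$ is a finite measure. The above lemma gives the existence part of the conclusion of the theorem. Uniqueness is done as before. This proves the theorem.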
With the Riesz representation theorem, it is easy to show that
$$L^p(\Omega), \quad p > 1$$
is a reflexive Banach space. Recall Definition 12.32 on Page 315 for the definition.
Theorem 14.19 For $(\Omega, \mathcal{S}, \mu)$ a $\sigma$ finite measure space and $p > 1$, $L^p(\Omega)$ is reflexive.
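Proof: Let $\delta_r : (L^r(\Omega))' \to L^{r'}(\Omega)$ be defined for $\frac{1}{r} + \frac{1}{r'} = 1$ by
$$\int (\delta_r \Lambda) g \, d\mu = \Lambda g$$
for all $g \in L^r(\Omega)$. From Theorem 14.18 $\delta_r$ is one to one, onto, continuous and linear. By the open map theorem, $\delta_r^{-1}$ is also one to one, onto, and continuous ($\delta_r \Lambda$ equals the representor of $\Lambda$). Thus $\delta_r^*$ is also one to one, onto, and continuous by Corollary 12.29. Now observe that $J = \delta_p^* \circ \delta_q^{-1}$. To see this, let $z^* \in (L^q)'$, $y^* \in (L^p)'$. Then
$$\delta_p^* \circ \delta_q^{-1}(\delta_q z^*)(y^*) = (\delta_p^* z^*)(y^*) = z^*(\delta_p y^*) = \int (\delta_q z^*)(\delta_p y^*) \, d\mu,$$
$$J(\delta_q z^*)(y^*) = y^*(\delta_q z^*) = \int (\delta_p y^*)(\delta_q z^*) \, d\mu.$$
Therefore $\delta_p^* \circ \delta_q^{-1} = J$ on $\delta_q (L^q)' = L^p$. But the two $\delta$ maps are onto and so $J$ is also onto.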
14.4 The Dual Space Of C_0(X)
Consider the dual space of $C_0(X)$ where $X$ is a locally compact Hausdorff space. It will turn out to be a space of measures. To show this, the following lemma will be convenient. Recall this space is defined as follows.
Definition 14.20 $f \in C_0(X)$ means that for every $\varepsilon > 0$ there exists a compact set $K$ such that $|f(x)| < \varepsilon$ whenever $x \notin K$. Recall the norm on this space is
$$\|f\|_\infty \equiv \|f\| \equiv \sup\{|f(x)| : x \in X\}.$$
Lemma 14.21 Suppose $\lambda$ is a mapping which has nonnegative values which is defined on the nonnegative functions in $C_0(X)$ such that
$$\lambda(af + bg) = a\lambda(f) + b\lambda(g) \tag{14.14}$$
whenever $a, b \ge 0$ and $f, g \ge 0$. Then there exists a unique extension of $\lambda$ to all of $C_0(X)$, $\Lambda$, such that whenever $f, g \in C_0(X)$ and $a, b \in \mathbb{C}$, it follows
$$\Lambda(af + bg) = a\Lambda(f) + b\Lambda(g).$$
If
$$|\lambda(f)| \le C \|f\|_\infty$$
then
$$|\Lambda f| \le C \|f\|_\infty.$$
Proof: Let $C_0(X; \mathbb{R})$ be the real-valued functions in $C_0(X)$ and define
$$\Lambda_R(f) = \lambda f^+ - \lambda f^-$$
for $f \in C_0(X; \mathbb{R})$. Using the identity
$$(f_1 + f_2)^+ + f_1^- + f_2^- = f_1^+ + f_2^+ + (f_1 + f_2)^-$$
and 14.14 to write
$$\lambda(f_1 + f_2)^+ - \lambda(f_1 + f_2)^- = \lambda f_1^+ - \lambda f_1^- + \lambda f_2^+ - \lambda f_2^-,$$
it follows that $\Lambda_R(f_1 + f_2) = \Lambda_R(f_1) + \Lambda_R(f_2)$. To show that $\Lambda_R$ is linear, it is necessary to verify that $\Lambda_R(cf) = c\Lambda_R(f)$ for all $c \in \mathbb{R}$. But $(cf)^\pm = cf^\pm$ if $c \ge 0$, while $(cf)^+ = -c(f)^-$ if $c < 0$ and $(cf)^- = (-c)f^+$,
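if $c < 0$. Thus, if $c < 0$,
$$\Lambda_R(cf) = \lambda(cf)^+ - \lambda(cf)^- = \lambda\left((-c) f^-\right) - \lambda\left((-c) f^+\right)$$
$$= -c\lambda(f^-) + c\lambda(f^+) = c\left(\lambda(f^+) - \lambda(f^-)\right) = c\Lambda_R(f).$$
A similar formula holds more easily if $c \ge 0$. Now let
$$\Lambda f = \Lambda_R(\operatorname{Re} f) + i\Lambda_R(\operatorname{Im} f)$$
for arbitrary $f \in C_0(X)$. This is linear as desired.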
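Here is why. It is obvious that $\Lambda(f + g) = \Lambda(f) + \Lambda(g)$ from the fact that taking the real and imaginary parts are linear operations. The only thing to check is whether you can factor out a complex scalar.
$$\Lambda((a + ib) f) = \Lambda(af) + \Lambda(ibf)$$
$$\equiv \Lambda_R(a \operatorname{Re} f) + i\Lambda_R(a \operatorname{Im} f) + \Lambda_R(-b \operatorname{Im} f) + i\Lambda_R(b \operatorname{Re} f)$$
because $ibf = ib \operatorname{Re} f - b \operatorname{Im} f$ and so $\operatorname{Re}(ibf) = -b \operatorname{Im} f$ and $\operatorname{Im}(ibf) = b \operatorname{Re} f$. Therefore, the above equals
$$(a + ib)\Lambda_R(\operatorname{Re} f) + i(a + ib)\Lambda_R(\operatorname{Im} f) = (a + ib)\left(\Lambda_R(\operatorname{Re} f) + i\Lambda_R(\operatorname{Im} f)\right) = (a + ib)\Lambda f.$$
The extension is obviously unique because all the above is required in order for $\Lambda$ to be linear.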
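It remains to verify the claim about continuity of $\Lambda$. From the definition of $\lambda$, if $0 \le g \le f$, then
$$\lambda(f) = \lambda(f - g + g) = \lambda(f - g) + \lambda(g) \ge \lambda(g),$$
$$|\Lambda_R f| \equiv \left| \lambda f^+ - \lambda f^- \right| \le \max\left( \lambda f^+, \lambda f^- \right) \le \lambda(|f|) \le C \|f\|_\infty.$$
Then letting $\omega \Lambda f = |\Lambda f|$, $|\omega| = 1$, and using the above,
$$|\Lambda f| = \omega \Lambda f = \Lambda(\omega f) = \Lambda_R(\operatorname{Re}(\omega f)) = |\Lambda_R(\operatorname{Re}(\omega f))| \le C \|\operatorname{Re}(\omega f)\| \le C \|f\|_\infty.$$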

Let $L \in C_0(X)'$. Also denote by $C_0^+(X)$ the set of nonnegative functions in $C_0(X)$. Define for $f \in C_0^+(X)$
$$\lambda(f) = \sup\{|Lg| : |g| \le f\}.$$
Note that $\lambda(f) < \infty$ because $|Lg| \le \|L\| \|g\| \le \|L\| \|f\|$ for $|g| \le f$. Then the following lemma is important.
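Lemma 14.22 If $c \ge 0$, $\lambda(cf) = c\lambda(f)$, $f_1 \le f_2$ implies $\lambda f_1 \le \lambda f_2$, and
$$\lambda(f_1 + f_2) = \lambda(f_1) + \lambda(f_2).$$
Also
$$0 \le \lambda(f) \le \|L\| \, \|f\|_\infty.$$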
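Proof: The first two assertions are easy to see so consider the third. For $f_i \in C_0^+(X)$ and $\varepsilon > 0$, there exists $g_i \in C_0(X)$ such that $|g_i| \le f_i$ and
$$\lambda(f_1) + \lambda(f_2) \le |L(g_1)| + |L(g_2)| + 2\varepsilon = L(\omega_1 g_1) + L(\omega_2 g_2) + 2\varepsilon$$
$$= L(\omega_1 g_1 + \omega_2 g_2) + 2\varepsilon = |L(\omega_1 g_1 + \omega_2 g_2)| + 2\varepsilon$$
where $|g_i| \le f_i$ and $|\omega_i| = 1$ and $\omega_i L(g_i) = |L(g_i)|$. Now
$$|\omega_1 g_1 + \omega_2 g_2| \le |g_1| + |g_2| \le f_1 + f_2$$
and so the above shows
$$\lambda(f_1) + \lambda(f_2) \le \lambda(f_1 + f_2) + 2\varepsilon.$$
Since $\varepsilon$ is arbitrary, $\lambda(f_1) + \lambda(f_2) \le \lambda(f_1 + f_2)$. It remains to verify the other inequality.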
Now let $|g| \le f_1 + f_2$, $|Lg| \ge \lambda(f_1 + f_2) - \varepsilon$. Let
$$h_i(x) = \begin{cases} \dfrac{f_i(x) g(x)}{f_1(x) + f_2(x)} & \text{if } f_1(x) + f_2(x) > 0, \\ 0 & \text{if } f_1(x) + f_2(x) = 0. \end{cases}$$
Then $h_i$ is continuous and $h_1(x) + h_2(x) = g(x)$, $|h_i| \le f_i$. The reason it is continuous at a point where $f_1(x) + f_2(x) = 0$ is that at every point $y$ where $f_1(y) + f_2(y) > 0$, the top description of the function gives
$$\left| \frac{f_i(y) g(y)}{f_1(y) + f_2(y)} \right| \le |g(y)|.$$
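Therefore,
$$-\varepsilon + \lambda(f_1 + f_2) \le |Lg| \le |Lh_1 + Lh_2| \le |Lh_1| + |Lh_2| \le \lambda(f_1) + \lambda(f_2).$$
Since $\varepsilon > 0$ is arbitrary, this shows that
$$\lambda(f_1 + f_2) \le \lambda(f_1) + \lambda(f_2) \le \lambda(f_1 + f_2).$$
The last assertion follows from
$$\lambda(f) = \sup\{|Lg| : |g| \le f\} \le \sup_{\|g\|_\infty \le \|f\|_\infty} \|L\| \, \|g\|_\infty \le \|L\| \, \|f\|_\infty$$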

which proves the lemma.
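The splitting $g = h_1 + h_2$ used in the second half of the proof can be checked numerically. The following is a minimal sketch, assuming two sample tent functions for $f_1, f_2$ and a complex $g$ with $|g| \le f_1 + f_2$; the helper names `tent` and `split` and the sample grid are illustrative choices, not part of the notes. It confirms, up to rounding, that $h_1 + h_2 = g$ and $|h_i| \le f_i$ at every grid point.

```python
import cmath

def tent(x, center):
    """Nonnegative continuous 'tent' function supported on [center - 1, center + 1]."""
    return max(0.0, 1.0 - abs(x - center))

def split(f1, f2, g):
    """Return (h1, h2) with h_i = f_i * g / (f1 + f2) where f1 + f2 > 0, and 0 otherwise."""
    total = f1 + f2
    if total > 0:
        return f1 * g / total, f2 * g / total
    return 0.0, 0.0

xs = [-3.0 + 0.01 * k for k in range(601)]
rows = []
for x in xs:
    f1, f2 = tent(x, 0.0), tent(x, 0.5)
    g = (f1 + f2) * cmath.exp(1j * x)   # complex valued, |g| <= f1 + f2
    h1, h2 = split(f1, f2, g)
    rows.append((f1, f2, g, h1, h2))

# Largest violation of h1 + h2 = g and of |h_i| <= f_i over the grid.
max_sum_error = max(abs(h1 + h2 - g) for f1, f2, g, h1, h2 in rows)
worst_excess = max(max(abs(h1) - f1, abs(h2) - f2) for f1, f2, g, h1, h2 in rows)
print(max_sum_error, worst_excess)
```

Both printed numbers should be at the level of floating point rounding, which matches the two identities used to obtain $\lambda(f_1 + f_2) \le \lambda(f_1) + \lambda(f_2)$.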
Let $\Lambda$ be defined in Lemma 14.21. Then $\Lambda$ is linear by this lemma and also satisfies
$$|\Lambda f| \le \|L\| \, \|f\|_\infty. \tag{14.15}$$
Also, if $f \ge 0$,
$$\Lambda f = \Lambda_R f = \lambda(f) \ge 0.$$
Therefore, $\Lambda$ is a positive linear functional on $C_0(X)$. In particular, it is a positive linear functional on $C_c(X)$. By Theorem 8.23 on Page 186, there exists a unique measure $\mu$ such that
$$\Lambda f = \int_X f \, d\mu$$
for all $f \in C_c(X)$. This measure is inner regular on all open sets and on all measurable sets having finite measure. In fact, it is actually a finite measure.
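Lemma 14.23 Let $L \in C_0(X)'$ as above. Then letting $\mu$ be the Radon measure just described, it follows $\mu$ is finite and
$$\mu(X) = \|\Lambda\| = \|L\|.$$
Proof: First of all, why is $\|\Lambda\| = \|L\|$? From 14.15 it follows $\|\Lambda\| \le \|L\|$. But also
$$|Lg| \le \lambda(|g|) = \Lambda(|g|) \le \|\Lambda\| \, \|g\|_\infty$$
and so by definition of the operator norm, $\|L\| \le \|\Lambda\|$.
Now $X$ is an open set and so
$$\mu(X) = \sup\{\mu(K) : K \subseteq X, \ K \text{ compact}\}$$
and so letting $K \prec f \prec X$ for one of these $K$, it also follows
$$\mu(X) = \sup\{\Lambda f : f \prec X\}.$$
However, for such $f \prec X$,
$$0 \le \Lambda f = \Lambda_R f \le \|L\| \, \|f\|_\infty = \|L\|$$
and so
$$\mu(X) \le \|L\|.$$
Now since $C_c(X)$ is dense in $C_0(X)$, there exists $f \in C_c(X)$ such that $\|f\| \le 1$ and
$$|\Lambda f| + \varepsilon > \|\Lambda\| = \|L\|.$$
Then also $|f| \prec X$ and so
$$\|L\| - \varepsilon < |\Lambda f| \le \Lambda |f| \le \mu(X).$$
Since $\varepsilon$ is arbitrary, this shows $\|L\| = \mu(X)$. This proves the lemma.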
What follows is the Riesz representation theorem for $C_0(X)'$.
Theorem 14.24 Let $L \in (C_0(X))'$ for $X$ a locally compact Hausdorff space. Then there exists a finite Radon measure $\mu$ and a function $\sigma \in L^\infty(X, \mu)$ such that for all $f \in C_0(X)$,
$$L(f) = \int_X f\sigma \, d\mu.$$
Furthermore,
$$\mu(X) = \|L\|, \quad |\sigma| = 1 \text{ a.e.}$$
and if
$$\nu(E) \equiv \int_E \sigma \, d\mu$$
then $\mu = |\nu|$.
Proof: From the above there exists a unique Radon measure $\mu$ such that for all $f \in C_c(X)$,
$$\Lambda f = \int_X f \, d\mu.$$
Then for $f \in C_c(X)$,
$$|Lf| \le \lambda(|f|) = \int_X |f| \, d\mu = \|f\|_{L^1(\mu)}.$$
Since $\mu$ is both inner and outer regular thanks to it being finite, $C_c(X)$ is dense in $L^1(X, \mu)$. (See Theorem 10.28 for more than is needed.) Therefore $L$ extends uniquely to an element $\widetilde{L}$ of $(L^1(X, \mu))'$. By the Riesz representation theorem for $L^1$ for finite measure spaces, there exists a unique $\sigma \in L^\infty(X, \mu)$ such that for all $f \in L^1(X, \mu)$,
$$\widetilde{L} f = \int_X f\sigma \, d\mu.$$
In particular, for all $f \in C_0(X)$,
$$Lf = \int_X f\sigma \, d\mu$$
and it follows from Lemma 14.23, $\mu(X) = \|L\|$.
It remains to verify $|\sigma| = 1$ a.e. For any $f \ge 0$,
$$\Lambda f \equiv \int_X f \, d\mu \ge |Lf| = \left| \int_X f\sigma \, d\mu \right|.$$
Now if $E$ is measurable, the regularity of $\mu$ implies there exists a sequence of bounded functions $f_n \in C_c(X)$ such that $f_n(x) \to \mathcal{X}_E(x)$ a.e. Then using the dominated convergence theorem in the above,
$$\int_E d\mu = \lim_{n\to\infty} \int_X f_n \, d\mu \ge \lim_{n\to\infty} \left| \int_X f_n \sigma \, d\mu \right| = \left| \int_E \sigma \, d\mu \right|$$

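and so if $\mu(E) > 0$,
$$1 \ge \left| \frac{1}{\mu(E)} \int_E \sigma \, d\mu \right|$$
which shows from Lemma 14.10 that $|\sigma| \le 1$ a.e. But also, choosing $f_1$ appropriately, $\|f_1\|_\infty \le 1$, and letting $\omega L f_1 = |L f_1|$,
$$\mu(X) = \|L\| = \sup_{\|f\|_\infty \le 1} |Lf| \le |L f_1| + \varepsilon = \int_X f_1 \omega \sigma \, d\mu + \varepsilon = \int_X \operatorname{Re}(f_1 \omega \sigma) \, d\mu + \varepsilon \le \int_X |\sigma| \, d\mu + \varepsilon$$
and since $\varepsilon$ is arbitrary,
$$\mu(X) \le \int_X |\sigma| \, d\mu$$
which requires $|\sigma| = 1$ a.e. since it was shown to be no larger than 1 and if it is smaller than 1 on a set of positive measure, then the above could not hold.
It only remains to verify $\mu = |\nu|$. By Corollary 14.12,
$$|\nu|(E) = \int_E |\sigma| \, d\mu = \int_E 1 \, d\mu = \mu(E)$$
and so $\mu = |\nu|$. This proves the Theorem.
Sometimes people write
$$\int_X f \, d\nu \equiv \int_X f\sigma \, d|\nu|$$
where $\sigma \, d|\nu|$ is the polar decomposition of the complex measure $\nu$. Then with this convention, the above representation is
$$L(f) = \int_X f \, d\nu, \quad |\nu|(X) = \|L\|.$$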
14.5 The Dual Space Of C_0(X), Another Approach
It is possible to obtain the above theorem by a slick trick after first proving it for the special case where $X$ is a compact Hausdorff space. For $X$ a locally compact Hausdorff space, $\widetilde{X}$ denotes the one point compactification of $X$. Thus, $\widetilde{X} = X \cup \{\infty\}$ and the topology of $\widetilde{X}$ consists of the usual topology of $X$ along with all complements of compact sets which are defined as the open sets containing $\infty$. Also $C_0(X)$ will denote the space of continuous functions, $f$, defined on $X$ such that in the topology of $\widetilde{X}$, $\lim_{x \to \infty} f(x) = 0$. For this space of functions, $\|f\|_0 \equiv \sup\{|f(x)| : x \in X\}$ is a norm which makes this into a Banach space. Then the generalization is the following corollary.
Corollary 14.25 Let $L \in (C_0(X))'$ where $X$ is a locally compact Hausdorff space. Then there exists $\sigma \in L^\infty(X, \mu)$ for $\mu$ a finite Radon measure such that for all $f \in C_0(X)$,
$$L(f) = \int_X f\sigma \, d\mu.$$
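Proof: Let
$$\widetilde{D} \equiv \left\{ f \in C(\widetilde{X}) : f(\infty) = 0 \right\}.$$
Thus $\widetilde{D}$ is a closed subspace of the Banach space $C(\widetilde{X})$. Let $\theta : C_0(X) \to \widetilde{D}$ be defined by
$$\theta f(x) = \begin{cases} f(x) & \text{if } x \in X, \\ 0 & \text{if } x = \infty. \end{cases}$$
Then $\theta$ is an isometry of $C_0(X)$ and $\widetilde{D}$. ($\|\theta u\| = \|u\|$.) The following diagram is obtained.
$$C_0(X)' \xleftarrow{\ \theta^*\ } \widetilde{D}' \xleftarrow{\ i^*\ } C(\widetilde{X})'$$
$$C_0(X) \xrightarrow{\ \theta\ } \widetilde{D} \xrightarrow{\ i\ } C(\widetilde{X})$$
By the Hahn Banach theorem, there exists $L_1 \in C(\widetilde{X})'$ such that $\theta^* i^* L_1 = L$. Now apply Theorem 14.24 to get the existence of a finite Radon measure, $\mu_1$, on $\widetilde{X}$ and a function $\sigma \in L^\infty(\widetilde{X}, \mu_1)$, such that
$$L_1 g = \int_{\widetilde{X}} g\sigma \, d\mu_1.$$
Letting the $\sigma$ algebra of $\mu_1$ measurable sets be denoted by $\mathcal{S}_1$, define
$$\mathcal{S} \equiv \{E \setminus \{\infty\} : E \in \mathcal{S}_1\}$$
and let $\mu$ be the restriction of $\mu_1$ to $\mathcal{S}$. If $f \in C_0(X)$,
$$Lf = \theta^* i^* L_1 f \equiv L_1 i\theta f = L_1 \theta f = \int_{\widetilde{X}} \theta f \sigma \, d\mu_1 = \int_X f\sigma \, d\mu.$$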
14.6 More Attractive Formulations
In this section, Corollary 14.25 will be refined and placed in an arguably more attractive form. The measures involved will always be complex Borel measures defined on a $\sigma$ algebra of subsets of $X$, a locally compact Hausdorff space.
Definition 14.26 Let $\lambda$ be a complex measure. Then $\int f \, d\lambda \equiv \int fh \, d|\lambda|$ where $h \, d|\lambda|$ is the polar decomposition of $\lambda$ described above. The complex measure $\lambda$ is called regular if $|\lambda|$ is regular.
The following lemma says that the difference of regular complex measures is also regular.
Lemma 14.27 Suppose $\lambda_i$, $i = 1, 2$, is a regular complex Borel measure with total variation finite² defined on $X$, a locally compact Hausdorff space. Then $\lambda_1 - \lambda_2$ is also a regular measure on the Borel sets.
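Proof: Let $E$ be a Borel set. That way it is in the $\sigma$ algebras associated with both $\lambda_i$. Then by regularity of $\lambda_i$, there exist $K$ and $V$ compact and open respectively such that $K \subseteq E \subseteq V$ and $|\lambda_i|(V \setminus K) < \varepsilon/2$. Therefore, for any partition $\pi(V \setminus K)$ of $V \setminus K$,
$$\sum_{A \in \pi(V \setminus K)} |(\lambda_1 - \lambda_2)(A)| = \sum_{A \in \pi(V \setminus K)} |\lambda_1(A) - \lambda_2(A)| \le \sum_{A \in \pi(V \setminus K)} \left( |\lambda_1(A)| + |\lambda_2(A)| \right)$$
$$\le |\lambda_1|(V \setminus K) + |\lambda_2|(V \setminus K) < \varepsilon.$$
Therefore, $|\lambda_1 - \lambda_2|(V \setminus K) \le \varepsilon$ and this shows $\lambda_1 - \lambda_2$ is regular as claimed.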
Theorem 14.28 Let $L \in C_0(X)'$. Then there exists a unique complex measure $\lambda$ with $|\lambda|$ regular and Borel, such that for all $f \in C_0(X)$,
$$L(f) = \int_X f \, d\lambda.$$
Furthermore, $\|L\| = |\lambda|(X)$.
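Proof: By Corollary 14.25 there exists $\sigma \in L^\infty(X, \mu)$ where $\mu$ is a Radon measure such that for all $f \in C_0(X)$,
$$L(f) = \int_X f\sigma \, d\mu.$$
Let a complex Borel measure $\lambda$ be given by
$$\lambda(E) \equiv \int_E \sigma \, d\mu.$$
This is a well defined complex measure because $\mu$ is a finite measure. By Corollary 14.12
$$|\lambda|(E) = \int_E |\sigma| \, d\mu \tag{14.16}$$
and $\sigma = g|\sigma|$ where $g \, d|\lambda|$ is the polar decomposition for $\lambda$. Therefore, for $f \in C_0(X)$,
$$L(f) = \int_X f\sigma \, d\mu = \int_X fg|\sigma| \, d\mu = \int_X fg \, d|\lambda| \equiv \int_X f \, d\lambda. \tag{14.17}$$
² Recall this is automatic for a complex measure.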
From 14.16 and the regularity of $\mu$, it follows that $|\lambda|$ is also regular.
What of the claim about $\|L\|$? By the regularity of $|\lambda|$, it follows that $C_0(X)$ (in fact, $C_c(X)$) is dense in $L^1(X, |\lambda|)$. Since $|\lambda|$ is finite, $g \in L^1(X, |\lambda|)$. Therefore, there exists a sequence of functions $\{f_n\}$ in $C_0(X)$ such that $f_n \to \overline{g}$ in $L^1(X, |\lambda|)$. Therefore, there exists a subsequence, still denoted by $f_n$, such that $f_n(x) \to \overline{g}(x)$ $|\lambda|$ a.e. also. But since $|g(x)| = 1$ a.e. it follows that
$$h_n(x) \equiv \frac{f_n(x)}{|f_n(x)| + \frac{1}{n}}$$
also converges pointwise $|\lambda|$ a.e. to $\overline{g}(x)$, and $|h_n| \le 1$. Then from the dominated convergence theorem and 14.17
$$\|L\| \ge \lim_{n\to\infty} \int_X h_n g \, d|\lambda| = |\lambda|(X).$$
Also, if $\|f\|_{C_0(X)} \le 1$, then
$$|L(f)| = \left| \int_X fg \, d|\lambda| \right| \le \int_X |f| \, d|\lambda| \le |\lambda|(X) \, \|f\|_{C_0(X)}$$
and so $\|L\| \le |\lambda|(X)$. This proves everything but uniqueness.
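Suppose $\lambda$ and $\lambda_1$ both work. Then for all $f \in C_0(X)$,
$$0 = \int_X f \, d(\lambda - \lambda_1) = \int_X fh \, d|\lambda - \lambda_1|$$
where $h \, d|\lambda - \lambda_1|$ is the polar decomposition for $\lambda - \lambda_1$. By Lemma 14.27 $\lambda - \lambda_1$ is regular and so, as above, there exists $\{f_n\}$ such that $|f_n| \le 1$ and $f_n \to \overline{h}$ pointwise. Therefore, $\int_X d|\lambda - \lambda_1| = 0$ so $\lambda = \lambda_1$.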
14.7 Exercises
1. Suppose $\mu$ is a vector measure having values in $\mathbb{R}^n$ or $\mathbb{C}^n$. Can you show that $|\mu|$ must be finite? Hint: You might define for each $e_i$, one of the standard basis vectors, the real or complex measure $\mu_{e_i}$ given by $\mu_{e_i}(E) \equiv e_i \cdot \mu(E)$. Why would this approach not yield anything for an infinite dimensional normed linear space in place of $\mathbb{R}^n$?
2. The Riesz representation theorem of the $L^p$ spaces can be used to prove a very interesting inequality. Let $r, p, q \in (1, \infty)$ satisfy
$$\frac{1}{r} = \frac{1}{p} + \frac{1}{q} - 1.$$
Then
$$\frac{1}{q} = 1 + \frac{1}{r} - \frac{1}{p} > \frac{1}{r}$$
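and so $r > q$. Let $\theta \in (0, 1)$ be chosen so that $\theta r = q$. Then also, with $\frac{1}{p} + \frac{1}{p'} = 1$,
$$\frac{1}{r} = \left( 1 - \frac{1}{p'} \right) + \frac{1}{q} - 1 = \frac{1}{q} - \frac{1}{p'}$$
and so
$$\frac{\theta}{q} = \frac{1}{q} - \frac{1}{p'}$$
which implies $p'(1 - \theta) = q$. Now let $f \in L^p(\mathbb{R}^n)$, $g \in L^q(\mathbb{R}^n)$, $f, g \ge 0$. Justify the steps in the following argument using what was just shown, that $\theta r = q$ and $p'(1 - \theta) = q$. Let
$$h \in L^{r'}(\mathbb{R}^n), \quad \left( \frac{1}{r} + \frac{1}{r'} = 1 \right).$$
$$\left| \int f * g(x) h(x) \, dx \right| = \left| \int\!\!\int f(y) g(x - y) h(x) \, dx \, dy \right|$$
$$\le \int\!\!\int |f(y)| \, |g(x - y)|^{\theta} \, |g(x - y)|^{1 - \theta} \, |h(x)| \, dy \, dx$$
$$\le \int \left( \int \left( |g(x - y)|^{1 - \theta} |h(x)| \right)^{r'} dx \right)^{1/r'} \left( \int \left( |f(y)| \, |g(x - y)|^{\theta} \right)^{r} dx \right)^{1/r} dy$$
$$\le \left[ \int \left( \int \left( |g(x - y)|^{1 - \theta} |h(x)| \right)^{r'} dx \right)^{p'/r'} dy \right]^{1/p'} \left[ \int \left( \int \left( |f(y)| \, |g(x - y)|^{\theta} \right)^{r} dx \right)^{p/r} dy \right]^{1/p}$$
$$\le \left[ \int \left( \int \left( |g(x - y)|^{1 - \theta} |h(x)| \right)^{p'} dy \right)^{r'/p'} dx \right]^{1/r'} \left[ \int |f(y)|^{p} \left( \int |g(x - y)|^{\theta r} dx \right)^{p/r} dy \right]^{1/p}$$
$$= \left[ \int |h(x)|^{r'} \left( \int |g(x - y)|^{(1 - \theta) p'} dy \right)^{r'/p'} dx \right]^{1/r'} \|g\|_q^{q/r} \, \|f\|_p$$
$$= \|g\|_q^{q/r} \, \|g\|_q^{q/p'} \, \|f\|_p \, \|h\|_{r'} = \|g\|_q \, \|f\|_p \, \|h\|_{r'}. \tag{14.18}$$
Young's inequality says that
$$\|f * g\|_r \le \|g\|_q \, \|f\|_p. \tag{14.19}$$
How does this inequality follow from the above computation? Does 14.18 continue to hold if $r, p, q$ are only assumed to be in $[1, \infty]$? Explain. Does 14.19 hold even if $r$, $p$, and $q$ are only assumed to lie in $[1, \infty]$?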
3. Suppose $(\Omega, \mu, \mathcal{S})$ is a finite measure space and that $\{f_n\}$ is a sequence of functions which converge weakly to 0 in $L^p(\Omega)$. This means that
$$\int_\Omega f_n g \, d\mu \to 0$$
for every $g \in L^{p'}(\Omega)$. Suppose also that $f_n(x) \to 0$ a.e. Show that then $f_n \to 0$ in $L^{p - \varepsilon}(\Omega)$ for every $p > \varepsilon > 0$.
4. Give an example of a sequence of functions in $L^\infty(-\pi, \pi)$ which converges weak $*$ to zero but which does not converge pointwise a.e. to zero. Convergence weak $*$ to 0 means that for every $g \in L^1(-\pi, \pi)$,
$$\int_{-\pi}^{\pi} g(t) f_n(t) \, dt \to 0.$$
Hint: First consider $g \in C_c^\infty(-\pi, \pi)$ and maybe try something like $f_n(t) = \sin(nt)$. Do integration by parts.
5. Let $\lambda$ be a real vector measure on the measure space $(\Omega, \mathcal{F})$. That is, $\lambda$ has values in $\mathbb{R}$. The Hahn decomposition says there exist measurable sets $P, N$ such that
$$P \cup N = \Omega, \quad P \cap N = \emptyset,$$
and for each $F \subseteq P$, $\lambda(F) \ge 0$ and for each $F \subseteq N$, $\lambda(F) \le 0$. These sets $P, N$ are called the positive set and the negative set respectively. Show the existence of the Hahn decomposition. Also explain how this decomposition is unique in the sense that if $P', N'$ is another Hahn decomposition, then $(P \setminus P') \cup (P' \setminus P)$ has measure zero, a similar formula holding for $N, N'$. When you have the Hahn decomposition, as just described, you define $\lambda^+(E) \equiv \lambda(E \cap P)$, $\lambda^-(E) \equiv -\lambda(E \cap N)$. This is sometimes called the Hahn Jordan decomposition. Hint: This is pretty easy if you use the polar decomposition above.
6. The Hahn decomposition holds for measures which have values in $(-\infty, \infty]$. Let $\lambda$ be such a measure which is defined on a $\sigma$ algebra of sets $\mathcal{F}$. This is not a vector measure because the set on which it has values is not a vector space. Thus this case is not included in the above discussion. $N \in \mathcal{F}$ is called a negative set if $\lambda(B) \le 0$ for all $B \subseteq N$. $P \in \mathcal{F}$ is called a positive set if for all $F \subseteq P$, $\lambda(F) \ge 0$. (Here it is always assumed you are only considering sets of $\mathcal{F}$.) Show that if $\lambda(A) \le 0$, then there exists $N \subseteq A$ such that $N$ is a negative set and $\lambda(N) \le \lambda(A)$. Hint: This is done by subtracting off
disjoint sets having positive measure. Let $A \equiv N_0$ and suppose $N_n \subseteq A$ has been obtained. Tell why
$$t_n \equiv \sup\{\lambda(E) : E \subseteq N_n\} \ge 0.$$
Let $B_n \subseteq N_n$ be such that
$$\lambda(B_n) > \frac{t_n}{2}.$$
Then $N_{n+1} \equiv N_n \setminus B_n$. Thus the $N_n$ are decreasing in $n$ and the $B_n$ are disjoint. Explain why $\lambda(N_n) \le \lambda(N_0)$. Let $N = \cap N_n$. Argue $t_n$ must converge to 0 since otherwise $\lambda(N) = -\infty$. Explain why this requires $N$ to be a negative set in $A$ which has measure no larger than that of $A$.
7. Using Problem 6 complete the Hahn decomposition for $\lambda$ having values in $(-\infty, \infty]$. Now the Hahn Jordan decomposition for the measure $\lambda$ is
$$\lambda^+(E) \equiv \lambda(E \cap P), \quad \lambda^-(E) \equiv -\lambda(E \cap N).$$
Explain why $\lambda^-$ is a finite measure. Hint: Let $N_0 = \emptyset$. For $N_n$ a given negative set, let
$$t_n \equiv \inf\{\lambda(E) : E \cap N_n = \emptyset\}.$$
Explain why you can assume that for all $n$, $t_n < 0$. Let $E_n \subseteq N_n^C$ be such that
$$\lambda(E_n) < t_n/2 < 0$$
and from Problem 6 let $A_n \subseteq E_n$ be a negative set such that $\lambda(A_n) \le \lambda(E_n)$. Then $N_{n+1} \equiv N_n \cup A_n$. If $t_n$ does not converge to 0 explain why there exists a set having measure $-\infty$ which is not allowed. Thus $t_n \to 0$. Let $N = \cup_{n=1}^\infty N_n$ and explain why $P \equiv N^C$ must be positive due to $t_n \to 0$.
8. What if $\lambda$ has values in $[-\infty, \infty)$? Prove there exists a Hahn decomposition for $\lambda$ as in the above problem. Why do we not allow $\lambda$ to have values in $[-\infty, \infty]$? Hint: You might want to consider $-\lambda$.
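9. Suppose $X$ is a Banach space and let $X'$ denote its dual space. A sequence $\{x_n^*\}_{n=1}^\infty$ in $X'$ is said to converge weak $*$ to $x^* \in X'$ if for every $x \in X$,
$$\lim_{n\to\infty} x_n^*(x) = x^*(x).$$
Let $\{\phi_n\}$ be a mollifier. Also let $\delta$ be the measure defined by
$$\delta(E) = 1 \text{ if } 0 \in E \text{ and } 0 \text{ if } 0 \notin E.$$
Explain how $\phi_n \to \delta$ weak $*$.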
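10. Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X : \Omega \to \mathbb{R}^n$ be a random variable. This means $X^{-1}(\text{open set}) \in \mathcal{F}$. Define a measure $\lambda_X$ on the Borel sets of $\mathbb{R}^n$ as follows. For $E$ a Borel set,
$$\lambda_X(E) \equiv P\left( X^{-1}(E) \right).$$
Explain why this is well defined. Next explain why $\lambda_X$ can be considered a Radon probability measure by completion. Explain why $\lambda_X \in \mathcal{G}^*$ if
$$\lambda_X(\psi) \equiv \int_{\mathbb{R}^n} \psi \, d\lambda_X$$
where $\mathcal{G}$ is the collection of functions used to define the Fourier transform.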
11. Using the above problem, the characteristic function of this measure (random variable) is
$$\phi_X(y) \equiv \int_{\mathbb{R}^n} e^{i x \cdot y} \, d\lambda_X.$$
Show this always exists for any such random variable and is continuous. Next show that for two random variables $X, Y$, $\lambda_X = \lambda_Y$ if and only if $\phi_X(y) = \phi_Y(y)$ for all $y$. In other words, show the distribution measures are the same
if and only if the characteristic functions are the same. A lot more can be
concluded by looking at characteristic functions of this sort. The important
thing about these characteristic functions is that they always exist, unlike
moment generating functions.
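The hint in Problem 4 can be explored numerically before attempting a proof. The following is a minimal sketch, assuming the test function $g(t) = t$ and a midpoint rule with a fixed number of sample points; the helper name `pairing` is an illustrative choice, not part of the notes. For this $g$, integration by parts gives $\int_{-\pi}^{\pi} t \sin(nt) \, dt = 2\pi(-1)^{n+1}/n$, which tends to 0, while the values $\sin(n \cdot 1)$ at the single point $t = 1$ keep returning close to $\pm 1$.

```python
import math

def pairing(g, n, samples=20000):
    """Midpoint-rule approximation of the integral of g(t) * sin(n t) over (-pi, pi)."""
    width = 2 * math.pi / samples
    total = 0.0
    for k in range(samples):
        t = -math.pi + (k + 0.5) * width
        total += g(t) * math.sin(n * t)
    return total * width

g = lambda t: t                      # an integrable test function on (-pi, pi)
values = {n: pairing(g, n) for n in (1, 10, 100)}
exact = {n: 2 * math.pi * (-1) ** (n + 1) / n for n in (1, 10, 100)}

# Pointwise behaviour at t = 1: the values sin(n) do not settle down to 0.
largest_recent = max(abs(math.sin(n)) for n in range(100, 111))
print(values, largest_recent)
```

This is only evidence for one choice of $g$; the exercise asks for all $g \in L^1(-\pi, \pi)$, which is where the density argument in the hint comes in.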
Integrals And Derivatives
15.1 The Fundamental Theorem Of Calculus
The version of the fundamental theorem of calculus found in Calculus has already been referred to frequently. It says that if $f$ is a Riemann integrable function, the function
$$x \to \int_a^x f(t) \, dt$$
has a derivative at every point where $f$ is continuous. It is natural to ask what occurs for $f$ in $L^1$. It is an amazing fact that the same result is obtained aside from a set of measure zero even though $f$, being only in $L^1$, may fail to be continuous anywhere. Proofs of this result are based on some form of the Vitali covering theorem presented above. In what follows, the measure space is $(\mathbb{R}^n, \mathcal{S}, m)$ where $m$ is $n$-dimensional Lebesgue measure although the same theorems can be proved for arbitrary Radon measures [32]. To save notation, $m$ is written in place of $m_n$. Also $dy$ will be used for $dm(y)$.
By Lemma 8.8 on Page 178 and the completeness of $m$, the Lebesgue measurable sets are exactly those measurable in the sense of Caratheodory. Also, to save on notation, $\overline{m}$ is the name of the outer measure defined on all of $\mathcal{P}(\mathbb{R}^n)$ which is determined by $m_n$. Recall
$$B(p, r) = \{x : |x - p| < r\}. \tag{15.1}$$
Also define the following.
$$\text{If } B = B(p, r), \text{ then } \widehat{B} = B(p, 5r). \tag{15.2}$$
The first version of the Vitali covering theorem presented above will now be used to establish the fundamental theorem of calculus. The space of locally integrable functions is the most general one for which the maximal function defined below makes sense.
Definition 15.1 $f \in L^1_{loc}(\mathbb{R}^n)$ means $f\mathcal{X}_{B(0,R)} \in L^1(\mathbb{R}^n)$ for all $R > 0$. For $f \in L^1_{loc}(\mathbb{R}^n)$, the Hardy Littlewood Maximal Function, $Mf$, is defined by
$$Mf(x) \equiv \sup_{r > 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y)| \, dy.$$
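Theorem 15.2 If $f \in L^1(\mathbb{R}^n)$, then for $\alpha > 0$,
$$\overline{m}([Mf > \alpha]) \le \frac{5^n}{\alpha} \|f\|_1.$$
(Here and elsewhere, $[Mf > \alpha] \equiv \{x \in \mathbb{R}^n : Mf(x) > \alpha\}$ with other occurrences of $[\ ]$ being defined similarly.)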
Proof: Let $S \equiv [Mf > \alpha]$. For $x \in S$, choose $r_x > 0$ with
$$\frac{1}{m(B(x, r_x))} \int_{B(x, r_x)} |f| \, dm > \alpha.$$
The $r_x$ are all bounded because
$$m(B(x, r_x)) < \frac{1}{\alpha} \int_{B(x, r_x)} |f| \, dm \le \frac{1}{\alpha} \|f\|_1.$$
By the Vitali covering theorem, there are disjoint balls $B(x_i, r_i)$ such that
$$S \subseteq \bigcup_{i=1}^\infty B(x_i, 5r_i)$$
and
$$\frac{1}{m(B(x_i, r_i))} \int_{B(x_i, r_i)} |f| \, dm > \alpha.$$
Therefore
$$\overline{m}(S) \le \sum_{i=1}^\infty m(B(x_i, 5r_i)) = 5^n \sum_{i=1}^\infty m(B(x_i, r_i)) \le \frac{5^n}{\alpha} \sum_{i=1}^\infty \int_{B(x_i, r_i)} |f| \, dm \le \frac{5^n}{\alpha} \int_{\mathbb{R}^n} |f| \, dm,$$
the last inequality being valid because the balls $B(x_i, r_i)$ are disjoint. This proves the theorem.
Note that at this point it is unknown whether $S$ is measurable. This is why $\overline{m}(S)$ and not $m(S)$ is written.
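The weak type estimate can be illustrated on a grid in one dimension ($n = 1$, so the constant is $5$). The following is a minimal sketch, assuming a discrete stand-in for the maximal function: $f$ is a step function on cells of width `h`, and averages are taken only over windows of $2j + 1$ whole cells centered at a cell and contained in the grid. This discrete operator, the function name `maximal_function`, and the sample $f$ (the indicator of an interval of length 1) are illustrative assumptions, not the $Mf$ of Definition 15.1 itself.

```python
def maximal_function(values):
    """Discrete centered maximal function: for each cell, the largest average of
    |values| over windows of 2j + 1 cells centered there that fit inside the grid."""
    n = len(values)
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + abs(v))   # prefix sums for fast window sums
    result = []
    for k in range(n):
        best = 0.0
        for j in range(min(k, n - 1 - k) + 1):
            average = (prefix[k + j + 1] - prefix[k - j]) / (2 * j + 1)
            best = max(best, average)
        result.append(best)
    return result

h = 0.05                                              # cell width; 400 cells cover [-10, 10)
f = [1.0 if 200 <= k < 220 else 0.0 for k in range(400)]   # indicator of [0, 1)
Mf = maximal_function(f)

norm_f = h * sum(abs(v) for v in f)                   # L^1 norm of the step function
alpha = 0.1
level_set_measure = h * sum(1 for v in Mf if v > alpha)
bound = 5.0 / alpha * norm_f                          # 5^n / alpha * ||f||_1 with n = 1
print(level_set_measure, bound)
```

The measured size of the level set stays well below the bound here, which is consistent with the theorem; the factor $5^n$ comes from the covering argument and is not claimed to be sharp.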
The following is the fundamental theorem of calculus from elementary calculus.
Lemma 15.3 Suppose $g$ is a continuous function. Then for all $x$,
$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} g(y) \, dy = g(x).$$
Proof: Note that
$$g(x) = \frac{1}{m(B(x, r))} \int_{B(x,r)} g(x) \, dy$$
and so
$$\left| g(x) - \frac{1}{m(B(x, r))} \int_{B(x,r)} g(y) \, dy \right| = \left| \frac{1}{m(B(x, r))} \int_{B(x,r)} (g(y) - g(x)) \, dy \right| \le \frac{1}{m(B(x, r))} \int_{B(x,r)} |g(y) - g(x)| \, dy.$$
Now by continuity of $g$ at $x$, there exists $r > 0$ such that if $|x - y| < r$, $|g(y) - g(x)| < \varepsilon$. For such $r$, the last expression is less than
$$\frac{1}{m(B(x, r))} \int_{B(x,r)} \varepsilon \, dy = \varepsilon.$$
Definition 15.4 Let $f \in L^1(\mathbb{R}^k, m)$. A point $x \in \mathbb{R}^k$ is said to be a Lebesgue point if
$$\limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dm = 0.$$
Note that if $x$ is a Lebesgue point, then
$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y) \, dm(y) = f(x)$$
and so the symmetric derivative exists at all Lebesgue points. I will use $dy$ to symbolize $dm(y)$ for the sake of convenience and because it is standard notation.
Theorem 15.5 (Fundamental Theorem of Calculus) Let $f \in L^1(\mathbb{R}^k)$. Then there exists a set of measure 0, $N$, such that if $x \notin N$, then
$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dy = 0.$$
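Proof: Let $\lambda > 0$ and let $\varepsilon > 0$. By density of $C_c(\mathbb{R}^k)$ in $L^1(\mathbb{R}^k, m)$ there exists $g \in C_c(\mathbb{R}^k)$ such that $\|g - f\|_{L^1(\mathbb{R}^k)} < \varepsilon$. Now since $g$ is continuous,
$$\limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dy$$
$$= \limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dy - \lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |g(y) - g(x)| \, dy$$
$$= \limsup_{r \to 0} \left( \frac{1}{m(B(x, r))} \int_{B(x,r)} \left( |f(y) - f(x)| - |g(y) - g(x)| \right) dy \right)$$
$$\le \limsup_{r \to 0} \left( \frac{1}{m(B(x, r))} \int_{B(x,r)} \left| \, |f(y) - f(x)| - |g(y) - g(x)| \, \right| dy \right)$$
$$\le \limsup_{r \to 0} \left( \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - g(y) - (f(x) - g(x))| \, dy \right)$$
$$\le \limsup_{r \to 0} \left( \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - g(y)| \, dm \right) + |f(x) - g(x)|$$
$$\le M([f - g])(x) + |f(x) - g(x)|.$$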
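Therefore,
$$\left[ \limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dm > \lambda \right] \subseteq \left[ M([f - g]) > \frac{\lambda}{2} \right] \cup \left[ |f - g| > \frac{\lambda}{2} \right].$$
Now
$$\varepsilon > \int |f - g| \, dm \ge \int_{[|f - g| > \frac{\lambda}{2}]} |f - g| \, dm \ge \frac{\lambda}{2} \, m\left( \left[ |f - g| > \frac{\lambda}{2} \right] \right).$$
This along with the weak estimate of Theorem 15.2 implies
$$\overline{m}\left( \left[ \limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dm > \lambda \right] \right) \le \left( \frac{2}{\lambda} 5^k + \frac{2}{\lambda} \right) \|f - g\|_{L^1(\mathbb{R}^k)} < \left( \frac{2}{\lambda} 5^k + \frac{2}{\lambda} \right) \varepsilon.$$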
Since $\varepsilon > 0$ is arbitrary, it follows
$$\overline{m}\left( \left[ \limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dm > \lambda \right] \right) = 0.$$
Now let
$$N = \left[ \limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dm > 0 \right]$$
and
$$N_n = \left[ \limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dm > \frac{1}{n} \right].$$
It was just shown that $m(N_n) = 0$. Also, $N = \cup_{n=1}^\infty N_n$. Therefore, $m(N) = 0$ also. It follows that for $x \notin N$,
$$\limsup_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dm(y) = 0$$
and this proves a.e. point is a Lebesgue point.
Of course it is sufficient to assume $f$ is only in $L^1_{loc}(\mathbb{R}^k)$.
Corollary 15.6 (Fundamental Theorem of Calculus) Let $f \in L^1_{loc}(\mathbb{R}^k)$. Then there exists a set of measure 0, $N$, such that if $x \notin N$, then
$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dy = 0.$$
Proof: Consider $B(0, n)$ where $n$ is a positive integer. Then $f_n \equiv f\mathcal{X}_{B(0,n)} \in L^1(\mathbb{R}^k)$ and so there exists a set of measure 0, $N_n$, such that if $x \in B(0, n) \setminus N_n$, then
$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f_n(y) - f_n(x)| \, dy = \lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dy = 0.$$
Let $N = \cup_{n=1}^\infty N_n$. Then if $x \notin N$, the above equation holds.
Corollary 15.7 If $f \in L^1_{loc}(\mathbb{R}^n)$, then
$$\lim_{r \to 0} \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y) \, dy = f(x) \text{ a.e. } x. \tag{15.3}$$
Proof:
$$\left| \frac{1}{m(B(x, r))} \int_{B(x,r)} f(y) \, dy - f(x) \right| \le \frac{1}{m(B(x, r))} \int_{B(x,r)} |f(y) - f(x)| \, dy$$
and the last integral converges to 0 a.e. $x$.
Definition 15.8 For $N$ the set of Theorem 15.5 or Corollary 15.6, $N^C$ is called the Lebesgue set or the set of Lebesgue points.
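A one dimensional example shows what can go wrong at the exceptional points. The following is a minimal sketch, assuming $f$ is the indicator function of $[0, 1)$, for which the average over $B(x, r) = (x - r, x + r)$ can be written down exactly as an overlap length divided by $2r$; the function names are illustrative choices, not part of the notes. At $x = 0.5$ the averages equal $f(0.5) = 1$ for small $r$, while at the jump $x = 0$ they stay at $1/2$ although $f(0) = 1$, so $0$ is not a Lebesgue point.

```python
def f(x, a=0.0, b=1.0):
    """Indicator function of the interval [a, b)."""
    return 1.0 if a <= x < b else 0.0

def ball_average_of_indicator(x, r, a=0.0, b=1.0):
    """Exact average over (x - r, x + r) of the indicator function of [a, b)."""
    overlap = max(0.0, min(x + r, b) - max(x - r, a))
    return overlap / (2.0 * r)

radii = [10.0 ** (-k) for k in range(1, 8)]
interior = [ball_average_of_indicator(0.5, r) for r in radii]   # tends to f(0.5) = 1
at_jump = [ball_average_of_indicator(0.0, r) for r in radii]    # stays at 1/2, yet f(0) = 1
print(interior[-1], at_jump[-1])
```

The only points where the averages fail to converge to $f$ are the two jump points, a set of measure zero, in line with Corollary 15.7.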
The next corollary is a one dimensional version of what was just presented.
Corollary 15.9 Let $f \in L^1(\mathbb{R})$ and let
$$F(x) = \int_{-\infty}^x f(t) \, dt.$$
Then for a.e. $x$, $F'(x) = f(x)$.
Proof: For $h > 0$,
$$\frac{1}{h} \int_x^{x+h} |f(y) - f(x)| \, dy \le 2 \left( \frac{1}{2h} \right) \int_{x-h}^{x+h} |f(y) - f(x)| \, dy.$$
By Theorem 15.5, this converges to 0 a.e. Similarly
$$\frac{1}{h} \int_{x-h}^{x} |f(y) - f(x)| \, dy$$
converges to 0 a.e. $x$. Next,
$$\left| \frac{F(x + h) - F(x)}{h} - f(x) \right| \le \frac{1}{h} \int_x^{x+h} |f(y) - f(x)| \, dy \tag{15.4}$$
and
$$\left| \frac{F(x) - F(x - h)}{h} - f(x) \right| \le \frac{1}{h} \int_{x-h}^{x} |f(y) - f(x)| \, dy. \tag{15.5}$$
Now the expression on the right in 15.4 and 15.5 converges to zero for a.e. $x$. Therefore, by 15.4, for a.e. $x$ the derivative from the right exists and equals $f(x)$ while from 15.5 the derivative from the left exists and equals $f(x)$ a.e. It follows
$$\lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = f(x) \text{ a.e. } x.$$
15.2 Absolutely Continuous Functions
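Definition 15.10 Let $[a, b]$ be a closed and bounded interval and let $f : [a, b] \to \mathbb{R}$. Then $f$ is said to be absolutely continuous if for every $\varepsilon > 0$ there exists $\delta > 0$ such that if $(x_1, y_1), \cdots, (x_m, y_m)$ are nonoverlapping subintervals of $[a, b]$ with $\sum_{i=1}^m |y_i - x_i| < \delta$, then $\sum_{i=1}^m |f(y_i) - f(x_i)| < \varepsilon$.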
Definition 15.11 A finite subset $P$ of $[a, b]$ is called a partition of $[x, y] \subseteq [a, b]$ if $P = \{x_0, x_1, \cdots, x_n\}$ where
$$x = x_0 < x_1 < \cdots < x_n = y.$$
For $f : [a, b] \to \mathbb{R}$ and $P = \{x_0, x_1, \cdots, x_n\}$ define
$$V_P[x, y] \equiv \sum_{i=1}^n |f(x_i) - f(x_{i-1})|.$$
Denoting by $\mathcal{P}[x, y]$ the set of all partitions of $[x, y]$ define
$$V[x, y] \equiv \sup_{P \in \mathcal{P}[x, y]} V_P[x, y].$$
For simplicity, $V[a, x]$ will be denoted by $V(x)$. It is called the total variation of the function $f$.
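The sums $V_P[x, y]$ are easy to compute for a concrete function. The following is a minimal sketch, assuming $f = \sin$ on $[0, 2\pi]$ and uniform partitions; the helper names `variation_over_partition` and `uniform_partition` are illustrative choices, not part of the notes. The partition with 12 subintervals refines the one with 3, so its sum is no smaller, and since it contains the turning points $\pi/2$ and $3\pi/2$ it already attains the total variation, which is 4 for this function.

```python
import math

def variation_over_partition(f, points):
    """V_P[x, y]: sum of |f(x_i) - f(x_{i-1})| over consecutive partition points."""
    return sum(abs(f(b) - f(a)) for a, b in zip(points, points[1:]))

def uniform_partition(x, y, n):
    """Partition of [x, y] into n subintervals of equal length."""
    return [x + (y - x) * i / n for i in range(n + 1)]

coarse = variation_over_partition(math.sin, uniform_partition(0.0, 2 * math.pi, 3))
fine = variation_over_partition(math.sin, uniform_partition(0.0, 2 * math.pi, 12))
print(coarse, fine)
```

For a function that is monotone between finitely many turning points, any partition containing those points gives the supremum; in general $V[x, y]$ is only approached by refining.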
There are some simple facts about the total variation of an absolutely continuous function $f$ which are contained in the next lemma.
Lemma 15.12 Let $f$ be an absolutely continuous function defined on $[a, b]$ and let $V$ be its total variation function as described above. Then $V$ is an increasing bounded function. Also if $P$ and $Q$ are two partitions of $[x, y]$ with $P \subseteq Q$, then $V_P[x, y] \le V_Q[x, y]$ and if $[x, y] \subseteq [z, w]$,
$$V[x, y] \le V[z, w]. \tag{15.6}$$
If $P = \{x_0, x_1, \cdots, x_n\}$ is a partition of $[x, y]$, then
$$V[x, y] = \sum_{i=1}^n V[x_{i-1}, x_i]. \tag{15.7}$$
Also if $y > x$,
$$V(y) - V(x) \ge |f(y) - f(x)| \tag{15.8}$$
and the function $x \to V(x) - f(x)$ is increasing. The total variation function $V$ is absolutely continuous.
Proof: The claim that $V$ is increasing is obvious as is the next claim about $P \subseteq Q$ leading to $V_P[x, y] \le V_Q[x, y]$. To verify this, simply add in one point at a time and verify that from the triangle inequality, the sum involved gets no smaller. The claim that $V$ is increasing consistent with set inclusion of intervals is also clearly true and follows directly from the definition.
Now let $t < V[x, y]$ where $P_0 = \{x_0, x_1, \cdots, x_n\}$ is a partition of $[x, y]$. There exists a partition $P$ of $[x, y]$ such that $t < V_P[x, y]$. Without loss of generality it can be assumed that $\{x_0, x_1, \cdots, x_n\} \subseteq P$ since if not, you can simply add in the points of $P_0$ and the resulting sum for the total variation will get no smaller. Let $P_i$ be those points of $P$ which are contained in $[x_{i-1}, x_i]$. Then
$$t < V_P[x, y] = \sum_{i=1}^n V_{P_i}[x_{i-1}, x_i] \le \sum_{i=1}^n V[x_{i-1}, x_i].$$
Since $t < V[x, y]$ is arbitrary,
$$V[x, y] \le \sum_{i=1}^n V[x_{i-1}, x_i]. \tag{15.9}$$
Note that 15.9 does not depend on $f$ being absolutely continuous. Suppose now that $f$ is absolutely continuous. Let $\delta$ correspond to $\varepsilon = 1$. Then if $[x, y]$ is an interval of length no larger than $\delta$, the definition of absolute continuity implies
$$V[x, y] < 1.$$
Then from 15.9
$$V[a, a + n\delta] \le \sum_{i=1}^n V[a + (i - 1)\delta, a + i\delta] < \sum_{i=1}^n 1 = n.$$
Thus $V$ is bounded on $[a, b]$. Now let $P_i$ be a partition of $[x_{i-1}, x_i]$ such that
$$V_{P_i}[x_{i-1}, x_i] > V[x_{i-1}, x_i] - \frac{\varepsilon}{n}.$$
Then letting $P = \cup P_i$,
$$-\varepsilon + \sum_{i=1}^n V[x_{i-1}, x_i] < \sum_{i=1}^n V_{P_i}[x_{i-1}, x_i] = V_P[x, y] \le V[x, y].$$
Since $\varepsilon$ is arbitrary, 15.7 follows from this and 15.9.
Now let $x < y$. Then
$$V(y) - f(y) - (V(x) - f(x)) = V(y) - V(x) - (f(y) - f(x)) \ge V(y) - V(x) - |f(y) - f(x)| \ge 0.$$
It only remains to verify that $V$ is absolutely continuous.
Let $\varepsilon > 0$ be given and let $\delta$ correspond to $\varepsilon/2$ in the definition of absolute continuity applied to $f$. Suppose $\sum_{i=1}^n |y_i - x_i| < \delta$ and consider $\sum_{i=1}^n |V(y_i) - V(x_i)|$. By 15.7 this last equals $\sum_{i=1}^n V[x_i, y_i]$. Now let $P_i$ be a partition of $[x_i, y_i]$ such that $V_{P_i}[x_i, y_i] + \frac{\varepsilon}{2n} > V[x_i, y_i]$. Then by the definition of absolute continuity,
$$\sum_{i=1}^n |V(y_i) - V(x_i)| = \sum_{i=1}^n V[x_i, y_i] \le \sum_{i=1}^n V_{P_i}[x_i, y_i] + \frac{\varepsilon}{2} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,$$
which shows $V$ is absolutely continuous as claimed.
Lemma 15.13 Suppose $f : [a, b] \to \mathbb{R}$ is absolutely continuous and increasing. Then $f'$ exists a.e., is in $L^1([a, b])$, and
$$f(x) = f(a) + \int_a^x f'(t) \, dt.$$
Proof: Define $L$, a positive linear functional on $C([a, b])$, by
$$Lg \equiv \int_a^b g \, df$$
where this integral is the Riemann Stieltjes integral with respect to the integrating function $f$. By the Riesz representation theorem for positive linear functionals, there exists a unique Radon measure $\mu$ such that $Lg = \int g \, d\mu$. Now consider the following picture for $g_n \in C([a, b])$ in which $g_n$ equals 1 for $x$ between $x + 1/n$ and $y$.
(Figure: graph of $g_n$, with the points $x$, $x + 1/n$, $y$, $y + 1/n$ marked on the horizontal axis.)
Then $g_n(t) \to \mathcal{X}_{(x, y]}(t)$ pointwise. Therefore, by the dominated convergence theorem,
$$\mu((x, y]) = \lim_{n\to\infty} \int g_n \, d\mu.$$
However,
$$f(y) - f\left( x + \frac{1}{n} \right) \le \int g_n \, d\mu = \int_a^b g_n \, df \le \left( f\left( y + \frac{1}{n} \right) - f(y) \right) + \left( f(y) - f\left( x + \frac{1}{n} \right) \right) + \left( f\left( x + \frac{1}{n} \right) - f(x) \right)$$
and so as $n \to \infty$ the continuity of $f$ implies
$$\mu((x, y]) = f(y) - f(x).$$
Similarly, $\mu((x, y)) = f(y) - f(x)$ and $\mu([x, y]) = f(y) - f(x)$, the argument used to establish this being very similar to the above. It follows in particular that
$$f(x) - f(a) = \int_{[a, x]} d\mu.$$
Note that up till now, no reference has been made to the absolute continuity of $f$. Any increasing continuous function would be fine.
Now if $E$ is a Borel set such that $m(E) = 0$, then the outer regularity of $m$ implies there exists an open set $V$ containing $E$ such that $m(V) < \delta$ where $\delta$ corresponds to $\varepsilon$ in the definition of absolute continuity of $f$. Then letting $\{I_k\}$ be the connected components of $V$ it follows $E \subseteq \cup_{k=1}^\infty I_k$ with $\sum_k m(I_k) = m(V) < \delta$. Therefore, from absolute continuity of $f$, it follows that for $I_k = (a_k, b_k)$ and each $n$,
$$\mu\left( \cup_{k=1}^n I_k \right) = \sum_{k=1}^n \mu(I_k) = \sum_{k=1}^n |f(b_k) - f(a_k)| < \varepsilon$$
and so letting $n \to \infty$,
$$\mu(E) \le \mu(V) = \sum_{k=1}^\infty |f(b_k) - f(a_k)| \le \varepsilon.$$
Since $\varepsilon$ is arbitrary, it follows $\mu(E) = 0$. Therefore, $\mu \ll m$ and so by the Radon Nikodym theorem there exists a unique $h \in L^1([a, b])$ such that
$$\mu(E) = \int_E h \, dm.$$
In particular,
$$\mu([a, x]) = f(x) - f(a) = \int_{[a, x]} h \, dm.$$
From the fundamental theorem of calculus $f'(x) = h(x)$ at every Lebesgue point of $h$. Therefore, writing in usual notation,
$$f(x) = f(a) + \int_a^x f'(t) \, dt$$
as claimed. This proves the lemma.
With the above lemmas, the following is the main theorem about absolutely continuous functions.
Theorem 15.14 Let $f : [a, b] \to \mathbb{R}$. Then $f$ is absolutely continuous if and only if $f'(x)$ exists a.e., $f' \in L^1([a, b])$ and
$$f(x) = f(a) + \int_a^x f'(t) \, dt.$$
Proof: Suppose first that $f$ is absolutely continuous. By Lemma 15.12 the total variation function, $V$, is absolutely continuous and $f(x) = V(x) - (V(x) - f(x))$ where both $V$ and $V - f$ are increasing and absolutely continuous. By Lemma 15.13
$$f(x) - f(a) = V(x) - V(a) - [(V(x) - f(x)) - (V(a) - f(a))] = \int_a^x V'(t)\,dt - \int_a^x (V - f)'(t)\,dt.$$
Now $f'$ exists and is in $L^1$ because $f = V - (V - f)$ and $V$ and $V - f$ have derivatives in $L^1$. Therefore, $(V - f)' = V' - f'$ and so the above reduces to
$$f(x) - f(a) = \int_a^x f'(t)\,dt.$$
This proves one half of the theorem.
Now suppose $f' \in L^1$ and $f(x) = f(a) + \int_a^x f'(t)\,dt$. It is necessary to verify that $f$ is absolutely continuous. But this follows easily from Lemma 7.45 on Page 163 which implies that a single function, $f'$, is uniformly integrable. This lemma implies that if $\sum_i |y_i - x_i|$ is sufficiently small then
$$\sum_i \left| \int_{x_i}^{y_i} f'(t)\,dt \right| = \sum_i |f(y_i) - f(x_i)| < \varepsilon.$$
The following simple corollary is a case of Rademacher's theorem.
Corollary 15.15 Suppose $f : [a,b] \to \mathbb{R}$ is Lipschitz continuous,
$$|f(x) - f(y)| \le K|x - y|.$$
Then $f'(x)$ exists a.e. and
$$f(x) = f(a) + \int_a^x f'(t)\,dt.$$
Proof: It is easy to see that $f$ is absolutely continuous. Therefore, Theorem 15.14 applies.
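The formula in Corollary 15.15 can be checked numerically on a simple case. The following is only a sketch under stated assumptions: the function $f(x) = |x - 1/2|$ on $[0,1]$ (Lipschitz with $K = 1$), the helper names, and the midpoint rule are all illustrative choices and not part of the notes.

```python
def f(x):
    """A Lipschitz function on [0, 1] with constant K = 1 (illustrative choice)."""
    return abs(x - 0.5)


def f_prime(t):
    """Derivative of f for t != 0.5; the value assigned at the kink is arbitrary."""
    return -1.0 if t < 0.5 else 1.0


def integrate(g, a, b, n=100_000):
    """Midpoint rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def reconstruct(x, a=0.0):
    """Right side of the corollary: f(a) plus the integral of f' from a to x."""
    return f(a) + integrate(f_prime, a, x)


for x in (0.25, 0.5, 0.8, 1.0):
    print(x, f(x), reconstruct(x))
```

The derivative fails to exist only at the single point $1/2$, a set of measure zero, so it does not affect the integral.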
15.3 Differentiation Of Measures With Respect To
Lebesgue Measure
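Recall the Vitali covering theorem in Corollary 9.20 on Page 223.
Corollary 15.16 Let $E \subseteq \mathbb{R}^n$ and let $\mathcal{F}$ be a collection of open balls of bounded radii such that $\mathcal{F}$ covers $E$ in the sense of Vitali. Then there exists a countable collection of disjoint balls from $\mathcal{F}$, $\{B_j\}_{j=1}^{\infty}$, such that $m(E \setminus \cup_{j=1}^{\infty} B_j) = 0$.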
Definition 15.17 Let $\mu$ be a Radon measure defined on $\mathbb{R}^n$. Then
$$\frac{d\mu}{dm}(x) \equiv \lim_{r\to 0} \frac{\mu(B(x,r))}{m(B(x,r))}$$
whenever this limit exists.
It turns out this limit exists for $m$ a.e. $x$. To verify this here is another definition.
Definition 15.18 Let $f(r)$ be a function having values in $[-\infty, \infty]$. Then
$$\limsup_{r\to 0+} f(r) \equiv \lim_{r\to 0} \left(\sup\{f(t) : t \in [0,r]\}\right)$$
$$\liminf_{r\to 0+} f(r) \equiv \lim_{r\to 0} \left(\inf\{f(t) : t \in [0,r]\}\right)$$
This is well defined because $\inf\{f(t) : t \in [0,r]\}$ increases as $r$ decreases to 0 and $\sup\{f(t) : t \in [0,r]\}$ decreases as $r$ decreases to 0. Also note that $\lim_{r\to 0+} f(r)$ exists if and only if
$$\limsup_{r\to 0+} f(r) = \liminf_{r\to 0+} f(r)$$
and if this happens
$$\lim_{r\to 0+} f(r) = \liminf_{r\to 0+} f(r) = \limsup_{r\to 0+} f(r).$$
The claims made in the above definition follow immediately from the definition of what is meant by a limit in $[-\infty, \infty]$ and are left for the reader.
Theorem 15.19 Let $\mu$ be a Borel measure on $\mathbb{R}^n$. Then $\frac{d\mu}{dm}(x)$ exists in $[-\infty, \infty]$ $m$ a.e.
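Proof: Let $p < q$ and let $p, q$ be rational numbers. Define $N_{pq}(M)$ as
$$\left\{ x \in B(0,M) \text{ such that } \limsup_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} > q > p > \liminf_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} \right\},$$
Also define $N_{pq}$ as
$$\left\{ x \in \mathbb{R}^n \text{ such that } \limsup_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} > q > p > \liminf_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} \right\},$$
and $N$ as
$$\left\{ x \in \mathbb{R}^n \text{ such that } \limsup_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} > \liminf_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} \right\}.$$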
I will show $m(N_{pq}(M)) = 0$. Use outer regularity to obtain an open set, $V$, containing $N_{pq}(M)$ such that
$$m(N_{pq}(M)) + \varepsilon > m(V).$$
From the definition of $N_{pq}(M)$, it follows that for each $x \in N_{pq}(M)$ there exist arbitrarily small $r > 0$ such that
$$\frac{\mu(B(x,r))}{m(B(x,r))} < p.$$
Only consider those balls $B(x,r)$ which are small enough to be contained in $B(0,M) \cap V$ so that the collection of such balls has bounded radii. This is a Vitali cover of $N_{pq}(M)$ and so by Corollary 15.16 there exists a sequence of disjoint balls of this sort, $\{B_i\}_{i=1}^{\infty}$, such that
$$\mu(B_i) < p\,m(B_i), \quad m\left(N_{pq}(M) \setminus \cup_{i=1}^{\infty} B_i\right) = 0. \tag{15.10}$$
Now for $x \in N_{pq}(M) \cap \left(\cup_{i=1}^{\infty} B_i\right)$ (most of $N_{pq}(M)$), there exist arbitrarily small balls, $B(x,r)$, such that $B(x,r)$ is contained in some set of $\{B_i\}_{i=1}^{\infty}$ and
$$\frac{\mu(B(x,r))}{m(B(x,r))} > q.$$
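This is a Vitali cover of $N_{pq}(M) \cap \left(\cup_{i=1}^{\infty} B_i\right)$ and so there exists a sequence of disjoint balls of this sort, $\{\widetilde{B}_j\}_{j=1}^{\infty}$, such that
$$m\left(\left(N_{pq}(M) \cap \left(\cup_{i=1}^{\infty} B_i\right)\right) \setminus \cup_{j=1}^{\infty} \widetilde{B}_j\right) = 0, \quad \mu(\widetilde{B}_j) > q\,m(\widetilde{B}_j). \tag{15.11}$$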
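It follows from 15.10 and 15.11 that
$$m(N_{pq}(M)) = m\left(N_{pq}(M) \cap \left(\cup_{i=1}^{\infty} B_i\right)\right) \le m\left(\cup_{j=1}^{\infty} \widetilde{B}_j\right). \tag{15.12}$$
Therefore,
$$\mu\left(\cup_j \widetilde{B}_j\right) > q \sum_j m(\widetilde{B}_j) \ge q\,m\left(N_{pq}(M) \cap \left(\cup_i B_i\right)\right) = q\,m(N_{pq}(M))$$
$$\ge p\,m(N_{pq}(M)) \ge p\,(m(V) - \varepsilon) \ge p \sum_i m(B_i) - p\varepsilon \ge \sum_i \mu(B_i) - p\varepsilon \ge \mu\left(\cup_j \widetilde{B}_j\right) - p\varepsilon.$$
It follows
$$p\varepsilon \ge (q - p)\,m(N_{pq}(M)).$$
Since $\varepsilon$ is arbitrary, $m(N_{pq}(M)) = 0$. Now $N_{pq} \subseteq \cup_{M=1}^{\infty} N_{pq}(M)$ and so $m(N_{pq}) = 0$. Now
$$N = \cup_{p,q \in \mathbb{Q}} N_{pq}$$
and since this is a countable union of sets of measure zero, $m(N) = 0$ also. This proves the theorem.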
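From Theorem 14.8 on Page 367 it follows that if $\mu$ is a complex measure then $|\mu|$ is a finite measure. This makes possible the following definition.
Definition 15.20 Let $\mu$ be a real measure. Define the following measures. For $E$ a measurable set,
$$\mu^+(E) \equiv \frac{1}{2}(|\mu| + \mu)(E), \quad \mu^-(E) \equiv \frac{1}{2}(|\mu| - \mu)(E).$$
These are measures thanks to Theorem 14.7 on Page 366 and $\mu^+ - \mu^- = \mu$. These measures have values in $[0, \infty)$. They are called the positive and negative parts of $\mu$ respectively. For $\mu$ a complex measure, define $\operatorname{Re}\mu$ and $\operatorname{Im}\mu$ by
$$\operatorname{Re}\mu(E) \equiv \frac{1}{2}\left(\mu(E) + \overline{\mu(E)}\right), \quad \operatorname{Im}\mu(E) \equiv \frac{1}{2i}\left(\mu(E) - \overline{\mu(E)}\right).$$
Then $\operatorname{Re}\mu$ and $\operatorname{Im}\mu$ are both real measures. Thus for $\mu$ a complex measure,
$$\mu = \operatorname{Re}\mu^+ - \operatorname{Re}\mu^- + i\left(\operatorname{Im}\mu^+ - \operatorname{Im}\mu^-\right) = \nu_1 - \nu_2 + i(\nu_3 - \nu_4)$$
where each $\nu_i$ is a real measure having values in $[0, \infty)$.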
Then there is an obvious corollary to Theorem 15.19.
Corollary 15.21 Let $\mu$ be a complex Borel measure on $\mathbb{R}^n$. Then $\frac{d\mu}{dm}(x)$ exists a.e.
Proof: Let $\nu_i$ be defined as in Definition 15.20. By Theorem 15.19, for $m$ a.e. $x$, $\frac{d\nu_i}{dm}(x)$ exists. This proves the corollary because $\mu$ is just a finite linear combination of these $\nu_i$.
Theorem 14.2 on Page 359, the Radon Nikodym theorem, implies that if you have two finite measures, $\mu$ and $\lambda$, you can write $\lambda$ as the sum of a measure absolutely continuous with respect to $\mu$ and one which is singular to $\mu$ in a unique way. The next topic is related to this. It has to do with the differentiation of a measure which is singular with respect to Lebesgue measure.
Theorem 15.22 Let $\mu$ be a Radon measure on $\mathbb{R}^n$ and suppose there exists a $\mu$ measurable set, $N$, such that for all Borel sets, $E$, $\mu(E) = \mu(E \cap N)$ where $m(N) = 0$. Then
$$\frac{d\mu}{dm}(x) = 0 \quad m \text{ a.e.}$$
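Proof: For $k \in \mathbb{N}$, let
$$B_k(M) \equiv \left\{ x \in N^C : \limsup_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} > \frac{1}{k} \right\} \cap B(0,M),$$
$$B_k \equiv \left\{ x \in N^C : \limsup_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} > \frac{1}{k} \right\},$$
$$B \equiv \left\{ x \in N^C : \limsup_{r\to 0+} \frac{\mu(B(x,r))}{m(B(x,r))} > 0 \right\}.$$
Let $\varepsilon > 0$. Since $\mu$ is regular, there exists $H$, a compact set such that $H \subseteq N \cap B(0,M)$ and
$$\mu(N \cap B(0,M) \setminus H) < \varepsilon.$$

[Figure: the ball $B(0,M)$ containing $N \cap B(0,M)$, the compact set $H$, the set $B_k(M)$, and two of the balls $B_i$.]

For each $x \in B_k(M)$, there exist arbitrarily small $r > 0$ such that $B(x,r) \subseteq B(0,M) \setminus H$ and
$$\frac{\mu(B(x,r))}{m(B(x,r))} > \frac{1}{k}. \tag{15.13}$$
Two such balls are illustrated in the above picture. This is a Vitali cover of $B_k(M)$ and so there exists a sequence of disjoint balls of this sort, $\{B_i\}_{i=1}^{\infty}$, such that $m(B_k(M) \setminus \cup_i B_i) = 0$. Therefore,
$$m(B_k(M)) \le m\left(B_k(M) \cap \left(\cup_i B_i\right)\right) \le \sum_i m(B_i) \le k \sum_i \mu(B_i)$$
$$= k \sum_i \mu(B_i \cap N) = k \sum_i \mu(B_i \cap N \cap B(0,M)) \le k\,\mu(N \cap B(0,M) \setminus H) < \varepsilon k.$$
Since $\varepsilon$ was arbitrary, this shows $m(B_k(M)) = 0$.
Therefore,
$$m(B_k) \le \sum_{M=1}^{\infty} m(B_k(M)) = 0$$
and $m(B) \le \sum_k m(B_k) = 0$. Since $m(N) = 0$, this proves the theorem.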
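A one dimensional example may help fix ideas about Theorem 15.22. The sketch below is illustrative only: the measure $\mu = 3t^2\,dt + \delta_0$ on $\mathbb{R}$ and the function names are assumptions made here, not taken from the notes. The point mass $\delta_0$ is supported on $N = \{0\}$, a set of Lebesgue measure zero, so it contributes nothing to the quotient $\mu(B(x,r))/m(B(x,r))$ at any $x \neq 0$ once $r < |x|$, while the quotient blows up at $x = 0$.

```python
def mu_ball(x, r):
    """mu((x - r, x + r)) for mu = 3 t^2 dt + unit point mass at 0 (assumed example)."""
    absolutely_continuous = (x + r) ** 3 - (x - r) ** 3  # integral of 3 t^2 over the interval
    point_mass = 1.0 if abs(x) < r else 0.0              # delta_0 charges the interval iff it contains 0
    return absolutely_continuous + point_mass


def ratio(x, r):
    """The quotient mu(B(x, r)) / m(B(x, r)) on the real line, where m(B(x, r)) = 2 r."""
    return mu_ball(x, r) / (2 * r)


for r in (1e-1, 1e-2, 1e-3):
    print(r, ratio(1.0, r), ratio(0.0, r))
```

At $x = 1$ the printed quotients approach $3 = 3x^2$, the density of the absolutely continuous part, whereas at the exceptional point $x = 0$ they grow without bound.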
It is easy to obtain a different version of the above theorem in which $\mu$ is only a Borel measure which has complex values. This is done with the aid of the following lemma.
Lemma 15.23 Suppose $\mu$ is a Borel measure on $\mathbb{R}^n$ having values in $[0, \infty)$. Then there exists a Radon measure, $\mu_1$, such that $\mu_1 = \mu$ on all Borel sets.
Proof: By assumption, $\mu(\mathbb{R}^n) < \infty$ and so it is possible to define a positive linear functional, $L$, on $C_c(\mathbb{R}^n)$ by
$$Lf \equiv \int f\,d\mu.$$
By the Riesz representation theorem for positive linear functionals of this sort, there exists a unique Radon measure, $\mu_1$, such that for all $f \in C_c(\mathbb{R}^n)$,
$$\int f\,d\mu_1 = Lf = \int f\,d\mu.$$
Now let $V$ be an open set and let
$$K_k \equiv \left\{ x \in V : \operatorname{dist}\left(x, V^C\right) \ge 1/k \right\} \cap \overline{B(0,k)}.$$
Then $\{K_k\}$ is an increasing sequence of compact sets whose union is $V$. Let $K_k \prec f_k \prec V$. Then $f_k(x) \to \mathcal{X}_V(x)$ for every $x$. Therefore,
$$\mu_1(V) = \lim_{k\to\infty} \int f_k\,d\mu_1 = \lim_{k\to\infty} \int f_k\,d\mu = \mu(V)$$
and so $\mu = \mu_1$ on open sets. Now if $K$ is a compact set, let
$$V_k \equiv \{x \in \mathbb{R}^n : \operatorname{dist}(x, K) < 1/k\}.$$
Then $V_k$ is an open set and $\cap_k V_k = K$. Letting $K \prec f_k \prec V_k$, it follows that $f_k(x) \to \mathcal{X}_K(x)$ for all $x \in \mathbb{R}^n$. Therefore, by the dominated convergence theorem with a dominating function, $\mathcal{X}_{\mathbb{R}^n}$,
$$\mu_1(K) = \lim_{k\to\infty} \int f_k\,d\mu_1 = \lim_{k\to\infty} \int f_k\,d\mu = \mu(K)$$
and so $\mu$ and $\mu_1$ are equal on all compact sets. It follows $\mu = \mu_1$ on all countable unions of compact sets and countable intersections of open sets.
Now let $E$ be a Borel set. By regularity of $\mu_1$, there exist sets, $H$ and $G$, such that $H$ is the countable union of an increasing sequence of compact sets, $G$ is the countable intersection of a decreasing sequence of open sets, $H \subseteq E \subseteq G$, and $\mu_1(H) = \mu_1(G) = \mu_1(E)$. Therefore,
$$\mu_1(H) = \mu(H) \le \mu(E) \le \mu(G) = \mu_1(G) = \mu_1(E) = \mu_1(H).$$
Therefore, $\mu(E) = \mu_1(E)$ and this proves the lemma.
Corollary 15.24 Suppose $\mu$ is a complex Borel measure defined on $\mathbb{R}^n$ for which there exists a $\mu$ measurable set, $N$, such that for all Borel sets, $E$, $\mu(E) = \mu(E \cap N)$ where $m(N) = 0$. Then
$$\frac{d\mu}{dm}(x) = 0 \quad m \text{ a.e.}$$
Proof: Each of $\operatorname{Re}\mu^+$, $\operatorname{Re}\mu^-$, $\operatorname{Im}\mu^+$, and $\operatorname{Im}\mu^-$ are real measures having values in $[0, \infty)$ and so by Lemma 15.23 each is a Radon measure having the same property that $\mu$ has in terms of being supported on a set of $m$ measure zero. Therefore, by Theorem 15.22, for $\nu$ equal to any of these, $\frac{d\nu}{dm}(x) = 0$ $m$ a.e. This proves the corollary.
15.4 Exercises
1. Let $E$ be a Lebesgue measurable set. $x \in E$ is a point of density if
$$\lim_{r\to 0} \frac{m_n(E \cap B(x,r))}{m_n(B(x,r))} = 1.$$
Show that a.e. point of $E$ is a point of density. Hint: The numerator of the above quotient is $\int_{B(x,r)} \mathcal{X}_E(y)\,dm(y)$. Now consider the fundamental theorem of calculus.
2. Show that if $f \in L^1_{loc}(\mathbb{R}^n)$ and $\int f\phi\,dx = 0$ for all $\phi \in C_c^{\infty}(\mathbb{R}^n)$, then $f(x) = 0$ a.e.
3. Now suppose that for $u \in L^1_{loc}(\mathbb{R}^n)$, $w \in L^1_{loc}(\mathbb{R}^n)$ is a weak partial derivative of $u$ with respect to $x_i$ if whenever $h_k \to 0$ it follows that for all $\phi \in C_c^{\infty}(\mathbb{R}^n)$,
$$\lim_{k\to\infty} \int_{\mathbb{R}^n} \frac{u(x + h_k e_i) - u(x)}{h_k}\,\phi(x)\,dx = \int_{\mathbb{R}^n} w(x)\,\phi(x)\,dx. \tag{15.14}$$
and in this case, write $w = u_{,i}$. Using Problem 2 show this is well defined.
4. Show that $w \in L^1_{loc}(\mathbb{R}^n)$ is a weak partial derivative of $u$ with respect to $x_i$ in the sense of Problem 3 if and only if for all $\phi \in C_c^{\infty}(\mathbb{R}^n)$,
$$-\int u(x)\,\phi_{,i}(x)\,dx = \int w(x)\,\phi(x)\,dx.$$
5. If $f \in L^1_{loc}(\mathbb{R}^n)$, the fundamental theorem of calculus says
$$\lim_{r\to 0} \frac{1}{m_n(B(x,r))} \int_{B(x,r)} |f(y) - f(x)|\,dy = 0$$
for a.e. $x$. Suppose now that $\{E_k\}$ is a sequence of measurable sets and $r_k$ is a sequence of positive numbers converging to zero such that $E_k \subseteq B(x, r_k)$ and $m_n(E_k) \ge c\,m_n(B(x, r_k))$ where $c$ is some positive number. Then show
$$\lim_{k\to\infty} \frac{1}{m_n(E_k)} \int_{E_k} |f(y) - f(x)|\,dy = 0$$
for a.e. $x$. Such a sequence of sets is known as a regular family of sets [42] and is said to converge regularly to $x$ in [30].
6. Let $f$ be in $L^1_{loc}(\mathbb{R}^n)$. Show $Mf$ is Borel measurable. Hint: First consider the function, $F_r(x) \equiv \frac{1}{m_n(B(x,r))} \int_{B(x,r)} |f(y)|\,dm_n(y)$. Argue $F_r$ is continuous. Then $Mf(x) = \sup_{r>0} F_r(x)$.
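7. If $f \in L^p$, $1 < p < \infty$, show $Mf \in L^p$ and
$$\|Mf\|_p \le A(p,n)\,\|f\|_p.$$
Hint: Let
$$f_1(x) \equiv \begin{cases} f(x) & \text{if } |f(x)| > \alpha/2, \\ 0 & \text{if } |f(x)| \le \alpha/2. \end{cases}$$
Argue $[Mf(x) > \alpha] \subseteq [Mf_1(x) > \alpha/2]$. Then use the distribution function. Recall why
$$\int (Mf)^p\,dx = \int_0^{\infty} p\alpha^{p-1}\,m([Mf > \alpha])\,d\alpha \le \int_0^{\infty} p\alpha^{p-1}\,m([Mf_1 > \alpha/2])\,d\alpha.$$
Now use the fundamental estimate satisfied by the maximal function and Fubini's Theorem as needed.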
8. Show $|f(x)| \le Mf(x)$ at every Lebesgue point of $f$ whenever $f \in L^1_{loc}(\mathbb{R}^n)$.
9. In the proof of the Vitali covering theorem, Theorem 9.11 on Page 216, there is nothing sacred about the constant $\frac{1}{2}$. Go through the proof replacing this constant with $\lambda$ where $\lambda \in (0,1)$. Show that it follows that for every $\delta > 0$, the conclusion of the Vitali covering theorem can be obtained with 5 replaced by $(3 + \delta)$ in the definition of $\widehat{B}$. In this context, see Rudin [38] who proves a different version of the Vitali covering theorem involving only finite covers and gets the constant 3. See also Problem 10.
10. Suppose $A$ is covered by a finite collection of balls, $\mathcal{F}$. Show that then there exists a disjoint collection of these balls, $\{B_i\}_{i=1}^{p}$, such that $A \subseteq \cup_{i=1}^{p} \widehat{B}_i$ where 5 can be replaced with 3 in the definition of $\widehat{B}$. Hint: Since the collection of balls is finite, they can be arranged in order of decreasing radius.
11. Suppose $E$ is a Lebesgue measurable set which has positive measure and let $B$ be an arbitrary open ball and let $D$ be a set dense in $\mathbb{R}^n$. Establish the result of Smítal, [11] which says that under these conditions, $\overline{m_n}((E + D) \cap B) = m_n(B)$ where here $\overline{m_n}$ denotes the outer measure determined by $m_n$. Is this also true for $X$, an arbitrary possibly non measurable set replacing $E$ in which $\overline{m_n}(X) > 0$? Hint: Let $x$ be a point of density of $E$ and let $D'$ denote those elements of $D$, $d$, such that $d + x \in B$. Thus $D'$ is dense in $B$. Now use translation invariance of Lebesgue measure to verify there exists $R > 0$ such that if $r < R$, we have the following holding for $d \in D'$ and $r_d < R$.
$$\overline{m_n}((E + D) \cap B(x + d, r_d)) \ge \overline{m_n}((E + d) \cap B(x + d, r_d)) \ge (1 - \varepsilon)\,m_n(B(x + d, r_d)).$$
Argue the balls, $B(x + d, r_d)$, form a Vitali cover of $B$.
12. Consider the construction employed to obtain the Cantor set, but instead of removing the middle third interval, remove only enough that the sum of the lengths of all the open intervals which are removed is less than one. That which remains is called a fat Cantor set. Show it is a compact set which has measure greater than zero which contains no interval and has the property that every point is a limit point of the set. Let $P$ be such a fat Cantor set and consider
$$f(x) = \int_0^x \mathcal{X}_{P^C}(t)\,dt.$$
Show that $f$ is a strictly increasing function which has the property that its derivative equals zero on a set of positive measure.
13. Let $f$ be a function defined on an interval, $(a,b)$. The Dini derivates are defined as
$$D_+ f(x) \equiv \liminf_{h\to 0+} \frac{f(x+h) - f(x)}{h}, \quad D^+ f(x) \equiv \limsup_{h\to 0+} \frac{f(x+h) - f(x)}{h},$$
$$D_- f(x) \equiv \liminf_{h\to 0+} \frac{f(x) - f(x-h)}{h}, \quad D^- f(x) \equiv \limsup_{h\to 0+} \frac{f(x) - f(x-h)}{h}.$$
Suppose $f$ is continuous on $(a,b)$ and for all $x \in (a,b)$, $D_+ f(x) \ge 0$. Show that then $f$ is increasing on $(a,b)$. Hint: Consider the function, $H(x) \equiv f(x)(d - c) - x(f(d) - f(c))$ where $a < c < d < b$. Thus $H(c) = H(d)$. Also it is easy to see that $H$ cannot be constant if $f(d) < f(c)$ due to the assumption that $D_+ f(x) \ge 0$. If there exists $x_1 \in (c,d)$ where $H(x_1) > H(c)$, then let $x_0 \in (c,d)$ be the point where the maximum of $H$ occurs. Consider $D_+ f(x_0)$. If, on the other hand, $H(x) < H(c)$ for all $x \in (c,d)$, then consider $D_+ H(c)$.
14. Suppose in the situation of the above problem we only know
$$D_+ f(x) \ge 0 \text{ a.e.}$$
Does the conclusion still follow? What if we only know $D_+ f(x) \ge 0$ for every $x$ outside a countable set? Hint: In the case of $D_+ f(x) \ge 0$ a.e., consider the bad function in the exercises for the chapter on the construction of measures which was based on the Cantor set. In the case where $D_+ f(x) \ge 0$ for all but countably many $x$, by replacing $f(x)$ with $\widetilde{f}(x) \equiv f(x) + \varepsilon x$, consider the situation where $D_+ \widetilde{f}(x) > 0$ for all but countably many $x$. If in this situation, $\widetilde{f}(c) > \widetilde{f}(d)$ for some $c < d$, and $y_0 \in \left(\widetilde{f}(d), \widetilde{f}(c)\right)$, let
$$z \equiv \sup\left\{ x \in [c,d] : \widetilde{f}(x) > y_0 \right\}.$$
Show that $\widetilde{f}(z) = y_0$ and $D_+ \widetilde{f}(z) \le 0$. Conclude that if $\widetilde{f}$ fails to be increasing, then $D_+ \widetilde{f}(z) \le 0$ for uncountably many points, $z$. Now draw a conclusion about $f$.
15. Let $f : [a,b] \to \mathbb{R}$ be increasing. Show
$$m\Big(\overbrace{\left[D^+ f(x) > q > p > D_+ f(x)\right]}^{N_{pq}}\Big) = 0 \tag{15.15}$$
and conclude that aside from a set of measure zero, $D^+ f(x) = D_+ f(x)$. Similar reasoning will show $D^- f(x) = D_- f(x)$ a.e. and $D^+ f(x) = D_- f(x)$ a.e. and so off some set of measure zero, we have
$$D_- f(x) = D^- f(x) = D^+ f(x) = D_+ f(x)$$
which implies the derivative exists and equals this common value. Hint: To show 15.15, let $U$ be an open set containing $N_{pq}$ such that $m(N_{pq}) + \varepsilon > m(U)$. For each $x \in N_{pq}$ there exist $y > x$ arbitrarily close to $x$ such that
$$f(y) - f(x) < p\,(y - x).$$
Thus the set of such intervals, $\{[x,y]\}$, which are contained in $U$ constitutes a Vitali cover of $N_{pq}$. Let $\{[x_i, y_i]\}$ be disjoint and
$$m\left(N_{pq} \setminus \cup_i [x_i, y_i]\right) = 0.$$
Now let $V \equiv \cup_i (x_i, y_i)$. Then also we have
$$m\Big(N_{pq} \setminus \overbrace{\cup_i (x_i, y_i)}^{=V}\Big) = 0$$
and so $m(N_{pq} \cap V) = m(N_{pq})$. For each $x \in N_{pq} \cap V$, there exist $y > x$ arbitrarily close to $x$ such that
$$f(y) - f(x) > q\,(y - x).$$
Thus the set of such intervals, $\{[x', y']\}$, which are contained in $V$ is a Vitali cover of $N_{pq} \cap V$. Let $\{[x_i', y_i']\}$ be disjoint and
$$m\left(N_{pq} \cap V \setminus \cup_i [x_i', y_i']\right) = 0.$$
Then verify the following:
$$\sum_i \left(f(y_i') - f(x_i')\right) > q \sum_i (y_i' - x_i') \ge q\,m(N_{pq} \cap V) = q\,m(N_{pq})$$
$$\ge p\,m(N_{pq}) > p\,(m(U) - \varepsilon) \ge p \sum_i (y_i - x_i) - p\varepsilon$$
$$\ge \sum_i \left(f(y_i) - f(x_i)\right) - p\varepsilon \ge \sum_i \left(f(y_i') - f(x_i')\right) - p\varepsilon$$
and therefore, $(q - p)\,m(N_{pq}) \le p\varepsilon$. Since $\varepsilon > 0$ is arbitrary, this proves that there is a right derivative a.e. A similar argument does the other cases.
16. Suppose $|f(x) - f(y)| \le K|x - y|$. Show there exists $g \in L^{\infty}(\mathbb{R})$, $\|g\|_{\infty} \le K$, and
$$f(y) - f(x) = \int_x^y g(t)\,dt.$$
Hint: Let $F(x) = Kx + f(x)$ and let $\lambda$ be the measure representing $\int f\,dF$. Show $\lambda \ll m$.
17. We say $P = \{x_0, x_1, \cdots, x_n\}$ is a partition of $[a,b]$ if $a = x_0 < \cdots < x_n = b$. Define
$$\sum_P |f(x_i) - f(x_{i-1})| \equiv \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|$$
A function, $f : [a,b] \to \mathbb{R}$, is said to be of bounded variation if
$$\sup_P \left\{ \sum_P |f(x_i) - f(x_{i-1})| \right\} < \infty.$$
Show that whenever $f$ is of bounded variation it can be written as the difference of two increasing functions. Explain why such bounded variation functions have derivatives a.e.
18. Do there exist compact sets of 1 H and K such that H K, m(K) = m(H)
but m
_
K H
_
> 0? If so, nd an example and if not, prove a theorem.
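The fat Cantor set of Problem 12 can be explored with exact arithmetic. The following sketch makes one specific, assumed choice that the problem leaves open: at stage $k$ an open middle interval of length $4^{-k}$ is removed from each of the $2^{k-1}$ remaining closed intervals. The function names are hypothetical. With this choice the total length removed is $\sum_k 2^{k-1} 4^{-k} = 1/2$, so the set which remains has measure $1/2$.

```python
from fractions import Fraction


def fat_cantor_intervals(stages):
    """Closed intervals left after `stages` steps of the assumed construction.

    Step k removes an open middle interval of length 4**-k from every
    interval that is still present.
    """
    intervals = [(Fraction(0), Fraction(1))]
    for k in range(1, stages + 1):
        gap = Fraction(1, 4 ** k)
        next_intervals = []
        for a, b in intervals:
            mid = (a + b) / 2
            next_intervals.append((a, mid - gap / 2))
            next_intervals.append((mid + gap / 2, b))
        intervals = next_intervals
    return intervals


def remaining_length(stages):
    """Total length of the closed intervals left after `stages` steps."""
    return sum(b - a for a, b in fat_cantor_intervals(stages))


for k in (1, 2, 5, 10):
    print(k, remaining_length(k))
```

The remaining length after $k$ stages is $1/2 + 2^{-(k+1)}$, which decreases to $1/2$, while the longest remaining interval shrinks to zero length, consistent with the set containing no interval.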
Differentiation With Respect
To General Radon Measures
This is a brief chapter on certain important topics on the differentiation theory for general Radon measures. For different proofs and some results which are not discussed here, a good source is [19] which is where I first read some of these things.
16.1 Besicovitch Covering Theorem
The fundamental theorem of calculus presented above for Lebesgue measures can be generalized to arbitrary Radon measures. It turns out that the same approach works if a different covering theorem is employed instead of the Vitali theorem. This covering theorem is the Besicovitch covering theorem of this section. It is necessary because for a general Radon measure $\mu$, it is no longer the case that the measure is translation invariant. This implies there is no way to estimate $\mu(\widehat{B})$ in terms of $\mu(B)$ and thus the Vitali covering theorem is of no use. In the Besicovitch covering theorem the balls in the covering are not enlarged as they are in the Vitali theorem. In this theorem they can also be either open or closed or neither open nor closed. The balls can also be taken with respect to any norm on $\mathbb{R}^n$. The notation, $B(x,r)$, in what follows will denote any set which satisfies
$$\{y : \|y - x\| < r\} \subseteq B(x,r) \subseteq \{y : \|y - x\| \le r\}$$
and the norm, $\|\cdot\|$, is just some norm on $\mathbb{R}^n$. The following picture is a distorted picture of the situation described in the following lemma.

[Figure: two balls centered at $x$ and $y$, each meeting a small ball centered at $0$.]
Lemma 16.1 Let $10 \le r_x \le r_y$ and suppose $B(x, r_x)$ and $B(y, r_y)$ both have nonempty intersection with $B(0,1)$ but neither of these balls contains $0$. Suppose also that
$$\|x - y\| \ge r_y$$
so that neither ball contains both centers in its interior. Then
$$\left\| \frac{x}{\|x\|} - \frac{y}{\|y\|} \right\| \ge \frac{4}{5}.$$
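Proof: By hypothesis,
$$\|x\| \ge r_x \ge \|x\| - 1, \quad \|y\| \ge r_y \ge \|y\| - 1.$$
Then
$$\left\| \frac{x}{\|x\|} - \frac{y}{\|y\|} \right\| = \left\| \frac{x\|y\| - \|x\|y}{\|x\|\,\|y\|} \right\| = \left\| \frac{x\|y\| - y\|y\| + y\|y\| - \|x\|y}{\|x\|\,\|y\|} \right\| \ge \frac{\|x - y\|}{\|x\|} - \frac{\left|\|y\| - \|x\|\right|}{\|x\|} \tag{16.1}$$
Now there are two cases.
First suppose $\|y\| \ge \|x\|$. Then the above is larger than
$$\frac{r_y}{\|x\|} - \frac{\|y\|}{\|x\|} + 1 \ge \frac{r_y}{\|x\|} - \frac{r_y + 1}{\|x\|} + 1 = 1 - \frac{1}{\|x\|} \ge 1 - \frac{1}{r_x} \ge 1 - \frac{1}{10} = \frac{9}{10}.$$
Next suppose $\|x\| \ge \|y\|$. Then 16.1 is at least as large as
$$\frac{r_y}{\|x\|} - \frac{\|x\| - \|y\|}{\|x\|} = \frac{r_y}{\|x\|} - 1 + \frac{\|y\|}{\|x\|} \ge \frac{2r_y}{\|x\|} - 1 \ge \frac{2r_y}{r_x + 1} - 1 \ge \frac{2r_x}{r_x + 1} - 1 \ge \frac{20}{10 + 1} - 1 = .81818$$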
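The estimate of Lemma 16.1 can be sanity checked by random sampling. The sketch below is only an illustration under assumptions made here: it uses the Euclidean norm on $\mathbb{R}^2$, a fixed random seed, and hypothetical function names. It draws centers and radii satisfying the hypotheses of the lemma and records the smallest observed value of $\left\| x/\|x\| - y/\|y\| \right\|$.

```python
import math
import random


def normalized_gap(x, y):
    """Euclidean distance between x/|x| and y/|y| for nonzero x, y in the plane."""
    nx, ny = math.hypot(*x), math.hypot(*y)
    return math.dist((x[0] / nx, x[1] / nx), (y[0] / ny, y[1] / ny))


def sample_configuration(rng):
    """Draw centers x, y satisfying the hypotheses of Lemma 16.1 (rejection sampling)."""
    while True:
        r_x = rng.uniform(10.0, 20.0)
        r_y = rng.uniform(r_x, 2.0 * r_x)            # 10 <= r_x <= r_y
        # A norm in [r, r + 1] means the ball misses 0 but reaches B(0, 1).
        norm_x = rng.uniform(r_x, r_x + 1.0)
        norm_y = rng.uniform(r_y, r_y + 1.0)
        angle_x = rng.uniform(0.0, 2.0 * math.pi)
        angle_y = rng.uniform(0.0, 2.0 * math.pi)
        x = (norm_x * math.cos(angle_x), norm_x * math.sin(angle_x))
        y = (norm_y * math.cos(angle_y), norm_y * math.sin(angle_y))
        if math.dist(x, y) >= r_y:                   # the separation hypothesis
            return x, y


def smallest_gap(trials=2000, seed=0):
    """Smallest normalized gap seen over `trials` admissible configurations."""
    rng = random.Random(seed)
    return min(normalized_gap(*sample_configuration(rng)) for _ in range(trials))


print(smallest_gap())
```

A sampled minimum below $4/5$ would contradict the lemma; sampling of course proves nothing, but it is a quick check on the arithmetic in the two cases of the proof.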
Lemma 16.2 There exists $L_n$ depending on dimension, $n$, such that for $\{x_k\}_{k=1}^{m}$ distinct points on $\partial B(0,1)$, if $m \ge L_n$, then the distance between some pair of points of $\{x_k\}_{k=1}^{m}$ is less than $4/5$.
Proof: Let $\{z_j\}_{j=1}^{L_n - 1}$ be a $1/3$ net on $\partial B(0,1)$. Then for $m \ge L_n$, if $\{x_k\}_{k=1}^{m}$ is a set of $m$ distinct points on $\partial B(0,1)$, there must exist $x_i$ and $x_j$ for $i \ne j$ such that both $x_i$ and $x_j$ are contained in some $B(z_k, 1/3)$. This follows from the pigeon hole principle. There are more $x_i$ than there are $B(z_k, 1/3)$ and so one of these must have more than one $x_k$ in it. But then
$$\|x_i - x_j\| \le \|x_i - z_k\| + \|z_k - x_j\| \le \frac{2}{3} < \frac{4}{5}$$
Corollary 16.3 Let $B_0 = B(0,1)$ and let $B_j = B(x_j, r_j)$ for $j = 1, \cdots, K$ such that $r_j \ge 10$, $0 \notin B_j$ for all $j > 0$, $B_j \cap B_0 \ne \emptyset$, and for all $i \ne j$,
$$\|x_i - x_j\| \ge \max(r_i, r_j).$$
That is, no $B_j$ contains two centers in its interior. Then $K \le L_n$, the constant of the above lemma.
Proof: By Lemma 16.2, if $K > L_n$, there exist two of the centers, $x_i$ and $x_j$, such that $\left\| \frac{x_i}{\|x_i\|} - \frac{x_j}{\|x_j\|} \right\| < 4/5$. This contradicts Lemma 16.1. Hence $K \le L_n$ as claimed.
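Theorem 16.4 There exists a constant $N_n$, depending only on $n$, with the following property. If $\mathcal{F}$ is any collection of nonempty balls in $\mathbb{R}^n$ with
$$\sup\{\operatorname{diam}(B) : B \in \mathcal{F}\} < D < \infty$$
and if $A$ is the set of centers of the balls in $\mathcal{F}$, then there exist subsets of $\mathcal{F}$, $\mathcal{G}_1, \cdots, \mathcal{G}_{N_n}$, such that each $\mathcal{G}_i$ is a countable collection of disjoint balls from $\mathcal{F}$ and
$$A \subseteq \cup_{i=1}^{N_n} \cup\{B : B \in \mathcal{G}_i\}.$$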
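Lemma 16.5 In the situation of Theorem 16.4, suppose the set of centers $A$ is bounded. Then there exists a sequence of balls from $\mathcal{F}$, $\{B_j\}_{j=1}^{J}$ where $J \le \infty$, such that
$$r(B_1) \ge \frac{3}{4} \sup\{r(B) : B \in \mathcal{F}\} \tag{16.2}$$
and if
$$A_m \equiv A \setminus \left(\cup_{i=1}^{m} B_i\right) \ne \emptyset, \tag{16.3}$$
then
$$r(B_{m+1}) \ge \frac{3}{4} \sup\{r : B(a,r) \in \mathcal{F},\ a \in A_m\}. \tag{16.4}$$
Letting $B_j = B(a_j, r_j)$, this sequence satisfies
$$A \subseteq \cup_{i=1}^{J} B_i, \quad r(B_k) \le \frac{4}{3}\,r(B_j) \text{ for } j < k, \quad \{B(a_j, r_j/3)\}_{j=1}^{J} \text{ are disjoint.} \tag{16.5}$$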
Proof: Pick $B_1$ satisfying 16.2. If $B_1, \cdots, B_m$ have been chosen, and $A_m$ is given in 16.3, then if it equals $\emptyset$, it follows $A \subseteq \cup_{i=1}^{m} B_i$. Set $J = m$. If $A_m \ne \emptyset$, pick $B_{m+1}$, centered at a point of $A_m$, to satisfy 16.4. This defines the desired sequence. It remains to verify the claims in 16.5. Consider the second claim. Let $A_0 \equiv A$ and write $\sigma_m \equiv \sup\{r : B(a,r) \in \mathcal{F},\ a \in A_m\}$. For $j < k$, $a_k \in A_{k-1} \subseteq A_{j-1}$ and so
$$\sigma_{j-1} \ge r_k.$$
Hence
$$r_j \ge \frac{3}{4}\,\sigma_{j-1} \ge (3/4)\,r_k$$
and so $r(B_k) \le \frac{4}{3}\,r(B_j)$. This proves the second claim of 16.5.
Consider the third claim of 16.5. Suppose to the contrary that $x \in B(a_j, r_j/3) \cap B(a_i, r_i/3)$ where $i < j$. Then
$$\|a_i - a_j\| \le \|a_i - x\| + \|x - a_j\| < \frac{1}{3}(r_i + r_j) \le \frac{1}{3}\left(r_i + \frac{4}{3}\,r_i\right) = \frac{7}{9}\,r_i < r_i$$
contrary to the construction which requires $a_j \notin B(a_i, r_i)$.
Finally consider the first claim of 16.5. It is true if $J < \infty$. This follows from the construction. If $J = \infty$, then since $A$ is bounded and the balls, $B(a_j, r_j/3)$, are disjoint, it must be the case that $\lim_{i\to\infty} r_i = 0$. Suppose $J = \infty$ so that $A_m \ne \emptyset$ for all $m$. If $a_0$ fails to be covered, then $a_0 \in A_k$ for all $k$. Let $a_0 \in B(a_0, r_0) \in \mathcal{F}$ for some ball $B(a_0, r_0)$. Then for $i$ large enough, $r_i < \frac{1}{10}\,r_0$ and so since $a_0 \in A_{i-1}$,
$$\frac{3}{4}\,r_0 \le \frac{3}{4}\,\sigma_{i-1} \le r_i < \frac{1}{10}\,r_0,$$
a contradiction. This proves the lemma.
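For a finite family the selection in Lemma 16.5 can be carried out directly. The sketch below is a simplified illustration, not the construction of the lemma in full generality: it assumes open intervals on the real line given as (center, radius) pairs with positive radii, and at each step it takes a ball of maximal radius among those whose centers are still uncovered, which in particular satisfies the $3/4$ condition in 16.4. The function name and the sample data are hypothetical.

```python
def select_balls(balls):
    """Greedy selection in the spirit of Lemma 16.5 for a finite family.

    `balls` is a list of (center, radius) pairs describing open intervals
    on the real line, all with positive radius.
    """
    remaining = list(balls)
    chosen = []
    while remaining:
        # Largest radius among balls whose centers are not yet covered.
        best = max(remaining, key=lambda ball: ball[1])
        chosen.append(best)
        # Discard every ball whose center lies in the open ball just chosen.
        remaining = [b for b in remaining if abs(b[0] - best[0]) >= best[1]]
    return chosen


balls = [(0.0, 1.0), (0.5, 0.9), (1.5, 0.6), (2.0, 1.2), (3.5, 0.3), (3.6, 0.5)]
chosen = select_balls(balls)
print(chosen)
```

For the chosen balls one can then check the three claims of 16.5 by hand: every center is covered, the radii never grow by more than the factor $4/3$ (here they are decreasing), and the balls shrunk by the factor $1/3$ are disjoint.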
Lemma 16.6 There exists a constant $M_n$ depending only on $n$ such that for each $1 \le k \le J$, $M_n$ exceeds the number of sets $B_j$ for $j < k$ which have nonempty intersection with $B_k$.
Proof: These sets $B_j$ which intersect $B_k$ are of two types. Either they have large radius, $r_j > 10r_k$, or they have small radius, $r_j \le 10r_k$. In this argument let $|S|$ denote the number of elements in the set $S$. Define for fixed $k$,
$$I \equiv \{j : 1 \le j < k,\ B_j \cap B_k \ne \emptyset,\ r_j \le 10r_k\},$$
$$K \equiv \{j : 1 \le j < k,\ B_j \cap B_k \ne \emptyset,\ r_j > 10r_k\}.$$
Claim 1: $B\left(a_j, \frac{r_j}{3}\right) \subseteq B(a_k, 15r_k)$ for $j \in I$.
Proof: Let $j \in I$. Then $B_j \cap B_k \ne \emptyset$ and $r_j \le 10r_k$. Now if
$$x \in B\left(a_j, \frac{r_j}{3}\right),$$
then since $r_j \le 10r_k$,
$$\|x - a_k\| \le \|x - a_j\| + \|a_j - a_k\| \le \frac{r_j}{3} + r_j + r_k = \frac{4}{3}\,r_j + r_k \le \frac{4}{3}(10r_k) + r_k = \frac{43}{3}\,r_k < 15r_k.$$
Therefore, $B\left(a_j, \frac{r_j}{3}\right) \subseteq B(a_k, 15r_k)$.
Claim 2: $|I| \le 60^n$.
Proof: Recall $r(B_k) \le \frac{4}{3}\,r(B_j)$. Then letting $\alpha(n)\,r^n$ be the Lebesgue measure of the $n$ dimensional ball of radius $r$, (Note this $\alpha(n)$ depends on the norm used.)
$$\alpha(n)\,15^n r_k^n \equiv m_n(B(a_k, 15r_k)) \ge \sum_{j \in I} m_n\left(B\left(a_j, \frac{r_j}{3}\right)\right) = \sum_{j \in I} \alpha(n)\left(\frac{r_j}{3}\right)^n$$
$$\ge \sum_{j \in I} \alpha(n)\left(\frac{r_k}{4}\right)^n \quad \left(\text{since } r_k \le \frac{4}{3}\,r_j\right) \quad = |I|\,\alpha(n)\left(\frac{r_k}{4}\right)^n$$
and so it follows $|I| \le 60^n$ as claimed.
Claim 3: $|K| \le L_n$ where $L_n$ is the constant of Corollary 16.3.
Proof: Consider $\{B_j : j \in K\}$ and $B_k$. Let $f(x) \equiv r_k^{-1}(x - x_k)$. Then $f(B_k) = B(0,1)$ and
$$f(B_j) = r_k^{-1} B(x_j - x_k, r_j) = B\left(\frac{x_j - x_k}{r_k},\ r_j/r_k\right).$$
Then $r_j/r_k \ge 10$ because $j \in K$. None of the balls, $f(B_j)$, contain $0$ but all these balls intersect $B(0,1)$ and as just noted, each of these balls has radius $\ge 10$ and none of them contains two centers on its interior. By Corollary 16.3, it follows there are no more than $L_n$ of them. This proves the claim. A constant which will satisfy the desired conditions is
$$M_n \equiv L_n + 60^n + 1.$$
This completes the proof of Lemma 16.6.
Next subdivide the balls $\{B_i\}_{i=1}^{J}$ into $M_n$ subsets $\mathcal{G}_1, \cdots, \mathcal{G}_{M_n}$ each of which consists of disjoint balls. This is done in the following way. Let $B_1 \in \mathcal{G}_1$. If $B_1, \cdots, B_k$ have each been assigned to one of the sets $\mathcal{G}_1, \cdots, \mathcal{G}_{M_n}$, let $B_{k+1} \in \mathcal{G}_r$ where $r$ is the smallest index having the property that $B_{k+1}$ does not intersect any of the balls already in $\mathcal{G}_r$. There must exist such an index $r \in \{1, \cdots, M_n\}$ because otherwise $B_{k+1} \cap B_j \ne \emptyset$ for at least $M_n$ values of $j < k + 1$ contradicting Lemma 16.6. By Lemma 16.5
$$A \subseteq \cup_{i=1}^{M_n} \cup\{B : B \in \mathcal{G}_i\} = \cup_{j=1}^{J} B_j.$$
This proves Theorem 16.4 in the case where $A$ is bounded.
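The assignment of the balls $B_1, B_2, \cdots$ to disjoint subfamilies described above is a first fit rule, and can be sketched as follows. This is only an illustration under assumptions made here: the balls are open intervals on the real line given as (center, radius) pairs, the function names are hypothetical, and the list of families grows as needed instead of being fixed at $M_n$ in advance (Lemma 16.6 is what bounds the number of families in the setting of the theorem).

```python
def intersects(ball_1, ball_2):
    """True when two open intervals, given as (center, radius), overlap."""
    return abs(ball_1[0] - ball_2[0]) < ball_1[1] + ball_2[1]


def assign_to_families(balls):
    """Place each ball in the first family containing no ball that meets it."""
    families = []
    for ball in balls:
        for family in families:
            if not any(intersects(ball, other) for other in family):
                family.append(ball)
                break
        else:
            # Every existing family already holds a ball meeting this one.
            families.append([ball])
    return families


balls = [(0, 1), (1, 1), (2, 1), (3, 1), (0.5, 0.2)]
for index, family in enumerate(assign_to_families(balls), start=1):
    print(index, family)
```

Each family returned consists of pairwise disjoint balls by construction, and a new family is opened only when the current ball meets at least one ball in every existing family.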
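To complete the proof of this theorem, the restriction that $A$ is bounded must be removed. Define
$$A_l \equiv A \cap \{x \in \mathbb{R}^n : 10(l - 1)D \le \|x\| < 10lD\}, \quad l = 1, 2, \cdots$$
and
$$\mathcal{F}_l = \{B(a,r) : B(a,r) \in \mathcal{F} \text{ and } a \in A_l\}.$$
Then since $D$ is an upper bound for all the diameters of these balls,
$$(\cup \mathcal{F}_l) \cap (\cup \mathcal{F}_m) = \emptyset \tag{16.6}$$
whenever $m \ge l + 2$. Therefore, applying what was just shown to the pair $(A_l, \mathcal{F}_l)$, there exist subsets of $\mathcal{F}_l$, $\mathcal{G}_1^l, \cdots, \mathcal{G}_{M_n}^l$, such that each $\mathcal{G}_i^l$ is a countable collection of disjoint balls of $\mathcal{F}_l \subseteq \mathcal{F}$ and
$$A_l \subseteq \cup_{i=1}^{M_n} \cup\left\{B : B \in \mathcal{G}_i^l\right\}.$$
Now let $\mathcal{G}_j \equiv \cup_{l=1}^{\infty} \mathcal{G}_j^{2l-1}$ for $1 \le j \le M_n$ and for $1 \le j \le M_n$, let $\mathcal{G}_{j+M_n} \equiv \cup_{l=1}^{\infty} \mathcal{G}_j^{2l}$. Thus, letting $N_n \equiv 2M_n$,
$$A = \cup_{l=1}^{\infty} A_{2l} \cup \cup_{l=1}^{\infty} A_{2l-1} \subseteq \cup_{j=1}^{N_n} \cup\{B : B \in \mathcal{G}_j\}$$
and by 16.6, each $\mathcal{G}_j$ is a countable set of disjoint balls of $\mathcal{F}$. This proves the Besicovitch covering theorem.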
16.2 Fundamental Theorem Of Calculus For Radon
Measures
In this section the Besicovitch covering theorem will be used to give a generalization of the Lebesgue differentiation theorem to general Radon measures. In what follows, $\mu$ will be a Radon measure,
$$Z \equiv \{x \in \mathbb{R}^n : \mu(B(x,r)) = 0 \text{ for some } r > 0\},$$
$$\fint_{B(x,r)} f\,d\mu \equiv \begin{cases} 0 & \text{if } x \in Z, \\ \frac{1}{\mu(B(x,r))} \int_{B(x,r)} f\,d\mu & \text{if } x \notin Z, \end{cases}$$
and the maximal function $Mf : \mathbb{R}^n \to [0, \infty]$ is given by
$$Mf(x) \equiv \sup_{r \le 1} \fint_{B(x,r)} |f|\,d\mu.$$
Lemma 16.7 $Z$ is measurable and $\mu(Z) = 0$.
Proof: For each $x \in Z$, there exists a ball $B(x,r)$ with $\mu(B(x,r)) = 0$. Let $\mathcal{G}$ be the collection of these balls. Since $\mathbb{R}^n$ has a countable basis, a countable subset, $\widetilde{\mathcal{G}}$, of $\mathcal{G}$ also covers $Z$. Let
$$\widetilde{\mathcal{G}} = \{B_i\}_{i=1}^{\infty}.$$
Then letting $\overline{\mu}$ denote the outer measure determined by $\mu$,
$$\overline{\mu}(Z) \le \sum_{i=1}^{\infty} \overline{\mu}(B_i) = \sum_{i=1}^{\infty} \mu(B_i) = 0$$
Therefore, $Z$ is measurable and has measure zero as claimed.
Theorem 16.8 Let $\mu$ be a Radon measure and let $f \in L^1(\mathbb{R}^n, \mu)$. Then
$$\lim_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) = 0$$
for $\mu$ a.e. $x \in \mathbb{R}^n$.
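Proof: First consider the following claim which is a weak type estimate of the same sort used when differentiating with respect to Lebesgue measure.
Claim 1: The following inequality holds for $N_n$ the constant of the Besicovitch covering theorem.
$$\overline{\mu}([Mf > \varepsilon]) \le N_n\,\varepsilon^{-1}\,\|f\|_1$$
Proof: First note $[Mf > \varepsilon] \cap Z = \emptyset$ and without loss of generality, you can assume $\overline{\mu}([Mf > \varepsilon]) > 0$. Next, for each $x \in [Mf > \varepsilon]$ there exists a ball $B_x = B(x, r_x)$ with $r_x \le 1$ and
$$\mu(B_x)^{-1} \int_{B(x, r_x)} |f|\,d\mu > \varepsilon.$$
Let $\mathcal{F}$ be this collection of balls so that $[Mf > \varepsilon]$ is the set of centers of balls of $\mathcal{F}$. By the Besicovitch covering theorem,
$$[Mf > \varepsilon] \subseteq \cup_{i=1}^{N_n} \cup\{B : B \in \mathcal{G}_i\}$$
where $\mathcal{G}_i$ is a collection of disjoint balls of $\mathcal{F}$. Now for some $i$,
$$\overline{\mu}([Mf > \varepsilon])/N_n \le \mu(\cup\{B : B \in \mathcal{G}_i\})$$
because if this is not so, then
$$\overline{\mu}([Mf > \varepsilon]) \le \sum_{i=1}^{N_n} \mu(\cup\{B : B \in \mathcal{G}_i\}) < \sum_{i=1}^{N_n} \frac{\overline{\mu}([Mf > \varepsilon])}{N_n} = \overline{\mu}([Mf > \varepsilon]),$$
a contradiction. Therefore for this $i$,
$$\frac{\overline{\mu}([Mf > \varepsilon])}{N_n} \le \mu(\cup\{B : B \in \mathcal{G}_i\}) = \sum_{B \in \mathcal{G}_i} \mu(B) \le \sum_{B \in \mathcal{G}_i} \varepsilon^{-1} \int_B |f|\,d\mu \le \varepsilon^{-1} \int_{\mathbb{R}^n} |f|\,d\mu = \varepsilon^{-1}\,\|f\|_1.$$
This shows Claim 1.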
Claim 2: If $g$ is any continuous function defined on $\mathbb{R}^n$, then
$$\lim_{r\to 0} \fint_{B(x,r)} |g(y) - g(x)|\,d\mu(y) = 0$$
and if $x \notin Z$,
$$\lim_{r\to 0} \frac{1}{\mu(B(x,r))} \int_{B(x,r)} g(y)\,d\mu(y) = g(x). \tag{16.7}$$
Proof: If $x \in Z$ there is nothing to prove. If $x \notin Z$, then since $g$ is continuous at $x$, whenever $r$ is small enough,
$$\fint_{B(x,r)} |g(y) - g(x)|\,d\mu(y) = \frac{1}{\mu(B(x,r))} \int_{B(x,r)} |g(y) - g(x)|\,d\mu(y) \le \frac{1}{\mu(B(x,r))} \int_{B(x,r)} \varepsilon\,d\mu(y) = \varepsilon.$$
16.7 follows from the above and the triangle inequality. This proves the claim.
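Now let $g \in C_c(\mathbb{R}^n)$ and $x \notin Z$. Then from the above observations about continuous functions,
$$\overline{\mu}\left(\left[x : \limsup_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) > \varepsilon\right]\right) \tag{16.8}$$
$$\le \overline{\mu}\left(\left[x : \limsup_{r\to 0} \fint_{B(x,r)} |f(y) - g(y)|\,d\mu(y) > \frac{\varepsilon}{2}\right]\right) + \overline{\mu}\left(\left[x : |g(x) - f(x)| > \frac{\varepsilon}{2}\right]\right)$$
$$\le \overline{\mu}\left(\left[M(f - g) > \frac{\varepsilon}{2}\right]\right) + \overline{\mu}\left(\left[|f - g| > \frac{\varepsilon}{2}\right]\right) \tag{16.9}$$
Now
$$\int_{[|f - g| > \frac{\varepsilon}{2}]} |f - g|\,d\mu \ge \frac{\varepsilon}{2}\,\overline{\mu}\left(\left[|f - g| > \frac{\varepsilon}{2}\right]\right)$$
and so from Claim 1, applied with $\varepsilon/2$ in place of $\varepsilon$, 16.9 and hence 16.8 is dominated by
$$\left(\frac{2}{\varepsilon} + \frac{2N_n}{\varepsilon}\right) \|f - g\|_{L^1(\mathbb{R}^n, \mu)}.$$
But by regularity of Radon measures, $C_c(\mathbb{R}^n)$ is dense in $L^1(\mathbb{R}^n, \mu)$ and so since $g$ in the above is arbitrary, this shows 16.8 equals 0. Now
$$\overline{\mu}\left(\left[x : \limsup_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) > 0\right]\right) = \overline{\mu}\left(\cup_{m=1}^{\infty} \left[x : \limsup_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) > \frac{1}{m}\right]\right)$$
$$\le \sum_{m=1}^{\infty} \overline{\mu}\left(\left[x : \limsup_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) > \frac{1}{m}\right]\right) = 0.$$
By Lemma 16.7 the set $Z$ is a set of measure zero and so if
$$x \notin \left[\limsup_{r\to 0} \fint_{B(\cdot,r)} |f(y) - f(\cdot)|\,d\mu(y) > 0\right] \cup Z$$
the above has shown
$$0 \le \liminf_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) \le \limsup_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) = 0$$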
The following corollary is the main result referred to as the Lebesgue Besicovitch Differentiation theorem.
Corollary 16.9 If $f \in L^1_{loc}(\mathbb{R}^n, \mu)$,
$$\lim_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) = 0 \quad \mu \text{ a.e. } x. \tag{16.10}$$
Proof: If $f$ is replaced by $f\mathcal{X}_{B(0,k)}$ then the conclusion 16.10 holds for all $x \notin F_k$ where $F_k$ is a set of $\mu$ measure 0. Letting $k = 1, 2, \cdots$, and $F \equiv \cup_{k=1}^{\infty} F_k$, it follows that $F$ is a set of measure zero and for any $x \notin F$, and $k \in \{1, 2, \cdots\}$, 16.10 holds if $f$ is replaced by $f\mathcal{X}_{B(0,k)}$. Picking any such $x$, and letting $k > |x| + 1$, this shows
$$\lim_{r\to 0} \fint_{B(x,r)} |f(y) - f(x)|\,d\mu(y) = \lim_{r\to 0} \fint_{B(x,r)} \left|f\mathcal{X}_{B(0,k)}(y) - f\mathcal{X}_{B(0,k)}(x)\right|\,d\mu(y) = 0.$$
16.3 Slicing Measures
Let $\mu$ be a finite Radon measure. I will show here that a formula of the following form holds.
$$\mu(F) = \int_F d\mu = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} \mathcal{X}_F(x,y)\,d\nu_x(y)\,d\alpha(x)$$
where $\alpha(E) = \mu(E \times \mathbb{R}^m)$. When this is done, the measures, $\nu_x$, are called slicing measures and this shows that an integral with respect to $\mu$ can be written as an iterated integral in terms of the measure $\alpha$ and the slicing measures, $\nu_x$. This is like going backwards in the construction of product measure. One starts with a measure, $\mu$, defined on the Cartesian product and produces $\alpha$ and an infinite family of slicing measures from it whereas in the construction of product measure, one starts with two measures and obtains a new measure on a $\sigma$ algebra of subsets of the Cartesian product of two spaces. First here are two technical lemmas.
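Lemma 16.10 The space $C_c(\mathbb{R}^m)$ with the norm
$$\|f\| \equiv \sup\{|f(y)| : y \in \mathbb{R}^m\}$$
is separable.
Proof: Let $\mathcal{F}_l$ consist of all functions which are of the form
$$\sum_{|\alpha| \le N} a_\alpha\,y^\alpha \left(\operatorname{dist}\left(y, B(0, l+1)^C\right)\right)^{n_\alpha}$$
where $a_\alpha \in \mathbb{Q}$, $\alpha$ is a multi-index, and $n_\alpha$ is a positive integer. Then $\mathcal{F}_l$ is countable, separates the points of $\overline{B(0,l)}$ and annihilates no point of $\overline{B(0,l)}$. By the Stone Weierstrass theorem $\mathcal{F}_l$ is dense in the space $C\left(\overline{B(0,l)}\right)$ and so $\cup\{\mathcal{F}_l : l \in \mathbb{N}\}$ is a countable dense subset of $C_c(\mathbb{R}^m)$.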
From the regularity of Radon measures, the following lemma follows.
Lemma 16.11 If $\mu$ and $\nu$ are two Radon measures defined on $\sigma$ algebras, $\mathcal{S}_\mu$ and $\mathcal{S}_\nu$, of subsets of $\mathbb{R}^n$ and if $\mu(V) = \nu(V)$ for all $V$ open, then $\mu = \nu$ and $\mathcal{S}_\mu = \mathcal{S}_\nu$.
Proof: Every compact set is a countable intersection of open sets so the two measures agree on every compact set. Hence it is routine that the two measures agree on every $G_\delta$ and $F_\sigma$ set. (Recall $G_\delta$ sets are countable intersections of open sets and $F_\sigma$ sets are countable unions of closed sets.) Now suppose $E \in \mathcal{S}_\nu$ is a bounded set. Then by regularity of $\nu$ there exists $G$, a $G_\delta$ set, and $F$, an $F_\sigma$ set, such that $F \subseteq E \subseteq G$ and $\nu(G \setminus F) = 0$. Then it is also true that $\mu(G \setminus F) = 0$. Hence $E = F \cup (E \setminus F)$ and $E \setminus F$ is a subset of $G \setminus F$, a set of $\mu$ measure zero. By completeness of $\mu$, it follows $E \in \mathcal{S}_\mu$ and
$$\mu(E) = \mu(F) = \nu(F) = \nu(E).$$
If $E \in \mathcal{S}_\nu$ is not necessarily bounded, let $E_m = E \cap B(0,m)$ and then $E_m \in \mathcal{S}_\mu$ and $\mu(E_m) = \nu(E_m)$. Letting $m \to \infty$, $E \in \mathcal{S}_\mu$ and $\mu(E) = \nu(E)$. Similarly, $\mathcal{S}_\mu \subseteq \mathcal{S}_\nu$ and the two measures are equal on $\mathcal{S}_\mu$.
The main result in the section is the following theorem.
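Theorem 16.12 Let $\mu$ be a finite Radon measure on $\mathbb{R}^{n+m}$ defined on a $\sigma$ algebra, $\mathcal{F}$. Then there exists a unique finite Radon measure, $\alpha$, defined on a $\sigma$ algebra, $\mathcal{S}$, of sets of $\mathbb{R}^n$ which satisfies
$$\alpha(E) = \mu(E \times \mathbb{R}^m) \tag{16.11}$$
for all $E$ Borel. There also exists a Borel set of $\alpha$ measure zero, $N$, such that for each $x \notin N$, there exists a Radon probability measure $\nu_x$ such that if $f$ is a nonnegative $\mu$ measurable function or a $\mu$ measurable function in $L^1(\mu)$,
$$y \to f(x,y) \text{ is } \nu_x \text{ measurable } \alpha \text{ a.e.}$$
$$x \to \int_{\mathbb{R}^m} f(x,y)\,d\nu_x(y) \text{ is } \alpha \text{ measurable} \tag{16.12}$$
and
$$\int_{\mathbb{R}^{n+m}} f(x,y)\,d\mu = \int_{\mathbb{R}^n} \left(\int_{\mathbb{R}^m} f(x,y)\,d\nu_x(y)\right) d\alpha(x). \tag{16.13}$$
If $\widehat{\nu}_x$ is any other collection of Radon measures satisfying 16.12 and 16.13, then $\widehat{\nu}_x = \nu_x$ for $\alpha$ a.e. $x$.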
Proof:
Existence and uniqueness of $\alpha$
First consider the uniqueness of $\alpha$. Suppose $\alpha_1$ is another Radon measure satisfying 16.11. Then in particular, $\alpha_1$ and $\alpha$ agree on open sets and so the two measures are the same by Lemma 16.11.
To establish the existence of $\alpha$, define $\alpha_0$ on Borel sets by
$$\alpha_0(E) = \mu(E \times \mathbb{R}^m).$$
Thus $\alpha_0$ is a finite Borel measure and so it is finite on compact sets. Then an application of the Riesz representation theorem for positive linear functionals yields a Radon measure $\alpha$ which extends $\alpha_0$.
Uniqueness of $\nu_x$
Next consider the uniqueness of $\nu_x$. Suppose $\nu_x$ and $\widehat{\nu}_x$ satisfy all conclusions of the theorem with exceptional sets denoted by $N$ and $\widehat{N}$ respectively. Then, enlarging $N$ and $\widehat{N}$, one may also assume, using Lemma 16.7, that for $x \notin N \cup \widehat{N}$, $\alpha(B(x,r)) > 0$ whenever $r > 0$. Now let
$$A = \prod_{i=1}^m (a_i, b_i]$$
where $a_i$ and $b_i$ are rational. Thus there are countably many such sets. Then from the conclusion of the theorem, if $x_0 \notin N \cup \widehat{N}$,
$$\frac{1}{\alpha(B(x_0,r))} \int_{B(x_0,r)} \int_{\mathbb{R}^m} \mathcal{X}_A(y)\, d\nu_x(y)\, d\alpha = \frac{1}{\alpha(B(x_0,r))} \int_{B(x_0,r)} \int_{\mathbb{R}^m} \mathcal{X}_A(y)\, d\widehat{\nu}_x(y)\, d\alpha,$$
since by 16.13 both sides equal $\mu(B(x_0,r) \times A)/\alpha(B(x_0,r))$. By the Lebesgue Besicovitch Differentiation theorem, there exists a set of $\alpha$ measure zero, $E_A$, such that if $x_0 \notin E_A \cup N \cup \widehat{N}$, then the limit in the above exists as $r \to 0$ and yields
$$\nu_{x_0}(A) = \widehat{\nu}_{x_0}(A).$$
Letting $E$ denote the union of all the sets $E_A$ for $A$ as described above, it follows that $E$ is a set of $\alpha$ measure zero and if $x_0 \notin E \cup N \cup \widehat{N}$ then $\nu_{x_0}(A) = \widehat{\nu}_{x_0}(A)$ for all such sets $A$. But every open set can be written as a disjoint union of sets of this form and so for all such $x_0$, $\nu_{x_0}(V) = \widehat{\nu}_{x_0}(V)$ for all $V$ open. By Lemma 16.11 this shows the two measures are equal and proves the uniqueness assertion for $\nu_x$. It remains to show the existence of the measures $\nu_x$.
Existence of $\nu_x$
For $f \in C_c(\mathbb{R}^m)$, define a finite measure on the Borel sets of $\mathbb{R}^n$, $\eta_f$, by
$$\eta_f(E) \equiv \int_{E \times \mathbb{R}^m} f(y)\, d\mu.$$
Then it is clear that $\eta_f \ll \alpha$ for $\alpha$ defined above. If $\alpha(E) = 0$ then $\mu(E \times \mathbb{R}^m) = 0$ and so $\eta_f(E) = 0$. Therefore, from the Radon Nikodym theorem, there exists a unique Borel measurable $g_f \in L^1(\mathbb{R}^n)$ such that $g_f \ge 0$ and
$$\eta_f(E) = \int_E g_f(x)\, d\alpha(x).$$
Consider the map $f \to g_f \in L^1(\mathbb{R}^n)$. For $E$ Borel measurable,
$$\int_E g_{(f_1+f_2)}(x)\, d\alpha(x) \equiv \eta_{f_1+f_2}(E) = \int_{E \times \mathbb{R}^m} f_1(y)\, d\mu + \int_{E \times \mathbb{R}^m} f_2(y)\, d\mu$$
$$\equiv \eta_{f_1}(E) + \eta_{f_2}(E) = \int_E g_{f_1}(x)\, d\alpha(x) + \int_E g_{f_2}(x)\, d\alpha(x) = \int_E (g_{f_1} + g_{f_2})(x)\, d\alpha(x)$$
and since $E$ is arbitrary, it follows $f \to g_f$ is linear as a map into $L^1(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$ where $\mathcal{B}(\mathbb{R}^n)$ denotes the Borel sets in $\mathbb{R}^n$. Of course each of these $g_f$ is really an equivalence class of functions. I want to pick a single representative for each of these in such a way that the map $f \to g_f(x)$ will be linear for each $x$.
Lemma 16.13 There exists a Borel set of $\alpha$ measure zero $N$ such that for all $x \notin N$,
$$\lim_{r \to 0} \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_f(x)\, d\alpha \text{ exists.}$$
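Proof: By Lemma 16.10 there exists a countable dense subset of $C_c(\mathbb{R}^m)$, $\mathcal{D}$. Then since this set is countable, it follows from the fundamental theorem of calculus, Corollary 16.9, there exists a single set of $\alpha$ measure zero $N$ such that for $x \notin N$, and $f \in \mathcal{D}$,
$$\lim_{r \to 0} \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_f(x)\, d\alpha$$
exists. Without loss of generality $N \supseteq Z$, the set of $x$ where $\alpha(B(x,r)) = 0$ for some $r > 0$. Now let $f \in C_c(\mathbb{R}^m)$ be arbitrary and denote by $f_1$ a function of $\mathcal{D}$. I want to show the above limit exists for $f$ whenever $x \notin N$. Then for $x \notin N$,
$$\frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_f(x)\, d\alpha = \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} (g_f(x) - g_{f_1}(x))\, d\alpha + \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_{f_1}(x)\, d\alpha.$$
It follows
$$\limsup_{r \to 0} \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_f(x)\, d\alpha - \liminf_{r \to 0} \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_f(x)\, d\alpha$$
$$\le \limsup_{r \to 0} \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_{f_1}(x)\, d\alpha - \liminf_{r \to 0} \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_{f_1}(x)\, d\alpha + 2 \limsup_{r \to 0} \frac{1}{\alpha(B(x,r))} \left| \int_{B(x,r)} (g_f(x) - g_{f_1}(x))\, d\alpha \right|$$
$$= 2 \limsup_{r \to 0} \frac{1}{\alpha(B(x,r))} \left| \int_{B(x,r)} (g_f(x) - g_{f_1}(x))\, d\alpha \right|$$
because the limit exists for $g_{f_1}$. Consider this last expression. By definition it equals
$$2 \limsup_{r \to 0} \frac{1}{\alpha(B(x,r))} \left| \int_{B(x,r) \times \mathbb{R}^m} (f(y) - f_1(y))\, d\mu \right| \le 2 \limsup_{r \to 0} \frac{1}{\alpha(B(x,r))} \int_{B(x,r) \times \mathbb{R}^m} \varepsilon\, d\mu = 2\varepsilon$$
provided $f_1$ is close enough to $f$. Since $\varepsilon$ is arbitrary, this proves the lemma.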
Now define for $x \notin N$
$$g_f(x) \equiv \lim_{r \to 0} \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} g_f(x)\, d\alpha.$$
On the right, $g_f(x)$ denotes any representative of $g_f$ in $L^1(\mathbb{R}^n)$, since they all give the same limit, while on the left, $g_f$ denotes the specific representative which equals the above limit for all $x \notin N$. Then for $x \in N$, simply define $g_f(x) \equiv 0$.
Then the map $f \to g_f(x)$ will be linear for each $x$ and also
$$\eta_f(E) \equiv \int_{E \times \mathbb{R}^m} f(y)\, d\mu(x,y) = \int_E g_f(x)\, d\alpha(x).$$
Thus for each $x$, $f \to g_f(x)$ is a positive linear functional defined on $C_c(\mathbb{R}^m)$. By the Riesz representation theorem, there exists a unique Radon measure $\nu_x$ representing this functional in the sense that
$$g_f(x) = \int_{\mathbb{R}^m} f(y)\, d\nu_x(y).$$
Summarizing, this has shown that for all $E$ Borel and $f \in C_c(\mathbb{R}^m)$,
$$\int_{E \times \mathbb{R}^m} f(y)\, d\mu(x,y) = \int_E g_f(x)\, d\alpha(x) = \int_E \int_{\mathbb{R}^m} f(y)\, d\nu_x(y)\, d\alpha(x). \tag{16.14}$$
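Let $f_k$ be an increasing sequence of nonnegative continuous functions whose limit is 1. Then by the monotone convergence theorem,
$$\mu(E \times \mathbb{R}^m) = \int_E \int_{\mathbb{R}^m} 1\, d\nu_x(y)\, d\alpha(x) = \int_E \nu_x(\mathbb{R}^m)\, d\alpha(x).$$
Then letting $E = B(x,r)$, $x \notin N$ given above, and using $\mu(E \times \mathbb{R}^m) = \alpha(E)$,
$$1 = \frac{1}{\alpha(B(x,r))} \int_{B(x,r)} \nu_x(\mathbb{R}^m)\, d\alpha(x).$$
Taking a limit, it follows from the fundamental theorem of calculus that for $\alpha$ a.e. $x$,
$$\nu_x(\mathbb{R}^m) = 1.$$
This was one of the things to show.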
Now consider $V$ an open set in $\mathbb{R}^m$ and let $f_k$ be an increasing sequence of functions in $C_c(\mathbb{R}^m)$ converging pointwise to $\mathcal{X}_V$. Then the monotone convergence theorem implies
$$\int_{E \times \mathbb{R}^m} \mathcal{X}_V(y)\, d\mu(x,y) = \int_E \int_{\mathbb{R}^m} \mathcal{X}_V(y)\, d\nu_x(y)\, d\alpha(x).$$
Let $\mathcal{G}$ denote the Borel sets $V$ such that the above equation holds. Then it is clear since all the measures are finite that $\mathcal{G}$ is closed with respect to countable disjoint unions and complements. In addition $\mathcal{G}$ contains the $\pi$ system consisting of the open sets. Therefore, by the Lemma on $\pi$ systems, Lemma 8.43, $\mathcal{G}$ equals the Borel sets. Thus whenever $F$ is a Borel set,
$$\int_{E \times \mathbb{R}^m} \mathcal{X}_F(y)\, d\mu(x,y) = \int_E \int_{\mathbb{R}^m} \mathcal{X}_F(y)\, d\nu_x(y)\, d\alpha(x).$$
Going to Borel simple functions it follows that whenever $f \ge 0$, $g \ge 0$ are Borel measurable,
$$\int_{\mathbb{R}^{n+m}} f(y) g(x)\, d\mu(x,y) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} f(y) g(x)\, d\nu_x(y)\, d\alpha(x). \tag{16.15}$$
Now let $\mathcal{G}$ denote the Borel sets $F$ of $\mathbb{R}^{n+m}$ such that
$$\int_{\mathbb{R}^{n+m}} \mathcal{X}_F(x,y)\, d\mu(x,y) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} \mathcal{X}_F(x,y)\, d\nu_x(y)\, d\alpha(x)$$
and that all the integrals make sense. From 16.15, this includes all Borel sets of the form $E \times F$ where $E, F$ are Borel. Again, it is clear that $\mathcal{G}$ is closed with respect to countable disjoint unions and complements while sets of the form $E \times F$ for $E, F$ Borel form a $\pi$ system. Therefore, by Lemma 8.43 again $\mathcal{G}$ contains the Borel sets. It follows from the usual approximation with simple functions that if $f \ge 0$ and is Borel measurable, then
$$\int_{\mathbb{R}^{n+m}} f(x,y)\, d\mu(x,y) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} f(x,y)\, d\nu_x(y)\, d\alpha(x)$$
with all the integrals making sense.
This proves the theorem in the case where $f$ is Borel measurable and nonnegative. It just remains to extend this to the case where $f$ is only $\mu$ measurable. However, from regularity of $\mu$ there exist Borel measurable functions $g, h$, $g \le f \le h$ such that
$$\int_{\mathbb{R}^{n+m}} f(x,y)\, d\mu(x,y) = \int_{\mathbb{R}^{n+m}} g(x,y)\, d\mu(x,y) = \int_{\mathbb{R}^{n+m}} h(x,y)\, d\mu(x,y).$$
It follows
$$\int_{\mathbb{R}^n} \int_{\mathbb{R}^m} g(x,y)\, d\nu_x(y)\, d\alpha(x) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} h(x,y)\, d\nu_x(y)\, d\alpha(x)$$
and so, since for $\alpha$ a.e. $x$, $y \to g(x,y)$ and $y \to h(x,y)$ are $\nu_x$ measurable with
$$0 = \int_{\mathbb{R}^m} (h(x,y) - g(x,y))\, d\nu_x(y)$$
and $\nu_x$ is a Radon measure, hence complete, it follows for $\alpha$ a.e. $x$, $y \to f(x,y)$ must be $\nu_x$ measurable because it is equal to $y \to g(x,y)$, $\nu_x$ a.e. Therefore, for $\alpha$ a.e. $x$, it makes sense to write
$$\int_{\mathbb{R}^m} f(x,y)\, d\nu_x(y).$$
Similar reasoning applies to the above function of $x$ being $\alpha$ measurable due to $\alpha$ being complete. It follows
$$\int_{\mathbb{R}^{n+m}} f(x,y)\, d\mu(x,y) = \int_{\mathbb{R}^{n+m}} g(x,y)\, d\mu(x,y) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} g(x,y)\, d\nu_x(y)\, d\alpha(x) = \int_{\mathbb{R}^n} \int_{\mathbb{R}^m} f(x,y)\, d\nu_x(y)\, d\alpha(x)$$
with everything making sense. This proves the theorem.
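As a sanity check on 16.11 and 16.13, here is a minimal sketch for the simplest possible case: a measure with finitely many point masses, where the slicing measures are just conditional weights. The names `slice_measure`, `integrate_mu`, `iterated_integral` and the sample data are hypothetical, chosen only for illustration; the discrete setting is an assumption and says nothing about the general proof.

```python
from collections import defaultdict
from fractions import Fraction


def slice_measure(mu):
    """Split a finite discrete measure mu on pairs (x, y) into the marginal
    alpha on x (as in 16.11) and the slicing probability measures nu[x]."""
    alpha = defaultdict(Fraction)
    for (x, _y), mass in mu.items():
        alpha[x] += mass
    nu = defaultdict(dict)
    for (x, y), mass in mu.items():
        if alpha[x] > 0:  # nu_x is only defined off the alpha-null set
            nu[x][y] = mass / alpha[x]
    return dict(alpha), dict(nu)


def integrate_mu(f, mu):
    """Integral of f with respect to mu (left side of 16.13)."""
    return sum(f(x, y) * mass for (x, y), mass in mu.items())


def iterated_integral(f, alpha, nu):
    """Inner integral against nu_x, then outer integral against alpha."""
    return sum(a * sum(f(x, y) * p for y, p in nu[x].items())
               for x, a in alpha.items() if a > 0)


# Hypothetical point masses: keys are (x, y), values are the masses.
mu = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4), (1, 5): Fraction(3, 4)}
alpha, nu = slice_measure(mu)
f = lambda x, y: x * x + 3 * y
```

Exact rational arithmetic is used so that the two sides of 16.13 can be compared with equality rather than a tolerance.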
Part III
Complex Analysis
The Complex Numbers
The reader is presumed familiar with the algebraic properties of complex numbers,
including the operation of conjugation. Here a short review of the distance in C is
presented.
The length of a complex number, referred to as the modulus of $z$ and denoted by $|z|$, is given by
$$|z| \equiv \left(x^2 + y^2\right)^{1/2} = (z\overline{z})^{1/2}.$$
Then $\mathbb{C}$ is a metric space with the distance between two complex numbers, $z$ and $w$, defined as
$$d(z,w) \equiv |z - w|.$$
This metric on $\mathbb{C}$ is the same as the usual metric of $\mathbb{R}^2$. A sequence, $z_n \to z$ if and only if $x_n \to x$ in $\mathbb{R}$ and $y_n \to y$ in $\mathbb{R}$ where $z = x + iy$ and $z_n = x_n + iy_n$. For example if $z_n = \frac{n}{n+1} + i\frac{1}{n}$, then $z_n \to 1 + 0i = 1$.
Definition 17.1 A sequence of complex numbers, $\{z_n\}$, is a Cauchy sequence if for every $\varepsilon > 0$ there exists $N$ such that $n, m > N$ implies $|z_n - z_m| < \varepsilon$.
This is the usual definition of Cauchy sequence. There are no new ideas here.
Proposition 17.2 The complex numbers with the norm just mentioned form a complete normed linear space.
Proof: Let $\{z_n\}$ be a Cauchy sequence of complex numbers with $z_n = x_n + iy_n$. Then $\{x_n\}$ and $\{y_n\}$ are Cauchy sequences of real numbers and so they converge to real numbers, $x$ and $y$ respectively. Thus $z_n = x_n + iy_n \to x + iy$. $\mathbb{C}$ is a linear space with the field of scalars equal to $\mathbb{C}$. It only remains to verify that $|\cdot|$ satisfies the axioms of a norm which are:
$$|z + w| \le |z| + |w|$$
$$|z| \ge 0 \text{ for all } z$$
$$|z| = 0 \text{ if and only if } z = 0$$
$$|\alpha z| = |\alpha|\,|z|.$$
The only one of these axioms of a norm which is not completely obvious is the first one, the triangle inequality. Let $z = x + iy$ and $w = u + iv$. Then
$$|z + w|^2 = (z + w)(\overline{z} + \overline{w}) = |z|^2 + |w|^2 + 2\operatorname{Re}(z\overline{w}) \le |z|^2 + |w|^2 + 2|z\overline{w}| = (|z| + |w|)^2$$
and this verifies the triangle inequality.
Definition 17.3 An infinite sum of complex numbers is defined as the limit of the sequence of partial sums. Thus,
$$\sum_{k=1}^\infty a_k \equiv \lim_{n \to \infty} \sum_{k=1}^n a_k.$$
Just as in the case of sums of real numbers, an infinite sum converges if and only if the sequence of partial sums is a Cauchy sequence.
From now on, when $f$ is a function of a complex variable, it will be assumed that $f$ has values in $X$, a complex Banach space. Usually in complex analysis courses, $f$ has values in $\mathbb{C}$ but there are many important theorems which don't require this so I will leave it fairly general for a while. Later the functions will have values in $\mathbb{C}$. If you are only interested in this case, think $\mathbb{C}$ whenever you see $X$.
Definition 17.4 A sequence of functions of a complex variable, $\{f_n\}$, converges uniformly to a function, $g$, for $z \in S$ if for every $\varepsilon > 0$ there exists $N_\varepsilon$ such that if $n > N_\varepsilon$, then
$$\|f_n(z) - g(z)\| < \varepsilon$$
for all $z \in S$. The infinite sum $\sum_{k=1}^\infty f_k$ converges uniformly on $S$ if the partial sums converge uniformly on $S$. Here $\|\cdot\|$ refers to the norm in $X$, the Banach space in which $f$ has its values.
The following proposition is also a routine application of the above definition. Neither the definition nor this proposition says anything new.
Proposition 17.5 A sequence of functions, $\{f_n\}$, defined on a set $S$, converges uniformly to some function, $g$, if and only if for all $\varepsilon > 0$ there exists $N_\varepsilon$ such that whenever $m, n > N_\varepsilon$,
$$\|f_n - f_m\|_\infty < \varepsilon.$$
Here $\|f\|_\infty \equiv \sup\{\|f(z)\| : z \in S\}$.
Just as in the case of functions of a real variable, one of the important theorems is the Weierstrass M test. Again, there is nothing new here. It is just a review of earlier material.
Theorem 17.6 Let $\{f_n\}$ be a sequence of complex valued functions defined on $S \subseteq \mathbb{C}$. Suppose there exists $M_n$ such that $\|f_n\|_\infty < M_n$ and $\sum M_n$ converges. Then $\sum f_n$ converges uniformly on $S$.
Proof: Let $z \in S$. Then letting $m < n$,
$$\left\| \sum_{k=1}^n f_k(z) - \sum_{k=1}^m f_k(z) \right\| \le \sum_{k=m+1}^n \|f_k(z)\| \le \sum_{k=m+1}^\infty M_k < \varepsilon$$
whenever $m$ is large enough. Therefore, the sequence of partial sums is uniformly Cauchy on $S$ and therefore, converges uniformly to $\sum_{k=1}^\infty f_k(z)$ on $S$.
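The estimate in this proof can be observed numerically. The sketch below uses the illustrative series $\sum_k z^k/k^2$ on the closed unit disk with $M_k = 2/k^2$ (so that $\|f_k\|_\infty < M_k$ strictly) and the integral-test bound $\sum_{k>m} M_k \le 2/m$. The series, the sample points and the function names are assumptions made for the example, not taken from the notes.

```python
import cmath
import math


def partial_sum(z, n):
    """n-th partial sum of sum_{k>=1} z**k / k**2."""
    return sum(z**k / k**2 for k in range(1, n + 1))


def tail_bound(m):
    """Upper bound for sum_{k>m} M_k with M_k = 2/k**2 (integral test)."""
    return 2.0 / m


# Sample points in the closed unit disk, including points on the boundary.
points = [cmath.rect(r, 2 * math.pi * j / 12) for r in (0.5, 1.0) for j in range(12)]

m, n = 50, 400
worst_gap = max(abs(partial_sum(z, n) - partial_sum(z, m)) for z in points)
```

The single number `tail_bound(m)` controls the gap between partial sums at every sample point at once, which is the content of uniform convergence.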
17.1 The Extended Complex Plane
The set of complex numbers has already been considered along with the topology of $\mathbb{C}$ which is nothing but the topology of $\mathbb{R}^2$. Thus, for $z_n = x_n + iy_n$, $z_n \to z \equiv x + iy$ if and only if $x_n \to x$ and $y_n \to y$. The norm in $\mathbb{C}$ is given by
$$|x + iy| \equiv ((x + iy)(x - iy))^{1/2} = \left(x^2 + y^2\right)^{1/2}$$
which is just the usual norm in $\mathbb{R}^2$ identifying $(x,y)$ with $x + iy$. Therefore, $\mathbb{C}$ is a complete metric space topologically like $\mathbb{R}^2$ and so the Heine Borel theorem that compact sets are those which are closed and bounded is valid. Thus, as far as topology is concerned, there is nothing new about $\mathbb{C}$.
The extended complex plane, denoted by $\widehat{\mathbb{C}}$, consists of the complex plane, $\mathbb{C}$, along with another point not in $\mathbb{C}$ known as $\infty$. For example, $\infty$ could be any point in $\mathbb{R}^3$. A sequence of complex numbers, $z_n$, converges to $\infty$ if, whenever $K$ is a compact set in $\mathbb{C}$, there exists a number, $N$, such that for all $n > N$, $z_n \notin K$. Since compact sets in $\mathbb{C}$ are closed and bounded, this is equivalent to saying that for all $R > 0$, there exists $N$ such that if $n > N$, then $z_n \notin B(0,R)$ which is the same as saying $\lim_{n \to \infty} |z_n| = \infty$ where this last symbol has the same meaning as it does in calculus.
A geometric way of understanding this in terms of more familiar objects involves
a concept known as the Riemann sphere.
Consider the unit sphere, $S^2$, given by $(z - 1)^2 + y^2 + x^2 = 1$. Define a map $\theta$ from the complex plane to the surface of this sphere as follows. Extend a line from the point, $p$, in the complex plane to the point $(0,0,2)$ on the top of this sphere and let $\theta(p)$ denote the point of this sphere which the line intersects. Define $\theta(\infty) \equiv (0,0,2)$.
[Figure: the sphere $S^2$ resting on the plane $\mathbb{C}$, with center $(0,0,1)$ and top point $(0,0,2)$; the line from $(0,0,2)$ to a point $p$ of $\mathbb{C}$ meets the sphere at $\theta(p)$.]
Then $\theta^{-1}$ is sometimes called stereographic projection. The mapping $\theta$ is clearly continuous because it takes converging sequences to converging sequences. Furthermore, it is clear that $\theta^{-1}$ is also continuous. In terms of the extended complex plane, $\widehat{\mathbb{C}}$, a sequence, $z_n$, converges to $\infty$ if and only if $\theta z_n$ converges to $(0,0,2)$ and a sequence, $z_n$, converges to $z \in \mathbb{C}$ if and only if $\theta(z_n) \to \theta(z)$.
In fact this makes it easy to define a metric on $\widehat{\mathbb{C}}$.
Definition 17.7 Let $z, w \in \widehat{\mathbb{C}}$ including possibly $w = \infty$. Then let $d(z,w) \equiv |\theta(z) - \theta(w)|$ where this last distance is the usual distance measured in $\mathbb{R}^3$.
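The map $\theta$ and the metric $d$ can be written out explicitly. Parametrizing the line from $(0,0,2)$ to $(a,b,0)$ and intersecting it with the sphere $x^2 + y^2 + (z-1)^2 = 1$ gives $\theta(a + ib) = \left(\frac{4a}{s}, \frac{4b}{s}, \frac{2(a^2+b^2)}{s}\right)$ with $s = a^2 + b^2 + 4$; this formula is derived here for illustration and is not stated in the notes. In the sketch below the point $\infty$ is represented by `math.inf`, which is an assumption of the example.

```python
import math

NORTH = (0.0, 0.0, 2.0)  # theta(infinity), the top of the sphere


def theta(p):
    """Point of the sphere x^2 + y^2 + (z-1)^2 = 1 on the line joining
    (0, 0, 2) to p in the plane; p may be a complex number or math.inf."""
    if p == math.inf:
        return NORTH
    a, b = p.real, p.imag
    s = a * a + b * b + 4.0
    return (4.0 * a / s, 4.0 * b / s, 2.0 * (a * a + b * b) / s)


def d(z, w):
    """Metric of Definition 17.7: Euclidean distance between the images."""
    return math.dist(theta(z), theta(w))


def on_sphere(point, tol=1e-12):
    x, y, z = point
    return abs(x * x + y * y + (z - 1.0) ** 2 - 1.0) < tol
```

For instance, `d(0, math.inf)` is the diameter of the sphere, and `d(z, math.inf)` becomes small exactly when `abs(z)` is large, in line with the description of convergence to $\infty$ given earlier.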
Theorem 17.8 $\left(\widehat{\mathbb{C}}, d\right)$ is a compact, hence complete metric space.
Proof: Suppose $\{z_n\}$ is a sequence in $\widehat{\mathbb{C}}$. This means $\{\theta(z_n)\}$ is a sequence in $S^2$ which is compact. Therefore, there exists a subsequence, $\{\theta z_{n_k}\}$, and a point, $z \in S^2$, such that $\theta z_{n_k} \to z$ in $S^2$ which implies immediately that $d\left(z_{n_k}, \theta^{-1} z\right) \to 0$. A compact metric space must be complete.
17.2 Exercises
1. Prove the root test for series of complex numbers. If $a_k \in \mathbb{C}$ and $r \equiv \limsup_{n \to \infty} |a_n|^{1/n}$ then
$$\sum_{k=0}^\infty a_k \ \begin{cases} \text{converges absolutely if } r < 1 \\ \text{diverges if } r > 1 \\ \text{test fails if } r = 1. \end{cases}$$
2. Does $\lim_{n \to \infty} n \left(\frac{2+i}{3}\right)^n$ exist? Tell why and find the limit if it does exist.
3. Let $A_0 = 0$ and let $A_n \equiv \sum_{k=1}^n a_k$ if $n > 0$. Prove the partial summation formula,
$$\sum_{k=p}^q a_k b_k = A_q b_q - A_{p-1} b_p + \sum_{k=p}^{q-1} A_k (b_k - b_{k+1}).$$
Now using this formula, suppose $\{b_n\}$ is a sequence of real numbers which converges to 0 and is decreasing. Determine those values of $\omega$ such that $|\omega| = 1$ and $\sum_{k=1}^\infty b_k \omega^k$ converges.
4. Let $f : U \subseteq \mathbb{C} \to \mathbb{C}$ be given by $f(x + iy) = u(x,y) + iv(x,y)$. Show $f$ is continuous on $U$ if and only if $u : U \to \mathbb{R}$ and $v : U \to \mathbb{R}$ are both continuous.
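Before proving the partial summation formula of Problem 3, it can help to test it on concrete data. The following sketch evaluates both sides for sequences indexed from 1; the function name and the sample sequences are hypothetical, and a numerical check of this kind is of course not a proof.

```python
def partial_summation_sides(a, b, p, q):
    """Return (lhs, rhs) of the partial summation formula for 1 <= p <= q.
    a and b are indexed from 1 (index 0 is padding and is never read)."""
    A = {0: 0}
    for n in range(1, q + 1):
        A[n] = A[n - 1] + a[n]  # A_n = a_1 + ... + a_n, with A_0 = 0
    lhs = sum(a[k] * b[k] for k in range(p, q + 1))
    rhs = (A[q] * b[q] - A[p - 1] * b[p]
           + sum(A[k] * (b[k] - b[k + 1]) for k in range(p, q)))
    return lhs, rhs


# Hypothetical integer data; position 0 is a placeholder.
a = [None, 1, 2, 3, 4, 5]
b = [None, 5, 3, 2, 2, 1]
lhs, rhs = partial_summation_sides(a, b, 2, 5)
```

Integer data keeps the comparison exact, so the two sides can be tested with equality.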
Riemann Stieltjes Integrals
In the theory of functions of a complex variable, the most important results are those
involving contour integration. I will base this on the notion of Riemann Stieltjes
integrals as in [12], [34], and [26]. The Riemann Stieltjes integral is a generalization
of the usual Riemann integral and requires the concept of a function of bounded
variation.
Definition 18.1 Let $\gamma : [a,b] \to \mathbb{C}$ be a function. Then $\gamma$ is of bounded variation if
$$\sup\left\{ \sum_{i=1}^n |\gamma(t_i) - \gamma(t_{i-1})| : a = t_0 < \cdots < t_n = b \right\} \equiv V(\gamma, [a,b]) < \infty$$
where the sums are taken over all possible lists, $\{a = t_0 < \cdots < t_n = b\}$.
The idea is that it makes sense to talk of the length of the curve $\gamma([a,b])$, defined as $V(\gamma, [a,b])$. For this reason, in the case that $\gamma$ is continuous, such an image of a bounded variation function is called a rectifiable curve.
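To see the supremum in Definition 18.1 at work, the sketch below computes the sums $\sum |\gamma(t_i) - \gamma(t_{i-1})|$ for the unit circle $\gamma(t) = e^{it}$ on $[0, 2\pi]$ over uniform partitions. The curve, the partitions and the function names are assumptions chosen for illustration; refining the partition can only increase the sum, and here the sums approach the circumference $2\pi$.

```python
import cmath
import math


def variation_sum(gamma, partition):
    """Sum of |gamma(t_i) - gamma(t_{i-1})| over consecutive partition points."""
    return sum(abs(gamma(t1) - gamma(t0))
               for t0, t1 in zip(partition, partition[1:]))


def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length."""
    return [a + (b - a) * j / n for j in range(n + 1)]


circle = lambda t: cmath.exp(1j * t)
coarse = variation_sum(circle, uniform_partition(0.0, 2 * math.pi, 8))
fine = variation_sum(circle, uniform_partition(0.0, 2 * math.pi, 1024))
```

Each sum is the perimeter of an inscribed polygon, which is why the values stay below $2\pi$.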
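Definition 18.2 Let $\gamma : [a,b] \to \mathbb{C}$ be of bounded variation and let $f : \gamma([a,b]) \to X$. Letting $\mathcal{P} \equiv \{t_0, \cdots, t_n\}$ where $a = t_0 < t_1 < \cdots < t_n = b$, define
$$\|\mathcal{P}\| \equiv \max\{|t_j - t_{j-1}| : j = 1, \cdots, n\}$$
and the Riemann Stieltjes sum by
$$S(\mathcal{P}) \equiv \sum_{j=1}^n f(\gamma(\tau_j)) (\gamma(t_j) - \gamma(t_{j-1}))$$
where $\tau_j \in [t_{j-1}, t_j]$. (Note this notation is a little sloppy because it does not identify the specific point, $\tau_j$, used. It is understood that this point is arbitrary.) Define $\int_\gamma f\, d\gamma$ as the unique number which satisfies the following condition. For all $\varepsilon > 0$ there exists a $\delta > 0$ such that if $\|\mathcal{P}\| \le \delta$, then
$$\left\| \int_\gamma f\, d\gamma - S(\mathcal{P}) \right\| < \varepsilon.$$
Sometimes this is written as
$$\int_\gamma f\, d\gamma \equiv \lim_{\|\mathcal{P}\| \to 0} S(\mathcal{P}).$$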
The set of points in the curve, $\gamma([a,b])$, will be denoted sometimes by $\gamma^*$. Then $\gamma^*$ is a set of points in $\mathbb{C}$ and as $t$ moves from $a$ to $b$, $\gamma(t)$ moves from $\gamma(a)$ to $\gamma(b)$. Thus $\gamma^*$ has a first point and a last point. If $\phi : [c,d] \to [a,b]$ is a continuous nondecreasing function, then $\gamma \circ \phi : [c,d] \to \mathbb{C}$ is also of bounded variation and yields the same set of points in $\mathbb{C}$ with the same first and last points.
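Theorem 18.3 Let $\phi$ and $\gamma$ be as just described. Then assuming that
$$\int_\gamma f\, d\gamma$$
exists, so does
$$\int_{\gamma \circ \phi} f\, d(\gamma \circ \phi)$$
and
$$\int_\gamma f\, d\gamma = \int_{\gamma \circ \phi} f\, d(\gamma \circ \phi). \tag{18.1}$$
Proof: Let $\varepsilon > 0$ be given. There exists $\delta > 0$ such that if $\mathcal{P}$ is a partition of $[a,b]$ such that $\|\mathcal{P}\| < \delta$, then
$$\left\| \int_\gamma f\, d\gamma - S(\mathcal{P}) \right\| < \varepsilon.$$
By continuity of $\phi$, there exists $\sigma > 0$ such that if $\mathcal{Q}$ is a partition of $[c,d]$ with $\|\mathcal{Q}\| < \sigma$, $\mathcal{Q} = \{s_0, \cdots, s_n\}$, then $|\phi(s_j) - \phi(s_{j-1})| < \delta$. Thus letting $\mathcal{P}$ denote the points in $[a,b]$ given by $\phi(s_j)$ for $s_j \in \mathcal{Q}$, it follows that $\|\mathcal{P}\| < \delta$ and so
$$\left\| \int_\gamma f\, d\gamma - \sum_{j=1}^n f(\gamma(\phi(\tau_j))) (\gamma(\phi(s_j)) - \gamma(\phi(s_{j-1}))) \right\| < \varepsilon$$
where $\tau_j \in [s_{j-1}, s_j]$. Therefore, from the definition, 18.1 holds and
$$\int_{\gamma \circ \phi} f\, d(\gamma \circ \phi)$$
exists.
This theorem shows that $\int_\gamma f\, d\gamma$ is independent of the particular $\gamma$ used in its computation to the extent that if $\phi$ is any nondecreasing function from another interval, $[c,d]$, mapping to $[a,b]$, then the same value is obtained by replacing $\gamma$ with $\gamma \circ \phi$.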
The fundamental result in this subject is the following theorem.
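Theorem 18.4 Let $f : \gamma^* \to X$ be continuous and let $\gamma : [a,b] \to \mathbb{C}$ be continuous and of bounded variation. Then $\int_\gamma f\, d\gamma$ exists. Also letting $\delta_m > 0$ be such that $|t - s| < \delta_m$ implies $\|f(\gamma(t)) - f(\gamma(s))\| < \frac{1}{m}$,
$$\left\| \int_\gamma f\, d\gamma - S(\mathcal{P}) \right\| \le \frac{2V(\gamma, [a,b])}{m}$$
whenever $\|\mathcal{P}\| < \delta_m$.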
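Proof: The function, $f \circ \gamma$, is uniformly continuous because it is defined on a compact set. Therefore, there exists a decreasing sequence of positive numbers, $\{\delta_m\}$, such that if $|s - t| < \delta_m$, then
$$\|f(\gamma(t)) - f(\gamma(s))\| < \frac{1}{m}.$$
Let
$$F_m \equiv \overline{\{S(\mathcal{P}) : \|\mathcal{P}\| < \delta_m\}}.$$
Thus $F_m$ is a closed set. (The symbol, $S(\mathcal{P})$, in the above definition, means to include all sums corresponding to $\mathcal{P}$ for any choice of $\tau_j$.) It is shown that
$$\operatorname{diam}(F_m) \le \frac{2V(\gamma, [a,b])}{m} \tag{18.2}$$
and then it will follow there exists a unique point, $I \in \cap_{m=1}^\infty F_m$. This is because $X$ is complete. It will then follow $I = \int_\gamma f(t)\, d\gamma(t)$. To verify 18.2, it suffices to verify that whenever $\mathcal{P}$ and $\mathcal{Q}$ are partitions satisfying $\|\mathcal{P}\| < \delta_m$ and $\|\mathcal{Q}\| < \delta_m$,
$$\|S(\mathcal{P}) - S(\mathcal{Q})\| \le \frac{2}{m} V(\gamma, [a,b]). \tag{18.3}$$
Suppose $\|\mathcal{P}\| < \delta_m$ and $\mathcal{Q} \supseteq \mathcal{P}$. Then also $\|\mathcal{Q}\| < \delta_m$. To begin with, suppose that $\mathcal{P} \equiv \{t_0, \cdots, t_p, \cdots, t_n\}$ and $\mathcal{Q} \equiv \{t_0, \cdots, t_{p-1}, t^*, t_p, \cdots, t_n\}$. Thus $\mathcal{Q}$ contains only one more point than $\mathcal{P}$. Letting $S(\mathcal{Q})$ and $S(\mathcal{P})$ be Riemann Stieltjes sums,
$$S(\mathcal{Q}) \equiv \sum_{j=1}^{p-1} f(\gamma(\sigma_j)) (\gamma(t_j) - \gamma(t_{j-1})) + f(\gamma(\sigma_*)) (\gamma(t^*) - \gamma(t_{p-1})) + f(\gamma(\sigma^*)) (\gamma(t_p) - \gamma(t^*)) + \sum_{j=p+1}^n f(\gamma(\sigma_j)) (\gamma(t_j) - \gamma(t_{j-1})),$$
$$S(\mathcal{P}) \equiv \sum_{j=1}^{p-1} f(\gamma(\tau_j)) (\gamma(t_j) - \gamma(t_{j-1})) + \overbrace{f(\gamma(\tau_p)) (\gamma(t^*) - \gamma(t_{p-1})) + f(\gamma(\tau_p)) (\gamma(t_p) - \gamma(t^*))}^{= f(\gamma(\tau_p)) (\gamma(t_p) - \gamma(t_{p-1}))} + \sum_{j=p+1}^n f(\gamma(\tau_j)) (\gamma(t_j) - \gamma(t_{j-1})).$$
Therefore,
$$\|S(\mathcal{P}) - S(\mathcal{Q})\| \le \sum_{j=1}^{p-1} \frac{1}{m} |\gamma(t_j) - \gamma(t_{j-1})| + \frac{1}{m} |\gamma(t^*) - \gamma(t_{p-1})| + \frac{1}{m} |\gamma(t_p) - \gamma(t^*)| + \sum_{j=p+1}^n \frac{1}{m} |\gamma(t_j) - \gamma(t_{j-1})| \le \frac{1}{m} V(\gamma, [a,b]). \tag{18.4}$$
Clearly the extreme inequalities would be valid in 18.4 if $\mathcal{Q}$ had more than one extra point. You simply do the above trick more than one time. Let $S(\mathcal{P})$ and $S(\mathcal{Q})$ be Riemann Stieltjes sums for which $\|\mathcal{P}\|$ and $\|\mathcal{Q}\|$ are less than $\delta_m$ and let $\mathcal{R} \equiv \mathcal{P} \cup \mathcal{Q}$. Then from what was just observed,
$$\|S(\mathcal{P}) - S(\mathcal{Q})\| \le \|S(\mathcal{P}) - S(\mathcal{R})\| + \|S(\mathcal{R}) - S(\mathcal{Q})\| \le \frac{2}{m} V(\gamma, [a,b])$$
and this shows 18.3 which proves 18.2. Therefore, there exists a unique point, $I \in \cap_{m=1}^\infty F_m$, which satisfies the definition of $\int_\gamma f\, d\gamma$. This proves the theorem.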
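The following theorem follows easily from the above definitions and theorem.
Theorem 18.5 Let $f \in C(\gamma^*)$ and let $\gamma : [a,b] \to \mathbb{C}$ be of bounded variation and continuous. Let
$$M \ge \max\{\|f \circ \gamma(t)\| : t \in [a,b]\}. \tag{18.5}$$
Then
$$\left\| \int_\gamma f\, d\gamma \right\| \le M V(\gamma, [a,b]). \tag{18.6}$$
Also if $\{f_n\}$ is a sequence of functions of $C(\gamma^*)$ which is converging uniformly to the function, $f$, on $\gamma^*$, then
$$\lim_{n \to \infty} \int_\gamma f_n\, d\gamma = \int_\gamma f\, d\gamma. \tag{18.7}$$
Proof: Let 18.5 hold. From the proof of the above theorem, when $\|\mathcal{P}\| < \delta_m$,
$$\left\| \int_\gamma f\, d\gamma - S(\mathcal{P}) \right\| \le \frac{2}{m} V(\gamma, [a,b])$$
and so
$$\left\| \int_\gamma f\, d\gamma \right\| \le \|S(\mathcal{P})\| + \frac{2}{m} V(\gamma, [a,b]) \le \sum_{j=1}^n M |\gamma(t_j) - \gamma(t_{j-1})| + \frac{2}{m} V(\gamma, [a,b]) \le M V(\gamma, [a,b]) + \frac{2}{m} V(\gamma, [a,b]).$$
This proves 18.6 since $m$ is arbitrary. To verify 18.7 use the above inequality to write
$$\left\| \int_\gamma f\, d\gamma - \int_\gamma f_n\, d\gamma \right\| = \left\| \int_\gamma (f - f_n)\, d\gamma(t) \right\| \le \max\{\|f \circ \gamma(t) - f_n \circ \gamma(t)\| : t \in [a,b]\}\, V(\gamma, [a,b]).$$
Since the convergence is assumed to be uniform, this proves 18.7.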
It turns out to be much easier to evaluate such integrals in the case where $\gamma$ is also $C^1([a,b])$. The following theorem about approximation will be very useful but first here is an easy lemma.
Lemma 18.6 Let $\gamma : [a,b] \to \mathbb{C}$ be in $C^1([a,b])$. Then $V(\gamma, [a,b]) < \infty$ so $\gamma$ is of bounded variation.
Proof: This follows from the following
$$\sum_{j=1}^n |\gamma(t_j) - \gamma(t_{j-1})| = \sum_{j=1}^n \left| \int_{t_{j-1}}^{t_j} \gamma'(s)\, ds \right| \le \sum_{j=1}^n \int_{t_{j-1}}^{t_j} |\gamma'(s)|\, ds \le \sum_{j=1}^n \int_{t_{j-1}}^{t_j} \|\gamma'\|_\infty\, ds = \|\gamma'\|_\infty (b - a).$$
Therefore it follows $V(\gamma, [a,b]) \le \|\gamma'\|_\infty (b - a)$. Here $\|\gamma\|_\infty = \max\{|\gamma(t)| : t \in [a,b]\}$.
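Theorem 18.7 Let $\gamma : [a,b] \to \mathbb{C}$ be continuous and of bounded variation. Let $\Omega$ be an open set containing $\gamma^*$ and let $f : \Omega \times K \to X$ be continuous for $K$ a compact set in $\mathbb{C}$, and let $\varepsilon > 0$ be given. Then there exists $\eta : [a,b] \to \mathbb{C}$ such that $\eta(a) = \gamma(a)$, $\gamma(b) = \eta(b)$, $\eta \in C^1([a,b])$, and
$$\|\gamma - \eta\| < \varepsilon, \tag{18.8}$$
$$\left\| \int_\gamma f(\cdot, z)\, d\gamma - \int_\eta f(\cdot, z)\, d\eta \right\| < \varepsilon, \tag{18.9}$$
$$V(\eta, [a,b]) \le V(\gamma, [a,b]), \tag{18.10}$$
where $\|\gamma - \eta\| \equiv \max\{|\gamma(t) - \eta(t)| : t \in [a,b]\}$.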
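Proof: Extend $\gamma$ to be defined on all $\mathbb{R}$ according to $\gamma(t) = \gamma(a)$ if $t < a$ and $\gamma(t) = \gamma(b)$ if $t > b$. Now define
$$\gamma_h(t) \equiv \frac{1}{2h} \int_{-2h + t + \frac{2h}{(b-a)}(t-a)}^{t + \frac{2h}{(b-a)}(t-a)} \gamma(s)\, ds,$$
where the integral is defined in the obvious way. That is,
$$\int_a^b \alpha(t) + i\beta(t)\, dt \equiv \int_a^b \alpha(t)\, dt + i \int_a^b \beta(t)\, dt.$$
Therefore,
$$\gamma_h(b) = \frac{1}{2h} \int_b^{b+2h} \gamma(s)\, ds = \gamma(b), \qquad \gamma_h(a) = \frac{1}{2h} \int_{a-2h}^a \gamma(s)\, ds = \gamma(a).$$
Also, because of continuity of $\gamma$ and the fundamental theorem of calculus,
$$\gamma_h'(t) = \frac{1}{2h} \left\{ \gamma\left(t + \frac{2h}{b-a}(t-a)\right) \left(1 + \frac{2h}{b-a}\right) - \gamma\left(-2h + t + \frac{2h}{b-a}(t-a)\right) \left(1 + \frac{2h}{b-a}\right) \right\}$$
and so $\gamma_h \in C^1([a,b])$. The following lemma is significant.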
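Lemma 18.8 $V(\gamma_h, [a,b]) \le V(\gamma, [a,b])$.
Proof: Let $a = t_0 < t_1 < \cdots < t_n = b$. Then using the definition of $\gamma_h$ and changing the variables to make all integrals over $[0, 2h]$,
$$\sum_{j=1}^n |\gamma_h(t_j) - \gamma_h(t_{j-1})| = \sum_{j=1}^n \left| \frac{1}{2h} \int_0^{2h} \left[ \gamma\left(s - 2h + t_j + \frac{2h}{b-a}(t_j - a)\right) - \gamma\left(s - 2h + t_{j-1} + \frac{2h}{b-a}(t_{j-1} - a)\right) \right] ds \right|$$
$$\le \frac{1}{2h} \int_0^{2h} \sum_{j=1}^n \left| \gamma\left(s - 2h + t_j + \frac{2h}{b-a}(t_j - a)\right) - \gamma\left(s - 2h + t_{j-1} + \frac{2h}{b-a}(t_{j-1} - a)\right) \right| ds.$$
For a given $s \in [0, 2h]$, the points, $s - 2h + t_j + \frac{2h}{b-a}(t_j - a)$ for $j = 1, \cdots, n$, form an increasing list of points in the interval $[a - 2h, b + 2h]$ and so the integrand is bounded above by $V(\gamma, [a - 2h, b + 2h]) = V(\gamma, [a,b])$. It follows
$$\sum_{j=1}^n |\gamma_h(t_j) - \gamma_h(t_{j-1})| \le V(\gamma, [a,b])$$
which proves the lemma.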
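With this lemma the proof of the theorem can be completed without too much trouble. Let $H$ be an open set containing $\gamma^*$ such that $\overline{H}$ is a compact subset of $\Omega$. Let $0 < \varepsilon < \operatorname{dist}\left(\gamma^*, H^C\right)$. Then there exists $\delta_1$ such that if $h < \delta_1$, then for all $t$,
$$|\gamma(t) - \gamma_h(t)| \le \frac{1}{2h} \int_{-2h + t + \frac{2h}{(b-a)}(t-a)}^{t + \frac{2h}{(b-a)}(t-a)} |\gamma(s) - \gamma(t)|\, ds < \frac{1}{2h} \int_{-2h + t + \frac{2h}{(b-a)}(t-a)}^{t + \frac{2h}{(b-a)}(t-a)} \varepsilon\, ds = \varepsilon \tag{18.11}$$
due to the uniform continuity of $\gamma$. This proves 18.8.
From 18.2 and the above lemma, there exists $\delta_2$ such that if $\|\mathcal{P}\| < \delta_2$, then for all $z \in K$,
$$\left\| \int_\gamma f(\cdot, z)\, d\gamma(t) - S(\mathcal{P}) \right\| < \frac{\varepsilon}{3}, \qquad \left\| \int_{\gamma_h} f(\cdot, z)\, d\gamma_h(t) - S_h(\mathcal{P}) \right\| < \frac{\varepsilon}{3}$$
for all $h$. Here $S(\mathcal{P})$ is a Riemann Stieltjes sum of the form
$$\sum_{i=1}^n f(\gamma(\tau_i), z) (\gamma(t_i) - \gamma(t_{i-1}))$$
and $S_h(\mathcal{P})$ is a similar Riemann Stieltjes sum taken with respect to $\gamma_h$ instead of $\gamma$. Because of 18.11, $\gamma_h(t)$ has values in $H \subseteq \Omega$. Therefore, fix the partition, $\mathcal{P}$, and choose $h$ small enough that in addition to this, the following inequality is valid for all $z \in K$:
$$\|S(\mathcal{P}) - S_h(\mathcal{P})\| < \frac{\varepsilon}{3}.$$
This is possible because of 18.11 and the uniform continuity of $f$ on $\overline{H} \times K$. It follows
$$\left\| \int_\gamma f(\cdot, z)\, d\gamma(t) - \int_{\gamma_h} f(\cdot, z)\, d\gamma_h(t) \right\| \le \left\| \int_\gamma f(\cdot, z)\, d\gamma(t) - S(\mathcal{P}) \right\| + \|S(\mathcal{P}) - S_h(\mathcal{P})\| + \left\| S_h(\mathcal{P}) - \int_{\gamma_h} f(\cdot, z)\, d\gamma_h(t) \right\| < \varepsilon.$$
Formula 18.10 follows from the lemma. This proves the theorem.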
Of course the same result is obtained without the explicit dependence of $f$ on $z$.
This is a very useful theorem because if $\gamma$ is $C^1([a,b])$, it is easy to calculate $\int_\gamma f\, d\gamma$ and the above theorem allows a reduction to the case where $\gamma$ is $C^1$. The next theorem shows how easy it is to compute these integrals in the case where $\gamma$ is $C^1$. First note that if $f$ is continuous and $\gamma \in C^1([a,b])$, then by Lemma 18.6 and the fundamental existence theorem, Theorem 18.4, $\int_\gamma f\, d\gamma$ exists.
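Theorem 18.9 If $f : \gamma^* \to X$ is continuous and $\gamma : [a,b] \to \mathbb{C}$ is in $C^1([a,b])$, then
$$\int_\gamma f\, d\gamma = \int_a^b f(\gamma(t)) \gamma'(t)\, dt. \tag{18.12}$$
Proof: Let $\mathcal{P}$ be a partition of $[a,b]$, $\mathcal{P} = \{t_0, \cdots, t_n\}$, and $\|\mathcal{P}\|$ is small enough that whenever $|t - s| < \|\mathcal{P}\|$,
$$\|f(\gamma(t)) - f(\gamma(s))\| < \varepsilon \tag{18.13}$$
and
$$\left\| \int_\gamma f\, d\gamma - \sum_{j=1}^n f(\gamma(\tau_j)) (\gamma(t_j) - \gamma(t_{j-1})) \right\| < \varepsilon.$$
Now
$$\sum_{j=1}^n f(\gamma(\tau_j)) (\gamma(t_j) - \gamma(t_{j-1})) = \int_a^b \sum_{j=1}^n f(\gamma(\tau_j)) \mathcal{X}_{[t_{j-1}, t_j]}(s) \gamma'(s)\, ds$$
where here
$$\mathcal{X}_{[a,b]}(s) \equiv \begin{cases} 1 & \text{if } s \in [a,b] \\ 0 & \text{if } s \notin [a,b]. \end{cases}$$
Also,
$$\int_a^b f(\gamma(s)) \gamma'(s)\, ds = \int_a^b \sum_{j=1}^n f(\gamma(s)) \mathcal{X}_{[t_{j-1}, t_j]}(s) \gamma'(s)\, ds$$
and thanks to 18.13,
$$\left\| \sum_{j=1}^n f(\gamma(\tau_j)) (\gamma(t_j) - \gamma(t_{j-1})) - \int_a^b f(\gamma(s)) \gamma'(s)\, ds \right\| = \left\| \int_a^b \sum_{j=1}^n f(\gamma(\tau_j)) \mathcal{X}_{[t_{j-1}, t_j]}(s) \gamma'(s)\, ds - \int_a^b \sum_{j=1}^n f(\gamma(s)) \mathcal{X}_{[t_{j-1}, t_j]}(s) \gamma'(s)\, ds \right\|$$
$$\le \sum_{j=1}^n \int_{t_{j-1}}^{t_j} \|f(\gamma(\tau_j)) - f(\gamma(s))\|\, |\gamma'(s)|\, ds \le \|\gamma'\|_\infty \sum_j \varepsilon (t_j - t_{j-1}) = \varepsilon \|\gamma'\|_\infty (b - a).$$
It follows that
$$\left\| \int_\gamma f\, d\gamma - \int_a^b f(\gamma(s)) \gamma'(s)\, ds \right\| \le \left\| \int_\gamma f\, d\gamma - \sum_{j=1}^n f(\gamma(\tau_j)) (\gamma(t_j) - \gamma(t_{j-1})) \right\| + \left\| \sum_{j=1}^n f(\gamma(\tau_j)) (\gamma(t_j) - \gamma(t_{j-1})) - \int_a^b f(\gamma(s)) \gamma'(s)\, ds \right\| \le \varepsilon \|\gamma'\|_\infty (b - a) + \varepsilon.$$
Since $\varepsilon$ is arbitrary, this verifies 18.12.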
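Formula 18.12 can be compared numerically with the Riemann Stieltjes sums of Definition 18.2. The sketch below does this for the illustrative choice $f(z) = \overline{z}$ on the unit circle $\gamma(t) = e^{it}$, $t \in [0, 2\pi]$, where $f(\gamma(t))\gamma'(t) = i$ and so the integral is $2\pi i$. The function names, the left-endpoint choice of $\tau_j$ and the midpoint quadrature are assumptions made for the example.

```python
import cmath
import math


def riemann_stieltjes_sum(f, gamma, a, b, n):
    """S(P) for the uniform partition with n subintervals, tau_j = t_{j-1}."""
    ts = [a + (b - a) * j / n for j in range(n + 1)]
    return sum(f(gamma(t0)) * (gamma(t1) - gamma(t0))
               for t0, t1 in zip(ts, ts[1:]))


def parametrized_integral(f, gamma, dgamma, a, b, n):
    """Midpoint rule for the right side of 18.12."""
    h = (b - a) / n
    mids = (a + (j + 0.5) * h for j in range(n))
    return sum(f(gamma(t)) * dgamma(t) for t in mids) * h


gamma = lambda t: cmath.exp(1j * t)
dgamma = lambda t: 1j * cmath.exp(1j * t)
f = lambda z: z.conjugate()

S = riemann_stieltjes_sum(f, gamma, 0.0, 2 * math.pi, 4000)
I = parametrized_integral(f, gamma, dgamma, 0.0, 2 * math.pi, 4000)
```

Both values approximate the same number, with the Riemann Stieltjes sum converging more slowly because it uses no derivative information.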
Definition 18.10 Let $\Omega$ be an open subset of $\mathbb{C}$ and let $\gamma : [a,b] \to \Omega$ be a continuous function with bounded variation and let $f : \Omega \to X$ be a continuous function. Then the following notation is more customary.
$$\int_\gamma f(z)\, dz \equiv \int_\gamma f\, d\gamma.$$
The expression, $\int_\gamma f(z)\, dz$, is called a contour integral and $\gamma$ is referred to as the contour. A function $f : \Omega \to X$ for $\Omega$ an open set in $\mathbb{C}$ has a primitive if there exists a function, $F$, the primitive, such that $F'(z) = f(z)$. Thus $F$ is just an antiderivative. Also if $\gamma_k : [a_k, b_k] \to \mathbb{C}$ is continuous and of bounded variation, for $k = 1, \cdots, m$ and $\gamma_k(b_k) = \gamma_{k+1}(a_{k+1})$, define
$$\int_{\sum_{k=1}^m \gamma_k} f(z)\, dz \equiv \sum_{k=1}^m \int_{\gamma_k} f(z)\, dz. \tag{18.14}$$
In addition to this, for $\gamma : [a,b] \to \mathbb{C}$, define $-\gamma : [a,b] \to \mathbb{C}$ by $-\gamma(t) \equiv \gamma(b + a - t)$. Thus $-\gamma$ simply traces out the points of $\gamma^*$ in the opposite order.
The following lemma is useful and follows quickly from Theorem 18.3.
Lemma 18.11 In the above definition, there exists a continuous bounded variation function, $\gamma$, defined on some closed interval, $[c,d]$, such that $\gamma([c,d]) = \cup_{k=1}^m \gamma_k([a_k, b_k])$ and $\gamma(c) = \gamma_1(a_1)$ while $\gamma(d) = \gamma_m(b_m)$. Furthermore,
$$\int_\gamma f(z)\, dz = \sum_{k=1}^m \int_{\gamma_k} f(z)\, dz.$$
If $\gamma : [a,b] \to \mathbb{C}$ is of bounded variation and continuous, then
$$\int_\gamma f(z)\, dz = -\int_{-\gamma} f(z)\, dz.$$
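Restating Theorem 18.7 with the new notation in the above definition,
Theorem 18.12 Let $K$ be a compact set in $\mathbb{C}$ and let $f : \Omega \times K \to X$ be continuous for $\Omega$ an open set in $\mathbb{C}$. Also let $\gamma : [a,b] \to \Omega$ be continuous with bounded variation. Then if $r > 0$ is given, there exists $\eta : [a,b] \to \Omega$ such that $\eta(a) = \gamma(a)$, $\eta(b) = \gamma(b)$, $\eta$ is $C^1([a,b])$, and
$$\left\| \int_\gamma f(z, w)\, dz - \int_\eta f(z, w)\, dz \right\| < r, \qquad \|\eta - \gamma\| < r.$$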
It will be very important to consider which functions have primitives. It turns
out, it is not enough for f to be continuous in order to possess a primitive. This is
in stark contrast to the situation for functions of a real variable in which the funda-
mental theorem of calculus will deliver a primitive for any continuous function. The
reason for the interest in such functions is the following theorem and its corollary.
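Theorem 18.13 Let $\gamma : [a,b] \to \mathbb{C}$ be continuous and of bounded variation. Also suppose $F'(z) = f(z)$ for all $z \in \Omega$, an open set containing $\gamma^*$, and $f$ is continuous on $\Omega$. Then
$$\int_\gamma f(z)\, dz = F(\gamma(b)) - F(\gamma(a)).$$
Proof: By Theorem 18.12 there exists $\eta \in C^1([a,b])$ such that $\gamma(a) = \eta(a)$, and $\gamma(b) = \eta(b)$ such that
$$\left\| \int_\gamma f(z)\, dz - \int_\eta f(z)\, dz \right\| < \varepsilon.$$
Then since $\eta$ is in $C^1([a,b])$,
$$\int_\eta f(z)\, dz = \int_a^b f(\eta(t)) \eta'(t)\, dt = \int_a^b \frac{dF(\eta(t))}{dt}\, dt = F(\eta(b)) - F(\eta(a)) = F(\gamma(b)) - F(\gamma(a)).$$
Therefore,
$$\left\| (F(\gamma(b)) - F(\gamma(a))) - \int_\gamma f(z)\, dz \right\| < \varepsilon$$
and since $\varepsilon > 0$ is arbitrary, this proves the theorem.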
Corollary 18.14 If $\gamma : [a,b] \to \mathbb{C}$ is continuous, has bounded variation, is a closed curve, $\gamma(a) = \gamma(b)$, and $\gamma^* \subseteq \Omega$ where $\Omega$ is an open set on which $F'(z) = f(z)$, then
$$\int_\gamma f(z)\, dz = 0.$$
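Theorem 18.13 says a contour integral of a function with a primitive depends only on the endpoints. A minimal numerical sketch, assuming the illustrative data $f(z) = z^2$, $F(z) = z^3/3$ and the $C^1$ curve $\gamma(t) = t + it^2$ on $[0,1]$ (none of which come from the notes), compares a midpoint-rule value of $\int_a^b f(\gamma(t))\gamma'(t)\,dt$ with $F(\gamma(b)) - F(\gamma(a))$.

```python
def contour_integral(f, gamma, dgamma, a, b, n=20000):
    """Midpoint-rule approximation of int_a^b f(gamma(t)) gamma'(t) dt."""
    h = (b - a) / n
    total = 0j
    for j in range(n):
        t = a + (j + 0.5) * h
        total += f(gamma(t)) * dgamma(t)
    return total * h


f = lambda z: z * z
F = lambda z: z ** 3 / 3          # a primitive of f
gamma = lambda t: complex(t, t * t)
dgamma = lambda t: complex(1.0, 2.0 * t)

numeric = contour_integral(f, gamma, dgamma, 0.0, 1.0)
exact = F(gamma(1.0)) - F(gamma(0.0))   # equals (1 + i)**3 / 3
```

Replacing `gamma` by any other smooth curve from $0$ to $1 + i$ should give the same value up to quadrature error, which is the path independence expressed by the theorem.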
18.1 Exercises
1. Let $\gamma : [a,b] \to \mathbb{R}$ be increasing. Show $V(\gamma, [a,b]) = \gamma(b) - \gamma(a)$.
2. Suppose $\gamma : [a,b] \to \mathbb{C}$ satisfies a Lipschitz condition, $|\gamma(t) - \gamma(s)| \le K|s - t|$. Show $\gamma$ is of bounded variation and that $V(\gamma, [a,b]) \le K|b - a|$.
3. $\gamma : [c_0, c_m] \to \mathbb{C}$ is piecewise smooth if there exist numbers, $c_k$, $k = 1, \cdots, m$, such that $c_0 < c_1 < \cdots < c_{m-1} < c_m$ such that $\gamma$ is continuous and $\gamma : [c_k, c_{k+1}] \to \mathbb{C}$ is $C^1$. Show that such piecewise smooth functions are of bounded variation and give an estimate for $V(\gamma, [c_0, c_m])$.
4. Let $\gamma : [0, 2\pi] \to \mathbb{C}$ be given by $\gamma(t) = r(\cos mt + i \sin mt)$ for $m$ an integer. Find $\int_\gamma \frac{dz}{z}$.
5. Show that if $\gamma : [a,b] \to \mathbb{C}$ then there exists an increasing function $h : [0,1] \to [a,b]$ such that $\gamma \circ h([0,1]) = \gamma^*$.
6. Let $\gamma : [a,b] \to \mathbb{C}$ be an arbitrary continuous curve having bounded variation and let $f, g$ have continuous derivatives on some open set containing $\gamma^*$. Prove the usual integration by parts formula.
$$\int_\gamma f g'\, dz = f(\gamma(b)) g(\gamma(b)) - f(\gamma(a)) g(\gamma(a)) - \int_\gamma f' g\, dz.$$
7. Let $f(z) \equiv |z|^{-(1/2)} e^{-i\frac{\theta}{2}}$ where $z = |z| e^{i\theta}$. This function is called the principal branch of $z^{-(1/2)}$. Find $\int_\gamma f(z)\, dz$ where $\gamma$ is the semicircle in the upper half plane which goes from $(1,0)$ to $(-1,0)$ in the counter clockwise direction. Next do the integral in which $\gamma$ goes in the clockwise direction along the semicircle in the lower half plane.
8. Prove an open set, $U$, is connected if and only if for every two points in $U$, there exists a $C^1$ curve having values in $U$ which joins them.
9. Let $\mathcal{P}, \mathcal{Q}$ be two partitions of $[a,b]$ with $\mathcal{P} \subseteq \mathcal{Q}$. Each of these partitions can be used to form an approximation to $V(\gamma, [a,b])$ as described above. Recall the total variation was the supremum of sums of a certain form determined by a partition. How is the sum associated with $\mathcal{P}$ related to the sum associated with $\mathcal{Q}$? Explain.
10. Consider the curve,
$$\gamma(t) = \begin{cases} t + it^2 \sin\left(\frac{1}{t}\right) & \text{if } t \in (0,1] \\ 0 & \text{if } t = 0. \end{cases}$$
Is $\gamma$ a continuous curve having bounded variation? What if the $t^2$ is replaced with $t$? Is the resulting curve continuous? Is it a bounded variation curve?
11. Suppose $\gamma : [a,b] \to \mathbb{R}$ is given by $\gamma(t) = t$. What is $\int_\gamma f(t)\, d\gamma$? Explain.
Fundamentals Of Complex
Analysis
19.1 Analytic Functions
Definition 19.1 Let $\Omega$ be an open set in $\mathbb{C}$ and let $f : \Omega \to X$. Then $f$ is analytic on $\Omega$ if for every $z \in \Omega$,
$$\lim_{h \to 0} \frac{f(z + h) - f(z)}{h} \equiv f'(z)$$
exists and is a continuous function of $z \in \Omega$. Here $h \in \mathbb{C}$.
Note that if $f$ is analytic, it must be the case that $f$ is continuous. It is more common to not include the requirement that $f'$ is continuous but it is shown later that the continuity of $f'$ follows.
What are some examples of analytic functions? In the case where $X = \mathbb{C}$, the simplest example is any polynomial. Thus
$$p(z) \equiv \sum_{k=0}^n a_k z^k$$
is an analytic function and
$$p'(z) = \sum_{k=1}^n a_k k z^{k-1}.$$
More generally, power series are analytic. This will be shown soon but first here is an important definition and a convergence theorem called the root test.
Definition 19.2 Let $\{a_k\}$ be a sequence in $X$. Then
$$\sum_{k=1}^\infty a_k \equiv \lim_{n \to \infty} \sum_{k=1}^n a_k$$
whenever this limit exists. When the limit exists, the series is said to converge.
Theorem 19.3 Consider $\sum_{k=1}^\infty a_k$ and let $\rho \equiv \limsup_{k \to \infty} \|a_k\|^{1/k}$. Then if $\rho < 1$, the series converges absolutely and if $\rho > 1$ the series diverges spectacularly in the sense that $\lim_{k \to \infty} a_k \ne 0$. If $\rho = 1$ the test fails. Also $\sum_{k=1}^\infty a_k (z - a)^k$ converges on some disk $B(a, R)$. It converges absolutely if $|z - a| < R$ and uniformly on $B(a, r_1)$ whenever $r_1 < R$. The function $f(z) = \sum_{k=1}^\infty a_k (z - a)^k$ is continuous on $B(a, R)$.
Proof: Suppose $\rho < 1$. Then there exists $r \in (\rho, 1)$. Therefore, $\|a_k\| \le r^{k}$ for all $k$ large enough and so by a comparison test, $\sum_k \|a_k\|$ converges because the partial sums are bounded above. Therefore, the partial sums of the original series form a Cauchy sequence in $X$ and so they also converge due to completeness of $X$.
Now suppose $\rho > 1$. Then letting $\rho > r > 1$, it follows $\|a_k\|^{1/k} \ge r$ infinitely often. Thus $\|a_k\| \ge r^{k}$ infinitely often. Thus there exists a subsequence for which $\|a_{n_k}\|$ converges to $\infty$. Therefore, the series cannot converge.
Now consider $\sum_{k=1}^{\infty} a_k (z-a)^{k}$. This series converges absolutely if
\[
\limsup_{k \to \infty} \|a_k\|^{1/k} |z-a| < 1
\]
which is the same as saying $|z-a| < 1/\rho$ where $\rho \equiv \limsup_{k \to \infty} \|a_k\|^{1/k}$. Let $R = 1/\rho$.
Now suppose $r_1 < R$. Consider $|z-a| \le r_1$. Then for such $z$,
\[
\|a_k\| |z-a|^{k} \le \|a_k\| r_1^{k}
\]
and
\[
\limsup_{k \to \infty} \left( \|a_k\| r_1^{k} \right)^{1/k} = \limsup_{k \to \infty} \|a_k\|^{1/k} r_1 = \frac{r_1}{R} < 1
\]
so $\sum_k \|a_k\| r_1^{k}$ converges. By the Weierstrass M test, $\sum_{k=1}^{\infty} a_k (z-a)^{k}$ converges uniformly for $|z-a| \le r_1$. Therefore, $f$ is continuous on $B(a,R)$ as claimed because it is the uniform limit of continuous functions, the partial sums of the infinite series.
What if $\rho = 0$? In this case,
\[
\limsup_{k \to \infty} \|a_k\|^{1/k} |z-a| = 0 \cdot |z-a| = 0
\]
and so $R = \infty$ and the series $\sum \|a_k\| |z-a|^{k}$ converges everywhere.
What if $\rho = \infty$? Then in this case, the series converges only at $z = a$ because if $z \neq a$,
\[
\limsup_{k \to \infty} \|a_k\|^{1/k} |z-a| = \infty.
\]
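The root test can be explored numerically. The following is a minimal sketch, not part of the original notes: the helper names `tail_sup` and `radius_estimate` and the sample coefficients $a_k = k\,2^{k}$ are assumptions chosen for illustration. It approximates $\rho$ by the largest value of $|a_k|^{1/k}$ over a finite window of large indices, which is only a crude stand-in for a limsup.

```python
import math

def tail_sup(coeff_abs, start=400, width=50):
    """Largest |a_k|**(1/k) over a window of large k; a crude proxy for the limsup rho."""
    return max(coeff_abs(k) ** (1.0 / k) for k in range(start, start + width))

def radius_estimate(coeff_abs):
    """Estimate R = 1/rho, with R = infinity when the estimate of rho is zero."""
    rho = tail_sup(coeff_abs)
    return math.inf if rho == 0 else 1.0 / rho

# Hypothetical coefficients a_k = k * 2**k, for which rho = 2 and R = 1/2 exactly.
R = radius_estimate(lambda k: k * 2.0 ** k)
print(R)
```

The estimate approaches $1/2$ slowly because $k^{1/k} \to 1$ only slowly, so the printed value is slightly below the true radius.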
Theorem 19.4 Let $f(z) \equiv \sum_{k=1}^{\infty} a_k (z-a)^{k}$ be given in Theorem 19.3 where $R > 0$. Then $f$ is analytic on $B(a,R)$. So are all its derivatives.
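Proof: Consider $g(z) = \sum_{k=1}^{\infty} a_k k (z-a)^{k-1}$ on $B(a,R)$ where $R = \rho^{-1}$ as above. Let $r_1 < r < R$. Then letting $|z-a| < r_1$ and $0 < |h| < r - r_1$, the $k=1$ terms cancel and
\[
\begin{aligned}
\left\| \frac{f(z+h) - f(z)}{h} - g(z) \right\|
&\le \sum_{k=2}^{\infty} \|a_k\| \left| \frac{(z+h-a)^{k} - (z-a)^{k}}{h} - k(z-a)^{k-1} \right| \\
&= \sum_{k=2}^{\infty} \|a_k\| \left| \frac{1}{h} \left( \sum_{i=0}^{k} \binom{k}{i} (z-a)^{k-i} h^{i} - (z-a)^{k} \right) - k(z-a)^{k-1} \right| \\
&= \sum_{k=2}^{\infty} \|a_k\| \left| \frac{1}{h} \left( \sum_{i=1}^{k} \binom{k}{i} (z-a)^{k-i} h^{i} \right) - k(z-a)^{k-1} \right| \\
&= \sum_{k=2}^{\infty} \|a_k\| \left| \sum_{i=2}^{k} \binom{k}{i} (z-a)^{k-i} h^{i-1} \right| \\
&\le |h| \sum_{k=2}^{\infty} \|a_k\| \left( \sum_{i=0}^{k-2} \binom{k}{i+2} |z-a|^{k-2-i} |h|^{i} \right) \\
&= |h| \sum_{k=2}^{\infty} \|a_k\| \left( \sum_{i=0}^{k-2} \binom{k-2}{i} \frac{k(k-1)}{(i+2)(i+1)} |z-a|^{k-2-i} |h|^{i} \right) \\
&\le |h| \sum_{k=2}^{\infty} \|a_k\| \frac{k(k-1)}{2} \left( \sum_{i=0}^{k-2} \binom{k-2}{i} |z-a|^{k-2-i} |h|^{i} \right) \\
&= |h| \sum_{k=2}^{\infty} \|a_k\| \frac{k(k-1)}{2} \left( |z-a| + |h| \right)^{k-2}
\le |h| \sum_{k=2}^{\infty} \|a_k\| \frac{k(k-1)}{2} r^{k-2}.
\end{aligned}
\]
Then
\[
\limsup_{k \to \infty} \left( \|a_k\| \frac{k(k-1)}{2} r^{k-2} \right)^{1/k} = \rho r < 1
\]
and so
\[
\left\| \frac{f(z+h) - f(z)}{h} - g(z) \right\| \le C |h|.
\]
Therefore, $g(z) = f'(z)$. Now by Theorem 19.3 it also follows that $f'$ is continuous. Since $r_1 < R$ was arbitrary, this shows that $f'(z)$ is given by the differentiated series above for $|z-a| < R$. Now a repeat of the argument shows all the derivatives of $f$ exist and are continuous on $B(a,R)$.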
19.1.1 Cauchy Riemann Equations
Next consider the very important Cauchy Riemann equations which give conditions
under which complex valued functions of a complex variable are analytic.
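Theorem 19.5 Let $\Omega$ be an open subset of $\mathbb{C}$ and let $f : \Omega \to \mathbb{C}$ be a function, such that for $z = x + iy \in \Omega$,
\[
f(z) = u(x,y) + i v(x,y).
\]
Then $f$ is analytic if and only if $u, v$ are $C^{1}(\Omega)$ and
\[
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.
\]
Furthermore,
\[
f'(z) = \frac{\partial u}{\partial x}(x,y) + i \frac{\partial v}{\partial x}(x,y).
\]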
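Proof: Suppose $f$ is analytic first. Then letting $t \in \mathbb{R}$,
\[
\begin{aligned}
f'(z) &= \lim_{t \to 0} \frac{f(z+t) - f(z)}{t} \\
&= \lim_{t \to 0} \left( \frac{u(x+t,y) + i v(x+t,y)}{t} - \frac{u(x,y) + i v(x,y)}{t} \right) \\
&= \frac{\partial u(x,y)}{\partial x} + i \frac{\partial v(x,y)}{\partial x}.
\end{aligned}
\]
But also
\[
\begin{aligned}
f'(z) &= \lim_{t \to 0} \frac{f(z+it) - f(z)}{it} \\
&= \lim_{t \to 0} \left( \frac{u(x,y+t) + i v(x,y+t)}{it} - \frac{u(x,y) + i v(x,y)}{it} \right) \\
&= \frac{1}{i} \left( \frac{\partial u(x,y)}{\partial y} + i \frac{\partial v(x,y)}{\partial y} \right)
= \frac{\partial v(x,y)}{\partial y} - i \frac{\partial u(x,y)}{\partial y}.
\end{aligned}
\]
This verifies the Cauchy Riemann equations. We are assuming that $z \to f'(z)$ is continuous. Therefore, the partial derivatives of $u$ and $v$ are also continuous. To see this, note that from the formulas for $f'(z)$ given above, and letting $z_1 = x_1 + i y_1$,
\[
\left| \frac{\partial v(x,y)}{\partial y} - \frac{\partial v(x_1,y_1)}{\partial y} \right| \le |f'(z) - f'(z_1)|,
\]
showing that $(x,y) \to \frac{\partial v(x,y)}{\partial y}$ is continuous since $(x_1,y_1) \to (x,y)$ if and only if $z_1 \to z$. The other cases are similar.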
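Now suppose the Cauchy Riemann equations hold and the functions $u$ and $v$ are $C^{1}(\Omega)$. Then letting $h = h_1 + i h_2$,
\[
f(z+h) - f(z) = u(x+h_1, y+h_2) + i v(x+h_1, y+h_2) - \left( u(x,y) + i v(x,y) \right).
\]
We know $u$ and $v$ are both differentiable and so
\[
f(z+h) - f(z) = \frac{\partial u}{\partial x}(x,y) h_1 + \frac{\partial u}{\partial y}(x,y) h_2 + i \left( \frac{\partial v}{\partial x}(x,y) h_1 + \frac{\partial v}{\partial y}(x,y) h_2 \right) + o(h).
\]
Dividing by $h$ and using the Cauchy Riemann equations,
\[
\begin{aligned}
\frac{f(z+h) - f(z)}{h}
&= \frac{\frac{\partial u}{\partial x}(x,y) h_1 + i \frac{\partial v}{\partial y}(x,y) h_2}{h}
+ i\, \frac{\frac{\partial v}{\partial x}(x,y) h_1 - i \frac{\partial u}{\partial y}(x,y) h_2}{h}
+ \frac{o(h)}{h} \\
&= \frac{\partial u}{\partial x}(x,y) \frac{h_1 + i h_2}{h} + i \frac{\partial v}{\partial x}(x,y) \frac{h_1 + i h_2}{h} + \frac{o(h)}{h}.
\end{aligned}
\]
Taking the limit as $h \to 0$,
\[
f'(z) = \frac{\partial u}{\partial x}(x,y) + i \frac{\partial v}{\partial x}(x,y).
\]
It follows from this formula and the assumption that $u, v$ are $C^{1}(\Omega)$ that $f'$ is continuous.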
It is routine to verify that all the usual rules of derivatives hold for analytic
functions. In particular, the product rule, the chain rule, and quotient rule.
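The Cauchy Riemann equations can be checked numerically at a point. The sketch below is illustrative only: the function name `cauchy_riemann_residuals`, the step size, and the test point are assumptions, and central differences merely approximate the partial derivatives. It returns $u_x - v_y$ and $u_y + v_x$, which should both be near zero for an analytic function.

```python
import cmath

def cauchy_riemann_residuals(f, x, y, h=1e-6):
    """Return (u_x - v_y, u_y + v_x) at (x, y), using central differences."""
    u = lambda a, b: f(complex(a, b)).real
    v = lambda a, b: f(complex(a, b)).imag
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    vx = (v(x + h, y) - v(x - h, y)) / (2 * h)
    vy = (v(x, y + h) - v(x, y - h)) / (2 * h)
    return ux - vy, uy + vx

# exp satisfies both equations; the conjugate map z -> x - iy does not.
res_exp = cauchy_riemann_residuals(cmath.exp, 0.3, -0.7)
res_conj = cauchy_riemann_residuals(lambda z: z.conjugate(), 0.3, -0.7)
print(res_exp, res_conj)
```

For the conjugate map, $u_x = 1$ and $v_y = -1$, so the first residual is 2 rather than 0.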
19.1.2 An Important Example
An important example of an analytic function is $e^{z} \equiv \exp(z) \equiv e^{x}(\cos y + i \sin y)$ where $z = x + iy$. You can verify that this function satisfies the Cauchy Riemann equations and that all the partial derivatives are continuous. Also from the above discussion, $(e^{z})' = e^{x} \cos(y) + i e^{x} \sin y = e^{z}$. Later I will show that $e^{z}$ is given by the usual power series. An important property of this function is that it can be used to parameterize the circle centered at $z_0$ having radius $r$.
Lemma 19.6 Let $\gamma$ denote the closed curve which is a circle of radius $r$ centered at $z_0$. Then a parameterization of this curve is $\gamma(t) = z_0 + r e^{it}$ where $t \in [0, 2\pi]$.
Proof: $|\gamma(t) - z_0|^{2} = r e^{it}\, \overline{r e^{it}} = r^{2}$. Also, you can see from the definition of the sine and cosine that the point described in this way moves counter clockwise over this circle.
19.2 Exercises
1. Verify all the usual rules of differentiation including the product and chain rules.
2. Suppose $f$ and $f' : U \to \mathbb{C}$ are analytic and $f(z) = u(x,y) + i v(x,y)$. Verify $u_{xx} + u_{yy} = 0$ and $v_{xx} + v_{yy} = 0$. This partial differential equation satisfied by the real and imaginary parts of an analytic function is called Laplace's equation. We say these functions satisfying Laplace's equation are harmonic functions. If $u$ is a harmonic function defined on $B(0,r)$ show that
\[
v(x,y) \equiv \int_{0}^{y} u_x(x,t)\, dt - \int_{0}^{x} u_y(t,0)\, dt
\]
is such that $u + iv$ is analytic.
3. Let $f : U \to \mathbb{C}$ be analytic and $f(z) = u(x,y) + i v(x,y)$. Show $u$, $v$ and $uv$ are all harmonic although it can happen that $u^{2}$ is not. Recall that a function $w$ is harmonic if $w_{xx} + w_{yy} = 0$.
4. Define a function $f(z) \equiv \overline{z} \equiv x - iy$ where $z = x + iy$. Is $f$ analytic?
5. If $f(z) = u(x,y) + i v(x,y)$ and $f$ is analytic, verify that
\[
\det \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix} = |f'(z)|^{2}.
\]
6. Show that if $u(x,y) + i v(x,y) = f(z)$ is analytic, then $\nabla u \cdot \nabla v = 0$. Recall $\nabla u(x,y) = (u_x(x,y), u_y(x,y))$.
7. Show that every polynomial is analytic.
8. If $\gamma(t) = x(t) + i y(t)$ is a $C^{1}$ curve having values in $U$, an open set of $\mathbb{C}$, and if $f : U \to \mathbb{C}$ is analytic, we can consider $f \circ \gamma$, another $C^{1}$ curve having values in $\mathbb{C}$. Also, $\gamma'(t)$ and $(f \circ \gamma)'(t)$ are complex numbers so these can be considered as vectors in $\mathbb{R}^{2}$ as follows. The complex number $x + iy$ corresponds to the vector $(x,y)$. Suppose that $\gamma$ and $\eta$ are two such $C^{1}$ curves having values in $U$ and that $\gamma(t_0) = \eta(s_0) = z$ and suppose that $f : U \to \mathbb{C}$ is analytic. Show that the angle between $(f \circ \gamma)'(t_0)$ and $(f \circ \eta)'(s_0)$ is the same as the angle between $\gamma'(t_0)$ and $\eta'(s_0)$ assuming that $f'(z) \neq 0$. Thus analytic mappings preserve angles at points where the derivative is nonzero. Such mappings are called isogonal. Hint: To make this easy to show, first observe that $(x,y) \cdot (a,b) = \frac{1}{2}\left( z \overline{w} + \overline{z} w \right)$ where $z = x + iy$ and $w = a + ib$.
9. Analytic functions are even better than what is described in Problem 8. In addition to preserving angles, they also preserve orientation. To verify this show that if $z = x + iy$ and $w = a + ib$ are two complex numbers, then $(x,y,0)$ and $(a,b,0)$ are two vectors in $\mathbb{R}^{3}$. Recall that the cross product, $(x,y,0) \times (a,b,0)$, yields a vector normal to the two given vectors such that the triple, $(x,y,0)$, $(a,b,0)$, and $(x,y,0) \times (a,b,0)$ satisfies the right hand rule and has magnitude equal to the product of the sine of the included angle times the product of the two norms of the vectors. In this case, the cross product either points in the direction of the positive $z$ axis or in the direction of the negative $z$ axis. Thus, either the vectors $(x,y,0)$, $(a,b,0)$, $\mathbf{k}$ form a right handed system or the vectors $(a,b,0)$, $(x,y,0)$, $\mathbf{k}$ form a right handed system. These are the two possible orientations. Show that in the situation of Problem 8 the orientation of $\gamma'(t_0)$, $\eta'(s_0)$, $\mathbf{k}$ is the same as the orientation of the vectors $(f \circ \gamma)'(t_0)$, $(f \circ \eta)'(s_0)$, $\mathbf{k}$. Such mappings are called conformal. If $f$ is analytic and $f'(z) \neq 0$, then we know from this problem and the above that $f$ is a conformal map. Hint: You can do this by verifying that $(f \circ \gamma)'(t_0) \times (f \circ \eta)'(s_0) = |f'(\gamma(t_0))|^{2}\, \gamma'(t_0) \times \eta'(s_0)$. To make the verification easier, you might first establish the following simple formula for the cross product where here $x + iy = z$ and $a + ib = w$.
\[
(x,y,0) \times (a,b,0) = \operatorname{Re}\left( z i \overline{w} \right) \mathbf{k}.
\]
10. Write the Cauchy Riemann equations in terms of polar coordinates. Recall the polar coordinates are given by
\[
x = r \cos \theta, \qquad y = r \sin \theta.
\]
This means, letting $u(x,y) = u(r,\theta)$, $v(x,y) = v(r,\theta)$, write the Cauchy Riemann equations in terms of $r$ and $\theta$. You should eventually show the Cauchy Riemann equations are equivalent to
\[
\frac{\partial u}{\partial r} = \frac{1}{r} \frac{\partial v}{\partial \theta}, \qquad \frac{\partial v}{\partial r} = -\frac{1}{r} \frac{\partial u}{\partial \theta}.
\]
11. Show that a real valued analytic function must be constant.

19.3 Cauchy's Formula For A Disk
The Cauchy integral formula is the most important theorem in complex analysis. It will be established for a disk in this chapter and later will be generalized to much more general situations, but the version given here will suffice to prove many interesting theorems needed in the later development of the theory. The following are some advanced calculus results.
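Lemma 19.7 Let $f : [a,b] \to \mathbb{C}$. Then $f'(t)$ exists if and only if $(\operatorname{Re} f)'(t)$ and $(\operatorname{Im} f)'(t)$ exist. Furthermore,
\[
f'(t) = (\operatorname{Re} f)'(t) + i (\operatorname{Im} f)'(t).
\]
Proof: The if part of the equivalence is obvious.
Now suppose $f'(t)$ exists. Let both $t$ and $t+h$ be contained in $[a,b]$. Then
\[
\left| \frac{\operatorname{Re} f(t+h) - \operatorname{Re} f(t)}{h} - \operatorname{Re}(f'(t)) \right| \le \left| \frac{f(t+h) - f(t)}{h} - f'(t) \right|
\]
and this converges to zero as $h \to 0$. Therefore, $(\operatorname{Re} f)'(t) = \operatorname{Re}(f'(t))$. Similarly, $(\operatorname{Im} f)'(t) = \operatorname{Im}(f'(t))$.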
Lemma 19.8 If $g : [a,b] \to \mathbb{C}$ and $g$ is continuous on $[a,b]$ and differentiable on $(a,b)$ with $g'(t) = 0$, then $g(t)$ is a constant.
Proof: From the above lemma, you can apply the mean value theorem to the real and imaginary parts of $g$.
Applying the above lemma to the components yields the following lemma.
Lemma 19.9 If $g : [a,b] \to \mathbb{C}^{n} = X$ and $g$ is continuous on $[a,b]$ and differentiable on $(a,b)$ with $g'(t) = 0$, then $g(t)$ is a constant.
If you want to have $X$ be a complex Banach space, the result is still true.
Lemma 19.10 If $g : [a,b] \to X$ and $g$ is continuous on $[a,b]$ and differentiable on $(a,b)$ with $g'(t) = 0$, then $g(t)$ is a constant.
Proof: Let $\Lambda \in X'$. Then $\Lambda g : [a,b] \to \mathbb{C}$. Therefore, from Lemma 19.8, for each $\Lambda \in X'$, $\Lambda g(s) = \Lambda g(t)$ and since $X'$ separates the points, it follows $g(s) = g(t)$ so $g$ is constant.
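Lemma 19.11 Let $\phi : [a,b] \times [c,d] \to \mathbb{R}$ be continuous and let
\[
g(t) \equiv \int_{a}^{b} \phi(s,t)\, ds. \tag{19.1}
\]
Then $g$ is continuous. If $\frac{\partial \phi}{\partial t}$ exists and is continuous on $[a,b] \times [c,d]$, then
\[
g'(t) = \int_{a}^{b} \frac{\partial \phi(s,t)}{\partial t}\, ds. \tag{19.2}
\]
Proof: The first claim follows from the uniform continuity of $\phi$ on $[a,b] \times [c,d]$, which uniform continuity results from the set being compact. To establish 19.2, let $t$ and $t+h$ be contained in $[c,d]$ and form, using the mean value theorem,
\[
\begin{aligned}
\frac{g(t+h) - g(t)}{h} &= \frac{1}{h} \int_{a}^{b} \left[ \phi(s,t+h) - \phi(s,t) \right] ds \\
&= \frac{1}{h} \int_{a}^{b} \frac{\partial \phi(s,t+\theta h)}{\partial t}\, h\, ds \\
&= \int_{a}^{b} \frac{\partial \phi(s,t+\theta h)}{\partial t}\, ds,
\end{aligned}
\]
where $\theta$ may depend on $s$ but is some number between 0 and 1. Then by the uniform continuity of $\frac{\partial \phi}{\partial t}$, it follows that 19.2 holds.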
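Formula 19.2 can be tested numerically on a concrete integrand. The sketch below is an illustration under stated assumptions: the integrand $\phi(s,t) = \sin(st)$ on $[0,1]$, the midpoint rule, and all function names are choices made here, not taken from the notes. It compares a central difference quotient of $g$ with the integral of $\partial \phi / \partial t$.

```python
import math

def midpoint_integral(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def phi(s, t):
    return math.sin(s * t)

def dphi_dt(s, t):
    return s * math.cos(s * t)

def g(t):
    """g(t) = integral over s in [0, 1] of phi(s, t)."""
    return midpoint_integral(lambda s: phi(s, t), 0.0, 1.0)

t0, step = 0.8, 1e-5
difference_quotient = (g(t0 + step) - g(t0 - step)) / (2 * step)
integral_of_partial = midpoint_integral(lambda s: dphi_dt(s, t0), 0.0, 1.0)
print(difference_quotient, integral_of_partial)
```

The two printed numbers should agree to many digits, which is what 19.2 predicts for this smooth integrand.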
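Corollary 19.12 Let $\phi : [a,b] \times [c,d] \to \mathbb{C}$ be continuous and let
\[
g(t) \equiv \int_{a}^{b} \phi(s,t)\, ds. \tag{19.3}
\]
Then $g$ is continuous. If $\frac{\partial \phi}{\partial t}$ exists and is continuous on $[a,b] \times [c,d]$, then
\[
g'(t) = \int_{a}^{b} \frac{\partial \phi(s,t)}{\partial t}\, ds. \tag{19.4}
\]
Proof: Apply Lemma 19.11 to the real and imaginary parts of $\phi$.
Applying the above corollary to the components, you can also have the same result for $\phi$ having values in $\mathbb{C}^{n}$.
Corollary 19.13 Let $\phi : [a,b] \times [c,d] \to \mathbb{C}^{n}$ be continuous and let
\[
g(t) \equiv \int_{a}^{b} \phi(s,t)\, ds. \tag{19.5}
\]
Then $g$ is continuous. If $\frac{\partial \phi}{\partial t}$ exists and is continuous on $[a,b] \times [c,d]$, then
\[
g'(t) = \int_{a}^{b} \frac{\partial \phi(s,t)}{\partial t}\, ds. \tag{19.6}
\]
If you want to consider $\phi$ having values in $X$, a complex Banach space, a similar result holds.
Corollary 19.14 Let $\phi : [a,b] \times [c,d] \to X$ be continuous and let
\[
g(t) \equiv \int_{a}^{b} \phi(s,t)\, ds. \tag{19.7}
\]
Then $g$ is continuous. If $\frac{\partial \phi}{\partial t}$ exists and is continuous on $[a,b] \times [c,d]$, then
\[
g'(t) = \int_{a}^{b} \frac{\partial \phi(s,t)}{\partial t}\, ds. \tag{19.8}
\]
Proof: Let $\Lambda \in X'$. Then $\Lambda \phi : [a,b] \times [c,d] \to \mathbb{C}$ is continuous and $\frac{\partial \Lambda \phi}{\partial t}$ exists and is continuous on $[a,b] \times [c,d]$. Therefore, from Corollary 19.12,
\[
\Lambda(g'(t)) = (\Lambda g)'(t) = \int_{a}^{b} \frac{\partial \Lambda \phi(s,t)}{\partial t}\, ds = \Lambda \int_{a}^{b} \frac{\partial \phi(s,t)}{\partial t}\, ds
\]
and since $X'$ separates the points, it follows 19.8 holds.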
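The following is Cauchy's integral formula for a disk.
Theorem 19.15 Let $f : \Omega \to X$ be analytic on the open set $\Omega$ and let
\[
\overline{B(z_0,r)} \subseteq \Omega.
\]
Let $\gamma(t) \equiv z_0 + r e^{it}$ for $t \in [0, 2\pi]$. Then if $z \in B(z_0,r)$,
\[
f(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z}\, dw. \tag{19.9}
\]
Proof: Consider for $\alpha \in [0,1]$,
\[
g(\alpha) \equiv \int_{0}^{2\pi} \frac{f\left( z + \alpha \left( z_0 + r e^{it} - z \right) \right)}{r e^{it} + z_0 - z}\, r i e^{it}\, dt.
\]
If $\alpha$ equals one, this reduces to the integral in 19.9. The idea is to show $g$ is a constant and that $g(0) = f(z) 2\pi i$. First consider the claim about $g(0)$.
\[
\begin{aligned}
g(0) &= \left( \int_{0}^{2\pi} \frac{r e^{it}}{r e^{it} + z_0 - z}\, dt \right) i f(z)
= i f(z) \left( \int_{0}^{2\pi} \frac{1}{1 - \frac{z - z_0}{r e^{it}}}\, dt \right) \\
&= i f(z) \int_{0}^{2\pi} \sum_{n=0}^{\infty} r^{-n} e^{-int} (z - z_0)^{n}\, dt
\end{aligned}
\]
because $\left| \frac{z - z_0}{r e^{it}} \right| < 1$. Since this sum converges uniformly you can interchange the sum and the integral to obtain
\[
g(0) = i f(z) \sum_{n=0}^{\infty} r^{-n} (z - z_0)^{n} \int_{0}^{2\pi} e^{-int}\, dt = 2\pi i f(z)
\]
because $\int_{0}^{2\pi} e^{-int}\, dt = 0$ if $n > 0$.
Next consider the claim that $g$ is constant. By Corollary 19.13, for $\alpha \in (0,1)$,
\[
\begin{aligned}
g'(\alpha) &= \int_{0}^{2\pi} \frac{f'\left( z + \alpha \left( z_0 + r e^{it} - z \right) \right) \left( r e^{it} + z_0 - z \right)}{r e^{it} + z_0 - z}\, r i e^{it}\, dt \\
&= \int_{0}^{2\pi} f'\left( z + \alpha \left( z_0 + r e^{it} - z \right) \right) r i e^{it}\, dt \\
&= \int_{0}^{2\pi} \frac{d}{dt} \left( f\left( z + \alpha \left( z_0 + r e^{it} - z \right) \right) \frac{1}{\alpha} \right) dt \\
&= f\left( z + \alpha \left( z_0 + r e^{i 2\pi} - z \right) \right) \frac{1}{\alpha} - f\left( z + \alpha \left( z_0 + r e^{0} - z \right) \right) \frac{1}{\alpha} = 0.
\end{aligned}
\]
Now $g$ is continuous on $[0,1]$ and $g'(t) = 0$ on $(0,1)$ so by Lemma 19.9, $g$ equals a constant. This constant can only be $g(0) = 2\pi i f(z)$. Thus,
\[
g(1) = \int_{\gamma} \frac{f(w)}{w - z}\, dw = g(0) = 2\pi i f(z).
\]
This is a very significant theorem. A few applications are given next.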
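Formula 19.9 lends itself to a quick numerical experiment. The following sketch is not from the notes: the function name `cauchy_integral`, the sample count, and the choice $f = \exp$ on the unit circle are assumptions made for illustration. It discretizes $\gamma(t) = z_0 + re^{it}$ with equally spaced parameter values and sums $f(w)/(w-z)\,\gamma'(t)\,\Delta t$.

```python
import cmath
import math

def cauchy_integral(f, z0, r, z, n=400):
    """Approximate (1/(2*pi*i)) * integral of f(w)/(w - z) over the circle |w - z0| = r."""
    total = 0j
    dt = 2 * math.pi / n
    for k in range(n):
        e = cmath.exp(1j * k * dt)
        w = z0 + r * e            # point gamma(t) on the circle
        dw = 1j * r * e * dt      # gamma'(t) dt
        total += f(w) / (w - z) * dw
    return total / (2j * math.pi)

z = 0.3 + 0.2j
approx = cauchy_integral(cmath.exp, 0j, 1.0, z)
print(approx, cmath.exp(z))
```

The boundary values alone reproduce the value of the function at the interior point, up to floating point error.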
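Theorem 19.16 Let $f : \Omega \to X$ be analytic where $\Omega$ is an open set in $\mathbb{C}$. Then $f$ has infinitely many derivatives on $\Omega$. Furthermore, for all $z \in B(z_0,r)$,
\[
f^{(n)}(z) = \frac{n!}{2\pi i} \int_{\gamma} \frac{f(w)}{(w - z)^{n+1}}\, dw \tag{19.10}
\]
where $\gamma(t) \equiv z_0 + r e^{it}$, $t \in [0, 2\pi]$ for $r$ small enough that $\overline{B(z_0,r)} \subseteq \Omega$.
Proof: Let $z \in B(z_0,r) \subseteq \Omega$ and let $\overline{B(z_0,r)} \subseteq \Omega$. Then, letting $\gamma(t) \equiv z_0 + r e^{it}$, $t \in [0, 2\pi]$, and $h$ small enough,
\[
f(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z}\, dw, \qquad f(z+h) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z - h}\, dw.
\]
Now
\[
\frac{1}{w - z - h} - \frac{1}{w - z} = \frac{h}{(-w + z + h)(-w + z)}
\]
and so
\[
\begin{aligned}
\frac{f(z+h) - f(z)}{h} &= \frac{1}{2\pi h i} \int_{\gamma} \frac{h f(w)}{(-w + z + h)(-w + z)}\, dw \\
&= \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{(-w + z + h)(-w + z)}\, dw.
\end{aligned}
\]
Now for all $h$ sufficiently small, there exists a constant $C$ independent of such $h$ such that
\[
\left| \frac{1}{(-w + z + h)(-w + z)} - \frac{1}{(-w + z)(-w + z)} \right| = \left| \frac{h}{(w - z - h)(w - z)^{2}} \right| \le C |h|
\]
and so, the integrand converges uniformly as $h \to 0$ to
\[
\frac{f(w)}{(w - z)^{2}}.
\]
Therefore, the limit as $h \to 0$ may be taken inside the integral to obtain
\[
f'(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{(w - z)^{2}}\, dw.
\]
Continuing in this way yields 19.10.
This is a very remarkable result. It shows the existence of one continuous derivative implies the existence of all derivatives, in contrast to the theory of functions of a real variable. Actually, more than what is stated in the theorem was shown. The above proof establishes the following corollary.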
Corollary 19.17 Suppose $f$ is continuous on $\partial B(z_0,r)$ and suppose that for all $z \in B(z_0,r)$,
\[
f(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z}\, dw,
\]
where $\gamma(t) \equiv z_0 + r e^{it}$, $t \in [0, 2\pi]$. Then $f$ is analytic on $B(z_0,r)$ and in fact has infinitely many derivatives on $B(z_0,r)$.
Another application is the following lemma.
Lemma 19.18 Let $\gamma(t) = z_0 + r e^{it}$, for $t \in [0, 2\pi]$, suppose $f_n \to f$ uniformly on $\overline{B(z_0,r)}$, and suppose
\[
f_n(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{f_n(w)}{w - z}\, dw \tag{19.11}
\]
for $z \in B(z_0,r)$. Then
\[
f(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z}\, dw, \tag{19.12}
\]
implying that $f$ is analytic on $B(z_0,r)$.
Proof: From 19.11 and the uniform convergence of $f_n$ to $f$ on $\gamma([0, 2\pi])$, the integrals in 19.11 converge to
\[
\frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z}\, dw.
\]
Therefore, the formula 19.12 follows.
Uniform convergence on a closed disk of the analytic functions implies the target function is also analytic. This is amazing. Think of the Weierstrass approximation theorem for polynomials. You can obtain a continuous nowhere differentiable function as the uniform limit of polynomials.
The conclusions of the following proposition have all been obtained earlier in Theorem 19.4 but they can be obtained more easily if you use the above theorem and lemmas.
Proposition 19.19 Let $\{a_n\}$ denote a sequence in $X$. Then there exists $R \in [0, \infty]$ such that
\[
\sum_{k=0}^{\infty} a_k (z - z_0)^{k}
\]
converges absolutely if $|z - z_0| < R$, diverges if $|z - z_0| > R$ and converges uniformly on $\overline{B(z_0,r)}$ for all $r < R$. Furthermore, if $R > 0$, the function
\[
f(z) \equiv \sum_{k=0}^{\infty} a_k (z - z_0)^{k}
\]
is analytic on $B(z_0,R)$.
Proof: The assertions about absolute convergence are routine from the root test if
\[
R \equiv \left( \limsup_{n \to \infty} |a_n|^{1/n} \right)^{-1}
\]
with $R = \infty$ if the quantity in parenthesis equals zero. The root test can be used to verify absolute convergence which then implies convergence by completeness of $X$.
The assertion about uniform convergence follows from the Weierstrass M test and $M_n \equiv |a_n| r^{n}$. ($\sum_{n=0}^{\infty} |a_n| r^{n} < \infty$ by the root test). It only remains to verify the assertion about $f(z)$ being analytic in the case where $R > 0$.
Let $0 < r < R$ and define $f_n(z) \equiv \sum_{k=0}^{n} a_k (z - z_0)^{k}$. Then $f_n$ is a polynomial and so it is analytic. Thus, by the Cauchy integral formula above,
\[
f_n(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{f_n(w)}{w - z}\, dw
\]
where $\gamma(t) = z_0 + r e^{it}$, for $t \in [0, 2\pi]$. By Lemma 19.18 and the first part of this proposition involving uniform convergence,
\[
f(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z}\, dw.
\]
Therefore, $f$ is analytic on $B(z_0,r)$ by Corollary 19.17. Since $r < R$ is arbitrary, this shows $f$ is analytic on $B(z_0,R)$.
This proposition shows that all functions having values in $X$ which are given as power series are analytic on their circle of convergence, the set of complex numbers $z$ such that $|z - z_0| < R$. In fact, every analytic function can be realized as a power series.
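Theorem 19.20 If $f : \Omega \to X$ is analytic and if $B(z_0,r) \subseteq \Omega$, then
\[
f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^{n} \tag{19.13}
\]
for all $|z - z_0| < r$. Furthermore,
\[
a_n = \frac{f^{(n)}(z_0)}{n!}. \tag{19.14}
\]
Proof: Consider $|z - z_0| < r$ and let $\gamma(t) = z_0 + r e^{it}$, $t \in [0, 2\pi]$. Then for $w \in \gamma([0, 2\pi])$,
\[
\left| \frac{z - z_0}{w - z_0} \right| < 1
\]
and so, by the Cauchy integral formula,
\[
\begin{aligned}
f(z) &= \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z}\, dw \\
&= \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{(w - z_0) \left( 1 - \frac{z - z_0}{w - z_0} \right)}\, dw \\
&= \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{w - z_0} \sum_{n=0}^{\infty} \left( \frac{z - z_0}{w - z_0} \right)^{n} dw.
\end{aligned}
\]
Since the series converges uniformly, you can interchange the integral and the sum to obtain
\[
f(z) = \sum_{n=0}^{\infty} \left( \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{(w - z_0)^{n+1}}\, dw \right) (z - z_0)^{n} \equiv \sum_{n=0}^{\infty} a_n (z - z_0)^{n}.
\]
By Theorem 19.16, 19.14 holds.
Note that this also implies that if a function is analytic on an open set, then all of its derivatives are also analytic. This follows from Theorem 19.4 which says that a function given by a power series has all derivatives on the disk of convergence.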
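The coefficient formula in the proof above, $a_n = \frac{1}{2\pi i} \int_{\gamma} \frac{f(w)}{(w - z_0)^{n+1}}\, dw$, can be evaluated numerically and compared with 19.14. This is a hedged sketch: the function name `taylor_coefficient`, the sample count, and the test function $\exp$ at $z_0 = 0$ are assumptions made here for illustration.

```python
import cmath
import math

def taylor_coefficient(f, z0, r, n, samples=400):
    """Approximate a_n = (1/(2*pi*i)) * integral of f(w)/(w - z0)**(n+1) over |w - z0| = r."""
    total = 0j
    dt = 2 * math.pi / samples
    for k in range(samples):
        e = cmath.exp(1j * k * dt)
        w = z0 + r * e
        total += f(w) / (w - z0) ** (n + 1) * (1j * r * e) * dt
    return total / (2j * math.pi)

# For exp at z0 = 0, formula 19.14 predicts a_n = 1/n!.
coeffs = [taylor_coefficient(cmath.exp, 0j, 1.0, n) for n in range(6)]
expected = [1.0 / math.factorial(n) for n in range(6)]
print(coeffs)
```

Each computed coefficient should match $1/n!$ to within rounding error.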
19.4 Exercises
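1. Show that if $|e_k| \le \varepsilon$, then $\left| \sum_{k=m}^{\infty} e_k \left( r^{k} - r^{k+1} \right) \right| < \varepsilon$ if $0 \le r < 1$. Hint: Let $|\theta| = 1$ and verify that
\[
\theta \sum_{k=m}^{\infty} e_k \left( r^{k} - r^{k+1} \right) = \left| \sum_{k=m}^{\infty} e_k \left( r^{k} - r^{k+1} \right) \right| = \sum_{k=m}^{\infty} \operatorname{Re}(\theta e_k) \left( r^{k} - r^{k+1} \right)
\]
where $-\varepsilon < \operatorname{Re}(\theta e_k) < \varepsilon$.
2. Abel's theorem says that if $\sum_{n=0}^{\infty} a_n (z - a)^{n}$ has radius of convergence equal to 1 and if $A = \sum_{n=0}^{\infty} a_n$, then $\lim_{r \to 1-} \sum_{n=0}^{\infty} a_n r^{n} = A$. Hint: Show $\sum_{k=0}^{\infty} a_k r^{k} = \sum_{k=0}^{\infty} A_k \left( r^{k} - r^{k+1} \right)$ where $A_k$ denotes the $k^{th}$ partial sum of $\sum a_j$. Thus
\[
\sum_{k=0}^{\infty} a_k r^{k} = \sum_{k=m+1}^{\infty} A_k \left( r^{k} - r^{k+1} \right) + \sum_{k=0}^{m} A_k \left( r^{k} - r^{k+1} \right),
\]
where $|A_k - A| < \varepsilon$ for all $k \ge m$. In the first sum, write $A_k = A + e_k$ and use Problem 1. Use this theorem to verify that $\arctan(1) = \sum_{k=0}^{\infty} (-1)^{k} \frac{1}{2k+1}$.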
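3. Find the integrals using the Cauchy integral formula.
(a) $\int_{\gamma} \frac{\sin z}{z - i}\, dz$ where $\gamma(t) = 2e^{it} : t \in [0, 2\pi]$.
(b) $\int_{\gamma} \frac{1}{z - a}\, dz$ where $\gamma(t) = a + r e^{it} : t \in [0, 2\pi]$.
(c) $\int_{\gamma} \frac{\cos z}{z^{2}}\, dz$ where $\gamma(t) = e^{it} : t \in [0, 2\pi]$.
(d) $\int_{\gamma} \frac{\log(z)}{z^{n}}\, dz$ where $\gamma(t) = 1 + \frac{1}{2} e^{it} : t \in [0, 2\pi]$ and $n = 0, 1, 2$. In this problem, $\log(z) \equiv \ln|z| + i \arg(z)$ where $\arg(z) \in (-\pi, \pi)$ and $z = |z| e^{i \arg(z)}$. Thus $e^{\log(z)} = z$ and $\log(z)' = \frac{1}{z}$.
4. Let $\gamma(t) = 4e^{it} : t \in [0, 2\pi]$ and find $\int_{\gamma} \frac{z^{2} + 4}{z(z^{2} + 1)}\, dz$.
5. Suppose $f(z) = \sum_{n=0}^{\infty} a_n z^{n}$ for all $|z| < R$. Show that then
\[
\frac{1}{2\pi} \int_{0}^{2\pi} \left| f\left( r e^{i\theta} \right) \right|^{2} d\theta = \sum_{n=0}^{\infty} |a_n|^{2} r^{2n}
\]
for all $r \in [0, R)$. Hint: Let
\[
f_n(z) \equiv \sum_{k=0}^{n} a_k z^{k},
\]
show
\[
\frac{1}{2\pi} \int_{0}^{2\pi} \left| f_n\left( r e^{i\theta} \right) \right|^{2} d\theta = \sum_{k=0}^{n} |a_k|^{2} r^{2k}
\]
and then take limits as $n \to \infty$ using uniform convergence.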
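6. The Cauchy integral formula, marvelous as it is, can actually be improved upon. The Cauchy integral formula involves representing $f$ by the values of $f$ on the boundary of the disk, $B(a,r)$. It is possible to represent $f$ by using only the values of $\operatorname{Re} f$ on the boundary. This leads to the Schwarz formula. Supply the details in the following outline.
Suppose $f$ is analytic on $|z| < R$ and
\[
f(z) = \sum_{n=0}^{\infty} a_n z^{n} \tag{19.15}
\]
with the series converging uniformly on $|z| = R$. Then letting $|w| = R$,
\[
2u(w) = f(w) + \overline{f(w)}
\]
and so
\[
2u(w) = \sum_{k=0}^{\infty} a_k w^{k} + \sum_{k=0}^{\infty} \overline{a_k} \left( \overline{w} \right)^{k}. \tag{19.16}
\]
Now letting $\gamma(t) = R e^{it}$, $t \in [0, 2\pi]$,
\[
\int_{\gamma} \frac{2u(w)}{w}\, dw = \left( a_0 + \overline{a_0} \right) \int_{\gamma} \frac{1}{w}\, dw = 2\pi i \left( a_0 + \overline{a_0} \right).
\]
Thus, multiplying 19.16 by $w^{-1}$,
\[
\frac{1}{\pi i} \int_{\gamma} \frac{u(w)}{w}\, dw = a_0 + \overline{a_0}.
\]
Now multiply 19.16 by $w^{-(n+1)}$ and integrate again to obtain
\[
a_n = \frac{1}{\pi i} \int_{\gamma} \frac{u(w)}{w^{n+1}}\, dw.
\]
Using these formulas for $a_n$ in 19.15, we can interchange the sum and the integral (Why can we do this?) to write the following for $|z| < R$.
\[
f(z) = \frac{1}{\pi i} \int_{\gamma} \frac{1}{z} \sum_{k=0}^{\infty} \left( \frac{z}{w} \right)^{k+1} u(w)\, dw - \overline{a_0}
= \frac{1}{\pi i} \int_{\gamma} \frac{u(w)}{w - z}\, dw - \overline{a_0},
\]
which is the Schwarz formula. Now $\operatorname{Re} a_0 = \frac{1}{2\pi i} \int_{\gamma} \frac{u(w)}{w}\, dw$ and $\overline{a_0} = \operatorname{Re} a_0 - i \operatorname{Im} a_0$. Therefore, we can also write the Schwarz formula as
\[
f(z) = \frac{1}{2\pi i} \int_{\gamma} \frac{u(w)(w + z)}{(w - z) w}\, dw + i \operatorname{Im} a_0. \tag{19.17}
\]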
7. Take the real parts of the second form of the Schwarz formula to derive the Poisson formula for a disk,
\[
u\left( r e^{i\alpha} \right) = \frac{1}{2\pi} \int_{0}^{2\pi} \frac{u\left( R e^{i\theta} \right) \left( R^{2} - r^{2} \right)}{R^{2} + r^{2} - 2Rr \cos(\theta - \alpha)}\, d\theta. \tag{19.18}
\]
8. Suppose that $u(w)$ is a given real continuous function defined on $\partial B(0,R)$ and define $f(z)$ for $|z| < R$ by 19.17. Show that $f$, so defined, is analytic. Explain why $u$ given in 19.18 is harmonic. Show that
\[
\lim_{r \to R-} u\left( r e^{i\alpha} \right) = u\left( R e^{i\alpha} \right).
\]
Thus $u$ is a harmonic function which approaches a given function on the boundary and is therefore a solution to the Dirichlet problem.
9. Suppose $f(z) = \sum_{k=0}^{\infty} a_k (z - z_0)^{k}$ for all $|z - z_0| < R$. Show that $f'(z) = \sum_{k=0}^{\infty} a_k k (z - z_0)^{k-1}$ for all $|z - z_0| < R$. Hint: Let $f_n(z)$ be a partial sum of $f$. Show that $f_n'$ converges uniformly to some function $g$ on $|z - z_0| \le r$ for any $r < R$. Now use the Cauchy integral formula for a function and its derivative to identify $g$ with $f'$.
10. Use Problem 9 to find the exact value of $\sum_{k=0}^{\infty} k^{2} \left( \frac{1}{3} \right)^{k}$.
11. Prove the binomial formula,
\[
(1 + z)^{\alpha} = \sum_{n=0}^{\infty} \binom{\alpha}{n} z^{n}
\]
where
\[
\binom{\alpha}{n} \equiv \frac{\alpha \cdots (\alpha - n + 1)}{n!}.
\]
Can this be used to give a proof of the binomial formula,
\[
(a + b)^{n} = \sum_{k=0}^{n} \binom{n}{k} a^{n-k} b^{k}?
\]
Explain.
12. Suppose $f$ is analytic on $B(z_0,r)$ and continuous on $\overline{B(z_0,r)}$ and $|f(z)| \le M$ on $\overline{B(z_0,r)}$. Show that then
\[
\left| f^{(n)}(z_0) \right| \le \frac{M n!}{r^{n}}.
\]
19.5 Zeros Of An Analytic Function
In this section we give a very surprising property of analytic functions which is in stark contrast to what takes place for functions of a real variable.
Definition 19.21 A region is a connected open set.
It turns out the zeros of an analytic function which is not constant on some region cannot have a limit point. This is also a good time to define the order of a zero.
Definition 19.22 Suppose $f$ is an analytic function defined near a point $\alpha$ where $f(\alpha) = 0$. Thus $\alpha$ is a zero of the function $f$. The zero is of order $m$ if $f(z) = (z - \alpha)^{m} g(z)$ where $g$ is an analytic function which is not equal to zero at $\alpha$.
Theorem 19.23 Let $\Omega$ be a connected open set (region) and let $f : \Omega \to X$ be analytic. Then the following are equivalent.
1. $f(z) = 0$ for all $z \in \Omega$
2. There exists $z_0 \in \Omega$ such that $f^{(n)}(z_0) = 0$ for all $n$.
3. There exists $z_0 \in \Omega$ which is a limit point of the set
\[
Z \equiv \{ z \in \Omega : f(z) = 0 \}.
\]
Proof: It is clear the first condition implies the second two. Suppose the third holds. Then for $z$ near $z_0$,
\[
f(z) = \sum_{n=k}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^{n}
\]
where $k \ge 1$ since $z_0$ is a zero of $f$. Suppose $k < \infty$. Then,
\[
f(z) = (z - z_0)^{k} g(z)
\]
where $g(z_0) \neq 0$. Letting $z_n \to z_0$ where $z_n \in Z$, $z_n \neq z_0$, it follows
\[
0 = (z_n - z_0)^{k} g(z_n)
\]
which implies $g(z_n) = 0$. Then by continuity of $g$, we see that $g(z_0) = 0$ also, contrary to the choice of $k$. Therefore, $k$ cannot be less than $\infty$ and so $z_0$ is a point satisfying the second condition.
Now suppose the second condition and let
\[
S \equiv \left\{ z \in \Omega : f^{(n)}(z) = 0 \text{ for all } n \right\}.
\]
It is clear that $S$ is a closed set which by assumption is nonempty. However, this set is also open. To see this, let $z \in S$. Then for all $w$ close enough to $z$,
\[
f(w) = \sum_{k=0}^{\infty} \frac{f^{(k)}(z)}{k!} (w - z)^{k} = 0.
\]
Thus $f$ is identically equal to zero near $z \in S$. Therefore, all points near $z$ are contained in $S$ also, showing that $S$ is an open set. Now $\Omega = S \cup (\Omega \setminus S)$, the union of two disjoint open sets, $S$ being nonempty. It follows the other open set, $\Omega \setminus S$, must be empty because $\Omega$ is connected. Therefore, the first condition is verified. This proves the theorem. (See the following diagram.)
[Diagram: 1.) implies 3.), 3.) implies 2.), and 2.) implies 1.)]
Note how radically different this is from the theory of functions of a real variable. Consider, for example, the function
\[
f(x) \equiv \begin{cases} x^{2} \sin\left( \frac{1}{x} \right) & \text{if } x \neq 0 \\ 0 & \text{if } x = 0 \end{cases}
\]
which has a derivative for all $x \in \mathbb{R}$ and for which 0 is a limit point of the set $Z$, even though $f$ is not identically equal to zero.
Here is a very important application called Euler's formula. Recall that
\[
e^{z} \equiv e^{x} (\cos(y) + i \sin(y)). \tag{19.19}
\]
Is it also true that $e^{z} = \sum_{k=0}^{\infty} \frac{z^{k}}{k!}$?
Theorem 19.24 (Euler's Formula) Let $z = x + iy$. Then
\[
e^{z} = \sum_{k=0}^{\infty} \frac{z^{k}}{k!}.
\]
Proof: It was already observed that $e^{z}$ given by 19.19 is analytic. So is $\exp(z) \equiv \sum_{k=0}^{\infty} \frac{z^{k}}{k!}$. In fact the power series converges for all $z \in \mathbb{C}$. Furthermore the two functions, $e^{z}$ and $\exp(z)$, agree on the real line which is a set which contains a limit point. Therefore, they agree for all values of $z \in \mathbb{C}$.
This formula shows the famous two identities,
\[
e^{i\pi} = -1 \quad \text{and} \quad e^{2\pi i} = 1.
\]
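Euler's formula can be illustrated by comparing the two definitions numerically. The sketch below is an illustration only: the function names, the number of series terms, and the test point are assumptions made here. One function evaluates $e^{x}(\cos y + i \sin y)$ and the other a partial sum of $\sum z^{k}/k!$.

```python
import math

def exp_by_definition(z):
    """e^z defined as e^x (cos y + i sin y), as in 19.19."""
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

def exp_by_series(z, terms=60):
    """Partial sum of the power series sum_{k < terms} z**k / k!."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= z / (k + 1)   # turns z**k / k! into z**(k+1) / (k+1)!
    return total

z = 1.5 - 2.0j
gap = abs(exp_by_definition(z) - exp_by_series(z))
euler_identity = exp_by_series(1j * math.pi)
print(gap, euler_identity)
```

The gap between the two definitions is at the level of rounding error, and the series evaluated at $i\pi$ is close to $-1$.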
19.6 Liouville's Theorem
The following theorem pertains to functions which are analytic on all of $\mathbb{C}$, entire functions.
Definition 19.25 A function $f : \mathbb{C} \to \mathbb{C}$ or more generally, $f : \mathbb{C} \to X$ is entire means it is analytic on $\mathbb{C}$.
Theorem 19.26 (Liouville's theorem) If $f$ is a bounded entire function having values in $X$, then $f$ is a constant.
Proof: Since $f$ is entire, pick any $z \in \mathbb{C}$ and write
\[
f'(z) = \frac{1}{2\pi i} \int_{\gamma_R} \frac{f(w)}{(w - z)^{2}}\, dw
\]
where $\gamma_R(t) = z + R e^{it}$ for $t \in [0, 2\pi]$. Therefore,
\[
\|f'(z)\| \le C \frac{1}{R}
\]
where $C$ is some constant depending on the assumed bound on $f$. Since $R$ is arbitrary, let $R \to \infty$ to obtain $f'(z) = 0$ for any $z \in \mathbb{C}$. It follows from this that $f$ is constant for if $z_j$, $j = 1, 2$ are two complex numbers, let $h(t) = f(z_1 + t(z_2 - z_1))$ for $t \in [0,1]$. Then $h'(t) = f'(z_1 + t(z_2 - z_1))(z_2 - z_1) = 0$. By Lemmas 19.8 - 19.10 $h$ is a constant on $[0,1]$ which implies $f(z_1) = f(z_2)$.
With Liouville's theorem it becomes possible to give an easy proof of the fundamental theorem of algebra. It is ironic that all the best proofs of this theorem in algebra come from the subjects of analysis or topology. Out of all the proofs that have been given of this very important theorem, the following one based on Liouville's theorem is the easiest.
Theorem 19.27 (Fundamental theorem of Algebra) Let
\[
p(z) = z^{n} + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0
\]
be a polynomial where $n \ge 1$ and each coefficient is a complex number. Then there exists $z_0 \in \mathbb{C}$ such that $p(z_0) = 0$.
Proof: Suppose not. Then $p(z)^{-1}$ is an entire function. Also
\[
|p(z)| \ge |z|^{n} - \left( |a_{n-1}| |z|^{n-1} + \cdots + |a_1| |z| + |a_0| \right)
\]
and so $\lim_{|z| \to \infty} |p(z)| = \infty$ which implies $\lim_{|z| \to \infty} \left| p(z)^{-1} \right| = 0$. It follows that, since $p(z)^{-1}$ is bounded for $z$ in any bounded set, we must have that $p(z)^{-1}$ is a bounded entire function. But then it must be constant. However since $p(z)^{-1} \to 0$ as $|z| \to \infty$, this constant can only be 0. However, $\frac{1}{p(z)}$ is never equal to zero. This proves the theorem.
19.7 The General Cauchy Integral Formula
19.7.1 The Cauchy Goursat Theorem
This section gives a fundamental theorem which is essential to the development which follows and is closely related to the question of when a function has a primitive. First of all, if you have two points in $\mathbb{C}$, $z_1$ and $z_2$, you can consider $\gamma(t) \equiv z_1 + t(z_2 - z_1)$ for $t \in [0,1]$ to obtain a continuous bounded variation curve from $z_1$ to $z_2$. More generally, if $z_1, \cdots, z_m$ are points in $\mathbb{C}$ you can obtain a continuous bounded variation curve from $z_1$ to $z_m$ which consists of first going from $z_1$ to $z_2$ and then from $z_2$ to $z_3$ and so on, till in the end one goes from $z_{m-1}$ to $z_m$. We denote this piecewise linear curve as $\gamma(z_1, \cdots, z_m)$. Now let $T$ be a triangle with vertices $z_1$, $z_2$ and $z_3$ encountered in the counter clockwise direction as shown.
[Figure: triangle with vertices $z_1$, $z_2$, $z_3$]
Denote by $\int_{\partial T} f(z)\, dz$ the expression $\int_{\gamma(z_1, z_2, z_3, z_1)} f(z)\, dz$. Consider the following picture.
[Figure: the triangle $T$ with vertices $z_1$, $z_2$, $z_3$ subdivided into four smaller triangles $T^{1}_{1}$, $T^{1}_{2}$, $T^{1}_{3}$, $T^{1}_{4}$, each oriented as indicated by arrows]
By Lemma 18.11
\[
\int_{\partial T} f(z)\, dz = \sum_{k=1}^{4} \int_{\partial T^{1}_{k}} f(z)\, dz. \tag{19.20}
\]
On the inside lines the integrals cancel as claimed in Lemma 18.11 because there are two integrals going in opposite directions for each of these inside lines.
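Theorem 19.28 (Cauchy Goursat) Let $f : \Omega \to X$ have the property that $f'(z)$ exists for all $z \in \Omega$ and let $T$ be a triangle contained in $\Omega$. Then
\[
\int_{\partial T} f(w)\, dw = 0.
\]
Proof: Suppose not. Then
\[
\left\| \int_{\partial T} f(w)\, dw \right\| = \alpha \neq 0.
\]
From 19.20 it follows
\[
\alpha \le \sum_{k=1}^{4} \left\| \int_{\partial T^{1}_{k}} f(w)\, dw \right\|
\]
and so for at least one of these $T^{1}_{k}$, denoted from now on as $T_1$,
\[
\left\| \int_{\partial T_1} f(w)\, dw \right\| \ge \frac{\alpha}{4}.
\]
Now let $T_1$ play the same role as $T$, subdivide as in the above picture, and obtain $T_2$ such that
\[
\left\| \int_{\partial T_2} f(w)\, dw \right\| \ge \frac{\alpha}{4^{2}}.
\]
Continue in this way, obtaining a sequence of triangles,
\[
T_k \supseteq T_{k+1}, \qquad \operatorname{diam}(T_k) \le \operatorname{diam}(T)\, 2^{-k},
\]
and
\[
\left\| \int_{\partial T_k} f(w)\, dw \right\| \ge \frac{\alpha}{4^{k}}.
\]
Then let $z \in \cap_{k=1}^{\infty} T_k$ and note that by assumption, $f'(z)$ exists. Therefore, for all $k$ large enough,
\[
\int_{\partial T_k} f(w)\, dw = \int_{\partial T_k} f(z) + f'(z)(w - z) + g(w)\, dw
\]
where $\|g(w)\| < \varepsilon |w - z|$. Now observe that $w \to f(z) + f'(z)(w - z)$ has a primitive, namely,
\[
F(w) = f(z) w + f'(z)(w - z)^{2}/2.
\]
Therefore, by Corollary 18.14,
\[
\int_{\partial T_k} f(w)\, dw = \int_{\partial T_k} g(w)\, dw.
\]
From the definition of the integral,
\[
\frac{\alpha}{4^{k}} \le \left\| \int_{\partial T_k} g(w)\, dw \right\| \le \varepsilon \operatorname{diam}(T_k) (\text{length of } \partial T_k) \le \varepsilon 2^{-k} (\text{length of } T) \operatorname{diam}(T)\, 2^{-k},
\]
and so
\[
\alpha \le \varepsilon (\text{length of } T) \operatorname{diam}(T).
\]
Since $\varepsilon$ is arbitrary, this shows $\alpha = 0$, a contradiction. Thus $\int_{\partial T} f(w)\, dw = 0$ as claimed.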
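The conclusion of the Cauchy Goursat theorem can be observed numerically. The sketch below is not from the notes: the function names, the particular triangle, and the use of a composite Simpson rule along each side are assumptions made for illustration. It integrates around the boundary of a triangle, once for the analytic function $\exp$ and once for $z \to \overline{z}$, which is not analytic.

```python
import cmath

def segment_integral(f, a, b, n=200):
    """Composite Simpson approximation of the integral of f along the segment from a to b (n even)."""
    h = 1.0 / n
    total = f(a) + f(b)
    for i in range(1, n):
        w = a + (b - a) * (i * h)
        total += (4 if i % 2 else 2) * f(w)
    # gamma(t) = a + t (b - a), so dw = (b - a) dt
    return total * h / 3 * (b - a)

def triangle_integral(f, z1, z2, z3):
    """Integral of f over the piecewise linear closed curve gamma(z1, z2, z3, z1)."""
    return (segment_integral(f, z1, z2)
            + segment_integral(f, z2, z3)
            + segment_integral(f, z3, z1))

# Vertices listed counter clockwise; the triangle has area 1/2.
analytic_case = triangle_integral(cmath.exp, 0j, 1 + 0j, 0.5 + 1j)
conjugate_case = triangle_integral(lambda z: z.conjugate(), 0j, 1 + 0j, 0.5 + 1j)
print(analytic_case, conjugate_case)
```

For $\exp$ the result is zero up to discretization error. For the conjugate map the integral equals $2i$ times the enclosed area, here $i$, so the hypothesis that $f'(z)$ exists cannot simply be dropped.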
This fundamental result yields the following important theorem.
Theorem 19.29 (Morera$^{1}$) Let $\Omega$ be an open set and let $f'(z)$ exist for all $z \in \Omega$. Let $D \equiv \overline{B(z_0,r)} \subseteq \Omega$. Then there exists $\varepsilon > 0$ such that $f$ has a primitive on $B(z_0, r + \varepsilon)$.
Proof: Choose $\varepsilon > 0$ small enough that $B(z_0, r + \varepsilon) \subseteq \Omega$. Then for $w \in B(z_0, r + \varepsilon)$, define
\[
F(w) \equiv \int_{\gamma(z_0, w)} f(u)\, du.
\]
Then by the Cauchy Goursat theorem, and $w \in B(z_0, r + \varepsilon)$, it follows that for $|h|$ small enough,
\[
\begin{aligned}
\frac{F(w+h) - F(w)}{h} &= \frac{1}{h} \int_{\gamma(w, w+h)} f(u)\, du \\
&= \frac{1}{h} \int_{0}^{1} f(w + th)\, h\, dt = \int_{0}^{1} f(w + th)\, dt
\end{aligned}
\]
which converges to $f(w)$ due to the continuity of $f$ at $w$. This proves the theorem.
The following is a slight generalization of the above theorem which is also referred to as Morera's theorem.
$^{1}$ Giacinto Morera 1856-1909. This theorem or one like it dates from around 1886.
Corollary 19.30 Let $\Omega$ be an open set and suppose that whenever
\[
\gamma(z_1, z_2, z_3, z_1)
\]
is a closed curve bounding a triangle $T$, which is contained in $\Omega$, and $f$ is a continuous function defined on $\Omega$, it follows that
\[
\int_{\gamma(z_1, z_2, z_3, z_1)} f(z)\, dz = 0,
\]
then $f$ is analytic on $\Omega$.
Proof: As in the proof of Morera's theorem, let $B(z_0,r) \subseteq \Omega$ and use the given condition to construct a primitive $F$ for $f$ on $B(z_0,r)$. Then $F$ is analytic and so by Theorem 19.16, it follows that $F$ and hence $f$ have infinitely many derivatives, implying that $f$ is analytic on $B(z_0,r)$. Since $z_0$ is arbitrary, this shows $f$ is analytic on $\Omega$.
19.7.2 A Redundant Assumption
Earlier in the definition of analytic, it was assumed the derivative is continuous. This assumption is redundant.
Theorem 19.31 Let $\Omega$ be an open set in $\mathbb{C}$ and suppose $f : \Omega \to X$ has the property that $f'(z)$ exists for each $z \in \Omega$. Then $f$ is analytic on $\Omega$.
Proof: Let $z_0 \in \Omega$ and let $B(z_0,r) \subseteq \Omega$. By Morera's theorem $f$ has a primitive $F$ on $B(z_0,r)$. It follows that $F$ is analytic because it has a derivative, $f$, and this derivative is continuous. Therefore, by Theorem 19.16 $F$ has infinitely many derivatives on $B(z_0,r)$ implying that $f$ also has infinitely many derivatives on $B(z_0,r)$. Thus $f$ is analytic as claimed.
It follows a function is analytic on an open set $\Omega$ if and only if $f'(z)$ exists for $z \in \Omega$. This is because it was just shown the derivative, if it exists, is automatically continuous.
The same proof used to prove Theorem 19.29 implies the following corollary.
Corollary 19.32 Let $\Omega$ be a convex open set and suppose that $f'(z)$ exists for all $z \in \Omega$. Then $f$ has a primitive on $\Omega$.
Note that this implies that if $\Omega$ is a convex open set on which $f'(z)$ exists and if $\gamma : [a,b] \to \Omega$ is a closed, continuous curve having bounded variation, then letting $F$ be a primitive of $f$, Theorem 18.13 implies
\[
\int_{\gamma} f(z)\, dz = F(\gamma(b)) - F(\gamma(a)) = 0.
\]
Notice how dierent this is from the situation of a function of a real variable! It
is possible for a function of a real variable to have a derivative everywhere and yet
the derivative can be discontinuous. A simple example is the following.
f (x)
_
x
2
sin
_
1
x
_
if x ,= 0
0 if x = 0
.
Then f
(x) exists for all x 1. Indeed, if x ,= 0, the derivative equals 2xsin

1
x
cos
1
x
which has no limit as x 0. However, from the denition of the derivative of a
function of one variable, f
(0) = 0.
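The behavior of this real example can be checked numerically. The following is a minimal sketch, not part of the original notes: the sample points $x_k = 1/(k\pi)$ and the step sizes are illustrative choices. It shows the difference quotients at 0 shrinking to 0 while the derivative away from 0 keeps jumping between values near $-1$ and $+1$.

```python
import math

def f(x):
    """f(x) = x^2 sin(1/x) for x != 0, and f(0) = 0."""
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):
    """Derivative away from zero: 2x sin(1/x) - cos(1/x)."""
    return 2 * x * math.sin(1.0 / x) - math.cos(1.0 / x)

# Difference quotients at 0: |f(h)/h| = |h sin(1/h)| <= |h|, so they tend to 0.
steps = (1e-2, 1e-4, 1e-6)
quotients = [(f(h) - f(0.0)) / h for h in steps]

# Along x_k = 1/(k*pi) the derivative is close to -cos(k*pi) = -(-1)^k,
# so it alternates between values near -1 and near +1 as x_k -> 0.
samples = [fprime(1.0 / (k * math.pi)) for k in range(1000, 1004)]

print(quotients)
print(samples)
```

The alternating values in `samples` are what "has no limit as $x \to 0$" means in practice, while `quotients` illustrates $f'(0) = 0$.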
19.7.3 Classification Of Isolated Singularities
First some notation.

Definition 19.33 Let $B'(a, r) \equiv \{z \in \mathbb{C} \text{ such that } 0 < |z - a| < r\}$. Thus this is the usual ball without the center. A function $f$ is said to have an isolated singularity at the point $a \in \mathbb{C}$ if $f$ is analytic on $B'(a, r)$ for some $r > 0$.

It turns out isolated singularities can be neatly classified into three types: removable singularities, poles, and essential singularities. The next theorem deals with the case of a removable singularity.

Definition 19.34 An isolated singularity of $f$ is said to be removable if there exists a function $g$, analytic at $a$ and near $a$, such that $f = g$ at all points near $a$.
Theorem 19.35 Let $f : B'(a, r) \to X$ be analytic. Thus $f$ has an isolated singularity at $a$. Suppose also that
$$\lim_{z \to a} f(z)(z - a) = 0.$$
Then there exists a unique analytic function $g : B(a, r) \to X$ such that $g = f$ on $B'(a, r)$. Thus the singularity at $a$ is removable.

Proof: Let $h(z) \equiv (z - a)^2 f(z)$, $h(a) \equiv 0$. Then $h$ is analytic on $B(a, r)$ because it is easy to see that $h'(a) = 0$. It follows $h$ is given by a power series,
$$h(z) = \sum_{k=2}^{\infty} a_k (z - a)^k$$
where $a_0 = a_1 = 0$ because of the observation above that $h'(a) = h(a) = 0$. It follows that for $|z - a| > 0$
$$f(z) = \sum_{k=2}^{\infty} a_k (z - a)^{k-2} \equiv g(z).$$
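As a concrete instance of this construction (an illustrative choice, not taken from the notes), take $a = 0$ and $f(z) = \sin(z)/z$ on $B'(0, r)$. The hypothesis holds because $z f(z) = \sin z \to 0$ as $z \to 0$, and the recipe in the proof can be followed term by term:

```latex
\begin{align*}
h(z) &= z^2 \cdot \frac{\sin z}{z} = z \sin z
      = \sum_{j=0}^{\infty} \frac{(-1)^j}{(2j+1)!}\, z^{2j+2}, \\
g(z) &= \frac{h(z)}{z^2}
      = \sum_{j=0}^{\infty} \frac{(-1)^j}{(2j+1)!}\, z^{2j}
      = 1 - \frac{z^2}{6} + \frac{z^4}{120} - \cdots
\end{align*}
```

Here the series for $h$ starts at the power $z^2$, as the proof predicts, and $g(0) = 1$ is the value which removes the singularity.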
What of the other case where the singularity is not removable? This situation is dealt with by the amazing Casorati Weierstrass theorem.

Theorem 19.36 (Casorati Weierstrass) Let $a$ be an isolated singularity and suppose for some $r > 0$, $f(B'(a, r))$ is not dense in $\mathbb{C}$. Then either $a$ is a removable singularity or there exist finitely many $b_1, \dots, b_M$ for some finite number $M$ such that for $z$ near $a$,
$$f(z) = g(z) + \sum_{k=1}^{M} \frac{b_k}{(z - a)^k} \tag{19.21}$$
where $g(z)$ is analytic near $a$.

Proof: Suppose $B(z_0, \delta)$ has no points of $f(B'(a, r))$. Such a ball must exist if $f(B'(a, r))$ is not dense. Then for $z \in B'(a, r)$, $|f(z) - z_0| \geq \delta > 0$. It follows from Theorem 19.35 that $\frac{1}{f(z) - z_0}$ has a removable singularity at $a$. Hence, there exists an analytic function $h$ such that for $z$ near $a$,
$$h(z) = \frac{1}{f(z) - z_0}. \tag{19.22}$$
There are two cases. First suppose $h(a) = 0$. Then $\sum_{k=1}^{\infty} a_k (z - a)^k = \frac{1}{f(z) - z_0}$ for $z$ near $a$. If all the $a_k = 0$, this would be a contradiction because then the left side would equal zero for $z$ near $a$ but the right side could not equal zero. Therefore, there is a first $m$ such that $a_m \neq 0$. Hence there exists an analytic function $k(z)$ which is not equal to zero in some ball $B(a, \varepsilon)$ such that
$$k(z)(z - a)^m = \frac{1}{f(z) - z_0}.$$
Hence, taking both sides to the $-1$ power,
$$f(z) - z_0 = \frac{1}{(z - a)^m} \sum_{k=0}^{\infty} b_k (z - a)^k$$
and so 19.21 holds.

The other case is that $h(a) \neq 0$. In this case, raise both sides of 19.22 to the $-1$ power and obtain
$$f(z) - z_0 = h(z)^{-1},$$
a function analytic near $a$. Therefore, the singularity is removable. This proves the theorem.
This theorem is the basis for the following definition which classifies isolated singularities.

Definition 19.37 Let $a$ be an isolated singularity of a complex valued function $f$. When 19.21 holds for $z$ near $a$, then $a$ is called a pole. The order of the pole in 19.21 is $M$. If for every $r > 0$, $f(B'(a, r))$ is dense in $\mathbb{C}$, then $a$ is called an essential singularity.

In terms of the above definition, isolated singularities are either removable, a pole, or essential. There are no other possibilities.
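To see what density of $f(B'(a, r))$ looks like at an essential singularity, consider the standard example $f(z) = e^{1/z}$ at $a = 0$ (this example is an addition, not taken from the notes). For any target $w \neq 0$ the points $z_k = 1/(\log w + 2\pi i k)$ satisfy $e^{1/z_k} = w$ and tend to 0, so every punctured ball around 0 is mapped onto $\mathbb{C} \setminus \{0\}$. A small sketch, with the target value and the helper name `preimages_near_zero` chosen only for illustration:

```python
import cmath

def preimages_near_zero(w, count=5):
    """Points z_k with exp(1/z_k) = w (w != 0); they shrink to 0 as k grows."""
    base = cmath.log(w)  # principal logarithm, imaginary part in (-pi, pi]
    return [1.0 / (base + 2j * cmath.pi * k) for k in range(1, count + 1)]

target = 3 - 4j
points = preimages_near_zero(target)
errors = [abs(cmath.exp(1.0 / z) - target) for z in points]
moduli = [abs(z) for z in points]

print(moduli)  # decreasing: the preimages approach the singularity
print(errors)  # tiny: each point is mapped to the target
```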
Theorem 19.38 Suppose $f : \Omega \to \mathbb{C}$ has an isolated singularity at $a \in \Omega$. Then $a$ is a pole if and only if
$$\lim_{z \to a} d(f(z), \infty) = 0$$
in $\widehat{\mathbb{C}}$.

Proof: Suppose first $f$ has a pole at $a$. Then by definition, $f(z) = g(z) + \sum_{k=1}^{M} \frac{b_k}{(z - a)^k}$ for $z$ near $a$ where $g$ is analytic. Then
$$|f(z)| \geq \frac{|b_M|}{|z - a|^M} - |g(z)| - \sum_{k=1}^{M-1} \frac{|b_k|}{|z - a|^k} = \frac{1}{|z - a|^M}\left(|b_M| - \left(|g(z)|\,|z - a|^M + \sum_{k=1}^{M-1} |b_k|\,|z - a|^{M-k}\right)\right).$$
Now $\lim_{z \to a}\left(|g(z)|\,|z - a|^M + \sum_{k=1}^{M-1} |b_k|\,|z - a|^{M-k}\right) = 0$ and so the above inequality proves $\lim_{z \to a} |f(z)| = \infty$. Referring to the diagram on Page 436, you see this is the same as saying
$$\lim_{z \to a} |\theta f(z) - (0, 0, 2)| = \lim_{z \to a} |\theta(f(z)) - \theta(\infty)| = \lim_{z \to a} d(f(z), \infty) = 0.$$
Conversely, suppose $\lim_{z \to a} d(f(z), \infty) = 0$. Then from the diagram on Page 436, it follows $\lim_{z \to a} |f(z)| = \infty$ and in particular, $a$ cannot be either removable or an essential singularity by the Casorati Weierstrass theorem, Theorem 19.36. The only case remaining is that $a$ is a pole. This proves the theorem.
Definition 19.39 Let $f : \Omega \to \mathbb{C}$ where $\Omega$ is an open subset of $\mathbb{C}$. Then $f$ is called meromorphic if all singularities are isolated and are either poles or removable and this set of singularities has no limit point. It is convenient to regard meromorphic functions as having values in $\widehat{\mathbb{C}}$ where if $a$ is a pole, $f(a) \equiv \infty$. From now on, this will be assumed when a meromorphic function is being considered.

The usefulness of the above convention about $f(a) \equiv \infty$ at a pole is made clear in the following theorem.

Theorem 19.40 Let $\Omega$ be an open subset of $\mathbb{C}$ and let $f : \Omega \to \widehat{\mathbb{C}}$ be meromorphic. Then $f$ is continuous with respect to the metric $d$ on $\widehat{\mathbb{C}}$.

Proof: Let $z_n \to z$ where $z \in \Omega$. Then if $z$ is a pole, it follows from Theorem 19.38 that
$$d(f(z_n), \infty) \equiv d(f(z_n), f(z)) \to 0.$$
If $z$ is not a pole, then $f(z_n) \to f(z)$ in $\mathbb{C}$ which implies $|\theta(f(z_n)) - \theta(f(z))| = d(f(z_n), f(z)) \to 0$. Recall that $\theta$ is continuous on $\mathbb{C}$.
19.7.4 The Cauchy Integral Formula
This section presents the general version of the Cauchy integral formula valid for arbitrary closed rectifiable curves. The key idea in this development is the notion of the winding number. This is the number, also called the index, defined in the following theorem. This winding number, along with the earlier results, especially Liouville's theorem, yields an extremely general Cauchy integral formula.

Definition 19.41 Let $\gamma : [a, b] \to \mathbb{C}$ and suppose $z \notin \gamma^*$. The winding number, $n(\gamma, z)$, is defined by
$$n(\gamma, z) \equiv \frac{1}{2\pi i} \int_\gamma \frac{dw}{w - z}.$$
The main interest is in the case where $\gamma$ is a closed curve. However, the same notation will be used for any such curve.
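Theorem 19.42 Let $\gamma : [a, b] \to \mathbb{C}$ be continuous and have bounded variation with $\gamma(a) = \gamma(b)$. Also suppose that $z \notin \gamma^*$. Define
$$n(\gamma, z) \equiv \frac{1}{2\pi i} \int_\gamma \frac{dw}{w - z}. \tag{19.23}$$
Then $n(\gamma, \cdot)$ is continuous and integer valued. Furthermore, there exists a sequence $\eta_k : [a, b] \to \mathbb{C}$ such that $\eta_k$ is $C^1([a, b])$,
$$\|\eta_k - \gamma\| < \frac{1}{k}, \quad \eta_k(a) = \eta_k(b) = \gamma(a) = \gamma(b),$$
and $n(\eta_k, z) = n(\gamma, z)$ for all $k$ large enough. Also $n(\gamma, \cdot)$ is constant on every connected component of $\mathbb{C} \setminus \gamma^*$ and equals zero on the unbounded component of $\mathbb{C} \setminus \gamma^*$.

Proof: First consider the assertion about continuity.
$$|n(\gamma, z) - n(\gamma, z_1)| \leq C \left|\int_\gamma \left(\frac{1}{w - z} - \frac{1}{w - z_1}\right) dw\right| \leq \widetilde{C}\,(\text{Length of } \gamma)\,|z_1 - z|$$
whenever $z_1$ is close enough to $z$. This proves the continuity assertion. Note this did not depend on $\gamma$ being closed.

Next it is shown that for a closed curve the winding number equals an integer. To do so, use Theorem 18.12 to obtain $\eta_k$, a function in $C^1([a, b])$ such that $z \notin \eta_k([a, b])$ for all $k$ large enough, $\eta_k(x) = \gamma(x)$ for $x = a, b$, and
$$\left|\frac{1}{2\pi i} \int_\gamma \frac{dw}{w - z} - \frac{1}{2\pi i} \int_{\eta_k} \frac{dw}{w - z}\right| < \frac{1}{k}, \quad \|\eta_k - \gamma\| < \frac{1}{k}.$$
It is shown that each of $\frac{1}{2\pi i} \int_{\eta_k} \frac{dw}{w - z}$ is an integer. To simplify the notation, write $\eta$ instead of $\eta_k$.
$$\int_\eta \frac{dw}{w - z} = \int_a^b \frac{\eta'(s)\,ds}{\eta(s) - z}.$$
Define
$$g(t) \equiv \int_a^t \frac{\eta'(s)\,ds}{\eta(s) - z}. \tag{19.24}$$
Then
$$\left(e^{-g(t)}(\eta(t) - z)\right)' = e^{-g(t)}\eta'(t) - e^{-g(t)} g'(t)(\eta(t) - z) = e^{-g(t)}\eta'(t) - e^{-g(t)}\eta'(t) = 0.$$
It follows that $e^{-g(t)}(\eta(t) - z)$ equals a constant. In particular, using the fact that $\eta(a) = \eta(b)$,
$$e^{-g(b)}(\eta(b) - z) = e^{-g(a)}(\eta(a) - z) = (\eta(a) - z) = (\eta(b) - z)$$
and so $e^{-g(b)} = 1$. This happens if and only if $-g(b) = 2m\pi i$ for some integer $m$. Therefore, 19.24 implies
$$2m\pi i = \int_a^b \frac{\eta'(s)\,ds}{\eta(s) - z} = \int_\eta \frac{dw}{w - z}$$
(after replacing $m$ by $-m$). Therefore, $\frac{1}{2\pi i} \int_{\eta_k} \frac{dw}{w - z}$ is a sequence of integers converging to $\frac{1}{2\pi i} \int_\gamma \frac{dw}{w - z} \equiv n(\gamma, z)$ and so $n(\gamma, z)$ must also be an integer and $n(\eta_k, z) = n(\gamma, z)$ for all $k$ large enough.

Since $n(\gamma, \cdot)$ is continuous and integer valued, it follows from Corollary 5.58 on Page 119 that it must be constant on every connected component of $\mathbb{C} \setminus \gamma^*$. It is clear that $n(\gamma, z)$ equals zero on the unbounded component because from the formula,
$$\lim_{z \to \infty} |n(\gamma, z)| \leq \lim_{z \to \infty} V(\gamma, [a, b]) \left(\frac{1}{|z| - c}\right) = 0$$
where $c \geq \max\{|w| : w \in \gamma^*\}$. This proves the theorem.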
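The integral defining $n(\gamma, z)$ can be approximated directly by Riemann Stieltjes sums. The following is a rough numerical sketch, not part of the notes: the circle of radius 2, the sample points, and the helper name `winding_number` are illustrative assumptions. It shows an integer value inside the curve, zero in the unbounded component, and a non-integer value for a curve which is not closed.

```python
import cmath
import math

def winding_number(curve, z, a, b, steps=20000):
    """Approximate (1/(2 pi i)) * integral of dw/(w - z) along curve on [a, b].

    Uses the Riemann-Stieltjes style sum of (increment of the curve) divided by
    (midpoint of the chord minus z); z must not lie on the curve.
    """
    total = 0j
    prev = curve(a)
    for j in range(1, steps + 1):
        cur = curve(a + (b - a) * j / steps)
        total += (cur - prev) / ((cur + prev) / 2 - z)
        prev = cur
    return total / (2j * math.pi)

def circle(t):
    return 2 * cmath.exp(1j * t)  # circle of radius 2 about the origin

inside = winding_number(circle, 0.5 + 0.5j, 0.0, 2 * math.pi)
outside = winding_number(circle, 3.0 + 0.0j, 0.0, 2 * math.pi)
half_arc = winding_number(circle, 0j, 0.0, math.pi)  # not a closed curve

print(inside, outside, half_arc)
```

The first two values are close to the integers 1 and 0, while the open half circle gives a value close to 1/2.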

Corollary 19.43 Suppose $\gamma : [a, b] \to \mathbb{C}$ is a continuous bounded variation curve and $n(\gamma, z)$ is an integer where $z \notin \gamma^*$. Then $\gamma(a) = \gamma(b)$. Also $z \to n(\gamma, z)$ for $z \notin \gamma^*$ is continuous.

Proof: Letting $\eta$ be a $C^1$ curve for which $\eta(a) = \gamma(a)$ and $\eta(b) = \gamma(b)$ and which is close enough to $\gamma$ that $n(\eta, z) = n(\gamma, z)$, the argument is similar to the above. Let
$$g(t) \equiv \int_a^t \frac{\eta'(s)\,ds}{\eta(s) - z}. \tag{19.25}$$
Then
$$\left(e^{-g(t)}(\eta(t) - z)\right)' = e^{-g(t)}\eta'(t) - e^{-g(t)} g'(t)(\eta(t) - z) = e^{-g(t)}\eta'(t) - e^{-g(t)}\eta'(t) = 0.$$
Hence
$$e^{-g(t)}(\eta(t) - z) = c \neq 0. \tag{19.26}$$
By assumption
$$g(b) = \int_\eta \frac{1}{w - z}\,dw = 2\pi i m$$
for some integer $m$. Therefore, from 19.26
$$1 = e^{2\pi m i} = \frac{\eta(b) - z}{c}.$$
Thus $c = \eta(b) - z$ and letting $t = a$ in 19.26,
$$1 = \frac{\eta(a) - z}{\eta(b) - z}$$
which shows $\eta(a) = \eta(b)$. This proves the corollary since the assertion about continuity was already observed.
It is a good idea to consider a simple case to get an idea of what the winding number is measuring. To do so, consider $\gamma : [a, b] \to \mathbb{C}$ such that $\gamma$ is continuous, closed and of bounded variation. Suppose also that $\gamma$ is one to one on $(a, b)$. Such a curve is called a simple closed curve. It can be shown that such a simple closed curve divides the plane into exactly two components, an "inside" bounded component and an "outside" unbounded component. This is called the Jordan Curve theorem or the Jordan separation theorem. This is a difficult theorem which requires some very hard topology such as homology theory or degree theory. It won't be used here beyond making reference to it. For now, it suffices to simply assume that $\gamma$ is such that this result holds. This will usually be obvious anyway. Also suppose that it is possible to change the parameter to be in $[0, 2\pi]$, in such a way that $\gamma(t) + \lambda\left(z + re^{it} - \gamma(t)\right) - z \neq 0$ for all $t \in [0, 2\pi]$ and $\lambda \in [0, 1]$. (As $t$ goes from 0 to $2\pi$ the point $\gamma(t)$ traces the curve $\gamma([0, 2\pi])$ in the counter clockwise direction.) Suppose $z \in D$, the inside of the simple closed curve, and consider the curve $\delta(t) = z + re^{it}$ for $t \in [0, 2\pi]$ where $r$ is chosen small enough that $B(z, r) \subseteq D$. Then it happens that $n(\delta, z) = n(\gamma, z)$.
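Proposition 19.44 Under the above conditions,
$$n(\delta, z) = n(\gamma, z)$$
and $n(\delta, z) = 1$.

Proof: By changing the parameter, assume that $[a, b] = [0, 2\pi]$. From Theorem 19.42 it suffices to assume also that $\gamma$ is $C^1$. Define $h_\lambda(t) \equiv \gamma(t) + \lambda\left(z + re^{it} - \gamma(t)\right)$ for $\lambda \in [0, 1]$. (This function is called a homotopy of the curves $\gamma$ and $\delta$.) Note that for each $\lambda \in [0, 1]$, $t \to h_\lambda(t)$ is a closed $C^1$ curve. Also,
$$\frac{1}{2\pi i} \int_{h_\lambda} \frac{1}{w - z}\,dw = \frac{1}{2\pi i} \int_0^{2\pi} \frac{\gamma'(t) + \lambda\left(rie^{it} - \gamma'(t)\right)}{\gamma(t) + \lambda\left(z + re^{it} - \gamma(t)\right) - z}\,dt.$$
This number is an integer and it is routine to verify that it is a continuous function of $\lambda$. When $\lambda = 0$ it equals $n(\gamma, z)$ and when $\lambda = 1$ it equals $n(\delta, z)$. Therefore, $n(\delta, z) = n(\gamma, z)$. It only remains to compute $n(\delta, z)$.
$$n(\delta, z) = \frac{1}{2\pi i} \int_0^{2\pi} \frac{rie^{it}}{re^{it}}\,dt = 1.$$
This proves the proposition.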
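The homotopy argument can be watched numerically by evaluating the parametrized integral in the proof for several values of the homotopy parameter. This is only a sketch under illustrative assumptions: the outer curve is taken to be the ellipse $3\cos t + 2i\sin t$, the point is $z = 0$, the small circle has radius $0.5$, and the names `gamma`, `dgamma` and `index_of_h` are hypothetical.

```python
import cmath
import math

def gamma(t):
    return 3 * math.cos(t) + 2j * math.sin(t)   # illustrative simple closed curve

def dgamma(t):
    return -3 * math.sin(t) + 2j * math.cos(t)  # its derivative

def index_of_h(lam, z=0j, r=0.5, steps=4000):
    """(1/(2 pi i)) * integral over [0, 2 pi] of h_lam'(t) / (h_lam(t) - z) dt."""
    total = 0j
    for j in range(steps):
        t = 2 * math.pi * j / steps
        circ = r * cmath.exp(1j * t)
        numerator = dgamma(t) + lam * (1j * circ - dgamma(t))
        denominator = gamma(t) + lam * (z + circ - gamma(t)) - z
        total += numerator / denominator
    return total * (2 * math.pi / steps) / (2j * math.pi)

# lam = 0 gives the index of the ellipse, lam = 1 the index of the small circle.
indices = [index_of_h(k / 4) for k in range(5)]
print(indices)
```

Every entry is close to 1: an integer valued quantity depending continuously on the parameter cannot change along the homotopy.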
Now if $\gamma$ was not one to one but caused the point $\gamma(t)$ to travel around $\gamma^*$ twice, you could modify the above argument to have the parameter interval $[0, 4\pi]$ and still find $n(\delta, z) = n(\gamma, z)$, only this time $n(\delta, z) = 2$. Thus the winding number is just what its name suggests. It measures the number of times the curve winds around the point. One might ask why bother with the winding number if this is all it does. The reason is that the notion of counting the number of times a curve winds around a point is rather vague. The winding number is precise. It is also the natural thing to consider in the general Cauchy integral formula presented below.

Consider a situation typified by the following picture in which $\Omega$ is the open set between the dotted curves and $\gamma_j$ are closed rectifiable curves in $\Omega$.

[Figure: oriented closed curves $\gamma_j$ in the open set $\Omega$ between the dotted curves.]

The following theorem is the general Cauchy integral formula.
Definition 19.45 Let $\{\gamma_k\}_{k=1}^n$ be continuous oriented curves having bounded variation. Then this is called a cycle if whenever $z \notin \cup_{k=1}^n \gamma_k^*$, $\sum_{k=1}^n n(\gamma_k, z)$ is an integer.

By Theorem 19.42 if each $\gamma_k$ is a closed curve, then $\{\gamma_k\}_{k=1}^n$ is a cycle.
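Theorem 19.46 Let $\Omega$ be an open subset of the plane and let $f : \Omega \to X$ be analytic. If $\gamma_k : [a_k, b_k] \to \Omega$, $k = 1, \dots, m$ are continuous curves having bounded variation such that for all $z \notin \cup_{k=1}^m \gamma_k([a_k, b_k])$,
$$\sum_{k=1}^m n(\gamma_k, z) \text{ equals an integer}$$
and for all $z \notin \Omega$,
$$\sum_{k=1}^m n(\gamma_k, z) = 0,$$
then for all $z \in \Omega \setminus \cup_{k=1}^m \gamma_k([a_k, b_k])$,
$$f(z) \sum_{k=1}^m n(\gamma_k, z) = \sum_{k=1}^m \frac{1}{2\pi i} \int_{\gamma_k} \frac{f(w)}{w - z}\,dw.$$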
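Proof: Let $\phi$ be defined on $\Omega \times \Omega$ by
$$\phi(z, w) \equiv \begin{cases} \dfrac{f(w) - f(z)}{w - z} & \text{if } w \neq z \\ f'(z) & \text{if } w = z \end{cases}$$
Then $\phi$ is analytic as a function of both $z$ and $w$ and is continuous in $\Omega \times \Omega$. This is easily seen using Theorem 19.35. Consider the case of $w \to \phi(z, w)$.
$$\lim_{w \to z} (w - z)(\phi(z, w) - \phi(z, z)) = \lim_{w \to z} (w - z)\left(\frac{f(w) - f(z)}{w - z} - f'(z)\right) = 0.$$
Thus $w \to \phi(z, w)$ has a removable singularity at $z$. The case of $z \to \phi(z, w)$ is similar.

Define
$$h(z) \equiv \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \phi(z, w)\,dw.$$
Is $h$ analytic on $\Omega$? To show this is the case, verify
$$\int_{\partial T} h(z)\,dz = 0$$
for every triangle $T$ contained in $\Omega$ and apply Corollary 19.30. To do this, use Theorem 18.12 to obtain for each $k$ a sequence of functions $\eta_{kn} \in C^1([a_k, b_k])$ such that $\eta_{kn}(x) = \gamma_k(x)$ for $x \in \{a_k, b_k\}$ and
$$\eta_{kn}([a_k, b_k]) \subseteq \Omega, \quad \|\eta_{kn} - \gamma_k\| < \frac{1}{n}, \quad \left|\int_{\eta_{kn}} \phi(z, w)\,dw - \int_{\gamma_k} \phi(z, w)\,dw\right| < \frac{1}{n}, \tag{19.27}$$
for all $z \in T$. Then applying Fubini's theorem,
$$\int_{\partial T} \int_{\eta_{kn}} \phi(z, w)\,dw\,dz = \int_{\eta_{kn}} \int_{\partial T} \phi(z, w)\,dz\,dw = 0$$
because $\phi$ is given to be analytic. By 19.27,
$$\int_{\partial T} \int_{\gamma_k} \phi(z, w)\,dw\,dz = \lim_{n \to \infty} \int_{\partial T} \int_{\eta_{kn}} \phi(z, w)\,dw\,dz = 0$$
and so $h$ is analytic on $\Omega$ as claimed.

Now let $H$ denote the set
$$H \equiv \left\{z \in \mathbb{C} \setminus \cup_{k=1}^m \gamma_k([a_k, b_k]) : \sum_{k=1}^m n(\gamma_k, z) = 0\right\}.$$
$H$ is an open set because $z \to \sum_{k=1}^m n(\gamma_k, z)$ is integer valued by assumption and continuous. Define
$$g(z) \equiv \begin{cases} h(z) & \text{if } z \in \Omega \\ \dfrac{1}{2\pi i} \sum_{k=1}^m \displaystyle\int_{\gamma_k} \frac{f(w)}{w - z}\,dw & \text{if } z \in H \end{cases} \tag{19.28}$$
Why is $g(z)$ well defined? For $z \in \Omega \cap H$, $z \notin \cup_{k=1}^m \gamma_k([a_k, b_k])$ and so
$$g(z) = \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \phi(z, w)\,dw = \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \frac{f(w) - f(z)}{w - z}\,dw$$
$$= \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \frac{f(w)}{w - z}\,dw - \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \frac{f(z)}{w - z}\,dw = \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \frac{f(w)}{w - z}\,dw$$
because $z \in H$. This shows $g(z)$ is well defined. Also, $g$ is analytic on $\Omega$ because it equals $h$ there. It is routine to verify that $g$ is analytic on $H$ also because of the second line of 19.28. By assumption, $\Omega^C \subseteq H$ because it is assumed that $\sum_k n(\gamma_k, z) = 0$ for $z \notin \Omega$ and so $\Omega \cup H = \mathbb{C}$, showing that $g$ is an entire function.

Now note that $\sum_{k=1}^m n(\gamma_k, z) = 0$ for all $z$ contained in the unbounded component of $\mathbb{C} \setminus \cup_{k=1}^m \gamma_k([a_k, b_k])$, which component contains $B(0, r)^C$ for $r$ large enough. It follows that for $|z| > r$, it must be the case that $z \in H$ and so for such $z$, the bottom description of $g(z)$ found in 19.28 is valid. Therefore, it follows
$$\lim_{|z| \to \infty} \|g(z)\| = 0$$
and so $g$ is bounded and entire. By Liouville's theorem, $g$ is a constant. Hence, from the above equation, the constant can only equal zero.

For $z \in \Omega \setminus \cup_{k=1}^m \gamma_k([a_k, b_k])$,
$$0 = h(z) = \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \phi(z, w)\,dw = \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \frac{f(w) - f(z)}{w - z}\,dw = \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \frac{f(w)}{w - z}\,dw - f(z) \sum_{k=1}^m n(\gamma_k, z).$$
This proves the theorem.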
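The formula can be sanity checked numerically in a case where the winding number is not 1. The sketch below is an illustration only, under assumptions which are not in the notes: $f = \exp$, a single curve given by the unit circle traversed twice, and the point $z = 0.3 + 0.2i$. The helper `contour_integral` is a hypothetical name for a plain Riemann sum.

```python
import cmath
import math

def contour_integral(g, curve, dcurve, a, b, steps=4000):
    """Riemann sum for the integral of g(w) dw along w = curve(t), a <= t <= b."""
    h = (b - a) / steps
    return h * sum(g(curve(a + j * h)) * dcurve(a + j * h) for j in range(steps))

def curve(t):
    return cmath.exp(1j * t)        # unit circle; on [0, 4*pi] it is traversed twice

def dcurve(t):
    return 1j * cmath.exp(1j * t)

z = 0.3 + 0.2j
a, b = 0.0, 4 * math.pi

index = contour_integral(lambda w: 1 / (w - z), curve, dcurve, a, b) / (2j * math.pi)
right = contour_integral(lambda w: cmath.exp(w) / (w - z), curve, dcurve, a, b) / (2j * math.pi)
left = cmath.exp(z) * index         # f(z) times the winding number

print(index, left, right)
```

Here `index` is close to 2, and `left` and `right` agree to within rounding error, as the theorem predicts for this curve.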
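Corollary 19.47 Let $\Omega$ be an open set and let $\gamma_k : [a_k, b_k] \to \Omega$, $k = 1, \dots, m$, be closed, continuous and of bounded variation. Suppose also that
$$\sum_{k=1}^m n(\gamma_k, z) = 0$$
for all $z \notin \Omega$. Then if $f : \Omega \to \mathbb{C}$ is analytic,
$$\sum_{k=1}^m \int_{\gamma_k} f(w)\,dw = 0.$$
Proof: This follows from Theorem 19.46 as follows. Let
$$g(w) = f(w)(w - z)$$
where $z \in \Omega \setminus \cup_{k=1}^m \gamma_k([a_k, b_k])$. Then by this theorem,
$$0 = 0 \sum_{k=1}^m n(\gamma_k, z) = g(z) \sum_{k=1}^m n(\gamma_k, z) = \sum_{k=1}^m \frac{1}{2\pi i} \int_{\gamma_k} \frac{g(w)}{w - z}\,dw = \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} f(w)\,dw.$$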
Another simple corollary to the above theorem is Cauchy's theorem for a simply connected region.

Definition 19.48 An open set $\Omega \subseteq \mathbb{C}$ is a region if it is open and connected. A region $\Omega$ is simply connected if $\widehat{\mathbb{C}} \setminus \Omega$ is connected, where $\widehat{\mathbb{C}}$ is the extended complex plane. In the future, the term simply connected open set will mean an open set which is connected and for which $\widehat{\mathbb{C}} \setminus \Omega$ is connected.
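Corollary 19.49 Let $\gamma : [a, b] \to \Omega$ be a continuous closed curve of bounded variation where $\Omega$ is a simply connected region in $\mathbb{C}$ and let $f : \Omega \to X$ be analytic. Then
$$\int_\gamma f(w)\,dw = 0.$$
Proof: Let $D$ denote the unbounded component of $\widehat{\mathbb{C}} \setminus \gamma^*$. Thus $\infty \in \widehat{\mathbb{C}} \setminus \gamma^*$. Then the connected set $\widehat{\mathbb{C}} \setminus \Omega$ is contained in $D$ since every point of $\widehat{\mathbb{C}} \setminus \Omega$ must be in some component of $\widehat{\mathbb{C}} \setminus \gamma^*$ and $\infty$ is contained in both $\widehat{\mathbb{C}} \setminus \Omega$ and $D$. Thus $D$ must be the component that contains $\widehat{\mathbb{C}} \setminus \Omega$. It follows that $n(\gamma, \cdot)$ must be constant on $\widehat{\mathbb{C}} \setminus \Omega$, its value being its value on $D$. However, for $z \in D$,
$$n(\gamma, z) = \frac{1}{2\pi i} \int_\gamma \frac{1}{w - z}\,dw$$
and so $\lim_{|z| \to \infty} n(\gamma, z) = 0$, showing $n(\gamma, z) = 0$ on $D$. Therefore this verifies the hypothesis of Theorem 19.46. Let $z \in \Omega \cap D$ and define
$$g(w) \equiv f(w)(w - z).$$
Thus $g$ is analytic on $\Omega$ and by Theorem 19.46,
$$0 = n(\gamma, z)\,g(z) = \frac{1}{2\pi i} \int_\gamma \frac{g(w)}{w - z}\,dw = \frac{1}{2\pi i} \int_\gamma f(w)\,dw.$$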
The following is a very significant result which will be used later.

Corollary 19.50 Suppose $\Omega$ is a simply connected open set and $f : \Omega \to X$ is analytic. Then $f$ has a primitive, $F$, on $\Omega$. Recall this means there exists $F$ such that $F'(z) = f(z)$ for all $z \in \Omega$.

Proof: Pick a point $z_0 \in \Omega$ and let $V$ denote those points $z$ of $\Omega$ for which there exists a curve $\gamma : [a, b] \to \Omega$ such that $\gamma$ is continuous, of bounded variation, $\gamma(a) = z_0$, and $\gamma(b) = z$. Then it is easy to verify that $V$ is both open and closed in $\Omega$ and therefore, $V = \Omega$ because $\Omega$ is connected. Denote by $\gamma_{z_0, z}$ such a curve from $z_0$ to $z$ and define
$$F(z) \equiv \int_{\gamma_{z_0, z}} f(w)\,dw.$$
Then $F$ is well defined because if $\gamma_j$, $j = 1, 2$ are two such curves, it follows from Corollary 19.49 that
$$\int_{\gamma_1} f(w)\,dw + \int_{-\gamma_2} f(w)\,dw = 0,$$
implying that
$$\int_{\gamma_1} f(w)\,dw = \int_{\gamma_2} f(w)\,dw.$$
Now this function $F$ is a primitive because, thanks to Corollary 19.49,
$$(F(z + h) - F(z))\,h^{-1} = \frac{1}{h} \int_{\gamma_{z, z+h}} f(w)\,dw = \frac{1}{h} \int_0^1 f(z + th)\,h\,dt$$
and so, taking the limit as $h \to 0$, $F'(z) = f(z)$.
19.7.5 An Example Of A Cycle
The next theorem deals with the existence of a cycle with nice properties. Basically, you go around the compact subset of an open set with suitable contours while staying in the open set. The method involves the following simple concept.

Definition 19.51 A tiling of $\mathbb{R}^2 = \mathbb{C}$ is the union of infinitely many equally spaced vertical and horizontal lines. You can think of the small squares which result as tiles. To tile the plane or $\mathbb{R}^2 = \mathbb{C}$ means to consider such a union of horizontal and vertical lines. It is like graph paper. See the picture below for a representation of part of a tiling of $\mathbb{C}$.
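Theorem 19.52 Let $K$ be a compact subset of an open set $\Omega$. Then there exist continuous, closed, bounded variation oriented curves $\{\Gamma_j\}_{j=1}^m$ for which $\Gamma_j^* \cap K = \emptyset$ for each $j$, $\Gamma_j^* \subseteq \Omega$, and for all $p \in K$,
$$\sum_{k=1}^m n(\Gamma_k, p) = 1,$$
while for all $z \notin \Omega$,
$$\sum_{k=1}^m n(\Gamma_k, z) = 0.$$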
Proof: Let $\delta = \operatorname{dist}(K, \Omega^C)$. Since $K$ is compact, $\delta > 0$. Now tile the plane with squares, each of which has diameter less than $\delta/2$.

[Figure: the compact set $K$ inside $\Omega$, overlaid with the tiling.]

Let $S$ denote the set of all the closed squares in this tiling which have nonempty intersection with $K$. Thus, all the squares of $S$ are contained in $\Omega$. First suppose $p$ is a point of $K$ which is in the interior of one of these squares in the tiling. Denote by $\partial S_k$ the boundary of $S_k$, one of the squares in $S$, oriented in the counter clockwise direction, and let $S_m$ denote the square of $S$ which contains the point $p$ in its interior. Let the edges of the square $S_j$ be $\{\gamma_k^j\}_{k=1}^4$. Thus a short computation shows $n(\partial S_m, p) = 1$ but $n(\partial S_j, p) = 0$ for all $j \neq m$. The reason for this is that for $z$ in $S_j$, the values $\{z - p : z \in S_j\}$ lie in an open square $Q$ which is located at a positive distance from 0. Then $\widehat{\mathbb{C}} \setminus Q$ is connected and $1/(z - p)$ is analytic on $Q$. It follows from Corollary 19.50 that this function has a primitive on $Q$ and so
$$\int_{\partial S_j} \frac{1}{z - p}\,dz = 0.$$
Similarly, if $z \notin \Omega$, $n(\partial S_j, z) = 0$. On the other hand, a direct computation will verify that $n(\partial S_m, p) = 1$. Thus $1 = \sum_{j,k} n(\gamma_k^j, p) = \sum_{S_j \in S} n(\partial S_j, p)$ and if $z \notin \Omega$, $0 = \sum_{j,k} n(\gamma_k^j, z) = \sum_{S_j \in S} n(\partial S_j, z)$.

If $\gamma_k^{j*}$ coincides with $\gamma_l^{l*}$, then the contour integrals taken over this edge are taken in opposite directions and so the edge the two squares have in common can be deleted without changing $\sum_{j,k} n(\gamma_k^j, z)$ for any $z$ not on any of the lines in the tiling. For example, see the picture.

[Figure: two adjacent squares with counter clockwise oriented boundaries; the shared edge is traversed in opposite directions.]

From the construction, if any of the $\gamma_k^{j*}$ contains a point of $K$ then this point is on one of the four edges of $S_j$ and at this point, there is at least one edge of some $S_l$ which also contains this point. As just discussed, this shared edge can be deleted without changing $\sum_{j,k} n(\gamma_k^j, z)$. Delete the edges of the $S_k$ which intersect $K$ but not the endpoints of these edges. That is, delete the open edges. When this is done, delete all isolated points. Let the resulting oriented curves be denoted by $\{\gamma_k\}_{k=1}^m$. Note that you might have $\gamma_k^* = \gamma_l^*$. The construction is illustrated in the following picture.

[Figure: the oriented edges which remain around $K$ after the edges meeting $K$ are deleted.]

Then as explained above, $\sum_{k=1}^m n(\gamma_k, p) = 1$. It remains to prove the claim about the closed curves.

Each orientation on an edge corresponds to a direction of motion over that edge. Call such a motion over the edge a route. Initially, every vertex (corner of a square in $S$) has the property that there are the same number of routes to and from that vertex. When an open edge whose closure contains a point of $K$ is deleted, every vertex either remains unchanged as to the number of routes to and from that vertex or it loses both a route away and a route to. Thus the property of having the same number of routes to and from each vertex is preserved by deleting these open edges. The isolated points which result lose all routes to and from. It follows that upon removing the isolated points you can begin at any of the remaining vertices and follow the routes leading out from this and successive vertices according to orientation and eventually return to the starting vertex. Otherwise, there would be a vertex which would have only one route leading to it, which does not happen. Now if you have used all the routes out of this vertex, pick another vertex and do the same process. Otherwise, pick an unused route out of the vertex and follow it to return. Continue this way till all routes are used exactly once, resulting in closed oriented curves $\Gamma_k$. Then
$$\sum_k n(\Gamma_k, p) = \sum_j n(\gamma_j, p) = 1.$$
In case $p \in K$ is on some line of the tiling, it is not on any of the $\Gamma_k$ because $\Gamma_k^* \cap K = \emptyset$ and so the continuity of $z \to n(\Gamma_k, z)$ yields the desired result in this case also. This proves the theorem.
19.8 Exercises
1. If $U$ is simply connected, $f$ is analytic on $U$ and $f$ has no zeros in $U$, show there exists an analytic function $F$, defined on $U$, such that $e^F = f$.
2. Let $f$ be defined and analytic near the point $a \in \mathbb{C}$. Show that then $f(z) = \sum_{k=0}^{\infty} b_k (z - a)^k$ whenever $|z - a| < R$ where $R$ is the distance between $a$ and the nearest point where $f$ fails to have a derivative. The number $R$ is called the radius of convergence and the power series is said to be expanded about $a$.
3. Find the radius of convergence of the function $\frac{1}{1 + z^2}$ expanded about $a = 2$. Note there is nothing wrong with the function $\frac{1}{1 + x^2}$ when considered as a function of a real variable $x$ for any value of $x$. However, if you insist on using power series, you find there is a limitation on the values of $x$ for which the power series converges due to the presence in the complex plane of a point, $i$, where the function fails to have a derivative.
4. Suppose $f$ is analytic on all of $\mathbb{C}$ and satisfies $|f(z)| < A + B|z|^{1/2}$. Show $f$ is constant.
5. What if you defined an open set $U$ to be simply connected if $\mathbb{C} \setminus U$ is connected? Would it amount to the same thing? Hint: Consider the outside of $B(0, 1)$.
6. Let $\gamma(t) = e^{it}$, $t \in [0, 2\pi]$. Find $\int_\gamma \frac{1}{z^n}\,dz$ for $n = 1, 2, \dots$.
7. Show $i \int_0^{2\pi} (2\cos\theta)^{2n}\,d\theta = \int_\gamma \left(z + \frac{1}{z}\right)^{2n} \left(\frac{1}{z}\right) dz$ where $\gamma(t) = e^{it}$, $t \in [0, 2\pi]$. Then evaluate this integral using the binomial theorem and the previous problem.
8. Suppose that for some constants $a, b \neq 0$, $a, b \in \mathbb{R}$, $f(z + ib) = f(z)$ for all $z \in \mathbb{C}$ and $f(z + a) = f(z)$ for all $z \in \mathbb{C}$. If $f$ is analytic, show that $f$ must be constant. Can you generalize this? Hint: This uses Liouville's theorem.
9. Suppose $f(z) = u(x, y) + iv(x, y)$ is analytic for $z \in U$, an open set. Let $g(z) = u^*(x, y) + iv^*(x, y)$ where
$$\begin{pmatrix} u^* \\ v^* \end{pmatrix} = Q \begin{pmatrix} u \\ v \end{pmatrix}$$
where $Q$ is a unitary matrix. That is, $QQ^* = Q^*Q = I$. When will $g$ be analytic?
10. Suppose $f$ is analytic on an open set $U$, except for $\gamma^* \subset U$ where $\gamma$ is a one to one continuous function having bounded variation, but it is known that $f$ is continuous on $\gamma^*$. Show that in fact $f$ is analytic on $\gamma^*$ also. Hint: Pick a point on $\gamma^*$, say $\gamma(t_0)$, and suppose for now that $t_0 \in (a, b)$. Pick $r > 0$ such that $B = B(\gamma(t_0), r) \subseteq U$. Then show there exists $t_1 < t_0$ and $t_2 > t_0$ such that $\gamma([t_1, t_2]) \subseteq \overline{B}$ and $\gamma(t_i) \notin B$. Thus $\gamma([t_1, t_2])$ is a path across $B$ going through the center of $B$ which divides $B$ into two open sets, $B_1$ and $B_2$, along with $\gamma^*$. Let the boundary of $B_k$ consist of $\gamma([t_1, t_2])$ and a circular arc, $C_k$. Now letting $z \in B_k$, the line integral of $\frac{f(w)}{w - z}$ over $\gamma^*$ in two different directions cancels. Therefore, if $z \in B_k$, you can argue that $f(z) = \frac{1}{2\pi i} \int_C \frac{f(w)}{w - z}\,dw$. By continuity, this continues to hold for $z \in \gamma((t_1, t_2))$. Therefore, $f$ must be analytic on $\gamma((t_1, t_2))$ also. This shows that $f$ must be analytic on $\gamma((a, b))$. To get the endpoints, simply extend $\gamma$ to have the same properties but defined on $[a - \varepsilon, b + \varepsilon]$ and repeat the above argument, or else do this at the beginning and note that you get $[a, b] \subseteq (a - \varepsilon, b + \varepsilon)$.
11. Let $U$ be an open set contained in the upper half plane and suppose that there are finitely many line segments on the $x$ axis which are contained in the boundary of $U$. Now suppose that $f$ is defined, real, and continuous on these line segments and is defined and analytic on $U$. Now let $\widetilde{U}$ denote the reflection of $U$ across the $x$ axis. Show that it is possible to extend $f$ to a function $g$ defined on all of
$$W \equiv \widetilde{U} \cup U \cup \{\text{the line segments mentioned earlier}\}$$
such that $g$ is analytic in $W$. Hint: For $z \in \widetilde{U}$, the reflection of $U$ across the $x$ axis, let $g(z) \equiv \overline{f(\overline{z})}$. Show that $g$ is analytic on $\widetilde{U} \cup U$ and continuous on the line segments. Then use Problem 10 or Morera's theorem to argue that $g$ is analytic on the line segments also. The result of this problem is known as the Schwarz reflection principle.
12. Show that rotations and translations of analytic functions yield analytic functions and use this observation to generalize the Schwarz reflection principle to situations in which the line segments are part of a line which is not the $x$ axis. Thus, give a version which involves reflection about an arbitrary line.
The Open Mapping Theorem
20.1 A Local Representation
The open mapping theorem is an even more surprising result than the theorem about the zeros of an analytic function. The following proof of this important theorem uses an interesting local representation of the analytic function.

Theorem 20.1 (Open mapping theorem) Let $\Omega$ be a region in $\mathbb{C}$ and suppose $f : \Omega \to \mathbb{C}$ is analytic. Then $f(\Omega)$ is either a point or a region. In the case where $f(\Omega)$ is a region, it follows that for each $z_0 \in \Omega$, there exists an open set $V$ containing $z_0$ and $m \in \mathbb{N}$ such that for all $z \in V$,
$$f(z) = f(z_0) + \phi(z)^m \tag{20.1}$$
where $\phi : V \to B(0, \delta)$ is one to one, analytic and onto, $\phi(z_0) = 0$, $\phi'(z) \neq 0$ on $V$ and $\phi^{-1}$ is analytic on $B(0, \delta)$. If $f$ is one to one then $m = 1$ for each $z_0$ and $f^{-1} : f(\Omega) \to \Omega$ is analytic.
Proof: Suppose $f(\Omega)$ is not a point. Then if $z_0 \in \Omega$ it follows there exists $r > 0$ such that $f(z) \neq f(z_0)$ for all $z \in B(z_0, r) \setminus \{z_0\}$. Otherwise, $z_0$ would be a limit point of the set
$$\{z \in \Omega : f(z) - f(z_0) = 0\}$$
which would imply from Theorem 19.23 that $f(z) = f(z_0)$ for all $z \in \Omega$. Therefore, making $r$ smaller if necessary and using the power series of $f$,
$$f(z) = f(z_0) + (z - z_0)^m g(z) \left(\overset{?}{=} \left((z - z_0)\,g(z)^{1/m}\right)^m\right)$$
for all $z \in B(z_0, r)$, where $g(z) \neq 0$ on $B(z_0, r)$. As implied in the above formula, one wonders if you can take the $m^{th}$ root of $g(z)$.

$\frac{g'}{g}$ is an analytic function on $B(z_0, r)$ and so by Corollary 19.32 it has a primitive on $B(z_0, r)$, $h$. Therefore by the product rule and the chain rule, $\left(g e^{-h}\right)' = 0$ and so there exists a constant, $C = e^{a + ib}$, such that on $B(z_0, r)$,
$$g e^{-h} = e^{a + ib}.$$
Therefore,
$$g(z) = e^{h(z) + a + ib}$$
and so, modifying $h$ by adding in the constant $a + ib$, $g(z) = e^{h(z)}$ where $h'(z) = \frac{g'(z)}{g(z)}$ on $B(z_0, r)$. Letting
$$\phi(z) = (z - z_0)\,e^{\frac{h(z)}{m}}$$
implies formula 20.1 is valid on $B(z_0, r)$. Now
$$\phi'(z_0) = e^{\frac{h(z_0)}{m}} \neq 0.$$
Shrinking $r$ if necessary you can assume $\phi'(z) \neq 0$ on $B(z_0, r)$. Is there an open set $V$ contained in $B(z_0, r)$ such that $\phi$ maps $V$ onto $B(0, \delta)$ for some $\delta > 0$?

Let $\phi(z) = u(x, y) + iv(x, y)$ where $z = x + iy$. Consider the mapping
$$\begin{pmatrix} x \\ y \end{pmatrix} \to \begin{pmatrix} u(x, y) \\ v(x, y) \end{pmatrix}$$
where $u, v$ are $C^1$ because $\phi$ is given to be analytic. The Jacobian of this map at $(x, y) \in B(z_0, r)$ is
$$\begin{vmatrix} u_x(x, y) & u_y(x, y) \\ v_x(x, y) & v_y(x, y) \end{vmatrix} = \begin{vmatrix} u_x(x, y) & -v_x(x, y) \\ v_x(x, y) & u_x(x, y) \end{vmatrix} = u_x(x, y)^2 + v_x(x, y)^2 = \left|\phi'(z)\right|^2 \neq 0.$$
This follows from a use of the Cauchy Riemann equations. Also
$$\begin{pmatrix} u(x_0, y_0) \\ v(x_0, y_0) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
Therefore, by the inverse function theorem there exists an open set $V$ containing $z_0$ and $\delta > 0$ such that $(u, v)^T$ maps $V$ one to one onto $B(0, \delta)$. Thus $\phi$ is one to one onto $B(0, \delta)$ as claimed. Applying the same argument to other points $z$ of $V$ and using the fact that $\phi'(z) \neq 0$ at these points, it follows $\phi$ maps open sets to open sets. In other words, $\phi^{-1}$ is continuous.

It also follows that $\phi^m$ maps $V$ onto $B(0, \delta^m)$. Therefore, the formula 20.1 implies that $f$ maps the open set $V$ containing $z_0$ to an open set. This shows $f(\Omega)$ is an open set because $z_0$ was arbitrary. It is connected because $f$ is continuous and $\Omega$ is connected. Thus $f(\Omega)$ is a region. It remains to verify that $\phi^{-1}$ is analytic on $B(0, \delta)$. Since $\phi^{-1}$ is continuous,
$$\lim_{\phi(z_1) \to \phi(z)} \frac{\phi^{-1}(\phi(z_1)) - \phi^{-1}(\phi(z))}{\phi(z_1) - \phi(z)} = \lim_{z_1 \to z} \frac{z_1 - z}{\phi(z_1) - \phi(z)} = \frac{1}{\phi'(z)}.$$
Therefore, $\phi^{-1}$ is analytic as claimed.

It only remains to verify the assertion about the case where $f$ is one to one. If $m > 1$, then $e^{\frac{2\pi i}{m}} \neq 1$ and so for $z_1 \in V$ with $z_1 \neq z_0$ (so that $\phi(z_1) \neq 0$),
$$e^{\frac{2\pi i}{m}} \phi(z_1) \neq \phi(z_1). \tag{20.2}$$
But $e^{\frac{2\pi i}{m}} \phi(z_1) \in B(0, \delta)$ and so there exists $z_2 \neq z_1$ (since $\phi$ is one to one) such that $\phi(z_2) = e^{\frac{2\pi i}{m}} \phi(z_1)$. But then
$$\phi(z_2)^m = \left(e^{\frac{2\pi i}{m}} \phi(z_1)\right)^m = \phi(z_1)^m$$
implying $f(z_2) = f(z_1)$, contradicting the assumption that $f$ is one to one. Thus $m = 1$ and $f'(z) = \phi'(z) \neq 0$ on $V$. Since $f$ maps open sets to open sets, it follows that $f^{-1}$ is continuous and so
$$\left(f^{-1}\right)'(f(z)) = \lim_{f(z_1) \to f(z)} \frac{f^{-1}(f(z_1)) - f^{-1}(f(z))}{f(z_1) - f(z)} = \lim_{z_1 \to z} \frac{z_1 - z}{f(z_1) - f(z)} = \frac{1}{f'(z)}.$$
This proves the theorem.

One does not have to look very far to find that this sort of thing does not hold for functions mapping $\mathbb{R}$ to $\mathbb{R}$. Take for example the function $f(x) = x^2$. Then $f(\mathbb{R})$ is neither a point nor a region. In fact $f(\mathbb{R})$ fails to be open.
Corollary 20.2 Suppose in the situation of Theorem 20.1 $m > 1$ for the local representation of $f$ given in this theorem. Then there exists $\delta > 0$ such that if $w \in B(f(z_0), \delta) = f(V)$ for $V$ an open set containing $z_0$, and $w \neq f(z_0)$, then $f^{-1}(w)$ consists of $m$ distinct points in $V$. ($f$ is $m$ to one on $V$.)

Proof: Let $w \in B(f(z_0), \delta)$ with $w \neq f(z_0)$. Then $w = f(\hat{z})$ where $\hat{z} \in V$. Thus $f(\hat{z}) = f(z_0) + \phi(\hat{z})^m$ and $\phi(\hat{z}) \neq 0$. Consider the $m$ distinct numbers $\left\{e^{\frac{2k\pi i}{m}} \phi(\hat{z})\right\}_{k=1}^m$. Then each of these numbers is in $B(0, \delta)$ and so since $\phi$ maps $V$ one to one onto $B(0, \delta)$, there are $m$ distinct numbers in $V$, $\{z_k\}_{k=1}^m$, such that $\phi(z_k) = e^{\frac{2k\pi i}{m}} \phi(\hat{z})$. Then
$$f(z_k) = f(z_0) + \phi(z_k)^m = f(z_0) + \left(e^{\frac{2k\pi i}{m}} \phi(\hat{z})\right)^m = f(z_0) + e^{2k\pi i} \phi(\hat{z})^m = f(z_0) + \phi(\hat{z})^m = f(\hat{z}) = w.$$
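The local representation and the $m$ to one behavior can be made concrete. The following sketch rests on illustrative assumptions which are not in the notes: $f(z) = z^2 e^z$ with $z_0 = 0$, so that $m = 2$, $h(z) = z$ and $\phi(z) = z e^{z/2}$. The two preimages of a small value $w$ are found by inverting $\phi$ with Newton's method, a numerical stand-in for the inverse function theorem; the names `phi_inverse` and `preimages` are hypothetical.

```python
import cmath

def phi(z):
    """phi(z) = (z - z0) * exp(h(z)/m) with z0 = 0, h(z) = z, m = 2."""
    return z * cmath.exp(z / 2)

def dphi(z):
    return cmath.exp(z / 2) * (1 + z / 2)

def f(z):
    return z * z * cmath.exp(z)   # equals f(0) + phi(z)**2

def phi_inverse(c, iterations=50):
    """Solve phi(z) = c for z near 0 by Newton's method, starting from z = c."""
    z = c
    for _ in range(iterations):
        z = z - (phi(z) - c) / dphi(z)
    return z

w = 0.01 + 0.02j
root = cmath.sqrt(w)              # one square root of w - f(0)
# The m = 2 numbers exp(2*k*pi*i/m) * root, k = 1, 2, pulled back through phi.
preimages = [phi_inverse(cmath.exp(2j * cmath.pi * k / 2) * root) for k in (1, 2)]

print(preimages)
print([f(p) for p in preimages])  # both values are close to w
```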
20.1.1 Branches Of The Logarithm
The argument used to prove the next theorem was used in the proof of the open mapping theorem. It is a very important result and deserves to be stated as a theorem.

Theorem 20.3 Let $\Omega$ be a simply connected region and suppose $f : \Omega \to \mathbb{C}$ is analytic and nonzero on $\Omega$. Then there exists an analytic function $g$ such that $e^{g(z)} = f(z)$ for all $z \in \Omega$.

Proof: The function $f'/f$ is analytic on $\Omega$ and so by Corollary 19.50 there is a primitive for $f'/f$, denoted as $g_1$. Then
$$\left(e^{-g_1} f\right)' = -\frac{f'}{f}\,e^{-g_1} f + e^{-g_1} f' = 0$$
and so since $\Omega$ is connected, it follows $e^{-g_1} f$ equals a constant, $e^{a + ib}$. Therefore, $f(z) = e^{g_1(z) + a + ib}$. Define $g(z) \equiv g_1(z) + a + ib$.

The function $g$ in the above theorem is called a branch of the logarithm of $f$ and is written as $\log(f(z))$.
Definition 20.4 Let $\rho$ be a ray starting at 0. Thus $\rho$ is a straight line of infinite length extending in one direction with its initial point at 0.
A special case of the above theorem is the following.
Theorem 20.5 Let $\rho$ be a ray starting at 0. Then there exists an analytic function, $L(z)$ defined on $\mathbb{C} \setminus \rho$ such that
\[ e^{L(z)} = z. \]
This function, $L$ is called a branch of the logarithm. This branch of the logarithm satisfies the usual formula for logarithms modulo $2\pi i$: if $z, w, zw \notin \rho$, then $L(zw) - L(z) - L(w)$ is an integer multiple of $2\pi i$.
Proof: $\mathbb{C} \setminus \rho$ is a simply connected region because its complement with respect to $\widehat{\mathbb{C}}$ is connected. Furthermore, the function, $f(z) = z$ is not equal to zero on $\mathbb{C} \setminus \rho$. Therefore, by Theorem 20.3 there exists an analytic function $L(z)$ such that $e^{L(z)} = f(z) = z$. Now consider the problem of finding a description of $L(z)$. Let $\theta$ be the angle of the ray $\rho$. Each $z \in \mathbb{C} \setminus \rho$ can be written in a unique way in the form
\[ z = |z| e^{i \arg_\theta(z)} \]
where $\arg_\theta(z)$ is the angle in $(\theta, \theta + 2\pi)$ associated with $z$. (You could of course have considered this to be the angle in $(\theta - 2\pi, \theta)$ associated with $z$ or in infinitely many other open intervals of length $2\pi$. The description of the log is not unique.) Then letting $L(z) = a + ib$,
\[ z = |z| e^{i \arg_\theta(z)} = e^{L(z)} = e^a e^{ib} \]
and so you can let $L(z) = \ln|z| + i \arg_\theta(z)$.
Does $L(z)$ satisfy the usual properties of the logarithm? That is, for $z, w \in \mathbb{C} \setminus \rho$, is $L(zw) = L(z) + L(w)$? This is true up to a multiple of $2\pi i$, and it follows from the usual rules of exponents. You know $e^{z+w} = e^z e^w$. (You can verify this directly or you can reduce to the case where $z, w$ are real. If $z$ is a fixed real number, then the equation holds for all real $w$. Therefore, it must also hold for all complex $w$ because the real line contains a limit point. Now for this fixed $w$, the equation holds for all $z$ real. Therefore, by similar reasoning, it holds for all complex $z$.)
Now suppose $z, w \in \mathbb{C} \setminus \rho$ and $zw \notin \rho$. Then
\[ e^{L(zw)} = zw, \quad e^{L(z) + L(w)} = e^{L(z)} e^{L(w)} = zw \]
and so $L(zw) - L(z) - L(w)$ is an integer multiple of $2\pi i$ because $e^u = e^v$ only says that $u - v \in 2\pi i \mathbb{Z}$. For the description $L(z) = \ln|z| + i \arg_\theta(z)$, the imaginary part of $L(zw)$ lies in $(\theta, \theta + 2\pi)$, so $L(zw) = L(z) + L(w)$ exactly when $\arg_\theta(z) + \arg_\theta(w) \in (\theta, \theta + 2\pi)$. This proves the theorem.
In the case where the ray is the negative real axis, it is called the principal branch of the logarithm. Thus $\arg(z)$ is a number between $-\pi$ and $\pi$.
Definition 20.6 Let $\log$ denote the branch of the logarithm which corresponds to the ray for $\theta = \pi$. That is, the ray is the negative real axis. Sometimes this is called the principal branch of the logarithm.
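The behavior of the principal branch can be checked numerically. The following is a minimal sketch, not part of the notes: the helper names `principal_log` and `additivity_defect` are made up for illustration, and the sample points are arbitrary. It confirms that $e^{L(z)} = z$ and computes the integer $k$ with $L(zw) - L(z) - L(w) = 2\pi i k$, which is zero when the two arguments add up to something in $(-\pi, \pi)$ and nonzero otherwise.

```python
import cmath
import math


def principal_log(z: complex) -> complex:
    """Principal branch: ln|z| + i*arg(z), with arg(z) taken in (-pi, pi]."""
    return complex(math.log(abs(z)), cmath.phase(z))


def additivity_defect(z: complex, w: complex) -> int:
    """Return the integer k with L(zw) - L(z) - L(w) = 2*pi*i*k."""
    diff = principal_log(z * w) - principal_log(z) - principal_log(w)
    return round(diff.imag / (2 * math.pi))


# Hypothetical sample points chosen for illustration.
z, w = 1 + 1j, 2 + 1j               # arg(z) + arg(w) is less than pi
u = cmath.exp(3j * math.pi / 4)     # arg(u) + arg(u) = 3*pi/2, larger than pi

print(additivity_defect(z, w), additivity_defect(u, u))
```

The first defect vanishes, while for the second pair the product $u \cdot u = -i$ has principal argument $-\pi/2$ rather than $3\pi/2$, so the two sides differ by $2\pi i$.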
20.2 Maximum Modulus Theorem
Here is another very significant theorem known as the maximum modulus theorem
which follows immediately from the open mapping theorem.
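Theorem 20.7 (maximum modulus theorem) Let $\Omega$ be a bounded region and let $f : \Omega \to \mathbb{C}$ be analytic and $f : \overline{\Omega} \to \mathbb{C}$ continuous. Then if $z \in \Omega$,
\[ |f(z)| \le \max\{|f(w)| : w \in \partial\Omega\}. \tag{20.3} \]
If equality is achieved for any $z \in \Omega$, then $f$ is a constant.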
Proof: Suppose $f$ is not a constant. Then $f(\Omega)$ is a region and so if $z \in \Omega$, there exists $r > 0$ such that $B(f(z), r) \subseteq f(\Omega)$. It follows there exists $z_1 \in \Omega$ with $|f(z_1)| > |f(z)|$. Hence $\max\{|f(w)| : w \in \overline{\Omega}\}$ is not achieved at any interior point of $\Omega$. Therefore, the point at which the maximum is achieved must lie on the boundary of $\Omega$ and so
\[ \max\{|f(w)| : w \in \partial\Omega\} = \max\{|f(w)| : w \in \overline{\Omega}\} > |f(z)| \]
for all $z \in \Omega$ or else $f$ is a constant. This proves the theorem.
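You can remove the assumption that $\Omega$ is bounded and give a slightly different version.
Theorem 20.8 Let $f : \Omega \to \mathbb{C}$ be analytic on a region, $\Omega$ and suppose $\overline{B(a, r)} \subseteq \Omega$. Then
\[ |f(a)| \le \max\left\{ \left| f\left(a + re^{i\theta}\right) \right| : \theta \in [0, 2\pi] \right\}. \]
Equality occurs for some $r > 0$ and $a \in \Omega$ if and only if $f$ is constant in $\Omega$, hence equality occurs for all such $a, r$.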
Proof: The claimed inequality holds by Theorem 20.7. Suppose equality in the above is achieved for some $\overline{B(a, r)} \subseteq \Omega$. Then by Theorem 20.7 $f$ is equal to a constant, $w$ on $B(a, r)$. Therefore, the function, $f(\cdot) - w$ has a zero set which has a limit point in $\Omega$ and so by Theorem 19.23 $f(z) = w$ for all $z \in \Omega$.
Conversely, if $f$ is constant, then the equality in the above inequality is achieved for all $\overline{B(a, r)} \subseteq \Omega$.
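The inequality of Theorem 20.8 is easy to test numerically. Here is a small sketch under stated assumptions: the function `max_on_circle` is a made-up helper which samples the circle at finitely many points, and the choices $f = \exp$, $a = 0.3 + 0.2i$, $r = 0.5$ are arbitrary. For this $f$ the maximum on the circle occurs at $\theta = 0$, which is one of the sample points.

```python
import cmath
import math


def max_on_circle(f, a: complex, r: float, samples: int = 720) -> float:
    """Approximate max |f(a + r e^{i theta})| over the circle by sampling."""
    return max(
        abs(f(a + r * cmath.exp(2j * math.pi * k / samples)))
        for k in range(samples)
    )


# Hypothetical data for illustration: f = exp, center a, radius r.
a, r = 0.3 + 0.2j, 0.5
center_value = abs(cmath.exp(a))                 # |f(a)| = e^{0.3}
boundary_max = max_on_circle(cmath.exp, a, r)    # equals e^{0.8}, attained at theta = 0

print(center_value, boundary_max)
```

Since $\exp$ is not constant, the inequality is strict here, in agreement with the equality statement of the theorem.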
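Next is yet another version of the maximum modulus principle which is in Conway [12]. Let $\Omega$ be an open set.
Definition 20.9 Define $\partial_\infty \Omega$ to equal $\partial\Omega$ in the case where $\Omega$ is bounded and $\partial\Omega \cup \{\infty\}$ in the case where $\Omega$ is not bounded.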
Definition 20.10 Let $f$ be a complex valued function defined on a set $S \subseteq \mathbb{C}$ and let $a$ be a limit point of $S$.
\[ \limsup_{z \to a} |f(z)| \equiv \lim_{r \to 0} \sup\{|f(w)| : w \in B'(a, r) \cap S\}. \]
The limit exists because $\sup\{|f(w)| : w \in B'(a, r) \cap S\}$ is decreasing in $r$. In case $a = \infty$,
\[ \limsup_{z \to \infty} |f(z)| \equiv \lim_{r \to \infty} \sup\{|f(w)| : |w| > r, \ w \in S\}. \]
Note that if $\limsup_{z \to a} |f(z)| \le M < \infty$ and $\delta > 0$, then there exists $r > 0$ such that if $z \in B'(a, r) \cap S$, then $|f(z)| < M + \delta$. If $a = \infty$, there exists $r > 0$ such that if $|z| > r$ and $z \in S$, then $|f(z)| < M + \delta$.
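Theorem 20.11 Let $\Omega$ be an open set in $\mathbb{C}$ and let $f : \Omega \to \mathbb{C}$ be analytic. Suppose also that for every $a \in \partial_\infty \Omega$,
\[ \limsup_{z \to a} |f(z)| \le M < \infty. \]
Then in fact $|f(z)| \le M$ for all $z \in \Omega$.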
Proof: Let $\delta > 0$ and let $H \equiv \{z \in \Omega : |f(z)| > M + \delta\}$. Suppose $H \neq \emptyset$. Then $H$ is an open subset of $\Omega$. I claim that $H$ is actually bounded. If $\Omega$ is bounded, there is nothing to show so assume $\Omega$ is unbounded. Then the condition involving the $\limsup$ implies there exists $r > 0$ such that if $|z| > r$ and $z \in \Omega$, then $|f(z)| \le M + \delta/2$. It follows $H$ is contained in $B(0, r)$ and so it is bounded. Now consider the components of $\Omega$. One of these components contains points from $H$. Let this component be denoted as $V$ and let $H_V \equiv H \cap V$. Thus $H_V$ is a bounded open subset of $V$. Let $U$ be a component of $H_V$. First suppose $\overline{U} \subseteq V$. In this case, it follows that on $\partial U$, $|f(z)| = M + \delta$ and so by Theorem 20.7 $|f(z)| \le M + \delta$ for all $z \in U$, contradicting the definition of $H$. Next suppose $\partial U$ contains a point of $\partial V$, $a$. Then in this case, $a$ violates the condition on $\limsup$. Either way you get a contradiction. Hence $H = \emptyset$ as claimed. Since $\delta > 0$ is arbitrary, this shows $|f(z)| \le M$.
20.3 Extensions Of Maximum Modulus Theorem
20.3.1 Phragmen Lindelof Theorem
This theorem is an extension of Theorem 20.11. It uses a growth condition near the
extended boundary to conclude that f is bounded. I will present the version found
in Conway [12]. It seems to be more of a method than an actual theorem. There
are several versions of it.
Theorem 20.12 Let $\Omega$ be a simply connected region in $\mathbb{C}$ and suppose $f$ is analytic on $\Omega$. Also suppose there exists an analytic function, $\phi$ which is nonzero and uniformly bounded on $\Omega$. Let $M$ be a positive number. Now suppose $\partial_\infty \Omega = A \cup B$ such that for every $a \in A$, $\limsup_{z \to a} |f(z)| \le M$ and for every $b \in B$, and $\eta > 0$, $\limsup_{z \to b} |f(z)| |\phi(z)|^\eta \le M$. Then $|f(z)| \le M$ for all $z \in \Omega$.
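Proof: By Theorem 20.3 there exists $\log(\phi(z))$ analytic on $\Omega$. Now define $g(z) \equiv \exp(\eta \log(\phi(z)))$ so that $g(z) = \phi(z)^\eta$. Now also
\[ |g(z)| = |\exp(\eta \log(\phi(z)))| = |\exp(\eta \ln|\phi(z)|)| = |\phi(z)|^\eta. \]
Let $m \ge |\phi(z)|$ for all $z \in \Omega$. Define $F(z) \equiv f(z) g(z) m^{-\eta}$. Thus $F$ is analytic and for $b \in B$,
\[ \limsup_{z \to b} |F(z)| = \limsup_{z \to b} |f(z)| |\phi(z)|^\eta m^{-\eta} \le M m^{-\eta} \]
while for $a \in A$,
\[ \limsup_{z \to a} |F(z)| \le M. \]
Therefore, for $\alpha \in \partial_\infty \Omega$, $\limsup_{z \to \alpha} |F(z)| \le \max(M, M m^{-\eta})$ and so by Theorem 20.11,
\[ |f(z)| \le \left( \frac{m^\eta}{|\phi(z)|^\eta} \right) \max(M, M m^{-\eta}). \]
Now let $\eta \to 0$ to obtain $|f(z)| \le M$.
In applications, it is often the case that $B = \{\infty\}$.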
Now here is an interesting case of this theorem. It involves a particular form for $\Omega$, in this case $\Omega = \left\{ z \in \mathbb{C} : |\arg(z)| < \frac{\pi}{2a} \right\}$ where $a \ge \frac{1}{2}$.
Then $\partial\Omega$ equals the two slanted lines. Also on $\Omega$ you can define a logarithm, $\log(z) = \ln|z| + i \arg(z)$ where $\arg(z)$ is the angle associated with $z$ between $-\pi$ and $\pi$. Therefore, if $c$ is a real number you can define $z^c$ for such $z$ in the usual way:
\[ z^c \equiv \exp(c \log(z)) = \exp(c[\ln|z| + i \arg(z)]) = |z|^c \exp(ic \arg(z)) = |z|^c (\cos(c \arg(z)) + i \sin(c \arg(z))). \]
If $|c| < a$, then $|c \arg(z)| < \frac{\pi}{2}$ and so $\cos(c \arg(z)) > 0$. Therefore, for such $c$,
\[ |\exp(-(z^c))| = |\exp(-|z|^c (\cos(c \arg(z)) + i \sin(c \arg(z))))| = |\exp(-|z|^c \cos(c \arg(z)))| \]
which is bounded since $\cos(c \arg(z)) > 0$.
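Corollary 20.13 Let $\Omega = \left\{ z \in \mathbb{C} : |\arg(z)| < \frac{\pi}{2a} \right\}$ where $a \ge \frac{1}{2}$ and suppose $f$ is analytic on $\Omega$ and satisfies $\limsup_{z \to w} |f(z)| \le M$ for every $w \in \partial\Omega$, and suppose there are positive constants, $P, b$ where $b < a$ and
\[ |f(z)| \le P \exp\left( |z|^b \right) \]
for all $|z|$ large enough. Then $|f(z)| \le M$ for all $z \in \Omega$.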
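Proof: Let $b < c < a$ and let $\phi(z) \equiv \exp(-(z^c))$. Then as discussed above, $\phi(z) \neq 0$ on $\Omega$ and $|\phi(z)|$ is bounded on $\Omega$. Now
\[ |\phi(z)|^\eta = |\exp(-|z|^c \eta \cos(c \arg(z)))| \]
and on $\Omega$, $\cos(c \arg(z)) \ge \cos\left( \frac{c\pi}{2a} \right) > 0$. Since $b < c$,
\[ \limsup_{z \to \infty} |f(z)| |\phi(z)|^\eta \le \limsup_{z \to \infty} \frac{P \exp\left( |z|^b \right)}{|\exp(|z|^c \eta \cos(c \arg(z)))|} = 0 \le M \]
and so by Theorem 20.12 $|f(z)| \le M$.
The following is another interesting case. This case is presented in Rudin [38].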
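Corollary 20.14 Let $\Omega$ be the open set consisting of $\{z \in \mathbb{C} : a < \operatorname{Re} z < b\}$ and suppose $f$ is analytic on $\Omega$, continuous on $\overline{\Omega}$, and bounded on $\Omega$. Suppose also that $|f(z)| \le 1$ on the two lines $\operatorname{Re} z = a$ and $\operatorname{Re} z = b$. Then $|f(z)| \le 1$ for all $z \in \Omega$.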
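Proof: This time let $\phi(z) = \frac{1}{1 + z - a}$. Thus $|\phi(z)| \le 1$ because $\operatorname{Re}(z - a) > 0$ and $\phi(z) \neq 0$ for all $z \in \Omega$. Also, $\limsup_{z \to \infty} |\phi(z)|^\eta = 0$ for every $\eta > 0$. Therefore, if $w$ is a point of the sides of $\Omega$, $\limsup_{z \to w} |f(z)| \le 1$ while $\limsup_{z \to \infty} |f(z)| |\phi(z)|^\eta = 0 \le 1$ because $f$ is bounded, and so by Theorem 20.12, $|f(z)| \le 1$ on $\Omega$.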
This corollary yields an interesting conclusion.
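Corollary 20.15 Let $\Omega$ be the open set consisting of $\{z \in \mathbb{C} : a < \operatorname{Re} z < b\}$ and suppose $f$ is analytic on $\Omega$, continuous on $\overline{\Omega}$, and bounded on $\Omega$. Define $M(x) \equiv \sup\{|f(z)| : \operatorname{Re} z = x\}$. Then for $x \in (a, b)$,
\[ M(x) \le M(a)^{\frac{b-x}{b-a}} M(b)^{\frac{x-a}{b-a}}. \]
Proof: Let $\varepsilon > 0$ and define
\[ g(z) \equiv (M(a) + \varepsilon)^{\frac{b-z}{b-a}} (M(b) + \varepsilon)^{\frac{z-a}{b-a}} \]
where for $M > 0$ and $z \in \mathbb{C}$, $M^z \equiv \exp(z \ln(M))$. Thus $g \neq 0$ and so $f/g$ is analytic on $\Omega$ and continuous on $\overline{\Omega}$. Also on the left side,
\[ \left| \frac{f(a + iy)}{g(a + iy)} \right| = \left| \frac{f(a + iy)}{(M(a) + \varepsilon)^{\frac{b-a-iy}{b-a}}} \right| = \left| \frac{f(a + iy)}{(M(a) + \varepsilon)^{\frac{b-a}{b-a}}} \right| \le 1 \]
while on the right side a similar computation shows $\left| \frac{f}{g} \right| \le 1$ also. Therefore, by Corollary 20.14 $|f/g| \le 1$ on $\Omega$. Therefore, letting $x + iy = z$,
\[ |f(z)| \le \left| (M(a) + \varepsilon)^{\frac{b-z}{b-a}} (M(b) + \varepsilon)^{\frac{z-a}{b-a}} \right| = (M(a) + \varepsilon)^{\frac{b-x}{b-a}} (M(b) + \varepsilon)^{\frac{x-a}{b-a}} \]
and so
\[ M(x) \le (M(a) + \varepsilon)^{\frac{b-x}{b-a}} (M(b) + \varepsilon)^{\frac{x-a}{b-a}}. \]
Since $\varepsilon > 0$ is arbitrary, it yields the conclusion of the corollary.
Another way of saying this is that $x \to \ln(M(x))$ is a convex function.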
This corollary has an interesting application known as the Hadamard three circles theorem.
20.3.2 Hadamard Three Circles Theorem
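Let $0 < R_1 < R_2$ and suppose $f$ is analytic on $\{z \in \mathbb{C} : R_1 < |z| < R_2\}$. Then letting $R_1 < a < b < R_2$, note that $g(z) \equiv \exp(z)$ maps the strip $\{z \in \mathbb{C} : \ln a < \operatorname{Re} z < \ln b\}$ onto $\{z \in \mathbb{C} : a < |z| < b\}$ and that in fact, $g$ maps the line $\ln r + iy$ onto the circle $re^{i\theta}$. Now let $M(x)$ be defined as above for the function $f \circ g$, that is $M(x) \equiv \sup_y \left| f\left( e^{x + iy} \right) \right|$, and let $m$ be defined by
\[ m(r) \equiv \max_\theta \left| f\left( re^{i\theta} \right) \right|. \]
Then for $a < r < b$, Corollary 20.15 implies
\[ m(r) = \sup_y \left| f\left( e^{\ln r + iy} \right) \right| = M(\ln r) \le M(\ln a)^{\frac{\ln b - \ln r}{\ln b - \ln a}} M(\ln b)^{\frac{\ln r - \ln a}{\ln b - \ln a}} = m(a)^{\ln(b/r)/\ln(b/a)} m(b)^{\ln(r/a)/\ln(b/a)} \]
and so
\[ m(r)^{\ln(b/a)} \le m(a)^{\ln(b/r)} m(b)^{\ln(r/a)}. \]
Taking logarithms, this yields
\[ \ln\left( \frac{b}{a} \right) \ln(m(r)) \le \ln\left( \frac{b}{r} \right) \ln(m(a)) + \ln\left( \frac{r}{a} \right) \ln(m(b)) \]
which says the same as $r \to \ln(m(r))$ is a convex function of $\ln r$.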
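The three circles inequality can be checked on a concrete function. The following sketch is only an illustration: the choice $f(z) = z + 1/z$ on the annulus $\frac{1}{2} < |z| < 2$ and the helper names `m` and `three_circles_gap` are assumptions made here, not taken from the notes. For this $f$ one has $m(r) = r + 1/r$, attained at $\theta = 0$, which is a sample point.

```python
import cmath
import math


def m(f, r: float, samples: int = 360) -> float:
    """Approximate m(r) = max over theta of |f(r e^{i theta})| by sampling."""
    return max(
        abs(f(r * cmath.exp(2j * math.pi * k / samples)))
        for k in range(samples)
    )


def f(z: complex) -> complex:
    """Hypothetical test function, analytic on the annulus 0 < |z| < infinity."""
    return z + 1 / z


def three_circles_gap(f, a: float, r: float, b: float) -> float:
    """Right side minus left side of the logarithmic three circles inequality."""
    lhs = math.log(b / a) * math.log(m(f, r))
    rhs = (math.log(b / r) * math.log(m(f, a))
           + math.log(r / a) * math.log(m(f, b)))
    return rhs - lhs


gaps = [three_circles_gap(f, 0.5, r, 2.0) for r in (0.6, 0.8, 1.0, 1.5, 1.9)]
print(gaps)
```

Every gap is nonnegative, consistent with $\ln(m(r)) = \ln(2\cosh(\ln r))$ being a convex function of $\ln r$ for this example.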
The next example, also in Rudin [38] is very dramatic. An unbelievably weak
assumption is made on the growth of the function and still you get a uniform bound
in the conclusion.
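Corollary 20.16 Let $\Omega = \left\{ z \in \mathbb{C} : |\operatorname{Im}(z)| < \frac{\pi}{2} \right\}$. Suppose $f$ is analytic on $\Omega$, continuous on $\overline{\Omega}$, and there exist constants, $\alpha < 1$ and $A < \infty$ such that
\[ |f(z)| \le \exp(A \exp(\alpha |x|)) \text{ for } z = x + iy \]
and
\[ \left| f\left( x \pm \frac{i\pi}{2} \right) \right| \le 1 \]
for all $x \in \mathbb{R}$. Then $|f(z)| \le 1$ on $\Omega$.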
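Proof: This time let $\phi(z) = [\exp(A \exp(\beta z)) \exp(A \exp(-\beta z))]^{-1}$ where $\alpha < \beta < 1$. Then $\phi(z) \neq 0$ on $\Omega$ and for $\eta > 0$
\[ |\phi(z)|^\eta = \frac{1}{|\exp(\eta A \exp(\beta z)) \exp(\eta A \exp(-\beta z))|}. \]
Now
\[ \exp(\eta A \exp(\beta z)) \exp(\eta A \exp(-\beta z)) = \exp(\eta A (\exp(\beta z) + \exp(-\beta z))) = \exp\left[ \eta A \left( \cos(\beta y) \left( e^{\beta x} + e^{-\beta x} \right) + i \sin(\beta y) \left( e^{\beta x} - e^{-\beta x} \right) \right) \right] \]
and so
\[ |\phi(z)|^\eta = \frac{1}{\exp\left[ \eta A \cos(\beta y) \left( e^{\beta x} + e^{-\beta x} \right) \right]}. \]
Now $\cos \beta y > 0$ because $\beta < 1$ and $|y| < \frac{\pi}{2}$. Therefore,
\[ \limsup_{z \to \infty} |f(z)| |\phi(z)|^\eta \le 0 \le 1 \]
and so by Theorem 20.12, $|f(z)| \le 1$.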
20.3.3 Schwarz's Lemma
This interesting lemma comes from the maximum modulus theorem. It will be used
later as part of the proof of the Riemann mapping theorem.
Lemma 20.17 Suppose $F : B(0, 1) \to B(0, 1)$, $F$ is analytic, and $F(0) = 0$. Then for all $z \in B(0, 1)$,
\[ |F(z)| \le |z|, \tag{20.4} \]
and
\[ |F'(0)| \le 1. \tag{20.5} \]
If equality holds in 20.5 then there exists $\lambda \in \mathbb{C}$ with $|\lambda| = 1$ and
\[ F(z) = \lambda z. \tag{20.6} \]
Proof: First note that by assumption, $F(z)/z$ has a removable singularity at 0 if its value at 0 is defined to be $F'(0)$. By the maximum modulus theorem, if $|z| < r < 1$,
\[ \left| \frac{F(z)}{z} \right| \le \max_{t \in [0, 2\pi]} \frac{\left| F\left( re^{it} \right) \right|}{r} \le \frac{1}{r}. \]
Then letting $r \to 1$,
\[ \left| \frac{F(z)}{z} \right| \le 1. \]
This shows 20.4 and it also verifies 20.5 on taking the limit as $z \to 0$. If equality holds in 20.5, then $|F(z)/z|$ achieves a maximum at an interior point so $F(z)/z$ equals a constant, $\lambda$ by the maximum modulus theorem. Since $F(z) = \lambda z$, it follows $F'(0) = \lambda$ and so $|\lambda| = 1$.
Rudin [38] gives a memorable description of what this lemma says. It says that if an analytic function maps the unit ball to itself, keeping 0 fixed, then it must do one of two things, either be a rotation or move all points closer to 0. (This second part follows in case $|F'(0)| < 1$ because in this case, you must have $|F(z)| \neq |z|$ and so by 20.4, $|F(z)| < |z|$.)
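As a sanity check of the lemma, here is a minimal sketch with a hypothetical test map $F(z) = z \cdot \frac{z - 1/2}{1 - z/2}$, which sends the unit ball into itself and fixes 0. The function names and the sampling grid are assumptions made for illustration. The code samples $|F(z)| \le |z|$ on a polar grid and estimates $F'(0)$ with a central difference.

```python
import cmath
import math


def F(z: complex) -> complex:
    """Hypothetical map of the unit ball into itself with F(0) = 0."""
    return z * (z - 0.5) / (1 - 0.5 * z)


def schwarz_violations(F, radii, angles: int = 24) -> int:
    """Count grid points in the unit ball where |F(z)| exceeds |z|."""
    count = 0
    for r in radii:
        for k in range(angles):
            z = r * cmath.exp(2j * math.pi * k / angles)
            if abs(F(z)) > abs(z) + 1e-12:
                count += 1
    return count


h = 1e-6
derivative_at_0 = (F(h) - F(-h)) / (2 * h)   # central difference estimate of F'(0)
violations = schwarz_violations(F, [0.1 * j for j in range(1, 10)])

print(abs(derivative_at_0), violations)
```

Here $|F'(0)| = \frac{1}{2} < 1$, so this $F$ is not a rotation and it moves every nonzero point strictly closer to 0.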
20.3.4 One To One Analytic Maps On The Unit Ball
The transformation in the next lemma is of fundamental importance.
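Lemma 20.18 Let $\alpha \in B(0, 1)$ and define
\[ \phi_\alpha(z) \equiv \frac{z - \alpha}{1 - \overline{\alpha} z}. \]
Then $\phi_\alpha : B(0, 1) \to B(0, 1)$, $\phi_\alpha : \partial B(0, 1) \to \partial B(0, 1)$, and is one to one and onto. Also $\phi_{-\alpha} = \phi_\alpha^{-1}$. Also
\[ \phi_\alpha'(0) = 1 - |\alpha|^2, \quad \phi_\alpha'(\alpha) = \frac{1}{1 - |\alpha|^2}. \]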
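Proof: First of all, for $|z| < 1/|\alpha|$,
\[ \phi_\alpha \circ \phi_{-\alpha}(z) \equiv \frac{\left( \frac{z + \alpha}{1 + \overline{\alpha} z} \right) - \alpha}{1 - \overline{\alpha} \left( \frac{z + \alpha}{1 + \overline{\alpha} z} \right)} = z \]
after a few computations. If I show that $\phi_\alpha$ maps $B(0, 1)$ to $B(0, 1)$ for all $|\alpha| < 1$, this will have shown that $\phi_\alpha$ is one to one and onto $B(0, 1)$.
Consider $\left| \phi_\alpha\left( e^{i\theta} \right) \right|$. This yields
\[ \left| \frac{e^{i\theta} - \alpha}{1 - \overline{\alpha} e^{i\theta}} \right| = \left| \frac{1 - \alpha e^{-i\theta}}{1 - \overline{\alpha} e^{i\theta}} \right| = 1 \]
where the first equality is obtained by multiplying by $\left| e^{-i\theta} \right| = 1$. Therefore, $\phi_\alpha$ maps $\partial B(0, 1)$ one to one and onto $\partial B(0, 1)$. Now notice that $\phi_\alpha$ is analytic on $B(0, 1)$ because the only singularity, a pole, is at $z = 1/\overline{\alpha}$. By the maximum modulus theorem, it follows
\[ |\phi_\alpha(z)| < 1 \]
whenever $|z| < 1$. The same is true of $\phi_{-\alpha}$.
It only remains to verify the assertions about the derivatives. Long division gives
\[ \phi_\alpha(z) = -(\overline{\alpha})^{-1} + \left( \frac{-\alpha + (\overline{\alpha})^{-1}}{1 - \overline{\alpha} z} \right) \]
and so
\[ \phi_\alpha'(z) = (-1)(1 - \overline{\alpha} z)^{-2} \left( -\alpha + (\overline{\alpha})^{-1} \right) (-\overline{\alpha}) = \overline{\alpha} (1 - \overline{\alpha} z)^{-2} \left( -\alpha + (\overline{\alpha})^{-1} \right) = (1 - \overline{\alpha} z)^{-2} \left( -|\alpha|^2 + 1 \right). \]
Hence the two formulas follow. This proves the lemma.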
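The assertions of the lemma can be spot checked numerically. In the sketch below, `phi` and `dphi` are made-up helper names, `dphi` uses the closed form for the derivative found in the proof, and the values of $\alpha$ and $z$ are arbitrary sample points in the unit ball.

```python
import cmath
import math


def phi(alpha: complex, z: complex) -> complex:
    """The map phi_alpha(z) = (z - alpha) / (1 - conj(alpha) z)."""
    return (z - alpha) / (1 - alpha.conjugate() * z)


def dphi(alpha: complex, z: complex) -> complex:
    """Derivative from the proof: (1 - |alpha|^2) / (1 - conj(alpha) z)^2."""
    return (1 - abs(alpha) ** 2) / (1 - alpha.conjugate() * z) ** 2


# Hypothetical sample values for illustration.
alpha = 0.4 - 0.3j
z = -0.2 + 0.7j

round_trip = phi(-alpha, phi(alpha, z))               # should recover z
on_circle = abs(phi(alpha, cmath.exp(1.234j)))        # should equal 1
h = 1e-6
numeric_d0 = (phi(alpha, h) - phi(alpha, -h)) / (2 * h)

print(round_trip, on_circle, numeric_d0, dphi(alpha, alpha))
```

With $|\alpha|^2 = 0.25$ the two derivative values are $0.75$ and $1/0.75$, as the lemma predicts.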
One reason these mappings are so important is the following theorem.
Theorem 20.19 Suppose $f$ is an analytic function defined on $B(0, 1)$ and $f$ maps $B(0, 1)$ one to one and onto $B(0, 1)$. Then there exists $\theta$ such that
\[ f(z) = e^{i\theta} \phi_\alpha(z) \]
for some $\alpha \in B(0, 1)$.
Proof: Let $f(\alpha) = 0$. Then $h(z) \equiv f \circ \phi_{-\alpha}(z)$ maps $B(0, 1)$ one to one and onto $B(0, 1)$ and has the property that $h(0) = 0$. Therefore, by the Schwarz lemma,
\[ |h(z)| \le |z|, \]
but it is also the case that $h^{-1}(0) = 0$ and $h^{-1}$ maps $B(0, 1)$ to $B(0, 1)$. Therefore, the same inequality holds for $h^{-1}$. Therefore,
\[ |z| = \left| h^{-1}(h(z)) \right| \le |h(z)| \]
and so $|h(z)| = |z|$. By the Schwarz lemma again, $h(z) \equiv f\left( \phi_{-\alpha}(z) \right) = e^{i\theta} z$. Replacing $z$ with $\phi_\alpha(z)$, you get $f(z) = e^{i\theta} \phi_\alpha(z)$.
20.4 Exercises
1. Consider the function, $g(z) = \frac{z - i}{z + i}$. Show this is analytic on the upper half plane, $P^+$ and maps the upper half plane one to one and onto $B(0, 1)$. Hint: First show $g$ maps the real axis to $\partial B(0, 1)$. This is really easy because you end up looking at a complex number divided by its conjugate. Thus $|g(z)| = 1$ for $z$ on $\partial(P^+)$. Now show that $\limsup_{z \to \infty} |g(z)| = 1$. Then apply a version of the maximum modulus theorem. You might note that $g(z) = 1 + \frac{-2i}{z + i}$. This will show $|g(z)| \le 1$. Next pick $w \in B(0, 1)$ and solve $g(z) = w$. You just have to show there exists a unique solution and its imaginary part is positive.
2. Does there exist an entire function f which maps C onto the upper half plane?
3. Letting $g$ be the function of Problem 1 show that $\left| \left( g^{-1} \right)'(0) \right| = 2$. Also note that $g^{-1}(0) = i$. Now suppose $f$ is an analytic function defined on the upper half plane which has the property that $|f(z)| \le 1$ and $f(i) = \beta$ where $|\beta| < 1$. Find an upper bound to $|f'(i)|$. Also find all functions, $f$ which satisfy the condition, $f(i) = \beta$, $|f(z)| \le 1$, and achieve this maximum value. Hint: You could consider the function, $h(z) \equiv \phi_\beta \circ f \circ g^{-1}(z)$ and check the conditions for the Schwarz lemma for this function, $h$.
4. This and the next two problems follow a presentation of an interesting topic in Rudin [38]. Let $\phi_\alpha$ be given in Lemma 20.18. Suppose $f$ is an analytic function defined on $B(0, 1)$ which satisfies $|f(z)| \le 1$. Suppose also there are $\alpha, \beta \in B(0, 1)$ and it is required $f(\alpha) = \beta$. If $f$ is such a function, show that $|f'(\alpha)| \le \frac{1 - |\beta|^2}{1 - |\alpha|^2}$. Hint: To show this consider $g = \phi_\beta \circ f \circ \phi_{-\alpha}$. Show $g(0) = 0$ and $|g(z)| \le 1$ on $B(0, 1)$. Now use Lemma 20.17.
5. In Problem 4 show there exists a function, $f$ analytic on $B(0, 1)$ such that $f(\alpha) = \beta$, $|f(z)| \le 1$, and $|f'(\alpha)| = \frac{1 - |\beta|^2}{1 - |\alpha|^2}$. Hint: You do this by choosing $g$ in the above problem such that equality holds in Lemma 20.17. Thus you need $g(z) = \lambda z$ where $|\lambda| = 1$ and solve $g = \phi_\beta \circ f \circ \phi_{-\alpha}$ for $f$.
6. Suppose that $f : B(0, 1) \to B(0, 1)$ and that $f$ is analytic, one to one, and onto with $f(\alpha) = 0$. Show there exists $\lambda$, $|\lambda| = 1$ such that $f(z) = \lambda \phi_\alpha(z)$. This gives a different way to look at Theorem 20.19. Hint: Let $g = f^{-1}$. Then $g'(0) f'(\alpha) = 1$. However, $f(\alpha) = 0$ and $g(0) = \alpha$. From Problem 4 with $\beta = 0$, you can conclude an inequality for $|f'(\alpha)|$ and another one for $|g'(0)|$. Then use the fact that the product of these two equals 1 which comes from the chain rule to conclude that equality must take place. Now use Problem 5 to obtain the form of $f$.
7. In Corollary 20.16 show that it is essential that $\alpha < 1$. That is, show there exists an example where the conclusion is not satisfied with a slightly weaker growth condition. Hint: Consider $\exp(\exp(z))$.
8. Suppose $\{f_n\}$ is a sequence of functions which are analytic on $\Omega$, a bounded region such that each $f_n$ is also continuous on $\overline{\Omega}$. Suppose that $\{f_n\}$ converges uniformly on $\partial\Omega$. Show that then $\{f_n\}$ converges uniformly on $\overline{\Omega}$ and that the function to which the sequence converges is analytic on $\Omega$ and continuous on $\overline{\Omega}$.
9. Suppose $\Omega$ is a bounded region and there exists a point $z_0 \in \Omega$ such that $|f(z_0)| = \min\left\{ |f(z)| : z \in \overline{\Omega} \right\}$. Can you conclude $f$ must equal a constant?
10. Suppose $f$ is continuous on $\overline{B(a, r)}$ and analytic on $B(a, r)$ and that $f$ is not constant. Suppose also $|f(z)| = C \neq 0$ for all $|z - a| = r$. Show that there exists $\alpha \in B(a, r)$ such that $f(\alpha) = 0$. Hint: If not, consider $f/C$ and $C/f$. Both would be analytic on $B(a, r)$ and have modulus equal to 1 on the boundary.
11. Suppose $f$ is analytic on $B(0, 1)$ but for every $a \in \partial B(0, 1)$, $\lim_{z \to a} |f(z)| = \infty$. Show there exists a sequence, $\{z_n\} \subseteq B(0, 1)$ such that $\lim_{n \to \infty} |z_n| = 1$ and $f(z_n) = 0$.
20.5 Counting Zeros
The above proof of the open mapping theorem relies on the very important inverse
function theorem from real analysis. There are other approaches to this important
theorem which do not rely on the big theorems from real analysis and are more
oriented toward the use of the Cauchy integral formula and specialized techniques
from complex analysis. One of these approaches is given next which involves the
notion of counting zeros. The next theorem is the one about counting zeros. It
will also be used later in the proof of the Riemann mapping theorem.
Theorem 20.20 Let $\Omega$ be an open set in $\mathbb{C}$ and let $\gamma : [a, b] \to \Omega$ be closed, continuous, bounded variation, and $n(\gamma, z) = 0$ for all $z \notin \Omega$. Suppose also that $f$ is analytic on $\Omega$ having zeros $a_1, \cdots, a_m$ where the zeros are repeated according to multiplicity, and suppose that none of these zeros are on $\gamma^*$. Then
\[ \frac{1}{2\pi i} \int_\gamma \frac{f'(z)}{f(z)} \, dz = \sum_{k=1}^m n(\gamma, a_k). \]
Proof: Let $f(z) = \prod_{j=1}^m (z - a_j) g(z)$ where $g(z) \neq 0$ on $\Omega$. Hence
\[ \frac{f'(z)}{f(z)} = \sum_{j=1}^m \frac{1}{z - a_j} + \frac{g'(z)}{g(z)} \]
and so
\[ \frac{1}{2\pi i} \int_\gamma \frac{f'(z)}{f(z)} \, dz = \sum_{j=1}^m n(\gamma, a_j) + \frac{1}{2\pi i} \int_\gamma \frac{g'(z)}{g(z)} \, dz. \]
But the function, $z \to \frac{g'(z)}{g(z)}$ is analytic and so by Corollary 19.47, the last integral in the above expression equals 0. Therefore, this proves the theorem.
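The formula of Theorem 20.20 can be tried numerically when $\gamma$ is the unit circle traversed once counterclockwise, so that $n(\gamma, a_k)$ is 1 for zeros inside and 0 for zeros outside. The sketch below is an illustration only: the polynomial, the helper name `count_zeros_in_unit_disc`, and the use of an equally spaced Riemann sum for the contour integral are all assumptions made here.

```python
import cmath
import math


def count_zeros_in_unit_disc(f, df, samples: int = 2000) -> complex:
    """Approximate (1 / (2 pi i)) * integral of f'/f over the unit circle."""
    total = 0j
    for k in range(samples):
        z = cmath.exp(2j * math.pi * k / samples)
        dz = 1j * z * (2 * math.pi / samples)   # gamma'(t) dt for gamma(t) = e^{it}
        total += df(z) / f(z) * dz
    return total / (2j * math.pi)


def f(z: complex) -> complex:
    """Hypothetical polynomial: double zero at 0.2, simple zeros at -0.3i and 3."""
    return (z - 0.2) ** 2 * (z + 0.3j) * (z - 3)


def df(z: complex) -> complex:
    """Derivative of f by the product rule."""
    return (2 * (z - 0.2) * (z + 0.3j) * (z - 3)
            + (z - 0.2) ** 2 * (z - 3)
            + (z - 0.2) ** 2 * (z + 0.3j))


zero_count = count_zeros_in_unit_disc(f, df)
print(zero_count)
```

The double zero at $0.2$ is counted twice and the zero at $3$, which lies outside the circle, contributes nothing, so the value is close to 3.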
The following picture is descriptive of the situation described in the next theorem.
[Figure: the points $a_1, a_2, a_3$ and the map $f$ taking the curve to the image curve $f(\gamma([a, b]))$.]
Theorem 20.21 Let $\Omega$ be a region, let $\gamma : [a, b] \to \Omega$ be closed, continuous, and bounded variation such that $n(\gamma, z) = 0$ for all $z \notin \Omega$. Also suppose $f : \Omega \to \mathbb{C}$ is analytic and that $\alpha \notin f(\gamma^*)$. Then $f \circ \gamma : [a, b] \to \mathbb{C}$ is continuous, closed, and bounded variation. Also suppose $\{a_1, \cdots, a_m\} = f^{-1}(\alpha)$ where these points are counted according to their multiplicities as zeros of the function $f - \alpha$. Then
\[ n(f \circ \gamma, \alpha) = \sum_{k=1}^m n(\gamma, a_k). \]
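Proof: It is clear that $f \circ \gamma$ is continuous. It only remains to verify that it is of bounded variation. Suppose first that $\gamma^* \subseteq B \subseteq \overline{B} \subseteq \Omega$ where $B$ is a ball. Then
\[ |f(\gamma(t)) - f(\gamma(s))| = \left| \int_0^1 f'(\gamma(s) + \lambda(\gamma(t) - \gamma(s))) (\gamma(t) - \gamma(s)) \, d\lambda \right| \le C |\gamma(t) - \gamma(s)| \]
where $C \ge \max\left\{ |f'(z)| : z \in \overline{B} \right\}$. Hence, in this case,
\[ V(f \circ \gamma, [a, b]) \le C V(\gamma, [a, b]). \]
Now let $\varepsilon$ denote the distance between $\gamma^*$ and $\mathbb{C} \setminus \Omega$. Since $\gamma^*$ is compact, $\varepsilon > 0$. By uniform continuity there exists $\delta = \frac{b - a}{p}$ for $p$ a positive integer such that if $|s - t| < \delta$, then $|\gamma(s) - \gamma(t)| < \frac{\varepsilon}{2}$. Then
\[ \gamma([t, t + \delta]) \subseteq B\left( \gamma(t), \frac{\varepsilon}{2} \right) \subseteq \Omega. \]
Let $C \ge \max\left\{ |f'(z)| : z \in \bigcup_{j=1}^p \overline{B\left( \gamma(t_j), \frac{\varepsilon}{2} \right)} \right\}$ where $t_j \equiv \frac{j}{p}(b - a) + a$. Then from what was just shown,
\[ V(f \circ \gamma, [a, b]) \le \sum_{j=0}^{p-1} V(f \circ \gamma, [t_j, t_{j+1}]) \le C \sum_{j=0}^{p-1} V(\gamma, [t_j, t_{j+1}]) < \infty \]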
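showing that $f \circ \gamma$ is bounded variation as claimed. Now from Theorem 19.42 there exists $\eta \in C^1([a, b])$ such that
\[ \eta(a) = \gamma(a) = \gamma(b) = \eta(b), \quad \eta([a, b]) \subseteq \Omega, \]
and
\[ n(\eta, a_k) = n(\gamma, a_k), \quad n(f \circ \gamma, \alpha) = n(f \circ \eta, \alpha) \tag{20.7} \]
for $k = 1, \cdots, m$. Then
\[ n(f \circ \gamma, \alpha) = n(f \circ \eta, \alpha) = \frac{1}{2\pi i} \int_{f \circ \eta} \frac{dw}{w - \alpha} = \frac{1}{2\pi i} \int_a^b \frac{f'(\eta(t))}{f(\eta(t)) - \alpha} \eta'(t) \, dt = \frac{1}{2\pi i} \int_\eta \frac{f'(z)}{f(z) - \alpha} \, dz = \sum_{k=1}^m n(\eta, a_k) \]
by Theorem 20.20. By 20.7, this equals $\sum_{k=1}^m n(\gamma, a_k)$ which proves the theorem.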
The next theorem is incredible and is very interesting for its own sake. The
following picture is descriptive of the situation of this theorem.
[Figure: $f$ maps the points $a_1, a_2, a_3, a_4$ in $B(a, \varepsilon)$ to a single point $z$ in $B(\alpha, \delta)$.]
Theorem 20.22 Let $f : B(a, R) \to \mathbb{C}$ be analytic and let
\[ f(z) - \alpha = (z - a)^m g(z), \quad \infty > m \ge 1 \]
where $g(z) \neq 0$ in $B(a, R)$. ($f(z) - \alpha$ has a zero of order $m$ at $z = a$.) Then there exist $\varepsilon, \delta > 0$ with the property that for each $z$ satisfying $0 < |z - \alpha| < \delta$, there exist points,
\[ \{a_1, \cdots, a_m\} \subseteq B(a, \varepsilon), \]
such that
\[ f^{-1}(z) \cap B(a, \varepsilon) = \{a_1, \cdots, a_m\} \]
and each $a_k$ is a zero of order 1 for the function $f(\cdot) - z$.
Proof: By Theorem 19.23 $f$ is not constant on $B(a, R)$ because $f - \alpha$ has a zero of order $m$. Therefore, using this theorem again, there exists $\varepsilon > 0$ such that $\overline{B(a, 2\varepsilon)} \subseteq B(a, R)$ and there are no solutions to the equation $f(z) - \alpha = 0$ for $z \in \overline{B(a, 2\varepsilon)}$ except $a$. Also assume $\varepsilon$ is small enough that for $0 < |z - a| \le 2\varepsilon$, $f'(z) \neq 0$. This can be done since otherwise, $a$ would be a limit point of a sequence of points, $z_n$, having $f'(z_n) = 0$ which would imply, by Theorem 19.23 that $f' = 0$ on $B(a, R)$, contradicting the assumption that $f - \alpha$ has a zero of order $m$ and is therefore not constant. Thus the situation is described by the following picture.
[Figure: the ball of radius $2\varepsilon$ about $a$, on which $f - \alpha \neq 0$ and $f' \neq 0$ away from $a$.]
Now pick $\gamma(t) = a + \varepsilon e^{it}$, $t \in [0, 2\pi]$. Then $\alpha \notin f(\gamma^*)$ so there exists $\delta > 0$ with
\[ B(\alpha, \delta) \cap f(\gamma^*) = \emptyset. \tag{20.8} \]
Therefore, $B(\alpha, \delta)$ is contained in one component of $\mathbb{C} \setminus f(\gamma([0, 2\pi]))$. Therefore, $n(f \circ \gamma, \alpha) = n(f \circ \gamma, z)$ for all $z \in B(\alpha, \delta)$. Now consider $f$ restricted to $B(a, 2\varepsilon)$. For $z \in B(\alpha, \delta)$, $f^{-1}(z)$ must consist of a finite set of points because $f'(w) \neq 0$ for all $w$ in $\overline{B(a, 2\varepsilon)} \setminus \{a\}$, implying that the zeros of $f(\cdot) - z$ in $\overline{B(a, 2\varepsilon)}$ have no limit point. Since $\overline{B(a, 2\varepsilon)}$ is compact, this means there are only finitely many. By Theorem 20.21,
\[ n(f \circ \gamma, z) = \sum_{k=1}^p n(\gamma, a_k) \tag{20.9} \]
where $\{a_1, \cdots, a_p\} = f^{-1}(z)$. Each point, $a_k$ of $f^{-1}(z)$ is either inside the circle traced out by $\gamma$, yielding $n(\gamma, a_k) = 1$, or it is outside this circle yielding $n(\gamma, a_k) = 0$ because of 20.8. It follows the sum in 20.9 reduces to the number of points of $f^{-1}(z)$ which are contained in $B(a, \varepsilon)$. Thus, letting those points in $f^{-1}(z)$ which are contained in $B(a, \varepsilon)$ be denoted by $\{a_1, \cdots, a_r\}$,
\[ n(f \circ \gamma, \alpha) = n(f \circ \gamma, z) = r. \]
Also, by Theorem 20.20, $m = n(f \circ \gamma, \alpha)$ because $a$ is a zero of $f - \alpha$ of order $m$. Therefore, for $z \in B(\alpha, \delta)$,
\[ m = n(f \circ \gamma, \alpha) = n(f \circ \gamma, z) = r. \]
Therefore, $r = m$. Each of these $a_k$ is a zero of order 1 of the function $f(\cdot) - z$ because $f'(a_k) \neq 0$. This proves the theorem.
This is a very fascinating result partly because it implies that for values of $f$ near a value, $\alpha$, at which $f(\cdot) - \alpha$ has a zero of order $m$ for $m > 1$, the inverse image of these values includes at least $m$ points, not just one. Thus the topological properties of the inverse image change radically. This theorem also shows that $f(B(a, \varepsilon)) \supseteq B(\alpha, \delta)$.
Theorem 20.23 (open mapping theorem) Let $\Omega$ be a region and $f : \Omega \to \mathbb{C}$ be analytic. Then $f(\Omega)$ is either a point or a region. If $f$ is one to one, then $f^{-1} : f(\Omega) \to \Omega$ is analytic.
Proof: If $f$ is not constant, then for every $\alpha \in f(\Omega)$, it follows from Theorem 19.23 that $f(\cdot) - \alpha$ has a zero of order $m < \infty$ and so from Theorem 20.22 for each $a \in \Omega$ there exist $\varepsilon, \delta > 0$ such that $f(B(a, \varepsilon)) \supseteq B(\alpha, \delta)$ which clearly implies that $f$ maps open sets to open sets. Therefore, $f(\Omega)$ is open, and it is connected because $f$ is continuous. If $f$ is one to one, Theorem 20.22 implies that for every $\alpha \in f(\Omega)$ the zero of $f(\cdot) - \alpha$ is of order 1. Otherwise, that theorem implies that for $z$ near $\alpha$, there are $m$ points which $f$ maps to $z$, contradicting the assumption that $f$ is one to one. Therefore, $f'(z) \neq 0$ and since $f^{-1}$ is continuous, due to $f$ being an open map, it follows
\[ \left( f^{-1} \right)'(f(z)) = \lim_{f(z_1) \to f(z)} \frac{f^{-1}(f(z_1)) - f^{-1}(f(z))}{f(z_1) - f(z)} = \lim_{z_1 \to z} \frac{z_1 - z}{f(z_1) - f(z)} = \frac{1}{f'(z)}. \]
20.6 An Application To Linear Algebra
Gerschgorin's theorem gives a convenient way to estimate eigenvalues of a matrix from easy to obtain information. For $A$ an $n \times n$ matrix, denote by $\sigma(A)$ the collection of all eigenvalues of $A$.
Theorem 20.24 Let $A$ be an $n \times n$ matrix. Consider the $n$ Gerschgorin discs defined as
\[ D_i \equiv \left\{ \lambda \in \mathbb{C} : |\lambda - a_{ii}| \le \sum_{j \neq i} |a_{ij}| \right\}. \]
Then every eigenvalue is contained in some Gerschgorin disc.
This theorem says to add up the absolute values of the entries of the $i^{th}$ row which are off the main diagonal and form the disc centered at $a_{ii}$ having this radius. The union of these discs contains $\sigma(A)$.
Proof: Suppose $Ax = \lambda x$ where $x \neq 0$. Then for $A = (a_{ij})$,
\[ \sum_{j \neq i} a_{ij} x_j = (\lambda - a_{ii}) x_i. \]
Therefore, if we pick $k$ such that $|x_k| \ge |x_j|$ for all $x_j$, it follows that $|x_k| \neq 0$ since $|x| \neq 0$ and
\[ |x_k| \sum_{j \neq k} |a_{kj}| \ge \sum_{j \neq k} |a_{kj}| |x_j| \ge |\lambda - a_{kk}| |x_k|. \]
Now dividing by $|x_k|$ we see that $\lambda$ is contained in the $k^{th}$ Gerschgorin disc.
More can be said using the theory about counting zeros. To begin with, define the distance between two $n \times n$ matrices, $A = (a_{ij})$ and $B = (b_{ij})$ as follows.
\[ ||A - B||^2 \equiv \sum_{ij} |a_{ij} - b_{ij}|^2. \]
Thus two matrices are close if and only if their corresponding entries are close.
Let $A$ be an $n \times n$ matrix. Recall the eigenvalues of $A$ are given by the zeros of the polynomial, $p_A(z) = \det(zI - A)$ where $I$ is the $n \times n$ identity. Then small changes in $A$ will produce small changes in $p_A(z)$ and $p_A'(z)$. Let $\gamma_k$ denote a very small closed circle which winds around $z_k$, one of the eigenvalues of $A$, in the counter clockwise direction so that $n(\gamma_k, z_k) = 1$. This circle is to enclose only $z_k$ and is to have no other eigenvalue on it. Then apply Theorem 20.20. According to this theorem
\[ \frac{1}{2\pi i} \int_{\gamma_k} \frac{p_A'(z)}{p_A(z)} \, dz \]
is always an integer equal to the multiplicity of $z_k$ as a root of $p_A(t)$. Therefore, small changes in $A$ result in no change to the above contour integral because it must be an integer and small changes in $A$ result in small changes in the integral. Therefore whenever every entry of the matrix $B$ is close enough to the corresponding entry of the matrix $A$, the two characteristic polynomials have the same number of zeros inside $\gamma_k$ under the usual convention that zeros are to be counted according to multiplicity. By making the radius of the small circle equal to $\varepsilon$ where $\varepsilon$ is less than the minimum distance between any two distinct eigenvalues of $A$, this shows that if $B$ is close enough to $A$, every eigenvalue of $B$ is closer than $\varepsilon$ to some eigenvalue of $A$. The next theorem is about continuous dependence of eigenvalues.
Theorem 20.25 If $\lambda$ is an eigenvalue of $A$, then if $||B - A||$ is small enough, some eigenvalue of $B$ will be within $\varepsilon$ of $\lambda$.
Consider the situation that $A(t)$ is an $n \times n$ matrix and that $t \to A(t)$ is continuous for $t \in [0, 1]$.
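Lemma 20.26 Let $\lambda(t) \in \sigma(A(t))$ for $t < 1$ and let $\Sigma_t = \bigcup_{s \ge t} \sigma(A(s))$. Also let $K_t$ be the connected component of $\lambda(t)$ in $\Sigma_t$. Then there exists $\eta > 0$ such that $K_t \cap \sigma(A(s)) \neq \emptyset$ for all $s \in [t, t + \eta]$.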
Proof: Denote by $D(\lambda(t), \delta)$ the disc centered at $\lambda(t)$ having radius $\delta > 0$, with other occurrences of this notation being defined similarly. Thus
\[ D(\lambda(t), \delta) \equiv \{z \in \mathbb{C} : |\lambda(t) - z| \le \delta\}. \]
Suppose $\delta > 0$ is small enough that $\lambda(t)$ is the only element of $\sigma(A(t))$ contained in $D(\lambda(t), \delta)$ and that $p_{A(t)}$ has no zeroes on the boundary of this disc. Then by continuity, and the above discussion and theorem, there exists $\eta > 0$, $t + \eta < 1$, such that for $s \in [t, t + \eta]$, $p_{A(s)}$ also has no zeroes on the boundary of this disc and that $A(s)$ has the same number of eigenvalues, counted according to multiplicity, in the disc as $A(t)$. Thus $\sigma(A(s)) \cap D(\lambda(t), \delta) \neq \emptyset$ for all $s \in [t, t + \eta]$. Now let
\[ H = \bigcup_{s \in [t, t + \eta]} \sigma(A(s)) \cap D(\lambda(t), \delta). \]
I will show $H$ is connected. Suppose not. Then $H = P \cup Q$ where $P, Q$ are separated and $\lambda(t) \in P$. Let
\[ s_0 \equiv \inf\{s : \lambda(s) \in Q \text{ for some } \lambda(s) \in \sigma(A(s))\}. \]
There exists $\lambda(s_0) \in \sigma(A(s_0)) \cap D(\lambda(t), \delta)$. If $\lambda(s_0) \notin Q$, then from the above discussion there are
\[ \lambda(s) \in \sigma(A(s)) \cap Q \]
for $s > s_0$ arbitrarily close to $\lambda(s_0)$. Therefore, $\lambda(s_0) \in Q$ which shows that $s_0 > t$ because $\lambda(t)$ is the only element of $\sigma(A(t))$ in $D(\lambda(t), \delta)$ and $\lambda(t) \in P$. Now let $s_n \uparrow s_0$. Then $\lambda(s_n) \in P$ for any
\[ \lambda(s_n) \in \sigma(A(s_n)) \cap D(\lambda(t), \delta) \]
and from the above discussion, for some choice of $s_n \to s_0$, $\lambda(s_n) \to \lambda(s_0)$ which contradicts $P$ and $Q$ separated and nonempty. Since $P$ is nonempty, this shows $Q = \emptyset$. Therefore, $H$ is connected as claimed. But $K_t \supseteq H$ and so $K_t \cap \sigma(A(s)) \neq \emptyset$ for all $s \in [t, t + \eta]$. This proves the lemma.
The following is the necessary theorem.
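Theorem 20.27 Suppose $A(t)$ is an $n \times n$ matrix and that $t \to A(t)$ is continuous for $t \in [0, 1]$. Let $\lambda(0) \in \sigma(A(0))$ and define $\Sigma \equiv \bigcup_{t \in [0, 1]} \sigma(A(t))$. Let $K_{\lambda(0)} = K_0$ denote the connected component of $\lambda(0)$ in $\Sigma$. Then $K_0 \cap \sigma(A(t)) \neq \emptyset$ for all $t \in [0, 1]$.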
Proof: Let $S \equiv \{t \in [0, 1] : K_0 \cap \sigma(A(s)) \neq \emptyset \text{ for all } s \in [0, t]\}$. Then $0 \in S$. Let $t_0 = \sup(S)$. Say $\sigma(A(t_0)) = \{\lambda_1(t_0), \cdots, \lambda_r(t_0)\}$. I claim at least one of these is a limit point of $K_0$ and consequently must be in $K_0$ which will show that $S$ has a last point. Why is this claim true? Let $s_n \uparrow t_0$ so $s_n \in S$. Now let the discs, $D(\lambda_i(t_0), \delta)$, $i = 1, \cdots, r$ be disjoint with $p_{A(t_0)}$ having no zeroes on $\gamma_i$, the boundary of $D(\lambda_i(t_0), \delta)$. Then for $n$ large enough it follows from Theorem 20.20 and the discussion following it that $\sigma(A(s_n))$ is contained in $\bigcup_{i=1}^r D(\lambda_i(t_0), \delta)$. Therefore, $K_0 \cap (\sigma(A(t_0)) + D(0, \delta)) \neq \emptyset$ for all $\delta$ small enough. This requires at least one of the $\lambda_i(t_0)$ to be in $K_0$. Therefore, $t_0 \in S$ and $S$ has a last point.
Now by Lemma 20.26, applied with $t = t_0$, if $t_0 < 1$, then $K_0 \cup K_t$ would be a strictly larger connected set containing $\lambda(0)$. (The reason this would be strictly larger is that $K_0 \cap \sigma(A(s)) = \emptyset$ for some $s \in (t, t + \eta)$ while $K_t \cap \sigma(A(s)) \neq \emptyset$ for all $s \in [t, t + \eta]$.) Therefore, $t_0 = 1$ and this proves the theorem.
The following is an interesting corollary of the Gerschgorin theorem.
Corollary 20.28 Suppose one of the Gerschgorin discs, $D_i$ is disjoint from the union of the others. Then $D_i$ contains an eigenvalue of $A$. Also, if there are $n$ disjoint Gerschgorin discs, then each one contains an eigenvalue of $A$.
Proof: Denote by $A(t)$ the matrix $\left( a_{ij}^t \right)$ where if $i \neq j$, $a_{ij}^t = t a_{ij}$ and $a_{ii}^t = a_{ii}$. Thus to get $A(t)$ we multiply all non diagonal terms by $t$. Let $t \in [0, 1]$. Then $A(0) = \operatorname{diag}(a_{11}, \cdots, a_{nn})$ and $A(1) = A$. Furthermore, the map, $t \to A(t)$ is continuous. Denote by $D_j^t$ the Gerschgorin disc obtained from the $j^{th}$ row for the matrix, $A(t)$. Then it is clear that $D_j^t \subseteq D_j$, the $j^{th}$ Gerschgorin disc for $A$. Then $a_{ii}$ is the eigenvalue for $A(0)$ which is contained in the disc, consisting of the single point $a_{ii}$ which is contained in $D_i$. Letting $K$ be the connected component in $\Sigma$ for $\Sigma$ defined in Theorem 20.27 which is determined by $a_{ii}$, it follows by Gerschgorin's theorem that $K \cap \sigma(A(t)) \subseteq \bigcup_{j=1}^n D_j^t \subseteq \bigcup_{j=1}^n D_j = D_i \cup \left( \bigcup_{j \neq i} D_j \right)$ and also, since $K$ is connected, $K$ cannot have points in both $D_i$ and $\left( \bigcup_{j \neq i} D_j \right)$. Since at least one point of $K$ is in $D_i$, namely $a_{ii}$, it follows all of $K$ must be contained in $D_i$. Now by Theorem 20.27 this shows there are points of $K \cap \sigma(A)$ in $D_i$. The last assertion follows immediately.
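The homotopy $A(t)$ used in this proof can be followed numerically for a small matrix. The sketch below is an illustration under stated assumptions: the $2 \times 2$ matrix is hypothetical, its two Gerschgorin discs are disjoint, the helper names are made up, and the eigenvalues come from the quadratic formula. It scales the off diagonal entries by $t$ and counts how many eigenvalues of $A(t)$ lie in each disc of $A$ as $t$ moves from 0 to 1.

```python
import cmath


def scaled(A, t: float):
    """The matrix A(t): diagonal entries kept, off diagonal entries multiplied by t."""
    n = len(A)
    return [[A[i][j] if i == j else t * A[i][j] for j in range(n)] for i in range(n)]


def eigenvalues_2x2(A):
    """Roots of z^2 - tr(A) z + det(A) for a 2 x 2 matrix A."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    s = cmath.sqrt(tr * tr - 4 * det)
    return [(tr + s) / 2, (tr - s) / 2]


def counts_per_disc(A, t: float):
    """Number of eigenvalues of A(t) in each Gerschgorin disc of A."""
    n = len(A)
    discs = [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i)) for i in range(n)]
    eigs = eigenvalues_2x2(scaled(A, t))
    return [sum(1 for lam in eigs if abs(lam - c) <= r) for c, r in discs]


# Hypothetical matrix whose two discs, around 1 and around 5, are disjoint.
A = [[1, 0.2], [0.3, 5]]
history = [counts_per_disc(A, k / 10) for k in range(11)]

print(history)
```

Each disc holds exactly one eigenvalue of $A(t)$ for every sampled $t$: the eigenvalue starting at $a_{ii}$ cannot leave $D_i$ along the way.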
Actually, this can be improved slightly. It involves the following lemma.
Lemma 20.29 In the situation of Theorem 20.27 suppose $\lambda(0) = K_0 \cap \sigma(A(0))$ and that $\lambda(0)$ is a simple root of the characteristic equation of $A(0)$. Then for all $t \in [0, 1]$,
\[ \sigma(A(t)) \cap K_0 = \lambda(t) \]
where $\lambda(t)$ is a simple root of the characteristic equation of $A(t)$.
Proof: Let $S \equiv \{t \in [0, 1] : K_0 \cap \sigma(A(s)) = \lambda(s), \text{ a simple eigenvalue for all } s \in [0, t]\}$. Then $0 \in S$ so it is nonempty. Let $t_0 = \sup(S)$ and suppose $\lambda_1 \neq \lambda_2$ are two elements of $\sigma(A(t_0)) \cap K_0$. Then choosing $\eta > 0$ small enough, and letting $D_i$ be disjoint discs containing $\lambda_i$ respectively, similar arguments to those of Lemma 20.26 imply
\[ H_i \equiv \bigcup_{s \in [t_0 - \eta, t_0]} \sigma(A(s)) \cap D_i \]
is a connected and nonempty set for $i = 1, 2$ which would require that $H_i \subseteq K_0$. But then there would be two different eigenvalues of $A(s)$ contained in $K_0$, contrary to the definition of $t_0$. Therefore, there is at most one eigenvalue, $\lambda(t_0) \in K_0 \cap \sigma(A(t_0))$. The possibility that it could be a repeated root of the characteristic equation must be ruled out. Suppose then that $\lambda(t_0)$ is a repeated root of the characteristic equation. As before, choose a small disc, $D$ centered at $\lambda(t_0)$ and $\eta$ small enough that
\[ H \equiv \bigcup_{s \in [t_0 - \eta, t_0]} \sigma(A(s)) \cap D \]
is a nonempty connected set containing either multiple eigenvalues of $A(s)$ or else a single repeated root to the characteristic equation of $A(s)$. But since $H$ is connected and contains $\lambda(t_0)$ it must be contained in $K_0$ which contradicts the condition for $s \in S$ for all these $s \in [t_0 - \eta, t_0]$. Therefore, $t_0 \in S$ as hoped. If $t_0 < 1$, there exists a small disc centered at $\lambda(t_0)$ and $\eta > 0$ such that for all $s \in [t_0, t_0 + \eta]$, $A(s)$ has only simple eigenvalues in $D$ and the only eigenvalues of $A(s)$ which could be in $K_0$ are in $D$. (This last assertion follows from noting that $\lambda(t_0)$ is the only eigenvalue of $A(t_0)$ in $K_0$ and so the others are at a positive distance from $K_0$. For $s$ close enough to $t_0$, the eigenvalues of $A(s)$ are either close to these eigenvalues of $A(t_0)$ at a positive distance from $K_0$ or they are close to the eigenvalue, $\lambda(t_0)$ in which case it can be assumed they are in $D$.) But this shows that $t_0$ is not really an upper bound to $S$. Therefore, $t_0 = 1$ and the lemma is proved.
With this lemma, the conclusion of the above corollary can be improved.
Corollary 20.30 Suppose one of the Gerschgorin discs, $D_i$, is disjoint from the union of the others. Then $D_i$ contains exactly one eigenvalue of $A$ and this eigenvalue is a simple root to the characteristic polynomial of $A$.
Proof: In the proof of Corollary 20.28, first note that $a_{ii}$ is a simple root of the characteristic equation of $A(0)$ since otherwise the $i^{th}$ Gerschgorin disc would not be disjoint from the others. Also, $K$, the connected component determined by $a_{ii}$, must be contained in $D_i$ because it is connected and by Gerschgorin's theorem above, $K \cap \sigma(A(t))$ must be contained in the union of the Gerschgorin discs. Since all the other eigenvalues of $A(0)$, the $a_{jj}$, are outside $D_i$, it follows that $K \cap \sigma(A(0)) = a_{ii}$. Therefore, by Lemma 20.29, $K \cap \sigma(A(1)) = K \cap \sigma(A)$ consists of a single simple eigenvalue. This proves the corollary.
Example 20.31 Consider the matrix,
\[ \begin{pmatrix} 5 & 1 & 0 \\ 1 & 1 & 1 \\ 0 & 1 & 0 \end{pmatrix} \]
The Gerschgorin discs are $D(5,1)$, $D(1,2)$, and $D(0,1)$. Then $D(5,1)$ is disjoint from the other discs. Therefore, there should be an eigenvalue in $D(5,1)$. The actual eigenvalues are not easy to find. They are the roots of the characteristic equation, $t^3 - 6t^2 + 3t + 5 = 0$. The numerical values of these are $-0.66966$, $1.4231$, and $5.24655$, verifying the predictions of Gerschgorin's theorem.
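The prediction in Example 20.31 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `gerschgorin_discs` is made up for this illustration and is not part of the notes.

```python
import numpy as np

# The matrix of Example 20.31.
A = np.array([[5.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 0.0]])

def gerschgorin_discs(M):
    """Return (center, radius) for each row disc D(a_ii, sum_{j != i} |a_ij|)."""
    centers = np.diag(M)
    radii = np.sum(np.abs(M), axis=1) - np.abs(centers)
    return [(float(c), float(r)) for c, r in zip(centers, radii)]

discs = gerschgorin_discs(A)

# A is real symmetric, so eigvalsh applies and returns sorted real eigenvalues.
eigenvalues = np.linalg.eigvalsh(A)

# Number of eigenvalues lying in each disc.
counts = [int(np.sum(np.abs(eigenvalues - c) <= r)) for c, r in discs]

print(discs)
print(eigenvalues)
print(counts)
```

The first entry of `counts` corresponds to the isolated disc $D(5,1)$, which is the one Corollary 20.30 speaks about.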
20.7 Exercises
1. Use Theorem 20.20 to give an alternate proof of the fundamental theorem of algebra. Hint: Take a contour of the form $\gamma_r = re^{it}$ where $t \in [0, 2\pi]$. Consider $\int_{\gamma_r} \frac{p'(z)}{p(z)}\,dz$ and consider the limit as $r \to \infty$.
2. Let $M$ be an $n \times n$ matrix. Recall that the eigenvalues of $M$ are given by the zeros of the polynomial, $p_M(z) = \det(M - zI)$ where $I$ is the $n \times n$ identity. Formulate a theorem which describes how the eigenvalues depend on small changes in $M$. Hint: You could define a norm on the space of $n \times n$ matrices as $\|M\| \equiv \operatorname{tr}(MM^*)^{1/2}$ where $M^*$ is the conjugate transpose of $M$. Thus
\[ \|M\| = \left( \sum_{j,k} |M_{jk}|^2 \right)^{1/2}. \]
Argue that small changes will produce small changes in $p_M(z)$. Then apply Theorem 20.20 using $\gamma_k$, a very small circle surrounding $z_k$, the $k^{th}$ eigenvalue.
3. Suppose that two analytic functions defined on a region are equal on some set, $S$, which contains a limit point. (Recall $p$ is a limit point of $S$ if every open set which contains $p$ also contains infinitely many points of $S$.) Show the two functions coincide. We defined $e^z \equiv e^x(\cos y + i \sin y)$ earlier and we showed that $e^z$, defined this way, was analytic on $\mathbb{C}$. Is there any other way to define $e^z$ on all of $\mathbb{C}$ such that the function coincides with $e^x$ on the real axis?
4. You know various identities for real valued functions. For example $\cosh^2 x - \sinh^2 x = 1$. If you define $\cosh z \equiv \frac{e^z + e^{-z}}{2}$ and $\sinh z \equiv \frac{e^z - e^{-z}}{2}$, does it follow that
\[ \cosh^2 z - \sinh^2 z = 1 \]
for all $z \in \mathbb{C}$? What about
\[ \sin(z + w) = \sin z \cos w + \cos z \sin w? \]
Can you verify these sorts of identities just from your knowledge about what happens for real arguments?
5. Was it necessary that U be a region in Theorem 19.23? Would the same
conclusion hold if U were only assumed to be an open set? Why? What
about the open mapping theorem? Would it hold if U were not a region?
6. Let $f : U \to \mathbb{C}$ be analytic and one to one. Show that $f'(z) \neq 0$ for all $z \in U$. Does this hold for a function of a real variable?
7. We say a real valued function, $u$, is subharmonic if $u_{xx} + u_{yy} \geq 0$. Show that if $u$ is subharmonic on a bounded region (open connected set) $U$, continuous on $\overline{U}$, and $u \leq m$ on $\partial U$, then $u \leq m$ on $U$. Hint: If not, $u$ achieves its maximum at $(x_0, y_0) \in U$. Let $u(x_0, y_0) > m + \delta$ where $\delta > 0$. Now consider $u_\varepsilon(x,y) = \varepsilon x^2 + u(x,y)$ where $\varepsilon$ is small enough that $0 \leq \varepsilon x^2 < \delta$ for all $(x,y) \in U$. Show that $u_\varepsilon$ also achieves its maximum at some point of $U$ and that therefore, $u_{\varepsilon xx} + u_{\varepsilon yy} \leq 0$ at that point, implying that $u_{xx} + u_{yy} \leq -2\varepsilon$, a contradiction.
8. If $u$ is harmonic on some region, $U$, show that $u$ coincides locally with the real part of an analytic function and that therefore, $u$ has infinitely many derivatives on $U$. Hint: Consider the case where $0 \in U$. You can always reduce to this case by a suitable translation. Now let $B(0,r) \subseteq U$ and use the Schwarz formula to obtain an analytic function whose real part coincides with $u$ on $\partial B(0,r)$. Then use Problem 7.
9. Show the solution to the Dirichlet problem of Problem 8 on Page 464 is unique.
You need to formulate this precisely and then prove uniqueness.
Residues
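Definition 21.1 The residue of $f$ at an isolated singularity $\alpha$ which is a pole, written $\operatorname{res}(f, \alpha)$, is the coefficient of $(z - \alpha)^{-1}$ where
\[ f(z) = g(z) + \sum_{k=1}^m \frac{b_k}{(z - \alpha)^k}. \]
Thus $\operatorname{res}(f, \alpha) = b_1$ in the above.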
At this point, recall Corollary 19.47 which is stated here for convenience.
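Corollary 21.2 Let $\Omega$ be an open set and let $\gamma_k : [a_k, b_k] \to \Omega$, $k = 1, \cdots, m$, be closed, continuous and of bounded variation. Suppose also that
\[ \sum_{k=1}^m n(\gamma_k, z) = 0 \]
for all $z \notin \Omega$. Then if $f : \Omega \to \mathbb{C}$ is analytic,
\[ \sum_{k=1}^m \int_{\gamma_k} f(w)\,dw = 0. \]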
The following theorem is called the residue theorem. Note the resemblance to
Corollary 19.47.
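Theorem 21.3 Let $\Omega$ be an open set and let $\gamma_k : [a_k, b_k] \to \Omega$, $k = 1, \cdots, m$, be closed, continuous and of bounded variation. Suppose also that
\[ \sum_{k=1}^m n(\gamma_k, z) = 0 \]
for all $z \notin \Omega$. Then if $f : \Omega \to \widehat{\mathbb{C}}$ is meromorphic with no pole of $f$ contained in any $\gamma_k^* \equiv \gamma_k([a_k, b_k])$,
\[ \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} f(w)\,dw = \sum_{\alpha \in A} \operatorname{res}(f, \alpha) \sum_{k=1}^m n(\gamma_k, \alpha) \tag{21.1} \]
where here $A$ denotes the set of poles of $f$ in $\Omega$. The sum on the right is a finite sum.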
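Proof: First note that there are at most finitely many $\alpha \in A$ which are not in the unbounded component of $\mathbb{C} \setminus \cup_{k=1}^m \gamma_k([a_k, b_k])$. Thus there exists a finite set, $\{\alpha_1, \cdots, \alpha_N\} \subseteq A$, such that these are the only possibilities for which $\sum_{k=1}^m n(\gamma_k, \alpha)$ might not equal zero. Therefore, 21.1 reduces to
\[ \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} f(w)\,dw = \sum_{j=1}^N \operatorname{res}(f, \alpha_j) \sum_{k=1}^m n(\gamma_k, \alpha_j) \]
and it is this last equation which is established. Near $\alpha_j$,
\[ f(z) = g_j(z) + \sum_{r=1}^{m_j} \frac{b^j_r}{(z - \alpha_j)^r} \equiv g_j(z) + Q_j(z) \]
where $g_j$ is analytic at and near $\alpha_j$. Now define
\[ G(z) \equiv f(z) - \sum_{j=1}^N Q_j(z). \]
It follows that $G(z)$ has a removable singularity at each $\alpha_j$. Therefore, by Corollary 19.47,
\[ 0 = \sum_{k=1}^m \int_{\gamma_k} G(z)\,dz = \sum_{k=1}^m \int_{\gamma_k} f(z)\,dz - \sum_{j=1}^N \sum_{k=1}^m \int_{\gamma_k} Q_j(z)\,dz. \]
Now
\[ \sum_{k=1}^m \int_{\gamma_k} Q_j(z)\,dz = \sum_{k=1}^m \int_{\gamma_k} \left( \frac{b^j_1}{(z - \alpha_j)} + \sum_{r=2}^{m_j} \frac{b^j_r}{(z - \alpha_j)^r} \right) dz = \sum_{k=1}^m \int_{\gamma_k} \frac{b^j_1}{(z - \alpha_j)}\,dz \equiv \sum_{k=1}^m n(\gamma_k, \alpha_j) \operatorname{res}(f, \alpha_j)(2\pi i). \]
The terms with $r \geq 2$ drop out because $(z - \alpha_j)^{-r}$ has an antiderivative when $r \geq 2$. Therefore,
\[ \sum_{k=1}^m \int_{\gamma_k} f(z)\,dz = \sum_{j=1}^N \sum_{k=1}^m \int_{\gamma_k} Q_j(z)\,dz = \sum_{j=1}^N \sum_{k=1}^m n(\gamma_k, \alpha_j) \operatorname{res}(f, \alpha_j)(2\pi i) \]
\[ = 2\pi i \sum_{j=1}^N \operatorname{res}(f, \alpha_j) \sum_{k=1}^m n(\gamma_k, \alpha_j) = (2\pi i) \sum_{\alpha \in A} \operatorname{res}(f, \alpha) \sum_{k=1}^m n(\gamma_k, \alpha). \]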
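As a sanity check on formula 21.1, here is a small numerical sketch, assuming NumPy. The test function $f(z) = 1/(z(z-2))$ and the helper name `contour_integral_circle` are choices made for this illustration only. The unit circle winds once about the pole at $0$, where the residue is $-1/2$, and zero times about the pole at $2$.

```python
import numpy as np

def contour_integral_circle(f, center, radius, n=4096):
    """Approximate the integral of f over t -> center + radius*e^{it}, t in [0, 2 pi]."""
    t = 2.0 * np.pi * np.arange(n) / n
    z = center + radius * np.exp(1j * t)
    dz = 1j * radius * np.exp(1j * t)      # derivative of the parameterization
    return np.sum(f(z) * dz) * (2.0 * np.pi / n)

f = lambda z: 1.0 / (z * (z - 2.0))

# Left side of 21.1 for the single curve gamma(t) = e^{it}.
lhs = contour_integral_circle(f, 0.0, 1.0) / (2j * np.pi)

# Right side of 21.1: res(f, 0) * n(gamma, 0) + res(f, 2) * n(gamma, 2) = (-1/2)*1 + (1/2)*0.
rhs = -0.5

print(lhs, rhs)
```

The equally spaced rule is used because the integrand is smooth and periodic in $t$, so the sum converges very quickly.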
The following is an important example. This example can also be done by real
variable methods and there are some who think that real variable methods are
always to be preferred to complex variable methods. However, I will use the above
theorem to work this example.
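Example 21.4 Find $\lim_{R \to \infty} \int_{-R}^{R} \frac{\sin(x)}{x}\,dx$.
Things are easier if you write it as
\[ \lim_{R \to \infty} \frac{1}{i} \left( \int_{-R}^{-R^{-1}} \frac{e^{ix}}{x}\,dx + \int_{R^{-1}}^{R} \frac{e^{ix}}{x}\,dx \right). \]
This gives the same answer because $\cos(x)/x$ is odd. Consider the following contour in which the orientation involves counterclockwise motion exactly once around.
[Figure: closed contour along the real axis from $-R$ to $-R^{-1}$, over a small semicircle of radius $R^{-1}$ about $0$, along the real axis from $R^{-1}$ to $R$, and back over a large semicircle of radius $R$ in the upper half plane.]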
Denote by $\gamma_{R^{-1}}$ the little semicircle and $\gamma_R$ the big one. Then on the inside of this contour there are no singularities of $e^{iz}/z$ and it is contained in an open set with the property that the winding number with respect to this contour about any point not in the open set equals zero. By Theorem 19.22
\[ \frac{1}{i} \left( \int_{-R}^{-R^{-1}} \frac{e^{ix}}{x}\,dx + \int_{\gamma_{R^{-1}}} \frac{e^{iz}}{z}\,dz + \int_{R^{-1}}^{R} \frac{e^{ix}}{x}\,dx + \int_{\gamma_R} \frac{e^{iz}}{z}\,dz \right) = 0 \tag{21.2} \]
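Now
\[ \left| \int_{\gamma_R} \frac{e^{iz}}{z}\,dz \right| = \left| \int_0^\pi e^{R(i \cos\theta - \sin\theta)}\, i\,d\theta \right| \leq \int_0^\pi e^{-R \sin\theta}\,d\theta \]
and this last integral converges to 0 by the dominated convergence theorem. Now consider the other circle. By the dominated convergence theorem again,
\[ \int_{\gamma_{R^{-1}}} \frac{e^{iz}}{z}\,dz = \int_\pi^0 e^{R^{-1}(i \cos\theta - \sin\theta)}\, i\,d\theta \to -i\pi \]
as $R \to \infty$. Then passing to the limit in 21.2,
\[ \lim_{R \to \infty} \int_{-R}^{R} \frac{\sin(x)}{x}\,dx = \lim_{R \to \infty} \frac{1}{i} \left( \int_{-R}^{-R^{-1}} \frac{e^{ix}}{x}\,dx + \int_{R^{-1}}^{R} \frac{e^{ix}}{x}\,dx \right) \]
\[ = \lim_{R \to \infty} \frac{1}{i} \left( -\int_{\gamma_{R^{-1}}} \frac{e^{iz}}{z}\,dz - \int_{\gamma_R} \frac{e^{iz}}{z}\,dz \right) = -\frac{1}{i}(-i\pi) = \pi. \]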
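The limit in Example 21.4 can be observed numerically. This is only a rough sketch, assuming NumPy; the helper names `trapezoid` and `sinc_integral` and the sample values of $R$ are choices made here. Convergence is slow, with an error on the order of $1/R$, so the truncated integrals approach $\pi$ only gradually.

```python
import numpy as np

def trapezoid(y, h):
    """Composite trapezoid rule for equally spaced samples y with spacing h."""
    return h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

def sinc_integral(R, n=2_000_001):
    """Approximate the integral of sin(x)/x over [-R, R]."""
    x = np.linspace(-R, R, n)
    # np.sinc(u) = sin(pi u)/(pi u) with value 1 at u = 0, so this is sin(x)/x.
    y = np.sinc(x / np.pi)
    return trapezoid(y, x[1] - x[0])

values = {R: sinc_integral(R) for R in (10.0, 100.0, 1000.0)}
for R, v in values.items():
    print(R, v, abs(v - np.pi))
```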
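Example 21.5 Find $\lim_{R \to \infty} \int_{-R}^{R} e^{ixt} \frac{\sin x}{x}\,dx$. Note this is essentially finding the inverse Fourier transform of the function, $\sin(x)/x$.
This equals
\[ \lim_{R \to \infty} \int_{-R}^{R} (\cos(xt) + i \sin(xt)) \frac{\sin(x)}{x}\,dx = \lim_{R \to \infty} \int_{-R}^{R} \cos(xt) \frac{\sin(x)}{x}\,dx \]
because $\sin(xt)\sin(x)/x$ is odd in $x$. By the product formula for $\sin$ and $\cos$ this equals
\[ \lim_{R \to \infty} \frac{1}{2} \int_{-R}^{R} \frac{\sin(x(t+1)) + \sin(x(1-t))}{x}\,dx. \]
Let $t \neq 1, -1$. Then changing variables yields
\[ \lim_{R \to \infty} \left( \frac{1}{2} \int_{-R(1+t)}^{R(1+t)} \frac{\sin(u)}{u}\,du + \frac{1}{2} \int_{-R(1-t)}^{R(1-t)} \frac{\sin(u)}{u}\,du \right). \]
In case $|t| < 1$ Example 21.4 implies this limit is $\pi$. However, if $t > 1$ the limit equals 0 and this is also the case if $t < -1$. Summarizing,
\[ \lim_{R \to \infty} \int_{-R}^{R} e^{ixt} \frac{\sin x}{x}\,dx = \begin{cases} \pi & \text{if } |t| < 1 \\ 0 & \text{if } |t| > 1 \end{cases}. \]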
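The two cases of Example 21.5 can be illustrated with a truncated integral. This is a sketch under stated assumptions: NumPy is available, the cutoff $R = 500$ and the sample values $t = 0.5$ and $t = 2$ are arbitrary choices, and `truncated_transform` is a name invented here.

```python
import numpy as np

def truncated_transform(t, R=500.0, n=2_000_001):
    """Trapezoid approximation of the integral of e^{ixt} sin(x)/x over [-R, R]."""
    x = np.linspace(-R, R, n)
    h = x[1] - x[0]
    # np.sinc(u) = sin(pi u)/(pi u), so np.sinc(x/pi) is sin(x)/x with the value 1 at x = 0.
    y = np.exp(1j * x * t) * np.sinc(x / np.pi)
    return h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

inside = truncated_transform(0.5)    # a value with |t| < 1
outside = truncated_transform(2.0)   # a value with |t| > 1

print(inside, outside)
```

For finite $R$ the values differ from the limits by a term of order $1/R$, so only rough agreement with $\pi$ and $0$ should be expected.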
21.1 Rouche's Theorem And The Argument Principle
21.1.1 Argument Principle
A simple closed curve is just one which is homeomorphic to the unit circle. The Jordan Curve theorem states that every simple closed curve in the plane divides the plane into exactly two connected components, one bounded and the other unbounded. This is a very hard theorem to prove. However, in most applications the conclusion is obvious. Nevertheless, to avoid using this big topological result and to attain some extra generality, I will state the following theorem in terms of the winding number. This theorem is called the argument principle. First recall that $f$ has a zero of order $m$ at $\alpha$ if $f(z) = g(z)(z - \alpha)^m$ where $g$ is an analytic function which is not equal to zero at $\alpha$. This is equivalent to having $f(z) = \sum_{k=m}^\infty a_k (z - \alpha)^k$ for $z$ near $\alpha$ where $a_m \neq 0$. Also recall that $f$ has a pole of order $m$ at $\alpha$ if for $z$ near $\alpha$, $f(z)$ is of the form
\[ f(z) = h(z) + \sum_{k=1}^m \frac{b_k}{(z - \alpha)^k} \tag{21.3} \]
where $b_m \neq 0$ and $h$ is a function analytic near $\alpha$.
Theorem 21.6 (argument principle) Let $f$ be meromorphic in $\Omega$. Also suppose $\gamma^*$ is a closed bounded variation curve containing none of the poles or zeros of $f$ with the property that for all $z \notin \Omega$, $n(\gamma, z) = 0$ and for all $z \in \Omega$, $n(\gamma, z)$ either equals 0 or 1. Now let $\{p_1, \cdots, p_m\}$ and $\{z_1, \cdots, z_n\}$ be respectively the poles and zeros for which the winding number of $\gamma$ about these points equals 1. Let $z_k$ be a zero of order $r_k$ and let $p_k$ be a pole of order $l_k$. Then
\[ \frac{1}{2\pi i} \int_\gamma \frac{f'(z)}{f(z)}\,dz = \sum_{k=1}^n r_k - \sum_{k=1}^m l_k \]
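Proof: This theorem follows from computing the residues of $f'/f$. It has residues at poles and zeros. I will do this now. First suppose $f$ has a pole of order $p$ at $\alpha$. Then $f$ has the form given in 21.3. Therefore,
\[ \frac{f'(z)}{f(z)} = \frac{h'(z) - \sum_{k=1}^p \frac{k b_k}{(z - \alpha)^{k+1}}}{h(z) + \sum_{k=1}^p \frac{b_k}{(z - \alpha)^k}} = \frac{h'(z)(z - \alpha)^p - \sum_{k=1}^{p-1} k b_k (z - \alpha)^{-k-1+p} - \frac{p b_p}{(z - \alpha)}}{h(z)(z - \alpha)^p + \sum_{k=1}^{p-1} b_k (z - \alpha)^{p-k} + b_p} \]
This is of the form
\[ = \frac{b_p}{s(z) + b_p} \cdot \frac{r(z) - \frac{p b_p}{(z - \alpha)}}{b_p} = \frac{b_p}{s(z) + b_p} \left( \frac{r(z)}{b_p} - \frac{p}{(z - \alpha)} \right) \]
where $s$ and $r$ are analytic near $\alpha$ and $s(\alpha) = 0$. From this, it is clear $\operatorname{res}(f'/f, \alpha) = -p$, the negative of the order of the pole.
Next suppose $f$ has a zero of order $p$ at $\alpha$. Then
\[ \frac{f'(z)}{f(z)} = \frac{\sum_{k=p}^\infty a_k k (z - \alpha)^{k-1}}{\sum_{k=p}^\infty a_k (z - \alpha)^k} = \frac{\sum_{k=p}^\infty a_k k (z - \alpha)^{k-1-p}}{\sum_{k=p}^\infty a_k (z - \alpha)^{k-p}} \]
and from this it is clear $\operatorname{res}(f'/f, \alpha) = p$, the order of the zero. The conclusion of this theorem now follows from Theorem 21.3.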
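To see the argument principle in action, the following sketch evaluates $\frac{1}{2\pi i}\int_\gamma f'/f\,dz$ on the unit circle, assuming NumPy. The rational function $f(z) = z^2(z-3)/(z - 1/2)$ is an example chosen here, not one from the notes. Inside the unit circle it has a zero of order 2 at $0$ and a pole of order 1 at $1/2$, so the theorem predicts the value $2 - 1 = 1$.

```python
import numpy as np

num = np.array([1.0, -3.0, 0.0, 0.0])   # z^3 - 3 z^2 = z^2 (z - 3)
den = np.array([1.0, -0.5])             # z - 1/2

def log_derivative(z):
    """f'/f for f = num/den, using f'/f = num'/num - den'/den."""
    return (np.polyval(np.polyder(num), z) / np.polyval(num, z)
            - np.polyval(np.polyder(den), z) / np.polyval(den, z))

n = 4096
t = 2.0 * np.pi * np.arange(n) / n
z = np.exp(1j * t)                       # unit circle, counterclockwise
dz = 1j * z                              # derivative of t -> e^{it}

# (1/(2 pi i)) * integral of f'/f over the unit circle.
count = np.sum(log_derivative(z) * dz) * (2.0 * np.pi / n) / (2j * np.pi)

print(count)
```

The zero at $3$ lies outside the circle, has winding number 0, and does not contribute.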
One can also generalize the theorem to the case where there are many closed
curves involved. This is proved in the same way as the above.
Theorem 21.7 (argument principle) Let $f$ be meromorphic in $\Omega$ and let $\gamma_k : [a_k, b_k] \to \Omega$, $k = 1, \cdots, m$, be closed, continuous and of bounded variation. Suppose also that
\[ \sum_{k=1}^m n(\gamma_k, z) = 0 \]
for all $z \notin \Omega$ and for $z \in \Omega$, $\sum_{k=1}^m n(\gamma_k, z)$ either equals 0 or 1. Now let $\{p_1, \cdots, p_m\}$ and $\{z_1, \cdots, z_n\}$ be respectively the poles and zeros for which the above sum of winding numbers equals 1. Let $z_k$ be a zero of order $r_k$ and let $p_k$ be a pole of order $l_k$. Then
\[ \frac{1}{2\pi i} \sum_{k=1}^m \int_{\gamma_k} \frac{f'(z)}{f(z)}\,dz = \sum_{k=1}^n r_k - \sum_{k=1}^m l_k \]
There is also a simple extension of this important principle which I found in
[26].
Theorem 21.8 (argument principle) Let $f$ be meromorphic in $\Omega$. Also suppose $\gamma^*$ is a closed bounded variation curve with the property that for all $z \notin \Omega$, $n(\gamma, z) = 0$ and for all $z \in \Omega$, $n(\gamma, z)$ either equals 0 or 1. Now let $\{p_1, \cdots, p_m\}$ and $\{z_1, \cdots, z_n\}$ be respectively the poles and zeros for which the winding number of $\gamma$ about these points equals 1, listed according to multiplicity. Thus if there is a pole of order $m$ there will be this value repeated $m$ times in the list for the poles. Also let $g(z)$ be an analytic function. Then
\[ \frac{1}{2\pi i} \int_\gamma g(z) \frac{f'(z)}{f(z)}\,dz = \sum_{k=1}^n g(z_k) - \sum_{k=1}^m g(p_k) \]
Proof: This theorem follows from computing the residues of $g(f'/f)$. It has residues at poles and zeros. I will do this now. First suppose $f$ has a pole of order $m$ at $\alpha$. Then $f$ has the form given in 21.3. Therefore,
\[ g(z) \frac{f'(z)}{f(z)} = \frac{g(z) \left( h'(z) - \sum_{k=1}^m \frac{k b_k}{(z - \alpha)^{k+1}} \right)}{h(z) + \sum_{k=1}^m \frac{b_k}{(z - \alpha)^k}} = g(z) \frac{h'(z)(z - \alpha)^m - \sum_{k=1}^{m-1} k b_k (z - \alpha)^{-k-1+m} - \frac{m b_m}{(z - \alpha)}}{h(z)(z - \alpha)^m + \sum_{k=1}^{m-1} b_k (z - \alpha)^{m-k} + b_m} \]
From this, it is clear $\operatorname{res}(g(f'/f), \alpha) = -m g(\alpha)$, where $m$ is the order of the pole. Thus $\alpha$ would have been listed $m$ times in the list of poles. Hence the residue at this point is equivalent to adding $-g(\alpha)$ $m$ times.
Next suppose $f$ has a zero of order $m$ at $\alpha$. Then
\[ g(z) \frac{f'(z)}{f(z)} = g(z) \frac{\sum_{k=m}^\infty a_k k (z - \alpha)^{k-1}}{\sum_{k=m}^\infty a_k (z - \alpha)^k} = g(z) \frac{\sum_{k=m}^\infty a_k k (z - \alpha)^{k-1-m}}{\sum_{k=m}^\infty a_k (z - \alpha)^{k-m}} \]
and from this it is clear $\operatorname{res}(g(f'/f), \alpha) = g(\alpha) m$, where $m$ is the order of the zero. The conclusion of this theorem now follows from the residue theorem, Theorem 21.3.
The way people usually apply these theorems is to suppose $\gamma^*$ is a simple closed bounded variation curve, often a circle. Thus it has an inside and an outside, the outside being the unbounded component of $\mathbb{C} \setminus \gamma^*$. The orientation of the curve is such that you go around it once in the counterclockwise direction. Then letting $r_k$ and $l_k$ be as described, the conclusion of the theorem follows. In applications, this is likely the way it will be.
21.1.2 Rouche's Theorem
With the argument principle, it is possible to prove Rouche's theorem. In the argument principle, denote by $Z_f$ the quantity $\sum_{k=1}^n r_k$ and by $P_f$ the quantity $\sum_{k=1}^m l_k$. Thus $Z_f$ is the number of zeros of $f$ counted according to the order of the zero, with a similar definition holding for $P_f$. Thus the conclusion of the argument principle is
\[ \frac{1}{2\pi i} \int_\gamma \frac{f'(z)}{f(z)}\,dz = Z_f - P_f. \]
Rouche's theorem allows the comparison of $Z_h - P_h$ for $h = f, g$. It is a wonderful and amazing result.
Theorem 21.9 (Rouche's theorem) Let $f, g$ be meromorphic in an open set $\Omega$. Also suppose $\gamma^*$ is a closed bounded variation curve with the property that for all $z \notin \Omega$, $n(\gamma, z) = 0$, no zeros or poles are on $\gamma^*$, and for all $z \in \Omega$, $n(\gamma, z)$ either equals 0 or 1. Let $Z_f$ and $P_f$ denote respectively the numbers of zeros and poles of $f$ which have the property that the winding number equals 1, counted according to order, with $Z_g$ and $P_g$ being defined similarly. Also suppose that for $z \in \gamma^*$
\[ |f(z) + g(z)| < |f(z)| + |g(z)|. \tag{21.4} \]
Then
\[ Z_f - P_f = Z_g - P_g. \]
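Proof: From the hypotheses,
\[ \left| 1 + \frac{f(z)}{g(z)} \right| < 1 + \left| \frac{f(z)}{g(z)} \right| \]
which shows that for all $z \in \gamma^*$,
\[ \frac{f(z)}{g(z)} \in \mathbb{C} \setminus [0, \infty). \]
Letting $l$ denote a branch of the logarithm defined on $\mathbb{C} \setminus [0, \infty)$, it follows that $l\left( \frac{f(z)}{g(z)} \right)$ is a primitive for the function,
\[ \frac{(f/g)'}{(f/g)} = \frac{f'}{f} - \frac{g'}{g}. \]
Therefore, since this function has a primitive on $\gamma^*$ and by the argument principle,
\[ 0 = \frac{1}{2\pi i} \int_\gamma \frac{(f/g)'}{(f/g)}\,dz = \frac{1}{2\pi i} \int_\gamma \left( \frac{f'}{f} - \frac{g'}{g} \right) dz = Z_f - P_f - (Z_g - P_g). \]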
Often another condition other than 21.4 is used.
Corollary 21.10 In the situation of Theorem 21.9 change 21.4 to the condition,
\[ |f(z) - g(z)| < |f(z)| \]
for $z \in \gamma^*$. Then the conclusion is the same.
Proof: The new condition implies $\left| 1 - \frac{g(z)}{f(z)} \right| < 1$ on $\gamma^*$. Therefore, $\frac{g(z)}{f(z)} \notin (-\infty, 0]$ and so you can do the same argument with a branch of the logarithm.
21.1.3 A Different Formulation
In [40] I found this modification for Rouche's theorem concerned with the counting of zeros of analytic functions. This is a very useful form of Rouche's theorem because it makes no mention of a contour.
Theorem 21.11 Let $\Omega$ be a bounded open set and suppose $f, g$ are continuous on $\overline{\Omega}$ and analytic on $\Omega$. Also suppose $|f(z)| < |g(z)|$ on $\partial\Omega$. Then $g$ and $f + g$ have the same number of zeros in $\Omega$ provided each zero is counted according to multiplicity.
Proof: Let $K = \left\{ z \in \overline{\Omega} : |f(z)| \geq |g(z)| \right\}$. Then letting $\lambda \in [0,1]$, if $z \notin K$, then $|f(z)| < |g(z)|$ and so
\[ 0 < |g(z)| - |f(z)| \leq |g(z)| - \lambda |f(z)| \leq |g(z) + \lambda f(z)| \]
which shows that all zeros of $g + \lambda f$ are contained in $K$, which must be a compact subset of $\Omega$ due to the assumption that $|f(z)| < |g(z)|$ on $\partial\Omega$. By Theorem 19.52 on Page 483 there exists a cycle, $\{\gamma_k\}_{k=1}^n$, such that $\cup_{k=1}^n \gamma_k^* \subseteq \Omega \setminus K$, $\sum_{k=1}^n n(\gamma_k, z) = 1$ for every $z \in K$ and $\sum_{k=1}^n n(\gamma_k, z) = 0$ for all $z \notin \Omega$. Then as above, it follows from the residue theorem or more directly, Theorem 21.7,
\[ \sum_{k=1}^n \frac{1}{2\pi i} \int_{\gamma_k} \frac{\lambda f'(z) + g'(z)}{\lambda f(z) + g(z)}\,dz = \sum_{j=1}^p m_j \]
where $m_j$ is the order of the $j^{th}$ zero of $\lambda f + g$ in $K$, hence in $\Omega$. However,
\[ \lambda \to \sum_{k=1}^n \frac{1}{2\pi i} \int_{\gamma_k} \frac{\lambda f'(z) + g'(z)}{\lambda f(z) + g(z)}\,dz \]
is integer valued and continuous so it gives the same value when $\lambda = 0$ as when $\lambda = 1$. When $\lambda = 0$ this gives the number of zeros of $g$ in $\Omega$ and when $\lambda = 1$ it is the number of zeros of $f + g$. This proves the theorem.
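Here is a small illustration of Theorem 21.11, assuming NumPy. The choice $\Omega = B(0,1)$, $g(z) = 3z$ and $f(z) = z^5 + 1$ is made up for this sketch. On $|z| = 1$, $|f(z)| \leq 2 < 3 = |g(z)|$, so the theorem predicts that $f + g = z^5 + 3z + 1$ has as many zeros in the unit disc as $g$.

```python
import numpy as np

f = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 1.0])   # f(z) = z^5 + 1
g = np.array([0.0, 0.0, 0.0, 0.0, 3.0, 0.0])   # g(z) = 3z, padded to degree 5

# Sample the boundary |z| = 1 and check the hypothesis |f| < |g| there.
z = np.exp(2j * np.pi * np.arange(2000) / 2000)
hypothesis_holds = bool(np.all(np.abs(np.polyval(f, z)) < np.abs(np.polyval(g, z))))

def zeros_in_unit_disc(coeffs):
    """Count the roots of the polynomial with the given coefficients in |z| < 1."""
    return int(np.sum(np.abs(np.roots(coeffs)) < 1.0))

zeros_g = zeros_in_unit_disc(g)
zeros_f_plus_g = zeros_in_unit_disc(f + g)

print(hypothesis_holds, zeros_g, zeros_f_plus_g)
```

Sampling the boundary is of course not a proof of the hypothesis; here the inequality also follows directly from the triangle inequality.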
Here is another formulation of this theorem.
Corollary 21.12 Let $\Omega$ be a bounded open set and suppose $f, g$ are continuous on $\overline{\Omega}$ and analytic on $\Omega$. Also suppose $|f(z) - g(z)| < |g(z)|$ on $\partial\Omega$. Then $f$ and $g$ have the same number of zeros in $\Omega$ provided each zero is counted according to multiplicity.
Proof: You let $f - g$ play the role of $f$ in Theorem 21.11. Thus $f - g + g = f$ and $g$ have the same number of zeros. Alternatively, you can give a proof of this directly as follows.
Let $K = \left\{ z \in \overline{\Omega} : |f(z) - g(z)| \geq |g(z)| \right\}$. Then if $g(z) + \lambda(f(z) - g(z)) = 0$ for some $\lambda \in [0,1]$, it follows
\[ 0 = |g(z) + \lambda(f(z) - g(z))| \geq |g(z)| - \lambda|f(z) - g(z)| \geq |g(z)| - |f(z) - g(z)| \]
and so $z \in K$. Thus all zeros of $g(z) + \lambda(f(z) - g(z))$ are contained in $K$. By Theorem 19.52 on Page 483 there exists a cycle, $\{\gamma_k\}_{k=1}^n$, such that $\cup_{k=1}^n \gamma_k^* \subseteq \Omega \setminus K$, $\sum_{k=1}^n n(\gamma_k, z) = 1$ for every $z \in K$ and $\sum_{k=1}^n n(\gamma_k, z) = 0$ for all $z \notin \Omega$. Then by Theorem 21.7,
\[ \sum_{k=1}^n \frac{1}{2\pi i} \int_{\gamma_k} \frac{\lambda(f'(z) - g'(z)) + g'(z)}{\lambda(f(z) - g(z)) + g(z)}\,dz = \sum_{j=1}^p m_j \]
where $m_j$ is the order of the $j^{th}$ zero of $\lambda(f - g) + g$ in $K$, hence in $\Omega$. The left side is integer valued and continuous as a function of $\lambda$ and so the number of zeros of $g$, corresponding to $\lambda = 0$, equals the number of zeros of $f$, corresponding to $\lambda = 1$. This proves the corollary.
21.2 Singularities And The Laurent Series
21.2.1 What Is An Annulus?
In general, when you consider singularities, isolated or not, the fundamental tool is the Laurent series. This series is important for many other reasons also. In particular, it is fundamental to the spectral theory of various operators in functional analysis and is one way to obtain relationships between algebraic and analytical conditions essential in various convergence theorems. A Laurent series lives on an annulus. In all this $f$ has values in $X$ where $X$ is a complex Banach space. If you like, let $X = \mathbb{C}$.
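Definition 21.13 Define $\operatorname{ann}(a, R_1, R_2) \equiv \{ z : R_1 < |z - a| < R_2 \}$.
Thus $\operatorname{ann}(a, 0, R)$ would denote the punctured ball, $B(a, R) \setminus \{a\}$, and when $R_1 > 0$, the annulus looks like the following.
[Figure: an annulus centered at $a$.]
The annulus is the stuff between the two circles.
Here is an important lemma which is concerned with the situation described in the following picture.
[Figure: two pictures of a circle centered at $a$ together with a point $z$.]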
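Lemma 21.14 Let $\gamma_r(t) \equiv a + re^{it}$ for $t \in [0, 2\pi]$ and let $|z - a| < r$. Then $n(\gamma_r, z) = 1$. If $|z - a| > r$, then $n(\gamma_r, z) = 0$.
Proof: For the first claim, consider for $t \in [0,1]$,
\[ f(t) \equiv n(\gamma_r, a + t(z - a)). \]
Then from properties of the winding number derived earlier, $f(t) \in \mathbb{Z}$, $f$ is continuous, and $f(0) = 1$. Therefore, $f(t) = 1$ for all $t \in [0,1]$. This proves the first claim because $f(1) = n(\gamma_r, z)$.
For the second claim,
\[ n(\gamma_r, z) = \frac{1}{2\pi i} \int_{\gamma_r} \frac{1}{w - z}\,dw = \frac{1}{2\pi i} \int_{\gamma_r} \frac{1}{w - a - (z - a)}\,dw \]
\[ = \frac{1}{2\pi i} \frac{-1}{z - a} \int_{\gamma_r} \frac{1}{1 - \left( \frac{w - a}{z - a} \right)}\,dw = \frac{-1}{2\pi i (z - a)} \int_{\gamma_r} \sum_{k=0}^\infty \left( \frac{w - a}{z - a} \right)^k dw. \]
The series converges uniformly for $w \in \gamma_r^*$ because
\[ \left| \frac{w - a}{z - a} \right| = \frac{r}{r + c} \]
for some $c > 0$ due to the assumption that $|z - a| > r$. Therefore, the sum and the integral can be interchanged to give
\[ n(\gamma_r, z) = \frac{-1}{2\pi i (z - a)} \sum_{k=0}^\infty \int_{\gamma_r} \left( \frac{w - a}{z - a} \right)^k dw = 0 \]
because $w \to \left( \frac{w - a}{z - a} \right)^k$ has an antiderivative. This proves the lemma.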
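Lemma 21.14 is easy to test numerically from the integral formula for the winding number. The sketch below assumes NumPy; the center, radius, and test points are arbitrary choices and `winding_number_circle` is a name invented for this illustration.

```python
import numpy as np

def winding_number_circle(a, r, z, n=4096):
    """Approximate n(gamma_r, z) = (1/(2 pi i)) * integral of dw/(w - z) over gamma_r."""
    t = 2.0 * np.pi * np.arange(n) / n
    w = a + r * np.exp(1j * t)
    dw = 1j * r * np.exp(1j * t)           # derivative of t -> a + r e^{it}
    return np.sum(dw / (w - z)) * (2.0 * np.pi / n) / (2j * np.pi)

a, r = 1.0 + 1.0j, 2.0
inside = winding_number_circle(a, r, a + 0.5)      # |z - a| = 0.5 < r
outside = winding_number_circle(a, r, a + 3.0j)    # |z - a| = 3 > r

print(inside, outside)
```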
Now consider the following picture which pertains to the next lemma.
[Figure: the circle $\gamma_r$ centered at $a$ inside the annulus.]
Lemma 21.15 Let $g$ be analytic on $\operatorname{ann}(a, R_1, R_2)$. Then if $\gamma_r(t) \equiv a + re^{it}$ for $t \in [0, 2\pi]$ and $r \in (R_1, R_2)$, then $\int_{\gamma_r} g(z)\,dz$ is independent of $r$.
Proof: Let $R_1 < r_1 < r_2 < R_2$ and denote by $-\gamma_r(t)$ the curve, $-\gamma_r(t) \equiv a + re^{i(2\pi - t)}$ for $t \in [0, 2\pi]$. Then if $z \in \overline{B(a, R_1)}$, Lemma 21.14 implies both $n(\gamma_{r_2}, z)$ and $n(\gamma_{r_1}, z) = 1$ and so
\[ n(-\gamma_{r_1}, z) + n(\gamma_{r_2}, z) = -1 + 1 = 0. \]
Also if $z \notin B(a, R_2)$, then Lemma 21.14 implies $n(\gamma_{r_j}, z) = 0$ for $j = 1, 2$. Therefore, whenever $z \notin \operatorname{ann}(a, R_1, R_2)$, the sum of the winding numbers equals zero. Therefore, by Theorem 19.46 applied to the function, $f(w) = g(w)(w - z)$ and $z \in \operatorname{ann}(a, R_1, R_2) \setminus \cup_{j=1}^2 \gamma_{r_j}([0, 2\pi])$,
\[ f(z)\left( n(\gamma_{r_2}, z) + n(-\gamma_{r_1}, z) \right) = 0 \left( n(\gamma_{r_2}, z) + n(-\gamma_{r_1}, z) \right) \]
\[ = \frac{1}{2\pi i} \int_{\gamma_{r_2}} \frac{g(w)(w - z)}{w - z}\,dw - \frac{1}{2\pi i} \int_{\gamma_{r_1}} \frac{g(w)(w - z)}{w - z}\,dw = \frac{1}{2\pi i} \int_{\gamma_{r_2}} g(w)\,dw - \frac{1}{2\pi i} \int_{\gamma_{r_1}} g(w)\,dw \]
which proves the desired result.
21.2.2 The Laurent Series
The Laurent series is like a power series except it allows for negative exponents. First here is a definition of what is meant by the convergence of such a series.
Definition 21.16 $\sum_{n=-\infty}^\infty a_n (z - a)^n$ converges if both the series,
\[ \sum_{n=0}^\infty a_n (z - a)^n \text{ and } \sum_{n=1}^\infty a_{-n} (z - a)^{-n} \]
converge. When this is the case, the symbol, $\sum_{n=-\infty}^\infty a_n (z - a)^n$ is defined as
\[ \sum_{n=0}^\infty a_n (z - a)^n + \sum_{n=1}^\infty a_{-n} (z - a)^{-n}. \]
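Lemma 21.17 Suppose
\[ f(z) = \sum_{n=-\infty}^\infty a_n (z - a)^n \]
for all $|z - a| \in (R_1, R_2)$. Then both $\sum_{n=0}^\infty a_n (z - a)^n$ and $\sum_{n=1}^\infty a_{-n} (z - a)^{-n}$ converge absolutely and uniformly on $\{ z : r_1 \leq |z - a| \leq r_2 \}$ for any $r_1 < r_2$ satisfying $R_1 < r_1 < r_2 < R_2$.
Proof: Let $R_1 < |w - a| = r_1 - \delta < r_1$. Then $\sum_{n=1}^\infty a_{-n} (w - a)^{-n}$ converges and so
\[ \lim_{n \to \infty} |a_{-n}| |w - a|^{-n} = \lim_{n \to \infty} |a_{-n}| (r_1 - \delta)^{-n} = 0 \]
which implies that for all $n$ sufficiently large,
\[ |a_{-n}| (r_1 - \delta)^{-n} < 1. \]
Therefore,
\[ \sum_{n=1}^\infty |a_{-n}| |z - a|^{-n} = \sum_{n=1}^\infty |a_{-n}| (r_1 - \delta)^{-n} (r_1 - \delta)^n |z - a|^{-n}. \]
Now for $|z - a| \geq r_1$,
\[ |z - a|^{-n} \leq \frac{1}{r_1^n} \]
and so for all sufficiently large $n$
\[ |a_{-n}| |z - a|^{-n} \leq \frac{(r_1 - \delta)^n}{r_1^n}. \]
Therefore, by the Weierstrass M test, the series, $\sum_{n=1}^\infty a_{-n} (z - a)^{-n}$ converges absolutely and uniformly on the set
\[ \{ z \in \mathbb{C} : |z - a| \geq r_1 \}. \]
Similar reasoning shows the series, $\sum_{n=0}^\infty a_n (z - a)^n$ converges uniformly on the set
\[ \{ z \in \mathbb{C} : |z - a| \leq r_2 \}. \]
This proves the Lemma.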
Theorem 21.18 Let $f$ be analytic on $\operatorname{ann}(a, R_1, R_2)$. Then there exist numbers, $a_n \in \mathbb{C}$ such that for all $z \in \operatorname{ann}(a, R_1, R_2)$,
\[ f(z) = \sum_{n=-\infty}^\infty a_n (z - a)^n, \tag{21.5} \]
where the series converges absolutely and uniformly on $\overline{\operatorname{ann}(a, r_1, r_2)}$ whenever $R_1 < r_1 < r_2 < R_2$. Also
\[ a_n = \frac{1}{2\pi i} \int_\gamma \frac{f(w)}{(w - a)^{n+1}}\,dw \tag{21.6} \]
where $\gamma(t) = a + re^{it}$, $t \in [0, 2\pi]$ for any $r \in (R_1, R_2)$. Furthermore the series is unique in the sense that if 21.5 holds for $z \in \operatorname{ann}(a, R_1, R_2)$, then $a_n$ is given in 21.6.
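Proof: Let $R_1 < r_1 < r_2 < R_2$ and define $\gamma_1(t) \equiv a + (r_1 - \varepsilon)e^{it}$ and $\gamma_2(t) \equiv a + (r_2 + \varepsilon)e^{it}$ for $t \in [0, 2\pi]$ and $\varepsilon$ chosen small enough that $R_1 < r_1 - \varepsilon < r_2 + \varepsilon < R_2$.
[Figure: the annulus about $a$ with a point $z$ between the inner circle $\gamma_1$ and the outer circle $\gamma_2$.]
Then using Lemma 21.14, if $z \notin \operatorname{ann}(a, R_1, R_2)$ then
\[ n(-\gamma_1, z) + n(\gamma_2, z) = 0 \]
and if $z \in \operatorname{ann}(a, r_1, r_2)$,
\[ n(-\gamma_1, z) + n(\gamma_2, z) = 1. \]
Therefore, by Theorem 19.46, for $z \in \operatorname{ann}(a, r_1, r_2)$
\[ f(z) = \frac{1}{2\pi i} \left[ \int_{-\gamma_1} \frac{f(w)}{w - z}\,dw + \int_{\gamma_2} \frac{f(w)}{w - z}\,dw \right] \]
\[ = \frac{1}{2\pi i} \left[ \int_{\gamma_1} \frac{f(w)}{(z - a)\left[ 1 - \frac{w - a}{z - a} \right]}\,dw + \int_{\gamma_2} \frac{f(w)}{(w - a)\left[ 1 - \frac{z - a}{w - a} \right]}\,dw \right] \]
\[ = \frac{1}{2\pi i} \int_{\gamma_2} \frac{f(w)}{w - a} \sum_{n=0}^\infty \left( \frac{z - a}{w - a} \right)^n dw + \frac{1}{2\pi i} \int_{\gamma_1} \frac{f(w)}{(z - a)} \sum_{n=0}^\infty \left( \frac{w - a}{z - a} \right)^n dw. \tag{21.7} \]
From the formula 21.7, it follows that for $z \in \operatorname{ann}(a, r_1, r_2)$, the terms in the first sum are bounded by an expression of the form $C \left( \frac{r_2}{r_2 + \varepsilon} \right)^n$ while those in the second are bounded by one of the form $C \left( \frac{r_1 - \varepsilon}{r_1} \right)^n$ and so by the Weierstrass M test, the convergence is uniform and so the integrals and the sums in the above formula may be interchanged and after renaming the variable of summation, this yields
\[ f(z) = \sum_{n=0}^\infty \left( \frac{1}{2\pi i} \int_{\gamma_2} \frac{f(w)}{(w - a)^{n+1}}\,dw \right) (z - a)^n + \sum_{n=-\infty}^{-1} \left( \frac{1}{2\pi i} \int_{\gamma_1} \frac{f(w)}{(w - a)^{n+1}}\,dw \right) (z - a)^n. \tag{21.8} \]
Therefore, by Lemma 21.15, for any $r \in (R_1, R_2)$,
\[ f(z) = \sum_{n=0}^\infty \left( \frac{1}{2\pi i} \int_{\gamma_r} \frac{f(w)}{(w - a)^{n+1}}\,dw \right) (z - a)^n + \sum_{n=-\infty}^{-1} \left( \frac{1}{2\pi i} \int_{\gamma_r} \frac{f(w)}{(w - a)^{n+1}}\,dw \right) (z - a)^n \tag{21.9} \]
and so
\[ f(z) = \sum_{n=-\infty}^\infty \left( \frac{1}{2\pi i} \int_{\gamma_r} \frac{f(w)}{(w - a)^{n+1}}\,dw \right) (z - a)^n \]
where $r \in (R_1, R_2)$ is arbitrary. This proves the existence part of the theorem. It remains to characterize $a_n$.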
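If $f(z) = \sum_{n=-\infty}^\infty a_n (z - a)^n$ on $\operatorname{ann}(a, R_1, R_2)$ let
\[ f_n(z) \equiv \sum_{k=-n}^n a_k (z - a)^k. \tag{21.10} \]
This function is analytic in $\operatorname{ann}(a, R_1, R_2)$ and so from the above argument,
\[ f_n(z) = \sum_{k=-\infty}^\infty \left( \frac{1}{2\pi i} \int_{\gamma_r} \frac{f_n(w)}{(w - a)^{k+1}}\,dw \right) (z - a)^k. \tag{21.11} \]
Also if $k > n$ or if $k < -n$,
\[ \frac{1}{2\pi i} \int_{\gamma_r} \frac{f_n(w)}{(w - a)^{k+1}}\,dw = 0 \]
and so
\[ f_n(z) = \sum_{k=-n}^n \left( \frac{1}{2\pi i} \int_{\gamma_r} \frac{f_n(w)}{(w - a)^{k+1}}\,dw \right) (z - a)^k \]
which implies from 21.10 that for each $k \in [-n, n]$,
\[ \frac{1}{2\pi i} \int_{\gamma_r} \frac{f_n(w)}{(w - a)^{k+1}}\,dw = a_k. \]
However, from the uniform convergence of the series,
\[ \sum_{n=0}^\infty a_n (w - a)^n \text{ and } \sum_{n=1}^\infty a_{-n} (w - a)^{-n} \]
ensured by Lemma 21.17, which allows the interchange of sums and integrals, if $k \in [-n, n]$,
\[ \frac{1}{2\pi i} \int_{\gamma_r} \frac{f(w)}{(w - a)^{k+1}}\,dw = \frac{1}{2\pi i} \int_{\gamma_r} \frac{\sum_{m=0}^\infty a_m (w - a)^m + \sum_{m=1}^\infty a_{-m} (w - a)^{-m}}{(w - a)^{k+1}}\,dw \]
\[ = \sum_{m=0}^\infty a_m \frac{1}{2\pi i} \int_{\gamma_r} (w - a)^{m - (k+1)}\,dw + \sum_{m=1}^\infty a_{-m} \frac{1}{2\pi i} \int_{\gamma_r} (w - a)^{-m - (k+1)}\,dw \]
\[ = \sum_{m=0}^n a_m \frac{1}{2\pi i} \int_{\gamma_r} (w - a)^{m - (k+1)}\,dw + \sum_{m=1}^n a_{-m} \frac{1}{2\pi i} \int_{\gamma_r} (w - a)^{-m - (k+1)}\,dw \]
\[ = \frac{1}{2\pi i} \int_{\gamma_r} \frac{f_n(w)}{(w - a)^{k+1}}\,dw \]
because if $l > n$ or $l < -n$,
\[ \int_{\gamma_r} \frac{a_l (w - a)^l}{(w - a)^{k+1}}\,dw = 0 \]
for all $k \in [-n, n]$. Therefore,
\[ a_k = \frac{1}{2\pi i} \int_{\gamma_r} \frac{f(w)}{(w - a)^{k+1}}\,dw \]
and so this establishes uniqueness. This proves the theorem.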
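Formula 21.6 can be tried out numerically. The sketch below assumes NumPy and uses the example $f(z) = 1/(z(1 - z))$ on $\operatorname{ann}(0, 0, 1)$, chosen here because its Laurent series $\sum_{n=-1}^\infty z^n$ is known from the geometric series; `laurent_coefficient` is a hypothetical helper name. Two radii are used to illustrate that the result does not depend on $r$, as Lemma 21.15 asserts.

```python
import numpy as np

def laurent_coefficient(f, a, r, n, samples=4096):
    """Approximate a_n = (1/(2 pi i)) * integral of f(w)/(w - a)^(n+1) over |w - a| = r."""
    t = 2.0 * np.pi * np.arange(samples) / samples
    w = a + r * np.exp(1j * t)
    dw = 1j * r * np.exp(1j * t)           # derivative of t -> a + r e^{it}
    integrand = f(w) / (w - a) ** (n + 1)
    return np.sum(integrand * dw) * (2.0 * np.pi / samples) / (2j * np.pi)

f = lambda z: 1.0 / (z * (1.0 - z))

# Coefficients a_{-3}, ..., a_3 computed on two different circles in ann(0, 0, 1).
coeffs_r_half = {n: laurent_coefficient(f, 0.0, 0.5, n) for n in range(-3, 4)}
coeffs_r_quarter = {n: laurent_coefficient(f, 0.0, 0.25, n) for n in range(-3, 4)}

for n in range(-3, 4):
    print(n, coeffs_r_half[n], coeffs_r_quarter[n])
```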
21.2.3 Contour Integrals And Evaluation Of Integrals
Here are some examples of hard integrals which can be evaluated by using residues.
This will be done by integrating over various closed curves having bounded variation.
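Example 21.19 The first example we consider is the following integral.
\[ \int_{-\infty}^\infty \frac{1}{1 + x^4}\,dx \]
One could imagine evaluating this integral by the method of partial fractions and it should work out by that method. However, we will consider the evaluation of this integral by the method of residues instead. To do so, consider the following picture.
[Figure: upper semicircular contour of radius $r$ centered at the origin, with axes $x$ and $y$.]
Let $\gamma_r(t) = re^{it}$, $t \in [0, \pi]$ and let $\sigma_r(t) = t$, $t \in [-r, r]$. Thus $\gamma_r$ parameterizes the top curve and $\sigma_r$ parameterizes the straight line from $-r$ to $r$ along the $x$ axis. Denoting by $\Gamma_r$ the closed curve traced out by these two, we see from simple estimates that
\[ \lim_{r \to \infty} \int_{\gamma_r} \frac{1}{1 + z^4}\,dz = 0. \]
This follows from the following estimate.
\[ \left| \int_{\gamma_r} \frac{1}{1 + z^4}\,dz \right| \leq \frac{1}{r^4 - 1} \pi r. \]
Therefore,
\[ \int_{-\infty}^\infty \frac{1}{1 + x^4}\,dx = \lim_{r \to \infty} \int_{\Gamma_r} \frac{1}{1 + z^4}\,dz. \]
We compute $\int_{\Gamma_r} \frac{1}{1 + z^4}\,dz$ using the method of residues. The only residues of the integrand are located at points, $z$ where $1 + z^4 = 0$. These points are
\[ z = -\frac{1}{2}\sqrt{2} - \frac{1}{2} i\sqrt{2}, \quad z = \frac{1}{2}\sqrt{2} - \frac{1}{2} i\sqrt{2}, \quad z = \frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2}, \quad z = -\frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2} \]
and it is only the last two which are found in the inside of $\Gamma_r$. Therefore, we need to calculate the residues at these points. Clearly this function has a pole of order one at each of these points and so we may calculate the residue at $\alpha$ in this list by evaluating
\[ \lim_{z \to \alpha} (z - \alpha) \frac{1}{1 + z^4} \]
Thus
\[ \operatorname{Res}\left( f, \frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2} \right) = \lim_{z \to \frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2}} \left( z - \left( \frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2} \right) \right) \frac{1}{1 + z^4} = -\frac{1}{8}\sqrt{2} - \frac{1}{8} i\sqrt{2} \]
Similarly we may find the other residue in the same way
\[ \operatorname{Res}\left( f, -\frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2} \right) = \lim_{z \to -\frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2}} \left( z - \left( -\frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2} \right) \right) \frac{1}{1 + z^4} = -\frac{1}{8} i\sqrt{2} + \frac{1}{8}\sqrt{2}. \]
Therefore,
\[ \int_{\Gamma_r} \frac{1}{1 + z^4}\,dz = 2\pi i \left( -\frac{1}{8} i\sqrt{2} + \frac{1}{8}\sqrt{2} + \left( -\frac{1}{8}\sqrt{2} - \frac{1}{8} i\sqrt{2} \right) \right) = \frac{1}{2}\pi\sqrt{2}. \]
Thus, taking the limit we obtain $\frac{1}{2}\pi\sqrt{2} = \int_{-\infty}^\infty \frac{1}{1 + x^4}\,dx$.
Obviously many different variations of this are possible. The main idea being that the integral over the semicircle converges to zero as $r \to \infty$.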
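The value $\frac{1}{2}\pi\sqrt{2}$ from Example 21.19 can be cross-checked in two independent ways. The following is a sketch assuming NumPy; it uses the standard fact that at a simple zero $\alpha$ of $1 + z^4$ the residue of $1/(1 + z^4)$ equals $1/(4\alpha^3)$, and the interval $[-200, 200]$ for the quadrature is an arbitrary cutoff.

```python
import numpy as np

# Poles of 1/(1 + z^4) are the roots of z^4 + 1; keep those in the upper half plane.
poles = np.roots([1.0, 0.0, 0.0, 0.0, 1.0])
upper = poles[poles.imag > 0]

# At a simple zero alpha of 1 + z^4, the residue of 1/(1 + z^4) is 1/(4 alpha^3).
residues = 1.0 / (4.0 * upper ** 3)
by_residues = (2j * np.pi * np.sum(residues)).real

# Independent check: trapezoid rule on [-200, 200]; the neglected tails are below 1e-7.
x = np.linspace(-200.0, 200.0, 400_001)
y = 1.0 / (1.0 + x ** 4)
h = x[1] - x[0]
by_quadrature = h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

print(by_residues, by_quadrature, np.pi / np.sqrt(2.0))
```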
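Sometimes we don't blow up the curves and take limits. Sometimes the problem of interest reduces directly to a complex integral over a closed curve. Here is an example of this.
Example 21.20 The integral is
\[ \int_0^\pi \frac{\cos\theta}{2 + \cos\theta}\,d\theta \]
This integrand is even and so it equals
\[ \frac{1}{2} \int_{-\pi}^\pi \frac{\cos\theta}{2 + \cos\theta}\,d\theta. \]
For $z$ on the unit circle, $z = e^{i\theta}$, $\overline{z} = \frac{1}{z}$ and therefore, $\cos\theta = \frac{1}{2}\left( z + \frac{1}{z} \right)$. Thus $dz = ie^{i\theta}\,d\theta$ and so $d\theta = \frac{dz}{iz}$. Note this is proceeding formally to get a complex integral which reduces to the one of interest. It follows that a complex integral which reduces to the one desired is
\[ \frac{1}{2i} \int_\gamma \frac{\frac{1}{2}\left( z + \frac{1}{z} \right)}{2 + \frac{1}{2}\left( z + \frac{1}{z} \right)} \frac{dz}{z} = \frac{1}{2i} \int_\gamma \frac{z^2 + 1}{z(4z + z^2 + 1)}\,dz \]
where $\gamma$ is the unit circle. Now the integrand has poles of order 1 at those points where $z(4z + z^2 + 1) = 0$. These points are
\[ 0, \quad -2 + \sqrt{3}, \quad -2 - \sqrt{3}. \]
Only the first two are inside the unit circle. It is also clear the function has simple poles at these points. Therefore,
\[ \operatorname{Res}(f, 0) = \lim_{z \to 0} z \left( \frac{z^2 + 1}{z(4z + z^2 + 1)} \right) = 1. \]
\[ \operatorname{Res}\left( f, -2 + \sqrt{3} \right) = \lim_{z \to -2 + \sqrt{3}} \left( z - \left( -2 + \sqrt{3} \right) \right) \frac{z^2 + 1}{z(4z + z^2 + 1)} = -\frac{2}{3}\sqrt{3}. \]
It follows
\[ \int_0^\pi \frac{\cos\theta}{2 + \cos\theta}\,d\theta = \frac{1}{2i} \int_\gamma \frac{z^2 + 1}{z(4z + z^2 + 1)}\,dz = \frac{1}{2i} 2\pi i \left( 1 - \frac{2}{3}\sqrt{3} \right) = \pi \left( 1 - \frac{2}{3}\sqrt{3} \right). \]
Other rational functions of the trig functions will work out by this method also.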
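As a quick check on Example 21.20, the integral can be compared with a direct quadrature. This is a minimal sketch assuming NumPy; the midpoint rule and the number of nodes are choices made here.

```python
import numpy as np

# Midpoint rule on [0, pi]; the integrand is smooth, so this converges quickly.
n = 200_000
theta = (np.arange(n) + 0.5) * np.pi / n
by_quadrature = np.sum(np.cos(theta) / (2.0 + np.cos(theta))) * np.pi / n

# Value obtained above from the residues at 0 and -2 + sqrt(3).
by_residues = np.pi * (1.0 - 2.0 * np.sqrt(3.0) / 3.0)

print(by_quadrature, by_residues)
```

Note that the answer is negative, which is plausible because the integrand has larger magnitude where $\cos\theta < 0$.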
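Sometimes you have to be clever about which version of an analytic function that reduces to a real function you should use. The following is such an example.
Example 21.21 The integral here is
\[ \int_0^\infty \frac{\ln x}{1 + x^4}\,dx. \]
The same curve used in the integral involving $\frac{\sin x}{x}$ earlier will create problems with the log since the usual version of the log is not defined on the negative real axis. This does not need to be of concern however. Simply use another branch of the logarithm. Leave out the ray from 0 along the negative $y$ axis and use Theorem 20.5 to define $L(z)$ on this set. Thus $L(z) = \ln|z| + i \arg_1(z)$ where $\arg_1(z)$ will be the angle, $\theta$, between $-\frac{\pi}{2}$ and $\frac{3\pi}{2}$ such that $z = |z| e^{i\theta}$. Now the only singularities contained in this curve are
\[ \frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2}, \quad -\frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2} \]
and the integrand, $f$, has simple poles at these points. Thus using the same procedure as in the other examples,
\[ \operatorname{Res}\left( f, \frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2} \right) = \frac{1}{32}\sqrt{2}\pi - \frac{1}{32} i\sqrt{2}\pi \]
and
\[ \operatorname{Res}\left( f, -\frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2} \right) = \frac{3}{32}\sqrt{2}\pi + \frac{3}{32} i\sqrt{2}\pi. \]
Consider the integral along the small semicircle of radius $r$. This reduces to
\[ \int_\pi^0 \frac{\ln|r| + it}{1 + (re^{it})^4} \left( rie^{it} \right) dt \]
which clearly converges to zero as $r \to 0$ because $r \ln r \to 0$. Therefore, taking the limit as $r \to 0$,
\[ \int_{\text{large semicircle}} \frac{L(z)}{1 + z^4}\,dz + \lim_{r \to 0+} \int_{-R}^{-r} \frac{\ln(-t) + i\pi}{1 + t^4}\,dt + \lim_{r \to 0+} \int_r^R \frac{\ln t}{1 + t^4}\,dt = 2\pi i \left( \frac{3}{32}\sqrt{2}\pi + \frac{3}{32} i\sqrt{2}\pi + \frac{1}{32}\sqrt{2}\pi - \frac{1}{32} i\sqrt{2}\pi \right). \]
Observing that $\int_{\text{large semicircle}} \frac{L(z)}{1 + z^4}\,dz \to 0$ as $R \to \infty$,
\[ e(R) + 2 \lim_{r \to 0+} \int_r^R \frac{\ln t}{1 + t^4}\,dt + i\pi \int_{-\infty}^0 \frac{1}{1 + t^4}\,dt = \left( -\frac{1}{8} + \frac{1}{4} i \right) \pi^2 \sqrt{2} \]
where $e(R) \to 0$ as $R \to \infty$. From an earlier example this becomes
\[ e(R) + 2 \lim_{r \to 0+} \int_r^R \frac{\ln t}{1 + t^4}\,dt + i\pi \left( \frac{\sqrt{2}}{4}\pi \right) = \left( -\frac{1}{8} + \frac{1}{4} i \right) \pi^2 \sqrt{2}. \]
Now letting $r \to 0+$ and $R \to \infty$,
\[ 2 \int_0^\infty \frac{\ln t}{1 + t^4}\,dt = \left( -\frac{1}{8} + \frac{1}{4} i \right) \pi^2 \sqrt{2} - i\pi \left( \frac{\sqrt{2}}{4}\pi \right) = -\frac{1}{8}\sqrt{2}\pi^2, \]
and so
\[ \int_0^\infty \frac{\ln t}{1 + t^4}\,dt = -\frac{1}{16}\sqrt{2}\pi^2, \]
which is probably not the first thing you would think of. You might try to imagine how this could be obtained using elementary techniques.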
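The value $-\frac{1}{16}\sqrt{2}\pi^2$ can be confirmed by quadrature. The sketch below assumes NumPy and makes the substitution $t = e^u$ (a choice made here, not in the notes) so that the logarithmic singularity at $0$ disappears and the integrand decays exponentially in both directions.

```python
import numpy as np

# Substitute t = e^u:
#   integral_0^inf ln(t)/(1 + t^4) dt = integral_{-inf}^{inf} u e^u / (1 + e^{4u}) du.
u = np.linspace(-40.0, 40.0, 80_001)
y = u * np.exp(u) / (1.0 + np.exp(4.0 * u))
h = u[1] - u[0]
by_quadrature = h * (np.sum(y) - 0.5 * (y[0] + y[-1]))

# The value found above by residues.
claimed = -np.sqrt(2.0) * np.pi ** 2 / 16.0

print(by_quadrature, claimed)
```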
The next example illustrates the use of what is referred to as a branch cut. It
includes many examples.
Example 21.22 Mellin transformations are of the form
_

0
f (x) x
dx
x
.
Sometimes it is possible to evaluate such a transform in terms of the constant, .
Assume f is an analytic function except at isolated singularities, none of which
are on (0, ) . Also assume that f has the growth conditions,
[f (z)[
C
[z[
b
, b >
for all large [z[ and assume that
[f (z)[
C
[z[
b
1
, b
1
<
for all [z[ suciently small. It turns out there exists an explicit formula for this
Mellin transformation under these conditions. Consider the following contour.
'
R
E
'
In this contour the small semicircle in the center has radius $\varepsilon$ which will converge
to 0. Denote by $\gamma_R$ the large circular path which starts at the upper edge of the
slot and continues to the lower edge. Denote by $\gamma_\varepsilon$ the small semicircular contour
and denote by $\gamma_{\varepsilon R+}$ the straight part of the contour from 0 to $R$ which provides
the top edge of the slot. Finally denote by $\gamma_{\varepsilon R-}$ the straight part of the contour
from $R$ to 0 which provides the bottom edge of the slot. The interesting aspect of
this problem is the definition of $f(z)\,z^{\alpha-1}$. Let
$$z^{\alpha-1} \equiv e^{(\ln|z| + i\arg(z))(\alpha-1)} = e^{(\alpha-1)\log(z)}$$
where $\arg(z)$ is the angle of $z$ in $(0, 2\pi)$. Thus you use a branch of the logarithm
which is defined on $\mathbb{C}\setminus(0,\infty)$. Then it is routine to verify from the assumed estimates
that
$$\lim_{R\to\infty}\int_{\gamma_R} f(z)\,z^{\alpha-1}\,dz = 0$$
and
$$\lim_{\varepsilon\to 0+}\int_{\gamma_\varepsilon} f(z)\,z^{\alpha-1}\,dz = 0.$$
Also, it is routine to verify
$$\lim_{\varepsilon\to 0+}\int_{\gamma_{\varepsilon R+}} f(z)\,z^{\alpha-1}\,dz = \int_0^R f(x)\,x^{\alpha-1}\,dx$$
and
$$\lim_{\varepsilon\to 0+}\int_{\gamma_{\varepsilon R-}} f(z)\,z^{\alpha-1}\,dz = -e^{i2\pi(\alpha-1)}\int_0^R f(x)\,x^{\alpha-1}\,dx.$$
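Therefore, letting $\Sigma_R$ denote the sum of the residues of $f(z)\,z^{\alpha-1}$ which are contained in the disk of radius $R$ except for the possible residue at 0,
$$e(R) + \left(1 - e^{i2\pi(\alpha-1)}\right)\int_0^R f(x)\,x^{\alpha-1}\,dx = 2\pi i\,\Sigma_R$$
where $e(R) \to 0$ as $R \to \infty$. Now letting $R \to \infty$,
$$\lim_{R\to\infty}\int_0^R f(x)\,x^{\alpha-1}\,dx = \frac{2\pi i}{1 - e^{i2\pi(\alpha-1)}}\,\Sigma = -\frac{\pi e^{-\pi i\alpha}}{\sin(\pi\alpha)}\,\Sigma$$
where $\Sigma$ denotes the sum of all the residues of $f(z)\,z^{\alpha-1}$ except for the residue at 0.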
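The residue formula can be tried on a concrete case. The sketch below is an illustration only: the choice $f(z) = 1/(1+z^2)$, the value $\alpha = 1/2$, and all function names are assumptions made here, not taken from the notes. This $f$ satisfies the growth conditions with $b = 2$ and $b_1 = 0$ and has simple poles at $\pm i$. The code evaluates $2\pi i\,\Sigma/(1 - e^{i2\pi(\alpha-1)})$, written in the equivalent form $-\pi e^{-\pi i\alpha}\Sigma/\sin(\pi\alpha)$, and compares it with a direct quadrature of the real integral.

```python
import cmath
import math

def branch_power(z, exponent):
    """z**exponent for the branch of the logarithm with arg(z) in (0, 2*pi)."""
    theta = cmath.phase(z) % (2.0 * math.pi)
    return cmath.exp(exponent * (math.log(abs(z)) + 1j * theta))

def mellin_by_residues(alpha):
    """Residue formula for the illustrative choice f(z) = 1/(1+z^2)."""
    # At a simple pole z0 of 1/(1+z^2) the residue of z^(alpha-1)/(1+z^2)
    # is z0^(alpha-1) / (2*z0).
    residue_sum = sum(branch_power(z0, alpha - 1.0) / (2.0 * z0)
                      for z0 in (1j, -1j))
    factor = -math.pi * cmath.exp(-1j * math.pi * alpha) / math.sin(math.pi * alpha)
    return factor * residue_sum

def mellin_by_quadrature(alpha, half_width=200.0, step=0.01):
    """Trapezoid rule for the integral of x^(alpha-1)/(1+x^2), using x = e^s."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        s = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(alpha * s) / (1.0 + math.exp(2.0 * s))
    return total * step

print(mellin_by_residues(0.5), mellin_by_quadrature(0.5))
```

The residue side comes out real up to rounding, as it must, and agrees with the quadrature.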
The next example is similar to the one on the Mellin transform. In fact it is
a Mellin transform but is worked out independently of the above to emphasize a
slightly more informal technique related to the contour.
Example 21.23 $\int_0^\infty \frac{x^{p-1}}{1+x}\,dx$, $p \in (0,1)$.
Since the exponent of $x$ in the numerator is larger than $-1$, the integral does
converge. However, the techniques of real analysis don't tell us what it converges
to. The contour to be used is as follows: From $(\varepsilon, 0)$ to $(r, 0)$ along the $x$ axis and
then from $(r, 0)$ to $(r, 0)$ counter clockwise along the circle of radius $r$, then from
$(r, 0)$ to $(\varepsilon, 0)$ along the $x$ axis and from $(\varepsilon, 0)$ to $(\varepsilon, 0)$, clockwise along the circle
of radius $\varepsilon$. You should draw a picture of this contour. The interesting thing about
this is that $z^{p-1}$ cannot be defined all the way around 0. Therefore, use a branch of
$z^{p-1}$ corresponding to the branch of the logarithm obtained by deleting the positive
$x$ axis. Thus
$$z^{p-1} = e^{(\ln|z| + iA(z))(p-1)}$$
where $z = |z|e^{iA(z)}$ and $A(z) \in (0, 2\pi)$. Along the integral which goes in the positive
direction on the $x$ axis, let $A(z) = 0$ while on the one which goes in the negative
direction, take $A(z) = 2\pi$. This is the appropriate choice obtained by replacing the
line from $(\varepsilon, 0)$ to $(r, 0)$ with two lines having a small gap joined by a circle of radius
$\varepsilon$ and then taking a limit as the gap closes. You should verify that the two integrals
taken along the circles of radius $\varepsilon$ and $r$ converge to 0 as $\varepsilon \to 0$ and as $r \to \infty$.
Therefore, taking the limit,
$$\int_0^\infty \frac{x^{p-1}}{1+x}\,dx + \int_\infty^0 \frac{x^{p-1}}{1+x}\left(e^{2\pi i(p-1)}\right)dx = 2\pi i\operatorname{Res}(f, -1).$$
Calculating the residue of the integrand at $-1$, and simplifying the above expression,
$$\left(1 - e^{2\pi i(p-1)}\right)\int_0^\infty \frac{x^{p-1}}{1+x}\,dx = 2\pi i\,e^{(p-1)i\pi}.$$
Upon simplification
$$\int_0^\infty \frac{x^{p-1}}{1+x}\,dx = \frac{\pi}{\sin p\pi}.$$
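The closed form $\pi/\sin(p\pi)$ can be compared with a direct numerical evaluation. The following is a minimal sketch, assuming the substitution $x = e^s$ and a plain trapezoid rule; the function name is hypothetical.

```python
import math

def integral_x_pow_over_one_plus_x(p, half_width=150.0, step=0.01):
    """Trapezoid rule for the integral of x^(p-1)/(1+x) over (0, infinity).

    After x = e^s the integrand is e^(p*s)/(1+e^s), which decays like
    e^(p*s) on the left and e^(-(1-p)*s) on the right when 0 < p < 1.
    """
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        s = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(p * s) / (1.0 + math.exp(s))
    return total * step

for p in (0.25, 0.5, 0.75):
    print(p, integral_x_pow_over_one_plus_x(p), math.pi / math.sin(p * math.pi))
```

For each $p$ the two columns should agree closely. Values of $p$ near 0 or 1 would need a wider window because the integrand then decays slowly on one side.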
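Example 21.24 The Fresnel integrals are
$$\int_0^\infty \cos\left(x^2\right)dx, \qquad \int_0^\infty \sin\left(x^2\right)dx.$$
To evaluate these integrals consider $f(z) = e^{iz^2}$ on the curve which goes from
the origin to the point $r$ on the $x$ axis and from this point to the point $r\left(\frac{1+i}{\sqrt{2}}\right)$
along a circle of radius $r$, and from there back to the origin as illustrated in the
following picture.
[Figure: the pie-slice contour in the $x$, $y$ plane.]
Thus the curve to integrate over is shaped like a slice of pie. Denote by $\gamma_r$ the
curved part. Since $f$ is analytic,
$$0 = \int_{\gamma_r} e^{iz^2}\,dz + \int_0^r e^{ix^2}\,dx - \int_0^r e^{i\left(t\left(\frac{1+i}{\sqrt{2}}\right)\right)^2}\left(\frac{1+i}{\sqrt{2}}\right)dt$$
$$= \int_{\gamma_r} e^{iz^2}\,dz + \int_0^r e^{ix^2}\,dx - \int_0^r e^{-t^2}\left(\frac{1+i}{\sqrt{2}}\right)dt$$
$$= \int_{\gamma_r} e^{iz^2}\,dz + \int_0^r e^{ix^2}\,dx - \frac{\sqrt{\pi}}{2}\left(\frac{1+i}{\sqrt{2}}\right) + e(r)$$
where $e(r) \to 0$ as $r \to \infty$. Here we used the fact that $\int_0^\infty e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}$. Now
consider the first of these integrals.
$$\left|\int_{\gamma_r} e^{iz^2}\,dz\right| = \left|\int_0^{\pi/4} e^{i\left(re^{it}\right)^2}\,rie^{it}\,dt\right| \le r\int_0^{\pi/4} e^{-r^2\sin 2t}\,dt = \frac{r}{2}\int_0^1 \frac{e^{-r^2u}}{\sqrt{1-u^2}}\,du$$
$$\le \frac{r}{2}\int_0^{r^{-(3/2)}} \frac{1}{\sqrt{1-u^2}}\,du + \frac{r}{2}\left(\int_0^1 \frac{1}{\sqrt{1-u^2}}\,du\right)e^{-\left(r^{1/2}\right)}$$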
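which converges to zero as $r \to \infty$. Therefore, taking the limit as $r \to \infty$,
$$\frac{\sqrt{\pi}}{2}\left(\frac{1+i}{\sqrt{2}}\right) = \int_0^\infty e^{ix^2}\,dx$$
and so
$$\int_0^\infty \sin x^2\,dx = \frac{\sqrt{\pi}}{2\sqrt{2}} = \int_0^\infty \cos x^2\,dx.$$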
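The Fresnel integrals converge only conditionally, so a numerical check has to work with partial integrals. Here is a minimal sketch using a composite Simpson rule, which is a choice made for this illustration. One integration by parts shows that the tail beyond $r$ is at most $1/r$ in absolute value, and that is the tolerance used below.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def fresnel_partial(r, n=200000):
    """Integrals of cos(x^2) and sin(x^2) over [0, r]."""
    c = simpson(lambda x: math.cos(x * x), 0.0, r, n)
    s = simpson(lambda x: math.sin(x * x), 0.0, r, n)
    return c, s

limit = math.sqrt(math.pi) / (2.0 * math.sqrt(2.0))
c40, s40 = fresnel_partial(40.0)
print(c40, s40, limit)
```

Both partial integrals oscillate around the common limit $\sqrt{\pi}/(2\sqrt{2})$ with an amplitude that shrinks like $1/r$.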
The following example is one of the most interesting. By an auspicious choice
of the contour it is possible to obtain a very interesting formula for $\cot \pi z$ known
as the Mittag-Leffler expansion of $\cot \pi z$.
Example 21.25 Let $\gamma_N$ be the contour which goes from $-N - \frac{1}{2} - Ni$ horizontally
to $N + \frac{1}{2} - Ni$ and from there, vertically to $N + \frac{1}{2} + Ni$ and then horizontally
to $-N - \frac{1}{2} + Ni$ and finally vertically to $-N - \frac{1}{2} - Ni$. Thus the contour is a
large rectangle and the direction of integration is in the counter clockwise direction.
Consider the following integral.
$$I_N \equiv \int_{\gamma_N} \frac{\pi\cos \pi z}{\sin \pi z\left(\alpha^2 - z^2\right)}\,dz$$
where $\alpha \in \mathbb{R}$ is not an integer. This will be used to verify the formula of Mittag-Leffler,
$$\frac{1}{\alpha^2} + \sum_{n=1}^\infty \frac{2}{\alpha^2 - n^2} = \frac{\pi\cot \pi\alpha}{\alpha}. \quad (21.12)$$
You should verify that $\cot \pi z$ is bounded on this contour and that therefore,
$I_N \to 0$ as $N \to \infty$. Now you compute the residues of the integrand at $\pm\alpha$ and
at $n$ where $|n| < N + \frac{1}{2}$ for $n$ an integer. These are the only singularities of the
integrand in this contour and therefore, you can evaluate $I_N$ by using these. It is
left as an exercise to calculate these residues and find that the residue at $\pm\alpha$ is
$$\frac{-\pi\cos \pi\alpha}{2\alpha\sin \pi\alpha}$$
while the residue at $n$ is
$$\frac{1}{\alpha^2 - n^2}.$$
Therefore,
$$0 = \lim_{N\to\infty} I_N = \lim_{N\to\infty} 2\pi i\left[\sum_{n=-N}^{N} \frac{1}{\alpha^2 - n^2} - \frac{\pi\cot \pi\alpha}{\alpha}\right]$$
which establishes the following formula of Mittag-Leffler.
$$\lim_{N\to\infty}\sum_{n=-N}^{N} \frac{1}{\alpha^2 - n^2} = \frac{\pi\cot \pi\alpha}{\alpha}.$$
Writing this in a slightly nicer form, yields 21.12.
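Formula 21.12 is easy to test with partial sums. This is a minimal sketch with hypothetical function names. The neglected tail is of size about $2/N$, so agreement is only expected to roughly that accuracy.

```python
import math

def mittag_leffler_partial(alpha, terms):
    """1/alpha^2 + sum_{n=1}^{terms} 2/(alpha^2 - n^2)."""
    total = 1.0 / alpha ** 2
    for n in range(1, terms + 1):
        total += 2.0 / (alpha ** 2 - n ** 2)
    return total

def closed_form(alpha):
    """pi*cot(pi*alpha)/alpha for non-integer alpha."""
    return math.pi / (alpha * math.tan(math.pi * alpha))

for alpha in (0.3, 2.5):
    print(alpha, mittag_leffler_partial(alpha, 100000), closed_form(alpha))
```

The case $\alpha = 2.5$ is a useful extra test because there $\cot \pi\alpha = 0$, so the whole series must sum to zero.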
21.3 The Spectral Radius Of A Bounded Linear
Transformation
As a very important application of the theory of Laurent series, I will give a short
description of the spectral radius. This is a fundamental result which must be
understood in order to prove convergence of various important numerical methods
such as the Gauss Seidel or Jacobi methods.
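Definition 21.26 Let $X$ be a complex Banach space and let $A \in \mathcal{L}(X, X)$. Then
$$r(A) \equiv \left\{\lambda \in \mathbb{C} : (\lambda I - A)^{-1} \in \mathcal{L}(X, X)\right\}$$
This is called the resolvent set. The spectrum of $A$, denoted by $\sigma(A)$ is defined as
all the complex numbers which are not in the resolvent set. Thus
$$\sigma(A) \equiv \mathbb{C} \setminus r(A)$$
Lemma 21.27 $\lambda \in r(A)$ if and only if $\lambda I - A$ is one to one and onto $X$. Also if
$|\lambda| > \|A\|$, then $\lambda \notin \sigma(A)$. If the Neumann series,
$$\frac{1}{\lambda}\sum_{k=0}^\infty \left(\frac{A}{\lambda}\right)^k$$
converges, then
$$\frac{1}{\lambda}\sum_{k=0}^\infty \left(\frac{A}{\lambda}\right)^k = (\lambda I - A)^{-1}.$$
Proof: Note that to be in $r(A)$, $\lambda I - A$ must be one to one and map $X$ onto
$X$ since otherwise, $(\lambda I - A)^{-1} \notin \mathcal{L}(X, X)$.
By the open mapping theorem, if these two algebraic conditions hold, then
$(\lambda I - A)^{-1}$ is continuous and so this proves the first part of the lemma. Now
suppose $|\lambda| > \|A\|$. Consider the Neumann series
$$\frac{1}{\lambda}\sum_{k=0}^\infty \left(\frac{A}{\lambda}\right)^k.$$
By the root test, Theorem 19.3 on Page 450 this series converges to an element
of $\mathcal{L}(X, X)$ denoted here by $B$. Now suppose the series converges. Letting $B_n \equiv \frac{1}{\lambda}\sum_{k=0}^n \left(\frac{A}{\lambda}\right)^k$,
$$(\lambda I - A)B_n = B_n(\lambda I - A) = \sum_{k=0}^n \left(\frac{A}{\lambda}\right)^k - \sum_{k=0}^n \left(\frac{A}{\lambda}\right)^{k+1} = I - \left(\frac{A}{\lambda}\right)^{n+1} \to I$$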
as $n \to \infty$ because the convergence of the series requires the $n^{th}$ term to converge
to 0. Therefore,
$$(\lambda I - A)B = B(\lambda I - A) = I$$
which shows $\lambda I - A$ is both one to one and onto and the Neumann series converges
to $(\lambda I - A)^{-1}$.
This lemma also shows that $\sigma(A)$ is bounded. In fact, $\sigma(A)$ is closed.
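In finite dimensions the Neumann series of Lemma 21.27 can be summed directly. The sketch below is an illustration only: the $2 \times 2$ matrix, the values of $\lambda$, and the helper names are all assumptions. It forms the partial sum $B_n$ and reports how far $(\lambda I - A)B_n$ is from the identity. By the computation in the proof this defect is exactly $(A/\lambda)^{n+1}$.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def neumann_partial_sum(A, lam, n):
    """B_n = (1/lam) * sum_{k=0}^{n} (A/lam)^k."""
    size = len(A)
    scaled = [[A[i][j] / lam for j in range(size)] for i in range(size)]
    term = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    total = [[0.0] * size for _ in range(size)]
    for _ in range(n + 1):
        total = [[total[i][j] + term[i][j] for j in range(size)] for i in range(size)]
        term = mat_mul(term, scaled)
    return [[total[i][j] / lam for j in range(size)] for i in range(size)]

def resolvent_residual(A, lam, n):
    """Largest entry of (lam*I - A) B_n - I in absolute value."""
    size = len(A)
    shifted = [[(lam if i == j else 0.0) - A[i][j] for j in range(size)]
               for i in range(size)]
    product = mat_mul(shifted, neumann_partial_sum(A, lam, n))
    return max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(size) for j in range(size))

A = [[0.5, 0.2], [0.1, 0.3]]   # illustrative matrix whose maximum row sum is 0.7
print(resolvent_residual(A, 2.0, 60), resolvent_residual(A, 1.0 + 1.0j, 60))
```

Both values of $\lambda$ satisfy $|\lambda| > \|A\|$ for the row-sum norm, so the residuals are at rounding level.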
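Lemma 21.28 $r(A)$ is open. In fact, if $\lambda \in r(A)$ and $|\mu - \lambda| < \left\|(\lambda I - A)^{-1}\right\|^{-1}$,
then $\mu \in r(A)$.
Proof: First note
$$(\mu I - A) = \left(I - (\lambda - \mu)(\lambda I - A)^{-1}\right)(\lambda I - A) \quad (21.13)$$
$$= (\lambda I - A)\left(I - (\lambda - \mu)(\lambda I - A)^{-1}\right) \quad (21.14)$$
Also from the assumption about $|\lambda - \mu|$,
$$\left\|(\lambda - \mu)(\lambda I - A)^{-1}\right\| \le |\lambda - \mu|\left\|(\lambda I - A)^{-1}\right\| < 1$$
and so by the root test,
$$\sum_{k=0}^\infty \left((\lambda - \mu)(\lambda I - A)^{-1}\right)^k$$
converges to an element of $\mathcal{L}(X, X)$. As in Lemma 21.27,
$$\sum_{k=0}^\infty \left((\lambda - \mu)(\lambda I - A)^{-1}\right)^k = \left(I - (\lambda - \mu)(\lambda I - A)^{-1}\right)^{-1}.$$
Therefore, from 21.13,
$$(\mu I - A)^{-1} = (\lambda I - A)^{-1}\left(I - (\lambda - \mu)(\lambda I - A)^{-1}\right)^{-1}.$$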
Corollary 21.29 $\sigma(A)$ is a compact set.
Proof: Lemma 21.27 shows $\sigma(A)$ is bounded and Lemma 21.28 shows it is
closed.
Definition 21.30 The spectral radius, denoted by $\rho(A)$ is defined by
$$\rho(A) \equiv \max\{|\lambda| : \lambda \in \sigma(A)\}.$$
Since $\sigma(A)$ is compact, this maximum exists. Note from Lemma 21.27, $\rho(A) \le \|A\|$.
There is a simple formula for the spectral radius.
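Lemma 21.31 If $|\lambda| > \rho(A)$, then the Neumann series,
$$\frac{1}{\lambda}\sum_{k=0}^\infty \left(\frac{A}{\lambda}\right)^k$$
converges.
Proof: This follows directly from Theorem 21.18 on Page 525 and the observation above that $\frac{1}{\lambda}\sum_{k=0}^\infty \left(\frac{A}{\lambda}\right)^k = (\lambda I - A)^{-1}$ for all $|\lambda| > \|A\|$. Thus the analytic
function, $\lambda \to (\lambda I - A)^{-1}$ has a Laurent expansion on $|\lambda| > \rho(A)$ by Theorem 21.18
and it must coincide with $\frac{1}{\lambda}\sum_{k=0}^\infty \left(\frac{A}{\lambda}\right)^k$ on $|\lambda| > \|A\|$ so the Laurent expansion of
$\lambda \to (\lambda I - A)^{-1}$ must equal $\frac{1}{\lambda}\sum_{k=0}^\infty \left(\frac{A}{\lambda}\right)^k$ on $|\lambda| > \rho(A)$. This proves the lemma.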
The theorem on the spectral radius follows. It is due to Gelfand.
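Theorem 21.32 $\rho(A) = \lim_{n\to\infty} \|A^n\|^{1/n}$.
Proof: If
$$|\lambda| < \limsup_{n\to\infty} \|A^n\|^{1/n}$$
then by the root test, the Neumann series does not converge and so by Lemma
21.31 $|\lambda| \le \rho(A)$. Thus
$$\rho(A) \ge \limsup_{n\to\infty} \|A^n\|^{1/n}.$$
Now let $p$ be a positive integer. Then $\lambda \in \sigma(A)$ implies $\lambda^p \in \sigma(A^p)$ because
$$\lambda^p I - A^p = (\lambda I - A)\left(\lambda^{p-1} + \lambda^{p-2}A + \cdots + A^{p-1}\right) = \left(\lambda^{p-1} + \lambda^{p-2}A + \cdots + A^{p-1}\right)(\lambda I - A)$$
It follows from Lemma 21.27 applied to $A^p$ that for $\lambda \in \sigma(A)$, $|\lambda^p| \le \|A^p\|$ and so
$|\lambda| \le \|A^p\|^{1/p}$. Therefore, $\rho(A) \le \|A^p\|^{1/p}$ and since $p$ is arbitrary,
$$\liminf_{p\to\infty} \|A^p\|^{1/p} \ge \rho(A) \ge \limsup_{n\to\infty} \|A^n\|^{1/n}.$$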
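Gelfand's formula is most striking for a matrix whose norm is much larger than its spectral radius. The sketch below uses an illustrative upper triangular matrix with both eigenvalues equal to $0.5$ and a large off-diagonal entry. The matrix, the maximum row sum norm (an operator norm), and the function names are assumptions made for this example.

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def row_sum_norm(X):
    """Operator norm induced by the max norm on vectors."""
    return max(sum(abs(value) for value in row) for row in X)

def gelfand_estimate(A, n):
    """||A^n||^(1/n), computed by repeated multiplication."""
    size = len(A)
    power = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        power = mat_mul(power, A)
    return row_sum_norm(power) ** (1.0 / n)

A = [[0.5, 10.0], [0.0, 0.5]]   # spectral radius 0.5, norm 10.5
for n in (1, 50, 500):
    print(n, gelfand_estimate(A, n))
```

The estimates decrease from $10.5$ toward $0.5$, but slowly: here $\|A^n\| = 0.5^n(1 + 20n)$, so the $n$th root carries the factor $(1 + 20n)^{1/n}$.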
21.4 Analytic Semigroups
21.4.1 Sectorial Operators And Analytic Semigroups
With the theory of functions of a complex variable, it is time to consider the notion
of analytic semigroups. These are better than continuous semigroups. I am mostly
following the presentation in Henry [24]. In what follows H will be a Banach space
unless specified to be a Hilbert space.
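Definition 21.33 Let $\phi < \pi/2$ and for $a \in \mathbb{R}$, let $S_{a\phi}$ denote the sector in the
complex plane
$$\{z \in \mathbb{C}\setminus\{a\} : |\arg(z - a)| \le \pi - \phi\}$$
This sector is as shown below.
[Figure: the sector $S_{a\phi}$ with vertex $a$.]
A closed, densely defined linear operator, $A$ is called sectorial if for some sector
as described above, it follows that for all $\lambda \in S_{a\phi}$,
$$(\lambda I - A)^{-1} \in \mathcal{L}(H, H)$$
and for some $M$ it satisfies
$$\left\|(\lambda I - A)^{-1}\right\| \le \frac{M}{|\lambda - a|}$$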
To begin with it is interesting to have a perturbation theorem for sectorial
operators. First note that for $\lambda \in S_{a\phi}$,
$$A(\lambda I - A)^{-1} = -I + \lambda(\lambda I - A)^{-1}$$
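Proposition 21.34 Suppose $A$ is a sectorial operator as defined above so it is a
densely defined closed operator on $D(A) \subseteq H$ which satisfies
$$\left\|A(\lambda I - A)^{-1}\right\| \le C \quad (21.15)$$
whenever $|\lambda|$, $\lambda \in S_{a\phi}$, is sufficiently large and suppose $B$ is a densely defined closed
operator such that $D(B) \supseteq D(A)$ and for all $x \in D(A)$,
$$\|Bx\| \le \varepsilon\|Ax\| + K\|x\| \quad (21.16)$$
where $\varepsilon C < 1$. Then $A + B$ is also sectorial.
Proof: I need to consider $(\lambda I - (A + B))^{-1}$. This equals
$$\left(\left(I - B(\lambda I - A)^{-1}\right)(\lambda I - A)\right)^{-1}. \quad (21.17)$$
The issue is whether this makes any sense for all $\lambda \in S_{b\phi}$ for some $b \in \mathbb{R}$. Let $b > a$
be very large so that if $\lambda \in S_{b\phi}$, then 21.15 holds. Then from 21.16, it follows that
for $\|x\| \le 1$,
$$\left\|B(\lambda I - A)^{-1}x\right\| \le \varepsilon\left\|A(\lambda I - A)^{-1}x\right\| + K\left\|(\lambda I - A)^{-1}x\right\| \le \varepsilon C + K/|\lambda - a|$$
and so if $b$ is made still larger, it follows this is less than $r < 1$ for all $\|x\| \le 1$.
Therefore, for such $b$,
$$\left(I - B(\lambda I - A)^{-1}\right)^{-1}$$
exists and so for such $b$, the expression in 21.17 makes sense and equals
$$(\lambda I - A)^{-1}\left(I - B(\lambda I - A)^{-1}\right)^{-1}$$
and furthermore,
$$\left\|(\lambda I - A)^{-1}\left(I - B(\lambda I - A)^{-1}\right)^{-1}\right\| \le \frac{M}{|\lambda - a|}\,\frac{1}{1 - r} \le \frac{M'}{|\lambda - b|}$$
by adjusting the constants because
$$\frac{M}{|\lambda - a|}\,\frac{|\lambda - b|}{1 - r}$$
is bounded for $\lambda \in S_{b\phi}$. This proves the proposition.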
It is an interesting proposition because when you have compact embeddings,
such inequalities tend to hold.
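Definition 21.35 Let $\varepsilon > 0$ and for a sectorial operator as defined above, let the
contour $\gamma_{\varepsilon,\phi}$ be as shown next where the orientation is also as shown by the arrow.
[Figure: the contour $\gamma_{\varepsilon,\phi}$ along the boundary of $S_{a\phi}$, with a little circle about $a$.]
The little circle has radius $\varepsilon$ in the above contour.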
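Definition 21.36 For $t \in S^0_{0(\phi+\pi/2)}$ the open sector shown in the following picture,
[Figure: the open sector $S^0_{0(\phi+\pi/2)}$ with vertex 0 and angle $\phi + \pi/2$.]
define
$$S(t) \equiv \frac{1}{2\pi i}\int_{\gamma_{\varepsilon,\phi}} e^{\lambda t}(\lambda I - A)^{-1}\,d\lambda \quad (21.18)$$
where $\varepsilon$ is chosen such that $t$ is a positive distance from the set of points included
in $\gamma_{\varepsilon,\phi}$. The situation is described by the following picture which shows $S^0_{0(\phi+\pi/2)}$
and $S_{0\phi}$. Note how the dotted line is at right angles to the solid line.
[Figure: the sectors $S^0_{0(\phi+\pi/2)}$ and $S_{0\phi}$, with a point $t$ marked.]
Also define $S(0) \equiv I$. It isn't necessary that $\varepsilon$ be small, just that $\gamma_{\varepsilon,\phi}$ not contain
$t$. This is because the integrand in 21.18 is analytic.
Then it is necessary to show the above definition is well defined.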
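Lemma 21.37 The above definition is well defined for $t \in S^0_{0(\phi+\pi/2)}$. Also there is
a constant, $M_r$ such that
$$\|S(t)\| \le M_r$$
for every $t \in S^0_{0(\phi+\pi/2)}$ such that $|\arg t| \le r < \left(\frac{\pi}{2} - \phi\right)$.
Proof: In the definition of $S(t)$ one can take $\varepsilon = 1/|t|$. Then on the little
circle which is part of $\gamma_{\varepsilon,\phi}$ the contour integral equals
$$\frac{1}{2\pi}\int_{\phi-\pi}^{\pi-\phi} e^{e^{i(\theta + \arg(t))}}\left(\frac{1}{|t|}e^{i\theta} - A\right)^{-1}\frac{1}{|t|}e^{i\theta}\,d\theta$$
and by assumption the norm of the integrand is no larger than
$$\frac{M}{1/|t|}\,\frac{1}{|t|}$$
and so the norm of this integral is dominated by
$$\frac{M}{2\pi}\int_{\phi-\pi}^{\pi-\phi} d\theta = \frac{M}{2\pi}(2\pi - 2\phi) \le M$$
which is independent of $t$.
Now consider the part of the contour used to define $S(t)$ which is the top line
segment. This equals
$$\frac{1}{2\pi i}\int_{1/|t|}^\infty e^{ywt}(ywI - A)^{-1}w\,dy$$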
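where $w$ is a fixed complex number of unit length which gives a direction for motion
along this segment, $\arg(w) = \pi - \phi$. Then the norm of this is dominated by
$$\frac{1}{2\pi}\int_{1/|t|}^\infty \left|e^{ywt}\right|\frac{M}{y}\,dy = \frac{1}{2\pi}\int_{1/|t|}^\infty \exp\left(y|t|\cos(\arg(w) + \arg(t))\right)\frac{M}{y}\,dy$$
By assumption $|\arg(t)| \le r < \left(\frac{\pi}{2} - \phi\right)$ and so
$$\arg(w) + \arg(t) \ge (\pi - \phi) - r = \frac{\pi}{2} + \left(\left(\frac{\pi}{2} - \phi\right) - r\right) \equiv \frac{\pi}{2} + \delta(r), \quad \delta(r) > 0.$$
It follows the integral is dominated by
$$\frac{1}{2\pi}\int_{1/|t|}^\infty \exp\left(c(r)|t|y\right)\frac{M}{y}\,dy = \frac{1}{2\pi}\int_1^\infty \exp(c(r)x)\frac{M|t|}{x}\,\frac{1}{|t|}\,dx = \frac{1}{2\pi}\int_1^\infty \exp(c(r)x)\frac{M}{x}\,dx$$
where $c(r) < 0$ independent of $|\arg(t)| \le r$. A similar estimate holds for the integral
on the bottom segment. Thus for $|\arg(t)| \le r$, $\|S(t)\|$ is bounded. This proves the
Lemma.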
Also note that if the contour is shifted to the right slightly, the integral over
the shifted contour, $\gamma'_{\varepsilon,\phi}$ coincides with the integral over $\gamma_{\varepsilon,\phi}$ thanks to the Cauchy
integral formula and an approximation argument involving truncating the infinite
contours and joining them at either end. Also note that in particular, $\|S(t)\|$ is
bounded for all positive real $t$. The following is the main result.
Theorem 21.38 Let $A$ be a sectorial operator as defined in Definition 21.33 for
the sector $S_{0\phi}$.
1. Then $S(t)$ given above in 21.18 is analytic for $t \in S^0_{0(\phi+\pi/2)}$.
2. For any $x \in H$ and $t > 0$, then for $n$ a positive integer,
$$S^{(n)}(t)x = A^n S(t)x$$
3. $S$ is a semigroup on the open sector, $S^0_{0(\phi+\pi/2)}$. That is, for all $t, s \in S^0_{0(\phi+\pi/2)}$,
$$S(t + s) = S(t)S(s)$$
4. $t \to S(t)x$ is continuous at $t = 0$ for all $x \in H$.
5. For some constants $M, N$ such that if $t$ is positive and real,
$$\|S(t)\| \le M, \qquad \|AS(t)\| \le \frac{N}{t}$$
Proof: Consider the first claim. This follows right away from the formula.
$$S(t) \equiv \frac{1}{2\pi i}\int_{\gamma_{\varepsilon,\phi}} e^{\lambda t}(\lambda I - A)^{-1}\,d\lambda$$
The estimates for uniform convergence do not change for small changes in $t$ and so
the formula can be differentiated with respect to the complex variable $t$ using the
dominated convergence theorem to obtain
$$S'(t) \equiv \frac{1}{2\pi i}\int_{\gamma_{\varepsilon,\phi}} \lambda e^{\lambda t}(\lambda I - A)^{-1}\,d\lambda = \frac{1}{2\pi i}\int_{\gamma_{\varepsilon,\phi}} e^{\lambda t}\left(I + A(\lambda I - A)^{-1}\right)d\lambda = \frac{1}{2\pi i}\int_{\gamma_{\varepsilon,\phi}} e^{\lambda t}A(\lambda I - A)^{-1}\,d\lambda$$
because of the Cauchy integral theorem and an approximation result. Now approximating the infinite contour with a finite one and then the integral with Riemann
sums, one can use the fact $A$ is closed to take $A$ out of the integral and write
$$S'(t) = A\left(\frac{1}{2\pi i}\int_{\gamma_{\varepsilon,\phi}} e^{\lambda t}(\lambda I - A)^{-1}\,d\lambda\right) = AS(t)$$
To get the higher derivatives, note $S(t)$ has infinitely many derivatives due to $t$
being a complex variable. Therefore,
$$S''(t) = \lim_{h\to 0}\frac{S'(t + h) - S'(t)}{h} = \lim_{h\to 0} A\,\frac{S(t + h) - S(t)}{h}$$
and
$$\frac{S(t + h) - S(t)}{h} \to AS(t)$$
and so since $A$ is closed, $AS(t) \in D(A)$ and
$$S''(t) = A^2 S(t)$$
Continuing this way yields the claims 1.) and 2.). Note this also implies $S(t)x \in D(A)$ for each $t \in S^0_{0(\phi+\pi/2)}$.
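Next consider the semigroup property. Let $s, t \in S^0_{0(\phi+\pi/2)}$ and let $\varepsilon$ be sufficiently small that $\gamma_{\varepsilon,\phi}$ is at a positive distance from both $s$ and $t$. As described
above let $\gamma'_{\varepsilon,\phi}$ denote the contour shifted slightly to the right, still at a positive
distance from $t$. Then
$$S(t)S(s) = \left(\frac{1}{2\pi i}\right)^2\int_{\gamma_{\varepsilon,\phi}}\int_{\gamma'_{\varepsilon,\phi}} e^{\lambda t}(\lambda I - A)^{-1}e^{\mu s}(\mu I - A)^{-1}\,d\mu\,d\lambda$$
At this point note that
$$(\lambda I - A)^{-1}(\mu I - A)^{-1} = (\mu - \lambda)^{-1}\left((\lambda I - A)^{-1} - (\mu I - A)^{-1}\right).$$
Then substituting this in the integrals above, it equals
$$\left(\frac{1}{2\pi i}\right)^2\int_{\gamma_{\varepsilon,\phi}}\int_{\gamma'_{\varepsilon,\phi}} e^{\mu s}e^{\lambda t}\left((\mu - \lambda)^{-1}\left((\lambda I - A)^{-1} - (\mu I - A)^{-1}\right)\right)d\mu\,d\lambda$$
$$= -\left(\frac{1}{2\pi i}\right)^2\int_{\gamma_{\varepsilon,\phi}} e^{\lambda t}\int_{\gamma'_{\varepsilon,\phi}} e^{\mu s}(\mu - \lambda)^{-1}(\mu I - A)^{-1}\,d\mu\,d\lambda + \left(\frac{1}{2\pi i}\right)^2\int_{\gamma_{\varepsilon,\phi}}\int_{\gamma'_{\varepsilon,\phi}} e^{\mu s}e^{\lambda t}(\mu - \lambda)^{-1}(\lambda I - A)^{-1}\,d\mu\,d\lambda$$
The order of integration can be interchanged because of the absolute convergence
and Fubini's theorem. Then this reduces to
$$= -\left(\frac{1}{2\pi i}\right)^2\int_{\gamma'_{\varepsilon,\phi}} (\mu I - A)^{-1}e^{\mu s}\int_{\gamma_{\varepsilon,\phi}} e^{\lambda t}(\mu - \lambda)^{-1}\,d\lambda\,d\mu + \left(\frac{1}{2\pi i}\right)^2\int_{\gamma_{\varepsilon,\phi}} (\lambda I - A)^{-1}e^{\lambda t}\int_{\gamma'_{\varepsilon,\phi}} e^{\mu s}(\mu - \lambda)^{-1}\,d\mu\,d\lambda$$
Now the following diagram might help in drawing some interesting conclusions.
The first iterated integral equals 0. This can be seen from the above picture.
The inner integral taken over $\gamma_{\varepsilon,\phi}$ is essentially equal to the integral over the closed
contour in the above picture provided the radius of the part of the circle in the
above closed contour is large enough. This closed contour integral equals 0 by the
Cauchy integral theorem. The second iterated integral equals
$$\frac{1}{2\pi i}\int_{\gamma_{\varepsilon,\phi}} (\lambda I - A)^{-1}e^{\lambda t}e^{\lambda s}\,d\lambda$$
This can be seen from considering a similar closed contour, using the Cauchy integral
formula. This verifies the semigroup identity.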
Next consider 4.), the continuity at $t = 0$. This requires showing
$$\lim_{t\to 0+} S(t)x = x$$
where the limit is taken through positive real values of $t$. It suffices to let $x \in D(A)$
because by Lemma 21.37, $\|S(t)\|$ is bounded for these values of $t$. Also in doing
the computation, let $\gamma_{\varepsilon,\phi}$ equal $\gamma_{t^{-1},\phi}$ and it will be assumed $t < 1$. Then one must
estimate
$$\|S(t)x - x\| \equiv \left\|\frac{1}{2\pi i}\int_{\gamma_{t^{-1},\phi}} e^{\lambda t}(\lambda I - A)^{-1}x\,d\lambda - Ix\right\| \quad (21.19)$$
By the Cauchy integral formula and approximating the integral with a contour
integral over a finite closed contour,
$$\frac{1}{2\pi i}\int_{\gamma_{t^{-1},\phi}} e^{\lambda t}\lambda^{-1}\,d\lambda = 1$$
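and so 21.19 equals
$$\left\|\frac{1}{2\pi i}\int_{\gamma_{t^{-1},\phi}} e^{\lambda t}\left((\lambda I - A)^{-1} - I\lambda^{-1}\right)x\,d\lambda\right\| = \left\|\frac{1}{2\pi i}\int_{\gamma_{t^{-1},\phi}} e^{\lambda t}\lambda^{-1}(\lambda I - A)^{-1}Ax\,d\lambda\right\|$$
Changing the variable letting $\lambda t = \mu$, the above equals
$$\left\|\frac{1}{2\pi i}\int_{\gamma_{1,\phi}} e^{\mu}\,t(\mu)^{-1}\left((\mu/t)I - A\right)^{-1}Ax\,\frac{1}{t}\,d\mu\right\|$$
which is dominated by
$$\frac{1}{2\pi}\int_{\gamma_{1,\phi}} e^{-|\mu|\cos\phi}\,\frac{t^2 M}{|\mu|^2}\,\|Ax\|\,\frac{1}{t}\,d|\mu| = tC\|Ax\|.$$
Therefore, $\lim_{t\to 0+}\|S(t)x - x\| = 0$ whenever $x \in D(A)$. Since $S(t)$ is bounded
for positive $t$, if $y \in H$ is arbitrary, choose $x \in D(A)$ close enough to $y$ such that
$$\|S(t)y - y\| \le \|S(t)(y - x)\| + \|S(t)x - x\| + \|x - y\| \le \varepsilon/2 + \|S(t)x - x\|$$
and now if $t$ is close enough to 0 it follows the right side is less than $\varepsilon$. This proves
4.).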
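Finally consider 5.). The first part follows from Lemma 21.37. It remains to
show the second part. Let $x \in H$. First suppose $t \ne 1$.
$$\|AS(t)x\| = \left\|\frac{1}{2\pi i}\int_{\gamma_{t^{-1},\phi}} e^{\lambda t}A(\lambda I - A)^{-1}x\,d\lambda\right\| = \left\|\frac{1}{2\pi i}\int_{\gamma_{t^{-1},\phi}} e^{\lambda t}\left(-I + \lambda(\lambda I - A)^{-1}\right)x\,d\lambda\right\|$$
$$= \left\|\frac{1}{2\pi i}\int_{\gamma_{t^{-1},\phi}} e^{\lambda t}\lambda(\lambda I - A)^{-1}x\,d\lambda\right\| = \left\|\frac{1}{2\pi i}\int_{\gamma_{1,\phi}} e^{\mu}\,\frac{\mu}{t}\left(\frac{\mu}{t}I - A\right)^{-1}x\,\frac{1}{t}\,d\mu\right\|$$
and this is dominated by
$$\frac{1}{2\pi}\int_{\gamma_{1,\phi}} e^{-|\mu|\cos\phi}M\,d|\mu|\,\frac{1}{t} = \frac{N}{t}.$$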
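The contour formula 21.18 can be tried in the simplest possible case, where $A$ is multiplication by a negative number $a$ on $H = \mathbb{C}$, so that the resolvent is $1/(\lambda - a)$ and the expected semigroup is $e^{at}$. The sketch below is an illustration under these assumptions. The angle $\phi = \pi/4$, the radius $\varepsilon = 1$, the truncation of the rays, and the Simpson rule are all choices made here.

```python
import cmath
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule for a complex-valued f on [a, b]; n is even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return total * h / 3.0

def semigroup_scalar(a, t, phi=math.pi / 4.0, eps=1.0, y_max=80.0):
    """(1/(2*pi*i)) * contour integral of e^(lam*t)/(lam - a) for scalar A = a."""
    theta = math.pi - phi                      # rays leave 0 at angles +theta, -theta
    up, down = cmath.exp(1j * theta), cmath.exp(-1j * theta)
    integrand = lambda lam: cmath.exp(lam * t) / (lam - a)
    # Lower ray, traversed from infinity in toward the little circle.
    lower = -simpson(lambda y: integrand(y * down) * down, eps, y_max)
    # Arc of the little circle, counter clockwise through the positive real axis.
    arc = simpson(lambda th: integrand(eps * cmath.exp(1j * th))
                  * 1j * eps * cmath.exp(1j * th), -theta, theta)
    # Upper ray, traversed outward.
    upper = simpson(lambda y: integrand(y * up) * up, eps, y_max)
    return (lower + arc + upper) / (2j * math.pi)

print(semigroup_scalar(-2.0, 1.0), math.exp(-2.0))
print(semigroup_scalar(-2.0, 0.5) * semigroup_scalar(-2.0, 0.7),
      semigroup_scalar(-2.0, 1.2))
```

The first line compares the contour integral with $e^{at}$ and the second checks $S(t)S(s) = S(t+s)$. In this scalar case $\|AS(t)\| = |a|e^{at} \le 1/(et)$, which is consistent with the bound $N/t$ in part 5.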
What if $A$ is sectorial in the more general sense with $S_{a\phi}$ taking the place of $S_{0\phi}$
and the resolvent estimate being
$$\left\|(\lambda I - A)^{-1}\right\| \le \frac{M}{|\lambda - a|}, \quad \lambda \in S_{a\phi}?$$
Then $S_{a\phi} = a + S_{0\phi}$ and so for $\lambda \in S_{0\phi}$, $(\lambda + a) \in S_{a\phi}$ and so
$$(\lambda I - (A - aI))^{-1} = ((\lambda + a)I - A)^{-1} \in \mathcal{L}(H, H)$$
and
$$\left\|(\lambda I - (A - aI))^{-1}\right\| \le \left\|((\lambda + a)I - A)^{-1}\right\| \le \frac{M}{|(\lambda + a) - a|} = \frac{M}{|\lambda|}$$
Therefore, letting $A_a \equiv A - aI$, it follows the result of Theorem 21.38 holds for this
operator. There exists a semigroup, $S_a(t)$ satisfying
1. $S_a(t)$ is analytic for $t \in S^0_{0(\phi+\pi/2)}$.
2. $S_a^{(n)}(t)x = A_a^n S_a(t)x$
3. $S_a$ is a semigroup on the open sector, $S^0_{0(\phi+\pi/2)}$. That is, for all $t, s \in S^0_{0(\phi+\pi/2)}$,
$$S_a(t + s) = S_a(t)S_a(s)$$
4. $t \to S_a(t)x$ is continuous at $t = 0$ for all $x \in H$.
5. $\|S_a(t)\| \le M$, $\|A_a S_a(t)\| \le \frac{N}{t}$
Define
$$S(t) \equiv e^{at}S_a(t).$$
This satisfies the semigroup identity, is continuous at 0, and satisfies the inequality
$$\|S(t)\| \le Me^{at}$$
for all $t$ real and nonnegative. What is its derivative?
$$S'(t) = ae^{at}S_a(t) + (A - aI)e^{at}S_a(t) = Ae^{at}S_a(t) = AS(t)$$
and by continuing this way using $A$ is closed as before, it follows
$$S^{(n)}(t) = A^n S(t).$$
Also for $t > 0$,
$$\|AS(t)\| = \left\|e^{at}AS_a(t)\right\| = \left\|e^{at}(A - aI)S_a(t) + e^{at}aS_a(t)\right\| \le \left\|e^{at}A_aS_a(t)\right\| + \left\|e^{at}aS_a(t)\right\| \le e^{at}\frac{N}{t} + ae^{at}M = e^{at}\left(\frac{N}{t} + aM\right).$$
It is clear $S(t)x$ is continuous at 0 since this is true of $S_a(t)$.
This proves the following corollary which is a useful generalization of the above
major theorem.
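Corollary 21.39 Let $A$ be sectorial satisfying
$$(\lambda I - A)^{-1} \in \mathcal{L}(H, H) \text{ for } \lambda \in S_{a\phi}$$
for $\phi < \pi/2$ and
$$\left\|(\lambda I - A)^{-1}\right\| \le \frac{M}{|\lambda - a|}$$
for all $\lambda \in S_{a\phi}$. Then there exists a semigroup, $S(t)$ defined on $S^0_{0(\phi+\pi/2)}$ which
satisfies
1. $S(t)$ is analytic for $t \in S^0_{0(\phi+\pi/2)}$.
2. $S^{(n)}(t)x = A^n S(t)x$
3. $S(t)$ is a semigroup on the open sector, $S^0_{0(\phi+\pi/2)}$. That is, for all $t, s \in S^0_{0(\phi+\pi/2)}$,
$$S(t + s) = S(t)S(s)$$
4. $t \to S(t)x$ is continuous at $t = 0$ for all $x \in H$.
5. $\|S(t)\| \le Me^{at}$, $\|AS(t)\| \le e^{at}\left(\frac{N}{t} + aM\right)$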
21.4.2 The Numerical Range
In Hilbert space, there is a useful easy to check criterion which implies an operator
is sectorial.
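Definition 21.40 Let $A$ be a closed densely defined operator $A : D(A) \to H$ for
$H$ a Hilbert space. The numerical range is the following set.
$$\{(Au, u) : u \in D(A)\}$$
Also recall the resolvent set, $r(A)$ consists of those $\lambda \in \mathbb{C}$ such that $(\lambda I - A)^{-1} \in \mathcal{L}(H, H)$. Thus, to be in this set $\lambda I - A$ is one to one and onto with continuous
inverse.
Proposition 21.41 Suppose the numerical range of $A$, a closed densely defined operator $A : D(A) \to H$ for $H$ a Hilbert space is contained in the set
$$\{z \in \mathbb{C} : |\arg(z)| \ge \pi - \phi\}$$
where $0 < \phi < \pi/2$ and suppose $A^{-1} \in \mathcal{L}(H, H)$, ($0 \in r(A)$). Then $A$ is sectorial
with the sector $S_{0,\phi'}$ where $\pi/2 > \phi' > \phi$.
Proof: Here is a picture of the situation along with details used to motivate
the proof.
[Figure: the wedge containing the numerical range, a point $(Tu, u)$ in it, and a point $\lambda$ outside the slightly larger wedge of angle $\phi'$.]
In the picture the angle which is a little larger than $\phi$ is $\phi'$. Let $\lambda$ be as shown
with $|\arg\lambda| \le \pi - \phi'$. Then from the picture and trigonometry, if $u \in D(A)$,
$$|\lambda|\sin\left(\phi' - \phi\right) < \left|\lambda - \left(A\frac{u}{|u|}, \frac{u}{|u|}\right)\right|$$
and so
$$|u|\,|\lambda|\sin\left(\phi' - \phi\right) < \left|\left(\lambda u - Au, \frac{u}{|u|}\right)\right| \le \|(\lambda I - A)u\|$$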
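Hence for all $\lambda$ such that $|\arg\lambda| \le \pi - \phi'$ and $u \in D(A)$,
$$|u| < \left(\frac{1}{\sin(\phi' - \phi)}\right)\frac{1}{|\lambda|}\,|(\lambda I - A)u| \equiv \frac{M}{|\lambda|}\,|(\lambda I - A)u|$$
Thus $(\lambda I - A)$ is one to one on $S_{0,\phi'}$ and if $\lambda \in r(A)$, then
$$\left\|(\lambda I - A)^{-1}\right\| < \frac{M}{|\lambda|}.$$
By assumption $0 \in r(A)$. Now if $|\mu|$ is small, $(\mu I - A)^{-1}$ must exist because it equals
$\left(\left(\mu A^{-1} - I\right)A\right)^{-1}$ and for $|\mu| < \left\|A^{-1}\right\|^{-1}$, $\left(\mu A^{-1} - I\right)^{-1} \in \mathcal{L}(H, H)$ since the infinite series
$$-\sum_{k=0}^\infty \left(\mu A^{-1}\right)^k$$
converges and must equal $\left(\mu A^{-1} - I\right)^{-1}$. Therefore, there exists $\mu \in S_{0,\phi'}$ such
that $\mu \ne 0$ and $\mu \in r(A)$. Also if $\mu \ne 0$ and $\mu \in S_{0,\phi'}$ with $\mu \in r(A)$, then if $|\lambda - \mu| < \frac{|\mu|}{M}$, $(\lambda I - A)^{-1}$
must exist because
$$(\lambda I - A)^{-1} = \left(\left((\lambda - \mu)(\mu I - A)^{-1} + I\right)(\mu I - A)\right)^{-1}$$
where $\left((\lambda - \mu)(\mu I - A)^{-1} + I\right)^{-1}$ exists because
$$\left\|(\lambda - \mu)(\mu I - A)^{-1}\right\| = |\lambda - \mu|\left\|(\mu I - A)^{-1}\right\| < \frac{|\mu|}{M}\cdot\frac{M}{|\mu|} = 1.$$
It follows that if $S \equiv \left\{\lambda \in S_{0,\phi'} : \lambda \in r(A)\right\}$, then $S$ is open in $S_{0,\phi'}$. However, $S$ is
also closed because if $\lambda = \lim_{n\to\infty}\lambda_n$ where $\lambda_n \in S$, then if $\lambda = 0$, it is given $\lambda \in S$.
If $\lambda \ne 0$, then for large enough $n$,
$$|\lambda - \lambda_n| < \frac{|\lambda_n|}{M}$$
and so $\lambda \in S$. Since $S_{0,\phi'}$ is connected, it follows $S = S_{0,\phi'}$. This proves the proposition.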
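The key inequality in this proof, $|u|\,|\lambda|\sin(\phi' - \phi) \le \|(\lambda I - A)u\|$, can be probed numerically for a matrix. The sketch below is an illustration with an assumed $2 \times 2$ example. For $A = -2I + J$ with $J$ skew symmetric, $(Au, u) = -2|u|^2 + iy$ with $|y| \le |u|^2$, so the numerical range lies in $|\arg z| \ge \pi - \arctan(1/2)$ and $\phi = 0.5$ is admissible. The value $\phi' = 1.0$ and the random sampling are further choices made here.

```python
import cmath
import math
import random

A = [[-2.0, 1.0], [-1.0, -2.0]]   # illustrative matrix: -2*I plus a skew part

def apply_matrix(M, u):
    return [sum(M[i][j] * u[j] for j in range(len(u))) for i in range(len(u))]

def vector_norm(u):
    return math.sqrt(sum(abs(x) ** 2 for x in u))

def min_slack(M, phi, phi_prime, samples=2000, seed=0):
    """Smallest observed ||(lam*I - M)u|| - |lam|*sin(phi' - phi)*|u|.

    lam is sampled with |arg(lam)| <= pi - phi' and u is a random complex vector.
    """
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(samples):
        u = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in M]
        radius = rng.uniform(0.01, 50.0)
        angle = rng.uniform(-(math.pi - phi_prime), math.pi - phi_prime)
        lam = radius * cmath.exp(1j * angle)
        Mu = apply_matrix(M, u)
        residual = vector_norm([lam * u[i] - Mu[i] for i in range(len(u))])
        bound = abs(lam) * math.sin(phi_prime - phi) * vector_norm(u)
        worst = min(worst, residual - bound)
    return worst

print(min_slack(A, 0.5, 1.0))
```

A nonnegative result means that no sampled pair $(\lambda, u)$ violates the inequality. This is the estimate that yields the resolvent bound with $M = 1/\sin(\phi' - \phi)$.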
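Corollary 21.42 If for some $a \in \mathbb{R}$, the numerical range of $-aI + A$ is in the
set $\{\lambda : |\arg\lambda| \ge \pi - \phi\}$ where $0 < \phi < \pi/2$, and $a \in r(A)$ then $A$ is sectorial.
Proof: By assumption, $0 \in r(-aI + A)$ and also from Proposition 21.41, for
$\mu \in S_{0,\phi'}$ where $\pi/2 > \phi' > \phi$,
$$((-aI + A) - \mu I)^{-1} \in \mathcal{L}(H, H), \qquad \left\|((-aI + A) - \mu I)^{-1}\right\| \le \frac{M}{|\mu|}$$
Therefore, for $\mu \in S_{0,\phi'}$, $\mu + a \in r(A)$. Therefore, if $\lambda \in S_{a,\phi'}$, then $\lambda - a \in S_{0,\phi'}$ and
$$\left\|(A - \lambda I)^{-1}\right\| = \left\|(A - aI - (\lambda - a)I)^{-1}\right\| \le \frac{M}{|\lambda - a|}$$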
21.4.3 An Interesting Example
In this section related to this example, for $V$ a Banach space, $V'$ will denote the
space of continuous conjugate linear functions defined on $V$. Usually the symbol
has meant the space of continuous linear functions but here they will be conjugate
linear. That is $f \in V'$ means
$$f(ax + by) = \bar{a}f(x) + \bar{b}f(y)$$
and $f$ is continuous.
Let $\Omega$ be a bounded open set in $\mathbb{R}^n$ and define
$$V_0 \equiv \left\{u \in C^\infty\left(\overline{\Omega}\right) : u = 0 \text{ on } \Gamma\right\}$$
where $\Gamma$ is some measurable subset of the boundary of $\Omega$ and $C^\infty\left(\overline{\Omega}\right)$ denotes the
restrictions of functions in $C_c^\infty(\mathbb{R}^n)$ to $\Omega$. By Corollary 10.48 $V_0$ is dense in $L^2(\Omega)$.
Now define the following for $u, v \in V_0$.
$$A_0u(v) \equiv -a\int_\Omega u\bar{v}\,dx - \int_\Omega a(x)\nabla u\cdot\nabla\bar{v}\,dx$$
where $a > 0$ and $a(x) \ge 0$ is a $C^1\left(\overline{\Omega}\right)$ function. Also define the following inner
product on $V_0$.
$$(u, v)_1 \equiv \int_\Omega \left(au\bar{v} + a(x)\nabla u\cdot\nabla\bar{v}\right)dx$$
Let $\|\cdot\|_1$ denote the corresponding norm.
Of course $V_0$ is not a Banach space because it fails to be complete. $u \in V$ will
mean that $u \in L^2(\Omega)$ and there exists a sequence $\{u_n\} \subseteq V_0$ such that
$$\lim_{m,n\to\infty}\|u_n - u_m\|_1 = 0$$
and
$$\lim_{n\to\infty}|u_n - u|_{L^2(\Omega)} = 0.$$
For $u \in V$, define $\nabla u$ to be that element of $L^2(\Omega; \mathbb{C}^n, a(x)\,dm_n)$, the space of vector
valued $L^2$ functions taken with respect to the measure $a(x)\,dm_n$ which satisfies
$$|\nabla u - \nabla u_n|_{L^2(\Omega;\mathbb{C}^n,a(x)dm_n)} \to 0.$$
Denote this space by $W$ for simplicity of notation.
Observation 21.43 $V$ is a Hilbert space with inner product given by
$$(u, v)_1 \equiv \int_\Omega \left(au\bar{v} + a(x)\nabla u\cdot\nabla\bar{v}\right)dx$$
Everything is obvious except completeness. Suppose then that $\{u_n\}$ is a Cauchy
sequence in $V$. Then there exists a unique $u \in L^2(\Omega)$ such that $|u_n - u|_{L^2(\Omega)} \to 0$.
Now let $w_n \in V_0$ satisfy
$$|w_n - u_n|_{L^2(\Omega)} + |\nabla w_n - \nabla u_n|_W < 1/2^n$$
It follows $\{\nabla w_n\}$ is also a Cauchy sequence in $W$ while $w_n$ is a Cauchy sequence
in $L^2(\Omega)$ converging to $u$. Thus the thing to which $\nabla w_n$ converges in $W$ is the
definition of $\nabla u$ and $u \in V$. Thus
$$\|u_n - u\|_1 \le \|u_n - w_n\|_1 + \|w_n - u\|_1 < \frac{1}{2^n} + \|w_n - u\|_1$$
and the last term converges to 0. Hence $V$ is complete as claimed.
Then it is clear $V$ is a Hilbert space. The next observation is a simple one
involving the Riesz map.
Definition 21.44 Let $V$ be a Hilbert space and let $V'$ be the space of continuous
conjugate linear functions defined on $V$. Then define $R : V \to V'$ by
$$Rx(y) \equiv (x, y).$$
This is called the Riesz map.
Lemma 21.45 The Riesz map is one to one and onto and linear.
Proof: It is obvious it is one to one and linear. The only challenge is to show
it is onto. Let $z^* \in V'$. If $z^*(V) = \{0\}$, then letting $z = 0$, it follows $Rz = z^*$. If
$z^*(V) \ne 0$, then
$$\ker(z^*) \equiv \{x \in V : z^*(x) = 0\}$$
is a closed subspace. It is closed because $z^*$ is continuous and it is just $(z^*)^{-1}(0)$.
Since $\ker(z^*)$ is not everything in $V$ there exists
$$w \in \ker(z^*)^\perp \equiv \{x : (x, y) = 0 \text{ for all } y \in \ker(z^*)\}$$
and $w \ne 0$. Then
$$z^*\left(\overline{z^*(x)}\,w - \overline{z^*(w)}\,x\right) = z^*(x)z^*(w) - z^*(w)z^*(x) = 0$$
and so $\overline{z^*(x)}\,w - \overline{z^*(w)}\,x \in \ker(z^*)$. Therefore, for any $x \in V$,
$$0 = \left(w, \overline{z^*(x)}\,w - \overline{z^*(w)}\,x\right) = z^*(x)(w, w) - z^*(w)(w, x)$$
and so
$$z^*(x) = \left(\frac{z^*(w)\,w}{\|w\|^2}, x\right)$$
so let $z = z^*(w)\,w/\|w\|^2$. Then $Rz = z^*$ and so $R$ is onto. This proves the lemma.
Now for the $V$ described above,
$$Ru(v) = \int_\Omega \left(au\bar{v} + a(x)\nabla u\cdot\nabla\bar{v}\right)dx$$
Also, as noted above $V$ is dense in $H \equiv L^2(\Omega)$ and so if $H$ is identified with $H'$, it
follows
$$V \subseteq H = H' \subseteq V'.$$
Let $A : D(A) \to H$ be given by
$$D(A) \equiv \{u \in V : Ru \in H\}$$
and
$$A \equiv -R$$
on $D(A)$. Then the numerical range for $A$ is contained in $(-\infty, -a]$ and so $A$ is
sectorial by Proposition 21.41 provided $A$ is closed and densely defined.
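The claim about the numerical range has a transparent discrete analogue. The sketch below is not the construction in the notes but a one-dimensional finite difference stand-in under stated assumptions: $\Omega = (0, 1)$, the condition $u = 0$ imposed at both endpoints, the hypothetical coefficient $a(x) = 1 + x^2$, and real test vectors. Summation by parts then gives $(Au, u) + a(u, u) = -\sum c\,(\Delta u)^2/h^2 \le 0$, mirroring $(Au, u) = -(u, u)_1$.

```python
import random

def apply_discrete_A(u, a, coef, h):
    """(Au)_i = -a*u_i + [c_{i+1/2}(u_{i+1}-u_i) - c_{i-1/2}(u_i-u_{i-1})]/h^2.

    The vector u holds interior grid values; u is taken to be 0 at both ends.
    """
    n = len(u)
    padded = [0.0] + list(u) + [0.0]
    result = []
    for i in range(1, n + 1):
        flux_right = coef((i + 0.5) * h) * (padded[i + 1] - padded[i])
        flux_left = coef((i - 0.5) * h) * (padded[i] - padded[i - 1])
        result.append(-a * padded[i] + (flux_right - flux_left) / h ** 2)
    return result

def numerical_range_gap(n=50, a=1.5, trials=200, seed=0):
    """Largest observed value of (Au,u) + a*(u,u) over random real vectors u."""
    rng = random.Random(seed)
    h = 1.0 / (n + 1)
    coef = lambda x: 1.0 + x * x      # hypothetical coefficient a(x) >= 0
    worst = float("-inf")
    for _ in range(trials):
        u = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        Au = apply_discrete_A(u, a, coef, h)
        quadratic_form = sum(p * q for p, q in zip(Au, u))
        worst = max(worst, quadratic_form + a * sum(q * q for q in u))
    return worst

print(numerical_range_gap())
```

Every sampled value is negative, so $(Au, u) \le -a(u, u)$ for the discrete operator, in line with the numerical range being contained in $(-\infty, -a]$ for unit vectors.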
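Why is $D(A)$ dense? It is because it contains $C_c^\infty(\Omega)$ which is dense in $L^2(\Omega)$.
This follows from integration by parts which shows that for $u, v \in C_c^\infty(\Omega)$,
$$-\int_\Omega au\bar{v}\,dx - \int_\Omega a(x)\nabla u\cdot\nabla\bar{v}\,dx = -\int_\Omega au\bar{v}\,dx + \int_\Omega \nabla\cdot(a(x)\nabla u)\,\bar{v}\,dx$$
and since $C_c^\infty(\Omega)$ is dense in $H$,
$$Au = -au + \nabla\cdot(a(x)\nabla u) \in L^2(\Omega) = H.$$
Why is $A$ closed? If $u_n \in D(A)$ and $u_n \to u$ in $H$ while $Au_n \to \xi$ in $H$, then
it follows from the definition that $Ru_n \to -\xi$ and $\{u_n\}$ converges to $u$ in $V$ so for
any $v \in V$,
$$Ru(v) = \lim_{n\to\infty} Ru_n(v) = \lim_{n\to\infty}(Ru_n, v)_H = (-\xi, v)_H$$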
which shows $Ru = -\xi \in H$ and so $u \in D(A)$ and $Au = \xi$. Thus $A$ is closed. This
completes the example.
Obviously you could follow identical reasoning to include many other examples
of more complexity. What does it mean for $u \in D(A)$? It means that in a weak
sense
$$-au + \nabla\cdot(a(x)\nabla u) \in H.$$
Since $A$ is sectorial for $S_{-a,\phi}$ for any $0 < \phi < \pi/2$, this has shown the existence of a
weak solution to the partial differential equation along with appropriate boundary
conditions,
$$-au + \nabla\cdot(a(x)\nabla u) = f, \quad u \in V.$$
What are these appropriate boundary conditions? $u = 0$ on $\Gamma$ is one. The other
would be a variational boundary condition which comes from integration by parts.
Letting $v \in V$, formally do the following using the divergence theorem.
$$(f, v)_H = \int_\Omega \left(-au + \nabla\cdot(a(x)\nabla u)\right)\bar{v}\,dx$$
$$= -\int_\Omega au\bar{v}\,dx + \int_{\partial\Omega} (a(x)\nabla u\,\bar{v})\cdot n\,ds - \int_\Omega a(x)\nabla u(x)\cdot\nabla\bar{v}(x)\,dx$$
$$= (f, v)_H + \int_{\partial\Omega\setminus\Gamma} (a(x)\nabla u)\cdot n\,\bar{v}\,ds$$
and so the other boundary condition is
$$a(x)\frac{\partial u}{\partial n} = 0 \text{ on } \partial\Omega\setminus\Gamma.$$
To what extent this weak solution is really a classical solution depends on more
technical considerations.
21.4.4 Fractional Powers Of Sectorial Operators
It will always be assumed in this section that $A$ is sectorial for the sector $S_{-a,\phi}$
where $a > 0$. To begin with, here is a useful lemma which will be used in the
presentation of these fractional powers.
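Lemma 21.46 The following holds for $\alpha \in (0, 1)$ and $\sigma < t$.
$$\int_\sigma^t (t - s)^{\alpha-1}(s - \sigma)^{-\alpha}\,ds = \frac{\pi}{\sin(\pi\alpha)}$$
In particular,
$$\int_0^1 (1 - s)^{\alpha-1}s^{-\alpha}\,ds = \frac{\pi}{\sin(\pi\alpha)}.$$
Also for $\alpha, \beta > 0$
$$\Gamma(\alpha)\Gamma(\beta) = \left(\int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx\right)\Gamma(\alpha + \beta).$$
Proof: First change variables to get rid of the $\sigma$. Let $y = (t - \sigma)^{-1}(s - \sigma)$.
Then the integral becomes
$$\int_0^1 (t - [(t - \sigma)y + \sigma])^{\alpha-1}((t - \sigma)y)^{-\alpha}(t - \sigma)\,dy = \int_0^1 ((t - \sigma)(1 - y))^{\alpha-1}((t - \sigma)y)^{-\alpha}(t - \sigma)\,dy = \int_0^1 (1 - y)^{\alpha-1}y^{-\alpha}\,dy$$
Next let $y = x^2$. The integral is
$$2\int_0^1 \left(1 - x^2\right)^{\alpha-1}x^{1-2\alpha}\,dx$$
Next let $x = \sin\theta$.
$$2\int_0^{\frac{1}{2}\pi} (\cos(\theta))^{2\alpha-1}\sin^{(1-2\alpha)}(\theta)\,d\theta = 2\int_0^{\frac{1}{2}\pi} \left(\frac{\cos(\theta)}{\sin(\theta)}\right)^{2\alpha-1}d\theta$$
Now change the variable again. Let $u = \cot(\theta)$. Then this yields
$$2\int_0^\infty \frac{u^{2\alpha-1}}{1 + u^2}\,du$$
This is fairly easy to evaluate using contour integrals. Consider the following contour
called $\Gamma_R$ for large $R$. As $R \to \infty$, the integral over the little circle converges to 0
and so does the integral over the big circle. There is one singularity at $i$.
[Figure: the contour $\Gamma_R$: the segment from $-R$ to $-R^{-1}$, a small semicircle of radius $R^{-1}$ about 0, the segment from $R^{-1}$ to $R$, and a large semicircle of radius $R$.]
Thus
$$\lim_{R\to\infty}\int_{\Gamma_R} \frac{e^{(\ln|z| + i\arg(z))(2\alpha-1)}}{1 + z^2}\,dz = (1 + \cos((2\alpha - 1)\pi))\int_0^\infty \frac{u^{2\alpha-1}}{1 + u^2}\,du + i\sin((2\alpha - 1)\pi)\int_0^\infty \frac{u^{2\alpha-1}}{1 + u^2}\,du$$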
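$$= \pi\left(\cos\left(\frac{\pi}{2}(2\alpha - 1)\right) + i\sin\left(\frac{\pi}{2}(2\alpha - 1)\right)\right)$$
Then equating the imaginary parts yields
$$\sin((2\alpha - 1)\pi)\int_0^\infty \frac{u^{2\alpha-1}}{1 + u^2}\,du = \pi\sin\left(\frac{\pi}{2}(2\alpha - 1)\right)$$
and so using the trig identities for the sum of two angles,
$$\int_0^\infty \frac{u^{2\alpha-1}}{1 + u^2}\,du = \frac{\pi\sin\left(\frac{\pi}{2}(2\alpha - 1)\right)}{2\sin\left(\frac{\pi}{2}(2\alpha - 1)\right)\cos\left(\frac{\pi}{2}(2\alpha - 1)\right)} = \frac{\pi}{2\cos\left(\frac{\pi}{2}(2\alpha - 1)\right)} = \frac{\pi}{2\sin(\pi\alpha)}$$
It remains to verify the last identity.
$$\Gamma(\alpha)\Gamma(\beta) \equiv \int_0^\infty\int_0^\infty t^{\alpha-1}e^{-t}s^{\beta-1}e^{-s}\,ds\,dt = \int_0^\infty\int_t^\infty t^{\alpha-1}e^{-u}(u - t)^{\beta-1}\,du\,dt$$
$$= \int_0^\infty e^{-u}\int_0^u t^{\alpha-1}(u - t)^{\beta-1}\,dt\,du = \int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx\int_0^\infty e^{-u}u^{\alpha+\beta-1}\,du$$
$$= \left(\int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx\right)\Gamma(\alpha + \beta)$$
If it is not stated otherwise, in all that follows $\alpha > 0$.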
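Both identities of Lemma 21.46 can be checked numerically. The sketch below is illustrative: the logistic substitution $x = e^s/(1 + e^s)$, the trapezoid rule, and the function name are choices made here. It evaluates $\int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx$ and compares it with $\pi/\sin(\pi\alpha)$ in the case $\beta = 1 - \alpha$ and with $\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha + \beta)$ in general.

```python
import math

def beta_integral(alpha, beta, half_width=150.0, step=0.01):
    """Trapezoid rule for the integral of x^(alpha-1) (1-x)^(beta-1) over (0, 1).

    With x = e^s/(1+e^s) the integrand becomes e^(alpha*s)/(1+e^s)^(alpha+beta)
    on the real line, decaying like e^(alpha*s) and e^(-beta*s) at the two ends.
    """
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        s = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(alpha * s) / (1.0 + math.exp(s)) ** (alpha + beta)
    return total * step

a = 0.3
print(beta_integral(1.0 - a, a), math.pi / math.sin(math.pi * a))
print(math.gamma(0.7) * math.gamma(1.9), beta_integral(0.7, 1.9) * math.gamma(2.6))
```

The first line tests $\int_0^1 (1 - s)^{\alpha-1}s^{-\alpha}\,ds = \pi/\sin(\pi\alpha)$ and the second tests the identity for $\Gamma(\alpha)\Gamma(\beta)$.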
Definition 21.47 Let $A$ be a sectorial operator corresponding to the sector $S_{a\phi}$
where $a < 0$. Then define for $\alpha > 0$,
$$(-A)^{-\alpha} \equiv \frac{1}{\Gamma(\alpha)}\int_0^\infty t^{\alpha-1}S(t)\,dt$$
where $S(t)$ is the analytic semigroup generated by $A$ as in Corollary 21.39. Note
that from the estimate, $\|S(t)\| \le Me^{at}$ of this corollary, the integral is well defined
and is in $\mathcal{L}(H, H)$.
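Theorem 21.48 For $(-A)^{-\alpha}$ as defined in Definition 21.47
$$(-A)^{-\alpha}(-A)^{-\beta} = (-A)^{-(\alpha+\beta)} \quad (21.20)$$
Also
$$(-A)^{-1}(-A) = I, \quad (-A)(-A)^{-1} = I \quad (21.21)$$
and $(-A)^{-\alpha}$ is one to one if $\alpha \ge 0$, defining $A^0 \equiv I$.
If $\alpha < \beta$, then
$$(-A)^{-\beta}(H) \subseteq (-A)^{-\alpha}(H). \quad (21.22)$$
If $\alpha \in (0, 1)$, then
$$(-A)^{-\alpha} = \frac{\sin(\pi\alpha)}{\pi}\int_0^\infty \lambda^{-\alpha}(\lambda I - A)^{-1}\,d\lambda \quad (21.23)$$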
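Proof: Consider 21.20.
$$(-A)^{-\alpha}(-A)^{-\beta} \equiv \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\int_0^\infty\int_0^\infty t^{\alpha-1}s^{\beta-1}S(t + s)\,ds\,dt$$
Changing variables and using Fubini's theorem which is justified because of the
absolute convergence of the iterated integrals, which follows from Corollary 21.39,
this becomes
$$\frac{1}{\Gamma(\alpha)\Gamma(\beta)}\int_0^\infty\int_t^\infty t^{\alpha-1}(u - t)^{\beta-1}S(u)\,du\,dt = \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\int_0^\infty\int_0^u t^{\alpha-1}(u - t)^{\beta-1}S(u)\,dt\,du$$
$$= \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\int_0^\infty S(u)\int_0^1 (ux)^{\alpha-1}(u - ux)^{\beta-1}u\,dx\,du = \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\left(\int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx\right)\int_0^\infty S(u)u^{\alpha+\beta-1}\,du$$
$$= \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\left(\int_0^1 x^{\alpha-1}(1 - x)^{\beta-1}\,dx\right)\Gamma(\alpha + \beta)(-A)^{-(\alpha+\beta)} = (-A)^{-(\alpha+\beta)}$$
This proves the first part of the theorem.
Consider 21.21. Since $A$ is a closed operator, and approximating the integral
with an appropriate sequence of Riemann sums, $(-A)$ can be taken inside the
integral and so
$$(-A)\frac{1}{\Gamma(1)}\int_0^\infty t^{1-1}S(t)\,dt = \int_0^\infty (-A)S(t)\,dt = \int_0^\infty -\frac{d}{dt}(S(t))\,dt = S(0) = I.$$
Next let $x \in D(A)$. Then
$$\frac{1}{\Gamma(1)}\int_0^\infty t^{1-1}S(t)\,dt\,(-A)x = \int_0^\infty S(t)(-A)x\,dt = \int_0^\infty (-A)S(t)x\,dt = \int_0^\infty -\frac{d}{dt}(S(t))x\,dt = Ix$$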
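This shows that the integral in which $\alpha = 1$ deserves to be called $(-A)^{-1}$ so the
definition is not bad notation. Also, by assumption, $A^{-1}$ is one to one. Thus
$$(-A)^{-1}(-A)^{-1}x = 0$$
implies
$$(-A)^{-1}x = 0$$
hence $x = 0$ so that $(-A)^{-2}$ is also one to one. Similarly, $(-A)^{-m}$ is one to one for
all positive integers $m$.
From what was just shown, if $(-A)^{-\alpha}x = 0$ for $\alpha \in (0, 1)$, then
$$(-A)^{-1}x = (-A)^{-(1-\alpha)}(-A)^{-\alpha}x = 0$$
and so $x = 0$. This shows $(-A)^{-\alpha}$ is one to one for all $\alpha \in [0, 1]$ if $(-A)^0$ is defined as
$(-A)^0 \equiv I$.
What about $\alpha > 1$? For such $\alpha$, it is of the form $m + \beta$ where $\beta \in [0, 1)$ and $m$
is a positive integer. Therefore, if
$$(-A)^{-(m+\beta)}x = 0$$
then
$$(-A)^{-\beta}\left((-A)^{-m}\right)x = 0$$
and so from what was just shown,
$$\left((-A)^{-m}\right)x = 0$$
and now this implies $x = 0$ so that $(-A)^{-\alpha}$ is one to one for all $\alpha \ge 0$.
Consider 21.22. It was shown above that
$$(-A)^{-\alpha}(-A)^{-\beta} = (-A)^{-(\alpha+\beta)}$$
Let $x = (-A)^{-(\alpha+\beta)}y$. Then
$$x = (-A)^{-\alpha}(-A)^{-\beta}y \in (-A)^{-\alpha}\left((-A)^{-\beta}(H)\right) \subseteq (-A)^{-\alpha}(H).$$
This proves 21.22. If $\alpha < \beta$, $(-A)^{-\beta}(H) \subseteq (-A)^{-\alpha}(H)$.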
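Now consider the problem of writing $(-A)^{-\alpha}$ for $\alpha \in (0, 1)$ in terms of $A$, not
mentioning $S(t)$. By Proposition 13.28,
$$(\lambda I - A)^{-1}x = \int_0^\infty e^{-\lambda t}S(t)x\,dt$$
Then
$$\int_0^\infty \lambda^{-\alpha}(\lambda I - A)^{-1}\,d\lambda = \int_0^\infty \lambda^{-\alpha}\int_0^\infty e^{-\lambda t}S(t)\,dt\,d\lambda = \int_0^\infty S(t)\int_0^\infty \lambda^{-\alpha}e^{-\lambda t}\,d\lambda\,dt = \int_0^\infty S(t)\int_0^\infty \lambda^{\beta-1}e^{-\lambda t}\,d\lambda\,dt$$
where $\beta \equiv 1 - \alpha$. Then using Lemma 21.46, this equals
$$\int_0^\infty S(t)\int_0^\infty \frac{\mu^{\beta-1}}{t^{\beta-1}}e^{-\mu}\frac{1}{t}\,d\mu\,dt = \int_0^\infty t^{-\beta}S(t)\int_0^\infty \mu^{\beta-1}e^{-\mu}\,d\mu\,dt$$
$$= \Gamma(1 - \alpha)\int_0^\infty t^{\alpha-1}S(t)\,dt = \Gamma(\alpha)\Gamma(1 - \alpha)(-A)^{-\alpha} = \left(\int_0^1 x^{\alpha-1}(1 - x)^{-\alpha}\,dx\right)(-A)^{-\alpha} = \frac{\pi}{\sin(\pi\alpha)}(-A)^{-\alpha}$$
and so this gives the formula
$$(-A)^{-\alpha} = \frac{\sin(\pi\alpha)}{\pi}\int_0^\infty \lambda^{-\alpha}(\lambda I - A)^{-1}\,d\lambda.$$
This proves 21.23.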
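The two descriptions of $(-A)^{-\alpha}$ can be compared in the scalar case. The sketch below assumes $A$ is multiplication by $-2$, so that $S(t) = e^{-2t}$, $(\lambda I - A)^{-1} = 1/(\lambda + 2)$, and the expected value is $2^{-\alpha}$. The exponential substitutions, the trapezoid rule, and the function names are choices made for this illustration.

```python
import math

def trapezoid_on_line(g, half_width=150.0, step=0.01):
    """Trapezoid rule for the integral of g(s) over [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * g(-half_width + k * step)
    return total * step

def power_from_semigroup(alpha, c=2.0):
    """(1/Gamma(alpha)) * integral of t^(alpha-1) e^(-c t) dt, using t = e^s."""
    g = lambda s: math.exp(alpha * s) * math.exp(-c * math.exp(s))
    return trapezoid_on_line(g) / math.gamma(alpha)

def power_from_resolvent(alpha, c=2.0):
    """(sin(pi alpha)/pi) * integral of lam^(-alpha)/(lam + c) dlam, using lam = e^s."""
    g = lambda s: math.exp((1.0 - alpha) * s) / (math.exp(s) + c)
    return math.sin(math.pi * alpha) / math.pi * trapezoid_on_line(g)

alpha = 0.4
print(power_from_semigroup(alpha), power_from_resolvent(alpha), 2.0 ** (-alpha))
```

All three numbers should agree, which illustrates both Definition 21.47 and formula 21.23 for this one-dimensional operator.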
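Definition 21.49 For $\alpha \ge 0$, define $(-A)^{\alpha}$ on $D((-A)^{\alpha}) \equiv (-A)^{-\alpha}(H)$ by
$$(-A)^{\alpha} \equiv \left((-A)^{-\alpha}\right)^{-1}$$
Note that if $\alpha, \beta > 0$, then if $x \in D\left((-A)^{\alpha+\beta}\right)$,
$$(-A)^{\alpha+\beta}x = \left((-A)^{-(\alpha+\beta)}\right)^{-1}x = \left((-A)^{-\beta}(-A)^{-\alpha}\right)^{-1}x = (-A)^{\alpha}(-A)^{\beta}x. \quad (21.24)$$
Next let $\beta > \alpha > 0$ and let $x \in D\left((-A)^{\beta}\right)$. Then from what was just shown,
$$(-A)^{\alpha}(-A)^{\beta-\alpha}x = (-A)^{\beta}x$$
and so
$$(-A)^{\beta-\alpha}x = (-A)^{-\alpha}(-A)^{\beta}x$$
If $x \in D\left((-A)^{\beta}\right)$, does it follow that $(-A)^{-\alpha}x \in D\left((-A)^{\beta}\right)$? Note $x = (-A)^{-\beta}y$
and so
$$(-A)^{-\alpha}x = (-A)^{-\alpha}(-A)^{-\beta}y = (-A)^{-(\alpha+\beta)}y \in D\left((-A)^{\alpha+\beta}\right).$$
$$(-A)^{\beta-\alpha}x = (-A)^{\beta-\alpha}(-A)^{\alpha}\left((-A)^{-\alpha}x\right) = (-A)^{\beta}(-A)^{-\alpha}x.$$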
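Theorem 21.50 The definition of $(-A)^{\alpha}$ is well defined and $(-A)^{\alpha}$ is densely defined and closed. Also for any $\alpha > 0$,
\[ \|(-A)^{\alpha}S(t)\| \le C_{\alpha}\frac{1}{t^{\alpha}}e^{-\delta t} \quad (21.25) \]
where $-\delta > a$. Furthermore, $C_{\alpha}$ is bounded as $\alpha \to 0+$ and is bounded on compact intervals of $(0,\infty)$. Also for $\alpha \in (0,1)$ and $x \in D((-A)^{\alpha})$,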
[[(S (t) I) x[[
C
1
[[(A)
x[[ (21.26)
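There exists a constant $C$ independent of $\alpha \in [0,1)$ such that for $x \in D(A)$ and $\varepsilon > 0$,
\[ \|(-A)^{\alpha}x\| \le \varepsilon\|(-A)x\| + C\varepsilon^{-\alpha/(1-\alpha)}\|x\| \quad (21.27) \]
There exists a constant $C'$ independent of $\alpha \in [0,1]$ such that for $x \in D(A)$,
\[ \|(-A)^{\alpha}x\| \le C'\|(-A)x\|^{\alpha}\|x\|^{1-\alpha} \quad (21.28) \]
The formula 21.28 is called an interpolation inequality.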
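To see what 21.28 says in the simplest possible setting, here is a small sketch that is not from the original notes. It assumes a finite dimensional model in which $-A$ is a diagonal matrix with positive diagonal entries `d`, so that $(-A)^{\alpha}$ is just the diagonal matrix with entries `d[i] ** alpha`. In this model Holder's inequality gives 21.28 with constant 1. All function names are hypothetical.

```python
import math
import random

def norm(v):
    """Euclidean norm of a list of real numbers."""
    return math.sqrt(sum(c * c for c in v))

def frac_power_apply(d, alpha, x):
    """Apply diag(d)^alpha to x, where d holds the positive diagonal of -A."""
    return [di ** alpha * xi for di, xi in zip(d, x)]

def interpolation_ratio(d, alpha, x):
    """Return ||(-A)^alpha x|| / (||(-A) x||^alpha * ||x||^(1 - alpha))."""
    lhs = norm(frac_power_apply(d, alpha, x))
    rhs = norm(frac_power_apply(d, 1.0, x)) ** alpha * norm(x) ** (1.0 - alpha)
    return lhs / rhs
```

For any nonzero `x` and `alpha` in $[0,1]$ the ratio should not exceed 1 in this diagonal model; the general theorem only asserts that it is bounded by some constant $C'$.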
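Proof: It is obvious $(-A)^{\alpha}$ is densely defined because its domain is at least as large as $D(A)$ which was assumed to be dense. It is a closed operator because if $x_n \in D((-A)^{\alpha})$ and
\[ x_n \to x, \quad (-A)^{\alpha}x_n \to y, \]
then
\[ (-A)^{-\alpha}x_n \to (-A)^{-\alpha}x, \quad x_n = (-A)^{-\alpha}(-A)^{\alpha}x_n \to (-A)^{-\alpha}y \]
and so
\[ (-A)^{-\alpha}y = x \]
showing $x \in D((-A)^{\alpha})$ and $y = (-A)^{\alpha}x$. Thus $(-A)^{\alpha}$ is closed and densely defined.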
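Let $-\delta > a$ where the sector for $A$ was $S_{a,\phi}$, $-a > 0$. Then recall from Corollary 21.39 there is a constant $N$ such that
\[ \|(-A)S(t)\| \le \frac{N}{t}e^{-\delta t}. \]
What about $\|(-A)^{\alpha}S(t)\|$? First note that for $\alpha \in [0,1)$ this at least makes sense because $S(t)$ maps into $D(A)$. For any $\alpha > 0$,
\[ S(t)(-A)^{-\alpha} = (-A)^{-\alpha}S(t) \]
follows from the definition of $(-A)^{-\alpha}$. Therefore,
\[ (-A)^{\alpha}S(t)(-A)^{-\alpha} = S(t). \quad (21.29) \]
Note this implies that on $D((-A)^{\alpha})$,
\[ (-A)^{\alpha}S(t) = S(t)(-A)^{\alpha}. \]
Also
\[ (-A)^{-1}S(t) = S(t)(-A)^{-1} = S(t)(-A)^{-\alpha}(-A)^{-(1-\alpha)} \]
and so
\[ S(t) = (-A)S(t)(-A)^{-\alpha}(-A)^{-(1-\alpha)}, \]
\[ (-A)^{\alpha}S(t) = (-A)(-A)^{\alpha}S(t)(-A)^{-\alpha}(-A)^{-(1-\alpha)} = (-A)S(t)(-A)^{-(1-\alpha)}. \quad (21.30) \]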
Then with this formula,
[[(A)
S (t)[[ =
(A) S (t) (A)

(1)
1
(1 )
_

0
s
1
(A) S (t +s) ds
N
(1 )
_

0
s
1
(t +s)
e
(s+t)
ds
=
N
(1 )
_

t
(u t)
1
u
e
u
ds
N
(1 )
_

t
_
1
t
u
_
1
1
u
e
u
ds
N
(1 )
1
t
_

t
e
u
ds =
N
(1 )
1
t
e
t
1
t
e
t
.
This establishes the formula when $\alpha \in [0,1)$. Next suppose $\alpha = m$, a positive integer.
[[A
m
S (t)[[ =
A
m
S
_
t
m
_
m
_
AS
_
t
m
__
m
N
t
m
m
m
.
This is why the above inequality holds.
If , > 0,
A
+
S (t)
A
+
S
_
t
2
_
S
_
t
2
_
S
_
t
2
_
A
S
_
t
2
_
e
2t
=
C
t
+
e
t
Suppose now that > 0. Then
= m+
where [0, 1). Then from what was just shown,
A
m+
S (t)
C
t
m+
e
t
.
Next consider 21.26. First note that whenever > 0,
(A)
S (s) = S (s) (A)
and so on D((A)
) ,
S (s) = (A)
S (s) (A)
, S (s) (A)
= (A)
S (s)
Now for x D((A)
) ,
[[(S (t) I) x[[ =
_
t
0
(A) S (s) xds
_
t
0
(A)
1
(A)
S (s) xds
_
t
0
(A)
1
S (s) (A)
xds
_
t
0
(A)
1
S (s)
ds [[(A)
x[[
_
t
0
C
1
1
s
1
e
s
ds [[(A)
x[[
C
1
[[(A)
x[[
and this shows 21.26.
Next consider 21.27. Let x H and (0, 1) . Then
(A)
=
1
()
_

0
t
1
S (t) xdt

=
1
()
_

0
t
1
S (t) xdt +
_

t
1
S (t) xdt
1
()
_

0
t
1
[[S (t) x[[ dt +
1
()
t
1
S (t) xdt
C
()
[[x[[ +
1
()
t
1
S (t) xdt
C
()
[[x[[ +
1
()
1
S () A
1
x + (1 )
_

t
2
S (t) A
1
xdt
1
()
_
C
[[x[[ +
1
A
1
x
+ (1 )
A
1
x
t
2
dt
_
=
1
()
_
C
[[x[[ + 2
1
A
1
x
_
.
Now let = C
so = C
1/
1/
and
1
= C
1
(1)/
. Thus for all x H,
(A)

1
()
_
[[x[[ + 2C
1
(1)/
A
1
x
_
.
Let =

()
=

(1+)
. Then the above is of the form
(A)
[[x[[ + 2
C
1
()
((1 +))
(1)/
A
1
x
[[x[[ + 2C
1
((1 +))
(1)/
A
1
x
because is decreasing on (0, 1) . I need to verify that for (0, 1) ,

(1 +)
(1)/
is bounded. It is continuous on (0, 1] and so if I can show lim
0+
(1 +)
(1)/
exists, then it will follow the function is bounded. It suces to show
lim
0+
1
ln(1 +) = lim
0+
ln(1 +)
exists. Consider this. By LHospitals rule and dominated convergence theorem,

this is
lim
0+
_
0
ln(t) t
e
t
dt
(1 +)
= lim
0+
_

0
ln(t) t
e
t
dt
= lim
0+
_

0
ln(t) e
t
dt.
Thus the function is bounded independent of (0, 1) . This shows there is a
constant C which is independent of (0, 1) such that for any x H,
(A)
[[x[[ +C
(1)/
A
1
x
. (21.31)
Now let y D(A) = D((A)) and let x = (A) y. Then the above becomes
(A)
(A) y
[[(A) y[[ +C
(1)/
[[y[[
I claim that
(A)
(A) y = (A)
1
y.
The reason for this is as follows.
(A)
(A)
1
y = (A) y
and so the desired result follows from multiplying on the left by (A)
. Hence
(A)
1
y
[[(A) y[[ +C
(1)/
[[y[[
Now let 1 = and obtain
[[(A)
y[[ [[(A) y[[ +C

/(1)
[[y[[
This proves 21.27.
Finally choose to minimize the right side of the above expression. Thus let
=
_
[[y[[ C
[[(A) y[[ (1 )
_
1
Then the above expression becomes
[[(A)
y[[ [[(A) y[[

_
[[y[[ C
[[(A) y[[ (1 )
_
1
+C
_
_
[[y[[ C
[[(A) y[[ (1 )
_
1
_
/(1)
[[y[[
= [[(A) y[[
[[y[[
1
_
C
(1 )
_
1
+[[(A) y[[
[[y[[
1
_
C
(1 )
_
=
_
_
C
(1 )
_
1
+
_
C
(1 )
_
_
[[(A) y[[
[[y[[
1
C
[[(A) y[[
[[y[[
1
where C
does not depend on (0, 1) . To see such a constant exists, note

lim
1
_
C
(1 )
_
1
= 1
and
lim
1
_
C
(1 )
_
= 0
while
lim
0
_
C
(1 )
_
1
= 0, lim
0
_
C
(1 )
_
= 1
Of course C
depends on C but as shown above, this did not depend on (0, 1) .

This proves 21.28.
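The following corollary follows from the proof of the above theorem.
Corollary 21.51 Let $\alpha \in (0,1)$. Then for all $\varepsilon > 0$, there exists a constant $C(\alpha,\varepsilon)$ such that
\[ \|(-A)^{-\alpha}x\| \le \varepsilon\|x\| + C(\alpha,\varepsilon)\|(-A)^{-1}x\|. \]
Also if $A^{-1}$ is compact, then so is $(-A)^{-\alpha}$ for all $\alpha \in (0,1)$.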
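Proof: The first part is done in the above theorem. Let $S$ be a bounded set and let $\eta > 0$. Then let $\varepsilon > 0$ be small enough that for all $x \in S$, $\varepsilon\|x\| < \eta/4$. Let $\left\{(-A)^{-1}x_n\right\}$ be an $\eta/(2 + 2C(\alpha,\varepsilon))$ net for $(-A)^{-1}(S)$. Then if $(-A)^{-\alpha}x \in (-A)^{-\alpha}S$, there exists $x_n$ such that
\[ \|(-A)^{-1}x_n - (-A)^{-1}x\| < \frac{\eta}{2 + 2C(\alpha,\varepsilon)}. \]
Then
\[ \|(-A)^{-\alpha}x_n - (-A)^{-\alpha}x\| \le \varepsilon\|x_n - x\| + C(\alpha,\varepsilon)\|(-A)^{-1}x_n - (-A)^{-1}x\| < \frac{\eta}{2} + \frac{\eta}{2} = \eta \]
showing $(-A)^{-\alpha}(S)$ has an $\eta$ net. Thus $(-A)^{-\alpha}$ is compact. This proves the corollary.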
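The next proposition gives a general interpolation inequality.
Proposition 21.52 Let $0 < \alpha < \beta$ and let
\[ \gamma = \theta\alpha + (1-\theta)\beta, \quad \theta \in (0,1). \]
Then there exists a constant $C$ such that for all $x \in D\left((-A)^{\beta}\right)$,
\[ \|(-A)^{\gamma}x\| \le C\|(-A)^{\alpha}x\|^{\theta}\|(-A)^{\beta}x\|^{1-\theta}. \]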
Proof: This is an exercise in using 21.25. Letting x D
_
(A)
_
,
(A)
x = (A)
(A)
(A)
x
Therefore, letting C denote a generic constant, it follows since (A)
is closed,
() [[(A)
x[[ =
_

0
t
1
(A)
S (t) (A)
xdt
_

0
t
1
(A)
(A)
S (t) (A)
dt
+
_

t
1
(A)
(A)
S (t) (A)
dt
C
_

0
t
1
t
dt
(A)
+C
_

t
1
t
dt [[(A)
x[[
= C
_
(A)
+

()

[[(A)
x[[
_
and now writing in what is in terms of yields
() [[(A)
x[[ C
_
1

_
_
_
_
1
(1 )
(A)
+
_
[[(A)
x[[
_
Letting =
, it follows
() [[(A)
x[[ C
_
1

_
_

1
(1 )
(A)
[[(A)
x[[
_
then let
=
[[(A)
x[[
(A)
which is obtained from minimizing the expression on the right in the above. then
placing this in the inequality yields
() [[(A)
x[[
C
_
1

_
_
_
_
_
_
_
||(A)
x||
[[(A)
x[[
_
1
(1 )
(A)
+
_
||(A)
x||
[[(A)
x[[
_
[[(A)
x[[
_
_
_
_
_
= C
_
1

__
1
(1 )
+
1
_
[[(A)
x[[
1
(A)
and this proves the proposition.

Note that the constant is not bounded as 1.
Here is another interesting result about compactness.
Proposition 21.53 Let $A$ be sectorial for $S_{a,\phi}$ where $a < 0$. Then the following are equivalent.
1. $(-A)^{-\alpha}$ is compact for all $\alpha > 0$.
2. $S(t)$ is compact for each $t > 0$.
Proof: First suppose (A)
is compact for all > 0. Then

() (A)
=
_
t
0
s
1
S (s) ds +
_

t
s
1
S (s) ds
=
t
S (t)
_
t
0
s
AS (s) ds +s
1
S (s) A
1
[
t
( 1)
_

t
s
2
S (s) A
1
ds
Now

AS (s)
C
s
1
and so the second integral satises
_
t
0
s
AS (s) ds
C
t
2
() (A)
= O
_
t
2
_
+
t
S (t)
t
1
A
1
S (t) ( 1)
_

t
s
2
S (s) dsA
1
It follows that for t > 0, and > 0 given,
S (t) =
_
t
t
1
_
1 _
() (A)
+( 1)
_

t
s
2
S (s) dsA
1
+O
_
t
2
__
=
_
t
t
1
_
1 _
() (A)
+( 1)
_

t
s
2
S (s) dsA
1
_
+O
_
1
_
= N
+O
_
1
_
.
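where $N_{\alpha}$ is a compact operator. Now let $B$ be a bounded set in $H$, $\|x\| \le M$ for all $x \in B$, and let $\varepsilon > 0$ be given. Then choose $\alpha$ large enough that
\[ \left\|O\left(\frac{1}{\alpha}\right)\right\| < \frac{\varepsilon}{4+4M}. \]
Then there exists an $\varepsilon/2$ net, $\{N_{\alpha}x_n\}_{n=1}^{N}$, for $N_{\alpha}(B)$. Then consider $\{S(t)x_n\}_{n=1}^{N}$. For $x \in B$, there exists $x_n$ such that $\|N_{\alpha}x_n - N_{\alpha}x\| < \varepsilon/2$. Then
\[ \begin{aligned}
\|S(t)x - S(t)x_n\| &\le \|S(t)x - N_{\alpha}x\| + \|N_{\alpha}x - N_{\alpha}x_n\| + \|N_{\alpha}x_n - S(t)x_n\| \\
&\le \frac{\varepsilon}{4+4M}M + \frac{\varepsilon}{2} + \frac{\varepsilon}{4+4M}M < \varepsilon.
\end{aligned} \]
Thus $S(t)(B)$ has an $\varepsilon$ net for every $\varepsilon > 0$ and so $S(t)$ is compact.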
Next suppose $S(t)$ is compact for all $t > 0$. Then
\[ (-A)^{-\alpha} = \frac{1}{\Gamma(\alpha)}\int_0^\infty t^{\alpha-1}S(t)\,dt \]
and the integral is a limit in norm of Riemann sums of the form
\[ \sum_{k=1}^{m} t_k^{\alpha-1}S(t_k)\,\Delta t_k \]
and each of these operators is compact. Since $(-A)^{-\alpha}$ is the limit in norm of compact operators, it must also be compact. This proves the proposition.
Here are some observations which are listed in the book by Henry [24]. Like the above proposition, these are exercises in this book.
Observation 21.54 For each $x \in H$, $t \to tAS(t)x$ is continuous and $\lim_{t\to 0+} tAS(t)x = 0$.
The reason for this is that if $x \in D(A)$, then
\[ |tAS(t)x| = |tS(t)Ax| \to 0 \]
as $t \to 0$. Now suppose $y \in H$ is arbitrary. Then letting $x \in D(A)$,
\[ |tAS(t)y| \le |tAS(t)(y-x)| + |tAS(t)x| \le \varepsilon + |tAS(t)x| \]
provided $x$ is close enough to $y$. The last term converges to 0 and so
\[ \limsup_{t\to 0+}|tAS(t)y| \le \varepsilon \]
where $\varepsilon > 0$ is arbitrary. Thus
\[ \lim_{t\to 0+}|tAS(t)y| = 0. \]
Why is $t \to tAS(t)x$ continuous on $[0,T]$? This is true if $x \in D(A)$ because $t \to tS(t)Ax$ is continuous. If $y \in H$ is arbitrary, let $x_n$ converge to $y$ in $H$ where $x_n \in D(A)$. Then
\[ |tAS(t)y - tAS(t)x_n| \le C|y - x_n| \]
and so the convergence is uniform. Thus $t \to tAS(t)y$ is continuous because it is the uniform limit of a sequence of continuous functions.
Observation 21.55 If $x \in H$ and $A$ is sectorial for $S_{a,\phi}$, $a < 0$, then for any $\alpha \in [0,1]$,
\[ \lim_{t\to 0+} t^{\alpha}\|(-A)^{\alpha}S(t)x\| = 0. \]
This follows as above because you can verify this is true for $x \in D(A)$ and then use the fact shown above that
\[ t^{\alpha}\|(-A)^{\alpha}S(t)\| \le C \]
to extend it to $x$ arbitrary.
21.4.5 A Scale Of Banach Spaces
Next I will present an important and interesting theorem which can be used to
prove equivalence of certain norms.
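Theorem 21.56 Let $A, B$ be sectorial for $S_{a,\phi}$ where $a < 0$ and suppose $D(A) = D(B)$. Also suppose
\[ (A-B)(-A)^{-\alpha}, \quad (A-B)(-B)^{-\alpha} \]
are both bounded on $D(A)$ for some $\alpha \in (0,1)$. Then for all $\beta \in [0,1]$,
\[ (-A)^{\beta}(-B)^{-\beta}, \quad (-B)^{\beta}(-A)^{-\beta} \]
are both bounded on $D(A) = D(B)$. Also $D\left((-A)^{\beta}\right) = D\left((-B)^{\beta}\right)$.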
Proof: First of all it is a good idea to verify $(A-B)(-A)^{-\alpha}$, $(A-B)(-B)^{-\alpha}$ make sense on $D(A)$. If $x \in D(A)$, then why is $(-A)^{-\alpha}x \in D(A)$? Here is why. Since $x \in D(A)$,
\[ x = (-A)^{-1}y \]
for some $y \in H$. Then
\[ (-A)^{-\alpha}x = (-A)^{-\alpha}(-A)^{-1}y = (-A)^{-1}(-A)^{-\alpha}y \in D(A). \]
The case of $(A-B)(-B)^{-\alpha}$ is similar.
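Next for $\beta \in (0,1)$ and $\lambda > 0$, use 21.28 to write
\[ \begin{aligned}
\|(-A)^{\beta}(\lambda I - A)^{-1}x\|
&\le C\|(-A)(\lambda I - A)^{-1}x\|^{\beta}\|(\lambda I - A)^{-1}x\|^{1-\beta} \\
&\le C\|(-A)(\lambda I - A)^{-1}\|^{\beta}\|(\lambda I - A)^{-1}\|^{1-\beta}\|x\| \\
&\le C\|I - \lambda(\lambda I - A)^{-1}\|^{\beta}\frac{M}{(\lambda+\delta)^{1-\beta}}\|x\| \\
&\le C\left(1 + \frac{\lambda}{(\lambda+\delta)}\right)^{\beta}\frac{M}{(\lambda+\delta)^{1-\beta}}\|x\| \\
&\le \frac{C}{(\lambda+\delta)^{1-\beta}}\|x\| \quad (21.32)
\end{aligned} \]
where $a < -\delta < 0$ and where $C$ denotes a generic constant. Similarly, for all $\beta \in (0,1)$,
\[ \|(-B)^{\beta}(\lambda I - B)^{-1}x\| \le \frac{C}{(\lambda+\delta)^{1-\beta}}\|x\| \quad (21.33) \]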
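Now from Theorem 21.48 and letting $\beta \in (0,1)$,
\[ \begin{aligned}
(-B)^{-\beta} - (-A)^{-\beta}
&= \frac{\sin(\pi\beta)}{\pi}\int_0^\infty \lambda^{-\beta}\left((\lambda I - B)^{-1} - (\lambda I - A)^{-1}\right)d\lambda \\
&= \frac{\sin(\pi\beta)}{\pi}\int_0^\infty \lambda^{-\beta}(\lambda I - B)^{-1}(B - A)(\lambda I - A)^{-1}\,d\lambda. \quad (21.34)
\end{aligned} \]
Here the second line uses $(\lambda I - B)^{-1} - (\lambda I - A)^{-1} = (\lambda I - B)^{-1}\left((\lambda I - A) - (\lambda I - B)\right)(\lambda I - A)^{-1}$. Only the norm of the middle factor matters below, so its sign plays no role in the estimates.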
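Therefore, letting $x \in D(A)$ and letting $C$ denote a generic constant which can be changed from line to line and using 21.32 and 21.33,
\[ \|x - (-B)^{\beta}(-A)^{-\beta}x\| \le C\int_0^\infty \frac{1}{\lambda^{\beta}}\|(-B)^{\beta}(\lambda I - B)^{-1}(A-B)(\lambda I - A)^{-1}x\|\,d\lambda. \]
The reason $(-B)^{\beta}$ goes inside the integral is that it is a closed operator. Then the above is
\[ \begin{aligned}
&\le C\int_0^\infty \frac{1}{\lambda^{\beta}(\lambda+\delta)^{1-\beta}}\|(A-B)(-A)^{-\alpha}(-A)^{\alpha}(\lambda I - A)^{-1}x\|\,d\lambda \\
&\le C\int_0^\infty \frac{1}{\lambda^{\beta}(\lambda+\delta)^{1-\beta}}\|(-A)^{\alpha}(\lambda I - A)^{-1}x\|\,d\lambda \\
&\le C\int_0^\infty \frac{1}{\lambda^{\beta}(\lambda+\delta)^{1-\beta}}\frac{1}{(\lambda+\delta)^{1-\alpha}}\,d\lambda\,\|x\| = C\|x\|.
\end{aligned} \]
It follows $(-B)^{\beta}(-A)^{-\beta}$ is bounded on $D(A)$.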
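Next reverse $A$ and $B$ in 21.34. This yields
\[ (-A)^{-\beta} - (-B)^{-\beta} = \frac{\sin(\pi\beta)}{\pi}\int_0^\infty \lambda^{-\beta}(\lambda I - A)^{-1}(A - B)(\lambda I - B)^{-1}\,d\lambda. \]
Letting $x \in D(A)$,
\[ \begin{aligned}
\|x - (-A)^{\beta}(-B)^{-\beta}x\|
&\le C\int_0^\infty \lambda^{-\beta}\|(-A)^{\beta}(\lambda I - A)^{-1}(B-A)(\lambda I - B)^{-1}x\|\,d\lambda \\
&\le C\int_0^\infty \frac{1}{\lambda^{\beta}(\lambda+\delta)^{1-\beta}}\|(B-A)(-B)^{-\alpha}(-B)^{\alpha}(\lambda I - B)^{-1}x\|\,d\lambda \quad (21.35) \\
&\le C\int_0^\infty \frac{1}{\lambda^{\beta}(\lambda+\delta)^{1-\beta}(\lambda+\delta)^{1-\alpha}}\,d\lambda\,\|x\| = C\|x\| \quad (21.36)
\end{aligned} \]
This shows $(-A)^{\beta}(-B)^{-\beta}$ is bounded on $D(A) = D(B)$. Note the assertion these are bounded refers to the norm on $H$.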
It remains to verify $D\left((-A)^{\beta}\right) = D\left((-B)^{\beta}\right)$. Since $D(A)$ is dense in $H$ there exists a unique $L(A,B) \in \mathcal{L}(H,H)$ such that $L(A,B) = (-A)^{\beta}(-B)^{-\beta}$ on $D(A)$. Let $L(B,A)$ be defined similarly as a continuous linear map which equals $(-B)^{\beta}(-A)^{-\beta}$ on $D(A)$. Then
\[ (-A)^{-\beta}L(A,B) = (-B)^{-\beta}, \]
\[ (-B)^{-\beta}L(B,A) = (-A)^{-\beta}. \]
The first of these equations shows $D\left((-B)^{\beta}\right) \subseteq D\left((-A)^{\beta}\right)$ and the second turns the inclusion around. Thus they are equal as claimed.
Next consider the case where $\beta = 1$. In this case
\[ (A-B)B^{-\alpha} \]
is bounded on $D(A)$ and so
\[ (A-B)B^{-\alpha}B^{-1+\alpha} \]
is also bounded on $D(A)$. But this equals
\[ (A-B)B^{-1}. \]
Thus $AB^{-1}$ is bounded on $D(A)$. Similarly you can show $(B-A)A^{-1}$ is bounded which implies $BA^{-1}$ is bounded on $D(A)$. This proves the theorem.
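Definition 21.57 Let $A$ be sectorial for the sector $S_{a,\phi}$. Let $b > a$ so that $A - bI$ is sectorial for $S_{-\delta,\phi}$ where $\delta = b - a$. Then for each $\alpha \in [0,1]$, define a norm on $D((bI - A)^{\alpha}) \equiv H_{\alpha}$ by
\[ \|x\|_{\alpha} \equiv \|(bI - A)^{\alpha}x\|. \]
The $\{H_{\alpha}\}_{\alpha\in[0,1]}$ is called a scale of Banach spaces.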
Proposition 21.58 The $H_{\alpha}$ above are Banach spaces and they decrease in $\alpha$. Furthermore, if $b_i > a$ for $i = 1,2$ then the two norms associated with the $b_i$ are equivalent.
Proof: That the $H_{\alpha}$ are decreasing was shown above in Theorem 21.48. They are Banach spaces because $(bI - A)^{\alpha}$ is a closed mapping which is also one to one. It only remains to verify the claim about the equivalence of the norms. Let $b_2 > b_1 > a$. Then if $\alpha \in (0,1)$,
\[ \left((b_1I - A) - (b_2I - A)\right)(b_2I - A)^{-\alpha} = (b_1 - b_2)(b_2I - A)^{-\alpha} \in \mathcal{L}(H,H) \]
and so by Theorem 21.56, for each $\beta \in [0,1]$,
\[ D\left((b_1I - A)^{\beta}\right) = D\left((b_2I - A)^{\beta}\right) \]
so the spaces $H_{\beta}$ are the same for either choice of $b > a$. Also from this theorem,
\[ (b_1I - A)^{\beta}(b_2I - A)^{-\beta}, \quad (b_2I - A)^{\beta}(b_1I - A)^{-\beta} \]
are both bounded on $D(A)$. Therefore, for $x \in H_{\beta}$,
\[ \|(b_1I - A)^{\beta}x\| = \|(b_1I - A)^{\beta}(b_2I - A)^{-\beta}(b_2I - A)^{\beta}x\| \le C\|(b_2I - A)^{\beta}x\|. \]
Similarly using the boundedness of $(b_2I - A)^{\beta}(b_1I - A)^{-\beta}$, it follows
\[ \|(b_2I - A)^{\beta}x\| \le C\|(b_1I - A)^{\beta}x\|. \]
This shows the two norms are equivalent and proves the proposition.
21.5 Exercises
1. Example 21.19 found the integral of a rational function of a certain sort. The technique used in this example typically works for rational functions of the form $\frac{f(x)}{g(x)}$ where $\deg(g(x)) \ge \deg f(x) + 2$ provided the rational function has no poles on the real axis. State and prove a theorem based on these observations.
2. Fill in the missing details of Example 21.25 about $I_N \to 0$. Note how important it was that the contour was chosen just right for this to happen. Also verify the claims about the residues.
3. Suppose $f$ has a pole of order $m$ at $z = a$. Define $g(z)$ by
\[ g(z) = (z-a)^m f(z). \]
Show
\[ \operatorname{Res}(f,a) = \frac{1}{(m-1)!}g^{(m-1)}(a). \]
Hint: Use the Laurent series.
4. Give a proof of Theorem 21.6. Hint: Let $p$ be a pole. Show that near $p$, a pole of order $m$,
\[ \frac{f'(z)}{f(z)} = \frac{-m + \sum_{k=1}^{\infty} b_k(z-p)^k}{(z-p) + \sum_{k=2}^{\infty} c_k(z-p)^k}. \]
Show that $\operatorname{Res}(f'/f, p) = -m$. Carry out a similar procedure for the zeros.
5. Use Rouche's theorem to prove the fundamental theorem of algebra which says that if $p(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1z + a_0$, then $p$ has $n$ zeros in $\mathbb{C}$. Hint: Let $q(z) = z^n$ and let $\gamma$ be a large circle, $\gamma(t) = re^{it}$ for $r$ sufficiently large.
6. Consider the two polynomials $z^5 + 3z^2 - 1$ and $z^5 + 3z^2$. Show that on $|z| = 1$, the conditions for Rouche's theorem hold. Now use Rouche's theorem to verify that $z^5 + 3z^2 - 1$ must have two zeros in $|z| < 1$.
7. Consider the polynomial $z^{11} + 7z^5 + 3z^2 - 17$. Use Rouche's theorem to find a bound on the zeros of this polynomial. In other words, find $r$ such that if $z$ is a zero of the polynomial, $|z| < r$. Try to make $r$ fairly small if possible.
8. Verify that $\int_0^{\infty} e^{-t^2}\,dt = \frac{\sqrt{\pi}}{2}$. Hint: Use polar coordinates.
9. Use the contour described in Example 21.19 to compute the exact values of the following improper integrals.
(a) $\int_{-\infty}^{\infty} \frac{x}{(x^2+4x+13)^2}\,dx$
(b) $\int_0^{\infty} \frac{x^2}{(x^2+a^2)^2}\,dx$
(c) $\int_{-\infty}^{\infty} \frac{dx}{(x^2+a^2)(x^2+b^2)}$, $a, b > 0$
10. Evaluate the following improper integrals.
(a) $\int_0^{\infty} \frac{\cos ax}{(x^2+b^2)^2}\,dx$
(b) $\int_0^{\infty} \frac{x\sin x}{(x^2+a^2)^2}\,dx$
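11. Find the Cauchy principal value of the integral
\[ \int_{-\infty}^{\infty} \frac{\sin x}{(x^2+1)(x-1)}\,dx \]
defined as
\[ \lim_{\varepsilon\to 0+}\left(\int_{-\infty}^{1-\varepsilon} \frac{\sin x}{(x^2+1)(x-1)}\,dx + \int_{1+\varepsilon}^{\infty} \frac{\sin x}{(x^2+1)(x-1)}\,dx\right). \]
12. Find a formula for the integral $\int_{-\infty}^{\infty} \frac{dx}{(1+x^2)^{n+1}}$ where $n$ is a nonnegative integer.
13. Find $\int_{-\infty}^{\infty} \frac{\sin^2 x}{x^2}\,dx$.
14. If $m < n$ for $m$ and $n$ integers, show
\[ \int_0^{\infty} \frac{x^{2m}}{1+x^{2n}}\,dx = \frac{\pi}{2n}\frac{1}{\sin\left(\frac{2m+1}{2n}\pi\right)}. \]
15. Find $\int_{-\infty}^{\infty} \frac{1}{(1+x^4)^2}\,dx$.
16. Show that $\int_0^{\infty} \frac{\ln(x)}{1+x^2}\,dx = 0$.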
17. Suppose $f$ has an isolated singularity at $\alpha$. Show the singularity is essential if and only if the principal part of the Laurent series of $f$ has infinitely many terms. That is, show $f(z) = \sum_{k=0}^{\infty} a_k(z-\alpha)^k + \sum_{k=1}^{\infty} \frac{b_k}{(z-\alpha)^k}$ where infinitely many of the $b_k$ are nonzero.
18. Suppose $\Omega$ is a bounded open set and $f_n$ is analytic on $\Omega$ and continuous on $\overline{\Omega}$. Suppose also that $f_n \to f$ uniformly on $\overline{\Omega}$ and that $f \ne 0$ on $\partial\Omega$. Show that for all $n$ large enough, $f_n$ and $f$ have the same number of zeros on $\Omega$ provided the zeros are counted according to multiplicity.
Complex Mappings
22.1 Conformal Maps
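If $\gamma(t) = x(t) + iy(t)$ is a $C^1$ curve having values in $U$, an open set of $\mathbb{C}$, and if $f : U \to \mathbb{C}$ is analytic, consider $f\circ\gamma$, another $C^1$ curve having values in $\mathbb{C}$. Also, $\gamma'(t)$ and $(f\circ\gamma)'(t)$ are complex numbers so these can be considered as vectors in $\mathbb{R}^2$ as follows. The complex number $x + iy$ corresponds to the vector $(x,y)$. Suppose that $\gamma$ and $\eta$ are two such $C^1$ curves having values in $U$ and that $\gamma(t_0) = \eta(s_0) = z$ and suppose that $f : U \to \mathbb{C}$ is analytic. What can be said about the angle between $(f\circ\gamma)'(t_0)$ and $(f\circ\eta)'(s_0)$? It turns out this angle is the same as the angle between $\gamma'(t_0)$ and $\eta'(s_0)$ assuming that $f'(z) \ne 0$. To see this, note $(x,y)\cdot(a,b) = \frac{1}{2}(z\overline{w} + \overline{z}w)$ where $z = x + iy$ and $w = a + ib$. Therefore, letting $\theta$ be the angle between the two vectors $(f\circ\gamma)'(t_0)$ and $(f\circ\eta)'(s_0)$, it follows from calculus that
\[ \begin{aligned}
\cos\theta &= \frac{(f\circ\gamma)'(t_0)\cdot(f\circ\eta)'(s_0)}{|(f\circ\eta)'(s_0)|\,|(f\circ\gamma)'(t_0)|} \\
&= \frac{1}{2}\,\frac{f'(\gamma(t_0))\gamma'(t_0)\overline{f'(\eta(s_0))\eta'(s_0)} + \overline{f'(\gamma(t_0))\gamma'(t_0)}\,f'(\eta(s_0))\eta'(s_0)}{|f'(\gamma(t_0))|\,|f'(\eta(s_0))|\,|\gamma'(t_0)|\,|\eta'(s_0)|} \\
&= \frac{1}{2}\,\frac{f'(z)\overline{f'(z)}\,\gamma'(t_0)\overline{\eta'(s_0)} + \overline{f'(z)}f'(z)\,\overline{\gamma'(t_0)}\eta'(s_0)}{|f'(z)|\,|f'(z)|\,|\gamma'(t_0)|\,|\eta'(s_0)|} \\
&= \frac{1}{2}\,\frac{\gamma'(t_0)\overline{\eta'(s_0)} + \eta'(s_0)\overline{\gamma'(t_0)}}{|\gamma'(t_0)|\,|\eta'(s_0)|}
\end{aligned} \]
which equals the cosine of the angle between the vectors $\gamma'(t_0)$ and $\eta'(s_0)$. Thus analytic mappings preserve angles at points where the derivative is nonzero. Such mappings are called isogonal.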
Actually, they also preserve orientations. If $z = x + iy$ and $w = a + ib$ are two complex numbers, then $(x,y,0)$ and $(a,b,0)$ are two vectors in $\mathbb{R}^3$. Recall that the cross product, $(x,y,0)\times(a,b,0)$, yields a vector normal to the two given vectors such that the triple, $(x,y,0)$, $(a,b,0)$, and $(x,y,0)\times(a,b,0)$ satisfies the right hand rule and has magnitude equal to the product of the sine of the included angle times the product of the two norms of the vectors. In this case, the cross product will produce a vector which is a multiple of $k$, the unit vector in the direction of the $z$ axis. In fact, you can verify by computing both sides that, letting $z = x + iy$ and $w = a + ib$,
\[ (x,y,0)\times(a,b,0) = \operatorname{Re}(zi\overline{w})\,k. \]
Therefore, in the above situation,
\[ \begin{aligned}
(f\circ\gamma)'(t_0)\times(f\circ\eta)'(s_0)
&= \operatorname{Re}\left(f'(\gamma(t_0))\gamma'(t_0)\,i\,\overline{f'(\eta(s_0))\eta'(s_0)}\right)k \\
&= |f'(z)|^2\operatorname{Re}\left(\gamma'(t_0)\,i\,\overline{\eta'(s_0)}\right)k
\end{aligned} \]
which shows that the orientation of $\gamma'(t_0)$, $\eta'(s_0)$ is the same as the orientation of $(f\circ\gamma)'(t_0)$, $(f\circ\eta)'(s_0)$. Mappings which preserve both orientation and angles are called conformal mappings and this has shown that analytic functions are conformal mappings if the derivative does not vanish.
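The two identities above are easy to test numerically. The following sketch is not from the original notes; it uses Python complex numbers for tangent vectors, and the helper names are hypothetical. Here `u` and `v` stand for the tangents of the two curves at their common point $z$, and `df` is the derivative $f'$ of whatever analytic $f$ one wants to try.

```python
import cmath

def angle_between(u, v):
    """Signed angle in radians from the complex 'vector' u to v, in (-pi, pi]."""
    return cmath.phase(v / u)

def cross_k(z, w):
    """k-component of (x, y, 0) x (a, b, 0) for z = x + iy and w = a + ib."""
    return z.real * w.imag - z.imag * w.real

def pushed_tangents(df, z, u, v):
    """Tangents of the two image curves at f(z), obtained from the chain rule.

    df is f', and u, v are the tangents of the original curves at z.
    """
    return df(z) * u, df(z) * v
```

For instance, with $f(z) = z^3 + 2z$ one can pass `df = lambda z: 3 * z**2 + 2` and compare `angle_between` and `cross_k` before and after the mapping; the angle should be unchanged and the cross product should scale by $|f'(z)|^2$.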
22.2 Fractional Linear Transformations
22.2.1 Circles And Lines
These mappings map lines and circles to either lines or circles.
Definition 22.1 A fractional linear transformation is a function of the form
\[ f(z) = \frac{az+b}{cz+d} \quad (22.1) \]
where $ad - bc \ne 0$.
Note that if $c = 0$, this reduces to a linear transformation $(a/d)z + (b/d)$. Special cases of these are defined as follows.
dilations: $z \to \delta z$, $\delta \ne 0$, inversions: $z \to \frac{1}{z}$, translations: $z \to z + \rho$.
The next lemma is the key to understanding fractional linear transformations.
Lemma 22.2 The fractional linear transformation, 22.1 can be written as a finite composition of dilations, inversions, and translations.
Proof: Let
\[ S_1(z) = z + \frac{d}{c}, \quad S_2(z) = \frac{1}{z}, \quad S_3(z) = \frac{(bc-ad)}{c^2}z \]
and
\[ S_4(z) = z + \frac{a}{c} \]
in the case where $c \ne 0$. Then $f(z)$ given in 22.1 is of the form
\[ f(z) = S_4\circ S_3\circ S_2\circ S_1. \]
Here is why.
\[ S_2(S_1(z)) = S_2\left(z + \frac{d}{c}\right) \equiv \frac{1}{z + \frac{d}{c}} = \frac{c}{zc+d}. \]
Now consider
\[ S_3\left(\frac{c}{zc+d}\right) \equiv \frac{(bc-ad)}{c^2}\left(\frac{c}{zc+d}\right) = \frac{bc-ad}{c(zc+d)}. \]
Finally, consider
\[ S_4\left(\frac{bc-ad}{c(zc+d)}\right) \equiv \frac{bc-ad}{c(zc+d)} + \frac{a}{c} = \frac{b+az}{zc+d}. \]
In case that $c = 0$, $f(z) = \frac{a}{d}z + \frac{b}{d}$ which is a translation composed with a dilation. Because of the assumption that $ad - bc \ne 0$, it follows that since $c = 0$, both $a$ and $d \ne 0$. This proves the lemma.
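The decomposition in the proof can be checked directly. This is a minimal sketch, not part of the original notes, with hypothetical function names; it builds the map both from formula 22.1 and as the composition $S_4\circ S_3\circ S_2\circ S_1$ for the case $c \ne 0$.

```python
def flt(a, b, c, d):
    """Return z -> (a z + b)/(c z + d), assuming a d - b c != 0."""
    if a * d - b * c == 0:
        raise ValueError("ad - bc must be nonzero")
    return lambda z: (a * z + b) / (c * z + d)

def flt_by_composition(a, b, c, d):
    """Same map built as S4 o S3 o S2 o S1 from the proof (needs c != 0)."""
    if c == 0:
        raise ValueError("this decomposition needs c != 0")
    s1 = lambda z: z + d / c                      # translation
    s2 = lambda z: 1 / z                          # inversion
    s3 = lambda z: (b * c - a * d) / c ** 2 * z   # dilation
    s4 = lambda z: z + a / c                      # translation
    return lambda z: s4(s3(s2(s1(z))))
```

Both functions should return the same value at every point other than $z = -d/c$, where the inversion step divides by zero.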
This lemma implies the following corollary.
Corollary 22.3 Fractional linear transformations map circles and lines to circles or lines.
Proof: It is obvious that dilations and translations map circles to circles and lines to lines. What of inversions? If inversions have this property, the above lemma implies a general fractional linear transformation has this property as well.
Note that all circles and lines may be put in the form
\[ \alpha\left(x^2+y^2\right) - 2ax - 2by = r^2 - \left(a^2+b^2\right) \]
where $\alpha = 1$ gives a circle centered at $(a,b)$ with radius $r$ and $\alpha = 0$ gives a line. In terms of complex variables you may therefore consider all possible circles and lines in the form
\[ \alpha z\overline{z} + \beta z + \overline{\beta}\,\overline{z} + \gamma = 0, \quad (22.2) \]
To see this let $\beta = \beta_1 + i\beta_2$ where $\beta_1 \equiv -a$ and $\beta_2 \equiv b$. Note that even if $\alpha$ is not 0 or 1 the expression still corresponds to either a circle or a line because you can divide by $\alpha$ if $\alpha \ne 0$. Now I verify that replacing $z$ with $\frac{1}{z}$ results in an expression of the form in 22.2. Thus, let $w = \frac{1}{z}$ where $z$ satisfies 22.2. Then
\[ \left(\alpha + \beta\overline{w} + \overline{\beta}w + \gamma w\overline{w}\right) = \frac{1}{z\overline{z}}\left(\alpha z\overline{z} + \beta z + \overline{\beta}\,\overline{z} + \gamma\right) = 0 \]
and so $w$ also satisfies a relation like 22.2. One simply switches $\alpha$ with $\gamma$ and $\beta$ with $\overline{\beta}$. Note the situation is slightly different than with dilations and translations. In the case of an inversion, a circle becomes either a line or a circle and similarly, a line becomes either a circle or a line. This proves the corollary.
The next example is quite important.
Example 22.4 Consider the fractional linear transformation, $w = \frac{z-i}{z+i}$.
First consider what this mapping does to the points of the form $z = x + i0$. Substituting into the expression for $w$,
\[ w = \frac{x-i}{x+i} = \frac{x^2 - 1 - 2xi}{x^2+1}, \]
a point on the unit circle. Thus this transformation maps the real axis to the unit circle.
The upper half plane is composed of points of the form $x + iy$ where $y > 0$. Substituting in to the transformation,
\[ w = \frac{x + i(y-1)}{x + i(y+1)}, \]
which is seen to be a point on the interior of the unit disk because $|y-1| < |y+1|$ which implies $|x + i(y+1)| > |x + i(y-1)|$. Therefore, this transformation maps the upper half plane to the interior of the unit disk.
One might wonder whether the mapping is one to one and onto. The mapping is clearly one to one because it has an inverse, $z = -i\frac{w+1}{w-1}$ for all $w$ in the interior of the unit disk. Also, a short computation verifies that $z$ so defined is in the upper half plane. Therefore, this transformation maps $\{z \in \mathbb{C} \text{ such that } \operatorname{Im} z > 0\}$ one to one and onto the unit disk $\{z \in \mathbb{C} \text{ such that } |z| < 1\}$.
A fancy way to do part of this is to use Theorem 20.11. $\limsup_{z\to a}\left|\frac{z-i}{z+i}\right| \le 1$ whenever $a$ is on the real axis or $a = \infty$. Therefore, $\left|\frac{z-i}{z+i}\right| \le 1$. This is a little shorter.
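The claims of Example 22.4 can be spot-checked with a few lines of code. This sketch is not from the original notes and the function names are hypothetical.

```python
def cayley(z):
    """The map w = (z - i)/(z + i) of Example 22.4."""
    return (z - 1j) / (z + 1j)

def cayley_inverse(w):
    """The inverse z = -i (w + 1)/(w - 1), defined for w != 1."""
    return -1j * (w + 1) / (w - 1)
```

Real inputs should land on the unit circle, points with positive imaginary part should land inside the unit disk, and `cayley_inverse` should send points of the disk back into the upper half plane.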

22.2.2 Three Points To Three Points
There is a simple procedure for determining fractional linear transformations which map a given set of three points to another set of three points. The problem is as follows: There are three distinct points in the extended complex plane, $z_1, z_2$, and $z_3$ and it is desired to find a fractional linear transformation such that $z_i \to w_i$ for $i = 1,2,3$ where here $w_1, w_2$, and $w_3$ are three distinct points in the extended complex plane. Then the procedure says that to find the desired fractional linear transformation solve the following equation for $w$.
\[ \frac{w - w_1}{w - w_3}\cdot\frac{w_2 - w_3}{w_2 - w_1} = \frac{z - z_1}{z - z_3}\cdot\frac{z_2 - z_3}{z_2 - z_1} \]
The result will be a fractional linear transformation with the desired properties. If any of the points equals $\infty$, then the quotient containing this point should be adjusted.
Why should this procedure work? Here is a heuristic argument to indicate why you would expect this to happen rather than a rigorous proof. The reader may want to tighten the argument to give a proof. First suppose $z = z_1$. Then the right side equals zero and so the left side also must equal zero. However, this requires $w = w_1$. Next suppose $z = z_2$. Then the right side equals 1. To get a 1 on the left, you need $w = w_2$. Finally suppose $z = z_3$. Then the right side involves division by 0. To get the same bad behavior on the left, you need $w = w_3$.
Example 22.5 Let $\operatorname{Im}\xi > 0$ and consider the fractional linear transformation which takes $\xi$ to 0, $\overline{\xi}$ to $\infty$ and 0 to $\xi/\overline{\xi}$.
The equation for $w$ is
\[ \frac{w - 0}{w - \left(\xi/\overline{\xi}\right)} = \frac{z - \xi}{z - 0}\cdot\frac{\overline{\xi} - 0}{\overline{\xi} - \xi} \]
After some computations,
\[ w = \frac{z - \xi}{z - \overline{\xi}}. \]
Note that this has the property that $\frac{x - \xi}{x - \overline{\xi}}$ is always a point on the unit circle because it is a complex number divided by its conjugate. Therefore, this fractional linear transformation maps the real line to the unit circle. It also takes the point $\xi$ to 0 and so it must map the upper half plane to the unit disk. You can verify the mapping is onto as well.
Example 22.6 Let $z_1 = 0$, $z_2 = 1$, and $z_3 = 2$ and let $w_1 = 0$, $w_2 = i$, and $w_3 = 2i$.
Then the equation to solve is
\[ \frac{w}{w - 2i}\cdot\frac{-i}{i} = \frac{z}{z - 2}\cdot\frac{-1}{1} \]
Solving this yields $w = iz$ which clearly works.
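The three point procedure can be carried out mechanically by treating each side of the cross ratio equation as a fractional linear map and composing one with the inverse of the other. The sketch below is not from the original notes; the function names are hypothetical, coefficient tuples `(a, b, c, d)` stand for $z \to (az+b)/(cz+d)$, and only finite distinct points are handled.

```python
def cross_ratio_matrix(p1, p2, p3):
    """Coefficients (a, b, c, d) of z -> (z - p1)(p2 - p3) / ((z - p3)(p2 - p1))."""
    return (p2 - p3, -p1 * (p2 - p3), p2 - p1, -p3 * (p2 - p1))

def three_point_map(zs, ws):
    """Coefficients (a, b, c, d) of the fractional linear map with zs[i] -> ws[i].

    Only finite, distinct points are handled in this sketch.
    """
    a1, b1, c1, d1 = cross_ratio_matrix(*zs)   # right side of the equation
    a2, b2, c2, d2 = cross_ratio_matrix(*ws)   # left side of the equation
    # Inverse of the left side map, up to a scalar factor that does not
    # change the fractional linear map it represents.
    ia, ib, ic, id_ = d2, -b2, -c2, a2
    # Matrix product: (inverse of left side) times (right side).
    return (ia * a1 + ib * c1, ia * b1 + ib * d1,
            ic * a1 + id_ * c1, ic * b1 + id_ * d1)

def apply_map(coeffs, z):
    """Evaluate z -> (a z + b)/(c z + d) for coeffs = (a, b, c, d)."""
    a, b, c, d = coeffs
    return (a * z + b) / (c * z + d)
```

With the data of Example 22.6, `three_point_map((0, 1, 2), (0, 1j, 2j))` should represent the map $w = iz$.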
22.3 Riemann Mapping Theorem
From the open mapping theorem analytic functions map regions to other regions or else to single points. The Riemann mapping theorem states that for every simply connected region $\Omega$ which is not equal to all of $\mathbb{C}$ there exists an analytic function $f$ such that $f(\Omega) = B(0,1)$ and in addition to this, $f$ is one to one. The proof involves several ideas which have been developed up to now. The proof is based on the following important theorem, a case of Montel's theorem. Before beginning, note that the Riemann mapping theorem is a classic example of a major existence theorem. In mathematics there are two sorts of questions, those related to whether something exists and those involving methods for finding it. The real questions are often related to questions of existence. There is a long and involved history for proofs of this theorem. The first proofs were based on the Dirichlet principle and turned out to be incorrect; it was Weierstrass who pointed out the errors. For more on the history of this theorem, see Hille [26].
The following theorem is really wonderful. It is about the existence of a subsequence having certain salubrious properties. It is this wonderful result which will give the existence of the mapping desired. The other parts of the argument are technical details to set things up and use this theorem.
22.3.1 Montel's Theorem
Theorem 22.7 Let $\Omega$ be an open set in $\mathbb{C}$ and let $\mathcal{F}$ denote a set of analytic functions mapping $\Omega$ to $B(0,M) \subseteq \mathbb{C}$. Then there exists a sequence of functions from $\mathcal{F}$, $\{f_n\}_{n=1}^{\infty}$, and an analytic function $f$ such that $f_n^{(k)}$ converges uniformly to $f^{(k)}$ on every compact subset of $\Omega$.
Proof: First note there exists a sequence of compact sets $K_n$ such that $K_n \subseteq \operatorname{int} K_{n+1} \subseteq \Omega$ for all $n$, where here $\operatorname{int} K$ denotes the interior of the set $K$, the union of all open sets contained in $K$, and $\cup_{n=1}^{\infty} K_n = \Omega$. In fact, you can verify that $\overline{B(0,n)} \cap \left\{z \in \Omega : \operatorname{dist}\left(z, \Omega^C\right) \ge \frac{1}{n}\right\}$ works for $K_n$. Then there exist positive numbers $\delta_n$ such that if $z \in K_n$, then $\overline{B(z,\delta_n)} \subseteq \operatorname{int} K_{n+1}$. Now denote by $\mathcal{F}_n$ the set of restrictions of functions of $\mathcal{F}$ to $K_n$. Then let $z \in K_n$ and let $\gamma(t) \equiv z + \delta_n e^{it}$, $t \in [0,2\pi]$. It follows that for $z_1 \in B(z,\delta_n)$ and $f \in \mathcal{F}$,
\[ \begin{aligned}
|f(z) - f(z_1)| &= \left|\frac{1}{2\pi i}\int_{\gamma} f(w)\left(\frac{1}{w-z} - \frac{1}{w-z_1}\right)dw\right| \\
&\le \frac{1}{2\pi}\left|\int_{\gamma} f(w)\frac{z - z_1}{(w-z)(w-z_1)}\,dw\right|
\end{aligned} \]
Letting $|z_1 - z| < \frac{\delta_n}{2}$,
\[ |f(z) - f(z_1)| \le \frac{M}{2\pi}2\pi\delta_n\frac{|z - z_1|}{\delta_n^2/2} \le 2M\frac{|z - z_1|}{\delta_n}. \]
It follows that $\mathcal{F}_n$ is equicontinuous and uniformly bounded so by the Arzela Ascoli theorem there exists a sequence, $\{f_{nk}\}_{k=1}^{\infty} \subseteq \mathcal{F}$, which converges uniformly on $K_n$. Let $\{f_{1k}\}_{k=1}^{\infty}$ converge uniformly on $K_1$. Then use the Arzela Ascoli theorem applied to this sequence to get a subsequence, denoted by $\{f_{2k}\}_{k=1}^{\infty}$, which also converges uniformly on $K_2$. Continue in this way to obtain $\{f_{nk}\}_{k=1}^{\infty}$ which converges uniformly on $K_1, \cdots, K_n$. Now the sequence $\{f_{nn}\}_{n=m}^{\infty}$ is a subsequence of $\{f_{mk}\}_{k=1}^{\infty}$ and so it converges uniformly on $K_m$ for all $m$. Denoting $f_{nn}$ by $f_n$ for short, this is the sequence of functions promised by the theorem. It is clear $\{f_n\}_{n=1}^{\infty}$ converges uniformly on every compact subset of $\Omega$ because every such set is contained in $K_m$ for all $m$ large enough. Let $f(z)$ be the point to which $f_n(z)$ converges. Then $f$ is a continuous function defined on $\Omega$. Is $f$ analytic? Yes it is by Lemma 19.18. Alternatively, you could let $T \subseteq \Omega$ be a triangle. Then
\[ \int_{\partial T} f(z)\,dz = \lim_{n\to\infty}\int_{\partial T} f_n(z)\,dz = 0. \]
Therefore, by Morera's theorem, $f$ is analytic.
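As for the uniform convergence of the derivatives of $f$, recall Theorem 19.52 about the existence of a cycle. Fix $N$, let $K$ be a compact subset of $\operatorname{int}(K_N)$ and let $\{\gamma_j\}_{j=1}^{m}$ be closed oriented curves contained in
\[ \operatorname{int}(K_N)\setminus K \]
such that $\sum_{j=1}^{m} n(\gamma_j, z) = 1$ for every $z \in K$. Also let $\eta$ denote the distance between $\cup_j\gamma_j^*$ and $K$. Then for $z \in K$,
\[ \begin{aligned}
\left|f^{(k)}(z) - f_n^{(k)}(z)\right| &= \left|\frac{k!}{2\pi i}\sum_{j=1}^{m}\int_{\gamma_j}\frac{f(w) - f_n(w)}{(w-z)^{k+1}}\,dw\right| \\
&\le \frac{k!}{2\pi}\|f_n - f\|_{K_N}\sum_{j=1}^{m}(\text{length of }\gamma_j)\frac{1}{\eta^{k+1}},
\end{aligned} \]
where here $\|f_n - f\|_{K_N} \equiv \sup\{|f_n(z) - f(z)| : z \in K_N\}$. Since every compact subset of $\Omega$ is contained in $\operatorname{int}(K_N)$ for some $N$, you get uniform convergence of the derivatives on compact subsets of $\Omega$.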
Since the family $\mathcal{F}$ satisfies the conclusion of Theorem 22.7 it is known as a normal family of functions. More generally,
Definition 22.8 Let $\mathcal{F}$ denote a collection of functions which are analytic on $\Omega$, a region. Then $\mathcal{F}$ is normal if every sequence contained in $\mathcal{F}$ has a subsequence which converges uniformly on compact subsets of $\Omega$.
The following result is about a certain class of fractional linear transformations. Recall Lemma 20.18 which is listed here for convenience.
Lemma 22.9 For $\alpha \in B(0,1)$, let
\[ \phi_{\alpha}(z) \equiv \frac{z - \alpha}{1 - \overline{\alpha}z}. \]
Then $\phi_{\alpha}$ maps $B(0,1)$ one to one and onto $B(0,1)$, $\phi_{\alpha}^{-1} = \phi_{-\alpha}$, and
\[ \phi_{\alpha}'(\alpha) = \frac{1}{1 - |\alpha|^2}. \]
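The three assertions of Lemma 22.9 are easy to test numerically. The sketch below is not from the original notes; the names are hypothetical, and the derivative is only approximated by a central difference, which is an assumption of this illustration rather than anything used in the proof.

```python
def phi(alpha):
    """Return the map z -> (z - alpha)/(1 - conj(alpha) z) for |alpha| < 1."""
    if abs(alpha) >= 1:
        raise ValueError("alpha must lie in the open unit disk")
    return lambda z: (z - alpha) / (1 - alpha.conjugate() * z)

def derivative_estimate(f, z, h=1e-6):
    """Central difference approximation to f'(z) using a real step h."""
    return (f(z + h) - f(z - h)) / (2 * h)
```

For a chosen `alpha` in the disk, `phi(-alpha)` should undo `phi(alpha)`, points of the disk should stay in the disk, and `derivative_estimate(phi(alpha), alpha)` should be close to `1 / (1 - abs(alpha) ** 2)`.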
The next lemma, known as Schwarz's lemma, is interesting for its own sake but will also be an important part of the proof of the Riemann mapping theorem. It was stated and proved earlier but for convenience it is given again here.
Lemma 22.10 Suppose $F : B(0,1) \to B(0,1)$, $F$ is analytic, and $F(0) = 0$. Then for all $z \in B(0,1)$,
\[ |F(z)| \le |z|, \quad (22.3) \]
and
\[ |F'(0)| \le 1. \quad (22.4) \]
If equality holds in 22.4 then there exists $\lambda \in \mathbb{C}$ with $|\lambda| = 1$ and
\[ F(z) = \lambda z. \quad (22.5) \]
Proof: First note that by assumption, $F(z)/z$ has a removable singularity at 0 if its value at 0 is defined to be $F'(0)$. By the maximum modulus theorem, if $|z| < r < 1$,
\[ \left|\frac{F(z)}{z}\right| \le \max_{t\in[0,2\pi]}\frac{\left|F\left(re^{it}\right)\right|}{r} \le \frac{1}{r}. \]
Then letting $r \to 1$,
\[ \left|\frac{F(z)}{z}\right| \le 1. \]
This shows 22.3 and it also verifies 22.4 on taking the limit as $z \to 0$. If equality holds in 22.4, then $|F(z)/z|$ achieves a maximum at an interior point so $F(z)/z$ equals a constant, $\lambda$, by the maximum modulus theorem. Since $F(z) = \lambda z$, it follows $F'(0) = \lambda$ and so $|\lambda| = 1$. This proves the lemma.

Definition 22.11 A region $\Omega$ has the square root property if whenever $f, \frac{1}{f} : \Omega \to \mathbb{C}$ are both analytic$^1$, it follows there exists $\phi : \Omega \to \mathbb{C}$ such that $\phi$ is analytic and $f(z) = \phi^2(z)$.
The next theorem will turn out to be equivalent to the Riemann mapping theorem.
22.3.2 Regions With Square Root Property
Theorem 22.12 Let $\Omega \ne \mathbb{C}$ for $\Omega$ a region and suppose $\Omega$ has the square root property. Then for $z_0 \in \Omega$ there exists $h : \Omega \to B(0,1)$ such that $h$ is one to one, onto, analytic, and $h(z_0) = 0$.
Proof: Define $\mathcal{F}$ to be the set of functions $f$ such that $f : \Omega \to B(0,1)$ is one to one and analytic. The first task is to show $\mathcal{F}$ is nonempty. Then, using Montel's theorem it will be shown there is a function $h$ in $\mathcal{F}$ such that $|h'(z_0)| \ge |\psi'(z_0)|$ for all $\psi \in \mathcal{F}$. When this has been done it will be shown that $h$ is actually onto. This will prove the theorem.
$^1$ This implies $f$ has no zero on $\Omega$.
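Claim 1: $\mathcal{F}$ is nonempty.
Proof of Claim 1: Since $\Omega \ne \mathbb{C}$ it follows there exists $\xi \notin \Omega$. Then it follows $z - \xi$ and $\frac{1}{z-\xi}$ are both analytic on $\Omega$. Since $\Omega$ has the square root property, there exists an analytic function $\phi : \Omega \to \mathbb{C}$ such that $\phi^2(z) = z - \xi$ for all $z \in \Omega$, $\phi(z) = \sqrt{z-\xi}$. Since $z - \xi$ is not constant, neither is $\phi$ and it follows from the open mapping theorem that $\phi(\Omega)$ is a region. Note also that $\phi$ is one to one because if $\phi(z_1) = \phi(z_2)$, then you can square both sides and conclude $z_1 - \xi = z_2 - \xi$ implying $z_1 = z_2$.
Now pick $a \in \phi(\Omega)$. Thus $\sqrt{z_a - \xi} = a$. I claim there exists a positive lower bound to $\left|\sqrt{z-\xi} + a\right|$ for $z \in \Omega$. If not, there exists a sequence $\{z_n\} \subseteq \Omega$ such that
\[ \sqrt{z_n - \xi} + a = \sqrt{z_n - \xi} + \sqrt{z_a - \xi} \equiv \varepsilon_n \to 0. \]
Then
\[ \sqrt{z_n - \xi} = \left(\varepsilon_n - \sqrt{z_a - \xi}\right) \quad (22.6) \]
and squaring both sides,
\[ z_n - \xi = \varepsilon_n^2 + z_a - \xi - 2\varepsilon_n\sqrt{z_a - \xi}. \]
Consequently, $(z_n - z_a) = \varepsilon_n^2 - 2\varepsilon_n\sqrt{z_a - \xi}$ which converges to 0. Taking the limit in 22.6, it follows $2\sqrt{z_a - \xi} = 0$ and so $\xi = z_a$, a contradiction to $\xi \notin \Omega$. Choose $r > 0$ such that for all $z \in \Omega$, $\left|\sqrt{z-\xi} + a\right| > r > 0$. Then consider
\[ \psi(z) \equiv \frac{r}{\sqrt{z-\xi} + a}. \quad (22.7) \]
This is one to one, analytic, and maps $\Omega$ into $B(0,1)$ (since $\left|\sqrt{z-\xi} + a\right| > r$). Thus $\mathcal{F}$ is not empty and this proves the claim.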
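Claim 2: Let $z_0 \in \Omega$. There exists a finite positive real number $\eta$, defined by
\[ \eta \equiv \sup\left\{|\psi'(z_0)| : \psi \in \mathcal{F}\right\} \quad (22.8) \]
and an analytic function $h \in \mathcal{F}$ such that $|h'(z_0)| = \eta$. Furthermore, $h(z_0) = 0$.
Proof of Claim 2: First you show $\eta < \infty$. Let $\gamma(t) = z_0 + re^{it}$ for $t \in [0,2\pi]$ where $r$ is small enough that $\overline{B(z_0,r)} \subseteq \Omega$. Then for $\psi \in \mathcal{F}$, the Cauchy integral formula for the derivative implies
\[ \psi'(z_0) = \frac{1}{2\pi i}\int_{\gamma}\frac{\psi(w)}{(w - z_0)^2}\,dw \]
and so $|\psi'(z_0)| \le (1/2\pi)\,2\pi r\left(1/r^2\right) = 1/r$. Therefore, $\eta < \infty$ as desired. For $\psi$ defined above in 22.7,
\[ \psi'(z_0) = \frac{-r\phi'(z_0)}{(\phi(z_0) + a)^2} = \frac{-r(1/2)\left(\sqrt{z_0 - \xi}\right)^{-1}}{(\phi(z_0) + a)^2} \ne 0. \]
Therefore, $\eta > 0$. It remains to verify the existence of the function $h$.
By Theorem 22.7, there exists a sequence $\{\psi_n\}$ of functions in $\mathcal{F}$ and an analytic function $h$ such that
\[ |\psi_n'(z_0)| \to \eta \quad (22.9) \]
and
\[ \psi_n \to h, \quad \psi_n' \to h', \quad (22.10) \]
uniformly on all compact subsets of $\Omega$. It follows
\[ |h'(z_0)| = \lim_{n\to\infty}|\psi_n'(z_0)| = \eta \quad (22.11) \]
and for all $z \in \Omega$,
\[ |h(z)| = \lim_{n\to\infty}|\psi_n(z)| \le 1. \quad (22.12) \]
By 22.11, $h$ is not a constant. Therefore, in fact, $|h(z)| < 1$ for all $z \in \Omega$ in 22.12 by the open mapping theorem.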
Next it must be shown that h is one to one in order to conclude h T. Pick
z
1
and suppose z
2
is another point of . Since the zeros of h h(z
1
) have no
limit point, there exists a circular contour bounding a circle which contains z
2
but
not z
1
such that
contains no zeros of h h(z

1
).
z
1
c
'
T
z
2
Using the theorem on counting zeros, Theorem 20.20, and the fact that
n
is
one to one,
0 = lim
n
1
2i
_
n
(w)
n
(w)
n
(z
1
)
dw
=
1
2i
_
(w)
h(w) h(z
1
)
dw,
which shows that hh(z
1
) has no zeros in B(z
2
, r) . In particular z
2
is not a zero of
h h(z
1
) . This shows that h is one to one since z
2
,= z
1
was arbitrary. Therefore,
h T. It only remains to verify that h(z
0
) = 0.
If h(z
0
) ,= 0,consider
h(z
0
)
h where
is the fractional linear transformation

dened in Lemma 22.9. By this lemma it follows
h(z
0
)
h T. Now using the
chain rule,
h(z
0
)
h
_
(z
0
)
h(z
0
)
(h(z
0
))
[h
(z
0
)[
=
1
1 [h(z
0
)[
2
[h
(z
0
)[
=
1
1 [h(z
0
)[
2
>
Contradicting the denition of . This proves Claim 2.
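The facts about the fractional linear transformation used above can be checked numerically. The following is only a sketch: it assumes Lemma 22.9 refers to the standard disk map $\phi_\alpha(z) = (z - \alpha)/(1 - \overline{\alpha}z)$, and the function names `phi` and `dphi` and the sample points are chosen here for illustration.

```python
# Sketch, assuming phi_alpha(z) = (z - alpha) / (1 - conj(alpha) z) is the map
# of Lemma 22.9. The names phi, dphi and the sample points are illustrative.

def phi(alpha: complex, z: complex) -> complex:
    """Fractional linear transformation of the unit disk sending alpha to 0."""
    return (z - alpha) / (1 - alpha.conjugate() * z)


def dphi(alpha: complex, z: complex) -> complex:
    """Derivative of phi_alpha at z, obtained from the quotient rule."""
    return (1 - abs(alpha) ** 2) / (1 - alpha.conjugate() * z) ** 2


alpha = 0.3 + 0.4j   # any point with |alpha| < 1
z = -0.2 + 0.5j      # any point of the unit disk

# phi_{-alpha} undoes phi_alpha, so this should be zero up to rounding.
inverse_error = abs(phi(-alpha, phi(alpha, z)) - z)

# |phi_alpha'(alpha)| should equal 1 / (1 - |alpha|^2), which exceeds 1.
deriv_at_alpha = abs(dphi(alpha, alpha))
expected = 1 / (1 - abs(alpha) ** 2)
```

The last quantity is what makes the chain rule estimate strict: whenever $h(z_0) \neq 0$ the factor $1/(1 - |h(z_0)|^2)$ is larger than 1.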
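Claim 3: The function, $h$ just obtained maps $\Omega$ onto $B(0,1)$.
Proof of Claim 3: To show $h$ is onto, use the fractional linear transformation of Lemma 22.9. Suppose $h$ is not onto. Then there exists $\alpha \in B(0,1) \setminus h(\Omega)$. Then $0 \neq \phi_\alpha \circ h(z)$ for all $z \in \Omega$ because
$$\phi_\alpha \circ h(z) = \frac{h(z) - \alpha}{1 - \overline{\alpha}h(z)}$$
and it is assumed $\alpha \notin h(\Omega)$. Therefore, since $\Omega$ has the square root property, you can consider an analytic function $z \to \sqrt{\phi_\alpha \circ h(z)}$. This function is one to one because both $\phi_\alpha$ and $h$ are. Also, the values of this function are in $B(0,1)$ by Lemma 22.9 so it is in $\mathcal{F}$.
Now let
$$\psi \equiv \phi_{\sqrt{\phi_\alpha \circ h(z_0)}} \circ \sqrt{\phi_\alpha \circ h}. \tag{22.13}$$
Thus
$$\psi(z_0) = \phi_{\sqrt{\phi_\alpha \circ h(z_0)}} \circ \sqrt{\phi_\alpha \circ h(z_0)} = 0$$
and $\psi$ is a one to one mapping of $\Omega$ into $B(0,1)$ so $\psi$ is also in $\mathcal{F}$. Therefore,
$$|\psi'(z_0)| \leq \eta, \quad \left|\left(\sqrt{\phi_\alpha \circ h}\right)'(z_0)\right| \leq \eta. \tag{22.14}$$
Define $s(w) \equiv w^2$. Then using Lemma 22.9, in particular, the description of $\phi_\alpha^{-1} = \phi_{-\alpha}$, you can solve 22.13 for $h$ to obtain
$$h(z) = \phi_{-\alpha} \circ s \circ \phi_{-\sqrt{\phi_\alpha \circ h(z_0)}} \circ \psi = \Big(\overbrace{\phi_{-\alpha} \circ s \circ \phi_{-\sqrt{\phi_\alpha \circ h(z_0)}}}^{\equiv F} \circ\, \psi\Big)(z) = (F \circ \psi)(z) \tag{22.15}$$
Now
$$F(0) = \phi_{-\alpha} \circ s \circ \phi_{-\sqrt{\phi_\alpha \circ h(z_0)}}(0) = \phi_\alpha^{-1}\left(\phi_\alpha \circ h(z_0)\right) = h(z_0) = 0$$
and $F$ maps $B(0,1)$ into $B(0,1)$. Also, $F$ is not one to one because it maps $B(0,1)$ to $B(0,1)$ and has $s$ in its definition. Thus there exists $z_1 \in B(0,1)$ such that $\phi_{-\sqrt{\phi_\alpha \circ h(z_0)}}(z_1) = -\frac{1}{2}$ and another point $z_2 \in B(0,1)$ such that $\phi_{-\sqrt{\phi_\alpha \circ h(z_0)}}(z_2) = \frac{1}{2}$. However, thanks to $s$, $F(z_1) = F(z_2)$.
Since $F(0) = h(z_0) = 0$, you can apply the Schwarz lemma to $F$. Since $F$ is not one to one, it can't be true that $F(z) = \lambda z$ for $|\lambda| = 1$ and so by the Schwarz lemma it must be the case that $|F'(0)| < 1$. But this implies from 22.15 and 22.14 that
$$\eta = |h'(z_0)| = |F'(\psi(z_0))|\,|\psi'(z_0)| = |F'(0)|\,|\psi'(z_0)| < |\psi'(z_0)| \leq \eta,$$
a contradiction. This proves the theorem.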
The following lemma yields the usual form of the Riemann mapping theorem.
Lemma 22.13 Let $\Omega$ be a simply connected region with $\Omega \neq \mathbb{C}$. Then $\Omega$ has the square root property.
Proof: Let $f$ and $\frac{1}{f}$ both be analytic on $\Omega$. Then $\frac{f'}{f}$ is analytic on $\Omega$ so by Corollary 19.50, there exists $\widetilde{F}$, analytic on $\Omega$ such that $\widetilde{F}' = \frac{f'}{f}$ on $\Omega$. Then $\left(fe^{-\widetilde{F}}\right)' = 0$ and so $f(z) = Ce^{\widetilde{F}} = e^{a+ib}e^{\widetilde{F}}$. Now let $F = \widetilde{F} + a + ib$. Then $F$ is still a primitive of $f'/f$ and $f(z) = e^{F(z)}$. Now let $\phi(z) \equiv e^{\frac{1}{2}F(z)}$. Then $\phi$ is the desired square root and so $\Omega$ has the square root property.
Corollary 22.14 (Riemann mapping theorem) Let $\Omega$ be a simply connected region with $\Omega \neq \mathbb{C}$ and let $z_0 \in \Omega$. Then there exists a function, $f : \Omega \to B(0,1)$ such that $f$ is one to one, analytic, and onto with $f(z_0) = 0$. Furthermore, $f^{-1}$ is also analytic.
Proof: From Theorem 22.12 and Lemma 22.13 there exists a function, $f : \Omega \to B(0,1)$ which is one to one, onto, and analytic such that $f(z_0) = 0$. The assertion that $f^{-1}$ is analytic follows from the open mapping theorem.
22.4 Analytic Continuation
22.4.1 Regular And Singular Points
Given a function which is analytic on some set, can you extend it to an analytic function defined on a larger set? Sometimes you can do this. It was done in the proof of the Cauchy integral formula. There are also reflection theorems like those discussed in the exercises starting with Problem 10 on Page 486. Here I will give a systematic way of extending an analytic function to a larger set. I will emphasize simply connected regions. The subject of analytic continuation is much larger than the introduction given here. A good source for much more on this is found in Ahlfors [2]. The approach given here is suggested by Rudin [38] and avoids many of the standard technicalities.
Definition 22.15 Let $f$ be analytic on $B(a,r)$ and let $\beta \in \partial B(a,r)$. Then $\beta$ is called a regular point of $f$ if there exists some $\delta > 0$ and a function, $g$ analytic on $B(\beta, \delta)$ such that $g = f$ on $B(\beta,\delta) \cap B(a,r)$. Those points of $\partial B(a,r)$ which are not regular are called singular.
Theorem 22.16 Suppose $f$ is analytic on $B(a,r)$ and the power series
$$f(z) = \sum_{k=0}^{\infty} a_k (z-a)^k$$
has radius of convergence $r$. Then there exists a singular point on $\partial B(a,r)$.
Proof: If not, then for every $z \in \partial B(a,r)$ there exists $\delta_z > 0$ and $g_z$ analytic on $B(z, \delta_z)$ such that $g_z = f$ on $B(z,\delta_z) \cap B(a,r)$. Since $\partial B(a,r)$ is compact, there exist $z_1, \cdots, z_n$, points in $\partial B(a,r)$ such that $\{B(z_k, \delta_{z_k})\}_{k=1}^{n}$ covers $\partial B(a,r)$. Now define
$$g(z) \equiv \begin{cases} f(z) & \text{if } z \in B(a,r) \\ g_{z_k}(z) & \text{if } z \in B(z_k, \delta_{z_k}) \end{cases}$$
Is this well defined? If $z \in B(z_i, \delta_{z_i}) \cap B(z_j, \delta_{z_j})$, is $g_{z_i}(z) = g_{z_j}(z)$? Consider the following picture representing this situation.
You see that if $z \in B(z_i,\delta_{z_i}) \cap B(z_j,\delta_{z_j})$ then $I \equiv B(z_i,\delta_{z_i}) \cap B(z_j,\delta_{z_j}) \cap B(a,r)$ is a nonempty open set. Both $g_{z_i}$ and $g_{z_j}$ equal $f$ on $I$. Therefore, they must be equal on $B(z_i,\delta_{z_i}) \cap B(z_j,\delta_{z_j})$ because $I$ has a limit point. Therefore, $g$ is well defined and analytic on an open set containing $\overline{B(a,r)}$. Since $g$ agrees with $f$ on $B(a,r)$, the power series for $g$ is the same as the power series for $f$ and converges on a ball which is larger than $B(a,r)$ contrary to the assumption that the radius of convergence of the above power series equals $r$. This proves the theorem.
22.4.2 Continuation Along A Curve
Next I will describe what is meant by continuation along a curve. The following definition is standard and is found in Rudin [38].
Definition 22.17 A function element is an ordered pair, $(f, D)$ where $D$ is an open ball and $f$ is analytic on $D$. $(f_0, D_0)$ and $(f_1, D_1)$ are direct continuations of each other if $D_1 \cap D_0 \neq \emptyset$ and $f_0 = f_1$ on $D_1 \cap D_0$. In this case I will write $(f_0, D_0) \sim (f_1, D_1)$. A chain is a finite sequence, of disks, $\{D_0, \cdots, D_n\}$ such that $D_{i-1} \cap D_i \neq \emptyset$. If $(f_0, D_0)$ is a given function element and there exist function elements, $(f_i, D_i)$ such that $\{D_0, \cdots, D_n\}$ is a chain and $(f_{j-1}, D_{j-1}) \sim (f_j, D_j)$ then $(f_n, D_n)$ is called the analytic continuation of $(f_0, D_0)$ along the chain $\{D_0, \cdots, D_n\}$. Now suppose $\gamma$ is an oriented curve with parameter interval $[a,b]$ and there exists a chain, $\{D_0, \cdots, D_n\}$ such that $\gamma^* \subseteq \cup_{k=1}^{n} D_k$, $\gamma(a)$ is the center of $D_0$, $\gamma(b)$ is the center of $D_n$, and there is an increasing list of numbers in $[a,b]$, $a = s_0 < s_1 < \cdots < s_n = b$ such that $\gamma([s_i, s_{i+1}]) \subseteq D_i$ and $(f_n, D_n)$ is an analytic continuation of $(f_0, D_0)$ along the chain. Then $(f_n, D_n)$ is called an analytic continuation of $(f_0, D_0)$ along the curve $\gamma$. ($\gamma$ will always be a continuous curve. Nothing more is needed.)
In the above situation it does not follow that if $D_n \cap D_0 \neq \emptyset$, that $f_n = f_0$! However, there are some cases where this will happen. This is the monodromy theorem which follows. This is as far as I will go on the subject of analytic continuation. For more on this subject including a development of the concept of Riemann surfaces, see Ahlfors [2].
Lemma 22.18 Suppose $(f, B(0,r))$ for $r < 1$ is a function element and $(f, B(0,r))$ can be analytically continued along every curve in $B(0,1)$ that starts at 0. Then there exists an analytic function, $g$ defined on $B(0,1)$ such that $g = f$ on $B(0,r)$.
Proof: Let
$$R = \sup\{ r_1 \geq r \text{ such that there exists } g_{r_1} \text{ analytic on } B(0,r_1) \text{ which agrees with } f \text{ on } B(0,r) \}.$$
Define $g_R(z) \equiv g_{r_1}(z)$ where $|z| < r_1$. This is well defined because if you use $r_1$ and $r_2$, both $g_{r_1}$ and $g_{r_2}$ agree with $f$ on $B(0,r)$, a set with a limit point and so the two functions agree at every point in both $B(0,r_1)$ and $B(0,r_2)$. Thus $g_R$ is analytic on $B(0,R)$. If $R < 1$, then by the assumption there are no singular points on $\partial B(0,R)$ and so Theorem 22.16 implies the radius of convergence of the power series for $g_R$ is larger than $R$ contradicting the choice of $R$. Therefore, $R = 1$ and this proves the lemma. Let $g = g_R$.
The following theorem is the main result in this subject, the monodromy theorem.
Theorem 22.19 Let $\Omega$ be a simply connected proper subset of $\mathbb{C}$ and suppose $(f, B(a,r))$ is a function element with $B(a,r) \subseteq \Omega$. Suppose also that this function element can be analytically continued along every curve through $a$. Then there exists $G$ analytic on $\Omega$ such that $G$ agrees with $f$ on $B(a,r)$.
Proof: By the Riemann mapping theorem, there exists $h : \Omega \to B(0,1)$ which is analytic, one to one and onto such that $h(a) = 0$. Since $h$ is an open map, there exists $\delta > 0$ such that
$$B(0,\delta) \subseteq h(B(a,r)).$$
It follows $f \circ h^{-1}$ can be analytically continued along every curve through 0. By Lemma 22.18 there exists $g$ analytic on $B(0,1)$ which agrees with $f \circ h^{-1}$ on $B(0,\delta)$. Define $G(z) \equiv g(h(z))$. For $z = h^{-1}(w)$, it follows $G\left(h^{-1}(w)\right) = g(w)$. If $w \in B(0,\delta)$, then $G\left(h^{-1}(w)\right) = f \circ h^{-1}(w)$ and so $G = f$ on $h^{-1}(B(0,\delta))$, an open set contained in $B(a,r)$. Therefore, $G = f$ on $B(a,r)$ because $h^{-1}(B(0,\delta))$ has a limit point. This proves the theorem.
Actually, you sometimes want to consider the case where $\Omega = \mathbb{C}$. This requires a small modification to obtain from the above theorem.
Corollary 22.20 Suppose $(f, B(a,r))$ is a function element with $B(a,r) \subseteq \mathbb{C}$. Suppose also that this function element can be analytically continued along every curve through $a$. Then there exists $G$ analytic on $\mathbb{C}$ such that $G$ agrees with $f$ on $B(a,r)$.
Proof: Let $\Omega_1 \equiv \mathbb{C} \setminus \{a + it : t \geq r\}$ and $\Omega_2 \equiv \mathbb{C} \setminus \{a - it : t \geq r\}$. Here is a picture of $\Omega_1$.
[Figure: $\Omega_1$, the plane with a vertical line removed which extends up from the boundary of $B(a,r)$.]
A picture of $\Omega_2$ is similar except the line extends down from the boundary of $B(a,r)$.
Thus $B(a,r) \subseteq \Omega_i$ and $\Omega_i$ is simply connected and proper. By Theorem 22.19 there exist analytic functions, $G_i$ analytic on $\Omega_i$ such that $G_i = f$ on $B(a,r)$. Thus $G_1 = G_2$ on $B(a,r)$, a set with a limit point. Therefore, $G_1 = G_2$ on $\Omega_1 \cap \Omega_2$. Now let $G(z) = G_i(z)$ where $z \in \Omega_i$. This is well defined and analytic on $\mathbb{C}$. This proves the corollary.
22.5 The Picard Theorems
The Picard theorem says that if $f$ is an entire function and there are two complex numbers not contained in $f(\mathbb{C})$, then $f$ is constant. This is certainly one of the most amazing things which could be imagined. However, this is only the little Picard theorem. The big Picard theorem is even more incredible. This one asserts that an entire function which is not a polynomial must take every value of $\mathbb{C}$, with at most one exception, infinitely many times! I will begin with the little Picard theorem. The method of proof I will use is the one found in Saks and Zygmund [40], Conway [12] and Hille [26]. This is not the way Picard did it in 1879. That approach is very different and is presented at the end of the material on elliptic functions. This approach is much more recent, dating, it appears, from around 1924.
Lemma 22.21 Let $f$ be analytic on a region containing $\overline{B(0,r)}$ and suppose
$$|f'(0)| = b > 0, \quad f(0) = 0,$$
and $|f(z)| \leq M$ for all $z \in \overline{B(0,r)}$. Then $f(B(0,r)) \supseteq B\left(0, \frac{r^2 b^2}{6M}\right)$.
Proof: By assumption,
$$f(z) = \sum_{k=0}^{\infty} a_k z^k, \quad |z| \leq r. \tag{22.16}$$
Then by the Cauchy integral formula for the derivative,
$$a_k = \frac{1}{2\pi i}\int_{\partial B(0,r)} \frac{f(w)}{w^{k+1}}\,dw$$
where the integral is in the counter clockwise direction. Therefore,
$$|a_k| \leq \frac{1}{2\pi}\int_0^{2\pi} \frac{\left|f\left(re^{i\theta}\right)\right|}{r^k}\,d\theta \leq \frac{M}{r^k}.$$
In particular, $br \leq M$. Therefore, from 22.16
$$|f(z)| \geq b|z| - \sum_{k=2}^{\infty} \frac{M}{r^k}|z|^k = b|z| - \frac{M\left(\frac{|z|}{r}\right)^2}{1 - \frac{|z|}{r}} = b|z| - \frac{M|z|^2}{r^2 - r|z|}$$
Suppose $|z| = \frac{r^2 b}{4M} < r$. Then the right side equals
$$\frac{1}{4}b^2r^2\,\frac{3M - br}{M(4M - br)} \geq \frac{1}{4}b^2r^2\,\frac{3M - M}{M(4M - M)} = \frac{r^2b^2}{6M}.$$
Let $|w| < \frac{r^2b^2}{6M}$. Then for $|z| = \frac{r^2 b}{4M}$ and the above,
$$|w| = |(f(z) - w) - f(z)| < \frac{r^2b^2}{6M} \leq |f(z)|$$
and so by Rouche's theorem, $z \to f(z) - w$ and $z \to f(z)$ have the same number of zeros in $B\left(0, \frac{r^2b}{4M}\right)$. But $f$ has at least one zero in this ball and so this shows there exists at least one $z \in B\left(0, \frac{r^2b}{4M}\right)$ such that $f(z) - w = 0$. This proves the lemma.
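The algebra behind the choice $|z| = r^2b/(4M)$ can be checked with exact rational arithmetic. This is a sketch only: the helper names and the sample values of $b$, $r$, $M$ (chosen with $br \leq M$) are assumptions made for illustration and do not come from the notes.

```python
from fractions import Fraction


def lower_bound_on_circle(b, r, M):
    """b|z| - M|z|^2 / (r^2 - r|z|) evaluated at |z| = r^2 b / (4M)."""
    rho = r * r * b / (4 * M)
    return b * rho - M * rho ** 2 / (r * r - r * rho)


def closed_form(b, r, M):
    """(1/4) b^2 r^2 (3M - br) / (M (4M - br))."""
    return Fraction(1, 4) * b ** 2 * r ** 2 * (3 * M - b * r) / (M * (4 * M - b * r))


# Illustrative sample values satisfying b*r <= M.
b, r, M = Fraction(1, 2), Fraction(3, 2), Fraction(2)

value = lower_bound_on_circle(b, r, M)   # lower bound for |f(z)| on the circle
target = r ** 2 * b ** 2 / (6 * M)       # radius claimed in the lemma
```

For these sample values `value` agrees with `closed_form(b, r, M)` and is at least `target`, which is the inequality needed before applying Rouche's theorem.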
22.5.1 Two Competing Lemmas
Lemma 22.21 is a really nice lemma but there is something even better, Bloch's lemma. This lemma does not depend on the bound of $f$. Like the above two lemmas it is interesting for its own sake and in addition is the key to a fairly short proof of Picard's theorem. It features the number $\frac{1}{24}$. The best constant is not currently known.
Lemma 22.22 Let $f$ be analytic on an open set containing $\overline{B(0,R)}$ and suppose $|f'(0)| > 0$. Then there exists $a \in B(0,R)$ such that
$$f(B(0,R)) \supseteq B\left(f(a), \frac{|f'(0)|R}{24}\right).$$
Proof: Let $K(\rho) \equiv \max\{|f'(z)| : |z| = \rho\}$. For simplicity, let $C_\rho \equiv \{z : |z| = \rho\}$.
Claim: $K$ is continuous from the left.
Proof of claim: Let $z_\rho \in C_\rho$ such that $|f'(z_\rho)| = K(\rho)$. Then by the maximum modulus theorem, if $\lambda \in (0,1)$,
$$|f'(\lambda z_\rho)| \leq K(\lambda\rho) \leq K(\rho) = |f'(z_\rho)|.$$
Letting $\lambda \to 1$ yields the claim.
Let $\rho_0$ be the largest such that $(R - \rho_0)K(\rho_0) = R|f'(0)|$. (Note $(R - 0)K(0) = R|f'(0)|$.) Thus $\rho_0 < R$ because $(R - R)K(R) = 0$. Let $|a| = \rho_0$ such that $|f'(a)| = K(\rho_0)$. Thus
$$|f'(a)|(R - \rho_0) = |f'(0)|R \tag{22.17}$$
Now let $r = \frac{R - \rho_0}{2}$. From 22.17,
$$|f'(a)|\,r = \frac{1}{2}|f'(0)|R, \quad B(a,r) \subseteq B(0, \rho_0 + r) \subseteq B(0,R). \tag{22.18}$$
[Figure: the ball $B(a,r)$ inside $B(0,R)$.]
Therefore, if $z \in B(a,r)$, it follows from the maximum modulus theorem and the definition of $\rho_0$ that
$$|f'(z)| \leq K(\rho_0 + r) < \frac{R|f'(0)|}{R - \rho_0 - r} = \frac{2R|f'(0)|}{R - \rho_0} = \frac{2R|f'(0)|}{2r} = \frac{R|f'(0)|}{r} \tag{22.19}$$
Let $g(z) = f(a + z) - f(a)$ where $z \in B(0,r)$. Then $|g'(0)| = |f'(a)| > 0$ and for $z \in B(0,r)$, with $\gamma(0,z)$ the straight segment from 0 to $z$,
$$|g(z)| = \left|\int_{\gamma(0,z)} g'(w)\,dw\right| \leq |z|\,\frac{R|f'(0)|}{r} \leq R|f'(0)|.$$
By Lemma 22.21 and 22.18,
$$g(B(0,r)) \supseteq B\left(0, \frac{r^2|f'(a)|^2}{6R|f'(0)|}\right) = B\left(0, \frac{r^2\left(\frac{1}{2r}|f'(0)|R\right)^2}{6R|f'(0)|}\right) = B\left(0, \frac{|f'(0)|R}{24}\right)$$
Now $g(B(0,r)) = f(B(a,r)) - f(a)$ and so this implies
$$f(B(0,R)) \supseteq f(B(a,r)) \supseteq B\left(f(a), \frac{|f'(0)|R}{24}\right).$$
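The appearance of the number 24 is pure arithmetic, and a short exact computation makes this visible. The following sketch uses a hypothetical helper `bloch_radius`; the inputs stand for $R$, $\rho_0$ and $|f'(0)|$ and the sample values are arbitrary.

```python
from fractions import Fraction


def bloch_radius(R, rho0, d0):
    """Radius r^2 |f'(a)|^2 / (6 M) from Lemma 22.21 applied to g.

    Here r = (R - rho0) / 2, |f'(a)| r = d0 R / 2 as in 22.18, and
    M = R d0 is the bound on |g| over B(0, r); d0 stands for |f'(0)|.
    """
    r = (R - rho0) / 2
    da = d0 * R / (2 * r)      # |f'(a)|
    M = R * d0                 # bound on |g|
    return r ** 2 * da ** 2 / (6 * M)


R, d0 = Fraction(3), Fraction(5)
radius_small_rho = bloch_radius(R, Fraction(1), d0)
radius_large_rho = bloch_radius(R, Fraction(2), d0)
```

Both radii equal $|f'(0)|R/24$ regardless of $\rho_0$, which is why the final constant does not depend on where the point $a$ happens to lie.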
Here is a slightly more general version which allows the center of the open set
to be arbitrary.
Lemma 22.23 Let $f$ be analytic on an open set containing $\overline{B(z_0,R)}$ and suppose $|f'(z_0)| > 0$. Then there exists $a \in B(z_0,R)$ such that
$$f(B(z_0,R)) \supseteq B\left(f(a), \frac{|f'(z_0)|R}{24}\right).$$
Proof: You look at $g(z) \equiv f(z_0 + z) - f(z_0)$ for $z \in B(0,R)$. Then $g'(0) = f'(z_0)$ and so by Lemma 22.22 there exists $a_1 \in B(0,R)$ such that
$$g(B(0,R)) \supseteq B\left(g(a_1), \frac{|f'(z_0)|R}{24}\right).$$
Now $g(B(0,R)) = f(B(z_0,R)) - f(z_0)$ and $g(a_1) = f(a) - f(z_0)$ for some $a \in B(z_0,R)$ and so
$$f(B(z_0,R)) - f(z_0) \supseteq B\left(g(a_1), \frac{|f'(z_0)|R}{24}\right) = B\left(f(a) - f(z_0), \frac{|f'(z_0)|R}{24}\right)$$
which implies
$$f(B(z_0,R)) \supseteq B\left(f(a), \frac{|f'(z_0)|R}{24}\right)$$
as claimed. This proves the lemma.
No attempt was made to find the best number to multiply by $R|f'(z_0)|$. A discussion of this is given in Conway [12]. See also [26]. Much larger numbers than 1/24 are available and there is a conjecture due to Ahlfors about the best value. The conjecture is that 1/24 can be replaced with
$$\frac{\Gamma\left(\frac{1}{3}\right)\Gamma\left(\frac{11}{12}\right)}{\left(1 + \sqrt{3}\right)^{1/2}\Gamma\left(\frac{1}{4}\right)} \approx .47186$$
You can see there is quite a gap between the constant for which this lemma is proved above and what is thought to be the best constant.
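To get a feeling for the size of this gap, the conjectured value can be evaluated with the gamma function from the Python standard library. This is a sketch under the assumption that the garbled expression in the notes is the usual conjectured constant $\Gamma(1/3)\Gamma(11/12)/\big((1+\sqrt{3})^{1/2}\Gamma(1/4)\big)$; the function name is made up for illustration.

```python
import math


def conjectured_bloch_constant() -> float:
    """Gamma(1/3) Gamma(11/12) / (sqrt(1 + sqrt(3)) Gamma(1/4))."""
    numerator = math.gamma(1 / 3) * math.gamma(11 / 12)
    denominator = math.sqrt(1 + math.sqrt(3)) * math.gamma(1 / 4)
    return numerator / denominator


value = conjectured_bloch_constant()
ratio = value / (1 / 24)   # how many times larger than the proved constant
```

The ratio comes out a little above 11, so the constant proved in the lemma is roughly an order of magnitude smaller than the conjectured one.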
Bloch's lemma above gives the existence of a ball of a certain size inside the image of a ball. By contrast the next lemma leads to conditions under which the values of a function do not contain a ball of certain radius. It concerns analytic functions which do not achieve the values 0 and 1.
Lemma 22.24 Let $\mathcal{F}$ denote the set of functions, $f$ defined on $\Omega$, a simply connected region which do not achieve the values 0 and 1. Then for each such function, it is possible to define a function analytic on $\Omega$, $H(z)$ by the formula
$$H(z) \equiv \log\left[\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right].$$
There exists a constant $C$ independent of $f \in \mathcal{F}$ such that $H(\Omega)$ does not contain any ball of radius $C$.
Proof: Let $f \in \mathcal{F}$. Then since $f$ does not take the value 0, there exists $g_1$ a primitive of $f'/f$. Thus
$$\frac{d}{dz}\left(e^{-g_1}f\right) = 0$$
so there exists $a, b$ such that $f(z)e^{-g_1(z)} = e^{a+bi}$. Letting $g(z) = g_1(z) + a + ib$, it follows $e^{g(z)} = f(z)$. Let $\log(f(z)) = g(z)$. Then for $n \in \mathbb{Z}$, the integers,
$$\frac{\log(f(z))}{2\pi i}, \quad \frac{\log(f(z))}{2\pi i} - 1 \neq n$$
because if equality held, then $f(z) = 1$ which does not happen. It follows $\frac{\log(f(z))}{2\pi i}$ and $\frac{\log(f(z))}{2\pi i} - 1$ are never equal to zero. Therefore, using the same reasoning, you can define a logarithm of these two quantities and therefore, a square root. Hence there exists a function analytic on $\Omega$,
$$\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}. \tag{22.20}$$
For $n$ a positive integer, this function cannot equal $\sqrt{n} \pm \sqrt{n-1}$ because if it did, then
$$\left(\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right) = \sqrt{n} \pm \sqrt{n-1} \tag{22.21}$$
and you could take reciprocals of both sides to obtain
$$\left(\sqrt{\frac{\log(f(z))}{2\pi i}} + \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right) = \sqrt{n} \mp \sqrt{n-1}. \tag{22.22}$$
Then adding 22.21 and 22.22
$$2\sqrt{\frac{\log(f(z))}{2\pi i}} = 2\sqrt{n}$$
which contradicts the above observation that $\frac{\log(f(z))}{2\pi i}$ is not equal to an integer.
Also, the function of 22.20 is never equal to zero. Therefore, you can define the logarithm of this function also. It follows
$$H(z) \equiv \log\left(\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right) \neq \ln\left(\sqrt{n} \pm \sqrt{n-1}\right) + 2m\pi i$$
where $m$ is an arbitrary integer and $n$ is a positive integer. Now
$$\lim_{n\to\infty} \ln\left(\sqrt{n} + \sqrt{n-1}\right) = \infty$$
and $\lim_{n\to\infty} \ln\left(\sqrt{n} - \sqrt{n-1}\right) = -\infty$ and so $\mathbb{C}$ is covered by rectangles having vertices at points $\ln\left(\sqrt{n} \pm \sqrt{n-1}\right) + 2m\pi i$ as described above. Each of these rectangles has height equal to $2\pi$ and a short computation shows their widths are bounded. Therefore, there exists $C$ independent of $f \in \mathcal{F}$ such that $C$ is larger than the diameter of all these rectangles. Hence $H(\Omega)$ cannot contain any ball of radius larger than $C$.
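The "short computation" about the widths can be illustrated numerically. The sketch below is not part of the notes: the helper name `vertex_real_part` and the cutoff of 2000 terms are arbitrary choices, and only the vertices with the plus sign are examined since $\ln(\sqrt{n} - \sqrt{n-1}) = -\ln(\sqrt{n} + \sqrt{n-1})$ gives the mirror image.

```python
import math


def vertex_real_part(n: int) -> float:
    """Real part ln(sqrt(n) + sqrt(n - 1)) of the omitted vertices."""
    return math.log(math.sqrt(n) + math.sqrt(n - 1))


# Widths of consecutive rectangles in the right half plane.
widths = [vertex_real_part(n + 1) - vertex_real_part(n) for n in range(1, 2001)]

max_width = max(widths)
# Diagonal of a rectangle with the largest width and height 2*pi.
diameter_bound = math.hypot(max_width, 2 * math.pi)
```

In this sample the widest rectangle is the first one, of width $\ln(1 + \sqrt{2})$, and the widths decrease afterwards, which suggests any $C$ a little larger than `diameter_bound` would serve in the lemma.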
22.5.2 The Little Picard Theorem
Now here is the little Picard theorem. It is easy to prove from the above.
Theorem 22.25 If $h$ is an entire function which omits two values then $h$ is a constant.
Proof: Suppose the two values omitted are $a$ and $b$ and that $h$ is not constant. Let $f(z) = (h(z) - a)/(b - a)$. Then $f$ omits the two values 0 and 1. Let $H$ be defined in Lemma 22.24. Then $H(z)$ is clearly not of the form $az + b$ because then it would have values equal to the vertices $\ln\left(\sqrt{n} \pm \sqrt{n-1}\right) + 2m\pi i$ or else be constant neither of which happen if $h$ is not constant. Therefore, by Liouville's theorem, $H'$ must be unbounded. Pick $\xi$ such that $|H'(\xi)| > 24C$ where $C$ is such that $H(\mathbb{C})$ contains no balls of radius larger than $C$. But by Lemma 22.23 $H(B(\xi,1))$ must contain a ball of radius $\frac{|H'(\xi)|}{24} > \frac{24C}{24} = C$, a contradiction. This proves Picard's theorem.
The following is another formulation of this theorem.
Corollary 22.26 If $f$ is a meromorphic function defined on $\mathbb{C}$ which omits three distinct values, $a, b, c$, then $f$ is a constant.
Proof: Let $\phi(z) \equiv \frac{z - a}{z - c}\,\frac{b - c}{b - a}$. Then $\phi(c) = \infty$, $\phi(a) = 0$, and $\phi(b) = 1$. Now consider the function, $h = \phi \circ f$. Then $h$ misses the three points $\infty$, 0, and 1. Since $h$ is meromorphic and does not have $\infty$ in its values, it must actually be analytic. Thus $h$ is an entire function which misses the two values 0 and 1. Therefore, $h$ is constant by Theorem 22.25.
22.5.3 Schottky's Theorem
Lemma 22.27 Let $f$ be analytic on an open set containing $\overline{B(0,R)}$ and suppose that $f$ does not take on either of the two values 0 or 1. Also suppose $|f(0)| \leq \beta$. Then letting $\theta \in (0,1)$, it follows
$$|f(z)| \leq M(\beta, \theta)$$
for all $z \in B(0, \theta R)$, where $M(\beta,\theta)$ is a function of only the two variables $\beta, \theta$. (In particular, there is no dependence on $R$.)
Proof: Consider the function, $H(z)$ used in Lemma 22.24 given by
$$H(z) \equiv \log\left(\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right). \tag{22.23}$$
You notice there are two explicit uses of logarithms. Consider first the logarithm inside the radicals. Choose this logarithm such that
$$\log(f(0)) = \ln|f(0)| + i\arg(f(0)), \quad \arg(f(0)) \in (-\pi, \pi]. \tag{22.24}$$
You can do this because
$$e^{\log(f(0))} = f(0) = e^{\ln|f(0)|}e^{i\alpha} = e^{\ln|f(0)| + i\alpha}$$
and by replacing $\alpha$ with $\alpha + 2m\pi$ for a suitable integer, $m$ it follows the above equation still holds. Therefore, you can assume 22.24. Similar reasoning applies to the logarithm on the outside of the parenthesis. It can be assumed $H(0)$ equals
$$\ln\left|\sqrt{\frac{\log(f(0))}{2\pi i}} - \sqrt{\frac{\log(f(0))}{2\pi i} - 1}\right| + i\arg\left(\sqrt{\frac{\log(f(0))}{2\pi i}} - \sqrt{\frac{\log(f(0))}{2\pi i} - 1}\right) \tag{22.25}$$
where the imaginary part is no larger than $\pi$ in absolute value.
Now if $\xi \in B(0,R)$ is a point where $H'(\xi) \neq 0$, then by Lemma 22.23
$$H(B(\xi, R - |\xi|)) \supseteq B\left(H(a), \frac{|H'(\xi)|(R - |\xi|)}{24}\right)$$
where $a$ is some point in $B(\xi, R - |\xi|)$. But by Lemma 22.24 $H(B(\xi, R - |\xi|))$ contains no balls of radius $C$ where $C$ depended only on the maximum diameters of those rectangles having vertices $\ln\left(\sqrt{n} \pm \sqrt{n-1}\right) + 2m\pi i$ for $n$ a positive integer and $m$ an integer. Therefore,
$$\frac{|H'(\xi)|(R - |\xi|)}{24} < C$$
and consequently
$$|H'(\xi)| < \frac{24C}{R - |\xi|}.$$
Even if $H'(\xi) = 0$, this inequality still holds. Therefore, if $z \in B(0,R)$ and $\gamma(0,z)$ is the straight segment from 0 to $z$,
$$|H(z) - H(0)| = \left|\int_{\gamma(0,z)} H'(w)\,dw\right| = \left|\int_0^1 H'(tz)z\,dt\right| \leq \int_0^1 |H'(tz)z|\,dt \leq \int_0^1 \frac{24C}{R - t|z|}|z|\,dt = 24C\ln\left(\frac{R}{R - |z|}\right).$$
Therefore, for $z \in \partial B(0, \theta R)$,
$$|H(z)| \leq |H(0)| + 24C\ln\left(\frac{1}{1 - \theta}\right). \tag{22.26}$$
By the maximum modulus theorem, the above inequality holds for all $|z| < \theta R$ also.
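The evaluation of the integral along the segment, and the fact that $R$ drops out once $|z| = \theta R$, can be checked with a simple midpoint rule. This is a sketch only: the function names, the step count and the sample values of $C$, $R$, $|z|$ are illustrative assumptions.

```python
import math


def segment_bound(C, R, abs_z, steps=100000):
    """Midpoint-rule value of the integral of 24 C |z| / (R - t|z|) for t in [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += 24 * C * abs_z / (R - t * abs_z)
    return total * h


def closed_form(C, R, abs_z):
    """24 C ln(R / (R - |z|)), the value used in the estimate."""
    return 24 * C * math.log(R / (R - abs_z))


numeric = segment_bound(1.0, 2.0, 1.0)       # theta = 1/2 with R = 2
exact = closed_form(1.0, 2.0, 1.0)
exact_other_R = closed_form(1.0, 10.0, 5.0)  # theta = 1/2 with R = 10
```

With $\theta = 1/2$ both choices of $R$ give $24C\ln 2$, which is the sense in which the bound 22.26 has no dependence on $R$.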
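Next I will use 22.23 to get an inequality for $|f(z)|$ in terms of $|H(z)|$. From 22.23,
$$H(z) = \log\left(\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right)$$
and so
$$2H(z) = \log\left(\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right)^2$$
$$-2H(z) = \log\left(\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right)^{-2} = \log\left(\sqrt{\frac{\log(f(z))}{2\pi i}} + \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right)^2$$
Therefore,
$$\left(\sqrt{\frac{\log(f(z))}{2\pi i}} + \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right)^2 + \left(\sqrt{\frac{\log(f(z))}{2\pi i}} - \sqrt{\frac{\log(f(z))}{2\pi i} - 1}\right)^2 = \exp(2H(z)) + \exp(-2H(z))$$
and
$$\left(\frac{\log(f(z))}{\pi i} - 1\right) = \frac{1}{2}\left(\exp(2H(z)) + \exp(-2H(z))\right).$$
Thus
$$\log(f(z)) = \pi i + \frac{\pi i}{2}\left(\exp(2H(z)) + \exp(-2H(z))\right)$$
which shows
$$|f(z)| = \left|\exp\left[\frac{\pi i}{2}\left(\exp(2H(z)) + \exp(-2H(z))\right)\right]\right| \leq \exp\left|\frac{\pi i}{2}\left(\exp(2H(z)) + \exp(-2H(z))\right)\right|$$
$$\leq \exp\left[\frac{\pi}{2}\left(|\exp(2H(z))| + |\exp(-2H(z))|\right)\right] \leq \exp\left[\frac{\pi}{2}\left(\exp(2|H(z)|) + \exp(|-2H(z)|)\right)\right] = \exp\left(\pi\exp 2|H(z)|\right).$$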
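Now from 22.26 this is dominated by
$$\exp\left(\pi\exp 2\left(|H(0)| + 24C\ln\left(\frac{1}{1-\theta}\right)\right)\right) = \exp\left(\pi\exp(2|H(0)|)\exp\left(48C\ln\left(\frac{1}{1-\theta}\right)\right)\right) \tag{22.27}$$
Consider $\exp(2|H(0)|)$. I want to obtain an inequality for this which involves $\beta$. This is where I will use the convention about the logarithms discussed above. From 22.25,
$$2|H(0)| = 2\left|\log\left(\sqrt{\frac{\log(f(0))}{2\pi i}} - \sqrt{\frac{\log(f(0))}{2\pi i} - 1}\right)\right|$$
$$\leq 2\left[\left(\ln\left|\sqrt{\frac{\log(f(0))}{2\pi i}} - \sqrt{\frac{\log(f(0))}{2\pi i} - 1}\right|\right)^2 + \pi^2\right]^{1/2}$$
$$\leq 2\left[\left(\ln\left(\left|\sqrt{\frac{\log(f(0))}{2\pi i}}\right| + \left|\sqrt{\frac{\log(f(0))}{2\pi i} - 1}\right|\right)\right)^2 + \pi^2\right]^{1/2}$$
$$\leq 2\left|\ln\left(\left|\sqrt{\frac{\log(f(0))}{2\pi i}}\right| + \left|\sqrt{\frac{\log(f(0))}{2\pi i} - 1}\right|\right)\right| + 2\pi$$
$$\leq \ln\left(2\left(\left|\frac{\log(f(0))}{2\pi i}\right| + \left|\frac{\log(f(0))}{2\pi i} - 1\right|\right)\right) + 2\pi$$
$$= \ln\left(\left|\frac{\log(f(0))}{\pi i}\right| + \left|\frac{\log(f(0))}{\pi i} - 2\right|\right) + 2\pi \tag{22.28}$$
Consider $\left|\frac{\log(f(0))}{\pi i}\right|$:
$$\frac{\log(f(0))}{\pi i} = -\frac{\ln|f(0)|}{\pi}\,i + \frac{\arg(f(0))}{\pi}$$
and so
$$\left|\frac{\log(f(0))}{\pi i}\right| = \left[\left(\frac{\ln|f(0)|}{\pi}\right)^2 + \left(\frac{\arg(f(0))}{\pi}\right)^2\right]^{1/2} \leq \left[\left(\frac{\ln\beta}{\pi}\right)^2 + \left(\frac{\pi}{\pi}\right)^2\right]^{1/2} = \left[\left(\frac{\ln\beta}{\pi}\right)^2 + 1\right]^{1/2}.$$
Similarly,
$$\left|\frac{\log(f(0))}{\pi i} - 2\right| \leq \left[\left(\frac{\ln\beta}{\pi}\right)^2 + (2 + 1)^2\right]^{1/2} = \left[\left(\frac{\ln\beta}{\pi}\right)^2 + 9\right]^{1/2}$$
It follows from 22.28 that
$$2|H(0)| \leq \ln\left(2\left[\left(\frac{\ln\beta}{\pi}\right)^2 + 9\right]^{1/2}\right) + 2\pi.$$
Hence from 22.27
$$|f(z)| \leq \exp\left[\pi\exp\left(\ln\left(2\left[\left(\frac{\ln\beta}{\pi}\right)^2 + 9\right]^{1/2}\right) + 2\pi\right)\exp\left(48C\ln\left(\frac{1}{1-\theta}\right)\right)\right]$$
and so, letting $M(\beta,\theta)$ be given by the above expression on the right, the lemma is proved.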
The following theorem will be referred to as Schottky's theorem. It looks just like the above lemma except it is only assumed that $f$ is analytic on $B(0,R)$ rather than on an open set containing $\overline{B(0,R)}$. Also, the case of an arbitrary center is included along with arbitrary points which are not attained as values of the function.
Theorem 22.28 Let $f$ be analytic on $B(z_0,R)$ and suppose that $f$ does not take on either of the two distinct values $a$ or $b$. Also suppose $|f(z_0)| \leq \beta$. Then letting $\theta \in (0,1)$, it follows
$$|f(z)| \leq M(a, b, \beta, \theta)$$
for all $z \in B(z_0, \theta R)$, where $M(a,b,\beta,\theta)$ is a function of only the variables $\beta, \theta, a, b$. (In particular, there is no dependence on $R$.)
Proof: First you can reduce to the case where the two values are 0 and 1 by considering
$$h(z) \equiv \frac{f(z) - a}{b - a}.$$
If there exists an estimate of the desired sort for $h$, then there exists such an estimate for $f$. Of course here the function, $M$ would depend on $a$ and $b$. Therefore, there is no loss of generality in assuming the points which are missed are 0 and 1.
Apply Lemma 22.27 to $B(0,R_1)$ for the function, $g(z) \equiv f(z_0 + z)$ and $R_1 < R$. Then if $\beta \geq |f(z_0)| = |g(0)|$, it follows $|g(z)| = |f(z_0 + z)| \leq M(\beta,\theta)$ for every $z \in B(0, \theta R_1)$. Now let $\theta \in (0,1)$ and choose $R_1 < R$ large enough that $\theta R = \theta_1 R_1$ where $\theta_1 \in (0,1)$. Then if $|z - z_0| < \theta R$, it follows
$$|f(z)| \leq M(\beta, \theta_1).$$
Now let $R_1 \to R$ so $\theta_1 \to \theta$.
22.5.4 A Brief Review
First recall the definition of the metric on $\widehat{\mathbb{C}}$. For convenience it is listed here again. Consider the unit sphere, $S^2$ given by $(z-1)^2 + y^2 + x^2 = 1$. Define a map from the complex plane to the surface of this sphere as follows. Extend a line from the point, $p$ in the complex plane to the point $(0,0,2)$ on the top of this sphere and let $\theta(p)$ denote the point of this sphere which the line intersects. Define $\theta(\infty) \equiv (0,0,2)$.
[Figure: the sphere with top point $(0,0,2)$ and center $(0,0,1)$, a point $p$ in the plane $\mathbb{C}$, and its image $\theta(p)$ on the sphere.]
Then $\theta^{-1}$ is sometimes called stereographic projection. The mapping $\theta$ is clearly continuous because it takes converging sequences to converging sequences. Furthermore, it is clear that $\theta^{-1}$ is also continuous. In terms of the extended complex plane, $\widehat{\mathbb{C}}$, a sequence, $z_n$ converges to $\infty$ if and only if $\theta z_n$ converges to $(0,0,2)$ and a sequence, $z_n$ converges to $z \in \mathbb{C}$ if and only if $\theta(z_n) \to \theta(z)$.
In fact this makes it easy to define a metric on $\widehat{\mathbb{C}}$.
Definition 22.29 Let $z, w \in \widehat{\mathbb{C}}$. Then let $d(z,w) \equiv |\theta(z) - \theta(w)|$ where this last distance is the usual distance measured in $\mathbb{R}^3$.
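The map $\theta$ and the metric $d$ can be written out explicitly. The following is a sketch: the formula for $\theta(p)$ is obtained here by intersecting the line through $(0,0,2)$ and $p$ with the sphere (it is not quoted from the notes), and representing $\infty$ by `math.inf` is a convention chosen for illustration.

```python
import math

NORTH = (0.0, 0.0, 2.0)   # theta(infinity), the top of the sphere


def theta(p):
    """Point where the line from (0, 0, 2) to p meets x^2 + y^2 + (z - 1)^2 = 1.

    p is a complex number, or math.inf standing for the point at infinity.
    """
    if p == math.inf:
        return NORTH
    p = complex(p)
    # Points of the line: (t x, t y, 2 - 2t); the sphere forces t = 4 / (|p|^2 + 4).
    t = 4.0 / (abs(p) ** 2 + 4.0)
    return (t * p.real, t * p.imag, 2.0 - 2.0 * t)


def d(z, w):
    """Metric of Definition 22.29: Euclidean distance between theta(z) and theta(w)."""
    return math.dist(theta(z), theta(w))


x, y, h = theta(1 + 2j)
on_sphere_error = abs(x * x + y * y + (h - 1.0) ** 2 - 1.0)
far_point_distance = d(1e6, math.inf)   # small, since 1e6 is close to infinity
```

In this picture a sequence tends to $\infty$ exactly when its images approach $(0,0,2)$, and every distance $d(z,w)$ is at most 2, the diameter of the sphere.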
Theorem 22.30 $\left(\widehat{\mathbb{C}}, d\right)$ is a compact, hence complete metric space.
Proof: Suppose $\{z_n\}$ is a sequence in $\widehat{\mathbb{C}}$. This means $\{\theta(z_n)\}$ is a sequence in $S^2$ which is compact. Therefore, there exists a subsequence, $\{\theta z_{n_k}\}$ and a point, $z \in \widehat{\mathbb{C}}$ such that $\theta z_{n_k} \to \theta z$ in $S^2$ which implies immediately that $d(z_{n_k}, z) \to 0$. A compact metric space must be complete.
Also recall the interesting fact that meromorphic functions are continuous with values in $\widehat{\mathbb{C}}$ which is reviewed here for convenience. It came from the theory of classification of isolated singularities.
Theorem 22.31 Let $\Omega$ be an open subset of $\mathbb{C}$ and let $f : \Omega \to \widehat{\mathbb{C}}$ be meromorphic. Then $f$ is continuous with respect to the metric, $d$ on $\widehat{\mathbb{C}}$.
Proof: Let $z_n \to z$ where $z \in \Omega$. Then if $z$ is a pole, it follows from Theorem 19.38 that
$$d(f(z_n), \infty) \equiv d(f(z_n), f(z)) \to 0.$$
If $z$ is not a pole, then $f(z_n) \to f(z)$ in $\mathbb{C}$ which implies $|\theta(f(z_n)) - \theta(f(z))| = d(f(z_n), f(z)) \to 0$. Recall that $\theta$ is continuous on $\mathbb{C}$.
The fundamental result behind all the theory about to be presented is the Ascoli Arzela theorem also listed here for convenience.
Definition 22.32 Let $(X,d)$ be a complete metric space. Then it is said to be locally compact if $\overline{B(x,r)}$ is compact for each $r > 0$.
Thus if you have a locally compact metric space, then if $\{a_n\}$ is a bounded sequence, it must have a convergent subsequence.
Let $K$ be a compact subset of $\mathbb{R}^n$ and consider the continuous functions which have values in a locally compact metric space, $(X,d)$ where $d$ denotes the metric on $X$. Denote this space as $C(K,X)$.
Definition 22.33 For $f, g \in C(K,X)$, where $K$ is a compact subset of $\mathbb{R}^n$ and $X$ is a locally compact complete metric space define
$$\rho_K(f,g) \equiv \sup\{d(f(x), g(x)) : x \in K\}.$$
The Ascoli Arzela theorem, Theorem 5.22 is a major result which tells which subsets of $C(K,X)$ are sequentially compact.
Definition 22.34 Let $A \subseteq C(K,X)$ for $K$ a compact subset of $\mathbb{R}^n$. Then $A$ is said to be uniformly equicontinuous if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that whenever $x, y \in K$ with $|x - y| < \delta$ and $f \in A$,
$$d(f(x), f(y)) < \varepsilon.$$
The set, $A$ is said to be uniformly bounded if for some $M < \infty$, and $a \in X$,
$$f(x) \in B(a, M)$$
for all $f \in A$ and $x \in K$.
The Ascoli Arzela theorem follows.
Theorem 22.35 Suppose $K$ is a nonempty compact subset of $\mathbb{R}^n$ and $A \subseteq C(K,X)$, is uniformly bounded and uniformly equicontinuous where $X$ is a locally compact complete metric space. Then if $\{f_k\} \subseteq A$, there exists a function, $f \in C(K,X)$ and a subsequence, $f_{k_l}$ such that
$$\lim_{l\to\infty} \rho_K(f_{k_l}, f) = 0.$$
In the cases of interest here, $X = \widehat{\mathbb{C}}$ with the metric defined above.
22.5.5 Montel's Theorem
The following lemma is another version of Montel's theorem. It is this which will make possible a proof of the big Picard theorem.
Lemma 22.36 Let $\Omega$ be a region and let $\mathcal{F}$ be a set of functions analytic on $\Omega$ none of which achieve the two distinct values, $a$ and $b$. If $\{f_n\} \subseteq \mathcal{F}$ then one of the following hold: Either there exists a function, $f$ analytic on $\Omega$ and a subsequence, $\{f_{n_k}\}$ such that for any compact subset, $K$ of $\Omega$,
$$\lim_{k\to\infty} \|f_{n_k} - f\|_{K,\infty} = 0. \tag{22.29}$$
or there exists a subsequence $\{f_{n_k}\}$ such that for all compact subsets $K$,
$$\lim_{k\to\infty} \rho_K(f_{n_k}, \infty) = 0. \tag{22.30}$$
Proof: Let $B(z_0, 2R) \subseteq \Omega$. There are two cases to consider. The first case is that there exists a subsequence, $n_k$ such that $\{f_{n_k}(z_0)\}$ is bounded. The second case is that $\lim_{n\to\infty} |f_n(z_0)| = \infty$.
Consider the first case. By Theorem 22.28 $\{f_{n_k}(z)\}$ is uniformly bounded on $B(z_0,R)$ because by this theorem, and letting $\theta = 1/2$ applied to $B(z_0, 2R)$, it follows $|f_{n_k}(z)| \leq M\left(a, b, \beta, \frac{1}{2}\right)$ where $\beta$ is an upper bound to the numbers, $|f_{n_k}(z_0)|$. The Cauchy integral formula implies the existence of a uniform bound on the $\{f_{n_k}'\}$ which implies the functions are equicontinuous and uniformly bounded. Therefore, by the Ascoli Arzela theorem there exists a further subsequence which converges uniformly on $B(z_0,R)$ to a function, $f$ analytic on $B(z_0,R)$. Thus denoting this subsequence by $\{f_{n_k}\}$ to save on notation,
$$\lim_{k\to\infty} \|f_{n_k} - f\|_{B(z_0,R),\infty} = 0. \tag{22.31}$$
Consider the second case. In this case, it follows $\{1/f_n(z_0)\}$ is bounded on $B(z_0,R)$ and so by the same argument just given $\{1/f_n(z)\}$ is uniformly bounded on $B(z_0,R)$. Therefore, a subsequence converges uniformly on $B(z_0,R)$. But $\{1/f_n(z)\}$ converges to 0 and so this requires that $\{1/f_n(z)\}$ must converge uniformly to 0. Therefore,
$$\lim_{k\to\infty} \rho_{B(z_0,R)}(f_{n_k}, \infty) = 0. \tag{22.32}$$
Now let $\{D_k\}$ denote a countable set of closed balls, $D_k = \overline{B(z_k, R_k)}$ such that $B(z_k, 2R_k) \subseteq \Omega$ and $\cup_{k=1}^{\infty} \operatorname{int}(D_k) = \Omega$. Using a Cantor diagonal process, there exists a subsequence, $\{f_{n_k}\}$ of $\{f_n\}$ such that for each $D_j$, one of the above two alternatives holds. That is, either
$$\lim_{k\to\infty} \|f_{n_k} - g_j\|_{D_j,\infty} = 0 \tag{22.33}$$
or,
$$\lim_{k\to\infty} \rho_{D_j}(f_{n_k}, \infty) = 0. \tag{22.34}$$
Let $A = \cup\{\operatorname{int}(D_j) : \text{22.33 holds}\}$, $B = \cup\{\operatorname{int}(D_j) : \text{22.34 holds}\}$. Note that the balls whose union is $A$ cannot intersect any of the balls whose union is $B$. Therefore, one of $A$ or $B$ must be empty since otherwise, $\Omega$ would not be connected.
If $K$ is any compact subset of $\Omega$, it follows $K$ must be a subset of some finite collection of the $D_j$. Therefore, one of the alternatives in the lemma must hold. That the limit function, $f$ must be analytic follows easily in the same way as the proof in Theorem 22.7 on Page 582. You could also use Morera's theorem. This proves the lemma.
22.5.6 The Great Big Picard Theorem
The next theorem is the main result which the above lemmas lead to. It is the Big
Picard theorem, also called the Great Picard theorem.Recall B
(a, r) is the deleted

ball consisting of all the points of the ball except the center.
Theorem 22.37 Suppose f has an isolated essential singularity at 0. Then for
every R > 0, and C, f
1
() B
(0, R) is an innite set except for one possible

exceptional .
Proof: Suppose this is not true. Then there exists $R_1 > 0$ and two points, $\alpha$ and $\beta$ such that $f^{-1}(\beta) \cap B'(0, R_1)$ and $f^{-1}(\alpha) \cap B'(0, R_1)$ are both finite sets. Then shrinking $R_1$ and calling the result R, there exists $B(0, R)$ such that
\[ f^{-1}(\beta) \cap B'(0, R) = \emptyset, \quad f^{-1}(\alpha) \cap B'(0, R) = \emptyset. \]
Now let $A_0$ denote the annulus $\{z \in \mathbb{C} : \frac{R}{2^2} < |z| < \frac{3R}{2^2}\}$ and let $A_n$ denote the annulus $\{z \in \mathbb{C} : \frac{R}{2^{2+n}} < |z| < \frac{3R}{2^{2+n}}\}$. The reason for the 3 is to insure that $A_n \cap A_{n+1} \neq \emptyset$. This follows from the observation that $3R/2^{2+1+n} > R/2^{2+n}$. Now define a set of functions on $A_0$ as follows:
\[ f_n(z) \equiv f\left(\frac{z}{2^n}\right). \]
By the choice of R, this set of functions misses the two points $\alpha$ and $\beta$. Therefore, by Lemma 22.36 there exists a subsequence such that one of the two options presented there holds.
First suppose $\lim_{k\to\infty} \|f_{n_k} - f\|_{K,\infty} = 0$ for all K a compact subset of $A_0$ and f is analytic on $A_0$. In particular, this happens for $\gamma_0$ the circular contour having radius $R/2$. Thus $f_{n_k}$ must be bounded on this contour. But this says the same thing as $f(z/2^{n_k})$ is bounded for $|z| = R/2$, this holding for each $k = 1, 2, \cdots$. Thus there exists a constant, M such that on each of a shrinking sequence of concentric circles whose radii converge to 0, $|f(z)| \le M$. By the maximum modulus theorem, $|f(z)| \le M$ at every point between successive circles in this sequence. Therefore, $|f(z)| \le M$ in $B'(0, R)$ contradicting the Weierstrass Casorati theorem.
The other option which might hold from Lemma 22.36 is that $\lim_{k\to\infty} \rho_K(f_{n_k}, \infty) = 0$ for all K compact subset of $A_0$. Since f has an essential singularity at 0 the zeros of f in $B(0, R)$ are isolated. Therefore, for all k large enough, $f_{n_k}$ has no zeros for $|z| < 3R/2^2$. This is because the values of $f_{n_k}$ are the values of f on $A_{n_k}$, a small annulus which avoids all the zeros of f whenever k is large enough. Only consider k this large. Then use the above argument on the analytic functions $1/f_{n_k}$. By the assumption that $\lim_{k\to\infty} \rho_K(f_{n_k}, \infty) = 0$, it follows $\lim_{k\to\infty} \|1/f_{n_k} - 0\|_{K,\infty} = 0$ and so as above, there exists a shrinking sequence of concentric circles whose radii converge to 0 and a constant, M such that for z on any of these circles, $|1/f(z)| \le M$. This implies that on some deleted ball, $B'(0, r)$ where $r \le R$, $|f(z)| \ge 1/M$ which again violates the Weierstrass Casorati theorem. This proves the theorem.
As a simple corollary, here is what this remarkable theorem says about entire functions.
Corollary 22.38 Suppose f is entire and nonconstant and not a polynomial. Then f assumes every complex value infinitely many times with the possible exception of one.
Proof: Since f is entire, $f(z) = \sum_{n=0}^{\infty} a_n z^n$. Define for $z \neq 0$,
\[ g(z) \equiv f\left(\frac{1}{z}\right) = \sum_{n=0}^{\infty} a_n \left(\frac{1}{z}\right)^n. \]
Since f is not a polynomial, infinitely many of the $a_n$ are nonzero, so this Laurent series has infinitely many negative powers of z. Thus 0 is an isolated essential singular point of g. By the big Picard theorem, Theorem 22.37, it follows g takes every complex number but possibly one an infinite number of times. This proves the corollary.
Note the difference between this and the little Picard theorem, which says that an entire function which is not constant must achieve every complex value with at most one exception. The little Picard theorem says nothing about how often each value is achieved.
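Here is a small numerical illustration of Corollary 22.38, offered only as a sketch. It uses the entire function exp, which is not a polynomial and whose one exceptional value is 0. The helper name `exp_preimages` and the sample value `w` are my own choices for the illustration.

```python
import cmath
import math

def exp_preimages(w, count):
    """Return `count` distinct solutions z of exp(z) = w, for w != 0."""
    if w == 0:
        # 0 is the single exceptional value allowed by the corollary.
        raise ValueError("exp never takes the value 0")
    principal = cmath.log(w)
    # All solutions differ from the principal one by integer multiples of 2*pi*i.
    return [principal + 2j * math.pi * k for k in range(count)]

w = 3 - 4j
solutions = exp_preimages(w, 5)
residuals = [abs(cmath.exp(z) - w) for z in solutions]
```

Every nonzero w has the infinitely many preimages log w + 2πik, while the value 0 is never taken, which is exactly the pattern the corollary allows.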
22.6 Exercises
1. Prove that in Theorem 22.7 it suffices to assume $\mathcal{F}$ is uniformly bounded on each compact subset of $\Omega$.
2. Find conditions on a, b, c, d such that the fractional linear transformation, $\frac{az+b}{cz+d}$ maps the upper half plane onto the upper half plane.
3. Let D be a simply connected region which is a proper subset of C. Does there
exist an entire function, f which maps C onto D? Why or why not?
4. Verify the conclusion of Theorem 22.7 involving the higher order derivatives.
5. What if $\Omega = \mathbb{C}$? Does there exist an analytic function, f mapping $\Omega$ one to one and onto $B(0, 1)$? Explain why or why not. Was $\Omega \neq \mathbb{C}$ used in the proof of the Riemann mapping theorem?
6. Verify that $|\phi_\alpha(z)| = 1$ if $|z| = 1$. Apply the maximum modulus theorem to conclude that $|\phi_\alpha(z)| \le 1$ for all $|z| < 1$.
7. Suppose that $|f(z)| \le 1$ for $|z| = 1$ and $f(\alpha) = 0$ for $|\alpha| < 1$. Show that $|f(z)| \le |\phi_\alpha(z)|$ for all $z \in B(0, 1)$. Hint: Consider $\frac{f(z)(1 - \overline{\alpha} z)}{z - \alpha}$ which has a removable singularity at $\alpha$. Show the modulus of this function is bounded by 1 on $|z| = 1$. Then apply the maximum modulus theorem.
8. Let U and V be open subsets of $\mathbb{C}$ and suppose $u : U \to \mathbb{R}$ is harmonic while h is an analytic map which takes V one to one onto U. Show that $u \circ h$ is harmonic on V.
9. Show that for a harmonic function, u defined on $B(0, R)$, there exists an analytic function, $h = u + iv$ where
\[ v(x, y) \equiv \int_0^y u_x(x, t)\, dt - \int_0^x u_y(t, 0)\, dt. \]
10. Suppose $\Omega$ is a simply connected region and u is a real valued function defined on $\Omega$ such that u is harmonic. Show there exists an analytic function, f such that $u = \operatorname{Re} f$. Show this is not true if $\Omega$ is not a simply connected region. Hint: You might use the Riemann mapping theorem and Problems 8 and 9. For the second part it might be good to try something like $u(x, y) = \ln(x^2 + y^2)$ on the annulus $1 < |z| < 2$.
11. Show that $w = \frac{1+z}{1-z}$ maps $\{z \in \mathbb{C} : \operatorname{Im} z > 0 \text{ and } |z| < 1\}$ to the first quadrant, $\{z = x + iy : x, y > 0\}$.
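12. Let $f(z) = \frac{az+b}{cz+d}$ and let $g(z) = \frac{a_1 z + b_1}{c_1 z + d_1}$. Show that $f \circ g(z)$ equals the quotient of two expressions, the numerator being the top entry in the vector
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} a_1 & b_1 \\ c_1 & d_1 \end{pmatrix} \begin{pmatrix} z \\ 1 \end{pmatrix} \]
and the denominator being the bottom entry. Show that if you define
\[ \phi\left( \begin{pmatrix} a & b \\ c & d \end{pmatrix} \right) \equiv \frac{az + b}{cz + d}, \]
then $\phi(AB) = \phi(A) \circ \phi(B)$. Find an easy way to find the inverse of $f(z) = \frac{az+b}{cz+d}$ and give a condition on the a, b, c, d which insures this function has an inverse.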
13. The modular group$^2$ is the set of fractional linear transformations, $\frac{az+b}{cz+d}$ such that a, b, c, d are integers and $ad - bc = 1$. Using Problem 12 or brute force show this modular group is really a group with the group operation being composition. Also show the inverse of $\frac{az+b}{cz+d}$ is $\frac{dz - b}{-cz + a}$.
14. Let $\Omega$ be a region and suppose f is analytic on $\Omega$ and that the functions $f_n$ are also analytic on $\Omega$ and converge to f uniformly on compact subsets of $\Omega$. Suppose f is one to one. Can it be concluded that for an arbitrary compact set, $K \subseteq \Omega$ that $f_n$ is one to one on K for all n large enough?
15. The Vitali theorem says that if $\Omega$ is a region and $\{f_n\}$ is a uniformly bounded sequence of analytic functions which converges pointwise on a set, $S \subseteq \Omega$ which has a limit point in $\Omega$, then in fact, $\{f_n\}$ must converge uniformly on compact subsets of $\Omega$ to an analytic function. Prove this theorem. Hint: If the sequence fails to converge, show you can get two different subsequences converging uniformly on compact sets to different functions. Then argue these two functions coincide on S.
16. Does there exist a function analytic on $B(0, 1)$ which maps $B(0, 1)$ onto $B'(0, 1)$, the open unit ball in which 0 has been deleted?
$^2$ This is the terminology used in Rudin's book Real and Complex Analysis.
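The correspondence between fractional linear transformations and 2 by 2 matrices in Problems 12 and 13 can be checked numerically before you prove it. The following is only a sketch with hypothetical helper names (`mobius`, `matmul2`, `inverse_matrix`), and the two sample matrices are arbitrary integer matrices of determinant 1.

```python
def mobius(matrix):
    """Return the map z -> (a z + b) / (c z + d) for matrix ((a, b), (c, d))."""
    (a, b), (c, d) = matrix
    return lambda z: (a * z + b) / (c * z + d)

def matmul2(m1, m2):
    """Ordinary product of two 2 x 2 matrices given as nested tuples."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def inverse_matrix(matrix):
    """Adjugate of the matrix; it induces the inverse map when ad - bc != 0."""
    (a, b), (c, d) = matrix
    if a * d - b * c == 0:
        raise ValueError("ad - bc = 0: the map is not invertible")
    return ((d, -b), (-c, a))

A = ((2, 1), (1, 1))   # ad - bc = 1
B = ((1, 3), (0, 1))   # ad - bc = 1
z = 0.3 + 0.7j

composed = mobius(A)(mobius(B)(z))        # phi(A) o phi(B) evaluated at z
via_product = mobius(matmul2(A, B))(z)    # phi(AB) evaluated at z
round_trip = mobius(inverse_matrix(A))(mobius(A)(z))
```

The adjugate is used rather than the true matrix inverse because a scalar multiple of a matrix induces the same transformation, and this keeps integer entries integer, as Problem 13 requires.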
Approximation By Rational
Functions
23.1 Runge's Theorem
Consider the function, $\frac{1}{z} = f(z)$ for z defined on $\Omega \equiv B(0, 1) \setminus \{0\} = B'(0, 1)$. Clearly f is analytic on $\Omega$. Suppose you could approximate f uniformly by polynomials on the closed annulus $\overline{\operatorname{ann}(0, \frac{1}{2}, \frac{3}{4})} = \{z : \frac{1}{2} \le |z| \le \frac{3}{4}\}$, a compact subset of $\Omega$. Then, there would exist a suitable polynomial $p(z)$, such that
\[ \left| \frac{1}{2\pi i} \int_\gamma (f(z) - p(z))\, dz \right| < \frac{1}{10} \]
where here $\gamma$ is a circle of radius $\frac{2}{3}$. However, this is impossible because $\frac{1}{2\pi i} \int_\gamma f(z)\, dz = 1$ while $\frac{1}{2\pi i} \int_\gamma p(z)\, dz = 0$. This shows you can't expect to be able to uniformly approximate analytic functions on compact sets using polynomials. This is just horrible! In real variables, you can approximate any continuous function on a compact set with a polynomial. However, that is just the way it is. It turns out that the ability to approximate an analytic function on $\Omega$ with polynomials is dependent on $\Omega$ being simply connected.
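The obstruction just described is easy to see numerically. The sketch below approximates the normalized contour integral over the circle of radius 2/3 by a Riemann sum. The function name `contour_integral_over_circle`, the sample count, and the particular polynomial are assumptions made only for this illustration.

```python
import cmath
import math

def contour_integral_over_circle(func, radius, samples=2000):
    """Approximate (1/(2 pi i)) * integral of func over |z| = radius, counterclockwise."""
    total = 0j
    step = 2 * math.pi / samples
    for idx in range(samples):
        z = radius * cmath.exp(1j * step * idx)
        dz = 1j * z * step          # derivative of the parametrization times d(theta)
        total += func(z) * dz
    return total / (2j * math.pi)

def sample_polynomial(z):
    # An arbitrary polynomial; any other polynomial gives the same integral, 0.
    return 1 - 2 * z + 0.5 * z ** 3

radius = 2 / 3
integral_of_reciprocal = contour_integral_over_circle(lambda z: 1 / z, radius)
integral_of_polynomial = contour_integral_over_circle(sample_polynomial, radius)
```

Since the length of the circle is $2\pi r$, the difference of 1 between the two integrals forces $\sup |f - p| \ge 1/r = 3/2$ on the circle, no matter which polynomial p is used.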
All these theorems work for f having values in a complex Banach space. How-
ever, I will present them in the context of functions which have values in C. The
changes necessary to obtain the extra generality are very minor.
Definition 23.1 Approximation will be taken with respect to the following norm.
\[ \|f - g\|_{K,\infty} \equiv \sup\{\|f(z) - g(z)\| : z \in K\} \]
23.1.1 Approximation With Rational Functions
It turns out you can approximate analytic functions by rational functions, quotients of polynomials. The resulting theorem is one of the most profound theorems in complex analysis. The basic idea is simple. The Riemann sums for the Cauchy integral formula are rational functions. The idea used to implement this observation is that if you have a compact subset, K of an open set, $\Omega$ there exists a cycle composed of closed oriented curves $\{\gamma_j\}_{j=1}^n$ which are contained in $\Omega \setminus K$ such that for every $z \in K$, $\sum_{k=1}^n n(\gamma_k, z) = 1$. One more ingredient is needed and this is a theorem which lets you keep the approximation but move the poles.
To begin with, consider the part about the cycle of closed oriented curves. Recall
Theorem 19.52 which is stated for convenience.
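Theorem 23.2 Let K be a compact subset of an open set, $\Omega$. Then there exist continuous, closed, bounded variation oriented curves $\{\gamma_j\}_{j=1}^m$ for which $\gamma_j^* \cap K = \emptyset$ for each j, $\gamma_j^* \subseteq \Omega$, and for all $p \in K$,
\[ \sum_{k=1}^m n(p, \gamma_k) = 1, \]
and
\[ \sum_{k=1}^m n(z, \gamma_k) = 0 \]
for all $z \notin \Omega$.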
This theorem implies the following.
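Theorem 23.3 Let $K \subseteq \Omega$ where K is compact and $\Omega$ is open, and let f be analytic on $\Omega$. Then there exist oriented closed curves, $\gamma_k$ such that $\gamma_k^* \cap K = \emptyset$ but $\gamma_k^* \subseteq \Omega$, such that for all $z \in K$,
\[ f(z) = \frac{1}{2\pi i} \sum_{k=1}^p \int_{\gamma_k} \frac{f(w)}{w - z}\, dw. \tag{23.1} \]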
Proof: This follows from Theorem 19.52 and the Cauchy integral formula. As shown in the proof, you can assume the $\gamma_k$ are linear mappings but this is not important.
Next I will show how the Cauchy integral formula leads to approximation by
rational functions, quotients of polynomials.
Lemma 23.4 Let K be a compact subset of an open set, $\Omega$ and let f be analytic on $\Omega$. Then for each $\varepsilon > 0$ there exists a rational function, Q whose poles are not in K such that
\[ \|Q - f\|_{K,\infty} < \varepsilon. \]
Proof: By Theorem 23.3 there are oriented curves, $\gamma_k$ described there such that for all $z \in K$,
\[ f(z) = \frac{1}{2\pi i} \sum_{k=1}^p \int_{\gamma_k} \frac{f(w)}{w - z}\, dw. \tag{23.2} \]
Defining $g(w, z) \equiv \frac{f(w)}{w - z}$ for $(w, z) \in \bigcup_{k=1}^p \gamma_k^* \times K$, it follows since the distance between K and $\bigcup_k \gamma_k^*$ is positive that g is uniformly continuous and so there exists a $\delta > 0$ such that if $\|\mathcal{P}\| < \delta$, then for all $z \in K$,
\[ \left| f(z) - \frac{1}{2\pi i} \sum_{k=1}^p \sum_{j=1}^n \frac{f(\gamma_k(\tau_j)) \left(\gamma_k(t_j) - \gamma_k(t_{j-1})\right)}{\gamma_k(\tau_j) - z} \right| < \frac{\varepsilon}{2}. \]
The complicated expression is obtained by replacing each integral in 23.2 with a Riemann sum. Simplifying the appearance of this, it follows there exists a rational function of the form
\[ R(z) = \sum_{k=1}^M \frac{A_k}{w_k - z} \]
where the $w_k$ are elements of components of $\mathbb{C} \setminus K$ and $A_k$ are complex numbers, or in the case where f has values in X, these would be elements of X, such that
\[ \|R - f\|_{K,\infty} < \frac{\varepsilon}{2}. \]
Taking $Q = R$ proves the lemma.
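To make the proof of Lemma 23.4 concrete, here is a sketch which builds the rational function R from a Riemann sum in one simple situation: f = exp, the single curve is the unit circle, and K is the closed disk of radius 1/2. These choices, the number of nodes, and the name `cauchy_riemann_sum` are all assumptions for the illustration and not part of the lemma.

```python
import cmath
import math

def cauchy_riemann_sum(f, nodes):
    """Return R(z) = sum_k A_k / (w_k - z), a Riemann sum for the Cauchy
    integral of f over the unit circle with `nodes` equally spaced points."""
    terms = []
    for idx in range(nodes):
        w = cmath.exp(2j * math.pi * idx / nodes)
        w_next = cmath.exp(2j * math.pi * (idx + 1) / nodes)
        # A_k = f(w_k) * (gamma(t_k) - gamma(t_{k-1})) / (2 pi i), pole at w_k.
        coefficient = f(w) * (w_next - w) / (2j * math.pi)
        terms.append((coefficient, w))

    def rational(z):
        return sum(a_k / (w_k - z) for a_k, w_k in terms)

    return rational

R = cauchy_riemann_sum(cmath.exp, 2000)
# Sample points of K = closed disk of radius 1/2: boundary points plus two interior ones.
sample_points = [0.5 * cmath.exp(2j * math.pi * k / 12) for k in range(12)] + [0j, 0.25j]
max_error = max(abs(R(z) - cmath.exp(z)) for z in sample_points)
```

All the poles $w_k$ lie on the unit circle, hence off K, and the error on the sample points shrinks as the partition is refined. The poles are dictated by the curve, which is the defect the next subsection repairs.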
23.1.2 Moving The Poles And Keeping The Approximation
Lemma 23.4 is a nice lemma but needs refining. In this lemma, the Riemann sum handed you the poles. It is much better if you can pick the poles. The following theorem from advanced calculus, called Mertens' theorem, will be used.
23.1.3 Mertens' Theorem
Theorem 23.5 Suppose $\sum_{i=r}^\infty a_i$ and $\sum_{j=r}^\infty b_j$ both converge absolutely.$^1$ Then
\[ \left( \sum_{i=r}^\infty a_i \right) \left( \sum_{j=r}^\infty b_j \right) = \sum_{n=r}^\infty c_n \]
where
\[ c_n = \sum_{k=r}^n a_k b_{n-k+r}. \]
Proof: Let $p_{nk} = 1$ if $r \le k \le n$ and $p_{nk} = 0$ if $k > n$. Then
\[ c_n = \sum_{k=r}^\infty p_{nk} a_k b_{n-k+r}. \]
1
Actually, it is only necessary to assume one of the series converges and the other converges
absolutely. This is known as Mertens theorem and may be read in the 1974 book by Apostol
listed in the bibliography.
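Also,
\[ \sum_{k=r}^\infty \sum_{n=r}^\infty p_{nk} |a_k| |b_{n-k+r}| = \sum_{k=r}^\infty |a_k| \sum_{n=r}^\infty p_{nk} |b_{n-k+r}| = \sum_{k=r}^\infty |a_k| \sum_{n=k}^\infty |b_{n-k+r}| \]
\[ = \sum_{k=r}^\infty |a_k| \sum_{n=k}^\infty |b_{n-(k-r)}| = \sum_{k=r}^\infty |a_k| \sum_{m=r}^\infty |b_m| < \infty. \]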
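Therefore, since the double series converges absolutely, the order of summation may be interchanged and
\[ \sum_{n=r}^\infty c_n = \sum_{n=r}^\infty \sum_{k=r}^n a_k b_{n-k+r} = \sum_{n=r}^\infty \sum_{k=r}^\infty p_{nk} a_k b_{n-k+r} = \sum_{k=r}^\infty a_k \sum_{n=r}^\infty p_{nk} b_{n-k+r} = \sum_{k=r}^\infty a_k \sum_{n=k}^\infty b_{n-k+r} = \sum_{k=r}^\infty a_k \sum_{m=r}^\infty b_m. \]
It follows that $\sum_{n=r}^\infty c_n$ converges absolutely. Also, you can see by induction that you can multiply any number of absolutely convergent series together and obtain a series which is absolutely convergent. Next, here are some similar results related to Mertens' theorem.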
Lemma 23.6 Let $\sum_{n=0}^\infty a_n(z)$ and $\sum_{n=0}^\infty b_n(z)$ be two convergent series for $z \in K$ which satisfy the conditions of the Weierstrass M test. Thus there exist positive constants, $A_n$ and $B_n$ such that $|a_n(z)| \le A_n$, $|b_n(z)| \le B_n$ for all $z \in K$ and $\sum_{n=0}^\infty A_n < \infty$, $\sum_{n=0}^\infty B_n < \infty$. Then defining the Cauchy product,
\[ c_n(z) \equiv \sum_{k=0}^n a_{n-k}(z) b_k(z), \]
it follows $\sum_{n=0}^\infty c_n(z)$ also converges absolutely and uniformly on K because $c_n(z)$ satisfies the conditions of the Weierstrass M test. Therefore,
\[ \sum_{n=0}^\infty c_n(z) = \left( \sum_{k=0}^\infty a_k(z) \right) \left( \sum_{n=0}^\infty b_n(z) \right). \tag{23.3} \]
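Proof:
\[ |c_n(z)| \le \sum_{k=0}^n |a_{n-k}(z)| |b_k(z)| \le \sum_{k=0}^n A_{n-k} B_k. \]
Also,
\[ \sum_{n=0}^\infty \sum_{k=0}^n A_{n-k} B_k = \sum_{k=0}^\infty \sum_{n=k}^\infty A_{n-k} B_k = \sum_{k=0}^\infty B_k \sum_{n=0}^\infty A_n < \infty. \]
The claim of 23.3 follows from Mertens' theorem. This proves the lemma.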
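If you want to see the Cauchy product of Theorem 23.5 and Lemma 23.6 in action, the following sketch multiplies the exponential series for two real numbers, taking r = 0. The numbers `x` and `y`, the truncation at 40 terms, and the name `cauchy_product` are assumptions chosen for the illustration.

```python
import math

def cauchy_product(a, b):
    """Return c_n = sum_{k=0}^{n} a_k * b_{n-k} for two coefficient lists (the case r = 0)."""
    n_terms = min(len(a), len(b))
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(n_terms)]

x, y = 0.7, -1.3
terms = 40
a = [x ** n / math.factorial(n) for n in range(terms)]   # series for exp(x)
b = [y ** n / math.factorial(n) for n in range(terms)]   # series for exp(y)
c = cauchy_product(a, b)

product_of_sums = sum(a) * sum(b)
sum_of_product = sum(c)
```

By the binomial theorem $c_n = (x+y)^n / n!$ here, so the product series is the series for $e^{x+y}$, in agreement with $e^x e^y = e^{x+y}$.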
Corollary 23.7 Let P be a polynomial and let $\sum_{n=0}^\infty a_n(z)$ converge uniformly and absolutely on K such that the $a_n$ satisfy the conditions of the Weierstrass M test. Then there exists a series for $P\left(\sum_{n=0}^\infty a_n(z)\right)$, $\sum_{n=0}^\infty c_n(z)$, which also converges absolutely and uniformly for $z \in K$ because $c_n(z)$ also satisfies the conditions of the Weierstrass M test.
The following picture is descriptive of the following lemma. This lemma says that if you have a rational function with one pole off a compact set, then you can approximate on the compact set with another rational function which has a different pole.
[Figure: a compact set K, a component V of its complement, and two points a and b in V.]
Lemma 23.8 Let R be a rational function which has a pole only at $a \in V$, a component of $\mathbb{C} \setminus K$ where K is a compact set. Suppose $b \in V$. Then for $\varepsilon > 0$ given, there exists a rational function, Q, having a pole only at b such that
\[ \|R - Q\|_{K,\infty} < \varepsilon. \tag{23.4} \]
If it happens that V is unbounded, then there exists a polynomial, P such that
\[ \|R - P\|_{K,\infty} < \varepsilon. \tag{23.5} \]
Proof: Say that $b \in V$ satisfies $\mathcal{P}$ if for all $\varepsilon > 0$ there exists a rational function, $Q_b$, having a pole only at b such that
\[ \|R - Q_b\|_{K,\infty} < \varepsilon. \]
Now define a set,
\[ S \equiv \{b \in V : b \text{ satisfies } \mathcal{P}\}. \]
Observe that $S \neq \emptyset$ because $a \in S$.
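I claim S is open. Suppose $b_1 \in S$. Then there exists a $\delta > 0$ such that
\[ \left| \frac{b_1 - b}{z - b} \right| < \frac{1}{2} \tag{23.6} \]
for all $z \in K$ whenever $b \in B(b_1, \delta)$. In fact, it suffices to take $|b - b_1| < \operatorname{dist}(b_1, K)/4$ because then
\[ \left| \frac{b_1 - b}{z - b} \right| < \frac{\operatorname{dist}(b_1, K)/4}{|z - b|} \le \frac{\operatorname{dist}(b_1, K)/4}{|z - b_1| - |b_1 - b|} \le \frac{\operatorname{dist}(b_1, K)/4}{\operatorname{dist}(b_1, K) - \operatorname{dist}(b_1, K)/4} \le \frac{1}{3} < \frac{1}{2}. \]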
Since $b_1$ satisfies $\mathcal{P}$, there exists a rational function $Q_{b_1}$ with the desired properties. It is shown next that you can approximate $Q_{b_1}$ with $Q_b$ thus yielding an approximation to R by the use of the triangle inequality,
\[ \|R - Q_{b_1}\|_{K,\infty} + \|Q_{b_1} - Q_b\|_{K,\infty} \ge \|R - Q_b\|_{K,\infty}. \]
Since $Q_{b_1}$ has poles only at $b_1$, it follows it is a sum of functions of the form $\frac{\alpha_n}{(z - b_1)^n}$. Therefore, it suffices to consider the terms of $Q_{b_1}$ or that $Q_{b_1}$ is of the special form
\[ Q_{b_1}(z) = \frac{1}{(z - b_1)^n}. \]
However,
\[ \frac{1}{(z - b_1)^n} = \frac{1}{(z - b)^n \left(1 - \frac{b_1 - b}{z - b}\right)^n}. \]
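Now from the choice of b in 23.6, the series
\[ \sum_{k=0}^\infty \left( \frac{b_1 - b}{z - b} \right)^k = \frac{1}{1 - \frac{b_1 - b}{z - b}} \]
converges absolutely independent of the choice of $z \in K$ because
\[ \left| \left( \frac{b_1 - b}{z - b} \right)^k \right| < \frac{1}{2^k}. \]
By Corollary 23.7 the same is true of the series for $\frac{1}{\left(1 - \frac{b_1 - b}{z - b}\right)^n}$. Thus a suitable partial sum, multiplied by $\frac{1}{(z - b)^n}$, can be made uniformly on K as close as desired to $\frac{1}{(z - b_1)^n}$. This shows that b satisfies $\mathcal{P}$ whenever b is close enough to $b_1$, verifying that S is open.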
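The step just completed, trading a pole at $b_1$ for a nearby pole at b, can be tried out numerically in the simplest case n = 1. In the sketch below K is the closed unit disk, $b_1 = 3$ and $b = 3.4$, so that $|b - b_1| < \operatorname{dist}(b_1, K)/4$. These numbers and the name `shifted_pole_partial_sum` are assumptions for the illustration.

```python
import cmath
import math

def shifted_pole_partial_sum(z, old_pole, new_pole, terms):
    """Partial sum of 1/(z - old_pole) = sum_k (old_pole - new_pole)^k / (z - new_pole)^(k+1).

    Every term is a rational function of z whose only pole is new_pole."""
    ratio = (old_pole - new_pole) / (z - new_pole)
    return sum(ratio ** k for k in range(terms)) / (z - new_pole)

old_pole, new_pole = 3.0, 3.4
# Sample points of the closed unit disk K: the center and two circles.
samples = [r * cmath.exp(2j * math.pi * k / 16) for r in (0.0, 0.5, 1.0) for k in range(16)]

worst_ratio = max(abs((old_pole - new_pole) / (z - new_pole)) for z in samples)
worst_error = max(
    abs(shifted_pole_partial_sum(z, old_pole, new_pole, 25) - 1 / (z - old_pole))
    for z in samples
)
```

The ratio in 23.6 stays well below 1/2 on K, so the geometric series converges uniformly there and a modest number of terms already reproduces $1/(z - b_1)$ to machine precision.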
Next it is shown S is closed in V. Let $b_n \in S$ and suppose $b_n \to b \in V$. Then since $b_n \in S$, there exists a rational function, $Q_{b_n}$ such that
\[ \|Q_{b_n} - R\|_{K,\infty} < \frac{\varepsilon}{2}. \]
Then for all n large enough,
\[ \frac{1}{2} \operatorname{dist}(b, K) \ge |b_n - b| \]
and so for all n large enough,
\[ \left| \frac{b - b_n}{z - b_n} \right| < \frac{1}{2}, \]
for all $z \in K$. Pick such a $b_n$. As before, it suffices to assume $Q_{b_n}$ is of the form $\frac{1}{(z - b_n)^n}$. Then
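\[ Q_{b_n}(z) = \frac{1}{(z - b_n)^n} = \frac{1}{(z - b)^n \left(1 - \frac{b_n - b}{z - b}\right)^n} \]
and because of the estimate, there exists M such that for all $z \in K$
\[ \left| \frac{1}{\left(1 - \frac{b_n - b}{z - b}\right)^n} - \sum_{k=0}^M a_k \left( \frac{b_n - b}{z - b} \right)^k \right| < \frac{\varepsilon \left(\operatorname{dist}(b, K)\right)^n}{2}. \tag{23.7} \]
Therefore, for all $z \in K$
\[ \left| Q_{b_n}(z) - \frac{1}{(z - b)^n} \sum_{k=0}^M a_k \left( \frac{b_n - b}{z - b} \right)^k \right| = \left| \frac{1}{(z - b)^n \left(1 - \frac{b_n - b}{z - b}\right)^n} - \frac{1}{(z - b)^n} \sum_{k=0}^M a_k \left( \frac{b_n - b}{z - b} \right)^k \right| \]
\[ \le \frac{\varepsilon \left(\operatorname{dist}(b, K)\right)^n}{2} \cdot \frac{1}{\operatorname{dist}(b, K)^n} = \frac{\varepsilon}{2} \]
and so, letting $Q_b(z) = \frac{1}{(z - b)^n} \sum_{k=0}^M a_k \left( \frac{b_n - b}{z - b} \right)^k$,
\[ \|R - Q_b\|_{K,\infty} \le \|R - Q_{b_n}\|_{K,\infty} + \|Q_{b_n} - Q_b\|_{K,\infty} < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon \]
showing that $b \in S$. Since S is both open and closed in V it follows that, since $S \neq \emptyset$, $S = V$. Otherwise V would fail to be connected.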
It remains to consider the case where V is unbounded. Pick $b \in V$ large enough that
\[ \left| \frac{z}{b} \right| < \frac{1}{2} \tag{23.8} \]
for all $z \in K$. From what was just shown, there exists a rational function, $Q_b$ having a pole only at b such that $\|Q_b - R\|_{K,\infty} < \frac{\varepsilon}{2}$. It suffices to assume that $Q_b$ is of the form
\[ Q_b(z) = \frac{p(z)}{(z - b)^n} = p(z) (-1)^n \frac{1}{b^n} \frac{1}{\left(1 - \frac{z}{b}\right)^n} = p(z) (-1)^n \frac{1}{b^n} \left( \sum_{k=0}^\infty \left( \frac{z}{b} \right)^k \right)^n. \]
Then by an application of Corollary 23.7 there exists a partial sum of the power series for $Q_b$ which is uniformly close to $Q_b$ on K. Therefore, you can approximate $Q_b$ and therefore also R uniformly on K by a polynomial consisting of a partial sum of the above infinite sum. This proves the lemma.
If f is a polynomial, then f has a pole at $\infty$. This will be discussed more later.
23.1.4 Runge's Theorem
Now what follows is the first form of Runge's theorem.
Theorem 23.9 Let K be a compact subset of an open set, $\Omega$ and let $\{b_j\}$ be a set which consists of one point from each component of $\widehat{\mathbb{C}} \setminus K$. Let f be analytic on $\Omega$. Then for each $\varepsilon > 0$, there exists a rational function, Q whose poles are all contained in the set, $\{b_j\}$ such that
\[ \|Q - f\|_{K,\infty} < \varepsilon. \tag{23.9} \]
If $\widehat{\mathbb{C}} \setminus K$ has only one component, then Q may be taken to be a polynomial.
Proof: By Lemma 23.4 there exists a rational function of the form
\[ R(z) = \sum_{k=1}^M \frac{A_k}{w_k - z} \]
where the $w_k$ are elements of components of $\mathbb{C} \setminus K$ and $A_k$ are complex numbers such that
\[ \|R - f\|_{K,\infty} < \frac{\varepsilon}{2}. \]
Consider the rational function, $R_k(z) \equiv \frac{A_k}{w_k - z}$ where $w_k \in V_j$, one of the components of $\mathbb{C} \setminus K$, the given point of $V_j$ being $b_j$. By Lemma 23.8, there exists a function, $Q_k$ which is either a rational function having its only pole at $b_j$ or a polynomial, depending on whether $V_j$ is bounded, such that
\[ \|R_k - Q_k\|_{K,\infty} < \frac{\varepsilon}{2M}. \]
Letting $Q(z) \equiv \sum_{k=1}^M Q_k(z)$,
\[ \|R - Q\|_{K,\infty} < \frac{\varepsilon}{2}. \]
It follows
\[ \|f - Q\|_{K,\infty} \le \|f - R\|_{K,\infty} + \|R - Q\|_{K,\infty} < \varepsilon. \]
In the case of only one component of $\mathbb{C} \setminus K$, this component is the unbounded component and so you can take Q to be a polynomial. This proves the theorem.
The next version of Runge's theorem concerns the case where the given points are contained in $\widehat{\mathbb{C}} \setminus \Omega$ for $\Omega$ an open set rather than a compact set. Note that here there could be uncountably many components of $\widehat{\mathbb{C}} \setminus \Omega$ because the components are no longer open sets. An easy example of this phenomenon in one dimension is where $\Omega = [0, 1] \setminus P$ for P the Cantor set. Then you can show that $\mathbb{R} \setminus \Omega$ has uncountably many components. Nevertheless, Runge's theorem will follow from Theorem 23.9 with the aid of the following interesting lemma.
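Lemma 23.10 Let $\Omega$ be an open set in $\mathbb{C}$. Then there exists a sequence of compact sets, $\{K_n\}$ such that
\[ \Omega = \bigcup_{n=1}^\infty K_n, \quad \cdots, K_n \subseteq \operatorname{int} K_{n+1} \cdots, \tag{23.10} \]
and for any compact $K \subseteq \Omega$,
\[ K \subseteq K_n, \tag{23.11} \]
for all n sufficiently large, and every component of $\widehat{\mathbb{C}} \setminus K_n$ contains a component of $\widehat{\mathbb{C}} \setminus \Omega$.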
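Proof: Let
\[ V_n \equiv \{z : |z| > n\} \cup \bigcup_{z \notin \Omega} B\left(z, \frac{1}{n}\right). \]
Thus $\{z : |z| > n\}$ contains the point, $\infty$. Now let
\[ K_n \equiv \widehat{\mathbb{C}} \setminus V_n = \mathbb{C} \setminus V_n. \]
You should verify that 23.10 and 23.11 hold. It remains to show that every component of $\widehat{\mathbb{C}} \setminus K_n$ contains a component of $\widehat{\mathbb{C}} \setminus \Omega$. Let D be a component of $\widehat{\mathbb{C}} \setminus K_n \equiv V_n$.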
If $\infty \notin D$, then D contains no point of $\{z : |z| > n\}$ because this set is connected and D is a component. (If it did contain a point of this set, it would have to contain the whole set.) Therefore, $D \subseteq \bigcup_{z \notin \Omega} B\left(z, \frac{1}{n}\right)$ and so D contains some point of $B\left(z, \frac{1}{n}\right)$ for some $z \notin \Omega$. Therefore, since this ball is connected, it follows D must contain the whole ball and consequently D contains some point of $\Omega^C$. (The point z at the center of the ball will do.) Since D contains $z \notin \Omega$, it must contain the component, $H_z$, determined by this point. The reason for this is that
\[ H_z \subseteq \widehat{\mathbb{C}} \setminus \Omega \subseteq \widehat{\mathbb{C}} \setminus K_n \]
and $H_z$ is connected. Therefore, $H_z$ can only have points in one component of $\widehat{\mathbb{C}} \setminus K_n$. Since it has a point in D, it must therefore, be totally contained in D. This verifies the desired condition in the case where $\infty \notin D$.
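Now suppose that $\infty \in D$. $\infty \notin \Omega$ because $\Omega$ is given to be a set in $\mathbb{C}$. Letting $H_\infty$ denote the component of $\widehat{\mathbb{C}} \setminus \Omega$ determined by $\infty$, it follows both D and $H_\infty$ contain $\infty$. Therefore, the connected set, $H_\infty$ cannot have any points in another component of $\widehat{\mathbb{C}} \setminus K_n$ and it is a set which is contained in $\widehat{\mathbb{C}} \setminus K_n$ so it must be contained in D. This proves the lemma.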
The following picture is a very simple example of the sort of thing considered by Runge's theorem. The picture is of a region which has a couple of holes.
[Figure: a region with two holes, one containing the point $a_1$ and the other the point $a_2$.]
However, there could be many more holes than two. In fact, there could be infinitely many. Nor does it follow that the components of the complement of $\Omega$ need to have any interior points. Therefore, the picture is certainly not representative.
Theorem 23.11 (Runge) Let $\Omega$ be an open set, and let A be a set which has one point in each component of $\widehat{\mathbb{C}} \setminus \Omega$ and let f be analytic on $\Omega$. Then there exists a sequence of rational functions, $\{R_n\}$ having poles only in A such that $R_n$ converges uniformly to f on compact subsets of $\Omega$.
Proof: Let $K_n$ be the compact sets of Lemma 23.10 where each component of $\widehat{\mathbb{C}} \setminus K_n$ contains a component of $\widehat{\mathbb{C}} \setminus \Omega$. It follows each component of $\widehat{\mathbb{C}} \setminus K_n$ contains a point of A. Therefore, by Theorem 23.9 there exists $R_n$ a rational function with poles only in A such that
\[ \|R_n - f\|_{K_n,\infty} < \frac{1}{n}. \]
It follows, since a given compact set, K is a subset of $K_n$ for all n large enough, that $R_n \to f$ uniformly on K. This proves the theorem.
Corollary 23.12 Let $\Omega$ be simply connected and f analytic on $\Omega$. Then there exists a sequence of polynomials, $\{p_n\}$ such that $p_n \to f$ uniformly on compact sets of $\Omega$.
Proof: By definition of what is meant by simply connected, $\widehat{\mathbb{C}} \setminus \Omega$ is connected and so there are no bounded components of $\widehat{\mathbb{C}} \setminus \Omega$. Therefore, in the proof of Theorem 23.11 when you use Theorem 23.9, you can always have $R_n$ be a polynomial by Lemma 23.8.
23.2 The Mittag-Leffler Theorem
23.2.1 A Proof From Runge's Theorem
This theorem is fairly easy to prove once you have Theorem 23.9. Given a set of complex numbers, does there exist a meromorphic function having its poles equal to this set of numbers? The Mittag-Leffler theorem provides a very satisfactory answer to this question. Actually, it says somewhat more. You can specify, not just the location of the pole but also the kind of singularity the meromorphic function is to have at that pole.
Theorem 23.13 Let $P \equiv \{z_k\}_{k=1}^\infty$ be a set of points in an open subset of $\mathbb{C}$, $\Omega$. Suppose also that $P \subseteq \Omega \subseteq \mathbb{C}$ and, as the proof below requires, that P has no limit point in $\Omega$. For each $z_k$, denote by $S_k(z)$ a function of the form
\[ S_k(z) = \sum_{j=1}^{m_k} \frac{a_j^k}{(z - z_k)^j}. \]
Then there exists a meromorphic function, Q defined on $\Omega$ such that the poles of Q are the points, $\{z_k\}_{k=1}^\infty$ and the singular part of the Laurent expansion of Q at $z_k$ equals $S_k(z)$. In other words, for z near $z_k$, $Q(z) = g_k(z) + S_k(z)$ for some function, $g_k$ analytic near $z_k$.
Proof: Let $\{K_n\}$ denote the sequence of compact sets described in Lemma 23.10. Thus $\bigcup_{n=1}^\infty K_n = \Omega$, $K_n \subseteq \operatorname{int}(K_{n+1}) \subseteq K_{n+1} \cdots$, and the components of $\widehat{\mathbb{C}} \setminus K_n$ contain the components of $\widehat{\mathbb{C}} \setminus \Omega$. Renumbering if necessary, you can assume each $K_n \neq \emptyset$. Also let $K_0 = \emptyset$. Let $P_m \equiv P \cap (K_m \setminus K_{m-1})$ and consider the rational function, $R_m$ defined by
\[ R_m(z) \equiv \sum_{z_k \in K_m \setminus K_{m-1}} S_k(z). \]
Since each $K_m$ is compact, it follows $P_m$ is finite and so the above really is a rational function. Now for $m > 1$, this rational function is analytic on some open set containing $K_{m-1}$. There exists a set of points, A one point in each component of $\widehat{\mathbb{C}} \setminus \Omega$. Consider $\widehat{\mathbb{C}} \setminus K_{m-1}$. Each of its components contains a component of $\widehat{\mathbb{C}} \setminus \Omega$ and so for each of these components of $\widehat{\mathbb{C}} \setminus K_{m-1}$, there exists a point of A which is contained in it. Denote the resulting set of points by $A'$. By Theorem 23.9 there exists a rational function, $Q_m$ whose poles are all contained in the set, $A' \subseteq \Omega^C$ such that
\[ \|R_m - Q_m\|_{K_{m-1},\infty} < \frac{1}{2^m}. \]
The meromorphic function is
\[ Q(z) \equiv R_1(z) + \sum_{k=2}^\infty \left( R_k(z) - Q_k(z) \right). \]
It remains to verify this function works. First consider $K_1$. Then on $K_1$, the above sum converges uniformly. Furthermore, the terms of the sum are analytic in some open set containing $K_1$. Therefore, the infinite sum is analytic on this open set and so for $z \in K_1$ the function, Q is the sum of a rational function, $R_1$, having poles at $P_1$ with the specified singular terms and an analytic function. Therefore, Q works on $K_1$. Now consider $K_m$ for $m > 1$. Then
\[ Q(z) = R_1(z) + \sum_{k=2}^{m+1} \left( R_k(z) - Q_k(z) \right) + \sum_{k=m+2}^\infty \left( R_k(z) - Q_k(z) \right). \]
As before, the infinite sum converges uniformly on $K_{m+1}$ and hence on some open set, O containing $K_m$. Therefore, this infinite sum equals a function which is analytic on O. Also,
\[ R_1(z) + \sum_{k=2}^{m+1} \left( R_k(z) - Q_k(z) \right) \]
is a rational function having poles at $\bigcup_{k=1}^m P_k$ with the specified singularities because the poles of each $Q_k$ are not in $\Omega$. It follows this function is meromorphic because it is analytic except for the points in P. It also has the property of retaining the specified singular behavior.
23.2.2 A Direct Proof Without Runge's Theorem
There is a direct proof of this important theorem which is not dependent on Runge's theorem in the case where $\Omega = \mathbb{C}$. I think it is arguably easier to understand and the Mittag-Leffler theorem is very important so I will give this proof here.
Theorem 23.14 Let $P \equiv \{z_k\}_{k=1}^\infty$ be a set of points in $\mathbb{C}$ which satisfies $\lim_{n\to\infty} |z_n| = \infty$. For each $z_k$, denote by $S_k(z)$ a polynomial in $\frac{1}{z - z_k}$ which is of the form
\[ S_k(z) = \sum_{j=1}^{m_k} \frac{a_j^k}{(z - z_k)^j}. \]
Then there exists a meromorphic function, Q defined on $\mathbb{C}$ such that the poles of Q are the points, $\{z_k\}_{k=1}^\infty$ and the singular part of the Laurent expansion of Q at $z_k$ equals $S_k(z)$. In other words, for z near $z_k$,
\[ Q(z) = g_k(z) + S_k(z) \]
for some function, $g_k$ analytic in some open set containing $z_k$.
Proof: First consider the case where none of the $z_k = 0$. Letting
\[ K_k \equiv \{z : |z| \le |z_k|/2\}, \]
there exists a power series for $\frac{1}{z - z_k}$ which converges uniformly and absolutely on this set. Here is why:
\[ \frac{1}{z - z_k} = \left( \frac{-1}{1 - \frac{z}{z_k}} \right) \frac{1}{z_k} = \frac{-1}{z_k} \sum_{l=0}^\infty \left( \frac{z}{z_k} \right)^l \]
and the Weierstrass M test can be applied because
\[ \left| \frac{z}{z_k} \right| \le \frac{1}{2} \]
on this set. Therefore, by Corollary 23.7, $S_k(z)$, being a polynomial in $\frac{1}{z - z_k}$, has a power series which converges uniformly to $S_k(z)$ on $K_k$. Therefore, there exists a polynomial, $P_k(z)$ such that
\[ \|P_k - S_k\|_{\overline{B(0, |z_k|/2)},\infty} < \frac{1}{2^k}. \]
Let
\[ Q(z) \equiv \sum_{k=1}^\infty \left( S_k(z) - P_k(z) \right). \tag{23.12} \]
Consider $z \in K_m$ and let N be large enough that if $k > N$, then $|z_k| > 2|z|$. Then
\[ Q(z) = \sum_{k=1}^N \left( S_k(z) - P_k(z) \right) + \sum_{k=N+1}^\infty \left( S_k(z) - P_k(z) \right). \]
On $K_m$, the second sum converges uniformly to a function analytic on $\operatorname{int}(K_m)$ (interior of $K_m$) while the first is a rational function having poles at $z_1, \cdots, z_N$. Since any compact set is contained in $K_m$ for large enough m, this shows $Q(z)$ is meromorphic as claimed and has poles with the given singularities.
Now consider the case where the poles are at $\{z_k\}_{k=0}^\infty$ with $z_0 = 0$. Everything is similar in this case. Let
\[ Q(z) \equiv S_0(z) + \sum_{k=1}^\infty \left( S_k(z) - P_k(z) \right). \]
The series converges uniformly on every compact set because of the assumption that $\lim_{n\to\infty} |z_n| = \infty$ which implies that any compact set is contained in $K_k$ for k large enough. Choose N such that $z \in \operatorname{int}(K_N)$ and $z_n \notin K_N$ for all $n \ge N + 1$. Then
\[ Q(z) = S_0(z) + \sum_{k=1}^N \left( S_k(z) - P_k(z) \right) + \sum_{k=N+1}^\infty \left( S_k(z) - P_k(z) \right). \]
The last sum is analytic on $\operatorname{int}(K_N)$ because each function in the sum is analytic due to the fact that none of its poles are in $K_N$. Also, $S_0(z) + \sum_{k=1}^N \left( S_k(z) - P_k(z) \right)$ is a finite sum of rational functions so it is a rational function and $P_k$ is a polynomial so $z_m$ is a pole of this function with the correct singularity whenever $z_m \in \operatorname{int}(K_N)$.
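A classical instance of this construction has poles at the integers with $S_n(z) = \frac{1}{z - n}$, and the resulting function is $\pi \cot(\pi z) = \frac{1}{z} + \sum_{n \neq 0} \left( \frac{1}{z - n} + \frac{1}{n} \right)$. The sketch below checks this numerically. Note the assumptions: the correction $P_n(z) = -1/n$ is only the first term of the power series of $S_n$, so it does not meet the $2^{-k}$ bound used in the proof, but the terms are of size about $|z|/n^2$ and so the series still converges uniformly on compact sets. The name `mittag_leffler_cot`, the point z, and the cutoff N are my own choices.

```python
import cmath
import math

def mittag_leffler_cot(z, N):
    """Partial sum 1/z + sum over 0 < |n| <= N of (1/(z - n) + 1/n)."""
    total = 1 / z
    for n in range(1, N + 1):
        # Terms for the poles at n and at -n, each with its correction -P(z).
        total += (1 / (z - n) + 1 / n) + (1 / (z + n) - 1 / n)
    return total

z = 0.3 + 0.2j
approx = mittag_leffler_cot(z, 20000)
exact = math.pi * cmath.cos(math.pi * z) / cmath.sin(math.pi * z)
error = abs(approx - exact)
```

The convergence is slow, with a tail of size roughly $2|z|/N$, which is the price paid for using such a crude polynomial correction.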
23.2.3 Functions Meromorphic On $\widehat{\mathbb{C}}$
Sometimes it is useful to think of isolated singular points at $\infty$.
Definition 23.15 Suppose f is analytic on $\{z \in \mathbb{C} : |z| > r\}$. Then f is said to have a removable singularity at $\infty$ if the function, $g(z) \equiv f\left(\frac{1}{z}\right)$ has a removable singularity at 0. f is said to have a pole at $\infty$ if the function, $g(z) = f\left(\frac{1}{z}\right)$ has a pole at 0. Then f is said to be meromorphic on $\widehat{\mathbb{C}}$ if all its singularities are isolated and either poles or removable.
So what is f like for these cases? First suppose f has a removable singularity at $\infty$. Then $z g(z)$ converges to 0 as $z \to 0$. It follows $g(z)$ must be analytic near 0 and so can be given as a power series. Thus $f(z)$ is of the form $f(z) = g\left(\frac{1}{z}\right) = \sum_{n=0}^\infty a_n \left(\frac{1}{z}\right)^n$. Next suppose f has a pole at $\infty$. This means $g(z)$ has a pole at 0 so $g(z)$ is of the form $g(z) = \sum_{k=1}^m \frac{b_k}{z^k} + h(z)$ where $h(z)$ is analytic near 0. Thus in the case of a pole at $\infty$, $f(z)$ is of the form $f(z) = g\left(\frac{1}{z}\right) = \sum_{k=1}^m b_k z^k + \sum_{n=0}^\infty a_n \left(\frac{1}{z}\right)^n$.
It turns out that the functions which are meromorphic on $\widehat{\mathbb{C}}$ are all rational functions. To see this suppose f is meromorphic on $\widehat{\mathbb{C}}$ and note that there exists $r > 0$ such that $f(z)$ is analytic for $|z| > r$. This is required if $\infty$ is to be isolated. Therefore, there are only finitely many poles of f for $|z| \le r$, $\{a_1, \cdots, a_m\}$, because by assumption, these poles are isolated and this is a compact set. Let the singular part of f at $a_k$ be denoted by $S_k(z)$. Then $f(z) - \sum_{k=1}^m S_k(z)$ is analytic on all of $\mathbb{C}$. Therefore, it is bounded on $|z| \le r$. In one case, f has a removable singularity at $\infty$. In this case, f is bounded as $z \to \infty$ and $\sum_k S_k$ also converges to 0 as $z \to \infty$. Therefore, by Liouville's theorem, $f(z) - \sum_{k=1}^m S_k(z)$ equals a constant and so $f - \sum_k S_k$ is a constant. Thus f is a rational function. In the other case that f has a pole at $\infty$,
\[ f(z) - \sum_{k=1}^m S_k(z) - \sum_{k=1}^m b_k z^k = \sum_{n=0}^\infty a_n \left(\frac{1}{z}\right)^n - \sum_{k=1}^m S_k(z). \]
Now $f(z) - \sum_{k=1}^m S_k(z) - \sum_{k=1}^m b_k z^k$ is analytic on $\mathbb{C}$ and so is bounded on $|z| \le r$. But now $\sum_{n=0}^\infty a_n \left(\frac{1}{z}\right)^n - \sum_{k=1}^m S_k(z)$ converges to $a_0$ as $z \to \infty$, hence is bounded for large $|z|$, and so by Liouville's theorem, $f(z) - \sum_{k=1}^m S_k(z) - \sum_{k=1}^m b_k z^k$ must equal a constant and again, $f(z)$ equals a rational function.
23.2.4 A Great And Glorious Theorem About Simply Connected Regions
Here is given a laundry list of properties which are equivalent to an open set being simply connected. Recall Definition 19.48 on Page 481 which said that an open set, $\Omega$ is simply connected means $\widehat{\mathbb{C}} \setminus \Omega$ is connected. Recall also that this is not the same thing at all as saying $\mathbb{C} \setminus \Omega$ is connected. Consider the outside of a disk for example. I will continue to use this definition for simply connected because it is the most convenient one for complex analysis. However, there are many other equivalent conditions. First here is a lemma which is interesting for its own sake. Recall $n(p, \gamma)$ means the winding number of $\gamma$ about p. Now recall Theorem 19.52 implies the following lemma in which $B^C$ is playing the role of $\Omega$ in Theorem 19.52.
Lemma 23.16 Let K be a compact subset of $B^C$, the complement of a closed set. Then there exist continuous, closed, bounded variation oriented curves $\{\gamma_j\}_{j=1}^m$ for which $\gamma_j^* \cap K = \emptyset$ for each j, $\gamma_j^* \subseteq B^C$, and for all $p \in K$,
\[ \sum_{k=1}^m n(\gamma_k, p) = 1, \]
while for all $z \in B$
\[ \sum_{k=1}^m n(\gamma_k, z) = 0. \]
Definition 23.17 Let $\gamma$ be a closed curve in an open set, $\Omega$, $\gamma : [a, b] \to \Omega$. Then $\gamma$ is said to be homotopic to a point, p in $\Omega$ if there exists a continuous function, $H : [0, 1] \times [a, b] \to \Omega$ such that $H(0, t) = p$, $H(\alpha, a) = H(\alpha, b)$, and $H(1, t) = \gamma(t)$. This function, H is called a homotopy.
Lemma 23.18 Suppose $\gamma$ is a closed continuous bounded variation curve in an open set, $\Omega$ which is homotopic to a point. Then if $a \notin \Omega$, it follows $n(a, \gamma) = 0$.
Proof: Let H be the homotopy described above. The problem with this is that it is not known that $H(\alpha, \cdot)$ is of bounded variation. There is no reason it should be. Therefore, it might not make sense to take the integral which defines the winding number. There are various ways around this. Extend H as follows. $H(\alpha, t) = H(\alpha, a)$ for $t < a$, $H(\alpha, t) = H(\alpha, b)$ for $t > b$. Let $\varepsilon > 0$.
\[ H_\varepsilon(\alpha, t) \equiv \frac{1}{2\varepsilon} \int_{-2\varepsilon + t + \frac{2\varepsilon}{b-a}(t-a)}^{t + \frac{2\varepsilon}{b-a}(t-a)} H(\alpha, s)\, ds, \quad H_\varepsilon(0, t) = p. \]
Thus $H_\varepsilon(\alpha, \cdot)$ is a closed curve which has bounded variation and when $\alpha = 1$, this converges to $\gamma$ uniformly on $[a, b]$. Therefore, for $\varepsilon$ small enough, $n(a, H_\varepsilon(1, \cdot)) = n(a, \gamma)$ because they are both integers and as $\varepsilon \to 0$, $n(a, H_\varepsilon(1, \cdot)) \to n(a, \gamma)$. Also, $H_\varepsilon(\alpha, t) \to H(\alpha, t)$ uniformly on $[0, 1] \times [a, b]$ because of uniform continuity of H. Therefore, for small enough $\varepsilon$, you can also assume $H_\varepsilon(\alpha, t) \in \Omega$ for all $\alpha, t$. Now $\alpha \to n(a, H_\varepsilon(\alpha, \cdot))$ is continuous. Hence it must be constant because the winding number is integer valued. But
\[ \lim_{\alpha \to 0} \frac{1}{2\pi i} \int_{H_\varepsilon(\alpha, \cdot)} \frac{1}{z - a}\, dz = 0 \]
because the length of $H_\varepsilon(\alpha, \cdot)$ converges to 0 and the integrand is bounded because $a \notin \Omega$. Therefore, the constant can only equal 0. This proves the lemma.
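The conclusion of Lemma 23.18 can be watched numerically along a simple homotopy. In the sketch below the curves are circles of radius $\alpha$ about 0, which shrink to the point 0 inside $\Omega = B(0, 2)$, and the winding number is computed by adding up small changes in the argument rather than by the integral. The region, the curves, and the names `winding_number` and `circle` are assumptions for this illustration only.

```python
import cmath
import math

def winding_number(curve, point, samples=4000):
    """Approximate the winding number about `point` of a closed curve
    curve: [0, 1] -> C which misses `point`, by accumulating argument changes."""
    total = 0.0
    previous = curve(0.0) - point
    for idx in range(1, samples + 1):
        current = curve(idx / samples) - point
        # phase of the quotient is the small change in argument over this step.
        total += cmath.phase(current / previous)
        previous = current
    return round(total / (2 * math.pi))

def circle(alpha):
    """The stage H(alpha, .) of the homotopy: the circle of radius alpha about 0."""
    return lambda t: alpha * cmath.exp(2j * math.pi * t)

outside_point = 3.0          # not in Omega = B(0, 2)
along_homotopy = [winding_number(circle(alpha), outside_point)
                  for alpha in (0.0, 0.25, 0.5, 0.75, 1.0)]
about_inside_point = winding_number(circle(1.0), 0.2)
```

About the point 3, which lies outside $\Omega$, the winding number is 0 at every stage of the homotopy, as the lemma says. About the point 0.2, which lies in $\Omega$ and is crossed by the shrinking circles, the unit circle has winding number 1, so the hypothesis $a \notin \Omega$ cannot be dropped.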
Now it is time for the great and glorious theorem on simply connected regions. The following equivalence of properties is taken from Rudin [38]. There is a slightly different list in Conway [12] and a shorter list in Ash [6].
Theorem 23.19 The following are equivalent for an open set, $\Omega$.
1. $\Omega$ is homeomorphic to the unit disk, $B(0, 1)$.
2. Every closed curve contained in $\Omega$ is homotopic to a point in $\Omega$.
3. If $z \notin \Omega$, and if $\gamma$ is a closed bounded variation continuous curve in $\Omega$, then $n(\gamma, z) = 0$.
4. $\Omega$ is simply connected, ($\widehat{\mathbb{C}} \setminus \Omega$ is connected and $\Omega$ is connected.)
5. Every function analytic on $\Omega$ can be uniformly approximated by polynomials on compact subsets.
6. For every f analytic on $\Omega$ and every closed continuous bounded variation curve, $\gamma$,
\[ \int_\gamma f(z)\, dz = 0. \]
7. Every function analytic on $\Omega$ has a primitive on $\Omega$.
8. If $f, 1/f$ are both analytic on $\Omega$, then there exists an analytic, g on $\Omega$ such that $f = \exp(g)$.
9. If $f, 1/f$ are both analytic on $\Omega$, then there exists $\phi$ analytic on $\Omega$ such that $f = \phi^2$.
Proof: 1$\Rightarrow$2. Assume 1 and let $\gamma$ be a closed curve in $\Omega$. Let $h$ be the homeomorphism, $h : B(0,1) \to \Omega$. Let $H(\alpha,t) = h\left(\alpha\left(h^{-1}\gamma(t)\right)\right)$. This works.
2$\Rightarrow$3 This is Lemma 23.18.
3$\Rightarrow$4. Suppose 3 but 4 fails to hold. Then if $\widehat{\mathbb{C}}\setminus\Omega$ is not connected, there exist disjoint nonempty sets, $A$ and $B$ such that $\overline{A}\cap B = A\cap\overline{B} = \emptyset$. It follows each of these sets must be closed because neither can have a limit point in $\Omega$ nor in the other. Also, one and only one of them contains $\infty$. Let this set be $B$. Thus $A$ is a closed set which must also be bounded. Otherwise, there would exist a sequence of points in $A$, $\{a_n\}$ such that $\lim_{n\to\infty} a_n = \infty$ which would contradict the requirement that no limit points of $A$ can be in $B$. Therefore, $A$ is a compact set contained in the open set, $B^C \equiv \{z \in \mathbb{C} : z \notin B\}$. Pick $p \in A$. By Lemma 23.16 there exist continuous bounded variation closed curves $\{\Gamma_k\}_{k=1}^m$ which are contained in $B^C$, do not intersect $A$ and such that
$$1 = \sum_{k=1}^m n(p,\Gamma_k).$$
However, if these curves do not intersect $A$ and they also do not intersect $B$ then they must be all contained in $\Omega$. Since $p \notin \Omega$, it follows by 3 that for each $k$, $n(p,\Gamma_k) = 0$, a contradiction.
4$\Rightarrow$5 This is Corollary 23.12 on Page 618.
5$\Rightarrow$6 Every polynomial has a primitive and so the integral over any closed bounded variation curve of a polynomial equals 0. Let $f$ be analytic on $\Omega$. Then let $\{f_n\}$ be a sequence of polynomials converging uniformly to $f$ on $\gamma^*$. Then
$$0 = \lim_{n\to\infty}\int_\gamma f_n(z)\,dz = \int_\gamma f(z)\,dz.$$
6$\Rightarrow$7 Pick $z_0 \in \Omega$. Letting $\gamma(z_0,z)$ be a bounded variation continuous curve joining $z_0$ to $z$ in $\Omega$, you define a primitive for $f$ as follows.
$$F(z) = \int_{\gamma(z_0,z)} f(w)\,dw.$$
This is well defined by 6 and is easily seen to be a primitive. You just write the difference quotient and take a limit using 6.
$$\lim_{w\to 0}\frac{F(z+w)-F(z)}{w} = \lim_{w\to 0}\frac{1}{w}\left(\int_{\gamma(z_0,z+w)} f(u)\,du - \int_{\gamma(z_0,z)} f(u)\,du\right)$$
$$= \lim_{w\to 0}\frac{1}{w}\int_{\gamma(z,z+w)} f(u)\,du = \lim_{w\to 0}\frac{1}{w}\int_0^1 f(z+tw)\,w\,dt = f(z).$$
7$\Rightarrow$8 Suppose then that $f, 1/f$ are both analytic. Then $f'/f$ is analytic and so it has a primitive by 7. Let this primitive be $g_1$. Then
$$\left(e^{-g_1}f\right)' = e^{-g_1}\left(-g_1'\right)f + e^{-g_1}f' = -e^{-g_1}\left(\frac{f'}{f}\right)f + e^{-g_1}f' = 0.$$
Therefore, since $\Omega$ is connected, it follows $e^{-g_1}f$ must equal a constant. (Why?) Let the constant be $e^{a+ib}$. Then $f(z) = e^{g_1(z)}e^{a+ib}$. Therefore, you let $g(z) = g_1(z) + a + ib$.
8$\Rightarrow$9 Suppose then that $f, 1/f$ are both analytic on $\Omega$. Then by 8 $f(z) = e^{g(z)}$. Let $\phi(z) \equiv e^{g(z)/2}$.
9$\Rightarrow$1 There are two cases. First suppose $\Omega = \mathbb{C}$. This satisfies condition 9 because if $f, 1/f$ are both analytic, then the same argument involved in 8$\Rightarrow$9 gives the existence of a square root. A homeomorphism is $h(z) \equiv \frac{z}{\sqrt{1+|z|^2}}$. It obviously maps onto $B(0,1)$ and is continuous. To see it is 1 - 1 consider the case of $z_1$ and $z_2$ having different arguments. Then $h(z_1) \neq h(z_2)$. If $z_2 = tz_1$ for a positive $t \neq 1$, then it is also clear $h(z_1) \neq h(z_2)$. To show $h^{-1}$ is continuous, note that if you have an open set in $\mathbb{C}$ and a point in this open set, you can get a small open set containing this point by allowing the modulus and the argument to lie in some open interval. Reasoning this way, you can verify $h$ maps open sets to open sets. In the case where $\Omega \neq \mathbb{C}$, there exists a one to one analytic map which maps $\Omega$ onto $B(0,1)$ by the Riemann mapping theorem. This proves the theorem.
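As a quick sanity check of the case $\Omega = \mathbb{C}$, here is a minimal numerical sketch, assuming the homeomorphism is $h(z) = z/\sqrt{1+|z|^2}$ with inverse $w \mapsto w/\sqrt{1-|w|^2}$. The function names and the sample points are illustrative choices, not part of the notes.

```python
import math

def h(z: complex) -> complex:
    """Map the plane into the open unit disk: h(z) = z / sqrt(1 + |z|^2)."""
    return z / math.sqrt(1.0 + abs(z) ** 2)

def h_inv(w: complex) -> complex:
    """Candidate inverse on the open unit disk: w / sqrt(1 - |w|^2)."""
    return w / math.sqrt(1.0 - abs(w) ** 2)

# A few arbitrary sample points in the plane (hypothetical test data).
samples = [0j, 1 + 2j, -3.5 + 0.25j, 10j, -7 - 7j]

# Every image should lie strictly inside the unit disk ...
max_modulus = max(abs(h(z)) for z in samples)
# ... and applying the inverse should recover the original point.
round_trip_error = max(abs(h_inv(h(z)) - z) for z in samples)
```

The round trip works because $1 - |h(z)|^2 = 1/(1+|z|^2)$, which is what makes the formula for the inverse explicit.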
23.3 Exercises
1. Let $a \in \mathbb{C}$. Show there exists a sequence of polynomials, $\{p_n\}$ such that $p_n(a) = 1$ but $p_n(z) \to 0$ for all $z \neq a$.
2. Let $l$ be a line in $\mathbb{C}$. Show there exists a sequence of polynomials $\{p_n\}$ such that $p_n(z) \to 1$ on one side of this line and $p_n(z) \to -1$ on the other side of the line. Hint: The complement of this line is simply connected.
3. Suppose $\Omega$ is a simply connected region, $f$ is analytic on $\Omega$, $f \neq 0$ on $\Omega$, and $n \in \mathbb{N}$. Show that there exists an analytic function, $g$ such that $g(z)^n = f(z)$ for all $z \in \Omega$. That is, you can take the $n^{th}$ root of $f(z)$. If $\Omega$ is a region which contains 0, is it possible to find $g(z)$ such that $g$ is analytic on $\Omega$ and $g(z)^2 = z$?
4. Suppose $\Omega$ is a region (connected open set) and $f$ is an analytic function defined on $\Omega$ such that $f(z) \neq 0$ for any $z \in \Omega$. Suppose also that for every positive integer, $n$ there exists an analytic function, $g_n$ defined on $\Omega$ such that $g_n^n(z) = f(z)$. Show that then it is possible to define an analytic function, $L$ on $f(\Omega)$ such that $e^{L(f(z))} = f(z)$ for all $z \in \Omega$.
5. You know that $\phi(z) \equiv \frac{z-i}{z+i}$ maps the upper half plane onto the unit ball. Its inverse, $\psi(z) = i\frac{1+z}{1-z}$ maps the unit ball onto the upper half plane. Also for $z$ in the upper half plane, you can define a square root as follows. If $z = |z|e^{i\theta}$ where $\theta \in (0,\pi)$, let $z^{1/2} \equiv |z|^{1/2}e^{i\theta/2}$ so the square root maps the upper half plane to the first quadrant. Now consider
$$z \to \exp\left(i\log\left[i\left(\frac{1+z}{1-z}\right)\right]^{1/2}\right). \quad (23.13)$$
Show this is an analytic function which maps the unit ball onto an annulus. Is it possible to find a one to one analytic map which does this?
Infinite Products
The Mittag-Leffler theorem gives existence of a meromorphic function which has specified singular part at various poles. It would be interesting to do something similar to zeros of an analytic function. That is, given the order of the zero at various points, does there exist an analytic function which has these points as zeros with the specified orders? You know that if you have the zeros of the polynomial, you can factor it. Can you do something similar with analytic functions which are just limits of polynomials? These questions involve the concept of an infinite product.
Definition 24.1 $\prod_{n=1}^\infty (1+u_n) \equiv \lim_{n\to\infty}\prod_{k=1}^n (1+u_k)$ whenever this limit exists. If $u_n = u_n(z)$ for $z \in H$, we say the infinite product converges uniformly on $H$ if the partial products, $\prod_{k=1}^n (1+u_k(z))$ converge uniformly on $H$.
The main theorem is the following.
Theorem 24.2 Let $H \subseteq \mathbb{C}$ and suppose that $\sum_{n=1}^\infty |u_n(z)|$ converges uniformly on $H$ where $u_n(z)$ is bounded on $H$. Then
$$P(z) \equiv \prod_{n=1}^\infty (1+u_n(z))$$
converges uniformly on $H$. If $(n_1, n_2, \cdots)$ is any permutation of $(1, 2, \cdots)$, then for all $z \in H$,
$$P(z) = \prod_{k=1}^\infty (1+u_{n_k}(z))$$
and $P$ has a zero at $z_0$ if and only if $u_n(z_0) = -1$ for some $n$.
Proof: First a simple estimate:
$$\prod_{k=m}^n (1+|u_k(z)|) = \exp\left(\ln\left(\prod_{k=m}^n (1+|u_k(z)|)\right)\right) = \exp\left(\sum_{k=m}^n \ln(1+|u_k(z)|)\right) \le \exp\left(\sum_{k=m}^\infty |u_k(z)|\right) < e$$
for all $z \in H$ provided $m$ is large enough. Since $\sum_{k=1}^\infty |u_k(z)|$ converges uniformly on $H$, $|u_k(z)| < \frac{1}{2}$ for all $z \in H$ provided $k$ is large enough. Thus you can take $\log(1+u_k(z))$. Pick $N_0$ such that for $n > m \ge N_0$,
$$|u_m(z)| < \frac{1}{2}, \qquad \prod_{k=m}^n (1+|u_k(z)|) < e. \quad (24.1)$$
Now having picked $N_0$, the assumption the $u_n$ are bounded on $H$ implies there exists a constant, $C$, independent of $z \in H$ such that for all $z \in H$,
$$\prod_{k=1}^{N_0} (1+|u_k(z)|) < C. \quad (24.2)$$
Let $N_0 < M < N$. Then
$$\left|\prod_{k=1}^N (1+u_k(z)) - \prod_{k=1}^M (1+u_k(z))\right|$$
$$\le \prod_{k=1}^{N_0} (1+|u_k(z)|)\left|\prod_{k=N_0+1}^N (1+u_k(z)) - \prod_{k=N_0+1}^M (1+u_k(z))\right|$$
$$\le C\left|\prod_{k=N_0+1}^N (1+u_k(z)) - \prod_{k=N_0+1}^M (1+u_k(z))\right|$$
$$\le C\left(\prod_{k=N_0+1}^M (1+|u_k(z)|)\right)\left|\prod_{k=M+1}^N (1+u_k(z)) - 1\right|$$
$$\le Ce\left|\prod_{k=M+1}^N (1+|u_k(z)|) - 1\right|.$$
Since $1 \le \prod_{k=M+1}^N (1+|u_k(z)|) \le e$, it follows the term on the far right is dominated by
$$Ce^2\left|\ln\left(\prod_{k=M+1}^N (1+|u_k(z)|)\right) - \ln 1\right| \le Ce^2\sum_{k=M+1}^N \ln(1+|u_k(z)|) \le Ce^2\sum_{k=M+1}^N |u_k(z)| < \varepsilon$$
uniformly in $z \in H$ provided $M$ is large enough. This follows from the simple observation that if $1 < x < e$, then $x - 1 \le e(\ln x - \ln 1)$. Therefore, $\left\{\prod_{k=1}^m (1+u_k(z))\right\}_{m=1}^\infty$ is uniformly Cauchy on $H$ and therefore, converges uniformly on $H$. Let $P(z)$ denote the function it converges to.
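What about the permutations? Let $\{n_1, n_2, \cdots\}$ be a permutation of the indices. Let $\varepsilon > 0$ be given and let $N_0$ be such that if $n > N_0$,
$$\left|\prod_{k=1}^n (1+u_k(z)) - P(z)\right| < \varepsilon$$
for all $z \in H$. Let $\{1, 2, \cdots, n\} \subseteq \{n_1, n_2, \cdots, n_{p(n)}\}$ where $p(n)$ is an increasing sequence. Then from 24.1 and 24.2,
$$\left|P(z) - \prod_{k=1}^{p(n)} (1+u_{n_k}(z))\right| \le \left|P(z) - \prod_{k=1}^n (1+u_k(z))\right| + \left|\prod_{k=1}^n (1+u_k(z)) - \prod_{k=1}^{p(n)} (1+u_{n_k}(z))\right|$$
$$\le \varepsilon + \left|\prod_{k=1}^n (1+u_k(z)) - \prod_{k=1}^{p(n)} (1+u_{n_k}(z))\right|$$
$$\le \varepsilon + \prod_{k=1}^n (1+|u_k(z)|)\left|1 - \prod_{n_k > n} (1+u_{n_k}(z))\right|$$
$$\le \varepsilon + \prod_{k=1}^{N_0} (1+|u_k(z)|)\prod_{k=N_0+1}^n (1+|u_k(z)|)\left|1 - \prod_{n_k > n} (1+u_{n_k}(z))\right|$$
$$\le \varepsilon + Ce\left|\prod_{n_k > n} (1+|u_{n_k}(z)|) - 1\right| \le \varepsilon + Ce\left|\prod_{k=n+1}^{M(p(n))} (1+|u_k(z)|) - 1\right|$$
where the products over $n_k > n$ are taken over those $k \le p(n)$ with $n_k > n$, and $M(p(n))$ is the largest index in the permuted list, $\{n_1, n_2, \cdots, n_{p(n)}\}$. Then from 24.1, this last term is dominated by
$$\varepsilon + Ce^2\left|\ln\left(\prod_{k=n+1}^{M(p(n))} (1+|u_k(z)|)\right) - \ln 1\right| \le \varepsilon + Ce^2\sum_{k=n+1}^\infty \ln(1+|u_k(z)|) \le \varepsilon + Ce^2\sum_{k=n+1}^\infty |u_k(z)| < 2\varepsilon$$
for all $n$ large enough uniformly in $z \in H$. Therefore, $\left|P(z) - \prod_{k=1}^{p(n)} (1+u_{n_k}(z))\right| < 2\varepsilon$ whenever $n$ is large enough. This proves the part about the permutation.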
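It remains to verify the assertion about the points, $z_0$, where $P(z_0) = 0$. Obviously, if $u_n(z_0) = -1$, then $P(z_0) = 0$. Suppose then that $P(z_0) = 0$ and $M > N_0$. Then
$$\left|\prod_{k=1}^M (1+u_k(z_0))\right| = \left|\prod_{k=1}^M (1+u_k(z_0)) - \prod_{k=1}^\infty (1+u_k(z_0))\right|$$
$$\le \left|\prod_{k=1}^M (1+u_k(z_0))\right|\left|1 - \prod_{k=M+1}^\infty (1+u_k(z_0))\right|$$
$$\le \left|\prod_{k=1}^M (1+u_k(z_0))\right|\left|\prod_{k=M+1}^\infty (1+|u_k(z_0)|) - 1\right|$$
$$\le e\left|\prod_{k=1}^M (1+u_k(z_0))\right|\left|\ln\prod_{k=M+1}^\infty (1+|u_k(z_0)|) - \ln 1\right|$$
$$\le e\left(\sum_{k=M+1}^\infty \ln(1+|u_k(z_0)|)\right)\left|\prod_{k=1}^M (1+u_k(z_0))\right|$$
$$\le e\sum_{k=M+1}^\infty |u_k(z_0)|\left|\prod_{k=1}^M (1+u_k(z_0))\right| \le \frac{1}{2}\left|\prod_{k=1}^M (1+u_k(z_0))\right|$$
whenever $M$ is large enough. Therefore, for such $M$, $\prod_{k=1}^M (1+u_k(z_0)) = 0$ and so $u_k(z_0) = -1$ for some $k \le M$. This proves the theorem.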
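The two conclusions of Theorem 24.2 that are easiest to see numerically are convergence of the partial products and insensitivity to the order of the factors. The following is a small sketch under stated assumptions: the choice $u_k = 1/k^2$, the cutoff `N`, and the helper name `partial_product` are illustrative only, and the comparison value $\sinh(\pi)/\pi$ is the classical value of $\prod (1 + 1/k^2)$, taken here as known. Only a rearrangement of the first `N` factors is tested, which illustrates but does not prove the statement about permutations.

```python
import math

def partial_product(u, order):
    """Multiply the factors (1 + u(k)) over the indices k in the given order."""
    prod = 1.0
    for k in order:
        prod *= 1.0 + u(k)
    return prod

def u(k: int) -> float:
    # Illustrative choice: sum |u_k| = sum 1/k^2 converges.
    return 1.0 / (k * k)

N = 100_000
natural = partial_product(u, range(1, N + 1))

# Same factors in a different order: even indices first, then odd indices.
rearranged_order = list(range(2, N + 1, 2)) + list(range(1, N + 1, 2))
rearranged = partial_product(u, rearranged_order)

# Classical closed form for the full product, used only as a reference value.
limit = math.sinh(math.pi) / math.pi
```

The gap between `natural` and `limit` is governed by the tail $\sum_{k > N} 1/k^2 \approx 1/N$, in line with the estimate in the proof.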
24.1 Analytic Function With Prescribed Zeros
Suppose you are given complex numbers, $\{z_n\}$ and you want to find an analytic function, $f$ such that these numbers are the zeros of $f$. How can you do it? The problem is easy if there are only finitely many of these zeros, $\{z_1, z_2, \cdots, z_m\}$. You just write $(z-z_1)(z-z_2)\cdots(z-z_m)$. Now if none of the $z_k = 0$ you could also write it as $\prod_{k=1}^m \left(1 - \frac{z}{z_k}\right)$ and this might have a better chance of success in the case of infinitely many prescribed zeros. However, you would need to verify something like $\sum_{n=1}^\infty \left|\frac{z}{z_n}\right| < \infty$ which might not be so. The way around this is to adjust the product, making it $\prod_{k=1}^\infty \left(1 - \frac{z}{z_k}\right)e^{g_k(z)}$ where $g_k(z)$ is some analytic function. Recall also that for $|x| < 1$, $\ln\left((1-x)^{-1}\right) = \sum_{n=1}^\infty \frac{x^n}{n}$. If you had $x/x_n$ small and real, then $1 = (1 - x/x_n)\exp\left(\ln\left((1-x/x_n)^{-1}\right)\right)$ and $\prod_{k=1}^\infty 1$ of course converges but loses all the information about zeros. However, this is why it is not too unreasonable to consider factors of the form
$$\left(1 - \frac{z}{z_k}\right)\exp\left(\sum_{j=1}^{p_k}\left(\frac{z}{z_k}\right)^j\frac{1}{j}\right)$$
where $p_k$ is suitably chosen.
First here are some estimates.
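Lemma 24.3 For $z \in \mathbb{C}$,
$$|e^z - 1| \le |z|e^{|z|}, \quad (24.3)$$
and if $|z| \le 1/2$,
$$\left|\sum_{k=m}^\infty \frac{z^k}{k}\right| \le \frac{1}{m}\frac{|z|^m}{1-|z|} \le \frac{2}{m}|z|^m \le \frac{1}{m}\frac{1}{2^{m-1}}. \quad (24.4)$$
Proof: Consider 24.3.
$$|e^z - 1| = \left|\sum_{k=1}^\infty \frac{z^k}{k!}\right| \le \sum_{k=1}^\infty \frac{|z|^k}{k!} = e^{|z|} - 1 \le |z|e^{|z|}$$
the last inequality holding by the mean value theorem. Now consider 24.4.
$$\left|\sum_{k=m}^\infty \frac{z^k}{k}\right| \le \sum_{k=m}^\infty \frac{|z|^k}{k} \le \frac{1}{m}\sum_{k=m}^\infty |z|^k = \frac{1}{m}\frac{|z|^m}{1-|z|} \le \frac{2}{m}|z|^m \le \frac{1}{m}\frac{1}{2^{m-1}}.$$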
The functions, $E_p$ in the next definition are called the elementary factors.
Definition 24.4 Let $E_0(z) \equiv 1 - z$ and for $p \ge 1$,
$$E_p(z) \equiv (1-z)\exp\left(z + \frac{z^2}{2} + \cdots + \frac{z^p}{p}\right)$$
In terms of this new symbol, here is another estimate. A sharper inequality is available in Rudin [38] but it is more difficult to obtain.
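Corollary 24.5 For $E_p$ defined above and $|z| \le 1/2$,
$$|E_p(z) - 1| \le 3|z|^{p+1}.$$
Proof: For $p = 0$ the bound is immediate because $E_0(z) - 1 = -z$, so suppose $p \ge 1$. From elementary calculus, $\ln(1-x) = -\sum_{n=1}^\infty \frac{x^n}{n}$ for all $|x| < 1$. Therefore, for $|z| < 1$,
$$\log(1-z) = -\sum_{n=1}^\infty \frac{z^n}{n}, \qquad \log\left((1-z)^{-1}\right) = \sum_{n=1}^\infty \frac{z^n}{n},$$
because the function $\log(1-z)$ and the analytic function, $-\sum_{n=1}^\infty \frac{z^n}{n}$ both are equal to $\ln(1-x)$ on the real line segment $(-1,1)$, a set which has a limit point. Therefore, using Lemma 24.3,
$$|E_p(z) - 1| = \left|(1-z)\exp\left(z + \frac{z^2}{2} + \cdots + \frac{z^p}{p}\right) - 1\right|$$
$$= \left|(1-z)\exp\left(\log\left((1-z)^{-1}\right) - \sum_{n=p+1}^\infty \frac{z^n}{n}\right) - 1\right|$$
$$= \left|\exp\left(-\sum_{n=p+1}^\infty \frac{z^n}{n}\right) - 1\right| \le \left|-\sum_{n=p+1}^\infty \frac{z^n}{n}\right| e^{\left|-\sum_{n=p+1}^\infty \frac{z^n}{n}\right|}$$
$$\le \frac{1}{p+1}\cdot 2\cdot e^{1/(p+1)}|z|^{p+1} \le 3|z|^{p+1}.$$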
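The bound of Corollary 24.5 is easy to probe numerically. The sketch below evaluates the elementary factors from Definition 24.4 on a small grid of points with $|z| \le 1/2$ and records the worst value of $|E_p(z) - 1| / (3|z|^{p+1})$. The grid, the range of $p$, and the names `E` and `worst_ratio` are arbitrary illustrative choices; a finite sample of course checks the inequality only at those points.

```python
import cmath
import math

def E(p: int, z: complex) -> complex:
    """Elementary factor E_p(z) = (1 - z) exp(z + z^2/2 + ... + z^p/p); E_0(z) = 1 - z."""
    s = sum(z ** k / k for k in range(1, p + 1))  # empty sum (= 0) when p = 0
    return (1 - z) * cmath.exp(s)

worst_ratio = 0.0
for p in range(0, 8):
    for r in (0.1, 0.25, 0.5):          # radii with |z| <= 1/2
        for j in range(24):             # 24 equally spaced arguments
            z = cmath.rect(r, 2 * math.pi * j / 24)
            ratio = abs(E(p, z) - 1) / (3 * abs(z) ** (p + 1))
            worst_ratio = max(worst_ratio, ratio)

# Each elementary factor vanishes at z = 1, which is what plants the zero.
value_at_one = E(2, 1)
```

A `worst_ratio` below 1 on the sample is consistent with the corollary.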
With this estimate, it is easy to prove the Weierstrass product formula.
Theorem 24.6 Let $\{z_n\}$ be a sequence of nonzero complex numbers which have no limit point in $\mathbb{C}$ and let $\{p_n\}$ be a sequence of nonnegative integers such that
$$\sum_{n=1}^\infty \left(\frac{R}{|z_n|}\right)^{p_n+1} < \infty \quad (24.5)$$
for all $R \in \mathbb{R}$. Then
$$P(z) \equiv \prod_{n=1}^\infty E_{p_n}\left(\frac{z}{z_n}\right)$$
is analytic on $\mathbb{C}$ and has a zero at each point, $z_n$ and at no others. If $w$ occurs $m$ times in $\{z_n\}$, then $P$ has a zero of order $m$ at $w$.
Proof: Since $\{z_n\}$ has no limit point, it follows $\lim_{n\to\infty} |z_n| = \infty$. Therefore, if $p_n = n - 1$ the condition, 24.5 holds for this choice of $p_n$. Now by Theorem 24.2, the infinite product in this theorem will converge uniformly on $|z| \le R$ if the same is true of the sum,
$$\sum_{n=1}^\infty \left|E_{p_n}\left(\frac{z}{z_n}\right) - 1\right|. \quad (24.6)$$
But by Corollary 24.5 the $n^{th}$ term of this sum satisfies
$$\left|E_{p_n}\left(\frac{z}{z_n}\right) - 1\right| \le 3\left|\frac{z}{z_n}\right|^{p_n+1}.$$
Since $|z_n| \to \infty$, there exists $N$ such that for $n > N$, $|z_n| > 2R$. Therefore, for $|z| < R$ and letting $0 < a = \min\{|z_n| : n \le N\}$,
$$\sum_{n=1}^\infty \left|E_{p_n}\left(\frac{z}{z_n}\right) - 1\right| \le 3\sum_{n=1}^N \left|\frac{R}{a}\right|^{p_n+1} + 3\sum_{n=N}^\infty \left(\frac{R}{2R}\right)^{p_n+1} < \infty.$$
By the Weierstrass M test, the series in 24.6 converges uniformly for $|z| < R$ and so the same is true of the infinite product. It follows from Lemma 19.18 on Page 460 that $P(z)$ is analytic on $|z| < R$ because it is a uniform limit of analytic functions.
Also by Theorem 24.2 the zeros of the analytic $P(z)$ are exactly the points, $\{z_n\}$, listed according to multiplicity. That is, if $z_n$ is a zero of order $m$, then if it is listed $m$ times in the formula for $P(z)$, then it is a zero of order $m$ for $P$. This proves the theorem.
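To make Theorem 24.6 concrete, here is a hedged numerical sketch for the particular choice $z_n = n$ and $p_n = 1$, for which $\sum (R/n)^2 < \infty$. The comparison value $e^{\gamma x}/\Gamma(1-x)$ comes from the classical Weierstrass product for the Gamma function, which is not derived in these notes and is assumed here; `EULER_GAMMA`, `E1`, and `weierstrass_partial` are illustrative names.

```python
import math

# Euler-Mascheroni constant, hard-coded (assumption: not provided by the math module).
EULER_GAMMA = 0.5772156649015329

def E1(z: float) -> float:
    """Elementary factor E_1(z) = (1 - z) e^z."""
    return (1.0 - z) * math.exp(z)

def weierstrass_partial(x: float, N: int) -> float:
    """Partial product of prod_{n>=1} E_1(x/n), i.e. prescribed zeros at n = 1, 2, 3, ..."""
    prod = 1.0
    for n in range(1, N + 1):
        prod *= E1(x / n)
    return prod

x = 0.5
approx = weierstrass_partial(x, 200_000)
# Reference value from the classical product for 1/Gamma (assumed, see lead-in).
closed_form = math.exp(EULER_GAMMA * x) / math.gamma(1.0 - x)

# At a prescribed zero the factor with n = 3 is exactly 0, so the product vanishes.
value_at_zero_3 = weierstrass_partial(3.0, 50)
```

The factor $e^{x/n}$ is what rescues convergence: without it the product $\prod (1 - x/n)$ diverges to 0 for $x > 0$.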
The following corollary is an easy consequence and includes the case where there is a zero at 0.
Corollary 24.7 Let $\{z_n\}$ be a sequence of nonzero complex numbers which have no limit point and let $\{p_n\}$ be a sequence of nonnegative integers such that
$$\sum_{n=1}^\infty \left(\frac{r}{|z_n|}\right)^{1+p_n} < \infty \quad (24.7)$$
for all $r \in \mathbb{R}$. Then
$$P(z) \equiv z^m\prod_{n=1}^\infty E_{p_n}\left(\frac{z}{z_n}\right)$$
is analytic and has a zero at each point, $z_n$ and at no others along with a zero of order $m$ at 0. If $w$ occurs $m$ times in $\{z_n\}$, then $P$ has a zero of order $m$ at $w$.
The above theory can be generalized to include the case of an arbitrary open set. First, here is a lemma.
Lemma 24.8 Let $\Omega$ be an open set. Also let $\{z_n\}$ be a sequence of points in $\Omega$ which is bounded and which has no point repeated more than finitely many times such that $\{z_n\}$ has no limit point in $\Omega$. Then there exist $\{w_n\} \subseteq \partial\Omega$ such that $\lim_{n\to\infty} |z_n - w_n| = 0$.
Proof: Since $\partial\Omega$ is closed, there exists $w_n \in \partial\Omega$ such that $\operatorname{dist}(z_n, \partial\Omega) = |z_n - w_n|$. Now if there is a subsequence, $\{z_{n_k}\}$ such that $|z_{n_k} - w_{n_k}| \ge \varepsilon$ for all $k$, then $\{z_{n_k}\}$ must possess a limit point because it is a bounded infinite set of points. However, this limit point can only be in $\Omega$ because $\{z_{n_k}\}$ is bounded away from $\partial\Omega$. This is a contradiction. Therefore, $\lim_{n\to\infty} |z_n - w_n| = 0$. This proves the lemma.
Corollary 24.9 Let $\{z_n\}$ be a sequence of complex numbers contained in $\Omega$, an open subset of $\mathbb{C}$ which has no limit point in $\Omega$. Suppose each $z_n$ is repeated no more than finitely many times. Then there exists a function $f$ which is analytic on $\Omega$ whose zeros are exactly $\{z_n\}$. If $w \in \{z_n\}$ and $w$ is listed $m$ times, then $w$ is a zero of order $m$ of $f$.
Proof: There is nothing to prove if $\{z_n\}$ is finite. You just let $f(z) = \prod_{j=1}^m (z - z_j)$ where $\{z_n\} = \{z_1, \cdots, z_m\}$.
Pick $w \in \Omega\setminus\{z_n\}_{n=1}^\infty$ and let $h(z) \equiv \frac{1}{z-w}$. Since $w$ is not a limit point of $\{z_n\}$, there exists $r > 0$ such that $B(w,r)$ contains no points of $\{z_n\}$. Let $\Omega_1 \equiv \Omega\setminus\{w\}$. Now $h$ is not constant and so $h(\Omega_1)$ is an open set by the open mapping theorem. In fact, $h$ maps each component of $\Omega$ to a region. $|z_n - w| > r$ for all $z_n$ and so $|h(z_n)| < r^{-1}$. Thus the sequence, $\{h(z_n)\}$ is a bounded sequence in the open set $h(\Omega_1)$. It has no limit point in $h(\Omega_1)$ because this is true of $\{z_n\}$ and $\Omega_1$. By Lemma 24.8 there exist $w_n \in \partial(h(\Omega_1))$ such that $\lim_{n\to\infty} |w_n - h(z_n)| = 0$. Consider for $z \in \Omega_1$
$$f(z) \equiv \prod_{n=1}^\infty E_n\left(\frac{h(z_n) - w_n}{h(z) - w_n}\right). \quad (24.8)$$
Letting $K$ be a compact subset of $\Omega_1$, $h(K)$ is a compact subset of $h(\Omega_1)$ and so if $z \in K$, then $|h(z) - w_n|$ is bounded below by a positive constant. Therefore, there exists $N$ large enough that for all $z \in K$ and $n \ge N$,
$$\left|\frac{h(z_n) - w_n}{h(z) - w_n}\right| < \frac{1}{2}$$
and so by Corollary 24.5, for all $z \in K$ and $n \ge N$,
$$\left|E_n\left(\frac{h(z_n) - w_n}{h(z) - w_n}\right) - 1\right| \le 3\left(\frac{1}{2}\right)^n. \quad (24.9)$$
Therefore,
$$\sum_{n=1}^\infty \left|E_n\left(\frac{h(z_n) - w_n}{h(z) - w_n}\right) - 1\right|$$
converges uniformly for $z \in K$. This implies $\prod_{n=1}^\infty E_n\left(\frac{h(z_n) - w_n}{h(z) - w_n}\right)$ also converges uniformly for $z \in K$ by Theorem 24.2. Since $K$ is arbitrary, this shows $f$ defined in 24.8 is analytic on $\Omega_1$.
Also if $z_n$ is listed $m$ times so it is a zero of multiplicity $m$ and $w_n$ is the point from $\partial(h(\Omega_1))$ closest to $h(z_n)$, then there are $m$ factors in 24.8 which are of the form
$$E_n\left(\frac{h(z_n) - w_n}{h(z) - w_n}\right) = \left(1 - \frac{h(z_n) - w_n}{h(z) - w_n}\right)e^{g_n(z)} = \left(\frac{h(z) - h(z_n)}{h(z) - w_n}\right)e^{g_n(z)}$$
$$= \frac{z_n - z}{(z-w)(z_n-w)}\left(\frac{1}{h(z) - w_n}\right)e^{g_n(z)} = (z - z_n)G_n(z) \quad (24.10)$$
where $G_n$ is an analytic function which is not zero at and near $z_n$. Therefore, $f$ has a zero of order $m$ at $z_n$. This proves the theorem except for the point, $w$ which has been left out of $\Omega_1$. It is necessary to show $f$ is analytic at this point also and right now, $f$ is not even defined at $w$.
The $\{w_n\}$ are bounded because $\{h(z_n)\}$ is bounded and $\lim_{n\to\infty} |w_n - h(z_n)| = 0$ which implies $|w_n - h(z_n)| \le C$ for some constant, $C$. Therefore, there exists $\delta > 0$ such that if $z \in B'(w,\delta)$, then for all $n$,
$$\left|\frac{h(z_n) - w_n}{\left(\frac{1}{z-w}\right) - w_n}\right| = \left|\frac{h(z_n) - w_n}{h(z) - w_n}\right| < \frac{1}{2}.$$
Thus 24.9 holds for all $z \in B'(w,\delta)$ and $n$ so by Theorem 24.2, the infinite product in 24.8 converges uniformly on $B'(w,\delta)$. This implies $f$ is bounded in $B'(w,\delta)$ and so $w$ is a removable singularity and $f$ can be extended to $w$ such that the result is analytic. It only remains to verify $f(w) \neq 0$. After all, this would not do because it would be another zero other than those in the given list. By 24.10, a partial product is of the form
$$\prod_{n=1}^N \left(\frac{h(z) - h(z_n)}{h(z) - w_n}\right)e^{g_n(z)} \quad (24.11)$$
where
$$g_n(z) \equiv \left(\frac{h(z_n) - w_n}{h(z) - w_n} + \frac{1}{2}\left(\frac{h(z_n) - w_n}{h(z) - w_n}\right)^2 + \cdots + \frac{1}{n}\left(\frac{h(z_n) - w_n}{h(z) - w_n}\right)^n\right)$$
Each of the quotients in the definition of $g_n(z)$ converges to 0 as $z \to w$ and so the partial product of 24.11 converges to 1 as $z \to w$ because $\left(\frac{h(z) - h(z_n)}{h(z) - w_n}\right) \to 1$ as $z \to w$.
If $f(w) = 0$, then if $z$ is close enough to $w$, it follows $|f(z)| < \frac{1}{2}$. Also, by the uniform convergence on $B'(w,\delta)$, it follows that for some $N$, the partial product up to $N$ must also be less than 1/2 in absolute value for all $z$ close enough to $w$ and as noted above, this does not occur because such partial products converge to 1 as $z \to w$. Hence $f(w) \neq 0$. This proves the corollary.
Recall the definition of a meromorphic function on Page 474. It was a function which is analytic everywhere except at a countable set of isolated points at which the function has a pole. It is clear that the quotient of two analytic functions yields a meromorphic function but is this the only way it can happen?
Theorem 24.10 Suppose $Q$ is a meromorphic function on an open set, $\Omega$. Then there exist analytic functions on $\Omega$, $f(z)$ and $g(z)$ such that $Q(z) = f(z)/g(z)$ for all $z$ not in the set of poles of $Q$.
Proof: Let $Q$ have a pole of order $m(z)$ at $z$. Then by Corollary 24.9 there exists an analytic function, $g$ which has a zero of order $m(z)$ at every $z \in \Omega$. It follows $gQ$ has a removable singularity at the poles of $Q$. Therefore, there is an analytic function, $f$ such that $f(z) = g(z)Q(z)$. This proves the theorem.
Corollary 24.11 Suppose $\Omega$ is a region and $Q$ is a meromorphic function defined on $\Omega$ such that the set, $\{z \in \Omega : Q(z) = c\}$ has a limit point in $\Omega$. Then $Q(z) = c$ for all $z \in \Omega$.
Proof: From Theorem 24.10 there are analytic functions, $f, g$ such that $Q = \frac{f}{g}$. Therefore, the zero set of the function, $f(z) - cg(z)$ has a limit point in $\Omega$ and so $f(z) - cg(z) = 0$ for all $z \in \Omega$. This proves the corollary.
24.2 Factoring A Given Analytic Function
The next theorem is the Weierstrass factorization theorem which can be used to factor a given analytic function $f$. If $f$ has a zero of order $m$ when $z = 0$, then you could factor out a $z^m$ and from there consider the factorization of what remains when you have factored out the $z^m$. Therefore, the following is the main thing of interest.
Theorem 24.12 Let $f$ be analytic on $\mathbb{C}$, $f(0) \neq 0$, and let the zeros of $f$ be $\{z_k\}$, listed according to order. (Thus if $z$ is a zero of order $m$, it will be listed $m$ times in the list, $\{z_k\}$.) Choosing nonnegative integers, $p_n$ such that for all $r > 0$,
$$\sum_{n=1}^\infty \left(\frac{r}{|z_n|}\right)^{p_n+1} < \infty,$$
there exists an entire function, $g$ such that
$$f(z) = e^{g(z)}\prod_{n=1}^\infty E_{p_n}\left(\frac{z}{z_n}\right). \quad (24.12)$$
Note that $e^{g(z)} \neq 0$ for any $z$ and this is the interesting thing about this function.
Proof: $\{z_n\}$ cannot have a limit point because if there were a limit point of this sequence, it would follow from Theorem 19.23 that $f(z) = 0$ for all $z$, contradicting the hypothesis that $f(0) \neq 0$. Hence $\lim_{n\to\infty} |z_n| = \infty$ and so
$$\sum_{n=1}^\infty \left(\frac{r}{|z_n|}\right)^{1+n-1} = \sum_{n=1}^\infty \left(\frac{r}{|z_n|}\right)^n < \infty$$
by the root test. Therefore, by Theorem 24.6
$$P(z) = \prod_{n=1}^\infty E_{p_n}\left(\frac{z}{z_n}\right)$$
is a function analytic on $\mathbb{C}$, obtained by picking $p_n = n - 1$ or perhaps some other choice. ($p_n = n - 1$ works but there might be another choice that would work.) Then $f/P$ has only removable singularities in $\mathbb{C}$ and no zeros thanks to Theorem 24.6. Thus, letting $h(z) = f(z)/P(z)$, Corollary 19.50 implies that $h'/h$ has a primitive, $\tilde{g}$. Then
$$\left(he^{-\tilde{g}}\right)' = 0$$
and so
$$h(z) = e^{a+ib}e^{\tilde{g}(z)}$$
for some constants, $a, b$. Therefore, letting $g(z) = \tilde{g}(z) + a + ib$, $h(z) = e^{g(z)}$ and thus 24.12 holds. This proves the theorem.
Corollary 24.13 Let $f$ be analytic on $\mathbb{C}$, $f$ has a zero of order $m$ at 0, and let the other zeros of $f$ be $\{z_k\}$, listed according to order. (Thus if $z$ is a zero of order $l$, it will be listed $l$ times in the list, $\{z_k\}$.) Also let
$$\sum_{n=1}^\infty \left(\frac{r}{|z_n|}\right)^{1+p_n} < \infty \quad (24.13)$$
for any choice of $r > 0$. Then there exists an entire function, $g$ such that
$$f(z) = z^m e^{g(z)}\prod_{n=1}^\infty E_{p_n}\left(\frac{z}{z_n}\right). \quad (24.14)$$
Proof: Since $f$ has a zero of order $m$ at 0, it follows from Theorem 19.23 that $\{z_k\}$ cannot have a limit point in $\mathbb{C}$ and so you can apply Theorem 24.12 to the function, $f(z)/z^m$ which has a removable singularity at 0. This proves the corollary.
24.2.1 Factoring Some Special Analytic Functions
Factoring a polynomial is in general a hard task. It is true it is easy to prove the factors exist but finding them is another matter. Corollary 24.13 gives the existence of factors of a certain form but it does not tell how to find them. This should not be surprising. You can't expect things to get easier when you go from polynomials to analytic functions. Nevertheless, it is possible to factor some popular analytic functions. These factorizations are based on the following Mittag-Leffler expansions. By an auspicious choice of the contour and the method of residues it is possible to obtain a very interesting formula for $\cot \pi z$.
Example 24.14 Let $\Gamma_N$ be the contour which goes from $-N - \frac{1}{2} - Ni$ horizontally to $N + \frac{1}{2} - Ni$ and from there, vertically to $N + \frac{1}{2} + Ni$ and then horizontally to $-N - \frac{1}{2} + Ni$ and finally vertically to $-N - \frac{1}{2} - Ni$. Thus the contour is a large rectangle and the direction of integration is in the counter clockwise direction. Consider the integral
$$I_N \equiv \int_{\Gamma_N} \frac{\pi\cos \pi z}{\sin \pi z\,(\alpha^2 - z^2)}\,dz$$
where $\alpha \in \mathbb{R}$ is not an integer. This will be used to verify the formula of Mittag-Leffler,
$$\frac{1}{\alpha} + \sum_{n=1}^\infty \frac{2\alpha}{\alpha^2 - n^2} = \pi\cot \pi\alpha. \quad (24.15)$$
First you show that $\cot \pi z$ is bounded on this contour. This is easy using the formula $\cot(z) = i\,\frac{e^{iz} + e^{-iz}}{e^{iz} - e^{-iz}}$. Therefore, $I_N \to 0$ as $N \to \infty$ because the integrand is of order $1/N^2$ while the diameter of $\Gamma_N$ is of order $N$. Next you compute the residues of the integrand at $\pm\alpha$ and at $n$ where $|n| < N + \frac{1}{2}$ for $n$ an integer. These are the only singularities of the integrand in this contour and therefore, using the residue theorem, you can evaluate $I_N$ by using these. You can calculate these residues and find that the residue at $\pm\alpha$ is
$$\frac{-\pi\cos \pi\alpha}{2\alpha\sin \pi\alpha}$$
while the residue at $n$ is
$$\frac{1}{\alpha^2 - n^2}.$$
Therefore
$$0 = \lim_{N\to\infty} I_N = \lim_{N\to\infty} 2\pi i\left[\sum_{n=-N}^N \frac{1}{\alpha^2 - n^2} - \frac{\pi\cot \pi\alpha}{\alpha}\right]$$
which establishes the following formula of Mittag-Leffler.
$$\lim_{N\to\infty}\sum_{n=-N}^N \frac{1}{\alpha^2 - n^2} = \frac{\pi\cot \pi\alpha}{\alpha}.$$
Writing this in a slightly nicer form, you obtain 24.15.
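Formula 24.15 can be checked numerically by truncating the series. The following is a minimal sketch; the value $\alpha = 0.3$, the truncation level, and the function name `mittag_leffler_cot` are arbitrary illustrative choices. Because the terms decay like $1/n^2$, the truncation error after $N$ terms is roughly $2\alpha/N$, so agreement is only expected to a few digits.

```python
import math

def mittag_leffler_cot(alpha: float, N: int) -> float:
    """Truncation of 1/alpha + sum_{n=1}^{N} 2*alpha / (alpha^2 - n^2)."""
    total = 1.0 / alpha
    for n in range(1, N + 1):
        total += 2.0 * alpha / (alpha * alpha - n * n)
    return total

alpha = 0.3  # any real non-integer value works here
series_value = mittag_leffler_cot(alpha, 200_000)
exact_value = math.pi / math.tan(math.pi * alpha)  # pi * cot(pi * alpha)
```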
This is a very interesting formula. This will be used to factor $\sin(\pi z)$. The zeros of this function are at the integers. Therefore, considering 24.13 you can pick $p_n = 1$ in the Weierstrass factorization formula. Therefore, by Corollary 24.13 there exists an analytic function $g(z)$ such that
$$\sin(\pi z) = ze^{g(z)}\prod_{n=1}^\infty \left(1 - \frac{z}{z_n}\right)e^{z/z_n} \quad (24.16)$$
where the $z_n$ are the nonzero integers. Remember you can permute the factors in these products. Therefore, this can be written more conveniently as
$$\sin(\pi z) = ze^{g(z)}\prod_{n=1}^\infty \left(1 - \left(\frac{z}{n}\right)^2\right)$$
and it is necessary to find $g(z)$. Differentiating both sides of 24.16
$$\pi\cos(\pi z) = e^{g(z)}\prod_{n=1}^\infty \left(1 - \left(\frac{z}{n}\right)^2\right) + zg'(z)e^{g(z)}\prod_{n=1}^\infty \left(1 - \left(\frac{z}{n}\right)^2\right) + ze^{g(z)}\sum_{n=1}^\infty -\left(\frac{2z}{n^2}\right)\prod_{k\neq n}\left(1 - \left(\frac{z}{k}\right)^2\right)$$
Now divide both sides by $\sin(\pi z)$ to obtain
$$\pi\cot(\pi z) = \frac{1}{z} + g'(z) - \sum_{n=1}^\infty \frac{2z/n^2}{(1 - z^2/n^2)} = \frac{1}{z} + g'(z) + \sum_{n=1}^\infty \frac{2z}{z^2 - n^2}.$$
By 24.15, this yields $g'(z) = 0$ for $z$ not an integer and so $g(z) = c$, a constant. So far this yields
$$\sin(\pi z) = ze^c\prod_{n=1}^\infty \left(1 - \left(\frac{z}{n}\right)^2\right)$$
and it only remains to find $c$. Divide both sides by $\pi z$ and take a limit as $z \to 0$. Using the power series of $\sin(\pi z)$, this yields
$$1 = \frac{e^c}{\pi}$$
and so $c = \ln \pi$. Therefore,
$$\sin(\pi z) = z\pi\prod_{n=1}^\infty \left(1 - \left(\frac{z}{n}\right)^2\right). \quad (24.17)$$
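The factorization 24.17 also lends itself to a direct numerical check, including at complex arguments. This is only a sketch: the test point `z`, the truncation level, and the name `sin_product` are illustrative, and the relative error of the truncated product is expected to be of order $|z|^2/N$.

```python
import cmath
import math

def sin_product(z: complex, N: int) -> complex:
    """Truncated product pi * z * prod_{n=1}^{N} (1 - (z/n)^2)."""
    prod = math.pi * z
    for n in range(1, N + 1):
        prod *= 1 - (z / n) ** 2
    return prod

z = 0.7 + 0.4j  # arbitrary complex test point
approx = sin_product(z, 100_000)
exact = cmath.sin(math.pi * z)
relative_error = abs(approx - exact) / abs(exact)

# At an integer the factor with n equal to that integer vanishes.
value_at_two = sin_product(2, 10)
```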
Example 24.15 Find an interesting formula for $\tan(\pi z)$.
This is easy to obtain from the formula for $\cot(\pi z)$.
$$\cot\left(\pi\left(z + \frac{1}{2}\right)\right) = -\tan \pi z$$
for $z$ real and therefore, this formula holds for $z$ complex also. Therefore, for $z + \frac{1}{2}$ not an integer
$$\pi\cot\left(\pi\left(z + \frac{1}{2}\right)\right) = \frac{2}{2z+1} + \sum_{n=1}^\infty \frac{2z+1}{\left(\frac{2z+1}{2}\right)^2 - n^2}$$
24.3 The Existence Of An Analytic Function With Given Values
The Weierstrass product formula, Theorem 24.6, along with the Mittag-Leffler theorem, Theorem 23.13 can be used to obtain an analytic function which has given values on a countable set of points, having no limit point. This is clearly an amazing result and indicates how potent these theorems are. In fact, you can show that it isn't just the values of the function which may be specified at the points in this countable set of points but the derivatives up to any finite order.
Theorem 24.16 Let $P \equiv \{z_k\}_{k=1}^\infty$ be a set of points in $\mathbb{C}$, which has no limit point. For each $z_k$, consider
$$\sum_{j=0}^{m_k} a_j^k (z - z_k)^j. \quad (24.18)$$
Then there exists an analytic function $f$ defined on $\mathbb{C}$ such that the Taylor series of $f$ at $z_k$ has the first $m_k$ terms given by 24.18.$^1$
Proof: By the Weierstrass product theorem, Theorem 24.6, there exists an analytic function, $f$ defined on all of $\mathbb{C}$ such that $f$ has a zero of order $m_k + 1$ at $z_k$. Consider this $z_k$. Thus for $z$ near $z_k$,
$$f(z) = \sum_{j=m_k+1}^\infty c_j (z - z_k)^j$$
where $c_{m_k+1} \neq 0$. You choose $b_1, b_2, \cdots, b_{m_k+1}$ such that
$$f(z)\left(\sum_{l=1}^{m_k+1} \frac{b_l}{(z - z_k)^l}\right) = \sum_{j=0}^{m_k} a_j^k (z - z_k)^j + \sum_{j=m_k+1}^\infty c_j^k (z - z_k)^j.$$
$^1$This says you can specify the first $m_k$ derivatives of the function at the point $z_k$.
Thus you need
$$\sum_{l=1}^{m_k+1}\sum_{j=m_k+1}^\infty c_j b_l (z - z_k)^{j-l} = \sum_{r=0}^{m_k} a_r^k (z - z_k)^r + \text{Higher order terms}.$$
It follows you need to solve the following system of equations for $b_1, \cdots, b_{m_k+1}$.
$$c_{m_k+1}b_{m_k+1} = a_0^k$$
$$c_{m_k+2}b_{m_k+1} + c_{m_k+1}b_{m_k} = a_1^k$$
$$c_{m_k+3}b_{m_k+1} + c_{m_k+2}b_{m_k} + c_{m_k+1}b_{m_k-1} = a_2^k$$
$$\vdots$$
$$c_{m_k+m_k+1}b_{m_k+1} + c_{m_k+m_k}b_{m_k} + \cdots + c_{m_k+1}b_1 = a_{m_k}^k$$
Since $c_{m_k+1} \neq 0$, it follows there exists a unique solution to the above system. You first solve for $b_{m_k+1}$ in the top. Then, having found it, you go to the next and use $c_{m_k+1} \neq 0$ again to find $b_{m_k}$ and continue in this manner. Let $S_k(z)$ be determined in this manner for each $z_k$. By the Mittag-Leffler theorem, there exists a meromorphic function, $g$ such that $g$ has exactly the singularities, $S_k(z)$. Therefore, $f(z)g(z)$ has removable singularities at each $z_k$ and for $z$ near $z_k$, the first $m_k$ terms of $fg$ are as prescribed. This proves the theorem.
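The triangular system for $b_1, \cdots, b_{m_k+1}$ can be solved exactly as the proof describes, one unknown per equation. Here is a small sketch using exact rational arithmetic; the function name `solve_singular_part`, the storage convention `c[i]` $= c_{m+1+i}$, and the sample coefficients are all hypothetical choices made for illustration (with $m$ standing for $m_k$).

```python
from fractions import Fraction

def solve_singular_part(c, a):
    """Solve the triangular system for the coefficients b_l of the singular part.

    Convention (an assumption of this sketch): c[i] holds c_{m+1+i} and a[r]
    holds a_r for i, r = 0..m. Returns a dict {l: b_l} for l = 1..m+1.
    Equation r reads: sum_{i=0}^{r} c_{m+1+i} * b_{m+1-r+i} = a_r.
    """
    m = len(a) - 1
    b = {}
    for r in range(m + 1):
        l = m + 1 - r  # the one new unknown appearing in equation r
        known = sum(c[i] * b[l + i] for i in range(1, r + 1))
        b[l] = (a[r] - known) / c[0]  # c[0] = c_{m+1} is nonzero
    return b

# Hypothetical data with m = 2: c_3, c_4, c_5 and targets a_0, a_1, a_2.
c = [Fraction(2), Fraction(1), Fraction(3)]
a = [Fraction(4), Fraction(0), Fraction(1)]
b = solve_singular_part(c, a)

# Residuals of the three equations, which should all be zero.
m = len(a) - 1
residuals = [
    sum(c[i] * b[m + 1 - r + i] for i in range(r + 1)) - a[r]
    for r in range(m + 1)
]
```

Each step divides only by $c_{m+1}$, which is why the single condition $c_{m+1} \neq 0$ is enough for a unique solution.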
Corollary 24.17 Let $P \equiv \{z_k\}_{k=1}^\infty$ be a set of points in $\Omega$, an open set such that $P$ has no limit points in $\Omega$. For each $z_k$, consider
$$\sum_{j=0}^{m_k} a_j^k (z - z_k)^j. \quad (24.19)$$
Then there exists an analytic function $f$ defined on $\Omega$ such that the Taylor series of $f$ at $z_k$ has the first $m_k$ terms given by 24.19.
Proof: The proof is identical to the above except you use the versions of the Mittag-Leffler theorem and Weierstrass product which pertain to open sets.
Definition 24.18 Denote by $H(\Omega)$ the analytic functions defined on $\Omega$, an open subset of $\mathbb{C}$. Then $H(\Omega)$ is a commutative ring$^2$ with the usual operations of addition and multiplication. A set, $I \subseteq H(\Omega)$ is called a finitely generated ideal of the ring if $I$ is of the form
$$\left\{\sum_{k=1}^n g_k f_k : f_k \in H(\Omega) \text{ for } k = 1, 2, \cdots, n\right\}$$
where $g_1, \cdots, g_n$ are given functions in $H(\Omega)$. This ideal is also denoted as $[g_1, \cdots, g_n]$ and is called the ideal generated by the functions, $\{g_1, \cdots, g_n\}$. Since there are finitely many of these functions it is called a finitely generated ideal. A principal ideal is one which is generated by a single function. An example of such a thing is $[1] = H(\Omega)$.
$^2$It is not a field because you can't divide two analytic functions and get another one.
Then there is the following interesting theorem.
Theorem 24.19 Every finitely generated ideal in $H(\Omega)$ for $\Omega$ a connected open set (region) is a principal ideal.
Proof: Let $I = [g_1, \cdots, g_n]$ be a finitely generated ideal as described above. Then if any of the functions has no zeros, this ideal would consist of $H(\Omega)$ because then $g_i^{-1} \in H(\Omega)$ and so $1 \in I$. It follows it can be assumed all the functions have zeros. If any of the functions has a zero of infinite order, then the function equals zero on $\Omega$ because $\Omega$ is connected and can be deleted from the list. Similarly, if the zeros of any of these functions have a limit point in $\Omega$, then the function equals zero and can be deleted from the list. Thus, without loss of generality, all zeros are of finite order and there are no limit points of the zeros in $\Omega$. Let $m(g_i, z)$ denote the order of the zero of $g_i$ at $z$. If $g_i$ has no zero at $z$, then $m(g_i, z) = 0$.
I claim that if no point of $\Omega$ is a zero of all the $g_i$, then the conclusion of the theorem is true and in fact $[g_1, \cdots, g_n] = [1] = H(\Omega)$. The claim is obvious if $n = 1$ because this assumption that no point is a zero of all the functions implies $g_1 \neq 0$ and so $g_1^{-1}$ is analytic. Hence $1 \in [g_1]$. Suppose it is true for $n - 1$ and consider $[g_1, \cdots, g_n]$ where no point of $\Omega$ is a zero of all the $g_i$. Even though this may be true of $\{g_1, \cdots, g_n\}$, it may not be true of $\{g_1, \cdots, g_{n-1}\}$. By Corollary 24.9 there exists $\phi$, a function analytic on $\Omega$ such that $m(\phi, z) = \min\{m(g_i, z), i = 1, 2, \cdots, n-1\}$. Thus the functions $\{g_1/\phi, \cdots, g_{n-1}/\phi\}$ are all analytic. Could they all equal zero at some point, $z$? If so, pick $i$ where $m(\phi, z) = m(g_i, z)$. Thus $g_i/\phi$ is not equal to zero at $z$ after all and so these functions are analytic and there is no point of $\Omega$ which is a zero of all of them. By induction, $[g_1/\phi, \cdots, g_{n-1}/\phi] = H(\Omega)$. (Also there are no new zeros obtained in this way.)
Now this means there exist functions $f_i \in H(\Omega)$ such that $\sum_{i=1}^{n-1} f_i\left(\frac{g_i}{\phi}\right) = 1$ and so $\phi = \sum_{i=1}^{n-1} f_i g_i$. Therefore, $[\phi] \subseteq [g_1, \cdots, g_{n-1}]$. On the other hand, if $\sum_{k=1}^{n-1} h_k g_k \in [g_1, \cdots, g_{n-1}]$ you could define $h \equiv \sum_{k=1}^{n-1} h_k (g_k/\phi)$, an analytic function with the property that $h\phi = \sum_{k=1}^{n-1} h_k g_k$ which shows $[\phi] = [g_1, \cdots, g_{n-1}]$. Therefore,
$$[g_1, \cdots, g_n] = [\phi, g_n]$$
Now $\phi$ has no zeros in common with $g_n$ because the zeros of $\phi$ are contained in the set of zeros for $g_1, \cdots, g_{n-1}$. Now consider a zero, $\alpha$ of $\phi$. It is not a zero of $g_n$ and so near $\alpha$, these functions have the form
$$\phi(z) = \sum_{k=m}^\infty a_k (z - \alpha)^k, \qquad g_n(z) = \sum_{k=0}^\infty b_k (z - \alpha)^k, \quad b_0 \neq 0.$$
I want to determine coefficients for an analytic function, $h$ such that
$$m(1 - hg_n, \alpha) \ge m(\phi, \alpha). \quad (24.20)$$
Let
$$h(z) = \sum_{k=0}^\infty c_k (z - \alpha)^k$$
and the $c_k$ must be determined. Using Merten's theorem, the power series for $1 - hg_n$ is of the form
$$1 - b_0c_0 - \sum_{j=1}^\infty \left(\sum_{r=0}^j b_{j-r}c_r\right)(z - \alpha)^j.$$
First determine $c_0$ such that $1 - c_0b_0 = 0$. This is no problem because $b_0 \neq 0$. Next you need to get the coefficients of $(z - \alpha)$ to equal zero. This requires
$$b_1c_0 + b_0c_1 = 0.$$
Again, there is no problem because $b_0 \neq 0$. In fact, $c_1 = -(b_1c_0/b_0)$. Next consider the second order terms if $m \ge 2$.
$$b_2c_0 + b_1c_1 + b_0c_2 = 0$$
Again there is no problem in solving, this time for $c_2$ because $b_0 \neq 0$. Continuing this way, you see that in every step, the $c_k$ which needs to be solved for is multiplied by $b_0 \neq 0$. Therefore, by Corollary 24.17, which allows finitely many Taylor coefficients to be prescribed at each zero of $\phi$, there exists an analytic function, $h$ satisfying 24.20. Therefore, $(1 - hg_n)/\phi$ has a removable singularity at every zero of $\phi$ and so may be considered an analytic function. Therefore,
$$1 = \frac{1 - hg_n}{\phi}\phi + hg_n \in [\phi, g_n] = [g_1 \cdots g_n]$$
which shows $[g_1 \cdots g_n] = H(\Omega) = [1]$. It follows the claim is established.
Now suppose $\{g_1 \cdots g_n\}$ are just elements of $H(\Omega)$. As explained above, it can be assumed they all have zeros of finite order and the zeros have no limit point in $\Omega$ since if these occur, you can delete the function from the list. By Corollary 24.9 there exists $\phi \in H(\Omega)$ such that $m(\phi, z) \equiv \min\{m(g_i, z) : i = 1, \cdots, n\}$. Then $g_k/\phi$ has a removable singularity at each zero of $g_k$ and so can be regarded as an analytic function. Also, as before, there is no point which is a zero of each $g_k/\phi$ and so by the first part of this argument, $[g_1/\phi \cdots g_n/\phi] = H(\Omega)$. As in the first part of the argument, this implies $[g_1 \cdots g_n] = [\phi]$ which proves the theorem. $[g_1 \cdots g_n]$ is a principal ideal as claimed.
The following corollary follows from the above theorem. You don't need to assume $\Omega$ is connected.
Corollary 24.20 Every finitely generated ideal in $H(\Omega)$ for $\Omega$ an open set is a principal ideal.
Proof: Let $[g_1, \cdots, g_n]$ be a finitely generated ideal in $H(\Omega)$. Let $\{U_k\}$ be the components of $\Omega$. Then applying the above to each component, there exists $h_k \in H(U_k)$ such that restricting each $g_i$ to $U_k$, $[g_1, \cdots, g_n] = [h_k]$. Then let $h(z) = h_k(z)$ for $z \in U_k$. This is an analytic function which works.
24.4 Jensen's Formula
This interesting formula relates the zeros of an analytic function to an integral. The proof given here follows Ahlfors, [2]. First, here is a technical lemma.
Lemma 24.21
\[ \int_{-\pi}^{\pi} \ln\left|1 - e^{i\theta}\right| \, d\theta = 0. \]
Proof: First note that the only problem with the integrand occurs when $\theta = 0$. However, this is an integrable singularity so the integral will end up making sense. Letting $z = e^{i\theta}$, you could get the above integral as a limit as $\varepsilon \to 0$ of the following contour integral where $\gamma_\varepsilon$ is the contour shown in the following picture with the radius of the big circle equal to 1 and the radius of the little circle equal to $\varepsilon$.
\[ \int_{\gamma_\varepsilon} \frac{\ln|1 - z|}{iz} \, dz. \]
[Figure: the contour $\gamma_\varepsilon$, with the point 1 marked.]
On the indicated contour, $1 - z$ lies in the half plane $\operatorname{Re} z > 0$ and so $\log(1 - z) = \ln|1 - z| + i \arg(1 - z)$. The above integral equals
\[ \int_{\gamma_\varepsilon} \frac{\log(1 - z)}{iz} \, dz - \int_{\gamma_\varepsilon} \frac{\arg(1 - z)}{z} \, dz. \]
The first of these integrals equals zero because the integrand has a removable singularity at 0. The second equals
i
_

arg
_
1 e
i
_
d +i
_

arg
_
1 e
i
_
d
+i
_

d +i
_
2
d
where
0 as 0. The last two terms converge to 0 as 0 while the

first two add to zero. To see this, change the variable in the first integral and then recall that when you multiply complex numbers you add the arguments. Thus you end up integrating arg (real valued function) which equals zero.
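As a sanity check on Lemma 24.21, the integral can be approximated numerically. This is only a sketch: the midpoint rule below is my choice, and an even number of subintervals is assumed so that no sample point lands on the singularity at $\theta = 0$.

```python
import math

def log_abs_integral(n):
    """Midpoint-rule approximation of the integral of ln|1 - e^{i theta}|
    over [-pi, pi]. Use an even n so that theta = 0 is never sampled."""
    if n % 2 != 0:
        raise ValueError("n must be even to avoid the singularity at theta = 0")
    h = 2 * math.pi / n
    total = 0.0
    for j in range(n):
        theta = -math.pi + (j + 0.5) * h
        total += math.log(abs(1 - complex(math.cos(theta), math.sin(theta))))
    return total * h

approx = log_abs_integral(20000)
```

The approximation shrinks toward 0 as `n` grows, which is consistent with the lemma even though the integrand is unbounded near $\theta = 0$.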
In this material on Jensen's equation, $\varepsilon$ will denote a small positive number. Its value is not important as long as it is positive. Therefore, it may change from place to place. Now suppose $f$ is analytic on $B(0, r+\varepsilon)$, and $f$ has no zeros on $\overline{B(0,r)}$. Then you can define a branch of the logarithm which makes sense for complex numbers near $f(z)$. Thus $z \to \log(f(z))$ is analytic on $B(0, r+\varepsilon)$. Therefore, its real part, $u(x,y) \equiv \ln|f(x+iy)|$ must be harmonic. Consider the following lemma.
Lemma 24.22 Let $u$ be harmonic on $B(0, r+\varepsilon)$. Then
\[ u(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} u\left(re^{i\theta}\right) d\theta. \]
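Proof: For a harmonic function, $u$ defined on $B(0, r+\varepsilon)$, there exists an analytic function, $h = u + iv$ where
\[ v(x,y) \equiv \int_0^y u_x(x,t) \, dt - \int_0^x u_y(t,0) \, dt. \]
By the Cauchy integral formula,
\[ h(0) = \frac{1}{2\pi i} \int_{\gamma_r} \frac{h(z)}{z} \, dz = \frac{1}{2\pi} \int_{-\pi}^{\pi} h\left(re^{i\theta}\right) d\theta. \]
Therefore, considering the real part of $h$,
\[ u(0) = \frac{1}{2\pi} \int_{-\pi}^{\pi} u\left(re^{i\theta}\right) d\theta. \]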
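The mean value property in Lemma 24.22 can be checked numerically for a specific harmonic function. The sketch below uses $u(x,y) = e^x \cos y$, the real part of $e^z$, which is my choice of example and not one from the text; the average is taken over equally spaced points on the circle.

```python
import math

def u(x, y):
    """Real part of e^z, a harmonic function on the whole plane."""
    return math.exp(x) * math.cos(y)

def circle_average(func, r, n=200):
    """Average of func over n equally spaced points on the circle |z| = r."""
    total = 0.0
    for j in range(n):
        theta = -math.pi + 2 * math.pi * j / n
        total += func(r * math.cos(theta), r * math.sin(theta))
    return total / n

average = circle_average(u, 2.0)
center_value = u(0.0, 0.0)
```

The two numbers agree to many digits for any radius, as the lemma predicts.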
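Now this shows the following corollary.
Corollary 24.23 Suppose $f$ is analytic on $B(0, r+\varepsilon)$ and has no zeros on $\overline{B(0,r)}$. Then
\[ \ln|f(0)| = \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta \tag{24.21} \]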
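What if $f$ has some zeros on $|z| = r$ but none on $B(0,r)$? It turns out 24.21 is still valid. Suppose the zeros are at $\left\{re^{i\theta_k}\right\}_{k=1}^m$, listed according to multiplicity. Then let
\[ g(z) = \frac{f(z)}{\prod_{k=1}^m \left(z - re^{i\theta_k}\right)}. \]
It follows $g$ is analytic on $B(0, r+\varepsilon)$ but has no zeros in $\overline{B(0,r)}$. Then 24.21 holds for $g$ in place of $f$. Thus
\[
\begin{aligned}
\ln|f(0)| - \sum_{k=1}^m \ln|r|
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta - \frac{1}{2\pi} \int_{-\pi}^{\pi} \sum_{k=1}^m \ln\left|re^{i\theta} - re^{i\theta_k}\right| d\theta \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta - \frac{1}{2\pi} \int_{-\pi}^{\pi} \sum_{k=1}^m \ln\left|e^{i\theta} - e^{i\theta_k}\right| d\theta - \sum_{k=1}^m \ln|r| \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta - \frac{1}{2\pi} \int_{-\pi}^{\pi} \sum_{k=1}^m \ln\left|e^{i\theta} - 1\right| d\theta - \sum_{k=1}^m \ln|r|
\end{aligned}
\]
Therefore, 24.21 will continue to hold exactly when
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \sum_{k=1}^m \ln\left|e^{i\theta} - 1\right| d\theta = 0. \]
But this is the content of Lemma 24.21. This proves the following lemma.
Lemma 24.24 Suppose $f$ is analytic on $B(0, r+\varepsilon)$ and has no zeros on $B(0,r)$. Then
\[ \ln|f(0)| = \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta \tag{24.22} \]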
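With this preparation, it is now not too hard to prove Jensen's formula. Suppose there are $n$ zeros of $f$ in $B(0,r)$, $\{a_k\}_{k=1}^n$, listed according to multiplicity, none equal to zero. Let
\[ F(z) \equiv f(z) \prod_{i=1}^n \frac{r^2 - \overline{a_i} z}{r (z - a_i)}. \]
Then $F$ is analytic on $B(0, r+\varepsilon)$ and has no zeros in $B(0,r)$. The reason for this is that $f(z) / \prod_{i=1}^n r(z - a_i)$ has no zeros there and $r^2 - \overline{a_i} z$ cannot equal zero if $|z| < r$ because if this expression equals zero, then
\[ |z| = \frac{r^2}{|a_i|} > r. \]
The other interesting thing about $F(z)$ is that when $z = re^{i\theta}$,
\[
\begin{aligned}
F\left(re^{i\theta}\right) &= f\left(re^{i\theta}\right) \prod_{i=1}^n \frac{r^2 - \overline{a_i} re^{i\theta}}{r\left(re^{i\theta} - a_i\right)} \\
&= f\left(re^{i\theta}\right) \prod_{i=1}^n \frac{r - \overline{a_i} e^{i\theta}}{re^{i\theta} - a_i} \\
&= f\left(re^{i\theta}\right) \prod_{i=1}^n e^{i\theta} \, \frac{re^{-i\theta} - \overline{a_i}}{re^{i\theta} - a_i}
\end{aligned}
\]
so $\left|F\left(re^{i\theta}\right)\right| = \left|f\left(re^{i\theta}\right)\right|$, because in each factor the numerator $re^{-i\theta} - \overline{a_i}$ is the conjugate of the denominator.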
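Theorem 24.25 Let $f$ be analytic on $B(0, r+\varepsilon)$ and suppose $f(0) \neq 0$. If the zeros of $f$ in $B(0,r)$ are $\{a_k\}_{k=1}^n$, listed according to multiplicity, then
\[ \ln|f(0)| = -\sum_{i=1}^n \ln\left(\frac{r}{|a_i|}\right) + \frac{1}{2\pi} \int_0^{2\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta. \]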
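Proof: From the above discussion and Lemma 24.24,
\[ \ln|F(0)| = \frac{1}{2\pi} \int_{-\pi}^{\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta \]
But $F(0) = f(0) \prod_{i=1}^n \left(-\frac{r}{a_i}\right)$ and so $\ln|F(0)| = \ln|f(0)| + \sum_{i=1}^n \ln\left|\frac{r}{a_i}\right|$. Therefore,
\[ \ln|f(0)| = -\sum_{i=1}^n \ln\left|\frac{r}{a_i}\right| + \frac{1}{2\pi} \int_0^{2\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta \]
as claimed.
Written in terms of exponentials this is
\[ |f(0)| \prod_{k=1}^n \left|\frac{r}{a_k}\right| = \exp\left( \frac{1}{2\pi} \int_0^{2\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta \right). \]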
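Jensen's formula is easy to test numerically on a polynomial whose zeros are known. The following is a sketch under my own assumptions: the polynomial, the radius $r = 1$, and the equally spaced sampling of the circle are all chosen for illustration and are not from the text.

```python
import math

ZEROS = [0.5 + 0j, -0.25j, 3.0 + 0j]   # two zeros inside |z| < 1, one outside

def f(z):
    """Polynomial with the zeros listed in ZEROS."""
    value = 1.0 + 0j
    for a in ZEROS:
        value *= (z - a)
    return value

def jensen_sides(r, n=4000):
    """Return (left side, right side) of Jensen's formula on the circle |z| = r."""
    left = math.log(abs(f(0)))
    inside = [a for a in ZEROS if abs(a) < r]
    zero_sum = sum(math.log(r / abs(a)) for a in inside)
    mean_log = 0.0
    for j in range(n):
        theta = 2 * math.pi * j / n
        mean_log += math.log(abs(f(r * complex(math.cos(theta), math.sin(theta)))))
    mean_log /= n   # approximates (1 / 2 pi) times the integral over [0, 2 pi]
    return left, -zero_sum + mean_log

left, right = jensen_sides(1.0)
```

Here $|f(0)| = 3/8$, and both sides come out equal to $\ln(3/8)$ up to rounding.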
24.5 Blaschke Products
The Blaschke$^3$ product is a way to produce a function which is bounded and analytic on $B(0,1)$ which also has given zeros in $B(0,1)$. The interesting thing here is that there may be infinitely many of these zeros. Thus, unlike the above case of Jensen's formula, the function is not analytic on $\overline{B(0,1)}$. Recall for purposes of comparison, Liouville's theorem which says bounded entire functions are constant. The Blaschke product gives examples of bounded functions on $B(0,1)$ which are definitely not constant.
Theorem 24.26 Let $\{\alpha_n\}$ be a sequence of nonzero points in $B(0,1)$ with the property that
\[ \sum_{n=1}^{\infty} (1 - |\alpha_n|) < \infty. \]
Then for $k \geq 0$, an integer
\[ B(z) \equiv z^k \prod_{n=1}^{\infty} \frac{\alpha_n - z}{1 - \overline{\alpha_n} z} \, \frac{|\alpha_n|}{\alpha_n} \]
is a bounded function which is analytic on $B(0,1)$ which has zeros only at 0 if $k > 0$ and at the $\alpha_n$.
$^3$ Wilhelm Blaschke, 1915
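Proof: From Theorem 24.2 the above product will converge uniformly on $B(0,r)$ for $r < 1$ to an analytic function if
\[ \sum_{n=1}^{\infty} \left| \frac{\alpha_n - z}{1 - \overline{\alpha_n} z} \, \frac{|\alpha_n|}{\alpha_n} - 1 \right| \]
converges uniformly on $B(0,r)$. But for $|z| < r$,
\[
\begin{aligned}
\left| \frac{\alpha_n - z}{1 - \overline{\alpha_n} z} \, \frac{|\alpha_n|}{\alpha_n} - 1 \right|
&= \left| \frac{(\alpha_n - z)\,|\alpha_n| - \alpha_n (1 - \overline{\alpha_n} z)}{(1 - \overline{\alpha_n} z)\, \alpha_n} \right| \\
&= \left| \frac{|\alpha_n| \alpha_n - |\alpha_n| z - \alpha_n + |\alpha_n|^2 z}{(1 - \overline{\alpha_n} z)\, \alpha_n} \right| \\
&= \big| |\alpha_n| - 1 \big| \left| \frac{\alpha_n + z |\alpha_n|}{(1 - \overline{\alpha_n} z)\, \alpha_n} \right| \\
&= \big| |\alpha_n| - 1 \big| \left| \frac{1 + z \left( |\alpha_n| / \alpha_n \right)}{1 - \overline{\alpha_n} z} \right| \\
&\leq \big| |\alpha_n| - 1 \big| \, \frac{1 + |z|}{1 - |z|} \leq \big| |\alpha_n| - 1 \big| \, \frac{1 + r}{1 - r}
\end{aligned}
\]
and so the assumption on the sum gives uniform convergence of the product on $B(0,r)$ to an analytic function. Since $r < 1$ is arbitrary, this shows $B(z)$ is analytic on $B(0,1)$ and has the specified zeros because the only place the factors equal zero are at the $\alpha_n$ or 0.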
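Now consider the factors in the product. The claim is that they are all no larger in absolute value than 1. This is very easy to see from the maximum modulus theorem. Let $|\alpha| < 1$ and $\phi(z) = \frac{\alpha - z}{1 - \overline{\alpha} z}$. Then $\phi$ is analytic near $\overline{B(0,1)}$ because its only pole is $1/\overline{\alpha}$. Consider $z = e^{i\theta}$. Then
\[ \left| \phi\left(e^{i\theta}\right) \right| = \left| \frac{\alpha - e^{i\theta}}{1 - \overline{\alpha} e^{i\theta}} \right| = \left| \frac{1 - \alpha e^{-i\theta}}{1 - \overline{\alpha} e^{i\theta}} \right| = 1. \]
Thus the modulus of $\phi(z)$ equals 1 on $\partial B(0,1)$. Therefore, by the maximum modulus theorem, $|\phi(z)| < 1$ if $|z| < 1$. This proves the claim that the terms in the product are no larger than 1 and shows the function determined by the Blaschke product is bounded. This proves the theorem.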
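Here is a small numerical illustration of the two facts used in the proof: each factor has modulus 1 on the unit circle, and a partial product is bounded by 1 inside the disk. It is only a sketch; the zeros $\alpha_n = 1 - 1/(n+1)^2$ are my own choice, picked so that the sum of $1 - |\alpha_n|$ converges, and the function names are hypothetical.

```python
import cmath

def blaschke_factor(alpha, z):
    """One factor of the product: (alpha - z) / (1 - conj(alpha) z) * |alpha| / alpha."""
    return (alpha - z) / (1 - alpha.conjugate() * z) * abs(alpha) / alpha

def partial_blaschke(alphas, z):
    """Finite partial product of Blaschke factors (with k = 0)."""
    value = 1.0 + 0j
    for alpha in alphas:
        value *= blaschke_factor(alpha, z)
    return value

# Assumed zeros: 1 - |alpha_n| = 1 / (n + 1)^2, which is summable.
alphas = [complex(1 - 1 / (n + 1) ** 2, 0) for n in range(1, 200)]

boundary_modulus = abs(blaschke_factor(alphas[3], cmath.exp(0.7j)))
interior_modulus = abs(partial_blaschke(alphas, 0.3 + 0.2j))
value_at_zero = partial_blaschke(alphas, alphas[0])
```

The partial product vanishes at each chosen $\alpha_n$ and stays below 1 in modulus at interior points, no matter how many factors are included.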
Note in the conditions for this theorem the one for the sum, $\sum_{n=1}^{\infty} (1 - |\alpha_n|) < \infty$. The Blaschke product gives an analytic function, whose absolute value is bounded by 1 and which has the $\alpha_n$ as zeros. What if you had a bounded function, analytic on $B(0,1)$ which had zeros at $\{\alpha_k\}$? Could you conclude the condition on the sum? The answer is yes. In fact, you can get by with less than the assumption that $f$ is bounded but this will not be presented here. See Rudin [38]. This theorem is an exciting use of Jensen's equation.
Theorem 24.27 Suppose $f$ is an analytic function on $B(0,1)$, $f(0) \neq 0$, and $|f(z)| \leq M$ for all $z \in B(0,1)$. Suppose also that the zeros of $f$ are $\{\alpha_k\}_{k=1}^{\infty}$, listed according to multiplicity. Then $\sum_{k=1}^{\infty} (1 - |\alpha_k|) < \infty$.
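Proof: If there are only finitely many zeros, there is nothing to prove so assume there are infinitely many. Also let the zeros be listed such that $|\alpha_n| \leq |\alpha_{n+1}| \leq \cdots$. Let $n(r)$ denote the number of zeros in $B(0,r)$. By Jensen's formula,
\[ \ln|f(0)| + \sum_{i=1}^{n(r)} \left( \ln r - \ln|\alpha_i| \right) = \frac{1}{2\pi} \int_0^{2\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta \leq \ln(M). \]
Therefore, by the mean value theorem,
\[ \sum_{i=1}^{n(r)} \frac{1}{r} \left( r - |\alpha_i| \right) \leq \sum_{i=1}^{n(r)} \left( \ln r - \ln|\alpha_i| \right) \leq \ln(M) - \ln|f(0)| \]
As $r \to 1-$, $n(r) \to \infty$, and so an application of Fatou's lemma yields
\[ \sum_{i=1}^{\infty} (1 - |\alpha_i|) \leq \liminf_{r \to 1-} \sum_{i=1}^{n(r)} \frac{1}{r} \left( r - |\alpha_i| \right) \leq \ln(M) - \ln|f(0)|. \]
You don't need the assumption that $f(0) \neq 0$.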
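Corollary 24.28 Suppose $f$ is an analytic function on $B(0,1)$ and $|f(z)| \leq M$ for all $z \in B(0,1)$. Suppose also that the nonzero zeros$^4$ of $f$ are $\{\alpha_k\}_{k=1}^{\infty}$, listed according to multiplicity. Then $\sum_{k=1}^{\infty} (1 - |\alpha_k|) < \infty$.
Proof: Suppose $f$ has a zero of order $m$ at 0. Then consider the analytic function, $g(z) \equiv f(z)/z^m$ which has the same zeros except for 0. The argument goes the same way except here you use $g$ instead of $f$ and only consider $r > r_0 > 0$. Thus from Jensen's equation,
\[
\begin{aligned}
\ln|g(0)| + \sum_{i=1}^{n(r)} \left( \ln r - \ln|\alpha_i| \right)
&= \frac{1}{2\pi} \int_0^{2\pi} \ln\left|g\left(re^{i\theta}\right)\right| d\theta \\
&= \frac{1}{2\pi} \int_0^{2\pi} \ln\left|f\left(re^{i\theta}\right)\right| d\theta - \frac{1}{2\pi} \int_0^{2\pi} m \ln(r) \, d\theta \\
&\leq \ln(M) + \frac{1}{2\pi} \int_0^{2\pi} m \ln\left(r^{-1}\right) d\theta \\
&\leq \ln(M) + m \ln\left(\frac{1}{r_0}\right).
\end{aligned}
\]
Now the rest of the argument is the same.
$^4$ This is a fun thing to say: nonzero zeros.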
An interesting restatement yields the following amazing result.
Corollary 24.29 Suppose $f$ is analytic and bounded on $B(0,1)$ having zeros $\{\alpha_n\}$. Then if $\sum_{n=1}^{\infty} (1 - |\alpha_n|) = \infty$, it follows $f$ is identically equal to zero.
24.5.1 The Müntz-Szász Theorem
Corollary 24.29 makes possible an easy proof of a remarkable theorem named above which yields a wonderful generalization of the Weierstrass approximation theorem. In what follows $b > 0$. The Weierstrass approximation theorem states that linear combinations of $1, t, t^2, t^3, \cdots$ (polynomials) are dense in $C([0,b])$. Let $\lambda_1 < \lambda_2 < \lambda_3 < \cdots$ be an increasing list of positive real numbers. This theorem tells when linear combinations of $1, t^{\lambda_1}, t^{\lambda_2}, \cdots$ are dense in $C([0,b])$. The proof which follows is like the one given in Rudin [38]. There is a much longer one in Cheney [13] which discusses more aspects of the subject. This other approach is much more elementary and does not depend in any way on the theory of functions of a complex variable. There are those of us who automatically prefer real variable techniques. Nevertheless, this proof by Rudin is a very nice and insightful application of the preceding material. Cheney refers to the theorem as the second Müntz theorem. I guess Szász must also have been involved.
Theorem 24.30 Let $\lambda_1 < \lambda_2 < \lambda_3 < \cdots$ be an increasing list of positive real numbers and let $b > 0$. If
\[ \sum_{n=1}^{\infty} \frac{1}{\lambda_n} = \infty, \tag{24.23} \]
then linear combinations of $1, t^{\lambda_1}, t^{\lambda_2}, \cdots$ are dense in $C([0,b])$.
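Proof: Let $X$ denote the closure of linear combinations of $\left\{1, t^{\lambda_1}, t^{\lambda_2}, \cdots\right\}$ in $C([0,b])$. If $X \neq C([0,b])$, then letting $f \in C([0,b]) \setminus X$, define $\Lambda \in C([0,b])'$ as follows. First let $\Lambda_0 : X + \mathbb{C}f \to \mathbb{C}$ be given by $\Lambda_0(g + \alpha f) = \alpha \|f\|_\infty$. Then
\[
\begin{aligned}
\sup_{\|g + \alpha f\| \leq 1} |\Lambda_0(g + \alpha f)| &= \sup_{\|g + \alpha f\| \leq 1} |\alpha| \, \|f\|_\infty \\
&= \sup_{\|g/\alpha + f\| \leq \frac{1}{|\alpha|}} |\alpha| \, \|f\|_\infty \\
&= \sup_{\|g + f\| \leq \frac{1}{|\alpha|}} |\alpha| \, \|f\|_\infty
\end{aligned}
\]
Now $\operatorname{dist}(f, X) > 0$ because $X$ is closed. Therefore, there exists a lower bound, $\eta > 0$ to $\|g + f\|$ for $g \in X$. Therefore, the above is no larger than
\[ \sup_{|\alpha| \leq \frac{1}{\eta}} |\alpha| \, \|f\|_\infty = \left( \frac{1}{\eta} \right) \|f\|_\infty \]
which shows that $\|\Lambda_0\| \leq \left( \frac{1}{\eta} \right) \|f\|_\infty$. By the Hahn Banach theorem $\Lambda_0$ can be extended to $\Lambda \in C([0,b])'$ which has the property that $\Lambda(X) = 0$ but $\Lambda(f) = \|f\|_\infty \neq 0$. By the Weierstrass approximation theorem, there exists a polynomial, $p$ such that $\Lambda(p) \neq 0$. Therefore, if it can be shown that whenever $\Lambda(X) = 0$, it is the case that $\Lambda(p) = 0$ for all polynomials, it must be the case that $X$ is dense in $C([0,b])$.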
By the Riesz representation theorem the elements of $C([0,b])'$ are complex measures. Suppose then that for $\mu$ a complex measure it follows that for all $t^{\lambda_k}$,
\[ \int_{[0,b]} t^{\lambda_k} \, d\mu = 0. \]
I want to show that then
\[ \int_{[0,b]} t^k \, d\mu = 0 \]
for all positive integers $k$. It suffices to modify $\mu$ if necessary to have $\mu(\{0\}) = 0$ since this will not change any of the above integrals. Let $\mu_1(E) = \mu(E \cap (0,b])$ and use $\mu_1$. I will continue using the symbol, $\mu$.
For $\operatorname{Re}(z) > 0$, define
\[ F(z) \equiv \int_{[0,b]} t^z \, d\mu = \int_{(0,b]} t^z \, d\mu \]
The function $t^z = \exp(z \ln(t))$ is analytic. I claim that $F(z)$ is also analytic for $\operatorname{Re} z > 0$. Apply Morera's theorem. Let $T$ be a triangle in $\operatorname{Re} z > 0$. Then, writing $d\mu = \xi \, d|\mu|$ for the polar decomposition of $\mu$,
\[ \int_{\partial T} F(z) \, dz = \int_{\partial T} \int_{(0,b]} e^{z \ln(t)} \xi \, d|\mu| \, dz \]
Now $\int_{\partial T}$ can be split into three integrals over intervals of $\mathbb{R}$ and so this integral is essentially a Lebesgue integral taken with respect to Lebesgue measure. Furthermore, $e^{z \ln(t)}$ is a continuous function of the two variables and $\xi$ is a function of only the one variable, $t$. Thus the integrand is product measurable. The iterated integral is also absolutely integrable because $\left| e^{z \ln(t)} \right| \leq e^{x \ln t} \leq e^{x \ln b}$ where $x + iy = z$ and $x$ is given to be positive. Thus the integrand is actually bounded. Therefore, you can apply Fubini's theorem and write
\[
\begin{aligned}
\int_{\partial T} F(z) \, dz &= \int_{\partial T} \int_{(0,b]} e^{z \ln(t)} \xi \, d|\mu| \, dz \\
&= \int_{(0,b]} \xi \int_{\partial T} e^{z \ln(t)} \, dz \, d|\mu| = 0.
\end{aligned}
\]
By Morera's theorem, $F$ is analytic on $\operatorname{Re} z > 0$ which is given to have zeros at the $\lambda_k$.
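Now let $\phi(z) = \frac{1+z}{1-z}$. Then $\phi$ maps $B(0,1)$ one to one onto $\operatorname{Re} z > 0$. To see this let $0 < r < 1$.
\[ \phi\left(re^{i\theta}\right) = \frac{1 + re^{i\theta}}{1 - re^{i\theta}} = \frac{1 - r^2 + i 2r \sin\theta}{1 + r^2 - 2r\cos\theta} \]
and so $\operatorname{Re} \phi\left(re^{i\theta}\right) > 0$. Now the inverse of $\phi$ is $\phi^{-1}(z) = \frac{z-1}{z+1}$. For $\operatorname{Re} z > 0$,
\[ \left| \phi^{-1}(z) \right|^2 = \frac{z-1}{z+1} \cdot \frac{\overline{z} - 1}{\overline{z} + 1} = \frac{|z|^2 - 2\operatorname{Re} z + 1}{|z|^2 + 2\operatorname{Re} z + 1} < 1. \]
Consider $F \circ \phi$, an analytic function defined on $B(0,1)$. This function is given to have zeros at $z_n$ where $\phi(z_n) = \frac{1+z_n}{1-z_n} = \lambda_n$. This reduces to $z_n = \frac{-1+\lambda_n}{1+\lambda_n}$. Now
\[ 1 - |z_n| \geq \frac{c}{1 + \lambda_n} \]
for a positive constant, $c$. It is given that $\sum \frac{1}{\lambda_n} = \infty$ so it follows $\sum (1 - |z_n|) = \infty$ also. Therefore, by Corollary 24.29, $F \circ \phi = 0$. It follows $F = 0$ also. In particular, $F(k)$ for $k$ a positive integer equals zero. This has shown that if $\Lambda \in C([0,b])'$ and $\Lambda$ sends 1 and all the $t^{\lambda_n}$ to 0, then $\Lambda$ sends 1 and all $t^k$ for $k$ a positive integer to zero. As explained above, $X$ is dense in $C([0,b])$.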
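The inequality $1 - |z_n| \geq c/(1+\lambda_n)$ is stated without the constant. A quick computation suggests $c = 2\min(1, \lambda_1)$ works, since $1 - |z_n| = 2\min(1,\lambda_n)/(1+\lambda_n)$. The sketch below checks this on a sample list of exponents; the list and the function names are my own assumptions.

```python
def phi(z):
    """The map z -> (1 + z) / (1 - z) from the unit disk to the right half plane."""
    return (1 + z) / (1 - z)

def preimage(lam):
    """The point z_n with phi(z_n) = lam, namely (lam - 1) / (lam + 1)."""
    return (lam - 1.0) / (lam + 1.0)

def gap(lam):
    """The quantity 1 - |z_n| which enters the Blaschke condition."""
    return 1.0 - abs(preimage(lam))

lams = [0.5, 1.0, 2.0, 10.0]          # an increasing list of positive exponents
c = 2.0 * min(1.0, lams[0])           # assumed constant, see the lead-in
bound_holds = all(gap(lam) >= c / (1.0 + lam) - 1e-12 for lam in lams)
round_trip = max(abs(phi(preimage(lam)) - lam) for lam in lams)
```

Since the $\lambda_n$ increase, $\lambda_n \geq \lambda_1$ for all $n$, which is why a single constant serves for the whole sequence.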
The converse of this theorem is also true and is proved in Rudin [38].
24.6 Exercises
1. Suppose $f$ is an entire function with $f(0) = 1$. Let
\[ M(r) = \max\{ |f(z)| : |z| = r \}. \]
Use Jensen's equation to establish the following inequality.
\[ M(2r) \geq 2^{n(r)} \]
where $n(r)$ is the number of zeros of $f$ in $B(0,r)$.
2. The version of the Blaschke product presented above is that found in most complex variable texts. However, there is another one in [33]. Instead of
\[ \frac{\alpha_n - z}{1 - \overline{\alpha_n} z} \, \frac{|\alpha_n|}{\alpha_n} \]
you use
\[ \frac{\alpha_n - z}{\frac{1}{\overline{\alpha_n}} - z} \]
Prove a version of Theorem 24.26 using this modification.
3. The Weierstrass approximation theorem holds for polynomials of $n$ variables on any compact subset of $\mathbb{R}^n$. Give a multidimensional version of the Müntz-Szász theorem which will generalize the Weierstrass approximation theorem for $n$ dimensions. You might just pick a compact subset of $\mathbb{R}^n$ in which all components are positive. You have to do something like this because otherwise, $t^{\lambda}$ might not be defined.
4. Show $\cos(\pi z) = \prod_{k=1}^{\infty} \left( 1 - \frac{4z^2}{(2k-1)^2} \right)$.
5. Recall $\sin(\pi z) = z\pi \prod_{n=1}^{\infty} \left( 1 - \left( \frac{z}{n} \right)^2 \right)$. Use this to derive Wallis product, $\frac{\pi}{2} = \prod_{k=1}^{\infty} \frac{4k^2}{(2k-1)(2k+1)}$.
6. The order of an entire function, $f$ is defined as
\[ \inf\left\{ a \geq 0 : |f(z)| \leq e^{|z|^a} \text{ for all large enough } |z| \right\} \]
If no such $a$ exists, the function is said to be of infinite order. Show the order of an entire function is also equal to $\limsup_{r \to \infty} \frac{\ln(\ln(M(r)))}{\ln(r)}$ where $M(r) \equiv \max\{ |f(z)| : |z| = r \}$.
7. Suppose $\Omega$ is a simply connected region and let $f$ be meromorphic on $\Omega$. Suppose also that the set, $S \equiv \{ z \in \Omega : f(z) = c \}$ has a limit point in $\Omega$. Can you conclude $f(z) = c$ for all $z \in \Omega$?
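8. This and the next collection of problems are dealing with the gamma function. Show that
\[ \left| \left( 1 + \frac{z}{n} \right) e^{-z/n} - 1 \right| \leq \frac{C(z)}{n^2} \]
and therefore,
\[ \sum_{n=1}^{\infty} \left| \left( 1 + \frac{z}{n} \right) e^{-z/n} - 1 \right| < \infty \]
with the convergence uniform on compact sets.
9. Show $\prod_{n=1}^{\infty} \left( 1 + \frac{z}{n} \right) e^{-z/n}$ converges to an analytic function on $\mathbb{C}$ which has zeros only at the negative integers and that therefore,
\[ \prod_{n=1}^{\infty} \left( 1 + \frac{z}{n} \right)^{-1} e^{z/n} \]
is a meromorphic function (Analytic except for poles) having simple poles at the negative integers.
10. Show there exists $\gamma$ such that if
\[ \Gamma(z) \equiv \frac{e^{-\gamma z}}{z} \prod_{n=1}^{\infty} \left( 1 + \frac{z}{n} \right)^{-1} e^{z/n}, \]
then $\Gamma(1) = 1$. Thus $\Gamma$ is a meromorphic function having simple poles at the negative integers. Hint: $\prod_{n=1}^{\infty} \left( 1 + \frac{1}{n} \right)^{-1} e^{1/n} = c = e^{\gamma}$.
11. Now show that
\[ \gamma = \lim_{n \to \infty} \left[ \sum_{k=1}^n \frac{1}{k} - \ln n \right] \]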
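12. Justify the following argument leading to Gauss's formula
\[
\begin{aligned}
\Gamma(z) &= \lim_{n \to \infty} \left( \prod_{k=1}^n \left( \frac{k}{k+z} \right) e^{z/k} \right) \frac{e^{-\gamma z}}{z} \\
&= \lim_{n \to \infty} \left( \frac{n!}{(1+z)(2+z) \cdots (n+z)} \, e^{z \left( \sum_{k=1}^n \frac{1}{k} \right)} \right) \frac{e^{-\gamma z}}{z} \\
&= \lim_{n \to \infty} \frac{n!}{z(1+z)(2+z) \cdots (n+z)} \, e^{z \left( \sum_{k=1}^n \frac{1}{k} \right)} e^{-z \left[ \sum_{k=1}^n \frac{1}{k} - \ln n \right]} \\
&= \lim_{n \to \infty} \frac{n! \, n^z}{z(1+z)(2+z) \cdots (n+z)}.
\end{aligned}
\]
13. Verify from the Gauss formula above that $\Gamma(z+1) = \Gamma(z) z$ and that for $n$ a nonnegative integer, $\Gamma(n+1) = n!$.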
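14. The usual definition of the gamma function for positive $x$ is
\[ \Gamma_1(x) \equiv \int_0^{\infty} e^{-t} t^{x-1} \, dt. \]
Show $\left( 1 - \frac{t}{n} \right)^n \leq e^{-t}$ for $t \in [0,n]$. Then show
\[ \int_0^n \left( 1 - \frac{t}{n} \right)^n t^{x-1} \, dt = \frac{n! \, n^x}{x(x+1) \cdots (x+n)}. \]
Use the first part to conclude that
\[ \Gamma_1(x) = \lim_{n \to \infty} \frac{n! \, n^x}{x(x+1) \cdots (x+n)} = \Gamma(x). \]
Hint: To show $\left( 1 - \frac{t}{n} \right)^n \leq e^{-t}$ for $t \in [0,n]$, verify this is equivalent to showing $(1-u)^n \leq e^{-nu}$ for $u \in [0,1]$.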
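15. Show $\Gamma(z) = \int_0^{\infty} e^{-t} t^{z-1} \, dt$ whenever $\operatorname{Re} z > 0$. Hint: You have already shown that this is true for positive real numbers. Verify this formula for $\operatorname{Re} z > 0$ yields an analytic function.
16. Show $\Gamma\left( \frac{1}{2} \right) = \sqrt{\pi}$. Then find $\Gamma\left( \frac{5}{2} \right)$.
17. Show that $\int_{-\infty}^{\infty} e^{-s^2/2} \, ds = \sqrt{2\pi}$. Hint: Denote this integral by $I$ and observe that $I^2 = \int_{\mathbb{R}^2} e^{-(x^2+y^2)/2} \, dx \, dy$. Then change variables to polar coordinates, $x = r\cos(\theta)$, $y = r\sin\theta$.
18. Now that you know what the gamma function is, consider in the formula for $\Gamma(\alpha+1)$ the following change of variables. $t = \alpha + \alpha^{1/2} s$. Then in terms of the new variable, $s$, the formula for $\Gamma(\alpha+1)$ is
\[
e^{-\alpha} \alpha^{\alpha + \frac{1}{2}} \int_{-\sqrt{\alpha}}^{\infty} e^{-\sqrt{\alpha} s} \left( 1 + \frac{s}{\sqrt{\alpha}} \right)^{\alpha} ds
= e^{-\alpha} \alpha^{\alpha + \frac{1}{2}} \int_{-\sqrt{\alpha}}^{\infty} e^{\alpha \left[ \ln\left( 1 + \frac{s}{\sqrt{\alpha}} \right) - \frac{s}{\sqrt{\alpha}} \right]} ds
\]
Show the integrand converges to $e^{-s^2/2}$. Show that then
\[ \lim_{\alpha \to \infty} \frac{\Gamma(\alpha+1)}{e^{-\alpha} \alpha^{\alpha + (1/2)}} = \int_{-\infty}^{\infty} e^{-s^2/2} \, ds = \sqrt{2\pi}. \]
Hint: You will need to obtain a dominating function for the integral so that you can use the dominated convergence theorem. You might try considering $s \in (-\sqrt{\alpha}, \sqrt{\alpha})$ first and consider something like $e^{1 - (s^2/4)}$ on this interval. Then look for another function for $s > \sqrt{\alpha}$. This formula is known as Stirling's formula.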
19. This and the next several problems develop the zeta function and give a relation between the zeta and the gamma function. Define for $0 < r < 2\pi$
\[
\begin{aligned}
I_r(z) \equiv{} & \int_0^{2\pi} \frac{e^{(z-1)(\ln r + i\theta)}}{e^{re^{i\theta}} - 1} \, ire^{i\theta} \, d\theta + \int_r^{\infty} \frac{e^{(z-1)(\ln t + 2\pi i)}}{e^t - 1} \, dt \\
& + \int_{\infty}^{r} \frac{e^{(z-1)\ln t}}{e^t - 1} \, dt
\end{aligned}
\tag{24.24}
\]
Show that $I_r$ is an entire function. The reason $0 < r < 2\pi$ is that this prevents $e^{re^{i\theta}} - 1$ from equaling zero. The above is just a precise description of the contour integral, $\int_{\gamma} \frac{w^{z-1}}{e^w - 1} \, dw$ where $\gamma$ is the contour shown below.
[Figure: the contour $\gamma$.]
in which on the integrals along the real line, the argument is different in going from $r$ to $\infty$ than it is in going from $\infty$ to $r$. Now I have not defined such contour integrals over contours which have infinite length and so have chosen to simply write out explicitly what is involved. You have to work with these integrals given above anyway but the contour integral just mentioned is the motivation for them. Hint: You may want to use convergence theorems from real analysis if it makes this more convenient but you might not have to.
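20. In the context of Problem 19 define for small $\delta > 0$
\[ I_{r\delta}(z) \equiv \int_{\gamma_{r,\delta}} \frac{w^{z-1}}{e^w - 1} \, dw \]
where $\gamma_{r\delta}$ is shown below.
[Figure: the contour $\gamma_{r,\delta}$, with the gap $2\delta$, the radius $r$, and the $x$ axis marked.]
Show that $\lim_{\delta \to 0} I_{r\delta}(z) = I_r(z)$. Hint: Use the dominated convergence theorem if it makes this go easier. This is not a hard problem if you use these theorems but you can probably do it without them with more work.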
21. In the context of Problem 20 show that for $r_1 < r$, $I_{r\delta}(z) - I_{r_1\delta}(z)$ is a contour integral,
\[ \int_{\gamma_{r,r_1,\delta}} \frac{w^{z-1}}{e^w - 1} \, dw \]
where the oriented contour is shown below.
[Figure: the oriented contour $\gamma_{r,r_1,\delta}$.]
In this contour integral, $w^{z-1}$ denotes $e^{(z-1)\log(w)}$ where $\log(w) = \ln|w| + i\arg(w)$ for $\arg(w) \in (0, 2\pi)$. Explain why this integral equals zero. From Problem 20 it follows that $I_r = I_{r_1}$. Therefore, you can define an entire function, $I(z) \equiv I_r(z)$ for all $r$ positive but sufficiently small. Hint: Remember the Cauchy integral formula for analytic functions defined on simply connected regions. You could argue there is a simply connected region containing $\gamma_{r,r_1,\delta}$.
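22. In case $\operatorname{Re} z > 1$, you can get an interesting formula for $I(z)$ by taking the limit as $r \to 0$. Recall that
\[
\begin{aligned}
I_r(z) \equiv{} & \int_0^{2\pi} \frac{e^{(z-1)(\ln r + i\theta)}}{e^{re^{i\theta}} - 1} \, ire^{i\theta} \, d\theta + \int_r^{\infty} \frac{e^{(z-1)(\ln t + 2\pi i)}}{e^t - 1} \, dt \\
& + \int_{\infty}^{r} \frac{e^{(z-1)\ln t}}{e^t - 1} \, dt
\end{aligned}
\tag{24.25}
\]
and now it is desired to take a limit in the case where $\operatorname{Re} z > 1$. Show the first integral above converges to 0 as $r \to 0$. Next argue the sum of the two last integrals converges to
\[ \left( e^{(z-1)2\pi i} - 1 \right) \int_0^{\infty} \frac{e^{(z-1)\ln(t)}}{e^t - 1} \, dt. \]
Thus
\[ I(z) = \left( e^{z 2\pi i} - 1 \right) \int_0^{\infty} \frac{e^{(z-1)\ln(t)}}{e^t - 1} \, dt \tag{24.26} \]
when $\operatorname{Re} z > 1$.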
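23. So what does all this have to do with the zeta function and the gamma function? The zeta function is defined for $\operatorname{Re} z > 1$ by
\[ \sum_{n=1}^{\infty} \frac{1}{n^z} \equiv \zeta(z). \]
By Problem 15, whenever $\operatorname{Re} z > 0$,
\[ \Gamma(z) = \int_0^{\infty} e^{-t} t^{z-1} \, dt. \]
Change the variable and conclude
\[ \Gamma(z) \frac{1}{n^z} = \int_0^{\infty} e^{-ns} s^{z-1} \, ds. \]
Therefore, for $\operatorname{Re} z > 1$,
\[ \zeta(z) \Gamma(z) = \sum_{n=1}^{\infty} \int_0^{\infty} e^{-ns} s^{z-1} \, ds. \]
Now show that you can interchange the order of the sum and the integral. This is possibly most easily done by using Fubini's theorem. Show that $\sum_{n=1}^{\infty} \int_0^{\infty} \left| e^{-ns} s^{z-1} \right| ds < \infty$ and then use Fubini's theorem. I think you could do it other ways though. It is possible to do it without any reference to Lebesgue integration. Thus
\[
\begin{aligned}
\zeta(z) \Gamma(z) &= \int_0^{\infty} s^{z-1} \sum_{n=1}^{\infty} e^{-ns} \, ds \\
&= \int_0^{\infty} \frac{s^{z-1} e^{-s}}{1 - e^{-s}} \, ds = \int_0^{\infty} \frac{s^{z-1}}{e^s - 1} \, ds
\end{aligned}
\]
By 24.26,
\[
\begin{aligned}
I(z) &= \left( e^{z 2\pi i} - 1 \right) \int_0^{\infty} \frac{e^{(z-1)\ln(t)}}{e^t - 1} \, dt \\
&= \left( e^{z 2\pi i} - 1 \right) \zeta(z) \Gamma(z) \\
&= \left( e^{2\pi i z} - 1 \right) \zeta(z) \Gamma(z)
\end{aligned}
\]
whenever $\operatorname{Re} z > 1$.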
24. Now show there exists an entire function, $h(z)$ such that
\[ \zeta(z) = \frac{1}{z-1} + h(z) \]
for $\operatorname{Re} z > 1$. Conclude $\zeta(z)$ extends to a meromorphic function defined on all of $\mathbb{C}$ which has a simple pole at $z = 1$, namely, the right side of the above formula. Hint: Use Problem 10 to observe that $\Gamma(z)$ is never equal to zero but has simple poles at every nonpositive integer. Then for $\operatorname{Re} z > 1$,
\[ \zeta(z) \equiv \frac{I(z)}{\left( e^{2\pi i z} - 1 \right) \Gamma(z)}. \]
By 24.26 $\zeta$ has no poles for $\operatorname{Re} z > 1$. The right side of the above equation is defined for all $z$. There are no poles except possibly when $z$ is a nonpositive integer or $z = 1$. However, the nonpositive integers are not poles either because of Problem 10 which states that $\Gamma$ has simple poles at these points thus cancelling the simple zeros of $\left( e^{2\pi i z} - 1 \right)$. The only remaining possibility for a pole for $\zeta$ is at $z = 1$. Show it has a simple pole at this point. You can use the formula for $I(z)$
\[
\begin{aligned}
I(z) \equiv{} & \int_0^{2\pi} \frac{e^{(z-1)(\ln r + i\theta)}}{e^{re^{i\theta}} - 1} \, ire^{i\theta} \, d\theta + \int_r^{\infty} \frac{e^{(z-1)(\ln t + 2\pi i)}}{e^t - 1} \, dt \\
& + \int_{\infty}^{r} \frac{e^{(z-1)\ln t}}{e^t - 1} \, dt
\end{aligned}
\tag{24.27}
\]
Thus $I(1)$ is given by
\[
I(1) \equiv \int_0^{2\pi} \frac{1}{e^{re^{i\theta}} - 1} \, ire^{i\theta} \, d\theta + \int_r^{\infty} \frac{1}{e^t - 1} \, dt + \int_{\infty}^{r} \frac{1}{e^t - 1} \, dt = \int_{\gamma_r} \frac{dw}{e^w - 1}
\]
where $\gamma_r$ is the circle of radius $r$. This contour integral equals $2\pi i$ by the residue theorem. Therefore,
\[ \frac{I(z)}{\left( e^{2\pi i z} - 1 \right) \Gamma(z)} = \frac{1}{z-1} + h(z) \]
where $h(z)$ is an entire function. People worry a lot about where the zeros of $\zeta$ are located. In particular, the zeros for $\operatorname{Re} z \in (0,1)$ are of special interest. The Riemann hypothesis says they are all on the line $\operatorname{Re} z = 1/2$. This is a good problem for you to do next.
25. There is an important relation between prime numbers and the zeta function due to Euler. Let $\{p_n\}_{n=1}^{\infty}$ be the prime numbers. Then for $\operatorname{Re} z > 1$,
\[ \prod_{n=1}^{\infty} \frac{1}{1 - p_n^{-z}} = \zeta(z). \]
To see this, consider a partial product.
\[ \prod_{n=1}^{N} \frac{1}{1 - p_n^{-z}} = \prod_{n=1}^{N} \sum_{j_n=0}^{\infty} \left( \frac{1}{p_n^z} \right)^{j_n}. \]
Let $S_N$ denote all positive integers which use only $p_1, \cdots, p_N$ in their prime factorization. Then the above equals $\sum_{n \in S_N} \frac{1}{n^z}$. Letting $N \to \infty$ and using the fact that $\operatorname{Re} z > 1$ so that the order in which you sum is not important (See Theorem 25.1 or recall advanced calculus.) you obtain the desired equation. Show $\sum_{n=1}^{\infty} \frac{1}{p_n} = \infty$.
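Two of the formulas in the exercises above lend themselves to a quick numerical sanity check: the Gauss formula for the gamma function and the Euler product for the zeta function. The sketch below is not a solution to any exercise; it works with real arguments only, the cutoffs are arbitrary choices of mine, and the function names are hypothetical.

```python
import math

def gauss_gamma(x, n):
    """n! n^x / (x (x+1) ... (x+n)) for real x > 0, computed with logarithms
    so that the factorial does not overflow."""
    log_value = math.lgamma(n + 1) + x * math.log(n)
    log_value -= sum(math.log(x + k) for k in range(n + 1))
    return math.exp(log_value)

def primes_up_to(limit):
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for q in range(p * p, limit + 1, p):
                sieve[q] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def euler_product(z, limit):
    """Partial Euler product over the primes up to limit, for real z > 1."""
    product = 1.0
    for p in primes_up_to(limit):
        product /= 1.0 - p ** (-z)
    return product

gamma_ratio = gauss_gamma(2.5, 100000) / math.gamma(2.5)
zeta2_error = abs(euler_product(2.0, 10000) - math.pi ** 2 / 6)
```

The Gauss limit converges slowly, with a relative error on the order of $1/n$, so a large `n` is needed even for a few digits of agreement.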
Elliptic Functions
This chapter is to give a short introduction to elliptic functions. There is much
more available. There are books written on elliptic functions. What I am presenting
here follows Ahlfors [2] although the material is found in many books on complex
analysis. Hille, [26] has a much more extensive treatment than what I will attempt
here. There are also many references and historical notes available in the book by
Hille. Another good source for more having much the same emphasis as what is
presented here is in the book by Saks and Zygmund [40]. This is a very interesting
subject because it has considerable overlap with algebra.
Before beginning, recall that an absolutely convergent series can be summed in
any order and you always get the same answer. The easy way to see this is to think
of the series as a Lebesgue integral with respect to counting measure and apply
convergence theorems as needed. The following theorem provides the necessary
results.
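Theorem 25.1 Suppose $\sum_{n=1}^{\infty} |a_n| < \infty$ and let $\theta, \phi : \mathbb{N} \to \mathbb{N}$ be one to one and onto mappings. Then $\sum_{n=1}^{\infty} a_{\phi(n)}$ and $\sum_{n=1}^{\infty} a_{\theta(n)}$ both converge and the two sums are equal.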
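Proof: By the monotone convergence theorem,
\[ \sum_{n=1}^{\infty} |a_n| = \lim_{n \to \infty} \sum_{k=1}^{n} \left| a_{\phi(k)} \right| = \lim_{n \to \infty} \sum_{k=1}^{n} \left| a_{\theta(k)} \right| \]
but these last two equal $\sum_{k=1}^{\infty} \left| a_{\phi(k)} \right|$ and $\sum_{k=1}^{\infty} \left| a_{\theta(k)} \right|$ respectively. Therefore, $\sum_{k=1}^{\infty} a_{\theta(k)}$ and $\sum_{k=1}^{\infty} a_{\phi(k)}$ exist ($n \to a_{\theta(n)}$ is in $L^1$ with respect to counting measure.) It remains to show the two are equal. There exists $M$ such that if $n > M$ then
\[ \sum_{k=n+1}^{\infty} \left| a_{\theta(k)} \right| < \varepsilon, \quad \sum_{k=n+1}^{\infty} \left| a_{\phi(k)} \right| < \varepsilon, \]
\[ \left| \sum_{k=1}^{\infty} a_{\phi(k)} - \sum_{k=1}^{n} a_{\phi(k)} \right| < \varepsilon, \quad \left| \sum_{k=1}^{\infty} a_{\theta(k)} - \sum_{k=1}^{n} a_{\theta(k)} \right| < \varepsilon. \]
Pick such an $n$ denoted by $n_1$. Then pick $n_2 > n_1 > M$ such that
\[ \{ \theta(1), \cdots, \theta(n_1) \} \subseteq \{ \phi(1), \cdots, \phi(n_2) \}. \]
Then
\[ \sum_{k=1}^{n_2} a_{\phi(k)} = \sum_{k=1}^{n_1} a_{\theta(k)} + \sum_{\phi(k) \notin \{ \theta(1), \cdots, \theta(n_1) \},\ k \leq n_2} a_{\phi(k)}. \]
Therefore,
\[ \left| \sum_{k=1}^{n_2} a_{\phi(k)} - \sum_{k=1}^{n_1} a_{\theta(k)} \right| = \left| \sum_{\phi(k) \notin \{ \theta(1), \cdots, \theta(n_1) \},\ k \leq n_2} a_{\phi(k)} \right| \]
Now all of these $\phi(k)$ in the last sum are contained in $\{ \theta(n_1 + 1), \cdots \}$ and so the last sum above is dominated by
\[ \sum_{k=n_1+1}^{\infty} \left| a_{\theta(k)} \right| < \varepsilon. \]
Therefore,
\[
\begin{aligned}
\left| \sum_{k=1}^{\infty} a_{\phi(k)} - \sum_{k=1}^{\infty} a_{\theta(k)} \right|
\leq{} & \left| \sum_{k=1}^{\infty} a_{\phi(k)} - \sum_{k=1}^{n_2} a_{\phi(k)} \right| + \left| \sum_{k=1}^{n_2} a_{\phi(k)} - \sum_{k=1}^{n_1} a_{\theta(k)} \right| \\
& + \left| \sum_{k=1}^{n_1} a_{\theta(k)} - \sum_{k=1}^{\infty} a_{\theta(k)} \right| < \varepsilon + \varepsilon + \varepsilon = 3\varepsilon
\end{aligned}
\]
and since $\varepsilon$ is arbitrary, it follows $\sum_{k=1}^{\infty} a_{\phi(k)} = \sum_{k=1}^{\infty} a_{\theta(k)}$ as claimed. This proves the theorem.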
25.1 Periodic Functions
Definition 25.2 A function $f$ defined on $\mathbb{C}$ is said to be periodic if there exists $w$ such that $f(z+w) = f(z)$ for all $z \in \mathbb{C}$. Denote by $M$ the set of all periods. Thus if $w_1, w_2 \in M$ and $a, b \in \mathbb{Z}$, then $aw_1 + bw_2 \in M$. For this reason $M$ is called the module of periods.$^1$ In all which follows it is assumed $f$ is meromorphic.
Theorem 25.3 Let $f$ be a meromorphic function and let $M$ be the module of periods. Then if $M$ has a limit point, then $f$ equals a constant. If this does not happen then either there exists $w_1 \in M$ such that $\mathbb{Z}w_1 = M$ or there exist $w_1, w_2 \in M$ such that $M = \{ aw_1 + bw_2 : a, b \in \mathbb{Z} \}$ and $w_1/w_2$ is not real. Also if $\tau = w_2/w_1$,
\[ |\tau| \geq 1, \quad -\frac{1}{2} \leq \operatorname{Re} \tau \leq \frac{1}{2}. \]
$^1$ A module is like a vector space except instead of a field of scalars, you have a ring of scalars.
Proof: Suppose $f$ is meromorphic and $M$ has a limit point, $w_0$. By Theorem 24.10 on Page 636 there exist analytic functions, $p, q$ such that $f(z) = \frac{p(z)}{q(z)}$. Now pick $z_0$ such that $z_0$ is not a pole of $f$. Then letting $w_n \to w_0$ where $\{w_n\} \subseteq M$, $f(z_0 + w_n) = f(z_0)$. Therefore, $p(z_0 + w_n) = f(z_0) q(z_0 + w_n)$ and so the analytic function, $p(z) - f(z_0) q(z)$ has a zero set which has a limit point. Therefore, this function is identically equal to zero because of Theorem 19.23 on Page 465. Thus $f$ equals a constant as claimed.
This has shown that if $f$ is not constant, then $M$ is discrete. Therefore, there exists $w_1 \in M$ such that $|w_1| = \min\{ |w| : w \in M, w \neq 0 \}$. Suppose first that every element of $M$ is a real multiple of $w_1$. Thus, if $w \in M$, it follows there exists a real number, $x$ such that $w = xw_1$. Then there exist integers, $k, k+1$ such that $k \leq x < k+1$. If $x > k$, then $w - kw_1 = (x-k)w_1$ is a period having smaller absolute value than $|w_1|$ which would be a contradiction. Hence, $x = k$ and so $M = \mathbb{Z}w_1$.
Now suppose there exists $w_2 \in M$ which is not a real multiple of $w_1$. You can let $w_2$ be the element of $M$ having this property which has smallest absolute value. Now let $w \in M$. Since $w_1$ and $w_2$ point in different directions, it follows $w = xw_1 + yw_2$ for some real numbers, $x, y$. Let $|m - x| \leq \frac{1}{2}$ and $|n - y| \leq \frac{1}{2}$ where $m, n$ are integers. Therefore,
\[ w = mw_1 + nw_2 + (x-m)w_1 + (y-n)w_2 \]
and so
\[ w - mw_1 - nw_2 = (x-m)w_1 + (y-n)w_2 \tag{25.1} \]
Now since $w_2/w_1 \notin \mathbb{R}$,
\[
\begin{aligned}
|(x-m)w_1 + (y-n)w_2| &< |(x-m)w_1| + |(y-n)w_2| \\
&\leq \frac{1}{2}|w_1| + \frac{1}{2}|w_2|.
\end{aligned}
\]
Therefore,
\[
\begin{aligned}
|w - mw_1 - nw_2| &= |(x-m)w_1 + (y-n)w_2| \\
&< \frac{1}{2}|w_1| + \frac{1}{2}|w_2| \leq |w_2|
\end{aligned}
\]
and so the period, $w - mw_1 - nw_2$ cannot be a non real multiple of $w_1$ because $w_2$ is the one which has smallest absolute value and this period has smaller absolute value than $w_2$. Therefore, the ratio $(w - mw_1 - nw_2)/w_1$ must be a real number, $x$. Thus
\[ w - mw_1 - nw_2 = xw_1 \]
Since $w_1$ has minimal absolute value of all nonzero periods, it follows $|x| \geq 1$ or $x = 0$. Let $k \leq x < k+1$ for some integer, $k$. If $x > k$, then
\[ w - mw_1 - nw_2 - kw_1 = (x-k)w_1 \]
which would contradict the choice of $w_1$ as being the period having minimal absolute value because the expression on the left in the above is a period and it equals something which has absolute value less than $|w_1|$. Therefore, $x = k$ and $w$ is an integer linear combination of $w_1$ and $w_2$. It only remains to verify the claim about $\tau$.
From the construction, $|w_1| \leq |w_2|$ and $|w_2| \leq |w_1 - w_2|$, $|w_2| \leq |w_1 + w_2|$. Therefore,
\[ |\tau| \geq 1, \quad |\tau| \leq |1 - \tau|, \quad |\tau| \leq |1 + \tau|. \]
The last two of these inequalities imply $-1/2 \leq \operatorname{Re} \tau \leq 1/2$.
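The normalization $|\tau| \geq 1$, $-1/2 \leq \operatorname{Re}\tau \leq 1/2$ can also be reached computationally from any basis of the module of periods. The sketch below uses the classical two dimensional lattice reduction of Lagrange and Gauss, which is a different route from the minimality argument in the proof; the function name and the sample periods are assumptions made for illustration.

```python
def reduce_basis(w1, w2, max_steps=1000):
    """Replace a basis (w1, w2) of the lattice {a*w1 + b*w2 : a, b integers}
    by a basis with |w1| <= |w2| and |Re(w2 / w1)| <= 1/2."""
    for _ in range(max_steps):
        if abs(w2) < abs(w1):
            w1, w2 = w2, w1          # keep the shorter period first
        tau = w2 / w1
        n = round(tau.real)
        if n == 0:
            return w1, w2            # now |tau| >= 1 and |Re tau| <= 1/2
        w2 = w2 - n * w1             # subtracting a multiple of w1 keeps the lattice
    raise RuntimeError("reduction did not terminate")

# Sample periods whose ratio is not real.
w1, w2 = reduce_basis(5 + 1j, 8 + 2j)
tau = w2 / w1
area = abs((w1.conjugate() * w2).imag)   # area of the period parallelogram
```

Swapping the two periods and subtracting integer multiples are both unimodular changes of basis, so the module generated is unchanged and the area of the period parallelogram is preserved.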
Definition 25.4 For $f$ a meromorphic function which has the last of the above alternatives holding in which $M = \{ aw_1 + bw_2 : a, b \in \mathbb{Z} \}$, the function, $f$ is called elliptic. This is also called doubly periodic.
Theorem 25.5 Suppose $f$ is an elliptic function which has no poles. Then $f$ is constant.
Proof: Since $f$ has no poles it is analytic. Now consider the parallelograms determined by the vertices, $mw_1 + nw_2$ for $m, n \in \mathbb{Z}$. By periodicity of $f$ it must be bounded because its values are identical on each of these parallelograms. Therefore, it equals a constant by Liouville's theorem.
Definition 25.6 Define $P_a$ to be the parallelogram determined by the points
\[ a + mw_1 + nw_2, \quad a + (m+1)w_1 + nw_2, \quad a + mw_1 + (n+1)w_2, \quad a + (m+1)w_1 + (n+1)w_2 \]
Such $P_a$ will be referred to as a period parallelogram. The sum of the orders of the poles in a period parallelogram which contains no poles or zeros of $f$ on its boundary is called the order of the function. This is well defined because of the periodic property of $f$.
Theorem 25.7 The sum of the residues of any elliptic function, $f$ equals zero on every $P_a$ if $a$ is chosen so that there are no poles on $\partial P_a$.
Proof: Choose $a$ such that there are no poles of $f$ on the boundary of $P_a$. By periodicity,
\[ \int_{\partial P_a} f(z) \, dz = 0 \]
because the integrals over opposite sides of the parallelogram cancel out because the values of $f$ are the same on these sides and the orientations are opposite. It follows from the residue theorem that the sum of the residues in $P_a$ equals 0.
Theorem 25.8 Let $P_a$ be a period parallelogram for a nonconstant elliptic function, $f$ which has order equal to $m$. Then $f$ assumes every value in $f(P_a)$ exactly $m$ times.
Proof: Let $c \in f(P_a)$ and consider $P_{a'}$ such that $f^{-1}(c) \cap P_{a'} = f^{-1}(c) \cap P_a$ and $P_{a'}$ contains the same poles and zeros of $f - c$ as $P_a$ but $\partial P_{a'}$ has no zeros of $f(z) - c$ or poles of $f$ on its boundary. Thus $f'(z)/(f(z) - c)$ is also an elliptic function and so Theorem 25.7 applies. Consider
\[ \frac{1}{2\pi i} \int_{\partial P_{a'}} \frac{f'(z)}{f(z) - c} \, dz. \]
By the argument principle, this equals $N_z - N_p$ where $N_z$ equals the number of zeros of $f(z) - c$ and $N_p$ equals the number of the poles of $f(z)$. From Theorem 25.7 this must equal zero because it is the sum of the residues of $f'/(f - c)$ and so $N_z = N_p$. Now $N_p$ equals the number of poles in $P_a$ counted according to multiplicity.
There is an even better theorem than this one.
Theorem 25.9 Let f be a non constant elliptic function and suppose it has poles
p
1
, , p
m
and zeros, z
1
, , z
m
in P
, listed according to multiplicity where P
contains no poles or zeros of f. Then

m
k=1
z
k

m
k=1
p
k
M, the module of
periods.
Proof: You can assume $\partial P_a$ contains no poles or zeros of $f$ because if it did,
then you could consider a slightly shifted period parallelogram $P_{a'}$ which contains
no new zeros and poles but which has all the old ones and no poles or zeros on its
boundary. By Theorem 21.8 on Page 518
$$\frac{1}{2\pi i}\int_{\partial P_a} z\frac{f'(z)}{f(z)}\,dz = \sum_{k=1}^m z_k - \sum_{k=1}^m p_k. \tag{25.2}$$
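Denoting by $\gamma(z, w)$ the straight oriented line segment from $z$ to $w$,
\begin{align*}
\int_{\partial P_a} z\frac{f'(z)}{f(z)}\,dz
&= \int_{\gamma(a, a+w_1)} z\frac{f'(z)}{f(z)}\,dz + \int_{\gamma(a+w_1+w_2, a+w_2)} z\frac{f'(z)}{f(z)}\,dz \\
&\quad + \int_{\gamma(a+w_1, a+w_2+w_1)} z\frac{f'(z)}{f(z)}\,dz + \int_{\gamma(a+w_2, a)} z\frac{f'(z)}{f(z)}\,dz \\
&= \int_{\gamma(a, a+w_1)} (z - (z+w_2))\frac{f'(z)}{f(z)}\,dz
 + \int_{\gamma(a, a+w_2)} ((z+w_1) - z)\frac{f'(z)}{f(z)}\,dz.
\end{align*}
Here the first and second integrals were combined using the period $w_2$, and the third
and fourth using the period $w_1$. Now near these line segments $f'(z)/f(z)$ is analytic
and so there exists a primitive $g_{w_i}(z)$ on $\gamma(a, a+w_i)$ by Corollary 19.32 on
Page 471 which satisfies $e^{g_{w_i}(z)} = f(z)$. Therefore, the above equals
$$-w_2\left(g_{w_1}(a+w_1) - g_{w_1}(a)\right) + w_1\left(g_{w_2}(a+w_2) - g_{w_2}(a)\right).$$
Now by periodicity of $f$ it follows $f(a+w_1) = f(a) = f(a+w_2)$. Hence
$$g_{w_i}(a+w_i) - g_{w_i}(a) = 2m\pi i$$
for some integer $m$ because
$$e^{g_{w_i}(a+w_i)} - e^{g_{w_i}(a)} = f(a+w_i) - f(a) = 0.$$
Therefore, there exist integers $k, l$ such that
\begin{align*}
\frac{1}{2\pi i}\int_{\partial P_a} z\frac{f'(z)}{f(z)}\,dz
&= \frac{1}{2\pi i}\left[-w_2\left(g_{w_1}(a+w_1) - g_{w_1}(a)\right) + w_1\left(g_{w_2}(a+w_2) - g_{w_2}(a)\right)\right] \\
&= \frac{1}{2\pi i}\left[-w_2(2k\pi i) + w_1(2l\pi i)\right] \\
&= -w_2 k + w_1 l \in M.
\end{align*}
From 25.2 it follows that
$$\sum_{k=1}^m z_k - \sum_{k=1}^m p_k \in M.$$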
Hille says this relation is due to Liouville. There is also a simple corollary which
follows from the above theorem applied to the elliptic function $f(z) - c$.
Corollary 25.10 Let $f$ be a nonconstant elliptic function and suppose the function
$f(z) - c$ has poles $p_1, \dots, p_m$ and zeros $z_1, \dots, z_m$ in $P_a$, listed
according to multiplicity, where $\partial P_a$ contains no poles or zeros of $f(z) - c$. Then
$$\sum_{k=1}^m z_k - \sum_{k=1}^m p_k \in M,$$
the module of periods.
25.1.1 The Unimodular Transformations
Definition 25.11 Suppose $f$ is a nonconstant elliptic function and the module of
periods is of the form $\{aw_1 + bw_2\}$ where $a, b$ are integers and $w_1/w_2$ is not real.
Then by analogy with linear algebra, $\{w_1, w_2\}$ is referred to as a basis. The
unimodular transformations will refer to matrices of the form
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
where all entries are integers and
$$ad - bc = \pm 1.$$
These linear transformations are also called the modular group.
The following is an interesting lemma which ties matrices with the fractional
linear transformations.
Lemma 25.12 Define
$$\phi\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right) \equiv \frac{az+b}{cz+d}.$$
Then
$$\phi(AB) = \phi(A) \circ \phi(B), \tag{25.3}$$
$\phi(A)(z) = z$ if and only if
$$A = cI$$
where $I$ is the identity matrix and $c \neq 0$. Also if $f(z) = \frac{az+b}{cz+d}$, then
$f^{-1}(z)$ exists if and only if $ad - cb \neq 0$. Furthermore it is easy to find $f^{-1}$.
Proof: The equation in 25.3 is just a simple computation. Now suppose $\phi(A)(z) = z$.
Then letting $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, this requires
$$az + b = z(cz + d)$$
and so $az + b = cz^2 + dz$. Since this is to hold for all $z$ it follows $c = 0 = b$ and
$a = d$. The other direction is obvious.
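Consider the claim about the existence of an inverse. Let $ad - cb \neq 0$ for
$f(z) = \frac{az+b}{cz+d}$. Then
$$f(z) = \phi\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right)(z).$$
It follows $\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1}$ exists and equals
$\frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$. Therefore,
\begin{align*}
z = \phi(I)(z)
&= \phi\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\right)(z) \\
&= \phi\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right) \circ \phi\left(\frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\right)(z) \\
&= f \circ f^{-1}(z)
\end{align*}
which shows $f^{-1}$ exists and it is easy to find.
Next suppose $f^{-1}$ exists. I need to verify the condition $ad - cb \neq 0$. If $f^{-1}$
exists, then from the process used to find it, you see that it must be a fractional
linear transformation. Letting $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ so
$\phi(A) = f$, it follows there exists a matrix $B$ such that
$$\phi(BA)(z) = \phi(B) \circ \phi(A)(z) = z.$$
However, it was shown that this implies $BA$ is a nonzero multiple of $I$ which requires
that $A^{-1}$ must exist. Hence the condition must hold.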
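The identity 25.3 and the inverse formula from the proof can be spot-checked numerically. The following is only an illustrative sketch: the helper names `matmul2` and `phi`, the two sample matrices, and the test point are assumptions chosen for the example, not part of the notes.

```python
def matmul2(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def phi(A):
    """Return the fractional linear map z -> (az + b) / (cz + d)."""
    (a, b), (c, d) = A
    return lambda z: (a * z + b) / (c * z + d)


A = ((1, 2), (3, 7))    # ad - bc = 1
B = ((2, 1), (1, 1))    # ad - bc = 1
z = 0.3 + 0.8j          # arbitrary sample point

# Left and right sides of phi(AB) = phi(A) o phi(B) at the sample point.
lhs = phi(matmul2(A, B))(z)
rhs = phi(A)(phi(B)(z))

# Inverse matrix from the proof: (1/(ad - bc)) * [[d, -b], [-c, a]].
A_inv = ((7, -2), (-3, 1))
round_trip = phi(A_inv)(phi(A)(z))
```

Agreement of `lhs` with `rhs`, and of `round_trip` with `z`, at one point is of course only a sanity check, not a proof.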
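Theorem 25.13 If $f$ is a nonconstant elliptic function with a basis $\{w_1, w_2\}$ for
the module of periods, then $\{w_1', w_2'\}$ is another basis if and only if there exists a
unimodular transformation $\begin{pmatrix} a & b \\ c & d \end{pmatrix} = A$ such that
$$\begin{pmatrix} w_1' \\ w_2' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}. \tag{25.4}$$
Proof: Since $\{w_1, w_2\}$ is a basis, there exist integers $a, b, c, d$ such that 25.4
holds. It remains to show the transformation determined by the matrix is unimodular.
Taking conjugates,
$$\begin{pmatrix} \overline{w_1'} \\ \overline{w_2'} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} \overline{w_1} \\ \overline{w_2} \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} w_1' & \overline{w_1'} \\ w_2' & \overline{w_2'} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} w_1 & \overline{w_1} \\ w_2 & \overline{w_2} \end{pmatrix}.$$
Now since $\{w_1', w_2'\}$ is also given to be a basis, there exists another matrix having
all integer entries, $\begin{pmatrix} e & f \\ g & h \end{pmatrix}$, such that
$$\begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} e & f \\ g & h \end{pmatrix}\begin{pmatrix} w_1' \\ w_2' \end{pmatrix}
\quad\text{and}\quad
\begin{pmatrix} \overline{w_1} \\ \overline{w_2} \end{pmatrix} = \begin{pmatrix} e & f \\ g & h \end{pmatrix}\begin{pmatrix} \overline{w_1'} \\ \overline{w_2'} \end{pmatrix}.$$
Therefore,
$$\begin{pmatrix} w_1' & \overline{w_1'} \\ w_2' & \overline{w_2'} \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix}\begin{pmatrix} w_1' & \overline{w_1'} \\ w_2' & \overline{w_2'} \end{pmatrix}.$$
However, since $w_1'/w_2'$ is not real, it is routine to verify that
$$\det\begin{pmatrix} w_1' & \overline{w_1'} \\ w_2' & \overline{w_2'} \end{pmatrix} \neq 0.$$
Therefore,
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix}$$
and so $\det\begin{pmatrix} a & b \\ c & d \end{pmatrix}\det\begin{pmatrix} e & f \\ g & h \end{pmatrix} = 1$.
But the two matrices have all integer entries and so both determinants must equal either 1 or $-1$.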
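Next suppose
$$\begin{pmatrix} w_1' \\ w_2' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} w_1 \\ w_2 \end{pmatrix} \tag{25.5}$$
where $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is unimodular. I need to verify that
$\{w_1', w_2'\}$ is a basis. If $w \in M$, there exist integers $m, n$ such that
$$w = mw_1 + nw_2 = \begin{pmatrix} m & n \end{pmatrix}\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}.$$
From 25.5
$$\pm\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\begin{pmatrix} w_1' \\ w_2' \end{pmatrix} = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$$
and so
$$w = \pm\begin{pmatrix} m & n \end{pmatrix}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}\begin{pmatrix} w_1' \\ w_2' \end{pmatrix}$$
which is an integer linear combination of $w_1', w_2'$. It only remains to verify that
$w_1'/w_2'$ is not real.
Claim: Let $w_1$ and $w_2$ be nonzero complex numbers. Then $w_2/w_1$ is not real
if and only if
$$w_1\overline{w_2} - \overline{w_1}w_2 = \det\begin{pmatrix} w_1 & \overline{w_1} \\ w_2 & \overline{w_2} \end{pmatrix} \neq 0.$$
Proof of the claim: Let $\lambda = w_2/w_1$. Then
$$w_1\overline{w_2} - \overline{w_1}w_2 = \overline{\lambda}w_1\overline{w_1} - \overline{w_1}\lambda w_1 = \left(\overline{\lambda} - \lambda\right)|w_1|^2.$$
Thus the ratio is not real if and only if $\overline{\lambda} - \lambda \neq 0$ if and only if
$w_1\overline{w_2} - \overline{w_1}w_2 \neq 0$.
Now to verify $w_2'/w_1'$ is not real,
$$\det\begin{pmatrix} w_1' & \overline{w_1'} \\ w_2' & \overline{w_2'} \end{pmatrix}
= \det\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} w_1 & \overline{w_1} \\ w_2 & \overline{w_2} \end{pmatrix}\right)
= \pm\det\begin{pmatrix} w_1 & \overline{w_1} \\ w_2 & \overline{w_2} \end{pmatrix} \neq 0.$$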
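The second half of the proof can be illustrated with a small computation. This is a sketch under assumptions: the helper names, the sample basis, the sample matrix, and the integers `m`, `n` are all made up for the example. It rewrites a lattice point in the new basis using the integer inverse of a unimodular matrix and tests the determinant criterion of the claim.

```python
def apply(A, w):
    """Apply a 2x2 matrix to the column vector (w[0], w[1])."""
    (a, b), (c, d) = A
    return (a * w[0] + b * w[1], c * w[0] + d * w[1])


def inverse_unimodular(A):
    """Integer inverse of an integer matrix with determinant +1 or -1."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det not in (1, -1):
        raise ValueError("matrix is not unimodular")
    # 1/det equals det when det is +1 or -1, so all entries stay integers.
    return ((det * d, -det * b), (-det * c, det * a))


def ratio_not_real(w1, w2):
    """Criterion from the claim: w1*conj(w2) - conj(w1)*w2 is nonzero."""
    return abs(w1 * w2.conjugate() - w1.conjugate() * w2) > 1e-12


w = (1 + 0j, 0.5 + 1.0j)       # sample basis, w2/w1 not real
A = ((2, 1), (5, 3))           # determinant 1
w_new = apply(A, w)
A_inv = inverse_unimodular(A)

m, n = 4, -7
point = m * w[0] + n * w[1]
# Row vector (m n) times A^{-1} gives the coefficients in the new basis.
m_new = m * A_inv[0][0] + n * A_inv[1][0]
n_new = m * A_inv[0][1] + n * A_inv[1][1]
point_again = m_new * w_new[0] + n_new * w_new[1]
```

The new coefficients are integers precisely because the determinant is $\pm 1$; for any other integer determinant the inverse would have fractional entries.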
25.1.2 The Search For An Elliptic Function
By Theorems 25.5 and 25.7, if you want to find a nonconstant elliptic function it must
fail to be analytic and also have either no terms in its Laurent expansion which are
of the form $b_1(z-a)^{-1}$ or else these terms must cancel out. It is simplest to look
for a function which simply does not have them. Weierstrass looked for a function
of the form
$$\wp(z) \equiv \frac{1}{z^2} + \sum_{w \neq 0}\left(\frac{1}{(z-w)^2} - \frac{1}{w^2}\right) \tag{25.6}$$
where $w$ consists of all numbers of the form $aw_1 + bw_2$ for $a, b$ integers. Sometimes
people write this as $\wp(z, w_1, w_2)$ to emphasize its dependence on the periods $w_1$
and $w_2$ but I won't do so. It is understood there exist these periods, which are
given. This is a reasonable thing to try. Suppose you formally differentiate the
right side. Never mind whether this is justified for now. This yields
$$\wp'(z) = \frac{-2}{z^3} - \sum_{w \neq 0}\frac{2}{(z-w)^3} = \sum_{w}\frac{-2}{(z-w)^3}$$
which is clearly periodic having both periods $w_1$ and $w_2$. Therefore, $\wp(z+w_1) - \wp(z)$
and $\wp(z+w_2) - \wp(z)$ are both constants, $c_1$ and $c_2$ respectively. The reason
for this is that since $\wp'$ is periodic with periods $w_1$ and $w_2$, it follows
$\wp'(z+w_i) - \wp'(z) = 0$ as long as $z$ is not a period. From 25.6 you can see right away that
$$\wp(z) = \wp(-z).$$
Indeed
\begin{align*}
\wp(-z) &= \frac{1}{z^2} + \sum_{w \neq 0}\left(\frac{1}{(-z-w)^2} - \frac{1}{w^2}\right) \\
&= \frac{1}{z^2} + \sum_{w \neq 0}\left(\frac{1}{(-z+w)^2} - \frac{1}{w^2}\right) = \wp(z)
\end{align*}
and so
$$c_1 = \wp\left(-\frac{w_1}{2} + w_1\right) - \wp\left(-\frac{w_1}{2}\right) = \wp\left(\frac{w_1}{2}\right) - \wp\left(-\frac{w_1}{2}\right) = 0$$
which shows the constant for $\wp(z+w_1) - \wp(z)$ must equal zero. Similarly the
constant for $\wp(z+w_2) - \wp(z)$ also equals zero. Thus $\wp$ is periodic having the two
periods $w_1, w_2$.
Of course to justify this, you need to consider whether the series of 25.6 converges.
Consider the terms of the series.
$$\left|\frac{1}{(z-w)^2} - \frac{1}{w^2}\right| = |z|\left|\frac{2w - z}{(z-w)^2 w^2}\right|$$
If $|w| > 2|z|$, this can be estimated more. For such $w$,
\begin{align*}
\left|\frac{1}{(z-w)^2} - \frac{1}{w^2}\right|
&= |z|\left|\frac{2w - z}{(z-w)^2 w^2}\right| \leq |z|\frac{(5/2)|w|}{|w|^2(|w| - |z|)^2} \\
&\leq |z|\frac{(5/2)|w|}{|w|^2((1/2)|w|)^2} = |z|\frac{10}{|w|^3}.
\end{align*}
It follows the series in 25.6 converges uniformly and absolutely on every compact
set $K$ provided $\sum_{w \neq 0}\frac{1}{|w|^3}$ converges. This question is considered next.
Claim: There exists a positive number $k$ such that for all pairs of integers
$m, n$, not both equal to zero,
$$\frac{|mw_1 + nw_2|}{|m| + |n|} \geq k > 0.$$
Proof of claim: If not, there exist $m_k$ and $n_k$ such that
$$\lim_{k \to \infty}\left(\frac{m_k}{|m_k| + |n_k|}w_1 + \frac{n_k}{|m_k| + |n_k|}w_2\right) = 0.$$
However, $\left(\frac{m_k}{|m_k|+|n_k|}, \frac{n_k}{|m_k|+|n_k|}\right)$ is a bounded sequence in
$\mathbb{R}^2$ and so, taking a subsequence, still denoted by $k$, you can have
$$\left(\frac{m_k}{|m_k| + |n_k|}, \frac{n_k}{|m_k| + |n_k|}\right) \to (x, y) \in \mathbb{R}^2.$$
Since the absolute values of the two components always add to 1, $|x| + |y| = 1$, so
$x, y$ are real numbers, not both zero, such that $xw_1 + yw_2 = 0$, contrary to the
assumption that $w_2/w_1$ is not equal to a real number. This proves the claim.
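Now from the claim,
\begin{align*}
\sum_{w \neq 0}\frac{1}{|w|^3}
&= \sum_{(m,n) \neq (0,0)}\frac{1}{|mw_1 + nw_2|^3} \leq \sum_{(m,n) \neq (0,0)}\frac{1}{k^3(|m| + |n|)^3} \\
&= \frac{1}{k^3}\sum_{j=1}^{\infty}\sum_{|m|+|n|=j}\frac{1}{(|m| + |n|)^3} = \frac{1}{k^3}\sum_{j=1}^{\infty}\frac{4j}{j^3} < \infty.
\end{align*}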
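The last step uses the fact that exactly $4j$ integer pairs satisfy $|m| + |n| = j$. The sketch below (the function name and the cutoffs are arbitrary choices for illustration) counts these pairs by brute force and compares a partial sum of $\sum_j 4j/j^3 = 4\sum_j 1/j^2$ with its limit $4\pi^2/6$.

```python
import math


def count_on_diamond(j):
    """Count integer pairs (m, n) with |m| + |n| == j by brute force."""
    return sum(1
               for m in range(-j, j + 1)
               for n in range(-j, j + 1)
               if abs(m) + abs(n) == j)


counts = [count_on_diamond(j) for j in range(1, 8)]

# Partial sum of sum_j 4j / j^3, which is 4 * sum_j 1 / j^2.
partial = sum(4 * j / j**3 for j in range(1, 2001))
limit = 4 * math.pi**2 / 6
```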
Now consider the series in 25.6. Letting $z \in B(0, R)$,
$$\wp(z) \equiv \frac{1}{z^2} + \sum_{w \neq 0, |w| \leq R}\left(\frac{1}{(z-w)^2} - \frac{1}{w^2}\right) + \sum_{w \neq 0, |w| > R}\left(\frac{1}{(z-w)^2} - \frac{1}{w^2}\right)$$
and the last series converges uniformly on $B(0, R)$ to an analytic function. Thus $\wp$ is
a meromorphic function and also the argument given above involving differentiation
of the series termwise is valid. Thus $\wp$ is an elliptic function as claimed. This is
called the Weierstrass $\wp$ function. This has proved the following theorem.
Theorem 25.14 The function $\wp$ defined above is an example of an elliptic function.
On any compact set, $\wp$ equals a rational function added to a series which is uniformly
and absolutely convergent on the compact set.
25.1.3 The Differential Equation Satisfied By $\wp$
For $z$ not a pole,
$$\wp'(z) = \frac{-2}{z^3} - \sum_{w \neq 0}\frac{2}{(z-w)^3}.$$
Also since there are no poles of order 1 you can obtain a primitive for $\wp$, $-\zeta$
(see footnote 2). To do so, recall
$$\wp(z) \equiv \frac{1}{z^2} + \sum_{w \neq 0}\left(\frac{1}{(z-w)^2} - \frac{1}{w^2}\right)$$
where for $|z| < R$ this is the sum of a rational function with a uniformly convergent
series. Therefore, you can take the integral along any path $\gamma(0, z)$ from 0 to $z$
which misses the poles of $\wp$. By the uniform convergence of the above series, you
can interchange the sum with the integral and obtain
$$\zeta(z) = \frac{1}{z} + \sum_{w \neq 0}\left(\frac{1}{z-w} + \frac{z}{w^2} + \frac{1}{w}\right). \tag{25.7}$$
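This function is odd. Here is why.
$$\zeta(-z) = \frac{1}{-z} + \sum_{w \neq 0}\left(\frac{1}{-z-w} - \frac{z}{w^2} + \frac{1}{w}\right)$$
while
\begin{align*}
-\zeta(z) &= \frac{1}{-z} + \sum_{w \neq 0}\left(\frac{-1}{z-w} - \frac{z}{w^2} - \frac{1}{w}\right) \\
&= \frac{1}{-z} + \sum_{w \neq 0}\left(\frac{-1}{z+w} - \frac{z}{w^2} + \frac{1}{w}\right),
\end{align*}
where the last line replaces $w$ by $-w$ in the sum.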
Now consider 25.7. It will be used to find the Laurent expansion about the origin
for $\zeta$ which will then be differentiated to obtain the Laurent expansion for $\wp$ at the
origin. Since $w \neq 0$ and the interest is for $z$ near 0 so $|z| < |w|$,
\begin{align*}
\frac{1}{z-w} + \frac{z}{w^2} + \frac{1}{w}
&= \frac{z}{w^2} + \frac{1}{w} - \frac{1}{w}\cdot\frac{1}{1 - \frac{z}{w}} \\
&= \frac{z}{w^2} + \frac{1}{w} - \frac{1}{w}\sum_{k=0}^{\infty}\left(\frac{z}{w}\right)^k
= -\frac{1}{w}\sum_{k=2}^{\infty}\left(\frac{z}{w}\right)^k.
\end{align*}
Footnote 2: I don't know why it is traditional to refer to this antiderivative as $-\zeta$
rather than $\zeta$ but I am following the convention. I think it is to minimize the number
of minus signs in the next expression.
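From 25.7
\begin{align*}
\zeta(z) &= \frac{1}{z} + \sum_{w \neq 0}\left(-\sum_{k=2}^{\infty}\frac{z^k}{w^{k+1}}\right) \\
&= \frac{1}{z} - \sum_{k=2}^{\infty}\sum_{w \neq 0}\frac{z^k}{w^{k+1}}
= \frac{1}{z} - \sum_{k=2}^{\infty}\sum_{w \neq 0}\frac{z^{2k-1}}{w^{2k}}
\end{align*}
because the sum over odd powers of $w$ must be zero: for each $w \neq 0$, there exists
$-w \neq 0$ such that the two terms $\frac{z^{2k}}{w^{2k+1}}$ and $\frac{z^{2k}}{(-w)^{2k+1}}$
cancel each other. Hence
$$\zeta(z) = \frac{1}{z} - \sum_{k=2}^{\infty} G_k z^{2k-1}$$
where $G_k = \sum_{w \neq 0}\frac{1}{w^{2k}}$. Now with this,
\begin{align*}
-\zeta'(z) = \wp(z) &= \frac{1}{z^2} + \sum_{k=2}^{\infty} G_k(2k-1)z^{2k-2} \\
&= \frac{1}{z^2} + 3G_2 z^2 + 5G_3 z^4 + \cdots
\end{align*}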
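Therefore,
\begin{align*}
\wp'(z) &= \frac{-2}{z^3} + 6G_2 z + 20G_3 z^3 + \cdots \\
\wp'(z)^2 &= \frac{4}{z^6} - \frac{24G_2}{z^2} - 80G_3 + \cdots \\
4\wp(z)^3 &= 4\left(\frac{1}{z^2} + 3G_2 z^2 + 5G_3 z^4 + \cdots\right)^3
= \frac{4}{z^6} + \frac{36}{z^2}G_2 + 60G_3 + \cdots
\end{align*}
and finally
$$60G_2\wp(z) = \frac{60G_2}{z^2} + 0 + \cdots$$
where in the above, the positive powers of $z$ are not listed explicitly. Therefore,
$$\wp'(z)^2 - 4\wp(z)^3 + 60G_2\wp(z) + 140G_3 = \sum_{n=1}^{\infty} a_n z^n.$$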
In deriving the equation it was assumed $|z| < |w|$ for all $w = aw_1 + bw_2$ where $a, b$ are
integers not both zero. The left side of the above equation is periodic with respect
to $w_1$ and $w_2$ where $w_2/w_1$ is not a real number. The only possible poles of the
left side are at $0, w_1, w_2$, and $w_1 + w_2$, the vertices of the parallelogram determined
by $w_1$ and $w_2$. This follows from the original formula for $\wp(z)$. However, the above
equation shows the left side has no pole at 0. Since the left side is periodic with
periods $w_1$ and $w_2$, it follows it has no pole at the other vertices of this parallelogram
either. Therefore, the left side is periodic and has no poles. Consequently, it equals
a constant by Theorem 25.5. But the right side of the above equation shows this
constant is 0 because this side equals zero when $z = 0$. Therefore, $\wp$ satisfies the
differential equation
$$\wp'(z)^2 - 4\wp(z)^3 + 60G_2\wp(z) + 140G_3 = 0.$$
It is traditional to define $60G_2 \equiv g_2$ and $140G_3 \equiv g_3$. Then in terms of these new
quantities the differential equation is
$$\wp'(z)^2 = 4\wp(z)^3 - g_2\wp(z) - g_3.$$
Suppose $e_1, e_2$ and $e_3$ are the zeros of the polynomial $4w^3 - g_2 w - g_3$. Then the
above equation can be written in the form
$$\wp'(z)^2 = 4(\wp(z) - e_1)(\wp(z) - e_2)(\wp(z) - e_3). \tag{25.8}$$
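The differential equation can be checked approximately by truncating every lattice sum to a finite symmetric box. This is only a rough numerical sketch: the periods, the sample point, the box size `N = 40`, and all function names are assumptions made for the illustration, and the truncation leaves an error of roughly order $1/N^2$, so agreement is expected only to a few digits.

```python
def lattice(w1, w2, N):
    """Nonzero lattice points m*w1 + n*w2 with |m|, |n| <= N."""
    return [m * w1 + n * w2
            for m in range(-N, N + 1)
            for n in range(-N, N + 1)
            if (m, n) != (0, 0)]


def wp(z, pts):
    """Truncated version of the series 25.6."""
    return 1 / z**2 + sum(1 / (z - w)**2 - 1 / w**2 for w in pts)


def wp_prime(z, pts):
    """Truncated termwise derivative of 25.6."""
    return -2 / z**3 - sum(2 / (z - w)**3 for w in pts)


w1, w2 = 1 + 0j, 0.5 + 1.0j      # sample periods with w2/w1 not real
pts = lattice(w1, w2, 40)
g2 = 60 * sum(1 / w**4 for w in pts)     # g2 = 60 G_2
g3 = 140 * sum(1 / w**6 for w in pts)    # g3 = 140 G_3

z = 0.3 + 0.2j
lhs = wp_prime(z, pts) ** 2
rhs = 4 * wp(z, pts) ** 3 - g2 * wp(z, pts) - g3
rel_err = abs(lhs - rhs) / abs(lhs)

# The truncation box is symmetric under w -> -w, so evenness survives it.
even_err = abs(wp(z, pts) - wp(-z, pts))
```

Increasing `N` should shrink `rel_err`, at the cost of a quadratically growing number of terms.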
25.1.4 A Modular Function
The next task is to find the $e_i$ in 25.8. First recall that $\wp$ is an even function. That
is $\wp(-z) = \wp(z)$. This follows from 25.6 which is listed here for convenience.
$$\wp(z) \equiv \frac{1}{z^2} + \sum_{w \neq 0}\left(\frac{1}{(z-w)^2} - \frac{1}{w^2}\right) \tag{25.9}$$
Thus
\begin{align*}
\wp(-z) &= \frac{1}{z^2} + \sum_{w \neq 0}\left(\frac{1}{(-z-w)^2} - \frac{1}{w^2}\right) \\
&= \frac{1}{z^2} + \sum_{w \neq 0}\left(\frac{1}{(-z+w)^2} - \frac{1}{w^2}\right) = \wp(z).
\end{align*}
Therefore, $\wp(w_1 - z) = \wp(z - w_1) = \wp(z)$ and so $-\wp'(w_1 - z) = \wp'(z)$. Letting
$z = w_1/2$, it follows $\wp'(w_1/2) = 0$. Similarly, $\wp'(w_2/2) = 0$ and
$\wp'((w_1 + w_2)/2) = 0$. Therefore, from 25.8
$$0 = 4(\wp(w_1/2) - e_1)(\wp(w_1/2) - e_2)(\wp(w_1/2) - e_3).$$
It follows one of the $e_i$ must equal $\wp(w_1/2)$. Similarly, one of the $e_i$ must equal
$\wp(w_2/2)$ and one must equal $\wp((w_1 + w_2)/2)$.
Lemma 25.15 The numbers $\wp(w_1/2)$, $\wp(w_2/2)$, and $\wp((w_1 + w_2)/2)$ are distinct.
Proof: Choose $P_a$, a period parallelogram which contains the pole 0, and the
points $w_1/2$, $w_2/2$, and $(w_1 + w_2)/2$ but no other pole of $\wp(z)$. Also $\partial P_a$ does not
contain any zeros of the elliptic function $z \to \wp(z) - \wp(w_1/2)$. This can be done
by shifting $P_0$ slightly because the poles are only at the points $aw_1 + bw_2$ for $a, b$
integers and the zeros of $\wp(z) - \wp(w_1/2)$ are discrete.
[Figure: the parallelogram with vertices $0$, $w_1$, $w_2$, $w_1 + w_2$, shifted slightly so that its corner is at $a$.]
If $\wp(w_2/2) = \wp(w_1/2)$, then $\wp(z) - \wp(w_1/2)$ has two zeros, $w_2/2$ and $w_1/2$, and
since the pole at 0 is of order 2, this is the order of $\wp(z) - \wp(w_1/2)$ on $P_a$. Hence by
Theorem 25.8 on Page 664 these are the only zeros of this function on $P_a$. It follows
by Corollary 25.10 on Page 666, which says the sum of the zeros minus the sum of
the poles is in $M$, that $\frac{w_1}{2} + \frac{w_2}{2} \in M$. Thus there exist integers $a, b$ such that
$$\frac{w_1 + w_2}{2} = aw_1 + bw_2$$
which implies $(2a - 1)w_1 + (2b - 1)w_2 = 0$, contradicting $w_2/w_1$ not being real.
Similar reasoning applies to the other pairs of points in $\{w_1/2, w_2/2, (w_1 + w_2)/2\}$.
For example, consider $(w_1 + w_2)/2$ and choose $P_a$ such that its boundary contains
no zeros of the elliptic function $z \to \wp(z) - \wp((w_1 + w_2)/2)$ and $P_a$ contains no
poles of $\wp$ on its interior other than 0. Then if $\wp(w_2/2) = \wp((w_1 + w_2)/2)$, it
follows from Theorem 25.8 on Page 664 that $w_2/2$ and $(w_1 + w_2)/2$ are the only two
zeros of $\wp(z) - \wp((w_1 + w_2)/2)$ on $P_a$ and by Corollary 25.10 on Page 666
$$\frac{w_2 + w_1 + w_2}{2} = aw_1 + bw_2 \in M$$
for some integers $a, b$, which leads to the same contradiction as before about $w_1/w_2$
not being real. The other cases are similar. This proves the lemma.
Lemma 25.15 proves the $e_i$ are distinct. Number the $e_i$ such that
$$e_1 = \wp(w_1/2), \quad e_2 = \wp(w_2/2)$$
and
$$e_3 = \wp((w_1 + w_2)/2).$$
To summarize, it has been shown that for complex numbers $w_1$ and $w_2$ with
$w_2/w_1$ not real, an elliptic function $\wp$ has been defined. Denote this function as
$\wp(z) = \wp(z, w_1, w_2)$. This in turn determined numbers $e_i$ as described above. Thus
these numbers depend on $w_1$ and $w_2$ and as described above,
$$e_1(w_1, w_2) = \wp\left(\frac{w_1}{2}, w_1, w_2\right), \quad e_2(w_1, w_2) = \wp\left(\frac{w_2}{2}, w_1, w_2\right),$$
$$e_3(w_1, w_2) = \wp\left(\frac{w_1 + w_2}{2}, w_1, w_2\right).$$
Therefore, using the formula for $\wp$, 25.9,
$$\wp(z) \equiv \frac{1}{z^2} + \sum_{w \neq 0}\left(\frac{1}{(z-w)^2} - \frac{1}{w^2}\right)$$
you see that if the two periods $w_1$ and $w_2$ are replaced with $tw_1$ and $tw_2$ respectively,
then
$$e_i(tw_1, tw_2) = t^{-2}e_i(w_1, w_2).$$
Let $\tau$ denote the complex number which equals the ratio $w_2/w_1$, which was assumed
in all this to not be real. Then
$$e_i(w_1, w_2) = w_1^{-2}e_i(1, \tau).$$
Now define the function $\lambda(\tau)$,
$$\lambda(\tau) \equiv \frac{e_3(1, \tau) - e_2(1, \tau)}{e_1(1, \tau) - e_2(1, \tau)}
\left(= \frac{e_3(w_1, w_2) - e_2(w_1, w_2)}{e_1(w_1, w_2) - e_2(w_1, w_2)}\right). \tag{25.10}$$
This function is meromorphic for $\operatorname{Im}\tau > 0$ or for $\operatorname{Im}\tau < 0$. However, since the
denominator is never equal to zero the function must actually be analytic on both
the upper half plane and the lower half plane. It never is equal to 0 because $e_3 \neq e_2$
and it never equals 1 because $e_3 \neq e_1$. This is stated as an observation.
Observation 25.16 The function $\lambda(\tau)$ is analytic for $\tau$ in the upper half plane
and never assumes the values 0 and 1.
This is a very interesting function. Consider what happens when
$$\begin{pmatrix} w_1' \\ w_2' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$$
and the matrix is unimodular. By Theorem 25.13 on Page 668 $\{w_1', w_2'\}$ is just another
basis for the same module of periods. Therefore, $\wp(z, w_1, w_2) = \wp(z, w_1', w_2')$
because both are defined as sums over the same values of $w$, just in different order,
which does not matter because of the absolute convergence of the sums on compact
subsets of $\mathbb{C}$. Since $\wp$ is unchanged, it follows $\wp'(z)$ is also unchanged and so the
numbers $e_i$ are also the same. However, they might be permuted, in which case
the function $\lambda(\tau)$ defined above would change. What would it take for $\lambda(\tau)$ to
not change? In other words, for which unimodular transformations will $\lambda$ be left
unchanged? This happens if and only if no permuting takes place for the $e_i$. This
occurs if $\wp\left(\frac{w_1}{2}\right) = \wp\left(\frac{w_1'}{2}\right)$ and $\wp\left(\frac{w_2}{2}\right) = \wp\left(\frac{w_2'}{2}\right)$. If
$$\frac{w_1'}{2} - \frac{w_1}{2} \in M, \quad \frac{w_2'}{2} - \frac{w_2}{2} \in M$$
then $\wp\left(\frac{w_1}{2}\right) = \wp\left(\frac{w_1'}{2}\right)$ and so $e_1$ will be unchanged and similarly for $e_2$ and $e_3$.
This occurs exactly when
$$\frac{1}{2}((a-1)w_1 + bw_2) \in M, \quad \frac{1}{2}(cw_1 + (d-1)w_2) \in M.$$
This happens if $a$ and $d$ are odd and if $b$ and $c$ are even. Of course the stylish way
to say this is
$$a \equiv 1 \bmod 2, \quad d \equiv 1 \bmod 2, \quad b \equiv 0 \bmod 2, \quad c \equiv 0 \bmod 2. \tag{25.11}$$
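This has shown that for unimodular transformations satisfying 25.11 $\lambda$ is unchanged.
Letting $\tau'$ be defined as above,
$$\tau' = \frac{w_2'}{w_1'} \equiv \frac{cw_1 + dw_2}{aw_1 + bw_2} = \frac{c + d\tau}{a + b\tau}.$$
Thus for unimodular transformations $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ satisfying 25.11, or more
succinctly,
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \equiv \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \bmod 2 \tag{25.12}$$
it follows that
$$\lambda\left(\frac{c + d\tau}{a + b\tau}\right) = \lambda(\tau). \tag{25.13}$$
Furthermore, this is the only way this can happen.
Lemma 25.17 $\lambda(\tau) = \lambda(\tau')$ if and only if
$$\tau' = \frac{a\tau + b}{c\tau + d}$$
where 25.12 holds.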
Proof: It only remains to verify that if $\wp(w_1'/2) = \wp(w_1/2)$ then it is necessary
that
$$\frac{w_1'}{2} - \frac{w_1}{2} \in M$$
with a similar requirement for $w_2$ and $w_2'$. If $\frac{w_1'}{2} - \frac{w_1}{2} \notin M$, then there exist
integers $m, n$ such that
$$\frac{-w_1'}{2} + mw_1 + nw_2$$
is in the interior of $P_0$, the period parallelogram whose vertices are $0, w_1, w_1 + w_2$,
and $w_2$. Therefore, it is possible to choose small $a$ such that $P_a$ contains the pole
0, $\frac{w_1}{2}$, and $\frac{-w_1'}{2} + mw_1 + nw_2$ but no other poles of $\wp$ and in addition, $\partial P_a$ contains
no zeros of $z \to \wp(z) - \wp\left(\frac{w_1}{2}\right)$. Then the order of this elliptic function is 2. By
assumption, and the fact that $\wp$ is even,
$$\wp\left(\frac{-w_1'}{2} + mw_1 + nw_2\right) = \wp\left(\frac{-w_1'}{2}\right) = \wp\left(\frac{w_1'}{2}\right) = \wp\left(\frac{w_1}{2}\right).$$
It follows both $\frac{-w_1'}{2} + mw_1 + nw_2$ and $\frac{w_1}{2}$ are zeros of $\wp(z) - \wp\left(\frac{w_1}{2}\right)$ and so by
Theorem 25.8 on Page 664 these are the only two zeros of this function in $P_a$.
Therefore, from Corollary 25.10 on Page 666
$$\frac{w_1}{2} - \frac{w_1'}{2} + mw_1 + nw_2 \in M$$
which shows $\frac{w_1}{2} - \frac{w_1'}{2} \in M$. This completes the proof of the lemma.
Note the condition in the lemma is equivalent to the condition 25.13 because
you can relabel the coefficients. The message of either version is that the coefficient
of $\tau$ in the numerator and the constant in the denominator are odd, while the constant
in the numerator and the coefficient of $\tau$ in the denominator are even.
Next, $\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \equiv \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \bmod 2$ and therefore,
$$\lambda\left(\frac{2 + \tau}{1}\right) = \lambda(\tau + 2) = \lambda(\tau). \tag{25.14}$$
Thus $\lambda$ is periodic of period 2.
Thus $\lambda$ is left invariant by a certain subgroup of the unimodular group. According
to the next definition, $\lambda$ is an example of something called a modular function.
Definition 25.18 When an analytic or meromorphic function is invariant under
a group of linear transformations, it is called an automorphic function. A function
which is automorphic with respect to a subgroup of the modular group is called a
modular function or an elliptic modular function.
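Now consider what happens for some other unimodular matrices which are not
congruent to the identity mod 2. This will yield other functional equations for $\lambda$
in addition to the fact that $\lambda$ is periodic of period 2. As before, these functional
equations come about because $\wp$ is unchanged when you change the basis for $M$,
the module of periods. In particular, consider the unimodular matrices
$$\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{25.15}$$
Consider the first of these. Thus
$$\begin{pmatrix} w_1' \\ w_2' \end{pmatrix} = \begin{pmatrix} w_1 \\ w_1 + w_2 \end{pmatrix}.$$
Hence $\tau' = w_2'/w_1' = (w_1 + w_2)/w_1 = 1 + \tau$. Then from the definition of $\lambda$,
\begin{align*}
\lambda(\tau') = \lambda(1 + \tau)
&= \frac{\wp\left(\frac{w_1' + w_2'}{2}\right) - \wp\left(\frac{w_2'}{2}\right)}{\wp\left(\frac{w_1'}{2}\right) - \wp\left(\frac{w_2'}{2}\right)}
= \frac{\wp\left(\frac{w_1 + w_2 + w_1}{2}\right) - \wp\left(\frac{w_1 + w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_1 + w_2}{2}\right)} \\
&= \frac{\wp\left(\frac{w_2}{2} + w_1\right) - \wp\left(\frac{w_1 + w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_1 + w_2}{2}\right)}
= \frac{\wp\left(\frac{w_2}{2}\right) - \wp\left(\frac{w_1 + w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_1 + w_2}{2}\right)} \\
&= -\frac{\wp\left(\frac{w_1 + w_2}{2}\right) - \wp\left(\frac{w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_1 + w_2}{2}\right)}
= -\frac{\wp\left(\frac{w_1 + w_2}{2}\right) - \wp\left(\frac{w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_2}{2}\right) + \wp\left(\frac{w_2}{2}\right) - \wp\left(\frac{w_1 + w_2}{2}\right)} \\
&= -\frac{\dfrac{\wp\left(\frac{w_1 + w_2}{2}\right) - \wp\left(\frac{w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_2}{2}\right)}}{1 + \dfrac{\wp\left(\frac{w_2}{2}\right) - \wp\left(\frac{w_1 + w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_2}{2}\right)}}
= \frac{\dfrac{\wp\left(\frac{w_1 + w_2}{2}\right) - \wp\left(\frac{w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_2}{2}\right)}}{\dfrac{\wp\left(\frac{w_1 + w_2}{2}\right) - \wp\left(\frac{w_2}{2}\right)}{\wp\left(\frac{w_1}{2}\right) - \wp\left(\frac{w_2}{2}\right)} - 1} \\
&= \frac{\lambda(\tau)}{\lambda(\tau) - 1}. \tag{25.16}
\end{align*}
Summarizing the important feature of the above,
$$\lambda(1 + \tau) = \frac{\lambda(\tau)}{\lambda(\tau) - 1}. \tag{25.17}$$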
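Next consider the other unimodular matrix in 25.15. In this case $w_1' = w_2$ and
$w_2' = w_1$. Therefore, $\tau' = w_2'/w_1' = w_1/w_2 = 1/\tau$. Then
\begin{align*}
\lambda(\tau') = \lambda(1/\tau)
&= \frac{\wp\left(\frac{w_1' + w_2'}{2}\right) - \wp\left(\frac{w_2'}{2}\right)}{\wp\left(\frac{w_1'}{2}\right) - \wp\left(\frac{w_2'}{2}\right)}
= \frac{\wp\left(\frac{w_1 + w_2}{2}\right) - \wp\left(\frac{w_1}{2}\right)}{\wp\left(\frac{w_2}{2}\right) - \wp\left(\frac{w_1}{2}\right)} \\
&= \frac{e_3 - e_1}{e_2 - e_1} = -\frac{e_3 - e_2 + e_2 - e_1}{e_1 - e_2} \\
&= -(\lambda(\tau) - 1) = -\lambda(\tau) + 1. \tag{25.18}
\end{align*}
You could try other unimodular matrices and attempt to find other functional
equations if you like but this much will suffice here.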
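25.1.5 A Formula For $\lambda$
Recall the formula of Mittag-Leffler for $\cot(\pi\alpha)$ given in 24.15. For convenience,
here it is.
$$\frac{1}{\alpha} + \sum_{n=1}^{\infty}\frac{2\alpha}{\alpha^2 - n^2} = \pi\cot\pi\alpha.$$
As explained in the derivation of this formula it can also be written as
$$\sum_{n=-\infty}^{\infty}\frac{\alpha}{\alpha^2 - n^2} = \pi\cot\pi\alpha.$$
Differentiating both sides yields
\begin{align*}
\pi^2\csc^2(\pi\alpha)
&= \sum_{n=-\infty}^{\infty}\frac{\alpha^2 + n^2}{(\alpha^2 - n^2)^2}
= \sum_{n=-\infty}^{\infty}\frac{(\alpha + n)^2 - 2\alpha n}{(\alpha + n)^2(\alpha - n)^2} \\
&= \sum_{n=-\infty}^{\infty}\frac{(\alpha + n)^2}{(\alpha + n)^2(\alpha - n)^2}
 - \underbrace{\sum_{n=-\infty}^{\infty}\frac{2\alpha n}{(\alpha^2 - n^2)^2}}_{=0} \\
&= \sum_{n=-\infty}^{\infty}\frac{1}{(\alpha - n)^2}. \tag{25.19}
\end{align*}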
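Formula 25.19 is easy to test numerically. In the sketch below the sample value of $\alpha$, the cutoff, and the function name are arbitrary assumptions; the symmetric partial sum converges slowly, with a tail of size roughly $2/N$, so only a few digits of agreement should be expected.

```python
import cmath


def partial_sum(alpha, N):
    """Symmetric partial sum of sum_n 1 / (alpha - n)^2 over |n| <= N."""
    return sum(1 / (alpha - n) ** 2 for n in range(-N, N + 1))


alpha = 0.3 + 0.4j                      # any non-integer sample value
exact = (cmath.pi / cmath.sin(cmath.pi * alpha)) ** 2
approx = partial_sum(alpha, 20000)
err = abs(approx - exact)
```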
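Now this formula can be used to obtain a formula for $\lambda(\tau)$. As pointed out
above, $\lambda$ depends only on the ratio $w_2/w_1$ and so it suffices to take $w_1 = 1$ and
$w_2 = \tau$. Thus
$$\lambda(\tau) = \frac{\wp\left(\frac{1 + \tau}{2}\right) - \wp\left(\frac{\tau}{2}\right)}{\wp\left(\frac{1}{2}\right) - \wp\left(\frac{\tau}{2}\right)}. \tag{25.20}$$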
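From the original formula for $\wp$,
\begin{align*}
\wp\left(\frac{1 + \tau}{2}\right) - \wp\left(\frac{\tau}{2}\right)
&= \frac{1}{\left(\frac{1 + \tau}{2}\right)^2} - \frac{1}{\left(\frac{\tau}{2}\right)^2}
 + \sum_{(k,m) \neq (0,0)}\left[\frac{1}{\left(k - \frac{1}{2} + \left(m - \frac{1}{2}\right)\tau\right)^2} - \frac{1}{\left(k + \left(m - \frac{1}{2}\right)\tau\right)^2}\right] \\
&= \sum_{(k,m) \in \mathbb{Z}^2}\left[\frac{1}{\left(k - \frac{1}{2} + \left(m - \frac{1}{2}\right)\tau\right)^2} - \frac{1}{\left(k + \left(m - \frac{1}{2}\right)\tau\right)^2}\right] \\
&= \sum_{(k,m) \in \mathbb{Z}^2}\left[\frac{1}{\left(k - \frac{1}{2} + \left(-m - \frac{1}{2}\right)\tau\right)^2} - \frac{1}{\left(k + \left(-m - \frac{1}{2}\right)\tau\right)^2}\right] \\
&= \sum_{(k,m) \in \mathbb{Z}^2}\left[\frac{1}{\left(-k - \frac{1}{2} + \left(-m - \frac{1}{2}\right)\tau\right)^2} - \frac{1}{\left(-k + \left(-m - \frac{1}{2}\right)\tau\right)^2}\right] \\
&= \sum_{(k,m) \in \mathbb{Z}^2}\left[\frac{1}{\left(\frac{1}{2} + \left(m + \frac{1}{2}\right)\tau + k\right)^2} - \frac{1}{\left(\left(m + \frac{1}{2}\right)\tau + k\right)^2}\right]. \tag{25.21}
\end{align*}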
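Similarly,
\begin{align*}
\wp\left(\frac{1}{2}\right) - \wp\left(\frac{\tau}{2}\right)
&= \frac{1}{\left(\frac{1}{2}\right)^2} - \frac{1}{\left(\frac{\tau}{2}\right)^2}
 + \sum_{(k,m) \neq (0,0)}\left[\frac{1}{\left(k - \frac{1}{2} + m\tau\right)^2} - \frac{1}{\left(k + \left(m - \frac{1}{2}\right)\tau\right)^2}\right] \\
&= \sum_{(k,m) \in \mathbb{Z}^2}\left[\frac{1}{\left(k - \frac{1}{2} + m\tau\right)^2} - \frac{1}{\left(k + \left(m - \frac{1}{2}\right)\tau\right)^2}\right] \\
&= \sum_{(k,m) \in \mathbb{Z}^2}\left[\frac{1}{\left(k - \frac{1}{2} - m\tau\right)^2} - \frac{1}{\left(k + \left(-m - \frac{1}{2}\right)\tau\right)^2}\right] \\
&= \sum_{(k,m) \in \mathbb{Z}^2}\left[\frac{1}{\left(\frac{1}{2} + m\tau - k\right)^2} - \frac{1}{\left(\left(m + \frac{1}{2}\right)\tau - k\right)^2}\right]. \tag{25.22}
\end{align*}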
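Now use 25.19 to sum these over $k$. This yields,
\begin{align*}
\wp\left(\frac{1 + \tau}{2}\right) - \wp\left(\frac{\tau}{2}\right)
&= \sum_m\left[\frac{\pi^2}{\sin^2\left(\pi\left(\frac{1}{2} + \left(m + \frac{1}{2}\right)\tau\right)\right)} - \frac{\pi^2}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right] \\
&= \sum_m\left[\frac{\pi^2}{\cos^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)} - \frac{\pi^2}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right]
\end{align*}
and
\begin{align*}
\wp\left(\frac{1}{2}\right) - \wp\left(\frac{\tau}{2}\right)
&= \sum_m\left[\frac{\pi^2}{\sin^2\left(\pi\left(\frac{1}{2} + m\tau\right)\right)} - \frac{\pi^2}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right] \\
&= \sum_m\left[\frac{\pi^2}{\cos^2(\pi m\tau)} - \frac{\pi^2}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right].
\end{align*}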
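The following interesting formula for $\lambda$ results.
$$\lambda(\tau) = \frac{\sum_m\left[\dfrac{1}{\cos^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)} - \dfrac{1}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right]}{\sum_m\left[\dfrac{1}{\cos^2(\pi m\tau)} - \dfrac{1}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right]}. \tag{25.23}$$
From this it is obvious $\lambda(-\tau) = \lambda(\tau)$. Therefore, from 25.18,
$$-\lambda(\tau) + 1 = \lambda\left(\frac{1}{\tau}\right) = \lambda\left(\frac{-1}{\tau}\right). \tag{25.24}$$
(It is good to recall that $\lambda$ has been defined for $\tau \notin \mathbb{R}$.)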
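Because the terms of 25.23 decay exponentially in $|m|$ when $\operatorname{Im}\tau \neq 0$, the formula is convenient for computation. The sketch below is an illustration only: the function name `lam`, the cutoff `M`, and the sample point are assumptions. It evaluates a truncated version of 25.23 and looks at two consequences: the symmetry $\lambda(-\tau) = \lambda(\tau)$, and the value at $\tau = i$, where 25.24 with $-1/i = i$ forces $\lambda(i) = 1 - \lambda(i)$, that is $\lambda(i) = 1/2$.

```python
import cmath


def lam(tau, M=20):
    """Formula 25.23 with both sums truncated to m = -M, ..., M."""
    pi = cmath.pi
    num = 0
    den = 0
    for m in range(-M, M + 1):
        s = 1 / cmath.sin(pi * (m + 0.5) * tau) ** 2
        num += 1 / cmath.cos(pi * (m + 0.5) * tau) ** 2 - s
        den += 1 / cmath.cos(pi * m * tau) ** 2 - s
    return num / den


tau = 0.3 + 0.9j                 # sample point in the upper half plane
value = lam(tau)
value_reflected = lam(-tau)
value_at_i = lam(1j)
```

The cutoff `M` should stay moderate: for large `M * Im(tau)` the hyperbolic growth of the cosine and sine eventually overflows double precision.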
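25.1.6 Mapping Properties Of $\lambda$
The two functional equations 25.24 and 25.17, along with some other properties presented
above, are of fundamental importance. For convenience, they are summarized
here in the following lemma.
Lemma 25.19 The following functional equations hold for $\lambda$.
$$\lambda(1 + \tau) = \frac{\lambda(\tau)}{\lambda(\tau) - 1}, \quad 1 = \lambda(\tau) + \lambda\left(\frac{-1}{\tau}\right) \tag{25.25}$$
$$\lambda(\tau + 2) = \lambda(\tau), \tag{25.26}$$
$\lambda(z) = \lambda(w)$ if and only if there exists a unimodular matrix
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \equiv \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \bmod 2$$
such that
$$w = \frac{az + b}{cz + d}. \tag{25.27}$$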
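The functional equations 25.25 and 25.26 can be spot-checked numerically from the series 25.23. As before this is a sketch under assumptions: `lam` is a hypothetical helper that truncates 25.23 to $|m| \leq M$, and the sample point is arbitrary.

```python
import cmath


def lam(tau, M=20):
    """Formula 25.23 with both sums truncated to m = -M, ..., M."""
    pi = cmath.pi
    num = 0
    den = 0
    for m in range(-M, M + 1):
        s = 1 / cmath.sin(pi * (m + 0.5) * tau) ** 2
        num += 1 / cmath.cos(pi * (m + 0.5) * tau) ** 2 - s
        den += 1 / cmath.cos(pi * m * tau) ** 2 - s
    return num / den


tau = 0.3 + 0.9j
l0 = lam(tau)

# Residuals of the three functional equations at the sample point.
err_shift1 = abs(lam(tau + 1) - l0 / (l0 - 1))   # 25.25, first equation
err_invert = abs(lam(-1 / tau) + l0 - 1)         # 25.25, second equation
err_shift2 = abs(lam(tau + 2) - l0)              # 25.26
```

The two shift equations can be read off from 25.23 term by term, so their residuals are pure rounding error; the inversion equation is the one that genuinely tests the modular behavior.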
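Consider the following picture.
[Figure: the region $\Omega$ in the upper half plane bounded by the lines $l_1$, $l_2$ and the semicircle $C$, with the points $\frac{1}{2}$ and $1$ marked on the real axis.]
In this picture, $l_1$ is the $y$ axis and $l_2$ is the line $x = 1$ while $C$ is the top half of
the circle centered at $\left(\frac{1}{2}, 0\right)$ which has radius 1/2. Note the above formula implies
$\lambda$ has real values on $l_1$ which are between 0 and 1. This is because 25.23 implies
\begin{align*}
\lambda(ib)
&= \frac{\sum_m\left[\dfrac{1}{\cos^2\left(\pi\left(m + \frac{1}{2}\right)ib\right)} - \dfrac{1}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)ib\right)}\right]}{\sum_m\left[\dfrac{1}{\cos^2(\pi mib)} - \dfrac{1}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)ib\right)}\right]} \\
&= \frac{\sum_m\left[\dfrac{1}{\cosh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)} + \dfrac{1}{\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)}\right]}{\sum_m\left[\dfrac{1}{\cosh^2(\pi mb)} + \dfrac{1}{\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)}\right]} \in (0, 1). \tag{25.28}
\end{align*}
This follows from the observation that
$$\cos(ix) = \cosh(x), \quad \sin(ix) = i\sinh(x).$$
Thus it is clear from 25.28 that $\lim_{b \to 0+}\lambda(ib) = 1$.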
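Next I need to consider the behavior of $\lambda(\tau)$ as $\operatorname{Im}(\tau) \to \infty$. From 25.23 listed
here for convenience,
$$\lambda(\tau) = \frac{\sum_m\left[\dfrac{1}{\cos^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)} - \dfrac{1}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right]}{\sum_m\left[\dfrac{1}{\cos^2(\pi m\tau)} - \dfrac{1}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right]}, \tag{25.29}$$
it follows
\begin{align*}
\lambda(\tau)
&= \frac{\dfrac{1}{\cos^2\left(\pi\left(-\frac{1}{2}\right)\tau\right)} - \dfrac{1}{\sin^2\left(\pi\left(-\frac{1}{2}\right)\tau\right)} + \dfrac{1}{\cos^2\left(\pi\frac{1}{2}\tau\right)} - \dfrac{1}{\sin^2\left(\pi\frac{1}{2}\tau\right)} + A(\tau)}{1 + B(\tau)} \\
&= \frac{\dfrac{2}{\cos^2\left(\pi\left(\frac{1}{2}\right)\tau\right)} - \dfrac{2}{\sin^2\left(\pi\left(\frac{1}{2}\right)\tau\right)} + A(\tau)}{1 + B(\tau)} \tag{25.30}
\end{align*}
where $A(\tau), B(\tau) \to 0$ as $\operatorname{Im}(\tau) \to \infty$. I took out the $m = 0$ term involving
$1/\cos^2(\pi m\tau)$ in the denominator and the $m = -1$ and $m = 0$ terms in the
numerator of 25.29. In fact, $e^{-i\pi(a+ib)}A(a + ib)$, $e^{-i\pi(a+ib)}B(a + ib)$ converge to zero
uniformly in $a$ as $b \to \infty$.
Lemma 25.20 For $A, B$ defined in 25.30, $e^{-i\pi(a+ib)}C(a + ib) \to 0$ uniformly in $a$
for $C = A, B$.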
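Proof: From 25.23,
$$e^{-i\pi\tau}A(\tau) = \sum_{m \neq 0,\, m \neq -1}\left[\frac{e^{-i\pi\tau}}{\cos^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)} - \frac{e^{-i\pi\tau}}{\sin^2\left(\pi\left(m + \frac{1}{2}\right)\tau\right)}\right].$$
Now let $\tau = a + ib$. Then letting $\alpha_m = \pi\left(m + \frac{1}{2}\right)$,
\begin{align*}
\cos(\alpha_m a + i\alpha_m b) &= \cos(\alpha_m a)\cosh(\alpha_m b) - i\sinh(\alpha_m b)\sin(\alpha_m a) \\
\sin(\alpha_m a + i\alpha_m b) &= \sin(\alpha_m a)\cosh(\alpha_m b) + i\cos(\alpha_m a)\sinh(\alpha_m b)
\end{align*}
Therefore,
\begin{align*}
\left|\cos^2(\alpha_m a + i\alpha_m b)\right|
&= \cos^2(\alpha_m a)\cosh^2(\alpha_m b) + \sinh^2(\alpha_m b)\sin^2(\alpha_m a) \\
&\geq \sinh^2(\alpha_m b).
\end{align*}
Similarly,
\begin{align*}
\left|\sin^2(\alpha_m a + i\alpha_m b)\right|
&= \sin^2(\alpha_m a)\cosh^2(\alpha_m b) + \cos^2(\alpha_m a)\sinh^2(\alpha_m b) \\
&\geq \sinh^2(\alpha_m b).
\end{align*}
It follows that for $\tau = a + ib$ and $b$ large
\begin{align*}
\left|e^{-i\pi\tau}A(\tau)\right|
&\leq \sum_{m \neq 0,\, m \neq -1}\frac{2e^{\pi b}}{\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)} \\
&\leq \sum_{m=1}^{\infty}\frac{2e^{\pi b}}{\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)} + \sum_{m=-\infty}^{-2}\frac{2e^{\pi b}}{\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)} \\
&= 2\sum_{m=1}^{\infty}\frac{2e^{\pi b}}{\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)} = 4\sum_{m=1}^{\infty}\frac{e^{\pi b}}{\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)}.
\end{align*}
Now a short computation, using $\sinh(x + c) \geq e^{c}\sinh(x)$ for $x, c > 0$, shows
$$\frac{e^{\pi b}\big/\sinh^2\left(\pi\left(m + 1 + \frac{1}{2}\right)b\right)}{e^{\pi b}\big/\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)}
= \frac{\sinh^2\left(\pi\left(m + \frac{1}{2}\right)b\right)}{\sinh^2\left(\pi\left(m + \frac{3}{2}\right)b\right)} \leq \frac{1}{e^{2\pi b}}.$$
Therefore, for $\tau = a + ib$,
$$\left|e^{-i\pi\tau}A(\tau)\right| \leq 4\frac{e^{\pi b}}{\sinh^2\left(\frac{3\pi b}{2}\right)}\sum_{m=0}^{\infty}\left(\frac{1}{e^{2\pi b}}\right)^m
= 4\frac{e^{\pi b}}{\sinh^2\left(\frac{3\pi b}{2}\right)}\cdot\frac{1}{1 - e^{-2\pi b}}$$
which converges to zero as $b \to \infty$. Similar reasoning will establish the claim about
$B(\tau)$. This proves the lemma.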
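Lemma 25.21 $\lim_{b \to \infty}\lambda(a + ib)e^{-i\pi(a+ib)} = 16$ uniformly in $a \in \mathbb{R}$.
Proof: From 25.30 and Lemma 25.20, this lemma will be proved if it is shown
$$\lim_{b \to \infty}\left(\frac{2}{\cos^2\left(\pi\left(\frac{1}{2}\right)(a + ib)\right)} - \frac{2}{\sin^2\left(\pi\left(\frac{1}{2}\right)(a + ib)\right)}\right)e^{-i\pi(a+ib)} = 16$$
uniformly in $a \in \mathbb{R}$. Let $\tau = a + ib$ to simplify the notation. Then the above
expression equals
\begin{align*}
&\left(\frac{8}{\left(e^{i\frac{\pi}{2}\tau} + e^{-i\frac{\pi}{2}\tau}\right)^2} + \frac{8}{\left(e^{i\frac{\pi}{2}\tau} - e^{-i\frac{\pi}{2}\tau}\right)^2}\right)e^{-i\pi\tau} \\
&\quad = \left(\frac{8e^{i\pi\tau}}{\left(e^{i\pi\tau} + 1\right)^2} + \frac{8e^{i\pi\tau}}{\left(e^{i\pi\tau} - 1\right)^2}\right)e^{-i\pi\tau} \\
&\quad = \frac{8}{\left(e^{i\pi\tau} + 1\right)^2} + \frac{8}{\left(e^{i\pi\tau} - 1\right)^2}
= 16\frac{1 + e^{2\pi i\tau}}{\left(1 - e^{2\pi i\tau}\right)^2}.
\end{align*}
Now
\begin{align*}
\left|\frac{1 + e^{2\pi i\tau}}{\left(1 - e^{2\pi i\tau}\right)^2} - 1\right|
&= \left|\frac{1 + e^{2\pi i\tau}}{\left(1 - e^{2\pi i\tau}\right)^2} - \frac{\left(1 - e^{2\pi i\tau}\right)^2}{\left(1 - e^{2\pi i\tau}\right)^2}\right| \\
&\leq \frac{\left|3e^{2\pi i\tau} - e^{4\pi i\tau}\right|}{\left(1 - e^{-2\pi b}\right)^2}
\leq \frac{3e^{-2\pi b} + e^{-4\pi b}}{\left(1 - e^{-2\pi b}\right)^2}
\end{align*}
and this estimate proves the lemma.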
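The limit in Lemma 25.21 can be observed numerically. The sketch below is illustrative only: `lam` is a hypothetical helper truncating 25.23 to $|m| \leq M$, and the sampled heights $b$ and real parts $a$ are arbitrary. For each height it records the largest deviation of $\lambda(a + ib)e^{-i\pi(a+ib)}$ from 16 over the sampled real parts.

```python
import cmath


def lam(tau, M=10):
    """Formula 25.23 with both sums truncated to m = -M, ..., M."""
    pi = cmath.pi
    num = 0
    den = 0
    for m in range(-M, M + 1):
        s = 1 / cmath.sin(pi * (m + 0.5) * tau) ** 2
        num += 1 / cmath.cos(pi * (m + 0.5) * tau) ** 2 - s
        den += 1 / cmath.cos(pi * m * tau) ** 2 - s
    return num / den


def scaled(tau):
    """The quantity lambda(tau) * exp(-i*pi*tau) from Lemma 25.21."""
    return lam(tau) * cmath.exp(-1j * cmath.pi * tau)


# Largest deviation from 16 on each horizontal line Im(tau) = b.
gaps = {b: max(abs(scaled(a + 1j * b) - 16) for a in (-0.7, 0.0, 0.4, 1.3))
        for b in (1.0, 2.0, 3.0)}
```

The deviations should shrink roughly like a multiple of $e^{-\pi b}$, which is consistent with the estimate at the end of the proof.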
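Corollary 25.22 $\lim_{b \to \infty}\lambda(a + ib) = 0$ uniformly in $a \in \mathbb{R}$. Also $\lambda(ib)$ for $b > 0$
is real and is between 0 and 1, $\lambda$ is real on the line $l_2$ and on the curve $C$, and
$\lim_{b \to 0+}\lambda(1 + ib) = -\infty$.
Proof: From Lemma 25.21,
$$\left|\lambda(a + ib)e^{-i\pi(a+ib)} - 16\right| < 1$$
for all $a$ provided $b$ is large enough. Therefore, for such $b$,
$$|\lambda(a + ib)| \leq 17e^{-\pi b}.$$
25.28 proves the assertion about $\lambda(bi)$ real.
By the first part, $\lim_{b \to \infty}|\lambda(ib)| = 0$. Now from 25.24
$$\lim_{b \to 0+}\lambda(ib) = \lim_{b \to 0+}\left(1 - \lambda\left(\frac{-1}{ib}\right)\right) = \lim_{b \to 0+}\left(1 - \lambda\left(\frac{i}{b}\right)\right) = 1 \tag{25.31}$$
by the first part of Corollary 25.22.
Next consider the behavior of $\lambda$ on line $l_2$ in the above picture. From 25.17 and
25.28,
$$\lambda(1 + ib) = \frac{\lambda(ib)}{\lambda(ib) - 1} < 0$$
and so as $b \to 0+$ in the above, $\lambda(1 + ib) \to -\infty$.
It is left as an exercise to show that the map $\tau \to 1 - \frac{1}{\tau}$ maps $l_2$ onto the curve
$C$. Therefore, by 25.25, for $\tau \in l_2$,
\begin{align*}
\lambda\left(1 - \frac{1}{\tau}\right) &= \frac{\lambda\left(\frac{-1}{\tau}\right)}{\lambda\left(\frac{-1}{\tau}\right) - 1} \tag{25.32} \\
&= \frac{1 - \lambda(\tau)}{(1 - \lambda(\tau)) - 1} = \frac{\lambda(\tau) - 1}{\lambda(\tau)} \in \mathbb{R}. \tag{25.33}
\end{align*}
It follows $\lambda$ is real on the boundary of $\Omega$ in the above picture. This proves the
corollary.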
Now, following Ahlfors [2], cut off $\Omega$ by considering the horizontal line segment
$z = a + ib_0$ where $b_0$ is very large and positive and $a \in [0, 1]$. Also cut $\Omega$ off
by the images of this horizontal line under the transformations $z = \frac{-1}{\tau}$ and
$z = 1 - \frac{1}{\tau}$. These are arcs of circles because the two transformations are fractional
linear transformations. It is left as an exercise for you to verify these arcs are situated as
shown in the following picture. The important thing to notice is that for $b_0$ large the
points of these circles are close to the origin and $(1, 0)$ respectively. The following
picture is a summary of what has been obtained so far on the mapping by $\lambda$.
[Figure: the truncated region $\Omega$, bounded by $l_1$, $l_2$, the segment $z = a + ib_0$, the semicircle $C$, and the small arcs $C_1$ (near 0) and $C_2$ (near 1), with $\frac{1}{2}$ and $1$ marked on the real axis. Annotations for $\lambda : z \to \lambda(z)$ on the boundary: large, real, negative and small, real, negative along $l_2$; real, small, positive and near 1 and real along $l_1$.]
In the picture, the descriptions are of $\lambda$ acting on points of the indicated boundary
of $\Omega$. Consider the oriented contour which results from $\lambda(z)$ as $z$ moves first up
$l_2$ as indicated, then along the line $z = a + ib_0$ and then down $l_1$ and then along $C_1$
to $C$ and along $C$ till $C_2$ and then along $C_2$ to $l_2$. As indicated in the picture, this
involves going from a large negative real number to a small negative real number
and then over a smooth curve which stays small to a real positive number and from
there to a real number near 1. $\lambda(z)$ stays fairly near 1 on $C_1$ provided $b_0$ is large
so that the circle $C_1$ has very small radius. Then along $C$, $\lambda(z)$ is real until it hits
$C_2$. What about the behavior of $\lambda$ on $C_2$? For $z \in C_2$, it follows from the definition
of $C_2$ that $z = 1 - \frac{1}{\tau}$ where $\tau$ is on the line $a + ib_0$. Therefore, by Lemma 25.21,
25.17, and 25.24
\begin{align*}
\lambda(z) = \lambda\left(1 - \frac{1}{\tau}\right)
&= \frac{\lambda\left(\frac{-1}{\tau}\right)}{\lambda\left(\frac{-1}{\tau}\right) - 1}
= \frac{\lambda\left(\frac{1}{\tau}\right)}{\lambda\left(\frac{1}{\tau}\right) - 1} \\
&= \frac{1 - \lambda(\tau)}{(1 - \lambda(\tau)) - 1} = \frac{\lambda(\tau) - 1}{\lambda(\tau)} = 1 - \frac{1}{\lambda(\tau)}
\end{align*}
which is approximately equal to
$$1 - \frac{1}{16e^{i\pi(a + ib_0)}} = 1 - \frac{e^{\pi b_0}e^{-ia\pi}}{16}.$$
These points are essentially on a large half circle in the upper half plane which has
radius approximately $\frac{e^{\pi b_0}}{16}$.
Now let $w \in \mathbb{C}$ with $\operatorname{Im}(w) \neq 0$. Then for $b_0$ large enough, the motion over the
boundary of the truncated region indicated in the above picture results in tracing
out a large simple closed curve oriented in the counter clockwise direction which
includes $w$ on its interior if $\operatorname{Im}(w) > 0$ but which excludes $w$ if $\operatorname{Im}(w) < 0$.
Theorem 25.23 Let $\Omega$ be the domain described above. Then $\lambda$ maps $\Omega$ one to one
and onto the upper half plane of $\mathbb{C}$, $\{z \in \mathbb{C} \text{ such that } \operatorname{Im}(z) > 0\}$. Also, the line
$\lambda(l_1) = (0, 1)$, $\lambda(l_2) = (-\infty, 0)$, and $\lambda(C) = (1, \infty)$.
Proof: Let $\operatorname{Im}(w) > 0$ and denote by $\gamma$ the oriented contour described above
and illustrated in the above picture. Then the winding number of $\lambda \circ \gamma$ about $w$
equals 1. Thus
$$\frac{1}{2\pi i}\int_{\lambda \circ \gamma}\frac{1}{z - w}\,dz = 1.$$
But, splitting the contour integrals into $l_2$, the top line, $l_1$, $C_1$, $C$, and $C_2$ and changing
variables on each of these, yields
$$1 = \frac{1}{2\pi i}\int_{\gamma}\frac{\lambda'(z)}{\lambda(z) - w}\,dz$$
and by the theorem on counting zeros, Theorem 20.20 on Page 502, the function
$z \to \lambda(z) - w$ has exactly one zero inside the truncated $\Omega$. However, this shows
this function has exactly one zero inside $\Omega$ because $b_0$ was arbitrary as long as it
is sufficiently large. Since $w$ was an arbitrary element of the upper half plane, this
verifies the first assertion of the theorem. The remaining claims follow from the
above description of $\lambda$, in particular the estimate for $\lambda$ on $C_2$. This proves the
theorem.
Note also that the argument in the above proof shows that if $\operatorname{Im}(w) < 0$, then
$w$ is not in $\lambda(\Omega)$. However, if you consider the reflection of $\Omega$ about the $y$ axis,
then it will follow that $\lambda$ maps this set one to one onto the lower half plane. The
argument will make significant use of Theorem 20.22 on Page 504 which is stated
here for convenience.
Theorem 25.24 Let $f : B(a, R) \to \mathbb{C}$ be analytic and let
$$f(z) - \alpha = (z - a)^m g(z), \quad \infty > m \geq 1$$
where $g(z) \neq 0$ in $B(a, R)$. ($f(z) - \alpha$ has a zero of order $m$ at $z = a$.) Then there
exist $\varepsilon, \delta > 0$ with the property that for each $z$ satisfying $0 < |z - \alpha| < \delta$, there exist
points
$$\{a_1, \dots, a_m\} \subseteq B(a, \varepsilon),$$
such that
$$f^{-1}(z) \cap B(a, \varepsilon) = \{a_1, \dots, a_m\}$$
and each $a_k$ is a zero of order 1 for the function $f(\cdot) - z$.
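Corollary 25.25 Let $\Omega$ be the region above and let $\Omega'$ be its reflection about the
$y$ axis. Consider the set of points $Q = \overline{\Omega} \cup \Omega' \setminus \{0, 1\}$ described by the
following picture.
[Figure: the set $Q$, with $l_1$, $l_2$, $C$ and the points $\frac{1}{2}$ and $1$ labeled.]
Then $\lambda(Q) = \mathbb{C} \setminus \{0, 1\}$. Also $\lambda'(z) \neq 0$ for every $z$ in
$\cup_{k=-\infty}^{\infty}(Q + 2k) \equiv H$.
Proof: By Theorem 25.23, this will be proved if it can be shown that
$\lambda(\Omega') = \{z \in \mathbb{C} : \operatorname{Im}(z) < 0\}$. Consider $\lambda_1$ defined on $\Omega'$ by
$$\lambda_1(x + iy) \equiv \overline{\lambda(-x + iy)}.$$
Claim: $\lambda_1$ is analytic.
Proof of the claim: You just verify the Cauchy Riemann equations. Letting
$\lambda(x + iy) = u(x, y) + iv(x, y)$,
\begin{align*}
\lambda_1(x + iy) &= u(-x, y) - iv(-x, y) \\
&\equiv u_1(x, y) + iv_1(x, y).
\end{align*}
Then $u_{1x}(x, y) = -u_x(-x, y)$ and $v_{1y}(x, y) = -v_y(-x, y) = -u_x(-x, y)$ since
$\lambda$ is analytic. Thus $u_{1x} = v_{1y}$. Next, $u_{1y}(x, y) = u_y(-x, y)$ and
$v_{1x}(x, y) = v_x(-x, y) = -u_y(-x, y)$ and so $u_{1y} = -v_{1x}$.
Now recall that on $l_1$, $\lambda$ takes real values. Therefore, $\lambda_1 = \lambda$ on $l_1$, a set with
a limit point. It follows $\lambda = \lambda_1$ on $\Omega' \cup \Omega$. By Theorem 25.23 $\lambda$ maps $\Omega$ one to
one onto the upper half plane. Therefore, from the definition of $\lambda_1 = \lambda$, it follows
$\lambda$ maps $\Omega'$ one to one onto the lower half plane as claimed. This has shown that $\lambda$
is one to one on $\Omega' \cup \Omega$. This also verifies from Theorem 20.22 on Page 504 that
$\lambda' \neq 0$ on $\Omega' \cup \Omega$.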
Now consider the lines l
2
and C. If
(z) = 0 for z l
2
, a contradiction can
be obtained. Pick such a point. If
(z) = 0, then z is a zero of order m 2 of

the function, (z) . Then by Theorem 20.22 there exist , > 0 such that if
w B((z) , ) , then
1
(w) B(z, ) contains at least m points.
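[Figure: the set $Q$ with a point $z$ on $l_2$, the ball $B(z, \varepsilon)$ containing a nearby point $z_1$, and the image ball $B(\lambda(z), \delta)$ containing $\lambda(z_1)$.]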
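In particular, for $z_1 \in \Omega \cap B(z, \varepsilon)$ sufficiently close to $z$, $\lambda(z_1) \in B(\lambda(z), \delta)$ and so the function $\lambda - \lambda(z_1)$ has at least two distinct zeros. These zeros must be in $B(z, \varepsilon) \cap \Omega$ because $\lambda(z_1)$ has positive imaginary part and the points on $l_2$ are mapped by $\lambda$ to a real number while the points of $B(z, \varepsilon) \setminus \Omega$ are mapped by $\lambda$ to the lower half plane thanks to the relation, $\lambda(z + 2) = \lambda(z)$. This contradicts $\lambda$ one to one on $\Omega$. Therefore, $\lambda' \neq 0$ on $l_2$. Consider $C$. Points on $C$ are of the form $1 - \frac{1}{\tau}$ where $\tau \in l_2$. Therefore, using 25.33,
$$\lambda\left(1 - \frac{1}{\tau}\right) = \frac{\lambda(\tau) - 1}{\lambda(\tau)}.$$
Taking the derivative of both sides,
$$\lambda'\left(1 - \frac{1}{\tau}\right)\left(\frac{1}{\tau^2}\right) = \frac{\lambda'(\tau)}{\lambda(\tau)^2} \neq 0.$$
Since $\lambda$ is periodic of period 2 it follows $\lambda'(z) \neq 0$ for all $z \in \cup_{k=-\infty}^{\infty} (Q + 2k)$.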
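Lemma 25.26 If $\mathrm{Im}(\tau) > 0$ then there exists a unimodular $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ such that
$$\frac{c + d\tau}{a + b\tau}$$
is contained in the interior of $Q$. In fact, $\left|\frac{c + d\tau}{a + b\tau}\right| \geq 1$ and
$$-1/2 \leq \mathrm{Re}\left(\frac{c + d\tau}{a + b\tau}\right) \leq 1/2.$$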
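Proof: Letting a basis for the module of periods of $\wp$ be $\{1, \tau\}$, it follows from Theorem 25.3 on Page 662 that there exists a basis for the same module of periods, $\{w_1', w_2'\}$ with the property that for $\tau' = w_2'/w_1'$,
$$|\tau'| \geq 1, \quad -\frac{1}{2} \leq \mathrm{Re}\,\tau' \leq \frac{1}{2}.$$
Since this is a basis for the same module of periods, there exists a unimodular matrix, $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ such that
$$\begin{pmatrix} w_1' \\ w_2' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} 1 \\ \tau \end{pmatrix}.$$
Hence,
$$\tau' = \frac{w_2'}{w_1'} = \frac{c + d\tau}{a + b\tau}.$$
Thus $\tau'$ is in the interior of $H$. In fact, it is on the interior of $Q$.
[Figure: the region $|\tau'| \geq 1$, $-1/2 \leq \mathrm{Re}\,\tau' \leq 1/2$ in the upper half plane.]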
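The two estimates in the lemma can be reached by a concrete reduction procedure. The sketch below is not the argument of Theorem 25.3; it assumes the familiar reduction by integer translations $\tau \to \tau - n$ and the inversion $\tau \to -1/\tau$, and it tracks integers $a, b, c, d$ so that the current point always equals $(c + d\tau)/(a + b\tau)$. The name `reduce_tau` and the tolerance are illustrative choices.

```python
def reduce_tau(tau, tol=1e-12, max_steps=10_000):
    """Return (t, (a, b, c, d)) with t = (c + d*tau)/(a + b*tau),
    |t| >= 1 and |Re t| <= 1/2 up to rounding, and a*d - b*c == 1."""
    if tau.imag <= 0:
        raise ValueError("tau must lie in the upper half plane")
    a, b, c, d = 1, 0, 0, 1
    t = tau
    for _ in range(max_steps):
        n = round(t.real)          # translate so that |Re t| <= 1/2
        t -= n
        c -= n * a
        d -= n * b
        if abs(t) >= 1 - tol:
            return t, (a, b, c, d)
        t = -1 / t                 # inversion raises the imaginary part
        a, b, c, d = c, d, -a, -b
    raise RuntimeError("reduction did not terminate")

tau = 0.3 + 0.1j
t, (a, b, c, d) = reduce_tau(tau)
```

Each step multiplies the integer matrix by one of determinant 1, so the result is unimodular. The sketch only delivers the two inequalities stated after "In fact"; boundary cases are handled no more carefully than the tolerance allows.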
25.1.7 A Short Review And Summary

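With this lemma, it is easy to extend Corollary 25.25. First, a simple observation and review is a good idea. Recall that when you change the basis for the module of periods, the Weierstrass $\wp$ function does not change and so the set of $e_i$ used in defining $\lambda$ also does not change. Letting the new basis be $\{w_1', w_2'\}$, it was shown that
$$\begin{pmatrix} w_1' \\ w_2' \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$$
for some unimodular transformation, $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Letting $\tau = w_2/w_1$ and $\tau' = w_2'/w_1'$,
$$\tau' = \frac{c + d\tau}{a + b\tau} \equiv \phi(\tau).$$
Now as discussed earlier
$$\lambda(\tau') = \lambda(\phi(\tau)) \equiv \frac{\wp\left(\frac{w_1' + w_2'}{2}\right) - \wp\left(\frac{w_2'}{2}\right)}{\wp\left(\frac{w_1'}{2}\right) - \wp\left(\frac{w_2'}{2}\right)} = \frac{\wp\left(\frac{1 + \tau'}{2}\right) - \wp\left(\frac{\tau'}{2}\right)}{\wp\left(\frac{1}{2}\right) - \wp\left(\frac{\tau'}{2}\right)}.$$
These numbers in the above fraction must be the same as $\wp\left(\frac{1 + \tau}{2}\right)$, $\wp\left(\frac{\tau}{2}\right)$, and $\wp\left(\frac{1}{2}\right)$ but they might occur differently. This is because $\wp$ does not change and these numbers are the zeros of a polynomial having coefficients involving only numbers and $\wp(z)$. It could happen for example that $\wp\left(\frac{1 + \tau'}{2}\right) = \wp\left(\frac{\tau}{2}\right)$ in which case this would change the value of $\lambda$. In effect, you can keep track of all possibilities by simply permuting the $e_i$ in the formula for $\lambda(\tau)$ given by $\frac{e_3 - e_2}{e_1 - e_2}$. Thus consider the following permutation table.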
1 2 3
2 3 1
3 1 2
2 1 3
1 3 2
3 2 1
.
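Corresponding to this list of 6 permutations, all possible formulas for $\lambda(\phi(\tau))$ can be obtained as follows. Letting $\tau' = \phi(\tau)$ where $\phi$ is a unimodular matrix corresponding to a change of basis,
$$\lambda(\tau') = \frac{e_3 - e_2}{e_1 - e_2} = \lambda(\tau) \quad (25.34)$$
$$\lambda(\tau') = \frac{e_1 - e_3}{e_2 - e_3} = \frac{e_3 - e_2 + e_2 - e_1}{e_3 - e_2} = 1 - \frac{1}{\lambda(\tau)} = \frac{\lambda(\tau) - 1}{\lambda(\tau)} \quad (25.35)$$
$$\lambda(\tau') = \frac{e_2 - e_1}{e_3 - e_1} = -\left[\frac{e_3 - e_2 - (e_1 - e_2)}{e_1 - e_2}\right]^{-1} = -[\lambda(\tau) - 1]^{-1} = \frac{1}{1 - \lambda(\tau)} \quad (25.36)$$
$$\lambda(\tau') = \frac{e_3 - e_1}{e_2 - e_1} = -\left[\frac{e_3 - e_2 - (e_1 - e_2)}{e_1 - e_2}\right] = -[\lambda(\tau) - 1] = 1 - \lambda(\tau) \quad (25.37)$$
$$\lambda(\tau') = \frac{e_2 - e_3}{e_1 - e_3} = \frac{e_3 - e_2}{e_3 - e_2 - (e_1 - e_2)} = \frac{1}{1 - \frac{1}{\lambda(\tau)}} = \frac{\lambda(\tau)}{\lambda(\tau) - 1} \quad (25.38)$$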
$$\lambda(\tau') = \frac{e_1 - e_2}{e_3 - e_2} = \frac{1}{\lambda(\tau)} \quad (25.39)$$
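The six formulas can be checked with exact arithmetic. The sketch below picks three distinct rational numbers as hypothetical values of $e_1, e_2, e_3$ (they are not computed from any actual $\wp$ function), evaluates $\frac{e_3 - e_2}{e_1 - e_2}$ for every permutation of them, and compares the resulting set with the six expressions in $\lambda$ listed in 25.34 - 25.39.

```python
from fractions import Fraction
from itertools import permutations

def lam(e1, e2, e3):
    """The quotient (e3 - e2)/(e1 - e2) used to define lambda."""
    return (e3 - e2) / (e1 - e2)

# Hypothetical distinct values standing in for e_1, e_2, e_3.
e = (Fraction(2), Fraction(-5), Fraction(3))
L = lam(*e)

# The six expressions on the right of 25.34 - 25.39.
expected = {L, (L - 1) / L, 1 / (1 - L), 1 - L, L / (L - 1), 1 / L}

# The quotient evaluated on all six orderings of the e_i.
values = {lam(*p) for p in permutations(e)}
```

Because `Fraction` arithmetic is exact, equality of the two sets is a genuine identity check for this sample rather than a floating point coincidence.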
Corollary 25.27 $\lambda'(\tau) \neq 0$ for all $\tau$ in the upper half plane, denoted by $P_+$.
Proof: Let $\tau \in P_+$. By Lemma 25.26 there exists $\phi$ a unimodular transformation and $\tau'$ in the interior of $Q$ such that $\tau' = \phi(\tau)$. Now from the definition of $\lambda$ in terms of the $e_i$, there is at worst a permutation of the $e_i$ and so it might be the case that $\lambda(\phi(\tau)) \neq \lambda(\tau)$ but it is the case that $\lambda(\phi(\tau)) = \xi(\lambda(\tau))$ where $\xi'(z) \neq 0$. Here $\xi$ is one of the functions determined by 25.34 - 25.39. (Since $\lambda(\tau) \notin \{0, 1\}$, $\xi'(\lambda(z)) \neq 0$. This follows from the possibilities for $\xi$ listed above in 25.34 - 25.39.) All the possibilities are
$$\xi(z) = z, \ \frac{z - 1}{z}, \ \frac{1}{1 - z}, \ 1 - z, \ \frac{z}{z - 1}, \ \frac{1}{z}$$
and these are the same as the possibilities for $\xi^{-1}$. Therefore, $\lambda'(\phi(\tau))\,\phi'(\tau) = \xi'(\lambda(\tau))\,\lambda'(\tau)$ and so $\lambda'(\tau) \neq 0$ as claimed.
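The remark that the six possibilities are also the possibilities for the inverse can be verified mechanically. In the sketch below the dictionary `XI` and the helper `inverse_name` are illustrative names. Since each map is a fractional linear transformation, a composition that fixes three distinct points is the identity, so testing three exact rational samples (chosen away from 0 and 1) is enough.

```python
from fractions import Fraction

# The six maps of 25.34 - 25.39, keyed by their formulas.
XI = {
    "z": lambda z: z,
    "(z-1)/z": lambda z: (z - 1) / z,
    "1/(1-z)": lambda z: 1 / (1 - z),
    "1-z": lambda z: 1 - z,
    "z/(z-1)": lambda z: z / (z - 1),
    "1/z": lambda z: 1 / z,
}

# Three sample points, none equal to 0 or 1.
SAMPLES = [Fraction(3), Fraction(-2, 5), Fraction(7, 3)]

def inverse_name(name):
    """Find which of the six maps undoes XI[name], or None if none does."""
    f = XI[name]
    for candidate, g in XI.items():
        if all(g(f(z)) == z for z in SAMPLES):
            return candidate
    return None

inverses = {name: inverse_name(name) for name in XI}
```

The maps $\frac{z-1}{z}$ and $\frac{1}{1-z}$ are inverse to each other, and the remaining four are their own inverses.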
Now I will present a lemma which is of major significance. It depends on the remarkable mapping properties of the modular function and the monodromy theorem from analytic continuation. A review of the monodromy theorem will be listed here for convenience. First recall the definition of the concept of function elements and analytic continuation.
Definition 25.28 A function element is an ordered pair, $(f, D)$ where $D$ is an open ball and $f$ is analytic on $D$. $(f_0, D_0)$ and $(f_1, D_1)$ are direct continuations of each other if $D_1 \cap D_0 \neq \emptyset$ and $f_0 = f_1$ on $D_1 \cap D_0$. In this case I will write $(f_0, D_0) \sim (f_1, D_1)$. A chain is a finite sequence of disks, $\{D_0, \cdots, D_n\}$ such that $D_{i-1} \cap D_i \neq \emptyset$. If $(f_0, D_0)$ is a given function element and there exist function elements, $(f_i, D_i)$ such that $\{D_0, \cdots, D_n\}$ is a chain and $(f_{j-1}, D_{j-1}) \sim (f_j, D_j)$ then $(f_n, D_n)$ is called the analytic continuation of $(f_0, D_0)$ along the chain $\{D_0, \cdots, D_n\}$. Now suppose $\gamma$ is an oriented curve with parameter interval $[a, b]$ and there exists a chain, $\{D_0, \cdots, D_n\}$ such that $\gamma^* \subseteq \cup_{k=1}^n D_k$, $\gamma(a)$ is the center of $D_0$, $\gamma(b)$ is the center of $D_n$, and there is an increasing list of numbers in $[a, b]$, $a = s_0 < s_1 < \cdots < s_n = b$ such that $\gamma([s_i, s_{i+1}]) \subseteq D_i$ and $(f_n, D_n)$ is an analytic continuation of $(f_0, D_0)$ along the chain. Then $(f_n, D_n)$ is called an analytic continuation of $(f_0, D_0)$ along the curve $\gamma$. ($\gamma$ will always be a continuous curve. Nothing more is needed.)
Then the main theorem is the monodromy theorem listed next, Theorem 22.19 and its corollary on Page 591.
Theorem 25.29 Let $\Omega$ be a simply connected subset of $\mathbb{C}$ and suppose $(f, B(a, r))$ is a function element with $B(a, r) \subseteq \Omega$. Suppose also that this function element can be analytically continued along every curve through $a$. Then there exists $G$ analytic on $\Omega$ such that $G$ agrees with $f$ on $B(a, r)$.
Here is the lemma.
Lemma 25.30 Let $\lambda$ be the modular function defined on $P_+$ the upper half plane. Let $V$ be a simply connected region in $\mathbb{C}$ and let $f : V \to \mathbb{C} \setminus \{0, 1\}$ be analytic and nonconstant. Then there exists an analytic function, $g : V \to P_+$ such that $\lambda \circ g = f$.
Proof: Let $a \in V$ and choose $r_0$ small enough that $f(B(a, r_0))$ contains neither 0 nor 1. You need only let $B(a, r_0) \subseteq V$. Now there exists a unique point in $Q$, $\tau_0$ such that $\lambda(\tau_0) = f(a)$. By Corollary 25.25, $\lambda'(\tau_0) \neq 0$ and so by the open mapping theorem, Theorem 20.22 on Page 504, there exists $B(\tau_0, R_0) \subseteq P_+$ such that $\lambda$ is one to one on $B(\tau_0, R_0)$ and has a continuous inverse. Then picking $r_0$ still smaller, it can be assumed $f(B(a, r_0)) \subseteq \lambda(B(\tau_0, R_0))$. Thus there exists a local inverse for $\lambda$, $\lambda_0^{-1}$ defined on $f(B(a, r_0))$ having values in $B(\tau_0, R_0) \cap \lambda^{-1}(f(B(a, r_0)))$. Then defining $g_0 \equiv \lambda_0^{-1} \circ f$, $(g_0, B(a, r_0))$ is a function element. I need to show this can be continued along every curve starting at $a$ in such a way that each function in each function element has values in $P_+$.
Let $\gamma : [\alpha, \beta] \to V$ be a continuous curve starting at $a$, ($\gamma(\alpha) = a$) and suppose that if $t < T$ there exists a nonnegative integer $m$ and a function element $(g_m, B(\gamma(t), r_m))$ which is an analytic continuation of $(g_0, B(a, r_0))$ along $\gamma$ where $g_m(\gamma(t)) \in P_+$ and each function in every function element for $j \leq m$ has values in $P_+$. Thus for some small $T > 0$ this has been achieved.
Then consider $f(\gamma(T)) \in \mathbb{C} \setminus \{0, 1\}$. As in the first part of the argument, there exists a unique $\tau_T \in Q$ such that $\lambda(\tau_T) = f(\gamma(T))$ and for $r$ small enough there is an analytic local inverse, $\lambda_T^{-1}$ between $f(B(\gamma(T), r))$ and $\lambda^{-1}(f(B(\gamma(T), r))) \cap B(\tau_T, R_T) \subseteq P_+$ for some $R_T > 0$. By the assumption that the analytic continuation can be carried out for $t < T$, there exists $\{t_0, \cdots, t_m = t\}$ and function elements $(g_j, B(\gamma(t_j), r_j))$, $j = 0, \cdots, m$ as just described with $g_j(\gamma(t_j)) \in P_+$, $\lambda \circ g_j = f$ on $B(\gamma(t_j), r_j)$ such that for $t \in [t_m, T]$, $\gamma(t) \in B(\gamma(T), r)$. Let
$$I = B(\gamma(t_m), r_m) \cap B(\gamma(T), r).$$
Then since $\lambda_T^{-1}$ is a local inverse, it follows for all $z \in I$
$$\lambda(g_m(z)) = f(z) = \lambda\left(\lambda_T^{-1} \circ f(z)\right).$$
Pick $z_0 \in I$. Then by Lemma 25.19 on Page 682 there exists a unimodular mapping of the form
$$\phi(z) = \frac{az + b}{cz + d}$$
where
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \sim \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \bmod 2$$
such that
$$g_m(z_0) = \phi\left(\lambda_T^{-1} \circ f(z_0)\right).$$
Since both $g_m(z_0)$ and $\phi\left(\lambda_T^{-1} \circ f(z_0)\right)$ are in the upper half plane, it follows $ad - cb = 1$ and $\phi$ maps the upper half plane to the upper half plane. Note the pole of $\phi$ is real and all the sets being considered are contained in the upper half plane so $\phi$ is analytic where it needs to be.
Claim: For all $z \in I$,
$$g_m(z) = \phi \circ \lambda_T^{-1} \circ f(z). \quad (25.40)$$
Proof: For $z = z_0$ the equation holds. Let
$$A = \left\{z \in I : g_m(z) = \phi\left(\lambda_T^{-1} \circ f(z)\right)\right\}.$$
Thus $z_0 \in A$. If $z \in A$ and if $w$ is close enough to $z$, then $w \in I$ also and so both sides of 25.40 with $w$ in place of $z$ are in $\lambda_m^{-1}(f(I))$. But by construction, $\lambda$ is one to one on this set and since $\lambda$ is invariant with respect to $\phi$,
$$\lambda(g_m(w)) = \lambda\left(\lambda_T^{-1} \circ f(w)\right) = \lambda\left(\phi \circ \lambda_T^{-1} \circ f(w)\right)$$
and consequently, $w \in A$. This shows $A$ is open. But $A$ is also closed in $I$ because the functions are continuous. Therefore, $A = I$ and so 25.40 is obtained.
Letting $f(z) \in f(B(\gamma(T), r))$,
$$\lambda\left(\phi\left(\lambda_T^{-1}(f(z))\right)\right) = \lambda\left(\lambda_T^{-1}(f(z))\right) = f(z)$$
and so $\phi \circ \lambda_T^{-1}$ is a local inverse for $\lambda$ on $f(B(\gamma(T), r))$. Let the new function element be $(g_{m+1}, B(\gamma(T), r))$ where $g_{m+1} \equiv \phi \circ \lambda_T^{-1} \circ f$. This has shown the initial function element can be continued along every curve through $a$.
By the monodromy theorem, there exists $g$ analytic on $V$ such that $g$ has values in $P_+$ and $g = g_0$ on $B(a, r_0)$. By the construction, it also follows $\lambda \circ g = f$. This last claim is easy to see because $\lambda \circ g = f$ on $B(a, r_0)$, a set with a limit point so the equation holds for all $z \in V$. This proves the lemma.
25.2 The Picard Theorem Again
Having done all this work on the modular function which is important for its own sake, there is an easy proof of the Picard theorem. In fact, this is the way Picard did it in 1879. I will state it slightly differently since it is no trouble to do so, [26].
Theorem 25.31 Let f be meromorphic on C and suppose f misses three distinct
points, a, b, c. Then f is a constant function.
Proof: Let $\phi(z) \equiv \frac{z - a}{z - c} \cdot \frac{b - c}{b - a}$. Then $\phi(c) = \infty$, $\phi(a) = 0$, and $\phi(b) = 1$. Now consider the function, $h = \phi \circ f$. Then $h$ misses the three points $\infty$, 0, and 1. Since $h$ is meromorphic and does not have $\infty$ in its values, it must actually be analytic. Thus $h$ is an entire function which misses the two values 0 and 1. If $h$ is not constant, then by Lemma 25.30 there exists a function, $g$ analytic on $\mathbb{C}$ which has values in the upper half plane, $P_+$ such that $\lambda \circ g = h$. However, $g$ must be a constant because there exists $\psi$ an analytic map on the upper half plane which maps the upper half plane to $B(0, 1)$. You can use the Riemann mapping theorem or more simply, $\psi(z) = \frac{z - i}{z + i}$. Thus $\psi \circ g$ equals a constant by Liouville's theorem. Hence $g$ is a constant and so $h$ must also be a constant because $\lambda(g(z)) = h(z)$. This proves $f$ is a constant also. This proves the theorem.
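The two explicit maps used in this proof are easy to test numerically. The sketch below uses arbitrary sample values for the three omitted points $a, b, c$ (they are illustrative assumptions, nothing in the notes fixes them) and checks that the first map sends them to 0, 1 and a very large value near the pole, and that the second map sends sample points of the upper half plane into the unit disk.

```python
def phi(z, a, b, c):
    """Fractional linear map with phi(a) = 0, phi(b) = 1 and a pole at c."""
    return (z - a) / (z - c) * (b - c) / (b - a)

def psi(z):
    """Map of the upper half plane into the unit disk B(0, 1)."""
    return (z - 1j) / (z + 1j)

# Hypothetical omitted values a, b, c (three distinct points).
a, b, c = 2 + 1j, -1 + 0j, 3j

at_a = phi(a, a, b, c)
at_b = phi(b, a, b, c)
near_c = phi(c + 1e-9, a, b, c)     # evaluated next to the pole, not at it

upper_half_plane_samples = [0.01j, 5 + 0.2j, -40 + 3j, 1000j]
moduli = [abs(psi(z)) for z in upper_half_plane_samples]
```

The bound $|\psi(z)| < 1$ for $\mathrm{Im}(z) > 0$ simply says $z$ is closer to $i$ than to $-i$, which is what makes Liouville's theorem applicable to $\psi \circ g$.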
25.3 Exercises
1. Show the set of modular transformations is a group. Also show those modular transformations which are congruent mod 2 to the identity as described above form a subgroup.
2. Suppose $f$ is an elliptic function with period module $M$. If $\{w_1, w_2\}$ and $\{w_1', w_2'\}$ are two bases, show that the period parallelograms resulting from the two bases have the same area.
3. Given a module of periods with basis $\{w_1, w_2\}$ and letting a typical element of this module be denoted by $w$ as described above, consider the product
$$\sigma(z) \equiv z \prod_{w \neq 0} \left(1 - \frac{z}{w}\right) e^{(z/w) + \frac{1}{2}(z/w)^2}.$$
Show this product converges uniformly on compact sets, is an entire function, and satisfies
$$\sigma'(z)/\sigma(z) = \zeta(z)$$
where $\zeta(z)$ was defined above as a primitive of $-\wp(z)$ and is given by
$$\zeta(z) = \frac{1}{z} + \sum_{w \neq 0} \left(\frac{1}{z - w} + \frac{z}{w^2} + \frac{1}{w}\right).$$
4. Show $\zeta(z + w_i) = \zeta(z) + \eta_i$ where $\eta_i$ is a constant.
5. Let $P_a$ be the parallelogram shown in the following picture.
[Figure: the parallelogram $P_a$, with the points $0$, $w_1$, $w_2$ and $a$ labeled.]
Show that $\frac{1}{2\pi i}\int_{\partial P_a} \zeta(z)\,dz = 1$ where the contour is taken once around the parallelogram in the counter clockwise direction. Next evaluate this contour integral directly to obtain Legendre's relation,
$$\eta_1 w_2 - \eta_2 w_1 = 2\pi i.$$
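6. For $\sigma$ defined in Problem 3, 4 explain the following steps. For $j = 1, 2$
$$\frac{\sigma'(z + w_j)}{\sigma(z + w_j)} = \zeta(z + w_j) = \zeta(z) + \eta_j = \frac{\sigma'(z)}{\sigma(z)} + \eta_j.$$
Therefore, there exists a constant, $C_j$ such that
$$\sigma(z + w_j) = C_j\,\sigma(z)\,e^{\eta_j z}.$$
Next show $\sigma$ is an odd function, ($\sigma(-z) = -\sigma(z)$) and then let $z = -w_j/2$ to find $C_j = -e^{\eta_j w_j / 2}$ and so
$$\sigma(z + w_j) = -\sigma(z)\,e^{\eta_j\left(z + \frac{w_j}{2}\right)}. \quad (25.41)$$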
7. Show any even elliptic function, $f$ with periods $w_1$ and $w_2$ for which 0 is neither a pole nor a zero can be expressed in the form
$$f(0) \prod_{k=1}^{n} \frac{\wp(z) - \wp(a_k)}{\wp(z) - \wp(b_k)}$$
where $C$ is some constant. Here $\wp$ is the Weierstrass function which comes from the two periods, $w_1$ and $w_2$. Hint: You might consider the above function in terms of the poles and zeros on a period parallelogram and recall that an entire function which is elliptic is a constant.
8. Suppose $f$ is any elliptic function with $\{w_1, w_2\}$ a basis for the module of periods. Using Theorem 25.9 and 25.41 show that there exist constants $a_1, \cdots, a_n$ and $b_1, \cdots, b_n$ such that for some constant $C$,
$$f(z) = C \prod_{k=1}^{n} \frac{\sigma(z - a_k)}{\sigma(z - b_k)}.$$
Hint: You might try something like this: By Theorem 25.9, it follows that if $\{\alpha_k\}$ are the zeros and $\{b_k\}$ the poles in an appropriate period parallelogram, $\sum \alpha_k - \sum b_k$ equals a period. Replace $\alpha_k$ with $a_k$ such that $\sum a_k - \sum b_k = 0$. Then use 25.41 to show that the given formula for $f$ is bi periodic. Anyway, you try to arrange things such that the given formula has the same poles as $f$. Remember an entire elliptic function equals a constant.
9. Show that the map $\tau \to 1 - \frac{1}{\tau}$ maps $l_2$ onto the curve, $C$ in the above picture on the mapping properties of $\lambda$.
10. Modify the proof of Theorem 25.23 to show that $\lambda(\Omega) \cap \{z \in \mathbb{C} : \mathrm{Im}(z) < 0\} = \emptyset$.
The Hausdorff Maximal Theorem
The purpose of this appendix is to prove the equivalence between the axiom of choice, the Hausdorff maximal theorem, and the well-ordering principle. The Hausdorff maximal theorem and the well-ordering principle are very useful but a little hard to believe; so, it may be surprising that they are equivalent to the axiom of choice. First it is shown that the axiom of choice implies the Hausdorff maximal theorem, a remarkable theorem about partially ordered sets.
A nonempty set is partially ordered if there exists a partial order, $\preceq$, satisfying
$$x \preceq x$$
and
$$\text{if } x \preceq y \text{ and } y \preceq z \text{ then } x \preceq z.$$
An example of a partially ordered set is the set of all subsets of a given set and $\preceq\ \equiv\ \subseteq$. Note that two elements in a partially ordered set may not be related. In other words, just because $x, y$ are in the partially ordered set, it does not follow that either $x \preceq y$ or $y \preceq x$. A subset of a partially ordered set, $\mathcal{C}$, is called a chain if $x, y \in \mathcal{C}$ implies that either $x \preceq y$ or $y \preceq x$. If either $x \preceq y$ or $y \preceq x$ then $x$ and $y$ are described as being comparable. A chain is also called a totally ordered set. $\mathcal{C}$ is a maximal chain if whenever $\tilde{\mathcal{C}}$ is a chain containing $\mathcal{C}$, it follows the two chains are equal. In other words $\mathcal{C}$ is a maximal chain if there is no strictly larger chain.
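Lemma A.1 Let $\mathcal{F}$ be a nonempty partially ordered set with partial order $\preceq$. Then assuming the axiom of choice, there exists a maximal chain in $\mathcal{F}$.
Proof: Let $\mathcal{X}$ be the set of all chains from $\mathcal{F}$. For $\mathcal{C} \in \mathcal{X}$, let
$$S_{\mathcal{C}} = \{x \in \mathcal{F} \text{ such that } \mathcal{C} \cup \{x\} \text{ is a chain strictly larger than } \mathcal{C}\}.$$
If $S_{\mathcal{C}} = \emptyset$ for any $\mathcal{C}$, then $\mathcal{C}$ is maximal. Thus, assume $S_{\mathcal{C}} \neq \emptyset$ for all $\mathcal{C} \in \mathcal{X}$. Let $f(\mathcal{C}) \in S_{\mathcal{C}}$. (This is where the axiom of choice is being used.) Let
$$g(\mathcal{C}) = \mathcal{C} \cup \{f(\mathcal{C})\}.$$
Thus $g(\mathcal{C}) \supsetneq \mathcal{C}$ and $g(\mathcal{C}) \setminus \mathcal{C} = \{f(\mathcal{C})\}$ = {a single element of $\mathcal{F}$}. A subset $\mathcal{T}$ of $\mathcal{X}$ is called a tower if
$$\emptyset \in \mathcal{T},$$
$$\mathcal{C} \in \mathcal{T} \text{ implies } g(\mathcal{C}) \in \mathcal{T},$$
and if $\mathcal{S} \subseteq \mathcal{T}$ is totally ordered with respect to set inclusion, then
$$\cup \mathcal{S} \in \mathcal{T}.$$
Here $\mathcal{S}$ is a chain with respect to set inclusion whose elements are chains.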
Note that $\mathcal{X}$ is a tower. Let $\mathcal{T}_0$ be the intersection of all towers. Thus, $\mathcal{T}_0$ is a tower, the smallest tower. Are any two sets in $\mathcal{T}_0$ comparable in the sense of set inclusion so that $\mathcal{T}_0$ is actually a chain? Let $\mathcal{C}_0$ be a set of $\mathcal{T}_0$ which is comparable to every set of $\mathcal{T}_0$. Such sets exist, $\emptyset$ being an example. Let
$$\mathcal{B} \equiv \{\mathcal{D} \in \mathcal{T}_0 : \mathcal{D} \supsetneq \mathcal{C}_0 \text{ and } f(\mathcal{C}_0) \notin \mathcal{D}\}.$$
The picture represents sets of $\mathcal{B}$. As illustrated in the picture, $\mathcal{D}$ is a set of $\mathcal{B}$ when $\mathcal{D}$ is larger than $\mathcal{C}_0$ but fails to be comparable to $g(\mathcal{C}_0)$. Thus there would be more than one chain ascending from $\mathcal{C}_0$ if $\mathcal{B} \neq \emptyset$, rather like a tree growing upward in more than one direction from a fork in the trunk. It will be shown this can't take place for any such $\mathcal{C}_0$ by showing $\mathcal{B} = \emptyset$.
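[Figure: $\mathcal{C}_0$ with two branches ascending from it, one toward $\mathcal{D}$ and one through $f(\mathcal{C}_0)$.]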
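This will be accomplished by showing $\tilde{\mathcal{T}}_0 \equiv \mathcal{T}_0 \setminus \mathcal{B}$ is a tower. Since $\mathcal{T}_0$ is the smallest tower, this will require that $\tilde{\mathcal{T}}_0 = \mathcal{T}_0$ and so $\mathcal{B} = \emptyset$.
Claim: $\tilde{\mathcal{T}}_0$ is a tower and so $\mathcal{B} = \emptyset$.
Proof of the claim: It is clear that $\emptyset \in \tilde{\mathcal{T}}_0$ because for $\emptyset$ to be contained in $\mathcal{B}$ it would be required to properly contain $\mathcal{C}_0$ which is not possible. Suppose $\mathcal{D} \in \tilde{\mathcal{T}}_0$. The plan is to verify $g(\mathcal{D}) \in \tilde{\mathcal{T}}_0$.
Case 1: $f(\mathcal{D}) \in \mathcal{C}_0$. If $\mathcal{D} \subseteq \mathcal{C}_0$, then since both $\mathcal{D}$ and $\{f(\mathcal{D})\}$ are contained in $\mathcal{C}_0$, it follows $g(\mathcal{D}) \subseteq \mathcal{C}_0$ and so $g(\mathcal{D}) \notin \mathcal{B}$. On the other hand, if $\mathcal{D} \supsetneq \mathcal{C}_0$, then since $\mathcal{D} \in \tilde{\mathcal{T}}_0$, $f(\mathcal{C}_0) \in \mathcal{D}$ and so $g(\mathcal{D})$ also contains $f(\mathcal{C}_0)$ implying $g(\mathcal{D}) \notin \mathcal{B}$. These are the only two cases to consider because $\mathcal{C}_0$ is comparable to every set of $\mathcal{T}_0$.
Case 2: $f(\mathcal{D}) \notin \mathcal{C}_0$. If $\mathcal{D} \subsetneq \mathcal{C}_0$ it can't be the case that $f(\mathcal{D}) \notin \mathcal{C}_0$ because if this were so, $g(\mathcal{D})$ would not compare to $\mathcal{C}_0$.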
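[Figure: $\mathcal{D}$ properly contained in $\mathcal{C}_0$, with $f(\mathcal{D})$ lying outside $\mathcal{C}_0$ and $f(\mathcal{C}_0)$ above it.]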
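Hence if $f(\mathcal{D}) \notin \mathcal{C}_0$, then $\mathcal{D} \supseteq \mathcal{C}_0$. If $\mathcal{D} = \mathcal{C}_0$, then $f(\mathcal{D}) = f(\mathcal{C}_0) \in g(\mathcal{D})$ so $g(\mathcal{D}) \notin \mathcal{B}$. Therefore, assume $\mathcal{D} \supsetneq \mathcal{C}_0$. Then, since $\mathcal{D}$ is in $\tilde{\mathcal{T}}_0$, $f(\mathcal{C}_0) \in \mathcal{D}$ and so $f(\mathcal{C}_0) \in g(\mathcal{D})$. Therefore, $g(\mathcal{D}) \in \tilde{\mathcal{T}}_0$.
Now suppose $\mathcal{S}$ is a totally ordered subset of $\tilde{\mathcal{T}}_0$ with respect to set inclusion. Then if every element of $\mathcal{S}$ is contained in $\mathcal{C}_0$, so is $\cup\mathcal{S}$ and so $\cup\mathcal{S} \in \tilde{\mathcal{T}}_0$. If, on the other hand, some chain from $\mathcal{S}$, $\mathcal{C}$, contains $\mathcal{C}_0$ properly, then since $\mathcal{C} \notin \mathcal{B}$, $f(\mathcal{C}_0) \in \mathcal{C} \subseteq \cup\mathcal{S}$ showing that $\cup\mathcal{S} \notin \mathcal{B}$ also. This has proved $\tilde{\mathcal{T}}_0$ is a tower and since $\mathcal{T}_0$ is the smallest tower, it follows $\tilde{\mathcal{T}}_0 = \mathcal{T}_0$. This has shown roughly that no splitting into more than one ascending chain can occur at any $\mathcal{C}_0$ which is comparable to every set of $\mathcal{T}_0$. Next it is shown that every element of $\mathcal{T}_0$ has the property that it is comparable to all other elements of $\mathcal{T}_0$. This is done by showing that the elements which possess this property form a tower.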
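Define $\mathcal{T}_1$ to be the set of all elements of $\mathcal{T}_0$ which are comparable to every element of $\mathcal{T}_0$. (Recall the elements of $\mathcal{T}_0$ are chains from the original partial order.)
Claim: $\mathcal{T}_1$ is a tower.
Proof of the claim: It is clear that $\emptyset \in \mathcal{T}_1$ because $\emptyset$ is a subset of every set. Suppose $\mathcal{C}_0 \in \mathcal{T}_1$. It is necessary to verify that $g(\mathcal{C}_0) \in \mathcal{T}_1$. Let $\mathcal{D} \in \mathcal{T}_0$ (Thus $\mathcal{D} \subseteq \mathcal{C}_0$ or else $\mathcal{D} \supsetneq \mathcal{C}_0$.) and consider $g(\mathcal{C}_0) \equiv \mathcal{C}_0 \cup \{f(\mathcal{C}_0)\}$. If $\mathcal{D} \subseteq \mathcal{C}_0$, then $\mathcal{D} \subseteq g(\mathcal{C}_0)$ so $g(\mathcal{C}_0)$ is comparable to $\mathcal{D}$. If $\mathcal{D} \supsetneq \mathcal{C}_0$, then $\mathcal{D} \supseteq g(\mathcal{C}_0)$ by what was just shown ($\mathcal{B} = \emptyset$). Hence $g(\mathcal{C}_0)$ is comparable to $\mathcal{D}$. Since $\mathcal{D}$ was arbitrary, it follows $g(\mathcal{C}_0)$ is comparable to every set of $\mathcal{T}_0$. Now suppose $\mathcal{S}$ is a chain of elements of $\mathcal{T}_1$ and let $\mathcal{D}$ be an element of $\mathcal{T}_0$. If every element in the chain, $\mathcal{S}$ is contained in $\mathcal{D}$, then $\cup\mathcal{S}$ is also contained in $\mathcal{D}$. On the other hand, if some set, $\mathcal{C}$, from $\mathcal{S}$ contains $\mathcal{D}$ properly, then $\cup\mathcal{S}$ also contains $\mathcal{D}$. Thus $\cup\mathcal{S} \in \mathcal{T}_1$ since it is comparable to every $\mathcal{D} \in \mathcal{T}_0$.
This shows $\mathcal{T}_1$ is a tower and proves therefore, that $\mathcal{T}_0 = \mathcal{T}_1$. Thus every set of $\mathcal{T}_0$ compares with every other set of $\mathcal{T}_0$ showing $\mathcal{T}_0$ is a chain in addition to being a tower.
Now $\cup\mathcal{T}_0,\ g(\cup\mathcal{T}_0) \in \mathcal{T}_0$. Hence, because $g(\cup\mathcal{T}_0)$ is an element of $\mathcal{T}_0$, and $\mathcal{T}_0$ is a chain of these, it follows $g(\cup\mathcal{T}_0) \subseteq \cup\mathcal{T}_0$. Thus
$$\cup\mathcal{T}_0 \supseteq g(\cup\mathcal{T}_0) \supsetneq \cup\mathcal{T}_0,$$
a contradiction. Hence there must exist a maximal chain after all. This proves the lemma.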
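In a finite partially ordered set none of the tower machinery is needed, and the map $g(\mathcal{C}) = \mathcal{C} \cup \{f(\mathcal{C})\}$ can simply be iterated starting from the empty chain until $S_{\mathcal{C}}$ is empty. The sketch below is only this finite illustration, not the argument of the lemma. The name `maximal_chain` is illustrative, and the rule "take the first available element" plays the role of the choice function $f$.

```python
from itertools import combinations

def maximal_chain(elements, leq):
    """Grow a chain by repeatedly adjoining one element of S_C until S_C is empty."""
    chain = []
    while True:
        # S_C: elements outside the chain that are comparable to everything in it.
        s_c = [x for x in elements
               if x not in chain and all(leq(x, y) or leq(y, x) for y in chain)]
        if not s_c:
            return chain            # no strictly larger chain exists
        chain.append(s_c[0])        # f(C): one chosen element of S_C

# All subsets of {1, 2, 3}, partially ordered by inclusion.
subsets = [frozenset(s) for r in range(4) for s in combinations((1, 2, 3), r)]
chain = maximal_chain(subsets, lambda x, y: x <= y)
```

For an infinite set the iteration never has to stop, which is exactly why the proof above works with towers instead.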
If $X$ is a nonempty set, $\leq$ is an order on $X$ if
$$x \leq x,$$
and if $x, y \in X$, then
$$\text{either } x \leq y \text{ or } y \leq x$$
and
$$\text{if } x \leq y \text{ and } y \leq z \text{ then } x \leq z.$$
$\leq$ is a well order and say that $(X, \leq)$ is a well-ordered set if every nonempty subset of $X$ has a smallest element. More precisely, if $S \neq \emptyset$ and $S \subseteq X$ then there exists an $x \in S$ such that $x \leq y$ for all $y \in S$. A familiar example of a well-ordered set is the natural numbers.
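Lemma A.2 The Hausdorff maximal principle implies every nonempty set can be well-ordered.
Proof: Let $X$ be a nonempty set and let $a \in X$. Then $\{a\}$ is a well-ordered subset of $X$. Let
$$\mathcal{F} = \{S \subseteq X : \text{there exists a well order for } S\}.$$
Thus $\mathcal{F} \neq \emptyset$. For $S_1, S_2 \in \mathcal{F}$, define $S_1 \preceq S_2$ if $S_1 \subseteq S_2$ and there exists a well order for $S_2$, $\leq_2$ such that
$$(S_2, \leq_2) \text{ is well-ordered}$$
and if
$$y \in S_2 \setminus S_1 \text{ then } x \leq_2 y \text{ for all } x \in S_1,$$
and if $\leq_1$ is the well order of $S_1$ then the two orders are consistent on $S_1$. Then observe that $\preceq$ is a partial order on $\mathcal{F}$. By the Hausdorff maximal principle, let $\mathcal{C}$ be a maximal chain in $\mathcal{F}$ and let
$$X_\infty \equiv \cup\,\mathcal{C}.$$
Define an order, $\leq$, on $X_\infty$ as follows. If $x, y$ are elements of $X_\infty$, pick $S \in \mathcal{C}$ such that $x, y$ are both in $S$. Then if $\leq_S$ is the order on $S$, let $x \leq y$ if and only if $x \leq_S y$. This definition is well defined because of the definition of the order, $\preceq$. Now let $U$ be any nonempty subset of $X_\infty$. Then $S \cap U \neq \emptyset$ for some $S \in \mathcal{C}$. Because of the definition of $\leq$, if $y \in S_2 \setminus S_1$, $S_i \in \mathcal{C}$, then $x \leq y$ for all $x \in S_1$. Thus, if $y \in X_\infty \setminus S$ then $x \leq y$ for all $x \in S$ and so the smallest element of $S \cap U$ exists and is the smallest element in $U$. Therefore $X_\infty$ is well-ordered. Now suppose there exists $z \in X \setminus X_\infty$. Define the following order, $\leq_1$, on $X_\infty \cup \{z\}$.
$$x \leq_1 y \text{ if and only if } x \leq y \text{ whenever } x, y \in X_\infty$$
$$x \leq_1 z \text{ whenever } x \in X_\infty.$$
Then let
$$\tilde{\mathcal{C}} = \{S \in \mathcal{C} \text{ or } X_\infty \cup \{z\}\}.$$
Then $\tilde{\mathcal{C}}$ is a strictly larger chain than $\mathcal{C}$ contradicting maximality of $\mathcal{C}$. Thus $X \setminus X_\infty = \emptyset$ and this shows $X$ is well-ordered by $\leq$. This proves the lemma.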

With these two lemmas the main result follows.
Theorem A.3 The following are equivalent.
The axiom of choice
The Hausdorff maximal principle
The well-ordering principle.
Proof: It only remains to prove that the well-ordering principle implies the axiom of choice. Let $I$ be a nonempty set and let $X_i$ be a nonempty set for each $i \in I$. Let $X = \cup\{X_i : i \in I\}$ and well order $X$. Let $f(i)$ be the smallest element of $X_i$. Then
$$f \in \prod_{i \in I} X_i.$$
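The last step of this proof is constructive once a well order is given, and it can be illustrated on a finite family. In the sketch below the family, the listed order on the union, and the name `choice_function` are all illustrative assumptions. A finite list stands in for the well order, with earlier entries counting as smaller.

```python
def choice_function(family, rank):
    """Pick from each set X_i its smallest element with respect to the order
    encoded by rank (a smaller rank means a smaller element)."""
    return {i: min(X_i, key=rank) for i, X_i in family.items()}

# A hypothetical indexed family of nonempty sets.
family = {"p": {3, 8}, "q": {8, 5, 1}, "r": {5}}

# A hypothetical well order on the union X = {1, 3, 5, 8}: 8 < 5 < 3 < 1.
order = [8, 5, 3, 1]

f = choice_function(family, order.index)
```

Nothing about the usual order on the integers is used. Any well order of the union produces an element of the product in the same way.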
A.1 Exercises
1. Zorn's lemma states that in a nonempty partially ordered set, if every chain has an upper bound, there exists a maximal element, $x$ in the partially ordered set. $x$ is maximal, means that if $x \preceq y$, it follows $y = x$. Show Zorn's lemma is equivalent to the Hausdorff maximal theorem.
2. Let $X$ be a vector space. $Y \subseteq X$ is a Hamel basis if every element of $X$ can be written in a unique way as a finite linear combination of elements in $Y$. Show that every vector space has a Hamel basis and that if $Y, Y_1$ are two Hamel bases of $X$, then there exists a one to one and onto map from $Y$ to $Y_1$.
3. Using the Baire category theorem of the chapter on Banach spaces show that any Hamel basis of a Banach space is either finite or uncountable.
4. Consider the vector space of all polynomials defined on $[0, 1]$. Does there exist a norm, $\|\cdot\|$ defined on these polynomials such that with this norm, the vector space of polynomials becomes a Banach space (complete normed vector space)?
Bibliography
[1] Adams R. Sobolev Spaces, Academic Press, New York, San Francisco, London,
1975.
[2] Ahlfors, Lars, Complex Analysis, McGraw Hill 1966.
[3] Apostol, T. M., Mathematical Analysis, Addison Wesley Publishing Co.,
1969.
[4] Apostol, T. M., Calculus second edition, Wiley, 1967.
[5] Apostol, T. M., Mathematical Analysis, Addison Wesley Publishing Co.,
1974.
[6] Ash, Robert, Complex Variables, Academic Press, 1971.
[7] Baker, Roger, Linear Algebra, Rinton Press 2001.
[8] Balakrishnan A.V., Applied Functional Analysis, Springer Verlag 1976.
[9] Bergh J. and Lofstrom J. Interpolation Spaces, Springer Verlag 1976.
[10] Bledsoe W.W. , Am. Math. Monthly vol. 77, PP. 180-182 1970.
[11] Bruckner A. , Bruckner J., and Thomson B., Real Analysis Prentice
Hall 1997.
[12] Conway J. B. Functions of one Complex variable Second edition, Springer
Verlag 1978.
[13] Cheney, E. W. ,Introduction To Approximation Theory, McGraw Hill 1966.
[14] Da Prato, G. and Zabczyk J., Stochastic Equations in Infinite Dimensions, Cambridge 1992.
[15] Diestal J. and Uhl J. Vector Measures, American Math. Society, Providence,
R.I., 1977.
[16] Dontchev A.L. The Graves theorem Revisited, Journal of Convex Analysis,
Vol. 3, 1996, No.1, 45-53.
[17] Dunford N. and Schwartz J.T. Linear Operators, Interscience Publishers,
a division of John Wiley and Sons, New York, part 1 1958, part 2 1963, part 3
1971.
[18] Duvaut, G. and Lions, J. L. Inequalities in Mechanics and Physics,
Springer-Verlag, Berlin, 1976.
[19] Evans L.C. and Gariepy, Measure Theory and Fine Properties of Functions,
CRC Press, 1992.
[20] Evans L.C. Partial Differential Equations, Berkeley Mathematics Lecture Notes. 1993.
[21] Federer H., Geometric Measure Theory, Springer-Verlag, New York, 1969.
[22] Gagliardo, E., Properieta di alcune classi di funzioni in piu variabili, Ricerche
Mat. 7 (1958), 102-137.
[23] Grisvard, P. Elliptic problems in nonsmooth domains, Pittman 1985.
[24] Henry D. Geometric Theory of Semilinear Parabolic Equations, Lecture Notes
in Mathematics, Springer Verlag, 1980.
[25] Hewitt E. and Stromberg K. Real and Abstract Analysis, Springer-Verlag,
New York, 1965.
[26] Hille Einar, Analytic Function Theory, Ginn and Company 1962.
[27] Hormander, Lars Linear Partial Differential Operators, Springer Verlag, 1976.
[28] Hormander L. Estimates for translation invariant operators in L
p
spaces,
Acta Math. 104 1960, 93-139.
[29] John, Fritz, Partial Differential Equations, Fourth edition, Springer Verlag, 1982.
[30] Jones F., Lebesgue Integration on Euclidean Space, Jones and Bartlett 1993.
[31] Kuttler K.L. Basic Analysis. Rinton Press. November 2001.
[32] Kuttler K.L., Modern Analysis CRC Press 1998.
[33] Levinson, N. and Redheffer, R. Complex Variables, Holden Day, Inc. 1970
[34] Markushevich, A.I., Theory of Functions of a Complex Variable, Prentice
Hall, 1965.
[35] McShane E. J. Integration, Princeton University Press, Princeton, N.J. 1944.
[36] Ray W.O. Real Analysis, Prentice-Hall, 1988.
[37] Rudin, W., Principles of mathematical analysis, McGraw Hill third edition
1976
[38] Rudin W. Real and Complex Analysis, third edition, McGraw-Hill, 1987.
[39] Rudin W. Functional Analysis, second edition, McGraw-Hill, 1991.
[40] Saks and Zygmund, Analytic functions, 1952. (This book is available on the web.)
[41] Smart D.R. Fixed point theorems Cambridge University Press, 1974.
[42] Stein E. Singular Integrals and Differentiability Properties of Functions. Princeton University Press, Princeton, N. J., 1970.
[43] Yosida K. Functional Analysis, Springer-Verlag, New York, 1978.
Index
C
1
functions, 84
C
c
, 266
C
m
c
, 266
F
sets, 136
G
, 301
G
sets, 136
L
1
loc
, 394
L
p
compactness, 271
systems, 199
algebra, 135
Abels theorem, 462
absolutely continuous, 398
adjugate, 57
algebra, 125
analytic continuation, 590, 692
Analytic functions, 449
approximate identity, 267
at most countable, 18
automorphic function, 678
axiom of choice, 13, 17, 242
axiom of extension, 13
axiom of specification, 13
axiom of unions, 13
Banach space, 251
Banach Steinhaus theorem, 303
basis of module of periods, 666
Besicovitch covering theorem, 415
Bessels inequality, 332, 356
Big Picard theorem, 605
Blaschke products, 647
Blochs lemma, 593
block matrix, 64
Borel Cantelli lemma, 166
Borel measurable, 242
Borel measure, 179
Borel sets, 135
bounded continuous linear functions,
301
bounded variation, 437
branch of the logarithm, 492
Brouwer fixed point theorem, 325
Browders lemma, 355
Cantor diagonalization procedure, 110
Cantor function, 241
Cantor set, 241
Caratheodory, 169
Caratheodorys procedure, 170
Cartesian coordinates, 40
Casorati Weierstrass theorem, 472
Cauchy
general Cauchy integral formula,
478
integral formula for disk, 457
Cauchy Riemann equations, 451
Cauchy Schwarz inequality, 253, 321
Cauchy sequence, 77
Cayley Hamilton theorem, 61
chain rule, 82
change of variables general case, 232
characteristic function, 142
characteristic polynomial, 61
closed graph theorem, 307
closed set, 111
closure of a set, 112
cofactor, 55
compact, 101
compact set, 113
complete measure space, 170
completion of measure space, 195
conformal maps, 455, 578
connected, 115
connected components, 116
continuous function, 113
convergence in measure, 166
convex
set, 322
convex
functions, 272
convolution, 267, 293
Coordinates, 39
countable, 18
counting zeros, 502
Cramers rule, 58
cycle, 478
Darboux, 36
Darboux integral, 36
derivatives, 82
determinant, 50
product, 54
transpose, 52
differential equations
Peano existence theorem, 131
dilations, 578
Dini derivates, 243, 411
distribution function, 193
dominated convergence theorem, 161
doubly periodic, 664
dual space, 312
duality maps, 319
Egoroff theorem, 142
eigenvalues, 61, 507, 510
elementary factors, 631
elliptic, 664
entire, 467
epsilon net, 101, 107
equality of mixed partial derivatives,
91
equivalence class, 19
equivalence relation, 19
essential singularity, 473
Eulers theorem, 659
evolution equation
continuous semigroup, 343
exchange theorem, 43
exponential growth, 296
extended complex plane, 435
Fatous lemma, 154
finite intersection property, 105, 115
finite measure space, 136
Fourier series
uniform convergence, 318
Fourier transform L
1
, 284
fractional linear transformations, 578,
583
mapping three points, 580
fractional powers
sectorial operator, 558
Frechet derivative, 81
Fresnel integrals, 535
Fubinis theorem, 203
function, 16
function element, 590, 692
functional equations, 682
fundamental theorem of algebra, 468
fundamental theorem of calculus, 35,
395, 397
general Radon measures, 423
Gamma function, 272
gamma function, 653
gauge function, 309
Gausss formula, 654
Gerschgorins theorem, 506
Gram determinant, 329
Gram matrix, 329
Gramm Schmidt process, 68
great Picard theorem, 604
Hadamard three circles theorem, 497
Hahn Banach theorem, 310
Hahn decomposition, 390
Hahn Jordan decomposition, 390
Hardy Littlewood maximal function,
394
Hardys inequality, 272
harmonic functions, 454
Hausdorff maximal principle, 20, 215, 309
Hausdorff maximal theorem, 697
Hausdorff metric, 120
Hausdorff space, 111
Heine Borel theorem, 104, 120
Hermitian, 71
Hilbert space, 253, 321
dual, 255
Holders inequality, 247
homotopic to a point, 623
implicit function theorem, 91, 94, 95
indicator function, 142
infinite products, 627
inner product space, 253, 321
inner regular measure, 179
inverse function theorem, 95, 96
inverses and determinants, 56
inversions, 578
isogonal, 454, 577
isolated singularity, 472
James map, 314
Jensens formula, 644
Jensens inequality, 272
Laplace expansion, 55
Laplace transform, 133, 243, 296
Laurent series, 524
Lebesgue
set, 398
Lebesgue decomposition, 359
Lebesgue measure, 211
Lebesgue point, 395
limit point, 111
linear combination, 42, 53
linearly dependent, 42
linearly independent, 42
Liouville theorem, 467
little Picard theorem, 694
locally compact, 106
locally compact , 113
logarithm
branch of logarithm, 492
Lusins theorem, 271
matrix
left inverse, 57
lower triangular, 58
non defective, 71
normal, 71
right inverse, 57
upper triangular, 58
maximal function
general Radon measures, 420
maximal function
measurability, 409
maximal function strong estimates, 410
maximum modulus theorem, 493
mean value theorem
for integrals, 37
measurable, 169
Borel, 138
measurable function, 138
pointwise limits, 139
measurable functions
Borel, 166
combinations, 141
measurable sets, 136, 170
measure space, 136
Mellin transformations, 532
meromorphic, 474
Mertens theorem, 611
Minkowski functional, 318
Minkowskis inequality, 258
minor, 55
Mittag Leffler, 536, 638
mixed partial derivatives, 89
modular function, 676, 678
modular group, 607, 666
module of periods, 662
mollifier, 267
monotone convergence theorem, 151
monotone functions
differentiable, 244, 412
Montels theorem, 581, 603
multi-index, 89, 264, 275
Neumann series, 537
nonmeasurable set, 242
normal family of functions, 583
normal topological space, 112
nowhere differentiable functions, 316
numerical range, 551
one point compactification, 114, 182
open cover, 113
open mapping theorem, 304, 489
open sets, 111
operator norm, 77, 301
order, 653
order of a pole, 473
order of a zero, 465
order of an elliptic function, 664
orthonormal set, 330
outer measure, 166, 169
outer regular measure, 179
parallelogram identity, 355
partial derivative, 85
partial order, 20, 308
partially ordered set, 697
partition, 21
partition of unity, 185
period parallelogram, 664
Phragmén Lindelöf theorem, 495
pi systems, 199
Plancherel theorem, 288
point of density, 409
polar decomposition, 371
pole, 473
polynomial, 264, 275
positive and negative parts of a measure, 406
positive linear functional, 186
power series
analytic functions, 461
power set, 13
precompact, 113, 130
primitive, 445
principal branch of logarithm, 493
principal ideal, 642
product topology, 113
projection in Hilbert space, 324
properties of integral
properties, 33
Rademacher's theorem, 403
Radon Nikodym derivative, 362
Radon Nikodym Theorem
σ finite measures, 362
finite measures, 359
rank of a matrix, 58
real Schur form, 69
reflexive Banach space, 315
reflexive Banach space, 379
region, 465
regular family of sets, 409
regular measure, 179
regular topological space, 112
removable singularity, 472
residue, 513
resolvent, 337
resolvent set, 537, 551
Riemann criterion, 25
Riemann integrable, 24
continuous, 120
Riemann integral, 24
Riemann sphere, 435
Riemann Stieltjes integral, 24
Riesz map, 258, 326
Riesz representation theorem
$C_0(X)$, 385
Hilbert space, 257, 326
locally compact Hausdorff space, 186
Riesz representation theorem
$C(X)$, 383
Riesz representation theorem $L^p$
finite measures, 372
Riesz representation theorem $L^p$
σ finite case, 378
Riesz representation theorem for $L^1$
finite measures, 376
right polar decomposition, 73
Rouché's theorem, 519
Runge's theorem, 616
scalars, 41
scale of Banach spaces, 574
Schottky's theorem, 601
Schröder Bernstein theorem, 17
Schwarz formula, 463
Schwarz reflection principle, 487
Schwarz's lemma, 584
sectorial, 540
self adjoint, 71
semigroup
adjoint, 348
contraction
bounded, 335
generator, 334
growth estimate, 334
Hille Yosida theorem, 338
strongly continuous, 334
separated, 115
separation theorem, 319
sets, 13
Shannon sampling theorem, 298
simple function, 147
Smítal, 410
Sobolev Space
embedding theorem, 297
equivalent norms, 297
Sobolev spaces, 297
span, 42, 53
spectral radius, 538
stereographic projection, 436, 602
Stirling's formula, 655
strict convexity, 320
subspace, 42
support of a function, 185
Tietze extension theorem, 132
topological space, 111
total variation, 365, 399
totally bounded set, 101
totally ordered set, 697
translation invariant, 213
translations, 578
trivial, 42
uniform boundedness theorem, 303
uniform convergence, 434
uniform convexity, 320
uniformly bounded, 107, 603
uniformly equicontinuous, 106, 603
uniformly integrable, 162
unimodular transformations, 666
upper and lower sums, 22
Urysohn's lemma, 180
variational inequality, 324
vector measures, 365
Vitali convergence theorem, 163, 272
Vitali covering theorem, 216, 219, 220, 222
Vitali coverings, 220, 222
Vitali theorem, 607
weak convergence, 320
Weierstrass
approximation theorem, 125
Stone Weierstrass theorem, 126
Weierstrass M test, 434
Weierstrass P function, 671
well ordered sets, 699
winding number, 475
Young's inequality, 247, 390
zeta function, 655
