LYNN H. LOOMIS and SHLOMO STERNBERG
Department of Mathematics, Harvard University

ADVANCED CALCULUS, REVISED EDITION

JONES AND BARTLETT PUBLISHERS

Editorial, Sales, and Customer Service Offices: Jones and Bartlett Publishers, Inc., One Exeter Plaza, Boston, MA 02116. Jones and Bartlett Publishers International, PO Box 1498, London W6 7RS, England.

Copyright © 1990 by Jones and Bartlett Publishers, Inc. Copyright © 1968 by Addison-Wesley Publishing Company, Inc. All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system, without written permission from the copyright owner.

Printed in the United States of America.

Library of Congress Cataloging-in-Publication Data
Loomis, Lynn H.
Advanced calculus / Lynn H. Loomis and Shlomo Sternberg. Rev. ed.
Originally published: Reading, Mass.: Addison-Wesley Pub. Co., 1968.
ISBN 0-86720-122-3
1. Calculus. I. Sternberg, Shlomo. II. Title.
QA303.L87 1990
515-dc20
89-15620
CIP

PREFACE

This book is based on an honors course in advanced calculus that we gave in the 1960's. The foundational material, presented in the unstarred sections of Chapters 1 through 11, was normally covered, but different applications of this basic material were stressed from year to year, and the book therefore contains more material than was covered in any one year. It can accordingly be used (with omissions) as a text for a year's course in advanced calculus, or as a text for a three-semester introduction to analysis.

The prerequisites are a good grounding in the calculus of one variable from a mathematically rigorous point of view, together with some acquaintance with linear algebra. The reader should be familiar with limit and continuity type arguments and have a certain amount of mathematical sophistication. As possible introductory texts, we mention Differential and Integral Calculus by R.
Courant, Calculus by T. Apostol, Calculus by M. Spivak, and Pure Mathematics by G. Hardy. The reader should also have some experience with partial derivatives.

In overall plan the book divides roughly into a first half which develops the calculus (principally the differential calculus) in the setting of normed vector spaces, and a second half which deals with the calculus of differentiable manifolds.

Vector space calculus is treated in two chapters, the differential calculus in Chapter 3, and the basic theory of ordinary differential equations in Chapter 6. The other early chapters are auxiliary. The first two chapters develop the necessary purely algebraic theory of vector spaces, Chapter 4 presents the material on compactness and completeness needed for the more substantive results of the calculus, and Chapter 5 contains a brief account of the extra structure encountered in scalar product spaces. Chapter 7 is devoted to multilinear (tensor) algebra and is, in the main, a reference chapter for later use. Chapter 8 deals with the theory of (Riemann) integration on Euclidean spaces and includes (in exercise form) the fundamental facts about the Fourier transform. Chapters 9 and 10 develop the differential and integral calculus on manifolds, while Chapter 11 treats the exterior calculus of E. Cartan.

The first eleven chapters form a logical unit, each chapter depending on the results of the preceding chapters. (Of course, many chapters contain material that can be omitted on first reading; this is generally found in starred sections.) On the other hand, Chapters 12, 13, and the latter parts of Chapters 6 and 11 are independent of each other, and are to be regarded as illustrative applications of the methods developed in the earlier chapters. Presented here are elementary Sturm-Liouville theory and Fourier series, elementary differential geometry, potential theory, and classical mechanics.
We usually covered only one or two of these topics in our one-year course.

We have not hesitated to present the same material more than once from different points of view. For example, although we have selected the contraction mapping fixed-point theorem as our basic approach to the implicit-function theorem, we have also outlined a "Newton's method" proof in the text and have sketched still a third proof in the exercises. Similarly, the calculus of variations is encountered twice, once in the context of the differential calculus of an infinite-dimensional vector space and later in the context of classical mechanics. The notion of a submanifold of a vector space is introduced in the early chapters, while the invariant definition of a manifold is given later on.

In the introductory treatment of vector space theory, we are more careful and precise than is customary. In fact, this level of precision of language is not maintained in the later chapters. Our feeling is that in linear algebra, where the concepts are so clear and the axioms so familiar, it is pedagogically sound to illustrate various subtle points, such as distinguishing between spaces that are normally identified, discussing the naturality of various maps, and so on. Later on, when overly precise language would be more cumbersome, the reader should be able to produce for himself a more precise version of any assertions that he finds to be formulated too loosely. Similarly, the proofs in the first few chapters are presented in more formal detail. Again, the philosophy is that once the student has mastered the notion of what constitutes a formal mathematical proof, it is safe and more convenient to present arguments in the usual mathematical colloquialisms.

While the level of formality decreases, the level of mathematical sophistication does not. Thus increasingly abstract and sophisticated mathematical objects are introduced.
It has been our experience that Chapter 9 contains the concepts most difficult for students to absorb, especially the notions of the tangent space to a manifold and the Lie derivative of various objects with respect to a vector field.

There are exercises of many different kinds spread throughout the book. Some are in the nature of routine applications. Others ask the reader to fill in or extend various proofs of results presented in the text. Sometimes whole topics, such as the Fourier transform or the residue calculus, are presented in exercise form. Due to the rather abstract nature of the textual material, the student is strongly advised to work out as many of the exercises as he possibly can.

Any enterprise of this nature owes much to many people besides the authors, but we particularly wish to acknowledge the help of L. Ahlfors, A. Gleason, R. Kulkarni, R. Rasala, and G. Mackey and the general influence of the book by Dieudonné. We also wish to thank the staff of Jones and Bartlett for their invaluable help in preparing this revised edition.

Cambridge, Massachusetts
1968, 1989
L.H.L.
S.S.

CONTENTS

Chapter 0  Introduction
    Logic: quantifiers; The logical connectives; Negations of quantifiers; Sets; Restricted variables; Ordered pairs and relations; Functions and mappings; Product sets; index notation; Composition; Duality; The Boolean operations; Partitions and equivalence relations

Chapter 1  Vector Spaces
    Fundamental notions; Vector spaces and geometry; Product spaces and Hom(V, W); Affine subspaces and quotient spaces; Direct sums; Bilinearity

Chapter 2  Finite-Dimensional Vector Spaces
    Bases; Dimension; The dual space; Matrices; Trace and determinant; Matrix computations; The diagonalization of a quadratic form

Chapter 3  The Differential Calculus
    Review in R; Norms; Continuity; Equivalent norms; Infinitesimals; The differential; Directional derivatives; the mean-value theorem; The differential and product spaces; The differential and Rⁿ; Elementary applications; The implicit-function theorem; Submanifolds and Lagrange multipliers; Functional dependence; Uniform continuity and function-valued mappings; The calculus of variations; The second differential and the classification of critical points; The Taylor formula

Chapter 4  Compactness and Completeness
    Metric spaces; open and closed sets; Topology; Sequential convergence; Sequential compactness; Compactness and uniformity; Equicontinuity; Completeness; A first look at Banach algebras; The contraction mapping fixed-point theorem; The integral of a parametrized arc; The complex number system; Weak methods

Chapter 5  Scalar Product Spaces
    Scalar products; Orthogonal projection; Self-adjoint transformations; Orthogonal transformations; Compact transformations

Chapter 6  Differential Equations
    The fundamental theorem; Differentiable dependence on parameters; The linear equation; The nth-order linear equation; Solving the inhomogeneous equation; The boundary-value problem; Fourier series

Chapter 7  Multilinear Functionals
    Bilinear functionals; Multilinear functionals; Permutations; The sign of a permutation; The subspace of alternating tensors; The determinant; The exterior algebra; Exterior powers of scalar product spaces; The star operator

Chapter 8  Integration
    Introduction; Axioms; Rectangles and paved sets; The minimal theory; The minimal theory (continued); Contented sets; When is a set contented?; Behavior under linear distortions; Axioms for integration; Integration of contented functions; The change of variables formula; Successive integration; Absolutely integrable functions; Problem set: The Fourier transform

Chapter 9  Differentiable Manifolds
    Atlases; Functions, convergence; Differentiable manifolds; The tangent space; Flows and vector fields; Lie derivatives; Linear differential forms; Computations with coordinates; Riemann metrics

Chapter 10  The Integral Calculus on Manifolds
    Compactness; Partitions of unity; Densities; Volume density of a Riemann metric; Pullback and Lie derivatives of densities; The divergence theorem; More complicated domains

Chapter 11  Exterior Calculus
    Exterior differential forms; Oriented manifolds and the integration of exterior differential forms; The operator d; Stokes' theorem; Some illustrations of Stokes' theorem; The Lie derivative of a differential form; Appendix I. "Vector analysis"; Appendix II. Elementary differential geometry of surfaces in E³

Chapter 12  Potential Theory in Eⁿ
    Solid angle; Green's formulas; The maximum principle; Green's functions; The Poisson integral formula; Consequences of the Poisson integral formula; Harnack's theorem; Subharmonic functions; Dirichlet's problem; Behavior near the boundary; Dirichlet's principle; Physical applications; Problem set: The calculus of residues

Chapter 13  Classical Mechanics
    The tangent and cotangent bundles; Equations of variation; The fundamental linear differential form on T*(M); The fundamental exterior two-form on T*(M); Hamiltonian mechanics; The central-force problem; The two-body problem; Lagrange's equations; Variational principles; Geodesic coordinates; Euler's equations; Small oscillations; Small oscillations (continued); Canonical transformations

Selected References
Notation Index
Index

CHAPTER 0
INTRODUCTION

This preliminary chapter contains a short exposition of the set theory that forms the substratum of mathematical thinking today.
It begins with a brief discussion of logic, so that set theory can be discussed with some precision, and continues with a review of the way in which mathematical objects can be defined as sets. The chapter ends with four sections which treat specific set-theoretic topics.

It is intended that this material be used mainly for reference. Some of it will be familiar to the reader and some of it will probably be new. We suggest that he read the chapter through "lightly" at first, and then refer back to it for details as needed.

1. LOGIC: QUANTIFIERS

A statement is a sentence which is true or false as it stands. Thus '1 < 2' and '4 + 3 = 5' are, respectively, true and false mathematical statements. Many sentences occurring in mathematics contain variables and are therefore not true or false as they stand, but become statements when the variables are given values. Simple examples are 'x < 4', 'x < y', 'x is an integer', '3x² + y² = 10'. Such sentences will be called statement frames. If P(x) is a frame containing the one variable 'x', then P(5) is the statement obtained by replacing 'x' in P(x) by the numeral '5'. For example, if P(x) is 'x < 4', then P(5) is '5 < 4', P(√2) is '√2 < 4', and so on.

Another way to obtain a statement from the frame P(x) is to assert that P(x) is always true. We do this by prefixing the phrase 'for every x'. Thus 'for every x, x < 4' is a false statement, and 'for every x, x² − 1 = (x − 1)(x + 1)' is a true statement. This prefixing phrase is called a universal quantifier. Synonymous phrases are 'for each x' and 'for all x', and the symbol customarily used is '(∀x)', which can be read in any of these ways. One frequently presents sentences containing variables as being always true without explicitly writing the universal quantifiers.
For instance, the associative law for the addition of numbers is often written

    x + (y + z) = (x + y) + z,

where it is understood that the equation is true for all x, y, and z. Thus the actual statement being made is

    (∀x)(∀y)(∀z)[x + (y + z) = (x + y) + z].

Finally, we can convert the frame P(x) into a statement by asserting that it is sometimes true, which we do by writing 'there exists an x such that P(x)'. This process is called existential quantification. Synonymous prefixing phrases here are 'there is an x such that', 'for some x', and, symbolically, '(∃x)'.

The statement '(∀x)(x < 4)' still contains the variable 'x', of course, but 'x' is no longer free to be given values, and is now called a bound variable. Roughly speaking, quantified variables are bound and unquantified variables are free. The notation 'P(x)' is used only when 'x' is free in the sentence being discussed.

Now suppose that we have a sentence P(x, y) containing two free variables. Clearly, we need two quantifiers to obtain a statement from this sentence. This brings us to a very important observation. If quantifiers of both types are used, then the order in which they are written affects the meaning of the statement: (∃y)(∀x)P(x, y) and (∀x)(∃y)P(x, y) say different things. The first says that one y can be found that works for all x: "there exists a y such that for all x . . .". The second says that for each x a y can be found that works: "for each x there exists a y such that . . .". But in the second case, it may very well happen that when x is changed, the y that can be found will also have to be changed. The existence of a single y that serves for all x is thus the stronger statement. For example, it is true that (∀x)(∃y)(x < y) and false that (∃y)(∀x)(x < y). The reader must be absolutely clear on this point; his whole mathematical future is at stake. The second statement says that there exists a y, call it y₀, such that (∀x)(x < y₀), that is, such that every number is less than y₀.
This is false; y₀ + 1, in particular, is not less than y₀. The first statement says that for each x we can find a corresponding y. And we can: take y = x + 1.

On the other hand, among a group of quantifiers of the same type the order does not affect the meaning. Thus '(∀x)(∀y)' and '(∀y)(∀x)' have the same meaning. We often abbreviate such clumps of similar quantifiers by using the quantification symbol only once, as in '(∀x, y)', which can be read 'for every x and y'. Thus the strictly correct '(∀x)(∀y)(∀z)[x + (y + z) = (x + y) + z]' receives the slightly more idiomatic rendition '(∀x, y, z)[x + (y + z) = (x + y) + z]'. The situation is clearly the same for a group of existential quantifiers.

The beginning student generally feels that the prefixing phrases 'for every x there exists a y such that' and 'there exists a y such that for every x' sound artificial and are unidiomatic. This is indeed the case, but this awkwardness is the price that has to be paid for the order of the quantifiers to be fixed, so that the meaning of the quantified statement is clear and unambiguous. Quantifiers do occur in ordinary idiomatic discourse, but their idiomatic occurrences often house ambiguity. The following two sentences are good examples of such ambiguous idiomatic usage: "Every x is less than some y" and "Some y is greater than every x". If a poll were taken, it would be found that most men on the street feel that these two sentences say the same thing, but half will feel that the common assertion is false and half will think it true! The trouble here is that the matrix is preceded by one quantifier and followed by another, and the poor reader doesn't know which to take as the inside, or first applied, quantifier. The two possible symbolic renditions of our first sentence, '[(∀x)(x < y)](∃y)' and '(∀x)[(x < y)(∃y)]', are respectively false and true.
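The difference between the two quantifier orders can be spot-checked mechanically on a finite domain. The book's example x < y needs an infinite domain, so the Python sketch below (our own illustration, not part of the text) uses the frame x ≠ y over a three-element domain instead; nesting `any` inside `all`, and vice versa, realizes the two orders.

```python
# Finite-domain check that quantifier order matters.
# Frame P(x, y): x != y, over the domain D = {0, 1, 2}.
D = [0, 1, 2]

# (for all x)(there exists y)(x != y): y may be chosen after x is known.
forall_exists = all(any(x != y for y in D) for x in D)

# (there exists y)(for all x)(x != y): a single y must work for every x,
# in particular for x = y itself, which is impossible.
exists_forall = any(all(x != y for x in D) for y in D)

print(forall_exists)  # True
print(exists_forall)  # False
```

The first order succeeds because the inner choice of y may depend on x; the second demands a single witness y and fails.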
Mathematicians do use hanging quantifiers in the interests of more idiomatic writing, but only if they are sure the reader will understand their order of application, either from the context or by comparison with standard usage. In general, a hanging quantifier would probably be read as the inside, or first applied, quantifier, and with this understanding our two ambiguous sentences become true and false in that order.

After this apology, the reader should be able to tolerate the definition of sequential convergence. It involves three quantifiers and runs as follows: the sequence {xₙ} converges to x if (∀ε)(∃N)(∀n)(if n > N, then |xₙ − x| < ε). In exactly the same format, we define a function f to be continuous at a if (∀ε)(∃δ)(∀x)(if |x − a| < δ, then |f(x) − f(a)| < ε). We often omit an inside universal quantifier by displaying the final frame, so that the universal quantification is understood. Thus we define f to be continuous at a if for every ε there is a δ such that if |x − a| < δ, then |f(x) − f(a)| < ε.

We shall study these definitions later. We remark only that it is perfectly possible to build up an intuitive understanding of what these and similar quantified statements actually say.

2. THE LOGICAL CONNECTIVES

When the word 'and' is inserted between two sentences, the resulting sentence is true if both constituent sentences are true and is false otherwise. That is, the "truth value", T or F, of the compound sentence depends only on the truth values of the constituent sentences. We can thus describe the way 'and' acts in compounding sentences in the simple "truth table"

    P    Q    P and Q
    T    T       T
    T    F       F
    F    T       F
    F    F       F

where 'P' and 'Q' stand for arbitrary statement frames. Words like 'and' are called logical connectives. It is often convenient to use symbols for connectives, and a standard symbol for 'and' is the ampersand '&'. Thus 'P & Q' is read 'P and Q'. Another logical connective is the word 'or'.
Unfortunately, this word is used ambiguously in ordinary discourse. Sometimes it is used in the exclusive sense, where 'P or Q' means that one of P and Q is true, but not both, and sometimes it is used in the inclusive sense that at least one is true, and possibly both are true. Mathematics cannot tolerate any fundamental ambiguity, and in mathematics 'or' is always used in the latter way. We thus have the truth table

    P    Q    P or Q
    T    T       T
    T    F       T
    F    T       T
    F    F       F

The above two connectives are binary, in the sense that they combine two sentences to form one new sentence. The word 'not' applies to one sentence and really shouldn't be considered a connective at all; nevertheless, it is called a unary connective. A standard symbol for 'not' is '~'. Its truth table is obviously

    P    ~P
    T    F
    F    T

In idiomatic usage the word 'not' is generally buried in the interior of a sentence. We write 'x is not equal to y' rather than 'not (x is equal to y)'. However, for the purpose of logical manipulation, the negation sign (the word 'not' or a symbol like '~') precedes the sentence being negated. We shall, of course, continue to write 'x ≠ y', but keep in mind that this is idiomatic for 'not (x = y)' or '~(x = y)'.

We come now to the troublesome 'if . . . , then . . .' connective, which we write as either 'if P, then Q' or 'P ⇒ Q'. This is almost always applied in the universally quantified context (∀x)(P(x) ⇒ Q(x)), and its meaning is best unraveled by a study of this usage. We consider 'if x < 3, then x < 5' to be a true sentence. More exactly, it is true for all x, so that the universal quantification (∀x)(x < 3 ⇒ x < 5) is a true statement. This conclusion forces us to agree that, in particular, '2 < 3 ⇒ 2 < 5', '4 < 3 ⇒ 4 < 5', and '6 < 3 ⇒ 6 < 5' are all true statements. The truth table for '⇒' thus contains the values entered below.

    P    Q    P ⇒ Q
    T    T       T
    T    F
    F    T       T
    F    F       T

On the other hand, we consider 'x < 7 ⇒ x < 5' to be a false sentence, and therefore have to agree that '6 < 7 ⇒ 6 < 5' is false.
Thus the remaining row in the table above gives the value 'F' for P ⇒ Q.

Combinations of frame variables and logical connectives such as we have been considering are called truth-functional forms. We can further combine the elementary forms such as 'P ⇒ Q' and '~P' by connectives to construct composite forms such as '~(P ⇒ Q)' and '(P ⇒ Q) & (Q ⇒ P)'. A sentence has a given (truth-functional) form if it can be obtained from that form by substitution. Thus 'x < y or ~(x < y)' has the form 'P or ~P', since it is obtained from this form by substituting the sentence 'x < y' for the sentence variable 'P'. Composite truth-functional forms have truth tables that can be worked out by combining the elementary tables. For example, '~(P ⇒ Q)' has the table below, the truth value for the whole form being in the column under the connective which is applied last ('~' in this example).

    P    Q    ~(P ⇒ Q)
    T    T        F
    T    F        T
    F    T        F
    F    F        F

Thus ~(P ⇒ Q) is true only when P is true and Q is false.

A truth-functional form such as 'P or ~P' which is always true (i.e., has only 'T' in the final column of its truth table) is called a tautology or a tautologous form. The reader can check that

    (P & (P ⇒ Q)) ⇒ Q    and    ((P ⇒ Q) & (Q ⇒ R)) ⇒ (P ⇒ R)

are also tautologous. Indeed, any valid principle of reasoning that does not involve quantifiers must be expressed by a tautologous form.

The 'if and only if' form 'P ⇔ Q', or 'P if and only if Q', or 'P iff Q', is an abbreviation for '(P ⇒ Q) & (Q ⇒ P)'. Its truth table works out to be

    P    Q    P ⇔ Q
    T    T       T
    T    F       F
    F    T       F
    F    F       T

Two truth-functional forms A and B are equivalent if they have the same truth table; that is, A and B are equivalent if 'A ⇔ B' is tautologous, and conversely.

Replacing a sentence obtained by substitution in a form A by the equivalent sentence obtained by the same substitutions in an equivalent form B is a device much used in logical reasoning. Thus to prove a statement P true, it suffices to prove the statement ~P false, since 'P' and '~(~P)' are equivalent forms.
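Tautologousness can be checked by brute force: enumerate every assignment of truth values to the sentence variables and evaluate the form. The Python sketch below is our own illustration, not the book's machinery; `implies` and `is_tautology` are hypothetical helper names.

```python
from itertools import product

def implies(p, q):
    # The truth table of P => Q: false only when P is true and Q is false.
    return (not p) or q

def is_tautology(form, nvars):
    # 'form' is a predicate of nvars Booleans; tautologous means true
    # under every one of the 2**nvars assignments of truth values.
    return all(form(*vals) for vals in product([True, False], repeat=nvars))

print(is_tautology(lambda p: p or not p, 1))                          # True
print(is_tautology(lambda p, q: implies(p and implies(p, q), q), 2))  # True
print(is_tautology(lambda p, q: implies(p, q), 2))                    # False
```

The last line confirms that 'P ⇒ Q' by itself is not tautologous, since the assignment P true, Q false falsifies it.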
Other important equivalences are

    ~(P or Q) ⇔ (~P) & (~Q),
    (P ⇒ Q) ⇔ (Q or ~P),
    ~(P ⇒ Q) ⇔ P & (~Q).

A bit of conventional sloppiness which we shall indulge in for smoother idiom is the use of 'if' instead of the correct 'if and only if' in definitions. We define f to be continuous at x if so-and-so, meaning, of course, that f is continuous at x if and only if so-and-so. This causes no difficulty, since it is clear that 'if and only if' is meant when a definition is being given.

3. NEGATIONS OF QUANTIFIERS

The combinations '~(∀x)' and '(∃x)~' have the same meanings: something is not always true if and only if it is sometimes false. Similarly, '~(∃y)' and '(∀y)~' have the same meanings. These equivalences can be applied to move a negation sign past each quantifier in a string of quantifiers, giving the following important practical rule:

In taking the negation of a statement beginning with a string of quantifiers, we simply change each quantifier to the opposite kind and move the negation sign to the end of the string.

Thus

    ~(∀x)(∃y)(∀z)P(x, y, z) ⇔ (∃x)(∀y)(∃z)~P(x, y, z).

There are other principles of quantificational reasoning that can be isolated and which we shall occasionally mention, but none seem worth formalizing here.

4. SETS

It is present-day practice to define every mathematical object as a set of some kind or other, and we must examine this fundamental notion, however briefly.

A set is a collection of objects that is itself considered an entity. The objects in the collection are called the elements or members of the set. The symbol for 'is a member of' is '∈' (a sort of capital epsilon), so that 'x ∈ A' is read "x is a member of A", "x is an element of A", "x belongs to A", or "x is in A". We use the equals sign '=' in mathematics to mean logical identity; A = B means that A is B. Now a set A is considered to be the same object as a set B if and only if A and B have exactly the same members.
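This membership criterion is easy to check mechanically for finite sets. In the Python sketch below (our own illustration; `same_members` and `witness` are hypothetical names, not the book's notation), the quantifier-negation rule of Section 3 is also put to work: when two sets are not identical, negating "every x is in both or in neither" produces a witness element in one set but not the other.

```python
# Set identity as "exactly the same members", checked over finite sets.
def same_members(A, B):
    # (for all x)(x in A  iff  x in B); it suffices to let x range
    # over the members of either set, i.e. over A union B.
    return all((x in A) == (x in B) for x in A | B)

A = {1, 2, 3}
B = {3, 2, 1}          # the same members, listed in a different order
C = {1, 2}

print(same_members(A, B))   # True: A = B
print(same_members(A, C))   # False

# By the rule ~(for all x)P(x) <=> (there exists x)~P(x), the failure
# is witnessed by some x belonging to exactly one of the two sets:
witness = next(x for x in A | C if (x in A) != (x in C))
print(witness)              # 3
```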
That is, 'A = B' means that

    (∀x)(x ∈ A ⇔ x ∈ B).

We say that a set A is a subset of a set B, or that A is included in B (or that B is a superset of A), if every element of A is an element of B. The symbol for inclusion is '⊂'. Thus 'A ⊂ B' means that

    (∀x)(x ∈ A ⇒ x ∈ B).

Clearly,

    (A = B) ⇔ (A ⊂ B) and (B ⊂ A).

This is a frequently used way of establishing set identity: we prove that A = B by proving that A ⊂ B and that B ⊂ A. If the reader thinks about the above equivalence, he will see that it depends first on the equivalence of the truth-functional forms 'P ⇔ Q' and '(P ⇒ Q) & (Q ⇒ P)'.

5. RESTRICTED VARIABLES

Some restriction was implicit on page 1. If the reader agreed that (∀x)(x² − 1 = (x − 1)(x + 1)) was true, he probably took x to be a real number.

6. ORDERED PAIRS AND RELATIONS

Ordered pairs are basic tools, as the reader knows from analytic geometry. According to our general principle, the ordered pair <x, y> is taken to be a certain set, but here again we don't care which particular set it is so long as it guarantees the crucial characterizing property

    <x, y> = <a, b>  ⇔  x = a and y = b.

Thus <1, 3> ≠ <3, 1>.

The notion of a correspondence, or relation, and the special case of a mapping, or function, is fundamental to mathematics. A correspondence is a pairing of objects such that given any two objects x and y, the pair <x, y> either does or does not correspond. A particular correspondence (relation) is generally presented by a statement frame P(x, y) having two free variables, with x and y corresponding if and only if P(x, y) is true. Given any relation (correspondence), the set of all ordered pairs <x, y> of corresponding elements is called its graph.

Now a relation is a mathematical object, and, as we have said several times, it is current practice to regard every mathematical object as a set of some sort or other. Since the graph of a relation is a set (of ordered pairs), it is efficient and customary to take the graph to be the relation.
Thus a relation (correspondence) is simply a set of ordered pairs. If R is a relation, then we say that x has the relation R to y, and we write 'xRy', if and only if <x, y> ∈ R. We also say that x corresponds to y under R. The set of all first elements occurring in the ordered pairs of a relation R is called the domain of R and is designated dom R or 𝒟(R). Thus

    dom R = {x : (∃y) <x, y> ∈ R}.

The set of second elements is called the range of R:

    range R = {y : (∃x) <x, y> ∈ R}.

The inverse, R⁻¹, of a relation R is the set of ordered pairs obtained by reversing those of R:

    R⁻¹ = {<x, y> : <y, x> ∈ R}.

A statement frame P(x, y) having two free variables actually determines a pair of mutually inverse relations R and S, called the graphs of P, as follows:

    R = {<x, y> : P(x, y)},    S = {<y, x> : P(x, y)}.

The set {<x, y> : x ∈ A & y ∈ B} of all ordered pairs with first element in A and second element in B is called the Cartesian product A × B of the sets A and B. A relation R is always a subset of dom R × range R.

If the two "factor spaces" are the same, we can use exponential notation: A² = A × A. The Cartesian product ℝ² = ℝ × ℝ is the "analytic plane". Analytic geometry rests upon the one-to-one coordinate correspondence between ℝ² and the Euclidean plane E² (determined by an axis system in the latter), which enables us to treat geometric questions algebraically and algebraic questions geometrically. In particular, since a relation between sets of real numbers is a subset of ℝ², we can "picture" it by the corresponding subset of the Euclidean plane, or of any model of the Euclidean plane, such as this page. A simple Cartesian product is shown in Fig. 0.1 (A ∪ B is the union of the sets A and B).

[Fig. 0.1: a simple Cartesian product A × B, each factor being the union of an interval and an isolated point. Fig. 0.2: the image R[A].]

If R is a relation and A is any set, then the restriction of R to A, R ↾ A, is the subset of R consisting of those pairs with first element in A:

    R ↾ A = {<x, y> : <x, y> ∈ R and x ∈ A}.

Thus R ↾ A = R ∩ (A × range R), where C ∩ D is the intersection of the sets C and D.
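Since a relation is literally a set of ordered pairs, the definitions above translate into one-liners for finite relations. The Python sketch below is our own illustration (tuples stand in for the ordered pairs <x, y>; the variable names are ours).

```python
# The relation "x < y" presented by its graph over the finite set {1, 2, 3}.
A = {1, 2, 3}
R = {(x, y) for x in A for y in A if x < y}   # the graph, a set of pairs

dom_R   = {x for (x, y) in R}                 # dom R: all first elements
range_R = {y for (x, y) in R}                 # range R: all second elements
R_inv   = {(y, x) for (x, y) in R}            # the inverse relation

print(sorted(R))        # [(1, 2), (1, 3), (2, 3)]
print(dom_R, range_R)   # {1, 2} {2, 3}
print(sorted(R_inv))    # [(2, 1), (3, 1), (3, 2)]

# Restriction of R to a set S: keep the pairs whose first element is in S.
S = {1}
R_restr = {(x, y) for (x, y) in R if x in S}
print(sorted(R_restr))  # [(1, 2), (1, 3)]
```

Note that R_inv here is the graph of "x > y", the mutually inverse relation determined by the same frame.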
If R is a relation and A is any set, then the image of A under R, R[A], is the set of second elements of ordered pairs in R whose first elements are in A:

    R[A] = {y : (∃x)(x ∈ A & <x, y> ∈ R)}.

Thus R[A] = range (R ↾ A), as shown in Fig. 0.2.

7. FUNCTIONS AND MAPPINGS

A function is a relation f such that each domain element x is paired with exactly one range element y. This property can be expressed as follows:

    <x, y> ∈ f and <x, z> ∈ f  ⇒  y = z.

The y which is thus uniquely determined by f and x is designated f(x):

    y = f(x)  ⇔  <x, y> ∈ f.

One tends to think of a function as being active and a relation which is not a function as being passive. A function f acts on an element x in its domain to give f(x). We take x and apply f to it; indeed we often call a function an operator. On the other hand, if R is a relation but not a function, then there is in general no particular y related to an element x in its domain, and the pairing of x and y is viewed more passively.

We often define a function f by specifying its value f(x) for each x in its domain, and in this connection a stopped arrow notation is used to indicate the pairing. Thus x ↦ x² is the function assigning to each number x its square x². If we want it to be understood that f is this function, we can write "Consider the function f: x ↦ x²". The domain of f must be understood for this notation to be meaningful.

If f is a function, then f⁻¹ is of course a relation, but in general it is not a function. For example, if f is the function x ↦ x², then f⁻¹ contains the pairs <4, 2> and <4, −2> and so is not a function (see Fig. 0.3). If f⁻¹ is a function, we say that f is one-to-one and that f is a one-to-one correspondence between its domain and its range. Each x ∈ dom f corresponds to only one y ∈ range f (f is a function), and each y ∈ range f corresponds to only one x ∈ dom f (f⁻¹ is a function).

The notation

    f: A → B

is read "a (the) function f on A into B" or "the function f from A to B".
The notation implies that f is a function, that dom f = A, and that range f ⊂ B.

Many people feel that the very notion of function should include all these ingredients; that is, a function should be considered an ordered triple <f, A, B>, where f is a function according to our more limited definition, A is the domain of f, and B is a superset of the range of f, which we shall call the codomain of f in this context. We shall use the terms 'map', 'mapping', and 'transformation' for such a triple, so that the notation f: A → B in its totality presents a mapping. Moreover, when there is no question about which set is the codomain, we shall often call the function f itself a mapping, since the triple is then determined by f. The two arrow notations can be combined, as in: "Define f: ℝ → ℝ by x ↦ x²".

A mapping f: A → B is said to be injective if f is one-to-one, surjective if range f = B, and bijective if it is both injective and surjective. A bijective mapping f: A → B is thus a one-to-one correspondence between its domain A and its codomain B. Of course, a function is always surjective onto its range R, and the statement that f is surjective means that R = B, where B is the understood codomain.

8. PRODUCT SETS; INDEX NOTATION

One of the characteristic habits of the modern mathematician is that as soon as a new kind of object has been defined and discussed a little, he immediately looks at the set of all such objects. With the notion of a function from A to S well in hand, we naturally consider the set of all functions from A to S, which we designate S^A. Thus ℝ^ℝ is the set of all real-valued functions of one real variable, and S^(ℤ⁺) is the set of all infinite sequences in S. (It is understood that an infinite sequence is nothing but a function whose domain is the set ℤ⁺ of all positive integers.) Similarly, if we set n̄ = {1, . . . , n}, then S^n̄ is the set of all finite sequences of length n in S.
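These definitions are easy to animate for finite graphs. The Python sketch below (our own helpers, not the book's notation) tests whether a set of pairs is a function, and whether a mapping into a stated codomain is injective or surjective.

```python
# A relation is a function iff no first element is repeated.
def is_function(f):
    return len({x for (x, y) in f}) == len(f)

# x |-> x**2 on the domain {-2, -1, 0, 1, 2}, presented by its graph.
square = {(-2, 4), (-1, 1), (0, 0), (1, 1), (2, 4)}
print(is_function(square))    # True

inv = {(y, x) for (x, y) in square}
print(is_function(inv))       # False: inv pairs 4 with both 2 and -2

# For a mapping f: A -> B, injectivity, and surjectivity relative to
# the understood codomain B:
def injective(f):
    return len({y for (x, y) in f}) == len(f)

def surjective(f, B):
    return {y for (x, y) in f} == B

B = {0, 1, 4}
print(injective(square))      # False: both -2 and 2 go to 4
print(surjective(square, B))  # True: the range equals the codomain
```

As the text says, surjectivity is a property of the triple <f, A, B>, which is why `surjective` must be told the codomain while `injective` reads off the graph alone.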
If B is a subset of S, then its characteristic function (relative to S) is the function on S, usually designated χ_B, which has the constant value 1 on B and the constant value 0 off B. The set of all characteristic functions of subsets of S is thus 2^S (since 2 = {0, 1}). But because this collection of functions is in a natural one-to-one correspondence with the collection of all subsets of S, χ_B corresponding to B, we tend to identify the two collections. Thus 2^S is also interpreted as the set of all subsets of S. We shall spend most of the remainder of this section discussing further similar definitional ambiguities which mathematicians tolerate.

The ordered triple <x, y, z> is usually defined to be the ordered pair <<x, y>, z>. The reason for this definition is probably that a function of two variables x and y is ordinarily considered a function of the single ordered pair variable <x, y>, so that, for example, a real-valued function of two real variables is a subset of (R × R) × R. But we also consider such a function a subset of Cartesian 3-space R³. Therefore, we define R³ as (R × R) × R; that is, we define the ordered triple <x, y, z> as <<x, y>, z>. On the other hand, the ordered triple <x, y, z> could also be regarded as the finite sequence {<1, x>, <2, y>, <3, z>}, which, of course, is a different object. These two models for an ordered triple serve equally well, and, again, mathematicians tend to slur over the distinction. We shall have more to say on this point later when we discuss natural isomorphisms (Section 1.6). For the moment we shall simply regard R³ and R^3̄ as being the same; an ordered triple is something which can be "viewed" as being either an ordered pair of which the first element is an ordered pair or as a sequence of length 3 (or, for that matter, as an ordered pair of which the second element is an ordered pair). Similarly, we pretend that Cartesian 4-space R⁴ is R^4̄, R² × R², or R × R³ = R × ((R × R) × R), etc.
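The distinction being slurred over here is visible in any concrete model of ordered pairs. A small sketch of ours, with Python tuples standing in for ordered pairs:

```python
# Two models of the ordered triple <1, 2, 3>.
nested = ((1, 2), 3)      # <<x, y>, z>: a pair whose first element is a pair
flat = (1, 2, 3)          # the length-3 sequence {<1, x>, <2, y>, <3, z>}

print(nested == flat)     # False: as set-theoretic objects they differ
# Either model carries the same information, so we can convert freely:
x, y, z = nested[0][0], nested[0][1], nested[1]
print((x, y, z) == flat)  # True
```

The two objects are distinct, yet interchangeable in practice; this is exactly the identification the text describes.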
Clearly, we are in effect assuming an associative law for ordered pair formation that we don't really have. This kind of ambiguity, where we tend to identify two objects that really are distinct, is a necessary corollary of deciding exactly what things are. It is one of the prices we pay for the precision of set theory; in days when mathematics was vaguer, there would have been a single fuzzy notion.

The device of indices, which is used frequently in mathematics, also has ambiguous implications which we should examine. An indexed collection, as a set, is nothing but the range set of a function, the indexing function, and a particular indexed object, say x_i, is simply the value of that function at the domain element i. If the set of indices is I, the indexed set is designated {x_i : i ∈ I} or {x_i}_{i∈I} (or {x_j}_{j=1}^∞ in case I = Z⁺). However, this notation suggests that we view the indexed set as being obtained by letting the index run through the index set I and collecting the indexed objects. That is, an indexed set is viewed as being the set together with the indexing function. This ambivalence is reflected in the fact that the same notation frequently designates the mapping. Thus we refer to the sequence {x_n}_{n=1}^∞, where, of course, the sequence is the mapping n ↦ x_n. We believe that if the reader examines his idea of a sequence he will find this ambiguity present. He means neither just the set nor just the mapping, but the mapping with emphasis on its range, or the range "together with" the mapping. But since set theory cannot reflect these nuances in any simple and graceful way, we shall take an indexed set to be the indexing function. Of course, the same range object may be repeated with different indices; there is no implication that an indexing is one-to-one. Note also that indexing imposes no restriction on the set being indexed; any set can at least be self-indexed (by the identity function).
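Returning for a moment to characteristic functions: the identification of 2^S with the set of all subsets of S can also be made concrete. In this sketch of ours, a characteristic function is a dict on S with values in {0, 1}, and the two directions of the correspondence invert each other:

```python
S = [1, 2, 3]

def chi(B):
    """Characteristic function of B relative to S, an element of 2^S."""
    return {x: (1 if x in B else 0) for x in S}

def subset_of(f):
    """Recover B from its characteristic function."""
    return {x for x in S if f[x] == 1}

B = {1, 3}
f = chi(B)
print(f)                   # {1: 1, 2: 0, 3: 1}
print(subset_of(f) == B)   # True: chi and subset_of invert each other
```

The helper names `chi` and `subset_of` are ours; the one-to-one correspondence B ↔ χ_B is the text's.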
Except for the ambiguous '{x_i : i ∈ I}', there is no universally used notation for the indexing function. Since x_i is the value of the function at i, we might think of 'x_i' as another way of writing 'x(i)', in which case we designate the function 'x' or 'x.'. We certainly do this in the case of ordered n-tuplets when we say, "Consider the n-tuplet x = <x₁, ..., x_n>". On the other hand, there is no compelling reason to use this notation. We can call the indexing function anything we want; if it is f, then of course f(i) = x_i for all i.

We come now to the general definition of Cartesian product. Earlier we argued (in a special case) that the Cartesian product A × B × C is the set of all ordered triples x = <x₁, x₂, x₃> such that x₁ ∈ A, x₂ ∈ B, and x₃ ∈ C. More generally, A₁ × A₂ × ⋯ × A_n, or ∏_{i=1}^n A_i, is the set of all ordered n-tuples x = <x₁, ..., x_n> such that x_i ∈ A_i for i = 1, ..., n. If we interpret an ordered n-tuplet as a function on n̄ = {1, ..., n}, we have that ∏_{i=1}^n A_i is the set of all functions x with domain n̄ such that x_i ∈ A_i for all i ∈ n̄. This rephrasal generalizes almost verbatim to give us the notion of the Cartesian product of an arbitrary indexed collection of sets.

Definition. The Cartesian product ∏_{i∈I} S_i of the indexed collection of sets {S_i : i ∈ I} is the set of all functions f with domain the index set I such that f(i) ∈ S_i for all i ∈ I.

We can also use the notation ∏{S_i : i ∈ I} for the product and f_i for the value f(i).

9. COMPOSITION

If we are given maps f: A → B and g: B → C, then the composition of g with f, g∘f, is the map of A into C defined by

(g∘f)(x) = g(f(x)) for all x ∈ A.

This is the "function of a function" operation of elementary calculus. If f and g are the maps from R to R defined by f(x) = x³ + 1 and g(x) = x², then f∘g(x) = (x²)³ + 1 = x⁶ + 1, and g∘f(x) = (x³ + 1)² = x⁶ + 2x³ + 1. Note that the codomain of f must be the domain of g in order for g∘f to be defined. This operation is perhaps the basic binary operation of mathematics.

Lemma.
Composition satisfies the associative law:

f∘(g∘h) = (f∘g)∘h.

Proof. (f∘(g∘h))(x) = f((g∘h)(x)) = f(g(h(x))) = (f∘g)(h(x)) = ((f∘g)∘h)(x) for all x ∈ dom h. □

If A is a set, the identity map I_A: A → A is the mapping taking every x ∈ A to itself. Thus I_A = {<x, x> : x ∈ A}. If f maps A into B, then clearly

f∘I_A = f = I_B∘f.

If g: B → A is such that g∘f = I_A, then we say that g is a left inverse of f and that f is a right inverse of g.

Lemma. If the mapping f: A → B has both a right inverse h and a left inverse g, they must necessarily be equal.

Proof. This is just algebraic juggling and works for any associative operation:

h = I_A∘h = (g∘f)∘h = g∘(f∘h) = g∘I_B = g. □

In this case we call the uniquely determined map g: B → A such that f∘g = I_B and g∘f = I_A the inverse of f. We then have:

Theorem. A mapping f: A → B has an inverse if and only if it is bijective, in which case its inverse is its relational inverse f⁻¹.

Proof. If f is bijective, then the relational inverse f⁻¹ is a function from B to A, and the equations f∘f⁻¹ = I_B and f⁻¹∘f = I_A are obvious. On the other hand, if f∘g = I_B, then f is surjective, since then every y in B can be written y = f(g(y)). And if g∘f = I_A, then f is injective, for then the equation f(x) = f(y) implies that x = g(f(x)) = g(f(y)) = y. Thus f is bijective if it has an inverse. □

Now let S(A) be the set of all bijections f: A → A. Then S(A) is closed under the binary operation of composition and

1) f∘(g∘h) = (f∘g)∘h for all f, g, h ∈ S(A);
2) there exists a unique I ∈ S(A) such that f∘I = I∘f = f for all f ∈ S(A);
3) for each f ∈ S(A) there exists a unique g ∈ S(A) such that f∘g = g∘f = I.

Any set G closed under a binary operation having these properties is called a group with respect to that operation. Thus S(A) is a group with respect to composition.

Composition can also be defined for relations as follows. If R ⊂ A × B and S ⊂ B × C, then S∘R ⊂ A × C is defined by

<x, z> ∈ S∘R ⟺ (∃y)(<x, y> ∈ R and <y, z> ∈ S).
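As an aside of ours, the composition computations above (with f(x) = x³ + 1 and g(x) = x²), the associative law, and the relational composition S∘R can all be spot-checked mechanically:

```python
def f(x): return x**3 + 1
def g(x): return x**2

def compose(outer, inner):
    """(outer ∘ inner)(x) = outer(inner(x))."""
    return lambda x: outer(inner(x))

x = 2
print(compose(f, g)(x) == x**6 + 1)              # True: f∘g(x) = (x²)³ + 1
print(compose(g, f)(x) == x**6 + 2*x**3 + 1)     # True: g∘f(x) = (x³ + 1)²

# Associativity f∘(g∘h) = (f∘g)∘h, here with h(x) = x + 1:
h = lambda x: x + 1
lhs = compose(f, compose(g, h))
rhs = compose(compose(f, g), h)
print(all(lhs(t) == rhs(t) for t in range(-5, 6)))   # True

# Relation composition: S∘R = {<x, z> : ∃y with <x, y> ∈ R and <y, z> ∈ S}
R = {(1, 'a'), (2, 'a'), (2, 'b')}
S = {('a', 10), ('b', 20)}
SoR = {(x1, z) for (x1, y1) in R for (y2, z) in S if y1 == y2}
print(sorted(SoR))        # [(1, 10), (2, 10), (2, 20)]
```

The spot check of associativity over a few integers is of course no substitute for the one-line proof above, which holds for all x at once.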
When R and S are themselves mappings, this definition of S∘R agrees with our earlier one.

10. DUALITY

There is another elementary but important phenomenon called duality which occurs in practically all branches of mathematics. Let F: A × B → C be any function of two variables. It is obvious that if x is held fixed, then F(x, y) is a function of the one variable y. That is, for each fixed x there is a function h^x: B → C defined by h^x(y) = F(x, y). Then x ↦ h^x is a mapping φ of A into C^B. Similarly, each y ∈ B yields a function g_y ∈ C^A, where g_y(x) = F(x, y), and y ↦ g_y is a mapping θ from B to C^A.

Now suppose conversely that we are given a mapping φ: A → C^B. For each x ∈ A we designate the corresponding value of φ in index notation as h^x, so that h^x is a function from B to C, and we define F: A × B → C by F(x, y) = h^x(y). We are now back where we started. Thus the mappings φ: A → C^B, F: A × B → C, and θ: B → C^A are equivalent, and can be thought of as three different ways of viewing the same phenomenon. The extreme mappings φ and θ will be said to be dual to each other.

The mapping φ is the indexed family of functions {h^x : x ∈ A} ⊂ C^B. Now suppose that ℱ ⊂ C^B is an unindexed collection of functions on B into C, and define F: ℱ × B → C by F(f, y) = f(y). Then θ: B → C^ℱ takes each y to the evaluation function g_y defined by g_y(f) = f(y). What is happening here is simply that in the expression f(y) we regard both symbols as variables, so that f(y) is a function on ℱ × B. Then when we hold y fixed, we have a function on ℱ mapping into C. We shall see some important applications of this duality principle as our subject develops.

For example, an m × n matrix is a function t = {t_ij} in R^(m̄×n̄). We picture the matrix as a rectangular array of numbers, where i is the row index and j is the column index, so that t_ij is the number at the intersection of the ith row and the jth column. If we hold i fixed, we get the n-tuple forming the ith row, and the matrix can therefore be interpreted as an m-tuple of row n-tuples.
Similarly (dually), it can be viewed as an n-tuple of column m-tuples. In the same vein, an n-tuple of functions from A to B can be regarded as a single n-tuple-valued function from A to Bⁿ, a ↦ <f₁(a), ..., f_n(a)>. In a somewhat different application, duality will allow us to regard a finite-dimensional vector space V as being its own second conjugate space (V*)*.

It is instructive to look at elementary Euclidean geometry from this point of view. Today we regard a straight line as being a set of geometric points. An older and more neutral view is to take points and lines as being two different kinds of primitive objects. Accordingly, let A be the set of all points (so that A is the Euclidean plane as we now view it), and let B be the set of all straight lines. Let F be the incidence function: F(p, l) = 1 if p and l are incident (p is "on" l, l is "on" p) and F(p, l) = 0 otherwise. Thus F maps A × B into {0, 1}. Then for each l ∈ B the function g_l(p) = F(p, l) is the characteristic function of the set of points that we think of as being the line l (g_l(p) has the value 1 if p is on l and 0 if p is not on l). Thus each line determines the set of points that are on it. But, dually, each point p determines the set of lines l "on" it, through its characteristic function h^p(l). Thus, in complete duality we can regard a line as being a set of points and a point as being a set of lines. This duality aspect of geometry is basic in projective geometry.

It is sometimes awkward to invent new notation for the "partial" function obtained by holding a variable fixed in a function of several variables, as we did above when we set g_y(x) = F(x, y), and there is another device that is frequently useful in this situation. This is to put a dot in the position of the "varying variable".
Thus F(a, ·) is the function of one variable obtained from F(x, y) by holding x fixed at the value a, so that in our beginning discussion of duality we have

h^x = F(x, ·),   g_y = F(·, y).

If f is a function of one variable, we can then write f = f(·), and so express the above equations also as h^x(·) = F(x, ·), g_y(·) = F(·, y). The flaw in this notation is that we can't indicate substitution without losing meaning. Thus the value of the function F(x, ·) at b is F(x, b), but from this evaluation we cannot read backward and tell what function was evaluated. We are therefore forced to some such cumbersome notation as F(x, ·)(b), which can get out of hand. Nevertheless, the dot device is often helpful when it can be used without evaluation difficulties. In addition to eliminating the need for temporary notation, as mentioned above, it can also be used, in situations where it is strictly speaking superfluous, to direct the eye at once to the position of the variable. For example, later on D_ξF will designate the directional derivative of the function F in the (fixed) direction ξ. This is a function whose value at α is D_ξF(α), and the notation D_ξF(·) makes this implicitly understood fact explicit.

11. THE BOOLEAN OPERATIONS

Let S be a fixed domain, and let ℱ be a family of subsets of S. The union of ℱ, or the union of all the sets in ℱ, is the set of all elements belonging to at least one set in ℱ. We designate the union ⋃ℱ or ⋃_{A∈ℱ} A, and thus we have

⋃ℱ = {x : (∃A ∈ ℱ)(x ∈ A)},   i.e.,   y ∈ ⋃ℱ ⟺ (∃A ∈ ℱ)(y ∈ A).

We often consider the family ℱ to be indexed. That is, we assume given a set I (the set of indices) and a surjective mapping i ↦ A_i from I to ℱ, so that ℱ = {A_i : i ∈ I}. Then the union of the indexed collection is designated ⋃_{i∈I} A_i or ⋃{A_i : i ∈ I}. The device of indices has both technical and psychological advantages, and we shall generally use it.

If ℱ is finite, and either it or the index set is listed, then a different notation is used for its union.
If ℱ = {A, B}, we designate the union A ∪ B, a notation that displays the listed names. Note that here we have

x ∈ A ∪ B ⟺ x ∈ A or x ∈ B.

If ℱ = {A_i : i = 1, ..., n}, we generally write 'A₁ ∪ A₂ ∪ ⋯ ∪ A_n' or '⋃_{i=1}^n A_i' for ⋃ℱ.

The intersection of the indexed family {A_i}_{i∈I}, designated ⋂_{i∈I} A_i, is the set of all points that lie in every A_i. Thus

x ∈ ⋂_{i∈I} A_i ⟺ (∀i)(x ∈ A_i).

For an unindexed family ℱ we use the notation ⋂ℱ or ⋂_{A∈ℱ} A, and if ℱ = {A, B}, then ⋂ℱ = A ∩ B.

The complement, A′, of a subset A of S is the set of elements x ∈ S not in A: A′ = {x ∈ S : x ∉ A}. The law of De Morgan states that the complement of an intersection is the union of the complements:

(⋂_{i∈I} A_i)′ = ⋃_{i∈I} (A_i)′.

This is an immediate consequence of the rule for negating quantifiers. It is the equivalence between 'not always in' and 'sometimes not in':

[∼(∀i)(x ∈ A_i)] ⟺ [(∃i)(x ∉ A_i)]

says exactly that

x ∈ (⋂ A_i)′ ⟺ x ∈ ⋃ (A_i)′.

If we replace each A_i by B_i′ and take complements again, we obtain the dual form: (⋃_{i∈I} B_i)′ = ⋂_{i∈I} (B_i)′. Other principles of quantification yield the laws

B ∩ (⋃_{i∈I} A_i) = ⋃_{i∈I} (B ∩ A_i),

from P & (∃x)Q(x) ⟺ (∃x)(P & Q(x)), and

B ∪ (⋂_{i∈I} A_i) = ⋂_{i∈I} (B ∪ A_i),
B ∩ (⋂_{i∈I} A_i) = ⋂_{i∈I} (B ∩ A_i),
B ∪ (⋃_{i∈I} A_i) = ⋃_{i∈I} (B ∪ A_i).

In the case of two sets, these laws imply the following familiar laws of set algebra:

(A ∪ B)′ = A′ ∩ B′,   (A ∩ B)′ = A′ ∪ B′   (De Morgan),
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

Even here, thinking in terms of indices makes the laws more intuitive. Thus

(A₁ ∩ A₂)′ = A₁′ ∪ A₂′

is obvious when thought of as the equivalence between 'not always in' and 'sometimes not in'.

The family ℱ is disjoint if distinct sets in ℱ have no elements in common, i.e., if (∀X, Y ∈ ℱ)(X ≠ Y ⟹ X ∩ Y = ∅). For an indexed family {A_i}_{i∈I} the condition becomes i ≠ j ⟹ A_i ∩ A_j = ∅. If ℱ = {A, B}, we simply say that A and B are disjoint.

Given f: U → V and an indexed family {B_i} of subsets of V, we have the following important identities:

f⁻¹[⋃_i B_i] = ⋃_i f⁻¹[B_i],
f⁻¹[⋂_i B_i] = ⋂_i f⁻¹[B_i],

and, for a single set B ⊂ V,

f⁻¹[B′] = (f⁻¹[B])′.
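These identities, like De Morgan's laws, can be spot-checked on small finite sets. The following sketch (ours, not the text's) does so for a particular f: U → V:

```python
U = {0, 1, 2, 3, 4, 5}
V = {0, 1, 2, 3}
f = {x: x % 4 for x in U}              # a map f: U -> V, here x |-> x mod 4

def preimage(B):
    return {x for x in U if f[x] in B}

B1, B2 = {0, 1}, {1, 2}

# f^-1 commutes with union, intersection, and complement:
print(preimage(B1 | B2) == preimage(B1) | preimage(B2))   # True
print(preimage(B1 & B2) == preimage(B1) & preimage(B2))   # True
print(preimage(V - B1) == U - preimage(B1))               # True

# De Morgan in V: (B1 ∩ B2)' = B1' ∪ B2'
print(V - (B1 & B2) == (V - B1) | (V - B2))               # True
```

The particular f and the sets B₁, B₂ are arbitrary choices; the identities hold for any f and any indexed family, as the quantifier arguments above show.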
For example,

x ∈ f⁻¹[⋂ B_i] ⟺ f(x) ∈ ⋂ B_i ⟺ (∀i)(f(x) ∈ B_i) ⟺ (∀i)(x ∈ f⁻¹[B_i]) ⟺ x ∈ ⋂ f⁻¹[B_i].

12. PARTITIONS AND EQUIVALENCE RELATIONS

A partition, or fibering, of a set A is a disjoint family ℱ of nonempty subsets of A whose union is A; the sets of ℱ are called fibers. If f: A → B is any surjective mapping, and if for each value y in B we set A_y = f⁻¹[{y}] = {x ∈ A : f(x) = y}, then ℱ = {A_y : y ∈ B} is a fibering of A, and g: y ↦ A_y is a bijection from B to ℱ. Moreover, f is then just the projection π: A → ℱ, which takes each x in A to its fiber (the set of all z in A such that f(z) = f(x)), followed by g⁻¹, since g⁻¹(π(x)) = f(x).

The above process of generating a fibering of A from a function on A is relatively trivial. A more important way of obtaining a fibering of A is from an equality-like relation on A called an equivalence relation. An equivalence relation ≈ on A is a binary relation which is reflexive (x ≈ x for every x ∈ A), symmetric (x ≈ y ⟹ y ≈ x), and transitive (x ≈ y and y ≈ z ⟹ x ≈ z). Every fibering ℱ of A generates a relation ≈ by the stipulation that x ≈ y if and only if x and y are in the same fiber, and obviously ≈ is an equivalence relation. The most important fact to be established in this section is the converse.

Theorem. Every equivalence relation ≈ on A is the equivalence relation of a fibering.

Proof. We obviously have to define x̄ as the set of elements y equivalent to x, x̄ = {y : y ≈ x}, and our problem is to show that the family ℱ of all subsets of A obtained this way is a fibering. The reflexive, symmetric, and transitive laws become

x ∈ x̄,   x ∈ ȳ ⟹ y ∈ x̄,   and   x ∈ ȳ and y ∈ z̄ ⟹ x ∈ z̄.

Reflexivity thus implies that ℱ covers A. Transitivity says that if y ∈ z̄, then x ∈ ȳ ⟹ x ∈ z̄; that is, if y ∈ z̄, then ȳ ⊂ z̄. But also, if y ∈ z̄, then z ∈ ȳ by symmetry, and so z̄ ⊂ ȳ. Thus y ∈ z̄ implies ȳ = z̄. Therefore, if two of our sets ā and b̄ have a point x in common, then ā = x̄ = b̄. In other words, if ā is not the set b̄, then ā and b̄ are disjoint, and we have a fibering. □

The fundamental role this argument plays in mathematics is due to the fact that in many important situations equivalence relations occur as the primary object, and then are used to define partitions and functions. We give two examples.

Let Z be the integers (positive, negative, and zero).
A fraction 'm/n' can be considered an ordered pair <m, n> of integers with n ≠ 0. The set of all fractions is thus Z × (Z − {0}). Two fractions <m, n> and <p, q> are "equal" if and only if mq = np, and equality is checked to be an equivalence relation. The equivalence class of <m, n> is the object taken to be the rational number m/n. Thus the rational number system Q is the set of fibers in a partition of Z × (Z − {0}).

Next, we choose a fixed integer p ∈ Z and define a relation E on Z by

mEn ⟺ p divides m − n.

Then E is an equivalence relation, and the set Z_p of its equivalence classes is called the integers modulo p. It is easy to see that mEn if and only if m and n have the same remainder when divided by p, so that in this case there is an easily calculated function f, where f(m) is the remainder after dividing m by p, which defines the fibering. The set of possible remainders is {0, 1, ..., p − 1}, so that Z_p contains p elements.

A function on a set A can be "factored" through a fibering of A by the following theorem.

Theorem. Let g be a function on A, and let ℱ be a fibering of A. Then g is constant on each fiber of ℱ if and only if there exists a function ḡ on ℱ such that g = ḡ ∘ π.

Proof. If g is constant on each fiber of ℱ, then the association of this unique value with the fiber defines the function ḡ, and clearly g = ḡ ∘ π. The converse is obvious. □

CHAPTER 1

VECTOR SPACES

The calculus of functions of more than one variable unites the calculus of one variable, which the reader presumably knows, with the theory of vector spaces, and the adequacy of its treatment depends directly on the extent to which vector space theory really is used. The theories of differential equations and differential geometry are similarly based on a mixture of calculus and vector space theory.
Such "vector calculus" and its applications constitute the subject matter of this book, and in order for our treatment to be completely satisfactory, we shall have to spend considerable time at the beginning studying vector spaces themselves. This we do principally in the first two chapters. The present chapter is devoted to general vector spaces and the next chapter to finite-dimensional spaces.

We begin this chapter by introducing the basic concepts of the subject — vector spaces, vector subspaces, linear combinations, and linear transformations — and then relate these notions to the lines and planes of geometry. Next we establish the most elementary formal properties of linear transformations and Cartesian product vector spaces, and take a brief look at quotient vector spaces. This brings us to our first major objective, the study of direct sum decompositions, which we undertake in the fifth section. The chapter concludes with a preliminary examination of bilinearity.

1. FUNDAMENTAL NOTIONS

Vector spaces and subspaces. The reader probably has already had some contact with the notion of a vector space. Most beginning calculus texts discuss geometric vectors, which are represented by "arrows" drawn from a chosen origin O. These vectors are added geometrically by the parallelogram rule: the sum of the vector OA (represented by the arrow from O to A) and the vector OB is the vector OP, where P is the vertex opposite O in the parallelogram having OA and OB as two sides (Fig. 1.1). Vectors can also be multiplied by numbers: x(OA) is that vector OB such that B is on the line through O and A, the distance from O to B is |x| times the distance from O to A, and B and A are on the same side of O if x is positive, and on opposite sides if x is negative (Fig. 1.2). These two vector operations satisfy certain laws of algebra, which we shall soon state in the definition.
The geometric proofs of these laws are generally sketchy, consisting more of plausibility arguments than of airtight logic. For example, the geometric figure in Fig. 1.3 is the essence of the usual argument that vector addition is associative. In each case the final vector OX is the diagonal from O in the parallelepiped constructed from the three edges OA, OB, and OC. The set of all geometric vectors, together with these two operations and the laws of algebra that they satisfy, constitutes one example of a vector space. We shall return to this situation in Section 2.

The reader may also have seen coordinate triples treated as vectors. In this system a three-dimensional vector is an ordered triple of numbers <x₁, x₂, x₃> which we think of geometrically as the coordinates of a point in space. Addition is now algebraically defined,

<x₁, x₂, x₃> + <y₁, y₂, y₃> = <x₁ + y₁, x₂ + y₂, x₃ + y₃>,

as is multiplication by numbers,

c<x₁, x₂, x₃> = <cx₁, cx₂, cx₃>.

The vector laws are much easier to prove for these objects, since they are almost algebraic formalities. The set R³ of all ordered triples of numbers, together with these two operations, is a second example of a vector space.

If we think of an ordered triple <x₁, x₂, x₃> as a function x with domain the set of integers from 1 to 3, where x_i is the value of the function x at i (see Section 0.8), then this vector space suggests a general type called a function space, which we shall examine after the definition. For the moment we remark only that we defined the sum of the triple x and the triple y as that triple z such that z_i = x_i + y_i for every i.

A vector space, then, is a collection of objects that can be added to each other and multiplied by numbers, subject to certain laws of algebra. In this context a number is often called a scalar.

Definition. Let V be a set, and let there be given a mapping <α, β> ↦ α + β from V × V to V, called addition, and a mapping <x, α> ↦ xα from R × V to V, called multiplication by scalars. Then V is a vector space with respect to these two operations if:

A1. α + (β + γ) = (α + β) + γ for all α, β, γ ∈ V.
A2. α + β = β + α for all α, β ∈ V.
A3. There exists an element 0 ∈ V such that α + 0 = α for all α ∈ V.
A4. For every α ∈ V there exists a β ∈ V such that α + β = 0.
S1. (xy)α = x(yα) for all x, y ∈ R, α ∈ V.
S2. (x + y)α = xα + yα for all x, y ∈ R, α ∈ V.
S3. x(α + β) = xα + xβ for all x ∈ R, α, β ∈ V.
S4. 1α = α for all α ∈ V.

In contexts where it is clear (as it generally is) which operations are intended, we refer simply to the vector space V.

Certain further properties of a vector space follow directly from the axioms. Thus the zero element postulated in A3 is unique, and for each α the β of A4 is unique, and is called −α. Also 0α = 0, x0 = 0, and (−1)α = −α. These elementary consequences are considered in the exercises.

Our standard example of a vector space will be the set V = R^A of all real-valued functions on a set A under the natural operations of addition of two functions and multiplication of a function by a number. This generalizes the example R^{1,2,3} = R³ that we looked at above. Remember that a function f in R^A is simply a mathematical object of a certain kind. We are saying that two of these objects can be added together in a natural way to form a third such object, and that the set of all such objects then satisfies the above laws for addition. Of course, f + g is defined as the function whose value at a is f(a) + g(a), so that (f + g)(a) = f(a) + g(a) for all a in A. For example, in R³ we defined the sum x + y as that triple whose value at i is x_i + y_i for all i. Similarly, cf is the function defined by (cf)(a) = c(f(a)) for all a. Laws A1 through S4 follow at once from these definitions and the corresponding laws of algebra for the real number system. For example, the equation (s + t)f = sf + tf means
that ((s + t)f)(a) = (sf + tf)(a) for all a ∈ A. But

((s + t)f)(a) = (s + t)(f(a)) = s(f(a)) + t(f(a)) = (sf)(a) + (tf)(a) = (sf + tf)(a),

where we have used the definition of scalar multiplication in R^A, the distributive law in R, the definition of scalar multiplication in R^A, and the definition of addition in R^A, in that order. Thus we have S2, and the other laws follow similarly.

The set A can be anything at all. If A = R, then V = R^R is the vector space of all real-valued functions of one real variable. If A = R × R, then V = R^(R×R) is the space of all real-valued functions of two real variables. If A = {1, 2} = 2̄, then V = R^2̄ = R² is the Cartesian plane, and if A = {1, ..., n} = n̄, then V = R^n̄ = Rⁿ is Cartesian n-space. If A contains a single point, then R^A is a natural bijective image of R itself, and of course R is trivially a vector space with respect to its own operations.

Now let V be any vector space, and suppose that W is a nonempty subset of V that is closed under the operations of V. That is, if α and β are in W, then so is α + β, and if α is in W, then so is xα for every scalar x. For example, let V be the vector space R^[a,b] of all real-valued functions on the closed interval [a, b] ⊂ R, and let W be the set C([a, b]) of all continuous real-valued functions on [a, b]. Then W is a subset of V that is closed under the operations of V, since f + g and cf are continuous whenever f and g are. Or let V be Cartesian 2-space R², and let W be the set of ordered pairs x = <x₁, x₂> such that x₁ + x₂ = 0. Clearly, W is closed under the operations of V.

Such a subset W is always a vector space in its own right. The universally quantified laws A1, A2, and S1 through S4 hold in W because they hold in the larger set V. And since there is some β in W, it follows that 0 = 0β is in W because W is closed under multiplication by scalars. For the same reason, if α is in W, then so is −α = (−1)α. Therefore, A3 and A4 also hold, and we see that W is a vector space. We have proved the following lemma.

Lemma. If W is a nonempty subset of a vector space V which is closed under the operations of V, then W is itself a vector space.

We call W a subspace of V. Thus C([a, b]) is a subspace of R^[a,b], and the pairs <x₁, x₂> such that x₁ + x₂ = 0 form a subspace of R². Subspaces will be with us from now to the end.
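As an illustrative aside of ours, the closure verifications for the subspace W = {x : x₁ + x₂ = 0} of R² can be carried out numerically, with pairs of numbers standing in for vectors:

```python
def in_W(v):
    """Membership test for W = {<x1, x2> : x1 + x2 = 0}."""
    return v[0] + v[1] == 0

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])   # the vector operations of R^2,

def scale(c, v):
    return (c * v[0], c * v[1])         # defined coordinatewise

u, v = (3, -3), (-1, 1)
print(in_W(u) and in_W(v))     # True
print(in_W(add(u, v)))         # True: W is closed under addition
print(in_W(scale(7, u)))       # True: and under scalar multiples
print(in_W(scale(0, u)))       # True: in particular 0 = 0u lies in W
```

Checking two particular vectors is of course not a proof; the lemma's argument is what shows closure for all of W at once.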
A subspace of a vector space R^A is called a function space. In other words, a function space is a collection of real-valued functions on a common domain which is closed under addition and multiplication by scalars.

What we have defined so far ought to be called the notion of a real vector space or a vector space over R. There is an analogous notion of a complex vector space, for which the scalars are the complex numbers. Then laws S1 through S4 refer to multiplication by complex numbers, and the space C^A of all complex-valued functions on A is the standard example. In fact, if the reader knew what is meant by a field F, we could give a single general definition of a vector space over F, where scalar multiplication is by the elements of F, and the standard example is the space V = F^A of all functions from A to F. Throughout this book it will be understood that a vector space is a real vector space unless explicitly stated otherwise. However, much of the analysis holds as well for complex vector spaces, and most of the pure algebra is valid for any scalar field F.

EXERCISES

1.1 Sketch the geometric figure representing law S3, x(OA + OB) = x(OA) + x(OB), for geometric vectors. Assume that x > 1.
1.2 Prove S3 for R³ using the explicit displayed form <x₁, x₂, x₃> for ordered triples.
1.3 The vector 0 postulated in A3 is unique, as elementary algebraic fiddling will show. For suppose that 0′ also satisfies A3. Then

0′ = 0′ + 0 (A3 for 0)
   = 0 + 0′ (A2)
   = 0 (A3 for 0′).

Show by similar algebraic juggling that, given α, the β postulated in A4 is unique. This unique β is designated −α.
1.4 Prove similarly that 0α = 0, x0 = 0, and (−1)α = −α.
1.5 Prove that if xα = 0, then either x = 0 or α = 0.
1.6 Prove S1 for function space R^A. Prove S3.
1.7 Given that α is any vector in a vector space V, show that the set {xα : x ∈ R} of all scalar multiples of α is a subspace of V.
1.8
Given that α and β are any two vectors in V, show that the set of all vectors xα + yβ, where x and y are any real numbers, is a subspace of V.
1.9 Show that the set of triples x in R³ such that x₁ − x₂ + 2x₃ = 0 is a subspace M. If N is the similar subspace {x : x₁ + x₂ + x₃ = 0}, find a nonzero vector α in M ∩ N. Show that M ∩ N is the set {xα : x ∈ R} of all scalar multiples of α.
1.10 Let A be the open interval (0, 1), and let V be R^A. Given a point x in (0, 1), let V_x be the set of functions in V that have a derivative at x. Show that V_x is a subspace of V.
1.11 For any subsets A and B of a vector space V we define the set sum A + B by

A + B = {α + β : α ∈ A and β ∈ B}.

Show that (A + B) + C = A + (B + C).
1.12 If A ⊂ V and X ⊂ R, we similarly define XA = {xα : x ∈ X and α ∈ A}. Show that a nonvoid set A is a subspace if and only if A + A = A and RA = A.
1.13 Let V be R², and let M be the line through the origin with slope k. Let x be any nonzero vector in M. Show that M is the subspace Rx = {tx : t ∈ R}.
1.14 Show that any other line L with the same slope k is of the form M + a for some a.
1.15 Let M be a subspace of a vector space V, and let α and β be any two vectors in V. Given A = α + M and B = β + M, show that either A = B or A ∩ B = ∅. Show also that A + B = (α + β) + M.
1.16 State more carefully and prove what is meant by "a subspace of a subspace is a subspace".
1.17 Prove that the intersection of two subspaces of a vector space is always itself a subspace.
1.18 Prove more generally that the intersection W = ⋂_{i∈I} W_i of any family {W_i : i ∈ I} of subspaces of V is a subspace of V.
1.19 Let V again be R^(0,1), and let W be the set of all functions f in V such that f′(x) exists for every x in (0, 1). Show that W is the intersection of the collection of subspaces of the form V_x that were considered in Exercise 1.10.
1.20 Let V be a function space R^A, and for a point a in A let W_a be the set of functions such that f(a) = 0. W_a is clearly a subspace. For a subset B ⊂ A let W_B be the set of functions f in V such that f = 0 on B.
Show that W_B is the intersection ⋂_{a∈B} W_a.
1.21 Supposing again that X and Y are subspaces of V, show that if X + Y = V and X ∩ Y = {0}, then for every vector ζ in V there is a unique pair of vectors ξ ∈ X and η ∈ Y such that ζ = ξ + η.
1.22 Show that if X and Y are subspaces of a vector space V, then the union X ∪ Y can only be a subspace if either X ⊂ Y or Y ⊂ X.

Linear combinations and linear span. Because of the commutative and associative laws for vector addition, the sum of a finite set of vectors is the same for all possible ways of adding them. For example, the sum of the three vectors α_a, α_b, α_c can be calculated in 12 ways, all of which give the same result:

(α_a + α_b) + α_c = α_a + (α_b + α_c) = (α_a + α_c) + α_b = α_c + (α_a + α_b), etc.

Therefore, if I = {a, b, c} is the set of indices used, the notation ∑_{i∈I} α_i, which indicates the sum without telling us how we got it, is unambiguous. In general, for any finite indexed set of vectors {α_i : i ∈ I} there is a uniquely determined sum vector ∑_{i∈I} α_i. A linear combination of the vectors α_i is any sum of the form ∑_{i∈I} x_i α_i, where the coefficients x_i are scalars.

Consider now the set of all linear combinations of the two vectors <1, 1, 1> and <0, 1, −1> in R³. It is the set of all vectors s<1, 1, 1> + t<0, 1, −1> = <s, s + t, s − t>, where s and t are any real numbers. Thus L = {<s, s + t, s − t> : s, t ∈ R}. It will be clear on inspection that L is closed under addition and scalar multiplication, and therefore is a subspace of R³. Also, L contains each of the two given vectors, with coefficient pairs <1, 0> and <0, 1>, respectively. Finally, any subspace M of R³ which contains each of the two given vectors will also contain all of their linear combinations, and so will include L. That is, L is the smallest subspace of R³ containing <1, 1, 1> and <0, 1, −1>. It is called the linear span of the two vectors, or the subspace generated by the two vectors. In general, we have the following theorem.

Theorem 1.1.
If A is a nonempty subset of a vector space V, then the set L(A) of all linear combinations of the vectors in A is a subspace, and it is the smallest subspace of V which includes the set A.

Proof. Suppose first that A is finite. We can assume that we have indexed A in some way, so that A = {aᵢ : i ∈ I} for some finite index set I, and every element of L(A) is of the form Σ_{i∈I} xᵢaᵢ. Then we have (Σ xᵢaᵢ) + (Σ yᵢaᵢ) = Σ (xᵢ + yᵢ)aᵢ, because the left-hand side becomes Σᵢ (xᵢaᵢ + yᵢaᵢ) when it is regrouped by pairs, and then S2 gives the right-hand side. We also have c(Σ xᵢaᵢ) = Σ (cxᵢ)aᵢ by S3 and mathematical induction. Thus L(A) is closed under addition and multiplication by scalars and hence is a subspace. Moreover, L(A) contains each aᵢ (why?) and so includes A. Finally, if a subspace W includes A, then it contains each linear combination Σ xᵢaᵢ, so it includes L(A). Therefore, L(A) can be directly characterized as the uniquely determined smallest subspace which includes the set A.

If A is infinite, we obviously can't use a single finite listing. However, the sum (Σ₁ⁿ xᵢαᵢ) + (Σ₁ᵐ yⱼβⱼ) of two linear combinations of elements of A is clearly a finite sum of scalars times elements of A. If we wish, we can rewrite it as Σ₁ⁿ⁺ᵐ xᵢαᵢ, where we have set αₙ₊ⱼ = βⱼ and xₙ₊ⱼ = yⱼ for j = 1, ..., m. In any case, L(A) is again closed under addition and multiplication by scalars and so is a subspace. □

We call L(A) the linear span of A. If L(A) = V, we say that A spans V; V is finite-dimensional if it has a finite spanning set. If V = R³, and if δ¹, δ², and δ³ are the "unit points on the axes", δ¹ = <1, 0, 0>, δ² = <0, 1, 0>, and δ³ = <0, 0, 1>, then {δⁱ}₁³ spans V, since x = <x₁, x₂, x₃> = <x₁, 0, 0> + <0, x₂, 0> + <0, 0, x₃> = x₁δ¹ + x₂δ² + x₃δ³ = Σ₁³ xᵢδⁱ for every x in R³. More generally, if V = Rⁿ and δʲ is the n-tuple having 1 in the jth place and 0 elsewhere, then we have similarly that x = <x₁, ..., xₙ> = Σ₁ⁿ xⱼδʲ, so that {δʲ}₁ⁿ spans Rⁿ. Thus Rⁿ is finite-dimensional. In general, a function space on an infinite set A will not be finite-dimensional.
For example, it is true but not obvious that C([a, b]) has no finite spanning set.

EXERCISES

1.23 Given α = <1, 1, 1>, β = <0, 1, −1>, γ = <2, 0, 1>, compute the linear combinations α + β + γ, 3α − 2β + γ, xα + yβ + zγ. Find x, y, and z such that xα + yβ + zγ = <0, 0, 1> = δ³. Do the same for δ¹ and δ².
1.24 Given α = <1, 1, 1>, β = <0, 1, −1>, γ = <1, 0, 2>, show that each of α, β, γ is a linear combination of the other two. Show that it is impossible to find coefficients x, y, and z such that xα + yβ + zγ = δ¹.
1.25 a) Find the linear combination of the set A = {x, x − 1, x² + 1} with coefficient triple <2, −1, 1>. Do the same for <0, 1, 1>.
b) Find the coefficient triple for which the linear combination of the triple is (x + 1)². Do the same for 1.
c) Show in fact that any polynomial of degree ≤ 2 is a linear combination of the polynomials in A.
1.26 Find the linear combination f of {eˣ, e⁻ˣ} ⊂ R^R such that f(0) = 1 and f′(0) = 2.
1.27 Find a linear combination f of sin x, cos x, and eˣ such that f(0) = 0, f′(0) = 1, and f″(0) = 1.
1.28 Suppose that a sin x + b cos x + c eˣ is the zero function. Prove that a = b = c = 0.
1.29 Prove that <1, 1> and <1, 2> span R².
1.30 Show that the subspace M = {x : x₁ + x₂ = 0} ⊂ R² is spanned by one vector.
1.31 Let M be the subspace {x : x₁ − x₂ + 2x₃ = 0} in R³. Find two vectors α and β in M neither of which is a scalar multiple of the other. Then show that M is the linear span of α and β.
1.32 Find the intersection of the linear span of <1, 1, 1> and <0, 1, −1> in R³ with the coordinate subspace x₂ = 0. Exhibit this intersection as a linear span.
1.33 Do the above exercise with the coordinate subspace replaced by M = {x : x₁ + x₂ = 0}.
1.34 By Theorem 1.1 the linear span L(A) of an arbitrary subset A of a vector space V has the following two properties:
i) L(A) is a subspace of V which includes A;
ii) if M is any subspace which includes A, then L(A) ⊂ M.
Using only (i) and (ii), show that
a) A ⊂ B ⇒ L(A) ⊂ L(B);
b) L(L(A)) = L(A).
1.35
Show that
a) if M and N are subspaces of V, then so is M + N;
b) for any subsets A, B ⊂ V, L(A ∪ B) = L(A) + L(B).
1.36 Remembering (Exercise 1.18) that the intersection of any family of subspaces is a subspace, show that the linear span L(A) of a subset A of a vector space V is the intersection of all the subspaces of V that include A. This alternative characterization is sometimes taken as the definition of linear span.
1.37 By convention, the sum of an empty set of vectors is taken to be the zero vector. This is necessary if Theorem 1.1 is to be strictly correct. Why? What about the preceding problem?

Linear transformations. The general function space R^A and the subspace C([a, b]) of R^{[a,b]} both have the property that, in addition to being closed under the vector operations, they are also closed under the operation of multiplication of two functions. That is, the pointwise product of two functions is again a function [(fg)(a) = f(a)g(a)], and the product of two continuous functions is continuous. With respect to these three operations — addition, multiplication, and scalar multiplication — R^A and C([a, b]) are examples of algebras. If the reader noticed this extra operation, he may have wondered why, at least in the context of function spaces, we bother with the notion of vector space. Why not study all three operations? The answer is that the vector operations are exactly the operations that are "preserved" by many of the most important mappings of sets of functions. For example, define T: C([a, b]) → R by T(f) = ∫ₐᵇ f(t) dt. Then the laws of the integral calculus say that T(f + g) = T(f) + T(g) and T(cf) = cT(f). Thus T "preserves" the vector operations. Or we can say that T "commutes" with the vector operations, since plus followed by T equals T followed by plus. However, T does not preserve multiplication: it is not true in general that T(fg) = T(f)T(g).
Another example is the mapping T: x ↦ y from R³ to R² defined by

y₁ = 2x₁ − x₂ + x₃,
y₂ = x₁ + 3x₂ − x₃,

for which we can again verify that T(x + y) = T(x) + T(y) and T(cx) = cT(x). The theory of the solvability of systems of linear equations is essentially the theory of such mappings T; thus we have another important type of mapping that preserves the vector operations (but not products).

These remarks suggest that we study vector spaces in part so that we can study mappings which preserve the vector operations. Such mappings are called linear transformations.

Definition. If V and W are vector spaces, then a mapping T: V → W is a linear transformation or a linear map if T(α + β) = T(α) + T(β) for all α, β ∈ V, and T(xα) = xT(α) for all α ∈ V, x ∈ R.

These two conditions on T can be combined into the single equation

T(xα + yβ) = xT(α) + yT(β) for all α, β ∈ V and all x, y ∈ R.

Moreover, this equation can be extended to any finite sum by induction, so that if T is linear, then

T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ)

for any linear combination Σ xᵢαᵢ. For example, ∫ₐᵇ (Σ₁ⁿ cᵢfᵢ) = Σ₁ⁿ cᵢ ∫ₐᵇ fᵢ.

EXERCISES

1.38 Show that the most general linear map from R to R is multiplication by a constant.
1.39 For a fixed α in V the mapping x ↦ xα from R to V is linear. Why?
1.40 Why is the mapping α ↦ xα from V to V linear when the scalar x is held fixed?
1.41 Show that every linear mapping from R to V is of the form x ↦ xα for a fixed vector α in V.
1.42 Show that every linear mapping from R² to V is of the form x ↦ x₁α₁ + x₂α₂ for a fixed pair of vectors α₁ and α₂ in V. What is the range of this mapping?
1.43 Show that the map f ↦ ∫ₐᵇ f(t) dt from C([a, b]) to R does not preserve products.
1.44 Let g be any fixed function in R^A. Prove that the mapping T: R^A → R^A defined by T(f) = gf is linear.
1.45 Let φ be any mapping from a set A to a set B. Show that composition by φ is a linear mapping from R^B to R^A. That is, show that T: R^B → R^A defined by T(f) = f ∘ φ is linear.
In order to acquire a supply of examples, we shall find all linear transformations having Rⁿ as domain space. It may be well to start by looking at one such transformation. Suppose we choose some fixed triple of functions {fⱼ} in the space R^R of all real-valued functions on R, say f₁(t) = sin t, f₂(t) = cos t, and f₃(t) = eᵗ = exp(t). Then for each triple of numbers x = {xⱼ} in R³ we have the linear combination Σ₁³ xⱼfⱼ with {xⱼ} as coefficients. This is the element of R^R whose value at t is Σ₁³ xⱼfⱼ(t) = x₁ sin t + x₂ cos t + x₃eᵗ. Different coefficient triples give different functions, and the mapping x ↦ Σ₁³ xⱼfⱼ = x₁ sin + x₂ cos + x₃ exp is thus a mapping from R³ to R^R. It is clearly linear. If we call this mapping T, then we can recover the determining triple of functions from T as the images of the "unit points" δʲ in R³: T(δʲ) = Σᵢ δʲᵢ fᵢ = fⱼ, and so T(δ¹) = sin, T(δ²) = cos, and T(δ³) = exp. We are going to see that every linear mapping from R³ to R^R is of this form.

In the following theorem {δʲ}₁ⁿ is the spanning set for Rⁿ that we defined earlier, so that x = Σ₁ⁿ xⱼδʲ for every n-tuple x = <x₁, ..., xₙ> in Rⁿ.

Theorem 1.2. If {βⱼ}₁ⁿ is any fixed n-tuple of vectors in a vector space W, then the "linear combination mapping" x ↦ Σ₁ⁿ xⱼβⱼ is a linear transformation T from Rⁿ to W, and T(δʲ) = βⱼ for j = 1, ..., n. Conversely, if T is any linear mapping from Rⁿ to W, and if we set βⱼ = T(δʲ) for j = 1, ..., n, then T is the linear combination mapping x ↦ Σ₁ⁿ xⱼβⱼ.

Proof. The linearity of the linear combination map T follows by exactly the same argument that we used in Theorem 1.1 to show that L(A) is a subspace. Thus

T(x + y) = Σ₁ⁿ (xᵢ + yᵢ)βᵢ = Σ₁ⁿ xᵢβᵢ + Σ₁ⁿ yᵢβᵢ = T(x) + T(y),

and

T(cx) = Σ₁ⁿ (cxᵢ)βᵢ = c Σ₁ⁿ xᵢβᵢ = cT(x).

Also T(δʲ) = Σᵢ₌₁ⁿ δʲᵢ βᵢ = βⱼ, since δʲⱼ = 1 and δʲᵢ = 0 for i ≠ j.

Conversely, if T: Rⁿ → W is linear, and if we set βⱼ = T(δʲ) for all j, then for any x = <x₁, ..., xₙ> in Rⁿ we have T(x) = T(Σ₁ⁿ xⱼδʲ) = Σ₁ⁿ xⱼT(δʲ) = Σ₁ⁿ xⱼβⱼ. Thus T is the mapping x ↦ Σ₁ⁿ xⱼβⱼ. □
This is a tremendously important theorem, simple though it may seem, and the reader is urged to fix it in his mind. To this end we shall invent some terminology that we shall stay with for the first three chapters. If α = {αᵢ}₁ⁿ is an n-tuple of vectors in a vector space W, let L_α be the corresponding linear combination mapping x ↦ Σ₁ⁿ xᵢαᵢ from Rⁿ to W. Note that the n-tuple α itself is an element of Wⁿ. If T is any linear mapping from Rⁿ to W, we shall call the n-tuple {T(δʲ)}₁ⁿ the skeleton of T. In these terms the theorem can be restated as follows.

Theorem 1.2′. For each n-tuple α in Wⁿ, the map L_α: Rⁿ → W is linear and its skeleton is α. Conversely, if T is any linear map from Rⁿ to W, then T = L_β, where β is the skeleton of T.

Or again:

Theorem 1.2″. The map α ↦ L_α is a bijection from Wⁿ to the set of all linear maps T from Rⁿ to W, and T ↦ skeleton(T) is its inverse.

A linear transformation from a vector space V to the scalar field R is called a linear functional on V. Thus f ↦ ∫ₐᵇ f(t) dt is a linear functional on V = C([a, b]). The above theorem is particularly simple for a linear functional F: since W = R, each vector βᵢ = F(δⁱ) in the skeleton of F is simply a number bᵢ, and the skeleton {bᵢ}₁ⁿ is thus an element of Rⁿ. In this case we would write F(x) = Σ₁ⁿ bᵢxᵢ, putting the numerical coefficient bᵢ before the variable xᵢ. Thus F(x) = 3x₁ − x₂ + 4x₃ is the linear functional on R³ with skeleton <3, −1, 4>. The set of all linear functionals on Rⁿ is in a natural one-to-one correspondence with Rⁿ itself; we get b from F by bᵢ = F(δⁱ) for all i, and we get F from b by F(x) = Σ bᵢxᵢ for all x in Rⁿ.

We next consider the case where the codomain space of T is a Cartesian space Rᵐ, and in order to keep the two spaces clear in our minds, we shall, for the moment, take the domain space to be R³. Each vector βⱼ = T(δʲ) in the skeleton of T is now an m-tuple of numbers.
If we picture this m-tuple as a column of numbers, then the three m-tuples βⱼ can be pictured as a rectangular array of numbers, consisting of three columns each of m numbers. Let tᵢⱼ be the ith number in the jth column. Then the doubly indexed set of numbers {tᵢⱼ} is called the matrix of the transformation T. We call it an m-by-3 (an m × 3) matrix because the pictured rectangular array has m rows and three columns. The matrix determines T uniquely, since its columns form the skeleton of T. The identity T(x) = Σⱼ xⱼT(δʲ) = Σⱼ xⱼβⱼ allows the m-tuple T(x) to be calculated explicitly from x and the matrix {tᵢⱼ}. Picture multiplying the column m-tuple βⱼ by the scalar xⱼ and then adding across the three columns at the ith row:

y = x₁β₁ + x₂β₂ + x₃β₃ (each βⱼ a column of m numbers).

Since tᵢⱼ is the ith number in the m-tuple βⱼ, the ith number in Σⱼ xⱼβⱼ is Σⱼ tᵢⱼxⱼ. Thus if we let y be the m-tuple T(x), then

yᵢ = Σⱼ₌₁³ tᵢⱼxⱼ for i = 1, ..., m,

and this set of m scalar equations is equivalent to the one vector equation y = T(x).

We can now replace three by n in the above discussion without changing anything except the diagram, and thus obtain the following specialization of Theorem 1.2.

Theorem 1.3. Every linear mapping T from Rⁿ to Rᵐ determines the m × n matrix t = {tᵢⱼ} having the skeleton of T as its columns, and the expression of the equation y = T(x) in linear combination form is equivalent to the m scalar equations

yᵢ = Σⱼ₌₁ⁿ tᵢⱼxⱼ for i = 1, ..., m.

Conversely, each m × n matrix t determines the linear combination mapping having the columns of t as its skeleton, and the mapping t ↦ T is therefore a bijection from the set of all m × n matrices to the set of all linear maps from Rⁿ to Rᵐ.

A linear functional F on Rⁿ is a linear mapping from Rⁿ to R, so it must be expressed by a 1 × n matrix. That is, the n-tuple b in Rⁿ which is the skeleton of F is viewed as a matrix of one row and n columns.
As a final example of linear maps, we look at an important class of special linear functionals defined on any function space, the so-called coordinate functionals. If V = R^I and i ∈ I, then the ith coordinate functional πᵢ is simply evaluation at i, so that πᵢ(f) = f(i). These functionals are obviously linear. In fact, the vector operations on functions were defined to make them linear; since sf + tg is defined to be that function whose value at i is sf(i) + tg(i) for all i, we see that sf + tg is by definition that function such that πᵢ(sf + tg) = sπᵢ(f) + tπᵢ(g) for all i!

If V is Rⁿ, then πⱼ is the mapping x = <x₁, ..., xₙ> ↦ xⱼ. In this case we know from the theorem that πⱼ must be of the form πⱼ(x) = Σᵢ₌₁ⁿ bᵢxᵢ for some n-tuple b. What is b?

The general form of the linearity property, T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ), shows that T and T⁻¹ both carry subspaces into subspaces.

Theorem 1.4. If T: V → W is linear, then the T-image of the linear span of any subset A ⊂ V is the linear span of the T-image of A: T[L(A)] = L(T[A]). In particular, if A is a subspace, then so is T[A]. Furthermore, if Y is a subspace of W, then T⁻¹[Y] is a subspace of V.

Proof. According to the formula T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ), a vector in W is the T-image of a linear combination on A if and only if it is a linear combination on T[A]. That is, T[L(A)] = L(T[A]). If A is a subspace, then A = L(A) and T[A] = L(T[A]), a subspace of W. Finally, if Y is a subspace of W and {αᵢ} ⊂ T⁻¹[Y], then T(Σ xᵢαᵢ) = Σ xᵢT(αᵢ) ∈ L(Y) = Y. Thus Σ xᵢαᵢ ∈ T⁻¹[Y], and T⁻¹[Y] is its own linear span. □

The subspace T⁻¹(0) = {α ∈ V : T(α) = 0} is called the null space, or kernel, of T, and is designated N(T). The range of T is the subspace T[V] of W. It is designated R(T).

Lemma 1.1. A linear mapping T is injective if and only if its null space is {0}.

Proof. If T is injective, and if α ≠ 0, then T(α) ≠ T(0) = 0, and the null space accordingly contains only 0.
On the other hand, if N(T) = {0}, then whenever α ≠ β, we have α − β ≠ 0, T(α) − T(β) = T(α − β) ≠ 0, and T(α) ≠ T(β); this shows that T is injective. □

A linear map T: V → W which is bijective is called an isomorphism. Two vector spaces V and W are isomorphic if and only if there exists an isomorphism between them. For example, the map <c₀, ..., cₙ> ↦ Σ₀ⁿ cᵢxⁱ is an isomorphism of Rⁿ⁺¹ with the vector space of all polynomials of degree ≤ n. Isomorphic spaces "have the same form", and are identical as abstract vector spaces. That is, they cannot be distinguished from each other solely on the basis of vector properties which they do or do not have.

When a linear transformation is from V to itself, special things can happen. One possibility is that T can map a vector α essentially to itself, T(α) = xα for some x in R. In this case α is called an eigenvector (proper vector, characteristic vector), and x is the corresponding eigenvalue.

EXERCISES

1.46 In the situation of Exercise 1.45, show that T is an isomorphism if φ is bijective. Show more generally that
a) φ injective ⇒ T surjective,
b) φ surjective ⇒ T injective.
1.47 Find the linear functional l on R² such that l(<1, 1>) = 0 and l(<1, 2>) = 1. That is, find b = <b₁, b₂> in R² such that l is the linear combination map x ↦ b₁x₁ + b₂x₂.
1.48 Do the same for l(<2, 1>) = −3 and l(<1, 2>) = 4.
1.49 Find the linear T: R² → R^R such that T(<1, 1>) = α and T(<1, 2>) = β. That is, find the functions f₁(t) and f₂(t) such that T is the linear combination map x ↦ x₁f₁ + x₂f₂.
1.50 Let T be the linear map from R² to R³ such that T(δ¹) = <2, −1, 1> and T(δ²) = <1, 0, 3>. Write down the matrix of T in standard rectangular form. Determine whether or not δ¹ is in the range of T.
1.51 Let T be the linear map from R³ to R³ whose matrix is

1  0  2
2  0 −1
3 −1  0

Find T(x) when x = <1, 1, 0>; do the same for x = <3, −2, 1>.
1.52 Let M be the linear span of <1, −1, 0> and <0, 1, 1>.
Find the subspace T[M] by finding two vectors spanning it, where T is as in the above exercise.
1.53 Let T be the map <x, y> ↦ <y, x> from R² to itself. Show that T is a linear combination mapping, and write down its matrix in standard form.
1.54 Do the same for T: <x, y, z> ↦ <x − z, x + z, y> from R³ to itself.
1.55 Find a linear transformation T from R³ to itself whose range space is the span of <1, −1, 0> and <1, 0, 2>.
1.56 Find two linear functionals on R⁴ the intersection of whose null spaces is the linear span of <1, 1, 1, 1> and <1, 0, −1, 0>. You now have in hand a linear transformation whose null space is the above span. What is it?
1.57 Let V = C([a, b]) be the space of continuous real-valued functions on [a, b], and let W = C¹([a, b]) be those having continuous first derivatives. Let D: W → V be differentiation (Df = f′), and define T on V by T(f) = F, where F(x) = ∫ₐˣ f(t) dt. By stating appropriate theorems of the calculus, show that D and T are linear, that T maps V into W, and that D is a left inverse of T (D ∘ T is the identity on V).
1.58 In the above exercise, identify the range of T and the null space of D. We know that D is surjective and that T is injective. Why?
1.59 Let V be the linear span of the functions sin x and cos x. Then the operation of differentiation D is a linear transformation from V to V. Prove that D is an isomorphism from V to V. Show that D² = −I on V.
1.60 a) As the reader would guess, C³(R) is the set of real-valued functions on R having continuous derivatives up to and including the third. Show that f ↦ f‴ is a surjective linear map T from C³(R) to C(R).
b) For any fixed a in R show that f ↦ <f(a), f′(a), f″(a)> is an isomorphism from the null space N(T) to R³. [Hint: Apply Taylor's formula with remainder.]
1.61 An integral analogue of the matrix equations yᵢ = Σⱼ₌₁ⁿ tᵢⱼxⱼ, i = 1, ..., m, is the equation
g(s) = ∫₀¹ K(s, t)f(t) dt, s ∈ [0, 1].

Assuming that K(s, t) is defined on the square [0, 1] × [0, 1] and is continuous as a function of t for each s, check that f ↦ g is a linear mapping from C([0, 1]) to R^{[0,1]}.
1.62 For a finite set A = {αᵢ}, Theorem 1.1 is a corollary of Theorem 1.4. Why?
1.63 Show that the inverse of an isomorphism is linear (and hence is an isomorphism).
1.64 Find the eigenvectors and eigenvalues of T: R² → R² if the matrix of T is

1 1
2 0

Since every scalar multiple xα of an eigenvector α is clearly also an eigenvector, it will suffice to find one vector in each "eigendirection". This is a problem in elementary algebra.
1.65 Find the eigenvectors and eigenvalues of the transformations T whose matrices are [four 2 × 2 matrices, illegible in this copy].
1.66 The five transformations in the above two exercises exhibit four different kinds of behavior according to the number of distinct eigendirections they have. What are the possibilities?
1.67 Let V be the vector space of polynomials of degree ≤ 3, and define T: V → V by f ↦ f′. Find the eigenvectors and eigenvalues of T.

2. VECTOR SPACES AND GEOMETRY

The familiar coordinate systems of analytic geometry allow us to consider geometric entities such as lines and planes in vector settings, and these geometric notions give us valuable intuitions about vector spaces. Before looking at the vector forms of these geometric ideas, we shall briefly review the construction of the coordinate correspondence for three-dimensional Euclidean space. As usual, the confident reader can skip it.

We start with the line. A coordinate correspondence between a line L and the real number system R is determined by choosing arbitrarily on L a zero point O and a unit point Q distinct from O. Then to each point X on L is assigned the number x such that |x| is the distance from O to X, measured in terms of the segment OQ as unit, and x is positive or negative according as X and Q are on the same side of O or on opposite sides.
The mapping X ↦ x is the coordinate correspondence. Now consider three-dimensional Euclidean space E³. We want to set up a coordinate correspondence between E³ and the Cartesian vector space R³. We first choose arbitrarily a zero point O and three unit points Q₁, Q₂, and Q₃ in such a way that the four points do not lie in a plane. Each of the unit points Qᵢ determines a line Lᵢ through O and a coordinate correspondence on this line, as defined above. The three lines L₁, L₂, and L₃ are called the coordinate axes. Consider now any point X in E³. The plane through X parallel to L₂ and L₃ intersects L₁ at a point X₁, and therefore determines a number x₁, the coordinate of X₁ on L₁. In a similar way, X determines points X₂ on L₂ and X₃ on L₃ which have coordinates x₂ and x₃, respectively. Altogether X determines a triple

x = <x₁, x₂, x₃>

in R³, and we have thus defined a mapping θ: X ↦ x from E³ to R³ (see Fig. 1.4). We call θ the coordinate correspondence defined by the axis system. The convention implicit in our notation above is that θ(Y) is y, θ(A) is a, etc. Note that the unit point Q₁ on L₁ has the coordinate triple δ¹ = <1, 0, 0>, and similarly, that θ(Q₂) = δ² = <0, 1, 0> and θ(Q₃) = δ³ = <0, 0, 1>.

There are certain basic facts about the coordinate correspondence that have to be proved as theorems of geometry before the correspondence can be used to treat geometric questions algebraically. These geometric theorems are quite tricky, and are almost impossible to discuss adequately on the basis of the usual secondary school treatment of geometry. We shall therefore simply assume them. They are:

1) θ is a bijection from E³ to R³.
2) Two line segments AB and XY are equal in length and parallel, and the direction from A to B is the same as that from X to Y, if and only if b − a = y − x (in the vector space R³).

This relationship between line segments is important enough to formalize.
A directed line segment is a geometric line segment, together with a choice of one of the two directions along it. If we interpret AB as the directed line segment from A to B, and if we define the directed line segments AB and XY to be equivalent (and write AB ~ XY) if they are equal in length, parallel, and similarly directed, then (2) can be restated:

AB ~ XY ⇔ b − a = y − x.

3) If X ≠ O, then Y is on the line through O and X in E³ if and only if y = tx for some t in R. Moreover, this t is the coordinate of Y with respect to X as unit point on the line through O and X.
4) If the axis system in E³ is Cartesian, that is, if the axes are mutually perpendicular and a common unit of distance is used, then the length |OX| of the segment OX is given by the so-called Euclidean norm on R³, |OX| = (Σ₁³ xᵢ²)^{1/2}. This follows directly from the Pythagorean theorem. Then this formula and a second application of the Pythagorean theorem to the triangle OXY imply that the segments OX and OY are perpendicular if and only if the scalar product (x, y) = Σᵢ₌₁³ xᵢyᵢ has the value 0 (see Fig. 1.5).

In applying this result, it is useful to note that the scalar product (x, y) is linear as a function of either vector variable when the other is held fixed. Thus

(cx + dy, z) = Σ₁³ (cxᵢ + dyᵢ)zᵢ = c Σ₁³ xᵢzᵢ + d Σ₁³ yᵢzᵢ = c(x, z) + d(y, z).

Exactly the same theorems hold for the coordinate correspondence between the Euclidean plane E² and the Cartesian 2-space R², except that now, of course, (x, y) = Σ₁² xᵢyᵢ = x₁y₁ + x₂y₂.

We can easily obtain the equations for lines and planes in E³ from these basic theorems. First, we see from (2) and (3) that if fixed points A and B are given, with A ≠ O, then the line through B parallel to the segment OA contains the point X if and only if there is a scalar t such that x − b = ta (see Fig. 1.6). Therefore, the equation of this line is

x = ta + b.
This vector equation is equivalent to the three numerical equations xᵢ = aᵢt + bᵢ, i = 1, 2, 3. These are customarily called the parametric equations of the line, since they present the coordinate triple x of the varying point X on the line as functions of the "parameter" t.

Next, we know that the plane through B perpendicular to the direction of the segment OA contains the point X if and only if BX ⊥ OA, and it therefore follows from (2) and (4) that the plane contains X if and only if (x − b, a) = 0 (see Fig. 1.7). But (x − b, a) = (x, a) − (b, a) by the linearity of the scalar product in its first variable, and if we set l = (b, a), we see that the equation of the plane is

(x, a) = l, or Σ₁³ aᵢxᵢ = l.

That is, a point X is on the plane through B perpendicular to the direction of OA if and only if this equation holds for its coordinate triple x. Conversely, if a ≠ 0, then we can retrace the steps taken above to show that the set of points X in E³ whose coordinate triples x satisfy (x, a) = l is a plane.

The fact that R³ has the natural scalar product (x, y) is of course extremely important, both algebraically and geometrically. However, most vector spaces do not have natural scalar products, and we shall deliberately neglect scalar products in our early vector theory (but shall return to them in Chapter 5). This leads us to seek a different interpretation of the equation Σ₁³ aᵢxᵢ = l. We saw in Section 1 that x ↦ Σ₁³ aᵢxᵢ is the most general linear functional f on R³. Therefore, given any plane M in E³, there is a nonzero linear functional f on R³ and a number l such that the equation of M is f(x) = l. And conversely, given any nonzero linear functional f: R³ → R and any l ∈ R, the locus of f(x) = l is a plane M in E³.
The reader will remember that we obtain the coefficient triple a from f by aᵢ = f(δⁱ), since then f(x) = f(Σ₁³ xᵢδⁱ) = Σ₁³ aᵢxᵢ.

Finally, we seek the vector form of the notion of parallel translation. In plane geometry, when we are considering two congruent figures that are parallel and similarly oriented, we often think of obtaining one from the other by "sliding the plane along itself" in such a way that all lines remain parallel to their original positions. This description of a parallel translation of the plane can be more elegantly stated as the condition that every directed line segment slides to an equivalent one. If X slides to Y and O slides to B, then OX slides to BY, so that OX ~ BY and x = y − b by (2). Therefore, the coordinate form of such a parallel sliding is the mapping x ↦ y = x + b. Conversely, for any b in R² the plane mapping defined by x ↦ y = x + b is easily seen to be a parallel translation.

These considerations hold equally well for parallel translations of the Euclidean space E³. It is geometrically clear that under a parallel translation planes map to parallel planes and lines map to parallel lines, and now we can expect an easy algebraic proof. Consider, for example, the plane M with equation f(x) = l; let us ask what happens to M under the translation x ↦ y = x + b. Since x = y − b, we see that the point x is on M if and only if its translate y satisfies the equation f(y − b) = l or, since f is linear, the equation f(y) = l′, where l′ = l + f(b). But this is the equation of a plane N. Thus the translate of M is the plane N.

It is natural to transfer all this geometric terminology from sets in E³ to the corresponding sets in R³, and therefore to speak of the set of ordered triples x satisfying f(x) = l as a set of points in R³ forming a plane in R³, and to call the mapping x ↦ x + b the (parallel) translation of R³ through b, etc. Moreover, since R³ is a vector space, we would expect these geometric ideas to interplay with vector notions.
For instance, translation through b is simply the operation of adding the constant vector b: x ↦ x + b. Thus if M is a plane, then the plane N obtained by translating M through b is just the vector set sum M + b. If the equation of M is f(x) = l, then the plane M goes through 0 if and only if l = 0, in which case M is a vector subspace of R³ (the null space of f). It is easy to see that any plane M is a translate of a plane through 0. Similarly, the line {ta + b : t ∈ R} is the translate through b of the line {ta : t ∈ R}, and this second line is a subspace, the linear span of the one vector a. Thus planes and lines in R³ are translates of subspaces.

These notions all carry over to an arbitrary real vector space in a perfectly satisfactory way and with additional dimensional variety. A plane in R³ through 0 is a vector space which is two-dimensional in a strictly algebraic sense which we shall discuss in the next chapter, and a line is similarly one-dimensional. In R³ there are no proper subspaces other than planes and lines through 0, but in a vector space V with dimension n > 3 proper subspaces occur with all dimensions from 1 to n − 1. We shall therefore use the term "plane" loosely to refer to any translate of a subspace, whatever its dimension. More properly, translates of vector subspaces are called affine subspaces.

We shall see that if V is a finite-dimensional space with dimension n, then the null space of a nonzero linear functional f is always (n − 1)-dimensional, and therefore it cannot be a Euclidean-like two-dimensional plane except when n = 3. We use the term hyperplane for such a null space or one of its translates. Thus, in general, a hyperplane is a set with the equation f(x) = l, where f is a nonzero linear functional. It is a proper affine subspace (plane) which is maximal in the sense that the only affine subspace properly including it is the whole of V. In R³ hyperplanes are ordinary geometric planes, and in R²
hyperplanes are lines!

EXERCISES

2.1 Assuming the theorem AB ~ XY ⇔ b − a = y − x, show that OC is the sum of OA and OB, as defined in the preliminary discussion of Section 1, if and only if c = b + a. Considering also our assumed geometric theorem (3), show that the mapping x ↦ OX from R³ to the vector space of geometric vectors is linear and hence an isomorphism.
2.2 Let L be the line in the Cartesian plane R² with equation x₂ = 3x₁. Express L in parametric form as x = ta for a suitable ordered pair a.
2.3 Let V be any vector space, and let α and β be distinct vectors. Show that the line through α and β has the parametric equation

ξ = β + t(α − β), t ∈ R.

Show also that the segment from β to α is the image of [0, 1] in the above mapping.
2.4 According to the Pythagorean theorem, a triangle with side lengths a, b, and c has a right angle at the vertex "opposite c" if and only if c² = a² + b². Prove from this that in a Cartesian coordinate system in E³ the length |OX| of a segment OX is given by

|OX|² = Σ₁³ xᵢ²,

where x = <x₁, x₂, x₃> is the coordinate triple of the point X. Next use our geometric theorem (2) to conclude that OX ⊥ OY if and only if (x, y) = 0, where (x, y) = Σ₁³ xᵢyᵢ. (Use the bilinearity of (x, y) to expand |X − Y|².)
2.5 More generally, the law of cosines says that in any triangle labeled as indicated,

c² = a² + b² − 2ab cos θ.

Apply this law to the diagram to prove that

(x, y) = |x| |y| cos θ,

where (x, y) is the scalar product Σ₁³ xᵢyᵢ, |x| = (x, x)^{1/2} = |OX|, etc.
2.6 Given a nonzero linear functional f: R³ → R, and given k ∈ R, show that the set of points X in E³ such that f(x) = k is a plane. [Hint: Find a b in R³ such that f(b) = k, and throw the equation f(x) = k into the form (x − b, a) = 0, etc.]
2.7 Show that for any b in R³ the mapping X ↦ Y from E³ to itself defined by y = x + b is a parallel translation. That is, show that if X ↦ Y and Z ↦ W, then XZ ~ YW.
2.8 Let M be the set in R³ with equation 3x₁ − x₂ + x₃ = 2.
Find triplets a and b such that M is the plane through b perpendicular to the direction of a. What is the equation of the plane P = M + <1, 2, 1>?

2.9 Continuing the above exercise, what is the condition on the triplet b in order for N = M + b to pass through the origin? What is the equation of N?

2.10 Show that if the plane M in R³ has the equation f(x) = l, then M is a translate of the null space N of the linear functional f. Show that any two translates M and P of N are either identical or disjoint. What is the condition on the ordered triple b in order that M + b = M?

2.11 Generalize the above exercise to hyperplanes in Rⁿ.

2.12 Let N be the subspace (plane through the origin) in R³ with equation f(x) = 0. Let M and P be any two planes obtained from N by parallel translation. Show that Q = M + P is a third such plane. If M and P have the equations f(x) = l₁ and f(x) = l₂, find the equation for Q.

2.13 If M is the plane in R³ with equation f(x) = l, and if r is any nonzero number, show that the set product rM is a plane parallel to M.

2.14 In view of the above two exercises, discuss how we might consider the set of all parallel translates of the plane N with equation f(x) = 0 as forming a new vector space.

2.15 Let L be the subspace (line through the origin) in R³ with parametric equation x = ta. Discuss the set of all parallel translates of L in the spirit of the above three exercises.

2.16 The best object to take as "being" the geometric vector AB is the equivalence class of all directed line segments XY such that XY ≅ AB. Assuming whatever you need from properties (1) through (4), show that this is an equivalence relation on the set of all directed line segments (Section 0.12).

2.17 Assuming that the geometric vector AB is defined as in the above exercise, show that, strictly speaking, it is actually the mapping of the plane (or space) into itself that we have called the parallel translation through AB. Show also that AB + CD is the composition of the two translations.

3.
PRODUCT SPACES AND HOM(V, W)

If W is a vector space and A is an arbitrary set, then the set W^A of W-valued functions on A is a vector space in exactly the same way that R^A is. Addition is the natural addition of functions, (f + g)(a) = f(a) + g(a), and, similarly, (xf)(a) = x(f(a)) for every function f and scalar x. Laws A1 through S4 follow just as before and for exactly the same reasons. For variety, let us check the associative law for addition. The equation f + (g + h) = (f + g) + h means that (f + (g + h))(a) = ((f + g) + h)(a) for all a ∈ A. But

(f + (g + h))(a) = f(a) + (g + h)(a) = f(a) + (g(a) + h(a)) = (f(a) + g(a)) + h(a) = (f + g)(a) + h(a) = ((f + g) + h)(a),

where the middle equality in this chain of five holds by the associative law for W and the other four are applications of the definition of addition. Thus the associative law for addition holds in W^A because it holds in W, and the other laws follow in exactly the same way.

As before, we let πᵢ be evaluation at i, so that πᵢ(f) = f(i). Now, however, πᵢ is vector valued rather than scalar valued, because it is a mapping from W^A to W, and we call it the ith coordinate projection rather than the ith coordinate functional. Again these maps are all linear. In fact, as before, the natural vector operations on W^A are uniquely defined by the requirement that the projections πᵢ all be linear. We call the value f(j) = πⱼ(f) the jth coordinate of the vector f. Here the analogue of Cartesian n-space is the set of all n-tuples a = <a₁, ..., aₙ> of vectors in W; it is designated Wⁿ. Clearly, aⱼ is the jth coordinate of the n-tuple a.

There is no reason why we must use the same space W at each index, as we did above. In fact, if W₁, ..., Wₙ are any n vector spaces, then the set of all n-tuples a = <a₁, ..., aₙ> such that aⱼ ∈ Wⱼ for j = 1, ..., n is a vector space under the same definitions of the operations and for the same reasons. That is, the Cartesian product W = W₁ × W₂ × ⋯ × Wₙ is also a vector space of vector-valued functions. Such finite products will be very important to us.
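The componentwise operations on a finite product W₁ × ⋯ × Wₙ can be made concrete. The following is a minimal sketch (the helper names `add`, `scale`, and `proj` are ours, not the text's), modeling an element of W₁ × W₂ with W₁ = R² and W₂ = R³ as a pair of tuples, and checking that each coordinate projection πᵢ is linear with respect to these operations.

```python
# An element of W1 x W2 (W1 = R^2, W2 = R^3) is a pair of tuples of floats.
def add(a, b):
    # Componentwise addition: (f + g)(i) = f(i) + g(i) in each factor space.
    return tuple(tuple(x + y for x, y in zip(ai, bi)) for ai, bi in zip(a, b))

def scale(c, a):
    # Componentwise scalar multiplication: (c f)(i) = c (f(i)).
    return tuple(tuple(c * x for x in ai) for ai in a)

def proj(i, a):
    # The ith coordinate projection pi_i is evaluation at i.
    return a[i]

u = ((1.0, 2.0), (0.0, 1.0, -1.0))
v = ((3.0, -1.0), (2.0, 2.0, 2.0))

# Linearity of pi_0: pi_0(u + 2v) = pi_0(u) + 2 pi_0(v).
lhs = proj(0, add(u, scale(2.0, v)))
rhs = tuple(x + 2.0 * y for x, y in zip(proj(0, u), proj(0, v)))
assert lhs == rhs == (7.0, 0.0)
```

The same dictionary-of-tuples idea extends verbatim to a product over any finite index set, which is exactly the "function with f(i) ∈ Wᵢ" picture used below.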
Of course, Rⁿ is the product ∏₁ⁿ Wᵢ with each Wᵢ = R; but Rⁿ can also be considered R^m × R^(n−m), or more generally ∏₁ᵏ Wᵢ, where Wᵢ = R^(mᵢ) and Σ₁ᵏ mᵢ = n. However, the most important use of finite product spaces arises from the fact that the study of certain phenomena on a vector space V may lead in a natural way to a finite collection {Vᵢ}₁ⁿ of subspaces of V such that V is isomorphic to the product ∏₁ⁿ Vᵢ. Then the extra structure that V acquires when we regard it as the product space ∏₁ⁿ Vᵢ is used to study the phenomena in question. This is the theory of direct sums, and we shall investigate it in Section 5.

Later in the course we shall need to consider a general Cartesian product of vector spaces. We remind the reader that if {Wᵢ : i ∈ I} is any indexed collection of vector spaces, then the Cartesian product ∏ᵢ∈I Wᵢ of these vector spaces is defined as the set of all functions f with domain I such that f(i) ∈ Wᵢ for all i ∈ I (see Section 0.8).

The following is a simple concrete example to keep in mind. Let S be the ordinary unit sphere in R³, S = {x : Σ xᵢ² = 1}, and for each point x on S let Wₓ be the subspace of R³ tangent to S at x. By this we mean the subspace (plane through 0) parallel to the tangent plane to S at x, so that the translate Wₓ + x is the tangent plane (see Fig. 1.8). A function f in the product space W = ∏ₓ∈S Wₓ is a function which assigns to each point x on S a vector in Wₓ, that is, a vector parallel to the tangent plane to S at x. Such a function is called a vector field on S. Thus the product set W is the set of all vector fields on S, and W itself is a vector space, as the next theorem states.

Of course, the jth coordinate projection on W = ∏ᵢ∈I Wᵢ is evaluation at j, πⱼ(f) = f(j), and the natural vector operations on W are uniquely defined by the requirement that the coordinate projections all be linear.
Thus f + g must be that element of W whose value at j, πⱼ(f + g), is

πⱼ(f) + πⱼ(g) = f(j) + g(j)

for all j ∈ I, and similarly for multiplication by scalars.

Theorem 3.1. The Cartesian product of a collection of vector spaces can be made into a vector space in exactly one way so that the coordinate projections are all linear.

Proof. With the vector operations determined uniquely as above, the proofs of A1 through S4 that we sampled earlier hold verbatim. They did not require that the functions being added have all their values in the same space, but only that the values at a given domain element i all lie in the same space. □

Hom(V, W). Linear transformations have the simple but important properties that the sum of two linear transformations is linear and the composition of two linear transformations is linear. These imprecise statements are in essence the theme of this section, although they need bolstering by conditions on domains and codomains. Their proofs are simple formal algebraic arguments, but the objects being discussed will increase in conceptual complexity.

If W is a vector space and A is any set, we know that the space W^A of all mappings f : A → W is a vector space of functions (now vector valued) in the same way that R^A is. If A is itself a vector space V, we naturally single out for special study the subset of W^V consisting of all linear mappings. We designate this subset Hom(V, W). The following elementary theorems summarize its basic algebraic properties.

Theorem 3.2. Hom(V, W) is a vector subspace of W^V.

Proof. The theorem is an easy formality. If S and T are in Hom(V, W), then

(S + T)(xα + yβ) = S(xα + yβ) + T(xα + yβ) = xS(α) + yS(β) + xT(α) + yT(β) = x(S + T)(α) + y(S + T)(β),

so S + T is linear and Hom(V, W) is closed under addition.
The reader should be sure he knows the justification for each step in the above continued equality. The closure of Hom(V, W) under multiplication by scalars follows similarly, and since Hom(V, W) contains the zero transformation, and so is nonempty, it is a subspace. □

Theorem 3.3. The composition of linear maps is linear: if T ∈ Hom(V, W) and S ∈ Hom(W, X), then S ∘ T ∈ Hom(V, X). Moreover, composition is distributive over addition, under the obvious hypotheses on domains and codomains:

(S₁ + S₂) ∘ T = S₁ ∘ T + S₂ ∘ T  and  S ∘ (T₁ + T₂) = S ∘ T₁ + S ∘ T₂.

Finally, composition commutes with scalar multiplication:

c(S ∘ T) = (cS) ∘ T = S ∘ (cT).

Proof. We have

(S ∘ T)(xα + yβ) = S(T(xα + yβ)) = S(xT(α) + yT(β)) = xS(T(α)) + yS(T(β)) = x(S ∘ T)(α) + y(S ∘ T)(β),

so S ∘ T is linear. The two distributive laws will be left to the reader. □

Corollary. If T ∈ Hom(V, W) is fixed, then composition on the right by T is a linear transformation from the vector space Hom(W, X) to the vector space Hom(V, X). It is an isomorphism if T is an isomorphism.

Proof. The algebraic properties of composition stated in the theorem can be combined as follows:

(c₁S₁ + c₂S₂) ∘ T = c₁(S₁ ∘ T) + c₂(S₂ ∘ T),
S ∘ (c₁T₁ + c₂T₂) = c₁(S ∘ T₁) + c₂(S ∘ T₂).

The first equation says exactly that composition on the right by a fixed T is a linear transformation. (Write S ∘ T as 𝒯(S) if the equations still don't look right.) If T is an isomorphism, then composition by T⁻¹ "undoes" composition by T, and so is its inverse. □

The second equation implies a similar corollary about composition on the left by a fixed S.

Theorem 3.4. If W is a product vector space, W = ∏_I Wᵢ, then a mapping T from a vector space V to W is linear if and only if πᵢ ∘ T is linear for each coordinate projection πᵢ.

Proof. If T is linear, then πᵢ ∘ T is linear by the above theorem. Now suppose, conversely, that all the maps πᵢ ∘ T are linear. Then

πᵢ(T(xα + yβ)) = (πᵢ ∘ T)(xα + yβ) = x(πᵢ ∘ T)(α) + y(πᵢ ∘ T)(β) = xπᵢ(T(α)) + yπᵢ(T(β)) = πᵢ(xT(α) + yT(β)).
But if πᵢ(f) = πᵢ(g) for all i, then f = g. Therefore, T(xα + yβ) = xT(α) + yT(β), and T is linear. □

If T is a linear mapping from Rⁿ to W whose skeleton is {βⱼ}₁ⁿ, then πᵢ ∘ T has skeleton {πᵢ(βⱼ)}ⱼ₌₁ⁿ. If W is R^m, then πᵢ is the ith coordinate functional y ↦ yᵢ, and βⱼ is the jth column in the matrix t = {tᵢⱼ} of T. Thus πᵢ(βⱼ) = tᵢⱼ, and πᵢ ∘ T is the linear functional whose skeleton is the ith row of the matrix of T. In the discussion centering around Theorem 1.3, we replaced the vector equation y = T(x) by the equivalent set of m scalar equations yᵢ = Σⱼ tᵢⱼxⱼ, which we obtained by reading off the ith coordinate in the vector equation. But in "reading off" the ith coordinate we were applying the coordinate mapping πᵢ; or, in more algebraic terms, we were replacing the linear map T by the set of linear maps {πᵢ ∘ T}, which is equivalent to it by the above theorem.

Now consider in particular the space Hom(V, V), which we may as well designate 'Hom(V)'. In addition to being a vector space, it is also closed under composition, which we consider a multiplication operation. Since composition of functions is always associative (see Section 0.9), we thus have for multiplication the laws

A ∘ (B ∘ C) = (A ∘ B) ∘ C,
A ∘ (B + C) = (A ∘ B) + (A ∘ C),
(A + B) ∘ C = (A ∘ C) + (B ∘ C),
k(A ∘ B) = (kA) ∘ B = A ∘ (kB).

Any vector space which has, in addition to the vector operations, an operation of multiplication related to the vector operations in the above ways is called an algebra. Thus,

Theorem 3.5. Hom(V) is an algebra.

We noticed earlier that certain real-valued function spaces are also algebras. Examples were R^A and C([0, 1]). In these cases multiplication is commutative, but in the case of Hom(V) multiplication is not commutative unless V is a trivial space (V = {0}) or V is isomorphic to R. We shall check this later when we examine the finite-dimensional theory in greater detail.

Product projections and injections.
In addition to the coordinate projections, there is a second class of simple linear mappings that is of basic importance in the handling of a Cartesian product space W = ∏ᵢ∈I Wᵢ. These are, for each j, the mapping θⱼ taking a vector α ∈ Wⱼ to the function in the product space having the value α at the index j and 0 elsewhere. For example, θ₂ for W₁ × W₂ × W₃ is the mapping α ↦ <0, α, 0> from W₂ to W. Or if we view R³ as R × R², then θ₂ is the mapping <x₂, x₃> ↦ <0, <x₂, x₃>> = <0, x₂, x₃>. We call θⱼ the injection of Wⱼ into ∏_I Wᵢ. The linearity of θⱼ is probably obvious.

The mappings πᵢ and θⱼ are clearly connected, and the following projection-injection identities state their exact relationship. If Iⱼ is the identity transformation on Wⱼ, then

πⱼ ∘ θⱼ = Iⱼ  and  πᵢ ∘ θⱼ = 0  if  i ≠ j.

If the index set I is finite and I is the identity on the product space W, then

Σⱼ θⱼ ∘ πⱼ = I.

In the case ∏₁³ Wᵢ, we have θ₂ ∘ π₂(a) = <0, a₂, 0>, and the identity simply says that

<a₁, 0, 0> + <0, a₂, 0> + <0, 0, a₃> = <a₁, a₂, a₃>

for all a₁, a₂, a₃. These identities will probably be clear to the reader, and we leave the formal proofs as an exercise.

The coordinate projections πᵢ are useful in the study of any product space, but because of the limitation in the above identity, the injections θⱼ are of interest principally in the case of finite products. Together they enable us to decompose and reassemble linear maps whose domains or codomains are finite product spaces. For a simple example, consider the T in Hom(R³, R²) whose matrix is

[2 −1 1]
[1  1 4].

Then π₁ ∘ T is the linear functional whose skeleton <2, −1, 1> is the first row in the matrix of T, and we know that we can visualize its expression in equation form, y₁ = 2x₁ − x₂ + x₃, as being obtained from the vector equation y = T(x) by "reading off the first row". Thus we "decompose" T into the two linear functionals lᵢ = πᵢ ∘ T. Then, speaking loosely, we have the reassembly T = <l₁, l₂>; more exactly, T(x) = <2x₁ − x₂ + x₃, x₁ + x₂ + 4x₃> = <l₁(x), l₂(x)> for all x.
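This row-by-row decomposition can be checked numerically. The sketch below (our own function names, not the text's notation) computes the functionals lᵢ = πᵢ ∘ T from the rows of the matrix in the example and verifies the reassembly T(x) = <l₁(x), l₂(x)>.

```python
# T in Hom(R^3, R^2) with the matrix of the example above.
t = [[2, -1, 1],
     [1, 1, 4]]

def T(x):
    # The vector equation y = T(x): y_i = sum_j t_ij x_j.
    return tuple(sum(row[j] * x[j] for j in range(3)) for row in t)

def l(i, x):
    # l_i = pi_i . T, the linear functional whose skeleton is the ith row.
    return sum(t[i][j] * x[j] for j in range(3))

x = (1, 2, 3)
assert T(x) == (l(0, x), l(1, x))   # reassembly: T(x) = <l1(x), l2(x)>
assert T(x) == (3, 15)              # 2*1 - 2 + 3 = 3 and 1 + 2 + 12 = 15
```

Reading off l(0, ·) is exactly "applying π₁ to the vector equation," i.e., keeping only the first scalar equation y₁ = 2x₁ − x₂ + x₃.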
However, we want to present this reassembly as the action of the linear maps θ₁ and θ₂. We have

<l₁(x), l₂(x)> = θ₁(l₁(x)) + θ₂(l₂(x)) = (θ₁ ∘ π₁ + θ₂ ∘ π₂)(T(x)) = T(x),

which shows that the decomposition and reassembly of T is an expression of the identity Σᵢ θᵢ ∘ πᵢ = I.

In general, if T ∈ Hom(V, W) and W = ∏₁ⁿ Wᵢ, then Tᵢ = πᵢ ∘ T is in Hom(V, Wᵢ) for each i, and Tᵢ can be considered "the part of T going into Wᵢ", since Tᵢ(α) is the ith coordinate of T(α) for each α. Then we can reassemble the Tᵢ's to form T again by T = Σᵢ θᵢ ∘ Tᵢ, for

Σᵢ θᵢ ∘ Tᵢ = Σᵢ θᵢ ∘ (πᵢ ∘ T) = (Σᵢ θᵢ ∘ πᵢ) ∘ T = I ∘ T = T.

Moreover, any finite collection of Tᵢ's on a common domain can be put together in this way to make a T. For example, we can assemble an m-tuple {Tᵢ}₁^m of linear maps on a common domain V to form a single m-tuple-valued linear map T. Given α in V, we simply define T(α) as that m-tuple whose ith coordinate is Tᵢ(α) for i = 1, ..., m, and then check that T is linear. Thus, without having to calculate, we see from this assembly principle that T : x ↦ <2x₁ − x₂ + x₃, x₁ + x₂ + 4x₃> is a linear mapping from R³ to R², since we have formed T by assembling the two linear functionals l₁(x) = 2x₁ − x₂ + x₃ and l₂(x) = x₁ + x₂ + 4x₃ to form a single ordered-pair-valued map.

This very intuitive process has an equally simple formal justification. We rigorize our discussion in the following theorem.

Theorem 3.6. If Tᵢ is in Hom(V, Wᵢ) for each i in a finite index set I, and if W is the product space ∏ᵢ∈I Wᵢ, then there is a uniquely determined T in Hom(V, W) such that Tᵢ = πᵢ ∘ T for all i in I.

Proof. If T exists such that Tᵢ = πᵢ ∘ T for each i, then

T = I_W ∘ T = (Σᵢ θᵢ ∘ πᵢ) ∘ T = Σᵢ θᵢ ∘ (πᵢ ∘ T) = Σᵢ θᵢ ∘ Tᵢ.

Thus T is uniquely determined as Σ θᵢ ∘ Tᵢ.
Moreover, this T does have the required property, since then

πⱼ ∘ T = πⱼ ∘ (Σᵢ θᵢ ∘ Tᵢ) = Σᵢ (πⱼ ∘ θᵢ) ∘ Tᵢ = Iⱼ ∘ Tⱼ = Tⱼ. □

In the same way, we can decompose a linear T whose domain is a product space V = ∏_J Vⱼ into the maps Tⱼ = T ∘ θⱼ with domains Vⱼ, and then reassemble these maps to form T by the identity T = Σⱼ Tⱼ ∘ πⱼ (check it mentally!). Moreover, a finite collection of maps into a common codomain space can be put together to form a single map on the product of the domain spaces. Thus an n-tuple of maps {Tⱼ}₁ⁿ into W defines a single map T into W, where the domain of T is the product of the domains of the Tⱼ's, by the equation T(<α₁, ..., αₙ>) = Σ₁ⁿ Tⱼ(αⱼ), or T = Σ₁ⁿ Tⱼ ∘ πⱼ. For example, if T₁ : R → R² is the map t ↦ t<2, 1> = <2t, t>, and T₂ and T₃ are similarly the maps t ↦ t<−1, 1> and t ↦ t<1, 4>, then T = Σ₁³ Tⱼ ∘ πⱼ is the mapping from R³ to R² whose matrix is

[2 −1 1]
[1  1 4].

Again there is a simple formal argument, and we shall ask the reader to write out the proof of the following theorem.

Theorem 3.7. If Tⱼ is in Hom(Vⱼ, W) for each j in a finite index set J, and if V = ∏ⱼ∈J Vⱼ, then there exists a unique T in Hom(V, W) such that T ∘ θⱼ = Tⱼ for each j in J.

Finally, we should mention that Theorem 3.6 holds for all product spaces, finite or not, and states a property that characterizes product spaces. We shall investigate this situation in the exercises. The proof of the general case of Theorem 3.6 has to get along without the injections θᵢ; instead, it is an application of Theorem 3.4.

The reader may feel that we are being overly formal in using the projections πᵢ and the injections θⱼ to give algebraic formulations of processes that are visualized directly, such as reading off the scalar "components" of a vector equation. However, the mappings x ↦ xᵢ and y ↦ <0, ..., 0, y, 0, ..., 0> are clearly fundamental devices, and making their relationships explicit now will be helpful to us later on when we have to handle their occurrences in more complicated situations.
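The column assembly T = Σⱼ Tⱼ ∘ πⱼ from the example above can likewise be checked numerically. In this sketch (names are ours), each Tⱼ sends a scalar t to t times the jth column, and summing the contributions Tⱼ(xⱼ) over j reproduces the action of the matrix.

```python
# Columns of the matrix: T_j : R -> R^2, t -> t * (jth column).
cols = [(2, 1), (-1, 1), (1, 4)]

def T_j(j, s):
    # The map on the jth factor of the product domain R x R x R.
    return tuple(s * c for c in cols[j])

def T(x):
    # T = sum_j T_j o pi_j: add up the column contributions T_j(x_j).
    y = (0, 0)
    for j in range(3):
        y = tuple(a + b for a, b in zip(y, T_j(j, x[j])))
    return y

# Feeding in theta_1(1) = <1, 0, 0> picks out the first column.
assert T((1, 0, 0)) == (2, 1)
assert T((1, 2, 3)) == (3, 15)   # agrees with the row decomposition
```

Note the symmetry with the previous decomposition: rows of the matrix correspond to the factors of the codomain (Theorem 3.6), columns to the factors of the domain (Theorem 3.7).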
EXERCISES

3.1 Show that Rⁿ × R^m is isomorphic to R^(n+m).

3.2 Show more generally that if Σ₁ᵏ nᵢ = n, then ∏ᵢ₌₁ᵏ R^(nᵢ) is isomorphic to Rⁿ.

3.3 Show that if {B, C} is a partitioning of A, then R^A and R^B × R^C are isomorphic.

3.4 Generalize the above to the case where {Aᵢ} partitions A.

3.5 Show that a mapping T from a vector space V to a vector space W is linear if and only if (the graph of) T is a subspace of V × W.

3.6 Let S and T be nonzero linear maps from V to W. The definition of the map S + T is not the same as the set sum of (the graphs of) S and T as subspaces of V × W. Show that the set sum of (the graphs of) S and T cannot be a graph unless S = T.

3.8 Prove the distributive laws given in Theorem 3.3.

3.9 Let D : C¹([a, b]) → C([a, b]) be differentiation, and let S : C([a, b]) → R be the definite integral map f ↦ ∫ₐᵇ f. Compute the composition S ∘ D. Give the justification for each step of the calculation in Theorem 3.2.

3.10 We know that the general linear functional F on R² is the map x ↦ a₁x₁ + a₂x₂ determined by the pair a in R², and that the general linear map T in Hom(R²) is determined by a matrix t. Then F ∘ T is another linear functional, and hence is of the form x ↦ b₁x₁ + b₂x₂ for some b in R². Compute b from t and a. Your computation should show you that a ↦ b is linear. What is its matrix?

3.11 Given S and T in Hom(R²) whose matrices are s and t respectively, find the matrix of S ∘ T in Hom(R²).

3.12 Given S and T in Hom(R³) whose matrices are s and t, what would you guess is the matrix of S ∘ T? Verify your guess.

3.13 Show that if S ∘ T is surjective, then S is surjective; that if S ∘ T is injective, then T is injective; and, therefore, that if T ∈ Hom(V, W), S ∈ Hom(W, V), S ∘ T = I, and T ∘ S = I, then T is an isomorphism.

3.15 Show that if S⁻¹ and T⁻¹ exist, then (S ∘ T)⁻¹ exists and equals T⁻¹ ∘ S⁻¹. Give a more careful statement of this result.

3.16 Show that if S and T in Hom V commute with each other, then the null space of T, N = N(T), and its range R = R(T) are invariant under S (S[N] ⊂ N and S[R] ⊂ R).

3.17
Show that if α is an eigenvector of T and S commutes with T, then S(α) is an eigenvector of T and has the same eigenvalue.

3.18 Show that if S commutes with T and T⁻¹ exists, then S commutes with T⁻¹.

3.19 Given that α is an eigenvector of T with eigenvalue λ, show that α is also an eigenvector of T² = T ∘ T, of Tⁿ, and of T⁻¹ (if T is invertible) and that the corresponding eigenvalues are λ², λⁿ, and 1/λ. Given that p(t) is a polynomial in t, define the operator p(T), and under the above hypotheses, show that α is an eigenvector of p(T) with eigenvalue p(λ).

3.20 If S and T are in Hom V, we say that S doubly commutes with T (and write S ∘∘ T) if S commutes with every A in Hom V which commutes with T. Fix T, and set {T}″ = {S : S ∘∘ T}. Show that {T}″ is a commutative subalgebra of Hom V.

3.21 Given T in Hom V and α in V, let N be the linear span of the "trajectory of α under T" (the set {Tⁿ(α) : n ∈ Z⁺}). Show that N is invariant under T.

3.22 A transformation T in Hom V such that Tⁿ = 0 for some n is said to be nilpotent. Show that if T is nilpotent, then I − T is invertible. [Hint: The power series 1/(1 − x) = Σ xⁿ is a finite sum if x is replaced by T.]

3.23 Suppose that T is nilpotent, that S commutes with T, and that S⁻¹ exists, where S, T ∈ Hom V. Show that (S − T)⁻¹ exists.

3.24 Let φ be an isomorphism from a vector space V to a vector space W. Show that T ↦ φ ∘ T ∘ φ⁻¹ is an algebra isomorphism from the algebra Hom V to the algebra Hom W.

3.25 Show the πᵢ's and θⱼ's explicitly for R³ = R × R × R using the stopped arrow notation. Also write out the identity Σ θᵢ ∘ πᵢ = I in explicit form.

3.26 Do the same for R⁵ = R² × R³.

3.27 Show that the first two projection-injection identities (πⱼ ∘ θⱼ = Iⱼ and πᵢ ∘ θⱼ = 0 if i ≠ j) are simply a restatement of the definition of θⱼ.
Show that the linearity of θⱼ follows formally from these identities and Theorem 3.4.

3.28 Prove the identity Σⱼ θⱼ ∘ πⱼ = I by applying πᵢ to the equation and remembering that f = g if πⱼ(f) = πⱼ(g) for all j (this being just the equation f(j) = g(j) for all j).

3.29 Prove the general case of Theorem 3.6. We are given an indexed collection of linear maps {Tᵢ : i ∈ I} with common domain V and codomains {Wᵢ : i ∈ I}. The first question is how to define T : V → W = ∏_I Wᵢ. Do this by defining T(ξ) suitably for each ξ ∈ V and then applying Theorem 3.4 to conclude that T is linear.

3.30 Prove Theorem 3.7.

3.31 We know without calculation that the map

T : x ↦ <3x₁ − x₂ + x₃, x₂ − x₃, x₁ − 5x₃, x₁>

from R³ to R⁴ is linear. Why? (Cite relevant theorems from the text.)

3.32 Write down the matrix for the transformation T in the above example, and then write down the mappings T ∘ θᵢ from R to R⁴ (for i = 1, 2, 3) in explicit ordered quadruplet form.

3.33 Let W = ∏₁ⁿ Wᵢ be a finite product vector space and set pᵢ = θᵢ ∘ πᵢ, so that pᵢ is in Hom W for all i. Prove from the projection-injection identities that Σ pᵢ = I (the identity map on W), pᵢ ∘ pⱼ = 0 if i ≠ j, and pᵢ ∘ pᵢ = pᵢ. Identify the range Rᵢ = R(pᵢ).

3.34 In the context of the above exercise, define T in Hom W as Σᵢ i·pᵢ. Show that α is an eigenvector of T if and only if α is in one of the subspaces Rᵢ, and that then the eigenvalue of α is i.

3.35 In the same situation show that the polynomial (T − 1·I) ∘ ⋯ ∘ (T − n·I) is the zero transformation.

3.36 Theorems 3.6 and 3.7 can be combined if T ∈ Hom(V, W), where both V and W are product spaces:

V = ∏₁ⁿ Vⱼ  and  W = ∏₁^m Wᵢ.

State and prove a theorem which says that such a T can be decomposed into a doubly indexed family {Tᵢⱼ}, where Tᵢⱼ ∈ Hom(Vⱼ, Wᵢ), and conversely that any such doubly indexed family can be assembled to form a single T from V to W.

3.37 Apply your theorem to the special case where V = Rⁿ and W = R^m (that is, Vⱼ = Wᵢ = R for all i and j). Now Tᵢⱼ is from R to R and hence is simply multiplication by a number tᵢⱼ.
Show that the indexed collection {tᵢⱼ} of these numbers is the matrix of T.

3.38 Given an m-tuple of vector spaces {Wᵢ}₁^m, suppose that there are a vector space X and maps pᵢ in Hom(X, Wᵢ), i = 1, ..., m, with the following property:

P. For any m-tuple of linear maps {Tᵢ}₁^m from a common domain space V to the above spaces Wᵢ (so that Tᵢ ∈ Hom(V, Wᵢ), i = 1, ..., m), there is a unique T in Hom(V, X) such that Tᵢ = pᵢ ∘ T, i = 1, ..., m.

Prove that there is a "canonical" isomorphism from W = ∏₁^m Wᵢ to X under which the given maps pᵢ become the projections πᵢ. [Remark: The product space W itself has property P by Theorem 3.6, and this exercise therefore shows that P is an abstract characterization of the product space.]

4. AFFINE SUBSPACES AND QUOTIENT SPACES

In this section we shall look at the "planes" in a vector space V and see what happens to them when we translate them, intersect them with each other, take their images under linear maps, and so on. Then we shall confine ourselves to the set of all planes that are translates of a fixed subspace and discover that this set itself is a vector space in the most obvious way. Some of this material has been anticipated in Section 2.

Affine subspaces. If N is a subspace of a vector space V and α is any vector of V, then the set N + α = {ξ + α : ξ ∈ N} is called either the coset of N containing α or the affine subspace of V through α and parallel to N. The set N + α is also called the translate of N through α. We saw in Section 2 that affine subspaces are the general objects that we want to call planes. If N is given and fixed in a discussion, we shall use the notation ᾱ = N + α (see Section 0.12).

We begin with a list of some simple properties of affine subspaces. Some of these will generalize observations already made in Section 2, and the proofs of some will be left as exercises.

1) With a fixed subspace N assumed, if γ ∈ ᾱ, then γ̄ = ᾱ. For if γ = α + η₀, then γ + η = α + (η₀ + η) ∈ ᾱ, so γ̄ ⊂ ᾱ. Also α + η = γ + (η − η₀) ∈ γ̄, so ᾱ ⊂ γ̄. Thus ᾱ = γ̄.
2) With N fixed, for any α and β, either ᾱ = β̄ or ᾱ and β̄ are disjoint. For if ᾱ and β̄ are not disjoint, then there exists a γ in each, and ᾱ = γ̄ = β̄ by (1). The reader may find it illuminating to compare these calculations with the more general ones of Section 0.12. Here α ~ β if and only if α − β ∈ N.

3) Now let 𝒜 be the collection of all affine subspaces of V; 𝒜 is thus the set of all cosets of all vector subspaces of V. Then the intersection of any subfamily of 𝒜 is either empty or itself an affine subspace. In fact, if {Aᵢ}ᵢ∈I is an indexed collection of affine subspaces and Aᵢ is a coset of the vector subspace Wᵢ for each i ∈ I, then ⋂ᵢ∈I Aᵢ is either empty or a coset of the vector subspace ⋂ᵢ∈I Wᵢ. For if β ∈ ⋂ᵢ∈I Aᵢ, then (1) implies that Aᵢ = β + Wᵢ for all i, and then ⋂ Aᵢ = β + ⋂ Wᵢ.

4) If A, B ∈ 𝒜, then A + B ∈ 𝒜. That is, the set sum of any two affine subspaces is itself an affine subspace.

5) If A ∈ 𝒜 and T ∈ Hom(V, W), then T[A] is an affine subspace of W. In particular, if t ∈ R, then tA ∈ 𝒜.

6) If B is an affine subspace of W and T ∈ Hom(V, W), then T⁻¹[B] is either empty or an affine subspace of V.

7) For a fixed α ∈ V the translation of V through α is the mapping Sα : V → V defined by Sα(ξ) = ξ + α for all ξ ∈ V. Translation is not linear; for example, Sα(0) = α. It is clear, however, that translation carries affine subspaces into affine subspaces. Thus Sα(A) = A + α and Sα(β + W) = (β + α) + W.

8) An affine transformation from a vector space V to a vector space W is a linear mapping from V to W followed by a translation in W. Thus an affine transformation is of the form ξ ↦ T(ξ) + β, where T ∈ Hom(V, W) and β ∈ W. Note that ξ ↦ T(ξ + α) is affine, since T(ξ + α) = T(ξ) + β, where β = T(α). It follows from (5) and (7) that an affine transformation carries affine subspaces of V into affine subspaces of W.

Quotient space. Now fix a subspace N of V, and consider the set W of all translates (cosets) of N.
We are going to see that W itself is a vector space in the most natural way possible. Addition will be set addition, and scalar multiplication will be set multiplication (except in one special case). For example, if N is a line through the origin in R³, then W consists of all lines in R³ parallel to N. We are saying that this set of parallel lines will automatically turn out to be a vector space: the set sum of any two of the lines in W turns out to be a line in W! And if L ∈ W and t ≠ 0, then the set product tL is a line in W. The translates of L fiber R³, and the set of fibers is a natural vector space.

During this discussion it will be helpful temporarily to indicate set sums by '+ₛ' and set products by '·ₛ'. With N fixed, it follows from (2) above that two cosets are disjoint or identical, so that the set W of all cosets is a fibering of V in the general case, just as it was in our example of the parallel lines. From (4) or by a direct calculation we know that ᾱ +ₛ β̄ = (α + β)‾. Thus W is closed under set addition, and, naturally, we take this to be our operation of addition on W. That is, we define + on W by ᾱ + β̄ = ᾱ +ₛ β̄. Then the natural map π : α ↦ ᾱ from V to W preserves addition, π(α + β) = π(α) + π(β), since this is just our equation (α + β)‾ = ᾱ + β̄ above. Similarly, if t ∈ R, then the set product t ·ₛ ᾱ is either (tα)‾ or {0}. Hence if we define tᾱ as the set product when t ≠ 0 and as 0̄ = N when t = 0, then π also preserves scalar multiplication, π(tα) = tπ(α).

We thus have two vectorlike operations on the set W of all cosets of N, and we naturally expect W to turn out to be a vector space. We could prove this by verifying all the laws, but it is more elegant to notice the general setting for such a verification proof.

Theorem 4.1. Let V be a vector space, and let W be a set having two vectorlike operations, which we designate in the usual way. Suppose that there exists a surjective mapping T : V → W which preserves the operations:

T(sα + tβ) = sT(α) + tT(β).
Then W is a vector space.

Proof. We have to check laws A1 through S4. However, one example should make it clear to the reader how to proceed. We show that T(0) satisfies A3 and hence is the zero vector of W. Since every β ∈ W is of the form T(α), we have

T(0) + β = T(0) + T(α) = T(0 + α) = T(α) = β,

which is A3. We shall ask the reader to check more of the laws in the exercises. □

Theorem 4.2. The cosets of a fixed subspace N of a vector space V themselves form a vector space, called the quotient space V/N, under the above natural operations, and the projection π is a surjective linear map from V to V/N.

Theorem 4.3. If T is in Hom(V, W), and if the null space of T includes the subspace M ⊂ V, then T has a unique factorization through V/M. That is, there exists a unique transformation S in Hom(V/M, W) such that T = S ∘ π.

Proof. Since T is zero on M, it follows that T is constant on each coset A of M, so that T[A] contains only one vector. If we define S(A) to be the unique vector in T[A], then S(ᾱ) = T(α), so S ∘ π = T by definition. Conversely, if T = R ∘ π, then R(ᾱ) = R(π(α)) = T(α), and R is our above S. The linearity of S is practically obvious. Thus

S(ᾱ + β̄) = S((α + β)‾) = T(α + β) = T(α) + T(β) = S(ᾱ) + S(β̄),

and homogeneity follows similarly. This completes the proof. □

One more remark is of interest here. If N is invariant under a linear map T in Hom V (that is, T[N] ⊂ N), then for each α in V, T[ᾱ] is a subset of the coset (T(α))‾, for

T[ᾱ] = T[α + N] = T(α) +ₛ T[N] ⊂ T(α) +ₛ N = (T(α))‾.

There is therefore a map S : V/N → V/N defined by the requirement that S(ᾱ) = (T(α))‾ (or S ∘ π = π ∘ T), and it is easy to check that S is linear. Therefore,

Theorem 4.4. If a subspace N of a vector space V is carried into itself by a transformation T in Hom V, then there is a unique transformation S in Hom(V/N) such that S ∘ π = π ∘ T.

EXERCISES

4.1 Prove properties (4), (5), and (6) of affine subspaces.

4.2 Choose an origin O in the Euclidean plane E²
(your sheet of paper), and let L₁ and L₂ be two parallel lines not containing O. Let X and Y be distinct points on L₁ and Z any point on L₂. Draw the figure giving the geometric sums OX + OZ and OY + OZ (parallelogram rule), and state the theorem from plane geometry that says that these two sum points are on a third line L₃ parallel to L₁ and L₂.

4.3 a) Prove the associative law for addition for the space W of Theorem 4.1.
b) Prove also laws A4 and S2.

4.4 Return now to Exercise 2.1 and reexamine the situation in the light of Theorem 4.1. Show, finally, how we really know that the geometric vectors form a vector space.

4.5 Prove that the mapping S of Theorem 4.3 is injective if and only if N is the null space of T.

4.6 We know from Exercise 4.5 that if T is a surjective element of Hom(V, W) and N is the null space of T, then the S of Theorem 4.3 is an isomorphism from V/N to W. Its inverse S⁻¹ assigns a coset of N to each η in W. Show that the process of "indefinite integration" is an example of such a map S⁻¹. This is the process of calculating an integral and adding an arbitrary constant, as in ∫ x² dx = x³/3 + C.

4.7 Suppose that N and M are subspaces of a vector space V and that N ⊂ M. Show that then M/N is a subspace of V/N and that V/M is naturally isomorphic to the quotient space (V/N)/(M/N). [Hint: Every coset of N is a subset of some coset of M.]

4.8 Suppose that N and M are any subspaces of a vector space V. Prove that (M + N)/N is naturally isomorphic to M/(M ∩ N). (Start with the fact that each coset of M ∩ N is included in a unique coset of N.)

4.9 Prove that the map S of Theorem 4.4 is linear.

4.10 Given T ∈ Hom V, show that T² = 0 (T² = T ∘ T) if and only if R(T) ⊂ N(T).

4.11 Suppose that T ∈ Hom V and the subspace N are such that T is the identity on N and also on V/N. The latter assumption is that the S of Theorem 4.4 is the identity on V/N. Set R = T − I, and use the above exercise to show that R² = 0. Show that if T = I + R and R²
= 0, then there is a subspace N such that T is the identity on N and also on V/N.

4.12 We now view the above situation a little differently. Supposing that T is the identity on N and on V/N, and setting R = T − I, show that there exists a K ∈ Hom(V/N, V) such that R = K ∘ π. Show that for any coset A of N the action of T on A can be viewed as translation through K(A). That is, if ξ ∈ A and η = K(A), then T(ξ) = ξ + η.

4.13 Consider the map T: ⟨x₁, x₂⟩ ↦ ⟨x₁ + 2x₂, x₂⟩ in Hom R², and let N be the null space of R = T − I. Identify N and show that T is the identity on N and on R²/N. Find the map K of the above exercise. Such a mapping T is called a shear transformation of V parallel to N. Draw the unit square and its image under T.

4.14 If we remember that the linear span L(A) of a subset A of a vector space V can be defined as the intersection of all the subspaces of V that include A, then the fact that the intersection of any collection of affine subspaces of a vector space V is either an affine subspace or empty suggests that we define the affine span M(A) of a nonempty subset A ⊂ V as the intersection of all affine subspaces including A. Then we know from (8) in our list of affine properties that M(A) is an affine subspace, and by its definition above that it is the smallest affine subspace including A. We now naturally wonder whether M(A) can be directly described in terms of linear combinations. Show first that if α ∈ A, then M(A) = L(A − α) + α; then prove that M(A) is the set of all linear combinations Σ xᵢαᵢ of elements of A such that Σ xᵢ = 1.

4.15 Show that the linear span of a set B is the affine span of B ∪ {0}.

4.16 Show that M(A + γ) = M(A) + γ for any γ in V and that M(xA) = xM(A) for any x in R.

5. DIRECT SUMS

We come now to the heart of the chapter.
It frequently happens that the study of some phenomenon on a vector space V leads to a finite collection of subspaces {Vᵢ} such that V is naturally isomorphic to the product space ∏ᵢ Vᵢ. Under this isomorphism the maps θⱼ ∘ πⱼ on the product space become certain maps Pⱼ in Hom V, and the projection–injection identities are reflected in the identities Σᵢ Pᵢ = I, Pⱼ ∘ Pⱼ = Pⱼ for all j, and Pᵢ ∘ Pⱼ = 0 if i ≠ j. Also, Vᵢ = range Pᵢ. The product structure that V thus acquires is then used to study the phenomenon that gave rise to it. For example, this is the way that we unravel the structure of a linear transformation in Hom V, the study of which is one of the central problems of linear algebra.

Direct sums. If V₁, …, Vₙ are subspaces of the vector space V, then the mapping π: a ↦ Σⁿ₁ aᵢ is a linear transformation from ∏ⁿ₁ Vᵢ to V, since it is the sum π = Σⁿ₁ πᵢ of the coordinate projections.

Definition. We shall say that the Vᵢ's are independent if π is injective, and that V is the direct sum of the Vᵢ's if π is an isomorphism. We express the latter relationship by writing V = V₁ ⊕ ⋯ ⊕ Vₙ = ⊕ⁿ₁ Vᵢ.

Thus V = ⊕ⁿ₁ Vᵢ if and only if π is injective and surjective, i.e., if and only if the subspaces {Vᵢ} are both independent and span V. A useful restatement of the direct sum condition is that each α ∈ V is uniquely expressible as a sum Σⁿ₁ αᵢ, with αᵢ ∈ Vᵢ for all i; α has some such expression because the Vᵢ's span V, and the expression is unique by their independence.

For example, let V = 𝒞(R) be the space of real-valued continuous functions on R, let Vₑ be the subset of even functions (functions f such that f(−x) = f(x) for all x), and let Vₒ be the subset of odd functions (functions f such that f(−x) = −f(x) for all x). It is clear that Vₑ and Vₒ are subspaces of V, and we claim that V = Vₑ ⊕ Vₒ. To see this, note that for any f in V, g(x) = (f(x) + f(−x))/2 is even, h(x) = (f(x) − f(−x))/2 is odd, and f = g + h.
Thus V = Vₑ + Vₒ. Moreover, this decomposition of f is unique, for if f = g₁ + h₁ also, where g₁ is even and h₁ is odd, then g − g₁ = h₁ − h, and therefore g = g₁ and h = h₁, since the only function that is both even and odd is zero. The even and odd components of eˣ are the hyperbolic cosine and sine functions.

Since π is injective if and only if its null space is {0} (Lemma 1.1), we have:

Lemma 5.1. The independence of the subspaces {Vᵢ}ⁿ₁ is equivalent to the property that if αᵢ ∈ Vᵢ for all i and Σⁿ₁ αᵢ = 0, then αᵢ = 0 for all i.

Corollary. If the subspaces {Vᵢ}ⁿ₁ are independent, αᵢ ∈ Vᵢ for all i, and Σⁿ₁ αᵢ is an element of Vⱼ, then αᵢ = 0 for i ≠ j.

We leave the proof to the reader. The case of two subspaces is particularly simple.

Lemma 5.2. The subspaces M and N of V are independent if and only if M ∩ N = {0}.

Proof. If α ∈ M, β ∈ N, and α + β = 0, then α = −β ∈ M ∩ N. If M ∩ N = {0}, this will further imply that α = β = 0, so M and N are independent. On the other hand, if 0 ≠ β ∈ M ∩ N, and if we set α = −β, then α ∈ M, β ∈ N, and α + β = 0, so M and N are not independent. □

Note that the first argument above is simply the general form of the uniqueness argument we gave earlier for the even–odd decomposition of a function.

Corollary. V = M ⊕ N if and only if V = M + N and M ∩ N = {0}.

Definition. If V = M ⊕ N, then M and N are called complementary subspaces, and each is a complement of the other.

Warning: A subspace M of V does not have a unique complementary subspace unless M is trivial (that is, M = {0} or M = V). If we view R³ as coordinatized Euclidean 3-space, then M is a proper subspace if and only if M is a plane containing the origin or M is a line through the origin (see Fig. 1.9). If M and N are proper subspaces one of which is a plane and the other a line not lying in that plane, then M and N are complementary subspaces. Moreover, these are the only nontrivial complementary pairs in R³.
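The even–odd decomposition described above is easy to check numerically. The following sketch (our own illustration in plain Python, with the exponential standing in for a generic f, and helper names `even_part`/`odd_part` that are ours, not the text's) splits a function into its even and odd parts and compares them with cosh and sinh:

```python
import math

def even_part(f):
    # g(x) = (f(x) + f(-x)) / 2 is even
    return lambda x: (f(x) + f(-x)) / 2

def odd_part(f):
    # h(x) = (f(x) - f(-x)) / 2 is odd
    return lambda x: (f(x) - f(-x)) / 2

g, h = even_part(math.exp), odd_part(math.exp)
for x in (-1.5, 0.0, 2.0):
    assert abs(g(x) - math.cosh(x)) < 1e-12       # even part of e^x is cosh
    assert abs(h(x) - math.sinh(x)) < 1e-12       # odd part of e^x is sinh
    assert abs(g(x) + h(x) - math.exp(x)) < 1e-12  # f = g + h
```

The three assertions are exactly the three facts used in the text: g is the cosh-like even part, h the sinh-like odd part, and f recovers as g + h.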
The reader will be asked to prove some of these facts in the exercises, and they will all be clear by the middle of the next chapter. The following lemma is technically useful.

Lemma 5.3. If V₁ and V₀ are independent subspaces of V and {Vᵢ}ⁿ₂ are independent subspaces of V₀, then {Vᵢ}ⁿ₁ are independent subspaces of V.

Proof. If αᵢ ∈ Vᵢ for all i and Σⁿ₁ αᵢ = 0, then, setting α₀ = Σⁿ₂ αᵢ, we have α₁ + α₀ = 0, with α₀ ∈ V₀. Therefore, α₁ = α₀ = 0 by the independence of V₁ and V₀. But then α₂ = ⋯ = αₙ = 0 by the independence of {Vᵢ}ⁿ₂, and we are done (Lemma 5.1). □

Corollary. V = V₁ ⊕ V₀ and V₀ = ⊕ⁿ₂ Vᵢ together imply that V = ⊕ⁿ₁ Vᵢ.

Projections. If V = ⊕ⁿ₁ Vᵢ, if π is the isomorphism a ↦ Σⁿ₁ aᵢ, and if πⱼ is the jth projection map a ↦ aⱼ from ∏ⁿ₁ Vᵢ to Vⱼ, then (πⱼ ∘ π⁻¹)(α) = αⱼ.

Definition. We call αⱼ the jth component of α, and we call the linear map Pⱼ = πⱼ ∘ π⁻¹ the projection of V onto Vⱼ (with respect to the given direct sum decomposition of V).

Since each α in V is uniquely expressible as a sum α = Σⁿ₁ αᵢ, with αᵢ in Vᵢ for all i, we can view Pⱼ(α) = αⱼ as "the part of α in Vⱼ". This use of the word "projection" is different from its use in the Cartesian product situation, and each is different from its use in the quotient space context (Section 0.12). It is apparent that these three uses are related, and the ambiguity causes little confusion since the proper meaning is always clear from the context.

Theorem 5.1. If the maps Pⱼ are the above projections, then range Pⱼ = Vⱼ, Pᵢ ∘ Pⱼ = 0 for i ≠ j, and Σⁿ₁ Pᵢ = I.

Proof. Since π is an isomorphism and Pⱼ = πⱼ ∘ π⁻¹, we have range Pⱼ = range πⱼ = Vⱼ. Next, it follows directly from the corollary to Lemma 5.1 that if α ∈ Vⱼ, then Pᵢ(α) = 0 for i ≠ j, and so Pᵢ ∘ Pⱼ = 0 for i ≠ j. Finally, Σ Pᵢ = Σ πᵢ ∘ π⁻¹ = (Σ πᵢ) ∘ π⁻¹ = π ∘ π⁻¹ = I, and we are done. □

These projection properties are clearly the reflection in V of the projection identities for the isomorphic space ∏ⁿ₁ Vᵢ. A converse theorem is also true.

Theorem 5.2.
If {Pᵢ}ⁿ₁ ⊂ Hom V satisfy Σⁿ₁ Pᵢ = I and Pᵢ ∘ Pⱼ = 0 for i ≠ j, and if we set Vᵢ = range Pᵢ, then V = ⊕ⁿ₁ Vᵢ, and Pᵢ is the corresponding projection on Vᵢ.

Proof. The equation α = I(α) = Σⁿ₁ Pᵢ(α) shows that the subspaces {Vᵢ} span V. Next, if β ∈ Vⱼ, then Pᵢ(β) = 0 for i ≠ j, since β ∈ range Pⱼ and Pᵢ ∘ Pⱼ = 0 for i ≠ j. Then also Pⱼ(β) = (I − Σᵢ≠ⱼ Pᵢ)(β) = I(β) = β. Now consider α = Σⁿ₁ αᵢ for any choice of αᵢ ∈ Vᵢ. Using the above two facts, we have Pⱼ(α) = Pⱼ(Σⁿ₁ αᵢ) = Σⁿ₁ Pⱼ(αᵢ) = αⱼ. Therefore, α = 0 implies that αⱼ = Pⱼ(α) = 0 for all j, and the subspaces Vᵢ are independent. Consequently, V = ⊕ⁿ₁ Vᵢ. Finally, the fact that α = Σ Pⱼ(α) and Pⱼ(α) ∈ Vⱼ for all j shows that Pⱼ(α) is the jth component of α for every α, and therefore that Pⱼ is the projection of V onto Vⱼ. □

There is an intrinsic characterization of the kind of map that is a projection.

Lemma 5.4. The projections Pⱼ are idempotent (Pⱼ² = Pⱼ); equivalently, each is the identity on its range. The null space of Pᵢ is Wᵢ = Σⱼ≠ᵢ Vⱼ, the sum of the spaces Vⱼ for j ≠ i.

Proof. Pⱼ² = Pⱼ ∘ (I − Σᵢ≠ⱼ Pᵢ) = Pⱼ ∘ I = Pⱼ. Since this can be rewritten as Pⱼ(Pⱼ(α)) = Pⱼ(α) for every α in V, it says exactly that Pⱼ is the identity on its range. Now set Wᵢ = Σⱼ≠ᵢ Vⱼ and note that if β ∈ Wᵢ, then Pᵢ(β) = 0, since Pᵢ(Vⱼ) = 0 for j ≠ i. Thus Wᵢ ⊂ N(Pᵢ). Conversely, if Pᵢ(α) = 0, then α = I(α) = Σⱼ Pⱼ(α) = Σⱼ≠ᵢ Pⱼ(α) ∈ Wᵢ. Thus N(Pᵢ) ⊂ Wᵢ, and the two spaces are equal. □

Conversely:

Lemma 5.5. If P ∈ Hom V is idempotent, then V is the direct sum of its range and null space, and P is the corresponding projection on its range.

Proof. Setting Q = I − P, we have PQ = P − P² = 0, and similarly QP = 0. Therefore, V is the direct sum of the ranges of P and Q, and P is the corresponding projection on its range, by the above theorem. Moreover, the range of Q is the null space of P, by the corollary. □

If V = M ⊕ N and P is the corresponding projection on M, we call P the projection on M along N. The projection P is not determined by M alone, since M does not determine N.
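Lemma 5.5 and the warning about non-unique complements can both be seen in a tiny computation. The 2 × 2 matrix below is our own example, not one from the text: it is idempotent and projects R² onto the x-axis M, but along the line N spanned by (1, −1) rather than along the y-axis, so it differs from the "obvious" projection onto M.

```python
def mat_mul(A, B):
    # product of two 2x2 matrices given as nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(A, v):
    # matrix-vector product
    return [sum(A[i][k] * v[k] for k in range(2)) for i in range(2)]

# P projects onto the x-axis M along the line N = {(x, y) : x + y = 0};
# Q = I - P is the complementary projection onto N.
P = [[1, 1], [0, 0]]
Q = [[0, -1], [0, 1]]

assert mat_mul(P, P) == P                 # idempotent: P^2 = P
assert mat_mul(P, Q) == [[0, 0], [0, 0]]  # P o Q = 0

v = [3, 2]
pv, qv = apply(P, v), apply(Q, v)
assert [pv[0] + qv[0], pv[1] + qv[1]] == v  # v = Pv + Qv, so V = M + N
assert pv[1] == 0                            # Pv lies in M (the x-axis)
assert qv[0] + qv[1] == 0                    # Qv lies in N (x + y = 0)
```

P and Q here form a pair of complementary projections in the sense defined just below.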
A pair P and Q in Hom V such that P + Q = I and PQ = QP = 0 is called a pair of complementary projections.

In the above discussion we have neglected another fine point. Strictly speaking, when we form the sum π = Σⁿ₁ πᵢ, we are treating each πᵢ as though it were from ∏ⁿ₁ Vⱼ to V, whereas actually the codomain of πᵢ is Vᵢ. And we want Pⱼ to be from V to V, whereas πⱼ ∘ π⁻¹ has codomain Vⱼ, so the equation Pⱼ = πⱼ ∘ π⁻¹ can't quite be true either. To repair these flaws we have to introduce the injection ιⱼ: Vⱼ → V, which is the identity map on Vⱼ, but which views Vⱼ as a subspace of V and so takes V as its codomain. If our concept of a mapping includes a codomain possibly larger than the range, then we have to admit such identity injections. Then, setting π̄ⱼ = ιⱼ ∘ πⱼ, we have the correct equations π = Σⁿ₁ π̄ᵢ and Pⱼ = π̄ⱼ ∘ π⁻¹.

EXERCISES

5.1 Prove the corollary to Lemma 5.1.

5.2 Let α be the vector ⟨1, 1, 1⟩ in R³, and let M = Rα be its one-dimensional span. Show that each of the three coordinate planes is a complement of M.

5.3 Show that a finite product space V = ∏ⁿ₁ Vᵢ has subspaces {Wᵢ}ⁿ₁ such that Wᵢ is isomorphic to Vᵢ and V = ⊕ⁿ₁ Wᵢ. Show how the corresponding projections {Pᵢ} are related to the πᵢ's and θᵢ's.

5.4 If T ∈ Hom(V, W), show that (the graph of) T is a complement of W′ = {0} × W in V × W.

5.5 If l is a linear functional on V (l ∈ Hom(V, R) = V*), and if α is a vector in V such that l(α) ≠ 0, show that V = N ⊕ M, where N is the null space of l and M = Rα is the linear span of α. What does this result say about complements in R³?
5.6 Show that any complement M of a subspace N of a vector space V is isomorphic to the quotient space V/N.

5.7 We suppose again that every subspace has a complement. Show that if T ∈ Hom V is not injective, then there is a nonzero S in Hom V such that T ∘ S = 0. Show that if T ∈ Hom V is not surjective, then there is a nonzero S in Hom V such that S ∘ T = 0.

5.8 Using the above exercise for half the arguments, show that T ∈ Hom V is injective if and only if T ∘ S = 0 ⟹ S = 0, and that T is surjective if and only if S ∘ T = 0 ⟹ S = 0. We thus have characterizations of injectivity and surjectivity that are formal, in the sense that they do not refer to the fact that S and T are transformations, but refer only to the algebraic properties of S and T as elements of an algebra.

5.9 Let M and N be complementary subspaces of a vector space V, and let X be a subspace such that X ∩ N = {0}. Show that there is a linear injection from X to M. [Hint: Consider the projection P of V onto M along N.] Show that any two complements of a subspace N are isomorphic by showing that the above injection is surjective if and only if X is a complement of N.

5.10 Going back to the first point of the preceding exercise, let Y be a complement of P[X] in M. Show that X ∩ Y = {0} and that X ⊕ Y is a complement of N.

5.11 Let M be a proper subspace of V, and let {αᵢ : i ∈ I} be a finite set in V. Set L = L({αᵢ}), and suppose that M + L = V. Show that there is a subset J ⊂ I such that {αᵢ : i ∈ J} spans a complement of M. [Hint: Consider a largest possible subset J such that M ∩ L({αᵢ}ⱼ) = {0}.]

5.12 Given T ∈ Hom(V, W) and S ∈ Hom(W, X), show that
a) S ∘ T is surjective ⟺ S is surjective and R(T) + N(S) = W;
b) S ∘ T is injective ⟺ T is injective and R(T) ∩ N(S) = {0};
c) S ∘ T is an isomorphism ⟺ S is surjective, T is injective, and W = R(T) ⊕ N(S).
5.13 Assuming that every subspace of V has a complement, show that T ∈ Hom V satisfies T² = 0 if and only if V has a direct sum decomposition V = M ⊕ N such that T = 0 on N and T[M] ⊂ N.

5.14 Suppose next that T³ = 0 but T² ≠ 0. Show that V can be written as V = V₁ ⊕ V₂ ⊕ V₃, where T[V₁] ⊂ V₂, T[V₂] ⊂ V₃, and T = 0 on V₃. (Assume again that any subspace of a vector space has a complement.)

5.15 We now suppose that Tⁿ = 0 but Tⁿ⁻¹ ≠ 0. Set Nⱼ = null space (Tʲ) for j = 1, …, n − 1, and let V₁ be a complement of Nₙ₋₁ in V. Show first that T[V₁] ∩ Nₙ₋₂ = {0} and that T[V₁] ⊂ Nₙ₋₁. Extend T[V₁] to a complement V₂ of Nₙ₋₂ in Nₙ₋₁, and show that in this way we can construct subspaces V₁, …, Vₙ such that V = ⊕ⁿ₁ Vᵢ, T[Vᵢ] ⊂ Vᵢ₊₁ for i < n, and T = 0 on Vₙ.

[A passage is missing from the scan here; its sense, from what follows, is this.] If T is a surjective linear map from V to W, then solving the equation T(ξ) = η linearly means finding a linear right inverse of T, that is, an S ∈ Hom(W, V) such that T ∘ S = I_W, the identity on W. Thus a solution process picks one solution vector ξ ∈ V for each η ∈ W in such a way that the solving ξ varies linearly with η. Taking this as our meaning of solving, we have the following fundamental reformulation.

Theorem 5.3. Let T be a surjective linear map from the vector space V to the vector space W, and let N be its null space. Then a subspace M is a complement of N if and only if the restriction of T to M is an isomorphism from M to W. The mapping M ↦ (T | M)⁻¹ is a bijection from the set of all such complementary subspaces M to the set of all linear right inverses of T.

Proof. It should be clear that a subspace M is the range of a linear right inverse of T (a map S such that T ∘ S = I_W) if and only if T | M is an isomorphism to W, in which case S = (T | M)⁻¹. Strictly speaking, the right inverse must be from W to V and therefore must be R = ι_M ∘ S, where ι_M is the identity injection from M to V. Then (R ∘ T)² = R ∘ (T ∘ R) ∘ T = R ∘ I_W ∘ T = R ∘ T, and R ∘ T is a projection whose range is M and whose null space is N (since R is injective). Thus V = M ⊕ N. Conversely, if V = M ⊕ N, then T | M is injective because M ∩ N = {0} and surjective because M + N = V, so that W = T[V] = T[M + N] = T[M] + T[N] = T[M] + {0} = T[M]. □

Polynomials in T.
The material in this subsection will be used in our study of differential equations with constant coefficients and in the proof of the diagonalizability of a symmetric matrix. In linear algebra it is basic in almost any approach to the canonical forms of matrices.

If p₁(t) = Σᵏ₀ aᵢtⁱ and p₂(t) = Σˡ₀ bⱼtʲ are any two polynomials, then their product is the polynomial

p(t) = p₁(t)p₂(t) = Σᵏ⁺ˡₙ₌₀ cₙtⁿ,  where cₙ = Σᵢ₊ⱼ₌ₙ aᵢbⱼ = Σⱼ aₙ₋ⱼbⱼ.

Now let T be any fixed element of Hom V, and for any polynomial q(t) let q(T) be the transformation obtained by replacing t by T. That is, if q(t) = Σᵏ₀ cᵢtⁱ, then q(T) = Σᵏ₀ cᵢTⁱ, where, of course, Tⁱ is the composition product T ∘ T ∘ ⋯ ∘ T with i factors. Then the bilinearity of composition (Theorem 3.3) shows that if p(t) = p₁(t)p₂(t), then p(T) = p₁(T) ∘ p₂(T). In particular, any two polynomials in T commute with each other under composition. More simply, the commutative law for addition implies that if p(t) = p₁(t) + p₂(t), then p(T) = p₁(T) + p₂(T).

The mapping p(t) ↦ p(T) from the algebra of polynomials to the algebra Hom V thus preserves addition, multiplication, and (obviously) scalar multiplication. That is, it preserves all the operations of an algebra and is therefore what is called an (algebra) homomorphism. The word "homomorphism" is a general term describing a mapping φ between two algebraic systems of the same kind such that φ preserves the operations of the system. Thus a homomorphism between vector spaces is simply a linear transformation, and a homomorphism between groups is a mapping preserving the one group operation. An accessible, but not really typical, example of the latter is the logarithm function, which is a homomorphism from the multiplicative group of positive real numbers to the additive group of R. The logarithm function is actually a bijective homomorphism and is therefore a group isomorphism.
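The homomorphism p(t) ↦ p(T) can be checked concretely. The sketch below is our own illustration in Python (coefficient lists, lowest degree first, stand for polynomials, and a 2 × 2 nested list for T); it verifies that the map preserves multiplication and that any two polynomials in a fixed T commute.

```python
def poly_mul(p, q):
    # product polynomial: c_n = sum over i + j = n of a_i * b_j
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def poly_at(p, T):
    # q(T) = sum c_i T^i, with T^0 = I
    n = len(T)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # I
    result = [[0] * n for _ in range(n)]
    for c in p:
        result = mat_add(result, [[c * x for x in row] for row in power])
        power = mat_mul(power, T)
    return result

T = [[2, 1], [0, 3]]                 # an arbitrary choice, ours
p1, p2 = [1, -2], [3, 0, 1]          # p1 = 1 - 2t,  p2 = 3 + t^2
lhs = poly_at(poly_mul(p1, p2), T)   # (p1 p2)(T)
rhs = mat_mul(poly_at(p1, T), poly_at(p2, T))
assert lhs == rhs                                        # multiplicative
assert rhs == mat_mul(poly_at(p2, T), poly_at(p1, T))    # p1(T), p2(T) commute
```

The two assertions hold for any T and any pair of polynomials, which is exactly the homomorphism property of p ↦ p(T).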
If this were a course in algebra, we would show that the division algorithm and the properties of the degree of a polynomial imply the following theorem. (However, see Exercises 5.16 through 5.20.)

Theorem 5.4. If p₁(t) and p₂(t) are relatively prime polynomials, then there exist polynomials a₁(t) and a₂(t) such that a₁(t)p₁(t) + a₂(t)p₂(t) = 1.

By relatively prime we mean having no common factors except constants. We shall assume this theorem and the results of the discussion preceding it in proving our next theorem.

We say that a subspace M ⊂ V is invariant under T ∈ Hom V if T[M] ⊂ M, so that T | M ∈ Hom M.

Theorem 5.5. Let T be any transformation in Hom V, and let q be any polynomial. Then the null space N of q(T) is invariant under T, and if q = q₁q₂ is any factorization of q into relatively prime factors and N₁ and N₂ are the null spaces of q₁(T) and q₂(T), respectively, then N = N₁ ⊕ N₂.

Proof. Since T ∘ q(T) = q(T) ∘ T, we see that if q(T)(α) = 0, then q(T)(Tα) = T(q(T)(α)) = 0, so T[N] ⊂ N. Note also that since q(T) = q₁(T) ∘ q₂(T), it follows that any α in N₂ is also in N, so N₂ ⊂ N. Similarly, N₁ ⊂ N. We can therefore replace V by N and T by T | N; hence we can assume that T ∈ Hom N and q(T) = q₁(T) ∘ q₂(T) = 0.

Now choose polynomials a₁ and a₂ so that a₁q₁ + a₂q₂ = 1. Since p ↦ p(T) is an algebraic homomorphism, we then have a₁(T) ∘ q₁(T) + a₂(T) ∘ q₂(T) = I. Set A₁ = a₁(T), etc., so that A₁ ∘ Q₁ + A₂ ∘ Q₂ = I, Q₁ ∘ Q₂ = 0, and all the operators Aᵢ, Qᵢ commute with each other. Finally, set Pᵢ = Aᵢ ∘ Qᵢ = Qᵢ ∘ Aᵢ for i = 1, 2. Then P₁ + P₂ = I and P₁P₂ = P₂P₁ = 0. Thus P₁ and P₂ are projections, and N is the direct sum of their ranges. Since each range is the null space of the other projection, we can rewrite this as N = N(P₁) ⊕ N(P₂). It remains for us to show that N(Pᵢ) = N(Qᵢ) = Nᵢ. Note first that since Q₁ ∘ P₂ = Q₁ ∘ Q₂ ∘ A₂ = 0, we have Q₁ = Q₁ ∘ I = Q₁ ∘ (P₁ + P₂) = Q₁ ∘ P₁.
Then the two identities Pᵢ = Aᵢ ∘ Qᵢ and Qᵢ = Qᵢ ∘ Pᵢ show that the null space of each of Pᵢ and Qᵢ is included in that of the other, and so they are equal. This completes the proof of the theorem. □

Corollary. Let p(t) = ∏ᵐ₁ pᵢ(t) be a factorization of the polynomial p(t) into relatively prime factors, let T be an element of Hom V, and set Nᵢ = N(pᵢ(T)) for i = 1, …, m and N = N(p(T)). Then N and all the Nᵢ are invariant under T, and N = ⊕ᵐ₁ Nᵢ.

Proof. The proof is by induction on m. The theorem is the case m = 2, and if we set q = ∏ᵐ₂ pᵢ(t) and M = N(q(T)), then the theorem implies that N = N₁ ⊕ M and that N₁ and M are invariant under T. Restricting T to M, we see that the inductive hypothesis implies that M = ⊕ᵐ₂ Nᵢ and that Nᵢ is invariant under T for i = 2, …, m. The corollary to Lemma 5.3 then yields our result. □

EXERCISES

5.16 Presumably the reader knows (or can see) that the degree d(P) of a polynomial P satisfies the laws

d(P + Q) ≤ max(d(P), d(Q)),  d(P·Q) = d(P) + d(Q),

if both P and Q are nonzero. The degree of the zero polynomial is undefined. (It would have to be −∞.) By induction on the degree of P, prove that for any two polynomials P and D, with D ≠ 0, there are polynomials Q and R such that P = DQ + R and d(R) < d(D) or R = 0. [Hint: If d(P) < d(D), we can take Q and R as what? If d(P) ≥ d(D), and if the leading terms of P and D are axⁿ and bxᵐ, respectively, with n ≥ m, then the polynomial

P′ = P − (a/b)xⁿ⁻ᵐ D

has degree less than d(P), so P′ = DQ′ + R by the inductive hypothesis. Now finish the proof.]

5.17 Assuming the above result, prove that R and Q are uniquely determined by P and D. (Assume also that P = DQ′ + R′, and prove from the properties of degree that R′ = R and Q′ = Q.) These two results together constitute the division algorithm for polynomials.

5.18 If P is any polynomial, P(x) = Σ aᵢxⁱ, and if c is any number, then of course P(c) is the number Σ aᵢcⁱ.
Prove from the division algorithm that for any polynomial P and any number c there is a polynomial Q such that

P(x) = (x − c)Q(x) + P(c),

and therefore that P(x) is divisible by (x − c) if and only if P(c) = 0.

5.19 Let P and Q be nonzero polynomials, and choose polynomials A₀ and B₀ such that among all the polynomials of the form AP + BQ the polynomial D = A₀P + B₀Q is nonzero and has minimum degree. Prove that D is a factor of both P and Q. (Suppose that D does not divide P and apply the division algorithm to get a contradiction with the choice of A₀ and B₀.)

5.20 Let P and Q be nonzero relatively prime polynomials. This means that if E is a common factor of P and Q (P = EP′, Q = EQ′), then E is a constant. Prove that there are polynomials A and B such that A(x)P(x) + B(x)Q(x) = 1. (Apply the above exercise.)

5.21 In the context of Theorem 5.5, show that the restriction of q₂(T) = Q₂ to N₁ is an isomorphism (from N₁ to N₁).

5.22 An involution on V is a mapping T ∈ Hom V such that T² = I. Show that if T is an involution, then V is a direct sum V = V₁ ⊕ V₂, where T(ξ) = ξ for every ξ ∈ V₁ (T = I on V₁) and T(ξ) = −ξ for every ξ ∈ V₂ (T = −I on V₂). (Apply Theorem 5.5.)

5.23 We noticed earlier (in an exercise) that if φ is any mapping from a set A to a set B, then f ↦ f ∘ φ is a linear map T_φ from R^B to R^A. Show now that if ψ: B → C, then T_{ψ∘φ} = T_φ ∘ T_ψ. (This should turn out to be a direct consequence of the associativity of composition.)

5.24 Let A be any set, and let φ: A → A be such that φ(φ(a)) = a for every a. Then T_φ: f ↦ f ∘ φ is an involution on V = R^A (since T_{φ∘φ} = T_φ ∘ T_φ). Show that the decomposition of R^R as the direct sum of the subspace of even functions and the subspace of odd functions arises from an involution on R^R defined by such a map φ: R → R.

5.25 Let V be a subspace of R^R consisting of differentiable functions, and suppose that V is invariant under differentiation (f ∈ V ⟹ Df ∈ V). Suppose also that on V the linear operator D ∈ Hom V satisfies D² − 2D − 3I = 0.
Prove that V is the direct sum of two subspaces M and N such that D = 3I on M and D = −I on N. Actually, it follows that M is the linear span of a single vector, and similarly for N. Find these two functions, if you can. (f′ = 3f ⟹ f = ce³ˣ.)

Block decompositions of linear maps. Given T in Hom V and a direct sum decomposition V = ⊕ⁿ₁ Vᵢ, with corresponding projections {Pᵢ}ⁿ₁, we can consider the maps Tᵢⱼ = Pᵢ ∘ T ∘ Pⱼ. Although Tᵢⱼ is from V to V, we may also want to consider it as being from Vⱼ to Vᵢ (in which case, strictly speaking, what is it?). We picture the Tᵢⱼ's arranged schematically in a rectangular array similar to a matrix. Furthermore, since T = Σᵢⱼ Tᵢⱼ, we call the doubly indexed family {Tᵢⱼ} the block decomposition of T associated with the given direct sum decomposition of V.

More generally, if T ∈ Hom(V, W) and W also has a direct sum decomposition W = ⊕ᵐ₁ Wᵢ, with corresponding projections {Qᵢ}ᵐ₁, then the family {Tᵢⱼ} defined by Tᵢⱼ = Qᵢ ∘ T ∘ Pⱼ and pictured as an m × n rectangular array is the block decomposition of T with respect to the two direct sum decompositions.

Whenever T in Hom V has a special relationship to a particular direct sum decomposition of V, the corresponding block diagram may have features that display these special properties in a vivid way; this then helps us to understand the nature of T better and to calculate with it more easily.

For example, if V = V₁ ⊕ V₂, then V₁ is invariant under T (i.e., T[V₁] ⊂ V₁) if and only if the block diagram is upper triangular, as shown in the following diagram.

T₁₁ | T₁₂
 0  | T₂₂

Suppose, next, that T² = 0. Letting V₁ be the range of T, and supposing that V₁ has a complement V₂, the reader should clearly see that the corresponding block diagram is

 0 | T₁₂
 0 |  0

This form is called strictly upper triangular; it is upper triangular and also zero on the main diagonal. Conversely, if T has some strictly upper-triangular 2 × 2 block diagram, then T² = 0.
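The converse claim just stated is easy to verify on a concrete matrix. In the sketch below (our own example), R⁴ = V₁ ⊕ V₂ with each Vᵢ the span of two standard basis vectors, and the only nonzero block is T₁₂ in the upper right; the particular numbers in that block are arbitrary.

```python
def mat_mul(A, B):
    # product of square matrices given as nested lists
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Strictly upper triangular in 2x2 blocks: only the T12 block is nonzero,
# so T maps V2 into V1 and kills V1.
T = [[0, 0, 5, -1],
     [0, 0, 2,  7],
     [0, 0, 0,  0],
     [0, 0, 0,  0]]

assert mat_mul(T, T) == [[0] * 4 for _ in range(4)]  # T^2 = 0
```

The assertion holds for any entries placed in the T₁₂ block: T sends V₂ into V₁ and is zero on V₁, so applying T twice gives zero, exactly as the block calculus predicts.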
If R is a composition product, R = S ∘ T, then its block components can be computed in terms of those of S and T. Thus

Rᵢₖ = Pᵢ ∘ R ∘ Pₖ = Pᵢ ∘ S ∘ T ∘ Pₖ = Pᵢ ∘ S ∘ (Σⱼ Pⱼ) ∘ T ∘ Pₖ = Σⱼ Sᵢⱼ ∘ Tⱼₖ.

We have used the identities Pⱼ ∘ Pⱼ = Pⱼ and Σⱼ Pⱼ = I. The 2 × 2 case is pictured below.

S₁₁T₁₁ + S₁₂T₂₁ | S₁₁T₁₂ + S₁₂T₂₂
S₂₁T₁₁ + S₂₂T₂₁ | S₂₁T₁₂ + S₂₂T₂₂

From this we can read off a fact that will be useful to us later: If T is 2 × 2 upper triangular (T₂₁ = 0), and if Tᵢᵢ is invertible as a map from Vᵢ to Vᵢ (i = 1, 2), then T is invertible and its inverse is

T₁₁⁻¹ | −T₁₁⁻¹ ∘ T₁₂ ∘ T₂₂⁻¹
  0   | T₂₂⁻¹

We find this solution by simply setting the product diagram equal to

I | 0
0 | I

and solving; but of course with the diagram in hand it can simply be checked to be correct.

EXERCISES

5.26 Show that if T ∈ Hom V, if V = ⊕ⁿ₁ Vᵢ, and if {Pᵢ} are the corresponding projections, then the sum of the transformations Tᵢⱼ = Pᵢ ∘ T ∘ Pⱼ is T.

5.27 If S and T are in Hom V and {Sᵢⱼ}, {Tᵢⱼ} are their block components with respect to some direct sum decomposition of V, show that Sᵢⱼ ∘ Tₖₗ = 0 if j ≠ k.

5.28 Verify that if T has an upper-triangular block diagram with respect to the direct sum decomposition V = V₁ ⊕ V₂, then V₁ is invariant under T.

5.29 Verify that if the diagram is strictly upper triangular, then T² = 0.

5.30 Show that if V = V₁ ⊕ V₂ ⊕ V₃ and T ∈ Hom V, then the subspaces Vᵢ are all invariant under T if and only if the block diagram for T is

T₁₁  0   0
 0  T₂₂  0
 0   0  T₃₃

Show that such a T is invertible if and only if Tᵢᵢ is invertible (as an element of Hom Vᵢ) for each i.

5.31 Supposing that T has an upper-triangular 2 × 2 block diagram and that Tᵢᵢ is invertible as an element of Hom Vᵢ for i = 1, 2, verify that T is invertible by forming the 2 × 2 block diagram that is the product of the diagram for T and the diagram given in the text as the inverse of T.

5.32 Supposing that T is as in the preceding exercise, show that S = T⁻¹ must have the given block diagram by considering the two equations T ∘ S = I and S ∘ T = I in their block form.
5.33 What would strictly upper triangular mean for a 3 × 3 block diagram? What is the corresponding property of T? Show that T has this property if and only if it has a strictly upper-triangular block diagram. (See Exercise 5.14.)

5.34 Suppose that T in Hom V satisfies Tⁿ = 0 (but Tⁿ⁻¹ ≠ 0). Show that T has a strictly upper-triangular n × n block decomposition. (Apply Exercise 5.15.)

6. BILINEARITY

Bilinear mappings. The notion of a bilinear mapping is important to the understanding of linear algebra because it is the vector setting for the duality principle (Section 0.10).

Definition. If U, V, and W are vector spaces, then a mapping ω: ⟨ξ, η⟩ ↦ ω(ξ, η) from U × V to W is bilinear if it is linear in each variable when the other variable is held fixed. That is, if we hold ξ fixed, then η ↦ ω(ξ, η) is linear [and so belongs to Hom(V, W)]; if we hold η fixed, then similarly ω(ξ, η) is in Hom(U, W) as a function of ξ.

This is not the same notion as linearity on the product vector space U × V. For example, ⟨x, y⟩ ↦ x + y is a linear mapping from R² to R, but it is not bilinear. If y is held fixed, then the mapping x ↦ x + y is affine (translation through y), but it is not linear unless y is 0. On the other hand, ⟨x, y⟩ ↦ xy is a bilinear mapping from R² to R, but it is not linear. If y
Sometimes the reinterpreta- tion provided by the above theorem provides new insights; at other times it seems less helpful. For example, the composition map <8, 17> Se 7 is bilinear, and the corollary of Theorem 3.3, which in effect states that composition on the right by a fixed T isa linear map, is simply part of an explicit statement of the bilinearity.. But the linear map 1’ > composition by is a complicated object that we have no need for except in the ease W = &. On the other hand, the linear combination formula 2} z,a;and Theorem 1.2 do receive new illumination. ‘Theorem 6.2. ‘The mapping w(x, #) = Df xia, is bilinear from R" x V* to V. The mapping a+ a is therefore @ lincar mapping from V™ to Hom(R", V), and, in fact, is an isomorphism, Proof. The linearity of w in x for a fixed x was proved in Theorem 1.2, and its Tincarity in « for a fixed s is seen in the same way. ‘Then « + wa is linear by ‘Theorem 6.1. Its bijeetivity was implicit in Theorem 1.2. 0 It should be remarked that we can use any finite index set J just as well as the special set 7 and conclude that w(x, a) = Lier «ass bilinear from @! x V! 16 pouNeariry 69 to V and that a++ wa is an isomorphism from V! to Hom(R!, V). Also note that aa = La in the terminology of Seetion 1. Corollary. ‘The sealar product (x, a) = Einav is to R; therefore, a+ @ = Ly is an isomorphism from R" to Hom(R, R) Natural isomorphisms. We often find two veetor spaces related to each other in such a way that a particular isomorphism between then is singled out, ‘This phenomenon is hard to pin down in general terms but easy to deseribe by examples. Duality is one souree of such “natural” isomorphisms. For example, an mx n matrix {{;3} is a real-valued funetion of the two variables , and. as such it is an element of the Cartesian space R™**, We ean also view {l,)} as 1a sequence of n column vectors in R". This is the dual point of view where we hold j fixed and obtain a funetion of ¢ for each j.. 
From this point of view {ty} isan element of (R)*. This correspondence between R™** and (RF) is clearly an isomorphism, and is an example of a natural isomorphism. ‘We review next the various ways of looking at Cartesian n-space itself One standard way of defining an ordered n-tuplet is by induction, The ordered, triplet <2, y, 2> is defined as the ordered pair < , 2, and the ordered, netuplet <2)... 24> is defined as <<2y,..-, 2.1% 2_>- Thus we define R" inductively by setting R' = Rand R" = R™' x R, ‘The ordered n-tuplet can also be defined as the function on % = {1,..-,n} which assigns 2; to i. ‘Then Kai eptn® = (Kary for all EV. (This is duality again! (2) isa function of the two variables Zand &2) And it is not hard to see that this identification of 7 with {771% is an isomorphism from Tl; Hom(V, W,) to Hom(V, TI: 7). Similarly, Theorem 3.7 identifies an n-tuple of linear maps {71} into a com- mon eodomain V with a single linear map T of an n-tuple variable, and this id tification is a natural isomorphism from IT} Hom(V;, ¥) to Hom(IT} Wi, V). 70 yeeton spaces 16 An arbitrary isomorphism between two veetor spaces identifies them in a transient way. For the moment we think of the veetor spaces as representing the same abstraet space, but only so long as the isomorphism is before us. If we shift to a different isomorphism between them, we obtain a new temporary identification. Natural isomorphisms, on the other hand, effect permanent identifications, and we think of paired objects as being two aspeets of the same ‘object in a deeper sense, ‘Thus we think of a matrix as “being” either a sequence of row veetors, a sequence of column veetors, or a single funetion of two integer indices. We shall take a final look at this question at the end of Seetion 3 in the next chapter. *We can now make the ultimate dissection of the theorems centering around the linear combination formula, Laws S1 through $3 state exactly that the scalar product za is bilinear. 
More precisely, they state that the mapping S: <x, α> ↦ xα from R × W to W is bilinear. In the language of Theorem 6.1, xα = w_α(x), and from that theorem we conclude that the mapping α ↦ w_α is an isomorphism from W to Hom(R, W). This isomorphism between W and Hom(R, W) extends to an isomorphism from W^n to (Hom(R, W))^n, which in turn is naturally isomorphic to Hom(R^n, W) by the second Cartesian product isomorphism. Thus W^n is naturally isomorphic to Hom(R^n, W); the mapping is α ↦ L_α, where L_α(x) = Σ_1^n x_i α_i. In particular, R^n is naturally isomorphic to the space Hom(R^n, R) of all linear functionals on R^n, the n-tuple a corresponding to the functional ℓ_a defined by ℓ_a(x) = Σ_1^n a_i x_i.

Also, (R^m)^n is naturally isomorphic to Hom(R^n, R^m). And since R^(m×n) is naturally isomorphic to (R^m)^n, it follows that the spaces R^(m×n) and Hom(R^n, R^m) are naturally isomorphic. This is simply our natural association of a transformation T in Hom(R^n, R^m) to an m × n matrix {t_ij}.

CHAPTER 2

FINITE-DIMENSIONAL VECTOR SPACES

We have defined a vector space to be finite-dimensional if it has a finite spanning set. In this chapter we shall focus our attention on such spaces, although this restriction is unnecessary for some of our discussion. We shall see that we can assign to each finite-dimensional space V a unique integer, called the dimension of V, which satisfies our intuitive requirements about dimensionality and which becomes a principal tool in the deeper explorations into the nature of such spaces. A number of "dimensional identities" are crucial in these further investigations. We shall find that the dual space of all linear functionals on V, V* = Hom(V, R), plays a more satisfactory role in finite-dimensional theory than in the context of general vector spaces. (However, we shall see later in the book that when we add limit theory to our algebra, there are certain special infinite-dimensional vector spaces for which the dual space plays an equally important role.)
A finite-dimensional space can be characterized as a vector space isomorphic to some Cartesian space R^n, and such an isomorphism allows a transformation T in Hom V to be "transferred" to R^n, whereupon it acquires a matrix. The theory of linear transformations on such spaces is therefore mirrored completely by the theory of matrices. In this chapter we shall push much deeper into the nature of this relationship than we did in Chapter 1. We also include a section on matrix computations, a brief section describing the trace and determinant functions, and a short discussion of the diagonalization of a quadratic form.

1. BASES

Consider again a fixed finite indexed set of vectors α = {α_i : i ∈ I} in V and the corresponding linear combination map L_α: x ↦ Σ x_i α_i from R^I to V having α as skeleton.

Definition. The finite indexed set {α_i : i ∈ I} is independent if the above mapping L_α is injective, and {α_i} is a basis for V if L_α is an isomorphism (onto V). In this situation we call {α_i : i ∈ I} an ordered basis or frame if I = n̄ = {1, ..., n} for some positive integer n.

Thus {α_i : i ∈ I} is a basis if and only if for each ξ ∈ V there exists a unique indexed "coefficient" set x = {x_i : i ∈ I} ∈ R^I such that ξ = Σ x_i α_i. The numbers x_i always exist because {α_i : i ∈ I} spans V, and x is unique because L_α is injective.

For example, we can check directly that b^1 = <2, 1> and b^2 = <1, −3> form a basis for R^2. The problem is to show that for each y ∈ R^2 there is a unique x such that y = x_1 b^1 + x_2 b^2, that is,

x_1<2, 1> + x_2<1, −3> = <2x_1 + x_2, x_1 − 3x_2>.

Since this vector equation is equivalent to the two scalar equations y_1 = 2x_1 + x_2 and y_2 = x_1 − 3x_2, we can find the unique solution x_1 = (3y_1 + y_2)/7, x_2 = (y_1 − 2y_2)/7 by the usual elimination method of secondary school algebra.

The form of these definitions is dictated by our interpretation of the linear combination formula as a linear mapping. The more usual definition of independence is a corollary.

Lemma 1.1.
The independence of the finite indexed set {α_i : i ∈ I} is equivalent to the property that Σ_I x_i α_i = 0 only if all the coefficients x_i are 0.

Proof. This is the property that the null space of L_α consist only of 0, and it is thus equivalent to the injectivity of L_α, that is, to the independence of {α_i}, by Lemma 1.1 of Chapter 1. □

If {α_i} is an ordered basis (frame) for V, the unique n-tuple x such that ξ = Σ_1^n x_i α_i is called the coordinate n-tuple of ξ (with respect to the basis {α_i}), and x_i is the ith coordinate of ξ. We call x_i α_i (and sometimes x_i) the ith component of ξ.

The mapping L_α will be called a basis isomorphism, and its inverse L_α^{−1}, which assigns to each vector ξ ∈ V its unique coordinate n-tuple x, is a coordinate isomorphism. The linear functional ξ ↦ x_j is the jth coordinate functional; it is the composition of the coordinate isomorphism ξ ↦ x with the jth coordinate projection x ↦ x_j on R^n. We shall see in Section 3 that the n coordinate functionals form a basis for V* = Hom(V, R).

In the above paragraph we took the index set I to be n̄ = {1, ..., n} and used the language of n-tuples. The only difference for an arbitrary finite index set is that we speak of a coordinate function x = {x_i : i ∈ I} instead of a coordinate n-tuple.

Our first concern will be to show that every finite-dimensional (finitely spanned) vector space has a basis. We start with some remarks about indices. We note first that a finite indexed set {α_i : i ∈ I} can be independent only if the indexing is injective as a mapping into V, for if α_m = α_l with m ≠ l, then Σ x_i α_i = 0, where x_m = 1, x_l = −1, and x_i = 0 for the remaining indices. Also, if {α_i : i ∈ I} is independent and J ⊂ I, then {α_i : i ∈ J} is independent, since if Σ_J x_i α_i = 0 and if we set x_i = 0 for i ∈ I − J, then Σ_I x_i α_i = 0, and so each x_i is 0.

A finite unindexed set is said to be independent if it is independent with respect to some (necessarily bijective) indexing. It will of course then be independent with respect to any bijective indexing.
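The two-dimensional example above, with b^1 = <2, 1> and b^2 = <1, −3>, can be checked by direct computation. The sketch below uses the closed-form coordinates x_1 = (3y_1 + y_2)/7, x_2 = (y_1 − 2y_2)/7 derived in the text and verifies exactly (with rational arithmetic) that the coordinate isomorphism inverts the basis isomorphism; the helper names are illustrative.

```python
# Coordinates with respect to the basis b1 = <2, 1>, b2 = <1, -3> of R^2,
# using the solution x1 = (3*y1 + y2)/7, x2 = (y1 - 2*y2)/7 from the text.
from fractions import Fraction

b1, b2 = (2, 1), (1, -3)

def coords(y):
    """Coordinate isomorphism: the unique (x1, x2) with y = x1*b1 + x2*b2."""
    y1, y2 = y
    return Fraction(3 * y1 + y2, 7), Fraction(y1 - 2 * y2, 7)

def recombine(x1, x2):
    """Basis isomorphism L_b: (x1, x2) -> x1*b1 + x2*b2."""
    return (x1 * b1[0] + x2 * b2[0], x1 * b1[1] + x2 * b2[1])

# Round trip through the two isomorphisms is the identity on R^2.
for y in [(1, 0), (0, 1), (5, -4)]:
    x1, x2 = coords(y)
    assert recombine(x1, x2) == y
```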
An arbitrary set is independent if every finite subset is independent. It follows that a set A is dependent (not independent) if and only if there exist distinct elements α_1, ..., α_n in A and scalars x_1, ..., x_n not all zero such that Σ_1^n x_i α_i = 0. An unindexed basis would be defined in the obvious way. However, a set can always be regarded as being indexed, by itself if necessary!

Lemma 1.2. If B is an independent subset of a vector space V and β is any vector not in the linear span L(B), then B ∪ {β} is independent.

Proof. Otherwise there is a zero linear combination, xβ + Σ_1^n x_i β_i = 0, where β_1, ..., β_n are distinct elements of B and the coefficients are not all 0. But then x cannot be zero; if it were, the equation would contradict the independence of B. We can therefore divide by x and solve for β, so that β ∈ L(B), a contradiction. □

The reader will remember that we call a vector space V finite-dimensional if it has a finite spanning set {α_i}_1^n. We can use the above lemma to construct a basis for such a V by choosing some of the α's. We simply run through the sequence {α_i} and choose those members that increase the linear span of the preceding choices. We end up with a spanning set since {α_i}_1^n spans, and our subsequence is independent at each step, by the lemma. In the same way we can extend an independent set {β_j} to a basis by choosing some members of a spanning set {α_i}_1^n. This procedure is intuitive, but it is messy to set up rigorously. We shall therefore proceed differently.

Theorem 1.1. Any minimal finite spanning set is a basis, and therefore any finite-dimensional vector space V has a basis. More generally, if {β_j : j ∈ J} is a finite independent set and {α_i : i ∈ I} is a finite spanning set, and if K is a smallest subset of I such that {β_j}_J ∪ {α_i}_K spans, then this collection is independent and a basis. Therefore, any finite independent subset of a finite-dimensional space can be extended to a basis.

Proof.
It is sufficient to prove the second assertion, since it includes the first as a special case. If {β_j}_J ∪ {α_i}_K is not independent, then there is a nontrivial zero linear combination Σ_J y_j β_j + Σ_K x_i α_i = 0. If every x_i were zero, this equation would contradict the independence of {β_j}_J. Therefore, some x_l is not zero, and we can solve the equation for α_l. That is, if we set L = K − {l}, then the linear span of {β_j}_J ∪ {α_i}_L contains α_l. It therefore includes the whole original spanning set and hence is V. But this contradicts the minimal nature of K, since L is a proper subset of K. Consequently, {β_j}_J ∪ {α_i}_K is independent. □

We next note that R^n itself has a very special basis. In the indexing map i ↦ α_i the vector α_j corresponds to the index j, but under the linear combination map x ↦ Σ x_i α_i the vector α_j corresponds to the function δ^j which has the value 1 at j and the value 0 elsewhere, so that Σ_i δ^j(i) α_i = α_j. This function δ^j is called a Kronecker delta function. It is clearly the characteristic function χ_B of the one-point set B = {j}, and the symbol 'δ^j' is ambiguous, just as 'χ_B' is ambiguous; in each case the meaning depends on what domain is implicit from the context. We have already used the delta functions on R^n in proving Theorem 1.2 of Chapter 1.

Theorem 1.2. The Kronecker functions {δ^i}_1^n form a basis for R^n.

Proof. Since Σ_i x_i δ^i(j) = x_j by the definition of δ^i, we see that Σ_i x_i δ^i is the n-tuple x itself, so the linear combination mapping L_δ: x ↦ Σ_i x_i δ^i is the identity mapping x ↦ x, a trivial isomorphism. □

Among all possible indexed bases for R^n, the Kronecker basis is thus singled out by the fact that its basis isomorphism is the identity; for this reason it is called the standard basis or the natural basis for R^n. The same holds for R^I for any finite set I.

Finally, we shall draw some elementary conclusions from the existence of a basis.

Theorem 1.3.
If T ∈ Hom(V, W) is an isomorphism and α = {α_i : i ∈ I} is a basis for V, then {T(α_i) : i ∈ I} is a basis for W.

Proof. By hypothesis L_α is an isomorphism in Hom(R^I, V), and so T ∘ L_α is an isomorphism in Hom(R^I, W). Its skeleton {T(α_i)} is therefore a basis for W. □

We can view any basis {α_i} as the image of the standard basis {δ^i} under the basis isomorphism. Conversely, any isomorphism θ: R^I → V becomes a basis isomorphism for the basis α_i = θ(δ^i).

Theorem 1.4. If X and Y are complementary subspaces of a vector space V, then the union of a basis for X and a basis for Y is a basis for V. Conversely, if a basis for V is partitioned into two sets, with linear spans X and Y, respectively, then X and Y are complementary subspaces of V.

Proof. We prove only the first statement. If {α_i : i ∈ J} is a basis for X and {α_i : i ∈ K} is a basis for Y, then it is clear that {α_i : i ∈ J ∪ K} spans V, since its span includes both X and Y, and so X + Y = V. Suppose then that Σ_{J∪K} x_i α_i = 0. Setting ξ = Σ_J x_i α_i and η = Σ_K x_i α_i, we see that ξ ∈ X, η ∈ Y, and ξ + η = 0. But then ξ = η = 0, since X and Y are complementary. And then x_i = 0 for i ∈ J because {α_i}_J is independent, and x_i = 0 for i ∈ K because {α_i}_K is independent. Therefore, {α_i}_{J∪K} is a basis for V. We leave the converse argument as an exercise. □

Corollary. If V = ⊕_1^n V_i and B_i is a basis for V_i, then B = ∪_1^n B_i is a basis for V.

Proof. We see from the theorem that B_1 ∪ B_2 is a basis for V_1 ⊕ V_2. Proceeding inductively, we see that ∪_{i=1}^j B_i is a basis for ⊕_{i=1}^j V_i for each j, and the corollary is the case j = n. □

If we follow a coordinate isomorphism by a linear combination map, we get the mapping of the following existence theorem, which we state only in n-tuple form.

Theorem 1.5. If β = {β_i}_1^n is an ordered basis for the vector space V, and if {α_i}_1^n is any n-tuple of vectors in a vector space W, then there exists a unique S ∈ Hom(V, W) such that S(β_i) = α_i for i = 1, ..., n.

Proof.
By hypothesis L_β is an isomorphism in Hom(R^n, V), and so S = L_α ∘ (L_β)^{−1} is an element of Hom(V, W) such that S(β_i) = L_α(δ^i) = α_i. Conversely, if S ∈ Hom(V, W) is such that S(β_i) = α_i for all i, then S ∘ L_β = L_α, so that S = L_α ∘ (L_β)^{−1}. □

It is natural to ask how the unique S above varies with the n-tuple {α_i}. The answer is: linearly and "isomorphically".

Theorem 1.6. Let {β_i}_1^n be a fixed ordered basis for the vector space V, and for each n-tuple α = {α_i}_1^n chosen from the vector space W let S_α ∈ Hom(V, W) be the unique transformation defined above. Then the map α ↦ S_α is an isomorphism from W^n to Hom(V, W).

Proof. As above, S_α = L_α ∘ θ^{−1}, where θ is the basis isomorphism L_β. Now we know from Theorem 6.2 of Chapter 1 that α ↦ L_α is an isomorphism from W^n to Hom(R^n, W), and composition on the right by the fixed coordinate isomorphism θ^{−1} is an isomorphism from Hom(R^n, W) to Hom(V, W) by the corollary to Theorem 3.3 of Chapter 1. Composing these two isomorphisms gives us the theorem. □

*Infinite bases. Most vector spaces do not have finite bases, and it is natural to try to extend the above discussion to index sets I that may be infinite. The Kronecker functions {δ^i : i ∈ I} have the same definitions, but they no longer span R^I. By definition f is a linear combination of the functions δ^i if and only if f is of the form Σ_{i∈I_0} c_i δ^i, where I_0 is a finite subset of I. But then f = 0 outside of I_0. Conversely, if f ∈ R^I is 0 except on a finite set I_0, then f = Σ_{i∈I_0} f(i)δ^i. The linear span of {δ^i : i ∈ I} is thus exactly the set of all functions in R^I that are zero except on a finite set. We shall designate this subspace R_I.

If {α_i : i ∈ I} is an indexed set of vectors in V and f ∈ R_I, then the sum Σ_{i∈I} f(i)α_i becomes meaningful if we adopt the reasonable convention that the sum of an arbitrary number of 0's is 0. Then Σ_{i∈I} f(i)α_i = Σ_{i∈I_0} f(i)α_i, where I_0 is any finite subset of I outside of which f is zero. With this convention, L_α: f ↦ Σ f(i)α_i is a linear map from R_I to V, as in Theorem 1.2 of Chapter 1.
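The subspace R_I of finitely supported functions can be modeled computationally even when the index set I is infinite, because each f records only finitely many nonzero values. The sketch below (the modeling choices are illustrative, not from the text) represents f as a dict and an infinite indexed family {α_i} lazily, and evaluates L_α by summing over the support of f.

```python
# A sketch of R_I (finitely supported coefficient functions on an index set I)
# and the linear combination map L_alpha : f -> sum_i f(i) * alpha_i, with
# the convention that the sum of arbitrarily many 0's is 0.

def L(alpha, f):
    """alpha: indexable family of vectors in R^2; f: dict of nonzero coefficients."""
    out = (0, 0)
    for i, fi in f.items():          # the sum is genuinely finite
        ai = alpha[i]
        out = (out[0] + fi * ai[0], out[1] + fi * ai[1])
    return out

# An "infinite" indexed family, given lazily: alpha_i = <i, 1> for i = 0, 1, 2, ...
class Alpha:
    def __getitem__(self, i):
        return (i, 1)

f = {2: 5, 10: -1}                   # zero except at indices 2 and 10
assert L(Alpha(), f) == (0, 4)       # 5*<2,1> - 1*<10,1> = <0, 4>
```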
And with the same convention, Σ_{i∈I} f(i)α_i is an elegant expression for the general linear combination of the vectors α_i. Instead of choosing a finite subset I_0 and numbers c_i for just those indices i in I_0, we define c_i for all i ∈ I, but with the stipulation that c_i = 0 for all but a finite number of indices. That is, we take c = {c_i : i ∈ I} as a function in R_I.

We make the same definitions of independence and basis as before. Then {α_i : i ∈ I} is a basis for V if and only if L_α: R_I → V is an isomorphism, i.e., if and only if for each ξ ∈ V there exists a unique x ∈ R_I such that ξ = Σ x_i α_i. By using an axiom of set theory called the axiom of choice, it can be shown that every vector space has a basis in this sense and that any independent set can be extended to a basis. Then Theorems 1.4 and 1.5 hold with only minor changes in notation. In particular, if a basis for a subspace M of V is extended to a basis for V, then the linear span of the added part is a subspace X complementary to M. Thus, in a purely algebraic sense, every subspace has complementary subspaces. We assume this fact in some of our exercises.

The above sums are always finite (despite appearances), and the above notion of basis is purely algebraic. However, infinite bases in this sense are not very useful in analysis, and we shall therefore concentrate for the present on spaces that have finite bases (i.e., are finite-dimensional). Then in one important context later on we shall discuss infinite bases where the sums are genuinely infinite by virtue of limit theory.

EXERCISES

1.1 Show by a direct computation that {<1, −1>, <0, 1>} is a basis for R^2.

1.2 The student must realize that the ith coordinate of a vector depends on the whole basis and not just on the ith basis vector. Prove this for the second coordinate of vectors in R^2, using the standard basis and the basis of the above exercise.

1.3 Show that {<1, 1>, <1, 2>} is a basis for V = R^2.
The basis isomorphism from R^2 to V is now from R^2 to R^2. Find its matrix. Find the matrix of the coordinate isomorphism. Compute the coordinates, with respect to this basis, of <−1, 1>, <0, 1>, <2, 3>.

1.4 Show that {b^i}_1^3, where b^1 = <1, 0, 0>, b^2 = <1, 1, 0>, and b^3 = <1, 1, 1>, is a basis for R^3.

1.5 In the above exercise find the three linear functionals that are the coordinate functionals with respect to the given basis. Since x = Σ_1^3 y_i b^i, finding the y_i is equivalent to solving x = Σ_1^3 y_i b^i for the y's in terms of x = <x_1, x_2, x_3>.

1.6 Show that any set of polynomials no two of which have the same degree is independent.

1.7 Show that if {α_i}_1^n is an independent subset of V and T in Hom(V, W) is injective, then {T(α_i)}_1^n is an independent subset of W.

1.8 Show that if T is any element of Hom(V, W) and {T(α_i)}_1^n is independent in W, then {α_i}_1^n is independent in V.

1.9 Later on we are going to call a vector space V n-dimensional if every basis for V contains exactly n elements. If V is the span of a single vector α, so that V = Rα, then V is clearly one-dimensional. Let {V_i}_1^n be a collection of one-dimensional subspaces of a vector space V, and choose a nonzero vector α_i in V_i for each i. Prove that {α_i}_1^n is independent if and only if the subspaces {V_i}_1^n are independent, and that {α_i}_1^n is a basis if and only if V = ⊕_1^n V_i.

1.10 Finish the proof of Theorem 1.4.

1.11 Give a proof of Theorem 1.4 based on the existence of isomorphisms.

1.12 The reader would guess, and we shall prove in the next section, that every subspace of a finite-dimensional space is finite-dimensional. Prove now that a subspace N of a finite-dimensional vector space V is finite-dimensional if and only if N has a complement M. (Work from a combination of Theorems 1.1 and 1.4 and direct sum projections.)

1.13 Since {b^i}_1^3 = {<1, 0, 0>, <1, 1, 0>, <1, 1, 1>} is a basis for R^3, there is a unique T in Hom(R^3, R^2) such that T(b^1) = <1, 0>, T(b^2) = <0, 1>, and T(b^3) = <1, 1>. Find the matrix of T. (Find T(δ^i) for i
= 1, 2, 3.)

1.14 Find, similarly, the S in Hom R^3 such that S(b^i) = δ^i for i = 1, 2, 3.

1.15 Show that the infinite sequence {t^n}_0^∞ is a basis for the vector space of all polynomials.

2. DIMENSION

The concept of dimension rests on the fact that two different bases for the same space always contain the same number of elements. This number, which is then the number of elements in every basis for V, is called the dimension of V. It tells all there is to know about V to within isomorphism: there exists an isomorphism between two spaces if and only if they have the same dimension. We shall consider only finite dimensions. If V is not finite-dimensional, its dimension is an infinite cardinal number, a concept with which the reader is probably unfamiliar.

Lemma 2.1. If V is finite-dimensional and T in Hom V is surjective, then T is an isomorphism.

Proof. Let n be the smallest number of elements that can span V. That is, there is some spanning set {α_i}_1^n, and none with fewer than n elements. Then {α_i}_1^n is a basis, by Theorem 1.1, and the linear combination map θ: x ↦ Σ_1^n x_i α_i is accordingly a basis isomorphism. But {β_i}_1^n = {T(α_i)}_1^n also spans, since T is surjective, and so T ∘ θ is also a basis isomorphism, for the same reason. Then T = (T ∘ θ) ∘ θ^{−1} is an isomorphism. □

Theorem 2.1. If V is finite-dimensional, then all bases for V contain the same number of elements.

Proof. Two bases with n and m elements determine basis isomorphisms θ: R^n → V and φ: R^m → V. Suppose that m < n and, viewing R^n as R^m × R^{n−m}, let π be the projection of R^n onto R^m, π(<x_1, ..., x_m, x_{m+1}, ..., x_n>) = <x_1, ..., x_m>. Since T = θ^{−1} ∘ φ is an isomorphism from R^m to R^n, and T ∘ π: R^n → R^n is therefore surjective, it follows from the lemma that T ∘ π is an isomorphism.
But it isn't, because π(δ^n) = 0, so that (T ∘ π)(δ^n) = T(0) = 0 although δ^n ≠ 0, and we have a contradiction. Therefore no basis can be smaller than any other basis. □

The integer that is the number of elements in every basis for V is of course called the dimension of V, and we designate it d(V). Since the standard basis {δ^i}_1^n for R^n has n elements, we see that R^n is n-dimensional in this precise sense.

Corollary. Two finite-dimensional vector spaces are isomorphic if and only if they have the same dimension.

Proof. If T is an isomorphism from V to W and B is a basis for V, then T[B] is a basis for W by Theorem 1.3. Therefore d(V) = #B = #T[B] = d(W), where #A is the number of elements in A. Conversely, if d(V) = d(W) = n, then V and W are each isomorphic to R^n and so to each other. □

Theorem 2.2. Every subspace M of a finite-dimensional vector space V is finite-dimensional.

Proof. Let 𝒜 be the family of finite independent subsets of M. By Theorem 1.1, if A ∈ 𝒜, then A can be extended to a basis for V, and so #A ≤ d(V). Thus {#A : A ∈ 𝒜} is a finite set of integers, and we can choose B ∈ 𝒜 such that n = #B is the maximum of this finite set. But then L(B) = M, because otherwise for any α ∈ M − L(B) we have B ∪ {α} ∈ 𝒜, by Lemma 1.2, and #(B ∪ {α}) = n + 1, contradicting the maximal nature of B. Thus M is finitely spanned. □

Corollary. Every subspace M of a finite-dimensional space V has a complement.

Proof. Use Theorem 1.1 to extend a basis for M to a basis for V, and let N be the linear span of the added vectors. Then apply Theorem 1.4. □

Dimensional identities. We now prove two basic dimensional identities. We will always assume V finite-dimensional.

Lemma 2.2. If V_1 and V_2 are complementary subspaces of V, then d(V) = d(V_1) + d(V_2). More generally, if V = ⊕_1^n V_i, then d(V) = Σ_1^n d(V_i).

Proof. This follows at once from Theorem 1.4 and its corollary. □

Theorem 2.3. If U and W are subspaces of a finite-dimensional vector space, then d(U + W) + d(U ∩ W) = d(U) + d(W).

Proof. Let V be a complement of U ∩ W in U. We start by showing that then V is also a complement of W in U + W.
First,

V + W = V + ((U ∩ W) + W) = (V + (U ∩ W)) + W = U + W.

We have used the obvious fact that the sum of a vector space and a subspace is the vector space. Next,

V ∩ W = (V ∩ U) ∩ W = V ∩ (U ∩ W) = {0},

because V is a complement of U ∩ W in U. We thus have both V + W = U + W and V ∩ W = {0}, and so V is a complement of W in U + W by the corollary of Lemma 5.2 of Chapter 1. The theorem is now a corollary of the above lemma. We have

d(U) + d(W) = (d(U ∩ W) + d(V)) + d(W) = d(U ∩ W) + (d(V) + d(W)) = d(U ∩ W) + d(U + W). □

Theorem 2.4. Let V be finite-dimensional, and let W be any vector space. Let T ∈ Hom(V, W) have null space N (in V) and range R (in W). Then R is finite-dimensional and d(V) = d(N) + d(R).

Proof. Let U be a complement of N in V. Then we know that T restricted to U is an isomorphism onto R. (See Theorem 5.3 of Chapter 1.) Therefore, R is finite-dimensional and d(R) + d(N) = d(U) + d(N) = d(V) by our first identity. □

Corollary. If W is finite-dimensional and d(W) = d(V), then T is injective if and only if it is surjective, so that in this case injectivity, surjectivity, and bijectivity are all equivalent.

Proof. T is surjective if and only if R = W. But this is equivalent to d(R) = d(W), and if d(W) = d(V), then the theorem shows this in turn to be equivalent to d(N) = 0, that is, to N = {0}. □

Theorem 2.5. If d(V) = n and d(W) = m, then Hom(V, W) is finite-dimensional and its dimension is mn.

Proof. By Theorem 1.6, Hom(V, W) is isomorphic to W^n, which is the direct sum of the n subspaces isomorphic to W under the injections θ_i for i = 1, ..., n. The dimension of W^n is therefore Σ_1^n m = mn by Lemma 2.2. □

Another proof of Theorem 2.5 will be available in Section 4.

EXERCISES

2.1 Prove that if d(V) = n, then any spanning subset of n elements is a basis.

2.2 Prove that if d(V) = n, then any independent subset of n elements is a basis.

2.3 Show that if d(V) = n and W is a subspace of the same dimension, then W = V.
2.4 Prove by using dimensional identities that if f is a nonzero linear functional on an n-dimensional space V, then its null space has dimension n − 1.

2.5 Prove by using dimensional identities that if f is a linear functional on a finite-dimensional space V, and if α is a vector not in its null space N, then V = N ⊕ Rα.

2.6 Given that N is an (n − 1)-dimensional subspace of an n-dimensional vector space V, show that N is the null space of a linear functional.

2.7 Let X and Y be subspaces of a finite-dimensional vector space V, and suppose that T in Hom(V, W) has null space N = X ∩ Y. Show that T[X + Y] = T[X] ⊕ T[Y], and then deduce Theorem 2.3 from Lemma 2.2 and Theorem 2.4. This proof still depends on the existence of a T having X ∩ Y as its null space. Do we know of any such T?

2.8 Show that if V is finite-dimensional and S, T ∈ Hom V, then S ∘ T = I ⟹ T is invertible. Show also that T ∘ S = I ⟹ T is invertible.

2.9 A subspace N of a vector space V has finite codimension n if the quotient space V/N is finite-dimensional, with dimension n. Show that a subspace N has finite codimension n if and only if N has a complementary subspace M of dimension n. (Move a basis for V/N back into V.) Do not assume V to be finite-dimensional.

2.10 Show that if N_1 and N_2 are subspaces of a vector space V with finite codimensions, then N = N_1 ∩ N_2 has finite codimension and cod(N) ≤ cod(N_1) + cod(N_2). (Consider the mapping ξ ↦ <ξ̄_1, ξ̄_2>, where ξ̄_i is the coset of N_i containing ξ.)

2.11 In the above exercise, suppose that cod(N_1) = cod(N_2), that is, d(V/N_1) = d(V/N_2). Prove that d(N_1/N) = d(N_2/N).

2.12 Given nonzero vectors β in V and f in V* such that f(β) ≠ 0, show that some scalar multiple of the mapping ξ ↦ f(ξ)β is a projection.
Prove that any projection having a one-dimensional range arises in this way.

2.13 We know that the choice of an origin O in Euclidean 3-space E^3 induces a vector space structure in E^3 (under the correspondence X ↦ OX) and that this vector space is three-dimensional. Show that a geometric plane through O becomes a two-dimensional subspace.

2.14 An m-dimensional plane M is a translate N + α_0 of an m-dimensional subspace N. Let {β_i}_1^m be any basis of N, and set α_i = β_i + α_0. Show that M is exactly the set of linear combinations Σ_0^m x_i α_i such that Σ_0^m x_i = 1.

2.15 Show that Exercise 2.14 is a corollary of Exercise 4.14 of Chapter 1.

2.16 Show, conversely, that if a plane M is the affine span of m + 1 elements, then its dimension is ≤ m.

2.17 From the above two exercises concoct a direct definition of the dimension of an affine subspace.

2.18 Write a small essay suggested by the following definition. An (m + 1)-tuple {α_i}_0^m is affinely independent if the conditions Σ_0^m x_i α_i = 0 and Σ_0^m x_i = 0 together imply that x_i = 0 for all i.

2.19 A polynomial on a vector space V is a real-valued function on V which can be represented as a finite sum of finite products of linear functionals. Define the degree of a polynomial; define a homogeneous polynomial of degree k. Show that the set of homogeneous polynomials of degree k is a vector space X_k.

2.20 Continuing the above exercise, show that if k_1 < k_2 < ⋯ < k_n, then the vector spaces {X_{k_i}}_1^n are independent subspaces of the vector space of all polynomials. [Assume that a polynomial p(t) of a real variable can be the zero function only if all its coefficients are 0. For any polynomial P on V consider the polynomials p(t) = P(tα).]

2.21 Let {β_1, β_2} be a basis for the two-dimensional space V, and let {λ_1, λ_2} be the corresponding coordinate projections (dual basis in V*). Show that every polynomial on V "is a polynomial in the two variables λ_1 and λ_2".

2.22 Let {β_1, β_2} be a basis for a two-dimensional vector space V, and let {λ_1, λ_2} be the corresponding coordinate projections (dual basis for V*).
Let F be a mapping between two-dimensional spaces such that for any ξ, η ∈ V and any l ∈ W*, l(F(tξ + η)) is a quadratic function of t, that is, of the form at^2 + bt + c. Show that F is quadratic according to your definition in the above exercises.

3. THE DUAL SPACE

Although throughout this section all spaces will be assumed finite-dimensional, many of the definitions and properties are valid for infinite-dimensional spaces as well. But for such spaces there is a difference between purely algebraic situations and situations in which algebra is mixed with hypotheses of continuity. One of the blessings of finite dimensionality is the absence of this complication. As the reader has probably surmised from the number of special linear functionals we have met, particularly the coordinate functionals, the space Hom(V, R) of all linear functionals on V plays a special role.

Definition. The dual space (or conjugate space) V* of the vector space V is the vector space Hom(V, R) of all linear mappings from V to R. Its elements are called linear functionals.

We are going to see that in a certain sense V is in turn the dual space of V* (V and (V*)* are naturally isomorphic), so that the two spaces are symmetrically related. We shall briefly study the notion of annihilation (orthogonality) which has its origins in this setting, and then see that there is a natural isomorphism between Hom(V, W) and Hom(W*, V*). This gives the mathematician a new tool to use in studying a linear transformation T in Hom(V, W); the relationship between T and its image T* exposes new properties of T itself.

Dual bases. At the outset one naturally wonders how big a space V* is, and we settle the question immediately.

Theorem 3.1. Let {β_i}_1^n be an ordered basis for V, and let δ_j be the corresponding jth coordinate functional on V: δ_j(ξ) = x_j, where ξ = Σ_1^n x_i β_i. Then {δ_j}_1^n is an ordered basis for V*.

Proof. Let us first make the proof by a direct elementary calculation.
a) Independence. Suppose that Σ_1^n c_j δ_j = 0, that is, Σ_1^n c_j δ_j(ξ) = 0 for all ξ ∈ V. Taking ξ = β_i and remembering that the coordinate n-tuple of β_i is δ^i, we see that the above equation reduces to c_i = 0, and this for all i. Therefore, {δ_j}_1^n is independent.

b) Spanning. First note that the basis expansion ξ = Σ_1^n x_i β_i can be rewritten ξ = Σ δ_i(ξ)β_i. Then for any λ ∈ V* we have λ(ξ) = Σ_1^n l_i δ_i(ξ), where we have set l_i = λ(β_i). That is, λ = Σ l_i δ_i. This shows that {δ_i} spans V*, and, together with (a), that it is a basis. □

Definition. The basis {δ_i} for V* is called the dual of the basis {β_i} for V.

As usual, one of our fundamental isomorphisms is lurking behind all this, but we shall leave its exposure to an exercise.

Corollary. d(V*) = d(V).

The three equations

ξ = Σ_1^n δ_i(ξ)β_i,    λ = Σ_1^n λ(β_i)δ_i,    λ(ξ) = Σ_1^n λ(β_i)δ_i(ξ)

are worth looking at. The first two are symmetrically related, each presenting the basis expansion of a vector with its coefficients computed by applying the corresponding element of the dual basis to the vector. The third is symmetric itself between ξ and λ.

Since a finite-dimensional space V and its dual space V* have the same dimension, they are of course isomorphic. In fact, each basis for V defines an isomorphism, for we have the associated coordinate isomorphism from V to R^n, the dual basis isomorphism from R^n to V*, and therefore the composite isomorphism from V to V*. This isomorphism varies with the basis, however, and there is in general no natural isomorphism between V and V*. It is another matter with Cartesian space R^n, because it has a standard basis, and therefore a standard isomorphism with its dual space (R^n)*. It is not hard to see that this is the isomorphism a ↦ L_a, where L_a(x) = Σ_1^n a_i x_i, that we discussed in Section 1.6. We can therefore feel free to identify R^n with (R^n)*, only keeping in mind that when we think of an n-tuple a as a linear functional, we mean the functional L_a(x) = Σ_1^n a_i x_i.

The second conjugate space.
Despite the fact that V and V* are not naturally isomorphic in general, we shall now see that V is naturally isomorphic to V** = (V*)*.

Theorem 3.2. The function w: V × V* → R defined by w(ξ, f) = f(ξ) is bilinear, and the mapping ξ ↦ w_ξ from V to V** is a natural isomorphism.

Proof. In this context we generally set ξ** = w_ξ, so that ξ** is defined by ξ**(f) = f(ξ) for all f ∈ V*. The bilinearity of w should be clear, and Theorem 6.1 of Chapter 1 therefore applies. The reader might like to run through a direct check of the linearity of ξ ↦ ξ**, starting with (c_1 ξ_1 + c_2 ξ_2)**(f).

There still is the question of the injectivity of this mapping. If α ≠ 0, we can find f ∈ V* so that f(α) ≠ 0. One way is to make α the first vector of an ordered basis and to take f as the first functional in the dual basis; then f(α) = 1. Since α**(f) = f(α) ≠ 0, we see in particular that α** ≠ 0. The mapping ξ ↦ ξ** is thus injective, and it is then bijective by the corollary of Theorem 2.4. □

If we think of V** as being naturally identified with V in this way, the two spaces V and V* are symmetrically related to each other. Each is the dual of the other. In the expression 'f(ξ)' we think of both symbols as variables and then hold one or the other fixed for the two interpretations. In such a situation we often use a more symmetric symbolism, such as (ξ, f), to indicate our intention to treat both symbols as variables.

Lemma 3.1. If {λ_i} is the basis in V* dual to the basis {α_i} in V, then {α_i**} is the basis in V** dual to the basis {λ_i} in V*.

Proof. We have α_i**(λ_j) = λ_j(α_i) = δ_ij, which shows that α_i** is the ith coordinate projection. In case the reader has forgotten, the basis expansion f = Σ c_j λ_j implies that α_i**(f) = f(α_i) = (Σ c_j λ_j)(α_i) = c_i, so that α_i** is the mapping f ↦ c_i. □

Annihilator subspaces. It is in this dual situation that orthogonality first naturally appears.
However, we shall save the term 'orthogonal' for the later context in which V and V* have been identified through a scalar product, and shall speak here of the annihilator of a set rather than its orthogonal complement.

Definition. If A ⊂ V, the annihilator of A, A°, is the set of all f in V* such that f(α) = 0 for all α in A. Similarly, if A ⊂ V*, then A° = {α ∈ V : f(α) = 0 for all f ∈ A}. If we view V as (V*)*, the second definition is included in the first.

The following properties are easily established and will be left as exercises:

1) A° is always a subspace.
2) A ⊂ B ⇒ B° ⊂ A°.
3) (L(A))° = A°.
4) (A ∪ B)° = A° ∩ B°.
5) A ⊂ A°°.

We now add one more crucial dimensional identity to those of the last section.

Theorem 3.3. If W is a subspace of V, then d(V) = d(W) + d(W°).

Proof. Let {β_i}₁^m be a basis for W, and extend it to a basis {β_i}₁ⁿ for V. Let {λ_i} be the dual basis in V*. We claim that {λ_i} for i = m+1, …, n is then a basis for W°. First, if j > m, then λ_j(β_i) = 0 for i = 1, …, m, and so λ_j is in W° by (3) above. Thus {λ_{m+1}, …, λ_n} ⊂ W°. Now suppose that f ∈ W°, and let f = ∑₁ⁿ c_j λ_j be its (dual) basis expansion. Then for each i ≤ m we have c_i = f(β_i) = 0, since β_i ∈ W and f ∈ W°; therefore, f = ∑_{j=m+1}ⁿ c_j λ_j. Thus every f in W° is in the span of {λ_j}_{j=m+1}ⁿ. Altogether, we have shown that W° is the span of {λ_j}_{j=m+1}ⁿ, as claimed. Then d(W°) + d(W) = (n − m) + m = n = d(V), and we are done. □

Corollary. A°° = L(A) for every subset A ⊂ V.

Proof. Since (L(A))° = A°, we have d(L(A)) + d(A°) = d(V) by the theorem. Also d(A°) + d(A°°) = d(V*) = d(V). Thus d(A°°) = d(L(A)), and since L(A) ⊂ A°° by (5) above, we have L(A) = A°°. □

The adjoint of T. We shall now see that with every T in Hom(V, W) there is naturally associated an element of Hom(W*, V*) which we call the adjoint of T and designate T*. One consequence of the intimate relationship between T and T* is that the range of T* is exactly the annihilator of the null space of T.
Combined with our dimensional identities, this implies that the ranges of T and T* have the same dimension. And later on, after we have established the connection between matrix representations of T and T*, this turns into the very mysterious fact that the dimension of the linear span of the row vectors of an m-by-n matrix is the same as the dimension of the linear span of its column vectors, which gives us our notion of the rank of a matrix. In Chapter 5 we shall study a situation (Hilbert space) in which we are given a fixed fundamental isomorphism between V and V*. If T is in Hom V, then of course T* is in Hom V*, and we can use this isomorphism to "transfer" T* into Hom V. But now T can be compared with its (transferred) adjoint T*, and they may be equal. That is, T may be self-adjoint. It turns out that the self-adjoint transformations are "nice" ones, as we shall see for ourselves in simple cases, and also, fortunately, that many important linear maps arising from theoretical physics are self-adjoint.

… If T … in R³. (Use the isomorphism of (R³)* with R³ to express the basis vectors as triples.)

3.6 Find (a basis for) the annihilator of {⟨1, 1, 1⟩, ⟨1, 2, 3⟩} in R³.

3.7 Find (a basis for) the annihilator of {⟨1, 1, 1, 1⟩, ⟨1, 2, 3, 4⟩} in R⁴.

3.8 Show that if V = M ⊕ N, then V* = M° ⊕ N°.

3.9 Show that if M is any subspace of an n-dimensional vector space V and d(M) = m, then M can be viewed as being the linear span of an independent subset of m elements of V or as being the annihilator of (the intersection of the null spaces of) an independent subset of n − m elements of V*.

3.10 If B = {f_i}₁^m is a finite collection of linear functionals on V (B ⊂ V*), then its annihilator B° is simply the intersection N = ∩₁^m N_i of the null spaces N_i = N(f_i) of the functionals f_i. State the dual of Theorem 3.3 in this context. That is, take W as the linear span of the functionals f_i, so that W ⊂ V* and W° ⊂ V. State the dual of the corollary.
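As a numerical sketch of the kind of computation Exercise 3.6 asks for (the helper names below are ours, not the text's): in R³, identified with its dual as in the text, the annihilator of W = span{⟨1, 1, 1⟩, ⟨1, 2, 3⟩} can be found with a cross product, and it is one-dimensional, consistent with d(V) = d(W) + d(W°) from Theorem 3.3.

```python
def cross(u, v):
    """Cross product in R^3; the result is orthogonal to both u and v,
    so the corresponding functional L_f annihilates span{u, v}."""
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def apply_functional(f, x):
    """Evaluate L_f(x) = sum_i f_i x_i, the functional with n-tuple f."""
    return sum(fi * xi for fi, xi in zip(f, x))

w1, w2 = (1, 1, 1), (1, 2, 3)
f = cross(w1, w2)                     # spans the annihilator W°
print(f)                              # (1, -2, 1)
print(apply_functional(f, w1),
      apply_functional(f, w2))        # 0 0
# d(W) = 2 and d(W°) = 1, so d(W) + d(W°) = 3 = d(R^3), as Theorem 3.3 says.
```

The cross product is of course special to n = 3; in general one solves the homogeneous system f(w) = 0 for the given spanning vectors w.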
3.11 Show that the following theorem is a consequence of the corollary of Theorem 3.3.

Theorem. Let N be the intersection ∩₁^m N_i of the null spaces of a set {f_i}₁^m of linear functionals on V, and suppose that g in V* is zero on N. Then g is a linear combination of the set {f_i}.

3.12 A corollary of Theorem 3.3 is that if W is a proper subspace of V, then there is at least one nonzero linear functional f in V* such that f = 0 on W. Prove this fact directly by elementary means. (You are allowed to construct a suitable basis.)

3.13 An m-tuple of linear functionals {f_i}₁^m on a vector space V defines a linear mapping α ↦ ⟨f₁(α), …, f_m(α)⟩ from V to R^m. What theorem is being applied here? Prove that the range of this linear mapping is the whole of R^m if and only if {f_i}₁^m is an independent set of functionals. [Hint: If the range is a proper subspace W′, there is a nonzero m-tuple a such that ∑ a_i y_i = 0 for all y ∈ W′.]

3.14 Continuing the above exercise, what is the null space N of the linear mapping α ↦ ⟨f₁(α), …, f_m(α)⟩? If g is a linear functional which is zero on N, show that g is a linear combination of the f_i, now as a corollary of the above exercise and Theorem 4.3 of Chapter 1. (Assume the set {f_i} independent.)

3.15 Write out from scratch the proof that T* is linear [for a given T in Hom(V, W)]. Also prove directly that T ↦ T* is linear.

3.16 Prove the other half of Theorem 3.5.

3.17 Let θ_i be the isomorphism α ↦ α** from V_i to V_i** for i = 1, 2, and suppose given T in Hom(V₁, V₂). The loose statement T = T** means exactly that T = θ₂⁻¹ ∘ T** ∘ θ₁, or T** ∘ θ₁ = θ₂ ∘ T. Prove this identity. As usual, do this by proving that it holds for each α in V₁.

3.18 Let θ: Rⁿ → V be a basis isomorphism. Prove that the adjoint θ* is the coordinate isomorphism for the dual basis if (Rⁿ)* is identified with Rⁿ in the natural way.

3.19 Let ω be any bilinear functional on V × W. Then the two associated linear transformations are T: V → W* defined by (T(ξ))(η) = ω(ξ, η) and S: W → V* defined by (S(η))(ξ) = ω(ξ, η).
Prove that S = T* if W is identified with W**.

3.20 Suppose that f in (R^m)* has coordinate m-tuple a [f(y) = ∑₁^m a_i y_i] and that T ∈ Hom(Rⁿ, R^m) has matrix t = {t_ij}. Write out the explicit expression of the number f(T(x)) in terms of all these coordinates. Rearrange the sum so that it appears in the form ∑₁ⁿ b_j x_j, and then read off the formula for b in terms of a.

Matrices and linear transformations. The reader has already learned something about matrices and their relationship to linear transformations from Chapter 1; we shall begin our more systematic discussion by reviewing this earlier material. By popular conception a matrix is a rectangular array of numbers such as

t₁₁ t₁₂ ⋯ t₁ₙ
t₂₁ t₂₂ ⋯ t₂ₙ
 ⋮
t_m1 t_m2 ⋯ t_mn

Note that the first index numbers the rows and the second index numbers the columns. If there are m rows and n columns in the array, it is called an m-by-n (m × n) matrix. This is inexact. A rectangular array is a way of picturing a matrix, but a matrix is really a function, just as a sequence is a function. With the notation m̄ = {1, …, m}, the above matrix is a function assigning a number to every pair of integers ⟨i, j⟩ in m̄ × n̄. It is thus an element of the set R^(m̄×n̄). The addition of two m × n matrices is performed in the obvious place-by-place way, and is merely the addition of two functions in R^(m̄×n̄); the same is true for scalar multiplication. The set of all m × n matrices is thus the vector space R^(m̄×n̄), a Cartesian space with a rather fancy finite index set. We shall use the customary index notation t_ij for the value t(i, j) of the function t at ⟨i, j⟩, and we shall also write {t_ij} for t, just as we do for sequences and other indexed collections.

The additional properties of matrices stem from the correspondence between m × n matrices {t_ij} and transformations T ∈ Hom(Rⁿ, R^m). The following theorem restates results from the first chapter. See Theorems 1.2, 1.3, and 6.2 of Chapter 1 and the discussion of the linear combination map at the end of Section 1.6.

Theorem 4.1.
Let {t_ij} be an m-by-n matrix, and let t^j be the m-tuple that is its jth column for j = 1, …, n. Then there is a unique T in Hom(Rⁿ, R^m) such that skeleton T = {t^j}, i.e., such that T(δ^j) = t^j for all j. T is defined as the linear combination mapping x ↦ y = ∑_{j=1}ⁿ x_j t^j, and an equivalent presentation of T is the collection of scalar equations

y_i = ∑_{j=1}ⁿ t_ij x_j   for i = 1, …, m.

Each T in Hom(Rⁿ, R^m) arises this way, and the bijection {t_ij} ↔ T from R^(m̄×n̄) to Hom(Rⁿ, R^m) is a natural isomorphism.

The only additional remark called for here is that in identifying an m × n matrix with an n-tuple of m-tuples, we are making use of one of the standard identifications of duality (Section 0.10). We are treating the natural isomorphism between the really distinct spaces R^(m̄×n̄) and (R^m̄)^n̄ as though it were the identity.

We can also relate T to {t_ij} by way of the rows of {t_ij}. As above, taking ith coordinates in the m-tuple equation y = ∑_j x_j t^j, we get the equivalent and familiar system of numerical (scalar) equations y_i = ∑_j t_ij x_j for i = 1, …, m. Now the mapping x ↦ ∑_j c_j x_j from Rⁿ to R is the most general linear functional on Rⁿ. In the above numerical equations, therefore, we have simply used the m rows of the matrix {t_ij} to present the m-tuple of linear functionals on Rⁿ which is equivalent to the single m-tuple-valued linear mapping T in Hom(Rⁿ, R^m) by Theorem 3.6 of Chapter 1.

The choice of ordered bases for arbitrary finite-dimensional spaces V and W allows us to transfer the above theorem to Hom(V, W). Since we are now going to correlate a matrix t in R^(m̄×n̄) with a transformation T in Hom(V, W), we shall designate the transformation in Hom(Rⁿ, R^m) discussed above by T̃.

Theorem 4.2. Let {α_j}₁ⁿ and {β_i}₁^m be ordered bases for the vector spaces V and W, respectively. For each matrix {t_ij} in R^(m̄×n̄) let T be the unique element of Hom(V, W) such that T(α_j) = ∑_{i=1}^m t_ij β_i for j = 1, …, n. Then the mapping {t_ij} ↦ T is an isomorphism from R^(m̄×n̄) to Hom(V, W).

Proof.
We simply combine the isomorphism {t_ij} ↦ T̃ of the above theorem with the isomorphism T̃ ↦ T = ψ ∘ T̃ ∘ φ⁻¹ from Hom(Rⁿ, R^m) to Hom(V, W), where φ and ψ are the two given basis isomorphisms. Then T is the transformation described in the theorem, for T(α_j) = ψ(T̃(φ⁻¹(α_j))) = ψ(T̃(δ^j)) = ψ(t^j) = ∑_{i=1}^m t_ij β_i. The map {t_ij} ↦ T is the composition of two isomorphisms and so is an isomorphism. □

It is instructive to look at what we have just done in a slightly different way. Given the matrix {t_ij}, let τ_j be the vector in W whose coordinate m-tuple is the jth column t^j of the matrix, so that τ_j = ∑_{i=1}^m t_ij β_i. Then let T be the unique element of Hom(V, W) such that T(α_j) = τ_j for j = 1, …, n. Now we have obtained T from {t_ij} in the following two steps: T corresponds to the n-tuple {τ_j}₁ⁿ under the isomorphism from Hom(V, W) to Wⁿ given by Theorem 1.6, and {τ_j} corresponds to the matrix {t_ij} by extension of the coordinate isomorphism between W and R^m to its product isomorphism from Wⁿ to (R^m)ⁿ.

Corollary. If x is the coordinate n-tuple of ξ in V, then y is the coordinate m-tuple of the vector T(ξ) in W (with respect to the given bases) if and only if y_i = ∑_{j=1}ⁿ t_ij x_j for i = 1, …, m.

Proof. We know that the scalar equations are equivalent to y = T̃(x), which is the equation y = ψ⁻¹ ∘ T ∘ φ(x). The isomorphism ψ converts this to the equation ψ(y) = T(ξ). □

Our problem now is to discover the matrix analogues of relationships between linear transformations. For transformations between the Cartesian spaces Rⁿ this is a fairly direct, uncomplicated business, because, as we know, the matrix here is a natural alter ego for the transformation. But when we leave the Cartesian spaces, a transformation T no longer has a matrix in any natural way, and only acquires one when bases are chosen and a corresponding T̃ on Cartesian spaces is thereby obtained.
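The two presentations of T in Theorem 4.1 can be sketched concretely (the helper names are ours, not the text's): applying T by taking a linear combination of the columns t^j, and applying it through the row equations y_i = ∑_j t_ij x_j, give the same m-tuple.

```python
t = [[1, 2, 0],
     [3, 1, 4]]          # a 2-by-3 matrix {t_ij}; rows indexed by i, columns by j

def apply_by_columns(t, x):
    """The linear combination mapping x -> y = sum_j x_j t^j of Theorem 4.1."""
    m, n = len(t), len(t[0])
    y = [0] * m
    for j in range(n):           # add x_j times the jth column t^j
        for i in range(m):
            y[i] += x[j] * t[i][j]
    return y

def apply_by_rows(t, x):
    """The scalar equations y_i = sum_j t_ij x_j: each row acts as a functional."""
    return [sum(t_ij * x_j for t_ij, x_j in zip(row, x)) for row in t]

x = [1, -1, 2]
print(apply_by_columns(t, x))    # [-1, 10]
print(apply_by_rows(t, x))       # [-1, 10]
```

The column form is the "skeleton" description T(δ^j) = t^j; the row form is the m-tuple of linear functionals given by the rows.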
All matrices now are determined with respect to chosen bases, and all calculations are complicated by the necessary presence of the basis and coordinate isomorphisms. There are two ways of handling this situation. The first, which we shall follow in general, is to describe things directly for the general space V and simply to accept the necessarily more complicated statements involving bases and dual bases and the corresponding loss in transparency. The other possibility is first to read off the answers for the Cartesian spaces and then to transcribe them via coordinate isomorphisms.

Lemma 4.1. The matrix element t_kj can be obtained from T by the formula t_kj = μ_k(T(α_j)), where μ_k is the kth element of the dual basis in W*.

Proof. μ_k(T(α_j)) = μ_k(∑_{i=1}^m t_ij β_i) = ∑_i t_ij μ_k(β_i) = ∑_i t_ij δ_ki = t_kj. □

In terms of Cartesian spaces, T̃(δ^j) is the jth column m-tuple t^j in the matrix {t_ij} of T̃, and t_kj is the kth coordinate of t^j. From the point of view of linear maps, the kth coordinate is obtained by applying the kth coordinate projection π_k, so that t_kj = π_k(T̃(δ^j)). Under the basis isomorphisms, π_k becomes μ_k, T̃ becomes T, δ^j becomes α_j, and the Cartesian identity becomes the identity of the lemma.

The transpose. The transpose of the m × n matrix {t_ij} is the n × m matrix {t*_ij} defined by t*_ij = t_ji for all i, j. The rows of t* are of course the columns of t, and conversely.

Theorem 4.3. The matrix of T* with respect to the dual bases in W* and V* is the transpose of the matrix of T.

Proof. If s is the matrix of T*, then Lemmas 3.1 and 4.1 imply that s_ij = α_i**(T*(μ_j)) = (T*(μ_j))(α_i) = μ_j(T(α_i)) = t_ji. □

Definition. The row space of the matrix {t_ij} ∈ R^(m̄×n̄) is the subspace of Rⁿ spanned by the m row vectors. The column space is similarly the span of the n column vectors in R^m.

Corollary. The row and column spaces of a matrix have the same dimension.
Proof. If T is the element of Hom(Rⁿ, R^m) defined by T(δ^j) = t^j, then the set {t^j}₁ⁿ of column vectors in the matrix {t_ij} is the image under T of the standard basis of Rⁿ, and so its span, which we have called the column space of the matrix, is exactly the range of T. In particular, the dimension of the column space is d(R(T)) = rank T. Since the matrix of T* is the transpose t* of the matrix t, we have, similarly, that rank T* is the dimension of the column space of t*. But the column space of t* is the row space of t, and the assertion of the corollary is thus reduced to the identity rank T* = rank T, which is the corollary of Theorem 3.5. □

This common dimension is called the rank of the matrix.

Matrix products. If T ∈ Hom(Rⁿ, R^m) and S ∈ Hom(R^m, R^l), then of course R = S ∘ T ∈ Hom(Rⁿ, R^l), and it certainly should be possible to calculate the matrix r of R from the matrices s and t of S and T, respectively. To make this computation, we set y = T(x) and z = S(y), so that z = (S ∘ T)(x) = R(x). The equivalent scalar equations in terms of the matrices t and s are

y_i = ∑_{j=1}ⁿ t_ij x_j   and   z_k = ∑_{i=1}^m s_ki y_i,

and so

z_k = ∑_{i=1}^m s_ki (∑_{j=1}ⁿ t_ij x_j) = ∑_{j=1}ⁿ (∑_{i=1}^m s_ki t_ij) x_j.

But z_k = ∑_{j=1}ⁿ r_kj x_j for k = 1, …, l. Taking x as δ^j, we have

r_kj = ∑_{i=1}^m s_ki t_ij   for all k and j.

We thus have found the formula for the matrix of the map R = S ∘ T: x ↦ z. Of course, r is defined to be the product of the matrices s and t, and we write r = s·t or r = st. Note that in order for the product st to be defined, the number of columns in the left factor must equal the number of rows in the right factor. We get the element r_kj by going across the kth row of s and simultaneously down the jth column of t, multiplying corresponding elements as we go, and adding the resulting products. This process is illustrated in Fig. 2.1.
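The product formula r_kj = ∑_i s_ki t_ij can be sketched directly (the function name is ours); note how the dimension check mirrors the requirement that the columns of the left factor match the rows of the right.

```python
def matmul(s, t):
    """Product of an l-by-m matrix s and an m-by-n matrix t,
    via r_kj = sum_i s_ki t_ij."""
    l, m, n = len(s), len(t), len(t[0])
    assert len(s[0]) == m, "columns of left factor must equal rows of right"
    return [[sum(s[k][i] * t[i][j] for i in range(m)) for j in range(n)]
            for k in range(l)]

s = [[1, 0], [2, 1], [0, 3]]      # 3-by-2
t = [[1, 4], [2, 5]]              # 2-by-2
print(matmul(s, t))               # [[1, 4], [4, 13], [6, 15]]
```

Because the product is defined to match composition of the corresponding transformations, associativity comes for free, as the text observes next; it is easy to confirm on examples.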
In terms of the scalar product (x, y) = ∑₁ⁿ x_i y_i on Rⁿ, we see that the element r_kj in r = st is the scalar product of the kth row of s and the jth column of t.

(Fig. 2.1 pictures this: the kth row of s paired against the jth column of t.)

Since we have defined the product of two matrices as the matrix of the product of the corresponding transformations, i.e., so that the mapping T ↦ {t_ij} preserves products (S ∘ T ↦ st), it follows from the general principle of Theorem 4.1 of Chapter 1 that the algebraic laws satisfied by composition of transformations will automatically hold for the product of matrices. For example, we know without making an explicit computation that matrix multiplication is associative. Then for square matrices we have the following theorem.

Theorem 4.4. The set Mₙ of square n × n matrices is an algebra naturally isomorphic to the algebra Hom(Rⁿ).

Proof. We already know that T ↦ {t_ij} is a natural linear isomorphism from Hom(Rⁿ) to Mₙ (Theorem 4.1), and we have defined the product of matrices so that the mapping also preserves multiplication. The laws of algebra (for an algebra) therefore follow for Mₙ from our observation in Theorem 3.5 of Chapter 1 that they hold for Hom(Rⁿ). □

The identity I in Hom(Rⁿ) takes the basis vector δ^j into itself, and therefore its matrix e has δ^j for its jth column: e^j = δ^j. Thus e_ij = δ^j_i = 1 if i = j and e_ij = δ^j_i = 0 if i ≠ j. That is, the matrix e is 1 along the main diagonal (from upper left to lower right) and 0 elsewhere. Since I ↦ e under the algebra isomorphism T ↦ t, we know that e is the identity for matrix multiplication. Of course, we can check this directly: ∑_j t_ij e_jk = t_ik, and similarly for multiplying by e on the left. The symbol 'e' is ambiguous in that we have used it to denote the identity in the space R^(n̄×n̄) of square n × n matrices for every n.

Corollary. A square n × n matrix t has a multiplicative inverse if and only if its rank is n.
Proof. By the theorem there exists an s ∈ Mₙ such that st = ts = e if and only if there exists an S ∈ Hom(Rⁿ) such that S ∘ T = T ∘ S = I. But such an S exists if and only if T is an isomorphism, and by the corollary to Theorem 2.4 this is equivalent to the dimension of the range of T being n. But this dimension is the rank of t, and the argument is complete. □

A square matrix (or a transformation in Hom V) is said to be nonsingular if it is invertible.

Theorem 4.5. If {α_i}₁ⁿ, {β_j}₁^m, and {γ_k}₁^l are ordered bases for the vector spaces U, V, and W, respectively, and if T ∈ Hom(U, V) and S ∈ Hom(V, W), then the matrix of S ∘ T is the product of the matrices of S and T (with respect to the given bases).

Proof. By definition the matrix of S ∘ T is the matrix of (S ∘ T)~ = χ⁻¹ ∘ (S ∘ T) ∘ φ in Hom(Rⁿ, R^l), where φ and χ are the given basis isomorphisms for U and W. But if ψ is the basis isomorphism for V, we have

(S ∘ T)~ = χ⁻¹ ∘ S ∘ T ∘ φ = (χ⁻¹ ∘ S ∘ ψ) ∘ (ψ⁻¹ ∘ T ∘ φ) = S̃ ∘ T̃,

and therefore its matrix is the product of the matrices of S̃ and T̃ by the definition of matrix multiplication. The latter are the matrices of S and T with respect to the given bases. Putting these observations together, we have the theorem. □

Theorem 4.6. If the matrix product st is defined, then so is t*s*, and t*s* = (st)*.

Proof. A direct calculation is easy. We have

((st)*)_jk = (st)_kj = ∑_i s_ki t_ij = ∑_i (t*)_ji (s*)_ik = (t*s*)_jk.

Thus (st)* = t*s*, as asserted. □

This identity is clearly the matrix form of the transformation identity (S ∘ T)* = T* ∘ S*, and it can be deduced from the latter identity if desired.

Cartesian vectors as matrices. We can view an n-tuple x = ⟨x₁, …, xₙ⟩ as being alternatively either an n × 1 matrix, in which case we call it a column vector, or a 1 × n matrix, in which case we call it a row vector. Of course, these identifications are natural isomorphisms.
The point of doing this is, in part, that then the equations y_i = ∑_j t_ij x_j say exactly that the column vector y is the matrix product of t and the column vector x, that is, y = tx. The linear map T: Rⁿ → R^m becomes left multiplication by the fixed matrix t when Rⁿ is viewed as the space of n × 1 column vectors. For this reason we shall take the column vector as the standard matrix interpretation of an n-tuple x; then x* is the corresponding row vector.

In particular, a linear functional F ∈ (Rⁿ)* becomes left multiplication by its matrix, which is of course 1 × n (F being from Rⁿ to R¹), and therefore is simply the row matrix interpretation of an n-tuple in Rⁿ. That is, in the natural isomorphism a ↦ L_a from Rⁿ to (Rⁿ)*, where L_a(x) = ∑ a_i x_i, the functional L_a can now be interpreted as left matrix multiplication by the n-tuple a viewed as the row vector a*. The matrix product of the row vector (1 × n matrix) a* and the column vector (n × 1 matrix) x is a 1 × 1 matrix a*·x, that is, a number.

Let us now see what these observations say about T*. The number L_a(T(x)) is the 1 × 1 matrix a*tx. Since L_a(T(x)) = (T*(L_a))(x) by the definition of T*, we see that the functional T*(L_a) is left multiplication by the row vector a*t. Since the row vector form of L_a is a* and the row vector form of T*(L_a) is a*t, this shows that when the functionals on Rⁿ are interpreted as row vectors, T* becomes right multiplication by t. This only repeats something we already know. If we take transposes to throw the row vectors into the standard column vector form for n-tuples, it shows that T* is left multiplication by t*, and so gives another proof that the matrix of T* is t*.

Change of basis. If φ: x ↦ ξ = ∑₁ⁿ x_j β_j and θ: y ↦ ξ = ∑₁ⁿ y_j β′_j are two basis isomorphisms for V, then A = θ⁻¹
∘ φ is the isomorphism in Hom(Rⁿ) which takes the coordinate n-tuple x of a vector ξ with respect to the basis {β_j} into the coordinate n-tuple y of the same vector with respect to the basis {β′_j}. The isomorphism A is called the "change of coordinates" isomorphism. In terms of the matrix a of A, we have y = ax, as above.

The change of coordinates map A = θ⁻¹ ∘ φ should not be confused with the similar-looking T = θ ∘ φ⁻¹. The latter is a mapping on V, and is the element of Hom(V) which takes each β_j to β′_j.

(Fig. 2.2)

We now want to see what happens to the matrix of a transformation T ∈ Hom(V, W) when we change bases in its domain and codomain spaces. Suppose then that φ₁ and φ₂ are basis isomorphisms from Rⁿ to V, that ψ₁ and ψ₂ are basis isomorphisms from R^m to W, and that t′ and t″ are the matrices of T with respect to the first and second bases, respectively. That is, t′ is the matrix of T̃′ = (ψ₁)⁻¹ ∘ T ∘ φ₁ ∈ Hom(Rⁿ, R^m), and similarly for t″. The mapping A = φ₂⁻¹ ∘ φ₁ ∈ Hom(Rⁿ) is the change of coordinates transformation for V: if x is the coordinate n-tuple of a vector ξ with respect to the first basis [that is, ξ = φ₁(x)], then A(x) is its coordinate n-tuple with respect to the second basis. Similarly, let B be the change of coordinates map ψ₂⁻¹ ∘ ψ₁ for W. The diagram in Fig. 2.2 will help keep the various relationships of these spaces and mappings straight. We say that the diagram is commutative, which means that any two paths between two points represent the same map. By selecting various pairs of paths, we can read off all the identities which hold for the nine maps T, T̃′, T̃″, φ₁, φ₂, A, ψ₁, ψ₂, B. For example, T̃″ can be obtained by going backward along A, forward along T̃′, and then forward along B. That is, T̃″ = B ∘ T̃′ ∘ A⁻¹.
Since these "outside maps" are all maps of Cartesian spaces, we can then read off the corresponding matrix identity

t″ = b t′ a⁻¹,

showing how the matrix of T with respect to the second pair of bases is obtained from its matrix with respect to the first pair.

What we have actually done in reading off the above identity from the diagram is to eliminate certain retraced steps in the longer path which the definitions would give us. Thus from the definitions we get

B ∘ T̃′ ∘ A⁻¹ = (ψ₂⁻¹ ∘ ψ₁) ∘ (ψ₁⁻¹ ∘ T ∘ φ₁) ∘ (φ₁⁻¹ ∘ φ₂) = ψ₂⁻¹ ∘ T ∘ φ₂ = T̃″.

In the above situation the domain and codomain spaces were different, and the two basis changes were independent of each other. If W = V, so that T ∈ Hom(V), then of course we consider only one basis change, and the formula becomes

t″ = a t′ a⁻¹.

Now consider a linear functional F ∈ V*. If f′ and f″ are its coordinate n-tuples considered as column vectors (n × 1 matrices), then the matrices of F with respect to the two bases are the row vectors (f′)* and (f″)*, as we saw earlier. Also, there is no change of basis in the range space, since here W = R with its permanent natural basis vector 1. Therefore, b = e in the formula t″ = b t′ a⁻¹, and the change is (f″)* = (f′)* a⁻¹. We want to compare this with the change of coordinates of a vector ξ ∈ V, which, as we saw earlier, is given by y = ax. These changes go in opposite directions (with a transposition thrown in). For reasons largely historical, functionals F in V* are called covariant vectors, and since the matrix for a change of coordinates in V is the transpose of the inverse of the matrix for the corresponding change of coordinates in V*, the vectors ξ in V are called contravariant vectors. These terms are used in classical tensor analysis and differential geometry.

The isomorphism {t_ij} ↦ T, being from a Cartesian space R^(m̄×n̄), is automatically a basis isomorphism. Its basis in Hom(V, W) is the image under the isomorphism of the standard basis in R^(m̄×n̄), where the latter is the set of Kronecker functions δ^(kl) defined by δ^(kl)(i, j) = 0 if ⟨i, j⟩ ≠ ⟨k, l⟩ and δ^(kl)(k, l) = 1.
(Remember that in R^A, δ^a is that function such that δ^a(b) = 0 if b ≠ a and δ^a(a) = 1. Here A = m̄ × n̄, and the elements a of A are ordered pairs a = ⟨k, l⟩.) The function δ^(kl) is that matrix whose columns are all 0 except for the lth, and the lth column is the m-tuple δ^k. The corresponding transformation D_kl thus takes every basis vector α_j to 0 except α_l, and takes α_l to β_k. That is, D_kl(α_j) = 0 if j ≠ l, and D_kl(α_l) = β_k. Again, D_kl takes the lth basis vector in V to the kth basis vector in W and takes the other basis vectors in V to 0. If ξ = ∑ x_j α_j, it follows that D_kl(ξ) = x_l β_k.

Since {D_kl} is the basis defined by the isomorphism {t_ij} ↦ T, it follows that {t_ij} is the coordinate set of T with respect to this basis; it is the image of T under the coordinate isomorphism. It is interesting to see how this basis expansion of T automatically appears. We have

T(ξ) = T(∑_j x_j α_j) = ∑_j x_j T(α_j) = ∑_j x_j ∑_i t_ij β_i = ∑_{i,j} t_ij x_j β_i = ∑_{i,j} t_ij D_ij(ξ),

so that T = ∑_{i,j} t_ij D_ij.

Our original discussion of the dual basis in V* was a special case of the present situation. There we had Hom(V, R) = V*, with the permanent standard basis 1 for R. The basis for V* corresponding to the basis {α_j} for V therefore consists of those maps D_l taking α_l to 1 and α_j to 0 for j ≠ l. Then D_l(ξ) = D_l(∑ x_j α_j) = x_l, and D_l is the lth coordinate functional λ_l.

Finally, we note that the matrix expression of T ∈ Hom(Rⁿ, R^m) is very suggestive of the block decompositions of T that we discussed earlier in Section 1.5. In the exercises we shall ask the reader to show that in fact T_kl = t_kl D_kl.

EXERCISES

4.1 Show that if ω is a bilinear functional on V × V and T: V → V* is the corresponding linear transformation defined by (T(ξ))(η) = ω(ξ, η), then for any basis {α_k} for V the matrix t_ij = ω(α_i, α_j) is the matrix of T.

4.2 Verify that the row and column ranks of the following matrix are both 1:

5 2 8
0 4 6

4.3
Show by a direct calculation that if the row rank of a 2 × 3 matrix is 1, then so is its column rank.

4.4 Let {f_i}₁³ be a linearly dependent set of C²-functions (twice continuously differentiable real-valued functions) on R. Show that the three triples ⟨f_i(x), f_i′(x), f_i″(x)⟩ are dependent for any x. Prove therefore that sin, cos, and … are linearly independent. (Compute the derivative triples for a well-chosen x.)

4.5 Compute

5 −2 3
8 1 4   ×   1 6 −1 4 …
1 2 −3

4.6 Compute …

4.7 A matrix a is idempotent if a² = a. Find a basis for the vector space R^(2̄×2̄) of all 2 × 2 matrices consisting entirely of idempotents.

4.8 By a direct calculation show that

1 2
2 3

is invertible, and find its inverse.

4.9 Show by explicitly solving the equation

a b     x y     1 0
c d  ×  z w  =  0 1

that the matrix on the left is invertible if and only if (the determinant) ad − bc is not zero.

4.10 Find a nonzero 2 × 2 matrix whose square is zero.

4.11 Find all 2 × 2 matrices whose squares are zero.

4.12 Prove by computing matrix products that matrix multiplication is associative.

4.13 Similarly, prove directly the distributive law (r + s)·t = r·t + s·t.

4.14 Show that left matrix multiplication by a fixed r in R^(l̄×m̄) is a linear transformation from R^(m̄×n̄) to R^(l̄×n̄). What theorem in Chapter 1 does this mirror?

4.15 Show that the rank of a product of two matrices is at most the minimum of their ranks. (Remember that the rank of a matrix is the dimension of the range space of its associated T.)

4.16 Let a be an m × n matrix, and let b be n × m. If m > n, show that a·b cannot be the identity e (m × m).

4.17 Let Z be the subset of 2 × 2 matrices of the form

 a b
−b a

Prove that Z is a subalgebra of R^(2̄×2̄) (that is, Z is closed under addition, scalar multiplication, and matrix multiplication). Show that in fact Z is isomorphic to the complex number system C.

4.18 A matrix (necessarily square) which is equal to its transpose is said to be symmetric.
As a square array it is symmetric about the main diagonal. Show that for any m × n matrix t the product t*·t is meaningful and symmetric.

4.19 Show that if s and t are symmetric n × n matrices, and if they commute, then s·t is symmetric. (Do not try to answer this by writing out matrix products.) Show conversely that if s, t, and s·t are all symmetric, then s and t commute.

4.20 Suppose that T in Hom R² has a symmetric matrix and that T is not of the form cI. Show that T has exactly two eigenvectors (up to scalar multiples). What does the matrix of T become with respect to the "eigenbasis" for R² consisting of these two eigenvectors?

4.21 Show that the symmetric 2 × 2 matrix t has a symmetric square root s (s² = t) if and only if its eigenvalues are nonnegative. (Assume the above exercise.)

4.22 Suppose that t is a 2 × 2 matrix such that t* = t⁻¹. Show that t has one of the forms

 a b        a  b
−b a   or   b −a

where a² + b² = 1.

4.23 Prove that multiplication by the above t is a Euclidean isometry. That is, show that if y = t·x, where x and y ∈ R², then ‖y‖ = ‖x‖, where ‖x‖ = (x₁² + x₂²)^(1/2).

4.24 Let {D_kl} be the basis for Hom(V, W) defined in the text. Taking W = V, show that these operators satisfy the very important multiplication rules

D_ij ∘ D_kl = 0 if j ≠ k,    D_ij ∘ D_jl = D_il.

4.25 Keeping the above identities in mind, show that if l ≠ m, then there are transformations S and T in Hom V such that

S ∘ T − T ∘ S = D_lm.

Also find S and T such that

S ∘ T − T ∘ S = D_ll − D_mm.

4.26 Given T in Hom Rⁿ, we know from Chapter 1 that T = ∑_{i,j} T_ij, where T_ij = P_i T P_j and P_j = θ_j ∘ π_j. Now we also have T = ∑ t_ij D_ij. Show from the definition of D_ij in the text that P_i D_ij P_j = D_ij and that P_i D_kl P_j = 0 if either i ≠ k or j ≠ l. Conclude that T_ij = t_ij D_ij.

5. TRACE AND DETERMINANT

Our aim in this short section is to acquaint the reader with two very special real-valued functions on Hom V and to describe some of their properties.

Theorem 5.1.
If V is an n-dimensional vector space, there is exactly one linear functional λ on the vector space Hom(V) with the property that λ(S ∘ T) = λ(T ∘ S) for all S, T in Hom(V) and normalized so that λ(I) = n. If a basis is chosen for V and the corresponding matrix of T is {t_ij}, then λ(T) = ∑₁ⁿ t_ii, the sum of the elements on the main diagonal.

Proof. If we choose a basis and define λ(T) as ∑ t_ii, then it is clear that λ is a linear functional on Hom(V) and that λ(I) = n. Moreover,

λ(S ∘ T) = ∑_i (∑_j s_ij t_ji) = ∑_{i,j} s_ij t_ji = ∑_j (∑_i t_ji s_ij) = λ(T ∘ S).

That is, each basis for V gives us a functional λ in (Hom V)* such that λ(S ∘ T) = λ(T ∘ S), λ(I) = n, and λ(T) = ∑ t_ii for the matrix representation of T in that basis.

Now suppose that μ is any element of (Hom V)* such that μ(S ∘ T) = μ(T ∘ S) and μ(I) = n. If we choose a basis for V and use the isomorphism θ: {t_ij} ↦ T from R^(n̄×n̄) to Hom V, we have a functional ν = μ ∘ θ on R^(n̄×n̄) such that ν(st) = ν(ts) and ν(e) = n. By Theorem 4.1 (or 3.1) ν is given by a matrix c, ν(t) = ∑_{i,j} c_ij t_ij, and the equation ν(st − ts) = 0 becomes

∑_{i,j} c_ij ∑_k (s_ik t_kj − t_ik s_kj) = 0.

We are going to leave it as an exercise for the reader to show that if l ≠ m, then very simple special matrices s and t can be chosen so that this sum reduces to c_lm = 0, and, by a different choice, to c_ll − c_mm = 0. Together with the requirement that ν(e) = n, this implies that c_lm = 0 for l ≠ m and c_mm = 1 for m = 1, …, n. That is, ν(t) = ∑₁ⁿ t_mm, and ν is the trace functional of the basis being used. Altogether this shows that there is a unique λ in (Hom V)* such that λ(S ∘ T) = λ(T ∘ S) for all S and T and λ(I) = n, and that λ(T) has the diagonal evaluation ∑ t_ii in every basis. □

This unique λ is called the trace functional, and λ(T) is the trace of T. It is usually designated tr(T).

The determinant function Δ(T) on Hom V is much more complicated, and we shall not prove that it exists until Chapter 7. Its geometric meaning is as follows. First, |Δ(T)| is the factor by which T multiplies volumes.
More precisely, if we define a "volume" v for subsets of V by choosing a basis and using the coordinate correspondence to transfer to V the "natural" volume on Rⁿ, then, for any figure A ⊂ V, v(T[A]) = |Δ(T)|·v(A). This will be spelled out in Chapter 8. Second, Δ(T) is positive or negative according as T preserves or reverses orientation, which again is a sophisticated notion to be explained later. For the moment we shall list properties of Δ(T) that are related to this geometric interpretation, and we give a sufficient number to ensure the uniqueness of Δ.

[Fig. 2.3]

We assume that for each finite-dimensional vector space V there is a function Δ (or Δ_V when there is any question about domain) from Hom(V) to R such that the following are true:

a) Δ(S∘T) = Δ(S)Δ(T) for any S, T in Hom(V).

b) If a subspace N of V is invariant under T and T is the identity on N and on V/N (that is, T[α] = α for each coset α = a + N of N), then Δ(T) = 1. Such a T is a shearing of V along the planes parallel to N. In two dimensions it can be pictured as in Fig. 2.3.

c) If V is a direct sum V = M ⊕ N of T-invariant subspaces M and N, and if R = T↾M and S = T↾N, then Δ(T) = Δ(R)Δ(S). More exactly, Δ_V(T) = Δ_M(R)·Δ_N(S).

d) If V is one-dimensional, so that any T in Hom(V) is simply multiplication by a constant c_T, then Δ(T) is that constant c_T.

e) If V is two-dimensional and T interchanges a pair of independent vectors, then Δ(T) = −1. This is clearly a pure orientation-changing property.

The fact that Δ is uniquely determined by these properties will follow from our discussion in the next section, which will also give us a process for calculating Δ. This process is efficient for dimensions greater than two, but for T in Hom(R²) there is a simple formula for Δ(T) which every student should know by heart.

Theorem 5.2. If T is in Hom(R²) and {tij} is its 2 × 2 matrix, then

    Δ(T) = t11·t22 − t12·t21.
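Both the characterizing trace identity of Theorem 5.1 and the 2 × 2 determinant formula of Theorem 5.2 lend themselves to quick numerical spot checks. The following sketch is ours, not the text's (plain Python, with matrices represented as lists of rows):

```python
def mat_mul(s, t):
    """Product of two square matrices given as lists of rows."""
    n = len(s)
    return [[sum(s[i][k] * t[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(t):
    """Diagonal evaluation of Theorem 5.1: the sum of the main-diagonal entries."""
    return sum(t[i][i] for i in range(len(t)))

def det2(t):
    """The 2x2 formula of Theorem 5.2: t11*t22 - t12*t21."""
    return t[0][0] * t[1][1] - t[0][1] * t[1][0]

s = [[2, 1], [5, 3]]
t = [[4, -1], [7, 0]]

# tr(S.T) = tr(T.S), even though S.T != T.S in general.
assert trace(mat_mul(s, t)) == trace(mat_mul(t, s))
# tr(I) = n, the normalization in Theorem 5.1.
assert trace([[1, 0], [0, 1]]) == 2
# Multiplicativity, property (a) of the determinant.
assert det2(mat_mul(s, t)) == det2(s) * det2(t)
# An interchange of the basis vectors has determinant -1, property (e).
assert det2([[0, 1], [1, 0]]) == -1
```

These checks of course verify nothing in general; they only illustrate the identities on particular matrices.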
This is a special case of a general formula, which we shall derive in Chapter 7, that expresses Δ(T) as a sum of n! terms, each term being a product of n numbers from the matrix of T. This formula is too complicated to be useful in computations for large n, but for n = 3 it is about as easy to use as our row-reduction calculation in the next section, and for n = 2 it becomes the above simple expression.

There are a few more properties of Δ with which every student should be familiar. They will all be proved in Chapter 7.

Theorem 5.3. If T is in Hom V, then Δ(T*) = Δ(T). If θ is an isomorphism from V to W and S = θ∘T∘θ⁻¹, then Δ(S) = Δ(T).

Theorem 5.4. The transformation T is nonsingular (invertible) if and only if Δ(T) ≠ 0.

In the next theorem we consider T in Hom Rⁿ, and we want to think of Δ(T) as a function of the matrix t of T. To emphasize this we shall use the notation D(t) = Δ(T).

Theorem 5.5 (Cramer's rule). Given an n × n matrix t and an n-tuple y, let t|ⱼy be the matrix obtained by replacing the jth column of t by y. Then

    y = t·x  ⟹  D(t)·xⱼ = D(t|ⱼy)

for all j. If t is nonsingular [D(t) ≠ 0], this becomes an explicit formula for the solution x of the equation y = t·x; it is theoretically important even in those cases when it is not useful in practice (large n).

EXERCISES

5.1 Finish Theorem 5.1 by applying Exercise 4.25.

5.2 It follows from our discussion of trace that tr(T) = Σ tii is independent of the basis. Show that this fact follows directly from tr(t·s) = tr(s·t) and the change of basis formula in the preceding section.

5.3 Show by direct computation that the function d(t) = t11·t22 − t12·t21 satisfies d(s·t) = d(s) d(t) (where s and t are 2 × 2 matrices).
Conclude that if V is two-dimensional and d(T) is defined for T in Hom V by choosing a basis and setting d(T) = d(t), then d(T) is actually independent of the basis.

5.4 Continuing the above exercise, show that d(T) = Δ(T) in any of the following cases:

1) T interchanges two independent vectors.
2) T has two eigenvectors.
3) T has a matrix of the form [the displayed matrix is garbled in this copy].

Show next that if T has none of the above forms, then T = R∘S, where S is of type (1) and R is of type (2) or (3). [Hint: Suppose T(α) = β, with α and β independent. Let S interchange α and β, and consider R = T∘S.] Show finally that d(T) = Δ(T) for all T in Hom V. (V is two-dimensional.)

5.5 If t is symmetric and 2 × 2, show that there is a 2 × 2 matrix s such that s·t·s⁻¹ is diagonal.

5.6 Assuming Theorem 5.2, verify Theorem 5.4 for the 2 × 2 case.

5.7 Assuming Theorem 5.2, verify Theorem 5.5 for the 2 × 2 case.

5.8 In this exercise we suppose that the reader remembers what a continuous function of a real variable is. Suppose that the 2 × 2 matrix function

    a(t) = [a11(t)  a12(t)]
           [a21(t)  a22(t)]

has continuous components aij(t) for t ∈ [0, 1], and suppose that a(t) is nonsingular for every t. Show that the solution y(t) to the linear equation a(t)·y(t) = x(t) has continuous components y₁(t) and y₂(t) if the functions x₁(t) and x₂(t) are continuous.

5.9 A homogeneous second-order linear differential equation is an equation of the form y″ + a₁y′ + a₀y = 0, where a₁ = a₁(t) and a₀ = a₀(t) are continuous functions. A solution is a C²-function f (i.e., a twice continuously differentiable function) such that f″(t) + a₁(t)f′(t) + a₀(t)f(t) = 0. Suppose that f and g are C²-functions [on [0, 1], say] such that the 2 × 2 matrix

    [f   g ]
    [f′  g′]

is always nonsingular. Show that there is a homogeneous second-order equation of which they are both solutions.

5.10 In the above exercise show that the space of all solutions is a two-dimensional vector space. That is, show that if h(t) is any third solution, then h is a linear combination of f and g.
5.11 By a "linear motion" of the Cartesian plane R² into itself we shall mean a continuous map z ↦ t(z) from [0, 1] to the set of 2 × 2 nonsingular matrices such that t(0) = e. Show that Δ(t(1)) > 0.

5.12 Show that if Δ(s) > 0, then there is a linear motion whose final matrix t(1) is s.

6. MATRIX COMPUTATIONS

The computational process by which the reader learned to solve systems of linear equations in secondary school algebra was undoubtedly "elimination by successive substitutions". The first equation is solved for the first unknown, and the solution expression is substituted for the first unknown in the remaining equations, thereby eliminating the first unknown from the remaining equations. Next, the second equation is solved for the second unknown, and this unknown is then eliminated from the remaining equations. In this way the unknowns are eliminated one at a time, and a solution is obtained. This same procedure also solves the following additional problems:

1) to obtain an explicit basis for the linear span of a set of m vectors in Rⁿ; therefore, in particular,
2) to find the dimension of such a subspace;
3) to compute the determinant of an m × m matrix;
4) to compute the inverse of an invertible m × m matrix.

In this section we shall briefly study this process and the solutions to these problems.

We start by noting that the kinds of changes we are going to make on a finite sequence of vectors do not alter its span.

Lemma 6.1. Let {αᵢ}₁ᵐ be any m-tuple of vectors in a vector space, and let {βᵢ}₁ᵐ be obtained from {αᵢ}₁ᵐ by any one of the following elementary operations:

1) interchanging two vectors;
2) multiplying some αᵢ by a nonzero scalar;
3) replacing αᵢ by αᵢ − xαⱼ for some j ≠ i and some x ∈ R.

Then L({βᵢ}₁ᵐ) = L({αᵢ}₁ᵐ).

Proof. If αᵢ′ = αᵢ − xαⱼ, then αᵢ = αᵢ′ + xαⱼ. Thus if {βᵢ}₁ᵐ is obtained from {αᵢ}₁ᵐ by one operation of type (3), then {αᵢ}₁ᵐ can be obtained from {βᵢ}₁ᵐ by one operation of type (3).
In particular, each sequence is in the linear span of the other, and the two linear spans are therefore the same. Similarly, each of the other operations can be undone by one of the same type, and the linear spans are unchanged. □

When we perform these operations on the sequence of row vectors in a matrix, we call them elementary row operations. We define the order of an n-tuple x = <x₁, …, xₙ> as the index of its first nonzero entry. Thus if xᵢ = 0 for i < p and xₚ ≠ 0, then x has order p.

Let {aij} be an m × n matrix, let V be its row space, and let n₁ < n₂ < ⋯ < nₖ be the integers that occur as orders of nonzero vectors in V. We are going to construct a basis for V consisting of k elements having exactly the above set of orders. If every nonzero row in {aij} has order ≥ p, then every nonzero vector x in V has order ≥ p, since x is a linear combination of these row vectors. Since some vector in V has the minimal order n₁, it follows that some row in {aij} has order n₁. We move such a row to the top by interchanging two rows. We then multiply this row by a constant, so that its first nonzero entry, in the n₁th place, is 1. Let a¹, …, aᵐ be the row vectors that we now have, so that a¹ has order n₁ and its n₁th entry is 1. We next subtract multiples of a¹ from each of the other rows in such a way that the new ith row has 0 as its n₁-coordinate. Specifically, we replace aⁱ by aⁱ − aⁱₙ₁·a¹ for i > 1. The matrix that we thus obtain has the property that its jth column is the zero m-tuple for each j < n₁ and its n₁th column is δ¹ in Rᵐ. Its first row has order n₁, and every other row has order > n₁. Its row space is still V. We again call it a.

Now let x = Σ₁ᵐ cᵢaⁱ be a vector in V with order n₂. Then c₁ = 0, for if c₁ ≠ 0, then the order of x is n₁. Thus x is a linear combination of the second to the mth rows, and, just as in the first case, one of these rows must therefore have order n₂.

We now repeat the above process all over again, keying now on this vector.
We bring it to the second row, make its n₂-coordinate 1, and subtract multiples of it from all the other rows (including the first), so that the resulting matrix has δ² for its n₂th column. Next we find a row with order n₃, bring it to the third row, and make the n₃th column δ³, etc.

We exhibit this process below for one 3 × 4 matrix. This example is dishonest in that it has been chosen so that fractions will not occur through the application of (2). The reader will not be that lucky when he tries his hand. Our defense is that by keeping the matrices simple we make the process itself more apparent.

[The displayed sequence of matrices for this example is garbled in this copy.]

Note that from the final matrix we can tell that the orders in the row space are 1, 2, and 4, whereas the original matrix only displays the orders 1 and 2.

We end up with an m × n matrix having the same row space V and the following special structure:

1) For i ≤ k the ith row has order nᵢ.
2) The rows beyond the kth are zero (if a later row were nonzero, it would have some order greater than nₖ, a contradiction).
3) The nᵢth column is δⁱ.

It follows that any linear combination of the first k rows with coefficients c₁, …, cₖ has cⱼ in the nⱼth place, and hence cannot be zero unless all the cⱼ's are zero. These k rows thus form a basis for V, solving problems (1) and (2).

Our final matrix is said to be in row-reduced echelon form. It can be shown to be uniquely determined by the space V and the above requirements relating its rows to the orders of the elements of V. Its rows form the canonical basis of V.

A typical row-reduced echelon matrix is shown in Fig. 2.4. This matrix has 11 columns, its orders are 1, 4, 5, 7, 10, and its row space has dimension 5. It is entirely 0 below the broken line. The dashes in the first five lines represent arbitrary numbers, but any change in these remaining entries changes the spanned space V.
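The reduction just described is mechanical enough to program. The following sketch is our own (it uses exact rational arithmetic, so no "dishonest" fraction-free choice of example is needed); it carries a matrix to row-reduced echelon form using only the three elementary operations of Lemma 6.1:

```python
from fractions import Fraction

def rref(a):
    """Row-reduce a matrix (a list of rows) to row-reduced echelon form."""
    a = [[Fraction(x) for x in row] for row in a]
    m, n = len(a), len(a[0])
    r = 0                                        # next row to receive a pivot
    for col in range(n):
        # Find a remaining row whose order is this column, if any.
        p = next((i for i in range(r, m) if a[i][col] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]                  # operation (1): interchange
        piv = a[r][col]
        a[r] = [x / piv for x in a[r]]           # operation (2): leading entry 1
        for i in range(m):                       # operation (3): clear the column
            if i != r and a[i][col] != 0:
                f = a[i][col]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
    return a

b = rref([[0, 1, 2], [1, 2, 3], [2, 5, 8]])
# b == [[1, 0, -1], [0, 1, 2], [0, 0, 0]]: orders 1 and 2, row space of dimension 2.
```

The nonzero rows of the result are the canonical basis of the row space, and counting them gives its dimension, exactly as in problems (1) and (2) above.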
We shall now look for the significance of the row-reduction operations from the point of view of general linear theory. In this discussion it will be convenient to use the fact from Section 4 that if an n-tuple x in Rⁿ is viewed as an n × 1 matrix (i.e., as a column vector), then the system of linear equations Σⱼ₌₁ⁿ aij·xⱼ = yᵢ, i = 1, …, m, expresses exactly the single matrix equation y = a·x, and the associated linear transformation A ∈ Hom(Rⁿ, Rᵐ) is now viewed as being simply multiplication by the matrix a; y = A(x) if and only if y = a·x.

[Fig. 2.4]

We first note that each of our elementary row operations on an m × n matrix a is equivalent to premultiplication by a corresponding m × m elementary matrix u. Supposing for the moment that this is so, we can find out what u is by using the m × m identity matrix e. Since u·a = (u·e)·a, we see that the result of performing the operation on the matrix a can also be obtained by premultiplying a by the matrix u·e. That is, if the elementary operation can be obtained as matrix multiplication by u, then the multiplier is u·e. This argument suggests that we should perform the operation on e and then see if premultiplying a by the resulting matrix performs the operation on a.

If the elementary operation is interchanging the i₀th and j₀th rows, then performing it on e gives the matrix u with u_kk = 1 for k ≠ i₀ and k ≠ j₀, u_{i₀j₀} = u_{j₀i₀} = 1, and u_kl = 0 for all other indices. Moreover, examination of the sums defining the elements of the product matrix u·a will show that premultiplying by this u does just interchange the i₀th and j₀th rows of any m × n matrix a.

In the same way, multiplying the i₀th row of a by c is equivalent to premultiplying by the matrix u which is the same as e except that u_{i₀i₀} = c. Finally, multiplying the j₀th row by x and adding it to the i₀th row is equivalent to premultiplying by the matrix u which is the identity e except that u_{i₀j₀} is x instead of 0.
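The claim that each elementary row operation amounts to premultiplication by the matrix obtained by performing that same operation on e can be confirmed directly. A small sketch (the helper names are ours):

```python
def identity(m):
    """The m x m identity matrix e as a list of rows."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def mat_mul(u, a):
    """Product of an (rows x inner) matrix u and an (inner x cols) matrix a."""
    rows, inner, cols = len(u), len(a), len(a[0])
    return [[sum(u[i][k] * a[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def swap_rows(a, i0, j0):
    """Interchange rows i0 and j0 without modifying the input."""
    b = [row[:] for row in a]
    b[i0], b[j0] = b[j0], b[i0]
    return b

a = [[1, 2, 3], [4, 5, 6]]

# Perform the interchange on e; premultiplying by the result performs it on a.
u = swap_rows(identity(2), 0, 1)
assert mat_mul(u, a) == swap_rows(a, 0, 1)

# Adding x times the j0th row to the i0th row: e with the (i0, j0) entry set to x.
u = identity(2)
u[0][1] = 10
assert mat_mul(u, a) == [[1 + 40, 2 + 50, 3 + 60], [4, 5, 6]]
```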
These three elementary matrices are indicated schematically in Fig. 2.5. Each has the value 1 on the main diagonal and 0 off the main diagonal except as indicated.

[Fig. 2.5]

These elementary matrices u are all nonsingular (invertible). The row interchange matrix is its own inverse. The inverse of multiplying the jth row by c is multiplying the same row by 1/c. And the inverse of adding c times the jth row to the ith row is adding −c times the jth row to the ith row.

If u¹, u², …, uᵖ is a sequence of elementary matrices, and if b = uᵖ ⋯ u²·u¹, then b·a is the matrix obtained from a by performing the corresponding sequence of elementary row operations on a. If u¹, …, uᵖ is a sequence which row reduces a, then r = b·a is the resulting row-reduced echelon matrix.

Now suppose that a is a square m × m matrix and is nonsingular (invertible). Thus the dimension of the row space is m, and hence there are m different orders n₁, …, nₘ. That is, k = m, and since 1 ≤ n₁ < n₂ < ⋯ < nₘ ≤ m, we must have nᵢ = i for every i. The row-reduced echelon form of a is therefore the identity e, so that b·a = e and the row-reducing matrix b is the inverse of a. [The 2 × 2 numerical check displayed at this point is garbled in this copy.] Check it if you are in doubt.

Finally, since b·e = b, we see that we get b from e by applying the same row operations (gathered together as premultiplication by b) that we used to reduce a to echelon form. This is probably the best way of computing the inverse of a matrix. To keep track of the operations, we can place e to the right of a to form a single m × 2m matrix a|e, and then row reduce it. In echelon form it will then be the m × 2m matrix e|b, and we can read off the inverse b of the original matrix a.

Let us recompute the inverse of the same matrix by row reducing a|e. [The displayed computation and the resulting inverse are garbled in this copy.]

Finally we consider the problem of computing the determinant of a square m × m matrix.
We use two elementary operations (one modified) as follows:

1) interchanging two rows and simultaneously changing the sign of one of them;
2) as before, replacing some row aᵢ by aᵢ − xaⱼ for some j ≠ i.

When applied to the rows of a square matrix, these operations leave the determinant unchanged. This follows from the properties of determinants listed in Section 5, and its proof will be left as an exercise. Moreover, these properties will be trivial consequences of our definition of a determinant in Chapter 7.

Consider, then, a square m × m matrix {aij}. We interchange the first and pth rows to bring a row of minimal order n₁ to the top, and change the sign of the row being moved down (the first row here). We do not make the leading coefficient of the new first row 1; this elementary operation is not being used now. We do subtract multiples of the first row from the remaining rows, in order to make all the remaining entries in the n₁th column 0. The n₁th column is now c₁δ¹, where c₁ is the leading coefficient in the first row. And the new matrix has the same determinant as the original matrix.

We continue as before, subject to the above modifications. We change the sign of a row moved downward in an interchange, we do not make leading coefficients 1, and we do clear out the nⱼth column so that it becomes cⱼδʲ, where cⱼ is the leading coefficient of the jth row (1 ≤ j ≤ k). If k = m, then nⱼ = j for every j, the final matrix is diagonal with entries c₁, …, cₘ, and the determinant of the original matrix is the product c₁c₂⋯cₘ. If k < m, the final matrix has a zero row, and the determinant is 0.

EXERCISES

[Exercises 6.1 through 6.3, and 6.6, are garbled or missing in this copy.]

6.4 Check that the vectors <1, …>, <1, 2, 3, 4>, <0, 1, 0, 1>, and <4, 3, 2, 1> are linearly independent by row reducing. Part of one of the row-reducing operations is unnecessary for this check. What is it?

6.5 Row reduce [the displayed matrix is garbled in this copy].

6.7 Let us call a k-tuple of vectors {aᵢ} in Rⁿ canonical if the k × n matrix a with aᵢ as its ith row for all i is in row-reduced echelon form. Supposing that an n-tuple ξ is in the row space of a, we can read off what its coordinates are with respect to the above canonical basis. What are they?
How then can we check whether or not an arbitrary n-tuple ξ is in the row space?

6.8 Use the device of row reducing, as suggested in the above exercise, to determine whether or not δ¹ = <1, 0, 0, 0> is in the span of <1, 1, 1, 1>, <1, 2, 5, 4>, and <2, 0, 1, −1>. Do the same for <1, 2, 1, 2>, and also for <1, 1, 0, 4>.

6.9 Supposing that a ≠ 0, show that

    [a  b]
    [c  d]

is invertible if and only if ad − bc ≠ 0 by reducing the matrix to echelon form.

6.10 Let a be an m × n matrix, and let u be the nonsingular matrix that row reduces a, so that r = u·a is the row-reduced echelon matrix obtained from a. Suppose that r has m − k > 0 zero rows at the bottom (the kth row being nonzero). Show that the bottom m − k rows of u span the annihilator (range A)⁰ of the range of A. That is, y = a·x for some x if and only if c·y = 0 for each m-tuple c in the bottom m − k rows of u. [Hint: The bottom row of r is obtained by applying the bottom row of u to the columns of a.]

6.11 Remember that we find the row-reducing matrix u by applying to the m × m identity matrix e the row operations that reduce a to r. That is, we row reduce the m × (n + m) juxtaposition matrix a|e to r|u. Assuming the result stated in the above exercise, find the range of A ∈ Hom(R³) as the null space of a functional if the matrix of A is

    [1  2  3]
    [2  3  4]
    [3  5  7].

6.12 Similarly, find the range of A if the matrix of A is [the displayed matrix is garbled in this copy].

6.13 Let a be an m × n matrix, and let a be row reduced to r. Let A and R be the corresponding operators in Hom(Rⁿ, Rᵐ) (so that A(x) = a·x). Show that A and R have the same null space and that A* and R* have the same range space.

6.14 Show that solving a system of m linear equations in n unknowns is equivalent to solving a matrix equation t·x = k for the n-tuple x, given the m × n matrix t and the m-tuple k. Let T ∈ Hom(Rⁿ, Rᵐ) be multiplication by t. Review the possibilities for a solution from our general linear theory for T (range, null space, affine subspace).
6.15 Let b = c|d be the m × (n + p) matrix obtained by juxtaposing the m × n matrix c and the m × p matrix d. If a is an l × m matrix, show that a·b = a·c | a·d. State the similar result concerning the expression of b as the juxtaposition of n column m-tuples. State the corresponding theorem for the "distributivity" of right multiplication over juxtaposition.

6.16 Let a be an m × n matrix and k a column m-tuple. Let b|l be the m × (n + 1) matrix obtained from the m × (n + 1) juxtaposition matrix a|k by row reduction. Show that a·x = k if and only if b·x = l. Show that there is a solution x if and only if every row that is zero in b is zero in l. Restate this condition in terms of the notion of row rank.

6.17 Let b be the row-reduced echelon matrix obtained from an m × n matrix a. Thus b = u·a, where u is nonsingular, and B and A have the same null space (where B ∈ Hom(Rⁿ, Rᵐ) is multiplication by b). We can read off from b a basis for a subspace W ⊂ Rⁿ such that B↾W is an isomorphism onto range B. What is this basis? We then know that the null space N of B is a complement of W. One complement of W, call it M, can be read off from b. What is M?

6.18 Continuing the above exercise, show that for each standard basis vector δⁱ in M we can read off from the matrix b a vector αᵢ in W such that δⁱ − αᵢ ∈ N. Show that these vectors {δⁱ − αᵢ} form a basis for N.

6.19 We still have to show that the modified elementary row operations leave the determinant of a square matrix unchanged, assuming the properties (a) through (e) from Section 5. First, show from (a), (c), (d), and (e) that if T in Hom R² is defined by T(δ¹) = δ² and T(δ²) = −δ¹, then Δ(T) = 1. Do this by a very simple factorization, T = R∘S, where (e) can be applied to S.
Conclude that a type (1) elementary matrix has determinant 1.

6.20 Show from the determinant property (b) that an elementary matrix of type (2) has determinant 1. Show, therefore, that the modified elementary row operations on a square matrix leave its determinant unchanged.

7. THE DIAGONALIZATION OF A QUADRATIC FORM

As we mentioned earlier, one of the crucial problems of linear algebra is the analysis of the "structure" of a linear transformation T in Hom V. From the point of view of bases, every theorem in this area asserts that with the choice of a special basis for V the matrix of T can be given such-and-such a simple form. This is a very difficult part of the subject, and we are only making contact with it in this book, although Theorem 5.5 of Chapter 1 and its corollary form a cornerstone of the structural results.

In this section we are going to solve a simpler problem. In the above language it is the problem of choosing a basis for V making simple the matrix of a transformation T in Hom(V, V*). Such a transformation is equivalent to a bilinear functional on V (by Theorem 6.1 of Chapter 1 and Theorem 3.2 of this chapter); we shall tackle the problem in this setting.

Let V be a finite-dimensional real vector space, and let ω: V × V → R be a bilinear functional. If {αᵢ}₁ⁿ is a basis for V, then ω determines a matrix tij = ω(αᵢ, αⱼ). We know that if ωη(ξ) = ω(ξ, η), then ωη ∈ V* and η ↦ ωη is a linear mapping T from V to V*. We leave it as an exercise for the reader to show that {tij} is the matrix of T with respect to the basis {αᵢ} for V and its dual basis for V* (Exercise 4.1).

If ξ = Σ₁ⁿ xᵢαᵢ and η = Σ₁ⁿ yⱼαⱼ, then

    ω(ξ, η) = Σi,j xᵢyⱼ·ω(αᵢ, αⱼ) = Σi,j tij·xᵢyⱼ.

In particular, if we set q(ξ) = ω(ξ, ξ), then q(ξ) = Σi,j tij·xᵢxⱼ is a homogeneous quadratic polynomial in the coordinates of ξ.

For the rest of this section we assume that ω is symmetric: ω(ξ, η) = ω(η, ξ).
Then we can recover ω from the quadratic form q by

    ω(ξ, η) = [q(ξ + η) − q(ξ − η)]/4,

as the reader can easily check. In particular, if the bilinear form ω is not identically zero, then there are vectors ξ such that q(ξ) = ω(ξ, ξ) ≠ 0.

What we want to do is to show that we can find a basis {αᵢ}₁ⁿ for V such that ω(αᵢ, αⱼ) = 0 if i ≠ j and ω(αᵢ, αᵢ) has one of the three values 0, ±1. Borrowing from the standard usage of scalar product theory (see Chapter 5), we say that such a basis is orthonormal.

Our proof that an orthonormal basis exists will be an induction on n = dim V. If n = 1, then any nonzero vector β is a basis, and if ω(β, β) ≠ 0, then we can choose α = xβ so that x²ω(β, β) = ω(α, α) = ±1, the required value of x obviously being x = |ω(β, β)|^(−1/2). In the general case, if ω is the zero functional, then any basis will trivially be orthonormal, and we can therefore suppose that ω is not identically 0. Then there exists a β such that ω(β, β) ≠ 0, as we noted earlier. We set αₙ = xβ, where x is chosen to make q(αₙ) = ω(αₙ, αₙ) = ±1. The nonzero linear functional f(ξ) = ω(ξ, αₙ) has an (n − 1)-dimensional null space N, and if we let ω′ be the restriction of ω to N × N, then ω′ has an orthonormal basis {αᵢ}₁ⁿ⁻¹ by the inductive hypothesis. Also ω(αᵢ, αₙ) = ω(αₙ, αᵢ) = 0 if i < n, because αᵢ is in the null space of f. Therefore, {αᵢ}₁ⁿ is an orthonormal basis for ω, and we have reached our goal:

Theorem 7.1. If ω is a symmetric bilinear functional on a finite-dimensional real vector space V, then V has an ω-orthonormal basis.

For an ω-orthonormal basis the expansion ω(ξ, η) = Σi,j xᵢyⱼ·ω(αᵢ, αⱼ) reduces to

    q(ξ) = Σᵢ xᵢ²·q(αᵢ),

where q(αᵢ) = ±1 or 0. If we let V₁ be the span of those basis vectors αᵢ for which q(αᵢ) = 1, and similarly for V₋₁ and V₀, then we see that q(ξ) > 0 for every nonzero ξ in V₁, q(ξ) < 0 for every nonzero vector ξ in V₋₁, and q = 0 on V₀.
Furthermore, V = V₁ ⊕ V₋₁ ⊕ V₀, and the three subspaces are ω-orthogonal to each other (which means that ω(ξ, η) = 0 if ξ ∈ V₁ and η ∈ V₋₁, etc.). Finally, q(ξ) ≤ 0 for every ξ in V₋₁ ⊕ V₀.

If we choose another orthonormal basis {βᵢ} and let W₁, W₋₁, and W₀ be its corresponding subspaces, then W₁ may be different from V₁, but their dimensions must be the same. For W₁ ∩ (V₋₁ ⊕ V₀) = {0}, since any nonzero ξ in this intersection would yield the contradictory inequalities q(ξ) > 0 and q(ξ) ≤ 0. Thus W₁ can be extended to a complement of V₋₁ ⊕ V₀, and since V₁ is a complement, we have d(W₁) ≤ d(V₁). Similarly, d(V₁) ≤ d(W₁), and the dimensions therefore are equal. Incidentally, this shows that W₁ is a complement of V₋₁ ⊕ V₀. In exactly the same way, we find that d(W₋₁) = d(V₋₁), and finally, by subtraction, that d(W₀) = d(V₀). It is conventional to reorder an ω-orthonormal basis {αᵢ} so that all the αᵢ's with q(αᵢ) = 1 come first, then those with q(αᵢ) = −1, and finally those with q(αᵢ) = 0. Our results above can then be stated as follows:

Theorem 7.2. If ω is a symmetric bilinear functional on a finite-dimensional space V, then there are integers p and n such that if {αᵢ} is any ω-orthonormal basis in conventional order, and if ξ = Σ xᵢαᵢ, then

    q(ξ) = x₁² + ⋯ + xₚ² − xₚ₊₁² − ⋯ − xₚ₊ₙ².

The integer p − n is called the signature of the form q (or of its associated symmetric bilinear functional ω), and p + n is its rank. Note that p + n is the dimension of the column space of the above matrix of q, and hence equals the dimension of the range of the related linear map T. Therefore, p + n is the rank of every matrix of q.

An inductive proof that an orthonormal basis exists doesn't show us how to find one in practice. Let us suppose that we have the matrix {tij} of ω with respect to some basis {αᵢ}₁ⁿ before us, so that ω(ξ, η) = Σ tij·xᵢyⱼ, where ξ = Σ₁ⁿ xᵢαᵢ, η = Σ₁ⁿ yⱼαⱼ, and tij = ω(αᵢ, αⱼ), and we want to know how to go about actually finding an orthonormal basis {βᵢ}₁ⁿ.
The main problem is to find an orthogonal basis; normalization is then trivial. The first objective is to find a vector β such that ω(β, β) ≠ 0. If some tii = ω(αᵢ, αᵢ) is not zero, we can take β = αᵢ. If all tii = 0 and the form ω is not the zero form, there must be some tij ≠ 0, say t₁₂ ≠ 0. If we set γ₁ = α₁ + α₂ and γᵢ = αᵢ for i > 1, then {γᵢ} is a basis, and the matrix s = {sij} of ω with respect to the basis {γᵢ} has

    s₁₁ = ω(γ₁, γ₁) = ω(α₁ + α₂, α₁ + α₂) = t₁₁ + 2t₁₂ + t₂₂ = 2t₁₂ ≠ 0.

Similarly, sij = tij if i and j are both greater than 1. For example, if ω is the bilinear form on R² defined by ω(x, y) = x₁y₂ + x₂y₁, then its matrix tij = ω(δⁱ, δʲ) is

    [0  1]
    [1  0],

and we must change the basis to get s₁₁ ≠ 0. According to the above scheme, we set γ₁ = δ¹ + δ² and γ₂ = δ², and get the new matrix sij = ω(γᵢ, γⱼ), which works out to

    [2  1]
    [1  0].

The next step is to find a basis for the null space of the functional ω(ξ, γ₁). We do this by modifying γ₂, …, γₙ; we replace γⱼ by γⱼ + cγ₁ and calculate c so that this vector is in the null space. Therefore, we want 0 = ω(γⱼ + cγ₁, γ₁) = s₁ⱼ + c·s₁₁, and so c = −s₁ⱼ/s₁₁. Note that we cannot take this orthogonalizing step until we have made s₁₁ ≠ 0. The new set still spans and thus is a basis, and the new matrix {rij} has r₁₁ ≠ 0 and r₁ⱼ = rⱼ₁ = 0 for j > 1. We now simply repeat the whole procedure for the restriction of ω to this (n − 1)-dimensional null space, with matrix {rij : 2 ≤ i, j ≤ n}, and continue until an orthogonal basis is reached.

For a 2 × 2 symmetric matrix t the signature can be read off from the determinant t₁₁t₂₂ − t₁₂²: it is +2 or −2 when the determinant is positive, according as t₁₁ > 0 or t₁₁ < 0. If the determinant is negative, then the signature is 0. Thus the signature of the form ω(x, y) = x₁y₂ + x₂y₁, with matrix

    [0  1]
    [1  0],

is known to be 0, without any calculation.

Theorems 7.1 and 7.2 are important for the classification of critical points of real-valued functions on vector spaces. We shall see in Section 3.10 that the second differential of such a function F is a symmetric bilinear functional, and that the signature of its form has the same significance in determining the behavior of F near a point at which its first differential is zero that the sign of the second derivative has in the elementary calculus.

A quadratic form is said to be definite if q(ξ) is never zero except for ξ = 0. Then q(ξ) must always have the same sign, and q is accordingly called positive definite or negative definite. Looking back to Theorem 7.2, it should be obvious that q is positive definite if and only if p = d(V) and n = 0, and negative definite if and only if n = d(V) and p = 0. A symmetric bilinear functional whose associated quadratic form is positive definite is called a scalar product. This is a very important notion on general vector spaces, and the whole of Chapter 5 is devoted to developing some of its implications.

CHAPTER 3

THE DIFFERENTIAL CALCULUS

Our algebraic background is now adequate for the differential calculus, but we still need some multidimensional limit theory. Roughly speaking, the differential calculus is the theory of linear approximations to nonlinear mappings, and we have to know what we mean by approximation in general vector settings. We shall therefore start this chapter by studying the notion of a measure of length, called a norm, for the vectors in a vector space V. We can then study the phenomenon suggested by the way in which a tangent plane to a surface approximates the surface near the point of tangency. This is the general theory of unique local linear approximations of mappings, called differentials. The collection of rules for computing differentials includes all the familiar laws of the differential calculus, and achieves the same goal of allowing complicated calculations to be performed in a routine way.
However, the theory is richer in the multidimensional setting, and one new aspect which we must master is the interplay between the linear transformations which are differentials and their evaluations at given vectors, which are directional derivatives in general and partial derivatives when the vectors belong to a basis. In particular, when the spaces in question are finite-dimensional and are replaced by Cartesian spaces through a choice of bases, then the differential is entirely equivalent to its matrix, which is a certain matrix of partial derivatives called the Jacobian matrix of the mapping. Then the rules of the differential calculus are expressed in terms of matrix operations.

Maximum and minimum points of real-valued functions are found exactly as before, by computing the differential and setting it equal to zero. However, we shall neglect this subject, except in starred sections. It also is much richer than its one-variable counterpart, and in certain infinite-dimensional settings it becomes the subject called the calculus of variations.

Finally, we shall begin our study of the inverse-mapping theorem and the implicit-function theorem. The inverse-mapping theorem states that if a mapping between vector spaces is continuously differentiable, and if its differential at a point α is invertible (as a linear transformation), then the mapping itself is invertible in the neighborhood of α. The implicit-function theorem states that if a continuously differentiable vector-valued function G of two vector variables is set equal to zero, and if the second partial differential of G is invertible (as a linear mapping) at a point where G(α, β) = 0, then the equation
They are deeper results than our work up to this point in that they depend on a special property of ‘vector spaces called completeness; we shall have to put off part of their proofs to the next chapter, where we shall study completeness in a fairly systematic way. Ina number of starred sections at the end of the chapter we present some harder material that we do not expect the reader to master. However, he should iry to get a rough idea of what is going on. 1. REVIEW IN R Livery student of the calculus is presumed to be familiar with the properties of the real number system and the theory of limits. But we shall need more than familiarity at this point. It will be absolutely essential that the student under~ stand the edefinitions and be able to work with them, To be on the safe side, we shall review some of this material in the setting of limits of functions; the confident reader can skip it, We suppose that all the funetions we consider are defined at least on an open interval containing a, ‘except possibly at itself, The need for this exception is shown by the difference quotients of the calculus, which are not defined at the point near which their chavior is erucial, Definition. j(2) approaches I as x approaches a (in symbols, f(x) — U as x — a) if for every positive € there exists a positive 8 such that, O and g(x) —+ v, and we want to prove that some other function h has a limit w at a, In such cases we always try to find an inequality expressing the quantity we wish to make small, {h(x) — wl, in terms of the quantities which we know can be made small, [f(2) ~ w and |g(2) — a}. For example, suppose that h +g. Since f(x) is close to u and g(x) is close to v, clearly h(x) is close to w = u + v. But how close? Sinee h(x) — w = (fle) = u) + (o(z) — »), we have Gz) — wel < [fle) — ul + late) = af. From this it is clear that in order to make {h() — tc| less than € it is sufficient to make each of |f(z) — ul and |g(x) — »| less than ¢/2. 
Therefore, given any ε, we can take δ₁ so that 0 < |x − a| < δ₁ ⟹ |f(x) − u| < ε/2, and δ₂ so that 0 < |x − a| < δ₂ ⟹ |g(x) − v| < ε/2, and we can then take δ as the smaller of these two numbers, so that if 0 < |x − a| < δ, then both inequalities hold. Thus

0 < |x − a| < δ  ⟹  |h(x) − w| ≤ |f(x) − u| + |g(x) − v| < ε/2 + ε/2 = ε,

and we have found the desired δ for the function h.

Suppose next that u ≠ 0 and that h = 1/f. Clearly, h(x) is close to w = 1/u when f(x) is close to u, and so we try to express h(x) − w in terms of f(x) − u. Thus

h(x) − w = 1/f(x) − 1/u = (u − f(x)) / (f(x)·u),

and so |h(x) − w| ≤ |f(x) − u| / |f(x)u|. The trouble here is that the denominator is variable, and if it should happen to be very small, it might cancel the smallness of |f(x) − u| and not force a small quotient. But the answer to this problem is easy. Since f(x) is close to u and u is not zero, f(x) cannot be close to zero. For instance, if f(x) is closer to u than |u|/2, then f(x) must be farther from 0 than |u|/2. We therefore choose δ₁ so that

0 < |x − a| < δ₁  ⟹  |f(x) − u| < |u|/2,

from which it follows that |f(x)| > |u|/2. Then

|h(x) − w| ≤ 2|f(x) − u| / |u|²,

and now, given any ε, we take δ₂ so that 0 < |x − a| < δ₂ ⟹ |f(x) − u| < ε|u|²/2. If δ is the smaller of δ₁ and δ₂, then

0 < |x − a| < δ  ⟹  |h(x) − w| ≤ 2|f(x) − u|/|u|² < 2ε|u|²/2|u|² = ε,

and again we have found our δ for the function h.

We have tried to show how one would think about these situations. The actual proof that would be written down would only show the choice of δ. Thus,

Lemma 1.1. If f(x) → u and g(x) → v as x → a, then f(x) + g(x) → u + v as x → a.

Proof. Given ε, choose δ₁ so that 0 < |x − a| < δ₁ ⟹ |f(x) − u| < ε/2 (by the assumed convergence of f to u at a), and, similarly, choose δ₂ so that 0 < |x − a| < δ₂ ⟹ |g(x) − v| < ε/2. Let δ be the smaller of δ₁ and δ₂. Then

0 < |x − a| < δ  ⟹  |(f(x) + g(x)) − (u + v)| ≤ |f(x) − u| + |g(x) − v| < ε/2 + ε/2 = ε.

Thus we have proved that for every ε there is a δ such that

0 < |x − a| < δ  ⟹  |(f(x) + g(x)) − (u + v)| < ε,

and we are done. □

In addition to understanding techniques in limit theory, it is necessary to understand and to be able to use the fundamental property of the real number system called the least upper bound property.
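The δ-combination in Lemma 1.1 can be checked numerically. In the Python sketch below, the functions f and g, the point a, and the tolerance ε are our own illustrative choices, not taken from the text.

```python
# Lemma 1.1's delta-combination: if |f(x) - u| < eps/2 whenever 0 < |x - a| < d1,
# and |g(x) - v| < eps/2 whenever 0 < |x - a| < d2, then d = min(d1, d2)
# works for the sum f + g.

def f(x): return 3*x + 1        # f(x) -> u = 7 as x -> a = 2
def g(x): return x*x            # g(x) -> v = 4 as x -> a = 2

a, u, v = 2.0, 7.0, 4.0
eps = 0.1

d1 = (eps / 2) / 3              # f is 3-Lipschitz
d2 = (eps / 2) / 5              # near a = 2 the slope of g stays below 5
d = min(d1, d2)

# verify the conclusion on a grid of points with 0 < |x - a| < d
ok = all(abs((f(x) + g(x)) - (u + v)) < eps
         for k in range(1, 200)
         for x in (a + k*d/200, a - k*d/200))
print(ok)  # True
```

Any smaller δ works as well; the point is only that a single δ serves both summands at once.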
In the following statement of the property, the semi-infinite interval (−∞, a] is of course the subset {x ∈ R : x ≤ a}.

Least upper bound property. If A is a nonempty subset of R that is bounded above, then among all the upper bounds of A there is a smallest one, designated lub A.

EXERCISES

1.2 Prove that if f(x) → l and g(x) → m (as x → a), then f(x)g(x) → lm as x → a.

1.3 Prove that |x − a| < |a|/2 ⟹ |x| > |a|/2.

1.4 Prove (in detail) the greatest lower bound property from the least upper bound property.

1.5 Show that lub A = x if and only if x is an upper bound of A and, for every positive ε, x − ε is not an upper bound of A.

1.6 Let A and B be subsets of R that are nonempty and bounded above. Show that A + B is nonempty and bounded above and that lub (A + B) = lub A + lub B.

1.7 Formulate and prove a correct theorem about the least upper bound of the product of two sets.

1.8 Define the notion of a one-sided limit for a function whose domain is a subset of R. For example, we want to be able to discuss the limit of f(x) as x approaches a from below, and a notation designating it.

1.9 If the domain of a real-valued function f is an interval, define what it means to say that f is increasing on the interval.

1.10 A set whose elements come arbitrarily close to every number of an interval [s, t] is said to be dense in [s, t].

1.11 Show that if f: [a, b] → R is increasing and range f is dense in [f(a), f(b)], then range f = [f(a), f(b)]. (For any z between f(a) and f(b) set y = lub {x : f(x) < z}, etc.)

1.12 Assuming the results of the above two exercises, show that if f is a continuous strictly increasing function from [a, b] to R, and if r = f(a) and s = f(b), then f⁻¹ is a continuous strictly increasing function from [r, s] to R. [A function f is continuous if f(x) → f(y) as x → y for every y in its domain; it is strictly increasing if x < y ⟹ f(x) < f(y).]

1.13 Argue somewhat as in Exercise 1.11 above to prove that if f: [a, b] → R is continuous on [a, b], then the range of f includes [f(a), f(b)]. This is the intermediate-value theorem.

1.14 Suppose the function q: R → R satisfies q(xy) = q(x)q(y) for all x, y ∈ R.
Note that q(x) = xⁿ (n a positive integer) and q(x) = |x|ʳ (r any real number) satisfy this "functional equation". So does q(x) = 0. Show that if q satisfies the functional equation and q(x) > 1 for x > 1, then there is a real number r > 0 such that q(x) = xʳ for all positive x.

1.15 Show that if q is continuous and satisfies the functional equation q(xy) = q(x)q(y) for all x, y ∈ R, and if there is at least one point a where q(a) ≠ 0, 1, then q(x) = xʳ for all positive x. Conclude that if also q is nonnegative, then q(x) = |x|ʳ on R.

1.16 Show that if q(x) = |x|ʳ and if q(x + y) ≤ q(x) + q(y), then r ≤ 1. (Try y = 1 and x large; what is q′(x) like if r > 1?)

2. NORMS

In the limit theory of R, as reviewed briefly above, the absolute-value function is used prominently in expressions like '|x − y|' to designate the distance between two numbers, here between x and y. The definition of the convergence of f(x) to u is simply a careful statement of what it means to say that the distance |f(x) − u| tends to zero as the distance |x − a| tends to zero. The properties of |x| which we have used in our proofs are:

1) |x| > 0 if x ≠ 0, and |0| = 0;
2) |xy| = |x| |y|;
3) |x + y| ≤ |x| + |y|.

The limit theory of vector spaces is studied in terms of functions called norms, which serve as multidimensional analogues of the absolute-value function on R. Thus, if p: V → R is a norm, then we want to interpret p(α) as the "size" of α and p(α − β) as the "distance" between α and β. However, if V is not one-dimensional, there is no one notion of size that is most natural. For example, if f is a positive continuous function on [a, b], and if we ask the reader for a number which could be used as a measure of how "large" f is, there are two possibilities that will probably occur to him: the maximum value of f and the area under the graph of f. Certainly, f must be considered small if max f is small. But also, we would have to agree that f is small in a different sense if its area is small.
These are two examples of norms on the vector space V = C([a, b]) of all continuous functions on [a, b]:

p(f) = max {|f(t)| : t ∈ [a, b]}   and   q(f) = ∫ₐᵇ |f(t)| dt.

Note that f can be small in the second sense and not in the first. In order to be useful, a notion of size for a vector must have properties analogous to those of the absolute-value function on R.

Definition. A norm is a real-valued function p on a vector space V such that

n1. p(α) > 0 if α ≠ 0 (positivity);
n2. p(xα) = |x|p(α) for all α ∈ V, x ∈ R (homogeneity);
n3. p(α + β) ≤ p(α) + p(β) for all α, β ∈ V (triangle inequality).

A normed linear space (nls), or normed vector space, is a vector space V together with a norm p on V. A normed linear space is thus really a pair <V, p>, but generally we speak simply of the normed linear space V, a definite norm on V then being understood.

It has been customary to designate the norm of α by ||α||, presumably to suggest the analogy with absolute value. The triangle inequality n3 then becomes ||α + β|| ≤ ||α|| + ||β||, which is almost identical in form with the basic absolute-value inequality |x + y| ≤ |x| + |y|. Similarly, n2 becomes ||xα|| = |x| ||α||, analogous to |xy| = |x| |y| in R. Furthermore, ||α − β|| is similarly interpreted as the distance between α and β. This is reasonable since if we set α = ξ − η and β = η − ζ, then n3 becomes the usual triangle inequality of geometry:

||ξ − ζ|| ≤ ||ξ − η|| + ||η − ζ||.

We shall use both the double-bar notation and the 'p' notation for norms; each is on occasion superior to the other.

The most commonly used norms on Rⁿ are the one-norm ||x||₁ = Σ₁ⁿ |xᵢ|, the Euclidean norm ||x||₂ = (Σ₁ⁿ xᵢ²)^{1/2}, and the uniform norm ||x||∞ = max {|xᵢ| : i = 1, ..., n}. Similar norms on the infinite-dimensional vector space C([a, b]) of all continuous real-valued functions on [a, b] are

||f||₁ = ∫ₐᵇ |f(t)| dt,   ||f||₂ = (∫ₐᵇ f(t)² dt)^{1/2},   ||f||∞ = max {|f(t)| : a ≤ t ≤ b}.

It should be easy for the reader to check that || ||₁ is a norm in both cases above, and we shall take up the so-called uniform norms || ||∞ in the next paragraph.
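The three norms on Rⁿ just listed are easy to compute directly. A small Python sketch (the sample vectors are our own illustration):

```python
def one_norm(x): return sum(abs(t) for t in x)          # ||x||_1
def two_norm(x): return sum(t*t for t in x) ** 0.5      # ||x||_2
def sup_norm(x): return max(abs(t) for t in x)          # ||x||_inf

x = [3.0, -4.0, 0.0]
print(one_norm(x), two_norm(x), sup_norm(x))            # 7.0 5.0 4.0

# spot-check the triangle inequality n3 for each of the three norms
y = [1.0, 2.0, -2.0]
z = [s + t for s, t in zip(x, y)]
for n in (one_norm, two_norm, sup_norm):
    assert n(z) <= n(x) + n(y) + 1e-12
```

The same three definitions, with the sum and max taken over a grid of points, discretize the corresponding norms on C([a, b]).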
The Euclidean norms || ||₂ are trickier; their properties depend on scalar product considerations. These will be discussed in Chapter 5. Meanwhile, so that the reader can use the Euclidean norm || ||₂ on Rⁿ, we shall ask him to prove the triangle inequality for it (the other axioms being obvious) by brute force in an exercise. On R itself the absolute value is a norm, and it is the only norm to within a constant multiple.

We can transfer the above norms on Rⁿ to arbitrary finite-dimensional spaces by the following general remark.

Lemma 2.1. If p is a norm on a vector space W and T is an injective linear map from a vector space V to W, then p ∘ T is a norm on V.

Proof. The proof is left to the reader. □

Uniform norms. The two norms || ||∞ considered above are special cases of a very general situation. Let A be an arbitrary nonempty set, and let B(A, R) be the set of all bounded functions f: A → R. That is, f ∈ B(A, R) if and only if f ∈ Rᴬ and range f ⊂ [−b, b] for some b ∈ R. This is the same as saying that range |f| ⊂ [0, b], and we call any such b a bound of |f|. The set B(A, R) is a vector space V, since if |f| and |g| are bounded by b and c, respectively, then |xf + yg| is bounded by |x|b + |y|c. The uniform norm ||f||∞ is defined as the smallest bound of |f|. That is,

||f||∞ = lub {|f(p)| : p ∈ A}.

Of course, it has to be checked that || ||∞ is a norm. For any p in A,

|f(p) + g(p)| ≤ |f(p)| + |g(p)| ≤ ||f||∞ + ||g||∞.

Thus ||f||∞ + ||g||∞ is a bound of |f + g| and is therefore greater than or equal to the smallest such bound, which is ||f + g||∞. This gives the triangle inequality. Next we note that if x ≠ 0, then b bounds |f| if and only if |x|b bounds |xf|, and it follows that ||xf||∞ = |x| ||f||∞. Finally, ||f||∞ ≥ 0, and ||f||∞ = 0 only if f is the zero function.

We can replace R by any normed linear space W in the above discussion. A function f: A → W is bounded by b if and only if ||f(p)|| ≤ b for all p in A, and the uniform norm ||f||∞ is defined as before, as the smallest bound of f.

The open ball of radius r about β is the set Bᵣ(β) = {ξ : ||ξ − β|| < r}. A subset A of V is open if for each point α of A, A includes some open ball about α; A is closed if its complement is open. The so-called closed ball of radius r about β is the set B̄ᵣ(β) = {ξ : ||ξ − β|| ≤ r}.

EXERCISES

2.1 Prove the norm analogue of Exercise 1.3: if ||α − β|| < ||α||/2, then ||β|| > ||α||/2.
2.2 Prove in detail that ||x||₁ = Σ₁ⁿ |xᵢ| is a norm on Rⁿ. Also prove that ||f||₁ = ∫ₐᵇ |f| is a norm on C([a, b]).

2.3 For x in Rⁿ let |x| be the Euclidean length,

|x| = (Σ₁ⁿ xᵢ²)^{1/2},

and let (x, y) be the scalar product,

(x, y) = Σ₁ⁿ xᵢyᵢ.

The Schwarz inequality says that |(x, y)| ≤ |x| |y|, and that the inequality is strict if x and y are independent.
a) Prove the Schwarz inequality for the case n = 2 by squaring and canceling.
b) Now prove it for the general n in the same way.

2.4 Continuing the above exercise, prove that the Euclidean length |x| is a norm. The crucial step is the triangle inequality, |x + y| ≤ |x| + |y|. Reduce it to the Schwarz inequality by squaring and canceling. This is of course our two-norm ||x||₂.

2.5 Prove that the unit balls for the norms || ||₁ and || ||∞ on R² are as shown in Fig. 3.2.

2.6 Prove that an open ball is an open set.

2.7 Prove that a closed ball is a closed set.

2.8 Give an example of a subset of R² that is neither open nor closed.

2.9 Show from the definition of an open set that any open set is the union of a family (perhaps very large) of open balls. Show that any union of open sets is open. Conclude, therefore, that a set is open if and only if it is a union of open balls.

2.10 A subset A of a normed linear space V is said to be convex if A includes the line segment joining any two of its points. We know that the line segment from α to β is the image of [0, 1] under the mapping t ↦ tβ + (1 − t)α. Thus A is convex if and only if α, β ∈ A and t ∈ [0, 1] ⟹ tβ + (1 − t)α ∈ A. Prove that every ball Bᵣ(β) in a normed linear space V is convex.

2.11 A seminorm is the same as a norm except that the positivity condition n1 is relaxed to nonnegativity:

n1′. p(α) ≥ 0 for all α.

Thus p(α) may be 0 for some nonzero α. Every norm is in particular a seminorm. Prove:
a) If p is a seminorm on a vector space W and T is a linear mapping from V to W, then p ∘ T is a seminorm on V.
b) p ∘ T is a norm if and only if T is injective and p is a norm on range T.
2.12 Show that the sum of two seminorms is a seminorm.

2.13 Prove from the above two exercises (and not by a direct calculation) that

q(f) = ||f′||∞ + |f(t₀)|

is a seminorm on the space C¹([a, b]) of all continuously differentiable real functions on [a, b], where t₀ is a fixed point in [a, b]. Prove that q is a norm.

2.14 Show that the sum of two bounded sets is bounded.

2.15 Prove that the sum Bᵣ(α) + Bₛ(β) is exactly the ball B_{r+s}(α + β).

3. CONTINUITY

Let V and W be any two normed linear spaces. We shall designate both norms by || ||. This ambiguous usage does not cause confusion. It is like the ambiguous use of '0' for the zero elements of all the vector spaces under consideration.

If we replace the absolute value sign | | by the general norm symbol || || in the definition we gave earlier for the limit of a real-valued function of a real variable, it becomes verbatim the corresponding definition of convergence in the general setting. However, we shall repeat the definition and take the occasion to relax the hypothesis on the domain of f. Accordingly, let A be any subset of V, and let f be any mapping from A to W.

Definition. We say that f(ξ) approaches β as ξ approaches α, and write f(ξ) → β as ξ → α, if for every ε there is a δ such that

ξ ∈ A and 0 < ||ξ − α|| < δ  ⟹  ||f(ξ) − β|| < ε.

If α is in A, then f is continuous at α if f(ξ) → f(α) as ξ → α, and we have the direct ε,δ-characterization of continuity: f is continuous at α if for every ε there exists a δ such that ||ξ − α|| < δ ⟹ ||f(ξ) − f(α)|| < ε. It is understood here that ξ is universally quantified over the domain A of f. We say that f is continuous if f is continuous at every point α in its domain.

If the absolute value of a number is replaced by the norm of a vector, the limit theorems that we sampled in Section 1 hold verbatim for normed linear spaces. We shall ask the reader to write out a few of these transcriptions in the exercises.

There is a property stronger than continuity at α which is much simpler to use when it is available.
We say that f is Lipschitz continuous at α if there is a constant c such that ||f(ξ) − f(α)|| ≤ c||ξ − α|| for all ξ sufficiently close to α. That is, there are constants c and r such that

||ξ − α|| < r  ⟹  ||f(ξ) − f(α)|| ≤ c||ξ − α||.

For a linear map T: V → W the Lipschitz inequality is more simply written as ||T(ξ)|| ≤ c||ξ|| for all ξ ∈ V; we just use the fact that now T(ξ) − T(η) = T(ξ − η) and set ζ = ξ − η. In this context it is conventional to call T a bounded linear mapping rather than a Lipschitz linear mapping, and any such c is called a bound of T.

We know from the beginning calculus that if f is a continuous real-valued function on [a, b] (that is, if f ∈ C([a, b])), then |∫ₐᵇ f(x) dx| ≤ m(b − a), where m is the maximum value of |f(x)|. But this is just the uniform norm of f, so that the inequality can be rewritten as |∫ₐᵇ f| ≤ (b − a)||f||∞. This shows that if the uniform norm is used on C([a, b]), then f ↦ ∫ₐᵇ f is a bounded linear functional, with bound b − a.

It should immediately be pointed out that this is not the same notion of boundedness we discussed earlier. There we called a real-valued function bounded if its range was a bounded subset of R. The analogue here would be to call a vector-valued function bounded if its range is norm bounded. But a nonzero linear transformation cannot be bounded in this sense, because ||T(xα)|| = |x| ||T(α)||. The present definition amounts to the boundedness in the earlier sense of the quotient ||T(α)||/||α|| (on V − {0}).

It turns out that for a linear map T, being continuous and being Lipschitz are the same thing.

Theorem 3.1. Let T be a linear mapping from a normed linear space V to a normed linear space W. Then the following conditions are equivalent:
1) T is continuous at one point;
2) T is continuous;
3) T is bounded.

Proof. (1) ⟹ (3). Suppose T is continuous at α₀. Then, taking ε = 1, there exists δ such that ||α − α₀|| < δ ⟹ ||T(α) − T(α₀)|| < 1. Setting β = α − α₀ and using the additivity of T, we have ||β|| < δ ⟹ ||T(β)|| < 1. Now for any nonzero η, ξ = δη/2||η|| has norm δ/2.
Therefore ||T(ξ)|| < 1. But ||T(ξ)|| = δ||T(η)||/2||η||, giving ||T(η)|| ≤ 2||η||/δ. Thus T is bounded by c = 2/δ.

(3) ⟹ (2). Suppose ||T(ξ)|| ≤ C||ξ|| for all ξ. Then for any α₀ and any ε we can take δ = ε/C and have

||α − α₀|| < δ  ⟹  ||T(α) − T(α₀)|| = ||T(α − α₀)|| ≤ C||α − α₀|| < Cδ = ε.

(2) ⟹ (1). Trivial. □

In the lemma below we prove that the norm function is a Lipschitz function from V to R.

Lemma 3.1. For all α, β ∈ V, | ||α|| − ||β|| | ≤ ||α − β||.

Proof. We have ||α|| = ||(α − β) + β|| ≤ ||α − β|| + ||β||, so that ||α|| − ||β|| ≤ ||α − β||. Similarly, ||β|| − ||α|| ≤ ||β − α|| = ||α − β||. This pair of inequalities is equivalent to the lemma. □

Other Lipschitz mappings will appear when we study mappings with continuous differentials. Roughly speaking, the Lipschitz property lies between continuity and continuous differentiability, and it is frequently the condition that we actually apply under the hypothesis of continuous differentiability.

The smallest bound of a bounded linear transformation T is called its norm. That is,

||T|| = lub {||T(α)||/||α|| : α ≠ 0}.

For example, let T: C([a, b]) → R be the Riemann integral, T(f) = ∫ₐᵇ f(x) dx. We saw earlier that if we use the uniform norm ||f||∞ on C([a, b]), then T is bounded by b − a: |T(f)| ≤ (b − a)||f||∞. On the other hand, there is no smaller bound, because |T(1)| = b − a = (b − a)||1||∞. Thus ||T|| = b − a.

Other formulations of the above definition are useful. Since ||T(α)||/||α|| = ||T(α/||α||)|| by homogeneity, and β = α/||α|| has norm 1, we have

||T|| = lub {||T(β)|| : ||β|| = 1}.

Finally, if ||ξ|| ≤ 1, then ξ = xβ, where ||β|| = 1 and |x| ≤ 1, and ||T(ξ)|| = |x| ||T(β)|| ≤ ||T(β)||. We therefore have an inefficient but still useful characterization:

||T|| = lub {||T(ξ)|| : ||ξ|| ≤ 1}.

These last two formulations are uniform norms. Thus, if B₁ is the closed unit ball {ξ : ||ξ|| ≤ 1}, we see that a linear T is bounded if and only if T restricted to B₁ is bounded in the old sense, and then

||T|| = ||T | B₁||∞.

A linear map T: V → W is bounded below by b if ||T(ξ)|| ≥ b||ξ|| for all ξ in V.
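The Riemann-integral example (||T|| = b − a under the uniform norm) can be illustrated by a discretized sketch. The midpoint rule and the grid size below are our own choices, not from the text.

```python
import math

a, b, n = 0.0, 2.0, 10000
h = (b - a) / n
xs = [a + (i + 0.5) * h for i in range(n)]

def T(f): return sum(f(x) for x in xs) * h              # midpoint Riemann sum
def sup_norm(f): return max(abs(f(x)) for x in xs)      # ||f||_inf on the grid

# the bound |T(f)| <= (b - a) ||f||_inf, tested on f = sin
assert abs(T(math.sin)) <= (b - a) * sup_norm(math.sin) + 1e-9

# the bound is attained at the constant function f = 1, so ||T|| = b - a
one = lambda x: 1.0
ratio = abs(T(one)) / sup_norm(one)
print(ratio)   # approximately b - a = 2.0
```

Trying other functions f in place of sin never pushes |T(f)|/||f||∞ above b − a, which is the content of ||T|| = b − a.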
If T has a bounded inverse and m = ||T⁻¹||, then T is bounded below by 1/m, for ||T⁻¹(η)|| ≤ m||η|| for all η ∈ W if and only if ||ξ|| ≤ m||T(ξ)|| for all ξ ∈ V. If V is finite-dimensional, then it is true, conversely, that if T is bounded below, then it is invertible (why?), but in general this does not follow.

If V and W are normed linear spaces, then Hom(V, W) is defined to be the set of all bounded linear maps T: V → W. The results of Section 2.3 all remain true, but require some additional arguments.

Theorem 3.2. Hom(V, W) is itself a normed linear space if ||T|| is defined as above, as the smallest bound for T.

Proof. This follows from the discussion of Section 2 by virtue of the uniform norm identity ||T|| = ||T | B₁||∞. □

Theorem 3.3. If U, V, and W are normed linear spaces, and if T ∈ Hom(U, V) and S ∈ Hom(V, W), then S ∘ T ∈ Hom(U, W) and ||S ∘ T|| ≤ ||S|| ||T||. It follows that composition on the right by a fixed T is a bounded linear transformation from Hom(V, W) to Hom(U, W), and similarly for composition on the left by a fixed S.

Proof. ||(S ∘ T)(α)|| = ||S(T(α))|| ≤ ||S|| ||T(α)|| ≤ ||S|| ||T|| ||α||. Thus S ∘ T is bounded by ||S|| · ||T||, and everything else follows at once. □

As before, the conjugate space V* is Hom(V, R), now the space of all bounded linear functionals.

EXERCISES

3.1 Write out the ε,δ-proofs of the following limit theorems:
1) Let V and W be normed linear spaces, and let F and G be mappings from V to W. If lim_{ξ→α} F(ξ) = μ and lim_{ξ→α} G(ξ) = ν, then lim_{ξ→α} (F + G)(ξ) = μ + ν.
2) Given F: V → W and g: V → R, if F(ξ) → μ and g(ξ) → b as ξ → α, then (gF)(ξ) → bμ.

3.2 Prove that if F(ξ) → μ as ξ → α and G(η) → λ as η → μ, then G ∘ F(ξ) → λ as ξ → α. Give a careful, complete statement of the theorem you have proved.

3.3 Suppose that A is an open subset of a nls V and that α₀ ∈ A. Suppose that F: A → R is such that lim_{α→α₀} F(α) = b ≠ 0. Prove that 1/F(α) → 1/b as α → α₀ (ε,δ-proof).

3.4 The function f(x) = |x|ʳ is continuous at x = 0 for any positive r. Prove that f is not Lipschitz continuous at x = 0 if r < 1.
Prove, however, that f is Lipschitz continuous at x = a if a > 0. (Use the mean-value theorem.)

3.5 Use the mean-value theorem of the calculus and the definition of the derivative to show that if f is a real-valued function on an interval I, and if f′ exists everywhere, then f is a Lipschitz mapping if and only if f′ is a bounded function. Show also that then ||f′||∞ is the smallest Lipschitz constant C.

3.6 The "working rules" for ||T|| are:
1) ||T(ξ)|| ≤ ||T|| ||ξ|| for all ξ;
2) ||T(ξ)|| ≤ b||ξ|| for all ξ  ⟹  ||T|| ≤ b.

3.7 Show that if the one-norm ||x||₁ = Σᵢ |xᵢ| is used on Rⁿ, then the norm of the linear functional f(x) = Σᵢ aᵢxᵢ is max {|aᵢ|}.

3.8 Prove similarly that if ||x|| = max {|xᵢ|}, then ||f|| = Σᵢ |aᵢ|.

3.9 Use the above exercises to show that if ||x||₁ is the one-norm on Rⁿ, then

||x||₁ = lub {|f(x)| : f ∈ (Rⁿ)* and ||f|| ≤ 1}.

3.10 Show that if T in Hom(Rⁿ, Rᵐ) has matrix t = {tᵢⱼ}, and if we use the one-norm ||x||₁ on Rⁿ and the uniform norm ||y||∞ on Rᵐ, then ||T|| = ||t||∞ = max_{i,j} |tᵢⱼ|.

3.11 Show that the meaning of 'Hom(V, W)' has changed by giving an example of a linear mapping that fails to be bounded. There is one in the text.

3.12 For a fixed ξ in V define the mapping ev_ξ: Hom(V, W) → W by ev_ξ(T) = T(ξ). Prove that ev_ξ is a bounded linear mapping.

3.13 In the above exercise it is in fact true that ||ev_ξ|| = ||ξ||, but to prove this we need a new theorem:

Theorem. Given β in the normed linear space V, there exists a functional f in V* such that ||f|| = 1 and |f(β)| = ||β||.

Assuming this theorem, prove that ||ev_ξ|| = ||ξ||. [Hint: Presumably you have already shown that ||ev_ξ|| ≤ ||ξ||. You now need a T in Hom(V, W) such that ||T|| = 1 and ||T(ξ)|| = ||ξ||. Consider a suitable dyad.]

3.14 Let t = {tᵢⱼ} be a square matrix, and define ||t||∞ as max_{i,j} {|tᵢⱼ|}.
Prove that this is a norm on the space Rⁿ² of all n × n matrices. Prove that ||st||∞ ≤ n||s||∞ ||t||∞. Compute the norm of the identity matrix.

3.15 Let V be the normed linear space Rⁿ under the uniform norm ||x||∞ = max {|xᵢ|}. If T ∈ Hom V has matrix t = {tᵢⱼ}, prove that

||T|| = maxᵢ Σⱼ₌₁ⁿ |tᵢⱼ|.

(Show first that maxᵢ Σⱼ |tᵢⱼ| is an upper bound of ||T||, and then show that ||T(x)|| = (maxᵢ Σⱼ |tᵢⱼ|) ||x|| for a specially chosen x.) Does part of the previous exercise now become superfluous?

3.16 Assume the following fact: if f ∈ C([0, 1]) and ∫₀¹ |f| = a, then, given ε, there is a function g ∈ C([0, 1]) such that ||g||∞ ≤ 1 and ∫₀¹ fg > a − ε.

Let K(s, t) be continuous on [0, 1] × [0, 1] and bounded by b. Define T: C([0, 1]) → B([0, 1]) by Tf = k, where

k(s) = ∫₀¹ K(s, t)f(t) dt.

If V and W are the normed linear spaces C and B under the uniform norms, prove that

||T|| = lubₛ ∫₀¹ |K(s, t)| dt.

[Hint: Proceed as in the above exercise.]

3.17 Let V and W be normed linear spaces, and let A be any subset of V containing more than one point. Let L(A, W) be the set of all Lipschitz mappings from A to W. For f in L(A, W), let p(f) be the smallest Lipschitz constant for f. That is,

p(f) = lub { ||f(ξ) − f(η)|| / ||ξ − η|| : ξ ≠ η }.

Prove that L(A, W) is a vector space V and that p is a seminorm on V.

3.18 Continuing the above exercise, show that if α is any fixed point of A, then p(f) + ||f(α)|| is a norm on V.

3.19 Let K be a mapping from a subset A of a normed linear space V to V which differs from the identity by a Lipschitz mapping with constant c less than 1. We may as well take c = 1/2, and then our hypothesis is that

||(K(ξ) − K(η)) − (ξ − η)|| ≤ ½||ξ − η||.

Prove that K is injective and that its inverse is a Lipschitz mapping with constant 2.

3.20 Continuing the above exercise, suppose in addition that the domain A of K is an open subset of V and that K[C] is a closed set whenever C is a closed ball lying in A. Prove that if C = C̄ᵣ(α), the closed ball of radius r about α, is a subset of A, then K[C] includes the ball B = B_{r/2}(γ), where γ = K(α).
This proof is elementary but tricky. If there is a point v of B not in K[C], then since K[C] is closed, there is a largest ball B′ about v disjoint from K[C] and a point η = K(ξ) in K[C] as close to B′ as we wish. Now if we change ξ by adding v − η, the change in the value of K will approximate v − η closely enough to force the new value of K to be in B′. If we can also show that the new argument ξ + (v − η) is in C, then this new value of K is in K[C], and we have our contradiction. Draw a picture. Obviously, the radius ρ of B′ is at most r/2. Show that if η = K(ξ) is chosen so that ||v − η|| < (3/2)ρ, then the above assertions follow from the triangle inequality and the inequality displayed in Exercise 3.19. You have to prove that

||K(ξ + (v − η)) − v|| < ρ.
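Exercise 3.15's formula, ||T|| = maxᵢ Σⱼ |tᵢⱼ| for the uniform norm on Rⁿ, can be checked numerically. The sample matrix and the random search below are our own illustration:

```python
import random
random.seed(1)

t = [[1.0, -2.0, 3.0],
     [0.5,  0.0, -1.0],
     [2.0,  2.0,  2.0]]                                     # our own sample matrix

def apply(t, x):
    return [sum(row[j] * x[j] for j in range(len(x))) for row in t]

def sup_norm(x): return max(abs(v) for v in x)

row_sum_norm = max(sum(abs(v) for v in row) for row in t)   # claimed value of ||T||

# random vectors never beat the claimed norm ...
best = 0.0
for _ in range(2000):
    x = [random.uniform(-1.0, 1.0) for _ in range(3)]
    s = sup_norm(x)
    if s > 0:
        best = max(best, sup_norm(apply(t, x)) / s)

# ... and the "specially chosen x" (the signs of the largest row) attains it
i = max(range(3), key=lambda i: sum(abs(v) for v in t[i]))
x_star = [1.0 if v >= 0 else -1.0 for v in t[i]]            # ||x_star||_inf = 1
best = max(best, sup_norm(apply(t, x_star)))
print(best, row_sum_norm)   # both 6.0 for this matrix
```

The specially chosen x matches the hint in Exercise 3.15: it puts a sign of ±1 in each coordinate so that the maximizing row sum is realized exactly.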

4. EQUIVALENT NORMS

Two norms p and q on a vector space V are said to be equivalent if there are positive constants a and b such that p ≤ aq and q ≤ bp. This says exactly that the identity map, viewed as a map from the normed linear space <V, p> to the normed linear space <V, q>, is bounded in both directions, and hence that these two normed linear spaces are isomorphic.

If V is infinite-dimensional, two norms will in general not be equivalent. For example, if V = C([0, 1]) and fₙ(x) = xⁿ, then ||fₙ||₁ = 1/(n + 1) and ||fₙ||∞ = 1. Therefore, there is no constant a such that ||f||∞ ≤ a||f||₁ for all f ∈ C([0, 1]), and the norms || ||∞ and || ||₁ are not equivalent on V = C([0, 1]). This is why the very notion of a normed linear space depends on the assumption of a given norm. However, we have the following theorem, which we shall prove in the next chapter by more sophisticated methods than we are using at present.

Theorem 4.1. On a finite-dimensional vector space V all norms are equivalent.

We shall need this theorem and also the following consequence of it occasionally in the present chapter.

Theorem 4.2. If V and W are finite-dimensional normed linear spaces, then every linear mapping T from V to W is necessarily bounded.

Proof. Because of the above theorem, it is sufficient to prove T bounded with respect to some pair of norms. Let θ: Rⁿ → V and φ: Rᵐ → W be any basis isomorphisms, and let t = {tᵢⱼ} be the matrix of T̄ = φ⁻¹ ∘ T ∘ θ in Hom(Rⁿ, Rᵐ). Then

||T̄x||∞ = maxᵢ |Σⱼ tᵢⱼxⱼ| ≤ nb||x||∞,

where b = max |tᵢⱼ|. Now q(η) = ||φ⁻¹(η)||∞ and p(ξ) = ||θ⁻¹(ξ)||∞ are norms on W and V respectively, by Lemma 2.1, and since

q(T(ξ)) = ||T̄(θ⁻¹(ξ))||∞ ≤ nb||θ⁻¹(ξ)||∞ = nb·p(ξ),

we see that T is bounded by nb with respect to the norms p and q on V and W. □

If we change to an equivalent norm, we are merely passing through an isomorphism, and all continuous linear properties remain unchanged. For example:

Theorem 4.3. The vector space Hom(V, W) remains the same if either the domain norm or the range norm is replaced by an equivalent norm, and the two induced norms on Hom(V, W) are equivalent.

Proof. The proof is left to the reader. □

We now ask what kind of a norm we might want on the Cartesian product V × W of two normed linear spaces.
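The inequivalence example fₙ(x) = xⁿ can be watched numerically: the ratio ||fₙ||∞ / ||fₙ||₁ grows like n + 1, so no single constant a can serve for all f. A discretized Python sketch (the grid size is our own choice):

```python
n_pts = 20000
xs = [(i + 0.5) / n_pts for i in range(n_pts)]              # midpoint grid on [0, 1]

def one_norm(f): return sum(abs(f(x)) for x in xs) / n_pts  # ~ integral of |f|
def sup_norm(f): return max(abs(f(x)) for x in xs)

ratios = []
for n in (1, 5, 25, 125):
    f = lambda x, n=n: x ** n
    ratios.append(sup_norm(f) / one_norm(f))                # ~ n + 1
print(ratios)   # roughly [2, 6, 26, 126]: unbounded, so the norms are inequivalent
```

By contrast, running the same experiment with the three norms on a fixed Rⁿ would show bounded ratios, in line with Theorem 4.1.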
It is natural to try to choose the product norm so that the fundamental mappings relating the product space to the two factor spaces, the two projections πᵢ and the two injections θᵢ, should be continuous. It turns out that these requirements determine the product norm uniquely to within equivalence. For if || || has these properties, then

||<α, ξ>|| = ||<α, 0> + <0, ξ>|| ≤ ||<α, 0>|| + ||<0, ξ>|| ≤ k₁||α|| + k₂||ξ|| ≤ k(||α|| + ||ξ||),

where kᵢ is a bound of the injection θᵢ and k is the larger of k₁ and k₂. Also, ||α|| ≤ c₁||<α, ξ>|| and ||ξ|| ≤ c₂||<α, ξ>||, by the boundedness of the projections πᵢ, and so ||α|| + ||ξ|| ≤ c||<α, ξ>||, where c = c₁ + c₂. Now ||α|| + ||ξ|| is clearly a norm || ||₁ on V × W, and our argument above shows that || || will satisfy our requirements if and only if it is equivalent to || ||₁. Any such norm will be called a product norm for V × W. The product norms most frequently used are the uniform (product) norm ||<α, ξ>||∞ = max {||α||, ||ξ||}, the Euclidean (product) norm ||<α, ξ>||₂ = (||α||² + ||ξ||²)^{1/2}, and the above sum (product) norm || ||₁. We shall leave the verification that the uniform and Euclidean norms actually are norms as exercises. Each of these three product norms can be defined as well for n factor spaces as for two, and we gather the facts for this general case into a theorem.

Theorem 4.4. If {Vᵢ}₁ⁿ is a finite set of normed linear spaces, then || ||₁, || ||₂, and || ||∞, defined on V = Π₁ⁿ Vᵢ by

||α||₁ = Σ₁ⁿ pᵢ(αᵢ),   ||α||₂ = (Σ₁ⁿ pᵢ(αᵢ)²)^{1/2},   ||α||∞ = max {pᵢ(αᵢ) : i = 1, ..., n},

are equivalent norms on V, and each is a product norm in the sense that the projections πᵢ and the injections θᵢ are all continuous.

*It looks above as though all we are doing is taking any norm || || on Rⁿ and then defining a norm ||| ||| on the product space V by

|||α||| = ||<p₁(α₁), ..., pₙ(αₙ)>||.

This is almost correct. The interested reader will discover, however, that || || on Rⁿ must have the property that if |xᵢ| ≤ |yᵢ| for i = 1, ..., n, then ||x|| ≤ ||y||, for the triangle inequality to follow for ||| ||| in V.
If we call such a norm on Rⁿ an increasing norm, then the following is true:

If || || is any increasing norm on Rⁿ, then |||α||| is a product norm on V = Π₁ⁿ Vᵢ.

However, we shall use only the 1-, 2-, and ∞-product norms in this book.*

The triangle inequality, the continuity of addition, and our requirements on a product norm form a set of nearly equivalent conditions. In particular, we make the following observation.

Lemma 4.1. If V is a normed linear space, then the operation of addition is a bounded linear map from V × V to V.

Proof. The triangle inequality for the norm on V says exactly that addition is bounded by 1 when the sum norm is used on V × V. □

A normed linear space V is a (norm) direct sum ⊕₁ⁿ Vᵢ if the mapping <α₁, ..., αₙ> ↦ Σ₁ⁿ αᵢ is a norm isomorphism from Π₁ⁿ Vᵢ to V. That is, the given norm on V must be equivalent to the product norm it acquires when it is viewed as Π₁ⁿ Vᵢ. If V is algebraically the direct sum ⊕₁ⁿ Vᵢ, we always have

||Σ αᵢ|| ≤ Σ ||αᵢ||

by the triangle inequality for the norm on V, and the sum on the right is the one-norm for Π₁ⁿ Vᵢ. Therefore, V will be the norm direct sum ⊕₁ⁿ Vᵢ if, conversely, there is an n-tuple of constants {kᵢ} such that ||αᵢ|| ≤ kᵢ||Σ αⱼ|| for all i. This is the same as saying that the projections Pᵢ are all bounded. Thus,

Theorem 4.5. If V is a normed linear space and V is algebraically the direct sum V = ⊕₁ⁿ Vᵢ, then V = ⊕₁ⁿ Vᵢ as normed linear spaces if and only if the associated projections {Pᵢ} are all bounded.

EXERCISES

4.1 The fact that Hom(V, W) is unchanged when norms are replaced by equivalent norms can be viewed as a corollary of Theorem 3.3. Show that this is so.

4.2 Write down a string of inequalities showing that the norms || ||₁, || ||₂, and || ||∞ on Rⁿ are equivalent. Discuss what happens as n → ∞.

4.3 Let V be an n-dimensional vector space, and consider the collection of all norms on V of the form p ∘ θ, where θ: V → Rⁿ is a coordinate isomorphism and p is one of the norms || ||₁, || ||₂, || ||∞ on Rⁿ.
Show that all of these norms are equivalent. (Use the above exercise and the reasoning in Theorem 4.2.)

4.4 Prove that ||<α, ξ>|| = max {||α||, ||ξ||} is a norm on V × W.

4.5 Prove that ||<α, ξ>|| = ||α|| + ||ξ|| is a norm on V × W.

4.6 Prove that ||<α, ξ>|| = (||α||² + ||ξ||²)^{1/2} is a norm on V × W.

4.7 Assuming Exercises 4.4 through 4.6, prove by induction the corresponding parts of Theorem 4.4.

4.8 Prove that if A is an open subset of V × W, then π₁[A] is an open subset of V.

4.9 Prove (ε, δ) that <T, S> ↦ S ∘ T is a continuous map from Hom(V₁, V₂) × Hom(V₂, V₃) to Hom(V₁, V₃), where the Vᵢ are all normed linear spaces.

4.10 Let || || be any increasing norm on Rⁿ; that is, ||x|| ≤ ||y|| if |xᵢ| ≤ |yᵢ| for all i. Let pᵢ be a norm on the vector space Vᵢ for i = 1, ..., n. Show that |||α||| = ||<p₁(α₁), ..., pₙ(αₙ)>|| is a norm on V = Π₁ⁿ Vᵢ.

4.11 Suppose that p: V → R is a nonnegative function such that p(xα) = |x|p(α) for all x, α. This is surely a minimum requirement for any function purporting to be a measure of the length of a vector.
a) Define continuity with respect to p and show that Theorem 3.1 is valid.
b) Our next requirement is that addition be continuous as a map from V × V to V, and we decide that continuity at 0 means that for every ε there is a δ such that p(α) < δ and p(β) < δ ⟹ p(α + β) < ε.

5. INFINITESIMALS

In the elementary calculus, to say that f has the derivative c at the point a is to say that the increment Δf(t) = f(a + t) − f(a) can be written Δf(t) = ct + g(t), where g(t)/t → 0 as t → 0. If we divide the last equation by t again, we see that this property of the infinitesimal g, that it converges to 0 faster than t as t → 0, is exactly equivalent to the fact that the difference quotient of f converges to c. This makes it clear that the study of derivatives is included in the study of the rate at which infinitesimals get small, and the usefulness of this paraphrase will shortly become clear.

Definition. A subset A of a normed linear space V is a neighborhood of a point α if A includes some open ball about α. A deleted neighborhood of α is a neighborhood of α minus the point α itself.

We define special sets of functions ℐ, O, and o as follows. It will be assumed in these definitions that each function is from a neighborhood of 0 in a normed linear space V to a normed linear space W.

f ∈ ℐ if f(0) = 0 and f is continuous at 0. These functions are the infinitesimals.

f ∈ O if f(0) = 0 and f is Lipschitz continuous at 0. That is, there exist positive constants r and c such that ||f(ξ)|| ≤ c||ξ|| on Bᵣ(0).

f ∈ o if f(0) = 0 and ||f(ξ)||/||ξ|| → 0 as ξ → 0.

When the spaces V and W are not understood, we specify them by writing o(V, W), etc.

A simple set of functions from R to R makes the qualitative difference between these classes apparent. The function f(x) = |x|^{1/2} is in ℐ(R, R) but not in O, g(x) = x is in O and therefore in ℐ but not in o, and h(x) = x² is in all three classes (Fig. 3.7).

It is clear that ℐ, O, and o are unchanged when the norms on V and W are replaced by equivalent norms.

Our previous notion of the sum of two functions does not apply to a pair of functions f, g ∈ ℐ(V, W) because their domains may be different. However, f + g is defined on the intersection dom f ∩ dom g, which is still a neighborhood of 0. Moreover, addition remains commutative and associative when extended in this way. It is clear that then ℐ(V, W) is almost a vector space. The only trouble occurs in connection with the equation f + (−f) = 0; the domain of the function on the left is dom f, whereas we naturally take 0 to be the zero function on the whole of V.

*The way out of this difficulty is to identify two functions f and g in ℐ if they are the same on some ball about 0. We define f and g to be equivalent (f ∼ g) if and only if there exists a neighborhood of 0 on which f = g. We then check (in our minds) that this is an equivalence relation and that we now do have a vector space. Its elements are called germs of functions at 0. Strictly speaking, a germ is thus an equivalence class of functions, but
in practice ane tends to think of germs in terms of their representing functions, only keeping in mind that two functions are the same as germs when they agree on a neighbor hood of O.« As one might guess from our introductory discussion, the algebraic prop- erties of the three classes 4, ©, and e are crucial for the differential calculus. We gather them together in the following theorem. ‘Theorem 5.1 1) o(¥, W) COV, W) C4(V, W), and each of the three classes is closed under addition and multiplication by scalars, 2) If fEO(V,W), and if ge o(W,X), then gofeo(V, X), where dom ge f = j"ldom 9]. 3) If either f or g above is in e, then so is g °f. 4) If fEo(V,W) and ges(V,R), then fy €o(V,W), and similarly it Sesand geo. 5) In (A) if cither f or g is in o and the other is merely bounded on a neigh borhood of 0, then fy € o(V’, W). 6) Hom(¥, W)C o(V, W). 7) Hom(V, W) no(V, W) = {0}. Proof. Let £4(V, W) be the set of infinitesimals f such that |[f(é)|| < ¢l!é| on ‘some ball about 0. Then f ¢ 0 if and only if fis in some £,, and f €0 if and only iff isin every &.. Obviously, oC 0 C8. 1) IE JJ] < allel on BY(0) and [g(¥)|| < biz] on By(0), then Ie) + 98] S (@+ dE on B,(0), where r= min {tu}. Thus © is closed under addition. The closure of o under addition follows similarly, or simply from the limit of a sum being the sum of the limits. 2) If fll < alléll when [|e < ¢ and |\g(n)|| < bljn|| when In|] ) and also 7 = (2), then 7 = o(8). Proof. The hypotheses imply that there are numbers b, ry and p such that. ‘nl < Ble + $e + fal) if lle < v1 and [el + [nll €O(V, XX ¥). ‘That is, <0(2), 0(8)> = 0(8). Proof. ‘The proof is left to the reader. 140 THE DIWrERENTIAL cancuLUS: 36 EXERCISES 51 Prove in detail that the class 9(V, W) is unchanged if the norms on V and IP are replaced by equivalent norm 5.2 Do the same for © and e. 
5.3 Prove (3) of the 𝓞o-theorem (Theorem 5.1).
5.4 Prove also that if in (4) either f or g is in 𝓞 and the other is merely bounded on a neighborhood of 0, then gf ∈ 𝓞(V, W).
5.5 Prove Lemma 5.2. (Remember that F = <F₁, F₂> is loose language for F = θ₁ ∘ F₁ + θ₂ ∘ F₂.) State the generalization to n functions. State the 𝓞-form of the theorem.
5.6 Given F₁ ∈ 𝓞(V₁, W) and F₂ ∈ 𝓞(V₂, W), define F from (a subset of) V = V₁ × V₂ to W by F(α₁, α₂) = F₁(α₁) + F₂(α₂). Prove that F ∈ 𝓞(V, W). (First state the defining equation as an identity involving the projections π₁ and π₂ and not involving explicit mention of the domain vectors α₁ and α₂.)
5.7 Given F₁ ∈ 𝓞(V₁, W) and F₂ ∈ 𝓞(V₂, R), define precisely what you mean by the product F₂F₁, and show that it is in o(V₁ × V₂, W).
5.8 Define the class 𝓞ⁿ as follows: f ∈ 𝓞ⁿ if f ∈ 𝓘 and ‖f(ξ)‖/‖ξ‖ⁿ is bounded in some deleted ball about 0. (A deleted neighborhood of α is a neighborhood minus α.) State and prove a theorem about f + g when f ∈ 𝓞ⁿ and g ∈ 𝓞ᵐ.
5.9 State and prove a theorem about fg when f ∈ 𝓞ⁿ and g ∈ 𝓞ᵐ.
5.10 State and prove a theorem about f ∘ g when f ∈ 𝓞ⁿ and g ∈ 𝓞ᵐ.
5.11 Define a similar class oⁿ. State and prove a theorem about f ∘ g when f ∈ 𝓞ⁿ and g ∈ oᵐ.

6. THE DIFFERENTIAL

Before considering the notion of the differential, we shall review some geometric material from the elementary calculus. We do this for motivation only; our subsequent theory is independent of the preliminary discussion.

In the elementary one-variable calculus the derivative f′(a) of a function f at the point a has geometric meaning as the slope of the tangent line to the graph of f at the point <a, f(a)>. (Of course, according to our notion of a function, the graph of f is f.) The tangent line thus has the (point-slope) equation y − f(a) = f′(a)(x − a), and is the graph of the affine map x ↦ f′(a)(x − a) + f(a). We ordinarily examine the nature of the curve f near the point by using new variables which are zero at this point.
That is, we express everything in terms of s = y − f(a) and t = x − a. This change of variables is simply the translation <x, y> ↦ <x − a, y − f(a)> in the Cartesian plane R², which brings the point of interest to the origin. If we picture the situation in a Euclidean plane, of which the next page is a satisfactory local model, then this translation in R² is represented by a choice of new axes, the t- and s-axes, with origin at the point of tangency. Since y = f(x) if and only if s = f(a + t) − f(a), we see that the image of f under this translation is the function Δf_a defined by Δf_a(t) = f(a + t) − f(a). (See Fig. 3.8.) Of course, Δf_a is simply our old friend the change in f brought about by changing x from a to a + t.

[Fig. 3.8]

Similarly, the equation y − f(a) = f′(a)(x − a) becomes s = f′(a)t, and the tangent line accordingly translates to the line that is (the graph of) the linear functional l: t ↦ f′(a)t having the number f′(a) as its skeleton (matrix). Remember that from the point of view of the geometric configuration (curve and tangent line) in the Euclidean plane, all that we are doing is choosing the natural axis system, with origin at the point of tangency. Then the curve is (the graph of) the function Δf_a, and the tangent line is (the graph of) the linear map l.

Now it follows from the definition of f′(a) that l can also be characterized as the linear function that approximates Δf_a most closely. For, by definition,

Δf_a(t)/t → f′(a) as t → 0,

and this is exactly the same as saying that Δf_a(t) − f′(a)t = o(t), that is,

Δf_a = l + o.

But we know from the 𝓞o-theorem that the expression of the function Δf_a as the sum l + o is unique. This unique linear approximation l is called the differential of f at a and is designated df_a. Again: the differential of f at a is the linear function l: R → R that approximates the actual change in f, Δf_a, in the sense that Δf_a − l is o; we saw above that if the derivative f′(a) exists, then the differential of f at a exists and has f′(a) as its skeleton (1 × 1 matrix).

Similarly, if f is a function of two variables, then (the graph of) f is a surface in Cartesian 3-space R³ = R² × R, and the tangent plane to this surface at <a, b, f(a, b)> has the equation

z − f(a, b) = f₁(a, b)(x − a) + f₂(a, b)(y − b),

where f₁ = ∂f/∂x and f₂ = ∂f/∂y. If, as above, we use new variables that vanish at the point of interest, then Δf_<a,b> is the change in f around <a, b>, and l is the linear functional on R² with matrix (skeleton) <f₁(a, b), f₂(a, b)>. Moreover, it is a theorem of the standard calculus that if the partial derivatives of f are continuous, then again l approximates Δf_<a,b>, with error in o. Here also l is called the differential of f at <a, b> and is designated df_<a,b> (Fig. 3.9). The notation in the figure has been changed to show the value at t of the differential df_a of f at a.

[Fig. 3.9]

The following definition should now be clear. As above, the local function ΔF_α is defined by ΔF_α(ξ) = F(α + ξ) − F(α).

Definition. Let V and W be normed linear spaces, and let A be a neighborhood of α in V. A mapping F: A → W is differentiable at α if there is a T in Hom(V, W) such that

ΔF_α(ξ) = T(ξ) + o(ξ).

The 𝓞o-theorem implies then that T is uniquely determined, for if also ΔF_α = S + o, then T − S ∈ o, and so T − S = 0 by (7) of the theorem. This uniquely determined T is called the differential of F at α and is designated dF_α. Thus ΔF_α = dF_α + o, where dF_α is the unique (bounded) linear approximation to ΔF_α.

Our preliminary discussion should make it clear that this definition of the differential agrees with standard usage when the domain space is Rⁿ. However, in certain cases when the domain space is an infinite-dimensional function space, dF_α is called the first variation of F at α.
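The defining condition ΔF_α = dF_α + o lends itself to a direct numerical check. In the sketch below (the map F, the point α, and the candidate differential T are all invented for illustration; they are not examples from the text), the remainder quotient ‖ΔF_α(ξ) − T(ξ)‖/‖ξ‖ is seen to shrink in proportion to ‖ξ‖:

```python
import numpy as np

# Hypothetical map F(x, y) = (x^2*y, x + y^2) and candidate differential T,
# the linear map whose matrix holds the partial derivatives of F at a.
def F(v):
    x, y = v
    return np.array([x * x * y, x + y * y])

def T(a):
    x, y = a
    return np.array([[2 * x * y, x * x],
                     [1.0,       2 * y]])

a = np.array([1.0, 2.0])
direction = np.array([0.6, -0.8])            # a unit vector in the 2-norm
ratios = []
for t in (1e-1, 1e-2, 1e-3):
    xi = t * direction
    remainder = F(a + xi) - F(a) - T(a) @ xi  # Delta F_a(xi) - T(xi)
    ratios.append(np.linalg.norm(remainder) / np.linalg.norm(xi))
print(ratios)  # decreasing toward 0: the remainder is o(xi)
```

Running the same loop with a candidate T wrong in any entry leaves the quotient bounded away from 0, which is the uniqueness clause (7) of the 𝓞o-theorem at work.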
This is due to the fact that although the early writers on the calculus of variations saw its analogy with the differential calculus, they did not realize that it was the same subject.

We gather together in the next two theorems the familiar rules for differentiation. They follow immediately from the definition and the 𝓞o-theorem. It will be convenient to use the notation 𝓓_α(V, W) for the set of all mappings from neighborhoods of α in V to W that are differentiable at α.

Theorem 6.1
1) If F ∈ 𝓓_α(V, W), then ΔF_α ∈ 𝓞(V, W).
2) If F, G ∈ 𝓓_α(V, W), then F + G ∈ 𝓓_α(V, W) and d(F + G)_α = dF_α + dG_α.
3) If F ∈ 𝓓_α(V, R) and G ∈ 𝓓_α(V, W), then FG ∈ 𝓓_α(V, W) and d(FG)_α = F(α) dG_α + dF_α G(α), the second term being a dyad.
4) If F is a constant function on V, then F is differentiable and dF_α = 0.
5) If F ∈ Hom(V, W), then F is differentiable at every α ∈ V and dF_α = F.

Proof.
1) ΔF_α = dF_α + o ∈ 𝓞 + o = 𝓞 by (1) and (6) of the 𝓞o-theorem.
2) It is clear that Δ(F + G)_α = ΔF_α + ΔG_α. Therefore, Δ(F + G)_α = (dF_α + o) + (dG_α + o) = (dF_α + dG_α) + o by (1) of the 𝓞o-theorem. Since dF_α + dG_α ∈ Hom(V, W), we have (2).
3) Δ(FG)_α(ξ) = F(α + ξ)G(α + ξ) − F(α)G(α) = ΔF_α(ξ)G(α) + F(α) ΔG_α(ξ) + ΔF_α(ξ) ΔG_α(ξ), as the reader will see upon expanding and canceling. This is just the usual device of adding and subtracting middle terms in order to arrive at the form involving the Δ's. Thus Δ(FG)_α = (dF_α + o)G(α) + F(α)(dG_α + o) + 𝓞·𝓞 = dF_α G(α) + F(α) dG_α + o by the 𝓞o-theorem.
4) If ΔF_α = 0, then dF_α = 0 by (7) of the 𝓞o-theorem.
5) ΔF_α(ξ) = F(α + ξ) − F(α) = F(ξ). Thus ΔF_α = F + 0, and dF_α = F ∈ Hom(V, W). ∎

The composite-function rule is somewhat more complicated.

Theorem 6.2. If F ∈ 𝓓_α(V, W) and G ∈ 𝓓_{F(α)}(W, X), then G ∘ F ∈ 𝓓_α(V, X) and

d(G ∘ F)_α = dG_{F(α)} ∘ dF_α.

Proof. We have

Δ(G ∘ F)_α(ξ) = G(F(α + ξ)) − G(F(α))
= G(F(α) + ΔF_α(ξ)) − G(F(α))
= ΔG_{F(α)}(ΔF_α(ξ))
= dG_{F(α)}(ΔF_α(ξ)) + o(ΔF_α(ξ))
= dG_{F(α)}(dF_α(ξ)) + dG_{F(α)}(o(ξ)) + o ∘ 𝓞 (ξ)
= (dG_{F(α)} ∘ dF_α)(ξ) + o(ξ) + o(ξ).

Thus Δ(G ∘ F)_α = dG_{F(α)} ∘ dF_α + o, and since dG_{F(α)} ∘ dF_α ∈ Hom(V, X), this proves the theorem. The reader should be able to justify each step taken in this chain of equalities. ∎

EXERCISES

6.1 The coordinate mapping <x, y> ↦ x from R² to R is differentiable. Why? What is its differential?
6.2 Prove that differentiation commutes with the application of bounded linear maps. That is, show that if F: V → W is differentiable at α and if T ∈ Hom(W, X), then T ∘ F is differentiable at α and d(T ∘ F)_α = T ∘ dF_α.
6.3 Prove that F ∈ 𝓓_α(V, R) and F(α) ≠ 0 ⇒ G = 1/F ∈ 𝓓_α(V, R) and dG_α = −dF_α/(F(α))².
6.4 Let F: V → R be differentiable at α, and let f: R → R be a function whose derivative exists at a = F(α). Prove that f ∘ F is differentiable at α and that d(f ∘ F)_α = f′(a) dF_α. [Remember that the differential of f at a is simply multiplication by its derivative: df_a(h) = f′(a)h.] Show that the preceding problem is a special case.
6.5 Let V and W be normed linear spaces, and let F: V → W and G: W → V be continuous maps such that G ∘ F = I_V and F ∘ G = I_W. Suppose that F is differentiable at α and that G is differentiable at β = F(α). Prove that dG_β = (dF_α)⁻¹.
6.6 Let f: V → R be differentiable at α. Show that g = fⁿ is differentiable at α and that dg_α = n(f(α))ⁿ⁻¹ df_α. (Prove this both by an induction on the product rule and by the composite-function rule, assuming in the second case that D_x xⁿ = nxⁿ⁻¹.)
6.7 Prove from the product rule by induction that if the n functions fᵢ: V → R, i = 1, …, n, are all differentiable at α, then so is f = ∏₁ⁿ fᵢ, and that df_α = Σᵢ (∏_{j≠i} fⱼ(α)) d(fᵢ)_α.
6.8 A monomial of degree n on the normed linear space V is a product ∏₁ⁿ lᵢ of n linear functionals (lᵢ ∈ V*). A homogeneous polynomial of degree n is a finite sum of monomials of degree n. A polynomial of degree n is a sum of homogeneous polynomials Pᵢ, i = 0, …, n, where P₀ is constant. Show from the above exercise and other known facts that a polynomial is differentiable everywhere.
6.9 Show that if F₁: V → W₁ and F₂: V → W₂ are both differentiable at α, then so is F = <F₁, F₂> from V to W = W₁ × W₂ (use the injections θ₁ and θ₂).
6.10 Show without using explicit computations, but using the results of earlier exercises instead, that the mapping F: R² → R² defined by … is everywhere differentiable. Now compute its differential at ….
6.11 Let F: V → X and G: W → X be differentiable at α and β respectively, and define K: V × W → X by K(ξ, η) = F(ξ) + G(η). Show that K is differentiable at <α, β>:
a) by a direct Δ-calculation;
b) by using the projections π₁ and π₂ to express K in terms of F and G without explicit reference to the variables, and then applying the differentiation rules.
6.12 Now suppose given F: V → R and G: W → X, and define K by K(ξ, η) = F(ξ)G(η). Show that if F and G are differentiable at α and β respectively, then K is differentiable at <α, β>, in the manner of (b) in the above exercise.
6.13 Let V and W be normed linear spaces. Prove that the map <α, β> ↦ ‖α‖ ‖β‖ from V × W to R is in o(V × W, R). Use the maximum norm on the product space. Let f: V × W → R be bounded and bilinear. Here boundedness means that there is some b such that |f(α, β)| ≤ b‖α‖ ‖β‖ for all α, β. Prove that f is differentiable everywhere and find its differential.
6.14 Let f and g be differentiable functions from R to R. We know from the composite-function rule of the ordinary calculus that (g ∘ f)′(x) = g′(f(x)) f′(x). Our composite-function rule says that d(g ∘ f)_x = dg_{f(x)} ∘ df_x, where df_x is the linear mapping t ↦ f′(x)t. Show that these two statements are equivalent.
6.15 Prove that f(x, y) = ‖<x, y>‖₁ = |x| + |y| is differentiable except on the coordinate axes (that is, df_<a,b> exists if a and b are both nonzero).
6.16 Comparing the shapes of the unit balls for ‖·‖₁ and ‖·‖_∞ on R², guess from the above the theorem about the differentiability of ‖·‖_∞. Prove it.
6.17 Let V and W be fixed normed linear spaces, let X_d be the set of all maps from V to W that are differentiable at 0, let X_o be the set of all maps from V to W that belong to o(V, W), and let X_h be Hom(V, W). Prove that X_d and X_o are vector spaces and that X_d = X_o ⊕ X_h.
6.18 Let F be a Lipschitz function with constant C which is differentiable at a point α. Prove that ‖dF_α‖ ≤ C.

7. DIRECTIONAL DERIVATIVES; THE MEAN-VALUE THEOREM

Directional derivatives form the connecting link between differentials and the derivatives of the elementary calculus, and, although they add one more concept that has to be fitted into the scheme of things, the reader should find them intuitively satisfying and technically useful.

A continuous function f from an interval I ⊂ R to a normed linear space W can have a derivative f′(x) at a point x ∈ I in exactly the sense of the elementary calculus:

f′(x) = lim_{t→0} (f(x + t) − f(x))/t.

The range of such a function f is a curve or arc in W, and it is conventional to call f itself a parametrized arc when we want to keep this geometric notion in mind. We shall also call f′(x), if it exists, the tangent vector to the arc f at x. This terminology fits our geometric intuition, as Fig. 3.10 suggests. For simplicity we have set x = 0 and f(x) = 0. If f′(x) exists, we say that the parametrized arc f is smooth at x. We also say that f is smooth at α = f(x), but this terminology is ambiguous if f is not injective (i.e., if the arc crosses itself). An arc is smooth if it is smooth at every value of the parameter.

We naturally wonder about the relationship between the existence of the tangent vector f′(x) and the differentiability of f at x. If df_x exists, then, being a linear map on R, it is simply multiplication "by" the fixed vector α that is its skeleton, df_x(h) = h df_x(1) = hα, and we expect α to be the tangent vector f′(x).

[Fig. 3.10]

We showed this and also the converse result for the ordinary calculus in our preliminary discussion in Section 6. Actually, our argument was valid for vector-valued functions, but we shall repeat it anyway.
When we think of a vector-valued function of a real variable as being an arc, we often use Greek letters like 'λ' and 'γ' for the function, as we do below. This of course does not in any way change what is being proved, but is slightly suggestive of a geometric interpretation.

Theorem 7.1. A parametrized arc γ: [a, b] → V is differentiable at x ∈ (a, b) if and only if the tangent vector (derivative) α = γ′(x) exists, in which case the tangent vector is the skeleton of the differential: dγ_x(h) = hγ′(x) = hα.

Proof. If the parametrized arc γ: [a, b] → V is differentiable at x ∈ (a, b), then dγ_x(h) = h dγ_x(1) = hα, where α = dγ_x(1). Since Δγ_x − dγ_x ∈ o, this gives ‖Δγ_x(h) − hα‖/|h| → 0, and so Δγ_x(h)/h → α as h → 0. Thus α is the derivative γ′(x) in the ordinary sense. By reversing the above steps we see that the existence of γ′(x) implies the differentiability of γ at x. ∎

Now let F be a function from an open set A in a normed linear space V to a normed linear space W. One way to study the behavior of F in the neighborhood of a point α in A is to consider how it behaves on each straight line through α. That is, we study F by temporarily restricting it to a one-dimensional domain. The advantage gained in doing this is that the restricted F is then simply a parametrized arc, and its differential is simply multiplication by its ordinary derivative. For any nonzero ξ ∈ V the straight line through α in the direction ξ has the parametric representation t ↦ α + tξ. The restriction of F to this line is the parametrized arc γ: t ↦ F(α + tξ). Its tangent vector (derivative) at the origin t = 0, if it exists, is called the derivative of F in the direction ξ at α, or the derivative of F with respect to ξ at α, and is designated D_ξF(α). Clearly,

D_ξF(α) = lim_{t→0} (F(α + tξ) − F(α))/t.

Comparing this with our original definition of f′, we see that the tangent vector γ′(x) to a parametrized arc γ is the directional derivative D₁γ(x) with respect to the standard basis vector 1 in R.
Strictly speaking, we are misusing the word "direction", because different vectors can have the same direction. Thus, if η = cξ with c > 0, then η and ξ point in the same direction, but, because D_ξF(α) is linear in ξ (as we shall see in a moment), their associated derivatives are different: D_ηF(α) = cD_ξF(α).

We now want to establish the relationship between directional derivatives, which are vectors, and differentials, which are linear maps. We saw above that for an arc γ differentiability is equivalent to the existence of γ′(x) = D₁γ(x). In the general case the relationship is not as simple as it is for arcs, but in one direction everything goes smoothly.

Theorem 7.2. If F is differentiable at α, and if λ is any smooth arc through α, with α = λ(x), then γ = F ∘ λ is smooth at x, and γ′(x) = dF_α(λ′(x)). In particular, if F is differentiable at α, then every directional derivative D_ξF(α) exists, and D_ξF(α) = dF_α(ξ).

Proof. The smoothness of γ is equivalent to its differentiability at x and therefore follows from the composite-function theorem. Moreover, γ′(x) = dγ_x(1) = (dF_α ∘ dλ_x)(1) = dF_α(dλ_x(1)) = dF_α(λ′(x)). If λ is the parametrized line λ(t) = α + tξ, then it has the constant derivative ξ, and since α = λ(0) here, the above formula becomes γ′(0) = dF_α(ξ). That is, D_ξF(α) = γ′(0) = dF_α(ξ). ∎

It is not true, conversely, that the existence of all the directional derivatives D_ξF(α) of a function F at a point α implies the differentiability of F at α. The easiest counterexample involves the notion of a homogeneous function. We say that a function F: V → W is homogeneous if F(xξ) = xF(ξ) for all x and ξ. For such a function the directional derivative D_ξF(0) exists because the arc γ(t) = F(0 + tξ) = tF(ξ) is linear, and γ′(0) = F(ξ). Thus, all of the directional derivatives of a homogeneous function F exist at 0 and D_ξF(0) = F(ξ). If F is also differentiable at 0, then dF₀(ξ) = D_ξF(0) = F(ξ) and F = dF₀. Thus a differentiable homogeneous function must be linear. Therefore, any nonlinear homogeneous function F will be a function such that D_ξF(0) exists for all ξ but dF₀ does not exist. Taking the simplest possible situation, define F: R² → R by F(x, y) = x³/(x² + y²) if <x, y> ≠ <0, 0> and F(0, 0) = 0. Then F(tx, ty) = tF(x, y), so that F is homogeneous, but F is not linear.

However, if V is finite-dimensional, and if for each ξ in a spanning set of vectors the directional derivative D_ξF(α) exists and is a continuous function of α on an open set A, then F is continuously differentiable on A. The proof of this fact depends on the mean-value theorem, which we take up next, but we shall not complete it until Section 9 (Theorem 9.3).

The reader will remember the mean-value theorem as a cornerstone of the calculus, and this is just as true in our general theory. We shall apply it in the next section to give the proof of the general form of the above-mentioned theorem, and practically all of our more advanced work will depend on it. The ordinary mean-value theorem does not have an exact analogue here. Instead we shall prove a theorem that in the one-variable calculus is an easy consequence of the mean-value theorem.

Theorem 7.3. Let f be a continuous function (parametrized arc) from a closed interval [a, b] to a normed linear space, and suppose that f′(t) exists and that ‖f′(t)‖ ≤ m for all t ∈ (a, b). Then ‖f(b) − f(a)‖ ≤ m(b − a).

Proof. Fix ε > 0, and let A be the set of points x ∈ [a, b] such that

‖f(x) − f(a)‖ ≤ (m + ε)(x − a) + ε.

A includes at least a small interval [a, c], because f is continuous at a. Set l = lub A. Then ‖f(l) − f(a)‖ ≤ (m + ε)(l − a) + ε by the continuity of f at l. Thus l ∈ A, and a < l ≤ b. We claim that l = b. For if …

EXERCISES

…
b) Prove the same result from Theorems 7.1 and 7.2 and the differentiation rules of Section 6, using the exact relation F = θ₁ ∘ f + θ₂ ∘ g.
7.5 In the spirit of the above two exercises, state a product law for derivatives and prove it as in the (b) proofs above.
7.6 Find the tangent vector to the arc <t, sin t> at t = 0; at t = π/2. [Apply Exercise 7.4(a).] What is the differential of the above parametrized arc at these two points? That is, if f(t) = <t, sin t>, what are df₀ and df_{π/2}?
7.7 Let F: R² → R² be the mapping <x, y> ↦ <3x²y, …>. Compute the directional derivative D_{<1,2>}F(3, −1):
a) as the tangent vector at t = 0 to the arc F ∘ λ, where λ is the straight line through <3, −1> in the direction <1, 2>;
b) by first computing dF_{<3,−1>} and then evaluating it at <1, 2>.
7.8 Let l and ω be any two linear functionals on a vector space V. Evaluate the product f(ξ) = l(ξ)ω(ξ) along the line ξ = tα, and hence compute D_αf(0). Now evaluate f along the general line ξ = tα + β, and from it compute D_αf(β).
7.9 Work the above exercise by computing differentials.
7.10 If f: Rⁿ → R is differentiable at α, we know that its differential df_α, being a linear functional on Rⁿ, is given by its skeleton n-tuple l according to the formula df_α(y) = Σ₁ⁿ lᵢyᵢ. In this context we call the n-tuple l the gradient of f at α. Show from the Schwarz inequality (Exercise 2.8) that if we use vectors y of Euclidean length 1, then the directional derivative D_yf(α) is maximum when y points in the direction of the gradient of f.
7.11 Let W be a normed linear space, and let V be the set of parametrized arcs λ: [−1, 1] → W such that λ(0) = 0 and λ′(0) exists. Show that V is a vector space and that λ ↦ λ′(0) is a surjective linear mapping from V to W. Describe in words the elements of the quotient space V/N, where N is the null space of the above map.
7.12 Find another homogeneous nonlinear function. Evaluate its directional derivatives D_ξF(0), and show again that they do not make up a linear map.
7.13 Prove that if F is a differentiable mapping from an open ball B of a normed linear space V to a normed linear space W such that dF_α = 0 for every α in B, then F is a constant function.
7.14 Generalize the above exercise to the case where the domain of F is an open set A with the property that any two points of A can be joined by a smooth arc lying in A. Show by a counterexample that the result does not generalize to arbitrary open sets as the domain of F.
7.15 Prove the following generalization of the mean-value theorem. Let f be a continuous mapping from the closed interval [a, b] to a normed linear space V, and let g be a continuous real-valued function on [a, b]. Suppose that f′(t) and g′(t) both exist at all points of the open interval (a, b) and that ‖f′(t)‖ ≤ g′(t) on (a, b). Then ‖f(b) − f(a)‖ ≤ g(b) − g(a). [Consider the points x such that ‖f(x) − f(a)‖ ≤ g(x) − g(a) + ε(x − a) + ε.]

8. THE DIFFERENTIAL AND PRODUCT SPACES

In this section we shall relate the differentiation rules to the special configurations resulting from the expression of a vector space as a finite Cartesian product. When dealing with the range, this is a trivial consideration, but when the domain is a product space, we become involved with a deeper theorem. These general product considerations will be specialized to the Rⁿ-spaces in the next section, but they also have a more general usefulness, as we shall see in the later sections of this chapter and in later chapters.

We know that an m-tuple of functions on a common domain, Fⁱ: A → Wᵢ, i = 1, …, m, is equivalent to a single m-tuple-valued function F: A → ∏₁ᵐ Wᵢ, F(α) being the m-tuple <F¹(α), …, Fᵐ(α)> for each α ∈ A. We now check the obviously necessary fact that F is differentiable at α if and only if each Fⁱ is differentiable at α.

Theorem 8.1. Given Fⁱ: A → Wᵢ, i = 1, …, m, and F = <F¹, …, Fᵐ>, then F is differentiable at α if and only if all the functions Fⁱ are, in which case dF_α = <dF¹_α, …, dFᵐ_α>.

Proof. Strictly speaking, F = Σ₁ᵐ θᵢ ∘ Fⁱ, where θᵢ is the injection of Wᵢ into the product space W = ∏₁ᵐ Wᵢ (see Section 1.3). Since each θᵢ is linear and hence differentiable, with d(θᵢ)_α = θᵢ, we see that if each Fⁱ is differentiable at α, then so is F, and dF_α = Σ₁ᵐ θᵢ ∘ dFⁱ_α. Less exactly, this is the statement dF_α = <dF¹_α, …, dFᵐ_α>.
The converse follows similarly from Fⁱ = πᵢ ∘ F, where πᵢ is the projection of ∏₁ᵐ Wᵢ onto Wᵢ. ∎

Theorems 7.1 and 8.1 have the following obvious corollary (which can also be proved as easily by a direct inspection of the limits involved).

Lemma 8.1. If fᵢ is an arc from [a, b] to Wᵢ, for i = 1, …, n, and if f is the n-tuple-valued arc f = <f₁, …, fₙ>, then f′(x) exists if and only if fᵢ′(x) exists for each i, in which case f′(x) = <f₁′(x), …, fₙ′(x)>.

When the domain space V is a product space ∏₁ⁿ Vᵢ, the situation is more complicated. A function F(ξ₁, …, ξₙ) of n vector variables does not decompose into an equivalent n-tuple of functions. Moreover, although its differential dF_α does decompose into an equivalent n-tuple of partial differentials dFʲ_α, we do not have the simple theorem that dF_α exists if and only if the partial differentials dFʲ_α all exist. Of course, we regard a function F(ξ₁, …, ξₙ) of n vector variables as being a function of the single n-tuple variable ξ = <ξ₁, …, ξₙ>, so that in principle there is nothing new when we consider the differentiability of F. However, when we consider a composition F ∘ G, the inner function G must now be an n-tuple-valued function G = <g¹, …, gⁿ>, where gⁱ is from an open subset A of some normed linear space X to Vᵢ, and we naturally try to express the differential of F ∘ G in terms of the differentials dgⁱ. To accomplish this we need the partial differentials dFʲ_α of F. For the moment we shall define the jth partial differential of F at α = <α₁, …, αₙ> as the restriction of the differential dF_α to Vⱼ, considered as a subspace of V = ∏₁ⁿ Vᵢ. As usual, this really involves the injection θⱼ of Vⱼ into ∏₁ⁿ Vᵢ, and our formal (temporary) definition, accordingly, is

dFʲ_α = dF_α ∘ θⱼ.

Then, since ξ = <ξ₁, …, ξₙ> = Σᵢ θᵢ(ξᵢ), we have

dF_α(ξ) = Σᵢ dFⁱ_α(ξᵢ).

Similarly, since G = Σᵢ θᵢ ∘ gⁱ, we have

d(F ∘ G)_β = Σᵢ dFⁱ_{G(β)} ∘ dgⁱ_β,

which we shall call the general chain rule. There is ambiguity in the "i"-superscripts in this formula: to be more proper we should write (dFⁱ)_{G(β)} and d(gⁱ)_β.
We shall now work around to the real definition of a partial differential. Since

ΔF_α ∘ θⱼ = (dF_α + o) ∘ θⱼ = dF_α ∘ θⱼ + o = dFʲ_α + o,

we see that dFʲ_α can be directly characterized, independently of dF_α, as follows: dFʲ_α is the unique element Tⱼ of Hom(Vⱼ, W) such that ΔF_α ∘ θⱼ = Tⱼ + o. That is, dFʲ_α is the differential at αⱼ of the function of the one variable ξⱼ obtained by holding the other variables in F(ξ₁, …, ξₙ) fixed at the values ξᵢ = αᵢ. This is important because in practice it is often such partial differentiability that we come upon as the primary phenomenon. We shall therefore take this direct characterization as our definition of dFʲ_α, after which our motivating calculation above is the proof of the following lemma.

Lemma 8.2. If A is an open subset of a product space V = ∏₁ⁿ Vᵢ, and if F: A → W is differentiable at α, then all the partial differentials dFʲ_α exist and dFʲ_α = dF_α ∘ θⱼ.

The question then occurs as to whether the existence of all the partial differentials dFʲ_α implies the existence of dF_α. The answer in general is negative, as we shall see in the next section, but if all the partial differentials dFʲ_α exist for each α in an open set A and are continuous functions of α, then F is continuously differentiable on A. Note that Lemma 8.2 and the projection-injection identities show us what dF_α must be if it exists: dFʲ_α = dF_α ∘ θⱼ and Σ θⱼ ∘ πⱼ = I together imply that dF_α = Σ dFʲ_α ∘ πⱼ.

Theorem 8.2. Let A be an open subset of the normed linear space V = V₁ × V₂, and suppose that F: A → W has continuous partial differentials dF¹_<α,β> and dF²_<α,β> on A. Then dF_<α,β> exists and is continuous on A, and

dF_<α,β>(ξ, η) = dF¹_<α,β>(ξ) + dF²_<α,β>(η).

Proof. We shall use the sum norm on V = V₁ × V₂. Given ε, we choose δ so that ‖dFⁱ_<μ,ν> − dFⁱ_<α,β>‖ < ε for every <μ, ν> in the δ-ball about <α, β> and for i = 1, 2. Setting G(ζ) = F(α + ζ, β + η) − dF¹_<α,β>(ζ), we have

‖dG_ζ‖ = ‖dF¹_<α+ζ,β+η> − dF¹_<α,β>‖ < ε if ‖ζ‖ ≤ ‖ξ‖, when ‖<ξ, η>‖ < δ.

The mean-value theorem (Theorem 7.3), applied along the segment from 0 to ξ, then gives ‖G(ξ) − G(0)‖ ≤ ε‖ξ‖; that is,

‖F(α + ξ, β + η) − F(α, β + η) − dF¹_<α,β>(ξ)‖ ≤ ε‖ξ‖ when ‖<ξ, η>‖ < δ.

Arguing similarly with H(η) = F(α, β + η) − dF²_<α,β>(η), we find that

‖F(α, β + η) − F(α, β) − dF²_<α,β>(η)‖ ≤ ε‖η‖ when ‖<ξ, η>‖ < δ.

Combining the two inequalities, we have

‖ΔF_<α,β>(ξ, η) − T(<ξ, η>)‖ ≤ ε‖<ξ, η>‖ when ‖<ξ, η>‖ < δ,

where T = dF¹_<α,β> ∘ π₁ + dF²_<α,β> ∘ π₂. That is, ΔF_<α,β> − T ∈ o, and so dF_<α,β> exists and equals T. ∎

The theorem for more than two factor spaces is a corollary.

Theorem 8.3. If A is an open subset of ∏₁ⁿ Vᵢ and F: A → W is such that for each j = 1, …, n the partial differential dFʲ_α exists for all α ∈ A and is continuous as a function of α = <α₁, …, αₙ>, then dF_α exists and is continuous on A. If ξ = <ξ₁, …, ξₙ>, then dF_α(ξ) = Σⱼ dFʲ_α(ξⱼ).

Proof. The existence and continuity of dF¹ and dF² imply by the theorem that dF¹_α ∘ π₁ + dF²_α ∘ π₂ is the differential of F considered as a function of the first two variables when the others are held fixed. Since it is the sum of continuous functions, it is itself continuous in α, and we can now apply the theorem again to add dF³_α to this sum partial differential, concluding that Σ₁³ dFʲ_α ∘ πⱼ is the partial differential of F on the factor space V₁ × V₂ × V₃, and so on (which is colloquial for induction). ∎

As an illustration of the use of these two theorems, we shall deduce the general product rule (although a direct proof based on Δ-estimates is perfectly feasible). A general product is simply a bounded bilinear mapping ω: X × Y → W, where X, Y, and W are all normed linear spaces. The boundedness inequality here is

‖ω(ξ, η)‖ ≤ b‖ξ‖ ‖η‖.

We first show that ω is differentiable.

Lemma 8.3. A bounded bilinear mapping ω: X × Y → W is everywhere differentiable, and dω_<α,β>(ξ, η) = ω(α, η) + ω(ξ, β).

Proof. With β held fixed, φ_β(ξ) = ω(ξ, β) is in Hom(X, W) and therefore is everywhere differentiable and equal to its own differential. That is, dω¹ exists and dω¹_<α,β>(ξ) = ω(ξ, β). Since β ↦ φ_β is a bounded linear mapping, dω¹_<α,β> = φ_β is a continuous function of <α, β>. Similarly, dω²_<α,β>(η) = ω(α, η), and dω² is continuous. The lemma is now a direct corollary of Theorem 8.2. ∎

If ω(ξ, η) is thought of as a product of ξ and η, then the product of two functions g(ζ) and h(ζ) is ω(g(ζ), h(ζ)), where g is from an open subset A of a normed linear space V to X and h is from A to Y. The product rule is now just what would be expected: the differential of the product is the first times the differential of the second plus the second times the differential of the first.

Theorem 8.4. If g: A → X and h: A → Y are differentiable at ζ, then so is the product F(ζ) = ω(g(ζ), h(ζ)), and

dF_ζ(ξ) = ω(g(ζ), dh_ζ(ξ)) + ω(dg_ζ(ξ), h(ζ)).

Proof. This is a direct corollary of Theorem 8.1, Lemma 8.3, and the chain rule. ∎

EXERCISES

8.1 Find the tangent vector to the arc … at t = 0; at t = π/2. What is the differential of the above parametrized arc at the two given points? That is, if f(t) = …, what are df₀ and df_{π/2}?
8.2 Give the detailed proof of Lemma 8.1.
8.3 The formula dF_α(ξ) = Σᵢ dFⁱ_α(ξᵢ) is probably obvious in view of the identity ξ = Σ θᵢ(ξᵢ) and the definition of partial differential, but write out an explicit, detailed proof anyway.
8.4 Let F be a differentiable mapping from an n-dimensional vector space V to a finite-dimensional vector space W, and define G: V × W → W by G(ξ, η) = η − F(ξ). Thus the graph of F in V × W is the null set of G. Show that the null space of dG_<ξ,η> has dimension n for every <ξ, η> ∈ V × W.
8.5 Let F(ξ, η) be a continuously differentiable function defined on a product A × B, where B is a ball and A is an open set. Suppose that dF²_<α,β> = 0 for all <α, β> in A × B. Prove that F is independent of η. That is, show that there is a continuously differentiable function G(ξ) defined on A such that F(ξ, η) = G(ξ) on A × B.
8.6 By considering a domain in R² as indicated at the right, show that there exists a function f(x, y) on an open set A in R² such that ∂f/∂y = 0
everywhere and such that f(,») isnot a function of «alone 8.7 Let F(&,2,{) be any function of three vector variables, and for fixed set G(&,n) = F(E, 9,7). Prove that the partial differential d?<.,a,.> exists if and only if dG' exists, in which case they are equal. 8.8 Give a more careful proof of Theorem 8.3. That is, state the induetive hypothesis and show thatthe theorem fllows from it and ‘Theorem 8.2. TF you are metielous i Your argument, you will needa form of the above exercise. 8.9. Let fbe a differentiable mapping from R? to R. Regarding R? as RX R, show that the two partial dferentils of f are simply multiplication by its partial derivatives Generalize ton dimensions. Show thatthe above ix still rue fora map F from R? toa general vector space V, the partial derivatives now being vectors, 8.10. Give the details ofthe prof of Theorem 8.4 9. THE DIFFERENTIAL AND R” ‘We shall now apply the results of the last two seetions to mappings involving the Cartesian spaces R*, the bread and butter spaces of finite-dimensional theory. We start with the domain, ‘Theorem 9.1. If F is a mapping from (an open subset of) R* to a normed linear space W, then the directional derivative of F in the direetion of the jth standard basis vector is just the partial derivative 3F/@xj, and the jth partial differential is multiplication by aF/azj: dP4(h) = h(OF/az;)(a). ‘More exactly, if any one of the above three objects exists at a, then they all do, with the above relationships. 39 ‘THE DIFFERENTIAL AND RY 157 Proof. We have ar F(a, aj +t... ) = Fa, az, = Ti _ lim Fat i) — Fla) DuF (a). Moreover, since the restriction of F to a+ Ré! is a parametrized are whose differential at 0 is by definition the jth partial differential of F at a and whose tangent veetor at 0 we have just computed to be (9F'/@z,)(a), the remainder of the theorem follows from Theorem 7.1. 0 Combining this theorem and Theorem 7.2, we obtain the following result. ‘Theorem 9.2. 
If V = Rⁿ and F is differentiable at a, then the partial derivatives (∂F/∂x_j)(a) all exist, and the n-tuple of partial derivatives at a, {(∂F/∂x_j)(a)}₁ⁿ, is the skeleton of dF_a. In particular,

    dF_a(y) = Σ_{j=1}ⁿ y_j (∂F/∂x_j)(a).

Proof. Since dF_a(δʲ) = D_{δʲ}F(a) = (∂F/∂x_j)(a), as we noted above, we have

    dF_a(y) = dF_a(Σ_j y_j δʲ) = Σ_j y_j dF_a(δʲ) = Σ_j y_j (∂F/∂x_j)(a). □

All that we have done here is to display dF_a as the linear combination mapping defined by its skeleton {dF_a(δʲ)} (see Theorem 1.2 of Chapter 1), where dF_a(δʲ) is now recognized as the partial derivative (∂F/∂x_j)(a).

The above formula shows the barbarism of the classical notation for partial derivatives: note how it comes out if we try to evaluate dF_a(x). The notation ∂F/∂x_j is precise but cumbersome. Other notations are F_j and D_jF. Each has its problems, but the second probably minimizes the difficulties. Using it, our formula reads dF_a(y) = Σ_{j=1}ⁿ y_j D_jF(a).

In the opposite direction we have the corresponding specialization of Theorem 8.3.

Theorem 9.3. If A is an open subset of Rⁿ, and if F is a mapping from A to a normed linear space W such that all of the partial derivatives (∂F/∂x_j)(a) exist and are continuous on A, then F is continuously differentiable on A.

Proof. Since the jth partial differential of F is simply multiplication by ∂F/∂x_j, we are (by Theorem 9.1) assuming the existence and continuity of all the partial differentials dFʲ_a on A. Theorem 9.3 thus becomes a special case of Theorem 8.3. □

Now suppose that the range space of F is also a Cartesian space, so that F is a mapping from an open subset A of Rⁿ to Rᵐ. Then dF_a is in Hom(Rⁿ, Rᵐ). For computational purposes we want to represent linear maps from Rⁿ to Rᵐ by their matrices, and it is therefore of the utmost importance to find the matrix t of the differential T = dF_a. This matrix is called the Jacobian matrix of F at a. The columns of t form the skeleton of dF_a, and we saw above that this skeleton is the n-tuple of partial derivatives (∂F/∂x_j)(a).
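The formula dF_a(y) = Σ_j y_j D_jF(a) is easy to check numerically with difference quotients. A minimal sketch in Python (the sample function, base point, direction, and step size are illustrative choices of ours, not from the text):

```python
f = lambda x1, x2, x3: x1*x2 + x3**2   # a sample F: R^3 -> R
a = (1.0, 2.0, 3.0)                    # base point
y = (0.5, -1.0, 2.0)                   # direction vector
h = 1e-6

def partial(j):
    # central-difference approximation to (dF/dx_j)(a)
    p, m = list(a), list(a)
    p[j] += h
    m[j] -= h
    return (f(*p) - f(*m)) / (2 * h)

# difference quotient for the directional derivative D_y F(a)
directional = (f(a[0] + h*y[0], a[1] + h*y[1], a[2] + h*y[2]) - f(*a)) / h
# the linear combination of partials predicted by Theorem 9.2
combo = sum(y[j] * partial(j) for j in range(3))
print(abs(directional - combo) < 1e-3)   # True
```

Here combo works out to y_1·a_2 + y_2·a_1 + 2a_3·y_3 = 12, matching the directional derivative up to discretization error.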
If we write the m-tuple-valued F loosely as an m-tuple of functions, F = <f¹, …, fᵐ>, then according to Lemma 8.1, the jth column of t is the m-tuple

    (∂F/∂x_j)(a) = <(∂f¹/∂x_j)(a), …, (∂fᵐ/∂x_j)(a)>.

Thus:

Theorem 9.4. Let F be a mapping from an open subset of Rⁿ to Rᵐ, and suppose that F is differentiable at a. Then the matrix of dF_a (the Jacobian matrix of F at a) is given by

    t_ij = (∂fⁱ/∂x_j)(a).

If we use the notation y_i = fⁱ(x), we have t_ij = (∂y_i/∂x_j)(a).

If we also have a differentiable map z = G(y) = <g¹(y), …, gᵏ(y)> from an open set B ⊂ Rᵐ into Rᵏ, then dG_b has, similarly, the matrix

    s_ki = (∂gᵏ/∂y_i)(b) = (∂z_k/∂y_i)(b).

Also, if B contains b = F(a), then the composite-function rule d(G ∘ F)_a = dG_b ∘ dF_a has the matrix form

    (∂z_k/∂x_j)(a) = Σ_{i=1}ᵐ (∂z_k/∂y_i)(b) (∂y_i/∂x_j)(a),

or simply

    ∂z_k/∂x_j = Σ_{i=1}ᵐ (∂z_k/∂y_i)(∂y_i/∂x_j).

This is the usual form of the chain rule in the calculus. We see that it is merely the expression of the composition of linear maps as matrix multiplication.

We saw in Section 8 that the ordinary derivative f′(a) of a function f of one variable is the skeleton of the differential df_a, and it is perfectly reasonable to generalize this relationship and define the derivative F′(a) of a function F of n real variables to be the skeleton of dF_a, so that F′(a) is the n-tuple of partial derivatives {(∂F/∂x_j)(a)}₁ⁿ, as we saw above. In particular, if F is from an open subset of Rⁿ to Rᵐ, then F′(a) is the Jacobian matrix of F at a. This gives the matrix chain rule the standard form

    (G ∘ F)′(a) = G′(F(a)) F′(a).

Some authors use the word ‘derivative’ for what we have called the differential, but this is a change from the traditional meaning in the one-variable case, and we prefer to maintain the distinction as discussed above: the differential dF_a is the linear map approximating ΔF_a, and the derivative F′(a) must be the matrix of this linear map when the domain and range spaces are Cartesian. However, we shall stay with the language of Jacobian matrices.
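The identification of the chain rule with matrix multiplication can be tested numerically: the Jacobian matrix of G ∘ F at a should equal the product of the Jacobian matrices of G at F(a) and of F at a. A sketch with numpy (the particular maps F and G below are illustrative choices of ours, not from the text):

```python
import numpy as np

def jacobian(F, a, h=1e-6):
    """Numerical Jacobian of F at a by central differences; columns are partials."""
    a = np.asarray(a, dtype=float)
    cols = []
    for j in range(len(a)):
        e = np.zeros_like(a)
        e[j] = h
        cols.append((np.asarray(F(a + e)) - np.asarray(F(a - e))) / (2 * h))
    return np.column_stack(cols)

F = lambda x: np.array([x[0]**2 - x[1], x[0]*x[1]])        # R^2 -> R^2
G = lambda y: np.array([y[0] + y[1], y[0]*y[1], y[1]**2])  # R^2 -> R^3

a = np.array([1.0, 2.0])
lhs = jacobian(lambda x: G(F(x)), a)        # Jacobian of G o F at a
rhs = jacobian(G, F(a)) @ jacobian(F, a)    # product of the two Jacobians
print(np.allclose(lhs, rhs, atol=1e-4))     # True
```

The agreement is exactly Theorem 9.4's matrix form of d(G ∘ F)_a = dG_b ∘ dF_a.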
Suppose now that A is an open subset of a finite-dimensional veetor space V and that H: A — W is differentiable at « € A. Suppose that W is also finite- dimensional and that ¢: V — R* and y: 17 —> R™ are any coordinate isomor- phisms. If A = ¢{4], then 2 is an open subset of R" and H = yeH © ¢isa mapping from 2 to R™ which is differentiable at a= g(a), with a, ¥edH.e¢~', Then dH, is given by its Jacobian matrix {(9h*/32,)(a)}, which wwe now call the Jacobian matriz of H with respect to the chosen bases in V and W. Change of bases in V and W changes the Jacobian matrix according to the rale given in Section 2.4. If F is a mapping from R® to itself, then the determinant of the Jacobian matrix (af'/8z;)(a) is ealled the Jacobian of F at a. It is designated af SV), (Ea, +++ ta) if it is understood that ys = f'(x). Another notation is J p(a) (or simply J(a) if P is understood). However, this is sometimes used to indicate the differential F,, and we shall write det J p(a) instead Ti F(x) = <2? — 24, 2eyz2>, then its Jacobian matrix is fe a). and det J(x) = 4¢ei + 23) = A(x)”. EXERCISES: 9.1 By analogy with the notion of a parametrized arc, we define a smooth param- ctrized two-dimensional surface in a normed linear space W’ to be # continuously differentiable map T' from a rectangle 1X J in R® to W. Suppose that 1x [=1, 1] X [-1, 1], and invent a definition of the tangent space to the range of at the point PO, 0). Show that the two vectors r x FOO snd 50,0) fare a basis for this tangent space. (This should not have been your definition.) 160 THE DiFFEKENTIAL cancuLUS 39 9.2 Generalize the above exercise to a smooth parametrized n-dimensional surface in a normed linear space W 9.8 Compute the Jacobian matrix of the mapping <2, y> + <2%, 2, (2+ y)?> Show that its rank is two except at the origin. 9A Let F = from BR? to R® be defined by Awa =sby ts fleyy = Pb, $a(x, y,2) = 24+ y+ 23 Compute the Jacobian of F at . 
Show that itis nonsingular unless two of the three coordinates are equal. Describe the locus ofits singularities, 9.5 Compute the Jacobian of the mapping F: <2, y> > <(r-+y)*,y°> from RP to Rat <1, —1>;at <1,0>;at . Compute the Jacobian of @: <3, > 9.6 In the above exercise compute the compositions F » Gand Ge F. Compute the Jacobian of Fe Gat . Compute the corresponding produet of the Jacobians of F and . 9.7 Compute the Jacobian matrix and determinant of the mapping T defined by = reos@, y = rsin8, 2 = 2. Composing a function f(z, y,2) with this mapping gives a new function: and 916, 8,2) = flr cos 8, sin 8,2) ‘That is, g = f2 7. This composition (substitution) is called the change to cylindrical, coordinates in B. 9.8 Compute the Jacobian determinant of the polar coordinate transformation <1,0> m9 , where x = rcos6,y = rsin8, 9.9 The transformation to spherical coordinates is given by x = rsin g cos 8, 8,2 = rcos#. Compute the Jacobian Aen) | a, 48) 9.10 Write out the chain rule for the following special eases: du/dt = 2, where w= Fiz,y), r= 9, y= hid, Find duo/dt when w = Flas, ...,24) and x; = git), i = 1,...)n, Find do/@u when w = F(z, y), 2 = o(u,1), = A(u, 9). The special case where g(u,#) = 0 ean be rewritten a gg PG Aa, »). Compute it G.I Tw = fie, y), x = re0s6, and y = rsinO, show that 3.10 ELEMENTARY APPLICATIONS — 161 10, ELEMENTARY APPLICATIONS ‘The elementary max-min theory from the standard calculus generalizes with little change, and we inelude a brief discussion of it at this point. ‘Theorem 10.1. Let F be a real-valued function defined on an open subset A of a normed linear space V, and suppose that F assumes a relative maximum value at a point a in A where dF, exists. Then dF, = 0. Proof. By definition DyP(a) is the derivative 7'(0) of the function ¥(0) F(a + 12), and the domain of 7 isa neighborhood of 0 in B. Since 7 has a relative maximum value at 0, we have 1"(0) = 0 by the elementary calculus. ‘Thus GF() = DeF(@) = 0 for all &, and s0 dF. 
= 0. 0 A point « such that df = 0 is called a critical point. ‘The theorem states that a differentiable real-valued function ean have an interior extremal value only at a critical point. If V is R®, then the above argument shows that a real-valued funetion F can have a relative maximum (or minimum) at a only if the partial derivatives (@F/az,)(a) are all zero, and, as in the elementary ealeulus, this often provides, way of calculating maximum (or minimum) values. Suppose, for example, ‘that wewant to show that the eube is the most efficient rectangular parallelepiped from the point of view of minimizing surface area for a given volume V. If the edges are z, y and 2, we have V = zyz and A = 2(zy + 22+ y2) = ey + V/y-+V/z). Then from 0 = aA/az = 2y — V/2*), we see that V = ys", and, similarly, 24/dy = 0 implies that V = zy’. Therefore, yz? -zy?, and since neither z nor y ean be 0, it follows that x = y. Then V = ye’ z',and z= V"' = y, Finally, substituting in V = 2yz shows that 2— V", Our eritieal configuration is thus a cube, with minimum area A = GV Tt was assumed above that A has an absolute minimum at some point 0), then z is relative maxi- ‘mum (minimum) point for f. We shall prove the corresponding general theorem in Section 16. ‘There are more possibilities now; among them we have the analogous sufficient condition that if di’, = 0 and d?P, is negative (positive) Alefinite as a quadratic form on V, then a is a relative maximum (minimum) point of F. ‘We consider next the notion of a tangent plane to @ graph. The calculation ‘of tangent lines to curves and tangent planes to surfaces is ordinarily considered 1 geometric application of the derivative, and we take this as sufficient justifica- tion for considering the general question here. 162 THE pivrEnENTIAL caLcunus 3.10 Let F be a mapping from an open subset A of a normed linear space V to ‘a normed linear space W. 
When we view F as a graph in V × W, we think of it as a “surface” S lying “over” the domain A, generalizing the geometric interpretation of the graph of a real-valued function of two real variables in R³ = R² × R. The projection π₁: V × W → V projects S “down” onto A, and <ξ, F(ξ)> is the point of S lying “over” ξ. Our geometric imagery views V as the plane (subspace) V × {0} in V × W, just as we customarily visualize R² as the plane R² × {0} in R³.

We now assume that F is differentiable at α. Our preliminary discussion in Section 6 suggested that (the graph of) the linear function dF_α is the tangent plane to (the graph of) the function ΔF_α in V × W, and that its translate M through <α, F(α)> is the tangent plane at <α, F(α)> to the surface S that is (the graph of) F. The equation of this plane is η − F(α) = dF_α(ξ − α), and it is accordingly (the graph of) the affine function G(ξ) = dF_α(ξ − α) + F(α). Now we know that dF_α is the unique T in Hom(V, W) such that ΔF_α(ξ) = T(ξ) + o(ξ), and if we set ξ = ζ − α it is easy to see that this is the same as saying that G is the unique affine map from V to W such that

    F(ζ) − G(ζ) = o(ζ − α).

That is, M is the unique plane over V that “fits” the surface S around <α, F(α)> in the sense of ε-approximation. However, there is one further geometric fact that greatly strengthens our feeling that this really is the tangent plane.

Theorem 10.2. The plane with equation η − F(α) = dF_α(ξ − α) is exactly the union of all the straight lines through <α, F(α)> in V × W that are tangent to smooth curves on the surface S = graph F passing through this point. In other words, the vectors in the subspace dF_α of V × W are exactly the tangent vectors to curves lying in S and passing through <α, F(α)>.

Proof. This is nearly trivial. If <ξ, η> ∈ dF_α, then the arc t ↦ <α + tξ, F(α + tξ)> in S, lying over the line t ↦ α + tξ in V, has <ξ, dF_α(ξ)> = <ξ, η> as its tangent vector at <α, F(α)>, by Lemma 8.1 and Theorem 8.2. Conversely, if t ↦ <λ(t), F(λ(t))> is any smooth arc in S passing through <α, F(α)>, with α = λ(t₀), then its tangent vector at t₀ is a vector in (the graph of) dF_α. □

As an example of the general tangent plane discussed above, let F be the map from R² to R² defined by f₁(x) = (x₁² − x₂²)/2, f₂(x) = x₁x₂. The graph of F is a surface over R² in R⁴ = R² × R². According to our above discussion, the tangent plane at <a, F(a)> has the equation

    y = dF_a(x − a) + F(a).

At a = <1, 2> the Jacobian matrix of dF_a is

    [ 1  −2 ]
    [ 2   1 ]

and F(a) = <−3/2, 2>. The equation of the tangent plane M at <1, 2, −3/2, 2> is thus obtained by computing the matrix product, and we have the scalar equations

    y₁ = x₁ − 2x₂ + (−1 + 4 − 3/2) = x₁ − 2x₂ + 3/2,
    y₂ = 2x₁ + x₂ + (−2 − 2 + 2) = 2x₁ + x₂ − 2.

Note that these two equations present the affine subspace M as the intersection of the hyperplane in R⁴ consisting of all <x₁, x₂, y₁, y₂> such that y₁ − x₁ + 2x₂ = 3/2 with the hyperplane having the equation 2x₁ + x₂ − y₂ = 2.

EXERCISES

10.1 Find the maximum value of f(x, y, z) = x + y + z on the ellipsoid x² + 2y² + 3z² = 1.

10.2 Find the maximum value of the linear functional f(x) = Σ₁ⁿ c_i x_i on the unit sphere Σ₁ⁿ x_i² = 1.

10.3 Find the (minimum) distance between the two lines x = t<…> + <…> and y = s<1, 1, 1> + <1, 0, −1>.

10.4 Show that there is a uniquely determined pair of closest points on the two lines x = ta + l and y = sb + m in R³ unless b = ka for some k. We assume that a ≠ 0 ≠ b. Remember that if b is not of the form ka, then |(a, b)| < ‖a‖ ‖b‖, according to the Schwarz inequality.

10.5 Show that the origin is the only critical point of f(x, y, z) = xy + yz + zx. Find a line through the origin along which 0 is a maximum point for f, and find another line along which 0 is a minimum point.

10.6 In the problem of minimizing the area of a rectangular parallelepiped of given volume V worked out in the text, it was assumed that

    A(x, y) = 2(xy + V/x + V/y)

has an absolute minimum at an interior point of the open first quadrant. Prove this. Show first that A(x, y) → ∞ if <x, y> approaches the boundary of the quadrant in any way, including going to infinity.

10.7 Let F: R² → R² be the mapping defined by

    y₁ = sin(x₁ + x₂),  y₂ = cos(x₁ − x₂).
Find the equation of the tangent plane in R⁴ to the graph of F over the point a = <…>.

10.8 Define F: R³ → R² by y₁ = …, y₂ = …. Find the equation of the tangent plane to the graph of F in R⁵ over a = <1, 2, −1>.

10.9 Let ω(ξ, η) be a bounded bilinear mapping from a product normed linear space V × W to a normed linear space X. Show that the equation of the tangent plane to the graph S of ω in V × W × X at the point <α, β, ω(α, β)> ∈ S is

    ζ = ω(ξ, β) + ω(α, η) − ω(α, β).

10.10 Let F be a bounded linear functional on the normed linear space V. Show that the equation of the tangent plane to the graph of F² in V × R over the point α can be written in the form y = F(α)(2F(ξ) − F(α)).

10.11 Show that if the general equation for a tangent plane given in the text is applied to a mapping F in Hom(V, W), then it reduces to the equation for F itself, no matter where the point of tangency. (Naturally!)

10.12 Continuing Exercise 9.1, show that the tangent space to the range of Γ in W at Γ(0, 0) is the projection on W of the tangent space to the graph of Γ in R² × W at the point <0, Γ(0, 0)>. Now define the tangent plane to the range of Γ in W at Γ(0, 0), and show that it is similarly the projection of the tangent plane to the graph of Γ.

10.13 Let F: V → W be differentiable at α. Show that the range of dF_α is the projection on W of the tangent space to the graph of F in V × W at the point <α, F(α)>.

11. THE IMPLICIT-FUNCTION THEOREM

The formula for the Jacobian of a composite map that we obtained in Section 9 is reminiscent of the chain rule for the differential of a composite map that we derived earlier (Section 8). The Jacobian formula involves numbers (partial derivatives) that we multiply and add; the differential chain rule involves linear maps (partial differentials) that we compose and add. (The similarity becomes a full formal analogy if we use block decompositions.) Roughly speaking, in the one-variable calculus the differential is a linear map from the one-dimensional space R to itself, and is therefore multiplication by a number, the derivative. In the many-variable calculus, when we decompose with respect to one-dimensional subspaces, we get blocks of such numbers, i.e., Jacobian matrices. When we generalize the whole theory to vector spaces that are not one-dimensional, we get essentially the same formulas, but with numbers replaced by linear maps (differentials) and multiplication by composition. Thus the derivative of an inverse function is the reciprocal of the derivative of the function: if g = f⁻¹ and b = f(a), then g′(b) = 1/f′(a). The differential of an inverse map is the composition inverse of the differential of the map: if G = F⁻¹ and F(α) = β, then dG_β = (dF_α)⁻¹.

If the equation g(x, y) = 0 defines y implicitly as a function of x, y = f(x), we learn to compute f′(a) in the elementary calculus by differentiating g(x, f(x)) = 0, and we get

    (∂g/∂x)(a, b) + (∂g/∂y)(a, b) f′(a) = 0,

where b = f(a). Hence

    f′(a) = −(∂g/∂x)(a, b) / (∂g/∂y)(a, b).

We shall see below that if G(ξ, η) = 0 defines η as a function of ξ, η = F(ξ), and if β = F(α), then we calculate the differential dF_α by differentiating the identity G(ξ, F(ξ)) = 0, and we get a formula formally identical to the above. Finally, in exactly the same way, the so-called auxiliary-variable method of solving max-min problems in the elementary calculus has the same formal structure as our later solution of a “constrained” maximum problem by Lagrange multipliers.

In this section we shall consider the existence and differentiability of functions implicitly defined. Suppose that we are given a (vector-valued) function G(ξ, η) of two vector variables, and we want to know whether setting G equal to 0 defines η as a function of ξ; that is, whether there exists a unique function F such that G(ξ, F(ξ)) is identically zero. Supposing that such an “implicitly defined” function F exists and that everything is differentiable, we can try to compute the differential of F at α by differentiating the equation G(ξ, F(ξ)) = 0, or G ∘ <I, F> = 0. We get dG¹_<α,β> + dG²_<α,β> ∘ dF_α = 0, where we have set β = F(α). If dG²_<α,β> is invertible, we can solve for dF_α, getting

    dF_α = −(dG²_<α,β>)⁻¹ ∘ dG¹_<α,β>.

Note that this has the same form as the corresponding expression from the elementary calculus that we reviewed above. If F is uniquely determined, then so is dF_α, and the above calculation therefore strongly suggests that we are going to need the invertibility of dG²_<α,β> as a necessary condition for the existence of a uniquely defined implicit function around the point <α, β>. Since β is F(α), we also need G(α, β) = 0. These considerations will lead us to the right theorem, but we shall have to postpone part of its proof to the next chapter. What we can prove here is that if there is an implicitly defined function, then it must be differentiable.

Theorem 11.1. Let V, W, and X be normed linear spaces, and let G be a mapping from an open subset A × B of V × W to X. Suppose that F is a continuous mapping from A to B implicitly defined by the equation G(ξ, η) = 0, that is, satisfying G(ξ, F(ξ)) = 0 on A. Finally, suppose that G is differentiable at <α, β>, where β = F(α), and that dG²_<α,β> is invertible. Then F is differentiable at α and

    dF_α = −(dG²_<α,β>)⁻¹ ∘ dG¹_<α,β>.

Proof. Set η = ΔF_α(ξ), so that G(α + ξ, β + η) = G(α + ξ, F(α + ξ)) = 0. Then

    0 = G(α + ξ, β + η) − G(α, β) = ΔG_<α,β>(<ξ, η>)
      = dG¹_<α,β>(ξ) + dG²_<α,β>(η) + o(<ξ, η>).

Applying T⁻¹ to this equation, where T = dG²_<α,β>, and solving for η, we get

    η = −T⁻¹(dG¹_<α,β>(ξ)) + o(<ξ, η>).

This equation is of the form η = O(ξ) + o(<ξ, η>), and since η = ΔF_α(ξ) is an infinitesimal in ξ by the continuity of F at α, Lemmas 5.1 and 5.2 imply first that η = O(ξ) and then that <ξ, η> = O(ξ). Thus o(<ξ, η>) = o(O(ξ)) = o(ξ), and we have

    ΔF_α(ξ) = η = S(ξ) + o(ξ),

where S = −(dG²_<α,β>)⁻¹ ∘ dG¹_<α,β>, an element of Hom(V, W). Therefore, F is differentiable at α and dF_α has the asserted value. □

We shall show in the next chapter, as an application of the fixed-point theorem, that if V, W, and X are finite-dimensional, and if G is a continuously differentiable mapping from an open subset A × B of V × W to X such that at the point <α, β> we have both G(α, β) = 0 and dG²_<α,β> invertible, then there is a uniquely determined continuous mapping F from a neighborhood M of α to B such that F(α) = β and G(ξ, F(ξ)) = 0 on M. The same theorem is true for the more general class of complete normed linear spaces which we shall study in the next chapter. For these spaces it is also true that if T⁻¹ exists, then so does S⁻¹ for all S sufficiently close to T, and the mapping S ↦ S⁻¹ is continuous. Therefore dG²_<ξ,η> is invertible for all <ξ, η> sufficiently close to <α, β>, and the above theorem then implies that F is differentiable on a neighborhood of α. Moreover, only continuous mappings are involved in the formula given by the theorem for ξ ↦ dF_ξ, and it follows that F is in fact continuously differentiable near α. These conclusions constitute the implicit-function theorem, which we now restate.

Theorem 11.2. Let V, W, and X be finite-dimensional (or, more generally, complete) normed linear spaces, let A × B be an open subset of V × W, and let G: A × B → X be continuously differentiable. Suppose that at the point <α, β> in A × B we have both G(α, β) = 0 and dG²_<α,β> invertible. Then there is a ball M about α and a uniquely defined continuously differentiable mapping F from M to B such that F(α) = β and G(ξ, F(ξ)) = 0 on M.

The so-called inverse-mapping theorem is a special case of the implicit-function theorem.

Theorem 11.3. Let H be a continuously differentiable mapping from an open subset B of a finite-dimensional (or complete) normed linear space W to a normed linear space V, and suppose that its differential is invertible at a point β. Then H itself is invertible near β.
That is, there is a ball M about α = H(β) and a uniquely determined continuously differentiable function F from M to B such that F(α) = β and H(F(ξ)) = ξ on M.

Proof. Set G(ξ, η) = ξ − H(η). Then G is continuously differentiable from V × B to V, and dG²_<α,β> = −dH_β is invertible. The implicit-function theorem then gives us a ball M about α and a uniquely determined continuously differentiable mapping F from M to B such that F(α) = β and 0 = G(ξ, F(ξ)) = ξ − H(F(ξ)) on M. □

The inverse-mapping theorem is often given a slightly different formulation, which we state as a corollary.

Corollary. Under the hypotheses of the above theorem there exists an open neighborhood U of β such that H is injective on U, N = H[U] is open in V, and H⁻¹ is continuously differentiable on N.

Proof. The proof of the corollary is left as an exercise.

In practice we often have to apply the Cartesian formulations of these theorems. The student should certainly be able to write these down, but we shall state them anyway, starting with the simpler inverse-mapping theorem.

Theorem 11.4. Suppose that we are given n continuously differentiable real-valued functions G_i(y_1, …, y_n), i = 1, …, n, of n real variables defined on a neighborhood B of a point b in Rⁿ, and suppose that the Jacobian determinant

    [∂(G_1, …, G_n)/∂(y_1, …, y_n)](b)

is not zero. Then there is a ball M about a = G(b) in Rⁿ and a uniquely determined n-tuple F = <F_1, …, F_n> of continuously differentiable real-valued functions defined on M such that F(a) = b and G_i(F(x)) = x_i on M for i = 1, …, n. That is,

    G_i(F_1(x_1, …, x_n), …, F_n(x_1, …, x_n)) = x_i

for all x in M and for i = 1, …, n.

For example, if x₁ = y₁³ + y₂³ and x₂ = y₁² + y₂², we have

    ∂(x₁, x₂)/∂(y₁, y₂) = det [ 3y₁²  3y₂² ; 2y₁  2y₂ ] = 6y₁y₂(y₁ − y₂),

and we therefore know without trying to solve explicitly that there is a unique solution for y in terms of x near x = <1³ + 2³, 1² + 2²> = <9, 5>. The reader would find it virtually impossible to solve for y, since he would quickly discover that he had to solve a polynomial equation of degree 6.
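Although solving for y symbolically is impractical, the locally defined inverse can be approximated numerically. A sketch using Newton's method on the example map above (the seed point and step count are our own choices):

```python
def G(y1, y2):
    # the example map: x1 = y1^3 + y2^3, x2 = y1^2 + y2^2
    return (y1**3 + y2**3, y1**2 + y2**2)

def invert_G(x1, x2, y1, y2, steps=30):
    """Newton's method for G(y) = x, seeded at a nearby point (y1, y2)."""
    for _ in range(steps):
        g1, g2 = G(y1, y2)
        r1, r2 = g1 - x1, g2 - x2        # residual G(y) - x
        a, b = 3*y1**2, 3*y2**2          # first row of the Jacobian of G
        c, d = 2*y1, 2*y2                # second row
        det = a*d - b*c                  # = 6*y1*y2*(y1 - y2), nonzero near <1, 2>
        # subtract J^{-1} applied to the residual
        y1 -= ( d*r1 - b*r2) / det
        y2 -= (-c*r1 + a*r2) / det
    return y1, y2

y1, y2 = invert_G(9.0, 5.0, 1.1, 1.9)    # recovers y near <1, 2> with G(y) = <9, 5>
```

Starting near <1, 2>, the iteration converges to the branch of the inverse guaranteed by Theorem 11.4; note that <2, 1> also maps to <9, 5>, so the theorem's conclusion is genuinely local.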
This clearly shows the power of the theorem: we are guaranteed the existence of a mapping which may be very difficult if not impossible to find explicitly. (However, in the next chapter we shall discover an iterative procedure for approximating the inverse mapping as closely as we want.)

Everything we have said here applies all the more to the implicit-function theorem, which we now state in Cartesian form.

Theorem 11.5. Suppose that we are given m continuously differentiable real-valued functions

    G_i(x, y) = G_i(x_1, …, x_n, y_1, …, y_m)

of n + m real variables defined on an open subset A × B of R^{n+m}, and an (n + m)-tuple <a, b> = <a_1, …, a_n, b_1, …, b_m> such that G_i(a, b) = 0 for i = 1, …, m, and such that the Jacobian determinant

    [∂(G_1, …, G_m)/∂(y_1, …, y_m)](a, b)

is not zero. Then there is a ball M about a in Rⁿ and a uniquely determined m-tuple F = <F_1, …, F_m> of continuously differentiable real-valued functions F_i(x) = F_i(x_1, …, x_n) defined on M such that b = F(a) and G_i(x, F(x)) = 0 on M for i = 1, …, m. That is, b_i = F_i(a_1, …, a_n) for i = 1, …, m, and

    G_i(x_1, …, x_n, F_1(x_1, …, x_n), …, F_m(x_1, …, x_n)) = 0

for all x in M and for i = 1, …, m.

For example, the equations

    x₁² + x₂² − y₁² − y₂² = 0,    x₁³ − x₂³ − y₁³ − y₂³ = 0

can be solved uniquely for y in terms of x near <a, b> = <1, 1, 1, −1>, because they hold at that point and because

    ∂(G₁, G₂)/∂(y₁, y₂) = det [ −2y₁  −2y₂ ; −3y₁²  −3y₂² ] = 6(y₁y₂² − y₂y₁²)

has the value 12 there. Of course, we mean only that the solution functions exist, not that we can explicitly produce them.

EXERCISES

11.1 Show that the mapping <x, y> ↦ <…> is locally invertible about any point, and compute the Jacobian matrix of the inverse map.

11.2 Show that the mapping <x, y> ↦ <…> is locally invertible about any point in R², by computing the Jacobian matrix. In this case the whole mapping is invertible, with an easily computed inverse. Make this calculation, compute the Jacobian matrix of the inverse map, and verify that the two matrices are inverses at the appropriate points.

11.3 Show that the mapping <x, y, z> ↦ <…> from R³ to R³ is locally invertible about <0, π/2, 0>. Show that <x, y, z> ↦ <…> is locally invertible about <π/4, −π/4, 0>.

11.4 Express the second map of the above exercise as the composition of two maps, and obtain your answer a second way.

11.5 Let F: <x, y> ↦ <u, v> be the mapping from R² to R² defined by u = x² + y², v = 2xy. Compute an inverse G of F, being careful to give the domain and range of G. How many inverse mappings are there? Compute the Jacobian matrices of F at <1, 2> and of G at <5, 4>, and show by multiplying them that they are inverses.

11.6 Consider now the mapping F: <x, y> ↦ <x³, y³>. Show that dF_<0,0> is singular and yet that the mapping has an inverse G. What conclusion do we draw about the differentiability of G at the origin?

11.7 Define F: R² → R² by <x, y> ↦ <…>. Prove that F is locally invertible about every point.

11.8 Define F: R³ → R³ by x ↦ y, where y₁ = …, y₂ = …, y₃ = …. Prove that y = F(x) is locally invertible about x = <0, 0, 1>.

11.9 For a function f: R → R, the proof of local invertibility around a point a where df_a is nonsingular is much simpler than in the general case. Show first that the Jacobian matrix of f at a is the number f′(a). We are therefore assuming that f′(x) is continuous in a neighborhood of a and that f′(a) ≠ 0. Prove that then f is strictly increasing (or decreasing) in an interval about a. Now finish the theorem. (See Exercise 1.12.)

11.10
Show that the equations … have differentiable solutions for two of the variables in terms of the others around the point <0, −1, 1, 0>.

11.11 Show that the given equations can be uniquely solved for u and v in terms of x and y around the point <0, 0, 0, 0>.

11.12 Let S be the graph of the given equation in R³. Determine whether in the neighborhood of <0, 1, 1> S is the graph of a differentiable function in any of the following forms:

    z = f(x, y),  x = g(y, z),  y = h(x, z).

11.13 Given functions f and g from R³ to R such that f(a, b, c) = 0 and g(a, b, c) = 0, write down the condition on the partial derivatives of f and g that guarantees the existence of a unique pair of differentiable functions y = h(x) and z = k(x) satisfying h(a) = b, k(a) = c, and

    f(x, h(x), k(x)) = 0,  g(x, h(x), k(x)) = 0

around <a, b, c>.

11.14 Let G(ξ, η, ζ) be a continuously differentiable mapping from V = ∏₁³ V_i to W such that dG³_a: V₃ → W is invertible and G(a) = G(a₁, a₂, a₃) = 0. Prove that there exists a uniquely determined function ζ = F(ξ, η) defined around <a₁, a₂> in V₁ × V₂ such that G(ξ, η, F(ξ, η)) = 0 and F(a₁, a₂) = a₃. Also show that

    dFⁱ_<ξ,η> = −(dG³_<ξ,η,ζ>)⁻¹ ∘ dGⁱ_<ξ,η,ζ>,  i = 1, 2,

where ζ = F(ξ, η).

11.15 Let F(ξ, η) be a continuously differentiable function from V × W to X, and suppose that dF²_<α,β> is invertible. Setting γ = F(α, β), show that there is a product neighborhood L × M × N of <γ, α, β> in X × V × W and a unique continuously differentiable mapping G: L × M → N such that on L × M,

    F(ξ, G(ζ, ξ)) = ζ.

11.16 Suppose that the equation g(x, y, z) = 0 can be solved for z in terms of x and y. This means that there is a function f(x, y) such that g(x, y, f(x, y)) = 0. Suppose also that everything is differentiable, and compute ∂z/∂x.

11.17 Suppose that the equations g(x, y, z) = 0 and h(x, y, z) = 0 can be solved for y and z as functions of x. Compute dy/dx.

11.18 Suppose that g(x, y, u, v) = 0 and h(x, y, u, v) = 0 can be solved for u and v as functions of x and y. Compute ∂u/∂x.

11.19 Compute dz/dx where x³ + y³ + z³ = 0 and x² + y² + z² = 1.

11.20 If … = 0 and … = 1 (two equations in the four variables x, y, z, w), then ∂z/∂x is ambiguous. We are obviously going to think of two of the variables as functions of the other two. Also, z is going to be dependent and x independent. But is y or w going to be the other independent variable? Compute ∂z/∂x under each of these assumptions.

11.21 We are given four “physical variables” p, v, t, and q such that each of them is a function of any two of the other three. Show that ∂t/∂p has two quite different meanings, and make explicit what the relationship between them is by labeling the various functions that are relevant and applying the implicit-differentiation process.

11.22 Again the “one-dimensional” case is substantially simpler. Let G be a continuously differentiable mapping from R² to R such that G(a, b) = 0 and

    (∂G/∂y)(a, b) = G₂(a, b) > 0.

Show that there are positive numbers ε and δ such that for each c in (a − δ, a + δ) the function g(y) = G(c, y) is strictly increasing on [b − ε, b + ε] and G(c, b − ε) < 0 < G(c, b + ε). Conclude from the intermediate-value theorem (Exercise 1.13) that there exists a unique function F: (a − δ, a + δ) → (b − ε, b + ε) such that G(x, F(x)) = 0.

11.23 By applying the same argument used in the above exercise a second time, prove that F is continuous.

11.24 In the inverse-function theorem show that dF_α = (dH_β)⁻¹. That is, the differential of the inverse of H is the inverse of the differential of H. Show this (a) by applying the implicit-function theorem; (b) by a direct calculation from the identity H(F(ξ)) = ξ.

11.25 Again in the context of the inverse-mapping theorem, show that there is a neighborhood N of β such that F(H(η)) = η on N. (Don't work at this. Just apply the theorem again.)

11.26 We continue in the context of the inverse-mapping theorem. Assume the result (from the next chapter) that if dH_β⁻¹ exists, then so does dH_ξ⁻¹ for ξ sufficiently close to β. Show that there is an open neighborhood U of β in B such that H is injective on U, H[U] is an open set N in V, and H⁻¹ is continuously differentiable on N.
11.27  Use Exercise 3.21 to give a direct proof of the existence of a Lipschitz-continuous local inverse in the context of the inverse-mapping theorem. (Hint: Apply Theorem 7.4.)

11.28  A direct proof of the differentiability of an inverse function is simpler than the implicit-function theorem proof. Work out such a proof, modeling your arguments in a general way upon those in Theorem 11.1.

11.29  Prove that the implicit-function theorem can be deduced from the inverse-function theorem as follows. Set H(ξ, η) = ⟨ξ, G(ξ, η)⟩, and show that dH⁻¹ exists from the block diagram results of Chapter 1. Apply the inverse-mapping theorem.

12. SUBMANIFOLDS AND LAGRANGE MULTIPLIERS

If V and W are finite-dimensional spaces, with dimensions n and m, respectively, and if F is a continuous mapping from an open subset A of V to W, then (the graph of) F is a subset of V × W which we visualize as a kind of "n-dimensional surface" S spread out over A. (See Section 10.) We shall call F an n-dimensional patch in V × W. More generally, if X is any (n + m)-dimensional vector space, we shall call a subset S an n-dimensional patch if there is an isomorphism φ from X to a product space V × W such that V is n-dimensional and φ[S] is a patch in V × W. That is, S becomes a patch in the above sense when X is considered to be V × W. This means that if π₁ is the projection of X = V × W onto V, then π₁[S] is an open subset A of V, and the restriction π₁ ↾ S is one-to-one and has a continuous inverse. If π₂ is the projection on W, then F = π₂ ∘ (π₁ ↾ S)⁻¹ is the map from A to W whose graph in V × W is S (when V × W is identified with X).

Now there are important surfaces that aren't such "patch" surfaces. Consider, for instance, the surface of the unit ball in R³, S = {x : Σ₁³ xᵢ² = 1}. S is obviously a two-dimensional surface in R³ which cannot be expressed as a graph, no matter how we try to express R³ as a direct sum. However, it should be equally clear that S is the union of overlapping surface patches. If α is any point on S, then any sufficiently small neighborhood N of α in R³ will intersect S in a patch; we take V as the subspace parallel to the tangent plane at α and W as the perpendicular line through 0. Moreover, this property of S is a completely adequate definition of what we mean by a submanifold.

A subset S of an (n + m)-dimensional vector space X is an n-dimensional submanifold of X if each α in S has a neighborhood N in X whose intersection with S is an n-dimensional patch.

We say that S is smooth if all these patches S_α are smooth, that is, if the function F : A → W whose graph in V × W is the patch S_α (when X is viewed as V × W) is continuously differentiable for every such patch S_α. The sphere we considered above is a two-dimensional smooth submanifold of R³.

Submanifolds are frequently presented as zero sets of mappings. For example, our sphere above is the zero set of the mapping G from R³ to R defined by G(x) = Σ₁³ xᵢ² − 1. It is obviously important to have a condition guaranteeing that such a null set is a submanifold.

Theorem 12.1.  Let G be a continuously differentiable mapping from an open subset U of an (n + m)-dimensional vector space X to an m-dimensional vector space Y such that dG_α is surjective for every α in the zero set S of G. Then S is an n-dimensional submanifold of X.

Proof.  Choose any point α of S. Since dG_α is surjective from the (n + m)-dimensional vector space X to the m-dimensional vector space Y, we know that the null space V of dG_α has dimension n (Theorem 2.4, Chapter 2). Let W be any complement of V, and think of X as V × W, so that G now becomes a function of two vector variables and α is a point ⟨α₁, α₂⟩ such that G(α₁, α₂) = 0.
The restriction of dG_{⟨α₁,α₂⟩} to W is an isomorphism from W to Y; that is, (dG²_{⟨α₁,α₂⟩})⁻¹ exists. Therefore, by the implicit-function theorem, there is a product neighborhood S_δ(α₁) × S_ε(α₂) of α in X whose intersection with S is the graph of a function on S_δ(α₁). This proves our theorem.  □

If S is a smooth submanifold, then the function F whose graph is the patch of S around α (when X is viewed suitably as V × W) is continuously differentiable, and therefore S has a uniquely determined n-dimensional tangent plane M at α that fits S most closely around α in the sense of our 𝔬-approximations. If α = 0, this tangent plane is an n-dimensional subspace, and in general it is the translate through α of a subspace V. We call V the tangent space of S at α; its elements are exactly the vectors in X tangent to parametrized arcs drawn in S through α.

What we are going to do later is to describe an n-dimensional manifold S independently of any imbedding of S in a vector space. The tangent space to S at a point α will still be an invaluable notion, but we are not going to be able to visualize it by an actual tangent plane in a space X carrying S. Instead, we will have to construct the vector space tangent to S at α somehow. The clue is provided by Theorem 10.2, which tells us that if S is imbedded as a submanifold in a vector space X, then each vector tangent to S at α can be presented as the unique tangent vector at α to some smooth curve lying in S. This mapping from the set of smooth curves in S through α to the tangent space at α is not injective; clearly, different curves can be tangent to each other at α and so have the same tangent vector there. Therefore, the object in S that corresponds to a tangent vector at α is an equivalence class of smooth curves through α, and this will in fact be our definition of a tangent vector for a general manifold.

The notion of a submanifold allows us to consider in an elegant way the classical "constrained" maximum problem.
We are given an open subset U of a finite-dimensional vector space X, a differentiable real-valued function F defined on U, and a submanifold S lying in U. We shall suppose that the submanifold S is the zero set of a continuously differentiable mapping G from U to a vector space Y such that dG_γ is surjective for each γ on S. We wish to consider the problem of maximizing (or minimizing) F(γ) when γ is "constrained" to lie on S. We cannot expect to find such a maximum point γ₀ by setting dF_γ = 0 and solving for γ, because γ₀ will not in general be a critical point for F. Consider, for example, the function g(x) = Σ₁³ xᵢ² − 1 from R³ to R and F(x) = x₂. Here the "surface" defined by g = 0 is the unit sphere Σ xᵢ² = 1, and on this sphere F has its maximum value 1 at ⟨0, 1, 0⟩. But F is linear, and so dF_γ = F can never be the zero transformation. The device known as Lagrange multipliers shows that we can nevertheless find such constrained critical points by setting equal to zero the differential of a suitable modified function.

Theorem 12.2.  Suppose that F has a maximum value on S at the point α. Then there is a functional l in Y* such that α is a critical point of the function F − (l ∘ G).

Proof.  By the implicit-function theorem we can express X as V × W in such a way that the neighborhood of S around α is the graph of a mapping H from an open set A in V to W. Thus, expressing F and G as functions on V × W, we have G(ξ, η) = 0 near α = ⟨α₁, α₂⟩ if and only if η = H(ξ), and the restriction of F(ξ, η) to this zero surface is thus the function K : A → R defined by K(ξ) = F(ξ, H(ξ)). By assumption α₁ is a critical point for this function. Thus

0 = dK_{α₁} = dF¹_{⟨α₁,α₂⟩} + dF²_{⟨α₁,α₂⟩} ∘ dH_{α₁}.

Also, from the identity G(ξ, H(ξ)) = 0, we get

0 = dG¹_{⟨α₁,α₂⟩} + dG²_{⟨α₁,α₂⟩} ∘ dH_{α₁}.

Since dG²_{⟨α₁,α₂⟩} is invertible, we can solve the second equation for dH_{α₁} and substitute in the first, thus getting, dropping the subscripts for simplicity,

dF¹ − dF² ∘ (dG²)⁻¹ ∘ dG¹ = 0.

Let l ∈ Y* be the functional dF² ∘ (dG²)⁻¹.
Then we have dF¹ = l ∘ dG¹ and, by definition, dF² = l ∘ dG². Composing the first equation (on the right) with π₁ : V × W → V and the second with π₂, and adding, we get dF_{⟨α₁,α₂⟩} = l ∘ dG_{⟨α₁,α₂⟩}. That is, d(F − l ∘ G)_α = 0.  □

Nothing we have said so far explains the phrase "Lagrange multipliers". This comes out of the Cartesian expression of the theorem, where we have U an open subset of a Cartesian space Rⁿ, Y = Rᵐ, G = ⟨g¹, …, gᵐ⟩, and l in Y* of the form l(y) = Σ₁ᵐ cᵢyᵢ. Then F − l ∘ G = F − Σ₁ᵐ cᵢgⁱ, and d(F − l ∘ G)_a = 0 becomes

∂F/∂x_j = Σ_{i=1}^m cᵢ ∂gⁱ/∂x_j,   j = 1, …, n.

These n equations together with the m equations gⁱ = 0 give m + n equations in the m + n unknowns x₁, …, xₙ, c₁, …, cₘ.

Our original trivial example will show how this works out in practice. We want to maximize F(x) = x₂ from R³ to R subject to the constraint Σ₁³ xᵢ² = 1. Here g(x) = Σ₁³ xᵢ² − 1 is also from R³ to R, and our method tells us to look for a critical point of F − cg subject to g = 0. Our system of equations is

0 − 2cx₁ = 0,
1 − 2cx₂ = 0,
0 − 2cx₃ = 0,
x₁² + x₂² + x₃² = 1.

The first says that c = 0 or x₁ = 0, and the second implies that c cannot be 0. Therefore x₁ = x₃ = 0, and the fourth equation then shows that x₂ = ±1.

Another example is our problem of minimizing the surface area A = 2(xy + yz + zx) of a rectangular parallelepiped, subject to the constraint of a constant volume, xyz = V. The theorem says that the minimum point will be a critical point of A − λV for some λ, and, setting the differential of this function equal to zero, we get the equations

2(y + z) − λyz = 0,
2(x + z) − λxz = 0,
2(x + y) − λxy = 0,

together with the constraint xyz = V. The first three equations imply that x = y = z; the last then gives V^{1/3} as the common value.

13. FUNCTIONAL DEPENDENCE

The question, roughly, is this: if we are given a collection of continuous functions, all defined on some open set A, how can we tell whether or not some of them are functions of the rest?
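A concrete instance of an affirmative answer (our illustration, not the text's): take A = R and

```latex
f_1(t) = \cos t, \qquad f_2(t) = \sin t, \qquad f_3(t) = 3 + \cos 2t .
```

Since \(\cos 2t = \cos^2 t - \sin^2 t\), we have \(f_3 = g(f_1, f_2)\) with \(g(x, y) = 3 + x^2 - y^2\); thus \(f_3\) is a function of the other two, and the range of \(t \mapsto \langle f_1(t), f_2(t), f_3(t)\rangle\) lies on the graph of \(g\) in \(\mathbf{R}^3\).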
For example, if we are given three real-valued continuous functions f₁, f₂, and f₃, how can we tell whether or not some one of them is a function of the other two — say f₃ is a function of f₁ and f₂, which means that there is a function of two variables g(x, y) such that f₃(t) = g(f₁(t), f₂(t)) for all t in the common domain A? If this happens, we say that f₃ is functionally dependent on f₁ and f₂. This is very nearly the same as asking when it will be the case that the range S of the mapping F : t ↦ ⟨f₁(t), f₂(t), f₃(t)⟩ is a two-dimensional submanifold of R³. However, there are differences in these questions that are worth noting.

If f₃ is functionally dependent on f₁ and f₂, then the range of F certainly lies on a two-dimensional submanifold of R³, namely, the graph of g. But this is no guarantee that it itself forms a two-dimensional submanifold. For example, both f₂ and f₃ might be functionally dependent on f₁, f₂ = g ∘ f₁ and f₃ = h ∘ f₁, in which case the range of F lies on a curve in R³, which is a one-dimensional submanifold. In the opposite direction, the range of F can be a two-dimensional submanifold M without f₃ being functionally dependent on f₁ and f₂. All we can conclude in this case is that locally one of the functions {fᵢ}₁³ is a function of the other two, since locally M is a surface patch, in the language of the last section. But if we move a little bit away on the curving surface M to the neighborhood of another point, we may have to solve for a different one of the functions. Nevertheless, if M = range F is a subset of a two-dimensional manifold, it is reasonable to say that the functions {fᵢ}₁³ are functionally dependent, and we are led to examine this more natural notion.

If we assume that F = ⟨f₁, f₂, f₃⟩ is continuously differentiable and that the rank of dF_α is 3 at some point α in A, then the implicit-function theorem implies that F[A] includes a whole ball in R³ about the point F(α).
Thus a necessary condition for M = range F to lie on a two-dimensional submanifold of R³ is that the rank of dF_α be everywhere less than 3. We shall see, in fact, that if the rank of dF_α is 2 for all α, then M = range F is essentially a two-dimensional manifold. (There is still a tiny difficulty that we shall explain later.) Our tools are going to be the implicit-function theorem and the following theorem, which could well have come much earlier, that the rank of dF_γ is a "lower semicontinuous" function of γ.

Theorem 13.1.  Let V and W be finite-dimensional vector spaces, normed in some way. Then for any T in Hom(V, W) there is an ε such that

‖S − T‖ < ε  ⇒  rank S ≥ rank T.

Proof.  Let T have null space N and range R, and let X be any complement of N in V. Then the restriction of T to X is an isomorphism to R, and hence is bounded below by some positive m. (Its inverse from R to X is bounded by some b, by Theorem 4.2, and we set m = 1/b.) Then if ‖S − T‖ < m/2, it follows that S is bounded below on X by m/2, for the inequalities

‖T(ξ)‖ ≥ m‖ξ‖  and  ‖(S − T)(ξ)‖ ≤ (m/2)‖ξ‖

together imply that ‖S(ξ)‖ ≥ (m/2)‖ξ‖. In particular, S is injective on X, and so

rank S = dim (range S) ≥ d(X) = d(R) = rank T.  □

We can now prove the general local theorem.

Theorem 13.2.  Let V and W be finite-dimensional spaces, let r be an integer less than the dimension of W, and let F be a continuously differentiable map from an open subset A ⊂ V to W such that the rank of dF_γ is r for all γ in A. Then each point γ in A has a neighborhood U such that F[U] is an r-dimensional patch submanifold of W.

Proof.  For a fixed α in A let V₁ and Y be the null space and range of dF_α, let V₂ be a complement of V₁ in V, and view V as V₁ × V₂. Then F becomes a function F(ξ, η) of two variables, and if α = ⟨α₁, α₂⟩, then dF²_{⟨α₁,α₂⟩} is an isomorphism from V₂ to Y. At this point we can already choose the decomposition W = W₁ ⊕ W₂ with respect to which F[A] is going to be a graph (locally). We simply choose any direct sum decomposition W = W₁ ⊕ W₂ such that W₂ is a complement of Y = range dF_{⟨α₁,α₂⟩}. Thus W₁ might be Y, but it doesn't have to be. Let P be the projection of W onto W₁ along W₂. Since Y is a complement of the null space of P, we know that P ↾ Y is an isomorphism from Y to W₁. In particular, W₁ is r-dimensional, and

rank P ∘ dF_{⟨α₁,α₂⟩} = r.

Moreover, and this is crucial, P is an isomorphism from the range of dF_{⟨ξ,η⟩} to W₁ for all ⟨ξ, η⟩ sufficiently close to ⟨α₁, α₂⟩. For the above rank theorem implies that

rank P ∘ dF_{⟨ξ,η⟩} ≥ rank P ∘ dF_{⟨α₁,α₂⟩} = r.

On the other hand, since rank dF_{⟨ξ,η⟩} = r by hypothesis, the range of P ∘ dF_{⟨ξ,η⟩} has dimension at most r, and we see that P is an isomorphism on the range of any such dF_{⟨ξ,η⟩}.

Now define H : W₁ × A → W₁ as the mapping H(ζ, ξ, η) = P(F(ξ, η)) − ζ. Its partial differential dH³ = P ∘ dF²_{⟨α₁,α₂⟩} is an isomorphism from V₂ to W₁. Therefore, by the implicit-function theorem there exists a neighborhood L × M × N of ⟨ζ̄, α₁, α₂⟩, where ζ̄ = P(F(α₁, α₂)), and a uniquely determined continuously differentiable mapping G from L × M to N such that

H(ζ, ξ, G(ζ, ξ)) = 0  on L × M.  That is,  ζ = P ∘ F(ξ, G(ζ, ξ))  on L × M.

The remainder of our argument consists in showing that F(ξ, G(ζ, ξ)) is a function of ζ alone. We start by differentiating the above equation with respect to ξ, getting

0 = P ∘ (dF¹ + dF² ∘ dG²) = P ∘ dF ∘ ⟨1, dG²⟩.

As noted above, P is an isomorphism on the range of dF_{⟨ξ,η⟩} for all ⟨ξ, η⟩ sufficiently close to ⟨α₁, α₂⟩, and if we suppose that L × M × N is also taken small enough so that this holds, then the above equation implies that

dF_{⟨ξ,η⟩} ∘ ⟨1, dG²⟩ = 0

for all ⟨ζ, ξ⟩ ∈ L × M. But this is just the statement that the partial differential with respect to ξ of F(ξ, G(ζ, ξ)) is identically 0, and hence that F(ξ, G(ζ, ξ)) is a continuously differentiable function K of ζ alone:

F(ξ, G(ζ, ξ)) = K(ζ).

Since η = G(ζ, ξ) and ζ = P ∘ F(ξ, η), we thus have

F(ξ, η) = K(P ∘ F(ξ, η)),  that is,  F = K ∘ P ∘ F,

and this holds on the open set U consisting of those points ⟨ξ, η⟩ in M × N such that P ∘ F(ξ, η) ∈ L.
If we think of W as W₁ × W₂, then F and K are ordered pairs of functions, F = ⟨F¹, F²⟩ and K = ⟨K¹, K²⟩, P is the mapping ⟨ω₁, ω₂⟩ ↦ ω₁, and the second component of the above equation is

F² = k ∘ F¹,

where k = K². Since F¹[U] = P ∘ F[U] = L, the above equation says that F[U] is the graph of the mapping k from L to W₂. Moreover, L is an open subset of the r-dimensional vector space W₁, and therefore F[U] is an r-dimensional patch manifold in W = W₁ × W₂.  □

The above theorem includes the answer to our original question about functional dependence.

Corollary.  Let F = {fⁱ}₁ᵐ be an m-tuple of continuously differentiable real-valued functions defined on an open subset A of a normed linear space V, and suppose that the rank of dF_γ has the constant value r on A, where r is less than m. Then any point α in A has a neighborhood U over which m − r of the functions are functionally dependent on the remaining r.

Proof.  By hypothesis the range Y of dF_α = ⟨df¹_α, …, dfᵐ_α⟩ is an r-dimensional subspace of Rᵐ. We can therefore find a basis for a complementary subspace W₂ by choosing m − r of the standard basis elements {δⁱ}, and we may as well renumber the functions fⁱ so that these are δ^{r+1}, …, δᵐ. Then the projection P of Rᵐ onto Rʳ = L(δ¹, …, δʳ) is an isomorphism from Y to Rʳ (since Y is a complement of its null space), and by the theorem there is a neighborhood U of α over which (I − P) ∘ F is a function k of P ∘ F. But this says exactly that ⟨f^{r+1}, …, fᵐ⟩ = k ∘ ⟨f¹, …, fʳ⟩. That is, k is an (m − r)-tuple-valued function, k = ⟨k^{r+1}, …, kᵐ⟩, and fʲ = kʲ ∘ ⟨f¹, …, fʳ⟩ for j = r + 1, …, m.  □

Fig. 3.12

We mentioned earlier in the section that there was a difficulty in concluding that if F is a continuously differentiable map from an open subset A of V to W whose differential has constant rank r less than d(W), then S = range F is an r-dimensional submanifold of W. The flaw can be described as follows. The definition of a submanifold S of X required that each point of S have a neighborhood in X whose intersection with S is a patch.
In the case before us, what we can conclude is that if β is a point of S, then β = F(α) for some α in A, and α has a neighborhood whose image under F is a patch. But this image may not be a full neighborhood of β in S, because S may curve back on itself in such a way as to intrude into every neighborhood of β. Consider, for example, the one-dimensional Γ imbedded in R³ suggested by Fig. 3.12. The curve begins in the xz-plane along the x-axis, curves over, and when it comes to the xy-plane it starts spiraling in to the origin in the xy-plane (the point of changeover from the xz-plane to the xy-plane is a singularity that we could smooth out). The origin is not a point having a neighborhood in R³ whose intersection with Γ is a one-patch, but the full curve is the image of (−1, 1) under a continuously differentiable injection. We would consider Γ to be a one-dimensional manifold without any difficulty, but something has gone wrong with its imbedding in R³, so it is not a one-dimensional submanifold of R³.

14. UNIFORM CONTINUITY AND FUNCTION-VALUED MAPPINGS

In the next chapter we shall see that a continuous function F whose domain is a bounded closed subset of a finite-dimensional vector space V is necessarily uniformly continuous. This means that given ε, there is a δ such that

‖ξ − η‖ < δ  ⇒  ‖F(ξ) − F(η)‖ < ε

for all ξ and η in the domain of F.

Theorem 14.1.  Let F(ξ, η) be a bounded uniformly continuous mapping from a product set M × N ⊂ V × W to X. Then the mapping η ↦ F(·, η) is continuous — in fact, uniformly continuous — from N to Y = 𝔅(M, X).

Proof.  Given ε, choose δ so that

‖⟨ξ, η⟩ − ⟨ξ, ζ⟩‖ = ‖η − ζ‖ < δ  ⇒  ‖F(ξ, η) − F(ξ, ζ)‖ < ε.

Then ‖η − ζ‖ < δ implies ‖F(·, η) − F(·, ζ)‖_∞ ≤ ε.  □

We have proved that if a function of two variables is uniformly continuous, then the mappings obtained from it by the general duality principle are continuous. This phenomenon lies behind many well-known facts. For example:

Corollary.  If F(x, y) is a uniformly continuous real-valued function on the unit square [0, 1] × [0, 1] in R², then ∫₀¹ F(x, y) dx is a continuous function of y.

Proof.  The mapping y ↦ ∫₀¹ F(x, y) dx is the composition of the bounded linear mapping f ↦ ∫₀¹ f from C([0, 1]) to R with the continuous mapping y ↦ F(·, y) from [0, 1] to C([0, 1]), and is continuous as the composition of continuous mappings.  □

We consider next the differentiability of the above duality-induced mapping.

Theorem 14.2.  If F is a bounded continuous mapping from an open product set M × N of a normed linear space V × W to a normed linear space X, and if dF²_{⟨ξ,η⟩} exists and is a bounded uniformly continuous function of ⟨ξ, η⟩ on M × N, then φ : η ↦ F(·, η) is a differentiable mapping from N to Y = 𝔅𝒞(M, X), and

[dφ_η(μ)](ξ) = dF²_{⟨ξ,η⟩}(μ).

Proof.  Given ε, we choose δ by the uniform continuity of dF², so that

‖η − ω‖ < δ  ⇒  ‖dF²_{⟨ξ,η⟩} − dF²_{⟨ξ,ω⟩}‖ < ε  for all ξ ∈ M.

The corollary to Theorem 7.4 then implies that

‖μ‖ < δ  ⇒  ‖ΔF²_{⟨ξ,β⟩}(μ) − dF²_{⟨ξ,β⟩}(μ)‖ ≤ ε‖μ‖

for all ξ ∈ M, all β ∈ N, and all μ such that the line segment from β to β + μ is in N. We fix β and rewrite the left-hand side of the above inequality. This is the heart of the proof. First,

ΔF²_{⟨ξ,β⟩}(μ) = F(ξ, β + μ) − F(ξ, β) = (φ(β + μ) − φ(β))(ξ) = (Δφ_β(μ))(ξ).

Next we can check that if ‖dF²_{⟨ξ,β⟩}‖ ≤ b on M × N, then the mapping T defined by the formula [T(μ)](ξ) = dF²_{⟨ξ,β⟩}(μ) is an element of Hom(W, Y) of norm at most b. We leave the detailed verification of this as an exercise for the reader. The last displayed inequality now takes the form

‖μ‖ < δ  ⇒  ‖[Δφ_β(μ)](ξ) − [T(μ)](ξ)‖ ≤ ε‖μ‖,

and hence

‖μ‖ < δ  ⇒  ‖Δφ_β(μ) − T(μ)‖_∞ ≤ ε‖μ‖.

This says exactly that the mapping φ is differentiable at β and dφ_β = T.  □

The mapping φ is in fact continuously differentiable, as can be seen by arguing a little further in the above manner. The situation is very close to being an application of Theorem 14.1.

The classical theorem on differentiability under the integral sign is a corollary of the above theorem. We give a simple case.
Note that if η is a real variable y, then the above formula for dφ can be rewritten in terms of partial derivatives:

[dφ_y(μ)](ξ) = μ · (∂F/∂y)(ξ, y).

Corollary.  If F(x, y) is a continuous real-valued function on the unit square [0, 1] × [0, 1], and if ∂F/∂y exists and is a uniformly continuous function on the square, then ∫₀¹ F(x, y) dx is a differentiable function of y and its derivative is ∫₀¹ (∂F/∂y)(x, y) dx.

Proof.  The mapping T : y ↦ ∫₀¹ F(x, y) dx is the composition of the bounded linear mapping f ↦ ∫₀¹ f(x) dx from C([0, 1]) to R with the differentiable mapping φ : y ↦ F(·, y) from [0, 1] to C([0, 1]), and is therefore differentiable by the composite-function rule. Then Theorem 7.2 and the fact that the differential of a bounded linear map is the map itself give

(d/dy) ∫₀¹ F(x, y) dx = ∫₀¹ (∂F/∂y)(x, y) dx.  □

We come now to the situation of most importance to us, where a point-to-point map generates a function-to-function map by composition. Let A be an open set in a normed linear space V, let S be an arbitrary set, and let 𝔄 be the set of bounded maps from S to A. Then 𝔄 is a subset of the normed linear space 𝔅(S, V) of all bounded functions from S to V under the uniform norm. A function f ∈ 𝔄 will be an interior point of 𝔄 if and only if the distance from the range of f to the boundary of A is a positive number δ, for this is clearly equivalent to saying that 𝔄 includes a ball in 𝔅(S, V) about the point f. Now let g be any bounded mapping from A to a normed linear space W, and let G : 𝔄 → 𝔅(S, W) be composition by g. That is, h = G(f) if and only if f ∈ 𝔄 and h = g ∘ f. We can consider both the continuity and the differentiability of G, but we shall only work out the differentiability theorem.

Theorem 14.3.  Let the function g : A → W be differentiable at each point α in A, and let dg_α be a bounded uniformly continuous function of α.
Then the mapping G : 𝔄 → 𝔅(S, W) defined by G(f) = g ∘ f is differentiable at any interior point f of 𝔄, and dG_f : 𝔅(S, V) → 𝔅(S, W) is defined by

[dG_f(h)](s) = dg_{f(s)}(h(s))  for all s ∈ S.

Proof.  Given ε, choose δ by the uniform continuity of dg so that

‖α − β‖ < δ  ⇒  ‖dg_α − dg_β‖ < ε,

and then apply the corollary to Theorem 7.4 once more to conclude that

‖ξ‖ < δ  ⇒  ‖Δg_α(ξ) − dg_α(ξ)‖ ≤ ε‖ξ‖,

provided the line segment from α to α + ξ is in A. Now choose any fixed interior point f in 𝔄, and choose δ′ ≤ δ so that B_{δ′}(f) ⊂ 𝔄. Then for any h in 𝔅(S, V),

‖h‖_∞ < δ′  ⇒  ‖Δg_{f(s)}(h(s)) − dg_{f(s)}(h(s))‖ ≤ ε‖h‖_∞  for all s ∈ S.

Define a map T : 𝔅(S, V) → 𝔅(S, W) by [T(h)](s) = dg_{f(s)}(h(s)). Then the above displayed inequality can be rewritten as

‖h‖_∞ < δ′  ⇒  ‖ΔG_f(h) − T(h)‖_∞ ≤ ε‖h‖_∞.

That is, ΔG_f = T + 𝔬. We will therefore be done when we have shown that T ∈ Hom(𝔅(S, V), 𝔅(S, W)). First, we have

(T(h₁ + h₂))(s) = dg_{f(s)}((h₁ + h₂)(s)) = dg_{f(s)}(h₁(s) + h₂(s)) = dg_{f(s)}(h₁(s)) + dg_{f(s)}(h₂(s)) = (T(h₁))(s) + (T(h₂))(s).

Thus T(h₁ + h₂) = T(h₁) + T(h₂), and homogeneity follows similarly. Second, if b is a bound for ‖dg_α‖ on A, then

‖T(h)‖_∞ = lub {‖(T(h))(s)‖ : s ∈ S} ≤ lub {‖dg_{f(s)}‖ · ‖h(s)‖ : s ∈ S} ≤ b‖h‖_∞.

Therefore ‖T‖ ≤ b, and T is bounded.  □

If [G(f)](t) = g(f₁(t), f₂(t), t), then our rules about partial differentials give us the formula

[dG_f(h)](t) = dg¹(h₁(t)) + dg²(h₂(t)),

the partial differentials being taken at ⟨f₁(t), f₂(t), t⟩.

15. THE CALCULUS OF VARIATIONS

The problems of the calculus of variations are simply critical-point problems of a certain type, with a characteristic twist in the way the condition dF_α = 0 is used. We shall illustrate the subject by proving one of its standard theorems. Since we want to solve a constrained maximum problem in which the domain is an infinite-dimensional vector space, a systematic discussion would start off with a more general form of the Lagrange multiplier theorem.
However, for our purpose it is sufficient to note that if S is a closed plane M + α, then the restriction of F to S is equivalent to a new function on the vector space M, and its differential at ζ = η + α in S is clearly just the restriction of dF_ζ to M. The requirement that β be a critical point for the constrained function is therefore simply the requirement that dF_β vanish on M.

Let F be a uniformly continuous differentiable real-valued function of three variables defined on (an open subset of) W × W × R, where W is a normed linear space. Given a closed interval [a, b] ⊂ R, let V be the normed linear space C¹([a, b], W) of smooth arcs f : [a, b] → W, with ‖f‖ taken as ‖f‖_∞ + ‖f′‖_∞. The problem is to maximize the (nonlinear) functional G(f) = ∫ₐᵇ F(f(t), f′(t), t) dt, subject to the restraints f(a) = α and f(b) = β. That is, we consider only smooth arcs in W with fixed endpoints α and β, and we want to find that arc from α to β which maximizes (or minimizes) the integral.

Now we can show that G is a continuously differentiable function from (an open subset of) V to R. The easiest way to do this is to let X be the space C([a, b], W) of continuous arcs under the uniform norm, and to consider first the more general functional K from X × X to R defined by K(f, g) = ∫ₐᵇ F(f(t), g(t), t) dt. By Theorem 14.3 the integrand map ⟨f, g⟩ ↦ F(f(·), g(·), ·) is differentiable from X × X to C[a, b], and its differential at ⟨f, g⟩ evaluated at ⟨h, k⟩ is the function

dF¹_{⟨f(t),g(t),t⟩}(h(t)) + dF²_{⟨f(t),g(t),t⟩}(k(t)).

Since φ ↦ ∫ₐᵇ φ(t) dt is a bounded linear functional on C[a, b], it is differentiable and equal to its differential. The composite-function rule therefore implies that K is differentiable and that

dK_{⟨f,g⟩}(h, k) = ∫ₐᵇ [dF¹(h(t)) + dF²(k(t))] dt,

where the partial differentials in the integrand are at the point ⟨f(t), g(t), t⟩. Now the pairs ⟨f, g⟩ such that f′ exists and equals g form a closed subspace of X × X which is isomorphic to V.
It is obvious that they form a subspace, but to see that it is closed requires the theory of the integral for parametrized arcs from Chapter 4, for it depends on the representation f(t) = f(a) + ∫ₐᵗ f′(s) ds and the consequent norm inequality ‖f(t) − f(a)‖ ≤ (t − a)‖f′‖_∞. Assuming this, we see that our original functional G is just the restriction of K to this subspace (isomorphic to) V, and hence is differentiable, with

dG_f(h) = ∫ₐᵇ [dF¹(h(t)) + dF²(h′(t))] dt.

This differential dG_f is called the first variation of G about f.

The fixed endpoints α and β for the arc f determine in turn a closed plane P in V, for the evaluation maps (coordinate projections) π_a : f ↦ f(a) are bounded, and P is the intersection of the hyperplanes π_a = α and π_b = β. Since P is a translate of the subspace M = {f ∈ V : f(a) = f(b) = 0}, our constrained maximum equation is

dG_f(h) = ∫ₐᵇ [dF¹(h(t)) + dF²(h′(t))] dt = 0

for all h in M. We come now to the special trick of the calculus of variations, called the lemma of du Bois-Reymond.

Suppose for simplicity that W = R. Then F is a function F(x, y, t) of three real variables, the partial differentials are equivalent to ordinary partial derivatives, and our critical-point equation is

dG_f(h) = ∫ₐᵇ [(∂F/∂x) h(t) + (∂F/∂y) h′(t)] dt = 0.

Integrating the first term by parts, and remembering that h(a) = h(b) = 0, we see that the equation becomes

∫ₐᵇ (∂F/∂y − ∫ₐᵗ (∂F/∂x) ds) g(t) dt = 0,

where g = h′. Since h is an arbitrary continuously differentiable function except for the constraints h(a) = h(b) = 0, we see that g is an arbitrary continuous function except for the constraint ∫ₐᵇ g(t) dt = 0. That is, ∂F/∂y − ∫ₐᵗ ∂F/∂x ds is orthogonal to the null space N of the linear functional g ↦ ∫ₐᵇ g(t) dt. Since the one-dimensional space of functions orthogonal to N is clearly the set of constant functions,

(∂F/∂y)(f(t), f′(t), t) = ∫ₐᵗ (∂F/∂x)(f(s), f′(s), s) ds + c.

This equation implies, in particular, that the left member is differentiable. This is not immediately apparent, since f′ is only assumed to be continuous.
Differentiating, we conclude finally that f is a critical point of the mapping G if and only if it is a solution of the differential equation

(d/dt)(∂F/∂y)(f(t), f′(t), t) = (∂F/∂x)(f(t), f′(t), t),

which is called the Euler equation of the variational problem. It is an ordinary differential equation for the unknown function f; when the indicated derivative is computed, it takes the form

(∂²F/∂y²) f″ + (∂²F/∂y∂x) f′ + ∂²F/∂y∂t − ∂F/∂x = 0.

If W is not R, we get exactly the same result from the general form of the integration-by-parts formula (using Theorem 6.3) and a more sophisticated version of the above argument. (See Exercises 10.14 and 10.15 of Chapter 4.) That is, the smooth arc f with fixed endpoints α and β is a critical point of the mapping g ↦ ∫ₐᵇ F(g(t), g′(t), t) dt if and only if it satisfies the Euler differential equation

(d/dt) dF²_{⟨f(t),f′(t),t⟩} = dF¹_{⟨f(t),f′(t),t⟩}.

This is now a vector-valued equation, with values in W*. If W is finite-dimensional, with dimension n, then a choice of basis makes W* into Rⁿ, and this vector equation is equivalent to n scalar equations

(d/dt)(∂F/∂y_i)(f(t), f′(t), t) = (∂F/∂x_i)(f(t), f′(t), t),   i = 1, …, n,

where F is now a function of 2n + 1 real variables,

F(x, y, t) = F(x₁, …, xₙ, y₁, …, yₙ, t).

Finally, let us see what happens to the simpler variational problem (W = R) when the endpoints of f are not fixed. Now the critical-point equation is dG_f(h) = 0 for all h in V, and when we integrate by parts it becomes

(∂F/∂y) h ∣ₐᵇ + ∫ₐᵇ (∂F/∂x − (d/dt)(∂F/∂y)) h dt = 0

for all h in V. We can reason essentially as above, but a little more closely, to conclude that a function f is a critical point if and only if it satisfies the Euler equation

(d/dt)(∂F/∂y) = ∂F/∂x

and also the endpoint conditions

(∂F/∂y)∣_{t=a} = (∂F/∂y)∣_{t=b} = 0.

This has been only a quick look at the variational calculus, and the interested reader can pursue it further in treatises devoted to the subject. There are many more questions of the general type we have considered.
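A minimal worked instance of the Euler equation (our illustration, not the text's): take W = R and the integrand F(x, y, t) = y², so that G(f) = ∫ₐᵇ f′(t)² dt. Then

```latex
\frac{d}{dt}\,\frac{\partial F}{\partial y}\bigl(f(t), f'(t), t\bigr)
   = \frac{d}{dt}\,2f'(t) = 2f''(t),
\qquad
\frac{\partial F}{\partial x}\bigl(f(t), f'(t), t\bigr) = 0,
```

so the Euler equation reduces to f″ = 0, and the critical arcs with f(a) = α, f(b) = β are the straight lines f(t) = α + (β − α)(t − a)/(b − a) — the expected minimizers of ∫ f′².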
For example, we may want neither fixed nor completely free endpoints but freedom subject to con- straints. We shall take this up in Chapter 13 in the special ease of the varia tional equations of mechanics. Or again, may be a function of two or more variables and the integral may be a multiple integral. In this ease the Euler ‘equation may become a system of partial differential equations in the unknown f. Finally, there is the question of sufficient conditions for the critical funetion to give-a maximum or minimum value to the integral. ‘This will naturally involve a study of the second differential of the functional G, or its second variation, as itis, known in this subject. ’y 186 THE DIFFERENTIAL caLcULUS 3.16 ‘16, THE SECOND DIFFERENTIAL AND. ‘THE CLASSIFICATION OF CRITICAL POINTS Suppose that V and W are normed linear spaces, that A is an open subset of Y and that F: AW is a continuously differentiable mapping. ‘The frst differ ential of F is the continuous mapping dF: 7 ++ dy from A to Hom(V’, WP). We now want to study the differentiability of this mapping at the point a. Pre- sumably, we know what it means to say that dF is diferentiable at a. By definition d(@F)q is a bounded linear transformation 7’ from V to Hom(V,,W) such that A(dF (a) — T(x) = e(n).. That is, dPa4y— Pa — Tn) is an clement of Eom(V’, W) of norm less than en] for » sufficiently small. We set @2F, = d(aP),and repeat: d*F, = d*F,(-) isa linear map from V to Hom(V, W), @F,(9) = EFa(n)() is an clement of Hom(V, W), and d®Fq(9)(¢) is a veetar in W. Also, we know that d?F, is equivalent to a bounded bilinear map wVxVoW, where w(n, = dF a()(8)- The vector d?F4(n)(&) clearly ought to be some kind of second derivative of F at a, and the reader might even conjecture that it is the mixed derivative in the directions € and Theorem 16.1. 
If F: A → W is continuously differentiable, and if the second differential d²F_a exists, then for each fixed η ∈ V the function D_ηF: x ↦ D_ηF(x) from A to W is differentiable at a, and

$$D_\nu(D_\eta F)(a) = \big(d^2F_a(\nu)\big)(\eta).$$

Proof. We use the evaluation-at-η map ev_η: Hom(V, W) → W defined for a fixed η in V by ev_η(T) = T(η). It is a bounded linear mapping. Then

$$(D_\eta F)(x) = dF_x(\eta) = \mathrm{ev}_\eta(dF_x) = (\mathrm{ev}_\eta \circ dF)(x),$$

so that the function D_ηF is the composition ev_η ∘ dF. It is differentiable at a because d(dF)_a exists and ev_η is linear. Thus

$$\big(D_\nu(D_\eta F)\big)(a) = d(D_\eta F)_a(\nu) = \big(\mathrm{ev}_\eta \circ d(dF)_a\big)(\nu) = \mathrm{ev}_\eta\big[(d^2F_a)(\nu)\big] = \big(d^2F_a(\nu)\big)(\eta). \quad\square$$

The reader must remember in going through the above argument that D_ηF is the function (D_ηF)(·), and he might prefer to use this notation, as follows:

$$D_\nu\big((D_\eta F)(\cdot)\big)\big|_a = d\big((D_\eta F)(\cdot)\big)_a(\nu) = d\big(\mathrm{ev}_\eta \circ dF_{(\cdot)}\big)_a(\nu) = \mathrm{ev}_\eta\big(d(dF_{(\cdot)})_a(\nu)\big) = \mathrm{ev}_\eta\big(d^2F_a(\nu)\big).$$

If the domain space V is the Cartesian space Rⁿ, then the differentiability of (D_{δʲ}F)(·) = (∂F/∂x_j)(·) at a implies the existence of the second partial derivatives (∂²F/∂x_i∂x_j)(a) by Theorem 9.2, and with b and c fixed, we then have

$$D_c(D_b F) = D_c\Big(\sum_j b_j\,\frac{\partial F}{\partial x_j}\Big) = \sum_j b_j\, D_c\Big(\frac{\partial F}{\partial x_j}\Big) = \sum_{i,j} b_j c_i\,\frac{\partial^2 F}{\partial x_i\,\partial x_j}.$$

Thus:

Corollary 1. If V = Rⁿ in the above theorem, then the existence of d²F_a implies the existence of all the second partial derivatives (∂²F/∂x_i∂x_j)(a), and

$$d^2F_a(b, c) = D_c(D_b F)(a) = \sum_{i,j} \frac{\partial^2 F}{\partial x_i\,\partial x_j}(a)\, b_j c_i.$$

Moreover, from the above considerations and Theorem 9.3 we can also conclude that:

Theorem 16.2. If V = Rⁿ, and if all the second partial derivatives (∂²F/∂x_i∂x_j)(x) exist and are continuous on the open set A, then the second differential d²F_x exists on A and is continuous.

Proof. We have directly from Theorem 9.3 that each first partial derivative (∂F/∂x_i)(·) is differentiable. But ∂F/∂x_i = ev_{δⁱ} ∘ dF, and the corollary is then a consequence of the following general principle. □

Lemma.
If {S_i}₁ⁿ is a finite collection of linear maps on a vector space W such that S = ⟨S₁, …, Sₙ⟩ is invertible, then a mapping F: A → W is differentiable at a if and only if S_i ∘ F is differentiable at a for all i.

Proof. For then S ∘ F and F = S⁻¹ ∘ (S ∘ F) are differentiable, by Theorems 8.1 and 6.2. □

These considerations clearly extend to any number of differentiations. Thus, if d²F: x ↦ d²F_x is differentiable at a, then for fixed b and c the evaluation d²F_x(b, c) is differentiable at a, and the formula

$$D_c(D_b F)(x) = d^2F_x(b, c) = \sum_{i,j} \frac{\partial^2 F}{\partial x_i\,\partial x_j}(x)\, b_j c_i$$

shows (for special choices of b and c) that all the second partials (∂²F/∂x_i∂x_j)(·) are differentiable at a, with

$$D_d\big(D_c(D_b F)\big)(a) = \sum_{i,j,k} \frac{\partial^3 F}{\partial x_k\,\partial x_i\,\partial x_j}(a)\, b_j c_i d_k.$$

Conversely, if all the third partials exist and are continuous on A, then the second partials are differentiable on A by Theorem 9.3, and then d²F_x is differentiable by the lemma, since (∂²F/∂x_i∂x_j)(·) = ev_{⟨δⁱ, δʲ⟩} ∘ d²F.

As the reader will remember, it is crucially important in working with higher-order derivatives that ∂²F/∂x_i∂x_j = ∂²F/∂x_j∂x_i, and we very much need the same theorem here.

Theorem 16.3. The second differential is a symmetric function of its two arguments: (d²F_a(η))(ξ) = (d²F_a(ξ))(η).

Proof. By the definition of d(dF)_a, given ε, there is a δ such that

$$\|\Delta(dF)_a(\eta) - d(dF)_a(\eta)\| \le \epsilon\,\|\eta\| \quad\text{whenever } \|\eta\| \le \delta.$$

Of course, Δ(dF)_a(η) = dF_{a+η} − dF_a. If we write down the same inequality with η replaced by η + ξ, then the difference of the transformations in the left members of the two inequalities is

$$(dF_{a+\eta} - dF_{a+\eta+\xi}) + d^2F_a(\xi),$$

and the triangle inequality therefore implies that

$$\|(dF_{a+\eta} - dF_{a+\eta+\xi}) + d^2F_a(\xi)\| \le \epsilon(\|\eta\| + \|\eta + \xi\|) \le 2\epsilon(\|\eta\| + \|\xi\|),$$

provided that both η and η + ξ have norms at most δ. We shall take ‖ξ‖ ≤ δ/3 and ‖η‖ ≤ 2δ/3.
If we hold ξ fixed, and if we set T = d²F_a(ξ) and G(x) = F(x + ξ) − F(x), then dG_{a+η} = dF_{a+η+ξ} − dF_{a+η}, and the inequality above becomes

$$\|dG_{a+\eta} - T\| \le 2\epsilon(\|\eta\| + \|\xi\|).$$

Since this holds whenever ‖η‖ ≤ 2δ/3, we can apply the corollary to Theorem 7.4 to the function x ↦ G(x) − T(x) on the segment from a to a + η and conclude that

$$\|\Delta G_a(\eta) - T(\eta)\| \le 2\epsilon(\|\eta\| + \|\xi\|)\,\|\eta\|,$$

provided that η and ξ have norms at most δ/3. Now

$$\Delta G_a(\eta) = F(a + \eta + \xi) - F(a + \eta) - F(a + \xi) + F(a).$$

This function of η and ξ is called the second difference of F at a, and is designated Δ²F_a(η, ξ). Note that it is symmetric in ξ and η. Our final inequality can now be rewritten as

$$\|\Delta^2 F_a(\eta, \xi) - \big(d^2F_a(\xi)\big)(\eta)\| \le 2\epsilon(\|\eta\| + \|\xi\|)^2.$$

Reversing η and ξ, and using the symmetry of Δ²F_a, we see that

$$\|\big(d^2F_a(\eta)\big)(\xi) - \big(d^2F_a(\xi)\big)(\eta)\| \le 4\epsilon(\|\eta\| + \|\xi\|)^2,$$

provided η and ξ have norms at most δ/3. But now it follows by the usual homogeneity argument that this inequality holds for all η and ξ. Finally, since ε is arbitrary, the left-hand side is zero. □

The reader will remember from the elementary calculus that a critical point a for a function f [f′(a) = 0] is a relative extremum point if the second derivative f″(a) exists and is not zero. In fact, if f″(a) < 0, then f has a relative maximum at a, because f″(a) < 0 implies that f′ is decreasing in a neighborhood of a and the graph of f is therefore concave down in a neighborhood of a. Similarly, f has a relative minimum at a if f′(a) = 0 and f″(a) > 0. If f″(a) = 0, nothing can be concluded.

If f is a real-valued function defined on an open set A in a finite-dimensional vector space V, if a ∈ A is a critical point of f, and if d²f_a exists and is a nonsingular element of Hom(V, V*), then we can draw similar conclusions about the behavior of f near a, only now there is a richer variety of possibilities. The reader is probably already familiar with what happens for a function f from R² to R. Then a may be a relative maximum point (a "cap" point on the graph of f), a relative minimum point, or a saddle point, as shown in Fig. 3.13 for the graph of the translated function Δf_a. However, it must be realized that new axes may have to be chosen for the orientation of the saddle with respect to the axes to look as shown. Replacing f by Δf_a amounts to supposing that 0 is the critical point and that f(0) = 0. Note that if 0 is a saddle point, then there are two complementary subspaces, the coordinate axes in Fig. 3.13, such that 0 is a relative maximum point for f when f is restricted to one of them, and a relative minimum point for the restriction of f to the other.

Fig. 3.13

We shall now investigate the general case and find that it is just like the two-dimensional case except that when there is a saddle point the subspace on which the critical point is a maximum point may have any dimension from 1 to n − 1 [where d(V) = n]. Moreover, this dimension is exactly the number of −1's in the standard orthonormal basis representation of the quadratic form q(ξ) = ω(ξ, ξ) = d²f_a(ξ, ξ).

Our hypotheses, then, are that f is a continuously differentiable real-valued function on an open subset A of a finite-dimensional normed linear space V, that a ∈ A is a critical point for f (df_a = 0), and that the mapping d²f_a: V → V* exists and is nonsingular. This last hypothesis is equivalent to assuming that the bilinear form ω(ξ, η) = d²f_a(ξ, η) has a nonsingular matrix with respect to any basis for V. We now use Theorem 7.1 of Chapter 2 to choose an ω-orthonormal basis {α_i}₁ⁿ. Remember that this means that ω(α_i, α_j) = 0 if i ≠ j, ω(α_i, α_i) = 1 for i = 1, …, p, and ω(α_i, α_i) = −1 for i = p + 1, …, n.
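A worked instance of this classification on R² (an illustration, not an example from the text):

```latex
% f(x_1, x_2) = x_1^2 - x_2^2, with critical point a = 0, since df_0 = 0.
% The second differential is the bilinear form
d^2 f_0(\xi, \eta) = 2\xi_1\eta_1 - 2\xi_2\eta_2 ,
% which is nonsingular. Rescaling the standard basis by 1/\sqrt{2},
% i.e. \alpha_1 = (1/\sqrt{2}, 0), \alpha_2 = (0, 1/\sqrt{2}), gives an
% \omega-orthonormal basis:
\omega(\alpha_1, \alpha_1) = 1, \qquad
\omega(\alpha_2, \alpha_2) = -1, \qquad
\omega(\alpha_1, \alpha_2) = 0 .
% Thus p = 1 and there is exactly one -1: the origin is a saddle point,
% a minimum for f restricted to the x_1-axis and a maximum for f
% restricted to the x_2-axis, so the "maximum subspace" has dimension 1.
```

Here the saddle is already aligned with the coordinate axes; for a form such as f(x₁, x₂) = x₁x₂, the rotated coordinates x₁ = u + v, x₂ = u − v (giving u² − v²) would first have to be chosen, as remarked above.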
There cannot be any 0 values for ω(α_i, α_i), because the matrix t_{ij} = ω(α_i, α_j) is nonsingular: if ω(α_i, α_i) = 0, then the whole ith column is zero, and the column space has dimension less than n.

If the second differential d²F: x ↦ d²F_x, viewed as a mapping from A to Hom²(V, W), is differentiable on A, then its differential, d(d²F) = d³F, is from A to Hom(V, Hom²(V, W)) = Hom³(V, W), the space of all trilinear mappings from V³ = V × V × V to W. Continuing inductively, we arrive at the notion of the nth differential of F on A as a mapping from A to Hom(V, Hom^{n−1}(V, W)) = Homⁿ(V, W), the space of all n-linear mappings from Vⁿ to W. The theorem that d²F_a is a symmetric element of Hom²(V, W) extends inductively to show that dⁿF_a is a symmetric element of Homⁿ(V, W). We shall omit this proof. Our theorem on the evaluation of the second differential by mixed directional derivatives also generalizes by induction to give

$$D_{\xi_n} \cdots D_{\xi_1} F(a) = d^n F_a(\xi_1, \ldots, \xi_n);$$

for, starting from the left-hand term, we have

$$D_{\xi_{n+1}} D_{\xi_n} \cdots D_{\xi_1} F\big|_a = d\big(D_{\xi_n} \cdots D_{\xi_1} F\big)_a(\xi_{n+1}) = d\big(\mathrm{ev}_{\langle \xi_1, \ldots, \xi_n \rangle} \circ d^n F\big)_a(\xi_{n+1}) = \big(d^{n+1}F_a(\xi_{n+1})\big)(\xi_1, \ldots, \xi_n) = d^{n+1}F_a(\xi_1, \ldots, \xi_{n+1}),$$

the last equality by symmetry.

If V = Rⁿ, then our conclusions about partial derivatives extend inductively in the same way to show that F has continuous differentials on A up through order m if and only if all the mth-order partial derivatives ∂ᵐF/∂x_{i₁} ⋯ ∂x_{iₘ} exist and are continuous on A, with

$$d^m F_a(\xi^1, \ldots, \xi^m) = \sum_{i_1, \ldots, i_m} \frac{\partial^m F}{\partial x_{i_1} \cdots \partial x_{i_m}}(a)\, \xi^1_{i_1} \cdots \xi^m_{i_m}.$$

We now consider the behavior of F along the line t ↦ a + tη, where, of course, a and η are fixed. If λ(t) = F(a + tη), then we can prove by induction that

$$\lambda^{(j)}(t) = (D_\eta)^j F(a + t\eta).$$

We know this to be true for j = 1 by Theorem 7.2, and assuming it for j = m, we have, by the same theorem,

$$\lambda^{(m+1)}(t) = \frac{d}{dt}\,(D_\eta)^m F(a + t\eta) = D_\eta\big((D_\eta)^m F\big)(a + t\eta) = (D_\eta)^{m+1} F(a + t\eta).$$

Now suppose that F is real-valued (W = R). We then have Taylor's formula:

$$\lambda(1) = \lambda(0) + \lambda'(0) + \cdots + \frac{\lambda^{(n)}(0)}{n!} + \frac{\lambda^{(n+1)}(k)}{(n+1)!},$$

for some k between 0 and 1.
Taking t = 1 and substituting from above, we have

$$F(a + \eta) = F(a) + D_\eta F(a) + \cdots + \frac{1}{n!}\,D_\eta^n F(a) + \frac{1}{(n+1)!}\,D_\eta^{n+1} F(a + k\eta),$$

which is the general Taylor formula in the normed linear space context. In terms of differentials, it is

$$F(a + \eta) = F(a) + dF_a(\eta) + \cdots + \frac{1}{n!}\,d^n F_a(\eta, \ldots, \eta) + \frac{1}{(n+1)!}\,d^{n+1} F_{a+k\eta}(\eta, \ldots, \eta).$$

If V = Rⁿ, then D_η = Σᵢ yᵢ ∂/∂xᵢ (writing η = y), and so the general term in the Taylor expansion is

$$\frac{1}{m!}\left(\sum_{i=1}^n y_i\,\frac{\partial}{\partial x_i}\right)^m F(a).$$

If n = 2 and m = 2, this is

$$\frac{1}{2!}\left[y_1^2\,\frac{\partial^2 F}{\partial x_1^2}(a) + 2y_1 y_2\,\frac{\partial^2 F}{\partial x_1\,\partial x_2}(a) + y_2^2\,\frac{\partial^2 F}{\partial x_2^2}(a)\right].$$

The above description is logically simple, but it is inefficient in that it repeats identical terms such as y₁y₂(∂²F/∂x₁∂x₂) and y₂y₁(∂²F/∂x₂∂x₁). We conclude by describing for the interested reader the modern "multi-index" notation for this very complicated situation.

Remember that we are looking at the mth term of the Taylor formula for F, and that F has n variables. For any n-tuple k = ⟨k₁, …, kₙ⟩ of nonnegative integers, we define |k| as Σ₁ⁿ kᵢ, and for x ∈ Rⁿ, we set xᵏ = x₁^{k₁} x₂^{k₂} ⋯ xₙ^{kₙ}. Also we set

$$F_k = \frac{\partial^{|k|} F}{\partial x_1^{k_1} \cdots \partial x_n^{k_n}},$$

or better, if D_iF = ∂F/∂x_i, we set DᵏF = D₁^{k₁} D₂^{k₂} ⋯ Dₙ^{kₙ} F = F_k. Finally, we set k! = k₁! k₂! ⋯ kₙ!, and if p ≥ |k|, we set

$$\binom{p}{k} = \frac{p!}{k!\,(p - |k|)!}.$$

Then the mth term of the Taylor expansion of F is

$$\frac{1}{m!} \sum_{|k| = m} \binom{m}{k}\,(D^k F)(a)\, y^k = \sum_{|k| = m} \frac{(D^k F)(a)}{k!}\, y^k,$$

which is surely a notational triumph.

The general Taylor formula is too cumbersome to be of much use in practice; it is principally of theoretical value. The Taylor expansions that we actually compute are generally found by other means, such as substitution of a polynomial (or power series) in a power series. For example, the first terms of the expansion of e^{sin(x+y)} can be found by substituting the series for sin(x + y) into the series for eᵗ.

A mapping from A to W which has continuous differentials of all orders through k is said to be of class Cᵏ on A, and the collection of all such mappings is designated Cᵏ(A, W) or 𝒞ᵏ(A, W). It is clear (by induction) that 𝒞ᵏ(A, W) is a vector space.
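To see the multi-index bookkeeping in the smallest nontrivial case (an illustration, not an example from the text), take n = 2 and m = 2, so that the multi-indices with |k| = 2 are ⟨2, 0⟩, ⟨1, 1⟩, and ⟨0, 2⟩:

```latex
% The three multi-indices of weight 2 contribute
\sum_{|k|=2} \frac{(D^k F)(a)}{k!}\, y^k
= \frac{1}{2!\,0!}\,\frac{\partial^2 F}{\partial x_1^2}(a)\,y_1^2
+ \frac{1}{1!\,1!}\,\frac{\partial^2 F}{\partial x_1\,\partial x_2}(a)\,y_1 y_2
+ \frac{1}{0!\,2!}\,\frac{\partial^2 F}{\partial x_2^2}(a)\,y_2^2 ,
% which is exactly (1/2!)[y_1^2 F_{11} + 2 y_1 y_2 F_{12} + y_2^2 F_{22}](a):
% the two equal mixed terms y_1 y_2 F_{12} and y_2 y_1 F_{21} have been
% collected into the single term with coefficient 1/k! = 1/(1!\,1!) = 1.
```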
Moreover, it can also be shown by induction that a composition of Cᵏ-maps is itself of class Cᵏ. This depends on recognizing the general form of the mth differential of a composition F ∘ G as being a finite sum, each term of which is a composition of functions chosen from F, dF, …, dᵐF, G, dG, …, dᵐG. Functions of many variables are involved in these calculations, and it is simplest to treat each as a function of a single n-tuplet variable and to apply the obvious corollary of Theorem 8.1 that if G¹, …, Gᵏ are of class Cᵖ, then so is G = ⟨G¹, …, Gᵏ⟩, with dᵐG_a = ⟨dᵐG¹_a, …, dᵐGᵏ_a⟩. As a special case of composition, we can conclude that a product of Cᵏ-maps is of class Cᵏ.

We shall see in the next chapter that φ: T ↦ T⁻¹ is a differentiable map on the open set of invertible elements in Hom V (if V is a Banach space) and that dφ_T(H) = −T⁻¹HT⁻¹. Since ⟨T, H⟩ ↦ −T⁻¹HT⁻¹ then has continuous partial differentials, we can continue, and another induction shows that φ is of class Cᵏ for every k and that dᵐφ_T(H₁, …, Hₘ) is a finite sum of finite products of T⁻¹, H₁, …, Hₘ. It then follows that a function F defined implicitly by a Cᵏ function G is also of class Cᵏ, for its differential, as computed in the implicit-function theorem, is then a composition of maps of class C^{k−1}.

A mapping F which is of class Cᵏ for all k is said to be of class C^∞, and it follows from our remarks above that the family of C^∞-maps is closed under all the operations that we have met in the calculus. If the domain of F is an open set in Rⁿ, then F ∈ 𝒞^∞(A, W) if and only if all the partial derivatives of F exist and are continuous on A.

CHAPTER 4

COMPACTNESS AND COMPLETENESS

In this chapter we shall investigate two properties of subsets of a normed linear space V which are concerned with the fact that, in a certain sense, all the points which ought to be there really are there. These notions are largely independent of the algebraic structure of V, and we shall therefore study them in their own most natural setting, that of metric spaces.
The stronger of these two properties, compactness, helps to explain why the theory of finite-dimensional spaces is so simple and satisfactory. The weaker property, completeness, is shared by important infinite-dimensional normed linear spaces, and allows us to treat these spaces in almost as satisfactory a way. It is these properties that save the calculus from being largely a formal theory. They allow us to define crucial elements by limiting processes, and are responsible, for example, for an infinite series having a sum, a continuous real-valued function assuming a maximum value, and a definite integral existing. For the real number system itself, the compactness property is equivalent to the least-upper-bound property, which has already been an absolutely essential tool in our construction of the differential calculus in Chapter 3.

In Sections 8 through 10 we shall apply completeness to the calculus. The first of these sections is devoted to the existence and differentiability of functions defined by power series, and since we want to include power series in an operator T, we shall take the occasion to introduce and exploit the notion of a Banach algebra. Next we shall prove the contraction mapping fixed-point theorem, which is the missing ingredient in our unfinished proof of the implicit-function theorem in Chapter 3 and which will be the basis for the fundamental existence and uniqueness theorem for ordinary differential equations in Chapter 6. In Section 10 we shall prove a simple extension theorem for linear mappings into a complete normed linear space and apply it to construct the Riemann integral of a parametrized arc.

1. METRIC SPACES; OPEN AND CLOSED SETS

In the preceding chapter we occasionally treated questions of convergence and continuity in situations where the domain was an arbitrary subset A of a normed linear space V.
In such discussions the algebraic structure of V fades into the background, and the vector operations of V are used only to produce the combination ‖α − β‖, which is interpreted as the distance from α to β. If we distill out of these contexts what is essential to the convergence and continuity arguments, we find that we need a space A and a function ρ: A × A → R, ρ(x, y) being called the distance from x to y, such that:

1) ρ(x, y) > 0 if x ≠ y, and ρ(x, x) = 0;
2) ρ(x, y) = ρ(y, x) for all x, y ∈ A;
3) ρ(x, z) ≤ ρ(x, y) + ρ(y, z) for all x, y, z ∈ A.

Any set A together with such a function ρ from A × A to R is called a metric space; the function ρ is the metric. It is obvious that a normed linear space is a metric space under the norm metric ρ(α, β) = ‖α − β‖, and that any subset B of a metric space A is itself a metric space under ρ restricted to B × B. If we start with a nice intuitive space, like Rⁿ under one of its standard norms, and choose a weird subset B, it will be clear that a metric space can be a very odd object, and may fail to have almost any property one can think of. Metric spaces very often arise in practice as subsets of normed linear spaces with the norm metric, but they come from other sources too. Even in the normed linear space context, metrics other than the norm metric are used. For example, S might be a two-dimensional spherical surface in R³, say S = {x : Σ₁³ xᵢ² = 1}, and ρ(x, y) might be the great-circle distance from x to y. Or, more generally, S might be any smooth two-dimensional surface in R³, and ρ(x, y) might be the length of the shortest curve connecting x to y in S.

In this chapter we shall adopt the metric space context for our arguments wherever it is appropriate, so that the student may become familiar with this more general but very intuitive notion. We begin by reproducing the basic definitions in the language of metrics. Because the scalar-vector dichotomy is not a factor in this context, we shall drop our convention that points be represented by Greek or boldface roman letters and shall use whatever letters we wish.

Definition. If X and Y are metric spaces, then f: X → Y is continuous at a ∈ X if for every ε there is a δ such that ρ(x, a) < δ ⇒ ρ(f(x), f(a)) < ε.

Here we have used the same symbol 'ρ' for metrics on different spaces, just as earlier we made ambiguous use of the norm symbol.

Definition. The (open) ball of radius r about p, B_r(p), is simply the set of points whose distance from p is less than r:

$$B_r(p) = \{x : \rho(x, p) < r\}.$$

Definition. A subset A ⊂ X is open if every point p in A is the center of some ball included in A, that is, if (∀p ∈ A)(∃r > 0)(B_r(p) ⊂ A).

Lemma 1.1. Every ball is open; in fact, if q ∈ B_r(p) and δ = r − ρ(p, q), then B_δ(q) ⊂ B_r(p).

Proof. This amounts to the triangle inequality. For, if x ∈ B_δ(q), then ρ(x, q) < δ, and

$$\rho(x, p) \le \rho(x, q) + \rho(q, p) < \delta + \rho(p, q) = r. \quad\square$$

Lemma. If N is a proper closed subspace of a normed linear space V, and if 0 < ε < 1, then there is an α in V such that ‖α‖ = 1 and ρ(α, N) > 1 − ε.

Proof. Choose any β ∉ N. Then ρ(β, N) > 0 (because N is closed), and there exists an η ∈ N such that ‖β − η‖ < ρ(β, N)/(1 − ε) [by the definition of ρ(β, N)]. Set α = (β − η)/‖β − η‖. Then ‖α‖ = 1 and

$$\rho(\alpha, N) = \frac{\rho(\beta - \eta, N)}{\|\beta - \eta\|} = \frac{\rho(\beta, N)}{\|\beta - \eta\|} > \frac{\rho(\beta, N)(1 - \epsilon)}{\rho(\beta, N)} = 1 - \epsilon,$$

by (2), (3), and the definition of η. □

The reader may feel that we ought to be able to improve this lemma. Surely, all we have to do is choose the point in N which is closest to β, and so obtain ‖β − η‖ = ρ(β, N), giving finally a vector α such that ‖α‖ = 1 and ρ(α, N) = 1. However, this is a matter on which our intuition lets us down: if N is infinite-dimensional, there may not be a closest point η! For example, as we shall see later in the exercises of Chapter 5, if V is the space 𝒞([−1, 1]) under the two-norm ‖f‖₂ = (∫ f²)^{1/2}, and if N is the set of functions g in V such that ∫₋₁⁰ g = 0, then N is a closed subspace for which we cannot find such a "best" α.
But if N is finite-dimensional, we can always find such a point, and if V is a Hilbert space (see Chapter 5), we can also.

EXERCISES

1.1 Write out the proof of Lemma 1.2.

1.2 Prove (2) and (3) of Theorem 1.1.

1.3 Prove (2) of Theorem 1.2.

1.4 It is not true that the intersection of a sequence of open sets is necessarily open. Find a counterexample in R.

1.5 Prove the corollary of Lemma 1.4.

1.6 Prove that p ∈ Ā if and only if ρ(p, A) = 0.

1.7 Let X and Y be metric spaces, and let f: X → Y have the property that f⁻¹[B] is open in X whenever B is open in Y. Prove that f is continuous.

1.8 Show that ρ(x, A) = ρ(x, Ā).

1.9 Show that ρ(x, A) is a continuous function of x. (In fact, it is Lipschitz continuous.)

1.10 Invent metric spaces S (by choosing subsets of R²) having the following properties:
1) S has n points.
2) S is infinite and ρ(x, y) ≥ 1 if x ≠ y.
3) S has a ball B₁(a) such that the closed ball {x : ρ(x, a) ≤ 1} is not the same as the closure of B₁(a).

1.11 Prove that in a normed linear space a closed ball is the closure of the corresponding open ball.

1.12 Show that if f: X → Y and g: Y → Z are continuous (where X, Y, and Z are metric spaces), then so is g ∘ f.

1.13 Let X and Y be metric spaces. Define the notion of a product metric on Z = X × Y. Define a 1-metric ρ₁ and a uniform metric ρ∞ on Z (showing that they are metrics) in analogy with the one-norm and uniform norm on a product of normed linear spaces, and show that each is a product metric according to your definition above.

1.14 Do the same for a 2-metric ρ₂ on Z = X × Y.

1.15 Let X and Y be metric spaces, and let V be a normed linear space. Let f: X → R and g: Y → V be continuous maps. Prove that ⟨x, y⟩ ↦ f(x)g(y) is a continuous map from X × Y to V.

2. TOPOLOGY

If X is an arbitrary set and 𝒯 is any family of subsets of X satisfying properties (1) through (3) in Theorem 1.1, then 𝒯 is called a topology on X. Theorem 1.1 thus asserts that the open subsets of a metric space X form a topology on X.
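As a minimal illustration of these definitions (not an example from the text), consider a two-point metric space:

```latex
% X = \{p, q\} with \rho(p, q) = 1 and \rho(p, p) = \rho(q, q) = 0.
% Each singleton is open, since the unit balls are
B_1(p) = \{x : \rho(x, p) < 1\} = \{p\}, \qquad B_1(q) = \{q\} .
% Hence every subset of X is a union of balls and therefore open, and
\mathcal{T} = \{\varnothing,\ \{p\},\ \{q\},\ \{p, q\}\}
% is the resulting topology: it contains \varnothing and X and is closed
% under arbitrary unions and finite intersections, as a topology must be.
```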
The subsequent definitions of interior, closed set, and closure were purely topological, in the sense that they depended only on the topology 𝒯, as were Theorem 1.2 and the identity (Ā)′ = (A′)°. The study of the consequences of the existence of a topology is called general topology. On the other hand, the definitions of balls and continuity given earlier were metric definitions, and therefore part of metric space theory. In metric spaces, then, we have not only the topology, but also our ε-definitions of continuity and balls and the spherical characterizations of closure and interior.

The reader may be surprised to be told now that although continuity and convergence were defined metrically, they also have purely topological characterizations and are therefore topological ideas. This is easy to see if one keeps in mind that in a metric space an open set is nothing but a union of balls. We have:

f is continuous at p if and only if for every open set A containing f(p) there exists an open set B containing p such that f[B] ⊂ A.

This local condition involving behavior around a single point p is more fluently rendered in terms of the notion of neighborhood. A set A is a neighborhood of a point p if p ∈ A°. Then we have:

f is continuous at p if and only if for every neighborhood N of f(p), f⁻¹[N] is a neighborhood of p.

Finally, there is an elegant topological characterization of global continuity. Suppose that S₁ and S₂ are topological spaces. Then f: S₁ → S₂ is continuous (everywhere) if and only if f⁻¹[A] is open whenever A is open. Also, f is continuous if and only if f⁻¹[B] is closed whenever B is closed. These conditions are not surprising in view of Lemma 1.4.
3. SEQUENTIAL CONVERGENCE

In addition to shifting to the more general point of view of metric space theory, we also want to add to our kit of tools the notion of sequential convergence, which the reader will probably remember from his previous encounter with the calculus. One of the principal reasons why metric space theory is simpler and more intuitive than general topology is that nearly all metric arguments can be presented in terms of sequential convergence, and in this chapter we shall partially make up for our previous neglect of this tool by using it constantly and in preference to other alternatives.

Definition. We say that the infinite sequence {xₙ} converges to the point a if for every ε there is an N such that n > N ⇒ ρ(xₙ, a) < ε. We then write xₙ → a as n → ∞, or lim_{n→∞} xₙ = a.

Formally, this definition is practically identical with our earlier definition of function convergence, and where there are parallel theorems the arguments that we use in one situation will generally hold almost verbatim in the other. Thus the proof of Lemma 1.1 of Chapter 3 can be altered slightly to give the following result.

Lemma 3.1. If {ξₙ} and {ηₙ} are two sequences in a normed linear space V, then ξₙ → α and ηₙ → β imply ξₙ + ηₙ → α + β.

The main difference is that we now choose N as max{N₁, N₂} instead of choosing δ as min{δ₁, δ₂}. Similarly:

Lemma 3.2. If ξₙ → α in V and xₙ → a in R, then xₙξₙ → aα.

As before, the definition begins with three quantifiers, (∀ε)(∃N)(∀n). A somewhat more idiomatic form can be obtained by rephrasing the definition in terms of balls and the notion of "almost all n". We say that P(n) is true for almost all n if P(n) is true for all but a finite number of integers n, or equivalently, if (∃N)(∀n > N)P(n). Then we see that lim xₙ = a if and only if every ball about a contains almost all the xₙ.

The following sequential characterization provides probably the most intuitive way of viewing the notion of closure and closed sets.

Theorem 3.1.
A point x is in the closure Ā of a set A if and only if there is a sequence {xₙ} in A converging to x. Therefore, a set A is closed if and only if every convergent sequence lying in A has its limit in A.

Proof. If {xₙ} ⊂ A and xₙ → x, then every ball about x contains almost every xₙ, and so, in particular, intersects A. Thus x ∈ Ā by Lemma 1.3. Conversely, if x ∈ Ā, then every ball about x intersects A, and we can construct a sequence in A that converges to x by choosing xₙ as any point in B_{1/n}(x) ∩ A. Since A is closed if and only if A = Ā, the second statement of the theorem follows from the first. □

There is also a sequential characterization of continuity which helps greatly in using the notion of continuity in a flexible way. Let X and Y be metric spaces, and let f be any function from X to Y.

Theorem 3.2. The function f is continuous at a if and only if, for any sequence {xₙ} in X, xₙ → a implies f(xₙ) → f(a).

Proof. Suppose first that f is continuous at a, and let {xₙ} be any sequence converging to a. Then, given any ε, there is a δ such that ρ(x, a) < δ ⇒ ρ(f(x), f(a)) < ε, by the continuity of f at a, and for this δ there is an N such that n > N ⇒ ρ(xₙ, a) < δ, because xₙ → a. Combining these implications, we see that given ε we have found N so that n > N ⇒ ρ(f(xₙ), f(a)) < ε. That is, f(xₙ) → f(a).

Now suppose that f is not continuous at a. In considering such a negation it is important that implicit universal quantifiers be made explicit. Thus, formally, we are assuming that

$$\sim(\forall\epsilon)(\exists\delta)(\forall x)\big(\rho(x, a) < \delta \Rightarrow \rho(f(x), f(a)) < \epsilon\big),$$

that is, that

$$(\exists\epsilon)(\forall\delta)(\exists x)\big(\rho(x, a) < \delta \ \text{and}\ \rho(f(x), f(a)) \ge \epsilon\big).$$

Such symbolization will not be necessary after the reader has had some practice in computing logical negations; the experienced thinker will intuit the correct negation without formal calculation. In any event, we now have a fixed ε, and for each δ of the form δ = 1/n we can let xₙ be a corresponding x. We then have ρ(xₙ, a) < 1/n and ρ(f(xₙ), f(a)) ≥ ε for all n.
The first inequality shows that xₙ → a; the second shows that f(xₙ) does not converge to f(a). Thus, if f is not continuous at a, then the sequential condition is not satisfied. □

The above type of argument is used very frequently and almost amounts to an automatic proof procedure in the relevant situations. We want to prove, say, that (∀x)(∃y)(∀z)P(x, y, z). Arguing by contradiction, we suppose this false, so that (∃x)(∀y)(∃z)∼P(x, y, z). Then, instead of trying to use all numbers y, we let y run through some sequence converging to zero, such as {1/n}, and we choose one corresponding z, zₙ, for each such y. We end up with ∼P(x, 1/n, zₙ) for the given x and all n, and we finish by arguing sequentially.

The reader will remember that two norms p and q on a vector space V are equivalent if and only if the identity map ξ ↦ ξ is continuous from ⟨V, p⟩ to ⟨V, q⟩ and also from ⟨V, q⟩ to ⟨V, p⟩. By virtue of the above theorem we now see that:

Theorem 3.3. The norms p and q are equivalent if and only if they yield exactly the same collection of convergent sequences.

Earlier we argued that a norm on a product V × W of two normed linear spaces should be equivalent to ‖⟨α, ξ⟩‖₁ = ‖α‖ + ‖ξ‖. Now with respect to this sum norm it is clear that a sequence ⟨αₙ, ξₙ⟩ in V × W converges to ⟨α, ξ⟩ if and only if αₙ → α in V and ξₙ → ξ in W. We now see (again by Theorem 3.2) that:

Theorem 3.4. A product norm on V × W is any norm with the property that ⟨αₙ, ξₙ⟩ → ⟨α, ξ⟩ in V × W if and only if αₙ → α in V and ξₙ → ξ in W.

EXERCISES

3.1 Prove that a convergent sequence in a metric space has a unique limit. That is, show that if xₙ → a and xₙ → b, then a = b.

3.2 Show that xₙ → x in the metric space X if and only if ρ(xₙ, x) → 0 in R.

3.3 Prove that if xₙ → a in R and xₙ ≥ 0 for all n, then a ≥ 0.

3.4 Prove that if xₙ → 0 in R and |yₙ| ≤ xₙ for all n, then yₙ → 0.
3.5 Give detailed ε, N-proofs of Lemmas 3.1 and 3.2.

3.6 By applying Theorem 3.2, prove that if X is a metric space, V is a normed linear space, and F and G are continuous maps from X to V, then F + G is continuous. State and prove the similar theorem for a product fG.

3.7 Prove that continuity is preserved under composition by applying Theorem 3.2.

3.8 Show that (the range of) a sequence of points in a metric space is in general not a closed set. Show that it may be a closed set.

3.9 The fact that in a normed linear space the closure of an open ball includes the corresponding closed ball is practically trivial on the basis of Lemma 3.2 and Theorem 3.1. Show that this is so.

3.10 Show directly that if the maximum norm ‖⟨α, ξ⟩‖ = max{‖α‖, ‖ξ‖} is used on V = V₁ × V₂, then it is true that ⟨αₙ, ξₙ⟩ → ⟨α, ξ⟩ in V if and only if αₙ → α in V₁ and ξₙ → ξ in V₂.

3.11 Show that if |·| is any increasing norm on R² (see the remark after Theorem 4.3 of Chapter 3), then ρ(⟨x₁, y₁⟩, ⟨x₂, y₂⟩) = |⟨ρ(x₁, x₂), ρ(y₁, y₂)⟩| is a metric on the product X × Y of two metric spaces X and Y.

3.12 In the above exercise show that ⟨xₙ, yₙ⟩ → ⟨x, y⟩ in X × Y if and only if xₙ → x in X and yₙ → y in Y. This property would be our minimal requirement for a product metric.

3.13 Defining a product metric as above, use Theorem 3.2 to show that the function f on R² given by f(x, y) = xy/(x² + y²) for ⟨x, y⟩ ≠ ⟨0, 0⟩, with f(0, 0) = 0, is continuous as a function of x for each fixed value of y, and conversely. Show, however, that f is not continuous at the origin. That is, find a sequence ⟨xₙ, yₙ⟩ converging to ⟨0, 0⟩ in the plane such that f(xₙ, yₙ) does not converge to 0. This example shows that continuity of a function of two variables is a stronger property than continuity in each variable separately.

4. SEQUENTIAL COMPACTNESS

The reader is probably familiar with the idea of a subsequence. A subsequence of a sequence {xₙ} is a new sequence {yₘ} that is formed by selecting an infinite number, but generally not all, of the terms xₙ, and counting them off in the order of the selected indices.
Thus, if n₁ is the first selected n, n₂ the next, and so on, and if we set yₘ = x_{nₘ}, then we obtain the subsequence {yₘ} = {x_{nₘ}} = {x_{n₁}, x_{n₂}, x_{n₃}, …}. Strictly speaking, this counting off of the selected set of indices is a sequence m ↦ nₘ from Z⁺ to Z⁺ which preserves order: n_{m+1} > nₘ for all m. And the subsequence m ↦ x_{nₘ} is the composition of the sequence n ↦ xₙ and the selector sequence m ↦ nₘ.

In order to avoid subscripts on subscripts, we may use the notation n(m) instead of nₘ. In either case we are being conventionally sloppy: we are using the same symbol 'n' as an integer-valued variable, when we write xₙ, and as the selector function, when we write n(m) or nₘ. This is one of the standard notational ambiguities which we tolerate in elementary calculus, because the cure is worse than the disease. We could say: let f be a sequence, i.e., a function from Z⁺ to R. Then a subsequence of f is a composition f ∘ g, where g is a mapping from Z⁺ to Z⁺ such that g(m + 1) > g(m) for all m.

If you have grasped the idea of subsequence, you should be able to see that any infinite sequence of 0's and 1's, say {0, 1, 0, 0, 1, 0, 0, 0, 1, …}, can be obtained as a subsequence of {0, 1, 0, 1, 0, 1, …} = {(1 + (−1)ⁿ)/2}.

If xₙ → a, then it should be clear that every subsequence also converges to a. We leave the details as an exercise. On the other hand, if the sequence {xₙ} does not converge to a, then there is an ε such that for every N there is some larger n at which ρ(xₙ, a) ≥ ε. Now we can choose such an n for every N, taking care that n_{m+1} > nₘ, and thus choose a subsequence all of whose terms are at a distance at least ε from a. This subsequence has no subsequence converging to a. Thus, if {xₙ} does not converge to a, then it has a subsequence no (sub)subsequence of which converges to a. We therefore have:

Lemma 4.1.
If the sequence {x_n} and the point a are such that every subsequence of {x_n} has itself a subsequence that converges to a, then x_n → a.

This is a wild and unlikely sounding lemma, but we shall use it to prove a most important theorem (Theorem 4.2).

Definition. A subset A of a metric space is sequentially compact if every sequence in A has a subsequence that converges to a point of A.

Here, so to speak, we create convergence out of nothing. One would expect a compact set to have very powerful properties, and perhaps suspect that there aren't many such sets. We shall soon see, however, that every bounded closed subset of R^n is compact, and it is in the theory of finite-dimensional spaces that we most frequently use this notion. Sequential compactness in infinite-dimensional spaces is a much rarer phenomenon, but when it does occur it is very important, as we shall see in our brief look at Sturm–Liouville theory in Chapter 6.

We begin with a few simple but important general results.

Lemma 4.2. If A is a sequentially compact subset of a metric space S, then A is closed and bounded.

Proof. Suppose that {x_n} ⊂ A and that x_n → b. By the compactness of A there exists a subsequence {x_{n(i)}} that converges to a point a ∈ A. But a subsequence of a convergent sequence converges to the same limit. Therefore, a = b and b ∈ A. Thus A is closed.

Boundedness here will mean lying in some ball about a given point b. If A is not bounded, for each n there exists a point x_n ∈ A such that ρ(x_n, b) > n. By compactness a subsequence {x_{n(i)}}_i converges to a point a ∈ A, and ρ(x_{n(i)}, b) → ρ(a, b). This clearly contradicts ρ(x_{n(i)}, b) > n(i) ≥ i. ∎

Continuous functions carry compact sets into compact sets. The proof of the following result is left as an exercise.

Theorem 4.1. If f is continuous and A is a sequentially compact subset of its domain, then f[A] is sequentially compact.

A nonempty compact set A ⊂ R contains maximum and minimum elements.
This is because lub A is the limit of a sequence in A, and hence belongs to A itself, since A is closed. Combining this fact with the above theorem, we obtain the following well-known corollary.

Corollary. If f is a continuous real-valued function and dom(f) is nonempty and sequentially compact, then f is bounded and assumes maximum and minimum values.

The following very useful result is related to the above theorem.

Theorem 4.2. If f is continuous and bijective and dom(f) is sequentially compact, then f⁻¹ is continuous.

Proof. We have to show that if y_n → y in the range of f, and if x_n = f⁻¹(y_n) and x = f⁻¹(y), then x_n → x. It is sufficient to show that every subsequence {x_{n(i)}} has itself a subsequence converging to x (by Lemma 4.1). But, since dom(f) is compact, there is a subsequence {x_{n(i(j))}}_j converging to some z, and the continuity of f implies that f(z) = lim_{j→∞} f(x_{n(i(j))}) = lim_{j→∞} y_{n(i(j))} = y. Therefore, z = f⁻¹(y) = x, which is what we had to prove. Thus f⁻¹ is continuous. ∎

We now take up the problem of showing that bounded closed sets in R^n are compact. We first prove it for R itself and then give an inductive argument for R^n. A sequence {x_n} ⊂ R is said to be increasing if x_n ≤ x_{n+1} for all n. It is strictly increasing if x_n < x_{n+1} for all n. The notions of a decreasing sequence and a strictly decreasing sequence are obvious. A sequence which is either increasing or decreasing is said to be monotone. The relevance of these notions here lies in the following two lemmas.

Lemma 4.3. A bounded monotone sequence in R is convergent.

Proof. Suppose that {x_n} is increasing and bounded above. Let l be the least upper bound of its range. That is, x_n ≤ l for all n, but for every ε, l − ε is not an upper bound, and so l − ε < x_N for some N. Then

    n > N  ⟹  l − ε < x_N ≤ x_n ≤ l,

and so x_n → l as n → ∞. ∎

Lemma 4.4. Any sequence in R has a monotone subsequence.

Proof. Call x_n a peak term if it is greater than or equal to all later terms.
If there are infinitely many peak terms, then they obviously form a decreasing subsequence. On the other hand, if there are only finitely many peak terms, then there is a last one x_{n_0} (or none at all), and then every later term is strictly less than some other still later term. We choose any n_1 greater than n_0, and then we can choose n_2 > n_1 so that x_{n_1} < x_{n_2}, etc. Therefore, in this case we can choose a strictly increasing subsequence. We have thus shown that any sequence {x_n} in R has either a decreasing subsequence or a strictly increasing subsequence. ∎

Putting these two lemmas together, we have:

Theorem 4.3. Every bounded sequence in R has a convergent subsequence.

Now we can generalize to R^n by induction.

Theorem 4.4. Every bounded sequence in R^n has a convergent subsequence (using any product norm, say || ||_1).

Proof. The above theorem is the case n = 1. Suppose then that the theorem is true for n − 1, and let {x^m}_m be a bounded sequence in R^n. Thinking of R^n as R^{n−1} × R, we have x^m = <y^m, z_m>, and {y^m}_m is bounded in R^{n−1}, because if x = <y, z>, then ||x||_1 = ||y||_1 + |z| ≥ ||y||_1. Therefore, there is a subsequence {y^{m_i}}_i converging to some y in R^{n−1}, by the inductive hypothesis. Since {z_{m_i}}_i is bounded in R, it has a subsequence {z_{m_{i(j)}}}_j converging to some z in R. Of course, the corresponding subsubsequence {y^{m_{i(j)}}}_j still converges to y in R^{n−1}, and then {x^{m_{i(j)}}}_j converges to x = <y, z> in R^n = R^{n−1} × R, since its two component sequences now converge to y and z, respectively. We have thus found a convergent subsequence of {x^m}. ∎

Theorem 4.5. If A is a bounded closed subset of R^n, then A is sequentially compact (in any product norm).

Proof. If {x_n} ⊂ A, then there is a subsequence {x_{n(i)}}_i converging to some x in R^n, by Theorem 4.4, and x is in A, since A is closed. Thus A is compact. ∎

We can now fill in one of the minor gaps in the last chapter.

Theorem 4.6. All norms on R^n are equivalent.

Proof.
It is sufficient to prove that an arbitrary norm || || is equivalent to || ||_1. Setting a = max_j {||δ^j||}, we have

    ||x|| = ||Σ_1^n x_j δ^j|| ≤ Σ_1^n |x_j| ||δ^j|| ≤ a Σ_1^n |x_j| = a ||x||_1,

so one of our inequalities is trivial. We also have | ||x|| − ||y|| | ≤ ||x − y|| ≤ a||x − y||_1, so ||x|| is a continuous function on R^n with respect to the one-norm. Now the unit one-sphere S = {x : ||x||_1 = 1} is closed and bounded and so compact (in the one-norm). The restriction of the continuous function ||x|| to this compact set S has a minimum value m, and m cannot be zero because S does not contain the zero vector. We thus have ||x|| ≥ m||x||_1 on S, and so ||x|| ≥ m||x||_1 on R^n, by homogeneity. Altogether we have found positive constants a and m such that m|| ||_1 ≤ || || ≤ a|| ||_1. ∎

Composing with a coordinate isomorphism, we see that all norms on any finite-dimensional vector space are equivalent.

Corollary. If M is a finite-dimensional subspace of the normed linear space V, then M is a closed subspace of V.

Proof. Suppose that {ξ_n} ⊂ M and ξ_n → α ∈ V. We have to show that α is in M. Now {ξ_n} is a bounded subset of M, and its closure in M is therefore sequentially compact, by the theorem. Therefore, some subsequence converges to a point β in M as well as to α, and so α = β ∈ M. ∎

EXERCISES

4.1 Prove by induction that if f: Z+ → Z+ is such that f(n + 1) > f(n) for all n, then f(n) ≥ n for all n.

4.2 Prove carefully that if x_n → a as n → ∞, then x_{n(m)} → a as m → ∞ for any subsequence. The above exercise is useful in this proof.

4.3 Prove that if {x_n} is an increasing sequence in R (x_{n+1} ≥ x_n for all n), and if {x_n} has a convergent subsequence, then {x_n} converges.

4.4 Give a more detailed version of the argument that if the sequence {x_n} does not converge to a, then there is an ε and a subsequence {x_{n_m}}_m such that ρ(x_{n_m}, a) ≥ ε for all m.

4.5
Find a sequence in R having no convergent subsequence.

4.6 Find a nonconvergent sequence in R such that the set of limit points of convergent subsequences consists exactly of the number 1.

4.7 Show that there is a sequence {x_n} in (0, 1] such that for any y in (0, 1] there is a subsequence {x_{n_i}} converging to y.

4.8 Show that the set of limits of convergent subsequences of a sequence {x_n} in a metric space X is a closed subset of X.

4.9 Prove Theorem 4.1.

4.10 Prove that the Cartesian product of two sequentially compact metric spaces is sequentially compact. (The proof is essentially in the text.)

4.11 A metric space is boundedly compact if every closed bounded set is sequentially compact. Prove that the Cartesian product of two boundedly compact metric spaces is boundedly compact (using, say, the maximum metric on the product space).

4.12 Prove that the sum A + B of two sequentially compact subsets of a normed linear space is sequentially compact.

4.13 Prove that the sum A + B of a closed set and a compact set is closed.

4.14 Show by an example in R that the sum of two closed sets need not be closed.

4.15 Let {C_n} be a decreasing sequence (C_{n+1} ⊂ C_n for all n) of nonempty closed subsets of a sequentially compact metric space S. Prove that ∩_1^∞ C_n is nonempty.

4.16 Give an example of a decreasing sequence {C_n} of nonempty closed subsets of a metric space such that ∩_1^∞ C_n = ∅.

4.17 Suppose the metric space S has the property that every decreasing sequence {C_n} of nonempty closed subsets of S has nonempty intersection. Prove that then S must be sequentially compact. [Hint: Given any sequence {x_i} ⊂ S, let C_n be the closure of {x_i : i ≥ n}.]

4.18 Let A be a sequentially compact subset of a normed linear space V, and let B be obtained from A by drawing all line segments from points of A to the origin (that is, B = {ta : a ∈ A and t ∈ [0, 1]}). Prove that B is compact.

4.19
Show by applying a compactness argument to Lemma 1.5 that if N is a proper closed subspace of a finite-dimensional vector space V, then there exists α in V such that ||α|| = ρ(α, N) = 1.

5. COMPACTNESS AND UNIFORMITY

The word 'uniform' is frequently used as a qualifying adjective in mathematics. Roughly speaking, it concerns a "point" property P(y) which may or may not hold at each point y in a domain A and whose definition involves an existential quantifier. A typical form for P(y) is

    (∀ε)(∃d)Q(y, ε, d).

Thus, if P(y) is 'f is continuous at y', then P(y) has the form (∀ε)(∃δ)Q(y, ε, δ). The property holds on A if it holds for all y in A, that is, if

    (∀y)[(∀ε)(∃d)Q(y, ε, d)].

Here d will, in general, depend both on y and ε; if either y or ε is changed, the corresponding d may have to be changed. Thus in the definition of continuity δ depends both on ε and on the point y at which continuity is being asserted. The property is said to hold uniformly on A, or uniformly in y, if a value d can be found that is independent of y (but still dependent on ε). Thus the property holds uniformly in y if

    (∀ε)(∃d)(∀y)Q(y, ε, d);

the uniformity of the property is expressed in the reversal of the order of the quantifiers (∀y) and (∃d). Thus f is uniformly continuous on A if

    (∀ε)(∃δ)(∀y, x)[ρ(y, x) < δ ⟹ ρ(f(y), f(x)) < ε].

Now δ is independent of the point at which continuity is being asserted, but still dependent on ε, of course.

We saw in Section 14 of the last chapter how much more powerful the point condition of continuity becomes when it holds uniformly. In the remainder of this section we shall discuss some other uniform notions, and shall see that the uniform property is often implied by the point property if the domain over which it holds is sequentially compact.

The formal statement forms we have examined above show clearly the distinction between uniformity and nonuniformity. However, in writing an argument, we would generally follow our more idiomatic practice of dropping out
However, in writing an argument, we would generally follow our more idiomatie practice of dropping out 45 COMPACTNESS AND UNIFORMITY — 211 the inside universal quantifier. For example, a sequence of functions {f,} C W4 converges pointwise to f: A —+ W if it converges to f at every point p in A, that, is, if for every point p in A and for every ¢ there is an N such that n>N = p(falp),fp)) <€ ‘The sequence converges uniformly on A if an N exists that is independent of p, that is, if for every ¢ there is an N such that >N = plfalphS)) S€ — forevery pin A. When p(é, 1) = [|€ — ml, saying that p(fa(p), 1(p)) < € for all p is the same as saying that ||f, — file < €. Thus fy — J uniformly if and only if fu — file 0; this is why the norm | leis ealled the uniform norm. Pointwise convergence does not imply uniform convergence. ‘Thus fa(z) = 2" on A = (0, 1) converges pointwise to the zero funetion but does not converge ‘uniformly. Nor does continuity on A imply uniform continuity. ‘The function f(z) 1/x is continuous on (0, 1) but is not uniformly continuous. ‘The fun: sin (1/2) is eontinuous and bounded on (0, 1) but is not uniformly continuous. Compactness changes the latter situation, however. ‘Theorem 5.1. If fis continuous on A and A is compact, then fis uniformly continuous on A Proof. ‘This is one of our “aulomatie” negation proofs. Uniform continuity (UC) is the property (Wer \(as?Y(¥2, ¥*[Ole, 9) < B= aLHe) SW)) < 4. Therefore, ~UC e+ (36)(V8)(32, la(, v) < band p(f(z), f@)) 26. Take 5 = I/n, with corresponding x, and ya. Thus, for alln, p(2q, Yx) < 1/n and p(SGtn), Sly) 2 €, where € is 8 fixed positive number. ‘Now {24} has a eon- vvergent subsequence, say yj) 2, by the compactness of A. Since evasion Eacn) < Wir we also have Yn) > 2. By the continuity of fat x, fle m0) Sumes)) $ wICEncas SG) + C412), Smo) 0, which contradicts p (f(a), f@ao)) 2 € This completes the proof by nega- tion. 
The compactness of A does not, however, automatically convert the pointwise convergence of a sequence of functions on A into uniform convergence. The "piecewise linear" functions f_n: [0, 1] → [0, 1] defined by the graph shown in Fig. 4.1 converge pointwise to zero on the compact domain [0, 1], but the convergence is not uniform. (However, see Exercise 5.4.)

Fig. 4.1    Fig. 4.2

We pointed out earlier that the distance between a pair of disjoint closed sets may be zero. However, if one of the closed sets is compact, then the distance must be positive.

Theorem 5.2. If A and C are disjoint nonempty closed sets, one of which is compact, then ρ(A, C) > 0.

Proof. The proof is by automatic contradiction, and is left to the reader.

This result is again a uniformity condition. Saying that a set A is disjoint from a closed set C is saying that (∀x ∈ A)(∃ε)(B_ε(x) ∩ C = ∅). Saying that ρ(A, C) > 0 is saying that (∃ε)(∀x ∈ A)(B_ε(x) ∩ C = ∅).

As a last consequence of sequential compactness, we shall establish a very powerful property which is taken as the definition of compactness in general topology. First, however, we need some preparatory work. If A is a subset of a metric space S, the r-neighborhood of A, B_r[A], is simply the union of all the balls of radius r about points of A:

    B_r[A] = ∪{B_r(a) : a ∈ A} = {x : (∃a ∈ A)(ρ(x, a) < r)}.

A subset A ⊂ S is r-dense in S if S ⊂ B_r[A], that is, if every point of S is at a distance less than r from some point of A.

A subset A of a metric space S is dense in S if Ā = S. This is the same as saying that for every point p in S there are points of A arbitrarily close to p. The set Q of all rational numbers is a dense subset of the real number system R, because any irrational real number x can be arbitrarily closely approximated by rational numbers. Since we do arithmetic in decimal notation, it is customary to use decimal approximations. …

… We take ξ_n = α_n/||α_n||, and we have a sequence {ξ_n} ⊂ B_1 such that ||ξ_m − ξ_n|| > 1/2 if m ≠ n.
Then no ball of radius 1/2 can contain more than one ξ_n, proving the lemma. ∎

For a concrete example, let V be C([0, 1]), and let f_n be the "peak" function sketched in Fig. 4.2, where the three points on the base are 1/(2n + 2), 1/(2n + 1), and 1/2n. Then f_{n+1} is "disjoint" from f_n (that is, f_{n+1}f_n = 0), and we have ||f_n||_∞ = 1 for all n and ||f_m − f_n||_∞ = 1 if m ≠ n. Thus no ball of radius 1/2 can contain more than one of the functions f_n, and accordingly the closed unit ball in V cannot be covered by a finite number of balls of radius 1/2.

Lemma 5.2. Every sequentially compact set A is totally bounded.

Proof. If A is not totally bounded, then there exists an r such that no finite subset F is r-dense in A. We can then define a sequence {p_n} inductively by taking p_1 as any point of A, p_2 as any point of A not in B_r(p_1), and p_{n+1} as any point of A not in B_r[{p_1, ..., p_n}] = ∪_1^n B_r(p_i). Then {p_n} is a sequence in A such that ρ(p_i, p_j) ≥ r for all i ≠ j. But this sequence can have no convergent subsequence. Thus, if A is not totally bounded, then A is not sequentially compact, proving the lemma. ∎

Corollary. A normed linear space V is finite-dimensional if and only if its closed unit ball is sequentially compact.

Proof. This follows from Theorem 4.4 in one direction and from the above two lemmas in the other direction. ∎

Lemma 5.3. Suppose that A is sequentially compact and that {E_i : i ∈ I} is an open covering of A (that is, {E_i} is a family of open sets and A ⊂ ∪_I E_i). Then there exists an r > 0 with the property that for every point p in A the ball B_r(p) is included in some E_i.

Proof. Otherwise, for every r there is a point p in A such that B_r(p) is not a subset of any E_i. Take r = 1/n, with corresponding sequence {p_n}. Thus B_{1/n}(p_n) is not a subset of any E_i. Since A is sequentially compact, {p_n} has a convergent subsequence, p_{n(m)} → p as m → ∞. Since {E_i} covers A, some E_i contains p, and then B_ε(p) ⊂ E_i for some ε > 0, since E_i is open.
Taking m large enough so that 1/m < ε/2 and also ρ(p_{n(m)}, p) < ε/2, we have

    B_{1/n(m)}(p_{n(m)}) ⊂ B_ε(p) ⊂ E_i,

contradicting the fact that B_{1/n(m)}(p_{n(m)}) is not a subset of any E_i. The lemma has thus been proved. ∎

Theorem 5.3. If ℱ is an open covering of a sequentially compact set A, then some finite subfamily of ℱ covers A.

Proof. By the lemma immediately above there exists an r > 0 such that for every p in A the ball B_r(p) lies entirely in some set of ℱ, and by the first lemma there exist p_1, ..., p_n in A such that A ⊂ ∪_1^n B_r(p_i). Taking corresponding sets E_i in ℱ such that B_r(p_i) ⊂ E_i for i = 1, ..., n, we clearly have A ⊂ ∪_1^n E_i. ∎

In general topology, a set A such that every open covering of A includes a finite covering is said to be compact or to have the Heine-Borel property. The above theorem says that in a metric space every sequentially compact set is compact. We shall see below that the reverse implication also holds, so that the two notions are in fact equivalent on a metric space.

Theorem 5.4. If A is a compact metric space, then A is sequentially compact.

Proof. Let {x_n} be any sequence in A, and let ℱ be the collection of open balls B that contain only finitely many x_n. If ℱ were to cover A, then by compactness A would be the union of finitely many balls in ℱ, and this would clearly imply that the whole of A contains only finitely many x_n, contradicting the fact that {x_n} is an infinite sequence. Therefore, ℱ does not cover A, and so there is a point x in A such that every ball about x contains infinitely many of the x_i. More precisely, every ball about x contains x_i for infinitely many indices i. It can now be safely left to the reader to see that a subsequence of {x_n} converges to x. ∎

EXERCISES

5.1 Show that f_n(x) = x^n does not converge uniformly on (0, 1).

5.2 Show that f(x) = 1/x is not uniformly continuous on (0, 1).

5.3 Define the notion of a function K: X × Y → Y being uniformly Lipschitz in its second variable over its first variable.
5.4 Let S be a sequentially compact metric space, and let {f_n} be a sequence of continuous real-valued functions on S that decreases pointwise to zero (that is, {f_n(p)} is a decreasing sequence in R and f_n(p) → 0 as n → ∞ for each p in S). Prove that the convergence is uniform. (Try to apply Exercise 4.15.)

5.5 Restate the corollaries of Theorems 15.1 and 15.2 of Chapter 3, employing the weaker hypotheses that suffice by virtue of Theorem 5.1 of the present section.

5.6 Prove Theorem 5.2.

5.7 Prove that if A is an r-dense subset of a set X in a normed linear space V, and B is an s-dense subset of a set Y ⊂ V, then A + B is (r + s)-dense in X + Y. Conclude that the sum of two totally bounded subsets of V is totally bounded.

5.8 Suppose that the n points {p_i}_1^n are r-dense in a metric space X. Let A be any subset of X. Show that A has a subset of at most n points that is 2r-dense in A.

6. EQUICONTINUITY

For a fixed m > 0, let ℱ be a collection of functions f from (0, 1) to (0, 1) such that f′ exists and |f′| ≤ m on (0, 1). Then |f(x) − f(y)| ≤ m|x − y|, by the ordinary mean-value theorem. Therefore, given any ε, we can take δ = ε/m and have, for every f in ℱ,

    |x − y| < δ ⟹ |f(x) − f(y)| < ε.

… Given ε > 0, choose δ so that for all f in ℱ and all p_1, p_2 in A, ρ(p_1, p_2) < δ ⟹ ρ(f(p_1), f(p_2)) < ε/4. Let D be a finite subset of A which is δ-dense in A, and let E be a finite subset of B which is (ε/4)-dense in B. Let G be the set E^D of all functions on D into E. G is of course finite; in fact, #G = n^m, where m = #D and n = #E. Finally, for each g ∈ G let ℱ_g be the set of all functions f ∈ ℱ such that ρ(f(p), g(p)) < ε/4 for all p in D. …

7. COMPLETENESS

If ρ(x_m, x_n) → 0 as m, n → ∞, then {x_n} clearly ought to converge to a limit. It may not, however; the desired limit point may be missing from the space. If a metric space S is such that every sequence which ought to converge actually does converge, then we say that S is complete. We now make this notion precise.

Definition. {x_n} is a Cauchy sequence if for every ε there is an N such that

    m > N and n > N ⟹ ρ(x_m, x_n) < ε.

Lemma 7.1. If {x_n} is convergent, then {x_n} is Cauchy.

Proof.
Given ε, we choose N such that n > N ⟹ ρ(x_n, a) < ε/2, where a is the limit of the sequence. Then if m and n are both greater than N, we have

    ρ(x_m, x_n) ≤ ρ(x_m, a) + ρ(a, x_n) < ε/2 + ε/2 = ε. ∎

Lemma 7.2. If {x_n} is Cauchy, and if a subsequence is convergent, then {x_n} itself converges.

Proof. Suppose that x_{n(i)} → a as i → ∞. Given ε, we take N so that m, n > N ⟹ ρ(x_n, x_m) < ε. Because x_{n(i)} → a as i → ∞, we can choose an i such that n(i) > N and ρ(x_{n(i)}, a) < ε. Thus if m > N, we have

    ρ(x_m, a) ≤ ρ(x_m, x_{n(i)}) + ρ(x_{n(i)}, a) < 2ε,

and so x_n → a. ∎

Actually, of course, if m, n > N ⟹ ρ(x_m, x_n) < ε, and if x_n → a, then for any m > N it is true that ρ(x_m, a) ≤ ε. Why?

Lemma 7.3. If A and B are metric spaces, and if T is a Lipschitz mapping from A to B, then T carries Cauchy sequences in A into Cauchy sequences in B. This is true in particular if A and B are normed linear spaces and T is an element of Hom(A, B).

Proof. Let {x_n} be a Cauchy sequence in A, and set y_n = T(x_n). Given ε, choose N so that m, n > N ⟹ ρ(x_m, x_n) < ε/C, where C is a Lipschitz constant for T. Then

    m, n > N ⟹ ρ(y_m, y_n) = ρ(T(x_m), T(x_n)) ≤ Cρ(x_m, x_n) < Cε/C = ε. ∎

This lemma has a substantial generalization, as follows.

Theorem 7.1. If A and B are metric spaces, {x_n} is Cauchy in A, and F: A → B is uniformly continuous, then {F(x_n)} is Cauchy in B.

Proof. The proof is left as an exercise.

The student should try to acquire a good intuitive feel for the truth of these lemmas, after which the technical proofs become more or less obvious.

Definition. A metric space A is complete if every Cauchy sequence in A converges to a limit in A. A complete normed linear space is called a Banach space.

We are now going to list some important examples of Banach spaces. In each case a proof is necessary, so the list becomes a collection of theorems.

Theorem 7.2. R is complete.

Proof. Let {x_n} be Cauchy in R. Then {x_n} is bounded (why?) and so, by Theorem 4.3, has a convergent subsequence. Lemma 7.2 then implies that {x_n} is convergent. ∎

Theorem 7.3.
If A is a complete metric space, and if f is a continuous bijective mapping from A to a metric space B such that f⁻¹ is Lipschitz continuous, then B is complete. In particular, if V is a Banach space, and if T in Hom(V, W) is invertible, then W is a Banach space.

Proof. Suppose that {y_n} is a Cauchy sequence in B, and set x_n = f⁻¹(y_n) for all n. Then {x_n} is Cauchy in A, by Lemma 7.3, and so converges to some x in A, since A is complete. But then y_n = f(x_n) → f(x), because f is continuous. Thus every Cauchy sequence in B is convergent and B is complete. ∎

The Banach space assertion is a special case, because the invertibility of T means that T⁻¹ exists in Hom(W, V) and hence is a Lipschitz mapping.

Corollary. If p and q are equivalent norms on V and <V, p> is complete, then so is <V, q>.

Theorem 7.4. If V_1 and V_2 are Banach spaces, then so is V_1 × V_2.

Proof. If {<ξ_n, η_n>} is Cauchy, then so are each of {ξ_n} and {η_n} (by Lemma 7.3, since the projections π_i are bounded). Then ξ_n → α and η_n → β for some α ∈ V_1 and β ∈ V_2. Thus <ξ_n, η_n> → <α, β> in V_1 × V_2. (See Theorem 3.4.) ∎

Corollary 1. If {V_i}_1^n are Banach spaces, then so is Π_1^n V_i.

Corollary 2. Every finite-dimensional vector space is a Banach space (in any norm).

Proof. R^n is complete (in the one-norm, say) by Theorem 7.2 and Corollary 1 above. We then impose a one-norm on V by choosing a basis, and apply the corollary of Theorem 7.3 to pass to any other norm. ∎

Theorem 7.5. Let W be a Banach space, let A be any set, and let ℬ(A, W) be the vector space of all bounded functions from A to W with the uniform norm

    ||f||_∞ = lub {||f(a)|| : a ∈ A}.

Then ℬ(A, W) is a Banach space.

Proof. Let {f_n} be Cauchy, and choose any a ∈ A. Since ||f_n(a) − f_m(a)|| ≤ ||f_n − f_m||_∞, it follows that {f_n(a)} is Cauchy in W and so convergent. Define g: A → W by g(a) = lim f_n(a) for each a ∈ A. We have to show that g is bounded and that f_n → g.
Given ε, we choose N so that m, n > N ⟹ ||f_m − f_n||_∞ < ε. Then

    ||f_m(a) − g(a)|| = lim_{n→∞} ||f_m(a) − f_n(a)|| ≤ ε.

Thus, if m > N, then ||f_m(a) − g(a)|| ≤ ε for all a ∈ A, and hence ||f_m − g||_∞ ≤ ε. This implies both that f_m − g ∈ ℬ(A, W), and so g = f_m − (f_m − g) ∈ ℬ(A, W), and that f_m → g in the uniform norm. ∎

Theorem 7.6. If V is a normed linear space and W is a Banach space, then Hom(V, W) is a Banach space.

The method of proof is identical to that of the preceding theorem, and we leave it as an exercise. Boundedness here has a different meaning, but it is used in essentially the same way. One additional fact has to be established, namely, that the limit map (corresponding to g in the above theorem) is linear.

Theorem 7.7. A closed subset of a complete metric space is complete. A complete subset of any metric space is closed.

Proof. The proof is left to the reader.

It follows from Theorem 7.7 that a complete metric space A is absolutely closed, in the sense that no matter how we extend A to a larger metric space B, A is always a closed subset of B. Actually, this property is equivalent to completeness, for if A is not complete, then a very important construction of metric space theory shows that A can be completed. That is, we can construct a complete metric space B which includes A. Now, if A is not complete, then the closure of A in B, being complete, is different from A, and A is not absolutely closed. See Exercises 7.21 through 7.23 for a construction of the completion of a metric space. The completion of a normed linear space is of course a Banach space.

Theorem 7.8. In the context of Theorem 7.5, let A be a metric space, let 𝒞(A, W) be the space of continuous functions from A to W, and set 𝒞_b(A, W) = ℬ(A, W) ∩ 𝒞(A, W). Then 𝒞_b is a closed subspace of ℬ.

Fig. 4.3

Proof. We suppose that {f_n} ⊂ 𝒞_b and that ||f_n − g||_∞ → 0, where g ∈ ℬ. We have to show that g is continuous.
This is an application of a much used "up, over, and down" argument, which can be schematically indicated as in Fig. 4.3. Given ε, we first choose an n such that ||f_n − g||_∞ < ε/3. Consider now any a ∈ A. Since f_n is continuous at a, there exists a δ such that

    ρ(x, a) < δ ⟹ ||f_n(x) − f_n(a)|| < ε/3.

Then

    ρ(x, a) < δ ⟹ ||g(x) − g(a)|| ≤ ||g(x) − f_n(x)|| + ||f_n(x) − f_n(a)|| + ||f_n(a) − g(a)|| < ε/3 + ε/3 + ε/3 = ε.

Thus g is continuous at a for every a ∈ A, and so g ∈ 𝒞_b. ∎

This important classical result is traditionally stated as follows: The limit of a uniformly convergent sequence of continuous functions is continuous.

Remark. The proof was slightly more general. We actually showed that if f_n → f uniformly, and if each f_n is continuous at a, then f is continuous at a.

Corollary. 𝒞_b(A, W) is a Banach space.

Theorem 7.9. If A is a sequentially compact metric space, then A is complete.

Proof. A Cauchy sequence in A has a subsequence converging to a limit in A, and therefore, by Lemma 7.2, itself converges to that limit. Thus A is complete. ∎

In Section 5 we proved that a compact set is also totally bounded. It can be shown, conversely, that a complete, totally bounded set A is sequentially compact, so that these two properties together are equivalent to compactness. The crucial fact is that if A is totally bounded, then every sequence in A has a Cauchy subsequence. If A is also complete, this Cauchy subsequence will converge to a point of A. Thus the fact that total boundedness and completeness together are equivalent to compactness follows directly from the next lemma.

Lemma 7.4. If A is totally bounded, then every sequence in A has a Cauchy subsequence.

Proof. Let {p_m} be any sequence in A. Since A can be covered by a finite number of balls of radius 1, at least one ball in such a covering contains infinitely many of the points {p_m}.
More precisely, there exists an infinite set M_1 ⊂ Z+ such that the set {p_m : m ∈ M_1} lies in a single ball of radius 1. Suppose that M_1, ..., M_n ⊂ Z+ have been defined so that M_{i+1} ⊂ M_i for i = 1, ..., n − 1, each M_i is infinite, and {p_m : m ∈ M_i} is a subset of a ball of radius 1/i for i = 1, ..., n. Since A can be covered by a finite family of balls of radius 1/(n + 1), at least one covering ball contains infinitely many points of the set {p_m : m ∈ M_n}. More precisely, there exists an infinite set M_{n+1} ⊂ M_n such that {p_m : m ∈ M_{n+1}} is a subset of a ball of radius 1/(n + 1). We thus define an infinite sequence {M_n} of subsets of Z+ having the above properties.

Now choose m_1 ∈ M_1, m_2 ∈ M_2 so that m_2 > m_1, and, in general, m_{n+1} ∈ M_{n+1} so that m_{n+1} > m_n. Then the subsequence {p_{m_n}}_n is Cauchy. For given ε, we can choose n so that 1/n < ε/2. Then

    i, j > n ⟹ m_i, m_j ∈ M_n ⟹ ρ(p_{m_i}, p_{m_j}) ≤ 2(1/n) < ε.

This proves the lemma, and our theorem is a corollary. ∎

Theorem 7.10. A metric space S is sequentially compact if and only if S is totally bounded and complete.

The next three sections will be devoted to applications of completeness to the calculus, but before embarking on these vital matters we should say a few words about infinite series. As in the ordinary calculus, if {ξ_n} is a sequence in a normed linear space V, we say that the series Σ ξ_i converges and has the sum α, and write Σ_1^∞ ξ_i = α, if the sequence of partial sums converges to α. This means that σ_n → α as n → ∞, where σ_n is the finite sum Σ_1^n ξ_i for each n. We say that Σ ξ_i converges absolutely if the series of norms Σ ||ξ_i|| converges in R. This is an abuse of language unless it is true that every absolutely convergent series converges, and the importance of the notion stems from the following theorem.

Theorem 7.11. If V is a Banach space, then every absolutely convergent series in V is convergent.

Proof. Let Σ ξ_i be absolutely convergent.
This means that Σ ||ξ_i|| converges in R, i.e., that the sequence {s_n} converges in R, where s_n = Σ_1^n ||ξ_i||. If m < n, then

    ||σ_n − σ_m|| = ||Σ_{m+1}^n ξ_i|| ≤ Σ_{m+1}^n ||ξ_i|| = s_n − s_m,

so {σ_n} is Cauchy in V because {s_n} is Cauchy in R, and therefore {σ_n} converges, since V is complete. ∎

… so that ||dF^n_α − dF^m_α|| ≤ ε for all m, n > N and for all α in B. It then follows from the mean-value theorem for differentials that

    ||ΔF^n_β(ξ) − ΔF^m_β(ξ) − (dF^n_β(ξ) − dF^m_β(ξ))|| ≤ 2ε||ξ||

for all m, n > N and all ξ such that β + ξ ∈ B. Letting n → ∞ and regrouping, we have

    ||ΔF_β(ξ) − T(ξ) − (ΔF^m_β(ξ) − dF^m_β(ξ))|| ≤ 2ε||ξ||

for all such ξ. But, by the definition of dF^m_β there is a δ such that

    ||ΔF^m_β(ξ) − dF^m_β(ξ)|| ≤ ε||ξ||

when ||ξ|| < δ. Putting these last two inequalities together, we see that

    ||ξ|| < δ ⟹ ||ΔF_β(ξ) − T(ξ)|| ≤ 3ε||ξ||.

Thus F is differentiable at β and dF_β = T. ∎

The remaining proofs are left as a set of exercises.

Lemma 8.1. Multiplication on a Banach algebra A is differentiable (from A × A to A). If we let p be the product function, so that p(x, y) = xy, then dp_{<a,b>}(x, y) = ay + xb.

Lemma 8.2. Let A be a commutative Banach algebra, and let p be the monomial function p(x) = ax^n. Then p is everywhere differentiable and dp_y(x) = nay^{n−1}x.

Lemma 8.3. If {||a_n||r^n} is a bounded sequence in R, then {n||a_n||s^{n−1}} is bounded for any 0 < s < r. …

… x, y ↦ xy is a bounded bilinear map.)

8.5 Prove Lemma 8.2 by making a direct Δ-estimate from the binomial expansion, as in the elementary calculus.

8.6 Prove Lemma 8.2 by induction from Lemma 8.1.

8.7 Let A be any Banach algebra. Prove that p: x ↦ x^3 is differentiable and that dp_a(x) = xa^2 + axa + a^2x.

8.8 Prove by induction that if q(x) = x^n, then q is differentiable and

    dq_a(x) = Σ_{i=0}^{n−1} a^i x a^{n−1−i}.

Deduce Lemma 8.2 as a corollary.

8.9 Let A be any Banach algebra. Prove that r: x ↦ x⁻¹ is everywhere differentiable on the open set U of invertible elements and that dr_a(x) = −a⁻¹xa⁻¹. (Hint: Examine the proofs of Theorems 8.1 and 8.2.)

8.10 Let A be an open subset of a normed linear space V, and let F and G be mappings from A to a Banach algebra X that are differentiable at α. Prove that the product mapping FG is differentiable at α and that d(FG)_α = F(α) dG_α + dF_α G(α). Does it follow that d(F²)_α = 2F(α) dF_α?
8.11 Continuing the above exercise, show that if X is a commutative Banach algebra, then d(Fⁿ)_α = nFⁿ⁻¹(α) dF_α.

8.12 Let F : A → X be a differentiable map from an open set A of a normed linear space to a Banach algebra X, and suppose that the element F(ξ) is invertible in X for every ξ in A. Prove that the map G : ξ → [F(ξ)]⁻¹ is differentiable and that dG_α(ξ) = −F(α)⁻¹ dF_α(ξ) F(α)⁻¹. Show also that if F is a parametrized arc (A = I ⊂ R), then G′(α) = −F(α)⁻¹F′(α)F(α)⁻¹.

8.13 Prove Lemma 8.3.

8.14 Prove Theorem 8.5 by showing that Lemma 8.3 makes Theorem 8.4 applicable.

8.15 Show that in Theorem 8.4 the convergence of Fⁿ to F needs only to be assumed at one point, provided we know that the codomain space W is a Banach space.

8.16 We want to prove the law of exponents for the exponential function on a commutative Banach algebra. Show first that (exp(−x))(exp x) = e by applying Exercise 7.18 of Chapter 3, the above Exercise 8.10, and the fact that d exp_α(x) = (exp α)x.

8.17 Show that if X is a commutative Banach algebra and F : X → X is a differentiable map such that dF_β(ξ) = ξF(β), then F(β) = B exp β for some constant B. [Consider the differential of F(β) exp(−β).]

8.18 Now set F(ξ) = exp(ξ + η) and prove from the above exercise that exp(ξ + η) = exp(ξ) exp(η). You will also need the fact that exp 0 = e.

8.19 Let x be a nilpotent element in a commutative Banach algebra X; that is, xᵖ = 0 for some positive integer p. Show by an elementary estimate based on the binomial expansion that if ‖z‖ < 1, then ‖(z + x)ⁿ‖ ≤ knᵖ‖z‖ⁿ⁻ᵖ for n > p. The series of positive terms Σ nᵖrⁿ converges for r < 1 (by the ratio test). Show, therefore, that the series for log(e − (z + x)) and for (e − (z + x))⁻¹ converge when ‖z‖ < 1.

8.20 Continuing the above exercise, show that F(y) = log(e − y) is defined and differentiable on the ball ‖y − x‖ < 1 and that dF_y(ξ) = −(e − y)⁻¹ξ.
Show, therefore, that exp(log(e − y)) = e − y on this ball, either by applying the inverse-mapping theorem or by applying the composite-function rule for differentiation. Conclude that for every nilpotent element x in X there exists a w in X such that exp w = e − x.

8.21 Let X₁, …, X_n be Banach algebras. Show that the product Banach space X = Π₁ⁿ X_i becomes a Banach algebra if the product xy = <x₁y₁, …, x_ny_n> is defined componentwise and if the maximum norm is used on X.

8.22 In the above situation the projections π_i have now become bounded algebra homomorphisms. In fact, just as in our original vector definitions on a product space, our definition of multiplication on X was determined by the requirement that π_i(xy) = π_i(x)π_i(y) for all i. State and prove an algebra theorem analogous to Theorem 3.4 of Chapter 1.

8.23 Continuing the above discussion, suppose that the series Σ a_nxⁿ converges in X, with sum y. Show that then Σ (a_n)_i(x_i)ⁿ converges in X_i to y_i for each i, where, of course, y = <y₁, …, y_n>. Conclude that eˣ = <e^{x₁}, …, e^{x_n}> for any x = <x₁, …, x_n> in X.

8.24 Define the sine and cosine functions on a commutative Banach algebra, and show that sin′ = cos, cos′ = −sin, and sin² + cos² = e.

9. THE CONTRACTION MAPPING FIXED-POINT THEOREM

In this section we shall prove the very simple and elegant fixed-point theorem for contraction mappings, and then shall use it to complete the proof of the implicit-function theorem. Later, in Chapter 6, it will be the basis of our proof of the fundamental existence and uniqueness theorem for ordinary differential equations. The section concludes with a comparison of the iterative procedure of the fixed-point theorem and that of Newton's method.
A mapping K from a metric space X to itself is a contraction if it is a Lipschitz mapping with constant less than 1; that is, if there is a constant C with 0 < C < 1 such that ρ(K(ξ), K(η)) ≤ Cρ(ξ, η) for all ξ, η in X. … if m > n, then

ρ(x_m, x_n) ≤ Σ_{i=n}^{m−1} ρ(x_{i+1}, x_i) ≤ Σ_{i=n}^{m−1} Cⁱs ≤ Cⁿs/(1 − C),

and Cⁿ → 0 as n → ∞, because C < 1. Since X is complete, {x_n} converges to some a in X, and it then follows that K(a) = lim K(x_n) = lim x_{n+1} = a, so that a is a fixed point. ∎

… to choose δ so that if ρ(s, t) < δ, then the distance from K(s, p_t) to K(t, p_t) is at most ε. Since K(t, p_t) = p_t, this simply says that the contraction with parameter value s moves p_t a distance at most ε, and so the distance from p_t to the fixed point p_s is at most ε/(1 − C) by Corollary 3. That is, ρ(s, t) < δ ⇒ ρ(p_s, p_t) ≤ ε/(1 − C), where C is the uniform contraction constant, and the mapping s → p_s is accordingly continuous at t. ∎

Combining Corollaries 2 and 4, we have the following theorem.

Theorem 9.2. Let B be a ball in a complete metric space X, let S be any metric space, and let K be a mapping from S × B to X which is a contraction in its second variable uniformly over its first variable and is continuous in its first variable for each value of its second variable. Suppose also that K moves the center of B a distance less than (1 − C)r for every s in S, where r is the radius of B and C is the uniform contraction constant. Then for each s in S there is a unique p_s in B such that K(s, p_s) = p_s, and the mapping s → p_s is continuous from S to B.

We can now complete the proof of the implicit-function theorem.

Theorem 9.3. Let V, W, and X be Banach spaces, let A × B be an open subset of V × W, and let G : A × B → X be continuous and have a continuous second partial differential. Suppose that the point <α, β> in A × B is such that G(α, β) = 0 and dG²_{<α,β>} is invertible. Then there are open balls M and N about α and β, respectively, such that for each ξ in M there is a unique η in N satisfying G(ξ, η) = 0.
The function F thus uniquely defined near <α, β> by the condition G(ξ, F(ξ)) = 0 is continuous.

Proof. Set T = dG²_{<α,β>} and K(ξ, η) = η − T⁻¹(G(ξ, η)). Then K is a continuous mapping from A × B to W such that K(α, β) = β, and K has a continuous second partial differential such that dK²_{<α,β>} = 0. Because dK²_{<ξ,η>} is a continuous function of <ξ, η>, we can choose a product ball M × N about <α, β> on which dK²_{<ξ,η>} is bounded by ½, and we can then decrease the ball M if necessary so that for ξ in M we also have ‖K(ξ, β) − β‖ < r/2, where r is the radius of the ball N. The mean-value theorem for differentials implies that K is a contraction in its second variable with constant ½. The preceding theorem therefore shows that for each ξ in M there is a unique η in N such that K(ξ, η) = η, and the mapping F : ξ → η is continuous. Since K(ξ, η) = η if and only if G(ξ, η) = 0, we are done. ∎

Theorems 8.2 and 9.3 complete the list of ingredients of the implicit-function theorem. (However, see Exercise 9.8.)

We next show, in the other direction, that if a contraction depending on a parameter is continuously differentiable, then the fixed point is a continuously differentiable function of the parameter.

Theorem 9.4. Let V and W be Banach spaces, and let K be a differentiable mapping from an open subset A × B of V × W to W which satisfies the hypotheses of Theorem 9.2. Then the function F from A to B uniquely defined by the equation K(ξ, F(ξ)) = F(ξ) is differentiable.

Proof. The inequality ‖K(ξ, η′) − K(ξ, η″)‖ ≤ C‖η′ − η″‖ is equivalent to ‖dK²_{<ξ,η>}‖ ≤ C for all <ξ, η> in A × B. We now define G by G(ξ, η) = η − K(ξ, η), and observe that dG² = I − dK², so that dG² is invertible by Theorem 8.1. Since G(ξ, F(ξ)) = 0, it follows from Theorem 11.1 of Chapter 3 that F is differentiable and that its differential is obtained by differentiating the above equation. ∎

Corollary. If K is continuously differentiable, then so is F.
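The proof of Theorem 9.1 is constructive, and the iteration behind it is easy to run. Below is a minimal numeric sketch with the hypothetical family K(s, p) = s + (cos p)/2, which is a contraction in p with uniform constant C = ½ by the mean-value theorem; the parameter s plays the role of the first variable in Theorem 9.2.

```python
import math

# K(s, p) = s + cos(p)/2 is, for each parameter s, a contraction in p with
# uniform constant C = 1/2 (|K(s,p) - K(s,q)| <= |p - q|/2, by the MVT),
# so each K(s, .) has a unique fixed point p_s, found by iteration.

def fixed_point(s, p0=0.0, n_iter=60):
    p = p0
    for _ in range(n_iter):
        p = s + math.cos(p) / 2.0   # p_{n+1} = K(s, p_n)
    return p

p_half = fixed_point(0.5)
# p_half satisfies K(0.5, p_half) = p_half to machine precision
residual = abs(0.5 + math.cos(p_half) / 2.0 - p_half)
print(p_half, residual)

# Continuity in the parameter (cf. Corollary 4 and Theorem 9.2):
# nearby s give nearby fixed points, with
#   |p_s - p_t| <= |s - t| / (1 - C) = 2 |s - t|.
p_a, p_b = fixed_point(0.50), fixed_point(0.51)
print(abs(p_a - p_b) <= 2 * 0.01 + 1e-12)
```

Since C = ½, the error after n steps is at most Cⁿ/(1 − C) times the first displacement, so 60 iterations are far more than enough for double precision.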
We should emphasize that the fixed-point theorem not only has the implicit-function theorem as a consequence, but the proof of the fixed-point theorem gives an iterative procedure for actually finding the value of F(ξ), once we know how to compute T⁻¹ (where T = dG²_{<α,β>}). In fact, for a given value of ξ in a small enough ball about α, consider the function G(ξ, ·). If we set K(ξ, η) = η − T⁻¹G(ξ, η), then the inductive procedure η_{n+1} = K(ξ, η_n) becomes

η_{n+1} − η_n = −T⁻¹G(ξ, η_n).  (9.1)

The meaning of this iterative procedure is easily seen by studying the graph of the situation where V = W = R¹. (See Fig. 4.4.) As was proved above, under suitable hypotheses, the series Σ‖η_{n+1} − η_n‖ converges geometrically.

It is instructive to compare this procedure with Newton's method of elementary calculus. There the iterative scheme (9.1) is replaced by

η_{n+1} − η_n = −S_n⁻¹G(ξ, η_n),  (9.2)

where S_n = dG²_{<ξ,η_n>}. (See Fig. 4.5.) As we shall see, this procedure (when it works) converges much more rapidly than (9.1), but it suffers from the disadvantage that we must be able to compute the inverses of an infinite number of linear transformations S_n.

Let us suppress the ξ, which will be fixed in the argument, and consider a map G defined in some neighborhood of the origin in a Banach space. Suppose that G has two continuous differentials. For definiteness, we assume that G is defined in the unit ball B, and we suppose that for each x ∈ B the map dG_x is invertible and, in fact, ‖(dG_x)⁻¹‖ ≤ … , so that … , implying that e^{−k/2}/(1 − e^{−k/4}) < 1. Then (∗∗∗) becomes the requirement ‖G(0)‖ ≤ … .

We end this section with an example of the fixed-point iterative procedure in its simplest context, that of the inverse-mapping theorem. We suppose that H(0) = 0 and that dH₀⁻¹ exists, and we want to invert H near zero, i.e., solve the equation H(η) − ξ = 0 for η in terms of ξ.
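The contrast between (9.1) and (9.2) can be seen in the simplest scalar instance. In the sketch below (our illustrative choice, not the text's example) G(ξ, η) = η² − ξ near <α, β> = <1, 1>, so T = dG²_{<1,1>} = 2; scheme (9.1) reuses the single inverse T⁻¹ = ½ forever, while Newton's scheme (9.2) re-inverts S_n = 2η_n at every step.

```python
# Scheme (9.1) with one fixed inverse versus Newton's scheme (9.2),
# for G(xi, eta) = eta**2 - xi; the solution is eta = sqrt(xi).

xi = 1.21          # solve eta**2 = 1.21, i.e. eta = 1.1
T_inv = 1.0 / 2.0  # inverse of dG(2) at the center point <1, 1>

def simplified(eta):           # scheme (9.1): geometric convergence
    return eta - T_inv * (eta * eta - xi)

def newton(eta):               # scheme (9.2): quadratic convergence
    return eta - (eta * eta - xi) / (2.0 * eta)

e1 = e2 = 1.0                  # both iterations start at beta = 1
errs1, errs2 = [], []
for _ in range(6):
    e1, e2 = simplified(e1), newton(e2)
    errs1.append(abs(e1 - 1.1))
    errs2.append(abs(e2 - 1.1))
print(errs1)   # errors shrink by a roughly constant factor per step
print(errs2)   # errors are roughly squared at each step
```

After six steps the simplified scheme is still converging geometrically, while Newton's iterates have long since hit machine precision, which is the rapid convergence referred to in the text.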
Our theory above tells us that the η corresponding to ξ will be the fixed point of the contraction K(ξ, η) = η − T⁻¹H(η) + T⁻¹(ξ), where T = dH₀. In order to make our example as simple as possible, we shall take H from R² to R² and choose it so that dH₀ = I. Also, in order to avoid indices, we shall use the mongrel notation x = <x, y>, u = <u, v>. Consider the mapping x = H(u) defined by x = u + … , y = … . The Jacobian matrix is clearly the identity at the origin. Moreover, in the expression K(x, u) = x + u − H(u), the difference H(u) − u is just the function J(u) = … . This cancellation of the first-order terms is the practical expression of the fact that in forming K(ξ, η) = η − T⁻¹G(ξ, η), we have acted to make dK² = 0 at the "center point" (the origin here). We naturally start the iteration with u₀ = 0, and then our fixed-point sequence proceeds u₁ = K(x, u₀) = K(x, 0), … , u_{n+1} = K(x, u_n) = x − J(u_n), giving u₁ = x, v₁ = y, u₂ = x − … , v₂ = y − … , and so on. We are guaranteed that this sequence u_n will converge geometrically provided the starting point x is close enough to 0, and it seems clear that these two sequences of polynomials are computing the Taylor series expansions of the inverse functions u(x, y) and v(x, y). We shall ask the reader to prove this in an exercise. The two Taylor series start out … .

EXERCISES

9.1 Let B be a compact subset of a normed linear space such that rB ⊂ B for all r ∈ [0, 1]. Suppose that F : B → B is a Lipschitz mapping with constant 1 (i.e., ‖F(ξ) − F(η)‖ ≤ ‖ξ − η‖ for all ξ, η ∈ B). Prove that F has a fixed point. [Hint: Consider first G = rF for 0 < r < 1.]

… is invertible if ‖dK²_{<ξ,η>}‖ < 1. (Do not be confused by the notation. We merely want to know that S is invertible if ‖I − S‖ < 1.)

9.8 There is a slight discrepancy between the statements of Theorem 11.2 in Chapter 3 and Theorem 9.3. In the one case we assert the existence of a unique continuous mapping from a ball M, and in the other case, from the ball M to the ball N.
Show that the requirement that the range be in N can be dropped by showing that two continuous solutions must agree on M. (Use the point-by-point uniqueness of Theorem 9.3.)

9.9 Compute the expression for dF_ξ from the identity G(ξ, F(ξ)) = 0 in Theorem 9.4, and show that if K is continuously differentiable, then all the maps involved in the solution expression are continuous, so that ξ → dF_ξ is therefore continuous.

9.10 Going back to the example worked out at the end of Section 9, show by induction that the polynomials u_n − u_{n−1} and v_n − v_{n−1} contain no terms of degree less than n.

9.11 Continuing the above exercise, show therefore that the power series defined by taking the terms of degree at most n from u_n is convergent in a ball about 0 and that its sum is the first component u(x, y) of the mapping inverse to H.

9.12 The above conclusions hold generally. Let J = <K, L> be any mapping from a ball about 0 in R² to R² defined by the convergent power series

K(x, y) = Σ a_{ij}xⁱyʲ,  L(x, y) = Σ b_{ij}xⁱyʲ,

in which there are no terms of degree 0 or 1. With the conventions u = <u, v>, x = <x, y>, and u₀ = 0, consider the iterative sequence

u_{n+1} = x − J(u_n).

Make any necessary assumptions about what happens when one power series is substituted in another, and show by induction that u_n − u_{n−1} contains no terms of degree less than n, and therefore that the u_n define a convergent power series whose sum is the function u(x, y) = <u(x, y), v(x, y)> inverse to H in a neighborhood of 0. [Remember that J(η) = H(η) − η.]

9.13 Let A be a Banach algebra, and let x be an element of A of norm less than 1. Show that

Π_{i=0}^∞ (e + x^{2ⁱ}) = (e − x)⁻¹.

This means that if x_n is the partial product Π_{i=0}^{n−1} (e + x^{2ⁱ}), then x_n → (e − x)⁻¹. [Hint: Prove by induction that (e − x)x_n = e − x^{2ⁿ}.] This is another example of convergence at an exponential rate, like Newton's method in the text.

10. THE INTEGRAL OF A PARAMETRIZED ARC

In this section we shall make our final application of completeness.
We first prove a very general extension theorem, and then apply it to the construction of the Riemann integral as an extension of an elementary integral defined for step functions.

Theorem 10.1. Let U be a subspace of a normed linear space V, and let T be a bounded linear mapping from U to a Banach space W. Then T has a uniquely determined extension to a bounded linear transformation S from the closure Ū to W. Moreover, ‖S‖ = ‖T‖.

Proof. Fix α ∈ Ū and choose {ξ_n} ⊂ U so that ξ_n → α. Then {ξ_n} is Cauchy and {T(ξ_n)} is Cauchy (by the lemmas of Section 7), so that {T(ξ_n)} converges to some β ∈ W. If {η_n} is any other sequence in U converging to α, then ξ_n − η_n → 0, T(ξ_n) − T(η_n) = T(ξ_n − η_n) → 0, and so T(η_n) → β also. Thus β is independent of the sequence chosen, and, clearly, β must be the value S(α) at α of any continuous extension S of T. If α ∈ U, then β = lim T(ξ_n) = T(α) by the continuity of T. We thus have S uniquely defined on Ū by the requirement that it be a continuous extension of T.

It remains to be shown that S is linear and bounded by ‖T‖. For any α, β ∈ Ū we choose {ξ_n}, {η_n} ⊂ U so that ξ_n → α and η_n → β. Then xξ_n + yη_n → xα + yβ, so that

S(xα + yβ) = lim T(xξ_n + yη_n) = x lim T(ξ_n) + y lim T(η_n) = xS(α) + yS(β).

Thus S is linear. Finally, ‖S(α)‖ = lim ‖T(ξ_n)‖ ≤ ‖T‖ lim ‖ξ_n‖ = ‖T‖ · ‖α‖. Thus ‖T‖ is a bound for S, and, since S extends T, ‖S‖ = ‖T‖. ∎

The above theorem has many applications, but we shall use it only once, to obtain the Riemann integral ∫_a^b f(t) dt of a continuous function f mapping a closed interval [a, b] into a Banach space W as an extension of the trivial integral for step functions. If W is a normed linear space and f : [a, b] → W is a continuous function defined on a closed interval [a, b] ⊂ R, we might expect to be able to define ∫_a^b f(t) dt as a suitable vector in W and to proceed with the integral calculus of vector-valued functions of one real variable. We haven't done this until now because we need the completeness of W to prove that the integral exists!
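As a numeric preview of the construction that follows: a continuous arc is a uniform limit of step functions, and its integral is the limit of the elementary step-function integrals Σ α_i Δt_i. The sketch below (our own illustration; sampling f at interval midpoints is an arbitrary choice of describing step function) uses f(t) = <sin t, cos t> on [0, π], whose integral is <2, 0>.

```python
import math

# Integral of the step function equal to f(midpoint) on each of n equal
# subdivision intervals of [a, b] — the elementary integral sum a_i * dt_i
# — applied to an arc f: [a, b] -> R^2.

def step_integral(f, a, b, n):
    dt = (b - a) / n
    sx = sy = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt     # midpoint of the i-th interval
        x, y = f(t)
        sx += x * dt
        sy += y * dt
    return (sx, sy)

f = lambda t: (math.sin(t), math.cos(t))
for n in [4, 16, 64, 256]:
    print(n, step_integral(f, 0.0, math.pi, n))
```

As the subdivision is refined, the step-function integrals converge to the vector <2, 0>, and it is the completeness of W = R² that guarantees a limit exists in general.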
At first we shall integrate only certain elementary functions called step functions. A finite subset A of [a, b] which contains the two endpoints a and b will be called a partition of [a, b]. Thus A is (the range of) some finite sequence {t_i}₀ⁿ, where a = t₀ < t₁ < ⋯ < t_n = b, and A subdivides [a, b] into a sequence of smaller intervals. To be definite, we shall take the open intervals (t_{i−1}, t_i), i = 1, …, n, as the intervals of the subdivision. If A and B are partitions and A ⊂ B, we shall say that B is a refinement of A. Then each interval (s_{j−1}, s_j) of the B-subdivision is included in an interval (t_{i−1}, t_i) of the A-subdivision: t_{i−1} is the largest element of A which is less than or equal to s_{j−1}, and t_i is the smallest greater than or equal to s_j. A step function is simply a map f : [a, b] → W which is constant on the intervals of some subdivision A = {t_i}₀ⁿ. That is, there exists a sequence of vectors {α_i}₁ⁿ such that f(t) = α_i when t ∈ (t_{i−1}, t_i). The values of f at the subdividing points may be among these values or they may be different. For each step function f we define ∫_a^b f(t) dt as Σ₁ⁿ α_i Δt_i, where f = α_i on (t_{i−1}, t_i) and Δt_i = t_i − t_{i−1}. If f were real-valued, this would be simply the sum of the areas of the rectangles making up the region between the graph of f and the t-axis.

Now f may be described as a step function in terms of many different subdivisions. For example, if f is constant on the intervals of A, and if we obtain B from A by adding one new point s, then f is constant on the (smaller) intervals of B. We have to be sure that the value of the integral of f doesn't change when we change the describing subdivision. In the case just mentioned this is easy to see. The one new point s lies in some interval (t_{k−1}, t_k) defined by the partition A. The contribution of this interval to the A-sum is α_k(t_k − t_{k−1}), while in the B-sum it splits into α_k(t_k − s) + α_k(s − t_{k−1}). But this is the same vector.
The remaining summands are the same in the two sums, and the integral is therefore unchanged. In general, suppose that f is a step function with respect to A and also with respect to C. Set B = A ∪ C, the "common refinement" of A and C. We can pass from A to B in a sequence of steps at each of which we add one new point. As we have seen, the integral remains unchanged at each of these steps, and so it is the same for A as for B. It is similarly the same for C and B, and so for A and C. We have thus shown that ∫_a^b f is independent of the subdivision used to define it.

Now fix [a, b] and W, and let S be the set of all step functions from [a, b] to W. Then S is a vector space. For, if f and g in S are step functions relative to partitions A and B, then both functions are constant on the intervals of C = A ∪ B, and therefore xf + yg is also. Moreover, if C = {t_i}₀ⁿ, and if on (t_{i−1}, t_i) we have f = α_i and g = β_i, so that xf + yg = xα_i + yβ_i there, then the equation

Σ₁ⁿ (xα_i + yβ_i) Δt_i = x(Σ₁ⁿ α_i Δt_i) + y(Σ₁ⁿ β_i Δt_i)

is just ∫_a^b (xf + yg) = x∫_a^b f + y∫_a^b g. The map f → ∫_a^b f is thus linear from S to W. Finally,

‖∫_a^b f‖ = ‖Σ₁ⁿ α_i Δt_i‖ ≤ Σ₁ⁿ ‖α_i‖ Δt_i ≤ ‖f‖_∞(b − a),

where ‖f‖_∞ = lub {‖f(t)‖ : t ∈ [a, b]} = max {‖α_i‖ : 1 ≤ i ≤ n}.

… f : R → R² defined by f(t) = <sin t, cos t>.

10.5 Show that integration commutes with the application of linear transformations. That is, show that if f is a continuous function from [a, b] to a Banach space W, and if T ∈ Hom(W, X), where X is a Banach space, then

∫_a^b T(f(t)) dt = T(∫_a^b f(t) dt).

[Hint: Make the computation directly for step functions.]

10.6 State and prove the theorem suggested by the following identity: … .

11. THE COMPLEX NUMBER SYSTEM

… that <1, 0> is the unique multiplicative identity in C and that every nonzero complex number ξ has a multiplicative inverse. These additional facts are summarized by saying that C is a field, and they allow us to use C as a new scalar field in vector space theory. In fact, the whole development of Chapters 1 and 2 remains valid when R is replaced everywhere by C.
Scalar multiplication is now multiplication by complex numbers. Thus Cⁿ is the vector space of ordered n-tuples of complex numbers ξ = <ξ₁, …, ξ_n>, and the product of an n-tuple by a complex scalar is defined by αξ = <αξ₁, …, αξ_n>, where αξ_i is complex multiplication.

It is time to come to grips with complex multiplication. As the reader probably knows, it is given by an odd-looking formula that is motivated by thinking of an element ξ = <x₁, x₂> as being in the form x₁ + ix₂, where i² = −1, and then using the ordinary laws of algebra. Then we have

ξη = (x₁ + ix₂)(y₁ + iy₂) = x₁y₁ + ix₂y₁ + ix₁y₂ + i²x₂y₂ = (x₁y₁ − x₂y₂) + i(x₂y₁ + x₁y₂),

and thus our definition is

<x₁, x₂><y₁, y₂> = <x₁y₁ − x₂y₂, x₁y₂ + x₂y₁>.

Of course, it has to be verified that this operation is commutative and satisfies the laws for an algebra. A straightforward check is possible but dull, and we shall indicate a neater way in the exercises.

The mapping x → <x, 0> is an isomorphic injection of the field R into the field C. It clearly preserves sums, and the reader can check in his mind that it also preserves products. It is conventional to identify x with its image <x, 0>, and so to view R as a subfield of C. The mysterious i can be identified in C as the pair <0, 1>, since then i² = <0, 1><0, 1> = <−1, 0>, which we have identified with −1. With these identifications we have <x, y> = <x, 0> + <0, y> = <x, 0> + <y, 0><0, 1> = x + iy, and this is the way we shall write complex numbers from now on.

The mapping x + iy → x − iy is a field isomorphism of C with itself. That is, it preserves both sums and products, as the reader can easily check. Such a self-isomorphism is called an automorphism. The above automorphism is called complex conjugation, and the image x − iy of ζ = x + iy is called the conjugate of ζ, and is designated ζ̄. We shall ask the reader to show in an exercise that conjugation is the only automorphism of C (except the identity automorphism) which leaves the elements of the subfield R fixed.
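The pair formula can be checked mechanically. The sketch below (our own illustration) compares it with Python's built-in complex arithmetic, verifies that <0, 1> squares to <−1, 0>, and spot-checks that conjugation preserves products, as an automorphism must.

```python
# The pair definition <x1,x2><y1,y2> = <x1y1 - x2y2, x1y2 + x2y1>,
# checked against Python's built-in complex type.

def mul(p, q):
    return (p[0] * q[0] - p[1] * q[1], p[0] * q[1] + p[1] * q[0])

z, w = (3.0, 4.0), (-2.0, 5.0)
zc, wc = complex(*z), complex(*w)
prod = mul(z, w)
print(prod, zc * wc)                  # the two products agree

# i = <0, 1> squares to <-1, 0>, which is identified with -1
print(mul((0.0, 1.0), (0.0, 1.0)))

# conjugation is multiplicative: conj(zw) = conj(z) conj(w)
print((zc * wc).conjugate() == zc.conjugate() * wc.conjugate())
```

A spot check is of course not the "neater way" of the exercises (the matrix model), but it makes the formula concrete.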
The Euclidean norm of ζ = x + iy = <x, y> is called the absolute value of ζ and is designated |ζ|, so that |ζ| = |x + iy| = (x² + y²)^{1/2}. This is reasonable because it then turns out that |ζη| = |ζ| |η|. This can be verified by squaring and multiplying, but it is much more elegant first to notice the relationship between absolute value and the conjugation automorphism, namely,

ζζ̄ = |ζ|²  ((x + iy)(x − iy) = x² − (iy)² = x² + y²).

Then |ζη|² = (ζη)(ζη)‾ = ζζ̄ηη̄ = |ζ|²|η|², and taking square roots gives us our identity. The identity ζζ̄ = |ζ|² also shows us that if ζ ≠ 0, then ζ̄/|ζ|² is its multiplicative inverse.

Because the real number system R is a subfield of the complex number system C, any vector space over C is automatically also a vector space over R: multiplication by complex scalars includes multiplication by real scalars. And any complex linear transformation between complex vector spaces is automatically real linear. The converse, of course, does not hold. For example, a real linear mapping T from R² to R² is not in general complex linear from C to C, nor does a real linear S in Hom R²ⁿ become a complex linear mapping in Hom Cⁿ when R²ⁿ is viewed as Cⁿ. We shall study this question in the exercises.

The complex differentiability of a mapping F between complex vector spaces has the obvious definition ΔF_α = T + o, where T is complex linear, and then F is also real differentiable, in view of the above remarks. But F may be real differentiable without being complex differentiable. It follows from the discussion at the end of Section 8 that if {a_n} ⊂ C and {a_n} is bounded, then the series Σ₀^∞ a_nζⁿ converges on the ball B₁(0) in the (real) Banach algebra C, and F(ζ) = Σ₀^∞ a_nζⁿ is real differentiable on this ball, with dF_ζ(ξ) = (Σ₁^∞ na_nζⁿ⁻¹)ξ = F′(ζ)·ξ. But multiplication by F′(ζ) is obviously a complex linear operation on the one-dimensional complex vector space C. Therefore, complex-valued functions defined by convergent complex power series are automatically complex differentiable.
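The complex linearity of dF_ζ has a visible numerical signature: the difference quotient tends to the same limit no matter from which direction the complex increment approaches 0. A minimal sketch (our own illustration) for the power series F(ζ) = Σ ζⁿ/n!, whose term-by-term derivative is F itself:

```python
# Complex differentiability of a convergent power series, checked for
# F(z) = sum z^n / n!: the difference quotient (F(z+h) - F(z)) / h
# approximates the same value F'(z) = F(z) for real, imaginary, and
# diagonal increments h.

def F(z, terms=40):
    s, term = 0j, 1.0 + 0j
    for n in range(terms):
        s += term
        term *= z / (n + 1)       # next term z^(n+1)/(n+1)!
    return s

z = 0.3 + 0.7j
for h in [1e-6, 1e-6j, (1e-6 + 1e-6j) / 2 ** 0.5]:   # three directions
    dq = (F(z + h) - F(z)) / h
    print(abs(dq - F(z)))         # all three are small
```

A merely real-differentiable map (e.g. ζ → ζ̄) would give direction-dependent quotients, which is exactly what complex differentiability rules out.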
But we can go even further. In this case, if ξ ≠ 0, we can divide by ξ in the defining equation

ΔF_ζ(ξ) = F′(ζ)·ξ + o(ξ)

to get the result that

lim_{ξ→0} [F(ζ + ξ) − F(ζ)]/ξ = F′(ζ).

That is, F′(ζ) is now an honest derivative again, with the complex infinitesimal ξ in the denominator of the difference quotient. The consequences of complex differentiability are incalculable, and we shall mostly leave them as future pleasures to be experienced in a course on functions of complex variables. See, however, the problems on the residue calculus at the end of Chapter 12 and the proof in Chapter 11, Exercise 4.8, of the following fundamental theorem of algebra.

Theorem. Every polynomial with complex coefficients is a product of linear factors.

A weaker but equivalent statement is that every polynomial has at least one (complex) root. The crux of the matter is that x² + 1 cannot be factored over R (i.e., it has no real root), but over C we have x² + 1 = (x + i)(x − i), with the two roots ±i.

For later use we add a few more words about the complex exponential function exp ζ = e^ζ = Σ₀^∞ ζⁿ/n!. If ζ = x + iy, we have e^ζ = e^x e^{iy}, and

e^{iy} = Σ₀^∞ (iy)ⁿ/n! = (1 − y²/2! + y⁴/4! − ⋯) + i(y − y³/3! + y⁵/5! − ⋯) = cos y + i sin y.

Thus e^{x+iy} = e^x(cos y + i sin y). That is, the real and imaginary parts of the complex-valued function exp(x + iy) are e^x cos y and e^x sin y, respectively.

EXERCISES

11.1 Prove the associativity of complex multiplication directly from its definition.

11.2 Prove the distributive law, ζ(ξ + η) = ζξ + ζη, for complex numbers.

11.3 Show that scalar multiplication by a real number a in C = R² is consistent with the interpretation of a as the complex number <a, 0> and the definition of complex multiplication.

11.4 Let θ be an automorphism of the complex number field leaving the real numbers fixed. Prove that θ is either the identity or complex conjugation. [Hint: (θ(i))² = θ(i²) = θ(−1) = −1.
Show that the only complex numbers x + iy whose squares are −1 are ±i, and then finish up.]

11.5 If we remember that C is in particular the two-dimensional real vector space R², we see that multiplying the elements of C by the complex number a + bi must define a linear transformation on R². Show that its matrix is

[a  −b]
[b   a].

11.6 The above exercise suggests that the complex number system may be "like" the set A of all 2 × 2 real matrices of the form

[a  −b]
[b   a].

Prove that A is a subalgebra of the matrix algebra R^{2×2} (that is, A is closed under multiplication, addition, and scalar multiplication) and that the above correspondence is a bijection between C and A that preserves all the algebra operations. We therefore can conclude that the laws of an algebra automatically hold for C. Why?

11.7 In the above matrix model of the complex number system show that the absolute value identity |ζη| = |ζ| |η| is a determinant property.

11.8 Let W be a real vector space, and let V be the real vector space W × W. Show that there is a θ in Hom V such that θ² = −I. (Think of C as being the real vector space R² = R × R under multiplication by i.)

11.9 Let V be a real vector space, and let θ in Hom V satisfy θ² = −I. Show that V becomes a complex vector space if iα is defined as θ(α). If the complex vector space V is made from the real vector space W as in this and the above exercise, we shall call V the complexification of W. We shall regard W itself as being a real subspace of V (actually W × {0}), and then V = W ⊕ iW.

11.10 Show that the complex vector space Cⁿ is the complexification of Rⁿ. Show more generally that for any set A the complex vector space C^A is the complexification of the real vector space R^A.

11.11 Let V be the complexification of the real vector space W. Define the operation of complex conjugation on V. That is, show that there is a real linear mapping φ such that φ² = I and φ(iα) = −iφ(α).
Show, conversely, that if V is a complex vector space and φ is a conjugation on V (a real linear mapping φ such that φ² = I and φ(iα) = −iφ(α)), then V is (isomorphic to) the complexification of a real linear space W. (Apply Theorem 5.5 of Chapter 1 to the identity φ² − I = 0.)

11.12 Let W be a real vector space, and let V be its complexification. Show that every T in Hom W "extends" to a complex linear S in Hom V which commutes with the conjugation φ. By S extending T we mean, of course, that S ↾ (W × {0}) = T. Show, conversely, that if S in Hom V commutes with conjugation, then S is the extension of a T in Hom W.

11.13 In this situation we naturally call S the complexification of T. Show finally that if S is the complexification of T, then its null space X in V is the direct sum X = N ⊕ iN, where N is the null space of T in W. Remember that we are viewing V as W ⊕ iW.

11.14 On a complex normed linear space V the norm is required to be complex homogeneous: ‖λα‖ = |λ| ‖α‖ for all complex numbers λ. Show that the natural definitions of ‖ ‖₁, ‖ ‖₂, and ‖ ‖_∞ on Cⁿ have this property.

11.15 If a real normed linear space W is complexified to V = W ⊕ iW, there is no trivial formula which converts the real norm for W into a complex norm for V. Show that, nevertheless, any product norm on V (which really is W × W) can be used to generate an equivalent complex norm. [Hint: Given <ξ, η> ∈ V, consider the set of numbers {‖(x + iy)<ξ, η>‖ : |x + iy| = 1}, and try to obtain from this set a single number that works.]

11.16 Show that every nonzero complex number has a logarithm. That is, show that if u + iv ≠ 0, then there exists an x + iy such that e^{x+iy} = u + iv. (Write the equation e^x(cos y + i sin y) = u + iv, and solve by being slightly clever.)
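One concrete way to be "slightly clever" in Exercise 11.16 (a sketch of one solution, using the standard library's `atan2` and `hypot`; the exercise of course wants the argument, not the computation): take x to match the absolute value and y to match the direction.

```python
import math

# Solve e^x (cos y + i sin y) = u + iv for x, y:
# e^x must equal |u + iv|, so x = log|u + iv|;
# (cos y, sin y) must point toward (u, v), so y = atan2(v, u).

def complex_log(u, v):
    x = math.log(math.hypot(u, v))   # requires u + iv != 0
    y = math.atan2(v, u)
    return x, y

u, v = -1.0, 1.0
x, y = complex_log(u, v)
w = (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))
print(w)   # recovers (u, v) up to rounding
```

The solution is not unique: y may be shifted by any integer multiple of 2π, which is exactly the multivaluedness of the complex logarithm.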
11.17 The fundamental theorem of algebra and Theorem 5.5 of Chapter 1 imply that if V is a complex vector space and T in Hom V satisfies p(T) = 0 for a polynomial p, then there are subspaces {V_i}₁ᵏ of V, complex numbers {λ_i}₁ᵏ, and integers {n_i}₁ᵏ such that V = ⊕₁ᵏ V_i, V_i is T-invariant for each i, and (T − λ_iI)^{n_i} = 0 on V_i for each i. Show that this is so. Show also that if V is finite-dimensional, then every T in Hom V must satisfy some polynomial equation p(T) = 0. (Consider the linear independence or dependence of the vectors I, T, T², … in the vector space Hom V.)

11.18 Suppose that the polynomial p in the above exercise has real coefficients. Use the fact that complex conjugation is an automorphism of C to prove that if λ is a root of p, then so is λ̄. Then show that if V is the complexification of a real space W and T the complexification of an element of Hom W, then there exists a real polynomial p such that p(T) = 0.

11.19 Show that if W is a finite-dimensional real vector space and R ∈ Hom W is an isomorphism, then there exists an A ∈ Hom W such that R = e^A (that is, log R exists). This is a hard exercise, but it can be proved from Exercises 8.19 through 8.23, 11.13, 11.17, and 11.18.

12. WEAK METHODS

Our theorem that all norms are equivalent on a finite-dimensional space suggests that the limit theory of such spaces should be accessible independently of norms, and our earlier theorem that every linear transformation with a finite-dimensional domain is automatically bounded reinforces this impression. We shall look into this question in this section. In a sense this effort is irrelevant, since we can't do without norms completely, and since they are so handy that we use them even when we don't have to. Roughly speaking, what we are going to do is to study a vector-valued map F by studying the whole collection of real-valued maps {l ∘ F : l ∈ V*}.

Theorem 12.1. If V is finite-dimensional, then ξ_n → ξ in V (with respect to any, and so every, norm) if and only if l(ξ_n) → l(ξ) in R for each l in V*.

Proof.
If ξ_n → ξ and l ∈ V*, then l(ξ_n) → l(ξ), since l is automatically continuous. Conversely, if l(ξ_n) → l(ξ) for every l in V*, then, choosing a basis {β_i} for V, we have c_i(ξ_n) → c_i(ξ) for each functional c_i in the dual basis, and this implies that ξ_n → ξ in the associated one-norm, since ‖ξ_n − ξ‖₁ = Σ_i |c_i(ξ_n) − c_i(ξ)| → 0. ∎

Remark. If V is an arbitrary normed linear space, so that V* = Hom(V, R) is the set of bounded linear functionals, then we say that ξ_n → ξ weakly if l(ξ_n) → l(ξ) for each l ∈ V*. The above theorem can therefore be rephrased to say that in a finite-dimensional space, weak convergence and norm convergence are equivalent notions.

We shall now see that in a similar way the integration and differentiation of parametrized arcs can all be thrown back to the standard calculus of real-valued functions of a real variable by applying functionals from V* and using the natural isomorphism of V** with V. Thus, if f ∈ C([a, b], V) and λ ∈ V*, then λ ∘ f ∈ C([a, b], R), and so the integral ∫_a^b λ ∘ f exists from standard calculus. If we vary λ, we can check that the map λ → ∫_a^b λ ∘ f is linear, hence is in V**, and therefore is given by a uniquely determined vector α ∈ V (by duality; see Chapter 2, Theorem 3.2). That is, there exists a unique α ∈ V such that λ(α) = ∫_a^b λ ∘ f for every λ ∈ V*, and we define this α to be ∫_a^b f. Thus integration is defined so as to commute with the application of linear functionals: ∫_a^b f is that vector such that

λ(∫_a^b f) = ∫_a^b λ(f(t)) dt  for all λ ∈ V*.

Similarly, if all the real-valued functions {λ ∘ f : λ ∈ V*} are differentiable at x₀, then the mapping λ → (λ ∘ f)′(x₀) is linear by the linearity of the derivative in the standard calculus:

((c₁λ₁ + c₂λ₂) ∘ f)′ = (c₁(λ₁ ∘ f) + c₂(λ₂ ∘ f))′ = c₁(λ₁ ∘ f)′ + c₂(λ₂ ∘ f)′.

Therefore, there is again a unique α ∈ V such that (λ ∘ f)′(x₀) = λ(α) for all λ ∈ V*, and if we define this α to be the derivative f′(x₀), we have again defined an operation of the calculus by commutativity with linear functionals:

(λ ∘ f)′(x₀) = λ(f′(x₀)).
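The weak definition is computationally concrete for V = Rⁿ: the defining vector α can be found by applying just the dual-basis functionals, i.e., by integrating coordinatewise, and then every other functional automatically commutes with the integral. A sketch (our own illustration; the midpoint rule stands in for the one-variable integral):

```python
# Weak definition of the integral of an arc f: [0, 1] -> R^3.
# alpha is the unique vector with lambda(alpha) = integral of lambda∘f
# for every functional lambda; the dual-basis functionals (coordinates)
# suffice to determine it.

def integrate_scalar(g, a, b, n=20000):
    # plain midpoint-rule approximation to the ordinary calculus integral
    dt = (b - a) / n
    return sum(g(a + (i + 0.5) * dt) for i in range(n)) * dt

f = lambda t: (2 * t, 3 * t * t, 1.0)          # an arc in R^3
coords = [lambda t, j=j: f(t)[j] for j in range(3)]
alpha = tuple(integrate_scalar(c, 0.0, 1.0) for c in coords)
print(alpha)   # close to (1, 1, 1)

# any functional lambda then commutes: lambda(alpha) = integral of lambda∘f
lam = lambda p: 2 * p[0] - p[1] + 4 * p[2]     # a sample functional
lhs = lam(alpha)
rhs = integrate_scalar(lambda t: lam(f(t)), 0.0, 1.0)
print(abs(lhs - rhs))   # small
```

The agreement of `lhs` and `rhs` for an arbitrary functional is exactly the duality statement λ(∫f) = ∫λ∘f.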
Now the fundamental theorem of the calculus appears as follows. If $F(x) = \int_a^x f$, then $(\lambda \circ F)(x) = \int_a^x \lambda \circ f$ by the weak definition of the integral. The fundamental theorem of the standard calculus then says that $(\lambda \circ F)'$ exists and $(\lambda \circ F)'(x) = (\lambda \circ f)(x) = \lambda(f(x))$. By the weak definition of the derivative we then have that $F'$ exists and $F'(x) = f(x)$.

The one conclusion that we don't get so easily by weak methods is the norm inequality $\|\int_a^b f\| \le (b - a)\|f\|_\infty$. This requires a theorem about norms on finite-dimensional spaces that we shall not prove in this course.

Theorem 12.2. $\|\alpha^{**}\| = \|\alpha\|$ for each $\alpha \in V$.

What is being asserted is that $\mathrm{lub}_\lambda\, |\alpha^{**}(\lambda)|/\|\lambda\| = \|\alpha\|$. Since $\alpha^{**}(\lambda) = \lambda(\alpha)$, and since $|\lambda(\alpha)| \le \|\lambda\| \cdot \|\alpha\|$ by the definition of $\|\lambda\|$, we see that $\mathrm{lub}\, |\alpha^{**}(\lambda)|/\|\lambda\| \le \|\alpha\|$. Our problem is therefore to find $\lambda \in V^*$ with $\|\lambda\| = 1$ and $|\lambda(\alpha)| = \|\alpha\|$. If we multiply through by a suitable constant (replacing $\alpha$ by $c\alpha$, where $c = 1/\|\alpha\|$), we can suppose that $\|\alpha\| = 1$. Then $\alpha$ is on the unit spherical surface, and the problem is to find a functional $\lambda \in V^*$ such that the affine subspace (hyperplane) where $\lambda = 1$ touches the unit sphere at $\alpha$ (so that $\lambda(\alpha) = 1$) and otherwise lies outside the unit sphere (so that $|\lambda(\xi)| \le 1$ when $\|\xi\| = 1$, and hence $\|\lambda\| \le 1$). It is clear geometrically that such "tangent planes" must exist, but we shall drop the matter here.

If we assume this theorem, then, since

$\bigl|\lambda\bigl(\int_a^b f\bigr)\bigr| = \bigl|\int_a^b \lambda(f(t))\,dt\bigr| \le (b - a)\max\{|\lambda(f(t))| : t \in [a, b]\} \le (b - a)\|\lambda\| \max\{\|f(t)\|\}$ (from $|\lambda(\xi)| \le \|\lambda\| \cdot \|\xi\|$) $= (b - a)\|\lambda\| \cdot \|f\|_\infty$,

we have $|\lambda(\int_a^b f)| \le (b - a)\|f\|_\infty$ for $\|\lambda\| = 1$; by Theorem 12.2 the least upper bound of the left side over such $\lambda$ is $\|\int_a^b f\|$, and the extreme members of the display form the desired inequality.

CHAPTER 5

SCALAR PRODUCT SPACES

In this short chapter we shall look into what is going on behind two-norms, and we shall find that a wholly new branch of linear analysis is opened up.
Two-norms can be characterized as exactly the norms arising from scalar products. They are the finite- and infinite-dimensional analogues of ordinary geometric length, and they carry with them practically all the concepts of Euclidean geometry, such as the notion of the angle between two vectors, perpendicularity (orthogonality) and the Pythagorean theorem, and the existence of many rigid motions. The impact of this extra structure is particularly dramatic for infinite-dimensional spaces. Infinite orthogonal bases exist in great profusion and can be handled about as easily as bases in finite-dimensional spaces, although the basis expansion of a vector is now a convergent infinite series, $\xi = \sum_1^\infty x_i\varphi_i$. Many of the most important series expansions in mathematics are examples of such orthogonal basis expansions. For example, we shall see in the next chapter that the Fourier series expansion of a continuous function $f$ on $[0, \pi]$ is the basis expansion of $f$ under the two-norm $\|f\|_2 = (\int_0^\pi f^2)^{1/2}$ for the particular orthogonal basis $\{\varphi_n\}_1^\infty = \{\sin nt\}_1^\infty$. If a vector space is complete under a scalar product norm, it is called a Hilbert space. The more advanced theory of such spaces is one of the most beautiful parts of mathematics.

1. SCALAR PRODUCTS

A scalar product on a real vector space $V$ is a real-valued function from $V \times V$ to $\mathbb{R}$, its value at the pair $\langle\xi, \eta\rangle$ ordinarily being designated $(\xi, \eta)$, such that

a) $(\xi, \eta)$ is linear in $\xi$ when $\eta$ is held fixed;
b) $(\xi, \eta) = (\eta, \xi)$ (symmetry);
c) $(\xi, \xi) > 0$ if $\xi \ne 0$ (positive definiteness).

If (c) is replaced by the weaker condition

c') $(\xi, \xi) \ge 0$ for all $\xi \in V$,

then $(\xi, \eta)$ is called a semiscalar product.

Two important examples of scalar products are

$(x, y) = \sum_1^n x_iy_i$ when $V = \mathbb{R}^n$

and

$(f, g) = \int_a^b f(t)g(t)\,dt$ when $V = \mathcal{C}([a, b])$.

On a complex vector space (b) must be replaced by

b') $(\xi, \eta) = \overline{(\eta, \xi)}$ (Hermitian symmetry),

where the bar denotes complex conjugation.
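The two model scalar products just given can be sketched numerically; the following is an illustrative aside (not part of the original text), in which the integral scalar product on $\mathcal{C}([a, b])$ is approximated by a Riemann sum.

```python
import math

# The two model scalar products:
#   (x, y) = sum_i x_i y_i           on R^n
#   (f, g) = integral_a^b f(t)g(t)dt on C([a, b])  (midpoint Riemann sum here)

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def l2_product(f, g, a, b, n=10_000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h)
               for k in range(n)) * h

print(dot([1, 2], [3, 4]))                        # 11
# (sin t, sin t) on [0, pi]: the exact value is pi/2 (compare Exercise 2.4)
print(l2_product(math.sin, math.sin, 0.0, math.pi))
```

The second printed value approximates $\pi/2$, the squared two-norm of $\sin t$ on $[0, \pi]$.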
The corresponding examples are $(z, w) = \sum_1^n z_i\bar{w}_i$ when $V = \mathbb{C}^n$ and $(f, g) = \int_a^b f\bar{g}$ when $V$ is the space of continuous complex-valued functions on $[a, b]$. We shall study only the real case.

It follows from (a) and (b) that a semiscalar product is also linear in the second variable when the first variable is held fixed, and therefore is a symmetric bilinear functional whose associated quadratic form $q(\xi) = (\xi, \xi)$ is positive definite or positive semidefinite [(c) or (c'); see the last section in Chapter 2]. The definiteness of the form $q$ has far-reaching consequences, as we shall begin to see at once.

Theorem 1.1. The Schwarz inequality

$|(\xi, \eta)| \le (\xi, \xi)^{1/2}(\eta, \eta)^{1/2}$

is valid for any semiscalar product.

Proof. We have $0 \le (\xi - t\eta, \xi - t\eta) = (\xi, \xi) - 2t(\xi, \eta) + t^2(\eta, \eta)$ for every $t \in \mathbb{R}$. Since this quadratic in $t$ is never negative, it cannot have distinct real roots, and the usual $(b^2 - 4ac)$-formula implies that $4(\xi, \eta)^2 - 4(\xi, \xi)(\eta, \eta) \le 0$, which is equivalent to the Schwarz inequality. $\square$

We can also proceed directly. If $(\eta, \eta) > 0$, and if we set $t = (\xi, \eta)/(\eta, \eta)$ in the quadratic inequality in the first line of the proof, then the resulting expression simplifies to the Schwarz inequality. If $(\eta, \eta) = 0$, then $(\xi, \eta)$ must also be 0 (or else the beginning inequality is clearly false for some $t$), and now the Schwarz inequality holds trivially.

Corollary. If $(\xi, \eta)$ is a scalar product, then $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.

Proof. $\|\xi + \eta\|^2 = (\xi + \eta, \xi + \eta) = \|\xi\|^2 + 2(\xi, \eta) + \|\eta\|^2 \le \|\xi\|^2 + 2\|\xi\| \cdot \|\eta\| + \|\eta\|^2$ (by Schwarz) $= (\|\xi\| + \|\eta\|)^2$, proving the triangle inequality. Also, $\|c\xi\| = |c| \cdot \|\xi\|$, since $\|c\xi\|^2 = (c\xi, c\xi) = c^2(\xi, \xi)$. $\square$

Note that the Schwarz inequality $|(\xi, \eta)| \le \|\xi\| \cdot \|\eta\|$ is now just the statement that the bilinear functional $(\xi, \eta)$ is bounded by one with respect to the scalar product norm.

A normed linear space $V$ in which the norm is a scalar product norm is called a pre-Hilbert space. If $V$ is complete in this norm, it is a Hilbert space.
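The discriminant step in the proof of Theorem 1.1 can be checked numerically; the following sketch (an illustrative aside, with vectors invented for the example) verifies both the inequality and the nonpositivity of the discriminant.

```python
# Check of the Schwarz inequality |(x,y)| <= (x,x)^{1/2} (y,y)^{1/2}
# and of the discriminant used in the proof: the quadratic
# q(t) = (x - t y, x - t y) = (x,x) - 2t(x,y) + t^2 (y,y) is never negative,
# so 4(x,y)^2 - 4(x,x)(y,y) <= 0.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

x, y = [1.0, 2.0, -1.0], [3.0, 0.5, 4.0]

lhs = abs(dot(x, y))
rhs = (dot(x, x) ** 0.5) * (dot(y, y) ** 0.5)
disc = 4 * dot(x, y) ** 2 - 4 * dot(x, x) * dot(y, y)

print(lhs, rhs, disc)   # lhs <= rhs, and disc <= 0
```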
The two examples of scalar products mentioned earlier give us the real explanation of our two-norms for the first time:

$\|x\|_2 = \bigl(\sum_1^n x_i^2\bigr)^{1/2}$ for $x \in \mathbb{R}^n$ and $\|f\|_2 = \bigl(\int_a^b f^2\bigr)^{1/2}$ for $f \in \mathcal{C}([a, b])$

are scalar product norms. Since the scalar product norm on $\mathbb{R}^n$ becomes Euclidean length under a Cartesian coordinate correspondence with Euclidean $n$-space, it is conventional to call $\mathbb{R}^n$ itself Euclidean $n$-space $\mathbb{E}^n$ when we want it understood that the scalar product norm is being used.

Any finite-dimensional space $V$ is a Hilbert space with respect to any scalar product norm, because its finite dimensionality guarantees its completeness. On the other hand, we shall see in Exercise 1.10 that $\mathcal{C}([a, b])$ is incomplete in the two-norm, and is therefore a pre-Hilbert space but not a Hilbert space in this norm. (Remember, however, that $\mathcal{C}([a, b])$ is complete in the uniform norm $\|f\|_\infty$.) It is important to the real uses of Hilbert spaces in mathematics that a pre-Hilbert space can be completed to a Hilbert space, but the theory of infinite-dimensional Hilbert spaces is for the most part beyond the scope of this book.

Scalar product norms have in some sense the smoothest possible unit spheres, because these spheres are quadratic surfaces.

It is orthogonality that gives the theory of pre-Hilbert spaces its special flavor. Two vectors $\alpha$ and $\beta$ are said to be orthogonal, written $\alpha \perp \beta$, if $(\alpha, \beta) = 0$. This definition gets its inspiration from geometry; we noted in Chapter 1 that two geometric vectors are perpendicular if and only if their coordinate triples $x$ and $y$ satisfy $(x, y) = 0$. It is an interesting problem to go further and to show from the law of cosines ($c^2 = a^2 + b^2 - 2ab\cos\theta$) that the angle $\theta$ between two geometric vectors is given by $(x, y) = \|x\| \cdot \|y\| \cos\theta$. This would motivate us to define the angle $\theta$ between two vectors $\xi$ and $\eta$ in a pre-Hilbert space by $(\xi, \eta) = \|\xi\| \cdot \|\eta\| \cos\theta$, but we shall have no use for this more general formulation.
We say that two subsets $A$ and $B$ are orthogonal, and we write $A \perp B$, if $\alpha \perp \beta$ for every $\alpha$ in $A$ and $\beta$ in $B$; for any subset $A$ we set $A^\perp = \{\beta \in V : \beta \perp A\}$.

Lemma 1.1. If $\beta$ is orthogonal to the set $A$, then $\beta$ is orthogonal to $\bar{L}(A)$, the closure of the linear span of $A$. It follows that $B^\perp$ is a closed subspace for every subset $B$.

Proof. The first assertion depends on the linearity and continuity of the scalar product in one of its variables; it will be left to the reader. As for $B^\perp$, it includes the closure of its own linear span, by the first part, and so is a closed subspace. $\square$

Lemma 1.2. In any pre-Hilbert space we have the parallelogram law,

$\|\alpha + \beta\|^2 + \|\alpha - \beta\|^2 = 2(\|\alpha\|^2 + \|\beta\|^2)$,

and the Pythagorean theorem,

$\alpha \perp \beta$ if and only if $\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2$.

If $\{\alpha_i\}_1^n$ is a (pairwise) orthogonal collection of vectors, then $\|\sum_1^n \alpha_i\|^2 = \sum_1^n \|\alpha_i\|^2$.

Proof. Since $\|\alpha + \beta\|^2 = \|\alpha\|^2 + 2(\alpha, \beta) + \|\beta\|^2$, by the bilinearity of the scalar product, we see that $\|\alpha + \beta\|^2 = \|\alpha\|^2 + \|\beta\|^2$ if and only if $(\alpha, \beta) = 0$, which is the Pythagorean theorem. Writing down the similar expansion of $\|\alpha - \beta\|^2$ and adding, we have the parallelogram law. The last statement follows from the Pythagorean theorem and Lemma 1.1 by induction. Or we can obtain this statement directly by expanding the scalar product on the left and noticing that all "mixed terms" drop out by orthogonality. $\square$

The reader will notice that the Schwarz inequality has not been used in this lemma, but it would have been silly to state the lemma before proving that $\|\xi\| = (\xi, \xi)^{1/2}$ is a norm.

If $\{\alpha_i\}_1^n$ are orthogonal and nonzero, then the identity $\|\sum_1^n x_i\alpha_i\|^2 = \sum_1^n x_i^2\|\alpha_i\|^2$ shows that $\sum x_i\alpha_i$ can be zero only if all the coefficients $x_i$ are zero. Thus:

Corollary. A finite collection of (pairwise) orthogonal nonzero vectors is independent. Similarly, a finite collection of orthogonal subspaces is independent.

EXERCISES

1.1 Complete the second proof of Theorem 1.1.

1.2
Reexamine the proof of Theorem 1.1 and show that if $\xi$ and $\eta$ are independent, then the Schwarz inequality is strict.

1.3 Continuing the above exercise, now show that the triangle inequality is strict if $\xi$ and $\eta$ are independent.

1.4 a) Show that the sum of two semiscalar products is a semiscalar product.
b) Show that if $(\mu, \nu)$ is a semiscalar product on a vector space $W$ and if $T$ is a linear transformation from a vector space $V$ to $W$, then $[\xi, \eta] = (T\xi, T\eta)$ is a semiscalar product on $V$.
c) Deduce from (a) and (b) that $(f, g) = f(a)g(a) + \int_a^b f'g'$ is a semiscalar product on $V = \mathcal{C}^1([a, b])$. Prove that it is a scalar product.

1.5 If $\eta$ is held fixed, we know that $f(\xi) = (\xi, \eta)$ is continuous. Why? Prove more generally that $(\xi, \eta)$ is continuous as a map from $V \times V$ to $\mathbb{R}$.

1.6 Let $V$ be a two-dimensional Hilbert space, and let $\{\alpha_1, \alpha_2\}$ be any basis for $V$. Show that a scalar product $(\xi, \eta)$ has the form $(\xi, \eta) = ax_1y_1 + b(x_1y_2 + x_2y_1) + cx_2y_2$, where $b^2 < ac$. Here, of course, $\xi = x_1\alpha_1 + x_2\alpha_2$, $\eta = y_1\alpha_1 + y_2\alpha_2$.

1.7 Prove that if $\omega(x, y) = ax_1y_1 + b(x_1y_2 + x_2y_1) + cx_2y_2$ and $b^2 < ac$, then $\omega$ is a scalar product on $\mathbb{R}^2$.

1.8 Let $\omega(\xi, \eta)$ be any symmetric bilinear functional on a finite-dimensional vector space $V$, and let $q(\xi) = \omega(\xi, \xi)$ be its associated quadratic form. Show that for any choice of a basis for $V$ the equation $q(\xi) = 1$ becomes a quadratic equation in the coordinates $\{x_i\}$ of $\xi$.

1.9 Prove in detail that if a vector $\beta$ is orthogonal to a set $A$ in a pre-Hilbert space, then $\beta$ is orthogonal to $\bar{L}(A)$.

1.10 We know from the last chapter that the Riemann integral is defined for the set $B$ of uniform limits of real-valued step functions on $[0, 1]$ and that $B$ includes all the continuous functions. Given that $k$ is the step function whose value is 1 on $[0, b)$ and 0 on $[b, 1]$, show that $\|f - k\|_2 > 0$ for any continuous function $f$. Show, however, that there is a sequence of continuous functions $\{f_n\}$ such that $\|f_n - k\|_2 \to 0$.
Show, therefore, that $\mathcal{C}([0, 1])$ is incomplete in the two-norm, by showing that the above sequence $\{f_n\}$ is Cauchy but not convergent in $\mathcal{C}([0, 1])$.

2. ORTHOGONAL PROJECTION

One of the most important devices in geometric reasoning is "dropping a perpendicular" from a point to a line or a plane and then using right-triangle arguments. This device is equally important in pre-Hilbert space theory. If $M$ is a subspace and $\alpha$ is any element in $V$, then by "the foot of the perpendicular dropped from $\alpha$ to $M$" we mean that vector $\mu$ in $M$ such that $(\alpha - \mu) \perp M$, if such a $\mu$ exists. (See Fig. 5.1.) Writing $\alpha$ as $\mu + (\alpha - \mu)$, we see that the existence of the "foot" $\mu$ in $M$ for each $\alpha$ in $V$ is equivalent to the direct sum decomposition $V = M \oplus M^\perp$. Now it is precisely this direct sum decomposition that the completeness of a Hilbert space guarantees, as we shall shortly see. We start by proving the geometrically intuitive fact that $\mu$ is the foot of the perpendicular dropped from $\alpha$ to $M$ if and only if $\mu$ is the point in $M$ closest to $\alpha$.

Lemma 2.1. If $\mu$ is in the subspace $M$, then $(\alpha - \mu) \perp M$ if and only if $\mu$ is the unique point in $M$ closest to $\alpha$, that is, $\mu$ is the "best approximation" to $\alpha$ in $M$.

Proof. If $(\alpha - \mu) \perp M$ and $\xi$ is any other point in $M$, then $\|\alpha - \xi\|^2 = \|(\alpha - \mu) + (\mu - \xi)\|^2 = \|\alpha - \mu\|^2 + \|\mu - \xi\|^2 > \|\alpha - \mu\|^2$. Thus $\mu$ is the unique point in $M$ closest to $\alpha$. Conversely, suppose that $\mu$ is a point in $M$ closest to $\alpha$, and let $\xi$ be any nonzero vector in $M$. Then $\|\alpha - \mu\|^2 \le \|(\alpha - \mu) + t\xi\|^2$, which becomes $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$ when the right-hand scalar product is expanded. This can hold for all $t$ only if $(\alpha - \mu, \xi) = 0$ (otherwise a small $t$ of the proper sign contradicts the inequality; see Exercise 2.1). Therefore, $(\alpha - \mu) \perp M$. $\square$

On the basis of this lemma it is clear that a way to look for $\mu$ is to take a sequence $\mu_n$ in $M$ such that $\|\alpha - \mu_n\| \to \rho(\alpha, M)$ and to hope to define $\mu$ as its limit. Here is the crux of the matter: we can prove that such a sequence $\{\mu_n\}$ is always Cauchy, but its limit may not exist if $M$ is not complete!

Lemma 2.2.
If $\{\mu_n\}$ is a sequence in the subspace $M$ whose distance from some vector $\alpha$ converges to the distance $\rho$ from $\alpha$ to $M$, then $\{\mu_n\}$ is Cauchy.

Proof. By the parallelogram law,

$\|\mu_n - \mu_m\|^2 = \|(\alpha - \mu_m) - (\alpha - \mu_n)\|^2 = 2(\|\alpha - \mu_n\|^2 + \|\alpha - \mu_m\|^2) - \|2\alpha - (\mu_n + \mu_m)\|^2$.

Since the first term on the right converges to $4\rho^2$ as $n, m \to \infty$, and since the second term is always $\le -4\rho^2$ (factor out the 2), we see that $\|\mu_n - \mu_m\|^2 \to 0$ as $n, m \to \infty$. $\square$

Theorem 2.1. If $M$ is a complete subspace of a pre-Hilbert space $V$, then $V = M \oplus M^\perp$. In particular, this is true for any finite-dimensional subspace of a pre-Hilbert space and for any closed subspace of a Hilbert space.

Proof. This follows at once from the last two lemmas, since now $\mu = \lim \mu_n$ exists, $\|\alpha - \mu\| = \rho(\alpha, M)$, and so $(\alpha - \mu) \perp M$. $\square$

If $V = M \oplus M^\perp$, then the projection on $M$ along $M^\perp$ is called the orthogonal projection on $M$, or simply the projection on $M$, since among all the projections on $M$ associated with the various complements of $M$, the orthogonal projection is distinguished. Thus, if $M$ is a complete subspace of $V$, and if $P$ is the projection on $M$, then $P(\xi)$ is at once the foot of the perpendicular dropped from $\xi$ to $M$ (which is where the word "projection" comes from) and also the best approximation to $\xi$ in $M$ (Lemma 2.1).

Lemma 2.3. If $\{M_i\}_1^n$ is a finite collection of complete, pairwise orthogonal subspaces, and if for a vector $\alpha$ in $V$, $\alpha_i$ is the projection of $\alpha$ on $M_i$ for $i = 1, \ldots, n$, then $\sum_1^n \alpha_i$ is the projection of $\alpha$ on $\bigoplus_1^n M_i$.

Proof. We have to show that $\alpha - \sum_1^n \alpha_i$ is orthogonal to $\bigoplus_1^n M_i$, and it is sufficient to show it orthogonal to each $M_j$ separately. But if $\xi \in M_j$, then $(\alpha - \sum_1^n \alpha_i, \xi) = (\alpha - \alpha_j, \xi)$, since $(\alpha_i, \xi) = 0$ for $i \ne j$, and $(\alpha - \alpha_j, \xi) = 0$ because $\alpha_j$ is the projection of $\alpha$ on $M_j$. Thus $(\alpha - \sum_1^n \alpha_i, \xi) = 0$. $\square$

Lemma 2.4. The projection of $\xi$ on the one-dimensional span of a single nonzero vector $\eta$ is $((\xi, \eta)/\|\eta\|^2)\eta$.

Proof.
Here $\mu$ must be of the form $x\eta$. But $(\xi - x\eta) \perp \eta$ if and only if $0 = (\xi - x\eta, \eta) = (\xi, \eta) - x\|\eta\|^2$, or $x = (\xi, \eta)/\|\eta\|^2$. $\square$

We call the number $(\xi, \eta)/\|\eta\|^2$ the $\eta$-Fourier coefficient of $\xi$. If $\eta$ is a unit (normalized) vector, then this Fourier coefficient is just $(\xi, \eta)$. It follows from Lemma 2.3 that if $\{\varphi_i\}_1^n$ is an orthogonal collection of nonzero vectors, and if $\{x_i\}$ are the corresponding Fourier coefficients of a vector $\xi$, then $\sum_1^n x_i\varphi_i$ is the projection of $\xi$ on the subspace $M$ spanned by $\{\varphi_i\}$. Therefore, $\xi - \sum_1^n x_i\varphi_i \perp M$, and (Lemma 2.1) $\sum_1^n x_i\varphi_i$ is the best approximation to $\xi$ in $M$. If $\xi$ is in $M$, then both of these statements say that $\xi = \sum_1^n x_i\varphi_i$. (This can of course be verified directly, by letting $\xi = \sum_1^n a_i\varphi_i$ be the basis expansion of $\xi$ and computing $(\xi, \varphi_j) = \sum_1^n a_i(\varphi_i, \varphi_j) = a_j\|\varphi_j\|^2$.)

If an orthogonal set of vectors $\{\varphi_i\}$ is normalized ($\|\varphi_i\| = 1$), then we call the set orthonormal.

Theorem 2.2. If $\{\varphi_i\}_1^\infty$ is an infinite orthonormal sequence, and if $\{x_i\}$ are the corresponding Fourier coefficients of a vector $\xi$, then $\sum_1^\infty x_i^2 \le \|\xi\|^2$ (Bessel's inequality), and $\xi = \sum_1^\infty x_i\varphi_i$ if and only if $\sum_1^\infty x_i^2 = \|\xi\|^2$ (Parseval's equation).

Proof. Setting $\sigma_n = \sum_1^n x_i\varphi_i$, we have $\xi = (\xi - \sigma_n) + \sigma_n$ with $(\xi - \sigma_n) \perp \sigma_n$, and remembering that $\|\sigma_n\|^2 = \sum_1^n x_i^2$, we have

$\|\xi\|^2 = \|\xi - \sigma_n\|^2 + \sum_1^n x_i^2$.

Therefore, $\sum_1^n x_i^2 \le \|\xi\|^2$ for all $n$, proving Bessel's inequality, and $\sigma_n \to \xi$ (that is, $\|\xi - \sigma_n\| \to 0$) if and only if $\sum_1^n x_i^2 \to \|\xi\|^2$, proving Parseval's identity. $\square$

We call the formal series $\sum_1^\infty x_i\varphi_i$ the Fourier series of $\xi$ (with respect to the orthonormal set $\{\varphi_i\}$). The Parseval condition says that the Fourier series of $\xi$ converges to $\xi$ if and only if $\|\xi\|^2 = \sum_1^\infty x_i^2$.

An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is called a basis for a pre-Hilbert space $V$ if every element in $V$ is the sum of its Fourier series.

Theorem 2.3. An infinite orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis for a pre-Hilbert space $V$ if (and only if) its linear span is dense in $V$.

Proof.
Let $\xi$ be any element of $V$, and let $\{x_i\}$ be its sequence of Fourier coefficients. Since the linear span of $\{\varphi_i\}$ is dense in $V$, given any $\epsilon$, there is a finite linear combination $\sum_1^n y_i\varphi_i$ which approximates $\xi$ to within $\epsilon$. But $\sum_1^n x_i\varphi_i$ is the best approximation to $\xi$ in the span of $\{\varphi_i\}_1^n$, by Lemmas 2.3 and 2.1, and so

$\bigl\|\xi - \sum_1^n x_i\varphi_i\bigr\| \le \bigl\|\xi - \sum_1^n y_i\varphi_i\bigr\| < \epsilon$.

That is, $\xi = \sum_1^\infty x_i\varphi_i$. $\square$

Corollary. If $V$ is a Hilbert space, then the orthonormal sequence $\{\varphi_i\}_1^\infty$ is a basis if and only if $\{\varphi_i\}^\perp = \{0\}$.

Proof. Let $M$ be the closure of the linear span of $\{\varphi_i\}$. Since $V = M \oplus M^\perp$, and since $M^\perp = \{\varphi_i\}^\perp$ by Lemma 1.1, we see that $\{\varphi_i\}^\perp = \{0\}$ if and only if $V = M$, and, by the theorem, this holds if and only if $\{\varphi_i\}$ is a basis. $\square$

Note that when orthogonal bases only are being used, the coefficient of a vector $\xi$ at a basis element $\beta$ is always the Fourier coefficient $(\xi, \beta)/\|\beta\|^2$. Thus the $\beta$-coefficient of $\xi$ depends only on $\beta$ and is independent of the choice of the rest of the basis. However, we know from Chapter 2 that when an arbitrary basis containing $\beta$ is being used, then the $\beta$-coefficient of $\xi$ varies with the basis. This partly explains the favored position of orthogonal bases.

We often obtain an orthonormal sequence by "orthogonalizing" some given sequence.

Lemma 2.5. If $\{\alpha_i\}$ is a finite or infinite sequence of independent vectors, then there is an orthonormal sequence $\{\varphi_i\}$ such that $\{\alpha_i\}_1^n$ and $\{\varphi_i\}_1^n$ have the same linear span for all $n$.

Proof. Since normalizing is trivial, we shall only orthogonalize. Suppose, to be definite, that the sequence is infinite, and let $M_n$ be the linear span of $\{\alpha_1, \ldots, \alpha_n\}$. Let $\mu_n$ be the orthogonal projection of $\alpha_n$ on $M_{n-1}$, and set $\varphi_n = \alpha_n - \mu_n$ (and $\varphi_1 = \alpha_1$). This is our sequence. We have $\varphi_i \in M_i \subset M_{n-1}$ if $i < n$, while $\varphi_n = \alpha_n - \mu_n$ is orthogonal to $M_{n-1}$; hence $\varphi_n \perp \varphi_i$ whenever $i < n$. Moreover, $\varphi_n \ne 0$, since otherwise $\alpha_n = \mu_n \in M_{n-1}$, contradicting independence, and an easy induction shows that $\{\varphi_i\}_1^n$ spans $M_n$ for every $n$. $\square$

Now consider, for each $\alpha$ in $V$, the functional $\theta_\alpha$ defined by $\theta_\alpha(\xi) = (\xi, \alpha)$. It is linear, and bounded by the Schwarz inequality, so $\theta_\alpha \in V^*$, and the map $\theta\colon \alpha \mapsto \theta_\alpha$ from $V$ to $V^*$ is linear. Since $\theta_\alpha(\alpha) = \|\alpha\|^2 > 0$ if $\alpha \ne 0$, $\theta$ is injective. Actually, $\theta$ is an isometry, as we shall ask the reader to show in an exercise. If $V$ is finite-dimensional, the injectivity of $\theta$ implies that it is an isomorphism. But we have a much more startling result:

Theorem 2.4.
$\theta$ is an isomorphism if and only if $V$ is a Hilbert space.

Proof. Suppose first that $V$ is a Hilbert space. We have to show that $\theta$ is surjective, i.e., that every nonzero $F$ in $V^*$ is of the form $\theta_\beta$. Given such an $F$, let $N$ be its null space, let $\alpha$ be a nonzero vector orthogonal to $N$ (Theorem 2.1), and consider $\beta = c\alpha$, where $c$ is to be determined later. Every vector $\xi$ in $V$ is uniquely a sum $\xi = x\beta + \eta$, where $\eta$ is in $N$. [This only says that $V/N$ is one-dimensional, which presumably we know, but we can check it directly by applying $F$ and seeing that $F(\xi - x\beta) = 0$ if and only if $x = F(\xi)/F(\beta)$.] But now the equations

$F(\xi) = F(x\beta + \eta) = xF(\beta) = xcF(\alpha)$ and $\theta_\beta(\xi) = (\xi, \beta) = (x\beta + \eta, \beta) = x\|\beta\|^2 = xc^2\|\alpha\|^2$

show that $\theta_\beta = F$ if we take $c = F(\alpha)/\|\alpha\|^2$.

Conversely, if $\theta$ is surjective (and assuming that it is an isometry), then it is an isomorphism in $\mathrm{Hom}(V, V^*)$, and since $V^*$ is complete by Theorem 7.6, Chapter 4, it follows that $V$ is complete by Theorem 7.3 of the same chapter. We are finished. $\square$

EXERCISES

2.1 In the proof of Lemma 2.1, if $(\alpha - \mu, \xi) \ne 0$, what value of $t$ will contradict the inequality $0 \le 2t(\alpha - \mu, \xi) + t^2\|\xi\|^2$?

2.2 Prove the "only if" part of Theorem 2.3.

2.3 Let $\{M_i\}_1^\infty$ be an orthogonal sequence of complete subspaces of a pre-Hilbert space $V$, and let $P_i$ be the (orthogonal) projection on $M_i$. Prove that the sequence of partial sums $\{\sum_1^n P_i(\xi)\}$ is Cauchy for any $\xi$ in $V$.

2.4 Show that the functions $\{\sin nt\}_{n=1}^\infty$ form an orthogonal collection of elements in the pre-Hilbert space $\mathcal{C}([0, \pi])$ with respect to the standard scalar product $(f, g) = \int_0^\pi f(t)g(t)\,dt$. Show also that $\|\sin nt\|_2 = (\pi/2)^{1/2}$.

2.5 Compute the Fourier coefficients of the function $f(t) = t$ in $\mathcal{C}([0, \pi])$ with respect to the above orthogonal set. What then is the best two-norm approximation to $f$ in the two-dimensional space spanned by $\sin t$ and $\sin 2t$? Sketch the graph of this approximating function, indicating its salient features in the usual manner of calculus curve sketching.
2.6 The "step" function $f$ defined by $f(t) = \pi/2$ on $[0, \pi/2]$ and $f(t) = 0$ on $(\pi/2, \pi]$ is of course discontinuous at $\pi/2$. Nevertheless, calculate its Fourier coefficients with respect to $\{\sin nt\}_{n=1}^\infty$ in $\mathcal{C}([0, \pi])$ and graph its best approximation in the span of $\{\sin nt\}$.

2.7 Show that the functions $\{\sin nt\}_{n=1}^\infty \cup \{\cos nt\}_{n=0}^\infty$ form an orthogonal collection of elements in the pre-Hilbert space $\mathcal{C}([-\pi, \pi])$ with respect to the standard scalar product $(f, g) = \int_{-\pi}^\pi f(t)g(t)\,dt$.

2.8 Calculate the first three terms in the orthogonalization of $\{x^n\}_0^\infty$ in $\mathcal{C}([-1, 1])$.

2.9 Use the definition of the norm of a bounded linear transformation and the Schwarz inequality to show that $\|\theta_\beta\| \le \|\beta\|$ [where $\theta_\beta(\xi) = (\xi, \beta)$]. In order to conclude that $\beta \mapsto \theta_\beta$ is an isometry, we also need the opposite inequality, $\|\theta_\beta\| \ge \|\beta\|$. Prove this by using a special value of $\xi$.

2.10 Show that if $V$ is an incomplete pre-Hilbert space, then $V$ has a proper closed subspace $M$ such that $M^\perp = \{0\}$. [Hint: There must exist $F \in V^*$ not of the form $F(\xi) = (\xi, \alpha)$.] Together with Theorem 2.1, this shows that a pre-Hilbert space $V$ is a Hilbert space if and only if $V = M \oplus M^\perp$ for every closed subspace $M$.

2.11 The isometry $\theta\colon \alpha \mapsto \theta_\alpha$ [where $\theta_\alpha(\xi) = (\xi, \alpha)$] imbeds the pre-Hilbert space $V$ in its conjugate space $V^*$. We know that $V^*$ is complete. Why? The closure of $\theta[V]$ as a subspace of $V^*$ is therefore complete, and we can hence complete $V$ as a Banach space. Let $H$ be its completion. It is a Banach space including (the isometric image of) $V$ as a dense subspace. Show that the scalar product on $V$ extends uniquely to $H$ and that the norm on $H$ is the extended scalar product norm, so that $H$ is a Hilbert space.

2.12 Show that under the isometric imbedding $\alpha \mapsto \theta_\alpha$ of a pre-Hilbert space $V$ into $V^*$, orthogonality is equivalent to annihilation as discussed in Section 2.3. Discuss the connection between the properties of the annihilator $A^\circ$ and Lemma 1.1 of this chapter.
2.13 Prove that if $C$ is a nonempty complete convex subset of a pre-Hilbert space $V$, and if $\alpha$ is any vector not in $C$, then there is a unique $\mu \in C$ closest to $\alpha$. (Examine the proof of Lemma 2.2.)

3. SELF-ADJOINT TRANSFORMATIONS

Definition. If $V$ is a pre-Hilbert space, then $T$ in Hom $V$ is self-adjoint if $(T\alpha, \beta) = (\alpha, T\beta)$ for every $\alpha, \beta \in V$. The set of all self-adjoint transformations will be designated SA.

Self-adjointness suggests that $T$ ought to become its own adjoint under the injection $\theta$ of $V$ into $V^*$. We check this now. Since $(\alpha, \beta) = \theta_\beta(\alpha)$, we can rewrite the equation $(T\alpha, \beta) = (\alpha, T\beta)$ as $\theta_\beta(T\alpha) = \theta_{T\beta}(\alpha)$, and again as $(T^*(\theta_\beta))(\alpha) = \theta_{T\beta}(\alpha)$, by the definition of $T^*$. This holds for all $\alpha$ and $\beta$ if and only if $T^*(\theta_\beta) = \theta_{T\beta}$ for all $\beta \in V$, or $T^* \circ \theta = \theta \circ T$, which is the asserted identification.

Lemma 3.1. If $V$ is a finite-dimensional Hilbert space and $\{\varphi_i\}_1^n$ is an orthonormal basis for $V$, then $T \in \mathrm{Hom}(V)$ is self-adjoint if and only if the matrix $\{t_{ij}\}$ of $T$ with respect to $\{\varphi_i\}$ is symmetric.

Proof. If we substitute the basis expansions of $\alpha$ and $\beta$ and expand, we see that $(\alpha, T\beta) = (T\alpha, \beta)$ for all $\alpha$ and $\beta$ if and only if $(\varphi_i, T\varphi_j) = (T\varphi_i, \varphi_j)$ for all $i$ and $j$. But $T\varphi_j = \sum_{k=1}^n t_{kj}\varphi_k$, and when this is substituted in these last scalar products, the equation becomes $t_{ij} = t_{ji}$. That is, $T$ is self-adjoint if and only if its matrix is symmetric. $\square$

A self-adjoint $T$ is said to be nonnegative if $(T\xi, \xi) \ge 0$ for all $\xi$. Then $[\xi, \eta] = (T\xi, \eta)$ is a semiscalar product!

Lemma 3.2. If $T$ is a nonnegative self-adjoint transformation, then $\|T\xi\| \le \|T\|^{1/2}(T\xi, \xi)^{1/2}$ for all $\xi$. Therefore, if $(T\xi, \xi) = 0$, then $T\xi = 0$, and, more generally, if $(T\xi_n, \xi_n) \to 0$, then $T(\xi_n) \to 0$.

Proof. If $T$ is nonnegative as well as self-adjoint, then $[\xi, \eta] = (T\xi, \eta)$ is a semiscalar product, and so, by Schwarz's inequality,

$|(T\xi, \eta)| = |[\xi, \eta]| \le [\xi, \xi]^{1/2}[\eta, \eta]^{1/2} = (T\xi, \xi)^{1/2}(T\eta, \eta)^{1/2}$.

Taking $\eta = T\xi$, the factor on the right becomes $(T(T\xi), T\xi)^{1/2}$, which is less than or equal to $\|T\|^{1/2}\|T\xi\|$, by Schwarz and the definition of $\|T\|$. Dividing by $\|T\xi\|$, we get the inequality of the lemma.
$\square$

If $\alpha \ne 0$ and $T(\alpha) = c\alpha$ for some $c$, then $\alpha$ is called an eigenvector (proper vector, characteristic vector) of $T$, and $c$ is the associated eigenvalue (proper value, characteristic value).

Theorem 3.1. If $V$ is a finite-dimensional Hilbert space and $T$ is a self-adjoint element of Hom $V$, then $V$ has an orthonormal basis consisting entirely of eigenvectors of $T$.

Proof. Consider the function $(T\xi, \xi)$. It is a continuous real-valued function of $\xi$, and on the unit sphere $S = \{\xi : \|\xi\| = 1\}$ it is bounded above by $\|T\|$ (by Schwarz). Set $m = \mathrm{lub}\,\{(T\xi, \xi) : \|\xi\| = 1\}$. Since $S$ is compact (being bounded and closed), $(T\xi, \xi)$ assumes the value $m$ at some point $\alpha$ on $S$. Now $mI - T$ is a nonnegative self-adjoint transformation (check this!), and $(T\alpha, \alpha) = m$ is equivalent to $((mI - T)\alpha, \alpha) = 0$. Therefore, $(mI - T)\alpha = 0$ by Lemma 3.2, and $T\alpha = m\alpha$. We have thus found one eigenvector for $T$. Now set $V_1 = V$, $\alpha_1 = \alpha$, and $m_1 = m$, and let $V_2$ be $\{\alpha_1\}^\perp$. Then $T[V_2] \subset V_2$, for if $\xi \perp \alpha_1$, then $(T\xi, \alpha_1) = (\xi, T\alpha_1) = m(\xi, \alpha_1) = 0$. We can therefore repeat the above argument for the restriction of $T$ to the Hilbert space $V_2$ and find $\alpha_2$ in $V_2$ such that $\|\alpha_2\| = 1$ and $T(\alpha_2) = m_2\alpha_2$, where $m_2 = \mathrm{lub}\,\{(T\xi, \xi) : \|\xi\| = 1$ and $\xi \in V_2\}$. Clearly, $m_2 \le m_1$. We then set $V_3 = \{\alpha_1, \alpha_2\}^\perp$ and continue, arriving finally at an orthonormal basis $\{\alpha_i\}_1^n$ of eigenvectors of $T$. $\square$

Now let $\lambda_1, \ldots, \lambda_k$ be the distinct values in the list $m_1, \ldots, m_n$, and let $M_j$ be the linear span of those basis vectors $\alpha_i$ for which $m_i = \lambda_j$. Then the subspaces $M_j$ are orthogonal to each other, $V = \bigoplus_1^k M_j$, each $M_j$ is $T$-invariant, and the restriction of $T$ to $M_j$ is $\lambda_j$ times the identity. Since all the nonzero vectors in $M_j$ are eigenvectors with eigenvalue $\lambda_j$, if the $\alpha_i$'s spanning $M_j$ are replaced by any other orthonormal basis for $M_j$, then we still have an orthonormal basis of eigenvectors. The $\alpha_i$'s are therefore not in general uniquely determined. But the subspaces $M_j$ and the eigenvalues $\lambda_j$ are unique.
This will follow if we show that every eigenvector is in one of the $M_j$.

Lemma 3.3. In the context of the above discussion, if $\xi \ne 0$ and $T(\xi) = x\xi$ for some $x$ in $\mathbb{R}$, then $\xi \in M_j$ (and so $x = \lambda_j$) for some $j$.

Proof. Since $V = \bigoplus_1^k M_j$, we have $\xi = \sum_1^k \xi_j$ with $\xi_j \in M_j$. Then $T(\xi) = \sum_1^k \lambda_j\xi_j$ and $x\xi = \sum_1^k x\xi_j$, so that $\sum_1^k (\lambda_j - x)\xi_j = 0$. Since the subspaces $M_j$ are independent, every component $(\lambda_j - x)\xi_j$ is 0. But some $\xi_j \ne 0$, since $\xi \ne 0$. Therefore, $x = \lambda_j$, $\xi_i = 0$ for $i \ne j$, and $\xi = \xi_j \in M_j$. $\square$

We have thus proved the following theorem.

Theorem 3.2. If $V$ is a finite-dimensional Hilbert space and $T$ is a self-adjoint element of Hom $V$, then there are uniquely determined subspaces $\{V_j\}_1^k$ of $V$ and distinct scalars $\{\lambda_j\}_1^k$ such that $\{V_j\}$ is an orthogonal family whose sum is $V$ and the restriction of $T$ to $V_j$ is $\lambda_j$ times the identity.

If $V$ is a finite-dimensional vector space and we are given $T \in$ Hom $V$, then we know how to compute related mappings such as $T^2$ and $T^{-1}$ (if it exists) and vectors $T\alpha$, $T^2\alpha$, etc., by choosing a basis for $V$ and then computing matrix products, inverses (when they exist), and so on. Some of these computations, particularly those related to inverses, can be quite arduous. One enormous advantage of a basis consisting of eigenvectors for $T$ is that it trivializes all of these calculations.

To see this, let $\{\beta_i\}$ be a basis of $V$ consisting entirely of eigenvectors for $T$, and let $\{r_i\}$ be the corresponding eigenvalues. To compute $T\xi$, we write down the basis expansion for $\xi$, $\xi = \sum_1^n x_i\beta_i$, and then $T\xi = \sum_1^n r_ix_i\beta_i$. $T^2$ has the same eigenvectors, but with eigenvalues $\{r_i^2\}$. Thus $T^2\xi = \sum_1^n r_i^2x_i\beta_i$. $T^{-1}$ exists if and only if no $r_i = 0$, in which case it has the same eigenvectors with eigenvalues $\{1/r_i\}$. Thus $T^{-1}\xi = \sum_1^n (x_i/r_i)\beta_i$. If $P(t) = \sum_0^m a_jt^j$ is any polynomial, then $P(T)$ takes $\beta_i$ into $P(r_i)\beta_i$. Thus $P(T)\xi = \sum_1^n P(r_i)x_i\beta_i$. By now the point should be amply clear.

The additional value of orthonormality in a basis is already clear from the last section.
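As an illustrative aside (not part of the original text), the eigenbasis calculations just described can be sketched numerically. The matrix, orthonormal eigenbasis, and vector below are invented for the example: $T = \begin{pmatrix}2&1\\1&2\end{pmatrix}$ has eigenvectors $(1,1)/\sqrt2$, $(1,-1)/\sqrt2$ with eigenvalues 3 and 1.

```python
import math

# Eigenbasis calculus for the self-adjoint T = [[2,1],[1,2]]:
# orthonormal eigenvectors b1 = (1,1)/sqrt(2), b2 = (1,-1)/sqrt(2),
# eigenvalues r1 = 3, r2 = 1.
s = 1 / math.sqrt(2.0)
basis = [((s, s), 3.0), ((s, -s), 1.0)]

def dot(x, y):
    return x[0] * y[0] + x[1] * y[1]

def apply_poly(p, xi):
    # P(T)xi = sum_i P(r_i) x_i b_i, with the coefficients x_i = (xi, b_i)
    # computed by scalar products, as orthonormality permits
    out = [0.0, 0.0]
    for b, r in basis:
        c = p(r) * dot(xi, b)
        out[0] += c * b[0]
        out[1] += c * b[1]
    return out

xi = (1.0, 0.0)
print(apply_poly(lambda t: t * t, xi))      # T^2 xi; direct matrix squaring gives (5, 4)
print(apply_poly(lambda t: 1.0 / t, (5.0, 4.0)))   # T^{-1}(5,4); note T(2,1) = (5,4)
```

The same routine handles $T^2$, $T^{-1}$, and any polynomial in $T$ with no matrix multiplication or inversion, which is exactly the point of the paragraph above.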
Basically, it enables us to compute the coefficients $\{x_i\}$ of $\xi$ by scalar products: $x_i = (\xi, \beta_i)$.

This is a good place to say a few words about the general eigenvalue problem in finite-dimensional theory. Our complete analysis above was made possible by the self-adjointness of $T$ (or the symmetry of the matrix $t$). What we can say about an arbitrary $T$ in Hom $V$ is much less satisfactory.

We first note that the eigenvalues of $T$ can be determined algebraically, for $\lambda$ is an eigenvalue if and only if $T - \lambda I$ is not injective, or, equivalently, is singular, and we know that $T - \lambda I$ is singular if and only if its determinant $\Delta(T - \lambda I)$ is 0. If we choose any basis for $V$, the determinant of $T - \lambda I$ is the determinant of its matrix $t - \lambda e$, and our later formula in Chapter 7 shows that this is a polynomial of degree $n$ in $\lambda$. It is easy to see that this polynomial is independent of the basis; it is called the characteristic polynomial of $T$. Thus the eigenvalues of $T$ are exactly the roots of the characteristic polynomial of $T$.

However, $T$ need not have any eigenvectors! Consider, for example, a 90° rotation in the Cartesian plane. This is the map $T\colon \langle x, y\rangle \mapsto \langle -y, x\rangle$. Thus $T(\delta^1) = \delta^2$ and $T(\delta^2) = -\delta^1$, so the matrix of $T$ is

[0  -1]
[1   0]

and the characteristic polynomial of $T$ is the determinant of $t - \lambda e$: $\lambda^2 + 1$. Since this polynomial is irreducible over $\mathbb{R}$, there are no eigenvalues. Note how different the outcome is if we consider the transformation with the same matrix on complex 2-space $\mathbb{C}^2$. Here the scalar field is the complex number system, and $T$ is the map $\langle z_1, z_2\rangle \mapsto \langle -z_2, z_1\rangle$ from $\mathbb{C}^2$ to $\mathbb{C}^2$. But now $\lambda^2 + 1 = (\lambda + i)(\lambda - i)$, and $T$ has eigenvalues $\pm i$! To find the eigenvectors for $i$, we solve $T(z) = iz$, which is the equation $\langle -z_2, z_1\rangle = \langle iz_1, iz_2\rangle$, or $z_1 = iz_2$ and $-z_2 = iz_1$. Thus $\langle 1, -i\rangle$ is the unique eigenvector for $i$ to within a scalar multiple.

We return to our real theory: if $T$ is self-adjoint and $c > 0$, then $T^2 + cI$ cannot be the zero transformation.

3.3
Let $p(t) = t^2 + bt + c$ be an irreducible quadratic polynomial ($b^2 < 4c$), and let $T$ be a self-adjoint transformation. Show that $p(T) \ne 0$. (Complete the square and apply earlier exercises.)

3.4 Let $T$ be self-adjoint and nilpotent ($T^n = 0$ for some $n$). Prove that $T = 0$. This can be done in various ways. One method is to show it first for $n = 2$ and then for $n = 2^m$ by induction. Finally, any $n$ can be bracketed by powers of 2, $2^m \le n < 2^{m+1}$.

3.5 Let $V$ be any vector space, and let $T$ be an element of Hom $V$. Suppose that there is a polynomial $q$ such that $q(T) = 0$, and let $p$ be such a polynomial of minimum degree. Show that $p$ is unique (to within a constant multiple). It is called the minimal polynomial of $T$. Show that if we apply Theorem 5.5 of Chapter 1 to the minimal polynomial $p$ of $T$, then the two subspaces $N_i$ must both be nontrivial.

3.6 It is a corollary of the fundamental theorem of algebra that a polynomial with real coefficients can be factored into a product of linear factors $(t - r)$ and irreducible quadratic factors $(t^2 + bt + c)$. Let $T$ be a self-adjoint transformation on a finite-dimensional Hilbert space, and let $p(t)$ be its minimal polynomial. Deduce a new proof of Theorem 3.1 by applying to $p(t)$ the above remark, Theorem 5.5 of Chapter 1, and Exercises 3.1 through 3.4.

3.7 Prove that if $T$ is a self-adjoint transformation on a pre-Hilbert space $V$, then its null space is the orthogonal complement of its range: $N(T) = (R(T))^\perp$. Conclude that if $V$ is a Hilbert space, then a self-adjoint $T$ is injective if and only if its range is dense (in $V$).

3.8 Assuming the above exercise, show that if $V$ is a Hilbert space and $T$ is a self-adjoint element of Hom $V$ that is bounded below (as well as bounded), then $T$ is surjective.
3.9 Let T be self-adjoint and nonnegative, and set m = lub {(Tξ, ξ) : ||ξ|| = 1}. Use the Schwarz inequality and the inequality of Lemma 3.2 to show that m = ||T||.

3.10 Let V be a Hilbert space, let T be a self-adjoint element of Hom V, and set m = lub {(Tξ, ξ) : ||ξ|| = 1}. Show that if a > m, then a − T (= aI − T) is invertible and ||(a − T)⁻¹|| ≤ 1/(a − m). (Apply the Schwarz inequality, the definition of m, and Exercise 3.8.)

3.11 Let P be a bounded linear transformation on a pre-Hilbert space V that is a projection in the sense of Chapter 1. Prove that if P is self-adjoint, then P is an orthogonal projection. Now prove the converse.

3.12 Let V be a finite-dimensional Hilbert space, let T in Hom V be self-adjoint, and suppose that S in Hom V commutes with T. Show that the subspaces V_i of Theorem 3.1 and Lemma 3.3 are invariant under S.

3.13 A self-adjoint transformation T on a finite-

4. ORTHOGONAL TRANSFORMATIONS

Since φ: V → V* is an isomorphism, we can of course replace the adjoint T* ∈ Hom V* of any T ∈ Hom V by the corresponding transformation φ⁻¹ ∘ T* ∘ φ ∈ Hom V. In Hilbert space theory it is this mapping that is called the adjoint of T and is designated T*. Then, exactly as in our discussion of a self-adjoint T, we see that (Tα, β) = (α, T*β) for all α, β ∈ V, and that T* is uniquely defined by this identity. Finally, T is self-adjoint if and only if T = T*.

Although it really amounts to the above way of introducing T* into Hom V, we can make a direct definition as follows. For each η the mapping ξ ↦ (Tξ, η) is linear and bounded, and so is an element of V*, which, by Theorem 2.4, is given by a unique element β_η in V according to the formula (Tξ, η) = (ξ, β_η). Now we check that η ↦ β_η is linear and bounded and is therefore an element of Hom V, which we call T*, etc.

The matrix calculations of Lemma 3.1 generalize verbatim to show that the matrix of T* in Hom V is the transpose t* of the matrix t of T.

Another very important type of transformation on a Hilbert space is one that preserves the scalar product.

Definition.
A transformation T ∈ Hom V is orthogonal if (Tα, Tβ) = (α, β) for all α, β ∈ V.

By the basic adjoint identity above this is entirely equivalent to (α, T*Tβ) = (α, β) for all α, β, and hence to T*T = I. An orthogonal T is injective, since ||Tα||² = ||α||², and is therefore invertible if V is finite-dimensional. Whether V is finite-dimensional or not, if T is invertible, then the above condition becomes T* = T⁻¹.

If T ∈ Hom Rⁿ, the matrix form of the equation T*T = I is of course t*t = e, and if this is written out, it becomes

Σ_k t_ki t_kj = δ_ij   for all i, j,

which simply says that the columns of t form an orthonormal set (and hence a basis) in Rⁿ. We thus have:

Theorem 4.1. A transformation T ∈ Hom Rⁿ is orthogonal if and only if the image of the standard basis {δⁱ}₁ⁿ under T is another orthonormal basis (with respect to the standard scalar product).

The necessity of this condition is, of course, obvious from the scalar-product-preserving definition of orthogonality, and the sufficiency can also be checked directly using the basis expansions of α and β.

We can now state the eigenbasis theorem in different terms. By a diagonal matrix we mean a matrix which is zero everywhere except on the main diagonal.

Theorem 4.2. Let t = {t_ij} be a symmetric n × n matrix. Then there exists an orthogonal n × n matrix b such that b⁻¹tb is a diagonal matrix.

Proof. Since the transformation T ∈ Hom Rⁿ defined by t is self-adjoint, there exists an orthonormal basis {bʲ}₁ⁿ of eigenvectors of T, with corresponding eigenvalues {r_j}₁ⁿ. Let B be the orthogonal transformation defined by B(δʲ) = bʲ, j = 1, ..., n. (The n-tuples bʲ are the columns of the matrix b = {b_ij} of B.) Then (B⁻¹ ∘ T ∘ B)(δʲ) = r_jδʲ. Since (B⁻¹ ∘ T ∘ B)(δʲ) is the jth column of b⁻¹tb, we see that s = b⁻¹tb is diagonal, with s_jj = r_j. □

For later applications we are also going to want the following result.

Theorem 4.3.
Any invertible T ∈ Hom V on a finite-dimensional Hilbert space V can be expressed in the form T = RS, where R is orthogonal and S is self-adjoint and positive.

Proof. For any T, T*T is self-adjoint, since (T*T)* = T*T** = T*T. Let {φ_i}₁ⁿ be an orthonormal eigenbasis, and let {r_i} be the corresponding eigenvalues of T*T. Then 0 < ||Tφ_i||² = (T*Tφ_i, φ_i) = (r_iφ_i, φ_i) = r_i for each i. Since all the eigenvalues of T*T are thus positive, we can define a positive square root S = (T*T)^(1/2) by Sφ_i = (r_i)^(1/2)φ_i, i = 1, 2, ..., n. It is clear that S² = T*T and that S is self-adjoint. Then A = S ∘ T⁻¹ is orthogonal, for

(ST⁻¹α, ST⁻¹β) = (T⁻¹α, S²T⁻¹β) = (T⁻¹α, T*TT⁻¹β) = (T⁻¹α, T*β) = (TT⁻¹α, β) = (α, β).

Since T = A⁻¹S, we set R = A⁻¹ and have the theorem. □

It is not hard to see that the above factorization of T is unique. Also, by starting with TT*, we can express T in the form T = SR, where S is self-adjoint and positive and R is orthogonal. We call these factorizations the polar decompositions of T, since they function somewhat like the polar coordinate factorization z = re^{iθ} of a complex number.

Corollary. Any nonsingular n × n matrix t can be expressed as t = udv, where u and v are orthogonal and d is diagonal.

Proof. From the theorem we have t = rs, where r is orthogonal and s is symmetric and positive. By Theorem 4.2, s = bdb⁻¹, where d is diagonal and b is orthogonal. Then t = udv, where u = rb and v = b⁻¹ are both orthogonal. □

EXERCISES

4.1 Let V be a Hilbert space, and suppose that S and T in Hom V satisfy (Tξ, η) = (ξ, Sη) for all ξ, η. Write out the proof of the identity S = φ⁻¹ ∘ T* ∘ φ.

4.2 Write out the analogue of the proof of Lemma 3.1 which shows that the matrix of T* is the transpose of the matrix of T.

4.3 Once again show that if (ξ, η) = (ξ, ζ) for all ξ, then η = ζ. Conclude that if S, T in Hom V are such that (ξ, Tη) = (ξ, Sη) for all ξ and η, then T = S.

4.4 Let {a, b} be an orthonormal basis for R², and let t be the 2 × 2 matrix whose columns are a and b. Show by direct calculation that the rows of t are also orthonormal.

4.5
State again why it is that if V is finite-dimensional, and if S and T in Hom V satisfy S ∘ T = I, then T is invertible and S = T⁻¹. Now let V be a finite-dimensional Hilbert space, and let T be an orthogonal transformation in Hom V. Show that T* is also orthogonal.

4.6 Let t be an n × n matrix whose columns form an orthonormal basis for Rⁿ. Prove that the rows of t also form an orthonormal basis. (Apply the above exercise.)

4.7 Show that a nonnegative self-adjoint transformation S on a finite-dimensional Hilbert space has a uniquely determined nonnegative self-adjoint square root.

4.8 Prove that if V is a finite-dimensional Hilbert space and T ∈ Hom V, then the "polar decomposition" T = RS of Theorem 4.3 is unique. (Apply the above exercise.)

5. COMPACT TRANSFORMATIONS

Theorem 3.1 breaks down when V is an infinite-dimensional Hilbert space. A self-adjoint transformation T does not in general have enough eigenvectors to form a basis for V, and a more sophisticated analysis, allowing for a "continuous spectrum" as well as a "discrete spectrum", is necessary. This enriched situation is the reason for the need for further study of Hilbert space theory at the graduate level, and is one of the sources of complexity in the mathematical structure of quantum mechanics.

However, there is one very important special case in which the eigenbasis theorem is available, and which will have a startling application in the next chapter.

Definition. Let V and W be any normed linear spaces, and let S be the unit ball in V. A transformation in Hom(V, W) is compact if the closure of T[S] in W is sequentially compact.

Theorem 5.1. Let V be any pre-Hilbert space, and let T ∈ Hom V be self-adjoint and compact. Then the pre-Hilbert space R = range (T) has an orthonormal basis {φ_i} consisting entirely of eigenvectors of T, and the corresponding sequence of eigenvalues {r_i} converges to 0 (or is finite).

Proof.
The proof is just like that of Theorem 3.1 except that we have to start a little differently. Set m = ||T|| = lub {||T(ξ)|| : ||ξ|| = 1}, and choose a sequence {ξ_n} such that ||ξ_n|| = 1 for all n and ||T(ξ_n)|| → m. Then

((m² − T²)ξ_n, ξ_n) = m² − ||T(ξ_n)||² → 0,

and since m² − T² is a nonnegative self-adjoint transformation, Lemma 3.2 tells us that (m² − T²)(ξ_n) → 0. But since T is compact, we can suppose (passing to a subsequence if necessary) that {T(ξ_n)} converges, say to β. Then T²ξ_n → Tβ, and so m²ξ_n → Tβ also. Thus ξ_n → Tβ/m² and β = lim Tξ_n = T²(β)/m². Since ||β|| = lim ||T(ξ_n)|| = m ≠ 0, we have a nonzero vector β such that T²(β) = m²β. Set α = β/||β||.

We have thus found a vector α such that ||α|| = 1 and 0 = (m² − T²)(α) = (m − T)(m + T)(α). Then either (m + T)(α) = 0, in which case T(α) = −mα, or ψ = (m + T)(α) ≠ 0 and (m − T)ψ = 0, in which case Tψ = mψ. Thus there exists a vector φ₁ (either α or ψ/||ψ||) such that ||φ₁|| = 1 and T(φ₁) = r₁φ₁, where |r₁| = m. We now proceed just as in Theorem 3.1.

For notational consistency we set m₁ = m, V₁ = V, and now set V₂ = {φ₁}⊥. Then T[V₂] ⊂ V₂, since if α ⊥ φ₁, then (Tα, φ₁) = (α, Tφ₁) = r₁(α, φ₁) = 0. Thus T | V₂ is compact and self-adjoint, and if m₂ = ||T | V₂||, there exists φ₂ with ||φ₂|| = 1 and T(φ₂) = r₂φ₂, where |r₂| = m₂. We continue inductively, obtaining an orthonormal sequence {φ_n} ⊂ V and a sequence {r_n} ⊂ R such that Tφ_n = r_nφ_n and |r_n| = ||T | V_n||, where V_n = {φ₁, ..., φ_{n−1}}⊥.

We suppose for the moment, since this is the most interesting case, that r_n ≠ 0 for all n. Then we claim that |r_n| → 0. For |r_n| is decreasing in any case, and if it does not converge to 0, then there exists a b > 0 such that |r_n| > b for all n. Then

||T(φ_i) − T(φ_j)||² = ||r_iφ_i − r_jφ_j||² = |r_i|² + |r_j|² > 2b²

for all i ≠ j, and the sequence {T(φ_n)} can have no convergent subsequence, contradicting the compactness of T. Therefore |r_n| → 0.

Finally, we have to show that the orthonormal sequence {φ_n} is a basis for R.
If β = T(α), and if {b_n} and {a_n} are the Fourier coefficients of β and α, then we expect that b_n = r_n a_n, and this is easy to check:

b_n = (β, φ_n) = (T(α), φ_n) = (α, T(φ_n)) = (α, r_nφ_n) = r_n(α, φ_n) = r_n a_n.

This is just saying that T(a_nφ_n) = b_nφ_n, and therefore

β − Σ₁ⁿ b_iφ_i = T(α − Σ₁ⁿ a_iφ_i).

Now α − Σ₁ⁿ a_iφ_i is orthogonal to {φ_i}₁ⁿ and therefore is an element of V_{n+1}, and the norm of T on V_{n+1} is |r_{n+1}|. Moreover, ||α − Σ₁ⁿ a_iφ_i|| ≤ ||α||, by the Pythagorean theorem. Altogether we can conclude that

||β − Σ₁ⁿ b_iφ_i|| ≤ |r_{n+1}| ||α||,

and since r_{n+1} → 0, this implies that β = Σ₁^∞ b_iφ_i. Thus {φ_i} is a basis for R(T). Also, since T is self-adjoint, N(T) = R(T)⊥ = {φ_i}⊥ = ∩₁^∞ V_i.

If some r_i = 0, then there is a first n such that r_n = 0. In this case ||T | V_n|| = |r_n| = 0, so that V_n ⊂ N(T). But φ_i ∈ R(T) if i < n, since then φ_i = T(φ_i)/r_i, and so N(T) = R(T)⊥ ⊂ {φ₁, ..., φ_{n−1}}⊥ = V_n. Therefore, N(T) = V_n and R(T) is the span of {φ_i}₁^{n−1}. □

CHAPTER 6

DIFFERENTIAL EQUATIONS

This chapter is not a small differential equations textbook; we leave out far too much. We are principally concerned with some of the theory of the subject, although we shall say one or two practical things. Our first goal is the fundamental existence and uniqueness theorem of ordinary differential equations, which we prove as an elegant application of the fixed-point theorem. Next we look at the linear theory, where we make vital use of material from the first two chapters and get quite specific about the process of actually finding solutions. So far our development is linked to the initial-value problem, concerning the existence of, and in some cases the ways of finding, a unique solution passing through some initially prescribed point in the space containing the solution curves. However, some of the most important aspects of the subject relate to what are called boundary-value problems, and our last and most sophisticated effort will be directed toward making a first step into this large area.
This will involve us in the theory of Chapter 5, for we shall find ourselves studying self-adjoint operators. In fact, the basic theorem about Fourier series expansions will come out of recognizing a certain right inverse of a differential operator to be a compact self-adjoint operator.

1. THE FUNDAMENTAL THEOREM

Let A be an open subset of a Banach space W, let I be an open interval in R, and let F: I × A → W be continuous. We want to study the differential equation dα/dt = F(t, α). A solution of this equation is a function f: J → A, where J is an open subinterval of I, such that f′(t) exists and f′(t) = F(t, f(t)) for every t in J. Note that a solution f has to be continuously differentiable, for the existence of f′ implies the continuity of f, and then f′(t) = F(t, f(t)) is continuous by the continuity of F.

We are going to see that if F has a continuous second partial differential, then there exists a uniquely determined "local" solution through any point <t₀, α₀> ∈ I × A. In saying that the solution f goes through <t₀, α₀>, we mean, of course, that α₀ = f(t₀). The requirement that the solution f have the value α₀ when t = t₀ is called an initial condition.

The existence and continuity of dF²_{<t,α>} implies, via the mean-value theorem, that F(t, α) is locally uniformly Lipschitz in α. By this we mean that for any point <t₀, α₀> in I × A there is a neighborhood M × N and a constant b such that ||F(t, ξ) − F(t, η)|| ≤ b||ξ − η|| for all t in M and all ξ, η in N. To see this we simply choose balls M and N about t₀ and α₀ such that dF²_{<t,α>} is bounded, say by b, on M × N, and apply Theorem 7.4 of Chapter 3. This is the condition that we actually use below.

Theorem 1.1. Let A be an open subset of a Banach space W, let I be an open interval in R, and let F be a continuous mapping from I × A to W which is locally uniformly Lipschitz in its second variable.
Then for any point <t₀, α₀> in I × A, for some neighborhood U of α₀ and for any sufficiently small interval J containing t₀, there is a unique function f from J to U which is a solution of the differential equation passing through the point <t₀, α₀>.

Proof. If f is a solution through <t₀, α₀>, then an integration gives

f(t) − f(t₀) = ∫_{t₀}^t F(s, f(s)) ds,

so that

f(t) = α₀ + ∫_{t₀}^t F(s, f(s)) ds   for t ∈ J.

Conversely, if f satisfies this "integral equation", then the fundamental theorem of the calculus implies that f′(t) exists and equals F(t, f(t)) on J, so that f is a solution of the differential equation which clearly goes through <t₀, α₀>.

Now for any continuous f: J → A we can define g: J → W by

g(t) = α₀ + ∫_{t₀}^t F(s, f(s)) ds,

and our argument above shows that f is a solution of the differential equation if and only if f is a fixed point of the mapping K: f ↦ g. This suggests that we try to show that K is a contraction, so that we can apply the fixed-point theorem.

We start by choosing a neighborhood L × U of <t₀, α₀> on which F(t, α) is bounded and Lipschitz in α uniformly over t. Let J be some open subinterval of L containing t₀, and let V be the Banach space B_C(J, W) of bounded continuous functions from J to W. Our later calculation will show how small we have to take J. We assume that the neighborhood U is a ball about α₀ of radius r, and we consider the ball of functions 𝔘 = B_r(ᾱ₀) in V, where ᾱ₀ is the constant function with value α₀. Then any f in 𝔘 has its range in U, so that F(t, f(t)) is defined, bounded, and continuous. That is, K as defined earlier maps the ball 𝔘 into V.

We now calculate. Let F be bounded by m on L × U and let δ be the length of J. Then

||K(ᾱ₀) − ᾱ₀||_∞ = lub {||∫_{t₀}^t F(s, α₀) ds|| : t ∈ J} ≤ δm    (1)

by the norm inequality for integrals (see Section 10 of Chapter 4).
Also, if f₁ and f₂ are in 𝔘, and if c is a Lipschitz constant for F on L × U, then

||K(f₁) − K(f₂)||_∞ = lub {||∫_{t₀}^t [F(s, f₁(s)) − F(s, f₂(s))] ds||}
                    ≤ δ lub {||F(s, f₁(s)) − F(s, f₂(s))||}
                    ≤ δc lub {||f₁(s) − f₂(s)||} = δc||f₁ − f₂||_∞.    (2)

From (2) we see that K is a contraction with constant C = δc if δc < 1, and from (1) we see that K moves the center ᾱ₀ of the ball 𝔘 a distance less than (1 − C)r if δm < (1 − δc)r. This double requirement on δ is equivalent to

δ < r/(m + rc),

and with any such δ the theorem follows from a corollary of the fixed-point theorem (Corollary 2 of Theorem 9.1, Chapter 4). □

Corollary. The theorem holds if F: I × A → W is continuous and has a continuous second partial differential.

We next show that any two solutions through <t₀, α₀> must agree on the intersection of their domains (under the hypotheses of Theorem 1.1).

Lemma 1.1. Let g₁ and g₂ be any two solutions of dα/dt = F(t, α) through <t₀, α₀>. Then g₁(t) = g₂(t) for all t in the intersection J = J₁ ∩ J₂ of their domains.

Proof. Otherwise there is a point s in J such that g₁(s) ≠ g₂(s). Suppose that s > t₀, and set

C = {t : t > t₀ and g₁(t) ≠ g₂(t)}   and   x = glb C.

The set C is open, since g₁ and g₂ are continuous, and therefore x is not in C. That is, g₁(x) = g₂(x). Call this common value α and apply the theorem to <x, α>. With r such that B_r(α) ⊂ A, we choose δ small enough so that the differential equation has a unique solution g from (x − δ, x + δ) to B_r(α) passing through <x, α>, and we also take δ small enough so that the restrictions of g₁ and g₂ to (x − δ, x + δ) have ranges in B_r(α). But then g₁ = g₂ = g on this interval by the uniqueness of g, and this contradicts the definition of x. Therefore, g₁ = g₂ on the intersection of their domains. □

This lemma allows us to remove the restriction on the range of f in the theorem.

Theorem 1.2. Let A, I, and F be as in Theorem 1.1. Then for any point <t₀, α₀> in I × A and any sufficiently small interval neighborhood J of t₀, there is a unique solution from J to A passing through <t₀, α₀>.
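The fixed point of K is found in practice by iterating K, and for simple right-hand sides the iterates can be computed exactly. As a sketch of this machinery (our own illustration in plain Python, not from the text), we iterate K for the scalar equation dα/dt = α with α(0) = 1, so that F(t, α) = α; each iterate is a polynomial, and the sequence converges to the solution e^t.

```python
from fractions import Fraction

def K(poly):
    """One iteration step for da/dt = a with a(0) = 1:
    (Kf)(t) = 1 + integral_0^t f(s) ds, with f given by its
    polynomial coefficient list [c0, c1, c2, ...]."""
    integral = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(poly)]
    integral[0] = Fraction(1)   # the initial value alpha_0 = 1
    return integral

f = [Fraction(1)]               # f0: the constant function alpha_0 = 1
for _ in range(4):
    f = K(f)

# After four steps we have the degree-4 Taylor polynomial of e^t:
assert f == [Fraction(1), Fraction(1), Fraction(1, 2),
             Fraction(1, 6), Fraction(1, 24)]
```

Each application of K reproduces the existing coefficients and appends one more term of the exponential series, which is the convergence to the fixed point promised by the contraction argument.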
Global solutions. The solutions we have found for the differential equation dα/dt = F(t, α) are defined only in sufficiently small neighborhoods of the initial point t₀ and are accordingly called local solutions. Now if we run along to a point <t₁, α₁> on a local solution through <t₀, α₀> and take a second local solution through <t₁, α₁>, first of all it will have to agree with our first solution on the intersection of the two domains, and secondly it will in general extend farther beyond t₁ than the first solution, so the two local solutions will fit together to make a solution on a larger interval than either gives separately. We can continue in this way to extend our original solution to what might be called a global solution, made up of a patchwork of matching local solutions. These notions are somewhat vague as described above, and we now turn to a more precise construction of a global solution.

Given <t₀, α₀> ∈ I × A, let 𝔖 be the family of all solutions through <t₀, α₀>. Thus g ∈ 𝔖 if and only if g is a solution on an interval J_g ⊂ I, t₀ ∈ J_g, and g(t₀) = α₀. Lemma 1.1 shows exactly that the union† f of all the functions g in 𝔖 is itself a function, for if <t, α₁> ∈ g₁ and <t, α₂> ∈ g₂, then α₁ = g₁(t) = g₂(t) = α₂. Moreover, f is a solution, because around any x in its domain f agrees with some g ∈ 𝔖. By the way f was defined we see that f is the unique maximal solution through <t₀, α₀>. We have thus proved the following theorem.

Theorem 1.3. Let F: I × A → W be a function satisfying the hypotheses of Theorem 1.1. Then through each <t₀, α₀> in I × A there is a uniquely determined maximal solution to the differential equation dα/dt = F(t, α).

In general, we would have to expect a maximal solution to "run into the boundary of A" and therefore to have a domain interval J properly included in I, as Fig. 6.1 suggests.
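A standard concrete instance of a maximal solution with domain smaller than I (our example, not the text's): take W = A = R, I = R, and F(t, α) = α². Separating variables,

\[
\frac{d\alpha}{dt} = \alpha^2, \qquad \alpha(0) = 1
\quad\Longrightarrow\quad
f(t) = \frac{1}{1 - t},
\]

so the maximal solution through <0, 1> has domain J = (−∞, 1), properly included in I = R. Here A has no boundary to run into; instead f(t) → ∞ as t → 1⁻, so the solution leaves every bounded set. Note that F(t, α) = α² is locally, but not globally, Lipschitz in α, which is why this does not contradict Theorem 1.4 below.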
† Remember that we are taking a function to be a set of ordered pairs, so that the union of a family of functions makes precise sense.

However, if A is the whole space W, and if F(t, α) is Lipschitz in α for each t, with a Lipschitz bound c(t) that is continuous in t, then we can show that each maximal solution is over the whole of I. We shall shortly see that this condition is a natural one for the linear equation.

Theorem 1.4. Let W be a Banach space, and let I be an open interval in R. Let F: I × W → W be continuous, and suppose that there is a continuous function c: I → R such that

||F(t, α₁) − F(t, α₂)|| ≤ c(t)||α₁ − α₂||

for all t in I and all α₁, α₂ in W. Then each maximal solution to the differential equation dα/dt = F(t, α) has the whole of I for its domain.

Proof. Suppose, on the contrary, that g is a maximal solution whose domain interval J has right-hand endpoint b less than that of I. We choose a finite open interval L containing b and such that L̄ ⊂ I (see Fig. 6.2). Since L̄ is compact, the continuous function c(t) has a maximum value c on L̄. We choose any t₁ in L ∩ J close enough to b so that b − t₁ < 1/c, and we set α₁ = g(t₁) and m = max ||F(t, α₁)|| on L̄. With these values of c and m, and with any r, the proof of Theorem 1.1 gives us a local solution f through <t₁, α₁> with domain (t₁ − δ, t₁ + δ) for any δ less than r/(m + rc) = 1/(c + (m/r)). Since we now have no restriction on r (because A = W), this bound on δ becomes 1/c, and since we chose t₁ so that t₁ + (1/c) > b, we can now choose δ so that t₁ + δ > b. But this gives us a contradiction; the maximal solution g through <t₁, α₁> includes the local solution f, so that, in particular, t₁ + δ ≤ b. We have thus proved the theorem. □

Fig. 6.2

Going back to our original situation, we can conclude that if the Lipschitz control of F is of the stronger type assumed above, and if the domain J of some maximal solution g is less than I, then the open set A cannot be the whole of W.
It is in fact true that the distance from g(t) to the boundary of A approaches zero as t approaches an endpoint b of J which is interior to I. That is, it is now a theorem that ρ(g(t), A′) → 0 as t → b. The proof is more complicated than our argument above, and we leave it as a set of exercises for the interested reader.

The nth-order equation. Let A₁, A₂, ..., A_n be open subsets of a Banach space W, let I be an open interval in R, and let G: I × A₁ × A₂ × ⋯ × A_n → W be continuous. We consider the differential equation

dⁿα/dtⁿ = G(t, α, dα/dt, ..., dⁿ⁻¹α/dtⁿ⁻¹).

A function f: J → A₁ is a solution to this equation if J is an open subinterval of I, f has continuous derivatives on J up to the nth order, f^(i−1)[J] ⊂ A_i for i = 1, ..., n, and

f^(n)(t) = G(t, f(t), f′(t), ..., f^(n−1)(t))   for t ∈ J.

An initial condition is now given by a point

<t₀, β₁, β₂, ..., β_n> ∈ I × A₁ × ⋯ × A_n.

The basic theorem is almost the same as before. To simplify our notation, let β be the n-tuple <β₁, ..., β_n> in Wⁿ = V, and set A = Π₁ⁿ A_i. Also let ψ be the mapping f ↦ <f, f′, ..., f^(n−1)>. Then the solution equation becomes f^(n)(t) = G(t, ψf(t)).

Theorem 1.5. Let G: I × A → W be as above and suppose, in addition, that G(t, α) is locally uniformly Lipschitz in α. Then for any <t₀, β> in I × A and for any sufficiently small open interval J containing t₀, there is a unique function f from J to W such that f is a solution to the above nth-order equation satisfying the initial condition ψf(t₀) = β.

Proof. There is an ancient and standard device for reducing a single nth-order equation to a system of first-order equations. The idea is to replace the single equation

dⁿα/dtⁿ = G(t, α, dα/dt, ..., dⁿ⁻¹α/dtⁿ⁻¹)

by the system of equations

dα₁/dt = α₂,
dα₂/dt = α₃,
⋮
dα_{n−1}/dt = α_n,
dα_n/dt = G(t, α₁, ..., α_n),

and then to recognize this system as equivalent to a single first-order equation on a different space.
In fact, if we define the mapping F = <F¹, ..., Fⁿ> from I × A to V = Wⁿ by setting

Fⁱ(t, α) = α_{i+1}   for i = 1, ..., n − 1,   and   Fⁿ(t, α) = G(t, α),

then the above system becomes the single equation dα/dt = F(t, α), where F is clearly locally uniformly Lipschitz in α. Now a function f = <f₁, ..., f_n> from J to V is a solution of this equation if and only if

f′_i = f_{i+1}   for i = 1, ..., n − 1,   and   f′_n(t) = G(t, f₁(t), ..., f_n(t)),

that is, if and only if f₁ has derivatives up to order n, ψ(f₁) = <f₁, ..., f_n>, and f₁^(n)(t) = G(t, ψf₁(t)). The n-tuplet initial condition ψf₁(t₀) = β is now just f(t₀) = β. Thus the nth-order theorem for G has turned into the first-order theorem for F, and so follows from Theorems 1.1 and 1.2. □

The local solution through <t₀, β> extends to a unique maximal solution by Theorem 1.3 applied to our first-order problem dα/dt = F(t, α), and the domain of the maximal solution is the whole of I if G(t, α) is Lipschitz in α with a bound c(t) that is continuous and if A = Wⁿ, as in Theorem 1.4.

EXERCISES

1.1 Consider the equation dα/dt = F(t, α) in the special case where W = R². Write out the equation as a pair of equations involving real-valued functions and real variables.

1.2 Consider the system of differential equations

dx/dt = t + x² + y²,
dy/dt = cos xy.

Define the function F: R³ → R² so that the above system becomes dα/dt = F(t, α), where α = <x, y>.

1.3 In the above exercise show that F is uniformly Lipschitz in α on R × A, where A is any bounded open set in R². Is F uniformly Lipschitz on R × R²?

1.4 Write out the above system in terms of a solution function f = <f₁, f₂>. Write out for this system the integrated form used in proving Theorem 1.1.

1.5 The fixed-point theorem iteration sequence that we used in proving Theorem 1.1 starts off with f₀ as the constant function α₀ and then proceeds by

f_{n+1}(t) = α₀ + ∫_{t₀}^t F(s, f_n(s)) ds.

Compute this sequence as far as f₄ for the differential equation

f′(t) = t + f(t)

with the initial condition f(0) = 0.
That is, take f₀ = 0 and compute f₁, f₂, f₃, and f₄ from the formula. Now guess the solution f and verify it.

1.6 Compute the iterates f₀, f₁, and f₂ for the initial-value problem

dy/dx = x + y²,   y(0) = 0.

Supposing that the solution f has a power series expansion about 0, what are its first three nonzero terms?

1.7 Make the computation in the above exercise for the initial condition f(0) = 1.

1.8 Do the same for the initial condition f(0) = −1.

1.9 Suppose that W is a Banach space and that F and G are functions from R × W⁴ to W satisfying suitable Lipschitz conditions. Show how the second-order system

ξ₁″ = F(t, ξ₁, ξ₂, ξ₁′, ξ₂′),
ξ₂″ = G(t, ξ₁, ξ₂, ξ₁′, ξ₂′)

would be brought under our standard theory by making it into a single second-order equation.

1.10 Answer the above exercise by converting it to a first-order system and then to a single first-order equation.

1.11 Let θ be a nonnegative, continuous, real-valued function defined on an interval [0, a] ⊂ R, and suppose that there are constants b and c > 0 such that

θ(t) ≤ c ∫₀ᵗ θ(s) ds + bt

for all t ∈ [0, a].

a) Prove by induction that if m = ||θ||_∞, then

θ(t) ≤ m(ct)ⁿ/n! + (b/c)[ct + (ct)²/2! + ⋯ + (ct)ⁿ/n!]

for all n.

b) Conclude that θ(t) ≤ (b/c)(e^{ct} − 1) on [0, a].

1.12 Let f be a solution of dα/dt = F(t, α) through <t₀, α₀>, and set θ(x) = ||f(t₀ + x) − α₀||. Prove that

θ(x) ≤ c ∫₀ˣ θ(s) ds + bx

for x ≥ 0 and t₀ + x in I. Then use the result in the above exercise to derive a much stronger bound than we have in the text on the growth of the solution f(t) as t goes away from t₀.

1.13 With the hypotheses on F as in the above exercise, show that the iteration sequence for the solution through <t₀, α₀> converges on the whole of I, by showing inductively that if f₀ = ᾱ₀ and

f_{n+1}(t) = α₀ + ∫_{t₀}^t F(s, f_n(s)) ds,

then

||f_{n+1}(t) − f_n(t)|| ≤ bcⁿ|t − t₀|ⁿ⁺¹/(n + 1)!.

From these inequalities prove directly that the solution f through <t₀, α₀> satisfies

||f(t) − α₀|| ≤ (b/c)(e^{c|t−t₀|} − 1).

2. DIFFERENTIABLE DEPENDENCE ON PARAMETERS

It is exceedingly important in some applications to know how the solution to the system

f′(t) = F(t, f(t)),   f(t₀) = α₀

varies with the initial point <t₀, α₀>.
In order to state the problem precisely, we fix an open interval J, set 𝔘 = B_r(ᾱ₀) ⊂ V = B_C(J, W) as in the previous section, and require a solution in 𝔘 passing through <t₁, α₁>, where <t₁, α₁> is near <t₀, α₀>. Supposing that a unique solution f exists, we then have a mapping <t₁, α₁> ↦ f, and it is the continuity and differentiability of this map that we wish to study.

Theorem 2.1. Let L × U be a neighborhood of <t₀, α₀> on which F is bounded and uniformly Lipschitz in its second variable, as in the proof of Theorem 1.1. Then there is a neighborhood J × N of <t₀, α₀> with the following property: for any <t₁, α₁> in J × N there is a unique function f from J to U which is a solution of the differential equation dα/dt = F(t, α) passing through <t₁, α₁>, and the mapping <t₁, α₁> ↦ f from J × N to V is continuous.

Proof. We simply reexamine the calculation of Theorem 1.1 and take δ a little smaller. Let K(t₁, α₁, f) be the mapping of that theorem but with initial point <t₁, α₁>, so that g = K(t₁, α₁, f) if and only if

g(t) = α₁ + ∫_{t₁}^t F(s, f(s)) ds

for all t in J. Clearly K is continuous in <t₁, α₁> for each fixed f. If N is the ball B_{r/2}(α₀), then the inequality (1) in the proof shows that

||K(t₁, α₁, ᾱ₀) − ᾱ₀||_∞ ≤ ||α₁ − α₀|| + δm < r/2 + δm.

The second inequality remains unchanged. Therefore, f ↦ K(t₁, α₁, f) is a map from 𝔘 to V which is a contraction with constant C = δc if δc < 1, and which moves the center ᾱ₀ of 𝔘 a distance less than (1 − C)r if r/2 + δm < (1 − δc)r. This new double requirement on δ is equivalent to

δ < r/(2(m + rc)),

which is just half the old value. With J of length δ, we can now apply Theorem 9.2 of Chapter 4 to the map K(t₁, α₁, f) from (J × N) × 𝔘 to V, and so have our theorem. □

If we want the map <t₁, α₁> ↦ f to be differentiable, it is sufficient, by Theorem 9.4 of Chapter 4, to know in addition to the above that K: (J × N) × 𝔘 → V is continuously differentiable. And to deduce this, it is sufficient to suppose that dF² exists and is uniformly continuous on L × U.

Theorem 2.2. Let L × U be a neighborhood of <t₀, α₀> in the Banach space R × W, and let F(t, α) be a bounded mapping from L × U to W such that dF² exists, is bounded, and is uniformly continuous on L × U. Then, in
Thea, in 62 DIFFERENTIABLE DEPENDENCE ON PARAMETENS 275 the context of the above theorem, the solution f is a continuously differ- entiable function of the initial value <4, a> Proof. We have to show that the map K(t), a1, f) from (J x N) x Uto V ison tinuously differentiable, after which we ean apply ‘Theorem 9.4 of Chapter 4, as ‘we remarked above. Now the mapping h +> k defined by k(t) = fi h(s) ds isa bounded linear mapping from V to V which clearly depends continuously on t, and by Theorem 14.3 of Chapter 3 the integrand map f+» A defined by (s) F(s,f(2) is continuously differentiable on Ut. Composing these two maps we see that dK%,9,,7> exists and is continuous on J x NV Xu. Now AK sr() = & so that dK? = 1, and AKLi,9,/>(4) = —{it"* F(,f@)) ds, from which it follows easily that dK y.m,>(8) ——hF(4, j(O)- The three partial different als dK", dK®, and dK* thus exist and are continuous on J x N X al, and it follows, from Theorem 8.3 of Chapter 3 that K(¢,, a1, /) is continuously differentiable there, 0 Corollary. If s is any point in J, then the value /(s) of a solution at sis a differentiable function of its value at fo Proof. Let fa be the solution through . By the theorem, a+ fa a continuously differentiable map from N to the funetion space V = Ge(J, W), Put m.:f+ J(s) is a bounded linear mapping and thus trivially continuously differentiable. Composing these two maps, we see that ++ fa(8) is continuously differentiable on N. 0 It is also possible to make the continuous and differentiable dependence of the solution on its initial value <¢o, ay> into a global affair. ‘The following is the theorem. We shall not go into its proof here. Theorem 2.8. Let fhe the maximal solution through <¢o, ao> with domain J and let {a, 6] be any finite closed subinterval of J containing fa. ‘Then there exists an € > O such that for every © B,( includes fa, and the restriction of this solution to [a,b] is @ continuous function of . 
If F satisfies the hypotheses of Theorem 2.2, then this dependence is continuously differentiable.

Finally, suppose that F depends continuously (or continuously differentiably) on a parameter λ, so that we have F(λ, t, α) on M × I × A. Now the solution f to the initial-value problem

f′(t) = F(λ, t, f(t)),   f(t₀) = α₀

depends on the parameter λ as well as on the initial condition f(t₀) = α₀, and if the reader has fully understood our arguments above, he will see that we can show in the same way that the dependence of f on λ is also continuous (continuously differentiable). We shall not go into these details here.

3. THE LINEAR EQUATION

We now suppose that the function F of Section 1 is from I × W to W and continuous, and that F(t, α) is linear in α for each fixed t. It is not hard to see that we then automatically have the strong Lipschitz hypothesis of Theorem 1.4, which we shall in any case now assume. Here this is a boundedness condition on a linear map: we are assuming that F(t, α) = T_t(α), where T_t ∈ Hom W, and that ||T_t|| ≤ c(t) for all t, where c(t) is continuous on I.

As one might expect, in this situation the existence and uniqueness theory of Section 1 makes contact with general linear theory. Let X₀ be the vector space C(I, W) of all continuous functions from I to W, and let X₁ be its subspace C¹(I, W) of all functions having continuous first derivatives. Norms will play no role in our theorem.

Theorem 3.1. The mapping S: X₁ → X₀ defined by setting g = Sf if g(t) = f′(t) − F(t, f(t)) is a surjective linear mapping. The set N of global solutions of the differential equation dα/dt = F(t, α) is the null space of S, and is therefore, in particular, a vector space. For each t₀ ∈ I the restriction to N of the coordinate (evaluation) mapping π_{t₀}: f ↦ f(t₀) is an isomorphism from N to W. The null space M of π_{t₀} is therefore a complement of N in X₁, and so determines a right inverse R of S.
The mapping f ↦ ⟨Sf, f(t₀)⟩ is an isomorphism from X₁ to X₀ × W, and this fact is equivalent to all the above assertions.

Proof. For any fixed g in X₀ we set G(t, a) = F(t, a) + g(t) and consider the (nonlinear) equation da/dt = G(t, a). By Theorems 1.3 and 1.4 it has a unique maximal solution f through any initial point ⟨t₀, a₀⟩, and the domain of f is the whole of I. That is, for each pair ⟨g, a₀⟩ in X₀ × W there is a unique f in X₁ such that ⟨Sf, f(t₀)⟩ = ⟨g, a₀⟩. The mapping f ↦ ⟨Sf, f(t₀)⟩ is thus bijective, and since it is clearly linear, it is an isomorphism. In particular, S is surjective. The null space N of S is the inverse image of {0} × W under the above isomorphism; that is, π_{t₀} restricted to N is an isomorphism from N to W. Finally, the null space M of π_{t₀} is the inverse image of X₀ × {0}, and the direct sum decomposition X₁ = M ⊕ N simply reflects the decomposition X₀ × W = (X₀ × {0}) ⊕ ({0} × W) under the inverse isomorphism. This finishes the proof of the theorem. □

The problem of finding, for a given g in X₀ and a given a₀ in W, the unique f in X₁ such that S(f) = g and f(t₀) = a₀ is called the initial-value problem. At the theoretical level, the problem is solved by the above theorem, which states that the uniquely determined f exists. At the practical level of computation, the problem remains important. The fact that M = M_{t₀} is a complement of N breaks the initial-value problem into two independent subproblems. The right inverse R associated with M_{t₀} finds h in X₁ such that S(h) = g and h(t₀) = 0; the isomorphism f ↦ f(t₀) from N to W selects the k in X₁ such that S(k) = 0 and k(t₀) = a₀. Then f = h + k. The first subproblem is the problem of "solving the inhomogeneous equation with homogeneous initial data", and the second is the problem of "solving the homogeneous equation with inhomogeneous initial data". In a certain sense the initial-value problem is the "direct sum" of these two independent problems.

We shall now study the homogeneous equation da/dt = T_t(a) more closely.
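The two subproblems really are independent, and their solutions add. The following numerical sketch (ours, not the text's; the scalar equation, the coefficient p(t) = cos t, the forcing g(t) = sin t, and the step count are arbitrary illustrative choices) integrates the full equation with data a₀, the inhomogeneous equation with zero data, and the homogeneous equation with data a₀, and checks that f = h + k.

```python
# Sketch of the direct-sum decomposition X1 = M (+) N for a scalar
# linear equation x' = p(t)x + g(t), x(t0) = a0:
#   h solves the inhomogeneous equation with h(t0) = 0   (the M-part),
#   k solves the homogeneous equation with k(t0) = a0    (the N-part),
# and the full solution is f = h + k.
import math

def rk4(f, x0, t0, t1, n=2000):
    """Classical Runge-Kutta integration of x' = f(t, x) from t0 to t1."""
    h, t, x = (t1 - t0) / n, t0, x0
    for _ in range(n):
        k1 = f(t, x)
        k2 = f(t + h / 2, x + h / 2 * k1)
        k3 = f(t + h / 2, x + h / 2 * k2)
        k4 = f(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return x

p = math.cos          # coefficient t -> p(t); arbitrary choice
g = math.sin          # forcing term  t -> g(t); arbitrary choice
a0, t0, t1 = 2.0, 0.0, 3.0

full = lambda t, x: p(t) * x + g(t)    # S(f) = g
homog = lambda t, x: p(t) * x          # S(k) = 0

f_val = rk4(full, a0, t0, t1)    # solution with initial value a0
h_val = rk4(full, 0.0, t0, t1)   # inhomogeneous equation, zero data
k_val = rk4(homog, a0, t0, t1)   # homogeneous equation, data a0

print(abs(f_val - (h_val + k_val)))  # essentially zero
```

Because the discretized step map is affine in x, the splitting holds exactly up to rounding, not merely up to discretization error.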
As we saw above, its solution space N is isomorphic to W under each projection map π_t : f ↦ f(t). Let φ_t be this isomorphism (so that φ_t = π_t restricted to N). We now choose some fixed t₀ in I (we may as well suppose that I contains 0 and take t₀ = 0) and set K_t = φ_t ∘ φ₀⁻¹. Then {K_t} is a one-parameter family of linear isomorphisms of W with itself, and if we set f_β(t) = K_t(β), then f_β is the solution of da/dt = T_t(a) passing through ⟨0, β⟩. We call K_t a fundamental solution of the homogeneous equation da/dt = T_t(a).

Since f_β′(t) = T_t(f_β(t)), we see that d(K_t)/dt = T_t ∘ K_t in the sense that the equation

d(K_t(β))/dt = T_t(K_t(β))

is true for each β in W. However, the derivative dK_t/dt does not necessarily exist as a norm limit in Hom W. This is because our hypotheses on T do not imply that the mapping t ↦ T_t is continuous from I to Hom W. If this mapping is continuous, then the mapping ⟨t, A⟩ ↦ T_t ∘ A is continuous from I × Hom W to Hom W, and the initial-value problem

dA/dt = T_t ∘ A,    A₀ = I,

has a unique solution A_t in 𝒞¹(I, Hom W). Because evaluation at β is a bounded linear mapping from Hom W to W, A_t(β) is a differentiable function of t and d(A_t(β))/dt = (dA_t/dt)(β) = T_t(A_t(β)). This implies that A_t(β) = K_t(β) for all β, so K_t = A_t. In particular, the fundamental solution t ↦ K_t is now a differentiable map into Hom W, and dK_t/dt = T_t ∘ K_t. We have proved the following theorem.

Theorem 3.2. Let t ↦ T_t be a continuous map from an interval neighborhood I of 0 to Hom W. Then the fundamental solution t ↦ K_t of the differential equation da/dt = T_t(a) is the parametrized arc from I to Hom W that is the solution of the initial-value problem dA/dt = T_t ∘ A, A₀ = I.

In terms of the isomorphisms K_t = K(t), we can now obtain an explicit solution of the inhomogeneous equation da/dt = T_t(a) + g(t). We want f such that

f′(t) − T_t(f(t)) = g(t).
Now K′(t) = T_t ∘ K(t), so that T_t = K′(t) ∘ K(t)⁻¹, and it follows from Exercise 8.12 of Chapter 4 and the general product rule for differentiation (Theorem 8.4 of Chapter 3) that the left side of the equation above is exactly

K(t) [d/dt (K(t)⁻¹(f(t)))].

The equation we have to solve can thus be rewritten

d/dt (K(t)⁻¹(f(t))) = K(t)⁻¹(g(t)).

We therefore have an obvious solution, and even if the reader has found our motivating argument too technical, he should be able to check the solution by differentiating.

Theorem 3.3. In the context of Theorem 3.2, the function

f(t) = K_t [∫₀^t K_s⁻¹(g(s)) ds]

is the solution of the inhomogeneous initial-value problem da/dt = T_t(a) + g(t), f(0) = 0.

This therefore is a formula for the right inverse R of S determined by the complement M₀ of the null space N of S.

The special case of the constant coefficient equation, where the "coefficient" operator T_t is a fixed T in Hom W, is extremely important. The first new fact to be observed is that if f is a solution of da/dt = T(a), then so is f′. For the equation f′(t) = T(f(t)) has a differentiable right-hand side, and differentiating, we get f″(t) = T(f′(t)). That is:

Lemma 3.1. The solution space N of the constant coefficient equation da/dt = T(a) is invariant under the derivative operator D.

Moreover, we see from the differential equation that the operator D on N is just composition with T. More precisely, the equation f′(t) = T(f(t)) can be rewritten π_t ∘ D = T ∘ π_t, and since the restriction of π_t to N is the isomorphism φ_t from N to W, this equation can be solved for T. We thus have the following lemma.

Lemma 3.2. For each fixed t the isomorphism φ_t from N to W takes the derivative operator D on N to the operator T on W. That is,

T = φ_t ∘ D ∘ φ_t⁻¹.

The equation for the fundamental solution K_t is now dS/dt = T ∘ S. In the elementary calculus this is the equation for the exponential function, which leads us to expect, and immediately check, that K_t = e^{tT}. (See the end of Section 8 of Chapter 4.) The solution of da/dt = T(a) through ⟨0, β⟩ is thus the function f(t) = e^{tT}(β).

If T satisfies a polynomial equation p(T) = 0, as we know it must if W is finite-dimensional, then our analysis can be carried significantly further. Suppose for now that p has only real roots, so that its relatively prime factorization is

p(t) = ∏_{i=1}^k (t − λ_i)^{m_i}.

Then we know from Theorem 5.5 of Chapter 1 that W is the direct sum W = ⊕_{i=1}^k W_i of the null spaces W_i of the transformations (T − λ_i I)^{m_i}, and that each W_i is invariant under T. This gives us a much simpler form for the solution curve e^{tT}(α) if the point α is in one of the null spaces W_i. Taking such a subspace W_i as W for the moment, we have (T − λI)^m = 0, so that T = λI + R, where R^m = 0, and the factorization e^{tT} = e^{tλI} ∘ e^{tR}, together with the now finite series expansion of e^{tR}, gives us

e^{tT}(α) = e^{λt} [α + tR(α) + ··· + (t^{m−1}/(m−1)!) R^{m−1}(α)].

Note that the number of terms on the right is the degree of the factor (t − λ)^m in the polynomial p(t). In the general situation where W = ⊕_{i=1}^k W_i, we have α = Σ_{i=1}^k α_i and e^{tT}(α) = Σ_{i=1}^k e^{tT}(α_i), with each e^{tT}(α_i) of the above form. The solution of f′(t) = T(f(t)) through the general point ⟨0, α⟩ is thus a finite sum of terms of the form t^j e^{λ_i t} β_{ij}, the number of terms being the degree of the polynomial p.

If W is a complex Banach space, then the restriction that p have only real roots is superfluous. We get exactly the same formula, but with complex values of λ. This introduces more variety into the behavior of the solution curves, since an outside exponential factor e^{λt} = e^{μt}e^{iνt} now has a periodic factor if ν ≠ 0. Altogether we have proved the following theorem.

Theorem 3.4. If W is a real or complex Banach space and T ∈ Hom W, then the solution curve in W of the initial-value problem f′(t) = T(f(t)), f(0) = β, is f(t) = e^{tT}(β). If T satisfies a polynomial equation (T − λI)^m = 0, then

f(t) = e^{λt} [β + tR(β) + ··· + (t^{m−1}/(m−1)!) R^{m−1}(β)],

where R = T − λI.
If T satisfies a polynomial equation p(T) = 0 and p has the relatively prime factorization p(t) = ∏_{i=1}^k (t − λ_i)^{m_i}, then f(t) is a sum of k terms of the above type, and so has the form

f(t) = Σ_{i,j} t^j e^{λ_i t} β_{ij},

where the number of terms on the right is the degree of the polynomial p, and each β_{ij} is a fixed (constant) vector.

It is important to notice how the asymptotic behavior of f(t) as t → +∞ is controlled by the polynomial roots λ_i. We first restrict ourselves to the solution through a vector α in one of the subspaces W_i, which amounts to supposing that (T − λI)^m = 0. If λ has a positive real part, so that e^{λt} = e^{μt}e^{iνt} with μ > 0, then ‖f(t)‖ → ∞ in exponential fashion. If λ has a negative real part, then f(t) approaches zero as t → ∞ (but its norm becomes infinite exponentially fast as t → −∞). If the real part of λ is zero, then ‖f(t)‖ → ∞ like t^{m−1} if m > 1. Thus the only way for f to be bounded on the whole of ℝ is for the real part of λ to be zero and m = 1, in which case f is periodic. Similarly, in the general case where p(T) = ∏_{i=1}^k (T − λ_i I)^{m_i} = 0, all the solution curves are bounded on the whole of ℝ if and only if the roots λ_i are all pure imaginary and all the multiplicities m_i are 1.

EXERCISES

3.1 Let I be an open interval in ℝ, and let W be a normed linear space. Let F(t, a) be a continuous function from I × W to W which is linear in a for each fixed t. Prove that there is a function c(t) which is bounded on every closed interval [a, b] included in I and such that ‖F(t, α)‖ ≤ c(t)‖α‖ for all α and t. Then show that c can be made continuous. (You may want to use the Heine-Borel property: if [a, b] is covered by a collection of open intervals, then some finite subcollection already covers [a, b].)

3.2 In the text we omitted checking that f ↦ f′ − G(t, f) is surjective from X₁ to X₀.
Prove that this is so by tracing the surjectivity through the reduction to a first-order system.

3.3 Suppose that the coefficients a_i(t) in the operator L = Σ₀ⁿ a_i Dⁱ are all themselves in 𝒞¹. Show that the null space N of L is a subspace of 𝒞^{n+1}. State a generalization of this theorem and indicate roughly why it is true.

3.4 Suppose that W is a Banach space, T ∈ Hom W, and β is an eigenvector of T with eigenvalue λ. Show that the solution of the constant coefficient equation da/dt = T(a) through ⟨0, β⟩ is f(t) = e^{λt}β.

3.5 Suppose next that W is finite-dimensional and has a basis {β_i}₁ⁿ consisting of eigenvectors of T, with corresponding eigenvalues λ_i. Find a formula for the solution through ⟨0, α⟩ in terms of the basis expansion of α.

3.6 A very important special case of the linear equation da/dt = T_t(a) is when the operator function T_t is periodic. Suppose, for example, that T_{t+1} = T_t for all t. Show that then K_{t+1} = K_t ∘ K₁ for all t. Assume next that K₁ has a logarithm, and so can be written K₁ = e^A for some A in Hom W. (We know from Exercise 11.10 of Chapter 4 that this is always possible if W is finite-dimensional.) Show that now K_t can be written in the form K_t = B(t) ∘ e^{tA}, where B(t) is periodic with period 1.

3.7 Continuing the above exercise, suppose now that W is a finite-dimensional complex vector space. Using the analysis of e^{tA}β given in the text, show that the differential equation da/dt = T_t(a) has a periodic solution (with any period) only if K₁ has an eigenvalue of absolute value 1. Show also that if K₁ has an nth root of unity as an eigenvalue, then the differential equation has a periodic solution with period n.

3.8 Write out the special form that the formula of Theorem 3.3 takes in the constant coefficient situation.

3.9 It is interesting to look at the facts of Theorem 3.1 from the point of view of Theorem 5.3 of Chapter 1. Assume that S : X₁ → X₀ is surjective and that its null space N is isomorphic to W under the coordinate (evaluation) map π_{t₀}.
Prove, by applying this theorem, that if M is the null space of π_{t₀} in X₁, then S restricted to M is an isomorphism onto X₀.

4. THE nTH-ORDER LINEAR EQUATION

The nth-order linear differential equation is the equation

dⁿa/dtⁿ = G(t, a, da/dt, ..., d^{n−1}a/dt^{n−1}),

where G(t, α) = G(t, α₁, ..., α_n) is now linear from V = Wⁿ to W for each t in I. We convert this to a first-order equation da/dt = F(t, a) just as before, where now F is a map from I × V to V that is linear in its second variable: F(t, α) = T_t(α).

Our proof of Theorem 1.5 showed that a function f in 𝒞ⁿ(I, W) is a solution of the nth-order equation dⁿa/dtⁿ = G(t, a, ..., d^{n−1}a/dt^{n−1}) if and only if the n-tuple Ψf = ⟨f, f′, ..., f^{(n−1)}⟩ is a solution of the first-order equation da/dt = F(t, a) = T_t(a). We know that the latter solutions form a vector subspace Ñ of 𝒞¹(I, Wⁿ), and since the map Ψ : f ↦ ⟨f, f′, ..., f^{(n−1)}⟩ is linear from 𝒞ⁿ(I, W) to 𝒞¹(I, Wⁿ), it follows that the set N of solutions of the nth-order equation is a subspace of 𝒞ⁿ(I, W) and Ψ restricted to N is an isomorphism from N to Ñ. Since the coordinate evaluation φ_t = π_t restricted to Ñ is an isomorphism from Ñ to Wⁿ for each t (Theorem 3.1), it follows that the map

f ↦ ⟨f(t), f′(t), ..., f^{(n−1)}(t)⟩

takes N isomorphically to Wⁿ. Its null space M_t is a complement of N in 𝒞ⁿ, as before. Here M_t is the set of functions f in 𝒞ⁿ(I, W) such that f(t) = f′(t) = ··· = f^{(n−1)}(t) = 0.

We now consider the special case W = ℝ. For each fixed t, G is now a linear map from ℝⁿ to ℝ, that is, an element of (ℝⁿ)*, and its coordinate set with respect to the standard basis is an n-tuple k = ⟨k₁, ..., k_n⟩. Since the linear map varies continuously with t, the n-tuple k varies continuously with t. Thus, when we take t into account, we have an n-tuple k(t) = ⟨k₁(t), ..., k_n(t)⟩ of continuous functions on I. The evaluation f ↦ ⟨f(t₀), ..., f^{(n−1)}(t₀)⟩ is an isomorphism from N to ℝⁿ, and the set M_{t₀} of functions f in 𝒞ⁿ such that f(t₀) = ··· = f^{(n−1)}(t₀) = 0 is therefore a complement of N in 𝒞ⁿ(I), and determines a linear right inverse of L.

The practical problem of "solving" the differential equation L(f) = g for f when g is given falls into two parts.
First we have to find the null space N of L; that is, we have to solve the homogeneous equation L(f) = 0. Since N is an n-dimensional vector space, the problem of delineating it is equivalent to finding a basis, and this is clearly the efficient way to proceed. Our first problem therefore is to find n linearly independent solutions {u_i}₁ⁿ of L(f) = 0. Our second problem is to find a right inverse of L, that is, a linear way of picking one f such that L(f) = g for each g. Here the obvious thing to do is to try to make the formula of Theorem 3.3 into a practical computation. If v is one solution of L(f) = g, then of course the set of all solutions is the affine subspace N + v.

We shall start with the first problem, that of finding a basis {u_i} of solutions to L(f) = 0. Unfortunately, there is no general method available, and we have to be content with partial success. We shall see that we can easily solve the first-order equation directly, and that if we can find one solution of the nth-order equation, then we can reduce the problem to solving an equation of order n − 1. Moreover, in the very important special case of an operator L with constant coefficients, Theorem 3.4 gives a complete explicit solution.

The first-order homogeneous linear equation can be written in the form y′ + a(t)y = 0, where the coefficient of y′ has been divided out. Dividing by y and remembering that y′/y = (log y)′, we see that, formally at least, a solution is given by log y = −∫a(t) dt, or y = e^{−∫a(t) dt}, and we can check it by inspection. Thus the equation y′ + y/t = 0 has a solution y = e^{−log t} = 1/t, as the reader might have noticed directly.

Suppose now that L is an nth-order operator and that we know one solution u of Lf = 0. Our problem then is to find n − 1 solutions v₁, ..., v_{n−1} independent of each other and of u. It might even be reasonable to guess that these could be determined as solutions of an equation of order n − 1.
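The formula y = e^{−∫a(t) dt} just derived can be checked numerically. The following sketch (our own; it uses the text's example a(t) = 1/t, on the interval [1, 4], with the normalization y(1) = 1) integrates the equation directly and compares with the exponential formula.

```python
# Check that y(t) = exp(-∫_{1}^{t} a(s) ds) solves y' + a(t) y = 0
# for a(t) = 1/t, where the exact solution is y(t) = 1/t (with y(1) = 1).
import math

def rk4(f, y0, t0, t1, n=4000):
    """Classical Runge-Kutta integration of y' = f(t, y) from t0 to t1."""
    h, t, y = (t1 - t0) / n, t0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

a = lambda t: 1.0 / t
y_num = rk4(lambda t, y: -a(t) * y, 1.0, 1.0, 4.0)      # integrate the ODE
y_formula = math.exp(-(math.log(4.0) - math.log(1.0)))  # exp(-∫_1^4 dt/t)
print(y_num, y_formula)   # both ≈ 0.25, the value of 1/t at t = 4
```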
We try to find a second solution v(t) in the form c(t)u(t), where c(t) is an unknown function. Our motivation, in part, is that such a solution would automatically be independent of u unless c(t) turns out to be a constant. Now if v(t) = c(t)u(t), then v′ = cu′ + c′u, and generally

v^{(j)} = Σ_{i=0}^{j} (j choose i) c^{(i)} u^{(j−i)}.

If we write down L(v) = Σ₀ⁿ a_j(t) v^{(j)}(t) and collect the terms involving c(t), we get

L(v) = c(t) Σ₀ⁿ a_j u^{(j)} + (terms involving c′, ..., c^{(n)}) = c·L(u) + S(c′) = S(c′),

where S is a certain linear differential operator of order n − 1 which can be explicitly computed from the above formulas. We claim that solving S(f) = 0 solves our original problem. For suppose that {g_i}₁^{n−1} is a basis for the null space of S, and set c_i(t) = ∫_{t₀}^t g_i. Then L(c_i u) = S(c_i′) = S(g_i) = 0 for i = 1, ..., n − 1. Moreover, u, c₁u, ..., c_{n−1}u are independent, for if k₀u + Σ₁^{n−1} k_i c_i u = 0, then k₀ + Σ₁^{n−1} k_i c_i(t) = 0, and differentiating gives 0 = Σ₁^{n−1} k_i c_i′(t) = Σ₁^{n−1} k_i g_i(t), contradicting the independence of the set {g_i}.

We have thus shown that if we can find one solution u of the nth-order equation Lf = 0, then its complete solution is reduced to solving an equation Sf = 0 of order n − 1 (although our independence argument was a little sketchy).

This reduction procedure does not combine with the solution of the first-order equation to build up a sequence of independent solutions of the nth-order equation because, roughly speaking, it "works off the top instead of off the bottom". For the combination to be successful, we would have to be able to find, from a given nth-order operator L, a first-order operator S such that N(S) ⊂ N(L), and we can't do this in general. However, we can do it when the coefficient functions in L are all constants, although we shall in fact proceed differently.

Meanwhile it is valuable to note that a second-order equation Lf = 0 can be solved completely if we can find one solution u, since the above argument reduces the remaining problem to a first-order equation which can then be solved by an integration, as we saw earlier. Consider, for instance, the equation y″ − 2y/t² = 0 over any interval I not containing 0, so that the coefficient −2/t² is continuous on I. We see by inspection that u(t) = t² is one solution. Then we know that we can find a solution v(t) independent of u(t) in the form v(t) = t²c(t), and that the problem will become a first-order problem for c′. We have, in fact,

v′ = t²c′ + 2tc  and  v″ = t²c″ + 4tc′ + 2c,

so that L(v) = v″ − 2v/t² = t²c″ + 4tc′, and L(v) = 0 if and only if (c′)′ + (4/t)c′ = 0. Thus

c′ = e^{−∫(4/t)dt} = e^{−4 log t} = t⁻⁴

(to within a scalar multiple; we only want a basis!), so c(t) = −t⁻³/3, and v = t²c(t) = 1/t to within a scalar multiple. (The reader may wish to check that this is the promised solution.) The null space of the operator L(f) = f″ − 2f/t² is thus the linear span of {t², 1/t}.

We now turn to an important tractable case, the differential operator

L = Dⁿ + a_{n−1}D^{n−1} + ··· + a₁D + a₀,

where the coefficients a_i are constants and the leading coefficient might as well be taken to be 1. What makes this case accessible is that now L is a polynomial in the derivative operator D. That is, if Df = f′, then L = p(D), where p(t) = tⁿ + a_{n−1}t^{n−1} + ··· + a₀. The most elegant, but not the most elementary, way to handle this equation is to go over to the equivalent first-order system dx/dt = T(x) on ℝⁿ and to apply the relevant theory from the last section.

Theorem 4.2. If p(t) = (t − λ)ⁿ, then the solution space N of the constant coefficient nth-order equation p(D)f = 0 has the basis

e^{λt}, te^{λt}, ..., t^{n−1}e^{λt}.

If p(t) is a polynomial which has a relatively prime factorization p(t) = ∏_i p_i(t), with each p_i(t) of the above form, then the solution space of the constant coefficient equation p(D)f = 0 has the basis ∪ B_i, where B_i is the above basis for the solution space N_i of p_i(D)f = 0.

Proof. We know that the mapping Ψ : f ↦ ⟨f, f′, ..., f^{(n−1)}⟩ is an isomorphism from the null space N of p(D) to the null space Ñ of dx/dt − T(x). It is clear that Ψ commutes with differentiation, Ψ(Df) = ⟨f′, ..., f^{(n)}⟩ = D(Ψf), and since we know that Ñ is invariant under D by Lemma 3.1, it follows (and can easily be checked directly) that N is invariant under D.
By Lemma 3.2 we have φ_t ∘ D = T ∘ φ_t, which simply says that the isomorphism φ_t : Ñ → ℝⁿ takes the operator D on Ñ into the operator T on ℝⁿ. Altogether, φ_t ∘ Ψ takes D on N into T on ℝⁿ, and since p(D) = 0 on N, it follows that p(T) = 0 on ℝⁿ. We saw in Theorem 3.4 that if p(T) = 0 and p = (t − λ)ⁿ, then the solution space Ñ of dx/dt = T(x) is spanned by vectors of the form t^j e^{λt}β. The first coordinates of the n-tuple-valued functions in Ñ form the space N (under the isomorphism Ψ⁻¹), and we therefore see that N is spanned by the functions e^{λt}, ..., t^{n−1}e^{λt}. Since N is n-dimensional, and since there are n of these functions, the spanning set forms a basis.

The remainder of the theorem can be viewed as the combination of the above and the direct application of Theorem 5.5 of Chapter 1 to the equation p(D) = 0 on N, or as the carry-over to N, under the isomorphism Ψ⁻¹, of the facts already established for Ñ in the last section. □

If the roots of the polynomial p are not all real, then we have to resort to the complexification theory that we developed in the exercises of Section 11, Chapter 4. Except for one final step, the results are the same. The one extra fact that has to be applied is that the null space of a real operator T acting on a real vector space Y is exactly the intersection with Y of the null space of the complexification S of T acting on the complexification Z = Y ⊕ iY of Y. This implies that if p(t) is a polynomial with real coefficients, then we get the real solutions of p(D)f = 0 as the real parts of the complex solutions. In order to see exactly what this means, suppose that q(x) = (x² − 2bx + c)^m is one of the relatively prime factors of p(x) over ℝ, with x² − 2bx + c irreducible over ℝ. Over ℂ, q(x) factors into (x − λ)^m (x − λ̄)^m, where λ = b + iω and ω² = c − b².
It follows from our general theory above that the complex 2m-dimensional null space of q(D) is the complex span of

e^{λt}, te^{λt}, ..., t^{m−1}e^{λt};  e^{λ̄t}, te^{λ̄t}, ..., t^{m−1}e^{λ̄t}.

The real parts of the complex linear combinations of these 2m functions form a 2m-dimensional real vector space spanned by the real parts of the above functions and the real parts of i times the above functions. That is, the null space of the real operator q(D) is a 2m-dimensional real space spanned by

e^{bt} cos ωt, te^{bt} cos ωt, ..., t^{m−1}e^{bt} cos ωt;  e^{bt} sin ωt, te^{bt} sin ωt, ..., t^{m−1}e^{bt} sin ωt.

Since there are 2m of these functions, they must be independent, and must form a basis for the real solution space of q(D)f = 0. Thus:

Theorem 4.3. If p(t) is a real polynomial, then a basis for the real solution space of the constant coefficient equation p(D)f = 0 is obtained by taking, for each relatively prime real factor (t − λ)^m of p, the functions t^j e^{λt}, 0 ≤ j < m, and for each relatively prime factor (t² − 2bt + c)^m with t² − 2bt + c irreducible over ℝ, the functions t^j e^{bt} cos ωt and t^j e^{bt} sin ωt, 0 ≤ j < m, where ω² = c − b².

EXERCISES

Find solutions for the following equations.

4.1 x″ − 3x′ + 2x = 0
4.2 x″ + 2x′ − 3x = 0
4.3 x″ + 4x′ + 3x = 0
4.4 x″ + 2x′ = 0
4.5 x‴ − 3x″ + 3x′ − x = 0
4.6 x″ − x = 0
4.7 x″ + 2x′ + 2x = 0
4.8 x″ + x = 0
4.9 x″ = 0

4.10 Solve the initial-value problem x″ + 4x′ − 5x = 0, x(0) = 1, x′(0) = 2.

4.11 Solve the initial-value problem x‴ + x′ = 0, x(0) = 0, x′(0) = −1, x″(0) = 1.

4.12 Find one solution u of the equation t²x″ − 2x = 0 by trying u(t) = t^k, and then find a second solution as in the text by setting v(t) = c(t)u(t).

4.13 Solve t²x″ − 5tx′ + 8x = 0 by trying u(t) = t^k.

4.14 Solve tx″ + 2x′ = 0.

4.15 Solve t(x″ + x′) + 2(x′ + x) = 0.

4.16 Knowing that e^{at} cos ωt and e^{at} sin ωt are solutions of a second-order linear differential equation, and observing that their values at 0 are 1 and 0, we know that they are independent. Why?

4.17 Find constant coefficient differential equations of which the following functions are solutions: t sin t, sin t.

4.18 If f and g are independent solutions of a second-order linear differential equation u″ + a₁u′ + a₀u = 0 with continuous coefficient functions, then we know that the vectors ⟨f(x), f′(x)⟩ and ⟨g(x), g′(x)⟩ are independent at every point x. Show conversely that if two functions have this latter property, then they are solutions of a second-order differential equation.
4.19 Solve the equation (D − a)²f = 0 by applying the order-reducing procedure discussed in the text, starting with the obvious solution e^{at}.

5. SOLVING THE INHOMOGENEOUS EQUATION

We come now to the problem of solving the inhomogeneous equation L(f) = g. We shall briefly describe a practical method which works easily some of the time, and a theoretical method which works all the time, but which may be hard to apply. The latter is just the translation of Theorem 3.3 into matrix language.

We first consider the constant coefficient equation L(f) = g in the special case where g itself is in the null space of a constant coefficient operator S. A simple example is y′ − ay = e^{bt} (or y′ − ay = sin bt), where g(t) = e^{bt} is in the null space of S = D − b. In such a situation a solution f must be in the null space of S ∘ L, for S ∘ L(f) = S(g) = 0. We know what all these functions f are, and our problem is to select f among them such that L(f) is the given g.

For the moment suppose that the polynomials L and S (polynomials in D) have no factors in common. Then we know that L is an isomorphism on the null space N_S of S, and therefore that there exists an f in N_S such that Lf = g. Since we have a basis for N_S, we could construct the matrix for the action of L on N_S and find f by solving a matrix equation, but the simplest thing to do is to take a general linear combination of the basis, with unknown coefficients, let L act on it, and see what the coefficients must be to give g. For example, to solve y′ − ay = e^{bt}, we try f(t) = ce^{bt} and apply L: (D − a)(ce^{bt}) = c(b − a)e^{bt}, and we see that c = 1/(b − a).

To solve y′ − ay = cos bt, we observe that cos bt is in the null space of D² + b², and that this null space has the basis {sin bt, cos bt}. We therefore set

f(t) = c₁ sin bt + c₂ cos bt

and solve (D − a)f = cos bt, getting

(−ac₁ − bc₂) sin bt + (bc₁ − ac₂) cos bt = cos bt,

so that −ac₁ − bc₂ = 0 and bc₁ − ac₂ = 1, and

f(t) = (b/(a² + b²)) sin bt − (a/(a² + b²)) cos bt.

When L and S do have factors in common, the situation is more complicated, but a similar procedure can be proved to work. Now an extra factor t^k must be introduced, where k is the number of occurrences of the common factor in L. For example, in solving (D − 1)²f = e^t, we have S ∘ L = (D − 1)³, and so we must set f(t) = ct²e^t. Our equation then becomes

(D − 1)²(ct²e^t) = 2ce^t = e^t,

and so c = 1/2. For (D² + 1)f = sin t we have to set f(t) = t(c₁ sin t + c₂ cos t), and after we work it out we find that c₁ = 0 and c₂ = −1/2, so that f = −(t/2) cos t.

This procedure, called, naturally, the method of undetermined coefficients, violates our philosophy about a solution process being a linear right inverse. Indeed, it is not a single process, applicable to any g occurring on the right, but varies with the operator S. However, when it is available, it is the easiest way to compute explicit solutions.

We describe next a general theoretical method, called variation of parameters, that is a right inverse to L and does therefore apply to every g. Moreover, it inverts the general (variable coefficient) linear nth-order operator

L(f) = Σ_{i=0}^{n} a_i(t) f^{(i)},  with a_n = 1.

We are assuming that we know the null space N of L; that is, we assume known n linearly independent solutions {u_i}₁ⁿ of the homogeneous equation Lf = 0. What we are going to do is to translate into this context our formula

f(t) = K_t ∫₀^t K_s⁻¹(g(s)) ds

for the solution to da/dt = T_t(a) + g(t). Since Ψ : f ↦ ⟨f, f′, ..., f^{(n−1)}⟩ is an isomorphism from the solution space N of the nth-order equation L(f) = 0 to the solution space Ñ of the equivalent first-order system dx/dt = T_t(x), it follows that if we have a basis {u_j} for N, then the columns of the matrix w_{ij} = u_j^{(i−1)} form a basis for Ñ. Let w(t) be the matrix [u_j^{(i−1)}(t)].
Since evaluation at t is the isomorphism φ_t from Ñ to ℝⁿ, the columns of w(t) form a basis for ℝⁿ, for each t. But K_t(β) is the value at t of the solution of dx/dt = T_t(x) through the initial point ⟨0, β⟩, and it follows that the linear transformation K_t takes the columns of the matrix w(0) to the corresponding columns of w(t). The matrix for K_t is therefore w(t)·w(0)⁻¹, and the matrix form of our formula

f(t) = K_t ∫₀^t K_s⁻¹(g(s)) ds

is therefore

f(t) = w(t)·w(0)⁻¹·∫₀^t w(0)·w(s)⁻¹·g(s) ds.

Moreover, since integration commutes with the application of a constant linear transformation (here multiplication by a constant matrix), the middle w(0) factors cancel, and we have the result that

f(t) = w(t)·∫₀^t w(s)⁻¹·g(s) ds

is the solution of dx/dt = T_t(x) + g(t) which passes through ⟨0, 0⟩. Finally, set k(s) = w(s)⁻¹·g(s), so that this solution formula splits into the pair

f(t) = w(t)·∫₀^t k(s) ds,    w(s)·k(s) = g(s).

Now we want to solve the inhomogeneous nth-order equation L(f) = g, and this means solving the first-order system with g = ⟨0, ..., 0, g⟩. The second equation above is therefore equivalent to the system

Σ_j u_j^{(i−1)}(s)k_j(s) = 0  for i = 1, ..., n − 1,    Σ_j u_j^{(n−1)}(s)k_j(s) = g(s),

and the first coordinate of the solution of the system is f(t) = Σ_j c_j(t)u_j(t), where c_j(t) is the antiderivative ∫₀^t k_j(s) ds. Any other antiderivative would do as well, since the difference between the two resulting formulas is of the form Σ_j a_j u_j(t), a solution of the homogeneous equation L(f) = 0. We have proved the following theorem.

Theorem 5.1. If {u_i(t)}₁ⁿ is a basis for the solution space of the homogeneous equation L(h) = 0, and if f(t) = Σ₁ⁿ c_i(t)u_i(t), where the derivatives c_i′(t) are determined as the solutions of the equations

Σ c_i′u_i = 0,  Σ c_i′u_i′ = 0,  ...,  Σ c_i′u_i^{(n−2)} = 0,  Σ c_i′u_i^{(n−1)} = g,

then L(f) = g.

We now consider a simple example of this method. The equation y″ + y = sec x has constant coefficients, and we can therefore easily find the null space of the homogeneous equation y″ + y = 0. A basis for it is {sin x, cos x}. But we can't use the method of undetermined coefficients, because sec x is not a solution of a constant coefficient equation.
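Theorem 5.1, by contrast, applies mechanically. The following numerical sketch (ours, not the text's; the sample points and step count are arbitrary choices) carries out its prescription for y″ + y = sec x with the basis {sin x, cos x}, and compares the result with the particular solution x sin x + (log cos x) cos x that the hand computation in the text produces.

```python
# Variation of parameters for y'' + y = sec x on (-pi/2, pi/2):
# solve  c1'u1 + c2'u2 = 0,  c1'u1' + c2'u2' = sec x  by Cramer's rule,
# integrate c1', c2' by the trapezoid rule, and form v = c1 u1 + c2 u2.
import math

g = lambda x: 1.0 / math.cos(x)                  # sec x
u1, u2 = math.sin, math.cos
du1 = math.cos
du2 = lambda x: -math.sin(x)

def c_prime(x):
    w = u1(x) * du2(x) - u2(x) * du1(x)          # Wronskian (here -1)
    return (-u2(x) * g(x) / w, u1(x) * g(x) / w)  # Cramer's rule

def v(x, n=2000):
    """Particular solution with zero initial data at 0."""
    h = x / n
    c1 = c2 = 0.0
    for i in range(n):                            # trapezoid rule for c1, c2
        p1, p2 = c_prime(i * h)
        q1, q2 = c_prime((i + 1) * h)
        c1 += h * (p1 + q1) / 2
        c2 += h * (p2 + q2) / 2
    return c1 * u1(x) + c2 * u2(x)

x0 = 1.0
closed = x0 * math.sin(x0) + math.cos(x0) * math.log(math.cos(x0))
print(abs(v(x0) - closed))    # small
```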
We therefore try for a solution

v(x) = c₁(x) sin x + c₂(x) cos x.

Our system of equations to be solved is

c₁′ sin x + c₂′ cos x = 0,
c₁′ cos x − c₂′ sin x = sec x.

Thus c₂′ = −c₁′ tan x and c₁′(cos x + sin x tan x) = sec x, giving

c₁′ = 1,  c₂′ = −tan x,  c₁ = x,  c₂ = log cos x,

and

v(x) = x sin x + (log cos x) cos x.  (Check it!)

This is all we shall say about the process of finding solutions. In cases where everything works we have complete control of the solutions of L(f) = g, and we can then solve the initial-value problem. If L has order n, then we know that the null space N is n-dimensional, and if for a given g the function v is one solution of the inhomogeneous equation L(f) = g, then the set of all solutions is the n-dimensional plane (affine subspace) M = N + v. If we have found a basis {u_i}₁ⁿ for N, then every solution of L(f) = g is of the form f = Σ c_iu_i + v. The initial-value problem is the problem of finding f such that L(f) = g and

f(t₀) = a₁⁰,  f′(t₀) = a₂⁰,  ...,  f^{(n−1)}(t₀) = a_n⁰,

where α = α⁰ is the given initial value. We can now find this unique f by using these n conditions to determine the n coefficients c_i in f = Σ c_iu_i + v. We get n equations in the n unknowns c_i. Our ability to solve this problem uniquely again comes back to the fact that the matrix w_{ij}(t₀) = u_j^{(i−1)}(t₀) is nonsingular, as did our success in carrying out the variation of parameters process.

We conclude this section by discussing a very simple and important example. When a perfectly elastic spring is stretched or compressed, it resists with a "restoring" force proportional to its deformation. If we picture a coiled spring lying along the x-axis, with one end fixed and the free end at the origin when undisturbed (Fig. 6.3), then when the coil is stretched a distance x (compression being negative stretching), the force it exerts is −cx, where c is a constant representing the stiffness, or elasticity, of the spring, and the minus sign shows that the force is in the direction opposite to the displacement.
This is Hooke's law.

Fig. 6.3

Suppose that we attach a point mass m to the free end of the spring, pull the spring out to an initial position x₀ = a, and let go. The reader knows perfectly well that the system will then oscillate, and we want to describe its vibration explicitly. We disregard the mass of the spring itself (which amounts to adjusting m), and for the moment we suppose that friction is zero, so that the system will oscillate forever. Newton's law says that if the force F is applied to the mass m, then the particle will accelerate according to the equation

m(d²x/dt²) = F.

Here F = −cx, so the equation combining the laws of Newton and Hooke is

m(d²x/dt²) + cx = 0.

This is almost the simplest constant coefficient equation, and we know that the general solution is x = c₁ sin Ωt + c₂ cos Ωt, where Ω = √(c/m). Our initial condition was that x = a and x′ = 0 when t = 0. Thus c₂ = a and c₁ = 0, so x = a cos Ωt. The particle oscillates forever between x = −a and x = a. The maximum displacement a is called the amplitude A of the oscillation. The number of complete oscillations per unit time is called the frequency f, so f = Ω/2π = √c/(2π√m). This is the quantitative expression of the intuitively clear fact that the frequency will increase with the stiffness c and decrease as the mass m increases. Other initial conditions are equally reasonable. We might consider the system originally at rest and strike it, so that we start with an initial velocity v and an initial displacement 0 at time t = 0. Now c₂ = 0 and x = c₁ sin Ωt. In order to evaluate c₁, we remember that dx/dt = v at t = 0, and since dx/dt = c₁Ω cos Ωt, we have v = c₁Ω and c₁ = v/Ω, the amplitude for this motion. In general, the initial condition would be x = a and x′ = v when t = 0, and the unique solution thus determined would involve both terms of the general solution, with amplitude to be calculated.

The situation is both more realistic and more interesting when friction is taken into account.
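Before friction enters, the undamped motion just described can be checked numerically. In this sketch (our own; the values m = 2, c = 8, a = 1.5 and the sample point are arbitrary), x = a cos Ωt with Ω = √(c/m) satisfies m x″ + cx = 0 and repeats with period 2π/Ω, the reciprocal of the frequency Ω/2π.

```python
# Check that x(t) = a cos(omega t), omega = sqrt(c/m), solves
# m x'' + c x = 0 with x(0) = a, x'(0) = 0, and is periodic.
import math

m, c, a = 2.0, 8.0, 1.5
omega = math.sqrt(c / m)            # here exactly 2.0
x = lambda t: a * math.cos(omega * t)

# residual m x'' + c x at a sample point; x'' by a central difference
t0, eps = 0.7, 1e-5
xpp = (x(t0 + eps) - 2 * x(t0) + x(t0 - eps)) / eps**2
residual = m * xpp + c * x(t0)

period = 2 * math.pi / omega        # frequency f = omega / (2 pi)
print(residual, abs(x(t0 + period) - x(t0)))  # both essentially zero
```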
Frictional resistance is ideally a force proportional to the velocity dx/dt, but again with a negative sign, since its direction is opposite to that of the motion. Our new equation is thus

m d²x/dt² + k dx/dt + cx = 0,

and we know that the system will act in quite different ways depending on the relationship among the constants m, k, and c. The reader will be asked to explore these equations further in the exercises.

It is extraordinary that exactly the same equation governs a freely oscillating electric circuit. It is now written

L d²x/dt² + R dx/dt + x/C = 0,

where L, R, and C are the inductance, resistance, and capacitance of the circuit, respectively, and dx/dt is the current. However, the ordinary operation of such a circuit involves forced rather than free oscillation. An alternating (sinusoidal) voltage is applied as an extra, external, "forcing" term, and the equation is now

L d²x/dt² + R dx/dt + x/C = a sin ωt.

This shows the most interesting behavior of all. Using the method of undetermined coefficients, we find that the solution contains transient terms that die away, contributed by the homogeneous equation, and a permanent part of frequency ω/2π, arising from the inhomogeneous term a sin ωt. New phenomena called phase and resonance now appear, as the reader will discover in the exercises.

EXERCISES

Find particular solutions of the following equations.

5.1 x″ − x = t²
5.2 x″ − x = sin 3t
5.3 x″ + x = sin t
5.4 x″ + x = t²
5.5 y″ + y = t
5.6 y″ + y = sin t

5.7 Consider the equation y″ + y = sec x that was solved in the text. To what interval I must we limit our discussion? Check that the particular solution found in the text is correct. Solve the initial-value problem (here y′
= dy/dt)

y″ + y = sin t + t²,   y(0) = 1,   y′(0) = −1.

Solve the following equations by variation of parameters.

5.8 y″ + y = tan x
5.9 y″ + y = cot x
5.10 y″ − y = eˣ
5.11 y″ − y = cos x
5.12 y″ + 4y = sec 2x
5.13 y″ + 4y = tan 2x

5.14 Show that the general solution c₁ sin ωt + c₂ cos ωt of the frictionless elastic equation m(d²x/dt²) + cx = 0 can be rewritten in the form A sin(ωt − α). (Remember that sin(x − y) = sin x cos y − cos x sin y.) This type of motion along a line is called simple harmonic motion.

5.15 In the above exercise express A and α in terms of the initial values dx/dt = v and x = a when t = 0.

5.16 Consider now the freely vibrating system with friction taken into account, and therefore having the equation m(d²x/dt²) + k(dx/dt) + cx = 0, all coefficients being positive. Show that if k² < 4mc, then the system oscillates forever, but with amplitude decreasing exponentially. Determine the frequency of oscillation. Use Exercise 5.14 to simplify the solution, and sketch its graph.

5.17 Show that if the frictional force is sufficiently large (k² ≥ 4mc), then a freely vibrating system does not in fact vibrate. Taking the simplest case k² = 4mc, sketch the behavior of the system for the initial condition dx/dt = 0 and x = a when t = 0. Do the same for the initial condition dx/dt = v and x = 0 when t = 0.

5.18 Use the method of undetermined coefficients to find a particular solution of the equation of the driven electric circuit

L d²x/dt² + R dx/dt + x/C = a sin ωt.

Assuming that R > 0, show by a general argument that your particular solution is in fact the steady-state part (the part without exponential decay) of the general solution.

5.19 In the above exercise show that the "current" dx/dt for your solution can be written in the form

dx/dt = (a/√(R² + X²)) sin(ωt − α),

where X = Lω − 1/ωC. Here α is called the phase angle.

5.20 Continuing our discussion, show that the current flowing in the circuit will have a maximum amplitude when the frequency of the "impressed voltage" a sin ωt is 1/(2π√(LC)). This is the phenomenon of resonance.
Show also that the current is in phase with the impressed voltage (i.e., that α = 0) if and only if X = 0.

5.21 What is the condition that the phase α be approximately 90°? −90°?

5.22 In the theory of a stable equilibrium point in a dynamical system we end up with two scalar products (ξ, η) and ((ξ, η)) on a finite-dimensional vector space V, the quadratic form q(ξ) = ½((ξ, ξ)) being the potential energy and p(ξ′) = ½(ξ′, ξ′) being the kinetic energy. Now we know that dq_α(ξ) = ((α, ξ)), and similarly for p, and because of this fact it can be shown that the Lagrangian equations can be written

(d²x/dt², η) + ((x, η)) = 0   for all η in V.

Prove that a basis {βᵢ}₁ⁿ can be found for V such that this vector equation becomes the system of second-order equations

d²xᵢ/dt² = −λᵢxᵢ,

where the constants λᵢ are positive. Show therefore that the motion of the system is the sum of n linearly independent simple harmonic motions.

6. THE BOUNDARY-VALUE PROBLEM

We now turn to a problem which seems to be like the initial-value problem but which turns out to be of a wholly different character. Suppose that T is a second-order operator, which we consider over a closed interval [a, b]. Some of the most important problems in physics require us to find solutions to T(f) = g such that f has given values at a and b, instead of f and f′ having given values at a single point t₀. This new problem is called a boundary-value problem, because {a, b} is the boundary of the domain I = [a, b]. The boundary-value problem, like the initial-value problem, breaks neatly into two subproblems if the set M = {f ∈ C²([a, b]) : f(a) = f(b) = 0} turns out to be a complement of the null space N of T. However, if the reader will consider this general question for a moment, he will realize that he doesn't have a clue to it from our initial-value development, and, in fact, wholly new tools have to be devised.
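For one concrete operator the question can at least be explored by hand: take T = D², so that N consists of the linear functions p + qt. Every C² function then splits uniquely as the line interpolating its endpoint values (which lies in N) plus a remainder vanishing at a and b (which lies in M), so for this particular T the set M is indeed a complement of N. A minimal numerical sketch (the operator choice, the interval, and the test function are illustrative assumptions, not from the text):

```python
import math

a, b = 0.0, 1.0   # an interval [a, b] chosen for illustration

def split(f):
    """Decompose f = n + m with n in N = span{1, t} and m(a) = m(b) = 0."""
    q = (f(b) - f(a)) / (b - a)      # slope of the interpolating line
    p = f(a) - q * a                 # its intercept
    n = lambda t: p + q * t          # the component in N (null space of D^2)
    m = lambda t: f(t) - n(t)        # the component in M
    return n, m

n, m = split(math.exp)
assert abs(m(a)) < 1e-12 and abs(m(b)) < 1e-12   # m vanishes on the boundary
assert abs(n(a) - math.exp(a)) < 1e-12           # n matches f at the endpoints
```

Uniqueness is the other half of the complement property: a line in N that vanishes at both a and b is identically zero, so N ∩ M = {0}. Whether this picture persists for a general second-order T is exactly the question left open above.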
Our procedure will be to forget that we are trying to solve the boundary-value problem and instead to speculate on the nature of a linear differential
