

A Practical Approach to

LINEAR ALGEBRA

Prabhat Choudhary

Oxford Book Company
Jaipur, India

ISBN: 978-81-89473-95-2

First Edition 2009

Oxford Book Company
267, 10-B-Scheme, Opp. Narayan Niwas,
Gopalpura By Pass Road, Jaipur-302018
Phone: 0141-2594705, Fax: 0141-2597527
e-mail: oxfordbook@sity.com
website: www.oxfordbookcompany.com

© Reserved

Typeset by:
Shivangi Computers
267, 10-B-Scheme, Opp. Narayan Niwas,
Gopalpura By Pass Road, Jaipur-302018

Printed at :
Rajdhani Printers, Delhi

All Rights are Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, without the prior written permission of the copyright owner. Responsibility for the facts stated, opinions expressed, conclusions reached and plagiarism, if any, in this volume is entirely that of the Author, according to whom the matter encompassed in this book has been originally created/edited and resemblance with any such publication may be incidental. The Publisher bears no responsibility for them, whatsoever.

Preface

Linear Algebra has occupied a very crucial place in Mathematics. Linear Algebra is a continuation of the classical course in the light of modern developments in Science and Mathematics. There is a widespread feeling that the non-linear world is very different, and it is usually studied as a sophisticated phenomenon of interpolation between different approximately-linear regimes. Scientific and engineering research is becoming increasingly dependent upon the development and implementation of efficient parallel algorithms. Linear algebra is an indispensable tool in such research, and this book attempts to collect and describe a selection of some of its more important parallel algorithms. The purpose is to review the current status and to provide an overall perspective of parallel algorithms for solving dense, banded, or block-structured problems arising in the major areas of direct solution of linear systems, least squares computations, eigenvalue and singular value computations, and rapid elliptic solvers. We must emphasize that mathematics is not a spectator sport, and that in order to understand and appreciate mathematics it is necessary to do a great deal of personal cogitation and problem solving.

Prabhat Choudhary

Contents

Preface
1. Basic Notions
2. Systems of Linear Equations
3. Matrices
4. Determinants
5. Introduction to Spectral Theory
6. Inner Product Spaces
7. Structure of Operators in Inner Product Spaces
8. Bilinear and Quadratic Forms
9. Advanced Spectral Theory
10. Linear Transformations

Chapter 1

Basic Notions

VECTOR SPACES

A vector space V is a collection of objects, called vectors, along with two operations, addition of vectors and multiplication by a number (scalar), such that the following properties (the so-called axioms of a vector space) hold.

The first four properties deal with the addition of vectors:
1. Commutativity: v + w = w + v for all v, w ∈ V;
2. Associativity: (u + v) + w = u + (v + w) for all u, v, w ∈ V;
3. Zero vector: there exists a special vector, denoted by 0, such that v + 0 = v for all v ∈ V;
4. Additive inverse: for every vector v ∈ V there exists a vector w ∈ V such that v + w = 0. Such an additive inverse is usually denoted as -v.

The next two properties concern multiplication:
5. Multiplicative identity: 1v = v for all v ∈ V;
6. Multiplicative associativity: (αβ)v = α(βv) for all v ∈ V and all scalars α, β.

And finally, two distributive properties, which connect multiplication and addition:
7. α(u + v) = αu + αv for all u, v ∈ V and all scalars α;
8. (α + β)v = αv + βv for all v ∈ V and all scalars α, β.

Remark: The above properties seem hard to memorize, but it is not necessary. They are simply the familiar rules of algebraic manipulations with numbers. The only new twist here is that you have to understand what operations you can apply to what objects. You can add vectors, and you can multiply a vector by a number (scalar). Of course, you can do with numbers all possible manipulations that you have learned before. But, you cannot multiply two vectors, or add a number to a vector.

Remark: It is not hard to show that the zero vector 0 is unique, and that for every v ∈ V the additive inverse -v is unique.

Note, that some, or even all, of the properties above can be deduced from the others: for example, they imply that 0 = 0v for any v ∈ V, and that -v = (-1)v.

If the scalars are the usual real numbers, we call the space V a real vector space. If the scalars are the complex numbers, i.e., if we can multiply vectors by complex numbers, we call the space V a complex vector space. Note, that any complex vector space is a real vector space as well (if we can multiply by complex numbers, we can multiply by real numbers), but not the other way around.

It is also possible to consider a situation when the scalars are elements of an arbitrary field F. In this case we say that V is a vector space over the field F. Although many of the constructions in the book work for general fields, in this text we consider only real and complex vector spaces, i.e., F is always either R or C.

Example: The space R^n consists of all columns v of size n whose entries v1, v2, ..., vn are real numbers. Addition and multiplication by a scalar are defined entrywise: the kth entry of v + w is vk + wk, and the kth entry of αv is αvk.

Example: The space C^n also consists of columns of size n, only the entries now are complex numbers. Addition and multiplication are defined exactly as in the case of R^n; the only difference is that we can now multiply vectors by complex numbers, i.e., C^n is a complex vector space.

Example: The space M_{m×n} (also denoted as M_{m,n}) of m × n matrices: the multiplication and addition are defined entrywise. If we allow only real entries (and so multiplication only by reals), then we have a real vector space; if we allow complex entries and multiplication by complex numbers, we then have a complex vector space.

Example: The space P_n of polynomials of degree at most n consists of all polynomials p of the form

p(t) = a_0 + a_1 t + a_2 t^2 + ... + a_n t^n,

where t is the independent variable. Note, that some, or even all, coefficients a_k can be 0. In the case of real coefficients a_k we have a real vector space; complex coefficients give us a complex vector space.

but for now transposition will be just a useful formal operation.k = (A)kj meaning that the entry of AT in the row number) and column number k equals the entry of A in the row number k and row number}. BASES. We will study this in detail later.)m n = A = (aJ ' kj=l.1 is a general way to write an m x n matrix. its transpose (or transposed matrix) AT.-k' and sometimes as in example above the same letter but in lowercase is used for the matrix entries. and let VI' v2"'" vp E Vbe a collection of vectors. One of the first uses of the transpose is that we can write a column vector x E IRn as x = (XI' x 2' •. For example al. Ifwe put the column vertically. the rows of AT are the columns ofA. where the entry is aij' and the second one is the number of the column. xn)T.. namely it gives the so called adjoint transformation. A linear combination of vectors VI' v 2"'" vp is a sum of form . Let Vbe a vector space. the columns of AT are the rows of A and vise versa. The transpose of a matrix has a very nice interpretation in terms of linear transformations.3 Basic Notions Question: What are zero vectors in each of the above examples? Matrix notation An m x n matrix is a rectangular array with m rows and n columns. . Very often for a matrix A the entry in row number) and column number k is denoted by AJ-.k=1 ' a2.1 . the first index denotes the number of the row. it will use significantly more space. LINEAR COMBINATIONS. Given a matrix A. Elements of the array are called entries of the matrix. 3 6 So. The formal definition is as follows: (AT)j. For example (41 25 63)T = (12 4J5 . It is often convenient to denote matrix entries by indexed letters is}. is defined by transforming the rows of A into the columns.k or (A)J.

.. . Example: The space V is ]RII.. . . Before discussing any properties of bases.. Consider vectors (polynomials) eo' e l .. + xmvn = v (with unknowns xk ) has a unique solution for arbitrary right side v.·. en =~. = 0 . en E Jllln defined by eo_= 1.. ••• . .. en = 0 .. . The system of vectors e l . e3 =~. let us give few examples. e2... which is 1). e 1 = t. ell is a basis in Rn. xnen = LXkek k=1 and this representation is unique.J Another way to say that vI' v2'.. . Example: In this example the space is the space Jllln of the polynomials of degree at most n. vn E Vis called a basis (for the vector space V) if any vector v E V admits a unique representation as a linear combination II V = aiv i + a 2v2 +. v.. The system e l . k=1 Definition: A system of vectors vI' v2... + apvp = Lakvk. 0 (the vector ek has all entries 0 except the entry number k. . e2.. e2 = P. or with respect to the basis vI' v2' ••. e3 0 = . and it makes sense to study them.. showing that such objects exist. any vector V= can be represented as the linear combination Xn n V = xle l + x 2e2 +. e2 0 = 0 0 0 1 0 0 0 .. + anvn = Lak vk • k=1 The coefficients ai' a 2. an are called coordinates of the vector v (in the basis. . . en E ]Rn is called the standard basis in ]Rn..4 Basic Notions p aiv i + a 2v2 +. . Indeed... . Consider vectors 0 e. VII is a basis is to say that the equation xlvI + x 2v2 +... e 2 . .

we can operate with them as if they were column vectors. if we have a basis. We do not want to worry about existence. Also. The words generating. So..~kVk= L(Uk+~k)Vk> i..e. say vn +1' .Basic Notions 5 Clearly. vn' and just ignore the new ones (by putting corresponding coefficients uk = 0). + upvp = I. . . Let us analyse these two statements separately. . •. Definition: A system of vectors vI' '. Namely.'2' ...... if we stack the coefficients uk in a column. as with elements oflRn.. vn' and we add to it several vectors. pet) representation = ao + alt + a 2t 2 + + ain admits a unique p = aoe o + aiel +. if v = I. vp ' then the new system will be a generating (complete) system. Generating and Linearly Independent Systems.UkVk k=1 The only difference with the definition of a basis is that we do not assume that the representation above is unique. Now. Remark: If a vector space V has a basis vI' v2. i..e.. + anen· So the system eo' e l . k=1 Uk vk n v+w= I. vn' then any vector v is uniquely defined by its co-effcients in the decomposition v = I. The term complete.. because of my operator theory background. let us turn our attention to the uniqueness. to get the column of coordinates of the sum one just need to add the columns of coordinates of the summands.UkVk+ k=1 and w = I. This statement is in fact two statements. Indeed. e 2. k=1 ~k vk ' then n n k=! k=1 I. k=1 Uk vk . any polynomial p. which always admits a representation as a linear combination. any basis is a generating (complete) system. we can represent any vector as a linear combination of the vectors vI' v2' . so let us consider the zero vector 0. .. spanning and complete here are synonyms. The definition of a basis says that any vector admits a unique representation as a linear combination. ' Vp E Vis called a generating system (also a spanning system... say vI' v2' . We will call it the standard basis in pn. Clearly.. •. en E pn is a basis in pn.. namely that the representation exists and that it is unique. or a complete system) in V if any vector v E V admits representation as a linear combination p v = Uiv i + U 2v 2 +. . .

:=1 1xk 1"* O....~iVj' j=1 j*k Proof Suppose the system ak' :2. and it can be written as :2. + xpvp = 0 (with unknowns x k) has only trivial solution xI = x 2 =.. we get the following Definition: A system of vectors vI' v2 . By negating the definition of linear independence.. + apvp is called trivial if a k = 0 Vk. 0 = :2. P Vk = :2.. .:=11 ak 1"* 0 such that VI' V2"'" vp is linearly dependent. A trivi~llinear combination is always (for all choices of vectors vI' v2' . vp is linearly independent i the equation xlvI + x 2v2 + .. Non-trivial here means that at least one of the coefficient a k is non-zero. vp E V is linearly dependent if and only if one of the vectors Vk can be represented as a linear combination of the other vectors. ..P' :2... If a system is not linearly independent. once again again means that at least one ofxk is different from 0. k=1 An alternative definition (in terms of equations) is that a system VI' v 2' ••• . So.:=11 ak 1"* 0 such that p :2.. . Then there exist scalars .. Definition: A system of vectors vI' v2' . restating the definition we can say. (J. Non-trivial.. . ••• ... . .6 Basic Notions Definition: A linear combination a l vI + a 2v2 +. .:=l akvk . the system vI' v2. ••• . vp is called linearly dependent if 0 can be represented as a nontrivial linear combination. vp ) equal to 0.. vp is linearly dependent i the equation XlVI +x2v2 +··· +xpvp=O (with unknowns x k ) has a non-trivial solution. a k vk = o. vp E V is called linearly independent if only the trivial linear combination (:2. vp equals O. This can be (and usually is) written as :2... In other words.. The following proposition gives an alternative description of linearly dependent systems.:=1 1ak 1"* o..:=l akvk with a k = 0 Vk) of vectors vI' V2. it is called linearly dependent. = xp = O... and that IS probably the reason for the name. that a system is linearly dependent if and only ifthere exist scalars at' a 2.•. . Proposition: A system of vectors VI' V2.
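
The definition above reduces linear independence of concrete vectors in R^n to the question of whether the equation x1 v1 + x2 v2 + ... + xp vp = 0 has only the trivial solution. This is easy to test numerically; the sketch below is not part of the text and uses numpy's rank routine purely as a computational shortcut.

```python
import numpy as np

def is_linearly_independent(vectors):
    """Vectors are 1-D numpy arrays of equal length.

    They are independent iff the matrix with these vectors as columns has
    rank equal to the number of vectors, i.e. iff x1*v1 + ... + xp*vp = 0
    forces x1 = ... = xp = 0.
    """
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 3.0])                # v3 = v1 + v2, so dependent

print(is_linearly_independent([v1, v2]))      # True
print(is_linearly_independent([v1, v2, v3]))  # False
```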

Since the system vI' v2. Proposition: A system of vectors v I' v2' . vn is a basis. if a system vI' v 2. Take an arbitrary vector v2 v. So.I V I + '""'2V 2 + •. . if holds. + anvn = Lakvk' k=l Since the trivial linear combination always gives 0.7 Basic Notions aiv i + a 2v 2 + . vn is linearly independent and complete.• . j=1 j"#k -a/a Dividing both sides by ak we get with ~j = k• On the other hand. Let k be the index such that ak:f:. Suppose a system vI' v2' . 0 can be represented as a non-trivial linear combination P vk - I~jVj =0 j=1 j"#k Obviously. v can be represented as n V = I'V U.. . Indeed. Suppose v admits another representation Then .... Proof: We already know that a basis is always linearly independent and complete. any basis is a linearly independent system. O. + apvp = O.• + '""'n rv V = ~ akvk' n £. if a system is a basis it is a complete (generating) and linearly independent system. Vn E V is a basis if and only if it is linearly independent and complete (generating).···. vn is linearly complete (generating). so in one direction the proposition is already proved.. 0 admits a unique representation n 0= alv 1 + a 2v2 +.J I'V k=I We only need to show that this representation is unique. as we already discussed.. Then... moving all terms except akvk to the right side we get p akvk Iajvj.. ••. The following proposition shows that the converse implication is also true. Let us prove the other direction. the trivial linear combination must be the only one giving O.

v E Vand for all scalars a. If the new system is linearly independent. W be vector spaces. v E V. T (u + v) = T(u) + T (v) 'r. any linear combination of vectors vI' v2"'" vp can be represented as a linear combination of the same vectors without vk (i.. Repeating this procedure finitely many times we arrive to a linearly independent and complete system. The set X is called the domain of T. Remark: In many textbooks a basis is defined as a complete and linearly independent system. it is a basis. (akvk). it is linearly dependent. and the set Y is called the target space or codomain of T. namely that any vector admits a unique representation as a linear combination. k. k. T (av) = aT (v) for all v E Vand for all scalars a..j = k).e. i..L. 1fnot. Uk . A transformation T: V ~ W is called linear if I. i. 2. So.fk. MATRIX-VECTOR MULTIPLICATION A transformation T from a set X to a set Y is a rule that for each argument (input) x E X assigns a value (output) y = T (x) E Y. Properties I and 2 together are equivalent to the following one: T (au + pv) = aT (u) + PT (v) for all u.._l be the differentiation operator. may be without even suspecting it. W = lP'n-l' and let T: lfDn ~ lfD.. and thus the representation v = aIv I + a 2v 2 +.e. because otherwise we delete all vectors and end up with an empty set.. we repeat the procedure. the new system will still be a complete one. as the examples below show.8 Basic Notions n n n L. Proposition: Any (finite) generating system contains a basis. any finite complete (generating) set contains a complete linearly independent subset. Example: Differentiation: Let V = lfDn (the set of polynomials of degree at most n). So. if we delete the vector vk.Uk = 0 'r. .akVk = V-V= 0 k=l k=l k=l Since the system is linearly independent. Examples: You dealt with linear transformation before. It emphasizes the main property of a basis. the vectors vj' 1 ~j ~p. Definition: Let V. LINEAR TRANSFORMATIONS. Proof Suppose VI' v2"'" Vp E V is a generating (complete) set. If it is linearly independent. We write T: X ~ Y to say that T is a transformation with the domain X and the target space Y. + anvn is unique. and we are done. Since vk can be represented as a linear combination of vectors vj' j :j. (ak -advk = L. we are done. Suppose it is not linearly independent. a basis.fu. Then there exists a vector V k which can be represented as a linear combination of the vectors vj' j :j. Although this definition is more common than one presented in this text.e. b.

Linear transformations J!{' -7 Matrix-column mUltiplication: It turns out that a linear transformation T: jRn -7 jRm also can be represented as a multiplication. So. Example: Rotation: in this example V = W = jR2 (the usual coordinate plane). any linear transformation of jR is just a multiplication by a constant. Therefore the property 1 of linear transformation holds. It is also easy to see that the property 2 is also true./)' = af'. Fig. but by a matrix. T ((::)) ~ V~J and from this formula it is easy to check that the transformation is linear. Example: Let us investigate linear transformations T: jR ~ lR. Indeed. Rotation Namely. and a transformation Ty: jR2 -7 jR2 takes a vector in jR2 and rotates it counterclockwise by r radians. Any such transformation is given by the formula T (x) = ax where a = T (1). that this transformation is linear. Since Tyrotates the plane as a whole. it is easy to write a formula for T. and the transformation T: jR2 -7 jR2 is the reflection in the first coordinate axis. this is a linear transformation. Example: Reflection: in this example again V = W = jR2. but we will use another way to show that. T(x) = T(x x I) =xT(l) =xa = ax.Basic Notions 9 T (p):= p'lip E lP'n' Since if + g) = f + g and (a. not by a number. it rotates as a whole the parallelogram used to define a sum of two vectors (parallelogram law). r. . It can also be shown geometrically.
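
As a concrete illustration of the differentiation example (and a preview of the matrix representation developed on the next pages), one can encode a polynomial a0 + a1 t + a2 t^2 + a3 t^3 by its coefficient column and let a matrix act on it. The sketch below is not from the book; the monomial basis ordering and the decision to write the result again as a degree-at-most-3 coefficient column are assumptions made for illustration.

```python
import numpy as np

# Coefficient column (a0, a1, a2, a3) represents a0 + a1*t + a2*t^2 + a3*t^3.
# Differentiation sends t^k to k*t^(k-1); column k of D below is the image of
# the k-th basis polynomial, written in the same coefficient basis.
D = np.array([[0., 1., 0., 0.],
              [0., 0., 2., 0.],
              [0., 0., 0., 3.],
              [0., 0., 0., 0.]])

p = np.array([5., 1., 4., 2.])   # 5 + t + 4t^2 + 2t^3
print(D @ p)                     # [1. 8. 6. 0.], i.e. 1 + 8t + 6t^2
```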

2 a2.n So. Namely..2 al.l am..2 al. en of Rn.e... . . Indeed. i.k Then if we want Ax = T (x) we get al.2 am..l al.k am. + xnen = L:~=lxkek and T(x) = T(ixkek) = iT(Xkek) = iXkT(ek) = iXkak .. 2..n am.2 +X2 a2. the matrix-vect~r multiplication should be performed by the following column by .k ak = a2. T (x) = Ax. .1 am. this matrix contains all the information about T. that the column number k of A is the vector ak . n).2 am.n A= am. k = 1.l al.." the vectors of size m).. an together in a matrix A = [aI' a2. let X= Xn Then x = xle l + x 2e2 + . k=l k=l k=l k=l So. Let us show how one should define the product of a matrix and a vector (column) to represent the transformation T as a product. it is sufficient to know n vectors in Rm (i.n Recall..e. Let al.. an] (ak being the kth column of A.n +···+Xn a2. al. . Let T: ]Rn ~ ]Rm be a linear transformation. ..10 Basic Notions Let us see how. .l Ax = n LXkak k=l = Xl a2.n a2. . if we join the vectors (columns) aI' a2.1 a2. What information do we need to compute T (x) for all vectors x E ]Rn? My claim is that it is sufficient how T acts on the standard basis e" e 2.

}}. e2.3)=(14) 4·1 + 5·2 + 6·3 32 ( 41 25 6 Linear transformations and generating sets: As we discussed above. In particular. one can consider any basis. ••. . n thenT= TI. k= 1. If the matrix A of the linear transformation T is known. To get the matrix of a linear transformation T: Rn ---? Rm one needs to join the vectors ak = T ek (where e I . Example: 3)(~J3 = (1. m. T(x) = Ax.2+3. then T (x) can be found by the matrix-vector multiplication. that is.. To perform matrixvector multiplication one can use either "column by coordinate" or "row by column" rule. Example: The "column by coordinate" rule is very well adapted for parallel computing. T~: V ---? W such that Tvk = TIv". ••• .2. then yk = I n a ·x· = 1. It will be also very important in different theoretical constructions later. when doing computations manually. This can be expressed as the following row by column rule: To get the entry number k of the result. if vI' V 2.k here Xj and Yk are coordinates ofthe vectors x and y respectively.Basic Notions 11 coordinate rule: Multiply each column of the matrix by the corresponding coordinate of the vector. linear transformation T(acting from ~nto ~m) is completely defined by its values on the standard basis in ~n. en is the standard basis in Rn) into a matrix: kth column of the matrix is ak' k = 1.. and aj'k are the entries of the matrix A. However.. if Ax = y. j=lk. ifit is a basis) in V. 2. . n. The fact that we consider the standard basis is not essential. a linear transformation T: V ---? W is completely defined by its values on a generating set (in particular by its values on a basis). . vn is a generating set (in particular. it is more convenient to compute the result one entry at a time. one need to multiply row number k of the matrix by the vector. and T and TI are linear transformations T. ... Namely. .2. .. Conclusions 1. even any generating (spanning) set..2. .1+2.
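
Both multiplication rules stated above, column by coordinate and row by column, can be spelled out directly in code and compared with a built-in product. The following sketch reuses the 2 × 3 example from the text; numpy is an illustrative choice, not something the book relies on.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
x = np.array([1., 2., 3.])

# Column-by-coordinate rule: Ax is the linear combination of the columns of A
# with the entries of x as coefficients.
col_rule = sum(x[k] * A[:, k] for k in range(A.shape[1]))

# Row-by-column rule: entry k of Ax is (row k of A) . x.
row_rule = np.array([A[k, :] @ x for k in range(A.shape[0])])

print(col_rule)   # [14. 32.]
print(row_rule)   # [14. 32.]
print(A @ x)      # [14. 32.]  (the built-in product agrees)
```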

n can only be multiplied by an m x n matrix. not T(v + u). m. Remark: In the matrix-vector mUltiplication Ax the number of columns of the matrix A matrix must coincide with the size of the vector x.k = (row j of A) . I intentionally did not speak about sizes of the matrices A and B. then Ab I . In other words the product AB is defined i£ and only if A is an m x nand B is n x r matrix. ••. COMPOSITION OF LINEAR TRANSFORMATIONS AND lVIATRIX MULTIPLICATION Definition of the matrix multiplication: Knowing matrix-vector multiplication. and will be used in different theoretical constructions. but still have some unused entries in the vector. since an m x n matrix defines a linear transformation JR. Note that the usual order of algebraic operations apply.e.n ~ JR:m." a vector in JR. For example. . (column k of B) Formally it can be rewritten as . i.n. Since a linear transformation is essentially a multiplication. we can see that in order for the multiplication to be defined. the size of a row of A should be equal to the size of a column of B. .k are entries of the matrices A and B respectively.e. For a linear transformation T: JR. one can easily guess what is the natural way to define the product AB of two matrices: Let us multiply by A each column of B (matrix-vector multiplication) and join the resulting column-vectors into a matrix. but still have unused entries in the column.. very often people do not distinguish between a linear transformation and its matrix.k (the entry in the row j and column k) of the product AB is defined by (AB)j. so vector x must belong to JR. However. We will also use this notation. Recalling the row by column rule for the matrix-vector mUltiplication we get the following row by column rule for the matrices the entry (AB)j. b2 . . Formally. The easiest way to remember this is to remember that if performing multiplication you run out of some elements faster.n ~ JR. but if we recall the row by column rule for the matrix-vector multiplication. if using the "row by column" rule you run out of row entries. we will also use the same symbol for a transformation and its matrix. Abr are the columns of the matrix AB. if b I . It is also not defined if you run out of vector's entries. the notation Tv is often used instead of T(v). br are the columns of B. . its matrix is usually denoted as [T].12 Basic Notions The latter seems more appropriate for manual computations.k and bj. Ab2 . the multiplication is not defined.. When it does not lead to confusion. Tv + u means T(v) + u. It makes sense. The former is well adapted for parallel computers.k = Laj"b"k' if aj. then the multiplication is not defined. i. and use the same symbol for both. (AB)j..

Suppose we have two linear transformations.. the expression TI(Tix)) is well defined and the result belongs to ]Rm. T: ]Rr ~ ]Rm. Why are we using such a complicated rule of multiplication? Why don't we just multiply matrices entrywise? And the answer is. the columns of the matrix of Tare Abl' Ab2. Note that TI(x) ERn. T1: ]Rn ~ ]Rm and T2: ]Rr ~ ]Rn. Formally it can be written as . the direct computation of Te I and Te2 involves significantly more trigonometry than a sane person is willing to remember. we can (and will) write TI T2 instead of TI T2 and TIT2x instead of TI(Tix )). . then reflect everything in the xI-axis. say ]Rm. Note that in the composition TI T2 the transformation T2 is applied first! The way to remember this is to see that in TI T2x the transformation T2 meets x first. So. Namely.. Since the matrix multiplication agrees with the composition. arises naturally from the composition of linear transformations. . To find the matrix. where el' e2. So. then TI must act from Rn to some space. . Then to get the reflection T we can first rotate the plane by the angle -g.Basic Notions 13 Motivation: Composition of linear transformations. .. er is the standard basis in Rr. that the multiplication. we need to compute Tel and Te2 . and let To be the reflection in the xl-axis. in order for TI T2 to be defined the matrices of TI and T2 should We will usually identify a linear transformation and its matrix. An easier way to find the matrix of T is to represent it as a composition of simple linear transformation. say ]R'to ]Rn.. It is easy to show that T is a linear transformation. As we discussed in the previous section. T(e r). so let us find its matrix. Abr. r we have T (e k) = TI (T2(e k )) = TI(Be k) = TI(b k) = Abk (operators T2 and TI are simply the mUltiplication by B and A respectively). . and then rotate the plane by g. T2 as T (x) = TI(Tix)) \Ix ERr. So. 2. For k = 1. It is a linear transformation. let g be the angle between the xI axis and the line xI = 3x2. so it is defined by an m x r matrix.. . How one can find this matrix.. different form the row by column rule: for a composition T JT2 to be defined it is necessary that T2x belongs to the domain of T1• If T2 acts from some space. T (e 2). Remark: There is another way of checking the dimensions of matrices in a product. and that is exactly how the matrix AB was defined! Let us return to identifying again a linear transformation with its matrix. Since TI:]Rn ~ ]Rm. . taking everything back.. However. Example: Let T: ]R2 ~ ]R2 be the reflection in the line xI = 3x2. Define the composition T = TI T2 of the transformations T I . but in the next few paragraphs we will distinguish them be of sizes m x nand n x r respectively-the same condition as obtained from the row by column rule. as it is defined above. moving the line xI = 3x2 to the xl-axis. the columns of T are vectors T (e l )... knowing the matrices of TI and T2? Let A be the matrix of TI and B be the matrix of T2.

The properties of linear transformations then imply the properties for the matrix multiplication. Distributivity: A(B + C) = AB + AC. The new twist here is that the commutativity fails: Matrix multiplication is noncommutative.JW and similarly sin y = second coordinate length -. generally for matrices AB = BA. say a vector (3. i. To compute sin yand cos ytake a vector in the line x I = 3x2.14 Basic Notions T= RgTOR-Y where Rg is the rotation by g. Then ..e. COS(-y) R_y= ( sin(-y) -sine-y») (COSY cos(-y) = -siny sin y) cosy. Matrix multiplication enjoys a lot of properties. Associativity: A(BC) = (AB)C. the rotation matrices are known cosy -sin Y) Ry = ( sin y cos y . 3.J1O Gathering everything together we get T~ VoR-y~ lo G~I)(~ ~I) lo (~I ~) ~ I~ G~I)(~ ~I)( ~I ~) It remains only to perform matrix multiplication here to get the final result. 2. To ~ (~ _~). first coordinate 3 3 cos Y= length . (A + B)C = AC + BC. Properties of Matrix Multiplication. . One can take scalar multiplies out: A(aB) = aAB. One should prove the corresponding properties for linear transformations. This properties are easy to prove. familiar to us from high school algebra: I. provided either left or right side of each equation is well defined.~32 + 12 - 1 . The matrix of To is easy to compute.~32 + 12 - Il. provided that either left or right side is well defined. and they almost trivially follow from the definitions.
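
The reflection example above can be finished numerically: with cos γ = 3/√10 and sin γ = 1/√10, the product R_γ T_0 R_{-γ} multiplies out to a single 2 × 2 matrix. The sketch below is only a numerical check of that computation, not part of the text.

```python
import numpy as np

c, s = 3 / np.sqrt(10), 1 / np.sqrt(10)    # cos and sin of the angle gamma

R_gamma = np.array([[c, -s], [s,  c]])     # rotation by gamma
R_minus = np.array([[c,  s], [-s, c]])     # rotation by -gamma
T0      = np.array([[1., 0.], [0., -1.]])  # reflection in the x1-axis

T = R_gamma @ T0 @ R_minus                 # reflection in the line x1 = 3*x2
print(T)                                   # [[0.8 0.6], [0.6 -0.8]] up to rounding

# Sanity check: a vector on the line x1 = 3*x2 is fixed by the reflection.
print(T @ np.array([3., 1.]))              # [3. 1.]
```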

If we put the column vertically. One of the first uses of the transpose is that we can write a column vector x E Rn as x = (x \' x 2.Basic Notions 15 One can see easily it would be unreasonable to expect the commutativity of matrix multiplication. T So." when you take the transpose of the product. you change the order of the terms. for example. when A and Bare nxn (square) matrices. Then the product AB is well defined. A simple analysis of the row by columns rule shows that (AB)T = BTAT. it will use significantly more space.. but if m = r. The transpose of a matrix has a very nice interpretation in terms of linear transformations.k) its trace (denoted by trace A) is the sum of the n trace A = L ak. Trace and Matrix Multiplication. the rows of AT are the columns ofA. X-n)T. but for now transposition will be just a useful formal operation. namely it gives the so-called adjoint transformation. The formal definition is as follows: (AT)j. If we just pick the matrices A and B at random. Then trace(AB) = trace(BA) .e. letA and B be matrices of sizes m x nand n x r respectively.• . Even when both products are well defined.k = (A)kJ meaning that the entry of AT in the row number) and column number k equals the entry of A in the row number k and row number}. the columns of AT are the rows of A and vise versa. Indeed. its transpose (or transposed matrix) AT is defined by transforming the rows of A into the columns. BA is not defined. the chances are that AB = BA: we have to be very lucky to get AB = BA. the multiplication is still non-commutative. For a square (n x n) matrix A diagonal entries = (aj. For example I 2 (4 5 (1 4) !) ~ ~ ! . We will study this in detail later.k k=l Theorem: Let A and B be matrices of size m Xn and n Xm respectively (so the both p )ducts AB and BA are well defined). Given a matrix A. Transposed Matrices and Multiplication. . i.
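
The two identities discussed on this page, (AB)^T = B^T A^T and trace(AB) = trace(BA), are easy to confirm numerically for randomly chosen matrices of compatible sizes; the sizes in the sketch below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))   # 3 x 5
B = rng.standard_normal((5, 3))   # 5 x 3, so both AB and BA are defined

print(np.allclose((A @ B).T, B.T @ A.T))             # True: (AB)^T = B^T A^T
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True: trace(AB) = trace(BA)
```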

'Vx. for example on matrices which has all entries 0 except the entry I in the intersection of jth column and kth row. Clearly. Since a linear transformation is completely defined by its values on a generating system. the equalities AI=A. the identity transformation (operator) L Ix = x. we need just to check the equality on some simple matrices. when it is does not lead to the confusion we will use the same symbol I for all identity operators (transformations). We can consider two linear transformations. The transformation A is called right invertible if there exists a linear transformation C: W ~ V such that . there is the identity transformation I = Iv: V ~ V. T and Tl' acting from Mnxm to lR = lRI defined by T (X) = trace(AX). To be precise. Clearly. ISOMORPHISMS IDENTITY TRANSFORMATION AND IDENTITY MATRIX Among all linear transformations. there is a special one. 'Vx E V. J0. there is another way. the equality for X = A gives the theorem. This method requires some proficiency in manipulating sums in notation.16 Basic Notions There are essentially two ways of proving this theorem. its matrix is an n x n matrix 1 0 o 1 0 0 o 1 1=1n = 0 (l on the main diagonal and 0 everywhere else). One is to compute the diagonal. we use the notation In. However. When we want to emphasize the size of the matrix. entries of AB and of BA and compare their sums. Ivx = x. If you are not comfortable with algebraic manipulatioos. if I: lR n ~ lR n is the identity transformation in Rn. T} (X) = trace(XA) To prove the theorem it is sufficient to show that T = T 1. otherwise we just use 1. there are infinitely many identity transformations: for any vector space V. We say that the transformation A is left invertible if there exist a transformation B: W ~ V such that BA = I (I = I v here). for an arbitrary linear transformation A. INVERTffiLE TRANSFORMATIONS Definition: Let A: V ~ W be a linear transformation.k' INVERTIBLE TRANSFORMATIONS AND MATRICES. We will use the notation IV only we want to emphasize in what space the transformation is acting.IA =A hold (whenever the product is defined).

that we did not assume the uniqueness of B or C here. A-I: W ~ V such definition of an A-IA = IV' AA-l = Iw The transformation A-I is called the inverse of A. and it also can be checked by the matrix multiplication. and generally left and right inverses are not unique. Therefore the left inverse B is unique. Then BAC = B(AC) = BI = B. Note. Suppose for some transformation BI we have BIA = 1. right invertible). Definition: A matrix is called invertible (resp. AA. To show that this matrix is not right invertible. The matrix A-I is called (surprise) the inverse of A. then its left and right inverses Band C are unique and coincide. The identity transformation (matrix) is invertible. right invertible) if the corresponding linear transformation is invertible (resp. 1) -sin cos 1 rl is invertible.I = 1. If a linear transformation A: V ~ W is invertible. we just notice that there are more than one left inverse. left invertible. 11 2. Theorem: asserts that a matrix A is invertible if there exists a unique matrix A-I such that A-1A = I. and therefore B = C. 112). and the inverse is given by (R y = R_"( This equality is clear from the geometric description of Rg. Exercise: describe all left inverses of this matrix. left invertible. Theorem. 3. I)T is left invertible but not right invertible.Basic Notions 17 AC = I (here 1= I w)' The transformations Band C are called left and right inverses of A. One of the possible left inverses in the row (112. Examples: 1. Corollary: A transformation A: V ~ Wis invertible if and only if there erty is used as the exists a unique linear transformation (denoted A-I). Repeating the above reasoning with B I instead of B we get B 1 = C. The uniqueness of C is proved similarly. The column (l. On the other hand BAC = (BA)C = IC = C. The rotation Rg Ry = (C~S1 sm 1 = I. Proof Let BA = I and AC = 1. Definition: A linear transformation A: V ~ W is called invertible if it is both right and left invertible. .

it is invertib!e. Isomorphic spaces can be considered as di erent representation of the same space. If A is invertible.s-I(A-IA)B = . if a square matrix A has either left of right inverse. And finally. then the second factor is also invertible. An invertible linear transformation A: V ~ W is called an isomorphism. If A is invertible. then the product AB is invertible and (ABt l = . if A is invertible. if one of the factors (either A or B) and the product AB are invertible.s-I)A-I = AIA.I = AA-I = I and similarly (. So.18 Basic Notions 4. (A-Ir l = A. it is sufficient to check only one of the identities AA. (A-It l = A. and similarly AT (A-Il = (A-tAl = IT = 1. it is just another name for the object we already studied. Until we prove this fact. 2. Theorem: (Inverse of AT). 3. Remark: An invertible matrix must be square (n x n). The column (112. 1) is right invertible. . We did not introduce anything new here. Moreover. we will not use it.s-IA-I)(AB) = . then A-I is also invertible. Properties of the Inverse Transformation Theorem: (Inverse of the product). If A and B are invertible and the product AB is defined. So. ISOMORPHISM ISOMORPHIC SPACES. This fact will be proved later. However. then AT is also invertible and (ATt l = (A-I)T.s-I A-I (note the change of the order!) Proof Direct computation shows: (AB)(.s-IA-I) = A(B. I presented it here only to stop trying wrong directions.s-IA-I. then AB is invertible and (AB)-I = . but not left invertible.s-IB = I Remark: The invertibility of the product AB does not imply the in-vertibility of the factors A and B (can you think of an example?). 1I2l is a possible right inverse. If a matrix A is invertible. Two vector spaces V and Ware called isomorphic (denoted V == W) if there is an isomorphism A: V ~ W. If linear transformations A and B are invertible (and such that the product AB is defined). The row (l. then AT is also invertible and (ATrl = (A-t)T Proof Using (ABl = BT AT we get (A-t)T AT = (AA-t)T = IT = I.s-IIB = .I = L A-IA = 1. ret us summarize the main properties of the' inverse: 1. then A-I is also invertible.
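
The properties of the inverse summarized above, namely (AB)^{-1} = B^{-1} A^{-1}, (A^{-1})^{-1} = A and (A^T)^{-1} = (A^{-1})^T, can likewise be sanity-checked on a numerical example. The random matrices below are assumed invertible (which holds with probability one); the check is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)

print(np.allclose(np.linalg.inv(A @ B), Binv @ Ainv))  # (AB)^-1 = B^-1 A^-1
print(np.allclose(np.linalg.inv(Ainv), A))             # (A^-1)^-1 = A
print(np.allclose(np.linalg.inv(A.T), Ainv.T))         # (A^T)^-1 = (A^-1)^T
```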

Therefore in the above theorem we can state that vI' v2' •.2. vn ' Define transformation A: Aek = vk . Let V be a (real) vector space with a basis ]Rn ~ Vby v\' v2' .. .. n (as we know.. or "generating". . .. The inverse to the Theorem is also true Theorem: Let A: V ~ W be a linear map. so JP>n _ 2. . a linear transformation is defined by its values on a basis). e2. Aen+ 1 = (1 By Theorem A is an isomorphism.2. = ]Rn+l. Ae2 = t. en is the standard basis in ]Rn.. Remark: In the above theorem one can replace "basis" by "linearly independent".. Then the system Av l . .•. Theorem: LetA: V ~ Wbe an isomorphism. AVn is a basis. n. Remark: If A is an isomorphism. vn is a basis if and only if Avl' Av2 . .. . . . Again by Theorem A is an isomorphism. Examples: 1.. Then A is invertible if and only if for any right side b E W the equation Ax=b has a unique solution x E V. Ae n = (1-1. . Let A: ]Rn+1 ~ JP>n (JP>n is the set of polynomials I:=oak tk of degree at most n) is defined by Ae l = 1. Proof Define the inverse transformation A-I by A-Iwk = vk .2.. . Av2. More generally. vn be a basis in V. or "linearly dependent"-all these properties are preserved under isomorphisms... 3.. suppose that for some other vector XI E V Ax I -b Multiplying this identity by A-I from the left we get . if AVk = w k' k = 1. k = 1. wn are bases in Vand W respectively.. The theorem below illustrates this statement.. . . . . To show that the solution is unique. Mmxn == Rm"n Invertibility and equations Theorem: Let A: V ~ W be a linear transformation.. then so is A-I. Then x = A-Ib solves the equation Ax = b. and let vI' V2' .. and let VI' v2' ••• ..Basic Notions 19 meaning that all properties and constructions involving vector space operations are preserved under isomorphism. n. so V== Rn. . M 2x3 == JR6.. Proof: Suppose A is invertible. 4. then A is an isomorphism. vn and WI' w2' .. . k= 1. AVn is a basis in W. . where e l .

which is denoted as Null A or Ker A and consists of all vectors v E V such that Ay = o. Let us use symbol y instead of b. that the empty set 0 is not a vector space.. or kernel of A. W E W whicb can be If A is a matrix.e. If v E Vo then av E Vo for all scalars a. Trivial subspaces of a space V. p. is that the result of some operation does not belong to Vo. i. SUBSPACES A subspace of a vector space V is a subset Vo c V of V which is closed under the vector addition and multiplication by scalars. and therefore x I = A-I b = x. The only thing that could possibly go wrong.e. 3.2. because all operations are inherited from the vector space V they must satisfy all eight axioms of the vector space. that a subspace Vo c V with the operations (vector addition and multiplication by scalars) inherited from Vis a vector space.. Let us call this solution B (y). We need to show that B(aYI + PY2) = ap(YI) + PB(Y2)· Let xk := B(Yk)' k = 1. since it does not contain a zero vector. AXk =Yk' k = 1. namely V itself and {O} (the subspace consisting only of zero vector). we can see that any vector W E Ran A can be represented as . Let us now suppose that the equation Ax = b has a unique solution x for any b E W. 2. Note. i. then recalling column by coordinate rule of the matrix-vector multiplication. so it is not a subspace. Note that both identities. v E Vo' and for all scalars a. Then which means B(aYI + PY2) = aB(Yi) + PB(Y2)· Corollary: An m x n matrix is invertible ifand only ifits columns form a basis in Rm. We know that given yEW the equation Ax=y has a unique solution x E V. 1. With each linear transformation A : V -t W we can associate the following two subspaces: 2. For any u. AA-I = I and A-iA = I were used here.e. A: R m -t ]Rn. Note. But the definition of a subspace prohibits this! Now let us consider some examples: 1. Let us check that B is a linear transformation.20 Basic Notions A-lAx =A-Ib .2. Again. v E Vo the sum u + v E Vo. The range Ran A is defined as the set of all vectors represented as w = Ay for some v E V. Indeed. the conditions 1 and 2 can be replaced by the following one: au + bv E Vo for all u.. i. The null space.

which play role of x and y coordinates on the plane. and graphic engine connects them to reconstruct the object.. The notation span{v I. The simplest transformation is a translation (shift). so manipulations with bitmaps require a lot of computing power. vr } is also used instead of £{vl' v 2'···. We will not go into the details.. Of course. where every pixel of an object is described. for a matrix A. . one needs only to give the coordinates of its vertices. But there are standard methods allowing one to draw smooth curves through a collection of points. Position of each pixel is determined by the column and row. . not all objects on a computer screen can be represented as polygons.. like letters.Basic Notions 21 a linear combination of columns of the matrix A. So. and which vertex is connected with which. That is the reason that most ofthe objects. . Anybody who has edited digital photos in a bitmap manipulation programme.. Bitmap object can contain a lot of points. have curved smooth boundaries. For us a graphical object will be a collection of points (either wireframe model. vr . v 2' ••• . That explains why the term column space (and notation Col A) is often used for the range of the matrix. 4. and vector object. V 2' .. 2-Dimensional Manipulation The x . knows that one needs quite a powerful computer.. In particular we explain why manipulation with 3 dimensional images are reduced to multiplications of 4 x 4 matrices. the notation Col A is often used instead of Ran A. vr } It is easy to check that in all of these examples we indeed have subspaces. Given a system of vectors vI' V 2 ' .. each pixel is assigned a specific colour.y plane (more precisely. a rectangle there) is a good model of a computer monitor. appearing on a computer screen are vector ones: the computer only needs to memorize critical points. Remark: There are two types of graphical objects: bitmap objects.. or bitmap) and we would like to show how one can perform some manipulations with such objects. like Adobe Photoshop.y coordinates is a good model for a computer screen: and a graphical object is just a collection of points. and even with modern and powerful computers manipulations can take some time. So a rectangle on a plane with x . where we describe only critical points. Any object on a monitor is represented as a collection of pixels. Vr E Vits linear span (some-times called simply span) £{V I. APPLICATION TO COMPUTER GRAPHICS In this section we give some ideas of how linear algebra is used in computer graphics. vr } is the collection of all vectors V E Vthat can be represented as a linear combination v = alv I + a 2v2 +. A (digital) photo is a good example of a bitmap object: every pixel of it is described. + arvr of vectors vI' V2' . to describe a polygon. where each point (vector) v is . For example. some. but just explain some ideas. And now the last Example.

b :?: O. The rotation by yaround the origin o is given by the multiplication by the rotation matrix Rr we discussed above. r Stny cosy Ifwe want to rotate around a point a. that the translation is not a linear transformation (if a :f. then rotate around 0 (multiply by R) and then translate everything back by a.. All other transformation used in computer graphics are linear. it does not preserve O. we have the same basic transformations: Translation. However it is sometimes convenient to consider some di erent transformations. b then x and y coordinates scale di erently. so the translation is easy to implement. If a = b it is uniform scaling which enlarges (reduces) an object. moving the point a to 0. that any linear transformation in ]R2 can be represented either as a composition of scaling rotations and reflections. Another very useful transformation is scaling. First we need to be able to manipulate 3-dimensional objects. a.e. we first need to translate the picture by-a. preserving its shape. the vector v is replaced by v + a (notation v 1--7 v + a is used for this). The manipulations with 3-dimensional objects is pretty straightforward. Another often used transformation is reflection: for example the matrix defines the reflection through x-axis. given by the matrix This transformation makes all objects slanted. _ (COSY -sin y) R . reflection through a plane. We will show later in the book. If a :f. The first one that comes to mind is rotation. and then we need to represent it on 2-dimensional plane (monitor). 0): while it preserves the straight lines. Note. the object becomes "taller" or "wider". the horizontal lines remain horizontal. Matrices of these . but vertical lines go to the slanted lines at the angle j to the horizontal ones.22 Basic Notions translated by a. scaling. i. A vector addition is very well adapted to the computers. rotation. 3-Dimensional Graphics Three-dimensional graphics is more complicated. given by a matrix (~ ~). like the shear transformation.

Note. we know how to manipulate 3-dimensional objects. For example ( o~ ~ ~ J' (~ ~ ~J' (:~:~ ~:i. To get a more realistic picture one needs to use the so-called perspective projection. scaling. say to the x . So. Rotating an object and projecting it is equivalent to looking at it from di erent points. Perspective Projection onto x . 000 y :r % Fig. and rotation around z-axis. The simplest way is to project it to a plane. because it does not take into account the perspective. one can write matrices for the other 2 elementary rotations around x and around y axes. that the above rotation is essentially 2-dimensional transformation.23 Basic Notions transformations are very similar to the matrices of their 2 the matrices x 2 counterparts. the fact that the objects that are further away look smaller. this method does not give a very realistic picture. the matrix of this projection is [ ~ ~ ~J.y plane. To: Qefine a perspective projection one needs to pick a point the centre of projection or the . Let us now discuss how to represent such objects on a 2-dimensional plane. However.y plane. Similarly. To perform such projection one just needs to replace z coordinate by 0. It will be shown later that a rotation around an arbitrary axis can be represented as a composition of elementary rotations.yY ~J' 0 -I 0 0 cOO I represent respectively reflection through x . it does not change z coordinate.y plane: F is the centre (focal point) of the projection Such method is often used in technical illustrations.

. every point in ]R3 is represented by 4 coordinates. To do this let us introduce the so-called homogeneous coordinates.d-z l-z/d and similarly y* = y . and let v* = (x*. zl.. O) x' z x h) z Fig. that this formula also works if z > d and if z < 0: you can draw the corresponding similar triangles to check It... Let us get a formula for the projection. y*. y. y.24 Basic Notions focal point) and a plane to project onto.. 0."'" (x'. l-z/d Note. This is exactly how a camera works.z / d' O)T This transformation is definitely not linear (because of z in the denominator). to get usual3-dimensional coordinates of the vector v = (x. and it is a reasonable first approximation of how our eyes work. z) T xd x x*= . its image and the centre of the projection lie on the same line. x 3 . x 4l .. zl from its homogeneous coordinates (x l' x 2 . we get that x* x d d-z' so y .y' . Consider a point v = {x.. Then each point in ]R3 is projected into a point on the plane such that the point. Thus the perspective projection maps a point (x. 4th coordinate playing role of the scaling coe cient.. y. the last. In the homogeneous coordinates.. ol be its projection. y._ _.. d)T and that we are projecting onto x-y plane. Finding Coordinates x*. Assume that the focal point is (0. However it is still possible to represent it as a linear transformation. y* of the Perspective Projection of the Point (x. z) to the x y point ( 1.z / d ' 1. Thus.= ..

Then we first need to apply the translation by -(dp d2. In other words. For example. d3l. d) T but some arbitrary point (d t . O)Tto move the centre to (0. Thus in homogeneous coordinates the vector v* can be represented as (x.25 Basic Notions one needs to divide all entries by the last coordinate x4 and take the first 3 coordinates 3 (if x 4 = 0 this recipe does not work. Similarly. Ifwe multiply homogeneous coordinates of a point in]R2 by a non-zero scalar. and then move everything back translating it by (d t . 0. and so on. d3)T while preserving the x-y plane. d2 . how to make realistic lighting. y. I .z/dl. Of course. 0. ol. there is much more. so in homogeneous coordinates the perspective projection. here we only touched the mathematics behind 3-dimensional graphics. That explains why modern graphic cards have 4 x 4 matrix operations embedded in the processor. if the plane we project to is not x-y plane. how to determine which parts of the object are visible and which are hidden. apply the projection. etc. 0. we move it to the x-y plane by using rotations and translations. we do not change the point. All these operations are just multiplications by 4 x 4 matrices. d2 . so we assume that the case x 4 = 0 corresponds to the point at infinity). in homogeneous coordinates a point in ]R3 is represented by a line through 0 in ]R4. is a linear transformation: x y 0 l-zld = 0 0 0 x 0 I 0 0 y 0 0 0 0 z I I 0 0 -lid Note that in the homogeneous coordinates the translation is also a linear transformation: x y = 0 0 0 o 0 0 G3 z I I But what happen if the centre of projection is not a point (0. . shades.

Chapter 2

Systems of Linear Equations

Different Faces of Linear Systems

There exist several points of view on what a system of linear equations, or in short a linear system, is. The first one is that it is simply a collection of m linear equations with n unknowns x1, x2, ..., xn:

a_{1,1} x1 + a_{1,2} x2 + ... + a_{1,n} xn = b1
a_{2,1} x1 + a_{2,2} x2 + ... + a_{2,n} xn = b2
...
a_{m,1} x1 + a_{m,2} x2 + ... + a_{m,n} xn = bm.

To solve the system is to find all n-tuples of numbers x1, x2, ..., xn which satisfy all m equations simultaneously.

If we denote x := (x1, x2, ..., xn)^T ∈ R^n, b := (b1, b2, ..., bm)^T ∈ R^m, and let A be the m × n matrix of coefficients,

A = ( a_{1,1}  a_{1,2}  ...  a_{1,n} )
    ( a_{2,1}  a_{2,2}  ...  a_{2,n} )
    (  ...      ...     ...    ...   )
    ( a_{m,1}  a_{m,2}  ...  a_{m,n} ),

then the above linear system can be written in the matrix form (as a matrix-vector equation)

Ax = b.

To solve the above equation is to find all vectors x ∈ R^n satisfying Ax = b.

Finally, recalling the "column by coordinate" rule of the matrix-vector multiplication, we can write the system as a vector equation

x1 a1 + x2 a2 + ... + xn an = b,

where ak is the kth column of the matrix A, ak = (a_{1,k}, a_{2,k}, ..., a_{m,k})^T, k = 1, 2, ..., n.

Note, that these three forms are essentially just different representations of the same mathematical object.
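
The three descriptions above (a list of equations, the vector equation x1 a1 + ... + xn an = b, and the matrix equation Ax = b) are different notations for one and the same problem, which a small numerical example makes concrete. The sketch below uses numpy's built-in solver, which is of course not the solution method developed in this chapter.

```python
import numpy as np

# The system  x1 + 2*x2 = 5,  3*x1 + 4*x2 = 6  in matrix form A x = b.
A = np.array([[1., 2.],
              [3., 4.]])
b = np.array([5., 6.])

x = np.linalg.solve(A, b)
print(x)                                                  # [-4.   4.5]

# Vector-equation view: b is the combination x1*a1 + x2*a2 of the columns of A.
print(np.allclose(x[0] * A[:, 0] + x[1] * A[:, 1], b))    # True
```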

more "advanced" explanation why the above row operations are legal. Scaling: multiply a row by a non-zero scalar a. There are three types of row operations we use: 1.. the so-called echelon form. Row operations. As for the operation 3. To see that we do not gain anything extra. Solution of a Linear System. There is another.e. all the information necessary to solve the system is contained in the matrix A. It is clear that the operations 1 and 2 do not change the set of solutions of the system. we just notice that row operation of type 3 are reversible. Row exchange: interchange two rows of the matrix. one can easily write the solution. every row operation is equivalent to the multiplication of the matrix from the left by one ofthe special elementary matrices. So. Row operations and multiplication by elementary matrices. We will usually put the vertical line separating A and b to distinguish between the augmented matrix and the coefficient matrix.. i. which is called the coefficient matrix of the system and in the vector (right side) b. Namely. When the system is in the echelon form. i. x k' Yk or something else. let us notice that it does not matter what we call the unknowns. Echelon and Reduced Echelon Forms Linear system are solved by the Gauss-Jordan elimination (which is sometimes called row reduction). Namely. all other rows remain intact. This matrix is called the augmented matrix ofthe system. the "old' system also can be obtained from the "new" one by applying a row operation of type 3. Then any solution of the "old" system is a solution of the "new" one. the multiplication by the matrix . 2. let a "new" system be obtained from an "old" one by a row operation of type 3. they essentially do not change the system. on the equations). Hence. By performing operations on rows of the augmented matrix of the system (i. one can easily see that it does not lose solutions. 3. all the information we need is contained in the following matrix which is obtained by attaching the column b to the matrix A. that any solution of the "new" system is also a solution of the "old" one. Namely.e.Systems of Linear Equations 27 Before explaining how to solve a linear system. we reduce it to a simple form.. Row replacement: replace a row # k by its sum with a constant multiple of a row # j.e.

one again can simply check how they act on columns. one just replaces a by 1/a. performing a row operatiQn on the augmented matrix of the system Ax = b is equivalent to the multiplication of the system (from the left) by a special invertible matrix E. Left multiplying the equality Ax = b by E we get that any solution of the equation Ax =b . Note that all these matrices are invertible (compare with reversibility of row operations).. So. And finally. 0 o just interchanges the rows number} and number k. k ...... To get the inverse ofthe second one...28 Systems of Linear Equations k } o 1 o . To see that the inverses are indeed obtained this way. that the multiplication by these matrices works as advertised.. multiplication by the matrix 1 1 } k o Q o o 1 A way to describe (or to remember) these elementary matrices: they are obtained from I by applying the corresponding row operation to it adds to the row # k row # } multiplied by a. Finally.. the inverse of the third matrix is obtained by replacing a by -a.. The inverse ofthe first matrix is the matrix itself.. and leaves all other rows intact. . To see.. 1 o 1 0 o k Q 0 o 0 ] o o 1 Multiplication by the matrix multiplies the row number k by Q... one can just see how the multiplications act on vectors (columns).

by applying row operations of type 2. . This entry will be called the pivot entry or simply the pivot.. m. The main step of row reduction consists of three sub-steps: 1. then we leave the first row alone and apply the main step to rows 2. we get: U~ ~ n:. 3. After applying the main step finitely many times (at most m). We apply the main step to a matrix.. . Row reduction.(~ j j JJ Operate R2 (- ±). 3. Find the leftmost non-zero column of the matrix.e. "Kill" (i. that the first (the upper) entry of this column is non-zero. not even interchange it with another row.29 Systems oj Linear Equations is also a solution of EAx = Eb.~ . .. we get what is called the echelon form of the matrix. . 2. .m. (-2).. An example of row reduction. 2(-3). . Make sure. which is the original equation Ax = b. if necessary.. So. R 3 . etc.. make them 0) all non-zero entries below the pivot by adding (subtracting) an appropriate multiple of the first row from the rows number 2. m. we get . Multiplying this equation (from the left) by ~l we get that any of its solutions is a solution of the equation ~IEAx =~IEb . The point to remember is that after we subtract a multiple of a row from all rows below it (step 3). a row operation does not change the solution set of a system. Let us consider the following linear system: XI + 2x2 + 3x3 = 1 3xI+2x2 +x3 =7 { 2x1 + X 2 + 2x3 = 1 The augmented matrix of the system is (~ 2 ~ ~ 1 2 jJ 1 Operate R 2 . then to rows 3.. we leave it alone and do not change it in any way.

Adding 3 times the second row to the third (R3 + 3R2), we obtain
(1  2  3 |  1)
(0  1  2 | -1)
(0  0  2 | -4)
Now we can use the so-called back substitution to solve the system. Namely, from the last row (equation) we get x_3 = -2. Then from the second equation we get x_2 = -1 - 2x_3 = -1 - 2(-2) = 3, and finally, from the first row (equation) x_1 = 1 - 2x_2 - 3x_3 = 1 - 6 + 6 = 1. So the solution is
x_1 = 1, x_2 = 3, x_3 = -2,
or in vector form x = (1, 3, -2)^T. We can check the solution by multiplying Ax, where A is the coefficient matrix.

Instead of using back substitution, we can do row reduction from the bottom to the top, killing all the entries above the main diagonal of the coefficient matrix: we start by multiplying the last row by 1/2, and the rest is pretty self-explanatory:
(1  2  3 |  1)        (1  2  0 |  7)        (1  0  0 |  1)
(0  1  2 | -1)  -->   (0  1  0 |  3)  -->   (0  1  0 |  3)
(0  0  1 | -2)        (0  0  1 | -2)        (0  0  1 | -2)
and we just read the solution x = (1, 3, -2)^T off the augmented matrix.

Echelon form. A matrix is in echelon form if it satisfies the following two conditions:
1. All zero rows (i.e., the rows with all entries equal 0), if any, are below all non-zero rows.
2. For any non-zero row its leading entry (the leftmost non-zero entry of the row) is strictly to the right of the leading entry in the previous row.
The leading entry in each row of a matrix in echelon form is also called a pivot entry, or simply a pivot, because these entries are exactly the pivots we used in the row reduction. The second property of the echelon form can therefore be reformulated as follows: for any non-zero row, its pivot is strictly to the right of the pivot in the previous row.
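As a quick check of the example above, one can solve the same system numerically; the following short sketch in Python with NumPy (NumPy is an assumption of the sketch, not something the text relies on) recovers the solution (1, 3, -2)^T and performs the check by multiplying Ax:

import numpy as np

# Coefficient matrix and right side of the example system above.
A = np.array([[1., 2., 3.],
              [3., 2., 1.],
              [2., 1., 2.]])
b = np.array([1., 7., 1.])

x = np.linalg.solve(A, b)
print(x)                          # [ 1.  3. -2.]
assert np.allclose(A @ x, b)      # the check "multiply Ax" from the text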

A particular case of the echelon form is the so-called triangular form. In this form the coefficient matrix is square (n x n), all its entries on the main diagonal are non-zero, and all the entries below the main diagonal are zero. The right side, i.e., the rightmost column of the augmented matrix, can be arbitrary.

After the backward phase of the row reduction we get what is called the reduced echelon form of the matrix; the form with coefficient matrix equal to I, which we obtained in the example above, is a particular case of the reduced echelon form. The general definition is as follows: we say that a matrix is in reduced echelon form if it is in echelon form and, in addition,
3. All pivot entries are equal to 1 (this is achieved by applying row operations of type 2, i.e., scaling);
4. All entries above the pivots are 0.
Note that all entries below the pivots are also 0 because of the echelon form. To get the reduced echelon form from the echelon form, we work from the bottom to the top and from the right to the left, using row replacement to kill all entries above the pivots.

An example of the reduced echelon form is the system with the coefficient matrix equal to I. In this case one just reads the solution off the reduced echelon form. In the general case one can also easily read the solution from the reduced echelon form. For example, let the reduced echelon form of the system (augmented matrix) be
(1 2 0 0 0 | 1)
(0 0 1 5 0 | 2)
(0 0 0 0 1 | 3)
(the pivots here are the entries 1 in columns 1, 3 and 5). The idea is to move the variables corresponding to the columns without a pivot (the so-called free variables) to the right side. Then we can just write the solution:
x_1 = 1 - 2x_2, x_2 is free, x_3 = 2 - 5x_4, x_4 is free, x_5 = 3,
or in vector form
x = (1 - 2x_2, x_2, 2 - 5x_4, x_4, 3)^T.
One can also find the solution from the echelon form by using back substitution: the idea is to work from the bottom to the top, moving all free variables to the right side.
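For readers who want to experiment, the reduced echelon form and the pivot columns can also be computed symbolically; the sketch below uses Python with SymPy (SymPy and the chosen variable names are assumptions of the sketch) on the reduced echelon form displayed above, and checks the general solution with free variables x_2 and x_4:

from sympy import Matrix, symbols

# The reduced echelon form of the augmented matrix from the example above.
R = Matrix([[1, 2, 0, 0, 0, 1],
            [0, 0, 1, 5, 0, 2],
            [0, 0, 0, 0, 1, 3]])

rref, pivot_cols = R.rref()
print(pivot_cols)   # (0, 2, 4): pivots in columns 1, 3, 5 (0-indexed here),
                    # so columns 2 and 4 carry the free variables x2 and x4

# General solution read off the reduced echelon form.
x2, x4 = symbols('x2 x4')
x = Matrix([1 - 2*x2, x2, 2 - 5*x4, x4, 3])
assert all(e == 0 for e in (R[:, :5] * x - R[:, 5]))   # satisfies the system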

1. Il (all entries are 0. i. I should only emphasize that this statement does not say anything about the existence. Therefore.0. Equation Ax = b has a unique solution for any right side b if and only if echelon form of the coefficient matrix A has a pivot in every column and every row. . Now.32 Systems of Linear Equations Analyzing the Pivots All questions about existence of a solution and it uniqueness can be answered by analyzing pivots in the echelon (reduced echelon) form of the augmented matrix of the system.. Note. . If Ae has a zero row. if we put be = (0. .. statement 3 immediately follows from statements 1 and 2. A solution (if it exists) is unique i there are no free variables. i.. Let us show that if we have a zero row in the echelon form ofthe coefficient matrix A. if one just thinks about it: a system is inconsistent (does not have a solution) if and only ifthere is a pivot in the last row of an echelon form of the augmented matrix. because free variables are responsible for all nonuniqueness.. 3.. If we have a pivot in every row of the coefficient matrix. Ifwe don't have such a row. Then Ae=EA.. then we can pick a right side b such that the system Ax = b is not consistent. Multiplying this equation by n I from the left. then the equation Ac = be does not have a solution. First of all. except the last one). where E is the product of elementary matrices.e. .. Indeed. E 2 . we cannot have the pivot in the last column of the augmented matrix. let us investigate the question of when when is the equation Ax = b inconsistent.. E I. .e. + xn = b -:t that does not have a solution.. and not with the augmented matrix of the system. corresponding to the row operations. we get that the equation Ax = nIbe does not have a solution. no matter what the right side b is. that is if and only if the echelon form of the coefficient matrix has a pivot in every column. three more statements. I an echelon form of the augmented matrix has a row 00 . The second statement is a tiny bit more complicated. ° ° ° 2: Equation Ax = b is consistent for all right sides b if and only if the echelon form of the coefficient matrix has a pivot in every row. they all deal with the coefficient matrix. then the last row is also zero. b. b -:t in it. such a row correspond to the equation ox I + Ox2+. we just make the reduced echelon form and then read the solution off. The first statement is trivial. when it does not have a solution. so the system is always consistent. The answer follows immediately. Finally. . LetAe echelon form of the coefficient matrix A. an recalling that nIAe = A. E = EN.
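The pivot analysis above can be mimicked numerically by counting ranks instead of pivots (the number of pivots of an echelon form equals the rank of the matrix, a fact used again later for the fundamental subspaces). The following Python/NumPy sketch, with made-up 2 x 2 examples, classifies a system accordingly:

import numpy as np

def classify(A, b):
    """Classify Ax = b by the pivot criteria above, with ranks standing in for pivot counts."""
    A = np.asarray(A, dtype=float)
    aug = np.column_stack([A, b])
    r, r_aug, n = np.linalg.matrix_rank(A), np.linalg.matrix_rank(aug), A.shape[1]
    if r_aug > r:                    # a pivot appears in the last column: inconsistent
        return "no solution"
    return "unique solution" if r == n else "infinitely many solutions"

print(classify([[1, 2], [2, 4]], [1, 3]))   # no solution
print(classify([[1, 2], [3, 4]], [1, 1]))   # unique solution
print(classify([[1, 2], [2, 4]], [1, 2]))   # infinitely many solutions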

a linearly independent or a span'1ing system.. And finally. which is impossible if m > n (number of pivots cannot be more than number of rows). . .• Vm E ~n. Then 1.. the system VI' v2' ••. Similarly. ~n 2. The system vi' v 2' •.. generating) i echelon Proof The system VI' v2' •.. vm is a basis in ~n i echelon form of A has a pivot in every column and in every row. vm is linearly independent i echelonform of A has a pivot in every column. . The main observation.• vm.. it happens if and only XlvI if there is a pivot in every column in echelon form of the matrix.. . 3. Proposition. and letA = [VI' v2' ••• . . any row and any column have no more than 1 pivot in it (it can have 0 pivots) Corollaries about Linear Independence and Bases Questions as to when a system of vectors in ~n is a basis. Any linearly independent system of vectors in ~ n cannot have more than n vectors in it.. . The system vI' v2.. = xm = 0. can be easily answered by the row reduction. vm E ~m is complete in ~n ifand only if the equation +x2v2 +··· +xmvm=b has a solution for any right side b E ~ n . the system VI' v 2' •. .. vm] be the n x m matrix with columns v I' v 2' . it happens if and only if there is a pivot in every column of the matrix. . Any two bases in a vector space V have the same number of vectors in them. vm E ~m is linearly independent ifand only if the equation XlvI +x2v2 +··· +xmvm=O has the unique (trivial) solution XI = x 2 = . The system VI' V2' ••.• vm] be an n x m matrix with columns vI' v2• . . In echelon form. the equation Ax = 0 has unique solution x = O. vm E ~n be linearly independent. ..33 Systems of Linear Equations From the above analysis of pivots we get several very important corollaries. or equivalently.. E Proof Let a system vi' v2' . vm E ~m is a basis in ~n ifand only if the equation +x2v2 +··· +xmvm=b has unique solution for any right side b XlVI ~n. By statement 2 above.. By statement 1 above. Proposition. vm. Let us have a system of vectors vI' v2• . and let A = [vI' v2• . Proposition. vrn is complete in form ofA has a pivot in every row. By Proposition echelon form of A must have a pivot in every column.. (spanning. By statement 3 this happens if and only ifthere is a pivot in every column and in every row of echelon form of A.....
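A hedged numerical version of these criteria (again replacing the count of pivots in the echelon form by the rank, and using NumPy with made-up vectors) might look as follows:

import numpy as np

def independent(vectors):
    """v1..vm in R^n are linearly independent iff echelon form of [v1 ... vm]
    has a pivot in every column, i.e. iff the rank equals m."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

def complete(vectors):
    """v1..vm are complete (spanning) in R^n iff there is a pivot in every row,
    i.e. iff the rank equals n."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[0]

v1, v2, v3 = [1, 0, 1], [0, 1, 1], [1, 1, 2]
print(independent([v1, v2, v3]))          # False: v3 = v1 + v2
print(independent([v1, v2]))              # True
print(complete([v1, v2, [0, 0, 1]]))      # True: a basis in R^3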

The above propo·sition immediately implies the foIlowing Corollary. In other words. we know.wm be two different bases in V. n ~ m.. Proof As it was discussed in the beginning of the section. . Corollaries About Invertible Matrices Proposition. There is also an alternative proof. m ~ n.2... means that the equation Ax = b has a unique solution for any (all possible) right side b. Consider an isomorphism A : IR... n. The existence means that there is a pivot in every row (of a reduced echelon form of the matrix).34 Systems of Linear Equations Proof Let vI' V 2' . Since A-I is also an isomorphism. en is the standard basis in IRn. Without loss of generality we can assume that n ~ m. . A matrix A is invertible if and only if its echelon form has pivot in every column and every row. but there is also a direct proof. hence the number of pivots is exactly n.• .. Proposition above states that it happens if and only if there is a pivot in every row and every column. The uniqueness mean that there is pivot in every column of the coefficient matrix (its echelon form). n ~ V defined by Ae k = vk' k = 1.. The fact that the system is a basis.. Proof This fact foIlows immediately from the previous proposition. An invertible matrix must be square (n x n). Any spanning (generating) set in IR n must have at least n vectors. Proposition. that the matrix (linear transformation) A is invertible if and only if the equation Ax = b has a unique solution for any possible right side b.. .. Together with the assumption n ~ m this implies that m =n. But. . . where e l . . Let v I' v 2' . and letA be n x m matrix with columns VI' v2' . then it is invertible. A-I w2' ••. vm be a complete system in IR n . to check the invertibility of a square matrix A it is sucient to check only one of the conditions AA-I = 1. e 2 .. so m = number of columns = number of pivots = n Proposition. So it is linearly independent. vm be a basis in IR n and let A be the n x m matrix with columns VI' v2' . vm . We know that a matrix is invertible if and only if its columns form a basis in. Any basis in IR n must have exactly n vectors in it.•• . . Proposition. or ifit is right right invertible. .lj a square (n x n) matrix is left invertible.. Statement 2 of Proposition implies that echelon form of A has a pivot in every row. A-I wm is a basis. Since number of pivots cannot exceed the number of rows. the equation Ax = b has a unique solution for any right side b if and only if the echelon form of A has pivot in every row and every column. the system A-I wl . A-IA = 1. vn and w"w2' . The statement below is a particular case of the above proposition. . Proof Let VI' v2' . vm ..

so A is invertible. Note that this proposition applies only to square matrices! Let us also give a direct proof. If a matrix A is left invertible, i.e., BA = I for some B, and x satisfies Ax = 0, then multiplying this identity by B from the left we get x = 0, so the equation Ax = 0 has only the trivial solution. Therefore the echelon form of A has pivots in every column; since A is square, it also has a pivot in every row, so A is invertible. If a matrix A is right invertible, and C is its right inverse (AC = I), then for x = Cb, b in R^n, we have Ax = ACb = Ib = b. Therefore, for any right side b the equation Ax = b has a solution x = Cb. Thus the echelon form of A has pivots in every row; since A is square, the echelon form also has pivots in every column, so the matrix is invertible.

We know that a matrix A is invertible if and only if the equation Ax = b has a unique solution for any right side b, and this happens if and only if the echelon form of the matrix A has pivots in every row and in every column. Therefore an invertible matrix must be square, and its echelon form must have pivots in every row and every column. In particular, any invertible matrix is row equivalent (i.e., can be reduced by row operations) to the identity matrix: the reduced echelon form of an invertible matrix is the identity matrix I.

FINDING A^{-1} BY ROW REDUCTION

Now let us state a simple algorithm for finding the inverse of an n x n matrix:
1. Form an augmented n x 2n matrix (A | I) by writing the n x n identity matrix to the right of A.
2. Performing row operations on the augmented matrix, transform A to the identity matrix I.
3. The matrix I that we added will be automatically transformed to A^{-1}.
4. If it is impossible to transform A to the identity by row operations, A is not invertible.

There are several possible explanations of the above algorithm. The first, a naive one, is as follows: we know that (for an invertible A) the vector A^{-1}b is the solution of the equation Ax = b. So to find the column number k of A^{-1} we need to find the solution of Ax = e_k, where e_1, e_2, ..., e_n is the standard basis in R^n. The above algorithm just solves the equations Ax = e_k, k = 1, 2, ..., n, simultaneously!
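A minimal implementation sketch of this algorithm in Python with NumPy (the pivot-selection strategy and the test matrix, which is the 3 x 3 matrix inverted by hand in the next example, are choices of the sketch rather than prescriptions of the text):

import numpy as np

def inverse_by_row_reduction(A):
    """Reduce (A | I) to (I | A^{-1}) by row operations, as in the algorithm above."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])            # the augmented n x 2n matrix (A | I)
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k]))  # pick a row with a non-zero pivot
        if np.isclose(M[p, k], 0.0):
            raise ValueError("A is not invertible")
        M[[k, p]] = M[[p, k]]                # row exchange
        M[k] /= M[k, k]                      # scaling: make the pivot equal 1
        for i in range(n):                   # row replacement: kill the other entries
            if i != k:
                M[i] -= M[i, k] * M[k]
    return M[:, n:]                          # the right half is now A^{-1}

A = np.array([[1., 4., -2.], [-2., -7., 7.], [3., 11., -6.]])
assert np.allclose(inverse_by_row_reduction(A) @ A, np.eye(3))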

. EA = J. FINITE-DIMENSIONAL SPACES Definition. . But the same row operations transform the augmented matrix (AI 1) to (EA IE) = (I I A-I). . so E = A-I.. A-I =EN··· E E 2 I' so A = (A-It l = EjlE". every row operation can be realized as a left multiplication by an elementary matrix./. Proof As we discussed in the previous paragraph. ( 3 11 -6 Augmenting the identity matrix to it and performing row reduction we get 1 4 -2 1 0 OJ ( 1 4 -2 1 o OJ 1 0 -2 -7 7 0 1 0 2R 0 1 3 2 ( 3 11 -6 0 0 1 ~3R: 0 -1 0-3 o 1 +R2 +2R2 1 4 -2 1 0 0JX3 (3 12 -6 3 0 013210 -01321 (o 0 3 -1 1 1 0 0 3 -1 1 0~J-R3 Here in the last row operation we multiplied the first row by 3 to avoid fractions in the backward phase of row reduction.. E-.. EN be the elementary matrices corresponding to the row operation we performed.l . E 2. Let E I .. The dimension dim V of a vector space V is the number of vectors in a basis. Theorem.. and let E = EN . Suppose we want to find the inverse of the matrix 1 4 -2J -2 -7 7 . more "advanced" explanation. i. E2EI be their product.0 1 0 3 0 -1 0 0 3 -1 1 1 Dividing the first and the last row by 3 we get the inverse matrix -35/3 2/3 14/3J 3 0 -1 ( -113 1/3 113 DIMENSION.e. . 1 We know that the row operations transform A to identity. This "advanced" explanation using elementary matrices implies an important proposition that will be often used later. Any invertible matrix can be represented as a product ofelementary matrices. As we discussed above.. Continuing with the row reduction we get 3 12 0 1 2 2Jo 1 0 3 0 -1 ( o 0 3 -1 1 1 12R 2 (3 0 0 -35 2 14J .36 Systems of Linear Equations Let us also present another.

. . and let A : V -t IR n be an isomorphism. Av2 . If V does not have a (finite) basis. This immediately implies the following Proposition. . Av2. . n. Proposition.2.. Proposition. Any linearly independent system 0/ vectors in a finitedimensional space can be extended to a basis. otherwise we call it infinite-dimensional. n = dimE to move the problem to IR n . Vn E V.37 Systems of Linear Equations For a vector space consisting only of zero vector 0 we put dim V = o. Proof Let vI' v ' .. that it does not"depend on the choice of a basis.. n. vm E Vbe a linearly independent system. vm E Vbe a complete system in V. if dim V = n then there exists a basis VI' V 2 ' ••• . then there always exists an isomorphism A : V -t IRn.. Proposition. """' vn is a basis in V. ••. . . we call the space V finitedimensional. . Then Av I. A vector space Vis finite-dimensional if and only if it has a finite spanning system. and by Proposition m ~ n. v r+ 2 """' vn such that the system o/vectors vI' v2..e. . Proposition asserts that the dimension is well defined. k = 1. we put dim V = 00. Suppose. . where all such questions can be answered by row reduction (studying pivots). vr are linec:rly independent vectors in afinitedimensional vector space V then one can find vectors vr+l. or if it is complete)? Probably the simplest way is to use an isomorphism A : V -t IR n . and letA: V -t IR n be 2 an isomorphism... that we have a system of vectors in a finite-dimensional vector space. Indeed. that if dim V = n. . i.. Note. If dim V is finite. Any generating system in afinite-dimensional vector space V must have at least dim V vectors in it. . AVm is a linearly independent system in IR n . Then Av I. AVm is a complete system in lR n . .. ijvI' v2. and one can define an isomorphism A : V -t IR n by AVk = ek .. and we want to check if it is a basis (or if it is linearly independent. and by Proposition m ::. Proof Let vI' v2.. Any linearly independent system in a finite-dimensional vector space V cannot have more than dim V vectors in it.e. i.

v r' vr + I is linearly independent.•• . and let H be the set of all solutions of the associated homogeneous system Ax = O. ofthe solution set) of a linear system.38 Systems of Linear Equations Proof Let n = dim Vand let r < n (if r = n then the system v I' V 2' . . v r } and call it vr + I (one can always do that because the system vI' V 2' .. Let a vector xlsatisfy the equation Ax = b.e. this theorem can be stated as General solution of Ax =b A particular solution = of Ax = b Proof Fix a vector xI satisfying AXI =b. vr is already a basis. Then for x h :=x-x I we get + General solution of Ax = 0 . Take any vector not belonging to span{vl' v2' . Repeat the procedure with the new system to get vector vr + 2. General Solution of a Linear System In this short section we discuss the structure ofthe general solution (i. Note. that the process cannot continue infinitely.. We call a system Ax = b homogeneous. Let a vector xh satisfy Axh = O. i. Then for we have Ax = A(x I + xh) = AXI + AXh = b + 0 = b... a homogeneous system is a system of form Ax = O.. so any x of form x = xI + xh' xII E H is a solution of Ax = b.. (General solution of a linear equation). b = 0. .. Now let x be satisfy Ax = b. Theorem. vr is not generating).. if the right side. With each system Ax = b we can associate a homogeneous system just by putting b = O. The system vI' v2 ' . We will stop the process when we get a generating system. and so on. In other words. because a linearly independent system of vectors in V cannot have more than n = dim V vectors. Then the set {x = XI + x h : xh E H} is the set of all solutions of the equation Ax = b. .e. and the case r> n is impossible).

we can repeat the row operations. if the solution was obtained by some non-standard method. The power of this theorem is in its generality. Of course. here we just replaced the last vector by its sum with the second one. we keeping notation x3 and Xs here only to remind us that they came from the corresponding free variables. Namely. Now. For example. 0. so H. it can look differently from what we get from the row reduction. So.b = 0. For example the formula gives the same set as (can you say why?). You will meet this theorem in differential equations. 1. consider a system (11 1 1=~)x (li). 2. The simplest way to check that give us correct solutions. There is an immediate application in this course: this theorem allows us to check a solution of a system Ax = b. It applies to all linear equations. = 2 2 2 2 -8 14 Performing row reduction one can find the solution of this system The parameters x 3. to study uniqueness. which always has a solution. ol satisfies the equation Ax = b. and we want to check whether or not it is correct. Therefore any solution x of Ax X h E = b can be represented as x = xl + xh with some xh E H. partial differential equations. this formula is different from the solution we got from the row reduction. and that the other two (the ones with . that we are just given this solution.Systems of Linear Equations AXh = A(x . let us suppose. etc. but this is too time consuming. Besides showing the structure of the solution set.xl) 39 = Ax - AXI = b . but it is nevertheless correct. this theorem allows one to separate investigation of uniqueness from the study of existence. t and s. x5 can be denoted here by any other letters. integral equations. is to check that the first vector (3. for example. we do not have to assume here that vector spaces are finitedimensional. we only need to analyse uniqueness of the homogeneous equation Ax = 0. Moreover.

Often the term row space is used for Ran AT and the term left null space is used for KerAT (but usually no special notation is introduced). The four subspaces RanA. one does not have to perform all row operations to check that there are only 2 free variables. there should be 2 free variables. FUNDAMENTAL SUBSPACES OF A MATRIX A: V ~ Wwe can associate two subspaces. which is one of the fundamental notions of Linear Algebra. then in addition to Ran A and Ker A one can also consider the range and kernel for the transposed matrix AT . if one does row operations. we will need new notions of fundamental subspaces and of rank of a matrix. What comes to mind. In other words. namely. the number of pivots is 3. For example. If A is an m x n matrix. and that formulas both give correct general solution. So indeed. We will need the following definition. Systems of linear equations example.40 Systems of Linear Equations the parameters x3 and Xs or sand t in front of them) should satisfy the associated homogeneous equation Ax = O. If this checks out. a mapping from ]Rn to ]Rm. we will be assured that any vector x defined is indeed a solution. that this method of checking the solution does not guarantee that gives us all the solutions. In this section we will study important relations between the dimensions of the four fundamental subspaces. its kernel. is the dimension of the range of A rankA := dim Ran A. one needs to do echelon reduction. and it looks like we did not miss anything. Ker AT are called the fundamental subspaces of the matrix A. let a be the matrix. and the range Ran A is exactly the set of all right sides b E W for which the equation Ax = b has a solution. and Ae be its echelon form . Note. To be able to prove this. which is often used instead of Ran A. Computing Fundamental Subspaces and Rank To compute the fundamental subspaces and rank ofa matrix. Definition. then it follows from the "column by coordinate" rule ofthe matrix mUltiplication that any vector W E Ran A can be represented as a linear combination of columns of A. or null space KerA = Null A := {v E V: Av = O} C V. Namely. and its range Ran A = {w E W: w = Av for some v E V} C W. This explains the name column space (notation Col A).. Ker A. is to count the pivots again. Given a linear transformation (matrix) a its rank. Ran AT.e. In this example. If A is a matrix. rankA. the kernel Ker A is the solution set of the homogeneous equation Ax = 0. i. if we just somehow miss the term with x 2' the above method of checking will still work fine.

The pivot rows of the echelon from Ae give us a basis in the row space. which in this example is . the columns 1 and 3 of the original matrix. So.. the columns where after row operations we will have pivots in the echelon form) give us a basis (one of many possible) in Ran A. Compute the reduced echelon form of A. Our reason for putting them vertically is that although we call RanAT the row space we define it as a column space of AT) To compute the basis in the null space Ker A we need to solve the equation Ax = O. 3.e. The question of whether to put vectors here vertically as columns. Example. -!J 0 0 000 000 (the pivots are boxed here). Consider a matrix i3 i3 ~3 ~3 2~J ' (1 1 -1 -1 0 Performing row operations we get the echelon form (~oo ~ ~ -. But if we already have the echelon form of A. and then do row operations.. it is possible just to transpose the matrix. or horizontally as rows is is really a matter of convention. Of course. To find a basis in the null space Ker A one needs to solve the homogeneous equation Ax = 0: the details will be seen from the example below.e.. i. the vectors (we put the vectors vertically here.Systems of Linear Equations 41 1.e. say by computing Ran A. The pivot columns of the original matrix a (i. 2. i. then we get Ran AT for free. the columns give us a basis in Ran A. We also get a basis for the row space RanA T for free: the first and second row of the echelon form of A.

Indeed. in this case the last column of the augmented matrix is the column of zeroes. Unfortunately. the knowledge of the echelon form of a does not help here. in the vector form x= -1 0 1 0 0 +x4 -1 +xs 0 1 1 =x2 -x4 --xs 3 X4 0 0 -113 0 -113 0 1 Xs The vectors at each free variable. why do the above methods indeed give us bases in the fundamental subspaces? .42 Systems of Linear Equations ill o ( oo 1 ill0 01 113. we can read the solution 0 the reduced echelon form above: 1 xI = -x2 -3"xs . 113) 0 0 0 0 0 0 0 0 0 Note. without actually writing it. it is sucient to work with the coefficient matrix. -1. Xs is free or. we can just keep this column in mind. x4 is free. there is no shortcut for finding a basis in KerAT. x2is free.e.[g] [-11~] -113 O. one must solve the equation AT x = O. in our case the vectors [ 001 -~]o . Keeping this last zero column in mind. 1 0 form a basis in KerA. that when solving the homogeneous equation Ax = 0.. i. Explanation of the Computing Bases in the Fundamental Subspaces So. it is not necessary to write the whole augmented matrix. So. Unfortunately. which does not change under row operations.

Ev2..... . The row space Ran A T. the reduced echelon form Are is obtained from A by the left multiplication Are = EA. Let us now show that the pivot columns of a span the column space of A.w2' . we can conclude that a l = O. It is easy to see that the pivot rows of the echelon form Ae of a are linearly independent. wr are 0. the entry number k of the resulting vector is exactly x k' so the only way this vector (the linear combination) can be 0 is when all free variables are O. Indeed. where E is a product of elementary matrices... . EVr are the pivot columns of Are' and the column v ofa is transformed to the column Ev of Are. . i. . found all the solutions. The case of the null space KerA is probably the simplest one: since we solved the equation Ax = 0. Consider the first non-zero entry of WI. the vectors we obtained form a spanning system in Ker A. Multiplying this equality by g-I from the left we get the representation v = aIv I + a 2v2 + .. . the pivot columns of the original matrix A are linearly independent. To see that vectors w I ... Let VI ' v 2.e.. notice that the pivot columns of the reduced echelon form are of a form a basis in Ran Are. one can notice that row operations do not change the row space. Consider now the first non-zero entry ofw2. First of all. . The vectors Ev l . We want to show that v can be represented as a linear combination of the pivot columns vI' v2. .. so indeed the pivot columns of A span Ran A.Systems of Linear Equations 43 The null space KerA. To see that the system is linearly independent. wr span the row space. The corresponding entries of the vectors w3. wr the corresponding entries equal 0 (by the definition of echelon form). we get that a k = 0 Vk = 1.. Since for all other vectors w2. The column space Ran A. + arvr. so a 2 = O. ..+ arvr. Let us now explain why the method for finding a basis in the column space Ran A works. but we present here a more formal way to demonstrate this fact. + arwr = O.+ a. . r. Thus. . and let V be an arbitrary column of A.. This can be obtained directly from analyzing row operations. 2. let us multiply each vector by the corresponding free variable and add everything... vr' v = a IVI + a2v2 + . then any vector in Ker A is a linear combination of the vectors we obtained. let wI 'w2. .. Therefore. vector Ev can be represented as a linear combination Ev = alEv I + a 2Ev2 + . . Since row operations are just left multiplications by invertible atrices.. .. Then for each free variable x k .. so E is an invertible matrix.wr be the transposed (since we agreed always to put vectors vertically) pivot rows of Ae. Suppose alw l + a 2w 2 + . .Evr.w3' ... . vr be the pivot columns of A. Since the pivot columns of Are form a basis in RanA re . Repeating this procedure. they do not change linear independence... So we can just ignore the first term in the sum..

However.e.. DIMENSIONS OF FUNDAMENTAL SUBSPACES There are many applications in which one needs to find a basis in column space or in the null space of a matrix. .e. since dimensions of both Ran A and RanA Tare equal to the number of pivots in the echelon form of A. Proof The proof. to n). dim Ker AT + dimRan AT = dim Ker AT + rank AT = dimKer AT + rankA T = m (dimension of the target space ofA). If a is an m x n matrix. by removing unnecessary vectors (columns). i. Then 1. A(X) : = {y = A(x) : x EX}. where E is an m x m invertible matrix (the product of the corresponding elementary matrices).e. For example. rank A) adds up to the number of columns (i. a linear transformationfrom ]Rn to ]Rm. The second statement. Finding a basis in the column space means simply extracting a basis from a spanning set..e..l The proof of this theorem is trivial. Theorem rankA = rankA T . A: THE RANK THEOREM. and Ae is its echelon form.. 2. i. This theorem is often stated as follows: IThe column rank of a matrix coincides with its row rank. as it was shown above. The following theorem is gives us important relations between dimensions of the fundamental spaces. the number of pivots. Then Ran A: = Ran(AT ET) =AT (Ran ET) =AT (~m) so indeed RanA T = Ran = Ran AT . modulo the above algorithms of finding bases in the fundamental subspaces. Ae is obtained from A be left multiplication Ae = EA. solving a homogeneous equation Ax = 0 amounts to finding a basis in the null space KerA. dim Ker A + dim Ran A = dim Ker A + rank A = n (dimension of the domain of A). x E X. if one takes into account that rank A = rank AT is simply the first statement applied to AT. The first statement is simply the fact that the number of free variables (dimKer A) plus the number of basic variables (i. is almost trivial. Let A be an m Xn matrix. the most important application of the above methods of computing bases of fundamental subspaces is the relations between their dimensions. It is often also called the Rank Theorem Theorem.44 Systems of Linear Equations For a transformation A and a set X let us denote by A(X) the set of all elements y which can represented as y = A(x).
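The relations between the dimensions of the four fundamental subspaces are easy to confirm on a concrete matrix. The sketch below uses Python with SymPy (SymPy and the stand-in matrix A are assumptions of the sketch, not data from the text) to compute bases of the four subspaces and to verify both statements of the Rank Theorem:

from sympy import Matrix

# A stand-in example; any integer matrix will do.
A = Matrix([[1, 1, 2, 2],
            [2, 2, 4, 4],
            [1, 0, 1, 3]])

col_space = A.columnspace()    # basis of Ran A (pivot columns)
null_space = A.nullspace()     # basis of Ker A (solutions of Ax = 0)
row_space = A.rowspace()       # basis of Ran A^T
left_null = A.T.nullspace()    # basis of Ker A^T

rank = len(col_space)
assert rank == len(row_space)               # rank A = rank A^T
assert rank + len(null_space) == A.cols     # dim Ker A + rank A = n
assert rank + len(left_null) == A.rows      # dim Ker A^T + rank A^T = m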

how can we guarantee that any of the formulas describe all solutions? First of all. last 2 vectors in either formula form a basis in KerA. the last 2 vectors (the ones multiplied by the parameters) belong to Ker A. hence dim Ker A ~ 2. we already can see that the rank is 3. An important corollary of the rank theorem. there we considered a system 2 3 1 1 ( 1 1 -3 111 421 -9J (17J6 -5 x = 8 . rankA + dim Ker A = 5. By Theorem. is the following theorem connecting existence and uniqueness for linear equations. (Actually. but it is enough just to have the estimate here). we know that in either formula. 2 2 2 3 -8 14 and we claimed that its general solution given by or by A vector x given by either formula is indeed a solution of the equation. Therefore. and therefore there cannot be more than 2 linearly independent vectors in KerA.1t is easy to see that in either case both vectors are linearly independent (two vectors are linearly dependent if and only if one is a mUltiple of the other).Systems of Linear Equations 45 As an application of the above theorem. Let A be an an m x n matrix. But. so either formula give all solutions of the equation. let us count dimensions: interchanging the first and the second rows and performing first round of row operations J(i ~ ~ l =~J -(J ~ -~ i =~J 2R --R 1 1 1 2 -5 0 0 0 1 -2 J -2RJ 2 2 2 3 -8 0 0 0 1 -2 we see that there are three pivots already. Then the equation Ax = b has a solution for every b ATx=O E lR m if and only if the dual equation . S0 rank A ~ 3. Now. Theorem.

Change of Coordinates Formula The material we have learned about linear transformations and their matrices can be easily extended to transformations in abstract vector spaces with finite bases.. then the dimensions of the domain Rn and of the range Ran A coincide. Coordinate vector. Proof The proof follows immediately from Theorem by counting the dimensions. . + xnbn = LXkbk' k=I The numbers xl' X2. vn to the standard basis e I. which relates the coordinate vectors [Tv]B and [v]A' [Tv]B= [1]BA [v]A. . then the transformation "kills" dimKerA dimensions.. . . and let a = {a!' a2. B := {bl' b2. not A).. Matrix of a linear transformation. en in IRn. an}. that in the second equation we have AT. "'j .dim Ker A. It transforms the basis v!' v2. . Namely. bm } be bases in Vand W respectively. . (Note. .... . If the kernel is non-trivial. that if a transformation a: IR n ~ IRtn has trivial kernel (KerA = {O}). statement 1 of the theorem says. so dimRanA =n . It is convenient to join these coordinates into the so-called coordinate vector of v relative to the basis B. Representation of a Linear Transformation in Arbitrary Bases. A matrix of the transformation T in (or with respect to) the bases a and b is an m x n matrix. Let T: V ~ W be a I inear transformation. In this section we will distinguish between a linear transformation T and its matrix.... which is the column vector Note that the mapping v ~ [v]B is an isomorphism between Vand IRn. denoted by [11 BA . There is a very nice geometric interpretation of the second rank theorem. . Xn are called the coordinates of the vector v in the basis B. e 2...46 Systems of Linear Equations has a unique (only the trivial) solution. bn}· Any vector v E V admits a unique representation as a linear combination n V = xlb I + x 2b2 + .. Let V be a vector space with a basis B := {bI' b2. the reason being that we consider different bases. . so a linear transformation can have different matrix representation.
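Finding the coordinate vector [v]_B amounts to solving the linear system B[v]_B = v, where the columns of the matrix B are the basis vectors; here is a short Python/NumPy sketch with a stand-in basis of R^2 (the basis and the vector v are illustrative choices, not taken from the text):

import numpy as np

# Coordinates of v in a basis B = {b1, b2} of R^2: solve B [v]_B = v.
b1, b2 = np.array([1., 2.]), np.array([1., -1.])
B = np.column_stack([b1, b2])

v = np.array([3., 0.])
coords = np.linalg.solve(B, v)      # the coordinate vector [v]_B
assert np.allclose(coords[0] * b1 + coords[1] * b2, v)
print(coords)                        # [1. 2.], i.e. v = 1*b1 + 2*b2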

. so we do not repeat it here. i. Let us have two bases A = {aI' a2. ". Let our space Vbe and let us have a basis B = {bI' b2. The proof here goes exactly as in the case of ~n spaces with standard bases... An example: change of coordinates from the standard basis. . let T) : x ~ Yand T2 : Y ~ Z be linear transformation. Change of Coordinate Matrix. . Consider the identity transformation I = I v and its matrix [1]BA in these bases. . . Namely. The for the composition T= T2T I. . The matrix [1]BA is easy to find: its kth column is just the coordinate vector [Tak]B (compare this with finding the matrix of a linear transformation from ~n to ~m). By the definition [v]B = [1]BA [v] A . its kth column is the coordinate representation [aklB of kth element of the basis A Note that [1]AB = ([1] BAtl. Another possibility is to transfer everything to the spaces ~n via the coordinate isomorphisms v ~ [v]B' Then one does not need any proof. (follows immediately from the mUltiplication of matrices rule). T: x ~ Z. Tx:= T2(T I (x)) we have [1]CA = [T2 T dcA= [T2]CB [TdBA (notice again the balance of indices here). . The matrix [1]BA is easy to compute: according to the general rule of finding the matrix of a linear transformation. so any change of coordinate matrix is always invertible. The matrix [1]BA is often called the change ofcoordinates (from the basis A to the basis B) matrix. and let A.Systems of Linear Equations 47 notice the balance of symbols A and B here: this is the reason we put the first basis A into the second position. \Iv E V. bn} in a vector space V.. an} and b = {bI' b2. " bn} there..'" bn] =: B.e. Yand Z respectively. for any vector v E Vthe matrix [1]BA transforms its coordinates in the basis a into coordinates in the basis B. The change of coordinates matrix [1]SB is easy to compute: [1]SB = [bl' b2. en} there. As in the case of standard bases.. composition of linear transformations is equivalent to multiplication oftheir matrices: one only has to be a bit more careful about bases.. everything follows from the results about matrix multiplication. Band C be bases in X. We also have the standard basis S = {el' e2. . ~n.

Let T: V ~ W be a linear transformation. 1 . The rule is very simple: . For example. [1] Matrix of a transformation and change of coordinates. x}. in }R2 . and it is also easy to check that the above matrix is indeed the inverse of B) An example: going through the standard basis.. B. However. And in the other direction [l]BS = ([1] SB )-1 = 15 1. consider a basis un B={(1). and we want to find the change of coordinate matrix [1]BA. and let S denote the standard basis there. In the space of polynomials of degree at most 1 we have bases . Then [1]SB = Ui) =: B and 1]-1 S[1]BS= [ SB 1(-I2 -I2) = B-1 ="3 (we know how to compute inverses. In PI we also have the standard basis S = {l. and we know how to do that. Of course. 1 + x}.e. i. it involves solving linear systems. and B = {I + 2x.. and taking the inverses [1]As=A _I (1 -1)1 ' [1]BS= 15 = 4"1(22 -1· 1) = 0 1 Then and Notice the balance of indices here. [llsA = (1 ~2) =: B. A = {I. the matrix [1] BA. A be two bases in V and let B. I think the following way is simpler. Suppose we know the matrix [1] BA' and we would like to find the matrix representation with respect to new bases A. and let A. we can always take vectors from the basis A and try to decompose them in the basis B. it is just the matrix B whose kth column is the vector (column) vk .2x}.e. and for this basis [I]SA = n (6 =: A.48 Systems of Linear Equations i. B be two bases in W.

If a is similar to B. b2. Consider a linear transformation T: V ~ V and let [11 AA be its matrix in this basis (we use the same basis for "inputs" and "outputs") The case when we use the same basis for "inputs" and "outputs" is very important (because in this case we can multiply a matrix by itself). i. . It is shorter. The above reasoning shows. . By the change of coordinate rule above [T]BB = [I]BA[T]AA[I]AB Recalling that [I]BA = [flAk and denoting Q := [I]AB ' we can rewrite the above formula as [11 BB = Q -1 [11 AA Q.. Case of one basis: similar matrices. I[Tls A = [I]BB[T]BA[I]AA I The proof can be done just by analyzing what each of the matrices does. so I recommend using it (or at least always keep it in mind) when doing change of coordinates.Systems of Linear Equations 49 to get the matrix in the "new" bases one has to surround the matrix in the "old" bases by change of coordinates matrices. we can just say that A and B are similar. . so let us study this case a bit more carefully. Let V be a vector space and let A = {aI' a 2. an} be a basis in V.. . that very often in this [11 A is often used instead of [11 AA . Notice. if A = g-IBQ. that it does not matter where to put Q and where g-l: one can use the formula A = QBg-I in the definition of similarity. that similar matrices A and B have to be square and of the same size. The above discussion shows. We say that a matrix A is similar to a matrix b if there exists an invertible matrix Q such that A = g-I BQ.. This gives a motivation for the following definition Definition. but two index notation is better adapted to the balance of indices rule. it follows from counting dimensions. . therefore B is similar to A. Since an invertible matrix must be square. that one can treat similar matrices as different matrix representation of the same linear operator (transformation). then B = QAg-I = (g-ItIA(g-I) (since g-l is invertible). matrix representation of a linear transformation changes according to the formula Notice the balance of indices. So. bn} be another basis in V. Namely. because we don't have any choice if we follow the balance of indices rule..e. case the shorter notation [11 A is used instead of [T]AA' However. the two index notation [T] is better adapted to the balance of indices rule. We did not mention here what change of coordinate matrix should go where.. Let B = {b I .
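The change of basis formula [T]_BB = Q^{-1} [T]_AA Q, with Q = [I]_AB, is easy to test numerically. In the Python/NumPy sketch below the two bases, the operator matrix and the use of the trace as a similarity invariant are all stand-in choices made for illustration:

import numpy as np

A_basis = np.array([[1., 0.], [0., 1.]])        # basis A (here the standard basis), as columns
B_basis = np.array([[1., 1.], [2., -1.]])       # basis B, as columns

T_A = np.array([[2., 1.], [0., 3.]])            # [T]_AA, a made-up operator matrix

# Change of coordinates matrix Q = [I]_AB: the B-vectors written in the A-coordinates.
Q = np.linalg.solve(A_basis, B_basis)
T_B = np.linalg.inv(Q) @ T_A @ Q                # [T]_BB = Q^{-1} [T]_AA Q

# [T]_AA and [T]_BB are similar: same operator in different bases, e.g. same trace.
assert np.isclose(np.trace(T_A), np.trace(T_B))
assert np.allclose(Q @ T_B @ np.linalg.inv(Q), T_A)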

Matrics

Introduction

A rectangular array of numbers of the form

a_11 ... a_1n
  .        .
  .        .          (1)
a_m1 ... a_mn

is called an m x n matrix, with m rows and n columns. We count rows from the top and columns from the left. Hence

(a_i1, ..., a_in)   and   (a_1j, a_2j, ..., a_mj)^T

represent respectively the i-th row and the j-th column of the matrix (1), and a_ij represents the entry in the matrix (1) on the i-th row and j-th column.

Example. Consider the 3 x 4 matrix

 2  4  3 -1
 3  1  5  2
-1  0  7  6

Here

(3  1  5  2)   and   (3, 5, 7)^T

represent respectively the 2-nd row and the 3-rd column of the matrix, and 5 represents the entry in the matrix on the 2-nd row and 3-rd column.

We now consider the question of arithmetic involving matrices. First of all, let us study the problem of addition.

(c) are easy consequences of ordinary addition. We first study the simpler case of multiplication by scalars. For part (d). . we can consider the matrix A' obtained from A by multiplying each entry of A by -1. Suppose further that 0 represents the m x n matrix with all entries zero. Suppose that A=[ ~ ~11 ~ . (b) A + (B + C) = (A + B) + C. B. -1 0 7 Proposition. Then we write all ~ql al amI + bml amn ·+ bmn A+B=[ n . and (d) there is an m x n matrix A' such that A + A' = O. -1 0 7 and B 6 =[ ~ ~ ~2 ~Il' -2 1 3 3 Then A+B = [~:~ . and includes multiplication of a matrix by a scalar as well as mUltiplication of two matrices. Care m x n matrices. as matrix addition is simply entry-wise addition. Suppose that the two matrices a~ 1 a~n 1 .51 Matrics study the problem of addition. [b~ I A=: : and B = : [ amI amn bm1 both have m rows and n columns.~! -21~171 [~ ~ ~ 9~1' and -1-1 0+1 7+3 6+3 -1 1 10 Example. Definition. Example.. The theory of multiplication is rather more complicated. A reasonable theory can be derived from the following definition. (Matrix Addition) Suppose that A.. b~:n 1 bmn ~bln 1 and call this the sum of the two matrices A and B. .:~ . Proof Parts (a) . . (c) A + 0 = A. Then (a) A + B = B + A. We do not have a definition for adding" the matrices 2 [ -1 4 3 0 7 -I] [~ ~l 6 and .
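Matrix addition being entry-wise, it is easy to experiment with. In the following Python/NumPy sketch, A is the 3 x 4 matrix from the examples above, while B and the use of NumPy are assumptions of the sketch; the last lines show that there is indeed no sum defined for matrices of different sizes:

import numpy as np

A = np.array([[2, 4, 3, -1],
              [3, 1, 5, 2],
              [-1, 0, 7, 6]])
B = np.ones((3, 4), dtype=int)       # a stand-in 3 x 4 matrix of the same shape

print(A + B)                          # entry-wise sum, another 3 x 4 matrix
assert ((A + B) == (B + A)).all()     # commutativity, part (a) of the proposition

try:
    A + np.eye(2)                     # matrices of different sizes cannot be added
except ValueError as e:
    print("no definition for this sum:", e)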

Definition. (Multiplication by Scalar) Suppose that the matrix A = (a_ij) has m rows and n columns, and that c is a real number. Then we write cA for the m x n matrix whose entry in the i-th row and j-th column is c a_ij, and call this the product of the matrix A by the scalar c.

Example. Suppose that

A = ( 2  4  3 -1 )
    ( 3  1  5  2 )
    (-1  0  7  6 )

Then

2A = ( 4  8  6 -2 )
     ( 6  2 10  4 )
     (-2  0 14 12 )

Proposition. Suppose that A, B are m x n matrices, and that c, d are real numbers. Suppose further that O represents the m x n matrix with all entries zero. Then
(a) c(A + B) = cA + cB;
(b) (c + d)A = cA + dA;
(c) 0A = O; and
(d) c(dA) = (cd)A.

Proof. These are all easy consequences of ordinary multiplication, as multiplication by the scalar c is simply entry-wise multiplication by the number c.

The question of multiplication of two matrices is rather more complicated. To motivate this, let us consider the representation of a system of linear equations

a_11 x_1 + ... + a_1n x_n = b_1,
  ...........................
a_m1 x_1 + ... + a_mn x_n = b_m,

in the form Ax = b, where

A = ( a_11 ... a_1n )                ( b_1 )
    (  .         .  )    and   b =   (  .  )
    ( a_m1 ... a_mn )                ( b_m )

represent the coefficients, and x = (x_1, ..., x_n)^T represents the variables.
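The following Python/NumPy sketch (NumPy and the chosen values of the variables are assumptions of the sketch) checks the scalar multiple 2A computed above and illustrates that the matrix form Ax = b reproduces the left-hand sides of the system written out entry by entry:

import numpy as np

A = np.array([[2, 4, 3, -1],
              [3, 1, 5, 2],
              [-1, 0, 7, 6]])

# Multiplication by a scalar is entry-wise.
assert (2 * A == np.array([[4, 8, 6, -2], [6, 2, 10, 4], [-2, 0, 14, 12]])).all()

# The system a_i1 x_1 + ... + a_in x_n (i = 1..m) is exactly the matrix product Ax.
x = np.array([1, 0, 2, -1])                       # any choice of the variables
lhs = np.array([sum(A[i, j] * x[j] for j in range(4)) for i in range(3)])
assert (lhs == A @ x).all()                       # the matrix form gives the same left sides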

This can be written in full matrix notation by

( a_11 ... a_1n ) ( x_1 )     ( b_1 )
(  .         .  ) (  .  )  =  (  .  )
( a_m1 ... a_mn ) ( x_n )     ( b_m )

Can you work out the meaning of this representation?

Now let us define matrix multiplication more formally.

Definition. Suppose that A = (a_ij) is an m x n matrix and B = (b_ij) is an n x p matrix. Then the matrix product AB is the m x p matrix whose entry q_ij in the i-th row and j-th column is given, for every i = 1, ..., m and j = 1, ..., p, by

q_ij = a_i1 b_1j + a_i2 b_2j + ... + a_in b_nj

(the sum of a_ik b_kj over k = 1, ..., n).

Remark. Note first of all that the number of columns of the first matrix must be equal to the number of rows of the second matrix. For a simple way to work out q_ij, the entry in the i-th row and j-th column of AB, we observe that the i-th row of A and the j-th column of B are respectively

(a_i1  a_i2  ...  a_in)   and   (b_1j, b_2j, ..., b_nj)^T.

We now multiply the corresponding entries (a_i1 with b_1j, a_i2 with b_2j, and so on, until a_in with b_nj) and then add these products to obtain q_ij.

Example. Consider the matrices

A = ( 2  4  3 -1 )              ( 1  4 )
    ( 3  1  5  2 )    and   B = ( 2  3 )
    (-1  0  7  6 )              ( 0 -2 )
                                ( 3  1 )

Note that A is a 3 x 4 matrix and B is a 4 x 2 matrix, so that the product AB is a 3 x 2 matrix. Let us calculate the product AB. Consider first of all q_11.

To calculate this. we need the I-st row of A and the I-st column of B. so that [~ ~ ~ ~H :+~l :] 3 x From the definition.3 = 3 + 2 + 0 + 6 = 11 . To calculate this.54 Matrics A=[ ~ -1 4 3 . we need the I-st row of A and the 2-nd column of B. x [ x x x x x x x x x -2 [X ql21 x xJ =X 1 x. we have q2I = 3. so that 2 4 3 -I] ~: [qtlx x qt2] x . From the definition.3 + 3 (. Consider next q2I. 0 + 2.6 -1 = 13. we have qI2 = 2. so let us cover up all unnecessary information.2 + 5.1 + 4. so that 2 4 3-1]: .4 + 4.2 + 3.3 = 2 + 8 + 0 -3 = 7.0 + (-1) . o x [ x x x xxx x x x = 3 x From the definition. we have qll = 2. Let us calculate the product Consider first of all q II. -2 0 1 Note that A is a 3 x 4 matrix and B is a 4 x 2 matrix. so let us cover up all unnecessary information. so that the product AB is a 3 x 2 matrix. To calculate this. we need the 2-nd row of A and the I-st column of B.1 = 8 + 12 .2) + (. so let us cover up all unnecessary information. Consider next q 12.1 + 1.1) .

To calculate this.1 +0.3 =-1 +0+0+ 18= 17.0+6. so let us cover up all unnecessary information. Consider next q31.1 = . we have q22 = 3.4 + 0. (.3 + 7 (.4 + 1.3 + 5. so that x x -1 : : :j: : =[: x x x -2 0 7 6 X q32 X From the definition.2) + 2.2) + 6. we have q31 =(-1). To calculate this.12.55 Matries Consider next q22. so let us cover up all unnecessary information.2+7. 3 From the definition. we need the 3-rd row of A and the 2-nd column of B. We therefore conclude that AB=[ ~ 41 53 2-Ij o -1 7 6 1 4 2 3 0 -2 3 +~ 17 Example. we have q32 = (-1) . so that x 4 x x x x 5 3 3 x 2 x -2 = x x x x x x X q22 X X j . Consider again the matrices A~P -1 4 3 1 5 0 7 -Ij ~ and B= 1 4 2 3 0 -2 3 137 1. so let us cover up all unnecessary information. To calculate this. we need the 3-rd row of A and the I-st column of B. -12 . we need the 2-nd row of A and the 2-nd column of B. so that 1 x x x x v 2 x x x x x -1 0 7 6 o x 3 x From the definition. Consider finally q32.14) + 6 = .4 + 0 + (.1 = 12 + 3 -10 + 2 = 7.

(Associative Law) Suppose that A is an mn matrix. Definition. Systems of Linear Equations Note that the system of linear equations can be written in matrix form as Ax = b. Proposition. Proposition (Distributive Laws) (a) Suppose that A is an m x n matrix and Band Care n x p matrices. Proof Clearly the system (2) has either no solution.v» = Au + A(c(u . B is an n x p matrix. . Then c(AB) = (cA)B = A(cB). Suppose that A is an m x n matrix. one solution or infinitely many solutions. Inversion of Matrices We shall deal with square matrices.v) = Au -Av = b . Proposition. or more than one solution. Then A(B + C) = AB + AC. Clearly we have infinitely many solutions. and that c E JR. where the matrices A. so that A(u . Then Au = band Av = b. we have A(u + c(u .v) is a solution for every c E lR. Then A(BC) = (AB)C. where 0 is the zero m x 1 matrix. It now follows that for every c E lR. We shall establish the following important result. Every system of linear equations of the form. exactly one solution.has either no solution. The n x n matrix .b = 0. so that x = u + c(u . x and b are given. Then (A + B)C = AC +BC. (b) Suppose that A and Bare m x n matrices and C is an n x p matrix. We leave the proofs of the following results as exercises for the interested reader. so that we do not have a definition for the "product" BA. It remains to show that if the system (2) has two distinct solutions. then it must have infinitely many solutions.56 Matries Note that B is a 4 x 2 matrix and A is a 3 x 4 matrix.v» = b + c) = b.v» = Au + c(A(u . those where the number ofrows equals the number of columns. Proposition. Suppose that x = u and x = v represent two distinct solutions. B is an np matrix and C is an p x r matrix.

Proposition. Proposition.57 Matrics where I if i = j. a·· { 1)= 0 ifi-::. We shall relate the existence of such a matrix B to some properties of the matrix A..!:j. An n x n matrix A is said to be invertible if there exists an n x n matrix B such that AB = BA = In. it is sucient to show that B-IA -1 satises the requirements for being the inverse of AB. It shows that the identity matrix In acts as the identity for multiplication of n x n matrices. we shall be content with nding such a matrix B if it exists. Note that a a a a a a a a 1 a a a a 1 II = (1) and 14 = The following result is relatively easy to check. Equality follows from the uniqueness of inverse. is called the identity matrix of order n.A -I) = AA -I = In and (B-IA -I)(AB) = B -I(A -I(AB» = B-I« A -IA)B) =B-I(InB) B -IB = In as required.. is it possible to find another n x n matrix B such that AB = BA = In? However. Proof Suppose that B satises the requirements for being the inverse of A.A = A. Then its inverse A-I is unique. Suppose that A is an invertible n x n matrix. . Suppose that A is an invertible n x n matrix.. Note that (AB)(B-IA -I) =A(B(B -IA -I» = A «BB -I)A -I) = A(I. Remark. Then (AB) B-1 =A -I Proof In view of the uniqueness of inverse. we have AIn = I. Definition. Proposition. Proof Note that both (A -1) -1 and A satisfy the requirements for being the inverse of A -I. It follows that A -I =A -lIn = A -I(AB) = (A -IA)B = InB = B. Hence the inverse A -I is unique. we say that B is the inverse of A and write B = A-I. Then (A -I) -I =A. Then AB = BA = In.. In this case. This raises the following question: Given an n x n matrix A. Proposition. For every n x n matrix A. Suppose that A and B are invertible n x n matrices.
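These propositions, (A^{-1})^{-1} = A and (AB)^{-1} = B^{-1}A^{-1}, can be confirmed numerically. In the Python/NumPy sketch below the random 3 x 3 matrices are stand-ins (a randomly generated matrix is invertible with probability 1):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

AB_inv = np.linalg.inv(A @ B)
assert np.allclose(AB_inv, np.linalg.inv(B) @ np.linalg.inv(A))   # (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(np.linalg.inv(A)), A)            # (A^{-1})^{-1} = A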

It can be checked that if we take P =[ -2~ ~ ~l' 3 0 then p-I =[=~ 4~3 ~ ].. is called a diagonal matrix of order n. : [ ani ann where aij = 0 whenever i ::j: j. 3 -5/3 -1 Furthermore. if we write D=[~3 ~ n then it can be checked that A = PDP -1. Definition. Example. so that .58 Matrics Application to Matrix Multiplication In this section. Detailed discussion of the technique involved will be covered.:A: k However. as we shall see in the following example. it is usually rather complicated to calculate Ak =1 d:. The 3 x 3 matrices [~ ~ ~ 1 [~ ~ ~] and are both diagonal. we shall discuss an application of invertible matrices. -30 20 12 Suppose that we wish to calculate A98. Example.. Consider the 3 x 3 matrix A= 17 -10 -5] 45 -28 -15. Given an n x n matrix A. An n x n matrix a~ I A= al n . the calculation is rather simple when A is a diagonal matrix.

Note that this example is only an illustration. if the inverse exists. This is much simpler than calculating A98 directly. and 001 Note that • [ :~: :~: :~:] = [~ ~ ~][ ::: ( :~: :~: :~: 0 0 1 a31 a32 a33 Let us interchange rows 2 and 3 of A and do likewise for 13 . 0 1 o a31 a32 a 33 Let us add 3 times row 1 to row 2 of A and do likewise for 13. and (3) multiplying one row by a non-zero constant. These are: (1) interchanging two rows. We obtain respectively a31 a21 a 32 a 22 a33 a23 land ~~ 0 0 0 Note that [all a 31 a12 a32 a13 1 0 a 33 and 0 0 Til 1 a21 a12 13 a22 a0 23 ] . Finding Inverses by Elementary Row Operations In this section. Example. a22 a23 . we shall discuss a technique by which we can nd the inverse of a square matrix. 98 p-I. Consider the matrices A = ( • all a12 a13 a 21 a22 a23 and h= (1 0 0 0 0 O. ~I ~2 ~3 0 0 1 Let us interchange rows 1 and 2 of A and do likewise for 13. We have not discussed here how the matrices P and D are found. We obtain respectively a21 • ::: :::]. let us recall the three elementary row operations we discussed in the previous chapter. (2) adding a multiple of one row to another row. We obtain respectively ~ ~ ~].59 Matrics A 98 398 0 0 0 298 0 0 0 298 = ~PD-I)"'(PDP-I~ = PD98 p.I = P . Before we discuss this technique.

0 0 I Note that • all Sa21 [ a31 Let us multiply [ [I al3] 0 01[a11 a12 al3] Sa23 = 0 S 0 J a21 a22 a23 ..2a32 +a12 a22 a32 -2a +a.2a32 + a l2 a22 -2a33 + al3] 1 a23 = 0 [ a 31 a32 a3 3 0 Let us multiply row 2 of A by S and do likewise -2a31 + all a21 [S:~I a31 S::2 a32 S::3] and a33 0 -2][ all a12 al3] 1 0 a 21 a22 a23 0 1 a31 a32 a33 for 13" We obtain respectively ~ ~ ~]. -a31 -a32 -a33 0 0 -1 Let us now consider the problem in general. We obtain respectively al2 Sa22 ::: ::: -~I -~2 :::] and[ -~3 ~ ~ ~ 1. 0 0 -1 Note that [:~: :~: :~:]= [~ ~ ~][ :~: :~: :~: ].~ 3al:~ a21 a22 a23 ] = ~] [~ ~ ~][ ::: ::: :::].] 33 a23 a33 and [~ 0 -2]o . a32 a33 0 0 1 a31 a32 a33 row 3 of A by . 0 Note that • . We obtain respectively -2a31 +all a 21 a31 .60 Matrics [ a l2 all 3all +a21 3al2 +a22 a31 a32 a" ] 3al3 + a23 and 1 0 3 1 0 0 a33 Note that [ • 3alla~ 3al. a31 a32 a33 0 0 1 a31 a32 a33 Let us add -2 times row 3 to row 1 of A and do likewise for 13 . a 31 a32 a33 .1 and do likewise for 13 .

... Example. By an elementary n x n matrix.." l(Ek ···E2EIA I Ek . ••• ... and suppose that B is obtained from A by an elementary row operation. <Xk then In = Ek . -2 3 0 To find A-I. then the right hand half of the array gives the inverse A-I. E2EI = Ek . We now perform elementary row operations on the array and try to reduce the left hand half to the matrix In. . ·E2ElI n) = (In.. then the final array is clearly in reduced . Suppose that it is possible to reduce the matrix A by a sequence <X J <X2 . Consider an n x n matrix A. Proposition. we mean an n x n matrix obtained from In by an elementary row operation. Consider the matrix <Yk A=[ ~ ~ ~. The interested reader may wish to construct a proof. .. E2EIA We therefore must have A-I = Ek . Suppose further that E is an elementary matrix obtained from In by the same elementary row operation.. Ek are respectively the elem'entary n x n matrices obtained from In by the same elementary row operations <XI' <X2. Suppose that A is an n x n matrix. The process can then be described pictorially by' (A I In) C'i) l(EJA I EJln ) °2 l(E2EIA I E2ElIn) 03 l. . We state without proof the following important result. we consider an array with the matrix A on the left and the matrix In on the right. taking into account the different types of elementary row operations.1 A.J).. E 2E lI n· It follows that the inverse A-I can be obtained from In by performing the same elementary row operations <Xl' <X2 •••• <Xk• Since we are performing the same elementary row operations on A and In it makes sense to put them side by side. Note that if we succeed. we consider the array 121 030 300 o 0] o 1 1 0 We now perform elementary row operations on this array and try to reduce the left hand half to the matrix 13 . <Xk of elementary row operations to the identity matrix In. Then B = EA. .61 Matrics Definition. We now adopt the following strategy. If EI' E2. If we succeed in doing so. In other words..

62 Matrics row echelon form. we obtain [ [ o~ ~3 ~ ~9 ~4 ~3]. we obtain 3 3 0 -15 10 6 o o -3 0 6 -4. Adding . we obtain [ ~o ~3 ~3 ~3 ~ ~l' 15 12 6 0 3 Adding 5 times row 2 to row 3. we obtain [ ~ ~3 ~3 ~3 ~ ~l' -2 3 0 0 0 1 Adding 2 times row 1 to row 3. o 0 -3 -9 5 3 Adding -1 times row 3 to row 2. we obtain .. we obtain [ o~ ~3 ~3 ~3 ~ ~l' 5 4 2 0 1 MUltiplying row by 3 by 3. 3 3 0 -3 -9 5 3 Adding 2 times row 3 to row 1. We therefore follow the same procedure as reducing an array to reduced row echelon form. 0 -3 -9 5 3 Adding 1 times row 2 to row 1. -3 . we obtain [~o ~ ~3 ~ ~ ~l. we obtain [o~ ~3 ~3 ~3 ~ ~l' 0 -3 -9 5 3 Multiplying row 1 by 3. 0 -3 -9 5 3 MUltiplying row 1 by 113.3 times row 1 to row 2. we obtain 3 [ o 0 -15 10 6] 3 -3 -3 -3 1 O.

It follows that the right hand half of the array represents the inverse A-I. we obtain 11231000 0 0 0 -1 -2 1 0 0 0 0 1 0 0 3 0 0 0 0 0 1 0 0 0 1 Adding 1 times row 2 to row 4.63 Matrics 0 0 -3 0 -3 0 6 ~31 2 -4 0 0 -3 -9 5 Multiplying row 2 by -113. we obtain . Hence -3 2 A-I = -2 4/3 : l- 3 -5/3 -1 Example. -5/3 -1 Note now that the array is in reduced row echelon form. 0 0 -3 -9 5 3 Multiplying row 3 by -113. we obtain 1 0 0 -3 1 0 -2 [ o 0 1 3 1 2 4/3 o 1 1. Consider the matrix 1 1 2 3 A= 2 2 4 5 0 3 0 0 0 0 0 1 To find A-I. we obtain 1 0 0 -3 2 1 0 1 0 -2 4/3 1 . Adding -2 times row 1 to row 2. and that the left hand half is the identity matrix 13 . we consider the array 1 1 2 3 0 0 0 2 2 4 5 0 1 0 0 (AI 14)= 0 3 0 0 0 0 1 0 0 0 0 0 0 0 We now perform elementary row operations on this array and try to reduce the left hand half to the matrix 14.

For those who remain unconvinced. Adding 3 times row 3 to row 1. we obtain 1 1 2 0 -5 3 0 0 o 3 0 0 o o 1 0 o 0 0 -1 -2 o 0 o 0 0 0 -2 1 0 Multiplying row 1 by 6 (here we want obtain 6 6 12 0 -30 18 o 3 0 0 0 o o 0 0 -1 0 o to avoid fractions in the next two steps).64 Matrics 1 1 2 o o o 3 100 0 0 0 -1 -2 1 0 0 3 0 0 o 0 1 0 0 0 0 -2 1 0 1 Interchanging rows 2 and 3. we obtain . we 0 1 0 0 0 -1 o 0 0 0 -2 1 0 Adding . let us continue. we obtain 6 6 12 0 0 3 0 -15 03000010 o 0 0 -1 o 0 0 -1 o 0 0 0 -2 1 0 1 Adding -2 times row 2 to row 1.15 times row 4 to row 1. we obtain 1 2 0 -5 3 0 0 03000010 o 0 0 -1 -2 0 0 o 0 0 0 -2 0 1 Adding -1 times row 4 to row 3. we observe that it is impossible to reduce the left hand half of the array to 14 . we obtain 1 2 3 000 03000010 o 0 0 -1 -2 1 0 0 o 0 0 0 -2 1 0 1 At this point.

E. In fact. EIA implies that A = E1-1 . Suppose that an n x n matrix B can be obtained from an n x n matrix A by a finite sequence of elementary row operations. the matrix A can be obtained from the matrix B by a finite sequence of elementary row operations.IB.. and that the left hand half is not the identity matrix 14 . (2) adding a multiple of one row to another row. .. if we have multiplied any row by a non-zero constant e.. For (1).. For (2). multiplying row 2 by 113. For (3). multiplying row 3 by -1 and we obtain 0 0 1/12 -1/3 -5/2 0 113 0 0 0 0 0 0 1 0 0 0 1 0 -1/2 0 0 0 0 1 -112 Note now that the array is in reduced row echelon form. In this section. Proof Let us consider elementary row operations. the matrix A is not invertible.. It follows that if A is row equivalent to B.65 Matries 6 0 12 0 0 3 0 0 0 0 0 0 0 0 Multiplying row 1 by multiplying row 4 by -112. Recall that these are: (1) interchanging two rows. we interchange the two rows again.. E1A. Remark. and (3) mUltiplying one row by a non-zero constant. Criteria for Invertibility Examples.e times row i to row j. The following result gives conditions equivalent to the invertibility of an n x n matrixA. Note that B = Ek . then we can reverse this by adding . Then since these elementary row operations can be reversed. if we have originally added e times row i to row j.. Definition. then B is row equivalent to A.ite number of elementary n x n matrices EI . Every elementary matrix is invertible.. These elementary row operations can clearly be reversed by elementary row operations. Note now that each elementary matrix is obtained from In by an elementary row operation. The inverse of this elementary matrix is clearly the elementary matrix obtained from In by the elementary row operation that reverses the original elementary row operation. 1 0 2 0 1 0 0 3 -2 -15 0 0 0 1 -1 0 0 0 -1 1 0 -2 1 0 116.. Our first step here is the following simple observation. we can reverse this by multiplying the same row by the constant lie. Ek such that B = Ek . We usually say that A and B are row equivalent. Our technique has failed.. Proposition. we shall obtain some partial answers to this question. An n x n matrix A is said to be row equivalent to an n x n matrix B ifthere exist a fin.

.66 Matrics Proposition. . E~ are all invertible.. By Proposition.xn=O. so that .. . Then A is invertible.. . (c) Suppose that the matrices A and In are row equivalent. where xl' . . . Then the system Ax equations has only the trivial solution. then it can be reduced by elementary row operations to the system : x1=O. .. It follows that the trivial solution is the only solution. . . Proof (a) Suppose that we have Xo is a solution of the system Ax = O. Then there exist elementary nn matrices E l . A = ·. the matrices E l .... Suppose that a~ I a~n 1 . Xn are variables.. ° Xo = Irfo = (A-IA)x o = A-l(Axo) = A:! = o. (a) Suppose that the matrix A is invertible. This is eqQ..ivalent to saying that the array a~1 al n 0 [ 0 can be reduced by elementary row operations to the reduced row echelon form ani ann l . Ek such that In = E k. ... = 0 of linear (b) Suppose that the system Ax = 0 of linear equations has only the trivial solution. . [ ani ann and that are n x 1 matrices. Then the matrices A and In are row equivalent. (c) Suppose that the matrices A and In are row equivalent. Then since A is invertible. (b) Note that if the system Ax = 0 of linear equations has only the trivial solution.. EIA.. r!1 Hence the matrices A and In are row equivalent. .

. Consider the system Ax x~ are n []:] and = b. Ek is a product of invertible matrices.Matrics 67 I I I A = E-I I ···Ek l n = EI . Xn Ax = A(A_I b) = (A-I b) = (AA-1)b = Inb = b. Suppose further that the matrix A is invertible. We have proved the following important result. where b~[ll are variables and b l .. and is therefore itself invertible.fo= (A-IA)x o = A-I(Axo) = A-lb. let Xo be any solution of the system. It follows that the system has unique solution. Proposition. so that Xo = l. Suppose that . Then Axo = b. Proposition.. We next attempt to study the question in the opposite direction. b n E lR are arbitrary. Since A is invertible let us consider X = A-lb. where x I' bn . is invertible. . . bn E lR are arbitrary. where Xl' . Suppose that a~ I .. Consequences of Invertibility Suppose that the matrix A = 1 all··· a. anI ann and that x= [~Il and b =[~I xn are n x I matrices.. Consider so that x = A-I b is a solution of the system. ...... On the other hand.. .. .n anI ann is invertible. Then the system Ax = b of linear equations has the unique solution x = A-I b. xn are variables and bl' . A=: [ a~n 1 :. Clearly x 1 matrices.

. ..I] . bn E lR. Now let XI 1 0 = 1. for every j entry 0 elsewhere.n] xnl xnn denote respectively solutions of the systems of linear equations Ax = b l . . A[a.- 0 0 0 0 .1 ani so that A is invertible.. in other words.. Xn are variables. . . In the notation of Proposition.xn =[x.....68 Matrics and that are n x 1 matrices. Ax = bn It is easy to check that A( xl' . xn ) = ( b l . where xl' ...• . bn).b2 = b - o . . Then the matrix A is invertible. . the system Ax = b of linear equations is soluble.. .. .. the following four statements are equivalent: (a) The matrix A is invertible. Proof Suppose that 1 . bn = 0 0 0 In other words. bj is an n x 1 matrix with entry 1 on row and = [X.. . Suppose further that for every b I .. .. We can now summarize Propositions as follows. . .. n. Proposition.

Note that the column sum ClO + . n. n. each of the n sectors requires material from some or all of the sectors to produce its output... .69 Matrics (b) The system Ax = 0 of linear equations has only the trivial solution. and let d. n. the entry cilx\ + . ..... Collecting together xi and di for i = 1. the vector Cl i C J E~n = C nj is known as the unit consumption vector of sector j. j = 1. Consider the mat~ix product For every i = 1.. let cij denote the monetary value of the output of sector i needed by sector j to produce one unit of monetary value of output. Cd) The system Ax = b of linear equations is soluble for every n x 1 matrix b. . let xi denote the monetary value of the total output of sector i over a fixed period. + C <1 J nJ in order to ensure that sector j does not make a loss. n.. n. For every j = 1.. we describe briey the Leontief input-output model.. On the other hand. This leads to the production equation x= Cx+ d Here Cx represents the part ofthe total output that is required by the various sectors of . where an economy is divided into n sectors.nXn represents the monetary value of the output of sector i needed by all the sectors to produce their output.. (c) The matrices A and In are row equivalent. we obtain the vectors x=[~ll E~n andd= ~llE~' dn Xn known respectively as the production vector and demand vector of the economy. + c.. . we obtain the matrix 0 cll C=(cl·· ·cn ) = : [ cnl is known as the consumption matrix of the economy.. . . Application to Economics In this section. . Collecting together the unit consumption vectors.. For i. denote the output of sector i needed to satisfy outside demand over the same xed period. For every i = 1..

)d in total. Now it is not dicuIt to check that for every positive integer k.70 Matrics the economy to produce the output in the first place.. + Ck. 50 and 20... To produce this extra C2d. .and the production vector x = (I . Then the production vector and demand vector are respectively while the consumption matrix is given by . we need C(C2d) = C3d as input.C)(I + C + C2 + C3 + . we need Cd as input. If the entries of Ck + I are all very small. Their dependence on each other is summarized in the table below: To produce one unit of monetary value of output in sector 2 3 0:3 0:2 0:1 monetary value of output required from sector 2 0:4 0:5 0:2 monetary value of output required from sector 3 0: 1 0: 1 0:3 monetary value of output required from sector 1 Suppose that the fit.lal demand from sectors 1. so that (I . Suppose that the entries of the consumption matrix C and the demand vector d are non-negative. This gives a practical way of approximating (I . Suppose further that the inequality (5) holds for each column of C... and d represents the part of the total output that is available to satisfy outside demand. we have demand d. And so on. Let us indulge in some heuristics.. To produce d. Then the inverse matrix (/ . we need C(Cd) = c'2d as input.. Clearly (/ . Example. = (/+ C+ C2 + C3 + .C) .2 and 3 are respectively 30.C)-I d has non-negative entries and is the unique solution of the production equation (6).C)-I"" I + C + C2 + C3 + . we have .C)-I.C)-I exists. Initially. If the matrix (/ ..C)-I = / + C + C2 + C3 + .. and lalso suggests that (/ .C)(/ + C + C2 + C3 + . An economy consists of three sectors..d is invertible. We state without proof the following fundamental result. then (I . then represents the perfect production level. + C") = I .. + C") "" I. Proposition. To produce this extra Cd. Hence we need to produce d+ Cd+ C2d+ C3d+ . .C)x = d.Ck + I..{f.

'] = A (x~] + A (x. to the nearest integers. simply observe that A = ( X. Matrix Transformation on the Plane Let A be a 2 x 2 matrix with real entries. so that I .5 [ 0.1] 0.7 .2 -0.C = -0.1 0.5 -0. J= eA (Xl] .1 -0. To see this.2 50.71 Matrics 0.4 0. A matrix transformation T:]R 2 --+ ]R 2 can be dened as follows: For every x = (XI' x 2 ) E]R.2.1 -0.7 20 -1 -1 7 200 and which can be converted to reduced row echelon form ( 0 0 0 3200/27 o 0 0 6100/27 o This gives xl ~ 0 1 70019 119. The matrix .5 -0.2 .4 0.1 0. 0. I t t -4 5 -2 500 eqmva en 0 0.'] x~ + x.2 -0.2 C = 0.1 30 [ 7 -2 -1 300 -0. x 2 ~ 226 and x3 ~ 78.4 0.3 0.1] [0. where satifies Such a transformation is linear. eX2 x2 and A (ex l ) Here we conne ourselves to looking at a few simple matrix transformations on the plane. x~ x.1 -0. we write T(x) = y.7 -0. . + x. Example.3 -0.1 0. The production equation (I . in the sense that x' E]R2 and T(ex) = eT(x) for every X E]R2 T(x' + x1 = T(x1 + T(x1 for every x' and every C E ]R .7 -0.C) x = d has augmented matrix 0.

We give a matrix [~ ~ll [ ~1 {Yl =-x1 [ ~1 Y2 = -x2 {Yl =x1 Y2 =x2 = ~l ~ll [~ ~l Example. On the other hand.JR . k < 1.JR 2 . the matrix satisfies for every (x I' x 2) E. and so represents reflection across the x I-axis.JR 2 .JR 2 . On the other hand. whereas the matrix for every (xI' x 2) E.:H ~l ~J[::H~:ll for every (x I' x 2) E. and so represents reflection across the x2-axis.axis {Yl =-x1 Y2 = -x2 Y2 =x2 Reflection across origin Reflection acrossxI = x2 x 2. The matrix 2 for every (x I' x 2 ) E. the matrix • .axis {Yl =x1 Reflection acrossx2 . whereas the matrix A=( ~l ~l satisfies A[. and so represents reection across the origin. Let k be a fixed positive real number. and so represents reection across the line xI summary in the table below: Transformation Equations Reflection acrossx1 . and so represents dilation if k > 1 and contraction if 0 < 1.JR 2 .72 Matrics for every (x I' X2) E.

For the case k = 1. 2. and so represents an expansionn in the xI -direction if k > 1and compression in the x I-direction if 0 < k < 1. and so represents expansion in the x 2-direction if k > 1 and compression in the x 2-direction if 0 < k < 1. 2 . Let k be a fixed real number. and so represents a shear in the xI-direction. whereas the matrix for every (xI' x 2) E IR. We give summary in the table below: Transformation Dilation or contract ion by factor k > 0 Equations matrix {YI = Axl [~ ~l [~ ~l [~ ~l = kx2 {YI ~ Axl Y2 Expansion or compression in xI -direction by factor k > 0 Y2 =x2 Expansion or compression in x2 -direction by factor k > 0 {YI ~xl Y2 = kx2 Example. we have the following: T (k= I) ~ . The matrix for every (xI' x 2) E IR. 2 .73 Matrics for every (XI' X2) E IR.

x YI Matrix 2 sin e = x] sin e + x2 cos e [ cose -Sine] sine cose We conclude this section by establishing the following result which reinforces the linearity of matrix transformations on the plane.74 Matries For the case k = . .direction Y2 =kxl +x2 Example. we have the following: T (k=-J) • Similarly. and so represents a shear in the x2-direction.direction {Y2 ='1 +kx2 [~ ~] [~ ~] Y2 =X2 {Yl =xl Shear in xl . x 2) = (Y]'Y2)' where Y] + iY2 = (x] + ix2)(cos e + i sin e ). We give a summary in the table below: Transformation Equations Matrix Shear in XI . For anticlockwise rotation by an angle e.sin e1 cose . sme cose Y2 It follows that the matrix in question is given by A = [ cos e sine . and so YI ] = [ Y2 [c~se -Sine][ YI ]. we have T(x]. cose . the matrix A=[~ ~] satisfies for every (xl' x 2) E IR?.1. We give a summary in the table below: Equations Transformation Anticlockwise rotation bylangle e {Yl = x.

the 12 . note that straight lines through the origin correspond to y = O. Proof Suppose that T(x l . where x = [ ~) and y =[ . we have x = A-I y . To prove (c).75 Matries Proposition. ---? JR2 is given by an (b) The image under T of a straight line through the origin is a straight line through the origin. . Then (a) The image under T of a straight line is a straight line.. the image under T of the straight line a~l +~~ =yis aYI + WY2 =y. Application to Computer Graphics Example. To prove (b). note that parallel straight lines correspond to different values of y for the same values of a and ~. Hence (a.. clearly another straight line. This proves (a). Suppose that a matrix transformation T: JR2 invertible matrix A.J The equation of a straight line is given by ax l + ~x2 = yor. by (a~) = [:: )<1).'W) * (a~) A-I. In other words. x 2) = (Yl'Y2)' Since A is invertible. in matrix form. Consider the letter M in the diagram below: ~ Following the boundary in the anticlockwise direction starting at the origin. and (c) The images under T of parallel straight lines are parallel straight lines.

(I:) (~].!. using the matrix A=[: H representing a shear in the xl-direction with factor 0:5. (~]. (~]. Then the images of the 12 vertices are respectively (~]. (:]. [I:]. (~]. Let us apply a matrix transformation to these vertices. X2] X: for every (x l .x2)EIR 2. (:]. (~]. [I~]. 006060088288 In view of Proposition. (:]. noting that (: t][~ = [0 I 1 1 4 7 7 8 8 7 4 1 0] 060 6 008 8 288 4 4 10 7 8 12 11 5 5 4]. (~]. (~].. Hence the image of the letter M under the shear looks like the following: . (~].. (!]. (!].[:]. the image of any line segment that joins two vertices is a line segment that joins the images of the two vertices. [:]. so that Xl] A ( x2 = (Xl +.76 Matrics vertices can be represented by the coordinates (~]. [~l' [~]. (:]. (~].
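The computation above is easy to reproduce numerically. The following Python/NumPy sketch (an addition to the text) applies the 2 x 2 shear matrix with factor 0.5 to all 12 vertices at once, with one vertex per column of the coordinate array; the numbers match the array displayed above.

import numpy as np

# Shear in the x1-direction with factor 0.5.
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])

# The 12 vertices of the letter M, one vertex per column.
vertices = np.array([[0, 1, 1, 4, 7, 7, 8, 8, 7, 4, 1, 0],
                     [0, 0, 6, 0, 6, 0, 0, 8, 8, 2, 8, 8]], dtype=float)

images = A @ vertices
print(images)
# first row:  [0. 1. 4. 4. 10. 7. 8. 12. 11. 5. 5. 4.]
# second row: [0. 0. 6. 0.  6. 0. 0.  8.  8. 2. 8. 8.]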

:]A [::]=[::: ::][::]. x 2) ]. the image of the point (Xl' x 2' 1) is now (Yl'Y2' 1).h2) is of the form (. 0 1 Remark. we introduce homogeneous coordinates.x2) E]R1. so we attempt to find a 3 x 3 matrix A * such that [ ~ ~~ ] = A *[ ~1 for every (x l' x. we may wish to translate this image. 1 2 It fo Hows that using homogeneous coordinates. translation is transformation by vector Ene h=(h l . and this cannot be described by a matrix transformation on the plane. Then [. To overcome this deficiency. Consider a matrix transformation T :]R2 ~]R2 on the plane given by a matrix A=[a a l1 12 a2l a22 Suppose that T(x l . However.)E R'. translation by vector h = (h l' h2) E ]R . can be described by the matrix A*=[~o ~ ~l.77 Matrics Next.: H:: H~ 1 for every (x" x. = Under homogeneous coordinates.) E]R2. Now we wish to translate a point (Xl' X 2) to (Xl' X 2) + (hI' h2) = (Xl + hI' x 2 + h2). It is easy to check that [ :~:~l= ~ 1 ~ ~] . = (YI' Y2). For every point (xl' X2) (Xl' x 2)E]R2 we identify it with the point (Xl' x 2' 1) E]R3 .:] 0 0 1 for every (x l . Note that .

The letter M. 7 7 8 8 7 4 111 1 1 I 1 0] 1 1 Then the 2 x 2 matrix . put in an array in the form [ 0 1 1 4 o 0 6 0 6 0 0 8 8 2 8 8 . It follows that homogeneous coordinates can also be used to study all the matrix transformations we have discussed. the 12 vertices are now represented by homogeneous coordinates. [1 ~l2 A= o 1 is now replaced by the 3 2 A* 0 o x 3 matrix o 0 . Example.78 Matrics YI] [all al2 0) XI] [~2 = a~1 a~2 ~J 7. By moving over to homogeneous coordinates. we simply replace the 2 x 2 matrix A by the 3 x 3 matrix A*=[~ ~J. 0 Note that 0 .t 1 -4 7 7 8 8 7 4 11 A* 0" 0 6 0 6 0 0 8 8 2 8 [1 1 1 1 1 - O~] 1 1 0 1 2 o [0 1 I 4 7 7 8 8 7 4 1 00060600882 8 8 o 0 1 0] 11111111111 1 1 .

111111111111 Next. The matrix under homogeneous coordinates for this translation is given by B* ~ ~ ~].Matrics 79 o 1 4 4 10 7 8 12 11 5 5 4] =006060088288. o 0 1 0 1 4 Note that 4 7 7 8 8 7 4 0 B*A* 0 0 6 0 6 0 0 8 8 2 8 8 1 1 1 =( ~ 0 =[~ 3 6 6 12 9 10 14 13 7 3 9 3 1 0 1 4 m 5 5 8 12 11 6 0 6 0 0 8 8 2 111 1 1 1 1 1 1 0 1 1 1 4 10 7 1 7 939 11 11 5 11 1 '1 1 1 1 1 1 l~l J. 3). 8 ~l giving rise to coordinates in IR 2 . let us consider a translation by the vector (2. displayed as an array 2 [3 3 6 6 12 9 10 14 13 7 7 3 9 3 9 3 3 11 11 5 11 6]' 11 Hence the image of the letter M under the shear followed by translation looks like the following: .

for example. followed by a shear by factor 2 in the xI-direction. and the array now looks like . One way of solving the system Ax = b is to write down the augmented matrix a. This requires n + I operations. ann (II) Next. where A is an n x n invertible matrix. we now multiply the first row by . This requires n + 1 operations. . the transformation representing a reflection across the xI-axis. and followed by translation by vector (2. [anI a. we may need to interchange two rows in order to ensure that the top left entry in the array is non-zero.1 . We are interested in the number of operations required to solve such a system. adding or mUltiplying two real numbers.n ~Il' bn and then convert it to reduced row echelon form by elementary row operations. n.. . followed by anticlockwise rotation by 90°..80 Matrics Example.ail and then add to row i. we need to multiply the new first row by a constant in order to make the top left pivot entry equal to I. and the array now looks like anI a n2 ann bn Note that we are abusing notation somewhat. (III) For each row i = 2.I)(n + 1) operations. as the entry a l2 here.. Under homogeneous coordinates.. By an operation. The first step is to reduce it to row echelon form: (I) First of all. has matrix [~ ~ ~Il[~ ~I ~ [~ ~ ~l[~ ~I ~l ~2 ~Il' = [: 00 100100100 100 I Complexity of a Non-Homogeneous System Consider the problem of solving a system of linear equations of the form Ax = b. This requires 2(n . may well be different from the entry a l2 in the augmented matrix. -I). . we mean interchanging.

This 1 to reduced row echelon form by elementary row operations.ts at most 2(n + 1) + 2(11 . and conclude that the number of operations required to convert the augmented matrix (II) to row echelon form is at most n2 2 L2m(m+ 1) ~ _n 3 • m=1 3 The next step is to convert the row echelon form to reduced row echelon form. (V) Our nelt task is to convert the smaller array t [a~2 . 2 3 . In any case.Matrics 81 (IV) In sumJary. .1.. by something like n2 if one analyzes the problem more carefully. ani ann . This is simpler. so this is less ecient than our first method. It can be shown that the number of operations required is bounded by something like 2n 2 indeed. a n2 a~n ~21' ann bn to an array that looks like These have one row and one column fewer than the arrays (II) and (III).. estlmate"3 n earlier.1)(n + 1) = 2n(n + 1) . where m = n . as many entries are now zero. It can be shown that the number of operations required is something like 2n 3 . the number of operations require~. Ax We therefore conclude that the number of operations required to solve the system by reducing the augmented matrix to reduced row echelon form is bounded by =b 2 something like 3" n3 when n is large. Another way of solving the system Ax may involve converting the array = b is to first nd the inverse matrix A-I. We continue in this way systematically to reach row echelon form. these estimates are insignicant compared to the . to proceed from the form II to the form III. and the number of operations required is at most 2m(m + 1).

then there is no need to multiply any row of the array by a non-zero constant. except that the pivot entries do not have to be equal to 1. However.. Definition. Proof Recall that applying an elementary row operation to an m x n matrix corresponds to mUltiplying the matrix on the left by an elementary m x m matrix. then U = Ek. Then A=LU. . . where the elementary matrices E 1.. . . with the same coefficient matrix A but for many different vectors b... and it is easy to see that the corresponding elementary matrix is lower triangular. . It can be shown that products and inverses of unit lower triangular matrices are also unit lower triangular. we first need a deffinition.. with diagonal entries all equal to 1. To describe this. (2) All zero rows are grouped together at the bottom of the array. In other words. Proposition. Suppose that an m x n matrix A can be converted to quasi row echelon form by elementary row operations but without interchanging any two rows. we describe a way for solving this problem in a more efficient way. If an m x n matrix A can be reduced in this way to quasi row echelon form U. E 2E 1)-I. In fact. and we may have to convert the augmented matrix to reduced row echelon form. If A is an invertible square matrix. we may need to solve systems of linear equations of the form Ax = b.. L = (Ek' E2E1A. where L is an m x m lower triangular matrix with diagonal entries all equal to 1 and U is a quasi row echelon form of A. Let us call such elementary matrices unit lower triangular. It is not necessary for its value to be equal to 1.E2. Let . the array looks like row echelon form in shape. Ek are all unit lower triangular. On the other hand. it is sucient even to restrict this to adding a mUltiple of a row higher in the array to another row lower in the array. the matrix A may not be a square matrix. Hence the only elementary row operation we need to perform is to add a mUltiple of one row to another row. In this section. if we are aiming for quasi row echelon form and not row echelon form. We consider first of all a special case.82 Matrics Matrix Factorization In some situations. then we can and its inverse A-I and then compute A-1b for each vector b. A rectangular array of numbers is said to be in quasi row echelon form if the following conditions are satised: (1) The left-most non-zero entry of any non-zero row is called a pivot entry.. Then A = LU. (3) The pivot entry of a non-zero row occurring lower in the array is to the right of the pivot entry of a non-zero row occurring higher in the array. Hence L is a unit lower triangular matrix as required.

the entry 3 in row 2 and column 2 is a pivot entry. Adding -2 times row 1 to row 2. we obtain 2 -1 2 -2 3 0 3 2 -1 2 0 -9 -6 10 -8 0 -12 -8 18 -8 Note that the same three elementary row operations convert 1 0 0 0 2 1 0 0 1 * 1 * * 0 0 0 0 to 0 0 0 0 0 * 1 * * 0 1 Next. ••• . and adding -1 times row 1 to row 4. However. Example. we have Ly = band Ux = y. adding -1 times row 1 to row 3. where U = Ek . then L(Ux) = b. E2EIL This means that the very elementary row operations that convert A to U will convert L to /. E2Eltl. It remains to nd L. Ifwe reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a multiple of a row higher in the array to another row lower in the array. and column 1 is a pivot column. Consider the Matrix 2 A= -1 3 -2 2 4 6 -5 8 2 -10 -4 8 -5 2 -13 -6 16 -5 The entry 2 in row 1 and column 1 is a pivot entry. It is simplest to illustrate the technique by an example. and column 2 is a pivot . Both of these systems are easy to solve since both Land U have many zero entries. note that L = (Ek . Writing y= Ux. E2E 1A. We therefore wish to create a matrix L such that this is satised. It remains to and Land U. and so 1= Ek. ••• . ••• . It follows that the problem of solving the system Ax = b corresponds to first solving the system Ly = b and then solving the system Ux = y. then U can be taken as the quasi row echelon form resulting from this." 83 Matrics If Ax = b and A = L U.

the entry 7 in row 3 and column 4 is a pivot entry. Adding 3 times row 2 to row 3. The pivot columns at the time of establishing the pivot entries are respectively . we obtain 2 -1 0 3 2 -2 2 -1 0 0 0 7 3 2 -2 0 0 0 14 0 Note that the same two elementary row operations convert 1 0 0 0 0 1 0 0 0 -3 1 0 0 -4 * 0 0 to 0 0 1 0 0 0 0 1 0 0 0 * Next. and adding 4 times row 2 to row 4. and column 4 is a pivot column. and column 5 is a pivot column. Example. we obtain the quasi row echelon form 2 -1 2 -2 0 3 2 -1 u= 0 0 0 7 0 0 0 0 3 2 -2' 4 where the entry 4 in row 4 and column 5 is a pivot entry. 1 -4 The strategy is now clear. The lower triangular entries of L are formed by these columns with each column divided by the value of the pivot entry in that column. Note that the same elementary row operation converts 1 0 0 0 0 0 0 0 0 1 0 0 0 2 1 0 0 0 to 0 1 0 0 0 0 1 0 0 0 0 Now observe that if we take L= 2 0 0 1 0 0 1 -3 0 o' 2 1 then L can be converted to 14 by the same elementary operations that convert A to U. Every time we nd a new pivot. we note its value and the entries below it. Adding -2 times row 3 to row 4.84 Matrics column. Let us examine our last example again.

(2) Record any new pivot column at the time of its first recognition.Matrics 85 2 * 4 3 2 ' -9 -12 2 * * *' * * 7 . we obtain respectively the columns 2 * * 1 * l ' -3' -4 * l' *' 2 Note that the lower triangular entries of the matrix L= 1 0 0 0 2 1 0 0 -3 1 0 -4 2 correspond precisely to the entries in these columns. and modify it by replacing any entry above the pivot entry by zero and dividing every other entry by the value of the pivot entry. (1) Reduce the matrix A to quasi row echelon form by only performing the elementary row operation of adding a mUltiple of a row higher in the array to another row lower in the array. 7 and 4. with modied version A= -5 . LU FACTORIZATION ALGORITHM. Example. We wish to solve the system of linear equations Ax = b. where 3 -1 -3 3 -4 2 -4 1 1 5 -2 -2 andb= 11 -10 6 9 6 -6 8 -21 13 -9 -15 Let us first apply LV factorization to the matrix A. 4 14 Dividing them respectively by the pivot entries 2. Let V be the quasi row echelon form obtained. The first pivot column is column 1. (3) Let L denote the square matrix obtained by letting the columns be the pivot columns as modied in step (2). 3.

2 times row 1 to row 3. we obtain the quasi row echelon form 3 -1 2 -4 -1 0 2 -3 0 0 4 -1 3 0 0 0 0 2 The last pivot column is column 5. with modied version o 1 -1 3 Adding row 2 to row 3. we obtain 3 -1 2 -4 1 -1 0 2 -3 0 0 4 -1 3 -8 2 -4 0 0 The third pivot column is column 3.86 Matries -1 2 -2 Adding row 1 to row 2. adding . and adding -3 times row 2 to row 4. and adding 2 times row 1 to row 4. with modied version . with modied version o o -2 Adding 2 times row 3 to row 4. we obtain 3 -1 0 2 0 -2 0 6 2 -4 -3 -1 7 -2 4 -17 5 -7 The second pivot column is column 2.

Using row 4. Using row 2. with augmented matrix 3 -1 2 -4 1 1 o 2 -3 1 -1-1 o 0 4 -1 3 6 o 0 0 0 2 2 Here the free variable is x 4.2Yl + 3Y2 . so that Y3 = 6. 1 Hence Y= -1 6 2 We next consider the system Ux =Y. we obtain 2x5 = 2. so that x5 = 1. we 4 obtain 2x2 = -1 + 3x3 -x4 + x5 9 1 Xs = ---t.87 Matrics 0 0 0 1 It follows that 3 -1 2 -4 -1 0 0 0 2 -3 1 -1 and u= L= 4 -1 3 0 0 2 -1 1 0 0 0 2 -2 3 -2 0 0 0 0 0 We now consider the system Ly 1 0 -1 2 -1 -2 3 = b. 3 1 Usmg row 3. so thatY2 =-1. we obtainY2 . we obtam 4x3 = 6 + x 4 -3x5 = 3 + t. with augmented matrix 0 0 1 o 0-2 1 0 9 -2 1 -15 Using row 1. we obtainYl = 1. Let x 4 = t. Using row 4. we obtain Y4 .2Y3 = -15. .Yl = -2. . we obtain Y3 + 2Yl . Using row 2. so that Y4 = 2. so that x3 = + 4t . 4 4 . Using row 3.Y2 = 9.

.3. Application to Games of Strategy Consider a game with two players... 2 (2) Computing an LU factorization of an n x n matrix takes approximately 3 n3 operations. has n possible moves. .3. let aij denote the payo that player C has to make to player R if player R makes move i and player C makes move j.. . has m possible moves. player C makes movej with probability qj" Then PI + . . .2. Assume that the players make moves independently of each other.g/. . the number Pjqj represents the probability that player R makes move i and player C makes move j. . + qn = 1. . andj = 1. in which case the matrices Land U may also have many zero entries. usually known as the row player...: [ ami The entries can be positive. player R makes move i with probability Pi' and that for every j = 1.. (3) LU factorization is particularly ecient when the matrix A has many zero entries.x5 27 3 = Sl .where t E R.2.. and j = 1... usually known as the column player. . 1 gt . m.88 Matrics so that x 2 that XI = 9 9 1 = g. we obtain 3x I = l+x2 .g' Using row 1. m. .2.. n. Solving the systems Ly = band Ux = y requires approximately 2n2 operations. Then the double sum m n EA(p. Then for every i = 1. 3. denoted by j = 1. . . Player R..3. Remarks. . n.. . . denoted by i = 1. n.2x3+4x4 .. (1) In practical situations.qj . while player C. m.g' so Hence _ (9/-1 9-1 t 3 +t 11) (xl' x 2' X3' X4' Xs) 8 ' 8 ' 4 " ..3.. 3. . 2. These numbers give rise to the payo matrix all A -- .q) = LLaijp.=1 j=1 represents the expected payo that player C has to make to player R. 2. 2.3. Suppose that for every i = 1... but which can be made unit lower triangular by interchanging rows. + Pm = 1 and qI + . 3.2. interchanging rows is usually necessary to convert -a matrix A to quasi row echelon form. m. . The matrices .2. For every i = 1. n... The technique here can be modied to produce a matrix L which is not unit lower triangular. negative or zero.

. The strategy p is known as an optimal strategy for player R. a~n][ ~I . Remark. However. q*) is known as the value of the game. the teachers require 100 students to each choose between rowing (R) and cricket (C). o o where the I 's occur in position i in p* and positionj in q*. It is very easy to show that different saddle points in the payo matrix have the same value. 0 I 0 .. q*) for every strategy p* of player R. and every strategy q* of player C. and the strategy q is known as an optimal strategy for player C. q*) > EA(p. q*) = EA(p**. Zero sum games which are strictly determined are very easy to analyse. . if p** and q** are another pair of optimal strategies.p. In some sports mad school. so that the value of the game is aij . 0) and q = I . Clearly the expected payo m n [ all EA(p.. . The quantity EA(p*. q)? Is it possible for player C to choose a strategy q to try to minimize the expected payo EA(P. However. The right hand side is a I x 1 matrix! We now consider the following problem: Suppose that A is xed.q) = B~aijPiq/PI'''Pm): . Optimal strategies are not necessarily unique. are optimal strategies. then EA(P*.. ]_ ..Aq. q) > EA(p*. the strategies o - o p* = (0 . Here the payo matrix A contains saddle points.89 Matrics P ~ (PI··· Pm ) and q = [ ::J are known as the strategies of player R and player C respectively. Example. q**). In this case. q)? Fundemental Theorem of Zero Sum Games.. An entry aij in the payo matrix A is called a saddle point if it is a least entry in its row and a greatest entry in its column. amI'" amn qn Here we have slightly abused notation. There exist strategies p* and q* such that EA(p*. the students cannot make up their mind. Is it possible for player R to choose a strategy p to try to maximize the expected payo Eip. Remark.

if coaches R2 and CI are hired. C3 and C4 denote the 4 possible cricket coaches: CI C2 C3 RI 75 50 45 R2 20 60 30 R3 45 70 35 C5 60 55 30 [For example. where RI. and so 80 students will choose cricket.q which is independent of q.!.a 12 .-==. 20 -15 -20 [For example.a l2 . and CI. However.ql) + a 21 (1.a 21 + a22 E A(P . The number of students who will choose rowing ahead of cricket in each scenario is as follows. R2 and R3 denote the 3 possible rowing coaches. then 25 is the number cricket concedes to rowing.] Here the entry -5 in row I and column 3 is a saddle point.a21 + a22 all .:. q) = allPlql + a I 2PI(1..(a 22 .~a lI a22 -a12 a21 all .-. Then the solution for these optimal problems are solved by linear programming techniques which we do not discuss here.PI)ql + a 22(1 .a 22 )Pl + Let . the top left entry denotes that if each sport starts with 50 students. Then ql) a 22 · Then * ) = (a12 -a22)(a22 -a21 ) + a22 = --!.. There are 3 possible rowing coaches and 4 possible cricket coaches the school can hire. In general.ql..!.al2 . so that the problem is not strictly determined.~ I~ -5 ].a 21 + a 22 )Pl .. Similarly. if . C2. then 20 students will choose rowing. we can write P2 = I .PI)(1 = ((all . saddle points may not exist.PI and q2 = I Eip.a 21 ))ql + (a 12 . so the optimal strategy for rowing is to use coach RI and the optimal strategy for cricket is to use coach C3.90 Matrics and will only decide when the identities of the rowing coach and cricket coach are known.] We first reset the problem by subtracting 50 from each entry and create a payo matrix A =[~:o I~ .. in the case of 2 x 2 payo matrices which do not contain saddle points.

1) form an orthonormal basis B = {up u2}. which is independent of p.a2l +a22 all-a12 . A square matrix A with real entries and satisfying the condition A-I = At is called an orthogonal matrix.a12 .q *) -- alla22 . Hence Eip* q) = Eip*. Then C = {vI' v2 } is also an orthonormal basis. .a12 a 2l al1 .a12 1 all -a12 . Consider the euclidean space ~2 with the euclidean inner product. 0) and u2 = (0. Let us now rotate u l and u2 anti clockwise by an angle to obtain vI = (cose sin e) and v2 = (-sine.a 2l + a 22 . The vectors u l = (1. Example. q*) for all strategiesp and q: Note that * [ P = a22 -a2l all . cose ).91 Matrics then EA ( p. q*) = Eip.a2l +a22 and with value ORTHOGONAL MATRICES Definition.

Ii t _ : -. Example.rn It follows that AAt = I if and only if for every i. ... So are the column vectors of A. 2/3. 113) are orthonormal. . namely (113. The matrix A= ( 113 -2/3 2/3 -113 2/3 ) -2/3 2/3 2/3 1/3 is orthogonal.2/3). Proof We shall only prove (a)...... since the proof of (b) is almost identical. Proposition. .. vn } are two orthonormal bases of a real inner product space V. In fact. and (b) A is orthogonal if and only if the column vectors of A form an orthonormal basis of 1R n under the euclidean inner product. Let r.-113. Then AA Ii.-2/3.. since At A = ( 113 -2/3 2/3 -2/3 2/3) (113 -2/3 2/3) (1 0 0 ) -113 2/3 2/3 -1/3 -2/3 = 0 1 0 .r. ( rn . Then (a) A is orthogonal if and only if the row vectors of A form an orthonormal basis of 1R n under the euclidean inner product. (2/3. j = 1. rn . Suppose that A is an n x n matrix with real entries. Proposition. . -2/3 113 2/3 2/3 1/3 0 0 1 Note also that the row vectors of A. Suppoiie that B = {up'" un} and C = {vI' . . Then the transition matrix P from the basis C to the basis B is an orthogonal matrix.. n.:rn ) . .. rn denote the row vectors of A. we have . Ii . smS cosS Clearly 1 p -I =p t = [cos sin S . -sinS cosS In fact. our example is a special case of the following general result. .-2/3) and (2/3.92 Matries The transition matrix from the basis C to the basis B is given by p -sinS 1 = ([VdB[V2 ln = [ COS . .. our last observation is not a coincidence.

+ ~nun = Ylvi +.. we have (c) For every Proof «a)) :::} (b)) Suppose that A is orthogonal. so that AlA = 1. Then II u 112 = (u. u) = (~Iul + . v for every u. v. It follows that for every x E lR n.Av=-IIAu+Avll --IIAu-Avll =-IIA(u+v)1I --IIA(u-vll 4 4 4 4 1 2 1 2 2 =-lIu+vll --llu-vll =u. so that (AlA -I)u ... + ~nun' ~Iul + . In particular. v E lRn. this holds when v = (AlA -l)u. v=Au. . we have Au . 21 21 21 Au. (b) For every x E lRn. Proof For every UE V. so that (AlA -l)u . Ax = xlAIAx = xlIx = XIX «b)) :::} (c)) Suppose that II Ax we have 1 = X . + ynvn' where ~I'····'~n' YI'····· YnE lR. vn} are two orthonormal bases of V. } { 0 ifi::/:. /..93 Matrics if i == j.. Suppose that A is an n x n matrix with real entries. v = o... (AlA -I) u = 0. u.. X = II X 112. .Av=v'AIAu=AIAu. . II = II x II for every x E lR n • Then for every u. Av = u .v. rn are orthonormal.. Proposition...... Then the following are equivalent: (a) A is orthogonal. we can write u = ~Iul + . v E lR n . and so AlA = 1. we have II Ax 112 = Ax ..(i) But then (1) is a system ofn homogeneous linear equations in n unknowns satised by" n every u E lR • Hence the coefficient matrix AI A -I must be the zero matrix. II Ax II = II x II. r· r· I if and only if r I' .. v E lRn. un} and C= {vi' . Suppose further that the inner product in lR n is the euclidean inner product. Then Iu. whence (AlA -I)u = 0.j. 4 4 Suppose that Au. . v=u. and where B = {u l ' ..... v.. + ~nun) . Av = u.

AI)v = O} . On the other hand.AI)v = O. 'YIVI + . The polynomial det(A .. Since vERn is non-zero. we must have det . and forms a subspace of R n • This space (iii) is .')J. Hence /I Px II = II x II holds for every x E R .94 Matrics Similarly...AI) = O. In this case. + 'YnV) n n LL'Yi'Yj(V. (iii) is the nullspace of the matrix A . so that (A . In other words. + 'YnVn ..AI) is called the characteristic polynomial of the matrix A.Vj ) = L'Y~ = 1=1 j=1 1=1 It follows that in lR n with the euclidean norm. Suppose further that there exist a number A E 1R and a non-zero vector vERn such that Av = v. Solving this equation (2) gives the eigenvalues of the matrix A. for any eigenvalue of the matrix A. Then we say that A is an eigenvalue of the matrix A. It now follows from Proposition that P is orthogonal. it follows that we must have det(A . and so /I P[u]c /I = II [u]c /I n for every u E V. the set { v E lR n : (A .. IIu /1 2 = (U. and that v is an eigenvector corresponding to the eigenvalue A .. we have Av = AV = ')Jv. n U) = ('YIVI +.(ii) =0 ani a n2 ann - A Note that (ii) is a polynomial equation.. where I is the n x n identity matrix. Eigenvalues and Eigenvectors We give a brief review on eigenvalues and eigenvectors: Suppose that a~1 ••• A=: a~n J : ( ani ann is an n x n matrix with real entries.. we have /I [u]B /I = /I [u]c /I.


called the eigenspace corresponding to the eigenvalue λ. Suppose now that A has eigenvalues λ1, ..., λn ∈ R, not necessarily distinct, with corresponding eigenvectors v1, ..., vn ∈ R^n, and that v1, ..., vn are linearly independent. Then it can be shown that

P^{-1}AP = D,

where P = (v1 ... vn) and D = diag(λ1, ..., λn).

In fact, we say that A is diagonalizable if there exists an invertible matrix P with real
entries such that P-IAP is a diagonal matrix with real entries. It follows that A is
diagonalizable if its eigenvectors form a basis of lR n • In the opposite direction, one can
show that if A is diagonalizable, then it has n linearly independent eigenvectors in lR n • It
therefore follows that the question of diagonalizing a matrix A with real entries is reduced
to one of linear independence of its eigenvectors.
We now summarize our discussion so far.
Diagonalization Process. Suppose that A is an n x n matrix with real entries.
(1) Determine whether the n roots of the characteristic polynomial det(A - λI) are real.
(2) If not, then A is not diagonalizable. If so, then find the eigenvectors corresponding to these eigenvalues. Determine whether we can find n linearly independent eigenvectors.
(3) If not, then A is not diagonalizable. If so, then write
P = (v1 ... vn) and D = diag(λ1, ..., λn),
where λ1, ..., λn ∈ R are the eigenvalues of A and where v1, ..., vn ∈ R^n are respectively their corresponding eigenvectors. Then P^{-1}AP = D.
In particular, it can be shown that if A has distinct eigenvalues λ1, ..., λn ∈ R, with corresponding eigenvectors v1, ..., vn ∈ R^n, then v1, ..., vn are linearly independent. It follows that all such matrices A are diagonalizable.
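As a computational illustration (an addition to the text), the sketch below uses Python/NumPy to carry out this process for an arbitrary 3 x 3 matrix with distinct real eigenvalues; the matrix is chosen only for illustration and does not come from the book.

import numpy as np

# A non-symmetric matrix whose eigenvalues 4, 3, 2 are visibly distinct and real.
A = np.array([[4.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 2.0]])

# Steps (1)-(2): np.linalg.eig returns the eigenvalues and, as the columns
# of P, the corresponding eigenvectors.
eigenvalues, P = np.linalg.eig(A)

# Step (3): with n linearly independent eigenvectors, P is invertible and
# P^{-1} A P is the diagonal matrix of eigenvalues.
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))                        # approximately diag(eigenvalues)
print(np.allclose(D, np.diag(eigenvalues)))   # True

If the eigenvectors were linearly dependent, P would be singular and the final check could not even be carried out; this is exactly the dichotomy in steps (2) and (3).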

Orthonormal Diagonalization
We now consider the euclidean space R^n as an inner product space with the euclidean inner product. Given any n x n matrix A with real entries, we wish to find out whether there exists an orthonormal basis of R^n consisting of eigenvectors of A. Recall that in the Diagonalization process discussed in the last section, the columns of the matrix P are eigenvectors of A, and these vectors form a basis of R^n. It follows from the earlier Proposition that this basis is orthonormal if and only if the matrix P is orthogonal.
Definition. An n x n matrix A with real entries is said to be orthogonally diagonalizable if
there exists an orthogonal matrix P with real entries such that P^{-1}AP = P^tAP is a diagonal
matrix with real entries. First of all, we would like to determine which matrices are
orthogonally diagonalizable. For those that are, we then need to discuss how we may find
an orthogonal matrix P to carry out the diagonalization. To study the first question, we
have the following result which gives a restriction on those matrices that are orthogonally
diagonalizable.
Proposition. Suppose that A is an orthogonally diagonalizable matrix with real entries.
Then A is symmetric.
Proof. Suppose that A is orthogonally diagonalizable. Then there exist an orthogonal
matrix P and a diagonal matrix D, both with real entries, such that P^tAP = D. Since PP^t
= P^tP = I and D^t = D, we have
A = PDP^t = PD^tP^t,
so that
A^t = (PD^tP^t)^t = (P^t)^t(D^t)^tP^t = PDP^t = A,
whence A is symmetric. Our first question is in fact answered by the following result
which we state without proof.
Proposition. Suppose that A is an n x n matrix with real entries. Then it is orthogonally
diagonalizable if and only if it is symmetric. The remainder of this section is devoted to
finding a way to orthogonally diagonalize a symmetric matrix with real entries. We begin
by stating without proof the following result. The proof requires results from the theory of
complex vector spaces.
Proposition. Suppose that A is a symmetric matrix with real entries. Then all the
eigenvalues of A are real.
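This fact is easy to observe numerically. The following small check (an addition to the text, not a substitute for the proof) symmetrizes a random matrix and inspects its eigenvalues.

import numpy as np

# Symmetrize a random matrix and inspect its eigenvalues: the imaginary
# parts vanish, as the Proposition asserts.
B = np.random.rand(4, 4)
A = (B + B.T) / 2                                    # A is symmetric
print(np.linalg.eigvals(A))                          # all real (up to rounding)
print(np.allclose(np.linalg.eigvals(A).imag, 0.0))   # True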
Our idea here is to follow the Diagonalization process discussed in the last section,
knowing that since A is diagonalizable, we shall find a basis of ~n consisting of
eigenvectors of A. We may then wish to orthogonalize this basis by the Gram-Schmidt
process. This last step is considerably simplified in view of the following result.
Proposition. Suppose that u1 and u2 are eigenvectors of a symmetric matrix A with
real entries, corresponding to distinct eigenvalues λ1 and λ2 respectively. Then u1 · u2 = 0. In
other words, eigenvectors of a symmetric real matrix corresponding to distinct eigenvalues
are orthogonal.
Proof. Note that if we write u1 and u2 as column matrices, then since A is symmetric,
we have
Au1 · u2 = u1 · Au2.
It follows that
λ1u1 · u2 = Au1 · u2 = u1 · Au2 = u1 · λ2u2,
so that (λ1 - λ2)(u1 · u2) = 0. Since λ1 ≠ λ2, we must have u1 · u2 = 0.
We can now follow the procedure below.
Orthogonal Diagonalization Process. Suppose that A is a symmetric n x n matrix
with real entries.
(1) Determine the n real roots λ1, ..., λn of the characteristic polynomial det(A - λI),
and find n linearly independent eigenvectors u1, ..., un of A corresponding to these
eigenvalues as in the Diagonalization process.
(2) Apply the Gram-Schmidt orthogonalization process to the eigenvectors u1, ..., un
to obtain orthogonal eigenvectors v1, ..., vn of A, noting that eigenvectors
corresponding to distinct eigenvalues are already orthogonal.
(3) Normalize the orthogonal eigenvectors v1, ..., vn to obtain orthonormal eigenvectors
w1, ..., wn of A. These form an orthonormal basis of R^n. Furthermore, write
P = (w1 ... wn) and D = diag(λ1, ..., λn),
where λ1, ..., λn ∈ R are the eigenvalues of A and where w1, ..., wn ∈ R^n are
respectively their orthogonalized and normalized eigenvectors. Then P^tAP = D.
Remark. Note that if we apply the Gram-Schmidt orthogonalization process to
eigenvectors corresponding to the same eigenvalue, then the new vectors that result from
this process are also eigenvectors corresponding to this eigenvalue. Why?
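For symmetric matrices, NumPy's eigh routine performs exactly this kind of orthogonal diagonalization in one call. The sketch below (an addition to the text) applies it to the matrix of the example that follows.

import numpy as np

# Symmetric matrix of the next example in the text.
A = np.array([[2.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 2.0]])

# eigh is designed for symmetric (Hermitian) matrices: it returns real
# eigenvalues in ascending order and orthonormal eigenvectors as columns of P.
eigenvalues, P = np.linalg.eigh(A)
print(eigenvalues)                       # approximately [1., 1., 7.]

# P is orthogonal (P^t P = I), and P^t A P is diagonal.
print(np.allclose(P.T @ P, np.eye(3)))   # True
print(np.round(P.T @ A @ P, 10))         # approximately diag(1, 1, 7)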
Example. Consider the matrix

A = ( 2  2  1 )
    ( 2  5  2 )
    ( 1  2  2 ).

To find the eigenvalues of A, we need to find the roots of

det ( 2-λ   2     1   )
    ( 2     5-λ   2   ) = 0;
    ( 1     2     2-λ )

in other words, (λ - 7)(λ - 1)^2 = 0. The eigenvalues are therefore λ1 = 7 and (double root) λ2 = λ3 = 1.
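A quick numerical confirmation of this factorization (again an addition to the text): NumPy can recover the coefficients of the characteristic polynomial and its roots directly.

import numpy as np

A = np.array([[2.0, 2.0, 1.0],
              [2.0, 5.0, 2.0],
              [1.0, 2.0, 2.0]])

# Coefficients of det(xI - A), highest power first.
coeffs = np.poly(A)
print(coeffs)             # [1., -9., 15., -7.], i.e. (x - 7)(x - 1)^2 expanded
print(np.roots(coeffs))   # approximately [7., 1., 1.]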

An eigenvector corresponding to λ1 = 7 is a solution of the system


(A - 7I)u = ( -5   2   1 )
            (  2  -2   2 ) u = 0, with root u1 = (1, 2, 1).
            (  1   2  -5 )

Eigenvectors corresponding to λ2 = λ3 = 1 are solutions of the system

(A-7I)U=U

~ ~}=

0 wiili root .,

=UJ

Md ",

=Ul J

which are linearly independent. Next, we apply the Gram-Schmidt orthogonalization
process to u2 and u3, and obtain

which are now orthogonal to each other. Note that we do not have to do anything to
u1 at this stage, in view of the Proposition above. We now conclude that

form an orthogonal basis of R^3. Normalizing each of these, we obtain respectively

w1 = (1/√6, 2/√6, 1/√6),   w2 = (1/√2, 0, -1/√2),   w3 = (1/√3, -1/√3, 1/√3).

We now take

Then

1116 2/16

p= pI

=

1IJi
[
1IJ3

Example. Consider the matrix

0

-1IJ3

~:::n] /AP~(~ ~
and

1IJ3

0 0


A=[~1 -~3 ~~2J.

o -9 20
To find the eigenvalues of A, we need to find the roots of
6
_13 _A

det(-I~A

~102 Jo;

o
-9
20-A
in other words, (λ + 1)(λ - 2)(λ - 5) = 0. The eigenvalues are therefore
λ1 = -1, λ2 = 2 and λ3 = 5.
An eigenvector corresponding to λ1 = -1 is a solution of the system

(A+/)U=(~ -~2 ~~2Ju=0'
o

-9

21

An eigenvector corresponding to ~

with root

-9

An eigenvector corresponding to A3

(A+5/)u =

-9

with root

u3

=(~5J.

15

-3

= 5 is a solution of the system

(~6 -~8 ~~2 Ju =
o

=(~J.

0
= 2 is a solution of the system

(A+5/)U=[~6 -~8 ~~2Ju=0'
o

ul

0, with root

15

u3 =

(~5 J.
-3

Note that while u1, u2, u3 correspond to distinct eigenvalues of A, they are not
orthogonal. The matrix A is not symmetric, and so the earlier Proposition does not apply in this
case.
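The failure of orthogonality for non-symmetric matrices is also easy to observe numerically. The sketch below (an addition to the text) uses an arbitrary non-symmetric matrix with distinct real eigenvalues, not the partly illegible matrix of the example above, and prints the pairwise dot products of its eigenvectors.

import numpy as np

# An arbitrary non-symmetric matrix with distinct real eigenvalues 3, 2, 1
# (chosen only for illustration).
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 4.0],
              [0.0, 0.0, 1.0]])

eigenvalues, V = np.linalg.eig(A)
print(eigenvalues)                 # 3, 2, 1 -- distinct and real

# Pairwise dot products of the eigenvectors: in general non-zero, so
# eigenvectors for distinct eigenvalues need not be orthogonal.
for i in range(3):
    for j in range(i + 1, 3):
        print(i, j, V[:, i] @ V[:, j])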
Example. Consider the matrix

A = (  5  -2   0 )
    ( -2   6   2 )
    (  0   2   7 ).

To find the eigenvalues of A, we need to find the roots of

det (  5-λ  -2    0   )
    ( -2    6-λ   2   ) = 0;
    (  0    2     7-λ )

in other words, (λ - 3)(λ - 6)(λ - 9) = 0. The eigenvalues are therefore λ1 = 3, λ2 = 6 and λ

3 = 9. Normalizing each of these vectors.1 (A-3J)u =(~2 ~2 ~)u = o 2 = 3 is a solution of the system 0. 2/3 Then p-t 2/3 -1/3 ( -113 2/3 = pi = 2/3 2/3 -113) 2/3 2/3 (3 and pIAP= 0 0 . 2/3 We now take P=(wt w2 2/3 2/3 -1/3) w3)= 2/3 -113 2/3. with root u3= ( o 2 -2 (A-9J)u= [-1) 2 .. ( -113 2/3 . so it follows from Proposition that up u2.w = [2/3) wt 2/3 -113. u3 are orthogonal. wiili root ~ =( ~1 )- An eigenvector corresponding to 3 = 9 is a solution of the system -2 0) -4 -2 -3 2 u=O. we obtain respectively 2/3) .100 Matrics and 1. -1 An eigenvector corresponding to ~ = 6 is a solution of the system (A-6nu =( ~~ ~2 ~} = 0. An eigenvector corresponding 1. with root Ut =( 4 ~ ).. 2 f-113 2/3 w3 (-1/3) 2/3 . 2 Note now that the eigenvalues are distinct. so we do not have to apply Step (2) of the Orthogonal diagonalization process.

Chapter 4

Determinants

Introduction
The reader probably already met determinants in calculus or algebra, at least the determinants of 2 x 2 and 3 x 3 matrices. For a 2 x 2 matrix the determinant is simply ad - bc, and the determinant of a 3 x 3 matrix can be found by the "Star of David" rule. In this chapter we would like to introduce determinants for n x n matrices.
I don't want just to give a formal definition. First I want to give some motivation, then derive some properties the determinant should have, then show that if we want to have these properties we do not have any choice, and arrive at several equivalent definitions of the determinant.
It is more convenient to start not with the determinant of a matrix, but with the determinant of a system of vectors. There is no real difference here, since we can always join the vectors together (say as columns) to form a matrix. For a system of vectors (columns) v1, v2, ..., vn we will denote its determinant (that we are going to construct) as D(v1, v2, ..., vn).
So, let us introduce some notation. Let us have n vectors v1, v2, ..., vn in R^n (notice that the number of vectors coincides with the dimension), and we want to find the n-dimensional volume of the parallelepiped determined by these vectors. But what is the n-dimensional volume? In dimension 1 it is just the length, if n = 2 it is area, and if n = 3 it is indeed the volume.
The parallelepiped determined by the vectors v1, v2, ..., vn can be defined as the collection of all vectors v in R^n that can be represented as
v = t1v1 + t2v2 + ... + tnvn,  0 <= tk <= 1 for all k = 1, 2, ..., n.
It can be easily visualized when n = 2 (parallelogram) and n = 3 (parallelepiped). If we

for a matrix an. then the height (i... v2. . vn)) is multiplied by a.2 aI.. . the above two properties say that the determinant of n vectors is linear in each argument (vector). there is nothing special about vector vI' so for any index k D(VI. then this property holds for all scalars a...o. .e. ... v2. vn k ) + D( VI'"'' Vb"'.uk +vk. vn ) of the system vI' V2.Vk'·"'Vn ) = aD(v I .!1 .e.l aI..... . If we admit negative heights (and negative volumes). . then height is the distance from this vector to the subspace spanned by the remaining vectors. . . the distance to the linear span £ (v2' . and the volume (determinant) is uniquely defined.. . that D(vI"". WHAT PROPERTIES DETERMINANT SHOULD HAVE We know.. and the base is the (n . vn) Also. vn ) k k To get the next property.. v2. .. ·. vn) = aD(v I. Linearity in Each Argument First of all..v~ . vn). vn should satisfy D(av I. and so the determinant D(vl'v2. that if we assume that the base and the height satisfy some natural properties. if we multiply vector vI by a positive number a.. let us notice that if we add 2 vectors. then we will use the notation detA. Of course..n a2) a2'f a2. . that for dimensions 2 and 3 "volume" of a parallelepiped is determined by the base times height rule: if we pick one vector.1 vectors and interpret the remaining vector as a variable (argument). . .1) dimensional. then we do not have any choice. .. Now let us generalize this idea to higher dimensions. we get a linear function.} an'2 an. . k ' = D( VI''''Ub' .Volume of the parallelepiped determined by the remaining vectors.n its determinant is often is denoted by al. . i. We will show. meaning that if we fix n . For a moment we do not care about how exactly to determine height and base.Determinants 102 join these vectors in a matrix A (column number k of a is vk ). .''''vn) . then the "height" of the result should be equal the sum of the "heights" of summands. . vn ) k In other words. detA = D(v\..

vk. .. applying property three times. . if we take a vector.. .. the "height" does not change. . by admitting negative heights.'''''Vn ) j = = k +".Vk':'" vb'''' Vn ) ' k = D[V\. "Vj ..' . .Vj""Vk.Vn ] .'Vn1 D[V\""'V~""':k ) k -Vj.. is Functions of several variables that change sign D[V\>'''' v~.. Vj. .. let us notice that the second part of linearity is not independent: it can be deduced from properties.. In fact. we even gained something..:k -Vj. because the sign of the determinant contains some information about the system of vectors (orientation).."'~J +(Vj -Vj):···'~k ~VJ. . but it can be deduced from the previous ones. .Vj.. and then using we get D(v\. . Antisymmetry The next property the determinant should have..Vn 1= -D(v\. Namely...Vj"".. . if we apply the column operation of the third type. . Vb'''' Vn ) = k j = D(V1"". .. Namely.'''' Vn ] j j k In other words. Preservation Under "Column Replacement" The next property also seems natural.. Remark. v j . that helps in many situations. the determinant does not change. admitting negative heights (and therefore negative volumes) is a very small price to pay to get linearity. } k J j k At first sight this property does not look natural. So..""Vn ). Although it is not essential here. + o. . . so D(v\. . since we can always put on the absolute value afterwards.Determinants 103 Remark. . we did not sacrifice anything! To the contrary. say vp and add to it a multiple of another vector vk . We already know that linearity is a very nice property..

uk .104 Determinants =n['1. 3. . . vk. then detA = o.e. then det A = 0. ~.. Normalization property: det 1= 1. 4. in vector notation for every index k D(v!. . After all we have to be sure that the object we are computing and studying exists. i. we derive other properties of the determinant. Basic Properties.e. some of them highly nontrivial. Properties of Determinant Deduced from the Basic Properties 1. 3.··· vn ) k k for all scalars a. We will show that the determinant. . detA = o. e2. if one interchanges two columns.~Uk :-f3vk. Constructing the Determinant The plan of the game is now as follows: using the properties that the determinant should have. 2. If columns of a are linearly dependent.Vt. . These three properties completely define determinant. .. n Normalization The last property is the easiest one.vr. . If one column of a is a multiple of another. then detA = 0. Determinant is antisymmetric. .... If a has two equal columns.e. that we did not use property: it can be deduced from the above three. . en) In matrix notation this can be written as det(l) = 1 = 1.e... The first propertyis just the combined. We will use the following basic properties of the determinant: 1. For the D(e!.D(v!. . . The second one and the last one is the normalization property.. if the matrix is not invertible... the determinant changes sign. i. We will show how to use these properties to compute the determinant using our old friend-row reduction. For a square matrix a the following statements hold: If a has a zero column.. Note. vn ) + f3D(v!. Proposition. . .V 1 ..Vy-. i. 2... Determina~t is linear in each column. i.v J = n -+.. ...-Vt ··. a fu'nction with the desired properties exists and unique.···'Vn ) = k o. then .

det A. so D(vl""'Vk "". the determinant is preserved under "column replacement" (column operation of third type). In particular.l = D [[t. n VI = <x2v2 + <X3 v3 + . . implies statement 2. k k so the determinant in this case is also O. Interchanging this vector with VI we arrive to the situation we just treated. we should get O. Proof Fix a vector vk .. u = L>~"jVj' j"".'''' vn ) + D(vI. k k . vn is linearly dependent." . "'tVt]. which is possible only if det A = O. v" .. As we already have said above.+ <Xnvn = Lakvk' k=2 Then by linearity we have (in vector notation) D(v. say vk can be represented as a linear combination ofthe others. Proposition. let us assume that the system vI' v2. we are using in this section. so det4 = . vn ) = D(vl"'" '-k D(vl'''' vk and by Proposition the last term is zero. vn ) = -D(vk""'V)'''''vn ) = -0 = 0.k Then by linearity + U. vk..105 Determinants Proof Statement 1 follows immediately from linearity. this property can be deduced from the three "basic" properties of the determinant. vn ). But by the property 1 above. v3"'" vn ) k=2 and each determinant in the sum is zero because of two equal columns. . Then one of the vectors. v.. . Indeed...u... we change nothing. let us first suppose that the first vector VI is a linear combination of the other vectors.. i. On the other hand.e. so the determinant remains the same. interchanging two columns changes sign of determinant.. v3'"'' v. 1 n = L:>~'kD(vk' v2. and let u be a linear combination of the other vectors. Statement 3 is immediate corollary of statement 2 and linearity. . The determinant does not change if we add to a column a linear combination of the other columns (leaving the other columns intact). Let us now consider general case... To prove the last statement... . we do not change the matrix and its determinant. . v. Ifwe multiply the zero column by zero. The next proposition generalizes property. if we interchange two equal columns. The fact that determinant is antisymmetric. .

It is easy to see that ° ~--------------------------------------~ Determinant of a triangular matrix equals to the product of the diagonal entries.. 2.. first subtract appropriate multiples of the last column from columns number n .... Indeed.1.. . A square matrix A = {a j 'k } ~.j=l is called diagonal if all entries off the main diagonal are zero.e. a 2. 3.e. using their properties: one just need to do column reduction (i. an' The next important class is the class of so-called triangular matrices.. i.det(diag{a\.. n. i. .. . k} ~. an}) = a\a2 . We call a matrix triangular. n. . a2' .e. i. . i. .. an' n. then subtract appropriate mUltiples of the second column from columns number 3. Let us recall that a square matrix A = {aj. .. "killing" all entries in the first row.e if aj'k = for all} < k. an} for the diagonal matrix [?° ~° . it is not invertible (this can easily be checked by column operations) and therefore both sides equal zero. .j=l ° is called upper triangular if all entries below the main diagonal are 0. . if it is either lower or upper triangular matrix.... an Since a diagonal matrix diag{a\..e" if a"k = for all k <i.. if aj •k ° = for all} ::j:: k. The first class is the so-called diagonal matrices. row reduction for AT) keeping track of column operations . We will often use the notation diag{a\.\a2'2 . Determinant of a diagonal matrix equal the product of the diagonal entries. . g]. detA = a\. To treat the case of lower triangular matrices one has to do "column reduction" from the left to the right..Determinants 106 Determinants of Diagonal and Triangular Matrices Now we are ready to compute determinant for some important special classes of matrices. an} can be obtained from the identity matrix I by multiplying column number k by ak. a 2. If all diagonal entries are non-zero. 1. and so on. and so on. A square matrix is called lower triangular if all entries above the mai~ are 0. then using column replacement (column operations of third type) one can transform the matrix into a diagonal one with the same diagonal entries: For upper triangular matrix one should first subtract appropriate multiples of the first column from the columns number 2. . if a triangular matrix has zero on the main diagonal.. Computing the Determinant Now we know how to compute determinants... .

So we only need to keep track of the interchanging of columns and of the multiplication of a column by a scalar. If A is invertible, we arrive at a triangular matrix, and det A is the product of its diagonal entries times the correction coming from the column interchanges and multiplications. If an echelon form of A^T does not have pivots in every column (and row), then A is not invertible, so det A = 0.

Note that although we now know how to compute determinants, the determinant is still not defined. One can ask: why don't we define it as the result we get from the above algorithm? The problem is that formally this result is not well defined: we did not prove that different sequences of column operations yield the same answer.

The above algorithm implies that det A can be zero only if the matrix A is not invertible. Combining this with the last statement of the proposition above we get the following.

Proposition. det A = 0 if and only if A is not invertible. An equivalent statement: det A != 0 if and only if A is invertible.

Determinants of Transpose and Product

In this section we prove two important theorems.

Theorem (Determinant of a transpose). For a square matrix A, det A = det(A^T).

Theorem (Determinant of a product). For n x n matrices A and B,
    det(AB) = (det A)(det B).
In other words, the determinant of a product equals the product of the determinants.

The first theorem implies that every statement about columns has a corresponding true statement about rows. In particular, determinants behave under row operations the same way they behave under column operations, so we can use row operations to compute determinants. In particular, the most often used operation, row replacement (the operation of the third type), does not change the determinant.

To prove both theorems we need the following lemma.

Lemma. For a square matrix A and an elementary matrix E (of the same size),
    det(AE) = (det A)(det E).

Proof. The proof can be done just by direct checking: right multiplication by an elementary matrix is a column operation, the effect of column operations on the determinant is well known, and determinants of elementary matrices are easy to compute. Namely, for a column operation the corresponding elementary matrix can be obtained from the identity matrix I by this column operation, so its determinant is 1 (the determinant of I) times the effect of the column operation. That the determinants of elementary matrices agree with the corresponding column operations can look like a lucky coincidence, but it is not a coincidence at all. It may be hard to realise at first, but the above paragraph is a complete and rigorous proof of the lemma. Applying the lemma N times we get the following corollary.
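Both theorems are easy to test numerically before proving them. The following sketch is mine, not from the book; it assumes NumPy and uses random 5 x 5 matrices.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((5, 5))
    B = rng.standard_normal((5, 5))

    print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))        # True
    print(np.isclose(np.linalg.det(A @ B),
                     np.linalg.det(A) * np.linalg.det(B)))         # True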

Corollary. For any matrix A and any sequence of elementary matrices E_1, E_2, ..., E_N (all matrices are n x n),
    det(A E_1 E_2 ... E_N) = (det A)(det E_1)(det E_2) ... (det E_N).

Lemma. Any invertible matrix is a product of elementary matrices.

Proof. We know that any invertible matrix is row equivalent to the identity matrix, which is its reduced echelon form. So
    I = E_N E_{N-1} ... E_2 E_1 A,
and therefore
    A = E_1^{-1} E_2^{-1} ... E_{N-1}^{-1} E_N^{-1}
(the inverse of an elementary matrix is an elementary matrix), and therefore any invertible matrix can be represented as a product of elementary matrices.

Proof of the theorem on the determinant of a transpose. First of all, it can easily be checked that for an elementary matrix E we have det E = det(E^T). Let us also say that it is sufficient to prove the theorem only for invertible matrices A: if A is not invertible then A^T is also not invertible, both determinants are zero, and the theorem just says that 0 = 0. So let A be invertible. By the Lemma, A can be represented as a product of elementary matrices, A = E_1 E_2 ... E_N, and by the Corollary the determinant of A is the product of the determinants of these elementary matrices. Since taking the transpose just transposes each elementary matrix and reverses their order, the Corollary implies that det A = det(A^T).

Proof of the theorem on the determinant of a product. Let us first suppose that the matrix B is invertible. Then the Lemma implies that B can be represented as a product of elementary matrices, B = E_1 E_2 ... E_N, and so by the Corollary
    det(AB) = (det A)[(det E_1)(det E_2) ... (det E_N)] = (det A)(det B).
If B is not invertible, then the product AB is also not invertible, and both determinants are zero, so the theorem holds in this case as well. To check that the product AB = C is not invertible, let us assume that it is invertible. Then multiplying the identity AB = C by C^{-1} from the left, we get C^{-1}AB = I, so C^{-1}A is a left inverse of B. So B is left invertible, and since it is square, it is invertible. We got a contradiction.

Properties of Determinant

First of all, let us say once more that the determinant is defined only for square matrices. Since we now know that det A = det(A^T), the statements that we knew about columns are true for rows too.
1. The determinant is linear in each row (column) when the other rows (columns) are fixed.
2. If one interchanges two rows (columns) of a matrix A, the determinant changes sign.

3. det A does not change if we add to a row (column) a linear combination of the other rows (columns), i.e. the determinant is preserved under the row (column) replacement, the row (column) operation of the third kind.
4. If a matrix A has a zero row (or column), then det A = 0.
5. If a matrix A has two equal rows (columns), then det A = 0.
6. If one of the rows (columns) of A is a linear combination of the other rows (columns), i.e. if the matrix is not invertible, then det A = 0.
7. det A = 0 if and only if A is not invertible, or equivalently, det A != 0 if and only if A is invertible.
8. For a triangular (in particular, for a diagonal) matrix, the determinant is the product of the diagonal entries.
9. det I = 1.
10. det A^T = det A.
11. det(AB) = (det A)(det B).
12. If A is an n x n matrix, then det(aA) = a^n det A. This follows from the linearity of the determinant, if we recall that to multiply a matrix A by a we have to multiply each of its n rows by a, and that each such multiplication multiplies the determinant by a.

Existence and Uniqueness of the Determinant

In this section we arrive at the formal definition of the determinant. We show that a function satisfying the basic properties 1, 2, 3 of Section 3 exists, and moreover, that such a function is unique, i.e. that we do not have any choice in constructing the determinant.

Consider an n x n matrix A = {a_{j,k}}_{j,k=1}^{n}, and let v_1, v_2, ..., v_n be its columns, so that
    v_k = a_{1,k} e_1 + a_{2,k} e_2 + ... + a_{n,k} e_n = sum_{j=1}^{n} a_{j,k} e_j.
Using linearity of the determinant we expand it in the first column:
    D(v_1, v_2, ..., v_n) = sum_{j=1}^{n} a_{j,1} D(e_j, v_2, ..., v_n).
Then we expand it in the second column, then in the third, and so on. We get
    D(v_1, v_2, ..., v_n) = sum_{j_1=1}^{n} sum_{j_2=1}^{n} ... sum_{j_n=1}^{n} a_{j_1,1} a_{j_2,2} ... a_{j_n,n} D(e_{j_1}, e_{j_2}, ..., e_{j_n}).

Notice that we have to use a different index of summation for each column: we call them j_1, j_2, ..., j_n; the index j_1 here plays the role of the index j above. The result is a huge sum: it contains n^n terms. Fortunately, some of the terms are zero. Namely, if any two of the indices j_1, j_2, ..., j_n coincide, the determinant D(e_{j_1}, e_{j_2}, ..., e_{j_n}) is zero, because there are two equal columns here. So let us rewrite the sum, omitting all zero terms. The most convenient way to do that is to use the notion of a permutation.

A permutation of an ordered set {1, 2, ..., n} is a rearrangement of its elements. A convenient way to represent a permutation is by a function sigma: {1, 2, ..., n} -> {1, 2, ..., n}, where sigma(1), sigma(2), ..., sigma(n) gives the new order of the set 1, 2, ..., n. In other words, the permutation rearranges the ordered set 1, 2, ..., n into sigma(1), sigma(2), ..., sigma(n). Such a function has to be one-to-one (different values for different arguments) and onto (it assumes all possible values from the target space). Functions that are one-to-one and onto are called bijections; they give a one-to-one correspondence between the two sets. Although it is not directly relevant here, let us notice that it is well known in combinatorics that the number of different permutations of the set {1, 2, ..., n} is exactly n!. The set of all permutations of {1, 2, ..., n} will be denoted Perm(n).

Using the notion of a permutation, and omitting the zero terms, we can rewrite the determinant as
    D(v_1, v_2, ..., v_n) = sum over sigma in Perm(n) of a_{sigma(1),1} a_{sigma(2),2} ... a_{sigma(n),n} D(e_{sigma(1)}, e_{sigma(2)}, ..., e_{sigma(n)}).
The matrix with columns e_{sigma(1)}, e_{sigma(2)}, ..., e_{sigma(n)} can be obtained from the identity matrix by finitely many column interchanges, so the determinant D(e_{sigma(1)}, e_{sigma(2)}, ..., e_{sigma(n)}) is 1 or -1, depending on the number of column interchanges. To formalize this, we define the sign of a permutation (denoted sign(sigma)) to be 1 if an even number of interchanges is necessary to rearrange the n-tuple 1, 2, ..., n into sigma(1), sigma(2), ..., sigma(n), and sign(sigma) = -1 if the number of interchanges is odd.

We have to show that the sign of a permutation is well defined: although there are infinitely many ways to get the n-tuple sigma(1), sigma(2), ..., sigma(n) from 1, 2, ..., n, the number of interchanges is either always even or always odd. One way to show this is to count the number K of pairs j, k with j < k such that sigma(j) > sigma(k). If sigma(k) = k for all k, then the number of such pairs is 0, so the signum of the identity permutation is 1. We call the permutation odd if K is odd and even if K is even, and we define the signum of sigma to be (-1)^K. We want to show that signum and sign coincide. First notice that any elementary transpose, i.e. an interchange of two neighbours, changes the signum of a permutation, because it changes (increases or decreases) the number of such pairs exactly by 1.
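The sum over permutations can be evaluated literally for small n. The sketch below is my own illustration (assuming NumPy and the standard library's itertools; the function names are made up): it computes the signum by counting inversions, forms the permutation sum, and compares the result with a library determinant.

    import numpy as np
    from itertools import permutations

    def sign(sigma):
        # (-1)**K, where K is the number of pairs j < k with sigma(j) > sigma(k).
        K = sum(1 for j in range(len(sigma))
                  for k in range(j + 1, len(sigma)) if sigma[j] > sigma[k])
        return -1 if K % 2 else 1

    def det_by_permutations(A):
        # det A = sum over all permutations sigma of
        #         sign(sigma) * a_{sigma(1),1} * a_{sigma(2),2} * ... * a_{sigma(n),n}
        n = A.shape[0]
        return sum(sign(p) * np.prod([A[p[k], k] for k in range(n)])
                   for p in permutations(range(n)))

    A = np.array([[1., 2., 0.], [3., -1., 4.], [2., 5., 1.]])
    print(det_by_permutations(A), np.linalg.det(A))   # both are -11 (up to rounding)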

So.2(-I)j+2 detAj . If we define the determinant this way. Finally. So. COFACTOR EXPANSION For an n x n matrix A = {aj.2. l (_I)J+n detA.l. Theorem. for each k. .k(-I) j+k det Aj. Since det A = det AT. column expansion follows automatically. . to get from 1.d~..k' k=1 Similarly.2···. Finally.n sign(a). 1 column number k.k' k=1 Proof Let us first prove the formula for the expansion in row number 1. Let us first consider a special case. determinant ofA can be expanded in the row number j as _ '+1 detA .. Interchanging two columns of a just adds an extra interchange to the perturbation. . n}. For each j. and odd number of interchanges is needed to get an odd permutation (negative signum). detA = 2: aEPenn(n) where the sum is taken over all permutations of the set {I. if we want determinant to satisfy basic properties 1-3 from Section 3. n we transform a to the lower triangular form.l (-I)J detAj. .. . any interchange of two entries can be achieved by an odd number of elementary transposes.1 + aj ..111 Determinants number of elementary transposes if the permutation have the same signum.. Let A be an n x n matrix. That means signum and sign coincide. because for each column every term (product) in the sum contains exactly one entry from this column.k=1 letAj'kdenotes the (n-l) x (n-l) matrix obtained from A by crossing out row number j and column number k. and an oddnumber if the signums are different.aa(2).aj.. + a'J.n n = 2:aj. 3. it is easy to check that it satisfies the basic properties. n to an even permutation (positive signum) one always need even number of interchanges. ~ j ~ n. n detA ~ k ~ n. 2. so right side in changes sign. Indeed. and so sign is well defined. we must define it as aa(l). The determinant of A then can be computed as . the determinant can be expanded in the = 2:aj.. when the first row has one non zero term a I I' Performing column operations on columns 2.2 + . it is linear in each column.aa(n).J.k(-I) j+k det Aj. for the identitymatrix I. . the right side is 1 (it has one non-zero term). (Cofactor expansion of determinant).. This implies that signum changes under an interchange of two entries. The formula for expansion in row number k then can be obtained from it by interchanging rows number 1 and k.

we can interchange the first and second rows and apply the above formula.k is the only non-zero entry in the first row det A = (_1)1+ k aI. can be reduced to the previous one by interchanging rows 2 and 3. correcting factor from entries of the triangular x the column operations matrix But the product of all diagonal entries except the first one (i.Determinants 112 Ithe product of diagonal entries of the triangular matrix I the product of diagonal .. without at I) times the correcting factor is exactly detAI. Definition.k det A2. k ) k=1 where the matrix A(k) is obtained from A by replacing all entries in the first row except O..e. Using this notation. and/therefore in this case detA = (-1) detA I'2' The case when a1'3 is the only non-zero entry in the first row.l:)-1) I+k a2. so in this case detA = a1'3 detA1'3' Repeating this procedure we get that in the case when al.k. k are called co/actors.k = k=I 2: (-1)2+k a2.k detA3. k =(-I)l'+k detA'j. k=1 and so on. k=1 To get the cofactor expansion in the second row.. linearity of the determinant implies that n det A = det A(I) + det A(2) + . so aI'k by n det A = L (-I) l+k a I'k det A I ok. so we get n det A = .l' so in this particular case detA = al.k detAl'k. The numbers C·1. The row exchange changes the sign. To expand the determinant det A in a column one need to apply the row expansion formula for AT.k det Al k' In the general case.I' Let us now consider the case when all entries in the first row except a l .k' n k=I Exchanging rows 3 and 2 and expanding in the second row we get formula n detA = 2:(-1) 3+k a3.' This case can be reduced to the previous one by interchanging columns number 1 and 2.2 are zeroes.k det A2.I detAI. the formula for expansion of the determinant in the row number j can be rewritten as . As we just discussed above detA(k) = (_I)1+k al. + det A(n) = L:deti '.
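The cofactor (Laplace) expansion described above translates directly into a recursive procedure. The following sketch is mine, not part of the text: it assumes NumPy, expands along the first row only, and is meant purely as an illustration, since it performs on the order of n! multiplications and is practical only for very small matrices.

    import numpy as np

    def det_by_cofactors(A):
        # Expansion along the first row:
        # det A = sum_k (-1)**(1+k) * a_{1,k} * det(A_{1,k}),
        # where A_{1,k} is A with row 1 and column k crossed out.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for k in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
            total += (-1) ** k * A[0, k] * det_by_cofactors(minor)
        return total

    A = [[2., 3., 5.], [1., 4., 2.], [2., 1., 5.]]
    print(det_by_cofactors(A), np.linalg.det(np.array(A)))   # both are -2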

k + .+ a·j. Ver~ often the cofactor expansion formula is Jused as the definition of determinant.. 1+ a. However.k C2..n C. to get a.'1 + a·J.e. On the other hand. The diagonal entry number j is obtained by mUltiplyingjth row of a by jth column of a (i. 2 + . + a·J. Remark. It can only be practical to apply the cofactor expansion formula in higher dimensions if a row (or a column) has a lot of zero entries. Then A-I =_l_C T .kCj.j -:t= k.k' '-1 Remark.. It would take a computer performing a million operations per second (very slow..4 . detA Proof Let us find the product ACT. computing the determinant of an n x n matrix using row reduction requires (n 3 + 2n . by today's standards) a fraction of a second to compute the determinant of a 100 x 100 matrix by row reduction..3)/3 multiplications (and about the same number of additions). the proof of anti symmetry is easy. Although it looks very nice. J.kCj.J. the cofactor expansion formula is not suitable for computing determinant of matrices bigger than 3 x 3. J J by the cofactor expansion formula. 2C. It is not dicult to show that the quantity given by this formula satisfies the basic properties of the determinant: the normalization property is trivial.k = Laj. = a·'IC.k=1 whose' entries are c<!factors of A given matrix A is called the cofactor matrix of A.n = Laj. + a.. C. j.n J. + an'k Cn.Determinants 113 n detA = a"1 C"I J J + a' 2 c. Theorem.n C k..j" 2Ck2 + .k' k=1 Similarly.. However. and n! grows very rapidly..n = detA.. the proof of linearity is a bit tedious (although not too dicult). For example. Let a be an invertible matrix and let C be its cofactor matrix. As one can count it requires n! multiplications. the cofactor expansion formula is of great theoretical importance. cofactor expansion of a 20 x 20 matrix require 20! ~ 2. . jth row of C). To get the off diagonal terms we need to mUltiply jth row of A by kth column of CT. J. J.n . as the next section shows. 10 18 multiplications: it would take a computer performing a billion multiplications per second over 77 years to perform the multiplications. 2 + .IC k.k }~.J. Cofactor Formula for Inverse Matrix The matrix C = {Cj. so (ACT)JJ.k + a2. expansion in the row number k can be written as n det A = al'k CI.

It follows from the cofactor expansion formula (expanding in the row number k) that this expression is the determinant of the matrix obtained from A by replacing row number k by row number j (and leaving all other rows as they were). But the rows j and k of this matrix coincide, so the determinant is 0. So all off-diagonal entries of AC^T are zeroes (and all diagonal ones equal det A), thus AC^T = (det A) I. That means that the matrix (1/det A) C^T is a right inverse of A, and since A is square, it is the inverse.

Example (Inverting 2 x 2 matrices). The cofactor formula really shines when one needs to invert a 2 x 2 matrix
    A = ( a  b ; c  d ).
The cofactors are just entries (1 x 1 matrices), the cofactor matrix is
    C = ( d  -c ; -b  a ),
so the inverse matrix A^{-1} is given by the formula
    A^{-1} = (1/det A) ( d  -b ; -c  a ).

While the cofactor formula for the inverse does not look practical for dimensions higher than 3, it has a great theoretical value, as the examples below illustrate.

Some applications of the cofactor formula for the inverse.

Example (Matrix with integer inverse). Suppose that we want to construct a matrix A with integer entries such that its inverse also has integer entries (inverting such a matrix would make a nice homework problem: no messing with fractions). If det A = 1 and its entries are integer, the cofactor formula for inverses implies that A^{-1} also has integer entries. Note that it is easy to construct an integer matrix A with det A = 1: one should start with a triangular matrix with 1 on the main diagonal, and then apply several row or column replacements (operations of the third type) to make the matrix look generic.

Recalling that for an invertible matrix A the equation Ax = b has a unique solution
    x = A^{-1} b = (1/det A) C^T b,
we get the following corollary of the above theorem.

Corollary (Cramer's rule). For an invertible matrix A the entry number k of the solution of the equation Ax = b is given by the formula
    x_k = det B_k / det A,
where the matrix B_k is obtained from A by replacing column number k of A by the vector b.
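Both the cofactor formula for the inverse and Cramer's rule can be checked numerically. The sketch below is my own illustration (it assumes NumPy; the function name and the test data are made up) and simply compares both formulas against the library's inverse and solver.

    import numpy as np

    def cofactor_matrix(A):
        # C[j, k] = (-1)**(j+k) * det(A with row j and column k crossed out)
        n = A.shape[0]
        C = np.empty((n, n))
        for j in range(n):
            for k in range(n):
                minor = np.delete(np.delete(A, j, axis=0), k, axis=1)
                C[j, k] = (-1) ** (j + k) * np.linalg.det(minor)
        return C

    A = np.array([[1., 2., 1.], [0., 1., 3.], [2., 1., 0.]])
    C = cofactor_matrix(A)
    print(np.allclose(np.linalg.inv(A), C.T / np.linalg.det(A)))   # True

    # Cramer's rule: x_k = det(B_k) / det(A), B_k = A with column k replaced by b.
    b = np.array([1., 2., 3.])
    x = np.empty(3)
    for k in range(3):
        Bk = A.copy()
        Bk[:, k] = b
        x[k] = np.linalg.det(Bk) / np.linalg.det(A)
    print(np.allclose(x, np.linalg.solve(A, b)))                   # True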

Example (Inverse of a polynomial matrix). Another example is to consider a polynomial matrix A(x), i.e. a matrix whose entries are not numbers but polynomials a_{j,k}(x) of the variable x. If det A(x) is identically 1, then the inverse matrix A^{-1}(x) is also a polynomial matrix. If det A(x) = p(x) is not identically zero, then it follows from the cofactor expansion that p(x) is a polynomial, so A^{-1}(x) has rational entries; moreover, p(x) is a multiple of each denominator.

Minors and Rank

For a matrix A let us consider its k x k submatrices, obtained by taking k rows and k columns of the original matrix. The determinant of such a submatrix is called a minor of order k. Note that an m x n matrix has (m choose k)(n choose k) different k x k submatrices, and so it has (m choose k)(n choose k) minors of order k.

Theorem. For a non-zero matrix A its rank equals the maximal integer k such that there exists a non-zero minor of order k.

Proof. Let us first show that if k > rank A, then all minors of order k are 0. Indeed, since the dimension of the column space Ran A is rank A < k, any k columns of A are linearly dependent. Therefore, for any k x k submatrix of A its columns are linearly dependent, and so all minors of order k are 0. To complete the proof we need to show that there exists a non-zero minor of order k = rank A. There can be many such minors, but probably the easiest way to get one is to take the pivot rows and pivot columns (i.e. the rows and columns of the original matrix containing a pivot). This k x k submatrix has the same pivots as the original matrix, so it is invertible (a pivot in every column and every row) and its determinant is non-zero.

This theorem does not look very useful, because it is much easier to perform row reduction than to compute all minors. However, it is of great theoretical importance, as the following corollary shows.

Corollary. Let A = A(x) be an m x n polynomial matrix (i.e. a matrix whose entries are polynomials of x). Then rank A(x) is constant everywhere, except maybe at finitely many points.

Proof. Let r be the largest integer such that rank A(x) = r for some x. To show that such an r exists, we first try r = min{m, n}. If there exists x such that rank A(x) = r, we have found r; if not, we replace r by r - 1 and try again. After finitely many steps we either stop or hit 0, so r exists, and rank A(x) <= r for all x. Let x_0 be a point such that rank A(x_0) = r, and let M be a minor of order r such that M(x_0) != 0. Since M(x) is the determinant of an r x r polynomial matrix, M(x) is a polynomial. Since M(x_0) != 0, it is not identically zero, so it can be zero only at finitely many points. Therefore, everywhere except maybe at finitely many points, rank A(x) >= r, and hence rank A(x) = r everywhere except maybe at finitely many points.

DETERMINANTS

We have related the question of the invertibility of a square matrix to a question about solutions of systems of linear equations.
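Before continuing, here is a brief computational aside on the rank theorem above. The sketch is mine, not from the book; it assumes NumPy and itertools, and it finds the rank as the largest order of a non-vanishing minor, comparing the answer with a library routine.

    import numpy as np
    from itertools import combinations

    def rank_by_minors(A, tol=1e-10):
        # Largest k for which some k x k minor (determinant of a submatrix
        # obtained by choosing k rows and k columns) is non-zero.
        m, n = A.shape
        for k in range(min(m, n), 0, -1):
            for rows in combinations(range(m), k):
                for cols in combinations(range(n), k):
                    if abs(np.linalg.det(A[np.ix_(rows, cols)])) > tol:
                        return k
        return 0

    A = np.array([[1., 2., 3.], [2., 4., 6.], [1., 0., 1.]])   # second row = 2 * first
    print(rank_by_minors(A), np.linalg.matrix_rank(A))          # 2 2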

so that the matrix A is not invertible. So what is the determinant? Let us start with 1 x 1 matrices.116 Determinants solutions of systems of linear equations. let us turn to 202 0 x 2 matrices. is a 1 x 1 matrix.c O. So we consider the array (AII2 ) = (~ ~ 6 ~). Consider next the case a 6:f. O. since the matrix A is invertible if and . we obtain 1 0) b a ( o ad-be -e a· If D = ad . If a 6 ::. Definition. with inverse matrix A-I = (a . We write det (A) = a. then clearly the matrix A is invertible. then clearly no matrix B can satisfy AB = BA = 1\. We therefore conclude that the value a is a good "determinant" to determine whether the 1 x 1 matrix A is invertible. We shall relate these two uestions to the question of the determinant of the matrix in question. In some sense. We shall use elementary row operations to nd out when the matrix A is invertible.1) On the other hand. Suppose first of all that a = e = O. then this becomes 1 a' 0) a b ( o 0 -c . we obtain Adding -e times row 1 to row 2. since it is not simple to nd an answer to either of these questions without a lot of work. of the form A = (a) • Note here that 1\ = (1).c 0. Next. The task is reduced to checking whether this determinant is zero or non-zero. and call this the determinant of the matrix A.be = 0. Then the array becomes I 0) 0 b (o d O l ' and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix 12 . Multiplying row 2 of the array (1) by a. and try to use elementary row operations to reduce the left hand half of the array to 12. Suppose that A = (a). Let us then agree on the following definition. of the form A=(~ ~). only if a ::. if a = 0. this is unsatisfactory.

Let us then agree on the following definition. Interchanging rows I and 2 of the array (I).Determinants 117 and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix 12. We therefore conclude that the value ad . 0. ad-be -e a Finally. if D = ad . O. we obtain e dOl) (ae be eO' Adding -a times row I to row 2. then the array (2) can be reduced by elementary row operations to ( 1 0 o d / D -b / D) a/ D ' 1 -c/ D so that A-I = (d -b).be :f.be = O. then this becomes e dOl) (a 0 -e a' and so it is impossible to reduce the left hand half of the array by elementary row operations to the matrix 12. if D = ad . since the matrix A is invertible if and only if ad . then the array (3) can be reduced by elementary row operations to I 0 d / D -b / D) ( o 1 -e/ D af D ' so that A-I = A= (~ ~) 1 ( d -b).be = 0. we obtain dOl) e ( o ad -be -e a' Again. a 1 ad-be -e Consider nally the case e:f. On the other hand. O. On the other hand. we obtain e dOl) ( o be-ad e -a' Multiplying row 2 by -1. we obtain e (a dOl) b 1 O· Multiplying row 2 of the array bye. note that a = e = 0 is a special case of ad .be is a good determinant" to determine whether the 2 x 2 matrix A is invertible. Definition.be :f.be = 0. if D = ad . Suppose that .
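The 2 x 2 formula obtained above is easy to sanity-check. The tiny sketch below is my own (it assumes NumPy; the numbers are arbitrary): it builds the inverse directly from D = ad - bc and verifies that it really inverts the matrix.

    import numpy as np

    a, b, c, d = 3., 7., 2., 5.                       # here D = ad - bc = 1
    A = np.array([[a, b], [c, d]])
    D = a * d - b * c                                 # the 2 x 2 determinant
    A_inv = (1.0 / D) * np.array([[d, -b], [-c, a]])
    print(np.allclose(A @ A_inv, np.eye(2)))          # True
    print(np.allclose(A_inv @ A, np.eye(2)))          # True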

. Let all. We write det(A) = ad . By the cofactor expansion of A by row i.1) matrix..be.118 Determinants is a 2 2 matrix. . and call this the determinant of the matrix A. and so on. + ainCin . ( .1) x (n . dene the determinant of 4 x 4 matrices in terms of determinants of 3 x 3 matrices. x Determinants for Square Matrices of Higher Order If we attempt to repeat the argument for 2 x 2 matrices to 3 x 3 matrices. then it is very likely that we shall end up in a mess with possibly no firm conclusion. ..1) matrix all aI(I-I) a(i-I)I Aij • aI(I+I) • a(i-l)(j-I) ·a(i-I)(j+I) = • a(i. be an n matrix. . The number Cij = (_l)i+j det(Aij) is called the cofactor of the entry aij of A. j=I Note that the entries of A in column} are given by .) = 1. Our approach is inductive in nature.1) matrices. In other words.1) (n . Note that the entries of A in row i are given by (ail' . let us delete row i and column} of A to obtain the (n . we shall dene the determinant of 2 x 2 matrices in terms of determinants of 1 x 1 matrices.1) x (n .. Try the argument on 4 x 4 matrices if you must. and nally multiplying by a sign (-ly+J.erminant of the resulting (n .!"I)I an(j-I) a(i_I)" • a(i+l}(j-I) : a(j+I)(j+I) anI aln • an(j+I) a(i~l)" ann Here denotes that the entry has been deleted. the cofactor of the entry a ij is obtained from A by first deleting the row and the column containing the entry aij' then calculating the d~t.. dene the determinant of 3 x 3 matrices in terms of determinants of 2 x 2 matrices.. In other words. a ln ] anI··· ann A =·. we mean the expression n L aijCij =aijC il + . Definition. For every i. Those who have their feet firmly on the ground will try a different approach.. n.. Suppose now that we have dened the determinant of (n . ain)· Definition.

Consider the matrix 2 3 5) ( A= 1 4 2. We call the common value in the determinant of the matrix A.35 = -2: Alternatively.) = (-I? (20-2)= 18. Suppose that A is an n x n matrix. C22 = all: It follows that by row 1 : allC ll + a l2 C l2 = a ll a22 .) = (_1)3 (5 ..bc as before. Example. Let us check whether this agrees with our earlier definition of the determinant of a 2 x 2 matrix.3 .a 12a21 . we mean the expression n IaijCy i=l = aljClj + . 215 Let us use cofactor expansion by row 1. Definition. Then Cll = (_1)1+1 det C l2 = (_1)1+2 detl C I3 = (_1)1+3 detl (i .a 21 a 12 . By the cofactor expansion of A by column j. and of the form ad . denoted by det(A). by row 2 : a 21 C21 + a 22 C22 = -a2l a l2 + a 22 a ll . U. Then the expressions are all equal and independent of the row or column chosen.Determinants 119 a F] ( an) Definition. Writil1g A= (a a l1 a21 12 ). Then . U i) = (_1)4 (1 . by column 2 : a l2 C l2 + a22 C22 = -a l2 a21 + a22a ll : The four values are clearly equal. C 21 =-a I2 . C l2 =-a21 . Suppose that A is an n x n matrix.4) = -1.. so that det(A) = allC ll + a l2 C l2 + a l3 C l3 = 36 . anjCnj" Proposition. a22 we have CII = a22 . by column 1 : allC ll + a 21 C 21 = a ll a 22 . let us use cofactor expansion by column 2.8) = -7.

4) = -1.) = C22 = (_1)2+2 det (~ ~) = (_1)4(10 - C32 = (_1)3+2 det (f ~) (-1)3(5 . Then det (A) = O. 2 1 0 5 Here it is convenient to use cofactor expansion by column 3. since then 2 3 det(A) = a 13 C 13 + a23 C23 + a 33 C33 + a43 C43 = 8C33 = 8E-l)3+3 det ( ~ i D=-16.120 Determinants C l2 = (_1)1+2 det U. then A is called an upper triangular matrix.. . Proposition. If aij = 0 whenever i <j. al n J. Consider an n x n matrix (ap .5) = 0. Example.. A= an. det(A) = O.. We also say that A is a triangular matrix if it is upper triangular or lower triangular. 10) = (-1)5(4 . Some Simple Observations In this section. The matrix ( ~ ~ ~J 006 is upper triangular. . When using co-factor expansion. ann If aij = 0 whenever i > j. in view of Example. Consider the matrix 21 4 3 0025J A= (5 4 8 5 . Proof We simply use cofactor expansion by the zero row or zero column. Definition. we shall describe two simple observations which follow immediately from the definition of the determinant by cofactor expansion.. we should choose a row or column with as few nonzero entries as possible in order to minimize the calculations. Suppose that a square matrix A has a zero row or has a zero column. so that det(A) = a 12 C 12 + a22 C22 + a 32 C32 = -3 + 0 + 1 = -2. then A is called a lower triangular matrix. Example. = 1.

A diagonal matrix is both upper triangular and lower triangular. (c) Suppose that the matrix B is obtained from the matrix A by multiplying one row of A by a non-zero constant c. When n> 2. (b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of one row of A to another row.121 Determinants Example.. Then det(B} = c det(A}. Recall that the elementary row operations that we consider are: (l) interchanging two rows. . say row i. we see that f3 det (a an3 as required. we use co-factor expansion by a third row. . a~nJ = '" '" all ann '" a22 • . change the term left-most column to the term top row in the proof.. Proposition. Proof (a) The proof is by induction on n. ( ani aln'J . and (3) multiplying one row by a non-zero constant.I) matrices Bij are obtained from the matrices Ai" by interchanging two rows of Aij . Then n det(B) = . .. (2) adding a multiple of one row to another row. ann Then det(A) = all a22 . . Using cofactor expansion by the left-most column at each step.. so that det(Bij) = -det(Aij)' It follows that !) n det(B) =- j Laij(-li+ det(Ay) = -det(A) j=1 as required.LJaij(-l)i+ j '"" det(Bij)' j=1 Note that the (n ... (a) Suppose that the matrix B is obtained from the matrix A by interchanging two rows of A. ann Elementary Row Operations We now study the eect of elementary row operations on determinants.1) x (n . Then det(B} = det(A}. Then det(B} = -det(A}. A= all . Suppose that the n x n matrix is triangular.. Proposition. Proof Let us assume that A is upper triangular for the case when A is lower triangular. It is easily checked that the result holds when n = 2. . . (ELEMENTARY ROW OPERATIONS) Suppose that A is an n x n matrix. ann' the product of the diagonal entries. .

uij(-l)i+ j det(Aij) = det(A) 1=1 as required. Then det(B} = . Elementary row and column operations can be combined with cofactor expansion to calculate the determinant of a given matrix. (c) This is simpler. Proposition. When n> 2. (c) Suppose that the matrix B is obtained from the matrix A by multiplying one column of A by a non-zero constant c. we have the following result. It is easily checked that the result holds when n = 2. It follows that n det(B) = Lcaij(-I)i+1 det(Aij) = cdet(A) j=1 as required. Suppose that the matrix B is obtained from the matrix A by multiplying row i of A by a non-zero constant c. More precisely. the proof is by induction on n.aij(-li+ j det(Bij) j=1 Note that the (n -1) x (n . the above operations can also be carried out on the columns of A. Suppose that A is an n x n matrix. (b) Suppose that the matrix B is obtained from the matrix A by adding a multiple of one column of A to another column. Then n det{ B) = L caij ( -1 )i+1 det{Bij ) j=1 Note now that Bij = Aij' since row i has been removed respectively from Band A. we have . In fact. say row i. Example. 122 Determinants (b) Again. Consider the matrix A =(1 ! ! ~J 2 2 0 4 Adding -I times column 3 to column 1.det(A). Then n det(B) = L.1) matrices Bij are obtained from the matricesAij by adding a multiple of one row of Aij to another row.. We shall illustrate this point by the following examples. Then det(B} = c det(A}. (a) Suppose that the matrix B is obtained from the matrix A by interchanging two columns of A. so that det(Bij) = det(Aij)' It follows that n det(B) = L. Then det(B} = det(A}. we use co-factor expansion by a third row.

Dividing row 4 by 2.2(-1)3+2 det (! ~) = det (! ~). Using the formula for the determinant of2 x 2 matrices. 1 0 2 . we conclude that det(A) = 4(9 . : 2 2 0 4 Adding -1/2 times row 4 to row 3. we have det(A) = 2(_1)4+1 det (~3 i4 ~J3 = -2 det(~3 Adding -1 times row 1 to row 3. we have det(A) = det [H ! ~J. we have ! de~ ~ ~2 det (~ Using cofactor expansion by row 3. we have det(A) = 2 det = (~ ! ! ~J 1 1 0 2 Adding -3 times column 3 to column 2. we have det(A) = -2. 1 1 0 2 Adding -1 times row 4 to row 2. we have det(A) = 2det(~ 1 ! ! ~J.123 Determinants det(A) = det[? ! ~J. we have (~o i2 -2~ J. we have det(A) = 2 det (! : ! ~J. 2 2 0 4 Using cofactor expansion by column 1. Let us start again and try a different way.28) = -76. det(A) = -2 det n Adding 1 times column 2 to column 3.

we have det(A) = -2. we have det(A) = 1(-1) 4+3 2 1 1 3J ( (2 1 1 1 3 1 2 1 3 1 det 2 7 1 1 = -det 2 7 1 2120 212 . Example. we have det(A) = det[~ ? ~ H]. [21 01 01 21 03 Using cofactor expansion by column 3. Adding -1 times row 4 to row 2. we conclude that det(A) = -2(25 + 13) = -76. we have .124 Determinants Using cofactor expansion by row 2. 1 I 2 Adding -2 times row 3 to row 1. so let us work to get more zeros into this column. 1 0 1 1 3 2 102 0 Adding -2 times row 4 to row 3. we have det(A) = 2 1(-1)2+3 det det(A) = -2 det =~ ~) =-2det(. (~1 =~1 1)2 Adding -5 times row 3 to row 2. Consider the matrix A=[~ ~ ~ H] 1 0 1 1 3 . we have 21 31 00 11 23] det(A) = det 2 7 0 1 1. 1(_1)3+1 det(-=-t3 ~5) =-2det(-=-15 3 ~5)' Using the formula for the determinant of 22 matrices. we have ~ -2 de{~ ~53 det(A) -H Using cofactor expansion by column 1.1 (. 2 102 0 Here we have the least number of non-zero entries in column 3.

we have 3 1 2) ( det(A) = -det 9 1 O. we have 1 0 2 4 1 0 2 4 5 7 6 2 4 6 1 9 2 1 det(A) = det 3 5 0 1 2 5 2 4 5 3 6 2 0 0 0 1 0 0 Adding -1 times row 5 to row 2.1) = -34. 0 1 2 0 Adding 1 times row 1 to row 2. Consider the matrix 1 024 1 0 2 4 5 7 6 2 A= 4 6 1 9 2 1 350 125 245 362 1 0 2 5 1 0 Here note that rows 1 and 6 are almost identical. 120 Using cofactor expansion by column 3. Adding -1 times row 1 to row 6.Determinants 125 Adding -1 times column 3 to column 1. we have 1 1 1 3J (o 1 2 0 o 3 1 2 det( A) = .det 1 7 1 1· Adding -1 times row 1 to row 3. we have det(A) = _2(_1)1+3 det(i ~) = -2det(i ~). Using the formula for the determinant of 2 x 2 matrices. Example. we conclude that det(A) =2(18 . we have J o1 31 11 23 det(A) = -det 0 6 0 -2· ( o 1 2 0 Using cofactor expansion by column 1. we have . we have det(A)=-l(-I)I+ldet(~1 2~ 32)=-detdet(A)=-det(~ ~ 32'J.

. . Note now that transposing a 1x 1 matrix does not aect its determinant (why?). Example. Further Properties of Determinants Definition. al n . we have 1 0 2 4 1 0 o 0 0 0 0 0 4 6 1 9 2 1 det(A) = det 3 5 0 1 2 5 245 362 000 100 It follows from Proposition 3B that det(A) = O. It follows that determinants of n x n matrices ultimately depend on determinants of 1 x 1 matrices. We have . anI'" ann By the transpose At of A. Consider the matrix 1 2 3) (789 A= 4 5 6. I (all .. in turn. we have det(A t) = det(A). A x n matrix a1n ). ... For every n x n matrix A.126 Determinants 102 4 1 0 0 0 4 0 0 o det(A) = det 4 6 1 9 2 1 350 125 245 362 000 100 Adding -4 times row 6 to row 2. The result below follows in view of Proposition.. we mean the matrix obtained from A by transposing rows and columns. determinants of 3 x 3 matrices depend on determinants of 2 x 2 matrices. Consider the n (a = 11 . A = '. and so on. Then AI =(1 ~ ~J. 369 Recall that determinants of 2 x 2 matrices depend on determinants of 1 x 1 matrices. ann Example.. a(ll) .

o. * * (b) The system Ax = 0 of linear equations has only the trivial solution. ••• . ••• . Then det(A. It follows that det(B) ::f:. O. we have det(AB) = det(A) det(B). Finally. Then there exist a finite sequence E 1. Proposition. ••• . Ek of elementary n x n matrices such that B = Ek . 0. det=[n 12312 35730 10113 2 102 0 Next.Determinants 127 H~l=det[~ i ~ i ~l=-34. Application to Curves and Surfaces A special case of Proposition states that a homogeneous system ofn linear equations in n variables has a non-trivial solution if and only if the determinant if the coefficient matrix .1) = 1 det(A) Proof In view of Propositions 3G and 3C. it must be In. is summarized by the following result. so that B has no zero rows by Proposition. For every n x n matrices A and B. Then A is invertible if and only if det(A) o. (e) The determinant det(A) ::f:. In the notation of Proposition. Suppose that A is an n x n matrix. Then det(A) 0 follows immediately from Proposition. (d) The system Ax = b of linear equations is soluble for every n 1 matrix b. the main reason for studying determinants. We therefore conclude that A is row equivalent to In. It now follows from Proposition that A is invertible. we shall study the determinant of a product. we have the following result. EIA It foIrows from Proposition that det(B) = det(Ek). Proposition. Suppose now that det(A) ::f:. Proposition.1) = det(In) = 1. det(E 1) det(A) Recall that all elementary matrices are invertible and so have non-zero determinants. Proof Suppose that A is invertible. Let us now reduce A by elementary row operations to reduced row echelon form B. We shall sketch a proof of the following important result Proposition. Since B is an n x n matrix in reduced row echelon form. the following statements are equivalent: (a) The matrix A is invertible. we have det(A) det(A. (c) The matrices A and In are row equivalent. as outlined in the introduction. Suppose that the n x n matrix A is invertible. Combining Propositions. The result follows immediately.
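One immediate numerical consequence worth checking is the determinant of an inverse. The sketch below is mine, not from the text; it assumes NumPy, and the random Gaussian matrix used is invertible with probability one.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))     # almost surely invertible
    print(np.isclose(np.linalg.det(np.linalg.inv(A)),
                     1.0 / np.linalg.det(A)))          # True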

we must have 2 2) (2 2) a (al + YI + bX 1+ cYI + d = 0. Example. Hence xa + yb + c = 0. Suppose that we wish to determine the equation of the unique line on the xy-plane that passes through two distinct given points (xI' Yl) and (x2' Y2)' The equation of a line on the xy-plane is of the form ax + by + c = O. The equation of a circle on the xy-plane is of the form a(x2 +Y2) + bx + cy + d = O. We illustrate our ideas by a few simple examples. In this section. Since the three points lie on the circle. 0 (x. and so we must have det(~x2 Y2~ 1~J = O. xla + Ylb + c = 0. we must have aX I + bYI + c = 0 and ax2 + bY2 + c = O.)a + x3b + Y3c + d = 0 Written in matrix notation. Example. we have . b. we have ( ~ ~ ~J (~J =(~J. + y. c) to this system of linear equations. Written in matrix notation. Since the two points lie on the line.128 Determinants is equal to zero. the equation of the line required. Suppose that we wish to determine the equation of the unique circle on the xy-plane that passes through three distinct given points (xl'Yl)' (x2'Y2) and (x3'Y3)' not all lying on a straight line. 2 (x1 + yf}a + xlb + ylc + d = 0 (xi + yi)a + x 2b + Y2c + d. we shall use this to solve some problems in geometry. x2 Y2 1 c 0 Clearly there is a non-trivial solution (a. x 2a + Y2b + c = O. a x2 + Y2 + bX2 + cY2 + d = 0 and Hence (x2 + Y2)a + xb + yc + d = 0.
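The determinant condition for a line through two given points can be evaluated directly. The sketch below is my own illustration (assuming NumPy; the sample points are made up): a point (x, y) lies on the line through (x1, y1) and (x2, y2) exactly when the 3 x 3 determinant vanishes.

    import numpy as np

    x1, y1, x2, y2 = 1.0, 2.0, 3.0, 7.0

    def line_condition(x, y):
        return np.linalg.det(np.array([[x,  y,  1.0],
                                       [x1, y1, 1.0],
                                       [x2, y2, 1.0]]))

    print(np.isclose(line_condition(x1, y1), 0.0))    # True: the first point is on the line
    print(np.isclose(line_condition(x2, y2), 0.0))    # True: so is the second
    print(np.isclose(line_condition(0.0, 0.0), 0.0))  # False: (0, 0) is not on this line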

the equation of the circle required.) + bX3 +CY3 +dz3 +e = 0. aX2 + bY2 + CZ2 + d= 0. Suppose that we wish to Qetermine the equation of the unique plane in 3space that passes through three distinct given points (xl' YI' ZI)' (X2'Y2' z2) and (x 3' Y3' z3)' not all lying on a straight line. we must have a(xf + yf + zf)+ bXI +cYI +dzl +e = 0. Since the three points lie on the plane. x 3a + Y3b + z3c + d = 0: Written in matrix notation. + Y. a (xi + yi + zi) + bX2 + CY2 + dz 2 + e = 0. c. c. a(x. d) to this system of linear equations. and so we must have the equation of the plane required. we have 2 x + i x Y a b xi + yi 2 x3 2 + Y3 x 2 Y2 x3 Y3 c d 0 o =o o Clearly there is a non-trivial solution (a. Hence xa + yb + zc + d = 0. . and so we must have det 2 x +i 2 xI +iI 2 2 2 2 x2 + Y2 x3 + Y3 x Y xI YI x2 Y2 x3 Y3 = 0. b. xla+Ylb+zlc+d=O. The equation of a sphere in 3-space is of the form z2 x3 Y2 Y3 z2 z3 a(x2 + Y2 + z2) + bx + cy + dz + e = o. Suppose that we wish to determine the equation of the unique sphere in 3space that passes through four distinct given points (xl' YI' ZI)' (x 2' Y2' Z2)' (x 3' Y3' z3) and (x4' Y4' Z4)' not all lying on a plane. Since the four points lie on the sphere. The equation of a plane in 3-space is of the form ax + by + cz + d = o. we must have ax I + bYI + cz I + d = 0.Determinants 129 Clearly there is a non-trivial solution (a. det(~ ~ ~ ~J = 1 0' 1 Example. b. and ax3 + bY3 + cZ3 + d= O. Example. x2a" + Y2b + z2c + d = 0. d) to this system of linear equations. + z.

1) matrix . + Y. (xi + yi + zi)a+x3 b + Y3 C + z3 d +e = 0.8 for proofs. we have x 2 + i+z2 x 2 + i +z2 x Y Z 1 xI YI zi x2 + Y2 +z2 x 2 + i+z2 x2 Y2 z2 x3 Y3 z3 x4 + Y4 + z4 x4 Y4 z4 I I I 2 2 2 3 2 3 2 3 2 1 1 a b 0 0 C =0 d 0 e o Clearly there is a non-trivial solution (a. . The rst one enables us to nd the inverse of a matrix... Some Useful Formulas In this section. (xi + yi + zi)a+x4 b + Y4 C + Z4 d +e = o.Determinants 130 Hence (x 2 + y2 + z2)a+ xb+ yc+ zd +e = 0.. Written in matrix notation. a ln ) . while the second one enables us to solve a system of linear equations.. (x. e) to this system of linear equations. 1 the equation of the sphere required. +z. we shall discuss two very useful formulas which involve determinants only. The interested reader is referred to Section 3. + zi)a+ x2 b + Y2 C + z2d + e = 0. ( ani . b. . and so we must have X2 + Y2 +z2 x 2 + i +z2 x Y Z 1 xI YI zi det xi + Y. c. d. Recall rst of all that for any n x n matrix all A =·. ann the number Cij = (-ly+j det(Aij) is called the cofactor of the entry aij' and the (n .1) (n . (xf + yf +zf)a+xlb+ Ylc+zl d +e = 0.. X2 Y2 Z2 xf + yf +zf X3 Y3 z3 x~ + y~ +z~ x4 Y4 Z4 I I I 1 =0.

Suppose that the n x n matrix A is invertible.. here denotes that the entry has been deleted.0 1 On the other hand.ln • • • • an(j_I)· a(i+I){i+I) aU -l)n aU + l)n aU + I)n an(j+I) ann a(i-I)(j+I) • is obtained from A by deleting row i and column}. Definition. Proposition. Remark. C11I •• .. . Cf/lJ CIIII is called the adjoint of the matrix A. det(A) lj Example. adding 1 times column 1 to column 2 and then using cofactor expansion on row 1. Then A-I = 1 ad·(A).Determinants 131 al(j. 203 Then det[~ ~] -det[ ~l ~] det[ ~ I ~] ad}(A) = -det(~ ~] det(~~] -det(~~] ~[~2 i2 ~~l det[O I] -det[1 -1] detr1 -I] 2 0 2 0 . 0]2 ~det[10 0I 0]2 ~detG 3 It follows that 2 2 3 ~]~-l. The n x n matrix ad} (A) = (C?I . Consider the matrix 1 -1 0] [ A= 0 1 2._I) a(i-I)I a(i_l)(j-I) • • a(i-I)(i_l) ani a. Note that adj(A) is obtained from the matrix A rst by replacing each entry of A by its cofactor and then by transposing the resulting matrix. we have det(A) ~ det[~ ~l.

.--. Consider the system Ax = b..... x2 = det[~2 ~3 ~]3 det(A) = -4.. .. bn an(J+I) a\nJ. Example. . [1~ -10] b Recall that det(A) = -1. .132 Determinants -3 -3 2] [ A-I = -4 2 -3 2.. we turn our attention to systems of n linear equations in n unknowns.:. AI(b). (Cramer's Rule) Suppose that the matrix A is invertible.. represented in matrix notation in the form Ax = b... For every j j of the matrix A by the column b. .. . where A= [1] ~ ~ and = ~. . we have det[~3 110 ~]3 XI = --'-d-et-(A-)--'-= -3... x n det(An(b» = -..:.. = 1. k.' -'--':._.. Then the unique solution of the system Ax = b.e. anrfn = bn..-.:. is given by det(A. (b» x . . we replace column al(Jr l ) .. .. . By Cramer's rule. ann Proposition. det(A) ' where the matrices AI(b). .1det(A) ... write in other words.:. of the form anix i + . where A. x and b are given by equation...-. 2 -1 Next. where C: Jn and b = ) Cnn (b):1 bn represent the coefficients and represents the variables.... ..

For every n EN. dened by x<l>'I' = {x$)'I' for every x E X so that is followed by... . We shall do this only for permutations. Proof There are n choices for 1$. that it is {I.Determinants 133 (1 -1 1] det 0 1 2 203 x = =3 3 det{A) . Sn denotes the collection of all functions from (1.. . . we shall first discuss a definition of the determinant in terms of permutations.. Let us check our calculations. Definition. In S4' 1 2 3 4) (2 4 1 3 . It is not dicult to see that if: $ X ~ X and: X ~ X are both permutations on X. (1<1> 2<1> . Remark.. LetXbe a non-empty finite set. .. .. In other words. we can use the notation n) 1 2 . the set Sn has n! elements. . n}. To represent particular elements of Sn' there are various notations. without loss of generality. we need to make a digression and discuss first the rudiments of permutations on non-empty finite sets. In order to do so. Since the set X is non-empty and finite.. where n EN. If x E X. 2. 2.1) choices left for 2$.. .. 2 2 -1 We therefore have [:~l [=~2 =~2 -1~ ][~]3 [=~]. Example. For example. A permutation $ onXis a function: X ~ X which is one-to-one and onto. n} that are both one-to-one and onto. n) to {I. Proposition. we den()te by x the image of x under the permutation. n<1> to denote the permutation $. Note also that we write to denote the composition. Recall from Example that [-3 -3 2] A-I = -4 -3 2. Note that we use the notation x instead of our usual notation (x) to denote the image of x under. there are (n .. n}. The reasons will become a little clearer later in the discussion. is also a permutation on X. We now let Sn denote the set of all permutations on the set {l.. 2. For each such choice. then: X ~ X. we may assume. . 2. And so on. 3 x3 = = Further Discussion In this section.

. (b) For every subset (XI' x 2. In 84 or 86 . .. the reader can easily check that 1 2 3 4)(1 2 3 4) (I 2 3 4) (2 4 1 3 3 2 4 1 . Proposition. . and is its own inverse. x k) and (YI' Y2' . it is not necessary to include this in the cycle..... .. (a) Every permutation in 8n can be written as a product of disjoint cycles. .. Xk) satises Xl' X 2' . in other words. 2. Example. xk) of the set {I. . . we have . 4<1> = 3 and 3<1> = 1. 2<1> = 4. The interested reader may try to prove the following result. n} and leaves all the others unchanged is called a transposition. the permutation I 2 3 4 5 6) (2 4 1 3 6 5 can be represented in cycle notation as (1 243)(5 6)..2 1 3 4 ' can be represented in cycle notation by (1 243)(1 34) = (1 2). .. the information 1 2 3 4)(1 2 3 4) (I 2 3 4) (2 4 1 3 3 2 4 1 . . Here the cycle (1 243) gives the information 1<1> = 2. Furthermore. since the image of2 is 2. (1 3 4) and (1 2) have lengths 4.134 Determinants denotes the permutation. . 3 = 1 and 4 = 3. we have (1 2 4 3) = (1 2)(1 4)(1 3). Suppose that n EN.. YI are all different. 2 = 4. .2 1 3 4 ' A more convenient way is to use the cycle notation.. . The permutations U~ ~ j) and (~ ~ l-:i) can be represented respectively by the cycles (I 243) and (I 34).. ·. The last example motivates the following important idea. Example.. In 8 6. We also say that the cycles (1 2 4 3). Xk) X2 )(X I . (XI X2 ..... where 1 = 2. . A pet:mutation in 8n that interchanges two numbers among the elements of {I. Suppose that n EN. the cycle (XI' X2' . every cycle can be written as a product of transpositions. By Theorem 3P(b). Definition. . Consequently. Two cycles (XI' x 2' . . Xk = (XI' x 3). YI) in 8n are said to be disjoint if the elements XI' . On the other hand.... 3 and 2 respectively. In 8 9. Remark. Note also that in the latter case. where the elements are distinct. the permutation (c) I 2 3 4 5 6 7 8 9) (3 2 5 1 7 8 4 9 6 can be written in cycle notation as (1 3 5 74)(6 8 9). Example. n}. It is obvious that a transposition can be represented by a 2-cycle. . every permutation in 8n can be written as a product of transpositions. (XI' xk). xk' YI' . 2.

Here we conne our study to the special cases when n = 2 and n = 3.a13a22a31 .!.Determinants 13S (1 3 5 7 4) = (1 3)(1 5)(1 7)(1 4) and (6 8 9) = (6 8)(6 9). Definition. In the two examples below.. we write (. By an elementary product from the matrix A. Suppose that n EN.. ann is an n x n matrix. Suppose that A=[aI~ anI In . no two of which are from the same row or same column. By the determinant of an n x n matrix A of the form (11).a I2 a 2I as shown before. Hence the permutation can be represented by (1 3)(1 5)(1 7)(1 4)(68)(69). <l> Remark.. we mean the product of n entries of A. Furthermore. It can be shown that no permutation can be simultaneously odd and even. we use e to denote the identity permutation. Then a permutation in Sn is said to be even if it is representable as the product of an even number oftransp0sitions and odd ifit is representable as the product of an odd number of transpositions.. Indeed. Example. Suppose that n = 3. The very interested reader may wish to make an attempt. Example. a :] . an(nj). We are now in a position to dene the determinant of a matrix. We have the following: elementary product permutation sign e +1 alla22a33 (123) +1 a12a21a33 (1 32) +1 a13a21a32 -1 (1 3) a13a22a3I contribution + aIla22a33 + a12a21a31 + a13a21a32 . It follows that any such elementary product must be of the form a I(1<1»ai 2<1» .. one can use (12) to establish Proposition.. Definition. where <I> is a permutation in Sn' Definition.) _ E 'f' - {+-11 ififf isis even odd. Suppose that n = 2. We have the following: elementary product permutation sign contribution a ll a22 a 12a2I e +1 (1 2) -1 Hence det (A) = all a 22 . we mean the sum <Pe:S" where the summation is over all the n! permutations in Sn' It is be shown that the determinant de ned in this way is the same as that dened earlier by row or column expansions..

we have det(EB) = det(E) det(B). Let us reduce A by elementary row operations to reduced row echelon form A'. (a) If E arises from interchanging two rows of In. then det(E) = -I. E~. Suppose that E is an elementary matrix.a12a2Ia33' We have the picture lielow: + Next.. ButAB = E 1. Since elementary matrices are invertible with elementary inverse matrices. E/A 'B). Ek of elementary matrices such that A = EI . so it follows from Proposition that det(AB) = O. EJI1' Suppose first of all that det(A) = O. .a13a22a31. Proof of Proposition. Gk. Then it follows from (13) that the matrix Ao must have a zero row. and so det(A' B) = O. Then A' = In' and so it follows from Equation that AB = E 1. Thenfor any n x n matrix B. Proposition.. (b) (c) If E arises from adding one row of In to another row.ofelementary matrices such that A' = Gk . G1A..alla23a32 . The idea is to use elementary matrices.Determinants 136 (23) -1 . ... Corresponding to Proposition. we can easily establish the following result. det(E) = c. . we discuss briey how one may prove Proposition concerning the determinant of the. then det(E) = 1. Hence A' B must have a zero row. .. it follows that there exist a nite sequence E 1. then Combining Propositions 3D and 3Q.a12a21a33 Hence det(A) = alla22a33 + a12a23a31 + a13a21a32 . If E arises from mUltiplying one row of In by a non-zero constant c. Suppose next that det(A) ::t: O. .alla23a32 alla23a32 a12a21a33 (1 2) -1 . we can establish the following intermediate result. Then there exist a finite sequence G 1. .. . Proposition. ••• . ••• . product of two matrices. Suppose that E is an n x n elementary matrix..

for every} = 1. we have bii = ailCil + . as this clearly implies A[ 1 ad}(A)] = In' det(A) giving the result. it remains to show that bIClj + . Proof Since A is invertible. To complete the proof. we have htClj + . and so the determinant is 0 (why?). this becomes • adj(A)b. . .y x· = ---=-----"- det(A) . if i :.. n. To show. + ainC...} = 1....Determinants 137 Proof It sucess to show that Aad}(A) = det(A)In. It follows that when i = }.. on using cofactor expansion by column}. b1n ] (B)=: : .. note that all ...n = det(A): On the other hand. +bnC. .. + ainCjn .... n.det~A) ~~ [ Hence. [ al n 1[CJl Aad}(A) =: :: anI'" ann Ctn Suppose that the right hand side of is equal to [ b~l . }. then equation is equal to the determinant of the matrix obtained from A by replacing row} by row i. bn1 ... This matrix has therefore two identical rows. the unique solution of the system Ax = b is given by x=A -I = 1 det(A) Written in full. that J . [::J. . we have bij = ajJCjI + . we get A-I = 1 ad'(A) det(A) lj By Proposition. + bnCnj = det(Aib)): Note.... bnn Then for every i.

.(b» } = L..138 Determinants ~ det(A ..bi ( -1) i=l Hj a(i-I)I det • a(i+}) I a(i-l)(j-I) • • a(i-I)(j-I) • • a(i+I)(J-I) an(j-I) • a(i-I)n • ..

. or any polynomial. is some scalar.... Spectral theory in infinitely many dimensions if significantly more complicated.2xO= '). The main idea of spectral theory is to split the operator into simple blocks and analyse each block separately. 1. let us consider difference equations.. Spectrum A ·scalar A.. To explain the main idea. millions? Or what if we want to analyse the behaviour of xn as n ~ oo? Here the idea of eigenvalues and eigenvectors comes in. 1 At the first glance the problem looks trivial: the solution xn is given by the formula xn =Anxo .. Eigenvectors. .2xO' . Ifwe have such a linear transformation A : V ~ V.. n = 0. we can mUltiply it by itself. and most of the results presented here fail in infinite-dimensional setting. and xn is the state of the system at the time n. equivalently. Suppose that Axo = A xo' where'). analyse the long time behaviour of x n' etc.. In this chapter we consider only operators acting from a vector space to itself (or. . where a : V ~ V is a linear transformation.. MAIN DEFINITIONS Eigenvalues. ..Chapter 5 Introduction to Spectral Theory EIGENVALUES AND EIGENVECTORS Spectral theory is the main tool that helps us to understand the structure of a linear operator. Anxo = ').2. is called an eigenvalue of an operator A : V ~ V if there exists a non-zero vector v E V such that . Many processes can be described by the equations of the following type xn +1 = Axn. But what if n is huge: thousands. take any power of it. n x n matrices).nxo' so the behaviour of the solution is very well understood In this section we will consider only operators in finite-dimensional spaces.. ThenA2axo = A 2xO' '). Given the initial state Xo we would like to know the state xn at the time n.

This polynomial is called the characteristic polynomial of A. Let a act on lR n (i. The set of all eigenvalues of an operator A is called spectrum of A. that determinants of similar matrices coincide. So.. Therefore II E a(A). and compute eigenvalues of the matrix of the operator in this basis. A -Ihas a nontrivial nullspace if and only if it is not invertible.140 Introduction to Spectral Theory Av = Av. finding all eigenvectors.. Note. and it is impossible to solve the equation of degree higher than 4 in radicals. the eigenvectors are easy to find: one just has to solve the equation Ax = Ax.Ai) x = 0 has a non-trivial solution). We know that a square matrix is not invertible if and only if its determinant is O.e. Let us recall that square matrices A and B are called similar if there exist an invertible matrix S such that A =SBS-l. in higher dimensions different numerical methods of finding eigenvalues and eigenvectors are used. the set of all eigenvectors and 0 vector. or. So.i.e. is called the eigenspace. the determinant det(A . Since the matrix of A is square.AI) is a polynomial of degree n of the variable A. a: lR n ~ lRn). Finding Eigenvalues: Characteristic Polynomials A scalar A is an eigenvalue if and only if the nullspace Ker(A . Finding roots of a polynomial of high degree can be a very dicult problem. The vector v is called the eigenvector ofa (corresponding to the eigenvalue A). equivalently (A . The nullspace Ker(A .AI) is non-trivial (so the equation (A .AI) = 01 If A is an n x n matrix. Ifwe know that ').AI).. to find all eigenvalues of A one just needs to compute the characteristic polynomial and find all its roots.e.A is an eigenvalue of A {:} det (A . So. is an eigenvalue. One is based on the notion of similar matrices. corresponding to an eigenvalue is simply finding the nUllspace of A . But how do we find eigenvalues of an operator acting in an abstract vector space? The recipe is simple: Take an arbitrary basis. Characteristic Polynomial of an Operator So we know how to find the spectrum of a matrix.AI. Indeed det A = det(SBS-l) = det S det B det S-l = det B because det S-l = 1/ det S.Ai)x = o. i. This method of finding the spectrum of an operator is not very practical in higher dimensions. and is usually denoted cr(A). Note that if A = SBS-I then . But how do we know that the result does not depend on a choice of the basis? There can be several possible explanations.

As we have discussed above. Icharacteristic polynomials of similar matrices coincide. and A and B are two bases in V.. when people say simply "multiplicity" they usually mean algebraic multiplicity.. counting multiplicity.I - 141 = S(BS-I - US-I) so the matrices A -lJ and B -lJ are similar.e.U = SBS. Let us mention. Any polynomial p(z) = L~=o akz k of degree n has exactly n complex roots. peA) = 0) then Z .An). that ifp is a polynomial. p can be represented as p(z) = (z ..Introduction to Spectral Theory A .)S-I. that algebraic and geometric multiplicities of an eigenvalue can differ. Let A be n x n matrix. i. positive integer k such that (z .. An be its eigenvalues (counting multiplicities). . + An..)2 divides p and so on. p can be represented as p(z) = an(z . Multiplicities of Eigenvalues Let us remind the reader. . . then [T]AA = [I]AB[TbB[I]BA and since [l]BA = ([l]AB)-1 the matrices [11 AA and [11 BB are similar. There is another notion of multiplicity of an eigenvalue: the dimension of the eigenspace Ker(A-1) is called geometric multiplicity of the eigenvalue A. So. . where q is some polynomial. If T: V -7 ASIS-I = S(B _IJ.. . Trace and Determinant Theorem.'JJ).A2) .. Then 1. The largest. I V is a linear transformation. The words counting multiplicities mean that if a root has multiplicity d we have to count it d times.. If q(A) = 0. then it is a root of the characteristic polynomial p(z) = det(A . matrices of a linear transformation in different bases are similar. An are its complex roots. In other words. we can define the characteristic polynomial of an operator as the characteristic polynomial of its matrix in some basis. In other words. where Ai' ~. Geometric multiplicity is not as widely used as algebraic mUltiplicity. the result does not depend on the choice of the basis. Proposition. so characteristic polynomial of an operator is well defined. i.. Therefore det(A -'JJ) = det(B . The mUltiplicity of this root is called the (algebraic) multiplicity of the eigenvalue A. Geometric multiplicity of an eigenvalue cannot exceed its algebraic multiplicity.A)q(Z).A divides p(z).zl).AI)(Z . so (z ..e.A i divides p(z) is called the multiplicity..e. of the root A. and let AI' A2. traceA = AI + A2 + . and Ais its root (i. then q also can be divided by z -. counting multiplicities. (z . If A i~ an eigenvalue of an operator (matrix) A. Therefore.

vn of eigenvectors.. ..). . Namely eigenvalues ofa triangular matrix (counting mUltiplicities) are exactly the diagonal entries a).142 2. Namely.. n on the diagonal [A] BB = diag{A). . 2..). However. . n be the corresponding eigenvalues. Introduction to Spectral Theory det A = A)A2 . and use the fact that determinant of a triangular matrix is the product of its diagonal entries.A)(a2'2 . (an'n .. An} = [AI A2 .) . when we can just read eigenvalues off the matrix. .A2 .2.. . o An Therefore.2. A.. A2. an'n' DIAGONALIZATION Suppose an operator (matrix) a has a basis b = vI' v2' . it is easy to find an Nth power of the operator A.n By triangular here we mean either upper or lower triangular matrix.. An .... . functions of the operator are also very easy to compute: for example the operator (matrix) exponent et4 is defined as et4 and its matrix in the basis B is t2A2 = I +tA+~+ t3 A3 3! 00 l Ak =Lkl• .. Since a diagonal matrix is a particular case of a triangular matrix (it is both upper and lower triangular the eigenvalues of a diagonal matrix are its diagonal entries The proof of the statement about triangular matrices is trivial: we need to subtract A from the diagonal entries of A.. its matrix in the basis B is o • "N "N _ [A N ]BB = dlag = {"A)N . . which can be quite time consuming. .A) . We get the characteristic polynomial det(A -'M) = (a). and let A....An - o Moreover. a2. k=O .. Then the matrix of A in this basis is the diagonal matrix with 1.. an. . . Eigenvalues of a Triangular Matrix Computing eigenvalues is equivalent to finding roots of a characteristic polynomial of a matrix (or using some numerical method). there is one particular case..A) and its roots are exactly al')' a 2'2' .. 0].
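Both observations above lend themselves to a quick numerical check: powers (and other functions) of a diagonalizable operator can be computed through the diagonal matrix of its eigenvalues, and the eigenvalues of a triangular matrix can simply be read off its diagonal. A sketch assuming Python with numpy; the triangular matrix is an arbitrary illustrative choice.

import numpy as np

A = np.array([[1.0, 2.0],
              [8.0, 1.0]])
lam, S = np.linalg.eig(A)             # columns of S are eigenvectors, lam the eigenvalues
N = 10
A_N = S @ np.diag(lam**N) @ np.linalg.inv(S)              # A^N = S D^N S^{-1}
print(np.allclose(A_N, np.linalg.matrix_power(A, N)))     # True

T = np.array([[2.0, 7.0, 1.0],
              [0.0, -1.0, 4.0],
              [0.0, 0.0, 3.0]])       # upper triangular
print(np.sort(np.linalg.eigvals(T)))  # [-1.  2.  3.]: exactly the diagonal entries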

.I . and let vI' v 2. An where we use D for the diagonal matrix in the middle. I . and a system consisting of one non-zero vector is linearly independent... A matrix a admits a representation A = SDS-I.143 Introduction to Spectral Theory o o To find the matrices in the standard basis S. Another way of thinking about powers (or other functions) of diagonalizable operators is to see that if operator A can be represented as A = SDS. . . because by the definition an eigenvector is non-zero. Theorem. .. . . vr are linearly independent. then A = [A]ss = s AI [ A 0 2. then the matrix admits the representation A = SDS-l. ' NTimes and it is easy to compute the Nth power of a diagonal matrix. Proof We will use induction on r. A. Let A... where S = [l]SB is the change of coordinate matrix from coordinates in the basis B to the standard coordinates. vn . . its columns form a basis. Since S is invertible. .I ) . vr be the corresponding eigenvectors. where D is a diagonal matrix if and only if there exists a basis of eigenvectors of A. o 1S -=ISDS -I . The case r = 1 is trivial... Proof We already discussed above that if there is a basis of eigenvectors. we need to recall that the change of coordinate matrix [l]SB is the matrix with columns vI' v2. Theorem. .z. A. .. Then vectors vI' v2.. then columns of S are eigenvectors of A (column number k corresponds to the kth diagonal entry of D).r be distinct eigenvalues of A. then AN = (SDS-I)(SDS. . . Similarly Af AN = SD N S-I A~ =S o o and similarly for etA. . On the other hand if the matrix admits the representation a = SDS-I with a diagonal matrix D.. The following theorem is almost trivial. Let us call this matrix S. = SD N S-I .

. Proof For each eigenvalue k let vk be a corresponding eigenvector (just pick one eigenvector for each eigenvalue). we have the trivial linear combination.1. . If an operator A " V ~ V has exactly n = dim V distinct then it is diagonalizable.I are linearly independent. ...e.AkI).. . . Vp is linearly independent if and only if any system of non-zero vectors vk..2. Ak E cr(A). vn is linearly independent. . Vp is a basis if and only if it is generating and linearly independent.A/)Vr = 0 we get r-I L::Ck (Ak .+ crvr = L::CkVk = 0 k=I Applying A .. . Remark.. From the above definition one can immediately see that Theorem states in fact that the system of eigenspaces Ek of an operator A Ek := Ker(A . i... .. Suppose there exists a non-trivial linear combination r cIv I + c2v2 + .. . vr. . . ' is linearly independent.Ar )Vk = O... where vk E Vk. r .. is linearly independent. Since Ak"* Ar we can conclude that ck = 0 for k < r. p). ... . We saythatthe system of subspaces is a basis in V if any vector v E V admits a unique representation as a sum p v = v J + v2 + . .1. that a system of subspaces VI' V2. Then it follows from that cr = 0. Remark. .+ vp = 0.144 Introduction to Spectral Theory Suppose that the statement of the theorem is true for r .. Vp is generating (or complete. Vp be subspaces ofa vector space V. Corollary. vk E Vk has only trivial solution (vk = 0 Vk = 1. By Theorem the system vI' v 2. 2.. . .. We say that the system of subspaces VI' V2. k=I By the induction hypothesis vectors vI' v2. .. a system of subspaces VI' V2. or spanning) if any vector v E V admits representation. so ciAk-r) = 0 for k = 1.. Another way to phrase that is to say that a system of subspaces VI' V2.. and since it consists of exactly n = dim V vectors it is a basis. It is easy to see that similarly to the bases of vectors. .Vk E Vk· k=1 We also say.Ar I and using the fact that (A .. .. Bases of Subspaces (AKA Direct Sums of Subspaces) Let VI' V2.+ vp = L::Vk... Vp is linearly independent if the equation VI + v2 + .

. for example as follows: first list all vectors from B I . P such that the set Bk consists of vectors bj :} E A k . . n. i. . Proof To prove the theorem we will use the same notation as in the proof of Lemma.. . Instead of using two indices (one for the number k and the other for the number of a vector in Bk.. listing all vectors from Bp last. then all vectors in B2. we have cj = 0 for all} E A k' Since it is true for all A k' we can conclude that cj = 0 for all }. n into p subsets AI' A2. Proof The proof of the lemma is almost trivial. vn ' Split the set of indices I.. . . Then the union [kBk ofthese bases is a basis in V. . the system Bk) are linearly independent.. This way. and the set of indices {I. . Clearly the subspaces Vk form a basis of V.e. Ap ' and define subspaces Vk := span {Vj :} E A k }. let n be the number of vectors in B := [U~k' Let us order the set B. . let us use "flat" notation..'\" c ·b· = 0 b + . vk = 0 Vk..Introduction to Spectral Theory 145 There is a simple example of a basis of subspaces. we index all vectors in B by integers 1. 2.2. . Theorem.. 2. Vp be a basis of subspaces.+ vp = O.e. Than means that for every k L: Cij =0. Since vk E vk and the system of subspaces Vk is linearly independent. etc. Namely.... if one thinks a bit about it. the system Bk consists of vectors bi'} E A k' ... . JEAk and since the system of vectors bj :} E A k (i. .. .. n} splits into the sets 1.. Let V be a vector space with a basis vI' v2.2... Vp be a linearly independent family of subspaces.. To prove the theorem we need the following lemma.. Suppose we have a non-trivial linear combination n b . The following theorem shows that in the finite-dimensional case it is essentially the only possible example of a basis of subspaces. Let VI' V2' . The main diculty in writing the proof is a choice of a appropriate notation. .. Let VI' V2' .. and let us have in each subspace Vk a basis (of vectors) B.. and let us have in each subspace Vk a linearly independent system Bk of vectors 3 Then the union B "= U~k is a linearly independent system. . Lemma. + Cnn-~JJ c ibl + c22 J=I Denote Then can be rewritten as VI + v 2 + .

"" n is linearly independent. that any polynomial can be decomposed into the product of monomials.2.+ v = P = I:'>k.'Ai) = (AI . which means it is a basis. and therefore the same holds for the diagonalizable operators... ". Let us now prove the other implication.Ai) (i. Consider the matrix A= (~ r). Let Bk be a basis in Ek. Theorem. so it only remains to show that the system is compl~te. Vp is a basis. The subspaces Ek. either by working in a complex vector space. An} det(D . So.. Let AI' A2.At/) be the corresponding eigenspaces. Since the system of subspaces VI' V2.A)(A2 .e. Note. First of all let us mention a simple necessary condition. that for a diagonal matrix. An operator A : V ~ V is diagonalizable ifand only iffor each eigenvalue Il the dimension of the eigenspace Ker(A . We know that each Bk consists of dim Ei= mUltiplicity ofAk) vectors. . the geometric multiplicity) coincides with the algebraic multiplicity of Il. or simply assuming that a has exactly n = dim Veigenvalues(counting multiplicity).A). By Lemma the system B = [U~k is a linearly independent system of vectors. the algebraic and geometric multiplicities of eigenvalues coincide. vk E Vk' k=1 Since the vectors bpj E A k form a basis in Vk' the vectors vk can be represented as vk 2:: cjbj . "" Ap be eigenvalues of A.e.. Proof First of all let us note. jEA k and therefore v = 2::~={Jbj' Criterion of Diagonalizability. k = 1.146 Introduction to Spectral Theory Lemma asserts that the system of vectors b"j = 12. we see that if an operator A is diagonalizable. if we allow complex coefficients (i. we have a linearly independent system of dim V eigenvectors. A2. its characteristic polynomial splits into the product of monomials. complex eigenvalues). In what follows we always assume that the characteristic polynomial splits into the product of monomials. ".A) ".(An . Since for a diagonal matrix D = diag{A 1. So the number of vectors in b equal to the sum of multiplicities of eigenvectors k. and let Ek := Ker(A . which is exactly n = dim V. Some Examples Real eigenvalues. any vector v E V can be represented as p v = vIPI + v 2P2 + ". '''' pare linearly independent.

= 5 (1-5 2) (-4 2) A-51 = 8 1. Similarly. OT. for t.. . = -3 A-U=A+31=(: ~) and the eigenspace Ker(A + 31) is spanned by the vector (1.1~>-1=(1->-)2+22 and the eigenvalues (roots of the characteristic polynomial are t. = 5 and t. = 1 + 2i A. For = (-=-~ _~i) This matrix has rank 1.= 1-2i a corresponding eigenvector is (1 . So. Consider the matrix A=(b O· Its characteristic polynomial is 110>.. Since the matrix A is real.. = 1 ± 2i.5 = 8 -4 A basis in its nullspace consists of one vector (1. _2)T ... 2)T.1~>-I=(l_>-)2. we do not need to compute an eigenvector for t. so the eigenspace Ker(A . for example by (1.>-)2 -16 and its roots (eigenvalues) are t.AI) is spanned by one vector... For the eigenvalue t. Consider the matrix A=(l2 i)· Its characteristic polynomial is 1 1=2>.1~ >-1 = (1. for t. = 1-2i: we can get it for free by taking the complex conjugate of the above eigenvector. = -3.V t.-if . so this is the corresponding eigenvector.Introduction to Spectral Theory 147 Its characteristic polynomial is equal to 118>. and so the matrix A can be diagonalized as A=(1i -i1)(1+2i 0 )(1 1)-1 0 1. The matrix A can be diagonalized as A= (~ i) = (~ l2)(~ ~3)(~ 1 -2 )-1 Complex eigenvalues.2i i -i A non-diagonalizable matrix..

But. note that form a basis for JR l It follows that every U E JR2 can be written uniquely in the form u=c1v 1 +c2v2' where c I . Il~? ~ JR2 . Since AV=A(~ ~}=(~ ~}. so that Au =A(clv l + c2v2) = clAvI + c0 v2 = 2c l v I + 6c2v2 .1) = 1 (1 pivot. it would have a diagonal form (6 ~) in some basis. c2 E lR. y) = (s. There is also an explanation which does not use Theorem. so a is not diagonalizable. the two special vectors vI and v 2 and the two special numbers 2 and 6. Of). so 2 . it is easy to see that dimKer(A .148 Introduction to Spectral Theory so A has an eigenvalue 1 of mUltiplicity 2. Namely. we must have . the function f: JR2 ~ JR2 can be described easily in terms of . and so the dimension of the eigenspace wold be 2. Note that in this case. we got that the eigenspace Ker(A-I1) is one dimensional (spanned by the vector (1. We hope to find numbers A E JR and nonzero vectors v E JR2 such that G~)V=AV. (. Consider a function f: j(x. Therefore A cannot be diagonalized. Example.). t). Let us now examine how these special vectors and numbers arise. If A were diagonalizable. the geometric multiplicity of the eigenvalue 1 is different from its algebraic multiplicity.)=G Note that On the other hand.1 = 1 free variable). Therefore. y) E JR2 by where ~)(. dened for every (x.
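The failure of diagonalizability in the example above can also be detected numerically by comparing multiplicities: the geometric multiplicity of an eigenvalue λ is n minus the rank of A − λI. A sketch assuming Python with numpy:

import numpy as np

A = np.array([[1.0, 0.0],
              [2.0, 1.0]])            # characteristic polynomial (1 - lambda)^2
n = A.shape[0]
lam = 1.0                             # the only eigenvalue; algebraic multiplicity 2
geometric = n - np.linalg.matrix_rank(A - lam*np.eye(n))
print(geometric)                      # 1, which is less than 2, so A cannot be diagonalized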

) .Introduction to Spectral Theory 149 (G ~)-(~ ~)}= o. Substituting A = 2 into (1). Since vERn is non-zero. Hence (3 .)(5 . wiiliroot v.A. Then Av = AV = 'Alv. we must have al1 det -A a12 a21 a22.A. Suppose that A is an eigenvalue of the n x n matrix A. Solving this equation gives the eigenvalues of the matrix A. so that (A . 1 J= O. On the other hand. it follows that we must have det (A . with roots A. ( 1 5-A.AJ)v = O.A. the set n {v elR : (A-A1)v=O} . Suppose that ( a~1 . ann . A=: ani a~nJ : ann is an n x n matrix with entries in lR. Definition.A1) = O. and that v is an eigenvector corresponding to the eigenvalue A. In other words. and that v is an eigenvector corresponding to the eigenvalue A... v. t we obtain G!} = 0. willi root = 2 and A. In other words. 3 5 _ A.3 = 0. we obtain (~3 ~l}= 0. 3) v=O.A ani an2 =0. = (~J Substituting A = 6 into (1). Then we say that A is an eigenvalue of the matrix A. Suppose further that there exist a number E R and a non-zero vector v E]Rn such that Av = AV.A that is a polynomial equation.A. we must therefore ensure that det ( 3. we must have 3. =CJ. for any eigenvalue A of the matrix A. where I is the n x n identity matrix. In order to have non-zero v E]R2 .2 = 6.

ISO Introduction to Spectral Theory is the nullspace of the matrix A .. . the space is called the eigenspace corresponding to the eigenvalue A..A and 1. with corresponding eigenvectors respectively. The matrix G~) has characteristic polynomial (3 .2)(1. . 1. The eigenspace corresponding to the eigenvalue 2 is {VE~2:G :)V=+H~JCE~}. a subspace of ~ n • Definition. we need to nd the roots of -12 J 30 = 0. Hence the eigenvalues are Al = 2 and ~ = 6. The polynomial is called the characteristic polynomial of the matrix A..2 .1.3 30 . Consider the matrix -1 6 -12J A= 0 -1.81. with root vI -9 = (IJ ~ . . The eigenvalues are therefore Al = -1.3 = 5. For any root Aof equation. An eigenvector corresponding to the eigenvalue -1 is a solution of the system (A + f)v =(~ 6 -12 -12J ~~ v = 0.2 = 2 det 6 -1-1. (A + 1)(1.)(5 ... in other words...IJ.A) .. ( o -9 20-1. in other words. + 12 = O.3 = 0. ( o -9 20 • To find the eigenvalues of A. 0 -13 .. 1.5) = O. Example. Example. The eigenspace corresponding to the eigenvalue 6 is {VE~2 {~3 ~l)V=+H:}CE~}..

-30 20 10 3 0 ( Note that the eigenspace corresponding to the eigenvalue -3 is a line through the origin. 18 1 An eigenvector corresponding to the eigenvalue 5 is a solution of the system (A-5J)v= -6 6 -12) ( 1) 0 -18 30 v=O. ( 1) 3.151 Introduction to Spectral Theory An eigenvector corresponding to the eigenvalue 2 is a solution of the system -3 6 (o ~A . -2 An eigenvector corresponding to the eigenvalue 2 is a solution of the system 15 (A . and so form a basis for ]R3. with root vI ( -30 20 15 = Az = 2. Note also that the eigenvectors VI' v2 and v3 are linearly independent. we need to nd the roots of -5) 17-A -to 45 -28-A -15 = 0. withrootv3 = -5 . while the eigenspace corresponding to the eigenvalue 2 is a plane through the origin.2)2 = O. ( o -9 15 -3 Note that the three eigenspaces are all lines through the origin. The eigenvalues are therefore Al = -3 and An eigenvector corresponding to the eigenvalue -3 is a solution of the system det (A+31)v= 20 -to -5) 45 -25 -15 V=O. with root v2 = 2. with roots v2 = 0 and v3 = 3 . -30 20 12 To find the eigenvalues of A. Consider the matrix A= (~~ =~~ ~155). (A + 3)(A .2J)v = 0 -15 -9 (0) -12) 30 v = 0. . Note also that the eigenvectors vI' v2 and v3 are linearly independent. ( -30 20 12-A in other words. Example. and so form a basis for ]R3.2I)v = 45 -to -5) (1) (2) -30 -15 v = 0.

we need to find the roots of . and so the eigenspace corresponding to the eigenvalue 1 is of dimension 1 and so is also a line through the origin. OJ (0 0 3 A= 1 To find the eigenvalues of A. Consider the matrix 2 -1° 0.J)v= (i ~: ~}= 0. We can therefore only nd two linearly independent eigenvectors. An eigenvector corresponding to the eigenvalue 3 is a solution of the system -1~ ~-1 OJ~ v = 0.1)2 = 2-A 1 -1 0. Consider the matrix A= (~ =~ ~J. 0. On the other hand.152 Introduction to Spectral Theory Example. with root v2 =(i} Note that the eigenspace corresponding to the eigenvalue 3 is a line through the origin. the matrix (i ~: ~J has rank 2. ( (A . The eigenvalues are therefore Al = 3 and A2 = 1.3I)v = An eigenvector corresponding to the eigenvalue 1 is a solution of the system (A. with root vI = (OJ~. in other words. 1 -3 4 To find the eigenvalues of A.3)(A . so that ]R3 does not have a basis consisting of linearly independent eigenvectors of the matrix A. Example. we need to nd the roots of det ( °-A °° J= ° ° 3-A (A .

so that IR does not have a basis consisting of linearly independent eigenvectors of the matrix A. It follows that the eigenvalues of the matrix A are given by the entries on the diagonal.2)(A . it can be proved by induction that for every natural number kEN. -1 0 Note now that the matrix (i =~ ~J has rank 1. and so the eigenspace corresponding to the eigenvalue 2 is of dimension 2 and so is a plane through the origin. Then A 2v = A(Av) = A(AV) = A(Av) = A(AV) = A2V. (A . In fact. 003 To find the eigenvalues of A. Ak is an eigenvalue of the matrix Ak. with roots -3 2 vI = ( ~ J and v2= (~J. Consider the matrix ( ~ ~ :J.1)(A . 3-A in other words. with corresponding eigenvector v. (A .2)3 = O.Introduction to Spectral Theory det 153 3-A -3 2 1 -1. Suppose that A is an eigenvalue of a matrix A. 1 -3 4-A in other words. Hence A2 is an eigenvalue of the matrixA2.A 2 ( J= 0. this is true for all triangular matrices. with corresponding eigenvector v. The eigenvalue is therefore A = 2. In fact.3) = O. An eigenvector corresponding to the eigenvalue 2 is a solution of the system =~ ~Jv = 0. we need to find the roots of det(l ~A 2 ~ A o 0 : J= 0. Let us return to Examples are consider again the matrix . Example. Example. with corresponding eigenvector v. We can therefore only nd two linearly independent 3 eigenvectors. The Diagonalization Problem Example.

154 Introduction to Spectral Theory A~G !} We have already shown that the matrix A has eigenvalues Al = 2 and 11.:) respectively. with entries in R.2 = 6. Note that c E R2 is arbitrary. Suppose further that . with corresponding eigenvectors respectively.J=el :)(~ ~)(. This implies that (AP .PD)c = 0 for every c E R2.} Then both can be rewritten as (. Since the eigenvectors form a basis for uniquely in the form u = civ i + c2v2' where cl' c2 E R. then matrix become u = Pc and Au = P Dc respectively. Hence we must have AP = PD.}AU =(.u =(.)~(~l :)(~. Note here that AI P = (vI v2) and D = ( 0 Note also the crucial point that the eigenvectors of A form a basis for 1R 2 • We now consider the problem in general. Proposition.) ~ (~l :)(. Suppose that A is an nn matrix. Ifwe write p ~ (~l :) and D ~ (~ ~). we conclude that P-IAP=D. Since P is invertible. so that APc = P Dc. and ne. every u E R2 can be written Write c= (:J.:) and (.

not necessarily distinct. . so that every u E]Rn can be written uniquely in the form u= civ i + .PD)c = 0 for every cERn. with corresponding eigenvectors vI' ... vn ERn.. . An E R. so that APc = PDc. o -9 20 as in Example. and that vI' . We have P-IAP = D. + cnvn' where c I ' ...Introduction to Spectral Theory A has eigenvalues AI' 155 .. and Writing we see that both equations can be rewritten as U J . where p~(~ ~ ~~) D~(~l ~ and n . Vn are linearly independent. . they form a basis for ]Rn. Consider the matrix A = (~1 -~3 ~~2). Note that C E]Rn is arbitrary. This implies that (AP . Example. . Then p-IAP=D.. vn are linearly independent. = P = (AICI = Pc and Au : = PDc AnCn respectively. Since the columns of P are linearly independent. cn E]Rn .... where Proof Since VI' . Hence we must have AP = PD. Hence P-IAP = D as required.. it follows that P is invertible. ..

Example. Consider the matrix

A = ( 17  -10   -5 )
    ( 45  -28  -15 ),
    (-30   20   12 )

as in the Example above. We have P⁻¹AP = D, where

P = ( 1   1   2 )           (-3   0   0 )
    ( 3   0   3 )  and  D = ( 0   2   0 ).
    (-2   3   0 )           ( 0   0   2 )

Definition. Suppose that A is an n × n matrix, with entries in R. We say that A is diagonalizable if there exists an invertible matrix P, with entries in R, such that P⁻¹AP is a diagonal matrix, with entries in R. It follows from the Proposition above that an n × n matrix A with entries in R is diagonalizable if its eigenvectors form a basis for R^n. In the opposite direction, we establish the following result.
Proposition. Suppose that A is an n × n matrix, with entries in R. Suppose further that A is diagonalizable. Then A has n linearly independent eigenvectors in R^n.
Proof. Suppose that A is diagonalizable. Then there exists an invertible matrix P, with entries in R, such that D = P⁻¹AP is a diagonal matrix, with entries in R. Denote by v1, ..., vn the columns of P; in other words, write
P = (v1 ... vn).
Also write D = diag(λ1, ..., λn). Clearly we have AP = PD. It follows that
(Av1 ... Avn) = A(v1 ... vn) = (v1 ... vn) diag(λ1, ..., λn) = (λ1v1 ... λnvn).
Equating columns, we obtain
Av1 = λ1v1, ..., Avn = λnvn.
It follows that A has eigenvalues λ1, ..., λn ∈ R, with corresponding eigenvectors v1, ..., vn ∈ R^n. Since P is invertible and v1, ..., vn are the columns of P, it follows that the eigenvectors v1, ..., vn are linearly independent.
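This proposition is easy to test numerically. A minimal sketch, assuming Python with numpy and using the 3 × 3 matrix from the Example above (eigenvalues −3, 2, 2); np.linalg.eig returns the eigenvalues together with a matrix P whose columns are eigenvectors.

import numpy as np

A = np.array([[ 17.0, -10.0,  -5.0],
              [ 45.0, -28.0, -15.0],
              [-30.0,  20.0,  12.0]])
lam, P = np.linalg.eig(A)
D = np.diag(lam)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True: P^{-1} A P = D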


In view of these Propositions, the question of diagonalizing a matrix A with entries in R is reduced to one of linear independence of its eigenvectors.
Proposition. Suppose that A is an n × n matrix, with entries in R. Suppose further that A has distinct eigenvalues λ1, ..., λn ∈ R, with corresponding eigenvectors v1, ..., vn ∈ R^n. Then v1, ..., vn are linearly independent.
Proof. Suppose that v1, ..., vn are linearly dependent. Then there exist c1, ..., cn ∈ R, not all zero, such that
c1v1 + ... + cnvn = 0.
Then
A(c1v1 + ... + cnvn) = c1Av1 + ... + cnAvn = λ1c1v1 + ... + λncnvn = 0.
Since v1, ..., vn are all eigenvectors and hence non-zero, it follows that at least two of the numbers c1, ..., cn are non-zero, so that c1, ..., cn-1 are not all zero. Multiplying the first equation by λn and subtracting, we obtain
(λ1 − λn)c1v1 + ... + (λn-1 − λn)cn-1vn-1 = 0.
Note that since λ1, ..., λn are distinct, the numbers λ1 − λn, ..., λn-1 − λn are all nonzero. It follows that v1, ..., vn-1 are linearly dependent. To summarize, we can eliminate one eigenvector and the remaining ones are still linearly dependent. Repeating this argument a finite number of times, we arrive at a linearly dependent set of one eigenvector, clearly an absurdity.
We now summarize our discussion in this section.

Diagonalization Process. Suppose that A is an n × n matrix with entries in R.
(1) Determine whether the n roots of the characteristic polynomial det(A − λI) are real.
(2) If not, then A is not diagonalizable. If so, then find the eigenvectors corresponding to these eigenvalues. Determine whether we can find n linearly independent eigenvectors.
(3) If not, then A is not diagonalizable. If so, then write
P = (v1 ... vn) and D = diag(λ1, ..., λn),
where λ1, ..., λn ∈ R are the eigenvalues of A and where v1, ..., vn ∈ R^n are respectively their corresponding eigenvectors. Then P⁻¹AP = D.
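The three steps translate almost literally into code. The following is a rough sketch under the assumption that Python with numpy is available; the helper name diagonalize and the tolerance are choices of this sketch, not part of the text.

import numpy as np

def diagonalize(A, tol=1e-8):
    # Step (1): the eigenvalues are the roots of the characteristic polynomial.
    lam, V = np.linalg.eig(A)
    if np.max(np.abs(lam.imag)) > tol:
        return None                       # some roots are not real: not diagonalizable over R
    lam, V = lam.real, V.real
    # Step (2): check that the n eigenvectors are linearly independent.
    if np.linalg.matrix_rank(V, tol=tol) < A.shape[0]:
        return None                       # fewer than n independent eigenvectors
    # Step (3): P has the eigenvectors as columns, D the eigenvalues on the diagonal.
    return V, np.diag(lam)

A = np.array([[ 17.0, -10.0,  -5.0],
              [ 45.0, -28.0, -15.0],
              [-30.0,  20.0,  12.0]])
P, D = diagonalize(A)
print(np.allclose(np.linalg.inv(P) @ A @ P, D))           # True

print(diagonalize(np.array([[1.0, 0.0], [2.0, 1.0]])))    # None: the non-diagonalizable example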

Some Remarks
In all the examples we have discussed, we have chosen matrices A such that the
characteristic polynomial det(A -IJ) has only real roots. However, there are matrices A
where the characteristic polynomial has non-real roots. Ifwe permit AI' ... , An to take values


in C and permit "eigenvectors" to have entries in C, then we may be able to "diagonalize" the matrix A, using matrices P and D with entries in C. The details are similar.
Example. Consider the matrix

A = (1  -5)
    (1  -1).

To find the eigenvalues of A, we need to find the roots of

det (1-λ    -5 ) = 0;
    ( 1   -1-λ )

in other words, λ² + 4 = 0. Clearly there are no real roots, so the matrix A has no eigenvalues in R. Try to show, however, that the matrix A can be "diagonalized" to the matrix

D = (2i    0 )
    (0   -2i ).
We also state without proof the following useful result which will guarantee many
examples where the characteristic polynomial has only real roots.
Proposition. Suppose that A is an n x n matrix, with entries in 1R. Suppose further
thatA is symmetric. Then the characteristic polynomial det(A -IJ) has only real roots. We
conclude this section by discussing an application of diagonalization. We illustrate this by
an example.
Example. Consider the matrix

A = ( 17  -10   -5 )
    ( 45  -28  -15 ),
    (-30   20   12 )

as in the Example above. Suppose that we wish to calculate A^98. Note that P⁻¹AP = D, where

P = ( 1   1   2 )           (-3   0   0 )
    ( 3   0   3 )  and  D = ( 0   2   0 ).
    (-2   3   0 )           ( 0   0   2 )

It follows that A = PDP⁻¹, so that

A^98 = (PDP⁻¹)···(PDP⁻¹) = PD^98P⁻¹ = P ( 3^98    0      0   )
                                        (  0     2^98    0   ) P⁻¹,
                                        (  0      0     2^98 )

since (−3)^98 = 3^98. This is much simpler than calculating A^98 directly.
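The same computation can be carried out in a few lines; the sketch below assumes Python with numpy and uses the matrices P and D as reconstructed above.

import numpy as np

A = np.array([[ 17.0, -10.0,  -5.0],
              [ 45.0, -28.0, -15.0],
              [-30.0,  20.0,  12.0]])
P = np.array([[ 1.0, 1.0, 2.0],
              [ 3.0, 0.0, 3.0],
              [-2.0, 3.0, 0.0]])
D = np.diag([-3.0, 2.0, 2.0])

A98 = P @ np.linalg.matrix_power(D, 98) @ np.linalg.inv(P)    # P D^98 P^{-1}
print(np.allclose(A98, np.linalg.matrix_power(A, 98)))        # True, up to rounding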


An Application to Genetics
In this section, we discuss very briefly the problem of autosomal inheritance. Here we consider a set of two genes designated by G and g. Each member of the population inherits one from each parent, resulting in possible genotypes GG, Gg and gg. Furthermore, the gene G dominates the gene g, so that in the case of human eye colours, for example, people with genotype GG or Gg have brown eyes while people with genotype gg have blue eyes. It is also believed that each member of the population has equal probability of inheriting one or the other gene from each parent. The table below gives these probabilities in detail. Here the genotypes of the parents are listed on top, and the genotypes of the offspring are listed on the left.

        GG-GG   GG-Gg   GG-gg   Gg-Gg   Gg-gg   gg-gg
GG        1      1/2      0      1/4      0       0
Gg        0      1/2      1      1/2     1/2      0
gg        0       0       0      1/4     1/2      1
Example. Suppose that a plant breeder has a large population consisting of all three genotypes. At regular intervals, each plant he owns is fertilized with a plant known to have genotype GG, and is then disposed of and replaced by one of its offspring. We would like to study the distribution of the three genotypes after n rounds of fertilization and replacement, where n is an arbitrary positive integer. Suppose that GG(n), Gg(n) and gg(n) denote the proportion of each genotype after n rounds of fertilization and replacement, and that GG(0), Gg(0) and gg(0) denote the initial proportions. Then clearly we have
GG(n) + Gg(n) + gg(n) = 1 for every n = 0, 1, 2, ....
On the other hand, the left hand half of the table above shows that for every n = 1, 2, 3, ..., we have
GG(n) = GG(n-1) + (1/2) Gg(n-1),
Gg(n) = (1/2) Gg(n-1) + gg(n-1),
and
gg(n) = 0,
so that

(GG(n))   (1  1/2  0) (GG(n-1))
(Gg(n)) = (0  1/2  1) (Gg(n-1)).
(gg(n))   (0   0   0) (gg(n-1))


It follows that

(GG(n))       (GG(0))
(Gg(n)) = A^n (Gg(0))   for every n = 1, 2, 3, ...,
(gg(n))       (gg(0))

where the matrix

    (1  1/2  0)
A = (0  1/2  1)
    (0   0   0)

has eigenvalues λ1 = 1, λ2 = 0, λ3 = 1/2, with respective eigenvectors

v1 = (1, 0, 0)^T,   v2 = (1, -2, 1)^T,   v3 = (1, -1, 0)^T.

We therefore write

    (1   1   1)           (1   0    0 )                  (1   1   1)
P = (0  -2  -1)  and  D = (0   0    0 ),   with   P⁻¹ =  (0   0   1).
    (0   1   0)           (0   0   1/2)                  (0  -1  -2)

Then P⁻¹AP = D, so that A = PDP⁻¹, and so

                (1   1 - 1/2^n   1 - 1/2^(n-1))
A^n = PD^nP⁻¹ = (0     1/2^n       1/2^(n-1)  ).
                (0       0             0      )

It follows that

GG(n) = GG(0) + (1 - 1/2^n) Gg(0) + (1 - 1/2^(n-1)) gg(0) = 1 - Gg(0)/2^n - gg(0)/2^(n-1),
Gg(n) = Gg(0)/2^n + gg(0)/2^(n-1),
gg(n) = 0,

using GG(0) + Gg(0) + gg(0) = 1. This means that, as n grows, nearly the whole crop will have genotype GG.
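This limiting behaviour is easy to confirm with a short simulation. The sketch below assumes Python with numpy; the initial proportions are an arbitrary illustrative choice.

import numpy as np

A = np.array([[1.0, 0.5, 0.0],
              [0.0, 0.5, 1.0],
              [0.0, 0.0, 0.0]])       # transition matrix for repeated crossing with GG
x0 = np.array([0.2, 0.5, 0.3])        # hypothetical initial proportions GG(0), Gg(0), gg(0)

for n in (1, 5, 20):
    print(n, np.linalg.matrix_power(A, n) @ x0)   # the GG proportion tends to 1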

ln =yT X. y):= XIYI + XV'2 + . The dot product in IR3 was defined as x . y) oftwo vectors x = (xl' X2. Let us now define norm and inner product for en. the distance from its endpoint to the origin) by the Pythagorean rule. we have 1z 12 = x2 + y2 = z Z . Inner product and norm in en. Y = (Yl' Y2' . The complex space en is the most natural space from the point of view of spectral theory: even if one starts from a matrix with real coefficients (or operator on a real vectors space). n Similarly. + X. Note.x).. . xnl. and one needs to work in a complex space. to define the norm ofthe vector x ERn as I 2 2 2 IIxll=-vxI +X2 + . Y = x IY2 + xV'2 + xJY3' where x = (xl' X2' x3)T andy = (YI' Y2' Y3l.. If Z E en is given by .. for 3 example in lR the length of the vector is defined as I 2 2 2 II x 11= -V Xl + x2 + x3 .. that yT X = xT y. For a complex number z = X + iy. and we use the notation yT X only to be consistent. . we defined the length of a vector x (i.. in IR one can define the inner product (x. so II X 11= ~(x. the eigenvalues can be complex. It is natural to generalize this formula for all n. +xnThe word norm is used as a fancy replacement to the word length.. In dimensions 2 and 3.Chapter 6 Inner Product Spaces n Inner Product in IR and en Inner product and norm in IRn..e. ynl by (X... .
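The inner product and norm in C^n can be sketched in a couple of lines of Python with numpy. Note that np.vdot conjugates its first argument, so to match the convention (z, w) = z_1·conj(w_1) + ... + z_n·conj(w_n) used here, the arguments are passed in the order (w, z).

import numpy as np

z = np.array([1 + 2j, 3 - 1j])
w = np.array([2 - 1j, 1 + 1j])

inner = np.vdot(w, z)                     # (z, w) = sum of z_k * conj(w_k)
norm_z = np.sqrt(np.vdot(z, z).real)      # ||z||^2 = (z, z)
print(inner, norm_z, np.linalg.norm(z))   # the two norm computations agree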

x) ~ 4. The inner product we defined for ]Rn and en satisfies the following properties: 1.w)=z. this property z) + (y. w) = w*z is more widely used. Note. 2. Using the notion of A *. It is easy to see that one can define a different inner product in en such that II z Ib = (z. w)1 = Z I WI + Z 2 w 2 + . + Z n Wn = z*w. . so we will use it as well. note. x) (x. z). = 0 if and only if x = o. For a matrix A let us define its Hermitian adjoint. (Conjugate) symmetry: (x. that for a real matrix A..Inner Product Spaces 163 ~~ . One of the choices is to define (z. w)l = (w. Inner Product Spaces. 0 '\Ix. z and all scalars a. (x.WI +z2W2+ . While the second choice of the inner product looks more natural. A* = AT . (z. and then take the complex conjugate of each entry. but z*w and w*z are the only reasonable choices giving II z 112 = (z. z). w) by n (z. We did not specify what properties we want the inner product to satisfy. z). x). 3. y) = (y.. w) = w*z. z). or simply adjoint A* by A* = A-~ meaning that we take the transpose of the matrix. y.iYn ' it is natural to define its norm II z II by 2 n 2 2 n 2 II z II = L (Xk + Yk) = 2:) zk I . Remark. Non-negativity: (x. p.. one can write the inner product in en as (z. Note. Non-degeneracy: (x. k=l" k=l Let us try to define an inner product on en such that II z 112 = (z. the first one. z) = (x. To simplify the notation. y) = is just symmetry.. that the above two choices of the inner product are essentially equivalent: the only difference between them is notatioool.[:~ ! &~ z= z n . let us introduce a new notion. namely the inner product given by (z. because (z. Linearity: (ax + ay.xn . y). z) for all vector x.+znwn= I:>kWk> k=1 and that will be our definition of the inner product in en . that for a real space.

This definition works both for complex and real cases.x). it is easy to check that the properties. Example. y) = y*x = y * x defined above. Note that for a real space V we assume that (x. B) = trace (B*A). We already have an inner product (x. When we have somt! statement about the space F n . In the real case we only allow polynomials with real coefficients. its trace is defined as the sum of the diagonal entries. it n en. An inner product on V is a function. that assign to each pair of vectors x. j. Let V be ]Rn or en .e. denoted by (x. Again. and for a complex space the inner product (x.=1 xkYk This inner product is called the standard inner product in ]Rn or en We will use symbol F to denote both e and lR. and we do not need the complex conjugate here. i. Given an inner product space. y) can be complex. Y a scalar. For the space Mm x n of m x n matrices let us define the so-called Frobenius inner product by (A. y) such that the properties 1-4 from the previous section are satisfied. one defines the norm on it by II x II = ~(x. Note.k . y) is always real.. that we indeed defined an inner product. A space V together with an inner product on it is called an inner product space. Let V be the space Pn of polynomials of degree at most n.164 Inner Product Spaces Let V be a (complex or real) vector space. Define the inner product by (f. means the statement is true for both lR and Example. Let us recall.k so this inner product coincides with the standard inner product in e mn • . that the above properties 1-4 are satisfied.kBj. that for a square matrix A.k' k=l Example. = 2:. -1 It is easy to check. n trace A = 2:ak.g) = JI f(t)g(t)dt. that trace (B* A) = 2: Aj.
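The last remark can be verified directly: the Frobenius inner product (A, B) = trace(B*A) coincides with the standard inner product of the matrices read as vectors of length mn. A sketch assuming Python with numpy, with two small arbitrary complex matrices:

import numpy as np

A = np.array([[1 + 1j, 2.0], [0.0, 3 - 2j]])
B = np.array([[2.0, 1j], [1.0, 1.0]])

frobenius = np.trace(B.conj().T @ A)              # (A, B) = trace(B* A)
as_vectors = np.vdot(B.flatten(), A.flatten())    # standard inner product in C^(mn)
print(np.isclose(frobenius, as_vectors))          # True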

. z) \lz E V .y) + 0(x.x) = --- a(x. that properties 1 and 2 imply that 2 '. B : x ~ Y satisfo (Ax. The statements we get in this section are true for any inner product space. II x - ty 112 = (x - ty. x . so x = O. y) I ~ II x II . Let us consider the real case first.0z.ay +0z) = (ay-t. The following property relates the norm and the inner product. Then x = 0 if and only if (x. the transformations A and B coincide. (x. First of all let us notice.x) --- + 0(z. If y = 0. 0) = O. Theorem. Proof Since (0. Putting y = x in) we get(x. I (x. The following corollary is very simple. Then A = B Proof By the previous corollary (fix x and take all possible y's) we get Ax = Bx' Since this is true for all x E X. z).y~ 1.Inner Product Spaces 165 Properties of Inner Product. (x. z) = (y. (Cauchy-Schwarz inequality). but it gives a lot for the understanding. y) = (Bx. not only for Fn' To prove them we use only properties 1-4 of the inner product. this inequality should hold for t = becomes 2t(x. x) = 0. so we can assume that y O. (x. but will be used a lot Corollary. ay + ~z) = ~ (x. II y II· Proof The proofwe are going to present.x) = a(y. Suppose two operators A. is not the shortest one. x) = (x. Applying the above lemma to the difference x . y be vectors in an inner product space V.z) Note also that property 2 implies that for all vectors x (0. y) + i3 (x.x) + 0(z. and for this point the inequality II y II (x. Indeed.x) = a(y. Let x. y) \Ix E X. y) = 0 \ly E V. The equality x = y holds if and only if (x. y) + t 211 Y 112. By the properties of an inner product. \ly E V. Lemma.y) 2 2 lIyll . the statement is trivial.ty) = II x 112 - In particular. for all scalar t "* o ::.y we get the following Corollary. Let x be a vector in an inner product space V. y) = 0 we only need to show that implies x = O.

y)=. In 2-dimensional space this lemma relates sides of a parallelogram with its diagonals. we get II y 112 II Y 112 2 2 I(x. = (x. L::: . An immediate Corollary of the Cauchy-Schwarz Inequality is the following lemma. y) + I t 1211 Y 112. y in an inner product space II x + y II :::. x .y)=4"(llx+ yll 2 2 -llx-yll ) if V is a real inner product space. v II u + v 112 + II u . II y II = (II x II + II y 11)2. y) + (y. x) Substituting t t (x. that the above paragraph is in fact a complete formal proof of the theorem.t(y. x .v 112 = 2(11 u 112 + II v 11 2). al~+aYIl2 4 a=±l. II x 112 + II y 112 + 211 x II . The reasoning before that was only to explain why do we need to pick this particular value of t.ty. One is to replace x by ax. For x. There are several possible ways to treat the complex case.+i if Vis a complex space. The following polarization identities allow one to reconstruct the inner product from the norm: Lemma (Polarization identities). For any vectors u.ty) . where a is a complex constant. and then repeat the proof for the real case. which explains the name. I a I = 1 such that (ax.t(y. Lemma. x .ty) = (x. The lemma is proved by direct computation. (Triangle inequality). and (x.y) into this inequality. II x II . y) is real. Note. Another important property of the norm in an inner product space can be also checked by direct calculation.ty) = II x 112 . Lemma. II x .ty 112 = (x .!. II x II + II y II· Proof II x + y 112 = (x + y.II Y 112 which is the inequality we need. x) :::.y) + (x. The other possibility is again to consider o : :.y) I o :::. x + y) = II x 112 + II y 112 + (x.Inner Product Spaces 166 which is exactly the inequality we need to prove. (Parallelogram Identity). For any vectors x. y E V 1 (x. It is a well-known fact from planar geometry.

The triangle inequality for k . 2. Homogeneity: II v II = II . II v II for all vectors v and all scalars. and we will not present it here. the Parallelogram Identity. nor f" + Ix 2 f" + . kp even has special name: its called Minkowski inequality. For all other p the triangle inequality is true. given p. Non-degeneracy: II v II = 0 if and only if v = o. 2 .lI p for p * 2 cannot be obtained from an inner product. there are many other normed spaces. Any inner product space is a normed space.167 Inner Product Spaces NORMS Normed spaces We have proved before that the norm II v II satisfies the following properties: 1. and one can easily find a counter example in 1R . .lIp'p ~ 2. v) satisfies the above properties 1-4. For example. To check that 1I·lI p is indeed a norm one has to check that it satisfies all the above properties 1-4. after the German mathematician H.lIp'p ~ 2. It is easy to see that this norm is not obtained from the standard inner product in Rn (en). Note.+ II xn 1V'lllp ~ [~I r en by P xk I p One can also define the norm 11. n}. because the norm II v II = J(v. But we claim more! We claim that it is impossible to introduce an inner product which gives rise to the norm 1I. This statement is actually quite easy to prove.2.11"" (p = 00) by II x 1100 = max{1 xkl: k= 1. as the theorem below asserts completely characterizes norms obtained from an inner product. Then we say that the function v ~ II v II is a norm. II u + v II ~ II u II + II v II· Non-negativity: II v II ~ 0 for all vectors v. which then gives rise to a counter example in all other spaces. A vector space V equipped with a norm is called a normed space... The triangle inequality (property 2) is easy to check for P = 1 and p = 1 (and we proved it for p = 2). Triangle inequality: Suppose in a vector space V we assigned to each vector v a number II v II such that above properties 1-4 are satisfied. 1 < p < 00 one can define the norm II x lip ~ (I XI II . The norm 1I·ll p for P = 2 coincides with the regular norm obtained from the inner product.. .3 and 4 are very easy to check. that the norm 1I. 4. lip on lR. In fact.. but the proof is not so simple. It is easy to see that the Parallelogram Identity fails for the norm 1I. Minkowski. Properties 1. However. 3.

.2. Let E be spanned by vectors VI' v 2. 2. we call the system orthonormal. v k . Definition. Lemma. .. If. v to say that the vectors are orthogonal. in addition Itvk 11= 1 for all k. Two vectors u and v are called orthogonal (also perpendicular) if (u. so we do not present it here. u) = II U 112 + II v 112 ((u. v) = (v. r. 2. vk' k WEE = 1. that it satisfies alt the properties... u) = 0 because of orthogonality). Let VI' v2. .. (Generalized Pythagorean identity).e. .. . Note. "i/k = 1. The proof is straightforward computation. But.. but the proof is a bit too involved. The inverse implication is more complicated. In particular...e. Then * . We say that subspaces E and F are orthogonal if all vectors in E are orthogonal to F.e. A system of vectors v I' v2.v 112 = 2(11 u 112 + II v 112) "i/u. . v. i. Yr' Then v ? E if and only if v J. that for orthogonal vectors u and v we have the following Pythagorean identity: II u + v 112 = II U 112 + II v 112 if u 1.168 Inner Product Spaces Theorem. if(vj . Since the vectors vk span E.. . w. We say that a vector v is orthogonal to a subspace E if v is orthogonal to all vectors w in E. v E V.. we need to show that (x. If we are given a norm. Lemma. any vector can be represented as a linear combination Z=:=lakVk' Then so v 1. u + v) = (u. u) + (v. Proof By the definition. . It is indeed possible to check if the norm satisfies the parallelogram identity. let v J. r. .. if v 1. Definition. vk. . k = 1. and this norm came from an inner product... i. all vectors in E are orthogonal to all vectors in F The following lemma shows how to check that a vector is orthogonal to a subspace. v) + (v. vk) = 0 forj k). y) we got from the polarization identities is indeed an inner product. vn be an orthogonal system.. v) + (u.. We will write u 1. this inner product must be given by the polarization identities. E then v is orthogonal to all vectors in E.. A norm in a normed space is obtained from some inner product if and only if it satisfies the Parallelogram Identity II u + v 112 + II u . On the other hand. . v J. . r.. v) = O. ORTHOGONALITY Orthogonal and Orthonormal Bases Definition. vn is called orthogonal if any two vectors are orthogonal to each other (i. II u + v 112 = (u + v. then we do not have any choice..

+ anvn = I:cx jVj' j=1 Taking inner product of both sides of the equation with VI we get .. Then by the ~ 2 2 o=11 0 II 2 = L. It is clear that in dim V = n then any orthogonal system of n non-zero vectors is an orthogonal basis. to find coordinates of a vector in a basis one needs to solve a linear system. k=1 k=1 Corollary.169 Inner Product Spaces n 2 n 2 2 I:CXkVk = I:lcxk Illvk II k=1 k=1 This formula looks particularly simple for orthonormal systems. Proof of the Lemma.. for an orthogonal basis finding coordinates of a vector is much easier... An orthogonal (orthonormal) system VI' v2.tCXjVj] = ttakaj(Vk>Vj)' k=1 k=1 j=1 k=1 j=1 Because of orthogonality vk' v) = 0 if} ::j:. . . However. Any orthogonal system Vi' v2. As we studied before.. k=1 0) we conclude that a k = 0 Vk.. Orthogonal and Orthonormal Bases Definition.. ~2' Generalized Pythagorean identity . it always can be added to any orthogonal system. Since the zero vector 0 is orthogonal to everything. Namely.I CXk I II Vk II .. . .. Proof Suppose for some ai' . tCXkVk 2 = [tCXkVk. 0 (vk ::j:. In what follows we will usually mean by an orthogonal system an orthogonal system of non-zero vectors. vn which is also a basis is called an orthogonal (orthonormal) basis. and let n X = aiv i + 2v2 + . .. suppose VI' v2. Remark.. .. where II vk II = 1.. Therefore we only need to sum the terms with} = k. . an we have I::=I CXkVk = O. so only the trivial linear combination gives O. Since II vk II ::j:.Vk) = I:lak IlIvk II .. .. vn is an orthogonal basis. Vn of non-zero vectors is linearly independent. but it is really not interesting to consider orthogonal systems with zero vectors. which gives exactly n n 2 I:lcxk I 2 2 (Vk. k.

when II V k II = 1. vI) (XI = . . Definition. For a vector V its orthogonal projection P EV onto the subspace E is a vector w such that 1. the coordinates are determined by the formula. 2. one can introduce the following definition. E.2 · II vIII Similarly. This formula is especially simple for orthonormal bases. mUltiplying both sides by v k we get n (x. Let E be a subspace of an inner product space V. VI) = alii VI 112 j=1 (all inner products (Vj . How does one find it? We will show first that the projection is unique. VI) = I:>~ j (Vj . Vk) = L(X j(Vj' Vk) = (Xk(Vk' Vk) = ak II Vk II 2 j=1 so Therefore. proving its existence. so (x. wEE.170 Inner Product Spaces n (x. Orthogonal Projection and Gram-Schmidt Orthogonalization Recalling the definition of orthogonal projection from the classical planar (2dimensional) geometry.. Does the object exist? 2. VI) = al(v l . it is natural to ask: 1. VI) = 0 if} :t= 1).W 1. After introducing an object. The following theorem shows why the orthogonal projection is important and also proves that it is unique. v. to find coordinates of a vector in an orthogonal basis one does not need to solve a linear system. Then we present a method of finding the projection. Is the object unique? 3. We will use notation w = PEV for the orthogonal projection.

Let .w 112 + II y 112 ~ II v . en and lR. Let VI' v2. i.PE! + ~PsY.L E we have y .Inner Product Spaces 171 Theorem. Proof of Proposition.wand so by Pythagorean Theorem II v . ..e. Remark. II = II v . vr form an orthogonal basis in E. if x = w.x II· Moreover. . Proposition. from the definition and uniqueness of the orthogonal projection. Indeed. Recalling the definition of inner product in r PE = L 1 k=ll1 Vk II 2 * VkVk where columns VI' V 2' . The orthogonal projection w for all x E E II v .e. Proof Let y w =w- = PEV minimizes the distance from v to E. It is easy to see now from formula that the orthogonal projection P E is a linear transformation..w 112.. Then v-x=v-w+w-x=v-w+~ Since v . Then the orthogonal projection P EV of a vector v is given by the formula r PEv= LQ. One can also see linearity of P E directly.. this formula for an orthogonal system (not a basis) gives us a projection onto its span.e.kVk' where Q..w . iffor some x E E II v then x = v. II vk II ~ (v... none can get from the above formula the matrix of the orthogonal projection PE onto E in en (lR.J--2vk · k=III vk II Note that the formula for k coincides with..w II ~ II v .vk) PEV= L. n) is given by Remark.. it is easy to check that for any x and y the vector ax + ~y .. i.(aPeX . vr be an orthogonal basis in E.~P sY) is orthogonal to any vector in E. Note that equality happens only ify = 0 i.k = k=I In other words (V'Vk~.. x. The following proposition shows how to find an orthogonal projection if we know an orthogonal basis in E.L v .x II. .x 112 = II v . so by the definition PE(ax + ~y) = a.

. Step 2. v k) - L Ol /Vj' vk) j=I (V.. . Gram-Schmidt Orthogonalization Algorithm.X2 =X2 - (x2' VI) 2 VI' II VI II Define E2 = span {vI' v 2}· Note that span {xl'x 2} = E 2.. n.Xr+I . . . x n . we know how to perform orthogonal projection onto one-dimensional spaces.. .. . constructing an orthogonal system (consisting of non-zero vectors) VI' v 2.. Put VI :=x I ' Denote by EI := span{x I } = span{v I}. x r } = span {vI' V2...PE2 X3 = X3 - (x3' VI) (X3' V2) II vIII IIv211 2 VI - 2 V2 Put E3 := span {VI' V2. Computing the inner product we get for k = 1. Suppose we have a linearly independent system x I' X2. . .. vk) = (v.xr+I .... if we know an orthogonal basis in E we can find the orthogonal projection onto E. r W . . . 2. Note also thatx3 ~ E2 so O.172 Inner Product Spaces r W:L:>~kVk' where Olk= k=I (V'Vk~... The Gram-Schmidt method constructs from this system an orthogonal system vI' v 2' .Vk) 2 =(v. By Lemma it is sucient to show that v = 1._ _ ~ (Xr+I. . Define . .Vk) vr+I ..I Vk II .L vk' k r (v . v3 }· Note that span {xl' X2. vr such that Er := span{vl' v2' . . there exists a simple algorithm allowing one to get an orthogonal basis from a basis.vk)-ak(vk. 2.w. . Step r + 1. Define v3 by V3 := X3 . v r } Now let us describe the algorithm.W . vr } = span{x I. Suppose that we already made r steps of the process.. In particular. x r }.. vn such that span{xl' x 2. But how do we find an orthogonal projection if we are only given a basis in E? Fortunately.. v n }· Moreover.1 E. . . Step 1. .. Define v2 by V2 =X2 -PE. " xn} = span {vI' V2.. v k) - (w.. vk) = (v. X3 } v3 =1= = E 3. for all r $ n we get span {xl' x 2. II vk II We want to show that v . .PEr Xr+I .. Step 3. x 2. . .vk)=--2 I1vk II =0. since any system consisting of one vector is an orthogonal system...6 I 2 Vk k=I . IIvk II SO.

that xr+I e Er so vr+I '* 0. .ll. when performing the computations one may want to avoid fractional entries by multiplying a vector by the least common denominator of its entries. and we want to orthogonalize it by Gram-Schmidt. . Continuing this algorithm we get an orthogonal system vI' v2.173 Inner Product Spaces Note. Since the multiplication by a scalar does not change the orthogonality. Suppose we are given vectors xI = (1. in many theoretical constructions one normalizes vectors vk by dividing them by their respective norms II vk II. Thus one may want to replace the vector v 3 from the above example by (1. Then the resulting system will be orthonormal.11 ~ 112 = 3. I. vn .1. we get Finally. -2. ll. 1. and the formulas will look simpler.) ~ [[ ~Hl]l ~ 3. On the other hand. An example. 2l. ll· On the second step we get (x2' vI) v2 = x2 -PE1X 2 = x2 - 2 VI· II vI II Computing (x2' v... define v3 = x3 - PE2 X3 = x3 - (x3' vI) II vIII 2 vI - (x3' v2) II v211 2 v2· Computing (II VI 112 was already computed before) we get 1 vJ =[g]-~[l]-M -1]= -I 2 Remark.0. . X3 = (1. On the first step define vI =x l = (1. one can multiply vectors vk obtained by Gram-Schmidt by any non-zero numbers. X2 = (0. 2l. In particular.
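The algorithm is short enough to implement directly. The following sketch assumes Python with numpy and uses the vectors of the example above, x1 = (1,1,1)^T, x2 = (0,1,2)^T, x3 = (1,0,2)^T, as far as they can be read off; it reproduces v3 proportional to (1, -2, 1)^T.

import numpy as np

def gram_schmidt(xs):
    # v_{r+1} = x_{r+1} - sum over k of ((x_{r+1}, v_k) / ||v_k||^2) v_k
    vs = []
    for x in xs:
        v = x.astype(float)
        for u in vs:
            v = v - ((x @ u) / (u @ u)) * u
        vs.append(v)
    return vs

x1, x2, x3 = np.array([1.0, 1.0, 1.0]), np.array([0.0, 1.0, 2.0]), np.array([1.0, 0.0, 2.0])
v1, v2, v3 = gram_schmidt([x1, x2, x3])
print(v1, v2, v3)                    # v3 = (0.5, -1, 0.5), a multiple of (1, -2, 1)
print(v1 @ v2, v1 @ v3, v2 @ v3)     # all (numerically) zero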

174 Inner Product Spaces Orthogonal Complement. we get the so-called least square solution.. does not have a solution. which mean exactly that any vector admits the unique decomposition above.. Definition.. situations when we want to solve an equation that does not have a solution can appear naturally. For a subspace E its orthogonal complement E? is the set of all vectors orthogonal to E. By the definition of orthogonal projection any vector in an inner product space V admits a unique representation v = VI + V 2' VI E E. to minimizing the sum of squares of linear functions. := {x:x . For a subspace E Least Square Solution The equation Ax =b R an A...) . Y .jxj -bk k=I j=I i..L.. if we obtained the equation from an experiment. The term least square arises from the fact that minimizing II Ax . But. then there is no solution. Decomposition V = E $ E.b II is equivalent to minimizing 2 2 m 2 II Ax-b II = 2) (AX)k -bk I = k=I m n L: L:Ak. . so it is possible that an equation that in theory should be consistent.L E (eqv.L E. If x. in real life it is impossible to avoid errors in measurements. the system is consistent and we have exact solution. because if there is no solution. The following proposition gives an important property ofthe orthogonal complement.L E then for any linear combination ax + ~y . V2 E E1. is a subspace. V2 .e. Proposition. what one can do in this situation? Least square solution. Therefore E1. Ifwe do not have any errors. for example.. Otherwise.. But what do we do to solve an equation that has a solution if and only if b E does not have a solution? This seems to be a silly question... So.L E}. and equation is consistent.. E1. (where clearly VI = PEV)' This statement is often symbolically written as V = E $ E1. Ifwe can find x such that the error is 0. The simplest idea is to write down the error IIAx-b II and try to find x minimizing it. But.. the right side b belongs to the column space Ran A.

n.b II is minimal if and only if Ax = PRan Ab. . we can forget about absolute values. So. n. then the condition Ax 1.b II is exactly the distance from b to Ran A.vk) PRanAb= w 11 2vk • k=I 11 vk Ifwe only know a basis in Ran A. Ax is the orthogonal projection P Ra~b if and only if b Ax 1. then Ax = PRanAb. Ran A can be rewritten as b -Ax 1. theoretically. Ifwe are in ~n . . we need to use the Gram-Schmidt orthogonalization to obtain an orthogonal basis from it. Fortunately. That means 0= (b -Ax. and then mUltiply the solution by A. .e.Ax) Vk = 1. which gives us minimum. Namely. Then we can just take partial derivatives with respect to Xj and find the where all of them are 0. However. Note. together we get that these equations are equivalent to A*(b -Ax) = 0. the problem is solved. . If aI' a2. ak' '1k = 1. . where PRanA stands for the orthogonal projection onto the column space Ran A. which can be computationally intensive. so minimum of II Ax . there exists a simpler solution. to find the least square solution we simply need to solve the equation Ax = PRanAb.(b.2.. then Ax gives us all possible vectors in Ran A. to find the orthogonal projection of b onto the column space Ran A we need to solve the normal equation A *Ax = A *b. Namely.. so the orthogonal projection P~b can be computed as .. but the solution is not very simple: it involves Gram-Schmidt orthogonalization. As we already discussed above. ifx is a solution of the normal equation A * Ax = A * b (i. the solution of the normal equation A *Ax = A *b is given by x = (A*ArIA*b. a. we can find vector P Ra~b by the formula ~ (b. vn in Ran A. Ran A (Ax E Ran A for all x)... an are columns of A. So. Therefore the value of II Ax ..2.175 Inner Product Spaces There are several ways to find the least square solution.. ak) Joining rows = a. which in tum is equivalent to the so-called normal equation A * Ax =A * b. Formula for the orthogonal projection. there is a simpler way offinding the minimum. if we take all possible vectors x. that the least square solution is unique if and only if A * A is invertible. Ifwe know an orthogonal basis vI' v2. A solution of this equation gives us the least square solution of Ax = b. Normal equation. Geometric approach.• .. If the operator A *A is invertible.and everything is real. a least square solution of Ax = b). So. . .

4). Theorem. for the other one use the fact that 1/ Ax 1/ 2 = (Ax. n (note that here. is to minimize the total quadratic error n 2:1 a+bxk . Suppose our data (xk' Yk) consist of pairs (-2. Therefore Ker(A *A) = {O} if and only if rank A = n. 1). Suppose we run the experiment n times. according to the rank theorem KerA = {O} if and only rank A is n.. but not exactly on it.. . 2. That is where the least square solution helps! Ideally.Yk 2 1 • k=l But. Ideally.. line fitting. 1).176 Inner Product Spaces Since this is true for all b. all the points (xk' Yk) should be on a straight line. (0. and the unknowns are a and b). and we would like to find them from experimental data. (3. One of the inclusion is trivial. The following theorem implies that for an m x n matrix A the matrix A *A is invertible if and only if rank A = n.2). minimizing this error is exactly finding the least square solution of the system :. Let us introduce a few examples where the least square solution appears naturally. The coefficients a and b are unknown. . it usually does not happen: the point are usually close to some line. (-1. Example.j 1 '1 1 xn [bl ~ J~l Yn (recall. x). To prove the equality Ker A = Ker (A *A) one needs to prove two inclusions Ker(A *A) KerA and KerA Ker(A *A). If not. but because of errors in measurements. the coefficients a and b should satisfy the equations a + bxk = Yk' k = 1. and the unknowns are a and b). n.. (2. and we get n pairs (xk' Yk)' k = 1. . Since the matrix A *A is square. Suppose that we know that two quantities x and yare related by the law Y = a + bx. Then we need to find the least square solution of . PRa"A =A(A*ArIA* is the formula for the matrix of the orthogonal projection onto Ran A. the standard thing to do. For an m x n matrix A KerA = Ker(A*A). it is invertible if and only if rank A = n. xk andYk are some fixed numbers. If it is possible to find such a and b we are lucky. 1). . Indeed. 2. that xkYk are some given numbers. Ax) = (A *Ax. Example.

It can also be applied to more general curves. = yk. Then our unknowns a. An example: curve fitting. . . The solution of this equation is a so the best fitting straight line is = 2. suppose we know that the relation between x and y is given by the quadratic law y = a + bx + cx2' so we want to fit a parabola y = a + bx + cx 2 to the data. The general algorithm is as follows: 1. 3. . Find the least square solution of the system. b = -112.2. Examples.177 Inner Product Spaces 4 -1 0 2 3 [~] = 2 1 I 1 Then A*A=(_i 1 -2 -1 1 1 1 1) 1 0 -1 0 2 3 1 1 2 1 3 =(~ 1~) and 4 1 1 1 A*b=(_i -1 0 2 so the normal equation A *Ax = A *b 2 I 1 I j) =(-~) is rewritten as (~ ?8)(~) =(-~). as well as to surfaces in higher dimensions. in matrix form .. y = 2 . For example... Write these equations as a linear system. n or. Find the equations that your data should satisfy if there is exact fit. where unknowns are the parameters you want to find. Note. that the system need not to be consistent (and usually is not). Curves and Planes. c should satisfy the equations a + bXk + ex. b. The only constraint here is that the parameters we want to find be involved linearly.. 2. The least square method is not limited to the line fitting. k = 1.1I2x. .

let us fit a plane z = a + bx + cy to the data (xk'Yk' zk) E E]R3.'" n. for the data from the previous example we need to find the least square solution of 1 -2 4 4 1 -1 2 1 0 1 1 2 1 1 3 9 1 Then 1 -2 4 2 -1 1 1 1 1 -1 0 2 3 1 o 0 18 26 18 26 114 1 0 4 9 1 2 4 1 3 9 and 4 1 1 1 2 -1 0 2 1 1 0 4 1 1 Therefore the normal equation A *Ax = A*b is H~]= r <A*A=H A*b=H =p i]= IS] =[3H ~ 261~ 114i~][~l [-~l 31 [ 18 C = which has the unique solution a = 86/77.?l 1 X n X Yn n For example. The equations we should have in the case of exact fit are a + bXk + cYk = zk' k = 1. in the matrix form . . n. or.2. b = -62/77. Y = 86/77 .178 Inner Product Spaces : :. Plane fitting.. As another example. . :~ [~]=[. Therefore.. k = 1.62 x/77 + 43 x2/154 is the best fitting parabola.2. C = 43/154.

A *y) is often used as the definition of the adjoint operator.z. If [AlBA is the matrix of A with respect to these bases. Since for complex numbers z and W we have zw = . The following identity is the main property of adjoint matrix: I(Ax..w2' .. In other words. Uniqueness of the adjoint. Now. Let us recall that for transposed matrices we have the identity (AB)T = BT AT..I Before proving this identity. and therefore the matrices A and B coincide. y. Also.y) = (x.y E V. .. the first and the last equalities here follow from the definition of inner product in Fn . The above main identity (Ax. by the definition of A* (x.. Let us first notice that the adjoint operator is unique: if a matrix B satisfies (Ax.. the identity (AB)* = B*A* holds for the adjoint. since (AT l = A and z = ~. y) = y*Ax = (A*y)*x = (x. 'dy E W. and the middle one follows from the fact that (Ax) = x(A) = xA. where A : V ~ W is an operator acting from one inner product space to another.wm in W. then B = A *.Inner Product Spaces 179 So. we define the operator A * by defining its matrix [A *lAB as . (A*) =A* = A. y) = (x. we define A* : W ~ Vto be the operator satisfying (Ax. b. to find the best fitting plane. we need to find the best square solution ofthis system (the unknowns are a. vn in Vand B = w l . Indeed. Let as recall that for an m x n matrix A its Hermitian adjoint (or simply adjoint) A* is defined by A * : = AT . let us introduce some useful formulas. Namely. we are ready to prove the main identity: (Ax. A *y) can be used to define the adjoint operator in abstract setting. y) = (x. the linear transformations. Since it is true for all y. By) 'd x and therefore by Corollary A*y = By.A * y)\ix. By) 'd x. Adjoint transformation in abstract setting. the matrix A * is obtained from the transposed matrix AT by taking complex conjugate of each entry. y) = (x. Fundamental Subspaces Revisited Adjoint matrices and adjoint operators. A *y) 'dx E V. . y) = (x. A*y). Why does such an operator exists? We can simply construct it: consider orthonormal bases A = VI' v2. The above main identity (Ax.A *y) = (x. . c).

Useful formulas. Below we present the properties of the adjoint operators (matrices) that we will use a lot:
1. (A*)* = A;
2. (A + B)* = A* + B*;
3. (aA)* = conj(a) A*;
4. (AB)* = B*A*;
5. (y, Ax) = (A*y, x).

Relation Between Fundamental Subspaces

Theorem. Let A : V -> W be an operator acting from one inner product space to another. Then
1. Ker A* = (Ran A)⊥;
2. Ker A = (Ran A*)⊥;
3. Ran A = (Ker A*)⊥;
4. Ran A* = (Ker A)⊥.

Proof. First of all, let us notice that since for a subspace E we have (E⊥)⊥ = E, the statements 1 and 3 are equivalent; for the same reason, the statements 2 and 4 are equivalent as well. Moreover, statement 2 is exactly statement 1 applied to the operator A* (here we use the fact that (A*)* = A). So, to prove the theorem we only need to prove statement 1. We will present two proofs of this statement: a "matrix" proof and an "invariant", or "coordinate-free", one.

In the "matrix" proof we assume that A is an m x n matrix, i.e. that A : F^n -> F^m; the general case can always be reduced to this one by picking orthonormal bases in V and W and considering the matrix of A in these bases. Let a_1, a_2, ..., a_n be the columns of A. Then x belongs to (Ran A)⊥ if and only if x is orthogonal to every column, i.e. (x, a_k) = 0 for k = 1, 2, ..., n. By the definition of the inner product in F^n that means

    0 = (x, a_k) = a_k* x,  k = 1, 2, ..., n,

and since a_k* is row number k of A*, the above n equalities are equivalent to the equation A*x = 0. So we proved that x belongs to (Ran A)⊥ if and only if A*x = 0, and that is exactly statement 1.

Now let us present the "coordinate-free" proof. The inclusion x in (Ran A)⊥ means that x is orthogonal to all vectors of the form Ay, i.e. that (x, Ay) = 0 for all y. Since (x, Ay) = (A*x, y), the last identity is equivalent to (A*x, y) = 0 for all y, and by Lemma this happens if and only if A*x = 0. So again x belongs to (Ran A)⊥ if and only if A*x = 0, which is exactly the statement 1 of the theorem.
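The theorem can also be verified numerically: an orthonormal basis of Ran A must be orthogonal to every vector of Ker A*, and the dimensions of the two subspaces add up to the dimension of the target space. A sketch assuming NumPy and SciPy; the rank-deficient matrix below is just an illustration.

    import numpy as np
    from scipy.linalg import orth, null_space

    rng = np.random.default_rng(1)
    # A deliberately rank-deficient 5 x 3 matrix: the last column is the sum of the first two.
    B = rng.standard_normal((5, 2))
    A = np.column_stack([B, B[:, 0] + B[:, 1]])

    R = orth(A)                          # orthonormal basis of Ran A
    K = null_space(A.conj().T)           # orthonormal basis of Ker A*
    print(R.shape[1], K.shape[1])        # 2 and 3; together they span the 5-dimensional target
    print(np.allclose(R.T @ K, 0))       # every vector of Ker A* is orthogonal to Ran A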

The above theorem makes the structure of the operator A and the geometry of the fundamental subspaces much more transparent. It follows from this theorem that the operator A can be represented as a composition of the orthogonal projection onto Ran A* and an isomorphism from Ran A* to Ran A.

Isometries and Unitary Operators

Main definitions. Definition. An operator U : X -> Y is called an isometry if it preserves the norm,

    ||Ux|| = ||x||  for all x in X.

The following theorem shows that an isometry preserves the inner product.

Theorem. An operator U : X -> Y is an isometry if and only if it preserves the inner product, i.e. if and only if (x, y) = (Ux, Uy) for all x, y in X.

Proof. The proof uses the polarization identities. For example, for a real space X,

    (Ux, Uy) = (1/4)(||Ux + Uy||^2 - ||Ux - Uy||^2)
             = (1/4)(||U(x + y)||^2 - ||U(x - y)||^2)
             = (1/4)(||x + y||^2 - ||x - y||^2) = (x, y).

Similarly, if X is a complex space,

    (Ux, Uy) = (1/4) sum over a = ±1, ±i of a ||Ux + aUy||^2
             = (1/4) sum over a = ±1, ±i of a ||U(x + ay)||^2
             = (1/4) sum over a = ±1, ±i of a ||x + ay||^2 = (x, y).

So an isometry preserves the inner product; conversely, an operator that preserves the inner product clearly preserves the norm, so it is an isometry.

Lemma. An operator U : X -> Y is an isometry if and only if U*U = I.
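The complex polarization identity used in the proof can be sanity-checked numerically: the inner product is recovered from norms alone, and an isometry therefore preserves it. A sketch assuming NumPy; the random vectors and the unitary Q (taken from a QR factorization) are arbitrary illustrations.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    def ip(u, v):
        return np.vdot(v, u)             # (u, v) = sum of u_k conj(v_k)

    # Polarization: (x, y) = 1/4 * sum over a in {1, -1, i, -i} of a * ||x + a y||^2
    polar = sum(a * np.linalg.norm(x + a * y) ** 2 for a in (1, -1, 1j, -1j)) / 4
    print(np.isclose(polar, ip(x, y)))   # True

    # A unitary operator preserves the inner product.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    print(np.isclose(ip(Q @ x, Q @ y), ip(x, y)))   # True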

Proof of the Lemma. By the definitions of the isometry and of the adjoint operator,

    (x, y) = (Ux, Uy) = (U*Ux, y)  for all x, y in X,

and therefore by Corollary U*Ux = x. Since this holds for all x in X, we have U*U = I. On the other hand, if U*U = I, then for all x, y in X

    (Ux, Uy) = (U*Ux, y) = (x, y),

and therefore U is an isometry. The above lemma implies that an isometry is always left invertible (U* being a left inverse).

Definition. An isometry U : X -> Y is called a unitary operator if it is invertible.

Proposition. An isometry U : X -> Y is a unitary operator if and only if dim X = dim Y.

Proof. Since U is an isometry, it is left invertible. If dim X = dim Y, its matrix (with respect to orthonormal bases) is square, and a left invertible square matrix is invertible; therefore U is unitary. On the other hand, if U : X -> Y is invertible, it is an isomorphism, and isomorphic spaces have equal dimensions, so dim X = dim Y.

Unitary and orthogonal matrices. Definition. A square matrix U is called unitary if U*U = I. A unitary matrix with real entries is called an orthogonal matrix. In other words, a unitary matrix is a matrix of a unitary operator acting in F^n, and an orthogonal matrix is a matrix of a unitary operator acting in the real space R^n.

First of all, let us notice that a matrix U is an isometry if and only if its columns form an orthonormal system. This statement can be checked directly by computing the product U*U.

A few properties of unitary operators:
1. If U is unitary, it is invertible and U^{-1} = U*.
2. A product of unitary operators is a unitary operator as well.
3. If U is unitary, then U* is also unitary.
4. If U is an isometry and v_1, v_2, ..., v_n is an orthonormal basis, then Uv_1, Uv_2, ..., Uv_n is an orthonormal system; moreover, if U is unitary, then Uv_1, Uv_2, ..., Uv_n is an orthonormal basis.

Examples. It is easy to check that the columns of the rotation matrix

    [cos a  -sin a]
    [sin a   cos a]

are orthogonal to each other, and that each column has norm 1. Therefore, the rotation matrix is an isometry, and since it is square, it is unitary. Since all entries of the rotation matrix are real, it is an orthogonal matrix. The next example is more abstract.

X. 2. To prove statement 2 let us notice that if Ux = Ax then II Ux I A I = 1. The following proposition gives a way to construct a counterexample. Ux is an eigeri'vector of B. . + C. Since for a vector x = Cl xl + C(2 + . so Unitary Equivalent Operators Definition.. Define an operator U: X ~ Y by Uxk = Yk' k = 1.. Since det (U) = det(U) . Then BUx = UAU*Ux = UAx = U(Ax) = 'AUx~ i. Let now a has an orthogonal basis up u2. 2. .. So. Operators (matrices) a and b are called unitarily equivalent if there exists a unitary operator U such that A = UBU)*. I det U I = 1. Since U is unitary. for an orthogonal matrix det U = ±I. 11l2=II~Ckxk 112= ~ICk one can conclude that II Ux II = II X II for all x E 2 1 . . it is easy to construct a pair of similar matrices. . Ue n is an orthonormal basis. Proof LetA = UBU* and let Ax = Ax.l = if. Then I. n. the system Ue l .x... letD = UAU*. and let xl' x 2. that for an orthogonal matrix. Proof Let det U = z. Our old friend.. which are not unitarily equivalent.ff II X 112 = I cl12 + I c21 + . II x II. Dividing each vector Uk by its norm if necessary. . we can always assume that the system u 1l u2' .Ue2...e. Xn and yp Y2' .. so U is a unitary operator. Let U be a unitary matrix.. Properties of Unitary Operators Proposition.. so the vectors Ue k are eigenvectors of A. .. In particular. Yn be orthonormal bases inXand Yrespectively. Note.. i.183 Inner Product Spaces Y= n. . A matrix a is unitarily equivalent to a diagonal one if and only if it has an orthogonal (orthonormal) basis of eigenvectors. + I Cn I and II Ux 112 = u[~c. then I 'A I = 1 Remark. the rotation matrix gives an example.. . any fWO unitary equivalent matrices are similar as well. an eigenvalue (unlike the determinant) does not have to be real. . Statement 1 is proved. .. The converse is not true. un of eigenvectors. Proposition. so I det U I = I z I = 1. we have I z 12 = z z = det (U* U) = det I = 1. letA be unitarily equivalent to a diagonal matrix D. un is an .. The vectors ek of the standard basis are eigenvectors of D. II = II Ax II = I 'A I . If 'A is an eigenvalue of U.e... Since for a unitary U we have U.

Inner Product Spaces 184 orthonormal basis. there exists u E V such that u + (u) = O. The complex euclidean inner product of u and v is dened by u. we have (a + b)u = au + bu.. WE V. we have lu = u. un) and v = (vI' . Proposition. .. .. (SM2) For every C E and u. we have u + (v + w) = (u + v) + w. .v " = (I u l . (SM3) For every a. Then . V E V.. A complex vector space V is a set of objects. (VAS) For every u. b E C and u E V. we have cu E V.vn 12)112.. .. A = UDU*. V E V. The standard change of coordinate formula implies A = [A]ss = [1]SB [A]BB [1]BS = UDU. We begin by giving a reminder of the basics of complex vector spaces or vector spaces over Definition. + Un Vn . and the complex euclidean distance between u and v is dened by d(u. Denote by U the matrix with columns u I ' u2' . v = U I VI + . (SMl) For every C E and u E V. norm and distance. Suppose that u = (up .. v) = " u .vI 12 + . + 1un 12)112. Suppose that u. COMPLEX INNER PRODUCTS Our task in this section is to dene a suitable complex inner product.. un E e. v. U is unitary.. WEen and CE C. Corresponding to Proposition. Remark. (VA4) For every u E V. An example of a complex vector space is the euclidean space Cn consisting of all vectors of the form u = (u I ' ••• .. Definition. b E C and u E V. D is a diagonal matrix.. we have u + V= V + u. (VA3) There exists an element 0 E V such that for every u E V. we have (ab)u = a(bu). Subspaces of complex vector spaces can be dened in a similar way as for real vector spaces. un)' where u I ' . (SM4) For every a. the complex euclidean norm of u is dened by "u" = (u u)1I2 = (I u I I2 + . We shall first generalize e. e e the concept of dot product. we have u + 0 = 0 + u= u. known as vectors. vn) are vectors in e. rst developed for ~n in Chapter 9..I and since U is unitary.. un' Since the columns form an orthonormal basis. we have c(u + v) = cu + CV. v. un' Clearly. (SMS) For every u E V. + 1un .. we have the following result. together with vector addition + and mUltiplication of vectors by elements of and satisfying the following properties: (VAl) For every u.. (VA2) For every u. we have u + v E V. Let D be the matrix of A in the basis B = u I ' u2. . e. v E V. ..

the analogous roles are played by unitary matrices and hermitian matrices respectively. Definition.v)+(u. we mean a function (. Definition. orthogonal matrices and symmetric matrices play an important role in the orthogonal diagonalization problem. we have (u. c(u. Definition. v E V and c E C. v. For matrices with complex entries. Suppose further that the matrix A is obtained from the matrix A by replacing each entry ofA by its complex conjugate. The following definition is motivated by Proposition. By a complex inner product on V. Suppose that A is a matrix with complex entries. vi = hv.u)1I2 . ) : V x V ~ C which satises the following conditions: (lPl) For every u. the Gram Schmidt orthogonalization process.u) = 0 if and only if u = O. Suppose that u and v are vectors in a complex inner product space V .v).wehave (u.u) ~ 0. Then the matrix --I A=A is called the conjugate transpose of the matrix A. we can discuss orthogonality. and 0. v) =(cu. the results in Sections can be generalized to the case of complex inner product spaces. and (u. orthogonal and orthonormal bases. as well as orthogonal projections.vii. and u . (b) (A + B)* =A* + B*. v. WE V. Definition. (lP3) For every u. Suppose that A and B are matrices with complex entries. Using this inner product. u = 0 if and only if u = O. In particular. in a similar way as for real inner product spaces. we have hu. (lP2)Foreveryu.v+w)=(u. ui. Then the norm of u is defined by II u 1I=(u. . we have chu. Suppose that V is a complex vector space. Proposition. V E V. A complex vector space with an inner product is called a complex inner product space or a unitary space. Then (a) (A *)* = A.u. v= v.w). u ~ = (u = (c u) v) + (u w). and that c E C. and the distance between u and v is defined by d(u. Unitary Matrices For matrices with real entries. (b) u (v + w) (c) c(u v) (d) u. v) = IIu .185 Inner Product Spaces (a) u. (IP4) For every u E V.

First of all. we then need to discuss how we may nd a unitary matrix P to carry out the diagonalization. Remark. Corresponding to Propositions. Definition. We have indicated that a square matrix with real entries is orthogonally diagonalizable if and only if it is symmetric. it is not true that a square matrix with complex entries is unitarily diagonalizable if and only if it is hermitian. Unfortunately. Then it is unitarily diagonalizable if and only if it is normal. Corresponding to Proposition. In other words.186 Inner Product Spaces = cA*. A square matrix A with complex entries is said to be normal if AA * = A *A. The explanation is provided by the following. The most natural extension to the complex case is the following. Definition. there are unitarily diagonalizable matrices that are not hermitian.3. Proposition. Definition. For those" that are. Suppose that A is an n n matrix with complex entries. we now discuss the following unitary diagonalization problem. Note that every hermitian matrix is normal and every unitary matrix is normal. Then (a) A is unitary if and only if the row vectors of A form an orthonormal basis of en under the complex euclidean inner product. (c) (cA*) Definition. Then u l u2 = O. we would like to determine which matrices are unitarily diagonalizable. (b) A is unitary if and only if the column vectors of A form an orthonormal basis of en under the complex euclidean inner product. Suppose that u l and u2 are eigenvectors ofa normal matrix A with complex entries. Suppose that A is an n x n matrix with complex entries. these are dened as for the real case without any change. and (d) (AB)* = B*A*. eigenvectors of a normal matrix corresponding to distinct eigenvalues are orthogonal. While it is true that every hermitian matrix is unitarily diagonalizable. corresponding to distinct eigenvalues Al and ~ respectively. we study the question of eigenvalues and eigenvectors of a given matrix. we have the following results. We can now follow the procedure below. A square matrix A with complex entries is said to be hermitian if A = A. Proposition. As before. A square matrix A with complex entries and satisfying the condition A-I = A * is said to be a unitary matrix. we have the following result. . A square matrix A with complex entries is said to be unitarily diagonalizable if there exists a unitary matrix P with comp lex entries such that p-IAP = P *AP is a diagonal matrix with complex entries. Unitary Diagonalization Corresponding to the orthogonal disgonalization problem in Section 10. Proposition.

. Suppose that A is a hermitian matrix. we wish to approximatefby a polynomial . vn to obtain orthonormal eigenvectors wI' .. it follows that they are real. Suppose further that is an eigenvalue of A.. (3) Normalize the orthogonal eigenvectors vI' .. (2) Apply the Gram-Schmidt orthogonalization process to the eigenvectors u l . These form an orthonormal basis of en..Inner Product Spaces 187 Unitary Diagonalization Process... Proposition. Wn ) J. (1) Determine the n complex roots 11"'" In of the characteristic polynomial det(AII). . .. It is easy to prove that hermitian matrices must have real entries on the main diagonal. Since v*Av and v*v are 1 xl. 'Wn E en are respectively their orthogonalized and normalized eigenvectors. It follows that both v*Av and v*v are hermitian. . un to obtain orthogonal eigenvectors vI' . we obtain v*Av = V*AV = AV* v. We conclude this chapter by discussing the following important result which implies Proposition. Furthermore.. noting that eigenvectors corresponding to distinct eigenvalues are already orthogonal. and n linearly independent eigenvectors uI ' . . with corresponding eigenvector v. .. .. b] ~ lR. Wn of A.. un of A corresponding to these eigenvalues as in the Diagonalization process. Suppose that A is a normal n n matrix with complex entries. that all the eigenvalues of a symmetric real matrix are real. Multiplying on the left by the conjugate transpose v* of v.. Proof Suppose that A is a hermitian matrix. Then all the eigenvalues of A are real.. and D = AI ( An where AI' . . it suces to show that the 1 x 1 matrices v*Av and v*v both have real entries. To show that A is real.. Now (v*Av)* = v*A*(v*)* = v*Av and (v*v)* = v*(v*)* = v*v. write P = (w 1 .. Then Av = AV. Then P*AP = D. An E e are the eigenvalues of A and where wI' . Vn of A. APPLICATIONS OF REAL INNER PRODUCT SPACES Least Squares Approximation Given a continuous functionf: [a.

It follows from Pythagoras's theorem that II u . bJ. with inner product (f. b] ~ lR with real coecients and of degree at most k. projwU can be thought of as the vector in W closest to u. b].projwu . Suppose that {l'o' vI' .g.188 Inner Product Spaces g: [a. Proposition. Let V denote the vector space qa. vk } is an orthogonal basis of W= P k [a.w)1I 2 = II u . Alternatively. we have .W 112.W) + (projwu . b]. the distance from u to any W E W is minimized by the choice w = projwU. and that W is a finitedimensional subspace of V. b] of all continuous real valued functions on the closed interval [a. The purpose of this section is to study this problem using the theory of real inner product spaces. the inequality II u projwU II ~ II u w II holds for every w E W.g) = f: f(x)g(x)dx. b]. Note that W is essentially Pk' although the variable is restricted to the closed interval [a.g Ii. In view of Proposition IIA. b] be the collection of all polynomials g : [a.W 112 II u .gil· It follows that the least squares approximation problem is reduced to one of nding a suitable polynomial g to minimize the norm IIf.g) =11 f .f . so that II u . Suppose that V is a real inner product space. the orthogonal projection of u on the subspace W.. Our argument is underpinned by the following simple result in the theory. .W 112 = II(u . such that the error f b a I f(x) - 2 g(x) dx is minimized. This subspace is of dimension k + 1. b] ~ lR of degree at most k. Then b a f I f(x)g(x) 2 2 dx =(f .. Given any u E V.L and proj Wu WE W. Now let W = Pk [a.proj wU 112 = II proj wU .projwU 112 + II projwU . Then by Proposition. In other words.W 112 ~ 0: The result follows immediately. It is easy to show that W is a subspace of V. we conclude that g= projwf gives the best least squares approximation among polynomials in W = P k [a. Proof Note that u projwU E W.

. and W = PI [0.x-I) g= \ 1+ \ 2 (x-I). 1]. In this case.2].. x-I} of W. We now apply the Gram-Schmidt orthogonalization process to this basis to obtain an orthogonal basis {I.X-I) = f: x (x-I)dx = ~ and II x_II1 = (x-I.X-1I2)( IIlIT + II 1 X _11211 2 X -"2 . Consider the functionf(x) = eX in the interval [0. 3 It follows that 4 2 g=-+2(x-I)=2x--.g) = f~f(x)g(x)d I x.x-I) = 2 1o 2 2 (x-I) dx=-. 1].vo) g- II Vo II 2 VO (f. we can take V = C[O.Inner Product Spaces 189 _ (f. and W = PI [0. we can take V = qo.I) g Now so that = (e . Suppose that we wish to find a least squares approximation by a polynomial of degree at most 1. Consider the functionj{x) 2 Vk • = x 2 in the interval [0. 2]. and take li.I) li. and take 1) X (ex. with inner product (f. with basisfI' xg. We now apply the Gram-Schmidt orthogonalization process to this basis to obtain an orthogonal basis {I. 111112 IIx-III (i.vI) (f.1I2} of W. 1]. 3 3 Example.2]. x . with inner product (f.Vk) II VI II II Vk II +-2 VI + . with basis {I. . x}.g) = f: f(x)g(x)dx. In this case. while 2 2 2 (x .I)= f:idx=~ and 111112= f:dx=2. Suppose that we wish to nd a least squares approximation by a polynomial of degree at most 1. + Example.

IH. +7xi(XIX'{~ ~tJ Example. -coo 2 l) ifi> j. the quadratic form can be described in terms of a real symmetric matrix.w II is minimized by the unique choice w = proj wU. this is always possible... It can be written in the form 5x~ +6x1x. The expression 4x~ + 5x..x-f)= J. 2. It follows that the least squares approximation problem posed here has a unique solution. it is clear that II u .( X-Hdx= I~ It follows that g = (e-l)+(l8-6e)(x-.. note that given any quadratic form (1). + 2xlx2 + 4xlx3 + 6x2x3 = (xlx2 x3) 1 5 3 ( 2 3 3 x3 Note that in both examples. The expression 5x~ +6xlx2 +7x. From the proof of Proposition.dx= 1 and ~xMI' +-f. To see this. is a quadratic form in two variables xl and x 2 .j = 1.Xj' j=1 iSj.. .190 Inner Product Spaces Also 11111'= (1. .!. xn is an expression of the form n n LL .=1 Cijx. Quadratic Form A real quadratic form in n variables xl' . for every i.j = 1. . + 3x. + 3x. Example. . where cij E lR for every i. Remark. + 2x1x2 + 4xlx3 + 6x2x3 is a quadratic form in three variables xl' x 2 and x 3• It can be written in the form 4 1 2J (XIJ x2' 4x~ + 5x. In fact. . 1 1 -Coo 2 )1 ifi> j. n satisfying i < j.) =(l8-6e)x+(4e-1O). aij = Cij ifi = j. we can write. .. n.

The rst is the question of what conditions the matrix A must satisfy in order that the inequality x/Ax> 0 holds for every non-zero x E ]Rn. Proposition. A quadratic form x/Ax is said to be positive denite if x/Ax> 0 for every non-zero x E lR n • In this case. writing y= ptx. In other words. Definition. It follows that xtAx = xtPDptx . Since the matrix A is real and symmetric. Our strategy here is to prove Proposition by first studying our second question. such that Y the quadratic form x(1x can be represented in the alternative form y Dy. there exists an orthogonal matrix P and a diagonal matrix D such that PtAP = D. where D is a diagonal matrix with real entries. Many problems in mathematics can be studied using quadratic forms. it follows from Proposition lOE that it is orthogonally diagonalizable. Here we shall restrict our attention to two fundamental problems which are in fact related. . The second is the question of whether it is possible to have a change of variables of the type x = P ' where P is an invertible matrix. A quadratic form xtAx is positive denite if and only if all the eigenvalues of the symmetric matrix A are positive. and so A = P DPt. In this case. To answer our rst question. where A is an n x n real symmetric matrix and x takes values in ]Rn.Inner Product Spaces 191 Then We are interested in the case when xl"'" xn take real values. we say that the symmetric matrix A is a positive denite matrix. we shall prove the following result. and so. we can write It follows that a quadratic form can be written as x/Ax.

. where A=(~ ~ ~) ~d x=(~} The matrix A has eigenvalues I = 7 and (double root) 2 Furthermore. Furthermore. +4xlx2 + 2Xlx3 + 4x2x3. which is clearly positive defnite. the diagonal entries in the matrix D can be taken to be the eigenvalues of A. .J3 0 0 1 Writingy = pIX. we also have x = Py. Also.4xlx2 + 4x2x3. n E lR are the eigenvalues of A. Writing y=(n we have _1. so that D=(AI '.~ 1I~ -:. This cn be .J2 1/. This can bewritten in the form xlAx. Consider the quadratic from 5xJ + 6x~ + written in the form XlAx.192 Inner Product Spaces we have x'Ax =YDy. 1116 and -1I.~l D=(~ ~ ~). where = 3 = 1... + AnYn. + y. + 2x. Example.. P=[~. where yi . the quadratic form becomes 7xJ + y. see Example. Note now that x = 0 if and only if y = 0. This answers our second question. Consider the quadratic form 2xJ + 5x. since P is an invertible matrix.t 2 2 x:Ax = y Dy = AlYI + . since P is an orthogonal matrix. Example. J where AI' . in view of the Orthogonal diagonalization process. we have plAP = D.

we have PIAP = D. n]. we have f. For every /E E. Clearly this is equal to (x 1 + x 2)2 and is therefore not positive defnite. where (x) = 0 for every x E [-n. . Consider the quadratic form x~ + xi + 2xIX2. where P= ( 2/3 2/3 -113) 2/3 -113 2/3 and -113 2/3 2/3 D= (3 0 0) 0 6 o. The quadratic form can be written in the form xlAx. Then the following conditions hold: For every f. where A= G :) and x = [::j. let E E denote the function A: [-n. For every f. We further adopt the convention that any two functionsf. the quadratic form becomes 2y~ which is not positive denite. It follows from Proposition that the eigenvalues of A are not all positive. It is easy to check that E forms a real vector space. the matrix A has eigenvalues Al = 2 and A2 = 0. gEE. gEE. Real Fourier Series Let E denote the collection of all functions /: [-n. A2 = 6 and A3 = 9. Indeed. 1IJi -1IJi 0 Writing y = pIX. we have/ + (-j) = A. This means that any / E E has at most a finite number of po ints of discontinuity. we have / + (g + h) = if + g) + h.Inner Product Spaces 193 The matrix A has eigenvalues Al = A3. ateach of which / need not be dened but must have one sided limits which are finite. h E E. See Example. For every f. n] E lR. gEE are considered equal. we have/ + A= A+/= f. denoted by /= g. n] -7 lR which are piecewise continuous on the interval [-n. with corresponding eigenvectors (:) and (_:). furthermore. n]. the quadratic form becomes 3y~ + 6y. +9y. Hence we may take p=(1IJi 1IJi J and D=(2 00). More precisely. gEE. which is clearly positive definite. n] with at most a finite number of exceptions. if/ex) = g(x) for every x E [-n. Example. 0 0 9 Writingy= pIX. For every / E E. g. we have / + g = g +f.

we have as well as . 1t -1t The integral exists since the function j{x)g(x) is clearly piecewise continuous on [-1t'.g)+(f. We now give this vector space E more structure by introducing an inner product. For every a.194 Inner Product Spaces For every e E IR andfE E.cos 2x. we have If= f.f).and . (f.g) = (g. gEE. we have e (f. we have (f. For every e E IR andJ. e E IR andf E E. For every fEE. gEE and e E IR. gEE. Remark.m)x .COS3X. sin 3x. we have efE E. It is not straightforward to show that the set {. we have (a + b)f= af + bf. The diculty is to show that the set spans E. gEE.wehave (f. g. we have elf + g) = ef+ eg. For every J.cosx. we have (a + b)f= a(bf). It is easy to check that the elements in (4) form an orthonormal "system". kx' . For every a.f)~O. For every J. •• -} in E forms an orthonormal "basis" for E.k sin x. mEN. ForeveryfE E. For every k. we have (f. h E E. m . e E IR andfE E. 1t]. It is easy to check that the following conditions hold: For every J. kx'smmxdx =-1 .smmx) = -1 f1t sm (SIll 1t -1t 1t 12 . g). g) = (cf.g+ h) = (f.f)=O if and only iff=A.sin 2x.cos (k + m)x)dx = {I0 f1t -(cos(k -1t ifk=m if k:f.g) = !f1t f(x)g(x)dx. The diculty here is that the inner product space E is not finite-dimensional.h). For every f.(f. HenceEis a real inner product space.

On integrating by parts. Then a natural extension of Proposition 9H gives rise to the following: Every function J E E can be written uniquely in the form ao + t(ancosnx+bnsinnx) 2 n=l known usually as the (trigonometric) Fourier series of the function f. for every n E N. Consider the functionJ: [-x. with Fourier coecients Ji :n) !i:1t (J.cosmx) = lJ1t coskxcosmxdx = l X -1t X .(_[xcosnx]1t + [sin. = J (x)dx. smnx. for every n E N.cosnx) = lJ1t J(x)cosnxdx and x -1t bn = (J. we have 1t bn = lJ1t xsin nxdx = l r xsin nxdx. x]. a = lJ1t xcosnxdx = 0 n x-1t since the integrand is an odd function. Example.(sin(k-m)x-sin(k+ m)x)dx x -1t X -1t 2 Let us assume that we have established that the set (4) forms an orthonormal basis for E.:n)= ~=~. we have ~ 1R. On the other hand.!. an = (J. given by j{x) = x for every x E [-x.. we have bn =3.!. and. L 00 n=1 n .cosmx) =lJ1t sinkxcosmxdx =lJ1t .Inner Product Spaces 195 (coskx.(_[xcosnx]1t + i1tcosnx dxJ=3. x -1t x Jo since the integrand is an even function.x]1tJ= 2(-1)n+1 x noon X non 0 n We therefore have the (trigonometric) Fourier series 2( _l)n+1 .. x] For every n E N u {O}.sinnx) = lJ1t J(x)sinnxdx x -1t Note that the constant term in the Fourier series (5) is given by (f.(cos(k-m)x-cos(k+m)I~)dx = {I r J1t -1t 2 0 ? k =m if k '* m (sinkx.
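The coefficients of this Fourier series can also be approximated by numerical integration and compared with the closed form b_n = 2(-1)^(n+1)/n obtained above. A small sketch assuming NumPy; the size of the quadrature grid is arbitrary.

    import numpy as np

    xs = np.linspace(-np.pi, np.pi, 20001)
    f = xs                               # f(x) = x on [-pi, pi]

    for n in range(1, 6):
        an = np.trapz(f * np.cos(n * xs), xs) / np.pi
        bn = np.trapz(f * np.sin(n * xs), xs) / np.pi
        print(n, round(an, 6), round(bn, 6), round(2 * (-1) ** (n + 1) / n, 6))
    # a_n is numerically 0, and b_n matches 2(-1)^(n+1)/n.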

n] ~ 1R. dx . an = -1 f1t sgn~x n -1t 1t 0 since the integrand is an even function. 7t(2k -1) II odd Note that the functionfis even. we have ')'smnxdx -2 i1t smn.nxIJ { ~. we have bn = ~f1t I x I sinnxdx = 0 n -1t since the integrand is an odd function. Example. we have an 1t = ~f1t I x Icosnxdx = ~ ro xcoxnxdx.~ n-l ttn k=l 2 cos(2k-l)x. given for every x E [-n. nJ n -1t since the integrand is an even function. n] by + 1 if 0 < x ~ n. f(x) = sgn(x) = 0 if x = 0 { -1 if -n ~ x < 0. for every n EN. For every n E N u {O}. Clearly ao = ~f1t xdx = n. for every n EN.196 Inner Product Spaces Note that the functionfis odd. We therefore have the (trigonometric) Fourier series nf4 n~ 2 2 4 . n]. since the integrand is an odd function. Consider the function f: [-n. ifnisodd. 7t 0 Furthermore.~--2 cosnx=-. and this plays a crucial role in eschewing the Fourier coefficients bn corresponding to the odd part of the Fourier series. On the other hand. given by f(x) = Ix I for every x E [n. n] ~ 1R. and this plays a crucial role in eschewing the Fourier coefficients an corresponding to the even part of the Fourier series. for every n EN. On the other hand. we have an =~([xsinnx lJ1t _ J1tSinnxdxJ n 1t ~ ~ ([XSi:nx 0 n 0 o I +[cO:.x . =- if niseven. Consider the functionf: [-n.. on integrating by parts. It is easy to see that . Example. For every n E N u {O}we have an = ~f1t n -1t sgn(x) cos nxdx = 0.

n]-t lR. n=1 nn k=1 n(2k -1) 11: 0 L- -- L 00 00 II odd Example. . n n 2 an = - 2 =- 0 -It since the integrand is an even function. 3 11=1 4( _1)n 2 n cosnx.Inner Product Spaces bn = 197 _~[cosnx]lt ={ ~ m ifniseven ifn is odd. for every n EN. we have bn =-1 n flt x 2 sin nxdx =0 -It since the integrand is an odd function. Furthermore.)' On the other hand. n] For every n E N u {O} • we have 1 flt x cosnxdx 2i lt x cosnxdx. given by j(x) = x 2 for every x E [-n. on integrating by parts. we have !([ x' s~nnxI-f:2xs~nnx J !([ x' s~nnxI+[ 2X~SnxI -cc:~nx = an = = dx I dx J I ~([ x2s~nnx _[2X~~nx _[2X~~nx]J 4(:. Consider the function/: [-n. Clearly lt ao = ~ r idx = 3n n Jo 3 2 . for every n EN. nn We therefore have the (trigonometric) Fourier series 4 4 sinnx = sin(2k-l)x. We therefore have the (trigonometric) Fourier series n2 00 -+ L.

If dirnX = 1 the theorem is trivial. Theorem.1' We do care enough about the lower right (n . .. In other words. ..) be an eigenvalue of A. .1 u I· Denote E = .. Upper triangular (Schur) representation of an operator. vn be some orthonormal basis in E (clearly.. Note. In this eigenvector...1) block.I ). and T is an upper triangular matrix. . Let A. vn is an orthonormal basis in X. AU I basis the matrix of A has the form • o o here all entries below 1. so up v2.1. un in X such that the matrix ofA in this basi$ is upper triangular. 1. un) in which the matrix of A I is upper triangular.. and we want to prove it for dim X = n.Chapter 7 Structure of Operators in Inner Product Spaces In this chapter we again assuming that all spaces are finite-dimensional. any n x n matrix A can be represented as T = UTU*. Let A : X ~ X be an operator acting in a complex inner product space. and let up II u l II = 1 be a corresponding ut v = 1. . Proof We prove the theorem using the induction in dirnX. where U is a unitary.. . . There exists an orthonormal basis u l . and * means that we do not care what entries are in the first row right of 1.1. u2.and let 2' . . Suppose we proved that the theorem is true if dirnX = n . dimE = dim X-I = n..1 are zeroes..1) x (n . to give it name: we denote it as A I' . the induction hypothesis implies that there exists an orthonormal basis (let us denote is as u2. since any 1 x 1 matrix is upper triangular. and since dimE = n . that A I defines a linear transformation in E..
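Before the proof continues, it is worth noting that the upper triangular (Schur) representation is readily available numerically. The sketch below assumes SciPy and NumPy and uses a random matrix as an illustration; scipy.linalg.schur with output='complex' returns a unitary U and an upper triangular T with A = UTU*.

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))

    T, U = schur(A, output='complex')                # complex Schur form
    print(np.allclose(np.tril(T, -1), 0))            # T is upper triangular
    print(np.allclose(U @ T @ U.conj().T, A))        # A = U T U*
    print(np.allclose(U.conj().T @ U, np.eye(4)))    # U is unitary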

. u2.) a:. and let v 2. the matrices U and T can have complex entries. In this case the theorem claims that any operator have an upper triangUlar form in some basis. Therefore. Note. .Structure of Operators in Inner Product Spaces 199 So. Proof To prove the theorem we just need to analyse the proof of Theorem. Remark... then by the induction hypothesis there exists an orthonormal basis u2. -Sino. Suppose that all eigenvalues ofA are real. it is some operator constructed from A.1!: k7\ k E Z ( sma coso. Indeed.. that the version for inner product spaces is stronger than the one for the vector spaces. because it says that we always can find an orthonormal basis.. Then there exists an brthonormal basis u l .. that AE c E if and only if all entries denoted by * (i. that the subspace E = ut introduced in the proof is not invariant under A. .. un in X such that the matrix of A in this basis is upper triangular. a proof can be modeled after the proof of Theorem. An alternative way is to equip V with an inner product by fixing a basis in V and declaring it to be an orthonormal one. is not unitarily equivalent (not even similar) to a real upper triangular matrix. the inclusion AE c E does not necessarily holds.e. that even if we start from a real matrix A. In other words. matrix of A in the orthonormal basis u I' u2. . not just a basis. and the eigenvalues of an upper triangular matrix are its diagonal entries. where U is an orthogonal. . The following theorem is a real-valued version of Theorem Theorem. Note. Note. . where matrix A I is upper triangular. any real n x n matrix A can be represented as T = UTU* = UTUT. . vn is an orthonormal basis in lR n • The matrix of a in this basis has form equation. II ulll = 1 be a corresponding eigenvector. ..1) matrices. all entries in the first row... c?so. As in the proof of Theorem let 1 be a real eigenvalue of A. eigenvalues of this matrix are complex. and T is a real upper triangular matrices. Remark. The rotation matrix . the theorem is true for (n . Remark. un has the form).' . Indeed. .e. the matrix of a in this basis is upper triangular as well. Note also. That means that AI is not a part of A. u l E lR n . i. . vn be on orthonormal system (in ~ n) such that up v2. except AI) are zero. Let us assume (we can always do that without loss of generality. An analogue of Theorem can be stated and proved for an arbitrary vector space. Let A : X ~ X be an operator acting in a real inner product space. that the operator (matrix) A acts in lRn. then we are done.1) x (n . . un in E = ut . without requiring it to have an inner product. where A I is some real matrix. If we can prove that matrix Al has only real eigenvalues. Suppose.

Then all eigenvalues of A are real. Let A = A be a self-adjoint (and therefore square) matrix. Proposition.. a matrix satisfying A = a is called a Hermitian matrix. for example). Spectral Theorem for self-adjoint and normal operators. To show that A I has only real eigenvalues. and there exists and orthonormal basis of eigenvectors ofA in X This theorem can be restated in matrix form as follows Theorem. Let A = A * be a self-adjoint operator in an inner product space X (the space can be complex or real). Ax) = (x. so All x 112 = ~ II x 112. Theorem. Then. so is real. Since we usually do not distinguish between operators and their matrices. But a has only real eigenvalues! U2.200 Structure of Operators in Inner Product Spaces such that the matrix of A I in this basis is upper triangular. Now let us ask ourself a question: What upper triangular matrices are self-adjoint? The answer is immediate: an upper triangular matrix is self-adjoint if and only if it is a diagonal matrix with real entries. (Ax. if the matrix A is real. Av = Av. Moreover. . x) = (x. Let A = A * be a self-adjoint operator.l. we will use both terms. we can conclude A = I. x) = II x 112. and let u. x) = (x. Au = AU. It also follows from Theorem that eigenspaces of a self-adjoint operator are orthogonal. orthogonal). On the other hand.A) (take the cofactor expansion in the first.e. Theorem is proved.A) det(A I . i.e. A matrix about of a selfadjoint operator (in some orthonormal basis). so the matrix of a in the basis UI ' un is also upper triangular. the eigenvectors u and v are orthogonal. row. let us notice that det(A -'JJ) = (AI . Then (Ax. where U is a unitary matrix and D is a diagonal matrix with real entries. Let us recall that an operator is called self-adjoint if A = A *. Ax) = ~ (x. v be its eigenvectors. Let us give an alternative proof of this result.. Let A = A* and Ax = Ax. x '* O. Then A can be represented as A= UDU. A *x) = (x. matrix U can be chosen to be real (i. . and so any eigenvalue of A I is also an eigenvalue of A. Proof To prove Theorems let us first apply Theorem if X is a real space) to find an orthonormal basis in X such that the matrix of a in this basis is upper triangular. Lei us give an independent proof to the fact that eigenvalues of a selfadjoint operators are real. if A '* j. Since Ilx116= 0 (x '* 0). In this section we deal with matrices (operators) which are unitarily equivalent to diagonal matrices. x) = ~ II x 11 2 . . x) = (x.
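Numerically, the spectral theorem for self-adjoint matrices corresponds to the symmetric (Hermitian) eigensolver: the eigenvalues come out real and the eigenvectors orthonormal. A sketch assuming NumPy; the symmetric matrix is a random illustration.

    import numpy as np

    rng = np.random.default_rng(4)
    B = rng.standard_normal((4, 4))
    A = (B + B.T) / 2                        # a real self-adjoint (symmetric) matrix

    d, U = np.linalg.eigh(A)                 # real eigenvalues, orthonormal eigenvectors
    D = np.diag(d)
    print(np.allclose(U @ D @ U.T, A))       # A = U D U*
    print(np.allclose(U.T @ U, np.eye(4)))   # the columns of U form an orthonormal basis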

Now let us try to find what matrices are unitarily equivalent to a diagonal one. v) = /leU.. so 'A. Remark. .1) upper triangular normal matrix is diagonal. if U is a unitary operator acting from one space to another. v). On the other hand (Au. that in the above theorem even if N is a real matrix. any self-adjoint operator (AA * = AA *) is normal. An operator (matrix) N is called normal if N* N = NN. Moreover. that if D is real. Any normal operator N in a complex vector space has an orthonormal basis of eigenvectors. We can write it as *" N= aJ'J 0 aJ'2 . we cannot say that U is normal. We will prove this using induction in the dimension of matrix. Note. and we want to prove it for n x n matrices. since any 1 x 1 matrix is diagonal. A *v) = (u. v) = leu. (Au. and D is a diagonal one. If 'A. In other words. Note. v) = /leU. /lv) = iI (u.1) x (n . /l it is possible only if (u. Theorem. Clearly. any matrix N satisfying N*N = NN* can be represented as N= UDU*. Proof To prove Theorem we apply Theorem to get an orthonormal basis. but here we are giving a direct proof. such that the matrix of N in this basis is upper triangular. v) (the last equality holds because eigenvalues of a self-adjoint operator are real). v) = (u. we did not claim that matrices U and D are real. v) = (lu. To complete the proof of the theorem we only need to show that an upper triangular normal matrix must be diagonal.Structure of Operators in Inner Product Spaces 201 Proof This proposition follows from the spectral theorem. it can be easily shown.1) x (n - 1) matrix. The case of 1 x 1 matrix is trivial.(u. Let N be n x n upper triangular normal matrix. N must be self-adjoint. not from one space to another. Av) = (u. So. Therefore AA = AA if the matrix of a in some orthonormal basis is diagonal. where U is a unitary matrix. v) = o.. aJ'n NJ 0 where Nt is an upper triangular (n . Also. Definition. Namely. that a normal operator is an operator acting in one space. any unitary operator U: X ~ X is normal since U*U = UU* = 1. It is easy to check that for a diagonal matrix D D*D=DD*. Suppose we have proved that any (n . v).

An operator N : X ~ X is normal if and only if II Nx II = II N*x II \. y) = (Nx. and by the induction hypothesis it is diagonal. + I al'n 12.fx E X . Ny) = L a II Nx+aNy 112 a=±).) = (NN»).) 12 + I a)'21 2 + . So.) = I a). So the matrix N is also diagonal.±i = L a=±l.n = O.. Direct computation shows that that and (NN»). x) = (N*x. .Nx) = (N*Nx. N*y) = (NN* x.±i and therefore N*N = NN*.N*x) = II Nx 112 so II Nx II = II N*x II· Now let II Nx II = II N*x II \.±i a=±l. y) a=±l. Proof Let N be normal.Structure of Operators in Inner Product Spaces 202 Let us compare upper left entries (first row first column) of N*N and NN*. Then II Nx 112 = (Nx.. = a). the matrix N has the form o a)..) if and only if a).2 = . (N*N»).Ii 2 o N*N= o o 0 .) 0 0 N= N) 0 It follows from the above representation that Ia).. NN*= o o so N. . That means the matrix N) is also normal.fx E X. N*N = NN*.. Therefore. Nl o = Nl N. x) = (NN*x. The following proposition gives a very useful characterization of normal operators. Proposition.±i a II N*(x+aN* y) 112 = (N*x. y E x (N*N x. The Polarization Identities imply that for all x.

Corollary. (A*A*) =A*A** =A*A and ~ d~agonal form Y. To prove that such B is unique. the matrix of C has the diag{A. B = C.J'.... Proof Pick an orthonormal basis such that matrix of a in this basis is diagonal.. Indeed. In the basis VI' v 2.ln be the corresponding eigenvalues (note that Uk ~ OVk). . a self adjoint operator A : X ~ X is called positive definite if (Ax. A > 0 if and only if all eigenvalues of A are positive...e.. . Theorem. . . Let u l ' u2. An be the corresponding eigenvalues.. . vn be an orthonormal basis of eigenvectors of A. un be an orthonormal basis of eigenvalues of e.l~. Consider an operator A : X square A *A is a positive semidefinite operator acting in X... . ~n} and therefore the matrix of A ::: C2 in the same basis is diag {~~ . .Clearly.B=B andB2=A.. The following theorem describes positive definite and semidefinite operators. J. ~.. Such B is called (positive) square root ofA and is denoted asJA or Proof Let us prove that JA exists.. vn the matrix of a is a diagonal matrix diag{l. There exists a unique positive semidefinite operator B such that B2 = A il2. i.. ex = . . 2. . Let A = A * ~ 0 be a positive semidefinite operator. An on the diagonal. that since A ~ 0.Structure of Operators in Inner Product Spaces 203 Polar and Singular Value Decompositions Definition. A A 0 if and only if all eigenvalues ofA are non-negative. n} with entries A]. un is a diagonal one diag {~]' ~. 2. .. To finish the proof it remains to notice that a diagonal matrix is positive definite (positive semidefinite) if and only if all its diagonal entries are positive (non-negative).~~ . ..t: 0" and it is called positive semidefinite if (Ax.Fz . . ..x.. and let AI' ~. . and let J. if Ax = Ax... Singular values. Detine the matrix of B in the same basis as diag {. then Therefore in the basis vI' v2' . . . Note. Modulus ofan operator. and A ~ 0 for positive semi-definite. The matrix of C in the basis ul' u2. vn above.... Its Hermitian .. and.. . all Ak ~ o. Let a = A *.Fn}. moreover. Fn}.. ~~} This implies that any eigenvalue A of A is of form J.. We will use the notation A> 0 for positive definite operators.. . x) ~ 0 "Ix EX. Let VI' v2. Then 1. x) > 0 Vx:. let us suppose that there exists an operator C = C* ~ 0 such that e2 = A. .J>:. ..ll' J.l2' .A.

Theorem. x) = (Ax. But first we need to prove that Vo is well defined. Let VI be another vector such that x = I A Iv I · But x = I A I v = I A I VI means that v . Proof Consider a vector x E Ran IA I. where. IA I x) = CI A I * I A = (A*Ax. By Proposition II Vox II = II A v II = III A I v II = II x II so it looks like V is an isometry from Ran I A I to X. the second one follows from the identity Ker T = (Ran T*). Let A : X ~ X be an operator (square matrix).l (j A I is self-adjoint). x) 1 Corollary. (Polar decomposition of an operator).J Therefore. x) = (I A 2x. This operator R is called the modulus of the operator A. Remark. Ax) = II Ax 112 lx. .204 Structure of Operators in Inner Product Spaces (A *Ax. x) = (Ax. Ax) = II Ax 112 ~ 0 Vx EX. Then vector x can be represented as x = IAlv for some vector v E X. Ker A = Ker I A I = (Ran I A I) 1. But in this case we can only guarantee that V is an isometry from Ran i A I = {KerA) 1- to Y. there exists a (unique) positive-semidefinite square root R = A * A . As one will see from the proof of the theorem. V is unique if and only if a is invertible. If dim X :S. Define Vo x := Av. . The polar decomposition A = VI A I also holds for operators A : X --7 Yacting from one space to another. By the construction A = VolA I.• Proof The first equality follows immediately from Proposition. For a linear operator A:X--7Y III A I x II = II Ax II Vx EX· Proof For any x E X III A I x 112 = CI A I x.v I E Ker I A I = Ker A so meaning that Vo x is well defined. The unitary operator V is generally not unique.V is a unitary operator. Remark. Then A can be represented as A=VIAI. The modulus ofA shows how "big" the operator A is: Proposition. dim Y this isometry can be extended to the isometry from the whole X to Y (if dim X = dim Y this will be a unitary operator). and is often denoted as I A I.

Consider an operator A : X ~ Y.. . . . . an be the singular values of a counting mUltiplicities. . It is always possible to do this. vk)wk' k=I Indeed. a r are the non-zero singular values of A. .. The system 1 . vk) = i*k 0 a~(vj' vk) = { . . a j' J.. the operator a can be represented as or. = a J·w·vJv· "akwkvk L... . = Ker A*. r is an orthonormal system. if AI' A2..J ·f· I J = 1. In other words.. It is easy to check that V = Vo + VI is a unitary operator. r. In the notation of the above proposition... Proof (Avj .. equivalently r Ax = L ak(x. . An are eigenvalues of A*A.Structure of Operators in Inner Product Spaces 205 To extend Vo to a unitary operator V. . Then r * * vJ.. That means ak = 0 for k> r. ... wk := ak AVk' k = 1. we know that vI' V 2. counting multiplicities. Assume also that a l' a 2. and let aI' a 2.J J J k=I = a J·w·J = Av·.. .. =k since vI' v 2. . .. . and let vI' V 2. .. A *Avk = ai vk Proposition.. vr is an orthonormal system. vn is an orthonormal basis in X.a~ are eigenvalues of A*A. let us find some unitary transformation VI: Ker A ~ (RanA)l.a~. and that A=~AI· Singular Values Eigenvalues ofl A 1 are called the singular values of A. since for square matrices dim KerA = dim Ker A * (the Rank Theorem).. Avk) = (A *Avj . . 2. By the definition of singular values a~. then are singular values of A. .2 . Vn be an orthonormal basisof eigenvectors of A *A.

* WkWj=(Wj. Singular value decomposition is not unique. vr is an orthonormal system r A* AVj 2 * 2 = LOkVkVkVj =OjVj k=1 thus vk are eigenvectors of A *A. Let a can be represented as * r A = LOkWkVk ° k=1 where ok> and v\... .. Why? Lemma. . It is especially easy to do if the operator .. V2. Definition Remark. k=1 So the operators in the left and right sides of equation coincide on the basis v I' V2. v2. A*A vk = O~Vk.. The singular value decomposition can be written in a nice matrix form. Vr' W I 'W2' . . . wr are some orthonormal systems. vr is an orthonormal system. Proof We only need to show that vk are eigenvalues of A* A.. j ~k ·-k ] . 1.wk)=B kj := {a. Then this representation gives a singular value decomposition ofA..•. . k=1 Since vI' v2. . . Let r * A= LOkWkVk k=1 be a singular value decomposition ofA..206 Structure of Operators in Inner Product Spaces and * r LOk WkVkVj = 0= AVj forj> r. Corollary. and therefore * 2 r A * A = LOkVkVk. Since v\. vn' so they are equal. . Then r * A= LOkVkWk k=1 is a singular value decomposition ofA * Matrix representation of the singular value decomposition.. .

.w2' . that if we know the singular value decomposition A = W LV * of a square matrix A. v2' . To represent A as WL V let us complete the systems {Vd~=I' {wd:=1 to orthonormal bases..• . r < n be non-zero singular values of A.wn respectively. and the operator A has n non-zero singular values (counting multiplicitie ).w2' . and V.. We will use this interpretation later... Then v\. ar' 0. ... Note.. . It can be rewritten as = W L V*. Namely.. A = WLV* so I A I = V LV * and General matrix form of the singular value decomposition. wn are orthonormal bases in x and Y respectively... .. From singular value decomposition to the polar decomposition. ... to say that L is the matrix of A in the (orthonormal) bases A = vI' v2. Let * n A = Lakwkvk k=1 be a singular value decomposition of A.. Remark. vn and B:= w l . an} and V and Ware unitary matrices with columns v\' v2' . .. Namely... Such representation can be written even if A is not invertible. Ware n x n vn and wI. . where L is n x n diagonal matrix diag {aI' .e. . if LV * is . a. a 2.wn be an orthonormal bases in Ker A = Ker IA I and (Ran A)... and let a I' a 2.. .wn respectively. vn and wp w2. let vr+ 1. wn' i. so the singular value decomposition has the form * n A= Lakwkvk k=1 where vI' v2.... vn and wl'w2... . A is an m x n matrix).L respectively.w2' as . the above representation A = V also possible... dim Y = m (i. .wn are orthonormal bases inXand Yrespectively and A can be represented A= WLV*. Let us first consider the case dim X = dim Y = n. . .. . ... . vn and wr+I' .. . vn and w l .. Another way to interpret the singular value decomposition A = W LV * is unitary matrices with columns vI' v2.e. . A where L = diag{a l . O}.. . . . . that = [AlB A.Structure of Operators in InnerrrOduct Spaces 207 A is invertible. In this case di X = dim Y = n. we can write a polar decomposition of A: = (WV)(VLV *) = VIAl U = WV. . In the general case when dim X = n..

.. especially if one takes into account that there is no formula for finding roots of polynomials of degree 5 and higher.. In other words. Because a Hermitian matrix has an orthogonal basis of eigenvectors. we cannot say much about spectral properties of an operator from its singular value decomposition. However. the diagonal entries of L in the singular value decomposition are not the eigenvalues of A.. Note. . and just a little more computationally intensive than solving a linear system.. . there are very eective numerical methods of find eigenvalues and eigenvectors of a hermitian matrix up to any given precision. you should believe me that there are very eective numerical methods for computing eigenvalues and eigenvectors of a Hermitian matrix and for finding the singular value decomposition. wr to orthonormal bases in X and Y respectively. . vn and w I 'w 2. we need to complete the systems vI' v 2. and O"k L j. We will not discuss these methods here.k = { 0 L is a "diagonal" m x n matrix j = k ~ r: otherwise. Final Remark: performing singular value decomposition requires finding eigenvalues and eigenvectors ofthe Hermitian (self-adjoint) matrix A * A.208 Structure of Operators in Inner Product Spaces * r A = LO"kWkVk k=1 is a singular value decomposition of A.. These methods do not involve computing the characteristic polynomial and finding its roots. They compute approximate eigenvalues and eigenvectors directly by an iterative procedure. . Since we have two dierent bases here. To find eigenvalues we usually computed characteristic polynomial. these methods work extremely well. found its roots. For example. This looks like quite a complicated process.. to get the matrix one has to take the diagonal matrix diag {O" I' 0"2' . However. vr and w I . r} and make it to an m x n matrix by adding extra zeroes "south and east".. SINGULAR VALUES As we discussed above. and so on . . . Then a can be represented as A = W LV*. ""wm respectively. = WL V* as in we generally have W L n V * . as the examples below show..w2' . that for a An :f:. singular values tell us a lot about so-called metric properties of a linear transformation. it goes beyond the scope of this book. where V E Mn x nand WE Mmxm are unitary matrices with columns VI' V 2.. These methods are extremely eective. the singular value decomposition is simply diagonalization with respect to two dierent orthonormal bases. so this diagonalization does not help us in computing functions 'of a matrix. However..

2 ~ 2-· a1 a2 an k=1 ak where YI' Y2' . ... 1). Y n satisfy the inequality 2 2 2 n 2 +Yn=~Yk<1 2··· 2 ~2a2 an k=1 ak 2l+ Y2 + 2 a1 (this is simply the inequality II x 112 = L k I xk 12 ::.. . for an = 3 it is an ellipsoid with half-axes a I' a 2 and a 2 . . Then the matrix of A in these bases is diagonal [AhA = diag{sn: n = 1. a k > 0. We want to describe A(B). . when the matrix A is not necessarily square. if and only if the coordinates YI' Y2' .. . . It is easy to see that the image space but in the Ran I of L B of the unit ball B is the ellipsoid (not in the whole L) with half axes a I' a 2... .. Namely. an}. . Then forv= (x l . . wn' not in the standard one. Y n are coordinates ofy in the orthonormal basis B = w l 'w 2.. we want to find out how the unit ball is transformed under the linear transformation.. consider the orthogonal bases A = VI' v2... Consider first the case of a "diagonal" matrix form.. Consider the general case. .2. so y=(YI'Y2' ···. Similarly. But that is essentially the same ellipsoid as before. a2...2. k= 1. I} IR n --7 be the closed unit ball in lRn.9. . it is easy to show that any point Y = Ax E A(B) if and only if it satisfies the inequality 2 2 2 II 2 Y2 + +Yn=~Yk<1 2l+ 2 2 ... . Assuming that all a k > 0 and essentially repeating the above reasoning. . . Consider for example the following problem: let A : lR m be a linear transformation. ar... .. n..2. Let us first consider the simplest case when A is a diagonal matrix A = diag{a l . The vectors e l . n}. .. (XI' x 2.. more precisely the corresponding lines are called the principal axes of the ellipsoid.. only "rotated" (with dierent but still orthogonal principal axes)! There is also an alternative explanation which is presented below. . xnl and (Y1'Y2' .. and we call that set an ellipsoid with half axes aI' a 2. In lR n the geometry of this set is also easy to visualize.w2' . e 2. . n. .. i. .wn from the singular value decomposition. . and (or) not all singular values are non-zero.. The singular value decomposition essentially says that any operator in an inner product space is diagonal with respect to a pair of orthonormal bases. . xnl = [x]A.e.209 Structure of Operators in Inner Product Spaces Image of the unit ball.. an. . The set of points in lR n satisfying the above inequalities is called an ellipsoid.ynl=Axforllxll::. ynl = Y = Ax we have yk = akXk (equivalently. ... vn and B = w l . and let B = {x E lR n : II x II ::. 1. If n = 2 it is an ellipse with half-axes a I and a 2. ..x2' . x k = yklak) for k = 1. .. see Remark 3. en or..

(B)) is also an ellipsoid with the same half-axes.. II Aelll = II slelll = sIll ell. V* (using the fact that both Wand V are invertible) that W transforms RanI. sr be non-zero diagonal entries of A and let sl be the maximal one. to Ran A. We know that I. so W( I. and I. Unitary transformations do not change the unit ball (because they preserve norm).. Definition. so formula holds. Here r is the number of non-zero singular values. For a diagonal matrix A with non-negative entries the maximum is exactly maximal diagonal entry. The quantity max {II Ax II : x E X.. . i. A = wI. Since unitary transformations do not change the norm. so II Ax II ~ S) II x II· On the other hand. To treat the general case let us consider the singular value decomposition.Structure of Operators in Inner Product Spaces 210 Consider now the general case. singular value decomposition allows us to solve the problem. (B) is an ellipsoid in Ran I. Vare unitary operators. ar . . where V. A = WI. that the maximum of IIAxl I on the unit ball B is the maximal singular value of A. i. so indeed sl is the maximum of II Ax lion the closed unit ball B. Indeed. lt is an easy exercise to see that II A II satisfies all properties of the norm: .. is the diagonal matrix with non-negative entries. the rank of A. that in the above reasoning we did not assume that the matrix A is square. with half-axes ai' a 2. Note. Again.ll x Il . Ware unitary operators. II x II ~ I} is called the operator norm of a and denoted II A II. V*. Operator norm of a linear transformation. . one can conclude that the maximum of II Ax II on the unit ball B is the maximal diagonal entry of I. so V* (B) = B.. a r . It is not hard to see from the decomposition A = wI. Given a linear transformation A : x ~ Y let us consider the following optimization problem: find the maximum ofkAxk on the closed unit ball B = {x EX: II x ~ I}. V .. where W. let s!' s2' .e.e. we only assumed that all entries outside the "main diagonal" are 0. Unitary transformations do not change geometry of objects. Since r Ax= LXkek' k=l we can conclude that r r k=l k=1 2 2 IIAxll~~>i IXk 12 ~S122:lxk 1 =sJ. so we can conclude: the image A(B) of the closed unit ball B is an ellipsoid in Ran A with half axes ai' 0"2' . .

One of the main properties of the operator norm is the inequality II Ax II ~ II A II . it can be shown that the operator norm II A II is the best (smallest) number C ~ o such that II Ax II :::. But even if we have exact data. In fact. k=1 On the other hand we know th~t the operator norm of a equals its largest singular value. Let us consider the simplest model.Structure of Operators in Inner Product Spaces 1. let us investigate how these two norms compare.e. Recalling that the trace equals the sum of the eigenvalues we conclude that r II A 112 = trace(A * A) = ~>i. So. instead of the equation Ax = b we are solving .. . when the data is obtained. round-o errors during computations by a computer may have the same effect of distorting the data. . sr be non-zero singular values of A (counting multiplicities).. that the operator norm of a matrix cannot be more than its. II A II + II B II· o for aliA.. i. . the Frobenius. II aA II = I ex I . 2 II A 112= trace(A * A). i. II A II = s I. suppose there is a small error in the right side of the equation. II A II· 2. = 0 if and only if A = o. That means. II A Ib. Condition number of a matrix. On the space of linear transformations we already have one norm. are non-zero eigenvalues of A*A (again counting multiplicities).s. That happens in the real life. for example by some experiments. II A + B 3. but we want to investigate what happens if we know the data only approximately. The beauty of the proof we presented here is that it does not require any computations and illuminates the reasons behind the inequality. Let sl' s2' . and such a proof is presented in some textbooks. which follows easily from the homogeneity of the norm II x II. This statement also admits a direct proof using the Cauchy-Schwarz inequality.e. is given by x = A-I b.. Suppose we have an invertible matrix A and we want to solve the equation Ax = b. SO we can conclude that II A II :::. C II x II '\Ix E X. Then s~ . So it is indeed a norm on a space of linear transformations from from x to Y. and let sl be the largest eigenvalues. II A II II ~ 211 II :::. IIA 4.s. of course.. This is often used as a definition of the operator norm. II x II. or Hilbert-Schmidt norm II Alb. The solution.

Condition number of a matrix. Suppose we have an invertible matrix A and we want to solve the equation Ax = b. Theoretically the solution is, of course, given by x = A^{-1}b, but we want to investigate what happens if we know the data only approximately. That happens in real life, when the data is obtained, for example, by some experiments. But even if we have exact data, round-off errors during computations by a computer may have the same effect of distorting the data.

Let us consider the simplest model, supposing there is a small error in the right side of the equation. That means, instead of the equation Ax = b we are solving

A(x + Δx) = b + Δb,

where Δb is a small perturbation of the right side b. So, instead of the exact solution x of Ax = b we get the approximate solution x + Δx. We are assuming that A is invertible, so Δx = A^{-1}Δb. We want to know how big the relative error in the solution ||Δx||/||x|| is in comparison with the relative error in the right side ||Δb||/||b||. It is easy to see that

||Δx||/||x|| = ||A^{-1}Δb||/||x|| = (||A^{-1}Δb||/||b||) · (||b||/||x||) = (||A^{-1}Δb||/||b||) · (||Ax||/||x||).

Since ||A^{-1}Δb|| <= ||A^{-1}|| · ||Δb|| and ||Ax|| <= ||A|| · ||x||, we can conclude that

||Δx||/||x|| <= ||A^{-1}|| · ||A|| · ||Δb||/||b||.

The quantity ||A|| · ||A^{-1}|| is called the condition number of the matrix A. It estimates how the relative error in the solution x depends on the relative error in the right side b. It is not hard to see that this estimate is sharp, i.e. that it is possible to pick the right side b and the error Δb such that we have equality

||Δx||/||x|| = ||A^{-1}|| · ||A|| · ||Δb||/||b||:

we just put b = w1 and Δb = αwn, where w1 and wn are the first and the last columns of the matrix W in the singular value decomposition A = WΣV*. Here α can be any scalar.

Let us see how this quantity is related to singular values. Let s1, s2, ..., sn be the singular values of A, and let us assume that s1 is the largest singular value and sn is the smallest. We know that the (operator) norm of an operator equals its largest singular value, so

||A|| = s1, ||A^{-1}|| = 1/sn, and therefore ||A|| · ||A^{-1}|| = s1/sn.

In other words, the condition number of a matrix equals the ratio of the largest and the smallest singular values. A matrix is called well conditioned if its condition number is not too big; if the condition number is big, the matrix is called ill conditioned. What is "big" here depends on the problem: with what precision you can find your right side, what precision is required for the solution, etc.
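The following sketch is not part of the original text; it assumes numpy, and the nearly singular matrix is an arbitrary example. It illustrates the two facts just derived: the condition number equals s1/sn, and it bounds the amplification of the relative error.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.001]])                   # almost singular, hence ill conditioned
s = np.linalg.svd(A, compute_uv=False)
cond = s[0] / s[-1]                            # s1 / sn
print(np.isclose(cond, np.linalg.cond(A, 2)))  # True

b = np.array([1.0, 2.0])
db = 1e-8 * np.array([1.0, -1.0])              # a tiny perturbation of the right side
x = np.linalg.solve(A, b)
dx = np.linalg.solve(A, b + db) - x

rel_err_x = np.linalg.norm(dx) / np.linalg.norm(x)
rel_err_b = np.linalg.norm(db) / np.linalg.norm(b)
print(rel_err_x <= cond * rel_err_b)           # True: the estimate above holds
```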

Effective rank of a matrix. Theoretically, the rank of a matrix is easy to compute: one just needs to row reduce the matrix and count the pivots. However, in practical applications not everything is so easy. The main reason is that very often we do not know the exact matrix, we only know its approximation up to some precision. Moreover, even if we know the exact matrix, most computer programs introduce round-off errors in the computations, so effectively we cannot distinguish between a zero pivot and a very small pivot.

A simple naive idea of working with round-off errors is as follows. When computing the rank (and other objects related to it, like column space, kernel, etc.) one simply sets up a tolerance (some small number) and, if a pivot is smaller than the tolerance, counts it as zero. The advantage of this approach is its simplicity, since it is very easy to programme. However, it is not very reliable in the general case, and the main disadvantage is that it is impossible to see what the tolerance is responsible for: for example, what do we lose if we set the tolerance equal to 10^{-6}? How much better will 10^{-8} be?

While the above approach works well for well conditioned matrices, a better approach is to use singular values. It requires more computations, but gives much better results, which are easier to interpret. In this approach we also set up some small number as a tolerance, and then perform singular value decomposition. Then we simply treat singular values smaller than the tolerance as zero. The advantage of this approach is that we can see what we are doing: the singular values are the half-axes of the ellipsoid A(B) (B is the closed unit ball), so by setting up the tolerance we are just deciding how "thin" the ellipsoid should be to be considered "flat".

Structure of Orthogonal Matrices
An orthogonal matrix U with det U = 1 is often called a rotation. The theorem below explains this name.

Theorem. Let U be an orthogonal operator in R^n, and suppose that det U = 1. Then there exists an orthonormal basis v1, v2, ..., vn such that the matrix of U in this basis has the block diagonal form

diag{R_φ1, R_φ2, ..., R_φk, I_{n-2k}},

where
R_φj = (cos φj  -sin φj)
       (sin φj   cos φj)
are 2-dimensional rotations and I_{n-2k} stands for the identity matrix of size (n - 2k) x (n - 2k).

U leaves the 2-dimensional subspace EA. E is an invariant subspace of U.e. that eigenvalues of a unitary matrix have absolute value 1.e. we have AU = (cos a + i sin a) (x + iy) = ((cos a)x-(sin a)y) + i((cos a)y + (sin a)x). Fix a pair of complex eigenvalues A and U. A can be split into pairs Ak.e. y is an orthogonal basis in EA. In other word. y is an orthogonal basis in EA. split U into real and maginary parts. Y = 1m u = (u . so without loss of generality we can assume that II x II = II y II = 1 i. we do not change matrices of linear transformations.u)/(2i). -sina cosa Note. 11 Uy = 2i U(u-iJ) = 2/ AU . Now. so u = x + iy (note. so x. c E'J. define x k := Re u = (u + 17)/2. v2 =Y to an orthonormal basis in ~n • Since UEA. spanned by the vectors x. all complex eigenvalues of a real matrix 5: k' We know. that x. y invariant and the matrix of the restriction of U onto this subspace is the rotation matrix cosa sina) Ra= ( .. Uy = Im(u) = (cos a)y + (sin a)x. vectors with real entries). Then 1 lux = U -(u + iJ) = -(AU + AU) = Re(Au) 2 2 Similarly. that the vectors u and u (eigenvectors of a unitary matrix. p(I) = 0 (this can easily be checked by into p(z) = L ~=o ak z k ). so all complex eigenvalues of A can be written as Ak = cos ak + i sin ak. that x. yare real vectors. Uu = Au' Then Uu =I 5:.214 Structure of Operators in Inner Product Spaces Proof We know that if p is a polynomial with real coefficient and A is its complex root.. y by the same non-zero number. Since = cos a + i sin a. E"A.' Ifwe multiply each vector in the basis x. corresponding to dierent eigenvalues) are orthogonal. Therefore. then I is a root of p as well. so by the Pythagorean Theorem J2 IIx 11= Ilyll =2'lluli.' Let us complete the orthonormal system VI = x.AU ) = 1m (AU). i. pCA) plugging I = 0. i. so Ux = Re(Au) = (cos a) x . let u E en be the eigenvector of 17 .' i. and I k = cos a k + i sin a k . It is easy to check that x -1 y. the matrix of U in this basis has the block triangular form .(sin a)y.e.

Therefore V*EI = U-IEI = EI so the matrix of V in the basis we constructed is in fact block diagonal. i. that the 2 x 2 matrix -12 can be interpreted as the rotation through the angle n'. its structure is described by the following theorem. the above matrix has the form given in the conclusion of the theorem with '<Pk = -uk or '<Pk = n Let us give a dierent interpretation of Theorem. Real eigenvalues can be only + 1 or -1. it is also unitary.. (*) Since V is unitary (~} so.2) x 2 block of zeroes. Note.. k. r) must be even.e.e.. So.u1 0 R_ U2 o here Ir and I.... Then Theorem simply says that V is the composition of the rotations ~ .. we have VE/. Therefore. Define ~.!ve take the composition. . the theorem can be interpreted as follows: Any rotation in IR n can be represented as a composition of at most nl2 commuting planar rotations.. = E/. i = 1. . If VI has complex eigenvalues we can apply the same procedure to decrease its size by 2 until we are left with a block that has only real eigenvalues. are identity matrices of size r x r and I x I respectively. vi + 1.215 Structure of Operators in Inner Product Spaces where 0 stands for the (n . act in mutually orthogonal planes. so in some orthonormal basis the matrix of V has the form R. 2. to be a rotation thorough <Pj in the plane spanned by the vectors vi . Note. Since det U = 1. the multiplicity of the eigenvalue -1 (i. Since the rotation matrix R_ uis invertible. that because the rotations T. Ifan orthogonal matrix has determinant -1. they commute. . since VI is square. it does not matter in what order .

The case n = 2 is treated. The case n = 1 is trivial. . There exist n.I of Rtx (the rotation R2 does not change the last coordinate.. . For a formal proof we will use induction in n. We call an elementary rotation 2 a rotation in the Xj .. let us prove it for n. Then there exists an orthonormal basis vI' v2. Let U be an orthogonal operator in ~n.2k where r = n .e. and let detU = -1. x 2f E ~2.1 elementary rotations R I. 0. Let us now fix an orthonormal basis. a linear transformation which changes only the coordinates Xj and x k .. where a = J + x~.2k .I -xn plane to "kill" the last coordinate ofx. _(COS'Pk .. R 2R j x = (a. Then use an elementary rotation R2 in x n. . There exists a rotation R of R2 which moves the of. Of. say the standard basis in ~n. vn such that the matrix of U in this basis has block diagonal form ° R<pk ° I n . . There exists a 2 x 2 rotation matrix Ra such that . .t plane to "kill" the coordinate number n . . +x~.. 0. . an orthogonal transformation U with detU = I) can be represented as a product at most n(n .. Assuming now that Lemma is true for n . Lemma. Theorem. where a = Jx~ +x~ + .2k) x (n .. Proof The idea ofthe proof of the lemma is very simple. and so on.x k plane. sm'Pk -sin'Pk] cos'Pk ~k- and I n. that it follows from the above theorem that an orthogonal 2 x 2 matrix U with determinant -I is always a reflection... R 2.. R n_1 such thdt R n.2-xn. so the last coordinate of R 2R tx remains zero).1. Let x = (xI' vector x to the vector (a. Let x = (xI' x 2.. To prove the theorem we will need the following simple lemmas. Any rotation U (i. The modification that one should make to the proof of Theorem are pretty obvious. since any vector in ~1 has the desired form. x: One can just draw a picture orland write a formula for Ra' Lemma.Structure of Operators in Inner Product Spaces 216 Theorem. xn)T E ~n. and it acts on these two coordinates as a plane rotation.1)/2 elementary rotations. . Note.1 and R<pk are 2-dimensional rotations. We use an elementary rotation R t in the xn .2k). .e.2k stands for the identity matrix of size (n .. i.1 .

. There exists a rotation R which "kills" the second coordinate of ai' making the first coordinate non-negative. The case n = I is trivial..' R3R2R x Let us now show that a = = (a.. " Xn_I' an_If E JR.. ..... So if we define the n R.Structure of Operators in Inner Product Spaces where an-I = J 217 X. . JR.1l .1 to the vector (a..I elementary rotations (say R I. " R2R)A has the following block triangular form Of. Rn _ l . . .. 0. but we apply the following indirect reasoning. . . " of. all its diagonal entries except the last one B n.2) identity matrix). and we know that a ~ 0. 0. There exist elementary rotations R I.R2. . " Xn-I' an_If = (a. It can be easily checked directly.. Then R n_ l .I) matrices.I. We can always assume that the elementary rotations R 2.2' an-I' of. " R n_ J act in JR.1)/2 such that the matrix B = RN .. R n_1 which transform a l into (a. " So.R2R JA = (~ ~J. the only possibility for a is a = I 2 2 2 \jXJ +X2 +"'+Xn ' Lemma. then we do not have any choice.. . .. " R n. Then the matrix B = RA is of desired form. ~(1'~2 x n elementary rotation R I by . . We can find n .. Proof We will use induction in n. 0. We know that orthogonal transformations preserve the norm.. In other words R n_ l .2) x (n . the matrix R n_ l . For the n x n matrix A let al be its first column. . " of E Jx~ + xi +'" + x~. .J (1n-2 is (n . simply by assuming that they do not change the last coordinate.2 elementary rotations (let us call them R 2. 1l .. so there exist n .. then R lx = (xI' x 2. . We assumed that the conclusion of the lemma holds for n ._I + x. Il . " R 3Rix l' x 2. Let A be an n x n matrix with real entries. . " R n_l ) in JR. 0. . 1l. . R 2. and we want to prove it for n x n matrices.. Let us consider the case n = 2. But. Let us now assume that lemma holds for (n .1 which transform the vector 1l I (XI' x 2.I) x (n . R 3. moreover. since we can say that any I x I matrix is of desired form. and. n ~ n(n . " O)T E JR. " x n. .n are non-negative. Let al be the first column of A.. R 3. . " R2RIA is upper triangular.

R2RIA). RN such that the matrix VI = RN.. VI is a diagonal matrix. So. except may be the last one.••. so the last diagonal entry also must be 1. .ne the orientation. . In each figure. xn)' but we can always assume that they act on the whole JR(1l simply by assuming that they do not change the first coordinate... the basis b) can be obtained from the standard basis a) by a rotation.1) (n . and orientation of the bases (c) is negative. ol (the first column of Rn_ l . and then to see what the orientation is.. RN · Here we use the fact that the inverse of an elementary rotation is an elementary rotation as well.. if you can see a basis. In Figures 3 orthonormal bases in JR(2 and JR(3 respectively.. and you probably know that bases (a) and (b) have positive orientation. you can try to draw a picture to visualize the vectors.218 Structure of Operators in Inner Product Spaces where Al is an (n . you probably can say what orientation it has.1)/2 elementary rotations. You also probably know some rules to determ. . and all diagonal entries. so the matrix A can be transformed into the desired upper triangular form by at most n .1) block. we can conclude that det V\ = det V= 1.. so we can have only 1 or -Ion the diagonal of VI' But. so AI can be transformed by at most (n . are non-negative. 0.1.. are 1. Note. and therefore V can be represented as a product of elementary rotations -\ -\ -I V=R1 R2 . and we know that an upper triangular matrix can be normal only if it is diagonal. . R2R2V is upper triangular.2)/2 rotations into the desired upper triangular form. Then.. so all the diagonal entries of VI' except may be the last one. Proof There exist elementary rotations R 1. The last diagonal entry can be ±1. Orientation Motivation. R2. say in JR(3. .2)/2 = n(n . like the right hand rule from physics.. So VI = J. that the matrix VI is orthogonal. we know that all diagonal entries of VI' except may be the last one. while it is impossible to rotate the standard basis to get the basis c) (so that ek goes to vk Vk). But what if you only given coordinates of the vectors VI' V2' V3? Of course.1)(n . Any orthogonal matrix is normal. these rotations do not change the vector (a..1) x (n . You have probably heard the word "orientation" before. . that these rotations act in JR(n-1 (only on the coordinates x 2' x 3' . We assumed that lemma holds for n . are nonnegative.. Note. Since elementary rotations have determinant 1. Therefore. . We know that an eigenvalue of an orthogonal matrix can either be lor-I.1 + (n .

. In an abstract space one just needs to fix a basis and declare that its orientation is positive. k = 1. en' Uek = vk. Note. . We say that the bases a and b have the same orientation. Vn is obtained from the standard basis by a rotation. e2. . how do you "explain" this to a computer? It turns out that there is an easier way. Orientation in IR 3 There is unique linear transformation U such that its matrix (in the standard basis) is the matrix with columns VI' v2 . We usually assume that the standard basis e l . Definition. 2..B in the definition . Standard Bases h e) (a) lR 2 e2 V2 (b) (c) Fig. that (for 3 x 3 matrices) if det U = -1. Let a and b be two bases in a real vector space X. the same orientation as the standard basis) This equation show that the basis Vl' v 2.. If an orthonormal basis VI' v2.•. Note. L e) v2 (a) (b) (c) Fig. This gives us a motivation for the formal definition below. i. en in IR n has positive orientation. then U is the composition of a rotation about some axis and a reflection in the plane of rotation.A' one can use the matrix [J]A. vn in IR n has positive orientation (i. if the change of coordinates matrix [J]B. . .e. Let us explain it. in the plane orthogonal to this axis. 3..e. Theorems give us the answer: the matrix U is a rotation if and only if det U = 1. that since ' -I [J]A. and say that they have dierent orientations if the determinant of [J]B A is negative. Moreover. . so we need to see when it is rotation.Structure of Operators in Inner Product Spaces 219 But this is not always easy. We need to check whether it is possible to get a basis VI' V2' v3 in IR3 by rotating the standard basis e l . .A has positive determinant. e2. .B = [I]B. v 3 • It is an orthogonal matrix (because it transforms an orthonormal basis to an orthonormal basis)..

one needs to show that the identity matrix J can be continuously transformed through invertible matrices to any matrix B satisfying det B > O.. Proof Suppose the basis A can be continuously transformed to the basis B. b] = [0.2 . k = 1. . b] such that via) = ak . . i.1]. vn(t) is a basis for all t E [a. V (b) = B. the Intermediate Value Theorem asserts that det V (a) and det V (b) has the same sign.. b] the matrix V (t) is invertible and such that V (a) = J.. an} can be continuously transformed to A basis b = {bI' b 2. bn } have the same orientation. . vib) = bk . Theorem. we can always assume. We say that a basis a = {aI' a2. To prove the oppqsite implication. if and only if one of the bases can be continuously transformed to the other. we can conclude that det[1]A.. b] be a continuous family of bases. the "only if' part of the theorem. . bn} if there exists a continuous family of bases Vet) = {vIet). . "Continuous family of bases" mean that the vector-functions vit) are continuous (their coordinates in some bases are continuous functions) and. that because Vet) is always a basis. v 2(t).e. .. so the bases A and B have the same orientation.. det V (t) is never zero. an} and B = {bI' b2. . t E [a. vit). Clearly.220 Structure of Operators in Inner Product Spaces Continuous Transformations of Bases and Orientation Definition. b]. t E [a. and let Vet).B = det V (b) > 0. . that there exists a continuous matrixfunction V (t) on an interval [a. b] such that for all t E [a.. Consider a matrix-function V (t) whose columns are the coordinate vectors [vit)]A of vi!) in the basis A. n. Since det V (a) = detl = 1.. V(b) = [1]A B' Note. that performing a change of variables... which is essential. . performing this transformation.. the system vI (t). .. .. In other words. Then. . if necessary that [a. .. vnc!)}. Note. . the entries of V(t) are continuous functions and V(a) = I. Two bases A = {aI' a 2.
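The following is not part of the original text; it is a small sketch, assuming numpy, of the determinant test for orientation described above. The bases are illustrative.

```python
import numpy as np

def same_orientation(basis1, basis2):
    # columns are the basis vectors; the change of coordinates matrix from
    # basis2 to basis1 is basis1^{-1} basis2, and the sign of its determinant decides
    return np.linalg.det(np.linalg.solve(basis1, basis2)) > 0

e = np.eye(2)                                    # standard basis e1, e2
phi = 0.7
rot = np.array([[np.cos(phi), -np.sin(phi)],
                [np.sin(phi),  np.cos(phi)]])    # standard basis rotated by phi
swapped = rot[:, ::-1]                           # same two vectors, order exchanged

print(same_orientation(e, rot))       # True: a rotation preserves orientation
print(same_orientation(e, swapped))   # False: exchanging two vectors reverses it
```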

Chapter 8

Bilinear and Quadratic Forms
Main Definition
Bilinear forms on R^n. A bilinear form on R^n is a function L = L(x, y) of two arguments x, y in R^n which is linear in each argument, i.e. such that
1. L(ax1 + bx2, y) = aL(x1, y) + bL(x2, y);
2. L(x, ay1 + by2) = aL(x, y1) + bL(x, y2).

One can consider bilinear form whose values belong to an arbitrary vector space, but
in this book we only consider forms that take real values.
If x = (x1, x2, ..., xn)^T and y = (y1, y2, ..., yn)^T, a bilinear form can be written as

L(x, y) = Σ_{j,k=1}^n a_{j,k} x_k y_j,

or in matrix form

L(x, y) = (Ax, y),

where A = {a_{j,k}}_{j,k=1}^n. The matrix A is determined uniquely by the bilinear form L.
Quadratic forms on R^n. There are several equivalent definitions of a quadratic form. One can say that a quadratic form on R^n is the "diagonal" of a bilinear form L, i.e. that any quadratic form Q is defined by Q[x] = L(x, x) = (Ax, x).
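The following sketch is not part of the original text; assuming numpy, it evaluates a bilinear form through its matrix, as in the formulas above. The matrix and vectors are arbitrary illustrations.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])          # coefficient matrix a_{j,k} (need not be symmetric)

def L(x, y):
    return np.dot(A @ x, y)         # L(x, y) = (Ax, y)

def Q(x):
    return L(x, x)                  # the quadratic form: the "diagonal" of L

x = np.array([1.0, -2.0])
y = np.array([0.5, 4.0])
print(L(x, y), Q(x))
# linearity in the first argument, as required by the definition:
print(np.isclose(L(2*x + y, y), 2*L(x, y) + L(y, y)))   # True
```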


Another, more algebraic, way is to say that a quadratic form is a homogeneous polynomial of degree 2, i.e. that Q[x] is a polynomial of n variables x1, x2, ..., xn having only terms of degree 2. That means that only terms of the form a·x_k^2 and c·x_j·x_k are allowed.
There are many ways (in fact, infinitely many) to write a quadratic form Q[x] as Q[x] = (Ax, x). For example, the quadratic form Q[x] = x1^2 + x2^2 - 4x1x2 on R^2 can be represented as (Ax, x) where A can be any of the matrices

(1  -4)    (1   0)    (1  -2)
(0   1),   (-4  1),   (-2  1).

In fact, any matrix A of the form

(1      a)
(-a-4   1)

will work. But if we require the matrix A to be symmetric, then such a matrix is unique:
Any quadratic form Q[x] on R^n admits a unique representation Q[x] = (Ax, x) where A is a (real) symmetric matrix.
For example, for the quadratic form

Q[x] = x1^2 + 3x2^2 + 5x3^2 + 4x1x2 - 16x2x3 + 7x1x3

on R^3, the corresponding symmetric matrix A is

( 1    2   3.5)
( 2    3   -8 )
(3.5  -8    5 ).

Quadratic forms on C^n. One can also define a quadratic form on C^n (or any complex inner product space) by taking a self-adjoint transformation A = A* and defining Q by Q[x] = (Ax, x). While our main examples will be in R^n, all the theorems are true in the setting of C^n as well. Bearing this in mind, we will always use A* instead of A^T.
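The following check is not part of the original text; it assumes numpy and uses the example just given to confirm that the symmetric matrix reproduces Q, and that it is the symmetrization of any other matrix representing Q.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.5],
              [2.0, 3.0, -8.0],
              [3.5, -8.0, 5.0]])             # the symmetric matrix from the example

B = np.array([[1.0, 4.0, 7.0],
              [0.0, 3.0, -16.0],
              [0.0, 0.0, 5.0]])              # a non-symmetric matrix giving the same Q

def Q(x):
    x1, x2, x3 = x
    return x1**2 + 3*x2**2 + 5*x3**2 + 4*x1*x2 - 16*x2*x3 + 7*x1*x3

x = np.array([0.3, -1.2, 2.0])
print(np.isclose(Q(x), x @ A @ x))           # True
print(np.isclose(Q(x), x @ B @ x))           # True as well
print(np.allclose(A, (B + B.T) / 2))         # True: A is the symmetrization of B
```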

Diagonalization of Quadratic Forms
You have probably met quadratic forms before, when you studied second order curves in the plane. Maybe you even studied the second order surfaces in R^3.
We want to present a unified approach to classification of such objects. Suppose we are given a set in R^3 defined by the equation Q[x] = 1, where Q is some quadratic form. If Q has some simple form, for example if the corresponding matrix is diagonal, i.e. if
Q[x] = a1x1^2 + a2x2^2 + ... + anxn^2, we can easily visualize this set, especially if n = 2, 3. In


higher dimensions, it is also possible, if not to visualize, then to understand the structure
of the set very well.
So, if we are given a general, complicated quadratic form, we want to simplify it as
much as possible, for example to make it diagonal. The standard way of doing that is the
change of variables.
Orthogonal diagonalization. Let us have a quadratic form Q[x] = (Ax, x) in R^n. Introduce new variables y = (y1, y2, ..., yn)^T in R^n, with y = S^{-1}x, where S is some invertible n x n matrix, so x = Sy. Then

Q[x] = Q[Sy] = (ASy, Sy) = (S*ASy, y),

so in the new variables y the quadratic form has matrix S*AS.
So, we want to find an invertible matrix S such that the matrix S*AS is diagonal. Note that this is different from the diagonalization of matrices we had before: there we tried to represent a matrix A as A = SDS^{-1}, so that the matrix D = S^{-1}AS is diagonal. However, for orthogonal matrices U we have U* = U^{-1}, and we can orthogonally diagonalize symmetric matrices. So we can apply the orthogonal diagonalization we studied before to quadratic forms. Namely, we can represent the matrix A as

A = UDU* = UDU^{-1}.

Recall that D is a diagonal matrix with the eigenvalues of A on the diagonal, and U is the matrix of eigenvectors (we need to pick an orthonormal basis of eigenvectors). Then

D = U*AU,

so in the variables y = U^{-1}x the quadratic form has a diagonal matrix.
Let us analyse the geometric meaning of the orthogonal diagonalization. The columns u1, u2, ..., un of the orthogonal matrix U form an orthonormal basis in R^n; let us call this basis B. The change of coordinates matrix [I]_{S,B} from this basis to the standard one is exactly U. We know that

y = (y1, y2, ..., yn)^T = U^{-1}x = U*x,

so the coordinates y1, y2, ..., yn can be interpreted as the coordinates of the vector x in the new basis u1, u2, ..., un. So, orthogonal diagonalization allows us to visualize very well the set Q[x] = 1, or a similar one, as long as we can visualize it for diagonal matrices.

Example. Consider the quadratic form of two variables (i.e. a quadratic form on R^2),
Q(x, y) = 2x^2 + 2y^2 + 2xy.
Let us describe the set of points (x, y)^T in R^2 satisfying Q(x, y) = 1.
The matrix A of Q is

A = (2  1)
    (1  2).

Orthogonally diagonalizing this matrix we can represent it as

A = U (3  0) U*,   where U = (1/sqrt 2) (1  -1)
      (0  1)                            (1   1),

or, equivalently,

U*AU = (3  0) =: D.
       (0  1)

The set {y : (Dy, y) = 1} is the ellipse with half-axes 1/sqrt 3 and 1. Therefore the set {x in R^2 : (Ax, x) = 1} is the same ellipse, only in the basis u1 = (1/sqrt 2, 1/sqrt 2)^T, u2 = (-1/sqrt 2, 1/sqrt 2)^T,
or, equivalently, the same ellipse, rotated by π/4.
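The following numerical check is not part of the original text; assuming numpy, it confirms the numbers in this example.

```python
import numpy as np

# Matrix of Q(x, y) = 2x^2 + 2y^2 + 2xy; its eigenvalues are 1 and 3, so the set
# {Q = 1} is an ellipse with half-axes 1 and 1/sqrt(3), rotated by pi/4.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, U = np.linalg.eigh(A)          # eigenvalues in increasing order
print(eigenvalues)                           # [1. 3.]
print(np.allclose(U.T @ A @ U, np.diag(eigenvalues)))   # True: U*AU is diagonal
print(1.0 / np.sqrt(eigenvalues))            # half-axes: 1 and 1/sqrt(3) ~ 0.577
```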
Non-orthogonal diagonalization. Orthogonal diagonalization involves computing eigenvalues and eigenvectors, so it may be difficult to do without computers for n > 2. There is another way of diagonalization, based on completion of squares, which is easier to do without computers.
Let us again consider the quadratic form of two variables, Q[x] = 2x1^2 + 2x1x2 + 2x2^2 (it is the same quadratic form as in the above example, only here we call the variables not x, y but x1, x2). Since

2(x1 + (1/2)x2)^2 = 2(x1^2 + 2·x1·(1/2)x2 + (1/4)x2^2) = 2x1^2 + 2x1x2 + (1/2)x2^2

(note that the first two terms coincide with the first two terms of Q), we get

2x1^2 + 2x1x2 + 2x2^2 = 2(x1 + (1/2)x2)^2 + (3/2)x2^2 = 2y1^2 + (3/2)y2^2,

where y1 = x1 + (1/2)x2 and y2 = x2.

The same method can be applied to a quadratic form of more than 2 variables. Let us consider, for example, a form Q[x] in R^3,

Q[x] = x1^2 - 6x1x2 + 4x1x3 - 6x2x3 + 8x2^2 - 3x3^2.

Considering all terms involving the first variable x1 (the first 3 terms in this case), let us pick a full square or a multiple of a full square which has exactly these terms (plus some other terms). Since

(x1 - 3x2 + 2x3)^2 = x1^2 - 6x1x2 + 4x1x3 - 12x2x3 + 9x2^2 + 4x3^2,

we can rewrite the quadratic form as

Q[x] - (x1 - 3x2 + 2x3)^2 = -x2^2 + 6x2x3 - 7x3^2.

Note that the expression -x2^2 + 6x2x3 - 7x3^2 involves only the variables x2 and x3. Since

-(x2 - 3x3)^2 = -(x2^2 - 6x2x3 + 9x3^2) = -x2^2 + 6x2x3 - 9x3^2,

we have

-x2^2 + 6x2x3 - 7x3^2 = -(x2 - 3x3)^2 + 2x3^2.

Thus we can write the quadratic form Q as

Q[x] = (x1 - 3x2 + 2x3)^2 - (x2 - 3x3)^2 + 2x3^2 = y1^2 - y2^2 + 2y3^2,

where
y1 = x1 - 3x2 + 2x3, y2 = x2 - 3x3, y3 = x3.
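The following symbolic check is not part of the original text; it assumes the Python sympy library and simply verifies the completion of squares above.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
Q = x1**2 - 6*x1*x2 + 4*x1*x3 - 6*x2*x3 + 8*x2**2 - 3*x3**2
y1 = x1 - 3*x2 + 2*x3
y2 = x2 - 3*x3
y3 = x3
print(sp.expand(Q - (y1**2 - y2**2 + 2*y3**2)))   # prints 0
```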
There is another way of performing non-orthogonal diagonalization of a quadratic
form. The idea is to perform row operations on the matrix A of the quadratic form. The
difference with the row reduction (Gauss-Jordan elimination) is that after each row operation
we need to perform the same column operation, the reason for that being that we want to
make the matrix S*AS diagonal.
Let us explain how everything works on an example. Suppose we want to diagonalize
a quadratic form with matrix

A = ( 1  -1   3)
    (-1   2   1)
    ( 3   1   1).

We augment the matrix A by the identity matrix, and perform on the augmented matrix (A | I) row/column operations. After each row operation we have to perform on the matrix A the same column operation (the column operations are applied only to the left half). We get

( 1  -1   3 | 1 0 0)   R2 + R1    ( 1  -1   3 | 1 0 0)
(-1   2   1 | 0 1 0)   -------->  ( 0   1   4 | 1 1 0)
( 3   1   1 | 0 0 1)   R3 - 3R1   ( 0   4  -8 |-3 0 1)

and then, performing the same column operations C2 + C1 and C3 - 3C1 on the left half,

( 1   0   0 | 1 0 0)
( 0   1   4 | 1 1 0)
( 0   4  -8 |-3 0 1).

Next, the row operation R3 - 4R2 followed by the corresponding column operation C3 - 4C2 gives

( 1   0    0 |  1  0  0)              ( 1   0    0 |  1  0  0)
( 0   1    4 |  1  1  0)   -------->  ( 0   1    0 |  1  1  0)
( 0   0  -24 | -7 -4  1)              ( 0   0  -24 | -7 -4  1).
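The following check is not part of the original text; it assumes numpy. Here E is the right half of the final augmented matrix obtained above; as explained in the next paragraph, E A E^T is the diagonal matrix on the left.

```python
import numpy as np

A = np.array([[1.0, -1.0, 3.0],
              [-1.0, 2.0, 1.0],
              [3.0, 1.0, 1.0]])
E = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [-7.0, -4.0, 1.0]])      # accumulates the row operations performed above

D = E @ A @ E.T
print(np.round(D))                     # diag(1, 1, -24)
S = E.T                                # with S = E*, the diagonalization is S^T A S = D
print(np.allclose(S.T @ A @ S, D))     # True
```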

Note that we perform column operations only on the left half of the augmented matrix; we get the diagonal matrix D on the left and a matrix E on the right.
Let us explain why the method works. A row operation is a left multiplication by an elementary matrix, and the corresponding column operation is the right multiplication by the transposed elementary matrix. So, performing row operations E1, E2, ..., EN and the same column operations, we transform the matrix A to

EN ··· E2E1 A E1* E2* ··· EN* = EAE*,   where E = EN ··· E2E1.

As for the identity matrix in the right half, we performed only row operations on it, so it is transformed to EN ··· E2E1 · I = E. If we now denote E* = S, we get that S*AS = EAE* is the diagonal matrix on the left, and the matrix E = S* is the right half of the transformed augmented matrix. In the above example,

(1  0    0)   ( 1   0  0)( 1  -1  3)(1  1  -7)
(0  1    0) = ( 1   1  0)(-1   2  1)(0  1  -4)
(0  0  -24)   (-7  -4  1)( 3   1  1)(0  0   1).

In the above example we got lucky, because we did not need to interchange two rows. This operation is a bit trickier to perform, since after a row exchange we must also perform the same column operation. Let us consider, for example, a quadratic form with the matrix

A = (0  1)
    (1  0).

If we want to diagonalize it by row and column operations, the simplest idea would be to interchange rows 1 and 2. But we also must perform the same column operation, i.e. interchange columns 1 and 2, so we will end up with the same matrix. So we need something more non-trivial. It is quite simple if you know what to do, but it may be hard to guess the correct row operations. The identity

2x1x2 = (1/2)[(x1 + x2)^2 - (x1 - x2)^2]

gives us the idea for the following series of row (and column) operations: add one half of the second row to the first row and then one half of the second column to the first column, which gives the matrix (1 1; 1 0); then subtract the first row from the second and the first column from the second, which gives the diagonal matrix diag{1, -1}.
Non-orthogonal diagonalization gives us a simple description of a set Q[x] = 1 in a non-orthogonal basis. It is harder to visualize. However, if we are not interested in the details, for example if it is sufficient for us to know that the set is an ellipsoid (or a hyperboloid, etc.), then the non-orthogonal diagonalization described above is an easier way to get the answer.

resp..positive (resp. A-neutral). However.. S~An} This transformation changes the diagonal entries of D. Proof By rearranging the standard basis in ]R. zero) diagonal entries of a diagonalization D = S*AS in terms of A. Sometimes.. However. Here we of course assume that S is an invertible matrix. we can take a diagonal matrix S=diag{sl.siA2. We will need the following definition. negative. . . x) > 0 (resp. zero) diagonal entries of D coincides with the maximal dimension of an D .sk 0 and transform D to *' S * DS = diag{s~Al. Silvester's Law of Inertia As we discussed above. x) < 0.negative.s2' . not involving S or D. Theorem. (Ax.e. And this is always the case Namely. . D . . and let D = S*AS be its diagonalization by an invertible matrix S.positive subspace E of dimension 1 +.. (Ax. for a quadratic form Q[x] :::. .n) we call a subspace E c lR positive (resp. the famous Silvester's Law ofInertia states that: For a Hermitian matrix A (i. etc). . then there exists an A . negative. neutral) if (Ax.neutral) subspace. Given an n x n symmetric matrix A = A * (a quadratic form Q[x] = (Ax. x) n on ]R. The idea of the proof ofth~ Silvester's Law of Inertia is to express the number of positive (negative. Then the number ofpositive (resp. For example.. We will need the following lemma. D . (Ax. Definition. that a resulting diagonal matrix is not unique. zero) diagonal entries of D depends only on A. if we are not interested in the details. Lemma. there many ways to diagonalize a quadratic form.Bilinear and Quadratic Forms 227 orthogonal diagonalization. the number of positive (negative. which can be considered a particular case of the above theorem. but it is impossible to find a positive subspace E with dim E > r +. to emphasize the role of a we will say A-positive (A negative. for example if it is sucient for us to know that the set is ellipsoid (or hyperboloid. negative.negative. it does not change the signs of the diagonal entries. x» and any its diagonalization D = S*AS. resp.sk E ]R. x) = 0) for all x E E..neutral) subspace..n (changing the numeration) we can *' . if we got a diagonal matrix D = diag{Al' A2. resp. and D is a diagonal one. An}. Let D be a diagonal matrix D = diag{Al' A2 .. Note.. An}' Then the number of positive (resp.. zero) diagonal entries of D coincides with the maximal dimension of an A-positive (resp. Let A be an n x n symmetric matrix. A . The above theorem says that ifr+ is the number of positive diagonal entries of D. then the nonorthogonal diagonalization is an easier way to get the answer. x O. sn}. resp. resp. resp. but not on a particular choice of diagonalization. A .

= xr + = 0.. In other words. . . and therefore (Dx.positive subspace F the subspace !)IF is D .positive and a D . We got an operator acting from E to E+. and vice versa: for any A . x) = (S* ASx. First of all. Since (Dx. But we just proved that Ker T= {O}. Therefore the maximal possible dimensions of a A .positive subspace E. . . Let us now apply the Rank Theorem. er+. . x) = (ASx. Note.positive subspace coincide. k=r++l But X belongs to aD . The same identity implies that for any A . but we restricted its domain to E and target space to E+..positive. . The case of negative and zero diagonal entries treated similarly. that dim Ker T= 0.positive subspace E define an operator T: E ~ E+ by Tx = Px. . we just apply the above reasoning to the matrix -D. so the inequality (Dx. and we use a dierent letter to distinguish it from P.. Indeed. Of. so for x dim E = rank T ~ dim E+ = r+.228 Bilinear and Quadratic Forms always assume without loss of generality that the positive diagonal entries of D are the first r + diagonal entries. and dimE+ = r+. i.x) = n L 2 Akxk ~ 0 (Ak ~ Ojork>r+). let for X = (xl' x 2. Consider the orthogonal projection P = PE+' Px = (xl' x2. Since Sand !)l are invertible transformations. . xnf E E we have Tx = Px = O. Sx) it follows that for any D .. Consider the subspace E+ spanned by the first r-f coordinate vectors e l . T is the restriction of the projection P : P is defined on the whole space. but even simpler.positive subspace E. rank T = dimRan T ~ dimE+ = r+ because Ran T c E+. x r+. for any D positive subspace E we can find an A . dimKer T + rank T = dim E. e2. Therefore. Proof Let D = S*AS be a diagonalization of A. To prove the statement about negative entries... Let us now show that for any other D-positive subspace E we have dim E ~ r +.. Then. x = (Xl' x2.positive subspace (namely SE) of the same dimension. x) ~ 0 holds only = O..positive subspace F we can find a D . dimE = dim SE and dimF = dim !)l F. . The case of zero entries is treated similarly. . By the Rank Theorem. For a D .... the subspace SE is an Apositive subspace.. .e. by the definition of P Xl = x 2 = . and the theorem is proved. V X E E. xnf. Clearly E+ is a D-positive subspace.positive subspace (namely !)l F) of the same dimension. that Ker T = {O}. 0 .

Definition. Proof The proof follows trivially from the orthogonal diagonalization. A is negative semidefinite i all eigenvalues of a are non-positive. i. The opposite implication is quite simple. if a > 0 and det A = ac-b 2 > 0. A is indefinite i it has both positive and negative eigenvalues. 0 for all x. A is negative definite i all eigenvalues of a are negative. for a matrix A "'2 . if there exist vectors xl and x2 such that Q[xd > 0 and Q[x2] < O. A symmetric matrix A = A * is called positive definite (negative definite. • Indefinite if it take both positive and negative values.b2 > 0 Indeed. So we have proved that conditions imply that A is positive definite.229 Bilinear and Quadratic Forms Positive Definite Forms Minimax characterization of eigenvalues and the Silvester's criterion of positivity Definition. 3. etc. there is an orthonormal basis in which matrix of a is diagonal.). Let A = A *. Then 1. So we know that if AI' A2 are eigenvalues of a then AI > 0 (det A > 0) and 1 + 2 = traceA > O. A is positive definite i all eigenvalues of a are positive. By Silvester's Law of Inertia it is sucient to perform av arbitrary. etc) one does not have to compute eigenvalues. A is positive semidefinite i all eigenvalues of a are non-negative. Theorem. not necessarily orthogonal diagonalization D = SAS and look at the diagonal entries of D. 4. x) is positive definite (negative definite. It is an easy exercise to see that a 2 x 2 matrix is positive definite if and only if a > 0 and det A = ac . then c > 0. that to find whether a matrix (a quadratic form) is positive definite (negative definite.) ifthe corresponding quadratic form Q[x] =(Ax. and for diagonal matrices the theorem is trivial.. This result can be generalized to the case of n x n matrices. so trace A = a + c > O.e. • Positive semidefinite if Q[x] .:: 0 for all x. Remark. 2. But that only possible if both eigenvalues are positive. Note. • Negative definite if Q[x] < 0 for all x • Negative semidefinite if Q[x] ~ '* o. etc. S. Indeed. Silvester's criterion of positivity. A quadratic form Q is called • Positive definite if Q[x] > 0 for all x * O. Namely.

after we subtract the third row/column we get the diagonalization of A 2 . . one can see that the only bad situation that can happen is the situation where at some step we have zero in the pivot place. . since we are not performing row exchanges and performing the operations in the correct order. . that we do not encounter any pathological situation..2.l ).A 2 = . m~ffix :J3~ ~ * is positive definite if and only if detA k > Ofor all k = 1. Let us recall that the codimension of a subspace E c X is by the definition the dimension of its orthogonal complement. We will need the following characterizatipn of eigenvalues of a hermitian matrix. and if we are not doing any row interchanges.A3= a21 a22 '" 2.j k then all eigenvalues ofA are positive by analyzing diagonalization of a quadratic form using row and column operations. Minimax characterization of eigenvalues. Since we are performing only row repla~ement we do not change the determinant. . dim X = n we have dim E + dim E.1 a n. first subtracting the first row/column from all other rows/columns.n A let us consider its all upper left submatrices A I =(a l1 ). .1 2.Bilinear and Quadratic Forms 230 A= an. then subtracting the second row/columns from the rows/columns 3. the entry in the k + 1st row and k + 1st column is O.. after we subtract first and second rows and columns..An =A Theorem.e.l = n. . ..1 al.2 . and so on. One can show that if det Ak > 0 \. Moreover. (Silvester's Criterion of Positivit f?. det Ak > 0 for all k. Recall that the trivial subspace {O} has dimension zero.e. n.2 an..2 1. since all eigenvalues of a positive definite matrix are positive.3 a23. If one analyzes the algorithm. 4. then we automatically diagonalize quadratic forms Ak as well. the condition det Ak > 0 guarantees that each new entry in the diagonal is positive. we get diagonalization of A 2 . In other words. Therefore. Of course. Namely. we can see that codim E = dirnX .2 J al. and so on . ). a a) (a a 1. codim E = dim(E. i. Since for a subspace E c X.JJ. if after we subtracted the first k rows and columns and obtained a diagonalization of A k .dim E. n. First of all let us notice that if A> 0 then Ak > 0 also (can you explain why?).1 (al. so the whole space X has codimension O. and perform the operations in the correct order... The key here is the observation that if we perform row/column operations in natural order (i.. one has to be sure that we can use only row replacements. we preserve determinants of A k • Therefore.

but let us only note that this matrix is not A: it has to be a k x k matrix. Remark. so the above inequality min(Ax. An}' Pick subspaces E and F. The first one requires some familiarity with basic notions of analysis: one should just say that the unit sphere in E. By normalizing it we can assume without loss of generality that II Xo II = 1. and find the minimum of (Ax.. i. There are two possible explanations of the fact that (Ax. and let AI' 1.x).e. xeE IIxll=1 xeF Ilxll=1 We did not assume anything except dimensions about the subs paces E and F. (Axo. where k= dimE. For each such subspace E we consider the set of all x E E of norm 1. Proof First of all. and we need to pick a subspace E such that the number is maximal.<11=1 COLlInF=A~1 11'11=1 Let us explain in more details what the expressions like max min and minmax mean.x). xeE IIxll=1 xeF Ilxll=1 . . x E E is a quadratic form on E. Another explanation will be to notice that the function Q[x] = (Ax. the functionJ(x) = x has neither maximum nor minimum on the open interval (0.x) = min max(Ax. x) in our case) on a compact set attains its maximum and minimum. A = diag{A\. codim F= k-l.2 ~ . That is the maxmin.... A sophisticated reader may notice a problem here: why do the maxima and minima exist? It is well known. (Minimax characterization of eigenvalues). It is easy to see that for a quadratic form the maximum and minimum over a unit sphere is the maximal and minimal eigenvalues of its matrix.x):::. 1. Thus for each subspace we obtain a number. x) over all such x. ~ An be its eigenvalues taken in the decreasing order. However. by picking an appropriate orthonormal basis..xo):::. dimE = n -k+ 1. Since dimE + dim F> n. we can assume without loss of generality that the matrix A is diagonal. that maximum and minimum have a nasty habit of not existing: for example..2 ~ . the set {x E E: II x II = I} is compact. It is not dicult to compute the matrix of this form in some orthonormal basis in E. i.Bilinear and Quadratic Forms 231 Theorem.x). As for optimizing over all subspaces. To compute the first one. x) attains maximum and minimum. x). so let us assume that AI ~ 1. max(Ax. max(Ax.e.2' . Then Ak = max min(Ax. & ~E F: ~E dimE=k 11. ~ An. dimE= k. and that a continuous function (Q[x] = (Ax. we will prove below that the maximum and minimum do exist.. Let A = A * be an n x n matrix. we need to consider all subspaces E of dimension k.x) :::. Since x belongs to the both subspaces E and F -min(Ax. 1). there exists a non-zero vector Xo E En F.. in this case maximum and minimum do exist. We can always arrange the eigenvalues in decreasing order.. The min max is defined similarly.

Fo := span{ek.1 basis vectors. Let 1. Il n-I be the eigenvalues ofA and A respectively. en-I}' Since (Ax. eX = span{e I .x).e.n and Ill' 112' .k. mil} max(Ax. x) for all x EX.x) ~ min EcX xeE dimE=n-k IIxll=1 EcF n dimE=n-k (minimum over a bigger set can only be smaller). . . . dimE = k.e.k+I xeE IIxll=1 .k+l.1) x (n .k = 1.x) ~ min (Ax. 1' 1. ek+l..k' xeEo xeFo IIxll=1 IIxll=1 It follows from equation that for any subspace E...k • Corollary (Intertwining of Eigenvalues). max(Ax.k' xeE xeFo IIxll=1 IIxll=1 and similarly. . . ek+2.(k . any subspace E c codimension in Therefore X) has dimension n Ilk = X of codimension k . . x) = A.. Define Eo := span{el' e2.k }~. ek}.k=I be a self-adjoint matrix. taken in decreasing order. if we increase the set). Let A = A * = {aj. i.k . Let X c F be the subspace spanned by the first n .k }~.1.I=I be its submatrix of size (n . so min max = max min = A. so its codimension in Fn is k.1).x) = A. dimE = k min(Ax. take maximum over a bigger set (any subspace of X is a subspace of P'). and let A = {aj.1 (here we mean 1 . e 2. A.1) = n . .. min (Ax.k' xeF xeEo IIxll=1 IIxll=1 But on subspaces Eo and Fo both maximum and minimum are A.. the maximum and minimum of (Bx. Then I. Therefore Ilk ~ lk' (the maximum can only increase. xcX dimE=k xeE IIxll=1 To get A. we get that .x) = A. max(Ax. x) over the unit sphere {x : " x " = I} are the maximal and the minimal eigenvalue respectively (easy to check on diagonal matrices).. x) = max(Ax.232 Bilinear and Quadratic Forms holds for all pairs of E and F of appropriate dimensions.. . Ilk = ma{( min(Ax.. A.2' ..k ~ Ilk ~ A.x) ~ max(Ax.. for any subspace F of codimension k .2...n-1 n Proof. . . en}' Since for a self-adjoint matrix B.. On the other hand. x) = (Ax..k we need to get maximum over the set of all subspaces E of P'.. x) = A.

thenA k > 0 for k= 1. ..2. k-l Since detA k = AI/"'2 .I respectively.?:Ilj >0 forj= 1. that all Ak (and so A = An) are positive definite..2. . ~k-I be eigenvalues of Ak and A k. using induction in k. . We will show.. By Corollary 'k j . the last eigenvalue Ak must also be positive.. . Therefore.... Ak-I/"'k > 0.. . 'Let us now prove the other implication. the matrix Ak is positive definite..Bilinear and Quadratic Forms 233 Proof If A> 0. . Ak and ~I' ~2' .. Clearly A I is positive definite (it is 1 x 1 matrix. since all its eigenvalues are positive. . n. det Ak > 0 for all k = 1. Let det Ak > 0 for all k. Let AI' A2.. Since all eigenvalues of a positive definite matrix are positive. . n as well (can you explain why?).. so A I = det A I)' Assuming that A k_1 > 0 (and det Ak > 0) let us show that Ak is positive definite. . 2. .
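The following sketch is not part of the original text; it assumes numpy, and the matrix is a standard positive definite example, illustrating Silvester's criterion via the leading principal minors det A_k.

```python
import numpy as np

def leading_minors(A):
    # determinants of the upper left k x k submatrices A_1, A_2, ..., A_n
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

print(leading_minors(A))                  # 2, 3, 4 -- all positive
print(np.all(np.linalg.eigvalsh(A) > 0))  # True: all eigenvalues positive as well
```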

Chapter 9

Advanced Spectral Theory
Cayley-Hamilton Theorem
Theorem (Cayley-Hamilton). Let A be a square matrix, and let p(λ) = det(A - λI) be its characteristic polynomial. Then p(A) = 0.
The proof looks ridiculously simple: plugging A instead of λ in the definition of the characteristic polynomial, we get p(A) = det(A - AI) = det 0 = 0. But this is a wrong proof! To see why, let us analyse what the theorem states. It states that if we compute the characteristic polynomial

det(A - λI) = p(λ) = Σ_{k=0}^n c_k λ^k

and then plug the matrix A instead of λ to get

p(A) = Σ_{k=0}^n c_k A^k = c_0 I + c_1 A + ... + c_n A^n,

then the result will be the zero matrix. In the "proof" above, however, A - AI is the zero matrix, and its determinant is just the number 0, while p(A) is a matrix, and the theorem claims that this matrix is the zero matrix. Even though in both cases we got zero, these are different zeroes: the number zero and the zero matrix! Thus we are comparing apples and oranges.
Let us present another proof, which is based on some ideas from analysis.
A "continuous" proof. The proof is based on several observations. First of all, the theorem is trivial for diagonal matrices, and so for matrices similar to diagonal (i.e. for diagonalizable matrices). The second observation is that any matrix can be approximated (as close as we

it is sucient to prove the theorem only for upper triangular matrices. It may even look like some kind of cheating. Such a reader should be able to fill in all the details.. to make them all dierent. so piAk) = O. let A be an upper triangular matrix..z1) of A can be represented as p(z) = (A\ .AI)(Z . proof of the theorem which is one of the "standard" proofs from linear algebra textbooks. So. Therefore peA) = lim k-7oo 0 = O.) = det(A . We can perturb the diagonal entries of A as little as we want. We can perturb diagonal entries of A (as little as we want). so let AI' A2. so 00 * A= o An The characteristic polynomial p(z) = det(A . for others. so Frobenius norm II A . Since for any polynomial p we have p(UA[jI) = Up(A)[jI. Since any operator has an upper triangular matrix in some orthonormal basis. . (z .. A "standard" proof We know.An)' so . that any square matrix is unitary equivalent to an upper triangular one. It can be shown that the characteristic polynomials pl). we can assume without loss of generality that a is an upper triangular matrix.z) = (_l)n (z .. Therefore peA) = lim Pk(Ak )· k-7 But as we just discussed above the Cayley-Hamilton Theorem is trivial for diagonalizable matrices.A2) . An be eigenvalues of A ordered as they appear on the diagonal. Therefore one can find a sequence of diagonalizable matrices Ak such that Ak -7 A as k -7 00 for example such that Ak ..A1) of A. and by Corollary an n x n matrix with n distinct eigenvalues is diagonalizable). let us resent another. . However. see Theorem.Ak -7 A as k -7 00). and for him/her this proof should look extremely easy and natural. although. and the characteristic polynomials of unitarily equivalent matrices coincide.Z)(A2 . This proof is intended for a reader who is comfortable with such ideas from analysis as continuity and convergence. the proof definitely may look strange. (An .) = det(Ak -IJ) converge to the characteristic polynomial pC). so the perturbed matrix A is diagonalizable (eigenvalues of a a triangular matrix are its diagonal entries.z) . So. We know that diagonal entries of a triangular matrix coincide with it eigenvalues. let me repeat that it is an absolutely correct and rigorous proof (modulo some standard facts in analysis).... who are not comfortable yet with these ideas.235 Advanced Spectral Theory want) by diagonalizable matrices.A 112 is as small as we want.

Anl)x E En-I' X2 := (A -In_ll)x I = (A -In_Il)(A -Inl) X E En-2' Xn := (A -1 2l)xn_ 1 = (A -1 2l) . AEk c Ek (meaning that Av E Ek for all v E Ek). (A ...Al)Ek c E k.Ak)ej E Ej C Ek (because Ej is an invariant subspace of A . . Take any vector v E Ek.I n_Il)(A -Inl)x EEl· The last inclusion mean that xn = ae l • But (A .AIl)(A .236 Advanced Spectral Theory Define subspaces Ek := span{e l ..At! to some vectors in E k _1' we can conclude that (A .At! can be non-zero. so 0= (A . . because both Av and AV are in Ek.. where e l .Al.A~. However Aylj = AiAk = Ak+j. ek. Moreover.. . e2. since for any v E Ek and any A (A .A2l) . We know that generally matrix multiplication is not commutative.1 entries of the kth column of the matrix of A . e2. Ek is an invariant subspace of A . An is the standard basis in Since the matrix ofA is upper triangular.AIl)e l = 0. ek_I }. Namely. e2. . Thus (A .A~ek E span{e l .. because only the first k .. ele are transformed by A . Therefor. the subspaces Ek are so-called invariant subspaces of the operator A.e. .A~Ek. i. and from here it is easy to show that for arbitrary polynomials p and q . (A .e.Al)v =Av ...:: p(A)x = 0 for all X E en . 1 we get XI := (A . i. e 2. n - 1. . By the definition of Ele it can be represented as a linear combination of the vectors e 1.AV E Ek. . Spectral Mapping Theorem Polynomials of operators..e.. which means exactly that peA) = O.AIl)xn = (A .. generally AB BA so the order is essential. N Ie 2 N p(A):= LaleA =aOI+a1A+a2A + . .. i. . Since all vectors e l . (A . e2. Take an arbitrary vector x E en = En" Applying inductively with k = n.Anl)X.. for j < k we have (A ..A~V E Ek_ 1 \/v E Ek • en. ek}. ... We can say even more about the the subspace (A . . k=1 here we agree that AO = I ::j::. Let us also recall that for a square matrix (an operator) A and for a polynomial p(z) = the operator peA) is defined by substituting a instead of the independent variable.. . +aNA . On the other hanc!.

. In particular. Proof As it was discussed above. then Akx = Akx.z\)(z .ztl)(A .2. The operator q(A) is not invertible. if is an eigenvalue of A. (A . Since ~ E (P(A)) the operator q(A) = peA) . Ifwe consider a particular case ~ = o of the above theorem.. To prove the opposite inclusion cr(p(A)) cp(cr(A)) take a point ~ E cr(p(A)).. . That means zk E (A). Theorem (Spectral Mapping Theorem). then peA) can be represented as peA) = a(A . Namely.. . the inclusion p(cr(A)) c (P(A)) is trivial. n. that as stated. That means that the inclusion p(cr(A)) c cr(p(A)) is trivial.. if a polynomial p(z) can be represented as a product of monomials p(z) = a(z .z2l) .1. if . Ax = x for some x =1= 0.zN). Let A : V ~ V be an operator (linear transformation) in a vector space V. ~ is an eigenvalue of peA) if and only if ~ = peA) for some eigenvalue of A. so o= q(zk) = p(zk) . we get the following corollary. i.~1. Then. Note. .. Denote q(z) = p(z) .Advanced Spectral Theory 237 p(A)q(A) = q(A)p(A) = R(A) where R(z) = p(z)q(z).zN)' where z \' z2' . so q(A) = peA) .~ and therefore ~ = p(zk). Then peA) is invertible if and only if peAk) =1= 'v'k = 1. that one inclusion is trivial. Note. For a square matrix a and an arbitrary polynomial p cr(p(A)) = p(cr(A)).~. (z . (z .. Let a be a square matrix with eigenvalues 1. a subspace E of the vector space V is called an invariant subspace of the operator A (or.zNl) Spectral Mapping Theorem. So we have proved the inclusion s(P(A)) c p((A)).n\l)(A . so peA) is an eigenvalue of peA). Remark. Corollary. one does not need to worry about non-commutativity.. Let us represent the polynomial q(z) as a product of monomials. . as it was discussed above in Section 2. In other words. this theorem does not say anything about multiplicities of the eigenvalues. That means that when dealing only with polynomials of an operator A. and act like a is simply an independent (scalar) variable. shortly.~I is not invertible.z2) . 2. A .z/ must be not invertible (because a product of invertible transformations is always invertible). so one of the terms A .e. z N are the roots of p..z2) . and p(A)x = p(A)x.invariant) if AE c E. On the other hand zk is a root of q.. n and let p be a polynomial. Let us recall that the spectrum seA) of as quare matrix (an operator) A is the set of all eigenvalues of A (not counting multiplicities. Invariant Subspaces Definition.zNl)..z\)(z . we can represent q(A) = a(A . q(z) = a(z ..

However. Then p(AI E) = p(A)I E· Proof The proof is trivial. then for all vEE the result Av also belongs to E. Er a basis of A-invariant subspaces. for an A-invariant subspace E we define the so-called restriction AlE: E ---7 E of A onto E by (AIE)v = Av \tv E E. since AEk = Att:k c Ek the operators Ak act independently of each other (do not interact).invariant. We will need the following simple lemma Lemma. here we have the correct ordering of the basis in V. and let E be an A-invariant subspace. because the restriction of a to the eigenspace Ker(A . i.e. as we know eigenspaces do not always form a basis (they form a basis if and only if A can be diagonalized. If E is A-invariant. If E is an A-invariant subspace. then A2E = A(AE) cAE c E. The eigenspaces Ker(A . and to analyse action of A we can analyse operators Ak separately.then in E2 and so on). Unfortunately. first we take a basis in E1. Our goal now is to pick a basis of invariant subspaces E 1. Similarly one can show (using induction.E2. E2.238 Advanced Spectral Theory Av E E for all vEE. i. then.AI) would be good candidates..... then Att: cE \tk~l· This implies that P(A)E c E for any polynomial p. In this case we will get basis in which the matrix of A has a simple structure.AI) is simply At!. Here we changed domain and target space of the operator. E is A 2 . . . If E1 . if we pick a basis in each subspace Ek and join them to get a basis in V then the operator a will have in this basis the following block-diagonal form o A= o (of course.e. Formally. Therefore we can treat A as an operator acting on E. Er such that the restrictions Ak have a simple structure. . In particular. that if AE c E. . for example). not on the whole space V. but the rule assigning value to the argument remains the same. and AEk := AIEk are the corresponding restrictions. that any A-invariant subspace E is an invariant subspace of peA). . the so-called generalized eigenspaces will work. Let p be a polynomial.

which means (A _l)kw = O. Let for some k Ker(A .. F means that E c F but E:. it cannot grow to infinity.e. let us notice that if for finitedimensional subspaces E and F we have E <. i. The opposite inclusion is trivial.I)k ~ dim V < 00 .I)k stabilizes. of all generalized eigenvectors. .I)k = Ker(A .. The representation does not look very simple. that for the generalized eigenspace EA. Ker(A . To show that the sequence of kernels stabilizes. in fact one can take the finite union. Then Ker(A _l)k+r = Ker(A .A.I)k+r+l. i. Recalling the definition of w we get that (A . for it involves an infinite union.I)k w = 0 so V E Ker(A . so at some point Ker(A .Ai)k+r+1v = O. as EA. The rest follows from the lemma below.AI)k+ 1 Vk ~ k" .'Al)k+r+l Vr ~ O.e. Ker(A . then dim E < dim F. However. A vector v is called a generalized eigenvector (corresponding to an eigenvalue) if (A .:. The number d = d(A) on which the sequence Ker(A .Advanced Spectral Theory 239 Generalized Eigenspaces Definition.2.A.I)k stabilizes. Then w := (A . the sequence of the subspaces Ker(A . i.Al)k+r. We proved that Ker(A _l)k+r+l c Ker(A _'M)k+r.'M)k+ 1 V k ~ 1. It follows from the definition of the depth.AI)k c Ker(A . the number d such that Ker(A _'M)d-I C oct Ker(A _'M)d = Ker(A .)J/v = 0 for some k ~ 1. In other words one can represent the generalized eigenspace EA.A..Ali. Definition.AI)k+ I. = UKer(A-AI)k. 3. (A . k~1 The sequence Ker(A .e. The collection EA. Lemma. Since dimKer(A .Hi c Ker(A .I)k = Ker(A .II)k+ I. F). is an increasing sequence of subspaces. F (symbol E <. Proof Let v E Ker(A . so. k = 1.ll)r E Ker(A . together with 0 is called the generalized eigenspace (corresponding to the eigenvalue A. i.e.A.Al)d+l is called the depth of the eigenvalue A. .I)k+r v = (A . But we know that Ker(A _'M)k = Ker(A _'M)k+l so WE Ker(A _'M)k.ll)k+ 1. .

AkI)/IIk (A).. E is an invariant subspace of A.E2. Ar}.AkIEk are nilpotent.k I Ek ) = 0. Ntk Proof Let mk be the multiplicity of the eigenvalue Ak. = 0. E. 0= peA) = (A .. .k I ) k I Ek = (Ak . (A .. In this basis the matrix of the operator a has the block diagonal form A = diag{Al'A 2. Proof There are 2 possible simple proofs. and let Ek : = Ek be the corresponding generalized eigenspaces. Lemma. then by Theorem. where Ak := A I Ek (property 2 of the generalized eigenspaces).. the operator PiA) I Ek = piAk) is invertible. )mk Pk (A k ). . . Then the system of subspace E J. what we know about generalized eigenspaces.Ak )mk is .AI£). . . cr(AIE')." is nilpotent. and restriction all operators to Ek we get 0= p(Ak ) = (Ak .) 3. Er is a basis ofsubspaces in V Remark. . The first one is to notice that mk ..AkI)mk I Ek = 0. Now we are ready to state the main result of this section. 1. and the spectrum of nilpotent operator consists of one point 0.::: dk.. A. . Now let us summarize. .) = (AlE').AkIE.A1n)d('). AE.Al)IEJ d('). = {A}. where dk is the depth of the eigenvalue Ak and use the fact that d 'I mk 'I (A . see 2. By the Cayley-Hamilton Theorem. Theorem...I\.. Ek = E'). . K so (Ak-AkIEk ) I11k -1 =p(Ak)Pk(Ak ) To prove the theorem define -1 =OPkCAk ) =0..I\... Let (A) consists ofr points AI' A2. (z- A m.'AJ)d(').) = 0... so p(z) = the characteristic polynomial of A. where Ak := AI Ek . Ifwe join the bases in all generalized eigenspaces Ek. then 2. Define pk(z) = p(Z)/(Z-Akt'k II hk = . The second possibility is to notice that according to the Spectral Mapping Theorem.. j) II::1 (z . If d(A) is the depth of the eigenvalue.e' It is also easy to see.)v = Vv E E/. «A ..Advanced Spectral Theory 240 (A . that the operators Nk := Ak . Let A : V ~ V. we will get a basis in the whole space. because the operator AlE').

the operator B = q(A) is invertible. Note that BEk c Ek (any A .Pk =B -1 L. Since we have for w = piA)v '\ /Ilk /Ilk (A -I'vkl) w = (A . To prove property 3. restriction of which to Ej is zero. 4. To prove the last property. by the Spectral Mapping Theorem. that Ek is an invariant subspace of S-1. i. recall that according to Cayley-Hamilton Theorem peA) = O. For the operators Pk defined above PI + P2 + . PklEj 3.PkPPk(A)=B B=l. which together with BEk c Ek implies BEk = Ek.j:.. = 0 for} . That means.j:.Akl) pk(A)v = p(A)v = O. Define the operators Pk by Pk = S-lpiA ). Lemma. Note also. that it follows from that piA) I Ej = 0 Vj. Moreover. RanPk c Ek. Indeed.kJ). 0 for all k. k . 2. 0. because piA) I Ej = Pk(A) and piA) contains the factor (Aj -'A/Ej )mj = O.. MUltiplying the last identity by S-I we get that S-IEk = Ek. dim(BEk) = dim E k.Advanced Spectral Theory 241 r q(z) = LPk(z) k=1 Since Pk( j) = 0 for j . Therefore piA) I Ej = 0 and thus P k I Ej = S-I piA) I Ej = O.j:.Pk(A) contains the factor (A ... 1. any vector w in Ran piA) is annihilated by some power of (A . Since B is an invertible operator. + Pr = I. k and Pk (k) . let us notice that it follows from that for v E Ek . which by definition means that Ran Pk(A) c E k. PkV = vVv E E k .e.invariant). we can conclude that q(k) if:. k=1 k=1 L.'A)mj .. k. Therefore. in fact Ran Pk = Ek • Proof Property 1 is trivial: r '" r -1".j:.invariant subspace is also peA) .. so.

so let us pick them in such a way that the corresponding blocks Ak are upper triangular.A)nk .Ai) consists of one eigenvalue k of (algebraic) multiplicity nk = dimEk · The multiplicity equals nk because an operator in a finite-dimensional space V has exactly dim V eigenvalues counting multiplicities. Note that we are free to pick bases in Ek . where Operators are nilpotent. . Now we are ready to complete the proofofthe theorem. and Ak has only one eigenvalue.AIEk = IT (I"k . that if v is represented as v = "" r vk' E Ek . the spectrum of the operator Ak (recall that Ak =Nk . . But this means that the algebraic multiplicity of the eigenvalue Ak is nk = dimEj. The following corollary is very important for dierential equations..A r }.+ vr ) = Pkvk = v k· Geometric Meaning of Algebraic Multiplicity Proposition. vk E Ek .... the matrix of a in any such basis has a block-diagonal form diag{AI. so a(Nk) = {O}. Take v E Vand define vk = Pkv.A2' . Proof Ifwe joint bases in generalized eigenspaces Ek = EM to get a basis in the whole space. and by statement.AI) = r r k=1 k=1 IT det(Ak . Therefore. Then det(A .Jk=l Pkv = Pivl + v2 + . r V= LVk' k=l so v admits a representation as a linear combination. then it follows from the Statements 2 and 4 of the lemma that L.i An important application. we can just note. To show that this representation is unique. Then according to Statement 3 of the above lemma. Algebraic multiplicity of an eigenvalue equals to the dimension of the corresponding generalized eigenspace. j=l which implies Pkv= g-lBv= v.242 Advanced Spectral Theory r Pk(A)v = LPj(A)v = Bv.

Corollary. Any operator A in V can be represented as A = D + N, where D is diagonalizable (i.e. diagonal in some basis), N is nilpotent (N^m = 0 for some m), and DN = ND.

Proof. As we discussed above, if we join the bases in the generalized eigenspaces E_k = E_{λ_k} to get a basis in V, then in this basis A has the block diagonal form A = diag{A_1, A_2, ..., A_r}, where A_k := A|E_k, and the operators N_k := A_k − λ_k I|E_k are nilpotent. Define

D = diag{λ_1 I|E_1, λ_2 I|E_2, ..., λ_r I|E_r}.

Notice that λ_k I|E_k commutes with N_k (the identity operator commutes with any operator), so the block diagonal operator N = diag{N_1, N_2, ..., N_r} commutes with D. Therefore, defining N in this way we get the desired decomposition: A = D + N, with D diagonal in this basis, N nilpotent, and DN = ND.

This corollary allows us to compute functions of operators. Let us recall that if p is a polynomial of degree d, then p(a + x) can be computed with the help of Taylor's formula:

p(a + x) = Σ_{k=0}^{d} (p^{(k)}(a) / k!) x^k.

This formula is an algebraic identity, meaning that for each polynomial p we can check that the formula is true using formal algebraic manipulations with a and x and not caring about their nature. Since the operators D and N commute, DN = ND, the same rules as for usual (scalar) variables apply to them, and we can write (by plugging D instead of a and N instead of x)

p(A) = p(D + N) = Σ_{k=0}^{d} (p^{(k)}(D) / k!) N^k.

Here, to compute the derivative p^{(k)}(D) we first compute the kth derivative of the polynomial p(x) (using the usual rules from calculus), and then plug D instead of x. But since N is nilpotent, N^m = 0 for some m, only the first m terms can be non-zero, so

p(A) = p(D + N) = Σ_{k=0}^{m−1} (p^{(k)}(D) / k!) N^k.
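As a numerical check of this finite Taylor expansion, here is a small sketch (an illustration only; the 3x3 matrix and the comparison against SciPy's expm are my own choices, not from the text). It uses an upper triangular matrix with a single eigenvalue, so that D = λI and the strictly upper triangular part N commute; the same truncation also gives the exponential, which the text turns to next.

    import numpy as np
    from scipy.linalg import expm

    # Upper triangular with a single eigenvalue 2, so D = 2I and N = A - 2I commute.
    A = np.array([[2., 1., 3.],
                  [0., 2., 5.],
                  [0., 0., 2.]])
    lam = 2.0
    n = A.shape[0]
    D = lam * np.eye(n)
    N = A - D                      # nilpotent: N^3 = 0 here

    # p(x) = x^3 - 4x + 1, with p' = 3x^2 - 4, p'' = 6x
    p   = lambda M: M @ M @ M - 4 * M + np.eye(n)
    dp  = lambda M: 3 * M @ M - 4 * np.eye(n)
    ddp = lambda M: 6 * M

    taylor = p(D) + dp(D) @ N + ddp(D) @ (N @ N) / 2
    print(np.allclose(taylor, p(A)))                     # True

    # Same idea for the exponential: e^A = e^D (I + N + N^2/2)
    expA = np.exp(lam) * (np.eye(n) + N + N @ N / 2)
    print(np.allclose(expA, expm(A)))                    # True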

we will need to consider the following problem: For a nilpotent operator A find a basis such that the matrix of a in this basis is simple.!. These matrices (together with 1 x 1 zero matrices) will be our "building blocks". The same approach works if p is not a polynomial.244 Advanced Spectral Theory In m i. Let see. that if we join the bases in all generalized eigenspaces Ek = Ek to get a basis in the whole space. In particular. so we cannot say that the formula is true for an arbitrary power series p(x).. Since we can deal with each Nk separately. However. It is easy to see that the matrix o 1 0 o o 1 o o is nilpotent. but an infinite power series. Note.Nk k=O k! k=O k! This formula has important applications in dierential equation. then. this formula makes computation of peA) much easier. we will show that for any nilpotent operator one can find a basis such that the . .. what does it mean for a matrix to have a simple form.. much smaller than d. if the radius of convergence of the power series is 1. if p(x) = eX.. . using the fact that (eXy = eX we get. Ar} and operators Ak ca be represented as Ak = Ai +Nk. In the previous section we have proved. where Nk are nilpotent operators. we need to find a basis in which the nilpotent operator Nk has a simple form. In each generalized eigenspace Ek we want to pick up a basis such that the matrix of Ak in this basis has the simplest possible form. then everything works fine. that an operator a in a vector space V is called nilpotent if Ak = 0 for some exponent k. For general power series we have to be careful about convergence of all the series involved. e A m-l D m-l = L -=-Nk = e D = L . then the operator a has in this basis A block diagonal form diag{AI' A 2. Since matrix (in any basis) of the identity operator is the identity matrix.. that the fact that ND=DN is essential here! Structure of Nilpotent Operators Recall. Namely.

2.. and initial vectors ofthese cycles are linearly independent.... p .1'. because. .. I ~ j ~ Pk}' Consider the subspace RanA. and that A annihilates any cycle of length 1. and let C I . v.2.. Let a be a nilpotent operator.. I Thus we have to be looking for the chains of vectors vI' v2. . . vp satisfying the above relations. k = 1. + Pr be the total number of vectors in all the cycles.. Suppose the matrix of an operator A has in a basis vI' v'). Then no vector belongs to two cycles. .1). otherwise we can consider instead of the operator A its restriction onto the invariant subspace span{v~ : k= 1..k' Pk being the length ofthe cycle Ck. V the form (4. V.245 Advanced Spectral Theory matrix of the operator in this basis has the block diagonal form diag{A l' A 2.. . Then p Av 1and AVk+ = vk . p .. . . Theorem.. Ifn = 1 the theorem is trivial. .. Without loss of generality we can assume that the vectors v~ span the whole space V. It follows from the relations that vectors v~ :k=I. are linearly independent. Remark... A similar definition can be made for an arbitrary operator. Therefore.v. the vector Vp is called the end vector of the cycle.k-l is a cycle..2. that the theorem is true for all operators and for all collection of cycles. Let us now assume. C2.. .k' V. .V. . Note that if Pk > 1 then the system v. . C k vectors VII. and the vectors .. Proof Let n = PI + P2 + . Ar}.. ... . a chain of non-zero vectors vI' v 2.. Cr be cycles of its generalized eigenvectors..1 ~j~Pk-l span Ran A. ° Cycles of Generalized Eigenvectors Definition. . and they must satisfy the identities (A -lJ)vI = 0. and the union of all the vectors from all the cycles is a linearly independent. The vector vI is called the initial vector of the cycle. Let a be a nilpotent operator.. We will use induction in n... v~ = v. .I. (A . we have finitely many cycles. where each A k is either a block of form or a 1 x I zero block. . . and the number p is called the length of the cycle.1. Then all vectors vk must belong to the same generalized eigenspace E.'Al)vk+ 1 = vk. k = 1. Assume that the initial . . . .2. Let us see what we should be looking for.. as long as the total number of vectors in all the cycles is strictly less than n. vp satisfying relations is called a cycle of generalized eigenvecton of A..r. so the induction hypothesis applies.

Consider the subspace X= Ran A. Since the v:. For n = 1 the theorem is trivial.k+1 such that k AVpk+1 = vpk· So we can extend each cycle Ck to a bigger cycle initial vectors v: of cycles Ck' vk =1. Since the end vector V.. so we can consider the restriction Alx..r) + dimKer A .. V. .::: n. r are linearly independent. ..k' V. j ::. . Theorem.::: r. so we have n-r vectors in the basis v~ : k = 1. rank A = dim Ran A = n . we have a basis there. j ::..::: (n . Cr of generalized eigenvectors such that their union is a basis in X. therefore the vectors v~ : k= 1. Assume that the theorem is true for any operator acting in a space of dimension strictly less than n. . Pk -1 ). .v2'···' vpk where v: is the initial vector of the cycle. i. Let us complete this system to a basis in KerA. Let A : V ~ V be a nilpotent operator. By the definition of the cycle we have v: E KerA. .r) + r = n so dim V. 2.. r. one can find a vector V...k+1 are linearly independent. and we assumed that the initial vectors v: ' k = 1. Pk -1 are linearly independent. 2. v. Xis an invariant subspace of the operator A. and since these vectors are linearly independent dim Ker A .. so they are linearly independent Jordan Canonical form of a Nilpotent operator. . Pk' form a basis. r. so by the induction hypothesis there exist cycles C I' C2. dim Ran A < dim V.. 2.. On the other hand V is spanned by n vectors.v. .. r. Proof We will use induction in n where n = dim V. Theorem implies that the union of these cycles is a linearly independent system. On the other hand AiI =0 for k = 1. . k = 1. . dimV= rankA + dimKerA = (n ... . r.k+1. V.2. Since these vectors also span Ran A. Then V has a basis consisting of union of cycles of generalized eigenvectors of the operator A.. .... " .J ::.r k (we had n vectors. . Since A is not invertible.. .2.. .e. . .. and we removed one vector vPk from each cycle Ck . 1 ::. By the Rank Theorem.k belong to Ran A. Ck = v: . r. 2. Therefore.. 1 ::. 1<· . Let k k k Ck = vI . let .246 Advanced Spectral Theory Vjk·k=12 ..k' V. ..

Uq IS a baSIS there). if we treat this question literally. Let A be a nilpotent operator. Ul! . VI 'Ul' U2" . .A 2. Note. . .. that such basis is not unique.... The matrix of a in a Jordan canonical basis is called the Jordan canonicalform of the operator A. Cr have dim Ran A = rank A vectors total. .2. u I ' u2. The vector uj can be treated as a cycle of length !. Corollary. .. This methods also allows us to answer many natural questions. .. We will see later that the Jordan canonical formis unique. for we always can change the order of the blocks. the answer is "no". I uq such that the system. . if we agree on how to order the blocks (i. . . We can join these 1x 1 zero blocks in one large zero block (because o-diagonal entries are 0). . -- - (because We added to the cycles Cl . and then put the zero block). . . so we got rank A + r + q = rank A + dimKer A = dim V linearly independent vectors.247 Advanced Spectral Theory find vectors U ... Uniqueness of the Jordan canonical form.k =1. So. . . and one of the blocks Ak can be zero. But dim V linearly independent vectors is a basis.. like "is the block diagonal representation given by Corollary unique?" Of course.. To show that it is a basis.A r }.. . . for example by agreeing on some order of blocks (say. let us count the dimensions.e....C2 . the socalled dot diagrams.Cr . Proof According to Theorem one can find a basis consisting of a union of cycles of generalized eigenvectors. But. on how to order the vectors in the basis). There is a good way of visualizing Theorem and Corollary. or not? . if we exclude such trivial possibilities. We know that dim Ker A = r + q 12 r . VI' vI . we can apply Theorem to get that the union of all these cycles is a linearly independent system. uq is a basis in Ker A (it may happen that the system v. A basis consisting of a union of cycles of generalized eigenvectors of a nilpotent operator a (existence of which is guaranteed by the Theorem) is called a Jordan canonical basis for A. a cycle of size p gives rise to a p x p diagonal block.. Definition. . so the total number of vectors in all the cycles Ck is rank A + r. Dot diagrams. is the representation unique.. We know that the cycles C p C2. There exists a basis (a Jordan canonical basis) such that the matrix ofA in this basis is a block diagonal diag {A]. C 2 .. and a cycle of length 1correspond to a 1 x 1 zero block. whose initial vectors are linearly independent. Each cycle Ck was obtained from Ck by adding 1 vector to it. Cr additional q vectors. u I ' tl2' . in which case we put q = 0 and add nothing). so we have a collection of cycles CI .. uq . where all Ak (except may be one) are blocks ofform. if we put all non-zero blocks in decreasing order..r is already a basis in Ker A.

So. Ifwe agree on the ordering of the blocks. suppose we have a basis. there is a one-to-one correspondence between dot diagrams and Jordan canonical forms (for nilpotent operators). Dot Diagram and Corresponding Jordan Canonical form of a Nilpotent Operator To better understand the structure of nilpotent operators. all other entries of the matrix are zero. The cycle of length 5 corresponds to the 5 x 5 block of the matrix. and 3 cycles of length 1. . we can see that the operator a acts on its dot diagram by deleting the first (top) row of the diagram. two cycles of length 3. which we join in the 3 x 3 zero block. let us analyse. The first row consists of initial vectors of cycles. This dot diagram shows. To answer this question. Let us represent the basis by an array of dots. and we arrange the columns (cycles) by their length. and allows us to write down the Jordan canonical form for the restriction A IRan A. the cycles of length 3 correspond to two 3 non-zero blocks. putting the longest one to the left. On the figure 1 we have the dot diagram of a nilpotent operator. and moves vector vH1 of a cycle to the vector vk. Here we only giving the main diagonal of the matrix and the diagonal above it. The new dot diagram corresponds to a Jordan canonical basis in Ran A. so that each column represents a cycle. the question about uniqueness of the Jordan canonical form is equivalent to the question about uniqueness of the dot diagram. as well as its Jordan canonical form.248 Advanced Spectral Theory • ••••• • •• • •• • • 0 1 0 1 0 0 1 0 0 0 1 0 o 0 o 1 o o 0 o 0 o 0 o Fig. Three cycles of length 1 correspond to three zero entries on the diagonal. Namely. how the operator A transforms the dot diagram. which is a union of cycles of generalized eigenvalues. Since the operator A annihilates initial vectors of the cycles. let us draw the so-called dot diagram. that the basis has 1 cycle oflength 5.
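The row lengths of the dot diagram can be computed directly from the dimensions dim Ker A^k (as the text explains next). The sketch below (a minimal NumPy illustration with a made-up nilpotent matrix, not code from the text) computes these row lengths and then reads off the Jordan block sizes as the column heights of the diagram.

    import numpy as np

    def dot_diagram_rows(A):
        # Row k of the dot diagram has dim Ker A^k - dim Ker A^(k-1) dots.
        n = A.shape[0]
        ker_dims = [0]
        P = np.eye(n)
        for _ in range(n):
            P = P @ A
            ker_dims.append(n - np.linalg.matrix_rank(P))
            if ker_dims[-1] == n:
                break
        return [ker_dims[k] - ker_dims[k - 1] for k in range(1, len(ker_dims))]

    # Nilpotent example: one 3x3 Jordan block and one 2x2 Jordan block
    A = np.zeros((5, 5))
    A[0, 1] = A[1, 2] = A[3, 4] = 1.0

    rows = dot_diagram_rows(A)
    print(rows)       # [2, 2, 1]: two dots in rows 1 and 2, one dot in row 3
    # Block sizes = column heights of the diagram
    blocks = [sum(1 for r in rows if r >= j + 1) for j in range(max(rows))]
    print(blocks)     # [3, 2]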

we have constructed all cycles of maximal length. these will be the initial vectors of the cycles.2. k= 1.. completing the basis v:. Now we want to put vectors instead of dots and find a basis which is a union of cycles. we know how many cycles should be there. Let us say few words about computing a Jordan canonical basis for a nilpotent operator. the number of dots in the first row is dimKerA. does not depend on a particular choice of such a basis. . Let P2 be the length of a maximal cycle among those that are left to find. if for all k we know the dimensions dimKer(A"). v~ . PI' and counting dimKer(A") we can construct the dot diagram of A. . . Thus.CI'.dimKer A. we can complete the basis v:. .2. the number of dots in the second row is dimKer(A2) . . . .. V( in Ran (A P2 ). v~ . . and let dim Ran(A P2 ) = r 2. and the number of dots in the kth row is dimKer(A") . . it is not hard to see that the operator Ak removes the first k rows of the dot diagram..Then we find the end vectors of the cycles by solving the equations Pi k V~I' V!I"'" V~I k A vpI =1' k = 1. then the Jordan canonical form is unique..2 by solving (for V~2) the equations PI k k A vp2 =vI. which was initially defined using a Jordan canonical basis.. V(+l ... .. Therefore. V(. . is unique. . and what is the length of each cycle). Then. . .r I +2.r2. . Consider a basis in the column space Ran(Apl).dimKer(A k+1). This implies that if we agree on the order of the blocks. V( to a basis v:. the dot diagram.... V( in Ker( A P2 ) to a basis in Ker( A P3 we construct the cycles of length P3' and so on. . Then we find end vectors of the cycles C'1+ I . Computing a Jordan canonical basis. But this means that the dot diagram...Advanced Spectral Theory 249 Similarly. That PI is the length of the longest cycle.. Namely. Let p} be the largest integer such that ApI "* 0 (so APl+} = 0).. Therefore. we know the dot diagram of the operator A. v~. k=r} + l. We start by finding the longest cycles (because we know the dot diagram. .rI· Applying consecutively the operator a to the end vector V~I ' we get all the vectors v~ in the cycle.. Name the vectors in this basis v:' v~ . Computing operators Ak. Consider the subspace Ran(AP2).. Since Ran(A PI ) c Ran(AP2).v( . thus constructing the cycles of length P2' Let P3 denote the length of a maximal cycle among ones left. ...

where The operators Nk =A k . Jordan Decomposition Theorem Theorem.'Al)m = dim Ker(A _'JJ)m+l. but here we do not discuss this part. if we know the dot diagram. . k = 1.. In fact. the matrix of a in this basis has a block diagonal form diag {A I . For an eigenvalue let m = m be the number where the sequence Ker(A _'JJ)k stabilizes. i. Then E" = Ker(A .'Al)m is the generalized eigenspace corresponding to the eigenvalue.2. and assume that eigenvalues are already computed. Proof Ifwe join bases in the generalized eigenspaces Ek = EM to get a basis in the whole space. Here we assume that the block of size 1 is just 'A..A2' . we know the canonical form..e. until the sequence of the subspaces stabilizes. Given an operator A there exist a basis (Jordan canonical basis) such that the matrix of a in this basis has a block diagonal form with blocks ofform Iv 1 Iv 0 1 Iv 1 Iv where Iv is an eigenvalue ofA.. For each eigenvalue we compute subspaces Ker(A - . so by Theorem one can find a basis in Ek such that the matrix of Nk in this basis is the Jordan canonical form of Nk . then it is sucient only to keep track oftheir dimension (or ranks of the operators (A _'JJ)k.Advanced Spectral Theory 250 One final remark: as we discussed above. The block diagonal form from Theorem is called the Jordan canonical form of the operator A.J'l.1v1)k+I). After we computed all the generalized eigenspaces there are two possible ways of . since we have an increasing sequence of subspaces (Ker(A _'JJ)k c Ker(A . To get the matrix of Ak in this basis one just puts k instead of 0 on the main diagonal. so after we have found a Jordan canonical basis. we do not need to compute the matrix of a in this basis: we already know it.'A/Ek are nilpotent. First of all let us recall that the computing of eigenvalues is the hardest part. The corresponding basis is called a Jordan canonical basis for an operator A.A r }. . m satisfies dimKer(A _'JJ)m-l < dimKer(A .

. which involves inverting a matrix and matrix multiplication. which we would need to consider when working with the block Ak separately. so applying the algorithm described in Section 4./)1..'). Ar}.. But we need to find the matrix of the operator in a new basis. Again.A2. when computing a Jordan EAk canonical basis for a generalized eigenspace EAk ' instead of considering subspaces Ran(Ak . the algorithm works with a slight modification..Advanced Spectral Theory 251 action.. The first way is to find a basis in each generalized eigenspace. and putting k instead of 0 on the main diagonal.').i are nilpotent./)1 EAk • . so the matrix of the operator a in this basis has the block-diagonal form diag{A1. Another way is to find a Jordan canonical basis in each of the generalized eigenspaces by working directly with the operator A. we get the Jordan canonical representation for the block A k • The advantage of this approach is that we are working with smaller blocks. we consider the subspaces (A .4 we get the Jordan canonical representation for N k.. where Ak =AIEAk · Then we can deal with each matrix Ak separately. without splitting it first into the blocks.'). Namely. .. . The operators Nk = A k .
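For exact (symbolic) matrices, one practical cross-check of the whole construction above is SymPy's built-in Jordan decomposition. The small example below is only an illustration: the matrix is made up, and I am assuming the documented convention that jordan_form returns a pair (P, J) with A = P J P^(-1); the assertion verifies that convention on the example itself.

    from sympy import Matrix

    A = Matrix([[2, 1, 0],
                [0, 2, 0],
                [0, 0, 3]])

    P, J = A.jordan_form()          # assumed convention: A = P * J * P**(-1)
    print(J)                        # block diagonal: a 2x2 block for eigenvalue 2,
                                    # a 1x1 block for eigenvalue 3 (block order may vary)
    assert A == P * J * P.inv()     # sanity check of the decomposition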

Chapter 10

Linear Transformations
Euclidean Linear Transformations
By a transformation from R^n into R^m, we mean a function of the type T : R^n → R^m, with domain R^n and codomain R^m. For every vector x ∈ R^n, the vector T(x) ∈ R^m is called the image of x under the transformation T, and the set

R(T) = {T(x) : x ∈ R^n}

of all images under T is called the range of the transformation T.

Remark. For our convenience later, we have chosen to use R(T) instead of the usual T(R^n) to denote the range of the transformation T.

For every x = (x1, ..., xn) ∈ R^n, we can write

T(x) = T(x1, ..., xn) = (y1, ..., ym).

Here, for every i = 1, ..., m, we have

yi = Ti(x1, ..., xn),     (1)

where Ti : R^n → R is a real valued function.

Definition. A transformation T : R^n → R^m is called a linear transformation if there exists a real matrix

A = [ a11 ... a1n ]
    [ ...         ]
    [ am1 ... amn ]

such that for every x = (x1, ..., xn) ∈ R^n, we have T(x1, ..., xn) = (y1, ..., ym), where

y1 = a11 x1 + ... + a1n xn,
...
ym = am1 x1 + ... + amn xn,

or, in matrix notation,

[ y1 ]   [ a11 ... a1n ] [ x1 ]
[ .. ] = [ ...         ] [ .. ]     (2)
[ ym ]   [ am1 ... amn ] [ xn ]

The matrix A is called the standard matrix for the linear transformation T.
Remarks. (1) In other words, a transformation T : R^n → R^m is linear if the equation (1) for every i = 1, ..., m is linear.
(2) If we write x ∈ R^n and y ∈ R^m as column matrices, then (2) can be written in the form y = Ax, and so the linear transformation T can be interpreted as multiplication of x by the standard matrix A.

Definition. A linear transformation T : R^n → R^m is said to be a linear operator if n = m. In this case, we say that T is a linear operator on R^n.

Example. The linear transformation T : R^5 → R^3, defined by the equations

y1 = 2x1 + 3x2 + 5x3 + 7x4 - 9x5,
y2 = 3x2 + 4x3 + 2x5,
y3 = x1 + 3x3 - 2x4,

can be expressed in matrix form as

[ y1 ]   [ 2 3 5  7 -9 ] [ x1 ]
[ y2 ] = [ 0 3 4  0  2 ] [ .. ]
[ y3 ]   [ 1 0 3 -2  0 ] [ x5 ]

If (x1, x2, x3, x4, x5) = (1, 0, 1, 0, 1), then

[ y1 ]   [ 2 3 5  7 -9 ] [ 1 ]   [ -2 ]
[ y2 ] = [ 0 3 4  0  2 ] [ 0 ] = [  6 ]
[ y3 ]   [ 1 0 3 -2  0 ] [ 1 ]   [  4 ]
                         [ 0 ]
                         [ 1 ]

so that T(1, 0, 1, 0, 1) = (-2, 6, 4).
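The same computation in NumPy (a minimal sketch reproducing the example above; the array literal simply transcribes the standard matrix A): a linear transformation is evaluated by a single matrix-vector product.

    import numpy as np

    A = np.array([[2, 3, 5,  7, -9],
                  [0, 3, 4,  0,  2],
                  [1, 0, 3, -2,  0]])

    def T(x):
        # T(x) = A x, so T : R^5 -> R^3 with standard matrix A
        return A @ np.asarray(x)

    print(T([1, 0, 1, 0, 1]))   # [-2  6  4]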

Example. Suppose that A is the zero m x n matrix. The linear transformation T : R^n → R^m, where T(x) = Ax for every x ∈ R^n, is the zero transformation from R^n into R^m. Clearly T(x) = 0 for every x ∈ R^n.

Example. Suppose that I is the identity n x n matrix. The linear operator T : R^n → R^n, where T(x) = Ix for every x ∈ R^n, is the identity operator on R^n. Clearly T(x) = x for every x ∈ R^n.

PROPOSITION. Suppose that T : R^n → R^m is a linear transformation, and that {e1, ..., en} is the standard basis for R^n. Then the standard matrix for T is given by

A = ( T(e1) ... T(en) ).
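This proposition is also the practical recipe for finding A: feed the standard basis vectors through T and use the images as columns. A short sketch (illustration only; the component-wise definition of T below is the R^5 → R^3 example from earlier in this section):

    import numpy as np

    def T(x):
        x1, x2, x3, x4, x5 = x
        return np.array([2*x1 + 3*x2 + 5*x3 + 7*x4 - 9*x5,
                         3*x2 + 4*x3 + 2*x5,
                         x1 + 3*x3 - 2*x4])

    # Columns of the standard matrix are the images of the standard basis vectors.
    A = np.column_stack([T(e) for e in np.eye(5)])
    print(A)
    # [[ 2.  3.  5.  7. -9.]
    #  [ 0.  3.  4.  0.  2.]
    #  [ 1.  0.  3. -2.  0.]]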

Linear Operators on R^2

In this section, we consider the special case when n = m = 2, and study linear operators on R^2. For every x ∈ R^2, we shall write x = (x1, x2).

Example. Consider reflection across the x2-axis, so that T(x1, x2) = (-x1, x2). Clearly we have

T(e1) = (-1, 0)   and   T(e2) = (0, 1),

and so it follows from Proposition that the standard matrix is given by

A = [ -1 0 ]
    [  0 1 ]

It is not difficult to see that the standard matrices for reflection across the x1-axis and across the line x1 = x2 are given respectively by

A = [ 1  0 ]   and   A = [ 0 1 ]
    [ 0 -1 ]             [ 1 0 ]

Also, the standard matrix for reflection across the origin is given by

A = [ -1  0 ]
    [  0 -1 ]

We give a summary in the table below:

Linear operator                    Equations                Standard matrix
Reflection across x2-axis          y1 = -x1, y2 = x2        [ -1 0 ; 0 1 ]
Reflection across x1-axis          y1 = x1,  y2 = -x2       [ 1 0 ; 0 -1 ]
Reflection across line x1 = x2     y1 = x2,  y2 = x1        [ 0 1 ; 1 0 ]
Reflection across origin           y1 = -x1, y2 = -x2       [ -1 0 ; 0 -1 ]

Example. For orthogonal projection onto the x1-axis, we have T(x1, x2) = (x1, 0), with standard matrix

A = [ 1 0 ]
    [ 0 0 ]

Similarly, the standard matrix for orthogonal projection onto the x2-axis is given by

A = [ 0 0 ]
    [ 0 1 ]

We give a summary in the table below:

Linear operator                        Equations             Standard matrix
Orthogonal projection onto x1-axis     y1 = x1, y2 = 0       [ 1 0 ; 0 0 ]
Orthogonal projection onto x2-axis     y1 = 0,  y2 = x2      [ 0 0 ; 0 1 ]
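A quick numerical sanity check of these standard matrices (a sketch of my own, not from the text): reflecting twice returns every vector to where it started (A^2 = I), while projecting twice is the same as projecting once (A^2 = A).

    import numpy as np

    reflections = [
        np.array([[-1, 0], [0, 1]]),    # across x2-axis
        np.array([[1, 0], [0, -1]]),    # across x1-axis
        np.array([[0, 1], [1, 0]]),     # across line x1 = x2
        np.array([[-1, 0], [0, -1]]),   # across origin
    ]
    projections = [
        np.array([[1, 0], [0, 0]]),     # onto x1-axis
        np.array([[0, 0], [0, 1]]),     # onto x2-axis
    ]

    for A in reflections:
        assert np.array_equal(A @ A, np.eye(2))   # reflecting twice does nothing
    for A in projections:
        assert np.array_equal(A @ A, A)           # projecting twice = projecting once
    print("all checks passed")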

Example. For anticlockwise rotation by an angle θ, we have T(x1, x2) = (y1, y2), where

y1 + i y2 = (x1 + i x2)(cos θ + i sin θ),

and so

[ y1 ]   [ cos θ  -sin θ ] [ x1 ]
[ y2 ] = [ sin θ   cos θ ] [ x2 ]

It follows that the standard matrix is given by

A = [ cos θ  -sin θ ]
    [ sin θ   cos θ ]

We give a summary in the table below:

Linear operator                       Equations                        Standard matrix
Anticlockwise rotation by angle θ     y1 = x1 cos θ - x2 sin θ,        [ cos θ  -sin θ ]
                                      y2 = x1 sin θ + x2 cos θ         [ sin θ   cos θ ]
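The sketch below (an illustration of my own; the angle values are arbitrary) builds this rotation matrix in NumPy and checks numerically that rotating by θ and then by φ is the same as rotating by φ + θ, a fact the text returns to when composing transformations.

    import numpy as np

    def rotation(theta):
        # Standard matrix for anticlockwise rotation by angle theta
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s],
                         [s,  c]])

    theta, phi = 0.4, 1.1
    print(np.allclose(rotation(phi) @ rotation(theta), rotation(phi + theta)))   # True

    x = np.array([1.0, 2.0])
    print(rotation(np.pi / 2) @ x)    # approximately [-2.  1.]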

Example. For contraction or dilation by a non-negative scalar k, we have T(x1, x2) = (kx1, kx2), with standard matrix

A = [ k 0 ]
    [ 0 k ]

The operator is called a contraction if 0 < k < 1 and a dilation if k > 1, and can be extended to negative values of k by noting that for k < 0, we have

[ k 0 ]   [ -1  0 ] [ -k  0 ]
[ 0 k ] = [  0 -1 ] [  0 -k ]

This describes contraction or dilation by the non-negative scalar -k followed by reflection across the origin. We give a summary in the table below:

Linear operator                        Equations                Standard matrix
Contraction or dilation by factor k    y1 = kx1, y2 = kx2       [ k 0 ; 0 k ]

Example. For expansion or compression in the x1-direction by positive factor k, we have T(x1, x2) = (kx1, x2), with standard matrix

A = [ k 0 ]
    [ 0 1 ]

This can be extended to negative values of k by noting that for k < 0, we have

[ k 0 ]   [ -1 0 ] [ -k 0 ]
[ 0 1 ] = [  0 1 ] [  0 1 ]

This describes expansion or compression in the x1-direction by positive factor -k followed by reflection across the x2-axis. Similarly, for expansion or compression in the x2-direction by non-zero factor k, we have the standard matrix

A = [ 1 0 ]
    [ 0 k ]

We give a summary in the table below:

Linear operator                              Equations              Standard matrix
Expansion or compression in x1-direction     y1 = kx1, y2 = x2      [ k 0 ; 0 1 ]
Expansion or compression in x2-direction     y1 = x1,  y2 = kx2     [ 1 0 ; 0 k ]

Example. For shears in the x1-direction with factor k, we have T(x1, x2) = (x1 + kx2, x2), with standard matrix

A = [ 1 k ]
    [ 0 1 ]

For the case k = 1, we have the following. (Figure: T, k = 1.) For the case k = -1, we have the following. (Figure: T, k = -1.) Similarly, for shears in the x2-direction with factor k, we have T(x1, x2) = (x1, kx1 + x2), with standard matrix

A = [ 1 0 ]
    [ k 1 ]

We give a summary in the table below:

Linear operator            Equations                     Standard matrix
Shear in x1-direction      y1 = x1 + kx2, y2 = x2        [ 1 k ; 0 1 ]
Shear in x2-direction      y1 = x1, y2 = kx1 + x2        [ 1 0 ; k 1 ]
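Since the figures are not reproduced here, the following sketch (my own illustration) shows numerically how the shear in the x1-direction with factor k = 1 and k = -1 moves the corners of the unit square.

    import numpy as np

    def shear_x1(k):
        # Standard matrix for a shear in the x1-direction with factor k
        return np.array([[1.0, k],
                         [0.0, 1.0]])

    square = np.array([[0, 1, 1, 0],     # x1-coordinates of the unit square's corners
                       [0, 0, 1, 1]])    # x2-coordinates

    for k in (1, -1):
        print("k =", k)
        print(shear_x1(k) @ square)
    # k = 1  maps the corners to (0,0), (1,0), (2,1), (1,1)
    # k = -1 maps the corners to (0,0), (1,0), (0,1), (-1,1)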

Example. Consider a linear operator T : R^2 → R^2 which consists of a reflection across the x2-axis, followed by a shear in the x1-direction with factor 3, and then reflection across the x1-axis. To find the standard matrix, consider the effect of T on the standard basis {e1, e2} of R^2. Note that

e1 = (1, 0) → (-1, 0) → (-1, 0) → (-1, 0) = T(e1),
e2 = (0, 1) → (0, 1) → (3, 1) → (3, -1) = T(e2),

so it follows from Proposition that the standard matrix for T is

A = [ -1  3 ]
    [  0 -1 ]

Clearly, if A is the standard matrix for an invertible linear operator T, then the inverse matrix A^(-1) is the standard matrix for the inverse linear operator T^(-1). Let us summarize the above and consider a few special cases. We have the following table of invertible linear operators with k ≠ 0.

Linear operator T                            Standard matrix A   Inverse matrix A^(-1)   Linear operator T^(-1)
Reflection across line x1 = x2               [ 0 1 ; 1 0 ]       [ 0 1 ; 1 0 ]           Reflection across line x1 = x2
Expansion or compression in x1-direction     [ k 0 ; 0 1 ]       [ 1/k 0 ; 0 1 ]         Expansion or compression in x1-direction (factor 1/k)
Expansion or compression in x2-direction     [ 1 0 ; 0 k ]       [ 1 0 ; 0 1/k ]         Expansion or compression in x2-direction (factor 1/k)
Shear in x1-direction                        [ 1 k ; 0 1 ]       [ 1 -k ; 0 1 ]          Shear in x1-direction (factor -k)
Shear in x2-direction                        [ 1 0 ; k 1 ]       [ 1 0 ; -k 1 ]          Shear in x2-direction (factor -k)

Next, let us consider the question of elementary row operations on 2 x 2 matrices. It is not difficult to see that an elementary row operation performed on a 2 x 2 matrix A has the effect of multiplying the matrix A by some elementary matrix E to give the product EA. We have the following table.

Elementary row operation                    Elementary matrix E
Interchanging the two rows                  [ 0 1 ; 1 0 ]
Multiplying row 1 by non-zero factor k      [ k 0 ; 0 1 ]
Multiplying row 2 by non-zero factor k      [ 1 0 ; 0 k ]
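A short numerical check of this example (a sketch of my own): composing the three operators by multiplying their standard matrices in the right order reproduces A, and A times its inverse gives the identity.

    import numpy as np

    reflect_x2_axis = np.array([[-1, 0], [0, 1]])    # applied first
    shear_x1_3      = np.array([[1, 3], [0, 1]])     # applied second
    reflect_x1_axis = np.array([[1, 0], [0, -1]])    # applied third

    # The operator applied first sits rightmost in the product.
    A = reflect_x1_axis @ shear_x1_3 @ reflect_x2_axis
    print(A)                                             # [[-1  3]
                                                         #  [ 0 -1]]
    print(A @ np.array([1, 0]), A @ np.array([0, 1]))    # T(e1) = [-1  0], T(e2) = [ 3 -1]
    print(np.linalg.inv(A) @ A)                          # identity (up to rounding)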

we know that any invertible matrix A can be reduced to the identity matrix by a finite number of elementary row operations. expansions. Hence Let (a' Then W) = (a ~)A-l.) and y~(~) The equation of a straight line is given by (XXI + ~x2 (IX ~t:) ~ (1). we can prove the following result concerning images of straight lines. In other words. where A is invertible. in matrix form. Proof Suppose that T(x l • x 2) = (Y1'Y2)' Since A is invertible. Suppose that the linear operator T: jR2 ~ jR2 has standard matrix A. (b) The image under T of a straight line through the origin is a straight line through the origin. ••• . compressions and shears. and . Proposition.. by . we have x =A-l y .. Then T is the product of a succession of nitely many reections.Linear Transformations 259 Adding k times row 2 to row 1 Adding k times row 1 to row 2 Now. We have proved the following result. Es of the types above with various non-zero values of k such that so that -1 -1 A = El . = 'Yor. In fact.2 ~ 1R2 has standard matrix A. (c) The images under T ofparallel straight lines are parallel straight lines. E s . where X~(.. Suppose that the linear operator T: 1R. there exist a finite number of elementary matrices E 1. Then (a) The image under T of a straight line is a straight line. Proposition. where A is invertible.

Suppose that Ii : IR2 -7 IR2 is anticlockwise rotation by and T2 : IR2 is anticlockwise rotation by <j>. Then the respective standard matrices are _ (COS8 -sin8) (cos<j> -sin<j» Al . TI and TI . In other words. the image under T of the straight line xI + x 2 == y is "(' YI + WY2 == . cos(<j> + 8) -7 IR2 . smS cos8 2 sin<j> cos<j> It follows that the standard matrix for T2 . Hence T2 . Then T == T2 . To prove (c). Ii: IR -7 IRk is also a linear transformation. note that parallel straight lines correspond to dierent values of for the same values of a and ~. we have TI(x) ==Alx for every x E IR and T2(y) ==A7. Example. Ii: IR n -7 IRIll Proposition. they have standard matrices A I and n A2 respectively. ~ (~ ~) It follows that the standard matrices for T2 .( 2 1. note that straight lines through the origin correspond to "( == 0.sin<j>cos8+cos<j>sin8 cos<j>cos8-sin<j>sin8 = (COS(<j>+8) sine <j> + 8) -Sin(<j>+8)). T2 are respectively A2AI~(~ ~l)andAIA'~(~ ~). TI and TI .sin <j>COS8) AA. Suppose that and T2 : IRIll -7 IRk are linear tramformations. so that T has standard matrix AzAI' Example. This proves (a). clearly another straight line... It follows that T(x) == TiTI(x)) == Ay4lx for every Y E IR n .sin <j>sin 8 -cos<j>sin 8 .J = (y). we establish a number of simple properties of euclidean linear transformations. Suppose that Ii: ~2 -7 IR2 is anti clockwise rotation by nl2 and 2 T2 : IR -7 IR matrices are 2 is orthogonal projection onto the xI-axis.(.260 Linear Transformations (a ' W) = (. and A = .Y for n every Y E IRIll . TI is cOS<j>COS8 . Then the respective standard AI ~ (~ ~1) and A. Elementary Properties of Euclidean Linear Transformations In this section. Proof Since TI and T2 are linear transformations. T2 are not equal. In other words. To prove (b).

= A-Ix for every . 2 2 Example. Example.n . Multiplying on the left by A-I gives x' = x". . (b) The linear operator T is one-to-one. and is therefore invertible.. Then Ax' = Axil. Remark.. . Then the system Ax = 0 has unique solution x = 0 in lR IJ • It follows that A can be reduced by elementary row operations to the identity matrix J. x" E lR. n . n. we clearly have Ao = O. clearly x = A-Iy satises Ax = y.. ((b» =:} (a» Suppose that T is one-to-one.. we obtain x' = x". xn E T(x) = e ' so that AXj = ej . Then the linear operator r-I xE lR. Suppose that the linear operator T : ][{ n -t lR. Definition. in other words. for every j = 1. Clearly r-1(T(x» = x and T(r-I(x» = x for every x E lR. 0 and Ax = O. R(T) = lR IJ • Proof ((a» =:} (b» Suppose that T(x') = T(x"). lR. so that T(x) = y.n . Then there exists x E lR. Definition. then Ax' = Axil. so that A is invertible. Proposition. Linear transformations that map distinct vectors to distinct vectors are of special importance. Suppose that the linear operator T:][{n -t][{11 has standard matrix A. Write J C = (xI'" x n)· Then AC = J. : lR. Suppose 2 next that A is not invertible.261 Linear Transformations Hence T2 0 TI is anticlockwise rotation by <p + e.n . Ifwe consider linear operators T: ][{ -) ][{ . where E]Rn be chosen to satisfy A is invertible. Then the following statements are equivalent: (a) The matrix A is invertible. A linear transformation T:][{n -t][{m is said to be one-lo-one iffor every x'. suppose rst of all that A is invertible.. reaction across the x I-axis followed by reection across the x)-axis gives reection across the origin. n has standard matrix A. such that x ::f. we have x' = x" whenever T(x) = T(x").n . On the other hand. dened by r-Icx) is called the inverse of the linear operator T. If T(x') = T(x"). It follows that T(x) = T(O). . The reader should check that in lR 2 . (c) The range of T is lR n . en} is the standard basis for ]Rn. then T is one-to-one precisely when the standard matrix A is invertible. . Multiplying on the left by A-I. ((a» =:} (c» For any y ((c» =:} (a) Suppose that {el' .. Let xI' . so that T is not one-to-one. To see this.n -t lR.

we have T(u + v) = T(u) + T(v). we study the linearity properties of euclidean linear transformations which we shall use later to discuss linear transformations in arbitrary real vector spaces. dened by T(x) = Ax for every 2 x E]R . A-I -1 1 Hence the inverse linear operator is 11 : ]R2 ~]R2 . we have T(u + v) = A(u + v) = Au + Av = T(u) + T(v) and T(cu) = A(cu) = c(Au) = cT (u): Suppose now that (a) and (b) hold. we need to nd a matrix A such that T(x) = Ax for every X€]Rn. dened by 11 (x) = A-Ix for every 2 x E]R . . For any vector . Suppose that {e 1. we have = 1. Let A be the standard matrix for T.. The reader should check that 11 : ]R2 ~]R2 is anticlockwise rotation by angle 21t-9. Example.. en} is the standard basis for IR n . where T(e) is a column matrix for every j in ]Rn. ••• . n Proposition. . Next. n. A transformation T : IR ~]Rm is linear if and only if the following two conditions are satised: n (a) For every u. we have T(cu) n = cT (u). v E IR nand c E IR . (b) For every u E IR n and c E IR.. where Clearly A =( 2 -1). Then for every u. v E IR .262 Linear Transformations Example. we write A = ( T(e l ) . Suppose that T: 1R2 ~ 1R2 is anticlockwise rotation by angle.• T(e n». To show that Tis linear. Consider the linear operator T:]R2 ~]R2. m Proof Suppose rst of all that T: IR ~ IR is a linear transformation. As suggested by Proposition 8A.

. E lR is called an eigenvalue of T if there exists a non-zero vector x E IR n such that T(x) = x. and is called the zero transformation from V to W.. Definition. we obtain Ax = T(x)e)) + . To define a linear transformation from V into W. is clearly linear. we have T(cu) = cT (u). The transformation T: V ~ V. where leu) = u for every u E V. To conclude our study of euclidean linear transformations. and that k E lR is fixed. A linear transformation T: V ~ V from a real vector space V into itself is called a linear operator on V. T(e n)) (]J ~ xIT(el) + . and is called the identity operator on V. Example. Suppose that T: ll~n ~ IR n is a linear operator. is clearly linear. + xnen) = T(x) as required. with domain V and codomain W. We therefore do not need to discuss this problem any further. Note that the equation T(x) = x is equivalent to the equation Ax = Ax. v E V....... Remark. Suppose that V and Ware two real vector spaces.. we are motivated by Proposition which describes the linearity properties of euclidean linear transformations. n This non-zero vector x E IR is calIed an eigenvector of T corresponding to the eigenvalue f. Then any real number f. This operator is called a dilation if k > I and a contraction if 0 < k < 1. Definition. where T(u) = 0 for every u E V... we mean a function of the type T: V ~ W.. we briey mention the problem of eigenvalues and eigenvectors of euclidean linear operators. For every vector U E V. we have T(u + v) = T(u) + T(v). Suppose that V is a real vector space. where T(u) = ku for every u E V. . The transformation T: V ~ W. is clearly linear. By a transformation from V into W. General Linear Transformations Suppose that Vand Ware real vector spaces. It follows that there is no distinction between eigenvalues and eigenvectors of T and those of the standard matrix A.. + T(xne n) = T(x)e) + . Suppose that V is a real vector space. (LT2) For every u E Vand C E lR. the vector T(u) E W is called the image of u under the transformation T.263 Linear Transformations Ax ~ ( T(e I) . The transformation l: V ~ V. + xnT(en)· Using (b) on each summand and then using (a) inductively. Example. A transformation T: V E W from a real vector space V into a real vector space W is called a linear transformation if the following two conditions are satised: (LT1) For every u. Example. Definition.

Consider the transformation T: V~ W. We shall return to this in greater detail in the next section. It is easy to check from properties of derivatives that T is a linear transformation.. Example. +P. ~n + ~n) = (~I' . we let T(P) = Pn + Pn-I x + . . + cp/' so that T(cp) = cPn + cPn_Ix + . . Example. and let W denote the vector space of all real valued functions dened on ffi. Then P + q = (Po + qo) + (p) + ql)x + . Dene a transformation T: Pn ~ P n as follows. Also. . + (~n + An)wn. + (Pn + qn)xn. ... for any c E IR...t>(l is another polynomial in Pn . then cu = C~I wI + . Hence T is a linear transformation.wn}. Let V denote the vector space of all real valued functions that are Riemann integrable over the interval [0. we have cp = cPo + cP1x + . .. Let V denote the vector space of all real valued functions dierentiable everywhere in IR. where T(f) = f' for every f E V.... + p~) = cT (p).. +q... W n }.... For every U E V.. Suppose that P n denotes the vector space of all polynomials with real coefficients and degree at most n. + p~. + C~nWn' so that T(cu) = (c~I' . with basis {WI' ... . + q~) = T(P) + T(q).n such that u = ~IwI + . + cPoXn = c(Pn + Pn_)x + . ..... 1]. the transformation T gives the coordinates of any vector u E V with respect to the given basis {wI' .... . ~n) + (AI' .. ~n) = cT (u).. Consider the transformation T: V ~ ffi. so that T(u + v) = (~l + AI' . so that T(P + q) = (pn + qn) + (Pn-I + qn_l)x + ..f1 in Pn .. where T(f) = f~f(x)dx .Linear Transformations 264 Example. Dene a transformation T: V ~ IR n as follows. + (Po + qo)xn = (Pn + Pn-I x + ... Example. . Suppose now that q=qo+q)x+ . there exists a unique vector (~I' . . For every polynomial P=PO+PI X + . ~n) E ffi. Also..... if C E IR.. Then u + v = (~I + AI)w I + . Hence T is a linear transformation.. An) = T(u) + T(v).... c~n) = c(~l' . Suppose that V is a finite dimensional vector space.. .. Suppose now that V=AIwI+···+AnWn is another vector in V. ~n)· In other words. + p~) + (qn + qn_I x + ... + ~nwn· We let T(u) = (~I' .

1. we have.0) + T(1. W.4. 1. 1)=T(2(1 0. In particular.1. for example. Suppose that V.3T(x) + 2T(x2) = 5. Definition. Proposition. Then T(u + v) = TzCTI(u + v)) = T2(T I(u) + TJ(v)) = T2(T J(u)) + TzCTJ(v)) = T(u) + T(v).. 0. 1. this linear transformation is completely determined. + ~nvn) = T(~JvJ) + . Suppose that {vI' .. Then T= T2 T J : V ~ U is also a linear transformation. T(x) = E and T(x 2) = 3. ~n E IR. Then the matrix .0. Then every u E V can be written uniqudy in the form u = ~lvJ + . 0. Suppose further that TI : V ~ Wand T2 : W·~ U are linear transformations. 4 0). vn } is a basis of V Then T is completely determined by T(v l )....0) = 3 and T(1. 1.. where T(1) = 1. Suppose that u E V and (3) holds.0)+(1.0. 1. Since {l. Also. 1.3. Consider a linear transformation T: V ~ W from a finite dimensional real vector space V into a real vector space W.. x 2} is a basis of P 2..0)+(1. 0. .. (1... Proposition. T(5 . ~n E IR· It follows that the vector u can be identied with the vector (~I' .1. 1) = 14. U are real vector spaces.. ~n) E IRn. Hence T is a linear transformation. . T(6.. .0) = 2. 1. v n } is a basis of V. In particular... It is easy to check from properties of the Riemann integral that T is a linear transformation. T(1. we have.0)+2(1. 1. 1. 1) = 4.. + ~nvn' where ~I' . with basis B = {u I' . (1. . where T(1. 0) = 1. x. + ~nun' where ~l' .0) + 2T(1. .3x + 2x2) = 5T(1) . 1.0. 0) + T(1. if c E IR. for example.0. Since {(1.Linear Transformations 265 for every f E V. Suppose that T : V ~ W is a linear transformation from a finite dimensional real vector space V into a real vector space W Suppose further that {v I' ... T(l. We also have the following generalization of PROPOSITION. .1.1. 1.. Change of Basis E Suppose that V is a real vector space. this linear transformation is completely determined. + ~nT(vn)' We have therefore proved the following generalization of proposition. 1. 1. + T(~nvn) = ~JT(vl) + . . T(vn). un}' Then every vector u V can be written uniquely as a linear combination u = ~luI + .1. (1.. 0. Consider a linear transformation T: IR ~ IR ... 0). 0. It follows that T(u) = T(~JvJ + . 0. then T(cu) = T2(T J(cu)) = TzCcTJ(u)) = cT2(T I(u)) = cT (u). Example.1)) = 2T(I.1. Proof Suppose that u.. Example.0. I)} is a basis of IR 4 . v E V. 1..0). Consider a linear transformation T: P 2 ~ IR .

u4} is a basis of JR4 . . then we can basically forget about the basis B and imagine that we are working in JR n with the standard basis.. note that each of the vectors vI' . ~3 ~4 so that ~l ~2 [u]B= ~3 ~4 = 3 2 -2 2 3 -10 1 1 3 0 -6 2 0 0 0 x y z w Remark. we have vi = aliu l + . and preserves much of the structure of V. u3 = (2.. where (u) = [u]B for every U E V. -6.266 Linear Transformations [ul. Thus is a ljnear transformation. v E Vand C E JR. =(~:J is called the coordinate matrix of II relative to the basis B = {Ut' .. It follows that for any U = (x.. We also say that V is isomorphic to JR n • In practice. z.. note that [u + v]B = [u]B + [v]B and [cu]B = c[u]B' so that <j>(u + v) = <j>(u) + <j>(v) and <j>(cu) = c<j>(u) for every u.. un}. It is not dicult to see that this function gives rise to a one-to-one correspondence between the elements of V and the elements of ~ n • Furthermore. w) E JR4. u4 = (-2.3.. . ani E R. so that .0). . then we also need to nd a way of calculating [u]C in terms of [u]B for every vector u E V. The vectors u l = (1. 1. if we change from one basis B = {u I ' ••• ... vn} of V. 1. this becomes x 3 2 -2 ~l Y z w = 2 3 -10 1 3 0 0 0 0 I -6 2 ~2 . + aniun. . To do this. Example. -10. y. u2 = (3....0). and so B = {u l ' u2' u3. Clearly. once we have made this identication between vectors and their coordinate matrices. vn can be written uniquely as a linear combination of the vectors ul' . Consider a function <j> : V ~ JR n . . where ali' . Suppose that for i = 1.3. 2) are linearly independent in JR4 .. n.0. un} to another basis C = {VI' . un. .0). 2.. we can write U = ~lul + ~2u2 + ~3u3 + ~4u4: In matrix notation. .

we have {vI' ... + anI un) + .0). Yn E R. so that [ul e = P-1[ul B• Definition. [vnl B) are precisely the coordinate matrices of the elements of C relative to the basis B.0). ) a~n Y:n' We have proved the following result. -1.0). + ~Ynann' Written in matrix notation.3. -1. we can write u = ~Iul + .. + ~nun: Hence ~n = ~Ianl + . and with vI = (1... We know that with u l = (1. + ')'naln)u l + .. + ynvn =YI(allu l + ... ~n' YI' .. 0. 0. we have ( ~:n~I}= a~1 alnJ(YIJ [all (. 1. + 'Ynann)un = ~Iul + .2. where ~I' so that Ylvl Clearly . . Proposition. v3 = (1. + yn(alnu l + ..0). -10.2). Remark. . 2.0). v4= (0... 1. un} and C = vector space V. . Proposition gives [ul B in terms of [ulc.1.. v2= (1.However.. vn} are two bases ofa real [ul B = P[ul c' where the columns of the matrix P = ([vtlB ... + ~nun = + .0).... + ann un) = (Ylall + .. . u3 = (2. + (Ylanl + ... Suppose thatB = {u I' . u4 = (-2. Strictly speaking. -6... note that the matrix P is invertible (why?)... 2). .3.. Example... + Ynvn... Then for every U E V.' am for every u E V.. u2 = (3. The matrix P in Proposition is sometimes called the transition matrix from the basis C to the basis B. u = Ylvi + ..1.0. 0.267 Linear Transformations [vilB aliJ = ( :..

2. u3 = -3v I + 4v2 + v3' u4 = -vI . Now let u = (6.268 that Linear Transformations both B = {u I ' u2' u3' u4 } and C = {vI' V 2. Example. It is also easy to check that u l = vI' u2 = 2vI + v2 . It is easy to check VI = ul' v2 = -2u I + u2' v3 = 11u I -4u2 + u3 v4 = -27u I + l1u 2 . We Hence [u]c = Q[u]B for every u E can check that u = VI + 3v2 + 2v3 + v4' so that 1 [u1c = 3 2 1 Thea [u]B = 1 -2 11 0 1 -4 0 0 0 0 1 0 -271 11 -2 1 1 3 j 21 -10 6 0 1 Check that u = -IOu I + 6u2 + u4. V 4 } are bases of ~4 .3v2 + 2v3 + v4' so that 1 2 -3 -1 o 1 4 -3 o o 0 2 0 0 1 ~4. V 3.2u 3 + u4' so that 1 -2 11 -27 o -4 1 11 -2 1 0 P = ([vdB [v 2]B [v 3]B [v4 ]B) = 0 o 0 0 1 Hence [u]B = P[u]c for every u E ~4. -1. Consider the vector space P 2. It is not too dicult to check that .2). Note that PQ = 1.

-1). note that = (-3. v 2 = 1 + x. it is also not too dicult to check that vI = 1.x 2 = YI + Y2(1 + x) + Yi1 + x + x 2) = (Yl + Y2 + Y3) + (Y2 + Y3)x + Y3 x2 . so that and Hence (YI' Y2' Y3) Ifwe write then Next. Hence (~I' ~2' ~3) B = (3. Then ' U 3 =X+X2 where 1 + 4x -x2 = ~1(1 + x) + ~2(1 + x 2) + ~3(x + x 2) = (~I + ~2) + (~1 + ~3)x + (~2 + ~3)x2.269 Linear Transformations U I =l+x ' U 2 =1+x2 form a basis of P 2' Let 2 U = 1 + 4x -x . . so that and P2 + P3 =-1. then [Uln=(+J On the other hand. Ifwe write u2' u3 }.5. -2. = {u I ' 1). v3 = 1 + x + x 2 form a basis of P2' Also U = Ylvi + Y2 v2 + Y3v3' where 1 + 4x .

Recall that the sum ofthe dimension of the nullspace ofA and dimension of the column space of A is equal to the number of columns of A. For a euclidean linear transformation T with standard matrix A. Clearly we have ker(I) = Vand R(I) = {O}. . Then the set ker(I) = {u E V : T(u) = 0 is called the kernel of T. This is known as the Rank-nullity theorem. we need the following generalization ofthe idea of the nullspace and the column space. -112 0 112 (:~~ ~ :~~)(~3). On the other hand. To do this. Definition. the set n {x E IR : Ax = O} is the nullspace of A.~~). Suppose that T: V ~ W is the zero transformation.Linear Transformations 270 Hence P = ([vdB [v 2]B [v3]B) = To verify that [u]B = P[u]c' note that ( ~2) = 1 (~~~ ~ . It follows that R(I) is the set of all linear combinations of the columns of the matrix A. Suppose that T: V ~ W is a linear transformation from a real vector space V into a real vector space W. while R(1) is the column space of A. -112 0 112 -1 Kernel and Range Consider rst of all a euclidean linear transformation IR n ~ IR m . and the set R(I) = {T(u) : U E V} is called the range of T. The purpose of this section is to extend this result to the setting of linear transformations. Example. and is therefore the column space of A. we have shown that ker(I) is the nullspace of A. . Suppose that A is the standard matrix for T. Example. Then the range of the transformation T is given by n R(I) = {T(x) : x E IRn} = {Ax: x E IR }.

Then T(cu) = cT (u) = cw. 1]. Z E R(1). Suppose that T: ~n ~ ~n is one-to-one. where V denotes the vector space of all real valued functions Riemann integrable over the interval [0. Example. Then dim ker(1) + dim R(1) = n. where W denotes the space of all real valued functions dened in JR. and so is the set of all constant functions in JR. Example. so that C u E ker(1). Example. Proposition. Then ker(1) is the set of all Riemann integrable functions in [0. while R(1) is the xl-axis. v E V such that T(u) = wand T(v) = z. Then there exist u. and where T(f) =f~f(x)dx for every f E V. so that w + Z E R(1). Suppose further that c E JR. Then ker(1) = {O} and R(1) = JR n . Suppose that T. Example. Consider the linear transformation T: V ~ W. Proposition. Hence R(1) is a subspace of W. Then T(cu) = cT (u) = Co = 0. Proof Since T(O) = 0. Suppose further that c E JR. while R(1) = JR. Suppose that T: JR2 ~ JR2 is orthogonal projection onto the xl-axis. Suppose that T: V ~ W is a linear transformation from a real vector space V into a real vector space W. Suppose next that w. Then ker(1) is a subspace of V. For any u. To complete this section. Then ker(1) is the x 2-axis. while R(1) is a subspace of W. Hence ker(1) is a subspace of V. so that cw E R(1). .Linear Transformations 271 Example. it follows that 0 E ker(1) Vand 0 E R(1) W. 1] with zero mean. so that u + v E ker(1).' V ~ W is a linear transformation from an n-dimensional real vector space V into a real vector space W. Suppose that T: V ~ V is the identity operator on V. we prove the following generalization of the Rank-nullity theorem. we have T(u + v) = T(u) + T(v) = 0 + 0 = 0. where V denotes the vector space of all real valued functions dierentiable everywhere in JR. Hence T(u + v) = T(u) + T(v) = w + z. Then ker(1) is the set of all dierentiable functions with derivative 0. and where T(f) = f' for every f E V. Clearly we have ' ker(1) = {O} and R(1) = V. v E ker(1). Consider the linear transformation T: V~ JR.

CnVn = O. ~n E lR such that ~lvi + '" + ~rvr + ~r+I v r+ 1 + . + cnT(vn) = 0.. ... + ~nvn' so that T(u) = ~IT(vl) + . Since {VI' . T(vn).. + cnvn) = 0. . Then there exist ~I' .. it follows that c i = . a contradiction since civ i + . + cnT(vn) = O.. Remark...Cr+l Vr+l . ... vn} is a basis of V... Suppose next that dim ker(I) = 0. + CrVr . Hence dim R(I) = n.. Suppose that cr+I' . .. = cr = cr+I = . r < n... vr' vr + 1. vn } of V. + crvr ... Suppose that U= UE V. Let {vI' .. . and the result again follows immediately. elements of R(I) are linear combinations of T(v l ). . cn E lR and cr+IT(vr+l) + . . T(vn)} is a basis of R(I). . and the result follows immediately. vr } be a basis ofker(I).. . where 1 ::::.. Hence there exist c I ' .... T(v n) are linearly independent in W.. it follows from equation that T(cr+Ivr+I + ... . cn E lR. This basis can be extended to a basis {vI' . and so R(I) = {O}. It suces toshow that {T(vr+I)' .. +cnVn... . so that ker(I) = {O}. We sometimes say that dim R(I) and dim ker(I) are respectively the rank and the nullity of the linear transformation T. .. We may therefore assume that dim ker(I) = r.. . By linearity. On the other hand. ........... vn } is a basis of V.... . so that C I vI + . = cn = O... If {vI' . = cn = O.f:.. + ~nT(vn)' It follows that spans R(I). so that cr+lvr+1 + .. such that cIT(v l ) + . . It remains to prove that its elements are linearly independent. + cnvn = civ i + ... . + ~nT(vn) = ~r+IT(vr+l) + . 272 < Linear Transformations Proof Suppose first of all that dim ker(1) = n. We need to show that cr+1 = . + cnvn E ker(I). Then ker(I) = V... + ~rT(vr) + ~r+IT(vr+1) + . cr E lR such that cr+lvr+1 + .... for otherwise there exist c l ' . O. .. so that T(c i vI + . not all zero. + cnvn) = 0. then it follows that T(v l )....

Given any u'. v E V such that r-I(w) = u and r-I(z) = v. (c) The range of T is V.T(u') = T(u' . Suppose that T: V -7 W is a linear transformation from a real vector space V into a real vector space WF. R(T) -7 V is a linear transformation.u'') = 0 if and only if u' . in other words. Then r-I. Z E R(1). We have the following generalization of PROPOSITION. we have u' = u" whenever T(u') = T(u"). where u E Vis the unique vector satisfying T(u) = w. Suppose that T: V -7 W is a one-to-one linear transformation from a real vector space V into a real vector space W. Then there exist u. (a) The linear operator T is one-to-one. Proof (=» Clearly 0 E ker(1). It follows that T(u) = wand T(v) = z. Then for every W E R(1). u" E V . Proposition. Suppose that ker(1) ::j: {O}. Definition. Proposition.Linear Transformations 273 Inverse Linear Transformations In this section. Then T is one-to-one if and only if ker(1) = {O}. R(1) = V. . Suppose that T: V -7 V is a linear operator on a finite-dimensional real vector space V. Proof Suppose that w.u" = 0. we generalize some of the ideas first discussed in Section. A linear transformation T: V -7 W from a real vector space V into a real vector space Wis said to be one-to-one iffor every u'. (b) We have ker(1) = {g}. Suppose that T:V-7W is a one-to-one linear transformation from a real vector space V into a real vector space W. there exists exactly one u E V such that T(u) = W.I We can therefore dene a transformation r-I : R(1) -7 V by writing r-I(w) = u. we have T(u') . It follows that T(v) = T(O). Then the following statements are equivalent. Proof The equivalence of (a) and (b) is established by PROPOSITION. Proposition. if and only if u' = u". The equivalence of (b) and (c) follows from PROPOSITION. so that T(u + v) = T(u) + T(v) = w + z. Then there exists a nonzero v E ker(1). and so T is not one-to-one. in other words. (( ¢:::) Suppose that ker(1) = {O}. u" E V.

with range R(<1» I : = lR n • Furthermore. then it is possible to describe T indirectly in terms of some matrix A. with dim V = n and dim W = m.. Suppose further that the vector spaces V"and Ware finite dimensional. where <I>(v) = [v]B for every v E V.. vn }. where PI' . Suppose next that C = {wI' . + Pnvn. W.. with range R( lR n -I) = V. where \jJ (w) = [w]c for every w E now have the following diagram of linear transformations.. . We shall show that if we make use of a basis B of V and a basis C of W. Proposition...s a basis of V. W. Consider now a transformation <I> : V ~ lR. Suppose further that C E IR . TI : V ~ U is one-to-one.T2-1.. Suppose that V. the inverse linear transformation <1>- lR n ~ V is also one-to-one... Then every vector v E V can be written uniquely as a linear combination v = PI VI + . Then T(cu) = cw. Suppose that B = {VI' . The proof of the following result is straightforward. so that jl(cw) = cu = cjl(w). Then we can dene a linear transformation \jJ : W ~ lR m . Then the tramformation <I> : V~ W. Suppose further that T\ : V ~ Wand T2 : W ~ U are one-to-one linear transformations. U are real vector spaces. where <I>(v) = [v]Bsatisesforevery v E V. in view of our knowledge concerning inverse functions. and (b) (T2 . . n . Let us recall some discussion in Section. Proposition. .274 whence • jl(w + z) = U + v = Linear Transformations jl(w) + jl(z). in a similar way. .. Titl = 1}-I. We also have the following result concerning compositions of linear transformations and which requires no further proof. Matrices of General Linear Transformations Suppose that T: V ~ W is a linear transformation from a real vector space V to a real vector space W. Suppose that the real vector space V has basis B = {VI' . Pn E lR The matrix is the coordinate matrix of v relative to the basis B. Then (a) The linear transformation T2 . The main idea is to make use of coordinate inatrices relative to the bases Band C. vn } . is a one-to-one linear transformation. We .wm} is a basis of W.

.. We now have the following diagram of linear transformations. For every j See) = (\jf 0 T 0 <I> -I ) (e) = \If (T(<I>- = I. = (S(e l ) ••• . V _ _ _~T_ _ _~) W -I 0/ 0/ JR" Clearly the composition n -I III S = \jf. The matrix A given by equatioin is called the matrix for the linear transformation Twith respect to the bases Band C...T 0 T 0 <I> : IR. . .---~) W -I 0/ 0/ IIt _ _ _--"'S_ _ _~) ]Rnl Hence wt: can write T as the composition .. is a euclidean linear transformation.. We know from Proposition that A where {e l . T V----. and can therefore be described in term of a standard matrix A. n. Our task is to determine this matrix A in terms of rand the base~ :. where \If (w) = [w]c for every WE W . in similar way.. nd C. . Seen))' en} is the standard basis for ]Rn.. . [T(vn)]c)' Definition.:...275 Linear Transformations Suppose nextthat {w l' .. 1 It follows that A = ([T(vl)]c . ~ IR. We now have the following diagram of linear transformations. W m} is basis of W. Then we can define linear transformation \1' : W ~ ]Rm. we have (e))) = \If (T(v)) = [T(v) )]c.

We now have the following diagram of linear transformations:

          T
     V ---------> W
     |            ^
   Φ |            | ψ⁻¹
     v            |
    ℝ^n --------> ℝ^m
          S

Hence we can write T as the composition
T = ψ⁻¹ ∘ S ∘ Φ: V → W.
For every v ∈ V, we have the chain
v → [v]_B → A[v]_B → T(v),
where the middle step is multiplication by the standard matrix A of S. More precisely, if v = β_1 v_1 + ... + β_n v_n, so that [v]_B = (β_1, ..., β_n)^T, and if A[v]_B = (γ_1, ..., γ_m)^T, then
T(v) = ψ⁻¹(A[v]_B) = γ_1 w_1 + ... + γ_m w_m.
We have proved the following result.

Proposition. Suppose that T: V → W is a linear transformation from a real vector space V into a real vector space W. Suppose further that V and W are finite dimensional, with bases B and C respectively, and that A is the matrix for the linear transformation T with respect to the bases B and C. Then for every v ∈ V, we have T(v) = w, where w ∈ W is the unique vector satisfying [w]_C = A[v]_B.

Remark. In the special case when V = W, the linear transformation T: V → W is a linear operator on V. Of course, we may choose a basis B for the domain V of T and a basis C for the codomain V of T. In the case when T is not the identity operator, we often choose B = C for the sake of convenience; we then say that A is the matrix for the linear operator T with respect to the basis B. In the case when T is the identity linear operator, we often choose B ≠ C, since this represents a change of basis.

Example. Consider the linear operator T: P_3 → P_3 on the real vector space P_3 of all polynomials with real coefficients and degree at most 3, where for every polynomial p(x) in P_3 we have T(p(x)) = xp'(x), the product of x with the formal derivative p'(x) of p(x). The reader is invited to check that T is a linear operator. Now consider the basis B = {1, x, x^2, x^3} of P_3. The matrix for T with respect to B is given by

A = ([T(1)]_B  [T(x)]_B  [T(x^2)]_B  [T(x^3)]_B) = ([0]_B  [x]_B  [2x^2]_B  [3x^3]_B) =
    [ 0  0  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  2  0 ]
    [ 0  0  0  3 ]
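Here is a short NumPy sketch of this example; the coordinate conventions are as in the text, and the variable names are ours. A polynomial p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3 is stored as the coordinate vector (p_0, p_1, p_2, p_3) relative to B.

import numpy as np

A = np.diag([0.0, 1.0, 2.0, 3.0])      # matrix for T(p(x)) = x p'(x) with respect to B

p = np.array([1.0, 2.0, 4.0, 3.0])     # p(x) = 1 + 2x + 4x^2 + 3x^3
print(A @ p)                           # [0. 2. 8. 9.]  i.e.  2x + 8x^2 + 9x^3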

Suppose that p(x) = 1 + 2x + 4x^2 + 3x^3. Then
[p(x)]_B = (1, 2, 4, 3)^T and A[p(x)]_B = (0, 2, 8, 9)^T,
so that T(p(x)) = 2x + 8x^2 + 9x^3. This can be easily verified by noting that
T(p(x)) = xp'(x) = x(2 + 8x + 9x^2) = 2x + 8x^2 + 9x^3.
In general, if p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3, then
[p(x)]_B = (p_0, p_1, p_2, p_3)^T and A[p(x)]_B = (0, p_1, 2p_2, 3p_3)^T,
so that T(p(x)) = p_1 x + 2p_2 x^2 + 3p_3 x^3. Observe that
T(p(x)) = xp'(x) = x(p_1 + 2p_2 x + 3p_3 x^2) = p_1 x + 2p_2 x^2 + 3p_3 x^3,
verifying our result.

Example. Suppose that T: ℝ^n → ℝ^m is a linear transformation, and that B and C are the standard bases for ℝ^n and ℝ^m respectively. Then the matrix for T with respect to B and C is given by
A = ([T(e_1)]_C ... [T(e_n)]_C) = (T(e_1) ... T(e_n)),
so it follows that A is simply the standard matrix for T.

Example. Consider the linear operator T: ℝ^2 → ℝ^2, given by T(x_1, x_2) = (2x_1 + x_2, x_1 + 3x_2) for every (x_1, x_2) ∈ ℝ^2. Consider also the basis B = {(1, 0), (1, 1)} of ℝ^2. Then the matrix for T with respect to B is given by
A = ([T(1, 0)]_B  [T(1, 1)]_B) = ([(2, 1)]_B  [(3, 4)]_B) =
    [ 1  -1 ]
    [ 1   4 ]
Suppose that (x_1, x_2) = (3, 1). Then
[(3, 1)]_B = (2, 1)^T and A[(3, 1)]_B = (1, 6)^T,
so that T(3, 1) = (1, 0) + 6(1, 1) = (7, 6). This can be easily verified directly. In general, we have
[(x_1, x_2)]_B = (x_1 − x_2, x_2)^T and A[(x_1, x_2)]_B = (x_1 − 2x_2, x_1 + 3x_2)^T,
so that T(x_1, x_2) = (x_1 − 2x_2)(1, 0) + (x_1 + 3x_2)(1, 1) = (2x_1 + x_2, x_1 + 3x_2), verifying our result.
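The same matrix can also be reached through the standard matrix of T. The sketch below, with variable names of our choosing, uses the observation that if the basis vectors are the columns of C, then [u]_B = C⁻¹u, so the matrix for T with respect to B equals C⁻¹SC, where S is the standard matrix of T.

import numpy as np

S = np.array([[2.0, 1.0],
              [1.0, 3.0]])             # standard matrix: T(x1,x2) = (2x1+x2, x1+3x2)
C = np.array([[1.0, 1.0],
              [0.0, 1.0]])             # columns are the basis vectors (1,0) and (1,1)

A = np.linalg.inv(C) @ S @ C
print(A)                               # [[ 1. -1.]
                                       #  [ 1.  4.]]

x = np.array([3.0, 1.0])
coords = np.linalg.solve(C, x)         # [(3,1)]_B = (2, 1)
print(C @ (A @ coords))                # [7. 6.] = T(3,1), computed via A
print(S @ x)                           # [7. 6.] = T(3,1), computed directly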

Suppose now that T_1: V → W and T_2: W → U are linear transformations, where the real vector spaces V, W, U are finite dimensional, with respective bases B = {v_1, ..., v_n}, C = {w_1, ..., w_m} and D = {u_1, ..., u_k}. We then have the following diagram of linear transformations:

          T_1           T_2
     V ---------> W ---------> U
     |            |            |
   Φ |          ψ |            | η
     v            v            v
    ℝ^n --------> ℝ^m -------> ℝ^k
          S_1           S_2

Here η: U → ℝ^k, where η(u) = [u]_D for every u ∈ U, and
S_1 = ψ ∘ T_1 ∘ Φ⁻¹: ℝ^n → ℝ^m and S_2 = η ∘ T_2 ∘ ψ⁻¹: ℝ^m → ℝ^k
are euclidean linear transformations. Suppose that A_1 and A_2 are the standard matrices for S_1 and S_2, so that they are respectively the matrix for T_1 with respect to the bases B and C and the matrix for T_2 with respect to the bases C and D. Clearly
S_2 ∘ S_1 = η ∘ T_2 ∘ T_1 ∘ Φ⁻¹: ℝ^n → ℝ^k.
It follows that A_2 A_1 is the standard matrix for S_2 ∘ S_1, and so is the matrix for T_2 ∘ T_1 with respect to the bases B and D. To summarize, we have the following result.

Proposition. Suppose that T_1: V → W and T_2: W → U are linear transformations, where the real vector spaces V, W, U are finite dimensional, with bases B, C, D respectively. Suppose further that A_1 is the matrix for the linear transformation T_1 with respect to the bases B and C, and that A_2 is the matrix for the linear transformation T_2 with respect to the bases C and D. Then A_2 A_1 is the matrix for the linear transformation T_2 ∘ T_1 with respect to the bases B and D.

Example. Consider the linear operator T_1: P_3 → P_3, where for every polynomial p(x) in P_3 we have T_1(p(x)) = xp'(x). We have already shown that the matrix for T_1 with respect to the basis B = {1, x, x^2, x^3} of P_3 is given by

A_1 = [ 0  0  0  0 ]
      [ 0  1  0  0 ]
      [ 0  0  2  0 ]
      [ 0  0  0  3 ]

Consider next the linear operator T_2: P_3 → P_3, where for every polynomial q(x) = q_0 + q_1 x + q_2 x^2 + q_3 x^3 in P_3, we have
T_2(q(x)) = q(1 + x) = q_0 + q_1(1 + x) + q_2(1 + x)^2 + q_3(1 + x)^3.
We have T_2(1) = 1, T_2(x) = 1 + x, T_2(x^2) = 1 + 2x + x^2 and T_2(x^3) = 1 + 3x + 3x^2 + x^3, so that the matrix for T_2 with respect to B is given by

A_2 = ([T_2(1)]_B  [T_2(x)]_B  [T_2(x^2)]_B  [T_2(x^3)]_B) =
      [ 1  1  1  1 ]
      [ 0  1  2  3 ]
      [ 0  0  1  3 ]
      [ 0  0  0  1 ]

Consider now the composition T = T_2 ∘ T_1: P_3 → P_3, and let A denote the matrix for T with respect to B. By the proposition above, we have

A = A_2 A_1 = [ 0  1  2  3 ]
              [ 0  1  4  9 ]
              [ 0  0  2  9 ]
              [ 0  0  0  3 ]

Suppose that p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3. Then
[p(x)]_B = (p_0, p_1, p_2, p_3)^T and A[p(x)]_B = (p_1 + 2p_2 + 3p_3, p_1 + 4p_2 + 9p_3, 2p_2 + 9p_3, 3p_3)^T,
so that
T(p(x)) = (p_1 + 2p_2 + 3p_3) + (p_1 + 4p_2 + 9p_3)x + (2p_2 + 9p_3)x^2 + 3p_3 x^3.
We can check this directly by noting that
T(p(x)) = T_2(T_1(p(x))) = T_2(p_1 x + 2p_2 x^2 + 3p_3 x^3) = p_1(1 + x) + 2p_2(1 + x)^2 + 3p_3(1 + x)^3
= (p_1 + 2p_2 + 3p_3) + (p_1 + 4p_2 + 9p_3)x + (2p_2 + 9p_3)x^2 + 3p_3 x^3.
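A brief numerical check of this composition, using the two matrices just computed (the variable names are ours), is the following.

import numpy as np

A1 = np.diag([0.0, 1.0, 2.0, 3.0])          # matrix for T_1(p) = x p'(x) w.r.t. B
A2 = np.array([[1.0, 1.0, 1.0, 1.0],        # matrix for T_2(q(x)) = q(1 + x) w.r.t. B
               [0.0, 1.0, 2.0, 3.0],
               [0.0, 0.0, 1.0, 3.0],
               [0.0, 0.0, 0.0, 1.0]])

A = A2 @ A1                                 # matrix for T_2 o T_1 w.r.t. B
print(A)
# [[0. 1. 2. 3.]
#  [0. 1. 4. 9.]
#  [0. 0. 2. 9.]
#  [0. 0. 0. 3.]]

p = np.array([1.0, 2.0, 4.0, 3.0])          # p(x) = 1 + 2x + 4x^2 + 3x^3
print(A @ p)                                # [19. 45. 35.  9.], i.e. 19 + 45x + 35x^2 + 9x^3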

Example. Consider the linear operator T: ℝ^2 → ℝ^2, given by T(x_1, x_2) = (2x_1 + x_2, x_1 + 3x_2) for every (x_1, x_2) ∈ ℝ^2. We have already shown that the matrix for T with respect to the basis B = {(1, 0), (1, 1)} of ℝ^2 is given by
A = [ 1  -1 ]
    [ 1   4 ]
Consider the linear operator T^2 = T ∘ T: ℝ^2 → ℝ^2. By Proposition 8T, the matrix for T^2 with respect to B is given by
A^2 = [ 1  -1 ][ 1  -1 ] = [ 0  -5 ]
      [ 1   4 ][ 1   4 ]   [ 5  15 ]
Suppose that (x_1, x_2) ∈ ℝ^2. Then
[(x_1, x_2)]_B = (x_1 − x_2, x_2)^T and A^2[(x_1, x_2)]_B = (−5x_2, 5x_1 + 10x_2)^T,
so that
T^2(x_1, x_2) = −5x_2(1, 0) + (5x_1 + 10x_2)(1, 1) = (5x_1 + 5x_2, 5x_1 + 10x_2).
The reader is invited to check this directly.

A simple consequence of Propositions 8N and 8T is the following result concerning inverse linear transformations.

Proposition. Suppose that T: V → V is a linear operator on a finite dimensional real vector space V with basis B. Suppose further that A is the matrix for the linear operator T with respect to the basis B. Then T is one-to-one if and only if A is invertible. Furthermore, if T is one-to-one, then A⁻¹ is the matrix for the inverse linear operator T⁻¹: V → V with respect to the basis B.

Proof. Simply note that T is one-to-one if and only if the system Ax = 0 has only the trivial solution x = 0; in other words, if and only if A is invertible. The last assertion follows easily from Proposition 8T, since if A' denotes the matrix for the inverse linear operator T⁻¹ with respect to B, then we must have A'A = I, the matrix for the identity operator T⁻¹ ∘ T with respect to B.
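Before moving on, here is a short numerical check of the T ∘ T computation in the example above; the variable names are ours, and S denotes the standard matrix of T.

import numpy as np

A = np.array([[1.0, -1.0],
              [1.0,  4.0]])             # matrix for T w.r.t. B = {(1,0), (1,1)}
C = np.array([[1.0, 1.0],
              [0.0, 1.0]])              # columns are the basis vectors of B
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])              # standard matrix of T

A2 = A @ A
print(A2)                               # [[ 0. -5.]
                                        #  [ 5. 15.]]

x = np.array([3.0, 1.0])                # check T^2 at a sample point
coords = np.linalg.solve(C, x)          # [(3,1)]_B
print(C @ (A2 @ coords))                # [20. 25.], via A^2 and coordinates
print(S @ (S @ x))                      # [20. 25.], computed directly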

Example. Consider the linear operator T: P_3 → P_3, where for every polynomial q(x) = q_0 + q_1 x + q_2 x^2 + q_3 x^3 in P_3, we have
T(q(x)) = q(1 + x) = q_0 + q_1(1 + x) + q_2(1 + x)^2 + q_3(1 + x)^3.
We have already shown that the matrix for T with respect to the basis B = {1, x, x^2, x^3} is given by
A = [ 1  1  1  1 ]
    [ 0  1  2  3 ]
    [ 0  0  1  3 ]
    [ 0  0  0  1 ]
This matrix is invertible, so it follows that T is one-to-one. Furthermore, it can be checked that
A⁻¹ = [ 1  -1   1  -1 ]
      [ 0   1  -2   3 ]
      [ 0   0   1  -3 ]
      [ 0   0   0   1 ]
Suppose that p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3. Then
[p(x)]_B = (p_0, p_1, p_2, p_3)^T and A⁻¹[p(x)]_B = (p_0 − p_1 + p_2 − p_3, p_1 − 2p_2 + 3p_3, p_2 − 3p_3, p_3)^T,
so that
T⁻¹(p(x)) = (p_0 − p_1 + p_2 − p_3) + (p_1 − 2p_2 + 3p_3)x + (p_2 − 3p_3)x^2 + p_3 x^3
= p_0 + p_1(x − 1) + p_2(x^2 − 2x + 1) + p_3(x^3 − 3x^2 + 3x − 1)
= p_0 + p_1(x − 1) + p_2(x − 1)^2 + p_3(x − 1)^3 = p(x − 1).
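The conclusion T⁻¹(p(x)) = p(x − 1) is easy to test numerically. The sketch below, with variable names of our choosing, inverts the matrix above and applies it to the coordinates of a sample polynomial.

import numpy as np

A = np.array([[1.0, 1.0, 1.0, 1.0],     # matrix for T(q(x)) = q(1 + x) w.r.t. B
              [0.0, 1.0, 2.0, 3.0],
              [0.0, 0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0, 1.0]])

A_inv = np.linalg.inv(A)
print(np.round(A_inv).astype(int))
# [[ 1 -1  1 -1]
#  [ 0  1 -2  3]
#  [ 0  0  1 -3]
#  [ 0  0  0  1]]

p = np.array([1.0, 0.0, 0.0, 1.0])      # p(x) = 1 + x^3
print(A_inv @ p)                        # [ 0.  3. -3.  1.], i.e. 3x - 3x^2 + x^3 = p(x - 1)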

Change of Basis

Suppose that V is a finite dimensional real vector space, with one basis B = {v_1, ..., v_n} and another basis B' = {u_1, ..., u_n}. Suppose that T: V → V is a linear operator on V. Let A denote the matrix for T with respect to the basis B, and let A' denote the matrix for T with respect to the basis B'. If v ∈ V and T(v) = w, then
[w]_B = A[v]_B and [w]_B' = A'[v]_B'.
We wish to find the relationship between A' and A.

Recall that if
P = ([u_1]_B ... [u_n]_B)
denotes the transition matrix from the basis B' to the basis B, then
[v]_B = P[v]_B' and [w]_B = P[w]_B'.
Note that the matrix P can also be interpreted as the matrix for the identity operator I: V → V with respect to the bases B' and B. It is easy to see that the matrix P is invertible, and that
P⁻¹ = ([v_1]_B' ... [v_n]_B')
denotes the transition matrix from the basis B to the basis B', and can also be interpreted as the matrix for the identity operator I: V → V with respect to the bases B and B'. We conclude that
[w]_B' = P⁻¹[w]_B = P⁻¹A[v]_B = P⁻¹AP[v]_B'.
Comparing this with the identity [w]_B' = A'[v]_B', we conclude that
P⁻¹AP = A'.
This implies that A = PA'P⁻¹. We have proved the following result.

Proposition. Suppose that T: V → V is a linear operator on a finite dimensional real vector space V, with bases B = {v_1, ..., v_n} and B' = {u_1, ..., u_n}. Suppose further that A and A' are the matrices for T with respect to the basis B and with respect to the basis B' respectively. Then P⁻¹AP = A' and PA'P⁻¹ = A, where
P = ([u_1]_B ... [u_n]_B)
denotes the transition matrix from the basis B' to the basis B.

Remark. We can use the notation A = [T]_B and A' = [T]_B' to denote that A and A' are the matrices for T with respect to the basis B and with respect to the basis B' respectively. We can also write P = [I]_B',B to denote that P is the transition matrix from the basis B' to the basis B, so that P⁻¹ = [I]_B,B'. Then the two identities in the proposition become respectively
[T]_B' = [I]_B,B' [T]_B [I]_B',B and [T]_B = [I]_B',B [T]_B' [I]_B,B'.

Remarks. (1) We have the following picture:

          T
     v ---------> w
     |            |
     v     A'     v
  [v]_B' -----> [w]_B'
     |            |
   P |            | P
     v     A      v
   [v]_B -----> [w]_B

(2) The idea can be extended to the case of linear transformations T: V → W from a finite dimensional real vector space into another, with a change of basis in V and a change of basis in W.
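The identity A' = P⁻¹AP can be checked mechanically for any operator and transition matrix. The following sketch uses randomly generated matrices purely as a sanity check (the random P is assumed invertible, which holds with overwhelming probability); none of these particular matrices come from the text.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))         # matrix for T with respect to B
P = rng.standard_normal((3, 3))         # transition matrix from B' to B (assumed invertible)

A_prime = np.linalg.inv(P) @ A @ P

v_Bprime = rng.standard_normal(3)       # coordinates of some v relative to B'
v_B = P @ v_Bprime                      # the same v relative to B

w_B = A @ v_B                           # [w]_B
w_Bprime = A_prime @ v_Bprime           # [w]_B'

print(np.allclose(P @ w_Bprime, w_B))   # True: the two descriptions agree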

Example. Consider the vector space P_3 of all polynomials with real coefficients and degree at most 3, with bases B = {1, x, x^2, x^3} and B' = {1, 1 + x, 1 + x + x^2, 1 + x + x^2 + x^3}. Consider also the linear operator T: P_3 → P_3, where for every polynomial p(x) = p_0 + p_1 x + p_2 x^2 + p_3 x^3 we have
T(p(x)) = (p_0 + p_1) + (p_1 + p_2)x + (p_2 + p_3)x^2 + (p_0 + p_3)x^3.
Let A denote the matrix for T with respect to the basis B. Then T(1) = 1 + x^3, T(x) = 1 + x, T(x^2) = x + x^2 and T(x^3) = x^2 + x^3, and so
A = ([T(1)]_B  [T(x)]_B  [T(x^2)]_B  [T(x^3)]_B) =
    [ 1  1  0  0 ]
    [ 0  1  1  0 ]
    [ 0  0  1  1 ]
    [ 1  0  0  1 ]
Next, note that the transition matrix from the basis B' to the basis B is given by
P = [ 1  1  1  1 ]
    [ 0  1  1  1 ]
    [ 0  0  1  1 ]
    [ 0  0  0  1 ]
It can be checked that
P⁻¹ = [ 1  -1   0   0 ]
      [ 0   1  -1   0 ]
      [ 0   0   1  -1 ]
      [ 0   0   0   1 ]
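The matrices A and P⁻¹ above are easy to reproduce numerically. In the sketch below, the helper T_coords (our name, not the book's) encodes the action of T on B-coordinates, and the columns of A are obtained by applying it to the standard basis vectors.

import numpy as np

def T_coords(p):
    # (p0, p1, p2, p3) -> coordinates of T(p(x)) with respect to B
    p0, p1, p2, p3 = p
    return np.array([p0 + p1, p1 + p2, p2 + p3, p0 + p3])

A = np.column_stack([T_coords(e) for e in np.eye(4)])
P = np.array([[1.0, 1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])

print(A.astype(int))                            # matches the matrix A above
print(np.round(np.linalg.inv(P)).astype(int))   # matches the stated P^{-1}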

It now follows that
A' = P⁻¹AP = [  1   1   0   0 ]
             [  0   1   1   0 ]
             [ -1  -1   0   0 ]
             [  1   1   1   2 ]
is the matrix for T with respect to the basis B'. These can be verified directly. Indeed, reading off the columns of A', we obtain
T(1) = 1 − (1 + x + x^2) + (1 + x + x^2 + x^3) = 1 + x^3,
T(1 + x) = 1 + (1 + x) − (1 + x + x^2) + (1 + x + x^2 + x^3) = 2 + x + x^3,
T(1 + x + x^2) = (1 + x) + (1 + x + x^2 + x^3) = 2 + 2x + x^2 + x^3,
T(1 + x + x^2 + x^3) = 2(1 + x + x^2 + x^3) = 2 + 2x + 2x^2 + 2x^3.

Eigenvalues and Eigenvectors

Definition. Suppose that T: V → V is a linear operator on a finite dimensional real vector space V. Then any real number λ ∈ ℝ is called an eigenvalue of T if there exists a non-zero vector v ∈ V such that T(v) = λv. This non-zero vector v ∈ V is called an eigenvector of T corresponding to the eigenvalue λ.

The purpose of this section is to show that the problem of the eigenvalues and eigenvectors of the linear operator T can be reduced to the problem of the eigenvalues and eigenvectors of the matrix for T with respect to any basis B of V. The starting point of our argument is the following result.

Proposition. Suppose that T: V → V is a linear operator on a finite dimensional real vector space V. Suppose further that A is the matrix for T with respect to a basis B of V. Then
(a) the eigenvalues of T are precisely the eigenvalues of A; and
(b) a vector u ∈ V is an eigenvector of T corresponding to an eigenvalue λ if and only if the coordinate matrix [u]_B is an eigenvector of A corresponding to the eigenvalue λ.

We also state without proof the following result; the proof is left as an exercise.

Proposition. Suppose that T: V → V is a linear operator on a finite dimensional real vector space V, with bases B and B'. Suppose further that A and A' are the matrices for T with respect to the basis B and with respect to the basis B' respectively. Then
(a) det A = det A';
(b) A and A' have the same rank;
(c) A and A' have the same characteristic polynomial;
(d) A and A' have the same eigenvalues; and
(e) the dimension of the eigenspace of A corresponding to an eigenvalue λ is equal to the dimension of the eigenspace of A' corresponding to λ.
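Parts (a), (c) and (d) of the last proposition can be illustrated with the change-of-basis example above; the sketch below, with our own variable names, checks that A and A' = P⁻¹AP have the same determinant and the same eigenvalues.

import numpy as np

A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [1.0, 0.0, 0.0, 1.0]])
P = np.array([[1.0, 1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0, 1.0]])

A_prime = np.linalg.inv(P) @ A @ P
print(np.round(A_prime).astype(int))    # matches the matrix A' computed above

evals_A = np.sort_complex(np.linalg.eigvals(A))
evals_Ap = np.sort_complex(np.linalg.eigvals(A_prime))
print(np.allclose(evals_A, evals_Ap))                        # True: same eigenvalues
print(np.isclose(np.linalg.det(A), np.linalg.det(A_prime)))  # True: same determinant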

Suppose now that A is the matrix for a linear operator T: V → V on a finite dimensional real vector space V with respect to a basis B = {v_1, ..., v_n}. If A can be diagonalized, then there exists an invertible matrix P such that
P⁻¹AP = D
is a diagonal matrix. Furthermore, the columns of P are eigenvectors of A, and so are the coordinate matrices of eigenvectors of T with respect to the basis B. In other words,
P = ([u_1]_B ... [u_n]_B),
where B' = {u_1, ..., u_n} is a basis of V consisting of eigenvectors of T. Furthermore, P is the transition matrix from the basis B' to the basis B. It follows that the matrix for T with respect to the basis B' is given by
D = [ λ_1  ...   0  ]
    [  :         :  ]
    [  0   ...  λ_n ]
where λ_1, ..., λ_n are the eigenvalues of T.

Example. Consider the vector space P_2 of all polynomials with real coefficients and degree at most 2, with basis B = {1, x, x^2}. Consider also the linear operator T: P_2 → P_2, where for every polynomial p(x) = p_0 + p_1 x + p_2 x^2 we have
T(p(x)) = (5p_0 − 2p_1) + (6p_1 + 2p_2 − 2p_0)x + (2p_1 + 7p_2)x^2.
Then T(1) = 5 − 2x, T(x) = −2 + 6x + 2x^2 and T(x^2) = 2x + 7x^2, so that the matrix for T with respect to the basis B is given by
A = ([T(1)]_B  [T(x)]_B  [T(x^2)]_B) = [  5  -2   0 ]
                                       [ -2   6   2 ]
                                       [  0   2   7 ]
It is a simple exercise to show that the matrix A has eigenvalues 3, 6 and 9, with corresponding eigenvectors
x_1 = (2, 2, −1)^T, x_2 = (2, −1, 2)^T and x_3 = (−1, 2, 2)^T.
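A quick numerical check of this eigenvalue claim, with variable names of our choosing, is the following.

import numpy as np

A = np.array([[ 5.0, -2.0, 0.0],        # matrix for T with respect to B = {1, x, x^2}
              [-2.0,  6.0, 2.0],
              [ 0.0,  2.0, 7.0]])

x1 = np.array([ 2.0,  2.0, -1.0])
x2 = np.array([ 2.0, -1.0,  2.0])
x3 = np.array([-1.0,  2.0,  2.0])

print(A @ x1, 3 * x1)                   # equal, so A x1 = 3 x1
print(A @ x2, 6 * x2)                   # equal, so A x2 = 6 x2
print(A @ x3, 9 * x3)                   # equal, so A x3 = 9 x3

print(np.linalg.eigvalsh(A))            # [3. 6. 9.], since A is symmetric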

Writing
P = [  2   2  -1 ]
    [  2  -1   2 ]
    [ -1   2   2 ]
we have
P⁻¹AP = D = [ 3  0  0 ]
            [ 0  6  0 ]
            [ 0  0  9 ]
Now let B' = {p_1(x), p_2(x), p_3(x)}, where
[p_1(x)]_B = (2, 2, −1)^T, [p_2(x)]_B = (2, −1, 2)^T and [p_3(x)]_B = (−1, 2, 2)^T;
clearly p_1(x) = 2 + 2x − x^2, p_2(x) = 2 − x + 2x^2 and p_3(x) = −1 + 2x + 2x^2. Then P is the transition matrix from the basis B' to the basis B, and D is the matrix for T with respect to the basis B'. Note now that
T(p_1(x)) = T(2 + 2x − x^2) = 6 + 6x − 3x^2 = 3p_1(x),
T(p_2(x)) = T(2 − x + 2x^2) = 12 − 6x + 12x^2 = 6p_2(x),
T(p_3(x)) = T(−1 + 2x + 2x^2) = −9 + 18x + 18x^2 = 9p_3(x).