
MA212 Further Mathematical Methods


Part II: Linear Algebra
In this pack you will find a set of notes and exercises for the second half of the course
MA212. These materials are identical to those used for the half-unit course MA201,
which has now been discontinued. The notes and lectures are still appropriate for
any students registered for MA201.
The notes are heavily based on those prepared by Dr Robert Simon, who lectured the
course for several years up to 2007, and Professor Graham Brightwell, who taught
the course until 2013. There is some material in the notes that won’t be in the
lectures – students do not need to study this material for the exam. In particular,
the notes include more proofs than will be covered in lectures. This is primarily a
methods course, and most of what we will be studying will be the methods used in
linear algebra, with a few applications. However, an understanding of the theory is
also important in order to derive the maximum benefit from the course.
The notes are organised into 20 lectures, but the lectures may not stick exactly to
this schedule.
Exercises are at the end of these notes. Exercises will be set each week, for classes in
the following week. Answers to the exercises will appear on the Moodle page after
the classes.

Lecture 1: Linear Independence


1.1 What is a vector space?
A vector space is a set V equipped with notions of “addition” and “scalar multipli-
cation”, so that: (i) the sum v1 + v2 of any two elements (vectors) v1 , v2 in V is
also in V , (ii) the scalar multiple av is also in V , whenever v is in V and a is any
scalar.
Usually, we shall take the set of scalars to be the set R of all real numbers: then we
say that V is a real vector space. Sometimes we shall take the set of scalars to be
the set C of all complex numbers: then V is a complex vector space.

Addition and scalar multiplication of vectors in a vector space satisfy the following
properties, all of which hold for all vectors u, v, w in the vector space, and all
scalars a, b:
(i) commutativity of vector addition: u + v = v + u,
(ii) associativity of vector addition: u + (v + w) = (u + v) + w,
(iii) distributivity of scalar multiplication: a(u+v) = au+av and (a+b)u = au+bu,
(iv) associativity of scalar multiplication: (ab)u = a(bu),
(v) there is a zero vector: a special vector 0 such that 0 + v = v + 0 = v for all
vectors v,
(vi) special property of the scalar 1: 1u = u.

A vector space is called trivial if it consists only of the zero vector 0.

1.2 Some examples of vector spaces


The most common examples of vector spaces, and the ones we will work with in this
course most of the time, are the real vector space

Rn := {(a1 , a2 , . . . , an )t | ai ∈ R for all i}

and the complex vector space

Cn := {(a1 , a2 , . . . , an )t | ai ∈ C for all i},

for n a positive integer.

Vectors in Rn or Cn are “column vectors”, written vertically, but we often write


them as, e.g., (0, 1, 2)t = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}. Here the t-superscript indicates the transpose. If
a vector in Rn or Cn is written as a letter, then it will be in bold face (e.g., v) –

when a vector is handwritten, either by the lecturer in lectures or by students in


homework, vectors should be underlined to distinguish them from scalars.

Sometimes we will consider other vector spaces. For example, the set P of all
polynomial functions is a vector space, as is the set of all functions from R to R, or
(e.g.) from [0, 1] to R, with [af + bg](x) defined in each case to be af (x) + bg(x).
Another example, which we shall look at in more detail shortly, is the set D of all
functions f : R → R that have derivatives of all orders, i.e., such that f 0 , f 00 , f 000 , . . .
are all defined.
Another vector space that we shall encounter later is the set ℓ2 of infinite sequences
(a0 , a1 , . . . ) of real numbers such that \sum_{i=0}^{\infty} a_i^2 < ∞, i.e., the sum of the squares
is finite. Addition and scalar multiplication in ℓ2 are defined by: (a0 , a1 , . . . ) +
(b0 , b1 , . . . ) = (a0 + b0 , a1 + b1 , . . . ) and r(a0 , a1 , . . . ) = (ra0 , ra1 , . . . ), for r a real
number and (a0 , a1 , . . . ) and (b0 , b1 , . . . ) in ℓ2 .

1.3 Linear independence


Let V be a real or complex vector space. A finite subset {v1 , v2 , . . . , vk } of vectors
in V is defined to be linearly dependent if there are scalars a1 , . . . , ak , not all zero,
such that a1 v1 + a2 v2 + · · · + ak vk = 0.
A set of vectors that is not linearly dependent is called linearly independent. So the
set {v1 , v2 , . . . , vk } is linearly independent if the only scalars a1 , . . . , ak satisfying
a1 v1 + a2 v2 + · · · + ak vk = 0 are a1 = a2 = · · · = ak = 0.
Observe that the “singleton” set {v} is linearly dependent if and only if v = 0.
Indeed, if {v} is linearly dependent, then there is a scalar a ≠ 0 such that av = 0;
dividing by a then gives v = (1/a)0 = 0. On the other hand, 1 · 0 = 0, and therefore {0}
is linearly dependent.

For example, consider the subset S = {(1, 0, 0, 0)t , (2, 1, 0, 0)t , (3, 2, 1, 0)t , (4, 3, 2, 1)t }
of R4 . Suppose that there are real numbers a1 , . . . , a4 such that 0 = (0, 0, 0, 0)t =
a1 (1, 0, 0, 0)t +a2 (2, 1, 0, 0)t +a3 (3, 2, 1, 0)t +a4 (4, 3, 2, 1)t . This equation is equivalent
to the system:

a1 + 2a2 + 3a3 + 4a4 = 0, a2 + 2a3 + 3a4 = 0, a3 + 2a4 = 0, a4 = 0.



In any solution, a4 = 0. Now a3 + 2a4 = 0 implies that a3 = 0, and so on: the only
solution is a1 = a2 = a3 = a4 = 0. Therefore the set S is linearly independent.
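
This kind of check can also be carried out on a computer: the set is linearly independent exactly when the matrix with these vectors as its columns has full column rank. A minimal sketch, assuming the NumPy library is available:

```python
import numpy as np

# Columns of this matrix are the four vectors of S.  The set is linearly
# independent exactly when the only solution of (matrix) a = 0 is a = 0,
# i.e. when the matrix has full column rank.
S_columns = np.array([[1, 2, 3, 4],
                      [0, 1, 2, 3],
                      [0, 0, 1, 2],
                      [0, 0, 0, 1]], dtype=float)

print(np.linalg.matrix_rank(S_columns) == S_columns.shape[1])  # True
```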

The linear span of a finite set S = {v1 , v2 , . . . , vk } of vectors in a real vector space
V is the subset

span(S) = { \sum_{i=1}^{k} a_i v_i | a_i ∈ R }

of linear combinations of the vectors in S. If the vectors are in a complex vector


space, then the sum is over all ai in C. The linear span of an infinite set A of vectors
is the set of vectors that can be written as a linear combination of finitely many
vectors from A.

A subset A of a vector space V spans the vector space V if span(A) = V , meaning


that every vector in the vector space can be written as the finite linear combination
of these vectors.

A vector space V with a finite subset A that spans V is called a finite-dimensional


vector space.

We now give some alternative characterisations of linear independence.


Theorem 1.3.1: A finite set A of vectors is linearly independent if and only if, for
every v ∈ A, the vector v is not in span(A \ {v}).

Proof: First suppose that A is linearly independent. If, for some v in A, v is in


span(A \ {v}), then v = a1 v1 + a2 v2 + · · · + ak vk for some scalars a1 , . . . , ak and
vectors v1 , v2 , . . . , vk ∈ A \ {v}, and therefore 0 = v − a1 v1 − a2 v2 − · · · − ak vk , a
contradiction.

For the converse, suppose that A = {v1 , . . . , vk } is not linearly independent, mean-
ing that there are scalars a1 , . . . , ak , not all zero, such that a1 v1 + · · · + ak vk = 0.
Without loss of generality, assume that a1 ≠ 0. Dividing by a1 gives that v1 =
−(a2/a1)v2 − · · · − (ak/a1)vk , implying that v1 is in span(A \ {v1 }). □

Theorem 1.3.2: A finite set A = {v1 , . . . , vk } is linearly independent if and only


if, for every vector w in span(A), there is a unique way to write w as a linear
combination of members of A.

Proof: Suppose that A is linearly independent. If w = a1 v1 + · · · + ak vk , and also


w = b1 v1 + · · · + bk vk , then 0 can be written as (a1 − b1 )v1 + · · · + (ak − bk )vk .
If ai ≠ bi for some i, this contradicts the linear independence of A; hence ai = bi for all i.

Conversely, if A is not linearly independent, and 0 = a1 v1 + · · · + ak vk for some
scalars ai , not all zero, then this gives a way to write the vector 0 ∈ span(A) as a linear
combination of the vectors in A that is different from the trivial combination with all
coefficients equal to zero, so the representation of 0 is not unique. □

1.4 Subspaces
In the vector space R3 , the set {(1, 0, 0)t , (0, 1, 0)t } does not span R3 . But the set
U = {(x, y, z)t ∈ R3 | z = 0} is also a real vector space, and these two vectors do
form a basis for this smaller space U .

A subset U of a vector space V is a vector subspace (or just subspace) of V if U is


also a vector space (with the addition and multiplication inherited from the larger
vector space V ). To check that a subset U of V is a vector subspace, we don’t need
to go through and verify all the properties of addition and multiplication of a vector
space, as these all hold in V .
A non-empty subset U of a vector space V is a subspace of V if and only if, whenever a1 , a2
are scalars and v1 , v2 are elements of U , the linear combination a1 v1 + a2 v2 is also
in U .

Theorem 1.4.1 If Q is a subset of a vector space V , then span(Q) is a subspace


of V .
Proof: Suppose v1 , v2 are in span(Q), meaning that v1 = \sum_{i=1}^{m} b_i q_i and v2 = \sum_{i=1}^{m} c_i q_i ,
for some scalars b1 , . . . , bm , c1 , . . . , cm , and q1 , . . . , qm in Q. Let a1 and a2
be scalars. The vector a1 v1 + a2 v2 is equal to \sum_{i=1}^{m} (a1 bi + a2 ci )qi and therefore is
in span(Q). Therefore span(Q) is a subspace. □
More Examples: {(a1 , . . . , an ) | \sum_{i=1}^{n} a_i = 0} is a subspace of Rn .
The set of all polynomial functions from R to R is a vector subspace of the set of
all functions from R to R.
The set of infinite sequences (a0 , a1 , . . . , ai , 0, 0, . . . ) of real numbers that are “even-
tually zero”, i.e., such that there is some i with aj = 0 for all j > i, is a subspace of
the vector space `2 .

1.5 Matrices and Row Operations


Recall that an n × m matrix A is an array of numbers with n rows and m columns.
The entry in row i and column j of A is usually denoted aij or (A)ij . If the entries
are real numbers, we call A a real matrix, and if they are complex we call A a complex
matrix. (Note that every real matrix can also be regarded as a complex matrix.)
If A is an n × k matrix and B is a k × m matrix, then the product AB is defined to
be the n × m matrix where (AB)ij = \sum_{l=1}^{k} a_{il} b_{lj} for every pair (i, j).

The n × n identity matrix is the n × n matrix In with 0s off the diagonal and 1s on
the diagonal. Sometimes it is more convenient to write simply I in place of In for
the identity matrix.

An n × n square matrix A is invertible if there exists another n × n matrix A−1 such


that AA−1 = A−1 A = In .

The column space of an n × m matrix A is the subspace of Rn spanned by the


vectors forming the columns of the matrix. The row space of A is the subspace of
Rm spanned by the transposes of the vectors forming the rows of the matrix.
We consider the following three (elementary) row operations on matrices:
(i) multiplying a row by a non-zero scalar,
(ii) exchanging two rows
(iii) adding a scalar multiple of one row to another.

By applying a sequence of elementary row operations to any n × m matrix A, it is


possible to transform the matrix A into a matrix B with the following properties:
(i) in every row, either all entries are zero, or the first non-zero entry in the row is
a 1 – such an entry in the matrix B is called a leading 1;
(ii) in any column containing a leading 1, all other entries are zero;
(iii) if the first non-zero entry of the ith row is in the jth column, and i′ < i, then
the first non-zero entry of the i′th row is in a column j′ with j′ < j.

A matrix B satisfying (i)-(iii) is said to be in reduced row echelon form, and the
process of converting a matrix to such a form by means of row operations is known
as Gaussian elimination.
 
For example, starting with the matrix \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix}, through row operations we can
obtain the matrix \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 2 \end{pmatrix}, which is in reduced row echelon form.
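
Such reductions can also be carried out symbolically on a computer; a minimal sketch, assuming the SymPy library is available:

```python
from sympy import Matrix

A = Matrix([[1, 2, 3],
            [3, 2, 1]])

# rref() returns the reduced row echelon form together with the indices
# of the columns containing leading 1s.
R, pivot_columns = A.rref()
print(R)              # Matrix([[1, 0, -1], [0, 1, 2]])
print(pivot_columns)  # (0, 1)
```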
Gaussian elimination gives a method for solving a system of n linear equations with
m variables:
a11 x1 + · · · + a1m xm = b1
a21 x1 + · · · + a2m xm = b2
···
an1 x1 + · · · + anm xm = bn .
This can be written as a matrix equation:
    
\begin{pmatrix} a_{11} & \ldots & a_{1m} \\ a_{21} & \ldots & a_{2m} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nm} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}.

The method is to write down the augmented matrix: the n × m matrix followed by
the last n × 1 column:

\begin{pmatrix} a_{11} & \ldots & a_{1m} & b_1 \\ a_{21} & \ldots & a_{2m} & b_2 \\ \vdots & \ddots & \vdots & \vdots \\ a_{n1} & \ldots & a_{nm} & b_n \end{pmatrix}
and then perform row operations on the augmented matrix, until the original matrix
is in reduced row echelon form. Then we can read off the solutions, if any, from the
result. A solution will not exist if, in the reduced row echelon form, there is a row
whose first m entries are all zero but whose final entry, in the added n × 1 column, is non-zero.

For example, consider the equations 3x1 + 2x3 = 6 and −x1 + x2 + x3 = 5. We get
the augmented matrix

\begin{pmatrix} 3 & 0 & 2 & 6 \\ -1 & 1 & 1 & 5 \end{pmatrix}.

Gaussian elimination yields the augmented matrix

\begin{pmatrix} 1 & 0 & 2/3 & 2 \\ 0 & 1 & 5/3 & 7 \end{pmatrix}.
Looking just at the first two columns, which are the columns containing leading 1s,
we can read off one solution: x1 = 2, x2 = 7 and x3 = 0. Additional solutions
are provided by allowing x3 to be any real number (x3 is called a free variable) and
setting x1 = 2 − (2/3)x3 and x2 = 7 − (5/3)x3 .
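
A quick numerical check of this general solution, assuming NumPy is available:

```python
import numpy as np

A = np.array([[3., 0., 2.],
              [-1., 1., 1.]])
b = np.array([6., 5.])

# Substitute a few values of the free variable x3 into
#   x1 = 2 - (2/3) x3,   x2 = 7 - (5/3) x3
# and check that A x = b each time.
for x3 in (0.0, 1.0, -2.5):
    x = np.array([2 - (2 / 3) * x3, 7 - (5 / 3) * x3, x3])
    print(np.allclose(A @ x, b))  # True
```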

Lecture 2: Bases and Dimension


2.1 Bases
A basis of a vector space V is a subset B such that B is linearly independent and
spans V .

Theorem 2.1.1: Any non-trivial finite-dimensional vector space has a basis.


Proof: Let A be a finite set of vectors that spans V . If A is not linearly independent,
then there is a vector v that is in the linear span of A \ {v}. Since v can be written
as a linear combination of the elements of A \ {v}, it follows that A \ {v} is a smaller
set of vectors that still spans V . We continue deleting vectors from our set, until
we finish with a set of vectors that spans V and is linearly independent. We must
stop when we get to a set of size 1, at the latest: if we ever do come down to a set
spanning V consisting of a single vector, that vector is non-zero and hence the set
is linearly independent. 2

The standard basis of Rn or Cn is E = {e1 , . . . , en }, where ei is defined to be


the vector (0, 0, . . . , 0, 1, 0, . . . , 0)t , where the 1 is in the ith position. Any vector
(a1 , . . . , an )t in Rn or Cn can be written uniquely as the sum a1 e1 + · · · + an en .
For infinite-dimensional vector spaces there is no simple proof of the existence of a
basis.

2.2 Dimension
Theorem 2.2.1: If V is a finite-dimensional vector space, and B is a basis of V
with n elements, then any set of n + 1 elements of V is linearly dependent.
Proof: Suppose B = {v1 , . . . , vn } is a basis of V with n elements, and let C =
{w1 , . . . , wn+1 } be any set of n + 1 elements. As B is a basis, we can write wj = \sum_{i=1}^{n} a_{ij} v_i for each j.
Consider solving the system of equations: \sum_{j=1}^{n+1} a_{ij} x_j = 0 for i = 1, 2, . . . , n.
As there are more unknowns than equations (i.e., as a matrix equation, there are
more columns than rows in the matrix), when we perform Gaussian elimination we
get at least one column without a leading 1, in other words we get at least one free
variable. This means that there is a solution (x̂1 , . . . , x̂n+1 ) to the equations with
not all the x̂j equal to 0.
As \sum_{j=1}^{n+1} a_{ij} x̂_j = 0 for all i, it follows also that (\sum_{j=1}^{n+1} a_{ij} x̂_j) v_i = 0 for all i, and
therefore

0 = \sum_{i=1}^{n} \Big( \sum_{j=1}^{n+1} a_{ij} x̂_j \Big) v_i = \sum_{j=1}^{n+1} x̂_j \sum_{i=1}^{n} a_{ij} v_i = \sum_{j=1}^{n+1} x̂_j w_j .

As not all the x̂j are zero, this shows that the set C is linearly dependent, as claimed.
2

Corollary 2.2.2: Any two bases of a finite-dimensional vector space have the same
size.

The common size of any basis of a finite-dimensional vector space V is called the
dimension of V , and written dim(V ).

2.3 Matrices and Gaussian Elimination


When we use row operations to reduce a matrix A to reduced row echelon form,
we can think of the process in terms of multiplying A on the left by a sequence of
elementary matrices.

Multiplying row i of the matrix A by the non-zero scalar a is the same as multiplying
A on the left by the diagonal n × n matrix with a in the (i, i) position and all
other diagonal entries equal to 1.

Switching row i with row j is the same as multiplication on the left by the n × n
matrix B whose entries b_{pq} are all zero except for b_{pp} = 1
if p ∉ {i, j}, and b_{ij} = b_{ji} = 1.

Adding a times the jth row to the ith row is the same as multiplication on the left
by the n × n matrix with a in the (i, j) position, 1s on the diagonal, and
all other entries equal to 0.

Definition: The matrices above, corresponding to the elementary row operations,


are called the elementary matrices.

Note that each elementary matrix has an inverse that is also an elementary matrix.
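
The correspondence between row operations and left-multiplication by elementary matrices is easy to check on a computer; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

A = np.array([[0., 2., -5.],
              [-1., 0., 1.],
              [3., -1., 2.]])

# Elementary matrix adding (-3) times row 1 to row 3 of a 3 x 3 matrix.
E = np.eye(3)
E[2, 0] = -3.0

by_multiplication = E @ A
by_row_operation = A.copy()
by_row_operation[2] -= 3 * by_row_operation[0]

print(np.allclose(by_multiplication, by_row_operation))  # True
print(np.linalg.inv(E))  # inverse adds +3 times row 1 to row 3: again elementary
```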
You should already be familiar with the notion of the determinant det(A) of a matrix
A, how to calculate it, and in particular that a square matrix A is invertible if and
only if det(A) is not zero, which in turn is the case if and only if the only solution
to Av = 0 is v = 0. Recall that a matrix is singular if its determinant is zero, and
non-singular otherwise. We use the ideas above to prove an additional property of
determinants:

Theorem 2.3.1: If A and B are n × n matrices, then det(AB) = det(A) det(B).

Proof: If det(A) = 0 or det(B) = 0, then AB is also not invertible and therefore


det(AB) = 0. Otherwise we can break A into a product of elementary matrices
A = A1 A2 · · · Aj and B into a product of elementary matrices B = B1 B2 · · · Bk .
We claim that, if M is any matrix and A is any elementary matrix, then AM satisfies
det(AM ) = det(A) det(M ). This will imply that det(A) = det(A1 ) det(A2 ) · · · det(Aj ),
det(B) = det(B1 ) det(B2 ) · · · det(Bk ), and

det(AB) = det(A1 · · · Aj B1 · · · Bk ) =

det(A1 ) · · · det(Aj ) det(B1 ) · · · det(Bk ) = det(A) det(B).

To prove the claim, we must consider the three types of row operation in turn.
(1) If A represents the multiplication of a row by λ 6= 0, the determinant of A is λ
and the determinant of AM is λ times the determinant of M .
(2) If A represents the switching of two rows, the determinant of A is −1 and the
determinant of AM is minus the determinant of M .
(3) If A represents adding a multiple of one row to another row, the determinant of
A is 1, and the determinant of AM is the same as the determinant of M . □

2.4 Inverses
The determination of whether an n × n matrix has an inverse, and its calculation
(if it exists) can be done in several ways. When the dimension n is large, the easiest
way is through Gaussian elimination.
Given an n × n matrix A that may or may not have an inverse, place the identity
matrix In alongside it. If A is invertible, there are elementary matrices B1 , B2 , . . . , Bk
corresponding to row operations such that B1 B2 · · · Bk A = In , meaning that the
product B1 B2 · · · Bk is equal to A−1 . Applying the same row operations to In as to
A transforms In into B1 B2 · · · Bk In = B1 B2 · · · Bk , the inverse of A.
 
Example: We find the inverse of A = \begin{pmatrix} 0 & 2 & -5 \\ -1 & 0 & 1 \\ 3 & -1 & 2 \end{pmatrix}.
Starting with the augmented matrix (A|In ), we perform row operations until the
first matrix is reduced to In :
   
\left(\begin{array}{ccc|ccc} 0 & 2 & -5 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 & 1 & 0 \\ 3 & -1 & 2 & 0 & 0 & 1 \end{array}\right) ; \quad R1 ↔ R2, −R1 → R1 : \quad \left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 2 & -5 & 1 & 0 & 0 \\ 3 & -1 & 2 & 0 & 0 & 1 \end{array}\right) ;

R3 − 3R1 → R3 : \quad \left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 2 & -5 & 1 & 0 & 0 \\ 0 & -1 & 5 & 0 & 3 & 1 \end{array}\right) ; \quad R2 + R3 → R2 : \quad \left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 & 3 & 1 \\ 0 & -1 & 5 & 0 & 3 & 1 \end{array}\right) ;

R2 + R3 → R3 : \quad \left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 & 3 & 1 \\ 0 & 0 & 5 & 1 & 6 & 2 \end{array}\right) ; \quad R3/5 → R3 : \quad \left(\begin{array}{ccc|ccc} 1 & 0 & -1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 & 3 & 1 \\ 0 & 0 & 1 & 1/5 & 6/5 & 2/5 \end{array}\right) ;

R1 + R3 → R1 : \quad \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 1/5 & 1/5 & 2/5 \\ 0 & 1 & 0 & 1 & 3 & 1 \\ 0 & 0 & 1 & 1/5 & 6/5 & 2/5 \end{array}\right).

The final matrix on the right is A−1 . Whenever we calculate an inverse, it is easy
to check whether the answer is correct:
    
\begin{pmatrix} 1/5 & 1/5 & 2/5 \\ 1 & 3 & 1 \\ 1/5 & 6/5 & 2/5 \end{pmatrix} \begin{pmatrix} 0 & 2 & -5 \\ -1 & 0 & 1 \\ 3 & -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.

So indeed A−1 A = In .
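
The same inverse can be checked with a numerical library (up to floating-point rounding); a minimal sketch, assuming NumPy is available:

```python
import numpy as np

A = np.array([[0., 2., -5.],
              [-1., 0., 1.],
              [3., -1., 2.]])

A_inv = np.linalg.inv(A)
print(A_inv)                              # approx [[0.2, 0.2, 0.4], [1, 3, 1], [0.2, 1.2, 0.4]]
print(np.allclose(A_inv @ A, np.eye(3)))  # True
```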

2.5 The Wronskian


Let us examine the idea of linear independence in the vector space D we introduced
in Lecture 1, consisting of real-valued functions with derivatives of all orders. Given
a finite set S of functions in D, it may be difficult to determine whether S is linearly
independent.
We show one method, based on a determinant, that can often be used to demonstrate
that a set S of n functions in D is linearly independent.

Let f1 , f2 , . . . , fn be functions in D: this certainly implies that each of them can


be differentiated n − 1 times.
The Wronskian determinant of the functions f1 , . . . , fn is:
 
W (x) = \det \begin{pmatrix} f_1(x) & f_2(x) & \ldots & f_n(x) \\ f_1'(x) & f_2'(x) & \ldots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \ldots & f_n^{(n-1)}(x) \end{pmatrix}.

Theorem 2.5.1: If W (x) is not zero for some value x, then the functions f1 , . . . , fn
are linearly independent.

Proof: Suppose that f1 , f2 , . . . , fn are linearly dependent, which means that there
are scalars a1 , a2 , . . . , an , not all zero, such that a1 f1 + a2 f2 + · · · + an fn is the zero
function. Since the derivative of the zero function is also the zero function we have,
for all real x,
    
\begin{pmatrix} f_1(x) & f_2(x) & \ldots & f_n(x) \\ f_1'(x) & f_2'(x) & \ldots & f_n'(x) \\ \vdots & \vdots & & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \ldots & f_n^{(n-1)}(x) \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.

As the vector (a1 , . . . , an )t is non-zero, this implies that the determinant W (x) of
the matrix is zero for all choices of x. 2

The other direction of Theorem 2.5.1 doesn’t hold: it is possible for W (x) to be zero
for all x ∈ R, and for the functions still to be linearly independent.
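
For explicit functions, the Wronskian can be computed symbolically by building the matrix of derivatives directly; a minimal sketch, assuming SymPy is available, with the functions 1, x and x2 chosen purely as an illustration:

```python
from sympy import Matrix, symbols, diff, simplify, Integer

x = symbols('x')
funcs = [Integer(1), x, x**2]          # three functions in D, chosen only as an illustration

# Build the Wronskian matrix: row i holds the i-th derivatives (i = 0, 1, 2).
rows = []
current = list(funcs)
for _ in range(len(funcs)):
    rows.append(list(current))
    current = [diff(f, x) for f in current]

W = Matrix(rows)
print(simplify(W.det()))               # 2: non-zero, so 1, x, x^2 are linearly independent
```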

Lecture 3: Introduction to Linear Transformations


3.1 Matrix representation
A linear transformation is a function T from a vector space U to a vector space V
(possibly with U = V ) such that:

T (ax + by) = aT (x) + bT (y),

for all x, y in U , and all scalars a, b.

An ordered basis of a finite-dimensional vector space U is a tuple (u1 , . . . , um ) of


distinct vectors in U such that the set {u1 , . . . , um } is a basis of U .
If B = (u1 , u2 , . . . , um ) is an ordered basis of U , then a linear transformation T is
determined entirely by the values of T (u1 ), . . . , T (um ). Indeed, every vector v in U
can be written in exactly one way as v = α1 u1 + · · · + αm um . Then, for a linear
transformation T , T (v) = α1 T (u1 ) + · · · + αm T (um ).
If, additionally, C = (v1 , v2 , . . . , vn ) is an ordered basis of V , then we can represent
the linear transformation T by a matrix:
 
A_T^{C,B} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix},
where, for every 1 ≤ i ≤ m,

T (ui ) = a1i v1 + a2i v2 + · · · + ani vn .

Every column of the matrix A_T^{C,B} corresponds to a member of the basis B; the jth
column is the result of the application of T to the jth member of the ordered basis
B, written in terms of the basis C. Every row corresponds to a member of the basis
C.

Conversely, for every n × m matrix A and every choice of an ordered basis B of U


and ordered basis C of V , there corresponds a linear transformation T : U → V
such that A = A_T^{C,B} .

For example, if B = (u1 , u2 ) is an ordered basis for the space U , and C = (v1 , v2 , v3 )
is an ordered basis for the space V , T (u1 ) = 2v1 + 3v2 and T (u2 ) = v2 − v3 , then

A_T^{C,B} = \begin{pmatrix} 2 & 0 \\ 3 & 1 \\ 0 & -1 \end{pmatrix}.

If we change the order of the elements so that B′ = (u2 , u1 ) and C′ = (v2 , v3 , v1 ), then

A_T^{C′,B′} = \begin{pmatrix} 1 & 3 \\ -1 & 0 \\ 0 & 2 \end{pmatrix}.
If T1 : U → V and T2 : V → W are both linear transformations, then the composi-
tion T2 ◦ T1 , defined by (T2 ◦ T1 )(u) = T2 (T1 (u)), is also a linear transformation. To
check this:
 
T2 (T1 (au + bv)) = T2 (aT1 (u) + bT1 (v)) = aT2 (T1 (u)) + bT2 (T1 (v)).

What is the matrix representing the linear transformation T2 ◦ T1 ?

Theorem 3.1.1: Let B = (u1 , . . . , um ) be an ordered basis for U , C = (v1 , . . . , vl )


an ordered basis for V , and D = (w1 , . . . , wn ) an ordered basis for W . The matrix
A_{T2◦T1}^{D,B} representing the composition T2 ◦ T1 : U → W is the matrix product A_{T2}^{D,C} A_{T1}^{C,B} .

Proof: Let β_{ik} be the entries of A_{T2}^{D,C} and α_{kj} be the entries of A_{T1}^{C,B}. Then
T1 (uj ) = \sum_{k=1}^{l} α_{kj} v_k and therefore

T2 (T1 (uj )) = \sum_{k=1}^{l} α_{kj} T2 (v_k ) = \sum_{k=1}^{l} α_{kj} \sum_{i=1}^{n} β_{ik} w_i = \sum_{i=1}^{n} \Big( \sum_{k=1}^{l} β_{ik} α_{kj} \Big) w_i = \sum_{i=1}^{n} γ_{ij} w_i ,

where the γ_{ij} are the entries of the matrix product A_{T2}^{D,C} A_{T1}^{C,B}. □
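
For transformations between the spaces Rm given by matrices with respect to the standard bases, the theorem says that applying the two matrices in turn is the same as applying their product; a minimal numerical sketch, assuming NumPy is available and with the dimensions and entries chosen at random purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A_T1 = rng.normal(size=(4, 3))   # represents T1 : R^3 -> R^4 (standard bases)
A_T2 = rng.normal(size=(2, 4))   # represents T2 : R^4 -> R^2
u = rng.normal(size=3)

# Applying T1 and then T2 to u gives the same result as applying the product matrix.
print(np.allclose(A_T2 @ (A_T1 @ u), (A_T2 @ A_T1) @ u))  # True
```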

3.2 Change of Basis


What is the difference between a matrix and a linear transformation? A linear
transformation is a function between spaces. A matrix is an object defined by
entries in rows and columns. Linear transformations and matrices are not the same,
but they are closely related, as shown in the previous section.
Given choices for the bases of the two vector spaces, a matrix represents a linear
transformation. The same linear transformation can be represented by many differ-
ent matrices, depending on the choice of bases.

Given a linear transformation T : Rm → Rn , we are particularly interested in the


matrix representing T with respect to the standard ordered bases Em = (e1 , . . . , em )
and En = (e1 , . . . , en ) of Rm and Rn respectively. In this case, we write simply AT
instead of A_T^{En ,Em} for the matrix representing the linear transformation T . We use
the same convention for a linear transformation from Cm to Cn .

Suppose that we are given alternative bases B and C of Rm and Rn , respectively,


and we know the matrix AT representing a linear transformation according to the
standard bases Em and En . How do we find the matrix A_T^{C,B} representing T according
to the bases B and C?

Let B = (u1 , . . . , um ) and C = (v1 , . . . , vn ) be ordered bases of Rm and Rn , re-


spectively. Let MB be the m × m matrix such that, for each j, the vector uj is
written into the jth column. This means that, if aij are the entries of MB then
uj = (a1j , . . . , amj )t = a1j e1 + · · · + amj em . Likewise, let MC be the n × n matrix
where, for each i, the vector vi is written into the ith column.

Example: Let B = ((1, 1, 0)t , (0, −1, 2)t , (−1, 0, 3)t ) and C = ((2, −1)t , (1, 2)t ).
Then

M_B = \begin{pmatrix} 1 & 0 & -1 \\ 1 & -1 & 0 \\ 0 & 2 & 3 \end{pmatrix}, \qquad M_C = \begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}.

There is a linear transformation T represented by the matrix MB , namely the linear


transformation T : Rm → Rm that sends each standard basis element ej to the vector
uj . The matrix MB is equal to the matrix AT . Because B is a basis, there is also a
linear transformation that sends each uj back to ej , and this linear transformation
must be the inverse T −1 of T , meaning that T ◦ T −1 = T −1 ◦ T is the identity linear
transformation on Rm , sending every vector to itself. What is the matrix AT −1 that
represents the linear transformation T −1 ?

Theorem 3.2.1: If T : Rm → Rm is the linear transformation that sends each


standard basis element ej to the jth element uj of an ordered basis B = (u1 , . . . , um ),
and MB = AT , namely the matrix such that the vector uj is written into the jth
column, then AT −1 = MB−1 .

Proof: Since T ◦ T −1 is the identity transformation, it follows that AT ◦T −1 is the


identity matrix Im . By Theorem 3.1.1, AT AT −1 = AT ◦T −1 = Im , and hence AT = MB
and AT −1 are inverses. 2

Returning to the original problem, we have the matrix AT , which represents the

linear transformation T : Rm → Rn with respect to the standard ordered bases Em


and En , and we want to find the matrix A_T^{C,B} which represents the linear transfor-
mation T with respect to the ordered bases B and C of Rm and Rn respectively.
In the matrix A_T^{C,B} , the vector uj ∈ B plays the role that ej plays in AT . We first
multiply by the matrix MB , sending every ej to uj . Next we apply the matrix AT ,
which sends every uj to the standard basis representation of the vector T (uj ) in Rn .
By Theorem 3.2.1, we need to multiply this vector by MC−1 to represent it in terms
of the ordered basis C. We have proven the following result.

Theorem 3.2.2: Let T : Rm → Rn be a linear transformation, and let B and C be


ordered bases of Rm and Rn respectively. Then
A_T^{C,B} = MC^{-1} AT MB .

The formula of Theorem 3.2.2 also tells us how to get the matrix AT from the matrix
A_T^{C,B} . Because MB and MC are invertible, we can multiply the above formula on
the left by MC and on the right by MB^{-1} to get

AT = MC A_T^{C,B} MB^{-1} .
This also tells us how to go from A_T^{C,B} to A_T^{C′,B′} for some other alternative ordered
bases B′ and C′:

A_T^{C′,B′} = MC′^{-1} AT MB′ = MC′^{-1} MC A_T^{C,B} MB^{-1} MB′ .
 
Let’s work out an example. Let AT = \begin{pmatrix} 1 & 0 & 2 \\ 0 & -1 & 1 \end{pmatrix}.

Let B = ((1, 1, 0)t , (1, 0, 1)t , (0, 1, 0)t ) and C = ((2, 1)t , (1, 2)t ).

Now

MB = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}; \qquad MC = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}; \qquad MC^{-1} = \begin{pmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{pmatrix};

A_T^{C,B} = \begin{pmatrix} 2/3 & -1/3 \\ -1/3 & 2/3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 2 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 5/3 & 1/3 \\ -1 & -1/3 & -2/3 \end{pmatrix}.

As a check, we calculate AT (1, 1, 0)t = \begin{pmatrix} 1 & 0 & 2 \\ 0 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \end{pmatrix}, and we see that
u1 = (1, 1, 0)t is mapped by T to (1, −1)t . This can be re-written as (2, 1)t − (1, 2)t =
v1 − v2 . Therefore we expect A_T^{C,B} (1, 0, 0)t to be (1, −1)t , and indeed the first column
of A_T^{C,B} is (1, −1)t .
Similarly, the vector u2 = (1, 0, 1)t is mapped to (3, 1)t = (5/3)(2, 1)t − (1/3)(1, 2)t , and the
vector u3 = (0, 1, 0)t is mapped to (0, −1)t = (1/3)(2, 1)t − (2/3)(1, 2)t .
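
The arithmetic in this example can be checked in a few lines; a minimal sketch, assuming NumPy is available:

```python
import numpy as np

A_T = np.array([[1., 0., 2.],
                [0., -1., 1.]])
M_B = np.array([[1., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])   # columns are the vectors of B
M_C = np.array([[2., 1.],
                [1., 2.]])       # columns are the vectors of C

A_T_CB = np.linalg.inv(M_C) @ A_T @ M_B
print(A_T_CB)                    # approx [[1, 1.6667, 0.3333], [-1, -0.3333, -0.6667]]
```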

One important case is when we have a linear transformation T : Rn → Rn , and


we are given an ordered basis B of Rn . Then we have A_T^{B,B} = MB^{-1} AT MB . In
applications, if we are given a linear transformation T from Rn to itself, we might
hope to find a basis B such that A_T^{B,B} has a simple form.

3.3 Diagonalisation
For instance, we might want A_T^{B,B} to be a diagonal matrix.
Recall that an eigenvalue of an n × n matrix A is a scalar λ such that there exists
some non-zero vector v with Av = λv. The vector v is called an eigenvector
corresponding to the eigenvalue λ.
A scalar λ is an eigenvalue of the matrix A if and only if x = λ is a root of the
characteristic polynomial det(xI − A). This is easy to prove: det(λI − A) = 0 if and
only if there is some non-zero vector v such that (λI − A)v = 0, i.e., λv − Av = 0,
and such a vector is an eigenvector corresponding to eigenvalue λ.

Given an n×n real matrix A, you should know a method for finding a real eigenvalue
– if one exists – and a corresponding eigenvector, and also to diagonalise the matrix
with these eigenvectors, if it is possible. The basic idea is to write A = P DP^{-1} ,
where D is a diagonal matrix whose diagonal entries are the eigenvalues, and the columns of
P are the corresponding eigenvectors. In order for P to be invertible, we need its
columns to form an ordered basis of Rn . (We shall return to this topic later in the
course, to see what can be done if this is not possible, i.e., in the case where the
n × n matrix A does not have a set of n linearly independent eigenvectors.)
What we are really doing here is a change of basis. The linear transformation
represented by A with respect to the standard ordered basis of Rn is represented by
D with respect to the (ordered) basis of eigenvectors we found.
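
A minimal numerical sketch of this, assuming NumPy is available and using a small matrix chosen purely as an illustration:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                 # illustrative matrix with eigenvalues 3 and 1

eigenvalues, P = np.linalg.eig(A)        # columns of P are eigenvectors of A
D = np.diag(eigenvalues)

# With the eigenvectors in the columns of P we recover A = P D P^{-1}.
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True
```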

3.4 Similarity
Two n × n matrices A and B are similar if there exists an invertible matrix P such
that P AP −1 = B.
Similarity is an equivalence relation:
(a) every matrix A is similar to itself, as IAI = A;
(b) if A is similar to B, then there is an invertible matrix P so that P AP −1 = B –
then we have P −1 BP = A, so B is similar to A;
(c) if A is similar to B, and B is similar to C, then there are invertible matrices P
and Q with P AP −1 = B and QBQ−1 = C, so QP AP −1 Q−1 = QBQ−1 = C, and
therefore A is similar to C.

We can restate what we have done in the previous section in terms of similarity.
If A and B are similar then they can represent the same linear transformation, with
respect to a change of basis represented by the matrix P . Conversely, if A and
B can represent the same linear transformation, then there will be a P such that
P AP −1 = B, and so A and B are similar.

An n × n matrix A is diagonalisable if it is similar to a diagonal matrix, as we


have just seen. This is equivalent to saying that there is a basis of Rn consisting of
eigenvectors of A.
Theorem 3.4.1 If A and B are similar then they have the same characteristic
polynomial.

Proof: Suppose B = P AP −1 , where P is invertible. Then we have P (xI −A)P −1 =


P (xI)P −1 − P AP −1 = xI − B, and therefore
det(xI − B) = det(P (xI − A)P −1 ) = det(P ) det(xI − A) det(P −1 ) = det(xI − A),
since det(P ) det(P −1 ) = det(P P −1 ) = det(I) = 1. 2

That the characteristic polynomial of A is not altered by a change of basis means


that it is truly “characteristic” of the linear transformation represented by A.

Define the trace of a matrix to be the sum of the diagonal entries. It is an exercise
to show that if f (x) = xn + an−1 xn−1 + · · · + a0 is the characteristic polynomial of the
n × n matrix A, then −an−1 is equal to the trace of A. Therefore similar matrices
have the same trace.
In the same way, we see that similar matrices have the same determinant. Also, if
λ is an eigenvalue of A, and B is similar to A, then λ is an eigenvalue of B.
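
These invariants can be observed numerically; a minimal sketch, assuming NumPy is available, with A and P chosen at random purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
P = rng.normal(size=(3, 3))              # generically invertible
B = P @ A @ np.linalg.inv(P)             # B is similar to A

print(np.allclose(np.poly(A), np.poly(B)))              # same characteristic polynomial
print(np.isclose(np.trace(A), np.trace(B)))             # same trace
print(np.isclose(np.linalg.det(A), np.linalg.det(B)))   # same determinant
```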

Lecture 4: Kernels and Images


4.1 Subspaces
Associated to any linear transformation T : U → V are two vector subspaces: the
image and the kernel of T .
The image of a linear transformation T : U → V is the set im(T ) = {v ∈ V | v =
T (u) for some u ∈ U }. Informally, the image is the subset of V that “gets hit” by
the transformation T .
The kernel of a linear transformation T is the set ker(T ) = {u ∈ U | T (u) = 0}:
the subset of U that is mapped to the zero vector by T .

If A is an n × m real matrix, then the range of A is the set

R(A) = {v ∈ Rn | v = Aw for some w ∈ Rm },

and the nullspace of A is the set

N (A) = {u ∈ Rm | Au = 0}.

If T : Rm → Rn is the linear transformation represented by a matrix AT , with


respect to the standard bases, then the range of AT is equal to the image of T , and
the nullspace of AT is equal to the kernel of T .
Conversely, suppose that U and V are finite-dimensional vector spaces, with ordered
bases B and C respectively, and T : U → V is a linear transformation. Let A_T^{C,B} be
the n × m matrix representing T with respect to these ordered bases. The range of
A_T^{C,B} is a set of vectors in Rn ; this set is equal to the set of vectors in the image of
T , written in terms of the ordered basis C. Similarly the nullspace of A_T^{C,B} is the
set of vectors in the kernel of T , written in terms of the ordered basis B.
Quite often, the notions of the image/kernel (of a linear transformation) and the
range/nullspace (of a matrix representing that linear transformation with respect to
a given ordered basis) can be used interchangeably.

Theorem 4.1.1: Both the image and the kernel of a linear transformation are
subspaces.

Proof: Let T : U → V be a linear transformation.


Kernel: Suppose u1 and u2 are in ker(T ), so T (u1 ) = T (u2 ) = 0. Let a and b be
scalars. Then T (au1 + bu2 ) = aT (u1 ) + bT (u2 ) = a 0 + b 0 = 0, so au1 + bu2 ∈ ker(T ).

Image: Suppose v1 and v2 are in im(T ), with v1 = T (u1 ) and v2 = T (u2 ). For
scalars a and b, T (au1 + bu2 ) = aT (u1 ) + bT (u2 ) = av1 + bv2 , so av1 + bv2 ∈ im(T ).
2

For an n×m matrix A, this result also tells us that the range R(A) of A is a subspace
of Rn , and the nullspace N (A) is a subspace of Rm . The range R(A) is spanned
by the set {Ae1 , . . . , Aem }, which are just the column vectors of A. In other words,
R(A) is the linear span of the column vectors of A, also known as the column space
of A.
The next important result relates the dimensions of the kernel and the image of a
linear transformation.

Theorem 4.1.2: If T : U → V is a linear transformation, then the dimension of


im(T ) plus the dimension of ker(T ) equals the dimension of U .

Proof: Since the kernel of T is a finite-dimensional vector space, it has a basis


{u1 , . . . , uk }. We can extend this basis of ker(T ) to a basis {u1 , . . . , uk , uk+1 , . . . , um }
of U . It suffices to show that S = {T (uk+1 ), . . . , T (um )} is a basis of the image of T .
Because T (ui ) = 0 for all i ≤ k, the set S certainly spans the image of T . Suppose
that there are scalars ak+1 , . . . , am such that ak+1 T (uk+1 )+· · ·+am T (um ) = 0. Then
T (ak+1 uk+1 +· · ·+am um ) = 0, so ak+1 uk+1 +· · ·+am um is in the kernel of T . However,
since this vector is in ker(T ), which is spanned by {u1 , . . . , uk }, and every vector in
U has a unique representation as a linear combination of u1 , . . . , uk , uk+1 , . . . , um ,
this is possible only if ak+1 = · · · = am = 0. 2

In terms of matrices, the result above says that, for an n×m matrix A, the dimension
of R(A) (the rank r(A) of A) plus the dimension of N (A) (the nullity n(A) of A)
equals m.

4.2 Bases for the kernel and image


How does one find a basis for the image and the kernel of a linear transformation?
The answer brings us back to the solving of systems of linear equations and Gaussian
elimination.

Suppose that T : U → V is a linear transformation between finite-dimensional


vector spaces, with B = (u1 , . . . , um ) an ordered basis of U , and C = (v1 , . . . , vn )
an ordered basis of V . The columns of AC,B T comprise the vectors T (u1 ), . . . , T (um ),
written in terms of C.

The image of T is the linear span of the set {T (u1 ), . . . , T (um )}. Indeed, each
vector T (ui ) is certainly in im(T ), while any vector u ∈ U can be written as u =
x1 u1 + · · · + xm um , and so T (u) = x1 T (u1 ) + · · · + xm T (um ).
A basis of im(T ) can be found from a subset of columns of A_T^{C,B} that is linearly
independent and spans the same space. To find the kernel, one must solve the system
of linear equations given by T (x1 u1 + · · · + xm um ) = x1 T (u1 ) + · · · + xm T (um ) = 0.
Fortunately, one way to find a basis of the image follows from the same technique
for finding a basis of the kernel.

To find the kernel of T , we solve the system A_T^{C,B} x = 0 of linear equations. The set
of solutions x to this system is exactly the set of vectors in the kernel of T , written
in terms of the basis B.
If, after performing Gaussian elimination, we find only one solution, namely the zero
vector, then ker(T ) = {0}. Otherwise, there are some free variables, and the
dimension of ker(T ) is the number of free variables. To see this, let F be the set of
free variables: F is a subset of the index set {1, . . . , m}. For any i ∈ F , let αi = 1
and let αl = 0 for all other free variables l ≠ i. Now determine what the other
coefficients αj for j in {1, . . . , m} \ F should be, so that ui + \sum_{j∈{1,...,m}\F} αj uj is in
ker(T ). The answer is simple: αj should be the negative of the (k, i)-entry of the matrix
in reduced row echelon form, where the leading coefficient of the kth row is in the
jth column. For large matrices, an easier way than negating all these entries is to
let the coefficient of ui be −1 instead of negating the entries.

A basis of im(T ) is {T (uj ) | j is not a free variable}. Why? Let k be a free vari-
able. Since −uk + \sum_{j∈{1,...,m}\F} αj uj is in ker(T ) for some scalars αj , we can write
T (uk ) = \sum_{j∈{1,...,m}\F} αj T (uj ), and therefore {T (uj ) | j is not a free variable} spans
the image. By Theorem 4.1.2, the dimension of the image is m minus the number
of free variables, which is the number of variables that are not free, and therefore
this set is also a basis of the image.

Gaussian elimination also shows how to extend a basis of ker(T ) to a basis of U . The
set C = {uj | j is not a free variable}, when added to a basis B of ker(T ), will result
in a basis of U . Since the size of these two sets adds up to m = dim(U ), it suffices to
show that this set is linearly independent. Suppose that \sum_{j∈C} aj uj + \sum_{w∈B} bw w = 0.
By applying the transformation T , we get \sum_{j∈C} aj T (uj ) = 0. But, since we showed
already that the T (uj ), for the non-free variables j, form a basis of the image of T ,
it follows that all the aj are zero. We are left with \sum_{w∈B} bw w = 0. As B is a basis
of ker(T ), we conclude that all the bw are also zero.

Extending a basis of the image of T to a basis of V involves an additional process of


Gaussian elimination. Whenever S is a set of k vectors in an n-dimensional vector
space, we can write these vectors as the rows of a k × n matrix (expressed in terms
of some basis of V ) and perform Gaussian elimination. The result will be rows of
vectors whose linear span is the same as the linear span of the original set S. The
dimension of the subspace spanned by S will be n minus the number of free variables
from the matrix in reduced row echelon form. It is easy to see that the non-zero row vectors,
either from the original set S or from the reduced row echelon form, together with
the basis vectors of V corresponding to the free variables (the columns without a leading 1), form a basis for V .

For example, let us find bases for the image and kernel of T , when T is represented
by the matrix

AT = \begin{pmatrix} 2 & 1 & -1 & 2 \\ 3 & 4 & 6 & -2 \\ 6 & -6 & -30 & 24 \end{pmatrix}.

We row-reduce the augmented matrix

\begin{pmatrix} 2 & 1 & -1 & 2 & 0 \\ 3 & 4 & 6 & -2 & 0 \\ 6 & -6 & -30 & 24 & 0 \end{pmatrix}.
Start by subtracting the first row from the second, exchanging the first two
rows, and dividing the third row by 6:

\begin{pmatrix} 1 & 3 & 7 & -4 & 0 \\ 2 & 1 & -1 & 2 & 0 \\ 1 & -1 & -5 & 4 & 0 \end{pmatrix}.

Now subtract two times the first row from the second row, and subtract the first
row from the third row:

\begin{pmatrix} 1 & 3 & 7 & -4 & 0 \\ 0 & -5 & -15 & 10 & 0 \\ 0 & -4 & -12 & 8 & 0 \end{pmatrix}.

Next, divide the second row by −5 and the third by −4:

\begin{pmatrix} 1 & 3 & 7 & -4 & 0 \\ 0 & 1 & 3 & -2 & 0 \\ 0 & 1 & 3 & -2 & 0 \end{pmatrix}.

Finally, subtract the second row from the third, and subtract three times the second
row from the first:

\begin{pmatrix} 1 & 0 & -2 & 2 & 0 \\ 0 & 1 & 3 & -2 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
We can now read off the answers. The leading 1s are in the first and second columns,
so the first and second columns of AT , namely the vectors (2, 3, 6)t and (1, 4, −6)t ,
are linearly independent, while the other columns
are in the linear span of these first two, so these two vectors span the range of AT .
As for the kernel, both the third and the fourth variables are free variables. To
find one basis vector of ker(T ), set the coefficient x3 for e3 equal to −1 and the
coefficient x4 for e4 equal to 0: to find the other two coefficients, we solve the two
equations x1 + (−1)(−2) = 0 and x2 + (−1)3 = 0, to get the vector (−2, 3, −1, 0)t in
ker(T ). Similarly, setting x4 = −1 and x3 = 0 yields the vector (2, −2, 0, −1)t in the
kernel. These two vectors (−2, 3, −1, 0)t and (2, −2, 0, −1)t form a basis of ker(T ).
The vectors (1, 0, 0, 0)t and (0, 1, 0, 0)t corresponding to the non-free variables will
extend any basis of the kernel to a basis of R4 .
 
Placing the vectors (2, 3, 6)t and (1, 4, −6)t into rows, we get the matrix \begin{pmatrix} 2 & 3 & 6 \\ 1 & 4 & -6 \end{pmatrix}.
Swap the rows, and subtract twice the first row from the second row, to get \begin{pmatrix} 1 & 4 & -6 \\ 0 & -5 & 18 \end{pmatrix}.
Continue to get the matrix \begin{pmatrix} 1 & 0 & 42/5 \\ 0 & 1 & -18/5 \end{pmatrix}. As the third column does not have a
leading 1, the vector (0, 0, 1)t will extend any basis of the image to a basis of R3 .
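
The bases found above can be checked symbolically; a minimal sketch, assuming SymPy is available:

```python
from sympy import Matrix

A = Matrix([[2, 1, -1, 2],
            [3, 4, 6, -2],
            [6, -6, -30, 24]])

R, pivots = A.rref()
print(pivots)            # (0, 1): leading 1s in the first and second columns
print(A.columnspace())   # the first and second columns of A: a basis of the range
print(A.nullspace())     # a basis of the kernel; SymPy sets each free variable to +1,
                         # so these vectors are (-1) times the ones found above
```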

4.3 Row Rank = Column Rank


Let A = AT be any n×m matrix representing some linear transformation T : U → V ,
for U and V of appropriate dimensions. The row space of A is the subspace of U
spanned by the vectors in the rows of the matrix. The column space of A is the
subspace of V spanned by the vectors in the columns of the matrix. The row rank
is the dimension of the row space and the column rank or rank is the dimension of
the column space: we have already seen that this is the dimension of im(A).
Suppose we have a matrix A, with a given row rank and column rank, and we per-
form an elementary row operation to obtain A′ . It is easy to see that:
(a) the row space of A is equal to the row space of A′ ;
(b) if a set of column vectors of A is linearly independent, then the corresponding
set of column vectors of A′ is also linearly independent.
This means that both the row rank and the column rank are unchanged on perform-
ing an elementary row operation.
For a matrix B in reduced row echelon form, the row rank of B is the number of non-
zero rows, which is the same as the number of columns containing a leading 1. The
column vectors containing leading 1s (and otherwise all 0s) are linearly independent,
and all the other columns of B are contained in the linear span of these column
vectors. Thus the column rank of B is also the number of columns containing

leading 1s.
This tells us that the row rank and column rank of a matrix in reduced row echelon
form are equal, and therefore that the row rank and column rank of any matrix are
equal. The common value of the row rank and the column rank of a matrix is called
simply the rank of the matrix.
Moreover, if we start with a matrix A, and row-reduce it to a matrix B in reduced
row echelon form, then the non-zero rows of B form a basis for the row space of A.
Also, the columns of B containing leading 1s correspond to columns of A that form
a basis of the column space R(A) of A.

4.4 Kernels and images from subspaces


Kernels and images of linear transformations are subspaces. What about the con-
verse: can any vector subspace of a finite-dimensional vector space be a kernel or
an image of a linear transformation?

Theorem 4.4.1: If U is a subspace of a finite-dimensional vector space V then


U is the kernel of some linear transformation and U is the image of some linear
transformation.

Proof: The subspace U has a basis B = {v1 , · · · , vk }. Extend B to a basis


B̂ = {v1 , . . . , vk , vk+1 , . . . , vn } of the space V . Now define a linear transformation
from V to V which sends vj to itself if 1 ≤ j ≤ k and sends vj to 0 for every
k + 1 ≤ j ≤ n. The image of this transformation is U .
Likewise, we can define a linear transformation T : V → V which sends vj to itself
if k + 1 ≤ j ≤ n and sends vj to 0 if 1 ≤ j ≤ k. Every vector u in U has T (u) = 0; if
v is not in U then, writing v uniquely as a linear combination of the elements in the basis B̂,
there will be a non-zero coefficient aj for some vj with j ≥ k + 1, meaning that T (v) ≠ 0.
Therefore ker(T ) = U . □

Lecture 5: Real Inner Product Spaces


5.1 Definitions and Examples
Example 1: Consider the function that takes a pair (u, v) of vectors in Rn to the
dot product u · v = u1 v1 + · · · + un vn , which is a real number. So, formally, the dot
product operation is a function from the set Rn × Rn (the set of ordered pairs of
vectors in Rn ) to R.
The dot product operation has a number of nice properties, some of which we now
list.

• The function is linear in the first variable: for each fixed v, the function taking
u to u · v is linear, i.e.,

(α1 u1 + α2 u2 ) · v = α1 (u1 · v) + α2 (u2 · v)

for all real numbers α1 , α2 and all u1 , u2 in Rn .

• The function is symmetric: u · v = v · u for all u and v in Rn .

• The function is positive: u · u > 0 for all u 6= 0.

It turns out that this particular set of conditions (a) is possessed by a number of
important examples of functions from V × V to R, where V is a real vector space,
and (b) has a large number of useful consequences and applications. So we consider
a general function satisfying these three conditions, and start by giving it a name.
Definition: An inner product on a real vector space V is a function h·, ·i that
associates a real number hu, vi to each pair of vectors u, v ∈ V , satisfying the
following three conditions.

• The function is linear in the first variable: for each fixed v in V , the function
taking u to hu, vi is linear, i.e.,

hα1 u1 + α2 u2 , vi = α1 hu1 , vi + α2 hu2 , vi

for all real numbers α1 , α2 and all u1 , u2 in V .

• The function is symmetric: hu, vi = hv, ui for all u and v in V .

• The function is positive: hu, ui > 0 for all u 6= 0 in V .



Definition: A real vector space V equipped with an inner product is called a real
inner product space.

So Rn , equipped with the dot product, is a real inner product space. Let us see some
other examples of inner products, starting with a different inner product on R2 .
Example 2: Consider the function h·, ·i from R2 × R2 to R defined by hu, vi =
u1 v1 + 2u2 v2 + u2 v1 + u1 v2 .
It is not hard to check that this function also has all the three properties listed
above, and so is an inner product on R2 . To see positivity, note that hu, ui =
u_1^2 + 2u_2^2 + 2u_1 u_2 = (u_1 + u_2 )^2 + u_2^2 , which is always non-negative, and can only be
zero if u1 + u2 = 0 = u2 , which only happens for u = 0.

Example 3: Let P be the space of polynomial functions with real coefficients.
Define ⟨·, ·⟩ on P by ⟨f, g⟩ = \int_{-1}^{1} f (x)g(x) dx.
For instance, if f is the function given by f (x) = x − 1, and g is the function
g(x) = x, then

⟨f, g⟩ = \int_{-1}^{1} f (x)g(x) dx = \int_{-1}^{1} (x^2 − x) dx = 2/3 .

This function is linear in the first variable:


⟨α1 f1 + α2 f2 , g⟩ = \int_{-1}^{1} (α1 f1 (x) + α2 f2 (x))g(x) dx
= α1 \int_{-1}^{1} f1 (x)g(x) dx + α2 \int_{-1}^{1} f2 (x)g(x) dx = α1 ⟨f1 , g⟩ + α2 ⟨f2 , g⟩.

By its definition, the function ⟨·, ·⟩ is symmetric. It is also positive, because ⟨f, f ⟩ =
\int_{-1}^{1} (f (x))^2 dx, which is always non-negative, and is only zero – for a polynomial
function f – if f is the zero function.

Example 4: Recall that ℓ2 is the set of infinite sequences (a1 , a2 , . . . ) of real numbers
such that \sum_{i=1}^{∞} a_i^2 < ∞. For a = (a1 , . . . ) and b = (b1 , . . . ) in ℓ2 , define ⟨a, b⟩ to
be \sum_{i=1}^{∞} a_i b_i . It is easy to check that this defines an inner product on ℓ2 ; this is a
generalisation of the standard dot product in Rn .

We make a few remarks regarding the definition of an inner product.


(1) The properties of linearity in the first variable, and symmetry, imply that:
hu, α1 v1 + α2 v2 i = α1 hu, v1 i + α2 hu, v2 i,
for all u, v1 , v2 in V and all α1 , α2 in R. In other words, an inner product is also
linear in the second variable.

A function from V × V to R which is linear in both its first and second variables is
called a bilinear form on V .

(2) Linearity in the first and second variables implies that hu, 0i = h0, ui = 0. So
in particular h0, 0i = 0 in any real inner product space. In the positivity condition,
we are careful to demand only that hu, ui > 0 for all non-zero u.

For the rest of this subsection, we focus on the case where V is finite-dimensional,
with a basis B = {w1 , . . . , wn }.
An inner product on V is determined by the values hwi , wj i of the inner product
on pairs of basis elements. Indeed, take any vectors x = x1 w1 + · · · + xn wn and
y = y1 w1 + · · · + yn wn in V . By linearity in the first variable,

hx, yi = hx1 w1 + · · · + xn wn , yi

can be decomposed as
x1 hw1 , yi + · · · + xn hwn , yi,
and then, by linearity in the second variable, this can be broken down to the double
sum
Xn Xn
xi yj hwi , wj i.
i=1 j=1

Setting aij := hwi , wj i, we now get


⟨x, y⟩ = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i y_j a_{ij} .

If we set A to be the n × n matrix with entries aij , we can express this as


 
⟨x, y⟩ = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i y_j a_{ij} = \begin{pmatrix} x_1 & \ldots & x_n \end{pmatrix} \begin{pmatrix} a_{11} y_1 + \cdots + a_{1n} y_n \\ \vdots \\ a_{n1} y_1 + \cdots + a_{nn} y_n \end{pmatrix}

= \begin{pmatrix} x_1 & \ldots & x_n \end{pmatrix} \begin{pmatrix} a_{11} & \ldots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \ldots & a_{nn} \end{pmatrix} \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = xt Ay.

On the other hand, if we can write hx, yi = xt Ay for some matrix A, then the form
h·, ·i satisfies both linearity conditions. For instance,

hu, av + bwi = ut A(av + bw) = aut Av + but Aw = ahu, vi + bhu, wi.


In Example 1, the matrix A is the identity matrix In ; in Example 2, it is \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}.

To say that the function h·, ·i defined by hx, yi = xt Ay is symmetric means that
xt Ay = yt Ax for all x and y in V . As the 1×1 matrix yt Ax is equal to its transpose
xt At y, the function is symmetric if and only if xt Ay = xt At y for all x and y – and
this is equivalent to saying that A = At , i.e., A is symmetric.
A real symmetric matrix A such that xt Ax > 0 for all non-zero x is called positive
definite. Thus hx, yi = xt Ay defines an inner product on Rn if and only if the matrix
A is positive definite. We shall return to this property later in the course.
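
The defining properties of the inner product in Example 2 can be checked on sample vectors; a minimal sketch, assuming NumPy is available (the final line uses the standard eigenvalue test for positive definiteness of a symmetric matrix):

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 2.]])           # the matrix of the inner product in Example 2

def inner(x, y):
    return x @ A @ y               # <x, y> = x^t A y

x = np.array([1., -2.])
y = np.array([3., 0.5])

print(np.isclose(inner(x, y), inner(y, x)))     # symmetry
print(inner(x, x) > 0 and inner(y, y) > 0)      # positivity on these non-zero samples
print(np.all(np.linalg.eigvalsh(A) > 0))        # all eigenvalues positive: A is positive definite
```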

5.2 Geometry in Real Inner Product Spaces


Definition: Two vectors x and y in a real inner product space are orthogonal or
perpendicular if hx, yi = 0. The notation x ⊥ y means also that hx, yi = 0.
The norm of a vector v in a real inner product space is defined as ||v|| = \sqrt{⟨v, v⟩}.
The positivity property means that the norm is a non-negative real number, equal
to zero only if v = 0. A unit vector is a vector with norm 1.

In Rn with the dot product, the norm of a vector v = (v1 , . . . , vn )t is given by
\sqrt{v_1^2 + · · · + v_n^2} , which is the usual Euclidean length of the vector, or the distance of
that vector from the origin.
The distance between two vectors u and v in a real inner product space is ||u − v||.

The inner product also measures the degree to which one vector has a “component”
in the direction of another vector. Let v be a non-zero vector in Rn . As av will
represent the same direction, for any scalar a > 0, we may choose instead the vector
v/||v|| , which is a unit vector, as ||v/||v|| || = ||v||/||v|| = 1. From now on, we assume that the
vector v is a unit vector. Choose now any vector u ∈ Rn . The inner product ⟨u, v⟩
measures the component of u in the direction of v. For example, if u is perpendicular
to v, then u has no component in the direction of v, and indeed hu, vi = 0. The
unit vector v has component 1 in its own direction, and indeed hv, vi = ||v||2 = 1.

Definition: The angle θ between two vectors x and y in a real inner product space
is the unique value 0 ≤ θ ≤ π such that cos θ = ⟨x, y⟩/(||x|| · ||y||).

If x and y are unit vectors, the angle between them is arccos⟨x, y⟩.
This definition of angle is chosen to coincide with the well known result in Rn , with
the dot product, that ||x − y||2 = ||x||2 + ||y||2 − 2||x|| · ||y|| cos θ, where θ is the

angle at the origin between the vectors x and y.


When vectors are orthogonal, then the angle between them is π/2, as cos(π/2) = 0,
and the above is the Theorem of Pythagoras.

The following is a central result about inner products.

Theorem 5.2.1: [The Cauchy-Schwarz Inequality] For any pair x, y of vectors


in a real inner product space, |hx, yi| ≤ ||x|| · ||y||.

Proof: Let α be a real variable, and consider the square of the norm of x + αy, i.e.,
||x + αy||2 = hx + αy, x + αyi, which is at least zero by positivity. Using linearity
and symmetry, we write this as hx + αy, x + αyi = hx, xi + 2αhx, yi + α2 hy, yi ≥ 0.
The above inequality holds for all real α. Whenever a ≠ 0, and at^2 + bt + c ≥ 0
for all choices of t ∈ R, it must be that 4ac ≥ b^2 (since otherwise there would be
two distinct real solutions of the quadratic equation, implying that there are some
choices of t for which at^2 + bt + c < 0). Setting c = ⟨x, x⟩, b = 2⟨x, y⟩ and a = ⟨y, y⟩
(if a = ⟨y, y⟩ = 0 then y = 0 and the inequality is immediate),
this yields ⟨x, x⟩⟨y, y⟩ ≥ |⟨x, y⟩|^2 , giving ||x||^2 ||y||^2 ≥ |⟨x, y⟩|^2 . □

Theorem 5.2.2: Generalised Theorem of Pythagoras For any pair x, y of


orthogonal vectors in a real inner product space, ||x + y||2 = ||x||2 + ||y||2 .

Proof: ||x + y||2 = hx + y, x + yi = hx, xi + 2hx, yi + hy, yi = hx, xi + hy, yi =


||x||2 + ||y||2 . 2

Theorem 5.2.3: The Triangle Inequality For any pair x, y of vectors in a real
inner product space, ||x + y|| ≤ ||x|| + ||y||.

Proof: By the properties of an inner product, ||x + y||2 = hx + y, x + yi = ||x||2 +


2hx, yi + ||y||2 .
By the Cauchy-Schwarz inequality (Theorem 5.2.1), hx, yi ≤ ||x|| · ||y||, which im-
plies that ||x + y||2 ≤ ||x||2 + 2||x|| · ||y|| + ||y||2 = (||x|| + ||y||)2 . 2

5.3 Orthonormal Bases


Definition: A set A = {v1 , . . . , vk } of vectors in a real inner product space V is
orthonormal if:
(i) ||vi || = 1 for i = 1, . . . , k;
(ii) for every pair i ≠ j, ⟨vi , vj ⟩ = 0 (i.e., vi ⊥ vj ).

Theorem 5.3.1: If B = {v1 , . . . , vk } is an orthonormal set of vectors in a real inner



product space, then B is linearly independent.

Proof: Suppose that a1 v1 + · · · + ak vk = 0. Due to orthogonality and linearity, the


inner product ⟨a1 v1 + · · · + ak vk , a1 v1 + · · · + ak vk ⟩ is equal to a_1^2 ⟨v1 , v1 ⟩ + · · · +
a_k^2 ⟨vk , vk ⟩, and it is also equal to ⟨0, 0⟩ = 0.
So each term a_i^2 ||vi ||^2 is zero. As the vi are all non-zero, this means that a_1^2 = · · · =
a_k^2 = 0, which implies that a1 = · · · = ak = 0. □

Thus an orthonormal set A in a real inner product space is always linearly indepen-
dent; if also it spans V , it is called an orthonormal basis of V .

Examples in R2 with the dot product:


The set {(1/2, √3/2)t , (√3/2, 1/2)t } satisfies the norm condition, namely that both vectors
are unit vectors. However, the dot product of the two vectors is not zero and so this
is not an orthonormal set.

The set S = {(1, 1)t , (−1, 1)t } satisfies the condition that the dot product of the two
vectors is zero, but the norm of both vectors is √2.
We can convert S into an orthonormal set by normalising the vectors, i.e., replacing
each vector v by the unit vector v/||v|| in the same direction. This yields the
orthonormal basis {(√2/2, √2/2)t , (−√2/2, √2/2)t } of R2 .
The set {(1/2, √3/2)t , (−√3/2, 1/2)t } is another orthonormal basis of R2 .

The Gram-Schmidt Method:

Let U be a finite-dimensional subspace of a real inner product space. We give a


method to find an orthonormal basis of U .
Start with any non-zero vector v̂1 in U . Define v1 to be the unit vector v̂1 /||v̂1 ||.

Suppose that S = {v1 , v2 , . . . , vk } is an orthonormal set in U such that span(S) is



not the whole of U . Take any vector v∗k+1 ∈ U that is not in span(S). Define the
vector
v̂k+1 = v∗k+1 − hv∗k+1 , v1 iv1 − · · · − hv∗k+1 , vk ivk .

We claim that v̂k+1 is orthogonal to all the vj , for j ≤ k. Indeed:

hv̂k+1 , vj i = hv∗k+1 , vj i − hv∗k+1 , vj ihvj , vj i − Σl≠j hv∗k+1 , vl ihvl , vj i
= hv∗k+1 , vj i − hv∗k+1 , vj ihvj , vj i = 0.

Now normalise this vector, setting vk+1 = v̂k+1 /||v̂k+1 ||. The set {v1 , . . . vk , vk+1 }
is also an orthonormal set in U . Continue in this way until we have a basis of U .
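The procedure above is easy to carry out numerically. The following is a minimal Python/NumPy sketch of Gram-Schmidt for the standard dot product on Rm (the function name and tolerance are illustrative choices):

import numpy as np

def gram_schmidt(vectors, tol=1e-12):
    # Return an orthonormal basis for the span of `vectors`,
    # using the standard dot product on R^m.
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:
            w = w - np.dot(w, u) * u   # subtract the component of w in the direction u
        norm = np.linalg.norm(w)
        if norm > tol:                 # discard vectors already in the span
            basis.append(w / norm)
    return basis

# Example: an orthonormal basis for the span of (1,1,0) and (1,0,1) in R^3
print(gram_schmidt([[1, 1, 0], [1, 0, 1]]))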

The Gram-Schmidt method proves the following:

Theorem 5.3.2: Every finite-dimensional subspace of a real inner product space


has an orthonormal basis.

If the inner product is defined by the standard dot product in Rm , then there
is another way to find a vector v perpendicular to a collection of other vectors
u1 , . . . , uk : create the k × m matrix A by placing the vectors u1 , . . . , uk into the
rows of A, and solve the equation Av = 0. A vector v ∈ Rm that satisfies this equation
will be perpendicular to each of the vectors u1 , . . . , uk , as the matrix multiplication
Av is performed by taking the dot product of v with each of the u1 , . . . , uk , and
each result must be zero. The solution set will be the set of all vectors v such that
v ⊥ ui for all i = 1, 2, . . . , k.

This method, however, has its limitations. It won’t work if the inner product is not
the dot product. Furthermore, if the inner product is the standard dot product,
but the vector v is required to belong to some particular vector subspace, then one
must solve for both membership in the kernel (of the transformation defined by the
above matrix A) and membership in the desired vector subspace.

Example: Let T : R4 → R3 be the linear transformation defined by


 
1 −1 0 −4
0 1 1 2  .
1 0 1 −2

Find an orthonormal basis of ker(T ) (with respect to the standard Euclidean dot
product), and extend it to an orthonormal basis of R4 .

First we find a basis for the kernel of T . We start by row reducing the given matrix
to    
1 −1 0 −4 1 0 1 −2
0 1 1 2  and then 0 1 1 2 .
0 1 1 2 0 0 0 0
We can now read off a basis {(1, 1, −1, 0)t , (−2, 2, 0, −1)t } for the nullspace of the
matrix, i.e., for the kernel of T , but this basis is not orthonormal. Because the norm
of the second vector is simpler, namely 3, we take this vector (−2, 2, 0, −1)t first,
and divide by the norm to obtain v1 = (−2/3, 2/3, 0, −1/3)t , a unit vector. Now

we have the good fortune that (1, 1, −1, 0)t · (−2/3, 2/3, 0, −1/3)t = 0, so the second
vector v2 can be taken as (1/√3)(1, 1, −1, 0)t .

Rather than continue with arbitrary vectors that complete to a basis of R4 , it is


quicker to find some vectors that are already perpendicular to ker(T ). Returning to
the matrix  
1 −1 0 −4
0 1 1 2  ,
1 0 1 −2
notice that we found v1 and v2 as vectors that are perpendicular to the rows of
this matrix (or indeed to the row vectors in any matrix obtained during the row
reduction). So, for instance, v3∗ = (1, 0, 1, −2)t is already perpendicular to v1 and
v2 ; normalising, we have v3 = (√6/6)(1, 0, 1, −2)t .

Now take another row vector v4∗ = (0, 1, 1, 2)t ; we know this is orthogonal to v1 and
v2 , but it is not orthogonal to v3 . Using the Gram-Schmidt process, we obtain:
v̂4 = (0, 1, 1, 2)t − ((0, 1, 1, 2)t · (√6/6)(1, 0, 1, −2)t ) (√6/6)(1, 0, 1, −2)t
= (0, 1, 1, 2)t − (1/6)(−3)(1, 0, 1, −2)t = (0, 1, 1, 2)t + (1/2)(1, 0, 1, −2)t = (1/2, 1, 3/2, 1)t .
Finally, we normalise v̂4 to get v4 = (1/(3√2))(1, 2, 3, 2)t . The set {v1 , v2 , v3 , v4 } is an
orthonormal basis of R4 , and {v1 , v2 } is a basis for ker(T ).

The alternative method of finding a suitable v4 , given the above vectors v1 , v2 , v3 ,


goes as follows. Take convenient multiples of the vectors v1 , v2 , v3 as the rows of a
matrix, say  
−2 2 0 −1
 1 1 −1 0  ,
1 0 1 −2
with the aim of finding the vector v4 as an element of its nullspace. Row reduction
of this matrix leads to  
1 0 0 −1/2
0 1 0 −1  ,
0 0 1 −3/2
so the nullspace is spanned by the vector (1, 2, 3, 2)t . Divide this vector by its norm to
get v4 = (1/√18)(1, 2, 3, 2)t . This is the same vector as that found by the Gram-Schmidt
method. (This is no accident: the only other vector in R4 that is orthogonal to all
of v1 , v2 and v3 is −v4 .)
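As a quick numerical sanity check of this example, one can verify with NumPy that the four vectors are orthonormal and that v1 and v2 lie in the kernel of T (a small sketch; the variable names are just for illustration):

import numpy as np

A = np.array([[1, -1, 0, -4],
              [0,  1, 1,  2],
              [1,  0, 1, -2]], dtype=float)

v1 = np.array([-2, 2,  0, -1]) / 3
v2 = np.array([ 1, 1, -1,  0]) / np.sqrt(3)
v3 = np.array([ 1, 0,  1, -2]) / np.sqrt(6)
v4 = np.array([ 1, 2,  3,  2]) / np.sqrt(18)

Q = np.column_stack([v1, v2, v3, v4])
print(np.allclose(Q.T @ Q, np.eye(4)))                  # True: the columns are orthonormal
print(np.allclose(A @ v1, 0), np.allclose(A @ v2, 0))   # True True: v1 and v2 lie in ker(T)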

Lecture 6: Orthogonal Matrices


6.1 Orthogonal Matrices
Definition: An n × n matrix A with real entries is orthogonal if A−1 = At , (where
At is the transpose of the matrix A).

The following is an orthogonal matrix:


 
2/3 2/3 1/3
−2/3 1/3 2/3 .
1/3 −2/3 2/3

Notice that the vectors written in the columns form an orthonormal basis. The same
is true of the vectors written into the rows.

Theorem 6.1.1: Consider the inner product space Rn , with the dot product, and
let T : Rn → Rn be a linear transformation represented by the matrix AT with
respect to the standard basis. The following are equivalent:
(1) AT is orthogonal;
(2) T maps the standard basis of Rn to an orthonormal basis of Rn , meaning that
the columns of AT form an orthonormal basis of Rn ;
(3) T preserves distance, i.e, ||T (v)|| = ||v|| for all v ∈ Rn ,
(4) T preserves the dot product, meaning T (u) · T (v) = u · v for all u, v ∈ Rn .

Proof:
(1) and (2) are equivalent: For every 1 ≤ i ≤ n, set vi = T (ei ) = AT ei . Then
vi appears as the ith column of AT , and therefore also as the ith row of AtT .
Writing each vi as vi = (a1i , . . . ani ), we see that AtT AT = I if and only if a1i a1j +
· · · + ani anj = vi · vj is equal to 0 if i 6= j, while a21i + · · · + a2ni = vi · vi is equal to 1.
These conditions are equivalent to saying that {v1 , . . . , vn } is an orthonormal basis.

(3) and (4) are equivalent: We know that ||x||2 = x · x. Also, one can check
that
x · y = (1/4)||x + y||2 − (1/4)||x − y||2 ,
for all x, y ∈ Rn . This means that the dot product and the norm can each be written

in terms of the other, and therefore preserving one is equivalent to preserving the
other.

(2) and (4) are equivalent: The inner product is determined entirely by what it
does to pairs of basis elements. Therefore, preserving the inner product is equivalent
to requiring that vi · vj = ei · ej for all pairs i and j, and this is equivalent to (2). 2

6.2 Closure
Theorem 6.2.1: Let A and B be n × n orthogonal matrices. Then:
(1) A−1 is orthogonal,
(2) AB is orthogonal,
(3) det(A) is equal to either 1 or −1.

Proof: (1) follows from A−1 (A−1 )t = At (A−1 )t = (A−1 A)t = I t = I.


(2) follows from AB(AB)t = ABB t At = AAt = I.
(3) follows from 1 = det(I) = det(AAt ) = det(A) det(At ) = det(A)2 . 2

Properties (1) and (2) above are called “closure” properties: the class of n × n
orthogonal matrices is closed under taking inverses, and taking products. We shall
see similar results for other interesting classes of matrices later.
The orthogonal matrices with determinant equal to 1 are called orientation-preserving
and the orthogonal matrices with determinant −1 are called orientation-reversing.

6.3 Two-dimensional orthogonal matrices


In dimension 2, we can describe all the orthogonal matrices in a very simple way.

Suppose that A is a 2 × 2 orthogonal matrix. Set v1 = Ae1 and v2 = Ae2 . We know


from Theorem 6.1.1 that ||v1 || = ||v2 || = 1, and v1 · v2 = 0. Let θ be the angle
between (1, 0)t and v1 , so 0 ≤ θ ≤ 2π is such that v1 = (cos θ, sin θ)t .
Because v2 is orthogonal to v1 , the angle between (1, 0)t and v2 must be either θ + π/2
or θ − π/2. Let's look at these two cases separately.

If the angle between (1, 0)t and v2 is θ + π/2, then v2 = (cos(θ + π/2), sin(θ + π/2))t =
(− sin θ, cos θ)t , and the transformation that takes e1 to v1 and e2 to v2 is the
rotation by the angle θ, represented by the matrix
 
cos θ − sin θ
A= ,
sin θ cos θ
and the determinant of A is cos2 θ + sin2 θ = 1.

If the angle between (1, 0)t and v2 is θ − π/2, then v2 = (cos(θ − π/2), sin(θ − π/2))t =
(sin θ, − cos θ)t , and the transformation that takes e1 to v1 and e2 to v2 is represented
by the matrix  
cos θ sin θ
A= ,
sin θ − cos θ
with determinant − cos2 θ − sin2 θ = −1.
The transformation represented by A is a reflection about the line through the origin
and the vector u = (cos(θ/2), sin(θ/2))t .

We have proven the following theorem.

Theorem 6.3.1 In R2 , a linear transformation represented by an orientation-


preserving orthogonal matrix is a rotation about the origin, and a linear trans-
formation represented by an orientation-reversing orthogonal matrix is a reflection
about a line through the origin.

Conversely, because any rotation or reflection of R2 that fixes the origin defines a
linear transformation that is distance-preserving, its corresponding matrix is orthog-
onal.

What happens when two orientation-preserving 2 × 2 orthogonal matrices (corre-


sponding to rotations) are composed? If one represents rotation by the angle θ1 and
the other by the angle θ2 , then the result will be a rotation by the angle θ1 + θ2 .

What happens when two orientation-reversing matrices (corresponding to reflec-


tions) are composed? Because their product will have determinant equal to one,
their composition is again a rotation.

6.4 Dimension 3
What happens in R3 ? Surprisingly, the structure of orthogonal linear transforma-
tions in R3 is almost identical to that of R2 .

Theorem 6.4.1: Any 3 × 3 orientation-preserving orthogonal matrix represents


a rotation about some axis. Any 3 × 3 orientation-reversing orthogonal matrix
represents either a reflection, or a rotation combined with a reflection about the
plane perpendicular to the axis of rotation.

The theorem is based on the following two results.



Theorem 6.4.2: Let A be an n × n orthogonal matrix.


(1) If A is orientation-reversing, then −1 is an eigenvalue of A.
(2) If n is odd, then either 1 or −1 is an eigenvalue of A.
(3) If n is odd and A is orientation-preserving, then 1 is an eigenvalue of A.

Theorem 6.4.3: Let v be an eigenvector associated to the eigenvalue 1 or −1 of


a 3 × 3 orthogonal matrix A, and extend {v} to an orthonormal basis {v, u1 , u2 }
of R3 . Then the transformation T represented by A takes the subspace spanned by
{u1 , u2 } to itself.

We omit the proofs.

6.5 Axis and angle of rotation


Given a distance-preserving linear transformation of R3 , how do we find the matrix
representing it? Conversely, given a 3 × 3 orthogonal matrix A, how do we find out
what transformation it represents?

The general method is to write down the matrix representing the transformation
with respect to a convenient orthonormal ordered basis, and then change basis.

Example: Find the orientation-reversing orthogonal matrix A that represents re-


flection in the plane P with equation 2x1 + x3 = 0.
The vector (2, 0, 1)t is normal to P and it is thus an eigenvector associated with
the eigenvalue −1. Any vector in the plane P is an eigenvector with eigenvalue
1. We normalise (2, 0, 1)t to get v1 := √15 (2, 0, 1)t . For our next basis element,
we take a vector in the plane with a simple form, for instance (0, 1, 0)t , which is
already of norm 1 so we may take v2 = (0, 1, 0)t . The only choice for a third vector
orthogonal to both the previous vectors is v3 = √15 (1, 0, −2)t (or its negation). With
regard to this orderedbasis B =  (v1 , v2 , v3 ), the matrix representing the linear
−1 0 0
transformation is S =  0 1 0. We now write down the change-of-basis matrix
  0 0 1
√2 0 √15
 5

MB =  0 1 0 . Note that MB is an orthogonal matrix, so its inverse is MBt ,
√1 −2
5
0 √ 5
37

which in this case is again MB . We now write down the answer:

A = MB SMB−1 = [2/√5 0 1/√5; 0 1 0; 1/√5 0 −2/√5] [−1 0 0; 0 1 0; 0 0 1] [2/√5 0 1/√5; 0 1 0; 1/√5 0 −2/√5]
= [−3/5 0 −4/5; 0 1 0; −4/5 0 3/5].

One may check that indeed (2, 0, 1)t is mapped to (−2, 0, −1)t by A, while (0, 1, 0)t
and (1, 0, −2)t are mapped to themselves.
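One can also check this numerically. The sketch below (illustrative only) rebuilds A as MB S MBt and compares it with the standard formula I − 2nnt for reflection in the plane with unit normal n, a formula not derived in the notes but giving the same matrix:

import numpy as np

n = np.array([2.0, 0.0, 1.0]) / np.sqrt(5)          # unit normal to the plane 2x1 + x3 = 0
A_formula = np.eye(3) - 2 * np.outer(n, n)          # reflection in the plane (standard formula)

M_B = np.column_stack([n, [0.0, 1.0, 0.0], np.array([1.0, 0.0, -2.0]) / np.sqrt(5)])
S = np.diag([-1.0, 1.0, 1.0])
A = M_B @ S @ M_B.T                                 # the change-of-basis construction above

print(np.allclose(A, A_formula))                    # True
print(A @ np.array([2.0, 0.0, 1.0]))                # [-2.  0. -1.]: the normal direction is reversed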

Example: What transformation is represented by the orthogonal matrix

C = [0 −1 0; 1 0 0; 0 0 −1] · (1/9)[1 −8 −4; −8 1 −4; −4 −4 7] = (1/9)[8 −1 4; 1 −8 −4; 4 4 −7].

We check that C is an orthogonal matrix with determinant +1, and so it is orientation-


preserving, and hence represents a rotation. What is the axis of the rotation, and
what is the angle of rotation?

We know that 1 is an eigenvalue of C, and we need to look at the nullspace of C − I:


 
C − I = (1/9) [−1 −1 4; 1 −17 −4; 4 4 −16].
We notice that the third column is −4 times the first, and deduce that the
vector (4, 0, 1)t is in the nullspace of C − I, and so defines the axis of rotation.
To find the angle of rotation, it suffices to choose any vector v (of norm 1) perpendic-
ular to (4, 0, 1)t , take the dot product v · (Cv), and then the angle of rotation will be
the arc-cosine of this quantity v · (Cv). The easiest vector to choose is (0, 1, 0)t ; it is
mapped to (1/9)(−1, −8, 4)t , and the dot product of the two vectors is −8/9, meaning that
the rotation is almost half-way around the circle, through an angle arccos(−8/9).
By observing what happens to (0, 1, 0)t , it is possible to see that the direction of
rotation is anticlockwise, given that the axis is oriented by (4, 0, 1)t and the angle is
assumed to be between 0 and π – we’ll come back to this in the next section.

Using the trace allows for an easier calculation of the angle of rotation of a 3 × 3
orthogonal matrix, without even finding the axis of rotation! One needs to know
only whether the matrix is orientation-reversing or orientation-preserving.
An orientation-preserving orthogonal matrix A, representing a rotation about some
axis through angle θ, is similar to the matrix B that represents a rotation about
the axis (0, 0, 1)t through angle θ, and we know that B = [cos θ −sin θ 0; sin θ cos θ 0; 0 0 1]. The
trace of B is 1 + 2 cos θ, and therefore the trace of the similar matrix A is also
1 + 2 cos θ.
 
For instance, the orientation-preserving orthogonal matrix C = (1/9)[8 −1 4; 1 −8 −4; 4 4 −7] above has
trace −7/9: if it represents a rotation through an angle θ, then 1 + 2 cos θ = −7/9,
so cos θ = −8/9, in agreement with our earlier calculation.
Similarly, if A is an orientation-reversing orthogonal matrix, with angle of rotation
θ, then it is similar to [cos θ −sin θ 0; sin θ cos θ 0; 0 0 −1], and so its trace will equal −1 + 2 cos θ.

6.6 Clockwise or anticlockwise?


What matrix AT represents a rotation T through an angle π/2 clockwise about the
axis (1, 0, 2)t ?
The method for answering this question is similar to that in the previous section:
we first find an orthonormal basis including v3 = (1/√5)(1, 0, 2)t . A natural choice for
the other two vectors is (0, 1, 0)t and (1/√5)(2, 0, −1)t . With respect to this basis, T is
represented by [0 1 0; −1 0 0; 0 0 1]. Now we change basis.
However, there is a difficulty. Which of our two new vectors should be v1 and which
v2 ? This does make a difference! In fact, if we choose the wrong order, then we get
the matrix representing a rotation through π/2 anticlockwise about the given axis.
We should be clear about what is meant by a clockwise rotation about an axis. Think
of the axis as pointing up out of a plane: we want the rotation to be clockwise if we
look down on that plane. It is worth pointing out that a rotation of π/2 clockwise
about (1, 0, 2)t is the same as a rotation of π/2 anticlockwise about (−1, 0, −2)t .
 
0 1 0
Our rotation matrix −1 0 0 is written in terms of the standard ordered basis
0 0 1
(e1 , e2 , e3 ), and it fixes e3 as the axis, and rotates vectors in the x−y plane clockwise

by π/2, so that the image of e1 is −e2 , and the image of e2 is e1 .


A key point is that, in the standard ordered basis, e3 is above the plane in which
e2 lies π/2 clockwise of e1 . When we switch to a new ordered basis (v1 , v2 , v3 ),
we should preserve this property. The way to check this is simple: write the three
basis vectors as the columns of a matrix: if the determinant is positive (actually,
it will be +1), then the vectors are the “right way round” – this means that the
transformation taking the standard basis to the new one is orientation-preserving.
If the determinant turns out to be negative, we can reverse the order of v1 and v2 ,
or negate one of the vectors.
 
0 2 1
Here, for instance, the matrix 1 0 0 has negative determinant, and hence
0 −1 2

so does the matrix obtained by dividing two of its columns by √5. So this is
the wrong order of vectors, and we should take our ordered basis to be B =
((1/√5)(2, 0, −1)t , (0, 1, 0)t , (1/√5)(1, 0, 2)t ). Note again that the matrix MB with these vec-
tors as its columns is an orthogonal matrix, so MB−1 = MBt .
Now we have that AT = MB ATB,B MBt , which is equal to:

(1/5) [2 0 1; 0 √5 0; −1 0 2] [0 1 0; −1 0 0; 0 0 1] [2 0 −1; 0 √5 0; 1 0 2] = (1/5) [1 2√5 2; −2√5 0 √5; 2 −√5 4].

Conversely, how do we recognise whether a given rotation is clockwise or anticlock-


wise about a given axis?
Given an axis x, and another vector y orthogonal to it, consider the plane spanned
by the two in which y lies “to the left” of x – i.e., a rotation of π/2 anticlockwise in
the plane takes x to y. We complete to an orthogonal basis by including the vector
z pointing upwards from the plane. We see that the matrix with columns (x, y, z)
has positive determinant (+1 if all the vectors are unit vectors).
Now hold x fixed and consider rotating around the axis in an anticlockwise direction,
through an angle between 0 and π: we see that y is moved above the plane. So, if
A represents an anticlockwise rotation, then Ay has a positive component in the z
direction.
We now write Ay = αx + βy + γz. So the rotation about x represented by A is
anticlockwise if γ > 0. Now we write (x, y, Ay) for the matrix with these vectors as
its columns. To evaluate its determinant, we note that the determinant is unchanged
on subtracting α times the first column and β times the second from the third
column, and so

det(x, y, Ay) = det(x, y, γz) = γ det(x, y, z) = γ.

This gives us the following result.


Theorem 6.6.1: If A represents a rotation about x, and y is orthogonal to x,
then the rotation is anticlockwise, through an angle between 0 and π, if and only if
det(x, y, Ay) > 0.

For example, the matrix C in the previous section represented a rotation about
(4, 0, 1)t , and it mapped (0, 1, 0)t to (1/9)(−1, −8, 4)t . We have already seen that the
angle of rotation is arccos(−8/9) – we take this to mean the number between 0 and
+π whose cosine is −8/9. We calculate

det [4 0 −1; 0 1 −8; 1 0 4] = det [4 −1; 1 4] > 0,
so the rotation is anticlockwise.
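These calculations are easy to reproduce numerically. A minimal NumPy sketch (variable names are illustrative) checks the axis, recovers the angle from the trace, and applies the determinant test of Theorem 6.6.1:

import numpy as np

C = np.array([[ 8, -1,  4],
              [ 1, -8, -4],
              [ 4,  4, -7]], dtype=float) / 9

axis = np.array([4.0, 0.0, 1.0]) / np.sqrt(17)
print(np.allclose(C @ axis, axis))              # True: (4, 0, 1)t spans the axis of rotation

theta = np.arccos((np.trace(C) - 1) / 2)        # trace = 1 + 2 cos(theta) for a rotation
print(np.isclose(np.cos(theta), -8/9))          # True

y = np.array([0.0, 1.0, 0.0])                   # a vector orthogonal to the axis
print(np.linalg.det(np.column_stack([axis, y, C @ y])) > 0)   # True: anticlockwise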

6.7 Orthogonal matrices in n dimensions


In general, if A is an n × n orthogonal matrix, then there is an n × n orthogonal
matrix M such that M−1 AM is the block-diagonal matrix

 Q1 ... 0   0  ... 0
 ... ... ... ... ... ...
 0  ... Qm  0  ... 0
 0  ... 0   λ1 ... 0
 ... ... ... ... ... ...
 0  ... 0   0  ... λk ,

where the λi are entries on the diagonal that are either 1 or −1, and the Qi are 2 × 2
blocks on the diagonal representing two-dimensional rotations, so each Qi is of the
form [cos αi −sin αi ; sin αi cos αi ]. Both extremes are possible, namely k = 0 or m = 0.

Lecture 7: Complex Inner Product Spaces


7.1 Complex numbers
The complex numbers are all numbers of the form a + bi, where a and b are real
numbers, and i2 = −1. The symbol C stands for the set of all complex numbers.
The product (a1 + b1 i)(a2 + b2 i) is equal to a1 a2 − b1 b2 + (a1 b2 + a2 b1 )i, because
i2 = −1. The complex numbers are closed under multiplication and addition.
If a and b are real numbers then the real part Re(a + bi) of a + bi is a, and the
imaginary part Im(a + bi) is b.

If c = a + bi is a complex number then the conjugate of c is the number a − bi


and is written as c. If one takes the conjugate twice one gets the original number:
c = a + bi = a − bi = a + bi = c. If c = a + bi then c + c = (a + bi) + (a − bi) = 2a =
2 Re(c) and c − c = (a + bi) − (a − bi) = 2bi = 2i Im(c).

The modulus of a complex number c = a + bi ∈ C is defined to be |c| = √(a2 + b2 ).
We have cc = (a + bi)(a − bi) = a2 + b2 = |c|2 .

For any non-zero complex number c = a + bi, the modulus |c| is positive, and the
complex number c̄/|c|2 is a multiplicative inverse for c.
When performing arithmetic with complex numbers, never leave i in the lower part
of a fraction as a final answer. If necessary, multiply top and bottom of the fraction
by the conjugate of what is on the bottom. For example, if the number (1 + i)/(−3 + i) appears,
then multiply this fraction by (−3 − i)/(−3 − i) to get
(−3 − i)(1 + i)/((−3 − i)(−3 + i)) = (−2 − 4i)/10 = (−1 − 2i)/5.
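Python's built-in complex numbers (written with j, so 1j stands for i) make such checks immediate; the snippet below is just an illustration of the calculation above:

z = (1 + 1j) / (-3 + 1j)                                # Python divides complex numbers directly
w = ((1 + 1j) * (-3 - 1j)) / ((-3 + 1j) * (-3 - 1j))    # multiply top and bottom by the conjugate
print(z, w, abs(z - w) < 1e-15)                         # both equal (-1 - 2i)/5 = -0.2 - 0.4j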

As one can take the conjugate of a complex number, so one can take the conjugate
of a vector in Cn . If v = (c1 , . . . , cn )t , then v̄ = (c̄1 , . . . , c̄n )t .
Also one can take the conjugate of a matrix: if A is a complex n × m matrix with
entries aij , then the conjugate matrix Ā has entries āij .
The complex conjugate of a product of complex numbers c and d is the product c̄ d̄ of their conjugates.
It follows that, for complex matrices C and D, the conjugate of CD is C̄ D̄.
We can calculate the determinant of Ā using the same calculation as for det(A),
except that at every step of the calculation we have the complex conjugate. So
det(Ā) is the complex conjugate of det(A).

7.2 Complex inner products


We would like to build a theory of inner products on complex vector spaces.
Our prime example will be Cn , equipped with a dot product operation, but what
exactly should this be? One property we want to keep from the real case is that z · z
should be the square of the “length” of the complex vector z. This square in turn
should be equal to |z1 |2 + · · · + |zn |2 = z1 z̄1 + · · · + zn z̄n . It is then a short step to
imagine that the "correct" definition of the complex dot product should be
z · w = z1 w̄1 + · · · + zn w̄n .
This is a function from Cn × Cn to C.
As in the real case, we collect some key properties of this function.

• The dot product is linear in the first variable: for all w, z1 , z2 in Cn , and all
complex numbers α1 , α2 , we have
(α1 z1 + α2 z2 ) · w = α1 (z1 · w) + α2 (z2 · w).

• The dot product is Hermitian: w · z is the complex conjugate of z · w, for all z and w in Cn .


• The dot product is positive: z · z is a positive real number, for all non-zero z
in Cn .

Notice the differences between the real and complex cases. Another key difference
comes when we look at linearity properties in the second variable. We see that, for
all z, w1 , w2 in Cn , and all complex numbers α1 , α2 ,
z · (α1 w1 + α2 w2 ) = the conjugate of [(α1 w1 + α2 w2 ) · z]
= the conjugate of [α1 (w1 · z) + α2 (w2 · z)]
= ᾱ1 · (the conjugate of w1 · z) + ᾱ2 · (the conjugate of w2 · z)
= ᾱ1 (z · w1 ) + ᾱ2 (z · w2 ).
We say that the dot product is conjugate linear in the second variable.

Definition: A complex inner product on a complex vector space V is a function


h·, ·i from V × V to C such that:

• The function is linear in the first variable: for all w, z1 , z2 in V , and all complex
numbers α1 , α2 , we have
hα1 z1 + α2 z2 , wi = α1 hz1 , wi + α2 hz2 , wi.

• The function is Hermitian: hw, zi is the complex conjugate of hz, wi, for all z and w in V .


• The function is positive: hz, zi is a positive real number, for all non-zero z in V .

A complex vector space equipped with a complex inner product is called a complex
inner product space.

As above, a complex inner product is conjugate linear in the second variable:


hz, α1 w1 + α2 w2 i = ᾱ1 hz, w1 i + ᾱ2 hz, w2 i
for all z, w1 , w2 in V , and all complex numbers α1 , α2 .

Given a basis B = {v1 , . . . , vn } for the finite-dimensional complex vector space V ,


a complex inner product on V is determined by what happens to any pair vi , vj of
basis elements, and so the inner product can be represented by an n × n complex
matrix A, with the entry aij ∈ C being the value of hvi , vj i.
There is another significant difference between the theory of complex and real inner
products. If x = x1 v1 + · · · + xn vn and y = y1 v1 + · · · + yn vn , we cannot write
hx, yi = (x1 , . . . , xn )A(y1 , . . . , yn )t as before. Instead we write
  
hx, yi = (x1 , . . . , xn ) A (ȳ1 , . . . , ȳn )t = xt Aȳ,
where ȳ = (ȳ1 , . . . , ȳn )t is the entrywise complex conjugate of y.

From any complex inner product, we can define a norm for a vector:
||v|| = √hv, vi.

Example 1: We have already seen that the complex dot product hu, vi = u · v :=
u1 v̄1 + · · · + un v̄n is an inner product on Cn . This is sometimes called the Euclidean
inner product on Cn . In matrix form, we can write

u · v = (u1 , . . . , un ) I (v̄1 , . . . , v̄n )t = ut v̄,

where I is the n × n identity matrix.

The square of the norm of a vector v ∈ Cn , with respect to the dot product, is given
by v · v. If v = (v1 , . . . , vn )t , then we have
||v||2 = v1 v̄1 + · · · + vn v̄n = |v1 |2 + · · · + |vn |2 .
Therefore ||v|| = √(|v1 |2 + · · · + |vn |2 ).
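For numerical experiments it is worth noting the conjugation convention: the notes conjugate the second argument of the dot product, whereas NumPy's np.vdot conjugates its first argument. A small sketch (the helper name cdot is an illustrative choice):

import numpy as np

def cdot(z, w):
    # the notes' convention: z . w = z_1 conj(w_1) + ... + z_n conj(w_n)
    return np.sum(z * np.conj(w))

v = np.array([1 + 1j, 2 - 1j, 3j])
print(cdot(v, v))                                                # (16+0j): a non-negative real number
print(np.isclose(np.sqrt(cdot(v, v).real), np.linalg.norm(v)))   # True: this is ||v||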

Example 2: Let PC be the space of polynomial functions with complex coefficients.


For any polynomial f ∈ PC , define f̄ to be the function given by f̄(x) = the complex conjugate of f (x).
For example, if f (x) = ix2 + (2 − 3i)x − 4 + 3i, then the complex conjugate f̄ is
given by f̄(x) = −ix2 + (2 + 3i)x − 4 − 3i.
Define h·, ·i on PC by hf, gi = ∫01 f (x)ḡ(x) dx. To see that this is linear in the first
variable:
haf1 + bf2 , gi = ∫01 (af1 (x) + bf2 (x))ḡ(x) dx
= a ∫01 f1 (x)ḡ(x) dx + b ∫01 f2 (x)ḡ(x) dx = ahf1 , gi + bhf2 , gi.

By its definition, h·, ·i is Hermitian. It is positive because f (x)f̄(x) = |f (x)|2 ≥ 0
for all x, and ∫01 |f (x)|2 dx = 0 implies that f is the zero function. So h·, ·i is a
complex inner product on PC .

Example 3: Let ℓ2 (C) be the set of infinite sequences (a1 , a2 , . . . ) of complex
numbers such that Σ∞i=1 |ai |2 < ∞. Define ha, bi to be Σ∞i=1 ai b̄i , where a =
(a1 , a2 , . . . ) and b = (b1 , b2 , . . . ). This defines an inner product on ℓ2 (C).

7.3 Orthonormal sets


A set {v1 , . . . , vk } of vectors in a complex inner product space V is orthonormal if
hvi , vj i is 0 in case i ≠ j and otherwise hvi , vi i = 1 for all i = 1, . . . , k.

Examples (with the standard dot product in Cn ):


Example 1: v1 = (√2/2)(1, i)t , v2 = (√2/2)(i, 1)t . The norms are equal to 1 because
(1, i)t · (1, i)t = 1 + i(−i) = 2 and (i, 1)t · (i, 1)t = i(−i) + 1 = 2. The dot product is
(1/2)((1, i)t · (i, 1)t ) = (1/2)(−i + i) = 0, so the two vectors form an orthonormal set.

Example 2: v1 = (1/2)(1, i, 1 + i)t , v2 = (1/2)(i, −1, 1 − i)t . The norm of each vector is 1,
and their dot product is (1/4)(−i − i + (1 + i)(1 + i)) = (1/4)(−2i + 1 − 1 + 2i) = 0.

There is no essential difference between the proof of the following result and that of
Theorem 5.3.1.

Theorem 7.3.1: Any orthonormal set is linearly independent.

As with real inner product spaces, the Gram-Schmidt procedure can be performed

with complex inner product spaces. The slight difference is that the new vector vk+1
should appear in the first position, with the other previous vectors v1 , v2 , . . . , vk in
the second position. The inner product hu, vi represents the component of u in the

direction v, but not the component of v in the direction of u.

We demonstrate the Gram-Schmidt process by extending the orthonormal set from


Example 2 to an orthonormal basis of C3 , with respect to the dot product.

Take v3∗ = (1, 0, 0)t . We get


v̂3 = (1, 0, 0)t − (1/4)((1, 0, 0)t · (1, i, 1 + i)t )(1, i, 1 + i)t − (1/4)((1, 0, 0)t · (i, −1, 1 − i)t )(i, −1, 1 − i)t
= (1, 0, 0)t − (1/4)(1, i, 1 + i)t − (1/4)(−i)(i, −1, 1 − i)t
= (1/4)((4, 0, 0)t − (1, i, 1 + i)t − (1, i, −1 − i)t )
= (1/4)(2, −2i, 0)t .
So the last vector v3 is any unit multiple of (1, −i, 0)t , such as (1/√2)(1, −i, 0)t . It is
worth noting that, in contrast to the real case, there are other examples of unit
vectors orthogonal to v1 and v2 besides v3 and −v3 . For instance, u = (1/2)(1 + i, 1 − i, 0)t
is orthogonal to both; the explanation for this is that u is a multiple of v3 by
a complex number of modulus 1, namely u = ((1 + i)/√2) v3 .
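A quick numerical check that {v1 , v2 , v3 } really is an orthonormal basis of C3 (a sketch only; cdot implements the notes' dot product convention):

import numpy as np

def cdot(z, w):
    return np.sum(z * np.conj(w))      # conjugate the second argument, as in the notes

v1 = np.array([1, 1j, 1 + 1j]) / 2
v2 = np.array([1j, -1, 1 - 1j]) / 2
v3 = np.array([1, -1j, 0]) / np.sqrt(2)

for a in (v1, v2, v3):
    for b in (v1, v2, v3):
        assert np.isclose(cdot(a, b), 1.0 if a is b else 0.0)
print("orthonormal basis of C^3")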

The process of solving linear systems of equations through Gaussian elimination is


the same in the complex case.
    
Example: Solve the set of linear equations [i 2 0; 1 −i −1; 0 i 1] (x1 , x2 , x3 )t = (i, 1, 1 − i)t .
As usual, we form the augmented matrix, and perform row reductions. Switch the
first two rows and subtract i times the first row from the second row:

 
1 −i −1 1
0 1 i 0 .
0 i 1 1−i
Add i times the second row to the first row and −i times the second row to the
third row:  
1 0 −2 1
0 1 i 0 .
0 0 2 1−i

Add the third row to the first row, and then divide the third row by 2:
 
1 0 0 2−i
0 1 i 0 .
0 0 1 (1 − i)/2
Finally, subtract i times the third row from the second row:
 
1 0 0 2−i
 0 1 0 (−1 − i)/2  .
0 0 1 (1 − i)/2
We read off the solution x1 = 2 − i, x2 = (−1 − i)/2, x3 = (1 − i)/2.
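NumPy handles complex linear systems in exactly the same way as real ones, so the solution above can be checked directly (an illustrative sketch):

import numpy as np

A = np.array([[1j,  2,   0],
              [ 1, -1j, -1],
              [ 0,  1j,  1]])
b = np.array([1j, 1, 1 - 1j])
print(np.linalg.solve(A, b))    # [2.-1.j  -0.5-0.5j  0.5-0.5j], matching the hand calculation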
 
Example: Find a basis for the nullspace of the matrix A = [1 1−i i; 0 1 2−i; 0 1+i 3+i], and
extend it to an orthonormal basis of C3 .

We perform Gaussian elimination, to solve the equation Ax = 0. One conceptual


difference is that the nullspace will not be orthogonal to the rows of A, but rather
to their conjugates.
It is not immediately apparent that the third row of A is a multiple of the second
row, but multiplying the second row by 1 + i does indeed yield the third row. With
the third row gone, continue by adding 1 − i times the second row to the first row:
 
1 0 1 − 2i
.
0 1 2−i

In the nullspace of A is the vector v̂1 = (1 − 2i, 2 − i, −1)t , with norm √11, so we
take v1 = (1/√11)(1 − 2i, 2 − i, −1)t .
To complete to an orthonormal basis of C3 , we write down two linearly independent
vectors orthogonal to v1 , for instance v2∗ = (1, 0, 1 + 2i)t and v3∗ = (0, 1, 2 + i)t . After
normalising, one gets v2 = (1/√6)(1, 0, 1 + 2i)t . As v3∗ is already perpendicular to v1 we
need only take its inner product with v2 .
v̂3 = (0, 1, 2 + i)t − (1/6)((0, 1, 2 + i)t · (1, 0, 1 + 2i)t )(1, 0, 1 + 2i)t
= (0, 1, 2 + i)t − ((4 − 3i)/6)(1, 0, 1 + 2i)t
= (1/6)(−4 + 3i, 6, 2 + i)t .
Normalising, we have v3 = (1/√66)(−4 + 3i, 6, 2 + i)t .

Lecture 8: The Adjoint


8.1 The adjoint of a complex matrix
In this lecture, we will introduce some important special classes of complex matrices.
First, we look at the analogue of the transpose operation for complex matrices.
For any complex valued matrix A, we define the adjoint A∗ of A to be Āt , the result
of taking the conjugate of all entries and transposing the matrix. Because the two
operations commute with each other, we have (A∗ )∗ = A, as was also the case with
the transpose.

For the adjoint of a product, we have: (AB)∗ = (ĀB̄)t = B̄ t Āt = B ∗ A∗ .


If A is invertible, then A∗ (A−1 )∗ = (A−1 A)∗ = I ∗ = I, which means that (A−1 )∗ is
the inverse of A∗ .
The determinant of A∗ is equal to the determinant of its transpose, which is Ā. So det(A∗ ) =
det(Ā), the complex conjugate of det(A).

The pairing of the n × n matrix A and its adjoint A∗ has a special relationship to
the complex inner product. This relationship is expressed in the following result.

Theorem 8.1.1: For every pair x, y of vectors of Cn , and any n×n complex matrix
A, we have (Ax) · y = x · (A∗ y).

Proof: We have
(Ax) · y = (Ax)t ȳ = xt At ȳ = x · (A∗ y), since x · (A∗ y) = xt times the conjugate of A∗ y = Āt y, which is At ȳ.
2
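The identity (Ax) · y = x · (A∗ y) is easy to test numerically; the sketch below uses a random complex matrix and the notes' dot product convention (all names are illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
x = rng.normal(size=3) + 1j * rng.normal(size=3)
y = rng.normal(size=3) + 1j * rng.normal(size=3)

def cdot(z, w):
    return np.sum(z * np.conj(w))       # z . w, conjugating the second argument

A_star = A.conj().T                     # the adjoint: conjugate transpose
print(np.isclose(cdot(A @ x, y), cdot(x, A_star @ y)))   # True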

8.2 Unitary matrices


Definition An n × n complex valued matrix A is unitary if A∗ A = AA∗ = I (or
equivalently A∗ = A−1 ).

If A is a real matrix, then it is unitary if AAt = I, in other words if it is orthogonal.


The class of unitary matrices is the generalisation to complex vector spaces of the
class of orthogonal matrices.

Theorem 8.2.1: Consider the space Cn with the standard dot product. Let A be
an invertible n × n complex matrix. Then the following are equivalent:
(1) A is unitary;
(2) the columns of A form an orthonormal basis of Cn ;

(3) the transformation defined by A preserves the inner product, i.e., (Au) · (Av) =
u · v, for all u, v ∈ Cn .
(4) the transformation defined by A preserves distance: ||Av|| = ||v|| for all v ∈ Cn .

Proof:
(1) and (2) are equivalent: For every 1 ≤ i ≤ n, let vi = Aei be the ith column
of A. Expressing vi as vi = (a1i , . . . ani )t and vi = (a1i , . . . ani )t , we have A∗ A = I
if and only if a1i a1j + · · · + ani anj = vj · vi is equal to 0 if i 6= j, and otherwise
a1i a1i + · · · + ani ani = vi · vi is equal to 1. These are precisely the conditions for the
vi to form an orthonormal basis.
(2) and (3) are equivalent: The inner product is determined entirely by what it
does to pairs of basis elements. Therefore, preserving the inner product is equivalent
to (Aei ) · (Aej ) = ei · ej for all pairs i and j, and this is equivalent to (2).
(3) and (4) are equivalent: The norm of a vector is defined in terms of the inner

product: ||v|| = √(v · v). So, if A preserves inner products, it also preserves norms.
On the other hand, Exercise 30 shows that the inner product of two vectors can
be written as a function of the norms of certain associated vectors. Therefore, if A
preserves norms, then it also preserves inner products. 2

Theorem 8.2.2: Let A and B be n × n unitary matrices.


(1) A−1 is unitary,
(2) AB is unitary,
(3) | det(A)| = 1.

Proof: (1) As A∗ = A−1 , we have (A−1 )∗ = A = (A−1 )−1 , so A−1 is unitary.


(2) AB(AB)∗ = ABB ∗ A∗ = AA∗ = I.
(3) 1 = det(I) = det(AA∗ ) = det(A) det(A∗ ) = det(A)det(A) = | det(A)|2 , so
| det(A)| = 1. 2

If A is a complex matrix, then det(A) is a complex number. Part (3) above says
that, if A is unitary, then this complex number has modulus 1.

If A is a unitary n × n matrix, then the columns of A form an orthonormal basis of


Cn . Moreover, as A∗ is unitary, the columns of A∗ = A−1 also form an orthonormal
basis of Cn , and these are the conjugates of the row vectors of A. It follows that the
rows of A also form an orthonormal basis of Cn .

Theorem 8.2.3: Every eigenvalue λ of a unitary matrix has norm 1.



Proof: Let v be an eigenvector of a matrix A corresponding to the eigenvalue λ; so


v 6= 0 and Av = λv. We have (Av) · (Av) = λλ(v · v) = |λ|2 (v · v). On the other
hand, from Theorem 8.1.1, we have (Av) · (Av) = v · (A∗ Av) = v · v. As v 6= 0,
this implies that |λ| = 1. 2

Alternative proof: a unitary matrix A is distance-preserving. So, for v an eigenvector


with eigenvalue λ, we have ||v|| = ||Av|| = ||λv|| = |λ| ||v||.
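One convenient way to produce a unitary matrix for experiments is to take the Q factor of a QR factorisation of a random complex matrix (a standard trick, used here only as an illustration); its eigenvalues all have modulus 1, as Theorem 8.2.3 predicts:

import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Q, _ = np.linalg.qr(M)                                   # Q is unitary

print(np.allclose(Q.conj().T @ Q, np.eye(4)))            # True: Q*Q = I
print(np.allclose(np.abs(np.linalg.eigvals(Q)), 1))      # True: every eigenvalue has modulus 1

v = rng.normal(size=4) + 1j * rng.normal(size=4)
print(np.isclose(np.linalg.norm(Q @ v), np.linalg.norm(v)))   # True: Q preserves norms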

8.3 Hermitian matrices


Another important class of special matrices is the class of Hermitian matrices.
Definition: An n × n matrix A is Hermitian if A∗ = A.

A real matrix is Hermitian if and only if it is symmetric: At = A. All the results


that follow therefore apply equally to real symmetric matrices.
Theorem 8.3.1:
(i) If A is an invertible Hermitian matrix, then A−1 is Hermitian.
(ii) The eigenvalues of any Hermitian matrix are real.

Proof: (i) Suppose A is invertible and Hermitian. Then (A−1 )∗ = (A∗ )−1 = A−1 ,
so A−1 is Hermitian.
(ii) Let u be any eigenvector of A corresponding to an eigenvalue λ, so u ≠ 0
and Au = λu. The inner product (Au) · u is equal to λ||u||2 , as (Au) · u =
(λu) · u = λ(u · u) = λ||u||2 . But, since A∗ = A, Theorem 8.1.1 tells us that
(Au) · u = u · (A∗ u) = u · (Au) = u · (λu) = λ̄||u||2 . Since u is non-zero, we conclude
that λ = λ̄, so λ is real. 2

An n × n matrix A is called normal if AA∗ = A∗ A. A unitary or Hermitian matrix


A is also a normal matrix.
If a normal matrix A is invertible, then (A−1 )∗ A−1 = (A∗ )−1 A−1 = (AA∗ )−1 =
(A∗ A)−1 = A−1 (A∗ )−1 = A−1 (A−1 )∗ , so A−1 is also normal.

8.4 Positive Definite Matrices


Recall that a complex inner product is determined completely by what happens to
basis elements. If x = (x1 , . . . , xn )t and y = (y1 , . . . , yn )t then any function h·, ·i on
Cn × Cn that is linear in the first variable and conjugate linear in the second – we

say such a function is a sesquilinear form – can be written as


  
hx, yi = (x1 , . . . , xn ) A (ȳ1 , . . . , ȳn )t = xt Aȳ,
where A is the n × n matrix with entries aij = hei , ej i and ȳ is the entrywise conjugate of y. This can
be reformulated in several ways:
hx, yi = (At x) · y = x · (Āy) = xt Aȳ.

Replacing A by its conjugate Ā for a cleaner statement, we see that any n × n matrix A defines
a sesquilinear form through hx, yi = x · (Ay), and conversely any sesquilinear form
is defined from some matrix A by: hx, yi = x · (Ay).

Recall that a function h·, ·i from Cn × Cn to C is called Hermitian if hx, yi is the complex
conjugate of hy, xi, for all x and y in Cn . Therefore a matrix A defines a Hermitian sesquilinear form if
and only if, for all pairs x and y of vectors, x · (Ay) is the conjugate of y · (Ax). But the conjugate of
y · (Ax) is (Ax) · y = x · (A∗ y). Therefore the sesquilinear form defined by A is
Hermitian if and only if x · (Ay) = x · (A∗ y) for all choices of vectors x and y, and
this is equivalent to A∗ = A, i.e., that A is a Hermitian matrix!

For a sesquilinear form h·, ·i to define a complex inner product, it must be Hermitian
and positive. For the form to be positive means that hx, xi > 0 for all vectors x 6= 0.

An n × n matrix A is called positive definite if A is Hermitian and, for all non-zero


x ∈ Cn ,
x · (Ax) > 0.
Note that, for a matrix to be positive definite, it must be Hermitian.
Since A is Hermitian (A = A∗ ), by Theorem 8.1.1, we have v · (Av) = (A∗ v) · v =
(Av) · v, and so it is equivalent to demand that (Av) · v > 0.

We have thus proven the following result.

Theorem 8.4.1: A matrix A defines a complex inner product h·, ·i on Cn × Cn


through hx, yi = x · (Ay) if and only if A is positive definite.

A Hermitian matrix A is called positive semi-definite if x · (Ax) ≥ 0 for all vectors x,

with equality for some x ≠ 0 if A is not positive definite. If −A is positive definite
then we say that A is negative definite, and if −A is positive semi-definite then A is
negative semi-definite.
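A practical test for positive definiteness, not proved at this point in the notes but standard, is that a Hermitian matrix is positive definite exactly when all of its eigenvalues are positive. A minimal sketch using this characterisation (the function name is an illustrative choice):

import numpy as np

def is_positive_definite(A, tol=1e-12):
    # Hermitian with all eigenvalues > 0  <=>  x . (Ax) > 0 for all non-zero x
    if not np.allclose(A, A.conj().T):
        return False                       # a positive definite matrix must be Hermitian
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

A = np.array([[2, 1 - 1j],
              [1 + 1j, 3]])                # a Hermitian matrix
print(is_positive_definite(A))             # True
print(is_positive_definite(-A))            # False: -A is negative definite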

Lecture 9: Introduction to Spectral Theory


9.1 Complex solutions
We begin by reviewing the method for finding a real eigenvalue of an n × n real
matrix A – if one exists – and a corresponding eigenvector, and also to diagonalise
the matrix using these eigenvectors, if it is possible. The steps are as follows:

(1) Let x be a variable, form the matrix xI − A, and take its determinant. This
will be a polynomial f (x) in x, of degree n. This polynomial f (x) is called the
characteristic polynomial of the matrix A.

(2) Find the roots of the characteristic polynomial f (x), which are the eigenvalues
of A.

(3) For each eigenvalue λ of A, find a basis of the nullspace of λI − A (the eigenspace
corresponding to eigenvalue λ). Because the determinant of each λI − A is zero,
each eigenspace will be a subspace of dimension at least one, and will contain an
eigenvector.

(4) If the sum of the dimensions of all the eigenspaces equals n, the dimension of
the vector space, then we have found a basis of Rn whose elements are eigenvectors.
Writing the basis into the columns of a matrix S, we have A = SDS −1 and D =
S −1 AS, for a diagonal matrix D that contains the eigenvalues of A on the diagonal.
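The four steps above correspond almost line for line to a short NumPy computation; the following sketch (illustrative only) diagonalises a small matrix and checks the factorisation A = SDS−1:

import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, S = np.linalg.eig(A)      # steps (2)-(3): eigenvalues and eigenvectors (columns of S)
D = np.diag(eigenvalues)               # step (4): the diagonal matrix of eigenvalues

print(np.allclose(A, S @ D @ np.linalg.inv(S)))   # True: A = S D S^{-1}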

As we mentioned earlier, the process of diagonalisation amounts to finding a diagonal


matrix D similar to A, and the linear transformation represented by A with respect
to the standard ordered basis of Rn is represented by D with respect to the ordered
basis of eigenvectors.

The characteristic polynomial of a matrix A is usually defined, as above, to be


det(xI − A). For the process of diagonalisation, it does not matter if one uses
det(xI − A) or det(A − xI), as these polynomials will have the same roots. The
motivation behind using det(A − xI) is easier to understand; the advantage of using
det(xI − A) is that its leading term is always 1.

Our aim is to examine where this process may fail, and what we can do (partly) to
remedy this.
The first place where the process could fail is in step (2). The problem is that the
characteristic polynomial det(xI − A) may have no real roots.

A central theorem concerning the complex numbers is the fundamental theorem of


algebra. It states that any non-zero polynomial f with coefficients in the complex
numbers can be factored completely into linear factors, meaning that if the degree of
f is n ≥ 1 then f (x) can be re-written as a(x − r1 )m1 · · · (x − rk )mk with Σki=1 mi = n,
such that the ri are complex numbers and a is also a complex number, which is the
coefficient of the leading term xn of the polynomial. If f (x) = det(xI − A) is the
characteristic polynomial of a matrix A, then a = 1.

By the fundamental theorem of algebra, the characteristic polynomial f (x) of an


n × n real matrix A will have complex roots, and it can be factored completely.
 
cos θ − sin θ
Example: Let A = .
sin θ cos θ
This is an orthogonal matrix, representing rotation by angle θ in R2 . We have
 
x − cos θ sin θ
det(xI − A) = det = x2 − 2x cos θ + 1.
− sin θ x − cos θ
The characteristic polynomial may have no real roots (indeed, it will have real roots
only if cos θ = ±1), but it does always have roots in the complex numbers. Solving for
x using the quadratic formula, one finds the roots (2 cos θ ± √(4 cos2 θ − 4))/2, which simplify to
cos θ ± i sin θ. (Notice that √(cos2 θ − 1) = √(−1) √(1 − cos2 θ) = √(−1) √(sin2 θ) = i sin θ.)

Let’s continue the process of diagonalisation. Next, we subtract the eigenvalue


cos θ + i sin θ from the diagonal and get
[−i sin θ −sin θ; sin θ −i sin θ].
To find the nullspace of this matrix, we notice that the first row is −i times the
second row, so the row reduction yields a one-dimensional nullspace spanned by
(1, −i)t .

We do the same for the other eigenvalue cos θ − i sin θ, obtaining a one-dimensional
nullspace spanned by (1, i)t .

Notice that these two eigenvectors (1, −i)t and (1, i)t corresponding to the two eigen-
values cos θ+i sin θ and cos θ−i sin θ, respectively, are orthogonal. For some purposes
it is useful to normalise them, so as to get a unitary change of basis matrix
S = (√2/2) [1 1; −i i] with inverse S −1 = (√2/2) [1 i; 1 −i].

Now we can diagonalise the matrix A:

(1/2) [1 i; 1 −i] [cos θ −sin θ; sin θ cos θ] [1 1; −i i] = [cos θ + i sin θ 0; 0 cos θ − i sin θ].

9.2 Complex Diagonalisation


We continue with the method of complex diagonalisation.

Example: Consider  
5 5 −5
A = 3 3 −5 .
4 0 −2
We calculate
 
x − 5 −5 5
det(Ix − A) = det  −3 x − 3 5  = x3 − 6x2 + 4x + 40.
−4 0 x+2
Because the characteristic polynomial f (x) = x3 − 6x2 + 4x + 40 is real and of odd
degree, there must be a real root, and we look for one.
If f (x) has an integer root, it must be a divisor of 40. It’s not hard to see that
f (x) is positive for all positive x. We try f (−1) = 29, and then f (−2) = 0, so we
have found one root. Factoring (x + 2) out gives f (x) = (x + 2)(x2 − 8x + 20). The
other roots of f are the roots of the quadratic, which are 4 + √(64 − 80)/2 = 4 + 2i and
4 − √(64 − 80)/2 = 4 − 2i. These are our eigenvalues.

In general, if f (x) is a real polynomial, and c is a root of f (x) with Im(c) ≠ 0, then
c̄ is also a root. To see this, consider the complex conjugate of the entire equation
f (c) = 0. The real coefficients in f remain untouched by conjugation, and the conjugate
of cn is c̄n for each n. So the result is f (c̄) = 0. As our matrix A has only real
entries, it has a real characteristic polynomial, and it follows that 4 + 2i is a root if
and only if 4 − 2i is a root.

Before we continue to diagonalise, we sum up the eigenvalues as a check: 4 + 2i +


4 − 2i + −2 = 6. This is the trace of the original matrix.

An eigenvector associated with −2 is easiest to find. The nullspace of the matrix


 
7 5 −5
3 5 −5
4 0 0

is spanned by (0, 1, 1)t .


For an eigenvector associated with 4 + 2i, we need to find a vector in the nullspace
of  
1 − 2i 5 −5
 3 −1 − 2i −5  .
4 0 −6 − 2i
Divide the third row by four:
 
1 − 2i 5 −5
 3 −1 − 2i −5 .
1 0 −3/2 − i/2
Add −1 + 2i times the third row to the first row and subtract 3 times the third row
from the second row:  
0 5 −5/2 − 5i/2
0 −1 − 2i −1/2 + 3i/2 .
1 0 −3/2 − i/2
Multiply the middle row by −1 + 2i, and observe that it is equal to the top row, so
it may be eliminated. Divide the top row by 5 to obtain
 
0 1 −1/2 − i/2
.
1 0 −3/2 − i/2

We find the eigenvector (3 + i, 1 + i, 2)t .

We could now search for an eigenvector associated with 4−2i using the same method,
but we have already done the work. Letting v = (3 + i, 1 + i, 2)t and λ = 4 + 2i,
we have (A − λI)v = 0. Taking the complex conjugate of this equation yields
(Ā − λ̄I)v̄ = 0. But, since A is a real matrix, Ā = A, and hence v̄ is an eigenvector
corresponding to the eigenvalue λ̄. In this example, we conclude that (3 − i, 1 − i, 2)t
is an eigenvector corresponding to 4 − 2i. We now have the change of basis matrix
 
3+i 3−i 0
S = 1 + i 1 − i 1 .
2 2 1
The following is the calculation to find the inverse of S (augmented matrices written as [S | I]):

[3+i 3−i 0 | 1 0 0; 1+i 1−i 1 | 0 1 0; 2 2 1 | 0 0 1];  R1 − R2 → R1;
[2 2 −1 | 1 −1 0; 1+i 1−i 1 | 0 1 0; 2 2 1 | 0 0 1];  (1 − i)R2 → R2;
[2 2 −1 | 1 −1 0; 2 −2i 1−i | 0 1−i 0; 2 2 1 | 0 0 1];  R2 − R1 → R2, R3 − R1 → R3;
[2 2 −1 | 1 −1 0; 0 −2−2i 2−i | −1 2−i 0; 0 0 2 | −1 1 1];  (1 − i)R2 → R2, 2R1 + R2 → R1, (1/2)R3 → R3;
[4 0 −1−3i | 1+i −1−3i 0; 0 −4 1−3i | −1+i 1−3i 0; 0 0 1 | −1/2 1/2 1/2];  R1 + (1 + 3i)R3 → R1, R2 + (−1 + 3i)R3 → R2;
[4 0 0 | (1−i)/2 (−1−3i)/2 (1+3i)/2; 0 −4 0 | (−1−i)/2 (1−3i)/2 (−1+3i)/2; 0 0 1 | −1/2 1/2 1/2];  (1/4)R1 → R1, −(1/4)R2 → R2;
[1 0 0 | (1−i)/8 (−1−3i)/8 (1+3i)/8; 0 1 0 | (1+i)/8 (−1+3i)/8 (1−3i)/8; 0 0 1 | −1/2 1/2 1/2].
We have the diagonalisation:

S −1 AS = (1/8) [1−i −1−3i 1+3i; 1+i −1+3i 1−3i; −4 4 4] [5 5 −5; 3 3 −5; 4 0 −2] [3+i 3−i 0; 1+i 1−i 1; 2 2 1]
= [4+2i 0 0; 0 4−2i 0; 0 0 −2].
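The whole calculation can be confirmed in a few lines of NumPy (a sketch; complex arithmetic is handled automatically):

import numpy as np

A = np.array([[5, 5, -5],
              [3, 3, -5],
              [4, 0, -2]], dtype=complex)
S = np.array([[3 + 1j, 3 - 1j, 0],
              [1 + 1j, 1 - 1j, 1],
              [2,      2,      1]])

D = np.linalg.inv(S) @ A @ S
print(np.round(D, 10))          # diag(4+2i, 4-2i, -2), matching the hand calculation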

Lecture 10: Jordan Normal Form


10.1 Multiplicity
 
1 1
Not all square matrices can be diagonalised. A simple example is A = . As
0 1
1 is its only eigenvalue, if A were diagonalisable then there would be an invertible
matrix P such that P AP −1 = I2 , and that would imply that A is equal to P −1 I2 P =
I2 , which it is not.

Let A be an n × n matrix with characteristic polynomial (x − λ1 )m1 · · · (x − λk )mk ,


with m1 + · · · + mk = n. For an eigenvalue λi , the value mi is called the algebraic
multiplicity of λi .
For each eigenvalue λi of A, define the eigenspace of λi to be the nullspace of A−λi I;
in other words, the eigenspace of λi contains all the eigenvectors with eigenvalue λi ,
and also the zero vector. The eigenspace of λi is a subspace of Rn ; its dimension
(which is at least 1 since λi is an eigenvalue) is the geometric multiplicity of λi .
We shall see shortly that, for any eigenvalue λ of an n × n matrix A, the algebraic
multiplicity of λ is greater than or equal to its geometric multiplicity.
We can diagonalise the matrix A if and only if, for every eigenvalue λi of A, the
algebraic multiplicity of λi is equal to its geometric multiplicity: in this case, we can
combine bases of each eigenspace to get a basis of Rn consisting of eigenvectors.
If we cannot diagonalise the matrix, what is the next best thing?

10.2 Upper Triangular Form


Suppose A is an n × n complex matrix representing the linear transformation T
with respect to the standard basis, and suppose the characteristic polynomial of
A is (x − λ1 )m1 · · · (x − λk )mk . So λ1 is an eigenvalue of the matrix A, and so
A − λ1 I has a non-trivial nullspace, with a basis C = {v1 , . . . , vt } of eigenvectors
corresponding to eigenvalue λ1 , where t is the geometric multiplicity of eigenvalue
λ1 . Imagine completing C to an ordered basis B = (v1 , . . . , vn ) of Cn , and changing
the basis to B. What form does the matrix AB T , representing T with respect to the
basis B, take?
As T (v1 ) = λ1 v1 , . . . , T (vt ) = λ1 vt , the first t columns of AB T have λ1 on the
diagonal, and zeros elsewhere. So AB T has the block form [λ1 It M; 0 N], where M and N are
t × (n − t) and (n − t) × (n − t) matrices respectively. The characteristic polynomial
of AB T is (x − λ1 )t times the characteristic polynomial of N . But this is the same as
the characteristic polynomial of A, by Theorem 3.4.1.


Our first conclusion is that t ≤ m1 : the geometric multiplicity of eigenvalue λ is at
most the algebraic multiplicity.
Our second conclusion is that the characteristic polynomial of N is equal to (x −
λ1 )m1 −t (x−λ2 )m2 · · · (x−λk )mk . The matrix N now represents a linear transformation
on the space spanned by D = {vt+1 , . . . , vn }. Specifically, N represents a linear
transformation T 0 : span(D) → span(D), and M represents a linear transformation
T 00 : span(D) → span(C), with the property that T (v) = T 0 (v) + T 00 (v), for v in
span(D).
If m1 − t > 0, then λ1 is an eigenvalue of N . This means that there is some vector v
in span(D) such that T 0 (v) = λ1 v. This means that T (v) − λ1 v is in span(C), and
this in turn means that applying the linear transformation u 7→ T (u) − λ1 u twice
sends v to an element of span(C), and thence to 0. This means that v is in the
nullspace of (A − λ1 I)2 , but not of A − λ1 I. Of course, the nullspace of A − λ1 I is
contained in the nullspace of (A − λ1 I)2 .
Continuing in this way, we get the following result.
Theorem 10.2.1: Let λ be an eigenvalue of the matrix A, with algebraic multiplic-
ity m. Then, for some r ≤ m, the matrix (A − λI)r has a nullspace Uλ of dimension
m. Moreover, for each u ∈ Uλ , Au is also in Uλ .

Moreover, one can show the following result.


Theorem 10.2.2: Suppose f (x) = (x − λ1 )m1 · · · (x − λk )mk is the characteristic
polynomial of the n × n matrix A (with n = m1 + · · · + mk ). For each 1 ≤ i ≤ k,
let Ui be the nullspace of (A − λi I)mi .
(i) For each i, the subspace Ui has dimension mi .
(ii) The union of the Ui spans the n-dimensional vector space Rn (or Cn ).
(iii) For every i, and every v ∈ Ui , the vector Av is in Ui . Moreover, there is
an ordered basis Bi = {v1 , . . . , vmi } of Ui such that Avj is in the linear span of
{v1 , . . . , vj } for each j = 1, . . . , mi .
(iv) The characteristic polynomial of A restricted to Ui is (x − λi )mi .

Theorem 10.2.2 is called the Primary Decomposition Theorem.

Theorem 10.2.3 (Cayley-Hamilton Theorem): If f (x) = (x − λ1 )m1 · · · (x −


λk )mk is the characteristic polynomial of the matrix A, then f (A) = 0.

Proof: Let the Ui be as in Theorem 10.2.2, and let Bi be a basis for Ui , for each i.
By part (ii) of the Theorem, the union of the Bi forms a basis for Cn . Each element
of Bi is in the nullspace of (A − λi I)mi .
Now f (A) = (A − λ1 I)m1 · · · (A − λi I)mi · · · (A − λk I)mk , and the terms (A − λj I)
all commute, so f (A) is of the form C(A − λi I)mi for some matrix C. Therefore
f (A)v = 0 for each v in the basis Bi .
Thus the matrix f (A) takes each basis element to 0, which implies that f (A) is the
zero matrix. 2

The ordered basis B of Cn obtained by putting together the ordered bases Bi in


Theorem 10.2.2(iii) has the property that, for each basis element vj , Avj is in
span({v1 , . . . , vj }). In other words, the matrix C representing T with respect to the
basis B is in upper triangular form: cij = 0 whenever i > j.
Theorem 10.2.4: For every n × n complex matrix A, there is an invertible complex
matrix S such that S −1 AS is in upper triangular form. In other words, every square
matrix is similar to an upper triangular matrix.

We shall explore the theoretical consequences of this later. There is a fairly simple
method, broadly following the theoretical discussion earlier, that puts a matrix into
upper triangular form. See Ostaszewski for details. We shall not study this method:
instead we’ll look at a method that accomplishes even more.

10.3 Jordan Normal Form


One can go a step further than Theorem 10.2.4.

Theorem 10.3.1 (Jordan Normal Form): If A is any n × n matrix with char-


acteristic polynomial (x − λ1 )m1 · · · (x − λk )mk , then there exists an invertible n × n
matrix S such that SAS −1 is the block-diagonal matrix

 A1 0 . . . 0
 0 A2 . . . 0
 . . . . . . . . . . . .
 0 0 . . . Ak ,

where each Ai is of the form

 λi ∗ 0 . . . . . . 0 0
 0 λi ∗ . . . . . . 0 0
 0 0 λi ∗ . . . 0 0
 . . . . . . . . . . . . . . . . . . . . .
 . . . . . . . . . . . . . . . ∗ 0
 0 0 0 . . . . . . λi ∗
 0 0 0 . . . . . . 0 λi ,
such that the ∗ are either 0 or 1, and all other non-diagonal entries are 0.

A matrix of the form as in SAS −1 above is said to be in Jordan normal form. A


matrix Ai with 1s in place of all the ∗s is called a Jordan block.
Theorem 10.3.1 says that every n × n matrix is similar to a matrix in Jordan normal
form.
We’ll concentrate on a method for putting matrices into Jordan normal form.
Fix an eigenvalue λ of a matrix A, and let m be the algebraic multiplicity of λ. For
every j, let dj be the dimension of the nullspace of (A − λI)j . Let r be the least
number such that dr = m. The plan is to find a set B, of size dr − dr−1 , that extends
a basis of the nullspace of (A − λI)r−1 to a basis of the nullspace of (A − λI)r . For
each v in B, we apply A − λI repeatedly, until reaching the last non-zero vector.
As each v in B is in the nullspace of (A − λI)r , but not of (A − λI)r−1 , the vectors (A −
λI)r−1 v, . . . , (A − λI)v, v are all non-zero. Moreover, this set of r vectors is linearly
independent: if
αr−1 (A − λI)r−1 v + · · · + α1 (A − λI)v + α0 v = 0,
and αj is the last non-zero coefficient (i.e., the one with j smallest), then applying
the linear transformation (A − λI)r−j−1 to the sum above leaves us with only one
non-zero term, namely αj (A − λI)r−1 v, which is not possible.
If we take two different elements of B, then one can show that the two sequences of
r vectors span subspaces whose intersection is {0}.
If r(dr − dr−1 ) = m, then we are done and have found the basis for the Jordan
normal form (as we would have m vectors that span a space of dimension m). If
not, find the first r′ such that the linear span of N ((A − λI)r′−1 ) ∪ {(A − λI)j v | v ∈
B, j = 0, 1, . . . , r − 1} does not include N ((A − λI)r′ ), and continue the process.
When we have all the basis elements, we write them down in the order where each
basis vector v is preceded by (A − λI)v, if that is non-zero.

Usually the process of finding the basis for the Jordan normal form is fairly easy.
Only when some subspace corresponding to a single eigenvalue has dimension at
least 4 might the process start to become complicated. In the practical examples that
follow, we’ll restrict ourselves to matrices where none of the serious complications
can occur; in the Appendix, we’ll discuss a more complex case in the abstract.
 
Example: Let A be the matrix [3 −1 −4 7; 1 1 −3 5; 0 1 −1 −1; 0 0 0 2]. We put A into Jordan normal
form.
 
First, we find the determinant of A − xI = [3−x −1 −4 7; 1 1−x −3 5; 0 1 −1−x −1; 0 0 0 2−x]. Because
the last row has zeros except for the 2 − x, it is best to start with this row: we see
that det(A − xI) is equal to (2 − x) times
 
3 − x −1 −4
det  1 1−x −3  = (3 − x)(x2 + 2) − (1 + x) − 4 = −x3 + 3x2 − 3x + 1,
0 1 −1 − x

which is equal to −(x − 1)3 . The characteristic polynomial of A is therefore equal


to (x − 1)3 (x − 2).

The eigenvalue 2, with algebraic multiplicity 1, is not a problem; there is an eigen-
vector corresponding to this eigenvalue, which is found to be v4 = (8, 7, 2, 1)t .
The nullspace of (A − I)3 must be three-dimensional. We calculate:

(A − I)2 = [2 −1 −4 7; 1 0 −3 5; 0 1 −2 −1; 0 0 0 1]2 = [3 −6 3 20; 2 −4 2 15; 1 −2 1 6; 0 0 0 1];

(A − I)3 = (A − I)2 (A − I) = [0 0 0 8; 0 0 0 7; 0 0 0 2; 0 0 0 1].
The nullspace of (A − I)3 is spanned by the standard vectors e1 , e2 , and e3 . The
nullspace of (A−I)2 is spanned by {(1, 0, −1, 0)t , (0, 1, 2, 0)t }. Take any vector in the

nullspace of (A − I)3 but not in the nullspace of (A − I)2 – a convenient choice is the
vector e3 = (0, 0, 1, 0)t – and call it v3 . Find the vector (A−I)v3 = (−4, −3, −2, 0)t ,
and call it v2 . This vector must lie in the nullspace of (A − I)2 but not in the
nullspace of A − I. Now find (A − I)v2 = (3, 2, 1, 0)t , and call this vector v1 . Notice
that A takes v3 to v3 + v2 , takes v2 to v2 + v1 , and takes v1 to v1 . Along with
v4 = (8, 7, 2, 1)t , an eigenvector for the eigenvalue 2, we have our basis for putting
A into Jordan normal form:

S = [3 −4 0 8; 2 −3 0 7; 1 −2 1 2; 0 0 0 1];    S −1 = [3 −4 0 4; 2 −3 0 5; 1 −2 1 4; 0 0 0 1];

S −1 AS = [3 −4 0 4; 2 −3 0 5; 1 −2 1 4; 0 0 0 1] [3 −1 −4 7; 1 1 −3 5; 0 1 −1 −1; 0 0 0 2] [3 −4 0 8; 2 −3 0 7; 1 −2 1 2; 0 0 0 1]
= [5 −7 0 9; 3 −5 1 9; 1 −2 1 4; 0 0 0 2] [3 −4 0 8; 2 −3 0 7; 1 −2 1 2; 0 0 0 1] = [1 1 0 0; 0 1 1 0; 0 0 1 0; 0 0 0 2].
Appendix: A more general case

How does one know the Jordan normal form corresponding to an eigenvalue λ, i.e.,
how do we tell how many chains of basis elements there are of each length, given
the dimensions of the nullspaces of (A − λI)n ?

We work out an example that shows the general method. Suppose we know that
the algebraic multiplicity is 9 and the dimensions of the nullspaces are as follows:

the nullspace of A − λ has dimension 4,

the nullspace of (A − λ)2 has dimension 6,

the nullspace of (A − λ)3 has dimension 8,

the nullspace of (A − λ)4 has dimension 9.

First we place four points on the “bottom level”,


62

◦ ◦ ◦ ◦

and then build the second level so that the total number of points is 6,

◦ ◦
◦ ◦ ◦ ◦

and continue in this way until we have built up a pyramid of nine points:


◦ ◦
◦ ◦
◦ ◦ ◦ ◦

We have four towers of points, the first tower being of height 4, the second of
height 3, and the last two both of height 1. First we search for a vector v in the
nullspace of (A−λI)4 that is not in the nullspace of (A−λI)3 , and apply A−λI to it
repeatedly until reaching the last vector (A−λ)3 v in the chain before 0 = (A−λ)4 v.
Next we search for a vector w in the nullspace of (A − λ)3 that is both not in the
nullspace of (A−λ)2 and linearly independent from the set of all the previous vectors
(A − λ)v, (A − λ)2 v, (A − λ)3 v found in the nullspace of (A − λI)3 . Next we find a
non-zero vector q in the nullspace of A−λ that is not in the linear span of the vectors
(A − λI)3 v and (A − λI)2 v. Lastly, we find a non-zero vector r in the nullspace of
A − λI that is not in span({(A − λI)3 v, (A − λI)2 v, q}). The vectors v, w, q, r,
and the applications of A − λI to these vectors, will form the basis for the Jordan
normal form. The result will be the following matrix in Jordan normal form:
 
λ 1 0 0 0 0 0 0 0
0 0
 λ 1 0 0 0 0 0 
0 0
 0 λ 1 0 0 0 0 
0 0
 0 0 λ 0 0 0 0 
 
0 0 0 0 λ 1 0 0 0 .
 
0 0 0 0 0 λ 1 0 0
 
0 0 0 0 0 0 λ 0 0
 
0 0 0 0 0 0 0 λ 0
0 0 0 0 0 0 0 0 λ
63

Lecture 11: Application to Differential Equations


11.1 Homogeneous Linear Differential Equations
In this and the next lecture, we examine some practical applications of the technique
of diagonalising a matrix, or of putting it into Jordan normal form.

Let y(t) be a complex valued function of a real variable t. A simple type of differential
equation is y 0 (t) = ay(t), where a is a constant. This equation is solved by y(t) =
ceat , as y 0 (t) = aceat . Indeed, all solutions look like this, since if y is a solution
then the derivative of y(t)e−at is (by the chain rule) y 0 (t)e−at − ay(t)e−at = (y 0 (t) −
ay(t))e−at = 0. The only functions g whose derivatives are zero everywhere are
constant functions, and therefore with c being that constant from y(t)e−at = c we
have y(t) = ceat . The value of y(0) determines the value of c, as e0 = 1 and so
y(0) = c.

We want to find the functions y1 , y2 , . . . yn of the variable t satisfying the differential


equations:
y10 (t) = a11 y1 (t) + · · · + a1n yn (t)
y20 (t) = a21 y1 (t) + · · · + a2n yn (t)
···
0
yn (t) = an1 y1 (t) + · · · + ann yn (t).

The above system of equations is called homogeneous; the derivative of each function
is in the linear span of itself and the other functions.

One can place these equations in matrix form and solve the following equations:
    
0
y1 a11 a12 . . . a1n y1
y0  a a . . . a  y 
 2   21 22 2n   2 
 ..  =  .. .. .. ..   ..  .
.  . . . .  . 
yn0 an1 an2 . . . ann yn

The idea of a matrix equation is not accidental. Taking the derivative is a linear
transformation defined on the subspace of all functions that have derivatives.

If taking the derivative took one of these n functions to a function outside of the
linear span of these n functions, then the resulting system of equations would be
non-homogeneous. Solving non-homogeneous systems of equations can be difficult
or easy. However, once one solution is found, solving the homogeneous system of
64

equations gives us all the solutions. Suppose that we are to solve:


      
y10 a11 a12 . . . a1n y1 g1
y0  a a . . . a  y  g 
 2   21 22 2n   2   2
 ..  =  .. .. .. ..   ..  +  ..  ,
.  . . . .  .   . 
0
yn an1 an2 . . . ann yn gn
and we know one solution (ŷ1 , ŷ2 , . . . , ŷn ). Any other solution (y1 , y2 , . . . , yn ) must
satisfy     
(y1 − ŷ1 )0 a11 a12 . . . a1n y1 − ŷ1
 (y − ŷ )0   a a . . . a2n   
 2 2   21 22   y2 − ŷ2 
 ..  =  .. .. .. ..   ..  ,
 .   . . . .  . 
0
(yn − ŷn ) an1 an2 . . . ann yn − ŷn
and hence the difference is the solution of the homogeneous system.

Furthermore, higher order differential equations, meaning differential equations in-


volving at least one second or higher derivative, can often be solved in the same
3 2
way. For example, to solve ddt3y − ddt2y + 2 dy dt − 3y = 0, we could reformulate this as:
2 3
d2 y
y0 = y, y1 = dy dt2 − 2 dt + 3y = y2 − 2y1 + 3y0 . We would
d y d y dy2 dy
dt , y 2 = dt 2 and dt 3 = dt =
have to solve the following equation:
 0   
y0 0 1 0 y0
y1  = 0 0 1 y1  .
0

y20 3 −2 1 y2

11.2 The method


We must solve     
y10 a11 a12 . . . a1n y1
y0  a a . . . a  y 
 2   21 22 2n   2 
 ..  =  .. .. .. ..   ..  ,
.  . . . .  . 
yn0 an1 an2 . . . ann yn
which we can write as y0 = Ay.

If A is diagonal, then solving these equations is easy. For example, if a11 = 1,


a22 = i and a33 = −i (and otherwise aij = 0) then we get the solution y1 (t) = c1 et ,
y2 (t) = c2 eit = c2 (cos t + i sin t) and y3 (t) = c3 e−it = c3 (cos t − i sin t).

We know that, for any n × n matrix A, there will be some invertible matrix S with
S −1 AS = E, for some matrix E in Jordan normal form. For now, we assume that E
65

is diagonal, and later we will show how to solve the equations when E is in Jordan
normal form.

Write A as A = SES −1 , E = S −1 AS and


 
b11 0 . . . 0
 0 b ... 0 
 22 
E =  .. .. .. ..  .
 . . . . 
0 0 . . . bnn

The trick is to create new variables according to the formula


   
f1 y1
f  y 
 2  2
 ..  = S −1  ..  .
. .
fn yn

By the linearity of the derivative, we can also write


   
f10 y10
f 0   0
 2 −1  y2 
 ..  = S  ..  .
. .
fn0 yn0

We can re-write this as


     
y10 b11 0 . . . 0 y1
y0   0 b ... 0  y 
 2  22  −1  2 
 ..  = S  .. .. .. ..  S  ..  ,
.  . . . .  .
0
yn 0 0 . . . bnn yn

and by multiplying by S −1 on the left of both sides of the equation we get


    
f10 b11 0 ... 0 f1
f 0   0 b22 . . . 0   f2 
 
 2  
 ..  =  .. .. .. ..   ..  .
.  . . . .  . 
fn0 0 0 . . . bnn fn

We can now write down expressions for the functions fk , namely fk (t) = ck ebk,k t ,
where ck is some constant. Then we find the functions y1 , . . . , yn according to the
66

formula    
y1 f1
y  f 
 2  2
 ..  = S  ..  .
. .
yn fn

We illustrate this method by using it to solve the (hopefully familiar) second order
2
differential equation ddt2y + y = 0. As described above, this can be reformulated as
y1 = dy
dt , y0 = y, and  0   
y0 0 1 y0
0 = .
y1 −1 0 y1
   
0 1 −x 1
Set M = . The determinant of is x2 + 1. So the eigenvalues of
−1 0 −1 −x
 
−i 1
M are i and −i. We find the nullspace of M −iI = . Recognising that the
−1 −i
rows are multiples of each other (they have to be, since M −iI is singular so the rows
are linearly dependent), we have that v1 = (1, i)t is in the nullspace of M − iI, and
 −i)
t
is an eigenvector of M for the eigenvalue i. Likewise
 v2 = (1,  is aneigenvector

1 1 y0 f0
of M for the eigenvalue −i. Setting S = , we have =S , and
i −i y1 f1
 0  0   
f0 y 0 1 y0
0 = S −1 00 = S −1 =
f1 y1 −1 0 y1
           
0 1 f i 0 f y 1 1 f0
S −1 S 0
= 0
and 0
= .
−1 0 f1 0 −i f1 y1 i −i f1
The solution is f0 = c0 eit and f1 = c1 e−it , and so we get the solution of y =
y0 = c0 eit + c1 e−it . The first derivative is ic0 eit − ic1 e−it and, by taking the second
2
derivative, we get ddt2y = −c0 eit − c1 e−it , which is indeed the negative of y. Putting
c0 = c1 = 21 gives the familiar solution y(t) = cos t, and putting c0 = −i/2 and
c1 = i/2 gives the solution y(t) = sin t.

11.3 An example with Jordan normal form


Suppose we are to solve the following system of differential equations:
 0  
y1 1 0 1 y1
y2  1 1 −3 y2  .
0

y30 0 1 4 y3
67

We set A to be the matrix above, and calculate its characteristic polynomial, which
is (x − 2)3 . By the Cayley-Hamilton Theorem, (A − 2I)3 is the zero matrix and
therefore every vector is in the nullspace of (A − 2I)3 .
We calculate:  
−1 0 1
A − 2I =  1 −1 −3 ;
0 1 2
    
−1 0 1 −1 0 1 1 1 1
(A − 2I) =  1 −1 −3  1 −1 −3 = −2
2
−2 −2 ;
0 1 2 0 1 2 1 1 1
    
−1 0 1 1 1 1 0 0 0
(A − 2I) =  1 −1 −3 −2 −2 −2 = 0
3
0 0 .
0 1 2 1 1 1 0 0 0
We see that A−2I has a one-dimensional nullspace, (A−2I)2 has a two-dimensional
nullspace, and (A − 2I)3 has a three-dimensional nullspace. Any vector v in the
nullspace of (A − 2I)3 that is not in the nullspace of (A − 2I)2 will generate a
sequence of three vectors v, (A − 2I)v, and (A − 2I)2 v that span
 the nullspace of
2 1 0
(A − 2I)3 . This implies that A is similar to the matrix 0 2 1 in Jordan normal
0 0 2
form.
To find the change-of-basis matrix, we take the vector v3 = (0, 0, 1)t , which is in the
nullspace of (A − 2I)3 , but not in the nullspace of (A − 2I)2 . The image of v3 under
A − 2I is v2 := (1, −3, 2)t . The image of v2 under A − 2I is v1 := (1, −2, 1)t . And
of course A − 2I applied to v1 gives 0. Therefore, if we define the matrix S to be
 
1 1 0
S = −2 −3 0 ,
1 2 1
 
2 1 0
then we have that S −1 AS is equal to 0 2 1.
0 0 2
We therefore set     
y1 1 1 0 f1
y2  = −2 −3 0 f2  .
y3 1 2 1 f3
68

We must solve  0   
f1 2 1 0 f1
f20  = 0 2 1 f2  .
f30 0 0 2 f3
The rows translate to f30 = 2f3 , f20 = 2f2 + f3 , f10 = 2f1 + f2 . This leads to the
solution
f3 = c3 e2t
f2 = c2 e2t + c3 te2t
2
f1 = c1 e2t + c2 te2t + c3 t2 e2t .

Therefore, the general solution of the original equations is:


y1 = f1 + f2 = (c1 + c2 )e2t + (c2 + c3 )te2t + c23 t2 e2t ;
y2 = −2f1 − 3f2 = (−2c1 − 3c2 )e2t + (−2c2 − 3c3 )te2t − c3 t2 e2t ;
y3 = f1 + 2f2 + f3 = (c1 + 2c2 + c3 )e2t + (c2 + 2c3 )te2t + c23 t2 e2t .

In general, what is the solution of the following matrix equation:


 0    
f1 λ 1 0 0 ... 0 0 f1
 f 0  0 λ 1 0 . . . 0 0  
 2     f2 
 f 0  0 0 λ 1 . . . 0 0  
 3     f3 
 ..   .. .. .. .. .. .. ..   .. ?
 .  = . . . . . . . . 
 0    
fn−2   0 0 0 0 . . . 1 0  fn−2 
 0    
fn−1   0 0 0 0 . . . λ 1  fn−1 
fn0 0 0 0 0 ... 0 λ fn

It is cleaner to solve upwards in index rather than downwards, so we write gk =


fn+1−k and solve the equations:
g10 = λg1 ,
g20 = λg2 + g1 ,
g30 = λg3 + g2 ,
···
0
gn−1 = λgn−1 + gn−2
gn0 = λgn + gn−1 .

The solution of the first equation is already familiar to us:


g1 = c1 eλt .
For the second, the answer is also simple:
g2 = tg1 +c2 eλt , as the derivative of g2 would be g1 +tλg1 +c2 λeλt = g1 +λg2 . Indeed, if
g is any other function satisfying the same condition, then we have ((g − tg1 )e−λt )0 =
69

(g 0 − tg10 − g1 − λg + λtg1 )e−λt = (g 0 − g1 − λg)e−λt = 0, and hence (g − tg1 )e−λt


equals a constant c, so g = ceλt + tg1 .

Theorem 11.3.1: The general solution to the system of n differential equations:


g10 = λg1 and gk0 = λgk + gk−1 , for all 2 ≤ k ≤ n is:

t2 λt tk−1 λt
gk = ck e + ck−1 te + ck−2 e + · · · + c1
λt λt
e .
2 (k − 1)!

Proof: Certainly this is true for the function g1 . Assume it is true for all the
functions g1 , . . . , gk . The function g = gk+1 must satisfy g 0 = λg + gk . Multiply by
e−λt on both sides to get
e−λt g 0 − λe−λt g = e−λt gk .
Notice that the left side of the equation is the derivative of e−λt g, so that e−λt g is
equal to the indefinite integral of the function e−λt gk . We have assumed that gk is
e , and so e−λt gk = ck + ck−1 t +
2
tk−1 λt
equal to ck eλt + ck−1 teλt + ck−2 t2 eλt + · · · + c1 (k−1)!
. The indefinite integral of the function e−λt gk is ck+1 + ck t +
2 k−1
ck−2 t2 + · · · + c1 (k−1)!
t
2 3 k
ck−1 t2 + ck−2 t6 + · · · + c1 tk! . Multiplying by eλt gives the conclusion. 2
70

Lecture 12: Age-Specific Population Growth


12.1 The model
Another application of the technique of diagonalising a matrix, or putting it into
Jordan normal form, is in the solution of systems of recurrence relations. We’ll study
this via a specific application, but the techniques we give will be very general.

We consider a simple model of how ‘populations’ change over time. For example,
we may want to predict how many animals of a certain species will be alive in a few
years time, or how rapidly a progressive disease will spread throughout a given pop-
ulation. The model is called age-specific because it is sophisticated enough to take
into account the fact that the likelihood of an animal reproducing (or succumbing
to a progressive disease) is dependent on its age.

The main assumption is that, as females give birth, they are more essential to the
propagation of a species than the male. With regard to animals, the number of
males in the population is not so critical to the population dynamics, as one male
can be the father of a very large number of offspring. The same logic does not hold
as strongly for human society or for a few exceptional animal species.

Suppose that the maximum age of a female of the species is L. We shall divide the
interval [0, L] into n age classes,
h L  h L 2L  h (n − 1)L i
0, , , ,..., ,L .
n n n n
We are interested in the number of females in each age class and how this grows
with time. We assume that the population in each of these age classes is measured
at discrete time intervals of L/n; that is, at times
L 2L kL
t0 = 0, t1 = , t2 = , . . . , tk = ,....
n n n
(k)
Now, for 1 ≤ i ≤ n, let xi be the population in the ith age class Ci as measured
at time tk = kL/n where
h (i − 1)L iL 
Ci = , ,
n n
We consider two demographic parameters which determine how these age-specific
populations change:

• for i = 1, 2, . . . , n, the parameter ai represents the average number of daughters


born to a female in age class Ci ,
71

• for i = 1, 2, . . . , n − 1, the parameter bi represents the fraction of females in


age class Ci expected to survive for the next L/n years (and hence enter class
Ci+1 ).

We assume that ai ≥ 0 for all i and 0 < bi ≤ 1 for i < n. If, for any i, we find that
ai > 0, we call the age class Ci fertile.

It is of course difficult to state precisely an upper bound for the age that can be
obtained by any individual. For our purposes in analyzing population dynamics, it
is more important that we have a well defined upper bound for fertility. In either
case, for humans a bound of 200 years would suffice.

The number of daughters born between successive population measurements at times


tk−1 and tk is
(k−1) (k−1)
a1 x1 + a2 x2 + · · · + an x(k−1)
n ,
(k)
and this quantity must be exactly x1 . So, equating these, we get:
(k) (k−1) (k−1)
x1 = a 1 x1 + a2 x2 + · · · + an x(k−1)
n .
(k) (k−1)
For i = 1, 2, . . . , n − 1, we have xi+1 = bi x i . Writing this in matrix form we find
that,   
(k)   (k−1) 
x a a a3 . . . an−1 an x1
 1(k)   1 2  (k−1) 
x2   b1 0
 .   0 ... 0 0  x
 2(k−1) 

 ..  =  0 b2 0 
  0 ... 0 x3  ,
 ..   .. .. .. .. .. ..   .. 
 

 .  . . . . . .  . 
xn
(k) 0 0 0 . . . bn−1 0 xn
(k−1)

or x(k) = Lx(k−1) , where L is called the Leslie Matrix.

Example: Suppose that the maximum age of a female of a certain species is 45 years
and that we consider the three age classes [0, 15), [15, 30), [30, 45]. If the demographic
parameters defined above are:

a1 = 0, a2 = 4, a3 = 3, b1 = 1/2 and b2 = 1/4,

then the Leslie matrix for this species is


 
0 4 3
L = 1/2 0 0 .
0 1/4 0
72

Now, if the initial population in each age class is 1000, then x(0) = (1000, 1000, 1000)t ,
which implies that after one time period (i.e. 15 years), the population in each age
class will be given by
    
0 4 3 1000 7000
x(1) = 1/2 0 0 1000 =  500  ,
0 1/4 0 1000 250

after two time periods (i.e. 30 years), the population in each age class will be given
by
    
0 4 3 7000 2750
(2) (1)
x = Lx = 1/2 0  0  500  = 3500 ,
0 1/4 0 250 125
and after three time periods (i.e. 45 years), the population in each age class will be
given by
    
0 4 3 2750 14375
x = Lx = 1/2
(3) (2)
0 0 3500 =  1375  .
0 1/4 0 125 875

It is natural to ask what the long-term population distribution will be, and to find
this, we need eigenvalues and eigenvectors.

12.2 Eigenvalues and long-term behaviour


We have found that the population distribution in the kth time period can be found
from the population distribution in the (k − 1)st time period using the formula
x(k) = Lx(k−1) , so we can relate the population distribution in the kth time period to
the initial population distribution by using the formula x(k) = Lk x(0) . To understand
what happens in the long run, we need to be able to find Lk for suitably large k and
the easiest way to do this is to use the process of diagonalisation.

We can find an invertible matrix P such that E = P LP −1 where the matrix E is in


Jordan normal form with the eigenvalues λ1 , . . . , λn (with possible repetitions).
 
λ1 ∗ . . . 0
 0 λ ... 0
 
P −1 LP = E =  .. .. ..
2
..  .
. . . .
0 0 ... λl
73

On re-arranging, we find that P EP −1 = L and so if we were to multiply L by itself


k times we would find that

Lk = (P EP −1 )k = (P EP −1 )(P EP −1 ) · · · (P EP −1 ) = P E k P −1 .
| {z }
k times
k k −1
This, in turn, gives in general L = P E P . This drastically simplifies the task of
finding the population distribution in the kth time period.

Theorem 12.2.1: If A is an n × n Jordan block matrix of the form


 
λ 1 0 ... ... 0 0
 0 λ 1 0 ... 0 0 
 
 0 0 λ 1 ... 0 0 
 
 
A = . . . . . . . . . . . . . . . . . . . . . ,
 
. . . . . . . . . . . . . . . 1 0 
 
 0 0 0 ... ... λ 1 
0 0 0 ... ... 0 λ
k
 k−j+i
then the (i, j) entry of the matrix Ak is j−i λ , where the Binomial coefficient

j−i is defined to be zero whenever j < i or k < j − i.
k

In the particular cases n = 2 and n = 3, we have


 k  k  
 k  k k−1
 λ 1 0 λ kλk−1 k
λk−2
λ 1 λ kλ 2
= ;  0 λ 1 =  0 λk kλk−1  .
0 λ 0 λk
0 0 λ 0 0 λk
For a matrix made up of several Jordan blocks, each block can be raised to the
required power separately.

The result above shows that, if the matrix A has some eigenvalue of algebraic mul-
tiplicity 1, with a norm greater than the norms of all the other eigenvalues, then, as
long as the initial population distribution has a non-zero coefficient for this eigen-
vector, the long term behaviour
 of Ak will be dominated by this eigenvalue. This
l
follows from the fact that kl ≤ k l and if λ > 1 then, for any fixed l, limk→∞ λkk = 0.
That is exactly what happens with Leslie matrices.

Fact: The characteristic polynomial f (x) = det(xI − L) of the Leslie matrix L is


given by

f (x) = xn − a1 xn−1 − a2 b1 xn−2 − · · · − an−1 b1 b2 . . . bn−2 x − an b1 b2 . . . bn−1 .
74

We know that the eigenvalues of the Leslie matrix are the solutions of the equation
f (x) = 0, and so given this fact, it can be seen (by dividing by xn ) that
f (x) = 0 ⇐⇒ g(x) = 1,
where g(x) is given by
a1 a2 b1 an b1 b2 · · · bn−1
g(x) = + 2 + ··· + .
x x xn
The function g(x) has the following three properties:
• g(x) is a decreasing function for x > 0, with g 0 (x) < 0 for all x > 0,
• as x → 0+ , g(x) → ∞,
• as x → ∞, g(x) → 0.
Consequently, we can conclude that there is a unique positive real solution of the
equation g(x) = 1. That is, L has a unique positive real eigenvalue, which we call
λ1 . Furthermore, this eigenvalue λ1 is of algebraic multiplicity 1, i.e. (x − λ1 ) is
a factor of f (x), but (x − λ1 )2 is not. Indeed, if (x − λ1 )2 is a factor, then the
evaluated derivative f 0 (λ1 ) would be zero. Solving for g(x) in terms of f (x) and
using the quotient rule, one discovers that f (λ1 ) = 0 and f 0 (λ1 ) = 0 would imply
that g 0 (λ1 ) = 0, contradicting that g 0 (x) is strictly negative for positive x.
Given any eigenvalue λ of the Leslie matrix L, we can write down the equation a
corresponding eigenvector v satisfies:
    
a1 a2 . . . an−2 an−1 an v1 λv1
b 0 . . . 0 0    
 1 0   v2   λv2 
 .. .. .. .. .. ..   ..  =  ..  .
. . . . . .  .   . 
    
 0 0 . . . bn−2 0 0  vn−1  λvn−1 
0 0 . . . 0 bn−1 0 vn λvn
Solving from the bottom up, we can express all the vi in terms of vn : vn−1 =
(λ/bn−1 )vn , vn−2 = (λ/bn−2 )vn−1 = (λ2 /bn−2 bn−1 )vn , and so on until
v1 = (λn−1 /b1 b2 · · · bn−1 )vn . The top equation is sure to be satisfied, as λ is an
eigenvalue and so we know there is some solution to the equation above. Dividing
by the value of v1 , we can write down the eigenvector v corresponding to λ is
 
1
 
 b1 /λ 
 2 
v= b1 b2 /λ .
 . 
 .
. 
b1 b2 · · · bn−1 /λn−1
75

In particular, the eigenvector corresponding to the unique positive eigenvalue λ1 is


 
1
 
 b1 /λ1 
 2 
v1 =  b1 b2 /λ1 .
 . 
 .. 
b1 b2 · · · bn−1 /λ1
n−1

Theorem 12.2.2: If two successive entries ai and ai+1 of the Leslie matrix L are
both non-zero, and the b1 , . . . , bi are all non-zero then for any eigenvalue λ of L other
than λ1 , |λ| < λ1 .

In other words, if there are two successive fertile age classes with a positive proba-
bility of reaching both of these age classes, then the eigenvalue λ1 is dominant.

So, let us suppose that there are two successive fertile classes reached with positive
probability. Let v1 , as above, be the eigenvector that corresponds to λ1 . Let E be
the matrix in Jordan normal form such that L = P EP −1 and
 

P =  v1 v2 · · · vn  .

As we can write Lk = P E k P −1 , and the population distribution in the kth time


period is related to the initial population distribution by the formula x(k) = Lk x(0) ,
we can write x(k) = P E k P −1 x(0) . Dividing both sides by λk1 , we get

x(k) 1
k
= k
P E k P −1 x(0) ,
λ1 λ1

But, since λ1 is dominant, we know that |λ/λ1 | < 1 for all other eigenvalues λ, and
so  
1 0 ... 0
1 k 0 0 . . . 0


lim k E =  .. .. .. ..  .
k→∞ λ1 . . . .
0 0 ... 0
Consequently, we find that

x(k) 1
lim k = lim k P E k P −1 x(0) = cv1 ,
k→∞ λ1 k→∞ λ1
76

where the constant c is the first entry of the vector given by P −1 x(0) , in other
words the coefficient for the eigenvector v1 . Thus, for large values of k, we have
the approximation x(k) ' cλk1 v1 , given that the initial population has a positive
coefficient c for the eigenvector v1 , and if so then the proportion of the population
lying in each age class is, in the long run, constant, which tells us that the population
in each age class grows approximately by a factor of λ1 every time period (i.e. every
L/n years).

Of course if we start with only women beyond child-bearing age, the population
dynamics will ignore the eigenvalue λ1 and move quickly toward zero. Furthermore,
if we start with complex numbers that are not non-negative real numbers, then
the long term behaviour may fail to follow the eigenvalue λ1 and multiples of its
eigenvector v1 , but then the dynamics would not follow real population dynamics.

12.3 Example
Returning to the earlier example we find that the characteristic polynomial of the
Leslie matrix  
0 4 3
L =  1/2 0 0  ,
0 1/4 0
is given by f (λ) = λ3 − 2λ − 38 . As L has two successive positive entries in its top
row, we have two successive fertile classes. Consequently, we expect a dominant
eigenvalue, and this turns out to be the root λ1 = 3/2. Using the formula above,
the eigenvector v1 corresponding to this eigenvalue is,
     
1 1 1
v1 =  b1 /λ1  =  (1/2)/(3/2)  =  1/3  .
b1 b2 /λ21 (1/2)(1/4)/(3/2)2 1/18
Thus, for large k, our approximation gives
 
 k 1
3 
x(k) 'c 1/3  ,
2
1/18
and so this tells us that the proportion of the population lying in each age class
is, in the long run, constant and given by the ratio 1 : 1/3 : 1/18. From this we
also deduce that x(k) ' 32 x(k−1) , which tells us that the population in each age class
grows approximately by a factor of 3/2 (i.e. increases by 50%) every time period,
which in this case is every 15 years.
77

12.4 Power Series of Matrices


We wish to make sense of the following calculation, for an n × n matrix A:
(I −A)(I +A+A2 +A3 +· · · ) = (I +A+A2 +A3 +· · · )−(A+A2 +A3 +A4 +· · · ) = I,
and therefore (I − A)−1 = I + A + A2 + A3 + · · · .
The problem is that we need to make sense of the infinite sum. There is no problem
in interpreting the sum of matrices “entry by entry”, so the (i, j)-entry of I + A +
A2 + A3 + · · · is (I)ij + (A)ij + (A2 )ij + · · · . The question is when this sum of (in
general complex) numbers converges.
In Theorem 12.2.1, we looked at the matrix Ak , where A is in Jordan normal form.
We saw that, in an n×n Jordan block corresponding to eigenvalue λ, the (i, j)-entry
 k−j+i
k k
of A is given by j−i λ . Let’s observe that these entries are all at most Ck n |λ|k
in modulus, where C is the maximum of 1 and |λ|−n .
We then have the following.
Theorem 12.4.1: If A is an n×n matrix, with eigenvalues λ1 , λ2 , . . . , and µ > |λi |
for each i, then there is a constant C such that, for every k, every entry of Ak has
modulus at most Cµk .
Proof: We write A = P JP −1 , where P is an invertible matrix and J is in Jordan
normal form. Then Ak = P J k P −1 . Each entry of Ak is the sum of n2 products
p`q, where p is an entry of P , ` of Jk , and q of P −1 . So the modulus of any entry
of Ak is at most n2 times the product of the maximum modulus of an entry of
each matrix, and hence is at most a constant (depending on A but not on k) times
the modulus of the largest entry in J k . So the maximum entry of Ak is at most a
constant times k n times |λ1 |k , where λ1 is the largest eigenvalue of J (and therefore
of A) in modulus. This is less than µk for sufficiently large k, and so we can choose
C as in the statement of the theorem. 2

Returning to the sum I + A + A2 + A3 + · · · , we now see that this does converge


provided each eigenvalue of A has modulus less than 1. Indeed, if this is the case,
then we can choose µ < 1, and a constant C, so that every entry of Ak has modulus
P P∞
at most Cµk . Since ∞k=0 Cµ k
converges to some finite quantity, so does k
k=0 (A )ij ,
for each i and j.
The formula (I − A)−1 = I + A + A2 + · · · is most useful in the form (I − εA)−1 =
I + εA + ε2 A2 + · · · , for small ε: this describes what happens when we “perturb” the
identity matrix by a small amount ε in some “direction” A. This is valid provided
|ε| < 1/|λ1 |, where λ1 is the eigenvalue of A of maximum modulus.
78

Another useful power series of matrices is


1 2 1 3 1 4
eA = I + A + A + A + A + ··· ;
2! 3! 4!
this series converges for any matrix A, for much the same reasons as before.
We note that the solution to the general linear system of differential equations given
by y0 = Ay is y = etA y0 . To see this, write
t2 2 t3
y = etA y0 = y0 + tAy0 + A y0 + A3 y0 + · · · .
2 3!
Differentiating this term-by-term, we verify that y0 = Ay.
This is a nice theoretical result, but in practice it doesn’t help us much with the
solution of systems of differential equations.
Indeed, to find etA in practice, we would put A in Jordan normal form, say A =
P JP −1 , and then observe that
2
 
−1 t t2 2
tA −1
e = P P +tP JP + P J P +· · · = P I + tJ + J + · · · P −1 = P etJ P −1 .
2 −1
2 2

This may be useful, because we can write down etJ explicitly. Again, we need only
evaluate etJ when J is a Jordan block. See Exercise 56.
79

Lecture 13: Normal and Positive Definite Matrices


13.1 Upper triangular matrices
Recall that an n×n matrix A is upper triangular if all the entries below the diagonal
are zero.
We have proved already that, for every complex matrix A, there is an invertible
complex matrix S such that S −1 AS is upper triangular. Indeed, we can even find a
matrix S such that S −1 AS is in Jordan normal form.

Thinking in terms of the linear transformation represented by a matrix, we get the


following alternative description. An n × n matrix A is in upper triangular form if
and only if, for each i = 1, 2, . . . , n, the vector Aei is in the linear span of {e1 , . . . , ei }.
Likewise, if B = (v1 , . . . , vn ) is an alternative ordered basis, then MB−1 AMB is upper
triangular if and only if, for each i = 1, . . . , n, the vector Avi is in the linear span
of the v1 , . . . , vi . This fact leads directly to a simple proof of the following result.

Theorem 13.1.1 For any n × n matrix A, there is a unitary matrix S such that
S −1 AS = S ∗ AS is upper triangular.

Proof: Let B ∗ = (v1∗ , . . . , vn∗ ) be an ordered basis such that MB−1∗ AMB ∗ is up-
per triangular. Perform the Gram-Schmidt procedure on the vectors v1∗ , . . . , vn∗ to
obtain an ordered orthonormal basis B = (v1 , . . . , vn ) so that, for i = 1, . . . , n,
span({v1 , . . . , vi }) is equal to span({v1∗ , . . . , vi∗ }). As the matrix MB−1∗ AMB ∗ is up-
per triangular, the vector Avi∗ is in span({v1∗ , . . . vi∗ }), and so the vector Avi is in
span({v1 , . . . , vi }), for i = 1, . . . , n. Thus MB−1 AMB is also upper triangular. As B
is an orthonormal basis, MB is a unitary matrix, and that completes the proof. 2

13.2 Normal matrices


A matrix A is unitarily diagonalisable if there is a unitary matrix S such that S −1 AS
(which is the same as S ∗ AS) is diagonal.

Recall that an n × n matrix A is called normal if AA∗ = A∗ A. We will show that A


is normal if and only if A is unitarily diagonalisable. We first need a lemma.

Lemma 13.2.1: An upper triangular matrix A is normal if and only if it is diagonal.

Proof: One direction is easy, namely that if A is diagonal then AA∗ = A∗ A and
hence A is normal.
80

Suppose that A is normal and upper triangular, so aij P = 0 whenever P i > j. The
∗ ∗
condition that AA = A A says that, for all choices i, j, k aik ajk = k aki akj . We
need to show that aik = 0 whenever k > i.
P P
Just restricting to i = j, we have k |aP ik | =
2
k |aki | for allP
2
i. Since aij = 0 for all
j < i, this can be re-written as |aii | + k>i |aik | = |aii | + k<i |aki |2 .
2 2 2
P
Choosing i = 1, we have |a11 |2 + k>1 |a1k |2P= |a11 |2 , and therefore a1k = 0 for all
k > 1. Next choose i = 2: we have |a22 |2 + k>2 |a2k |2 = |a22 |2 + |a12 |2 = |a22 |2 , as
we have already established that a12 = 0. We conclude that a2k = 0 for all k > 2.
We can continue in this way, showing at every stage i that aik = 0 for all k > i. 2

Theorem 13.2.2: For a matrix A, the following are equivalent:


(i) A is normal;
(ii) A can be unitarily diagonalised;
(iii) there is an orthonormal basis of eigenvectors of A.

Proof:
(ii) ⇔ (iii): To say that S −1 AS is a diagonal matrix means that the columns of
S form a basis of eigenvectors of A, and to say that S is unitary means that these
columns form an orthonormal set. So to say that A is unitarily diagonalisable means
that there is an orthonormal basis of eigenvectors of A.
(ii) ⇒ (i) If a matrix A is unitarily diagonalisable, then there is a unitary matrix S
with S ∗ AS = D, for some diagonal matrix D. Since A = SDS ∗ and A∗ = SD∗ S ∗ ,
we can write AA∗ = SDS ∗ SD∗ S ∗ = SDD∗ S ∗ . Similarly A∗ A = SD∗ S ∗ SDS ∗ =
SD∗ DS ∗ . But these are equal, since DD∗ = D∗ D for any diagonal matrix D.

(i) ⇒ (ii) Suppose that A is normal. By Theorem 13.1.1, there is a unitary matrix
S such that E = S −1 AS = S ∗ AS is an upper triangular matrix. We claim that
E is also normal: we have E ∗ = S ∗ A∗ S and so EE ∗ = S ∗ ASS ∗ A∗ S = S ∗ AA∗ S =
S ∗ A∗ AS = S ∗ A∗ SS ∗ AS = E ∗ E. By Lemma 13.2.1, it follows that E is diagonal,
and so E = S ∗ AS is a unitary diagonalisation of A. 2

Any Hermitian matrix is also normal, as A∗ = A implies that AA∗ = A∗ A = A2 . So


every Hermitian matrix has an orthonormal basis of eigenvectors; by Theorem 8.3.2,
these all correspond to real eigenvectors.

Furthermore, unitary matrices are normal, and hence also unitarily diagonalisable.
Real orthogonal matrices are also unitary matrices, but most are not diagonalisable
with real eigenvalues. What we know is that real orthogonal matrices have orthonor-
mal bases of eigenvectors, which in general are complex vectors corresponding to
81

complex eigenvalues. We demonstrated this in Lecture 6 in the 2-dimensional case,


the orthogonal matrices representing reflections have real eigenvalues and eigenvec-
tors; those representing rotations do not, and to diagonalise those we need to venture
into complex space.

Theorem 13.2.3: A matrix A is Hermitian if and only if A is normal with real


eigenvalues.

Proof: We have already proven one direction, that if A is Hermitian then it is


normal with real eigenvalues. Now assume that A is normal with real eigenvalues.
By Theorem 13.2.2, there is a unitary matrix P with P ∗ AP = D, for a diagonal
matrix D with only real entries (zeros off the diagonal and the real eigenvalues on
the diagonal). Since D∗ = D, it follows that A∗ = (P ∗ )∗ D∗ P ∗ = P DP ∗ = A. 2

13.3 Positive Definite Matrices


Recall from Lecture 8 that an n × n complex matrix A is positive definite if it is
Hermitian and x · (Ax) > 0 for all non-zero x. Also: a Hermitian matrix is positive
semi-definite if x · (Ax) ≥ 0 for all x; it is negative definite if x · (Ax) < 0 for all
non-zero x; it is negative semi-definite if x · (Ax) ≤ 0 for all x. The same definitions
are naturally valid for a real symmetric matrix A.
In this section, we give various alternative characterisations of positive definite ma-
trices.

Theorem 13.3.1: An n × n Hermitian matrix A is positive definite if and only


if all its eigenvalues are positive. It is positive semi-definite if and only if all its
eigenvalues are non-negative.

Proof: The Hermitian matrix A is also normal, so has an orthonormal basis of


eigenvectors v1 , . . . , vn , corresponding to the real eigenvalues λ1 , . . . , λn respectively.
Let x = a1 v1 + · · · + an vn be any vector in Cn . Because the chosen basis of eigen-
vectors is orthonormal and the λi are real, we have
X n X
n X
n
x · (Ax) = (ai vi ) · (ai λi vi ) = λi ai ai (vi · vi ) = λi |ai |2 .
i=1 i=1 i=1
Pn
If λi is positive for all i, then x · (Ax) = i=1 λi |ai |
2
is positive for all non-zero
vectors x, and so A is positive definite.
Conversely, if λi is not positive for some i, then vi · (Avi ) = λi is not positive.
The proof for the positive semi-definite case follows the same argument. 2
82

Theorem 13.3.2: If f (x) = an xn +· · ·+a1 x+a0 is a polynomial whose roots are all
real numbers, then all the roots are positive if and only if the ai alternative between
being positive and negative.

Proof: We prove first a different claim: if f (x) = an xn +· · ·+a1 x+a0 is a polynomial


whose roots are all real numbers, then all the roots are negative if and only if either
all the ai are positive or all the ai are negative.

If all the roots are negative, then there are positive reals b1 , . . . , bn , and a non-zero
real number s, such that f (x) = a(x + b1 )(x + b2 ) · · · (x + bn ); expanding this out
tells us that all the coefficients ai have the same sign as a.
On the other hand, if the ai are either all positive or all negative, then either f (x) > 0
for all non-negative x or f (x) < 0 for all non-negative x, and in either case there
are no non-negative roots. This establishes the claim.

The polynomial f (x) has only positive roots if and only if f (−x) has only neg-
ative roots. By the above claim, f (−x) has only negative roots if and only if
f (−x) = (−1)n an xn + (−1)n−1 an−1 xn−1 + · · · − a1 x + a0 has either only positive
coefficients (−1)i ai , or only negative coefficients (−1)i ai , and this is equivalent to
the ai alternating between being positive and negative. 2

Notice that the result does not hold unless we assume that all the roots are real.
The polynomials x2 + x + 1 and x2 − x + 1 both have two roots that are complex
and not real.

Theorem 13.3.3:
(i) The characteristic polynomial of a Hermitian matrix has only real coefficients.
(ii) A Hermitian matrix is positive definite if and only if its characteristic polynomial
has coefficients alternating in sign.

Proof: (i) Because a Hermitian matrix A has only real eigenvalues, its characteristic
polynomial f (x) = det(xI −A), with leading term 1, takes the form (x−r1 )m1 · · · (x−
rk )mk where r1 , . . . , rk are real numbers. Expanding out gives a polynomial with real
coefficients.
(ii) By Theorem 13.3.2, the characteristic polynomial f (x) has coefficients alternat-
ing in sign if and only if all the roots r1 , . . . , rk are positive. By Theorem 13.3.1, all
roots (eigenvalues) are positive if and only if the matrix is positive definite. 2
83

Examples: Set:
   
5 −1 3 5 −1 3 + i
A1 = −1 2 −2 ; A2 =  −1 2 −2  .
3 −2 3 3 − i −2 3
Both A1 and A2 are Hermitian. We want to discover if they are also positive definite.

One approach would be to look at the characteristic polynomials. The characteristic


polynomial of A1 is x3 − 10x2 + 17x − 1; this polynomial has coefficients alternating
in sign; hence all its eigenvalues are positive and A1 is positive definite.

On the other hand, the characteristic polynomial of A2 is x3 − 10x2 + 16x + 1, where


the coefficients do not alternate in sign, and hence A2 is not positive definite.

There is another well-known test to determine if a matrix is positive definite. For


each k = 1, 2, . . . , n, define the leading minor Ak to be the k × k matrix whose (i, j)
entry is the same entry aij as in A. (So Ak is the top-left-hand k × k submatrix of
A.)

Theorem 13.3.4: A Hermitian matrix is positive definite if and only if det(Ak ) is


positive for all k = 1, 2, . . . , n.

The proof of Theorem 13.3.4 is omitted. It is not difficult, but it is long.

13.4 Singular Values


For any m × n matrix A, not necessarily even square, consider the (n × n) matrix
A∗ A.
This matrix A∗ A is Hermitian, as (A∗ A)∗ = A∗ (A∗ )∗ = A∗ A. It is also non-negative
definite, as x·(A∗ Ax) = (Ax)·(Ax) ≥ 0. Indeed, we see that A∗ A is positive definite
exactly when A has trivial nullspace, i.e., when A has rank n. Let us include here
a more general version of this observation.
Theorem 13.4.1: For any m × n matrix A, r(A) = r(A∗ A) = r(AA∗ ) = r(A∗ ).

Proof: Since A∗ and AA∗ represent linear transformations defined on the same
vector space Cm , to show that AA∗ and A∗ have the same rank it suffices to show
that the nullspace of A∗ is the same as the nullspace of AA∗ . This is also enough
for the rest of the theorem, as the rank of A is equal to the rank of A∗ .

One direction is easy: the nullspace of A∗ is contained in the nullspace of AA∗ , since
if A∗ v = 0 then also A(A∗ v) = A 0 = 0.
84

Now assume that v is in the nullspace of AA∗ . Then we have

(A∗ v) · (A∗ v) = (AA∗ v) · v = 0 · v = 0.

This implies that A∗ v = 0, i.e., that v is in the nullspace of A∗ . 2


As A∗ A is Hermitian, it has an orthonormal basis of eigenvectors, and all its eigen-
values will be non-negative √reals. If √ λ1 , . . . , λk are the non-zero eigenvalues, then
the singular values of A are λ1 , . . . , λk .
Theorem 13.4.2: The singular values of A are the same as the singular values
of A∗ .

Proof: Suppose v is an eigenvector of A∗ A, with eigenvalue λ > 0. Then AA∗ Av =


A(λv) = λAv. As Av = λv 6= 0, Av is an eigenvector of AA∗ , with eigenvalue λ.
An identical proof shows that, if λ is a non-zero eigenvalue of AA∗ , then it is also a
non-zero eigenvalue of A∗ A. Thus A∗ A and AA∗ have the same non-zero eigenvalues,
and hence A and A∗ have the same singular values. 2

Moreover, if v1 , . . . , vk is an orthonormal set of eigenvectors of A∗ A, with corre-


sponding positive eigenvalues λ1 , . . . , λk , then

(Avi ) · (Avj ) = vi · (A∗ Avj ) = λj (vi · vj ).

This means that the eigenvectors Av1 , . . . , Avk of AA∗ are also orthogonal. The
normalised vectors √1λ Av1 , . . . , √1λ Avk form an orthonormal set of eigenvectors of
1 k
AA∗ .

13.5 The norm of a matrix


For an m × n matrix A (again, not necessarily square), define the norm kAk of A
to be the maximum, over all non-zero vectors v ∈ Rn , of kAvk/kvk. So kAk is the
maximum amount that A “expands” a vector.
We see immediately that, if v is an eigenvector of A with eigenvalue λ, then
kAvk kλvk |λ|kvk
kAk ≥ = = = |λ|.
kvk kvk kvk
Therefore kAk is at least the largest modulus of an eigenvalue of A. In the case
where A is Hermitian, the norm of A is equal to this maximum modulus, but in
general the norm is larger.

Theorem 13.5.1: The norm of any matrix A is equal to its largest singular value.
85

Moreover, a vector maximising kAvk/kvk is an eigenvector of A∗ A corresponding


to the largest singular value of A.

Proof: Take an orthonormal basis v1 , . . . , vn of eigenvectors of A∗ A, with cor-


responding (real, non-negative) eigenvalues λ1 , . . . , λn . Say√that λ1 is the largest
eigenvalue of A∗ A, so that the largest singular value of A is λ1 .
Take any vector v, write v = α1 v1 + · · · + αn vn , and note that kvk2 = |α1 |2 + · · · +
|αn |2 . Now we have (A∗ A)v = α1 λ1 v1 + · · · + αn λn vn , and so
kAvk2 = (Av) · (Av) = v · (A∗ Av) = α1 α1 λ1 (v1 · v1 ) + · · · + αn αn λn (vn · vn ) =
= λ1 |α1 |2 + · · · λn |αn |2 ≤ λ1 kvk2 .

Therefore kAvk/kvk ≤ λ1 , with equality when v = v1 . 2
 
2 0 1
Example: Let A = . Then:
−1 2 0
 
5 −2 2  
5 −2
A A = −2 4 0 ; AA =
∗ ∗
.
−2 5
2 0 1
Note that indeed both matrices are symmetric, so Hermitian. By inspection, the
eigenvectors of AA∗ are u1 = (1, 1)t and u2 = (1, −1)t , with corresponding eigenval-
ues 3 and 7 respectively.
We can now check that v1 = A∗ u1 = (1, 2, 1)t and v2 = A∗ u2 = (3, −2, 1)t are
eigenvectors of A∗ A with the same eigenvalues 3 and 7. The third eigenvalue of A∗ A
must be 0: indeed the rank of A∗ A is at most the rank of A, which is 2. The vector
(2, 1, −4)t is in the kernel of A, and so in the kernel of A∗ A.

The norm of A is 7, and a vector maximising kAxk/kxk is v2 = (3, −2, 1)t .

13.6 Singular values decomposition


For any matrix A, let v1 , . . . , vn be an orthornormal basis of eigenevectors of the
Hermitian matrix A∗ A, with v1 , . . . , vk corresponding to positive eigenvalues, and
the remaining vectors vk+1 , . . . , vn to zero eigenvalues.
Note that, if A∗ Av = 0, then 0 = v · (A∗ Av) = (Av) · (Av), so Av = 0.
For i = 1, . . . , k, set wi = √1λ Avi , so the wi form an orthonormal set of eigenvectors
i
of AA∗ . The singular values decomposition of A is
p p
A = λ1 w1 v1∗ + · · · + λk wk vk∗ .
86

A proof of this requires the fact that I = v1 v1∗ + · · · + vn vn∗ , whenever v1 , . . . , vn


is an orthonormal basis of Cn . Given this, the formula for the singular values
decomposition follows on applying A to both sides.
The singular values decomposition is of importance in numerical computing, and in
principal component analysis.
In the example in the previous section, we normalise the eigenvectors of AA∗ and
A∗ A to get the singular values decomposition:
 √   √ 
√ 1/ 2 √ √ √  √ 1/ √2 √ √ √ 
A= 3 √ 1/ 6 2/ 6 1/ 6 + 7 3/ 14 −2/ 14 1/ 14
1/ 2 −1/ 2
   
1/2 1 1/2 3/2 −1 1/2
= + .
1/2 1 1/2 −3/2 1 −1/2
87

Lecture 14: Sums and Projections


14.1 Direct Sums
Let U1 and U2 be subspaces of a vector space V . Define the sum of the two subspaces
to be the subspace U1 + U2 spannedP by U1 ∪ U2 . This subspace is equal to the set
of all vectors that can be written as ki=1 ai vi , where the ai are scalars, and each vi
is either in U1 or U2 . By dividing the sum into two parts, those using vectors in U1
and those using vectors in U2 , and using the fact that both U1 and U2 are subspaces,
we have that U1 + U2 = {u1 + u2 | u1 ∈ U1 , u2 ∈ U2 }.

If U1 and U2 are subspaces of a vector space V , then U1 ∩ U2 is also a vector space.


To see this: if u and v are in U1 ∩ U2 , and a and b are scalars, then au + bv is in
both U1 and U2 , and hence in U1 ∩ U2 .

Theorem 14.1.1: If a vector space V is equal to U1 + U2 for subspaces U1 and U2 ,


then dim(V ) + dim(U1 ∩ U2 ) = dim(U1 ) + dim(U2 ).

Proof: Suppose that dim(U1 ∩ U2 ) = k, dim(U1 ) = `, dim(U2 ) = m − ` + k. We


wish to show that dim(V ) = dim(U1 + U2 ) = m.
Take a basis B = {v1 , . . . , vk } of U1 ∩U2 . Extend B to a basis {v1 , . . . , vk , vk+1 , . . . v` }
of U1 . Also extend B to a basis {v1 , . . . , vk , v`+1 , . . . , vm } of U2 . We claim that the
union C = {v1 , . . . , vm } of these two bases is a basis for V = U1 + U2 ; this will prove
the result.
It is clear that C spans U1 + U2 . We have to prove that C is linearly independent.
Suppose then that there are scalars α1 , . . . , αm , not all zero, such that

α1 v1 + · · · + αm vm = 0.

If all the αi for i > ` are zero, then we have a contradiction to the assumption that
{v1 , v` } is a basis of U1 . So the vector v = α`+1 v`+1 +· · · αm vm is a non-zero element
of U2 . The vector v is also an element of U1 , as it can be written as −α1 v1 −· · ·−α` v` .
Therefore v is in U1 ∩ U2 , and so it can be written as β1 v1 + · · · + βk vk , for some
scalars βi . This now gives us two different representations of the vector v in U2 as
a linear combination of the elements of {v1 , . . . , vk , v`+1 , . . . , vm }, contradicting the
assumption that this is a basis of U2 . 2
88

Example: Consider the two subspaces U1 = span({(0, 1, 0)t , (0, 1, 1)t }), U2 =
span({(1, 2, 3)t , (2, 0, 4)t }) of V = R3 . It’s easy to check that the first three of
these four vectors are linearly independent, so U1 + U2 = R3 . The formula in Theo-
rem 14.1.1 tells us that U1 ∩ U2 will have dimension 2 + 2 − 3 = 1. To find a vector
in the intersection, note that U1 is the set of all vectors (x, y, z)t with x = 0. The
vector 2(1, 2, 3)t − (2, 0, 4)t = (0, 4, 2)t is therefore in U1 ∩ U2 . So U1 ∩ U2 is the set
of multiples if (0, 4, 2)t .

If U1 and U2 are subspaces of a vector space V , we write U1 + U2 = U1 ⊕ U2 to mean


that every element of U1 + U2 can be written uniquely as a sum of an element of U1
and an element of U2 . We say that U1 + U2 is the direct sum of U1 and U2 .

Theorem 14.1.2: If U1 and U2 are subspaces of a vector space V , then U1 + U2 =


U1 ⊕ U2 if and only if U1 ∩ U2 = {0}.

Proof: Suppose U1 ∩ U2 = {0}. If u1 + u2 = u01 + u02 , for u1 , u01 ∈ U1 , u2 , u02 ∈ U2 ,


then u1 − u01 = u02 − u2 is in U1 ∩ U2 , and hence is equal to 0. This means that there
are not two different ways of writing any vector as a sum of a vector in U1 and a
vector in U2 .

Conversely, suppose that, for every u ∈ U1 + U2 , there is only one way to write
u ∈ U1 + U2 as u1 + u2 , with u1 ∈ U1 and u2 ∈ U2 . If u is a vector in U1 ∩ U2 , then
u + 0 and 0 + u are two expressions of u as a sum of a vector in U1 and a vector in
U2 , so these must be the same expressions, i.e., u = 0. 2

14.2 Orthogonal Complements


For any subspace U of a finite-dimensional real or complex inner product space V ,
there is always an easy way to find another subspace W such that V = U ⊕ W , i.e.,
V is the direct sum of U and W .

Definition: For U a subspace of an inner product space V , the orthogonal comple-


ment U ⊥ is the set
{v ∈ V | v ⊥ u for every u ∈ U }.

It is easy to verify that U ⊥ is also a subspace of V (an exercise).

Theorem 14.2.1: If U is a subspace of a finite-dimensional real or complex inner


product space V , then U ⊕ U ⊥ = V and (U ⊥ )⊥ = U .
89

Proof: Given a basis {u1 , . . . , un } of U , and an element v of V , the Gram-Schmidt


procedure finds a vector w = v + u, with u in U , which is orthogonal to all the ui ,
and is therefore in U ⊥ . Hence v can be written as −u + w, which is in U + U ⊥ , and
this proves that V = U + U ⊥ .
If u ∈ U ∩ U ⊥ , then hu, ui = 0, and therefore u = 0: hence V = U ⊕ U ⊥

It is easy to confirm that U ⊆ (U ⊥ )⊥ (an exercise).

Applying the result that V = W ⊕ W ⊥ to the subspace W = U ⊥ , we obtain that


V = U ⊥ ⊕ (U ⊥ )⊥ . By Theorem 14.1.2, the dimension of (U ⊥ )⊥ is the same as the
dimension of U . As U ⊆ (U ⊥ )⊥ , it follows that U = (U ⊥ )⊥ . 2

Theorem 14.2.1 does not hold in all infinite-dimensional vector spaces, as we will
show later.

Theorem 14.2.2: If A is an n × m real matrix, then N (At ) = R(A)⊥ .

Proof: Take a vector z in R(A)⊥ . This means that, for every x in Rm , we have
(Ax) · z = x · (At z) = 0. But, since this is true for all x in Rm , including x = At z,
the positivity property implies that At z = 0, which means that z ∈ N (At ).

On the other hand, suppose that z ∈ N (At ). We reverse the process. Let x be any
vector in Rm . As At z = 0, we have 0 = x · (At z) = (Ax) · z. Because x was chosen
arbitrarily, it follows that z is in R(A)⊥ . 2

Corollary 14.2.3: If A is an n × m real matrix, then N (A) = R(At )⊥ .

We have been using Corollary 14.2.3 already, to find vectors orthogonal to the
nullspace of a matrix A. These are the row vectors of A, or alternatively the row
vectors of a matrix obtained from A by row reduction. The row vectors of the ma-
trix are the column vectors of the transpose of the matrix, and therefore they span
R(At ).
With complex matrices, we must be more careful, since we have to take the conju-
gates of the row vectors to find the orthogonal complement of the nullspace.

Example: Let V be R4 and let U = span({(1, 2, −1, 1)t , (1, 1, 0, −1)t }). We want
to find U ⊥ . The easiest way is to find the nullspace of the matrix whose rows are
(1, 2, −1, 1) and (1, 1, 0, −1). This matrix
   
1 2 −1 1 0 −1 1 −2
row reduces to .
1 1 0 −1 1 1 0 −1
90

Its nullspace, which is U ⊥ , is spanned by (1, −1, −1, 0)t and (1, 0, 2, 1)t . Thus U ⊥ =
span({(1, −1, −1, 0)t , (1, 0, 2, 1)t }).

14.3 Projections
A projection is a linear transformation P from a (real) vector space V to itself such
that P (v) = v for all v in im(P ).
An n × n matrix A is idempotent if A2 = A.

Theorem 14.3.1: A linear transformation P from a vector space to itself is a


projection if and only if P 2 = P . Therefore an n×n matrix A represents a projection
from Rn to itself if and only if A is idempotent.

Proof: Suppose P is a projection from V to itself, and let u be any vector in V . Now
v = P (u) is a vector in im(P ), so P (P (u)) = P (v) = v = P (u). As P 2 (u) = P (u)
for every u in V , we have P 2 = P .

Conversely, if P is a linear transformation such that P 2 = P , and v is any vector in


im(P ), then v = P (u) for some u in V , and therefore P (v) = P 2 (u) = P (u) = v.
So P is a projection. 2

Suppose P : V → V is a projection. For any x in V , we may write x = P (x) + (x −


P (x). The vector P (x) is in im(P ). Also P (x − P (x) = P (x) − P (P (x)) = 0, so
x − P (x is in ker(P ). This means that, for any projection P , V = im(P ) + ker(P ).
Moreover, the sum is a direct sum: if x is in im(P ) ∩ ker(P ), then x = P (x) = 0.
Therefore V = im(P ) ⊕ ker(P ).

We now show that we can reverse the process: if V is the direct sum of two subspaces,
then there is a unique projection with one of the subspaces as its image and the other
as its kernel. The construction of the projection is simple and natural. Given two
subspaces U1 and U2 of a vector space V such that V = U1 ⊕ U2 , we define a function
P : V → U1 as follows: if v is written uniquely as u + w, with u ∈ U1 and w ∈ U2 ,
then we set P (v) = u.

Theorem 14.3.2: For V = U1 ⊕ U2 , the function P : V → U1 defined above is the


unique projection such that im(P ) = U1 and ker(P ) = U2 .

Proof: First we show that P is a linear transformation. Let x and y be two vectors
in V , and let a, b be scalars. We need to show that P (ax + by) = aP (x) + bP (y).
Write x uniquely as ux + wx and y uniquely as uy + wy , with ux , uy ∈ U1 and
91

wx , wy ∈ U2 . We have ax + by = aux + awx + buy + bwy . By the definition of P ,


it follows that P (ax + by) = aux + buy = aP (x) + bP (y).

From the definition, we have that P (u) = u for all u ∈ U1 , so im(P ) contains U1 .
Also, if v is in U2 , then the unique way to write v as a sum of a vector in U1 and
a vector in U2 is as 0 + v, and hence P (v) = 0. So ker(P ) contains U2 . As the
dimension of im(P ) and the dimension of ker(P ) sum to the dimension of V , which
is dim(U1 ) + dim(U2 ), it now follows that im(P ) = U1 and ker(P ) = U2 .

Now suppose T is any projection on V such that U1 is the image of T and U2 is the
kernel of T . Letting v = u + w be any vector in V , with u ∈ U1 and w ∈ U2 , we
have T (v) = T (u) + T (w) = u + 0 = u, and therefore T = P . 2

The projection P with image U1 and kernel U2 is called the projection of V onto U1
parallel to U2 .

Given bases of U1 and U2 with V = U1 ⊕ U2 , one can define explicitly the matrix
that represents the projection onto U1 parallel to U2 .

Example: Let the vector space be R4 , set U = span({(0, 1, −1, 1)t , (1, 0, −2, 0)t }),
and W = span({(0, 0, 1, 0)t , (0, 0, 0, 1)t }). Let P be the projection onto U parallel
to W . We want to find the matrix AP representing P . We need the change of basis
matrix MB corresponding to the basis
B = {(0, 1, −1, 1)t , (1, 0, −2, 0)t , (0, 0, 1, 0)t , (0, 0, 0, 1)t }, which is:
   
0 1 0 0 0 1 0 0
 1 0 0 0 1 0 0 0
MB =  −1 −2 1 0
; M −1
B = 
2 1 1 0 .

1 0 0 1 0 −1 0 1
 
1 0 0 0
0 1 0 0
As  
0 0 0 0 is the matrix representing P with respect to the basis B, we have
0 0 0 0
that AP is equal to
     
0 1 0 0 1 0 0 0 0 1 0 0 1 0 0 0
 1 0 0 0 0 1 0 0 1 0 0 0  0 1 0 0
     
−1 −2 1 0 0 0 0 0 2 1 1 0 = −2 −1 0 0 .
1 0 0 1 0 0 0 0 0 −1 0 1 0 1 0 0
92

Lecture 15: Orthogonal Projections


15.1 Characterising orthogonal projections
A projection P : Rn → Rn is orthogonal if im(P ) and ker(P ) are orthogonal com-
plements of each other, in other words im(P )⊥ = ker(P ) and im(P ) = ker(P )⊥ .

Our aim is to give a simple characterisation of the matrices representing orthogonal


projections.

Theorem 15.1.1: An n × n real matrix A is normal with all eigenvalues real if and
only if it is symmetric.

Proof: By Theorem 13.2.3, a matrix A is Hermitian if and only if it is normal and


has real eigenvalues. If A is a real matrix, then it is Hermitian if and only if it is
symmetric, so we have the result. 2

Theorem 15.1.1 implies that all matrices representing reflections in Rn are sym-
metric, since a reflection is represented by an orthogonal matrix, which is also a
normal matrix, and it has 1 and −1 for eigenvalues. However, not all orthogonal
−1 0
and symmetric matrices represent reflections: for instance, represents
0 −1
the rotation by the angle π.

Theorem 15.1.2 A real n × n matrix A represents an orthogonal projection if and


only if A is idempotent and symmetric.

Proof: Suppose that A represents an orthogonal projection. Because it represents


a projection, it is idempotent. Let B be an orthonormal basis of R(A) and C be an
orthonormal basis of N (A). Because Rn = R(A)⊕N (A), with R(A) perpendicular to
N (A), the set B ∪ C is an orthonormal basis of eigenvectors (B for the eigenvalue 1
and C for the eigenvalue 0). This implies that A is normal, by Theorem 13.2.2.
Hence, by Theorem 15.1.1, A is symmetric.

For the converse, suppose that A is symmetric and idempotent. Because it is idem-
potent, A is a projection. As A is real and symmetric, it is Hermitian and therefore
normal. Let v1 , . . . , vn be an orthonormal set of eigenvectors for A, and order them
so that v1 , . . . , vk correspond to eigenvalue 1 and vk+1 , . . . , vn correspond to eigen-
value 0. Let U = span({v1 , . . . , vk }) and W = span({vk+1 , . . . , vn }). We now see
that U = R(A), W = N (A), U and W are orthogonal to each other, and the vector
space Rn is the direct sum of U and W . So A represents the orthogonal projection
93

onto U . 2

In the last lecture, we found the matrix A representing the projection onto
span({(0, 1, −1, 1)t , (1, 0, −2, 0)t }), such that the two given vectors (0, 0, 1, 0)t and
(0, 0, 0, 1)t were in the nullspace of A. As (0, 0, 1, 0)t and (0, 0, 0, 1)t are not perpen-
dicular to (0, 1, −1, 1)t and (1, 0, −2, 0)t , this is not an orthogonal projection, and
therefore the previous theorem tells us that  the matrix A representing
 this projection
1 0 0 0
 0 1 0 0
is not symmetric. Indeed that matrix was  −2 −1 0 0 .

0 1 0 0

15.2 Projecting to the range of a matrix


Given a subspace U of Rn , we would like to find a formula for various projections
onto U . If we have a basis of U , we can write the vectors of this basis as the columns
of a matrix A, so that U = R(A). Thus, our problem becomes: given an n × m real
matrix A of rank m, how can we find a projection onto the range of A?

Theorem 15.2.1 : If A is an n × m real matrix of rank m, and B is an m × n


real matrix such that BA also has rank m, then the n × n matrix R = A(BA)−1 B
represents the projection of Rn onto the range of A parallel to the nullspace of B.

Proof: We are given that BA, an m × m matrix, has rank m, and therefore R is
well defined. We must check that R is idempotent with the same range as A and
the same nullspace as B.

Idempotent:
R2 = [A(BA)−1 B][A(BA)−1 B] = A[(BA)−1 BA][(BA)−1 B] = A(BA)−1 B = R.

Same range: Because R is of the form AM , the range of R is contained in the range
of A. Suppose that v is in R(A). This means that there is a vector u in Rm such
that v = Au. Now Rv = RAu = [A(BA)−1 B]Au = A[(BA)−1 BA]u = Au = v.

Same nullspace: It follows directly from the definition of R that any vector in the
nullspace of B is also in the nullspace of R: N (B) ⊆ N (R).
To show that the two nullspaces are equal, it suffices to show that they have the
same dimension. By the rank-nullity theorem, it suffices to show that B and R have
the same rank.
Because R has the same range as A, we have r(R) = r(A) = m. Also r(B) ≥
r(BA) = m, but B is an m × n matrix, so its rank is exactly m. 2

Theorem 15.2.2: If A is an n × m real matrix of rank m, then R = A(At A)−1 At


represents the orthogonal projection onto the range of A.

Proof: By Theorem 15.2.1, R represents projection onto R(A) parallel to N (At ).


But N (At ) = R(A)⊥ , by Theorem 14.2.2, so this is the orthogonal projection onto
R(A). 2
 
Example: Let $A = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}$. To find the matrix R representing the orthogonal projection onto R(A), we calculate:
$$A^t = \begin{pmatrix} 1 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix}; \qquad A^t A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}; \qquad (A^t A)^{-1} = \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix};$$
$$R = A(A^t A)^{-1}A^t = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 2/3 & 1/3 \\ 1/3 & 2/3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \\ -1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 2/3 & -1/3 & 1/3 \\ -1/3 & 2/3 & 1/3 \\ 1/3 & 1/3 & 2/3 \end{pmatrix}.$$
Notice that R takes each column of A to itself: R(1, 0, 1)t = (1, 0, 1)t , R(−1, 1, 0)t =
(−1, 1, 0)t . Also, (1, 1, −1)t is a vector in N (At ) = R(A)⊥ , and R(1, 1, −1)t = 0, as
expected.
 
Suppose we use the matrix $B = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$ in place of $A^t$. Then we get:
$$BA = \begin{pmatrix} 2 & -1 \\ 0 & 1 \end{pmatrix}; \qquad (BA)^{-1} = \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix};$$
$$R' = A(BA)^{-1}B = \begin{pmatrix} 1 & -1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1/2 & 1/2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1/2 & -1/2 & 1/2 \\ 0 & 1 & 0 \\ 1/2 & 1/2 & 1/2 \end{pmatrix}.$$
It is easy to check that R′ is idempotent and its range, span({(1, 0, 1)t , (−1, 2, 1)t }), is equal to R(A). Furthermore, N (B) = span({(−1, 0, 1)t }), and it is easy to check that this is also the nullspace of R′. The vector (−1, 0, 1)t is not orthogonal to R(A): this is a different projection onto R(A).
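The two projections above are easy to check numerically. The following sketch (in Python with NumPy; the code is purely illustrative and not part of the course) builds R and R′ from the formulae of Theorems 15.2.1 and 15.2.2 and confirms the properties just described.

    import numpy as np

    A = np.array([[1., -1.], [0., 1.], [1., 0.]])

    # Orthogonal projection onto R(A):  R = A (A^t A)^{-1} A^t
    R = A @ np.linalg.inv(A.T @ A) @ A.T

    # Oblique projection onto R(A) parallel to N(B), using B in place of A^t
    B = np.array([[1., 0., 1.], [0., 1., 0.]])
    R2 = A @ np.linalg.inv(B @ A) @ B

    print(np.allclose(R @ R, R), np.allclose(R, R.T))      # idempotent and symmetric
    print(np.allclose(R2 @ R2, R2), np.allclose(R2, R2.T)) # idempotent, but not symmetric
    print(R @ np.array([1., 1., -1.]))                     # (1,1,-1)^t lies in R(A)-perp, so this is 0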
Lecture 16: Least Squares Approximation


16.1 Minimising the distance to a subspace
Let U be a subspace of an inner product space V , and let P be a projection from
V onto U .
For any particular vector v in V , one could choose a projection P : V → U that
minimises ||v − P (v)||, the distance from v to its image P (v) in U . What we now
show is that this optimal projection P is the orthogonal projection onto U , which
therefore accomplishes this minimisation for all vectors v simultaneously.
Theorem 16.1.1: Let V be a complex or real inner product space, and let U be a
finite-dimensional subspace of V . Let P be the orthogonal projection onto U (whose
kernel is U ⊥ ). For any vector v in V , the vector P (v) is the unique point in U at
minimum distance from v.
Proof: Let u be any vector in U , and consider its distance to v. Because we can
write u − v as (u − P (v)) + (P (v) − v), we can write hu − v, u − vi as

h(u − P (v)) + (P (v) − v), (u − P (v)) + (P (v) − v)i.

As u − P (v) is in U and P (v) − v is in the kernel of P , which is U ⊥ , this quantity can be re-written as
$$\langle u - P(v), u - P(v)\rangle + \langle P(v) - v, P(v) - v\rangle.$$
The second inner product does not depend on u, but the first varies with the choice of u. Because P (v) is in U , the choice u = P (v) is legitimate; it makes the first term zero, and since $\langle u - P(v), u - P(v)\rangle > 0$ for every other u in U , it is the unique choice that minimises the distance. 2

16.2 Fitting functions to data


Suppose that we are given some empirical data relating two variables, and we want
to understand one variable as a function of the other. For example, we may wish
to relate the amount that a person has saved in a year with their income in that
year. In this type of situation, one variable is not actually a function of the other;
the quantity that someone has saved is determined by many more personal factors
than their income. Even with variables found in physics, there is no perfect relation
between variables. For instance, the time required for a 1 kilogram sphere of lead
to fall a distance of 10 meters will not be determined solely by the distance chosen
– there will be minor influences of the wind and air pressure. Additionally, there
will be errors in measurement. We will discuss an approach to the empirical data,


using linear algebra, that generates approximate functional relationships with both
simplicity and accuracy.

Assume that D × R is the set on which one has data, meaning that the data consists
of pairs of points (d, r) ∈ D × R, such that we would like to establish a functional
relationship between points in D and values in R. Now assume further that the
reasonable functions f for which the pair (d, r) should satisfy r = f (d) form a finite-
dimensional subspace U of all the functions from D to R. Let {f1 , f2 , . . . , fm } be a
basis for this finite-dimensional subspace U .

Assume that our data is in the form of pairs (d1 , r1 ), (d2 , r2 ), . . . , (dn , rn ), with n ≥ m.
Every function f in the subspace U also generates pairs (d1 , f (d1 )), . . . , (dn , f (dn )).
The accuracy of a function f in U as an approximation to the given data can be
measured by the distance between (f (d1 ), f (d2 ), . . . , f (dn ))t and (r1 , r2 , . . . , rn )t in
Rn , most naturally with respect to the Euclidean norm.

As an example, let us imagine that the peak daytime temperature in London is


a function of the time of year. Let the angle 0 = 2π stand for 1st January, and
the angle π stand for 1st July: we hypothesise that the function representing this
relationship is expressed by a linear combination of the three functions:
f1 (t) = 1, f2 (t) = cos t and f3 (t) = sin t.
Suppose we have the following data from some year, one temperature measurement
every two months (one-sixth of the year), in Celsius, with the first measurement on
1st January:
(0, 2), (π/3, 10), (2π/3, 18), (π, 24), (4π/3, 22), (5π/3, 14).
After a little experimentation, we might come up with a function in our space
approximating the data like f (t) = 15 − 10 cos t. That would give f (0) = 5 and
f (π) = 25. But we notice that the coldest and hottest times of the year seem not
to be precisely at t = 0 and t = π – as would be predicted by a function of the form
f (t) = A − B cos t – but rather sometime later. A better approximation is given by
f (t) = 15 − 10 cos t − 2 sin t.
We seek a method for discovering the “best” approximation from our subset of
functions.
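As a preview of the method developed in the next lecture, here is a small numerical sketch (Python with NumPy, illustrative only) that computes the least-squares combination of 1, cos t and sin t for the six measurements above. The coefficients come out close to the hand-picked guess 15 − 10 cos t − 2 sin t.

    import numpy as np

    t = np.arange(6) * np.pi / 3                  # 0, pi/3, ..., 5*pi/3
    y = np.array([2., 10., 18., 24., 22., 14.])   # measured temperatures

    # Columns: the basis functions 1, cos t, sin t evaluated at the data points
    A = np.column_stack([np.ones_like(t), np.cos(t), np.sin(t)])

    # Least-squares coefficients; equivalently (A^t A)^{-1} A^t y, as in Section 16.3
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(coeffs)   # roughly [15, -10, -2.31]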
16.3 Least-squares approximation


We continue to consider the problem in the previous section, where we seek the best
approximation to given data within a subspace U of the set of all functions.
Let B = (f1 , . . . , fm ) be an ordered basis of the space U of functions from D to R,
and let d1 , . . . , dn be the points in D at which we have data measurements. Notice
that the map T from U to Rn defined by T (f ) = (f (d1 ), f (d2 ), . . . , f (dn ))t is a linear
transformation. We can represent T by a matrix $A^{E,B}_T$, which from now on will be called simply A.

Assume additionally that the transformation T is injective, meaning that distinct


functions f, g in U result in distinct points (f (d1 ), . . . , f (dn ))t , (g(d1 ), . . . , g(dn ))t .
This means that, on the subspace im(T ), the transformation T has an inverse, and
the dimension of the im(T ) is equal to m, the dimension of U . The orthogonal pro-
jection P of Rn onto the subspace im(T ) is represented by the matrix A(At A)−1 At ,
as shown in Lecture 15.
Suppose the coordinates of our data are given by the vector v = (r1 , . . . , rn )t in Rn .
By Theorem 16.1.1, the nearest point in im(T ) to the vector v will be the point
P (v).

What we really want is the function that approximates this data, not the data
produced by this function. However, this function is not difficult to find. Because
the matrix A represents a linear transformation from U to a subspace of the same
dimension as U , there can be only one function in U that is mapped to P (v). The
vector (At A)−1 At v in Rm corresponds to a function in U , written in terms of the
basis B, that gives the data P (v), since A((At A)−1 At v) = P (v). Since there is
only one function that does this, it must be the function we are looking for, and so
applying the matrix (At A)−1 At to the data v gives the answer.

Example: Suppose we drop a 1kg lead sphere from the heights of 1, 2, 3, and 4
meters and measure the time it takes for the sphere to hit the ground. Suppose
we record the data (1, 7), (2, 10), (3, 12), (4, 15) (where the time is a multiple of
1/20th of a second) and our hypothesis is that the correct function is of the form
f = a + bx + cx2 for some constants a, b and c. We want to find the values of a, b,
c giving the best approximation to the given data.
Let e1 = (1, 0, 0)t correspond to the function 1, e2 = (0, 1, 0)t to the function x, and
e3 = (0, 0, 1)t to the function x2 – this means that we are taking the ordered basis
(f1 , f2 , f3 ) of our space U of approximating functions, where f1 (x) = 1, f2 (x) = x and
f3 (x) = x2 . The independent values where we have data measurements are d1 = 1, d2 = 2, d3 = 3 and d4 = 4. The matrix A representing the linear transformation mapping each f in U to (f (d1 ), f (d2 ), f (d3 ), f (d4 ))t is:
$$A = \begin{pmatrix} f_1(d_1) & f_2(d_1) & f_3(d_1) \\ f_1(d_2) & f_2(d_2) & f_3(d_2) \\ f_1(d_3) & f_2(d_3) & f_3(d_3) \\ f_1(d_4) & f_2(d_4) & f_3(d_4) \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 4 & 16 \end{pmatrix}.$$
We need the matrix (At A)−1 At , and we calculate:
$$A^t A = \begin{pmatrix} 4 & 10 & 30 \\ 10 & 30 & 100 \\ 30 & 100 & 354 \end{pmatrix}; \qquad (A^t A)^{-1} = \begin{pmatrix} 31/4 & -27/4 & 5/4 \\ -27/4 & 129/20 & -5/4 \\ 5/4 & -5/4 & 1/4 \end{pmatrix};$$
$$(A^t A)^{-1}A^t = \begin{pmatrix} 31/4 & -27/4 & 5/4 \\ -27/4 & 129/20 & -5/4 \\ 5/4 & -5/4 & 1/4 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 2 & 3 & 4 \\ 1 & 4 & 9 & 16 \end{pmatrix} = \begin{pmatrix} 9/4 & -3/4 & -5/4 & 3/4 \\ -31/20 & 23/20 & 27/20 & -19/20 \\ 1/4 & -1/4 & -1/4 & 1/4 \end{pmatrix}.$$
The matrix (At A)−1 At should take the vectors (1, 1, 1, 1)t , (1, 2, 3, 4)t , and (1, 4, 9, 16)t to (1, 0, 0)t , (0, 1, 0)t and (0, 0, 1)t respectively, and this can be checked.

Applying the above matrix to the vector consisting of our data measurements v = (r1 , r2 , r3 , r4 )t = (7, 10, 12, 15)t , we get:
$$\begin{pmatrix} 9/4 & -3/4 & -5/4 & 3/4 \\ -31/20 & 23/20 & 27/20 & -19/20 \\ 1/4 & -1/4 & -1/4 & 1/4 \end{pmatrix} \begin{pmatrix} 7 \\ 10 \\ 12 \\ 15 \end{pmatrix} = \begin{pmatrix} 9/2 \\ 26/10 \\ 0 \end{pmatrix}.$$
This means that the best approximation in span({1, x, x2 }) to this data is the function f (x) = (26/10)x + 9/2. This function fits the data rather well, as the following table shows.

    di       1      2      3      4
    f (di )  7.1    9.7    12.3   14.9
    ri       7      10     12     15
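The calculation above can be reproduced numerically. The sketch below (Python with NumPy, illustrative only) forms the matrix A for the data points d = 1, 2, 3, 4, applies (At A)−1 At to the measurements, and recovers the coefficients 9/2, 26/10 and 0.

    import numpy as np

    d = np.array([1., 2., 3., 4.])      # drop heights
    r = np.array([7., 10., 12., 15.])   # measured times, in units of 1/20 second

    # Columns: the basis functions 1, x, x^2 evaluated at the data points
    A = np.column_stack([np.ones_like(d), d, d**2])

    left_inv = np.linalg.inv(A.T @ A) @ A.T   # the matrix (A^t A)^{-1} A^t
    coeffs = left_inv @ r
    print(coeffs)          # approximately [4.5, 2.6, 0.0]
    print(A @ coeffs)      # fitted values; compare with r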

This is called the method of least-squares approximation, because what is being min-
imised is the distance from the real data (r1 , . . . , rn )t to the estimate (f (d1 ), . . . , f (dn ))t ,
where “distance” means Euclidean distance in Rn . Equivalently, we are minimising the square distance, which is equal to the sum of the squares $\sum_{i=1}^{n} (f(d_i) - r_i)^2$ of the “errors”.

16.4 Modelling Issues


One should not expect polynomial functions to represent well the time for a ball to
fall as a dependent variable of the distance (as an independent variable). If x is the
time that a metal ball has been falling, then y = f (x) = ax2 represents the distance
that a metal ball falls in the time x (given that it has not hit the ground and that
the influence of air resistance is minimal). If y is to be taken as the independent variable, then we have $x = \sqrt{y/a}$, and if our data reflects this relationship then
we won’t be able to find a polynomial that fits the data well. For this example,
the space of functions in which we choose to find our approximation should include the
square root function. More generally, the space of functions within which we seek
our approximation should be influenced by the underlying scientific theory.
We introduced another problem with our solution above: we included the constant functions in our space of functions. In our approximation, we get 9/2 as the coefficient of the constant function 1. We don’t expect 9/2 to be a good answer for the time it takes for a ball to fall a distance of zero!
We have chosen an ill-suited space of functions. From this inappropriate choice, we
got an answer that was mathematically correct (and moreover that fit the data very
well), but missed the appropriate relationship between independent and dependent
variables. By collecting more and more data, we can learn more about what should
be the correct space of approximating functions. Although mathematics can and
should assist in these discoveries, the problem of discovering the correct space of
functions is primarily a problem of science, not of mathematics.
Lecture 17: Left and Right Inverses


17.1 Left inverses
In this lecture, we treat only real vector spaces, though the arguments for complex
vector spaces are very similar.
When A is an n × m real matrix of rank m, the range of A has dimension m and
the nullspace is {0}. Because the range of A and Rm have the same dimension, we
would expect to find an m × n matrix that reverses what A does.

Definition: If A is an n × m matrix, then an m × n matrix Al is a left inverse of


A if Al A = Im .

We know already how to find a left inverse; in Lecture 16, when we found a function
that was the best fit to given data, we did it by using a left inverse. In that case
the matrix (At A)−1 At was a left inverse.

Theorem 17.1.1: If A is an n × m real matrix of rank m, and B is an m × n matrix


such that BA is invertible, then (BA)−1 B is a left inverse of A.

Proof: ((BA)−1 B)A = (BA)−1 (BA) = Im . 2

Every left inverse of A can be created with the formula (BA)−1 B, for some matrix
B. Indeed, if Al is a left inverse of A, then (Al A)−1 = Im , and putting B = Al in
the formula gives (Al A)−1 Al = Al .

Theorem 17.1.2: If A is an n × m matrix of rank m, and Al and Ax are two left


inverses of A, then (Al − Ax )A is the zero matrix. Furthermore, if the nullspace of
Al equals the nullspace of Ax , then Al equals Ax .
Conversely if Al is a left inverse of A and B is an m × n matrix such that BA is the
zero matrix, then Al + B is also a left inverse of A.

Proof: Since for every vector v in Rm we have (Al A)v = v and (Ax A)v = v it
follows that (Al − Ax )A is the zero matrix.
Suppose N (Al ) = N (Ax ). Since both Al and Ax are left inverses of A, they agree on
what they do to vectors in the range of A. They also agree on what they do to vectors
in N (Al ), namely they send these elements to 0. We claim that Rn = R(A)+ N (Al ):
this implies that Al and Ax agree on a basis of Rn , and therefore are the same matrix.
To show that Rn = R(A) + N (Al ), let v be any vector in Rn . We see that AAl v
is in R(A), and v − AAl v is in N (Al ), since Al (v − AAl v) = Al v − (Al A)Al v =


Al v − Al v = 0.
For the last part, given any v in Rm , (Al + B)Av = Al Av = v, hence Al + B is also
a left inverse for A. 2

Corollary 17.1.3: If an n × m matrix A has rank m, then there is one and only
one left inverse Ag of A such that N (Ag ) = R(A), namely Ag = (At A)−1 At .

17.2 Right inverses


When A is an n × m matrix of rank n, and n < m, then there can be no m × n
matrix mapping Av back to the vector v for all v in Rm . However we would expect
to find an m × n matrix B for which A is a left inverse to B.

Definition: If A is an n × m matrix of rank n, then an m × n matrix Ar is a right


inverse of A if AAr = In .

Right and left inverses are closely related to each other. From the definitions, we
see that if Al is a left inverse of A then A is a right inverse of Al , and likewise if Ar
is a right inverse of A then A is a left inverse of Ar .

Theorem 17.2.1: If A is an n × m matrix of rank n, and B is an m × n matrix,


then the following are equivalent:
(1) the n × n matrix AB is invertible,
(2) AB is invertible, and B(AB)−1 is a right inverse of A,
(3) B has rank n and N (A) ∩ R(B) = {0}.
If these conditions are met, then the range of B is equal to the range of B(AB)−1 .

Proof: (1) ⇒ (2): As AB(AB)−1 is the n × n identity matrix, B(AB)−1 is a right


inverse for A.
(2) ⇒ (3): Because B is an m × n matrix, the rank of B cannot be more than n.
It also cannot be less than n, since otherwise AB would also have rank less than n
and hence not be invertible.
Let v be in both N (A) and R(B). This means that there is some vector w in Rn
such that v = Bw and ABw = 0. But, since AB is invertible, this means that w
is 0, and therefore v = Bw = 0.
(3) ⇒ (1): For the invertibility of AB, we need that the nullspace of AB is {0}. If
ABv = 0 for some vector v, then Bv is in the nullspace of A. By condition (3), we
have Bv = 0. But the rank of B is n, which implies that v = 0.
Lastly, because the range of (AB)−1 is the whole space Rn , the range of B(AB)−1
is equal to the range of B. 2

If Ar is a right inverse of A, then (AAr )−1 is the n × n identity, making Ar (AAr )−1 =
Ar a right inverse. This tells us that every right inverse of A can be created by the
formula B(AB)−1 for some matrix B so that AB is invertible.

Theorem 17.2.2: If A is an n × m matrix of rank n, and Ar , Ax are two right


inverses of A, then A(Ar − Ax ) is the zero matrix. Furthermore, if R(Ar ) = R(Ax ),
then Ar equals Ax .
Conversely, if Ar is a right inverse of A and B is an n × m matrix such that AB is
the zero matrix, then Ar + B is also a right inverse of A.

Proof: Since, for every vector v, (AAr )v = v and (AAx )v = v it follows that
A(Ar − Ax ) is the zero matrix.
Now suppose that Ar and Ax have the same ranges, and let v be any vector in Rn .
As Ar v is in the range of Ax , Ar v is equal to Ax w for some w ∈ Rn . By the right
inverse property, v = AAr v = AAx w = w, and it follows that Ar v = Ax w = Ax v.
Therefore Ar = Ax .
For the last part, if Ar and B are as given, then A(Ar +B) = AAr +AB = In +0 = In ,
so Ar + B is also a right inverse for A. 2

Corollary 17.2.3: If an n × m matrix A has rank n, then there is one and only
one right inverse Ag of A such that the range of Ag is equal to N (A)⊥ , namely
Ag = At (AAt )−1 .

Proof: The range of the given matrix Ag is contained in the range of At , which is
N (A)⊥ . Moreover, any right inverse of A has rank n, while the dimension of N (A)
is m − n, so the dimension of N (A)⊥ is n. Therefore the range of Ag is equal to
N (A)⊥ .
By the previous result, any two right inverses of A with range equal to N (A)⊥ are
equal. 2

Example: Consider the matrix
$$A = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix}.$$
We choose two different 3 × 2 matrices, $A^t$ and B, and use these to construct two different right inverses of A:
$$A^t = \begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}; \qquad B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & 0 \end{pmatrix}; \qquad AA^t = \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}; \qquad AB = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix};$$
$$(AA^t)^{-1} = \begin{pmatrix} 5/6 & -1/3 \\ -1/3 & 1/3 \end{pmatrix}; \qquad (AB)^{-1} = \begin{pmatrix} -2 & 1 \\ 1 & 0 \end{pmatrix};$$
$$A^g = A^t(AA^t)^{-1} = \begin{pmatrix} 1/6 & 1/3 \\ -1/3 & 1/3 \\ 5/6 & -1/3 \end{pmatrix}; \qquad A^x = B(AB)^{-1} = \begin{pmatrix} 1 & 0 \\ -2 & 1 \\ 0 & 0 \end{pmatrix}.$$
As a check:
$$A(A^x - A^g) = \begin{pmatrix} 1 & 0 & 1 \\ 2 & 1 & 0 \end{pmatrix} \begin{pmatrix} 5/6 & -1/3 \\ -5/3 & 2/3 \\ -5/6 & 1/3 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
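A quick numerical check of this example (Python with NumPy, illustrative only): both matrices are right inverses of A, and their difference is annihilated by A, as Theorem 17.2.2 predicts.

    import numpy as np

    A = np.array([[1., 0., 1.], [2., 1., 0.]])

    # Right inverse whose range is N(A)-perp:  A^t (A A^t)^{-1}
    Ag = A.T @ np.linalg.inv(A @ A.T)

    # A different right inverse, built from another 3 x 2 matrix B
    B = np.array([[0., 1.], [1., 0.], [0., 0.]])
    Ax = B @ np.linalg.inv(A @ B)

    print(np.allclose(A @ Ag, np.eye(2)), np.allclose(A @ Ax, np.eye(2)))  # both are right inverses
    print(np.allclose(A @ (Ax - Ag), np.zeros((2, 2))))                    # A(Ax - Ag) = 0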

17.3 Generalised Inverses


We’ve seen how to find the “next best thing” to an inverse of A if A is an n × m
matrix of “full rank”, i.e., rank equal to the minimum of n and m. What can be
done if A is an n × m matrix of rank k, where k < n and k < m?

Let’s ask what properties left and right inverses have in common. If A has a left
inverse Al , then AAl A = A(Al A) = A. If A has a right inverse Ar , then AAr A =
(AAr )A = A. This inspires the following definition.

Definition: If A is an n × m matrix then an m × n matrix B is a weak generalised


inverse of A if ABA = A.

Theorem 17.3.1: If Ax and Ay are both weak generalised inverses of A, then


A(Ax − Ay )A is the zero matrix. Conversely, if Ax is a weak generalised inverse of
A, and W is any matrix of the right dimensions such that AW A is the zero matrix,
then Ax + W is also a weak generalised inverse of A.

Proof: Since A = AAx A = AAy A, we have that A(Ax − Ay )A = A − A = 0. Conversely, if AW A is the zero matrix, then A(Ax + W )A = AAx A + AW A = AAx A = A. 2

Definition: If A is an n × m matrix, then an m × n matrix B is a double-sided


generalised inverse of A if B is a weak generalised inverse of A and also A is a weak
generalised inverse of B, i.e.,
(1) ABA = A and


(2) BAB = B.

Notice that, if A has a left inverse Al , then A and Al are double-sided generalised inverses of each other, since AAl A = A(Al A) = A and Al AAl = (Al A)Al = Al . Similarly, if Ar is a right inverse of A, then A and Ar are double-sided generalised inverses of each other.

Theorem 17.3.2: If A is an n × m matrix of rank k, and B is a double-sided


generalised inverse of A, then the rank of B is also k and between the subspaces
R(B) and R(A) the matrices A and B represent linear transformations that are
inverses of each other – i.e., for every u in R(B), we have BAu = u, and for every
v in R(A) we have ABv = v.
Conversely if A and B are inverses between the subspaces R(A) and R(B), then B
and A are double-sided generalised inverses of each other.

Proof: Let u be a vector in R(B), meaning that u = Bv for some v in Rn . By condition (2), BABv = Bv, which means that BAu = u. The same argument shows that ABv = v for all v in the range of A. Since they are inverses of each
other on these subspaces, these subspaces have the same dimension and hence the
rank of B is also k.
Conversely, if BAv = v for all v in R(B), then, for any w in Rn , BABw = Bw.
Thus BAB = B. The same argument shows that ABA = A. 2

We now show how to construct a double-sided generalised inverse for any matrix A.
The first step is to write A as a product of two full-rank matrices.
Theorem 17.3.3: Let A be an n × m matrix of rank k > 0. The matrix A can be written as a product BC of an n × k matrix B of rank k and a k × m matrix C of rank k.

Proof: Place a basis {v1 , . . . , vk } of R(A) into the k columns of B.


For every j, the vector Aej is in R(A), and so can be written as Aej = c1j v1 + · · · +
ckj vk for some real numbers c1j , . . . , ckj . Form C by placing these numbers in the
jth column of C.
By following what happens to each ej , we see that A = BC. Since B is n × k and
C is k × m, neither can have rank larger than k. As k = r(A) ≤ min(r(B), r(C)),
both B and C have rank k. 2
Corollary 17.3.4: Every non-zero n × m matrix A has a double-sided generalised


inverse.

Proof: Let k = r(A). Write A = BC, where B is an n × k matrix and C is a k × m matrix, both of rank k. Let B l be a left inverse of B, and C r a right inverse of C.
We claim that C r B l is a double-sided generalised inverse of A. Indeed, we have:
BCC r B l BC = B(CC r )(B l B)C = BC and C r B l BCC r B l = C r (B l B)(CC r )B l =
C rBl. 2

Given an n × m matrix A of rank k, how does one find an n × k matrix B and a


k × m matrix C, both of rank k, such that A = BC? Sometimes we can just follow
the proof of Theorem 17.3.3: form B from k linearly independent columns of A,
and let the matrix C encode how each column of A is a linear combination of the
columns of B. If our method of finding the linearly independent columns is to put
A into reduced row echelon form, then the final matrix can be used as C.
We’ll just illustrate this with an example.
 
Example: Let $A = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix}$. Row-reductions lead to the 3 × 4 matrix
$$C = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$
of rank 3.
There are leading 1s in the first three columns of C, so the first three columns of A are linearly independent, and we write $B = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$. This is a 4 × 3 matrix of rank 3 such that R(B) = R(A).
Moreover, the final column of C encodes the fact that the fourth column of A is equal to the first column minus the second column plus the third. Therefore
$$BC = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \end{pmatrix} = A.$$
Example: We demonstrate how to find a double-sided generalised inverse of the 3 × 4 matrix
$$A = \begin{pmatrix} 3 & 1 & -1 & 2 \\ -1 & -1 & 2 & -1 \\ 1 & -1 & 3 & 0 \end{pmatrix}.$$
First, we row-reduce the matrix A. Adding the third row to the second, and subtracting three times the third row from the first row leads to:
$$\begin{pmatrix} 0 & 4 & -10 & 2 \\ 0 & -2 & 5 & -1 \\ 1 & -1 & 3 & 0 \end{pmatrix}.$$
Notice that the first row is a multiple of the second row. Eliminating the first row, negating the second row and switching the rows leads to
$$C = \begin{pmatrix} 1 & -1 & 3 & 0 \\ 0 & 2 & -5 & 1 \end{pmatrix}.$$
This will do for our matrix C, as the matrix encodes how to write each column of A as a linear combination of the first and fourth columns of A, so we should take these two columns as the columns of B:
$$B = \begin{pmatrix} 3 & 2 \\ -1 & -1 \\ 1 & 0 \end{pmatrix},$$
and verify that
$$A = \begin{pmatrix} 3 & 1 & -1 & 2 \\ -1 & -1 & 2 & -1 \\ 1 & -1 & 3 & 0 \end{pmatrix} = BC = \begin{pmatrix} 3 & 2 \\ -1 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & -1 & 3 & 0 \\ 0 & 2 & -5 & 1 \end{pmatrix}.$$

Next, we find a left inverse of B and a right inverse of C. We can save work by choosing simple matrices P and Q to use in the formulae B l = (P B)−1 P and C r = Q(CQ)−1 . For instance, we might take:
$$P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}; \qquad Q = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix},$$
and now:
$$PB = \begin{pmatrix} 3 & 2 \\ -1 & -1 \end{pmatrix}; \qquad CQ = \begin{pmatrix} 1 & -1 \\ 0 & 2 \end{pmatrix};$$
$$(PB)^{-1} = \begin{pmatrix} 1 & 2 \\ -1 & -3 \end{pmatrix}; \qquad (CQ)^{-1} = \begin{pmatrix} 1 & 1/2 \\ 0 & 1/2 \end{pmatrix};$$
$$B^l = (PB)^{-1}P = \begin{pmatrix} 1 & 2 \\ -1 & -3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 \\ -1 & -3 & 0 \end{pmatrix};$$
$$C^r = Q(CQ)^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1/2 \\ 0 & 1/2 \end{pmatrix} = \begin{pmatrix} 1 & 1/2 \\ 0 & 1/2 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}.$$
Therefore, a double-sided generalised inverse for A is
$$C^r B^l = \begin{pmatrix} 1 & 1/2 \\ 0 & 1/2 \\ 0 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 \\ -1 & -3 & 0 \end{pmatrix} = \begin{pmatrix} 1/2 & 1/2 & 0 \\ -1/2 & -3/2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
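The whole construction is easy to verify numerically. The sketch below (Python with NumPy, illustrative only) rebuilds B l , C r and C r B l for this example and checks the two defining properties of a double-sided generalised inverse.

    import numpy as np

    A = np.array([[3., 1., -1., 2.], [-1., -1., 2., -1.], [1., -1., 3., 0.]])
    B = np.array([[3., 2.], [-1., -1.], [1., 0.]])        # n x k, rank k
    C = np.array([[1., -1., 3., 0.], [0., 2., -5., 1.]])  # k x m, rank k
    assert np.allclose(B @ C, A)                          # full-rank factorisation A = BC

    # Left inverse of B and right inverse of C via the simple matrices P and Q
    P = np.array([[1., 0., 0.], [0., 1., 0.]])
    Q = np.array([[1., 0.], [0., 1.], [0., 0.], [0., 0.]])
    Bl = np.linalg.inv(P @ B) @ P
    Cr = Q @ np.linalg.inv(C @ Q)

    G = Cr @ Bl   # a double-sided generalised inverse of A
    print(np.allclose(A @ G @ A, A), np.allclose(G @ A @ G, G))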
Lecture 18: The Strong Generalised Inverse


18.1 Uniqueness
We have seen that one can construct a double-sided generalised inverse by writing
A = BC, and taking a left inverse of B and a right inverse of C. Can all double-sided
generalised inverses be constructed in this way? The answer is yes, and moreover it
doesn’t matter how we choose B and C.

Theorem 18.1.1: If A is an n × m matrix of rank k, and Ag is a double-sided


generalised inverse of A, then for every choice of an n × k matrix B and a k × m
matrix C, such that A = BC, the matrix Ag will equal C r B l , where B l is a left
inverse of B and C r is a right inverse of C.

Proof: Define a k × n matrix B l and an m × k matrix C r by B l = CAg and


C r = Ag B. We have C r B l = Ag BCAg = Ag AAg = Ag .
Next, we show that C r is a right inverse of C and B l is a left inverse of B, i.e.,
that B l B = Ik = CC r . Both B l B and CC r equal CAg B. We need to show that
CAg Bv = v for all v ∈ Rk .
Because N (B) = {0}, it suffices to show that BCAg Bv = Bv for all v ∈ Rk .
Because every v ∈ Rk is equal to Cw for some w ∈ Rm , it suffices to show that
BCAg BCw = BCw for all w ∈ Rm . This is true because BC = A, and A and Ag
are double-sided generalised inverses of each other. 2

Theorem 18.1.2: If A is an n × m matrix and both Ax and Ay are double-sided


generalised inverses of A such that N (Ax ) = N (Ay ) and R(Ax ) = R(Ay ), then
Ax = Ay .

Proof: Choose any n × k matrix B of rank k and k × m matrix C of rank k such


that A = BC. By Theorem 18.1.1, the inverse Ax is equal to C x B x and the inverse
Ay is equal to C y B y , for some pair B x , B y of left inverses of B and some pair C x , C y
of right inverses of C. But, as the nullspace of B x is the same as the nullspace of Ax
and the nullspace of B y is the same as the nullspace of Ay , by Theorem 17.1.2, we
conclude that B x = B y . Likewise the range of C x is equal to the range of Ax and
the range of C y is equal to the range of Ay , and therefore, by Theorem 17.2.2, C x is
equal to C y . We conclude that Ax is equal to Ay . 2
18.2 The strong generalised inverse


A double-sided generalised inverse Ag of A is a strong generalised inverse of A if
additionally
(3) (AAg )t = AAg
(4) (Ag A)t = Ag A

Theorem 18.2.1: For any pair A, Ag of double-sided generalised inverses, the


nullspaces of Ag , AAg and Ag AAg are all equal and the ranges of Ag , Ag A, and
Ag AAg are all equal.

Proof: For all matrices A and Ag of appropriate dimensions, the nullspace of Ag


is contained in the nullspace of AAg , and the nullspace of AAg is contained in the
nullspace of Ag AAg . Since also Ag AAg = Ag , all the nullspaces are equal to each
other.

For all matrices A and Ag of appropriate dimensions the range of Ag AAg is contained
in the range of Ag A and the range of Ag A is contained in the range of Ag . But since
Ag AAg is equal to Ag we conclude that all the images are equal to each other. 2

Theorem 18.2.2: For every non-zero matrix A, the strong generalised inverse As
exists and is unique.

Proof: First we show uniqueness. Let As and Ag be two matrices that satisfy the
four properties, namely that both the pair A, As and the pair A, Ag are double-sided
generalised inverses of each other, and that all four matrices AAs , As A, AAg and
Ag A are symmetric. By Theorem 18.1.2, we need only to show that the nullspaces
of As and Ag are the same and that the ranges of As and Ag are the same.

As AAg = (AAg )t , Theorem 14.2.2 tells us that N (AAg ) and R(AAg ) are orthogonal
complements. By Theorem 18.2.1, N (AAg ) = N (Ag ) and R(AAg ) = R(A), and
so N (Ag ) and R(A) are orthogonal complements. Similarly, R(Ag ) and N (A) are
orthogonal complements. The same reasoning applies to As , and hence it follows
from Theorem 18.1.2 that As = Ag .

It remains to show the existence of the strong generalised inverse. Given a non-zero
n × m matrix A, by Theorem 17.3.3, if k = r(A) then A can be written as BC,
where B is an n × k matrix of rank k and C is a k × m matrix of rank k. Define

As = C t (CC t )−1 (B t B)−1 B t .


By Corollary 17.3.4, As is a double-sided generalised inverse of A.

To show that As is a strong generalised inverse, we need to show that AAs and
As A are symmetric. We have AAs = BCC t (CC t )−1 (B t B)−1 B t = B(B t B)−1 B t and
As A = C t (CC t )−1 (B t B)−1 B t BC = C t (CC t )−1 C. Because (M t )−1 = (M −1 )t for all
invertible matrices M , it follows that both these matrices are symmetric. 2

Example: Let A be the matrix
$$A = \begin{pmatrix} -2 & 1 & 2 \\ 1 & 1 & -1 \\ 1 & 4 & -1 \end{pmatrix}.$$
Row-reduction, and elimination of zero rows, leads to the matrix
$$C = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix}.$$
Inspection of C tells us that for B we should use the first two columns of A, but with the second column first. Now we calculate As :
$$A = BC = \begin{pmatrix} 1 & -2 \\ 1 & 1 \\ 4 & 1 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix};$$
$$B^t B = \begin{pmatrix} 1 & 1 & 4 \\ -2 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -2 \\ 1 & 1 \\ 4 & 1 \end{pmatrix} = \begin{pmatrix} 18 & 3 \\ 3 & 6 \end{pmatrix}; \qquad CC^t = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix};$$
$$(B^t B)^{-1} = \begin{pmatrix} 2/33 & -1/33 \\ -1/33 & 2/11 \end{pmatrix}; \qquad (CC^t)^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix};$$
$$A^s = C^t(CC^t)^{-1}(B^t B)^{-1}B^t = \begin{pmatrix} 0 & 1 \\ 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1/2 \end{pmatrix} \begin{pmatrix} 2/33 & -1/33 \\ -1/33 & 2/11 \end{pmatrix} \begin{pmatrix} 1 & 1 & 4 \\ -2 & 1 & 1 \end{pmatrix}$$
$$= \begin{pmatrix} 0 & 1/2 \\ 1 & 0 \\ 0 & -1/2 \end{pmatrix} \begin{pmatrix} 4/33 & 1/33 & 7/33 \\ -13/33 & 5/33 & 2/33 \end{pmatrix} = \begin{pmatrix} -13/66 & 5/66 & 1/33 \\ 4/33 & 1/33 & 7/33 \\ 13/66 & -5/66 & -1/33 \end{pmatrix}.$$
It is feasible to carry out some checks, for instance we note that
$$A^s A = \begin{pmatrix} -13/66 & 5/66 & 1/33 \\ 4/33 & 1/33 & 7/33 \\ 13/66 & -5/66 & -1/33 \end{pmatrix} \begin{pmatrix} -2 & 1 & 2 \\ 1 & 1 & -1 \\ 1 & 4 & -1 \end{pmatrix} = \begin{pmatrix} 1/2 & 0 & -1/2 \\ 0 & 1 & 0 \\ -1/2 & 0 & 1/2 \end{pmatrix},$$
which is symmetric, as it should be, since it represents orthogonal projection parallel
to N (A).
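For real matrices the four conditions defining the strong generalised inverse are exactly the Moore-Penrose conditions, so As coincides with the pseudo-inverse computed by NumPy's np.linalg.pinv. The sketch below (Python with NumPy, illustrative only) checks this for the example above.

    import numpy as np

    A = np.array([[-2., 1., 2.], [1., 1., -1.], [1., 4., -1.]])
    B = np.array([[1., -2.], [1., 1.], [4., 1.]])
    C = np.array([[0., 1., 0.], [1., 0., -1.]])
    assert np.allclose(B @ C, A)

    As = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T

    # The four defining properties, and agreement with the pseudo-inverse
    print(np.allclose(A @ As @ A, A), np.allclose(As @ A @ As, As))
    print(np.allclose((A @ As).T, A @ As), np.allclose((As @ A).T, As @ A))
    print(np.allclose(As, np.linalg.pinv(A)))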

18.4 Fitting functions to data


Suppose that we want to find functions to fit data (x1 , y1 ), . . . , (xn , yn ), as we did in
Lecture 16, but the nullspace of the matrix A that sends the subspace of functions
into the space of y-values is non-trivial. We cannot use a left inverse, and also a
right inverse might not exist. However we can use the strong generalised inverse of
A to find a function that is a least squares approximation.

For example, suppose that we have data collected for the three values −2, 1, 3 for
x, the independent variable, two measurements for each value for x. We want
to find the best approximating function in span({1, x, x2 , x3 , x4 }). The matrix A
representing the linear transformation from the subspace of the functions onto the
space of y values will be:
$$A = \begin{pmatrix} 1 & -2 & 4 & -8 & 16 \\ 1 & -2 & 4 & -8 & 16 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 3 & 9 & 27 & 81 \\ 1 & 3 & 9 & 27 & 81 \end{pmatrix}.$$
Once we have found As , the strong generalised inverse of A, the matrix As will be
applied to the vector (a−2 , b−2 , a1 , b1 , a3 , b3 )t , where a−2 and b−2 are the two values
for y corresponding to x = −2, a1 and b1 are the two values for y corresponding
to x = 1, and a3 and b3 are the two values for y corresponding to x = 3. The
result will reveal the appropriate linear combination of the functions 1, x, x2 , x3 , x4
to approximate the data.
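A short numerical sketch of this procedure (Python with NumPy, illustrative only; the six y-values below are invented purely for the example, since no measurements are given in the text). The strong generalised inverse is applied via np.linalg.pinv, and the fitted values at each x agree with the average of the two measurements taken there.

    import numpy as np

    xs = np.array([-2., -2., 1., 1., 3., 3.])      # three x-values, two measurements each
    ys = np.array([3.9, 4.1, 1.0, 1.2, 8.8, 9.0])  # hypothetical measurements (made up)

    # Columns: 1, x, x^2, x^3, x^4 evaluated at the data points; the rank is only 3
    A = np.column_stack([xs**k for k in range(5)])

    coeffs = np.linalg.pinv(A) @ ys   # strong generalised inverse applied to the data
    print(coeffs)                     # a least-squares fit (of minimum norm) in span({1, x, ..., x^4})
    print(A @ coeffs)                 # fitted values: the averages of each measurement pair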
Lecture 19: Infinite Dimensional Spaces


19.1 Orthogonal projections
We have considered already some infinite-dimensional spaces, for example function
spaces and the spaces of infinite sequences. Now we look at orthogonal projections
in infinite-dimensional spaces.

Working in a finite-dimensional inner product space, we used the Gram-Schmidt procedure to get an orthonormal basis for any subspace U , and to extend it to an orthonormal basis for the whole space V . In this way, one can write any vector v
in the whole space V uniquely as a sum of a vector u in U and a vector w in U ⊥ .

Even if we are working in an infinite-dimensional inner product space V , as long as


we are projecting onto a finite-dimensional subspace U , all is well. In this setting,
there is a well-defined Gram-Schmidt procedure, taking the inner product of any v
in the vector space V with each element of an orthonormal basis {u1 , . . . , uk } of U ,
and setting $u = \sum_{j=1}^{k} \langle v, u_j\rangle u_j$. Then we can write v = (v − u) + u, where u ∈ U
and v − u ∈ U ⊥ . Also, if u ∈ U ∩ U ⊥ , then hu, ui = 0, implying that u = 0. This
means that V = U ⊕ U ⊥ , as long as U is finite-dimensional.
Example: Consider $\ell^2$, the space of real sequences (a1 , a2 , . . . ) such that $\sum_{j=1}^{\infty} a_j^2 < \infty$. We mentioned earlier that $\langle x, y\rangle = \sum_{j=1}^{\infty} x_j y_j$ defines a real inner product on $\ell^2$.
Let U be the subspace {(a1 , a2 , . . . ) | there is some k such that aj = 0 ∀j > k} of
`2 . Consider any vector w = (w1 , w2 , . . . ) in U ⊥ , along with the element ej =
(0, 0, . . . , 0, 1, 0, 0, . . . ), with the 1 in the jth position, for any j.
As ej is in U and w is in U ⊥ , we have 0 = hw, ej i = wj . This is true for all
j = 1, 2, . . . , and so w = 0. In other words, U ⊥ = {0}. But V is not the sum of U
and {0}, since there are sequences in V that are not in U .
In this example, (U ⊥ )⊥ = `2 6= U , even though U is a subspace of `2 .

If U is an infinite-dimensional subspace of V , what might the orthogonal pro-


jection of V onto U parallel to U ⊥ be? Look again at the above example. If
Ui = span({e1 , . . . , ei }), then the orthogonal projection Pi onto Ui is defined by
Pi (v1 , . . . , vi , vi+1 , . . . ) = (v1 , . . . , vi , 0, . . . ). Following our previous approach sug-
gests that P (v) should equal v for all vectors v ∈ V ; however, not all vectors in V
are in U .
For infinite-dimensional subspaces U of a vector space V , a better approach is to
consider the projections onto finite-dimensional subspaces U 0 of U , and ask how well
these projections approximate the vectors of V .

19.2 Approximation in function spaces


Most students should be familiar with functional approximation using a Taylor ex-
pansion. Given any function f : R → R with n continuous derivatives, one can
create its Taylor expansion about any fixed x ∈ R, which is the function of the
variable y given by:
$$f(x) + f'(x)(y-x) + \frac{f''(x)}{2}(y-x)^2 + \cdots + \frac{f^{(k)}(x)}{k!}(y-x)^k + \cdots.$$
This is not the only way to approximate continuous functions with polynomial functions. In lectures, we show that the functions $g_1(x) = 1$, $g_2(x) = \sqrt{3}(2x-1)$ and $g_3(x) = \sqrt{5}(6x^2 - 6x + 1)$ form an orthonormal basis for the vector space P 2 of polynomial functions of degree 2 or less, when the inner product is defined by $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$.
We are now able to use the methods of the previous lectures to find the orthogonal
projection function from the vector space of real-valued continuous functions on
[0, 1] to the three-dimensional subspace P 2 of polynomial functions, and also do the
same for the higher dimensional subspaces P k .

Take, for example, the function f (x) = e−x . We calculate the inner product of this function with each of the functions g1 , g2 and g3 in the orthonormal basis of P 2 :
$$\langle f, g_1\rangle = \int_0^1 e^{-x}\,dx = [-e^{-x}]_0^1 = 1 - e^{-1};$$
$$\langle f, g_2\rangle = \sqrt{3}\int_0^1 (2x-1)e^{-x}\,dx = \sqrt{3}\,\big[-(2x+1)e^{-x}\big]_0^1 = \sqrt{3}(1 - 3e^{-1});$$
$$\langle f, g_3\rangle = \sqrt{5}\int_0^1 (6x^2-6x+1)e^{-x}\,dx = \sqrt{5}\,\big[-(6x^2+6x+7)e^{-x}\big]_0^1 = \sqrt{5}(7 - 19e^{-1}).$$
Now the Gram-Schmidt formula tells us that the orthogonal projection P (f ) of e−x onto P 2 is:
$$P(f)(x) = 5(7 - 19e^{-1})(6x^2 - 6x + 1) + 3(1 - 3e^{-1})(2x - 1) + 1 - e^{-1} = 30(7 - 19e^{-1})x^2 - (204 - 552e^{-1})x + 33 - 87e^{-1}.$$
Let’s compare this approximation with one we know better: the Taylor expansion.
The second order Taylor expansion of e−x about x = 0 is the polynomial $1 - x + \tfrac{1}{2}x^2$.
This is a very accurate approximation to e−x for values close to 0, but for 1 we
get the value 1/2. The value of our new approximation function P (f ) at x = 1 is
39−105e−1 . Now is the time to use a calculator. Our new approximation 39−105e−1
is approximately .372659, whereas e−1 is approximately .367879.
At x = 0, P (f )(0) = 33 − 87e−1 , which is approximately .9946, again a decent


approximation to the value f (0) = 1.
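These numbers can be reproduced with a few lines of numerical integration. The sketch below (Python with NumPy, illustrative only) approximates the three inner products by a midpoint rule and evaluates the projection at x = 0 and x = 1.

    import numpy as np

    def integral(h, n=100000):
        # midpoint-rule approximation to the integral of h over [0, 1]
        x = (np.arange(n) + 0.5) / n
        return np.sum(h(x)) / n

    f  = lambda x: np.exp(-x)
    g1 = lambda x: np.ones_like(x)
    g2 = lambda x: np.sqrt(3) * (2 * x - 1)
    g3 = lambda x: np.sqrt(5) * (6 * x**2 - 6 * x + 1)

    # coefficients <f, g_i> and the projection P(f) onto P^2
    c = [integral(lambda x, g=g: f(x) * g(x)) for g in (g1, g2, g3)]
    Pf = lambda x: c[0] * g1(x) + c[1] * g2(x) + c[2] * g3(x)

    print(Pf(np.array([0.0, 1.0])))       # roughly [0.9946, 0.3727]
    print(np.exp(-np.array([0.0, 1.0])))  # [1.0, 0.3679]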
The comparison to Taylor expansions is not entirely fair. The Gram-Schmidt procedure concerns integration over the interval [0, 1]: the approximation P (f ) is the function in P 2 minimising the integral $\int_0^1 (P(f)(x) - e^{-x})^2\,dx$. On the other hand,
the Taylor expansion we have used is centred about x = 0, and is designed to give
very good approximations only in the neighbourhood of 0.

If one continues the process of functional approximation without end, either with
an infinite orthonormal subset of functions and the Gram-Schmidt process, or with
the Taylor expansion, or with any other method of approximation or projection,
will one obtain in the limit the original function? In many cases the answer is yes.
When this does and doesn’t happen could be the subject matter of an entire course.

19.3 Shift transformations


We showed above that there are infinite-dimensional subspaces U such that the
whole vector space is not equal to U ⊕ U ⊥ . Another important difference between
infinite and finite-dimensional vector spaces concerns the images and kernels of linear
transformations. With a finite-dimensional vector space V , a linear transformation
from V to V is invertible if and only if its kernel is {0}, or if and only if its image is
all of V . This nice relationship breaks down in infinite-dimensional vector spaces.

Example: Let V be the space of the previous example, the set of infinite sequences
(r1 , r2 , . . . ) of real numbers such that the sum r12 +r22 +· · · is finite. Define T : V → V
by T (r1 , r2 , . . . ) = (r2 , r3 , . . . ). This function T , called the shift transformation
is a linear transformation, since T (ar1 + bs1 , ar2 + bs2 , . . . ) = (ar2 + bs2 , . . . ) =
a(r2 , . . . ) + b(s2 , . . . ), for all real a, b and all pairs (r1 , r2 , . . . ), (s1 , s2 , . . . ) in V .
The image of T is the whole of V . The kernel of T is not {0}, but rather the
subspace {(r, 0, 0, . . . ) | r ∈ R}. The function T is not invertible, because there is
no linear transformation T −1 with the property that T −1 (T (1, 0, . . . )) = (1, 0, . . . ),
since T (1, 0, . . . ) = 0.

Now define another function T ∗ : V → V by T ∗ (r1 , r2 , . . . ) = (0, r1 , r2 , . . . ). This


function T ∗ is also a linear transformation. The kernel of T ∗ is {0}, but the image
of T ∗ is not the whole of V . We do have that T T ∗ is the identity transformation,
however T ∗ T is not the identity transformation, as T ∗ T (1, 0, . . . ) = 0.

The linear transformation T has an eigenvalue 0 corresponding to the eigenvector


(1, 0, . . . ). Does T ∗ have an eigenvalue and corresponding eigenvector? Assume


that v = (r1 , r2 , . . . ) 6= 0 is an eigenvector corresponding to the eigenvalue λ. The
kernel of T ∗ is {0}, so 0 is not an eigenvalue of T ∗ . Therefore, since λ(r1 , . . . ) =
T ∗ (r1 , . . . ) = (0, r1 , . . . ), we conclude that r1 = 0. But then the same argument
repeated again and again implies that ri = 0 for all i = 1, 2, . . . , and hence v = 0,
a contradiction. So T ∗ has no eigenvalues.

That linear transformations on infinite-dimensional vector spaces may fail to have


eigenvalues should not be too surprising. The existence of eigenvalues followed
from the existence of roots of the characteristic polynomial, and the characteristic
polynomial was defined by a determinant. These things are not defined for linear
transformations in infinite-dimensional spaces.
Lecture 20: Fourier Series


20.1 Trigonometric polynomials
The trigonometric polynomials are the complex-valued functions defined on R that
can be written as a finite linear combination of functions of the form ekit where k
can be any integer (positive, negative, or zero). The set of all trigonometric poly-
nomials is an infinite-dimensional complex vector space. The set of trigonometric
polynomials is closed under addition and scalar multiplication. It is also closed un-
der multiplication, since emit enit = e(m+n)it for integers m and n – hence the term
trigonometric polynomial.

The function ekit , for a positive integer k, is the function that circles the complex
numbers of modulus one in the counter-clockwise direction (from 1 to i, to −1, and
so on) with uniform speed k, starting at 1 when t = 0 and travelling around the
complete circle k times between t = 0 and t = 2π. Another way to understand
ekit is via the formula ekit = cos(kt) + i sin(kt). For a negative integer k, ekit does
the same, except that it moves in the clockwise direction. Whether k is positive
or negative, the value of ekit is 1 if and only if t is an integer multiple of 2π/k. The conjugate of the function ekit , namely $\overline{e^{kit}}$, is e−kit , and likewise $\overline{e^{-kit}} = e^{kit}$. When k is equal to 0, ekit = e0 = 1, and this function is conjugate to itself.
For any trigonometric polynomial $p(t) = \sum_{k=-n}^{n} a_k e^{kit}$, the evaluation of p at t and
t + 2π will be identical, for any t, since e2iπ = 1 and so ekit = ekit+2kiπ for all integers
k. For this reason the trigonometric polynomials are used mostly for understanding
functions that are periodic – f (t + 2π) = f (t) for all t – or that are only defined on
an interval such as (−π, π).

Let V be the complex inner product space of bounded continuous complex-valued functions defined on (−π, π), with $\langle f, g\rangle = \int_{-\pi}^{\pi} f(t)\overline{g(t)}\,dt$. For the rest of this lecture, we shall work in the inner product space V .

Theorem 20.1.1: For every pair of different integers m and n, emit and enit are
orthogonal in V .

Proof: For n ≠ m, the inner product $\langle e^{nit}, e^{mit}\rangle$ is
$$\int_{-\pi}^{\pi} e^{nit}\overline{e^{mit}}\,dt = \int_{-\pi}^{\pi} e^{(n-m)it}\,dt = \frac{-i}{n-m}\,\big[e^{(n-m)it}\big]_{-\pi}^{\pi} = 0.$$
The last equality holds because the function e(n−m)it is periodic, and takes the same
value at t = −π and t = π. 2
In order to make the functions ekit into an orthonormal set, they must be normalised.
For any integer n, we have:
$$\langle e^{nit}, e^{nit}\rangle = \int_{-\pi}^{\pi} e^{nit}\overline{e^{nit}}\,dt = \int_{-\pi}^{\pi} e^{(n-n)it}\,dt = \int_{-\pi}^{\pi} dt = 2\pi,$$
so in every case we must divide by $\sqrt{2\pi}$ to get an orthonormal set of functions.

Let U be the subspace of V consisting of all the trigonometric polynomials. For


every non-negative integer n, define Un = span({ekit | −n ≤ k ≤ n}). Because these
2n + 1 trigonometric polynomials are orthogonal to each other, they are linearly
independent and the subspace Un has dimension 2n + 1.

Some textbooks work instead with the functions 1, cos(kt) and sin(kt), for all pos-
itive integers k. Due to the next result, it doesn’t matter whether we define Un as
above, or as the span of the functions 1, cos(kt) and sin(kt) for all positive k ≤ n.

Theorem 20.1.2: The subspace Un is equal to the span, over the complex numbers,
of the set of functions {cos(kt) | 0 ≤ k ≤ n} ∪ {sin(kt) | 1 ≤ k ≤ n}.

Proof: From the formula eit = cos t + i sin t, and from the facts that sin(−kt) =
− sin(kt) and cos(−kt) = cos(kt), it follows that the subspace Un is spanned (over
the complex numbers) by the functions {cos(kt) | 0 ≤ k ≤ n}∪{sin(kt) | 1 ≤ k ≤ n}.
In the other direction, we can write
$$\frac{1}{2}\big(e^{kit} + e^{-kit}\big) = \frac{1}{2}\big(\cos(kt) + i\sin(kt) + \cos(kt) - i\sin(kt)\big) = \cos(kt);$$
$$\frac{-i}{2}\big(e^{kit} - e^{-kit}\big) = \frac{-i}{2}\big(\cos(kt) + i\sin(kt) - \cos(kt) + i\sin(kt)\big) = \sin(kt),$$
so each function in the basis of Un can be written as a linear combination of the
functions cos(kt) and sin(kt). 2

The next result shows that, if we want to study only real valued functions, it doesn’t
matter whether we use functions of the form ekit or functions of the form cos(nt)
and sin(nt).

Theorem 20.1.3: If f is a trigonometric polynomial in Un , then the real and


imaginary parts of f are also in Un .

Proof: Since ekit and e−kit are conjugates of each other, the conjugate of a function in Un is also in Un . Therefore $\frac{1}{2}(f + \overline{f})$, the real part of f , is also in Un . Likewise $\frac{-i}{2}(f - \overline{f})$, the imaginary part of f , is also in Un . 2
20.2 Projecting to the trigonometric polynomials


Given any function f in V , we can find its projection onto each of the finite-
dimensional subspaces Un of trigonometric polynomials. The result will be the
closest function (according to the inner product norm) to f in the subspace Un .

The following result, combined with Theorem 20.1.2, shows that the closest approx-
imation in the linear span of the functions 1, cos(kt) and sin(kt) for 1 ≤ k ≤ n can
be found by calculating the closest approximation in the linear span of the functions
ekit for 0 ≤ k ≤ n.

Theorem 20.2.1: If f is a bounded continuous real-valued function defined on


(−π, π), then the closest approximation in Un to f will be a real valued function,
and it will be the same one obtained by taking the inner product with the functions
1, cos(kt) and sin(kt) for 1 ≤ k ≤ n.

We omit the proof.


There are two advantages to using ekit instead of cos(kt) and sin(kt). One is theo-
retical: it is easy to prove that the functions are orthogonal. The other is practical:
often the required integrations can be performed easily. When carrying out integra-
tions of complex functions involving ekit , the same methods as for integrating real
functions are valid.

Define Pn to be the orthogonal projection onto Un ; our methods from earlier tell us that
$$P_n(f) = \sum_{k=-n}^{n} \Big\langle f(t), \frac{1}{\sqrt{2\pi}}e^{kit}\Big\rangle \frac{1}{\sqrt{2\pi}}e^{kit} = \sum_{k=-n}^{n} \Big(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-kit}\,dt\Big) e^{kit}.$$
For any function f in V , and any integer k, we define the complex number $a_k(f) = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-kit}\,dt$. This number $a_k(f)$ is called the kth Fourier coefficient of f . The projection of f onto Un is thus given by $\sum_{k=-n}^{n} a_k(f)e^{kit}$.

Example: Consider the function f (t) = |t| in V . Let’s project it to the subspace Un . Its inner product with 1 is $\pi^2$.
Its inner product with $e^{kit}$, for k ≠ 0, is
$$\int_{-\pi}^{0} (-t)\,e^{-kit}\,dt + \int_{0}^{\pi} t\,e^{-kit}\,dt.$$
We evaluate the indefinite integral $\int t\,e^{-kit}\,dt$ by parts, setting u = t and $dv = e^{-kit}\,dt$, yielding
$$\int t\,e^{-kit}\,dt = \frac{it}{k}e^{-kit} - \int \frac{i}{k}e^{-kit}\,dt = \frac{it}{k}e^{-kit} + \frac{1}{k^2}e^{-kit}.$$
This gives us
$$\langle |t|, e^{kit}\rangle = \Big[\frac{it}{k}e^{-kit} + \frac{1}{k^2}e^{-kit}\Big]_0^{\pi} - \Big[\frac{it}{k}e^{-kit} + \frac{1}{k^2}e^{-kit}\Big]_{-\pi}^{0}.$$
If k is even, then $e^{ki\pi} = e^{-ki\pi} = 1$, and this inner product is $\frac{i\pi}{k} + \frac{1}{k^2} - \frac{1}{k^2} - \frac{1}{k^2} - \frac{i\pi}{k} + \frac{1}{k^2} = 0$.
If k is odd, then $e^{ki\pi} = e^{-ki\pi} = -1$, and this inner product is $-\frac{i\pi}{k} - \frac{1}{k^2} - \frac{1}{k^2} - \frac{1}{k^2} + \frac{i\pi}{k} - \frac{1}{k^2} = -\frac{4}{k^2}$.
Putting all this together, we find that the projection of the function |t| onto Un is the function
$$\frac{\pi}{2} - \frac{2}{\pi}\sum_{k\ \mathrm{odd},\ 1\le k\le n} \frac{1}{k^2}\big(e^{kit} + e^{-kit}\big) = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k\ \mathrm{odd},\ 1\le k\le n} \frac{1}{k^2}\cos(kt).$$
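The Fourier coefficients just computed can be checked numerically. The sketch below (Python with NumPy, illustrative only) approximates ak(|t|) by a midpoint rule and compares it with the values obtained above: π/2 for k = 0, −2/(πk²) for odd k, and 0 for even non-zero k.

    import numpy as np

    def fourier_coeff(f, k, n=200000):
        # a_k(f) = (1/2pi) * integral over (-pi, pi) of f(t) e^{-kit} dt, via a midpoint rule
        t = -np.pi + (np.arange(n) + 0.5) * (2 * np.pi / n)
        return np.sum(f(t) * np.exp(-1j * k * t)) / n

    print(fourier_coeff(np.abs, 0), np.pi / 2)         # a_0 is pi/2
    print(fourier_coeff(np.abs, 3), -2 / (9 * np.pi))  # odd k: a_k is -2/(pi k^2)
    print(fourier_coeff(np.abs, 4))                    # even k != 0: a_k is 0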

20.3 Complete Approximation


Let f be a function in V . We have seen that, for each positive integer n, we can write
$$f = \Big(f - \sum_{k=-n}^{n} a_k(f)e^{kit}\Big) + \sum_{k=-n}^{n} a_k(f)e^{kit},$$
where all of these functions are orthogonal in V . Therefore we have, for every positive integer n,
$$\|f\|^2 = \Big\|f - \sum_{k=-n}^{n} a_k(f)e^{kit}\Big\|^2 + 2\pi \sum_{k=-n}^{n} |a_k(f)|^2.$$
As n → ∞, we might hope that the approximation $\sum_{k=-n}^{n} a_k(f)e^{kit}$, which is the best approximation to f in Un , “tends to” f , in the sense that $\|f - \sum_{k=-n}^{n} a_k(f)e^{kit}\|$ tends to zero. If that is the case, then we can conclude that $\|f\|^2 = 2\pi \sum_{k=-\infty}^{\infty} |a_k(f)|^2$.

The next result, which is by no means easy to prove, tells us that this is true for
every f in V , and moreover that we have convergence at every point in (−π, π).
Theorem 20.3.1: For every function f in V , we have $\|f\|^2 = 2\pi \sum_{k=-\infty}^{\infty} |a_k(f)|^2$. Moreover, for each t in (−π, π), we have $f(t) = \sum_{k=-\infty}^{\infty} a_k(f)e^{kit}$.

The series
$$\sum_{k=-\infty}^{\infty} a_k(f)e^{kit} = \sum_{k=-\infty}^{\infty}\Big(\frac{1}{2\pi}\int_{-\pi}^{\pi} f(t)e^{-kit}\,dt\Big)e^{kit}$$
is called the Fourier series for f .
Using Theorem 20.3.1, we can return to our example, where we found approxima-
tions for |t| in each Un , and conclude that, for all t in (−π, π),
$$|t| = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k\ \mathrm{odd},\ 1\le k<\infty} \frac{1}{k^2}\cos(kt).$$
In particular, substituting t = 0 gives:
$$0 = \frac{\pi}{2} - \frac{4}{\pi}\sum_{k\ \mathrm{odd},\ 1\le k<\infty}\frac{1}{k^2} \qquad\text{or}\qquad \frac{\pi^2}{8} = \sum_{k\ \mathrm{odd},\ 1\le k<\infty}\frac{1}{k^2}.$$

This is a remarkable enough formula by itself. We can go one step further and derive a formula for the sum of the reciprocals of all the squares. We can write $\sum_{k=1}^{\infty}\frac{1}{k^2} = 1 + \frac14 + \frac19 + \frac1{16} + \frac1{25} + \frac1{36} + \frac1{49} + \cdots$ as
$$\sum_{k\ \mathrm{odd},\ 1\le k<\infty}\ \sum_{i=0}^{\infty} \frac{1}{(2^i k)^2} = \sum_{k\ \mathrm{odd},\ 1\le k<\infty}\frac{1}{k^2}\Big(1 + \frac14 + \frac1{16} + \frac1{64} + \cdots\Big) = \frac{\pi^2}{8}\cdot\frac{4}{3} = \frac{\pi^2}{6}.$$
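Both series can be checked directly (Python with NumPy, illustrative only); the partial sums converge slowly, but a couple of million terms is ample for a few decimal places.

    import numpy as np

    k = np.arange(1, 2000001)
    odd = k[k % 2 == 1]
    print(np.sum(1.0 / odd**2), np.pi**2 / 8)  # both about 1.2337
    print(np.sum(1.0 / k**2),   np.pi**2 / 6)  # both about 1.6449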

20.4 Another example


We find the Fourier series for the function f (t) = et .

First we perform the integration $\int_{-\pi}^{\pi} e^t e^{-kit}\,dt$, for every integer k between −n and n. We get
$$\int_{-\pi}^{\pi} e^t e^{-kit}\,dt = \int_{-\pi}^{\pi} e^{(1-ki)t}\,dt = \frac{1}{1-ki}\,\big[e^{(1-ki)t}\big]_{-\pi}^{\pi} = \frac{1+ki}{1+k^2}\,\big[e^{(1-ki)t}\big]_{-\pi}^{\pi}.$$
The answer depends on whether k is even or odd. If k is even, then $e^{-(1-ki)\pi} = e^{k\pi i}e^{-\pi} = e^{-\pi}$ and $e^{(1-ki)\pi} = e^{-k\pi i}e^{\pi} = e^{\pi}$, and so
$$\int_{-\pi}^{\pi} e^t e^{-kit}\,dt = \frac{1+ik}{1+k^2}\,(e^{\pi} - e^{-\pi}) \qquad (k\ \text{even}),$$
including the case k = 0. If k is odd, then everything is multiplied by $e^{ki\pi} = e^{-ki\pi} = -1$, and we get
$$\int_{-\pi}^{\pi} e^t e^{-kit}\,dt = -\frac{1+ik}{1+k^2}\,(e^{\pi} - e^{-\pi}) \qquad (k\ \text{odd}).$$
The Fourier coefficient ak is therefore
$$a_k = (-1)^k\,\frac{1+ik}{2\pi(1+k^2)}\,(e^{\pi} - e^{-\pi}).$$
Therefore, for all t in (−π, π), we get
$$e^t = \frac{1}{2\pi}(e^{\pi} - e^{-\pi}) \sum_{k=-\infty}^{\infty} (-1)^k\,\frac{1+ik}{1+k^2}\,e^{kit}.$$

This is not yet entirely satisfactory, as the real-valued function et is being expressed
in terms of complex functions. To resolve this, we combine the ak and a−k terms
for each positive integer k. Notice that ak and a−k are conjugates of each other,
and that ekit and e−kit are conjugates of each other, so that ak ekit and a−k e−kit are
conjugates of each other and ak ekit + a−k e−kit is thus twice the real part of ak ekit .
Using the identity $e^{kit} = \cos(kt) + i\sin(kt)$, we see that the real part of $a_k e^{kit}$ is the real part of
$$\frac{(-1)^k}{2\pi}\,\frac{1+ik}{1+k^2}\,(e^{\pi} - e^{-\pi})\big(\cos(kt) + i\sin(kt)\big),$$
which is equal to
$$\frac{(-1)^k}{2\pi}\,(e^{\pi} - e^{-\pi})\Big(\frac{1}{1+k^2}\cos(kt) - \frac{k}{1+k^2}\sin(kt)\Big).$$
The projection of et onto the subspace Un is therefore given by
$$\frac{e^{\pi} - e^{-\pi}}{2\pi} + \frac{e^{\pi} - e^{-\pi}}{\pi}\sum_{k=1}^{n} (-1)^k\Big(\frac{1}{1+k^2}\cos(kt) - \frac{k}{1+k^2}\sin(kt)\Big).$$
By Theorem 20.3.1, we have, for all t in (−π, π):
$$e^t = \frac{e^{\pi} - e^{-\pi}}{2\pi} + \frac{e^{\pi} - e^{-\pi}}{\pi}\sum_{k=1}^{\infty} (-1)^k\Big(\frac{1}{1+k^2}\cos(kt) - \frac{k}{1+k^2}\sin(kt)\Big).$$

If we evaluate at t = 0, then all the terms with sin disappear, and all the cosines are equal to 1, so we get
$$1 = \frac{1}{2\pi}(e^{\pi} - e^{-\pi}) + \frac{e^{\pi} - e^{-\pi}}{\pi}\sum_{k=1}^{\infty} \frac{(-1)^k}{1+k^2}.$$
Multiplying by $\frac{\pi}{e^{\pi} - e^{-\pi}}$ and rearranging gives:
$$\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{1+k^2} = \frac12 - \frac15 + \frac1{10} - \frac1{17} + \cdots = \frac12 - \frac{\pi}{e^{\pi} - e^{-\pi}}.$$

This is perhaps a little less elegant than the formulae we got from our first example,
but it is no less remarkable!
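As with the first example, the identity is easy to test numerically (Python with NumPy, illustrative only):

    import numpy as np

    k = np.arange(1, 100001)
    lhs = np.sum((-1.0)**(k + 1) / (1 + k**2))
    rhs = 0.5 - np.pi / (np.exp(np.pi) - np.exp(-np.pi))
    print(lhs, rhs)   # both about 0.36398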
Exercises
Exercises will be set each week from those below. Not all exercises will be set, and
they may be set in a different order from the order they appear in.

Acknowledgments.
Many of these exercises were produced by Dr Robert Simon. Some others are taken
from the book “Advanced Mathematical Methods” by Adam Ostaszewski, by kind
permission of the author.

1. Determine if the following matrices have inverses, and if so what they are.
$$\text{(i)}\ \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 1 & 3 & 5 \end{pmatrix} \quad \text{(ii)}\ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix} \quad \text{(iii)}\ \begin{pmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} \quad \text{(iv)}\ \begin{pmatrix} 1 & 2 & -1 \\ 1 & 4 & 1 \\ 1 & 8 & -1 \end{pmatrix}$$

2. Determine whether the following sets of vectors in R4 are linearly independent:

(a) {(2, 1, 1, 1)t , (1, 2, −1, 1)t , (3, 1, 0, 2)t , (1, 2, 3, 4)t };
(b) {(1, 2, 0, 1)t , (2, 1, 0, 1)t };
(c) {(2, 1, 1, 1)t , (1, 2, −1, 1)t , (3, 1, 0, 2)t , (3, −3, 0, 2)t }.

For any set that is not linearly independent, find a basis for the subspace spanned
by the set.

3. For each n, find a basis B for the subspace V = {v ∈ Rn : v1 + · · · + vn = 0} of


Rn . You should show that your set B is linearly independent, and spans V .

4. Calculate the rank of the following matrix and find a basis for its nullspace.
 
1 2 3 1
2 3 1 2
 
−1 0 7 −1
 
3 4 −1 3 
1 3 8 1
5. Find all the solutions to the equation


 
1 2 1
2 4 3 x = b
1 2 2
when (i) b = (1, 2, 1)t and (ii) b = (1, 2, 2)t . In case (i), write down a basis for the
subspace of which the solution set is a translate.

6. Let 1 ≤ p, q ≤ n be given with p 6= q. Let the n × n matrix A be defined by


aii = 1 for all 1 ≤ i ≤ n, apq = a, and otherwise aij = 0. The matrix A represents
the row operation that adds a times row q to row p. What is the inverse of A?

7. Write each of the following four vectors as a linear combination of the other
three: u1 = (1, −2, 1)t , u2 = (0, 1, 2)t , u3 = (−1, 2, 1)t , and u4 = (1, 1, 1).

8. Determine if the following systems of linear equations have solutions, and if so


give the complete solutions:
(a) x1 + 2x3 + 3x4 = 2 (b) x1 + 2x2 + 3x3 + 4x4 = 2
x 2 + x3 − x 4 = −2 −3x1 − x2 + x3 + 3x4 = −1
2x1 − 2x2 + 3x3 + 7x4 = 6 2x1 + x2 + x3 + 2x4 = 0
3x1 + 4x2 + 8x3 + 7x4 = 2 x1 − 3x2 + 2x3 + 4x4 = 1

9. Let
P∞ (a0 , a1 , . . . ) andP
(b0 , b1 , . . . ) be two infinite
P∞ sequences of real numbers such

that i=0 ai < ∞ and i=0 bi < ∞. Show that i=0 (ai + bi )2 < ∞. (This is what
2 2

is needed to show that the set `2 defined in Section 1.2 is a vector space: the sum
of two elements of `2 is also an element of `2 .)

10. Suppose we have a matrix A, and we obtain A0 from A by the elementary row
operation of adding the first row of A to the second row. (So the second row of A0
is the sum of the first two rows of A.)
(a) Show that the row space of A is equal to the row space of A0 .
(You need to show that every row of A0 is in the row space of A, and vice versa).
(b) Suppose the columns c01 , . . . , c0n of A0 satisfy, say α1 c01 + · · · + αn c0n = 0 for some
scalars α1 , . . . , αn . Show that the columns c1 , . . . , cn of A also satisfy α1 c1 + · · · +
αn cn = 0.
(c) Deduce that, if a set of column vectors of A is linearly independent, then the
corresponding set of column vectors of A0 is also linearly independent.
You should be able to see that (a)-(c) also hold for any other elementary row oper-
ation.
11. Prove that the set {ex , xex , x2 ex } is linearly independent.

12. Calculate the Wronskian for the functions

eαx , eβx , eγx .

Hence show that this set of functions is linearly independent if α, β and γ are all
different.

13. Show that the following pair f1 and f2 of real valued functions on (−∞, ∞) are
such that the Wronskian W (x) is zero for all choices of x, but f1 and f2 are linearly
independent: f1 (x) = x2 if x ≤ 0 and f1 (x) = 0 if x ≥ 0 and f2 (x) = 0 if x ≤ 0 and
f2 (x) = x2 if x ≥ 0.

14. A polynomial function of degree at most k is a function f : R → R of the form


f (x) = ak xk + · · · + a2 x2 + a1 x + a0 . Let P k be the set of polynomial functions of
degree at most k.
(a) Show that P 3 is a subspace of the vector space of all functions from R to R.
(b) For j = 0, 1, 2, 3, let gj (x) = xj . Show that B = (g0 , . . . , g3 ) is an ordered basis
for P 3 .
Define the function D : P 3 → P 3 by D(f ) = f 0 , the derivative of f .
(c) Show that D is a linear transformation.
(d) Find the matrix AB,B
D representing D with respect to the ordered basis B.
(e) Find ker(D) and im(D).
 
2 1 0
15. Let AT be the matrix −1/2 7/2 1/2 representing a linear transformation
−1/2 1/2 5/2
T :R →R .
3 3

(a) If B = ((1, 0, 1)t , (1, 1, 0)t , (0, 1, 1)t ), what is the matrix AB,B
T representing the
linear transformation T with respect to the ordered basis B?
(See Q1.)
(b) Let C = ((2, 0, 2)t , (3, 3, 0)t , (1, 4, 3)t ). Show that AC,B
T is the identity matrix I3 .
Why does that happen?
16. Let B = ((1, 0, 1)t , (1, 1, 0)t , (0, 1, 1)t ) be the ordered basis of R3 as in the
previous exercise, and consider the ordered basis C = ((1, 1)t , (−1, 1)t ) ofR2 . Let T:
1 −1 1
R3 → R2 be a linear transformation represented by the matrix AC,B T =
0 1 0
with respect to the ordered bases B and C.
Let B 0 = ((1, 0, −1)t , (1, −1, 0)t , (0, 1, 1)t ) and C 0 = ((2, −1)t , (1, 0)t ) be alternative
0 0
ordered bases for R3 and R2 respectively. Find ACT ,B , the matrix representing T
with respect to the ordered bases B 0 and C 0 .

17. In R3 , the basis is changed from the standard ordered basis (e1 , e2 , e3 ) to
B = ((1, 2, 3)t , (3, 1, 2)t , (2, 3, 1)t ).
Find the matrix Q such that, if x = (x1 , x2 , x3 ) is an element of R3 written in terms
of the standard basis, then Qx is the same element written in terms of the ordered
basis B.

18. Let A be an n × n matrix, and let x be a variable. Let f (x) = det(xI − A) be the characteristic polynomial of A. Suppose that f (x) = x^n + a_{n−1} x^{n−1} + · · · + a1 x + a0 .
Show that (−1)^n a0 is the determinant of A. Show also that −a_{n−1} is the trace of A, the sum of its diagonal entries.

19. Suppose that A and B are similar matrices. Show that A² and B² are similar.
If A is invertible, show that B is also invertible and that A^{−1} and B^{−1} are similar.

20. (a) Suppose that A and B are n × n matrices, and that A is invertible. Show
that the matrices AB and BA are similar.
(b) Suppose that the n × n matrices S and T are similar. Show that there are
matrices A and B, with A invertible, such that S = AB and T = BA.
   
(c) Let
    A = [ 1  −1 ]    and    B = [ 1  1 ] .
        [ 0   0 ]               [ 1  1 ]
Show that AB is not similar to BA.
21. Let A and B be n × n matrices. Suppose that there is an invertible n × n matrix S such that SAS^{−1} and SBS^{−1} are both diagonal matrices (only zero entries off the diagonal). Show that AB = BA.

22. Suppose that A is an n × m matrix, and B is an m × k matrix. Show that


R(AB) ⊆ R(A) and N (B) ⊆ N (AB).
If also R(B) = Rm , show that R(AB) = R(A).
Deduce that, if A is an n × m matrix, and B is an m × n matrix, with m < n, then
AB is singular.

23. Verify that the following is an inner product in R2 :
    ⟨(x1 , x2 )t , (y1 , y2 )t ⟩ = x1 y1 − x1 y2 − x2 y1 + 3x2 y2 .
Does ⟨(x1 , x2 )t , (y1 , y2 )t ⟩ = x1 y1 − x1 y2 − x2 y1 + x2 y2 also give an inner product on R2 ?

24. Let V be the vector space of continuous functions on [−1, 1]. Show that
    ⟨f, g⟩ = ∫_{−1}^{1} x² f (x)g(x) dx + f (0)g(0)
defines an inner product on V .
Indicate briefly why each of the following does not define an inner product on V .
(i) ⟨f, g⟩ = ∫_{−1}^{1} x f (x)g(x) dx + f (0)g(0),
(ii) ⟨f, g⟩ = ∫_{0}^{1} x² f (x)g(x) dx + f (0)g(0),
(iii) ⟨f, g⟩ = ∫_{−1}^{1} x² f (x)g(x) dx + f (0)g(1).

25. Use the Gram-Schmidt process to find an orthonormal basis for the subspace
span({(2, 1, 2)t , (2, 0, 1)t }) of R3 .
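Once you have run Gram-Schmidt by hand, you can compare your answer with numpy's QR factorisation, which orthonormalises the columns in essentially the same way (possibly up to sign). A sketch of such a check, assuming numpy:

    import numpy as np

    # Columns are the spanning vectors (2, 1, 2)^t and (2, 0, 1)^t.
    V = np.array([[2.0, 2.0],
                  [1.0, 0.0],
                  [2.0, 1.0]])

    Q, R = np.linalg.qr(V)   # columns of Q are orthonormal and span the same subspace
    print(Q)                 # compare with your hand-computed basis (signs may differ)
    print(Q.T @ Q)           # should be (numerically) the 2 x 2 identity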

26. Find orthonormal bases for the image and kernel of the linear transformation
represented by the matrix  
1 1 1 1
M = 2 0 1 −1
0 2 1 3
(with respect to the standard bases), and extend them to orthonormal bases of the
whole spaces R3 and R4 respectively.

27. A function f : R → R is odd if f (−x) = −f (x) for all x, and even if f (−x) = f (x) for all x. Show that, if f is odd and g is even, then ∫_{−1}^{1} f (x)g(x) dx = 0.

28. Let P 2 be the space of all real polynomial functions on [−1, 1] with degree two or less. The 3-tuple of functions (1, x, x²) is an ordered basis of this space, as in Q14.
An inner product is defined on P 2 by ⟨f, g⟩ = ∫_{−1}^{1} f (x)g(x) dx. Find an orthonormal basis of P 2 with respect to this inner product.

29. (i) Suppose A is an n × n matrix with the property that xt Ay = 0 for all x, y
in Rn . Show that A = 0, i.e., every entry aij of A is equal to 0.
(ii) Deduce that, if A and B are n × n matrices with the property that xt Ay = xt By for all x, y in Rn , then A = B.
(iii) Give an example of a non-zero 2 × 2 matrix A such that xt Ax = 0 for all x
in R2 .

30. (i) Show that, for x and y in Rn ,
    ⟨x, y⟩ = (1/4)‖x + y‖² − (1/4)‖x − y‖² .
(ii) Show that, for x and y in Cn ,
    ⟨x, y⟩ = (1/4)‖x + y‖² − (1/4)‖x − y‖² + (i/4)‖x + iy‖² − (i/4)‖x − iy‖² .

31. Let u = (a, b, c)t be a vector in R3 , with a, c ≠ 0. Show that v = (b, −a, 0)t is
orthogonal to u, and that there is a value of x such that w = (a, b, x)t is orthogonal
to both u and v. Find a formula for x.
This is often the easiest way to extend a set of size 1 to an orthonormal basis in R3 ,
or indeed – making the appropriate adjustments – C3 .
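A quick numerical illustration of this recipe for one concrete choice of u (numpy assumed; the value of x used below is a candidate to compare with the formula you derive):

    import numpy as np

    a, b, c = 1.0, 2.0, 3.0              # any example with a, c nonzero
    u = np.array([a, b, c])
    v = np.array([b, -a, 0.0])           # orthogonal to u by construction
    x = -(a*a + b*b) / c                 # candidate value of x; compare with your own formula
    w = np.array([a, b, x])

    print(np.dot(u, v), np.dot(u, w), np.dot(v, w))   # all three should be 0 (up to rounding)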

32. (a) Find the orthogonal matrix A representing a rotation through an angle π/3
anticlockwise about an axis in the direction (−1, 1, 1)t .
(b) Find the orthogonal matrix C representing a reflection in the plane through the
origin with normal (−1, 1, 1)t .
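For part (a), after writing down A by hand you can compare it with the rotation matrix produced by a library. The sketch below uses scipy's Rotation class (an assumption, not course material); sign and orientation conventions vary between sources, so an answer equal to the transpose of this output may still be correct.

    import numpy as np
    from scipy.spatial.transform import Rotation

    axis = np.array([-1.0, 1.0, 1.0])
    axis = axis / np.linalg.norm(axis)            # unit vector along the rotation axis

    R = Rotation.from_rotvec((np.pi / 3) * axis).as_matrix()
    print(np.round(R, 4))
    print(np.round(R @ R.T, 4))                   # should be the identity (R is orthogonal)
    print(round(float(np.linalg.det(R)), 4))      # should be +1 (orientation-preserving)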

33. Consider the orthogonal matrices


   
2/3 2/3 −1/3 3/5 4/5 0
A = −1/3 2/3 2/3  , B= 0 0 1 .
2/3 −1/3 2/3 −4/5 3/5 0

(a) Are A and B orientation-preserving or orientation-reversing?


(b) Find the axis and angle of rotation of the transformation represented by the
matrix A. You should find whether the rotation is clockwise or anticlockwise.
(c) Without doing any calculation, explain why AB and BA are orientation-reversing
orthogonal matrices. Also explain why the angles of rotation of AB and BA are the
same. (An earlier exercise should help.)

34. Prove the Cauchy-Schwarz Inequality, the Triangle Inequality and the Gener-
alised Theorem of Pythagoras for complex inner product spaces:
(i) for any pair u, v of vectors, |⟨u, v⟩| ≤ ||u|| · ||v||;
(ii) for any pair x, y of vectors, ||x + y|| ≤ ||x|| + ||y||;
(iii) if x and y are orthogonal, then ||x − y||² = ||x + y||² = ||x||² + ||y||² .
Follow one of the lines of proof in the real case, either as in notes or as in lectures.

35. Find the rank of the complex matrix


 
i 0 1
M =  0 1 0 .
−1 i i

Find bases for the range R(M ) and the nullspace N (M ) of M .

36. Find an orthonormal basis for the subspace of C3 spanned by (i, 0, 1)t and
(1, 1, 0)t .

37. Consider the complex matrix


 
1 −1 1
C = 2 0 1 + i .
0 1 + i −1

Find an orthonormal basis of N (C), and extend it to an orthonormal basis of C3 .

38. (a) Which, if any, of the following matrices are Hermitian? Which are unitary?
What does that tell you about their eigenvalues?
     
2 2i 0 i 0 1
A= ; B= ; C= .
−2i 5 −i 0 i 0

(b) Find the eigenvalues and corresponding eigenvectors of each of A, B and C.

39. Show that all diagonal matrices are normal.


Which complex numbers can arise as an eigenvalue of a normal matrix?

40. Which of the following classes of n×n complex matrices are closed under matrix
multiplication? I.e., if A and B are in the class, is it true that AB is always in the
class? In order to show that such a result is not true, it is best to give an example
of matrices A and B which have the property, such that AB does not. You may
quote results from the notes.
(a) the set of Hermitian matrices?
(b) the set of unitary matrices?
(c) the set of invertible matrices?
(d) the set of normal matrices?
(e) the set of diagonal matrices?
41. Find all the triples (a, b, c) of real numbers for which the matrix
    [ 1/√2    i/√2 ]
    [ a + ib    c  ]
is:
(a) Hermitian;
(b) unitary;
(c) normal.

42. If the real matrix A is positive definite, and v is any non-zero vector in Rn ,
what can you say about the angle between v and Av?

43. Let A be a non-singular square matrix, with eigenvalues λ1 , . . . , λk . What are the eigenvalues of A^{−1}?

44. Prove that, if λ1 , λ2 are distinct eigenvalues of a square matrix A, then the
corresponding eigenvectors v1 , v2 are linearly independent.
Now show that, if A has three distinct eigenvalues λ1 , λ2 , λ3 , then the corresponding
eigenvectors are linearly independent.
 
45. Consider the matrix
    A = [  0   2  1 ]
        [ −2   0  2 ] .
        [ −1  −2  0 ]
Find the eigenvalues and eigenvectors of A. Write down a matrix P and a diagonal matrix D such that D = P^{−1}AP . There is no need to calculate P^{−1}.
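A numerical check of the eigenvalues and eigenvectors found by hand (numpy assumed; this particular A has non-real eigenvalues, so the output is complex and the diagonalisation is over C):

    import numpy as np

    A = np.array([[ 0,  2, 1],
                  [-2,  0, 2],
                  [-1, -2, 0]], dtype=float)

    evals, evecs = np.linalg.eig(A)
    print(np.round(evals, 4))                          # the eigenvalues of A
    P = evecs                                          # columns are (unit) eigenvectors
    D = np.diag(evals)
    print(np.round(np.linalg.inv(P) @ A @ P - D, 4))   # should be (numerically) the zero matrix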

46. An n × n matrix is called idempotent if A² = A. Show that 0 and 1 are the only
possible eigenvalues of an idempotent matrix.

47. An n × n matrix A is called nilpotent if there exists a positive integer k such that A^k is the zero matrix.
(i) Show that, if all the entries of A on or below the diagonal are zero, then A is
nilpotent.
(ii) Show that, if A is nilpotent and B is similar to A, then B is nilpotent.
(iii) Show that every n × n matrix is the sum of a diagonalisable matrix and a
nilpotent matrix.
(iv) Show that, if λ is an eigenvalue of a nilpotent matrix, then λ = 0.
(v) Show that 0 is an eigenvalue of every nilpotent matrix.
 
48. (a) Let
    [ λ1  a  ]
    [ 0   λ2 ]
be a 2 × 2 matrix in Jordan normal form. Explain briefly why the entry a is either 0 or 1. If a = 1, what can you say about λ1 and λ2 ?
(b) Let S be the set of all 2 × 2 matrices
    [ λ1  a  ]
    [ 0   λ2 ]
in Jordan normal form, having λ1 ≤ λ2 . Show that no two matrices in S are similar.

49. Put the following matrices in Jordan normal form:


   
0 0 1 1 −1 0
A = 1 0 −3 ; B = 0 0 −1 .
0 1 3 1 −1 1
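sympy can compute a Jordan normal form directly, which is useful for checking an answer, although it may order the Jordan blocks differently from you. A sketch for the matrix A (sympy assumed, not part of the course):

    import sympy as sp

    A = sp.Matrix([[0, 0,  1],
                   [1, 0, -3],
                   [0, 1,  3]])

    P, J = A.jordan_form()                        # A = P * J * P**(-1)
    sp.pprint(J)                                  # Jordan normal form of A
    sp.pprint(sp.simplify(P * J * P.inv() - A))   # should be the zero matrix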

50. Solve the following systems of differential equations:
(a)  [ y1′ ]   [ 0  0   1 ] [ y1 ]
     [ y2′ ] = [ 1  0  −3 ] [ y2 ] ;
     [ y3′ ]   [ 0  1   3 ] [ y3 ]

(b)  [ y1′ ]   [ 1  −1   0 ] [ y1 ]   [ 0 ]
     [ y2′ ] = [ 0   0  −1 ] [ y2 ] + [ 0 ] .
     [ y3′ ]   [ 1  −1   1 ] [ y3 ]   [ 1 ]

51. You are given that M = P JP^{−1} , where
    M = (1/2) [  0  −2  2 ]      P = [ 0  1  1 ]      J = [ 1  1   0 ]
              [ −1  −1  3 ] ;        [ 1  0  1 ] ;        [ 0  1   0 ] .
              [  1  −1  3 ]          [ 1  1  0 ]          [ 0  0  −1 ]

(a) Use the information given to determine the eigenvalues and corresponding eigen-
vectors of M .
(b) Find the general solution to the differential equation y′ = M y.
(c) Describe the long-term behaviour of the solution y to the differential equation
above, in general.
(d) Find all solutions y to the differential equation such that y1 (t) → 0 as t → ∞,
and y2 (0) = 0.

52. Let A be a real diagonalisable matrix whose eigenvalues are all non-negative.
Show that there is a real matrix B such that B² = A. Is the matrix B unique?
What if A is not diagonalisable? (You need only consider the case where A is a 2 × 2
matrix.)

53. Consider the Leslie matrix
    L = [  0    2    1   0 ]
        [ 9/10  0    0   0 ] ,
        [  0   3/4   0   0 ]
        [  0    0   1/2  0 ]
representing the development of a population over each period of 10 years, where
the population is divided into the individuals of ages 0-10, 10-20, 20-30 and 30-40.
Assume that the population has been developing according to these dynamics for a
long period of time, and that there are 12,000 members of the population in 2010.
(a) Given that λ = 3/2 is the dominant eigenvalue of L, find the corresponding
eigenvector.
(b) Approximately how many individuals are there of age 0-10 in 2010? How many
of these are expected to survive until 2020?
(c) Approximately how many individuals were there in 2000?
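A numerical sketch that can be used to check parts of this question (numpy assumed): applying L repeatedly to any starting population should eventually grow it by a factor of about 3/2 per period, with the age distribution approaching the dominant eigenvector.

    import numpy as np

    L = np.array([[0,    2,   1,   0],
                  [9/10, 0,   0,   0],
                  [0,    3/4, 0,   0],
                  [0,    0,   1/2, 0]])

    p = np.array([1.0, 1.0, 1.0, 1.0])          # any starting population vector
    for _ in range(50):                          # iterate the dynamics for 50 periods
        p = L @ p

    print(np.round(p / p.sum(), 4))              # approximate stable age distribution
    print(round((L @ p).sum() / p.sum(), 4))     # approximate dominant eigenvalue; expect about 1.5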

54. Consider the Leslie matrix


 
0 3/2 33/25
L = 2/3 0 0 .
0 3/5 0

You are given that (−3 + √2 i)/5 is an eigenvalue of L. Find the other eigenvalues, and the eigenvector corresponding to the dominant eigenvalue.
You should not need to find the characteristic polynomial of L, or row-reduce the
matrix.
Suppose that L represents the development of a population over a period of 5 years,
where the population is divided into the individuals of ages 0-5, 5-10, and 10-15. Ex-
plain the significance of the dominant eigenvalue, and its corresponding eigenvector,
in terms of the age-structure of the population.

55. Consider the system of difference equations:

xk+1 = 3xk − yk ; yk+1 = xk + yk ; zk+1 = xk − yk + 2zk .

Find the general solution. For which initial values (x0 , y0 , z0 ) will zk be positive for
all sufficiently large k?
56. (a) Show that, if
    J = [ λ  1 ] ,    then    e^{tJ} = [ e^{tλ}   t e^{tλ} ] .
        [ 0  λ ]                       [   0       e^{tλ}  ]
(b) Find a similar formula if
    J = [ λ  1  0 ]
        [ 0  λ  1 ] .
        [ 0  0  λ ]
The expressions for the powers of a Jordan block in and after Theorem 12.2.1 should help.
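Your formula for e^{tJ} can be spot-checked against a library matrix exponential for particular values of λ and t. A sketch assuming scipy:

    import numpy as np
    from scipy.linalg import expm

    lam, t = 2.0, 0.5
    J = np.array([[lam, 1.0],
                  [0.0, lam]])

    print(np.round(expm(t * J), 6))                      # numerical matrix exponential
    print(np.round(np.exp(t * lam) * np.array([[1.0, t],
                                               [0.0, 1.0]]), 6))   # the claimed closed form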

57. For each of the two matrices below, find A^k for each k, and e^{tA} , for t ∈ R. Use these expressions to write down the general solution of y′ = Ay in each case.
    (i)  A = [ −1  −2 ] ;        (ii)  A = [ 0  −1 ] .
             [  2   3 ]                    [ 1   0 ]

Note: for (ii), it’s easier if you don’t start by diagonalising the matrix.
 
58. Show that the matrix
    A = [ 0  −1 ]
        [ 1   0 ]
is normal. Find a unitary matrix U such that U∗AU is diagonal.

59. Find unitary matrices P which reduce the following Hermitian matrices A to diagonal form (i.e., so that P∗AP is diagonal):
    (i)  [ 0  −i ] ;        (ii)  [ 1  0  1 ]
         [ i   0 ]                [ 0  4  0 ] .
                                  [ 1  0  1 ]

 
60. Consider the matrix
    A = [ 7  0  9 ]
        [ 0  2  0 ] .
        [ 9  0  7 ]
Find an orthogonal matrix P such that P^t AP is diagonal. Express A in the form λ1 E1 + λ2 E2 + λ3 E3 where the λi are the eigenvalues of A and the Ei are the 3 × 3 matrices vi vi^t , where the vi are the corresponding eigenvectors (taken to be unit vectors).
Verify that Ei Ej is equal to the zero matrix if i ≠ j and otherwise Ei Ei = Ei .

61. Let A be an invertible n × n complex matrix. Recall that AA∗ is positive definite.
(a) Show that every positive definite matrix P has a positive definite square root, namely a matrix Q, denoted by P^{1/2} , such that Q² = P .
(b) Let P = (AA∗)^{1/2} . Show that P^{−1}AA∗(P^{−1})∗ = P^{−1}AA∗P^{−1} = In .
(c) Show that A = P U , where U is unitary. (The conclusion is that every invertible complex matrix can be written as the product of a positive definite matrix and a unitary matrix.)

62. Show that, if λ1 , . . . , λk are the non-zero eigenvalues of a Hermitian matrix A,


then |λ1 |, . . . , |λk | are the singular values of A.
 
63. Find the singular values of the matrix
    A = [ 1  1  0 ] .
        [ 1  0  1 ]
What is the norm of A? Find a vector x maximising ‖Ax‖/‖x‖.
Find the singular value decomposition of A.
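A numerical check for this question (numpy assumed): the norm of A is its largest singular value, and a corresponding right singular vector maximises ‖Ax‖/‖x‖.

    import numpy as np

    A = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0]])

    U, s, Vt = np.linalg.svd(A)       # A = U (diag s, suitably padded) Vt
    print(np.round(s, 4))             # singular values, largest first
    x = Vt[0]                         # right singular vector for the largest singular value
    print(np.round(np.linalg.norm(A @ x) / np.linalg.norm(x), 4))   # equals the largest singular value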
 
64. Find the norm of
    A = [ 1  1 ] ,
        [ 0  1 ]
and a vector x maximising ‖Ax‖/‖x‖.

65. Let A be an invertible matrix. What are the singular values of A^{−1}, in terms of the singular values of A? What is the norm of A^{−1}?

66. For a subset S of a complex inner product space V , define S⊥ = {v ∈ V | ⟨u, v⟩ = 0 for all u ∈ S}. Show that S⊥ is a subspace of V . Show also that S is contained in (S⊥)⊥ .

67. If C and D are subsets of a vector space with C ⊆ D, show that D⊥ ⊆ C ⊥ .

68. Consider the matrix
    A = [ 3  1  10 ] .
        [ 1  2   5 ]
Determine the range and nullspace of A, and also the range and nullspace of A^t .
Verify that R(A^t) and N (A) are orthogonal complements, as are R(A) and N (A^t).

69. Let U = span({(1, 2, 1)t , (1, 0, 0)t }) and V = span({(0, 1, 1)t }) be subspaces of
R3 . Find the matrix A representing the projection onto U parallel to V . Find also
the matrix B representing the projection onto V parallel to U .
Evaluate A + B, and explain the answer you get.

70. Let v = (v1 , . . . , vn )t be any vector in Rn with ||v|| = 1. Define the n × n matrix
A = vv^t , where v^t is represented as a horizontal 1 × n matrix and v is represented
as a vertical n × 1 matrix. Show that A is symmetric and idempotent, of rank one.
Hence show that A represents orthogonal projection onto span({v}).
With A as above, set B = I − 2A. Show that B is symmetric and orthogonal, and
that B 2 = I. Is B orientation-preserving or orientation-reversing? What transfor-
mation does it represent?
Hint: what are the eigenvalues of A, counted according to multiplicity? How do the
eigenvalues and eigenvectors of B correspond to those of A?

71. Let S be the subspace of R3 spanned by (1, 2, 3)t and (1, 1, −1)t . Find the
matrix A representing orthogonal projection onto S.

72. Find an orthogonal and a non-orthogonal projection in R3 , each of rank two, that send the vectors (1, −1, 0)t and (0, 2, −1)t to themselves.

73. Consider the temperature data given at the end of Section 16.2:

    t :  0    π/3   2π/3    π    4π/3   5π/3
    T :  2     10     18    24     22     14

Find the best approximation to this data of the form T = f (t) = a + b cos t + c sin t.
Evaluate your function f (t) at the values t = 0, π/3, . . . , 5π/3.
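The coefficients here can be checked by setting up the design matrix with columns 1, cos t, sin t and calling a least squares solver. A sketch assuming numpy:

    import numpy as np

    t = np.array([0, np.pi/3, 2*np.pi/3, np.pi, 4*np.pi/3, 5*np.pi/3])
    T = np.array([2, 10, 18, 24, 22, 14], dtype=float)

    # Design matrix: one row per data point, columns for 1, cos t, sin t.
    X = np.column_stack([np.ones_like(t), np.cos(t), np.sin(t)])

    coeffs, *_ = np.linalg.lstsq(X, T, rcond=None)
    print(np.round(coeffs, 4))       # least squares estimates of a, b, c
    print(np.round(X @ coeffs, 4))   # fitted values of f at the six data points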

74. Find the least squares fit for a curve of the form
    y = 6m/x + c
given the following data points:
    x :  1  2  3  6
    y :  5  3  2  1
Why would it be wrong to suppose this was equivalent to the problem of fitting a curve of the form
    z (= xy) = cx + 6m
through the data points (xy, x)?

75. Suppose we are given a number of different measurements a1 , . . . , an of the value


of a constant x. Use the least squares method to find the best fit function to this
dataset: (x, a1 ), (x, a2 ), . . . , (x, an ).

76. Given the four points (0, 0), (1, 1), (2, 4) and (4, 7) in R2 , find:
(a) the line in R2 minimising the sum of the squares of the vertical distances from the points to the line,
(b) the line in R2 minimising the sum of the squares of the horizontal distances from the points to the line.

77. Let P 3 be the space of real polynomial functions of degree at most 3, equipped with the inner product given by ⟨f, g⟩ = ∫_{−1}^{1} f (x)g(x) dx. In Q28, we found an orthonormal basis of the subspace P 2 = span({1, x, x²}) of P 3 .
Using this basis, find a least squares approximation to x³ in P 2 . What exactly is being minimised by the least squares approximation?

78. Consider the following data: (−3, 3), (0, 1), (1, 4), where the first coordinate is the value for x and the second the value for y. Find the least squares approximation using only functions of the form y = ax + b, and check how closely it approximates the data. What is the least squares approximation using functions of the form y = ax² + bx + c, and how closely does it approximate the data?

79. Write down the equations which u, v, w, x, y and z must satisfy in order for
    [ u  v ]
    [ w  x ]
    [ y  z ]
to be a right inverse of
    A = [ 1  −1  1 ] .
        [ 1   1  2 ]
Hence find all the right inverses of A.
Show that A has no left inverse.

80. Express the matrix
    A = [ 1  1  0  −1 ]
        [ 0  1  1   1 ]
        [ 1  1  0  −1 ]
in the form A = BC where B and C are both of rank 2 and of sizes 3 × 2 and 2 × 4 respectively.
Hence calculate A^s and find the matrix which represents the orthogonal projection of R3 onto the column space of A.

81. Find the strong generalised inverse of
    A = [  1   0   0  −1 ]
        [ −1   1   0   0 ] .
        [  0  −1   1   0 ]
        [  0   0  −1   1 ]
Find also a weak generalised inverse of A that is not the strong generalised inverse
and has higher rank.
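If the strong generalised inverse of these notes agrees with the Moore-Penrose pseudoinverse (the usual convention, but check this against the definition you are given), numpy provides a numerical check:

    import numpy as np

    A = np.array([[ 1,  0,  0, -1],
                  [-1,  1,  0,  0],
                  [ 0, -1,  1,  0],
                  [ 0,  0, -1,  1]], dtype=float)

    G = np.linalg.pinv(A)                 # Moore-Penrose pseudoinverse of A
    print(np.round(G, 4))
    # The defining properties to compare against a hand computation:
    print(np.allclose(A @ G @ A, A))      # A G A = A
    print(np.allclose(G @ A @ G, G))      # G A G = G
    print(np.allclose((A @ G).T, A @ G))  # A G symmetric
    print(np.allclose((G @ A).T, G @ A))  # G A symmetric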

82. Let A be a real matrix, with strong generalised inverse A^s . Show that (A^s)^t = (A^t)^s , i.e., that the transpose of A^s is the strong generalised inverse of A^t .
Deduce that, if A is symmetric, then so is A^s .
 
83. Let A be the singular 2 × 2 matrix
    A = [ 1   x  ] ,
        [ y  xy ]
where x and y are real numbers, not both zero. Find a formula for the strong generalised inverse A^s of A.

84. Assume that there are only two values for x, namely 0 and 1, but there are four pairs of data, (0, 3), (0, 2), (1, 5), (1, 3), where the first coordinate is the independent variable x and the second the dependent variable y. Using the strong generalised inverse, find the least squares approximation function to the data in span({1, x, x²}).

85. Let P be the space of all polynomial functions. Define the linear transformation
D : P → P by D(f ) = f ′ , i.e., each function f in P is mapped to its derivative.
(i) Does D have an inverse?
(ii) Find a right inverse of D, i.e., a linear transformation D∗ : P → P such that
DD∗ is the identity.

86. A linear transformation T : V → V on an inner product space V has bounded norm if there is a positive real number r such that ||T (v)|| ≤ r||v|| for all v ∈ V .
As in the previous question, let P be the space of all polynomial functions, which we now equip with the inner product ⟨f, g⟩ = ∫_{0}^{1} f (x)g(x) dx. Show that the linear transformation D in the previous question does not have bounded norm.

87. Find the best approximation of the function f (t) = t in the interval (−π, π)
by a linear combination of the functions e^{kit} , for k in some range −n ≤ k ≤ n, and
represent this approximation as a real-valued function.
Write the function t as a sum of functions of the form ck sin(kt) and dk cos(kt), for
k > 0.
Evaluate both sides of your final expression at t = π/2 to get an expression for π.

88. Find the best approximation of the function t² in the interval (−π, π) by a linear combination of the functions e^{kit} , for k in some range −n ≤ k ≤ n, and represent this approximation as a real-valued function.
By letting n tend to ∞ and evaluating at t = 0, show that
    ∑_{k=1}^{∞} (−1)^{k+1}/k² = 1 − 1/4 + 1/9 − 1/16 + 1/25 − · · · = π²/12 .
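The claimed limit is easy to check numerically from a long partial sum of this alternating series:

    import math

    partial = sum((-1)**(k + 1) / k**2 for k in range(1, 200001))
    print(partial)            # partial sum up to k = 200000
    print(math.pi**2 / 12)    # claimed limit; the two values should agree to several decimal places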
