
5CCM222A Linear Algebra

1 Vector Spaces
Definition. A real vector space is a set V together with two operations + and . called
addition and scalar multiplication respectively, such that:
(i) addition is a binary operation on V which makes V into an abelian group and
(ii) the operation of scalar multiplying an element v ∈ V by an element λ ∈ R gives an
element λ.v of V.
In addition, the following axioms must be satisfied:
(a) 1.v = v for all v ∈ V
(b) If λ, μ ∈ R and v ∈ V then λ.(μ.v) = (λμ).v
(c) If λ ∈ R and v, w ∈ V then λ.(v + w) = λ.v + λ.w
(d) If λ, μ ∈ R and v ∈ V then (λ + μ).v = λ.v + μ.v
The definition of a complex vector space is exactly the same except that the field R of scalars is
replaced by the complex field C.
In fact one can have vector spaces over any field; in this course we will not consider any
other fields (but most of what we do will hold with no change for any field).
Informally, a vector space is a set of elements that can be added and multiplied by scalars
and obey the usual rules.
Definition. The span of a finite set S = {s_1, s_2, ..., s_k} of elements of a vector space V,
denoted by span S, is the set of all finite sums of the form
λ_1 s_1 + λ_2 s_2 + ... + λ_k s_k
where λ_i ∈ F (some may be zero) and s_i ∈ S. If the span of a set S is the whole vector space V
(i.e. if span S = V) then S is called a spanning set.
We call a sum of the form λ_1 s_1 + λ_2 s_2 + ... + λ_k s_k a linear combination of elements of S;
thus span S is the set of all linear combinations of elements of S.
Definition. A set S = {s_1, s_2, ..., s_k} is said to be linearly dependent if there exist scalars
λ_1, λ_2, ..., λ_k, not all equal to 0, such that λ_1 s_1 + λ_2 s_2 + ... + λ_k s_k = 0. If S is not linearly
dependent we say that S is linearly independent. Thus, S is linearly independent if
whenever λ_1 s_1 + λ_2 s_2 + ... + λ_k s_k = 0 we must have λ_1 = λ_2 = ... = λ_k = 0.
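A quick numerical check of this definition may help. The sketch below is not part of the notes and assumes Python with numpy is available; it uses the fact that a set of column vectors is linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors.

    import numpy as np

    s1 = np.array([1.0, 0.0, 2.0])
    s2 = np.array([0.0, 1.0, 1.0])
    s3 = s1 + s2                      # s3 lies in span{s1, s2}, so the set is dependent

    S = np.column_stack([s1, s2, s3])
    print(np.linalg.matrix_rank(S))   # 2 < 3, hence {s1, s2, s3} is linearly dependent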
Definition. Elementary Operations. The following are called elementary operations on
a finite set of vectors.
(I) Interchange two vectors.
(II) Multiply one vector by a non-zero scalar.
(III) Add a scalar multiple of one vector to another vector.
Remark. If S′ = {s′_1, s′_2, ..., s′_k} is obtained from S = {s_1, s_2, ..., s_k} by means of an
elementary operation then
(i) span(S′) = span(S),
(ii) S′ is linearly independent if and only if S is linearly independent.
Definition. A finite set {v_1, v_2, ..., v_n} of elements of V is said to be a basis of V if
it is linearly independent and spans V. If V has a (finite) basis we say that V is finite-
dimensional.
Theorem 1.1 If S = {v_1, v_2, ..., v_m} spans V then there is a basis of V which is a subset
of S.
Theorem 1.2 If V is finite-dimensional and S = {v_1, v_2, ..., v_k} is linearly independent,
then there is a basis of V which contains S.
Theorem 1.3 If {v_1, v_2, ..., v_n} is a basis of V and {w_1, w_2, ..., w_m} is a set of m vectors
with m > n, then {w_1, w_2, ..., w_m} is linearly dependent.
Theorem 1.4 [BASIS THEOREM] Every basis of a finite-dimensional vector space has the
same number of elements.
Definition. If V is a finite-dimensional vector space the dimension of V is the number of
elements in any basis of V.
Co-ordinates. Let V be a finite-dimensional vector space with basis v_1, v_2, ..., v_n. Then
any element v ∈ V can be written uniquely in the form
v = λ_1 v_1 + λ_2 v_2 + ... + λ_n v_n
where the λ_i are scalars. The n-tuple (λ_1, λ_2, ..., λ_n)^t is called the n-tuple of co-ordinates of v with
respect to the basis v_1, v_2, ..., v_n. (Co-ordinates of vectors are normally regarded as column
vectors but sometimes written in rows to save space.)
Theorem 1.5 If V is a real (complex) n-dimensional vector space then V is isomorphic to
R^n (C^n).
Theorem 1.6 Two finite-dimensional vector spaces (over the same field) are isomorphic if
and only if they have the same dimension.
Definition. A subset W of a vector space V is called a subspace if it is a vector space (with
the operations as in V).
Lemma 1.7 If W is a subspace of an n-dimensional vector space V then W is finite-
dimensional with dimension m ≤ n. The case m = n holds if and only if W = V.
Definition. Given two bases v_1, v_2, ..., v_n and w_1, w_2, ..., w_n of a vector space V, the
transition matrix (from v_i to w_i) is the matrix T which maps the co-ordinates with
respect to v_i of each vector to the co-ordinates of the same vector with respect to w_i.
Lemma 1.8 If v_1, v_2, ..., v_n and w_1, w_2, ..., w_n are two bases of V and
v_j = Σ_{i=1}^{n} t_ij w_i = t_1j w_1 + t_2j w_2 + ... + t_nj w_n
then the transition matrix T from v_i to w_i is given by T = (t_ij).
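As an illustration (not in the original notes, and assuming numpy is available), if the two bases of R^n are stored as the columns of matrices V and W, then the relation v_j = Σ_i t_ij w_i reads V = WT, so T can be computed by solving WT = V.

    import numpy as np

    V = np.array([[1.0, 1.0],
                  [0.0, 1.0]])        # basis v_1, v_2 as columns
    W = np.array([[2.0, 0.0],
                  [0.0, 1.0]])        # basis w_1, w_2 as columns

    T = np.linalg.solve(W, V)         # transition matrix from v_i to w_i

    a = np.array([3.0, 4.0])          # co-ordinates of a vector with respect to v_1, v_2
    print(V @ a, W @ (T @ a))         # the same vector of R^2, so T maps v-co-ordinates to w-co-ordinates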
2 Linear Maps
Definition. Let V and W be vector spaces. A map (or function or transformation)
f : V → W is said to be linear if it preserves (or respects) addition and scalar multiplication,
that is
(i) f(v_1 + v_2) = f(v_1) + f(v_2) for all v_1, v_2 ∈ V and
(ii) f(λv) = λf(v) for all scalars λ and v ∈ V.
Remark. The inverse of an invertible linear map is linear. The result of composing two
linear maps is linear.
Theorem 2.1 Let V and W be real vector spaces of dimensions n and m respectively and
let f be a linear map f : V → W. Choose bases v_1, v_2, ..., v_n of V and w_1, w_2, ..., w_m
of W. For each i, let
f(v_i) = α_1i w_1 + α_2i w_2 + ... + α_mi w_m = Σ_{j=1}^{m} α_ji w_j
be the expansion of f(v_i) in terms of the basis w_1, w_2, ..., w_m and let M_f = (α_ij) be the
matrix whose i-th column is (α_1i, α_2i, ..., α_mi)^t. Then the map taking the co-ordinates of any
vector v ∈ V to the co-ordinates of f(v) is a linear map from R^n to R^m which is implemented
by the m × n matrix M_f.
Definition. The matrix M_f of the above theorem is called the matrix of the linear map f
(with respect to the chosen bases v_i, w_j).
Remark. The form of the matrix of a linear map depends crucially on the choice of bases
and quite different matrices can arise from the same map by varying the bases.
Definition. Given a linear map f : V → W, the kernel (or null space) ker(f) and the image
(or range) im(f) are defined by
(i) ker(f) = {v : f(v) = 0},
(ii) im(f) = {w : w = f(v) for some v ∈ V}.
Remark. A linear map f : V → W
(i) is injective if and only if ker(f) = (0),
(ii) is surjective if and only if im(f) = W.
Definition. The rank of a linear map f : V → W is the dimension of im(f); the nullity of
f is the dimension of ker(f).
Theorem 2.2 If f is a linear map f : V → W, then
rank(f) + nullity(f) = dim(V).
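For the linear map x ↦ Ax determined by a matrix A, the theorem can be verified numerically. This sketch is an illustration only; it assumes numpy and scipy are available and uses scipy.linalg.null_space to obtain a basis of the kernel.

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0, 1.0],
                  [1.0, 3.0, 1.0, 2.0]])    # a map from R^4 to R^3; third row = first + second

    rank = np.linalg.matrix_rank(A)
    nullity = null_space(A).shape[1]        # number of vectors in a basis of ker(A)
    print(rank, nullity, rank + nullity)    # 2, 2 and 4 = dim(R^4)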
Corollary 2.3 For any linear map f : V → W, one can choose bases v_1, v_2, ..., v_n of V
and w_1, w_2, ..., w_m of W such that the matrix of f with respect to the chosen bases has
the partitioned matrix form

M_f = [ I_r  0 ]
      [ 0    0 ]

where r = rank(f), I_r is the r × r identity matrix and 0 denotes zero matrices of appropriate
sizes. (If r = m then the bottom row is absent; if r = n then the right hand column is absent;
if r = 0 then M_f is the zero matrix.)
Corollary 2.4 (a) If f : V → W is invertible then V and W have the same dimension.
(b) If V and W have the same dimension n (in particular, if V = W) then the following are
equivalent:
(i) f is injective, i.e. ker(f) = (0) or nullity(f) = 0,
(ii) f is surjective, i.e. im(f) = W or rank(f) = n,
(iii) f is invertible.
Lemma 2.5 Let f : V → W be a linear map.
(i) If g : V → V is surjective (in particular if g is invertible) then rank(f ∘ g) = rank(f).
(ii) If h : W → W is injective (in particular if h is invertible) then rank(h ∘ f) = rank(f).
Theorem 2.6 [CHANGE OF BASES] Let f : V → W be a linear map. Let M_f be the
matrix of f with respect to bases v_1, v_2, ..., v_n of V and w_1, w_2, ..., w_m of W and let
M′_f be the matrix of f with respect to bases v′_1, v′_2, ..., v′_n of V and w′_1, w′_2, ..., w′_m of W.
Then
M_f = S^{-1} M′_f T
where T is the transition matrix from v_i to v′_i and S is the transition matrix from w_i to w′_i.
Corollary 2.7 Let M be any m × n matrix. Then there exists an invertible m × m matrix
X and an invertible n × n matrix Y such that

XMY = [ I_r  0 ]
      [ 0    0 ]

where r = colrk(M), I_r is the r × r identity matrix and 0 denotes zero matrices of appropriate
sizes. (If r = m then the bottom row is absent; if r = n then the right hand column is absent;
if r = 0 then M = XMY = 0 is the zero matrix.)
Corollary 2.8 The column rank of any matrix M is the same as the column rank of its
transpose.
Consequently,
dim[span(rows of M)] = dim[span(columns of M)].
This result is often described as "row rank = column rank".
Lemma 2.9 Let f and g be two linear maps from V to V and let M_f and M_g be the matrices
of f and g with respect to the same basis v_1, v_2, ..., v_n of V (in both the domain and the
co-domain). Then M_{f∘g} = M_f M_g.
Lemma 2.10 Let f and g be two linear maps from V to V. Then
nullity(f ∘ g) ≤ nullity(f) + nullity(g)
and hence
rank(f ∘ g) ≥ rank(f) + rank(g) - dim(V).
3 Equations, Matrices and Determinants
Theorem 3.1 [Homogeneous Linear Equations.] Let A be an m × n matrix. The homogeneous
system of linear equations
Ax = 0
has a non-trivial solution if and only if rank(A) < n. The dimension of the solution space
is n - rank(A).
Consequently, if m < n (i.e. if there are more unknowns than equations) then there is always
a non-trivial solution.
Theorem 3.2 [Non-Homogeneous Linear Equations.] Let A be an m × n matrix and b be a
non-zero m × 1 vector. The non-homogeneous system of linear equations
Ax = b
has a solution if and only if rank(A) = rank([A|b]), where [A|b] is the m × (n+1) augmented
matrix obtained by adjoining b to A as the (n+1)-th column. If a solution x_0 to the system
exists then every solution is of the form x_0 + y where y is any solution of the homogeneous
system Ax = 0. The system has a unique solution if and only if rank(A) = rank([A|b]) = n.
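The rank criterion is easy to test numerically; the following sketch (an illustration assuming numpy, not part of the notes) compares rank(A) with rank([A|b]) for a solvable and an unsolvable right hand side.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [2.0, 4.0]])              # rank 1
    b_good = np.array([[1.0], [2.0]])       # lies in the column space of A
    b_bad  = np.array([[1.0], [3.0]])       # does not

    for b in (b_good, b_bad):
        augmented = np.hstack([A, b])
        print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(augmented))  # True, then False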
Definition. The following are called elementary row operations on a matrix.
(I) Interchange two rows.
(II) Multiply one row by a non-zero scalar.
(III) Add a scalar multiple of one row to another row.
The elementary column operations are the same operations applied to columns.
Algorithm for matrix inverses. (Reminder.) Start by writing the identity matrix next
to the given matrix: (I|A). Successively apply the same elementary row operations to each
matrix until the right hand matrix is the identity (this can always be done if A is invertible).
Then the left hand matrix is the inverse of A.
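A minimal implementation of this (I|A) algorithm is sketched below, purely as an illustration and assuming numpy; it adds partial pivoting for numerical safety and assumes A is invertible.

    import numpy as np

    def inverse_by_row_reduction(A):
        n = A.shape[0]
        aug = np.hstack([np.eye(n), A.astype(float)])            # the block matrix (I | A)
        for col in range(n):
            pivot = col + np.argmax(np.abs(aug[col:, n + col]))  # choose a pivot row
            aug[[col, pivot]] = aug[[pivot, col]]                # operation (I): interchange rows
            aug[col] /= aug[col, n + col]                        # operation (II): scale the pivot row
            for r in range(n):
                if r != col:
                    aug[r] -= aug[r, n + col] * aug[col]         # operation (III): eliminate
        return aug[:, :n]                                        # right block is now I, left block is A^{-1}

    A = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
    print(inverse_by_row_reduction(A) @ A)                       # approximately the identity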
Algorithm for finding the matrices X and Y of Corollary 2.7. Let M be the given
m × n matrix. Start by writing the m × m identity matrix to the left and the n × n
identity matrix to the right of the given matrix: (I_m | M | I_n). Successively apply the same
elementary row operations to the left hand and middle matrix and the same elementary
column operations to the right hand and middle matrix until the middle matrix is of the
form

[ I_r  0 ]
[ 0    0 ]

(this can always be done). If the left and right hand matrices at the end are X and Y
respectively then XMY is of the required form (as above).
Elementary Matrices.
Definition. The result of applying an elementary operation to the identity matrix is called
an elementary matrix. There are three types:
E_I(p, q)      interchanging row p and row q,
E_II(p, λ)     multiplying row p by a non-zero scalar λ,
E_III(p, q, λ) adding λ times row p to row q (where λ is a scalar).
Therefore
(I) all the diagonal entries of E_I(p, q) except the p-th and the q-th are 1, the (p, q)-th and the
(q, p)-th entries are 1 and all the remaining entries are 0.
(II) all the diagonal entries of E_II(p, λ) except the p-th are 1, the (p, p)-th entry is λ and all
the remaining entries are 0.
(III) all the diagonal entries of E_III(p, q, λ) are 1, the (q, p)-th entry is λ and all the remaining
entries are 0.
Lemma 3.3 The result of performing an elementary operation on the rows of a matrix is
the same as pre-multiplying the matrix by the corresponding elementary matrix.
The result of performing an elementary operation on the columns of a matrix is the same as
post-multiplying the matrix by the corresponding elementary matrix.
Remark. Elementary matrices are invertible. The inverses of E_I(p, q), E_II(p, λ) and
E_III(p, q, λ) are E_I(p, q), E_II(p, λ^{-1}) and E_III(p, q, -λ) respectively.
Remark. In view of Lemma 3.3 it is easy to see that the algorithms described in this
section must work.
Theorem 3.4
(i) Every invertible matrix can be written as a product of elementary matrices.
(ii) Every m × n matrix A can be written in the form PI(r)Q where

I(r) = [ I_r  0 ]
       [ 0    0 ]

and P and Q are invertible and therefore a product of elementary matrices. Here r =
rank(A), I_r is the r × r identity matrix and 0 denotes zero matrices of appropriate sizes. (If
r = m then the bottom row is absent; if r = n then the right hand column is absent.)
Equivalence. Two matrices A and B are said to be equivalent (written A ~ B) if there are
invertible matrices P and Q such that A = PBQ. Note that equivalence is an equivalence relation.
The results above can be described as follows. For m × n matrices, the rank is a complete invariant
and the matrices I(r) are a set of normal (or canonical) forms for matrices under equivalence.
[A complete invariant for an equivalence relation is a quantity (here an integer) associated with
each element that uniquely identifies its equivalence class.
A set of normal (or canonical) forms for an equivalence relation is a set of specific (preferably as
simple as possible) representatives, exactly one from each equivalence class.]
Determinants. A brief review of the properties of determinants will be given in lectures.
You will be expected to know and use these properties but no proofs of any properties
of determinants will be required in the examination.
Let A be a square matrix. The determinant det(A) is a scalar associated with A. There are
two common definitions of determinants.
Definition. [Inductive Definition.]
(i) The determinant of the 1 × 1 matrix (a) is a.
(ii) Given an n × n matrix A and integers i, j with 1 ≤ i ≤ n, 1 ≤ j ≤ n, the minor M_ij
is defined to be the determinant of the (n-1) × (n-1) matrix obtained by deleting the i-th
row and the j-th column of A.
(iii) Supposing that determinants have been defined for (n-1) × (n-1) matrices, the
determinant of the n × n matrix A = (a_ij) is defined by
det(A) = a_11 M_11 - a_12 M_12 + ... + (-1)^{n+1} a_1n M_1n = Σ_{k=1}^{n} (-1)^{k+1} a_1k M_1k.
Note. The quantity (-1)^{i+j} M_ij is called the cofactor of a_ij.
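Written as code, the inductive definition is a direct recursion on minors. The sketch below (an illustration only, assuming numpy, and far too slow for large matrices) follows expansion along the first row.

    import numpy as np

    def det_cofactor(A):
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for k in range(n):                                           # expand along the first row
            minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)    # delete row 1 and column k+1
            total += (-1) ** k * A[0, k] * det_cofactor(minor)       # sign (-1)^{1+(k+1)} = (-1)^k
        return total

    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 4.0, 5.0],
                  [1.0, 0.0, 6.0]])
    print(det_cofactor(A), np.linalg.det(A))                         # both approximately 22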
The second definition is given purely for your general interest and will not be covered in the
course. It is somewhat more sophisticated but, once mastered, the proofs of the properties are
more transparent. The definition requires some knowledge about permutations. Recall that a
bijective mapping of the integers 1 to n onto itself is called a permutation. The set of all such
permutations, with composition as the operation of product, forms a group denoted by S_n.
Here are the main properties of permutations.
(i) Every permutation σ ∈ S_n can be written as a product σ = τ_1 τ_2 ... τ_s where each τ_i is a
transposition (i.e. a permutation that exchanges two elements of {1, 2, ..., n} and leaves the rest
fixed).
(ii) Although there are many such expressions for σ and the number s of transpositions varies,
nevertheless, for given σ, s is either always even (in which case we say that σ is an even permutation)
or always odd (when we say σ is odd).
(iii) We define the sign sgn(σ) of σ to be 1 if σ is even and -1 if σ is odd (i.e. sgn(σ) = (-1)^s).
(iv) The map sgn : σ → sgn(σ) is a homomorphism from S_n to the multiplicative group {1, -1},
i.e. sgn(στ) = sgn(σ) sgn(τ) for all σ, τ ∈ S_n.
The second definition of determinant is based on the idea that each term in the evaluation of a
determinant is a product of n entries with exactly one from each row and from each column. Thus
from the first row a term might have a_{1n_1} where n_1 is any integer between 1 and n, then from the
second a_{2n_2} but now with n_2 ≠ n_1, and so on. It is clear that the mapping i → n_i is a permutation σ of
the integers 1 to n and that there is one such term for each such permutation σ. If the permutation
σ is even we attach a plus sign to the term a_{1σ(1)} a_{2σ(2)} ... a_{nσ(n)} and if σ is odd, we attach a minus
sign. The sum of these signed terms is the determinant. Here is the formal definition.
Definition. [Permutation definition.]
The determinant of an n × n matrix A = (a_ij) is defined by
det(A) = Σ_{σ ∈ S_n} sgn(σ) a_{1σ(1)} a_{2σ(2)} ... a_{nσ(n)}
where the sum is taken over the group S_n of all permutations of the integers 1, 2, 3, ..., n.
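For comparison, the permutation definition can also be transcribed directly. In this sketch (an illustration only, assuming numpy and the standard library) the sign of a permutation is computed by counting inversions, which is equivalent to the transposition-parity definition given above.

    import itertools
    import numpy as np

    def sgn(p):
        inversions = sum(1 for i in range(len(p))
                           for j in range(i + 1, len(p)) if p[i] > p[j])
        return -1 if inversions % 2 else 1

    def det_permutation(A):
        n = A.shape[0]
        return sum(sgn(p) * np.prod([A[i, p[i]] for i in range(n)])
                   for p in itertools.permutations(range(n)))

    A = np.array([[1.0, 2.0, 3.0],
                  [0.0, 4.0, 5.0],
                  [1.0, 0.0, 6.0]])
    print(det_permutation(A), np.linalg.det(A))      # both approximately 22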
Properties of Determinants.
0. The determinant of a matrix equals the determinant of its transpose.
1. Interchanging two rows (or columns) of a matrix changes the sign of the determinant.
2. Multiplying each element of a row (or of a column) by a non-zero scalar multiplies the
determinant by the same scalar.
3. Adding a scalar multiple of a row (or column) to another row (or column) does not change
the value of the determinant.
Note that Properties 1, 2, 3 describe the effect that the three elementary operations applied
to a matrix have on its determinant. Note also that det(E_I(p, q)) = -1, det(E_II(p, λ)) = λ,
det(E_III(p, q, λ)) = 1. Thus the above properties can be written in symbols as follows:
1. det(E_I(p, q)A) = -det(A) = det(E_I(p, q)).det(A),
2. det(E_II(p, λ)A) = λ det(A) = det(E_II(p, λ)).det(A),
3. det(E_III(p, q, λ)A) = det(A) = det(E_III(p, q, λ)).det(A).
For columns we have:
1. det(AE_I(p, q)) = -det(A) = det(A) det(E_I(p, q)),
2. det(AE_II(p, λ)) = λ det(A) = det(A) det(E_II(p, λ)),
3. det(AE_III(p, q, λ)) = det(A) = det(A) det(E_III(p, q, λ)).
Theorem 3.5 [Multiplicative property of determinants] For n × n matrices A and B,
det(AB) = det(A).det(B).
Remark. If A has any row (or column) consisting entirely of zeros then det(A) = 0 (this
follows immediately from either definition). Since elementary operations cannot change the
determinant from a zero value to a non-zero value (or vice versa) it follows that an n × n
matrix A is invertible if and only if det(A) ≠ 0. Also, it is easy to see from either definition
that the determinant of a matrix in echelon form is the product of the diagonal terms (if A
is not invertible there will be a zero row and hence a zero diagonal term).
4 Similarity and diagonalisation
In this section we deal with matrices of linear maps from a finite-dimensional vector space V
into itself. Such a map is called an endomorphism of V or a linear operator on V. We shall
consider matrices of such maps with respect to the same basis of V in both the domain and
the co-domain. This ensures that composition of the maps corresponds to multiplication of
their matrices: M_{f∘g} = M_f.M_g. We shall be particularly concerned with the cases when the
basis can be chosen so as to make the matrix diagonal. In this section, unless otherwise
stated, the field of scalars will be the complex numbers C.
Definition. A matrix A is said to be similar to the matrix B if there exists an invertible
matrix R such that A = R^{-1}BR.
If A is similar to B we write A ~ B. Note that similarity is an equivalence relation.
Let f : V → V be a linear map. If M_f is its matrix with respect to one basis and M′_f its
matrix with respect to another basis (in each case taking the same basis in the domain and
codomain) then it follows immediately from Theorem 2.6 that M_f and M′_f are similar, with
the transition matrix from one basis to the other effecting the similarity. Also, since any
invertible matrix is a transition matrix corresponding to some change of basis, any matrix
similar to M_f is the matrix of f with respect to some basis.
Definition. A linear operator f : V → V is said to be diagonalisable if the matrix of f
with respect to some basis of V is diagonal. A matrix A is said to be diagonalisable if it is
similar to some diagonal matrix.
It follows that f is diagonalisable if and only if the matrix of f with respect to some basis of
V is diagonalisable.
For the remainder of this section, A will denote an n × n matrix.
Theorem 4.1 A matrix A is diagonalisable if and only if there is a basis consisting of
eigenvectors of A.
Definition. The characteristic polynomial of a matrix A is det(XI - A), denoted by
p_A(X). The characteristic equation of A is det(XI - A) = 0.
Remark. The degree of the characteristic polynomial is n (when the matrix is n × n).
Similar matrices have the same characteristic polynomial. The eigenvalues of a matrix are
the roots of its characteristic equation.
Theorem 4.2 If x_1, x_2, ..., x_k are eigenvectors of a matrix A corresponding to eigenvalues
λ_1, λ_2, ..., λ_k which are all different, then {x_1, x_2, ..., x_k} is a linearly independent
set.
Corollary 4.3 If A has n distinct eigenvalues then it is diagonalisable.
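A numerical illustration of Corollary 4.3 (not part of the notes, assuming numpy): for a matrix with distinct eigenvalues, the matrix R whose columns are eigenvectors diagonalises A.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])             # eigenvalues 2 and 3, distinct

    eigenvalues, R = np.linalg.eig(A)      # columns of R are eigenvectors of A
    D = np.linalg.inv(R) @ A @ R           # R^{-1} A R
    print(np.round(D, 10))                 # diagonal, with 2 and 3 on the diagonal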
Definition. For any eigenvalue λ of a matrix A, the number of times X - λ appears as a
factor of the characteristic polynomial p_A(X) is called the algebraic multiplicity of λ as an
eigenvalue. (Sometimes this is referred to simply as the multiplicity of λ.)
Theorem 4.4 The matrix A is diagonalisable if and only if for each eigenvalue λ of A with
algebraic multiplicity m_λ we have nullity(λI - A) = m_λ.
Proof. If A is diagonalisable then for some invertible matrix R, R^{-1}AR is a diagonal
matrix D. Since similar matrices have the same characteristic polynomials, the diagonal
entries of D are the eigenvalues of A repeated according to algebraic multiplicity. Thus,
λI - D has zero in exactly m_λ places on the diagonal. Therefore m_λ = nullity(λI - D) =
nullity(R^{-1}(λI - A)R) = nullity(λI - A).
Conversely, suppose A has r distinct eigenvalues λ_1, λ_2, ..., λ_r. Since ker(λ_k I - A) has
dimension m_{λ_k}, it has a basis {x_ik : 1 ≤ i ≤ m_{λ_k}} [each being an eigenvector of A]. There
are n vectors in total because the sum m_{λ_1} + m_{λ_2} + ... + m_{λ_r} is n. We show that these
form a basis of R^n by showing they are linearly independent. Note that, if j ≠ k then
(λ_j I - A)x_ik = (λ_j - λ_k)x_ik and, of course, (λ_j I - A)x = 0 for any eigenvector x corresponding
to λ_j. Now suppose a linear combination of {x_ik : 1 ≤ i ≤ m_{λ_k}, 1 ≤ k ≤ r} equals 0; that is,
Σ_{k=1}^{r} Σ_{i=1}^{m_{λ_k}} α_ik x_ik = 0.
Multiplying this successively by (λ_j I - A) for all j ≠ k multiplies the eigenvectors corresponding
to λ_k by the non-zero constant c = Π_{j≠k} (λ_j - λ_k) and makes all the other terms vanish.
Thus we are left with c(α_1k x_1k + α_2k x_2k + ... + α_{m_{λ_k} k} x_{m_{λ_k} k}) = 0 and, since {x_ik : 1 ≤ i ≤ m_{λ_k}}
is a linearly independent set, all the coefficients must be 0. This can be done for any k and
so the vectors are linearly independent and hence a basis.
The quantity nullity(λI - A) is called the geometric multiplicity of the eigenvalue λ. So a
criterion for diagonalisability of A is that the geometric and algebraic multiplicities are the
same for all the eigenvalues of A.
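The criterion fails for the classic 2 × 2 Jordan block, as the following check illustrates (assuming numpy; not part of the notes): the eigenvalue 1 has algebraic multiplicity 2 but geometric multiplicity 1, so the matrix is not diagonalisable.

    import numpy as np

    A = np.array([[1.0, 1.0],
                  [0.0, 1.0]])                                       # Jordan block, eigenvalue 1 twice

    geometric = A.shape[0] - np.linalg.matrix_rank(np.eye(2) - A)    # nullity(1*I - A)
    print(geometric)                                                 # 1 < 2, so A is not diagonalisable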
Theorem 4.5 Every square matrix is similar (over the complex field) to an upper triangular
matrix.
Proof. This is proved by induction on the size of the matrix. The result is trivial for n = 1.
Suppose the result is true for (n-1) × (n-1) matrices. Let A be an n × n matrix and let λ_1
be an eigenvalue of A with x_1 an eigenvector corresponding to λ_1. Let R be any invertible
matrix with x_1 as its first column; [from Theorem 1.2 there is a basis of R^n with x_1 as its
first member and these form the columns of a suitable matrix R]. Since Re_1 = x_1, we have
that
ARe_1 = λ_1 x_1 = λ_1 Re_1
and so R^{-1}ARe_1 = λ_1 e_1. This means that the first column of R^{-1}AR is (λ_1, 0)^t where 0 is
the zero column vector of C^{n-1}. Therefore

R^{-1}AR = [ λ_1  a^t ]
           [ 0    A_1 ]

where a is some (n-1)-vector and A_1 is some (n-1) × (n-1) matrix.
By the induction hypothesis there is an (n-1) × (n-1) invertible matrix S_1 such that
S_1^{-1} A_1 S_1 = T_1 is upper triangular. Put

S = [ 1  0^t ]
    [ 0  S_1 ].

Then RS is invertible and

(RS)^{-1}A(RS) = S^{-1}(R^{-1}AR)S = [ 1  0^t      ] [ λ_1  a^t ] [ 1  0^t ]
                                     [ 0  S_1^{-1} ] [ 0    A_1 ] [ 0  S_1 ]

               = [ λ_1  a^t S_1          ] = [ λ_1  a^t S_1 ]
                 [ 0    S_1^{-1} A_1 S_1 ]   [ 0    T_1     ],

which is upper triangular. Therefore the theorem follows by induction.
Corollary 4.6 If A is similar to a triangular matrix T then the diagonal entries in T are
the eigenvalues of A repeated according to algebraic multiplicity.
Proof. The characteristic polynomial p_T(X) of T is clearly the product of the n factors
(X - t_ii) where t_ii, 1 ≤ i ≤ n, are the diagonal entries of T. So the t_ii, 1 ≤ i ≤ n, are the
eigenvalues of T. But similar matrices have the same characteristic polynomials and so have
the same eigenvalues with the same multiplicities, and the result follows.
Corollary 4.7 Let λ be an eigenvalue of A with algebraic multiplicity m_λ. Then
rank(λI - A) ≥ n - m_λ. Consequently nullity(λI - A) ≤ m_λ.
Proof. Since T has λ as an entry in its diagonal exactly m_λ times, the matrix λI - T
has exactly n - m_λ non-zero entries in its diagonal. The rows in which these appear are
linearly independent and hence rank(λI - T) ≥ n - m_λ and, since (if T = R^{-1}AR)
rank(λI - A) = rank(R^{-1}(λI - A)R) = rank(λI - T),
the result follows.
Theorem 4.8 [Cayley-Hamilton] Every square matrix satisfies its characteristic equation.
In symbols: p_A(A) = 0.
Proof. Let λ_1, λ_2, ..., λ_n be the eigenvalues of A repeated according to algebraic multiplicity.
From Theorem 5.10 there is an invertible matrix R such that A = RTR^{-1} where T is a
triangular matrix with λ_1, λ_2, ..., λ_n as its diagonal entries. The characteristic polynomial
p of A and of T is (X - λ_1)(X - λ_2)...(X - λ_n). Since p(A) = Rp(T)R^{-1} it is sufficient
to show that p(T) = (T - λ_1 I)(T - λ_2 I)...(T - λ_n I) = 0. Then the i-th diagonal entry of
T - λ_i I is 0.
We complete the proof by showing that if {T_i : 1 ≤ i ≤ n} is a set of triangular matrices
such that the i-th diagonal entry of T_i is 0 then T_1 T_2 ... T_n = 0. The proof is by induction
on the size of the matrix. The result is obvious for n = 1. Suppose the result is true for
(n-1) × (n-1) matrices. Write the T_i as partitioned matrices

T_1 = [ 0  x_1^t ]          T_i = [ t_i  x_i^t ]     (2 ≤ i ≤ n),
      [ 0  S_1   ],               [ 0    S_i   ]

where each S_i is a triangular matrix with 0 in the (i-1)-th diagonal place. Therefore, by the
induction hypothesis, S_2 S_3 ... S_n = 0 and so

T_2 T_3 ... T_n = [ t  z^t             ] = [ t  z^t ]
                  [ 0  S_2 S_3 ... S_n ]   [ 0  0   ]

where t and z are some scalar and vector (whose values are not important). Then

T_1 T_2 T_3 ... T_n = [ 0  x_1^t ] [ t  z^t ] = [ 0  0 ] = 0
                      [ 0  S_1   ] [ 0  0   ]   [ 0  0 ]

as required.
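The theorem is easy to check numerically for a particular matrix; the sketch below (an illustration only, assuming numpy) obtains the coefficients of det(XI - A) with np.poly and substitutes A.

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    coeffs = np.poly(A)        # coefficients of det(XI - A), highest power first: X^2 - 5X - 2
    n = A.shape[0]
    p_of_A = sum(c * np.linalg.matrix_power(A, n - k) for k, c in enumerate(coeffs))
    print(np.round(p_of_A, 10))                      # the 2 x 2 zero matrix, as Cayley-Hamilton predicts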
Lemma 4.9 There is a unique monic polynomial m_A(X) of lowest degree such that m_A(A) = 0.
The polynomial m_A(X) has the following properties:
(i) if q(X) is a polynomial such that q(A) = 0, then m_A(X) divides q(X);
(ii) every linear factor of m_A(X) is of the form X - λ for some eigenvalue λ of A;
(iii) if λ is any eigenvalue of A, then X - λ is a factor of m_A(X).
Proof. The Cayley-Hamilton Theorem shows that {A^n, A^{n-1}, ..., A, I} is a linearly dependent
set. Let k be the smallest integer such that {A^k, A^{k-1}, ..., A, I} is linearly dependent.
Then for some scalars c_k, c_{k-1}, ..., c_0 with c_k ≠ 0 we have
c_k A^k + c_{k-1} A^{k-1} + ... + c_1 A + c_0 I = 0.
Let α_i = c_i/c_k and write m_A(X) = X^k + α_{k-1} X^{k-1} + ... + α_1 X + α_0. Then
m_A(A) = A^k + α_{k-1} A^{k-1} + ... + α_1 A + α_0 I = 0
and q(A) ≠ 0 for any non-zero polynomial q of degree < k. To prove the uniqueness of m_A, suppose
A^k + β_{k-1} A^{k-1} + ... + β_1 A + β_0 I = 0.
Then by subtraction, we have that (α_{k-1} - β_{k-1})A^{k-1} + (α_{k-2} - β_{k-2})A^{k-2} + ... +
(α_1 - β_1)A + (α_0 - β_0)I = 0 and since, by the choice of k, the set {A^{k-1}, ..., A, I} is linearly
independent, we have that α_i = β_i for all i.
(i) Suppose q(A) = 0. By Euclid's algorithm, q(X) = m_A(X)s(X) + r(X) where
deg(r(X)) < deg(m_A(X)). Then
0 = q(A) = m_A(A)s(A) + r(A) = 0 + r(A)
and, since deg(r(X)) < deg(m_A(X)), r(X) is the zero polynomial. That is, m_A(X) divides
q(X).
(ii) By (i), m_A(X) divides the characteristic polynomial p_A(X) of A, that is, p_A(X) =
m_A(X)s(X). Thus every root of m_A(X) = 0 is a root of the characteristic equation p_A(X) = 0
of A, that is, an eigenvalue of A, and so (ii) follows.
(iii) If λ is an eigenvalue of A with Ax = λx (where x ≠ 0) then A^2 x = λAx = λ^2 x and
so A^s x = λ^s x for all s ∈ Z^+, and thus q(A)x = q(λ)x for any polynomial q(X). Then
0 = m_A(A)x = m_A(λ)x
and, since x ≠ 0, it follows that m_A(λ) = 0. Therefore X - λ is a factor of m_A(X).
Definition. The polynomial m_A(X) is called the minimal polynomial of A.
Theorem 4.10 A matrix A is diagonalisable if and only if its minimal polynomial m_A(X)
has no repeated factors.
Proof. Let λ_1, λ_2, ..., λ_r be the distinct eigenvalues of A and let m_k be the algebraic
multiplicity of λ_k. If m_A(X) has no repeated factors, it follows from Lemma 4.9 that
m_A(X) = (X - λ_1)(X - λ_2)...(X - λ_r)
and so
m_A(A) = (A - λ_1 I)(A - λ_2 I)...(A - λ_r I) = 0.
Consequently, using Lemma 2.10,
n = nullity(m_A(A)) ≤ nullity(A - λ_1 I) + nullity(A - λ_2 I) + ... + nullity(A - λ_r I).
Corollary 4.7 shows that nullity(A - λ_i I) ≤ m_i for each i. Then for any k, applying this to
the above inequality for all i ≠ k,
n ≤ nullity(A - λ_k I) + Σ_{i≠k} m_i = nullity(A - λ_k I) + n - m_k
since Σ_{i=1}^{r} m_i = n. Therefore nullity(A - λ_k I) ≥ m_k. Since the opposite inequality is known
from Corollary 4.7, it follows that nullity(A - λ_k I) = m_k for all k and so, from Theorem 4.4,
A is diagonalisable.
Conversely, if A is diagonalisable then, for some R, R^{-1}AR is a diagonal matrix D with
the distinct eigenvalues λ_1, λ_2, ..., λ_r as its diagonal entries (repeated according to algebraic
multiplicity). We claim that
(D - λ_1 I)(D - λ_2 I)...(D - λ_r I) = 0.
Clearly for any integer i between 1 and n the i-th diagonal entry in one of the factors
(D - λ_1 I), (D - λ_2 I), ..., (D - λ_r I) is 0. Therefore (D - λ_1 I)(D - λ_2 I)...(D - λ_r I)e_i = 0
for each member e_i of the usual basis and this proves the claim. Therefore, if m(X) =
(X - λ_1)(X - λ_2)...(X - λ_r), we have that m(A) = Rm(D)R^{-1} = 0 and since m has no
repeated factors, Lemma 4.9(i) shows that the minimal polynomial has no repeated factors
(in fact m is the minimal polynomial of A).
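As a small numerical illustration (not part of the notes, assuming numpy): the matrix below is diagonalisable with eigenvalues 2, 1, 1, and although the characteristic polynomial has the repeated factor (X - 1)^2, the product over the distinct eigenvalues already annihilates A, so the minimal polynomial (X - 1)(X - 2) has no repeated factors.

    import numpy as np

    A = np.array([[2.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])       # eigenvalues 2, 1, 1; diagonalisable

    I = np.eye(3)
    print((A - 1.0 * I) @ (A - 2.0 * I))  # the zero matrix, so m_A(X) = (X - 1)(X - 2)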
5 Inner products
Let u = (u_1, u_2, ..., u_n)^t and v = (v_1, v_2, ..., v_n)^t be two vectors of C^n. The inner product
⟨u, v⟩ of u and v is defined by
⟨u, v⟩ = Σ_{i=1}^{n} v̄_i u_i.
In R^n the definition is the same, but naturally there is no complex conjugate.
Note that ⟨u, v⟩ = v̄^t u. Using this fact it is easy to verify the following properties. For all
u, v, w and λ, μ ∈ C,
1. ⟨λu + μv, w⟩ = λ⟨u, w⟩ + μ⟨v, w⟩,
2. ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩,
3. ⟨u, u⟩ ≥ 0, and ⟨u, u⟩ = 0 if and only if u = 0.
Properties 1, 2 and 3 imply
4. ⟨u, λv + μw⟩ = λ̄⟨u, v⟩ + μ̄⟨u, w⟩,
5. ⟨u, 0⟩ = ⟨0, u⟩ = 0.
For any v we write ||v|| = +√⟨v, v⟩. This is the norm or the length of v.
If V is a (complex) vector space and, for every pair (u, v) of elements of V, a complex number
⟨u, v⟩ is defined and satisfies the above properties 1, 2 and 3, then V is called an inner product space.
Most of the work below holds also for inner product spaces and the proofs are essentially the same.
However, time does not permit treating the abstract case this year in this course.
The following results hold in any inner product space. They are not difficult to prove but, since
they are not needed in this course, the proofs are omitted to save time.
Lemma
(i) [Triangle Inequality] ||u + v|| ≤ ||u|| + ||v|| for all u, v ∈ V.
(ii) [Cauchy-Schwarz Inequality] |⟨u, v⟩| ≤ ||u||.||v|| for all u, v ∈ V.
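The sketch below (an illustration only, assuming numpy) evaluates the inner product defined at the start of this section and checks the properties numerically; note that np.vdot conjugates its first argument, so ⟨u, v⟩ corresponds to np.vdot(v, u).

    import numpy as np

    u = np.array([1 + 1j, 2.0])
    v = np.array([1j, 1.0])

    inner_uv = np.vdot(v, u)                         # <u, v> = sum of conj(v_i) * u_i
    print(inner_uv, np.conj(np.vdot(u, v)))          # property 2: <u, v> = conjugate of <v, u>
    print(np.sqrt(np.vdot(u, u).real))               # the norm ||u||
    print(abs(inner_uv) <= np.linalg.norm(u) * np.linalg.norm(v))   # Cauchy-Schwarz holds: True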
For the rest of this section all vectors are elements of C^n or of R^n.
Definition. If ⟨u, v⟩ = 0 we say that u is orthogonal to v; we write u ⊥ v.
Lemma 5.1 [Theorem of Pythagoras] If ⟨u, v⟩ = 0 then ||u||^2 + ||v||^2 = ||u + v||^2.
Definition. A set {v_1, v_2, ..., v_k} of vectors is said to be orthonormal if
1. ||v_i|| = 1 for all i,
2. ⟨v_i, v_j⟩ = 0 if i ≠ j.
Lemma 5.2 An orthonormal set is linearly independent.
Remark. An orthonormal set of n vectors of C^n is a basis of C^n and is said to be an
orthonormal basis. For example, the standard basis {e_i : 1 ≤ i ≤ n} is an orthonormal basis
of C^n.
Lemma 5.3 Let (α_1, α_2, ..., α_n)^t and (β_1, β_2, ..., β_n)^t be the co-ordinates of v and w with
respect to an orthonormal basis v_1, v_2, ..., v_n. Then
(i) α_i = ⟨v, v_i⟩,
(ii) ⟨v, w⟩ = β̄_1 α_1 + β̄_2 α_2 + ... + β̄_n α_n = Σ_{i=1}^{n} β̄_i α_i.
Remark. Part (i) above gives an easy way to calculate co-ordinates with respect to any
orthonormal basis. Part (ii) shows that the inner product of two vectors and the length of a
vector can be calculated from the co-ordinates with respect to any orthonormal basis.
Definition.
(i) Let A = (a_ij) be a complex n × n matrix. The transpose of the complex conjugate of A,
that is, the matrix whose ij-th entry is ā_ji, is called the adjoint of A and is denoted by A*.
(In brief, A* = Ā^t.)
(ii) A matrix U is said to be unitary if U is invertible and U^{-1} = U*.
(iii) A real matrix P is said to be orthogonal if P is invertible and P^{-1} = P^t.
Remark. For any matrix A we have that (A*)* = A. Consequently, if U is unitary then so is
U^{-1} and if P is orthogonal then so is P^{-1}.
Theorem 5.4 Let v_1, v_2, ..., v_n be an orthonormal basis of C^n (or of R^n). Then the set
w_1, w_2, ..., w_n is an orthonormal basis of C^n (or of R^n) if and only if the transition matrix
from w_i to v_i is unitary (or orthogonal).
Definition.
(i) A matrix A is said to be unitarily similar to the matrix B if there exists a unitary matrix
U such that A = U^{-1}BU = U*BU.
(ii) A matrix A is said to be orthogonally similar to the matrix B if there exists an orthogonal
matrix P such that A = P^{-1}BP = P^t BP.
Orthogonal similarity is a special case of unitary similarity that is relevant to real matrices.
Note that both unitary and orthogonal similarity are equivalence relations.
Remark. It follows from the work above (and Theorem 2.6) that if A is unitarily (or orthog-
onally) similar to B then A and B represent the same linear map with respect to different
orthonormal bases.
Definition.
(i) A complex matrix A is said to be unitarily diagonalisable if it is unitarily similar to some
diagonal matrix.
(ii) A real matrix A is said to be orthogonally diagonalisable if it is orthogonally similar to
some real diagonal matrix.
Theorem 5.5 A matrix A is unitarily diagonalisable if and only if there is an orthonormal
basis consisting of eigenvectors of A.
Definition.
(i) A matrix A is said to be symmetric if A = A^t.
(ii) A matrix A is said to be self-adjoint (or Hermitian) if A = A*.
(iii) A matrix A is said to be normal if AA* = A*A.
Theorem 5.6
(i) If a real matrix is orthogonally diagonalisable, then it is symmetric.
(ii) If a complex matrix is unitarily diagonalisable, then it is normal.
Lemma Let A be an n × n matrix and let x and y be elements of C^n.
(i) If ⟨Ax, y⟩ = 0 for all x and y, then A = 0_n.
(ii) ⟨Ax, y⟩ = ⟨x, A*y⟩.
(iii) If U is unitary, then ⟨Ux, Uy⟩ = ⟨x, y⟩ and ||Ux|| = ||x||.
(iv) If x and y belong to R^n and P is orthogonal, then ⟨Px, Py⟩ = ⟨x, y⟩ and ||Px|| = ||x||.
Theorem 5.7 Let A be a self-adjoint n × n matrix.
(i) Then the eigenvalues of A are real.
(ii) Eigenvectors of A corresponding to different eigenvalues are orthogonal to each other.
Corollary 5.8
(i) If a self-adjoint n × n matrix has n distinct eigenvalues, then it is unitarily diagonalisable.
(ii) If a real symmetric n × n matrix has n distinct eigenvalues, then it is orthogonally
diagonalisable.
Theorem 5.9 [Gram-Schmidt Orthogonalization Process] Let u_1, u_2, ..., u_k be a linearly
independent set. Then the vectors v_1, v_2, ..., v_k calculated from the formulae below

x_1 = u_1,                                          v_1 = x_1/||x_1||,
x_2 = u_2 - ⟨u_2, v_1⟩v_1,                          v_2 = x_2/||x_2||,
x_3 = u_3 - ⟨u_3, v_1⟩v_1 - ⟨u_3, v_2⟩v_2,          v_3 = x_3/||x_3||,
...
x_k = u_k - Σ_{i=1}^{k-1} ⟨u_k, v_i⟩v_i,            v_k = x_k/||x_k||,

form an orthonormal set with the following properties.
(i) span{v_1, v_2, ..., v_k} = span{u_1, u_2, ..., u_k}.
(ii) If there exists a self-adjoint matrix A such that every vector u_i is an eigenvector for
A, then every vector v_i is also an eigenvector for A.
Remark Note that if u_1 is a unit vector then u_1 = v_1. Hence, given any unit vector u_1,
there is an orthonormal basis with this vector as its first element. [Use Theorem 1.2 to
find a basis with u_1 as its first element and apply the theorem above.] Also it follows from
the theorem above that every subspace of C^n (or R^n) has an orthonormal basis. [Apply the
theorem to any basis of the subspace.]
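A direct transcription of the process is sketched below (an illustration only, assuming numpy); for complex vectors the coefficient ⟨u_k, v_i⟩ is computed as np.vdot(v_i, u_k), matching the inner product used in these notes.

    import numpy as np

    def gram_schmidt(vectors):
        orthonormal = []
        for u in vectors:
            x = u.astype(complex)
            for v in orthonormal:
                x = x - np.vdot(v, u) * v            # subtract <u, v> v
            orthonormal.append(x / np.linalg.norm(x))
        return orthonormal

    u1 = np.array([1.0, 1.0, 0.0])
    u2 = np.array([1.0, 0.0, 1.0])
    v1, v2 = gram_schmidt([u1, u2])
    print(np.round(np.vdot(v1, v2), 10), np.round(np.linalg.norm(v2), 10))   # 0 and 1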
Compare the following result to Theorem 4.5.
Theorem 5.10 Every square matrix is unitarily similar (over the complex field) to an upper
triangular matrix.
Proof. This is proved by induction on the size of the matrix. The result is trivial for n = 1.
Suppose the result is true for (n-1) × (n-1) matrices. Let X be an n × n matrix and let
λ_1 be an eigenvalue of X with u_1 a normalized eigenvector corresponding to λ_1. Let U be a
unitary matrix with u_1 as its first column. Since Ue_1 = u_1, we have that
XUe_1 = λ_1 u_1 = λ_1 Ue_1
and so, as U* = U^{-1}, it follows that U*XUe_1 = λ_1 e_1. This means that the first column of
U*XU is (λ_1, 0)^t where 0 is the zero column vector of C^{n-1}. Therefore

U*XU = [ λ_1  x^t ]
       [ 0    X_1 ].

By the induction hypothesis there is an (n-1) × (n-1) unitary matrix V_1 such that
V_1* X_1 V_1 = T_1 is upper triangular. Put

V = [ 1  0^t ]
    [ 0  V_1 ].

Then

V* = [ 1  0^t  ]     and     V V* = [ 1  0^t ] [ 1  0^t  ] = [ 1  0^t     ] = I_n.
     [ 0  V_1* ]                    [ 0  V_1 ] [ 0  V_1* ]   [ 0  I_{n-1} ]

So (UV)*(UV) = V*(U*U)V = V*V = I and therefore UV is unitary. Also

(UV)*X(UV) = V*(U*XU)V = [ 1  0^t  ] [ λ_1  x^t ] [ 1  0^t ]
                         [ 0  V_1* ] [ 0    X_1 ] [ 0  V_1 ]

           = [ λ_1  x^t V_1      ] = [ λ_1  x^t V_1 ]
             [ 0    V_1* X_1 V_1 ]   [ 0    T_1     ],

which is upper triangular. Therefore the theorem follows by induction.
Theorem 5.11
(i) A matrix is unitarily diagonalisable if and only if it is normal.
(ii) A real matrix is orthogonally diagonalisable if and only if it is symmetric.
Proof. (i) Theorem 5.6 states that if A is unitarily diagonalisable, then it is normal. For
the converse, from Theorem 5.10, there is a unitary matrix U such that U*AU is an upper
triangular matrix T. Then T*T = (U*AU)*(U*AU) = U*(A*A)U = U*(AA*)U = TT*, so
T is normal.
We complete the proof by showing that a normal upper triangular matrix is diagonal.
This is proved by induction on the size of the matrix. The 1 × 1 case is obviously true.
Suppose the result is true for (n-1) × (n-1) matrices. Let T be an n × n normal upper
triangular matrix. Then

T = [ λ_1  x^t ]     and     T* = [ λ̄_1  0^t  ]
    [ 0    T_1 ]                  [ x̄    T_1* ]

where x is some column vector. However, since T is normal,

TT* = [ |λ_1|^2 + x^t x̄   x^t T_1* ]  =  T*T = [ |λ_1|^2   λ̄_1 x^t          ]
      [ T_1 x̄             T_1 T_1* ]           [ λ_1 x̄     T_1* T_1 + x̄ x^t ]

where T_1 is an upper triangular matrix. Then (from the (1,1) entry) x = 0 and therefore

[ |λ_1|^2  0        ] = [ |λ_1|^2  0        ],
[ 0        T_1* T_1 ]   [ 0        T_1 T_1* ]

showing that T_1 is normal. By the induction hypothesis T_1 is a diagonal matrix and therefore
T is a diagonal matrix.
(ii) Theorem 5.6 states that if A is real and orthogonally diagonalisable then it is symmetric.
Conversely, if A is real and symmetric then, from (i), it is unitarily diagonalisable. Further,
from Theorem 5.7 the eigenvalues of A are real. The columns of the unitary matrix are
solutions of the homogeneous equations (λ_i I - A)x = 0 and as these are real equations, real
solutions can always be found. Since a real unitary matrix is orthogonal, it follows that there
exists an orthogonal matrix P such that P^t AP is diagonal.
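For real symmetric matrices part (ii) can be seen numerically; the sketch below (an illustration only, assuming numpy) uses np.linalg.eigh, which returns an orthonormal set of eigenvectors directly.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])            # real and symmetric

    eigenvalues, P = np.linalg.eigh(A)    # columns of P are orthonormal eigenvectors
    print(np.round(P.T @ P, 10))          # the identity, so P is orthogonal
    print(np.round(P.T @ A @ P, 10))      # diagonal with eigenvalues 1 and 3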