
MATRICES

Summary of Lectures
MATH 494 Special Topics in Mathematics, Fall 2005
CRN: 23662 - undergrad, 23663 - grad, MWF 3:00-3:50, 302 AH
Instructor: Shmuel Friedland
Department of Mathematics, Statistics and Computer Science
email: friedlan@uic.edu
Last update February 6, 2006
1 Introduction
The theory of matrices is related to many fields of biology, business, engineering, medicine, science
and social sciences. Let us give a few examples. Recall that
A = \begin{pmatrix}
      a_{11} & a_{12} & \dots & a_{1n} \\
      a_{21} & a_{22} & \dots & a_{2n} \\
      \vdots & \vdots &       & \vdots \\
      a_{m1} & a_{m2} & \dots & a_{mn}
    \end{pmatrix}
is called an m × n matrix and briefly denoted by A = (a_{ij})_{i,j=1}^{m,n} or just A = (a_{ij}). If the
entries a_{ij} are in some given set S we denote by S^{m×n} the set of all m × n matrices with entries in
S. In some other books S^{m×n} is denoted by M_{mn}(S) and M_n(S) stands for M_{nn}(S). As
usual R, C, Z and F stand for the set of real numbers, complex numbers, integers, and a
field respectively.
Consider A ∈ R^{n×m}. It can be interpreted as a digital picture seen on a screen. Then
a_{ij} encodes the color and its strength at the location (i, j). In many cases m and n are
very big, so it is very costly and time consuming to store the information, or to transmit
it. We know that there is a lot of redundancy in the picture. Is there a way to condense
the information and still have almost the same picture, when an average person looks at it? The
answer is yes, and one way to achieve it is to use the singular value decomposition discussed
later in this course.
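The idea can be previewed with a short NumPy sketch (an illustration added here, not part of
the original notes): instead of the nm entries of A one stores only the k leading singular triples,
i.e. roughly k(n + m + 1) numbers, and rebuilds an approximate picture from them.

    import numpy as np

    def rank_k_approximation(A, k):
        """Best rank-k approximation of A (in the least-squares sense) via the SVD."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        # Keep only the k largest singular values and the corresponding singular vectors.
        return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Toy "picture": a 100 x 120 matrix of grayscale intensities.
    rng = np.random.default_rng(0)
    A = rng.random((100, 120))
    A10 = rank_k_approximation(A, 10)
    print(np.linalg.norm(A - A10))   # approximation error in the Frobenius norm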
Another possibility is that A represents DNA gene expression data, where a_{ij} is the
expression level of gene i in experiment number j. The number of genes is huge, e.g.
from 6,000 to 100,000, and the number of experiments can be from 4 to 30. The measurements are
done by lasers and computers, and a certain percentage of the entries is corrupted. To do some
statistics on DNA we need the values of all entries of A. Is there a good way to impute (complete)
the values of A using matrix theory? The answer is yes, and one can use least squares and
inverse eigenvalue techniques to do it.
In many applications one has a linear system given schematically by the input-output
(black box) relation x → y, where x, y ∈ R^n are column vectors with n coordinates, and
y = Ax, where A ∈ R^{n×n}. If one repeats this procedure m times (closed loop), then
x_m = A^m x, m = 1, 2, .... What does x_m look like when m is very big? This question is
very natural in stationary Markov chains, which are nowadays very popular in many
simulations and algorithms for hard problems in combinatorics and computer science. The
answer to this problem is given by using the Jordan canonical form.
Let G = (V, E) be a digraph on the set of n vertices V, with the set of edges E ⊂ V × V.
Then G is represented by A = (a_{ij}) ∈ {0, 1}^{n×n}, where a_{ij} = 0 or a_{ij} = 1 if there is no edge
or there is an edge from the vertex i to the vertex j, respectively. Many properties of the graph are
reflected in the spectrum (the set of eigenvalues) of A, and in the corresponding eigenvectors,
in particular in the eigenvector corresponding to the nonnegative eigenvalue of maximal
modulus. This topic is covered by the Perron-Frobenius theorem for nonnegative matrices.
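As a small illustration (not in the original notes), here is how such an adjacency matrix and its
spectrum can be computed with NumPy; the graph and its edges are an arbitrary toy example.

    import numpy as np

    # Adjacency matrix of a small digraph on vertices {0, 1, 2}:
    # edges 0->1, 1->2, 2->0, 2->1.
    A = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [1, 1, 0]], dtype=float)

    eigenvalues = np.linalg.eigvals(A)
    # The Perron-Frobenius eigenvalue is the one of maximal modulus.
    print(max(eigenvalues, key=abs))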
2 Jordan Canonical Form
2.1 Statement of the Problem
Remark 2.1 In these notes we are sometimes going to emphasize that certain results
hold for a general field F. The student unfamiliar with this notion can safely assume that F
is either the field of complex numbers or the field of real numbers.
Let V be a vector space over the field F of dimension n, e.g. V = F^n. (Here F^n is the
set of column vectors with n coordinates in the field F. To save space we denote the column
vector x with coordinates x_1, ..., x_n as x = (x_1, ..., x_n)^T.) Let u_1, ..., u_n be a basis in
V, i.e. any vector x ∈ V is uniquely expressed as a linear combination of u_1, ..., u_n:
x = x_1 u_1 + x_2 u_2 + ... + x_n u_n. (Any set of n linearly independent vectors forms a basis
in an n-dimensional vector space.) The vector (x_1, x_2, ..., x_n)^T ∈ F^n is called the coordinate
vector of x in the basis [u_1, ..., u_n] of V, and x_1, ..., x_n are called the coordinates of x with
respect to [u_1, ..., u_n]. It is convenient to use the formalism: x = [u_1, ..., u_n](x_1, ..., x_n)^T.
In F^n (R^n or C^n) we have the standard basis

    e_1 = (1, 0, ..., 0)^T, e_2 = (0, 1, 0, ..., 0)^T, ..., e_n = (0, ..., 0, 1)^T,

where e_i has the i-th coordinate equal to 1, while all other coordinates are equal to 0. Let
[v_1, ..., v_n] be another basis in V. Then there exists U = (u_{ij}) ∈ F^{n×n} such that

    [u_1, ..., u_n] = [v_1, ..., v_n]U,  i.e.  u_i = \sum_{j=1}^n u_{ji} v_j, for i = 1, ..., n.    (2.1)

Furthermore, U is an invertible matrix, i.e. there exists V ∈ F^{n×n} such that UV = VU = I_n,
where I_n is the n × n identity matrix whose i-th column is the vector e_i, for i = 1, ..., n. (If
no confusion arises we sometimes denote I_n by I.) V is a unique matrix which is denoted
by U^{-1}, the inverse of U. U is called the transition matrix from the basis [u_1, ..., u_n]
to the basis [v_1, ..., v_n].
Let [v_1, ..., v_n] be a basis and define vectors u_1, ..., u_n as in (2.1). Then u_1, ..., u_n is a
basis in V if and only if U is an invertible matrix. Furthermore, if we multiply [u_1, ..., u_n] =
[v_1, ..., v_n]U by U^{-1} from the right we get that [v_1, ..., v_n] = [u_1, ..., u_n]U^{-1}, i.e. the
transition matrix from the v-basis to the u-basis is given by the inverse of the transition matrix
from the u-basis to the v-basis.
Let T : V → V be a linear transformation: T(ax + by) = aT(x) + bT(y) for all scalars
a, b ∈ F and vectors x, y ∈ V. For example, for A ∈ F^{n×n} the map A : F^n → F^n given by x ↦ Ax,
for any column vector x ∈ F^n, is a linear transformation.
Any linear transformation T is determined uniquely by its representation matrix A =
(a_{ij}) ∈ F^{n×n} in a given basis [u_1, ..., u_n], defined by Tu_i = a_{1i} u_1 + a_{2i} u_2 + ... + a_{ni} u_n, i =
1, ..., n. The formalism notation is

    T[u_1, ..., u_n] := [Tu_1, ..., Tu_n] = [u_1, ..., u_n]A.
Note that if x and y are the coordinate vectors of v and Tv respectively, then y = Ax:

    Tv = T(\sum_{i=1}^n x_i u_i) = \sum_{i=1}^n x_i Tu_i = \sum_{i=1}^n \sum_{j=1}^n x_i a_{ji} u_j
       = \sum_{j=1}^n (\sum_{i=1}^n a_{ji} x_i) u_j = \sum_{j=1}^n y_j u_j.

This easily follows from the formalism

    Tv = T([u_1, ..., u_n]x) = (T[u_1, ..., u_n])x = ([u_1, ..., u_n]A)x = [u_1, ..., u_n](Ax).
Let [v_1, ..., v_n] be another basis in V. Assume (2.1). Then the representation matrix of T
in the v-basis is given by B = UAU^{-1}:

    T[u_1, ..., u_n] = [u_1, ..., u_n]A  ⟹  T([v_1, ..., v_n]U) = ([v_1, ..., v_n]U)A  ⟹
    (T[v_1, ..., v_n])U = [v_1, ..., v_n](UA)  ⟹  T[v_1, ..., v_n] = [v_1, ..., v_n](UAU^{-1}).
Definition 2.2 Let GL(n, F) ⊂ F^{n×n} denote the set (group) of all n × n invertible
matrices with entries in a given field F. A, B ∈ F^{n×n} are called similar, and this is denoted
by A ≈ B, if B = UAU^{-1} for some U ∈ GL(n, F). The set of all B ∈ F^{n×n} similar to a
fixed A ∈ F^{n×n} is called the similarity class corresponding to A, or simply a similarity class.
The following proposition is straightforward:

Proposition 2.3 Let F be a field (F = R, C). Then the similarity relation on F^{n×n} is
an equivalence relation:

    A ≈ A;   A ≈ B ⟹ B ≈ A;   A ≈ B and B ≈ C ⟹ A ≈ C.

Furthermore, if B = UAU^{-1} then

1. det(zI_n − B) = det(zI_n − A), i.e. A and B have the same characteristic polynomial.

2. For any integer m ≥ 2, B^m = UA^m U^{-1}.

3. If in addition A is invertible, then B is invertible and B^m = UA^m U^{-1} for any integer
m.
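These similarity invariants are easy to check numerically. The following NumPy sketch (an
illustration added here, not part of the original notes) builds a random similar pair B = UAU^{-1}
and verifies items 1 and 2 of Proposition 2.3.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    A = rng.random((n, n))
    U = rng.random((n, n))            # with probability 1 this is invertible
    B = U @ A @ np.linalg.inv(U)      # B is similar to A

    # Same characteristic polynomial, hence the same eigenvalues ...
    print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                      np.sort_complex(np.linalg.eigvals(B))))
    # ... and B^m = U A^m U^{-1}, so for instance the traces of all powers agree.
    m = 5
    print(np.isclose(np.trace(np.linalg.matrix_power(A, m)),
                     np.trace(np.linalg.matrix_power(B, m))))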
Corollary 2.4 Let V be an n-dimensional vector space over F. Assume that T : V → V
is a linear transformation. Then the set of all representation matrices of T is a similarity
class. Hence the characteristic polynomial of T is defined as det(zI_n − A) = z^n + \sum_{i=0}^{n−1} a_i z^i,
where A is the representation matrix of T in any basis [u_1, ..., u_n], and this definition is
independent of the choice of a basis. In particular det T := det A, and trace T^m = trace A^m
for any nonnegative integer m. (T^0 is the identity operator, i.e. T^0 v = v for all v ∈ V, and
A^0 = I. Here by the trace of B ∈ F^{n×n}, denoted by trace B, we mean the sum of all diagonal
elements of B.)
Problem 2.5 (The representation problem.) Let V be an n-dimensional vector space over
F. Assume that T : V → V is a linear transformation. Find a basis [v_1, ..., v_n] in which
T has the simplest form. Equivalently, given A ∈ F^{n×n} find B ≈ A of the simplest form.

In the following case the answer is well known. Recall that v ∈ V is called an eigenvector
of T corresponding to the eigenvalue λ ∈ F if v ≠ 0 and Tv = λv. This is equivalent to
the existence of 0 ≠ x ∈ F^n such that Ax = λx. Hence (λI − A)x = 0, which implies that
det(λI − A) = 0. Hence λ is a zero of the characteristic polynomial of A and T. The
assumption that λ is a zero of the characteristic polynomial yields that the system (λI − A)x = 0
has a nontrivial solution x ≠ 0.
Corollary 2.6 Let A ∈ F^{n×n}. Then λ is an eigenvalue of A if and only if λ is a zero of
the characteristic polynomial of A: det(zI − A). Let V be an n-dimensional vector space over
F. Assume that T : V → V is a linear transformation. Then λ is an eigenvalue of T if and
only if λ is a zero of the characteristic polynomial of T.
Proposition 2.7 Let V be an n-dimensional vector space over F. Assume that T : V → V
is a linear transformation. Then there exists a basis in V such that T is represented in this
basis by a diagonal matrix

    diag(λ_1, λ_2, ..., λ_n) := \begin{pmatrix}
        λ_1 & 0   & \dots & 0   \\
        0   & λ_2 & \dots & 0   \\
        \vdots & \vdots & \ddots & \vdots \\
        0   & 0   & \dots & λ_n
    \end{pmatrix},

if and only if the characteristic polynomial of T is (z − λ_1)(z − λ_2) ... (z − λ_n), and V has
a basis consisting of eigenvectors of T.

Equivalently, A ∈ F^{n×n} is similar to a diagonal matrix diag(λ_1, λ_2, ..., λ_n) if and only
if det(zI − A) = (z − λ_1)(z − λ_2) ... (z − λ_n), and A has n linearly independent eigenvectors.
Proof. Assume that there exists a basis [u_1, ..., u_n] in V such that T is represented
in this basis by the diagonal matrix Λ := diag(λ_1, ..., λ_n). Then the characteristic polynomial
of T is det(zI − Λ) = \prod_{i=1}^n (z − λ_i). From the definition of the representation matrix of T
it follows that Tu_i = λ_i u_i for i = 1, ..., n. Since each u_i ≠ 0, we deduce that each u_i is an
eigenvector of T. By our assumption u_1, ..., u_n form a basis in V.

Assume now that V has a basis [u_1, ..., u_n] consisting of eigenvectors of T. So Tu_i = λ_i u_i
for i = 1, ..., n. Hence Λ is the representation matrix of T in the basis [u_1, ..., u_n].

To prove the corresponding results for A ∈ F^{n×n}, let V := F^n and define the linear
operator Tx := Ax for all x ∈ F^n. □
Theorem 2.8 Let V be an n-dimensional vector space over F. Assume that T : V → V is
a linear transformation. Assume that the characteristic polynomial p(z) of T has n distinct
roots over F, i.e. p(z) = \prod_{i=1}^n (z − λ_i), where λ_1, ..., λ_n ∈ F and λ_i ≠ λ_j for each i ≠ j.
Then there exists a basis in V in which T is represented by a diagonal matrix.

Similarly, let A ∈ F^{n×n} and assume that det(zI − A) has n distinct roots in F. Then A
is similar to a diagonal matrix.

Proof. It is enough to consider the case of the linear transformation T. Recall that
each root of the characteristic polynomial of T is an eigenvalue of T (Corollary 2.6). Hence
to each λ_i corresponds an eigenvector u_i: Tu_i = λ_i u_i. Then the proof of the theorem
follows from Problem 1 of this section and Proposition 2.7. □
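Numerically, a matrix with distinct eigenvalues can be diagonalized directly from its eigenvectors,
as the following small NumPy sketch (an added illustration, not from the notes) shows.

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [0.0, 3.0]])          # eigenvalues 2 and 3 are distinct
    evals, evecs = np.linalg.eig(A)     # columns of evecs are eigenvectors

    # With distinct eigenvalues the eigenvectors form a basis, so U^{-1} A U is diagonal.
    U = evecs
    D = np.linalg.inv(U) @ A @ U
    print(np.round(D, 10))              # numerically diag(2, 3), up to ordering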
Given A ∈ F^{n×n} it may happen that det(zI − A) does not have n roots in F. (See for
example Problem 2 of this section.) Hence we cannot diagonalize A, i.e. A is not similar
to a diagonal matrix over F. If F is algebraically closed, i.e. any det(zI − A) has n roots in F,
we can apply Proposition 2.7 in general and Theorem 2.8 in particular to see if A
is diagonable.

Since R is not algebraically closed and C is, this is the reason that we sometimes
view a real valued matrix A ∈ R^{n×n} as a complex valued matrix A ∈ C^{n×n}. (See Problem
2 of this section.)
Corollary 2.9 Let A ∈ C^{n×n} be nondiagonable. Then its characteristic polynomial
must have a multiple root.

See Problem 3 of this section.
Definition 2.10 1. Let k be a positive integer and λ ∈ F. Then J_k(λ) ∈ F^{k×k} is
the k × k upper triangular matrix with λ on the main diagonal, 1 on the superdiagonal,
and all other entries equal to 0 for k > 1:

    J_k(λ) := \begin{pmatrix}
        λ & 1 & 0 & \dots & 0 & 0 \\
        0 & λ & 1 & \dots & 0 & 0 \\
        \vdots & \vdots & \vdots & & \vdots & \vdots \\
        0 & 0 & 0 & \dots & λ & 1 \\
        0 & 0 & 0 & \dots & 0 & λ
    \end{pmatrix},

(J_1(λ) = [λ].)
2. Let A_i ∈ F^{n_i×n_i} for i = 1, ..., k. Denote by

    ⊕_{i=1}^k A_i = A_1 ⊕ A_2 ⊕ ... ⊕ A_k = diag(A_1, A_2, ..., A_k)
      := \begin{pmatrix}
           A_1 & 0   & \dots & 0   \\
           0   & A_2 & \dots & 0   \\
           \vdots & \vdots & \ddots & \vdots \\
           0   & 0   & \dots & A_k
         \end{pmatrix} ∈ F^{n×n},  n = n_1 + n_2 + ... + n_k,

the n × n block diagonal matrix whose blocks are A_1, A_2, ..., A_k.
Theorem 2.11 (The Jordan Canonical Form) Let A ∈ C^{n×n} (A ∈ F^{n×n}, where F
is an algebraically closed field). Then A is similar to its Jordan canonical form ⊕_{i=1}^k J_{n_i}(λ_i)
for some λ_1, ..., λ_k ∈ C (λ_1, ..., λ_k ∈ F) and positive integers n_1, ..., n_k. The Jordan
canonical form is unique up to permutations of the Jordan blocks J_{n_1}(λ_1), ..., J_{n_k}(λ_k).
Equivalently, let T : V → V be a linear transformation of an n-dimensional space over C,
or any other algebraically closed field. Then there exists a basis in V such that ⊕_{i=1}^k J_{n_i}(λ_i)
is the representation matrix of T in this basis. The blocks J_{n_i}(λ_i), i = 1, ..., k, are unique.

Note that A ∈ C^{n×n} is diagonable if and only if in its Jordan canonical form k = n, i.e.
n_1 = ... = n_n = 1. For k < n, the Jordan canonical form is the simplest form in the
similarity class of a nondiagonable A ∈ C^{n×n}.
We will prove Theorem 2.11 in the next several sections.
Problems
1. Let V be a vector space over F. (You may assume that F = C.) Let T : V → V be a
linear transformation. Suppose that u_i is an eigenvector of T with the corresponding
eigenvalue λ_i for i = 1, ..., m. Show by induction on m that if λ_1, ..., λ_m are m
distinct scalars then u_1, ..., u_m are linearly independent.
2. Let A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} ∈ R^{2×2}.

   (a) Show that A is not diagonable over the real numbers R.

   (b) Show that A is diagonable over the complex numbers C. Find U ∈ C^{2×2} and a
   diagonal Λ ∈ C^{2×2} such that A = UΛU^{-1}.
3. Let A = ⊕_{i=1}^k J_{n_i}(λ_i). Show that det(zI − A) = \prod_{i=1}^k (z − λ_i)^{n_i}. (You may use the
fact that the determinant of an upper triangular matrix is the product of its diagonal
entries.)
4. Let A = ⊕_{i=1}^k A_i, where A_i ∈ C^{n_i×n_i}, i = 1, ..., k. Show that det(zI_n − A) =
\prod_{i=1}^k det(zI_{n_i} − A_i). (First show the identity for k = 2 using the determinant
expansion by rows. Then use induction for k > 2.)
5. (a) Show that any eigenvector of J_n(λ) ∈ C^{n×n} is in the subspace spanned by e_1.
   Conclude that J_n(λ) is not diagonable unless n = 1.

   (b) What is the rank of zI_n − J_n(λ) for a fixed λ ∈ C and for each z ∈ C?

   (c) What is the rank of zI − ⊕_{i=1}^k J_{n_i}(λ_i) for fixed λ_1, ..., λ_k ∈ C and for each z ∈ C?
6. Let A ∈ C^{n×n} and assume that det(zI_n − A) = z^n + a_1 z^{n−1} + ... + a_{n−1} z + a_n has
n distinct complex roots. Show that A^n + a_1 A^{n−1} + ... + a_{n−1} A + a_n I_n = 0, where
0 ∈ C^{n×n} denotes the zero matrix, i.e. the matrix whose entries are all 0. (This is
a special case of the Cayley-Hamilton theorem, which claims that the above identity
holds for any A ∈ C^{n×n}.) Hint: Use the fact that A is diagonable.
2.2 Matrix polynomials
For a field F, F = R, C, denote by F[z] the ring of polynomials p(z) = a_0 z^n + a_1 z^{n−1} + ... + a_n
with coefficients in F. The degree of p, denoted by deg p, is the maximal degree n − j of
a monomial a_j z^{n−j} which is not identically zero, i.e. a_j ≠ 0. So deg p = n if and only if
a_0 ≠ 0, the degree of a nonzero constant polynomial p(z) = a_0 is zero, and the degree of
the zero polynomial is agreed to be equal to −∞. For two polynomials p, q ∈ F[z] and two
scalars a, b ∈ F, ap(z) + bq(z) is a well defined polynomial. Hence F[z] is a vector space over F,
whose dimension is infinite. The set of polynomials of degree at most n is an (n+1)-dimensional
subspace of F[z]. Given two polynomials p = \sum_{i=0}^n a_i z^{n−i}, q = \sum_{j=0}^m b_j z^{m−j} ∈ F[z] one can
form the product

    p(z)q(z) = \sum_{k=0}^{n+m} (\sum_{i=0}^k a_i b_{k−i}) z^{n+m−k},  where a_i = b_j = 0 for i > n and j > m.

Note that pq = qp and deg pq = deg p + deg q. The addition and the product in F[z] satisfy
all the nice distribution identities as the addition and multiplication in F. Here the constant
polynomial p ≡ 1 is the identity element, and the zero polynomial is the zero element.
(That is the reason for the name ring of polynomials in one variable over F.)
Let P(z) = (p_{ij}(z))_{i,j=1}^{m,n} be an m × n matrix whose entries are polynomials in F[z].
The set of all such m × n matrices is denoted by F[z]^{m×n}. Clearly F[z]^{m×n} is a vector
space over F of infinite dimension. Given p(z) ∈ F[z] and P(z) ∈ F[z]^{m×n} one can define
p(z)P(z) := (p(z)p_{ij}(z)) ∈ F[z]^{m×n}. Again, this product satisfies nice distribution properties. Thus
F[z]^{m×n} is a module over the ring F[z]. (Note F[z] is not a field!)
Let P(z) = (p_{ij}(z)) ∈ F[z]^{m×n}. Then deg P(z) := max_{i,j} deg p_{ij}(z) = l. Write

    p_{ij}(z) = \sum_{k=0}^l p_{ij,k} z^{l−k},   P_k := (p_{ij,k})_{i,j=1}^{m,n} ∈ F^{m×n} for k = 0, ..., l.

Then

    P(z) = P_0 z^l + P_1 z^{l−1} + ... + P_l,   P_i ∈ F^{m×n}, i = 0, ..., l,    (2.2)

is a matrix polynomial with coefficients in F^{m×n}.
Assume that P(z), Q(z) ∈ F[z]^{n×n}. Then we can define P(z)Q(z) ∈ F[z]^{n×n}. Note that in
general P(z)Q(z) ≠ Q(z)P(z). Hence F[z]^{n×n} is a noncommutative ring. For P(z) ∈ F[z]^{n×n}
of the form (2.2) and any A ∈ F^{n×n} we define

    P(A) = \sum_{i=0}^l P_i A^{l−i} = P_0 A^l + P_1 A^{l−1} + ... + P_l,  where A^0 = I_n.
Recall that given two polynomials p, q ∈ F[z] one can divide p by q ≢ 0 with the remainder
r, i.e. p = tq + r for some unique t, r ∈ F[z], where deg r < deg q. One can trivially generalize
that to polynomial matrices:
Proposition 2.12 Let p(z), q(z) ∈ F[z] and assume that q(z) ≢ 0. Let p(z) = t(z)q(z) +
r(z), where t(z), r(z) ∈ F[z] are unique polynomials with deg r(z) < deg q(z). Let n > 1 be an
integer, and define the following scalar polynomial matrices: P(z) := p(z)I_n, Q(z) := q(z)I_n, T(z) :=
t(z)I_n, R(z) := r(z)I_n ∈ F[z]^{n×n}. Then P(A) = T(A)Q(A) + R(A) for any A ∈ F^{n×n}.

Proof. Since A^i A^j = A^{i+j} for any nonnegative integers i, j, with A^0 = I_n, the equality
P(A) = T(A)Q(A) + R(A) follows trivially from the equality p(z) = t(z)q(z) + r(z). □
Recall that p is divisible by q, denoted as q|p, if p = tq, i.e. r is the zero polynomial.
Note that if q(z) = (z − a) then p(z) = t(z)(z − a) + p(a). Thus (z − a)|p if and only if
p(a) = 0. Similar results hold for square polynomial matrices which are not scalar.
Lemma 2.13 Let P(z) ∈ F[z]^{n×n}, A ∈ F^{n×n}. Then there exists a unique T_{left}(z), of
degree deg P − 1 if deg P > 0, or degree −∞ if deg P ≤ 0, such that

    P(z) = T_{left}(z)(zI − A) + P(A).    (2.3)

In particular, P(z) is divisible from the right by zI − A if and only if P(A) = 0.
Proof. We prove the lemma by induction on deg P. If deg P ≤ 0, i.e. P(z) = P_0 ∈ F^{n×n},
then T_{left} = 0, P(A) = P_0 and the lemma trivially holds. Suppose that the lemma holds
for all P with deg P ≤ l − 1, where l ≥ 1. Let P(z) be of degree l ≥ 1 of the form
(2.2). Then P(z) = P_0 z^l + \tilde P(z), where \tilde P(z) = \sum_{i=1}^l P_i z^{l−i}. By the induction assumption
\tilde P(z) = \tilde T_{left}(z)(zI_n − A) + \tilde P(A), where \tilde T_{left}(z) is unique. A straightforward calculation
shows that

    P_0 z^l = \hat T_{left}(z)(zI_n − A) + P_0 A^l,  where \hat T_{left}(z) = \sum_{i=0}^{l−1} P_0 A^i z^{l−i−1},

and \hat T_{left} is unique. Hence T_{left}(z) = \tilde T_{left}(z) + \hat T_{left}(z) is unique, P(A) = P_0 A^l + \tilde P(A), and
(2.3) follows.

Suppose that P(A) = 0. Then P(z) = T_{left}(z)(zI − A), i.e. P(z) is divisible by zI_n − A
from the right. Assume now that P(z) is divisible by (zI_n − A) from the right, i.e. there exists
T(z) ∈ F[z]^{n×n} such that P(z) = T(z)(zI_n − A). Subtract (2.3) from P(z) = T(z)(zI_n − A)
to deduce that 0 = (T(z) − T_{left}(z))(zI_n − A) − P(A). Hence T(z) = T_{left}(z) and P(A) = 0.
□
The above lemma can be generalized to any Q(z) = Q_0 z^l + Q_1 z^{l−1} + ... + Q_l ∈ F[z]^{n×n}, where
Q_0 ∈ GL(n, F): there exist unique T_{left}(z), R_{left}(z) ∈ F[z]^{n×n} such that

    P(z) = T_{left}(z)Q(z) + R_{left}(z),  deg R_{left} < deg Q,  Q(z) = \sum_{i=0}^l Q_i z^{l−i},  Q_0 ∈ GL(n, F).
    (2.4)

Here we agree that (Az^i)(Bz^j) = (AB)z^{i+j} for any A, B ∈ F^{n×n} and nonnegative integers
i, j.
Theorem 2.14 (Cayley-Hamilton theorem.) Let A ∈ F^{n×n} and let p(z) = det(zI_n − A)
be the characteristic polynomial of A. Let P(z) = p(z)I_n ∈ F[z]^{n×n}. Then P(A) = 0.

Proof. Let A(z) = zI_n − A. Fix z ∈ F and let B(z) = (b_{ij}(z)) be the adjoint matrix of
A(z), whose entries are the cofactors of A(z). That is, b_{ij}(z) is (−1)^{i+j} times the determinant
of the matrix obtained from A(z) by deleting row j and column i. If one views z as an
indeterminate then B(z) ∈ F[z]^{n×n}. Recall the identity

    A(z)B(z) = B(z)A(z) = det A(z) I_n = p(z)I_n = P(z).

Hence zI_n − A divides P(z) from the right. Lemma 2.13 yields that P(A) = 0. □
For p, q ∈ F[z] let (p, q) be the greatest common divisor of p and q. If p and q are identically
zero then (p, q) is the zero polynomial. Otherwise (p, q) is a polynomial s of the highest
degree that divides p and q. s is determined up to multiplication by a nonzero scalar. s can be
chosen as a unique monic polynomial:

    s(z) = z^l + s_1 z^{l−1} + ... + s_l ∈ F[z].    (2.5)

For p, q ≢ 0, s can be found using the Euclid algorithm:

    p_i(z) = t_i(z) p_{i+1}(z) + p_{i+2}(z),  deg p_{i+2} < deg p_{i+1},  i = 1, ...    (2.6)

Start this algorithm with p_1 = p, p_2 = q. Continue it until p_k = 0 for the first time. (Note that
k ≥ 3.) Then p_{k−1} = (p, q). It is straightforward to show from the above algorithm that

    (p(z), q(z)) = u(z)p(z) + v(z)q(z),  for some u(z), v(z) ∈ F[z].    (2.7)

(This formula holds for any p, q ∈ F[z].) p, q ∈ F[z] are called coprime if (p, q) = 1.
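The Euclid algorithm (2.6) translates directly into code. Below is a minimal NumPy sketch (an
added illustration, not from the notes; for exact arithmetic a symbolic package would be more
appropriate, here a tolerance decides when a remainder counts as zero).

    import numpy as np

    def poly_gcd(p, q, tol=1e-10):
        """Monic gcd of two polynomials by the Euclid algorithm (2.6).

        p, q are coefficient arrays, highest degree first."""
        p = np.trim_zeros(np.asarray(p, float), 'f')
        q = np.trim_zeros(np.asarray(q, float), 'f')
        while q.size and np.max(np.abs(q)) > tol:
            _, r = np.polydiv(p, q)          # p = t*q + r with deg r < deg q
            p, q = q, np.trim_zeros(r, 'f')
        return p / p[0]                      # normalize to a monic polynomial

    # gcd of (z-1)^2 (z+2) and (z-1)(z-3): should be z - 1.
    p = np.polymul(np.polymul([1, -1], [1, -1]), [1, 2])
    q = np.polymul([1, -1], [1, -3])
    print(poly_gcd(p, q))                    # approximately [ 1. -1.]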
Corollary 2.15 Let p, q ∈ F[z] be coprime. Then there exist u, v ∈ F[z] such that
1 = up + vq. Let n > 1 be an integer and define P(z) := p(z)I_n, Q(z) := q(z)I_n, U(z) :=
u(z)I_n, V(z) := v(z)I_n ∈ F[z]^{n×n}. Then for any A ∈ F^{n×n} we have the identity I_n =
U(A)P(A) + V(A)Q(A), where U(A)P(A) = P(A)U(A) and V(A)Q(A) = Q(A)V(A).
Let us consider the case where p, q ∈ F[z] are both nonzero polynomials that split (into
linear factors) over F. So

    p(z) = p_0 (z − α_1) ... (z − α_i), p_0 ≠ 0,   q(z) = q_0 (z − β_1) ... (z − β_j), q_0 ≠ 0.

In that case (p, q) = 1 if p and q do not have a common root. If p and q have a common
zero then (p, q) is a nonzero polynomial that has the maximal number of common roots
of p and q counted with multiplicities.
From now on, for any p ∈ F[z] and A ∈ F^{n×n} we identify p(A) with P(A), where
P(z) = p(z)I_n.
2.3 Minimal polynomial and decomposition to invariant subspaces
Recall that F^{n×n} is a vector space over F of dimension n^2. Let A ∈ F^{n×n} and consider the
powers A^0 = I_n, A, A^2, ..., A^m. Let m be the smallest positive integer such that these m+1
matrices are linearly dependent as vectors in F^{n×n}. (Note that A^0 ≠ 0.) So \sum_{i=0}^m b_i A^{m−i} =
0, and (b_0, ..., b_m)^T ≠ 0. If b_0 = 0 then A^0, ..., A^{m−1} are linearly dependent, which
contradicts the definition of m. Hence b_0 ≠ 0. Divide the linear dependence by b_0 to obtain

    ψ(A) = 0,  ψ(z) = z^m + \sum_{i=1}^m a_i z^{m−i} ∈ F[z],  a_i = b_i / b_0 for i = 1, ..., m.    (2.8)

ψ is called the minimal polynomial of A. In principle m ≤ n^2, but in reality m ≤ n:
Theorem 2.16 Let A ∈ F^{n×n} and let ψ(z) be its minimal polynomial. Assume that
p(z) ∈ F[z] is an annihilating polynomial of A, i.e. p(A) = 0. Then ψ divides p. In
particular, the characteristic polynomial p(z) = det(zI_n − A) is divisible by ψ(z). Hence
deg ψ ≤ deg p = n.

Proof. Divide the annihilating polynomial p by ψ to obtain p(z) = t(z)ψ(z) + r(z),
where deg r < deg ψ = m. Proposition 2.12 yields that p(A) = t(A)ψ(A) + r(A), which
implies that r(A) = 0. Assume that l = deg r(z) ≥ 0, i.e. r is not identically the zero
polynomial. Then A^0, ..., A^l are linearly dependent, which contradicts the definition of m.
Hence r(z) ≡ 0.

The Cayley-Hamilton theorem yields that the characteristic polynomial p(z) of A
annihilates A. Hence ψ|p and deg ψ ≤ deg p = n. □
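The definition (2.8) can be turned into a naive numerical procedure: look for the first power A^m
that is a linear combination of the lower powers. The sketch below (an added illustration, not
from the notes; it uses least squares and a tolerance, so it is only reliable for well-conditioned
examples) returns the coefficients of ψ, highest degree first.

    import numpy as np

    def minimal_polynomial(A, tol=1e-9):
        """Monic minimal polynomial of A, via (2.8): the smallest m with
        I, A, ..., A^m linearly dependent.  Returns [1, a_1, ..., a_m]."""
        n = A.shape[0]
        powers = [np.eye(n).ravel()]
        for m in range(1, n + 1):
            target = np.linalg.matrix_power(A, m).ravel()
            M = np.column_stack(powers)                    # n^2 x m matrix [I, A, ..., A^{m-1}]
            c = np.linalg.lstsq(M, target, rcond=None)[0]  # best c with sum_i c_i A^i ~ A^m
            if np.linalg.norm(M @ c - target) < tol:
                # A^m = c_{m-1} A^{m-1} + ... + c_0 I, so psi(z) = z^m - c_{m-1} z^{m-1} - ... - c_0.
                return np.concatenate(([1.0], -c[::-1]))
            powers.append(target)
        raise AssertionError("unreachable: by Cayley-Hamilton m <= n")

    # diag(1, 1, 2): characteristic polynomial (z-1)^2 (z-2), minimal polynomial (z-1)(z-2).
    A = np.diag([1.0, 1.0, 2.0])
    print(minimal_polynomial(A))   # approximately [ 1. -3.  2.], i.e. z^2 - 3z + 2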
Definition 2.17 A matrix A ∈ F^{n×n} is called nonderogatory if the minimal polynomial
of A is equal to its characteristic polynomial.
Definition 2.18 Let V be a finite dimensional vector space over F, and assume that
V_1, ..., V_i are nonzero subspaces of V. Then V is a direct sum of V_1, ..., V_i, denoted as
V = ⊕_{j=1}^i V_j, if any vector v ∈ V has a unique representation as v = v_1 + ... + v_i, where
v_j ∈ V_j for j = 1, ..., i. Equivalently, let [v_{j1}, ..., v_{jl_j}] be a basis of V_j for j = 1, ..., i.
Then dim V = \sum_{j=1}^i dim V_j = \sum_{j=1}^i l_j, and the dim V vectors v_{11}, ..., v_{1l_1}, ..., v_{i1}, ..., v_{il_i}
are linearly independent.
Let T : V → V be a linear operator. A subspace U of V is called a T-invariant subspace,
or simply an invariant subspace when there is no ambiguity about T, if Tu ∈ U for each
u ∈ U. We denote this fact by TU ⊆ U. Denote by T|_U the restriction of T to the invariant
subspace U. Clearly, T|_U is a linear operator on U.

Note that V and the zero subspace {0} (which consists only of the zero element) are invariant
subspaces. These are called trivial invariant subspaces. U is called a nontrivial invariant
subspace if U is an invariant subspace such that 0 < dim U < dim V.
Since the representation matrices of T in different bases form a similarity class, we can
define the minimal polynomial ψ(z) ∈ F[z] of T as the minimal polynomial of any
representation matrix of T. (See Problem 1 at the end of this section.) Equivalently, ψ(z) is the
monic polynomial of minimal degree which annihilates T: ψ(T) = 0.
Theorem 2.19 Let T : V → V be a linear operator on a finite dimensional space,
dim V > 0. Let ψ(z) be the minimal polynomial of T. Assume that ψ(z) decomposes as
ψ(z) = ψ_1(z) ... ψ_k(z), where each ψ_i(z) is a monic polynomial of degree at least 1. Suppose
furthermore that for each pair i ≠ j, ψ_i(z) and ψ_j(z) are coprime. Then V is a direct
sum of V_1, ..., V_k, where each V_i is a nontrivial invariant subspace of T. Furthermore the
minimal polynomial of T|_{V_i} is equal to ψ_i(z) for i = 1, ..., k. Moreover, each V_i is uniquely
determined by ψ_i(z) for i = 1, ..., k.
Proof. We prove the theorem by induction on k ≥ 2. Let k = 2. So ψ(z) =
ψ_1(z)ψ_2(z). Let V_1 := ψ_2(T)V, V_2 := ψ_1(T)V be the ranges of the operators ψ_2(T), ψ_1(T)
respectively. Observe that

    TV_1 = T(ψ_2(T)V) = (Tψ_2(T))V = (ψ_2(T)T)V = ψ_2(T)(TV) ⊆ ψ_2(T)V = V_1.

Thus V_1 is a T-invariant subspace. Assume that V_1 = {0}. This is equivalent to
ψ_2(T) = 0. By Theorem 2.16, ψ divides ψ_2, which is impossible since deg ψ = deg ψ_1 +
deg ψ_2 > deg ψ_2. Thus dim V_1 > 0. Similarly V_2 is a nonzero T-invariant subspace. Let
T_i = T|_{V_i} for i = 1, 2. Clearly

    ψ_1(T_1)V_1 = ψ_1(T)V_1 = ψ_1(T)(ψ_2(T)V) = (ψ_1(T)ψ_2(T))V = ψ(T)V = 0,

since ψ is the minimal polynomial of T. Hence ψ_1(T_1) = 0, i.e. ψ_1 is an annihilating
polynomial of T_1. Similarly, ψ_2(T_2) = 0.

Let U = V_1 ∩ V_2. Then U is an invariant subspace of T. We claim that U = {0},
i.e. dim U = 0. Assume to the contrary that dim U ≥ 1. Let Q := T|_U and denote by
θ ∈ F[z] the minimal polynomial of Q. Clearly deg θ ≥ 1. Since U ⊆ V_i it follows that ψ_i
is an annihilating polynomial of Q for i = 1, 2. Hence θ|ψ_1 and θ|ψ_2, i.e. θ is a nontrivial
common factor of ψ_1 and ψ_2. This contradicts the assumption that ψ_1 and ψ_2 are coprime. Hence
V_1 ∩ V_2 = {0}.

Since (ψ_1, ψ_2) = 1 there exist polynomials f, g ∈ F[z] such that ψ_1 f + ψ_2 g = 1. Hence
I = ψ_1(T)f(T) + ψ_2(T)g(T), where I is the identity operator Iv = v on V. In particular for
any v ∈ V we have v = v_2 + v_1, where v_1 = ψ_2(T)(g(T)v) ∈ V_1, v_2 = ψ_1(T)(f(T)v) ∈ V_2.
Since V_1 ∩ V_2 = {0} it follows that V = V_1 ⊕ V_2. Let \tilde ψ_i be the minimal polynomial
of T_i. Then \tilde ψ_i | ψ_i for i = 1, 2. Hence \tilde ψ_1 \tilde ψ_2 | ψ_1 ψ_2. Let v ∈ V. Then v = v_1 + v_2, where
v_i ∈ V_i, i = 1, 2. Using the facts that \tilde ψ_1(T)\tilde ψ_2(T) = \tilde ψ_2(T)\tilde ψ_1(T), that \tilde ψ_i is the minimal
polynomial of T_i, and the definition of T_i, we deduce

    \tilde ψ_1(T)\tilde ψ_2(T)v = \tilde ψ_2(T)\tilde ψ_1(T)v_1 + \tilde ψ_1(T)\tilde ψ_2(T)v_2 = 0.

Hence the monic polynomial \tilde ψ(z) := \tilde ψ_1(z)\tilde ψ_2(z) is an annihilating polynomial of T. Thus
ψ(z)|\tilde ψ(z), which implies that \tilde ψ(z) = ψ(z), hence \tilde ψ_i = ψ_i for i = 1, 2.

It is left to show that V_1 and V_2 are unique. Let \tilde V_i := {v ∈ V : ψ_i(T)v = 0} for
i = 1, 2. So \tilde V_i is a subspace that contains V_i for i = 1, 2. If ψ_i(T)v = 0 then

    ψ_i(T)(Tv) = (ψ_i(T)T)v = (Tψ_i(T))v = T(ψ_i(T)v) = T0 = 0.

Hence \tilde V_i is a T-invariant subspace. We claim that \tilde V_i = V_i. Suppose to the contrary that
dim \tilde V_i > dim V_i for some i ∈ {1, 2}. Let j ∈ {1, 2} with j ≠ i. Then dim(\tilde V_i ∩ V_j) > 0.
As before we conclude that U := \tilde V_i ∩ V_j is a T-invariant subspace. As above, the minimal
polynomial of T|_U must divide ψ_1(z) and ψ_2(z), which contradicts the assumption that
(ψ_1, ψ_2) = 1. This concludes the proof of the theorem for k = 2.

Assume that k ≥ 3. Let \tilde ψ_2 := ψ_2 ... ψ_k. Then (ψ_1, \tilde ψ_2) = 1 and ψ = ψ_1 \tilde ψ_2. Then
V = V_1 ⊕ \tilde V_2, where T : V_1 → V_1 has the minimal polynomial ψ_1, and T : \tilde V_2 → \tilde V_2
has the minimal polynomial \tilde ψ_2. Note that V_1 and \tilde V_2 are unique. Apply the induction
hypothesis to T|_{\tilde V_2} to deduce the theorem. □
Problems
1. Let A, B ∈ F^{n×n} and p(z) ∈ F[z]. Show:

   (a) If B = UAU^{-1}, for some U ∈ GL(n, F), then p(B) = Up(A)U^{-1}.

   (b) If A ≈ B then A and B have the same minimal polynomial.

   (c) Let Ax = λx. Then p(A)x = p(λ)x. Deduce that each eigenvalue of A is a root
   of the minimal polynomial of A.

   (d) Assume that A has n distinct eigenvalues. Then A is nonderogatory.
2. (a) Show that the Jordan block J_k(λ) ∈ F^{k×k} is nonderogatory.

   (b) Let λ_1, ..., λ_k ∈ F be k distinct elements. Let

       A = ⊕_{i=1}^{k} ⊕_{j=1}^{l_i} J_{m_{ij}}(λ_i),  where m_i = m_{i1} ≥ ... ≥ m_{il_i} ≥ 1, for i = 1, ..., k.    (2.9)

   Here m_{ij} and l_i are positive integers. Find the minimal polynomial
   of A. When is A nonderogatory?
3. Find the characteristic and the minimal polynomials of

    C := \begin{pmatrix}
           2 & 2 & 2 & 4 \\
           4 & 3 & 4 & 6 \\
           1 & 1 & 1 & 2 \\
           2 & 2 & 2 & 4
         \end{pmatrix},
4. Let A := \begin{pmatrix} x & y \\ u & v \end{pmatrix}. Then A is a point in the four dimensional space R^4.

   (a) What is the condition that A has a multiple eigenvalue (det(zI_2 − A) = (z − λ)^2)?
   Conclude that the set (variety) of all 2×2 matrices with a multiple eigenvalue is a
   quadratic hypersurface in R^4, i.e. it satisfies a polynomial equation in (x, y, u, v)
   of degree 2. Hence its dimension is 3.

   (b) What is the condition that A has a multiple eigenvalue and is a diagonable
   matrix, i.e. similar to a diagonal matrix? Show that this set is a line in R^4. Hence
   its dimension is 1.

   (c) Conclude that the set (variety) of 2 × 2 matrices which have multiple eigenvalues
   and are diagonable is much smaller than the variety of matrices with a multiple
   eigenvalue.

   This fact holds for any n × n matrices in R^{n×n} or C^{n×n}.
5. Programming Problem

   Spectrum and pseudospectrum: Let A = (a_{ij})_{i,j=1}^n ∈ C^{n×n}. Then det(zI_n − A) =
   (z − λ_1) ... (z − λ_n) and the spectrum of A is given as spec A := {λ_1, ..., λ_n}. In
   computations, the entries of A are known or given up to a certain precision. Say, in
   regular precision each a_{ij} is known with a precision of eight digits: a_1.a_2...a_8 × 10^m
   for some integer m, e.g. 1.2345678 × 10^{−12}, in floating point notation. Thus, with
   a given matrix A, we associate a whole class C(A) ⊂ C^{n×n} of matrices
   B ∈ C^{n×n} that are represented by A. For each B ∈ C(A) we have the spectrum
   spec B. Then the pseudospectrum of A is the union of all the spectra of B ∈ C(A):
   pspec A := ∪_{B ∈ C(A)} spec(B). spec A and pspec A are subsets of the complex plane
   C and can be easily plotted by computer. The shape of pspec A gives an idea of
   our real knowledge of the spectrum of A, and of the changes of the spectrum of A under
   perturbations. The purpose of this programming problem is to give the student a taste
   of this subject.

   In all the computations use double precision.

   (a) Choose at random A = (a_{ij}) ∈ R^{5×5} as follows: each entry a_{ij} is chosen at
   random from the interval [−1, 1], using the uniform distribution. Find the spectrum
   of A and plot the eigenvalues of A on the X–Y axes as complex numbers, marked
   say as +, where the center of + is at each eigenvalue.

      i. For each ε = 0.1, 0.01, 0.0001, 0.000001 do the following:
      For i = 1, ..., 100 choose B_i ∈ R^{5×5} at random as A in item (a) and
      find the spectrum of A + εB_i. Plot these spectra, each eigenvalue of A + εB_i
      plotted as a point on the X–Y axes, together with the plot of the spectrum of A.
      (Altogether you will have 4 graphs.)

   (b) Let A := diag(0.1C, [0.5]), i.e. A ∈ R^{5×5} is a block diagonal matrix where the
   first 4 × 4 block is 0.1C, where the matrix C is given in Problem 3 above, and
   the second block is the 1 × 1 matrix with the entry 0.5. Repeat part (i) of part (a)
   above with this specific A. (Again you will have 4 graphs.)

   (c) Repeat (a) by choosing at random a symmetric matrix A = (a_{ij}) ∈ R^{5×5}. That
   is, choose at random a_{ij} for 1 ≤ i ≤ j, and let a_{ji} = a_{ij} for i < j.

      i. Repeat part (i) of (a). (The B_j are not symmetric!) You will have 4 graphs.

      ii. Repeat part (i) of (a), with the restriction that each B_j is a random symmetric
      matrix, as explained in (c). You will have 4 graphs.

   (d) Can you draw some conclusions from these numerical experiments?
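For orientation, here is a minimal Python sketch of the kind of experiment asked for in part (a)
(an added illustration, not a solution of the problem; for brevity all four values of ε are overlaid
on a single plot, whereas the problem asks for one graph per ε).

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n, trials = 5, 100
    A = rng.uniform(-1.0, 1.0, (n, n))            # random A with entries in [-1, 1]

    w0 = np.linalg.eigvals(A)
    plt.plot(w0.real, w0.imag, 'k+', markersize=12)
    for eps in [0.1, 0.01, 0.0001, 0.000001]:
        for _ in range(trials):
            B = rng.uniform(-1.0, 1.0, (n, n))
            w = np.linalg.eigvals(A + eps * B)    # spectrum of a perturbed matrix
            plt.plot(w.real, w.imag, '.', markersize=2)
    plt.xlabel('Re')
    plt.ylabel('Im')
    plt.show()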
2.4 Existence and uniqueness of the Jordan canonical form
Definition 2.20 A ∈ F^{n×n}, or a linear transformation T : V → V, is called nilpotent
if A^m = 0 or T^m = 0 respectively, for some positive integer m. The minimal m ≥ 1 for which A^m = 0
or T^m = 0 is called the index of nilpotency of A or T respectively, and is denoted by index A or index T
respectively.

Assume that A or T is nilpotent; then the s-numbers are defined as

    s_i(A) := rank A^{i−1} − 2 rank A^i + rank A^{i+1},  s_i(T) := rank T^{i−1} − 2 rank T^i + rank T^{i+1},  i = 1, ...
    (2.10)

Note that A or T is nilpotent with index of nilpotency m if and only if z^m is the minimal
polynomial of A or T respectively. Furthermore, if A or T is nilpotent then the maximal l
for which s_l > 0 is equal to the index of nilpotency of A or T respectively.
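The s-numbers of (2.10) are a simple rank computation, as in the following NumPy sketch (an
added illustration, not from the notes); the example matrix is J_3(0) ⊕ J_1(0).

    import numpy as np

    def s_numbers(A, index):
        """s_i(A) = rank A^{i-1} - 2 rank A^i + rank A^{i+1} for i = 1, ..., index, cf. (2.10)."""
        rank = lambda M: np.linalg.matrix_rank(M)
        powers = [np.linalg.matrix_power(A, j) for j in range(index + 2)]
        return [rank(powers[i - 1]) - 2 * rank(powers[i]) + rank(powers[i + 1])
                for i in range(1, index + 1)]

    # A nilpotent matrix: J_3(0) (+) J_1(0); the index of nilpotency is 3.
    A = np.zeros((4, 4))
    A[0, 1] = A[1, 2] = 1.0
    print(s_numbers(A, 3))      # [1, 0, 1]: one Jordan block of size 1 and one of size 3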
Proposition 2.21 Let T : V → V be a nilpotent operator, with index of nilpotency
m, on the finite dimensional vector space V. Then

    rank T^i = \sum_{j=i+1}^m (j − i)s_j = (m−i)s_m + (m−i−1)s_{m−1} + ... + s_{i+1},  i = 0, ..., m−1.    (2.11)

Proof. Since T^l = 0 for l ≥ m it follows that s_m(T) = rank T^{m−1} and s_{m−1} =
rank T^{m−2} − 2 rank T^{m−1} if m > 1. This proves (2.11) for i = m−1, m−2. For other values
of i, (2.11) follows straightforwardly from (2.10) by induction on m − i ≥ 2. □
Theorem 2.22 Let T : V → V be a linear transformation on a finite dimensional
space. Assume that T is nilpotent with index of nilpotency m. Then V has a basis of
the form

    x_j, Tx_j, ..., T^{l_j−1} x_j,  j = 1, ..., i,  where l_1 = m ≥ ... ≥ l_i ≥ 1, and T^{l_j} x_j = 0, j = 1, ..., i.
    (2.12)

More precisely, the number of the l_j which are equal to an integer l ∈ [1, m] is equal to s_l(T)
given in (2.10).
Proof. Let s_i := s_i(T), i = 1, ..., m, be given by (2.10). Since T^l = 0 for l ≥ m it follows
that s_m = rank T^{m−1} = dim range T^{m−1}. Let [y_1, ..., y_{s_m}] be a basis for T^{m−1}V. Clearly
y_i = T^{m−1} x_i for some x_1, ..., x_{s_m} ∈ V. We claim that the m s_m vectors

    x_1, Tx_1, ..., T^{m−1}x_1, ..., x_{s_m}, Tx_{s_m}, ..., T^{m−1}x_{s_m}    (2.13)

are linearly independent. Suppose that there exists a linear combination of these vectors
that is equal to 0:

    \sum_{j=0}^{m−1} \sum_{k=1}^{s_m} α_{jk} T^j x_k = 0.    (2.14)

Multiply this equality by T^{m−1}. Thus we obtain \sum_{j=0}^{m−1} \sum_{k=1}^{s_m} α_{jk} T^{m−1+j} x_k = 0. Recall
that T^l = 0 for any l ≥ m. Hence this equality reduces to \sum_{k=1}^{s_m} α_{0k} T^{m−1} x_k = 0. Since
T^{m−1}x_1, ..., T^{m−1}x_{s_m} form a basis in T^{m−1}V it follows that α_{0k} = 0 for k = 1, ..., s_m. If
m = 1 we deduce that the vectors in (2.13) are linearly independent. Assume that m > 1.
Suppose that we have already proved that α_{jk} = 0 for k = 1, ..., s_m and j = 0, ..., l−1, where
1 ≤ l ≤ m−1. Hence in (2.14) we can assume that the summation on j starts from
j = l. Multiply (2.14) by T^{m−l−1} and use the above arguments to deduce that α_{lk} = 0
for k = 1, ..., s_m. Use this argument iteratively for l = 1, ..., m−1 to deduce the linear
independence of the vectors in (2.13).

Note that for m = 1 we have proved the theorem. Assume that m > 1. Let p ∈ [1, m] be an
integer. We claim that the vectors

    x_j, Tx_j, ..., T^{l_j−1} x_j,  for all j such that l_j ≥ p    (2.15)

are linearly independent and satisfy the condition T^{l_j} x_j = 0 for all l_j ≥ p. Moreover, the
vectors

    T^{p−1} x_j, ..., T^{l_j−1} x_j,  for all j such that l_j ≥ p    (2.16)

form a basis for range T^{p−1}. Furthermore, for each integer l ∈ [p, m] the number of the l_j which
are equal to l is equal to s_l(T).

We prove this claim by induction on m − p + 1. For p = m our previous arguments give
this claim. Assume that the claim holds for p = q ≤ m and let p = q − 1. By the induction
assumption the vectors in (2.15) are linearly independent for l_j ≥ q. Hence the vectors
T^{q−2} x_j, ..., T^{l_j−1} x_j for all l_j ≥ q are linearly independent. Use the induction assumption
that the number of l_j = l ∈ [q, m] is equal to s_l(T) to deduce that the number of these
vectors is equal to t_{q−2} := (m−q+2)s_m + (m−q+1)s_{m−1} + ... + 2s_q. Also the number of
l_j ≥ q is L_q = s_m + s_{m−1} + ... + s_q. Use the formula for rank T^{q−2} in (2.11) to deduce that
rank T^{q−2} − t_{q−2} = s_{q−1}.

Suppose first that s_{q−1} = 0. Then the vectors T^{q−2} x_j, ..., T^{l_j−1} x_j for all l_j ≥ q form a
basis in range T^{q−2}. In this case there is no l_j that is equal to q − 1. This
concludes the proof of the induction step and the proof of the theorem in this case.

Assume now that s_{q−1} > 0. Then there exist vectors z_1, ..., z_{s_{q−1}} that together with
the vectors T^{q−2} x_j, ..., T^{l_j−1} x_j for all l_j ≥ q form a basis in T^{q−2}V. Let z_k = T^{q−2} u_k, k =
1, ..., s_{q−1}. Observe next that by the induction hypothesis the vectors given in (2.16) form a
basis in range T^{p−1} for p = q. Hence T^{q−1} u_k = \sum_{j: l_j ≥ q} \sum_{r=q−1}^{l_j−1} α_{k,r,j} T^r x_j. Let v_k :=
u_k − \sum_{j: l_j ≥ q} \sum_{r=q−1}^{l_j−1} α_{k,r,j} T^{r−q+1} x_j. Clearly T^{q−1} v_k = 0 for k = 1, ..., s_{q−1}. Also
T^{q−2} v_k = z_k − \sum_{j: l_j ≥ q} \sum_{r=q−1}^{l_j−1} α_{k,r,j} T^{r−1} x_j. Hence T^{q−2}v_1, ..., T^{q−2}v_{s_{q−1}} and the
vectors T^{q−2} x_j, ..., T^{l_j−1} x_j for all l_j ≥ q form a basis in T^{q−2}V. From the above
definition of L_q, l_j ≥ q if and only if j ∈ [1, L_q]. Let x_j = v_{j−L_q} and l_j = q−1 for
j = L_q + 1, ..., L_{q−1} := L_q + s_{q−1}.

It is left to show that the vectors given in (2.15) are linearly independent for p = q − 1.
This is done as in the beginning of the proof of the theorem. (Assume that a linear
combination of these vectors is equal to 0. Then apply T^{q−2} and use the fact that T^{l_j} x_j = 0
for j = 1, ..., L_{q−1}. Then continue as in the beginning of the proof of this theorem.) This
concludes the proof of this theorem by induction. □
Corollary 2.23 Let T satisfy the assumptions of Theorem 2.22. Denote
V_j := span(T^{l_j−1}x_j, ..., Tx_j, x_j) for j = 1, ..., i. Then each V_j is a T-invariant subspace,
T|_{V_j} is represented by J_{l_j}(0) ∈ C^{l_j×l_j} in the basis [T^{l_j−1}x_j, ..., Tx_j, x_j], and V = ⊕_{j=1}^i V_j.
Each l_j is uniquely determined by the sequence s_i(T), i = 1, .... Namely, the index m
of the nilpotent T is the largest i ≥ 1 such that s_i(T) ≥ 1. Let k_1 = s_m(T), l_1 = ... =
l_{k_1} = p_1 = m, and define recursively k_r := k_{r−1} + s_{p_r}(T), l_{k_{r−1}+1} = ... = l_{k_r} = p_r, where
r ≥ 2, p_r ∈ [1, m−1], s_{p_r}(T) > 0 and k_{r−1} = \sum_{j=1}^{m−p_r} s_{m−j+1}(T).
Definition 2.24 Let T : V → V be a nilpotent operator. Then the sequence (l_1, ..., l_i)
defined in Theorem 2.22, which gives the lengths of the corresponding Jordan blocks of T in
decreasing order, is called the Segre characteristic of T. The Weyr characteristic of T is the
dual of the Segre characteristic. That is, consider the m × i 0–1 matrix B = (b_{pq}) ∈ {0, 1}^{m×i}.
The j-th column of B has 1 in the rows 1, ..., l_j and 0 in the rest of the rows. Let ω_p be the
p-th row sum of B for p = 1, ..., m. Then ω_1 ≥ ... ≥ ω_m ≥ 1 is the Weyr characteristic.
Proof of Theorem 2.11 (The Jordan Canonical Form)
Let p(z) = det(zI_n − A) be the characteristic polynomial of A ∈ C^{n×n}. Since C is
algebraically closed, p(z) = \prod_{j=1}^k (z − λ_j)^{n_j}. Here λ_1, ..., λ_k are the k distinct roots (eigenvalues
of A), where n_j ≥ 1 is the multiplicity of λ_j in p(z). Note that \sum_{j=1}^k n_j = n. Let ψ(z) be
the minimal polynomial of A. By Theorem 2.16, ψ(z)|p(z). By Problem 1(c) of §2.3 we deduce
that ψ(λ_j) = 0 for j = 1, ..., k. Hence

    det(zI_n − A) = \prod_{j=1}^k (z − λ_j)^{n_j},  ψ(z) = \prod_{j=1}^k (z − λ_j)^{m_j},  1 ≤ m_j ≤ n_j,
    λ_j ≠ λ_i for j ≠ i, i, j = 1, ..., k.    (2.17)

Let ψ_j := (z − λ_j)^{m_j} for j = 1, ..., k. Then (ψ_j, ψ_i) = 1 for j ≠ i. Let V := C^n and
T : V → V be given by Tx := Ax for any x ∈ C^n. Then det(zI_n − A) and ψ(z) are the
characteristic and the minimal polynomials of T respectively. Use Theorem 2.19 to obtain
the decomposition V = ⊕_{i=1}^k V_i, where each V_i is a nontrivial T-invariant subspace such
that the minimal polynomial of T_i := T|_{V_i} is ψ_i for i = 1, ..., k. That is, T_i − λ_i I_i, where
I_i is the identity operator, i.e. I_i v = v for all v ∈ V_i, is a nilpotent operator on V_i and
index(T_i − λ_i I_i) = m_i. Let Q_i := T_i − λ_i I_i. Then Q_i is nilpotent and index Q_i = m_i.

Apply Theorem 2.22 and Corollary 2.23 to deduce that V_i = ⊕_{j=1}^{q_i} V_{i,j}, where each V_{i,j}
is a Q_i-invariant subspace, and each V_{i,j} has a basis in which Q_i is represented by a Jordan
block J_{m_{ij}}(0) for j = 1, ..., q_i. According to Corollary 2.23

    m_i = m_{i1} ≥ ... ≥ m_{iq_i} ≥ 1,  i = 1, ..., k.    (2.18)

Furthermore, the above sequence is completely determined by rank Q_i^j, j = 0, 1, ..., for i =
1, ..., k. Noting that T_i = Q_i + λ_i I_i it easily follows that each V_{i,j} is a T_i-invariant subspace,
hence a T-invariant subspace. Moreover, in the same basis of V_{i,j} in which Q_i is represented by
J_{m_{ij}}(0), T_i is represented by J_{m_{ij}}(λ_i), for j = 1, ..., q_i and i = 1, ..., k. This shows the
existence of the Jordan canonical form.

We now show that the Jordan canonical form is unique, up to a permutation of the blocks.
Note that the minimal polynomial of A is completely determined by its Jordan canonical
form. Namely ψ(z) = \prod_{i=1}^k (z − λ_i)^{m_{i1}}, where m_{i1} is the size of the biggest Jordan block with the
eigenvalue λ_i. (See Problems 1, 2 in §2.3.) Thus m_{i1} = m_i for i = 1, ..., k. Theorem
2.19 yields that the subspaces V_1, ..., V_k are uniquely determined by ψ. So each T_i and
Q_i = T_i − λ_i I_i are uniquely determined. Theorem 2.22 yields that rank Q_i^j, j = 0, 1, ...,
determines the sizes of the Jordan blocks of Q_i. Hence all the Jordan blocks corresponding
to λ_i are uniquely determined for each i ∈ [1, k]. □
Problems
1. Let T : V → V be nilpotent with m = index T. Let (ω_1, ..., ω_m) be the Weyr
characteristic of T. Show that rank T^j = \sum_{p=j+1}^m ω_p for j = 1, ..., m.
2. Let A ∈ F^{n×n}. Denote by p(z) and ψ(z) its characteristic and minimal polynomials,
by adj(zI_n − A) ∈ F[z]^{n×n} the adjoint matrix of zI_n − A, and by q(z) the g.c.d. (the greatest
common divisor) of the entries of adj(zI_n − A), which is the g.c.d. of all (n−1) × (n−1)
minors of (zI_n − A). (q(z) is assumed to be a monic polynomial in F[z].) The aim
of this problem is to demonstrate the equality ψ(z) = p(z)/q(z).

   (a) Show that q(z) divides p(z). (Hint: Expand det(zI_n − A) by the first row.) Let
   φ(z) := p(z)/q(z).

   (b) Show that adj(zI_n − A) = q(z)C(z) for some C(z) ∈ F[z]^{n×n}. Show that
   φ(z)I_n = C(z)(zI_n − A). (Recall from the proof of Theorem 2.14 that p(z)I_n =
   adj(zI_n − A)(zI_n − A).) Show that φ(A) = 0. Conclude that ψ(z)|φ(z).

   (c) Let χ(z) := p(z)/ψ(z). Show that χ(z) ∈ F[z].

   (d) Show that ψ(z)I_n = D(z)(zI_n − A) for some D(z) ∈ F[z]^{n×n}. Conclude that
   D(z) = (1/χ(z)) adj(zI_n − A). Conclude that χ(z)|q(z). Show that φ(z) = ψ(z).
3. Let A ∈ C^{n×n}. Show that A is diagonable if and only if all the zeros of the minimal
polynomial of A are simple, i.e. the minimal polynomial does not have multiple roots.
4. Let A ∈ C^{n×n} and assume that det(zI_n − A) = \prod_{i=1}^k (z − λ_i)^{n_i}, where λ_1, ..., λ_k are the
k distinct eigenvalues of A. Let

    s_i(A, λ_j) := rank(A − λ_j I_n)^{i−1} − 2 rank(A − λ_j I_n)^i + rank(A − λ_j I_n)^{i+1},    (2.19)
    i = 1, ..., n_j, j = 1, ..., k.

   (a) Show that s_i(A, λ_j) is the number of Jordan blocks of order i corresponding to
   λ_j, for i = 1, ..., n_j.

   (b) Show that in order to find all Jordan blocks of A corresponding to λ_j one
   can stop computing s_i(A, λ_j) at the smallest i ∈ [1, n_j] such that 1·s_1(A, λ_j) +
   2·s_2(A, λ_j) + ... + i·s_i(A, λ_j) = n_j.
3 Applications of the Jordan Canonical Form
3.1 Functions of Matrices
Let A ∈ C^{n×n}. Consider the iterations

    x_l = Ax_{l−1},  x_{l−1} ∈ C^n,  l = 1, ...    (3.1)

Clearly x_l = A^l x_0. To compute x_l from x_{l−1} one needs to perform n(2n−1) flops (operations:
n^2 multiplications and n(n−1) additions). If we want to compute x_{10^8} we need 10^8 n(2n−1)
operations, if we simply program the iterations (3.1). If n = 10 it will take us some time
to do these iterations, and we will probably run into roundoff error, which will render
our computations meaningless. Is there any better way to find x_{10^8}? The answer is yes,
and this is the purpose of this section. To do that we need to give the correct way to find
directly A^{10^8}, or for that matter any f(A), where f(z) is either a polynomial, or a more complex
function such as e^z, cos z, sin z, an entire function f(z), or even more special functions.
Theorem 3.1 Let A ∈ C^{n×n} and

    det(zI_n − A) = \prod_{i=1}^k (z − λ_i)^{n_i},  ψ(z) = \prod_{i=1}^k (z − λ_i)^{m_i},    (3.2)
    1 ≤ m := deg ψ = \sum_{i=1}^k m_i ≤ n = \sum_{i=1}^k n_i,  1 ≤ m_i ≤ n_i,  λ_i ≠ λ_j for i ≠ j, i, j = 1, ..., k,

where ψ(z) is the minimal polynomial of A. Then there exist m unique linearly independent
matrices Z_{ij} ∈ C^{n×n}, for i = 1, ..., k and j = 0, ..., m_i − 1, which depend on A, such that
for any polynomial f(z) the following identity holds:

    f(A) = \sum_{i=1}^k \sum_{j=0}^{m_i−1} \frac{f^{(j)}(λ_i)}{j!} Z_{ij}.    (3.3)

(Z_{ij}, i = 1, ..., k, j = 0, ..., m_i − 1, are called the A-components.)
Proof. We start first with A = J_n(λ). So J_n(λ) = λI_n + H_n, where H_n := J_n(0).
Thus H_n is a nilpotent matrix, with H_n^n = 0, and H_n^j has 1's on the j-th superdiagonal and all
other elements equal to 0, for j = 0, 1, ..., n−1. Hence I_n = H_n^0, H_n, ..., H_n^{n−1} are linearly
independent.

Let f(z) = z^l. Then

    A^l = (λI_n + H_n)^l = \sum_{j=0}^l \binom{l}{j} λ^{l−j} H_n^j = \sum_{j=0}^{\min(l,n−1)} \binom{l}{j} λ^{l−j} H_n^j.

The last equality follows from the equality H_n^j = 0 for j ≥ n. Note that ψ(z) = det(zI_n −
J_n(λ)) = (z − λ)^n, i.e. k = 1 and m = m_1 = n. From the above equality we conclude that
Z_{1j} = H_n^j for j = 0, ..., n−1 if f(z) = z^l and l = 0, 1, .... With this definition of Z_{1j}, (3.3) holds
for K_l z^l, where K_l ∈ C and l = 0, 1, .... Hence (3.3) holds for any polynomial f(z) for this
choice of A.
Assume now that A is a direct sum of Jordan blocks as in (2.9): A = ⊕_{i=1}^{k} ⊕_{j=1}^{l_i} J_{m_{ij}}(λ_i).
Here m_i = m_{i1} ≥ ... ≥ m_{il_i} ≥ 1 for i = 1, ..., k, and λ_i ≠ λ_j for i ≠ j. Thus (3.2) holds with
n_i = \sum_{j=1}^{l_i} m_{ij} for i = 1, ..., k. Let f(z) be a polynomial. Then f(A) = ⊕_{i=1}^{k} ⊕_{j=1}^{l_i} f(J_{m_{ij}}(λ_i)).
Use the results for J_n(λ) to deduce

    f(A) = ⊕_{i=1}^{k} ⊕_{j=1}^{l_i} \sum_{r=0}^{m_{ij}−1} \frac{f^{(r)}(λ_i)}{r!} H_{m_{ij}}^r.

Let Z_{ij} ∈ C^{n×n} be a block diagonal matrix of the following form. For each integer l ∈
[1, k] with l ≠ i all the blocks corresponding to J_{m_{lr}}(λ_l) are equal to zero. In the block
corresponding to J_{m_{ir}}(λ_i), Z_{ij} has the block H_{m_{ir}}^j, for j = 0, ..., m_i − 1. Note that
each Z_{ij} is a nonzero matrix with 0–1 entries. Furthermore, two different Z_{ij} and Z_{i'j'}
do not have a common 1 entry. Hence Z_{ij}, i = 1, ..., k, j = 0, ..., m_i − 1, are linearly
independent. It is straightforward to deduce (3.3) from the above identity.
Let B ∈ C^{n×n}. Then B = UAU^{-1}, where A is the Jordan canonical form of B. Recall
that A and B have the same characteristic and minimal polynomials. Let f(z) ∈ C[z]. Then (3.3) holds.
Clearly

    f(B) = Uf(A)U^{-1} = \sum_{i=1}^k \sum_{j=0}^{m_i−1} \frac{f^{(j)}(λ_i)}{j!} UZ_{ij}U^{-1}.

Hence (3.3) holds for B, where UZ_{ij}U^{-1}, i = 1, ..., k, j = 0, ..., m_i − 1, are the B-components.
The uniqueness of the A-components follows from the existence and uniqueness of the
Lagrange-Sylvester interpolation polynomial as explained below.
□
Theorem 3.2 (The Lagrange-Sylvester interpolation polynomial). Let λ_1, ..., λ_k ∈
C be k distinct numbers. Let m_1, ..., m_k be k positive integers and let m = m_1 + ... + m_k.
Let s_{ij}, i = 1, ..., k, j = 0, ..., m_i − 1, be any m complex numbers. Then there exists a
unique polynomial φ(z) of degree at most m−1 satisfying the conditions φ^{(j)}(λ_i) = s_{ij} for
i = 1, ..., k, j = 0, ..., m_i − 1. (For m_i = 1, i = 1, ..., k, φ is the
Lagrange interpolating polynomial.)
Proof. The Lagrange interpolating polynomial is given by the formula

    φ(z) = \sum_{i=1}^k \frac{(z − λ_1) ... (z − λ_{i−1})(z − λ_{i+1}) ... (z − λ_k)}
                        {(λ_i − λ_1) ... (λ_i − λ_{i−1})(λ_i − λ_{i+1}) ... (λ_i − λ_k)} s_{i0}.

In the general case one determines φ(z) as follows. Let ψ(z) := \prod_{i=1}^k (z − λ_i)^{m_i}. Then

    φ(z) = ψ(z) \sum_{i=1}^k \sum_{j=0}^{m_i−1} \frac{t_{ij}}{(z − λ_i)^{m_i−j}}.

Now start to determine the t_{ij} recursively, starting with any i and j = 0. It is
straightforward to show that t_{i0} = s_{i0}/ψ_i(λ_i), where ψ_i(z) = ψ(z)/(z − λ_i)^{m_i}. Now find t_{i1} by taking the
derivative of the above formula for φ(z) and letting z = λ_i. Continue this process until all
t_{ij}, i = 1, ..., k, j = 1, ..., m_i − 1, are determined. Note that deg φ ≤ m−1.

The uniqueness is shown as follows. Assume that θ(z) is another Lagrange-Sylvester
polynomial of degree less than m. Then ω(z) := φ(z) − θ(z) must be divisible by (z − λ_i)^{m_i},
since ω^{(j)}(λ_i) = 0 for j = 0, ..., m_i − 1, for each i = 1, ..., k. Hence ψ(z)|ω(z). As
deg ω(z) ≤ m−1 it follows that ω(z) is the zero polynomial, i.e. θ(z) = φ(z). □
Proof of the uniqueness of the A-components. Let φ_{ij}(z) be the Lagrange-Sylvester
polynomial given by the data s_{i'j'}, i' = 1, ..., k, j' = 0, ..., m_{i'} − 1, where s_{ij} = j! and all
other s_{i'j'} = 0. Then (3.3) yields that Z_{ij} = φ_{ij}(A). □
Proposition 3.3 Let A ∈ C^{n×n}. Assume that the minimal polynomial ψ(z) is given
by (3.2) and denote m = deg ψ. For integers u, v ∈ [1, n] denote by a^{(l)}_{uv} and
(Z_{ij})_{uv} the (u, v) entries of A^l and of the A-component Z_{ij} respectively. Then (Z_{ij})_{uv}, i =
1, ..., k, j = 0, ..., m_i − 1, are the unique solutions of the following system with m unknowns:

    \sum_{i=1}^k \sum_{j=0}^{m_i−1} \binom{l}{j} λ_i^{\max(l−j,0)} (Z_{ij})_{uv} = a^{(l)}_{uv},  l = 0, ..., m−1.    (3.4)

(Note that \binom{l}{j} = 0 for j > l.)
Proof. Consider the equality (3.3) for f(z) = z^l, where l = 0, ..., m−1. Restricting
these equalities to the (u, v) entries we deduce that the (Z_{ij})_{uv} satisfy the system (3.4). Thus
the systems (3.4) are solvable for each pair (u, v), u, v = 1, ..., n. Let X_{ij} ∈ C^{n×n}, i =
1, ..., k, j = 0, ..., m_i − 1, be such that the (X_{ij})_{uv} satisfy the system (3.4) for each u, v ∈ [1, n].
Hence f(A) = \sum_{i=1}^k \sum_{j=0}^{m_i−1} \frac{f^{(j)}(λ_i)}{j!} X_{ij} for f(z) = z^l and l = 0, ..., m−1. Hence the above
equality holds for any polynomial f(z) of degree less than m. Apply the above formula
to the Lagrange-Sylvester polynomial φ_{ij} as given in the proof of the uniqueness of the
A-components. Then φ_{ij}(A) = X_{ij}. So X_{ij} = Z_{ij}. Thus each system (3.4) has a unique
solution. □
The algorithm for finding the A-components and its complexity.

1. (a) Set i = 1.

   (b) Compute and store A^i. Check if I_n, A, ..., A^i are linearly independent. If
   independent, set i = i + 1 and go to (b).

   (c) Set m = i and express A^m = \sum_{i=1}^m a_i A^{m−i}. Then ψ(z) = z^m − \sum_{i=1}^m a_i z^{m−i} is the
   minimal polynomial.

   (d) Find the k roots of ψ(z) and their multiplicities: ψ(z) = \prod_{i=1}^k (z − λ_i)^{m_i}.

   (e) Find the A-components by solving the n^2 systems (3.4).

2. The maximum complexity to find ψ(z) happens when m = n. Then we need to compute
and store I_n, A, A^2, ..., A^n. So we need n^3 storage space. Viewing I_n, A, ..., A^i
as row vectors arranged as an i × n^2 matrix B_i ∈ C^{i×n^2}, we bring B_i to a row echelon
form: C_i = U_i B_i, U_i ∈ C^{i×i}. Note that C_i is essentially upper triangular. Then we
add the (i+1)-th row, A^{i+1}, to B_i to obtain C_{i+1} = U_{i+1} B_{i+1}. (C_i is an i × i submatrix of
C_{i+1}.) To get C_{i+1} from C_i we need 2in^2 flops. In the case m = n, C_{n+1} has last row
zero. So to find ψ(z) we need at most Kn^4 flops (K ≈ 2?). The total storage space
is around 2n^3.

Now, to find the roots of ψ(z) with a certain precision will take polynomial time,
depending on the precision.

To solve the n^2 systems with at most n unknowns, given in (3.4), use Gauss-Jordan elimination
for the augmented matrix [S T]. Here S ∈ C^{n×n} stands for the coefficient matrix of the system (3.4),
depending on λ_1, ..., λ_k; T ∈ C^{n×n^2} gives the right-hand sides of the n^2 systems of (3.4).
One needs around n^3 storage space. Bring [S T] to [I_n Q] using Gauss-Jordan to find the
A-components. To do that we need about n^4 flops.

In summary, we need storage of about 2n^3 and around 4n^4 flops. (This would suffice to find
the roots of ψ(z) with good enough precision.)
Problems
1. Let A ∈ C^{4×4} be given as in Problem 3 of Section 2.3. Assume that the characteristic
polynomial of A is z^2 (z − 1)^2.

   (a) Use Problem 4 of Section 2.4 to find the Jordan canonical form of A.

   (b) Assume that the minimal polynomial of A is z(z−1)^2. Find all the A-components.

   (c) Give an explicit formula for any A^l.
2. Let A ∈ C^{n×n} and assume that det(zI_n − A) = \prod_{i=1}^k (z − λ_i)^{n_i}, and that the minimal
polynomial is ψ(z) = \prod_{i=1}^k (z − λ_i)^{m_i}, where λ_1, ..., λ_k are the k distinct eigenvalues of A.
Let Z_{ij}, j = 0, ..., m_i − 1, i = 1, ..., k, be the A-components.

   (a) Show that Z_{ij} Z_{pq} = 0 for i ≠ p.

   (b) What is the exact formula for Z_{ij} Z_{ip}?
3.2 Power stability, convergence and boundedness of matrices
Corollary 3.4 Let A ∈ C^{n×n}. Assume that the minimal polynomial ψ(z) is given by
(3.2) and denote by Z_{ij}, i = 1, ..., k, j = 0, ..., m_i − 1, the A-components. Then for each
positive integer l

    A^l = \sum_{i=1}^k \sum_{j=0}^{m_i−1} \binom{l}{j} λ_i^{\max(l−j,0)} Z_{ij}.    (3.5)

If we know the A-components then to compute A^l we need only around 2mn^2 ≤ 2n^3 flops!
Thus we need at most 4n^4 flops to compute A^l, including the computation of the A-components,
independently of l! (Note that λ_i^j = e^{j log λ_i}.) So to find x_{10^8} = A^{10^8} x_0, discussed in
the beginning of the previous section, we need about 10^4 flops, compared with about 10^8 · 10^2
flops using the simple minded algorithm explained in the beginning of the previous section.
There are much simpler algorithms to compute A^l, which require roughly of the order
(log_2 l)·2n^3 operations and (log_2 l)·2n^2 (4n^2?) storage. See Problem ? However the roundoff
error remains a problem for large l.
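The "(log_2 l)·2n^3" alternative alluded to above is repeated squaring (binary powering). The
following NumPy sketch (an added illustration, not from the notes; formula (3.5) itself is not
implemented here) contrasts it with the simple-minded iteration.

    import numpy as np

    rng = np.random.default_rng(3)
    n, l = 10, 200
    A = rng.random((n, n))
    A = A / np.max(np.abs(np.linalg.eigvals(A)))   # normalize the spectral radius to 1
    x0 = rng.random(n)

    # Simple-minded iteration: l matrix-vector products, about l * n * (2n - 1) flops.
    x = x0.copy()
    for _ in range(l):
        x = A @ x

    # Repeated squaring: about log2(l) matrix-matrix products.
    y = np.linalg.matrix_power(A, l) @ x0

    print(np.allclose(x, y))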
Definition 3.5 Let A ∈ C^{n×n}. A is called power stable if lim_{l→∞} A^l = 0. A is called
power convergent if lim_{l→∞} A^l = B for some B ∈ C^{n×n}. A is called power bounded if there
exists K > 0 such that the absolute value of every entry of every A^l, l = 1, ..., is bounded
above by K.
Theorem 3.6 Let A ∈ C^{n×n}. Then

1. A is power stable if and only if each eigenvalue λ of A is in the interior of the unit disk:
|λ| < 1.

2. A is power convergent if and only if each eigenvalue λ of A satisfies one of the following
conditions:

   (a) |λ| < 1;

   (b) λ = 1 and each Jordan block of the JCF of A with the eigenvalue 1 is of order 1,
   i.e. 1 is a simple zero of the minimal polynomial of A.

3. A is power bounded if and only if each eigenvalue λ of A satisfies one of the following
conditions:

   (a) |λ| < 1;

   (b) |λ| = 1 and each Jordan block of the JCF of A with the eigenvalue λ is of order
   1, i.e. λ is a simple zero of the minimal polynomial of A.
Proof. Consider the formula (3.5). Since the A-components Z_{ij}, i = 1, ..., k, j =
0, ..., m_i − 1, are linearly independent, we need to satisfy the conditions of the theorem for
each term in (3.5), which is \binom{l}{j} λ_i^{l−j} Z_{ij} for l >> 1. Note that for a fixed j, lim_{l→∞} \binom{l}{j} λ_i^{l−j} = 0
if and only if |λ_i| < 1. Hence we deduce condition 1 of the theorem.

Note that the sequence \binom{l}{j} λ_i^{l−j}, l = j, j+1, ..., converges if and only if either |λ_i| < 1,
or λ_i = 1 and j = 0. Hence we deduce condition 2 of the theorem.

Note that the sequence \binom{l}{j} λ_i^{l−j}, l = j, j+1, ..., is bounded if and only if either |λ_i| < 1,
or |λ_i| = 1 and j = 0. Hence we deduce condition 3 of the theorem.
□
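Condition 1 of Theorem 3.6 is a one-line numerical check; conditions 2 and 3 additionally require
the Jordan structure at the eigenvalues of modulus 1 (e.g. via the s_i of Problem 4, Section 2.4).
A small NumPy sketch (an added illustration, not from the notes):

    import numpy as np

    def is_power_stable(A):
        """Condition 1 of Theorem 3.6: all eigenvalues strictly inside the unit disk."""
        return np.max(np.abs(np.linalg.eigvals(A))) < 1

    A = np.array([[0.5, 1.0],
                  [0.0, 0.5]])
    print(is_power_stable(A))                      # True
    print(np.linalg.matrix_power(A, 200).max())    # indeed very close to 0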
Corollary 3.7 Let A ∈ C^{n×n} and consider the iterations x_l = Ax_{l−1} for l = 1, ....
Then, for any x_0:

1. lim_{l→∞} x_l = 0 if and only if A is power stable.

2. x_l, l = 0, 1, ..., converges if and only if A is power convergent.

3. x_l, l = 0, 1, ..., is bounded if and only if A is power bounded.

Proof. If A satisfies the conditions of item i of Theorem 3.6 then the corresponding
condition i of the corollary clearly holds. Assume that the condition of item i of the
corollary holds. Choose x_0 = e_j = (δ_{1j}, ..., δ_{nj})^T for j = 1, ..., n to deduce the corresponding
condition i of Theorem 3.6. □
Theorem 3.8 Let A ∈ C^{n×n} and consider the nonhomogeneous iterations

    x_l = Ax_{l−1} + b_l,  l = 1, ...    (3.6)

Then

1. lim_{l→∞} x_l = 0 for any x_0 ∈ C^n and any sequence b_0, b_1, ... satisfying the condition
lim_{l→∞} b_l = 0, if and only if A is power stable.

2. The sequence x_l, l = 0, 1, ..., converges for any x_0 and any sequence b_0, b_1, ...
satisfying the condition that \sum_{l=0}^∞ b_l converges, if and only if A is power convergent.

3. The sequence x_l, l = 0, 1, ..., is bounded for any x_0 and any sequence b_0, b_1, ...
satisfying the condition that \sum_{l=0}^∞ ||b_l||_∞ converges, if and only if A is power bounded.
(Here ||(x_1, ..., x_n)^T||_∞ = max_{i∈[1,n]} |x_i|.)
Proof. Assume that b_l = 0 for all l. Since x_0 is arbitrary we deduce the necessity of all
the conditions from Theorem 3.6. The sufficiency of the above conditions follows from the
Jordan canonical form of A as follows.

Let J = U^{-1}AU, where U is an invertible matrix and J is the Jordan canonical form of
A. By letting y_l := U^{-1}x_l and c_l := U^{-1}b_l it is enough to prove the sufficiency part of the
theorem for the case where A is a direct sum of Jordan blocks. In this case the system (3.6) reduces to
independent systems of equations, one for each Jordan block. Thus it is left to prove the theorem
when A = J_n(λ).
19
1. We show that if $A = J_n(\lambda)$ and $|\lambda| < 1$, then $\lim_{l\to\infty} x_l = 0$ for any $x_0$ and any $b_l$, $l = 1, \ldots$, with $\lim_{l\to\infty} b_l = 0$. We prove this claim by induction on $n$. For $n = 1$ (3.6) reduces to
$x_l = \lambda x_{l-1} + b_l, \quad x_0, x_l, b_l \in \mathbb{C} \text{ for } l = 1, \ldots$    (3.7)
It is straightforward to show, e.g. by induction, that
$x_l = \sum_{i=0}^{l} \lambda^i b_{l-i} = b_l + \lambda b_{l-1} + \ldots + \lambda^l b_0, \quad l = 1, \ldots, \text{ where } b_0 := x_0.$    (3.8)
Let $\beta_m = \sup_{i\ge m} |b_i|$. Since $\lim_{l\to\infty} b_l = 0$, it follows that each $\beta_m$ is finite, the sequence $\beta_m$, $m = 0, 1, \ldots$, is decreasing and $\lim_{m\to\infty}\beta_m = 0$. Fix $m$. Then for $l > m$
$|x_l| \le \sum_{i=0}^{l} |\lambda|^i |b_{l-i}| = \sum_{i=0}^{l-m} |\lambda|^i |b_{l-i}| + |\lambda|^{l-m}\sum_{j=1}^{m} |\lambda|^j |b_{m-j}|$
$\le \beta_m \sum_{i=0}^{l-m} |\lambda|^i + |\lambda|^{l-m}\sum_{j=1}^{m} |\lambda|^j |b_{m-j}| \le \beta_m \sum_{i=0}^{\infty} |\lambda|^i + |\lambda|^{l-m}\sum_{j=1}^{m} |\lambda|^j |b_{m-j}|$
$= \frac{\beta_m}{1-|\lambda|} + |\lambda|^{l-m}\sum_{j=1}^{m} |\lambda|^j |b_{m-j}| \longrightarrow \frac{\beta_m}{1-|\lambda|} \quad \text{as } l\to\infty.$
That is $\limsup_{l\to\infty} |x_l| \le \frac{\beta_m}{1-|\lambda|}$. As $\lim_{m\to\infty}\beta_m = 0$ it follows that $\limsup_{l\to\infty}|x_l| = 0$, which is equivalent to the statement $\lim_{l\to\infty} x_l = 0$. This proves the case $n = 1$.
Assume that the theorem holds for $n = k$. Let $n = k+1$. View $x_l := (x_{1,l}, y_l^\top)^\top$, $b_l = (b_{1,l}, c_l^\top)^\top$, where $y_l = (x_{2,l}, \ldots, x_{k+1,l})^\top$, $c_l \in \mathbb{C}^k$ are the vectors composed of the last $k$ coordinates of $x_l$ and $b_l$ respectively. Then (3.6) for $A = J_{k+1}(\lambda)$ for the last $k$ coordinates of $x_l$ is given by the system $y_l = J_k(\lambda)y_{l-1} + c_l$ for $l = 1, 2, \ldots$. Since $\lim_{l\to\infty} c_l = 0$ the induction hypothesis yields that $\lim_{l\to\infty} y_l = 0$. The system (3.6) for $A = J_{k+1}(\lambda)$ for the first coordinate is $x_{1,l} = \lambda x_{1,l-1} + (x_{2,l-1} + b_{1,l})$ for $l = 1, 2, \ldots$. From the induction hypothesis and the assumption that $\lim_{l\to\infty} b_l = 0$ we deduce that $\lim_{l\to\infty}(x_{2,l-1} + b_{1,l}) = 0$. Hence from the case $n = 1$ we deduce that $\lim_{l\to\infty} x_{1,l} = 0$. Hence $\lim_{l\to\infty} x_l = 0$. The proof of this case is concluded.
2. Assume that each eigenvalue $\lambda$ of $A$ satisfies one of the following conditions: either $|\lambda| < 1$, or $\lambda = 1$ and each Jordan block corresponding to $1$ is of order 1. As we pointed out we may assume that $A$ is a direct sum of its Jordan blocks. So first we consider $A = J_k(\lambda)$ with $|\lambda| < 1$. Since we assumed that $\sum_{l=1}^{\infty} b_l$ converges we deduce that $\lim_{l\to\infty} b_l = 0$. Thus, by part 1 we get that $\lim_{l\to\infty} x_l = 0$.
Assume now that $A = (1) \in \mathbb{C}^{1\times 1}$. Thus we consider (3.7) with $\lambda = 1$. (3.8) yields that $x_l = \sum_{i=0}^{l} b_i$. By the assumption of the theorem $\sum_{i=1}^{\infty} b_i$ converges, hence the sequence $x_l$, $l = 1, \ldots$, converges.
3. As in part 2 it is enough to consider the case $J_1(\lambda)$ with $|\lambda| = 1$. Note that (3.8) yields that $|x_l| \le \sum_{i=0}^{l} |b_i|$. The assumption that $\sum_{i=1}^{\infty} |b_i|$ converges implies that $|x_l| \le \sum_{i=0}^{\infty} |b_i| < \infty$. □
Remark 3.9 The stability, convergence and boundedness of the nonhomogeneous systems
$x_l = A_l x_{l-1}, \quad A_l \in \mathbb{C}^{n\times n}, \; l = 1, \ldots,$
$x_l = A_l x_{l-1} + b_l, \quad A_l \in \mathbb{C}^{n\times n}, \; b_l \in \mathbb{C}^n, \; l = 1, \ldots,$
are much harder to analyze. (If time permits we revisit these problems later on in the course.)
Problems
1. Consider the nonhomogeneous system $x_l = A_l x_{l-1}$, $A_l \in \mathbb{C}^{n\times n}$, $l = 1, \ldots$. Assume that the sequence $A_l$, $l = 1, \ldots$, is periodic, i.e. $A_{l+p} = A_l$ for all $l = 1, \ldots$ and a fixed positive integer $p$.
(a) Show that for each $x_0 \in \mathbb{C}^n$, $\lim_{l\to\infty} x_l = 0$ if and only if $B := A_p A_{p-1}\cdots A_1$ is power stable.
(b) Show that for each $x_0 \in \mathbb{C}^n$ the sequence $x_l$, $l = 1, \ldots$, converges if and only if the following conditions are satisfied. First, $B$ is power convergent, i.e. $\lim_{l\to\infty} B^l = C$. Second, $A_i C = C$ for $i = 1, \ldots, p$.
(c) Find necessary and sufficient conditions such that for each $x_0 \in \mathbb{C}^n$ the sequence $x_l$, $l = 1, \ldots$, is bounded.
3.3 $e^{At}$ and stability of certain systems of ODE
Recall that the exponential function $e^z$ has the MacLaurin expansion
$e^z = 1 + z + \frac{z^2}{2} + \frac{z^3}{6} + \ldots = \sum_{l=0}^{\infty}\frac{z^l}{l!}.$
Hence for each $A \in \mathbb{C}^{n\times n}$ one defines
$e^A := I_n + A + \frac{A^2}{2} + \frac{A^3}{6} + \ldots = \sum_{l=0}^{\infty}\frac{A^l}{l!}.$
More generally, if $t \in \mathbb{C}$ then
$e^{At} := I_n + At + \frac{A^2t^2}{2} + \frac{A^3t^3}{6} + \ldots = \sum_{l=0}^{\infty}\frac{A^l t^l}{l!}.$
Hence $e^{At}$ satisfies the matrix differential equation
$\frac{d\,e^{At}}{dt} = Ae^{At} = e^{At}A.$    (3.9)
Also one has the standard identity $e^{At}e^{Au} = e^{A(t+u)}$ for any complex numbers $t, u$.
Proposition 3.10 Let $A \in \mathbb{C}^{n\times n}$ and consider the system of $n$ linear ordinary differential equations with constant coefficients $\frac{dx(t)}{dt} = Ax(t)$, where $x(t) \in \mathbb{C}^n$, satisfying the initial condition $x(t_0) = x_0$. Then $x(t) = e^{A(t-t_0)}x_0$ is the unique solution to the above system. More generally, let $b(t) \in \mathbb{C}^n$ be any continuous vector function on $\mathbb{R}$ and consider the nonhomogeneous system of $n$ ordinary differential equations with the initial condition:
$\frac{dx(t)}{dt} = Ax(t) + b(t), \quad x(t_0) = x_0.$    (3.10)
Then this system has a unique solution, of the form
$x(t) = e^{A(t-t_0)}x_0 + \int_{t_0}^{t} e^{A(t-u)}b(u)\,du.$    (3.11)
Proof. The uniqueness of the solution of (3.10) follows from the uniqueness of solutions to systems of ODE (Ordinary Differential Equations). The first part of the proposition follows from (3.9). To deduce the second part one uses variation of parameters. Namely one tries a solution $x(t) = e^{A(t-t_0)}y(t)$ where $y(t) \in \mathbb{C}^n$ is an unknown vector function. Hence
$x'(t) = (e^{A(t-t_0)})'y(t) + e^{A(t-t_0)}y'(t) = Ae^{A(t-t_0)}y(t) + e^{A(t-t_0)}y'(t) = Ax(t) + e^{A(t-t_0)}y'(t).$
Substitute this expression for $x'(t)$ into (3.10) to deduce the differential equation $y'(t) = e^{-A(t-t_0)}b(t)$. Since $y(t_0) = x_0$ this simple equation has the unique solution $y(t) = x_0 + \int_{t_0}^{t} e^{-A(u-t_0)}b(u)\,du$. Now multiply by $e^{A(t-t_0)}$ and use the fact that $e^{At}e^{Au} = e^{A(t+u)}$ to deduce (3.11). □
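A minimal numerical sketch of formula (3.11), assuming numpy/scipy and an arbitrary choice of $A$, $x_0$ and $b(t)$; the integral is approximated by the trapezoid rule on a fine grid:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 2.0], [0.0, -3.0]])
x0 = np.array([1.0, -1.0])
b = lambda u: np.array([np.cos(u), 1.0 / (1.0 + u**2)])   # illustrative forcing term
t0, t = 0.0, 2.0

us = np.linspace(t0, t, 2001)
integrand = np.array([expm(A * (t - u)) @ b(u) for u in us])   # e^{A(t-u)} b(u)
w = np.full(len(us), us[1] - us[0]); w[0] = w[-1] = w[0] / 2   # trapezoid weights
x_t = expm(A * (t - t0)) @ x0 + (w[:, None] * integrand).sum(axis=0)
print(x_t)   # approximate value of the solution x(t) of (3.10) at t = 2
```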
Note: The second term in the formula (3.11) can be considered as a perturbation term to the solution of $\frac{dx(t)}{dt} = Ax(t)$, $x(t_0) = x_0$, i.e. to the system (3.10) with $b(t) \equiv 0$.
Use (3.3) for $e^{zt}$ and the observation that $\frac{d^j e^{zt}}{dz^j} = t^j e^{zt}$, $j = 0, 1, \ldots$, to deduce:
$e^{At} = \sum_{i=1}^{k}\sum_{j=0}^{m_i-1}\frac{t^j e^{\lambda_i t}}{j!}Z_{ij}.$    (3.12)
We can substitute this expression for $e^{At}$ in (3.11) to get a simple expression for the solution $x(t)$ of (3.10).
Definition 3.11 Let $A \in \mathbb{C}^{n\times n}$. $A$ is called exponentially stable, or simply stable, if $\lim_{t\to\infty} e^{At} = 0$. $A$ is called exponentially convergent if $\lim_{t\to\infty} e^{At} = B$ for some $B \in \mathbb{C}^{n\times n}$. $A$ is called exponentially bounded if there exists $K > 0$ such that the absolute value of every entry of $e^{At}$, $t \in [0, \infty)$, is bounded above by $K$.
Theorem 3.12 Let $A \in \mathbb{C}^{n\times n}$. Then
1. $A$ is stable if and only if each eigenvalue of $A$ is in the open left half of the complex plane: $\Re z < 0$.
2. $A$ is exponentially convergent if and only if each eigenvalue $\lambda$ of $A$ satisfies one of the following conditions
(a) $\Re\lambda < 0$;
(b) $\lambda = 2\pi l\sqrt{-1}$ for some integer $l$, and each Jordan block of the JCF of $A$ with the eigenvalue $\lambda$ is of order 1, i.e. $\lambda$ is a simple zero of the minimal polynomial of $A$.
3. $A$ is exponentially bounded if and only if each eigenvalue $\lambda$ of $A$ satisfies one of the following conditions
(a) $\Re\lambda < 0$;
(b) $\Re\lambda = 0$ and each Jordan block of the JCF of $A$ with the eigenvalue $\lambda$ is of order 1, i.e. $\lambda$ is a simple zero of the minimal polynomial of $A$.
Proof. Consider the formula (3.12). Since the A-components $Z_{ij}$, $i = 1, \ldots, k$, $j = 0, \ldots, m_i-1$, are linearly independent we need to satisfy the conditions of the theorem for each term in (3.12), which is $\frac{t^j}{j!}e^{\lambda_i t}Z_{ij}$. Note that for a fixed $j$, $\lim_{t\to\infty}\frac{t^j}{j!}e^{\lambda_i t} = 0$ if and only if $\Re\lambda_i < 0$. Hence we deduce condition 1 of the theorem.
Note that the function $\frac{t^j}{j!}e^{\lambda_i t}$ converges as $t\to\infty$ if and only if either $\Re\lambda_i < 0$ or $e^{\lambda_i} = 1$ and $j = 0$. Hence we deduce condition 2 of the theorem.
Note that the function $\frac{t^j}{j!}e^{\lambda_i t}$ is bounded for $t \ge 0$ if and only if either $\Re\lambda_i < 0$ or $|e^{\lambda_i}| = 1$ and $j = 0$. Hence we deduce condition 3 of the theorem. □
Corollary 3.13 Let $A \in \mathbb{C}^{n\times n}$ and consider the system of differential equations $\frac{dx(t)}{dt} = Ax(t)$, $x(t_0) = x_0$. Then for any $x_0$
1. $\lim_{t\to\infty} x(t) = 0$ if and only if $A$ is stable.
2. $x(t)$ converges as $t\to\infty$ if and only if $A$ is exponentially convergent.
3. $x(t)$, $t \in [0, \infty)$, is bounded if and only if $A$ is exponentially bounded.
Theorem 3.14 Let $A \in \mathbb{C}^{n\times n}$ and consider the system of differential equations (3.10). Then for any $x_0 \in \mathbb{C}^n$
1. $\lim_{t\to\infty} x(t) = 0$ for any continuous function $b(t)$ such that $\lim_{t\to\infty} b(t) = 0$, if and only if $A$ is stable.
2. $x(t)$ converges as $t\to\infty$ for any continuous function $b(t)$ such that $\int_{t_0}^{\infty} b(u)\,du$ converges, if and only if $A$ is exponentially convergent.
3. $x(t)$, $t \in [0, \infty)$, is bounded for any continuous function $b(t)$ such that $\int_{t_0}^{\infty} |b(u)|\,du$ converges, if and only if $A$ is exponentially bounded.
Proof. The necessity of the conditions of the theorem follows from Corollary 3.13 by choosing $b(t) \equiv 0$.
1. Suppose that $A$ is stable. Then Corollary 3.13 yields that $\lim_{t\to\infty} e^{At}x_0 = 0$. Thus, to show that $\lim_{t\to\infty} x(t) = 0$, it is enough to show that the second term in (3.11) tends to 0. Use (3.12) to show that it is enough to demonstrate that
$\lim_{t\to\infty}\int_{t_0}^{t}(t-u)^j e^{\lambda(t-u)}g(u)\,du = 0, \quad \text{where } \Re\lambda < 0,$
for any continuous $g(t)$ on $[t_0, \infty)$ such that $\lim g(t) = 0$. For $\epsilon > 0$ there exists $T = T(\epsilon)$ such that $|g(t)| \le \epsilon$ for $t \ge T(\epsilon)$. Let $t > T(\epsilon)$. Then
$\left|\int_{t_0}^{t}(t-u)^j e^{\lambda(t-u)}g(u)\,du\right| = \left|\int_{t_0}^{T(\epsilon)}(t-u)^j e^{\lambda(t-u)}g(u)\,du + \int_{T(\epsilon)}^{t}(t-u)^j e^{\lambda(t-u)}g(u)\,du\right|$
$\le \left|\int_{t_0}^{T(\epsilon)}(t-u)^j e^{\lambda(t-u)}g(u)\,du\right| + \left|\int_{T(\epsilon)}^{t}(t-u)^j e^{\lambda(t-u)}g(u)\,du\right|$
$\le \left|\int_{t_0}^{T(\epsilon)}(t-u)^j e^{\lambda(t-u)}g(u)\,du\right| + \epsilon\int_{T(\epsilon)}^{t}(t-u)^j e^{\Re\lambda(t-u)}\,du.$
Consider the first term in the last inequality. Since $\lim_{t\to\infty} t^j e^{\Re\lambda\, t} = 0$ it follows that the first term converges to zero. The second term is bounded by $K\epsilon$ for $K := \int_{0}^{\infty} t^j e^{\Re\lambda\, t}\,dt$. Hence as $\epsilon\to 0$ we deduce that $\lim_{t\to\infty}\int_{t_0}^{t}(t-u)^j e^{\lambda(t-u)}g(u)\,du = 0$.
2. Using part 1 we deduce the result for any eigenvalue $\lambda$ with $\Re\lambda < 0$. It is left to discuss the case $\lambda = 0$. We assume that the Jordan blocks of $A$ corresponding to $\lambda = 0$ are of length one. So the A-component corresponding to $\lambda = 0$ is $Z_{10}$. The corresponding term is
. . .
4 Inner product spaces
4.1 Inner product
Definition 4.1 Let $F = \mathbb{R}, \mathbb{C}$ and let $V$ be a vector space over $F$. Then $\langle\cdot,\cdot\rangle : V\times V \to F$ is called an inner product if the following conditions hold:
(a) $\langle ax + by, z\rangle = a\langle x, z\rangle + b\langle y, z\rangle$, for all $a, b \in F$, $x, y, z \in V$;
(br) for $F = \mathbb{R}$: $\langle y, x\rangle = \langle x, y\rangle$, for all $x, y \in V$;
(bc) for $F = \mathbb{C}$: $\langle y, x\rangle = \overline{\langle x, y\rangle}$, for all $x, y \in V$;
(c) $\langle x, x\rangle > 0$ for all $x \in V\setminus\{0\}$.
$\|x\| := \sqrt{\langle x, x\rangle}$ is called the norm (length) of $x \in V$.
Other standard properties of inner products are mentioned in Problems 4.2-4.3. We will use the abbreviation IPS for inner product space. In this chapter we assume that $F = \mathbb{R}, \mathbb{C}$ unless stated otherwise.
Proposition 4.2 Let $V$ be a vector space over $\mathbb{R}$. Identify $V_c$ with the set of pairs $(x, y)$, $x, y \in V$. Then $V_c$ is a vector space over $\mathbb{C}$ with
$(a + \sqrt{-1}\,b)(x, y) := a(x, y) + b(-y, x), \quad \text{for all } a, b \in \mathbb{R}, \; x, y \in V.$
If $V$ has a basis $e_1, \ldots, e_n$ over $\mathbb{R}$ then $(e_1, 0), \ldots, (e_n, 0)$ is a basis of $V_c$ over $\mathbb{C}$. Any inner product $\langle\cdot,\cdot\rangle$ on $V$ over $\mathbb{R}$ induces the following inner product on $V_c$:
$\langle(x, y), (u, v)\rangle = \langle x, u\rangle + \langle y, v\rangle + \sqrt{-1}\,(\langle y, u\rangle - \langle x, v\rangle), \quad x, y, u, v \in V.$
We leave the proof of this proposition to the reader (Problem 4.4).
Definition 4.3 Let $V$ be an IPS. Then
(a) $x, y \in V$ are called orthogonal if $\langle x, y\rangle = 0$.
(b) $S, T \subset V$ are called orthogonal if $\langle x, y\rangle = 0$ for any $x \in S$, $y \in T$.
(d) For any $S \subset V$, $S^\perp \subset V$ is the maximal orthogonal set to $S$.
(e) $x_1, \ldots, x_m$ is called an orthonormal set if $\langle x_i, x_j\rangle = \delta_{ij}$, $i, j = 1, \ldots, m$.
(f) $x_1, \ldots, x_n$ is called an orthonormal basis if it is an orthonormal set which is a basis in $V$.
Definition 4.4 (Gram-Schmidt algorithm.) Let $V$ be an IPS and $S = \{x_1, \ldots, x_m\} \subset V$ a finite (possibly empty) set ($m \ge 0$). Then $\widehat S = \{e_1, \ldots, e_p\}$ is the orthonormal set ($p \ge 1$) or the empty set ($p = 0$) obtained from $S$ using the following recursive steps:
(a) If $x_1 = 0$ remove it from $S$. Otherwise replace $x_1$ by $\|x_1\|^{-1}x_1$.
(b) Assume that $x_1, \ldots, x_k$ is an orthonormal set and $1 \le k < m$. Let $y_{k+1} = x_{k+1} - \sum_{i=1}^{k}\langle x_{k+1}, x_i\rangle x_i$. If $y_{k+1} = 0$ remove $x_{k+1}$ from $S$. Otherwise replace $x_{k+1}$ by $\|y_{k+1}\|^{-1}y_{k+1}$.
Corollary 4.5 Let $V$ be an IPS and $S = \{x_1, \ldots, x_n\} \subset V$ be $n$ linearly independent vectors. Then the Gram-Schmidt algorithm on $S$ is given as follows:
$y_1 := x_1, \quad r_{11} := \|y_1\|, \quad e_1 := \frac{y_1}{r_{11}},$
$r_{ji} := \langle x_i, e_j\rangle, \quad j = 1, \ldots, i-1,$    (4.1)
$y_i := x_i - \sum_{j=1}^{i-1} r_{ji}e_j, \quad r_{ii} := \|y_i\|, \quad e_i := \frac{y_i}{r_{ii}}, \quad i = 2, \ldots, n.$
In particular, $e_i \in S_i$ and $\|y_i\| = \operatorname{dist}(x_i, S_{i-1})$, where $S_i = \operatorname{span}(x_1, \ldots, x_i)$ for $i = 1, \ldots, n$ and $S_0 = \{0\}$. (See Problem 4.5 for the definition of $\operatorname{dist}(x_i, S_{i-1})$.)
Corollary 4.6 Any (ordered) basis in a finite dimensional IPS $V$ induces an orthonormal basis by the Gram-Schmidt algorithm.
See Problem 4.5 for some known properties related to the above notions.
Remark 4.7 It is known, e.g. [8], that the Gram-Schmidt process as described in (4.1) is numerically unstable. That is, there is a severe loss of orthogonality of $y_1, \ldots$ as we proceed to compute $y_i$. In computations one uses either a modified GSP or Householder orthogonalization [8].
Definition 4.8 (Modified Gram-Schmidt algorithm.) Let $V$ be an IPS and $S = \{x_1, \ldots, x_m\} \subset V$ a finite (possibly empty) set ($m \ge 0$). Then $\widehat S = \{e_1, \ldots, e_p\}$ is the orthonormal set ($p \ge 1$) or the empty set ($p = 0$) obtained from $S$ using the following recursive steps:
Initialize $j = 1$ and $p = m$.
If $x_j \ne 0$ let $e_j := \frac{1}{\|x_j\|}x_j$. If $x_j = 0$ replace $p$ by $p-1$ and $x_i$ by $x_{i+1}$ for $i = j, \ldots, p$.
Let $p_i := \langle x_i, e_j\rangle e_j$ and replace $x_i$ by $x_i := x_i - p_i$ for $i = j+1, \ldots, p$.
Let $j = j+1$ and repeat the process.
The MGS algorithm is stable and needs $mn^2$ flops, which is more time consuming than the GS algorithm.
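A minimal floating-point sketch of the modified Gram-Schmidt steps of Definition 4.8 (numpy is assumed; the tolerance used to drop a numerically zero vector is an illustrative choice):

```python
import numpy as np

def modified_gram_schmidt(X, tol=1e-12):
    """Orthonormalize the columns of X in the spirit of Definition 4.8:
    normalize the current column, then immediately remove its component
    from all later columns; (numerically) zero columns are dropped."""
    cols = [c.astype(complex) for c in X.T]
    Q = []
    while cols:
        x = cols.pop(0)
        nrm = np.linalg.norm(x)
        if nrm < tol:
            continue                                  # drop a zero vector
        e = x / nrm
        Q.append(e)
        cols = [c - np.vdot(e, c) * e for c in cols]  # subtract projection p_i
    return np.column_stack(Q) if Q else np.empty((X.shape[0], 0))

X = np.random.rand(5, 3)
Q = modified_gram_schmidt(X)
print(np.allclose(Q.conj().T @ Q, np.eye(Q.shape[1])))   # True: columns are orthonormal
```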
Problems
(4.2) Let $V$ be an IPS over $F$. Show
$\langle 0, x\rangle = \langle x, 0\rangle = 0,$
for $F = \mathbb{R}$: $\langle z, ax + by\rangle = a\langle z, x\rangle + b\langle z, y\rangle$, for all $a, b \in \mathbb{R}$, $x, y, z \in V$,
for $F = \mathbb{C}$: $\langle z, ax + by\rangle = \bar a\langle z, x\rangle + \bar b\langle z, y\rangle$, for all $a, b \in \mathbb{C}$, $x, y, z \in V$.
(4.3) Let $V$ be an IPS. Show
(a) $\|ax\| = |a|\,\|x\|$ for $a \in F$ and $x \in V$.
(b) The Cauchy-Schwarz inequality: $|\langle x, y\rangle| \le \|x\|\,\|y\|$, and equality holds if and only if $x, y$ are linearly dependent (collinear).
(c) The triangle inequality: $\|x + y\| \le \|x\| + \|y\|$, and equality holds if and only if either $x = 0$ or $y = ax$ for some $a \in \mathbb{R}_+$.
(4.4) Prove Proposition 4.2.
(4.5) Let $V$ be a finite dimensional IPS of dimension $n$. Assume that $S \subset V$. Show
(a) If $x_1, \ldots, x_m$ is an orthonormal set then $x_1, \ldots, x_m$ are linearly independent.
(b) Assume that $e_1, \ldots, e_n$ is an orthonormal basis in $V$. Show that for any $x \in V$ the orthonormal expansion holds
$x = \sum_{i=1}^{n}\langle x, e_i\rangle e_i.$    (4.6)
Furthermore for any $x, y \in V$
$\langle x, y\rangle = \sum_{i=1}^{n}\langle x, e_i\rangle\overline{\langle y, e_i\rangle}.$    (4.7)
(c) Assume that $S$ is a finite set. Let $\widehat S$ be the set obtained by the Gram-Schmidt process. Show that $\widehat S = \emptyset \iff \operatorname{span} S = \{0\}$. Show that if $\widehat S \ne \emptyset$ then $e_1, \ldots, e_p$ is an orthonormal basis in $\operatorname{span} S$.
(d) There exists an orthonormal basis $e_1, \ldots, e_n$ in $V$ and $0 \le m \le n$ such that
$e_1, \ldots, e_m \in \operatorname{span} S, \quad \operatorname{span} S = \operatorname{span}(e_1, \ldots, e_m),$
$S^\perp = \operatorname{span}(e_{m+1}, \ldots, e_n), \quad (S^\perp)^\perp = \operatorname{span} S.$
(e) Assume from here to the end of the problem that $S$ is a subspace. Show $V = S \oplus S^\perp$.
(f) Let $x \in V$ and let $x = u + v$ for unique $u \in S$, $v \in S^\perp$. Let $P(x) := u$ be the projection of $x$ on $S$. Show that $P : V \to V$ is a linear transformation satisfying
$P^2 = P, \quad \operatorname{Range} P = S, \quad \operatorname{Ker} P = S^\perp.$
(g) Show
$\operatorname{dist}(x, S) := \|x - Px\| \le \|x - w\| \text{ for any } w \in S, \text{ with equality if and only if } w = Px.$    (4.8)
(h) Show that $\operatorname{dist}(x, S) = \|x - w\|$ for some $w \in S$ if and only if $x - w$ is orthogonal to $S$.
(i) Let $e_1, \ldots, e_m$ be an orthonormal basis of $S$. Show that for each $x \in V$, $Px = \sum_{i=1}^{m}\langle x, e_i\rangle e_i$. (Note: $Px$ is called the least squares approximation to $x$ in the subspace $S$.)
(4.9) Let $X \in \mathbb{C}^{m\times n}$ and assume that $m \ge n$ and $\operatorname{rank} X = n$. Let $x_1, \ldots, x_n \in \mathbb{C}^m$ be the columns of $X$, i.e. $X = (x_1, \ldots, x_n)$. Assume that $\mathbb{C}^m$ is an IPS with the standard inner product $\langle x, y\rangle = y^* x$. Perform the Gram-Schmidt algorithm (4.5) to obtain the matrix $Q = (e_1, \ldots, e_n) \in \mathbb{C}^{m\times n}$. Let $R = (r_{ji})_1^n \in \mathbb{C}^{n\times n}$ be the upper triangular matrix with $r_{ji}$, $j \le i$, given by (4.1). Show that $Q^*Q = I_n$ and $X = QR$. (This is the QR decomposition.) Show that if in addition $X \in \mathbb{R}^{m\times n}$ then $Q$ and $R$ are real valued matrices.
(4.10) Let $C \in \mathbb{C}^{n\times n}$ and assume that $\lambda_1, \ldots, \lambda_n$ are the $n$ eigenvalues of $C$ counted with their multiplicities. View $C$ as an operator $C : \mathbb{C}^n \to \mathbb{C}^n$. View $\mathbb{C}^n$ as a $2n$-dimensional vector space over $\mathbb{R}$. Let $C = A + \sqrt{-1}\,B$, $A, B \in M_n(\mathbb{R})$.
a. Show that $\widehat C := \begin{pmatrix} A & -B \\ B & A \end{pmatrix} \in M_{2n}(\mathbb{R})$ represents the operator $C : \mathbb{C}^n \to \mathbb{C}^n$ as an operator over $\mathbb{R}$ in a suitably chosen basis.
b. Show that $\lambda_1, \bar\lambda_1, \ldots, \lambda_n, \bar\lambda_n$ are the $2n$ eigenvalues of $\widehat C$ counted with multiplicities.
c. Show that the Jordan canonical form of $\widehat C$ is obtained by replacing each Jordan block $\lambda I + H$ in the JCF of $C$ by the two Jordan blocks $\lambda I + H$ and $\bar\lambda I + H$.
4.2 Geometric interpretation of the determinant
Definition 4.9 Let $x_1, \ldots, x_m \in \mathbb{R}^n$ be $m$ given vectors. Then the parallelepiped $P(x_1, \ldots, x_m)$ is defined as follows. The $2^m$ vertices of $P(x_1, \ldots, x_m)$ are of the form $v := \sum_{i=1}^{m} a_i x_i$, where $a_i \in \{0, 1\}$ for $i = 1, \ldots, m$. Two vertices $v = \sum_{i=1}^{m} a_i x_i$ and $w = \sum_{i=1}^{m} b_i x_i$ of $P(x_1, \ldots, x_m)$ are adjacent, i.e. connected by an edge in $P(x_1, \ldots, x_m)$, if $\|(a_1, \ldots, a_m)^\top - (b_1, \ldots, b_m)^\top\| = 1$, i.e. the $0$-$1$ coordinates of $(a_1, \ldots, a_m)^\top$ and $(b_1, \ldots, b_m)^\top$ differ only at one coordinate $k$, for some $k \in [1, m]$.
Note that if $e_1, \ldots, e_n$ is the standard basis in $\mathbb{R}^n$, i.e. $e_i = (\delta_{1i}, \ldots, \delta_{ni})^\top$, $i = 1, \ldots, n$, then $P(e_1, \ldots, e_m)$ is the $m$-dimensional unit cube, whose edges are parallel to $e_1, \ldots, e_m$ and whose center (of gravity) is $\frac{1}{2}(1, \ldots, 1, 0, \ldots, 0)^\top$, where $1$ appears $m$ times, for $1 \le m \le n$.
For $m > n$, $P(x_1, \ldots, x_m)$ is a flattened parallelepiped, since $x_1, \ldots, x_m$ are always linearly dependent in $\mathbb{R}^n$ for $m > n$.
Proposition 4.10 Let $A \in \mathbb{R}^{n\times n}$ and view $A = [c_1\, c_2\, \ldots\, c_n]$ as an ordered set of $n$ vectors (columns) $c_1, \ldots, c_n$. Then $|\det A|$ is the $n$-dimensional volume of the parallelepiped $P(c_1, \ldots, c_n)$. If $c_1, \ldots, c_n$ are linearly independent then the orientation in $\mathbb{R}^n$ induced by $c_1, \ldots, c_n$ is the same as the orientation induced by $e_1, \ldots, e_n$ if $\det A > 0$, and is the opposite orientation if $\det A < 0$.
Proof. $\det A = 0$ if and only if the columns of $A$ are linearly dependent. If $c_1, \ldots, c_n$ are linearly dependent, then $P(c_1, \ldots, c_n)$ lies in a proper subspace of $\mathbb{R}^n$, i.e. some $n-1$ dimensional subspace, and hence the $n$-dimensional volume of $P(c_1, \ldots, c_n)$ is zero.
Assume now that $\det A \ne 0$, i.e. $c_1, \ldots, c_n$ are linearly independent. Perform the Gram-Schmidt process 4.4. Then $A = QR$, where $Q = [e_1\, e_2\, \ldots\, e_n]$ is an orthogonal matrix and $R = (r_{ji}) \in \mathbb{R}^{n\times n}$ is an upper triangular matrix. (See Problem 4.9.) So $\det A = \det Q\det R$. Since $Q^\top Q = I_n$ we deduce that $1 = \det I_n = \det Q^\top\det Q = \det Q\det Q = (\det Q)^2$. So $\det Q = \pm 1$ and the sign of $\det Q$ is the sign of $\det A$.
Hence $|\det A| = \det R = r_{11}r_{22}\ldots r_{nn}$. Recall that $r_{11}$ is the length of the vector $c_1$, and $r_{ii}$ is the distance of the vector $c_i$ to the subspace spanned by $c_1, \ldots, c_{i-1}$ for $i = 2, \ldots, n$. (See Problem 4.5 parts (f)-(i).) Thus the length of $P(c_1)$ is $r_{11}$. The distance of $c_2$ to $P(c_1)$ is $r_{22}$. Hence the area, i.e. the 2-dimensional volume, of $P(c_1, c_2)$ is $r_{11}r_{22}$. Continuing in this manner we deduce that the $(i-1)$-dimensional volume of $P(c_1, \ldots, c_{i-1})$ is $r_{11}\ldots r_{(i-1)(i-1)}$. As the distance of $c_i$ to $P(c_1, \ldots, c_{i-1})$ is $r_{ii}$ it follows that the $i$-dimensional volume of $P(c_1, \ldots, c_i)$ is $r_{11}\ldots r_{ii}$. For $i = n$ we get that $|\det A| = r_{11}\ldots r_{nn}$, which is equal to the $n$-dimensional volume of $P(c_1, \ldots, c_n)$.
As we already pointed out, the sign of $\det A$ is equal to the sign of $\det Q = \pm 1$. If $\det Q = 1$ it is possible to rotate the standard basis in $\mathbb{R}^n$ to the basis given by the columns of an orthogonal matrix $Q$ with $\det Q = 1$. If $\det Q = -1$, we need one reflection, i.e. replace the standard basis $e_1, \ldots, e_n$ by the new basis $-e_1, e_2, \ldots, e_n$, and then rotate the new basis $-e_1, e_2, \ldots, e_n$ to the basis consisting of the columns of an orthogonal matrix $Q$, where $\det Q = -1$. □
Theorem 4.11 (The Hadamard determinantal inequality) Let $A = [c_1, \ldots, c_n] \in \mathbb{C}^{n\times n}$. Then $|\det A| \le \|c_1\|\,\|c_2\|\ldots\|c_n\|$. Equality holds if and only if either $c_i = 0$ for some $i$ or $\langle c_i, c_j\rangle = 0$ for all $i \ne j$, i.e. $c_1, \ldots, c_n$ is an orthogonal system.
Proof. Assume first that $\det A = 0$. Clearly the Hadamard inequality holds. Equality in the Hadamard inequality holds if and only if $c_i = 0$ for some $i$.
Assume now that $\det A \ne 0$ and perform the Gram-Schmidt process. From (4.1) it follows that $A = QR$ where $Q$ is a unitary matrix, i.e. $Q^*Q = I_n$, and $R = (r_{ji}) \in \mathbb{C}^{n\times n}$ is upper triangular with the $r_{ii}$ real and positive. So $\det A = \det Q\det R$. Thus
$1 = \det I_n = \det Q^*Q = \det Q^*\det Q = \overline{\det Q}\,\det Q = |\det Q|^2 \implies |\det Q| = 1.$
Hence $|\det A| = \det R = r_{11}r_{22}\ldots r_{nn}$. According to Problem 4.5 and the proof of Proposition 4.10 we know that $\|c_i\| \ge \operatorname{dist}(c_i, \operatorname{span}(c_1, \ldots, c_{i-1})) = r_{ii}$ for $i = 2, \ldots, n$. Hence $|\det A| = \det R \le \|c_1\|\,\|c_2\|\ldots\|c_n\|$. Equality holds if and only if $\|c_i\| = \operatorname{dist}(c_i, \operatorname{span}(c_1, \ldots, c_{i-1}))$ for $i = 2, \ldots, n$. Use Problem 4.5 to deduce that $\|c_i\| = \operatorname{dist}(c_i, \operatorname{span}(c_1, \ldots, c_{i-1}))$ if and only if $\langle c_i, c_j\rangle = 0$ for $j = 1, \ldots, i-1$. Use these conditions for $i = 2, \ldots$ to deduce that equality in the Hadamard inequality holds if and only if $c_1, \ldots, c_n$ is an orthogonal system. □
Problems
1. Let $A = (a_{ij})_{i,j} \in \mathbb{C}^{n\times n}$. Assume that $|a_{ij}| \le K$ for all $i, j = 1, \ldots, n$. Show that $|\det A| \le K^n n^{\frac{n}{2}}$.
2. Let $A = (a_{ij})_{i,j=1}^n \in \mathbb{C}^{n\times n}$ such that $|a_{ij}| \le 1$ for $i, j = 1, \ldots, n$. Show that $|\det A| = n^{\frac{n}{2}}$ if and only if $A^*A = AA^* = nI_n$. In particular, if $|\det A| = n^{\frac{n}{2}}$ then $|a_{ij}| = 1$ for $i, j = 1, \ldots, n$.
3. Show that for each $n$ there exists a matrix $A = (a_{ij})_{i,j=1}^n \in \mathbb{C}^{n\times n}$ such that $|a_{ij}| = 1$ for $i, j = 1, \ldots, n$ and $|\det A| = n^{\frac{n}{2}}$.
4. Let $A = (a_{ij}) \in \mathbb{R}^{n\times n}$ and assume that $a_{ij} = \pm 1$, $i, j = 1, \ldots, n$. Show that if $n > 2$ then the assumption that $|\det A| = n^{\frac{n}{2}}$ yields that $n$ is divisible by 4.
5. Show that for any $n = 2^m$, $m = 0, 1, \ldots$, there exists $A = (a_{ij}) \in \mathbb{R}^{n\times n}$ such that $a_{ij} = \pm 1$, $i, j = 1, \ldots, n$, and $|\det A| = n^{\frac{n}{2}}$. (Hint: Try to prove by induction on $m$ that $A \in \mathbb{R}^{2^m\times 2^m}$ can be chosen symmetric, and then construct $B \in \mathbb{R}^{2^{m+1}\times 2^{m+1}}$ using $A$.)
Note: A matrix $A = (a_{ij})_{i,j=1}^n \in \mathbb{R}^{n\times n}$ such that $a_{ij} = \pm 1$ for $i, j = 1, \ldots, n$ and $|\det A| = n^{\frac{n}{2}}$ is called a Hadamard matrix. It is conjectured that for each $n$ divisible by 4 there exists a Hadamard matrix.
4.3 Special transformations in IPS
Proposition 4.12 Let $V$ be an IPS and $T : V \to V$ a linear transformation. Then there exists a unique linear transformation $T^* : V \to V$ such that $\langle Tx, y\rangle = \langle x, T^*y\rangle$ for all $x, y \in V$.
See Problems 4.3-4.4.
Definition 4.13 Let $V$ be an IPS and let $T : V \to V$ be a linear transformation. Then
(a) $T$ is called self-adjoint if $T^* = T$;
(b) $T$ is called anti self-adjoint if $T^* = -T$;
(c) $T$ is called unitary if $T^*T = TT^* = I$;
(d) $T$ is called normal if $T^*T = TT^*$.
Denote by $S(V)$, $AS(V)$, $U(V)$, $N(V)$ the sets of self-adjoint, anti self-adjoint, unitary and normal operators on $V$ respectively.
Proposition 4.14 Let $V$ be an IPS over $F = \mathbb{R}, \mathbb{C}$ with an orthonormal basis $E = \{e_1, \ldots, e_n\}$. Let $T : V \to V$ be a linear transformation. Let $A = (a_{ij}) \in F^{n\times n}$ be the representation matrix of $T$ in the basis $E$:
$a_{ij} = \langle Te_j, e_i\rangle, \quad i, j = 1, \ldots, n.$    (4.1)
Then for $F = \mathbb{R}$:
(a) $T^*$ is represented by $A^\top$,
(b) $T$ is self-adjoint $\iff A = A^\top$,
(c) $T$ is anti self-adjoint $\iff A = -A^\top$,
(d) $T$ is unitary $\iff A$ is orthogonal $\iff AA^\top = A^\top A = I$,
(e) $T$ is normal $\iff A$ is normal $\iff AA^\top = A^\top A$,
and for $F = \mathbb{C}$:
(a) $T^*$ is represented by $A^*$ ($:= \bar A^\top$),
(b) $T$ is self-adjoint $\iff A$ is hermitian $\iff A = A^*$,
(c) $T$ is anti self-adjoint $\iff A$ is anti hermitian $\iff A = -A^*$,
(d) $T$ is unitary $\iff A$ is unitary $\iff AA^* = A^*A = I$,
(e) $T$ is normal $\iff A$ is normal $\iff AA^* = A^*A$.
See Problem 4.5.
Proposition 4.15 Let $V$ be an IPS over $\mathbb{R}$, and let $T \in \operatorname{Hom}(V)$. Let $V_c$ be the complexification of $V$. Then there exists a unique $T_c \in \operatorname{Hom}(V_c)$ such that $T_c|_V = T$. Furthermore $T$ is self-adjoint, unitary or normal if and only if $T_c$ is self-adjoint, unitary or normal respectively.
See Problem 4.6.
Definition 4.16 For a domain $\mathbb{D}$ with identity $1$ let
$S(n, \mathbb{D}) := \{A \in \mathbb{D}^{n\times n} : A = A^\top\},$
$AS(n, \mathbb{D}) := \{A \in M_n(\mathbb{D}) : A = -A^\top\},$
$O(n, \mathbb{D}) := \{A \in \mathbb{D}^{n\times n} : AA^\top = A^\top A = I\},$
$SO(n, \mathbb{D}) := \{A \in O(n, \mathbb{D}) : \det A = 1\},$
$DO(n, \mathbb{D}) := D(n, \mathbb{D}) \cap O(n, \mathbb{D}),$
$N(n, \mathbb{R}) := \{A \in \mathbb{R}^{n\times n} : AA^\top = A^\top A\},$
$N(n, \mathbb{C}) := \{A \in \mathbb{C}^{n\times n} : AA^* = A^*A\},$
$H_n := \{A \in M_n(\mathbb{C}) : A = A^*\},$
$AH_n := \{A \in \mathbb{C}^{n\times n} : A = -A^*\},$
$U_n := \{A \in \mathbb{C}^{n\times n} : AA^* = A^*A = I\},$
$SU_n := \{A \in U_n : \det A = 1\},$
$DU_n := D(n, \mathbb{C}) \cap U_n.$
See Problem 4.7 for relations between these classes.
Theorem 4.17 Let $V$ be an IPS over $\mathbb{C}$ of dimension $n$. Then a linear transformation $T : V \to V$ is normal if and only if $V$ has an orthonormal basis consisting of eigenvectors of $T$.
Proof. Suppose first that $V$ has an orthonormal basis $e_1, \ldots, e_n$ such that $Te_i = \lambda_i e_i$, $i = 1, \ldots, n$. From the definition of $T^*$ it follows that $T^*e_i = \bar\lambda_i e_i$, $i = 1, \ldots, n$. Hence $TT^* = T^*T$.
Assume now $T$ is normal. Since $\mathbb{C}$ is algebraically closed $T$ has an eigenvalue $\lambda_1$. Let $V_1$ be the subspace of $V$ spanned by all eigenvectors of $T$ corresponding to the eigenvalue $\lambda_1$. Clearly $TV_1 \subset V_1$. Let $x \in V_1$. Then $Tx = \lambda_1 x$. Thus
$T(T^*x) = (TT^*)x = (T^*T)x = T^*(Tx) = \lambda_1 T^*x \implies T^*V_1 \subset V_1.$
Hence $TV_1^\perp \subset V_1^\perp$, $T^*V_1^\perp \subset V_1^\perp$. Since $V = V_1 \oplus V_1^\perp$ it is enough to prove the theorem for $T|V_1$ and $T|V_1^\perp$.
As $T|V_1 = \lambda_1 I_{V_1}$ it is straightforward to show $T^*|V_1 = \bar\lambda_1 I_{V_1}$ (see Problem 4.4). Hence for $T|V_1$ the theorem trivially holds. For $T|V_1^\perp$ the theorem follows by induction. □
The proof of Theorem 4.17 yields:
Corollary 4.18 Let $V$ be an IPS over $\mathbb{R}$ of dimension $n$. Then a linear transformation $T : V \to V$ with a real spectrum is normal if and only if $V$ has an orthonormal basis consisting of eigenvectors of $T$.
Proposition 4.19 Let $V$ be an IPS over $\mathbb{C}$. Let $T \in N(V)$. Then
$T$ is self-adjoint $\iff \operatorname{spec}(T) \subset \mathbb{R}$,
$T$ is unitary $\iff \operatorname{spec}(T) \subset S^1 = \{z \in \mathbb{C} : |z| = 1\}$.
Proof. Since $T$ is normal there exists an orthonormal basis $e_1, \ldots, e_n$ such that $Te_i = \lambda_i e_i$, $i = 1, \ldots, n$. Hence $T^*e_i = \bar\lambda_i e_i$. Then
$T = T^* \iff \lambda_i = \bar\lambda_i, \; i = 1, \ldots, n,$
$TT^* = T^*T = I \iff |\lambda_i| = 1, \; i = 1, \ldots, n.$ □
Combine Proposition 4.15 and Corollary 4.18 with the above proposition to deduce:
Corollary 4.20 Let $V$ be an IPS over $\mathbb{R}$ and let $T \in S(V)$. Then $\operatorname{spec}(T) \subset \mathbb{R}$ and $V$ has an orthonormal basis consisting of the eigenvectors of $T$.
Proposition 4.21 Let $V$ be an IPS over $\mathbb{R}$ and let $T \in U(V)$. Then $V = \oplus_{i\in\{-1,1,2,\ldots,k\}} V_i$, where $k \ge 1$, $V_i$ and $V_j$ are orthogonal for $i \ne j$, such that
(a) $T|V_1 = I_{V_1}$, $\dim V_1 \ge 0$,
(b) $T|V_{-1} = -I_{V_{-1}}$, $\dim V_{-1} \ge 0$,
(c) $TV_i = V_i$, $\dim V_i = 2$, $\operatorname{spec}(T|V_i) \subset S^1\setminus\{-1, 1\}$ for $i = 2, \ldots, k$.
See Problem 4.9.
Proposition 4.22 Let $V$ be an IPS over $\mathbb{R}$ and let $T \in AS(V)$. Then $V = \oplus_{i\in\{1,2,\ldots,k\}} V_i$, where $k \ge 1$, $V_i$ and $V_j$ are orthogonal for $i \ne j$, such that
(a) $T|V_1 = 0_{V_1}$, $\dim V_1 \ge 0$,
(b) $TV_i = V_i$, $\dim V_i = 2$, $\operatorname{spec}(T|V_i) \subset \sqrt{-1}\,\mathbb{R}\setminus\{0\}$ for $i = 2, \ldots, k$.
See Problem 4.10.
Theorem 4.23 Let $V$ be an IPS over $\mathbb{C}$ of dimension $n$. Let $T \in \operatorname{Hom}(V)$. (Here $\operatorname{Hom}(V)$ stands for the algebra of all linear transformations from $V$ to itself.) Let $\lambda_1, \ldots, \lambda_n \in \mathbb{C}$ be the $n$ eigenvalues of $T$ counted with their multiplicities. Then there exists an orthonormal basis $g_1, \ldots, g_n$ of $V$ with the following properties:
$T\operatorname{span}(g_1, \ldots, g_i) \subset \operatorname{span}(g_1, \ldots, g_i), \quad \langle Tg_i, g_i\rangle = \lambda_i, \quad i = 1, \ldots, n.$    (4.2)
Let $V$ be an IPS over $\mathbb{R}$ of dimension $n$. Let $T \in \operatorname{Hom}(V)$ and assume that $\operatorname{spec}(T) \subset \mathbb{R}$. Let $\lambda_1, \ldots, \lambda_n \in \mathbb{R}$ be the $n$ eigenvalues of $T$ counted with their multiplicities. Then there exists an orthonormal basis $g_1, \ldots, g_n$ of $V$ such that (4.2) holds.
Proof. Assume first that $V$ is an IPS over $\mathbb{C}$ of dimension $n$. The proof is by induction on $n$. For $n = 1$ the theorem is trivial. Assume that $n > 1$. Since $\lambda_1 \in \operatorname{spec}(T)$ it follows that there exists $g_1 \in V$, $\langle g_1, g_1\rangle = 1$, such that $Tg_1 = \lambda_1 g_1$. Let $U := \operatorname{span}(g_1)^\perp$. Let $P$ be the orthogonal projection on $U$. Let $T_1 := PT|_U$. Then $T_1 \in \operatorname{Hom}(U)$. Let $\tilde\lambda_2, \ldots, \tilde\lambda_n$ be the eigenvalues of $T_1$ counted with their multiplicities. The induction hypothesis yields the existence of an orthonormal basis $g_2, \ldots, g_n$ of $U$ such that
$T_1\operatorname{span}(g_2, \ldots, g_i) \subset \operatorname{span}(g_2, \ldots, g_i), \quad \langle Tg_i, g_i\rangle = \tilde\lambda_i, \quad i = 2, \ldots, n.$
It is straightforward to show that $T\operatorname{span}(g_1, \ldots, g_i) \subset \operatorname{span}(g_1, \ldots, g_i)$ for $i = 1, \ldots, n$. Hence in the orthonormal basis $g_1, \ldots, g_n$, $T$ is represented by an upper triangular matrix $B = (b_{ij})_1^n$, with $b_{11} = \lambda_1$ and $b_{ii} = \tilde\lambda_i$, $i = 2, \ldots, n$. Hence $\lambda_1, \tilde\lambda_2, \ldots, \tilde\lambda_n$ are the eigenvalues of $T$ counted with their multiplicities. This establishes the theorem in this case. The real case is treated similarly. □
Combine the above results with Problems 4.8 and 4.15 to deduce:
Corollary 4.24 Let $A \in \mathbb{C}^{n\times n}$. Let $\lambda_1, \ldots, \lambda_n \in \mathbb{C}$ be the $n$ eigenvalues of $A$ counted with their multiplicities. Then there exist an upper triangular matrix $B = (b_{ij})_1^n \in M_n(\mathbb{C})$, such that $b_{ii} = \lambda_i$, $i = 1, \ldots, n$, and a unitary matrix $U \in U_n$ such that $A = UBU^{-1}$. If $A \in N(n, \mathbb{C})$ then $B$ is a diagonal matrix.
Let $A \in M_n(\mathbb{R})$ and assume that $\operatorname{spec}(A) \subset \mathbb{R}$. Then $A = UBU^{-1}$ where $U$ can be chosen to be a real orthogonal matrix and $B$ a real upper triangular matrix. If $A \in N(n, \mathbb{R})$ and $\operatorname{spec}(A) \subset \mathbb{R}$ then $B$ is a diagonal matrix.
It is easy to show that $U$ in the above Corollary can be chosen in $SU_n$ or $SO(n, \mathbb{R})$ respectively (Problem 4.14).
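In matrix language Corollary 4.24 is the Schur decomposition; a quick check with scipy (an illustration under the assumption that scipy is available, with random data):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
B, U = schur(A, output='complex')            # A = U B U^*, U unitary, B upper triangular
print(np.allclose(U @ B @ U.conj().T, A))    # True
print(np.allclose(np.tril(B, -1), 0))        # B is upper triangular
print(np.isclose(np.trace(B), np.trace(A)))  # diagonal of B carries the eigenvalues of A
```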
Definition 4.25 Let $V$ be a vector space and assume that $T : V \to V$ is a linear operator. Let $0 \ne v \in V$. Then $W = \operatorname{span}(v, Tv, T^2v, \ldots)$ is called a cyclic invariant subspace of $T$ generated by $v$. (It is also referred to as a Krylov subspace of $T$ generated by $v$.) Sometimes we will call $W$ just a cyclic subspace, or Krylov subspace.
Theorem 4.26 Let $V$ be a finite dimensional IPS. Let $T : V \to V$ be a linear operator. For $0 \ne v \in V$ let $W = \operatorname{span}(v, Tv, \ldots, T^{r-1}v)$ be a cyclic $T$-invariant subspace of dimension $r$ generated by $v$. Let $u_1, \ldots, u_r$ be an orthonormal basis of $W$ obtained by the Gram-Schmidt process from the basis $[v, Tv, \ldots, T^{r-1}v]$ of $W$. Then $\langle Tu_i, u_j\rangle = 0$ for $1 \le i \le j-2$, i.e. the representation matrix of $T|W$ in the basis $[u_1, \ldots, u_r]$ is upper Hessenberg. If $T$ is self-adjoint then the representation matrix of $T|W$ in the basis $[u_1, \ldots, u_r]$ is a tridiagonal hermitian matrix.
Proof. Let $W_j = \operatorname{span}(v, \ldots, T^{j-1}v)$ for $j = 1, \ldots, r+1$. Clearly $TW_j \subset W_{j+1}$ for $j = 1, \ldots, r$. The assumption that $W$ is a $T$-invariant subspace yields $W = W_r = W_{r+1}$. Since $\dim W = r$ it follows that $v, \ldots, T^{r-1}v$ are linearly independent. Hence $[v, \ldots, T^{r-1}v]$ is a basis for $W$. Recall that $\operatorname{span}(u_1, \ldots, u_j) = W_j$ for $j = 1, \ldots, r$. Let $r \ge j \ge i+2$. Then $Tu_i \in TW_i \subset W_{i+1}$. As $u_j \perp W_{i+1}$ it follows that $\langle Tu_i, u_j\rangle = 0$. Assume that $T^* = T$. Let $r \ge i \ge j+2$. Then $\langle Tu_i, u_j\rangle = \langle u_i, Tu_j\rangle = 0$. Hence the representation matrix of $T|W$ in the basis $[u_1, \ldots, u_r]$ is a tridiagonal hermitian matrix. □
Problems
(4.3) Prove Proposition 4.12.
(4.4) Let $P, Q \in \operatorname{Hom}(V)$, $a, b \in F$. Show that $(aP + bQ)^* = \bar aP^* + \bar bQ^*$.
(4.5) Prove Proposition 4.14.
(4.6) Prove Proposition 4.15 for finite dimensional $V$. (Hint: Choose an orthonormal basis in $V$.)
(4.7) Show the following
$SO(n, \mathbb{D}) \subset O(n, \mathbb{D}) \subset GL(n, \mathbb{D}),$
$S(n, \mathbb{R}) \subset H_n \subset N(n, \mathbb{C}),$
$AS(n, \mathbb{R}) \subset AH_n \subset N(n, \mathbb{C}),$
$S(n, \mathbb{R}), AS(n, \mathbb{R}) \subset N(n, \mathbb{R}) \subset N(n, \mathbb{C}),$
$O(n, \mathbb{R}) \subset U_n \subset N(n, \mathbb{C}),$
$SO(n, \mathbb{D}), O(n, \mathbb{D}), SU_n, U_n$ are groups,
$S(n, \mathbb{D})$ is a $\mathbb{D}$-module of dimension $\binom{n+1}{2}$,
$AS(n, \mathbb{D})$ is a $\mathbb{D}$-module of dimension $\binom{n}{2}$,
$H_n$ is an $\mathbb{R}$-vector space of dimension $n^2$,
$AH_n = \sqrt{-1}\,H_n.$
(4.8) Let $E = \{e_1, \ldots, e_n\}$ be an orthonormal basis in an IPS $V$ over $F$. Let $G = \{g_1, \ldots, g_n\}$ be another basis in $V$. Show that $G$ is an orthonormal basis if and only if the transfer matrix either from $E$ to $G$ or from $G$ to $E$ is a unitary matrix.
(4.9) Prove Proposition 4.21.
(4.10) Prove Proposition 4.22.
(4.11)
a. Show that $A \in SO(2, \mathbb{R})$ is of the form $A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$, $\theta \in \mathbb{R}$.
b. Show that $SO(2, \mathbb{R}) = e^{AS(2,\mathbb{R})}$. That is, for any $B \in AS(2, \mathbb{R})$, $e^B \in SO(2, \mathbb{R})$, and any $A \in SO(2, \mathbb{R})$ is $e^B$ for some $B \in AS(2, \mathbb{R})$. (Hint: Consider the power series for $e^B$, $B = \begin{pmatrix} 0 & -\theta \\ \theta & 0 \end{pmatrix}$.)
c. Show that $SO(n, \mathbb{R}) = e^{AS(n,\mathbb{R})}$. (Hint: Use Propositions 4.21 and 4.22 and part b.)
d. Show that $SO(n, \mathbb{R})$ is a path connected space. (See part e.)
e. Let $V$ be an $n(> 1)$-dimensional IPS over $F = \mathbb{R}$. Let $p \in [1, n-1]$. Assume that $x_1, \ldots, x_p$ and $y_1, \ldots, y_p$ are two orthonormal systems in $V$. Show that these two o.n.s. are path connected. That is, there are $p$ continuous mappings $z_i(t) : [0, 1] \to V$, $i = 1, \ldots, p$, such that for each $t \in [0, 1]$, $z_1(t), \ldots, z_p(t)$ is an o.n.s. and $z_i(0) = x_i$, $z_i(1) = y_i$, $i = 1, \ldots, p$.
(4.12)
a. Show that if $Q$ is a $3\times 3$ orthogonal matrix with $\det Q = 1$ then $1$ is an eigenvalue of $Q$.
b. Let $Q$ be a $3\times 3$ orthogonal matrix with $\det Q = 1$. Show that there exist $e \in \mathbb{R}^3$, $\|e\| = 1$, and $\theta \in [0, 2\pi)$ such that for each $x \in \mathbb{R}^3$ the vector $Qx$ can be obtained as follows. Decompose $x = u + v$, where $u = \langle x, e\rangle e$ and $\langle v, e\rangle = 0$. Let $S := \operatorname{span}(e)^\perp$ be the two dimensional subspace orthogonal to $e$. Then $Qx = u + w$, where $w \in S$ is obtained by rotating $v \in S$ by the angle $\theta$. (This result is called Euler's theorem, i.e. a rotation of a three dimensional body around its center of gravity can be obtained as a two dimensional rotation along some axis, (given by the direction of $e$).)
c. For which values of $n$ does any $n\times n$ orthogonal matrix $Q$ have an eigenvalue $1$ or $-1$? Can you tell under what condition $1$ is always an eigenvalue of $Q$?
(4.13)
a. Show that $U_n = e^{AH_n}$. (Hint: Use Proposition 4.19 and its proof.)
b. Show that $U_n$ is path connected.
c. Prove Problem 4.11e for $F = \mathbb{C}$.
(4.14)
Show
(a) $D_1DD_1^{-1} = D$ for any $D \in D(n, \mathbb{C})$, $D_1 \in DU_n$.
(b) $A \in N(n, \mathbb{C}) \iff A = UDU^*$, $U \in SU_n$, $D \in D(n, \mathbb{C})$.
(c) $A \in N(n, \mathbb{R})$, $\operatorname{spec}(A) \subset \mathbb{R} \iff A = UDU^\top$, $U \in SO(n, \mathbb{R})$, $D \in D(n, \mathbb{R})$.
(4.15)
Show that an upper triangular or a lower triangular matrix $B \in \mathbb{C}^{n\times n}$ is normal if and only if $B$ is diagonal. (Hint: consider the equality $(BB^*)_{11} = (B^*B)_{11}$.)
(4.16)
Let the assumptions of Theorem 4.26 hold. Show that instead of performing the Gram-Schmidt process on $v, Tv, \ldots, T^{r-1}v$ one can perform the following process. Let $w_1 := \frac{v}{\|v\|}$. Assume that one already obtained $i$ orthonormal vectors $w_1, \ldots, w_i$. Let $\tilde w_{i+1} := Tw_i - \sum_{j=1}^{i}\langle Tw_i, w_j\rangle w_j$. If $\tilde w_{i+1} = 0$ then stop the process, i.e. one is left with $i$ orthonormal vectors. If $\tilde w_{i+1} \ne 0$ then let $w_{i+1} := \frac{\tilde w_{i+1}}{\|\tilde w_{i+1}\|}$ and continue the process. Show that the process ends after obtaining $r$ orthonormal vectors $w_1, \ldots, w_r$ and $u_i = w_i$ for $i = 1, \ldots, r$. (This is a version of the Lanczos tridiagonalization process.)
4.4 Quadratic and hermitian forms
In this section you may assume that $\mathbb{D} = \mathbb{R}, \mathbb{C}$.
Definition 4.27 Let $V$ be a module over $\mathbb{D}$ and $Q : V\times V \to \mathbb{D}$. $Q$ is called a quadratic form (on $V$) if the following conditions are satisfied:
(a) $Q(x, y) = Q(y, x)$ for all $x, y \in V$ (symmetricity);
(b) $Q(ax + bz, y) = aQ(x, y) + bQ(z, y)$ for all $a, b \in \mathbb{D}$ and $x, y, z \in V$ (bilinearity).
For $\mathbb{D} = \mathbb{C}$, $Q$ is called a hermitian form (on $V$) if $Q$ satisfies the conditions (a') and (b), where
(a') $Q(x, y) = \overline{Q(y, x)}$ for all $x, y \in V$ (bar-symmetricity).
The following results are elementary (see Problems 4.1-4.2):
Proposition 4.28 Let $V$ be a module over $\mathbb{D}$ with a basis $E = \{e_1, \ldots, e_n\}$. Then there is a $1$-$1$ correspondence between a quadratic form $Q$ on $V$ and $A \in S(n, \mathbb{D})$:
$Q(x, y) = \eta^\top A\xi,$
$x = \sum_{i=1}^{n}\xi_i e_i, \quad y = \sum_{i=1}^{n}\eta_i e_i, \quad \xi = (\xi_1, \ldots, \xi_n)^\top, \; \eta = (\eta_1, \ldots, \eta_n)^\top \in \mathbb{D}^n.$
Let $V$ be a vector space over $\mathbb{C}$ with a basis $E = \{e_1, \ldots, e_n\}$. Then there is a $1$-$1$ correspondence between a hermitian form $Q$ on $V$ and $A \in H_n$:
$Q(x, y) = \eta^* A\xi,$
$x = \sum_{i=1}^{n}\xi_i e_i, \quad y = \sum_{i=1}^{n}\eta_i e_i, \quad \xi = (\xi_1, \ldots, \xi_n)^\top, \; \eta = (\eta_1, \ldots, \eta_n)^\top \in \mathbb{C}^n.$
Definition 4.29 Let the assumptions of Proposition 4.28 hold. Then $A$ is called the representation matrix of $Q$ in the basis $E$.
Proposition 4.30 Let the assumptions of Proposition 4.28 hold. Let $F = \{f_1, \ldots, f_n\}$ be another basis of the $\mathbb{D}$-module $V$. Then the quadratic form $Q$ is represented by $B \in S(n, \mathbb{D})$ in the basis $F$, where $B$ is congruent to $A$:
$B = U^\top AU, \quad U \in GL(n, \mathbb{D}),$
and $U$ is the matrix corresponding to the basis change from $F$ to $E$. For $\mathbb{D} = \mathbb{C}$ the hermitian form $Q$ is represented by $B \in H_n$ in the basis $F$, where $B$ is hermi-congruent to $A$:
$B = U^* AU, \quad U \in GL(n, \mathbb{C}),$
and $U$ is the matrix corresponding to the basis change from $F$ to $E$.
In what follows we assume that $\mathbb{D} = F = \mathbb{R}, \mathbb{C}$.
Proposition 4.31 Let $V$ be an $n$-dimensional vector space over $\mathbb{R}$. Let $Q : V\times V \to \mathbb{R}$ be a quadratic form. Let $A \in S(n, \mathbb{R})$ be the representation matrix of $Q$ with respect to a basis $E$ in $V$. Let $V_c$ be the extension of $V$ over $\mathbb{C}$. Then there exists a unique hermitian form $Q_c : V_c\times V_c \to \mathbb{C}$ such that $Q_c|_{V\times V} = Q$ and $Q_c$ is represented by $A$ with respect to the basis $E$ in $V_c$.
See Problem 4.3.
Normalization 4.32 Let $V$ be a finite dimensional IPS over $F$. Let $Q : V\times V \to F$ be either a quadratic form for $F = \mathbb{R}$ or a hermitian form for $F = \mathbb{C}$. Then a representation matrix $A$ of $Q$ is chosen with respect to an orthonormal basis $E$.
The following proposition is straightforward (see Problem 4.4).
Proposition 4.33 Let $V$ be an $n$-dimensional IPS over $F$. Let $Q : V\times V \to F$ be either a quadratic form for $F = \mathbb{R}$ or a hermitian form for $F = \mathbb{C}$. Then there exists a unique $T \in S(V)$ such that $Q(x, y) = \langle Tx, y\rangle$ for any $x, y \in V$. In any orthonormal basis of $V$, $Q$ and $T$ are represented by the same matrix $A$. In particular the characteristic polynomial $p(\lambda)$ of $T$ is called the characteristic polynomial of $Q$. $Q$ has only real roots:
$\lambda_1(Q) \ge \ldots \ge \lambda_n(Q),$
which are called the eigenvalues of $Q$. Furthermore there exists an orthonormal basis $F = \{f_1, \ldots, f_n\}$ in $V$ such that $D = \operatorname{diag}(\lambda_1(Q), \ldots, \lambda_n(Q))$ is the representation matrix of $Q$ in $F$.
Vice versa, for any $T \in S(V)$ and any subspace $U \subset V$ the form $Q(T, U)$ defined by
$Q(T, U)(x, y) := \langle Tx, y\rangle \quad \text{for } x, y \in U$
is either a quadratic form for $F = \mathbb{R}$ or a hermitian form for $F = \mathbb{C}$.
In the rest of the book we use the following normalization unless stated otherwise.
Normalization 4.34 Let $V$ be an $n$-dimensional IPS over $F$. Assume that $T \in S(V)$. Then arrange the eigenvalues of $T$ counted with their multiplicities in decreasing order:
$\lambda_1(T) \ge \ldots \ge \lambda_n(T).$
The same normalization applies to real symmetric matrices and complex hermitian matrices.
Problems
(4.1) Prove Proposition 4.28.
(4.2) Prove Proposition 4.30.
(4.3) Prove Proposition 4.31.
(4.4) Prove Proposition 4.33.
4.5 Max-min characterizations
Theorem 4.35 (Convoy Principle [12]) Let $V$ be an $n$-dimensional IPS. Let $\operatorname{Gr}(m, V)$ be the space of all $m$-dimensional subspaces of $V$, $m \in [0, n]\cap\mathbb{Z}_+$. Let $T \in S(V)$. Then
$\lambda_k(T) = \max_{U\in\operatorname{Gr}(k,V)}\min_{0\ne x\in U}\frac{\langle Tx, x\rangle}{\langle x, x\rangle} = \max_{U\in\operatorname{Gr}(k,V)}\lambda_k(Q(T, U)), \quad k = 1, \ldots, n,$    (4.1)
where the quadratic form $Q(T, U)$ is defined in Proposition 4.33. For $k \in [1, n]\cap\mathbb{N}$ let $U$ be an invariant subspace of $T$ spanned by eigenvectors $e_1, \ldots, e_k$ corresponding to the eigenvalues $\lambda_1(T), \ldots, \lambda_k(T)$. Then $\lambda_k(T) = \lambda_k(Q(T, U))$. Let $U \in \operatorname{Gr}(k, V)$ and assume that $\lambda_k(T) = \lambda_k(Q(T, U))$. Then $U$ contains an eigenvector of $T$ corresponding to $\lambda_k(T)$.
In particular
$\lambda_1(T) = \max_{0\ne x\in V}\frac{\langle Tx, x\rangle}{\langle x, x\rangle}, \qquad \lambda_n(T) = \min_{0\ne x\in V}\frac{\langle Tx, x\rangle}{\langle x, x\rangle}.$    (4.2)
Moreover for any $x \ne 0$
$\lambda_1(T) = \frac{\langle Tx, x\rangle}{\langle x, x\rangle} \iff Tx = \lambda_1(T)x, \qquad \lambda_n(T) = \frac{\langle Tx, x\rangle}{\langle x, x\rangle} \iff Tx = \lambda_n(T)x.$
The quotient $\frac{\langle Tx, x\rangle}{\langle x, x\rangle}$, $0 \ne x \in V$, is called the Rayleigh quotient. The characterization (4.2) is called the convoy principle.
Proof. Choose an orthonormal basis $E = \{e_1, \ldots, e_n\}$ such that
$Te_i = \lambda_i(T)e_i, \quad \langle e_i, e_j\rangle = \delta_{ij}, \quad i, j = 1, \ldots, n.$    (4.3)
Then
$\frac{\langle Tx, x\rangle}{\langle x, x\rangle} = \frac{\sum_{i=1}^{n}\lambda_i(T)|x_i|^2}{\sum_{i=1}^{n}|x_i|^2}, \quad x = \sum_{i=1}^{n}x_i e_i \ne 0.$    (4.4)
The above equality yields straightforwardly (4.2) and the equality cases in these characterizations. Let $U \in \operatorname{Gr}(k, V)$. Then the minimal characterization of $\lambda_k(Q(T, U))$ yields the equality
$\lambda_k(Q(T, U)) = \min_{0\ne x\in U}\frac{\langle Tx, x\rangle}{\langle x, x\rangle} \quad \text{for any } U \in \operatorname{Gr}(k, V).$    (4.5)
Next, there exists $0 \ne x \in U$ such that $\langle x, e_i\rangle = 0$ for $i = 1, \ldots, k-1$. (For $k = 1$ this condition is void.) Hence
$\frac{\langle Tx, x\rangle}{\langle x, x\rangle} = \frac{\sum_{i=k}^{n}\lambda_i(T)|x_i|^2}{\sum_{i=k}^{n}|x_i|^2} \le \lambda_k(T) \implies \lambda_k(T) \ge \lambda_k(Q(T, U)).$
Let
$\lambda_1(T) = \ldots = \lambda_{n_1}(T) > \lambda_{n_1+1}(T) = \ldots = \lambda_{n_2}(T) > \ldots > \lambda_{n_{r-1}+1}(T) = \ldots = \lambda_{n_r}(T) = \lambda_n(T), \quad n_0 = 0 < n_1 < \ldots < n_r = n.$    (4.6)
Assume that $n_{j-1} < k \le n_j$. Suppose that $\lambda_k(Q(T, U)) = \lambda_k(T)$. Then for the $x$ of the above form $\frac{\langle Tx, x\rangle}{\langle x, x\rangle} = \lambda_k(T)$. Hence $x = \sum_{i=k}^{n_j}x_i e_i$. Thus $Tx = \lambda_k(T)x$.
Let $U_k = \operatorname{span}(e_1, \ldots, e_k)$. Let $0 \ne x = \sum_{i=1}^{k}x_i e_i \in U_k$. Then
$\frac{\langle Tx, x\rangle}{\langle x, x\rangle} = \frac{\sum_{i=1}^{k}\lambda_i(T)|x_i|^2}{\sum_{i=1}^{k}|x_i|^2} \ge \lambda_k(T) \implies \lambda_k(Q(T, U_k)) \ge \lambda_k(T).$
Hence $\lambda_k(Q(T, U_k)) = \lambda_k(T)$. □
It can be shown that for $k > 1$ and $\lambda_1(T) > \lambda_k(T)$ there exists $U \in \operatorname{Gr}(k, V)$ such that $\lambda_k(T) = \lambda_k(Q(T, U))$ and $U$ is not an invariant subspace of $T$; in particular $U$ does not contain all of $e_1, \ldots, e_k$ satisfying (4.3). (See Problem 4.10.)
Corollary 4.36 Let the assumptions of Theorem 4.35 hold. Let $1 \le \ell \le n$. Then
$\lambda_k(T) \ge \lambda_k(Q(T, W)) \quad \text{and} \quad \lambda_k(T) = \max_{W\in\operatorname{Gr}(\ell,V)}\lambda_k(Q(T, W)), \quad k = 1, \ldots, \ell.$    (4.7)
Proof. For $k \le \ell$ apply Theorem 4.35 to $\lambda_k(Q(T, W))$ to deduce that $\lambda_k(Q(T, W)) \le \lambda_k(T)$. Let $U_\ell = \operatorname{span}(e_1, \ldots, e_\ell)$. Then
$\lambda_k(Q(T, U_\ell)) = \lambda_k(T), \quad k = 1, \ldots, \ell.$ □
Corollary 4.37 (Cauchy Interlacing Theorem) Let $A \in H_n$ and let $B \in H_{n-1}$ be the principal submatrix of $A$ obtained by deleting the row and the column $i \in [1, n]$ of $A$. Denote by $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$ and $\mu_1 \ge \mu_2 \ge \ldots \ge \mu_{n-1}$ the eigenvalues of $A$ and $B$ respectively. Then
$\lambda_1 \ge \mu_1 \ge \lambda_2 \ge \mu_2 \ge \ldots \ge \mu_{n-1} \ge \lambda_n,$
i.e. $\lambda_i \ge \mu_i \ge \lambda_{i+1}$ for $i = 1, \ldots, n-1$.
Proof. Let $U_i := \operatorname{span}(e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_n)$. Then the restriction of the quadratic form $x^*Ax$ to $U_i$ gives rise to the quadratic form induced by $B$. Corollary 4.36 yields the inequality $\lambda_i \ge \mu_i$ for $i = 1, \ldots, n-1$. Consider now $-A$ and its principal submatrix $-B$. Their eigenvalues arranged in decreasing order are $-\lambda_n \ge -\lambda_{n-1} \ge \ldots \ge -\lambda_1$ and $-\mu_{n-1} \ge -\mu_{n-2} \ge \ldots \ge -\mu_1$ respectively. The above arguments yield $-\lambda_{n-i+1} \ge -\mu_{n-i}$ for $i = 1, \ldots, n-1$, which is equivalent to $\mu_j \ge \lambda_{j+1}$ for $j = 1, \ldots, n-1$. □
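A numerical illustration of the interlacing inequalities (the random symmetric matrix and numpy usage are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6)); A = (A + A.T) / 2           # real symmetric (hermitian)
B = np.delete(np.delete(A, 2, axis=0), 2, axis=1)             # delete row and column i = 3
lam = np.linalg.eigvalsh(A)[::-1]                              # lambda_1 >= ... >= lambda_n
mu = np.linalg.eigvalsh(B)[::-1]                               # mu_1 >= ... >= mu_{n-1}
print(all(lam[i] >= mu[i] >= lam[i + 1] for i in range(5)))    # True
```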
Definition 4.38 For $T \in S(V)$ denote by $\iota_+(T), \iota_0(T), \iota_-(T)$ the number of positive, zero and negative eigenvalues among $\lambda_1(T) \ge \ldots \ge \lambda_n(T)$. The triple $\iota(T) := (\iota_+(T), \iota_0(T), \iota_-(T))$ is called the inertia of $T$.
We write $T > 0$, $T \ge 0$, $T \le 0$, $T < 0$ if $\iota_0(T) + \iota_-(T) = 0$, $\iota_-(T) = 0$, $\iota_+(T) = 0$ and $\iota_+(T) + \iota_0(T) = 0$ respectively.
For $B \in H_n$, $\iota(B) := (\iota_+(B), \iota_0(B), \iota_-(B))$ is the inertia of $B$, where $\iota_+(B), \iota_0(B), \iota_-(B)$ is the number of positive, zero and negative eigenvalues of $B$ respectively.
Proposition 4.39 Let $U \in \operatorname{Gr}(k, V)$.
1. Assume that $\lambda_k(Q(T, U)) > 0$, i.e. $Q(T, U) > 0$. Then $k \le \iota_+(T)$.
2. Assume that $\lambda_k(Q(T, U)) \ge 0$, i.e. $Q(T, U) \ge 0$. Then $k \le \iota_+(T) + \iota_0(T)$.
3. Assume that $\lambda_1(Q(T, U)) < 0$, i.e. $Q(T, U) < 0$. Then $k \le \iota_-(T)$.
4. Assume that $\lambda_1(Q(T, U)) \le 0$, i.e. $Q(T, U) \le 0$. Then $k \le \iota_-(T) + \iota_0(T)$.
5. Sylvester Law of Inertia: Let $B \in H_n$ or $B \in S_n(\mathbb{R})$ and assume that $A = PBP^*$ or $A = PBP^\top$ for some $P \in GL(n, \mathbb{C})$ or $P \in GL(n, \mathbb{R})$ respectively. Then $\iota(A) = \iota(B)$. Furthermore, if $A, B \in H_n$ or $A, B \in S_n(\mathbb{R})$ have the same inertia then there exists $P \in GL(n, \mathbb{C})$ or $P \in GL(n, \mathbb{R})$ such that $A = PBP^*$ or $A = PBP^\top$ respectively.
Proof. 1. Corollary 4.36 yields that $\lambda_k(T) \ge \lambda_k(Q(T, U)) > 0$, hence $k \le \iota_+(T)$. The proofs of 2, 3, 4 are similar.
5. Assume that $A = PBP^*$. Note that if $x^*Bx > 0$ or $x^*Bx \ge 0$ for all $0 \ne x \in U$, then $y^*Ay > 0$ or $y^*Ay \ge 0$ for all $y \in (P^*)^{-1}U$ respectively. Hence $\iota_+(B) \le \iota_+(A)$ and $\iota_+(B) + \iota_0(B) \le \iota_+(A) + \iota_0(A)$. Since $P$ is invertible, $P^{-1} = Q$ and $B = QAQ^*$. Hence we deduce as above that $\iota_+(A) \le \iota_+(B)$ and $\iota_+(A) + \iota_0(A) \le \iota_+(B) + \iota_0(B)$. Thus $\iota_+(A) = \iota_+(B)$ and $\iota_0(A) = \iota_0(B)$. Since $\iota_+(A) + \iota_0(A) + \iota_-(A) = \iota_+(B) + \iota_0(B) + \iota_-(B) = n$ we deduce that $\iota_-(A) = \iota_-(B)$, i.e. $\iota(A) = \iota(B)$.
Assume now that $\iota(B) = \iota(A)$. Observe that $B = Q\Lambda Q^*$, where $Q$ is unitary and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$. Let $f(x) = \frac{1}{\sqrt{|x|}}$ if $x \ne 0$ and $f(0) = 1$. Set $R = \operatorname{diag}(f(\lambda_1), \ldots, f(\lambda_n)) \in S_n(\mathbb{R})$. Then $C = R^*\Lambda R = R\Lambda R$ is a diagonal matrix with $\iota_+(B)$ 1's, $\iota_0(B)$ zeros and $\iota_-(B)$ $-1$'s on the main diagonal. So $(QR)^*B(QR) = C$. Similarly $C = (Q'R')^*A(Q'R')$, where $Q', R'$ are obtained from $A$ in the same way. Hence $A = PBP^*$. □
Theorem 4.40 Let $V$ be an $n$-dimensional IPS and $T \in S(V)$. Then
$\lambda_k(T) = \min_{W\in\operatorname{Gr}(k-1,V)}\max_{0\ne x\in W^\perp}\frac{\langle Tx, x\rangle}{\langle x, x\rangle}, \quad k = 1, \ldots, n.$
See Problem 4.11 for the proof of the theorem and the following corollary.
Corollary 4.41 Let $V$ be an $n$-dimensional IPS and $T \in S(V)$. Let $k, \ell \in [1, n-1]$ be integers satisfying $k \le \ell$. Then
$\lambda_{k+n-\ell}(T) \le \lambda_k(Q(T, W)) \le \lambda_k(T), \quad \text{for any } W \in \operatorname{Gr}(\ell, V).$
Definition 4.42 Let $V$ be an $n$-dimensional IPS. Fix an integer $k \in [1, n]$. Then $F_k = \{f_1, \ldots, f_k\}$ is called an orthonormal $k$-frame if $\langle f_i, f_j\rangle = \delta_{ij}$ for $i, j = 1, \ldots, k$. Denote by $\operatorname{Fr}(k, V)$ the set of all orthonormal $k$-frames in $V$.
Note that each $F_k \in \operatorname{Fr}(k, V)$ induces $U = \operatorname{span} F_k \in \operatorname{Gr}(k, V)$. Vice versa, any $U \in \operatorname{Gr}(k, V)$ induces the set $\operatorname{Fr}(k, U)$ of orthonormal $k$-frames which span $U$.
Theorem 4.43 (Ky Fan [3]) Let $V$ be an $n$-dimensional IPS and $T \in S(V)$. Then for any integer $k \in [1, n]$
$\sum_{i=1}^{k}\lambda_i(T) = \max_{\{f_1,\ldots,f_k\}\in\operatorname{Fr}(k,V)}\sum_{i=1}^{k}\langle Tf_i, f_i\rangle.$
Furthermore
$\sum_{i=1}^{k}\lambda_i(T) = \sum_{i=1}^{k}\langle Tf_i, f_i\rangle$
for some $k$-orthonormal frame $F_k = \{f_1, \ldots, f_k\}$ if and only if $\operatorname{span} F_k$ is spanned by $e_1, \ldots, e_k$ satisfying (4.3).
Proof. Define
$\operatorname{tr} Q(T, U) := \sum_{i=1}^{k}\lambda_i(Q(T, U)) \quad \text{for } U \in \operatorname{Gr}(k, V), \qquad \operatorname{tr}_k T := \sum_{i=1}^{k}\lambda_i(T).$    (4.8)
Let $F_k = \{f_1, \ldots, f_k\} \in \operatorname{Fr}(k, V)$. Set $U = \operatorname{span} F_k$. Then in view of Corollary 4.36
$\sum_{i=1}^{k}\langle Tf_i, f_i\rangle = \operatorname{tr} Q(T, U) \le \sum_{i=1}^{k}\lambda_i(T).$
Let $E_k := \{e_1, \ldots, e_k\}$ where $e_1, \ldots, e_n$ are given by (4.3). Clearly $\operatorname{tr}_k T = \operatorname{tr} Q(T, \operatorname{span} E_k)$. This shows the maximal characterization of $\operatorname{tr}_k T$.
Let $U \in \operatorname{Gr}(k, V)$ and assume that $\operatorname{tr}_k T = \operatorname{tr} Q(T, U)$. Hence $\lambda_i(T) = \lambda_i(Q(T, U))$ for $i = 1, \ldots, k$. Then there exists $G_k = \{g_1, \ldots, g_k\} \in \operatorname{Fr}(k, U)$ such that
$\min_{0\ne x\in\operatorname{span}(g_1,\ldots,g_i)}\frac{\langle Tx, x\rangle}{\langle x, x\rangle} = \lambda_i(Q(T, U)) = \lambda_i(T), \quad i = 1, \ldots, k.$
Use Theorem 4.35 to deduce that $Tg_i = \lambda_i(T)g_i$ for $i = 1, \ldots, k$. □
Theorem 4.44 (J. von Neumann) Let $A, B \in H_n$. Denote by $\lambda_1(A) \ge \ldots \ge \lambda_n(A)$, $\lambda_1(B) \ge \ldots \ge \lambda_n(B)$ the eigenvalues of $A$ and $B$ respectively. Then
$\lambda_1(A)\lambda_n(B) + \lambda_2(A)\lambda_{n-1}(B) + \ldots + \lambda_n(A)\lambda_1(B) \le \operatorname{tr}(AB) \le \sum_{i=1}^{n}\lambda_i(A)\lambda_i(B).$    (4.9)
Equalities hold if and only if there is an orthonormal basis $g_1, \ldots, g_n$ of $\mathbb{C}^n$ such that
1. For the equality case in the upper bound: $Ag_i = \lambda_i(A)g_i$, $Bg_i = \lambda_i(B)g_i$ for $i = 1, \ldots, n$.
2. For the equality case in the lower bound: $Ag_i = \lambda_i(A)g_i$, $Bg_i = \lambda_{n-i+1}(B)g_i$ for $i = 1, \ldots, n$.
Proof. Note that for any invertible matrix $U$ one has $\operatorname{tr}(AB) = \operatorname{tr}(UABU^{-1}) = \operatorname{tr}((UAU^{-1})(UBU^{-1}))$, since $\operatorname{tr}(AB)$ is the sum of the eigenvalues of $AB$. Choose $U$ unitary such that $UAU^{-1} = UAU^* = \Lambda := \operatorname{diag}(\lambda_1(A), \ldots, \lambda_n(A))$. Let $C = UBU^{-1} = UBU^*$. Then $C = (c_{ij})_{i,j=1}^n \in H_n$ and $\lambda_i(C) = \lambda_i(B)$ for $i = 1, \ldots, n$. Clearly $\operatorname{tr}(\Lambda C) = \sum_{i=1}^{n}\lambda_i(A)c_{ii} = \sum_{i=1}^{n}\lambda_i(A)(e_i^*Ce_i)$, where $e_i = (\delta_{i1}, \ldots, \delta_{in})^\top$, $i = 1, \ldots, n$, is the standard basis in $\mathbb{C}^n$. Observe next that
$\sum_{i=1}^{n}\lambda_i(A)e_i^*Ce_i = \sum_{i=1}^{n-1}(\lambda_i(A) - \lambda_{i+1}(A))\sum_{j=1}^{i}e_j^*Ce_j + \lambda_n(A)\sum_{i=1}^{n}e_i^*Ce_i.$
Since $\lambda_i(A) - \lambda_{i+1}(A) \ge 0$ the Ky Fan inequality yields that $(\lambda_i(A) - \lambda_{i+1}(A))\sum_{j=1}^{i}e_j^*Ce_j \le (\lambda_i(A) - \lambda_{i+1}(A))\sum_{j=1}^{i}\lambda_j(C)$ for $i = 1, \ldots, n-1$. Combine all these inequalities with the equality $\sum_{i=1}^{n}e_i^*Ce_i = \operatorname{tr} C = \sum_{i=1}^{n}\lambda_i(C)$ to deduce the inequality $\operatorname{tr}(\Lambda C) \le \sum_{i=1}^{n}\lambda_i(A)\lambda_i(C)$. This gives the upper inequality in (4.9).
The equality case is slightly more delicate to analyze. If $\lambda_1(A) > \ldots > \lambda_n(A)$ then $Ce_i = \lambda_i(C)e_i$ for $i = 1, \ldots, n$ if $\operatorname{tr}(\Lambda C) = \sum_{i=1}^{n}\lambda_i(A)\lambda_i(C)$.
To obtain the lower bound note that $\operatorname{tr}(A(-B)) \le \sum_{i=1}^{n}\lambda_i(A)\lambda_i(-B)$. Use the identity $\lambda_i(-B) = -\lambda_{n-i+1}(B)$ for $i = 1, \ldots, n$ to deduce the lower bound. The equality case is as for the upper bound. □
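A numerical check of the inequality (4.9) on random hermitian matrices (an illustration; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
def rand_herm(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (X + X.conj().T) / 2

A, B = rand_herm(5), rand_herm(5)
a = np.linalg.eigvalsh(A)[::-1]        # lambda_1(A) >= ... >= lambda_n(A)
b = np.linalg.eigvalsh(B)[::-1]
t = np.trace(A @ B).real
print(np.dot(a, b[::-1]) - 1e-9 <= t <= np.dot(a, b) + 1e-9)   # True: lower <= tr(AB) <= upper
```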
Definition 4.45 Let
$\mathbb{R}^n_{\ge} := \{x = (x_1, \ldots, x_n)^\top \in \mathbb{R}^n : x_1 \ge x_2 \ge \ldots \ge x_n\}.$
For $x = (x_1, \ldots, x_n)^\top \in \mathbb{R}^n$ let $\bar x = (\bar x_1, \ldots, \bar x_n)^\top \in \mathbb{R}^n_{\ge}$ be the unique rearrangement of the coordinates of $x$ in decreasing order. That is, there exists a permutation $\pi$ on $\{1, \ldots, n\}$ such that $\bar x_i = x_{\pi(i)}$, $i = 1, \ldots, n$. Let $x = (x_1, \ldots, x_n)^\top$, $y = (y_1, \ldots, y_n)^\top \in \mathbb{R}^n$. Then $x$ is weakly majorized by $y$ ($y$ weakly majorizes $x$), which is denoted by $x \preceq_w y$, if
$\sum_{i=1}^{k}\bar x_i \le \sum_{i=1}^{k}\bar y_i, \quad k = 1, \ldots, n.$
$x$ is majorized by $y$ ($y$ majorizes $x$), which is denoted by $x \preceq y$, if $x \preceq_w y$ and $\sum_{i=1}^{n}x_i = \sum_{i=1}^{n}y_i$.
A remarkable inequality is attached to the notion of majorization [6], see also Problem ?? part (c).
Theorem 4.46 Let $x = (x_1, \ldots, x_n)^\top \preceq y = (y_1, \ldots, y_n)^\top$. Let $\varphi : [\bar y_n, \bar y_1] \to \mathbb{R}$ be a continuous convex function. Then
$\sum_{i=1}^{n}\varphi(x_i) \le \sum_{i=1}^{n}\varphi(y_i).$
Corollary 4.47 Let $V$ be an $n$-dimensional IPS. Let $T \in S(V)$. Denote $\lambda(T) := (\lambda_1(T), \ldots, \lambda_n(T))^\top \in \mathbb{R}^n_{\ge}$. Let $F_n = \{f_1, \ldots, f_n\} \in \operatorname{Fr}(n, V)$. Then $(\langle Tf_1, f_1\rangle, \ldots, \langle Tf_n, f_n\rangle)^\top \preceq \lambda(T)$. Let $\varphi : [\lambda_n(T), \lambda_1(T)] \to \mathbb{R}$ be a continuous convex function. Then
$\sum_{i=1}^{n}\varphi(\lambda_i(T)) = \max_{\{f_1,\ldots,f_n\}\in\operatorname{Fr}(n,V)}\sum_{i=1}^{n}\varphi(\langle Tf_i, f_i\rangle).$
See Problem 4.12.
Definition 4.48 A set $D \subset \mathbb{R}^n$ is called convex if for any $x, y \in D$ and any $t \in [0, 1]$ the linear combination $tx + (1-t)y$ is in the set $D$.
Let $D \subset \mathbb{R}^n$ be a convex set and $f : D \to \mathbb{R}$ be a function on $D$. Then $f$ is called a convex function on $D$ if for any $x, y \in D$ and any $t \in [0, 1]$, $f(tx + (1-t)y) \le tf(x) + (1-t)f(y)$.
Problems
1. Let $k, n$ be positive integers and assume that $k \le n$. Let $f_k : S_n(\mathbb{R}) \to \mathbb{R}$ be the following function: $f_k(A) = \sum_{i=1}^{k}\lambda_i(A)$ for any $A \in S_n(\mathbb{R})$. Show that $f_k$ is convex on $S_n(\mathbb{R})$. (See above for the definition of convexity.) What happens for $k = n$?
(4.10) Let $V$ be a 3 dimensional IPS and $T \in \operatorname{Hom}(V)$ be self-adjoint. Assume that
$\lambda_1(T) > \lambda_2(T) > \lambda_3(T), \quad Te_i = \lambda_i(T)e_i, \quad i = 1, 2, 3.$
Let $W = \operatorname{span}(e_1, e_3)$.
(a) Show that for each $t \in [\lambda_3(T), \lambda_1(T)]$ there exists a unique $W(t) \in \operatorname{Gr}(1, W)$ such that $\lambda_1(Q(T, W(t))) = t$.
(b) Let $t \in [\lambda_2(T), \lambda_1(T)]$. Let $U(t) = \operatorname{span}(W(t), e_2) \in \operatorname{Gr}(2, V)$. Show that $\lambda_2(T) = \lambda_2(Q(T, U(t)))$.
(4.11)
(a) Let the assumptions of Theorem 4.40 hold. Let $W \in \operatorname{Gr}(k-1, V)$. Show that there exists $0 \ne x \in W^\perp$ such that $\langle x, e_i\rangle = 0$ for $i = k+1, \ldots, n$, where $e_1, \ldots, e_n$ satisfy (4.3). Conclude that $\lambda_1(Q(T, W^\perp)) \ge \frac{\langle Tx, x\rangle}{\langle x, x\rangle} \ge \lambda_k(T)$.
(b) Let $U_\ell = \operatorname{span}(e_1, \ldots, e_\ell)$. Show that $\lambda_1(Q(T, U_\ell^\perp)) = \lambda_{\ell+1}(T)$ for $\ell = 1, \ldots, n-1$.
(c) Prove Theorem 4.40.
(d) Prove Corollary 4.41. (Hint: Choose $U \in \operatorname{Gr}(k, W)$ such that $U \subset W\cap\operatorname{span}(e_{k+n-\ell+1}, \ldots, e_n)^\perp$. Then $\lambda_{k+n-\ell}(T) \le \lambda_k(Q(T, U)) \le \lambda_k(Q(T, W))$.)
(4.12) Prove Corollary 4.47.
(4.13) Let $B = (b_{ij})_1^n \in H_n$. Show that $B > 0$ if and only if $\det(b_{ij})_1^k > 0$ for $k = 1, \ldots, n$.
4.6 Positive definite operators and matrices
Definition 4.49 Let $V$ be a finite dimensional IPS over $F = \mathbb{C}, \mathbb{R}$. Let $S, T \in S(V)$. Then $T > S$ ($T \ge S$) if $\langle Tx, x\rangle > \langle Sx, x\rangle$ ($\langle Tx, x\rangle \ge \langle Sx, x\rangle$) for all $0 \ne x \in V$. $T$ is called positive (nonnegative) definite if $T > 0$ ($T \ge 0$), where $0$ is the zero operator in $\operatorname{Hom}(V)$.
Let $P, Q$ be either quadratic forms if $F = \mathbb{R}$ or hermitian forms if $F = \mathbb{C}$. Then $Q > P$ ($Q \ge P$) if $Q(x, x) > P(x, x)$ ($Q(x, x) \ge P(x, x)$) for all $0 \ne x \in V$. $Q$ is called positive (nonnegative) definite if $Q > 0$ ($Q \ge 0$), where $0$ is the zero form.
For $A, B \in H_n$, $B > A$ ($B \ge A$) if $x^*Bx > x^*Ax$ ($x^*Bx \ge x^*Ax$) for all $0 \ne x \in \mathbb{C}^n$. $B \in H_n$ is called positive (nonnegative) definite if $B > 0$ ($B \ge 0$).
Use (4.1) to deduce:
Corollary 4.50 Let $V$ be an $n$-dimensional IPS. Let $T \in S(V)$. Then $T > 0$ ($T \ge 0$) if and only if $\lambda_n(T) > 0$ ($\lambda_n(T) \ge 0$). Let $S \in S(V)$ and assume that $T > S$ ($T \ge S$). Then $\lambda_i(T) > \lambda_i(S)$ ($\lambda_i(T) \ge \lambda_i(S)$) for $i = 1, \ldots, n$.
Proposition 4.51 Let $V$ be a finite dimensional IPS. Assume that $T \in S(V)$. Then $T \ge 0$ if and only if there exists $S \in S(V)$ such that $T = S^2$. Furthermore $T > 0$ if and only if $S$ is invertible. For $0 \le T \in S(V)$ there exists a unique $0 \le S \in S(V)$ such that $T = S^2$. This $S$ is called the square root of $T$ and is denoted by $T^{\frac{1}{2}}$.
Proof. Assume first that $T \ge 0$. Let $e_1, \ldots, e_n$ be an orthonormal basis consisting of eigenvectors of $T$ as in (4.3). Since $\lambda_i(T) \ge 0$, $i = 1, \ldots, n$, we can define $P \in \operatorname{Hom}(V)$ as follows:
$Pe_i = \sqrt{\lambda_i(T)}\,e_i, \quad i = 1, \ldots, n.$
Clearly $P$ is self-adjoint nonnegative and $T = P^2$.
Suppose now that $T = S^2$ for some $S \in S(V)$. Then $T \in S(V)$ and $\langle Tx, x\rangle = \langle Sx, Sx\rangle \ge 0$. Hence $T \ge 0$. Clearly $\langle Tx, x\rangle = 0 \iff Sx = 0$. Hence $T > 0 \iff S \in GL(V)$.
Suppose that $S \ge 0$. Then $\lambda_i(S) = \sqrt{\lambda_i(T)}$, $i = 1, \ldots, n$. Furthermore each eigenvector of $S$ is an eigenvector of $T$. It is straightforward to show that $S = P$, where $P$ is defined above. □
Corollary 4.52 Let $B \in H_n$ ($S(n, \mathbb{R})$). Then $B \ge 0$ if and only if there exists $A \in H_n$ ($S(n, \mathbb{R})$) such that $B = A^2$. Furthermore $B > 0$ if and only if $A$ is invertible. For $B \ge 0$ there exists a unique $A \ge 0$ such that $B = A^2$. This $A$ is denoted by $B^{\frac{1}{2}}$.
Theorem 4.53 Let $V$ be an IPS over $F = \mathbb{C}, \mathbb{R}$. Let $x_1, \ldots, x_n \in V$. Then the grammian matrix $G(x_1, \ldots, x_n) := (\langle x_i, x_j\rangle)_1^n$ is a hermitian nonnegative definite matrix. (If $F = \mathbb{R}$ then $G(x_1, \ldots, x_n)$ is real symmetric nonnegative definite.) $G(x_1, \ldots, x_n) > 0$ if and only if $x_1, \ldots, x_n$ are linearly independent. Furthermore for any integer $k \in [1, n-1]$
$\det G(x_1, \ldots, x_n) \le \det G(x_1, \ldots, x_k)\det G(x_{k+1}, \ldots, x_n).$    (4.1)
Equality holds if and only if either $\det G(x_1, \ldots, x_k)\det G(x_{k+1}, \ldots, x_n) = 0$ or $\langle x_i, x_j\rangle = 0$ for $i = 1, \ldots, k$ and $j = k+1, \ldots, n$.
Proof. Clearly $G(x_1, \ldots, x_n) \in H_n$. If $V$ is an IPS over $\mathbb{R}$ then $G(x_1, \ldots, x_n) \in S(n, \mathbb{R})$. Let $a = (a_1, \ldots, a_n)^\top \in F^n$. Then
$a^*G(x_1, \ldots, x_n)a = \left\langle\sum_{i=1}^{n}a_i x_i, \sum_{j=1}^{n}a_j x_j\right\rangle \ge 0.$
Equality holds if and only if $\sum_{i=1}^{n}a_i x_i = 0$. Hence $G(x_1, \ldots, x_n) \ge 0$ and $G(x_1, \ldots, x_n) > 0$ if and only if $x_1, \ldots, x_n$ are linearly independent. In particular $\det G(x_1, \ldots, x_n) \ge 0$ and $\det G(x_1, \ldots, x_n) > 0$ if and only if $x_1, \ldots, x_n$ are linearly independent.
We now prove the inequality (4.1). Assume first that the right-hand side of (4.1) is zero. Then either $x_1, \ldots, x_k$ or $x_{k+1}, \ldots, x_n$ are linearly dependent. Hence $x_1, \ldots, x_n$ are linearly dependent and $\det G = 0$.
Assume now that the right-hand side of (4.1) is positive. Hence $x_1, \ldots, x_k$ and $x_{k+1}, \ldots, x_n$ are linearly independent. If $x_1, \ldots, x_n$ are linearly dependent then $\det G = 0$ and strict inequality holds in (4.1). It is left to show the inequality (4.1) and the equality case when $x_1, \ldots, x_n$ are linearly independent. Perform the Gram-Schmidt algorithm on $x_1, \ldots, x_n$ as given in (4.1). Let $S_j = \operatorname{span}(x_1, \ldots, x_j)$ for $j = 1, \ldots, n$. Corollary 4.5 yields that $\operatorname{span}(e_1, \ldots, e_{n-1}) = S_{n-1}$. Hence $y_n = x_n - \sum_{j=1}^{n-1}b_j x_j$ for some $b_1, \ldots, b_{n-1} \in F$. Let $G'$ be the matrix obtained from $G(x_1, \ldots, x_n)$ by subtracting from the $n$-th row $b_j$ times the $j$-th row. Thus the last row of $G'$ is $(\langle y_n, x_1\rangle, \ldots, \langle y_n, x_n\rangle) = (0, \ldots, 0, \|y_n\|^2)$. Clearly $\det G(x_1, \ldots, x_n) = \det G'$. Expand $\det G'$ by the last row to deduce
$\det G(x_1, \ldots, x_n) = \det G(x_1, \ldots, x_{n-1})\,\|y_n\|^2 = \ldots = \det G(x_1, \ldots, x_k)\prod_{i=k+1}^{n}\|y_i\|^2 =$    (4.2)
$\det G(x_1, \ldots, x_k)\prod_{i=k+1}^{n}\operatorname{dist}(x_i, S_{i-1})^2, \quad k = n-1, \ldots, 1.$
Perform the Gram-Schmidt process on $x_{k+1}, \ldots, x_n$ to obtain the orthogonal set of vectors $\tilde y_{k+1}, \ldots, \tilde y_n$ such that
$\tilde S_j := \operatorname{span}(x_{k+1}, \ldots, x_j) = \operatorname{span}(\tilde y_{k+1}, \ldots, \tilde y_j), \quad \operatorname{dist}(x_j, \tilde S_{j-1}) = \|\tilde y_j\|,$
for $j = k+1, \ldots, n$, where $\tilde S_k = \{0\}$. Use (4.2) to deduce that $\det G(x_{k+1}, \ldots, x_n) = \prod_{j=k+1}^{n}\|\tilde y_j\|^2$. As $\tilde S_{j-1} \subset S_{j-1}$ for $j > k$ it follows that
$\|y_j\| = \operatorname{dist}(x_j, S_{j-1}) \le \operatorname{dist}(x_j, \tilde S_{j-1}) = \|\tilde y_j\|, \quad j = k+1, \ldots, n.$
This shows (4.1). Assume now that equality holds in (4.1). Then $\|y_j\| = \|\tilde y_j\|$ for $j = k+1, \ldots, n$. Since $\tilde S_{j-1} \subset S_{j-1}$ and $\tilde y_j - x_j \in \tilde S_{j-1} \subset S_{j-1}$ it follows that $\operatorname{dist}(x_j, S_{j-1}) = \operatorname{dist}(\tilde y_j, S_{j-1}) = \|y_j\|$. Hence $\|\tilde y_j\| = \operatorname{dist}(\tilde y_j, S_{j-1})$. Part (h) of Problem 4.5 yields that $\tilde y_j$ is orthogonal to $S_{j-1}$. In particular each $\tilde y_j$ is orthogonal to $S_k$ for $j = k+1, \ldots, n$. Hence $x_j \perp S_k$ for $j = k+1, \ldots, n$, i.e. $\langle x_j, x_i\rangle = 0$ for $j > k$ and $i \le k$. Clearly, if the last condition holds then $\det G(x_1, \ldots, x_n) = \det G(x_1, \ldots, x_k)\det G(x_{k+1}, \ldots, x_n)$. □
$\det G(x_1, \ldots, x_n)$ has the following geometric meaning. Consider the parallelepiped $\Pi$ in $V$ spanned by $x_1, \ldots, x_n$ starting from the origin $0$. That is, $\Pi$ is the convex hull spanned by the vectors $0$ and $\sum_{i\in S}x_i$ for all nonempty subsets $S \subset \{1, \ldots, n\}$. Then $\sqrt{\det G(x_1, \ldots, x_n)}$ is the $n$-volume of $\Pi$. The inequality (4.1) and the equalities (4.2) are obvious from this geometrical point of view.
Corollary 4.54 Let $0 \le B = (b_{ij})_1^n \in H_n$. Then
$\det B \le \det(b_{ij})_1^k\,\det(b_{ij})_{k+1}^n, \quad \text{for } k = 1, \ldots, n-1.$
For a fixed $k$ equality holds if and only if either the right-hand side of the above inequality is zero or $b_{ij} = 0$ for $i = 1, \ldots, k$ and $j = k+1, \ldots, n$.
Proof. From Corollary 4.52 it follows that $B = X^2$ for some $X \in H_n$. Let $x_1, \ldots, x_n \in \mathbb{C}^n$ be the $n$ columns of $X^\top = (x_1, \ldots, x_n)$. Let $\langle x, y\rangle = y^*x$. Since $X \in H_n$ we deduce that $B = G(x_1, \ldots, x_n)$. □
Theorem 4.55 Let $V$ be an $n$-dimensional IPS. Let $T \in S(V)$. TFAE:
(a) $T > 0$.
(b) Let $g_1, \ldots, g_n$ be a basis of $V$. Then $\det(\langle Tg_i, g_j\rangle)_{i,j=1}^k > 0$, $k = 1, \ldots, n$.
Proof. (a) $\Rightarrow$ (b). According to Proposition 4.51, $T = S^2$ for some $S \in S(V)\cap GL(V)$. Then $\langle Tg_i, g_j\rangle = \langle Sg_i, Sg_j\rangle$. Hence $\det(\langle Tg_i, g_j\rangle)_{i,j=1}^k = \det G(Sg_1, \ldots, Sg_k)$. Since $S$ is invertible and $g_1, \ldots, g_k$ are linearly independent it follows that $Sg_1, \ldots, Sg_k$ are linearly independent. Theorem 4.53 implies that $\det G(Sg_1, \ldots, Sg_k) > 0$ for $k = 1, \ldots, n$.
(b) $\Rightarrow$ (a). The proof is by induction on $n$. For $n = 1$ (a) is obvious. Assume that the implication holds for $n-1$. Let $U := \operatorname{span}(g_1, \ldots, g_{n-1})$ and $Q := Q(T, U)$. Then there exists $P \in S(U)$ such that $\langle Px, y\rangle = Q(x, y) = \langle Tx, y\rangle$ for any $x, y \in U$. By induction $P > 0$. Corollary 4.36 yields that $\lambda_{n-1}(T) \ge \lambda_{n-1}(P) > 0$. Hence $T$ has at least $n-1$ positive eigenvalues. Let $e_1, \ldots, e_n$ be given by (4.3). Then $\det(\langle Te_i, e_j\rangle)_{i,j=1}^n = \prod_{i=1}^{n}\lambda_i(T)$. Let $A = (a_{pq})_1^n \in GL(n, \mathbb{C})$ be the transformation matrix from the basis $g_1, \ldots, g_n$ to $e_1, \ldots, e_n$, i.e.
$g_i = \sum_{p=1}^{n}a_{pi}e_p, \quad i = 1, \ldots, n.$
It is straightforward to show that
$(\langle Tg_i, g_j\rangle)_1^n = A^\top(\langle Te_p, e_q\rangle)_1^n\bar A,$    (4.3)
$\det(\langle Tg_i, g_j\rangle)_1^n = \det(\langle Te_i, e_j\rangle)_1^n\,|\det A|^2 = |\det A|^2\prod_{i=1}^{n}\lambda_i(T).$
Since $\det(\langle Tg_i, g_j\rangle)_1^n > 0$ and $\lambda_1(T) \ge \ldots \ge \lambda_{n-1}(T) > 0$ it follows that $\lambda_n(T) > 0$. □
Corollary 4.56 Let B = (b
ij
)
n
1
H
n
. Then B > 0 if and only if det(b
ij
)
k
1
> 0 for
k = 1, ..., n.
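A quick numerical sanity check of Corollary 4.56 is easy to carry out. The following Python/NumPy snippet is only an illustration (the random matrix and the shift 1e-3 are made up for the example), not part of the notes.

    import numpy as np

    # Corollary 4.56: B > 0 iff all leading principal minors det(b_ij)_1^k are positive.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 4))
    B = X @ X.T + 1e-3 * np.eye(4)                    # a positive definite matrix
    minors = [np.linalg.det(B[:k, :k]) for k in range(1, 5)]
    print(all(m > 0 for m in minors))                 # True: all leading minors positive
    print(np.all(np.linalg.eigvalsh(B) > 0))          # True: B is indeed positive definite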
The following result is straightforward (see Problem 4.5):

Proposition 4.57 Let V be a finite dimensional IPS over F = R, C with the inner product ⟨·,·⟩. Assume that T ∈ S(V). Then T > 0 if and only if (x, y) := ⟨Tx, y⟩ is an inner product on V. Vice versa, any inner product (·,·): V × V → F is of the form (x, y) = ⟨Tx, y⟩ for a unique self-adjoint positive definite operator T ∈ Hom(V).

Example 4.58 Each 0 < B ∈ H_n induces an inner product on C^n: (x, y) = y*Bx. Each 0 < B ∈ S(n, R) induces an inner product on R^n: (x, y) = y^T Bx. Furthermore, any inner product on C^n or R^n is of the above form. In particular, the standard inner products on C^n and R^n are induced by the identity matrix I.

Definition 4.59 Let V be a finite dimensional IPS with the inner product ⟨·,·⟩. Let S ∈ Hom(V). Then S is called symmetrizable if there exists an inner product (·,·) on V such that S is self-adjoint with respect to (·,·).

Definition 4.60 Let A ∈ C^{n×n} and assume that λ_1,...,λ_n are the eigenvalues of A counted with their multiplicities, i.e. det(zI − A) = Π_{i=1}^n (z − λ_i). Let ι_+(A), ι_0(A), ι_−(A) be the number of eigenvalues of A satisfying ℜλ_i > 0, ℜλ_i = 0, ℜλ_i < 0 respectively. (Here ℜz stands for the real part of the complex number z ∈ C.) Then ι(A) := (ι_+(A), ι_0(A), ι_−(A)) is called the inertia of A.

Note that for A ∈ H_n the inertia of A coincides with the inertia defined in Definition 4.38. Furthermore, A is stable if and only if ι_−(A) = n, i.e. ι_+(A) = ι_0(A) = 0.
Theorem 4.61 Let A ∈ C^{n×n} and B ∈ H_n. Then C := A*B + BA ∈ H_n. If C > 0 then A, B are nonsingular and ι(A) = ι(B). In particular, ι_0(A) = 0, i.e. A does not have eigenvalues on the imaginary axis.

Let A ∈ C^{n×n}. If A is stable then for any given C ∈ H_n the linear system A*B + BA = C, in the unknown matrix B ∈ H_n, has a unique solution B.

Moreover, A ∈ C^{n×n} is stable if and only if the system A*B + BA = I has a unique solution B ∈ H_n which is negative definite, i.e. B < 0. (Lyapunov criterion of stability.)

Proof. As B is hermitian, (A*B + BA)* = B*A + A*B = BA + A*B, i.e. C is hermitian. Suppose that Bx = 0. Then x*B = 0* and x*Cx = x*A*(Bx) + (Bx)*Ax = 0. If C > 0 we deduce that x = 0, i.e. B is nonsingular. Hence B^2 > 0. Suppose next that Ax = λx, where λ is purely imaginary, i.e. λ̄ = −λ. Then x*A* = (Ax)* = (λx)* = λ̄x* = −λx*. Hence

x*Cx = x*A*Bx + x*BAx = −λ x*Bx + λ x*Bx = 0.

If C > 0 we deduce that x = 0, i.e. ι_0(A) = 0. Let t ∈ [0, 1] and consider A(t) = (1−t)A + tB. Then C(t) := A(t)*B + BA(t) = (1−t)C + 2tB^2. If C > 0 then C(t) > 0 for each t ∈ [0, 1]. Thus ι_0(A(t)) = 0 for all t ∈ [0, 1]. The eigenvalues of A(t) are continuous functions of t ∈ [0, 1]. Since ι_0(A(t)) = 0 it follows that ι_+(A(t)) and ι_−(A(t)) are constant integers, i.e. they do not depend on t. Hence ι(A(t)) = ι(A(0)) = ι(A) = ι(A(1)) = ι(B).

Assume now that A ∈ C^{n×n} is stable. Fix any C ∈ H_n and consider the equation

A*B + BA = C,   B = X + √−1 Y,   X ∈ S_n(R), Y ∈ AS(n, R).   (4.4)

This is a system of n^2 real valued equations in n^2 real unknowns: the n(n+1)/2 entries of X and the n(n−1)/2 entries of Y. We claim that this system has a unique solution. To show that, it is enough to show that for C = 0 the system (4.4) has only the trivial solution B = 0. Assume to the contrary that the system A*B + BA = 0 has a nontrivial solution 0 ≠ B ∈ H_n. Then B = U diag(D, 0) U*, where D ∈ S_m(R) is a diagonal invertible matrix, 0 ∈ S_{n−m}(R) and m ∈ [1, n]. Let E = UAU*. Then E* diag(D, 0) + diag(D, 0) E = 0. Write E as the block matrix

    [ E_11  E_12 ]
    [ E_21  E_22 ].

Then the matrix equation for E implies that E_12 = E_21 = 0, i.e. E = diag(E_11, E_22), and E_11* D = −D E_11. So E_11* = −D E_11 D^{-1}. The eigenvalues of −D E_11 D^{-1} are the negatives of the eigenvalues of E_11, while the eigenvalues of E_11* are the conjugates of the eigenvalues of E_11. Hence each eigenvalue of E_11 is minus a conjugate of another eigenvalue of E_11. This is impossible if A is stable, since all the eigenvalues of A, and hence of E_11, have negative real parts. Hence B = 0, and the system (4.4) has a unique solution B ∈ H_n for any C ∈ H_n.

Let A ∈ C^{n×n}. Assume first that the system (4.4) for C = I_n has a unique solution B ∈ H_n. The first part of the theorem shows that A, B are nonsingular and ι(A) = ι(B). If B < 0 we deduce that ι_−(A) = n, i.e. A is stable.

Suppose now that A is stable, i.e. ι_−(A) = n. The second part of the theorem implies that the system (4.4) has a unique solution B ∈ H_n for C = I_n. The first part of the theorem implies that ι(B) = ι(A). Hence B < 0. □
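The Lyapunov criterion is easy to test numerically by vectorizing the equation A*B + BA = C with the standard identity vec(XYZ) = (Z^T ⊗ X) vec(Y). The sketch below is only an illustration under stated assumptions (the shift by 3I is a crude way to make a random A stable; all names are mine), not part of the notes.

    import numpy as np

    # Solve A* B + B A = I for a (generically) stable A and check that B < 0.
    rng = np.random.default_rng(1)
    n = 5
    A = rng.standard_normal((n, n)) - 3 * np.eye(n)       # shifted so that A is stable
    C = np.eye(n)
    # vec(A* B) = (I kron A*) vec(B),  vec(B A) = (A^T kron I) vec(B)  (column-major vec)
    M = np.kron(np.eye(n), A.conj().T) + np.kron(A.T, np.eye(n))
    B = np.linalg.solve(M, C.reshape(-1, order="F")).reshape((n, n), order="F")
    print(np.allclose(A.conj().T @ B + B @ A, C))         # B solves the Lyapunov equation
    print(np.max(np.real(np.linalg.eigvals(A))) < 0)      # A is stable
    print(np.all(np.linalg.eigvalsh((B + B.T) / 2) < 0))  # B is negative definite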
Problems

(4.5) Show Proposition 4.57.
4.7 Singular Value Decomposition

Let U, V be finite dimensional IPS over F = R, C, with the inner products ⟨·,·⟩_U, ⟨·,·⟩_V respectively. Let u_1,...,u_m and v_1,...,v_n be bases in U and V respectively. Let T: V → U be a linear operator. In these bases T is represented by a matrix A ∈ F^{m×n}. Let T*: U* ≅ U → V* ≅ V. Then T*T: V → V and TT*: U → U are selfadjoint operators. As

⟨T*Tv, v⟩_V = ⟨Tv, Tv⟩_U ≥ 0,   ⟨TT*u, u⟩_U = ⟨T*u, T*u⟩_V ≥ 0,

it follows that T*T ≥ 0 and TT* ≥ 0. Let

T*T c_i = λ_i(T*T) c_i,   ⟨c_i, c_k⟩_V = δ_{ik},   i, k = 1,...,n,   (4.1)
λ_1(T*T) ≥ ... ≥ λ_n(T*T) ≥ 0,
TT* d_j = λ_j(TT*) d_j,   ⟨d_j, d_l⟩_U = δ_{jl},   j, l = 1,...,m,   (4.2)
λ_1(TT*) ≥ ... ≥ λ_m(TT*) ≥ 0.

Proposition 4.62 Let U, V be finite dimensional IPS over F = R, C and let T: V → U. Then rank T = rank T* = rank T*T = rank TT* = r. Furthermore the selfadjoint nonnegative definite operators T*T and TT* have exactly r positive eigenvalues, and

λ_i(T*T) = λ_i(TT*) > 0,   i = 1,...,rank T.   (4.3)

Moreover, for i ∈ [1, r], Tc_i and T*d_i are eigenvectors of TT* and T*T corresponding to the eigenvalue λ_i(TT*) = λ_i(T*T) respectively. Furthermore, if c_1,...,c_r satisfy (4.1) then d̃_i := Tc_i/||Tc_i||, i = 1,...,r, satisfy (4.2) for i = 1,...,r. A similar result holds for d_1,...,d_r.

Proof. Clearly Tx = 0 ⟺ ⟨Tx, Tx⟩ = 0 ⟺ T*Tx = 0. Hence

rank T*T = rank T = rank T* = rank TT* = r.

Thus T*T and TT* have exactly r positive eigenvalues. Let i ∈ [1, r]. Then T*Tc_i ≠ 0, hence Tc_i ≠ 0. (4.1) yields that TT*(Tc_i) = λ_i(T*T)(Tc_i). Similarly T*T(T*d_i) = λ_i(TT*)(T*d_i) ≠ 0. Hence (4.3) holds. Assume that c_1,...,c_r satisfy (4.1) and let d̃_1,...,d̃_r be defined as above. By definition ||d̃_i|| = 1, i = 1,...,r. Let 1 ≤ i < j ≤ r. Then

0 = λ_i(T*T)⟨c_i, c_j⟩ = ⟨T*Tc_i, c_j⟩ = ⟨Tc_i, Tc_j⟩   ⟹   ⟨d̃_i, d̃_j⟩ = 0.

Hence d̃_1,...,d̃_r is an orthonormal system. □
Let

σ_i(T) := √λ_i(T*T) for i = 1,...,r,   σ_i(T) := 0 for i > r,   (4.4)
σ^{(p)}(T) := (σ_1(T),...,σ_p(T))^T ∈ R_+^p,   p ∈ N.

Then σ_i(T) = σ_i(T*), i = 1,...,min(m, n), are called the singular values of T and T* respectively. Note that the singular values are arranged in decreasing order. The positive singular values are called the principal singular values of T and T* respectively. Note that

||Tc_i||^2 = ⟨Tc_i, Tc_i⟩ = ⟨T*Tc_i, c_i⟩ = λ_i(T*T) = σ_i^2  ⟹  ||Tc_i|| = σ_i,   i = 1,...,n,
||T*d_j||^2 = ⟨T*d_j, T*d_j⟩ = ⟨TT*d_j, d_j⟩ = λ_j(TT*) = σ_j^2  ⟹  ||T*d_j|| = σ_j,   j = 1,...,m.

Let c_1,...,c_n be an orthonormal basis of V satisfying (4.1). Choose an orthonormal basis d_1,...,d_m of U as follows. Set d_i := Tc_i/σ_i, i = 1,...,r, and complete the orthonormal set d_1,...,d_r to an orthonormal basis of U. Since span(d_1,...,d_r) is spanned by all eigenvectors of TT* corresponding to the nonzero eigenvalues of TT*, it follows that ker T* = span(d_{r+1},...,d_m). Hence (4.2) holds. In these orthonormal bases of U and V the operators T and T* are represented quite simply:

Tc_i = σ_i d_i,   i = 1,...,n, where d_i = 0 for i > m,   (4.5)
T*d_j = σ_j c_j,   j = 1,...,m, where c_j = 0 for j > n.

Let

Σ = (s_{ij})_{i,j=1}^{m,n},   s_{ij} = 0 for i ≠ j,   s_{ii} = σ_i for i = 1,...,min(m, n).   (4.6)

In the case m ≠ n we call Σ a quasi-diagonal matrix with the diagonal σ_1,...,σ_{min(m,n)}. Then in the bases [d_1,...,d_m] and [c_1,...,c_n] the operators T and T* are represented by the matrices Σ and Σ^T respectively.

Corollary 4.63 Let [u_1,...,u_m], [v_1,...,v_n] be orthonormal bases in the vector spaces U, V over F = R, C respectively. Then T and T* are represented by the matrices A ∈ F^{m×n} and A* ∈ F^{n×m} respectively. Let U ∈ U(m) and V ∈ U(n) be the unitary matrices representing the change of basis [d_1,...,d_m] to [u_1,...,u_m] and [c_1,...,c_n] to [v_1,...,v_n] respectively. (If F = R then U and V are orthogonal matrices.) Then

A = UΣV* ∈ F^{m×n},   U ∈ U(m), V ∈ U(n).   (4.7)

Proof. By definition Tv_j = Σ_{i=1}^m a_{ij} u_i. Let U = (u_{ip})_{i,p=1}^m, V = (v_{jq})_{j,q=1}^n. Then

Tc_q = Σ_{j=1}^n v_{jq} Tv_j = Σ_{j=1}^n v_{jq} Σ_{i=1}^m a_{ij} u_i = Σ_{j=1}^n v_{jq} Σ_{i=1}^m a_{ij} Σ_{p=1}^m u_{ip} d_p.

Use the first equality of (4.5) to deduce that U*AV = Σ. □

Definition 4.64 (4.7) is called the singular value decomposition (SVD) of A.
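A small numerical illustration of (4.7), assuming NumPy is available (the random 5×3 matrix is made up for the example), is the following sketch.

    import numpy as np

    # SVD: A = U Sigma V* with U, V unitary and Sigma the quasi-diagonal matrix of (4.6).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 3))
    U, s, Vh = np.linalg.svd(A)                    # full SVD: U is 5x5, Vh is 3x3
    Sigma = np.zeros((5, 3))
    Sigma[:3, :3] = np.diag(s)                     # quasi-diagonal Sigma
    print(np.allclose(U @ Sigma @ Vh, A))          # A = U Sigma V*
    print(np.allclose(np.linalg.eigvalsh(A.T @ A)[::-1], s**2))  # sigma_i^2 = lambda_i(A^T A)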
Proposition 4.65 Let F = R, C and denote by R_{m,n,k}(F) ⊆ F^{m×n} the set of all matrices of rank at most k ∈ [1, min(m, n)]. Then A ∈ R_{m,n,k}(F) if and only if A can be expressed as a sum of at most k matrices of rank 1. Furthermore, R_{m,n,k}(F) is a variety in M_{m×n}(F) given by the polynomial conditions: each (k+1) × (k+1) minor of A is equal to zero.

For the proof see Problem 4.20.
Theorem 4.66 For F = R, C and A = (a_{ij}) ∈ F^{m×n} the following conditions hold:

||A||_F := √(tr A*A) = √(tr AA*) = ( Σ_{i=1}^{rank A} σ_i(A)^2 )^{1/2}.   (4.8)

||A||_2 := max_{x∈F^n, ||x||_2=1} ||Ax||_2 = σ_1(A).   (4.9)

min_{B ∈ R_{m,n,k}(F)} ||A − B||_2 = σ_{k+1}(A),   k = 1,...,rank A − 1.   (4.10)

σ_i(A) ≥ σ_i( (a_{i_p j_q})_{p=1,q=1}^{m',n'} ) ≥ σ_{i+(m−m')+(n−n')}(A),   (4.11)
m' ∈ [1, m], n' ∈ [1, n],   1 ≤ i_1 < ... < i_{m'} ≤ m,   1 ≤ j_1 < ... < j_{n'} ≤ n.

Proof. The proof of (4.8) is left as Problem 4.21. We now show the equality in (4.9). View A as an operator A: R^n → R^m. From the definition of ||A||_2 it follows that

||A||_2^2 = max_{0≠x∈R^n} (x*A*Ax)/(x*x) = λ_1(A*A) = σ_1(A)^2,

which proves (4.9).

We now prove (4.10). In the SVD decomposition (4.7) of A assume that U = (u_1,...,u_m) and V = (v_1,...,v_n). Then (4.7) is equivalent to the following representation of A:

A = Σ_{i=1}^r σ_i u_i v_i*,   u_1,...,u_r ∈ R^m, v_1,...,v_r ∈ R^n,   u_i*u_j = v_i*v_j = δ_{ij},   i, j = 1,...,r,   (4.12)

where r = rank A. Let B = Σ_{i=1}^k σ_i u_i v_i* ∈ R_{m,n,k}. Then in view of (4.9)

||A − B||_2 = || Σ_{i=k+1}^r σ_i u_i v_i* ||_2 = σ_{k+1}.

Let B ∈ R_{m,n,k}. To show (4.10) it is enough to show that ||A − B||_2 ≥ σ_{k+1}. Let

W := {x ∈ R^n : Bx = 0}.

Then codim W ≤ k. Furthermore

||A − B||_2^2 ≥ max_{||x||_2=1, x∈W} ||(A − B)x||_2^2 = max_{||x||_2=1, x∈W} x*A*Ax ≥ λ_{k+1}(A*A) = σ_{k+1}^2,

where the last inequality follows from the min-max characterization of λ_{k+1}(A*A).

Let C = (a_{i j_q})_{i,q=1}^{m,n'}. Then C*C is a principal submatrix of A*A of dimension n'. The interlacing inequalities between the eigenvalues of A*A and C*C yield (4.11) for m' = m. Let D = (a_{i_p j_q})_{p,q=1}^{m',n'}. Then DD* is a principal submatrix of CC*. Use the interlacing properties of the eigenvalues of CC* and DD* to deduce (4.11). □
Corollary 4.67 Let U and V be finite dimensional IPS over F = R, C and let T: V → U be a linear operator. Then

||T||_F := √(tr T*T) = √(tr TT*) = ( Σ_{i=1}^{rank T} σ_i(T)^2 )^{1/2}.   (4.13)

||T||_2 := max_{x∈V, ||x||_2=1} ||Tx||_2 = σ_1(T).   (4.14)

min_{Q∈L(V,U), rank Q≤k} ||T − Q||_2 = σ_{k+1}(T),   k = 1,...,rank T − 1.   (4.15)
Theorem 4.68 Let F = R, C and assume that A ∈ M_{m×n}(F). Define

H(A) = [ 0  A ; A*  0 ] ∈ H_{m+n}(F).   (4.16)

Then

λ_i(H(A)) = σ_i(A),   λ_{m+n+1−i}(H(A)) = −σ_i(A),   i = 1,...,rank A,   (4.17)
λ_j(H(A)) = 0,   j = rank A + 1,...,n + m − rank A.

View A as an operator A: F^n → F^m. Choose orthonormal bases [d_1,...,d_m] in F^m and [c_1,...,c_n] in F^n as in Proposition 4.62. Then

[ 0  A ; A*  0 ] [ d_i ; c_i ] = σ_i(A) [ d_i ; c_i ],   [ 0  A ; A*  0 ] [ d_i ; −c_i ] = −σ_i(A) [ d_i ; −c_i ],   i = 1,...,rank A,   (4.18)

ker H(A) = span( (d_{r+1}*, 0)^T,...,(d_m*, 0)^T, (0, c_{r+1}*)^T,...,(0, c_n*)^T ),   r = rank A.

Proof. It is straightforward to show the equalities (4.18). Since all the eigenvectors appearing in (4.18) are linearly independent, we deduce (4.17). □
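The spectrum of H(A) in (4.17) can be checked directly; the following NumPy sketch (random 4×3 matrix, illustrative only) confirms that the eigenvalues are ±σ_i(A) together with the required number of zeros.

    import numpy as np

    # Eigenvalues of H(A) = [[0, A], [A*, 0]] are +/- sigma_i(A) plus m+n-2*rank(A) zeros.
    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 3))                 # m = 4, n = 3, rank A = 3 generically
    H = np.block([[np.zeros((4, 4)), A], [A.T, np.zeros((3, 3))]])
    s = np.linalg.svd(A, compute_uv=False)
    ev = np.sort(np.linalg.eigvalsh(H))[::-1]
    expected = np.sort(np.concatenate([s, -s, [0.0]]))[::-1]   # one zero since m+n-2r = 1
    print(np.allclose(ev, expected))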
Theorem 4.69 Let A, B ∈ C^{m×n}, and assume that σ_1(A) ≥ σ_2(A) ≥ ... ≥ 0, σ_1(B) ≥ σ_2(B) ≥ ... ≥ 0, where σ_i(A) = 0 and σ_j(B) = 0 for i > rank A and j > rank B respectively. Then

−Σ_{i=1}^m σ_i(A)σ_i(B) ≤ ℜ tr AB* = ℜ tr A*B ≤ Σ_{i=1}^m σ_i(A)σ_i(B).   (4.19)

Equality holds if A and B have common left and right singular vectors x_1,...,x_m and y_1,...,y_n corresponding to σ_1(A),... and σ_1(B),..., respectively.

Proof. Note that tr AB* = tr B*A and ℜ tr AB* = ℜ tr (AB*)* = ℜ tr BA*. Observe next that tr H(A)H(B) = 2ℜ tr AB*. Combine Theorems 4.68 and 4.44 to deduce the theorem. □
Corollary 4.70 For A ∈ C^{m×n},

min_{B ∈ R_{m,n,k}(F)} ||A − B||_F^2 = Σ_{i=k+1}^m σ_i(A)^2.

Proof. Let B ∈ C^{m×n}. Then

||A − B||_F^2 = tr (A − B)(A − B)* = ||A||_F^2 + ||B||_F^2 − 2ℜ tr AB* = Σ_{i=1}^m ( σ_i(A)^2 + σ_i(B)^2 ) − 2ℜ tr AB*.

Assume now that B ∈ R_{m,n,k}. Then rank B ≤ k; denote by x_1 ≥ ... ≥ x_k ≥ 0 the nonzero singular values of B. Use Theorem 4.69 to deduce that

||A − B||_F^2 ≥ ||A||_F^2 + Σ_{i=1}^k x_i^2 − 2 Σ_{i=1}^k σ_i(A) x_i = Σ_{i=k+1}^m σ_i(A)^2 + Σ_{i=1}^k (x_i − σ_i(A))^2 ≥ Σ_{i=k+1}^m σ_i(A)^2.

Let A = Σ_{i=1}^m σ_i(A) u_i v_i* be the singular value decomposition of A. Choose B = Σ_{i=1}^k σ_i(A) u_i v_i* ∈ R_{m,n,k} to see that ||A − B||_F^2 = Σ_{i=k+1}^m σ_i(A)^2. □
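The Frobenius-norm version of the best rank-k approximation can be checked with the same truncated-SVD construction; a short hedged NumPy sketch (random data, illustrative only):

    import numpy as np

    # Corollary 4.70: min over rank-k B of ||A - B||_F^2 = sum of trailing sigma_i(A)^2.
    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 5))
    U, s, Vh = np.linalg.svd(A)
    k = 2
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
    print(np.isclose(np.linalg.norm(A - Ak, 'fro')**2, np.sum(s[k:]**2)))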
Define R^n_{+,≥} := R^n_≥ ∩ R^n_+, the set of nonnegative vectors with nonincreasing coordinates. Then D ⊆ R^n_{+,≥} is called a strong Schur set if for any x, y ∈ R^n_{+,≥} with x ≼ y we have the implication y ∈ D ⟹ x ∈ D.

Theorem 4.71 Let p ∈ N and let D ⊆ R^p_≥ ∩ R^p_+ be a regular convex strong Schur domain. Fix m, n ∈ N and let R^{(p)}(D) := {A ∈ F^{m×n} : σ^{(p)}(A) ∈ D}. Let h: D → R be convex and strongly Schur order preserving on D. Let f: R^{(p)}(D) → R be given as h ∘ σ^{(p)}. Then f is a convex function.

See Problem ??.

Corollary 4.72 Let F = R, C, m, n, p ∈ N, q ∈ [1, ∞) and w_1 ≥ w_2 ≥ ... ≥ w_p > 0. Then the following function is a convex function:

f: F^{m×n} → R,   f(A) := ( Σ_{i=1}^p w_i σ_i(A)^q )^{1/q},   A ∈ F^{m×n}.

See Problem 4.22.
Theorem 4.73 Let U be an IPS over C and let T: U → U be a linear operator. Then ρ(T) ≤ ||T||_2. Furthermore, equality holds if and only if the following conditions hold:
(a) T and T* have a common eigenvector x such that Tx = λx, T*x = λ̄x and |λ| = ρ(T).
(b) Let T_1 be the restriction of T to the invariant subspace V := span(x)^⟂. Then ||T_1||_2 ≤ ρ(T).

Proof. Let Tx = λx where ||x|| = 1 and ρ(T) = |λ|. Recall that ||T||_2 = σ_1(T), where σ_1(T)^2 = λ_1(T*T) is the maximal eigenvalue of the self-adjoint operator T*T. The maximum characterization of λ_1(T*T) yields that |λ|^2 = ⟨Tx, Tx⟩ = ⟨T*Tx, x⟩ ≤ λ_1(T*T) = ||T||_2^2. Hence ρ(T) ≤ ||T||_2.

Assume now that ρ(T) = ||T||_2. If ρ(T) = 0 then ||T||_2 = 0, hence T = 0, and the theorem holds trivially in this case. Assume that ρ(T) > 0. Then the eigenvector x_1 := x is also an eigenvector of T*T corresponding to λ_1(T*T) = |λ|^2. Hence |λ|^2 x = T*Tx = T*(λx), which implies that T*x = λ̄x. Let V := span(x)^⟂ be the orthogonal complement of span(x). Since T span(x) = span(x) it follows that T*V ⊆ V. Similarly, since T* span(x) = span(x), TV ⊆ V. Thus U = span(x) ⊕ V, and span(x), V are invariant subspaces of T and T*. Hence span(x), V are invariant subspaces of T*T and TT*. Let T_1 be the restriction of T to V. Then T_1*T_1 is the restriction of T*T. Therefore ||T_1||_2^2 = λ_1(T_1*T_1) ≤ λ_1(T*T) = ||T||_2^2. This establishes the second part of the theorem, labeled (a) and (b).

Conversely, the above arguments imply that conditions (a) and (b) of the theorem yield the equality ρ(T) = ||T||_2. □
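A numerical check of ρ(T) ≤ ||T||_2, and of the equality case for a symmetric matrix (for which conditions (a) and (b) hold), can be done as in the following hedged NumPy sketch (random data, illustrative only).

    import numpy as np

    # Theorem 4.73: the spectral radius is bounded by the operator norm;
    # for a symmetric (hence normal) matrix |lambda_i| coincide with the singular values.
    rng = np.random.default_rng(5)
    T = rng.standard_normal((4, 4))
    rho = np.max(np.abs(np.linalg.eigvals(T)))
    print(rho <= np.linalg.norm(T, 2) + 1e-12)              # rho(T) <= sigma_1(T)
    N = T + T.T                                             # symmetric matrix
    print(np.allclose(np.sort(np.abs(np.linalg.eigvalsh(N)))[::-1],
                      np.linalg.svd(N, compute_uv=False)))  # |lambda_i(N)| = sigma_i(N)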
Corollary 4.74 Let U be an n-dimensional IPS over C and let T: U → U be a linear operator. Denote by |λ(T)| = (|λ_1(T)|,...,|λ_n(T)|)^T the absolute values of the eigenvalues of T (counted with their multiplicities), arranged in decreasing order. Then |λ(T)| = (σ_1(T),...,σ_n(T))^T if and only if T is a normal operator.
Problems

(4.20) Prove Proposition 4.65. (Use the SVD to prove the nontrivial part of the Proposition.)

(4.21) Prove the equalities in (4.8).

(4.22) a. Prove Corollary 4.72.
b. Recall the definition of a norm on a vector space over F = R, C. Show that the function f defined in Corollary 4.72 is a norm. For p = min(m, n) and w_1 = ... = w_p = 1 this norm is called the q-Schatten norm.

1. Let A ∈ S_n(R) and assume that A = Q^T ΛQ, where Q ∈ O(n, R) and Λ = diag(λ_1,...,λ_n) is a diagonal matrix with |λ_1| ≥ ... ≥ |λ_n| ≥ 0.
(a) Find the SVD of A.
(b) Show that σ_1(A) = max(λ_1(A), |λ_n(A)|), where λ_1(A) ≥ ... ≥ λ_n(A) are the n eigenvalues of A arranged in decreasing order.

2. Let k, m, n be positive integers such that k ≤ min(m, n). Show that the function f: R^{m×n} → [0, ∞) given by f(A) = Σ_{i=1}^k σ_i(A) is a convex function on R^{m×n}.
4.8 Moore-Penrose generalized inverse

Let A ∈ C^{m×n}. Then (4.12) is called the reduced SVD of A. It can be written as

A = U_r Σ_r V_r*,   r = rank A,   Σ_r := diag(σ_1(A),...,σ_r(A)) ∈ S_r(R),   (4.23)
U_r = [u_1,...,u_r] ∈ C^{m×r},   V_r = [v_1,...,v_r] ∈ C^{n×r},   U_r*U_r = V_r*V_r = I_r.

Recall that

AA*u_i = σ_i(A)^2 u_i,   A*Av_i = σ_i(A)^2 v_i,   v_i = (1/σ_i(A)) A*u_i,   u_i = (1/σ_i(A)) Av_i,   i = 1,...,r.

Then

A† := V_r Σ_r^{-1} U_r* ∈ C^{n×m}   (4.24)

is the Moore-Penrose generalized inverse of A. If A ∈ R^{m×n} then we assume that U_r ∈ R^{m×r} and V_r ∈ R^{n×r}, i.e. U_r, V_r are matrices over the real numbers R.
Theorem 4.75 Let A ∈ C^{m×n}. Then the Moore-Penrose generalized inverse A† ∈ C^{n×m} satisfies the following properties.

1. rank A = rank A†.

2. AA†A = A and A†AA† = A†.

3. A†A and AA† are Hermitian nonnegative definite idempotent matrices, i.e. (A†A)^2 = A†A and (AA†)^2 = AA†, having the same rank as A.

4. The least squares problem for Ax = b, i.e. the system A*Ax = A*b, has the solution y = A†b. This solution has the minimal norm ||y|| among all solutions of A*Ax = A*b.

5. If rank A = n then A† = (A*A)^{-1}A*. In particular, if A ∈ C^{n×n} is invertible then A† = A^{-1}.
Proposition 4.76 Let E ∈ C^{l×m}, G ∈ C^{m×n}. Then rank EG ≤ min(rank E, rank G). If l = m and E is invertible then rank EG = rank G. If m = n and G is invertible then rank EG = rank E.

Proof. Let e_1,...,e_m ∈ C^l and g_1,...,g_n ∈ C^m be the columns of E and G respectively. Then rank E = dim span(e_1,...,e_m). Observe that EG = [Eg_1,...,Eg_n] ∈ C^{l×n}. Clearly Eg_i is a linear combination of the columns of E. Hence Eg_i ∈ span(e_1,...,e_m). Therefore span(Eg_1,...,Eg_n) ⊆ span(e_1,...,e_m), which implies that rank EG ≤ rank E. Note that (EG)^T = G^T E^T. Hence rank EG = rank (EG)^T ≤ rank G^T = rank G. Thus rank EG ≤ min(rank E, rank G). Suppose E is invertible. Then rank EG ≤ rank G = rank E^{-1}(EG) ≤ rank EG. Hence rank EG = rank G. Similarly rank EG = rank E if G is invertible. □
Proof of Theorem 4.75.

1. Proposition 4.76 yields that rank A† = rank V_r Σ_r^{-1} U_r* ≤ rank Σ_r^{-1} U_r* ≤ rank Σ_r^{-1} = r = rank A. Since Σ_r^{-1} = V_r* A† U_r, Proposition 4.76 yields that rank A† ≥ rank Σ_r^{-1} = r. Hence rank A = rank A†.

2. AA† = (U_r Σ_r V_r*)(V_r Σ_r^{-1} U_r*) = U_r Σ_r Σ_r^{-1} U_r* = U_r U_r*. Hence AA†A = (U_r U_r*)(U_r Σ_r V_r*) = U_r Σ_r V_r* = A, and A†AA† = (V_r Σ_r^{-1} U_r*)(U_r U_r*) = V_r Σ_r^{-1} U_r* = A†. Similarly A†A = V_r V_r*.

3. Since AA† = U_r U_r* we deduce that (AA†)* = (U_r U_r*)* = U_r U_r* = AA†, i.e. AA† is Hermitian. Next (AA†)^2 = (U_r U_r*)(U_r U_r*) = U_r U_r* = AA†, i.e. AA† is idempotent. Hence AA† is nonnegative definite. As AA† = U_r I_r U_r*, the arguments of part 1 yield that rank AA† = r. Similar arguments apply to A†A = V_r V_r*.

4. Since A*AA† = A* it follows that A*A(A†b) = A*b, i.e. y = A†b is a least squares solution. It is left to show that if A*Ax = A*b then ||x|| ≥ ||A†b||, and equality holds if and only if x = A†b.

We now consider the system A*Ax = A*b. To analyze this system we use the full form of the SVD given in (4.7). The system is equivalent to (VΣ^T U*)(UΣV*)x = VΣ^T U*b. Multiplying by V* we obtain the system Σ^T Σ(V*x) = Σ^T(U*b). Let z = (z_1,...,z_n)^T := V*x and c = (c_1,...,c_m)^T := U*b. Note that z*z = x*VV*x = x*x, i.e. ||z|| = ||x||. After these substitutions the least squares system in the variables z_1,...,z_n is given in the form σ_i(A)^2 z_i = σ_i(A) c_i for i = 1,...,n. Since σ_i(A) = 0 for i > r we obtain that z_i = c_i/σ_i(A) for i = 1,...,r, while z_{r+1},...,z_n are free variables. Thus ||z||^2 = Σ_{i=1}^r |c_i|^2/σ_i(A)^2 + Σ_{i=r+1}^n |z_i|^2. Hence the least squares solution with the minimal length ||z|| is the one with z_i = 0 for i = r+1,...,n. This solution corresponds to x = A†b.

5. Since rank A*A = rank A = n, it follows that A*A is an invertible matrix. Hence the least squares solution is unique and is given by x = (A*A)^{-1}A*b. Thus for each b, (A*A)^{-1}A*b = A†b, hence A† = (A*A)^{-1}A*. If A is an n×n invertible matrix it follows that (A*A)^{-1}A* = A^{-1}(A*)^{-1}A* = A^{-1}. □
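The construction (4.24) and the minimal-norm least squares property of part 4 are easy to exercise numerically; the following is a hedged NumPy sketch (the rank-deficient test matrix and tolerance are made up for the example).

    import numpy as np

    # Moore-Penrose inverse from the reduced SVD, compared with np.linalg.pinv;
    # A^dagger b solves the normal equations A* A x = A* b.
    rng = np.random.default_rng(6)
    A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # 6x4, rank 2
    b = rng.standard_normal(6)
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = np.sum(s > 1e-10 * s[0])
    Adag = (Vh[:r].T / s[:r]) @ U[:, :r].T          # A^dagger = V_r Sigma_r^{-1} U_r*
    print(np.allclose(Adag, np.linalg.pinv(A)))
    y = Adag @ b
    print(np.allclose(A.T @ A @ y, A.T @ b))        # y is a least squares solution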
Problems

1. P ∈ C^{n×n} is called a projection if P^2 = P. Show that P is a projection if and only if the following two conditions are satisfied:
Each eigenvalue of P is either 0 or 1.
P is a diagonable matrix.

2. P ∈ R^{n×n} is called an orthogonal projection if P is a projection and a symmetric matrix. Let V ⊆ R^n be the subspace spanned by the columns of P. Show that for any a ∈ R^n and b ∈ V, ||a − b|| ≥ ||a − Pa||, and equality holds if and only if b = Pa. That is, Pa is the orthogonal projection of a on the column space of P.

3. Let A ∈ R^{m×n} and assume that the SVD of A is given by (4.7), where U ∈ O(m, R), V ∈ O(n, R).
(a) What is the SVD of A^T?
(b) Show that (A^T)† = (A†)^T.
(c) Suppose that B ∈ R^{l×m}. Is it true that (BA)† = A†B†? Justify!
4.9 Rank-constrained matrix approximations

Let A ∈ C^{m×n} and let A = U_A Σ_A V_A* be the SVD of A given in (4.7). Let U_A = [u_1 u_2 ... u_m], V_A = [v_1 v_2 ... v_n] be the representations of U_A, V_A in terms of their m and n columns respectively. Then

P_{A,L} := Σ_{i=1}^{rank A} u_i u_i* ∈ C^{m×m},   P_{A,R} := Σ_{i=1}^{rank A} v_i v_i* ∈ C^{n×n},   (4.25)

are the orthogonal projections on the range of A and of A* respectively. Denote A_k := Σ_{i=1}^k σ_i(A) u_i v_i* ∈ C^{m×n} for k = 1,...,rank A. For k > rank A we define A_k := A (= A_{rank A}). For 1 ≤ k < rank A, the matrix A_k is uniquely defined if and only if σ_k(A) > σ_{k+1}(A).
Theorem 4.77 Let A ∈ C^{m×n}, B ∈ C^{m×p}, C ∈ C^{q×n} be given. Then X = B†(P_{B,L} A P_{C,R})_k C† is a solution to the minimal problem

min_{X ∈ R_{p,q,k}(C)} ||A − BXC||_F,   (4.26)

having the minimal ||X||_F. This solution is unique if and only if either k ≥ rank P_{B,L} A P_{C,R}, or 1 ≤ k < rank P_{B,L} A P_{C,R} and σ_k(P_{B,L} A P_{C,R}) > σ_{k+1}(P_{B,L} A P_{C,R}).

Proof. Recall that the Frobenius norm is invariant under multiplication from the left and the right by unitary matrices. Hence ||A − BXC||_F = ||Ã − Σ_B X̃ Σ_C||_F, where Ã := U_B* A V_C and X̃ := V_B* X U_C. Clearly, X and X̃ have the same rank and the same Frobenius norm. Thus it is enough to consider the minimal problem min_{X̃ ∈ R_{p,q,k}(C)} ||Ã − Σ_B X̃ Σ_C||_F. Let s = rank B, t = rank C. Clearly, if B or C is a zero matrix, then X = 0 is the solution to the minimal problem (4.26). In this case either P_{B,L} or P_{C,R} is a zero matrix, and the theorem holds trivially.

It is left to consider the case s ≥ 1, t ≥ 1. Define B_1 := diag(σ_1(B),...,σ_s(B)) ∈ C^{s×s}, C_1 := diag(σ_1(C),...,σ_t(C)) ∈ C^{t×t}. Partition Ã and X̃ into 2×2 block matrices Ã = [A_{ij}]_{i,j=1}^2 and X̃ = [X_{ij}]_{i,j=1}^2, where A_11 ∈ C^{s×t} and X_11 ∈ C^{s×t}. (For certain values of s and t we may have to partition Ã or X̃ into fewer than 2×2 blocks.) Observe next that Z := Σ_B X̃ Σ_C = [Z_{ij}]_{i,j=1}^2, where Z_11 = B_1 X_11 C_1 and all other blocks Z_{ij} are zero matrices. Hence

||Ã − Z||_F^2 = ||A_11 − Z_11||_F^2 + Σ_{2<i+j≤4} ||A_{ij}||_F^2 ≥ ||A_11 − (A_11)_k||_F^2 + Σ_{2<i+j≤4} ||A_{ij}||_F^2.

Thus X̃ = [X_{ij}]_{i,j=1}^2 with X_11 = B_1^{-1}(A_11)_k C_1^{-1} and X_{ij} = 0 for all (i, j) ≠ (1, 1) is a solution of min_{X̃ ∈ R_{p,q,k}(C)} ||Ã − Σ_B X̃ Σ_C||_F with the minimal Frobenius norm. This solution is unique if and only if the solution Z_11 = (A_11)_k is the unique solution to min_{Z_11 ∈ R_{s,t,k}(C)} ||A_11 − Z_11||_F. This happens if either k ≥ rank A_11, or 1 ≤ k < rank A_11 and σ_k(A_11) > σ_{k+1}(A_11). A straightforward calculation shows that X̃ = Σ_B†(P_{Σ_B,L} Ã P_{Σ_C,R})_k Σ_C†. This shows that X = B†(P_{B,L} A P_{C,R})_k C† is a solution of (4.26) with the minimal Frobenius norm. This solution is unique if and only if either k ≥ rank P_{B,L} A P_{C,R}, or 1 ≤ k < rank P_{B,L} A P_{C,R} and σ_k(P_{B,L} A P_{C,R}) > σ_{k+1}(P_{B,L} A P_{C,R}). □
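The minimizer of Theorem 4.77 is straightforward to assemble from truncated SVDs and pseudoinverses. The following NumPy sketch is my own illustration of the formula X = B†(P_{B,L} A P_{C,R})_k C† (helper names, data and tolerance are assumptions, not from the notes).

    import numpy as np

    def best_rank_k(M, k):
        U, s, Vh = np.linalg.svd(M, full_matrices=False)
        return U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]

    def range_projector(M, tol=1e-10):
        U, s, _ = np.linalg.svd(M, full_matrices=False)
        r = np.sum(s > tol * s[0])
        return U[:, :r] @ U[:, :r].conj().T           # orthogonal projection on range(M)

    rng = np.random.default_rng(7)
    A = rng.standard_normal((6, 5))
    B = rng.standard_normal((6, 4))
    C = rng.standard_normal((3, 5))
    k = 2
    P_BL = range_projector(B)                         # projection on range(B)
    P_CR = range_projector(C.conj().T)                # projection on range(C*)
    X = np.linalg.pinv(B) @ best_rank_k(P_BL @ A @ P_CR, k) @ np.linalg.pinv(C)
    print(np.linalg.matrix_rank(X) <= k, np.linalg.norm(A - B @ X @ C, 'fro'))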
4.10 Generalized Singular Value Decomposition

See [4] for more details on this section.

Proposition 4.78 Let 0 < M ∈ S_m(R), 0 < N ∈ S_n(R) be positive definite symmetric matrices. Let ⟨x, y⟩_M := y^T M^2 x and ⟨u, v⟩_N := v^T N^2 u, for x, y ∈ R^m, u, v ∈ R^n, be inner products in R^m and R^n respectively. Let A ∈ R^{m×n} and view A as a linear operator A: R^n → R^m, x ↦ Ax. Denote by A^c: R^m → R^n the adjoint operator with respect to the inner products ⟨·,·⟩_M, ⟨·,·⟩_N, that is, ⟨Au, y⟩_M = ⟨u, A^c y⟩_N. Then A^c = N^{-2} A^T M^2.

Proof. Clearly ⟨Au, y⟩_M = y^T M^2 Au and ⟨u, A^c y⟩_N = (A^c y)^T N^2 u. Hence (A^c)^T N^2 = M^2 A. Take the transpose of this identity and multiply by N^{-2} from the left to deduce A^c = N^{-2} A^T M^2. □
Theorem 4.79 Let the assumptions of Proposition 4.78 hold. Then the generalized singular value decomposition (GSVD) of A is

A = UΣV^T,   Σ = diag(σ_1,...,σ_r, 0,...,0) ∈ R^{m×n},   σ_1 ≥ ... ≥ σ_r > 0,   σ_i = 0 for i > r := rank A,   (4.27)
U ∈ GL(m, R),   V ∈ GL(n, R),   U^T M^2 U = I_m,   V^T N^{-2} V = I_n.

Proof. Identify R^n and R^m, with the inner products ⟨·,·⟩_N, ⟨·,·⟩_M, with IPS V, U respectively. Identify A, A^c with the linear operators T: V → U, T*: U → V respectively. Apply Proposition 4.62. Then v_1,...,v_n and u_1,...,u_m are orthonormal sets of eigenvectors of A^cA and AA^c respectively, corresponding to the eigenvalues σ_1^2,...,σ_n^2 and σ_1^2,...,σ_m^2:

N^{-2} A^T M^2 A v_i = σ_i^2 v_i,   v_j^T N^2 v_i = δ_{ij},   i, j = 1,...,n,   V = N^2 [v_1 ... v_n],
A N^{-2} A^T M^2 u_i = σ_i^2 u_i,   u_j^T M^2 u_i = δ_{ij},   i, j = 1,...,m,   U = [u_1 ... u_m],   (4.28)
u_i = (1/σ_i) A v_i,   i = 1,...,r = rank A.

To justify the decomposition A = UΣV^T, choose a vector v ∈ R^n and write it as v = Σ_{i=1}^n ν_i v_i. Then Av = Σ_{i=1}^n ν_i Av_i. Since σ_i = 0 for i > r it follows that Av_i = 0 for i > r, and Av_i = σ_i u_i for i = 1,...,r. Hence Av = Σ_{i=1}^r ν_i σ_i u_i. Compare this with UΣV^T v_i = UΣ[v_1 ... v_n]^T N^2 v_i, which is equal to σ_i u_i if i ≤ r and 0 if i > r. Hence A = UΣV^T. □
Corollary 4.80 Let the assumptions and the notations of Theorem 4.79 hold. Then for k ∈ [1, r]

A_k := U_k Σ_k V_k^T = Σ_{i=1}^k σ_i u_i v_i^T N^2,   U_k ∈ R^{m×k},   V_k ∈ R^{n×k},   (4.29)
Σ_k := diag(σ_1,...,σ_k),   U_k^T M^2 U_k = V_k^T N^{-2} V_k = I_k,   U_k = [u_1,...,u_k],   V_k = N^2 [v_1,...,v_k],

is the best rank-k approximation to A in the Frobenius and the operator norms with respect to the inner products ⟨·,·⟩_M, ⟨·,·⟩_N on R^m, R^n respectively.
Theorem 4.81 Let A ∈ R^{m×n} and B ∈ R^{l×n}. Then there exists a generalized (common) singular value decomposition of A and B, called the GSVD, of the form

A = U_r Γ_r(A) V_r^T,   B = W_r Γ_r(B) V_r^T,
Γ_r(A) = diag(γ_1(A),...,γ_r(A)),   Γ_r(B) = diag(γ_1(B),...,γ_r(B)),
γ_i(A)^2 + γ_i(B)^2 = 1 for i = 1,...,r,   r := rank [A^T B^T],   (4.30)
U_r^T U_r = W_r^T W_r = V_r^T N^{-2} V_r = I_r.

Let V ⊆ R^n be the subspace spanned by the columns of A^T and B^T. Then 0 < N ∈ S_n(R) is any positive definite matrix such that NV = V and N^2|V = (A^T A + B^T B)|V. Furthermore, the GSVD of A and B is obtained as follows. Let P := A^T A + B^T B. Then rank P = r. Let

P = Q_r Ω_r^2 Q_r^T,   Q_r^T Q_r = I_r,   Q_r := [q_1 ... q_r],   Ω_r := diag(√λ_1,...,√λ_r),   (4.31)
P q_i = λ_i q_i,   q_j^T q_i = δ_{ij},   i, j = 1,...,n,   λ_1 ≥ ... ≥ λ_r > 0 = λ_{r+1} = ... = λ_n,

be the spectral decomposition of P. Define

C_A := Ω_r^{-1} Q_r^T A^T A Q_r Ω_r^{-1},   C_B := Ω_r^{-1} Q_r^T B^T B Q_r Ω_r^{-1} ∈ R^{r×r}.

Then C_A + C_B = I_r. Let C_A = R Γ_r(A)^2 R^T, R ∈ R^{r×r}, R^T R = I_r, be the spectral decomposition of C_A. Then C_B = R Γ_r(B)^2 R^T is the spectral decomposition of C_B. Furthermore, V_r = Q_r Ω_r R. The orthonormal columns of U_r and W_r corresponding to positive singular values γ_i(A) and γ_j(B) are uniquely determined by the equalities U_r Γ_r(A) = A Q_r Ω_r^{-1} R and W_r Γ_r(B) = B Q_r Ω_r^{-1} R. The other columns of U_r and W_r form any set of orthonormal vectors in R^m and R^l respectively which are orthogonal to the previously determined columns of U_r and W_r respectively.

Proof. We first prove the identities (4.30). Assume that Q_r Ω_r^{-1} R = [v_1,...,v_r]. Clearly, range P = span(q_1,...,q_r) = span(v_1,...,v_r), and range(A^T), range(B^T) ⊆ range P. Hence ker P = span(q_1,...,q_r)^⟂ ⊆ ker A and ker P ⊆ ker B.

Let U_r = [u_1 ... u_r] ∈ R^{m×r}. From the equality U_r Γ_r(A) = A Q_r Ω_r^{-1} R we deduce that Av_i = γ_i(A) u_i for i = 1,...,r. The equality

Γ_r(A) U_r^T U_r Γ_r(A) = (A Q_r Ω_r^{-1} R)^T (A Q_r Ω_r^{-1} R) = Γ_r(A)^2

implies that the columns of U_r corresponding to positive γ_i(A) form an orthonormal system. Since V_r = Q_r Ω_r R we obtain V_r^T Q_r Ω_r^{-1} R = I_r. Hence U_r Γ_r(A) V_r^T v_i = γ_i(A) u_i for i = 1,...,r. Therefore A = U_r Γ_r(A) V_r^T. Similarly B = W_r Γ_r(B) V_r^T.

From the definitions of P, Q_r, C_A, C_B it follows that C_A + C_B = I_r. Hence Γ_r(A)^2 + Γ_r(B)^2 = I_r. The other claims of the theorem follow straightforwardly. □
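The construction of Theorem 4.81 can be followed almost verbatim in code. The NumPy sketch below is my own hedged rendering of that recipe (helper names, tolerance and the random low-rank test data are assumptions, not from the notes or from [4]).

    import numpy as np

    def gsvd_pair(A, B, tol=1e-10):
        P = A.T @ A + B.T @ B                        # P = A^T A + B^T B, rank P = r
        lam, Q = np.linalg.eigh(P)
        keep = lam > tol * lam.max()
        lam, Q = lam[keep], Q[:, keep]               # Q_r and the r positive eigenvalues
        Om = np.sqrt(lam)                            # Omega_r
        W = Q / Om                                   # Q_r Omega_r^{-1}
        C_A = W.T @ (A.T @ A) @ W                    # C_A = Om^{-1} Q^T A^T A Q Om^{-1}
        a2, R = np.linalg.eigh(C_A)                  # C_A = R diag(gamma_i(A)^2) R^T
        alpha = np.sqrt(np.clip(a2, 0.0, 1.0))       # gamma_i(A)
        beta = np.sqrt(np.clip(1.0 - a2, 0.0, 1.0))  # gamma_i(B), since C_A + C_B = I_r
        V = (Q * Om) @ R                             # V_r = Q_r Omega_r R
        return alpha, beta, V, A @ W @ R, B @ W @ R  # last two are U_r Gamma_r(A), W_r Gamma_r(B)

    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 2)) @ rng.standard_normal((2, 7))
    B = rng.standard_normal((9, 2)) @ rng.standard_normal((2, 7))
    alpha, beta, V, AG, BG = gsvd_pair(A, B)
    print(np.allclose(AG @ V.T, A), np.allclose(BG @ V.T, B))   # A = U_r Gamma_r(A) V_r^T, etc.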
We now discuss a numerical example in [4] which shows the sensitivity of the GSVD of two matrices. We first generate at random two matrices A_0 ∈ R^{8×7} and B_0 ∈ R^{9×7}, where rank A_0 = rank B_0 = 2 and rank [A_0^T B_0^T] = 3. This is done as follows. Choose at random x_1, x_2 ∈ R^8, y_1, y_2 ∈ R^9, z_1, z_2, z_3 ∈ R^7. Then A_0 = x_1 z_1^T + x_2 z_2^T, B_0 = y_1 z_1^T + y_2 z_3^T. The first three singular values of A_0 and of B_0 are:

27455.5092631633888, 17374.6830503566089, 3.14050409246786192·10^{-12},
29977.5429571960522, 19134.3838220483449, 3.52429226420727071·10^{-12},

i.e. the ranks of A_0 and B_0 are 2 within double precision. The first four singular values of P_0 = A_0^T A_0 + B_0^T B_0 are

1.32179857269680762·10^9, 6.04366385186753988·10^8, 3.94297368116438210·10^8, 1.34609524647135614·10^{-7}.

Again rank P_0 = 3 within double precision. The 3 generalized singular values of A_0 and B_0 given by Theorem 4.81 are:

α_1 = 1,   α_2 = 0.6814262563,   α_3 = 3.777588180·10^{-9},
β_1 = 0,   β_2 = 0.7318867789,   β_3 = 1.

So A_0 and B_0 have one common generalized right singular vector v_2 with the corresponding singular values α_2, β_2, which are relatively close numbers. The right singular vector v_1 is present only in A_0 and the right singular vector v_3 is present only in B_0.

We next perturb A_0, B_0 by letting A = A_0 + X, B = B_0 + Y, where X ∈ R^{8×7}, Y ∈ R^{9×7}. The entries of X and Y were chosen at random, with ||X|| ≈ 0.01·||A_0|| and ||Y|| ≈ 0.01·||B_0||. The matrices A := A_0 + X, B := B_0 + Y have full rank, with singular values rounded off to at least three significant digits:

(27490, 17450, 233, 130, 119, 70.0, 18.2),   (29884, 19183, 250, 187, 137, 102, 19.7).

We now replace A, B by A_1, B_1 of rank two, using the first two singular values and the corresponding singular vectors in the SVD decompositions of A, B. The two nonzero singular values of A_1, B_1 are (27490, 17450) and (29883, 19183), rounded to five significant digits. The singular values of the corresponding P_1 = A_1^T A_1 + B_1^T B_1 are, up to 3 significant digits,

(1.32·10^9, 6.07·10^8, 3.96·10^8, 1.31·10^4, 0.068, 9.88·10^{-3}, 6.76·10^{-3}).   (4.32)

Assume that r = 3, i.e. P_1 has three significant singular values. We now apply Theorem 4.81. The three generalized singular values of A_1, B_1 are

(1.000000000, 0.6814704276, 0.7582758358·10^{-8}),   (0., 0.7318456506, 1.0).

These results match the generalized singular values of A_0, B_0 to at least four significant digits. Let Ṽ, Ũ_1, W̃_1 be the matrix V, the first two columns of U, and the last two columns of W, computed for A_1, B_1. Then

||V − Ṽ||/||V|| ≈ 0.0061,   ||U_1 − Ũ_1|| ≈ 0.0093,   ||W_1 − W̃_1|| ≈ 0.0098.

Finally we discuss the critical issue of choosing correctly the number of significant singular values of the noised matrices A, B and of the corresponding matrix P = A^T A + B^T B. Assume that the numerical rank of P_1 is 4, that is, in Theorem 4.81 assume that r = 4. Then the four generalized singular values of A_1, B_1, up to six significant digits, are (1, 1, 0, 0), (0, 0, 1, 1)!
5 Tensors

5.1 Introduction

The common notion of tensors in mathematics is associated with differential geometry, covariant and contravariant derivatives, Christoffel symbols and Einstein's theory of general relativity. In engineering and in other applied areas such as biology and psychology, the relevant mathematical notions are related to multilinear algebra. Of course the notions in the two mentioned fields are related.

5.2 Tensor product of two vector spaces

Given two vector spaces U, V over F = R, C, one first defines U ⊗ V as the linear span of all vectors of the form u ⊗ v, where u ∈ U, v ∈ V, satisfying the following natural properties:

a(u ⊗ v) = (au) ⊗ v = u ⊗ (av) for all a ∈ F.

(a_1 u_1 + a_2 u_2) ⊗ v = a_1 (u_1 ⊗ v) + a_2 (u_2 ⊗ v) for all a_1, a_2 ∈ F. (Linearity in the first variable.)

u ⊗ (a_1 v_1 + a_2 v_2) = a_1 (u ⊗ v_1) + a_2 (u ⊗ v_2) for all a_1, a_2 ∈ F. (Linearity in the second variable.)

If u_1,...,u_m and v_1,...,v_n are bases in U and V respectively, then u_i ⊗ v_j, i = 1,...,m, j = 1,...,n, is a basis in U ⊗ V.

The element u ⊗ v is called a decomposable tensor, or decomposable element (vector), or rank one tensor. It is not difficult to show that U ⊗ V always exists.
Example 1. Let U be the space of all polynomials in the variable x of degree less than m: p(x) = Σ_{i=0}^{m-1} a_i x^i with coefficients in F. Let V be the space of all polynomials in the variable y of degree less than n: q(y) = Σ_{j=0}^{n-1} b_j y^j with coefficients in F. Then U ⊗ V is identified with the vector space of all polynomials in two variables x, y of the form f(x, y) = Σ_{i,j=0}^{m-1,n-1} c_{ij} x^i y^j with coefficients in F. The decomposable elements are p(x)q(y), p ∈ U, q ∈ V. Tensor products of this kind are the basic tool for solving PDE (partial differential equations) using separation of variables, i.e. Fourier series.

Example 2. Let U = F^m, V = F^n. Then U ⊗ V is identified with the space of m × n matrices F^{m×n}. The decomposable tensor u ⊗ v is identified with uv^T. Note that uv^T is indeed a rank one matrix.

Assume now that in addition U, V are IPS with the inner products ⟨·,·⟩_U, ⟨·,·⟩_V. Then there exists a unique inner product ⟨·,·⟩_{U⊗V} which satisfies the property

⟨u ⊗ v, x ⊗ y⟩_{U⊗V} = ⟨u, x⟩_U ⟨v, y⟩_V for all u, x ∈ U and v, y ∈ V.

This follows from the fact that if u_1,...,u_m and v_1,...,v_n are orthonormal bases in U and V respectively, then u_i ⊗ v_j, i = 1,...,m, j = 1,...,n, is an orthonormal basis in U ⊗ V.

In Example 2, if one takes the standard inner products in F^m and F^n then these inner products induce the following inner product in F^{m×n}: ⟨A, B⟩ = tr AB*. If A = uv^T, B = xy^T then tr AB* = (x*u)(y*v).
Any τ ∈ U ⊗ V can be viewed as a linear transformation τ_{U,V}: U → V or τ_{V,U}: V → U as follows. Assume for simplicity that U, V are IPS over R. Then

(u ⊗ v)_{U,V}: U → V is given by x ↦ ⟨x, u⟩_U v,
(u ⊗ v)_{V,U}: V → U is given by y ↦ ⟨y, v⟩_V u.

Since any τ ∈ U ⊗ V is a linear combination of rank one tensors, equivalently a linear combination of rank one matrices, the above definitions extend to any τ ∈ U ⊗ V. Thus if A ∈ F^{m×n} = F^m ⊗ F^n then A_{F^m,F^n} u = A^T u and A_{F^n,F^m} v = Av.

For τ ∈ U ⊗ V, rank_U τ := rank τ_{V,U} and rank_V τ := rank τ_{U,V}. The rank of τ, denoted by rank τ, is the minimal length of a decomposition of τ into a sum of rank one nonzero tensors: τ = Σ_{i=1}^k u_i ⊗ v_i, where u_i, v_i ≠ 0 for i = 1,...,k.

Proposition 5.1 Let τ ∈ U ⊗ V. Then rank τ = rank τ_{U,V} = rank τ_{V,U}.

Proof. Let τ = Σ_{i=1}^k u_i ⊗ v_i. Then τ_{V,U}(v) = Σ_{i=1}^k ⟨v, v_i⟩_V u_i ∈ span(u_1,...,u_k). Hence Range τ_{V,U} ⊆ span(u_1,...,u_k). Therefore

rank τ_{V,U} = dim Range τ_{V,U} ≤ dim span(u_1,...,u_k) ≤ k.

(It is possible that u_1,...,u_k are linearly dependent.) Since τ is represented by a matrix A we know that rank τ_{U,V} = rank A^T = rank A = rank τ_{V,U}. Also rank τ is the minimal number k such that A is represented as a sum of k rank one matrices. The singular value decomposition of A yields that one can represent A as a sum of rank A rank one matrices. □
Let T_i: U_i → V_i be linear operators. They induce a linear operator T_1 ⊗ T_2: U_1 ⊗ U_2 → V_1 ⊗ V_2 such that (T_1 ⊗ T_2)(u_1 ⊗ u_2) = (T_1 u_1) ⊗ (T_2 u_2) for all u_1 ∈ U_1, u_2 ∈ U_2. We will see in the next section that T_1 ⊗ T_2 is a special 4 tensor, i.e. T_1 ⊗ T_2 ∈ U_1 ⊗ V_1 ⊗ U_2 ⊗ V_2.

If furthermore P_i: V_i → W_i, i = 1, 2, then we have the following composition rule: (P_1 ⊗ P_2)(T_1 ⊗ T_2) = (P_1 T_1) ⊗ (P_2 T_2). This equality follows from

(P_1 ⊗ P_2)((T_1 ⊗ T_2)(u_1 ⊗ u_2)) = (P_1 ⊗ P_2)(T_1 u_1 ⊗ T_2 u_2) = (P_1 T_1 u_1) ⊗ (P_2 T_2 u_2).

Since each linear operator T_i: U_i → V_i, i = 1, 2, is represented by a matrix, one can reduce the definition of T_1 ⊗ T_2 to the notion of the tensor product of two matrices A ∈ F^{m_1×n_1}, B ∈ F^{m_2×n_2}. This tensor product is called the Kronecker product.
Let A = [a_{ij}]_{i,j=1}^{m_1,n_1} ∈ F^{m_1×n_1}. Then A ⊗ B ∈ F^{m_1m_2 × n_1n_2} is the following block matrix:

A ⊗ B := [ a_{11}B   a_{12}B   ...  a_{1n_1}B
           a_{21}B   a_{22}B   ...  a_{2n_1}B
             ...       ...     ...    ...
           a_{m_11}B a_{m_12}B ...  a_{m_1n_1}B ]   (5.1)

Let us try to understand the logic of this notation. Let x = [x_1,...,x_{n_1}]^T ∈ F^{n_1} = F^{n_1×1}, y = [y_1,...,y_{n_2}]^T ∈ F^{n_2} = F^{n_2×1}. Then we view x ⊗ y as a column vector of dimension n_1 n_2 given by the above formula. So the coordinates of x ⊗ y are x_j y_l, where the double indices (j, l) are arranged in the lexicographic order (the order of a dictionary):

(1, 1), (1, 2),...,(1, n_2), (2, 1),...,(2, n_2),...,(n_1, 1),...,(n_1, n_2).

The entries of A ⊗ B are a_{ij} b_{kl} = c_{(i,k)(j,l)}. So we view (i, k) as the row index of A ⊗ B and (j, l) as the column index of A ⊗ B. Then

[(A ⊗ B)(x ⊗ y)]_{(i,k)} = Σ_{j,l=1}^{n_1,n_2} c_{(i,k)(j,l)} x_j y_l = ( Σ_{j=1}^{n_1} a_{ij} x_j )( Σ_{l=1}^{n_2} b_{kl} y_l ) = (Ax)_i (By)_k,

as it should be according to our notation. Thus, as in the operator case,

(A_1 ⊗ A_2)(B_1 ⊗ B_2) = (A_1 B_1) ⊗ (A_2 B_2)   if A_i ∈ F^{m_i×n_i}, B_i ∈ F^{n_i×l_i}, i = 1, 2.

Note that I_m ⊗ I_n = I_{mn}. Moreover, if A and B are diagonal matrices then A ⊗ B is a diagonal matrix. If A and B are upper or lower triangular then A ⊗ B is upper or lower triangular respectively. So if A ∈ GL(m, F), B ∈ GL(n, F) then (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}. Also (A ⊗ B)^T = A^T ⊗ B^T. So if A, B are symmetric then A ⊗ B is symmetric. If A and B are orthogonal matrices then A ⊗ B is orthogonal. The following results follow straightforwardly from the above properties.
Proposition 5.2 The following facts hold.

1. Let A_i ∈ R^{m_i×n_i} for i = 1, 2. Assume that A_i = U_i Σ_i V_i^T, U_i ∈ O(m_i, R), V_i ∈ O(n_i, R), is the standard SVD decomposition for i = 1, 2. Then A_1 ⊗ A_2 = (U_1 ⊗ U_2)(Σ_1 ⊗ Σ_2)(V_1^T ⊗ V_2^T) is a singular value decomposition of A_1 ⊗ A_2, except that the diagonal entries of Σ_1 ⊗ Σ_2 are not arranged in decreasing order. In particular, all the nonzero singular values of A_1 ⊗ A_2 are of the form σ_i(A_1)σ_j(A_2), where i = 1,...,rank A_1 and j = 1,...,rank A_2. Hence rank A_1 ⊗ A_2 = rank A_1 · rank A_2.

2. Let A_k ∈ F^{n_k×n_k}, k = 1, 2. Assume that det(zI_{n_k} − A_k) = Π_{i=1}^{n_k} (z − λ_{i,k}) for k = 1, 2. Then det(zI_{n_1n_2} − A_1 ⊗ A_2) = Π_{i,j=1}^{n_1,n_2} (z − λ_{i,1}λ_{j,2}).

3. Assume that A_i ∈ S_{n_i}(R) and that A_i = Q_i Λ_i Q_i^T is the spectral decomposition of A_i, i.e. Q_i is orthogonal and Λ_i is diagonal, for i = 1, 2. Then A_1 ⊗ A_2 = (Q_1 ⊗ Q_2)(Λ_1 ⊗ Λ_2)(Q_1^T ⊗ Q_2^T) is the spectral decomposition of A_1 ⊗ A_2 ∈ S_{n_1n_2}(R).
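Part 1 of the proposition is easy to verify numerically with np.kron; the following hedged NumPy sketch (random data, illustrative only) checks that the singular values of A_1 ⊗ A_2 are exactly the products σ_i(A_1)σ_j(A_2).

    import numpy as np

    rng = np.random.default_rng(8)
    A1 = rng.standard_normal((3, 4))
    A2 = rng.standard_normal((2, 5))
    K = np.kron(A1, A2)
    s1 = np.linalg.svd(A1, compute_uv=False)
    s2 = np.linalg.svd(A2, compute_uv=False)
    prod = np.sort(np.outer(s1, s2).ravel())[::-1]          # all products sigma_i(A1) sigma_j(A2)
    print(np.allclose(np.linalg.svd(K, compute_uv=False), prod))
    print(np.linalg.matrix_rank(K) == np.linalg.matrix_rank(A1) * np.linalg.matrix_rank(A2))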
In the next section we will show that the singular value decomposition of A_1 ⊗ A_2 gives a minimal decomposition of the 4 tensor A_1 ⊗ A_2. In the rest of this section we discuss the symmetric and skew symmetric tensor products of U ⊗ U.

Definition: Let U be a vector space of dimension m over F = R, C. Denote U^{⊗2} := U ⊗ U. The subspace Sym^2 U ⊆ U^{⊗2}, called the 2-symmetric power of U, is spanned by the tensors of the form sym_2(u, v) := u ⊗ v + v ⊗ u for all u, v ∈ U. sym_2(u, v) = sym_2(v, u) is called the 2-symmetric product of u and v, or simply a symmetric product. Any vector in Sym^2 U is called a 2-symmetric tensor, or simply a symmetric tensor. The subspace Λ^2 U ⊆ U^{⊗2}, called the 2-exterior power of U, is spanned by all tensors of the form u ∧ v := u ⊗ v − v ⊗ u, for all u, v ∈ U. u ∧ v = −v ∧ u is called the wedge product of u and v. Any vector in Λ^2 U is called a 2-skew symmetric tensor, or simply a skew symmetric tensor.

Since 2u ⊗ v = sym_2(u, v) + u ∧ v it follows that U^{⊗2} = Sym^2(U) ⊕ Λ^2 U. That is, any tensor τ ∈ U^{⊗2} can be decomposed uniquely into a sum τ = τ_s + τ_a, where τ_s ∈ Sym^2 U and τ_a ∈ Λ^2 U are symmetric and skew symmetric tensors respectively.

In terms of matrices, we identify U with F^m and U^{⊗2} with F^{m×m}, i.e. the algebra of m × m matrices. Then Sym^2 U is identified with S_m(F) ⊆ F^{m×m}, the space of m × m symmetric matrices A^T = A, and Λ^2 U is identified with AS(m, F), the space of m × m skew symmetric matrices A^T = −A. Note that any matrix A ∈ F^{m×m} is of the form A = (1/2)(A + A^T) + (1/2)(A − A^T), which is the unique decomposition into a sum of a symmetric and a skew symmetric matrix.

Proposition 5.3 Let U be a finite dimensional space over F = R, C and let T: U → U be any linear operator. Then Sym^2 U and Λ^2 U are invariant subspaces of T^{⊗2} := T ⊗ T: U^{⊗2} → U^{⊗2}.

Proof. Observe that T^{⊗2} sym_2(u, v) = sym_2(Tu, Tv) and T^{⊗2}(u ∧ v) = (Tu) ∧ (Tv). □

It can be shown that Sym^2 U and Λ^2 U are the only subspaces invariant under T^{⊗2} for all choices of linear transformations T: U → U.
5.3 Tensor product of many vector spaces

Let U_i be vector spaces of dimension m_i for i = 1,...,k over F = R, C. Then U := ⊗_{i=1}^k U_i = U_1 ⊗ U_2 ⊗ ... ⊗ U_k is the tensor product space of U_1,...,U_k, of dimension m_1 m_2 ... m_k. U is spanned by the decomposable tensors ⊗_{i=1}^k u_i = u_1 ⊗ u_2 ⊗ ... ⊗ u_k, also called rank one tensors, where u_i ∈ U_i for i = 1,...,k. As in the case k = 2 we have the basic identity

a(u_1 ⊗ u_2 ⊗ ... ⊗ u_k) = (au_1) ⊗ u_2 ⊗ ... ⊗ u_k = u_1 ⊗ (au_2) ⊗ ... ⊗ u_k = ... = u_1 ⊗ u_2 ⊗ ... ⊗ (au_k).

Also, the above decomposable tensor is multilinear in each variable. The definition and the existence of k-fold tensor products can be given recursively as follows. For k = 2, U_1 ⊗ U_2 is defined in the previous section. Then one defines recursively ⊗_{i=1}^k U_i as (⊗_{i=1}^{k-1} U_i) ⊗ U_k for k = 3,....

⊗_{j=1}^k u_{i_j,j},   i_j = 1,...,m_j,   j = 1,...,k,   is a basis of ⊗_{i=1}^k U_i   (5.2)

if u_{1,i},...,u_{m_i,i} is a basis of U_i for i = 1,...,k.

Assume now that in addition each U_i is an IPS with the inner product ⟨·,·⟩_{U_i}, i = 1,...,k. Then there exists a unique inner product ⟨·,·⟩_{⊗_{i=1}^k U_i} which satisfies the property

⟨ ⊗_{i=1}^k u_i, ⊗_{i=1}^k v_i ⟩_{⊗_{i=1}^k U_i} = Π_{i=1}^k ⟨u_i, v_i⟩_{U_i}   for all u_i, v_i ∈ U_i, i = 1,...,k.

In particular, if u_{1,i},...,u_{m_i,i} is an orthonormal basis in U_i, for i = 1,...,k, then ⊗_{j=1}^k u_{i_j,j}, where i_j = 1,...,m_j and j = 1,...,k, is an orthonormal basis in ⊗_{i=1}^k U_i.
Then any τ ∈ ⊗_{i=1}^k U_i can be represented as

τ = Σ_{i_j ∈ [1, m_j], j=1,...,k} t_{i_1...i_k} ⊗_{j=1}^k u_{i_j,j}.   (5.3)

The above decomposition is called the TUCKER model. The array T = (t_{i_1...i_k})_{i_1,...,i_k=1}^{m_1,...,m_k} is called the core tensor. If u_{1,i},...,u_{m_i,i} is an orthonormal basis in U_i, for i = 1,...,k, then the TUCKER model is referred to as the Higher-Order Singular Value Decomposition, or HOSVD. The core tensor T is called diagonal if t_{i_1 i_2 ... i_k} = 0 whenever the equality i_1 = ... = i_k is not satisfied.

We now discuss the change in the core tensor when we replace the basis [u_{1,i},...,u_{m_i,i}] of U_i by the basis [v_{1,i},...,v_{m_i,i}] of U_i, for i = 1,...,k. We first recall the change in the coordinates of a vector u ∈ U when we change the basis [u_1,...,u_m] to the basis [v_1,...,v_m] in U. Let u = Σ_{i=1}^m x_i u_i. Then x := [x_1,...,x_m]^T is the coordinate vector of u in the basis [u_1,...,u_m]. It is convenient to write u = [u_1,...,u_m]x. Suppose that [v_1,...,v_m] is another basis in U. Then Q = [q_{ij}]_{i,j=1}^m ∈ F^{m×m} is called the transition matrix from the basis [u_1,...,u_m] to the basis [v_1,...,v_m] if

[u_1,...,u_m] = [v_1,...,v_m]Q,   i.e.   u_i = Σ_{j=1}^m q_{ji} v_j,   i = 1,...,m.

So Q^{-1} is the transition matrix from the basis [v_1,...,v_m] to [u_1,...,u_m], i.e. [v_1,...,v_m] = [u_1,...,u_m]Q^{-1}. Hence

u = [u_1,...,u_m]x = [v_1,...,v_m](Qx),   i.e. y = Qx is the coordinate vector of u in the basis [v_1,...,v_m].

Thus if Q_l = [q_{ij,l}]_{i,j=1}^{m_l} ∈ F^{m_l×m_l} is the transition matrix from the basis [u_{1,l},...,u_{m_l,l}] to the basis [v_{1,l},...,v_{m_l,l}] for l = 1,...,k, then the core tensor T' = (t'_{j_1...j_k}) corresponding to the new basis ⊗_{j=1}^k v_{i_j,j} is given by the formula

t'_{j_1,...,j_k} = Σ_{i_1,...,i_k=1}^{m_1,...,m_k} ( Π_{l=1}^k q_{j_l i_l,l} ) t_{i_1...i_k},   denoted as T' = T × Q_1 × Q_2 × ... × Q_k.   (5.4)
Any tensor τ can be decomposed into a sum of rank one tensors:

τ = Σ_{i=1}^R ⊗_{l=1}^k u_{l,i},   where u_{l,i} ∈ U_l for i = 1,...,R, l = 1,...,k.   (5.5)

This decomposition is called the CANDECOMP-PARAFAC decomposition. It is not unique. For example, we can obtain a CANDECOMP-PARAFAC decomposition from a Tucker decomposition by replacing each nonzero term t_{i_1...i_k} ⊗_{j=1}^k u_{i_j,j}, t_{i_1...i_k} ≠ 0, with (t_{i_1...i_k} u_{i_1,1}) ⊗ u_{i_2,2} ⊗ ... ⊗ u_{i_k,k}.

The minimal value of R is called the tensor rank of τ, and is denoted by rank τ. That is, rank τ is the minimal number of rank one tensors in a decomposition of τ into a sum of rank one tensors. In general, it is a difficult problem to determine the exact value of rank τ for τ ∈ ⊗_{i=1}^k U_i and k ≥ 3.
Any τ ∈ ⊗_{i=1}^k U_i can be viewed as a linear transformation

τ_{⊗_{l=1}^p U_{i_l}, ⊗_{l=1}^{p'} U_{i'_l}} : ⊗_{l=1}^p U_{i_l} → ⊗_{l=1}^{p'} U_{i'_l},   1 ≤ i_1 < ... < i_p ≤ k,   1 ≤ i'_1 < ... < i'_{p'} ≤ k,   (5.6)

where the two sets of nonempty indices {i_1,...,i_p}, {i'_1,...,i'_{p'}} are complementary, i.e. 1 ≤ p, p' < k, p + p' = k, {i_1,...,i_p} ∩ {i'_1,...,i'_{p'}} = ∅ and {i_1,...,i_p} ∪ {i'_1,...,i'_{p'}} = {1,...,k}. The above transformation is obtained by contracting the indices i_1,...,i_p. Assume for simplicity that U_i is an IPS over R for i = 1,...,k. Then for a decomposable tensor the transformation (5.6) is given as

( ⊗_{i=1}^k u_i )( ⊗_{l=1}^p v_{i_l} ) = ( Π_{l=1}^p ⟨v_{i_l}, u_{i_l}⟩_{U_{i_l}} ) ⊗_{l=1}^{p'} u_{i'_l}.   (5.7)

For example, for k = 3, (u_1 ⊗ u_2 ⊗ u_3)(v_2) = ⟨u_2, v_2⟩_{U_2} u_1 ⊗ u_3, where u_1 ∈ U_1, u_2, v_2 ∈ U_2, u_3 ∈ U_3, and p = 1, i_1 = 2, p' = 2, i'_1 = 1, i'_2 = 3. Then

rank_{⊗_{l=1}^p U_{i_l}, ⊗_{l=1}^{p'} U_{i'_l}} τ := dim Range τ_{⊗_{l=1}^p U_{i_l}, ⊗_{l=1}^{p'} U_{i'_l}}.   (5.8)
It is easy to compute the above ranks, as each such rank is the rank of the corresponding matrix representing the linear transformation. As in the case of matrices, it is straightforward to show that

rank_{⊗_{l=1}^p U_{i_l}, ⊗_{l=1}^{p'} U_{i'_l}} τ = rank_{⊗_{l=1}^{p'} U_{i'_l}, ⊗_{l=1}^p U_{i_l}} τ.   (5.9)

In view of the above equalities, for k = 3, i.e. τ ∈ U_1 ⊗ U_2 ⊗ U_3, we have three different ranks:

rank_{U_1} τ := rank_{U_2⊗U_3, U_1} τ,   rank_{U_2} τ := rank_{U_1⊗U_3, U_2} τ,   rank_{U_3} τ := rank_{U_1⊗U_2, U_3} τ.

Thus rank_{U_1} τ can be viewed as the dimension of the subspace of U_1 obtained by all possible contractions of τ with respect to U_2, U_3.

In general, let

rank_i τ = rank_{U_1⊗...⊗U_{i-1}⊗U_{i+1}⊗...⊗U_k, U_i} τ,   i = 1,...,k.   (5.10)
Let τ ∈ ⊗_{i=1}^k U_i be fixed. Choose a basis of U_i such that u_{1,i},...,u_{rank_i τ, i} is a basis of Range τ_{U_1⊗...⊗U_{i-1}⊗U_{i+1}⊗...⊗U_k, U_i}. Then we obtain a more precise version of the TUCKER decomposition:

τ = Σ_{i_j ∈ [1, rank_j τ], j=1,...,k} t_{i_1...i_k} ⊗_{j=1}^k u_{i_j,j}.   (5.11)
The following lower bound on rank τ can be computed easily.

Proposition 5.4 Let U_i be a vector space of dimension m_i for i = 1,...,k ≥ 2, and let τ ∈ ⊗_{i=1}^k U_i. Then for any set of complementary indices {i_1,...,i_p}, {i'_1,...,i'_{p'}},

rank τ ≥ rank_{⊗_{l=1}^p U_{i_l}, ⊗_{l=1}^{p'} U_{i'_l}} τ.

Proof. Let τ be of the form (5.5). Then

rank_{⊗_{l=1}^p U_{i_l}, ⊗_{l=1}^{p'} U_{i'_l}} τ = dim Range τ_{⊗_{l=1}^p U_{i_l}, ⊗_{l=1}^{p'} U_{i'_l}} = dim span( ⊗_{l=1}^{p'} u_{i'_l,1},...,⊗_{l=1}^{p'} u_{i'_l,R} ) ≤ R.   □
Proposition 5.5 Let U_i be a vector space of dimension m_i for i = 1,...,k ≥ 2, and let τ ∈ ⊗_{i=1}^k U_i. Then

rank τ ≤ (m_1 ⋯ m_k) / max_{i∈[1,k]} m_i.

Proof. The proof is by induction on k. For k = 2, recall that any τ ∈ U_1 ⊗ U_2 can be represented by a matrix A ∈ F^{m_1×m_2}. Hence rank τ = rank A ≤ min(m_1, m_2) = m_1m_2/max(m_1, m_2). Assume that the proposition holds for k = n−1 ≥ 2 and let k = n. By permuting the factors U_1,...,U_n, one can assume that m_n ≤ m_i for i = 1,...,n−1. Let u_{1,n},...,u_{m_n,n} be a basis of U_n. It is straightforward to show that τ ∈ ⊗_{i=1}^n U_i is of the form τ = Σ_{p=1}^{m_n} τ_p ⊗ u_{p,n} for unique τ_p ∈ ⊗_{i=1}^{n-1} U_i. Decompose each τ_p into a minimal sum of rank one tensors in ⊗_{i=1}^{n-1} U_i. Now use the induction hypothesis for each τ_p to obtain a decomposition of τ as a sum of at most (m_1 ⋯ m_n)/max_{i∈[1,n]} m_i rank one tensors. □
5.4 Examples and results for 3-tensors

Let us start with the simplest case U_1 = U_2 = U_3 = R^2 or C^2. Let e_1 = [1, 0]^T, e_2 = [0, 1]^T be the standard basis in R^2 or C^2. Then τ = Σ_{i,j,k=1}^2 t_{ijk} e_i ⊗ e_j ⊗ e_k, with core T = (t_{ijk})_{i,j,k=1}^2. Let T_k := [t_{ijk}]_{i,j=1}^2 for k = 1, 2. Any B = [b_{ij}]_{i,j=1}^2 ∈ F^{2×2} we identify with the tensor Σ_{i,j=1}^2 b_{ij} e_i ⊗ e_j. Using this identification we view τ = T_1 ⊗ e_1 + T_2 ⊗ e_2.
Example 1: Assume that

t_111 = t_112 = 1,   t_221 = t_222 = 2,   t_211 = t_121 = t_212 = t_122 = 0.   (5.12)

To find R_1 = rank_{U_2⊗U_3, U_1} τ, we construct a matrix A_1 := [a_{pq,1}]_{p,q=1}^{2,4} ∈ R^{2×4}, where p = i = 1, 2 and q = (j, k), j, k = 1, 2, and a_{pq,1} = t_{ijk}. Then

A_1 = [ 1 1 0 0
        0 0 2 2 ],

and R_1 = rank A_1 = 2. Next A_2 := [a_{pq,2}]_{p,q=1}^{2,4} ∈ R^{2×4}, where p = i = 1, 2 and q = (j, k), j, k = 1, 2, and a_{pq,2} = t_{jik}. Then A_2 = A_1 and R_2 = rank A_2 = 2. Next A_3 := [a_{pq,3}]_{p,q=1}^{2,4} ∈ R^{2×4}, where p = i = 1, 2 and q = (j, k), j, k = 1, 2, and a_{pq,3} = t_{jki}. Then

A_3 = [ 1 0 0 2
        1 0 0 2 ],

and R_3 = rank A_3 = 1. Hence Proposition 5.4 yields that rank τ ≥ 2. We claim that rank τ = 2 as a tensor over the real or the complex numbers. Observe that

T_1 = T_2 = [ 1 0
              0 2 ] = e_1 ⊗ e_1 + 2 e_2 ⊗ e_2,

so τ = (e_1 ⊗ e_1 + 2 e_2 ⊗ e_2) ⊗ e_1 + (e_1 ⊗ e_1 + 2 e_2 ⊗ e_2) ⊗ e_2. Hence

τ = (e_1 ⊗ e_1 + 2 e_2 ⊗ e_2) ⊗ (e_1 + e_2) = e_1 ⊗ e_1 ⊗ (e_1 + e_2) + (2e_2) ⊗ e_2 ⊗ (e_1 + e_2).

Thus rank τ ≤ 2 and we finally deduce that rank τ = 2.
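The three unfolding ranks R_1, R_2, R_3 used above are just matrix ranks of reshaped arrays, so they are easy to compute mechanically. The following hedged NumPy sketch (my own illustration) reproduces the numbers of Example 1.

    import numpy as np

    # Tensor of Example 1: t_111 = t_112 = 1, t_221 = t_222 = 2, all other entries 0.
    T = np.zeros((2, 2, 2))
    T[0, 0, 0] = T[0, 0, 1] = 1.0
    T[1, 1, 0] = T[1, 1, 1] = 2.0
    R1 = np.linalg.matrix_rank(T.reshape(2, 4))                      # unfold along the first index
    R2 = np.linalg.matrix_rank(np.moveaxis(T, 1, 0).reshape(2, 4))   # unfold along the second index
    R3 = np.linalg.matrix_rank(np.moveaxis(T, 2, 0).reshape(2, 4))   # unfold along the third index
    print(R1, R2, R3)   # 2 2 1, matching the computation above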
Example 2: Assume that

t_211 = t_121 = t_112 = 1,   t_111 = t_222 = t_122 = t_212 = t_221 = 0.   (5.13)

Let A_1, A_2, A_3 be the matrices defined as in Example 1. Then

A_1 = A_2 = A_3 = [ 0 1 1 0
                    1 0 0 0 ].

Hence R_1 = R_2 = R_3 = 2 and rank τ ≥ 2. Observe next that

T_1 = [ 0 1
        1 0 ] = e_1 ⊗ e_2 + e_2 ⊗ e_1,   T_2 = [ 1 0
                                                 0 0 ] = e_1 ⊗ e_1,   τ = T_1 ⊗ e_1 + T_2 ⊗ e_2.

Hence τ = e_1 ⊗ e_2 ⊗ e_1 + e_2 ⊗ e_1 ⊗ e_1 + e_1 ⊗ e_1 ⊗ e_2. Thus 2 ≤ rank τ ≤ 3. We will show that rank τ = 3, using the results that follow.
Proposition 5.6 Let u_{1,i},...,u_{m_i,i} be a basis of U_i for i = 1, 2, 3 over F = R, C. Let τ ∈ U_1 ⊗ U_2 ⊗ U_3, and assume that τ = Σ_{i,j,k=1}^{m_1,m_2,m_3} t_{ijk} u_{i,1} ⊗ u_{j,2} ⊗ u_{k,3}. Let T_k := [t_{ijk}]_{i,j=1}^{m_1,m_2} ∈ F^{m_1×m_2} for k = 1,...,m_3, i.e. T_k = Σ_{i,j=1}^{m_1,m_2} t_{ijk} u_{i,1} ⊗ u_{j,2} and τ = Σ_{k=1}^{m_3} T_k ⊗ u_{k,3}. Assume that the basis [u_{1,3},...,u_{m_3,3}] of U_3 is changed to a basis [v_{1,3},...,v_{m_3,3}], where [u_{1,3},...,u_{m_3,3}] = [v_{1,3},...,v_{m_3,3}]Q_3, Q_3 = [q_{pq,3}]_{p,q=1}^{m_3} ∈ GL(m_3, F). Then τ = Σ_{k=1}^{m_3} T'_k ⊗ v_{k,3}, where T'_k = [t'_{ijk}]_{i,j=1}^{m_1,m_2} = Σ_{l=1}^{m_3} q_{kl,3} T_l. In particular, T' := (t'_{ijk})_{i,j,k=1}^{m_1,m_2,m_3} is the core tensor corresponding to τ in the basis u_{i,1} ⊗ u_{j,2} ⊗ v_{k,3}, i = 1,...,m_1, j = 1,...,m_2, k = 1,...,m_3. Furthermore, R_3 = rank_{U_1⊗U_2,U_3} τ = rank_{U_3,U_1⊗U_2} τ = dim span(T_1,...,T_{m_3}), where each T_k is viewed as a vector in F^{m_1m_2}. In particular, one can choose a basis [v_{1,3},...,v_{m_3,3}] of U_3 such that the matrices T'_1,...,T'_{R_3} are linearly independent. Furthermore, if m_3 > R_3 then we can assume that T'_k = 0 for k > R_3.

Assume that [v_{1,3},...,v_{m_3,3}] in U_3 was chosen as above. Let [v_{1,1},...,v_{m_1,1}], [v_{1,2},...,v_{m_2,2}] be two bases in U_1, U_2 respectively, where [u_{1,1},...,u_{m_1,1}] = [v_{1,1},...,v_{m_1,1}]Q_1, [u_{1,2},...,u_{m_2,2}] = [v_{1,2},...,v_{m_2,2}]Q_2 and Q_1 = [q_{pq,1}]_{p,q=1}^{m_1} ∈ GL(m_1, F), Q_2 = [q_{pq,2}]_{p,q=1}^{m_2} ∈ GL(m_2, F). Let

τ = Σ_{i,j,k=1}^{m_1,m_2,m_3} t̃_{ijk} v_{i,1} ⊗ v_{j,2} ⊗ v_{k,3},   T̃_k := [t̃_{ijk}]_{i,j=1}^{m_1,m_2} ∈ F^{m_1×m_2} for k = 1,...,m_3.

Then T̃_k = Q_1 T'_k Q_2^T for k = 1,...,m_3. Furthermore, if one chooses the bases in U_1, U_2 such that

Range τ_{U_2⊗U_3,U_1} = span(v_{1,1},...,v_{R_1,1})   and   Range τ_{U_1⊗U_3,U_2} = span(v_{1,2},...,v_{R_2,2}),

then each T̃_k is a block diagonal matrix T̃_k = diag(T̂_k, 0), where T̂_k = [t̃_{ijk}]_{i,j=1}^{R_1,R_2} ∈ F^{R_1×R_2} for k = 1,...,m_3. Recalling that T'_k = 0 for k > R_3, we get the representation τ = Σ_{i,j,k=1}^{R_1,R_2,R_3} t̃_{ijk} v_{i,1} ⊗ v_{j,2} ⊗ v_{k,3}.

The proof of this proposition is straightforward and is left to the reader.
By interchanging the factors U_1, U_2, U_3 in the tensor product U_1 ⊗ U_2 ⊗ U_3 it will be convenient to assume that m_1 ≥ m_2 ≥ m_3 ≥ 1. Also, the above proposition implies that for τ ≠ 0 we may assume that m_1 = R_1 ≥ m_2 = R_2 ≥ m_3 = R_3.

Proposition 5.7 Let τ ∈ U_1 ⊗ U_2 ⊗ U_3. If R_3 = 1 then rank τ = R_1 = R_2.

Proof. We may assume that m_3 = 1. In that case τ = T_1 ⊗ v_{1,3}. Hence rank τ = rank T_1 = R_1 = R_2. □

Thus we need to consider the case m_1 = R_1 ≥ m_2 = R_2 ≥ m_3 = R_3 ≥ 2. We now consider the generic case, i.e. where T_1,...,T_{m_3} ∈ F^{m_1×m_2} are generic. That is, each T_i ≠ 0 is chosen at random, where T_i/||T_i||_F has a uniform distribution on the matrices of norm one. It is well known that a generic matrix T ∈ F^{m_1×m_2} has rank m_2, since m_1 ≥ m_2.
Theorem 5.8 Let τ = Σ_{i,j,k=1}^{n,n,2} t_{ijk} e_i ⊗ e_j ⊗ e_k ∈ C^n ⊗ C^n ⊗ C^2, where n ≥ 2. Denote T_1 = [t_{ij1}]_{i,j=1}^n, T_2 = [t_{ij2}]_{i,j=1}^n ∈ C^{n×n}. Suppose that there exists a linear combination T = aT_1 + bT_2 ∈ GL(n, C), i.e. T is invertible. Then n ≤ rank τ ≤ 2n − 1. In particular, for a generic tensor τ, rank τ = n.

Proof. Recall that by changing a basis in C^2 we may assume that T'_1 = aT_1 + bT_2 = T. Suppose first that T'_2 = 0. Then rank τ = rank T = n. Hence we always have the inequality rank τ ≥ n.

Assume now that R_3 = 2. Choose first Q_1 = T^{-1} and Q_2 = I_n. Then T̃_1 = I_n. Thus, for simplicity of notation, we may assume to start with that T_1 = I_n. Let λ be an eigenvalue of T_2. Then change the basis in C^2 such that T'_1 = I_n and T'_2 = T_2 − λT_1 = T_2 − λI_n. So det T'_2 = 0. Hence r = rank T'_2 ≤ n − 1. Writing the SVD T'_2 = Σ_{i=1}^r σ_i(T'_2) u_i w_i*, we may view T'_2 as the tensor Σ_{i=1}^r u_i ⊗ (σ_i(T'_2) w̄_i). Now I_n = Σ_{i=1}^n e_i ⊗ e_i. Hence

τ = Σ_{i=1}^n e_i ⊗ e_i ⊗ e_1 + Σ_{i=1}^r u_i ⊗ (σ_i(T'_2) w̄_i) ⊗ e_2,

so rank τ ≤ n + r ≤ 2n − 1.

We now consider the generic case. In that case T_1 is generic, hence T_1 is invertible of rank n. Choose Q_1 = T_1^{-1}, Q_2 = I_n, Q_3 = I_2. Then T̃_1 = I_n and T̃_2 = T_1^{-1} T_2. The matrix T̃_2 is generic, hence it is diagonable. So T̃_2 = X diag(λ_1,...,λ_n) X^{-1} for some invertible X. Now we again change the bases in U_1 = U_2 = C^n by letting Q_1 = X^{-1}, Q_2 = X^T. The new matrices are T̂_1 = X^{-1} I_n X = I_n, T̂_2 = X^{-1} T̃_2 X = diag(λ_1,...,λ_n). In this basis

τ = T̂_1 ⊗ e_1 + T̂_2 ⊗ e_2 = Σ_{i=1}^n e_i ⊗ e_i ⊗ e_1 + Σ_{i=1}^n λ_i e_i ⊗ e_i ⊗ e_2 = Σ_{i=1}^n e_i ⊗ e_i ⊗ (e_1 + λ_i e_2).

Hence the rank of a generic tensor τ ∈ C^n ⊗ C^n ⊗ C^2 is n. □
I believe that for any τ ∈ C^n ⊗ C^n ⊗ C^2, rank τ ≤ 2n − 1. The case rank τ = 2n − 1 would correspond to T_1 = I_n and T_2 a nilpotent matrix with one Jordan block.

The analysis of the proof of the above theorem yields:

Corollary 5.9 Let τ = Σ_{i,j,k=1}^{n,n,2} t_{ijk} e_i ⊗ e_j ⊗ e_k ∈ F^n ⊗ F^n ⊗ F^2, where n ≥ 2 and F = R, C. Denote T_1 = [t_{ij1}]_{i,j=1}^n, T_2 = [t_{ij2}]_{i,j=1}^n ∈ F^{n×n}. Suppose that there exists a linear combination T = aT_1 + bT_2 ∈ GL(n, F), i.e. T is invertible. Then rank τ = n if and only if the matrix T^{-1}(bT_1 + aT_2) is diagonable over F.

This result shows that it is possible that for τ ∈ R^n ⊗ R^n ⊗ R^2 one has rank_R τ > n, while rank_C τ = n. For simplicity choose n = 2, τ = T_1 ⊗ e_1 + T_2 ⊗ e_2, with T_1 = I_2 and T_2 = −T_2^T ≠ 0 skew symmetric. Then T_2 has two complex conjugate eigenvalues, hence T_2 is not diagonable over R. The above Corollary yields that rank_R τ > 2. However, since T_2 is normal, T_2 is diagonable over C. Hence the above Corollary yields that rank_C τ = 2.
Proof of the claim in Example 2. Observe that T_1 is invertible and T_1^{-1} = T_1. Consider

T_1^{-1} T_2 = S = [ 0 0
                     1 0 ].

Note that the Jordan canonical form of S is S^T. Hence S is not diagonable. The above Corollary shows that rank_R τ, rank_C τ > 2. Hence rank_R τ = rank_C τ = 3. This shows that Theorem 5.8 is sharp for n = 2.
References

[1] R.B. Bapat and T.E.S. Raghavan, Nonnegative Matrices and Applications, Cambridge University Press, Cambridge, UK, 1997.
[2] A. Berman and R.J. Plemmons, Nonnegative Matrices in the Mathematical Sciences, Academic Press, New York, 1979.
[3] K. Fan, On a theorem of Weyl concerning eigenvalues of linear transformations. I., Proc. Nat. Acad. Sci. U.S.A. 35 (1949), 652-655.
[4] S. Friedland, A New Approach to Generalized Singular Value Decomposition, to appear in SIMAX, 2006.
[5] F.R. Gantmacher, The Theory of Matrices, Vol. I and II, Chelsea Publ. Co., New York, 1959. Reprinted by Amer. Math. Soc.
[6] G.H. Hardy, J.E. Littlewood and G. Pólya, Inequalities, Cambridge Univ. Press, second edition, 1952.
[7] R.A. Horn and C.R. Johnson, Matrix Analysis, Cambridge Univ. Press, New York, 1988.
[8] G.H. Golub and C.F. Van Loan, Matrix Computations, Johns Hopkins Univ. Press, 1985.
[9] T. Kato, A Short Introduction to Perturbation Theory for Linear Operators, Springer-Verlag, 2nd ed., New York, 1982.
[10] S.J. Leon, Linear Algebra with Applications, Macmillan, 6th edition, 2002.
[11] A.J. Laub, Matrix Analysis for Scientists & Engineers, SIAM, 2005.
[12] G. Pólya and M. Schiffer, Convexity of functionals by transplantation, J. Analyse Math. 3 (1954), 245-346.