
Linear Algebra Written Examinations

Study Guide
Eduardo Corona (and other authors as they join in)
November 2, 2008
Contents
1 Vector Spaces and Matrix Operations
2 Linear Operators
3 Diagonalizable Operators
3.1 The Rayleigh Quotient and the Min-Max Theorem
3.2 Gershgorin's Disc Theorem
4 Hilbert Space Theory: Inner Product, Orthogonal Projection and Adjoint Operators
4.1 Orthogonal Projection
4.2 The Gram-Schmidt Process and QR Factorization
4.3 Riesz Representation Theorem and the Adjoint Operator
5 Normal and Self-Adjoint Operators: Spectral Theorems and Related Results
5.1 Unitary Operators
5.2 Positive Operators and Square Roots
6 Singular Value Decomposition and the Moore-Penrose Generalized Inverse
6.1 Singular Value Decomposition
6.2 The Moore-Penrose Generalized Inverse
6.3 The Polar Decomposition
7 Matrix Norms and Low Rank Approximation
7.1 The Frobenius Norm
7.2 Operator Norms
7.3 Low Rank Matrix Approximation
8 Generalized Eigenvalues, the Jordan Canonical Form and $e^A$
8.1 The Generalized Eigenspace $K_\lambda$
8.2 A method to compute the Jordan Form: The points diagram
8.3 Applications: Matrix Powers and Power Series
9 Nilpotent Operators
10 Other Important Matrix Factorizations
11 Other Topics (which appear in past exams)
12 Yet more Topics I can think of
1 Vector Spaces and Matrix Operations
2 Linear Operators
Definition 1 Let $U, V$ be vector spaces over $F$ (usually $F = \mathbb{R}$ or $\mathbb{C}$). Then $\mathcal{L}(U,V) = \{T : U \to V \mid T \text{ is linear}\}$. In particular, $\mathcal{L}(U,U) = \mathcal{L}(U)$ is the space of linear operators on $U$, and $\mathcal{L}(U,F) = U^*$ is its algebraic dual.

Definition 2 Important subspaces: given a subspace $W \subseteq V$, $T^{-1}(W) \subseteq U$ is a subspace. In particular, we are interested in $T^{-1}(0) = \operatorname{Ker}(T)$. Also, if $S \subseteq U$, then $T(S) \subseteq V$. We are most interested in $T(U) = \operatorname{Ran}(T)$.
Theorem 3 Let $U, V$ be vector spaces over $F$ with $\dim(U) = n$ and $\dim(V) = m$. Given a basis $\beta = \{u_1, \dots, u_n\}$ of $U$ and a basis $\beta' = \{v_1, \dots, v_m\}$ of $V$, to each $T \in \mathcal{L}(U,V)$ we can associate a matrix $[T]_\beta^{\beta'}$ such that:
$$T u_j = a_{1j} v_1 + \dots + a_{mj} v_m \quad \forall j \in \{1, \dots, n\}$$
$$[T]_\beta^{\beta'} = (a_{ij}) \in M_{m \times n}(F)$$
This assignment is compatible with taking coordinates: if $[\,\cdot\,]_\beta : U \to F^n$ and $[\,\cdot\,]_{\beta'} : V \to F^m$ denote the coordinate maps, then $[Tu]_{\beta'} = [T]_\beta^{\beta'} [u]_\beta$.

Conversely, given a matrix $A \in M_{m \times n}(F)$, there is a unique $T \in \mathcal{L}(U,V)$ such that $A = [T]_\beta^{\beta'}$.
Proposition 4 Given $T \in \mathcal{L}(U)$, there exist bases $\beta$ and $\beta'$ of $U$ such that:
$$[T]_\beta^{\beta'} = \begin{pmatrix} I & 0 \\ 0 & 0 \end{pmatrix}$$
$\beta$ is constructed as an extension of a basis of $\operatorname{Ker}(T)$, and $\beta'$ as an extension of $\{T(u)\}_{u \in \beta}$ (discarding the zero images).
Theorem 5 (Rank and Nullity) $\dim(U) = \dim(\operatorname{Ker}(T)) + \dim(\operatorname{Ran}(T))$. $\nu(T) = \dim(\operatorname{Ker}(T))$ is known as the nullity of $T$, and $r(T) = \dim(\operatorname{Ran}(T))$ as the rank of $T$.

Change of basis: let $U, V$ be vector spaces over $F$, let $\beta$, $\gamma$ be bases of $U$ and $\beta'$, $\gamma'$ bases of $V$. Then there exist invertible (change of coordinates) matrices $P$ and $Q$ such that:
$$[T]_{\gamma}^{\gamma'} = Q\, [T]_{\beta}^{\beta'}\, P^{-1}$$
In the operator case $U = V$ with $\gamma' = \gamma$ and $\beta' = \beta$, this reads $[T]_{\gamma} = P\, [T]_{\beta}\, P^{-1}$. This means that, if two matrices are similar, they represent the same linear operator in different bases, which further justifies why key properties of matrices are preserved under similarity.
3 Diagonalizable Operators
If $U = V$ ($T$ is a linear operator), it is natural to require that both bases $\beta$ and $\beta'$ be the same. In this case it is no longer generally true that we can find a basis $\beta$ such that the corresponding matrix is diagonal. However, if there exists a basis $\beta$ such that $[T]_\beta$ is a diagonal matrix, we say $T$ is diagonalizable.

Definition 6 Let $V$ be a vector space over $F$, $T \in \mathcal{L}(V)$. $\lambda \in F$ is an eigenvalue of $T$ if there exists a nonzero vector $v$ such that $Tv = \lambda v$. All nonzero vectors $v$ for which this holds are known as eigenvectors of $T$.

We can immediately derive, from this definition, that the existence of the eigenpair $(\lambda, v)$ (eigenvalue $\lambda$ and corresponding eigenvector $v$) is equivalent to the existence of a nonzero solution to
$$(T - \lambda I)v = 0$$
This in turn tells us that the eigenvalues of $T$ are those $\lambda$ such that the operator $T - \lambda I$ is not invertible. After selecting a basis $\beta$ for $V$, this also means:
$$\det([T]_\beta - \lambda I) = 0$$
which is called the characteristic equation of $T$. We notice this equation does not depend on the choice of basis $\beta$, since the determinant is invariant under similarity:
$$\det(P A P^{-1} - \lambda I) = \det(P (A - \lambda I) P^{-1}) = \det(A - \lambda I)$$
Solving the characteristic equation is therefore equivalent to finding the complex roots of a polynomial in $\lambda$. We know this to be a genuinely hard problem for $n \geq 5$ (no general closed-form solution exists), and a numerically ill-conditioned one at that.
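As a quick numerical aside (a sketch, assuming NumPy is available; the matrix below is an arbitrary example, not one from the text), we can compare the "roots of the characteristic polynomial" route with a dedicated eigenvalue solver:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))   # arbitrary test matrix

# Route 1: coefficients of the characteristic polynomial, then its roots.
# (np.poly builds the coefficients; np.roots finds the roots of that polynomial.)
char_coeffs = np.poly(A)
roots = np.roots(char_coeffs)

# Route 2: a standard eigenvalue solver.
eigs = np.linalg.eigvals(A)

# The two spectra agree here, but passing through polynomial coefficients
# becomes increasingly inaccurate for larger or badly scaled matrices.
print(np.sort_complex(roots))
print(np.sort_complex(eigs))
```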
Definition 7 Let $V$ be a vector space over $F$, $T \in \mathcal{L}(V)$, $\lambda$ an eigenvalue of $T$. Then $E_\lambda = \{v \in V \mid Tv = \lambda v\}$ is the eigenspace for $\lambda$.

Theorem 8 Let $V$ be a finite dimensional vector space over $F$, $T \in \mathcal{L}(V)$. The following are equivalent:
i) $T$ is diagonalizable
ii) $V$ has a basis of eigenvectors of $T$
iii) There exist subspaces $W_1, \dots, W_n$ such that $\dim(W_i) = 1$, $T(W_i) \subseteq W_i$ and $V = \bigoplus_{i=1}^n W_i$
iv) $V = \bigoplus_{k=1}^{\ell} E_{\lambda_k}$, with $\lambda_1, \dots, \lambda_\ell$ the eigenvalues of $T$
v) $\sum_{i=1}^{\ell} \dim(E_{\lambda_i}) = \dim(V)$

Proposition 9 Let $V$ be a vector space over $\mathbb{C}$, $T \in \mathcal{L}(V)$. Then $T$ has at least one eigenvalue (this is a corollary of the Fundamental Theorem of Algebra, applied to the characteristic equation).

Theorem 10 (Schur factorization) Let $V$ be a vector space over $\mathbb{C}$, $T \in \mathcal{L}(V)$. There always exists a basis $\beta$ such that $[T]_\beta$ is upper triangular.
3.1 The Rayleigh Quotient and the Min-Max Theorem
3.2 Gershgorin's Disc Theorem
Although calculating the eigenvalues of a large matrix is a very difficult problem (computationally and analytically), it is very easy to come up with regions of the complex plane where all the eigenvalues of a particular operator $T$ must lie. This technique was first devised by the Russian mathematician Semyon Aranovich Gershgorin (1901-1933):

Theorem 11 (Gershgorin, 1931) Let $A = (a_{ij}) \in M_n(\mathbb{C})$. For each $i \in \{1, \dots, n\}$, we define the $i$th "radius of $A$" as $r_i(A) = \sum_{j \neq i} |a_{ij}|$ and the $i$th Gershgorin disc as
$$D_i(A) = \{z \in \mathbb{C} : |z - a_{ii}| \leq r_i(A)\}$$
Then, if we define $\sigma(A) = \{\lambda : \lambda \text{ is an eigenvalue of } A\}$, it follows that:
$$\sigma(A) \subseteq \bigcup_{i=1}^n D_i(A)$$
That is, all eigenvalues of $A$ must lie inside one or more Gershgorin discs.
Proof. Let $\lambda$ be an eigenvalue of $A$ and $v$ an associated eigenvector. We fix $i$ as the index of the coordinate of $v$ with maximum modulus, that is, $|v_i| \geq |v_k|$ for all $k$; necessarily $|v_i| \neq 0$. Then,
$$Av = \lambda v \implies \lambda v_i = \sum_j a_{ij} v_j \implies (\lambda - a_{ii}) v_i = \sum_{j \neq i} a_{ij} v_j$$
$$|\lambda - a_{ii}|\,|v_i| \leq \sum_{j \neq i} |a_{ij}|\,|v_i| \implies \lambda \in D_i(A)$$

Now, we know that $A$ represents a linear operator $T \in \mathcal{L}(\mathbb{C}^n)$, and that therefore its eigenvalues are invariant under transposition of $A$ and under similarity. Therefore:

Corollary 12 Let $A = (a_{ij}) \in M_n(\mathbb{C})$. Then, for every invertible $P$,
$$\sigma(A) \subseteq \bigcup_{i=1}^n D_i(P A P^{-1})$$
so $\sigma(A)$ is contained in the intersection of all such unions.

Of course, if $A$ is diagonalizable, one of these $P$'s is the one for which $P A P^{-1}$ is diagonal, and then the Gershgorin discs degenerate to the $n$ points we are looking for. However, if we don't want to compute the eigenvalues, we can still use this to come up with a fine heuristic to reduce the region given by the union of the Gershgorin discs: we can use permutation matrices or diagonal matrices as our $P$'s to get a "reasonable region". This result also hints at the fact that, if we perturb a matrix $A$, the eigenvalues change continuously.

The Gershgorin disc theorem is also a quick way to prove that $A$ is invertible if it is strictly diagonally dominant, and it also provides information when the discs are disjoint (namely, each isolated disc must contain exactly one eigenvalue).
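A minimal sketch of Theorem 11 (assuming NumPy; the matrix is an arbitrary example): it computes the centres $a_{ii}$ and radii $r_i(A)$ and checks that each eigenvalue lands in at least one disc.

```python
import numpy as np

A = np.array([[ 4.0,  1.0, 0.2],
              [ 0.5, -3.0, 0.1],
              [ 0.1,  0.4, 1.0]])

centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)   # r_i(A) = sum_{j != i} |a_ij|

for lam in np.linalg.eigvals(A):
    in_disc = np.abs(lam - centers) <= radii      # which discs contain lambda?
    print(lam, np.where(in_disc)[0])
```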
4 Hilbert Space Theory: Inner Product, Orthogonal Projection and Adjoint Operators
Definition 13 Let $V$ be a vector space over $F$. An inner product on $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \to F$ such that:
1) $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$ for all $u, v, w \in V$
2) $\langle \alpha u, v \rangle = \alpha \langle u, v \rangle$ for all $u, v \in V$, $\alpha \in F$
3) $\langle u, v \rangle = \overline{\langle v, u \rangle}$
4) $\langle v, v \rangle \geq 0$, and $\langle v, v \rangle = 0 \iff v = 0$
By definition, every inner product induces a natural norm on $V$, given by
$$\|v\|_V = \sqrt{\langle v, v \rangle}$$
Definition 14 We say $u$ and $v$ are orthogonal, or $u \perp v$, if $\langle u, v \rangle = 0$.

Some important identities:
1. Pythagoras' Theorem: $u \perp v \implies \|u + v\|^2 = \|u\|^2 + \|v\|^2$
2. Cauchy-Bunyakovsky-Schwarz: $|\langle u, v \rangle| \leq \|u\|\,\|v\|$ for all $u, v \in V$, with equality $\iff$ $u = \alpha v$
3. Parallelogram: $\|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2)$ for all $u, v \in V$
4. Polarization:
$$\langle u, v \rangle = \frac{1}{4}\left(\|u + v\|^2 - \|u - v\|^2\right) \quad \forall u, v \in V \text{ if } F = \mathbb{R}$$
$$\langle u, v \rangle = \frac{1}{4} \sum_{k=1}^{4} i^k \left\|u + i^k v\right\|^2 \quad \forall u, v \in V \text{ if } F = \mathbb{C}$$
In fact, identities 3 and 4 (parallelogram and polarization) give us both necessary and sufficient conditions for a norm to be induced by some inner product. In this fashion, we can prove that $\|\cdot\|_1$ and $\|\cdot\|_\infty$ are not induced by an inner product, by showing that the parallelogram law fails.
Definition 15 $v \in V$ is said to be of unit norm if $\|v\| = 1$.

Definition 16 A subset $S \subseteq V$ is said to be orthogonal if the elements of $S$ are mutually orthogonal (perpendicular).

Definition 17 $S$ is orthonormal if it is orthogonal and its elements are of unit norm.

If $S$ is orthogonal (and $0 \notin S$), then it is automatically LI (linearly independent). Intuitively, we can think of orthogonal vectors as vectors which do not "cast a shadow" on each other, and therefore point in completely exclusive directions. We have the following property: if $S = \{v_1, \dots, v_n\}$ is orthogonal, then for all $v \in \operatorname{span}(S)$:
$$v = \alpha_1 v_1 + \dots + \alpha_n v_n, \qquad \alpha_i = \frac{\langle v, v_i \rangle}{\langle v_i, v_i \rangle} \quad \forall i$$
Thus, we can obtain the coefficient for each element of $S$ independently, by computing the inner product with the corresponding $v_i$. Furthermore, if $S$ is orthonormal:
$$\alpha_i = \langle v, v_i \rangle \quad \forall i$$
These coefficients are also called the abstract Fourier coefficients.
Theorem 18 (Bessel's Inequality) Let $\{v_1, \dots, v_n\}$ be an orthonormal set and $v \in V$. Then:
$$\sum_{i=1}^n |\langle v, v_i \rangle|^2 \leq \|v\|^2$$
with equality $\iff$ $v \in \operatorname{span}(\{v_i\}_{i=1}^n)$.
4.1 Orthogonal Projection
This last result suggests that, for an orthogonal set $S$, in order to retrieve the component of $u$ going "in the $i$th direction", we only need to compute $\frac{\langle u, v_i \rangle}{\langle v_i, v_i \rangle} v_i$. This is in fact the projection of our vector $u$ in the direction of the vector $v_i$, or the "shadow" that $u$ casts on the direction of the vector $v_i$. We shall define this more generally, and see that we can define projection operators which give us the component of a vector in a given subspace of $V$:

Definition 19 Let $S \subseteq V$. We define the orthogonal complement $S^\perp = \{v \in V \mid \langle v, s \rangle = 0 \ \forall s \in S\}$. If $W \subseteq V$ is a (closed) subspace, then $W \oplus W^\perp = V$ and $(W^\perp)^\perp = W$ (always true in finite dimension).

Definition 20 Let $W \subseteq V$ be a subspace. Then we define $P_W \in \mathcal{L}(V)$ such that, if $v = v_W + v_{W^\perp}$, then $P_W(v) = v_W$. We can also define this operator by its action on a suitable basis of $V$: if we take $\beta_W = \{w_1, \dots, w_p\}$ and $\beta_{W^\perp} = \{w_{p+1}, \dots, w_n\}$ bases of $W$ and $W^\perp$, then $\beta = \beta_W \cup \beta_{W^\perp}$ is a basis of $V$ and:
$$P_W(w_i) = w_i \quad \forall i \in \{1, \dots, p\}$$
$$P_W(w_j) = 0 \quad \forall j \in \{p+1, \dots, n\}$$
$$[P_W]_\beta = \begin{pmatrix} I_p & 0 \\ 0 & 0 \end{pmatrix}$$
From this, a myriad of properties of $P_W$ can be deduced:
1. $P_W^2 = P_W$: this follows easily from the fact that $P_W w = w$ for all $w \in W$.
2. $\operatorname{Ran}(P_W) = W$ and $\operatorname{Ker}(P_W) = W^\perp$.
3. $v - P_W v \in W^\perp$ for all $v \in V$: we can deduce this directly from the definition, or by computing the inner product with any member of $W$. It also follows from the picture one can draw in $\mathbb{R}^2$ or $\mathbb{R}^3$: if we remove the "shadow" cast by a vector, all that is left is the orthogonal component. This additionally tells us that $P_{W^\perp} = I - P_W$.
4. $\|v - P_W v\| \leq \|v - w\|$ for all $w \in W$: this is a very strong result: it tells us the orthogonal projection is the best approximation to $v$ by vectors in $W$. This is a key result which justifies the use of the projection in applications such as least squares, polynomial interpolation and approximation, Fourier series, etc. In fact, this result can be extended to projections onto convex sets in Hilbert spaces.
5. $\|P_W v\| \leq \|v\|$ for all $v \in V$: this tells us the projection is a contraction. In particular, we know that $\|P_W\| = 1$ (for $W \neq \{0\}$), since there are vectors (namely, those in $W$) for which equality holds.
6. $\langle P_W u, v \rangle = \langle u, P_W v \rangle$ for all $u, v \in V$ ($P_W$ is "self-adjoint"). This can be proved explicitly using the unique decomposition of $u$ and $v$ as sums of components in $W$ and $W^\perp$. In particular, this also tells us that the matrix which represents $P_W$ is symmetric / self-adjoint as well, if we choose a basis of orthonormal vectors.
7. It can be shown that properties (1) and (4), (1) and (5), or (1) and (6) completely characterize the orthogonal projection. That is, from these properties alone we can deduce the rest, and the operator $P$ has to be the orthogonal projection onto its range.
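A minimal sketch of Definition 20 and of properties 1, 3 and 6 (assuming NumPy; $W$ is the span of two arbitrary example vectors in $\mathbb{R}^4$): $P_W$ is assembled as $QQ^*$ from an orthonormal basis $Q$ of $W$.

```python
import numpy as np

# W = column space of B (arbitrary example in R^4).
B = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 2.0]])

Q, _ = np.linalg.qr(B)        # orthonormal basis of W
P = Q @ Q.T                   # matrix of P_W in the standard (orthonormal) basis

print(np.allclose(P @ P, P))  # property 1: P_W^2 = P_W
print(np.allclose(P, P.T))    # property 6: P_W is self-adjoint

v = np.array([1.0, 2.0, 3.0, 4.0])
w = P @ v                     # best approximation to v inside W
print(np.allclose(B.T @ (v - w), 0.0))   # property 3: v - P_W v is orthogonal to W
```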
4.2 The Gram-Schmidt Process and QR Factorization
We can ask ourselves whether, for any basis of $V$, there exists a procedure to turn it into an orthonormal basis. The Gram-Schmidt process does exactly this, and as a by-product it also gives us a very useful matrix factorization, the QR factorization:
Theorem 21 (Gram-Schmidt) If $\{u_i\}_{i=1}^m$ is a linearly independent set, there exists an orthonormal set $\{w_i\}_{i=1}^m$ such that $\operatorname{span}(\{w_i\}_{i=1}^m) = \operatorname{span}(\{u_i\}_{i=1}^m)$. It can be constructed through the following process:
$$v_1 = u_1, \qquad v_k = u_k - \sum_{j=1}^{k-1} \langle u_k, w_j \rangle\, w_j = P_{\operatorname{span}(\{w_j\}_{j=1}^{k-1})^\perp}(u_k)$$
$$w_1 = \frac{v_1}{\|v_1\|}, \qquad w_k = \frac{v_k}{\|v_k\|}$$
Furthermore, by completing $\{w_i\}_{i=1}^m$ to a full basis of $V$ (if $m \leq n$), we can always obtain an orthonormal basis of $V$ following this process.
Theorem 22 (QR Factorization) Let $A$ be an $m \times n$ matrix of full column rank, with columns $\{u_i\}_{i=1}^n$. Then, by applying Gram-Schmidt to the columns of $A$ (augmenting them to obtain a full basis if $n < m$), we obtain the following:
$$u_k = \|v_k\|\, w_k + \sum_{j=1}^{k-1} \langle u_k, w_j \rangle\, w_j \quad \forall k$$
If we write this in matrix form, where $Q$ is the matrix with columns $\{w_i\}_{i=1}^m$ (by construction, an orthogonal / unitary matrix) and $R_{kk} = \|v_k\|$, $R_{jk} = \langle u_k, w_j \rangle$ if $j < k$ (an upper triangular matrix), this last expression provides the following matrix factorization of $A$:
$$A = \begin{pmatrix} | & & | \\ u_1 & \cdots & u_n \\ | & & | \end{pmatrix} = QR = \begin{pmatrix} | & & | & | & & | \\ w_1 & \cdots & w_n & w_{n+1} & \cdots & w_m \\ | & & | & | & & | \end{pmatrix} \begin{pmatrix} \|v_1\| & \langle u_2, w_1 \rangle & \cdots & \langle u_n, w_1 \rangle \\ 0 & \|v_2\| & \cdots & \langle u_n, w_2 \rangle \\ 0 & 0 & \ddots & \vdots \\ 0 & 0 & 0 & \|v_n\| \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
That is, $A = (Q_1 \mid Q_2) \begin{pmatrix} R_1 \\ 0 \end{pmatrix} = Q_1 R_1$, where $Q_1$ has the same column space as $A$.

This factorization is very useful to solve linear systems of equations (there are efficient ways to compute QR, namely the Householder algorithm and other sparse or incomplete QR routines) because, once computed, the system $Ax = b$ is equivalent to solving:
$$Rx = Q^* b$$
which can be rapidly solved through backward substitution (since $R$ is upper triangular). Also, the QR factorization is extensively used to obtain more convenient formulas for certain matrix products that appear in applications such as OLS and smoothing splines.

A relevant result regarding this matrix factorization is that, although it is not unique in general, if we have $A = Q_1 R_1 = Q_2 R_2$, then it can be shown that $D = R_2 R_1^{-1}$ is a diagonal, unitary matrix.
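A small numerical sketch of Theorem 22 (assuming NumPy; the matrix and right-hand side are arbitrary examples): a reduced QR factorization is computed and used to solve $Ax = b$ in the least squares sense via $R_1 x = Q_1^* b$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))            # full column rank (generically)
b = rng.standard_normal(6)

Q1, R1 = np.linalg.qr(A, mode="reduced")   # A = Q1 R1, R1 upper triangular
print(np.allclose(A, Q1 @ R1))

# Solve R1 x = Q1^* b (a triangular solve in practice; np.linalg.solve for brevity).
x = np.linalg.solve(R1, Q1.T @ b)
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))
```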
4.3 Riesz Representation Theorem and the Adjoint Operator

For any linear operator $T \in \mathcal{L}(V, W)$, we can obtain a related operator $T^* \in \mathcal{L}(W, V)$ called the adjoint operator, which has very interesting properties. This operator becomes even more relevant in applications to vector spaces of infinite dimension. It is defined as follows:

Definition 23 Let $T \in \mathcal{L}(V, W)$. Then the adjoint operator $T^* \in \mathcal{L}(W, V)$ is defined by the following functional relation:
$$\langle Tv, w \rangle_W = \langle v, T^* w \rangle_V \quad \forall v \in V,\ w \in W$$
If we choose orthonormal bases $\beta$ and $\beta'$ for $V$ and $W$, then the matrix that represents the adjoint is the conjugate transpose of the matrix that represents $T$. We get:
$$\langle Ax, y \rangle_{\mathbb{R}^m} = \langle x, A^* y \rangle_{\mathbb{R}^n} \quad \forall x \in \mathbb{R}^n,\ y \in \mathbb{R}^m$$
where $A = [T]_\beta^{\beta'}$ and $A^* = [T^*]_{\beta'}^{\beta} = ([T]_\beta^{\beta'})^*$.
The existence and uniqueness of this operator is given by the Riesz Representation Theorem for Hilbert spaces:

Theorem 24 (Riesz Representation) Let $V$ be a Hilbert space over $F$, and $T \in \mathcal{L}(V, F)$ a continuous linear functional (an element of the topological dual). Then there exists a unique $z \in V$ such that:
$$Tv = \langle v, z \rangle \quad \forall v \in V$$
Therefore, the adjoint operator is always well defined by the functional relation we have outlined, starting from the linear functional $\varphi_w(v) = \langle Tv, w \rangle_W$ for each fixed $w \in W$.

Remark 25 Here is a quick application of the adjoint operator and the orthogonal projection operator. Let $A \in M_{m \times n}(F)$, and $Ax = b$ a system of linear equations. Then the least squares solution of this system is given by the solution of:
$$A x_0 = P_{\operatorname{Col}(A)}(b)$$
since we are projecting $b$ onto the column space of $A$, and we know this is the best approximation we can have using linear combinations of the columns of $A$. Using properties of the projection operator, we now know that:
$$\langle Ax,\ b - P_{\operatorname{Col}(A)}(b) \rangle = 0 \quad \forall x \implies \langle Ax,\ b - A x_0 \rangle = 0 \quad \forall x$$
Now, using the adjoint of $A$, we find:
$$\langle x,\ A^* b - A^* A x_0 \rangle = 0 \quad \forall x$$
So this means $A^* b = A^* A x_0$, and therefore, if $A^* A$ is invertible,
$$x_0 = (A^* A)^{-1} A^* b, \qquad \hat{y} = A x_0 = A (A^* A)^{-1} A^* b$$
Incidentally, this also tells us that the projection matrix onto the column space of a matrix is given by $P_{\operatorname{Col}(A)} = A (A^* A)^{-1} A^*$.
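The computation in Remark 25, sketched with NumPy (arbitrary example data): the normal equations give $x_0$, and $A(A^*A)^{-1}A^*$ reproduces the projection of $b$ onto $\operatorname{Col}(A)$.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

# Least squares solution from the normal equations A* A x0 = A* b.
x0 = np.linalg.solve(A.T @ A, A.T @ b)

# Projection onto Col(A): P = A (A* A)^{-1} A*.
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, A @ x0))          # A x0 is the projection of b
print(np.allclose(A.T @ (b - A @ x0), 0))  # the residual is orthogonal to Col(A)
```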
Properties of the adjoint ($T, S$ linear operators between Hilbert spaces, whenever the operations make sense):
1. $(T + S)^* = T^* + S^*$
2. $(\alpha T)^* = \bar{\alpha}\, T^*$
3. $(T^*)^* = T$
4. $I_V^* = I_V$ (the identity is self-adjoint)
5. $(ST)^* = T^* S^*$
6. $\beta$ and $\beta'$ orthonormal bases of $V$ and $W$ $\implies$ $[T^*]_{\beta'}^{\beta} = ([T]_\beta^{\beta'})^*$ (be careful: this is an implication, and it requires the bases to be orthonormal)

The most important property of the adjoint, however, provides us with an explicit relation between the kernels and the images of $T$ and $T^*$. These relations can be deduced directly from the definition, and provide us with comprehensive tools to study the spaces $V$ and $W$.
Theorem 26 ("Fundamental Theorem of Linear Algebra II") Let $V, W$ be finite dimensional Hilbert spaces, $T \in \mathcal{L}(V, W)$. Then:
$$\operatorname{Ker}(T^*) = \operatorname{Ran}(T)^\perp, \qquad \operatorname{Ran}(T^*) = \operatorname{Ker}(T)^\perp$$
Thus, we can always write $V = \operatorname{Ker}(T) \oplus \operatorname{Ran}(T^*)$ and $W = \operatorname{Ker}(T^*) \oplus \operatorname{Ran}(T)$.

Proof. ($\operatorname{Ker}(T^*) = \operatorname{Ran}(T)^\perp$): Let $w \in W$. Then $w \in \operatorname{Ran}(T)^\perp \iff \langle Tu, w \rangle = 0 \ \forall u \iff \langle u, T^* w \rangle = 0 \ \forall u \iff T^* w = 0 \iff w \in \operatorname{Ker}(T^*)$. The proof of the second statement is obtained by exchanging the roles of $T$ and $T^*$.

A couple of results that follow from this one are:
1. $T$ is injective $\iff$ $T^*$ is onto
2. $\operatorname{Ker}(T^* T) = \operatorname{Ker}(T)$, and thus $r(T^* T) = r(T) = r(T^*)$ (rank).
5 Normal and Self-Adjoint Operators: Spectral Theorems and Related Results
Depending on the field $F$ we are working with, we can obtain "field sensitive" theorems that characterize diagonalizable operators. In particular, we are interested in the cases where the field is either $\mathbb{R}$ or $\mathbb{C}$. This discussion will also yield important results on isometric, unitary and positive operators.

Definition 27 $T \in \mathcal{L}(V)$ is said to be a self-adjoint operator if $T = T^*$. If $F = \mathbb{R}$, this is equivalent to saying $[T]_\beta$ is symmetric, and if $F = \mathbb{C}$, that $[T]_\beta$ is Hermitian (equal to its conjugate transpose), for $\beta$ an orthonormal basis.

Definition 28 $T \in \mathcal{L}(V)$ is said to be normal if it commutes with its adjoint, that is, if $T T^* = T^* T$.

Remark 29 If $F = \mathbb{R}$, an operator $T$ is normal but not self-adjoint $\iff$ there is an orthonormal basis $\beta$ of $V$ for which $[T]_\beta$ is a block diagonal matrix, with blocks of size 1 and blocks of size 2 which are multiples of (non-trivial) rotation matrices.

First, we introduce a couple of interesting results on self-adjoint and normal operators:
Proposition 30 $T \in \mathcal{L}(V)$, $F = \mathbb{C}$. Then there exist unique self-adjoint operators $T_1$ and $T_2$ such that $T = T_1 + i T_2$. $T$ is then self-adjoint $\iff$ $T_2 = 0$, and normal $\iff$ $T_1 T_2 = T_2 T_1$. These operators are given by:
$$T_1 = \frac{T + T^*}{2}, \qquad T_2 = \frac{T - T^*}{2i}$$

Proposition 31 If $T \in \mathcal{L}(V)$ is normal, then $\operatorname{Ker}(T) = \operatorname{Ker}(T^*)$ and $\operatorname{Ran}(T) = \operatorname{Ran}(T^*)$.

The most important properties of these families of operators, however, have to do with the spectral information we can retrieve:

Proposition 32 $T \in \mathcal{L}(V)$ self-adjoint, $F = \mathbb{C}$. If $\lambda$ is an eigenvalue of $T$, then $\lambda \in \mathbb{R}$.

Proof. For $v$ a corresponding eigenvector, we have:
$$\lambda \langle v, v \rangle = \langle Tv, v \rangle = \langle v, Tv \rangle = \bar{\lambda} \langle v, v \rangle \implies \lambda = \bar{\lambda}$$

Proposition 33 $T \in \mathcal{L}(V)$, $F = \mathbb{C}$. $T$ is self-adjoint $\iff$ $\langle Tv, v \rangle \in \mathbb{R}$ for all $v \in V$.

Proof. Using self-adjointness and the properties of the inner product:
$$\langle Tv, v \rangle = \langle v, Tv \rangle = \overline{\langle Tv, v \rangle} \quad \forall v \in V$$
This in particular tells us that the Rayleigh quotient of such an operator is always real, and we can also rederive the last proposition.
Proposition 34 If $T \in \mathcal{L}(V)$ is a normal operator, then:
i) $\|Tv\| = \|T^* v\|$ for all $v \in V$
ii) $T - \mu I$ is normal for all $\mu \in \mathbb{C}$
iii) $v$ is an eigenvector of $T$ with eigenvalue $\lambda$ $\iff$ $v$ is an eigenvector of $T^*$ with eigenvalue $\bar{\lambda}$

Proof. (i): $\langle Tv, Tv \rangle = \langle v, T^* T v \rangle = \langle v, T T^* v \rangle = \langle T^* v, T^* v \rangle$
(ii): $(T - \mu I)^* (T - \mu I) = T^* T - \bar{\mu} T - \mu T^* + |\mu|^2 I = T T^* - \mu T^* - \bar{\mu} T + |\mu|^2 I = (T - \mu I)(T - \mu I)^*$
(iii): $(T - \lambda I) v = 0 \iff \|(T - \lambda I)^* v\| = 0$ (by i and ii) $\iff (T^* - \bar{\lambda} I) v = 0$
Theorem 35 (Spectral Theorem, $F = \mathbb{C}$ version) Let $V$ be a finite dimensional Hilbert space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$. $V$ has an orthonormal basis of eigenvectors of $T$ $\iff$ $T$ is normal.

Proof. ($\Leftarrow$) By Schur's factorization, there exists a basis $\beta$ of $V$ such that $[T]_\beta$ is upper triangular. By Gram-Schmidt, we can turn this basis into an orthonormal one, $Q$, and by studying the QR factorization we see the resulting matrix is still upper triangular. However, since the basis is now orthonormal and $T$ is normal, it follows that $[T]_Q$ is a normal, upper triangular matrix. This necessarily implies $[T]_Q$ is diagonal (comparing the diagonal entries of $[T]_Q [T]_Q^*$ and $[T]_Q^* [T]_Q$ row by row forces the off-diagonal entries to be zero).
($\Rightarrow$) If this is the case, then we have an orthonormal basis $Q$ and a diagonal matrix $\Lambda$ such that $[T]_Q = \Lambda$. Since a diagonal matrix is always normal, it follows that $T$ is a normal operator.

Theorem 36 (Spectral Theorem, $F = \mathbb{R}$ version) Let $V$ be a finite dimensional Hilbert space over $\mathbb{R}$ and $T \in \mathcal{L}(V)$. $V$ has an orthonormal basis of eigenvectors of $T$ $\iff$ $T$ is self-adjoint.

Proof. We follow the proof for the complex case, noting that, since $F = \mathbb{R}$, both Schur's factorization and Gram-Schmidt yield matrices with real entries. Finally, a diagonal matrix with real entries is always self-adjoint (since this only means that it is symmetric). We can also apply the theorem for the complex case and use the properties of self-adjoint operators.
In any case, we then have the following powerful properties:
1. $V = \bigoplus_{i=1}^{k} E_{\lambda_i}$ and $(E_{\lambda_i})^\perp = \bigoplus_{j \neq i} E_{\lambda_j}$ for all $i$
2. If we denote $P_i = P_{E_{\lambda_i}}$, then $P_i P_j = \delta_{ij} P_i$
3. (Spectral resolution of the identity)
$$I_V = \sum_{i=1}^{k} P_i$$
4. (Spectral resolution of $T$)
$$T = \sum_{i=1}^{k} \lambda_i P_i$$
These properties characterize all diagonalizable operators on finite dimensional Hilbert spaces. Some important results that follow from this are:
Theorem 37 (Cayley-Hamilton) $T \in \mathcal{L}(V)$, $V$ a finite dimensional Hilbert space. If $p$ is the characteristic polynomial of $T$, then $p(T) = 0$.

Theorem 38 Let $V$ be a finite dimensional Hilbert space over $\mathbb{C}$ and $T \in \mathcal{L}(V)$ normal. Then there exists a polynomial $p \in \mathbb{C}[x]$ such that $p(T) = T^*$. This polynomial can be found by solving the Lagrange interpolation problem $p(\lambda_i) = \overline{\lambda_i}$, $i = 1, \dots, k$.

We also have the following properties for $T$ normal, which we can now deduce using the spectral decomposition of $T$. These properties basically tell us that, if $T$ is normal, we can operate with it almost as if it were a number, through its spectral representation (a short numerical sketch follows this list):
1. If $p$ is a polynomial, then $p(T) = \sum_{i=1}^{k} p(\lambda_i) P_i$
2. If $T^p = 0$ for some $p \in \mathbb{N}$, then $T = 0$
3. An operator commutes with $T$ $\iff$ it commutes with each $P_i$
4. $T$ has a normal "square root" ($S$ such that $S^2 = T$)
5. $T$ is a projection $\iff$ all its eigenvalues are 0's or 1's
6. $T = -T^*$ (anti-adjoint) $\iff$ all its eigenvalues are purely imaginary numbers
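A short numerical sketch of the spectral resolution and of property 1 above (assuming NumPy; the symmetric matrix below is an arbitrary example of a self-adjoint, hence normal, operator):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((4, 4))
T = B + B.T                                  # symmetric, hence self-adjoint

lam, Q = np.linalg.eigh(T)                   # orthonormal eigenbasis: T = Q diag(lam) Q^T

# Spectral projectors P_i = q_i q_i^T (simple eigenvalues, generically).
P = [np.outer(Q[:, i], Q[:, i]) for i in range(4)]
print(np.allclose(sum(P), np.eye(4)))                          # resolution of the identity
print(np.allclose(sum(l * Pi for l, Pi in zip(lam, P)), T))    # resolution of T

# Property 1: p(T) = sum_i p(lambda_i) P_i for the polynomial p(t) = t^2 + 3t + 1.
lhs = T @ T + 3 * T + np.eye(4)
rhs = sum((l**2 + 3 * l + 1) * Pi for l, Pi in zip(lam, P))
print(np.allclose(lhs, rhs))
```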
5.1 Unitary Operators
Definition 39 An operator $T \in \mathcal{L}(V)$ is said to be orthogonal ($\mathbb{R}$) / unitary ($\mathbb{C}$) if $\|Tv\| = \|v\|$ for all $v \in V$ (this means $T$ is a linear isometry, a "rigid transformation"). A unitary operator can also be characterized as a normal operator with $T^* T = T T^* = I_V$.

Theorem 40 (Mazur-Ulam) If $f$ is an onto isometry such that $f(0) = 0$, then $f$ is a linear isometry (a unitary operator).

Theorem 41 The following statements are equivalent for $T \in \mathcal{L}(V)$:
i) $T$ is an isometry
ii) $T^* T = T T^* = I_V$
iii) $\langle Tu, Tv \rangle = \langle u, v \rangle$ for all $u, v \in V$
iv) If $\beta$ is an orthonormal basis, $T(\beta)$ is an orthonormal basis
v) There exists an orthonormal basis $\beta$ of $V$ such that $T(\beta)$ is an orthonormal basis

Theorem 42 If $\lambda$ is an eigenvalue of an isometry, then $|\lambda| = 1$. $T$ is an isometry $\iff$ $T^*$ is an isometry as well.
5.2 Positive Operators and Square Roots
Definition 43 Let $V$ be a finite dimensional Hilbert space, $T \in \mathcal{L}(V)$. We say $T$ is a positive operator if $T$ is self-adjoint and $\langle Tv, v \rangle \geq 0$ for all $v \in V$.

Remark 44 A matrix $A$ is said to be a positive operator (a positive semidefinite matrix) if $\langle Ax, x \rangle = x^* A x \geq 0$ for all $x \in F^n$.

Remark 45 If $F = \mathbb{C}$, we can remove the assumption that $T$ is self-adjoint (it follows from the condition $\langle Tv, v \rangle \geq 0$ for all $v$).

Remark 46 The operators $T^* T$ and $T T^*$ are always positive. In fact, it can be shown that any positive operator $T$ is of the form $S S^*$. This is a general version of the famous Cholesky factorization for symmetric positive definite matrices.

Proposition 47 $T$ is a positive operator $\iff$ $T$ is self-adjoint and all its eigenvalues are real and non-negative.

Some properties of positive operators:
1. $T, U \in \mathcal{L}(V)$ positive operators, then $T + U$ is positive
2. $T \in \mathcal{L}(V)$ positive $\implies$ $\alpha T$ is positive for all $\alpha \geq 0$
3. $T \in \mathcal{L}(V)$ positive and invertible $\implies$ $T^{-1}$ is positive
4. $T \in \mathcal{L}(V)$ positive $\implies$ $T^2$ is positive (the converse is false in general)
5. $T, U \in \mathcal{L}(V)$ positive operators, then $TU = UT$ implies $TU$ is positive. Here we use heavily that $TU = UT$ implies there is a basis of vectors which are simultaneously eigenvectors of $T$ and $U$.

Definition 48 $T \in \mathcal{L}(V)$. We say $S$ is a square root of $T$ if $S^2 = T$.

We note that, in general, the square root is not unique. For example, the identity has an infinite number of square roots: permutations, reflections and rotations by 180 degrees. However, we can show that, if an operator is positive, then it has a unique positive square root.

Proposition 49 $T \in \mathcal{L}(V)$ is positive $\iff$ $T$ has a unique positive square root.
6 Singular Value Decomposition and the Moore-Penrose Generalized Inverse
6.1 Singular Value Decomposition
The Singular Value Decomposition of $T \in \mathcal{L}(V, W)$ (and the corresponding factorization for matrices) is, without a doubt, one of the most useful results in linear algebra. It is used in applications such as least squares regression, smoothing spline and ridge regression, principal component analysis, matrix norms, noise filtering, low rank approximation of matrices and operators, etc. This decomposition also enables us to define a generalized inverse (also known as the pseudoinverse), and to compute other decompositions, such as the polar decomposition and explicit positive square roots.
Theorem 50 (Singular Value Decomposition, or SVD) Let $V, W$ be finite dimensional Hilbert spaces over $F$, $T \in \mathcal{L}(V, W)$ with rank $r$. Then there exist orthonormal bases $\{v_1, \dots, v_n\}$ of $V$ and $\{u_1, \dots, u_m\}$ of $W$, as well as positive scalars $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$, such that:
$$T v_i = \sigma_i u_i, \quad i \leq r$$
$$T v_i = 0, \quad i > r$$
These scalars are called the "singular values" of $T$. Conversely, if bases and scalars like these exist, then $\{v_1, \dots, v_n\}$ is an orthonormal basis of eigenvectors of $T^* T$ such that the first $r$ are associated to the eigenvalues $\sigma_1^2, \dots, \sigma_r^2$, and the rest are associated to $\lambda = 0$.

Using what we know about positive operators, we can see why the statement of this theorem must always be true. Regardless of what $T$ is, $T^* T$ is a positive operator, and therefore diagonalizable with nonnegative eigenvalues $\sigma_1^2, \dots, \sigma_r^2$ and possibly also $0$. Then we obtain the set $\{u_1, \dots, u_m\}$ by computing $u_i = T v_i / \sigma_i$ for the first $r$ vectors, and then completing it to an orthonormal basis of $W$.

Also, this theorem immediately has a geometric interpretation: by choosing the "right" bases, we know exactly what the action of $T$ on the unit sphere is. Basically, $T$ sends the unit sphere to the boundary of an $r$-dimensional ellipsoid (since it squashes $v_{r+1}, \dots, v_n$ to zero), with axes along the first $r$ $u_i$'s. Also, the biggest axis of this ellipsoid is the one in the direction of $u_1$, and the smallest is the one in the direction of $u_r$.
Finally, we note that this theorem applied to a matrix $A \in M_{m \times n}(F)$ yields the following matrix factorization: if $V$ and $U$ are the unitary matrices with $v_1, \dots, v_n$ and $u_1, \dots, u_m$ as columns, and if $\Sigma$ is the matrix in $M_{m \times n}(F)$ with all zeros except for $\Sigma_{ii} = \sigma_i$ for $i \leq r$, then:
$$A = U \Sigma V^* = \begin{pmatrix} | & & | \\ u_1 & \cdots & u_m \\ | & & | \end{pmatrix} \begin{pmatrix} \sigma_1 & & & 0 \\ & \ddots & & \\ & & \sigma_r & \\ 0 & & & 0 \end{pmatrix} \begin{pmatrix} v_1^* \\ \vdots \\ v_n^* \end{pmatrix}$$
This factorization is known as the Singular Value Decomposition, or SVD factorization, of $A$.
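A sketch of this factorization with NumPy (arbitrary example matrix): it verifies $A = U\Sigma V^*$ and that the singular values are the square roots of the eigenvalues of $A^*A$.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))

U, s, Vh = np.linalg.svd(A, full_matrices=True)   # s holds sigma_1 >= ... >= sigma_r

Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vh))             # A = U Sigma V^*

eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s**2, eig_AtA))                 # sigma_i^2 = eigenvalues of A^* A
```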
We know that, for the system of equations $Ax = b$, the best approximation is given by the solution of $A^* A x = A^* b$. By using the SVD, we can always compute the solution with minimum norm.

Given an SVD for $A$, $A = U \Sigma V^*$, we have the following:
$$\|Ax - b\| = \|U \Sigma V^* x - b\| = \|\Sigma V^* x - U^* b\|$$
since $U$ is a unitary matrix. Therefore, all we need is to minimize $\|\Sigma y - c\|$, and then solve for $x = V y$, where $c = U^* b$. However, it is clear that:
$$\|\Sigma y - c\|^2 = \sum_{i=1}^{r} |\sigma_i y_i - c_i|^2 + \sum_{i=r+1}^{m} |c_i|^2$$
which is minimized precisely when $y_i = \frac{c_i}{\sigma_i}$ for $i \leq r$, and whose minimum value is $\sum_{i=r+1}^{m} |c_i|^2$. If we want the $y$ with minimum norm, all we have to do is make the rest of its coordinates zero.
Now, solving for $x$, if we define:
$$\Sigma^\dagger = \begin{pmatrix} 1/\sigma_1 & & & \\ & \ddots & & \\ & & 1/\sigma_r & \\ & & & 0 \end{pmatrix} \in M_{n \times m}(F)$$
then the solution to this problem is given by:
$$x = V y = V \Sigma^\dagger c = (V \Sigma^\dagger U^*) b$$
From the properties of least squares and this last formula, we already know that the matrix $V \Sigma^\dagger U^*$ does the following:
1. If $b \in \operatorname{Ran}(A)$ (the system is consistent), it gives us the solution to $Ax = b$ with minimum norm. For any $x \in \mathbb{R}^n$, we know we can write $x = P_{\operatorname{Ker}(A)} x + P_{\operatorname{Ker}(A)^\perp} x$. Since $A (P_{\operatorname{Ker}(A)} x) = 0$, $(V \Sigma^\dagger U^*) b$ is the unique solution in $\operatorname{Ker}(A)^\perp$.
2. If $b \notin \operatorname{Ran}(A)$ (the system is inconsistent), it projects $b$ onto $\operatorname{Col}(A)$, and then gives us the unique solution to $Ax = P_{\operatorname{Col}(A)} b$ in $\operatorname{Ker}(A)^\perp$.
3. We can also deduce this from the construction of the SVD, the Fundamental Theorem of Linear Algebra II, and $A^* = V \Sigma^* U^*$: $\{v_1, \dots, v_r\}$ is a basis for $\operatorname{Ker}(A)^\perp$, $\{v_{r+1}, \dots, v_n\}$ for $\operatorname{Ker}(A)$, $\{u_1, \dots, u_r\}$ for $\operatorname{Ran}(A)$ and $\{u_{r+1}, \dots, u_m\}$ for $\operatorname{Ran}(A)^\perp$.
6.2 The Moore-Penrose Generalized Inverse
Theorem 51 (Moore-Penrose Generalized Inverse) Given $V, W$ finite dimensional Hilbert spaces over $F$ and $T \in \mathcal{L}(V, W)$ with rank $r$, there exists a unique linear operator, which we call the Moore-Penrose generalized inverse (or pseudoinverse, for short) $T^\dagger : W \to V$, such that, if $S$ is the restriction of $T$ to $\operatorname{Ker}(T)^\perp$, then:
$$T^\dagger|_{\operatorname{Ran}(T)} = S^{-1}, \qquad T^\dagger|_{\operatorname{Ran}(T)^\perp} = 0$$
As an extension of this inverse, it has the following properties:
$$T^\dagger T = P_{\operatorname{Ker}(T)^\perp}, \qquad T T^\dagger = P_{\operatorname{Ran}(T)}$$
Finally, if we have an SVD of $T$, the pseudoinverse $T^\dagger$ can be computed as:
$$T^\dagger u_j = \frac{1}{\sigma_j} v_j, \ \ j \leq r; \qquad T^\dagger u_j = 0, \ \ j > r$$
In matrix form, if $A = [T]_{\beta_V}^{\beta_W}$ and $A^\dagger = [T^\dagger]_{\beta_W}^{\beta_V}$, then:
$$A^\dagger = V \Sigma^\dagger U^*$$
The following properties can be obtained for the SVD and the pseudoinverse:
1. Let $A \in M_{m \times n}(\mathbb{C})$. Then $A$, $A^T$ and $A^*$ have the same singular values. Also, $(A^\dagger)^* = (A^*)^\dagger$ and $(A^\dagger)^\dagger = A$.
2. (Moore-Penrose conditions) Let $T \in \mathcal{L}(V, W)$. If an operator $U$ is such that: (a) $TUT = T$, (b) $UTU = U$ and (c) $UT$ and $TU$ are self-adjoint, then $U = T^\dagger$. These conditions are a characterization of the pseudoinverse of $T$ as a linear operator.
3. We can check that the general formula for the projection onto $\operatorname{Ran}(A)$, which we calculated with the adjoint matrix, becomes:
$$P_{\operatorname{Col}(A)} = A (A^* A)^\dagger A^* = A A^\dagger = U \Sigma \Sigma^\dagger U^*$$
where $\Sigma \Sigma^\dagger$ is a diagonal matrix with 1's in the first $r$ diagonal entries and 0's in the rest.
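A sketch of the pseudoinverse and the Moore-Penrose conditions using NumPy's `pinv` (the example matrix is arbitrary and deliberately rank deficient):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # twice the first row: rank(A) = 2
              [1.0, 0.0, 1.0]])

Ad = np.linalg.pinv(A)           # A-dagger, computed internally from an SVD

# Moore-Penrose conditions (a), (b), (c):
print(np.allclose(A @ Ad @ A, A))
print(np.allclose(Ad @ A @ Ad, Ad))
print(np.allclose((A @ Ad).T, A @ Ad), np.allclose((Ad @ A).T, Ad @ A))

# A A-dagger projects onto Col(A); Ad @ b is the minimum-norm least squares solution.
b = np.array([1.0, 1.0, 1.0])
x = Ad @ b
print(np.allclose(A.T @ (b - A @ x), 0))
```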
6.3 The Polar Decomposition
Another useful decomposition that can be obtained from the SVD is the polar decomposition. It is a generalization of the decomposition of a complex number $z = |z|\, e^{i \arg(z)}$:

Theorem 52 Let $A \in M_{n \times n}(F)$. Then there exist a unitary matrix $W$ and a positive matrix $P$ such that $A = WP$. If $A$ is invertible, then this decomposition is unique. One way to derive it is by using $P = |A| = \sqrt{A^* A}$. Given an SVD $A = U \Sigma V^*$, then $W = U V^*$ and $P = V \Sigma V^*$.

Proof. $A = U \Sigma V^* = (U V^*)(V \Sigma V^*) = WP$. As a product of unitary matrices, $W$ is unitary, and the positivity of $P$ follows from the fact that $\Sigma$ is diagonal with non-negative entries.

Some useful results that follow from this decomposition are:
1. $A = WP$ is normal $\iff$ $W P^2 = P^2 W$
2. Using that a positive matrix has a unique positive square root, we can use the previous result to conclude that $A = WP$ is normal $\iff$ $WP = PW$
3. If $A = WP$, then $\det(P) = |\det A|$ and $\det(W) = e^{i \arg(\det(A))}$

The polar decomposition, which can be extended to linear operators in infinite dimensions, basically tells us that we can view any linear operator as the composition of a partial isometry and a positive operator.
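A sketch of Theorem 52 with NumPy (arbitrary example matrix): the polar factors $W = UV^*$ and $P = V\Sigma V^*$ are assembled from an SVD.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))

U, s, Vh = np.linalg.svd(A)
W = U @ Vh                           # unitary (orthogonal) factor
P = Vh.T @ np.diag(s) @ Vh           # positive factor, P = sqrt(A^T A)

print(np.allclose(A, W @ P))
print(np.allclose(W.T @ W, np.eye(4)))            # W is orthogonal
print(np.all(np.linalg.eigvalsh(P) >= -1e-12))    # P is positive semidefinite
```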
7 Matrix Norms and Low Rank Approximation
Matrices are very versatile: they can be seen as rearranged vectors in $\mathbb{R}^{m \times n}$, we can identify a group of matrices with some mathematical object, or we can just take them as members of the vector space of linear transformations from $F^n$ to $F^m$. In any case, it is very useful to have a notion of what a matrix norm is.
7.1 The Frobenius Norm
If we consider matrices as members of $\mathbb{R}^{m \times n}$, it is then natural to endow them with the usual Euclidean norm and inner product:
$$\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}, \qquad \langle A, B \rangle_F = \sum_{i,j} a_{ij} b_{ij}$$
Or equivalently, we can write:
$$\|A\|_F = \sqrt{\operatorname{tr}(A^* A)}, \qquad \langle A, B \rangle_F = \operatorname{tr}(A B^*)$$
In any case, we conclude that the space $(M_{m \times n}(F), \|\cdot\|_F)$ is a Hilbert space. This norm has the following properties:
1. $\|Ax\|_2 \leq \|A\|_F \|x\|_2$ (Lipschitz condition). In particular, this tells us any linear operator in $M_{m \times n}(F)$ is continuous.
2. For $A$ and $B$ such that $AB$ makes sense, $\|AB\|_F \leq \|A\|_F \|B\|_F$
3. Given an SVD $A = U \Sigma V^*$, $\|A\|_F^2 = \operatorname{tr}(V \Sigma^* \Sigma V^*) = \operatorname{tr}(\Sigma^* \Sigma) = \sum_{i=1}^r \sigma_i^2$
4. Given $A$ normal (orthogonally diagonalizable), we can reinterpret the spectral decomposition as follows: $A = Q \Lambda Q^* = \sum_{i=1}^n \lambda_i q_i q_i^* = \sum_{i=1}^n \lambda_i Z_i$, where $\{Z_i\}_{i=1}^n$ is an orthonormal set in $(M_{n \times n}(F), \|\cdot\|_F)$. Also, given an SVD of $A \in M_{m \times n}(F)$, $A = U \Sigma V^* = \sum_{i=1}^r \sigma_i (u_i v_i^*) = \sum_{i=1}^r \sigma_i Z_{ii}$, where again the $Z$ matrices are orthonormal. An orthonormal basis for $M_{m \times n}(F)$ is then given by $Z_{ij} = u_i v_j^*$.
5. (Pythagoras' Theorem) $\operatorname{Ran}(A) \perp \operatorname{Ran}(B) \implies \langle A, B \rangle_F = 0$, and $\|A + B\|_F^2 = \|A\|_F^2 + \|B\|_F^2$ (not true for general matrix norms)
6. (Pseudoinverse, revisited) $A^\dagger$ is the matrix that minimizes $\|AX - I\|_F$. That is, it is the best approximation to the inverse of $A$ in the Frobenius norm.
7. (Best approximation by unitary matrices) If $Q_0 \in M_{n \times n}(F)$ has SVD $Q_0 = U \Sigma V^*$, then $\min\{\|Q - Q_0\|_F : Q \text{ unitary}\} = \|U V^* - Q_0\|_F$, i.e. the minimizer is $Q = U V^*$.
7.2 Operator Norms
If we consider matrices as operators in $\mathcal{L}(F^n, F^m)$, it is then natural to use the corresponding operator norm. This norm depends on the norms we choose for $F^n$ and $F^m$, and it measures the maximum distortion of the unit sphere under the action of $A$. That is, given $A : (F^n, \|\cdot\|_a) \to (F^m, \|\cdot\|_b)$:
$$\|A\|_{a,b} = \max_{\|x\|_a = 1} \|Ax\|_b$$
Then, by definition of an operator norm, it follows that $\|Ax\|_b \leq \|A\|_{a,b}\, \|x\|_a$ for all $x \in F^n$.

This maximum is always attained at some point of the sphere, since $\|Ax\|_b$ is a continuous function and the sphere is compact. Although we can potentially use any norms we want on the domain and the range of the transformation, it is often the case that $\|\cdot\|_a$ and $\|\cdot\|_b$ are both $p$-norms with the same value of $p$. In this case, we talk about the $p$-norm of $A$, written $\|A\|_p$.

An important question that arises then is how to calculate this operator norm. For general $p$ this becomes a constrained optimization problem (often a nontrivial one). However, for some important cases, we can again say something in terms of the SVD or in terms of the entries of $A$.
Properties of $\|\cdot\|_2$:
1. $\|A\|_2 = \max_i \sigma_i = \sigma_1$: as we know, this is a very significant fact, deeply tied to the geometric interpretation of the SVD. As mentioned before, the SVD reveals that the unit sphere is mapped to an $r$-dimensional ellipsoid whose major axis has length $\sigma_1$.
2. $\min\{\|Ax\|_2 : x \in \operatorname{Ran}(A^*),\ \|x\|_2 = 1\} = \sigma_r$
3. $\|A\|_2 = \max_{\|x\|_2 = 1} \max_{\|y\|_2 = 1} y^* A x = \max_{\|x\|_2 = 1} \max_{\|y\|_2 = 1} \langle Ax, y \rangle$
4. $\|A\|_2 = \|A^T\|_2 = \|A^*\|_2$
5. For $U$ and $V$ unitary, $\|U^* A V\|_2 = \|A\|_2$
6. $A$ invertible, then $\|A^{-1}\|_2 = \frac{1}{\sigma_n}$. In general, we have $\|A^\dagger\|_2 = \frac{1}{\sigma_r}$.

We also have a result for the 1- and $\infty$-norms:
$$\|A\|_1 = \max_{j = 1, \dots, n} \sum_{i=1}^m |a_{ij}| \quad \text{(maximum $\ell^1$ norm of the columns)}$$
$$\|A\|_\infty = \max_{i = 1, \dots, m} \sum_{j=1}^n |a_{ij}| \quad \text{(maximum $\ell^1$ norm of the rows)}$$
We observe that $\|A\|_1 = \|A^*\|_\infty$.
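These formulas are easy to check numerically (a sketch assuming NumPy; arbitrary example matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))
sigma = np.linalg.svd(A, compute_uv=False)

print(np.isclose(np.linalg.norm(A, 2), sigma[0]))                          # ||A||_2 = sigma_1
print(np.isclose(np.linalg.norm(A, 1), np.abs(A).sum(axis=0).max()))       # max column sum
print(np.isclose(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max()))  # max row sum
print(np.isclose(np.linalg.norm(np.linalg.pinv(A), 2), 1 / sigma[-1]))     # ||A^dagger||_2 = 1/sigma_r
```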
7.3 Low Rank Matrix Approximation
We have now seen that the SVD provides us with tools to compute matrix norms and to derive results about the vector space of matrices of a given size. In a way that is completely analogous to the theory of general Hilbert spaces, this leads us to a theory of matrix approximation, obtained by eliminating the singular values that are not significant and therefore producing an approximation of $A$ of lower rank. This can immediately be seen as truncating a "Fourier series" of $A$, using the orthonormal basis suggested by our SVD.

Let $A = U \Sigma V^* = \sum_{i=1}^r \sigma_i (u_i v_i^*) = \sum_{i=1}^r \sigma_i Z_{ii}$, where $\{Z_{ij} = u_i v_j^*\}$ is the orthonormal basis of $M_{m \times n}(F)$ (with respect to the Frobenius inner product) introduced before. Then it becomes evident that:
$$\sigma_i = \langle A, Z_{ii} \rangle_F$$
That is, the $\sigma_i$ (and all of the entries of $\Sigma$) are the Fourier coefficients of $A$ in this particular orthonormal basis. We notice that, since the $Z_{ij}$ are outer products of two vectors, $\operatorname{rank}(Z_{ij}) = 1$ for all $i, j$.

Now, it is often the case that $A$ will be "noisy", either because it represents the pixels of a blurred image, or because it is a transformation that involves some noise. However, as in other instances of filtering or approximation schemes, we expect the noise to be of "high frequency"; equivalently, we expect the signal-to-noise ratio to decrease in proportion to $\sigma_i$. Therefore, by truncating the series after a certain $\sigma_k$, the action of $A$ remains almost intact, but we often get rid of a significant amount of "noise". Also, using results on abstract Fourier series, we can derive the fact that this is the best rank $k$ approximation to $A$ in the Frobenius norm, that is:
$$A_k = \sum_{i=1}^k \sigma_i Z_{ii}, \qquad \|A_k\|_F^2 = \sum_{i=1}^k \sigma_i^2, \qquad \|A - A_k\|_F = \min_{\operatorname{rank}(B) = k} \|A - B\|_F$$
We also have the following results:
1. Defining the "error matrix" $E_k = A - A_k$, it follows by Pythagoras' theorem that $\|A_k\|_F^2 = \|A\|_F^2 - \|E_k\|_F^2$. We can also define a relative error,
$$R_k^2 = \frac{\|E_k\|_F^2}{\|A\|_F^2} = \frac{\sum_{i=k+1}^r \sigma_i^2}{\sum_{i=1}^r \sigma_i^2}$$
2. The matrix $A_k$ is the result of $k$ successive rank 1 approximations to $A$.
3. $A_k$ is also an optimal approximation under the $\|\cdot\|_2$ norm, with minimum value $\|A - A_k\|_2 = \sigma_{k+1}$.
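A truncated-SVD sketch of these results (assuming NumPy; arbitrary example matrix): it forms $A_k$ and checks the Frobenius and spectral errors.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 6))
k = 2

U, s, Vh = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vh[:k, :]           # A_k = sum_{i<=k} sigma_i u_i v_i^*
E_k = A - A_k

print(np.isclose(np.linalg.norm(E_k, 'fro')**2, np.sum(s[k:]**2)))  # Frobenius error
print(np.isclose(np.linalg.norm(E_k, 2), s[k]))                     # spectral error = sigma_{k+1}
print(np.linalg.matrix_rank(A_k) == k)
```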
8 Generalized Eigenvalues, the Jordan Canonical Form and $e^A$

The theory of generalized eigenspaces is a natural extension of the results for diagonalizable operators and their spectral decomposition. Although the SVD does provide some of these properties, it is desirable to have a decomposition of the space as a direct sum of invariant subspaces. This also leads to the Jordan canonical form, a block diagonal matrix with which we can easily operate and calculate matrix powers, power series and important operators like the exponential operator.
8.1 The Generalized Eigenspace $K_\lambda$
Definition 53 Let $V$ be a vector space over $\mathbb{C}$, $T \in \mathcal{L}(V)$. For $\lambda$ an eigenvalue of $T$, we define the algebraic multiplicity $am(\lambda)$ as the multiplicity of the root $\lambda$ of the characteristic polynomial $p(\lambda) = \det(T - \lambda I)$, and the geometric multiplicity $gm(\lambda)$ as the dimension of the eigenspace $E_\lambda = \operatorname{Ker}(T - \lambda I)$.

Let $V$ be a vector space over $\mathbb{C}$, $T \in \mathcal{L}(V)$ a linear operator. We know that $T$ has a set of distinct eigenvalues $\{\lambda_i\}_{i=1}^k$, and either the algebraic and geometric multiplicities of each $\lambda_i$ coincide ($T$ is diagonalizable) or, for some $\lambda$, $gm(\lambda) < am(\lambda)$. In the latter case, the problem is that the eigenspaces fail to span the entire space. We can then consider powers of the operator $(T - \lambda I)$, and since $\operatorname{Ker}((T - \lambda I)^m) \subseteq \operatorname{Ker}((T - \lambda I)^{m+1})$ for all $m$ (the space "grows"), we can define the following:
$$K_\lambda = \{v \in V : (T - \lambda I)^m v = 0 \text{ for some } m \in \mathbb{N}\}$$
These generalized eigenspaces have the following properties:
1. $E_\lambda \subseteq K_\lambda$ for every eigenvalue $\lambda$ of $T$ (by definition)
2. $K_\lambda \subseteq V$ is a subspace, invariant under the action of $T$: $T(K_\lambda) \subseteq K_\lambda$
3. If $\dim(V) < \infty$ then $\dim(K_\lambda) = am(\lambda)$
4. $K_{\lambda_1} \cap K_{\lambda_2} = \{0\}$ for $\lambda_1 \neq \lambda_2$
Theorem 54 (Generalized Eigenvector Decomposition) Let $V$ be a vector space over $\mathbb{C}$, $T \in \mathcal{L}(V)$. If $\{\lambda_i\}_{i=1}^k$ are the eigenvalues of $T$ and the characteristic polynomial $p(\lambda)$ splits over the field $F$, then
$$V = \bigoplus_{i=1}^k K_{\lambda_i}$$
Proof.
Theorem 55 (Jordan Canonical Form Theorem) Under the conditions of the generalized eigenvector decomposition, there exists a basis $\beta$ such that $[T]_\beta$ is block diagonal and its blocks are Jordan canonical forms, that is:
$$[T]_\beta = \begin{pmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & J_k \end{pmatrix}$$
where each $J_i$ is a Jordan canonical form. A Jordan canonical form is in turn a block diagonal matrix, composed of Jordan blocks, which are matrices of the form:
$$J^{(j)}_{\lambda_i} = \begin{pmatrix} \lambda_i & 1 & & 0 \\ 0 & \lambda_i & \ddots & \\ \vdots & & \ddots & 1 \\ 0 & 0 & \cdots & \lambda_i \end{pmatrix}$$
The number of blocks in $J_i$ coincides with the geometric multiplicity of $\lambda_i$ ($\dim(E_{\lambda_i})$), and the maximum size of these blocks is the first $m$ for which $\operatorname{Ker}((T - \lambda_i I)^m) = \operatorname{Ker}((T - \lambda_i I)^{m+1}) = K_{\lambda_i}$.
Proof.
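Because the Jordan form is about exact algebraic structure, a symbolic sketch is more appropriate than floating point (assuming SymPy is available; the matrix is an arbitrary example with a defective eigenvalue):

```python
import sympy as sp

# Eigenvalue 2 with algebraic multiplicity 3 but geometric multiplicity 2.
A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 0, 2]])

P, J = A.jordan_form()            # A = P J P^{-1}, J built from Jordan blocks
sp.pprint(J)
print(P * J * P.inv() == A)

# The Jordan form also gives closed forms for functions of A, e.g. the exponential e^A.
sp.pprint(A.exp())
```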
8.2 A method to compute the Jordan Form: The points diagram
8.3 Applications: Matrix Powers and Power Series
9 Nilpotent Operators
10 Other Important Matrix Factorizations
1. LU Factorization
2. Cholesky
11 Other Topics (which appear in past exams)
1. Limits with Matrices
2. Symplectic Matrices
3. Perron-Frobenius and the Theory of Matrices with Positive Entries.
4. Markov Chains
5. Graph Adjacency Matrices and the Graph Laplacian. Dijkstra and Floyd-Warshall.
6. Matrices of rank $k$ and the Sherman-Morrison-Woodbury formula (for the inverse of rank $k$ updates of a matrix)
12 Yet more Topics I can think of
1. Symmetric Positive Semidefinite Matrices and the Variance-Covariance Matrix
2. Krylov Subspaces: CG, GMRES and Lanczos Algorithms
3. Toeplitz and Wavelet Matrices
4. Polynomial Interpolation