Professional Documents
Culture Documents
Math 462
Note to Students
Dont try to use this as a replacement for your textbook. These are notes and just provide
an outline of the subject material, not a complete presentation. I have provided a copy to
you to use only as a study aid. Their real purpose is to remind me what to talk about about
during my class lectures. They are loosely based on the textbook Linear Algebra Done Right
by Sheldon Axler, and contain some material from other sources as well, but the presentation
in the textbook is more thorough. You should read the textbook in preparation for class, and
just use these notes to aide your own note-taking during class.
2012. This work is licensed under the Creative Commons Attribution-NoncommercialNo Derivative Works 3.0 United States License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-nd/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
Page ii
Contents
Please remember that this is a draft document, and so it is incomplete and buggy. More
sections will be added as the course progresses, the ordering of topics is subject to change,
and errors will be corrected as I become aware of them. This version was last LATEXed
on November 3, 2012.
Front Cover
i
Table of Contents
iii
Symbols Used
v
1 Complex Numbers
1
2 Vectors in 3-Space
9
3 Matrices and Determinants 15
4 Eigenstuff
27
5 Inner Products and Norms 33
6 Similar Matrices
45
7 Previewing the SVD
47
8 Example: Metabolic Flux 51
9 Vector Spaces
59
10 Subspaces
67
11 Polynomials
73
12 Span and Linear Independence
81
13 Bases and Dimension
87
14 Linear Maps
95
15 Matrices of Linear Maps 103
16 Invertibility of Linear Maps107
Page iii
Math 462
CONTENTS
Page iv
Symbols Used
(v1 , . . . , vn )
(v1 , . . . , vn |p1 , . . . , pm )
hv, wi
kvk
V W
C
deg(p)
det(A)
diagonal(x1 , . . . , xn )
dim(V)
F
Fn
F
i
length(B)
L(V)
L(V, W)
M(T, B)
P(F)
Pm (F)
R
span(v1 , . . . , vn )
U , V, W
V(F)
z or z
T
List containing v1 , . . . , vn
(v1 , . . . ) with (p1 , . . . ) removed
inner product of two vectors
norm of a vector
Direct Sum of V and W
Field of Complex Numbers
Degree of polynomial p
Determinant of A
Diagonal matrix
Dimension of a Vector Space
Either R or C
Set of tuples (x1 , . . . , xn ), xi F
Set of sequences over F
1
Length of a list B
Set of linear operators T : V 7 V
Set of linear maps T : V 7 W
Matrix of a linear map T
with respect to a basis B
Set of polynomials over F
Polynomials of degree m
Field of Real Numbers
Span of a list of vectors
Vector Spaces
Vector Space over F
Complex conjugate of z (scalar or vector)
Adjoint of T
Page v
Math 462
CONTENTS
Working across the chalk board from right to left, an intrepid linear algebra
teacher enters a fractal dimensioned vector space in search of the elusive x, braving
Koch curves, PQ waves, and buggy lecture notes, demonstrating Bruces law of
learning: No learning takes place after Thanksgiving.1
Page vi
Topic 1
Complex Numbers
Definition 1.1 Let a, b R. Then a Complex Number is an ordered
pair
z = (a, b)
(1.1)
with the following properties:
1. Complex Addition:
z + w = (a + c, b + d)
(1.2)
2. Complex Multiplication:
z w = (ac bd, ad + bc)
(1.3)
(1.6)
Page 1
Math 462
(1.7)
although the expression on the left is a real number and the expression on
the right is a complex number. This works because the imaginary part of
the right hand side is zero; hence we can use x and (x, 0) interchangeably.
Definition 1.7 The imaginary axis is the set of complex numbers
{z = (0, y)|y R}
(1.8)
(1.9)
To see why this works, let u = (x, 0) be any point on the real axis. Then
by (1.3),
uz = (x, 0) (a, b) = (ax 0b, bx 0a) = (ax, bx) = x(a, b) = xz (1.10)
Definition 1.9 We use the symbol i to denote the complex number
i = (0, 1)
(1.11)
Square Roots of Negative Real Numbers. The motivation for equation 1.3 is the following. Since i = (0, 1) by (1.11), then by 1.3,
i2 = (0, 1) (0, 1)
(1.12)
(1.13)
= (1, 0)
(1.14)
= 1
(1.15)
i = 1
Page 2
(1.16)
Math 462
(1.17)
= (a, 0) + (b, 0)
(1.18)
= a(1, 0) + b(0, 1)
(1.19)
= a + bi
(1.20)
where we have used the notation (1.7) to write 1 = (1, 0) and (1.11) to
write i = (0, 1).
Theorem 1.11 Let u = a + bi and v = c + di, where a, b, c, d R. Then
the complex number uv can be computed using the normal rules of multiplication over R supplemented by the equation i2 = 1.
Proof. By equation (1.3)
uv = (a, b) (c, d)
(1.21)
(1.22)
(1.23)
(1.24)
2
(1.25)
(1.26)
The proof of the following theorem follows immediately form the properties
of R.
Theorem 1.12 Properties of Complex Numbers
1. Closure. The set C is closed under addition and multiplication, i.e.,
whenever w, z C it follows that w + z C and wz C.
2. Commutative Property.
w+z =z+w
wz = zw
w, z C
(1.27)
3. Associative Property:
(u + v) + w = u + (v + w)
(uv)w = u(vw)
Last revised: November 3, 2012
u, v, w C
(1.28)
Page 3
Math 462
(1.30)
(1.31)
(1.32)
z(1/z) = (1/z)z = 1
(1.33)
7. Distributive Property.
u(w + z) = uw + uz, u, w, z C
(1.34)
Math 462
Using the complex conjugate gives us a way to extend the way we factor
the difference of two squares. Recall from algebra that if a, b R then
a2 b2 = (a b)(a + b)
(1.36)
(1.37)
(1.38)
|z| = zz = a2 + b2
Theorem 1.18 Properties of the Complex Conjugate
z + z = 2Re z, z C
(1.39)
z z = 2iIm z, z C
(1.40)
z + w = z + w , z, w C
(1.41)
zw = (z )(w ), z, w C
(1.42)
(z ) = z, z C
(1.43)
|wz| = |w||z|, w, z C
(1.44)
(1.45)
where the quadrant is determined by the location of the point (a, b) in the
complex plane.
Since the distance between the origin and (a, b) is a2 + b2 = |z| we have
Theorem 1.20 Eulers Formula
a + bi = |z|(cos + i sin ) = |z|ei
(1.46)
where = Ph(z).
Last revised: November 3, 2012
Page 5
Math 462
Figure 1.1: Illustration of a complex number and its conjugate, showing the phase () and absolute value (r) of both z and z. [Attribution: Wikimedia Commons, Creative Commons Attribution-Share Alike
3.0 Unported license, by Oleg Alexandrov, http://en.wikipedia.org/
wiki/File:Complex_conjugate_picture.svg.]
(1.47)
Page 6
Math 462
i.
= cos + i sin
2
2
+ 2k + i sin
+ 2k
= cos
2
2
i(/2+2k)
=e
(1.49)
(1.50)
(1.51)
(1.52)
where the last two lines hold for any integer value of k. Hence
1/2
i = ei(/2+2k)
= ei(/4+k)
+ k + i sin
+ k , k = 0, 1
= cos
4
4
2
2 2
2
=
+i
,
i
2
2
2
2
(1.53)
(1.54)
(1.55)
(1.56)
In fact, (1.55) is valid for all integer values of k, not just k = 0, 1; but it
will only give two unique results, as the result for k will be the same as
for k 2, for all k, because adding any multiple of 2 does not change the
value of the trigonometric function.
Page 7
Math 462
Page 8
Topic 2
Vectors in 3-Space
Definition 2.1 A Euclidean 3-vector v is object with a magnitude and
direction which we will denote by the ordered triple
v = (x, y, z)
(2.1)
x2 + y 2 + z 2
(2.2)
This definition is motivated by the fact that v is the length of the line
segment from the origin to the point P = (x, y, z) in Euclidean 3-space.
A vector is sometimes represented geometrically by an arrow from the origin
to the point P = (x, y, z), and we will sometimes use the notation (x, y, z)
to refer either to the point P or the vector v from the origin to the point
P . Usually it will be clear from the context which we mean.
Definition 2.2 The set of all Euclidean 3-vectors is isomorphic to the Euclidean 3-space (which we typically refer to as R3 ).
If you are unfamiliar with the term isomorphic, dont worry about it; just
take it to mean in one-to-one correspondence with, and that will be
sufficient for our purposes.
Definition 2.3 Let v = (x, y, z) and w = (x0 , y 0 , z 0 ) be Euclidean 3-vectors.
Then the angle between v and w is defined as the angle between the line
segments joining the origin and the points P = (x, y, z) and P 0 = (x0 , y 0 , z 0 ).
Page 9
Math 462
(2.3)
where v = (x, y, z) and w = (x0 , y 0 , z 0 ), and scalar multiplcation (multiplication by a real number) by
kv = (kx, ky, kz)
(2.4)
Theorem 2.4 The set of all Euclidean vectors is closed under vector addition and scalar multiplication.
Definition 2.5 Let v = (x, y, z), w = (x0 , y 0 , z 0 ) be Euclidean 3-vectors.
Their dot product is defined as
v w = xx0 + yy 0 + zz 0
(2.5)
(2.6)
For complex vectors, we replace the transpose with the Hermitian Conjugate:
v w = v w
(2.7)
Alternative notations for the inner product are
v w = hv, wi
= hv|wi
(2.8)
(2.9)
Note that for complex vectors the dot products does not in general commute.
Theorem 2.7 Let be the angle between the line segments from the origin
to the points (x, y, z) and (x0 , y 0 , z 0 ) in Euclidean 3-space. Then
v w = |v||w| cos
(2.10)
Math 462
(2.12)
Definition 2.10 The standard basis vectors for Euclidean 3-space are
the vectors
i =(1, 0, 0)
(2.13)
j =(0, 1, 0)
(2.14)
k =(0, 0, 1)
(2.15)
(2.16)
(2.17)
(2.18)
i.e., there is some non-trivial linear combination of the vectors that sums
to the zero vector.
If no such numbers exist the vectors are said to be linearly independent,
i.e., no non-trivial linear combination of the vectors sums to the zero vector.
Definition 2.14 The span of a set of m-dimensional vectors v1 , v2 , . . . , vn
is the subset of Rm formed by all possible linear combinations of the vi .
(
)
n
X
span(v1 , v2 , . . . , vn ) = v v =
ai vi , a1 , a2 , F
(2.19)
i=1
(2.20)
Page 11
Math 462
a = (1, 2, 1)
b = (10, 0, 10)
c = (24 2, 2, 24 2)
(2.21)
(2.22)
(2.23)
Then a, b, c R3 (m = 3).
If we define
1 1 1
, ,
2
2 2
1
1
e2 = , 0,
2
2
e1 =
(2.24)
(2.25)
Then
a = 2e1
b = 10 2e2
c = 2 2e1 + 24 2e2
(2.26)
(2.27)
(2.28)
v1
|v1 |
(2.29)
(2.30)
f
|f |
(2.31)
Math 462
(2.32)
f
|f |
(2.33)
and continue.
Example 2.2 Calculate a basis for the set S in the previous example.
We start by calculating
|a| =
1+2+1=2
(2.34)
Hence
a
|a|
1
= (1, 2, 1)
2
1 1 1
=
, ,
2
2 2
(2.35)
e1 =
(2.36)
(2.37)
Next,
f = b (e1 b)e1
(2.38)
1 1 1
, ,
2
2 2
1 1 1
= (10, 0, 10) 0
, ,
2
2 2
= (10, 0, 10)
1 1 1
(10, 0, 10)
, ,
2
2 2
(2.40)
= (10, 0, 10)
|f | = 200 = 10 2
f
1
e2 =
= (10, 0, 10) =
|f |
10 2
Last revised: November 3, 2012
(2.39)
(2.41)
(2.42)
1
1
, 0,
2
2
(2.43)
Page 13
Math 462
=(24 2, 2, 24 2)
1 1 1
1 1 1
, ,
, ,
(24 2, 2, 24 2)
2
2
2 2
2 2
1
1
1
1
(24 2, 2, 24 2)
, 0,
, 0,
2
2
2
2
=(24 2, 2, 24 2)
1 1 1 1
1
, ,
24 2 , 0,
2 2
2
2 2
2
2
=(24 2, 2, 24 2)
!
!
24 2
2 2 2 2 2 2
24 2
, ,
, 0,
2
2
2
2
2
(2.44)
(2.45)
(2.46)
(2.47)
(2.48)
(2.49)
Since f = 0 and there are no more vectors we are done. The basis is {e1 , e2 }
Page 14
Topic 3
Matrices and
Determinants
Definition 3.1 An m n (or m by n) matrix A is a rectangular array of
number with m rows and n columns. We will denote the number in the ith
row and j th column as aij
A= .
(3.1)
..
..
.
am1
am2
amn
(3.2)
[aij ]T = [aji ]
(3.3)
or
Remark: The transpose of an m n matrix is an n m matrix.
Definition 3.3 The Hermitian Conjugate or Conjugate Transpose
or Hermitian Adjoint Matrix or just adjoint matrix, denoted by AH
or A of a matrix A is the transpose of the complex conjugate of the matrix,
[aH
ij ] = [aji ]
(3.4)
Page 15
Math 462
5 + 7i
i
3 2i
17
(3.5)
i
17
(3.6)
5 7i
3 + 2i
(3.7)
Definition 3.5 The inner product between two complex vectors is defined as
v w
(3.8)
This reduces to the ordinary dot product when the vectors are real. When
the inner product is complex we may use it to define a complex cosine,
where
cos = v w
(3.9)
Definition 3.6 A matrix A is called Hermitian or Self Adjoint if
AH = A
(3.10)
5
6 + 7i
6 7i
17
(3.11)
is self-adjoint.
Remark 3.7 Do not confuse the adjoint matrix with the Classical adjoint defined below in definition 3.39, as they are not related.
Theorem 3.8 The diagonal entries of a self-adjoint matrix are real.
Column and Row Vectors. We will sometimes represent the vector
v = (x, y, z) by its 3 1 column-vector representation
x
v = y
(3.12)
z
or its 1 3 row-vector representation
vT = x
Page 16
(3.13)
Math 462
a11 + b11 b22 + b12
b11 b12
a11 a12
a21 a22 b21 b22 a21 + b21 a22 + b22
+
..
..
..
.
.
.
(3.14)
Matrices that have different sizes cannot be added.
Definition 3.10 A square matrix is any matrix with the same number
of rows as columns. The order of the square matrix is the number of rows
(or columns).
Definition 3.11 The column (row) rank of a matrix is the dimension
of its column (row) space.
1. Row rank = column rank (follows from the SVD, to be discussed
later)
2. For a square matrix, rank order
3. A matrix is said to be of full rank if it has the maximum possible
rank, i.e., min(n, m) for an m n matrix.
Definition 3.12 Let A be a square matrix. A submatrix of A is the
matrix A with one (or more) rows and/or one (or more) columns deleted.
Example 3.3 Let
1
A = 5
9
2
6
10
3
7
11
4
8
12
(3.15)
n
X
(3.17)
i=1
for any k = 1, .., n, where by A0ik we mean the submatrix of A with the
ith row and k th column deleted. (The choice of which k does not matter
Last revised: November 3, 2012
Page 17
Math 462
a12
a22
(3.18)
In particular,
a
c
b
= ad bc
d
(3.19)
and
A
D
G
B
E
H
C
E
F = A
H
I
D
F
B
G
I
D
F
+ C
G
I
E
H
(3.20)
Definition 3.14 Let A = [aij ] be any square matrix of order n. Then the
cofactor of aij , denoted by cof aij , is the (1)i+j det Mij where Mij is
the submatrix of A with row i and column j removed.
Example 3.4 Let
1
A = 4
7
2
5
8
3
6
9
(3.21)
Then
4
cof(a12 ) = (1)1+2
7
6
= (1)(36 42) = 6
9
(3.22)
n
X
j=1
n
X
aij cof(aij )
(3.23)
aji cof(aji )
(3.24)
j=1
Page 18
Math 462
(3.25)
s(j1 , j2 ) =
(3.27)
1p<q2
= s(1, 2) = sign (2 1) = 1
(3.28)
s(2, 1) = sign (1 2) = 1
(3.29)
For n = 3,
Y
s(j1 , j2 , j3 ) =
sign (jq jp )
(3.30)
1p<q3
(3.31)
(3.32)
(3.33)
(3.34)
(3.35)
(3.36)
(3.37)
(3.38)
j1 ,...,jn
s(j1 , . . . , jn )
j1 ,...,jn
n
Y
ai,ji
(3.39)
i=1
Page 19
Math 462
A 2 2 determinant is:
det(A) = s(1, 2)a11 a22 + s(2, 1)a12 a21
= a11 a22 a12 a21
(3.40)
(3.41)
A 3 3 determinant is:
det A = s(1, 2, 3)a11 a22 a33 + s(3, 1, 2)a13 a21 a22 +
s(2, 3, 1)a12 a23 a31 + s(1, 3, 2)a11 a23 a32 +
s(2, 1, 3)a12 a21 a33 + s(3, 2, 1)a13 a22 a31
(3.42)
(3.43)
and so forth.
Remark 3.17 Properties of Permutations
1. If two numbers in a permutation (j1 , . . . , jn ) are interchanged, teh
sign is reversed.
2. Suppose the permutation of (j1 , . . . , jn ) can be formed from (1, 2, . . . , n)
by k successive interchanges. Then s(j1 , . . . , n ) = (1)k .
Example 3.5
s(5, 1, 3, 2, 4) = s(5, 4, 3, 2, 1)
(3.44)
interchange 1,4
(3.45)
(4, 3, 2, 1)
interchange 2,3
(3.46)
(3.47)
(3.48)
(j)
(3.49)
(j)
Page 20
Math 462
(3.50)
(3.51)
hence
det AT =
(3.52)
(3.53)
(j)
X
(i)
= det A
(3.54)
(3.56)
Page 21
Math 462
det Ai
det A
(3.59)
(3.60)
(3.61)
i=1
..
.
n
X
(3.62)
i=1
or more concisely:
n
X
aki xi = bk , k = 1, 2, . . . n
(3.63)
i=1
n
X
(3.64)
i=1
Page 22
cof(akj )
n
X
i=1
aki xi =
n
X
cof(akj )bk
(3.65)
k=1
Math 462
or
n
X
cof(akj )bk =
n X
n
X
cof(akj )aki xi
(3.66)
k=1 i=1
k=1
n
X
cof(akj )akj xj +
k=1
= xj
n
X
cof(akj )aki xi
(3.67)
i=1,i6=j
n
X
n
X
cof(akj )akj +
k=1
xi
i=1,i6=j
n
X
cof(akj )aki
(3.68)
k=1
The sum on the left is det Aj The first sum on the right is det A. The
internal sum of the second sum
n
X
cof(akj )aki , i 6= j
(3.69)
k=1
(3.70)
(3.71)
n
X
aij xj
(3.72)
(3.73)
j=1
n
X
(aj )i xj
(3.74)
j=1
Page 23
Math 462
Hence
y=
n
X
(3.75)
j=1
n
X
xj (aj ) = Ax
(3.76)
j=1
Remark 3.27 The terms range and column space are used interchangeably, since we has shown that they are equivalent.
Definition 3.28 The nullspace of a matrix A is the set of all vectors v
such that Av = 0. If v nullspace(A) then by (3.75)
0=
n
X
(3.77)
j=1
i.e., the ij element of the product is the dot product between the ith row
of A and the j th column of B.
Example 3.7
1
4
2
5
8
3
10
6
12
9
(1, 2, 3) (8, 10, 12) (1, 2, 3) (9, 11, 13)
11 =
(4, 5, 6) (8, 10, 12) (4, 5, 6) (9, 11, 13)
13
(3.79)
64
70
=
(3.80)
156 169
Math 462
(3.81)
(3.83)
(3.84)
(3.85)
1
det A1
(3.86)
(3.87)
Page 25
Math 462
(3.88)
1
A = 4
0
0
5
3
3
0
1
(3.89)
The adjugate is
(1)[(1)(5) (0)(3)]
adjA = (1)[(0)(1) (3)(3)]
(1)[(0)(0) (3)(5)]
5
= 9
15
4
1
12
(1)[(4)(1) (0)(0)]
(1)[(1)(1) (3)(0)]
(1)[(1)(0) (3)(4)]
T
5
12
3 = 4
12
5
9
1
3
T
(1)[(4)(3) (5)(0)]
(1)[(1)(3) (0)(0)]
(1)[(1)(5) (0)(4)]
(3.90)
15
12
5
(3.91)
1
adj A
det A
(3.92)
Example 3.9 Let A be the square matrix defined in equation 3.89. Then
det A = (1)(5 0) (0) + (3)(12 0) = 41
Hence
A1
5
1
4
=
41
12
9
1
3
15
12
5
(3.93)
(3.94)
In practical terms, computation of the determinant is computationally inefficient, and there are faster ways to calculate the inverse, such as via
Gaussian Elimination. In fact, determinants and matrix inverses are very
rarely used computationally because there is almost always a better way to
solve the problem, where by better we mean the total number of computations as measure by number of required multiplications and additions.
Page 26
Topic 4
Eigenstuff
Definition 4.1 Let A be a square matrix. Then is called an eigenvalue
of A if there exists some nonzero vector v such that
Av = v
(4.1)
1
A = 0
6
0
4
0
1
0
0
(4.2)
Then
3
v = 0
6
is an eigenvector of A with
1 0
Av = 0 4
6 0
(4.3)
Page 27
Math 462
TOPIC 4. EIGENSTUFF
(4.5)
Example 4.2 Let A be the square matrix defined in equation 3.89. Then
its characteristic equation is
1
0
3
5
0
0 = 4
(4.6)
0
3
1
= (1 )(5 )(1 ) 0 + 3(4)(3)
2
= 41 11 + 7
(4.7)
(4.8)
Theorem 4.4 The eigenvalues of a square matrix A are the roots of its
characteristic polynomial.
Proof. By definition, is an eigenvalue of A if and only if there is a nonzero
vector v such that
Av = v
(4.9)
Av v = 0
(4.10)
Av Iv = 0
(4.11)
(A I)v = 0
(4.12)
A I is singular
(4.13)
det(A I) = 0
(4.14)
(4.15)
The only real root of this equation is approximately 6.28761. There are
two additional complex roots, 0.356196 2.52861i and 0.356196 +
2.52861i.
Page 28
TOPIC 4. EIGENSTUFF
Math 462
2
A = 1
1
Its characteristic equation is
2
2
1
0 = 1
1
3
3
1
1
2
1
3
3
1
1
(4.16)
(4.17)
(4.18)
+ 3[3 (1 )]
(4.19)
(4.20)
= (2 )( 4) 2( + 2) + 3( + 2)
(4.21)
= (2 )( + 2)( 2) + ( + 2)
(4.22)
= ( + 2)[(2 )( 2) + 1]
(4.23)
(4.24)
= ( + 2)( 4 + 3)
(4.25)
= ( + 2)( 3)( 1)
(4.26)
= ( + 2)( + 4 3)
2 2 3
x
x
1 1
1 y = 2 y
(4.27)
1 3 1
z
z
for x, y, z. One way to do this is to multiply out the matrix on the left and
solve the system of three equations in three unknowns:
2x 2y + 3z = 2x
(4.28)
x + y + z = 2y
(4.29)
x + 3y z = 2z
(4.30)
However, we should observe that the eigenvector is never unique. For example, if v is an eigenvector of A with eigenvalue then
A(kv) = kAv = kv
(4.31)
Page 29
Math 462
TOPIC 4. EIGENSTUFF
(4.32)
x + 1 + z = 2
(4.33)
x + 3 z = 2z
(4.34)
Simplifying
4x 2 + 3z = 0
(4.35)
x+3+z =0
(4.36)
x+3+z =0
(4.37)
The second and third equations are now the same because we have fixed
one of the values. The remaining two equations give two equations in two
unknowns:
4x + 3z = 2
x + z = 3
(4.38)
(4.39)
The solution is x = 11, z = 14. Therefore an eigenvector of A corresponding to = 2 is v = (11, 1, 14), as is any constant multiple of this
vector.
Theorem 4.5 The eigenvalues of a diagonal matrix are the elements of the
diagonal.
Proof. Let the diagonal elements be d1 , d2 , . . . The characteristic equation
is
0 = det(A I)
d1
0
0
d
2
=
..
.
(4.40)
0
0
= (d1 )(d2 )
(4.41)
(4.42)
(4.43)
TOPIC 4. EIGENSTUFF
Math 462
(4.44)
Setting x = 0 gives
det(A) = (1)n 1 2 n
(4.45)
(4.46)
2
0
= 0
.
.
.
0
(4.47)
a13
a23
a14
a24
d3
a33
..
.
a1n
a2n
..
.
an1,n
dn
(4.48)
(4.49)
where the determinant is expanded by the first column. Hence the roots
are d1 , d2 , . . . A similar calculation holds for lower triangular matrices, expanding the determinant by the first row.
Page 31
Math 462
TOPIC 4. EIGENSTUFF
Page 32
Topic 5
n
X
xi yi
(5.1)
k=1
(5.2)
k=1
(Compare the above definitions with definitions 2.1 and 2.5, and the comments that follow those earlier definitions.) The Euclidean Length satisfies
the properties of a norm given below in definition 5.8.
Definition 5.3 The Cosine of the Angle between two vectors x and y
is given by
hx, yi
cos =
(5.3)
|x||y|
Definition 5.4 Two vectors x and y are said to be orthogonal if
hx, yi = 0
(5.4)
Page 33
Math 462
(5.5)
(5.6)
Two sets of vectors X = {x1 , x2 , . . . } and Y = {y1 , y2 , . . . } are orthogonal sets of vectors (or just orthognal) if
hxi , yj i xi X and yj Y
(5.7)
n
X
ci vi
(5.8)
i=1,i6=k
= vk
ci vi
=
i=1,i6=k
n
X
ci vk vi = 0
(5.9)
(5.10)
(5.11)
i=1,i6=k
where the last term follows because hvk , vi i = 0 for i 6= k. But this contradicts the observation that |vk | =
6 0.
Hence no vk can be written as a linear combination of the other elements
of V, and we conclude that V is a linearly independent set.
1 The
Page 34
Math 462
(5.13)
Theorem 5.7 Unitary matrices preserve lengths and angles, in the sense
that if A is unitary and xis any vector over F then
|Ax| = |x|
(5.14)
(5.15)
(5.16)
(5.17)
= u A Av
(5.18)
= u v
(5.19)
Page 35
Math 462
n
X
(5.20)
(vj x)vj
(5.21)
j=1
Then
hvi , yi = vi x
n
X
(vj x)(vi vj ) = 0
(5.22)
j=1
since the only non-zero term in the sum is the one with i = j.
Thus y is orthogonal to all the vi . We say that y is orthogonal to the
set V .
Thus any vector x can be be decomposed into n + 1 orthogonal components
Parallel to V
Orthogonal to V
z}|{
y
x=
z
}|
{
n
X
+
(vj x)vj
(5.23)
j=1
=y+
n
X
vj (vj x)
(5.24)
j=1
y
|{z}
Orthogonal to V
n
X
(vj vj )x
(5.25)
j=1
{z
Parallel to V
The matrix
P=
n
X
(vj vj )
(5.26)
j=1
Page 36
Math 462
Vector Norms
We will discuss norms more generally in chapter 21. Here we will review
some basic concepts of norms of vectors and matrices.
Definition 5.8 A norm is a function
kxk : Fn 7 R
(5.28)
(5.29)
i=1
n
X
|xi |
(5.30)
i=1
(5.31)
i=1
(5.32)
Page 37
Math 462
The pnorm can be visualized in terms of its unit-ball, that is, the collection
of all points such that
kxkp = 1
(5.33)
In 2-dimensions, we can describe a vector with components (x, y) in terms
of the polar coordinates (r, ), as
x = (r cos , r sin )
(5.34)
(5.35)
(5.36)
p
= r (| cos | + | sin | )
(5.37)
1
| cos |p + | sin |p
1/p
(5.38)
These unit balls are plotted for several values of p in figure 5.1. The dashed
box is the unit square [1, 1] [1, 1].
Page 38
Math 462
X
|wii xi |p
1/p
(5.39)
2 It
Page 39
Math 462
Matrix Norms
Note for Next Year
Include:
Raleigh Principle and derivation of simple formula for induced 2 norm
More on xT Ax as an ellipsoid
Applications: PCA and Least Squares Fit
The easiest way to define a matrix norm by treating any matrix as the
vector of its components. The Frobenius Norm or Hilbert-Schmidt
Norm is given by
sX
kAkF =
|aij |2
(5.40)
i,j
(5.41)
These are the most commonly used matrix norms when proving properties
for numerical linear algebra.
Theorem 5.10 The Frobenius Norm can be written is
p
p
kAkF = trace(A A) = trace(AA )
(5.42)
Proof. (exercise)
One can also defined an induced matrix norm as
kAk =
sup
x(6=0)Fn
kAxk
kxk
(5.43)
kAxk
sup
(5.44)
{x(6=0)Fn |kxk=1}
The induced matrix norms have geometric interpretations and are more
commonly used in analysis.
Figure 5.3 shows illustrates several induced matrix norms using
0.7 2
A=
3 1
(5.45)
The effect of the matrix A under (5.43) is to perturb the unit ball of the
pnorm; the matrix norm is the radius of the circumscribed circle, which
gives the greatest distance of from the origin of the perturbed unit ball
under the vector pnorm given by A.
Page 40
Math 462
Figure 5.3: Unit balls for some induced matrix pnorms in R2 using the
matrix A given by (5.45). The pnorm is the radius of the circumscribed
circle.
1-Norm Induced Unit Ball
(5.46)
(5.47)
Then
i
i.e., the matrix 2-norm of a diagonal matrix is the absolute value of the
largest diagonal element.
The induced 1-norm kAk1 has the property that
kAk1 =
kAxk1
sup
(5.48)
{x:kxk=1}
= max
1jn
n
X
(5.49)
j=1
= max kai k1
i=1,...,n
(5.50)
Page 41
Math 462
kAxk
sup
(5.51)
{x:kxk=1}
= max
1in
n
X
(5.52)
j=1
= max kai k1
(5.53)
i=1,...,n
where ai is the ith row vector of A, or equivalently, the ith column vector
of A .
Theorem 5.11 Cauchy-Schwarz Inequality. Let x and y be vectors
over Fn . Then
hx, yi kxk2 kyk2
(5.54)
or in terms of components
X
xi yi
qX
x2i
qX
yi2
(5.55)
Proof. If x and y are linearly dependent then there exists some a F such
that
y = ax
(5.56)
Then
2
Math 462
by
f (u) = kx uyk22 0
(5.57)
= 0 hx uy, x uyi
(5.58)
= (x uy) (x uy)
(5.59)
= x x ux y u y x + u uy y
=
kxk22
ux y u y x + |u|
kyk22
(5.60)
(5.61)
(5.62)
Since this holds for all u F, we can pick any u we like, such as
y x
kyk22
x y
= u =
kyk22
u=
and
uu =
y x x y
|x y|2
=
kyk22 kyk22
kyk42
2
x y
y x+
kyk22
0 kxk2
kyk22
kyk22
kyk42
|x y|2
|x y|2
= kxk22 2
+
kyk22
kyk22
2
|x y|
= kxk22
kyk22
(5.63)
(5.64)
(5.65)
(5.66)
(5.67)
(5.68)
Rearranging,
|x y|2
kxk22
kyk22
(5.69)
(5.70)
Taking the positive square root of each side gives the Cauchy-Scwartz inequality.
Theorem 5.12 Unitary Matrices preserve both the 2-norm and the Frobenius norm.
Proof. This result follows from theorem 5.7 applied to each norm.
Last revised: November 3, 2012
Page 43
Math 462
Page 44
Topic 6
Similar Matrices
Note for Next Year
Move this material to chapter 19 and integrate more thoroughly.
As we will see in definition 19.2,
Definition 6.1 Two square matrices A and B are called similar if there
exists a third square matrix T with det(T) 6= 0 such that
T1 AT = B
(6.1)
) det(A I) det(T)
(6.4)
) det(T) det(A I)
(6.5)
= det(T
= det(T
= det(T
(6.2)
IT)
= det(I) det(A I)
= det(A I)
(6.3)
(6.6)
(6.7)
Page 45
Math 462
Proof. The trace is the sum of the eigenvalues and the eigenvalues are
preserved.
Definition 6.5 A square matrix is diagonalizable if it is similar to a
diagonal matrix, i.e., there exists some T and some some diagonal matrix
D such that
T1 AT = D
(6.8)
Theorem 6.6 A is diagonalizable iff it has n linearly independent eigenvectors.
Proof. ( = ) Suppose that A is diagonalizable. Then for some matrix T
T1 AT = diag(d1 , d2 , . . . , dn )
(6.9)
A t1
AT = Tdiag(d1 , . . . , dn )
tn = t1 tn diag(d1 , . . . , dn )
= d1 t1 dn tn
(6.10)
Ati = di ti
(6.13)
(6.11)
(6.12)
(6.18)
where D = diag(1 , . . . , n ).
Multiplying by T1 gives
T1 AT = D
(6.19)
where D is diagonal.
Page 46
Topic 7
(7.1)
Page 47
Math 462
(7.3)
(7.4)
(7.5)
vn = 1 u1 2 u2 n un
= u1 u2 un diag(1 , 2 , . . . , n )
(7.6)
(7.7)
(7.8)
(7.9)
= diag(1 , 2 , . . . , n )
(7.10)
AV = U
(7.11)
v2
u2
vn
un
Then
Since the column vectors of V are orthonormal, the matrix V is unitary
(proof: exercise) and hence invertible with inverse V . This gives the
Singular Value Decomposition:
A = UV
(7.12)
Math 462
(7.13)
Let
2 1
2 2
5 3
R=A A=
=
2 1
1 1
3 5
2 2 2 1
8 0
L = AA =
=
1 1 2 1
0 2
(7.14)
(7.15)
The singular
values are the square roots of the eigenvalues, i.e., 1 = 2 2
and 2 = 2.
(To see that both R and L have the same eigenvalues, we can compute the
characteristic polynomial of R:
0 = (5 )(5 ) 9
(7.16)
(7.17)
= 10 + 16
(7.18)
= ( 2)( 8)
(7.19)
= 25 10 + 9
2
2 2
=
0
0
2
(7.20)
The
vectors are the normalized eigenvectors of L, which are
left singular
1
0
and
(by inspection, because L is diagonal); thus
0
1
U=
1
0
0
1
(7.21)
(7.23)
= 3y = 3x
(7.24)
Page 49
Math 462
So an eigenvector is
that
1
. Since we need normalized eigenvectors, we find
1
" #
v1 =
1
2
1
2
(7.25)
(7.26)
= 5x + 3y = 2x
(7.27)
= 3y = 3x
(7.28)
So a normalized eigenvector is
"
v2 =
hence
"
V=
1
2
1
2
12
#
(7.29)
1
2
12
#
(7.30)
1
2
0
1
2 2
0
" 1
0
2
2 12
1
2
1
2
#
(7.31)
Note that the algorithm did not specify how to chose the sign of the eigenvectors, and this must be determined by trial and error - if we had chosen
one of the signs wrong it will still give an eigenvector but might not multiply
out to the correct answer.
Page 50
Topic 8
(8.1)
(8.2)
(8.3)
This system represents some enzyme, or catalyst, that exists in two forms,
which we call E and E ; the () is a common biochemical notation that
indicates the activated form of the inactive enzyme (). In form E, the
enzyme cant do anything, but in E it can interact with some substrate to
do something. In our system of reactions, adding Y to E makes it active
(equation (8.3)); and then the activated form E spontaneously converts
back to the inactive form E, emitting a molecule X in the process (equation
(8.1)). Meanwhile, when they are free, the molecules X can convert to Y
on their own (equation (8.2)). The enzymatic activity of E , i.e., what it
does when it is active, is not actually shown in eqs. (8.1) to (8.3).
In this system, there are two conserved cycles:
1 For further reading on the example in this section see Sauro and Ingalls, Biophysical
Chemistry 109(2004):1-15, Conservation analysis in biochemical networks: computational issues for software writers.
Page 51
Math 462
E 1 E + X
k2
XY
(8.4)
(8.5)
k3
E + Y E
(8.6)
Mathematically, the law of mass action says that the affinity or likelihood
of two chemicals on the left-hand-side of a biochemical equation
k
n1 A + n2 B P
(8.7)
n1 A + n2 B + n3 P n4 P
(8.9)
(8.10)
If P appears in more than one reaction, then the total dP/dt is the sum of
the dP/dts from each individual reaction.
Page 52
Math 462
= k1 E + k3 EY
(8.11)
= k1 E k3 EY
(8.12)
= k1 E k2 X
(8.13)
= k2 X k3 EY
(8.14)
(8.15)
(8.16)
(8.17)
Hence
E + E = constant
(8.18)
(8.19)
(8.20)
=0
(8.21)
hence
E + X + Y = constant
(8.22)
= k1 E + k3 EY
(8.23)
= k1 E k3 EY
(8.24)
= k1 E k2 X
(8.25)
= k2 X k3 EY
(8.26)
Page 53
Math 462
E 1 E + X
k
X 2 Y
k3
E + Y E
at rate v1 = k1 E moles/second
(8.27)
at rate v2 = k2 X moles/second
(8.28)
at rate v3 = k3 EY moles/second
(8.29)
E
1 0
1
v1
d
0 1
E= 1
v2
(8.34)
dt X 1 1 0
v3
Y
0
1 1
Letting S = (E , E, X, Y )T be the vector of chemical species, v = (v1 , v2 , v3 )T
be the vector of velocities and N the matrix
1 0
1
1
0 1
N=
(8.35)
1 1 0
0
1 1
the system of differential equations can be compactly expressed as
dS
= Nv
dt
Page 54
(8.36)
Last revised: November 3, 2012
Math 462
The matrix N is called the stoichiometry matrix and has an easy intuitive explanation: Nij is the number of molecules of species j produced in
reaction i for each molecule consumed by the system:
(8.27)(8.28)(8.29)
1 0
1
1
0
1
1 1 0
0
1 1
for
for
for
for
E
E
X
Y
(8.37)
In other words, reading this matrix across the first row, E is decreased
by one by reaction (8.27) and increased by 1 in reaction (8.29). It is not
affected by reaction (8.28), hence the zero in the second column of the first
row.
It turns that the matrix N contains all the information we need to determine
which cycles are conserved. We first partition the matrix N (through, e.g.,
row reduction) into the linearly independent rows NR and the dependent
rows N0 as follows:
NR
N=
(8.38)
N0
Since the linearly dependent rows in N0 are linear combinations of the
independent rows in NR then there exists some matrix L0 such that
N0 = L0 NR
(8.39)
d Si
dS
I
=
NR v = LNR v = Nv =
L0
dt
dt Sd
(8.41)
(8.42)
(8.43)
(8.44)
(8.45)
Page 55
Math 462
(8.47)
(8.48)
dSi
dSd
=
dt
dt
(8.49)
(8.50)
(8.51)
(8.52)
(8.53)
(8.54)
so that
Page 56
L0 Si (t) Sd (t) = T
(8.55)
(8.56)
(8.57)
(8.58)
(8.59)
(8.60)
Math 462
where (8.46) has been used in the final step. Finally we can define the
Conservation Matrix as
= L0 I
(8.61)
so that
S = T
(8.62)
(8.63)
= 0 = L0 NR + N0
(8.64)
= L0 NR + IN0
NR
= L0 I
N0
= L0 I N
from (8.41)
(8.67)
= N
from (8.61)
(8.68)
(8.65)
(8.66)
In other words
N = 0
(8.69)
= NT T = 0
(8.70)
(8.71)
E X Y
E
1 1
1
0
T
N =
0
0 1 1
1 1 0 1
and hence
null(NT ) =
E E X Y
1 0 1 1
1 1 0 0
(8.72)
(8.73)
Reading across the first row tells us that {E , X, Y } form a cycle, and the
second row tells us that {E , E} form a cycle.
This can be computed, e.g., in Mathematica as follows:
Last revised: November 3, 2012
Page 57
Math 462
(8.74)
and we have some sort of model description of the system (what we think
all the reaction are or should be) that is described by a stoichiometry matriz
N. The problem thus becomes this: Given
1. a proposed model of the system given by the matrix N;
2. a list of constraints on the velocities given by (8.74);
3. a list of observed fluxes f (i.e., a subset of v that we were actually
able to measure in the lab),
then what values of the conserved quantities v will produce the observed
fluxes f ?
Mathematically, this problem is typically solved as follows:2 maximize f T v
subject to Nv = 0 and the constraints (8.74) where v is the vector of all
fluxes and f is a vector of indicators (1s and 0s) that indicates which
components of v are known. This solution method has the underlying
assumption that nature will produce an optimal solution, which may not
be correct. However, solving the problem when formulated in this manner is possible because it is a restatement of the basic problem of linear
programming, which is described in any textbook on operations research.3
Typical algorithms to do this include the simplex method. More sophisticated techniques rely on nonlinear optimization methods. Linear programming (which has nothing inherently to do with computer programming) is a
field of mathematical optimization that was developed originally by Leonid
Kantrovitch and George Dantzig in the 1930s and 1940s and is extensively
utilized in business.
2 See Smallbone et. al., BMC Systems Biology 4:6 (2010), Towards a genomescale kinetic model of cellular metabolism.
3 See, for example, Hillier and Lieberman, Introduction to Operations Research,
McGraw-Hill.
Page 58
Topic 9
Vector Spaces
Definition 9.1 A list of length n is a finite ordered collection of n objects,
e.g.,
(x1 , x2 , . . . , xn )
(9.1)
The expression
(x1 , x2 , . . . )
(9.2)
refers to a list of some finite unspecified length (it is not an infinite list).
The list of length 0 is written as ().
A list is similar to set except that the order is critical. For example,
{1, 3, 4, 2, 3, 5} and {1, 2, 3, 3, 4, 5}
(9.3)
(9.4)
Page 59
Math 462
(9.5)
(9.6)
to represent a list. For example, if x and y are two lists of the same length
n then we can defined list addition as
x + y = (x1 + y1 , x2 + y2 , . . . , xn + yn )
(9.7)
It is not possible to add lists of different lengths, but this will not be a
problem since we will in general be interested in using lists to represent
points in Fn . For example, if x, y Fn we can define the point
z = x + y Fn
(9.8)
by equation 9.7.
Definition 9.5 Let F be a field. A vector space V over F is a set
V along with two operations addition and scalar multiplication with
the following properties1 Elements of vector spaces are called vectors or
points.
1. Closure. v, w V and c F,
v+w V
cv V
(9.9)
2. Commutivity. u, v V,
u+v =v+u
(9.10)
3. Associativity. u, v, w V and a, b F,
(u + v) + w = u + (v + w)
(ab)v = a(bv)
(9.11)
1 Although this looks deceptively like the definition of a field, it is not, because we
only require scalar multiplication and not multiplication between elements of the set.
Also one should be careful to avoid confusing the terms vector space and vector field,
which you might see in your studies. A vector field is actually a function f : Fn 7 V(F)
that associates an element in the vector space V over F with every point in Fn . Vector
fields occur frequently in physics.
Page 60
Math 462
4. Additive Identity. (0 V) 3
v+0=0+v =v
(9.12)
v V.
5. Additive Inverse. (v V), (w V) 3
v+w =w+v =0
(9.13)
(a + b)u = au + bu
(9.14)
A vector space over R is called a Real vector space. For example, Euclidean 3-space R3 defined by
R3 = {(x, y, z)|x, y, z R}
(9.15)
(9.16)
(9.17)
m
X
vk f ,
w=
k=0
n
X
k=0
wk f ,
u=
p
X
uk f k ,
(9.18)
k=0
Page 61
Math 462
v+w =
m
X
vk f k +
k=0
n
X
wk f k
(9.19)
k=0
max(m,n)
(vk + wk )f k V
(9.20)
k=0
v+w =
max(m,n)
max(m,n)
(vk + wk )f k =
k=0
(wk + vk )f k = w + v
(9.21)
k=0
(u + v) + w =
((uk + vk ) + wk )f k
(9.22)
(9.23)
k=0
max(m,n,p)
X
k=0
= u + (v + w)
(9.24)
v =
m
X
(vk )f k
(9.25)
k=0
1v =
m
X
k=0
Page 62
(1)(vk )f =
m
X
(vk )f k = v
(9.26)
k=0
Math 462
=a
p
X
uk f +
m
X
k=0
p
X
k=0
m
X
k=0
k=0
uk f k + a
(a + b)u = (a + b)
p
X
!
vk f
(9.27)
vk f k = au + av
(9.28)
uk f k
(9.29)
k=0
=a
p
X
uk f k + b
k=0
p
X
uk f k = av + bv
(9.30)
k=0
a b
d e
c
f
(9.31)
Then V is a vector space over F with matrix addition as (+) and scalar
multiplication of a matrix as (). For example, the sum of any two 2 3
matrix is a 23 matrix; matrix addition is commutative and associative; the
additive inverse of v is the matrix v and the additive identity is the matrix
of all zeros; the multiplicative identity 1 F is the multiplicative identity;
and scalar multiplication of matrices distributes over matrix summation.
Example 9.4 Let V be the set of all functions f : F 7 F. Then V is a
vector space.
Theorem 9.6 Let V be an vector space over F. Then V has a unique
additive identity.
Proof. Suppose that 0, 00 V are both additive identities. Then
00 = 00 + 0 = 0
(9.32)
Page 63
Math 462
Proof. Let v V have two different additive inverses w and w0 . Then since
each is an additive inverse,
w = w + 0 = w + (v + w0 ) = (w + v) + w0 = 0 + w0 = w0
(9.33)
(9.35)
(9.36)
(9.37)
(9.38)
This is one of those cases where we have two different zeroes in the same
equation. The 0 on the left is the scalar zero in F, while the 0 on the right
is the vector 0 V.
Proof.
0v = (0 + 0)v = 0v + 0v 0v 0v = 0v 0 = 0v
(9.39)
Theorem 9.9 Let V be a vector space over F and let 0 V be the additive
indentity in V. Then
a0 = 0
(9.40)
Page 64
Math 462
Proof.
a0 = a(0 + 0) = a0 + a0 0 = a0 a0 = a0 + a0 a0 = a0
(9.41)
(9.43)
Page 65
Math 462
Page 66
Topic 10
Subspaces
Definition 10.1 A subset U V is called a subspace of V if U is also a
vector space. Specifically, the following properties need to hold:
1. Additive identity:
0U
(10.1)
u+v U
(10.2)
au U
(10.3)
(10.4)
Page 67
Math 462
(10.5)
W = {(0, y, 0) F3 |x F}
(10.6)
(10.7)
is also a subspace of V.
Theorem 10.5 Let U1 , . . . , Un be substaces of V. Then U1 + + Un is
the smallest subspace of V that contains all of U1 , . . . , Un .
Proof. We need to prove (a) that U1 + + Un contains U1 , . . . , Un , and (b)
that any subspace that contains U1 , . . . , Un also contains U1 + + Un .
To see (a), let u1 U1 . Since 0 Ui for all i, we can let u2 = 0 U2 , u3 =
0 U3 , . . . , un = 0 Un . Then
u1 = u1 + 0 + + 0
(10.8)
= u1 + u2 + + un U1 + + Un
(10.9)
(10.10)
(10.12)
Math 462
(10.13)
(10.15)
W = {(0, y)|y R}
(10.16)
V = Fn = {(v1 , v2 , . . . , vn )|vi F}
(10.17)
More generally, if
and
V1 = {(v, 0, . . . , 0)|v F}
V2 = {(0, v, 0, . . . , 0)|v F}
..
.
Vn = {(0, . . . , 0, v)|v F}
(10.18)
Then
V = V1 V2 Vn
(10.19)
U = {(x, y, 0)|x R}
W = {(0, y, y)|y R}
Z = {(0, 0, z)|z R}
(10.20)
(10.21)
V=
6 U W Z
(10.22)
but
(1) U is a subspace of V: U contains (0, 0, 0) = 0;
(x, y, 0) + (x0 , y 0 , 0) = (x + x0 , y + y 0 , 0) U
(10.23)
(10.24)
Page 69
Math 462
(10.26)
(10.27)
(10.28)
(10.29)
Since (x, y/2, 0) U, (0, y/2, y/2) W, and (0, 0, z y/2) Z then
(x, y, z) U + W + Z
(10.30)
and therefore V U + W + Z.
(5) V 6= U W Z: Consider the element (0, 0, 0) which is in each of the
subspaces as well as V. Then
(0, 0, 0)V = (0, 0, 0)U + (0, 0, 0)W + (0, 0, 0)Z
(10.31)
(10.32)
This means we can express the vector (0, 0, 0) as two different sums of the
form u + w + z, and hence the method of defining u, w, z is not unique.
Going back to equation 10.29 we see that that sum is also not unique. For
example, we could also write
(x, y, z) = (x, y/4, 0)U + (0, 3y/4, 3y/4)W + (0, 0, z 3y/4)Z
Page 70
(10.33)
Math 462
(10.34)
(10.35)
(10.37)
(10.38)
Then
0 = v v = (u1 v1 ) + (u2 v2 ) + + (un vn )
(10.39)
Page 71
Math 462
(10.41)
(10.42)
w =u+ww =u
(10.43)
Page 72
Topic 11
Polynomials
Definition 11.1 A function p(x) : F 7 F is called a polynomial (with
coefficients) in F if there exists a0 , . . . , an F such that
p(z) = a0 + a1 z + a2 z 2 + + an z n
(11.1)
(11.2)
(11.3)
Proof. (Exercise.)
Theorem 11.3 Let p P(F) be a polynomial with degree m 1. Then
F is a root of p if and only if there is a polynomial q P(F) with degree
m 1 such that
p(z) = (z )q(z)
(11.4)
for all z F.
Page 73
Math 462
Proof. ( = ) Suppose that there exists a q P(F) such that 11.4 holds.
Then
p() = ( )q() = 0
(11.5)
hence is a root of p.
( = )Let be a root of p, where
p(z) = a0 + a1 z + a2 z 2 + + am z m
(11.6)
0 = a0 + a1 + a2 2 + + am m
(11.7)
Then
Subtracting,
p(z) = a1 ( z) + a2 (z 2 2 ) + + am (z m m )
(11.8)
By the lemma
z j j = (z )qj1 (z)
(11.9)
(11.10)
where
and therefore
p(z) = a1 ( z) + a2 (z )q1 (z) + a3 (z )q2 (z)+
+ am (z )qm (z)
(11.11)
(11.12)
= (z )q(z)
(11.13)
(11.14)
where
Math 462
(11.15)
(11.16)
proof is based on the one given in Friedberg et al., Linear Algebra, 4th edition,
Pearson (2003).
Page 75
Math 462
(11.19)
q0
p0
(11.20)
(11.21)
p(z) = p0 + p1 z + p2 z 2 + + pm z m
(11.22)
and
qn nm
z
p(z)
pm
(11.23)
qn nm
z
p0 + p1 z + p2 z 2 + + pm z m
= q0 + q1 z + q2 z 2 + + qn z n
pm
(11.24)
Then since the z n terms subtract out,
deg(h(z)) < n
(11.25)
Then either deg(h) < m = deg(p) or m deg(h) < n. In the first instance,
case 1 applies, and in the second instance the inductive hypothesis applies
to h and p, i.e., there exist polynomials s0 (z) and r(z) such that
h(z) = s0 (z)p(z) + r(z)
(11.26)
(11.27)
with
qn nm
z
p(z) = s0 (z)p(z) + r(z)
pm
(11.28)
Math 462
q(z) =
(11.29)
(11.30)
= s(z)p(z) + r(z)
(11.31)
(11.32)
0 = a0 + a1 + + an m
(11.33)
Since is a root,
Taking the complex conjugate of this equation proves the theorem:
0 = 0 = a0 + a1 + + an m
m
a0 + a1 + + an
(11.34)
(11.35)
where the last line follows from the fact that all the coefficients are real,
hence ai = ai . Thus is a root of p.
Last revised: November 3, 2012
Page 77
Math 462
(11.36)
and 1 , 2 R 2 > 4.
Proof. We complete the squares in the quadratic,
2 2
z 2 + z + = z 2 + z +
+
2
2
2
2
= z+
+
2
4
(11.37)
(11.38)
If 2 4, then we define c R by
c2 =
(11.39)
and therefore
2
z 2 + z + = z +
c2
2
= z+ c z+ +c
2
2
(11.40)
(11.41)
hence
p
1
2 4
2
which is the familiar quadratic equation.
1,2 =
(11.42)
If 2 < 4 then the right hand side of the equation 11.37 is always positive,
and hence there is no real value of z that gives p(z) = 0. Hence there can
be no real roots; if any root exists, it must be complex. We can solve for
these two roots using the quadratic formula:
p
1
i 4 2
(11.43)
1,2 =
2
(substitution proves that these are roots). This is a complex conjugate
pair.
Theorem 11.11 If p P(C) is a non-constant polynomial then it has a
unique factorization
p(z) = c(z 1 )(z 2 ) (z n )
(11.44)
Math 462
If some of the roots are complex, then let m be the number of real roots
and k = n m be the number of complex roots, where k is even because
the complex roots come in pairs. Then the complex roots can written as
m+2j1 = aj + ibj ,
2j = aj ibj ,
j = 1, . . . , k/2
(11.45)
(11.46)
(11.48)
(11.49)
=
=
(x aj ) + b2j
x2 2aj x + a2j
(11.50)
+
b2j
(11.51)
Thus j = 2aj and j = a2j + b2j = (j /2)2 + b2j . Hence we have the
following result.
Theorem 11.12 Let p(x) be a non-constant polynomial of order n with
real coefficients. Then p(x) may be uniquely factored as
p(x) = c(x 1 )(x 2 ) (x m )(x2 + 1 x + 1 ) (x2 + k x + k )
(11.52)
where n = m + 2k, 1 , . . . , m R, i , i R, and i2 < 4j . If k > 0 then
the complex roots are
q
j
i j (j /2)2
(11.53)
j1,j2 =
2
Page 79
Math 462
Page 80
Topic 12
(12.2)
Page 81
Math 462
(12.3)
(12.6)
(12.7)
Theorem 12.6 Linear Dependence Lemma. Let (v1 , . . . , vn ) be linearly dependent in V with v1 6= 0. Then for some integer j, 2 j n,
(a) vj span(v1 , . . . , vj1 )
(b) If the j th term is removed from (v1 , . . . , vn ) then the span of the remaining lists equals span(v1 , . . . , vn ).
Proof. (a) Let (v1 , . . . , vn ) be linearly dependent with v1 6= 0.
a1 , . . . , an , not all 0, such that
0 = a1 v1 + a2 v2 + + an vn
Page 82
Then
(12.8)
Math 462
aj1
a1
v1
vj1
aj
aj
(12.10)
(12.11)
(12.12)
(12.14)
(12.15)
is linearly dependent.
By the linear dependence theorem, we can remove one of the wi so that
the remaining elements in 12.15 spans V. Call the vector removed wi1 , and
define
B1 = (u1 , w1 , w2 , . . . , wn |wi1 )
(12.16)
where by (a, b, ...|p, q, ...) we mean the list (a, b, ..) with the elements (p, q, ..)
removed.
Last revised: November 3, 2012
Page 83
Math 462
(12.17)
(12.18)
spans V.
We keep repeating this process. In each step, we add one uk and remove
one wj . If at some point we do not have any ws to remove this must
mean that we have created a list that only contains the us but is linearly
dependent. This is a contradition. Hence there must always be at least one
w to remove. Thus there must be at least as many ws as there are us.
Hence m n.
Example 12.5 Let V = R2 and define the vectors
a = (1, 0)
(12.19)
b = (0, 1)
(12.20)
c = (1, 1)
(12.21)
s = (a, b, c)
(12.22)
(12.23)
(12.24)
= ( + , + )
(12.25)
(12.26)
Page 84
(12.27)
(12.28)
Math 462
(12.29)
and as we shall see in the next chapter that this means that while s spans
V, it is not a basis of V.
In fact, we can remove c from s, and still have a set s0 = (a, b) that spans
V. Since the length of s0 = 2, this means that any linearly dependent list
in V has length at most 2. In fact, since s0 is linearly independent and
spans V, it is a basis of V (a basis is a linearly independent spanning list of
vectors).
Theorem 12.8 Every subspace of a finite dimensional vector space is finite
dimensional.
Proof. Let V be finite dimensional, and let U be a subspace of V. Since V
is finite dimensional, for some m, there exists w1 , . . . , wm V such that
L = (w1 , . . . , wm )
(12.30)
(12.31)
(12.32)
(12.33)
Page 85
Math 462
(12.34)
Either the process stops with n < m or it does not. If it stops, we are done,
because the length of B is finite.
If it does not then eventually B will have length n = m.
It is not possible to find any other vector u U at this point such that
B 0 = (v1 , . . . , vm , u)
(12.35)
is linearly independent.
To see this, suppose that there is such a vector, i.e., that B 0 is linearly
independent.
Since the length of B is m + 1 > m, this means we have found a linearly
independent list of length greater than a spanning list (L, which has length
m).
This contradicts theorem 12.7. Hence no such vector u exits.
Thus the longest possible list that spans U has m elements; since m is finite,
U is finite dimensioned, as required.
Page 86
Topic 13
(13.1)
(13.2)
for some a1 , . . . , an F.
Proof. ( = ) Suppose that B = (v1 , . . . , vn ) is a basis of V.
Let v V. Then because B spans V there exists some a1 , . . . , an F such
that
v = a1 v1 + + an vn
(13.3)
We must show that the numbers a1 , . . . , an are unique. To do those, suppose
that there is a second set of numbers b1 , . . . , bn F such that
v = b1 v1 + + bn vn
(13.4)
(13.5)
Subtracting,
Page 87
Math 462
Since B is a basis, it is linearly independent. Hence every coefficent in equation 13.5 must be zero, i.e., ai = bi , i = 1, . . . , n. Thus the representation
is unique.
( = ) Suppose that every v V can written uniquely in the form of
equation 13.3.
Then by definition of a spanning list, (v1 , . . . , vn ) spans V. To show that
B is a basis of V we need to also show that it is linearly independent.
Suppose that B is linearly dependent. Then there exist b1 , . . . , bn F such
that
0 = b1 v 1 + + bn v n
(13.6)
By uniqueness (which we are assuming), this is the only set of bi for which
this is true. But we also know that
0 = (0)v1 + + (0)vn
(13.7)
Math 462
i,k
Page 89
Math 462
(13.11)
where
u = a1 u1 + + am um U
(13.12)
w = b1 w1 + + bn wn W
(13.13)
(13.14)
(13.15)
(13.16)
By rearrangment,
a1 u1 + + am um b1 w1 bn wn = 0
(13.17)
(13.18)
(13.19)
Math 462
Theorem 13.7 Let V be any finite dimensional vector space, and let B1
and B2 be any two bases of V. Then
length(B1 ) = length(B2 )
(13.20)
(13.22)
(13.24)
(13.25)
Page 91
Math 462
(13.26)
(13.27)
Proof. Let
B = (v1 , . . . , vm )
(13.28)
dim(U W) = m
(13.29)
be a basis of U W. Hence
(13.30)
and to a basis BW of W,
BW = (v1 , . . . , vm , w1 , . . . , wk )
so that
dim(U) = m + j
dim(W) = m + k
(13.31)
(13.32)
(13.33)
Math 462
By rearrangement,
a1 v1 + + am vm + b1 u1 + + bj uj = c1 w1 ck wk
{z
}
{z
} |
|
U
(13.35)
(13.36)
(13.37)
(13.38)
(13.39)
(13.40)
(13.41)
Hence there is no linear combination of B 0 that gives the zero vector except
for the one in which all the coefficients are zero. This means the B 0 is a
linearly independent list. ]
Since B 0 is linearly independent and spans U + W, it is a basis of U + W.
Hence
dim(U + W) = length(B 0 )
(13.42)
=m+j+k
(13.43)
= (m + j) + (m + k) m
(13.44)
(13.45)
Page 93
Math 462
(13.46)
(13.47)
V = U1 Um
(13.48)
and
Then
Proof. Define bases B1 , . . . , Bm for each of the Ui . Let1
B = (B1 , B2 , . . . , Bn )
(13.49)
(13.50)
= dim(U1 ) + + dim(Um )
(13.51)
= dim(V)
(13.52)
(13.53)
ui =
aik vik
(13.54)
k=1
Hence
dim(U1 )
0=
X
k=1
dim(U2 )
a1k v1k +
dim(Um )
a2k v2k + +
k=1
amk vmk
(13.55)
k=1
1 By
(13.56)
Page 94
Topic 14
Linear Maps
Definition 14.1 Let V, W be vector spaces over a field F. Then a linear
map from V to W is a function T : V 7 W with the properties:
(1) Additivity: for all u, v V,
T (u + v) = T (u) + T (v)
(14.1)
(14.2)
It is common notation to omit the parenthesis when expressing maps, writing T (v) as T v. The reason for this will become clear when we study the
matrix representation of linear maps.
The properties of additivity and homogeneity can be combined into a linearity property expressed as
T (au + bv) = aT u + bT v
(14.3)
where a, b F and u, v V.
Definition 14.2 The set of all linear maps from V to W is denoted
by L(V, W)
Definition 14.3 Let V and W be vector spaces and let T L(V, W) be
a linear map T : V 7 W. The range of T is the subset of W that are
mapped to by T :
range(T ) = {w W|w = T v, v V}
(14.4)
Page 95
Math 462
(14.5)
The range of I is V.
Example 14.3 Differentiation: Define D L(P(R), P(R)) by Dp = p0
where p0 (x) = dp/dx.
Example 14.4 Integration: Define I L(P(R), R)) by
Z
Ip =
p(x)dx
(14.6)
(14.8)
(14.10)
(14.11)
Math 462
(14.12)
(14.13)
(14.14)
(14.16)
(14.17)
Theorem 14.14 Let V and W be vector spaces over F and T L(V, W).
Then null(T ) is a subspace of V.
Last revised: November 3, 2012
Page 97
Math 462
Proof. By additivity,
T (0) = T (0 + 0) = T (0) + T (0)
(14.18)
T (0) = 0
(14.19)
hence 0 null(T ).
Now let u, v null(T ).
T (u + v) = T u + T v = 0 + 0 = 0
(14.20)
(14.21)
(14.22)
Definition 14.16 Let V and W be vector spaces and let T L(V, W).
Then T is called surjective or onto if range(T ) = W, i.e, if w W, v
V 3 w = T v.
Theorem 14.17 Let V and W be vectors spaces and let T L(V, W).
Then T is injective (one-to-one) if and only if null(T ) = {0}.
Proof. ( = ) Suppose that T is injective (1-1). By equation 14.19 we
known that T (0) = 0 hence
{0} null(T )
(14.23)
(14.25)
Math 462
(14.26)
u v null(T ) = {0} .
(14.27)
Hence
Therefore uv = 0 or u = v which proves that T is injective (1-1) (because
T u = T v = u = v).
Theorem 14.18 Let V and W be vector spaces over F and let T L(V, W).
Then range(T ) is a subspace of W.
Proof. By definition R = range(T ) W. To show that R is a subspace of
W we need to show that:
(a) 0 R;
(b) R is closed under addition; and
(c) R is closed under scalar multiplication.
By equation 14.19, T (0) = 0, hence 0 R, proving (a).
Let w, z range(T ). Then there exist some u, v V such that T (u) = w
and T (v) = z. Then
T (u + v) = T u + T v = w + z
(14.28)
(14.29)
(14.30)
(14.31)
(14.32)
Page 99
Math 462
Hence
dim V = m + n
(14.33)
(14.34)
T v = T a1 u1 + + T am um + T b1 w1 + + T bn wn
(14.35)
= a1 T u1 + + am T um + b1 T w1 + + bn T wn
(14.36)
= b1 T w1 + + bn T wn
(14.37)
Hence
(14.38)
span(B 00 ) = range(T )
(14.39)
Hence
Since the length of B 00 is finite, range (T ) is finite dimensional (because it
is spanned by a finite list).
Suppose that there exist c1 , . . . , cn F such that
0 = c1 T w1 + + cn T wn
(14.40)
= T (c1 w1 + + cn wn )
(14.41)
(14.42)
(14.43)
or by rearrangement
c1 w1 + + cn wn d1 u1 + dm um = 0
(14.44)
(14.45)
Math 462
and thus (see equation 14.40) the only linear combination of the B 00 that
gives the zero vector is on in which all the coefficients are zero. This means
that B 00 is linearly independent.
Since B 00 is linearly independent and spans range(T ), it is a basis of range(T )
and hence
dim range(T ) = length(B 00 ) = n
(14.46)
Combining equations 14.46 with 14.31 and 14.33
dim V = m + n = dim null(T ) + dim range(T )
(14.47)
(14.48)
(14.49)
(14.50)
Since dim null(T ) > 0 then null(T ) must contain vectors other than 0, i.e.,
null(T ) 6= {0}.
Hence (see theorem 14.17) T is not injective.
Corollary 14.21 Let V and W be finite-dimensional vector spaces with
dim V < dim W
(14.51)
(14.52)
Hence there are vectors in W that are not mapped to by T , and thus T is
not surjective (onto).
Page 101
Math 462
Page 102
Topic 15
(15.2)
If the choice of bases for V and W are clear from the context we use the
notation M(T ).
The following may help you remember the structure of this matrix:
w1
..
.
v1
wm
vk
a1,k
..
.
vn
(15.3)
am,k
(15.4)
Page 103
Math 462
(15.5)
Proof. (sketch) Express the left hand side of each formula as a matrix and
then apply the properties of matrices as reviewed in Chapter 2.
Matrix Multiplication. To derive a rule for matrix multiplication, suppose that S L(U, V) and T L(V, W). The composition T S is a map
T S : U 7 W, i.e., T S L(U, W):
S : U 7 V and T : V 7 W
(15.6)
= T S : U 7 W
Let n = dim V, m = dim W, and p = dim U, and (v1 , . . . , vn ) be a basis of
V, (w1 , . . . , wm ) be a basis of W, and (u1 , . . . , up ) be a basis of U. Suppose
that
M(T ) = [ai,j ]i{1...m},j{1...n} = [m n] matrix
(15.7)
M(S) = [bj,k ]j{1...n},k{1...p} = [n p] matrix
Then
Suk =
n
X
(15.8)
bj,k vj
(15.9)
ai,j wi
(15.10)
j=1
T vj =
m
X
i=1
and therefore
T Suk = T
=
n
X
bj,k vj
(15.11)
bj,k T vj
(15.12)
j=1
n
X
j=1
n
X
bj,k
j=1
m
X
m
X
ai,j wi
(15.13)
i=1
n
X
wi
ai,j bj,k
i=1
(15.14)
j=1
Therefore if we identify
M(T S)i,k =
n
X
ai,j bj,k
(15.15)
j=1
Page 104
Math 462
[mn]
(15.16)
[np]
This is in fact the same definition of matrix multiplication with which you
are already acquainted.
Definition 15.3 Matrix of a Vector. Let V be a vector space over F and
let v V. If (v1 , , vn ) is a basis of V then for some some a1 , . . . , an F
v = a1 v1 + a2 v2 + + an vn
We define the matrix of the vector v as
a1
M(v) = ...
(15.17)
(15.18)
an
Theorem 15.4 Let V, W be vector spaces over F with bases (v1 , . . . , vn )
and (w1 , . . . , wm ) and let T L(V, W). Then for every v V,
M(T v) = M(T )M(v)
(15.19)
Proof. Let
a1,1
..
M(T ) = .
am, 1
a1,n
..
.
(15.20)
am,n
m
X
aj,k wj
(15.21)
j=1
n
X
bk vk
(15.22)
k=1
Page 105
Math 462
Hence
n
X
Tv = T
!
bk vk
(15.23)
k=1
=
=
n
X
k=1
n
X
bk T v k
bk
aj,k wj
(15.25)
j=1
k=1
m
X
m
X
(15.24)
wj
j=1
n
X
!
aj,k bk
(15.26)
aj,k bk
(15.27)
k=1
Therefore
[M(T v)]j =
n
X
k=1
Similarly, since
a1,1
..
M(T )M(v) = .
am, 1
Pn
a1,n
b1
k=1 a1,k bk
.. .. =
..
. . P
.
n
am,n
bn
k=1 am,k bk
(15.28)
we conclude that
[M(T )M(v)]j =
n
X
(15.29)
k=1
and therefore
M(T v) = M(T )M(v)
Page 106
(15.30)
Topic 16
Invertibility of Linear
Maps
Definition 16.1 Let V, W be vector spaces over F and let T L(V, W).
Then T is called invertible if there exists a linear map S L(W, V) such
that
ST = I V
(16.1)
TS = I W
The linear map S is called the inverse of T .
Theorem 16.2 The inverse is unique
Proof. Let T be a linear map with inverses S and S 0 . Then
S = SI = S(T S 0 ) = (ST )S 0 = IS 0 = S 0
(16.2)
(16.3)
Theorem 16.3 Let V and W be vector spaces over F and let T L(V, W ).
Then T is invertible if and only if T is both injective (one-to-one) and
surjective (onto).
Proof. ( = ) Assume that T is invertible.
Suppose that u, v V and that T u = T v. Then
u = T 1 (T u) = T 1 T u = T 1 T v = v
(16.4)
Page 107
Math 462
(16.5)
(16.6)
(16.7)
(16.8)
(16.9)
(16.10)
(16.11)
(16.12)
Page 108
Math 462
because T is linear
(16.13)
because T S = I
(16.14)
(16.15)
= Sw1 + Sw2
since ST = I
(16.16)
Homogeneity of T
(16.17)
Associative
(16.18)
because T S = I
(16.19)
ST (aSw) = Saw
(16.20)
aSw = Saw
because ST = I
(16.21)
= aT Sw
= aw
The last line shows that S is homogeneous. Hence S is a linear map; it has
the properties that ST = I and T S = I in their respective domains hence
it is the inverse of T . Hence T is invertible.
Definition 16.4 Let V and W be vector spaces over F. Then V and W are
said to be isomorphic vector spaces if there exists an invertible linear
map T L(V, W) (i.e., there also exists a linear map S = T 1 L(W, V)).
Definition 16.5 An operator is a linear map from a vector space to itself.
We denote the set of operators on V by L(V) (instead of L(V, V)).
Example 16.1 Let V = Rn . Then any n n matrix is an operator on V.
Theorem 16.6 Let V and W be finite-dimensional vector spaces. Then
they are isomorphic if and only if they have the same dimension.
Proof. ( = ) Assume that V and W are isomorphic.
Then there exists an invertible linear map T L(V, W).
Since T is invertible, it is injective (one-to-one), hence by theorem 14.17,
null(T ) = {0}. Hence
dim null(T ) = 0
(16.22)
2 Where
Page 109
Math 462
(16.23)
By equation 14.30
dim(V) = dim null(T ) + dim range(T ) = dim range(T ) = dim(W) (16.24)
( = ) Assume that dim(V) = dim(W). To show that this implies isomorphism, we need to show that there exists some invertible linear map
T L(V, W).
Let (v1 , . . . , vn ) and (w1 , . . . , wn ) be bases of V and W.
Define T L(V, W) such that
T (a1 v1 + + an vn ) = a1 w1 + + an wn
(16.25)
(16.27)
(16.28)
3 By
Page 110
Math 462
This theorem gives us the amazing result that any two finite dimensional
vector spaces of dimension n are isomorphic. In particular, as the following
corollary states, any vector space of dimension n is isomorphic to Fn . Thus
everything we need to know about vector spaces we can learn by studying
Fn .
Corollary 16.7 Let V be a finite dimensional vector space of dimension n.
Then V is isomorphic to Fn .
In particular, we will be interested in the matrix of a linear map T
L(V, W). This matrix, a linear map in L(Fn , Fm ) where n = dim(V) and
m = dim(W), is defined by the coefficients that map one set of basis vectors
to the others.
Theorem 16.8 Let V and W be finite dimensional vector spaces with bases
(v1 , . . . , vn ) and (w1 , . . . , wm ), and let T L(V, W). Then M
M(T ) : L(V, W) 7 Mat(m, n, F)
(16.29)
(16.30)
is invertible.
Proof. We have already shown that M(T ) is linear (theorem 15.2). To
show invertibility, we need to show that M(T ) is one-to-one and onto.
Let T null(M), that is, T L(V, W) such that M(T ) = 0, where 0 is
the n m zero matrix.
Then T vk = 0 for all k = 1, . . . , n.
Because (v1 , .P
. . , vn ) is a basis, every v V can be written as linear combination v =
ai vi for some a1 , a2 , . . . . Therefore
T (a1 v1 + + an vn ) = 0
(16.31)
In fact, this must hold for all values of ai F, since every collection of
a1 , a2 , . . . produces some vector in V. By linearity, since T vk = 0, we have
T v = 0 for all v V. Thus T = 0, i.e, T is the operator that maps all
vectors in V to the zero vector in W.
But since T null(M) = T = 0,this means that null(M) = {0}. By
14.17 M is one-to-one (injective).
Last revised: November 3, 2012
Page 111
Math 462
a1,1 a1,n
..
A = ...
.
am,1 am,n
(16.32)
If we define T L(V, W) by
T vk =
m
X
aj,k wj
(16.33)
j=1
(16.35)
Math 462
(16.36)
Page 113
Math 462
Page 114
Topic 17
Operators and
Eigenvalues
Definition 17.1 An operator is a linear map from a vector space to itself.
We denote the set of operators on V by L(V) (instead of L(V, V)).
If T L(V) then T n L(V) for any positive integer n. We use the notation
T 2 to denote the product T T , T 3 = T T T , etc.
If T is invertible, then we define T m = (T 1 )m . Furthermore,
T m T n = T m+n ,
(T m )n = T mn
(17.1)
(17.2)
(17.3)
(17.4)
Page 115
Math 462
(17.6)
(17.7)
(17.9)
u null(T I)
(17.10)
null(T I) 6= {0}
(17.11)
Math 462
(17.13)
and if vk is removed from (v1 , . . . , vm ) then the span of the remaining list
equals the span of the original list. Let k designate the smallest integer
such that this is true.
Since k is the smallest integer for which this is true, the list (v1 , . . . , vk1 )
is linearly independent.
Hence there exists constants a1 , . . . , ak F such that
vk = a1 v1 + + ak1 vk1
(17.14)
(17.15)
(17.17)
(17.18)
Since (v1 , . . . , vk1 ) is linearly independent, and since the j are all distinct,
a1 = a2 = = ak1 = 0
(17.19)
From equation 17.14, vk = 0. This contradicts the assumption that all the
vk 6= 0.
Therefore (v1 , . . . , vm ) must be linearly independent.
Page 117
Math 462
(17.20)
(17.21)
(17.22)
Since
the list ` is linearly-dependent. Hence there exists a0 , . . . , an , not all zero,
such that
0 = a0 v + a1 T v + a2 T 2 v + + an T n v
(17.23)
Define m n as the largest integer such that am 6= 0. Then
0 = a0 v + a1 T v + a2 T 2 v + + am T m v
(17.24)
(17.25)
= c(z 1 )(z 2 ) (z m )
(17.26)
(17.27)
2
Page 118
= (a0 I + a1 T + a2 T + + am T )v
(17.28)
= c(T 1 I) (T m I)v
(17.29)
Math 462
(17.30)
w = (T j+1 I) (T m I)v 6= 0
(17.31)
null(T j I) 6= {0}
(17.32)
where
Thus
Hence T j I is not injective (1-1). By theorem 17.8, j is an eigenvalue
of T .
Hence an eigenvalue exists.
Page 119
Math 462
Page 120
Topic 18
Matrices of Operators
Definition 18.1 Matrix of an Operator. Let T L(V) and let (v1 , . . . , vn )
be a basis of V. Then there are some numbers ai,j such that
T v1 = a11 v1 + + an1 vn
T v2 = a12 v1 + + an2 vn
(18.1)
..
T vn = a1n v1 + + ann vn
Then we define the matrix of T with respect to the basis (v1 , . . . , vn )
as
(18.3)
MT = A1 T A = (v1 , . . . , vn )1 T (v1 , . . . , vn )
(18.4)
or
where A = (v1 , . . . , vn ).
Page 121
Math 462
Theorem 18.2 Let V be a vector space over F with basis (v1 , . . . , vn ) and
T L(V) an operator on V. The the following are equivalent:
(1) M(T, (v1 , . . . , vn ) is upper triangular.
(2) T vk span(v1 , . . . , vk ) for k = 1, . . . , n.
(3) span(v1 , . . . , vk ) is invariant under T for each k = 1, . . . , n.
Proof. ((1) = (2)) Let M(T ) be upper triangular,
a11
0
M(T ) = 0
..
.
0
a12
a22
0
..
.
a33
..
.
a1n1
..
.
0
a1n
a2n
a3n
..
.
(18.5)
ann
(18.6)
(18.7)
Page 122
T v1 span(v1 )
T v2 span(v1 , v2 )
..
.
T vk span(v1 , . . . , vk )
..
.
(18.8)
Math 462
(18.9)
(18.10)
(18.11)
(18.12)
(18.13)
= dim U.
(18.14)
(18.15)
Page 123
Math 462
(18.16)
(18.17)
(18.18)
Since
(T I)vk U
(by definition of U)
(18.19)
(18.20)
we concluded that
T vk span(B)
(18.21)
Hence theorem 18.2 applies again and T has an upper triangular matrix
with respect to the basis B.
Theorem 18.4 Let V be a vector space over F and let T L(V ) be such
that M(T ) is upper triangular with resepct to some basis B = (v1 , . . . , vn )
of V. Then T is invertible if and only if all the entries on the diagonal of
M(T ) are nonzero.
Proof. ( = ) Let M(T ) be upper triangular, and write
1
0 2
M(T, B) =
..
.
.
0
.
0
.
0
(18.22)
Math 462
(18.23)
for i = 1, . . . , k 1.
Because k = akk = 0,
T vk = a1k v1 + a2k v2 + + ak1,k vk1 + akk vk
(18.24)
(18.25)
= T vk span(v1 , . . . , vk1 )
(18.26)
(18.27)
Sv = T |(v1 ,...,vk ) v
(18.28)
dim(v1 , . . . , vk1 ) = k 1
(18.29)
by
But
dim(v1 , . . . , vk ) = k
(18.30)
(18.31)
(18.32)
= k 1 + dim null(S)
(18.33)
(18.34)
(18.35)
Page 125
Math 462
Choose k to be the largest k such that equation 18.35 holds (with ak 6= 0).
Then
0 = T v = a1 T v1 + + ak1 T vk1 + ak T vk
= ak T vk = a1 T v1 ak1 T vk1
ak1
a1
T vk1
= T vk = T v1
ak
ak
(18.36)
(18.37)
(18.38)
T v1 = b1,1 v1
T v2 = b1,2 v1 + b2,2 v2
..
.
T vk1 = b1,k1 v1 + b2,k1 v2 + + bk1,k1 vk1
(18.39)
where the bij are the elements of M(T, B). Hence by substituting the
expressions for each T vj in (18.39) into the expression for T vk in (18.38),
T vk =
a1
a2
b1,1 v1 (b1,2 v1 + b2,2 v2 )
ak
ak
ak1
(b1,k1 v1 + b2,k1 v2 + + bk1,k1 vk1 )
ak
(18.40)
(18.41)
T v1 = a11 v1
T v2 = a22 v2
..
.
T vn = ann vn
Page 126
(18.42)
Math 462
(18.43)
(18.44)
4. V = null(T 1 I) null(T m I)
5. dim V = dim null(T 1 I) + + dim null(T m I)
Proof. ((1) (2)) This is theorem 18.5.
((2) = (3)) Assume (2), that V has a basis B = (v1 , . . . , vn ) consisting
of eigenvectors of T .
Let
Uj = span(vj ),
Last revised: November 3, 2012
j = 1, 2, . . . , n
(18.45)
Page 127
Math 462
(18.47)
(18.48)
(18.49)
(18.50)
(18.51)
(18.52)
Math 462
Since vi is an eigenvector,
span(vi ) = null(T i I)
(18.53)
(18.54)
(18.55)
Since each ui is an eigenvector, and the eigenvectors form a basis (by assumption), the ui are linearly independent. Thus each term in equation
18.55 is zero:
ui = 0
(18.56)
By theorem 10.7, the uniqueness of the expansion of 0 tells us that the sum
in (18.54) is a direct sum:
V = null(T 1 I) null(T n I)
(18.57)
(18.58)
(18.59)
Define ui as the sum of all the terms in (18.59) such that vk null(T
i I) = Ui (the number of eigenvalues might be smaller than the number of
eigenvectors so there might be more than one linearly independent eigenvector is each set). Thus for some m,
0 = u1 + + um
(18.60)
1 Exercise
Page 129
Math 462
(18.62)
(18.63)
(18.64)
= i ui
(18.65)
(18.66)
Page 130
Topic 19
(19.1)
y = Tx
(19.2)
given by
n
(19.3)
Similar matrices represent the same linear transformation in different coordinate systems.
Let E = (e1 , e2 , . . . , en ) be a basis of Rn . Then we can write
x = 1 e1 + 2 e2 + + n en
(19.4)
y = 1 e1 + 2 e2 + + n en
(19.5)
Page 131
Math 462
y = E
(19.6)
1
x1
..
..
=
E
. ,
.
y1
1
..
..
=
E
.
.
xn
yn
(19.7)
If y = T x then
E = T E = = (E 1 T E)
(19.8)
0
1
1
0
0.5
.7
=
0.7
5
(19.10)
1
,
1
e2 =
0
1
(19.11)
1
1
0
1
(19.12)
1
1
0
1
(19.13)
Math 462
1
1
0
0.5
0.5
=
1
0.7
0.2
(19.17)
1
T =
1
0
1
0.7
.7
=
=y
1.2
0.5
(19.19)
(19.20)
(19.21)
(19.22)
0.5
0.7
7
0.2
1.2
(19.23)
Page 133
Math 462
Proof. Let B = T 1 AT . Then, using the property that det AB = (det A)(det B)
gives
det(B I) = det(T 1 AT I)
(19.24)
= det(T
(A T IT
= det(T
(A I)T )
= (det T
)T )
)(det(A I))(det T )
= det(A I)
(19.25)
(19.26)
(19.27)
(19.28)
(19.29)
Since A and B have the same characteristic equation they have the same
eigenvalues with the same multiplicities.
Example 19.2 From the previous example, we had B = T 1 AT where
0 1
1 1
A=
,
B=
(19.30)
1 0
2
1
Each of these matrices have the same eigenvalues {i, i}. To see this observe that
det(A I) = 2 + 1
(19.31)
while
det(B I) = (1 )(1 ) + 2 = 1 + + 2 + 2 = 2 + 1 (19.32)
Definition 19.4 A diagonal matrix is called the Diagonal Canonical
Form of a square matarix A if it is similar to A.
Theorem 19.5 Let A be an n n square matrix. Then A is similar to a
diagonal matrix if and only if A has n linearly independent eigenvectors.
Proof. ( = ) Suppose A is similar to a diagonal matrix. Then there
exists some invertible matrix T with linearly independent column vectors
(e1 , . . . , en ) such that
T 1 AT = diagonal(1 , . . . , n )
(19.33)
Ae1
Page 134
AT = e1 en diagonal(1 , . . . , n )
Aen = 1 e1 n en
(19.34)
(19.35)
Math 462
On the left hand side, we see that the jth column vector of AT is Aej , and
on the right hand side the jth column is j ej .Hence
Aej = j ej
(19.36)
AT = diagonal(1 , . . . , n )
(19.41)
(19.42)
(19.43)
(19.44)
(19.45)
Page 135
Math 462
e2 =
1
=
2
1
1
1 1
i i
i
i
(19.46)
(19.47)
Hence
1 1 i
0
1
2 1 i
1 1 i
i
=
1
2 1 i
i 0
=
0 i
T 1 AT =
1
1 1
0
i i
i
1
(19.48)
(19.49)
(19.50)
Page 136
Topic 20
Invariant Subspaces
In this section we will assume that V is a real, finite-dimensional, non-zero
vector space.
Theorem 20.1 Let V be a finite dimensional non-zero real vector space.
Then V has an invariant subspace of either dimension 1 or dimension 2.
Proof. Let n = dim(V) > 0, T L(V), and pick any v V such that v 6= 0.
Then define the list
L = (v, T v, T v2 , . . . , T vn )
(20.1)
(20.2)
(20.3)
Then p(x) nas n complex roots, which may be grouped into m real roots
1 , . . . , m , and k = (m n)/2 complex conjugate pairs of roots (see theorem 11.12) and can be factored
p(x) = c(x 1 ) (x m )(x2 + 1 x + 1 ) (x2 + k x + k ) (20.4)
Page 137
Math 462
(20.5)
= (a0 I + a1 T + a2 T + + an T )v
(20.6)
(20.9)
(20.10)
= T au + bT u
(20.11)
= T au + b(j T u j u)
(20.12)
rearranging
T v = (a bj )T u bj u span(u, T u)
(20.13)
(20.14)
(20.15)
Math 462
(20.16)
(20.18)
Page 139
Math 462
Then
(T I)v = (T I)(u + aw)
(20.19)
= T u u + aT w aw
(20.20)
= T u u + a(T w w)
(20.21)
Since we can separate any vector, such as T w into the sum of its projections
onto U and W,
Tw
}|
{
z
(T I)v = T u u + a(PU ,W T w + PW,U T w w)
= T u u + a(PU ,W T w + Sw w)
= T u u +a PU ,W T w + (S I) w
| {z }
| {z } | {z }
U
(20.22)
(20.23)
(20.24)
=0
The first term is in U by definition of U as the span of (u, T u). The second
term is in U because it is the projection of a vector onto U with null space
W.
Hence (T I)v U.
(T I) : (U + span(w)) 7 U
(20.25)
(20.27)
is not injective (1-1), hence its null space is not {0} (theorem 14.17). Since
the null space is not {0}, there exists a non-zero vector v U + span(w)
such that
(T I)v = 0
(20.28)
Thus T has an eigenvalue.
Page 140
Topic 21
(21.1)
(21.2)
hv, vi = 0 v = 0
(21.3)
2. Definiteness:
3. Additivity in first variable: for all u, v, w V,
hu + v, wi = hu, wi + hv, wi
(21.4)
(21.5)
(21.6)
Page 141
Math 462
(21.8)
(21.9)
(21.10)
(21.11)
(21.12)
Proof of (2)
hu, v + wi = hv + w, ui
(conjugate symmetry)
(21.13)
= hv, ui + hw, ui
(21.14)
= hv, ui + hw, ui
(21.15)
= hu, vi + hu, wi
(conjugate symmetry)
(21.16)
Proof of (3)
hu, avi = hav, ui
(conjugate symmetry)
(21.17)
= a hv, ui
(21.18)
= ahv, ui
(21.19)
= a hu, vi
(conjugate symmetry)
Page 142
Math 462
k(z1 , . . . , zn )k = z1 z1 + + zn zn
(21.21)
(21.22)
|p(x)|2 dx
(21.23)
(21.25)
= hu, u + vi + hv, u + vi
(21.26)
(21.27)
= kuk + kvk
(21.28)
(21.29)
Page 143
Math 462
(21.30)
z
}|
{
u = (v u)v + (u (v u)v)
| {z }
(21.31)
parallel to v
(21.32)
hu, vi
kvk2
(21.33)
(21.34)
z
}|
{
hu, vi
hu, vi
v + u
v
u=
kvk2
kvk2
| {z }
(21.35)
parallel to v
Page 144
Math 462
(21.38)
(21.39)
| hu, vi |2
+ kwk2
kvk2
| hu, vi |2
kvk2
(21.40)
(21.41)
(21.42)
2
p(x)q(x)dx = | hp, qi |2
(21.43)
kpk2 kqk2
Z 1
Z
=
|p(x)|2 dx
0
(21.44)
1
|q(x)|2 dx
(21.45)
Page 145
Math 462
(21.46)
Proof.
2
ku + vk = hu + v, u + vi
(21.47)
(21.48)
(Cauchy-Schwarz Inequality)
(21.52)
(Conjugate Symmetry)
(21.49)
= (kuk + kvk)2
(21.53)
(21.54)
Proof.
2
ku + vk + ku vk = hu + v, u + vi + hu v, u vi
2
(21.55)
(21.56)
(21.57)
Page 146
Topic 22
(22.1)
Page 147
Math 462
a
a
(22.3)
(22.4)
(22.5)
Hence by the intermediate value theorem, g has a root r (a, b), where
g(r) = 0. Then
0 = g(r) = f (r) r = f (r) = r
(22.6)
i.e., r is a fixed point of f .
In the case just proven, there may be multiple fixed points. If the derivative
is sufficiently bounded then there will be a unique fixed point.
Theorem 22.3 (Condition for a unique fixed point) Let f be a continuous
function on [a, b] such that f : [a, b] 7 S (a, b), and suppose further that
there exists some postive constant K < 1 such that
|f 0 (t)| K,
Page 148
t [a, b]
(22.7)
Math 462
(22.8)
Suppose that a second fixed point q [a, b], q 6= p also exists, so that
q = f (q)
(22.9)
|f (p) f (q)| = |p q|
(22.10)
Hence
By the mean value theorem there is some number c between p and q such
that
f (p) f (q)
f 0 (c) =
(22.11)
pq
Taking absolute values,
f (p) f (q)
= |f 0 (c)| K < 1
pq
(22.12)
and thence
|f (p) f (q)| < |p q|
(22.13)
p1 = f (p0 )
p2 = f (p1 )
..
.
(22.14)
pn = f (pn1 )
..
.
converges to the unique fixed point of f in (a, b).
Proof. We know from theorem 22.3 that a unique fixed point p exists. We
need to show that pi p as i .
Since f maps onto a subset of itself, every point pi [a, b].
Last revised: November 3, 2012
Page 149
Math 462
Further, since p itself is a fixed point, p = f (p) and for each i, since pi =
f (pi1 ), we have
|pi p| = |f (pi1 ) f (p)|
(22.15)
If for any value of i we have pi = p then we have reached the fixed point
and the theorem is proved.
So we assume that pi 6= p for all i.
Then by the mean value theorem, for each value of i there exists a number
ci between pi1 and p such that
|f (pi1 ) f (p)| = |f 0 (ci )||pi1 p| K|pi1 p|
(22.16)
(22.17)
(22.18)
(22.19)
(22.20)
Thus pi p as i .
A weaker condition that is sufficient for convergence is the Lipshitz condition.
Definition 22.5 A function f on I R is said to be Lipshitz (or Lipshitz continuous, or satisfy a Lipshitz condition) on y if there exists
some constant K > 0 if for all x I then
|f (x1 ) f (x2 )| K|x1 x2 |
(22.21)
Math 462
Theorem 22.6 Under the same conditions as theorem 22.4 except that
the condition of equation 22.7 is replaced with the following condition:
f (t) is Lipshitz with Lipshitz constant K < 1. Then fixed point iteration
converges.
Proof. The Lipshitz condition gives equation 22.16 immediately. The rest
of the the proof follows as before.
(22.23)
Hence f is Lipshitz in y on D.
Last revised: November 3, 2012
Page 151
Math 462
(22.24)
form some K R, 0 < K < 1, for all v, w S. We will call the number K
the contraction constant.
Definition 22.11 Let V be a vector space over F and let v1 , v2 , . . . be a
sequence in V. Then we say that the sequence is Cauchy if kvm vn k 0
as n, m . More precisely, the sequence is Cauchyl if
( > 0)(N > 0)(m, n > N, m, n Z)(kvm vn k < )
(22.25)
The study of Complete spaces and Cauchy sequences is beyond the scope
of this class. We will just assume that we are working in a vector space in
which Cauchy sequences converge.
Definition 22.12 Let V be a vector space over F. Then we say that V is
complete if every Cauchy sequence in V converges to some element of V.
Lemma 22.13 Let T be a contraction on a complete normed vector space
V with contraction constant K. Then for any v V
kT n v vk
1 Kn
kT v vk
1K
(22.26)
1K
kT v vk
1K
(22.27)
1 Kn
kT v vk
1K
(22.28)
1 K n+1
kT v vk
1K
(22.29)
Math 462
(22.30)
1 Kn
kT v vk
1K
(22.31)
n1
(22.32)
Kk(T v) (T
v)k
..
. (repeating the step n times)
(22.33)
K n kT v v|
(22.34)
kT n+1 v vk K n kT v vk +
(22.35)
(22.36)
(22.37)
Page 153
Math 462
(22.38)
v0 = v
v1 = T v
v2 = T v1
..
.
vn = T vn1
..
(22.39)
Since T is a contraction,
kvm vn k = kT m v T n vk
(22.40)
m1
v) T (T
m1
n1
= kT (T
KkT
..
vT
n1
v)k
vk
K n kT mn v vk
(22.41)
(22.42)
(22.43)
1K
kvm vn k K n
(22.44)
(22.45)
(22.46)
Math 462
kvn uk <
(22.48)
2
By the triangle inequality,
kT u uk = kT u vn+1 + vn+1 uk
(22.49)
kT u vn+1 k + kvn+1 uk
(22.50)
kT u uk kT u T vn k + ku vn+1 k
(22.51)
Since vn+1 = T vn ,
(22.52)
ku vn k + ku vn+1 k
(22.53)
2ku vn k
(22.54)
<
(22.55)
(22.56)
Page 155
Math 462
(22.57)
is a norm.
The proof is left as an exercise.
The notation for the sup-norm comes from the p-norm,
!1/p
Z
b
|f (x)|p dx
(22.58)
lim kf kp = kf k
(22.59)
kf kp =
then
p
y(t0 ) = y0
(22.60)
has a unique solution (t) in the sense that 0 (t) = f (t, (y)), (t0 ) = y0 .
Proof. We begin by observing that is a solution of equation 22.60 if and
only if it is a solution of
Z t
(t) = y0 +
f (x, (x))dx
(22.61)
t0
(22.62)
t0
Math 462
(22.63)
atb
Z t
Z t
f (x, h(x))dx
f (x, g(x))dx y0
= sup y0 +
atb
t0
t0
(22.64)
Z t
[f (x, g(x)) f (x, h(x))] dx
= sup
atb
(22.65)
t0
t0
t
L sup
atb
|g(x) h(x)| dx
(22.67)
t0
(22.68)
atb
(22.69)
atb
K(b a) kg hk
(22.70)
Since K is fixed, so long as the interval (a, b) is larger than 1/K we have
kT g T hk K 0 kg hk
(22.71)
K 0 = K(b a) < 1
(22.72)
where
Thus T is a contraction. By the contraction mapping theorem it has a fixed
point; call this point . Equation 22.61 follows immediately.
Page 157
Math 462
Page 158
Topic 23
Orthogonal Bases
Definition 23.1 Kronecker Delta Function.1
1, if i = j
ij =
0, if i 6= j
(23.1)
(23.2)
(23.3)
for all a1 , a2 , F.
Proof. This follows immediately from the Pythagorean theorem.
Theorem 23.4 Let B = (e1 , . . . , em ) be and orthonormal list of vectors in
V. Then B is linearly independent.
Proof. Suppose there exist a1 , . . . , am F such that
0 = a1 e1 + + am em
1 Named
(23.4)
Page 159
Math 462
(23.5)
(23.6)
(23.7)
n
X
ai hei , ej i =
i=1
n
X
ai ij = aj
(23.8)
(23.9)
i=1
Substituting equation 23.9 into equation 23.8 for each aj gives equation
23.6.
Theorem 23.6 Gram-Schmidt Orthonormalization Procedure. Let
A = (v1 , . . . , vn ) be a linearly independent list of vectors in V. Then there
exists an orthonormal list of vectors B = (e1 , . . . , en ) in V such that
span(v1 , . . . , vj ) = span(e1 , . . . , ej ) j = 1, . . . , n
(23.10)
v1
kv1 k
(23.11)
and then define ej recursively for j > 1 from the e1 , . . . , ej1 . Clearly
ke1 k = 1.
To illustrate the general form we construct the first few. We define e2 by
normalizing2 the part of v2 that is orthogonal to e1 :
e2 =
v2 hv2 , e1 i e1
kv2 hv2 , e1 i e1 k
(23.12)
2 When
Page 160
Math 462
j1
X
i=1
j1
X
i=1
hvj , ei i ei
hvj , ei i ei
(23.14)
(23.15)
(23.16)
Page 161
Math 462
(23.17)
(23.19)
w =vu
(23.20)
v =u+w
(23.21)
Define
so that
Since B is a basis of U, u U, and
hw, ej i = hv, ej i hu, ej i = hv, ej i hv, ej i = 0
(23.22)
D X
E X
hw, ui = w,
hv, ei i ei =
hv, ei i hw, ei i = 0
(23.23)
Therefore
Page 162
Math 462
(23.24)
(23.25)
kv PU vk kv PU vk + kPU v uk
(23.26)
v PU v U
(23.27)
PU v u U
(23.28)
But
Page 163
Math 462
kv PU vk + kPU v uk = kv PU v + PU v uk
2
= kv uk
By the
(23.29)
(23.30)
Page 164
Topic 24
Fourier Series
Theorem 24.1 Let V be the set of all integrable functions f : [a, b] C
and let k(x) be any positive real-valued function on [a, b]. Then V is a
normed inner product space with inner product
Z b
hf, gi =
(24.1)
k(x)f (x)g(x)dx
a
ci fi = c0 f0 + c1 f1 + c2 f2 +
(24.2)
k=0
ij =
1, if i = j
0, if i 6= j
f (k) (0)
k!
(24.3)
Page 165
Math 462
such that
f (x) =
ak fk
(24.4)
k=0
Z
hf, gi =
f (x)g(x)dx
(24.5)
on the real vector space defined in the previous example, use the GramSchmidt process to find an orthogonal basis from the complete basis 1, x, x2 , . . .
Denote the original basis by fj = xj , j = 0, 1, 2, . . . the orthogonal basis by
pj , and the normalized basis by qj . Then since
1
kf0 k = hf0 , f0 i =
dx = 2
(24.6)
q0 =
(24.7)
Next we calculate
1
hf1 , q0 i =
2
x2 dx = 0
(24.8)
and thus
p1 = f1 hf1 , q0 i q0 = f1 = x
2
kp1 k =
x2 dx =
q1 =
Page 166
p1
=
kp1 k
3
x
2
2
3
(24.9)
(24.10)
(24.11)
Math 462
Next,
p2 = f2 hf2 , q0 i q0 hf2 , q1 i q1
Z 1
1
2
2
1
dx = =
hf2 , q0 i =
(x2 )
3
3
2
2
1
r !
Z 1
3
2
xdx = 0 (odd function)
hf2 , q1 i =
(x )
2
1
r
2
1
3
1
2
p2 = x
0
x = x2
3
2
3
2
2
Z 1
1
8
2
2
kp2 k =
x
dx =
3
45
1
r
2 2
kp2 k =
3 5
r
p2
3 5
1
q2 =
=
x2
kp2 k
2 2
3
(24.12)
(24.13)
(24.14)
(24.15)
(24.16)
(24.17)
(24.18)
and so forth.
Remark 24.3 The sequence of orthonormal functions generated in the previous example are related to the Legendre polynomials, which are solutions
of the initial value problem
d
d
(24.19)
(1 x2 ) Pn (x) + n(n + 1)Pn (x) = 0, Pn (1) = 1
dx
dx
in other words, they are eigenfunctions of the operator T L(V), where V
is the vector space of functions on [1, 1], given by
T f = [(1 x)2 f 0 ]0
(24.20)
with eigenvalues n(n + 1). It turns out the eigenfunctions of certain differential operators will always produce orthogonal bases. See any book on
boundary value problems or the Sturm-Liouville operator for more details.
Page 167
Math 462
Pn (x)
1
2
3x2 1
1
2
5x3 3x
1
8
35x4 30x2 + 3
1
8
hf, i i i
(24.21)
i=0
X
i=0
X
i=0
ci i
(24.22)
ci hi , j i =
ci ij = cj
(24.23)
i=0
Plugging the second equation into the first gives the desired result.
Remark 24.5 In the previous
P theorem we overlooked what we mean by
convergence of the series k=0 ck k . This is a subtle point that we will not
concern ourselves with in this class. In particular, the convergence of the
series only satisfies the concept of convergence in the mean, namely,
that
X
ck k sn
0 as n
(24.24)
k=0
Page 168
Math 462
The consequence is that the equality may not hold at a countable number
of points, in the sense that at any point x0 , equation 24.21 really means
X
1
f (x+
hf, i i i (x0 )
0 ) + f (x0 ) =
2
i=0
(24.25)
1
2
f (x)g(x)dx
(24.26)
hn , m i =
1
1
ei(nm)x
2 i(n m)
h
i
1
i(nm)
=
e
ei(nm)
2i(n m)
1
=
sin(n m) = 0
(n m)
=
(24.28)
(24.29)
(24.30)
(24.31)
(24.32)
ck eikx
(24.33)
k=
where
ck =
1
2
f (x)eikx dx
(24.34)
2 We
havent actually shown that j form a basis, only that they are orthonormal. To
show that it is a basis we have to show that it spans the space.
Page 169
Math 462
Example 24.4 Repeat the previous example with the set of functions
1 sin kx cos jx
k = , , , k, j = 0, 1, 2, . . .
(24.35)
hf, gi =
f (x)g(x)dx
(24.36)
(24.37)
(24.38)
(24.39)
a0 X
+
[ak cos kx + bk sin kx]
2
(24.40)
k=1
The coefficients in the Fourier series are called the Fourier Coefficients
cj = hf, j i
(24.41)
The following result tells us that the sum of the Fourier coefficients is
2
bounded by the square of the norm, kf k , in the sense that
X
2
|cj |2 kf k
(24.42)
This is different from a finite-dimensional vector spaces, because we saw
above that for a finite dimensional vector space V with basis e1 , . . . , en , if
we define ai = hv, ei i,
*
+
X
X
X
X
2
|ai |2
kvk = hv, vi =
ai ei ,
aj ej =
ai aj hei , ej i =
i
i,j
(24.43)
There is no reason to necessarily expect equality to hold in the case of the
infinite dimensional space.
Page 170
Math 462
|hf, k i| kf k
(24.44)
k=0
Proof. Let
sn =
n
X
hf, k i k
(24.45)
k=0
(24.46)
k=0
Therefore,
hf sn , i i = hf, i i hsn , i i = hf, i i hf, i i = 0
(24.47)
ksn k = kf k kf sn k kf k
(24.49)
But
2
ksn k = hsn , sn i
* n
+
n
X
X
=
hf, j i j ,
hf, k i k
j=0
n
X
j=0
n
X
j=0
n
X
hf, j i
(24.50)
(24.51)
k=0
n
X
hf, k i hk , j i
(24.52)
k=0
hf, j i hf, j i
(24.53)
| hf, j i |2
(24.54)
j=0
Page 171
Math 462
Substituting the right hand side of 24.54 into the the left hand side of
equation 24.49 gives
n
X
| hf, j i |2 = ksn k kf k
(24.55)
j=0
X
2
2
|hf, k i| = kf k
(24.56)
k=0
kf k = kf sn + sn k = kf sn k + ksn k
(24.57)
k=1
(24.59)
Math 462
k=0
n
X
(24.60)
(24.61)
ak k
(24.62)
k=0
(24.63)
kf tn k = hf tn , f tn i
(24.64)
(24.65)
and that
*
htn , tn i =
=
n
X
aj j ,
j=0
n
X
n
X
j=0
n
X
k=0
n
X
aj
j=0
n
X
j=0
n
X
aj
n
X
+
ak k
(24.66)
ak hi , j i
(24.67)
ak ij
(24.68)
k=0
k=0
aj aj
(24.69)
|aj |2
(24.70)
j=0
Furthermore,
*
hf, tn i =
f,
n
X
+
ak k
k=0
*
htn , f i =
n
X
+
ak k , f
k=0
n
X
k=0
n
X
k=0
ak hf, k i =
ak hk , f i =
n
X
k=0
n
X
ak ck
(24.71)
ak ck
(24.72)
k=0
Page 173
Math 462
Therefore
2
n
X
k=0
n
X
kf tn k = kf k +
= kf k +
|ak |2
|ak |2
k=0
n
X
k=0
n
X
ak ck
n
X
ak ck
(24.73)
k=0
2 Re(ak ck ) ck
(24.74)
k=0
Similarly,
2
n
X
k=0
n
X
kf sn k = kf k +
= kf k
|ck |2
n
X
ck ck
k=0
n
X
ck ck
(24.75)
k=0
|ck |2
(24.76)
k=0
Hence
n
X
|ck |2 = kf k kf sn k
(24.77)
k=0
(24.78)
= xx xy yx + yy
2
(24.79)
(24.80)
Hence
2
n
X
k=0
n
X
k=0
n
X
kf tn k = kf k +
= kf k +
= kf k +
|ak |2 +
n
X
(24.81)
k=0
|ak ck |2
n
X
|ck |2
(24.82)
k=0
2
k=0
(24.83)
=
n
X
|ak ck |2 + kf sn k kf sn k
(24.84)
k=0
with equality holding only if each of the terms in the first sum is zero,
namely, whence ak = ck for all k.
Page 174
Topic 25
Triangular Decomposition
Corollary 25.1 Schurs Theorem. Let V be a complex inner-product
space; and let T L(V) be an operator on V. Then there exists an orthonormal basis B of V such that M(T, B) is upper-triangular.
Proof. This follows from corollary 23.9.
Definition 25.2 Let U be a complex valued matrix. Then the Conjugate
Transpose of U, or Adjoint matrix1 denoted by U , is the complex
conjugate of the matrix transpose.
U = (UT ) = U
(25.1)
Page 175
Math 462
Proof. (1) and (2) follow from the fact that hrowi , columnj i = ij (because
U1 = U ) and the fact that rowi = (columni )T .
(3) Let be an eigenvalue of U with nonzero eigenvector v. Then since
v 6= 0,
Uv = v = Uv = v
(25.2)
= vT U = v
(25.3)
= v (U U)v = v v
=
vT Iv
= || kvk
= kvk = || kvk
2
= || = 1
(25.4)
(25.5)
(25.6)
(25.7)
(25.8)
(25.9)
Math 462
e1 = (1, 0, 0, 0, . . . , 0)
e2 = (0, 1, 0, 0, . . . , 0)
e3 = (0, 0, 1, 0, . . . , 0)
..
en = (0, 0, . . . , 0, 0, 1)
(25.10)
M = v1
v2
vn
(25.12)
Then
AM = A v1
v2
vn = 1 v1
Av2
Avn (25.13)
v1T
v2T
v Av Av
M AM =
(25.14)
2
n
1 1
..
T
vn
Page 177
Math 462
M AM =
v1T Av2
1 v1T v1
2 v2T v1
..
.
v1T Avn
1 vnT v1
(25.15)
1
0
M AM = .
(25.16)
B
..
0
where the * notation refers to values we dont care about.
Now we can make use of the inductive hypothesis. Since B has dimensions
(n 1) (n 1), there exists some unitary matrix W and some upper
triangular matrix T1 such that
W BW = T1
(25.17)
1
0
Y = .
..
(25.18)
0
Since W is unitary, so is Y. From equation (25.16),
1 0 0
1 1
0
0
0
Y (M AM)Y = .
..
.
.
..
.
W
B
0
0
0
(25.19)
1
0
= .
..
0
Page 178
W BW
1
0
= ..
.
0
T1
(25.20)
Math 462
(25.21)
(25.22)
(25.23)
(25.24)
(25.25)
(25.26)
(UTU ) = A = A = UTU
(25.27)
Since A = A,
Applying the conjugate transpose to the first term,
UT U = UTU
(25.28)
(25.29)
Since T is upper triangular this means that all off-diagonal elements must
be zero, and all diagonal elements are real. Thus (25.25) leads to
U AU = D
(25.30)
Page 179
Math 462
Page 180
Topic 26
(26.2)
(26.3)
By the linearity of
(26.4)
If we define v by
v = (e1 ) e1 + + (en ) en
(26.5)
1 Recall
from definition 14.1 that a linear map is a map that the properties of additivity
((u + v) = (u) + (v), u, v V) and homogeneity ((av) = a(v), v V, a F)
Page 181
Math 462
then
hu, vi = hu, (e1 ) e1 + + (en ) en i
(26.6)
(26.7)
= (u)
(26.8)
as desired. Note that v does not depend on u, only on and the basis.
To prove uniqueness, suppose that there exist v, w such that
(u) = hu, vi = hu, wi
(26.9)
(26.10)
vw =0
(26.11)
(26.12)
(26.13)
= hT v, ui + hT v, wi
(26.14)
= hv, T ui + hv, T wi
(26.15)
= hv, T u + T wi
(26.16)
(26.17)
= a hT v, wi
= a hv, T wi
= hv, aT wi
Page 182
(26.18)
(26.19)
(26.20)
(26.21)
Math 462
(26.22)
1. Additivity: (S + T ) = S + T
3. Adjoint of Adjoint: (T ) = T
4. Identity: I = I
(v V)
(26.23)
(v V)
(26.24)
(26.25)
w (range T )
(w W)
hT w, vi = 0
(w W)
v (range T )
(26.26)
(26.27)
(26.28)
= (range(T )
(26.29)
(26.30)
(26.31)
= range T
Last revised: November 3, 2012
Page 183
Math 462
Next we recall the definition of the adjoint matrix as the conjugate transpose (see definition 25.2). Then the adjoint operator and the adjoint matrix
are related in the following way:
Theorem 26.7 Let T L(V, W), and let E = (e1 , . . . , en ) and F =
(f1 , . . . , fm ) be orthonormal bases of V and W respectively. Then the matrix
of the adjoint of T is the adjoint (conjugate transpose) of the matrix of T :
M(T , F, E) = M(T, E, F )
(26.32)
Proof. (Exercise.)
Definition 26.8 Let T LV be an operator. Then we say T is selfadjoint or Hermitian if T = T .
Theorem 26.9 Let V be a finite-dimensional, nonzero, inner-product space
over F (either R or C) and let T L(V ) be a self-adjoint operator over V.
The the eigenvalues of T are all real.
Proof. Let be an eigenvalue of T with nonzero eigenvector v. Then
2
kvk = hv, vi
(26.33)
= hv, vi
(26.34)
= hT v, vi (because is an eigenvalue of T )
(26.35)
(26.36)
(26.37)
= hv, vi
(26.38)
= hv, vi
(26.39)
= kvk
(26.40)
(26.42)
= hT u, ui + hT u, wi + hT w, ui + hT w, wi
Page 184
(26.43)
Math 462
and
0 = hT (u w), u wi
= hT u, ui hT u, wi hT w, ui + hT w, wi
(26.44)
(26.45)
Subtracting gives
0 = hT (u + w), u + wi hT (u w), u wi
= 2 hT u, wi + 2 hT w, ui
= hT u, wi = hT w, ui
(26.46)
(26.47)
(26.48)
(26.49)
(26.50)
(26.51)
(26.52)
Subtracting,
0 = hT (u + iw), u + iwi hT (u iw), u iwi
(26.53)
= 2 hT u, iwi + 2 hiT w, ui
(26.54)
= 2i hT u, wi + 2i hT w, ui
(26.55)
= hT u, wi = hT w, ui
(26.56)
(26.57)
Since this must hold for all u, w, it certainly holds for w = T u. Then for
any u V,
0 = hT u, wi = hw, wi
(26.58)
which is true if and only if w = 0.
Hence for any u V, w = T u = 0. Hence T = 0.
Remark 26.11 The last result only holds if V is complex; if V is real
equations 26.49 and following do not hold. Consequently equation 26.48
does not imply that T = 0
Example 26.1 (Example of Remark 26.11). Let V be a real vector space
and
let T be the 90 degree rotation about the origin operator,e.g, if v =
x
V,
y
x
y
Tv = T
=
(26.59)
y
x
Last revised: November 3, 2012
Page 185
Math 462
Then
hT v, vi =
y
x
,
= yx + xy = 0
x
y
(26.60)
(26.61)
= hT v, vi hv, T vi
(26.62)
= hT v, vi hT v, vi
(26.63)
= hT v T v, vi
(26.64)
= h(T T )v, vi
(26.65)
(26.67)
(26.68)
(26.69)
Math 462
Hence
hT u, wi = hT w, ui
(26.70)
= hw, T ui
(26.71)
(26.72)
= hT u, wi (because V is real)
(26.73)
(26.74)
v V
(26.75)
Proof.
T is normal T T T T = 0
(26.76)
h(T T T T )v, vi = 0 v V
(26.77)
hT T v, vi = hT T v, vi
(26.78)
hT v, T vi = hT v, T vi
kT vk = kT vk
(26.79)
(26.80)
Page 187
Math 462
(26.81)
= (T I)(T I)
(26.82)
2
(26.83)
= T T T T + I
(26.84)
= T (T I) (T I)
(26.85)
= T T T T + || I
= (T I)(T I)
(26.86)
= H H
(26.87)
(26.88)
(26.89)
Thus
(T I)v = 0
(26.90)
= hu, vi u, v
(26.92)
(26.93)
= hT u, vi hu, T vi
(26.94)
= hT u, vi hT u, vi
(26.95)
=0
(26.96)
Page 188
Topic 27
a11 a1n
..
..
M(T, E) =
(27.1)
.
.
0
ann
where by definition of the matrix of an operator, the aij are given by the
coefficients of the expansion
T ei = a1i e1 + + ani en
(27.2)
(27.3)
so that
2
kT e1 k = |a11 |2
(27.4)
Page 189
Math 462
Similarly,
a11
M(T , E) =
..
.
a11
a1n
.. = ..
.
.
0
..
a1n
ann
(27.5)
ann
n
X
a1i ei
(27.6)
i=1
kT e1 k =
=
* n
X
n
X
a1i eei ,
a1j ej
j=1
i=1
n X
n
X
i=1 j=1
n X
n
X
i=1 j=1
n X
n
X
(27.7)
ha1i ei , a1j ej i
(27.8)
(27.9)
a1i a1j ij
(27.10)
i=1 j=1
= |a11 |2 + + |a1n |2
(27.11)
(27.12)
a11 0
0 a22
M(T ) = .
..
..
..
.
.
0
0
Page 190
(27.13)
0
a2n
..
.
(27.14)
ann
Math 462
Repeating the same argument with e2 we find that the only non-zero element in the second row is a22 . We proceed through the matrix and get the
same result on each row, giving us a completely diagonal matrix.
Since the matrix is diagonal, by theorem 18.5) E is an orthonormal basis
of eigenvalues.
Theorem 27.2 Real Spectral Theorem. Let V be a real inner-product
space and let T L(V). Then V has an orthonormal basis consisting of
eigenvectors of T if and only if T is self-adjoint.
Corollary 27.3 Let T L(V) be self-adjoint with distinct eigenvalues
1 , . . . , m . Then
V = null(V 1 I) null(V m I)
(27.15)
(27.16)
T 2 + T + I
(27.17)
then
is invertible.
Proof. Let v V be nonzero. Then
(T + T + I)v, v = T 2 v, v + hT v, vi + hv, vi
2
(27.18)
= hT v, T vi + hT v, vi + kvk
(27.19)
(27.20)
= hT v, T vi + hT v, vi + kvk
2
= kT vk + hT v, vi + kvk
(27.21)
(27.22)
(27.23)
(27.24)
Page 191
Math 462
Therefore
2
2
2
(T + T + I)v, v kT vk || kT vk kvk + kvk
(27.25)
= kT vk || kT vk kvk +
||
2
kvk
4
||2
2
2
kvk + kvk
(27.26)
4
2
|| kvk |
2
2
= kT vk
+
kvk > 0
2
4
(27.27)
where the last inequality follows because v 6= 0 and 2 < 4. Hence the
inner product on the left is non-zero. Hence
(T 2 + T + I)v 6= 0
(27.28)
[If it were equal to zero we would have 0 = h0, vi > 0 because the inner
product of any vector with the zero vectors must be zero.]
Since v is an arbitrary non-zero vector, this means that every non-zero
vector in V is not in null(T 2 + T + I), i.e.,
null(T 2 + T + I) = {0}
(27.29)
(27.30)
(27.31)
(27.32)
Math 462
(27.35)
(27.36)
Page 193
Math 462
(27.37)
Hence S is self-adjoint.
The inductive hypothesis applies to U because it has dimension smaller
than n. By the inductive hypothesis there is an orthonromal basis of U
consisting solely of eigenvectors of S. Joining u to this list of eigenvectors
of S (each of which is also an eigenvector of T ) gives a basis of V that
consists solely of eigenvectors of T .
We need the following lemma to apply the Spectral Theorem to matrices.
Lemma 27.6 Let T be a normal triangular matrix. Then T is a diagonal
matrix.
Proof. Let T be a triangular matrix with Tij = 0 for i > j. Since T is
normal, TT = T T, and in particular, the diagonal elements are equal:
(TT )ii = (T T)ii
(27.38)
Then
(T T)ii =
n
X
(T )ik Tki =
i
X
Tki Tki
(27.39)
k=1
k=1
(TT )ii =
n
X
k=1
Tik (T )ki =
n
X
Tik Tik
(27.40)
k=i
The second sum starts at i rather than at 1, because Tik = 0 for k < i.
From (27.38), we can equate the expressions in (27.39) and (27.40). This
gives
T1i T1i + + Tii Tii = Tii Tii + + Tin Tin
(27.41)
Each term is an absolute value, hence non-negative.
|T1i |2 + |T2i |2 + + |Tii |2 = |Tii |2 + |Ti,i+1 |2 + + |Tin |2
{z
} |
|
{z
}
top of column i
Page 194
(27.42)
RHS of row i
Math 462
(i = 1)
(27.43)
(i = 2)
(27.44)
(i = 3)
(27.45)
..
.
In the first line (27.43), the |T11 |2 cancels and we get a sum of non-negatives
equalling zero, hence
T12 = T13 = = T1n = 0
(27.46)
Cancelling the |T22 |2 in (27.44), and using T12 = 0 from (27.46) gives
T22 = T23 = = T2n = 0
(27.47)
Cancelling the |T33 |2 in (27.45), and using the facts that T13 = 0 from
(27.46), and that T23 = 0 from (27.47),
T34 = T35 = = T3n = 0
(27.48)
(27.49)
(27.50)
AA = UTU UT U = UTT U
Last revised: November 3, 2012
(27.51)
(27.52)
Page 195
Math 462
(27.53)
(27.54)
(27.55)
Page 196
Topic 28
b
a
(28.2)
where b 6= 0.
(3) For any orthonormal basis B of V, equation 28.2 holds with b > 0.
Proof. We will show that (1) = (2) = (3) = (1).
((1) = (2)) Assume that (1) is true, i.e.,
T T = T T but T 6= T
(28.3)
Page 197
Math 462
(28.4)
(28.5)
T v = cu + dv
(28.6)
(28.7)
(28.8)
M(T ) = M(T ) =
=
c
d
c d
(28.9)
where the last step follows because the vector space is real. By the definition
of M(T ), (equation 18.1),
T u = au + cu
(28.10)
T v = bu + dv
(28.11)
so that
kT uk = hau + cv, au + cvi = a2 + c2
(28.12)
(28.13)
(28.14)
kT vk = kT vk
(28.15)
and therefore
a2 + b2 = a2 + c2
2
c +d =b +d
(28.16)
(28.17)
Thus
b2 = c2 = b = c
Page 198
(28.18)
Math 462
and therefore
M(T ) 6= M(T )
(28.19)
a
b
(28.20)
c
a
6=
d
c
b
d
(28.21)
(28.22)
But
a
b
M(T )M(T ) =
M(T )M(T ) =
a
b
2
b
a b
a + b2 ab bd
=
d
b d
ab bd b2 + d2
2
b a b
a + b2 ab + bd
=
d b d
ab + bd b2 + d2
(28.23)
(28.24)
Substituting this into equation 28.22 and equating like components of the
matrix,
ab + bd = ab bd = ab = bd
(28.25)
Since b 6= 0, we have a = d, and thus
a
M(T, B) =
b
b
a
(28.26)
Page 199
Math 462
(28.28)
T v = bu + av
(28.29)
T (u) = T u = au bv
(28.30)
T (v) = T v = bu av
(28.31)
Thus
b
a
(28.32)
(M(T )) M(T ) =
(28.35)
=
b a b a
0
a 2 + b2
Since M(T )(M(T )) = (M(T )) M(T ) we conclude that T T = T T , as
required.
Block-Notation
We will sometimes divide a matrix into blocks and refer to each of the
blocks as a matrix in its own right. For example, we might write
a b c d e
f g h i j
k l m n o = A B
(28.36)
C D
p q r s t
u v w x y
Page 200
Math 462
where
A=
a
f
b
c
, B=
g
h
l
m
q , D = r
v
w
k
d e
, C = p
i j
u
n o
s t (28.37)
x y
AQ + BS
CQ + DS
(28.38)
so long as all the multiplicatons are well-defined (i.e., it is possible to multiply A by P, etc. ).
A square matrix M is called Block (Upper) Triangular if it can be
written in the form
(28.39)
M= .
..
..
..
.
.
...
0
...
Mnn
A1
0
A=
.
..
0
0
A2
..
.
...
...
..
.
..
.
0
0
..
.
0
Am
(28.40)
B1
0
B=
.
..
B2
..
.
...
...
..
.
..
.
0
0
..
.
0
Bm
(28.41)
Page 201
Math 462
A1 B1
0
AB =
.
..
0
0
A2 B2
..
.
...
...
..
.
..
.
0
0
..
.
0
Am Bm
(28.42)
(28.43)
= aj1 e1 + + ajm em + 0 f1 + + 0 fn
(28.44)
(28.45)
a11
..
.
a1m
M(T, B) =
0
.
..
0
Math 462
...
am1
..
.
b11
..
.
...
...
...
amm
0
..
.
b1m
c11
..
.
...
...
...
c1n
...
bn1
..
.
bnm
= A B
0 C
cn1
..
.
(28.46)
cnn
kT vk =
m
X
| hT v, ek i |2 +
k=1
n
X
| hT v, fk i |2
(28.47)
k=1
Hence
2
kT ej k =
=
m
X
k=1
m
X
| hT ej , ek i |2
(28.48)
| haj1 e1 + + ajm em , ek i |2
(28.49)
k=1
m
2
m X
X
hajp ep , ek i
=
k=1 p=1
m
2
m X
X
=
ajp hep , ek i
k=1 p=1
2
m X
m
m
X
X
=
ajp pk =
|ajk |2
k=1 p=1
(28.50)
(28.51)
(28.52)
k=1
kT ej k =
j=1
m X
m
X
|ajk |2
(28.53)
j=1 k=1
Page 203
Math 462
Bya similar calculation, kT ej k is the sum of the squares of the jth row
of A B
2
kT ej k =
m
X
|akj |2 +
k=1
so that
m
X
kT ej k =
j=1
n
X
|bkj |2
(28.54)
k=1
m X
m
X
|akj | +
j=1 k=1
m X
n
X
|bkj |2
(28.55)
j=1 k=1
kT ej k =
m
X
kT ej k
(28.56)
j=1
j=1
|ajk |2 =
j=1 k=1
m X
m
X
|akj |2 +
j=1 k=1
m X
n
X
|bkj |2
(28.57)
j=1 k=1
|bkj |2 = 0
(28.58)
j=1 k=1
(28.59)
A 0
0 C
(28.60)
Math 462
(28.62)
(28.63)
S v = T v = (T |U ) = T |U
(28.64)
Hence
(proof of theorem 28.2 (4)) Given that U is invariant under T and (T |U ) =
T |U , we need to show that T |U is normal on U.
To prove that T |U is normal on U we need to show that it commutes with
its adjoint. But
(T |U ) (T |U ) = (T |U )(T |U )
= (T |U )(T |U )
= (T |U )(T |U )
(by (3))
(28.65)
(T is self-adjoint so )T T = T T (28.66)
(by (3))
(28.67)
Thus (T |U ) is normal.
(proof of theorem 28.2 (5)) The proof is analogous to the proof of (4).
Theorem 28.3 Let V be a real inner product space and T L(V) be an
operator over V. Then T is normal if and only if there is some orthonormal
basis of V under which M(T ) is block diagonal and each block is either
1 1 or 2 2 with the form
a b
(28.68)
b a
with b > 0.
Proof. ( = ) Suppose that T is normal, and prove by induction on n =
dim(V).
For n = 1, the result follows immediately.
Last revised: November 3, 2012
Page 205
Math 462
For n = 2, then either T is self-adjoint or is not self-adjoint. If it is selfadjoint, then by the real spectral theorem (theorem 27.2), it has a basis
given entirely by eigenvectors of T , and hence by theorem 18.8, M(T ) is
diagonal with respect to some basis. If T is not self-adjoint, then by theorem
28.1 M(T ) has the form shown.
As our inductive hypothesis, assume n > 2 and that the result holds for
n 1.
By theorem 20.1, T has an invariant subspace U of either dimension 1 or
2, such that the dimension 1 subspaces have real eigenvalues and in the
dimension 2 subspaces T |U have only have complex conjugate eigenvalue
pairs with non-zero imaginary parts (i.e., they do not have real eigenvalues).
If dim (U) = 1, choose any v U such that kvk = 1 as a basis of U.
If dim (U) = 2, then T |U is normal (theorem 28.2, number (4)). But T |U is
not self-adjoint. To see this, note that if T were self-adjoint, it would have
a real eigenvalue by lemma 27.5 and we have just noted that T |U does not
have real eigenvalues on the 2-dimensional subspaces.
Since T |U is normal but not self adjoint (in dimension 2), theorem 28.1
applies. Then there is some basis in which M(T |U ) has the form given by
equation 28.68.
Next observe that U is invariant under T and T |U is normal on U by
theorem 28.2. Since dim U < n, the inductive hypothesis holds on it.
Hence there is some basis B under which M(T |U ) has the properties
predicted by the theorem. Adding this basis to the basis of U gives a
basis B 0 under which M(T, B 0 ) has the properties of the theorem.
( = ) Suppose there is some basis E under which M(T ) is block diagonal
M(T ) = diagonal(A1 , . . . , An )
(28.69)
(28.70)
(28.71)
Math 462
and therefore
M(T )M(T ) = diagonal(A1 , . . . , An )diagonal(AT1 , . . . , ATn )
(28.72)
(28.73)
(28.75)
= M(T )M(T )
(28.74)
(28.76)
Thus T is normal.
Normal Matrices
Definition 28.4 An n n square matrix is a normal matrix if all of its
eigenvectors are orthogonal.
Theorem 28.5 A normal matrix is sef-adjoint if and only if all of its eigenvalues are real.
Proof. Let N be a normal matrix. Then it has n orthogonal eigenvectors
v1 , . . . , vn with eigenvalues 1 , . . . , n .
Define U as the matrix whose column vectors are the orthonormal eigenvectors vi and = diagonal(1 , . . . , n ). Then U1 = U and
NU = N v1 vn
(28.77)
= Nv1 Nvn
(28.78)
(28.79)
= 1 v1 n vn
= v1 vn diagonal(1 , . . . , n )
(28.80)
= U
(28.81)
Hence
U NU =
(28.82)
N = UU
(28.83)
or equivalently
Thus N is similar to the diagonal matrix and has the same eigenvalues.
N is self-adjoint if and only if N = N, but
N = N (UU ) = UU
U U = UU
=
Last revised: November 3, 2012
(28.84)
(28.85)
(28.86)
Page 207
Math 462
The last line is true if and only if i = i for all i i.e., the eigenvalues are
all real.
Theorem 28.6 If N is a normal matrix then there exists square, commuting matrices A and B such that
N = A + iB and AB = BA
(28.87)
(28.88)
(28.89)
(28.90)
N = U(r + ii )U = Ur U + iUi U
(28.91)
where
Hence
Therefore N = A + iB, where
A = Ur U and B = Ui U
(28.92)
To see that A and B commute, we use the fact that diagonal matrices
commute:
AB = Ur U Ui U
(28.93)
= Ur i U
(28.94)
= Ui r U
(28.95)
= Ui U Ur U
(28.96)
= BA
(28.97)
Page 208
Math 462
Theorem 28.7 N is normal if and only if it commutes with its adjoint, i.e,
NN = N N
(28.98)
= U U
= U U
(28.99)
(28.100)
(28.101)
= (U U )(UU )
(28.102)
= N N
(28.103)
1
(N + N ),
2
1
(N N )
2i
B=
(28.104)
Then
N = A + iB
(28.105)
Furthermore, since NN = N N,
4iAB = (N + N )(N N )
2
(28.106)
2
(28.107)
(28.108)
(28.109)
= N + N N NN (N )
= N + NN N N (N )
= N(N + N ) N (N + N )
= (N N )(N + N )
(28.110)
= 4iBA
(28.111)
Hence AB = BA.
Let a1 a2 an be the eigenvalues of A and let = diagonal(a1 , . . . , an ).
Since A = A, we know the eigenvalues are real (see theorem 26.9).
Since A is real and self-adjoint it has n mutually orthogonal unit vectors
(real spectral theorem). Let V be the matrix whose columns are eigenvecLast revised: November 3, 2012
Page 209
Math 462
tors of A. Then
V AV = V
=V
Av1
a1 v1
Avn
an vn
(28.112)
(28.113)
= V V
(28.114)
(28.115)
Let
K = V BV
(28.116)
(28.117)
= V ABV
(28.119)
= V BAV
(28.118)
(28.120)
= (V BV)(V AV)
(28.121)
= K
(28.122)
ip Kpj =
n
X
p=1
n
X
p=1
n
X
p=1
p=1
ai ip Kpj =
Kip pj
(28.123)
Kip aj pj
(28.124)
ai Kij = Kij aj
(28.125)
(28.127)
Math 462
where each of the Ki is self-adjoint and has the same dimensions as the
corresponding i . (This follows from equation 28.125.)
Since Ki is self-adjoint, there is some self-adjoint matrix Wi such that
Wi Ki Wi is diagonal. Let
W = diagonal(W1 , . . . , Wm )
(28.128)
..
=
(28.129)
.
0
W1 K1 W
=
0
K1
0
0
Wm
0
..
..
W W =
.
0
..
0
W1
Km
0
Km Wm
Wm
0
Wn
0
..
(28.130)
Wm
(28.131)
W1
0
m
..
..
W1 1 W1
0
..
=
0
Wm
m Wm
W1 1 IW1
0
..
=
0
Wm m IWm
=
Wm
(28.132)
(28.133)
(28.134)
(28.135)
(28.136)
(28.138)
Page 211
Math 462
Page 212
Topic 29
Positive Operators
Definition 29.1 Let V be a finite-dimensioned non-zero inner-product
space over V and let T L(V) be a self-adjoint operator over V. Then
we say T is a positive operator operator, or a positive semi-definite
operator, or just T is positive, if T is self-adjoint and
hT v, vi 0
(29.1)
for all v V.
The term positive is a bit misleading because equality is allowed in equation 29.1; a better term might be non-negative rather than positive. An
operator that satisfies
hT v, vi > 0
(29.2)
unless v = 0 is called positive definite. When equality is allowed, the
definite becomes semi-definite but in our shorthand term, we call it
positive.
Example 29.1 Every orthogonal projection is positive.
Definition 29.2 A self-adjoint matrix A is called positive definite if
vT Av > 0
(29.3)
(29.4)
Page 213
Math 462
(29.5)
S(x, y, z) = (y, z, 0)
(29.6)
(29.7)
and
Then
Hence S is a square root of T .
Example 29.3 Square Root of a Matrix Find the square root of
125 75
T=
(29.8)
75 125
Let M = T be denoted by
a b
M=
(29.9)
c d
Then
T = M2 =
a
c
b a
d c
2
b
a + bc
=
d
ac + cd
ab + bd
bc + d2
(29.10)
(29.11)
ab + bd = 75
(29.12)
ac + cd = 75
(29.13)
bc + d = 125
(29.14)
The equations are non-linear and there are four solutions for M. To see
this, substractequation (29.11) from (29.14) to get
a2 = d2
This gives two choices, a = d or a = d.
We can rule out a = d since this would give a contradiction in either of
(29.12) or (29.13). Hence a = d.
Substituting this into equations (29.12) and (29.13) gives
Page 214
2ab = 75 = b = 75/2a
(29.15)
2ac = 75 = c = 75/2a
(29.16)
Math 462
752
4a2
(29.17)
Rearranging,
a4 125a2 +
752
=0
4
(29.18)
This is a quadratic in a2 , so
125 100
225
125
125 1252 752
=
=
or
a2 =
2
2
2
2
(29.19)
(29.22)
(29.23)
that
(29.24)
Page 215
Math 462
i=1
* n
X
i=1
* n
X
i=1
Page 216
ai Sei ,
j=1
n
X
+
aj ej
(linearity)
(29.28)
(eigenvalues)
(29.29)
j=1
ai i ei ,
n
X
+
aj ej
j=1
Math 462
Hence by additivity, homogeneity in the first argument, and conjugate homogeneity in the second argument of the inner product,
hSv, vi =
n X
n
X
i=1 j=1
n
n X
X
i=1 j=1
n X
n
X
hai i ei , aj ej i
(29.30)
ai i aj hei , ej i
(29.31)
ai i aj ij
(29.32)
i=1 j=1
n
X
i |ai |2 0
(29.33)
i=1
(29.34)
(29.36)
= a1 1 e1 + + an n en
(29.38)
= a1 T e1 + + an T en
(29.35)
(29.37)
(because T ej = j ej )
(29.39)
= T (a1 e1 + an T en )
(29.40)
= Tv
(29.41)
Page 217
Math 462
T = (S S) = S (S ) = S S = T
(29.42)
(29.43)
(29.44)
(29.45)
Page 218
Math 462
p
j I) = null(T j )I
(29.50)
(29.51)
(29.52)
= hU MUv, vi
(29.53)
= hDv, vi
(29.54)
= h(1 v1 , . . . , n vn ), vi
2
(29.55)
2
(29.56)
hMui , ui i = hi ui , ui i = i kui k = i
(29.57)
(29.58)
Page 219
Math 462
Thus
*
hMv, vi =
n
X
ai ui ,
i=1 j=1
n X
n
X
+
aj uj
(29.59)
hMai ui , aj uj i
(29.60)
ai aj i hui , uj i
(29.61)
|ai |2 i ij
(29.62)
i=1
n X
n
X
i=1 j=1
n
n X
X
n
X
j=1
i=1 j=1
|ai |2 i
(29.63)
which is greater than zero if and only if all eigenvalues are positive.
Page 220
Topic 30
Isometries
Definition 30.1 A operator S L(V) is called an isometry if it preserves
length in the sense
kSvk = kvk
(30.1)
for all v V.
Example 30.1 Suppose that S L(V) such that
S(ej ) = j ej
(30.2)
n
X
i=1
n
X
| hv, ei i |2
(30.3)
i=1
n
X
hv, ei i Sei =
i=1
n
X
i hv, ei i ei
=
=
(30.5)
XX
i
(30.4)
i=1
(30.6)
Xn
i=1
|i |2 | hv, ei i |2 = kvk
(30.7)
Page 221
Math 462
where the last step follows from equation 30.3. Therefore S is an isometry.
Theorem 30.2 Properties of Isometries. Let S L(V). Then the
following are equivalent:
1. S is an isometry.
2. hSu, Svi = hu, vi for all u, v V.
3. S S = I
4. (Se1 , . . . , Sen ) is orthonormal whenever (e1 , . . . , en ) is orthonormal.
5. There exists an orthonormal basis (e1 , . . . , en ) such that (Se1 , . . . , Sen )
is orthonormal.
6. S is an isometry.
7. hS u, S vi = hu, vi for all u, v V.
8. SS = I
9. (S e1 , . . . , S en ) is orthonormal whenever (e1 , . . . , en ) is orthonormal.
10. There exists an orthonormal basis (e1 , . . . , en ) such that (S e1 , . . . , S en )
is orthonormal.
Proof. We will show that (1) through (5) are equivalent. The proof that
conclusions (6) through (10)are equivalent is analogous. To see that (1)
through (5) are equivalent to (6) through (10) we compare items (3) and
(8), and observe that SS = I if and only if S S = I.
((1) = (2)) Assume that S is an isometry; then kSxk = kxk for all x V.
Then for any u, v V,
1
2
2
(homework)
(30.8)
kSu + Svk kSu Svk
hSu, Svi =
4
1
2
2
=
kS(u + v)k kS(u v)k
(30.9)
4
1
2
2
=
ku + vk ku vk
(isometry) (30.10)
4
= hu, vi
(homework) (30.11)
((2) = (3)) Suppose that hSu, Svi = hu, vi for all u, v V. Then
h(S S I)u, vi = hS Su, vi hu, vi = hSu, Svi hu, vi = 0
(30.12)
(30.13)
Math 462
Hence
(S S I)u = 0
(30.14)
S S I = 0 = S S = I
(30.15)
(30.16)
i=1
n X
n
X
i=1 j=1
n
X
(30.17)
(30.18)
(30.19)
j=1
| hv, ei i |2 = kvk
(30.20)
(30.21)
i=1
where the last step comes from theorem 23.5. Thus S is an isometry and
(5) = (1).
Remark 30.3 Every isometry is normal, since S S = I = SS .
Page 223
Math 462
(30.22)
a
b
b
a
(30.24)
Math 462
There must be basis vectors ej , ej+1 corresponding to this block such that
Sej = aej + bej+1
x
To see why this is true, let =
R2 . Then
y
a b x
=
b a
y
ax by
x
y
=
=a
+b
bx + ay
y
x
= a + b
(30.25)
(30.26)
(30.27)
(30.28)
1 = kej k = kSej k
(30.29)
(30.30)
2
=a +b
(30.31)
(30.32)
as required.
( = ) Suppose that for some orthonormal basis the matrix has the desired
form. Then we can decompose V into subspaces such that
V = U1 Um
(30.33)
(30.34)
(30.35)
(30.36)
(30.37)
Page 225
Math 462
where u =
Hence
2
(30.38)
(30.39)
X
X
(30.41)
hSui , Suj i =
i,j
(30.40)
hS Sui , uj i =
i,j
hui , uj i
(30.42)
i,j
(30.43)
kui k = kvk
(30.44)
(30.45)
which is a rotation by an angle in R2 ; thus every rotation in Rn is composed of a sequence of rotations in the coordinate planes.
Corollary 30.7 Let V = Rn where n is odd, and let S be an isometry on
V. Then S has either 1 or 1 (or possibly both) as an eigenvalue.
Page 226
Topic 31
Singular Value
Decomposition
Note for Next Year
Move the material from chapter 7 to the end of this chapter and integrate
more thoroughly.
Theorem 31.1 Polar Decomposition. Let T L(V). Then there exists
an isometry S L(V) such that
(31.1)
T = S T T
Lemma 31.2 Let U be a subspace of V. Then
dim V = dim U + dim U
(31.2)
kT vk = hT v, T vi = hT T v, vi 0
(31.3)
2
(31.4)
kT vk =
T T T T v, v
D
E
=
T T v, T T v
(31.5)
2
=
T T v
(31.6)
Page 227
Math 462
Now define
S 0 : range T T 7 rangeT
by
S0
T T v = T v
(31.7)
(31.8)
T T v1 = T T v2
(31.9)
Then
kT v1 T v2 k = kT (v1 v2 )k
=
S 0 T T (v1 v2 )
=
S 0 ( T T v1 T T v2 )
=0
(31.10)
(31.11)
(31.12)
(31.13)
Hence
T v1 = T v2
(31.14)
(31.15)
= kT vk (equation 31.8)
=
T T v
(equation 31.6)
(31.17)
= kuk
(31.18)
(31.16)
Therefore S 0 is an isometry.
Since S 0 is injective we have by theorem 14.19 that
(31.19)
= dim range(S )
(31.20)
= dim range(T )
(31.21)
Math 462
dim range T T
= dim (rangeT )
(31.22)
(31.23)
(31.24)
range T T
and F =
7 (rangeT )
S 00 : range T T
S 00 (a1 e1 + + an en ) = a1 f1 + + an fn
(31.25)
(31.26)
(31.27)
(31.28)
ai aj hfi , fj i
(31.29)
i,j
|ai |2 = kwk
(31.30)
hence S 00 is an isometry.
Define S as the operator
v range T T
Sv =
S 00 v, v range T T
S 0 v,
(31.31)
w range T T
such that
v = u + w = Sv = S 0 u + S 00 w
Last revised: November 3, 2012
(31.32)
Page 229
Math 462
But by definiton of S 0 ,
S T T v = S0 T T v = T v
(31.33)
kSvk = kS 0 u + S 00 wk
2
(31.34)
2
00
= kS uk + ks wk
2
= kuk + kwk
(Pythagorean Theorem)
(31.35)
(31.36)
(Pythagorean Theorem)
(31.37)
= kvk
Hence S is an isometry.
Example 31.1 Find a Polar Decomposition of
11 5
T=
(31.38)
2 10
2
10
11
T =T =
5
Hence
(31.39)
2 11
10 2
5
125 75
=
10
75 125
1 15
5
2
11
5
1 5
5
,
15
2 15
1
15
5
,
5
15
2
1 15
15
,
5
2 5
(31.40)
5
15
(31.41)
15
15
2 5
2 5
1 250 150
125 75
=
=
= T T
75 125
2 150 250
Page 230
(31.42)
(31.43)
(31.44)
Math 462
(31.45)
T = S T T = SM = S = TM1
We can calculate that
M1 =
1
3
20 2 1
1
3
(31.46)
Hence
S = TM1 =
1
11
20 2 2
5 3
10 1
1 7
1
=
3
5 2 1
1
7
1
=I
7
(31.47)
(31.48)
1 7 1
1 15
5
T=S T T=
(31.49)
15
5 2 1 7
2 5
|
{z
} |
{z
}
S
T T
M v = v = M M v = M v = 2 v
(31.50)
= M v = 2 v
( = ) Let 2 be an eigenvalue of M with eigenvector v. Then
(31.51)
(31.52)
or is an eigenvalue of M .
Corollary 31.5 The singular values of T are the square roots of eigenvalues
of T T .
Last revised: November 3, 2012
Page 231
Math 462
(31.53)
The singular values of T are the square roots of the eigenvalues of M, where
M = T T. But from the previous example
125 75
T T =
(31.54)
75 125
The characteristic equation is
0 = (125 )2 (75)2 = 2 250 + 10, 000
= ( 200)( 50)
(31.55)
(31.56)
200, 50 = 10 2, 5 2.
(31.57)
for every v V.
(31.59)
M v = M hv, e1 i e1 + + M hv, en i en
(31.60)
= s1 hv, e1 i e1 + + sn hv, en i en
(31.61)
Multiplying by M ,
T = S T T = SM
(31.62)
Page 232
Math 462
(31.63)
(31.64)
where
fj = Sej ,
j = 1, . . . , n
(31.65)
s1 0 0
e1
0 s2 0 ..
f
.
.
.
f
Av =
(31.69)
..
. v
1
n
.
en
0
0 sn
s1 0 0
hv, e1 i
0 s2 0 .
= f1 . . . fn .
(31.70)
..
..
hv, en i
0
0 sn
s1 hv, e1 i
..
= f1 . . . fn
(31.71)
.
sn hv, en i
= s1 hv, e1 i f1 + + sn hv, en i fn
(31.72)
Page 233
Math 462
(31.76)
1 15
5
M = T T =
(31.77)
15
2 5
and
1 7
S=
5 2 1
1
7
(31.78)
and
.
Orthonormal eigenvectors of M are
2 1
2 1
We also find the orthonormal basis fj from fj = Sej as
1 7 1
1
1
4/5
f1 = Se1 =
=
(31.79)
1
3/5
5 2 1 7
2
1 7 1
1
1
3/5
f2 = Se2 =
=
(31.80)
1
4/5
5 2 1 7
2
Hence the singular value decomposition is
0
1/
4/5 3/5 10 2
2
M = FSE =
3/5 4/5
0
5 2
1/ 2
Page 234
1/2
1/ 2
(31.81)
Topic 32
Generalized Eigenvectors
Motivation Let V be a vector space over F and let T be an operator on
V. Then we would like to describe T by finding subspaces of V in which
V = U1 Un
(32.1)
(32.2)
(32.3)
1 1 1
M = 0 2 1
1 0 2
(32.4)
Page 235
Math 462
3a
a
1 1 1
0 2 1 b = 3b
3c
c
1 0 2
(32.5)
(32.6)
Hence
a + b + c = 3a
(32.7)
2b + c = 3b = b = c
(32.8)
a + 2c = 3c = a = c
(32.9)
(32.10)
2b + c = b = c = b
(32.11)
a + 2c = c = a = c
(32.12)
0 1 1 0
(M I)2 = 0 1 1 0
1 0 1 1
(32.13)
1
1
1 = 1
1
1
1
1
1
2
2
2
(32.14)
(32.15)
Math 462
(32.16)
(32.17)
(32.18)
Hence null(T k ) null(T k+1 ). Since this holds for all k, the result follows.
Theorem 32.4 If for some m > 0, null(T m ) = null(T m+1 ) then
null(T 0 ) null(T 1 ) null(T m ) = null(T m+1 ) =
(32.19)
In other words, once two nullspaces in equation 32.17 are equal, all successive nullspaces are equal.
Proof. Suppose that for some m > 0,
null(T m ) = null(T m+1 )
(32.20)
(32.21)
(32.22)
(32.23)
Hence
Page 237
Math 462
where the second equality follows from our initial assumption (equation
32.20). Hence since
T m+k v = T m T k v = T m 0 = 0
(32.24)
(32.25)
(32.26)
Proof. Let m = dim V and suppose that null(T dim V ) 6= null(T dim V+1 )
(proof by contradiction). Then
{0} = null(T 0 ) null(T 1 ) null(T m ) null(T m+1 )
(32.27)
where the subsets are strict (not equality). Since the subsets are strict, the
dimension of each set must increase by at least 1. Hence
{0} = dim null(T 0 ) < dim null(T 1 ) <
< dim null(T m ) < dim null(T m+1 )
dim null(T dim V+1 ) = dim null(T m+1 ) > m + 1 = dim V + 1
(32.28)
(32.29)
But T dim V+1 is a subspace of V and cannot have dimension larger than
V. Hence this is a contradiction. Hence our assumption is false, and the
theorem follows.
Theorem 32.6 Let be an eigenvalue of T L(V). The set of generalized
eigenvalues of T corresponding to eigenvalue equals null((T I)dim V ).
Proof. Let v null((T I)dim V ). Then by definition of generalized eigenvectors, v is a generalized eigenvector with eigenvalue . This proves that
null((T I)dim V ) the set of generalized eigenvectors for
(32.30)
(32.31)
Math 462
Let S = T I. Then
v null(S j ) null(S j+1 ) S dim V
(32.32)
(32.33)
Thus
The set of generalized Eigenvectors for null((T I)dim V )
(32.34)
dim V
v = 0. Thus N
(32.35)
dim V
= 0.
(32.36)
(32.37)
Page 239
Math 462
Page 240
Topic 33
The Characteristic
Polynomial
Definition 33.1 The multiplicity of an eigenvalue is the dimension
of the subspace of generalized eigenvectors corresponding to , i.e.,
multiplicity() = dim null((T I)dim V )
(33.1)
(33.2)
times.
Proof. Consider first the case with = 0. To prove the general case, replace
T with T 0 = T I in what follows.
Define n = dim V.
For n = 1 the result holds because M is 1 1.
Let n > 1 and assume the result holds for n 1.
Suppose that (with respect to B), M is upper triangular; define i as the
diagonal elements, so that,
Page 241
Math 462
0
M=
.
..
..
.
..
.
n1
(33.3)
..
..
M0 = M(T |U , U) = ...
(33.4)
.
.
0 n1
By the inductive hypothesis, 0 appears on the diagonal of M0
dim null((T |U )dim U ) = dim null((T |U )n1 )
(33.5)
(33.6)
Hence (combining the last two equations), the number of zeros on the diagonal of M 0 is
dim null((T |U )n )
(33.7)
We consider two cases: $\lambda_n \ne 0$ and $\lambda_n = 0$.

Case 1: $\lambda_n \ne 0$. By equation 33.3,

$$M(T^n) = M(T)^n = M^n = \begin{pmatrix} \lambda_1^n & \cdots & * \\ & \ddots & \vdots \\ 0 & & \lambda_n^n \end{pmatrix} \tag{33.8}$$

Hence

$$T^n v_n = u + \lambda_n^n v_n \tag{33.9}$$

for some $u \in U$.

[To see this, let $m_i$ be the $i$th column vector of $M^n$ and $(v_n)_i$ the $i$th coordinate of $v_n$; then

$$M^n v_n = \begin{pmatrix} m_1 & \cdots & m_n \end{pmatrix} v_n = \sum_i m_i (v_n)_i = m_n \tag{33.10}$$

because $v_n$ has coordinates $(0, \ldots, 0, 1)$. The first $n - 1$ entries of the last column $m_n$ describe an element of $U$, and its last entry is $\lambda_n^n$.]

Now let $v \in \operatorname{null}(T^n)$ and write $v = u_0 + a v_n$ for some $u_0 \in U$ and scalar $a$. Then

$$0 = T^n v = T^n u_0 + a T^n v_n \tag{33.11}$$
$$= T^n u_0 + a u + a \lambda_n^n v_n \tag{33.12}$$

where we have used equation 33.9 in the last step. Since $U$ is invariant under $T$, and hence under $T^n$, the first two terms are in $U$, while the last term is in $\operatorname{span}(v_n)$, which is not in $U$. Since the sum is $0$, each part must be zero. Hence

$$a \lambda_n^n = 0 \tag{33.13}$$

Since we have assumed (case 1) that $\lambda_n \ne 0$, this means $a = 0$. Hence $v = u_0 + a v_n = u_0 \in U$. But we chose $v$ as an arbitrary element of $\operatorname{null}(T^n)$. Hence

$$\operatorname{null}(T^n) \subseteq U \tag{33.14}$$
and therefore

$$\operatorname{null}(T^n) = \operatorname{null}((T|_U)^n) \tag{33.15}$$

(otherwise there would be some element of $\operatorname{null}(T^n)$ that was not in $U$). Hence by equation 33.7 the number of zeros on the diagonal of $M'$ is

$$\dim \operatorname{null}((T|_U)^n) = \dim \operatorname{null}(T^n) \tag{33.16}$$
Since $\lambda_n \ne 0$, the diagonals of $M$ and $M'$ contain the same number of zeros, so the result follows in this case.

Case 2: $\lambda_n = 0$. Recall the dimension formula

$$\dim(X + Y) = \dim X + \dim Y - \dim(X \cap Y) \tag{33.17}$$

Since $U$ is invariant under $T$,

$$U \cap \operatorname{null}(T^n) = \operatorname{null}((T|_U)^n) \tag{33.18}$$

hence

$$\dim(U \cap \operatorname{null}(T^n)) = \dim \operatorname{null}((T|_U)^n) \tag{33.19}$$

and equation 33.17 gives us

$$\dim \operatorname{null}(T^n) = \dim(U + \operatorname{null}(T^n)) + \dim(U \cap \operatorname{null}(T^n)) - \dim(U) \tag{33.20}$$
$$= \dim \operatorname{null}((T|_U)^n) + \dim(U + \operatorname{null}(T^n)) - (n - 1) \tag{33.21}$$
It remains to compute $\dim(U + \operatorname{null}(T^n))$. Since $\lambda_n = 0$, the last diagonal entry of $M$ is zero, so $T v_n \in U$. Because $U$ is invariant under $T$,

$$T^n v_n = T^{n-1}(T v_n) \in \operatorname{range}((T|_U)^{n-1}) = \operatorname{range}((T|_U)^n) \tag{33.24}$$

where the last equality holds because the ranges of powers of $T|_U$, like the nullspaces, have stabilized by the power $\dim U = n - 1$. Hence there is some $u \in U$ with

$$T^n v_n = T^n u \tag{33.25}$$

so that $T^n(v_n - u) = 0$, i.e., $v_n - u \in \operatorname{null}(T^n)$. Since $u \in U$ and $v_n \notin U$, the vector $v_n - u$ is not in $U$; hence $U + \operatorname{null}(T^n)$ strictly contains the $(n-1)$-dimensional space $U$, and therefore

$$\dim(U + \operatorname{null}(T^n)) = n \tag{33.30}$$

Substituting this into equation 33.21,

$$\dim \operatorname{null}(T^n) = \dim \operatorname{null}((T|_U)^n) + n - (n - 1) = \dim \operatorname{null}((T|_U)^n) + 1 \tag{33.33}$$

Using the last result in equation 33.7 gives, since $\lambda_n = 0$,

$$\text{number of zeros on the diagonal of } M = \dim \operatorname{null}((T|_U)^n) + 1 = \dim \operatorname{null}(T^n) \tag{33.34}$$

completing the induction.

Theorem 33.3 Let $V$ be a complex vector space and let $T \in L(V)$ have distinct eigenvalues $\lambda_1, \ldots, \lambda_m$. Then

$$\dim V = \operatorname{multiplicity}(\lambda_1) + \cdots + \operatorname{multiplicity}(\lambda_m) \tag{33.36}$$

This follows from theorem 33.2: $T$ has an upper-triangular matrix with respect to some basis, its diagonal has $\dim V$ entries, and each entry contributes to the multiplicity of exactly one eigenvalue.
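Theorem 33.2 is also easy to check by machine for the matrix $M$ of the earlier example (this sketch and its names are mine; sympy's Jordan form is one upper-triangular representation of $M$):

    import sympy as sp

    M = sp.Matrix([[1, 1, 1],
                   [0, 2, 1],
                   [1, 0, 2]])

    P, J = M.jordan_form()
    diag = [J[i, i] for i in range(3)]
    print(diag)                      # [1, 1, 3], up to ordering

    # Each eigenvalue appears on the diagonal multiplicity(lambda) times,
    # where multiplicity(lambda) = dim null((M - lambda*I)^3).
    for lam in set(diag):
        mult = len(((M - lam * sp.eye(3))**3).nullspace())
        print(lam, diag.count(lam), mult)   # the two counts agree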
Theorem 33.4 (Cayley-Hamilton) Let $V$ be a complex vector space, $T \in L(V)$, and let $p$ be the characteristic polynomial of $T$. Then $p(T) = 0$.

Proof. Let $B = (v_1, \ldots, v_n)$ be a basis of $V$ with respect to which $M(T, B)$ is upper triangular, with diagonal elements $\lambda_1, \ldots, \lambda_n$, so that $p(z) = (z - \lambda_1) \cdots (z - \lambda_n)$. We must show that

$$p(T)v = 0 \tag{33.39}$$

for all $v$. Since any $v$ can be expanded as a sum of the basis vectors, this is also equivalent to proving that

$$p(T)v_j = 0 \tag{33.40}$$

for $j = 1, \ldots, n$. To do this we show by induction on $j$ that

$$(T - \lambda_1 I)(T - \lambda_2 I) \cdots (T - \lambda_j I) v_j = 0 \tag{33.42}$$

which implies equation 33.40 because the factors of $p(T)$ commute. For $j = 1$, equation 33.42 holds because $M(T, B)$ is upper triangular, so $T v_1 = \lambda_1 v_1$.
For the inductive step, suppose that $1 < j \le n$ and that equation 33.42 holds for $1, 2, \ldots, j - 1$.

Let $M_j = M(T|_{U_j})$ where $U_j = \operatorname{span}(v_1, \ldots, v_j)$. Then the bottom-right diagonal element of $M_j - \lambda_j I$ is zero, hence

$$(T|_{U_j} - \lambda_j I) v_j \in \operatorname{span}(v_1, \ldots, v_{j-1}) \tag{33.44}$$

i.e.,

$$(T - \lambda_j I) v_j = \sum_{k=1}^{j-1} a_k v_k \tag{33.45}$$

for some scalars $a_1, \ldots, a_{j-1}$. Hence

$$\prod_{k=1}^{j} (T - \lambda_k I) v_j = \prod_{k=1}^{j-1} (T - \lambda_k I) \sum_{\ell=1}^{j-1} a_\ell v_\ell = 0 \tag{33.46}$$

where the last equality follows from the inductive hypothesis: the factors $T - \lambda_k I$ commute, and $\prod_{k=1}^{\ell} (T - \lambda_k I) v_\ell = 0$ for each $\ell < j$. This completes the induction, and the theorem follows.
Example 33.1 Verify the Cayley-Hamilton theorem for

$$M = \begin{pmatrix} -1 & 3 \\ 2 & -4 \end{pmatrix} \tag{33.48}$$

The characteristic polynomial is

$$p(x) = \det \begin{pmatrix} -1 - x & 3 \\ 2 & -4 - x \end{pmatrix} = (1 + x)(4 + x) - 6 \tag{33.49}$$
$$= x^2 + 5x - 2 \tag{33.50}$$

But

$$p(M) = \begin{pmatrix} -1 & 3 \\ 2 & -4 \end{pmatrix}^2 + 5 \begin{pmatrix} -1 & 3 \\ 2 & -4 \end{pmatrix} - 2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \tag{33.52}$$
$$= \begin{pmatrix} 7 & -15 \\ -10 & 22 \end{pmatrix} + \begin{pmatrix} -5 & 15 \\ 10 & -20 \end{pmatrix} - \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \tag{33.53}$$
$$= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \tag{33.54}$$

as the theorem requires.
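The same verification in sympy (my own check; note that sympy's charpoly convention $\det(xI - M)$ happens to give the identical polynomial here, since the matrix is $2 \times 2$):

    import sympy as sp

    x = sp.symbols('x')
    M = sp.Matrix([[-1, 3],
                   [ 2, -4]])

    print(M.charpoly(x).as_expr())     # x**2 + 5*x - 2

    # Cayley-Hamilton: p(M) = M^2 + 5M - 2I = 0.
    print(M**2 + 5*M - 2*sp.eye(2))    # Matrix([[0, 0], [0, 0]])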
Topic 34

Theorem 34.1 Let $T \in L(V)$ and let $p$ be any polynomial. Then $\operatorname{null}(p(T))$ is invariant under $T$.

Proof. Let $v \in \operatorname{null}(p(T))$, so that

$$p(T) v = 0 \tag{34.1}$$

Since $T$ commutes with $p(T)$,

$$p(T)(T v) = T(p(T) v) = T0 = 0 \tag{34.2}$$

Therefore

$$T v \in \operatorname{null}(p(T)) \tag{34.3}$$

hence $\operatorname{null}(p(T))$ is invariant under $T$.
Theorem 34.2 Let $V$ be a complex vector space, and let $T$ be an operator on $V$ with distinct eigenvalues $\lambda_1, \ldots, \lambda_m$ and corresponding eigenspaces (subspaces spanned by the generalized eigenvectors) $U_1, \ldots, U_m$. Then:

(1) Each $U_j$ is invariant under $T$.

(2) Each $(T - \lambda_j I)|_{U_j}$ is nilpotent.

(3) $V = U_1 \oplus \cdots \oplus U_m$.

Proof of (1). By theorem 32.6,

$$U_j = \operatorname{null}((T - \lambda_j I)^{\dim V}) \tag{34.4}$$

Let

$$p(z) = (z - \lambda_j)^{\dim V} \tag{34.5}$$

Then $U_j = \operatorname{null}(p(T))$, which is invariant under $T$ by theorem 34.1.

Proof of (2). If $v \in U_j$, then by equation 34.4,

$$((T - \lambda_j I)|_{U_j})^{\dim V} v = (T - \lambda_j I)^{\dim V} v = 0 \tag{34.6}$$

Hence $(T - \lambda_j I)|_{U_j}$ is nilpotent.

Proof of (3). The multiplicity of each $\lambda_j$ is $\dim(U_j)$ (definition 33.1) and the sum of the multiplicities is $\dim(V)$ (theorem 33.3), so

$$\dim V = \dim U_1 + \cdots + \dim U_m \tag{34.7}$$

Define

$$U = U_1 + \cdots + U_m \tag{34.8}$$

By (1) each of the $U_i$ is invariant under $T$; hence $U$ is invariant under $T$. Define $S = T|_U$. Every generalized eigenvector of $T$ is a generalized eigenvector of $S$ with the same eigenvalue, and the eigenvalues have the same multiplicities. Hence

$$\dim U = \dim U_1 + \cdots + \dim U_m = \dim V \tag{34.9}$$

so that $U = V$; by equation 34.7 the sum is direct, and $V = U_1 \oplus \cdots \oplus U_m$.
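A quick check of part (3) for the matrix $M$ of the earlier example (my computation, assuming sympy): the generalized eigenspaces for $\lambda = 1$ and $\lambda = 3$ have dimensions $2$ and $1$, which sum to $\dim V = 3$, and together their basis vectors span the whole space.

    import sympy as sp

    M = sp.Matrix([[1, 1, 1],
                   [0, 2, 1],
                   [1, 0, 2]])

    U1 = ((M - sp.eye(3))**3).nullspace()      # generalized eigenvectors, lambda = 1
    U2 = ((M - 3*sp.eye(3))**3).nullspace()    # generalized eigenvectors, lambda = 3
    print(len(U1), len(U2))                    # 2 1

    B = sp.Matrix.hstack(*(U1 + U2))           # combined vectors as columns
    print(B.rank())                            # 3, so V = U1 (+) U2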
Theorem 34.4 Let $N \in L(V)$ be nilpotent. Then there is a basis $B$ of $V$ with respect to which

$$M(N, B) = \begin{pmatrix} 0 & \cdots & * \\ & \ddots & \vdots \\ 0 & & 0 \end{pmatrix} \tag{34.11}$$

i.e., upper triangular with zeros everywhere on the diagonal.

Proof. Choose a basis of $\operatorname{null}(N)$, extend it to a basis of $\operatorname{null}(N^2)$, then to a basis of $\operatorname{null}(N^3)$, and so on; since $N^{\dim V} = 0$ (corollary 32.7), this produces a basis $B = (v_1, \ldots, v_n)$ of $V$. If $v_j$ was added at the $\operatorname{null}(N^k)$ stage, then $N v_j \in \operatorname{null}(N^{k-1})$, which is spanned by earlier basis vectors. Hence $N v_j$ is a linear combination of the basis vectors to the left of it, and the corresponding diagonal element of the $j$th column must be zero. Repeat the argument for each succeeding column.
Theorem 34.5 Let $V$ be a complex vector space and $T \in L(V)$. Let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$. Then there is a basis $B$ of $V$ such that

$$M(T, B) = \operatorname{diagonal}(A_1, \ldots, A_m) \tag{34.12}$$

where each $A_j$ is upper triangular with $\lambda_j$ on the diagonal.

Proof. For each $\lambda_j$, let $U_j$ be the subspace spanned by the corresponding generalized eigenvectors. By theorem 34.2, $(T - \lambda_j I)|_{U_j}$ is nilpotent. For each $j$ we can choose a basis $B_j$ of $U_j$ such that (by theorem 34.4)

$$M(T|_{U_j} - \lambda_j I, B_j) = \begin{pmatrix} 0 & \cdots & * \\ & \ddots & \vdots \\ 0 & & 0 \end{pmatrix} \tag{34.13}$$

Adding $\lambda_j I$ back,

$$A_j = M(T|_{U_j}, B_j) = \begin{pmatrix} \lambda_j & \cdots & * \\ & \ddots & \vdots \\ 0 & & \lambda_j \end{pmatrix} \tag{34.14}$$
By theorem 34.2 (3), $V = U_1 \oplus \cdots \oplus U_m$, so $B = (B_1, \ldots, B_m)$ is a basis of $V$ and

$$M(T, B) = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix} \tag{34.15}$$

as claimed.

Definition 34.6 A basis $B$ of $V$ is a Jordan basis for $T \in L(V)$ if

$$M(T, B) = \begin{pmatrix} A_1 & & 0 \\ & \ddots & \\ 0 & & A_m \end{pmatrix} \tag{34.16}$$

where each $A_j$ is a Jordan block: a single eigenvalue $\lambda_j$ repeated along the diagonal, ones on the superdiagonal, and zeros elsewhere,

$$A_j = \begin{pmatrix} \lambda_j & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_j \end{pmatrix} \tag{34.17}$$
Theorem 34.7 Let $V$ be a complex vector space. If $T \in L(V)$ then there is a basis of $V$ that is a Jordan basis for $T$.

Lemma 34.8 Let $V$ be a vector space and let $N \in L(V)$ be nilpotent. Then there exist vectors $v_1, \ldots, v_k \in V$ and non-negative integers $m(v_1), \ldots, m(v_k)$ such that:

1. The following is a basis of $V$:

$$(v_1, N v_1, \ldots, N^{m(v_1)} v_1, \ldots, v_k, N v_k, \ldots, N^{m(v_k)} v_k) \tag{34.18}$$

2. The following is a basis of $\operatorname{null}(N)$:

$$(N^{m(v_1)} v_1, \ldots, N^{m(v_k)} v_k) \tag{34.19}$$
Proof (of theorem 34.7). Suppose first that $T = N$ is nilpotent. Choose $v_1, \ldots, v_k$ as in lemma 34.8 and for each $j$ define the list

$$B_j = (v_j, N v_j, \ldots, N^{m(v_j)} v_j) \tag{34.20}$$

$N$ maps the last vector in each list $B_j$ to zero and each of the other vectors to the vector to its right. By reversing each $B_j$ defined in equation 34.20 and forming the list

$$B' = (\operatorname{reverse}(B_1), \ldots, \operatorname{reverse}(B_k)) \tag{34.21}$$

we obtain a basis of $V$ in which $N$ maps the first vector of each $\operatorname{reverse}(B_j)$ to zero and each subsequent vector in the list to the vector to its left. Hence $M(N, B')$ is block diagonal with each block of the form

$$\begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix} \tag{34.23}$$

so $B'$ is a Jordan basis for $N$.

For a general $T \in L(V)$, let $\lambda_1, \ldots, \lambda_m$ be the distinct eigenvalues of $T$, with corresponding subspaces of generalized eigenvectors $U_1, \ldots, U_m$, so that $V = U_1 \oplus \cdots \oplus U_m$ (theorem 34.2), where each

$$S_j = (T - \lambda_j I)|_{U_j} \tag{34.25}$$

is nilpotent, by theorem 34.2 (2). Now apply the argument in the previous paragraph: for each nilpotent operator $S_j$ there is a basis $B_j$ of $U_j$ such that $M(S_j, B_j)$ has the form of equation 34.23. By equation 34.25, each diagonal block of $M(T, (B_1, \ldots, B_m))$ then has the desired form, obtained by adding $\lambda_j I$ to the form in equation 34.23. This gives a Jordan block, proving the theorem.
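Here is a minimal sketch of the nilpotent case, with my own choice of $N$ and $v$: reversing the single chain $(v, Nv, N^2 v)$ produces the ones on the superdiagonal, as in equation 34.23.

    import sympy as sp

    # Nilpotent N mapping e1 -> e2 -> e3 -> 0, so (v, Nv, N^2 v) is a chain.
    N = sp.Matrix([[0, 0, 0],
                   [1, 0, 0],
                   [0, 1, 0]])
    v = sp.Matrix([1, 0, 0])

    # Reversed chain (N^2 v, N v, v) as columns of a change-of-basis matrix.
    B = sp.Matrix.hstack(N**2 * v, N * v, v)
    print(B.inv() * N * B)   # Matrix([[0, 1, 0], [0, 0, 1], [0, 0, 0]])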
Example 34.1 Find the Jordan form of

$$M = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 2 & 1 \\ 1 & 0 & 2 \end{pmatrix} \tag{34.26}$$

This is the matrix studied in topic 32: its eigenvalues are $3$ (multiplicity $1$) and $1$ (multiplicity $2$), and the eigenvalue $1$ has only a one-dimensional space of ordinary eigenvectors, so it contributes a single $2 \times 2$ Jordan block. Hence

$$J = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{pmatrix} \tag{34.27}$$
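The same answer can be obtained from sympy's built-in Jordan form (my check; the block ordering may differ from equation 34.27):

    import sympy as sp

    M = sp.Matrix([[1, 1, 1],
                   [0, 2, 1],
                   [1, 0, 2]])
    P, J = M.jordan_form()
    print(J)   # a 2x2 block for eigenvalue 1 and a 1x1 block for eigenvalue 3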