
Linear Algebra

JVN, Premaster Summer 2012


Time: 8:00am - 11:00am Tuesday, Thursday
Location: Room 2, JVN Building
Contents

Part 1. Preliminaries
1. Sets
1.1. Basic Objects
1.2. Set Operations
1.3. Maps between Sets
2. Topological Spaces
3. Groups
4. Rings and Fields

Part 2. Linear Algebra
5. Vector Spaces
5.1. Basic Objects
5.2. Inner Product
5.3. Norm
5.4. Angle between Vectors
5.5. Exercises
6. Linear Maps and Matrices
6.1. Morphisms between Vector Spaces
6.2. Kernel, Nullity, Image, Rank
6.3. Matrix Operations
6.4. Invertible Matrices and their Inverses
6.5. Exercises
Part 1. Preliminaries
1. Sets
1.1. Basic Objects. We begin with the most fundamental object in mathematics - a set.

Definition 1.1. A set X is a collection of distinct, definite objects. Each object is called an element of X and written x ∈ X. Each subcollection U in X is called a subset of X and denoted U ⊆ X. Each subset usually consists of the elements having some property P. The empty set is denoted ∅. A finite set is denoted X = {x_1, x_2, ..., x_n}. A countably infinite set is denoted X = {x_1, x_2, x_3, ...}.
Example 1.2. The set of all positive natural numbers is N⁺ = {1, 2, 3, ...}, the set of all natural numbers is N = {0, 1, 2, ...}, the set of all integers is Z = {..., -2, -1, 0, 1, 2, ...}, and the set of all real numbers is R. Surely N⁺ ⊆ N ⊆ Z ⊆ R. The properties here are being a positive natural number, being a natural number, and being an integer.

Example 1.3. The set of all possible outcomes when we roll a die is X = {1, 2, 3, 4, 5, 6}.

Example 1.4. The set of all possible outcomes when we toss a coin is X = {H, T}.

Example 1.5. The set of all subsets of X is called its power set and denoted P(X).
1.2. Set Operations. Below are basic set operations, all of which can be seen and verified by Venn diagrams.

complement: if U ⊆ X then U^c = {x ∈ X | x ∉ U}.
union: A ∪ B = {x | x ∈ A or x ∈ B}.
intersection: A ∩ B = {x | x ∈ A and x ∈ B}.
disjoint: if A ∩ B = ∅ then we say A and B are disjoint.
partition: if ∪_i A_i = X and the A_i are pairwise disjoint then we say they partition X.
De Morgan's laws: (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c.
relative complement of B in A: A \ B = {all x in A but not in B}.
symmetric difference: A △ B = (A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A).
1.3. Maps between Sets. It is really important to consider the relationships, or maps, between sets besides looking within each set.

Definition 1.6. A map f: X → Y between X and Y is a law that assigns to each x ∈ X a unique element f(x) ∈ Y.

Example 1.7. For every nonempty set X there exists a map f: X → X, x ↦ x, called the identity map and often denoted id_X.

Example 1.8. For every pair of nonempty sets X, Y there exists a map f: X → Y, x ↦ y_0 for some y_0 ∈ Y, called a constant map.

Definition 1.9. Given f: X → Y and g: Y → Z we define their composition to be gf: X → Z, x ↦ g(f(x)).
Below is the picture of composition,

x ∈ X --f--> f(x) ∈ Y --g--> g(f(x)) ∈ Z,    while gf sends x directly to g(f(x)).
Definition 1.10. A map f: X → Y is called injective (or one-to-one) if f(x) ≠ f(x') whenever x ≠ x' ∈ X. It is called surjective (or onto) if f(X) = {f(x), all x ∈ X} = Y. It is called bijective (or one-to-one and onto) if it is both injective and surjective.

Every bijection f: X → Y has an inverse g: Y → X, y ↦ x, where x is the unique element in X that maps to y. We denote this map as f⁻¹. Surely gf = id_X and fg = id_Y; together they solidify the impression that X and Y are the same.
Example 1.11. If m ≤ n then there exists an injection f: {x_1, ..., x_m} → {y_1, ..., y_n}. In general, an injection f: X → Y allows us to view X as a subset of Y. Conversely, each subset U ⊆ X gives an injection f: U → X.

Example 1.12. If m < n then there does not exist any surjection f: {x_1, ..., x_m} → {y_1, ..., y_n}. The same does not hold for infinite sets, as we can have a surjection 5Z → Z, 5z ↦ z.
Already maps allow us to rigorously compare the cardinalities of sets. We say that |X| ≤ |Y| if there exists an injection f: X → Y, that |X| ≥ |Y| if there exists a surjection f: X → Y, and that |X| = |Y| if there exists a bijection f: X → Y. In the last case, we can define an inverse g: Y → X, y = f(x) ↦ x, such that gf = id_X and fg = id_Y. This inverse is unique and denoted f⁻¹.

Example 1.13. 5Z, Z^n for all n ≥ 1, and Q all have the same cardinality.

Exercise 1.14. Show that |Z| < |R| with strict inequality.
2. Topological Spaces

One of the first structures we can give a set X is to pick out some subsets of X. If our choice satisfies certain properties, it can be used to formalize the notions of convergence, continuity, connectedness, etc.

Definition 2.1. A topological space is a set X together with a collection T of subsets of X that satisfies,
(1) ∅, X ∈ T.
(2) (closure under union) ∪_i U_i ∈ T if U_i ∈ T.
(3) (closure under finite intersection) ∩_{i=1}^n U_i ∈ T if U_i ∈ T.
The subsets in T are called open sets and T is called a topology for X. Together they are called a topological space and denoted (X, T), though we often drop the T. A subset of X may be neither closed nor open, either closed or open, or both.

Example 2.2. Every set X has the trivial topology T = {∅, X} and the discrete topology P(X).
Example 2.3. The finite set X = {x_1, ..., x_6} has, among others, the two topologies T = {∅, {x_1, x_2}, X} and T' = {∅, {x_1, x_2}, {x_1, x_2, x_3, x_4}, X}. They are finer than the trivial topology and coarser than the discrete topology.

Example 2.4. If X is an infinite set then the collection T = {∅, X, all finite subsets of X} does not form a topology. One can find a countable union of finite subsets that is not in T.

Example 2.5. The real line R together with the topology generated by all open intervals (a, b), a, b ∈ R, is a topological space.
Among topological spaces we only consider maps that respect their topologies.

Definition 2.6. A map f: X → Y between two topological spaces is called continuous if f⁻¹(V) is open for every open set V ⊆ Y.

Exercise 2.7. Show that the composition of two continuous maps is continuous.

We are more familiar with the (ε, δ) definition of continuity, but it only applies to spaces with metrics. This definition of continuity by open sets is more general and agrees with the usual one over metric spaces. As with maps between sets, a continuous map f: X → Y is called injective if f(x) ≠ f(x') whenever x ≠ x', and f is called surjective if f(X) = Y. When f is bijective we can define a map Y → X, y ↦ x, where x is the unique element that f maps to y. This map is not necessarily continuous. When it is, we denote it as f⁻¹, call f a homeomorphism and write X ≅ Y.
Example 2.8. Every map f: (X, P(X)) → (Y, T_Y) is continuous.

Example 2.9. Every constant map f: (X, T_X) → (Y, T_Y) is continuous.

Exercise 2.10. Show that the function f: (R, B(R)) → (R, B(R)) with f(0) = 0 and f(x) = sin(1/x²) for x ≠ 0 is continuous everywhere but 0.
Example 2.11. Define f: (−1, 1) → (−∞, ∞), x ↦ x/((x + 1)(x − 1)); then f is a homeomorphism between (−1, 1) and R. In topology, these spaces are the same. What is the inverse of f?

Whenever we have objects with some structure, we also consider subobjects with the same structure.

Definition 2.12. A subset X' ⊆ (X, T) together with the induced topology T' = {X' ∩ U, U ∈ T} is called a subspace of (X, T). Each open set U' ∈ T' comes from an open set U ∈ T. More generally, we can view a subspace (X', T') ⊆ (X, T) as an inclusion i: X' → X such that T' is the smallest topology on X' making i continuous.
Example 2.13. (Z, P(Z)) is a subspace of (R, B(R)).

Exercise 2.14. Consider Q ⊆ (R, B(R)) with the induced topology. Show that,
(a) {0} is not open in Q, so this induced topology is not the discrete topology P(Q).
(b) If a, b are rational then the interval {a < q < b, q ∈ Q} is open in Q.
(c) If a, b are rational then the interval {a ≤ q ≤ b, q ∈ Q} is closed in Q.
(d) If a, b are irrational then the interval {a < q < b, q ∈ Q} is both open and closed in Q.
3. Groups

While a topology gathers elements into open and closed subsets, an operation connects elements in a different way.

Definition 3.1. A group is a set G together with an operation ∗ that satisfies the following,
(1) (closure) g ∗ h ∈ G for any g, h ∈ G.
(2) (associativity) (g ∗ h) ∗ k = g ∗ (h ∗ k) for all g, h, k ∈ G.
(3) (identity element) there exists an element e ∈ G such that e ∗ g = g ∗ e = g for all g ∈ G.
(4) (inverse element) for each g ∈ G there exists an element g⁻¹ ∈ G such that g ∗ g⁻¹ = g⁻¹ ∗ g = e.

From these four group axioms one can deduce the uniqueness of both e and g⁻¹. If H is a subset of (G, ∗) such that (H, ∗) is itself a group then we call H a subgroup of G.
Example 3.2. We have the trivial group S = {e}.

Example 3.3. All of Z, nZ, Z/nZ, Q, R, C are groups under the usual addition. Surely Q*, R*, C* are groups under multiplication. Some groups here are finite while others are infinite.

Taken for granted is the fact that g + h = h + g for all h, g in any of the above groups. This is not guaranteed in general. Nor is it guaranteed that a subset H ⊆ (G, ∗) is a group under ∗, because H may not be closed under the operation or under taking inverses.

Definition 3.4. A group (G, ∗) is called commutative if g ∗ h = h ∗ g for all g, h ∈ G.

Definition 3.5. A subset H ⊆ (G, ∗) is called a subgroup if it is a group under ∗.
Example 3.6. Consider S_n = {all bijections from {1, ..., n} to {1, ..., n}} under composition. For each n, S_n is a group with cardinality n!. For n ≥ 3, S_n is noncommutative with many commutative and noncommutative subgroups.

Example 3.7. The set C(X, R) of all continuous functions from X to R is a subgroup of the set F(X, R) of all functions from X to R under addition. What do we need to modify under multiplication?
Exercise 3.8. Determine which of the following operations are associative:
1. ∗ on Z defined by a ∗ b = a − b.
2. ∗ on R defined by a ∗ b = a + b + ab.
3. ∗ on Z × Z = {(z_1, z_2), z_i ∈ Z} defined by (z_1, z_2) ∗ (z'_1, z'_2) = (z_1 + z'_1, z_2 + z'_2).

Exercise 3.9. Determine which of the following sets are groups, and which groups are commutative:
(1) the set of all rational numbers with odd denominators under addition.
(2) the set of all rational numbers with even denominators under addition.
(3) the set of all rational numbers with denominators 1, 2, or 3 under addition.
(4) the set of all nonzero rational numbers under multiplication.
(5) the set of all integers under multiplication.
(6) the set M(2, R) of all 2 × 2 matrices over R under addition.
(7) the set M(2, R) of all 2 × 2 matrices over R under multiplication.
(8) the set of all nth roots of unity R = {z ∈ C, z^n = 1, n > 0} under multiplication.
(9) the set of all nth roots of unity R = {z ∈ C, z^n = 1, n > 0} under addition.
Again we consider relationships between groups, i.e. maps between groups that respect their group structures.

Definition 3.10. A map f: (G, ∗) → (H, ·) such that f(g ∗ g') = f(g) · f(g') is called a group morphism.

Exercise 3.11. Show that the composition of two group morphisms is a group morphism.

Example 3.12. (nZ, +) → (Z, +) → (Q, +) → (R, +) → (C, +).

Example 3.13. (Z, +) → (Z/nZ, +), n ↦ n̄ (the residue class of n).

Example 3.14. (Z/nZ, +) → (C*, ·), k ↦ e^{i2πk/n}.

Example 3.15. (R, +) → (R*, ·), x ↦ e^x.
As with maps between sets, a group morphism f: G → H is called injective if f(g) ≠ f(g') whenever g ≠ g', which holds iff f⁻¹(1_H) = {1_G}, and f is called surjective if f(G) = H. We call f⁻¹(1_H) and f(G) the kernel and image of f, and denote them as ker(f) and im(f) respectively. They are subgroups of G and of H respectively. When f is bijective we can define a map H → G, h ↦ g, where g is the unique element that f maps to h. This map is a group morphism. We denote it as f⁻¹, call f a group isomorphism and write G ≅ H.
Exercise 3.16. Show that if a group morphism f: G → H is bijective then f⁻¹ is indeed a group morphism. Hint: show that f⁻¹(hh') = f⁻¹(h)f⁻¹(h') for all h, h' ∈ H.

Exercise 3.17. Given a group morphism f: G → H, show that ker(f) is a subgroup of G and im(f) is a subgroup of H.

Exercise 3.18. Determine which of the following maps are group morphisms, and which group morphisms are injective, surjective, bijective.
(a) (5Z, +) → (Z, +), 5n ↦ 5n.
(b) (nZ, +) → (Z, +), 5n ↦ n.
(c) (Z, +) → (M(2, Z), +), λ ↦ [λ 0; 0 λ].
(d) (M(2, Z), +) → (Z, +), [a b; c d] ↦ a + d.
4. Rings and Fields

We observe that besides addition, Z also has multiplication. This observation leads us to a general definition.

Definition 4.1. A ring (R, +, ·) is a set R with two binary operations + and · called addition and multiplication satisfying,
(1) (additive group) (R, +) is a commutative group.
(2) (associativity) (a · b) · c = a · (b · c) for all a, b, c ∈ R.
(3) (distributivity) (a + b) · c = a · c + b · c and a · (b + c) = a · b + a · c.
(4) (multiplicative identity) there exists an element 1 ∈ R such that 1 · a = a · 1 = a for all a ∈ R.

We often suppress +, · and simply write R unless they are needed to clarify matters. When multiplication is commutative, we call R a commutative ring. While others may not require a ring to have 1, we always do, and the requirement that (R, +) be commutative is actually redundant: expanding (1 + 1) · (a + b) one way gives a + b + a + b and the other way gives a + a + b + b, which implies a + b = b + a for all a, b ∈ R.
Example 4.2. The set {a, b} of two elements with addition a + a = a, b + b = a, a + b = b, b + a = b and multiplication a · a = a, a · b = a, b · a = a, b · b = b is a ring.

Example 4.3. The sets Z/nZ, Z, Q, R, C with the usual addition and multiplication are all commutative rings.

Exercise 4.4. Determine which of the following are rings, and which are commutative rings,
(1) nZ, n > 1.
(2) The set of all 2 × 2 matrices (M(2, R), +, ·).
(3) The set C(R, [0, 1]) of all continuous functions from R to [0, 1] under the usual addition and multiplication.
(4) The set C(R, R) of all continuous functions from R to R under the usual addition and multiplication.
(5) The set C(R, R) of all continuous functions from R to R under the usual addition and composition.

So we have seen some noncommutative rings and some noninvertible ring elements. Better than a general ring is one in which division is possible. Best is a ring in which division is possible and multiplication is commutative, something like Q and R.
Definition 4.5. (R, +, ·) is called a division ring if each nonzero element a ∈ R has a multiplicative inverse b ∈ R such that a · b = b · a = 1. We denote such a b as a⁻¹.

Definition 4.6. A field is a commutative division ring.

Surely all fields are division rings, and it can be shown that all finite division rings are fields. It is not easy to give an example of a noncommutative division ring; the most popular one is the Hamilton quaternion algebra.

Example 4.7. Z is not a field while Q, R, C all are.

Exercise 4.8. Determine which of the following are fields:
(a) (Z/pZ, +, ·) for p prime.
(b) (Z/pqZ, +, ·) for p, q prime.
(c) The set of all [a 0; 0 d] ∈ M(2, Z) with 0 ≠ a, d ∈ Z, together with [0 0; 0 0], under the usual matrix addition and multiplication.
(d) The set of all [a 0; 0 d] ∈ M(2, R) with 0 ≠ a, d ∈ R, together with [0 0; 0 0], under the usual matrix addition and multiplication.
(e) The set of all [a 0; 0 d] ∈ M(2, R), a, d ∈ R, under the usual matrix addition and multiplication.
Again we consider subsets of a ring R with the same ring structure.

Definition 4.9. A subset S ⊆ (R, +, ·) is a subring of R if (S, +, ·) is also a ring. A subring of a field F is called a subfield.

Example 4.10. nZ ⊆ Z ⊆ Q ⊆ R ⊆ C as subrings. Both Q and R are subfields of C while nZ and Z are not fields, hence not subfields.
Once among rings, we consider maps between them that respect their ring structures.

Definition 4.11. A ring morphism is a map φ: R → S such that φ(a + b) = φ(a) + φ(b) and φ(ab) = φ(a)φ(b).

Again the composition of two ring morphisms is a ring morphism. As with group morphisms, a ring morphism f: R → S is called injective if f(a) ≠ f(a') whenever a ≠ a', which holds iff f⁻¹(0_S) = {0_R}, and f is called surjective if f(R) = S. When f is bijective we call it a ring isomorphism and write R ≅ S. We call f⁻¹(0_S) and f(R) the kernel and image of f, and denote them as ker(f) and im(f) respectively.
Exercise 4.12. Given a ring morphism f: R → S, determine if ker(f) is a subring of R and if im(f) is a subring of S. Compare with exercise 3.17.

Exercise 4.13. Determine which of the following are ring morphisms,
(a) (Z, +, ·) → (M(2, Z), +, ·), λ ↦ [λ 0; 0 λ].
(b) (M(2, Z), +, ·) → (Z, +, ·), [a b; c d] ↦ a + d.
(c) (R, +, ·) → (M(2, R), +, ·), λ ↦ [λ 0; 0 λ].
(d) (M(2, R), +, ·) → (R, +, ·), [a b; c d] ↦ ad − bc.

Exercise 4.14. Show that the ring in example 4.2 is isomorphic to Z/2Z.
Part 2. Linear Algebra

5. Vector Spaces

5.1. Basic Objects. Engineers and physicists represent different objects in their fields as elements in one-dimensional space R, two-dimensional space R × R, or generally n-dimensional space R × ... × R. Such elements can be added, subtracted, or scaled by a real number. Together they form what are called Euclidean vector spaces R^n. We begin with an abstract generalization.

Definition 5.1. A vector space over a field F is a set V together with two binary operations + and · that satisfy the following axioms,
(1) (vector addition) u + v ∈ V for any u, v ∈ V.
(2) (associativity of addition) (u + v) + w = u + (v + w) for all u, v, w ∈ V.
(3) (commutativity of addition) u + v = v + u.
(4) (identity element under addition) there exists an element 0 ∈ V such that 0 + u = u + 0 = u for all u ∈ V.
(5) (inverse element under addition) for each u ∈ V there exists an element −u such that u + (−u) = −u + u = 0.
(6) (scalar multiplication) α · u ∈ V for any α ∈ F and u ∈ V.
(7) (distributivity of scalar multiplication with respect to vector addition) α · (u + v) = α · u + α · v.
(8) (distributivity of scalar multiplication with respect to field addition) (α + β) · u = α · u + β · u.
(9) (compatibility of scalar multiplication with field multiplication) (αβ) · u = α · (β · u).
(10) (identity element of scalar multiplication) 1 · u = u for any u ∈ V, where 1 is the multiplicative identity in F.

For those with some background in abstract algebra, the first five axioms mean V is a commutative group under + and the next five axioms mean V is an F-module. The elements in V are called vectors while the elements in F are called scalars. We give some examples.

Example 5.2. The singleton set V = {0} under the trivial + and · is a vector space over any field F. It is called the zero vector space over F.
Example 5.3. The plane R² = {all 2-tuples (x, y) with x, y ∈ R} over the field R under the usual operations + and · is a vector space. More generally, the space R^n = {all n-tuples (x_1, ..., x_n) with x_i ∈ R} over the field R under the usual operations + and · is a vector space. These are called Euclidean spaces and will be our main focus in this course.

Exercise 5.4. Determine if the set of all finite sums S = {a_1 s + a_2 t + a_3 u + a_4 v + a_5 w, a_i ∈ C, with s, t, u, v, w indeterminates} under the usual operations + and · is a vector space over the field C.
Example 5.5. We consider different classes of functions from R^n to R.
(1) A function f: R^n → R, (x_1, ..., x_n) ↦ f(x_1, ..., x_n) is called linear if f(αu + βv) = αf(u) + βf(v) for any α, β ∈ R, u, v ∈ R^n. If we define addition f + g by (f + g)(x) = f(x) + g(x) and scalar multiplication by (α · f)(x) = αf(x), then the set L = {all linear functions f: R^n → R} over the field R under + and · is a vector space.
(2) More generally, a function g: R^n → R, (x_1, ..., x_n) ↦ g(x_1, ..., x_n) is called affine if g(x) = f(x) + β for some linear function f and some β ∈ R. One can verify that this condition is equivalent to the condition g(αu + βv) = αg(u) + βg(v) for any α + β = 1 in R and u, v ∈ R^n. The set M = {all affine functions g: R^n → R} over the field R under the usual operations + and · is a vector space.
(3) Most generally, the set N = {all functions h: R^n → R} over the field R under the same operations is a vector space.
Next we consider subobjects in the category of vector spaces over F.

Definition 5.6. A subset U ⊆ (V, +, ·) of a vector space V over F is called a subspace if U under + and · is also a vector space over F.

Example 5.7. In example 5.5, L ⊆ M ⊆ N as subspaces over R. We will learn more about them later.

Example 5.8. We can view R² as the subspace U = {(x, y, 0), x, y ∈ R} ⊆ R³. Why isn't U' = {(x, y, 1), x, y ∈ R} ⊆ R³ a subspace?
Exercise 5.9. Verify that the set R[x] = {p(x) = a_n x^n + ... + a_1 x + a_0, a_i ∈ R} of all polynomials in one indeterminate x over R is a vector space over R. What are some of its nontrivial subspaces?

Together, addition and scalar multiplication allow us to form linear combinations of vectors in R^n.
Definition 5.10. Given v_1, ..., v_n ∈ V over F we define their linear combinations to be α_1 v_1 + ... + α_n v_n, for any α_1, ..., α_n ∈ F. The scalars α_i are called coefficients of the linear combination. If α_1 v_1 + ... + α_n v_n = 0 for some nonzero α_i then v_i can be written as a linear combination of v_1, ..., v_{i−1}, v_{i+1}, ..., v_n and we say v_1, ..., v_n are linearly dependent. Else we say they are linearly independent.

Definition 5.11. The set {α_1 v_1 + ... + α_n v_n, α_i ∈ F} of all linear combinations of v_1, ..., v_n is called their span and denoted Span(v_1, ..., v_n).
Example 5.12. It is easy to see that v_1, v_2 are linearly dependent iff v_1 = αv_2 for some scalar α. In that case Span(v_1, v_2) = Span(v_1) = Span(v_2), a line.

Example 5.13. It is harder to see that (1, 2, 3), (4, 5, 6), (2, 1, 0) are linearly dependent and that Span((1, 2, 3), (4, 5, 6), (2, 1, 0)) = Span((1, 2, 3), (4, 5, 6)), a plane. Generally, if v_1, v_2 in R³ are linearly independent then the combinations α_1 v_1 + α_2 v_2, over all α_i ∈ R, make up the plane containing v_1, v_2.

Exercise 5.14. Show that (1, 2, 0), (4, 0, 5), (6, 4, 3) are linearly independent and that they span the whole of R³.
Example 5.15. Given the system of one linear equation x − y + 2z = 5, its solutions form the plane {(5 + s − 2t, s, t), s, t ∈ R} in R³. This is equivalent to the map f: R² → R³, (s, t) ↦ (5 + s − 2t, s, t). We can also describe this plane as {(5, 0, 0) + s(1, 1, 0) + t(−2, 0, 1), s, t ∈ R}, essentially spanned by (1, 1, 0) and (−2, 0, 1) up to translation.

Example 5.16. Any pair of indeterminates in exercise 5.4 are linearly independent.
Definition 5.17. A subset B = {v_i}_{i∈I}, I an indexing set and v_i ∈ V, is called a basis for V if B is linearly independent and Span(B) = V.

By definition, every v ∈ V can be written as a linear combination v = Σ_{i∈I} α_i v_i of members of a basis B = {v_1, ..., v_n}, and such a representation is unique by linear independence of B. Sometimes we write v = (α_i) in its coordinate form if the v_i are ordered. One can imagine that v has different representations and different coordinate forms in different bases.
Example 5.18. If we choose B = {(1, 0), (0, 1)} as a basis for R² then v = (3, 4) can be written as 3(1, 0) + 4(0, 1) with coordinate form (3, 4). If B' = {(0, 1), (1, 0)} then still v = 3(1, 0) + 4(0, 1) = 4(0, 1) + 3(1, 0) but its coordinate form is now (4, 3). If B'' = {(1, 0), (0, 2)} is chosen then v = 3(1, 0) + 2(0, 2) = (3, 2) in its coordinate form.

Exercise 5.19. Find the representation and coordinate form of (3, 4) in the basis B = {(1/√2, 1/√2), (−1/√2, 1/√2)}.
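Finding a coordinate form amounts to solving a small linear system. Below is a minimal NumPy sketch for the rotated basis of exercise 5.19 as reconstructed above (the exact sign placement in that basis is an assumption); the same pattern works for any basis of R^n.

```python
import numpy as np

# Columns are the basis vectors b_1, b_2 (assumed basis from exercise 5.19).
B = np.column_stack([[1/np.sqrt(2), 1/np.sqrt(2)],
                     [-1/np.sqrt(2), 1/np.sqrt(2)]])
v = np.array([3.0, 4.0])

# The coordinates alpha satisfy B @ alpha = v.
alpha = np.linalg.solve(B, v)
print(alpha)        # coordinate form of v in the basis B
print(B @ alpha)    # reconstructs v = alpha_1 b_1 + alpha_2 b_2
```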
The previous examples show that no basis is more special than the rest, only that some bases are nicer for computation than others. Moreover, the order within each basis affects the coordinate form. What is true is that every vector space has a basis B = {v_i, i ∈ I} and all bases of V have the same size, though that size may be infinite. Now we can associate a first invariant to each vector space V over a field F, generalizing the notion of dimension that we often speak of for R^n.

Definition 5.20. We define the dimension dim(V) of V over F as the size of any basis for V over F.
Example 5.21. The unit vectors e_1 = (1, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_n = (0, ..., 0, 1) together form a basis for R^n since they are clearly linearly independent and any v = (α_1, ..., α_n) can be written as α_1(1, 0, ..., 0) + ... + α_n(0, ..., 0, 1). The dimension of R^n is n, as conventionally known.

Example 5.22. The vectors s, t, u, v, w together form a basis for the vector space S over C in exercise 5.4. Its dimension is 5. Viewed as a vector space over R, however, S has dimension 10 since one of its bases is {s, is, t, it, u, iu, v, iv, w, iw}.

Exercise 5.23. Show that the vector space N in example 5.5 has infinite dimension. In fact, any of its bases must be uncountable.
5.2. Inner Product. This section focuses on vector spaces over R. If they are to enjoy some sort of multiplication ⟨·,·⟩: V × V → R, we must expect the following.

Definition 5.24. An inner product on a vector space V over R is any function ⟨·,·⟩: V × V → R that satisfies the following axioms,
(1) (symmetry) ⟨u, v⟩ = ⟨v, u⟩.
(2) (linearity) ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩.
(3) (positive definiteness) ⟨u, u⟩ ≥ 0 for all u ∈ V, with equality iff u = 0.

A vector space V over R equipped with an inner product is called an inner product space. Note that the product of two vectors is a scalar in R. We can turn R^n into an inner product space as follows.
Definition 5.25. Given two vectors u = (u_1, ..., u_n), v = (v_1, ..., v_n) ∈ R^n we define their inner product to be ⟨u, v⟩ = Σ_{i=1}^n u_i v_i = u_1 v_1 + ... + u_n v_n.
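A minimal NumPy sketch of definition 5.25: np.dot computes exactly this sum, and the three axioms can be spot-checked on sample vectors (the vectors below are illustrative, not from the notes).

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 2.0])
w = np.array([0.5, 0.5, 0.5])
a, b = 2.0, -3.0

print(np.dot(u, v))                                  # <u, v> = 1*4 + 2*(-1) + 3*2 = 8
print(np.dot(u, v) == np.dot(v, u))                  # symmetry
print(np.isclose(np.dot(a*u + b*v, w),
                 a*np.dot(u, w) + b*np.dot(v, w)))   # linearity in the first argument
print(np.dot(u, u) >= 0)                             # positive semidefiniteness
```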
One can verify that this newly minted product satisfies the above three axioms. Later on we will also define an outer product on R^n. Meanwhile let us see some examples.

Example 5.26. The inner product of a vector u = (u_1, ..., u_n) ∈ R^n with the i-th unit vector e_i picks out its i-th coordinate u_i. This actually induces a linear function g_{e_i}: R^n → R, (u_1, ..., u_n) ↦ u_i.

Example 5.27. If u = (u_1, ..., u_n) ∈ R^n then ⟨u, u⟩ = u_1² + ... + u_n².

Example 5.28. If a_i, b_i ∈ {0, 1} and a = (a_1, ..., a_n), b = (b_1, ..., b_n) ∈ R^n then ⟨a, b⟩ is the number of indices i where a_i = b_i = 1.
The following is a nice statement about inner products and the linear functions seen in example 5.5.

Theorem 5.29. For any a = (a_1, ..., a_n) ∈ R^n the function f_a: R^n → R, u = (u_1, ..., u_n) ↦ ⟨a, u⟩ is linear. Conversely, any linear function f equals f_a for some a ∈ R^n.

Proof. That f_a = ⟨a, ·⟩ is linear follows from linearity of the inner product. Conversely, for any linear function f: R^n → R and any u = (u_1, ..., u_n) ∈ R^n we have f(u) = f(u_1 e_1 + ... + u_n e_n) = u_1 f(e_1) + ... + u_n f(e_n) = ⟨a, u⟩ = f_a(u), where a = (f(e_1), ..., f(e_n)).
What a linear function f: R^n → R does to the basis e_1, ..., e_n, or to any other basis, completely determines its whole behavior. We will return to this later. Here are two nice results.

Corollary 5.30. The space L of all linear functions from R^n to R has dimension n.

Proof. It follows from the theorem that any linear function f = f_a = ⟨a, ·⟩ = ⟨(f(e_1), ..., f(e_n)), ·⟩ = f(e_1) g_{e_1} + ... + f(e_n) g_{e_n}, where g_{e_i} was defined in example 5.26. Furthermore, these g_{e_1}, ..., g_{e_n} are linearly independent, hence they form a basis for L. Summarily, every basis B = {b_1, ..., b_n} for R^n corresponds to a basis B' = {g_{b_1}, ..., g_{b_n}} for L. Hence R^n and L share the same dimension n.
We now look at some examples.

Example 5.31. Taking the average of the coordinates of a vector x, f(x) = (x_1 + ... + x_n)/n, is linear. Surely f = f_a where a = (1/n, ..., 1/n).

Example 5.32. Taking the maximum of the coordinates of a vector x, f(x) = max{x_1, ..., x_n}, is not linear. To see this, pick n = 2, x = (1, −1), y = (−1, 1). Then f(x + y) ≠ f(x) + f(y). Hence it cannot be represented by any inner product.
Another nice result from theorem 5.29 is the following statement about the space M of all affine functions from R^n to R.

Corollary 5.33. The space M of all affine functions from R^n to R has dimension n + 1.

Proof. This follows from the definition of affine functions in example 5.5 and the previous corollary.

While M is much smaller than N, every continuously differentiable function f ∈ N has a good affine approximation in M. Recall that if f: R^n → R, x = (x_1, ..., x_n) ↦ f(x) is continuously differentiable then we can take the continuous partial derivatives ∂f(x)/∂x_i, i = 1, ..., n, and form its gradient ∇f(x) = (∂f(x)/∂x_1, ..., ∂f(x)/∂x_n). The first-order Taylor approximation of f near x is defined as f_a(x') = Σ_{i=1}^n (∂f(x)/∂x_i)(x'_i − x_i) + f(x) = ⟨∇f(x), x' − x⟩ + f(x). This function f_a is certainly affine and gives a good approximation of f(x') when x' is near x.

Example 5.34. When n = 1 this is none other than the usual Taylor approximation f_a(x') = f'(x)(x' − x) + f(x) we often see.
Example 5.35. Consider f: R² → R, (x_1, x_2) ↦ e^{x_1 + x_2 − 1} + e^{x_1 − x_2 − 1} + e^{−x_1 − 1}. Then ∇f(x) = (e^{x_1 + x_2 − 1} + e^{x_1 − x_2 − 1} − e^{−x_1 − 1}, e^{x_1 + x_2 − 1} − e^{x_1 − x_2 − 1}). At (0, 0), ∇f((0, 0)) = (1/e, 0). Hence the first-order Taylor approximation of f near (0, 0) is f_a(x) = ⟨∇f((0, 0)), x − (0, 0)⟩ + f((0, 0)) = x_1/e + 3/e.
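A minimal NumPy sketch of example 5.35: the gradient is written out by hand from the formulas above, and the Taylor approximation is compared with f near (0, 0).

```python
import numpy as np

def f(x):
    x1, x2 = x
    return np.exp(x1 + x2 - 1) + np.exp(x1 - x2 - 1) + np.exp(-x1 - 1)

def grad_f(x):
    x1, x2 = x
    return np.array([np.exp(x1 + x2 - 1) + np.exp(x1 - x2 - 1) - np.exp(-x1 - 1),
                     np.exp(x1 + x2 - 1) - np.exp(x1 - x2 - 1)])

x0 = np.array([0.0, 0.0])

def f_taylor(x):
    # first-order Taylor approximation of f near x0
    return f(x0) + grad_f(x0) @ (x - x0)

print(grad_f(x0))             # (1/e, 0), as computed above
x = np.array([0.1, -0.05])
print(f(x), f_taylor(x))      # the two values are close for x near (0, 0)
```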
5.3. Norm. Given a vector space V over R we want to make precise how large each vector v ∈ V is. Of course, there are different ways to do so, but they all must meet certain expectations.

Definition 5.36. A norm on V over R is a function ||·||: V → R that satisfies,
(1) (positive definiteness) ||v|| ≥ 0, with equality iff v = 0.
(2) (homogeneity) ||α · v|| = |α| ||v||.
(3) (triangle inequality) ||u + v|| ≤ ||u|| + ||v||.

Any vector space V over R with a norm is called a normed space.

Example 5.37. One obvious way to define a norm on R^n is ||(u_1, ..., u_n)|| = |u_1| + ... + |u_n|. This is called the 1-norm and denoted ||·||_1.
Example 5.38. Another norm we have on R^n is the usual Euclidean norm ||(u_1, ..., u_n)|| = √(u_1² + ... + u_n²) = √⟨u, u⟩. It is called the 2-norm and denoted ||·||_2. The Euclidean norm is related to the root mean square (RMS, instead of mean or mean square) value of a vector u, defined as RMS(u) = √((u_1² + ... + u_n²)/n) = ||u||/√n. This quantity roughly tells us the typical size of the coordinates u_i, taking n into account.
Example 5.39. More generally we define the p-norm on R^n as ||(u_1, ..., u_n)||_p = (|u_1|^p + ... + |u_n|^p)^{1/p} for any 1 ≤ p < ∞.

Example 5.40. We define the ∞-norm on R^n as ||(u_1, ..., u_n)||_∞ = max{|u_1|, ..., |u_n|}. It is denoted ||·||_∞.

Exercise 5.41. Draw all the vectors of norm 1 in R², where the norm is the 1-norm, the p-norm for 1 < p < 2, the 2-norm, the p-norm for 2 < p < ∞, and the ∞-norm.
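All of these norms are available through numpy.linalg.norm; a minimal sketch on an illustrative vector (not one from the notes):

```python
import numpy as np

u = np.array([3.0, -4.0, 0.0, 1.0])

print(np.linalg.norm(u, 1))                  # 1-norm: |3| + |-4| + |0| + |1| = 8
print(np.linalg.norm(u, 2))                  # 2-norm: sqrt(9 + 16 + 0 + 1)
print(np.linalg.norm(u, 3))                  # p-norm with p = 3
print(np.linalg.norm(u, np.inf))             # infinity-norm: max |u_i| = 4
print(np.linalg.norm(u) / np.sqrt(len(u)))   # RMS value from example 5.38
```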
Exercise 5.42. Given a measure space (X, T, μ), consider
L(X, μ) = {all measurable f: X → R such that ∫_X |f|^p dμ < ∞}.
Show that we can define a norm ||·||_p: L → R, f ↦ (∫_X |f|^p dμ)^{1/p} for L as follows.
(a) (positive semidefiniteness) ||f||_p ≥ 0, with equality iff f is 0 almost everywhere.
(b) (homogeneity) ||αf||_p = |α| ||f||_p.
(c) (triangle inequality) ||f + g||_p ≤ ||f||_p + ||g||_p.
(d) (positive definiteness) Let L' = L modulo those functions that are 0 almost everywhere; then ||·||_p is a norm on L'. Together with this norm, L' is called the L^p space in analysis.
Example 5.43. One more norm we can give R^n is the weighted norm ||u||_w = √((u_1/w_1)² + ... + (u_n/w_n)²) for some fixed w = (w_1, ..., w_n) with nonzero entries. What this norm does is assign different weights w_1, ..., w_n to the coordinates u_1, ..., u_n of u. The Euclidean norm is now a special case of the weighted norm, where each coordinate is given the same weight 1. Moreover, when each coordinate of u is a physical quantity with a unit, the weights are often taken in the same units, so that u has a unitless norm.
We just saw that ||u||_2² = ⟨u, u⟩ on R^n. In general, any inner product ⟨·,·⟩ on a vector space V over R induces a norm ||u|| = √⟨u, u⟩. Such a definition easily clears the first two axioms of being a norm due to the properties of the inner product. We establish a nice theorem that implies the triangle inequality axiom for such a norm.

Theorem 5.44. (Cauchy-Schwarz Inequality) For all u, v in an inner product space V with induced norm ||·|| we have |⟨u, v⟩| ≤ ||u|| ||v||. Moreover, equality holds iff u, v are linearly dependent.

Proof. If either u or v is 0 then the inequality holds. Else consider the quadratic polynomial p(t) = ||tu + v||² = ⟨tu + v, tu + v⟩ = ⟨u, u⟩t² + 2⟨u, v⟩t + ⟨v, v⟩. Being a square, p(t) ≥ 0, hence its discriminant 4⟨u, v⟩² − 4⟨u, u⟩⟨v, v⟩ ≤ 0, or ⟨u, v⟩² ≤ ⟨u, u⟩⟨v, v⟩. Equality holds iff the discriminant is 0 iff p(t) = 0 for some t iff tu + v = 0 for some t iff v is a multiple of u. Do you see why we considered such a polynomial p(t), and how it shows exactly what we needed to see?

Corollary 5.45. (Triangle Inequality) The induced norm ||·|| from an inner product on V satisfies the triangle inequality ||u + v|| ≤ ||u|| + ||v|| for all u, v ∈ V.

Proof. Clearly ||u + v||² = ⟨u + v, u + v⟩ = ⟨u, u⟩ + 2⟨u, v⟩ + ⟨v, v⟩ = ||u||² + 2⟨u, v⟩ + ||v||² ≤ ||u||² + 2||u|| ||v|| + ||v||² = (||u|| + ||v||)², from which our claim follows.
Thus an inner product on V induces a norm. A norm in turn induces a distance.

Definition 5.46. For any vectors u, v in a vector space V equipped with a norm ||·||, we define the distance between them to be dist(u, v) = ||u − v||.

Example 5.47. All the above norms on R^n induce different distances between vectors in R^n, though some of them are rather counterintuitive.
5.4. Angle between Vectors. As an inner product ⟨·,·⟩ induces a norm for V, we want to relate ⟨u, v⟩ and ||u||, ||v|| for any pair u, v ∈ V. For example, when u = v ∈ V then ⟨u, v⟩ = ||u|| ||v|| = ||u||², and when u = −v then ⟨u, v⟩ = −||u|| ||v|| = −||u||². Hence we suspect the ratio ⟨u, v⟩/(||u|| ||v||) bears some correlation between u and v, perhaps how u lines up against v.

Definition 5.48. For nonzero u, v in a vector space V with inner product ⟨·,·⟩ and induced norm ||·|| we define their correlation coefficient ρ(u, v) = ⟨u, v⟩/(||u|| ||v||).

This correlation coefficient, viewed as a function V × V → R, is surely symmetric. It ranges between −1 and 1 by the Cauchy-Schwarz inequality. Two vectors u, v are said to be correlated if ρ(u, v) is close to 1, uncorrelated if ρ(u, v) is close to 0, and anticorrelated if ρ(u, v) is close to −1.
Example 5.49. Consider u = (0.1, 0.3, 1.3, 0.3, 3.3), v = (0.2, 0.4, 3.2, 0.8, 5.2), w = (−1.8, −1.0, 0.6, 1.4, −0.2). Then ||u|| ≈ 3.57, ||v|| ≈ 6.17, ||w|| ≈ 2.57, ⟨u, v⟩ ≈ 21.7, ⟨u, w⟩ ≈ 0.06, ρ(u, v) ≈ 0.98, ρ(u, w) ≈ 0.007. Therefore u and v are far more correlated than u and w are (v is roughly twice u).
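A minimal NumPy check of example 5.49. The sign pattern of w is an assumption: minus signs were lost in the source, and the pattern below is one choice consistent with all the printed norms, inner products and correlation coefficients.

```python
import numpy as np

def rho(u, v):
    # correlation coefficient <u, v> / (||u|| ||v||) from definition 5.48
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([0.1, 0.3, 1.3, 0.3, 3.3])
v = np.array([0.2, 0.4, 3.2, 0.8, 5.2])
w = np.array([-1.8, -1.0, 0.6, 1.4, -0.2])   # assumed sign pattern

print(np.linalg.norm(u), np.linalg.norm(v), np.linalg.norm(w))  # about 3.57, 6.17, 2.57
print(np.dot(u, v), np.dot(u, w))                               # about 21.7 and 0.06
print(rho(u, v), rho(u, w))                                     # about 0.98 and 0.007
```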
Example 5.50. (Throwback to probability theory) The set R = {X: (Ω, T, P) → R} of all real-valued random variables forms a vector space over R. If we define an inner product for this space as ⟨X, Y⟩ = E((X − μ_X)(Y − μ_Y)) then ⟨X, Y⟩ is none other than cov(X, Y), ⟨X, X⟩ = ||X||² is none other than var(X), ||X|| is none other than σ_X, and the correlation coefficient ρ(X, Y) is none other than the usual correlation coefficient corr(X, Y) known in probability theory.
The Cauchy-Schwarz inequality is also useful as it helps us define the angle between two vectors.

Definition 5.51. In an inner product space V we define the angle between two vectors u and v to be θ(u, v) = arccos(ρ(u, v)) = arccos(⟨u, v⟩/(||u|| ||v||)).

Example 5.52. We can now speak of the angle between two random variables X, Y.

This definition agrees with the usual notion of angle between vectors in R² and R³, while generalizing to R^n, n > 3. If θ(u, v) = 0° we say u, v are aligned, i.e. they have correlation coefficient 1, or that each vector is a positive multiple of the other. If 0° < θ(u, v) < 90° we say u, v make an acute angle, i.e. they have positive correlation coefficient. If θ(u, v) = 90° we say u, v are orthogonal, i.e. they are uncorrelated, and we write u ⊥ v. If 90° < θ(u, v) < 180° we say u, v make an obtuse angle, i.e. they have negative correlation coefficient. Lastly, if θ(u, v) = 180° we say u, v are antialigned, i.e. they have correlation coefficient −1 and each vector is a negative multiple of the other.
We are more interested in orthogonal vectors because they make excellent bases.

Definition 5.53. A vector v in a normed space V is called a unit vector if ||v|| = 1. Any vector v ∈ V can be easily normalized and replaced by the aligned unit vector v/||v||.

Definition 5.54. A collection of vectors v_1, ..., v_n in an inner product space V with induced norm is said to be orthonormal if each v_i is a unit vector and v_i ⊥ v_j for any i ≠ j.

If v = α_1 v_1 + ... + α_n v_n is a linear combination of an orthonormal collection v_1, ..., v_n then surely ⟨v_i, v⟩ = ⟨v_i, α_1 v_1 + ... + α_n v_n⟩ = Σ_{j=1}^n ⟨v_i, α_j v_j⟩ = α_i · 1 = α_i. So taking the inner product with v_i yields the i-th coefficient in the linear combination for v. This implies any orthonormal collection v_1, ..., v_n is linearly independent, which is most useful when n = dim(V) and the v_1, ..., v_n form a basis for V.
Example 5.55. Both collections A = {(1, 0, 0), (0, 1, 0)} and B = {(0, 0, 1), (1/√2, 1/√2, 0), (1/√2, −1/√2, 0)} are orthonormal, while C = {(1, 1, 0), (1, −1, 1)} can be normalized and completed into an orthonormal basis. Write (2, 7, 3) as a linear combination in B and in C.
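The coefficient-extraction trick above is easy to check numerically. Below is a minimal NumPy sketch using the collection B as reconstructed above (the minus sign placement in B is an assumption): for an orthonormal basis the coefficients are just inner products.

```python
import numpy as np

# Assumed reconstruction of the orthonormal basis B from example 5.55.
b1 = np.array([0.0, 0.0, 1.0])
b2 = np.array([1/np.sqrt(2),  1/np.sqrt(2), 0.0])
b3 = np.array([1/np.sqrt(2), -1/np.sqrt(2), 0.0])

v = np.array([2.0, 7.0, 3.0])

# For an orthonormal collection, the i-th coefficient is <b_i, v>.
coeffs = [np.dot(b, v) for b in (b1, b2, b3)]
print(coeffs)
print(coeffs[0]*b1 + coeffs[1]*b2 + coeffs[2]*b3)   # recovers v exactly
```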
5.5. Exercises.
Exercise 5.56. page 18: 1.2, 1.7, 1.9, 1.10, 1.11, 1.12, 1.15, 1.16, 1.17, 1.21
6. Linear Maps and Matrices

6.1. Morphisms between Vector Spaces. We step back from vectors within a vector space V over F and consider the relationships between finite dimensional vector spaces over the same field F. Given two vector spaces V, W we would expect any map between them to respect their linear structures.

Definition 6.1. A map f: V → W between two vector spaces V, W over F is called linear if f(αu + βv) = αf(u) + βf(v) for all u, v ∈ V and α, β ∈ F.

Example 6.2. The trivial map f: V → W, v ↦ 0, between any two vector spaces is certainly linear.
Example 6.3. The map f: R² → R³, (x, y) ↦ (x, y, 0) is linear. This is how we embed R² into R³ as seen before. In similar fashion can R^m be embedded into R^n for m < n.
The most notable thing about a linear map f: V → W is that it is completely determined by what it does to a basis B = {v_1, ..., v_n} of V. More precisely, if we write v = a_1 v_1 + ... + a_n v_n ∈ V then f(v) = a_1 f(v_1) + ... + a_n f(v_n) by linearity of f. If we also choose a basis C = {w_1, ..., w_m} for W then we can write f(v_i) = b_{1i} w_1 + ... + b_{mi} w_m = (b_{1i}, ..., b_{mi})^t. Hence

f(v) = a_1 f(v_1) + ... + a_n f(v_n)
     = a_1 (b_{11}, ..., b_{m1})^t + ... + a_n (b_{1n}, ..., b_{mn})^t
     = (a_1 b_{11} + ... + a_n b_{1n}, ..., a_1 b_{m1} + ... + a_n b_{mn})^t
     = [b_{11} ... b_{1n}; ... ; b_{m1} ... b_{mn}] (a_1, ..., a_n)^t,

if we define multiplication between an m × n matrix and an n × 1 matrix as such (we have used the transpose notation t here without having introduced it yet). We state this as a theorem.

Theorem 6.4. Any linear map f: V → W can be represented by a matrix A_f once V, W are given bases. Conversely, any m × n matrix A is a linear map between a vector space V over F of dimension n and a vector space W over F of dimension m.

Proof. It remains to show that as a map, A(αv + βv') = αA(v) + βA(v'), but this follows from the definition of matrix multiplication above.
The domain V and codomain W of A are often understood to be F^n with the canonical basis e_1 = (1, 0, ..., 0), ..., e_n = (0, ..., 0, 1) and F^m with e_1, ..., e_m. Let us look at some examples.
Example 6.5. The zero map 0: V → W, v ↦ 0, is linear; it is represented by the zero matrix with respect to any bases for V and W.

Example 6.6. The identity map id_V: V → V, v ↦ v, is linear; it is represented by the identity matrix I ∈ M(n, F) with respect to any basis for V. More generally, scaling by λ ∈ F is linear and represented by the scalar matrix λI ∈ M(n, F) with respect to any basis for V.
Example 6.7. Reflection across any line through the origin in the plane R² is linear. If we choose an orthonormal basis u_1, u_2 such that r(u_1) = u_2 and r(u_2) = u_1 then clearly the reflection is represented as [0 1; 1 0]. Similarly, reflection across u_1 is linear and represented by the matrix [1 0; 0 −1], while projection onto u_1 is represented by [1 0; 0 0].
Example 6.8. For any linear map A: V → W with respect to a basis v_1, ..., v_n for V, A(v_i) equals the i-th column of A, as seen below,

[c_{11} ... c_{1i} ... c_{1n}; ... ; c_{n1} ... c_{ni} ... c_{nn}] (0, ..., 1, ..., 0)^t = (c_{1i}, ..., c_{ni})^t,

where the single 1 in the column vector sits in the i-th position.
6.2. Kernel, Nullity, Image, Rank. The first thing we look at in each linear map f: V → W is what it destroys in V and what it reaches in W.

Definition 6.9. For any linear map f: V → W we define ker(f) = {v ∈ V such that f(v) = 0}. The number dim(ker(f)) is called the nullity of f.

Definition 6.10. For any linear map f: V → W we define im(f) = {w ∈ W such that w = f(v) for some v ∈ V}. The number dim(im(f)) is called the rank of f.
Clearly f is injective iff nullity(f) = 0, and f is surjective iff rank(f) = dim(W). Furthermore, if v_1, ..., v_k is a basis for ker(f) then it can be completed to a basis v_1, ..., v_k, v_{k+1}, ..., v_n for V. Write any v ∈ V as v = α_1 v_1 + ... + α_k v_k + α_{k+1} v_{k+1} + ... + α_n v_n; then f(v) = α_1 f(v_1) + ... + α_k f(v_k) + α_{k+1} f(v_{k+1}) + ... + α_n f(v_n) = α_{k+1} f(v_{k+1}) + ... + α_n f(v_n). Therefore im(f) = Span(f(v_{k+1}), ..., f(v_n)). While dim(ker(f)) = k, we see dim(im(f)) ≤ n − k. Equality follows from the following theorem.
Theorem 6.11. (Rank-Nullity) If f: V → W is a linear map then dim(V) = nullity(f) + rank(f).

Proof. It remains to show f(v_{k+1}), ..., f(v_n) are linearly independent and thus form a basis for im(f). Suppose α_{k+1} f(v_{k+1}) + ... + α_n f(v_n) = 0 for some α_{k+1}, ..., α_n ∈ F; then f(α_{k+1} v_{k+1} + ... + α_n v_n) = 0. So α_{k+1} v_{k+1} + ... + α_n v_n ∈ ker(f) and we can write α_{k+1} v_{k+1} + ... + α_n v_n = β_1 v_1 + ... + β_k v_k for some β_1, ..., β_k ∈ F. Since the v_i form a basis for V, all these coefficients must be 0. In particular, α_{k+1} = ... = α_n = 0.
Example 6.12. The zero map 0: V → W has nullity(0) = dim(V) and rank(0) = 0, while a nonzero scaling s: V → V has nullity(s) = 0 and rank(s) = dim(V).

Example 6.13. The reflections in example 6.7 have nullity 0 and rank 2. On the other hand, projection onto u_1 has nullity 1 and rank 1. In general, if u_1, ..., u_k are linearly independent in V then projection onto Span(u_1, ..., u_k) has rank k and nullity dim(V) − k.

It follows that f: V → W is bijective iff dim(V) = rank(f) = dim(W). In that case we can define a map W → V, w ↦ v, where v is the unique element that f maps to w. This map is linear. We denote it as f⁻¹, call f a linear isomorphism and write V ≅ W.
Exercise 6.14. Show that if a linear map f: V → W is bijective then f⁻¹ is indeed linear. Hint: show that f⁻¹(αw + βw') = αf⁻¹(w) + βf⁻¹(w') for all α, β ∈ F and w, w' ∈ W.

Exercise 6.15. Define a bijection between the space in exercise 5.4 and R⁵. Define a non-bijection between them.
If we view each matrix A ∈ M(m, n, F) as a linear map A: F^n → F^m then, from example 6.8 and theorem 6.11, n − k of its columns will form a basis for its image while the other k columns are linearly dependent upon those. Looking closely at the columns of a matrix reveals information about that matrix.

Example 6.16. The matrix [1 2 5; 1 0 1; 2 −1 0; 0 1 2] has rank 2 and nullity 1, since its third column is the first column plus twice the second. It is neither injective nor surjective.
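A minimal NumPy check of example 6.16. The −1 entry in the matrix is a reconstruction (a sign was lost in the source); it is the choice under which the stated rank and nullity hold.

```python
import numpy as np

# Matrix from example 6.16 as reconstructed above (the -1 is an assumption).
A = np.array([[1,  2, 5],
              [1,  0, 1],
              [2, -1, 0],
              [0,  1, 2]], dtype=float)

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank      # rank-nullity: dimension of the domain minus the rank
print(rank, nullity)             # 2 and 1
```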
6.3. Matrix Operations. As any linear map between vector spaces over F with fixed bases is represented by a matrix, we study matrices more closely. First comes a formal definition of a matrix.

Definition 6.17. A matrix M is a rectangular array (a_{ij})_{m×n} of m rows and n columns, where each entry a_{ij} is an element of F placed in the i-th row and j-th column. Sometimes we also write (A)_{ij} for a_{ij}. We denote the set of all m × n matrices over F as M(m, n, F).
Example 6.18. We have
(1) M = [1/2; 2; 2/3], M ∈ M(3, 1, Q), a_{3,1} = 2/3.
(2) M = [sin(π/10)  cos(π/10)], M ∈ M(1, 2, R), a_{1,2} = cos(π/10).
(3) M = [i e 1; π 0 1/2; ln 5 1 3], M ∈ M(3, 3, C), a_{2,2} = 0.
(4) M = [sin(π/10)  cos(π/10)], M ∈ M(1, 2, R); note that a_{1,3} does not exist since M has only two columns.
Below is what we can do with matrices.

6.3.1. Partition a matrix into submatrices. For example, A_{4×5} = [A_{3×3} A_{3×2}; A_{1×3} A_{1×2}]. This is especially useful when we multiply matrices by blocks without fretting too much about individual entries.
Example 6.19. If A ∈ M(4, 5, F) and B ∈ M(r + s, 4, F) then they can be partitioned into blocks for multiplication as follows,

[B_{r×3} B_{r×1}; B_{s×3} B_{s×1}] [A_{3×3} A_{3×2}; A_{1×3} A_{1×2}] = [B_{r×3} A_{3×3} + B_{r×1} A_{1×3}   B_{r×3} A_{3×2} + B_{r×1} A_{1×2}; B_{s×3} A_{3×3} + B_{s×1} A_{1×3}   B_{s×3} A_{3×2} + B_{s×1} A_{1×2}].
6.3.2. Addition of matrices of the same size. We equip the set M(m, n, F) with an addition consistent with addition of the associated linear maps.

Definition 6.20. If A_{m×n} = (a_{ij}) and B_{m×n} = (b_{ij}) then we define A + B = (c_{ij}) where c_{ij} = a_{ij} + b_{ij}.

Example 6.21. [1 0 2; 0 1 1] + [1 3 0; 0 −1 2] = [2 3 2; 0 0 3] in M(2, 3, R).
Proposition 6.22. Matrix addition enjoys the following properties,
(1) A + B = B + A.
(2) (A + B) + C = A + (B + C).

Proof. Both statements follow from commutativity and associativity of addition in F.

Importantly, one can verify that this addition of matrices A + B corresponds to addition of linear maps f_A + f_B: V → W once bases are chosen for V and W.
6.3.3. Multiplication of matrices. What about composition of linear maps f: U → V and g: V → W? If bases are chosen for U, V, W and A, B, C are the corresponding matrices for f, g, gf, then we must define the product so that BA = C represents gf. Here is the picture:

U --f--> V --g--> W with composite gf    corresponds to    U --A--> V --B--> W with composite C = BA.

Definition 6.23. If A = (a_{ij}) ∈ M(m, n, F) and B = (b_{ij}) ∈ M(l, m, F) then we define BA = (c_{ij}) ∈ M(l, n, F) where c_{ij} = b_{i1} a_{1j} + b_{i2} a_{2j} + ... + b_{im} a_{mj} = Σ_{k=1}^m b_{ik} a_{kj}, the inner product of the i-th row of B and the j-th column of A.
Proposition 6.24. Consider vector spaces U, V, W over F with dimensions n, m, l and bases u_1, ..., u_n, v_1, ..., v_m, and w_1, ..., w_l. If f: U → V is represented by A = (a_{ij}) ∈ M(m, n, F) and g: V → W is represented by B = (b_{ij}) ∈ M(l, m, F) then gf: U → W is represented by BA.

Proof. We write out A = [a_{11} ... a_{1n}; ... ; a_{m1} ... a_{mn}] and B = [b_{11} ... b_{1m}; ... ; b_{l1} ... b_{lm}]. We examine what f does to the basis vectors,

f(u_1) = A(1, 0, ..., 0)^t = (a_{11}, ..., a_{m1})^t = a_{11} v_1 + ... + a_{m1} v_m
...
f(u_n) = A(0, ..., 0, 1)^t = (a_{1n}, ..., a_{mn})^t = a_{1n} v_1 + ... + a_{mn} v_m

and what gf does to the basis vectors,

gf(u_1) = a_{11} B(v_1) + ... + a_{m1} B(v_m) = a_{11}(b_{11}, ..., b_{l1})^t + ... + a_{m1}(b_{1m}, ..., b_{lm})^t
...
gf(u_n) = a_{1n} B(v_1) + ... + a_{mn} B(v_m) = a_{1n}(b_{11}, ..., b_{l1})^t + ... + a_{mn}(b_{1m}, ..., b_{lm})^t

Hence gf is represented by [a_{11} b_{11} + ... + a_{m1} b_{1m}  ...  a_{1n} b_{11} + ... + a_{mn} b_{1m}; ... ; a_{11} b_{l1} + ... + a_{m1} b_{lm}  ...  a_{1n} b_{l1} + ... + a_{mn} b_{lm}], which is precisely BA.
This definition agrees with our earlier one in 6.1. As a special case we define scalar multiplication as λA = [λ ... 0; ... ; 0 ... λ] A. This scalar multiplication together with matrix addition turns M(m, n, F) into a vector space over F. The space of all linear maps between two vector spaces V and W is itself a vector space.

Example 6.25. [1 4 2 0; 2 1 5 6] [2 1 3; 3 0 1; 4 0 5; 1 2 0] = C_{2×3} where c_{21} = 2·2 + 1·3 + 5·4 + 6·1 = 33.
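A minimal NumPy check of example 6.25, with the two matrices taken exactly as printed (any minus sign lost in the source would change the result):

```python
import numpy as np

B = np.array([[1, 4, 2, 0],
              [2, 1, 5, 6]])
A = np.array([[2, 1, 3],
              [3, 0, 1],
              [4, 0, 5],
              [1, 2, 0]])

C = B @ A                 # a 2 x 3 matrix
print(C)
print(C[1, 0])            # c_21 = 2*2 + 1*3 + 5*4 + 6*1 = 33
```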
Example 6.26. 5 [2 1 3; 3 0 1; 4 0 5; 1 2 0] = [5 0 0 0; 0 5 0 0; 0 0 5 0; 0 0 0 5] [2 1 3; 3 0 1; 4 0 5; 1 2 0] = [10 5 15; 15 0 5; 20 0 25; 5 10 0].
Exercise 6.27. Show that M(m, n, F) has dimension mn as a vector space over F.

Proposition 6.28. Matrix multiplication enjoys the following properties,
(1) (associativity) (AB)C = A(BC).
(2) (distributivity) A(B + C) = AB + AC and (A + B)C = AC + BC.
(3) (identity element, best when m = n) I_{m×m} A_{m×n} = A_{m×n} I_{n×n} = A for all A_{m×n}.
(4) (commutativity in scalar multiplication) λA = Aλ, identifying the scalar λ with the scalar matrix λI.

Here are a few more examples to illuminate matrix multiplication.
Example 6.29. One can see [0 1; 1 0][1 0; 0 0] ≠ [1 0; 0 0][0 1; 1 0] through direct multiplication or through geometry. Matrix multiplication is not commutative in general.

Example 6.30. Matrix multiplication fails to cancel, [1 0; 0 0][3; 1] = [1 0; 0 0][3; 2] = [3; 0] but clearly [3; 1] ≠ [3; 2].
Example 6.31. Matrix multiplication has zero divisors, [1 0; 0 0][0 0; 0 1] = [0 0; 0 0].

Example 6.32. Matrix multiplication has idempotents, [1 0; 0 0][1 0; 0 0] = [1 0; 0 0].

Example 6.33. Given λ_1, ..., λ_k ∈ F and A_1, ..., A_k ∈ M(m, n, F) we can form a linear combination λ_1 A_1 + ... + λ_k A_k ∈ M(m, n, F).
Example 6.34. Given a polynomial such as p(x) = x² + 2x + 3 ∈ R[x], we can view it as a map p: R → R, e.g. 4 ↦ 4² + 2·4 + 3 = 27. Now we can also view it as a map on square matrices, p: M(n, R) → M(n, R), A ↦ A² + 2A + 3I.
6.3.4. Trace of a matrix. We assign to each square matrix A ∈ M(n, F) a first invariant.

Definition 6.35. For A = (a_{ij}) ∈ M(n, F) we define its trace as tr(A) = Σ_{i=1}^n a_{ii}.

Example 6.36. tr(λI) = nλ for any scalar matrix λI ∈ M(n, F).

Example 6.37. tr [2 1 4; 3 4 1; 5 3 1] = 2 + 4 + 1 = 7.

Trace has the following properties,

Proposition 6.38. For any A, B ∈ M(n, F) and λ ∈ F,
(1) tr(λA) = λ tr(A).
(2) tr(A + B) = tr(A) + tr(B).
(3) tr(AB) = tr(BA).

Proof. Straightforward.
The first two properties mean trace is actually a linear map tr: M(n, F) → F. Together with the third property, they actually characterize trace completely: any linear map f: M(n, F) → F satisfying the above three properties must be a multiple of trace.
6.3.5. Transpose of a matrix. Next we give each matrix A ∈ M(m, n, F) a companion; it helps us express matrix theory.

Definition 6.39. For A = (a_{ij}) ∈ M(m, n, F) we define its transpose A^t = (b_{ij}) ∈ M(n, m, F), where b_{ij} = a_{ji}.

Proposition 6.40. Transpose has the following properties,
(1) (A^t)^t = A.
(2) (A + B)^t = A^t + B^t.
(3) (λA)^t = λA^t.
(4) (AB)^t = B^t A^t.

Proof. Straightforward.

Example 6.41. [2 1 4; 3 4 1; 5 3 1]^t = [2 3 5; 1 4 3; 4 1 1] and [2 1 7]^t = [2; 1; 7].
Transpose can be used to describe many interesting classes of matrices. The first class is triangular matrices.

Definition 6.42. A square matrix A ∈ M(n, F) is called lower triangular if all entries above the diagonal are zero, i.e. a_{ij} = 0 for all i < j. A matrix A ∈ M(n, F) is called upper triangular if a_{ij} = 0 for all i > j. Furthermore, if in addition a_{ii} = 1 for all i then we say A is unit lower triangular or unit upper triangular, respectively.

Definition 6.43. A matrix A ∈ M(n, F) is called diagonal if a_{ij} = 0 whenever i ≠ j.

One sees from the definitions that A is lower triangular iff A^t is upper triangular and vice versa, while A is diagonal iff it is both lower triangular and upper triangular. Another class of matrices that transpose helps describe is symmetric matrices.

Definition 6.44. A square matrix A ∈ M(n, F) is called symmetric if A = A^t, or equivalently if a_{ij} = a_{ji} for all i, j.
One source of symmetric matrices is the inner product. If x, y are two vectors in an inner product space V and v_1, ..., v_n is a basis for V, then x = α_1 v_1 + ... + α_n v_n, y = β_1 v_1 + ... + β_n v_n and ⟨x, y⟩ = ⟨α_1 v_1 + ... + α_n v_n, β_1 v_1 + ... + β_n v_n⟩ = (α_1, ..., α_n) A (β_1, ..., β_n)^t where A = (⟨v_i, v_j⟩) ∈ M(n, F). Even if v_1, ..., v_n are not a basis for V, we can still form A = (⟨v_i, v_j⟩).

Definition 6.45. A square matrix A ∈ M(n, F) is named a Gram matrix if A = (⟨v_i, v_j⟩) for some vectors v_1, ..., v_n ∈ V.

This matrix is symmetric since the inner product is symmetric. Moreover, if we choose an orthonormal basis for V and write v_i = (β_{1i}, ..., β_{ni})^t then ⟨v_i, v_j⟩ = (β_{1i}, ..., β_{ni})(β_{1j}, ..., β_{nj})^t, so A = B^t B where B has i-th column (β_{1i}, ..., β_{ni})^t.
Proposition 6.46. In an inner product space V over R with induced norm, the following statements about v_1, ..., v_n are equivalent,
(1) they are orthonormal.
(2) their Gram matrix equals I.
(3) their coordinate forms (β_{1i}, ..., β_{ni}) with respect to any orthonormal basis are orthonormal.
(4) B^t B = I where B has i-th column (β_{1i}, ..., β_{ni})^t.

Proof. This follows from the above discussion.
Such matrices as B also have a name.

Definition 6.47. A matrix A ∈ M(m, n, R) is called orthonormal (or orthogonal) if its columns form an orthonormal collection in R^m. A linear map f: V → W is called orthonormal if its associated matrix A_f is orthonormal once bases have been chosen for V and W.

The number n of columns of A may differ from m, so the columns need not form a basis and A need not be square. We will prove that A has at least as many rows as columns. Orthonormal matrices are special. For one, A^t A = I by definition, so their transpose is an inverse from the left. Moreover, they preserve inner products, hence norms and angles, when viewed as maps between inner product spaces.
Proposition 6.48. If A: V → W is an orthonormal map between inner product spaces over R then ⟨A(u), A(v)⟩ = ⟨u, v⟩ for all u, v ∈ V.

Proof. If (α_1, ..., α_n)^t and (β_1, ..., β_n)^t are the coordinate forms of u and v then ⟨A(u), A(v)⟩ = (A(α_1, ..., α_n)^t)^t A(β_1, ..., β_n)^t = (α_1, ..., α_n) A^t A (β_1, ..., β_n)^t = (α_1, ..., α_n)(β_1, ..., β_n)^t = ⟨u, v⟩.
What this proposition means is that the following diagram commutes,

   V × V --(A, A)--> W × W
     |                  |
   ⟨·,·⟩              ⟨·,·⟩
     v                  v
     R  ------id----->  R
One more class of matrices that transpose helps describe is positive semidefinite matrices over R.

Definition 6.49. A square matrix A ∈ M(n, R) is called positive semidefinite if it is symmetric and x^t A x ≥ 0 for all x ∈ R^n. It is called positive definite if, in addition to being semidefinite, A satisfies x^t A x = 0 only if x = 0.
Example 6.50. For matrices of small size, we can check the sign of x^t A x directly to see if A is positive semidefinite or positive definite. Concretely, A = [9 6; 6 5] is positive definite since (x_1, x_2) A (x_1, x_2)^t = (3x_1 + 2x_2)² + x_2² ≥ 0 for all (x_1, x_2) ∈ R², with equality iff (x_1, x_2) = (0, 0). On the other hand B = [9 6; 6 4] is positive semidefinite since (x_1, x_2) B (x_1, x_2)^t = (3x_1 + 2x_2)² ≥ 0 for all (x_1, x_2) ∈ R². However it is not positive definite since (2, −3) B (2, −3)^t = 0. Lastly, C = [9 6; 6 3] is not positive semidefinite since (x_1, x_2) C (x_1, x_2)^t = (3x_1 + 2x_2)² − x_2² for all (x_1, x_2) ∈ R² and (−2/3, 1) C (−2/3, 1)^t < 0.
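For symmetric matrices, definiteness can also be read off the signs of the eigenvalues; a minimal NumPy sketch on the three matrices of example 6.50 (the tolerance guards against floating-point round-off):

```python
import numpy as np

def definiteness(M, tol=1e-10):
    # For a symmetric matrix, eigenvalue signs decide definiteness.
    eig = np.linalg.eigvalsh(M)
    if np.all(eig > tol):
        return "positive definite"
    if np.all(eig >= -tol):
        return "positive semidefinite"
    return "not positive semidefinite"

A = np.array([[9, 6], [6, 5]])
B = np.array([[9, 6], [6, 4]])
C = np.array([[9, 6], [6, 3]])
for M in (A, B, C):
    print(definiteness(M))   # positive definite, positive semidefinite, not positive semidefinite
```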
Example 6.51. Another source of positive semidefinite matrices is Gram matrices A ∈ M(n, R), since x^t A x = x^t B^t B x = (Bx)^t (Bx) = ⟨Bx, Bx⟩ ≥ 0. Clearly A is positive definite iff (Bx = 0 implies x = 0) iff B has trivial kernel iff B^t has full image.
6.3.6. Norm of a matrix. Viewed as a linear map between normed spaces, a matrix A ∈ M(m, n, R) will either stretch or shrink a vector. This behavior is measured by ||A(x)||/||x||.

Definition 6.52. If A: V → W is a linear map between normed spaces over R then we define ||A|| = max ||A(x)||/||x|| over all nonzero x ∈ V.

The following instances exemplify this new definition.

Example 6.53. The scalar matrix λI has norm |λ|.

Example 6.54. The n × 1 matrix (a_1, ..., a_n)^t has norm (a_1² + ... + a_n²)^{1/2}, as it would when viewed as a vector.

It is not easy to calculate the norm of a matrix in general, although MATLAB and WolframAlpha can approximate matrix norms by numerical methods.
Proposition 6.55. We list some apparent properties of the matrix norm.
(1) (Homogeneity) ||λA|| = |λ| ||A||.
(2) (Triangle inequality) ||A + B|| ≤ ||A|| + ||B||.
(3) (Definiteness) ||A|| ≥ 0 for all A, and equality holds iff A = 0.
(4) ||A|| = max {||A(x)||, ||x|| = 1}.
(5) ||A(x)|| ≤ ||A|| ||x|| for all vectors x.
(6) ||AB|| ≤ ||A|| ||B|| for all A, B.
(7) ||A^t|| = ||A||.

Proof. Do it if time permits.

The first three properties turn M(m, n, R) into a normed space.
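When the Euclidean norm is used on both sides, this matrix norm is the spectral (operator 2-) norm that numerical software computes. A minimal NumPy sketch on an illustrative matrix, checking properties (5) and (7):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 3.0]])

opnorm = np.linalg.norm(A, 2)        # operator 2-norm: max ||Ax|| / ||x||
x = np.array([1.0, -1.0])

print(opnorm)
print(np.linalg.norm(A @ x) <= opnorm * np.linalg.norm(x) + 1e-12)   # property (5)
print(np.isclose(np.linalg.norm(A.T, 2), opnorm))                    # property (7)
```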
6.4. Invertible Matrices and their Inverses.

6.4.1. Square matrices. When a linear map f: V → W between vector spaces of equal dimension n over F is bijective with inverse f⁻¹, their matrix representations A_f, A_{f⁻¹} ∈ M(n, F) satisfy A_f A_{f⁻¹} = A_{f⁻¹} A_f = I with respect to any bases for V, W. Or abstractly, M(n, F) has been equipped with + and · and we want to consider those matrices that are invertible under ·.

Definition 6.56. A square matrix A ∈ M(n, F) is called invertible (or nonsingular) if there exists B ∈ M(n, F) such that AB = BA = I, in which case we denote B as A⁻¹. Else we say A is noninvertible (or singular).
Example 6.57. The inverse of I ∈ M(n, F) is I itself. More generally, for 0 ≠ λ ∈ F, [λ ... 0; ... ; 0 ... λ]⁻¹ = [1/λ ... 0; ... ; 0 ... 1/λ].

Example 6.58. One can verify that [1 2 3; 3 2 1; 2 1 3] and [−5/12 3/12 4/12; 7/12 3/12 −8/12; 1/12 −3/12 4/12] are inverses of each other. Or one can input {{1, 2, 3}, {3, 2, 1}, {2, 1, 3}} into WolframAlpha and let it do the work.
Theorem 6.59. Any invertible matrices A, B ∈ M(n, F) satisfy the following,
(1) A has a unique inverse A⁻¹, and (A⁻¹)⁻¹ = A.
(2) (AB)⁻¹ = B⁻¹A⁻¹.
(3) (A^n)⁻¹ = (A⁻¹)^n.
(4) (kA)⁻¹ = (1/k) A⁻¹ for 0 ≠ k ∈ F.
(5) (A^t)⁻¹ = (A⁻¹)^t.

Proof. We prove a few of these in class. Look at these properties in terms of composition of linear maps.
Example 6.60. We reuse A and A⁻¹ from example 6.58. One can verify that A² = [13 9 14; 11 11 14; 11 9 16] and (A⁻¹)² = [25/72 −9/72 −14/72; −11/72 27/72 −14/72; −11/72 −9/72 22/72] are inverses of each other.

Example 6.61. Or 12A⁻¹ = 12 [−5/12 3/12 4/12; 7/12 3/12 −8/12; 1/12 −3/12 4/12] = [−5 3 4; 7 3 −8; 1 −3 4] is the inverse of (1/12)A.

Example 6.62. Or (A⁻¹)^t = [−5/12 7/12 1/12; 3/12 3/12 −3/12; 4/12 −8/12 4/12] is the inverse of A^t = [1 3 2; 2 2 1; 3 1 3].
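A minimal NumPy check of examples 6.58-6.62 (A is taken as printed; the signs in the inverses above were restored by recomputing the inverse):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [3, 2, 1],
              [2, 1, 3]], dtype=float)

Ainv = np.linalg.inv(A)
print(np.round(12 * Ainv))                              # compare with example 6.61
print(np.allclose(A @ Ainv, np.eye(3)))                 # A A^{-1} = I
print(np.allclose(np.linalg.inv(A @ A), Ainv @ Ainv))   # (A^2)^{-1} = (A^{-1})^2
print(np.allclose(np.linalg.inv(A.T), Ainv.T))          # (A^t)^{-1} = (A^{-1})^t
```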
We relate invertibility of a matrix A to its behavior as a linear map between vector spaces of equal dimension over F.

Theorem 6.63. A matrix A ∈ M(n, F) is invertible iff f_A: V → V is bijective as a linear map iff f_A has trivial kernel iff f_A has full image iff the columns of A are linearly independent iff the rows of A are linearly independent.

Proof. Surely A is invertible with inverse A⁻¹ iff f_A is bijective with inverse f_{A⁻¹} iff f_A has trivial kernel iff f_A has full image (by the Rank-Nullity theorem) iff the columns A(e_1), ..., A(e_n) of A span V iff they are linearly independent iff A^t is invertible iff the rows of A are linearly independent.
Example 6.64. Any matrix with a whole row of zeros or a whole column of zeros is singular. Explicitly, if A = [c_1 c_2 0] (its last column zero) then BA = B[c_1 c_2 0] = [Bc_1 Bc_2 0] ≠ I for any matrix B.
Corollary 6.65. Positive definite matrices are invertible while positive semidefinite matrices that are not definite are noninvertible.
Proof. Consider a positive definite matrix A. For x ∈ ker(A), x^t A x = 0 so x = 0. Hence A has trivial kernel as a linear map. By the theorem, A is invertible. On the other hand, if A is positive semidefinite but not positive definite then there exists some x ≠ 0 such that x^t A x = 0. Consider the quadratic polynomial p(t) = (x + ty)^t A (x + ty) for arbitrary y ∈ V. Then p(t) ≥ 0 since A is positive semidefinite. After expansion, p(t) = (y^t A y) t² + 2 (y^t A x) t, with minimum 0 at t = 0. Hence p'(0) = 2 y^t A x = 0. It follows that Ax = 0. By the theorem, A is noninvertible.
Corollary 6.66. For A ∈ M(m, n, R), the product A^t A is positive definite iff A has trivial kernel iff A^t has full image.
Proof. Surely x^t (A^t A) x = (Ax)^t (Ax) ≥ 0 for all x ∈ V, so A^t A is positive semidefinite, and x^t (A^t A) x = 0 exactly when Ax = 0. Everything now follows from corollary 6.65.
6.4.2. Elementary matrices and a method to find A^{-1}. An obvious question is how to find the inverse of an invertible matrix. As the inverse for any invertible A ∈ M(1, F) is obvious, we begin with A ∈ M(2, F).
Theorem 6.67. A square matrix A = [a b; c d] ∈ M(2, F) is invertible iff ad − bc ≠ 0, in which case A^{-1} = 1/(ad − bc) · [d −b; −c a].
Proof. One can verify that the product of those two matrices equals I ∈ M(2, F).
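The formula translates directly into code; a small sketch assuming NumPy (the helper name inv2x2 is ours, not from the notes):

    import numpy as np

    def inv2x2(a, b, c, d):
        # Inverse of [a b; c d] by theorem 6.67; fails exactly when ad - bc = 0.
        det = a * d - b * c
        if det == 0:
            raise ValueError("matrix is singular: ad - bc = 0")
        return np.array([[d, -b], [-c, a]], dtype=float) / det

    print(inv2x2(1, 2, 3, 4))   # [[-2. 1.] [1.5 -0.5]], the inverse of [1 2; 3 4]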
Example 6.68. The projection matrix in example 6.7 is singular while each reflection matrix in example 6.7 is invertible. One can see this either by computing the inverse as above or by geometry.
Example 6.69. Given the linear system 2x + y = 3, x − y = 4 from before, we could have formed the augmented matrix [2 1 | 3; 1 −1 | 4] and used Gauss-Jordan elimination to solve it. Now we have A = [2 1; 1 −1], A^{-1} = [1/3 1/3; 1/3 −2/3]. Hence [x; y] = A^{-1}[3; 4] = [7/3; −5/3].
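The same system in NumPy (an assumption on our part; the notes compute by hand):

    import numpy as np

    A = np.array([[2, 1], [1, -1]], dtype=float)
    b = np.array([3, 4], dtype=float)

    print(np.linalg.inv(A) @ b)   # [ 2.333... -1.666...], i.e. (7/3, -5/3)
    print(np.linalg.solve(A, b))  # same solution without forming A^-1 explicitly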
Finding the inverse of a more general matrix A ∈ M(n, F) is more troublesome. Recall the three elementary row operations: interchanging two rows, multiplying a row by a constant, and adding a constant times a row to another row.
Definition 6.70. A matrix A ∈ M(n, F) is called elementary if it differs from I by a single elementary row operation.
Definition 6.71. Two matrices A, B ∈ M(n, F) are called row equivalent if they differ by a sequence of elementary row operations, in which case we write A ∼ B.
Example 6.72. [1 0; 0 1] --(4r_2)--> [1 0; 0 4] --(r_1 + r_2 to r_1)--> [1 4; 0 4] --(r_1 ↔ r_2)--> [0 4; 1 4]. Hence [0 4; 1 4] ∼ I but it is not an elementary matrix.
For convenience, we denote each elementary row operation by O and its reverse operation by O^{-1}.
Theorem 6.73. If E is the elementary matrix obtained by performing an elementary row operation O on I, then EA = O(A), i.e. multiplying A by E on the left is the same as performing O on A.
Proof. If time permits.
Corollary 6.74. Any elementary matrix E is invertible with inverse E^{-1} = O^{-1}(I).
Proof. If E = O(I) then E · O^{-1}(I) = O(O^{-1}(I)) = I.
Theorem 6.75. For A ∈ M(n, F) the following statements are equivalent,
(1) A is invertible.
(2) The reduced row echelon form of A is I.
(3) A ∼ I, i.e. I = E_r ··· E_1 A = O_r(··· O_1(A)) for some elementary row operations O_1, ..., O_r with elementary matrices E_1, ..., E_r.
Proof. If time permits.
Corollary 6.76. If a sequence of elementary row operations reduces an invertible matrix A to I then that same sequence changes I to A^{-1}.
Proof. If A is invertible then by theorem 6.75 there exist O_1, ..., O_r such that O_r(··· O_1(A)) = I. But then O_r(··· O_1(I)) · A = O_r(··· O_1(A)) = I by theorem 6.73, so O_r(··· O_1(I)) = A^{-1}.
This corollary gives us an algorithm to invert any nonsingular matrix. It will stall on
a singular matrix.
Example 6.77. Given A = [2 1 1; 1 2 1; 1 1 2], we perform
[2 1 1 | 1 0 0; 1 2 1 | 0 1 0; 1 1 2 | 0 0 1]
--(r_1/2, r_2 − r_1, r_3 − r_1)--> [1 1/2 1/2 | 1/2 0 0; 0 3/2 1/2 | −1/2 1 0; 0 1/2 3/2 | −1/2 0 1]
--(2r_2/3, r_3 − r_2/2)--> [1 1/2 1/2 | 1/2 0 0; 0 1 1/3 | −1/3 2/3 0; 0 0 4/3 | −1/3 −1/3 1]
--(3r_3/4, r_2 − r_3/3, r_1 − r_3/2)--> [1 1/2 0 | 5/8 1/8 −3/8; 0 1 0 | −1/4 3/4 −1/4; 0 0 1 | −1/4 −1/4 3/4]
--(r_1 − r_2/2)--> [1 0 0 | 3/4 −1/4 −1/4; 0 1 0 | −1/4 3/4 −1/4; 0 0 1 | −1/4 −1/4 3/4] = [I | A^{-1}]
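A sketch of the algorithm of corollary 6.76 in NumPy (an assumed tool; the function name and the pivoting detail are ours):

    import numpy as np

    def gauss_jordan_inverse(A):
        # Row-reduce the augmented matrix [A | I]; the right half becomes A^-1.
        A = np.array(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])
        for i in range(n):
            p = i + np.argmax(np.abs(M[i:, i]))    # choose a nonzero pivot in column i
            if np.isclose(M[p, i], 0):
                raise ValueError("matrix is singular")   # the algorithm stalls here
            M[[i, p]] = M[[p, i]]                  # swap the pivot row into place
            M[i] /= M[i, i]                        # scale so the pivot equals 1
            for j in range(n):
                if j != i:
                    M[j] -= M[j, i] * M[i]         # clear the rest of column i
        return M[:, n:]

    print(gauss_jordan_inverse([[2, 1, 1], [1, 2, 1], [1, 1, 2]]))
    # [[ 0.75 -0.25 -0.25] [-0.25  0.75 -0.25] [-0.25 -0.25  0.75]], as in example 6.77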
Corollary 6.78. A triangular matrix A ∈ M(n, F) is invertible iff a_ii ≠ 0 for all i.
Proof. The algorithm to invert A will produce A^{-1} iff a_ii ≠ 0 for all i.
6.4.3. Determinant of a Matrix and Cofactor Expansion. One invariant we have associated with a square matrix A ∈ M(n, F) is its trace. Now we associate a second invariant to A, called the determinant and written det(A), that encodes some innate characteristics of A. We do it by induction.
Definition 6.79. For A = [a b; c d] ∈ M(2, F) we define det(A) = |a b; c d| = ad − bc.
Example 6.80. det[1 2; 3 4] = |1 2; 3 4| = 1·4 − 2·3 = −2.
Suppose we have defined the determinant for A ∈ M(n − 1, F). We consider two more ingredients for the determinant of a general n × n matrix.
Definition 6.81. For A = (a_ij) ∈ M(n, F) we define the minor M_ij of entry a_ij to be the determinant of the (n−1) × (n−1) submatrix that remains after deleting the i-th row and j-th column from A. We define the cofactor C_ij of entry a_ij to be (−1)^{i+j} M_ij.
Definition 6.82. For A = (a_ij) ∈ M(n, F) we define its cofactor expansion along the i-th row to be Σ_{j=1}^{n} a_ij C_ij and its cofactor expansion along the j-th column to be Σ_{i=1}^{n} a_ij C_ij.
One obvious question is whether cofactor expansion depends on the choice of row vs. column, or the choice of which row, or the choice of which column. Here is the answer.
Theorem 6.83. The cofactor expansion of A = (a_ij) ∈ M(n, F) is independent of the choice of row vs. column, the choice of the i-th row, or the choice of the j-th column.
Proof. One way is to explicitly write out any two cofactor expansions in terms of the a_ij and compare. It is tedious bookkeeping.
Finally we can define the determinant of a general n × n matrix A.
Definition 6.84. For A = (a_ij) ∈ M(n, F) we define its determinant det(A) = Σ_{j=1}^{n} a_ij C_ij = Σ_{i=1}^{n} a_ij C_ij, i.e. any cofactor expansion along any row or any column.
It took us a while to define the determinant but it is not hard to compute.
Example 6.85.
det[1 2 3; 2 3 1; 3 1 2] = 1·|3 1; 1 2| − 2·|2 1; 3 2| + 3·|2 3; 3 1| (along the first row)
= −2·|2 1; 3 2| + 3·|1 3; 3 2| − 1·|1 3; 2 1| (along the second column)
= ... = −18
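Definition 6.84 can be coded verbatim; a sketch (assuming NumPy, function name ours) expanding along the first row:

    import numpy as np

    def det_cofactor(A):
        # Cofactor expansion along the first row; exponential time, small matrices only.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # delete row 1, column j+1
            total += (-1) ** j * A[0, j] * det_cofactor(minor)     # a_1j * C_1j
        return total

    print(det_cofactor([[1, 2, 3], [2, 3, 1], [3, 1, 2]]))   # -18.0, as above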
Example 6.86. Expanding along the fourth column,
det[1 2 3 4; 2 3 1 5; 3 1 2 6; 2 1 3 7] = 7·|1 2 3; 2 3 1; 3 1 2| − 6·|1 2 3; 2 3 1; 2 1 3| + 5·|1 2 3; 3 1 2; 2 1 3| − 4·|2 3 1; 3 1 2; 2 1 3|
Computation of the determinant is easier for matrices full of zeros. If A has a row or a column full of zeros then we choose to calculate the cofactor expansion along that row or column. As a special case, any triangular matrix A has determinant ∏_{i=1}^{n} a_ii. Another way to compute the determinant is via row reduction. For this, we need to understand how each elementary operation affects the determinant.
Theorem 6.87. For A ∈ M(n, F),
(1) If O multiplies a row or column of A by λ then det(O(A)) = λ det(A).
(2) If O interchanges two rows or two columns of A then det(O(A)) = −det(A).
(3) If O adds a multiple of one row to another or a multiple of one column to another then det(O(A)) = det(A).
Proof. Straightforward from the definition of determinant by cofactor expansion.
Corollary 6.88. If two rows or two columns of A ∈ M(n, R) are proportional then det(A) = 0.
Proof. By theorem 6.87(3) we can reduce A to a matrix with a row of zeros or a column of zeros, and both have the same determinant 0.
We apply theorem 6.87 to elementary matrices first.
Example 6.89. |1 0 0; 0 1 0; 0 0 2| = 2, |0 0 1; 1 0 0; 0 1 0| = 1, and |1 0 3; 0 1 0; 0 0 1| = 1. In any case, det(E) ≠ 0 for any elementary matrix E.
Example 6.90. |1 2 3; 3 4 5; 3 6 9| = 0
We apply theorem 6.87 to calculate the determinant of a general n × n matrix.
Example 6.91. Reduce [0 1 2; 3 4 5; 6 7 8] to diagonal form by elementary row operations, keeping track of how the determinant changes along the way. This could have been done with column operations as well.
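A sketch of this row-reduction strategy (NumPy assumed, function name ours), applying the bookkeeping of theorem 6.87: swaps flip the sign, adding a multiple of a row changes nothing, and a triangular result has determinant equal to the product of its diagonal:

    import numpy as np

    def det_by_row_reduction(A):
        A = np.array(A, dtype=float)
        n = A.shape[0]
        sign = 1.0
        for i in range(n):
            p = i + np.argmax(np.abs(A[i:, i]))
            if np.isclose(A[p, i], 0):
                return 0.0                          # no pivot: column of zeros below the diagonal
            if p != i:
                A[[i, p]] = A[[p, i]]               # row swap flips the sign
                sign = -sign
            for j in range(i + 1, n):
                A[j] -= (A[j, i] / A[i, i]) * A[i]  # row addition leaves det unchanged
        return sign * np.prod(np.diag(A))

    print(det_by_row_reduction([[0, 1, 2], [3, 4, 5], [6, 7, 8]]))   # 0.0
    print(det_by_row_reduction([[1, 2, 3], [2, 3, 1], [3, 1, 2]]))   # -18.0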
Example 6.92. det([0 0 1; 1 0 0; 0 1 0] + [1 0 3; 0 1 0; 0 0 1]) = |1 0 4; 1 1 0; 0 1 1| = 5, while the two summands have determinants 1 and 1. Determinant does not respect addition.
However, determinant does respect matrix multiplication. Toward this we have a prelude.
Lemma 6.93. If E is an elementary matrix then det(EA) = det(E) det(A) for any A ∈ M(n, R).
Proof. If E = O(I) then EA = O(A) and the result follows from theorem 6.87.
We now list the immediate properties of determinant.
Theorem 6.94. If A, B, C ∈ M(n, F) are square matrices then,
(1) (Compatibility with transpose) det(A^t) = det(A).
(2) det(λA) = λ^n det(A).
(3) If A, B, C differ only in one row i and a_ij + b_ij = c_ij for all j, then det(A) + det(B) = det(C). The result holds for columns as well.
(4) (Compatibility with multiplication) det(AB) = det(A) det(B).
Proof. (1), (2) and (3) follow from the definition of determinant. For (4) we consider two cases. If A is not invertible then neither is AB and det(AB) = det(A) det(B) = 0 by theorem 6.96. If A is invertible then we write A = E_k ··· E_1, hence det(AB) = det(E_k ··· E_1 B) = det(E_k) ··· det(E_1) det(B) = det(A) det(B) by lemma 6.93.
Example 6.95. We can verify some instances of the property det(AB) = det(A) det(B). For instance,
det([0 0 1; 1 0 0; 0 1 0] · [1 0 3; 0 1 0; 0 0 1]) = |0 0 1; 1 0 0; 0 1 0| · |1 0 3; 0 1 0; 0 0 1|,
both sides being equal to 1.
We arrive at the most important piece of information carried by the determinant.
Theorem 6.96. A matrix A ∈ M(n, F) is invertible iff det(A) ≠ 0. In that case, det(A^{-1}) = det(A)^{-1}.
Proof. Reduce A to its reduced row echelon form R = E_k ··· E_1 A; then det(A) ≠ 0 iff det(R) ≠ 0 iff R = I iff A is invertible. In that case, det(A) det(A^{-1}) = det(AA^{-1}) = det(I) = 1.
Example 6.97. We can test whether a matrix A ∈ M(n, F) is invertible before using row reduction to find its inverse. If det(A) = 0 then we stop, else we go.
6.4.4. Adjugate matrix and another way to find A^{-1}. There is another way to find the inverse of an invertible matrix besides using elementary row operations. We have actually seen it in the 2 × 2 case: A = [a b; c d] and A^{-1} = 1/det(A) · [d −b; −c a].
Definition 6.98. If A ∈ M(n, F) is a square matrix then the matrix (C_ij), with C_ij the cofactor of a_ij, is called the matrix of cofactors of A, and its transpose (C_ij)^t is called the adjugate (or adjoint in some literature) of A, denoted adj(A).
Example 6.99. The matrix [1 2 3; 3 2 1; 2 1 3] has cofactors C_11 = 5, C_12 = −7, C_13 = −1, C_21 = −3, C_22 = −3, C_23 = 3, C_31 = −4, C_32 = 8, C_33 = −4. Hence its matrix of cofactors is [5 −7 −1; −3 −3 3; −4 8 −4] and its adjugate is [5 −3 −4; −7 −3 8; −1 3 −4].
We state and prove our suspicion for the n × n case.
Theorem 6.100. If A ∈ M(n, F) is invertible then A^{-1} = 1/det(A) · adj(A).
Proof. Consider A · adj(A) = (a_ij)(C_ij)^t = (d_ij) where d_ij = a_i1 C_j1 + ... + a_in C_jn. If i = j then d_ij is precisely det(A); otherwise d_ij is the determinant of the matrix obtained from A by replacing the j-th row with the i-th row. Since this matrix has two rows that are the same, its determinant is 0 by corollary 6.88. Hence A · adj(A) = det(A) I and we are done.
Example 6.101. Since A = [1 2 3; 3 2 1; 2 1 3] has determinant −12, it is invertible with inverse A^{-1} = (1/−12) · [5 −3 −4; −7 −3 8; −1 3 −4] = [−5/12 3/12 4/12; 7/12 3/12 −8/12; 1/12 −3/12 4/12], as already seen in example 6.58.
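Theorem 6.100 as code, a sketch assuming NumPy (adjugate is our helper, not a NumPy routine):

    import numpy as np

    def adjugate(A):
        # Transpose of the matrix of cofactors (definition 6.98); fine for small matrices.
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # cofactor C_ij
        return C.T

    A = np.array([[1, 2, 3], [3, 2, 1], [2, 1, 3]], dtype=float)
    print(np.allclose(adjugate(A) / np.linalg.det(A), np.linalg.inv(A)))  # True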
6.4.5. General matrices. When f : V → W is a linear map between vector spaces V and W of possibly different dimensions n and m, its matrix representation A_f has size m × n and we have more to keep track of. If m > n then A_f has more rows than columns and is called a tall matrix. If m < n then A_f has more columns than rows and is called a wide matrix.
Definition 6.102. A matrix A ∈ M(m, n, F) is called left invertible if there exists a matrix B ∈ M(n, m, F) such that BA = I ∈ M(n, F).
Theorem 6.103. A matrix A ∈ M(m, n, F) is left invertible iff A has trivial kernel.
Proof. If A has a left inverse B then A(v) = 0, v ∈ V, implies v = I(v) = BA(v) = B(0) = 0, so A has trivial kernel. Conversely, suppose A has trivial kernel. By corollary 6.66, A^t A is positive definite, hence invertible with inverse (A^t A)^{-1}. But then ((A^t A)^{-1} A^t) A = (A^t A)^{-1}(A^t A) = I, so A is left invertible.
Corollary 6.104. Any left invertible matrix A must be square or tall.
Proof. By theorem 6.103, A has trivial kernel and V embeds into W, so m ≥ n (recall the Nullity-Rank theorem).
This left inverse (A^t A)^{-1} A^t for A is called the Moore-Penrose pseudoinverse; note that a left inverse is in general not unique. Recall that a matrix A is called orthogonal if A^t A = I, so it has a left inverse (namely A^t) and consequently at least as many rows as columns. The case m ≤ n is mirrored.
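A small NumPy illustration (NumPy and the specific matrix are our assumptions) of a tall matrix with trivial kernel and its left inverse (A^t A)^{-1} A^t:

    import numpy as np

    A = np.array([[1., 0.], [0., 1.], [1., 1.]])     # 3x2, linearly independent columns
    left_inv = np.linalg.inv(A.T @ A) @ A.T          # the left inverse from theorem 6.103

    print(np.allclose(left_inv @ A, np.eye(2)))      # True: BA = I
    print(np.allclose(left_inv, np.linalg.pinv(A)))  # matches NumPy's pseudoinverse here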
Definition 6.105. A matrix A ∈ M(m, n, F) is called right invertible if there exists a matrix B ∈ M(n, m, F) such that AB = I ∈ M(m, F).
Theorem 6.106. A matrix A ∈ M(m, n, F) is right invertible iff A has full image.
Proof. We see A is right invertible iff A^t is left invertible iff A^t has trivial kernel by theorem 6.103 iff (A^t)^t = A has full image by 6.66.
Corollary 6.107. Any right invertible matrix A must be square or wide.
Proof. If A is right invertible then A^t has trivial kernel, hence left invertible, hence square or tall. So A must be square or wide.
A right inverse for a right invertible matrix A is A^t (A A^t)^{-1}, as we can imagine. It is also called a Moore-Penrose pseudoinverse. Note that a right inverse for A may not be a left inverse for A and vice versa.
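The mirrored construction for a wide matrix with full image, again a NumPy sketch with an invented example matrix:

    import numpy as np

    A = np.array([[1., 0., 1.], [0., 1., 1.]])     # 2x3, linearly independent rows
    right_inv = A.T @ np.linalg.inv(A @ A.T)       # the right inverse A^t (A A^t)^-1

    print(np.allclose(A @ right_inv, np.eye(2)))   # True: AB = I
    print(right_inv.shape)                         # (3, 2)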
6.5. Exercises.
Exercise 6.108. page 41: 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.10, 2.12.
Exercise 6.109. page 63: 3.1, 3.4, 3.6, 3.17