
Linear Algebra (part 2) : Vector Spaces and Linear Transformations

(by Evan Dummit, 2012, v. 1.00)

Contents

1 Vector Spaces and Linear Transformations
  1.1 Review of Vectors in R^n
  1.2 Formal Definition of Vector Spaces
  1.3 Subspaces
  1.4 Span, Independence, Bases, Dimension
    1.4.1 Linear Combinations and Span
    1.4.2 Linear Independence
    1.4.3 Bases and Dimension
  1.5 Linear Transformations
    1.5.1 Kernel and Image
    1.5.2 The Derivative as a Linear Transformation

1 Vector Spaces and Linear Transformations

1.1 Review of Vectors in R^n

A vector, as we typically think of it, is a quantity which has both a magnitude and a direction. This is in
contrast to a scalar, which carries only a magnitude.

Real-valued vectors are extremely useful in just about every aspect of the physical sciences, since just
about everything in Newtonian physics is a vector: position, velocity, acceleration, forces, etc. There
is also vector calculus (namely, calculus in the context of vector fields), which is typically part of a
multivariable calculus course; it has many applications to physics as well.

We often think of vectors geometrically, as directed line segments (having a starting point and an endpoint).

We denote the n-dimensional vector from the origin to the point (a1, a2, ..., an) as ~v = ⟨a1, a2, ..., an⟩, where the ai are scalars.

Some vectors: ⟨1, 2⟩, ⟨3, 5, −1⟩, and ⟨−4/3, e^2, 27, 3, π, 0, 0, −1⟩.

Notation: I prefer to use angle brackets ⟨ ⟩ rather than parentheses ( ) so as to draw a visual distinction
between a vector and the coordinates of a point in space. I also draw arrows above vectors or typeset
them in boldface (thus ~v or v), in order to set them apart from scalars. This is not standard notation
everywhere; many other authors use regular parentheses for vectors.

Note/Warning: Vectors are a little bit different from directed line segments, because we don't care where a
vector starts: we only care about the difference between the starting and ending positions. Thus the directed
segment whose start is (0, 0) and end is (1, 1) and the segment starting at (1, 1) and ending at (2, 2) represent
the same vector, ⟨1, 1⟩.

We can add vectors (provided they are of the same length!) in the obvious way, one component at a time: if
~v = ⟨a1, ..., an⟩ and ~w = ⟨b1, ..., bn⟩ then ~v + ~w = ⟨a1 + b1, ..., an + bn⟩.

We can justify this using our geometric idea of what a vector does: ~v moves us from the origin to the point
(a1, ..., an). Then ~w tells us to add ⟨b1, ..., bn⟩ to the coordinates of our current position, and so ~w
moves us from (a1, ..., an) to (a1 + b1, ..., an + bn). So the net result is that the sum vector ~v + ~w moves
us from the origin to (a1 + b1, ..., an + bn), meaning that it is just the vector ⟨a1 + b1, ..., an + bn⟩.

Another way (though it's really the same way) to think of vector addition is via the parallelogram
diagram, whose pairs of parallel sides are ~v and ~w, and whose long diagonal is ~v + ~w.

We can also 'scale' a vector by a scalar, one component at a time: if r is a scalar, then we have
r · ~v = ⟨r a1, ..., r an⟩.

Again, we can justify this by our geometric idea of what a vector does: if ~v moves us some amount in a
direction, then (1/2) ~v should move us half as far in that direction. Analogously, 2~v should move us twice as
far in that direction, and −~v should move us exactly as far, but in the opposite direction.

Example: If ~v = ⟨−1, 2, 2⟩ and ~w = ⟨3, 0, −4⟩ then 2~w = ⟨6, 0, −8⟩, and ~v + ~w = ⟨2, 2, −2⟩. Furthermore, ~v − 2~w = ⟨−7, 2, 10⟩.
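Aside (computational sketch): the componentwise arithmetic above is easy to check by machine. Assuming Python with numpy is available, the following snippet simply reproduces the example; it is an illustration, not part of the original handout.

    import numpy as np

    v = np.array([-1, 2, 2])
    w = np.array([3, 0, -4])

    print(2 * w)      # componentwise scaling: [ 6  0 -8]
    print(v + w)      # componentwise addition: [ 2  2 -2]
    print(v - 2 * w)  # combination of both:    [-7  2 10]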

The arithmetic of vectors in R^n satisfies several algebraic properties (which follow more or less directly from
the definition):

Addition of vectors is commutative and associative.

There is a zero vector (namely, the vector with all entries zero), and every vector has an additive inverse.

Scalar multiplication distributes over addition of both vectors and scalars.

1.2 Formal Definition of Vector Spaces

The two operations of addition and scalar multiplication (and the various algebraic properties they satisfy)
are the key properties of vectors in R^n. We would like to investigate other collections of things which possess
the same properties as vectors in R^n.

Definition: A (real) vector space is a collection V of vectors together with two binary operations, addition of
vectors (+) and scalar multiplication of a vector by a real number (·), satisfying the following axioms. Let
~v, ~v1, ~v2, ~v3 be any vectors and α, α1, α2 be any (real number) scalars.

Note: The statement that + and · are binary operations means that ~v1 + ~v2 and α · ~v are always defined.

[A1] Addition is commutative: ~v1 + ~v2 = ~v2 + ~v1.
[A2] Addition is associative: (~v1 + ~v2) + ~v3 = ~v1 + (~v2 + ~v3).
[A3] There exists a zero vector ~0, with ~v + ~0 = ~v.
[A4] Every vector ~v has an additive inverse −~v, with ~v + (−~v) = ~0.
[M1] Scalar multiplication is consistent with regular multiplication: α1 · (α2 · ~v) = (α1 α2) · ~v.
[M2] Addition of scalars distributes: (α1 + α2) · ~v = α1 · ~v + α2 · ~v.
[M3] Addition of vectors distributes: α · (~v1 + ~v2) = α · ~v1 + α · ~v2.
[M4] The scalar 1 acts like the identity on vectors: 1 · ~v = ~v.

Important Remark: One may also consider vector spaces where the collection of scalars is something other
than the real numbers. For example, there is an equally important notion of a complex vector space,
whose scalars are the complex numbers. (The axioms are the same.)

We will principally consider real vector spaces, in which the scalars are the real numbers.

The most general notion of a vector space involves scalars from a field, which is a collection of numbers
possessing addition and multiplication operations which are commutative, associative, and distributive, with
an additive identity 0 and a multiplicative identity 1, such that every element has an additive inverse and
every nonzero element has a multiplicative inverse.

Aside from the real and complex numbers, another example of a field is the rational numbers (i.e.,
fractions). One can formulate an equally interesting theory of vector spaces over the rational numbers.

Examples: Here are some examples of vector spaces:

The vectors in R^n are a vector space, for any n > 0. (This had better be true!)

In particular, if we take n = 1, then we see that the real numbers themselves are a vector space.

Note: For simplicity I will demonstrate all of the axioms for vectors in R^2; there, the vectors are of
the form ⟨x, y⟩ and scalar multiplication is defined as α · ⟨x, y⟩ = ⟨αx, αy⟩.

[A1]: We have ⟨x1, y1⟩ + ⟨x2, y2⟩ = ⟨x1 + x2, y1 + y2⟩ = ⟨x2, y2⟩ + ⟨x1, y1⟩.
[A2]: We have (⟨x1, y1⟩ + ⟨x2, y2⟩) + ⟨x3, y3⟩ = ⟨x1 + x2 + x3, y1 + y2 + y3⟩ = ⟨x1, y1⟩ + (⟨x2, y2⟩ + ⟨x3, y3⟩).
[A3]: The zero vector is ⟨0, 0⟩, and clearly ⟨x, y⟩ + ⟨0, 0⟩ = ⟨x, y⟩.
[A4]: The additive inverse of ⟨x, y⟩ is ⟨−x, −y⟩, since ⟨x, y⟩ + ⟨−x, −y⟩ = ⟨0, 0⟩.
[M1]: We have α1 · (α2 · ⟨x, y⟩) = ⟨α1 α2 x, α1 α2 y⟩ = (α1 α2) · ⟨x, y⟩.
[M2]: We have (α1 + α2) · ⟨x, y⟩ = ⟨(α1 + α2) x, (α1 + α2) y⟩ = α1 · ⟨x, y⟩ + α2 · ⟨x, y⟩.
[M3]: We have α · (⟨x1, y1⟩ + ⟨x2, y2⟩) = ⟨α(x1 + x2), α(y1 + y2)⟩ = α · ⟨x1, y1⟩ + α · ⟨x2, y2⟩.
[M4]: Finally, we have 1 · ⟨x, y⟩ = ⟨x, y⟩.

The zero space with a single element ~0, with ~0 + ~0 = ~0 and α · ~0 = ~0 for every α, is a vector space.

All of the axioms in this case eventually boil down to ~0 = ~0. This space is rather boring: since it only
contains one element, there's really not much to say about it.

The set of m × n matrices, for any m and any n, forms a vector space.

The various algebraic properties we know about matrix addition give [A1] and [A2] along with [M1],
[M2], [M3], and [M4].

The zero vector in this vector space is the zero matrix (with all entries zero), and [A3] and [A4]
follow easily.

Note of course that in some cases we can also multiply matrices by other matrices. However, the
requirements for being a vector space don't care that we can multiply matrices by other matrices!
(All we need to be able to do is add them and multiply them by scalars.)

The complex numbers a + bi, where i^2 = −1, are a vector space.

The axioms all follow from the standard properties of complex numbers. As might be expected, the
zero vector is just the complex number 0 = 0 + 0i.

Again, note that the complex numbers have more structure to them, because we can also multiply
two complex numbers, and the multiplication is also commutative, associative, and distributive over
addition. However, the requirements for being a vector space don't care that the complex numbers
have these additional properties.

The collection of all real-valued functions on any part of the real line is a vector space, where we define
the sum of two functions as (f + g)(x) = f(x) + g(x) for every x, and scalar multiplication as (α f)(x) = α f(x).

To illustrate: if f(x) = x and g(x) = x^2, then f + g is the function with (f + g)(x) = x + x^2, and 2f is
the function with (2f)(x) = 2x.

The axioms follow from the properties of functions and real numbers. The zero vector in this space
is the zero function; namely, the function z which has z(x) = 0 for every x.

For example (just to demonstrate a few of the axioms), for any value of x in [a, b] and any functions f
and g, we have

[A1]: (f + g)(x) = f(x) + g(x) = g(x) + f(x) = (g + f)(x).
[M3]: (α · (f + g))(x) = α f(x) + α g(x) = (α f)(x) + (α g)(x).
[M4]: (1 · f)(x) = f(x).
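Aside (computational sketch): the two function-space operations can be written out in a few lines of Python. The helper names add_funcs and scale_func below are purely illustrative choices, not standard library functions.

    def add_funcs(f, g):
        # pointwise sum: (f + g)(x) = f(x) + g(x)
        return lambda x: f(x) + g(x)

    def scale_func(alpha, f):
        # pointwise scaling: (alpha * f)(x) = alpha * f(x)
        return lambda x: alpha * f(x)

    f = lambda x: x       # f(x) = x
    g = lambda x: x**2    # g(x) = x^2
    h = add_funcs(f, g)   # h(x) = x + x^2

    print(h(3))                  # 12
    print(scale_func(2, f)(3))   # 6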

There are many simple algebraic properties that can be derived from the axioms (and thus are true in every
vector space), using some amount of cleverness. For example:

1. Addition has a cancellation law: for any vector ~v, if ~v + ~a = ~v + ~b then ~a = ~b.

Idea: Add −~v to both sides and then use [A1]-[A4] to rearrange (~v + ~a) + (−~v) = (~v + ~b) + (−~v) to ~a = ~b.

2. The zero vector is unique: for any vector ~v, if ~v + ~a = ~v, then ~a = ~0.

Idea: Use property (1) applied when ~b = ~0.

3. The additive inverse is unique: for any vector ~v, if ~v + ~a = ~0, then ~a = −~v.

Idea: Use property (1) applied when ~b = −~v.

4. The scalar 0 times any vector gives the zero vector: 0 · ~v = ~0 for any vector ~v.

Idea: Expand ~v = (1 + 0) · ~v = ~v + 0 · ~v via [M2] and then apply property (2).

5. Any scalar times the zero vector is the zero vector: α · ~0 = ~0 for any scalar α.

Idea: Expand α · ~0 = α · (~0 + ~0) = α · ~0 + α · ~0 via [M3] and then apply property (2).

6. The scalar −1 times any vector gives the additive inverse: (−1) · ~v = −~v for any vector ~v.

Idea: Use [M2] and [M4] to write ~0 = 0 · ~v = (1 + (−1)) · ~v = ~v + (−1) · ~v, and then use property (3) with ~a = (−1) · ~v.

7. The additive inverse of the additive inverse is the original vector: −(−~v) = ~v for any vector ~v.

Idea: Use property (6) and [M1], [M4] to write −(−~v) = (−1) · (−~v) = (−1) · ((−1) · ~v) = ((−1)(−1)) · ~v = 1 · ~v = ~v.

1.3 Subspaces

Definition: A subspace W of a vector space V is a subset of the vector space V which, under the same addition
and scalar multiplication operations as V, is itself a vector space.

Very often, if we want to check that something is a vector space, it is much easier to verify that it
is a subspace of something else we already know is a vector space.

We will make use of this idea when we talk about the solutions to a homogeneous linear differential
equation (see the examples below), and prove that the solutions form a vector space merely by
checking that they are a subspace of the set of all functions, rather than going through all of the
axioms.

We are aided by the following criterion, which tells us exactly what properties a subspace must satisfy:

Theorem (Subspace Criterion): To check that W is a subspace of V, it is enough to check the following three
properties:

[S1] W contains the zero vector of V.
[S2] W is closed under addition: For any ~w1 and ~w2 in W, the vector ~w1 + ~w2 is also in W.
[S3] W is closed under scalar multiplication: For any scalar α and ~w in W, the vector α · ~w is also in W.

The reason we don't need to check everything to verify that a collection of vectors forms a subspace is that
most of the axioms will automatically be satisfied in W because they're true in V.

As long as all of the operations are defined, axioms [A1]-[A2] and [M1]-[M4] will hold in W because they
hold in V. But we need to make sure we can always add and scalar-multiply, which is why we need [S2]
and [S3].

In order to get axiom [A3] for W, we need to know that the zero vector is in W, which is why we need
[S1].

In order to get axiom [A4] for W, we can use the result that (−1) · ~w = −~w, to see that closure under
scalar multiplication automatically gives additive inverses.

Remark: Any vector space V automatically has two easy subspaces: the entire space V, and the trivial
subspace consisting only of the zero vector.

Examples: Here is a rather long list of examples of less trivial subspaces (of vector spaces which are of interest
to us):

The vectors of the form ⟨t, t, t⟩ are a subspace of R^3. [This is the line x = y = z.]

[S1]: The zero vector is of this form: take t = 0.
[S2]: We have ⟨t1, t1, t1⟩ + ⟨t2, t2, t2⟩ = ⟨t1 + t2, t1 + t2, t1 + t2⟩, which is again of the same form if
we take t = t1 + t2.
[S3]: We have α · ⟨t1, t1, t1⟩ = ⟨αt1, αt1, αt1⟩, which is again of the same form if we take t = αt1.

The vectors of the form ⟨s, t, 0⟩ are a subspace of R^3. [This is the xy-plane, aka the plane z = 0.]

[S1]: The zero vector is of this form: take s = t = 0.
[S2]: We have ⟨s1, t1, 0⟩ + ⟨s2, t2, 0⟩ = ⟨s1 + s2, t1 + t2, 0⟩, which is again of the same form, if we take
s = s1 + s2 and t = t1 + t2.
[S3]: We have α · ⟨s1, t1, 0⟩ = ⟨αs1, αt1, 0⟩, which is again of the same form, if we take s = αs1 and
t = αt1.

The vectors ⟨x, y, z⟩ with 2x − y + z = 0 are a subspace of R^3.

[S1]: The zero vector is of this form, since 2(0) − 0 + 0 = 0.
[S2]: If ⟨x1, y1, z1⟩ and ⟨x2, y2, z2⟩ have 2x1 − y1 + z1 = 0 and 2x2 − y2 + z2 = 0, then adding the
equations shows that the sum ⟨x1 + x2, y1 + y2, z1 + z2⟩ also lies in the space.
[S3]: If ⟨x1, y1, z1⟩ has 2x1 − y1 + z1 = 0, then scaling the equation by α shows that ⟨αx1, αy1, αz1⟩
also lies in the space.

More generally, the collection of solution vectors ⟨x1, ..., xn⟩ to any homogeneous equation, or system
of homogeneous equations, forms a subspace of R^n.

It is possible to check this directly by working with equations. But it is much easier to use matrices:
write the system in matrix form, as A~x = ~0, where ~x = ⟨x1, ..., xn⟩ is a solution vector.

[S1]: We have A~0 = ~0, by the properties of the zero vector.
[S2]: If ~x and ~y are two solutions, the properties of matrix arithmetic imply A(~x + ~y) = A~x + A~y =
~0 + ~0 = ~0, so that ~x + ~y is also a solution.
[S3]: If α is a scalar and ~x is a solution, then A(α · ~x) = α(A~x) = α · ~0 = ~0, so that α · ~x is also a
solution.
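Aside (computational sketch): the same closure properties can be observed numerically. Assuming numpy and scipy are available (the matrix below is an arbitrary example, not one from the handout):

    import numpy as np
    from scipy.linalg import null_space

    # coefficient matrix of a homogeneous system A x = 0
    A = np.array([[1.0, 2.0, -1.0, 0.0],
                  [0.0, 1.0,  1.0, -1.0]])

    N = null_space(A)          # columns form a basis of the solution space
    x, y = N[:, 0], N[:, 1]    # two particular solutions

    # sums and scalar multiples of solutions are again solutions
    print(np.allclose(A @ (x + y), 0))   # True
    print(np.allclose(A @ (3 * x), 0))   # True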

The collection of 2 × 2 matrices of the form [ a  b ; 0  a ] is a subspace of the space of all 2 × 2 matrices.

[S1]: The zero matrix is of this form, with a = b = 0.
[S2]: We have [ a1  b1 ; 0  a1 ] + [ a2  b2 ; 0  a2 ] = [ a1 + a2  b1 + b2 ; 0  a1 + a2 ], which is also of this form.
[S3]: We have α · [ a1  b1 ; 0  a1 ] = [ αa1  αb1 ; 0  αa1 ], which is also of this form.

The collection of complex numbers of the form a + 2ai is a subspace of the complex numbers.

The three requirements should be second nature by now!

The collection of continuous functions on [a, b] is a subspace of the space of all functions on [a, b].

[S1]: The zero function is continuous.
[S2]: The sum of two continuous functions is continuous, from basic calculus.
[S3]: The product of continuous functions is continuous, so in particular a constant times a continuous
function is continuous.

The collection of n-times differentiable functions on [a, b] is a subspace of the space of continuous functions
on [a, b], for any positive integer n.

The zero function is differentiable, as are the sum and product of any two functions which are
differentiable n times.

The collection of all polynomials is a vector space.

Observe that polynomials are functions on the entire real line. Therefore, it is sufficient to verify the
subspace criteria.

The zero function is a polynomial, as is the sum of two polynomials, and any scalar multiple of a
polynomial.

The collection of solutions to the (homogeneous, linear) differential equation y'' + 6y' + 5y = 0 forms a
vector space.

We show this by verifying that the solutions form a subspace of the space of all functions.

[S1]: The zero function is a solution.
[S2]: If y1 and y2 are solutions, then y1'' + 6y1' + 5y1 = 0 and y2'' + 6y2' + 5y2 = 0, so adding and using
properties of derivatives shows that (y1 + y2)'' + 6(y1 + y2)' + 5(y1 + y2) = 0, so y1 + y2 is also a
solution.
[S3]: If α is a scalar and y1 is a solution, then scaling y1'' + 6y1' + 5y1 = 0 by α and using properties
of derivatives shows that (αy1)'' + 6(αy1)' + 5(αy1) = 0, so αy1 is also a solution.

Note: Observe that we can say something about what the set of solutions to this equation looks like,
namely that it is a vector space, without actually solving it!

For completeness, the solutions are y = A e^(−x) + B e^(−5x) for any constants A and B. From here,
if we wanted to, we could directly verify that such functions form a vector space.
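Aside (computational sketch): the claimed solutions can be verified symbolically. Assuming sympy is available, the check below plugs y = A e^(−x) + B e^(−5x) back into the equation:

    import sympy as sp

    x, A, B = sp.symbols('x A B')
    y = A * sp.exp(-x) + B * sp.exp(-5 * x)

    # y'' + 6y' + 5y should simplify to 0 for every choice of A and B
    residual = sp.diff(y, x, 2) + 6 * sp.diff(y, x) + 5 * y
    print(sp.simplify(residual))   # 0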

The collection of solutions to any nth-order homogeneous linear differential equation
y^(n) + Pn(x) y^(n−1) + ... + P2(x) y' + P1(x) y = 0, for continuous functions P1(x), ..., Pn(x), forms a
vector space.

Note that y^(n) means the nth derivative of y.

As in the previous example, we show this by verifying that the solutions form a subspace of the
space of all functions.

[S1]: The zero function is a solution.
[S2]: If y1 and y2 are solutions, then adding the equations y1^(n) + Pn(x) y1^(n−1) + ... + P1(x) y1 = 0
and y2^(n) + Pn(x) y2^(n−1) + ... + P1(x) y2 = 0 and using properties of derivatives shows that
(y1 + y2)^(n) + Pn(x) (y1 + y2)^(n−1) + ... + P1(x) (y1 + y2) = 0, so y1 + y2 is also a solution.
[S3]: If α is a scalar and y1 is a solution, then scaling y1^(n) + Pn(x) y1^(n−1) + ... + P2(x) y1' + P1(x) y1 = 0
by α and using properties of derivatives shows that (αy1)^(n) + Pn(x) (αy1)^(n−1) + ... + P1(x) (αy1) = 0,
so αy1 is also a solution.

Note: This example is a fairly significant amount of the reason we are interested in linear algebra
(as it relates to differential equations): because the solutions to homogeneous linear differential
equations form a vector space. In general, for arbitrary functions P1(x), ..., Pn(x), it is not possible
to solve the differential equation explicitly for y; nonetheless, we can still say something about what
the solutions look like.

1.4 Span, Independence, Bases, Dimension

One thing we would like to know, now that we have the definition of a vector space and a subspace, is what
else we can say about elements of a vector space; that is, we would like to know what kind of structure the
elements of a vector space have.

In some of the earlier examples we saw that, in R^n and a few other vector spaces, subspaces could all be
written down in terms of one or more parameters. In order to discuss this idea more precisely, we first need
some terminology.

1.4.1 Linear Combinations and Span

Definition: Given a set ~v1, ..., ~vn of vectors, we say a vector ~w is a linear combination of ~v1, ..., ~vn if there
exist scalars a1, ..., an such that ~w = a1 · ~v1 + ... + an · ~vn.

Example: In R^2, the vector ⟨1, 1⟩ is a linear combination of ⟨1, 0⟩ and ⟨0, 1⟩, because ⟨1, 1⟩ = 1 · ⟨1, 0⟩ + 1 · ⟨0, 1⟩.

Non-Example: In R^3, the vector ⟨0, 0, 1⟩ is not a linear combination of ⟨1, 1, 0⟩ and ⟨0, 1, 1⟩ because there
exist no scalars a1 and a2 for which a1 · ⟨1, 1, 0⟩ + a2 · ⟨0, 1, 1⟩ = ⟨0, 0, 1⟩: this would require a common
solution to the three equations a1 = 0, a1 + a2 = 0, and a2 = 1, and this system has no solution.

Example: In R^4, the vector ⟨4, 0, 5, 9⟩ is a linear combination of ⟨1, −1, 2, 3⟩, ⟨0, 1, 0, 0⟩, and ⟨1, 1, 1, 2⟩,
because ⟨4, 0, 5, 9⟩ = 1 · ⟨1, −1, 2, 3⟩ − 2 · ⟨0, 1, 0, 0⟩ + 3 · ⟨1, 1, 1, 2⟩.
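Aside (computational sketch): deciding whether a vector is a linear combination of given vectors amounts to solving a linear system whose columns are those vectors. Assuming numpy is available, the sketch below applies this to the R^3 non-example above (the helper name in_span is purely illustrative, and the second test vector is an extra example not taken from the handout):

    import numpy as np

    # columns are the vectors <1,1,0> and <0,1,1>
    V = np.column_stack(([1, 1, 0], [0, 1, 1]))

    def in_span(V, w):
        # w is a linear combination of the columns of V exactly when the
        # least-squares solution reproduces w exactly (zero residual)
        coeffs, *_ = np.linalg.lstsq(V, w, rcond=None)
        return np.allclose(V @ coeffs, w)

    print(in_span(V, np.array([0, 0, 1])))   # False: not a linear combination
    print(in_span(V, np.array([1, 2, 1])))   # True: 1*<1,1,0> + 1*<0,1,1>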

Definition: We define the span of vectors ~v1, ..., ~vn, denoted span(~v1, ..., ~vn), to be the set W of all vectors
which are linear combinations of ~v1, ..., ~vn. Explicitly, the span is the set of vectors of the form
a1 · ~v1 + ... + an · ~vn, for some scalars a1, ..., an.

Remark 1: The span is always a subspace: the zero vector can be written as 0 · ~v1 + ... + 0 · ~vn, and
the span is closed under addition and scalar multiplication.

Remark 2: The span is, in fact, the smallest subspace W containing the vectors ~v1, ..., ~vn: for any
scalars a1, ..., an, closure under scalar multiplication requires each of a1~v1, a2~v2, ..., an~vn to be in W.
Then closure under vector addition forces the sum a1~v1 + ... + an~vn to be in W.

Remark 3: For technical reasons, we define the span of the empty set to be the zero vector.

Example: The span of the vectors ⟨1, 0, 0⟩ and ⟨0, 1, 0⟩ in R^3 is the set of vectors of the form
a · ⟨1, 0, 0⟩ + b · ⟨0, 1, 0⟩ = ⟨a, b, 0⟩.

Equivalently, the span of these vectors is the set of vectors whose z-coordinate is zero, i.e., the plane z = 0.

Definition: Given a vector space V, if the span of vectors ~v1, ..., ~vn is all of V, we say that ~v1, ..., ~vn are a
generating set for V, or that they generate V.

Example: The three vectors ⟨1, 0, 0⟩, ⟨0, 1, 0⟩, and ⟨0, 0, 1⟩ generate R^3, since for any vector ⟨a, b, c⟩ we
can write ⟨a, b, c⟩ = a · ⟨1, 0, 0⟩ + b · ⟨0, 1, 0⟩ + c · ⟨0, 0, 1⟩.

1.4.2 Linear Independence

Definition: We say a finite set of vectors ~v1, ..., ~vn is linearly independent if a1 · ~v1 + ... + an · ~vn = ~0 implies
a1 = ... = an = 0. (Otherwise, we say the collection is linearly dependent.)

Note: For an infinite set of vectors, we say it is linearly independent if every finite subset is linearly
independent (per the definition above); otherwise (if some finite subset displays a dependence) we say it
is dependent.

In other words, ~v1, ..., ~vn are linearly independent precisely when the only way to form the zero vector
as a linear combination of ~v1, ..., ~vn is to have all the scalars equal to zero.

An equivalent way of thinking of linear (in)dependence is that a set is dependent if one of the vectors is a
linear combination of the others, i.e., it depends on the others. Explicitly, if a1 · ~v1 + a2 · ~v2 + ... + an · ~vn = ~0
and a1 ≠ 0, then we can rearrange to see that ~v1 = −(1/a1) · (a2 · ~v2 + ... + an · ~vn).

Example: The vectors ⟨1, 1, 0⟩ and ⟨0, 2, 1⟩ in R^3 are linearly independent, because if we have scalars a
and b with a · ⟨1, 1, 0⟩ + b · ⟨0, 2, 1⟩ = ⟨0, 0, 0⟩, then comparing the two sides requires a = 0, a + 2b = 0,
and b = 0, which has only the solution a = b = 0.

Example: The vectors ⟨1, 1, 0⟩ and ⟨2, 2, 0⟩ in R^3 are linearly dependent, because we can write 2 · ⟨1, 1, 0⟩ +
(−1) · ⟨2, 2, 0⟩ = ⟨0, 0, 0⟩. Or, in the equivalent formulation, we have ⟨2, 2, 0⟩ = 2 · ⟨1, 1, 0⟩.

Example: The vectors ⟨1, 0, 2, 2⟩, ⟨2, −2, 0, 3⟩, ⟨0, 3, 3, 1⟩, and ⟨0, 4, 2, 1⟩ in R^4 are linearly dependent,
because we can write 2 · ⟨1, 0, 2, 2⟩ + (−1) · ⟨2, −2, 0, 3⟩ + (−2) · ⟨0, 3, 3, 1⟩ + 1 · ⟨0, 4, 2, 1⟩ = ⟨0, 0, 0, 0⟩.

Theorem: The vectors ~v1, ..., ~vn are linearly independent if and only if every vector ~w in the span of ~v1, ..., ~vn
may be uniquely written as a sum ~w = a1 · ~v1 + a2 · ~v2 + ... + an · ~vn.

For one direction, if the decomposition is always unique, then a1 · ~v1 + a2 · ~v2 + ... + an · ~vn = ~0 implies
a1 = ... = an = 0, because 0 · ~v1 + ... + 0 · ~vn = ~0 is by assumption the only decomposition of ~0.

For the other direction, suppose we had two different ways of decomposing a vector ~w, say as ~w =
a1 · ~v1 + a2 · ~v2 + ... + an · ~vn and ~w = b1 · ~v1 + b2 · ~v2 + ... + bn · ~vn.

Then subtracting and then rearranging the difference between these two equations yields ~w − ~w =
(a1 − b1) · ~v1 + ... + (an − bn) · ~vn.

Now ~w − ~w is the zero vector, so we have (a1 − b1) · ~v1 + ... + (an − bn) · ~vn = ~0. But because ~v1, ..., ~vn
are linearly independent, we see that all of the scalar coefficients a1 − b1, ..., an − bn are zero. But this says
a1 = b1, a2 = b2, ..., an = bn, which is to say, the two decompositions are actually the same.
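Aside (computational sketch): for vectors in R^n, linear independence can be tested by computing the rank of the matrix having the vectors as columns. Assuming numpy is available, applied to the examples above:

    import numpy as np

    # independent pair: the rank equals the number of vectors
    A = np.column_stack(([1, 1, 0], [0, 2, 1]))
    print(np.linalg.matrix_rank(A))   # 2, so the two vectors are independent

    # dependent set from the R^4 example: the rank drops below 4
    B = np.column_stack(([1, 0, 2, 2], [2, -2, 0, 3], [0, 3, 3, 1], [0, 4, 2, 1]))
    print(np.linalg.matrix_rank(B))   # 3, so the four vectors are dependent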

1.4.3 Bases and Dimension

Definition: A linearly independent set of vectors which generate V is called a basis for V.

Terminology Note: The plural form of the (singular) word basis is bases.

Example: The three vectors ⟨1, 0, 0⟩, ⟨0, 1, 0⟩, and ⟨0, 0, 1⟩ generate R^3, as we saw above. They are also
linearly independent, since a · ⟨1, 0, 0⟩ + b · ⟨0, 1, 0⟩ + c · ⟨0, 0, 1⟩ is the zero vector only when a = b = c = 0.
Thus, these three vectors are a basis for R^3.

Example: More generally, in R^n, the standard unit vectors e1, e2, ..., en (where ej has a 1 in the jth
coordinate and 0s elsewhere) are a basis.

Non-Example: The vectors ⟨1, 1, 0⟩ and ⟨0, 2, 1⟩ in R^3 are not a basis, as they fail to generate V: it is
not possible to obtain the vector ⟨1, 0, 0⟩ as a linear combination of ⟨1, 1, 0⟩ and ⟨0, 2, 1⟩.

Non-Example: The vectors ⟨1, 0, 0⟩, ⟨0, 1, 0⟩, ⟨0, 0, 1⟩, and ⟨1, 1, 1⟩ in R^3 are not a basis, as they are
linearly dependent: we have 1 · ⟨1, 0, 0⟩ + 1 · ⟨0, 1, 0⟩ + 1 · ⟨0, 0, 1⟩ + (−1) · ⟨1, 1, 1⟩ = ⟨0, 0, 0⟩.

Example: The polynomials 1, x, x^2, x^3, ... are a basis for the vector space of all polynomials.

First observe that 1, x, x^2, x^3, ... certainly generate the set of all polynomials (by definition of a
polynomial).

Now we want to see that these polynomials are linearly independent. So suppose we had scalars
a0, a1, ..., an such that a0 · 1 + a1 x + ... + an x^n = 0, for all values of x.

Then if we take the nth derivative of both sides (which is allowable because a0 · 1 + a1 x + ... + an x^n = 0
is assumed to be true for all x), then we obtain n! · an = 0, from which we see that an = 0.

Then repeat by taking the (n−1)st derivative to see a_{n−1} = 0, and so on, until finally we are left with
just a0 = 0. Hence the only way to form the zero function as a linear combination of 1, x, x^2, ..., x^n
is with all coefficients zero, which says that 1, x, x^2, x^3, ... is a linearly independent set.

Theorem: A collection of n vectors ~v1, ..., ~vn in R^n is a basis if and only if the n × n matrix B, whose columns
are the vectors ~v1, ..., ~vn, is an invertible matrix.

The idea behind the theorem is to multiply out and compare coordinates, and then analyze the resulting
system of equations.

So suppose we are looking for scalars a1, ..., an such that a1~v1 + ... + an~vn = ~w, for some vector ~w in R^n.
This vector equation is the same as the matrix equation B · ~a = ~w, where B is the matrix whose columns
are the vectors ~v1, ..., ~vn, ~a is the column vector whose entries are the scalars a1, ..., an, and ~w is
thought of as a column vector.

Now from what we know about matrix equations, we know that B is an invertible matrix precisely when
B · ~a = ~w has a unique solution for every ~w.

But having a unique way to write any vector as a linear combination of vectors in a set is precisely the
statement that the set is a basis. So we are done.
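Aside (computational sketch): the theorem gives a practical test. Assuming numpy is available, put the candidate vectors into the columns of a square matrix and check invertibility, e.g. via the determinant:

    import numpy as np

    B1 = np.column_stack(([1, 0, 0], [0, 1, 0], [0, 0, 1]))   # the standard basis of R^3
    B2 = np.column_stack(([1, 1, 0], [2, 2, 0], [0, 0, 1]))   # contains a dependent pair

    print(np.linalg.det(B1))   # 1.0: invertible, so the columns form a basis
    print(np.linalg.det(B2))   # 0.0: singular, so the columns do not form a basis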

Theorem: Every vector space V has a basis. Any two bases of V contain the same number of elements. Any
generating set for V contains a basis. Any linearly independent set of vectors can be extended to a basis.

Remark: If you only remember one thing about vector spaces, remember that every vector space has a basis!

Remark: That a basis always exists is really, really, really useful. It is without a doubt the most useful
fact about vector spaces: vector spaces in the abstract are very hard to think about, but a vector space
with a basis is something very concrete (since then we know exactly what the elements of the vector
space look like).

To show the first and last parts of the theorem, we show that we can build any set of linearly independent
vectors into a basis:

Start with S being some set of linearly independent vectors. (In any vector space, the empty set is
always linearly independent.)

1. If S spans V, then we are done, because then S is a linearly independent generating set, i.e., a
basis.

2. If S does not span V, there is an element ~v of V which is not in the span of S. Then if we put ~v
in S, the new S is still linearly independent. Then start over.

To justify this statement in general, some fairly technical and advanced machinery may be needed,
but it can be proven that we will eventually land in case (1).

If V has dimension n (see below), then we will always be able to construct a basis in at most n
steps; it is in the case when V has infinite dimension that things get tricky and confusing, and the
argument requires what is called the axiom of choice.

To show the third part of the theorem, the idea is to imagine going through the list of elements in a
generating set and removing elements until it becomes linearly independent.

This idea is not so easy to formulate with an infinite list, but if we have a finite generating set,
then we can go through the elements of the generating set one at a time, throwing out an element
if it is linearly dependent with the elements that came before it. Then, once we have gotten to the
end of the generating set, the collection of elements which we have not thrown away will still be a
generating set (since removing a dependent element will not change the span), but the collection will
also now be linearly independent (since we threw away elements which were dependent).

To show the second part of the theorem, we will show that if B is a basis with n elements and a1, ..., am is
a collection of m vectors with m > n, then a1, ..., am is linearly dependent.

To see this, since B = {b1, ..., bn} is a basis, we can write every element ai as a linear combination of the
elements of B, say as ai = Σ_{j=1}^{n} c_{i,j} bj for 1 ≤ i ≤ m.

Now we would like to see that there is some linear combination of the ai which is the zero vector; that is,
some choice of scalars dk, not all zero, such that Σ_{k=1}^{m} dk ak = ~0.

If we substitute in for the vectors in B, then we obtain a linear combination of the elements of B
equalling the zero vector. Since B is a basis, this means each coefficient of bj in the resulting
expression must be zero.

If we tabulate the resulting system, we can check that it is equivalent to the matrix equation C · ~d = ~0,
where C is the n × m matrix of coefficients (whose (j, k) entry is c_{k,j}) and ~d is the m × 1 column vector
whose entries are the scalars dk.

Now since C is a matrix which has more columns than rows, by the assumption that m > n, we see
that the homogeneous system C · ~d = ~0 has a solution vector ~d which is not the zero vector.

But then we have Σ_{k=1}^{m} dk ak = ~0 for scalars dk not all zero, so the vectors a1, ..., am are linearly dependent.

Definition: We define the number of elements in any basis of V to be the dimension of V.

The theorem above assures us that this quantity is always well-defined.

Example: The dimension of R^n is n, since the standard unit vectors form a basis.

This says that the term dimension is reasonable, since it is the same as our usual notion of dimension.

Example: The dimension of the vector space of m × n matrices is mn, because there is a basis consisting
of the mn matrices E_{i,j}, where E_{i,j} is the matrix with a 1 in the (i, j)-entry and 0s elsewhere.

Example: The dimension of the vector space of all polynomials is infinite, because the (infinite list of)
polynomials 1, x, x^2, x^3, ... are a basis for the space.

1.5 Linear Transformations

Now that we have a reasonably good idea of what the structure of a vector space is, the next natural question
is: what do maps from one vector space to another look like?

It turns out that we don't want to ask about arbitrary functions, but about functions from one vector space
to another which preserve the structure (namely, addition and scalar multiplication) of the vector space.

The analogy to the real numbers is: once we know what the real numbers look like, what can we say
about arbitrary real-valued functions?

The answer is, not much, unless we specify that the functions preserve the structure of the real numbers,
which is abstract math-speak for saying that we want to talk about continuous functions; these turn
out to behave much more nicely.

This is the idea behind the definition of a linear transformation: it is a map that preserves the structure of a
vector space.

Definition: If V and W are vector spaces, we say a map T from V to W (denoted T : V → W) is a
linear transformation if, for any vectors ~v, ~v1, ~v2 and scalar α, we have the two properties:

[T1] The map respects addition of vectors: T(~v1 + ~v2) = T(~v1) + T(~v2).
[T2] The map respects scalar multiplication: T(α · ~v) = α · T(~v).

Remark: Like with the definition of a vector space, one can show a few simple algebraic properties of linear
transformations. For example, any linear transformation sends the zero vector (of V) to the zero vector
(of W).

Example: If V = W = R^2, then the map T which sends ⟨x, y⟩ to ⟨x, x + y⟩ is a linear transformation.

Let ~v = ⟨x, y⟩, ~v1 = ⟨x1, y1⟩, and ~v2 = ⟨x2, y2⟩, so that ~v1 + ~v2 = ⟨x1 + x2, y1 + y2⟩.

[T1]: We have T(~v1 + ~v2) = ⟨x1 + x2, x1 + x2 + y1 + y2⟩ = ⟨x1, x1 + y1⟩ + ⟨x2, x2 + y2⟩ = T(~v1) + T(~v2).
[T2]: We have T(α · ~v) = ⟨αx, αx + αy⟩ = α · ⟨x, x + y⟩ = α · T(~v).

More General Example: If V = W = R^2, then the map which sends ⟨x, y⟩ to ⟨ax + by, cx + dy⟩ is a linear
transformation.

Just like in the previous example, we can work out the calculations explicitly.

But another way we can think of this map is as a matrix map: this map sends the column vector [ x ; y ]
to the column vector [ ax + by ; cx + dy ] = [ a  b ; c  d ] · [ x ; y ]. So, in fact, this map is really just
(left) multiplication by the matrix [ a  b ; c  d ].

When we think of the map in this way, it is easier to see what is happening:

[T1]: We have T(~v1 + ~v2) = [ a  b ; c  d ] · (~v1 + ~v2) = [ a  b ; c  d ] · ~v1 + [ a  b ; c  d ] · ~v2 = T(~v1) + T(~v2).
[T2]: Also, T(α · ~v) = [ a  b ; c  d ] · (α · ~v) = α · ([ a  b ; c  d ] · ~v) = α · T(~v).

Really General Example: If V = R^m (thought of as m × 1 matrices) and W = R^n (thought of as n × 1
matrices) and A is any n × m matrix, then the map T sending ~v to A · ~v is a linear transformation.

The verification is exactly the same as in the previous example.

[T1]: We have T(~v1 + ~v2) = A · (~v1 + ~v2) = A · ~v1 + A · ~v2 = T(~v1) + T(~v2).
[T2]: Also, T(α · ~v) = A · (α · ~v) = α (A · ~v) = α · T(~v).

This last example is very general: in fact, it is so general that every linear transformation from R^m to R^n is
of this form! Namely, if T is a linear transformation from R^m to R^n, then there is some n × m matrix A such
that T acts by sending ~v to A · ~v.

The reason is actually very simple, and it is easy to write down what the matrix A is: it is just the n × m
matrix whose columns are the vectors T(e1), T(e2), ..., T(em), where e1, ..., em are the standard
basis elements of R^m (ej is the vector with a 1 in the jth position and 0s elsewhere).

To see that this choice of A works, note that every vector ~v in R^m can be written as a unique linear
combination ~v = Σ_{j=1}^{m} aj ej of the basis elements. Then, after applying the two properties of a linear
transformation, we obtain T(~v) = Σ_{j=1}^{m} aj T(ej). But this is precisely the matrix product of the matrix A
with the coefficients of ~v.
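Aside (computational sketch): this recipe is easy to carry out by machine. Assuming numpy is available, the snippet below builds the matrix of the earlier map T(⟨x, y⟩) = ⟨x, x + y⟩ from the images of the standard basis vectors and checks it on an arbitrary input:

    import numpy as np

    def T(v):
        # the linear map T(<x, y>) = <x, x + y> from the earlier example
        x, y = v
        return np.array([x, x + y])

    # the columns of A are T(e1) and T(e2)
    e1, e2 = np.array([1, 0]), np.array([0, 1])
    A = np.column_stack((T(e1), T(e2)))
    print(A)                             # [[1 0], [1 1]]

    v = np.array([3, -2])
    print(np.array_equal(A @ v, T(v)))   # True: A reproduces T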

Tangential Remark: If we write down the map explicitly, we see that each coordinate of the output in W is a
linear function of the coordinates in V; e.g., if A = [ a  b ; c  d ], then the linear functions are ax + by
and cx + dy.

This is the reason that linear transformations are named so: because they are really just
linear functions, in the traditional sense.

In fact, we can state something even more general: the argument above shows that, if we take any m-
dimensional vector space V and any n-dimensional vector space W and choose bases for each space, then a
linear transformation from V to W behaves just like multiplication by (some) n × m matrix A.

Remark 1: This result underlines one of the reasons that matrices and vector spaces (which initially seem
like they have almost nothing to do with one another) are actually closely related: because matrices
describe the maps from one vector space to another.

Remark 2: One can also use this relationship between maps on vector spaces and matrices to provide
almost trivial proofs of some of the algebraic properties of matrix multiplication which are hard to prove
by direct computation.

For example: the composition of linear transformations is associative (because linear transformations
are functions, and function composition is associative). Multiplication of matrices is the same as
composition of functions. Hence multiplication of matrices is associative.

1.5.1 Kernel and Image

Definition: If T : V → W is a linear transformation, then the kernel of T, denoted ker(T), is the set of
elements v in V with T(v) = ~0. The image of T, denoted im(T), is the set of elements w in W such that there
exists a v in V with T(v) = w.

Intuitively, the kernel is the elements which are sent to zero by T, and the image is the elements in W
which are hit by T (the range of T).

Essentially (see below), the kernel measures how far from one-to-one the map T is, and the image
measures how far from onto the map T is.

One of the reasons we care about these subspaces is that (for example) the set of solutions to a set of
homogeneous linear equations A~x = ~0 is the kernel of the linear transformation of multiplication by A.
Another reason is that they will say something about the subspace structure of V and W (see below).

The kernel is a subspace of V.

[S1] We have T(~0) = ~0, by simple properties of linear transformations.
[S2] If v1 and v2 are in the kernel, then T(v1) = ~0 and T(v2) = ~0. Hence T(v1 + v2) = T(v1) + T(v2) =
~0 + ~0 = ~0. Therefore, v1 + v2 is also in the kernel.
[S3] If α is a scalar and v is in the kernel, then T(v) = ~0. Therefore, T(α · v) = α · T(v) = α · ~0 = ~0,
so α · v is also in the kernel.

The image is a subspace of W.

[S1] We have T(~0) = ~0, by simple properties of linear transformations.
[S2] If w1 and w2 are in the image, then there exist v1 and v2 such that T(v1) = w1 and T(v2) = w2.
Then T(v1 + v2) = T(v1) + T(v2) = w1 + w2, so that w1 + w2 is also in the image.
[S3] If α is a scalar and w is in the image, then there exists v with T(v) = w. Then T(α · v) = α · T(v) = α · w,
so α · w is also in the image.

Theorem: The kernel ker(T) consists of only the zero vector if and only if the map T is one-to-one. The image
im(T) consists of all of W if and only if the map T is onto.

The statement about the image is just the definition of onto.

If T is one-to-one, then (at most) one element of V maps to ~0. But since the zero vector is taken to the
zero vector, we see that T cannot send anything else to ~0. Thus ker(T) = {~0}.

If ker(T) is only the zero vector, then since T is a linear transformation, the statement T(v1) = T(v2) is
equivalent to the statement that T(v1) − T(v2) = T(v1 − v2) is the zero vector. But, by the definition of
the kernel, T(v1 − v2) = ~0 precisely when v1 − v2 is in the kernel. However, this means v1 − v2 = ~0, or
v1 = v2. Hence T(v1) = T(v2) implies v1 = v2, which means T is one-to-one.
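Aside (computational sketch): for a matrix map T(~v) = A~v, both subspaces are easy to compute, and doing so also previews the dimension count in the next theorem. Assuming numpy and scipy are available (the matrix is an arbitrary example):

    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])       # a 2 x 3 matrix, so dim(V) = 3

    rank = np.linalg.matrix_rank(A)       # dim(im T): dimension of the column space
    nullity = null_space(A).shape[1]      # dim(ker T): dimension of the solution space of A x = 0
    print(rank, nullity, rank + nullity)  # 1 2 3, and 1 + 2 = 3 = dim(V)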

Theorem (Rank-Nullity): For any linear transformation T : V → W, dim(ker(T)) + dim(im(T)) = dim(V).

The idea behind this theorem is that if we have a basis for im(T), say w1, ..., wk, then there exist
v1, ..., vk with T(v1) = w1, ..., T(vk) = wk. Then if a1, ..., al is a basis for ker(T), the goal is to show
that the set of vectors {v1, ..., vk, a1, ..., al} is a basis for V.

To do this, given any v, write T(v) = Σ_{j=1}^{k} βj wj = Σ_{j=1}^{k} βj T(vj) = T(Σ_{j=1}^{k} βj vj), where the βj
are unique.

Then subtraction shows that T(v − Σ_{j=1}^{k} βj vj) = ~0, so that v − Σ_{j=1}^{k} βj vj is in ker(T), hence can be written
as a sum Σ_{i=1}^{l} γi ai, where the γi are unique.

Putting all this together shows v = Σ_{j=1}^{k} βj vj + Σ_{i=1}^{l} γi ai for unique scalars βj and γi, which says that
{v1, ..., vk, a1, ..., al} is a basis for V.

1.5.2 The Derivative as a Linear Transformation

Example: If V and W are any vector spaces, and T1 and T2 are any linear transformations from V to W,
then T1 + T2 and α · T1 are also linear transformations, for any scalar α.

These follow from the criteria. (They are somewhat confusing to follow when written down, so I won't
bother.)

Example: If V is the vector space of real-valued functions and W = R, then the evaluation at 0 map taking f
to the value f(0) is a linear transformation.

[T1]: We have T(f1 + f2) = (f1 + f2)(0) = f1(0) + f2(0) = T(f1) + T(f2).
[T2]: Also, T(α f) = (α f)(0) = α f(0) = α T(f).

Note of course that being a linear transformation has nothing to do with the fact that we are evaluating
at 0. We could just as well evaluate at 1, or anywhere else, and the map would still be a linear transformation.

Example: If V and W are both the vector space of real-valued functions and P(x) is any real-valued function,
then the map taking f(x) to the function P(x) f(x) is a linear transformation.

[T1]: We have T(f1 + f2) = P(x)(f1 + f2)(x) = P(x) f1(x) + P(x) f2(x) = T(f1) + T(f2).
[T2]: Also, T(α f) = P(x)(α f)(x) = α P(x) f(x) = α T(f).

Example: If V is the vector space of all n-times differentiable functions and W is the vector space of all
functions, then the nth derivative map, taking f(x) to its nth derivative f^(n)(x), is a linear transformation.

[T1]: The nth derivative of the sum is the sum of the nth derivatives, so we have T(f1 + f2) =
(f1 + f2)^(n)(x) = f1^(n)(x) + f2^(n)(x) = T(f1) + T(f2).
[T2]: Also, T(α f) = (α f)^(n)(x) = α f^(n)(x) = α T(f).

If we combine the results from the previous four examples, we can show that if V is the vector space of all
n-times differentiable functions, then the map T which sends a function y to the function y^(n) + Pn(x) y^(n−1) +
... + P2(x) y' + P1(x) y is a linear transformation, for any functions Pn(x), ..., P1(x).

In particular, the kernel of this linear transformation is the collection of all functions y such that y^(n) +
Pn(x) y^(n−1) + ... + P2(x) y' + P1(x) y = 0, i.e., the set of solutions to this differential equation.

Note that since we know the kernel is a vector space (as it is a subspace of V), we see that the set of
solutions to y^(n) + Pn(x) y^(n−1) + ... + P2(x) y' + P1(x) y = 0 forms a vector space. (Of course, we could
just show this statement directly, by checking the subspace criteria.)

However, it is very useful to be able to think of this linear differential operator, which sends y to
y^(n) + Pn(x) y^(n−1) + ... + P2(x) y' + P1(x) y, as a linear transformation.
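Aside (computational sketch): the linearity of such a differential operator can be watched in action. Assuming sympy is available, the check below uses the sample operator T(y) = y'' + 6y' + 5y from earlier, with arbitrarily chosen test functions:

    import sympy as sp

    x = sp.symbols('x')

    def T(y):
        # the differential operator y -> y'' + 6y' + 5y
        return sp.diff(y, x, 2) + 6 * sp.diff(y, x) + 5 * y

    f = sp.sin(x)       # arbitrary test functions
    g = sp.exp(2 * x)
    alpha = 7

    print(sp.simplify(T(f + g) - (T(f) + T(g))))     # 0: respects addition
    print(sp.simplify(T(alpha * f) - alpha * T(f)))  # 0: respects scalar multiplication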

Well, you're at the end of my handout. Hope it was helpful.


Copyright notice: This material is copyright Evan Dummit, 2012. You may not reproduce or distribute this material
without my express permission.
