Uploaded by Sebastian Shaqiri

Calculus and Linear Algebra summary for first-year college students or high school students.


Summary

Sebastian Shaqiri

Contents

1.1 Limits
1.1.1 Evaluating Limits
1.1.2 Continuous Functions
1.2 Derivatives
1.2.1 Definition of the Derivative
1.2.2 Properties of Derivatives
1.2.3 General Characteristics of Differentiable Functions
1.2.4 L'Hôpital's rule
2.1 General Characteristics of Anti-derivatives
2.1.1 Partial integration
2.1.2 Variable Substitution
2.2 Integrals
2.2.1 The Riemann Integral
2.2.2 Integration of Continuous Functions
2.2.3 Properties and Estimates
2.2.4 Fundamental theorem of calculus
2.2.5 Improper Integrals
2.2.6 Integrals in Probability Theory
3 Linear Algebra
3.1 System of Linear Equations
3.2 Matrix Multiplication
3.3 The Matrix Equation Ax = b
3.3.1 Properties of the Matrix-vector product Ax
3.4 The Inverse of a Matrix
3.5 Matrix Factorizations
3.5.1 The LU Factorization
3.6 Subspaces of R^n
3.7 Eigenvectors and Eigenvalues
3.8 Diagonalization
3.9 Inner Product, Length and Orthogonality
3.9.1 The Inner Product
3.9.2 The Length of a Vector
3.9.3 Orthogonal Vectors
3.9.4 Orthogonal Sets
3.9.5 Orthogonal Projections
3.10 The Gram-Schmidt process
3.11 Least-Squares Problems
3.12 Further Reading (Optional)

Part I


1

some maximum or minimum.

Leonhard Euler,

1.1 Limits

In the case of $x \to +\infty$ we define the limit as follows.

Definition 1. Assume that $f(x)$ is a function whose domain $D_f$ contains arbitrarily large real numbers. We say that $f(x)$ has the limit $A$ as $x$ approaches infinity if for every given number $\varepsilon > 0$ there is a number $\omega$ such that

\[ x > \omega,\ x \in D_f \implies |f(x) - A| < \varepsilon. \]

This is written

\[ f(x) \to A \text{ when } x \to +\infty \]

or alternatively

\[ \lim_{x \to +\infty} f(x) = A. \]

The meaning of the definition is as follows: the function has the limit $A$ when $x \to +\infty$ if the function values $f(x)$ satisfy any given tolerance requirement of the form

\[ A - \varepsilon < f(x) < A + \varepsilon \]

as soon as $x$ is sufficiently large, that is, for all $x > \omega$. The greater the accuracy demanded (i.e. the smaller $\varepsilon$ is), the greater $\omega$ has to be chosen for the tolerance requirement to be fulfilled for all $x > \omega$.
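The interplay between the tolerance $\varepsilon$ and the threshold $\omega$ can be tried out numerically. The sketch below is only an illustration: the example function $f(x) = (2x+1)/x$ with limit $A = 2$ and the helper names `omega_for` and `within_tolerance` are assumptions made here, not part of the original text.

```python
def f(x: float) -> float:
    """Example function (an assumption) with limit A = 2 as x -> +infinity."""
    return (2 * x + 1) / x


def omega_for(eps: float) -> float:
    """For this f, |f(x) - 2| = 1/x, so omega = 1/eps is a valid threshold."""
    return 1.0 / eps


def within_tolerance(eps: float, samples: int = 1000) -> bool:
    """Check |f(x) - 2| < eps at sample points x that all lie above omega."""
    omega = omega_for(eps)
    xs = [omega * (1 + k) + 1.0 for k in range(samples)]
    return all(abs(f(x) - 2.0) < eps for x in xs)


for eps in (1e-1, 1e-3, 1e-6):
    print(eps, omega_for(eps), within_tolerance(eps))
```

A smaller `eps` forces a larger `omega`, which is exactly the trade-off described in the definition. A finite sample can of course only illustrate the definition, not prove the limit.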

The same definition applies to number sequences $(a_n)_{n=0}^{\infty}$, which of course can be seen as functions with the natural numbers as domain. For sequences, but not for other functions, there is also the following terminology: if the sequence has a limit as $n \to \infty$ it is said to be convergent, otherwise divergent.

The definition of the limit above is just one of many similar definitions that must be made. When we examine the elementary functions we have to work with

\[ f(x) \to A \text{ when } x \to -\infty, \]
\[ f(x) \to A \text{ when } x \to a, \]
\[ f(x) \to A \text{ when } x \to a^{+}, \]
\[ f(x) \to A \text{ when } x \to a^{-}. \]

These notions are defined analogously to the prototype $\lim_{x \to +\infty} f(x) = A$ treated above, and they are designated by the corresponding lim notations. For example, we have the case $x \to a$:

Let $f$ be a function and assume that every neighbourhood of the point $a$ contains points from $D_f$. Then $f$ is said to have the limit $A$ as $x$ approaches $a$ if, for every number $\varepsilon > 0$, there exists a number $\delta > 0$ such that

\[ |x - a| < \delta,\ x \in D_f \implies |f(x) - A| < \varepsilon. \]

In particular, if the point $a$ itself belongs to the domain, you can select $x = a$. The condition then reads $|f(a) - A| < \varepsilon$ for each $\varepsilon > 0$. The only possibility then is that $A = f(a)$. If $f$ is defined at $a$ and has a limit as $x \to a$, the limit thus has to be equal to the function value $f(a)$.

Limits of the type $x \to a^{+}$ and $x \to a^{-}$ are called right and left limits respectively. Their definitions are obtained by changing the condition $|x - a| < \delta$ above to $a \le x < a + \delta$ and $a - \delta < x \le a$ respectively. Evidently $f(x)$ has a limit as $x \to a$ exactly when the right and left limits exist and are equal.

We also need to introduce concepts such as

\[ f(x) \to +\infty \text{ when } x \to +\infty, \]
\[ f(x) \to +\infty \text{ when } x \to a, \]
\[ f(x) \to -\infty \text{ when } x \to a^{+}. \]

Such limits we call improper limits. They are defined by analogy with the former (proper) limits.

1.1.1 Evaluating Limits

We usually try to avoid working directly with the definition when determining limits. Instead, we try to use some basic properties together with a set of standard limits.

The basic properties that we are about to establish are perceived by most as intuitively obvious and are applied more or less automatically in problem solving. The rules are valid for all types of limits, $x \to +\infty$, $x \to a$, $x \to a^{+}$ etc., and in the formulation below we therefore make no stipulation in this regard unless it is necessary for the sake of consistency.

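Theorem 1. If $\lim f(x) = 0$ and the function $g(x)$ is bounded, it holds that $f(x)g(x) \to 0$.

Proof. Since $g$ is bounded, there exist two numbers $C > 0$ and $\omega_0$ such that

\[ x > \omega_0 \implies |g(x)| \le C. \]

Since $f(x) \to 0$, for every $\varepsilon > 0$ there is a number $\omega_1$ such that

\[ x > \omega_1 \implies |f(x)| < \frac{\varepsilon}{C}. \]

Let $\omega = \max(\omega_0, \omega_1)$; we then have

\[ x > \omega \implies |f(x)|\,|g(x)| < \frac{\varepsilon}{C} \cdot C = \varepsilon, \]

and by the definition of a limit we have that $f(x)g(x) \to 0$ when $x \to \infty$.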

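Theorem 2. If $f(x) \to A$ and $g(x) \to B$, then

\[ f(x) + g(x) \to A + B \tag{1.1} \]
\[ f(x)g(x) \to AB \tag{1.2} \]

Furthermore, if $B \ne 0$, then

\[ \frac{f(x)}{g(x)} \to \frac{A}{B}. \tag{1.3} \]

Proof. Proof of (1.1): Given $\varepsilon > 0$ there exist two numbers $\omega_1$ and $\omega_2$ such that

\[ x > \omega_1 \implies |f(x) - A| < \frac{\varepsilon}{2} \]

and

\[ x > \omega_2 \implies |g(x) - B| < \frac{\varepsilon}{2}. \]

Let $\omega = \max(\omega_1, \omega_2)$. By the triangle inequality we then get, for $x > \omega$,

\[ |f(x) + g(x) - (A + B)| = |(f(x) - A) + (g(x) - B)| \le |f(x) - A| + |g(x) - B| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon, \]

and thereby the proof is done.

Proof of (1.2): Consider the equation

\[ f(x)g(x) - AB = (f(x) - A)\,g(x) + A\,(g(x) - B). \]

Here $f(x) - A \to 0$ and $g(x) - B \to 0$, while $g(x)$ (which has a limit) and the constant $A$ are bounded for large $x$. Both terms on the right-hand side therefore tend to 0 by Theorem 1, and (1.1) then shows that $f(x)g(x) \to AB$.

Proof of (1.3): We are going to show that

\[ \frac{1}{g(x)} \to \frac{1}{B} \text{ when } x \to \infty; \]

the statement (1.3) then follows from (1.2). If $B$ is positive we get, for all sufficiently large $x$,

\[ g(x) > B - \frac{B}{2} = \frac{B}{2}, \]

and then

\[ 0 < \frac{1}{g(x)} < \frac{2}{B}, \]

which gives us

\[ \frac{1}{g(x)} - \frac{1}{B} = \frac{1}{B\,g(x)}\,(B - g(x)) \to 0 \]

by Theorem 1, since $1/(B\,g(x))$ is bounded and $B - g(x) \to 0$. The case of negative $B$ is analogous, and thus the proof is done.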

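Theorem 3 (Squeeze Theorem). If $f(x)$ and $g(x)$ have the same limit $A$ and if

\[ f(x) \le h(x) \le g(x), \]

then $\lim h(x) = A$.

Proof. Given $\varepsilon > 0$ there exist two numbers $\omega_1$ and $\omega_2$ such that

\[ x > \omega_1 \implies A - \varepsilon < f(x) < A + \varepsilon \]

and

\[ x > \omega_2 \implies A - \varepsilon < g(x) < A + \varepsilon, \]

thus

\[ A - \varepsilon < f(x) \le h(x) \le g(x) < A + \varepsilon \]

for all $x > \max(\omega_1, \omega_2)$, and by the definition this entails that $h(x)$ has the limit $A$.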
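As a small numerical illustration of the squeeze theorem, consider $h(x) = \sin(x)/x$, which lies between $-1/x$ and $1/x$ for $x > 0$; both bounds tend to 0 as $x \to +\infty$. The example function and the names `lower`, `upper` and `squeezed` are assumptions chosen for this sketch.

```python
import math


def lower(x: float) -> float:
    return -1.0 / x


def upper(x: float) -> float:
    return 1.0 / x


def h(x: float) -> float:
    return math.sin(x) / x


def squeezed(x: float) -> bool:
    """True if lower(x) <= h(x) <= upper(x) at the point x > 0."""
    return lower(x) <= h(x) <= upper(x)


for x in (10.0, 1e3, 1e6):
    print(x, lower(x), h(x), upper(x), squeezed(x))
```

Since both bounds shrink to 0, the printed values of `h(x)` are forced towards 0 as well.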

1.1.2 Continuous Functions

Definition 2. A function $f$ is said to be continuous at a point $x_0$ if $x_0$ belongs to the domain, and if the limit

\[ \lim_{x \to x_0} f(x) \]

exists.

If a function is continuous at each point in its domain it is called continuous.

Points at which a function is not continuous are often referred to as discontinuities. Sometimes we also talk about a singularity at such a point. The meaning of continuity is that a small variation of the variable $x$ only causes a small change in the function value $f(x)$. A sudden change of function values thus indicates the presence of a discontinuity.

From the basic properties of limits it immediately follows that if $f$ and $g$ are continuous functions, then so are

\[ f + g, \quad fg, \quad \frac{f}{g}, \quad f \circ g \]

in their respective domains. Since the function $f(x) = x$ is trivially continuous, it follows through repeated use of these rules that each polynomial and each rational function is continuous. We accept without rigorous proof that the process for introducing powers is such that it leads to continuous power functions and exponential functions. The geometric situation at the introduction of the trigonometric functions indicates that these are continuous: a small change in the arc length $x$ gives rise to small changes in the coordinates $\cos(x)$ and $\sin(x)$ of the corresponding point on the unit circle. The hyperbolic functions are made up of exponential functions and are therefore continuous. Finally, logarithmic and inverse functions are also continuous according to the following theorem.

Theorem 4. The inverse of a strictly monotonic and continuous function is continuous.

All the elementary functions, from polynomials to inverse functions, are thus continuous. The same is true for all functions that are built from these using addition, multiplication, division and composition.

1.2 Derivatives

There are many practical questions about how quickly a particular change takes place, such as "how fast is the car going?", "how quickly does the air pressure change with increasing height above the surface of the earth?", "how much does the tax increase with a growing income?", etc. We will equip ourselves with a mathematical tool that measures the speed of such changes. Let $f(x)$ denote the temperature distribution along a thin rod positioned along the x-axis. We assume that the temperature is measured in degrees Celsius and that the unit of length is meters. If you have full knowledge of the function $f(x)$, it should also be possible to answer the question: how fast, expressed in degrees per meter, is the temperature changing along the rod at a certain point $x_0$? In other words, there should be an expression formed only by means of $f(x)$ that reasonably measures the change in temperature per meter at a given point $x_0$. To find this expression, we first note that from the point $x_0$ to a nearby point $x_0 + h$, the temperature has changed by

\[ f(x_0 + h) - f(x_0) \]

degrees. In the interval with endpoints $x_0$ and $x_0 + h$ the temperature thus increases (or decreases) on average by

\[ \frac{f(x_0 + h) - f(x_0)}{h} \tag{1.4} \]

degrees per meter. The expression (1.4) gives no precise answer to the question of how large the growth rate is at the point $x_0$, but the smaller the interval we use, the closer we should get to a precise value. If the limit

\[ \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \tag{1.5} \]

exists, it is therefore reasonable to regard this as the measure of the temperature's rate of change at the point $x_0$. The limit (1.5) thus represents the expression we are looking for.

1.2.1 Definition of the Derivative

The result of the analysis in the previous section shows that limits of the form (1.5) are of great interest, and we now begin a systematic study of them.

Definition 3. Let $f$ be a function defined in a neighbourhood of the point $x_0$. If the limit

\[ \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} \]

exists, then $f$ is said to be differentiable at the point $x_0$. The limit is called the derivative of $f$ at $x_0$ and is denoted

\[ f'(x_0), \quad \frac{df}{dx}(x_0) \quad \text{or} \quad Df(x_0). \]
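The limit in the definition can be watched numerically by evaluating the difference quotient for shrinking $h$. The sketch below assumes the example $f(x) = x^2$ at $x_0 = 3$, where the quotient equals $6 + h$ exactly and the derivative is 6; the helper name `difference_quotient` is hypothetical.

```python
def difference_quotient(f, x0: float, h: float) -> float:
    """The average rate of change of f between x0 and x0 + h."""
    return (f(x0 + h) - f(x0)) / h


def f(x: float) -> float:
    return x * x


x0 = 3.0
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, difference_quotient(f, x0, h))
```

The printed values approach 6 as `h` shrinks. For very small `h` floating-point cancellation would eventually spoil the quotient, which is why the loop stops at a moderate step size.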

If a function $f$ is differentiable at every point in its domain we say briefly that $f$ is differentiable. The function

\[ x \mapsto f'(x), \quad x \in D_f \]

is called the derivative of $f$. We define the tangent at the point $(x_0, f(x_0))$ as the line whose equation is

\[ y = f(x_0) + f'(x_0)(x - x_0). \tag{1.6} \]

We also talk about $f'(x_0)$ as the slope or steepness of the function curve at the point $(x_0, f(x_0))$.

Second derivative

There may, of course, in some cases be reason to study the growth rate of the derivative $f'$ of a function $f$. You should then form $(f')'$. This function is called the second derivative of $f$ and is designated in any of the ways

\[ f'', \quad f^{(2)}, \quad D^2 f \quad \text{and} \quad \frac{d^2 f}{dx^2}. \]

According to its definition, $f''$ measures how fast the growth rate increases at the point $x$. For example, if $s(t)$ indicates the total distance in meters covered at the time $t$ seconds, then $s''(t)$ is the ordinary acceleration in m/s². Depending on the interpretation of $f(x)$, however, the "acceleration" $f''(x)$ can have completely different units. In the example where $f(x)$ is the temperature (°C) at $x$ (m), the unit becomes °C/m².

1.2.2 Properties of Derivatives

We will now derive a number of basic properties of differentiation. For this purpose we need, in particular, the following result.

Theorem 5. If a function $f$ is differentiable, then it is continuous.

Proof. Suppose that $f$ is differentiable at the point $x_0$. According to the definition of continuity we shall show that $f(x_0 + h) \to f(x_0)$ when $h \to 0$. But

\[ f(x_0 + h) - f(x_0) = h \cdot \frac{f(x_0 + h) - f(x_0)}{h} \to 0 \cdot f'(x_0) = 0 \text{ when } h \to 0, \]

which proves the theorem.

The converse of Theorem 5 is not true. For example, the function $f(x) = |x|$ is continuous but not differentiable at the point $x = 0$.

Algebraic Properties

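Theorem 6. Let $f$ and $g$ be differentiable functions and $\alpha$ a constant. Then the functions $\alpha f$, $f + g$, $fg$ and $f/g$ are differentiable in their respective domains. We have the following formulas for their derivatives:

\[ (\alpha f)'(x) = \alpha f'(x) \tag{1.7} \]
\[ (f + g)'(x) = f'(x) + g'(x) \tag{1.8} \]
\[ (fg)'(x) = f'(x)g(x) + f(x)g'(x) \tag{1.9} \]
\[ \left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2} \tag{1.10} \]

Proof. Proof of (1.8): By the definition of the derivative we get

\[ \frac{(f + g)(x + h) - (f + g)(x)}{h} = \frac{f(x + h) - f(x)}{h} + \frac{g(x + h) - g(x)}{h} \to f'(x) + g'(x) \]

when $h \to 0$, and thereby (1.8) is proved.

Proof of (1.9): By the definition of the derivative we get

\[ \frac{f(x + h)g(x + h) - f(x)g(x)}{h} = \frac{f(x + h)g(x + h) - f(x)g(x + h) + f(x)g(x + h) - f(x)g(x)}{h} \]
\[ = \frac{f(x + h) - f(x)}{h}\, g(x + h) + f(x)\, \frac{g(x + h) - g(x)}{h}. \]

Since $g$ is differentiable and thus continuous, $g(x + h) \to g(x)$, and we get that the right-hand side tends to $f'(x)g(x) + f(x)g'(x)$ when $h \to 0$.

Proof of (1.10): We are going to show that

\[ D\left(\frac{1}{g(x)}\right) = -\frac{g'(x)}{g(x)^2}; \]

formula (1.10) then follows by applying (1.9) to the product $f \cdot (1/g)$. We have

\[ \frac{\dfrac{1}{g(x + h)} - \dfrac{1}{g(x)}}{h} = \frac{g(x) - g(x + h)}{h\,g(x)g(x + h)} = \frac{-\dfrac{g(x + h) - g(x)}{h}}{g(x)g(x + h)}, \]

where the denominator goes to $g(x)^2$ when $h \to 0$ and the numerator goes to $-g'(x)$, hence the theorem is true.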
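The product rule (1.9) and the quotient rule (1.10) can be sanity-checked numerically. The following sketch compares a central difference approximation with the formulas for the assumed example $f = \sin$, $g = \exp$; the helper `derivative` and the evaluation point are illustrative choices, not part of the original text.

```python
import math


def derivative(func, x: float, h: float = 1e-6) -> float:
    """Central difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


f, df = math.sin, math.cos   # f and its known derivative
g, dg = math.exp, math.exp   # g and its known derivative

x = 0.7
product_numeric = derivative(lambda t: f(t) * g(t), x)
product_formula = df(x) * g(x) + f(x) * dg(x)
quotient_numeric = derivative(lambda t: f(t) / g(t), x)
quotient_formula = (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2

print(product_numeric, product_formula)
print(quotient_numeric, quotient_formula)
```

Agreement to several decimals is a plausibility check only; the proof above is what establishes the formulas.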

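Derivatives of Composite Functions

Theorem 7 (Chain rule). Let $g(x)$ be differentiable at $x$ and $f$ be differentiable at $g(x)$. Let $y = f(g(x))$ and $u = g(x)$. Then

\[ \frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}. \]

Proof. We will use the fact that if $y = h(x)$ is differentiable at $x$ then

\[ \Delta y = h'(x)\,\Delta x + \varepsilon\,\Delta x \]

where $\varepsilon \to 0$ when $\Delta x \to 0$. We have that

\[ \Delta u = g'(x)\,\Delta x + \varepsilon_1\,\Delta x, \quad \text{where } \varepsilon_1 \to 0 \text{ when } \Delta x \to 0, \]
\[ \Delta y = f'(u)\,\Delta u + \varepsilon_2\,\Delta u, \quad \text{where } \varepsilon_2 \to 0 \text{ when } \Delta u \to 0. \]

Substituting $\Delta u$ from the first equation into the second and dividing by $\Delta x$,

\[ \frac{\Delta y}{\Delta x} = \left(f'(u) + \varepsilon_2\right)\left(g'(x) + \varepsilon_1\right). \]

Taking the limit as $\Delta x \to 0$ (so that also $\Delta u \to 0$, since $g$ is continuous),

\[ \frac{dy}{dx} = f'(u)\,g'(x) = \frac{dy}{du} \cdot \frac{du}{dx}. \]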

Theorem 8. Assume that the function $f$ has an inverse function $f^{-1}$ which is continuous. If $f$ is differentiable at the point $x$ and $f'(x) \ne 0$, then $f^{-1}$ is differentiable at the point $y = f(x)$ and

\[ (Df^{-1})(y) = \frac{1}{f'(x)}. \]

Proof. For each small increment $k \ne 0$ of $y$, we can write

\[ y + k = f(x + h), \]

where the increment $h$ of $x$ is determined by $f^{-1}(y + k) = x + h$, i.e.

\[ h = f^{-1}(y + k) - f^{-1}(y). \]

Since $f^{-1}$ is assumed continuous, $h \to 0$ as $k$ approaches 0. Now consider the difference quotient of $f^{-1}$ at the point $y$:

\[ \frac{f^{-1}(y + k) - f^{-1}(y)}{k} = \frac{h}{f(x + h) - f(x)} = \frac{1}{\dfrac{f(x + h) - f(x)}{h}}. \]

It follows that

\[ \frac{f^{-1}(y + k) - f^{-1}(y)}{k} \to \frac{1}{f'(x)} \text{ when } k \to 0. \]

And thereby the theorem is proved.

1.2.3 General Characteristics of Differentiable Functions

Definition 4. Let $x_0$ be a point in the domain $D_f$ of a function $f$. We say that $f$ has a local maximum at $x_0$ if there is a number $\delta > 0$ such that

\[ |x - x_0| < \delta,\ x \in D_f \implies f(x) \le f(x_0). \]

We then call $x_0$ a local maximum point of $f$ and the function value $f(x_0)$ a local maximum. Moreover, if $f(x) < f(x_0)$ when $x \ne x_0$ we speak of a strict local maximum point and a strict local maximum.

Similarly we define a (strict) local minimum point and a (strict) local minimum value.

Local maximum and local minimum points are collectively called local extreme points. We also say that $f$ has local extreme values at these points. Note carefully that the concept of a local extreme value only describes the function's behavior in the immediate surroundings of a point. A local maximum is not necessarily the function's largest value, but of course it could be.

Theorem 9. If $f$ has a local extreme value at an interior point $x_0$ of the domain interval and if $f$ is differentiable at $x_0$ we get

\[ f'(x_0) = 0. \]

Proof. We consider the case where $f$ has a local maximum at $x_0$; the proof in the other case is analogous. For all sufficiently small values of $|h|$ we have $f(x_0 + h) \le f(x_0)$ according to the definition of a local maximum, and hence

\[ \frac{f(x_0 + h) - f(x_0)}{h} \begin{cases} \ge 0 & \text{if } h < 0, \\ \le 0 & \text{if } h > 0. \end{cases} \]

Letting $h \to 0$ from the left and from the right gives

\[ 0 \le f'(x_0) \le 0, \]

that is, $f'(x_0) = 0$.

Points at which $f$ has the derivative zero, i.e. where the function's growth rate is zero, are usually called critical points. The meaning of Theorem 9 is that, in addition to the possible end points, extreme values can only occur at critical points. Among other things, in order to determine whether a given critical point is an extreme point or not, we need additional connections between a function and its derivative. The following theorem is fundamental in deriving such connections.

Theorem 10 (Mean value theorem). Suppose that $f$ is continuous in the closed interval $a \le x \le b$ and differentiable in the open interval $a < x < b$. Then there exists at least one point $\xi$, $a < \xi < b$, such that

\[ f(b) - f(a) = f'(\xi)(b - a). \]

Proof. Consider the following auxiliary function:

\[ \varphi(x) = f(x) - \frac{f(b) - f(a)}{b - a}\,(x - a). \]

Inserting $x = a$ and $x = b$ gives us $\varphi(a) = \varphi(b) = f(a)$. Furthermore, $\varphi$ is continuous in the interval $[a, b]$ and differentiable in the interval $(a, b)$, with

\[ \varphi'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}. \]

Rolle's theorem states that there has to be a critical point $x = \xi$ of $\varphi$ in $(a, b)$, which gives us

\[ \varphi'(\xi) = f'(\xi) - \frac{f(b) - f(a)}{b - a} = 0, \]

which is equivalent to

\[ f(b) - f(a) = f'(\xi)(b - a), \]

and thereby we have proved the theorem.
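The theorem only asserts that a point $\xi$ exists, but for a concrete function it can be located numerically. The sketch below assumes the example $f(x) = x^3$ on $[0, 2]$, where the mean slope is 4 and $f'(\xi) = 3\xi^2 = 4$ gives $\xi = 2/\sqrt{3}$; the bisection helper `bisect` is a hypothetical name, and bisection itself is a technique brought in for illustration.

```python
def bisect(func, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Find a root of func in [lo, hi], assuming func changes sign there."""
    flo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = func(mid)
        if (flo < 0) == (fmid < 0):
            lo, flo = mid, fmid   # root lies in the right half
        else:
            hi = mid              # root lies in the left half
    return (lo + hi) / 2


def f(x: float) -> float:
    return x ** 3


def df(x: float) -> float:
    return 3 * x ** 2


a, b = 0.0, 2.0
slope = (f(b) - f(a)) / (b - a)              # mean slope over [a, b]
xi = bisect(lambda x: df(x) - slope, a, b)   # solve f'(xi) = slope
print(xi, df(xi) * (b - a), f(b) - f(a))
```

The last two printed numbers agree, which is the identity $f(b) - f(a) = f'(\xi)(b - a)$ for this example.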

Theorem 11. If $f$ is differentiable in the interval $]a, b[$ and if $f'(x) = 0$ for all $x$ in this interval, then $f$ is a constant function.

Proof. Let $c$ be a fixed number in the interval and let $x$ be an arbitrary point in the interval. Since differentiability entails continuity, the prerequisites of the mean value theorem are met in the interval with endpoints $c$ and $x$. Thus,

\[ f(x) - f(c) = f'(\xi)(x - c) \]

for some $\xi$ between $c$ and $x$. However, the derivative is equal to 0 at all points, so we get that $f(x) - f(c) = 0$, i.e.

\[ f(x) = f(c) \]

for all $x \in\, ]a, b[$. Thus the proof is finished.

Corollary. If $f$ and $g$ are differentiable and

\[ f'(x) = g'(x), \quad a < x < b, \]

it follows that

\[ f(x) = g(x) + C \]

for some constant $C$.

Proof. The assertion follows directly by application of Theorem 11 to the function $f(x) - g(x)$.

1.2.4 L'Hôpital's rule

Let $x_0$ be an extended real number (including $\pm\infty$) and let $f(x)$ and $g(x)$ be differentiable functions. Suppose that $\lim_{x \to x_0} f(x) = 0$ and $\lim_{x \to x_0} g(x) = 0$. If $\lim_{x \to x_0} \frac{f'(x)}{g'(x)}$ exists and there is an interval $(a, b)$ containing $x_0$ such that $g'(x) \ne 0$ for all $x \in (a, b)$, $x \ne x_0$, then $\lim_{x \to x_0} \frac{f(x)}{g(x)}$ exists and

\[ \lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)}. \]

Also suppose $\lim_{x \to x_0} f(x) = \pm\infty$ and $\lim_{x \to x_0} g(x) = \pm\infty$. If $\lim_{x \to x_0} \frac{f'(x)}{g'(x)}$ exists and there is an interval $(a, b)$ containing $x_0$ such that $g'(x) \ne 0$ for all $x \in (a, b)$, $x \ne x_0$, then $\lim_{x \to x_0} \frac{f(x)}{g(x)}$ exists and

\[ \lim_{x \to x_0} \frac{f(x)}{g(x)} = \lim_{x \to x_0} \frac{f'(x)}{g'(x)}. \]
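A quick numerical look at a "0/0" case may make the rule more concrete. The sketch assumes the example $f(x) = 1 - \cos x$, $g(x) = x^2$ at $x_0 = 0$, where $f'(x)/g'(x) = \sin(x)/(2x) \to 1/2$; the example and function names are chosen here for illustration only.

```python
import math


def f(x: float) -> float:
    return 1.0 - math.cos(x)


def g(x: float) -> float:
    return x * x


def df(x: float) -> float:
    return math.sin(x)


def dg(x: float) -> float:
    return 2.0 * x


# Both quotients are evaluated near x0 = 0, where f and g both vanish.
for x in (1e-1, 1e-2, 1e-3):
    print(x, f(x) / g(x), df(x) / dg(x))
```

Both columns approach 0.5. Much smaller values of `x` are avoided on purpose, because `1 - cos(x)` then suffers from floating-point cancellation.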

2

"Love can reach the same level of talent, and even genius, as the

discovery of differential calculus."

Lev Vygotsky

Definition 5. Let $f$ be defined in an interval $I$. A differentiable function $F$ is called an anti-derivative of $f$ if

\[ F'(x) = f(x), \quad x \in I. \]

If $F$ is an anti-derivative of $f$, then so is

\[ F(x) + C \]

for every constant $C$. This is the only uncertainty that a primitive has. Namely, if $G$ is another anti-derivative of $f$, so that

\[ G'(x) = F'(x) = f(x), \quad x \in I, \]

then the corollary of Theorem 11 shows that

\[ G(x) = F(x) + C. \]

We thus obtain all anti-derivatives of $f$ by adding constants to $F$. Instead of saying that $f'(x)$ is the derivative of $f(x)$, we can say that $f'(x)\,dx$ is the differential of $f(x)$. The reverse problem can be similarly formulated: we are looking for a function $F(x)$ whose differential is equal to $f(x)\,dx$. This is the background to letting the symbol

\[ \int f(x)\,dx \tag{2.1} \]

denote an anti-derivative of $f$. We will soon see that this differential notation has large computational advantages over other, perhaps more obvious, designations such as $\int f(x)$.

2.1.1 Partial integration

Theorem 12 (Partial integration). If $F$ is an anti-derivative of $f$ then

\[ \int f(x)g(x)\,dx = F(x)g(x) - \int F(x)g'(x)\,dx. \tag{2.2} \]

Proof. We have to show that the derivative of the right-hand side is equal to $f(x)g(x)$. But the rule for differentiation of a product and the definition of an anti-derivative give immediately

\[ D\left(F(x)g(x) - \int F(x)g'(x)\,dx\right) = F'(x)g(x) + F(x)g'(x) - F(x)g'(x) = F'(x)g(x) = f(x)g(x). \]

2.1.2 Variable Substitution

A general method for all types of mathematical problem solving is to change variable. In this way, one might simplify the problem, or gain a new perspective on it. The calculation of anti-derivatives is no exception in this regard.

A change from a variable $x$ to a new variable $t$ has in this context the form

\[ x = g(t), \tag{2.3} \]

where $g$ is assumed to be invertible, so that we can return to the variable $x$ after having solved our problem in the variable $t$.

The following theorem shows how to transform the calculation of $\int f(x)\,dx$ into the calculation of an anti-derivative in $t$ through the change of variables in (2.3).

Theorem 13 (Variable Substitution). Suppose that $g$ in (2.3) is a differentiable function with inverse $g^{-1}$. Then

\[ \int f(x)\,dx = \left[\int f(g(t))\,g'(t)\,dt\right]_{t = g^{-1}(x)}. \]

Proof. The claim is that an anti-derivative $F$ of $f$, apart from a constant, is

\[ F(x) = \left[\int f(g(t))\,g'(t)\,dt\right]_{t = g^{-1}(x)}. \]

This is equivalent to

\[ F(g(t)) = \int f(g(t))\,g'(t)\,dt. \]

According to the chain rule and the definition of an anti-derivative, the derivative with respect to $t$ of the left-hand side is equal to

\[ F'(g(t))\,g'(t) = f(g(t))\,g'(t). \]

By definition, the right-hand side has the same derivative. Thus, by the corollary of Theorem 11, the two sides are equal except for a constant. The proof is done.

The formula is easy to remember: with $x = g(t)$ we have $\frac{dx}{dt} = g'(t)$, which in differential form is written

\[ dx = g'(t)\,dt. \]

This notation is convenient in practice. In the integral to be calculated,

\[ F(x) = \int f(x)\,dx, \]

we replace $x$ by $g(t)$ and $dx$ by $g'(t)\,dt$. The hope when making a change of variables of this kind is that the new primitive function $\int f(g(t))\,g'(t)\,dt$ should prove easier to calculate than the original $\int f(x)\,dx$. If so, one carries out this calculation and finishes the solution by returning to the variable $x$.
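A worked substitution can be checked by differentiating the result. The sketch below assumes the example $f(x) = \sqrt{1 - x^2}$ with $x = g(t) = \sin t$, $-\pi/2 < t < \pi/2$, so that $f(g(t))\,g'(t) = \cos^2 t$ with anti-derivative $t/2 + \sin(2t)/4$. The names `G`, `F` and `derivative` are hypothetical helpers for this illustration.

```python
import math


def f(x: float) -> float:
    return math.sqrt(1.0 - x * x)


def G(t: float) -> float:
    """An anti-derivative of f(g(t)) * g'(t) = cos(t)**2 for g(t) = sin(t)."""
    return t / 2.0 + math.sin(2.0 * t) / 4.0


def F(x: float) -> float:
    """Return to the variable x by inserting t = arcsin(x)."""
    return G(math.asin(x))


def derivative(func, x: float, h: float = 1e-6) -> float:
    """Central difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


for x in (-0.5, 0.0, 0.3, 0.8):
    print(x, derivative(F, x), f(x))
```

The two printed columns agree, which is the statement $F'(x) = f(x)$ for the anti-derivative obtained through the substitution.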

2.2 Integrals

Integral of Step Functions

A function $\Phi$ on the interval $[a, b]$ is called a step function if there is a subdivision of $[a, b]$ into smaller intervals in each of which $\Phi$ has a constant value. More precisely, if the division points are

\[ a = x_0 < x_1 < \dots < x_n = b, \]

then $\Phi$ is defined as

\[ \Phi(x) = c_k \quad \text{for } x_{k-1} < x < x_k,\ k = 1, 2, \dots, n. \tag{2.4} \]

For the step function (2.4) we define the area between its graph and the x-axis as the number

\[ I(\Phi) = \sum_{k=1}^{n} c_k (x_k - x_{k-1}). \tag{2.5} \]

Each term in the sum can be interpreted as the area of a rectangle. The part of the area below the x-axis, however, has been assigned a negative measure, as a closer study of (2.5) immediately indicates. We shall see later that this convention is very practical. It will also prove beneficial to no longer speak of the area between the graph of $\Phi$ and the x-axis, but rather to consider $I(\Phi)$ in (2.5) as a number associated with the function $\Phi$.

Definition 6. The number

\[ I(\Phi) = \sum_{k=1}^{n} c_k (x_k - x_{k-1}) \]

is called the integral of the step function $\Phi$. We also use the designation

\[ I(\Phi) = \int_a^b \Phi(x)\,dx. \]
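The integral of a step function is just a finite sum, which the following sketch computes directly. The representation by a list of division points and a list of constant values, and the function name `step_integral`, are assumptions made for this illustration.

```python
def step_integral(points, values):
    """I(Phi): the sum of c_k * (x_k - x_{k-1}) over all subintervals.

    points: division points x_0 < x_1 < ... < x_n
    values: constants c_1, ..., c_n taken on the open subintervals
    """
    if len(points) != len(values) + 1:
        raise ValueError("need exactly one more division point than values")
    return sum(c * (right - left)
               for c, left, right in zip(values, points, points[1:]))


points = [0.0, 1.0, 3.0, 4.0]
values = [2.0, -1.0, 0.5]
print(step_integral(points, values))  # 2*1 + (-1)*2 + 0.5*1 = 0.5

# Adding the extra division point 2.0 does not change the function,
# and therefore does not change the integral either.
print(step_integral([0.0, 1.0, 2.0, 3.0, 4.0], [2.0, -1.0, -1.0, 0.5]))
```

The second call shows that inserting an extra division point, without changing the function, leaves the value of the integral unchanged.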

To each step function belongs, as we have seen, a division of its definition interval $[a, b]$. It is of course conceivable to add further division points without the function itself being changed. We then say that the division is refined. It is obvious that the value of the integral $I(\Phi)$ is not affected by such a refinement of the division. This observation has the consequence that if we have two step functions on the same interval $[a, b]$, then there is no restriction in assuming that they are generated from the same division of the interval. Against this background, it is not difficult to recognize the correctness of the following theorem.

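Theorem 14. The following properties hold for the integral of step functions $\Phi$ and $\Psi$ on the interval $[a, b]$:

\[ I(\alpha\Phi) = \alpha I(\Phi), \quad \alpha \text{ constant}, \tag{2.6} \]
\[ I(\Phi + \Psi) = I(\Phi) + I(\Psi), \tag{2.7} \]
\[ \Phi \le \Psi \implies I(\Phi) \le I(\Psi), \tag{2.8} \]
\[ I(\Phi) = \int_a^b \Phi(x)\,dx = \int_a^c \Phi(x)\,dx + \int_c^b \Phi(x)\,dx \quad \text{if } a \le c \le b. \tag{2.9} \]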

Definition 7. A bounded function $f$ defined on a bounded interval $[a, b]$ is said to be (Riemann) integrable over this interval if to every real number $\varepsilon > 0$ there exist two step functions $\Phi$ and $\Psi$ satisfying

\[ \Phi(x) \le f(x) \le \Psi(x), \quad a \le x \le b, \]

and such that

\[ I(\Psi) - I(\Phi) < \varepsilon. \]

The definition has the consequence that if a function is integrable, its graph can be covered by finitely many axis-parallel rectangles with arbitrarily small total area. For the region between the graphs of $\Phi$ and $\Psi$ in the definition consists of such rectangles and occupies an area of less than $\varepsilon$.

It remains to define the integral of an integrable function. The following theorem is the basis for this.

Theorem 15. If $f$ is integrable over $[a, b]$, there is exactly one number $\lambda$ such that

\[ I(\Phi) \le \lambda \le I(\Psi) \]

for all step functions $\Phi$ and $\Psi$ with $\Phi \le f \le \Psi$.

Given the geometric meaning of $I(\Phi)$ and $I(\Psi)$, the number $\lambda$ should be an adequate measure of the area of the region between the graph of $f$ and the x-axis. We are therefore led to the following definition.

Definition 8. Assume that the function $f$ is integrable over the interval $[a, b]$. The uniquely determined number $\lambda$ in Theorem 15 is called the integral of $f$ over $[a, b]$ and is written

\[ \int_a^b f(x)\,dx. \]

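Theorem 16. If the function $f$ is continuous in the closed interval $[a, b]$, then $f$ is integrable over this interval.

Proof. Let $\varepsilon$ be a given positive number. We will then construct two step functions $\Phi$ and $\Psi$ with $\Phi \le f \le \Psi$ and $I(\Psi) - I(\Phi) < \varepsilon$. Since $f$ is continuous on a closed bounded interval it is uniformly continuous there, so there is a $\delta > 0$ such that

\[ |f(x) - f(y)| < \frac{\varepsilon}{b - a}, \quad x, y \in [a, b] : |x - y| < \delta. \]

With this $\delta$ we now make a division

\[ D : a = x_0 < x_1 < \dots < x_n = b \]

of $[a, b]$, such that the length $l(D)$ of the longest sub-interval satisfies

\[ l(D) < \delta. \]

Then we define the numbers $m_k$ and $M_k$ as the minimum and maximum value of $f$ in the interval $x_{k-1} \le x \le x_k$. In particular, we then have

\[ M_k - m_k < \frac{\varepsilon}{b - a}, \quad k = 1, 2, \dots, n. \]

Finally, we define two step functions $\Phi_D$ and $\Psi_D$ belonging to this division by putting

\[ \Phi_D = m_k \text{ and } \Psi_D = M_k \text{ for } x_{k-1} < x < x_k. \]

Then $\Phi_D \le f \le \Psi_D$ and

\[ I(\Psi_D) - I(\Phi_D) = \sum_{k=1}^{n} M_k (x_k - x_{k-1}) - \sum_{k=1}^{n} m_k (x_k - x_{k-1}) \]
\[ = \sum_{k=1}^{n} (M_k - m_k)(x_k - x_{k-1}) < \frac{\varepsilon}{b - a} \sum_{k=1}^{n} (x_k - x_{k-1}) = \frac{\varepsilon}{b - a}\,(b - a) = \varepsilon. \]

Thus, $f$ is integrable over $[a, b]$ according to Definition 7, and the theorem is proved.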

From the proof we also get the more general result that each piecewise continuous function is integrable. (By a piecewise continuous function we mean in this context a function which is continuous in the whole interval $[a, b]$ except at finitely many points, where it is allowed to have a jump.)

Riemann sum

Let $f$ be a continuous function on the interval $[a, b]$, and consider the division

\[ D : a = x_0 < x_1 < \dots < x_n = b \]

of this interval. Denote by $l(D)$ the length of the largest sub-interval. This number can be perceived as a measure of the fineness of the subdivision. Choose arbitrarily in each sub-interval a point $\xi_k$, so that $x_{k-1} \le \xi_k \le x_k$, and form the sum

\[ R_D = \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}). \tag{2.10} \]

Such a sum is called a Riemann sum; geometrically it is the sum of the rectangle areas. It is reasonable that this sum can be made arbitrarily close to the integral of $f$ by choosing a sufficiently fine division, i.e. a division $D$ with a sufficiently small value of $l(D)$. This is the meaning of the following theorem.

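Theorem 17. Suppose that $f$ is continuous on $[a, b]$. For the Riemann sum (2.10) it holds that

\[ R_D = \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}) \to \int_a^b f(x)\,dx \tag{2.11} \]

when $l(D) \to 0$.

Proof. With the notation from the proof of Theorem 16 we have

\[ m_k \le f(\xi_k) \le M_k, \quad k = 1, 2, \dots, n, \]

that is,

\[ I(\Phi_D) \le R_D \le I(\Psi_D). \]

We also have

\[ I(\Phi_D) \le I(f) \le I(\Psi_D) \]

by the definition of $I(f)$, and $I(\Psi_D) - I(\Phi_D) < \varepsilon$ as soon as $l(D) < \delta$. This shows that $R_D \to I(f)$ when the fineness $l(D)$ of the division goes to zero.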
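The inequality $I(\Phi_D) \le R_D \le I(\Psi_D)$ and the convergence of all three quantities can be observed numerically. The sketch below assumes the example $f(x) = x^2$ on $[0, 1]$ with equal sub-intervals and midpoints as the points $\xi_k$. Because this $f$ is increasing, $m_k$ and $M_k$ are the values at the left and right end points; the function name `sums` is hypothetical.

```python
def sums(f, a: float, b: float, n: int):
    """Lower sum, midpoint Riemann sum and upper sum on n equal parts.

    Assumes f is increasing on [a, b], so that m_k and M_k are the
    values of f at the left and right end point of each subinterval.
    """
    width = (b - a) / n
    xs = [a + k * width for k in range(n + 1)]
    lower = sum(f(x) * width for x in xs[:-1])
    upper = sum(f(x) * width for x in xs[1:])
    riemann = sum(f((left + right) / 2) * width
                  for left, right in zip(xs, xs[1:]))
    return lower, riemann, upper


for n in (10, 100, 1000):
    print(n, sums(lambda x: x * x, 0.0, 1.0, n))
```

As `n` grows, the gap between the lower and upper sums shrinks like $1/n$, and all three values approach the integral $1/3$.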

Theorem 18. If the functions $f$ and $g$ are integrable over $[a, b]$, then this also applies to the functions $\alpha f$ ($\alpha$ constant) and $f + g$. Furthermore, we have

\[ \int_a^b \alpha f(x)\,dx = \alpha \int_a^b f(x)\,dx, \tag{2.12} \]
\[ \int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx, \tag{2.13} \]
\[ f(x) \le g(x) \text{ in } [a, b] \implies \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx, \tag{2.14} \]
\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{2.15} \]

Proof. The only case we prove is when $f$ and $g$ are piecewise continuous. Then the formulas follow directly by using the limit (2.11) for the Riemann sums, and the properties (2.6)-(2.9) for step functions.

At first, (2.15) is only meaningful when $a \le c \le b$. However, it is convenient for $b \le a$ to define

\[ \int_a^b f(x)\,dx = -\int_b^a f(x)\,dx. \tag{2.16} \]

In particular, $\int_a^a f(x)\,dx = 0$. With this convention, we see that (2.15) is a correct formula for all relative positions of the points $a$, $b$ and $c$, under the premise that the integrals exist.

An important special case of (2.14) is

\[ g(x) \ge 0 \text{ in } [a, b] \implies \int_a^b g(x)\,dx \ge 0. \]

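Theorem 19 (Mean value theorem for integrals). If $f$ is continuous in $[a, b]$, there exists a point $\xi$, $a \le \xi \le b$, such that

\[ \int_a^b f(x)\,dx = f(\xi)(b - a). \]

Proof. We put

\[ m = \min_{a \le x \le b} f(x), \quad M = \max_{a \le x \le b} f(x). \]

Then

\[ m \le f(x) \le M \text{ when } a \le x \le b, \]

which gives

\[ m(b - a) = \int_a^b m\,dx \le \int_a^b f(x)\,dx \le \int_a^b M\,dx = M(b - a). \]

If we put

\[ C = \frac{1}{b - a} \int_a^b f(x)\,dx, \]

these inequalities imply $m \le C \le M$. But $f$ is continuous and therefore assumes every value between $m$ and $M$ in the interval $[a, b]$. In particular, there is a $\xi$ in this interval for which $f(\xi) = C$. And we have proved the theorem.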

Theorem 20 (Fundamental theorem of calculus). Suppose that the function $f$ is continuous in the interval $a \le x \le b$, and put

\[ S(x) = \int_a^x f(t)\,dt. \]

Then $S$ is differentiable and

\[ S'(x) = f(x). \]

Proof. We use the definition of a derivative. We therefore form the difference quotient

\[ \frac{S(x + h) - S(x)}{h} = \frac{1}{h}\left(\int_a^{x+h} f(t)\,dt - \int_a^x f(t)\,dt\right) = \frac{1}{h}\int_x^{x+h} f(t)\,dt. \]

Now we use the mean value theorem for integrals, and we get

\[ \frac{S(x + h) - S(x)}{h} = \frac{1}{h}\,f(\xi_h)(x + h - x) = f(\xi_h) \]

for some point $\xi_h$ between $x$ and $x + h$. When $h \to 0$, $\xi_h$ goes towards $x$. Since $f$ is continuous it follows that

\[ f(\xi_h) \to f(x) \text{ when } h \to 0. \]

Thus, the function $S(x)$ is differentiable with the derivative $S'(x) = f(x)$.
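The statement $S'(x) = f(x)$ can be illustrated by approximating $S$ with a Riemann sum and then forming a difference quotient. The sketch assumes the example $f = \cos$ with $a = 0$; the midpoint-sum helper `integral` and the step sizes are illustrative assumptions.

```python
import math


def integral(f, a: float, b: float, n: int = 2000) -> float:
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) for k in range(n)) * width


def S(x: float) -> float:
    """S(x) = integral of cos(t) from 0 to x, computed numerically."""
    return integral(math.cos, 0.0, x)


x, h = 1.0, 1e-3
quotient = (S(x + h) - S(x)) / h
print(quotient, math.cos(x))
```

The difference quotient of the numerically computed $S$ agrees with $f(x) = \cos(1)$ up to an error of the order of `h`.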

2.2.5 Improper Integrals

The definition of the Riemann integral presupposes that we are working with bounded functions on bounded intervals. In practice, you need to expand the integral concept to include unbounded functions and intervals. The Riemann integral is thereby combined with a limit process. We begin by studying the two simple cases where only one of the two boundedness requirements is removed.

Consider a function $f$ defined in the interval $[a, \infty[$ which is (Riemann) integrable over the restricted domain $a \le x \le X$ for each $X > a$. We associate $f$ with a formal improper integral

\[ \int_a^{\infty} f(x)\,dx, \tag{2.17} \]

for which we define the following concept.

Definition 9. If the limit

\[ \lim_{X \to +\infty} \int_a^X f(x)\,dx \]

exists, say equal to $A$, it is said that the improper integral (2.17) is convergent. The number $A$ is called its value. If the limit does not exist, we say that the improper integral is divergent.

We use the notation

\[ \int_a^{\infty} f(x)\,dx \]

not only for the integral itself but also for its value $A$.
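The difference between a convergent and a divergent improper integral shows up clearly when the upper limit $X$ grows. The sketch below assumes the two standard examples $\int_1^X x^{-2}\,dx = 1 - 1/X$ and $\int_1^X x^{-1}\,dx = \ln X$, evaluated from their known anti-derivatives; the function names are hypothetical.

```python
import math


def integral_inverse_square(X: float) -> float:
    """Integral of 1/x**2 over [1, X], from the anti-derivative -1/x."""
    return 1.0 - 1.0 / X


def integral_inverse(X: float) -> float:
    """Integral of 1/x over [1, X], from the anti-derivative ln(x)."""
    return math.log(X)


for X in (1e1, 1e3, 1e6, 1e9):
    print(X, integral_inverse_square(X), integral_inverse(X))
```

The first column of results settles at 1, so that improper integral converges with value 1. The second keeps growing without bound, so the corresponding improper integral is divergent.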

Unbounded integrand

We now consider a function $f$ which is defined in a bounded interval $a < x \le b$ and is bounded and Riemann integrable over each sub-interval $[a + \varepsilon, b]$, $\varepsilon > 0$. The function is assumed not to be bounded throughout $]a, b]$. For such a function $f$,

\[ \int_a^b f(x)\,dx \tag{2.18} \]

is an improper integral.

Definition 10. If the limit

\[ \lim_{\varepsilon \to 0^{+}} \int_{a + \varepsilon}^b f(x)\,dx = A \]

exists, we say that the improper integral (2.18) is convergent with the value $A$. If the limit does not exist, it is said to be divergent.

There are integrals that are improper in more ways than one. It may for example be a question of $\int_{-\infty}^{\infty}$, or of an integral $\int_a^{\infty}$ which is also improper at the end point $a$. In such cases, one divides the integral into two (or more) parts, each improper in just one way, and says that the whole integral converges if each of the parts does. Otherwise, it is said to be divergent. For a convergent integral its value is defined as the sum of the values of the individual parts.

In probability theory we often do analysis of random phenomenas, for ex-

ample in finance. As a model we often use a so called density function ie, a

non-negative function f (x) defined on the real axis and such that

Z

f (x)dx = 1.

f (x)dx is interpreted as the probability that the outcome of the trial will be

a number in a small range around x with width dx. The probability that

the outcome of the experiment ends up in a certain interval [a, b] is obtained

by summation of these sub intervals, ie it is equal to

Z b

f (x)dx.

a

25

It also works with distribution function F (x), which is related to the density

function by Z x

F (x) = f (t)dt;

the number F (x) apparently means the probability that the outcome of

the trial is less or equal to x. If f is continuous, F is differentiable and

F 0 (x) = f (x) according to the fundamental theorem of calculus.

As a measure of where the density function is located we use the so-called mean or expected value. This is defined as the number

\[ m = \int_{-\infty}^{\infty} x f(x)\,dx. \]

The analogy with mechanics is clear: the expected value coincides with the location of the center of gravity for a mass distribution along the entire real axis with density f(x) and total mass 1.

It is also of interest how concentrated the distribution is around the mean value. As a measure of this concentration, we use the standard deviation $\sigma$, which is the positive number that satisfies

\[ \sigma^2 = \int_{-\infty}^{\infty} (x - m)^2 f(x)\,dx. \]

The number $\sigma^2$ is called the variance. Comparison with mechanics: the variance corresponds to the moment of inertia of the mass distribution f(x) with respect to an axis through m perpendicular to the x-axis.

An important example is the density function

\[ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \]

that belongs to the so-called normal distribution. The corresponding distribution function is

\[ \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\,dt. \]
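The quantities above can be checked numerically for the normal density. The sketch below is only an illustration: it assumes the density $e^{-x^2/2}/\sqrt{2\pi}$, replaces the real axis by the interval [-10, 10] (an assumption that is harmless here because the tails are negligible), and uses a simple midpoint rule. All function names are made up for the example.

```python
import math

def phi(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def integrate(g, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))

LO, HI = -10.0, 10.0  # assumed truncation of the real axis

total = integrate(phi, LO, HI)                               # should be close to 1
m = integrate(lambda x: x * phi(x), LO, HI)                  # expected value
variance = integrate(lambda x: (x - m) ** 2 * phi(x), LO, HI)
sigma = math.sqrt(variance)                                  # standard deviation

def Phi(x):
    """Distribution function: probability that the outcome is <= x."""
    return integrate(phi, LO, x)

print(total, m, sigma, Phi(0.0))
```

For this density the computation should give a total mass close to 1, a mean close to 0 and a standard deviation close to 1.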

3 Linear Algebra

3.1 System of Linear Equations

A linear equation in the variables $x_1, \dots, x_n$ is an equation that can be written in the form

\[ a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b \tag{3.1} \]

where b and the coefficients $a_1, \dots, a_n$ are real or complex numbers, usually known in advance. The subscript n may be any positive integer.

A system of linear equations is a collection of one or more linear equations involving the same variables, say $x_1, \dots, x_n$. A solution of the system is a list $(s_1, s_2, \dots, s_n)$ of numbers that makes each equation a true statement when the values $s_1, \dots, s_n$ are substituted for $x_1, \dots, x_n$ respectively.

The set of all possible solutions is called the solution set of the linear system. Two linear systems are equivalent if they have the same solution set. That is, each solution of the first system is a solution of the second system, and each solution of the second system is a solution of the first.

Finding the solution set of a system of two linear equations in two variables is easy because it amounts to finding the intersection of two lines. A system of linear equations has

1. no solution, or

2. exactly one solution, or

3. infinitely many solutions.

A system of linear equations is said to be consistent if it has either one solution or infinitely many solutions; a system is inconsistent if it has no solution.

Matrix Notation

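The essential information of a linear system can be recorded compactly in a rectangular array called a matrix. Given the system

\[ \begin{aligned} x_1 - 2x_2 + x_3 &= 0 \\ 2x_2 - 8x_3 &= 8 \\ 5x_1 - 5x_3 &= 10 \end{aligned} \tag{3.2} \]

with the coefficients of each variable aligned in columns, the matrix

\[ \begin{bmatrix} 1 & -2 & 1 \\ 0 & 2 & -8 \\ 5 & 0 & -5 \end{bmatrix} \]

is called the coefficient matrix of the system, and

\[ \begin{bmatrix} 1 & -2 & 1 & 0 \\ 0 & 2 & -8 & 8 \\ 5 & 0 & -5 & 10 \end{bmatrix} \]

is called the augmented matrix of the system. The augmented matrix of a system consists of the coefficient matrix with an added column containing the constants from the right sides of the equations.

The size of a matrix tells how many rows and columns it has. If m and n are positive integers, an $m \times n$ matrix is a rectangular array of numbers with m rows and n columns. Matrix notation will simplify the calculations in the examples that follow.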

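Definition 11. A rectangular matrix is in echelon form (or row echelon form) if it has the following three properties:

1. All nonzero rows are above any rows of all zeros.

2. Each leading entry of a row is in a column to the right of the leading entry of the row above it.

3. All entries in a column below a leading entry are zeros.

If a matrix in echelon form satisfies the following additional conditions, then it is in reduced echelon form (or reduced row echelon form):

4. The leading entry in each nonzero row is 1.

5. Each leading 1 is the only nonzero entry in its column.

An echelon matrix is one that is in echelon form. Property 2 says that the leading entries form an echelon ("steplike") pattern that moves down and to the right through the matrix. Property 3 is a simple consequence of property 2, but we include it for emphasis.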

The matrices

\[ \begin{bmatrix} 2 & 3 & 2 & 1 \\ 0 & 1 & 4 & 8 \\ 0 & 0 & 0 & 5/2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 & 0 & 0 & 29 \\ 0 & 1 & 0 & 16 \\ 0 & 0 & 1 & 3 \end{bmatrix} \]

are in echelon form. In fact, the second matrix is in reduced echelon form.

Any nonzero matrix may be row reduced into more than one matrix in echelon form, using different sequences of row operations. However, the reduced echelon form one obtains from a matrix is unique: each matrix is row equivalent to one and only one reduced echelon matrix.

If a matrix A is row equivalent to an echelon matrix U, we call U an echelon form of A; if U is in reduced echelon form, we call U the reduced echelon form of A.

Pivot Positions

When row operations on a matrix produce an echelon form, further row

operations to obtain the reduced echelon form do not change the positions

of the leading entries. Since the reduced echelon form is unique, the leading

entries are always in the same positions in any echelon form obtained from a

given matrix. These leading entries correspond to leading 1s in the reduced

echelon form.

A pivot position in a matrix A is a location in A that corresponds to a leading 1 in the reduced echelon form of A. A pivot column is a column of A that contains a pivot position.
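Row reduction to reduced echelon form, together with the pivot columns, can be sketched as follows. This is a minimal illustration with exact fractions, not an optimized routine; the function name `rref` is made up, and the sample matrix is the augmented matrix of system (3.2) under the assumption that the system reads $x_1 - 2x_2 + x_3 = 0$, $2x_2 - 8x_3 = 8$, $5x_1 - 5x_3 = 10$.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivot_columns) where R is the reduced echelon form of the matrix."""
    A = [[Fraction(v) for v in row] for row in rows]
    m, n = len(A), len(A[0])
    pivots = []
    r = 0  # index of the row that receives the next pivot
    for c in range(n):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        A[r], A[p] = A[p], A[r]              # row interchange
        pivot = A[r][c]
        A[r] = [v / pivot for v in A[r]]     # scale so the leading entry is 1
        for i in range(m):                   # clear the rest of the column
            if i != r and A[i][c] != 0:
                factor = A[i][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return A, pivots

augmented = [[1, -2, 1, 0],
             [0, 2, -8, 8],
             [5, 0, -5, 10]]
R, pivots = rref(augmented)
print([[str(v) for v in row] for row in R], pivots)
```

Under the stated reading of the system, the last column of `R` holds the solution $x_1 = 1$, $x_2 = 0$, $x_3 = -1$, and the pivot columns are the first three columns.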

3.2 Matrix Multiplication

When a matrix B multiplies a vector x, it transforms x into the vector Bx. If this vector is then multiplied in turn by a matrix A, the resulting vector is A(Bx). Thus A(Bx) is produced from x by a composition of mappings. Our goal is to represent this composite mapping as multiplication by a single matrix, denoted AB, so that

\[ A(Bx) = (AB)x. \tag{3.3} \]

If A is $m \times n$, B is $n \times p$, and $x \in \mathbb{R}^p$, denote the columns of B by $b_1, \dots, b_p$ and the entries in x by $x_1, \dots, x_p$. Then

\[ Bx = x_1 b_1 + \dots + x_p b_p. \]

By the linearity of multiplication by A,

\[ A(Bx) = x_1 A b_1 + \dots + x_p A b_p. \]

The vector A(Bx) is a linear combination of the vectors $Ab_1, \dots, Ab_p$, using the entries in x as weights. In matrix notation, this linear combination is written as

\[ A(Bx) = \begin{bmatrix} Ab_1 & Ab_2 & \dots & Ab_p \end{bmatrix} x. \]

Thus multiplication by $\begin{bmatrix} Ab_1 & Ab_2 & \dots & Ab_p \end{bmatrix}$ transforms x into A(Bx).

If A is an $m \times n$ matrix, and if B is an $n \times p$ matrix with columns $b_1, \dots, b_p$, then the product AB is the $m \times p$ matrix whose columns are $Ab_1, \dots, Ab_p$. That is,

\[ AB = A \begin{bmatrix} b_1 & b_2 & \dots & b_p \end{bmatrix} = \begin{bmatrix} Ab_1 & Ab_2 & \dots & Ab_p \end{bmatrix}. \]

This definition makes equation (3.3) true for all $x \in \mathbb{R}^p$. Equation (3.3) proves that the composite mapping is a linear transformation and that its standard matrix is AB. Multiplication of matrices corresponds to composition of linear transformations.
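The column-by-column definition of AB can be tried out directly. The sketch below uses plain Python lists; the helper names `mat_vec`, `column` and `mat_mul` and the sample matrices are invented for this example. It builds AB from the columns $Ab_j$ and checks that A(Bx) and (AB)x agree for one vector x.

```python
def mat_vec(A, x):
    """Ax as a linear combination of the columns of A with weights from x."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):          # add x_j times column j of A
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result

def column(B, j):
    """Column j of the matrix B as a list."""
    return [row[j] for row in B]

def mat_mul(A, B):
    """AB built column by column: column j of AB is A times column j of B."""
    cols = [mat_vec(A, column(B, j)) for j in range(len(B[0]))]
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*cols)]

A = [[1, 2], [0, 1], [3, -1]]   # 3 x 2
B = [[2, 0, 1], [1, -1, 4]]     # 2 x 3
x = [1, 2, -1]                  # vector in R^3

AB = mat_mul(A, B)
print(AB)
print(mat_vec(A, mat_vec(B, x)), mat_vec(AB, x))
```

Both printed vectors coincide, which is equation (3.3) for this particular choice of A, B and x.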

Theorem 22. Let A be an $m \times n$ matrix, and let B and C have sizes for which the indicated sums and products are defined.

1. A(BC) = (AB)C

2. A(B + C) = AB + AC

3. (B + C)A = BA + CA

4. r(AB) = (rA)B = A(rB) for any scalar r

5. $I_m A = A = A I_n$

Proof. We will just prove property (1). Property (1) follows from the fact that matrix multiplication corresponds to composition of linear transformations, and it is known that the composition of functions is associative.

3.3 The Matrix Equation Ax = b

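A fundamental idea in linear algebra is to view a linear combination of vectors as the product of a matrix and a vector.

If A is an $m \times n$ matrix with columns $a_1, \dots, a_n$, and if $x \in \mathbb{R}^n$, then the product of A and x, denoted by Ax, is the linear combination of the columns of A using the corresponding entries in x as weights; that is,

\[ Ax = \begin{bmatrix} a_1 & a_2 & \dots & a_n \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = x_1 a_1 + \dots + x_n a_n. \]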

Theorem 23. If A is an $m \times n$ matrix with columns $a_1, \dots, a_n$, and if $b \in \mathbb{R}^m$, the matrix equation

\[ Ax = b \tag{3.4} \]

has the same solution set as the vector equation

\[ x_1 a_1 + \dots + x_n a_n = b \tag{3.5} \]

which, in turn, has the same solution set as the system of linear equations whose augmented matrix is

\[ \begin{bmatrix} a_1 & a_2 & \dots & a_n & b \end{bmatrix}. \tag{3.6} \]

This result is a powerful tool for gaining insight into problems in linear algebra, because a system of linear equations may now be viewed in three different but equivalent ways: as a matrix equation, as a vector equation, or as a system of linear equations. Whenever you construct a mathematical model of a problem in real life, you are free to choose whichever viewpoint is the most natural. Then you may switch from one formulation of a problem to another whenever it is convenient. In any case, the matrix equation (3.4), the vector equation (3.5), and the system of equations are all solved in the same way: by row reducing the augmented matrix (3.6).

Existence of solutions

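Theorem 24. Let A be an $m \times n$ matrix. The following statements are logically equivalent. That is, for a particular A, either they are all true or they are all false.

1. For each b in $\mathbb{R}^m$, the equation Ax = b has a solution.

2. Each b in $\mathbb{R}^m$ is a linear combination of the columns of A.

3. The columns of A span $\mathbb{R}^m$.

The equivalence of these statements follows from the definition of Ax and what it means for a set of vectors to span $\mathbb{R}^m$.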

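Theorem 25. If A is an $m \times n$ matrix, u and v are in $\mathbb{R}^n$ and c is a scalar, then:

\[ A(u + v) = Au + Av; \tag{3.7} \]

\[ A(cu) = c(Au). \tag{3.8} \]

Proof. For simplicity, take n = 3, $A = \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix}$, and $u, v \in \mathbb{R}^3$. For i = 1, 2, 3, let $u_i$ and $v_i$ be the ith entries in u and v, respectively. To prove statement (3.7), compute A(u + v) as a linear combination of the columns of A using the entries in u + v as weights.

\[ \begin{aligned} A(u + v) &= \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ u_3 + v_3 \end{bmatrix} \\ &= (u_1 + v_1) a_1 + (u_2 + v_2) a_2 + (u_3 + v_3) a_3 \\ &= (u_1 a_1 + u_2 a_2 + u_3 a_3) + (v_1 a_1 + v_2 a_2 + v_3 a_3) \\ &= Au + Av. \end{aligned} \]

To prove statement (3.8), compute A(cu) as a linear combination of the columns of A using the entries in cu as weights.

\[ \begin{aligned} A(cu) &= \begin{bmatrix} a_1 & a_2 & a_3 \end{bmatrix} \begin{bmatrix} cu_1 \\ cu_2 \\ cu_3 \end{bmatrix} = (cu_1) a_1 + (cu_2) a_2 + (cu_3) a_3 \\ &= c(u_1 a_1 + u_2 a_2 + u_3 a_3) \\ &= c(Au). \end{aligned} \]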

3.4 The Inverse of a Matrix

Matrix algebra provides tools for manipulating matrix equations and creating various useful formulas in ways similar to doing ordinary algebra with real numbers.

Recall that the multiplicative inverse of a number such as 5 is 1/5 or $5^{-1}$. This inverse satisfies the equations

\[ 5^{-1} \cdot 5 = 1 \quad \text{and} \quad 5 \cdot 5^{-1} = 1. \]

The matrix generalization requires both equations and avoids the slanted-line notation (for division) because matrix multiplication is not commutative. Furthermore, a full generalization is possible only if the matrices involved are square.

An $n \times n$ matrix A is said to be invertible if there is an $n \times n$ matrix C such that

\[ CA = I \quad \text{and} \quad AC = I \]

where $I = I_n$, the $n \times n$ identity matrix. In this case, C is an inverse of A. In fact, C is uniquely determined by A, because if B were another inverse of A, then B = BI = B(AC) = (BA)C = IC = C. This unique inverse is denoted by $A^{-1}$, so that

\[ A^{-1} A = I \quad \text{and} \quad A A^{-1} = I. \]

A matrix that is not invertible is sometimes called a singular matrix, and an invertible matrix is called a nonsingular matrix.

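Theorem 26. Let

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}. \]

If $ad - bc \neq 0$, then A is invertible and

\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]

If $ad - bc = 0$, then A is not invertible.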
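The 2 × 2 formula translates directly into a few lines of code. The following sketch uses exact fractions; the function name `inverse_2x2` and the sample matrices are chosen only for illustration. It returns `None` when ad − bc = 0, which is the non-invertible case of the theorem.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via the ad - bc formula, or None if singular."""
    (a, b), (c, d) = A
    det = Fraction(a) * d - Fraction(b) * c
    if det == 0:
        return None  # ad - bc = 0: A is not invertible
    return [[d / det, -b / det],
            [-c / det, a / det]]

A = [[3, 4], [5, 6]]
A_inv = inverse_2x2(A)
print(A_inv)  # entries are Fraction objects

# Solving Ax = b with x = A^{-1} b for one right-hand side.
b = [3, 7]
x = [A_inv[0][0] * b[0] + A_inv[0][1] * b[1],
     A_inv[1][0] * b[0] + A_inv[1][1] * b[1]]
print(x)
```

For this A the determinant is −2, and the computed x satisfies both equations of the system Ax = b.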

Theorem 27. If A is an invertible $n \times n$ matrix, then for each b in $\mathbb{R}^n$, the equation Ax = b has the unique solution $x = A^{-1} b$.

Proof. Take any b in $\mathbb{R}^n$. A solution exists because if $A^{-1} b$ is substituted for x, then $Ax = A(A^{-1} b) = (A A^{-1}) b = Ib = b$. So $A^{-1} b$ is a solution. To prove that the solution is unique, show that if u is any solution, then u must in fact be $A^{-1} b$. Indeed, if Au = b, we can multiply both sides by $A^{-1}$ and obtain

\[ A^{-1} A u = A^{-1} b, \qquad Iu = A^{-1} b, \qquad u = A^{-1} b. \]

The formula in theorem 27 is seldom used to solve an equation Ax = b numerically, because row reduction of $\begin{bmatrix} A & b \end{bmatrix}$ is nearly always faster. One possible exception is the $2 \times 2$ case. In this case, mental computations to solve Ax = b are sometimes easier using the formula for $A^{-1}$.

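Theorem 28. (a) If A is an invertible matrix, then $A^{-1}$ is invertible and

\[ (A^{-1})^{-1} = A. \]

(b) If A and B are $n \times n$ invertible matrices, then so is AB, and the inverse of AB is the product of the inverses of A and B in the reverse order. That is,

\[ (AB)^{-1} = B^{-1} A^{-1}. \]

(c) If A is an invertible matrix, then so is $A^T$, and the inverse of $A^T$ is the transpose of $A^{-1}$. That is,

\[ (A^T)^{-1} = (A^{-1})^T. \]

Proof. To verify statement (a), find a matrix C such that

\[ A^{-1} C = I \quad \text{and} \quad C A^{-1} = I. \]

These equations are satisfied with A in place of C. Hence $A^{-1}$ is invertible, and A is its inverse. Next, to prove statement (b), compute

\[ (AB)(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A I A^{-1} = A A^{-1} = I. \]

A similar calculation shows that $(B^{-1} A^{-1})(AB) = I$. For statement (c), use the fact that $(AB)^T = B^T A^T$. We then get $(A^{-1})^T A^T = (A A^{-1})^T = I^T = I$. Similarly, $A^T (A^{-1})^T = I^T = I$. Hence $A^T$ is invertible, and its inverse is $(A^{-1})^T$.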

A factorization of a matrix A is an equation that expresses A as a product of two or more matrices. Whereas matrix multiplication involves a synthesis of data, matrix factorization is an analysis of data. In the language of computer science, the expression of A as a product amounts to a preprocessing of the data in A, organizing that data into two or more parts whose structures are more useful in some way, perhaps more accessible for computation.

3.5.1 The LU Factorization

The LU factorization, described below, is motivated by the fairly common industrial and business problem of solving a sequence of equations, all with the same coefficient matrix:

\[ Ax = b_1, \quad Ax = b_2, \quad \dots, \quad Ax = b_p. \tag{3.9} \]

When A is invertible, one could compute $A^{-1}$ and then compute $A^{-1} b_1$, $A^{-1} b_2$, and so on. However, it is more efficient to solve the first equation in the sequence (3.9) by row reduction and obtain an LU factorization of A at the same time. Thereafter, the remaining equations in sequence (3.9) are solved with the LU factorization.

At first, assume that A is an $m \times n$ matrix that can be row reduced to echelon form without row interchanges. Then A can be written in the form A = LU, where L is an $m \times m$ lower triangular matrix with 1's on the diagonal and U is an $m \times n$ echelon form of A. Such a factorization is called an LU factorization of A. The matrix L is invertible and is called a unit lower triangular matrix.

Before constructing L and U, it is worth seeing why they are so useful. When A = LU, the equation Ax = b can be written as L(Ux) = b. Writing y for Ux, we can find x by solving the pair of equations

\[ Ly = b \tag{3.10} \]

\[ Ux = y. \tag{3.11} \]

First solve Ly = b for y, and then solve Ux = y for x. Each equation is easy to solve because L and U are triangular.

An LU Factorization Algorithm

Suppose A can be reduced to an echelon form U using only row replacements that add a multiple of one row to another row below it. In this case, there exist unit lower triangular elementary matrices $E_1, \dots, E_p$ such that

\[ E_p \cdots E_1 A = U. \tag{3.12} \]

Then

\[ A = (E_p \cdots E_1)^{-1} U = LU \tag{3.13} \]

where

\[ L = (E_p \cdots E_1)^{-1}. \tag{3.14} \]

It can be shown that products and inverses of unit lower triangular matrices are also unit lower triangular. Thus L is unit lower triangular.

Note that the row operations in equation (3.12), which reduce A to U, also reduce the L in equation (3.14) to I, because $E_p \cdots E_1 L = (E_p \cdots E_1)(E_p \cdots E_1)^{-1} = I$. This observation is the key to constructing L.

Definition 15 (Algorithm for an LU Factorization).

1. Reduce A to an echelon form U by a sequence of row replacement operations, if possible.

2. Place entries in L such that the same sequence of row operations reduces L to I.

Step 1 is not always possible, but when it is, the argument above shows that an LU factorization exists. By construction, L will satisfy

\[ (E_p \cdots E_1) L = I \]

using the same $E_1, \dots, E_p$ as in equation (3.12). Thus L will be invertible, by the invertible matrix theorem, with $(E_p \cdots E_1) = L^{-1}$. From (3.12), $L^{-1} A = U$, and A = LU. So step 2 will produce an acceptable L.
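The two steps of the algorithm, followed by the triangular solves Ly = b and Ux = y, can be sketched as below. This is a simplified illustration under extra assumptions that are not part of the text: A is square and invertible, and no zero pivot appears, so no row interchange is needed. The function names and the sample matrix are made up for the example.

```python
from fractions import Fraction

def lu_no_interchange(rows):
    """Return (L, U) with A = LU, using only row replacement operations."""
    n = len(rows)  # assumption: A is n x n
    U = [[Fraction(v) for v in row] for row in rows]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot encountered")
        for i in range(k + 1, n):
            factor = U[i][k] / U[k][k]
            L[i][k] = factor  # the multiplier that the same operation removes from L
            U[i] = [a - factor * b for a, b in zip(U[i], U[k])]
    return L, U

def solve_lu(L, U, b):
    """Solve Ax = b from A = LU: first Ly = b, then Ux = y."""
    n = len(b)
    y = [Fraction(0)] * n
    for i in range(n):                       # forward substitution
        y[i] = Fraction(b[i]) - sum(L[i][j] * y[j] for j in range(i))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):             # back substitution
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]
L, U = lu_no_interchange(A)
x = solve_lu(L, U, [5, -2, 9])
print(L, U, x)
```

Once L and U are available, every further right-hand side in a sequence such as (3.9) only costs one call to `solve_lu`.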

3.6 Subspaces of Rn

Definition 16. A subspace of $\mathbb{R}^n$ is any set H in $\mathbb{R}^n$ that has three properties:

(a) The zero vector is in H.

(b) For each u and v in H, the sum u + v is in H.

(c) For each u in H and each scalar c, the vector cu is in H.

In words, a subspace is closed under addition and scalar multiplication. Subspaces of $\mathbb{R}^n$ usually occur in applications and theory in one of two ways. In both cases, the subspace can be related to a matrix.

Definition 17. The column space of a matrix A is the set Col A of all linear combinations of the columns of A.

If $A = \begin{bmatrix} a_1 & \dots & a_n \end{bmatrix}$, with the columns in $\mathbb{R}^m$, then Col A is the same as Span$\{a_1, \dots, a_n\}$. Note that Col A equals $\mathbb{R}^m$ only when the columns of A span $\mathbb{R}^m$. Otherwise, Col A is only part of $\mathbb{R}^m$.

Definition 18. The null space of a matrix A is the set Nul A of all solutions of the homogeneous equation Ax = 0.

When A has n columns, the solutions of Ax = 0 belong to $\mathbb{R}^n$, and the null space of A is a subset of $\mathbb{R}^n$. In fact, Nul A has the properties of a subspace of $\mathbb{R}^n$.

Theorem 29. The null space of an $m \times n$ matrix A is a subspace of $\mathbb{R}^n$. Equivalently, the set of all solutions of a system Ax = 0 of m homogeneous linear equations in n unknowns is a subspace of $\mathbb{R}^n$.

Proof. The zero vector is in Nul A (because A0 = 0). To show that Nul A satisfies the other two properties required for a subspace, take any u and v in Nul A. That is, suppose Au = 0 and Av = 0. Then, by a property of matrix multiplication,

\[ A(u + v) = Au + Av = 0 + 0 = 0. \]

Thus u + v satisfies Ax = 0, so u + v is in Nul A. Also, for any scalar c,

\[ A(cu) = c(Au) = c(0) = 0, \]

which shows that cu is in Nul A.

To test whether a given vector v is in Nul A, just compute Av to see whether Av is the zero vector. Because Nul A is described by a condition that must be checked for each vector, we say that the null space is defined implicitly. In contrast, the column space is defined explicitly, because the vectors in Col A can be constructed (by linear combinations) from the columns of A. To create an explicit description of Nul A, solve the equation Ax = 0 and write the solution in parametric vector form.

Because a subspace typically contains an infinite number of vectors, some problems involving a subspace are handled best by working with a small finite set of vectors that span the subspace. The smaller the set, the better. It can be shown that the smallest possible spanning set must be linearly independent.

Definition 19. A basis for a subspace H of $\mathbb{R}^n$ is a linearly independent set in H that spans H.

Theorem 30. The pivot columns of a matrix A form a basis for the column

space of A.

Definition 20. The dimension of a nonzero subspace H, denoted by dimH,

is the number of vectors in any basis for H. The dimension of the zero

subspace is defined to be zero.

Definition 21. The rank of a matrix A, denoted by rankA, is the dimension

of the column space of A.

Definition 22. An eigenvector of an $n \times n$ matrix A is a nonzero vector x such that $Ax = \lambda x$ for some scalar $\lambda$. A scalar $\lambda$ is called an eigenvalue of A if there is a nontrivial solution x of $Ax = \lambda x$; such an x is called an eigenvector corresponding to $\lambda$.

A scalar $\lambda$ is an eigenvalue of an $n \times n$ matrix A if and only if the equation

\[ (A - \lambda I)x = 0 \tag{3.15} \]

has a nontrivial solution. The set of all solutions of (3.15) is just the null space of the matrix $A - \lambda I$. So this set is a subspace of $\mathbb{R}^n$ and is called the eigenspace of A corresponding to $\lambda$. The eigenspace consists of the zero vector and all the eigenvectors corresponding to $\lambda$.

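Theorem 31. The eigenvalues of a triangular matrix are the entries of its main diagonal.

Proof. For simplicity, consider the $3 \times 3$ case. If A is upper triangular, then $A - \lambda I$ has the form

\[ \begin{aligned} A - \lambda I &= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix} - \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix} \\ &= \begin{bmatrix} a_{11} - \lambda & a_{12} & a_{13} \\ 0 & a_{22} - \lambda & a_{23} \\ 0 & 0 & a_{33} - \lambda \end{bmatrix}. \end{aligned} \]

The scalar $\lambda$ is an eigenvalue of A if and only if the equation $(A - \lambda I)x = 0$ has a nontrivial solution, that is, if and only if the equation has a free variable. Because of the zero entries in $A - \lambda I$, it is easy to see that $(A - \lambda I)x = 0$ has a free variable if and only if at least one of the entries on the diagonal of $A - \lambda I$ is zero. This happens if and only if $\lambda$ equals one of the entries $a_{11}, a_{22}, a_{33}$ in A.

What does it mean for a matrix A to have the eigenvalue 0? This happens if and only if the equation

\[ Ax = 0x \tag{3.16} \]

has a nontrivial solution. But (3.16) is equivalent to Ax = 0, which has a nontrivial solution if and only if A is not invertible. Thus 0 is an eigenvalue of A if and only if A is not invertible.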

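Theorem 32. If $v_1, \dots, v_r$ are eigenvectors that correspond to distinct eigenvalues $\lambda_1, \dots, \lambda_r$ of an $n \times n$ matrix A, then the set $\{v_1, \dots, v_r\}$ is linearly independent.

Proof. Suppose $\{v_1, \dots, v_r\}$ is linearly dependent. Since $v_1$ is nonzero, this means that one of the vectors in the set is a linear combination of the preceding vectors. Let p be the least index such that $v_{p+1}$ is a linear combination of the preceding (linearly independent) vectors. Then there exist scalars $c_1, \dots, c_p$ such that

\[ c_1 v_1 + \dots + c_p v_p = v_{p+1}. \tag{3.17} \]

Multiplying both sides of (3.17) by A and using the fact that $A v_k = \lambda_k v_k$ for each k, we obtain

\[ c_1 A v_1 + \dots + c_p A v_p = A v_{p+1}, \]

\[ c_1 \lambda_1 v_1 + \dots + c_p \lambda_p v_p = \lambda_{p+1} v_{p+1}. \tag{3.18} \]

Multiplying both sides of (3.17) by $\lambda_{p+1}$ and subtracting the result from (3.18), we have

\[ c_1 (\lambda_1 - \lambda_{p+1}) v_1 + \dots + c_p (\lambda_p - \lambda_{p+1}) v_p = 0. \tag{3.19} \]

Since $\{v_1, \dots, v_p\}$ is linearly independent, the weights in (3.19) are all zero. But none of the factors $\lambda_i - \lambda_{p+1}$ are zero, because the eigenvalues are distinct. Hence $c_i = 0$ for i = 1, ..., p. But then (3.17) says that $v_{p+1} = 0$, which is impossible for an eigenvector. Hence $\{v_1, \dots, v_r\}$ cannot be linearly dependent and therefore must be linearly independent.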

Similarity

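The next theorem illustrates one use of the characteristic polynomial, and it provides the foundation for several iterative methods that approximate eigenvalues. If A and B are $n \times n$ matrices, then A is similar to B if there is an invertible matrix P such that $P^{-1} A P = B$, or equivalently, $A = P B P^{-1}$. Writing Q for $P^{-1}$, we have $Q^{-1} B Q = A$. So B is also similar to A, and we say simply that A and B are similar. Changing A into $P^{-1} A P$ is called a similarity transformation.

Theorem 33. If $n \times n$ matrices A and B are similar, then they have the same characteristic polynomial and hence the same eigenvalues.

Proof. If $B = P^{-1} A P$, then

\[ B - \lambda I = P^{-1} A P - \lambda P^{-1} P = P^{-1} (AP - \lambda P) = P^{-1} (A - \lambda I) P. \]

Using the multiplicative property of determinants, we compute

\[ \det(B - \lambda I) = \det\left[ P^{-1} (A - \lambda I) P \right] = \det(P^{-1}) \cdot \det(A - \lambda I) \cdot \det(P). \tag{3.20} \]

Since $\det(P^{-1}) \cdot \det(P) = \det(P^{-1} P) = \det I = 1$, we see from equation (3.20) that $\det(B - \lambda I) = \det(A - \lambda I)$.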

3.8 Diagonalization

In many cases, the eigenvalue-eigenvector information contained within a matrix A can be displayed in a useful factorization of the form $A = P D P^{-1}$ where D is a diagonal matrix. In this section, the factorization enables us to compute $A^k$ quickly for large values of k, a fundamental idea in several applications of linear algebra.

A square matrix A is said to be diagonalizable if A is similar to a diagonal matrix, that is, if $A = P D P^{-1}$ for some invertible matrix P and some diagonal matrix D. The next theorem gives a characterization of diagonalizable matrices and tells how to construct a suitable factorization.

Theorem 34 (The diagonalization theorem). An $n \times n$ matrix A is diagonalizable if and only if A has n linearly independent eigenvectors.

In fact, $A = P D P^{-1}$, with D a diagonal matrix, if and only if the columns of P are n linearly independent eigenvectors of A. In this case, the diagonal entries of D are eigenvalues of A that correspond, respectively, to the eigenvectors in P.

In other words, A is diagonalizable if and only if there are enough eigenvectors to form a basis of $\mathbb{R}^n$. We call such a basis an eigenvector basis of $\mathbb{R}^n$.

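Proof. First, observe that if P is any $n \times n$ matrix with columns $v_1, \dots, v_n$, and if D is any diagonal matrix with diagonal entries $\lambda_1, \dots, \lambda_n$, then

\[ AP = A \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix} = \begin{bmatrix} A v_1 & A v_2 & \dots & A v_n \end{bmatrix} \tag{3.21} \]

while

\[ PD = P \begin{bmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & \lambda_n \end{bmatrix} = \begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \dots & \lambda_n v_n \end{bmatrix}. \tag{3.22} \]

Now suppose A is diagonalizable and $A = P D P^{-1}$. Then, right-multiplying this relation by P, we have AP = PD. In this case, equations (3.21) and (3.22) imply that

\[ \begin{bmatrix} A v_1 & A v_2 & \dots & A v_n \end{bmatrix} = \begin{bmatrix} \lambda_1 v_1 & \lambda_2 v_2 & \dots & \lambda_n v_n \end{bmatrix}. \tag{3.23} \]

Equating columns, we find that

\[ A v_1 = \lambda_1 v_1, \quad A v_2 = \lambda_2 v_2, \quad \dots, \quad A v_n = \lambda_n v_n. \tag{3.24} \]

Since P is invertible, its columns $v_1, \dots, v_n$ must be linearly independent. Also, since these columns are nonzero, the equations in (3.24) show that $\lambda_1, \dots, \lambda_n$ are eigenvalues and $v_1, \dots, v_n$ are corresponding eigenvectors. This argument proves the "only if" parts of the first and second statements, along with the third statement, of the theorem.

Finally, given any n eigenvectors $v_1, \dots, v_n$, use them to construct the columns of P and use the corresponding eigenvalues $\lambda_1, \dots, \lambda_n$ to construct D. By equations (3.21)-(3.23), AP = PD. This is true without any condition on the eigenvectors. If, in fact, the eigenvectors are linearly independent, then P is invertible, and AP = PD implies that $A = P D P^{-1}$.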
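The use of $A = P D P^{-1}$ for computing powers can be illustrated with a small example. The matrix A, its eigenvectors (1, −1) and (1, −2) with eigenvalues 5 and 3, and all helper names below are chosen for this sketch and do not come from the text. Since $A^k = P D^k P^{-1}$, only the diagonal entries need to be raised to the power k.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_pow(A, k):
    """A^k by repeated multiplication, used here only as a cross-check."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = mat_mul(result, A)
    return result

A = [[7, 2], [-4, 1]]
P = [[1, 1], [-1, -2]]      # columns are eigenvectors of A
P_inv = [[2, 1], [-1, -1]]  # inverse of P, computed by hand
eigenvalues = [5, 3]        # matching the columns of P
D = [[5, 0], [0, 3]]

def diag_power(k):
    """A^k computed as P D^k P^{-1}."""
    D_k = [[eigenvalues[0] ** k, 0], [0, eigenvalues[1] ** k]]
    return mat_mul(mat_mul(P, D_k), P_inv)

print(mat_mul(A, P) == mat_mul(P, D))   # AP = PD
print(diag_power(6) == mat_pow(A, 6))   # both routes to A^6 agree
```

The check AP = PD is exactly the relation used in the proof; the power formula then follows because the inner factors $P^{-1} P$ cancel.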

3.9.1 The Inner Product

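If u and v are vectors in $\mathbb{R}^n$, then we regard u and v as $n \times 1$ matrices. The transpose $u^T$ is a $1 \times n$ matrix, and the matrix product $u^T v$ is a $1 \times 1$ matrix, which we write as a single real number (a scalar) without brackets. The number $u^T v$ is called the inner product of u and v, and is often written $u \cdot v$. This inner product is also referred to as the dot product. If

\[ u = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad v = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \]

then the inner product of u and v is

\[ \begin{bmatrix} u_1 & u_2 & \dots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1 v_1 + u_2 v_2 + \dots + u_n v_n. \]

Theorem 35. Let u, v and w be vectors in $\mathbb{R}^n$. Then

(a) $u \cdot v = v \cdot u$.

(b) $(u + v) \cdot w = u \cdot w + v \cdot w$.

If v is in $\mathbb{R}^n$, with entries $v_1, \dots, v_n$, then the square root of $v \cdot v$ is defined because $v \cdot v$ is nonnegative.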

Definition 23. The length (or the norm) of v is the nonnegative scalar $\|v\|$ defined by

\[ \|v\| = \sqrt{v \cdot v} = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2} \quad \text{and} \quad \|v\|^2 = v \cdot v. \]

Suppose v is in $\mathbb{R}^2$, say $v = \begin{bmatrix} a \\ b \end{bmatrix}$. If we identify v with a geometric point in the plane, as usual, then $\|v\|$ coincides with the standard notion of the length of the line segment from the origin to v. This follows from the Pythagorean Theorem applied to a triangle.

A similar calculation with the diagonal of a rectangular box shows that the definition of length of a vector v in $\mathbb{R}^3$ coincides with the usual notion of length.

For any scalar c, the length of cv is |c| times the length of v. That is,

\[ \|cv\| = |c| \, \|v\|. \]

A vector whose length is 1 is called a unit vector. If we divide a nonzero vector v by its length, that is, multiply by $1/\|v\|$, we obtain a unit vector u, because the length of u is $(1/\|v\|)\|v\| = 1$. The process of creating u from v is sometimes called normalizing v, and we say that u is in the same direction as v.

Distance in Rn

Recall that if a and b are real numbers, the distance on the number line between a and b is the number |a − b|. This definition of distance in $\mathbb{R}$ has a direct analogue in $\mathbb{R}^n$.

Definition 24. For u and v in $\mathbb{R}^n$, the distance between u and v, written as dist(u, v), is the length of the vector u − v. That is,

\[ \operatorname{dist}(u, v) = \|u - v\|. \]

In $\mathbb{R}^2$ and $\mathbb{R}^3$, this definition of distance coincides with the usual formulas for the Euclidean distance between two points.

Consider $\mathbb{R}^2$ or $\mathbb{R}^3$ and two lines through the origin determined by vectors u and v. The two lines are geometrically perpendicular if and only if the distance from u to v is the same as the distance from u to −v.

Definition 25. Two vectors u and v in $\mathbb{R}^n$ are orthogonal if $u \cdot v = 0$.

Theorem 36 (The Pythagorean Theorem). Two vectors u and v are orthogonal if and only if $\|u + v\|^2 = \|u\|^2 + \|v\|^2$.

Proof. $\|u + v\|^2 = (u + v) \cdot (u + v) = u \cdot u + u \cdot v + v \cdot u + v \cdot v = \|u\|^2 + \|v\|^2 + 2\, u \cdot v$. The right-hand side equals $\|u\|^2 + \|v\|^2$ if and only if $u \cdot v = 0$, that is, if and only if u and v are orthogonal.
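The inner product, length, distance and the Pythagorean relation are easy to check numerically. The following sketch uses plain lists and floating-point arithmetic; the helper names and the two sample vectors are assumptions made for this illustration.

```python
import math

def dot(u, v):
    """Inner product u . v = u1*v1 + ... + un*vn."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    """Length of v, the square root of v . v."""
    return math.sqrt(dot(v, v))

def dist(u, v):
    """Distance between u and v, the length of u - v."""
    return norm([a - b for a, b in zip(u, v)])

def normalize(v):
    """Unit vector in the same direction as the nonzero vector v."""
    length = norm(v)
    return [a / length for a in v]

u = [3, 4, 0]
v = [-4, 3, 5]
u_plus_v = [a + b for a, b in zip(u, v)]
minus_v = [-b for b in v]

print(dot(u, v))                                      # 0, so u and v are orthogonal
print(norm(u_plus_v) ** 2, norm(u) ** 2 + norm(v) ** 2)  # Pythagorean relation
print(dist(u, v), dist(u, minus_v))                   # equal distances
print(norm(normalize(v)))                             # length 1 up to rounding
```

Because u · v = 0 here, the squared lengths add up as in the Pythagorean theorem, and u is equally far from v and from −v.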


Orthogonal Complements

If a vector z is orthogonal to every vector in a subspace W of $\mathbb{R}^n$, then z is said to be orthogonal to W. The set of all vectors z that are orthogonal to W is called the orthogonal complement of W and is denoted by $W^{\perp}$.

Theorem 37. Let A be an $m \times n$ matrix. The orthogonal complement of the row space of A is the null space of A, and the orthogonal complement of the column space of A is the null space of $A^T$:

\[ (\operatorname{Row} A)^{\perp} = \operatorname{Nul} A \quad \text{and} \quad (\operatorname{Col} A)^{\perp} = \operatorname{Nul} A^T. \]

Proof. The row-column rule for computing Ax shows that if x is in Nul A, then x is orthogonal to each row of A. Since the rows of A span the row space, x is orthogonal to Row A. Conversely, if x is orthogonal to Row A, then x is certainly orthogonal to each row of A, and hence Ax = 0. This proves the first statement of the theorem. Since this statement is true for any matrix, it is true for $A^T$. That is, the orthogonal complement of the row space of $A^T$ is the null space of $A^T$. This proves the second statement, because Row $A^T$ = Col A.

A set of vectors $\{u_1, \dots, u_p\}$ in $\mathbb{R}^n$ is said to be an orthogonal set if each pair of distinct vectors from the set is orthogonal, that is, if $u_i \cdot u_j = 0$ whenever $i \neq j$.

Theorem 38. If $S = \{u_1, \dots, u_p\}$ is an orthogonal set of nonzero vectors in $\mathbb{R}^n$, then S is linearly independent and hence is a basis for the subspace spanned by S.

Proof. If $0 = c_1 u_1 + \dots + c_p u_p$ for some scalars $c_1, \dots, c_p$, then

\[ \begin{aligned} 0 = 0 \cdot u_1 &= (c_1 u_1 + c_2 u_2 + \dots + c_p u_p) \cdot u_1 \\ &= c_1 (u_1 \cdot u_1) + c_2 (u_2 \cdot u_1) + \dots + c_p (u_p \cdot u_1) \\ &= c_1 (u_1 \cdot u_1), \end{aligned} \]

because $u_1$ is orthogonal to $u_2, \dots, u_p$. Since $u_1$ is nonzero, $u_1 \cdot u_1$ is not zero and so $c_1 = 0$. Similarly, $c_2, \dots, c_p$ must be zero. Thus S is linearly independent.

Definition 26. An orthogonal basis for a subspace W of $\mathbb{R}^n$ is a basis for W that is also an orthogonal set.

The next theorem suggests why an orthogonal basis is much nicer than other bases. The weights in a linear combination can be computed easily.

Theorem 39. Let $\{u_1, \dots, u_p\}$ be an orthogonal basis for a subspace W of $\mathbb{R}^n$. For each y in W, the weights in the linear combination

\[ y = c_1 u_1 + \dots + c_p u_p \]

are given by

\[ c_j = \frac{y \cdot u_j}{u_j \cdot u_j} \qquad (j = 1, \dots, p). \]

Proof. As in the preceding proof, the orthogonality of $\{u_1, \dots, u_p\}$ shows that

\[ y \cdot u_1 = (c_1 u_1 + c_2 u_2 + \dots + c_p u_p) \cdot u_1 = c_1 (u_1 \cdot u_1). \]

Since $u_1 \cdot u_1$ is not zero, the equation can be solved for $c_1$. To find $c_j$ for j = 2, ..., p, compute $y \cdot u_j$ and solve for $c_j$.
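The weight formula of theorem 39 needs only inner products, with no row reduction. The sketch below illustrates this with exact fractions; the orthogonal basis of $\mathbb{R}^3$, the vector y and the function names are chosen for the example.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def weights(y, basis):
    """Weights c_j = (y . u_j) / (u_j . u_j) for an orthogonal basis."""
    return [Fraction(dot(y, u)) / Fraction(dot(u, u)) for u in basis]

# An orthogonal basis of R^3 (pairwise inner products are zero).
u1 = [3, 1, 1]
u2 = [-1, 2, 1]
u3 = [Fraction(-1, 2), -2, Fraction(7, 2)]
y = [6, 1, -8]

c = weights(y, [u1, u2, u3])
print(c)

# Rebuild y from the weights as a check.
rebuilt = [c[0] * a + c[1] * b + c[2] * d for a, b, d in zip(u1, u2, u3)]
print(rebuilt)
```

The formula is valid only because the basis is orthogonal; for a general basis the weights would have to be found by solving a linear system.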

Orthonormal Sets

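A set $\{u_1, \dots, u_p\}$ is an orthonormal set if it is an orthogonal set of unit vectors. If W is the subspace spanned by such a set, then $\{u_1, \dots, u_p\}$ is an orthonormal basis for W, since the set is automatically linearly independent, by theorem 38.

The simplest example of an orthonormal set is the standard basis $\{e_1, \dots, e_n\}$ for $\mathbb{R}^n$. Any nonempty subset of $\{e_1, \dots, e_n\}$ is orthonormal, too.

Theorem 40. An $m \times n$ matrix U has orthonormal columns if and only if

\[ U^T U = I. \]

Proof. To simplify notation, suppose that U has only three columns, each a vector in $\mathbb{R}^m$. The proof of the general case is essentially the same. Let $U = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}$ and compute

\[ U^T U = \begin{bmatrix} u_1^T \\ u_2^T \\ u_3^T \end{bmatrix} \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix} = \begin{bmatrix} u_1^T u_1 & u_1^T u_2 & u_1^T u_3 \\ u_2^T u_1 & u_2^T u_2 & u_2^T u_3 \\ u_3^T u_1 & u_3^T u_2 & u_3^T u_3 \end{bmatrix}. \tag{3.25} \]

The entries in the matrix at the right are inner products, using transpose notation. The columns of U are orthogonal if and only if

\[ u_1^T u_2 = u_2^T u_1 = 0, \quad u_1^T u_3 = u_3^T u_1 = 0, \quad u_2^T u_3 = u_3^T u_2 = 0. \tag{3.26} \]

The columns of U all have unit length if and only if

\[ u_1^T u_1 = 1, \quad u_2^T u_2 = 1, \quad u_3^T u_3 = 1. \tag{3.27} \]

The theorem follows immediately from (3.25)-(3.27).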

Theorem 41. Let U be an $m \times n$ matrix with orthonormal columns, and let x and y be in $\mathbb{R}^n$. Then

(a) $\|Ux\| = \|x\|$

(b) $(Ux) \cdot (Uy) = x \cdot y$

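The orthogonal projection of a point in $\mathbb{R}^2$ onto a line through the origin has an important analogue in $\mathbb{R}^n$. Given a vector y and a subspace W in $\mathbb{R}^n$, there is a vector $\hat{y}$ in W such that (1) $\hat{y}$ is the unique vector in W for which $y - \hat{y}$ is orthogonal to W, and (2) $\hat{y}$ is the unique vector in W closest to y. These properties of $\hat{y}$ provide the key to finding least-squares solutions of linear systems.

To prepare for the first theorem, observe that whenever a vector y is written as a linear combination of vectors $u_1, \dots, u_n$ in $\mathbb{R}^n$, the terms in the sum for y can be grouped into two parts so that y can be written as

\[ y = z_1 + z_2 \]

where $z_1$ is a linear combination of some of the $u_i$ and $z_2$ is a linear combination of the rest of the $u_i$. The idea is particularly useful when $\{u_1, \dots, u_n\}$ is an orthogonal basis.

Theorem 42 (The Orthogonal Decomposition Theorem). Let W be a subspace of $\mathbb{R}^n$. Then each y in $\mathbb{R}^n$ can be written uniquely in the form

\[ y = \hat{y} + z \tag{3.28} \]

where $\hat{y}$ is in W and z is in $W^{\perp}$. In fact, if $\{u_1, \dots, u_p\}$ is any orthogonal basis of W, then

\[ \hat{y} = \frac{y \cdot u_1}{u_1 \cdot u_1} u_1 + \dots + \frac{y \cdot u_p}{u_p \cdot u_p} u_p \tag{3.29} \]

and $z = y - \hat{y}$.

The vector $\hat{y}$ in (3.28) is called the orthogonal projection of y onto W and often is written as $\operatorname{proj}_W y$.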

Proof. Let $\{u_1, \dots, u_p\}$ be any orthogonal basis for W, and define $\hat{y}$ by (3.29). Then $\hat{y}$ is in W because $\hat{y}$ is a linear combination of the basis $u_1, \dots, u_p$. Let $z = y - \hat{y}$. Since $u_1$ is orthogonal to $u_2, \dots, u_p$, it follows from (3.29) that

\[ z \cdot u_1 = (y - \hat{y}) \cdot u_1 = y \cdot u_1 - \frac{y \cdot u_1}{u_1 \cdot u_1} (u_1 \cdot u_1) - 0 - \dots - 0 = y \cdot u_1 - y \cdot u_1 = 0. \]

Thus z is orthogonal to $u_1$. Similarly, z is orthogonal to each $u_j$ in the basis for W. Hence z is orthogonal to every vector in W. That is, z is in $W^{\perp}$.

To show that the decomposition in (3.28) is unique, suppose y can also be written as $y = \hat{y}_1 + z_1$ with $\hat{y}_1$ in W and $z_1$ in $W^{\perp}$. Then $\hat{y} + z = \hat{y}_1 + z_1$, and so

\[ \hat{y} - \hat{y}_1 = z_1 - z. \]

This equality shows that the vector $v = \hat{y} - \hat{y}_1$ is in W and in $W^{\perp}$. Hence $v \cdot v = 0$, which shows that v = 0. This proves that $\hat{y} = \hat{y}_1$ and also $z_1 = z$.

The uniqueness of the decomposition (3.28) shows that the orthogonal projection $\hat{y}$ depends only on W and not on the particular basis used in (3.29).

If {u1 , ..., up } is an orthogonal basis for W and if y happens to be in W,

then the formula for projW y is exactly the same as the representation of y

given in theorem 39.

Theorem 43 (The Best Approximation Theorem). Let W be a subspace of $\mathbb{R}^n$, let y be any vector in $\mathbb{R}^n$, and let $\hat{y}$ be the orthogonal projection of y onto W. Then $\hat{y}$ is the closest point in W to y, in the sense that

\[ \|y - \hat{y}\| < \|y - v\| \tag{3.30} \]

for all v in W distinct from $\hat{y}$.

The vector $\hat{y}$ in theorem 43 is called the best approximation to y by elements of W. The distance from y to v, given by $\|y - v\|$, can be regarded as the "error" of using v in place of y. Theorem 43 says that this error is minimized when $v = \hat{y}$.

Inequality (3.30) leads to a new proof that $\hat{y}$ does not depend on the particular orthogonal basis used to compute it. If a different orthogonal basis for W were used to construct an orthogonal projection of y, then this projection would also be the closest point in W to y, namely $\hat{y}$.

Proof. Take v in W distinct from $\hat{y}$. Then $\hat{y} - v$ is in W. By the orthogonal decomposition theorem, $y - \hat{y}$ is orthogonal to W. In particular, $y - \hat{y}$ is orthogonal to $\hat{y} - v$. Since

\[ y - v = (y - \hat{y}) + (\hat{y} - v), \]

the Pythagorean theorem gives

\[ \|y - v\|^2 = \|y - \hat{y}\|^2 + \|\hat{y} - v\|^2. \]

Now $\|\hat{y} - v\|^2 > 0$ because $\hat{y} - v \neq 0$, and so inequality (3.30) follows immediately.

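Theorem 44. If $\{u_1, \dots, u_p\}$ is an orthonormal basis for a subspace W of $\mathbb{R}^n$, then

\[ \operatorname{proj}_W y = (y \cdot u_1) u_1 + (y \cdot u_2) u_2 + \dots + (y \cdot u_p) u_p. \tag{3.31} \]

If $U = \begin{bmatrix} u_1 & u_2 & \dots & u_p \end{bmatrix}$, then

\[ \operatorname{proj}_W y = U U^T y \quad \text{for all } y \in \mathbb{R}^n. \tag{3.32} \]

Proof. Formula (3.31) follows immediately from (3.29). Also, (3.31) shows that $\operatorname{proj}_W y$ is a linear combination of the columns of U using the weights $y \cdot u_1, y \cdot u_2, \dots, y \cdot u_p$. The weights can be written as $u_1^T y, u_2^T y, \dots, u_p^T y$, showing that they are the entries in $U^T y$ and justifying (3.32).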

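The Gram-Schmidt process is a simple algorithm for producing an orthogonal or orthonormal basis for any nonzero subspace of $\mathbb{R}^n$.

Theorem 45 (The Gram-Schmidt Process). Given a basis $\{x_1, \dots, x_p\}$ for a nonzero subspace W of $\mathbb{R}^n$, define

\[ \begin{aligned} v_1 &= x_1 \\ v_2 &= x_2 - \frac{x_2 \cdot v_1}{v_1 \cdot v_1} v_1 \\ v_3 &= x_3 - \frac{x_3 \cdot v_1}{v_1 \cdot v_1} v_1 - \frac{x_3 \cdot v_2}{v_2 \cdot v_2} v_2 \\ &\;\;\vdots \\ v_p &= x_p - \frac{x_p \cdot v_1}{v_1 \cdot v_1} v_1 - \frac{x_p \cdot v_2}{v_2 \cdot v_2} v_2 - \dots - \frac{x_p \cdot v_{p-1}}{v_{p-1} \cdot v_{p-1}} v_{p-1}. \end{aligned} \]

Then $\{v_1, \dots, v_p\}$ is an orthogonal basis for W. In addition, Span$\{v_1, \dots, v_k\}$ = Span$\{x_1, \dots, x_k\}$ for $1 \le k \le p$.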

Proof. For $1 \le k \le p$, let $W_k = \operatorname{Span}\{x_1, \dots, x_k\}$. Set $v_1 = x_1$, so that $\operatorname{Span}\{v_1\} = \operatorname{Span}\{x_1\}$. Suppose, for some $k < p$, we have constructed $v_1, \dots, v_k$ so that $\{v_1, \dots, v_k\}$ is an orthogonal basis for $W_k$. Define

$$v_{k+1} = x_{k+1} - \operatorname{proj}_{W_k} x_{k+1}. \tag{3.33}$$

By the orthogonal decomposition theorem, $v_{k+1}$ is orthogonal to $W_k$. Note that $\operatorname{proj}_{W_k} x_{k+1}$ is in $W_k$ and hence also in $W_{k+1}$. Since $x_{k+1}$ is in $W_{k+1}$, so is $v_{k+1}$. Furthermore, $v_{k+1} \neq 0$ because $x_{k+1}$ is not in $W_k = \operatorname{Span}\{x_1, \dots, x_k\}$. Hence $\{v_1, \dots, v_{k+1}\}$ is an orthogonal set of nonzero vectors in the $(k+1)$-dimensional space $W_{k+1}$. By the basis theorem, this set is an orthogonal basis for $W_{k+1}$. Hence $W_{k+1} = \operatorname{Span}\{v_1, \dots, v_{k+1}\}$. When $k + 1 = p$, the process stops.
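The construction in Theorem 45 can be sketched in a few lines of code. This is a minimal illustration, not a numerically robust implementation: it assumes the input vectors are linearly independent and performs no check for that. The function name `gram_schmidt` and the example vectors `xs` are assumptions made for this sketch.

```python
# Sketch of the Gram-Schmidt process from Theorem 45 (classical form).
# Assumes xs is a linearly independent list of vectors; no check is made.

def dot(a, b):
    """Dot product of two vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(xs):
    """Return orthogonal vectors v_1..v_p spanning the same space as x_1..x_p."""
    vs = []
    for x in xs:
        v = list(x)
        for w in vs:
            # Subtract the component of x along the earlier vector w:
            # coefficient (x_k . v_j) / (v_j . v_j), as in the theorem.
            coeff = dot(x, w) / dot(w, w)
            v = [vi - coeff * wi for vi, wi in zip(v, w)]
        vs.append(v)
    return vs

# Illustrative basis of a 3-dimensional subspace of R^4.
xs = [
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
vs = gram_schmidt(xs)
```

For this input the second vector works out to $(-3/4, 1/4, 1/4, 1/4)$ and the third to $(0, -2/3, 1/3, 1/3)$, up to rounding, and the three output vectors are pairwise orthogonal.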


Theorem 45 shows that any nonzero subspace $W$ of $\mathbb{R}^n$ has an orthogonal basis, because an ordinary basis $\{x_1, \dots, x_p\}$ is always available and the Gram-Schmidt process depends only on the existence of orthogonal projections onto subspaces of $W$ that already have orthogonal bases.

Orthonormal Bases

An orthonormal basis is constructed easily from an orthogonal basis $\{v_1, \dots, v_p\}$: simply normalize all the $v_k$, that is, divide each $v_k$ by its length $\|v_k\|$. When working problems by hand, this is easier than normalizing each $v_k$ as soon as it is found.
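The normalization step can be illustrated as follows. This is only a sketch: the helper `normalize` and the two example vectors are assumptions made for illustration, and the input vectors are assumed to be nonzero and already orthogonal.

```python
import math

def normalize(v):
    """Scale a nonzero vector to unit length by dividing by its norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Illustrative orthogonal (but not orthonormal) basis of a plane in R^3.
orthogonal = [[3.0, 6.0, 0.0], [0.0, 0.0, 2.0]]
orthonormal = [normalize(v) for v in orthogonal]

# Each normalized vector has length 1; orthogonality is unaffected by scaling.
lengths = [math.sqrt(sum(x * x for x in u)) for u in orthonormal]
```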

Definition 27. If $A$ is $m \times n$ and $b$ is in $\mathbb{R}^m$, a least-squares solution of $Ax = b$ is an $\hat{x}$ in $\mathbb{R}^n$ such that

$$\|b - A\hat{x}\| \le \|b - Ax\|$$

for all $x$ in $\mathbb{R}^n$.

No matter what $x$ we select, the vector $Ax$ will necessarily be in the column space, $\operatorname{Col} A$. So we seek an $x$ that makes $Ax$ the closest point in $\operatorname{Col} A$ to $b$.

Given $A$ and $b$ as above, apply the best approximation theorem to the subspace $\operatorname{Col} A$. Let

$$\hat{b} = \operatorname{proj}_{\operatorname{Col} A} b.$$

Because $\hat{b}$ is in the column space of $A$, the equation $Ax = \hat{b}$ is consistent, and there is an $\hat{x}$ in $\mathbb{R}^n$ such that

$$A\hat{x} = \hat{b}. \tag{3.34}$$

Since $\hat{b}$ is the closest point in $\operatorname{Col} A$ to $b$, a vector $\hat{x}$ is a least-squares solution of $Ax = b$ if and only if $\hat{x}$ satisfies (3.34). Such an $\hat{x}$ in $\mathbb{R}^n$ is a list of weights that will build $\hat{b}$ out of the columns of $A$.

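Suppose $\hat{x}$ satisfies $A\hat{x} = \hat{b}$. By the orthogonal decomposition theorem, the projection $\hat{b}$ has the property that $b - \hat{b}$ is orthogonal to $\operatorname{Col} A$, so $b - A\hat{x}$ is orthogonal to each column of $A$. If $a_j$ is any column of $A$, then $a_j \cdot (b - A\hat{x}) = 0$, and $a_j^T (b - A\hat{x}) = 0$. Since each $a_j^T$ is a row of $A^T$,

$$A^T (b - A\hat{x}) = 0. \tag{3.35}$$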


Thus

$$A^T b - A^T A \hat{x} = 0 \quad\Longrightarrow\quad A^T A \hat{x} = A^T b.$$

These calculations show that each least-squares solution of $Ax = b$ satisfies the equation

$$A^T A x = A^T b. \tag{3.36}$$

The matrix equation (3.36) represents a system of equations called the normal equations for $Ax = b$. A solution of (3.36) is often denoted by $\hat{x}$.

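The set of least-squares solutions of $Ax = b$ coincides with the nonempty set of solutions of the normal equations $A^T A x = A^T b$.

Proof. As shown above, the set of least-squares solutions is nonempty and each least-squares solution $\hat{x}$ satisfies the normal equations. Conversely, suppose $\hat{x}$ satisfies $A^T A \hat{x} = A^T b$. Then $\hat{x}$ satisfies (3.35) above, which shows that $b - A\hat{x}$ is orthogonal to the rows of $A^T$ and hence is orthogonal to the columns of $A$. Since the columns of $A$ span $\operatorname{Col} A$, the vector $b - A\hat{x}$ is orthogonal to all of $\operatorname{Col} A$. Hence the equation

$$b = A\hat{x} + (b - A\hat{x})$$

is a decomposition of $b$ into the sum of a vector in $\operatorname{Col} A$ and a vector orthogonal to $\operatorname{Col} A$. By the uniqueness of the orthogonal decomposition, $A\hat{x}$ must be the orthogonal projection of $b$ onto $\operatorname{Col} A$. That is, $A\hat{x} = \hat{b}$, and $\hat{x}$ is a least-squares solution.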
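To make the normal equations (3.36) concrete, here is a minimal sketch that forms $A^T A$ and $A^T b$ for a small system and solves for the least-squares solution. The matrix `A`, the vector `b`, and all helper names are assumptions chosen for illustration. The 2-by-2 solver uses Cramer's rule and assumes $A^T A$ is invertible, which holds when the columns of $A$ are linearly independent.

```python
# Sketch: solve the normal equations A^T A x = A^T b for a 3x2 example.
# A and b are illustrative data; solve_2x2 assumes A^T A is invertible.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    """Matrix product M N for matrices stored as lists of rows."""
    cols = transpose(N)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in M]

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def solve_2x2(M, r):
    """Solve M x = r for a 2x2 matrix M using Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    x0 = (r[0] * M[1][1] - M[0][1] * r[1]) / det
    x1 = (M[0][0] * r[1] - r[0] * M[1][0]) / det
    return [x0, x1]

A = [[4.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [2.0, 0.0, 11.0]

At = transpose(A)
AtA = matmul(At, A)            # left-hand side of the normal equations
Atb = matvec(At, b)            # right-hand side of the normal equations
x_hat = solve_2x2(AtA, Atb)    # least-squares solution

# The residual b - A x_hat should be orthogonal to every column of A,
# which is exactly equation (3.35).
residual = [bi - ci for bi, ci in zip(b, matvec(A, x_hat))]
check = matvec(At, residual)
```

For this data $A^T A = \begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix}$ and $A^T b = (19, 11)$, giving $\hat{x} = (1, 2)$. The system $Ax = b$ itself is inconsistent, so the residual $b - A\hat{x} = (-2, -4, 8)$ is nonzero, yet `check` is the zero vector.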

1. Determinants

2. Orthogonal Matrices

