Afternotes on Numerical Analysis

Being a series of lectures on elementary numerical analysis presented at the University of Maryland at College Park and recorded after the fact by

G. W. Stewart
University of Maryland
College Park, MD
Contents

Preface

Nonlinear Equations
  Lecture 1
    By the dawn's early light
    Interval bisection
    Relative error
  Lecture 2
    Newton's method
    Reciprocals and square roots
    Local convergence analysis
    Slow death
  Lecture 3
    A quasi-Newton method
    Rates of convergence
    Iterating for a fixed point
    Multiple zeros
    Ending with a proposition
  Lecture 4
    The secant method
    Convergence
    Rate of convergence
    Multipoint methods
    Muller's method
    The linear-fractional method
  Lecture 5
    A hybrid method
    Errors, accuracy, and condition numbers

Floating-Point Arithmetic
  Lecture 6
    Floating-point numbers
    Overflow and underflow
    Rounding error
    Floating-point arithmetic
  Lecture 7
    Computing sums
    Backward error analysis
    Perturbation analysis
    Cheap and chippy chopping
  Lecture 8
    Cancellation
    The quadratic equation
    That fatal bit of rounding error
    Envoi

Linear Equations
  Lecture 9
    Matrices, vectors, and scalars
    Operations with matrices
    Rank-one matrices
    Partitioned matrices
  Lecture 10
    The theory of linear systems
    Computational generalities
    Triangular systems
    Operation counts
  Lecture 11
    Memory considerations
    Row-oriented algorithms
    A column-oriented algorithm
    General observations on row and column orientation
    Basic linear algebra subprograms
  Lecture 12
    Positive-definite matrices
    The Cholesky decomposition
    Economics
  Lecture 13
    Inner-product form of the Cholesky algorithm
    Gaussian elimination
  Lecture 14
    Pivoting
    BLAS
    Upper Hessenberg and tridiagonal systems
  Lecture 15
    Vector norms
    Matrix norms
    Relative error
    Sensitivity of linear systems
  Lecture 16
    The condition of a linear system
    Artificial ill-conditioning
    Rounding error and Gaussian elimination
    Comments on the error analysis
  Lecture 17
    Introduction to a project
    More on norms
    The wonderful residual
    Matrices with known condition numbers
    Invert and multiply
    Cramer's rule
    Submission

Polynomial Interpolation
  Lecture 18
    Quadratic interpolation
    Shifting
    Polynomial interpolation
    Lagrange polynomials and existence
    Uniqueness
  Lecture 19
    Synthetic division
    The Newton form of the interpolant
    Evaluation
    Existence and uniqueness
    Divided differences
  Lecture 20
    Error in interpolation
    Error bounds
    Convergence
    Chebyshev points

Numerical Integration
  Lecture 21
    Numerical integration
    Change of intervals
    The trapezoidal rule
    The composite trapezoidal rule
    Newton-Cotes formulas
    Undetermined coefficients and Simpson's rule
  Lecture 22
    The Composite Simpson rule
    Errors in Simpson's rule
    Treatment of singularities
    Gaussian quadrature: The idea
  Lecture 23
    Gaussian quadrature: The setting
Lecture 1

Nonlinear Equations
By the Dawn's Early Light
Interval Bisection
Relative Error
When you are given a problem like this, it is usually a good idea to ask about where it came from before working hard to solve it.
The equation may not have a solution. Since sin x cos x assumes a maximum of 1/2 at x = π/4, there will be no solution if

    d > V0^2/g.

Solutions, when they exist, are not unique. If there is one solution, then there are infinitely many, since sin and cos are periodic. These solutions represent a rotation of the cannon elevation through a full circle. Any resolution of the problem has to take these spurious solutions into account.

If d < V0^2/g, and x̄ < π/4 is a solution, then π/2 - x̄ is also a solution. Both solutions are meaningful, but as far as the gunner is concerned, one may be preferable to the other. You should find out which.

The function f is simple enough to be differentiated. Hence we can use a method like Newton's method.

In fact, (1.1) can be solved directly. Just use the relation 2 sin x cos x = sin 2x. It is rare for things to turn out this nicely, but you should try to simplify before looking for numerical solutions.

If we make the model more realistic, say by including air resistance, we may end up with a set of differential equations that can only be solved numerically. In this case, analytic derivatives will not be available, and one must use a method that does not require derivatives, such as a quasi-Newton method (§3.1).
Interval bisection

3. In practice, a gunner may determine the range by trial and error, raising and lowering the cannon until the target is obliterated. The numerical analogue of this process is interval bisection. From here on we will consider the general problem of solving the equation

    f(x) = 0.    (1.2)

4. The theorem underlying the bisection method is called the intermediate value theorem.

    If f is continuous on [a, b] and g lies between f(a) and f(b), then there is a point x ∈ [a, b] such that g = f(x).
[Figure: successive bisection brackets, with endpoints a1, a2, a4 and b3, b1.]
8. The hardest part about using the bisection algorithm is finding a bracket. Once it is found, the algorithm is guaranteed to converge, provided the function is continuous. Although later we shall encounter algorithms that converge much faster, the bisection method converges steadily. If L_0 = |b - a| is the length of the original bracket, after k iterations the bracket has length

    L_k = L_0 / 2^k.

Since the algorithm will stop when L_k ≤ eps, it will require about

    log2(L_0/eps)

iterations to converge.

9. The statement

    if (c == a || c == b)
        return;

is a concession to the effects of rounding error. If eps is too small, it is possible for the algorithm to arrive at the point where (a+b)/2 evaluates to either a or b, after which the algorithm will loop indefinitely. In this case the algorithm, having given its all, simply returns.
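The full routine (1.3) is not reproduced above, so here is a minimal self-contained sketch of my own along the lines just described; the argument list, the names, and the choice to return the final midpoint are choices of this sketch, not necessarily those of the original program.

    #include <math.h>

    /* Find a zero of f in the bracket [a,b], where f(a) and f(b) have
       opposite signs. The bracket is shrunk until its length is at
       most eps or rounding exhausts the interval. */
    double bisect(double (*f)(double), double a, double b, double eps)
    {
        double fa = f(a);
        while (fabs(b - a) > eps) {
            double c = (a + b)/2;
            if (c == a || c == b)        /* concession to rounding error */
                break;
            double fc = f(c);
            if ((fa > 0) == (fc > 0))    /* zero lies in [c,b] */
                { a = c; fa = fc; }
            else                         /* zero lies in [a,c] */
                b = c;
        }
        return (a + b)/2;
    }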
Relative error

10. The convergence criterion used in (1.3) is based on absolute error; that is, it measures the error in the result without regard to the size of the result. This may or may not be satisfactory. For example, if eps = 10^-6 and the zero in question is approximately one, then the bisection routine will return roughly six accurate digits. However, if the root is approximately 10^-7, we can expect no figures of accuracy: the final bracket can actually contain zero.

11. If a certain number of significant digits are required, then a better measure of error is relative error. Formally, if y is an approximation to x ≠ 0, then the relative error in y is the number

    ρ = |y - x| / |x|.

Alternatively, y has relative error ρ if there is a number ε with |ε| = ρ such that

    y = x(1 + ε).
12. The following table of approximations to e = 2.7182818... illustrates the relation of relative error and significant digits.

    Approximation      ρ
    2.             2 x 10^-1
    2.7            6 x 10^-3
    2.71           3 x 10^-3
    2.718          1 x 10^-4
    2.7182         3 x 10^-5
    2.71828        6 x 10^-7
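The entries are easy to check. The following fragment is my own illustration, not part of the text; it prints the relative errors, which the table above truncates to one figure.

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        const double e = 2.718281828459045;
        const double y[] = { 2.0, 2.7, 2.71, 2.718, 2.7182, 2.71828 };
        for (int i = 0; i < 6; i++)
            printf("%-8g  rho = %.1e\n", y[i], fabs(y[i] - e)/e);
        return 0;
    }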
13. If we exclude tricky cases like x = 2.0000 and y = 1.9999, in which the notion of agreement of significant digits is not well defined, the relation between agreement and relative error is not difficult to establish. Let us suppose, say, that x and y agree to six figures. Writing x above y, we have

    x = X1 X2 X3 X4 X5 X6 X7 X8 ...,
    y = Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 ....

Now since the digits X7 and Y7 must disagree, the smallest difference between x and y is obtained when, e.g.,

    X7 X8 = 40,
    Y7 Y8 = 38.
Lecture 2

Nonlinear Equations
Newton's Method
Reciprocals and Square Roots
Local Convergence Analysis
Slow Death
Newton's method

1. Newton's method is an iterative method for solving the nonlinear equation

    f(x) = 0.    (2.1)

Like most iterative methods, it begins with a starting point x_0 and produces successive approximations x_1, x_2, .... If x_0 is sufficiently near a root x* of (2.1), the sequence of approximations will approach x*. Usually the convergence is quite rapid, so that once the typical behavior of the method sets in, it requires only a few iterations to produce a very accurate approximation to the root. (The point x* is also called a zero of the function f. The distinction is that equations have roots while functions have zeros.)

Newton's method can be derived in two ways: geometrically and analytically. Each has its advantages, and we will treat each in turn.

2. The geometric approach is illustrated in Figure 2.1. The idea is to draw a tangent to the curve y = f(x) at the point A = (x_0, f(x_0)). The abscissa x_1 of the point C = (x_1, 0) where the tangent intersects the axis is the new approximation. As the figure suggests, it will often be a better approximation to x* than x_0.

To derive a formula for x_1, consider the distance BC from x_0 to x_1, which satisfies

    BC = BA / tan(∠ACB).

But BA = f(x_0) and tan(∠ACB) = -f'(x_0) (remember the derivative is negative at x_0). Consequently,

    x_1 = x_0 - f(x_0)/f'(x_0).

If the iteration is carried out once more, the result is point D in Figure 2.1. In general, the iteration can be continued by defining

    x_{k+1} = x_k - f(x_k)/f'(x_k),    k = 0, 1, 2, ....
[Figure 2.1: the geometry of Newton's method; the tangent at A = (x_0, f(x_0)) cuts the axis at C, and a second step gives D.]
3. The analytic derivation of Newton's method begins with the Taylor expansion

    f(x) = f(x_0) + f'(x_0)(x - x_0) + (1/2) f''(ξ_0)(x - x_0)^2,

where as usual ξ_0 lies between x and x_0. Now if x_0 is near the zero x* of f and f''(x_0) is not too large, then the function

    f̂(x) = f(x_0) + f'(x_0)(x - x_0)

provides a good approximation to f(x) in the neighborhood of x*. For example, if |f''(x)| ≤ 1 and |x - x_0| ≤ 10^-2, then |f̂(x) - f(x)| ≤ 10^-4. In this case it is reasonable to assume that the solution of the equation f̂(x) = 0 will provide a good approximation to x*. But this solution is easily seen to be

    x_1 = x_0 - f(x_0)/f'(x_0),

which is just the Newton iteration formula.
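In code the method is only a few lines. The sketch below is mine; the stopping criterion (a small step) and the iteration limit are choices of the sketch, not part of the method itself.

    #include <math.h>

    /* Newton's method: x <- x - f(x)/f'(x), with a user-supplied
       derivative fp. Stops when the step is at most eps. */
    double newton(double (*f)(double), double (*fp)(double),
                  double x, double eps, int maxit)
    {
        for (int k = 0; k < maxit; k++) {
            double step = f(x)/fp(x);
            x -= step;
            if (fabs(step) <= eps)
                break;
        }
        return x;
    }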
Local convergence analysis
8. We are going to show that if x_0 is sufficiently near a zero x* of f and

    f'(x*) ≠ 0,

then Newton's method converges, ultimately with great rapidity. To simplify things, we will assume that f has derivatives of all orders. We will also set

    φ(x) = x - f(x)/f'(x),

so that

    x_{k+1} = φ(x_k).

The function φ is called the iteration function for Newton's method. Note that

    φ(x*) = x* - f(x*)/f'(x*) = x*.

Because x* is unaltered by φ, it is called a fixed point of φ.

Finally we will set

    e_k = x_k - x*.

The quantity e_k is the error in x_k as an approximation to x*. To say that x_k → x* is the same as saying that e_k → 0.

9. The local convergence analysis of Newton's method is typical of many convergence analyses. It proceeds in three steps.

1. Obtain an expression for e_{k+1} in terms of e_k.
2. Use the expression to show that e_k → 0.
3. Knowing that the iteration converges, assess how fast it converges.

10. The error formula can be derived as follows. Since x_{k+1} = φ(x_k) and x* = φ(x*),

    e_{k+1} = x_{k+1} - x* = φ(x_k) - φ(x*).

By Taylor's theorem with remainder,

    φ(x_k) - φ(x*) = φ'(ξ_k)(x_k - x*),

where ξ_k lies between x_k and x*. It follows that

    e_{k+1} = φ'(ξ_k) e_k.    (2.5)

This is the error formula we need to prove convergence.

11. At first glance, the formula (2.5) appears difficult to work with, since it depends on ξ_k, which varies from iterate to iterate. However, and this is the
A sequence whose errors behave like this is said to be quadratically convergent.

13. To see informally what quadratic convergence means, suppose that the multiplier of e_k^2 in (2.6) is one and that e_0 = 10^-1. Then e_1 ≈ 10^-2, e_2 ≈ 10^-4, e_3 ≈ 10^-8, e_4 ≈ 10^-16, and so on. Thus if x* is about one in magnitude, the first iterate is accurate to about two places, the second to four, the third to eight, the fourth to sixteen, and so on. In this case each iteration of Newton's method doubles the number of accurate figures.

For example, if the formula (2.4) is used to approximate the square root of ten, starting from three, the result is the following sequence of iterates.

    3.
    3.16
    3.1622
    3.16227766016
    3.16227766016838

Only the correct figures are displayed, and they roughly double at each iteration. The last iteration is exceptional, because the computer I used carries only about fifteen decimal digits.
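These iterates are easy to reproduce. The fragment below is mine and assumes that (2.4) is the familiar Newton iteration for the square root of a, namely x_{k+1} = (x_k + a/x_k)/2.

    #include <stdio.h>

    int main(void)
    {
        double a = 10.0, x = 3.0;      /* starting value */
        for (int k = 0; k < 5; k++) {
            x = (x + a/x)/2;           /* Newton step for x^2 - a = 0 */
            printf("%.15f\n", x);
        }
        return 0;
    }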
14. For a more formal analysis, recall that the number of significant figures in an approximation is roughly the negative logarithm of the relative error (see §1.12). Assume that x* ≠ 0, and let ρ_k denote the relative error in x_k. Then from (2.6) we have

    ρ_{k+1} ≈ (|x* f''(x*)| / (2|f'(x*)|)) ρ_k^2 ≡ K ρ_k^2.

Hence

    log ρ_{k+1} ≈ 2 log ρ_k + log K.

As the iteration converges, log ρ_k → -∞, and it overwhelms the value of log K. Hence

    log ρ_{k+1} ≈ 2 log ρ_k,

which says that x_{k+1} has twice as many significant figures as x_k.
Slow death

15. The convergence analysis we have just given shows that if Newton's method converges to a zero x* for which f'(x*) ≠ 0, then in the long run it must converge quadratically. But the run can be very long indeed.

For example, in §2.5 we noted that the iteration

    x_{k+1} = 2x_k - a x_k^2

will converge to a^-1 starting from any point less than a^-1. In particular, if a < 1, we can take a itself as the starting value.

But suppose that a = 10^-10. Then

    x_1 = 2x10^-10 - 10^-30 ≈ 2x10^-10.

Thus for practical purposes the first iterate is only twice the size of the starting value. Similarly, the second iterate will be about twice the size of the first. This process of doubling the sizes of the iterates continues until x_k ≈ 10^10, at which point quadratic convergence sets in. Thus we must have 2^k x 10^-10 ≈ 10^10, or k ≈ 66, before we begin to see quadratic convergence. That is a lot of work to compute the reciprocal of a number.

16. All this does not mean that the iteration is bad, just that it needs a good starting value. Sometimes such a value is easy to obtain. For example, suppose that a = f·2^e, where 1/2 ≤ f < 1 and we know e. These conditions are satisfied if a is represented as a binary floating-point number on a computer. Then a^-1 = f^-1·2^-e. Since 1 < f^-1 ≤ 2, the number 2^-e < a^-1 provides a good starting value.
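A little program (my own illustration) makes both points: started from a itself the iteration crawls, while started from 2^-e, which the library function frexp supplies, it finishes in a handful of steps.

    #include <stdio.h>
    #include <math.h>

    /* Count the iterations x <- 2x - a*x*x needs to compute 1/a
       to about six figures. */
    static int steps(double a, double x)
    {
        int k = 0;
        while (fabs(x - 1/a) > 1e-6*(1/a) && k < 1000) {
            x = 2*x - a*x*x;
            k++;
        }
        return k;
    }

    int main(void)
    {
        double a = 1e-10;
        int e;
        frexp(a, &e);              /* a = f * 2^e with 1/2 <= f < 1 */
        printf("from a:    %d iterations\n", steps(a, a));
        printf("from 2^-e: %d iterations\n", steps(a, ldexp(1.0, -e)));
        return 0;
    }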
Lecture 3

Nonlinear Equations
A Quasi-Newton Method
Rates of Convergence
Iterating for a Fixed Point
Multiple Zeros
Ending with a Proposition
A quasi-Newton method

1. One of the drawbacks of Newton's method is that it requires the computation of the derivative f'(x_k) at each iteration. There are three ways in which this can be a problem.

1. The derivative may be very expensive to compute.
2. The function f may be given by an elaborate formula, so that it is easy to make mistakes in differentiating f and writing code for the derivative.
3. The value of the function f may be the result of a long numerical calculation. In this case the derivative will not be available as a formula.

2. One way of getting around this difficulty is to iterate according to the formula

    x_{k+1} = x_k - f(x_k)/g_k,

where g_k is an easily computed approximation to f'(x_k). Such an iteration is called a quasi-Newton method. (The term "quasi-Newton" usually refers to a class of methods for solving systems of simultaneous nonlinear equations.) There are many quasi-Newton methods, depending on how one approximates the derivative. For example, we will later examine the secant method, in which the derivative is approximated by the difference quotient

    g_k = (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1}).    (3.1)

Here we will analyze the simple case where g_k is constant, so that the iteration takes the form

    x_{k+1} = x_k - f(x_k)/g.    (3.2)

We will call this method the constant slope method. In particular, we might take g = f'(x_0), as in (2.2). Figure 3.1 illustrates the course of such an iteration.
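A sketch of the constant slope method in C (mine; the names and stopping test are not canonical):

    #include <math.h>

    /* Constant slope method: x <- x - f(x)/g with a fixed slope g,
       for example g = f'(x0). */
    double constant_slope(double (*f)(double), double g,
                          double x, double eps, int maxit)
    {
        for (int k = 0; k < maxit; k++) {
            double step = f(x)/g;
            x -= step;
            if (fabs(step) <= eps)
                break;
        }
        return x;
    }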
3. Once again we have a local convergence theorem. Let

    φ(x) = x - f(x)/g

be the iteration function and assume that

    |φ'(x*)| = |1 - f'(x*)/g| < 1.    (3.3)
Rates of convergence

If

    lim_{k→∞} |x_{k+1} - x*| / |x_k - x*|^p = C,

then the sequence is said to converge with order p. When p = 2 the convergence is quadratic. When p = 3 the convergence is cubic. In general, the analysis of quadratic convergence in §2.14 can be adapted to show that the number of correct figures in a sequence exhibiting pth-order convergence increases by a factor of about p from iteration to iteration. (If the ratio of linear convergence is one, the convergence is sometimes called sublinear. The sequence {1/k} converges sublinearly to zero.) Note that p does not have to be an integer. Later we shall see that the secant method (3.1) typically converges with order p = 1.62.... (It is also possible for a sequence to converge superlinearly but not with order p > 1. The sequence {1/k!} is an example.)
11. You will not ordinarily encounter rates of convergence greater than cubic, and even cubic convergence occurs only in a few specialized algorithms. There are two reasons. First, the extra work required to get higher-order convergence may not be worth it, especially in finite precision, where the accuracy that can be achieved is limited. Second, higher-order methods are often less easy to apply. They generally require higher-order derivatives and more accurate starting values.
Iterating for a fixed point

12. The essential identity of the local convergence proofs for Newton's method and the quasi-Newton method suggests that they both might be subsumed under a general theory. Here we will develop such a theory. Instead of beginning with an equation of the form f(x) = 0, we will start with a function φ having a fixed point x*, that is, a point x* for which φ(x*) = x*, and ask when the iteration

    x_{k+1} = φ(x_k),    k = 0, 1, ...    (3.6)

converges to x*. This iterative method for finding a fixed point is called the method of successive substitutions.

13. The iteration (3.6) has a useful geometric interpretation, which is illustrated in Figures 3.2 and 3.3. The fixed point x* is the abscissa of the intersection of the graph of φ(x) with the line y = x. The ordinate of the function φ(x) at x_0 is the value of x_1. To turn this ordinate into an abscissa, reflect it in the line y = x. We may repeat this process to get x_2, x_3, and so on. It is seen that the iterates in Figure 3.2 zigzag into the fixed point, while in Figure 3.3 they zigzag away: the one iteration converges if you start near enough to the fixed point, whereas the other diverges no matter how close you start. The fixed point in the first example is said to be attractive, and the one in the second example is said to be repulsive.
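A standard example to try is φ(x) = cos x, whose fixed point x* = 0.739085... is attractive because |φ'(x*)| = sin x* ≈ 0.67 < 1. The fragment below (mine) zigzags into it:

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x = 1.0;                /* starting point x0 */
        for (int k = 0; k < 50; k++)
            x = cos(x);                /* successive substitutions */
        printf("%.6f\n", x);           /* prints 0.739085 */
        return 0;
    }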
14. It is the value of the derivative of φ at the fixed point that makes the difference in these two examples. In the first the absolute value of the derivative is less than one, while in the second it is greater than one. (The derivatives here are both positive. It is instructive to draw iteration graphs in which the derivatives at the fixed point are negative.) These examples along with our earlier convergence proofs suggest that what is necessary for a method of successive substitutions to converge is that the absolute value of the derivative be less than one at the fixed point. Specifically, we have the following result.

[Figure 3.2: successive substitutions zigzagging into an attractive fixed point; iterates x_0, x_1, x_2, x_3.]
    If

        |φ'(x*)| < 1,

    then there is an interval I = [x* - δ, x* + δ] such that the iteration (3.6) converges to x* whenever x_0 ∈ I. If φ'(x*) ≠ 0, then the convergence is linear with ratio φ'(x*). On the other hand, if

        0 = φ'(x*) = φ''(x*) = ... = φ^(p-1)(x*) ≠ φ^(p)(x*),    (3.7)

    then the convergence is of order p.
15. We have essentially seen the proof twice over. Convergence is established exactly as for Newton's method or the constant slope method. Linear convergence in the case where φ'(x*) ≠ 0 is verified as it was for the constant slope method. For the case where (3.7) holds, we need to verify that the convergence is of order p. In the usual notation, by Taylor's theorem

    e_{k+1} = (1/p!) φ^(p)(ξ_k) e_k^p.

Since ξ_k → x*, it follows that

    lim_{k→∞} e_{k+1}/e_k^p = (1/p!) φ^(p)(x*) ≠ 0,
[Figure 3.3: successive substitutions zigzagging away from a repulsive fixed point; iterates x_2, x_1, x_0.]
Multiple zeros

17. Up to now we have considered only a simple zero of the function f, that is, a zero for which f'(x*) ≠ 0. We will now consider the case where

    0 = f'(x*) = f''(x*) = ... = f^(m-1)(x*) ≠ f^(m)(x*).

By Taylor's theorem

    f(x) = (f^(m)(ξ_x)/m!) (x - x*)^m,

where ξ_x lies between x* and x. If we set g(x) = f^(m)(ξ_x)/m!, then

    f(x) = (x - x*)^m g(x),    (3.8)

where g is continuous at x* and g(x*) ≠ 0. Thus, when x is near x*, the function f(x) behaves like a polynomial with a zero of multiplicity m at x*. For this reason we say that x* is a zero of multiplicity m of f.

18. We are going to use the fixed-point theory developed above to assess the behavior of Newton's method at a multiple root. It will be most convenient to use the form (3.8). We will assume that g is twice differentiable.

19. Since f'(x) = m(x - x*)^(m-1) g(x) + (x - x*)^m g'(x), the Newton iteration function for f is

    φ(x) = x - (x - x*)^m g(x) / [m(x - x*)^(m-1) g(x) + (x - x*)^m g'(x)]
         = x - (x - x*) g(x) / [m g(x) + (x - x*) g'(x)].

From this we see that φ is well defined at x* and

    φ(x*) = x*.

According to fixed-point theory, we have only to evaluate the derivative of φ at x* to determine if x* is an attractive fixed point. We will skip the slightly tedious differentiation and get straight to the result:

    φ'(x*) = 1 - 1/m.

Therefore, Newton's method converges to a multiple zero from any sufficiently close approximation, and the convergence is linear with ratio 1 - 1/m. In particular for a double root, the ratio is 1/2, which is comparable with the convergence of interval bisection.
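The prediction is easy to test. For f(x) = x^2, which has a double zero at x* = 0, a Newton step is x - x^2/(2x) = x/2, so the error is exactly halved at each iteration, in agreement with the ratio 1 - 1/m = 1/2. A fragment of mine:

    #include <stdio.h>

    int main(void)
    {
        double x = 1.0;                  /* f(x) = x*x, double zero at 0 */
        for (int k = 1; k <= 6; k++) {
            x = x - (x*x)/(2*x);         /* Newton step; equals x/2 */
            printf("e_%d = %g\n", k, x); /* 0.5, 0.25, 0.125, ... */
        }
        return 0;
    }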
Lecture 4

Nonlinear Equations
The Secant Method
Convergence
Rate of Convergence
Multipoint Methods
Muller's Method
The Linear-Fractional Method
[Figure 4.1: the secant method; the secant through (x_0, f(x_0)) and (x_1, f(x_1)) cuts the axis at x_2.]
3. The secant method derives its name from the following geometric interpretation of the iteration. Given x_0 and x_1, draw the secant line through the graph of f at the points (x_0, f(x_0)) and (x_1, f(x_1)). The point x_2 is the abscissa of the intersection of the secant line with the x-axis. Figure 4.1 illustrates this procedure. As usual, a graph of this kind can tell us a lot about the convergence of the method in particular cases.

4. If we set

    φ(u, v) = u - f(u)(u - v)/(f(u) - f(v)) = (v f(u) - u f(v))/(f(u) - f(v)),    (4.3)

then the iteration (4.2) can be written in the form

    x_{k+1} = φ(x_k, x_{k-1}).

Thus φ plays the role of an iteration function. However, because it has two arguments, the secant method is called a two-point method.

5. Although φ is indeterminate for u = v, we may remove the indeterminacy by setting

    φ(u, u) = u - f(u)/f'(u).

In other words, the secant method reduces to Newton's method in the confluent case where x_k = x_{k-1}. In particular, it follows that

    φ(x*, x*) = x*,

so that x* is a fixed point of the iteration.
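A sketch of the secant iteration itself, my own coding, assuming (4.2) is the usual formula x_{k+1} = x_k - f(x_k)(x_k - x_{k-1})/(f(x_k) - f(x_{k-1})):

    #include <math.h>

    /* Secant method from starting values x0, x1. */
    double secant(double (*f)(double), double x0, double x1,
                  double eps, int maxit)
    {
        double f0 = f(x0), f1 = f(x1);
        for (int k = 0; k < maxit; k++) {
            double x2 = x1 - f1*(x1 - x0)/(f1 - f0);  /* secant step */
            x0 = x1; f0 = f1;
            x1 = x2; f1 = f(x1);
            if (fabs(x1 - x0) <= eps)
                break;
        }
        return x1;
    }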
Convergence

6. Because the secant method is a two-point method, the fixed-point theory developed above does not apply. In fact, the convergence analysis is considerably more complicated. But it still proceeds in the three steps outlined in §2.9: (1) find a recursion for the error, (2) show that the iteration converges, and (3) assess the rate of convergence. Here we will consider the first two steps.

7. It is a surprising fact that we do not need to know the specific form (4.3) of the iteration function to derive an error recurrence. Instead we simply use the fact that if we input the answer we get the answer back. More precisely, if one of the arguments of φ is the zero x* of f, then φ returns x*; i.e.,

    φ(u, x*) ≡ x* and φ(x*, v) ≡ x*.

Since φ(u, x*) and φ(x*, v) are constant, their derivatives with respect to u and v are zero:

    φ_u(u, x*) ≡ 0 and φ_v(x*, v) ≡ 0.

The same is true of the second derivatives:

    φ_uu(u, x*) ≡ 0 and φ_vv(x*, v) ≡ 0.
The term containing the cross product pq is just what we want, but the terms in p^2 and q^2 require some massaging. Since φ_uu(x* + p, x*) = 0, it follows from a Taylor expansion in the second argument that

    φ_uu(x* + p, x* + q) = φ_uuv(x* + p, x* + θ_q q) q,

where θ_q ∈ [0, 1]. Similarly,

    φ_vv(x* + p, x* + q) = φ_uvv(x* + θ_p p, x* + q) p,

where θ_p ∈ [0, 1]. Substituting these values in (4.4) gives

    φ(x* + p, x* + q) = x* + (pq/2)[φ_uuv(x* + p, x* + θ_q q) p
                        + 2φ_uv(x* + p, x* + q) + φ_uvv(x* + θ_p p, x* + q) q].    (4.5)

9. Turning now to the iteration proper, let the starting values be x_0 and x_1, and let their errors be e_0 = x_0 - x* and e_1 = x_1 - x*. Taking p = e_1 and q = e_0 in (4.5), we get

    e_2 = φ(x* + e_1, x* + e_0) - x*
        = (e_1 e_0 / 2)[φ_uuv(x* + e_1, x* + θ_{e_0} e_0) e_1
          + 2φ_uv(x* + e_1, x* + e_0) + φ_uvv(x* + θ_{e_1} e_1, x* + e_0) e_0]
        ≡ (e_1 e_0 / 2) r(e_1, e_0).    (4.6)

This is the error recurrence we need.
10. We are now ready to establish the convergence of the method. First note that

    r(0, 0) = 2φ_uv(x*, x*).

Hence there is a δ > 0 such that if |u|, |v| ≤ δ then

    |v r(u, v)| ≤ C < 1.

Now let |e_0|, |e_1| ≤ δ. From the error recurrence (4.6) it follows that |e_2| ≤ C|e_1| < |e_1| ≤ δ. Hence

    |e_1 r(e_2, e_1)| ≤ C < 1,

and |e_3| ≤ C|e_2| ≤ C^2|e_1|. By induction

    |e_k| ≤ C^(k-1)|e_1|,

and since the right-hand side of this inequality converges to zero, we have e_k → 0; i.e., the secant method converges from any two starting values whose errors are less than δ in absolute value.
Rate of convergence

11. We now turn to the convergence rate of the general two-point method. The first thing to note is that, by the error recurrence,

    e_{k+1} = (e_k e_{k-1}/2) r(e_k, e_{k-1}).    (4.7)

From (4.7), the quantities s_k = |e_{k+1}|/|e_k|^p satisfy

    s_k = s_{k-1}^(1-p) |e_{k-1}|^(p-p^2+1) |r_k|,

where r_k = r(e_k, e_{k-1})/2, and if p is chosen so that p^2 - p - 1 = 0, the power of |e_{k-1}| drops out. Let τ_k = log|r_k| and σ_k = log s_k. Then our problem is to show that the sequence defined by

    σ_k = τ_k - (p - 1)σ_{k-1}

has a limit.

Let τ = lim_{k→∞} τ_k. Then the limit σ, if it exists, must satisfy

    σ = τ - (p - 1)σ.

Thus we must show that the sequence of errors defined by

    (σ_k - σ) = (τ_k - τ) - (p - 1)(σ_{k-1} - σ)

converges to zero.

15. The convergence of the errors to zero can easily be established from first principles. However, with an eye to generalizations I prefer to use the following result from the theory of difference equations.

    If the roots of the equation

        x^n - a_1 x^(n-1) - ... - a_n = 0

    all lie in the unit circle and lim_{k→∞} ε_k = 0, then the sequence {ζ_k} generated by the recursion

        ζ_k = ε_k + a_1 ζ_{k-1} + ... + a_n ζ_{k-n}

    converges to zero, whatever its starting values.
Multipoint methods

17. The theory we developed for the secant method generalizes to multipoint iterations of the form

    x_{k+1} = φ(x_k, x_{k-1}, ..., x_{k-n+1}).

Again the basic assumption is that if one of the arguments is the answer x*, then the value of φ is x*. Under this assumption we can show that if the starting points are near enough x*, then the errors satisfy

    lim_{k→∞} e_{k+1}/(e_k e_{k-1} ... e_{k-n+1}) = φ_{12...n}(x*, x*, ..., x*),

where the subscript i of φ denotes differentiation with respect to the ith argument.

18. If φ_{12...n}(x*, x*, ..., x*) ≠ 0, we say that the sequence exhibits n-point convergence. As we did earlier, we can show that n-point convergence is the same as pth-order convergence, where p is the largest root of the equation

    p^n = p^(n-1) + ... + p + 1.

The following is a table of the convergence rates as a function of n.

    n    p
    2    1.61
    3    1.84
    4    1.93
    5    1.96

The upper bound on the order of convergence is two, which is effectively attained for n = 3. For this reason multipoint methods using four or more points are seldom encountered.
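The table can be reproduced by computing the largest root of p^n = p^(n-1) + ... + p + 1, for instance by bisection on (1, 2), where the root is easily seen to lie. A throwaway fragment of mine, which prints the values to three places:

    #include <stdio.h>
    #include <math.h>

    /* q(p) = p^n - p^(n-1) - ... - p - 1; its largest root is the order. */
    static double q(double p, int n)
    {
        double s = pow(p, n);
        for (int i = 0; i < n; i++)
            s -= pow(p, i);
        return s;
    }

    int main(void)
    {
        for (int n = 2; n <= 5; n++) {
            double lo = 1.0, hi = 2.0;      /* q(1) < 0 < q(2) */
            for (int i = 0; i < 60; i++) {
                double mid = (lo + hi)/2;
                if (q(mid, n) < 0) lo = mid; else hi = mid;
            }
            printf("n = %d  p = %.3f\n", n, lo);
        }
        return 0;
    }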
Muller's method

19. The secant method is sometimes called an interpolatory method, because it approximates a zero of a function by a line interpolating the function at two points. A useful iteration, called Muller's method, can be obtained by fitting a quadratic polynomial at three points. In outline, the iteration proceeds as follows. The input is three points x_k, x_{k-1}, x_{k-2}, and the corresponding function values.

1. Find a quadratic polynomial g(x) such that g(x_i) = f(x_i) (i = k, k-1, k-2).
2. Let x_{k+1} be the zero of g that lies nearest x_k.
24. Most low-order interpolation problems are simplified by shifting the origin. In particular we take y_i = x_i - x_k (i = k, k-1, k-2) and determine a, b, and c so that

    g(y) = (y - a)/(by + c)

satisfies

    f(x_i) = g(y_i),    i = k, k-1, k-2,

or equivalently

    y_i - a = f(x_i)(b y_i + c),    i = k, k-1, k-2.    (4.11)

The function g is zero when y_{k+1} = a, and the next point is given by

    x_{k+1} = x_k + a.

25. Since at any one time there are only three points, there is no need to keep the index k around. Thus we start with three points x0, x1, x2, and their corresponding function values f0, f1, f2. We begin by setting

    y0 = x0 - x2,
    y1 = x1 - x2.

In this notation the equations (4.11) reduce to the pair (4.13), where

    fy0 = f0*y0,
    fy1 = f1*y1,

and

    df0 = f2 - f0,
    df1 = f2 - f1.

The equations (4.13) can be solved for c by Cramer's rule:

    c = (fy0*y1 - fy1*y0) / (fy0*df1 - fy1*df0).
26. Because we have chosen our origin carefully and have taken care to define appropriate intermediate variables, the above development leads directly to the following simple program. The input is the three points x0, x1, x2, and their corresponding function values f0, f1, f2. The output is the next iterate x3.

    y0 = x0 - x2;
    y1 = x1 - x2;
    fy0 = f0*y0;
    fy1 = f1*y1;
    df0 = f2 - f0;
    df1 = f2 - f1;
    c = (fy0*y1 - fy1*y0)/(fy0*df1 - fy1*df0);
    x3 = x2 + f2*c;
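In an actual iteration one would wrap this fragment in a loop, shifting x0 ← x1, x1 ← x2, x2 ← x3 (and the function values correspondingly) after each step, just as with the secant method; and like any multipoint method it benefits from a safeguard, for example the hybrid scheme of the next lecture.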
Lecture 5

Nonlinear Equations
A Hybrid Method
Errors, Accuracy, and Condition Numbers
A hybrid method

1. The secant method has the advantage that it converges swiftly and requires only one function evaluation per iteration. It has the disadvantage that it can blow up in your face. This can happen when the function is very flat, so that f'(x) is small compared with f(x) (see Figure 4.2). Newton's method is also susceptible to this kind of failure; however, the secant method can fail in another way that is uniquely its own.

2. The problem is that in practice the function f will be evaluated with error. Specifically, the program that evaluates f at the point x will return not f(x) but f̃(x) = f(x) + e(x), where e(x) is an unknown error. As long as f(x) is large compared to e(x), this error will have little effect on the course of the iteration. However, as the iteration approaches x*, e(x) may become larger than f(x). Then the approximation to f' that is used in the secant method will have the value

    ([f(x_k) - f(x_{k-1})] + [e(x_k) - e(x_{k-1})]) / (x_k - x_{k-1}).

Since the terms in e dominate those in f, the value of this approximate derivative will be unpredictable. It may have the wrong sign, in which case the secant method may move away from x*. It may be very small compared to f̃(x_k), in which case the iteration will take a wild jump. Thus, if the function is computed with error, the secant method may behave erratically in the neighborhood of the zero it is supposed to find.

3. We are now going to describe a wonderful combination of the secant method and interval bisection. (The following presentation owes much to Jim Wilkinson's elegant technical report "Two Algorithms Based on Successive Linear Interpolation," Computer Science, Stanford University, TR CS-60, 1967.) The idea is very simple. At any stage of the iteration we work with three points a, b, and c. The points a and b are the points from which the next secant approximation will be computed; that is, they correspond to the points x_k and x_{k-1}. The points b and c form a proper bracket for the zero. If the secant method produces an undesirable approximation, we take the midpoint of the bracket as our next iterate. In this way the speed of the secant method is combined with the security of the interval bisection method. We will now fill in the details.
4. Let fa, fb, and fc denote the values of the function at a, b, and c. These function values are required to satisfy

    1. fa, fb, fc ≠ 0;
    2. sign(fb) ≠ sign(fc);    (5.1)
    3. |fb| ≤ |fc|.

At the beginning of the algorithm the user will be required to furnish points b and c = a satisfying the first two of these conditions. The user must also provide a convergence criterion eps. When the algorithm is finished, the bracketing points b and c will satisfy |c - b| ≤ eps.

5. The iterations take place in an endless while loop, which the program leaves upon convergence. Although the user must see that the first two conditions in (5.1) are satisfied, the program can take care of the third condition, since it has to anyway for subsequent iterations. In particular, if |fc| < |fb|, we interchange b and c. In this case, a and b may no longer be a pair of successive secant iterates, and therefore we set a equal to c.
    while (1){
        if (abs(fc) < abs(fb))
        {
            t = c;  c = b;   b = t;              (5.2)
            t = fc; fc = fb; fb = t;
            a = c;  fa = fc;
        }
6. We now test for convergence, leaving the loop if the convergence criterion is met.

        if (abs(b-c) <= eps)
            break;
7. The first step of the iteration is to compute the secant step s at the points a and b and also the midpoint m of b and c. One of these is to become our next iterate. Since |fb| ≤ |fc|, it is natural to expect that x* will be nearer to b than c, and of course it should lie in the bracket. Thus if s lies between b and m, then the next iterate will be s; otherwise it will be m.
8. Computing the next iterate is a matter of some delicacy, since we cannot say a priori whether b is to the left or right of c. It is easiest to cast the tests in terms of the differences ds = s - b and dm = m - b. The following code does the trick. When it is finished, dd has been computed so that the next iterate is b + dd. Note the test to prevent division by zero in the secant step.

        dm = (c-b)/2;
        df = (fa-fb);
        if (df == 0)
            ds = dm;
        else
            ds = -fb*(a-b)/df;
        if (sign(ds) != sign(dm) || abs(ds) > abs(dm))
            dd = dm;
        else
            dd = ds;
10. The next step is to form the new iterate, call it d, and evaluate the function there.

        d = b + dd;
        fd = f(d);
11. We must now rename our variables in such a way that the conditions of (5.1) are satisfied. We take care of the condition that fd be nonzero by returning if it is zero.

        if (fd == 0){
            b = c = d; fb = fc = fd;
            break;
        }
12. Before taking care of the second condition in (5.1), we make a provisional assignment of new values to a, b, and c.

        a = b;   b = d;
        fa = fb; fb = fd;
13. The second condition in (5.1) says that b and c form a bracket for x*. If the new values fail to do so, the cure is to replace c by the old value of b. The reasoning is as follows. The old value of b has a different sign than the old value of c. The new value of b has the same sign as the old value of c. Consequently, the replacement results in a new value of c that has a different sign than the new value of b.

In making the substitution, it is important to remember that the old value of b is now contained in a.

[Figure 5.1: a function for which the iterates approach the zero from the side opposite c, so that c never changes.]
14. The third condition in (5.1) is handled at the top of the loop; see (5.2).

15. Finally, we return after leaving the while loop.

        }
        return;
16. To explain the adjustment of dd in §5.9, consider the graph in Figure 5.1. Here d is always on the side of x* that is opposite c, and the value of c is not changed by the iteration. This means that although b is converging superlinearly to x*, the length of the bracket converges to a number that is greater than zero, presumably much greater than eps. Thus the algorithm cannot converge until its erratic asymptotic behavior forces some bisection steps.

The cure for this problem lies in the extra code introduced in §5.9. If the step size dd is less than eps in absolute value, it is forced to have magnitude 0.5*eps. This will usually be sufficient to push s across the zero to the same side as c, which insures that the next bracket will be of length less than eps, just what is needed to meet the convergence criterion.
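Assembled into a single routine, the fragments above read as follows. The assembly is mine, not the author's final code: the sign function is supplied here, and the §5.9 adjustment of small steps, whose listing is not reproduced above, is written from the description in §5.16.

    #include <math.h>

    static double sign(double x) { return (x > 0) - (x < 0); }

    /* Hybrid secant-bisection. On entry b and c bracket a zero and
       f(b), f(c) are nonzero with opposite signs.
       On return |c - b| <= eps and b is the approximate zero. */
    double hybrid(double (*f)(double), double b, double c, double eps)
    {
        double a = c, fb = f(b), fc = f(c), fa = fc, t;

        while (1) {
            if (fabs(fc) < fabs(fb)) {       /* enforce |fb| <= |fc| */
                t = c;  c = b;   b = t;
                t = fc; fc = fb; fb = t;
                a = c;  fa = fc;
            }
            if (fabs(b - c) <= eps)          /* convergence test */
                break;

            double dm = (c - b)/2;           /* midpoint step */
            double df = fa - fb, ds;
            if (df == 0)
                ds = dm;
            else
                ds = -fb*(a - b)/df;         /* secant step */
            double dd = (sign(ds) != sign(dm) || fabs(ds) > fabs(dm))
                        ? dm : ds;
            if (fabs(dd) < eps)              /* assumed 5.9 adjustment */
                dd = 0.5*eps*sign(dm);

            double d = b + dd, fd = f(d);    /* new iterate */
            if (fd == 0) {
                b = c = d; fb = fc = fd;
                break;
            }
            a = b;   b = d;                  /* provisional renaming */
            fa = fb; fb = fd;
            if (sign(fb) == sign(fc)) {      /* restore the bracket */
                c = a; fc = fa;
            }
        }
        return b;
    }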
Errors, accuracy, and condition numbers

17. We have already observed in §5.2 that when we attempt to evaluate the function f at a point x the value will not be exact. Instead we will get a perturbed value

    f̃(x) = f(x) + e(x).

The error e(x) can come from many sources. It may be due to rounding error in the evaluation of the function, in which case it will behave irregularly. On the other hand, it may be dominated by approximations made in the evaluation of the function. For example, an integral in the definition of the function may have been evaluated numerically. Such errors are often quite smooth. But whether or not the error is irregular or smooth, it is unknown and has an effect on the zeros of f that cannot be predicted. However, if we know something about the size of the error, we can say something about how accurately we can determine a particular zero.

18. Let x* be a zero of f, and suppose we have a bound δ on the size of the error; i.e.,

    |e(x)| ≤ δ.

If x_1 is a point for which f(x_1) > δ, then

    f̃(x_1) = f(x_1) + e(x_1) ≥ f(x_1) - δ > 0;

i.e., f̃(x_1) has the same sign as f(x_1). Similarly, if f(x_2) < -δ, then f̃(x_2) is negative along with f(x_2), and by the intermediate value theorem f has a zero between x_1 and x_2. Thus, whenever |f(x)| > δ, the values of f̃(x) say something about the location of the zero in spite of the error.

To put the point another way, let [a, b] be the largest interval about x* for which

    x ∈ [a, b] implies |f(x)| ≤ δ.

As long as we are outside that interval, the value of f̃(x) provides useful information about the location of the zero. However, inside the interval [a, b] the value of f̃(x) tells us nothing, since it could be positive, negative, or zero, regardless of the sign of f(x).

19. The interval [a, b] is an interval of uncertainty for the zero x*: we know that x* is in it, but there is no point in trying to pin it down further. Thus, a good algorithm will return a point in [a, b], but we should not expect it to provide any further accuracy. Algorithms that have this property are called stable algorithms.
20. The size of the interval of uncertainty varies from problem to problem. If the interval is small, we say that the problem is well conditioned. Thus, a stable algorithm will solve a well-conditioned problem accurately. If the interval is large, the problem is ill conditioned. No algorithm, stable or otherwise, can be expected to return an accurate solution to an ill-conditioned problem. Only if we are willing to go to extra effort, like reducing the error e(x), can we obtain a more accurate solution.

21. A number that quantifies the degree of ill-conditioning of a problem is called a condition number. To derive a condition number for our problem, let us compute the half-width of the interval of uncertainty [a, b] under the assumption that

    f'(x*) ≠ 0.
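Near x* we have f(x) ≈ f'(x*)(x - x*), so |f(x)| ≤ δ roughly when |x - x*| ≤ δ/|f'(x*)|. The half-width of the interval of uncertainty is therefore about δ/|f'(x*)|, and the factor 1/|f'(x*)| serves as a condition number for the zero: the larger it is, the less accurately the zero can be determined from values of f̃.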
Lecture 6

Floating-Point Arithmetic
Floating-Point Numbers
Overflow and Underflow
Rounding Error
Floating-Point Arithmetic
Floating-point numbers

1. Anyone who has worked with a scientific hand calculator is familiar with floating-point numbers. Right now the display of my calculator contains the characters

    2.597 -03    (6.1)

which represent the number

    2.597 x 10^-3.

The chief advantage of floating-point representation is that it can encompass numbers of vastly differing magnitudes. For example, if we confine ourselves to six digits with five after the decimal point, then the largest number we can represent is 9.99999 ≈ 10, and the smallest is 0.00001 = 10^-5. On the other hand, if we allocate two of those six digits to represent a power of ten, then we can represent numbers ranging between 10^-99 and 10^99. The price to be paid is that these floating-point numbers have only four figures of accuracy, as opposed to as much as six for the fixed-point numbers.
2. A base-β floating-point number consists of a fraction f containing the significant figures of the number and an exponent e containing its scale. (The fraction is also called the mantissa and the exponent the characteristic.) The value of the number is

    f x β^e.

3. A floating-point number a = f x β^e is said to be normalized if

    1/β ≤ |f| < 1.

In other words, a is normalized if the base-β representation of its fraction has the form

    f = 0.x_1 x_2 ...,

where x_1 ≠ 0. Most computers work chiefly with normalized numbers, though you may encounter unnormalized numbers in special circumstances.
The term "normalized" must be taken in context. For example, by our definition the number (6.1) from my calculator is not normalized, while the number 0.2597 x 10^-2 is. This does not mean that there is something wrong with my calculator, just that my calculator, like most, uses a different normalization in which 1 ≤ f < 10.

[Figure 6.1: the layout of a 32-bit floating-point word: a sign bit, bits 1-8 the exponent, bits 9-31 the fraction.]
4. Three bases for floating-point numbers are in common use.

    name     base   where found
    binary     2    most computers
    decimal   10    most hand calculators
    hex       16    IBM mainframes and clones

In most computers, binary is the preferred base because, among other things, it fully uses the bits of the fraction. For example, the binary representation of the fraction of the hexadecimal number one is 0.00010000.... Thus, this representation wastes the three leading bits to store quantities that are known to be zero.
5. Even binary floating-point systems differ, something that in the past has made it difficult to produce portable mathematical software. Fortunately, the IEEE has proposed a widely accepted standard, which most PCs and workstations use. Unfortunately, some manufacturers with an investment in their own floating-point systems have not switched. No doubt they will eventually come around, especially since the people who produce mathematical software are increasingly reluctant to jury-rig their programs to conform to inferior systems.
6. Figure 6.1 shows the binary representation of a 32-bit IEEE standard floating-point word. One bit is devoted to the sign of the fraction, eight bits to the exponent, and twenty-three bits to the fraction. This format can represent numbers ranging in size from roughly 10^-38 to 10^38. Its precision is about seven significant decimal digits. A curiosity of the system is that the leading bit of a normalized number is not represented, since it is known to be one.

The shortest floating-point word in a system is usually called a single precision number. Double precision numbers are twice as long. The double precision IEEE standard devotes one bit to the sign of the fraction, eleven bits to the exponent, and fifty-two bits to the fraction. This format can represent numbers ranging from roughly 10^-307 to 10^307 with about fifteen significant figures. Some implementations provide a 128-bit floating-point word, called a quadruple precision number, or quad for short.
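One can look at this layout directly. The fragment below is my own illustration; it assumes IEEE single precision and unpacks the three fields of a float by copying its bits into an integer.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        float x = -6.5f;                   /* = -1.625 * 2^2 */
        uint32_t w;
        memcpy(&w, &x, sizeof w);          /* reinterpret the 32 bits */
        printf("sign     %u\n",   (unsigned)(w >> 31));
        printf("exponent %u\n",   (unsigned)((w >> 23) & 0xff));  /* biased by 127 */
        printf("fraction 0x%06x\n", (unsigned)(w & 0x7fffff));    /* hidden leading bit */
        return 0;
    }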
    10^60 x sqrt(1 + (1/10^60)^2) = 10^60.

Now when 1/10^60 is squared, it underflows and is set to zero. This does no harm, because 10^-120 is insignificant compared with the number one, to which it is added.
12. It is instructive to compute a bound on the relative error that rounding introduces. The process is sufficiently well illustrated by rounding to five digits. Thus consider the number

    a = X.XXXXY,

which is rounded to

    b = X.XXXZ.

Let us say we round up if Y ≥ 5 and round down if Y < 5. Then it is easy to see that

    |b - a| ≤ 5 x 10^-5.

On the other hand, the leading digit of a is assumed nonzero, and hence |a| ≥ 1. It follows that

    |b - a|/|a| ≤ 5 x 10^-5 = (1/2) x 10^-4.

More generally, rounding a to t decimal digits gives a number b satisfying

    |b - a|/|a| ≤ (1/2) x 10^(-t+1).
13. The same argument can be used to show that when a is chopped it gives a number b satisfying

    |b - a|/|a| ≤ 10^(-t+1).

This bound is twice as large as the bound for rounding, as might be expected. However, as we shall see later, there are other, more compelling reasons for preferring rounding to chopping.

14. The bounds for t-digit binary numbers are similar:

    |b - a|/|a| ≤ 2^(-t)     for rounding,
    |b - a|/|a| ≤ 2^(-t+1)   for chopping.
15. These bounds can be put in a form that is more useful for rounding-error analysis. Let b = fl(a) denote the result of rounding or chopping a on a particular machine, and let ε_M denote the upper bound on the relative error. If we set

    ε = (b - a)/a,

then b = a(1 + ε) and |ε| ≤ ε_M. In other words,

    fl(a) = a(1 + ε),    |ε| ≤ ε_M.    (6.2)
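For IEEE double precision with rounding, ε_M = 2^-53 ≈ 1.1 x 10^-16. A quick check of my own; DBL_EPSILON is the gap between 1 and the next larger double, which is 2ε_M.

    #include <stdio.h>
    #include <float.h>

    int main(void)
    {
        double epsM = DBL_EPSILON/2;             /* 2^-53 */
        printf("eps_M = %.3e\n", epsM);          /* 1.110e-16 */
        printf("fl(1 + eps_M) == 1? %d\n",
               1.0 + epsM == 1.0);               /* 1: it rounds back to 1 */
        return 0;
    }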
Lecture 7

Floating-Point Arithmetic
Computing Sums
Backward Error Analysis
Perturbation Analysis
Cheap and Chippy Chopping
Computing sums

1. The equation fl(a + b) = (a + b)(1 + ε) (|ε| ≤ ε_M) is the simplest example of a rounding-error analysis, and its simplest generalization is to analyze the computation of the sum

    s_n = fl(x_1 + x_2 + ... + x_n).

There is a slight ambiguity in this problem, since we have not specified the order of summation. For definiteness assume that the x's are summed left to right.

2. The tedious part of the analysis is the repeated application of the error bounds. Let

    s_i = fl(x_1 + x_2 + ... + x_i).

Then

    s_2 = fl(x_1 + x_2) = (x_1 + x_2)(1 + ε_1) = x_1(1 + ε_1) + x_2(1 + ε_1),

where |ε_1| ≤ ε_M. Similarly,

    s_3 = fl(s_2 + x_3) = (s_2 + x_3)(1 + ε_2)
        = x_1(1 + ε_1)(1 + ε_2) + x_2(1 + ε_1)(1 + ε_2) + x_3(1 + ε_2).

Continuing in this way, we find that

    s_n = fl(s_{n-1} + x_n) = (s_{n-1} + x_n)(1 + ε_{n-1})
        = x_1(1 + ε_1)(1 + ε_2)...(1 + ε_{n-1})
        + x_2(1 + ε_1)(1 + ε_2)...(1 + ε_{n-1})
        + x_3(1 + ε_2)...(1 + ε_{n-1})
        + ...                                        (7.1)
        + x_{n-1}(1 + ε_{n-2})(1 + ε_{n-1})
        + x_n(1 + ε_{n-1}),

where |ε_i| ≤ ε_M (i = 1, 2, ..., n-1).
3. The expression (7.1) is not very informative, and it will help to introduce some notation. Let the quantities η_i be defined by

    1 + η_1 = (1 + ε_1)(1 + ε_2)...(1 + ε_{n-1}),
    1 + η_2 = (1 + ε_1)(1 + ε_2)...(1 + ε_{n-1}),
    1 + η_3 = (1 + ε_2)...(1 + ε_{n-1}),
    ...
    1 + η_{n-1} = (1 + ε_{n-2})(1 + ε_{n-1}),
    1 + η_n = (1 + ε_{n-1}).

Then

    s_n = x_1(1 + η_1) + x_2(1 + η_2) + x_3(1 + η_3) + ... + x_{n-1}(1 + η_{n-1}) + x_n(1 + η_n).    (7.2)

4. The number 1 + η_i is the product of numbers 1 + ε_j that are very near one. Thus we should expect that 1 + η_i is itself near one. To get an idea of how near, consider the product

    1 + η_{n-1} = (1 + ε_{n-2})(1 + ε_{n-1}) = 1 + (ε_{n-2} + ε_{n-1}) + ε_{n-2}ε_{n-1}.    (7.3)

Now |ε_{n-2} + ε_{n-1}| ≤ 2ε_M and |ε_{n-2}ε_{n-1}| ≤ ε_M^2. If, say, ε_M = 10^-15, then 2ε_M = 2 x 10^-15 while ε_M^2 = 10^-30. Thus the third term on the right-hand side of (7.3) is insignificant compared to the second term and can be ignored. If we ignore it, we get

    η_{n-1} ≈ ε_{n-2} + ε_{n-1},

or

    |η_{n-1}| ≤ |ε_{n-2}| + |ε_{n-1}| ≤ 2ε_M (approximately).

In general,

    |η_1| ≤ (n - 1)ε_M (approximately),    (7.4)
    |η_i| ≤ (n - i + 1)ε_M (approximately),    i = 2, 3, ..., n.
5. The approximate bounds (7.4) are good enough for government work, but there are fastidious individuals who will insist on rigorous inequalities. For them we quote the following result.

    If nε_M ≤ 0.1 and |ε_i| ≤ ε_M (i = 1, 2, ..., n), then

        (1 + ε_1)(1 + ε_2)...(1 + ε_n) = 1 + η,

    where

        |η| ≤ 1.06 nε_M.

Thus if we set

    ε'_M = 1.06 ε_M,

then the approximate bounds (7.4) become quite rigorously

    |η_1| ≤ (n - 1)ε'_M,
    |η_i| ≤ (n - i + 1)ε'_M,    i = 2, 3, ..., n.    (7.5)

The quantity ε'_M is sometimes called the adjusted rounding unit.
6. The requirement that nε_M ≤ 0.1 is a restriction on the size of n, and it is reasonable to ask if it is one we need to worry about. To get some idea of what it means, suppose that ε_M = 10^-15. Then for this inequality to fail we must have n ≥ 10^14. If we start summing numbers on a computer that can add at the rate of one addition per microsecond (10^-6 sec), then the time required to sum 10^14 numbers is

    10^8 sec ≈ 3.2 years.

In other words, don't hold your breath waiting for nε_M to become greater than 0.1.
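The slow accumulation predicted by (7.4) can be watched in practice. The fragment below is mine: it sums a million copies of 0.1 in single precision against a double precision reference; since all terms are positive, the relative error it prints lies within the bound κ(n-1)ε'_M with κ = 1.

    #include <stdio.h>

    int main(void)
    {
        int n = 1000000;
        float  s = 0.0f;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            s += 0.1f;                 /* single precision, rounded each time */
            d += (double)0.1f;         /* same addends, double precision */
        }
        printf("relative error %.2e\n", (d - s)/d);
        return 0;
    }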
Backward error analysis

7. The expression

    s_n = x_1(1 + η_1) + x_2(1 + η_2) + x_3(1 + η_3) + ... + x_{n-1}(1 + η_{n-1}) + x_n(1 + η_n),    (7.6)

along with the bounds on the η_i, is called a backward error analysis because the rounding errors made in the course of the computation are projected backward onto the original data. An algorithm that has such an analysis is called stable (or sometimes backward stable).

We have already mentioned in connection with the sum of two numbers (§6.22) that stability in the backward sense is a powerful property. Usually the backward errors will be very small compared to errors that are already in the input. In that case it is the latter errors that are responsible for the inaccuracies in the answer, not the rounding errors introduced by the algorithm.
8. To emphasize this point, suppose you are a numerical analyst and are approached by a certain Dr. Xyz who has been adding up some numbers.

Xyz: I've been trying to compute the sum of ten numbers, and the answers I get are nonsense, at least from a scientific viewpoint. I wonder if the computer is fouling me up.

You: Well it certainly has happened before. What precision were you using?

Xyz: Double precision.

You: Quite right. Tell me, how accurately do you know the numbers you were summing?

Xyz: Pretty well, considering that they are experimental data. About four digits.

You: Then it's not the computer that is causing your poor results.

Xyz: How can you say that without even looking at the numbers? Some sort of magic?

You: Not at all. But first let me ask another question.

Xyz: Shoot.

You: Suppose I took your numbers and twiddled them in the sixth place. Could you tell the difference?

Xyz: Of course not. I already told you that we only know them to four places.

You: Then what would you say if I told you that the errors made by the computer could be accounted for by twiddling your data in the fourteenth place and then performing the computations exactly?

Xyz: Well, I find it hard to believe. But supposing it's true, you're right. It's my data that's the problem, not the computer.
9. At this point you might be tempted to bow out. Don't. Dr. Xyz wants to know more.

Xyz: But what went wrong? Why are my results meaningless?

You: Tell me, how big are your numbers?

Xyz: Oh, about a million.

You: And what is the size of your answer?

Xyz: About one.

You: And the answers you compute are at least an order of magnitude too large.

Xyz: How did you know that? Are you a mind reader?

You: Common sense, really. You have to cancel five digits to get your answer. Now if you knew your numbers to six or more places, you would get one or more accurate digits in your answer. Since you know only four digits, the lower two digits are garbage and won't cancel. You'll get a number in the tens or greater instead of a number near one.

Xyz: What you say makes sense. But does that mean I have to remeasure my numbers to six or more figures to get what I want?

You: That's about it.

Xyz: Well I suppose I should thank you. But under the circumstances, it's not easy.

You: That's OK. It comes with the territory.
10. The above dialogue is artificial in three respects. The problem is too simple to be characteristic of real life, and no scientist would be as naive as Dr. Xyz. Moreover, people don't roll over and play dead like Dr. Xyz: they require a lot of convincing. But the dialogue illustrates two important points. The first point is that a backward error analysis is a useful tool for removing the computer as a suspect when something goes wrong. The second is that backward stability is seldom enough. We want to know what went wrong. What is there in the problem that is causing difficulties? To use the terminology we introduced for zeros of functions: When is the problem ill-conditioned?
Perturbation analysis

11. To answer the question just posed, it is a good idea to drop any considerations of rounding error and ask in general what effects known errors in the x_i will have on the sum

    σ = x_1 + x_2 + ... + x_n.

Specifically, we will suppose that

    x̃_i = x_i(1 + ε_i),    |ε_i| ≤ ε,    (7.7)

and look for a bound on the error in the sum

    σ̃ = x̃_1 + x̃_2 + ... + x̃_n.

Such a procedure is called a perturbation analysis because it assesses the effects of perturbations in the arguments of a function on the value of the function.

The analysis is easy enough to do. We have

    |σ̃ - σ| ≤ |x_1||ε_1| + |x_2||ε_2| + ... + |x_n||ε_n| ≤ (|x_1| + |x_2| + ... + |x_n|)ε.

We can now obtain a bound on the relative error by dividing by |σ|. Specifically, if we set

    κ = (|x_1| + |x_2| + ... + |x_n|) / |x_1 + x_2 + ... + x_n|,

then

    |σ̃ - σ|/|σ| ≤ κε.    (7.8)

The number κ, which is never less than one, tells how the rounding errors made in the course of the computation are magnified in the result. Thus it serves as a condition number for the problem. (Take a moment to look back at the discussion of condition numbers in §5.21.)
12. In Dr. Xyz's problem, the experimental errors in the fourth place can be represented by ε = 10^-4, in which case the bound becomes

    relative error ≤ κ x 10^-4.

Since there were ten x's of size about 1,000,000, while the sum of the x's was about one, we have κ ≈ 10^7, and the bound says that we can expect no accuracy in the result, regardless of any additional rounding errors.
13. We can also apply the perturbation analysis to bound the effects of rounding errors on the sum. In this case the errors $\epsilon_i$ correspond to the errors $\eta_i$ in (7.2). Thus from (7.5) we have
$$|\epsilon_i| \le (n-1)\epsilon'_{\mathrm{M}},$$
so that the relative error in the sum is bounded by $(n-1)\epsilon'_{\mathrm{M}}\kappa$, where as usual
$$\kappa = \frac{|x_1| + |x_2| + \cdots + |x_n|}{|x_1 + x_2 + \cdots + x_n|}.$$
This inequality predicts that rounding error will accumulate slowly as terms are added to the sum. However, the analysis on which the bound was based assumes that the worst happens all the time, and one might expect that the factor $n-1$ is an overestimate.
14. In fact, if we sum positive numbers with rounded arithmetic, the factor will be an overestimate, since the individual rounding errors will be positive or negative at random and will tend to cancel one another. On the other hand, if we are summing positive numbers with chopped arithmetic, the errors will tend to be in the same direction (downward), and they will reinforce one another. In this case the factor $n-1$ is realistic.
15. We don't have to resort to a lengthy analysis to see how this phenomenon comes about. Instead, let's imagine that we take two five-digit numbers and do two things with them. First, we round the numbers to four digits and sum them exactly; second, we chop the numbers to four digits and once again sum them exactly. The following table shows what happens.
    number        =  rounded + error  =  chopped + error
    1374.8        =  1375    - 0.2    =  1374    + 0.8
    3856.4        =  3856    + 0.4    =  3856    + 0.4
    total 5231.2  =  5231    + 0.2    =  5230    + 1.2
As can be seen from the table, the errors made in rounding have opposite signs and cancel each other in the sum to yield a small error of 0.2. With chopping, however, the errors have the same sign and reinforce each other to yield a larger error of 1.2. Although we have summed only two numbers to keep things simple, the errors in sums with more terms tend to behave in the same way: errors from rounding tend to cancel, while errors from chopping reinforce. Thus rounding is to be preferred in an algorithm in which it may be necessary to sum numbers all having the same sign.
16. The above example makes it clear that you cannot learn everything about a floating-point system by studying the bounds for its arithmetic. In the bounds, the difference between rounding and chopping is a simple factor of two, yet when it comes to sums of positive numbers the difference in the two arithmetics is a matter of the accumulation of errors. In particular, the factor $n-1$ in the error bound (7.8) reflects how the error may grow for chopped arithmetic, while it is unrealistic for rounded arithmetic. (On statistical grounds it can be argued that the factor $\sqrt{n}$ is realistic for rounded arithmetic.)

To put things in a different light, binary, chopped arithmetic has the same bound as binary, rounded arithmetic with one less bit. Yet on the basis of what we have seen, we would be glad to sacrifice the bit to get the rounding.
Lecture 8

Floating-Point Arithmetic
Cancellation
The Quadratic Equation
That Fatal Bit of Rounding Error
Envoi
Cancellation

1. Many calculations seem to go well until they have to form the difference between two nearly equal numbers. For example, if we attempt to calculate the sum
$$37654 + 25.874 - 37679 = 0.874$$
in five-digit floating-point, we get
$$\mathrm{fl}(37654 + 25.874) = 37680$$
and
$$\mathrm{fl}(37680 - 37679) = 1.$$
This result does not agree with the true sum to even one significant figure.
2. The usual explanation of what went wrong is to say that we cancelled most of the significant figures in the calculation of $\mathrm{fl}(37680 - 37679)$ and therefore the result cannot be expected to be accurate. Now this is true as far as it goes, but it conveys the mistaken impression that the cancellation caused the inaccuracy. However, if you look closely, you will see that no error at all was made in calculating $\mathrm{fl}(37680 - 37679)$. Thus the source of the problem must lie elsewhere, and the cancellation simply revealed that the computation was in trouble.

In fact, the source of the trouble is in the addition that preceded the cancellation. Here we computed $\mathrm{fl}(37654 + 25.874) = 37680$. Now this computation is the same as if we had replaced 25.874 by 26 and computed $37654 + 26$ exactly. In other words, this computation is equivalent to throwing out the three digits 0.874 in the number 25.874. Since the answer consists of just these three digits, it is no wonder that the final computed result is wildly inaccurate. What has killed us is not the cancellation but the loss of important information earlier in the computation. The cancellation itself is merely a death certificate.
The quadratic equation

3. To explore the matter further, let us consider the problem of solving the quadratic equation
$$x^2 - bx + c = 0,$$
[Figure: $\log_2 x_k$ plotted against $k$; the vertical axis runs from $-5$ down to $-40$, the horizontal axis from $k = 0$ to $k = 40$.]
The graph descends linearly with a slope of $-2$, as one would expect of any function proportional to $(1/4)^k$. However, at $k = 20$ the graph turns around and begins to ascend with a slope of one. What has gone wrong?
11. The answer is that the difference equation (8.2) has two principal solutions:
$$\left(\tfrac{1}{4}\right)^k \quad\text{and}\quad 2^k.$$
Any solution can be expanded as a linear combination of these two solutions; i.e., the most general form of a solution is
$$\alpha\left(\tfrac{1}{4}\right)^k + \beta\,2^k.$$
Now in principle, the $x_k$ defined by (8.2) and (8.3) should have an expansion in which $\beta = 0$; however, because of rounding error, $\beta$ is effectively nonzero, though very small. As time goes by, the influence of this solution grows until it dominates. Thus the descending part of the graph represents the interval in which the contribution of $\beta 2^k$ is negligible, while the ascending portion represents the interval in which $\beta 2^k$ dominates.
12. It is possible to give a formal rounding-error analysis of the computation of $x_k$. However, it would be tedious, and there is a better way of seeing what is going on. We simply assume that all the rounding error is made at the beginning of the calculation and that the remaining calculations are performed exactly.

Specifically, let us assume that errors made in rounding have given us $x_1$ and $x_2$ that satisfy
$$x_1 = \tfrac{1}{3}(4^{0} + 2^{-56}), \qquad x_2 = \tfrac{1}{3}(4^{-1} + 2^{-55})$$
(note that $2^{-56}$ is the rounding unit for IEEE 64-bit arithmetic). Then the general solution is
$$x_k = \tfrac{1}{3}(4^{1-k} + 2^{k-57}).$$
The turnaround point for this solution occurs when
$$4^{1-k} = 2^{k-57},$$
which gives a value of $k$ between nineteen and twenty. Obviously, our simplified analysis has predicted the results we actually observed.
13. All this illustrates a general technique of wide applicability. It frequently happens that an algorithm has a critical point at which a little bit of rounding error will cause it to fail later. If you think you know the point, you can confirm it by rounding at that point but allowing no further rounding errors. If the algorithm goes bad, you have spotted a weak point, since it is unlikely that the rounding errors you have not made will somehow correct your fatal bit of error.
Envoi

14. We have now seen three ways in which rounding error can manifest itself.

1. Rounding error can accumulate, as it does during the computation of a sum. Such accumulation is slow and is usually important only for very long calculations.

2. Rounding error can be revealed by cancellation. The occurrence of cancellation is invariably an indication that something went wrong earlier in the calculation. Sometimes the problem can be cured by changing the details of the algorithm; however, if the source of the cancellation is an intrinsic ill-conditioning in the problem, then it's back to the drawing board.

3. Rounding error can be magnified by an algorithm until it dominates the numbers we actually want to compute. Again the calculation does not have to be lengthy. There are no easy fixes for this kind of problem.
It would be wrong to say that these are the only ways in which rounding error makes itself felt, but they account for many of the problems observed in practice. If you think you have been bitten by rounding error, you could do worse than ask if the problem is one of the three listed above.
Linear Equations
Lecture 9

Linear Equations
Matrices, Vectors, and Scalars
Operations with Matrices
Rank-One Matrices
Partitioned Matrices
$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1,n-1} & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2,n-1} & a_{2n}\\
\vdots & \vdots & & \vdots & \vdots\\
a_{m-1,1} & a_{m-1,2} & \cdots & a_{m-1,n-1} & a_{m-1,n}\\
a_{m1} & a_{m2} & \cdots & a_{m,n-1} & a_{mn}
\end{pmatrix}.$$
An $n$-vector $x$ is a column of $n$ numbers:
$$x = \begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}.$$
We write $x \in \mathbf{R}^n$. The number $n$ is called the dimension. The numbers $x_j$ are called the components of $x$.
4. Note that by convention, all vectors are column vectors; that is, their components are arranged in a column. Objects like $(x_1\ x_2\ \cdots\ x_n)$ whose components are arranged in a row are called row vectors. We generally write row vectors in the form $x^{\mathrm{T}}$ (see the definition of the transpose operation below in §9.18).

5. We will make no distinction between $\mathbf{R}^{n\times 1}$ and $\mathbf{R}^n$: it is all the same to us whether we call an object an $n\times 1$ matrix or an $n$-vector. Similarly, we will
not distinguish between the real numbers $\mathbf{R}$, also called scalars, and the set of 1-vectors, and the set of $1\times 1$ matrices.
6. Matrices will be designated by upper-case Latin or Greek letters, e.g., $A$, $\Lambda$, etc. Vectors will be designated by lower-case Latin letters, e.g., $x$, $y$, etc. Scalars will be designated by lower-case Latin and Greek letters. Some attempt will be made to use an associated lower-case letter for the elements of a matrix or the components of a vector. Thus the elements of $A$ will be $a_{ij}$ or possibly $\alpha_{ij}$. In particular note the association of $\xi$ with $x$ and $\eta$ with $y$.
9. If $A$ and $B$ have the same dimensions, then their sum is the matrix $A + B$ defined by
$$A + B = (a_{ij} + b_{ij}).$$
We express the fact that $A$ and $B$ have the same dimensions by saying that their dimensions are conformal for summation.

10. A matrix whose elements are all zero is called a zero matrix and is written 0 regardless of its dimensions. It is easy to verify that
$$A + 0 = 0 + A = A,$$
so that 0 is an additive identity for matrices.
11. If $A$ is an $l\times m$ matrix and $B$ is an $m\times n$ matrix, then the product $AB$ is an $l\times n$ matrix defined by
$$AB = \Bigl(\sum_{k=1}^{m} a_{ik}b_{kj}\Bigr).$$
14. The identity matrix is a special case of a useful class of matrices called diagonal matrices. A matrix $D$ is diagonal if its only nonzero entries lie on its diagonal, i.e., if $d_{ij} = 0$ whenever $i \ne j$. We write
$$\operatorname{diag}(d_1, \ldots, d_n)$$
for a diagonal matrix whose diagonal entries are $d_1, \ldots, d_n$.
15. Since we have agreed to regard an $n$-vector as an $n\times 1$ matrix, the above definitions can be transferred directly to vectors. Any vector can be multiplied by a scalar. Two vectors of the same dimension may be added. Only 1-vectors, i.e., scalars, can be multiplied.
16. A particularly important case of the matrix product is the matrix-vector product $Ax$. Among other things it is useful as an abbreviated way of writing
systems of equations. Rather than say that we shall solve the system
$$\begin{aligned}
b_1 &= a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
b_2 &= a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
&\ \ \vdots\\
b_n &= a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n,
\end{aligned}$$
we can simply write that we shall solve the equation
$$b = Ax,$$
where $A$ is of order $n$.
17. Both the matrix sum and the matrix product are associative; that is, $(A + B) + C = A + (B + C)$ and $(AB)C = A(BC)$. The product distributes over the sum; e.g., $A(B + C) = AB + AC$. In addition, the matrix sum is commutative: $A + B = B + A$. Unfortunately the matrix product is not commutative: in general $AB \ne BA$. It is easy to forget this fact when you are manipulating formulas involving matrices.
18. The final operation we shall use is the matrix transpose. If $A$ is an $m\times n$ matrix, then the transpose of $A$ is the $n\times m$ matrix $A^{\mathrm{T}}$ defined by
$$A^{\mathrm{T}} = (a_{ji}).$$
Thus the transpose is the matrix obtained by reflecting a matrix through its diagonal.

19. The transpose interacts nicely with the other matrix operations:
$$\begin{aligned}
1.&\ (\alpha A)^{\mathrm{T}} = \alpha A^{\mathrm{T}};\\
2.&\ (A + B)^{\mathrm{T}} = A^{\mathrm{T}} + B^{\mathrm{T}};\\
3.&\ (AB)^{\mathrm{T}} = B^{\mathrm{T}}A^{\mathrm{T}}.
\end{aligned}$$
Note that the transposition reverses the order of a product.
20. If $x$ is a vector, then $x^{\mathrm{T}}$ is a row vector. If $x$ and $y$ are $n$-vectors, then
$$y^{\mathrm{T}}x = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$$
is a scalar called the inner product of $x$ and $y$. In particular, the number
$$\|x\| = \sqrt{x^{\mathrm{T}}x}$$
is called the two-norm of $x$.
Rank-one matrices
21. If $x, y \ne 0$, then any matrix of the form
$$W = xy^{\mathrm{T}} = \begin{pmatrix}
x_1 y_1 & x_1 y_2 & x_1 y_3 & \cdots\\
x_2 y_1 & x_2 y_2 & x_2 y_3 & \cdots\\
x_3 y_1 & x_3 y_2 & x_3 y_3 & \cdots\\
\vdots & \vdots & \vdots &
\end{pmatrix} \qquad (9.1)$$
has rank one; that is, its columns span a one-dimensional space. Conversely, any rank-one matrix $W$ can be represented in the form $xy^{\mathrm{T}}$. Rank-one matrices arise frequently in numerical applications, and it's important to know how to deal with them.
22. The first thing to note is that one does not store a rank-one matrix as a matrix. For example, if $x$ and $y$ are $n$-vectors, then the matrix $xy^{\mathrm{T}}$ requires $n^2$ locations to store, as opposed to $2n$ locations to store $x$ and $y$. To get some idea of the difference, suppose that $n = 1000$. Then $xy^{\mathrm{T}}$ requires one million words to store as a matrix, as opposed to 2000 to store $x$ and $y$ individually: the storage differs by a factor of 500.
23. If we always represent a rank-one matrix $W = xy^{\mathrm{T}}$ by storing $x$ and $y$, the question arises of how we perform matrix operations with $W$: how, say, can we compute the matrix-vector product $c = Wb$? An elegant answer to this question may be obtained from the equation
$$c = Wb = (xy^{\mathrm{T}})b = x(y^{\mathrm{T}}b) = (y^{\mathrm{T}}b)x, \qquad (9.2)$$
in which the last equality follows from the fact that $y^{\mathrm{T}}b$ is a scalar.

This equation leads to the following algorithm.

1. Compute $\gamma = y^{\mathrm{T}}b$.    (9.3)
2. Compute $c = \gamma x$.

This algorithm requires $2n$ multiplications and $n-1$ additions. This should be contrasted with the roughly $n^2$ multiplications and additions required to form an ordinary matrix-vector product.
24. The above example illustrates the power of matrix methods in deriving efficient algorithms. A person contemplating the full matrix representation (9.1) of $xy^{\mathrm{T}}$ would no doubt come up with what amounts to the algorithm (9.3), albeit in scalar form. But the process would be arduous and error prone. On the other hand, the simple manipulations in (9.2) yield the algorithm directly and in a way that relates it naturally to operations with vectors. We shall see further examples of the power of matrix techniques in deriving algorithms.
Partitioned matrices

25. Partitioning is a device by which we can express matrix operations at a level between scalar operations and operations on full matrices. A partition of a matrix is a decomposition of the matrix into submatrices. For example, consider the matrix
$$A = \left(\begin{array}{cc|ccc|cc}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} & a_{16} & a_{17}\\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} & a_{26} & a_{27}\\ \hline
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} & a_{36} & a_{37}\\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} & a_{46} & a_{47}\\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55} & a_{56} & a_{57}
\end{array}\right).$$
The partitioning induced by the lines in the matrix allows us to write the matrix in the form
$$A = \begin{pmatrix} A_{11} & A_{12} & A_{13}\\ A_{21} & A_{22} & A_{23} \end{pmatrix},$$
where the $A_{ij}$ are the submatrices marked off above.
If $A$ is partitioned by columns as $A = (a_1\ a_2\ \ldots\ a_n)$, and the components of $x$ are $\xi_1, \ldots, \xi_n$, then
$$Ax = (a_1\ a_2\ \ldots\ a_n)\begin{pmatrix} \xi_1\\ \xi_2\\ \vdots\\ \xi_n \end{pmatrix} = \xi_1 a_1 + \xi_2 a_2 + \cdots + \xi_n a_n.$$
From this formula we can draw the following useful conclusion.

The matrix-vector product $Ax$ is the linear combination of the columns of $A$ whose coefficients are the components of $x$.
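For a matrix stored by columns, this conclusion translates directly into code. Here is a C sketch of my own of a column-oriented matrix-vector product, in which $y$ is built up as a linear combination of the columns of $A$ (the column-major storage convention is an assumption for this illustration):

    /* Column-oriented product y = A*x for an m x n matrix stored by
       columns: element (i,j) lives at a[j*m + i].  The vector y is
       accumulated as x[0]*column_1 + ... + x[n-1]*column_n. */
    void matvec_cols(int m, int n, const double *a, const double x[],
                     double y[])
    {
        for (int i = 0; i < m; i++)
            y[i] = 0.0;
        for (int j = 0; j < n; j++)          /* for each column of A  */
            for (int i = 0; i < m; i++)
                y[i] += x[j]*a[j*m + i];     /* add x_j times column j */
    }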
Lecture 10

Linear Equations
The Theory of Linear Systems
Computational Generalities
Triangular Systems
Operation Counts
2. Although the above conditions all have their applications, as a practical matter the condition $\det(A) \ne 0$ can be quite misleading. The reason is that the determinant changes violently with minor rescaling. Specifically, if $A$ is of order $n$, then
$$\det(\alpha A) = \alpha^n \det(A).$$
To see the implications of this equality, suppose that $n = 30$ (rather small by today's standards) and that $\det(A) = 1$. Then
$$\det(0.1\,A) = 10^{-30}.$$
In other words, dividing the elements of $A$ by ten reduces the determinant by a factor of $10^{-30}$. It is not easy to determine whether such a volatile quantity
One way to solve the system $Ax = b$ is to factor it:
$$A = LU,$$
where $L$ is lower triangular (i.e., its elements are zero above the diagonal) and $U$ is upper triangular (its elements are zero below the diagonal). This factorization is called an LU decomposition of the matrix $A$. Now if $A$ is nonsingular, then so are $L$ and $U$. Consequently, if we write the system $Ax = b$ in the form $LUx = b$, we have
$$Ux = L^{-1}b \equiv y. \qquad (10.3)$$
Moreover, by the definition of $y$,
$$Ly = b. \qquad (10.4)$$
Thus, if we have a method for solving triangular systems, we can use the following algorithm to solve the system $Ax = b$.

1. Factor $A = LU$.
2. Solve $Ly = b$.
3. Solve $Ux = y$.

To implement this general algorithm we must be able to factor $A$ and to solve triangular systems. We will begin with triangular systems.
Triangular systems

6. A matrix $L$ is lower triangular if
$$i < j \implies \ell_{ij} = 0.$$
This is a fancy way of saying that the elements of $L$ lying above the diagonal are zero. For example, a lower triangular matrix of order five has the form
$$L = \begin{pmatrix}
\ell_{11} & 0 & 0 & 0 & 0\\
\ell_{21} & \ell_{22} & 0 & 0 & 0\\
\ell_{31} & \ell_{32} & \ell_{33} & 0 & 0\\
\ell_{41} & \ell_{42} & \ell_{43} & \ell_{44} & 0\\
\ell_{51} & \ell_{52} & \ell_{53} & \ell_{54} & \ell_{55}
\end{pmatrix}.$$
⁸We have already noted that the indexing conventions for matrices, in which the first element is the (1,1)-element, are inconsistent with C array conventions, in which the first element of the array a is a[0][0]. In most C code presented here, we will follow the matrix convention. This wastes a little storage for the unused part of the array, but that is a small price to pay for consistency.
Operation counts

10. To get an idea of how much it costs to solve a triangular system, let us count the number of multiplications required by the algorithm. There is one multiplication in the statement

    b[i] = b[i] - l[i][j]*b[j];

This statement is executed for j running from 1 to i and for i running from 1 to n. Hence the total number of multiplications is
$$\sum_{i=1}^{n}\sum_{j=1}^{i} 1 = \sum_{i=1}^{n} i = \frac{n(n+1)}{2} \approx \frac{n^2}{2}, \qquad (10.6)$$
the last approximation holding for large $n$. There are a like number of additions.
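Algorithm (10.5) itself is not reproduced above; a C rendering consistent with the count just made would look something like this (a sketch of my own, in the 1-based indexing convention adopted earlier, with the maximum order N chosen arbitrarily):

    #define N 100

    /* Row-oriented forward substitution: solve Lx = b for a lower
       triangular L, overwriting b with the solution.  Rows and
       columns 0 of the arrays are unused. */
    void lower_solve(int n, double l[N+1][N+1], double b[N+1])
    {
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= i-1; j++)
                b[i] = b[i] - l[i][j]*b[j];   /* counted in (10.6) */
            b[i] = b[i]/l[i][i];
        }
    }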
11. Before we try to say what an operation count like (10.6) actually means, let us dispose of a technical point. In deriving operation counts for matrix processes, we generally end up with sums nested two or three deep, and we are interested in the dominant term, i.e., the term with the highest power of $n$. We can obtain this term by replacing sums with integrals and adjusting the limits of integration to make life easy. If this procedure is applied to (10.6), the result is
$$\int_0^n\!\!\int_0^i 1\,dj\,di = \int_0^n i\,di = \frac{n^2}{2},$$
Lecture 11

Linear Equations
Memory Considerations
Row-Oriented Algorithms
A Column-Oriented Algorithm
General Observations
Basic Linear Algebra Subprograms

Memory considerations
1. Virtual memory is one of the more important advances in computer systems to come out of the 1960s. The idea is simple, although the implementation is complicated. The user is supplied with a very large virtual memory. This memory is subdivided into blocks of modest size called pages. Since the entire virtual memory cannot be contained in fast, main memory, most of its pages are maintained on a slower backing store, usually a disk. Only a few active pages are contained in the main memory.

When an instruction references a memory location, there are two possibilities.

1. The page containing the location is in main memory (a hit). In this case the location is accessed immediately.

2. The page containing the location is not in main memory (a miss). In this case the system selects a page in main memory and swaps it with the one that is missing.
2. Since misses involve a time-consuming exchange of data between main memory and the backing store, they are to be avoided if at all possible. Now memory locations that are near one another are likely to lie on the same page. Hence one strategy for reducing misses is to arrange to access memory sequentially, one neighboring location after another. This is a special case of what is called locality of reference.
Row-oriented algorithms

3. The algorithm (10.5) is one that preserves locality of reference. The reason is that the language C stores doubly subscripted arrays by rows, so that in the case $n = 5$ the matrix $L$ might be stored as follows.

    l11  0   0   0   0   l21 l22  0   0   0   l31 l32 l33  0   0   l41 l42 l43 l44  0   l51 l52 l53 l54 l55

Now if you run through the loops in (10.5), you will find that the elements of $L$ are accessed in the following order.

     1                   2   3               4   5   6           7   8   9  10      11  12  13  14  15
    l11  0   0   0   0   l21 l22  0   0   0  l31 l32 l33  0   0  l41 l42 l43 l44  0  l51 l52 l53 l54 l55

Clearly the accesses here tend to be sequential. Once a row is in main memory, we march along it a word at a time.
4. A matrix algorithm like (10.5) in which the inner loops access the elements of the matrix by rows is said to be row oriented. Provided the matrix is stored by rows, as it is in C, row-oriented algorithms tend to interact nicely with virtual memories.
5. The situation is quite different with the language Fortran, in which matrices are stored by columns. For example, in Fortran the elements of the matrix $L$ will appear in storage as follows.

    l11 l21 l31 l41 l51  0  l22 l32 l42 l52  0   0  l33 l43 l53  0   0   0  l44 l54  0   0   0   0  l55

If the Fortran equivalent of algorithm (10.5) is run on this array, the memory references will occur in the following order.

     1   2   4   7  11      3   5   8  12          6   9  13             10  14                  15
    l11 l21 l31 l41 l51  0  l22 l32 l42 l52  0  0  l33 l43 l53  0  0  0  l44 l54  0   0   0   0  l55

Clearly the references are jumping all over the place, and we can expect a high miss rate for this algorithm.
A column-oriented algorithm

6. The cure for the Fortran problem is to get another algorithm, one that is column oriented. Such an algorithm is easy to derive from a partitioned form of the problem.

Specifically, let the system $Lx = b$ be partitioned in the form
$$\begin{pmatrix} \lambda_{11} & 0\\ \ell_{21} & L_{22} \end{pmatrix}\begin{pmatrix} \xi_1\\ x_2 \end{pmatrix} = \begin{pmatrix} \beta_1\\ b_2 \end{pmatrix},$$
where $L_{22}$ is lower triangular. (Note that we now use the Greek letter $\lambda$ to denote individual elements of $L$, so that we do not confuse the vector $\ell_{21} = (\lambda_{21}, \ldots, \lambda_{n1})^{\mathrm{T}}$ with the element $\lambda_{21}$.) This partitioning is equivalent to the two equations
$$\begin{aligned}
\lambda_{11}\xi_1 &= \beta_1,\\
\ell_{21}\xi_1 + L_{22}x_2 &= b_2.
\end{aligned}$$
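In code, the partitioned derivation becomes the following column-oriented sketch of my own (again in 1-based indexing, with the maximum order N an arbitrary choice): once $\xi_1$ has been computed, $\xi_1$ times the rest of column 1 of $L$ is subtracted from $b_2$, and the process repeats on the smaller system.

    #define N 100

    /* Column-oriented forward substitution: solve Lx = b,
       overwriting b.  After b[j] is computed, b[j] times the rest
       of column j of L is subtracted from the remaining components,
       exactly as in the partitioned equations above. */
    void lower_solve_col(int n, double l[N+1][N+1], double b[N+1])
    {
        for (int j = 1; j <= n; j++) {
            b[j] = b[j]/l[j][j];
            for (int i = j+1; i <= n; i++)
                b[i] = b[i] - l[i][j]*b[j];
        }
    }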
The inner loop of (10.5) subtracts the products l[i][1]*b[1] through l[i][i-1]*b[i-1] from b[i]; in other words, it computes an inner product. Consequently, if we write a little function
    float dot(int n, float x[], float y[])
    {
        int i;
        float d = 0;
        for (i = 0; i < n; i++)
            d = d + x[i]*y[i];
        return d;
    }
we can rewrite (10.5) in the form

    for (i=1; i<=n; i++)
        b[i] = (b[i] - dot(i-1, &l[i][1], &b[1]))/l[i][i];

Not only do we now have the possibility of optimizing the code for the function dot, but the row-oriented algorithm itself has been considerably simplified.
14. As another example, consider the following statements from the column-oriented algorithm (11.2).

    do 10 i=j+1,n
       b(i) = b(i) - b(j)*l(i,j)
 10 continue

Clearly these statements compute the vector
$$\begin{pmatrix} b(j{+}1)\\ b(j{+}2)\\ \vdots\\ b(n) \end{pmatrix} - b(j)\begin{pmatrix} l(j{+}1,j)\\ l(j{+}2,j)\\ \vdots\\ l(n,j) \end{pmatrix}.$$
Consequently, if we write a little Fortran program axpy (for $ax + y$)

    subroutine axpy(n, a, x, y)
    integer n
    real a, x(*), y(*)
    do 10 i=1,n
       y(i) = y(i) + a*x(i)
 10 continue
    return
    end
Lecture 12

Linear Equations
Positive-Definite Matrices
The Cholesky Decomposition
Economics
$$A = \begin{pmatrix} \alpha & a^{\mathrm{T}}\\ a & A_* \end{pmatrix},$$
then $\alpha > 0$ and $A_*$ is positive definite. To see that $\alpha > 0$, set $x = (1, 0, \ldots, 0)^{\mathrm{T}}$. Then
$$0 < x^{\mathrm{T}}Ax = (1\ 0)\begin{pmatrix} \alpha & a^{\mathrm{T}}\\ a & A_* \end{pmatrix}\begin{pmatrix} 1\\ 0 \end{pmatrix} = \alpha.$$
To see that $A_*$ is positive definite, let $y \ne 0$ and set $x^{\mathrm{T}} = (0\ y^{\mathrm{T}})$. Then
$$0 < x^{\mathrm{T}}Ax = (0\ y^{\mathrm{T}})\begin{pmatrix} \alpha & a^{\mathrm{T}}\\ a & A_* \end{pmatrix}\begin{pmatrix} 0\\ y \end{pmatrix} = y^{\mathrm{T}}A_*y.$$
Writing out this equation by blocks of the partition, we get the three equations
$$\begin{aligned}
1.&\ \alpha = \rho^2;\\
2.&\ a^{\mathrm{T}} = \rho r^{\mathrm{T}};\\
3.&\ A_* = R_*^{\mathrm{T}}R_* + rr^{\mathrm{T}}.
\end{aligned}$$
Equivalently,
$$\begin{aligned}
1.&\ \rho = \sqrt{\alpha};\\
2.&\ r^{\mathrm{T}} = \rho^{-1}a^{\mathrm{T}}; \qquad (12.3)\\
3.&\ R_*^{\mathrm{T}}R_* = A_* - rr^{\mathrm{T}}.
\end{aligned}$$
The first two equations are, in effect, an algorithm for computing the first row of $R$. The (1,1)-element of $R$ is well defined, since $\alpha > 0$. Since $\rho \ne 0$, $r^{\mathrm{T}}$ is uniquely defined by the second equation.

The third equation says that $R_*$ is the Cholesky factor of the matrix
$$\hat{A} = A_* - rr^{\mathrm{T}} = A_* - \alpha^{-1}aa^{\mathrm{T}}$$
[the last equality follows from the first two equations in (12.3)]. This matrix is of order one less than the original matrix $A$, and consequently we can compute its Cholesky factorization by applying our algorithm recursively. However, we must first establish that $\hat{A}$ is itself positive definite, so that it has a Cholesky factorization.
8. The matrix $\hat{A}$ is clearly symmetric, since
$$\hat{A}^{\mathrm{T}} = (A_* - rr^{\mathrm{T}})^{\mathrm{T}} = A_*^{\mathrm{T}} - (r^{\mathrm{T}})^{\mathrm{T}}r^{\mathrm{T}} = A_* - rr^{\mathrm{T}} = \hat{A}.$$
Hence it remains to show that for any nonzero vector $y$
$$y^{\mathrm{T}}\hat{A}y = y^{\mathrm{T}}A_*y - \alpha^{-1}(a^{\mathrm{T}}y)^2 > 0.$$
To do this we will use the positive-definiteness of $A$. If $\mu$ is any scalar, then
$$0 < (\mu\ \ y^{\mathrm{T}})\begin{pmatrix} \alpha & a^{\mathrm{T}}\\ a & A_* \end{pmatrix}\begin{pmatrix} \mu\\ y \end{pmatrix} = \alpha\mu^2 + 2\mu a^{\mathrm{T}}y + y^{\mathrm{T}}A_*y.$$
If we now set $\mu = -\alpha^{-1}a^{\mathrm{T}}y$, then it follows after a little manipulation that
$$0 < \alpha\mu^2 + 2\mu a^{\mathrm{T}}y + y^{\mathrm{T}}A_*y = y^{\mathrm{T}}A_*y - \alpha^{-1}(a^{\mathrm{T}}y)^2,$$
which is what we had to show.
9. Before we code the algorithm sketched above, let us examine its relation to an elimination method for solving a system of equations $Ax = b$. We begin by writing the equation $\hat{A} = A_* - \alpha^{-1}aa^{\mathrm{T}}$ in scalar form as follows:
$$\hat{\alpha}_{ij} = \alpha_{ij} - \alpha_{11}^{-1}\alpha_{1i}\alpha_{1j} = \alpha_{ij} - \alpha_{11}^{-1}\alpha_{i1}\alpha_{1j}.$$
Here we have put the subscripting of $A$ back into the partition, so that $\alpha = \alpha_{11}$ and $a^{\mathrm{T}} = (\alpha_{12}, \ldots, \alpha_{1n})$. The second equality follows from the symmetry of $A$.
Now consider the system
$$\begin{aligned}
\alpha_{11}x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 + \alpha_{14}x_4 &= b_1\\
\alpha_{21}x_1 + \alpha_{22}x_2 + \alpha_{23}x_3 + \alpha_{24}x_4 &= b_2\\
\alpha_{31}x_1 + \alpha_{32}x_2 + \alpha_{33}x_3 + \alpha_{34}x_4 &= b_3\\
\alpha_{41}x_1 + \alpha_{42}x_2 + \alpha_{43}x_3 + \alpha_{44}x_4 &= b_4.
\end{aligned}$$
If the first equation is solved for $x_1$, the result is
$$x_1 = \alpha_{11}^{-1}(b_1 - \alpha_{12}x_2 - \alpha_{13}x_3 - \alpha_{14}x_4).$$
Substituting $x_1$ in the last three of the original equations and simplifying, we get
$$\begin{aligned}
(\alpha_{22} - \alpha_{11}^{-1}\alpha_{21}\alpha_{12})x_2 + (\alpha_{23} - \alpha_{11}^{-1}\alpha_{21}\alpha_{13})x_3 + (\alpha_{24} - \alpha_{11}^{-1}\alpha_{21}\alpha_{14})x_4 &= b_2 - \alpha_{11}^{-1}\alpha_{21}b_1\\
(\alpha_{32} - \alpha_{11}^{-1}\alpha_{31}\alpha_{12})x_2 + (\alpha_{33} - \alpha_{11}^{-1}\alpha_{31}\alpha_{13})x_3 + (\alpha_{34} - \alpha_{11}^{-1}\alpha_{31}\alpha_{14})x_4 &= b_3 - \alpha_{11}^{-1}\alpha_{31}b_1 \qquad (12.4)\\
(\alpha_{42} - \alpha_{11}^{-1}\alpha_{41}\alpha_{12})x_2 + (\alpha_{43} - \alpha_{11}^{-1}\alpha_{41}\alpha_{13})x_3 + (\alpha_{44} - \alpha_{11}^{-1}\alpha_{41}\alpha_{14})x_4 &= b_4 - \alpha_{11}^{-1}\alpha_{41}b_1.
\end{aligned}$$
In this way we have reduced our system from one of order four to one of order three. This process is called Gaussian elimination.
Now if we compare the elements $\alpha_{ij} - \alpha_{11}^{-1}\alpha_{i1}\alpha_{1j}$ of the matrix $\hat{A}$ produced by the Cholesky algorithm with the coefficients of the system (12.4), we see that they are the same. In other words, Gaussian elimination and the Cholesky algorithm produce the same submatrix, and to that extent are equivalent. This is no coincidence: many direct algorithms for solving linear systems turn out to be variants of Gaussian elimination.
10. Let us now turn to the coding of Cholesky's algorithm. There are two ways to save time and storage.

1. Since $A$ and $\hat{A}$ are symmetric, it is unnecessary to work with the lower half; all the information we need is in the upper half. The same applies to the other submatrices generated by the algorithm.

2. Once $\alpha$ and $a^{\mathrm{T}}$ have been used to compute $\rho$ and $r^{\mathrm{T}}$, they are no longer needed. Hence their locations can be used to store $\rho$ and $r^{\mathrm{T}}$. As the algorithm proceeds, the matrix $R$ will overwrite the upper half of $A$ row by row.
11. The overwriting of $A$ by $R$ is standard procedure, dating from the time when storage was dear and to be conserved at all costs. Perhaps now that storage is bounteous, people will quit evicting $A$ and give $R$ a home of its own. Time will tell.
12. The algorithm proceeds in $n$ stages. At the first stage, the first row of $R$ is computed and the $(n-1)\times(n-1)$ matrix $A_*$ in the southeast corner is modified. At the second stage, the second row of $R$ is computed and the $(n-2)\times(n-2)$ matrix in the southeast corner is modified. The process continues until it falls out of the southeast corner. Thus the algorithm begins with a loop on the row of $R$ to be computed.

    do 40 k=1,n

At the beginning of the kth stage the array that contained $A$ has the form illustrated below for $n = 6$ and $k = 3$ (the first two rows hold the computed rows of $R$; the rest is the current reduced matrix):
$$\begin{pmatrix}
r_{11} & r_{12} & r_{13} & r_{14} & r_{15} & r_{16}\\
0 & r_{22} & r_{23} & r_{24} & r_{25} & r_{26}\\
0 & 0 & a_{33} & a_{34} & a_{35} & a_{36}\\
0 & 0 & 0 & a_{44} & a_{45} & a_{46}\\
0 & 0 & 0 & 0 & a_{55} & a_{56}\\
0 & 0 & 0 & 0 & 0 & a_{66}
\end{pmatrix}.$$
The computation of the kth row of $R$ is straightforward:

    a(k,k) = sqrt(a(k,k))
    do 10 j=k+1,n
       a(k,j) = a(k,j)/a(k,k)
 10 continue

At this point the array has the form
$$\begin{pmatrix}
r_{11} & r_{12} & r_{13} & r_{14} & r_{15} & r_{16}\\
0 & r_{22} & r_{23} & r_{24} & r_{25} & r_{26}\\
0 & 0 & r_{33} & r_{34} & r_{35} & r_{36}\\
0 & 0 & 0 & a_{44} & a_{45} & a_{46}\\
0 & 0 & 0 & 0 & a_{55} & a_{56}\\
0 & 0 & 0 & 0 & 0 & a_{66}
\end{pmatrix}.$$
We must now adjust the elements beginning with $a_{44}$. We will do it by columns.

    do 30 j=k+1,n
       do 20 i=k+1,j
          a(i,j) = a(i,j) - a(k,i)*a(k,j)
 20    continue
 30 continue
Economics

13. Since our code is in Fortran, we have tried to preserve column orientation by modifying $A_*$ by columns. Unfortunately, this strategy does not work. The kth row of $R$ is stored over the kth row of $A$, and we must repeatedly cross a row of $A$ in modifying $A_*$. The offending reference is a(k,i) in the inner loop.

    do 20 i=k+1,j
       a(i,j) = a(i,j) - a(k,i)*a(k,j)
 20 continue

There is really nothing to be done about this situation, unless we are willing to provide an extra one-dimensional array, call it r. We can then store the current row of $R$ in r and use it to adjust the current $A_*$. This results in the following code.

    do 40 k=1,n
       a(k,k) = sqrt(a(k,k))
       do 10 j=k+1,n
          a(k,j) = a(k,j)/a(k,k)
          r(j) = a(k,j)
 10    continue
       do 30 j=k+1,n
          do 20 i=k+1,j
             a(i,j) = a(i,j) - r(i)*r(j)
 20       continue
 30    continue
 40 continue
15. The fact that the Cholesky algorithm is an $O(n^3)$ algorithm has important consequences for the solution of linear systems. Given the Cholesky decomposition of $A$, we solve the linear system $Ax = b$ by solving the two triangular systems

1. $R^{\mathrm{T}}y = b$,
2. $Rx = y$.

Now a triangular system requires $\frac12 n^2$ operations to solve, and the two systems together require $n^2$ operations. To the extent that the operation counts reflect actual performance, we will spend more time in the Cholesky algorithm when
$$\tfrac16 n^3 > n^2,$$
or when $n > 6$. For somewhat larger $n$, the time spent solving the triangular systems is insignificant compared to the time spent computing the Cholesky decomposition. In particular, having computed the Cholesky decomposition of a matrix of moderate size, we can solve several systems having the same matrix at practically no extra cost.
16. In §10.4 we deprecated the practice of computing a matrix inverse to solve a linear system. Now we can see why. A good way to calculate the inverse $X = (x_1\ x_2\ \cdots\ x_n)$ of a symmetric positive-definite matrix $A$ is to compute the Cholesky decomposition and use it to solve the systems
$$Ax_j = e_j, \qquad j = 1, 2, \ldots, n,$$
where $e_j$ is the jth column of the identity matrix. Now if these solutions are computed in the most efficient way, they require $\frac13 n^3$ additions and multiplications, twice as many as the Cholesky decomposition. Thus the invert-and-multiply approach is much more expensive than using the decomposition directly to solve the linear system.
Lecture 13

Linear Equations
Inner-Product Form of the Cholesky Algorithm
Gaussian Elimination
must be greater than zero, so that we can take its square root. Thus we can compute the Cholesky factors of $A_1$, $A_2$, $A_3$, and so on until we reach $A_n = A$. The details are left as an exercise.¹¹

3. The bulk of the work done by the inner-product algorithm is in the solution of the system $R_{k-1}^{\mathrm{T}}r_k = a_k$, which requires $\frac12 k^2$ additions and multiplications. Since this solution step must be repeated for $k = 1, 2, \ldots, n$, the total operation count for the algorithm is $\frac16 n^3$, the same as for the outer-product form of the algorithm.
4. In fact the two algorithms not only have the same operation count, they perform the same arithmetic operations. The best way to see this is to position yourself at the $(i,j)$-element of the array containing $A$ and watch what happens as the two algorithms proceed. You will find that for both algorithms the $(i,j)$-element is altered as follows:
$$\begin{aligned}
&\alpha_{ij} - \rho_{1i}\rho_{1j},\\
&\alpha_{ij} - \rho_{1i}\rho_{1j} - \rho_{2i}\rho_{2j},\\
&\qquad\vdots\\
&\alpha_{ij} - \rho_{1i}\rho_{1j} - \rho_{2i}\rho_{2j} - \cdots - \rho_{i-1,i}\rho_{i-1,j}.
\end{aligned}$$
Then, depending on whether or not $i = j$, the square root of the element will be taken to give $\rho_{ii}$, or the element will be divided by $\rho_{ii}$ to give $\rho_{ij}$.

One consequence of this observation is that the two algorithms are the same with respect to rounding errors: they give the same answers to the very last bit.
Gaussian elimination

5. We will now turn to the general nonsymmetric system of linear equations $Ax = b$. Here $A$ is to be factored into the product $A = LU$ of a lower triangular matrix and an upper triangular matrix. The approach used to derive the Cholesky algorithm works equally well with nonsymmetric matrices; here, however, we will take another line that suggests important generalizations.

6. To motivate the approach, consider the linear system
$$\begin{aligned}
\alpha_{11}x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 + \alpha_{14}x_4 &= b_1\\
\alpha_{21}x_1 + \alpha_{22}x_2 + \alpha_{23}x_3 + \alpha_{24}x_4 &= b_2\\
\alpha_{31}x_1 + \alpha_{32}x_2 + \alpha_{33}x_3 + \alpha_{34}x_4 &= b_3\\
\alpha_{41}x_1 + \alpha_{42}x_2 + \alpha_{43}x_3 + \alpha_{44}x_4 &= b_4.
\end{aligned}$$
If we set
$$m_{i1} = \alpha_{i1}/\alpha_{11}, \qquad i = 2, 3, 4,$$

¹¹If you try for a column-oriented algorithm, you will end up computing inner products; a row-oriented algorithm requires axpy's.
and subtract $m_{i1}$ times the first equation from the ith equation ($i = 2, 3, 4$), we end up with the system
$$\begin{aligned}
\alpha_{11}x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 + \alpha_{14}x_4 &= b_1\\
\alpha'_{22}x_2 + \alpha'_{23}x_3 + \alpha'_{24}x_4 &= b'_2\\
\alpha'_{32}x_2 + \alpha'_{33}x_3 + \alpha'_{34}x_4 &= b'_3\\
\alpha'_{42}x_2 + \alpha'_{43}x_3 + \alpha'_{44}x_4 &= b'_4,
\end{aligned}$$
where
$$\alpha'_{ij} = \alpha_{ij} - m_{i1}\alpha_{1j} \quad\text{and}\quad b'_i = b_i - m_{i1}b_1.$$
Note that the variable $x_1$ has been eliminated from the last three equations. Because the numbers $m_{i1}$ multiply the first equation in the elimination they are called multipliers.

Now set
$$m_{i2} = \alpha'_{i2}/\alpha'_{22}, \qquad i = 3, 4,$$
and subtract $m_{i2}$ times the second equation from the ith equation ($i = 3, 4$). The result is the system
$$\begin{aligned}
\alpha_{11}x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 + \alpha_{14}x_4 &= b_1\\
\alpha'_{22}x_2 + \alpha'_{23}x_3 + \alpha'_{24}x_4 &= b'_2\\
\alpha''_{33}x_3 + \alpha''_{34}x_4 &= b''_3\\
\alpha''_{43}x_3 + \alpha''_{44}x_4 &= b''_4,
\end{aligned}$$
where
$$\alpha''_{ij} = \alpha'_{ij} - m_{i2}\alpha'_{2j} \quad\text{and}\quad b''_i = b'_i - m_{i2}b'_2.$$
Finally set
$$m_{i3} = \alpha''_{i3}/\alpha''_{33}, \qquad i = 4,$$
and subtract $m_{i3}$ times the third equation from the fourth equation. The result is the upper triangular system
$$\begin{aligned}
\alpha_{11}x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 + \alpha_{14}x_4 &= b_1\\
\alpha'_{22}x_2 + \alpha'_{23}x_3 + \alpha'_{24}x_4 &= b'_2 \qquad (13.1)\\
\alpha''_{33}x_3 + \alpha''_{34}x_4 &= b''_3\\
\alpha'''_{44}x_4 &= b'''_4,
\end{aligned}$$
where
$$\alpha'''_{ij} = \alpha''_{ij} - m_{i3}\alpha''_{3j} \quad\text{and}\quad b'''_i = b''_i - m_{i3}b''_3.$$
Since the system (13.1) is upper triangular, it can be solved by the techniques we have already discussed.
7. The algorithm we have just described for a system of order four extends in an obvious way to systems of any order. The triangularization of the system is usually called Gaussian elimination, and the solution of the resulting triangular system is called back substitution. Although there are slicker derivations, this one has the advantage of showing the flexibility of the algorithm. For example, if some of the elements to be eliminated are zero, we can skip their elimination with a corresponding savings in operations. We will put this flexibility to use later.
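For reference, here is a compact C sketch of my own of the elimination just described, without pivoting (pivoting comes in the next lecture). The multiplier $m_{ik}$ is stored over the element it eliminates, a choice justified in §13.10 below; N is an arbitrary maximum order.

    #define N 100

    /* Gaussian elimination without pivoting: reduce A to upper
       triangular form.  On return the upper half of a holds U and
       the strict lower half holds the multipliers (i.e., L).  No
       check for zero pivots is made in this sketch. */
    void gauss(int n, double a[N+1][N+1])
    {
        for (int k = 1; k <= n-1; k++)         /* elimination step k */
            for (int i = k+1; i <= n; i++) {
                a[i][k] = a[i][k]/a[k][k];     /* multiplier m_ik    */
                for (int j = k+1; j <= n; j++)
                    a[i][j] -= a[i][k]*a[k][j];
            }
    }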
8. We have not yet connected Gaussian elimination with an LU decomposition. One way is to partition the equation $A = LU$ appropriately, derive an algorithm, and observe that the algorithm is the same as the elimination algorithm we just derived. However, there is another way.
9. Let $A_1 = A$ and set
$$M_1 = \begin{pmatrix}
1 & 0 & 0 & 0\\
-m_{21} & 1 & 0 & 0\\
-m_{31} & 0 & 1 & 0\\
-m_{41} & 0 & 0 & 1
\end{pmatrix},$$
where the $m_{ij}$'s are the multipliers defined above. Then it follows that
$$A_2 \equiv M_1A_1 = \begin{pmatrix}
1 & 0 & 0 & 0\\
-m_{21} & 1 & 0 & 0\\
-m_{31} & 0 & 1 & 0\\
-m_{41} & 0 & 0 & 1
\end{pmatrix}\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14}\\
\alpha_{21} & \alpha_{22} & \alpha_{23} & \alpha_{24}\\
\alpha_{31} & \alpha_{32} & \alpha_{33} & \alpha_{34}\\
\alpha_{41} & \alpha_{42} & \alpha_{43} & \alpha_{44}
\end{pmatrix} = \begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14}\\
0 & \alpha'_{22} & \alpha'_{23} & \alpha'_{24}\\
0 & \alpha'_{32} & \alpha'_{33} & \alpha'_{34}\\
0 & \alpha'_{42} & \alpha'_{43} & \alpha'_{44}
\end{pmatrix}.$$
Then
$$A_3 \equiv M_2A_2 = \begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & -m_{32} & 1 & 0\\
0 & -m_{42} & 0 & 1
\end{pmatrix}\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14}\\
0 & \alpha'_{22} & \alpha'_{23} & \alpha'_{24}\\
0 & \alpha'_{32} & \alpha'_{33} & \alpha'_{34}\\
0 & \alpha'_{42} & \alpha'_{43} & \alpha'_{44}
\end{pmatrix} = \begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14}\\
0 & \alpha'_{22} & \alpha'_{23} & \alpha'_{24}\\
0 & 0 & \alpha''_{33} & \alpha''_{34}\\
0 & 0 & \alpha''_{43} & \alpha''_{44}
\end{pmatrix}.$$
Then
$$U \equiv M_3A_3 = \begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & -m_{43} & 1
\end{pmatrix}\begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14}\\
0 & \alpha'_{22} & \alpha'_{23} & \alpha'_{24}\\
0 & 0 & \alpha''_{33} & \alpha''_{34}\\
0 & 0 & \alpha''_{43} & \alpha''_{44}
\end{pmatrix} = \begin{pmatrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \alpha_{14}\\
0 & \alpha'_{22} & \alpha'_{23} & \alpha'_{24}\\
0 & 0 & \alpha''_{33} & \alpha''_{34}\\
0 & 0 & 0 & \alpha'''_{44}
\end{pmatrix}.$$
In other words the product $U = M_3M_2M_1A$ is the upper triangular matrix (the system of coefficients) produced by Gaussian elimination. If we set $L = M_1^{-1}M_2^{-1}M_3^{-1}$, then
$$A = LU.$$
Moreover, since the inverse of a lower triangular matrix is lower triangular and the product of lower triangular matrices is lower triangular, $L$ itself is lower triangular. Thus we have exhibited an LU factorization of $A$.
10. We have exhibited an LU factorization, but we have not yet computed it. To do that we must supply the elements of $L$. And here is a surprise. The $(i,j)$-element of $L$ is just the multiplier $m_{ij}$.

To see this, first note that $M_k^{-1}$ may be obtained from $M_k$ by flipping the sign of the multipliers; e.g.,
$$M_2^{-1} = \begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & m_{32} & 1 & 0\\
0 & m_{42} & 0 & 1
\end{pmatrix}.$$
You can establish this by showing that the product is the identity.

It is now easy to verify that
$$M_2^{-1}M_3^{-1} = \begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & m_{32} & 1 & 0\\
0 & m_{42} & 0 & 1
\end{pmatrix}\begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & 0 & 1 & 0\\
0 & 0 & m_{43} & 1
\end{pmatrix} = \begin{pmatrix}
1 & 0 & 0 & 0\\
0 & 1 & 0 & 0\\
0 & m_{32} & 1 & 0\\
0 & m_{42} & m_{43} & 1
\end{pmatrix}.$$
Lecture 14

Linear Equations
Pivoting
BLAS
Upper Hessenberg and Tridiagonal Systems

Pivoting
1. The leading diagonal elements at each stage of Gaussian elimination play a special role: they serve as divisors in the formulas for the multipliers. Because of their pivotal role they are called (what else?) pivots. If the pivots are all nonzero, the algorithm goes to completion, and the matrix has an LU factorization. However, if a pivot is zero the algorithm miscarries, and the matrix may or may not have an LU factorization. The two cases are illustrated by the matrix
$$\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix},$$
which does not have an LU factorization, and the matrix
$$\begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}\begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix},$$
which does, but is singular. In both cases the algorithm fails.
2. In some sense the failure of the algorithm is a blessing: it tells you that something has gone wrong. A greater danger is that the algorithm will go on to completion after encountering a small pivot. The following example shows what can happen.¹²
$$A_1 = \begin{pmatrix}
0.001 & 2.000 & 3.000\\
-1.000 & 3.712 & 4.623\\
-2.000 & 1.072 & 5.643
\end{pmatrix}, \qquad
M_1 = \begin{pmatrix}
1.000 & 0.000 & 0.000\\
1000. & 1.000 & 0.000\\
2000. & 0.000 & 1.000
\end{pmatrix},$$
$$A_2 = \begin{pmatrix}
0.001 & 2.000 & 3.000\\
0.000 & 2004. & 3005.\\
0.000 & 4001. & 6006.
\end{pmatrix}, \qquad
M_2 = \begin{pmatrix}
1.000 & 0.000 & 0.000\\
0.000 & 1.000 & 0.000\\
0.000 & -1.997 & 1.000
\end{pmatrix},$$
$$A_3 = \begin{pmatrix}
0.001 & 2.000 & 3.000\\
0.000 & 2004. & 3005.\\
0.000 & 0.000 & 5.000
\end{pmatrix}.$$

¹²This example is from G. W. Stewart, Introduction to Matrix Computations, Academic Press, New York, 1973.
The (3,3)-element of $A_3$ was produced by cancelling three significant figures in numbers that are about 6000, and it cannot have more than one figure of accuracy. In fact the true value is 5.922….
3. As was noted earlier, by the time cancellation occurs in a computation, the computation is already dead. In our example, death occurs in the passage from $A_1$ to $A_2$, where large multiples of the first row were added to the second and third, obliterating the significant figures in their elements. To put it another way, we would have obtained the same decomposition if we had started with the matrix
$$\tilde{A}_1 = \begin{pmatrix}
0.001 & 2.000 & 3.000\\
-1.000 & 4.000 & 5.000\\
-2.000 & 1.000 & 6.000
\end{pmatrix}.$$
Clearly, there will be little relation between the solution of the system $A_1x = b$ and $\tilde{A}_1\tilde{x} = b$.
4. If we think in terms of linear systems, a cure for this problem presents itself immediately. The original system has the form
$$\begin{aligned}
0.001x_1 + 2.000x_2 + 3.000x_3 &= b_1\\
-1.000x_1 + 3.712x_2 + 4.623x_3 &= b_2\\
-2.000x_1 + 1.072x_2 + 5.643x_3 &= b_3.
\end{aligned}$$
If we interchange the first and third equations, we obtain an equivalent system
$$\begin{aligned}
-2.000x_1 + 1.072x_2 + 5.643x_3 &= b_3\\
0.001x_1 + 2.000x_2 + 3.000x_3 &= b_1\\
-1.000x_1 + 3.712x_2 + 4.623x_3 &= b_2,
\end{aligned}$$
whose matrix
$$\hat{A}_1 = \begin{pmatrix}
-2.000 & 1.072 & 5.643\\
0.001 & 2.000 & 3.000\\
-1.000 & 3.712 & 4.623
\end{pmatrix}$$
can be reduced to triangular form without difficulty.
5. This suggests the following supplement to Gaussian elimination for computing the LU decomposition.
Notation like this is best suited for the classroom or other situations where misconceptions are easy to correct. It is risky in print, since someone will surely take it literally.
10. One drawback of the decomposition (14.1) is that it does not provide a simple factorization of the original matrix $A$. However, by a very simple modification of the Gaussian elimination algorithm, we can obtain an LU factorization of $P_{n-1}\cdots P_2P_1A$.

11. The method is best derived from a simple example. Consider $A$ with its third and fifth rows interchanged:
$$\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15}\\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25}\\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55}\\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45}\\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35}
\end{pmatrix}.$$
If one step of Gaussian elimination is performed on this matrix, we get
$$\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15}\\
m_{21} & a'_{22} & a'_{23} & a'_{24} & a'_{25}\\
m_{51} & a'_{52} & a'_{53} & a'_{54} & a'_{55}\\
m_{41} & a'_{42} & a'_{43} & a'_{44} & a'_{45}\\
m_{31} & a'_{32} & a'_{33} & a'_{34} & a'_{35}
\end{pmatrix},$$
where the numbers $m_{ij}$ and $a'_{ij}$ are the same as the numbers we would have obtained by Gaussian elimination on the original matrix; after all, they are computed by the same formulas:
$$m_{i1} = a_{i1}/a_{11}, \qquad a'_{ij} = a_{ij} - m_{i1}a_{1j}.$$
If we perform a second step of Gaussian elimination, we get
$$\begin{pmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15}\\
m_{21} & a'_{22} & a'_{23} & a'_{24} & a'_{25}\\
m_{51} & m_{52} & a''_{53} & a''_{54} & a''_{55}\\
m_{41} & m_{42} & a''_{43} & a''_{44} & a''_{45}\\
m_{31} & m_{32} & a''_{33} & a''_{34} & a''_{35}
\end{pmatrix},$$
where once again the $m_{ij}$ and the $a''_{ij}$ are from Gaussian elimination on the original matrix. Now note that this matrix differs from the one we would get from Gaussian elimination on the original matrix only in having its third and fifth rows interchanged. Thus if at the third step of Gaussian elimination we decide to use the fifth row as a pivot and exchange both the row of the
Thus we first perform the interchanges on the vector $b$ and proceed as usual to solve the two triangular systems involving $L$ and $U$.
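Putting the pieces together, here is a C sketch of my own of Gaussian elimination with partial pivoting and of the interchanges applied to $b$; the pivot search and row swap mirror the Fortran loops shown in the BLAS discussion below, and N is an arbitrary maximum order.

    #include <math.h>

    #define N 100

    /* Gaussian elimination with partial pivoting: at step k, find
       the largest element in column k on or below the diagonal,
       record its row in p[k], swap rows k and p[k], and eliminate.
       Multipliers overwrite the elements they eliminate. */
    void gauss_pivot(int n, double a[N+1][N+1], int p[N+1])
    {
        for (int k = 1; k <= n-1; k++) {
            p[k] = k;
            for (int i = k+1; i <= n; i++)       /* pivot search      */
                if (fabs(a[i][k]) > fabs(a[p[k]][k]))
                    p[k] = i;
            for (int j = 1; j <= n; j++) {       /* swap rows k, p[k] */
                double t = a[k][j];
                a[k][j] = a[p[k]][j];
                a[p[k]][j] = t;
            }
            for (int i = k+1; i <= n; i++) {     /* eliminate         */
                a[i][k] /= a[k][k];
                for (int j = k+1; j <= n; j++)
                    a[i][j] -= a[i][k]*a[k][j];
            }
        }
    }

    /* Apply the recorded interchanges to the right-hand side, in
       the same order in which the rows were swapped above. */
    void apply_pivots(int n, const int p[N+1], double b[N+1])
    {
        for (int k = 1; k <= n-1; k++) {
            double t = b[k]; b[k] = b[p[k]]; b[p[k]] = t;
        }
    }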
BLAS

15. Although we have recommended the BLAS for matrix computations, we have continued to code at the scalar level. The reason is that Gaussian elimination is a flexible algorithm that can be adapted to many special purposes. But to adapt it you need to know the details at the lowest level.

16. Nonetheless, the algorithm (14.2) offers many opportunities to use the BLAS. For example, the loop

    maxa = abs(a(k,k))
    p(k) = k
    do 10 i=k+1,n
       if (abs(a(i,k)) .gt. maxa) then
          maxa = abs(a(i,k))
          p(k) = i
       end if
 10 continue

can be replaced with a call to a BLAS that finds the position of the largest component of the vector
$$(a(k,k),\ a(k{+}1,k),\ \ldots,\ a(n,k))^{\mathrm{T}}.$$
(In the canonical BLAS, the subprogram is called imax.) The loop

    do 20 j=1,n
       temp = a(k,j)
       a(k,j) = a(p(k),j)
       a(p(k),j) = temp
 20 continue

can be replaced by a call to a BLAS (swap in the canon) that swaps the vectors

    (a(k,k), a(k,k+1), ..., a(k,n))

and

    (a(p(k),k), a(p(k),k+1), ..., a(p(k),n)).
The loop

    do 30 i=k+1,n
       a(i,k) = a(i,k)/a(k,k)
 30 continue
$$\begin{pmatrix}
a(k{+}1,k{+}1) & \cdots & a(k{+}1,n)\\
\vdots & & \vdots\\
a(n,k{+}1) & \cdots & a(n,n)
\end{pmatrix} -
\begin{pmatrix}
a(k{+}1,k)\\
\vdots\\
a(n,k)
\end{pmatrix}\bigl(a(k,k{+}1),\ \ldots,\ a(k,n)\bigr);$$
i.e., the sum of a matrix and an outer product. Since this operation occurs frequently, it is natural to assign its computation to a BLAS, which is called ger in the canon.
18. The subprogram ger is different from the BLAS we have considered so far in that it combines vectors and matrices. Since it requires $O(n^2)$ operations, it is called a level-two BLAS.

The use of level-two BLAS can reduce the dependency of an algorithm on array orientations. For example, if we replace the loops in §14.17 with an invocation of ger, then the latter can be coded in column- or row-oriented form as required. This feature of the level-two BLAS makes the translation from Fortran, which is column oriented, to C, which is row oriented, much easier. It also makes it easier to write code that takes full advantage of vector supercomputers.¹⁴
¹⁴There is also a level-three BLAS package that performs operations between matrices. Used with a technique called blocking, they can increase the efficiency of some matrix algorithms, especially on vector supercomputers, but at the cost of twisting the algorithms they benefit out of their natural shape.
Lecture 15

Linear Equations
Vector Norms
Matrix Norms
Relative Error
Sensitivity of Linear Systems

Vector norms
1. We are going to consider the sensitivity of linear systems to errors in their coefficients. To do so, we need some way of measuring the size of the errors in the coefficients and the size of the resulting perturbation in the solution. One possibility is to report the errors individually, but for matrices this amounts to $n^2$ numbers: too many to examine one by one. Instead we will summarize the sizes of the errors in a single number called a norm. There are norms for both matrices and vectors.
2. A vector norm is a function $\|\cdot\|: \mathbf{R}^n \to \mathbf{R}$ that satisfies
$$\begin{aligned}
1.&\ x \ne 0 \implies \|x\| > 0;\\
2.&\ \|\alpha x\| = |\alpha|\,\|x\|; \qquad (15.1)\\
3.&\ \|x + y\| \le \|x\| + \|y\|.
\end{aligned}$$
The first condition says that the size of a nonzero vector is positive. The second says that if a vector is multiplied by a scalar its size changes proportionally. The third is a generalization of the fact that one side of a triangle is not greater than the sum of the other two sides: see Figure 15.1. A useful variant of the triangle inequality is
$$\|x - y\| \ge \|x\| - \|y\|.$$
3. The conditions satisfied by a vector norm are satisfied by the absolute value function on the line; in fact, the absolute value is a norm on $\mathbf{R}^1$. This means that many results in analysis can be transferred mutatis mutandis from the real line to $\mathbf{R}^n$.
4. Although there are infinitely many vector norms, the ones most commonly found in practice are the one-, two-, and infinity-norms. They are defined as follows:
$$\begin{aligned}
1.&\ \|x\|_1 = \textstyle\sum_i |x_i|;\\
2.&\ \|x\|_2 = \sqrt{\textstyle\sum_i x_i^2};\\
\infty.&\ \|x\|_\infty = \max_i |x_i|.
\end{aligned}$$
[Figure 15.1. The triangle inequality, illustrated with the vectors x, y, and x + y.]
Matrix norms

5. Matrix norms are defined in analogy with vector norms. Specifically, a matrix norm is a function $\|\cdot\|: \mathbf{R}^{m\times n} \to \mathbf{R}$ that satisfies
$$\begin{aligned}
1.&\ A \ne 0 \implies \|A\| > 0;\\
2.&\ \|\alpha A\| = |\alpha|\,\|A\|;\\
3.&\ \|A + B\| \le \|A\| + \|B\|.
\end{aligned}$$
6. The triangle inequality allows us to bound the norm of the sum of two vectors in terms of the norms of the individual vectors. To get bounds on the products of matrices, we need another property. Specifically, let $\|\cdot\|$ stand for a family of norms defined for all matrices. Then we say that $\|\cdot\|$ is consistent if
$$\|AB\| \le \|A\|\,\|B\|,$$
whenever the product $AB$ is defined. A vector norm $\|\cdot\|_{\mathrm{v}}$ is consistent with a matrix norm $\|\cdot\|_{\mathrm{M}}$ if $\|Ax\|_{\mathrm{v}} \le \|A\|_{\mathrm{M}}\|x\|_{\mathrm{v}}$.
7. The requirement of consistency frustrates attempts to generalize the vector infinity-norm in a natural way. For if we define $\|A\| = \max_{i,j}|a_{ij}|$, then
$$\left\|\begin{pmatrix} 1 & 1\\ 1 & 1 \end{pmatrix}\begin{pmatrix} 1 & 1\\ 1 & 1 \end{pmatrix}\right\| = \left\|\begin{pmatrix} 2 & 2\\ 2 & 2 \end{pmatrix}\right\| = 2.$$
But
$$\left\|\begin{pmatrix} 1 & 1\\ 1 & 1 \end{pmatrix}\right\|\left\|\begin{pmatrix} 1 & 1\\ 1 & 1 \end{pmatrix}\right\| = 1\cdot 1 = 1.$$
This is one reason why the matrix one- and infinity-norms have complicated definitions. Here they are, along with the two-norm, which gets a new name:
$$\begin{aligned}
1.&\ \|A\|_1 = \max_j \textstyle\sum_i |a_{ij}|;\\
\mathrm{F}.&\ \|A\|_{\mathrm{F}} = \sqrt{\textstyle\sum_{i,j} a_{ij}^2};\\
\infty.&\ \|A\|_\infty = \max_i \textstyle\sum_j |a_{ij}|.
\end{aligned}$$
The norm $\|\cdot\|_{\mathrm{F}}$ is called the Frobenius norm.¹⁵

The one, Frobenius, and infinity norms are consistent. When $A$ is a vector, the one- and infinity-norms reduce to the vector one- and infinity-norms, and the Frobenius norm reduces to the vector two-norm.

Because the one-norm is obtained by summing the absolute values of the elements in each column and taking the maximum, it is sometimes called the column-sum norm. Similarly, the infinity-norm is called the row-sum norm.
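In C, the column-sum and row-sum norms can be computed as follows (a sketch of my own, in the 1-based convention used earlier; N is an arbitrary maximum order):

    #include <math.h>

    #define N 100

    /* Column-sum norm: the maximum over columns of the sum of the
       absolute values of the elements in the column. */
    double norm_1(int n, double a[N+1][N+1])
    {
        double max = 0.0;
        for (int j = 1; j <= n; j++) {
            double s = 0.0;
            for (int i = 1; i <= n; i++) s += fabs(a[i][j]);
            if (s > max) max = s;
        }
        return max;
    }

    /* Row-sum norm: the same with the roles of rows and columns
       exchanged. */
    double norm_inf(int n, double a[N+1][N+1])
    {
        double max = 0.0;
        for (int i = 1; i <= n; i++) {
            double s = 0.0;
            for (int j = 1; j <= n; j++) s += fabs(a[i][j]);
            if (s > max) max = s;
        }
        return max;
    }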
Relative error

8. Just as we use the absolute value function to define the relative error in a scalar, we can use norms to define relative errors in vectors and matrices. Specifically, the relative error in $y$ as an approximation to $x$ is the number
$$\rho = \frac{\|y - x\|}{\|x\|}.$$
The relative error in a matrix is defined similarly.
9. For scalars there is a close relation between relative error and the number of correct digits: if the relative error in $y$ is $\rho$, then $x$ and $y$ agree to roughly $-\log_{10}\rho$ decimal digits. This simple relation does not hold for the components of a vector, as the following example shows. Let
$$x = \begin{pmatrix} 1.0000\\ 0.0100\\ 0.0001 \end{pmatrix} \quad\text{and}\quad y = \begin{pmatrix} 1.0002\\ 0.0103\\ 0.0002 \end{pmatrix}.$$
In the infinity-norm, the relative error in $y$ as an approximation to $x$ is $3\times 10^{-4}$. But the relative errors in the individual components are $2\times 10^{-4}$, $3\times 10^{-2}$, and 1. The large component is accurate, but the smaller components are inaccurate in proportion as they are small. This is generally true of the norms we have
¹⁵We will encounter the matrix two-norm later in §17.
introduced: the relative error gives a good idea of the accuracy of the larger components but says little about small components.
10. It sometimes happens that we are given the relative error of $y$ as an approximation to $x$ and want the relative error of $x$ as an approximation to $y$. The following result says that when the relative errors are small, the two are essentially the same.

If
$$\frac{\|y - x\|}{\|x\|} = \epsilon < 1, \qquad (15.2)$$
then
$$\frac{\|x - y\|}{\|y\|} \le \frac{\epsilon}{1 - \epsilon}.$$
To see this, note that from (15.2) we have
$$\|y\| \ge \|x\| - \|y - x\| \ge \|x\| - \epsilon\|x\|,$$
or
$$(1 - \epsilon)\|x\| \le \|y\|.$$
Hence
$$\frac{\|x - y\|}{\|y\|} \le \frac{\|y - x\|}{(1 - \epsilon)\|x\|} \le \frac{\epsilon}{1 - \epsilon}.$$
If $\epsilon = 0.1$, then $\epsilon/(1 - \epsilon) = 0.111\ldots$, which differs insignificantly from $\epsilon$.
Sensitivity of linear systems

11. Usually the matrix of a linear system will not be known exactly. For example, the elements of the matrix may be measured. Or they may be computed with rounding error. In either case, we end up solving not the true system
$$Ax = b,$$
but a perturbed system
$$\tilde{A}\tilde{x} = b.$$
It is natural to ask how close $x$ is to $\tilde{x}$. This is a problem in matrix perturbation theory. From now on, $\|\cdot\|$ will denote both a consistent matrix norm and a vector norm that is consistent with the matrix norm.¹⁶
12. Let $E = \tilde{A} - A$, so that
$$\tilde{A} = A + E.$$
The first order of business is to determine conditions under which $\tilde{A}$ is nonsingular.

¹⁶We could also ask about the sensitivity of the solution to perturbations in $b$. This is a very easy problem, which we leave as an exercise.
Let $A$ be nonsingular. If
$$\|A^{-1}E\| < 1, \qquad (15.3)$$
then $A + E$ is nonsingular.

To establish this result, we will show that under the condition (15.3) if $x \ne 0$ then $(A + E)x \ne 0$. Since $A$ is nonsingular, $(A + E)x = A(I + A^{-1}E)x \ne 0$ if and only if $(I + A^{-1}E)x \ne 0$. But
$$\|(I + A^{-1}E)x\| = \|x + A^{-1}Ex\| \ge \|x\| - \|A^{-1}E\|\,\|x\| = (1 - \|A^{-1}E\|)\|x\| > 0,$$
which establishes the result.
13. We are now in a position to establish the fundamental perturbation theorem for linear systems.

Let $A$ be nonsingular and let $\tilde{A} = A + E$. If
$$Ax = b \quad\text{and}\quad \tilde{A}\tilde{x} = b,$$
where $b$ is nonzero, then
$$\frac{\|\tilde{x} - x\|}{\|\tilde{x}\|} \le \|A^{-1}E\|. \qquad (15.4)$$
If in addition
$$\|A^{-1}E\| < 1,$$
Lecture 16

Linear Equations
The Condition of a Linear System
Artificial Ill-Conditioning
Rounding Error and Gaussian Elimination
Comments on the Error Analysis
where
$$\kappa(A) = \|A\|\,\|A^{-1}\|.$$
If
$$\kappa(A)\frac{\|E\|}{\|A\|} < 1,$$
then we can write
$$\frac{\|\tilde{x} - x\|}{\|x\|} \le \frac{\kappa(A)\dfrac{\|E\|}{\|A\|}}{1 - \kappa(A)\dfrac{\|E\|}{\|A\|}}.$$
2. Now let's disassemble this inequality. First note that if $\kappa(A)\|E\|/\|A\|$ is at all small, say less than 0.1, then the denominator on the right is near one and has little effect. Thus we can consider the approximate inequality
$$\frac{\|\tilde{x} - x\|}{\|x\|} \lesssim \kappa(A)\frac{\|E\|}{\|A\|}.$$
The fraction on the left,
$$\frac{\|\tilde{x} - x\|}{\|x\|},$$
is the relative error in $\tilde{x}$ as an approximation to $x$. The fraction on the right,
$$\frac{\|E\|}{\|A\|},$$
If we solve the linear system $\tilde{A}\tilde{x} = b$ without further error, we get a solution that satisfies
$$\frac{\|\tilde{x} - x\|}{\|x\|} \lesssim \kappa(A)\,\epsilon_{\mathrm{M}}. \qquad (16.1)$$

Consider, for example, the matrix
$$A = \begin{pmatrix} 1 & 1\\ 1 & 2 \end{pmatrix},$$
whose inverse is
$$A^{-1} = \begin{pmatrix} 2 & -1\\ -1 & 1 \end{pmatrix},$$
and the matrix
$$\hat{A} = \begin{pmatrix} 10^{-4} & 10^{-4}\\ 1 & 2 \end{pmatrix},$$
whose inverse is
$$\hat{A}^{-1} = \begin{pmatrix} 2\cdot 10^{4} & -1\\ -10^{4} & 1 \end{pmatrix}.$$
The condition number of $\hat{A}$ is about $6\times 10^{4}$.

¹⁷Although it looks like you need a matrix inverse to compute the condition number, there are reliable ways of estimating it from the LU decomposition.
If we now introduce errors into the fifth digits of the elements of $A$ and $\hat{A}$ (such errors as might be generated by rounding to four places), the infinity-norms of the error matrices will be about $\|A\|_\infty 10^{-4} = \|\hat{A}\|_\infty 10^{-4}$. Thus, for $A$, the error in the solution of $Ax = b$ is approximately
$$\kappa(A)\frac{\|E\|_\infty}{\|A\|_\infty} = 9\times 10^{-4},$$
while for $\hat{A}$ the predicted error is
$$\kappa(\hat{A})\frac{\|\hat{E}\|_\infty}{\|\hat{A}\|_\infty} = 6.$$
Thus we predict a small error for $A$ and a large one for $\hat{A}$. Yet the passage from $A$ to $\hat{A}$ is equivalent to multiplying the first equation in the system $Ax = b$ by $10^{-4}$, an operation which should have no effect on the accuracy of the solution.
6. What's going on here? Is $\hat{A}$ ill conditioned or is it not? The answer is "It depends."

7. There is a sense in which $\hat{A}$ is ill conditioned. It has a row of order $10^{-4}$, and a perturbation of order $10^{-4}$ can completely change that row, even make it zero. Thus the linear system $\hat{A}x = b$ is very sensitive to perturbations of order $10^{-4}$ in the first row, and that fact is reflected in the large condition number.
8. On the other hand, the errors we get by rounding the first row of $\hat{A}$ are not all of order $10^{-4}$. Instead the errors are bounded by
$$10^{-4}\begin{pmatrix} 10^{-4} & 10^{-4}\\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 10^{-8} & 10^{-8}\\ 10^{-4} & 2\cdot 10^{-4} \end{pmatrix}.$$
If we take
$$\tilde{A} = \begin{pmatrix} 3.0000 & 2.0000\\ 2.9999 & 1.9996 \end{pmatrix}$$
and perform Gaussian elimination on $\tilde{A}$ without rounding, we get the same results as we did by performing Gaussian elimination with rounding on $A$.
12. The above example seems contrived. What is true of a $2\times 2$ matrix may be false for a large matrix. And if the matrix is nearly singular, things might be even worse.
then
$$a'_{ij} = \tilde{a}_{ij} - m_{i1}\tilde{a}_{1j}. \qquad (16.5)$$
Now it follows from (16.3) and (16.5) that the matrix
$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
m_{21} & a'_{22} & \cdots & a'_{2n}\\
\vdots & \vdots & & \vdots\\
m_{n1} & a'_{n2} & \cdots & a'_{nn}
\end{pmatrix}$$
is the one we would obtain by performing the first step of Gaussian elimination without error on the matrix
$$\tilde{A} = \begin{pmatrix}
\tilde{a}_{11} & \tilde{a}_{12} & \cdots & \tilde{a}_{1n}\\
\tilde{a}_{21} & \tilde{a}_{22} & \cdots & \tilde{a}_{2n}\\
\vdots & \vdots & & \vdots\\
\tilde{a}_{n1} & \tilde{a}_{n2} & \cdots & \tilde{a}_{nn}
\end{pmatrix}.$$
Moreover, $A$ and $\tilde{A}$ are near each other. From (16.4) it follows that for $j > 1$
$$|\tilde{a}_{ij} - a_{ij}| \le (|a_{ij}| + 3|m_{i1}||a_{1j}|)\epsilon_{\mathrm{M}}.$$
If we assume that the elimination is carried out with pivoting, so that $|m_{i1}| \le 1$, and set $\alpha = \max_{i,j}|a_{ij}|$, then the bound becomes
$$|\tilde{a}_{ij} - a_{ij}| \le 4\alpha\epsilon_{\mathrm{M}}.$$
Similarly, (16.2) implies that this bound is also satisfied for $j = 1$.
14. All this gives the flavor of the backward rounding-error analysis of Gaussian elimination; however, there is much more to do to analyze the solution of a linear system. We will skip the details and go straight to the result.

If Gaussian elimination with partial pivoting is used to solve the $n\times n$ system $Ax = b$ on a computer with rounding unit $\epsilon_{\mathrm{M}}$, the computed solution $\tilde{x}$ satisfies
$$(A + E)\tilde{x} = b,$$
where
$$\frac{\|E\|}{\|A\|} \le \varphi(n)\gamma\epsilon_{\mathrm{M}}. \qquad (16.6)$$
Here $\varphi$ is a slowly growing function of $n$ that depends on the norm, and $\gamma$ is the ratio of the largest element encountered in the course of the elimination to the largest element of $A$.
Consider, for example, the matrix
$$W = \begin{pmatrix}
1 & 0 & 0 & 0 & 1\\
-1 & 1 & 0 & 0 & 1\\
-1 & -1 & 1 & 0 & 1\\
-1 & -1 & -1 & 1 & 1\\
-1 & -1 & -1 & -1 & 1
\end{pmatrix}.$$
Gaussian elimination with partial pivoting applied to this matrix yields the following sequence of matrices:
$$\begin{pmatrix}
1 & 0 & 0 & 0 & 1\\
0 & 1 & 0 & 0 & 2\\
0 & -1 & 1 & 0 & 2\\
0 & -1 & -1 & 1 & 2\\
0 & -1 & -1 & -1 & 2
\end{pmatrix}\quad
\begin{pmatrix}
1 & 0 & 0 & 0 & 1\\
0 & 1 & 0 & 0 & 2\\
0 & 0 & 1 & 0 & 4\\
0 & 0 & -1 & 1 & 4\\
0 & 0 & -1 & -1 & 4
\end{pmatrix}\quad
\begin{pmatrix}
1 & 0 & 0 & 0 & 1\\
0 & 1 & 0 & 0 & 2\\
0 & 0 & 1 & 0 & 4\\
0 & 0 & 0 & 1 & 8\\
0 & 0 & 0 & -1 & 8
\end{pmatrix}$$
$$\begin{pmatrix}
1 & 0 & 0 & 0 & 1\\
0 & 1 & 0 & 0 & 2\\
0 & 0 & 1 & 0 & 4\\
0 & 0 & 0 & 1 & 8\\
0 & 0 & 0 & 0 & 16
\end{pmatrix}.$$
Clearly, if Gaussian elimination is performed on a matrix of order $n$ having this form, the growth factor will be $2^{n-1}$.
19. Does this mean that Gaussian elimination with partial pivoting is not to be trusted? The received opinion has been that examples like the one above occur only in numerical analysis texts: in real life there is little growth and often a decrease in the size of the elements. Recently, however, a naturally occurring example of exponential growth has been encountered, not surprisingly in a matrix that bears a family resemblance to $W$. Nonetheless, the received opinion stands. Gaussian elimination with partial pivoting is one of the most stable and efficient algorithms ever devised. Just be a little careful.
Lecture 17

Linear Equations
Introduction to a Project
More on Norms
The Wonderful Residual
Matrices with Known Condition Numbers
Invert and Multiply
Cramer's Rule
Submission
Here is what you need to know about the two-norm for this project.

1. The matrix two-norm of a vector is its vector two-norm.

2. The matrix two-norm is consistent; that is, $\|AB\| \le \|A\|\,\|B\|$ whenever $AB$ is defined.

3. $\|xy^{\mathrm{T}}\| = \|x\|\,\|y\|$.

4. $\|\operatorname{diag}(d_1, \ldots, d_n)\| = \max_i |d_i|$.

5. If $U^{\mathrm{T}}U = I$ and $V^{\mathrm{T}}V = I$ (we say $U$ and $V$ are orthogonal), then $\|U^{\mathrm{T}}AV\| = \|A\|$.

¹⁹If you find this definition confusing, think of it this way. Given a vector $x$ of length one, the matrix $A$ stretches or shrinks it into a vector of length $\|Ax\|$. The matrix two-norm of $A$ is the largest amount it can stretch or shrink a vector.
All these properties are easy to prove from the definition of the two-norm, and you might want to try your hand at it. For the last property, you begin by establishing it for the vector two-norm.

With these preliminaries out of the way, we are ready to get down to business.
The wonderful residual

4. How can you tell if an algorithm for solving the linear system $Ax = b$ is stable, that is, if the computed solution $\tilde{x}$ satisfies a slightly perturbed system
$$(A + E)\tilde{x} = b, \qquad (17.1)$$
where
$$\frac{\|E\|}{\|A\|} = O(\epsilon_{\mathrm{M}})?$$
One answer is to compute the residual $r = b - A\tilde{x}$ and set
$$E = \frac{r\tilde{x}^{\mathrm{T}}}{\tilde{x}^{\mathrm{T}}\tilde{x}}.$$
Then
$$b - (A + E)\tilde{x} = (b - A\tilde{x}) - E\tilde{x} = r - \frac{r\tilde{x}^{\mathrm{T}}\tilde{x}}{\|\tilde{x}\|^2} = r - \frac{r\|\tilde{x}\|^2}{\|\tilde{x}\|^2} = 0,$$
so that $(A + E)\tilde{x} = b$. But
$$\|E\| = \frac{\|r\tilde{x}^{\mathrm{T}}\|}{\|\tilde{x}\|^2} = \frac{\|r\|}{\|\tilde{x}\|},
\qquad\text{so that}\qquad
\frac{\|E\|}{\|A\|} = \frac{\|r\|}{\|A\|\,\|\tilde{x}\|}.$$

7. What we have shown is that the relative residual norm
$$\frac{\|r\|}{\|A\|\,\|\tilde{x}\|}$$
is a reliable indication of stability. A stable algorithm will yield a relative residual norm that is of the order of the rounding unit; an unstable algorithm will yield a larger value.
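In C, the relative residual norm might be computed as follows (a sketch of my own, using the infinity norm rather than the two-norm for convenience; the project below does the analogous computation in MATLAB, and N is an arbitrary maximum order):

    #include <math.h>

    #define N 100

    /* Relative residual norm ||b - A*xt|| / (||A|| ||xt||) in the
       infinity norm.  A stable solver should return a value on the
       order of the rounding unit. */
    double rel_residual(int n, double a[N+1][N+1],
                        const double b[N+1], const double xt[N+1])
    {
        double rnorm = 0.0, anorm = 0.0, xnorm = 0.0;
        for (int i = 1; i <= n; i++) {
            double ri = b[i], rowsum = 0.0;
            for (int j = 1; j <= n; j++) {
                ri -= a[i][j]*xt[j];       /* residual component */
                rowsum += fabs(a[i][j]);   /* row sum for ||A||   */
            }
            if (fabs(ri) > rnorm)     rnorm = fabs(ri);
            if (rowsum > anorm)       anorm = rowsum;
            if (fabs(xt[i]) > xnorm)  xnorm = fabs(xt[i]);
        }
        return rnorm/(anorm*xnorm);
    }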
Matrices with known condition numbers

8. To investigate the effects of conditioning, we need to be able to generate nontrivial matrices of known condition number. Given an order $n$ and a condition number $\kappa$, we will take $A$ in the form
$$A = UDV^{\mathrm{T}},$$
where $U$ and $V$ are random orthogonal matrices (i.e., random matrices satisfying $U^{\mathrm{T}}U = V^{\mathrm{T}}V = I$), and
$$D = \operatorname{diag}\bigl(1,\ \kappa^{-\frac{1}{n-1}},\ \kappa^{-\frac{2}{n-1}},\ \ldots,\ \kappa^{-\frac{n-2}{n-1}},\ \kappa^{-1}\bigr).$$
The fact that the condition number of $A$ is $\kappa$ follows directly from the properties of the two-norm enumerated in §17.3.
9. The first part of the project is to write a function

    function a = condmat(n, kappa)

to generate a matrix of order $n$ with condition number $\kappa$. To obtain a random orthogonal matrix, use the matlab function randn to generate a random, normally distributed matrix. Then use the function qr to factor the random matrix into the product $QR$ of an orthogonal matrix and an upper triangular matrix, and take $Q$ for the random orthogonal matrix.

You can check the condition of the matrix you generate by using the function cond.
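A minimal sketch of such a function (the diagonal follows the form of $D$ in ¶8; this is one possible realization, not the only one):

    function a = condmat(n, kappa)
    % Generate a matrix of order n with two-norm condition number kappa.
    [u, ru] = qr(randn(n));          % random orthogonal U
    [v, rv] = qr(randn(n));          % random orthogonal V
    d = kappa.^(-(0:n-1)/(n-1));     % singular values from 1 down to 1/kappa
    a = u*diag(d)*v';

A quick check: cond(condmat(25, 1e8)) should print a number near $10^8$.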
10. The second part of the project is to compare solution by Gaussian elimination with solution by invert-and-multiply. For each component kap(i) of a vector kap of condition numbers, generate a test problem of condition kap(i) and print kap(i) together with reg, rrg, rei, and rri, where

reg is the relative error in the solution by Gaussian elimination,
rrg is the relative residual norm for Gaussian elimination,
rei is the relative error in the invert-and-multiply solution,
rri is the relative residual norm for invert-and-multiply.
11. The matlab left divide operator "\" is implemented by Gaussian elimination. To invert a matrix, use the function inv.
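Thus the two solutions can be computed as follows (a sketch; A and b are assumed given):

    xg = A\b;       % Gaussian elimination with partial pivoting
    xi = inv(A)*b;  % invert-and-multiply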
Cramer's rule

12. The purpose here is to compare the stability of Gaussian elimination with Cramer's rule for solving the $2\times 2$ system $Ax = b$. For such a system, Cramer's rule can be written in the form
$$x_1 = (b_1 a_{22} - b_2 a_{12})/d,$$
$$x_2 = (b_2 a_{11} - b_1 a_{21})/d,$$
where
$$d = a_{11}a_{22} - a_{21}a_{12}.$$
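The formulas translate directly into matlab (a sketch; the 2-by-2 matrix A and the vector b are assumed given):

    d  = A(1,1)*A(2,2) - A(2,1)*A(1,2);
    x1 = (b(1)*A(2,2) - b(2)*A(1,2))/d;
    x2 = (b(2)*A(1,1) - b(1)*A(2,1))/d;
    x  = [x1; x2];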
13. Write a function

    function cramer(kap)

where kap is a vector of condition numbers. For each component kap(i), the function should do the following.
In particular, it should print kap(i) together with reg, rrg, rec, and rrc, where

reg is the relative error in the solution by Gaussian elimination,
rrg is the relative residual norm for Gaussian elimination,
rec is the relative error in the solution by Cramer's rule,
rrc is the relative residual norm for Cramer's rule.
Submission

14. Run your programs for
$$\mathtt{kap} = (1,\ 10^4,\ 10^8,\ 10^{12},\ 10^{16})$$
using the matlab command diary to accumulate your results in a file. Edit the diary file and at the top put a brief statement in your own words of what the results mean.
Polynomial Interpolation
Lecture 18

Polynomial Interpolation
Quadratic Interpolation
Shifting
Polynomial Interpolation
Lagrange Polynomials and Existence
Uniqueness

Quadratic interpolation
1. Muller's method for finding a root of the equation $f(t) = 0$ is a three-point iteration (see §4.19). Given starting values $x_0$, $x_1$, $x_2$ and corresponding function values $f_0$, $f_1$, $f_2$, one determines a quadratic polynomial
$$p(t) = a_0 + a_1t + a_2t^2$$
satisfying
$$p(x_i) = f_i, \qquad i = 0, 1, 2. \tag{18.1}$$
The next iterate $x_3$ is then taken to be the root nearest $x_2$ of the equation $p(t) = 0$.
2. At the time the method was presented, I suggested that it would be instructive to work through the details of its implementation. One of the details is the determination of the quadratic polynomial $p$ satisfying (18.1), an example of quadratic interpolation. Since quadratic interpolation exhibits many features of the general interpolation problem in readily digestible form, we will treat it first.
3. If the equations (18.1) are written out in terms of the coefficients $a_0$, $a_1$, $a_2$, the result is the linear system
$$\begin{pmatrix} 1 & x_0 & x_0^2\\ 1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2 \end{pmatrix} \begin{pmatrix} a_0\\ a_1\\ a_2 \end{pmatrix} = \begin{pmatrix} f_0\\ f_1\\ f_2 \end{pmatrix}.$$
In principle, we could find the coefficients of the interpolating polynomial by solving this system using Gaussian elimination. There are three objections to this procedure.
4. First, it is not at all clear that the matrix of the system (it is called a Vandermonde matrix) is nonsingular. In the quadratic case it is possible to see that it is nonsingular by performing one step of Gaussian elimination and verifying that the determinant of the resulting $2\times 2$ system is nonzero. However, this approach breaks down in the general case.
5. A second objection is that the procedure is too expensive. This objection is not strictly applicable to the quadratic case; but in general the procedure represents an $O(n^3)$ solution to a problem which, as we will see, can be solved in $O(n^2)$ operations.
6. Another objection is that the approach can lead to ill-conditioned systems. For example, if $x_0 = 100$, $x_1 = 101$, $x_2 = 102$, then the matrix of the system is
$$V = \begin{pmatrix} 1 & 100 & 10{,}000\\ 1 & 101 & 10{,}201\\ 1 & 102 & 10{,}404 \end{pmatrix}.$$
The condition number of this system is approximately $2\cdot 10^8$.
Now the unequal scale of the columns of $V$ suggests that there is some artificial ill-conditioning in the problem (see §16.5), and indeed there is. But if we rescale the system, so that its matrix assumes the form
$$\hat{V} = \begin{pmatrix} 1 & 1.00 & 1.0000\\ 1 & 1.01 & 1.0201\\ 1 & 1.02 & 1.0404 \end{pmatrix},$$
the condition number changes to about $10^5$, still uncomfortably large, though perhaps good enough for practical purposes. This ill-conditioning, by the way, is real and will not go away with further scaling.
Shifting
7. By rewriting the polynomial in the form
$$p(t) = b_0 + b_1(t - x_2) + b_2(t - x_2)^2,$$
we can simplify the equations and remove the ill-conditioning. Specifically, the equations for the coefficients $b_i$ become
$$\begin{pmatrix} 1 & x_0 - x_2 & (x_0 - x_2)^2\\ 1 & x_1 - x_2 & (x_1 - x_2)^2\\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} b_0\\ b_1\\ b_2 \end{pmatrix} = \begin{pmatrix} f_0\\ f_1\\ f_2 \end{pmatrix}.$$
From the third equation we have
$$b_0 = f_2,$$
from which it follows that
$$\begin{pmatrix} x_0 - x_2 & (x_0 - x_2)^2\\ x_1 - x_2 & (x_1 - x_2)^2 \end{pmatrix} \begin{pmatrix} b_1\\ b_2 \end{pmatrix} = \begin{pmatrix} f_0 - f_2\\ f_1 - f_2 \end{pmatrix}.$$
For our numerical example, this equation is
$$\begin{pmatrix} -2 & 4\\ -1 & 1 \end{pmatrix} \begin{pmatrix} b_1\\ b_2 \end{pmatrix} = \begin{pmatrix} f_0 - f_2\\ f_1 - f_2 \end{pmatrix},$$
which is very well conditioned.
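The computation is easily expressed in matlab (a sketch; the abscissas x0, x1, x2 and values f0, f1, f2 are assumed given as scalars):

    b0 = f2;
    M  = [x0-x2, (x0-x2)^2; x1-x2, (x1-x2)^2];
    b  = M \ [f0-f2; f1-f2];                     % coefficients b1 and b2
    p  = @(t) b0 + b(1)*(t-x2) + b(2)*(t-x2).^2; % the interpolant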
Polynomial interpolation
8. The quadratic interpolation problem has a number of features in common with the general problem.

1. It is of low order. High-order polynomial interpolation is rare.

2. It was introduced to derive another numerical algorithm. Not all polynomial interpolation problems originate in this way, but many numerical algorithms require a polynomial interpolant.

3. The appearance of the problem and the nature of its solution change with a change of basis.[20] When we posed the problem in the natural basis $1$, $t$, $t^2$, we got an ill-conditioned $3\times 3$ system. On the other hand, posing the problem in the shifted basis $1$, $t - x_2$, $(t - x_2)^2$ led to a well-conditioned $2\times 2$ system.
9. The general polynomial interpolation problem is the following.

Given points $(x_0, f_0)$, $(x_1, f_1)$, . . . , $(x_n, f_n)$, where the $x_i$ are distinct, determine a polynomial $p$ satisfying

1. $\deg(p) \le n$;
2. $p(x_i) = f_i$, $\quad i = 0, 1, \ldots, n$.
10. If $p$ is written in the natural basis as $p(t) = a_0 + a_1t + \cdots + a_nt^n$, the interpolation conditions give the linear system
$$\begin{pmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n\\ 1 & x_1 & x_1^2 & \cdots & x_1^n\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{pmatrix} \begin{pmatrix} a_0\\ a_1\\ \vdots\\ a_n \end{pmatrix} = \begin{pmatrix} f_0\\ f_1\\ \vdots\\ f_n \end{pmatrix}. \tag{18.2}$$
The matrix of this system is called a Vandermonde matrix. The direct solution of Vandermonde systems by Gaussian elimination is not recommended.
Lagrange polynomials and existence

11. The existence of the interpolating polynomial $p$ can be established in the following way. Suppose we are given $n+1$ polynomials $\ell_j(t)$ that satisfy the following conditions:
$$\ell_j(x_i) = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \ne j. \end{cases} \tag{18.3}$$
Then the polynomial
$$p(t) = f_0\ell_0(t) + f_1\ell_1(t) + \cdots + f_n\ell_n(t)$$
clearly satisfies $p(x_i) = f_i$; and if each $\ell_j$ is of degree $n$, then $\deg(p) \le n$, so that $p$ solves the interpolation problem.
[20] The term "basis" is used here in its usual sense. The space of, say, quadratic polynomials is a vector space. The functions $1$, $t$, $t^2$ form a basis for that space. So do the functions $1$, $t - a$, $(t - a)^2$.
12. We must now show that polynomials $\ell_j$ having the properties (18.3) actually exist. For $n = 2$, they are
$$\ell_0(t) = \frac{(t - x_1)(t - x_2)}{(x_0 - x_1)(x_0 - x_2)}, \qquad \ell_1(t) = \frac{(t - x_0)(t - x_2)}{(x_1 - x_0)(x_1 - x_2)}, \qquad \ell_2(t) = \frac{(t - x_0)(t - x_1)}{(x_2 - x_0)(x_2 - x_1)}.$$
Uniqueness

15. To establish the uniqueness of the interpolating polynomial, we use the following result from the theory of equations.

If a polynomial of degree $n$ vanishes at $n+1$ distinct points, then the polynomial is identically zero.

16. Now suppose that in addition to the polynomial $p$ the interpolation problem has another solution $q$. Then $r(t) = p(t) - q(t)$ is of degree not greater than $n$. But since $r(x_i) = p(x_i) - q(x_i) = f_i - f_i = 0$, the polynomial $r$ vanishes at $n+1$ points. Hence $r$ vanishes identically, or equivalently $p = q$.
17. The condition that $\deg(p) \le n$ in the statement of the interpolation problem appears unnatural to some people. "Why not require the polynomial to be exactly of degree $n$?" they ask. The uniqueness of the interpolant provides an answer.

Suppose, for example, we try to interpolate three points lying on a straight line by a quadratic. Now the line itself is a linear polynomial that interpolates the points. By the uniqueness theorem, the result of the quadratic interpolation must be that same straight line. What happens, of course, is that the coefficient of $t^2$ comes out zero.
Lecture 19

Polynomial Interpolation
Synthetic Division
The Newton Form of the Interpolant
Evaluation
Existence and Uniqueness
Divided Differences

Synthetic division
1. The interpolation problem does not end with the determination of the interpolating polynomial. In many applications one must evaluate the polynomial at a point $t$. As we have seen, it requires no work at all to determine the Lagrange form of the interpolant: its coefficients are the values $f_i$ themselves. On the other hand, the individual Lagrange polynomials are tricky to evaluate. For example, products of the form
$$(x_0 - x_i)\cdots(x_{i-1} - x_i)(x_{i+1} - x_i)\cdots(x_n - x_i)$$
can easily overflow or underflow.
2. Although the coefficients of the natural form of the interpolant
$$p(t) = a_nt^n + a_{n-1}t^{n-1} + \cdots + a_1t + a_0 \tag{19.1}$$
are not easy to determine, the polynomial can be efficiently and stably evaluated by an algorithm called synthetic division or nested evaluation.
3. To derive the algorithm, write (19.1) in the nested form
$$p(t) = ((\cdots(((a_n)t + a_{n-1})t + a_{n-2})\cdots)t + a_1)t + a_0. \tag{19.2}$$
(It is easy to convince yourself that (19.1) and (19.2) are the same polynomial by looking at, say, the case $n = 3$. More formally, you can prove the equality by an easy induction.) This form naturally suggests the successive evaluation
$$a_n,$$
$$(a_n)t + a_{n-1},$$
$$((a_n)t + a_{n-1})t + a_{n-2},$$
$$\vdots$$
$$((\cdots((a_n)t + a_{n-1})t + a_{n-2})\cdots)t + a_1,$$
$$(((\cdots((a_n)t + a_{n-1})t + a_{n-2})\cdots)t + a_1)t + a_0.$$
At each step in this evaluation the previously calculated value is multiplied by $t$ and added to a coefficient. This leads to the following simple algorithm.
    p = a[n];
    for (i=n-1; i>=0; i--)
        p = p*t + a[i];
4. Synthetic division is quite efficient, requiring only $n$ additions and $n$ multiplications. It is also quite stable. An elementary rounding-error analysis will show that the computed value of $p(t)$ is the exact value of a polynomial $\tilde{p}$ whose coefficients differ from those of $p$ by relative errors on the order of the rounding unit.
The Newton form of the interpolant

5. The natural form of the interpolant is difficult to determine but easy to evaluate. The Lagrange form, on the other hand, is easy to determine but difficult to evaluate. It is natural to ask, "Is there a compromise?" The answer is, "Yes, it is the Newton form of the interpolant."
6. The Newton form results from choosing the basis
$$1,\quad t - x_0,\quad (t - x_0)(t - x_1),\quad \ldots,\quad (t - x_0)(t - x_1)\cdots(t - x_{n-1}), \tag{19.3}$$
or equivalently from writing the interpolating polynomial in the form
$$p(t) = c_0 + c_1(t - x_0) + c_2(t - x_0)(t - x_1) + \cdots + c_n(t - x_0)(t - x_1)\cdots(t - x_{n-1}). \tag{19.4}$$
To turn this form of the interpolant into an efficient computational tool, we must show two things: how to determine the coefficients and how to evaluate the resulting polynomial. The algorithm for evaluating $p(t)$ is a variant of synthetic division, and it will be convenient to derive it while the latter algorithm is fresh in our minds.
Evaluation

7. To derive the algorithm, first write (19.4) in nested form:
$$p(t) = ((\cdots(((c_n)(t - x_{n-1}) + c_{n-1})(t - x_{n-2}) + c_{n-2})\cdots)(t - x_1) + c_1)(t - x_0) + c_0.$$
From this we see that the nested Newton form has the same structure as the nested natural form. The only difference is that at each nesting the multiplier $t$ is replaced by $(t - x_i)$. Hence we get the following algorithm.
    p = c[n];
    for (i=n-1; i>=0; i--)
        p = p*(t-x[i]) + c[i];
8. This algorithm requires $2n$ additions and $n$ multiplications. It is backward stable.
$$\begin{pmatrix}
1 & 0 & 0 & \cdots & 0\\
1 & (x_1 - x_0) & 0 & \cdots & 0\\
1 & (x_2 - x_0) & (x_2 - x_0)(x_2 - x_1) & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
1 & (x_n - x_0) & (x_n - x_0)(x_n - x_1) & \cdots & (x_n - x_0)\cdots(x_n - x_{n-1})
\end{pmatrix}$$
[21] The existence also follows from the fact that any sequence $\{p_i\}_{i=0}^{n}$ of polynomials such that $p_i$ is exactly of degree $i$ forms a basis for the set of polynomials of degree $\le n$ (see §23.6). The approach taken here, though less general, has pedagogical advantages.
is analogous to the Vandermonde matrix in the sense that its $(i,j)$-element is the $(j-1)$th basis element evaluated at $x_{i-1}$. The corresponding analogue for the Lagrange basis is the identity matrix. The increasingly simple structure of these matrices is reflected in the increasing ease with which we can form the interpolating polynomials in their respective bases.
12. An interesting consequence of the triangularity of the system (19.5) is that the addition of new points to the interpolation problem does not affect the coefficients we have already computed. In other words,

$c_0$ is the $0$-degree polynomial interpolating $(x_0, f_0)$;

$c_0 + c_1(t - x_0)$ is the $1$-degree polynomial interpolating $(x_0, f_0)$, $(x_1, f_1)$;

$c_0 + c_1(t - x_0) + c_2(t - x_0)(t - x_1)$ is the $2$-degree polynomial interpolating $(x_0, f_0)$, $(x_1, f_1)$, $(x_2, f_2)$;

and so on.
Divided differences

13. In principle, the triangular system (19.5) can be solved in $O(n^2)$ operations to give the coefficients of the Newton interpolant. Unfortunately, the coefficients of this system can easily overflow or underflow. However, by taking a different view of the problem, we can derive a substantially different algorithm that will determine the coefficients in $O(n^2)$ operations.
14. We begin by defining the divided difference $f[x_0, x_1, \ldots, x_k]$ to be the coefficient of $x^k$ in the polynomial interpolating $(x_0, f_0)$, $(x_1, f_1)$, . . . , $(x_k, f_k)$. From the observations in §19.12, it follows that
$$f[x_0, x_1, \ldots, x_k] = c_k;$$
i.e., $f[x_0, x_1, \ldots, x_k]$ is the coefficient of $(t - x_0)(t - x_1)\cdots(t - x_{k-1})$ in the Newton form of the interpolant.
15. From the first equation in the system (19.5), we find that
$$f[x_0] = f_0,$$
and from the second
$$f[x_0, x_1] = \frac{f_1 - f_0}{x_1 - x_0} = \frac{f[x_1] - f[x_0]}{x_1 - x_0}.$$
Thus the first divided difference is obtained from zeroth-order divided differences by subtracting and dividing, which is why it is called a divided difference.
16. The above expression for $f[x_0, x_1]$ is a special case of a more general relation:
$$f[x_0, x_1, \ldots, x_k] = \frac{f[x_1, x_2, \ldots, x_k] - f[x_0, x_1, \ldots, x_{k-1}]}{x_k - x_0}. \tag{19.6}$$
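The recurrence (19.6) leads directly to an $O(n^2)$ computation of the Newton coefficients. A matlab sketch (the vectors x and f, holding $x_0,\ldots,x_n$ and $f_0,\ldots,f_n$ in positions 1 through n+1, and an evaluation point t are assumed given):

    c = f;                    % c(i) will become f[x(1),...,x(i)]
    n = length(x) - 1;
    for k = 1:n
        for i = n+1:-1:k+1
            c(i) = (c(i) - c(i-1))/(x(i) - x(i-k));
        end
    end
    p = c(n+1);               % evaluate the Newton form at t
    for i = n:-1:1
        p = p*(t - x(i)) + c(i);
    end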
Lecture 20

Polynomial Interpolation
Error in Interpolation
Error Bounds
Convergence
Chebyshev Points
Error in interpolation

1. Up to this point we have treated the ordinates $f_i$ as arbitrary numbers. We will now shift our point of view and assume that the $f_i$ satisfy
$$f_i = f(x_i),$$
where $f$ is a function defined on some interval of interest. As usual we will assume that $f$ has as many derivatives as we need.

2. Let $p$ be the polynomial interpolating $f$ at $x_0$, $x_1$, . . . , $x_n$. Since polynomial interpolation is often done in the hope of finding an easily evaluated approximation to $f$, it is natural to look for expressions for the error
$$e(t) = f(t) - p(t).$$
In what follows, we assume that $t$ is not equal to $x_0$, $x_1$, . . . , $x_n$ (after all, the error is zero at the points of interpolation).
3. To find an expression for the error, let $q(u)$ be the polynomial of degree $n+1$ that interpolates $f$ at the points $x_0$, $x_1$, . . . , $x_n$, and $t$. The Newton form of this interpolant is
$$q(u) = c_0 + c_1(u - x_0) + \cdots + c_n(u - x_0)\cdots(u - x_{n-1}) + f[x_0, \ldots, x_n, t](u - x_0)\cdots(u - x_{n-1})(u - x_n).$$
Now (see §19.12) $c_0 + c_1(u - x_0) + \cdots + c_n(u - x_0)\cdots(u - x_{n-1})$ is just the polynomial $p(u)$. Hence if we set
$$\omega(u) = (u - x_0)\cdots(u - x_{n-1})(u - x_n),$$
we have
$$q(u) = p(u) + f[x_0, \ldots, x_n, t]\,\omega(u).$$
But by construction $q(t) = f(t)$. Hence
$$f(t) = p(t) + f[x_0, \ldots, x_n, t]\,\omega(t),$$
or
$$e(t) = f(t) - p(t) = f[x_0, \ldots, x_n, t]\,\omega(t), \tag{20.1}$$
which is the expression we are looking for.
4. Although (20.1) reveals an elegant relation between divided differences and the error in interpolation, it does not allow us to bound the magnitude of the error. However, we can derive another, more useful, expression by considering the function
$$\varphi(u) = f(u) - p(u) - f[x_0, \ldots, x_n, t]\,\omega(u).$$
Here, as above, we regard $u$ as variable and $t$ as fixed.

Since $p(x_i) = f(x_i)$ and $\omega(x_i) = 0$,
$$\varphi(x_i) = f(x_i) - p(x_i) - f[x_0, \ldots, x_n, t]\,\omega(x_i) = 0.$$
Moreover, by (20.1),
$$\varphi(t) = f(t) - p(t) - f[x_0, \ldots, x_n, t]\,\omega(t) = 0.$$
In other words, if $I$ is the smallest interval containing $x_0$, . . . , $x_n$ and $t$, then

$\varphi(u)$ has at least $n+2$ zeros in $I$.

By Rolle's theorem, between each of these zeros there is a zero of $\varphi'(u)$:

$\varphi'(u)$ has at least $n+1$ zeros in $I$.

Similarly,

$\varphi''(u)$ has at least $n$ zeros in $I$.

Continuing, we find that

$\varphi^{(n+1)}(u)$ has at least one zero in $I$.

Let $\xi$ be one of the zeros of $\varphi^{(n+1)}$ lying in $I$.
To get our error expression, we now evaluate $\varphi^{(n+1)}$ at $\xi$. Since $p(u)$ is a polynomial of degree $n$,
$$p^{(n+1)}(\xi) = 0.$$
Since $\omega(u) = u^{n+1} + \cdots$,
$$\omega^{(n+1)}(\xi) = (n+1)!.$$
Hence
$$0 = \varphi^{(n+1)}(\xi) = f^{(n+1)}(\xi) - f[x_0, \ldots, x_n, t]\,(n+1)!,$$
or
$$f[x_0, \ldots, x_n, t] = \frac{f^{(n+1)}(\xi)}{(n+1)!}. \tag{20.2}$$
In view of (20.1), we have the following result: if $\xi_t$ denotes the point $\xi$ corresponding to $t$, then
$$e(t) = f(t) - p(t) = \frac{f^{(n+1)}(\xi_t)}{(n+1)!}\,\omega(t), \qquad \xi_t \in I. \tag{20.3}$$
5. The point $\xi = \xi_t$ is a function of $t$. The proof says nothing of the properties of $\xi_t$, other than to locate it in a certain interval. However, it is easy to show that $f^{(n+1)}(\xi_t)$ is continuous: just apply l'Hôpital's rule to the expression
$$f^{(n+1)}(\xi_t) = (n+1)!\,\frac{f(t) - p(t)}{(t - x_0)\cdots(t - x_n)}$$
at the points $x_0$, . . . , $x_n$.
6. A useful and informative corollary of (20.2) is the following expression:
$$f[x_0, x_1, \ldots, x_n] = \frac{1}{n!}\,f^{(n)}(\xi),$$
where $\xi$ lies in the interval containing $x_0$, $x_1$, . . . , $x_n$. In particular, if the points $x_0$, . . . , $x_n$ cluster about a point $t$, the $n$th difference quotient is an approximation to $f^{(n)}(t)$.
Error bounds

7. We will now show how to use (20.3) to derive error bounds in a simple case. Let $\ell(t)$ be the linear polynomial interpolating $f(t)$ at $x_0$ and $x_1$, and suppose that
$$|f''(t)| \le M$$
in some interval of interest. Then
$$|f(t) - \ell(t)| = \frac{|f''(\xi)|}{2}\,|(t - x_0)(t - x_1)| \le \frac{M}{2}\,|(t - x_0)(t - x_1)|.$$
The further treatment of this bound depends on whether $t$ lies outside or inside the interval $[x_0, x_1]$.
8. If $t$ lies outside $[x_0, x_1]$, we say that we are extrapolating the polynomial approximation to $f$. Since $|(t - x_0)(t - x_1)|$ quickly becomes large as $t$ moves away from the interval, the error bound grows rapidly, and extrapolation far from $[x_0, x_1]$ is risky.
10. As an application of this bound, suppose that we want to compute cheap and dirty sines by storing values of the sine function at equally spaced points and using linear interpolation to compute intermediate values. The question then arises of how small the spacing $h$ between the points must be to achieve a prescribed accuracy.

Specifically, suppose we require the approximation to be accurate to $10^{-4}$. Since the absolute value of the second derivative of the sine is bounded by one, we have from (20.4)
$$|\sin t - \ell(t)| \le \frac{h^2}{8}.$$
Hence it suffices to take $h^2/8 \le 10^{-4}$, that is, $h \le \sqrt{8\cdot 10^{-4}} \cong 0.028$.
Convergence

11. The method just described for approximating the sine uses many interpolants over small intervals. Another possibility is to use a single high-order interpolant to represent the function $f$ over the entire interval of interest. Thus suppose that for each $n = 0$, $1$, $2$, . . . we choose $n+1$ equally spaced points and let $p_n$ interpolate $f$ at these points. If the sequence of polynomials $\{p_n\}_{n=0}^{\infty}$ converges uniformly to $f$, we know there will be an $n$ for which $p_n$ will be a sufficiently accurate approximation.
[Figure: interpolants for n = 4, 8, 12, 16 on the interval [-5, 5].]

[Figure: the case n = 16 on [-1, 1]; vertical scale 10^{-3}.]
is minimized. It can be shown that
$$\min_{\omega(x) = (x - x_0)(x - x_1)\cdots(x - x_n)}\ \max_{x \in [-1,1]} |\omega(x)| = 2^{-n}.$$
[Figure 20.3. Interpolants to 1/(1 + x^2) at the Chebyshev points for n = 4, 8, 12, 16 on [-5, 5].]
15. Figure 20.3 shows what happens when $1/(1 + x^2)$ is interpolated at the Chebyshev points. It appears to be converging satisfactorily.

16. Unfortunately, there are functions for which interpolation at the Chebyshev points fails to converge. Moreover, better approximations of functions like $1/(1 + x^2)$ can be obtained by other interpolants, e.g., cubic splines. However, if you have to interpolate a function by a polynomial of modest or high degree, you should consider basing the interpolation on the Chebyshev points.
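The experiment is easy to repeat (a matlab sketch; the Chebyshev points are taken as the zeros of the Chebyshev polynomial of degree n+1, transformed to [-5, 5], and polyfit/polyval are used purely for illustration):

    n = 16; a = -5; b = 5;
    i = (0:n)';
    xc = cos((2*i+1)*pi/(2*n+2));    % Chebyshev points on [-1,1]
    x = a + (b-a)*(xc+1)/2;          % transformed to [a,b]
    c = polyfit(x, 1./(1+x.^2), n);  % interpolating polynomial
    t = linspace(a, b, 201);
    max(abs(polyval(c, t) - 1./(1+t.^2)))   % maximum error on a grid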
Numerical Integration
Lecture 21

Numerical Integration
Numerical Integration
Change of Intervals
The Trapezoidal Rule
The Composite Trapezoidal Rule
Newton–Cotes Formulas
Undetermined Coefficients and Simpson's Rule

Numerical integration
1. The differential calculus is a science; the integral calculus is an art. Given a formula for a function, say $e^x$ or $e^{-x^2}$, it is usually possible to work your way through to its derivative. The same is not true of the integral. We can calculate
$$\int e^x\,dx$$
easily enough, but
$$\int e^{-x^2}\,dx \tag{21.1}$$
cannot be expressed in terms of the elementary algebraic and transcendental functions.
2. Sometimes it is possible to define away the problem. For example the integral (21.1) is so important in probability and statistics that there is a well-tabulated error function
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2}\,dt,$$
whose properties have been extensively studied. But this approach is specialized and not suitable for problems in which the function to be integrated is not known in advance.
3. One of the problems with an indefinite integral like (21.1) is that the solution has to be a formula. The definite integral, on the other hand, is a number, which in principle can be computed. The process of evaluating a definite integral of a function from values of the function is called numerical integration or numerical quadrature.[22]

[22] The word "quadrature" refers to finding a square whose area is the same as the area under a curve.
Change of intervals

4. A typical quadrature formula is Simpson's rule:
$$\int_0^1 f(x)\,dx \cong \frac{1}{6}f(0) + \frac{2}{3}f\Bigl(\frac{1}{2}\Bigr) + \frac{1}{6}f(1).$$
Now a rule like this would not be much good if it could only be used to integrate functions over the interval $[0, 1]$. Fortunately, by performing a linear transformation of variables, we can use the rule over an arbitrary interval $[a, b]$. Since this process is used regularly in numerical integration, we describe it now.
5. The trick is to express $x$ as a linear function of another variable $y$. The expression must be such that $x = a$ when $y = 0$ and $x = b$ when $y = 1$. This is a simple linear interpolation problem whose solution is
$$x = a + (b - a)y.$$
It follows that
$$dx = (b - a)\,dy.$$
Hence if we set
$$g(y) = f[a + (b - a)y],$$
we have
$$\int_a^b f(x)\,dx = (b - a)\int_0^1 g(y)\,dy.$$
Hence the general form of Simpson's rule is
$$\int_a^b f(x)\,dx \cong \frac{b - a}{6}\Bigl[f(a) + 4f\Bigl(\frac{a + b}{2}\Bigr) + f(b)\Bigr].$$
7. This technique easily generalizes to arbitrary changes of interval, and we will silently invoke it whenever necessary.
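The transformed rule is a one-liner in matlab (a sketch):

    simpson = @(f, a, b) (b-a)/6*(f(a) + 4*f((a+b)/2) + f(b));
    simpson(@sin, 0, pi)   % returns 2.0944; the true integral is 2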
The trapezoidal rule

8. The simplest quadrature rule in wide use is the trapezoidal rule. Like Newton's method, it has both a geometric and an analytic derivation. The geometric derivation is illustrated in Figure 21.1. The idea is to approximate the area under the curve $y = f(x)$ from $x = 0$ to $x = h$ by the area of the trapezoid $ABCD$. Now the area of a trapezoid is the product of its base with its average height. In this case the length of the base is $h$, while the average height is $[f(0) + f(h)]/2$. In this way we get the trapezoidal rule
$$\int_0^h f(x)\,dx \cong \frac{h}{2}\,[f(0) + f(h)]. \tag{21.2}$$

[Figure 21.1. The trapezoidal rule: the trapezoid ABCD with heights f(0) and f(h) over a base of length h.]
11. The error formula (21.4) shows that if $f''(x)$ is not large on $[0, h]$ and $h$ is small, the trapezoidal rule gives a good approximation to the integral. For example if $|f''(x)| \le 1$ and $h = 10^{-2}$, the error in the trapezoidal rule is less than $10^{-7}$.
The composite trapezoidal rule

12. The trapezoidal rule cannot be expected to give accurate results over a large interval. However, by summing the results of many applications of the trapezoidal rule over smaller intervals, we can obtain an accurate approximation to the integral over any interval $[a, b]$.
13. We begin by dividing $[a, b]$ into $n$ equal intervals by the points
$$a = x_0 < x_1 < \cdots < x_{n-1} < x_n = b.$$
Specifically, if
$$h = \frac{b - a}{n}$$
is the common length of the intervals, then
$$x_i = a + ih, \qquad i = 0, 1, \ldots, n.$$
Next we approximate $\int_{x_{i-1}}^{x_i} f(x)\,dx$ by the trapezoidal rule:
$$\int_{x_{i-1}}^{x_i} f(x)\,dx \cong \frac{h}{2}\,[f(x_{i-1}) + f(x_i)].$$
where $\eta_i \in [x_{i-1}, x_i]$. Now the factor $\frac{1}{n}\sum_i f''(\eta_i)$ is just the arithmetic mean of the numbers $f''(\eta_i)$. Hence it lies between the largest and the smallest of these numbers, and it follows from the intermediate value theorem that there is an $\eta \in [a, b]$ such that $f''(\eta) = \frac{1}{n}\sum_i f''(\eta_i)$. Putting all this together, we get the following result.

Let $CT_h(f)$ denote the approximation produced by the composite trapezoidal rule applied to $f$ on $[a, b]$. Then
$$\int_a^b f(x)\,dx - CT_h(f) = -\frac{(b - a)f''(\eta)}{12}\,h^2.$$
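A matlab sketch of the composite rule (the function f is assumed to accept a vector of abscissas):

    function s = ctrap(f, a, b, n)
    % Composite trapezoidal rule with n equal intervals on [a,b].
    h = (b - a)/n;
    y = f(a + h*(0:n));
    s = h*(sum(y) - (y(1) + y(n+1))/2);

Doubling n should cut the error by about a factor of four, in line with the $h^2$ error term.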
15. This is strong stuff. It says that we can make the approximate integral as accurate as we want simply by adding more points (compare this with polynomial interpolation, where convergence cannot be guaranteed). Moreover, because the error decreases as $h^2$, we get twice the bang out of our added points. For example, doubling the number of points reduces the error by a factor of four.
Newton–Cotes formulas

16. From the analytic derivation of the trapezoidal rule, we see that the rule integrates any linear polynomial exactly. This suggests that we generalize the trapezoidal rule by requiring that our rule integrate exactly any polynomial of degree $n$. Since a polynomial of degree $n$ has $n+1$ free parameters, it is reasonable to look for the approximate integral as a linear combination of the function evaluated at $n+1$ fixed points or abscissas. Such a quadrature rule is called a Newton–Cotes formula.[24]
17. Let $x_0$, $x_1$, . . . , $x_n$ be points in the interval $[a, b]$.[25] Then we wish to determine constants $A_0$, $A_1$, . . . , $A_n$, such that
$$\deg(f) \le n \implies \int_a^b f(x)\,dx = A_0 f(x_0) + A_1 f(x_1) + \cdots + A_n f(x_n). \tag{21.5}$$
This problem has an elegant solution in terms of Lagrange polynomials.

Let $\ell_i$ be the $i$th Lagrange polynomial over $x_0$, $x_1$, . . . , $x_n$. Then
$$A_i = \int_a^b \ell_i(x)\,dx \tag{21.6}$$
are the unique coefficients satisfying (21.5).
[24] Strictly speaking, the abscissas are equally spaced in a Newton–Cotes formula. But no one is making us be strict.

[25] In point of fact, the points do not have to lie in the interval, and sometimes they don't. But mostly they do.
18. To prove the assertion, first note that the rule must integrate the $i$th Lagrange polynomial. Hence
$$\int_a^b \ell_i(x)\,dx = \sum_{j=0}^{n} A_j\,\ell_i(x_j) = A_i\,\ell_i(x_i) = A_i,$$
which says that the only possible value for the $A_i$ is given by (21.6).

Now let $\deg(f) \le n$. Then
$$f(x) = \sum_{i=0}^{n} f(x_i)\,\ell_i(x).$$
Hence
$$\int_a^b f(x)\,dx = \sum_{i=0}^{n} f(x_i)\int_a^b \ell_i(x)\,dx = \sum_{i=0}^{n} f(x_i)\,A_i,$$
which is just (21.5).
Undetermined coefficients and Simpson's rule

19. Although the expressions (21.6) have a certain elegance, they are difficult to evaluate. An alternative for low-order formulas is to use the exactness property (21.5) to write down a system of equations for the coefficients, a technique known as the method of undetermined coefficients.
20. We will illustrate the technique with a three-point formula over the interval $[0, 1]$ based on the points $0$, $\frac{1}{2}$, and $1$. First note that the exactness property requires that the rule integrate the function that is identically one. In other words,
$$1\cdot A_0 + 1\cdot A_1 + 1\cdot A_2 = \int_0^1 1\,dx = 1.$$
The rule must also integrate the function $x$, which gives
$$0\cdot A_0 + \tfrac{1}{2}A_1 + 1\cdot A_2 = \int_0^1 x\,dx = \tfrac{1}{2}.$$
Finally, the rule must integrate the function $x^2$. This gives a third equation
$$0\cdot A_0 + \tfrac{1}{4}A_1 + 1\cdot A_2 = \int_0^1 x^2\,dx = \tfrac{1}{3}.$$
Solving these three equations, we find $A_0 = \frac{1}{6}$, $A_1 = \frac{2}{3}$, $A_2 = \frac{1}{6}$, which is just Simpson's rule.
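The little system can, of course, be solved mechanically (a matlab sketch):

    M = [1 1 1; 0 1/2 1; 0 1/4 1];   % exactness conditions for 1, x, x^2
    A = M \ [1; 1/2; 1/3]            % gives A = [1/6; 2/3; 1/6]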
Lecture 22

Numerical Integration
The Composite Simpson Rule
Errors in Simpson's Rule
Treatment of Singularities
Gaussian Quadrature: The Idea
2. We now wish to approximate the integral of $f$ over $[a, b]$ by summing the results of Simpson's rule over $[x_i, x_{i+2}]$. However, each application of Simpson's rule involves two of the intervals $[x_i, x_{i+1}]$. Thus the total number of intervals must be even. For the moment, therefore, we will assume that $n$ is even.
3. The summation can be written as follows:
$$\int_a^b f(x)\,dx \cong \frac{h}{3}\bigl[(f_0 + 4f_1 + f_2) + (f_2 + 4f_3 + f_4) + \cdots + (f_{n-4} + 4f_{n-3} + f_{n-2}) + (f_{n-2} + 4f_{n-1} + f_n)\bigr].$$
This sum telescopes into
$$\int_a^b f(x)\,dx \cong \frac{h}{3}\,(f_0 + 4f_1 + 2f_2 + 4f_3 + 2f_4 + \cdots + 2f_{n-2} + 4f_{n-1} + f_n), \tag{22.1}$$
which is the composite Simpson rule.
4. Here is a little fragment of code that computes the sum (22.1). As above we assume that $n$ is an even, positive integer.
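(A matlab sketch of such a fragment; the vector f, holding the values $f_0, \ldots, f_n$ in positions 1 through n+1, and the spacing h are assumed given.)

    s = f(1) + f(n+1) + 4*sum(f(2:2:n)) + 2*sum(f(3:2:n-1));
    s = h*s/3;   % the composite Simpson approximation (22.1)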
This solution works quite well. However, it has the drawback that it is not suitable for tabulated data, where additional function values are unobtainable. A third option is to concoct a Newton–Cotes-type formula of the form
$$\int_{x_{n-1}}^{x_n} f(x)\,dx \cong A_0 f_{n-2} + A_1 f_{n-1} + A_2 f_n$$
and use it to integrate over the extra interval. The formula can be easily derived by the method of undetermined coefficients. It is sometimes called the half-simp or semi-simp rule.
7. The presence of the factor $f^{(4)}(\eta)$ on the right-hand side of (22.2) implies that the error vanishes when $f$ is a cubic: although Simpson's rule was derived to be exact for quadratics, it is also exact for cubics. This is no coincidence, as we shall see when we come to treat Gaussian quadrature.
8. The error formula for the composite Simpson rule can be obtained from (22.2) in the same way as we derived the error formula for the composite trapezoidal rule. If $CS_h(f)$ denotes the result of applying the composite Simpson rule to $f$ over the interval $[a, b]$, then
$$\int_a^b f(x)\,dx - CS_h(f) = -\frac{(b - a)f^{(4)}(\eta)}{180}\,h^4,$$
where $\eta \in [a, b]$.
Treatment of singularities

9. It sometimes happens that one has to integrate a function with a singularity. For example, if
$$f(x) \cong \frac{1}{\sqrt{x}}$$
when $x$ is near zero, then $\int_0^1 f(x)\,dx$ exists. However, we cannot use the trapezoidal rule or Simpson's rule to evaluate the integral because $f(0)$ is undefined.
Of course one can try to calculate the integral by a Newton–Cotes formula based on points that exclude zero; e.g., $x_0 = \frac{1}{4}$ and $x_1 = \frac{3}{4}$. However, we will still not obtain very good results, since $f$ is not at all linear on $[0, 1]$. A better approach is to incorporate the singularity into the quadrature rule itself.
10. First define
$$g(x) = \sqrt{x}\,f(x).$$
Then $g(x) \cong 1$ when $x$ is near zero, so that $g$ is well behaved. Thus we should seek a rule that evaluates the integral
$$\int_0^1 g(x)\,x^{-\frac{1}{2}}\,dx,$$
where $g$ is a well-behaved function on $[0, 1]$. The function $x^{-\frac{1}{2}}$ is called a weight function.
11. A two-point rule of the form
$$\int_0^1 g(x)\,x^{-\frac{1}{2}}\,dx \cong A_0\,g\bigl(\tfrac{1}{4}\bigr) + A_1\,g\bigl(\tfrac{3}{4}\bigr)$$
can be found by the method of undetermined coefficients. Exactness for $g \equiv 1$ requires
$$A_0 + A_1 = \int_0^1 x^{-\frac{1}{2}}\,dx = 2,$$
and exactness for $g(x) = x$ requires $\tfrac{1}{4}A_0 + \tfrac{3}{4}A_1 = \int_0^1 x^{\frac{1}{2}}\,dx = \tfrac{2}{3}$. Solving these equations gives $A_0 = \tfrac{5}{3}$ and $A_1 = \tfrac{1}{3}$, so that
$$\int_0^1 g(x)\,x^{-\frac{1}{2}}\,dx \cong \frac{5}{3}\,g\bigl(\tfrac{1}{4}\bigr) + \frac{1}{3}\,g\bigl(\tfrac{3}{4}\bigr).$$
12. In transforming this formula to another interval, say $[0, h]$, care must be taken to transform the weighting function properly. For example, if we wish to evaluate
$$\int_0^h g(x)\,x^{-\frac{1}{2}}\,dx,$$
the substitution $x = hy$ gives
$$\int_0^h g(x)\,x^{-\frac{1}{2}}\,dx = \sqrt{h}\int_0^1 g(hy)\,y^{-\frac{1}{2}}\,dy.$$
Owing to the weight function $x^{-\frac{1}{2}}$, the transformed integral is multiplied by $\sqrt{h}$, not by $h$.
For example, the following matlab fragment compares the two-point Newton–Cotes formula and the weighted rule on the integral $\int_0^{0.1} \bigl[\tfrac{1}{2}\cos x/\sqrt{x} - \sqrt{x}\sin x\bigr]\,dx = \sqrt{0.1}\,\cos 0.1$.

    x = .025;
    f0 = .5*cos(x)/sqrt(x) - sqrt(x)*sin(x);
    g0 = .5*cos(x) - x*sin(x);
    x = .075;
    f1 = .5*cos(x)/sqrt(x) - sqrt(x)*sin(x);
    g1 = .5*cos(x) - x*sin(x);
    [.05*(f0+f1), sqrt(.1)*(5*g0/3 + g1/3), sqrt(.1)*cos(.1)]
The true value of the integral is 0.3146. The Newton–Cotes approximation is 0.2479. The weighted approximation is 0.3151, a great improvement.
Lecture 23

Numerical Integration
Gaussian Quadrature: The Setting
Orthogonal Polynomials
Existence
Zeros of Orthogonal Polynomials
Gaussian Quadrature
Error and Convergence
Examples

Existence
9. To establish the existence of orthogonal polynomials, we begin by computing the first two. Since $p_0$ is monic and of degree zero,
$$p_0(x) \equiv 1.$$
Since $p_1$ is monic and of degree one, it must have the form
$$p_1(x) = x - \alpha_1.$$
To determine $\alpha_1$, we use orthogonality:
$$0 = \int p_1p_0 = \int (x - \alpha_1)\cdot 1 = \int x - \alpha_1\int 1.$$
Since the function $1$ is positive in the interval of integration, $\int 1 > 0$, and it follows that
$$\alpha_1 = \frac{\int x}{\int 1}.$$
10. In general we will seek $p_{n+1}$ in the form
$$p_{n+1} = xp_n - \alpha_{n+1}p_n - \beta_{n+1}p_{n-1} - \gamma_{n+1}p_{n-2} - \cdots.$$
As in the construction of $p_1$, we use orthogonality to determine the coefficients $\alpha_{n+1}$, $\beta_{n+1}$, $\gamma_{n+1}$, . . . .

To determine $\alpha_{n+1}$, write
$$0 = \int p_{n+1}p_n = \int xp_np_n - \alpha_{n+1}\int p_np_n - \beta_{n+1}\int p_{n-1}p_n - \gamma_{n+1}\int p_{n-2}p_n - \cdots.$$
By orthogonality, $0 = \int p_{n-1}p_n = \int p_{n-2}p_n = \cdots$. Hence
$$\int xp_n^2 - \alpha_{n+1}\int p_n^2 = 0.$$
Since $\int p_n^2 > 0$, we may solve this equation to get
$$\alpha_{n+1} = \frac{\int xp_n^2}{\int p_n^2}.$$
For $\beta_{n+1}$, write
$$0 = \int p_{n+1}p_{n-1} = \int xp_np_{n-1} - \alpha_{n+1}\int p_np_{n-1} - \beta_{n+1}\int p_{n-1}p_{n-1} - \gamma_{n+1}\int p_{n-2}p_{n-1} - \cdots.$$
Dropping terms that are zero because of orthogonality, we get
$$\int xp_np_{n-1} - \beta_{n+1}\int p_{n-1}^2 = 0$$
or
$$\beta_{n+1} = \frac{\int xp_np_{n-1}}{\int p_{n-1}^2}.$$
11. The formulas for the remaining coefficients are similar to the formula for $\beta_{n+1}$; e.g.,
$$\gamma_{n+1} = \frac{\int xp_np_{n-2}}{\int p_{n-2}^2}.$$
However, there is a surprise here. The numerator $\int xp_np_{n-2}$ can be written in the form $\int (xp_{n-2})p_n$. Since $xp_{n-2}$ is of degree $n-1$, it is orthogonal to $p_n$; i.e., $\int xp_{n-2}p_n = 0$. Hence $\gamma_{n+1} = 0$, and likewise the coefficients of $p_{n-3}$, $p_{n-4}$, . . . are zero.
12. To summarize:

The orthogonal polynomials can be generated by the following recurrence:
$$p_0 = 1,$$
$$p_1 = x - \alpha_1,$$
$$p_{n+1} = xp_n - \alpha_{n+1}p_n - \beta_{n+1}p_{n-1}, \qquad n = 1, 2, \ldots,$$
where
$$\alpha_{n+1} = \frac{\int xp_n^2}{\int p_n^2} \qquad\text{and}\qquad \beta_{n+1} = \frac{\int xp_np_{n-1}}{\int p_{n-1}^2}.$$

The first two equations in the recurrence merely start things off. The right-hand side of the third equation contains three terms, and for that reason it is called the three-term recurrence for the orthogonal polynomials.
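For the weight $w(x) \equiv 1$ on $[-1, 1]$ the recurrence coefficients work out to $\alpha_{n+1} = 0$ and $\beta_{n+1} = n^2/(4n^2 - 1)$, giving the monic Legendre polynomials (a known special case, quoted here without derivation). A matlab sketch that evaluates them at a point x:

    N = 5; x = 0.3;
    p = zeros(N+1, 1);
    p(1) = 1;                        % p_0(x)
    p(2) = x;                        % p_1(x); alpha_1 = 0 by symmetry
    for n = 1:N-1
        beta = n^2/(4*n^2 - 1);      % beta_{n+1} for the Legendre weight
        p(n+2) = x*p(n+1) - beta*p(n);
    end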
Zeros of orthogonal polynomials

13. It will turn out that the abscissas of our Gaussian quadrature formula will be the zeros of $p_{n+1}$. We will now show that

the zeros of $p_{n+1}$ are real, simple, and lie in the interval $[a, b]$.

14. Let $x_0$, $x_1$, . . . , $x_k$ be the zeros of odd multiplicity of $p_{n+1}$ in $[a, b]$; i.e., $x_0$, $x_1$, . . . , $x_k$ are the points at which $p_{n+1}$ changes sign in $[a, b]$. If $k = n$, we are through, since the $x_i$ are the $n+1$ zeros of $p_{n+1}$.
Suppose then that $k < n$ and consider the polynomial
$$q(x) = (x - x_0)(x - x_1)\cdots(x - x_k).$$
Since $\deg(q) = k + 1 < n + 1$, by orthogonality
$$\int p_{n+1}q = 0.$$
On the other hand $p_{n+1}(x)q(x)$ cannot change sign on $[a, b]$: each sign change in $p_{n+1}(x)$ is cancelled by a corresponding sign change in $q(x)$. It follows that
$$\int p_{n+1}q \ne 0,$$
a contradiction. Hence $k = n$, and the assertion is established.
16. To establish this result, namely that the formula $G_nf$ whose abscissas are the zeros of $p_{n+1}$ is exact for all polynomials of degree at most $2n+1$, first note that by construction the integration formula $G_nf$ is exact for polynomials of degree less than or equal to $n$ (see §21.17).

Now let $\deg(f) \le 2n+1$. Divide $f$ by $p_{n+1}$ to get
$$f = p_{n+1}q + r, \qquad \deg(q),\ \deg(r) \le n. \tag{23.4}$$
Then, since the abscissas $x_i$ are the zeros of $p_{n+1}$,
$$G_nf = \sum_i A_i f(x_i) = \sum_i A_i\,[p_{n+1}(x_i)q(x_i) + r(x_i)] = \sum_i A_i\,r(x_i) = \int r,$$
the last equality because the rule is exact for $r$, whose degree is at most $n$. On the other hand, $\int p_{n+1}q = 0$ by orthogonality, so that $\int f = \int p_{n+1}q + \int r = \int r$. Hence $G_nf = \int f$ whenever $\deg(f) \le 2n+1$.
20. A consequence of the positivity of the coefficients $A_i$ is that Gaussian quadrature converges for any continuous function; that is,
$$f\ \text{continuous} \implies \lim_{n\to\infty} G_nf = \int f.$$
The proof, a good exercise in elementary analysis, is based on the Weierstrass approximation theorem, which says that for any continuous function $f$ there is a sequence of polynomials that converges uniformly to $f$.
Examples

21. Particular Gauss formulas arise from particular choices of the interval $[a, b]$ and the weight function $w(x)$. The workhorse is Gauss–Legendre quadrature,[26] in which $[a, b] = [-1, 1]$ and $w(x) \equiv 1$, so that the formula approximates the integral
$$\int_{-1}^{1} f(x)\,dx.$$
Another important choice takes $[a, b] = [-\infty, \infty]$ and $w(x) = e^{-x^2}$, so that the formula approximates
$$\int_{-\infty}^{\infty} f(x)\,e^{-x^2}\,dx.$$
This is Gauss–Hermite quadrature.
24. There are many other Gauss formulas suitable for special purposes. Most mathematical handbooks have tables of the abscissas and coefficients. The automatic generation of Gauss formulas is an interesting subject in its own right.
Numerical Differentiation
Lecture 24

Numerical Differentiation
Numerical Differentiation and Integration
Formulas from Power Series
Limitations
$$\begin{array}{rl}
1:\quad & f(x+h) = f(x) + hf'(x) + \dfrac{h^2}{2}f''(x) + \dfrac{h^3}{6}f'''(\xi_+)\\[4pt]
-1:\quad & f(x-h) = f(x) - hf'(x) + \dfrac{h^2}{2}f''(x) - \dfrac{h^3}{6}f'''(\xi_-)\\[4pt]
\hline
& f(x+h) - f(x-h) = 2hf'(x) + \dfrac{h^3}{6}\,[f'''(\xi_+) + f'''(\xi_-)]
\end{array}$$
Note that the error term consists of two evaluations of $f'''$, one at $\xi_+ \in [x, x+h]$ from truncating the series for $f(x+h)$ and the other at $\xi_- \in [x-h, x]$ from truncating the series for $f(x-h)$. If $f'''$ is continuous, the average of these two values can be written as $f'''(\xi)$, where $\xi \in [x-h, x+h]$. Hence we have the central-difference formula
$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \frac{h^2}{6}f'''(\xi), \qquad \xi \in [x-h, x+h].$$
5. Since the error in the central-difference formula is of order $h^2$, it is ultimately more accurate than a forward-difference scheme. And on the face of it, both require two function evaluations, so that it is no less economical. However, in many applications we will be given $f(x)$ along with $x$. When this happens, a forward-difference formula requires only one additional function evaluation, compared with two for a central-difference formula.
6. To get a formula for the second derivative, we choose the coefficients to pick off the first two terms of the Taylor expansion:
$$\begin{array}{rl}
1:\quad & f(x+h) = f(x) + hf'(x) + \dfrac{h^2}{2}f''(x) + \dfrac{h^3}{6}f'''(x) + \dfrac{h^4}{24}f^{(4)}(\xi_+)\\[4pt]
-2:\quad & f(x) = f(x)\\[4pt]
1:\quad & f(x-h) = f(x) - hf'(x) + \dfrac{h^2}{2}f''(x) - \dfrac{h^3}{6}f'''(x) + \dfrac{h^4}{24}f^{(4)}(\xi_-)\\[4pt]
\hline
& f(x+h) - 2f(x) + f(x-h) = h^2f''(x) + \dfrac{h^4}{12}\,\dfrac{f^{(4)}(\xi_+) + f^{(4)}(\xi_-)}{2}
\end{array}$$
7. The same technique yields a formula for $f'(x)$ that looks only forward, using the values at $x$, $x+h$, and $x+2h$:
$$\begin{array}{rl}
-3:\quad & f(x) = f(x)\\[4pt]
4:\quad & f(x+h) = f(x) + hf'(x) + \dfrac{h^2}{2}f''(x) + \dfrac{h^3}{6}f'''(\xi_1)\\[4pt]
-1:\quad & f(x+2h) = f(x) + 2hf'(x) + 2h^2f''(x) + \dfrac{4h^3}{3}f'''(\xi_2)\\[4pt]
\hline
& -3f(x) + 4f(x+h) - f(x+2h) = 2hf'(x) + \dfrac{2h^3}{3}f'''(\xi_1) - \dfrac{4h^3}{3}f'''(\xi_2)
\end{array}$$
Hence
$$f'(x) = \frac{-3f(x) + 4f(x+h) - f(x+2h)}{2h} + \frac{h^2}{3}\,[2f'''(\xi_2) - f'''(\xi_1)].$$
The error term does not depend on a single value of $f'''$; however, if $h$ is small, it is approximately
$$\frac{h^2}{3}f'''(x).$$
8. The technique just described has much in common with the method of undetermined coefficients for finding integration rules. There are other, more systematic ways of deriving formulas to approximate derivatives. But this one is easy to remember if you are stranded on a desert island without a textbook.
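A quick numerical check of the central-difference formula (a matlab sketch; exp is used because its derivative is known exactly): halving h should cut the error by about a factor of four.

    x = 1;
    for h = [1e-1, 5e-2, 2.5e-2]
        d = (exp(x+h) - exp(x-h))/(2*h);
        disp([h, abs(d - exp(x))])   % error behaves like (h^2/6)*exp(x)
    end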
Limitations

9. As we indicated at the beginning of this lecture, errors in the values of $f$ can cause inaccuracies in the computed derivatives. In fact the errors do more: they place a limit on the accuracy to which a given formula can compute derivatives.
10. To see how this comes about, consider the forward-difference formula
$$D(f) = \frac{f(x+h) - f(x)}{h} - \frac{h}{2}f''(\xi),$$
where $D(f)$ denotes the operation of differentiating $f$ at $x$. If we define the operator $D_h$ by
$$D_h(f) = \frac{f(x+h) - f(x)}{h},$$
then the error in the forward-difference approximation is
$$D_h(f) - D(f) = \frac{h}{2}f''(\xi).$$
In particular, if
$$|f''(t)| \le M$$
in the interval of interest, then the truncation error satisfies $|D_h(f) - D(f)| \le Mh/2$.
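The limitation shows up immediately in an experiment (a matlab sketch): as h shrinks, the truncation error falls like h, but the rounding error in forming the difference grows like eps/h, so the total error bottoms out and then rises.

    x = 1;
    for h = 10.^-(1:15)
        d = (sin(x+h) - sin(x))/h;
        disp([h, abs(d - cos(x))])   % best accuracy near h = sqrt(eps)
    end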