Numerical Linear Algebra
Jochen Voss
University of Warwick
December 2004
I tried to keep the text as free of errors as possible. Please report any remaining mistakes to
Jochen Voss (voss@seehuhn.de). The current version of the text can always be found on my
home page at http://seehuhn.de/mathe/numlinalg.html .
Copyright © 2004 Jochen Voss and Andrew Stuart
Contents

2 Complexity of Algorithms
  2.1 Computational Cost
  2.2 Analysis of Matrix-Matrix Multiplication
5 Iterative Methods
  5.1 Linear Methods
  5.2 The Conjugate-Gradient Method
Introduction
These lecture notes cover the course Numerical Linear Algebra (MA398) given in the autumn
term 2004 at the University of Warwick. The notes are partially based on lecture notes written
by Andrew Stuart for earlier courses.
What is numerical linear algebra? Of course, we consider the same problems which are
considered in a linear algebra course. But this time the focus is different. We are interested in
methods which work for very large matrices. These large problems occur for example when
continuous problems are discretised or in image analysis. Throughout the lecture we will
address three main problems.
SLE (simultaneous linear equations): given a matrix $A \in \mathbb{C}^{n\times n}$ and a vector $b \in \mathbb{C}^n$, find $x \in \mathbb{C}^n$ with
$$Ax = b.$$
LSQ (least squares problem): given a matrix $A \in \mathbb{C}^{m\times n}$ and a vector $b \in \mathbb{C}^m$ ($m \geq n$), find $x \in \mathbb{C}^n$ which minimises the distance
$$\|Ax - b\|.$$
EV (eigenvalue problem): given a matrix $A \in \mathbb{C}^{n\times n}$, find $\lambda \in \mathbb{C}$ and $x \in \mathbb{C}^n$ with
$$Ax = \lambda x, \qquad x \neq 0.$$
All of this is meant to deal with really big matrices. Where do these matrices come from?
As mentioned above these occur e.g. in the area of discretised continuous problems. This is
illustrated by the following example.
Example. We want to numerically solve the following problem: given $a, b \in \mathbb{R}$, find a function
$f\colon [0,1] \to \mathbb{R}$ with $f''(x) = 0$ for all $x \in [0,1]$, $f(0) = a$ and $f(1) = b$.
Of course this problem only serves as an illustration. It can easily be solved analytically
(question to the reader: what is the result?). But there are similar problems (for example in
the two-dimensional case) where direct solution is no longer feasible and the numerical approach
becomes practical. The methods used there are exactly the same as the ones presented in this
example.
The idea is to discretise the problem: for $k = 0, \dots, N$ and $N \in \mathbb{N}$ let $x_k = k/N$. We
consider the finite set $\{x_0, x_1, \dots, x_N\}$ instead of the interval $[0,1]$ and we consider the vector
$(f(x_0), f(x_1), \dots, f(x_N))$ instead of the function $f$.
What should we do about the derivative $f''$? Using the Taylor formula we find that for large
$N$ the approximation
$$f''(x_k) \approx N^2\bigl(f(x_{k-1}) - 2f(x_k) + f(x_{k+1})\bigr) \tag{0.1}$$
holds. Thus we replace the original problem by the following system of linear equations:
$$f(x_0) = a,$$
$$N^2 f(x_{k-1}) - 2N^2 f(x_k) + N^2 f(x_{k+1}) = 0 \qquad \text{for } k = 1, \dots, N-1,$$
$$f(x_N) = b.$$
Since we used the approximation (0.1) the result will not be exact, but if we choose $N$ large
enough the approximation gets better and we can hope that the result is close to the exact result.
To solve this problem we need to be able to deal with large systems of linear equations.
As mentioned above this example is very simple and only serves as an illustration, but more
interesting examples lead to the same kind of linear equations. You can learn more about this
in courses about the numerical solution of partial differential equations.
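The discretisation above is easy to carry out on a computer. The following sketch (in Python; the function name and the use of the tridiagonal special case of Gaussian elimination are our own choices, made for illustration only) sets up and solves the system for given $a$, $b$ and $N$:

```python
# Discretisation of f''(x) = 0 on [0, 1] with f(0) = a, f(1) = b.
# The linear system from (0.1) is tridiagonal, so we can solve it
# with a tridiagonal variant of Gaussian elimination.

def solve_bvp(a, b, N):
    """Solve N^2 (f_{k-1} - 2 f_k + f_{k+1}) = 0 for k = 1, ..., N-1
    with boundary values f_0 = a, f_N = b."""
    # interior unknowns f_1, ..., f_{N-1}; the system has diagonal -2
    # and off-diagonal entries 1 (the common factor N^2 cancels)
    n = N - 1
    diag = [-2.0] * n
    rhs = [0.0] * n
    rhs[0] -= a           # move the known boundary values to the right-hand side
    rhs[-1] -= b
    # forward elimination
    for k in range(1, n):
        m = 1.0 / diag[k - 1]
        diag[k] -= m * 1.0
        rhs[k] -= m * rhs[k - 1]
    # back substitution
    f = [0.0] * n
    f[-1] = rhs[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        f[k] = (rhs[k] - f[k + 1]) / diag[k]
    return [a] + f + [b]

print(solve_bvp(0.0, 1.0, 4))   # exact solution of the problem is f(x) = x
```

For this simple right-hand side the numerical result reproduces the analytic solution up to rounding errors.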
The course gives only an introduction to the topics of numerical linear algebra. Further
results can be found in many text books. The course is based on the following books: the books
of Lloyd N. Trefethen [TB97] and James W. Demmel [Dem97] give good introductions to
the topic. J. Stoer and R. Bulirsch [SB02] give a more theoretical presentation of numerical
analysis, which also contains results about numerical linear algebra. The book of Roger A. Horn
and Charles R. Johnson [HJ85] is a good reference for theoretical results about matrix analysis.
Nicholas J. Higham's book [Hig02] contains a lot of information about stability and the effect of
rounding errors in numerical algorithms.
Chapter 1
Linear Algebra
The purpose of this chapter is to summarise a few results from linear algebra and to provide
some basic theoretical tools which we will later need for our analysis.
Examples. The $p$-norm for $1 \leq p < \infty$:
$$\|x\|_p = \Bigl(\sum_{j=1}^n |x_j|^p\Bigr)^{1/p}, \qquad x \in \mathbb{C}^n,$$
in particular the Euclidean norm
$$\|x\|_2 = \sqrt{\sum_{j=1}^n |x_j|^2}, \qquad x \in \mathbb{C}^n.$$
Example. The standard inner product on $\mathbb{C}^n$ is given by
$$\langle x, y\rangle = \sum_{j=1}^n \bar{x}_j\, y_j, \qquad x, y \in \mathbb{C}^n. \tag{1.1}$$
Remark. Conditions c) and d) above state that $\langle\cdot,\cdot\rangle$ is linear in the second component. Using
the rules for inner products we get
$$\langle x_1 + x_2, y\rangle = \langle x_1, y\rangle + \langle x_2, y\rangle$$
and
$$\langle \lambda x, y\rangle = \bar{\lambda}\,\langle x, y\rangle \qquad \text{for all } \lambda \in \mathbb{C},\ x, y \in \mathbb{C}^n,$$
i.e. the inner product is anti-linear in the first component.
Definition 1.3. Two vectors $x, y$ are orthogonal with respect to the inner product $\langle\cdot,\cdot\rangle$ iff
$\langle x, y\rangle = 0$.
Lemma. For every inner product $\langle\cdot,\cdot\rangle$ on $\mathbb{C}^n$ the map $x \mapsto \|x\| = \sqrt{\langle x, x\rangle}$
is a vector norm.
Proof. a) Since $\langle\cdot,\cdot\rangle$ is an inner product we have $\langle x, x\rangle \geq 0$ for all $x \in \mathbb{C}^n$, i.e. $\sqrt{\langle x, x\rangle}$ is
defined without problems and positive. Also we get
$$\|x\| = 0 \iff \langle x, x\rangle = 0 \iff x = 0.$$
b) We have
$$\|\lambda x\| = \sqrt{\langle \lambda x, \lambda x\rangle} = \sqrt{\bar{\lambda}\lambda\,\langle x, x\rangle} = |\lambda|\, \|x\|.$$
c) Note that the proof of the Cauchy-Schwarz inequality
$$|\langle x, y\rangle| \leq \|x\|\,\|y\|, \qquad x, y \in \mathbb{C}^n,$$
only uses properties of the inner product. So we can use it here even before we know that $\|\cdot\|$
is a norm. We get
$$\|x + y\|^2 = \langle x + y, x + y\rangle
= \langle x, x\rangle + \langle x, y\rangle + \langle y, x\rangle + \langle y, y\rangle
\leq \|x\|^2 + 2\,|\langle x, y\rangle| + \|y\|^2
\leq \bigl(\|x\| + \|y\|\bigr)^2.$$
Definition. For a matrix $A \in \mathbb{C}^{m\times n}$ the adjoint $A^* \in \mathbb{C}^{n\times m}$ is given by $(A^*)_{ij} = \bar{a}_{ji}$.
(For real matrices $A \in \mathbb{R}^{m\times n}$ we get $A^* = A^T$.)
Using this definition we can write the standard inner product as
$$\langle x, y\rangle = x^* y.$$
Remarks. 1) Unless otherwise specified, $\langle\cdot,\cdot\rangle$ will denote the standard inner product (1.1).
The standard inner product satisfies
$$\langle Ax, y\rangle = \langle x, A^* y\rangle$$
for all $x, y \in \mathbb{C}^n$.
2) If $A \in \mathbb{C}^{n\times n}$ is Hermitian and positive-definite, then $\langle x, y\rangle_A = \langle x, Ay\rangle$ is an inner product and
$$\|x\|_A = \sqrt{\langle x, x\rangle_A}, \qquad x \in \mathbb{C}^n,$$
defines a norm on $\mathbb{C}^n$.
Remark. Conditions a), b) and c) state that $\|\cdot\|$ is a vector norm on the vector space $\mathbb{C}^{n\times n}$.
Condition d) only makes sense for matrices, since general vector spaces are not equipped with
a product.
Definition 1.8. Given a vector norm $\|\cdot\|_v$ on $\mathbb{C}^n$ we define the induced norm $\|\cdot\|_m$ on $\mathbb{C}^{n\times n}$
by
$$\|A\|_m = \max_{x \neq 0} \frac{\|Ax\|_v}{\|x\|_v}$$
for all $A \in \mathbb{C}^{n\times n}$.
Theorem 1.9. The induced norm $\|\cdot\|_m$ of a vector norm $\|\cdot\|_v$ is a matrix norm with
$$\|I\|_m = 1$$
and
$$\|Ax\|_v \leq \|A\|_m\, \|x\|_v$$
for all $A \in \mathbb{C}^{n\times n}$ and $x \in \mathbb{C}^n$.
Proof. a) $\|A\|_m \in \mathbb{R}$ and $\|A\|_m \geq 0$ for all $A \in \mathbb{C}^{n\times n}$ is obvious from the definition. Also from
the definition we get
$$\|A\|_m = 0 \iff \frac{\|Ax\|_v}{\|x\|_v} = 0 \ \forall x \neq 0 \iff \|Ax\|_v = 0 \ \forall x \neq 0 \iff Ax = 0 \ \forall x \neq 0 \iff A = 0.$$
b) For every $\lambda \in \mathbb{C}$ we have
$$\|\lambda A\|_m = \max_{x \neq 0} \frac{\|\lambda Ax\|_v}{\|x\|_v} = \max_{x \neq 0} \frac{|\lambda|\,\|Ax\|_v}{\|x\|_v} = |\lambda|\,\|A\|_m.$$
c) From the triangle inequality for $\|\cdot\|_v$ we get
$$\|A + B\|_m = \max_{x \neq 0} \frac{\|Ax + Bx\|_v}{\|x\|_v}
\leq \max_{x \neq 0} \frac{\|Ax\|_v + \|Bx\|_v}{\|x\|_v}
\leq \max_{x \neq 0} \frac{\|Ax\|_v}{\|x\|_v} + \max_{x \neq 0} \frac{\|Bx\|_v}{\|x\|_v}
= \|A\|_m + \|B\|_m.$$
Furthermore
$$\|I\|_m = \max_{x \neq 0} \frac{\|Ix\|_v}{\|x\|_v} = \max_{x \neq 0} \frac{\|x\|_v}{\|x\|_v} = 1$$
and
$$\|A\|_m = \max_{y \neq 0} \frac{\|Ay\|_v}{\|y\|_v} \geq \frac{\|Ax\|_v}{\|x\|_v} \qquad \forall x \in \mathbb{C}^n \setminus \{0\},$$
which gives
$$\|Ax\|_v \leq \|A\|_m\, \|x\|_v \qquad \forall x \in \mathbb{C}^n.$$
d) Using this estimate we find
$$\|AB\|_m = \max_{x \neq 0} \frac{\|ABx\|_v}{\|x\|_v} \leq \max_{x \neq 0} \frac{\|A\|_m\,\|Bx\|_v}{\|x\|_v} = \|A\|_m\, \|B\|_m.$$
Usually one denotes the induced matrix norm with the same symbol as the corresponding
vector norm. For the remaining part of this text we will follow this convention.
Definition 1.10. A number $\lambda \in \mathbb{C}$ is an eigenvalue of $A \in \mathbb{C}^{n\times n}$ if there is a vector $x \in \mathbb{C}^n$ with
$$Ax = \lambda x \quad\text{and}\quad x \neq 0. \tag{1.3}$$
The spectral radius of $A$ is
$$\rho(A) = \max\bigl\{\,|\lambda| : \lambda \text{ is an eigenvalue of } A\,\bigr\}.$$
Theorem 1.11. For any matrix norm $\|\cdot\|$, any matrix $A \in \mathbb{C}^{n\times n}$ and any $\ell \in \mathbb{N}$ we have
$$\rho(A)^\ell \leq \|A^\ell\| \leq \|A\|^\ell.$$
Proof. By definition of the spectral radius $\rho(A)$ we can find an eigenvector $x$ with $Ax = \lambda x$ and
$\rho(A) = |\lambda|$. Let $X \in \mathbb{C}^{n\times n}$ be the matrix where all $n$ columns are equal to $x$. Then we have
$A^\ell X = \lambda^\ell X$ and thus
$$\rho(A)^\ell\, \|X\| = \|\lambda^\ell X\| = \|A^\ell X\| \leq \|A^\ell\|\,\|X\|.$$
Dividing by $\|X\|$ gives $\rho(A)^\ell \leq \|A^\ell\|$. The second inequality follows from property d) in the
definition of a matrix norm.
Write $x = \sum_{j=1}^n \alpha_j x_j$ in an orthonormal basis $x_1, \dots, x_n$ of eigenvectors of $A$ with eigenvalues
$\lambda_1, \dots, \lambda_n$, ordered such that $|\lambda_1| = \rho(A)$, and get
$$\|x\|_2^2 = \sum_{j=1}^n |\alpha_j|^2.$$
Similarly we find
$$Ax = \sum_{j=1}^n \lambda_j \alpha_j x_j \qquad\text{and}\qquad \|Ax\|_2^2 = \sum_{j=1}^n |\lambda_j \alpha_j|^2.$$
This shows
$$\frac{\|Ax\|_2}{\|x\|_2}
= \frac{\bigl(\sum_{j=1}^n |\lambda_j \alpha_j|^2\bigr)^{1/2}}{\bigl(\sum_{j=1}^n |\alpha_j|^2\bigr)^{1/2}}
\leq \Bigl(\frac{\sum_{j=1}^n |\alpha_j|^2\, |\lambda_1|^2}{\sum_{j=1}^n |\alpha_j|^2}\Bigr)^{1/2}
= |\lambda_1| = \rho(A) \qquad \forall x \in \mathbb{C}^n.$$
Similar methods to those used in the proof of the previous result yield the following theorem.
With the previous theorems we have among other things identified the matrix norm which
is induced by the 2-vector norm. The following theorem explicitly identifies the matrix norm
associated to the infinity-norm.
Theorem 1.16. The matrix norm induced by the infinity norm is the maximum row sum norm:
$$\|A\|_\infty = \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$
Proof. For every $x \in \mathbb{C}^n$ we have
$$\|Ax\|_\infty = \max_{1 \leq i \leq n} |(Ax)_i| = \max_{1 \leq i \leq n} \Bigl|\sum_{j=1}^n a_{ij} x_j\Bigr|
\leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|\; \|x\|_\infty,$$
which gives
$$\frac{\|Ax\|_\infty}{\|x\|_\infty} \leq \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$
For the reverse inequality choose $k$ with
$$\max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}| = \sum_{j=1}^n |a_{kj}|$$
and define $x \in \mathbb{C}^n$ by $x_j = \bar{a}_{kj}/|a_{kj}|$ for all $j = 1, \dots, n$ (with $x_j = 1$ whenever $a_{kj} = 0$). Then
we have $\|x\|_\infty = 1$ and
$$\|A\|_\infty \geq \frac{\|Ax\|_\infty}{\|x\|_\infty}
\geq \Bigl|\sum_{j=1}^n a_{kj}\, \frac{\bar{a}_{kj}}{|a_{kj}|}\Bigr|
= \sum_{j=1}^n |a_{kj}|
= \max_{1 \leq i \leq n} \sum_{j=1}^n |a_{ij}|.$$
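For illustration, the maximum row sum norm is trivial to compute; the following Python sketch (the helper names are our own) also checks the defining inequality $\|Ax\|_\infty \leq \|A\|_\infty \|x\|_\infty$ from theorem 1.9 on a small example:

```python
# The matrix norm induced by the infinity vector norm is the maximum
# row sum (theorem 1.16); matrices are stored as lists of rows.

def norm_inf(A):
    """Maximum row sum norm of a matrix."""
    return max(sum(abs(a) for a in row) for row in A)

def norm_inf_vec(x):
    """Infinity norm of a vector."""
    return max(abs(xj) for xj in x)

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[1.0, -2.0], [3.0, 4.0]]
x = [1.0, -1.0]
print(norm_inf(A))                        # row sums are 3 and 7
# the compatibility inequality from theorem 1.9:
assert norm_inf_vec(matvec(A, x)) <= norm_inf(A) * norm_inf_vec(x)
```

The same one-line pattern, with rows and columns exchanged, gives the maximum column sum norm of exercise 1 below.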
Exercises
1) Show that the matrix norm induced by the $\|\cdot\|_1$-norm on $\mathbb{C}^n$ is the maximum column sum
norm
$$\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^n |a_{ij}|.$$
and $\|A\|_{\max}\, \|B\|_{\max}\, \|C\|_1$.
Chapter 2
Complexity of Algorithms
In this chapter we learn how to analyse how long it takes to solve a numerical problem on a
computer. Specifically we are interested in the question how the cost to perform an algorithm
depends on the input size.
2.1 Computational Cost
The computational cost of an algorithm is the amount of resources it takes to perform this algorithm
on a computer. For simplicity here we just count the number of floating point operations
(additions, subtractions, multiplications, divisions) performed during one run of the algorithm.
A more detailed analysis would also take factors like memory usage into account.
We describe the cost as a function $C(n)$, where $n$ is the size of the input data (e.g. the number of equations etc.).
The following definition provides the notation we will use to describe the asymptotic computational
cost of an algorithm, that is the behaviour of the cost $C(n)$ for $n \to \infty$.
Definition. Let $f, g\colon \mathbb{N} \to [0, \infty)$. We write $f(n) = O(g(n))$ if there is a constant $c > 0$ with
$f(n) \leq c\, g(n)$ for all sufficiently large $n$, and $f(n) = \Omega(g(n))$ if there is a constant $c > 0$ with
$f(n) \geq c\, g(n)$ for all sufficiently large $n$. If both hold we write $f(n) = \Theta(g(n))$.
Example. Using this notation we can write $5n^2 + 2n - 3 = \Theta(n^2)$, $n^2 = O(n^3)$ and $n^2 = \Omega(n)$.
Theorem 2.3. The standard inner-product algorithm on $\mathbb{C}^n$ has computational cost $C(n) = \Theta(n)$. Any algorithm for the inner product has $C(n) = \Omega(n)$.
Proof. The standard inner-product algorithm above has $n$ multiplications and $n$ additions, i.e.
$C(n) = 2n = \Theta(n)$.
Sketch of the proof of $C(n) = \Omega(n)$: since each of the products $x_i y_i$ is independent of
the others, we have to calculate all $n$ of them.
Remark. Giving a real proof for the lower bound in the above theorem would require a detailed
model of what an algorithm actually is. One would for example need to be able to prove
that guessing the result in just one operation and returning it is not a proper algorithm. We
avoid these difficulties here by only giving sketches for lower bounds.
Theorem 2.4. The standard method for $\mathbb{C}^{n\times n}$ matrix-matrix multiplication satisfies $C(n) = \Theta(n^3)$. Any method has $C(n) = \Omega(n^2)$.
2.2 Analysis of Matrix-Matrix Multiplication
In theorem 2.4 there is a gap between the order $\Theta(n^3)$ of the standard method for multiplying
matrices and the lower bound $\Omega(n^2)$. The purpose of this section is to show that there are
actually algorithms with an asymptotic order which is better than $\Theta(n^3)$.
For $A, B \in \mathbb{C}^{n\times n}$, $n$ even, and $D = AB$ write
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad
B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}, \quad
D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix}$$
with $(n/2)\times(n/2)$-blocks, so that
$$D_{11} = A_{11}B_{11} + A_{12}B_{21}, \quad D_{12} = A_{11}B_{12} + A_{12}B_{22}, \quad
D_{21} = A_{21}B_{11} + A_{22}B_{21}, \quad D_{22} = A_{21}B_{12} + A_{22}B_{22}.$$
The above method calculates the product of two $n\times n$-matrices using eight multiplications
of $(n/2)\times(n/2)$-matrices. There is another way to calculate the entries of the matrix $D$, which
looks more complicated at first but only uses seven multiplications of $(n/2)\times(n/2)$-matrices. It
will transpire that this fact can be utilised to get an asymptotically faster method of multiplying
matrices. Using
$$\begin{aligned}
P_1 &= (A_{11} + A_{22})(B_{11} + B_{22}), &
P_2 &= (A_{21} + A_{22})B_{11}, &
P_3 &= A_{11}(B_{12} - B_{22}), \\
P_4 &= A_{22}(B_{21} - B_{11}), &
P_5 &= (A_{11} + A_{12})B_{22}, &
P_6 &= (A_{21} - A_{11})(B_{11} + B_{12}), \\
P_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{aligned}$$
we can write
$$D_{11} = P_1 + P_4 - P_5 + P_7, \qquad
D_{12} = P_3 + P_5, \qquad
D_{21} = P_2 + P_4, \qquad
D_{22} = P_1 + P_3 - P_2 + P_6.$$
1: if n = 1 then
2: return AB
3: else
4: calculate P1 , . . . , P7 (using recursion)
5: calculate D11 , D12 , D21 and D22
6: return D
7: end if
Remark. Recursive algorithms of this kind are called divide and conquer algorithms.
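The recursion can be sketched in Python as follows (for $n = 2^k$ only; we use the standard Strassen products for $P_1, \dots, P_7$, which is one common choice of the seven multiplications and matches the combination formulas for $D_{11}, \dots, D_{22}$ above):

```python
# Strassen multiplication for n = 2^k; matrices as lists of rows.

def madd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def msub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    n = len(A)
    if n == 1:                          # base case of the recursion
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    split = lambda M: ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                       [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # seven recursive multiplications instead of eight
    P1 = strassen(madd(A11, A22), madd(B11, B22))
    P2 = strassen(madd(A21, A22), B11)
    P3 = strassen(A11, msub(B12, B22))
    P4 = strassen(A22, msub(B21, B11))
    P5 = strassen(madd(A11, A12), B22)
    P6 = strassen(msub(A21, A11), madd(B11, B12))
    P7 = strassen(msub(A12, A22), madd(B21, B22))
    D11 = madd(msub(madd(P1, P4), P5), P7)
    D12 = madd(P3, P5)
    D21 = madd(P2, P4)
    D22 = madd(msub(madd(P1, P3), P2), P6)
    top = [r1 + r2 for r1, r2 in zip(D11, D12)]
    bottom = [r1 + r2 for r1, r2 in zip(D21, D22)]
    return top + bottom

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]
```

In practice one stops the recursion at some moderate block size and switches to the standard method, since the constant hidden in the recursion overhead is large.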
Using the Strassen multiplication we can calculate $D$ with 7 multiplications of $\frac{n}{2}\times\frac{n}{2}$-matrices
and 18 additions of $\frac{n}{2}\times\frac{n}{2}$-matrices. Thus we find
$$C(n) = 7\, C(n/2) + 18\, (n/2)^2.$$
Lemma 2.5. The Strassen multiplication has computational cost $C(2^k) = 7 \cdot 7^k - 6 \cdot 4^k$ for all
$k \in \mathbb{N}_0$.
Proof. For $k = 0$ we get $C(2^0) = C(1) = 1 = 7 - 6$.
Assume the claim is true for $k \in \mathbb{N}_0$. Then
$$C(2^{k+1}) = 7\, C(2^k) + 18\, (2^k)^2 = 7\,(7 \cdot 7^k - 6 \cdot 4^k) + 18 \cdot 4^k = 7 \cdot 7^{k+1} - 6 \cdot 4^{k+1}.$$
Theorem 2.6. The Strassen algorithm for matrix-matrix multiplication has asymptotic computational
cost $C(n) = \Theta(n^{\log_2 7})$.
Remark. We will prove the theorem for $n = 2^k$, $k \in \mathbb{N}_0$. If $n$ is not of this form we can extend
the matrices: choose $k \in \mathbb{N}_0$ with $2^k \geq n > 2^{k-1}$ and define $\tilde{A}, \tilde{B} \in \mathbb{C}^{2^k \times 2^k}$ by
$$\tilde{A} = \begin{pmatrix} A & 0_{12} \\ 0_{21} & 0_{22} \end{pmatrix}, \qquad
\tilde{B} = \begin{pmatrix} B & 0_{12} \\ 0_{21} & 0_{22} \end{pmatrix}.$$
Thus we can find the product of the $n\times n$-matrices $A$ and $B$ by multiplying the $2^k \times 2^k$-matrices
$\tilde{A}$ and $\tilde{B}$ with the Strassen algorithm. Since we have $n \leq 2^k \leq 2n$, the extended matrices are at
most double the size of the original ones, and because $(2n)^\alpha = \Theta(n^\alpha)$ for every $\alpha > 0$ the result
for $n = 2^k$ implies $C(n) = \Theta(n^{\log_2 7})$ for every $n \in \mathbb{N}$.
Proof (of the theorem). Let $n = 2^k$. Then we have
$$7^k = 2^{\log_2(7^k)} = 2^{k \log_2 7} = (2^k)^{\log_2 7} = n^{\log_2 7}$$
and
$$4^k = (2^k)^2 = n^2.$$
Using the lemma we get
$$C(n) = 7 \cdot 7^k - 6 \cdot 4^k = 7\, n^{\log_2 7} - 6\, n^2 = \Theta(n^{\log_2 7}).$$
1) If $n$ is not a power of two we can again extend the matrix: for $\tilde{A} = \begin{pmatrix} A & 0_{12} \\ 0_{21} & I_{22} \end{pmatrix}$ we have
$$\tilde{A}^{-1} = \begin{pmatrix} A^{-1} & 0_{12} \\ 0_{21} & I_{22} \end{pmatrix},$$
so $A^{-1}$ can be read off from the top-left block.
2) We have
$$\langle x, A^* A x\rangle = \langle Ax, Ax\rangle = \|Ax\|_2^2 > 0$$
for every $x \neq 0$ and thus $A^* A$ is positive definite and therefore invertible. We can write
$$A^{-1} = (A^* A)^{-1} A^*.$$
This allows us to invert $A$ with cost $C(n) = D(n) + O(n^\alpha)$ where $D$ is the cost of inverting a
Hermitian, positive definite matrix and $O(n^\alpha)$ is the cost for matrix-matrix multiplication. So
we can restrict ourselves to the case of Hermitian, positive definite matrices.
3) To determine the cost function $D$ let $B$ be Hermitian and positive definite:
$$B = \begin{pmatrix} B_{11} & B_{12} \\ B_{12}^* & B_{22} \end{pmatrix}$$
where the $B_{jk}$ are $\frac{n}{2}\times\frac{n}{2}$-matrices. Let $S = B_{22} - B_{12}^* B_{11}^{-1} B_{12}$. A direct calculation shows that
$$B^{-1} = \begin{pmatrix}
B_{11}^{-1} + B_{11}^{-1} B_{12}\, S^{-1} B_{12}^*\, B_{11}^{-1} & -B_{11}^{-1} B_{12}\, S^{-1} \\
-S^{-1} B_{12}^*\, B_{11}^{-1} & S^{-1}
\end{pmatrix}.$$
Thus inverting $B$ requires the inversion of the two Hermitian, positive definite $\frac{n}{2}\times\frac{n}{2}$-matrices
$B_{11}$ and $S$, together with a fixed number of matrix-matrix multiplications and additions:
$$D(n) \leq 2\, D(n/2) + O(n^\alpha) + O(n^2),$$
where $O(n^\alpha)$ is the cost for the multiplications and $O(n^2)$ is the cost for the additions and
subtractions.
From theorem 2.4 we already know $\alpha \geq 2$, so we can simplify the above estimate to
$$D(n) \leq 2\, D(n/2) + c\, n^\alpha$$
for some constant $c > 0$. With an induction argument (see exercise 4 below) one can conclude
$$D(2^k) \leq \frac{c}{1 - 2^{1-\alpha}}\, (2^k)^\alpha$$
and thus we get
$$D(2^k) = O\bigl((2^k)^\alpha\bigr).$$
Exercises
1) Let (i) $f(n) = n^2\,[1 + \sin(n)]$ and (ii) $f(n) = n + n^2$. In each case, which of the following
are true:
$f(n) = O(1)$; $f(n) = O(n)$; $f(n) = O(n^2)$;
$f(n) = \Omega(1)$; $f(n) = \Omega(n)$; $f(n) = \Omega(n^2)$.
2) Show that the standard method for matrix-vector multiplication has asymptotic computational
cost $C(n) = \Theta(n^2)$.
3) Show that the matrices $S$ and $B_{11}$ in the proof of theorem 2.7 are invertible.
4) Show by induction that $D(1) = 1$ and
$$D(n) \leq 2\, D(n/2) + c\, n^\alpha$$
imply
$$D(2^k) \leq \frac{c}{1 - 2^{1-\alpha}}\, (2^k)^\alpha$$
for all $k \in \mathbb{N}_0$.
Chapter 3
Conditioning and Stability
3.1 Conditioning
Definition 3.1. A problem is called well conditioned if small changes in the problem only lead
to small changes in the solution, and badly conditioned if small changes in the problem can lead
to large changes in the solution.
For this chapter fix a vector norm $\|\cdot\|$ and a matrix norm $\|\cdot\|$ which is compatible with the
vector norm, that is, which satisfies
$$\|Ax\| \leq \|A\|\,\|x\| \qquad \text{for all } A \in \mathbb{C}^{n\times n},\ x \in \mathbb{C}^n.$$
This condition is for example satisfied when the matrix norm is induced by the vector norm.
For an invertible matrix $A \in \mathbb{C}^{n\times n}$ the condition number is $\kappa(A) = \|A\|\,\|A^{-1}\|$.
Remark. We always have $\|I\| = \|A A^{-1}\| \leq \|A\|\,\|A^{-1}\| = \kappa(A)$. For induced matrix norms this
implies $\kappa(A) \geq 1$ for every $A \in \mathbb{C}^{n\times n}$.
Example. Let $A \in \mathbb{C}^{n\times n}$ be Hermitian and invertible with eigenvalues $\lambda_1, \dots, \lambda_n$, and set
$\lambda_{\max} = \max_j |\lambda_j|$ and $\lambda_{\min} = \min_j |\lambda_j|$.
Then we have $\|A\|_2 = \lambda_{\max}$. Since the matrix $A^{-1}$ has eigenvalues $1/\lambda_1, \dots, 1/\lambda_n$ we find
$\|A^{-1}\|_2 = 1/\lambda_{\min}$ and thus the condition number of $A$ in the 2-norm is
$$\kappa(A) = \frac{\lambda_{\max}}{\lambda_{\min}}.$$
Lemma 3.3. Let $Ax = b$ and $A(x + \Delta x) = b + \Delta b$ with $b \neq 0$. Then
$$\frac{\|\Delta x\|}{\|x\|} \leq \kappa(A)\, \frac{\|\Delta b\|}{\|b\|}.$$
Proof. We have $x = A^{-1}b$ and
$$A^{-1}\Delta b = A^{-1}(b + \Delta b) - A^{-1}b = x + \Delta x - x = \Delta x.$$
Therefore we get
$$\|\Delta x\| \leq \|A^{-1}\|\,\|\Delta b\| \qquad\text{and}\qquad \|b\| = \|Ax\| \leq \|A\|\,\|x\|,$$
and combining the two estimates gives the claim.
The previous lemma gave an upper bound on how much the solution of the equation $Ax = b$
can change if the right hand side is slightly perturbed. The result shows that the problem is well
conditioned if the condition number $\kappa(A)$ is small. Theorem 3.5 below gives a similar result for
perturbation of the matrix $A$ instead of the vector $b$. For the proof we will need the following
lemma.
Lemma 3.4. If $A \in \mathbb{C}^{n\times n}$ satisfies $\|A\| < 1$ in any induced matrix norm, then $I + A$ is invertible
and
$$\|(I + A)^{-1}\| \leq (1 - \|A\|)^{-1}.$$
Proof. With the triangle inequality we get
$$\|(I + A)x\| \geq \|x\| - \|Ax\| \geq \|x\| - \|A\|\,\|x\|$$
and thus
$$\|(I + A)x\| \geq \bigl(1 - \|A\|\bigr)\,\|x\|$$
for every $x \in \mathbb{C}^n$. This implies $(I + A)x \neq 0$ for every $x \neq 0$, and thus the matrix $I + A$ is
invertible.
Now let $b \neq 0$ and $x = (I + A)^{-1}b$. Then
$$\|(I + A)^{-1}\| = \sup_{b \neq 0} \frac{\|(I + A)^{-1}b\|}{\|b\|} = \sup_{x \neq 0} \frac{\|x\|}{\|(I + A)x\|} \leq \frac{1}{1 - \|A\|}.$$
Theorem 3.5. Let $Ax = b$ and $(A + \Delta A)(x + \Delta x) = b$.
Assume that $A$ is invertible with $\|A^{-1}\|\,\|\Delta A\| < 1$ in some induced matrix norm. Then we have
$$\frac{\|\Delta x\|}{\|x\|} \leq \frac{\kappa(A)\, \frac{\|\Delta A\|}{\|A\|}}{1 - \kappa(A)\, \frac{\|\Delta A\|}{\|A\|}}.$$
Proof. We find
$$(A + \Delta A)\Delta x = b - (A + \Delta A)x = -\Delta A\, x$$
and thus $(I + A^{-1}\Delta A)\Delta x = -A^{-1}\Delta A\, x$. Using lemma 3.4 we can write
$$\Delta x = -(I + A^{-1}\Delta A)^{-1} A^{-1}\Delta A\, x$$
and we get
$$\|\Delta x\| \leq \|(I + A^{-1}\Delta A)^{-1}\|\,\|A^{-1}\Delta A\|\,\|x\| \leq \frac{\|A^{-1}\Delta A\|}{1 - \|A^{-1}\Delta A\|}\,\|x\|.$$
Since
$$\|A^{-1}\Delta A\| \leq \|A^{-1}\|\,\|\Delta A\| = \kappa(A)\,\frac{\|\Delta A\|}{\|A\|}$$
and since the map $x \mapsto x/(1 - x)$ is increasing on the interval $[0, 1)$ we get the claim.
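A small numerical experiment illustrates the role of the condition number (the $2\times 2$ matrix here is a hypothetical example of ours, inverted by the explicit cofactor formula):

```python
# For a badly conditioned matrix, a tiny perturbation of b moves the
# solution of Ax = b a long way, as predicted by lemma 3.3.

def norm_inf(A):
    return max(sum(abs(a) for a in row) for row in A)

def inv2(A):
    """Inverse of a 2x2 matrix via the cofactor formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[1.0, 1.0], [1.0, 1.0001]]           # nearly singular
kappa = norm_inf(A) * norm_inf(inv2(A))   # condition number, infinity norm
x  = matvec(inv2(A), [2.0, 2.0])          # solution for b = (2, 2)
xp = matvec(inv2(A), [2.0, 2.0001])       # solution for b + (0, 1e-4)
print(kappa)        # about 4 * 10^4
print(x, xp)        # x is close to (2, 0) while xp is close to (1, 1)
```

A relative change of order $10^{-5}$ in $b$ produces a relative change of order $1$ in $x$, consistent with the factor $\kappa(A) \approx 4 \cdot 10^4$ in lemma 3.3.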
3.2 Stability
Stability of an algorithm measures how susceptible it is to rounding errors occurring during the
computation. In this section we will for the first time distinguish between the computed result as
returned by a computer and the exact result, which would be the mathematically correct solution
of the problem.
Definition 3.6. Assume we want to numerically calculate a value $y = f(x)$, but the algorithm
returns the computed result $\tilde{y} \neq y$, which can be represented as the exact image $\tilde{y} = f(\tilde{x})$ of a
different input value $\tilde{x}$. Then $\Delta y = \tilde{y} - y$ is called the forward error and $\Delta x = \tilde{x} - x$ is called
the backward error.
[Figure: the input data $x$ and $\tilde{x}$ are mapped by $f$ to the exact result $y = f(x)$ and the calculated result $\tilde{y} = f(\tilde{x})$.]
If $\tilde{x}$ is not unique then we choose the one which results in minimal $\|\Delta x\|$. Typically we
consider the relative backward error $\|\Delta x\|/\|x\|$ and the relative forward error $\|\Delta y\|/\|y\|$.
Internally computers represent real numbers using only a finite number of bits. Thus they
can only represent finitely many numbers and when dealing with general real numbers rounding
errors will occur. Let $F \subseteq \mathbb{R}$ be the set of representable numbers and let $\mathrm{fl}\colon \mathbb{R} \to F$ be the
rounding to the closest element of $F$.
In this course we will use a simplified model for computer arithmetic which is described
by the following two assumptions. The main simplification is that we ignore the problems of
numbers which are unrepresentable because they are very large (overflows) or very close to zero
(underflows).
Assumptions. There is a parameter $\varepsilon_m > 0$ (the machine epsilon) such that the following
conditions hold.
(A1) For every $x \in \mathbb{R}$ there is an $\varepsilon$ with $|\varepsilon| \leq \varepsilon_m$ and
$$\mathrm{fl}(x) = x\,(1 + \varepsilon).$$
(A2) For every operation $* \in \{+, -, \cdot, /\}$ and all $x, y \in F$ the computed version $\circledast$ of $*$ satisfies
$$x \circledast y = (x * y)\,(1 + \varepsilon)$$
for some $\varepsilon$ with $|\varepsilon| \leq \varepsilon_m$.
Definition 3.8. An algorithm is called backward stable if the relative backward error satisfies
$$\frac{\|\Delta x\|}{\|x\|} = O(\varepsilon_m).$$
The typical way to use this concept is in a two-step procedure called backward error analysis.
In a first step one shows that the algorithm in question is backward stable, i.e. that the influence
of rounding errors can be represented as a small perturbation $\Delta x$ of the original problem. In
the second step one uses results like theorem 3.7 about the conditioning of the problem (which
does not depend on the algorithm used) to show that the forward error is also small. Together
these steps show that the calculated result is close to the exact result.
Example. The computed subtraction $\ominus$ is backward stable: using (A1) and (A2) we get
$$\begin{aligned}
\mathrm{fl}(x_1) \ominus \mathrm{fl}(x_2)
&= \bigl(\mathrm{fl}(x_1) - \mathrm{fl}(x_2)\bigr)(1 + \varepsilon_3) \\
&= \bigl(x_1(1 + \varepsilon_1) - x_2(1 + \varepsilon_2)\bigr)(1 + \varepsilon_3) \\
&= x_1(1 + \varepsilon_1)(1 + \varepsilon_3) - x_2(1 + \varepsilon_2)(1 + \varepsilon_3) \\
&= x_1(1 + \varepsilon_4) - x_2(1 + \varepsilon_5)
\end{aligned}$$
with $|\varepsilon_4|, |\varepsilon_5| \leq 2\varepsilon_m + \varepsilon_m^2 = O(\varepsilon_m)$.
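Python's floating point numbers follow IEEE 754 double precision, for which the machine epsilon in the sense of (A1) is $2^{-53}$; a short experiment illustrates the model:

```python
# Rounding errors in double precision arithmetic.

eps_m = 2.0 ** -53            # machine epsilon for IEEE 754 doubles

# 0.1, 0.2 and 0.3 are not exactly representable in binary, and each
# arithmetic operation rounds its result as in (A2):
print(0.1 + 0.2 == 0.3)       # False

# but the error is small *relative* to the result, as (A1)/(A2) promise:
rel_err = abs((0.1 + 0.2) - 0.3) / 0.3
print(rel_err < 4 * eps_m)    # True
```

The absolute error here is about $5.5 \cdot 10^{-17}$, i.e. a few units in the last place, exactly the behaviour the two assumptions describe.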
Exercises
1) Choose a matrix norm and calculate the condition number of the matrix
$$A = \begin{pmatrix} 1 & \\ 1 & 1 \end{pmatrix}$$
in this norm.
2) Let $x$ be a solution of $Ax = b$ and let $\tilde{x}$ be a solution of $(A + \Delta A)\tilde{x} = b + \Delta b$. Show that
$$\frac{\|\tilde{x} - x\|}{\|x\|} \leq \frac{\kappa(A)}{1 - \kappa(A)\frac{\|\Delta A\|}{\|A\|}}\,\Bigl(\frac{\|\Delta A\|}{\|A\|} + \frac{\|\Delta b\|}{\|b\|}\Bigr).$$
3) Show that the computed arithmetic operations $\oplus$, $\ominus$ and $\odot$ are backward stable.
Chapter 4
Gaussian elimination is the most commonly known method to solve systems of linear equations.
The mathematical background of the algorithm is the following theorem.
Definition 4.1. A matrix $A \in \mathbb{C}^{m\times n}$ is said to be upper triangular if $a_{ij} = 0$ for all $i > j$, and
lower triangular if $a_{ij} = 0$ for all $i < j$. A triangular matrix is said to be unit triangular if all
diagonal entries are equal to 1.
Definition 4.2. The $j$-th principal sub-matrix of a matrix $A \in \mathbb{C}^{n\times n}$ is the matrix $A_j \in \mathbb{C}^{j\times j}$
with $(A_j)_{kl} = a_{kl}$ for $1 \leq k, l \leq j$.
Theorem 4.3 (LU Factorisation). a) Let $A \in \mathbb{C}^{n\times n}$ be a matrix such that $A_j$ is invertible for
$j = 1, \dots, n$. Then there is a unique factorisation $A = LU$ where $L$ is unit lower triangular and
$U$ is non-singular upper triangular. b) If $A_j$ is singular for one $j \in \{1, \dots, n\}$ then there is no
such factorisation.
The following picture gives a graphical representation of the LU-factorisation.
Proof. a) We use induction on $n$ and write
$$A = \begin{pmatrix} A_{n-1} & b \\ c^* & a_{nn} \end{pmatrix} \tag{4.1}$$
where $A_{n-1}$ is the $(n-1)$-th principal sub-matrix of $A$, and $b, c \in \mathbb{C}^{n-1}$ and $a_{nn} \in \mathbb{C}$ are the
remaining blocks. We are looking for a factorisation of the form
$$A = \begin{pmatrix} L & 0 \\ \ell^* & 1 \end{pmatrix} \begin{pmatrix} U & u \\ 0 & \gamma \end{pmatrix}
= \begin{pmatrix} LU & Lu \\ \ell^* U & \ell^* u + \gamma \end{pmatrix}. \tag{4.2}$$
By the induction hypothesis $L$ and $U$ with $A_{n-1} = LU$ exist and are unique. Since the matrix
$L$ is invertible the condition $Lu = b$ determines a unique vector $u$. Since $U$ is invertible there is
a uniquely determined $\ell$ with $U^* \ell = c$ and thus $\ell^* U = c^*$. Finally the condition $\ell^* u + \gamma = a_{nn}$
uniquely determines $\gamma \in \mathbb{C}$. This shows that the required factorisation for $A$ exists and is unique.
Since $0 \neq \det(A) = 1 \cdot \det U$ the upper triangular matrix $U$ is non-singular.
b) Assume that $A$ has an LU-factorisation and let $j \in \{1, \dots, n\}$. Then we can write $A = LU$
in block form as
$$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
= \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix}
\begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}
= \begin{pmatrix} L_{11}U_{11} & L_{11}U_{12} \\ L_{21}U_{11} & L_{21}U_{12} + L_{22}U_{22} \end{pmatrix}$$
with $A_{11} = A_j$. Since $L_{11}$ is unit lower triangular and $U_{11}$ is upper triangular with non-vanishing
diagonal entries, $A_j = L_{11}U_{11}$ is invertible for every $j$.
Example. The matrix $A$ can be converted into upper triangular shape by multiplying lower
triangular matrices from the left. Let for example
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}.$$
Then we can create zeros in the first column below the diagonal by subtracting multiples of the
first row from the other rows. In matrix notation this can be written as
$$L_1 A = \begin{pmatrix} 1 & & \\ -2 & 1 & \\ -4 & & 1 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}
= \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 5 \end{pmatrix}.$$
Repeating this for the second column gives
$$L_2 L_1 A = \begin{pmatrix} 1 & & \\ & 1 & \\ & -3 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 3 & 5 \end{pmatrix}
= \begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 2 \end{pmatrix}.$$
We have found lower triangular matrices $L_1$ and $L_2$ and an upper triangular matrix $R$ with
$A = (L_2 L_1)^{-1} R$. The following lemma helps to calculate $(L_2 L_1)^{-1} = L_1^{-1} L_2^{-1}$.
Lemma 4.4. a) Let $L = (\ell_{ij})$ be unit lower triangular with non-zero entries below the diagonal
only in column $k$. Then $L^{-1}$ is also unit lower triangular with non-zero entries below the diagonal
only in column $k$, and we have $(L^{-1})_{ik} = -\ell_{ik}$ for all $i > k$.
b) Let $A = (a_{ij})$ and $B = (b_{ij})$ be unit lower triangular $n\times n$ matrices where $A$ has non-zero
entries below the diagonal only in columns $1, \dots, k$ and $B$ has non-zero entries below the
diagonal only in columns $k+1, \dots, n$. Then $AB$ is unit lower triangular with $(AB)_{ij} = a_{ij}$ for
$j \in \{1, \dots, k\}$ and $(AB)_{ij} = b_{ij}$ for $j \in \{k+1, \dots, n\}$.
Proof. a) Multiplying $L$ with the suggested inverse gives the identity. b) Direct calculation.
Example. For the matrices $L_1$ and $L_2$ from the previous example we get
$$(L_2 L_1)^{-1} = L_1^{-1} L_2^{-1}
= \begin{pmatrix} 1 & & \\ 2 & 1 & \\ 4 & & 1 \end{pmatrix}
\begin{pmatrix} 1 & & \\ & 1 & \\ & 3 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & & \\ 2 & 1 & \\ 4 & 3 & 1 \end{pmatrix}.$$
Thus we found the LU-factorisation
$$\begin{pmatrix} 2 & 1 & 1 \\ 4 & 3 & 3 \\ 8 & 7 & 9 \end{pmatrix}
= \begin{pmatrix} 1 & & \\ 2 & 1 & \\ 4 & 3 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 1 & 1 \\ & 1 & 1 \\ & & 2 \end{pmatrix}.$$
The technique to convert A into an upper triangular matrix by multiplying lower triangular
matrices leads to the following algorithm:
Algorithm LU (LU-factorisation).
input: $A \in \mathbb{C}^{n\times n}$ with $\det(A_j) \neq 0$ for $j = 1, \dots, n$
output: $L, U \in \mathbb{C}^{n\times n}$ where $A = LU$ is the LU-factorisation of $A$
1: U = A, L = I
2: for k = 1, ..., n-1 do
3:   for j = k+1, ..., n do
4:     l_{jk} = u_{jk} / u_{kk}
5:     (u_{j,k}, ..., u_{j,n}) = (u_{j,k}, ..., u_{j,n}) - l_{j,k} (u_{k,k}, ..., u_{k,n})
6:   end for
7: end for
Remarks. Line 5 of the algorithm subtracts a multiple of row $k$ from row $j$, causing $u_{jk} = 0$
without changing columns $1, \dots, k-1$. This corresponds to multiplication with a lower triangular
matrix $L_k$ as in the example above. Thus after the loop ending in line 6 is finished, the current
value of the matrix $U$ is $L_k \cdots L_1 A$ and it has zeros below the diagonal in columns $1, \dots, k$.
Since the principal sub-matrices $A_j$ are non-singular and the matrices $L_j$ are unit triangular
we get
$$\det\bigl((L_k \cdots L_1 A)_{k+1}\bigr) = \det A_{k+1} \neq 0$$
and thus we have $u_{kk} \neq 0$ in line 4. Lemma 4.4 shows that the algorithm calculates the correct
entry $l_{jk}$ for the matrix $L = (L_{n-1} \cdots L_1)^{-1}$.
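Algorithm LU transcribes almost line by line into Python (a sketch without any pivoting, so it is subject to the stability problem discussed below; matrices are stored as lists of rows):

```python
# LU-factorisation without pivoting; requires all principal
# sub-matrices of A to be invertible.

def lu(A):
    n = len(A)
    U = [row[:] for row in A]                                       # U = A
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # L = I
    for k in range(n - 1):
        for j in range(k + 1, n):
            L[j][k] = U[j][k] / U[k][k]
            for c in range(k, n):            # subtract a multiple of row k
                U[j][c] -= L[j][k] * U[k][c]
    return L, U

# the worked example from the text:
L, U = lu([[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]])
print(L)   # [[1, 0, 0], [2, 1, 0], [4, 3, 1]]
print(U)   # [[2, 1, 1], [0, 1, 1], [0, 0, 2]]
```

Running it on the example matrix reproduces the factors computed by hand above.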
The last missing building block for the Gaussian elimination method is the following algorithm
to solve systems of linear equations when the coefficient matrix is triangular.
Algorithm (back substitution).
input: $U \in \mathbb{C}^{n\times n}$ upper triangular and invertible, $b \in \mathbb{C}^n$
output: $x \in \mathbb{C}^n$ with $Ux = b$
1: for j = n, ..., 1 do
2:   x_j = (b_j - \sum_{k=j+1}^n u_{jk} x_k) / u_{jj}
3: end for
Remarks. 1) Since $U$ is triangular we get
$$(Ux)_i = \sum_{j=i}^n u_{ij} x_j
= u_{ii}\, \frac{1}{u_{ii}} \Bigl(b_i - \sum_{k=i+1}^n u_{ik} x_k\Bigr) + \sum_{j=i+1}^n u_{ij} x_j
= b_i.$$
Thus the algorithm is correct.
2) The corresponding algorithm to solve $Lx = b$ where $L$ is a lower triangular matrix is called
forward substitution.
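Both substitution algorithms can be sketched as follows (Python, matrices as lists of rows):

```python
# Back substitution for upper triangular U, forward substitution for
# lower triangular L; both assume non-zero diagonal entries.

def back_substitution(U, b):
    n = len(b)
    x = [0.0] * n
    for j in range(n - 1, -1, -1):       # j = n, ..., 1 in the notation above
        x[j] = (b[j] - sum(U[j][k] * x[k] for k in range(j + 1, n))) / U[j][j]
    return x

def forward_substitution(L, b):
    n = len(b)
    x = [0.0] * n
    for j in range(n):
        x[j] = (b[j] - sum(L[j][k] * x[k] for k in range(j))) / L[j][j]
    return x
```

Given $A = LU$, the system $Ax = b$ is then solved by forward substitution for $Ly = b$ followed by back substitution for $Ux = y$.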
Combining all our preparations we get the Gaussian elimination algorithm to solve the problem
SLE: first compute the factorisation $A = LU$, then solve $Ly = b$ by forward substitution and
$Ux = y$ by back substitution.
Lemma 4.5. The LU-factorisation algorithm has computational cost
$$C(n) = \sum_{k=1}^{n-1} (n-k)\bigl(1 + 2(n-k+1)\bigr) = \sum_{k=1}^{n-1} \bigl(2(n-k)^2 + 3(n-k)\bigr) \sim \frac{2}{3}\, n^3.$$
Here we write $f(n) \sim g(n)$ if $\lim_{n\to\infty} f(n)/g(n) = 1$. This implies $f(n) = \Theta(g(n))$ but is a
stronger concept since we also compare the leading constants. Using this notation the claim of
the lemma becomes $C(n) \sim \frac{2}{3} n^3$.
Lemma 4.6. Forward substitution and back substitution have computational cost $C(n) \sim n^2$.
Proof. Calculating $x_j$ in the back substitution algorithm needs $2(n-j) + 1$ operations. Thus
the total cost is
$$C(n) = \sum_{j=1}^n \bigl(2(n-j) + 1\bigr) = 2 \sum_{k=0}^{n-1} k + n = n(n-1) + n = n^2.$$
Error Analysis of Gaussian Elimination
Theorem 4.8. The back substitution algorithm is backward stable: the computed solution $\tilde{x}$
satisfies $(U + \Delta U)\tilde{x} = b$ for some upper triangular matrix $\Delta U \in \mathbb{C}^{n\times n}$ with
$$\frac{\|\Delta U\|}{\|U\|} = O(\varepsilon_m).$$
The proof makes extensive use of assumptions (A1) and (A2) about computer arithmetic and
we omit it here. Using theorem 3.5 about the conditioning of SLE we get an upper bound on
the error in the computed result of back substitution:
$$\frac{\|\tilde{x} - x\|}{\|x\|} \leq \frac{\kappa(U)}{1 - \kappa(U)\frac{\|\Delta U\|}{\|U\|}}\,\frac{\|\Delta U\|}{\|U\|} = \kappa(U)\, O(\varepsilon_m).$$
Thus the back substitution step of Gaussian elimination is numerically stable and introduces
no problem. The same holds for forward substitution.
Problem. LU-factorisation is not backward stable! The effect is illustrated in the following
example. Let
$$A = \begin{pmatrix} \varepsilon & 1 \\ 1 & 1 \end{pmatrix}$$
for some $\varepsilon > 0$. Then $A$ has an LU-factorisation $A = LU$ with
$$L = \begin{pmatrix} 1 & 0 \\ \varepsilon^{-1} & 1 \end{pmatrix}, \qquad
U = \begin{pmatrix} \varepsilon & 1 \\ 0 & 1 - \varepsilon^{-1} \end{pmatrix}.$$
Now assume $\varepsilon \ll 1$. Then $\varepsilon^{-1}$ is a huge number and the representation of these matrices stored
in a computer will be rounded. The matrices might be represented as
$$\tilde{L} = \begin{pmatrix} 1 & 0 \\ \varepsilon^{-1} & 1 \end{pmatrix}, \qquad
\tilde{U} = \begin{pmatrix} \varepsilon & 1 \\ 0 & -\varepsilon^{-1} \end{pmatrix},$$
and then
$$\tilde{L}\tilde{U} = \begin{pmatrix} \varepsilon & 1 \\ 1 & 0 \end{pmatrix}
= A + \begin{pmatrix} 0 & 0 \\ 0 & -1 \end{pmatrix}.$$
A small rounding error led to a large difference in the result! The example shows that for
Gaussian elimination a backward error analysis will, in general, lead to the conclusion that the
perturbed problem is not close to the original one.
Note that this problem is not related to the conditioning of the matrix $A$. We have
$$A^{-1} = (1 - \varepsilon)^{-1} \begin{pmatrix} -1 & 1 \\ 1 & -\varepsilon \end{pmatrix}$$
and thus $\kappa(A) = \|A\|_\infty\, \|A^{-1}\|_\infty \approx 4$ for small $\varepsilon > 0$, so the matrix $A$ is well conditioned.
Because of this instability the classical Gaussian elimination method is not used in numerical
methods. The next section introduces a modification of Gaussian elimination which cures this
problem.
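The instability can be reproduced directly in double precision arithmetic (a sketch; `lu2` is our own helper for the $2\times 2$ case):

```python
# LU factorisation without pivoting on A = [[eps, 1], [1, 1]]:
# for tiny eps, the rounded factors multiply back to a matrix
# that differs from A by 1 in the (2,2) entry.

def lu2(A):
    """LU factorisation of a 2x2 matrix (no pivoting)."""
    (a, b), (c, d) = A
    l = c / a
    return [[1.0, 0.0], [l, 1.0]], [[a, b], [0.0, d - l * b]]

eps = 1e-20
L, U = lu2([[eps, 1.0], [1.0, 1.0]])
# in exact arithmetic u22 = 1 - 1/eps, but in doubles 1 - 1e20
# rounds to -1e20, losing the information about a22 = 1:
LU22 = L[1][0] * U[0][1] + U[1][1]    # entry (2,2) of the product L*U
print(LU22)                           # 0.0 instead of a22 = 1
```

The computed factors are the exact LU-factorisation of a matrix far from $A$, which is precisely the failure of backward stability described above.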
Definition 4.9. A matrix $P \in \mathbb{R}^{n\times n}$ is called a permutation matrix if every row and every
column contains $n - 1$ zeros and one one.
Example. If $\pi\colon \{1, \dots, n\} \to \{1, \dots, n\}$ is a permutation, then the matrix $P = (p_{ij})$ with
$$p_{ij} = \begin{cases} 1 & \text{if } j = \pi(i), \\ 0 & \text{else} \end{cases}$$
is a permutation matrix. (Every permutation matrix is of this form.) In particular the identity
matrix is a permutation matrix.
Remarks. 1) For every permutation matrix $P$ we have
$$(P^T P)_{ij} = \sum_{k=1}^n p_{ki}\, p_{kj} = \delta_{ij}$$
and thus $P^T P = I$. This shows that permutation matrices are orthogonal and have $P^{-1} = P^T$.
2) If $P$ is the permutation matrix corresponding to the permutation $\pi$, then $(P^{-1})_{ij} = 1$ if
and only if $j = \pi^{-1}(i)$. Thus the permutation matrix $P^{-1}$ corresponds to the permutation $\pi^{-1}$.
3) We get
$$(PA)_{ij} = \sum_{k=1}^n p_{ik}\, a_{kj} = a_{\pi(i),j}$$
for all $i, j \in \{1, \dots, n\}$. This shows that multiplying with a permutation matrix from the left reorders
the rows of $A$. Furthermore we have
$$(AP)_{ij} = \sum_{k=1}^n a_{ik}\, p_{kj} = a_{i,\pi^{-1}(j)}$$
and hence multiplying with a permutation matrix from the right reorders the columns of $A$.
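These rules are easy to check numerically (a small Python sketch with our own helper functions; note that the code uses 0-based indices):

```python
# Permutation matrices: P from a permutation pi, P A reorders rows,
# and P^T P = I shows that P is orthogonal.

def perm_matrix(pi):
    n = len(pi)
    return [[1.0 if j == pi[i] else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

pi = [1, 2, 0]                 # pi(0) = 1, pi(1) = 2, pi(2) = 0
P = perm_matrix(pi)
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(matmul(P, A))            # rows of A in the order 1, 2, 0
print(matmul(transpose(P), P)) # the identity matrix
```

Because multiplying by $P$ is just a reordering, implementations normally store the permutation $\pi$ itself instead of the full matrix $P$.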
The problem in our example for the instability of Gaussian elimination was caused by the fact
that we had to divide by the tiny number $u_{kk} = \varepsilon$ in step 4 of the LU-factorisation algorithm. We
will avoid this problem in the improved version of the algorithm by rearranging rows $k, \dots, n$ at
the beginning of the $k$-th iteration in order to maximise the element $|u_{kk}|$. The following argument
shows that the modified algorithm still works correctly.
We want to calculate
$$U = L_{n-1} P_{n-1} \cdots L_1 P_1 A.$$
Multiplying $P_k$ from the left exchanges rows $k$ and $i_k$, where $i_k$ is chosen to maximise the
element $|u_{i_k,k}|$. We can rewrite this as
$$U = L'_{n-1} \cdots L'_1\; P_{n-1} \cdots P_1\, A,$$
where
$$L'_k = P_{n-1} \cdots P_{k+1}\; L_k\; P_{k+1}^{-1} \cdots P_{n-1}^{-1}$$
for $k = 1, \dots, n-1$. Since $P_{n-1} \cdots P_{k+1}$ exchanges rows $k+1, \dots, n$ and $P_{k+1}^{-1} \cdots P_{n-1}^{-1}$ performs
the corresponding permutation on the columns $k+1, \dots, n$, the shape of $L'_k$ is the same as the
shape of $L_k$: it is unit lower triangular and the only non-vanishing entries below the diagonal
are in column $k$. Hence we can still use lemma 4.4 to calculate $L = (L'_{n-1} \cdots L'_1)^{-1}$. The above
arguments lead to the following algorithm.
Algorithm LUPP (LU-factorisation with partial pivoting).
input: $A \in \mathbb{C}^{n\times n}$ invertible
output: $L, U, P \in \mathbb{C}^{n\times n}$ with $PA = LU$, where $L$ is unit lower triangular, $U$ is upper triangular and $P$ is a permutation matrix
1: U = A, L = I, P = I
2: for k = 1, ..., n-1 do
3:   choose i ∈ {k, ..., n} which maximises |u_{ik}|
4:   exchange row (u_{k,k}, ..., u_{k,n}) with (u_{i,k}, ..., u_{i,n})
5:   exchange row (l_{k,1}, ..., l_{k,k-1}) with (l_{i,1}, ..., l_{i,k-1})
6:   exchange row (p_{k,1}, ..., p_{k,n}) with (p_{i,1}, ..., p_{i,n})
7:   for j = k+1, ..., n do
8:     l_{jk} = u_{jk} / u_{kk}
9:     (u_{j,k}, ..., u_{j,n}) = (u_{j,k}, ..., u_{j,n}) - l_{j,k} (u_{k,k}, ..., u_{k,n})
10:  end for
11: end for
Remarks. 1) The resulting matrix $L$ has $|l_{ij}| \leq 1$ for all $i, j \in \{1, \dots, n\}$.
2) The computational complexity is the same as for the LU-algorithm. This is trivial in our
simplified analysis, since we only added steps to exchange rows to the LU-algorithm and we do
not count these operations. But the result still holds for a more detailed complexity analysis: the
number of additional assignments is of order $O(n^2)$ and can be neglected for a $\Theta(n^3)$ algorithm.
and thus by applying the matrices $L'_1, \dots, L'_{n-1}$ in order we get the result.
Remarks. Theorem 4.10 and lemma 4.11 together show that algorithm LUPP, and hence
algorithm GEPP, is backward stable.
Example. Let
$$A = \begin{pmatrix}
1 & & & & 1 \\
-1 & 1 & & & 1 \\
-1 & -1 & 1 & & 1 \\
\vdots & & \ddots & \ddots & \vdots \\
-1 & -1 & \cdots & -1 & 1
\end{pmatrix} \in \mathbb{R}^{n\times n}.$$
Since all relevant elements have modulus 1 we do not need to use pivoting, and LU-factorisation
gives
$$U = \begin{pmatrix}
1 & & & & 1 \\
& 1 & & & 2 \\
& & 1 & & 4 \\
& & & \ddots & \vdots \\
& & & 1 & 2^{n-2} \\
& & & & 2^{n-1}
\end{pmatrix} \in \mathbb{R}^{n\times n}.$$
Thus for the matrix $A$ as defined above we get
$$g_n(A) = \frac{\max_{ij} |u_{ij}|}{\max_{ij} |a_{ij}|} = 2^{n-1}.$$
The example shows that the bound from lemma 4.11 is sharp. Thus the constant for the
backward error in theorem 4.10 can grow exponentially fast in $n$. We showed that GEPP is
stable, but there are matrices where algorithm GEPP does not work very well! Since experience
shows that these matrices never occur in practice, the GEPP method is nevertheless commonly
used.
The Householder QR-factorisation is another method to solve SLE. QR-factorisation avoids the
issues of the LU-factorisation presented at the end of the previous section, but the computation
takes about double the number of operations. The method is based on the following theorem.
Theorem 4.12 (QR Factorisation). Every matrix $A \in \mathbb{C}^{m\times n}$ with $m \geq n$ can be written as
$A = QR$ where $Q \in \mathbb{C}^{m\times m}$ is unitary and $R \in \mathbb{C}^{m\times n}$ is upper triangular.
Remarks. The factorisation in the theorem is called full QR-factorisation. Since all entries
below the diagonal of $R$ are 0, the columns $n+1, \dots, m$ of $Q$ do not contribute to the product $QR$.
Let $\hat{Q} \in \mathbb{C}^{m\times n}$ consist of the first $n$ columns of $Q$ and $\hat{R} \in \mathbb{C}^{n\times n}$ consist of the first $n$ rows of $R$.
Then we have $A = \hat{Q}\hat{R}$. This is called the reduced QR-factorisation of $A$. The following picture
illustrates the situation.
One way to construct the factorisation is the Gram-Schmidt orthonormalisation of the columns $a_1, \dots, a_n$ of $A$:
1: for j = 1, ..., n do
2:   r_{kj} = ⟨q_k, a_j⟩ for k = 1, ..., j-1
3:   q̃_j = a_j - \sum_{k=1}^{j-1} r_{kj} q_k
4:   r_{jj} = ‖q̃_j‖₂
5:   if r_{jj} > 0 then
6:     q_j = q̃_j / r_{jj}
7:   else
8:     let q_j be an arbitrary normalised vector orthogonal to q_1, ..., q_{j-1}
9:   end if
10: end for
11: choose q_{n+1}, ..., q_m to make q_1, ..., q_m an orthonormal system.
This algorithm calculates the columns $q_1, \dots, q_m$ of the matrix $Q$ and the entries of $R$ which
are on or above the diagonal. The entries of $R$ below the diagonal are 0. For the matrices $Q$
and $R$ we get
$$(QR)_{ij} = \sum_{k=1}^j (q_k)_i\, r_{kj} = \sum_{k=1}^{j-1} (q_k)_i\, r_{kj} + (\tilde{q}_j)_i = (a_j)_i$$
and thus $A = QR$.
By construction we have $\|q_j\|_2 = 1$ for $j = 1, \dots, m$. We use induction to show that the
columns $q_1, \dots, q_j$ are orthogonal for all $j \in \{1, \dots, m\}$. For $j = 1$ there is nothing to show.
Now let $j > 1$ and assume that $q_1, \dots, q_{j-1}$ are orthogonal. We have to prove $\langle q_i, q_j\rangle = 0$ for
$i = 1, \dots, j-1$. If $r_{jj} = 0$, this holds by definition of $q_j$. Otherwise we have
$$\langle q_i, q_j\rangle
= \frac{1}{r_{jj}}\, \langle q_i, \tilde{q}_j\rangle
= \frac{1}{r_{jj}}\, \Bigl(\langle q_i, a_j\rangle - \sum_{k=1}^{j-1} r_{kj}\, \langle q_i, q_k\rangle\Bigr)
= \frac{1}{r_{jj}}\, \bigl(\langle q_i, a_j\rangle - r_{ij}\bigr)
= 0.$$
Thus induction shows that the columns of $Q$ are orthonormal and that $Q$ is unitary.
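The Gram-Schmidt iteration can be sketched in Python for a real square matrix with linearly independent columns (so the case $r_{jj} = 0$ never occurs; the function name is our own):

```python
# Classical Gram-Schmidt QR for a real n x n matrix with linearly
# independent columns; A is a list of rows, Q is returned as a list
# of orthonormal columns.

from math import sqrt

def gram_schmidt_qr(A):
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]   # a_1, ..., a_n
    Q, R = [], [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]                       # this becomes q~_j
        for k in range(j):
            R[k][j] = sum(qi * ai for qi, ai in zip(Q[k], cols[j]))
            v = [vi - R[k][j] * qi for vi, qi in zip(v, Q[k])]
        R[j][j] = sqrt(sum(vi * vi for vi in v))
        Q.append([vi / R[j][j] for vi in v])
    return Q, R

Q, R = gram_schmidt_qr([[1.0, 1.0], [0.0, 1.0]])
print(Q)   # columns of the identity matrix in this simple example
print(R)   # [[1, 1], [0, 1]]
```

In floating point arithmetic this classical variant can lose orthogonality badly; this is one motivation for the Householder approach of the next algorithm.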
The following algorithm solves the problem SLE using QR-factorisation. In order to apply
it we will need a numerically stable method to calculate the QR-factorisation and we need to
calculate the matrix-vector product $Q^* b$.
Algorithm QR (Householder QR-factorisation).
input: $A \in \mathbb{R}^{m\times n}$ with $m \geq n$
output: $Q \in \mathbb{R}^{m\times m}$ orthogonal, $R \in \mathbb{R}^{m\times n}$ upper triangular with $A = QR$
1: Q = I, R = A
2: for k = 1, ..., n-1 do
3:   u = (r_{kk}, ..., r_{mk})^T ∈ R^{m-k+1}
4:   v = sign(u_1) ‖u‖₂ e_1 + u, where e_1 = (1, 0, ..., 0)^T ∈ R^{m-k+1}
5:   v = v / ‖v‖₂
6:   H_k = I_{m-k+1} - 2 v v^* ∈ R^{(m-k+1)×(m-k+1)}
7:   Q_k = ( I_{k-1}  0 ; 0  H_k )
8:   R = Q_k R
9:   Q = Q Q_k
10: end for
Remarks. 1) The algorithm calculates matrices Q_k with Q_k^T = Q_k = Q_k^{−1} for k = 1, . . . , n−1 as well as R = Q_{n−1} · · · Q_1 A and Q = Q_1 · · · Q_{n−1}.
2) We will see that Q_k · · · Q_1 A has zeros below the diagonal in columns 1, . . . , k and thus the final result R = Q_{n−1} · · · Q_1 A is upper triangular.
3) The only use we make of the matrix Q when solving SLE by QR-factorisation is to calculate Q^T b. Thus for solving SLE we can omit the explicit calculation of Q by replacing line 9 of algorithm QR with the statement b = Q_k b. The final result in the variable b will then be Q_{n−1} · · · Q_1 b = Q^T b.
Householder Reflections
In step 8 of algorithm QR we calculate a product of the form

    Q_k ( R_{11}  R_{12} ; 0  R_{22} ) = ( R_{11}  R_{12} ; 0  H_k R_{22} )    (4.3)

where R_{11} ∈ R^{(k−1)×(k−1)}, H_k ∈ R^{(m−k+1)×(m−k+1)} and R_{22} ∈ R^{(m−k+1)×(n−k+1)}. The purpose of the current section is to understand this step of the algorithm.
If H_k as calculated in step 6 of algorithm QR is applied to a vector x ∈ R^{m−k+1}, the result is H_k x = x − 2v⟨v, x⟩. Since the vector v⟨v, x⟩ is the projection of x onto v, the value x − v⟨v, x⟩ is the projection of x onto the plane which is orthogonal to v, and x − 2v⟨v, x⟩ is the reflection of x at that plane.
Reflecting twice at the same plane gives back the original vector and thus we find

    H_k^T H_k = H_k H_k = I.

This shows that the matrices H_k, and then also Q_k, are orthogonal for every k ∈ {1, . . . , n−1}.
The vector which defines the reflection plane is either v = u − ‖u‖₂ e_1 or v = u − (−‖u‖₂ e_1), depending on the sign of u_1. The corresponding reflection maps the vector u to H_k u = ‖u‖₂ e_1 or H_k u = −‖u‖₂ e_1 respectively. In either case the image is a multiple of e_1, and since u is the first column of the matrix block R_{22}, the product H_k R_{22} has zeros below the diagonal in the first column. The first column of R_{22} is the kth column of R and thus Q_k R has zeros below the diagonal in columns 1, . . . , k. For k = n−1 we find that R = Q_{n−1} · · · Q_1 A is an upper triangular matrix as required.
Remarks. 1) The matrices H_k and sometimes also Q_k are called Householder reflections.
2) The choice of sign in the definition of v avoids cancellation and thus helps to increase the stability of the algorithm in the cases u ≈ ‖u‖₂ e_1 and u ≈ −‖u‖₂ e_1.
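The reflection step and the accumulation of Q can be illustrated in code. This is a hedged sketch for the square real case, not the notes' exact pseudocode; it applies each reflection H_k directly to R and to an accumulator for Q:

```python
# Sketch of Householder QR for a square real matrix (list of rows).
# Illustrative only: Q is accumulated as Q_{n-1}...Q_1 and transposed at
# the end, since each Q_k is symmetric.

def householder_qr(A):
    """Return (Q, R) with Q orthogonal, R upper triangular and A = QR."""
    n = len(A)
    R = [row[:] for row in A]
    Q = [[float(i == j) for j in range(n)] for i in range(n)]  # starts as I
    for k in range(n - 1):
        # u = (r_kk, ..., r_nk); v = sign(u_1) ||u|| e_1 + u, then normalise
        u = [R[i][k] for i in range(k, n)]
        norm_u = sum(x * x for x in u) ** 0.5
        v = u[:]
        v[0] += norm_u if u[0] >= 0 else -norm_u   # sign(0) taken as +1
        norm_v = sum(x * x for x in v) ** 0.5
        if norm_v == 0.0:
            continue              # column is already zero below the diagonal
        v = [x / norm_v for x in v]
        # apply H_k = I - 2 v v^T to the trailing rows of R and of the
        # Q-accumulator (a left multiplication by Q_k)
        for M in (R, Q):
            for j in range(n):
                s = sum(v[i - k] * M[i][j] for i in range(k, n))
                for i in range(k, n):
                    M[i][j] -= 2.0 * v[i - k] * s
    Q = [[Q[j][i] for j in range(n)] for i in range(n)]  # Q = (Q_{n-1}...Q_1)^T
    return Q, R
```

Note that the diagonal entries of R may come out negative; Q orthogonal and R upper triangular is all the factorisation requires.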
Computational Cost
We considered two variants of algorithm QR, either calculating the full matrix Q as formulated in line 9 of the algorithm, or only calculating Q^T b by replacing line 9 with the statement b = Q_k b. We handle the different cases by first analysing the operation count for the algorithm with line 9 omitted.

Lemma 4.13. The computational cost C(m, n) of algorithm QR applied to an m×n-matrix, without calculating Q or Q^T b, is asymptotically

    C(m, n) ∼ 2mn² − (2/3) n³   for m, n → ∞ with m = Θ(n).
Proof. We count the number of operations for the individual steps of algorithm QR. From equation (4.3) we can see that for calculating the product Q_k R in step 8 we only have to calculate H_k R_{22} = R_{22} − 2vv^T R_{22}. Since v = ṽ/‖ṽ‖₂, with ṽ the unnormalised vector from step 4, and ‖ṽ‖₂² = ṽ^T ṽ, we can calculate this as

    H_k R_{22} = R_{22} − ( ṽ / (ṽ^T ṽ / 2) ) ṽ^T R_{22}.    (4.4)

Using this formula we get the following operation count:
• construction of ṽ: 2(m−k+1) + 1 operations (counting √ as 1);
• calculating ṽ^T ṽ/2 needs 2(m−k+1) operations, and dividing ṽ by the result requires another m−k+1 divisions;
• calculating ṽ^T R_{22} needs (n−k+1)(2(m−k+1) − 1) operations, and subtracting the resulting outer product from R_{22} needs another 2(m−k+1)(n−k+1) operations.
Together this gives

    C(m, n) = Σ_{k=1}^{n−1} ( 5(m−k+1) + 1 + (n−k+1)(4(m−k+1) − 1) )
            = Σ_{l=m−n+2}^{m} ( 5l + 1 + (n−m+l)(4l−1) )
            = 2mn² − (2/3) n³ + terms with at most two factors m, n
            ∼ 2mn² − (2/3) n³

for m, n → ∞ with m = Θ(n).
If we need to calculate the full matrix Q we have to perform an (m−k+1)×(m−k+1) matrix-matrix multiplication in step 9. Assuming that we use the standard matrix multiplication algorithm, this contributes asymptotic cost Θ(m³) and so the asymptotic cost of algorithm QR is increased by this step. But if we apply algorithm QR only to solve SLE, we just have to calculate Q^T b instead of Q. Algorithmically this is the same as appending the vector b as an additional column to the matrix A. Thus the computational cost for this variant is C(m, n+1) and since

    2m(n+1)² − (2/3)(n+1)³ ∼ 2mn² − (2/3) n³

for m, n → ∞ with m = Θ(n), the asymptotic cost does not change. For solving SLE we also have m = n and thus we find that the asymptotic computational cost of solving SLE using Householder QR-factorisation is

    C(n) ∼ 2n·n² − (2/3) n³ = (4/3) n³   for n → ∞.

This analysis shows that solving SLE using Householder QR-factorisation requires asymptotically twice as many operations as algorithm GEPP. This is the price we have to pay for the better stability properties of the QR-method.
Error Analysis
Theorem 4.14. a) For A ∈ R^{m×n} let R̃, ṽ_1, . . . , ṽ_{n−1} be the computed results of the Householder QR-algorithm. Let

    Q̃_k = ( I_{k−1}  0 ; 0  I_{m−k+1} − 2 ṽ_k ṽ_k^T )

and Q̃ = Q̃_1 · · · Q̃_{n−1}. Then we have Q̃ R̃ = A + δA for some δA ∈ R^{m×n} with

    ‖δA‖ / ‖A‖ = O(ε_m),

where ε_m denotes the machine epsilon.
b) Let ỹ be the computed value of Q^T b, obtained by replacing line 9 with b = Q_k b. Then there is a δQ ∈ R^{m×m} with (Q̃ + δQ) ỹ = b and ‖δQ‖ = O(ε_m).
Remarks. 1) The values ṽ_1, . . . , ṽ_{n−1} in the theorem denote the values calculated for the vector v in step 5 during the iterations k = 1, . . . , n−1. Since the matrices Q_k are not explicitly calculated (we use formula (4.4) instead), ṽ_1, . . . , ṽ_{n−1} are the relevant computed values. For our analysis we consider the matrices Q̃_k which are exactly calculated from the vectors ṽ_k and thus are exactly orthogonal.
2) Note that part b) of the theorem only gives an absolute error for δQ, which seems not very useful at first sight. The following theorem about backward stability of the Householder algorithm shows how we can use this nevertheless.
Theorem 4.15. The Householder algorithm for solving SLE is backward stable: the computed solution x̃ for Ax = b satisfies

    (A + ΔA) x̃ = b,   ‖ΔA‖ / ‖A‖ = O(ε_m).
Proof. Let Q̃, R̃ and ỹ be as in theorem 4.14. The final result x̃ is computed from R̃ x̃ = ỹ using back substitution. From the backward stability of the back substitution algorithm we get that the calculated x̃ satisfies

    (R̃ + δR) x̃ = ỹ,   ‖δR‖ / ‖R̃‖ = O(ε_m).

This gives

    b = (Q̃ + δQ) ỹ
      = (Q̃ + δQ)(R̃ + δR) x̃
      = (Q̃ R̃ + Q̃ δR + δQ R̃ + δQ δR) x̃
      = (A + ΔA) x̃

where ΔA = δA + Q̃ δR + δQ R̃ + δQ δR and δA is the error of the computed QR-factorisation from part a) of theorem 4.14.
We study the four terms in the definition of ΔA one by one. From theorem 4.14 we know ‖δA‖/‖A‖ = O(ε_m). Since Q̃ is orthogonal we have R̃ = Q̃^T (A + δA) and thus

    ‖R̃‖ / ‖A‖ ≤ ‖Q̃^T‖ ‖A + δA‖ / ‖A‖ = O(1).

Therefore we also get

    ‖δQ R̃‖ / ‖A‖ ≤ ‖δQ‖ ‖R̃‖ / ‖A‖ = O(ε_m).

Similarly we find

    ‖Q̃ δR‖ / ‖A‖ ≤ ‖Q̃‖ (‖δR‖ / ‖R̃‖) (‖R̃‖ / ‖A‖) = O(ε_m)

and

    ‖δQ δR‖ / ‖A‖ ≤ ‖δQ‖ (‖δR‖ / ‖R̃‖) (‖R̃‖ / ‖A‖) = O(ε_m²).

Together this shows ‖ΔA‖/‖A‖ = O(ε_m).
The theorem shows that the Householder method for solving SLE is backward stable. For simplicity we did not consider the constants in the stability estimate ‖ΔA‖/‖A‖ = O(ε_m). A more detailed analysis shows that the constant here does not suffer from the exponential growth problem described at the end of section 4.2.
Exercises
1) In analogy to the back substitution algorithm, formulate the forward substitution algorithm to solve Ly = b where L ∈ C^{n×n} is a lower triangular, invertible matrix and b ∈ C^n is a vector. Show that your algorithm computes the correct result and that it has asymptotic computational cost of order Θ(n²).
2) a) Find the LU factorisation of

    A = ( 1  2 ; 3  1 ).

What is g_n(A) in this case? b) Find a QR factorisation of the same matrix, using both Gram-Schmidt and Householder. c) Find the condition number of A in the ‖·‖_∞, ‖·‖₂ and ‖·‖₁ norms.
Chapter 5
Iterative Methods
Until now we considered methods which directly calculate x = A⁻¹b using Θ(n³) operations. In this chapter we will introduce the idea of iterative methods. These methods construct a sequence (x_k)_{k∈N} with

    x_k → x   for k → ∞.

For special matrices A the error ‖x_k − x‖ becomes reasonably small after fewer than Θ(n³) operations.
The basic idea of the methods described in this section is to write the matrix A as A = M + N where the matrix M is easier to invert than A. Then, given x_{k−1} for k ∈ N, we can define x_k by

    M x_k = b − N x_{k−1}.    (5.1)

If we assume for the moment that lim_{k→∞} x_k = x, then the limit x satisfies

    M x = b − N x

and thus for the limit we get Ax = b. This shows that the only possible limit for this sequence is the solution of SLE.
To study convergence of the method we consider the error e_k = x_k − x where x is the exact solution of SLE. We get the recursive relation

    M e_k = M x_k − M x = (b − N x_{k−1}) − (A − N) x = −N e_{k−1}.
Before we settle the question of convergence in theorem 5.4 we need a few theoretical tools. The first of these is the Jordan canonical form of a matrix. We state the result without a proof.

Definition 5.1. A Jordan block J_n(λ) ∈ C^{n×n} for λ ∈ C is the matrix satisfying J_n(λ)_{ii} = λ, J_n(λ)_{i,i+1} = 1, and J_n(λ)_{ij} = 0 else, for i, j = 1, . . . , n. A Jordan matrix is a block diagonal matrix J ∈ C^{n×n} of the form

    J = diag( J_{n_1}(λ_1), J_{n_2}(λ_2), . . . , J_{n_k}(λ_k) )

where Σ_{j=1}^{k} n_j = n.
Theorem 5.2 (Jordan canonical form). For any A ∈ C^{n×n} there is an invertible S ∈ C^{n×n} and a Jordan matrix J ∈ C^{n×n} satisfying

    A = S J S⁻¹

where the diagonal elements λ_1, . . . , λ_k of the Jordan blocks are the eigenvalues of A.
Using the Jordan canonical form we can derive the following property of the spectral radius.

Lemma 5.3. Let A ∈ C^{n×n} and ε > 0. Then there is a vector norm on C^n such that the induced matrix norm satisfies ρ(A) ≤ ‖A‖ ≤ ρ(A) + ε.
Proof. From theorem 1.11 we already know ρ(A) ≤ ‖A‖ for every induced matrix norm ‖·‖. Thus we only have to show the second inequality of the claim.
Let J = S⁻¹AS be the Jordan form of A and D_ε = diag(1, ε, ε², . . . , ε^{n−1}). Then (SD_ε)⁻¹ A (SD_ε) = D_ε⁻¹ J D_ε is the matrix which is obtained from J by replacing every superdiagonal entry 1 with ε, while keeping the diagonal entries λ_i.
Define a vector norm on C^n by ‖x‖_ε = ‖(SD_ε)⁻¹ x‖_∞. Substituting x = (SD_ε) y, the induced matrix norm satisfies

    ‖A‖_ε = max_{x≠0} ‖Ax‖_ε / ‖x‖_ε
          = max_{x≠0} ‖(SD_ε)⁻¹ A x‖_∞ / ‖(SD_ε)⁻¹ x‖_∞
          = ‖(SD_ε)⁻¹ A (SD_ε)‖_∞.

Since we know the ‖·‖_∞-matrix norm from theorem 1.16 (the maximal row sum) and we have calculated the explicit form of the matrix (SD_ε)⁻¹ A (SD_ε) above, this is easy to evaluate. We get ‖A‖_ε ≤ max_i |λ_i| + ε = ρ(A) + ε. This completes the proof.
Now we can come back to the iteration defined in (5.1). There the error e_k = x_k − x satisfies e_k = −M⁻¹N e_{k−1} for all k ∈ N, and the method converges if and only if e_k → 0 for k → ∞. The following theorem characterises convergence with the help of the spectral radius of the matrix

    C = −M⁻¹N.    (5.2)

Theorem 5.4. Let C ∈ C^{n×n}, e_0 ∈ C^n and e_k = C^k e_0 for all k ∈ N. Then e_k → 0 for all e_0 ∈ C^n if and only if ρ(C) < 1.

Proof. Assume first ρ(C) < 1. Then by lemma 5.3 we find an induced matrix norm with ‖C‖ < 1 and we get

    ‖e_k‖ = ‖C^k e_0‖ ≤ ‖C‖^k ‖e_0‖ → 0

for k → ∞.
On the other hand, if ρ(C) ≥ 1, then there is an e_0 ∈ C^n with e_0 ≠ 0 and C e_0 = λ e_0 for some λ ∈ C with |λ| ≥ 1. For this vector e_0 we get ‖e_k‖ = |λ|^k ‖e_0‖ ≥ ‖e_0‖ > 0, so e_k does not converge to 0.

Remarks. 1) The theorem shows that the linear iterative method defined by (5.1) converges if and only if ρ(C) < 1 for C = −M⁻¹N. The convergence is fast if ρ(C) is small.
2) Since ‖C‖ ≥ ρ(C) for every matrix norm ‖·‖, a sufficient criterion for convergence of an iterative method is ‖C‖ < 1 for any matrix norm ‖·‖.
In the remaining part of the section we will consider specific methods which are obtained by choosing the matrices M and N in (5.1). For a matrix A ∈ C^{n×n} define the three n×n-matrices

    L = the strictly lower triangular part of A (the entries a_{ij} with i > j),
    D = diag(a_{11}, a_{22}, . . . , a_{nn}),
    U = the strictly upper triangular part of A (the entries a_{ij} with i < j).    (5.3)

Then we have A = L + D + U.
The Jacobi method is obtained by choosing M = D and N = L + U in (5.1). The iteration then reads

    x_k = D⁻¹( b − (L + U) x_{k−1} )

for all k ∈ N, and the convergence properties are determined by the matrix C = −M⁻¹N = −D⁻¹(L + U). Since D is diagonal, the inverse D⁻¹ is trivial to compute.
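A minimal sketch of this iteration in code (illustrative, not part of the notes; it assumes all diagonal entries of A are non-zero):

```python
# Sketch of the Jacobi iteration x_k = D^{-1}(b - (L+U) x_{k-1}); the
# matrix A is dense, given as a list of rows.  Illustrative only.

def jacobi_step(A, b, x):
    """One Jacobi step: return D^{-1}(b - (L+U)x)."""
    n = len(A)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def jacobi(A, b, x0, steps):
    x = list(x0)
    for _ in range(steps):
        x = jacobi_step(A, b, x)
    return x
```

For a strictly diagonally dominant matrix such as A = (4 1; 1 3), the iteration converges, in line with the row sum criterion proved below.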
Theorem 5.5. a) The Jacobi method converges for all matrices A which satisfy the strong row sum criterion

    |a_{ii}| > Σ_{j≠i} |a_{ij}|   for i = 1, . . . , n.    (5.4)

b) The Jacobi method also converges for all matrices A which satisfy the strong column sum criterion

    |a_{jj}| > Σ_{i≠j} |a_{ij}|   for j = 1, . . . , n.    (5.5)

Proof. a) The matrix C = −D⁻¹(L + U) has the entries c_{ij} = −a_{ij}/a_{ii} for i ≠ j and c_{ij} = 0 else. Using theorem 1.16 we find

    ‖C‖_∞ = max_{i=1,...,n} (1/|a_{ii}|) Σ_{j≠i} |a_{ij}|.

Thus the strong row sum criterion gives ‖C‖_∞ < 1, which implies ρ(C) < 1 and the method converges.
b) If the strong column sum criterion (5.5) holds for A, then the strong row sum criterion (5.4) holds for A^T and thus, by part a), the method converges for A^T. From theorem 5.4 we know then ρ(D⁻¹(U^T + L^T)) < 1. Since for every matrix C the matrices C, C^T and also D⁻¹ C^T D have the same eigenvalues, we get ρ(D⁻¹(U + L)) = ρ(D⁻¹(U^T + L^T)) < 1 and the method converges for A.
[Figure: directed graph G(A) on the vertices 1, 2, 3.]
The matrix A is not irreducible (indeed P = I in the definition is enough to see this) and, since there is no path from 3 to 1, the graph G(A) is not connected.
[Figure: directed graph G(A) on the vertices 1, 2, 3.]
Now the graph G(A) is connected and the matrix is thus irreducible.
Remarks. The proof of the equivalence of the two characterisations of irreducibility is based on the following observations. 1) The graphs G(A) and G(P^T A P) are isomorphic, only the vertices are numbered in a different way. 2) The block A_{22} in the definition of irreducibility corresponds to a set of states from where there is no path into the states corresponding to the block A_{11}.
Theorem. If the matrix A is irreducible and satisfies

    |a_{ii}| ≥ Σ_{j≠i} |a_{ij}|

for i = 1, . . . , n, but

    |a_{kk}| > Σ_{j≠k} |a_{kj}|    (5.7)

for at least one index k, then the Jacobi method converges. (This condition is called the weak row sum criterion.)
Proof. We have to show ρ(C) < 1 where C = −D⁻¹(L + U). Define the matrix |C| = (|c_{ij}|)_{ij} ∈ R^{n×n} and the vector e = (1, 1, . . . , 1)^T ∈ R^n. Then we have

    (|C| e)_i = Σ_{j≠i} |a_{ij}| / |a_{ii}| ≤ 1 = (e)_i,

where this and some of the following ≤-signs are to be read componentwise. Therefore we get

    |C|^{l+1} e ≤ |C|^l e ≤ e

for all l ∈ N.
Let t^{(l)} = e − |C|^l e ≥ 0. Then the vectors t^{(l)} are componentwise increasing in l. Let ν_l be the number of non-vanishing components of t^{(l)}. We will show that ν_l is strictly increasing until it reaches the value n. Assume ν_{l+1} = ν_l = k < n. Since one row of A satisfies the strict inequality (5.7) we have |C| e ≠ e and thus k > 0. Then without loss of generality (since we can reorder the rows and columns of A) we have

    t^{(l)} = ( a ; 0 ),   t^{(l+1)} = ( b ; 0 )

with a, b > 0 componentwise. This gives

    ( b ; 0 ) = t^{(l+1)} = e − |C|^{l+1} e ≥ |C| e − |C|^{l+1} e = |C| t^{(l)} = ( |C_{11}|  |C_{12}| ; |C_{21}|  |C_{22}| ) ( a ; 0 ).

From a > 0 we conclude |C_{21}| a = 0, hence C_{21} = 0, and C would not be irreducible. Since this implies that A would not be irreducible, we have found the required contradiction and can conclude ν_{l+1} > ν_l whenever ν_l < n. This proves t^{(n)} > 0 componentwise.
Because of e > |C|^n e, all row sums of |C|^n are smaller than 1, i.e. ‖|C|^n‖_∞ < 1. Therefore ρ(C)^n = ρ(C^n) ≤ ‖C^n‖_∞ ≤ ‖|C|^n‖_∞ < 1 and we get ρ(C) < 1.
Computational Complexity
The sequence (x_k)_{k∈N} from the Jacobi method is defined by the relation

    x_k = D⁻¹( b − (L + U) x_{k−1} ).

One step of the method consists of 1) calculating y = b − (L + U) x_{k−1} and 2) dividing each component of y by the corresponding diagonal entry of D, which costs a further O(n) operations.

Example. Assume that each row of A contains at most ℓ non-zero entries outside the diagonal. Then step 1) requires O(ℓn) operations and performing one step of the Jacobi method has cost O(ℓn).

The next question is of course how many steps of the method we have to perform. Given ε > 0 we can calculate x_1, . . . , x_k until the relative residual satisfies ‖A x_k − b‖ / ‖b‖ ≤ ε. The relative error of the solution is then bounded by

    ‖x_k − x‖ / ‖x‖ ≤ κ(A) ε    (5.8)

and by choosing ε small enough we can make the relative error of the solution arbitrarily small.
Also we know that the absolute error satisfies x_k − x = C^k (x_0 − x). If the matrix A is symmetric, then C = −D⁻¹(L + U) is normal and from theorem 1.14 we get

    ‖x_k − x‖₂ ≤ ρ(C)^k ‖x_0 − x‖₂.    (5.9)

Thus the error converges exponentially fast to 0 as k → ∞. If we know both the spectral radius ρ(C) and the condition number κ(A), we can combine estimate (5.9) with the conditioning of the problem to get an upper bound on the number of remaining iteration steps until the relative error of the result satisfies (5.8).

Remarks. 1) If the matrix A is known explicitly, the above arguments can be used to derive more specific results about the computational complexity of the Jacobi method.
2) For large sparse matrices A the intermediate results in the LU- or QR-factorisation will often not be sparse. If there is not enough memory in the system to store a full matrix, then it might still be possible to use iterative methods whereas direct methods will fail.
5.2 The Conjugate-Gradient Method
Let A ∈ R^{n×n} be positive definite, b ∈ R^n and x = A⁻¹b. Consider the norm ‖z‖_A = √(z^T A z) and define F : R^n → R by

    F(z) = ½ ‖z − x‖²_A    (5.10)

for all z ∈ R^n. The function F can also be expressed as a function of the residual r = b − Az: for every z ∈ R^n we get

    F(z) = ½ (z − x)^T A (z − x)
         = ½ z^T A z − b^T z + ½ b^T A⁻¹ b
         = ½ (b − Az)^T A⁻¹ (b − Az).

Since x is the unique minimiser of F, we can solve SLE by minimising F. We will convert this idea into an iterative method by taking this minimum over an increasing sequence of subspaces.
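The algorithm CG used in the lemma below is the standard conjugate-gradient recursion; the following sketch (not the notes' exact listing) assumes the usual choices α_k = r_k^T r_k / p_k^T A p_k and β_k = r_{k+1}^T r_{k+1} / r_k^T r_k, which match the identities used in the proof:

```python
# Sketch of the conjugate-gradient method for a small dense SPD system;
# A is a list of rows.  Illustrative only.

def cg(A, b, x0, tol=1e-12):
    n = len(A)
    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    x = list(x0)
    r = [b[i] - Av_i for i, Av_i in enumerate(matvec(x))]   # r_0 = b - A x_0
    p = list(r)                                             # p_0 = r_0
    rr = sum(ri * ri for ri in r)
    for _ in range(n):                   # terminates after at most n steps
        if rr <= tol:
            break
        Ap = matvec(p)
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]          # x_{k+1}
        r = [r[i] - alpha * Ap[i] for i in range(n)]         # r_{k+1}
        rr_new = sum(ri * ri for ri in r)
        p = [r[i] + (rr_new / rr) * p[i] for i in range(n)]  # p_{k+1}
        rr = rr_new
    return x
```

For a 2×2 system the method reaches the exact solution after at most two steps, as the termination remark after lemma 5.9 predicts.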
Lemma 5.9. Consider the algorithm CG. Let m = min{ k | p_k = 0 }. Then for all 0 ≤ k ≤ m the following statement (A_k) holds:
(1) r_j ⊥ p_i for all 0 ≤ i < j ≤ k,
(2a) r_i^T r_i > 0 for all 0 ≤ i < k,
(2b) r_i^T p_i = r_i^T r_i for all 0 ≤ i ≤ k,
(3) p_i^T A p_j = 0 for all 0 ≤ i < j ≤ k,
(4) r_i^T r_j = 0 for all 0 ≤ i < j ≤ k,
(5) r_i = b − A x_i for all 0 ≤ i ≤ k.

Proof. We use induction over k. First note that the claim (A_0) is trivially true. Assume that (A_k) holds for some k < m. We show properties (1) to (5) for (A_{k+1}).
(1) For i = k we get

    r_{k+1}^T p_k = (r_k − α_k A p_k)^T p_k = r_k^T p_k − (r_k^T r_k / p_k^T A p_k) p_k^T A p_k = r_k^T p_k − r_k^T r_k = 0    (5.11)

where the last equality comes from (2b) of (A_k). For i < k we can use (1) and (3) to get

    r_{k+1}^T p_i = (r_k − α_k A p_k)^T p_i = 0.

(2a) If we had r_k = 0, then we would have p_k = β_{k−1} p_{k−1} and using property (3) we would get the contradiction

    0 < p_k^T A p_k = β_{k−1} p_k^T A p_{k−1} = 0.

Thus we have r_k ≠ 0 and consequently r_k^T r_k > 0.
(2b) Using (5.11) we get r_{k+1}^T p_{k+1} = r_{k+1}^T (r_{k+1} + β_k p_k) = r_{k+1}^T r_{k+1}.
(3) Applying the definitions of r_{k+1} and p_{k+1} from the algorithm we find
Remarks. 1) Since the lemma shows that the vectors r_0, . . . , r_{m−1} ∈ R^n are non-zero and orthogonal, we have m ≤ n. From p_m = 0 we can conclude r_m^T r_m = r_m^T p_m = 0 and thus r_m = 0. Therefore the algorithm terminates after at most n steps and on termination we have A x_m = b.
2) When the computation is performed on a computer, rounding errors will cause p̃_m ≠ 0, where p̃_m is the calculated value for p_m. In practice one just treats the method as an iterative method and continues the iteration until the residual error ‖r_k‖ is small enough.
3) From the construction of the vectors r_k and p_k in the algorithm we find span(r_0, . . . , r_k) = span(p_0, . . . , p_k). Define

    φ(u_0, . . . , u_k) = F( x_k + Σ_{i=0}^{k} u_i p_i )

for all u_0, . . . , u_k ∈ R, where F is given by (5.10) and x_k is the value constructed in step k of the algorithm. Since φ is positive and grows to the outside, it has only one minimum and there the gradient is zero. We show that x_{k+1} is the value which minimises φ. We get

    ∂φ/∂u_j (u_0, . . . , u_k) = ( x_k + Σ_{i=0}^{k} u_i p_i − x )^T A p_j = −r^T p_j

where r = A( x − x_k − Σ_{i=0}^{k} u_i p_i ) = b − A( x_k + Σ_{i=0}^{k} u_i p_i ). For (u_0, u_1, . . . , u_k) = (0, 0, . . . , α_k) we get

    x_k + Σ_{i=0}^{k} u_i p_i = x_k + α_k p_k = x_{k+1}

and

    r = b − A x_{k+1} = r_{k+1}.

Using this value of r and conditions (2b) and (1) from the lemma we also find

    ∂φ/∂u_j (0, . . . , 0, α_k) = −r_{k+1}^T p_j = 0

for j = 0, . . . , k.
4) From the algorithm we see p_k ∈ span(r_0, A r_0, . . . , A^k r_0) for every k < m and thus we get

    span(p_0, . . . , p_k) ⊆ K_{k+1}(r_0, A) := span(r_0, A r_0, . . . , A^k r_0).

The space K_{k+1}(r_0, A) is called a Krylov subspace of R^n for the matrix A.
Exercises
1) Which of the following matrices are irreducible? For which of these matrices does the Jacobi method converge?
4 2 1 1 3 8
2 1 , 3 25 4 , 4 2
1 4 1 2 1 1
2) Show that the Jacobi method for the system

    1  1     x1     1
    1 10  2  x2  =  2
    2  1     x3     3

converges and, for the iteration starting with x_0 = 0, give an upper bound on the number of steps required to get the relative error of the result below 10⁻⁶.
Chapter 6

Definition 6.1. A factorisation

    A = U Σ V*

is called a singular value decomposition (SVD) of A, if U ∈ C^{m×m} and V ∈ C^{n×n} are unitary, Σ ∈ C^{m×n} is diagonal, and the diagonal entries of Σ satisfy σ_1 ≥ σ_2 ≥ · · · ≥ σ_p ≥ 0 where p = min(m, n). The values σ_1, . . . , σ_p are called the singular values of A.
Theorem 6.2. Every matrix has a singular value decomposition and the singular values are uniquely determined.

Proof. Let A ∈ C^{m×n}. We prove existence of the SVD by induction over min(m, n). If m = 0 or n = 0 the matrices U, V, and Σ are just the appropriately shaped empty matrices (one dimension is zero) and nothing is to show.
Assume min(m, n) > 0 and that the existence of the SVD is already known for matrices where one dimension is smaller than min(m, n). Let

    σ_1 = ‖A‖₂ = max_{x≠0} ‖Ax‖₂ / ‖x‖₂ = max_{‖x‖₂=1} ‖Ax‖₂.

Since the map v ↦ Av is continuous and the set { x | ‖x‖₂ = 1 } ⊆ C^n is compact, the maximum is attained: there is a v_1 ∈ C^n with ‖v_1‖₂ = 1 and ‖A v_1‖₂ = σ_1. Choose u_1 ∈ C^m with ‖u_1‖₂ = 1 and A v_1 = σ_1 u_1, and extend v_1 and u_1 to unitary matrices

    U_1 = (u_1, . . . , u_m) ∈ C^{m×m}   and   V_1 = (v_1, . . . , v_n) ∈ C^{n×n}.
Then the product U_1* A V_1 is of the form

    S = U_1* A V_1 = ( σ_1  w* ; 0  B )

with w ∈ C^{n−1} and B ∈ C^{(m−1)×(n−1)}.
For unitary matrices U we have ‖Ux‖₂ = ‖x‖₂ and thus

    ‖S‖₂ = max_{x≠0} ‖U_1* A V_1 x‖₂ / ‖x‖₂ = max_{x≠0} ‖A V_1 x‖₂ / ‖V_1 x‖₂ = ‖A‖₂ = σ_1.
On the other hand,

    ‖ S (σ_1 ; w) ‖₂ = ‖ ( σ_1² + w*w ; Bw ) ‖₂ ≥ σ_1² + w*w = (σ_1² + w*w)^{1/2} ‖ (σ_1 ; w) ‖₂

and thus ‖S‖₂ ≥ (σ_1² + w*w)^{1/2}. Together this allows us to conclude w = 0 and thus

    A = U_1 S V_1* = U_1 ( σ_1  0 ; 0  B ) V_1*.

By the induction hypothesis the matrix B has a singular value decomposition B = U_2 Σ_2 V_2*. Then

    A = U_1 ( 1  0 ; 0  U_2 ) ( σ_1  0 ; 0  Σ_2 ) ( V_1 ( 1  0 ; 0  V_2 ) )*

is an SVD of A and existence of the SVD is proved.
Uniqueness of the largest singular value σ_1 holds, since σ_1 is uniquely determined by the relation

    ‖A‖₂ = max_{x≠0} ‖U Σ V* x‖₂ / ‖x‖₂ = max_{x≠0} ‖Σ x‖₂ / ‖x‖₂ = σ_1.

Remarks. 1) Inspection of the above proof reveals that for real matrices A the matrices U and V are also real.
2) If m > n, then the last m−n columns of U do not contribute to the factorisation A = U Σ V*: dropping these columns of U together with the last m−n rows of Σ gives the reduced SVD A = Û Σ̂ V* with Û ∈ C^{m×n} and Σ̂ ∈ C^{n×n}.
For the rest of this section let A ∈ C^{m×n} be a matrix with singular value decomposition A = U Σ V* and singular values σ_1 ≥ · · · ≥ σ_r > 0 = σ_{r+1} = · · · = σ_p. To illustrate the usefulness of the SVD we prove a few basic results.
This shows

    range(A) = range(U Σ V*) = span(u_1, . . . , u_r) ⊆ C^m.

b) We have

    ker(A) = ker(U Σ V*) = ker(Σ V*).

Since V is unitary we can conclude

    ker(A) = span(v_{r+1}, . . . , v_n) ⊆ C^n.
In this section we will first derive a theoretical criterion for the solution of LSQ and then we will derive three different algorithms to solve LSQ.

Theorem 6.6. A vector x ∈ C^n minimises ‖Ax − b‖₂ for A ∈ C^{m×n} and b ∈ C^m if and only if Ax − b is orthogonal to range(A).

Proof. Let g(x) = ½ ‖Ax − b‖₂². Then minimising ‖Ax − b‖₂ is equivalent to minimising the function g.
(1) Assume that Ax − b is orthogonal to range(A) and let y ∈ C^n. Then Ay − Ax = A(y − x) ⊥ Ax − b and by Pythagoras

    g(y) = ½ ‖(Ax − b) + (Ay − Ax)‖₂² = ½ ‖Ax − b‖₂² + ½ ‖Ay − Ax‖₂² ≥ g(x),

so x minimises g.
(2) Now assume that x minimises g and let y ∈ C^n. Then the derivatives of g at x vanish:

    0 = d/dε g(x + εy)|_{ε=0} = ½ ( ⟨Ay, Ax − b⟩ + ⟨Ax − b, Ay⟩ ) = Re ⟨Ax − b, Ay⟩

and

    0 = d/dε g(x + iεy)|_{ε=0} = ½ ( −i⟨Ay, Ax − b⟩ + i⟨Ax − b, Ay⟩ ) = −Im ⟨Ax − b, Ay⟩.

This shows ⟨Ax − b, Ay⟩ = 0 and thus Ax − b ⊥ Ay for all y ∈ C^n.
Corollary 6.7. A vector x ∈ C^n solves LSQ if and only if it satisfies

    A* A x = A* b.    (6.1)

Proof. From the theorem we know that x solves LSQ if and only if Ax − b ⊥ range(A). This in turn is equivalent to Ax − b ⊥ a_i for every column a_i of A, i.e. to A*(Ax − b) = 0.

Definition 6.8. The system (6.1) of linear equations is called the normal equations for LSQ.

We will consider different algorithms to solve LSQ. The first one is directly based on the normal equations:
1: calculate A* A and A* b
2: solve (A* A) x = A* b
The total asymptotic cost of this algorithm is

    C(m, n) ∼ mn² + (4/3) n³.

There are methods to solve SLE for symmetric matrices with fewer steps. Using Cholesky factorisation one gets a total asymptotic cost of order C(m, n) ∼ mn² + (1/3) n³.
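The normal-equations algorithm can be sketched as follows (illustrative, real arithmetic; Gaussian elimination without pivoting stands in for step 2, which is acceptable here since A^T A is positive definite for full-rank A):

```python
# Sketch of LSQ via the normal equations: form A^T A and A^T b, then solve
# the n x n system.  A is an m x n matrix as a list of rows, m >= n, full rank.

def lsq_normal_equations(A, b):
    m, n = len(A), len(A[0])
    AtA = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
           for i in range(n)]
    Atb = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]
    # forward elimination (no pivoting; A^T A is positive definite)
    for i in range(n):
        for r in range(i + 1, n):
            f = AtA[r][i] / AtA[i][i]
            for c in range(i, n):
                AtA[r][c] -= f * AtA[i][c]
            Atb[r] -= f * Atb[i]
    # back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (Atb[i] - sum(AtA[i][j] * x[j] for j in range(i + 1, n))) / AtA[i][i]
    return x
```

For example, fitting the line through the points (0, 0), (1, 1), (2, 2) with A = (1 0; 1 1; 1 2) recovers intercept 0 and slope 1.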
The second algorithm uses the reduced QR-factorisation:
1: compute the reduced QR-factorisation A = Q̂ R̂
2: compute Q̂* b ∈ C^n
3: solve R̂ x = Q̂* b
The result x satisfies

    A* A x = A* Q̂ R̂ x = A* Q̂ Q̂* b = A* b

and thus solves LSQ. Steps 1 and 2 together have asymptotic cost C_1(m, n) ∼ 2mn² − (2/3) n³, and step 3 has asymptotic cost C_2(m, n) = O(n²). Thus we get total asymptotic cost

    C(m, n) ∼ 2mn² − (2/3) n³

for m, n → ∞ with m = Θ(n).
The third algorithm uses the reduced SVD:
1: compute the reduced SVD A = Û Σ̂ V*
2: compute Û* b ∈ C^n
3: solve Σ̂ y = Û* b
4: return x = V y
The result satisfies

    A* A x = A* Û Σ̂ V* V y = A* Û Σ̂ y = A* Û Û* b = A* b

and thus solves LSQ.
6.3 Conditioning of LSQ
As before let A ∈ C^{m×n} with m ≥ n, rank(A) = n and b ∈ C^m. From corollary 6.7 we know that the solution of LSQ can be calculated as

    x = (A* A)⁻¹ A* b.

The matrix A⁺ = (A* A)⁻¹ A* is called the pseudoinverse of A and the condition number of LSQ is κ(A) = ‖A‖ ‖A⁺‖.

Remarks. If rank(A) < n, then A* A is not invertible and we set κ(A) = +∞. Since for square, invertible matrices A we have A⁺ = A⁻¹, the definition of κ(A) is consistent with the previous one. As before the condition number depends on the chosen matrix norm.
Using the reduced SVD A = Û Σ̂ V* we get

    A⁺ = (A* A)⁻¹ A* = (V Σ̂* Σ̂ V*)⁻¹ V Σ̂* Û* = V (Σ̂* Σ̂)⁻¹ Σ̂* Û*.    (6.2)

The matrix (Σ̂* Σ̂)⁻¹ Σ̂* is diagonal with entries

    (1/σ_i²) σ_i = 1/σ_i

for i = 1, . . . , n. Thus equation (6.2) is (modulo ordering of the singular values) a singular value decomposition of A⁺ and we find

    κ(A) = ‖A‖₂ ‖A⁺‖₂ = σ_1 / σ_n.
Theorem 6.12. Assume that x solves LSQ for b and that x + δx solves LSQ for b + δb. Define θ ∈ [0, π/2] by cos(θ) = ‖Ax‖₂/‖b‖₂ and let η = ‖A‖₂ ‖x‖₂ / ‖Ax‖₂ ≥ 1. Then we have

    ‖δx‖₂ / ‖x‖₂ ≤ ( κ(A) / (η cos(θ)) ) ‖δb‖₂ / ‖b‖₂.

Remarks. The constant κ(A)/cos(θ) becomes large if either κ(A) is large or θ ≈ π/2. In either of these cases the problem is badly conditioned.
Theorem 6.13. Let θ and η be as above. Assume that x solves LSQ for A, b and that x + δx solves LSQ for A + δA, b. Then

    ‖δx‖₂ / ‖x‖₂ ≤ ( κ(A) + κ(A)² tan(θ) / η ) ‖δA‖₂ / ‖A‖₂.

Theorems 6.13 and 6.14 together give estimates for the accuracy of the computed result x, given that θ is bounded away from π/2.
Exercises
1) By following the proof of the existence of a singular value decomposition, find the SVD of the matrix

    A = ( 1  2 ; 3  1 ).

2) You are given m data points (u^{(i)}, v^{(i)}) where u^{(i)} ∈ R^{n−1} and v^{(i)} ∈ R for i = 1, . . . , m. We would like to find α ∈ R^{n−1} and β ∈ R which minimise

    Σ_{i=1}^{m} | α^T u^{(i)} + β − v^{(i)} |².

Show that this problem may be reformulated as a standard least squares problem by specifying appropriate choices of A ∈ R^{m×n} and b ∈ R^m.
Chapter 7

In this chapter we consider the problem EVP: given A ∈ C^{n×n}, find λ ∈ C and x ∈ C^n with

    A x = λ x,   x ≠ 0,

i.e. we want to compute the eigenvalues and eigenvectors of A.

Example. For a polynomial p(z) = a_0 + a_1 z + · · · + a_{n−1} z^{n−1} + z^n define the matrix

        ( 0             −a_0     )
        ( 1  0          −a_1     )
    A = (    ⋱  ⋱        ⋮       )    (7.1)
        (       1  0    −a_{n−2} )
        (          1    −a_{n−1} )

By induction one can show that χ_A(z) = det(A − zI) = (−1)^n p(z). This shows that the problem of finding eigenvalues is equivalent to the problem of finding roots of a polynomial.
Theorem 7.3 (Abel, 1824). For every n ≥ 5 there is a polynomial p of degree n with rational coefficients that has a real root which cannot be expressed using only rational numbers, addition, subtraction, multiplication, division and taking kth roots.

As the computed result of a computer program will always be based on the operations mentioned in the theorem, we find that it is not possible to find an algorithm which calculates eigenvalues exactly after a finite number of steps. Conclusion: any eigenvalue solver must be iterative.

Theorem 7.4. If A ∈ C^{n×n} is Hermitian, then there exist a unitary matrix Q ∈ C^{n×n} and a diagonal matrix Λ ∈ R^{n×n} with

    A = Q Λ Q*.
Remarks. 1) The orthonormal columns of Q are eigenvectors of A and the diagonal entries of Λ are the corresponding eigenvalues.
2) The theorem shows that Hermitian matrices have real eigenvalues.

The iterative methods presented in this chapter will calculate approximations to eigenvectors. The following consideration will help to get an approximation for the corresponding eigenvalue from this. Given A ∈ C^{n×n} and x ∈ C^n, we try to find the λ ∈ C which minimises ‖Ax − λx‖₂. If x is an eigenvector, then the minimum is attained for the corresponding eigenvalue. Otherwise we minimise the distance ‖Ax − λx‖₂ by considering the normal equations of this least squares problem; the solution is

    λ = (x* x)⁻¹ x* (Ax) = ⟨x, Ax⟩ / ⟨x, x⟩.

Definition 7.5. The Rayleigh quotient of a matrix A ∈ C^{n×n} is defined by

    r_A(x) = ⟨x, Ax⟩ / ⟨x, x⟩

for all x ∈ C^n with x ≠ 0.
Theorem 7.6. Let A ∈ R^{n×n} be symmetric and x ∈ R^n with x ≠ 0. Then x is an eigenvector of A with eigenvalue λ if and only if ∇r_A(x) = 0 and r_A(x) = λ.

Proof. The gradient of r_A can be calculated as

    ∇r_A(x) = ( ∂/∂x_i [ Σ_{j,k=1}^{n} x_j a_{jk} x_k / Σ_{j=1}^{n} x_j² ] )_{i=1,...,n}
            = ( [ 2(Ax)_i Σ_j x_j² − ( Σ_{j,k} x_j a_{jk} x_k ) 2x_i ] / ( Σ_j x_j² )² )_{i=1,...,n}
            = ( 2 Ax ⟨x, x⟩ − 2 ⟨x, Ax⟩ x ) / ⟨x, x⟩²
            = (2 / ‖x‖₂²) ( Ax − r_A(x) x ),

where we used the symmetry of A in the step ∂/∂x_i Σ_{j,k} x_j a_{jk} x_k = Σ_k a_{ik} x_k + Σ_j x_j a_{ji} = 2(Ax)_i.
Assume Ax = λx. Then r_A(x) = λ⟨x, x⟩/⟨x, x⟩ = λ and

    ∇r_A(x) = (2 / ‖x‖₂²) ( λx − λx ) = 0.

If on the other hand ∇r_A(x) = 0, then Ax − r_A(x) x = 0 and thus we get Ax = r_A(x) x.
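The gradient formula can be checked numerically. The following sketch compares it against central finite differences for an example matrix (the matrix A, the point x, and the step size h are illustrative choices, not from the notes):

```python
# Check grad r_A(x) = (2/||x||^2)(Ax - r_A(x) x) against finite differences
# for a small symmetric matrix.

def rayleigh(A, x):
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * Ax[i] for i in range(n)) / sum(xi * xi for xi in x)

def rayleigh_grad(A, x):
    n = len(x)
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    r, nx2 = rayleigh(A, x), sum(xi * xi for xi in x)
    return [2.0 / nx2 * (Ax[i] - r * x[i]) for i in range(n)]

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric example matrix
x = [1.0, 2.0]
h = 1e-6
for i in range(2):
    xp = x[:]; xp[i] += h
    xm = x[:]; xm[i] -= h
    fd = (rayleigh(A, xp) - rayleigh(A, xm)) / (2 * h)
    assert abs(fd - rayleigh_grad(A, x)[i]) < 1e-6
```

At an eigenvector the computed gradient is zero, in agreement with the theorem.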
For the remaining part of the chapter let x_1, . . . , x_n be an orthonormal system of eigenvectors and λ_1, . . . , λ_n the corresponding eigenvalues of a real, symmetric matrix A. We order the eigenvalues such that |λ_1| ≥ |λ_2| ≥ · · · ≥ |λ_n|.
Remarks. 1) For practical usage the method, as every iterative method, is stopped at some point where the result is close enough to the exact one.
2) The algorithm calculates z^{(k)} = A^k z^{(0)} / ‖A^k z^{(0)}‖₂ and λ^{(k)} = r_A(z^{(k)}). To avoid overflow/underflow errors the vector z^{(k)} is normalised in every step of the iteration.
3) The method is based on the following idea: if we express z^{(0)} in the basis x_1, . . . , x_n we get

    z^{(0)} = Σ_{i=1}^{n} α_i x_i

and

    A^k z^{(0)} = Σ_{i=1}^{n} α_i A^k x_i = Σ_{i=1}^{n} α_i λ_i^k x_i.

For large k this expression is dominated by the term corresponding to the eigenvalue with the largest modulus.
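The idea in remark 3) can be sketched directly in code (an illustrative implementation; the stopping rule is simply a fixed number of steps):

```python
# Sketch of the power iteration: z_k = A z_{k-1} / ||A z_{k-1}||_2, with the
# Rayleigh quotient as the eigenvalue estimate.  A is a list of rows.

def power_iteration(A, z0, steps):
    n = len(A)
    z = list(z0)
    for _ in range(steps):
        w = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        z = [x / norm for x in w]               # normalise in every step
    Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
    lam = sum(z[i] * Az[i] for i in range(n))   # Rayleigh quotient, ||z|| = 1
    return lam, z
```

The starting vector must not be orthogonal to the dominant eigenvector; see the discussion of the condition ⟨z^{(0)}, x_1⟩ ≠ 0 below.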
Proof. Since α_1 = ⟨z^{(0)}, x_1⟩ ≠ 0, we get

    A^k z^{(0)} = Σ_{i=1}^{n} α_i λ_i^k x_i = α_1 λ_1^k ( x_1 + Σ_{i=2}^{n} (α_i/α_1)(λ_i/λ_1)^k x_i )

and

    ‖A^k z^{(0)}‖₂² = |α_1 λ_1^k|² ( 1 + Σ_{i=2}^{n} |α_i/α_1|² |λ_i/λ_1|^{2k} ) = |α_1 λ_1^k|² ( 1 + O(|λ_2/λ_1|^{2k}) ).

Using the Taylor approximation √(1 + x²) ≈ 1 + x²/2 we can conclude

    ‖A^k z^{(0)}‖₂ = |α_1 λ_1^k| ( 1 + O(|λ_2/λ_1|^{2k}) ).

With σ^{(k)} = sign(α_1 λ_1^k) this gives

    ‖z^{(k)} − σ^{(k)} x_1‖₂ = ‖ A^k z^{(0)} / ‖A^k z^{(0)}‖₂ − σ^{(k)} x_1 ‖₂ = O(|λ_2/λ_1|^k)

and thus z^{(k)} converges to the eigenvector x_1, up to the sign σ^{(k)}. This finishes the proof of (7.2).
b) From theorem 7.6 we know r_A(σ^{(k)} x_1) = λ_1 and ∇r_A(σ^{(k)} x_1) = 0. Taylor expansion of r_A around σ^{(k)} x_1 gives

    λ^{(k)} = r_A(z^{(k)}) = λ_1 + 0 + O( ‖z^{(k)} − σ^{(k)} x_1‖₂² ) = λ_1 + O(|λ_2/λ_1|^{2k}).
The power iteration algorithm helps to find the eigenvalue with the largest modulus and the corresponding eigenvector. The method can be modified to find different eigenvalues of A. This is done in the following algorithm, which applies the power iteration to the matrix (A − μI)⁻¹ for a shift μ ∈ R which is not an eigenvalue of A:
1: choose z^{(0)} ∈ R^n with ‖z^{(0)}‖₂ = 1
2: for k = 1, 2, 3, . . . do
3:   solve (A − μI) w = z^{(k−1)}
4:   z^{(k)} = w / ‖w‖₂
5:   ν^{(k)} = r_{(A−μI)⁻¹}(z^{(k)})
6: end for
7: return λ^{(k)} = μ + 1/ν^{(k)}

Remarks. 1) For practical usage the method is stopped at some point where the result is close enough to the exact one.
2) By comparison with the power iteration method we find that ν^{(k)} approximates the eigenvalue of (A − μI)⁻¹ with the largest modulus. Since (A − μI)⁻¹ has the eigenvalues ν_i = 1/(λ_i − μ), this value is ν_j = 1/(λ_j − μ) where λ_j is the eigenvalue of A closest to μ, and we get

    λ^{(k)} = μ + 1/ν^{(k)} ≈ μ + 1/ν_j = μ + (λ_j − μ) = λ_j.

Using this argument it is easy to convert theorem 7.7 into a theorem about the speed of convergence for the inverse iteration algorithm.
The power iteration method only works if the initial vector z^{(0)} satisfies the condition ⟨z^{(0)}, x_1⟩ ≠ 0. This is no problem since dim{ z ∈ R^n | ⟨z, x_1⟩ = 0 } = n − 1 < n. The probability of hitting this hyperplane with a randomly chosen initial vector z^{(0)} is zero if the distribution of z^{(0)} has a density with respect to the Lebesgue measure on R^n. Also at least one of the basis vectors e_1, . . . , e_n satisfies the condition ⟨e_i, x_1⟩ ≠ 0.
The following algorithm extends the idea of the power iteration algorithm: it runs the power iteration method for n orthonormal vectors simultaneously, re-orthonormalising them at every step. The result is an algorithm which approximates all eigenvectors and all eigenvalues at once.
1: choose an orthogonal matrix Q^{(0)} ∈ R^{n×n}
2: for k = 1, 2, 3, . . . do
3:   W^{(k)} = A Q^{(k−1)}
4:   calculate the QR-factorisation W^{(k)} = Q^{(k)} R^{(k)}
5:   Λ^{(k)} = (Q^{(k)})^T A Q^{(k)}
6: end for
Theorem 7.8. Let A ∈ R^{n×n} be symmetric with eigenvalues λ_1, . . . , λ_n satisfying |λ_1| > · · · > |λ_n|. Assume ⟨q_i^{(0)}, x_i⟩ ≠ 0 for i = 1, . . . , n. Then there are sequences (σ_i^{(k)})_{k∈N} with σ_i^{(k)} ∈ {+1, −1} for all i, k such that

    ‖q_i^{(k)} − σ_i^{(k)} x_i‖₂ = O(ρ^k)

and

    |Λ_{ii}^{(k)} − λ_i| = O(ρ^{2k})

for i = 1, . . . , n, where q_1^{(k)}, . . . , q_n^{(k)} are the columns of Q^{(k)} and ρ = max_{i=1,...,n−1} |λ_{i+1}| / |λ_i| < 1.

Remarks. Since the matrices R^{(k)} are upper triangular, the first column of Q^{(k)} in step 4 of the algorithm is a multiple of the first column of the matrix W^{(k)}. Thus the first column q_1^{(k)} of the matrix Q^{(k)} performs the original power iteration algorithm with initial vector q_1^{(0)}.
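The algorithm can be sketched as follows (illustrative; classical Gram-Schmidt stands in for the QR-factorisation of step 4, and the iteration count replaces a proper stopping rule):

```python
# Sketch of the simultaneous (orthogonal) iteration for a symmetric matrix.
# Vectors are stored as columns (lists); A is a list of rows.

def orthonormalise(cols):
    """Classical Gram-Schmidt; assumes the columns are linearly independent."""
    q = []
    for a in cols:
        v = list(a)
        for u in q:
            c = sum(u[i] * a[i] for i in range(len(a)))
            v = [v[i] - c * u[i] for i in range(len(v))]
        norm = sum(x * x for x in v) ** 0.5
        q.append([x / norm for x in v])
    return q

def simultaneous_iteration(A, steps):
    n = len(A)
    Qc = [[float(i == j) for i in range(n)] for j in range(n)]  # Q^(0) = I
    for _ in range(steps):
        Wc = [[sum(A[i][j] * q[j] for j in range(n)) for i in range(n)]
              for q in Qc]                                      # W = A Q
        Qc = orthonormalise(Wc)                                 # W = Q R
    # diagonal of (Q^(k))^T A Q^(k): the eigenvalue approximations
    diag = []
    for q in Qc:
        Aq = [sum(A[i][j] * q[j] for j in range(n)) for i in range(n)]
        diag.append(sum(q[i] * Aq[i] for i in range(n)))
    return diag, Qc
```

For A = (2 1; 1 2), with eigenvalues 3 and 1, the diagonal entries converge to 3 and 1 as the theorem predicts.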
Exercises
1) Give a proof by induction which shows that the matrix A from (7.1) really has characteristic polynomial det(A − zI) = (−1)^n p(z), where p(z) = a_0 + a_1 z + · · · + a_{n−1} z^{n−1} + z^n.
2) Give a proof of theorem 7.8.