
Numerical Linear Algebra

Jochen Voss
University of Warwick

December 2004
I tried to keep the text as free of errors as possible. Please report any remaining mistakes to
Jochen Voss (voss@seehuhn.de). The current version of the text can always be found on my
home page at http://seehuhn.de/mathe/numlinalg.html .

Copyright © 2004 Jochen Voss and Andrew Stuart

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
Contents
Introduction 3
1 Linear Algebra 5
1.1 Vector Norms and Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Complexity of Algorithms 11
2.1 Computational Cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Analysis of Matrix-Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . 12

3 Stability and Conditioning 16


3.1 Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4 Systems of Linear Equations 21


4.1 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.2 Gaussian Elimination with Partial Pivoting . . . . . . . . . . . . . . . . . . . . . 25
4.3 The QR-Factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

5 Iterative Methods 34
5.1 Linear Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.2 The Conjugate-Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

6 Least Square Problems 43


6.1 Singular Value Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.2 The Normal Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
6.3 Conditioning of LSQ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

7 The Eigenvalue Problem 49

Introduction
These lecture notes cover the course Numerical Linear Algebra (MA398) given in the autumn
term 2004 at the University of Warwick. The notes are partially based on lecture notes written
by Andrew Stuart for earlier courses.

What is numerical linear algebra? Of course, we consider the same problems which are
considered in a linear algebra course. But this time the focus is different. We are interested in

- solving large problems by using a computer,

- considering speed and stability of the algorithms.

These large problems occur for example when continuous problems are discretised or in image
analysis. Throughout the lecture we will address three main problems.

SLE (simultaneous linear equations): given a matrix A ∈ C^{n×n} and a vector b ∈ C^n, find x ∈ C^n with

    Ax = b.

LSQ (least square problem): given a matrix A ∈ C^{m×n} and a vector b ∈ C^m (m ≥ n), find x ∈ C^n which minimises the distance

    ‖Ax − b‖.

EVP (eigenvalue problem): given a matrix A ∈ C^{n×n}, find x ∈ C^n and λ ∈ C with

    Ax = λx and x ≠ 0.

The lecture is structured as follows.


Chapter 1 (Linear Algebra) will start by summarising a few results from linear algebra and
provide some basic theoretical tools which we will need for our analysis.
Chapter 2 (Complexity of Algorithms) introduces measures for the cost to perform a given
algorithm and illustrates this by analysing a few simple problems.
Chapter 3 (Stability and Conditioning) examines how much the solution of a linear problem
can change when the problem is slightly perturbed, for example in the presence of rounding
errors.
Chapter 4 (SLE) compares several algorithms to solve systems of linear equations. We will
consider both the stability and the cost of these algorithms.
Chapter 5 (Iterative Methods) introduces another set of algorithms to solve systems of linear
equations: iterative methods only give approximate solutions but can be both faster and more
stable than the methods presented in chapter 4.
Chapter 6 (LSQ) introduces the least squares problem and discusses several algorithms to
solve it.
Chapter 7 (EVP) explains algorithms to find eigenvalues of large matrices.

All of this is meant to deal with really big matrices. Where do these matrices come from?
As mentioned above these occur e.g. in the area of discretised continuous problems. This is
illustrated by the following example.

Example. We want to numerically solve the following problem: given a, b ∈ R, find a function
f: [0, 1] → R with f″(x) = 0 for all x ∈ [0, 1], f(0) = a and f(1) = b.
Of course this problem only serves as an illustration. It can easily be solved analytically
(question to the reader: what is the result?). But there are similar problems (for example in
the two-dimensional case) where direct solution is no longer feasible and the numerical approach
becomes practical. The methods used there are exactly the same as the ones presented in this
example.
The idea is to discretise the problem: for N ∈ N and k = 0, . . . , N let x_k = k/N. We
consider the finite set {x_0, x_1, . . . , x_N} instead of the interval [0, 1] and we consider the vector
(f(x_0), f(x_1), . . . , f(x_N)) instead of the function f.
What should we do about the second derivative f″? Using the Taylor formula we find that for large
N the approximation

    f″(x_k) ≈ ( f(x_{k−1}) − 2 f(x_k) + f(x_{k+1}) ) / (1/N)²    for all k = 1, . . . , N−1    (0.1)

holds. This leads to the following system of N+1 linear equations:

    f(x_0) = a
    N² f(x_{k−1}) − 2N² f(x_k) + N² f(x_{k+1}) = 0    for k = 1, . . . , N−1
    f(x_N) = b.

Since we used the approximation (0.1) the result will not be exact, but if we choose N large
enough the approximation gets better and we can hope that the result is close to the exact result.
To solve this problem we need to be able to deal with large systems of linear equations.
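To make the discretisation concrete, the following NumPy sketch (an illustration added to these notes; the function name solve_bvp and the use of numpy.linalg.solve are just convenient choices) assembles the (N+1)×(N+1) system above and solves it.

    import numpy as np

    def solve_bvp(a, b, N):
        """Discretise f''(x) = 0 on [0, 1] with f(0) = a, f(1) = b using the
        finite-difference approximation (0.1) and solve the resulting system."""
        A = np.zeros((N + 1, N + 1))
        rhs = np.zeros(N + 1)
        A[0, 0] = 1.0
        rhs[0] = a                      # boundary condition f(x_0) = a
        for k in range(1, N):           # interior equations from (0.1)
            A[k, k - 1] = N**2
            A[k, k] = -2 * N**2
            A[k, k + 1] = N**2
        A[N, N] = 1.0
        rhs[N] = b                      # boundary condition f(x_N) = b
        return np.linalg.solve(A, rhs)  # a library solver; later chapters show how such solvers work

    print(solve_bvp(a=0.0, b=1.0, N=10))   # approximates the exact solution f(x) = x

For this simple problem the matrix is tridiagonal, which is typical of discretised one-dimensional problems.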
As mentioned above this example is very simple and only serves as an illustration, but more
interesting examples lead to the same kind of linear equations. You can learn more about this
in courses about the numerical solution of partial differential equations.

The course gives only an introduction to the topics of numerical linear algebra. Further
results can be found in many text books. The course is based on the following books: the books
of Lloyd N. Trefethen [TB97] and James W. Demmel [Dem97] give a good introduction to the
topic. J. Stoer and R. Bulirsch [SB02] give a more theoretical presentation of numerical
analysis, which also contains results about numerical linear algebra. The book of Roger A. Horn
and Charles R. Johnson [HJ85] is a good reference for theoretical results about matrix analysis.
Nicholas J. Higham's book [Hig02] contains a lot of information about stability and the effect of
rounding errors in numerical algorithms.

Chapter 1

Linear Algebra
The purpose of this chapter is to summarise a few results from linear algebra and to provide
some basic theoretical tools which we will later need for our analysis.

1.1 Vector Norms and Inner Products

Definition 1.1. A vector norm on C^n is a mapping ‖·‖: C^n → R satisfying

a) ‖x‖ ≥ 0 for all x ∈ C^n and ‖x‖ = 0 iff x = 0,

b) ‖λx‖ = |λ| ‖x‖ for all λ ∈ C, x ∈ C^n, and

c) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ C^n.

Examples:

- the p-norm for 1 ≤ p < ∞:

    ‖x‖_p = ( Σ_{j=1}^n |x_j|^p )^{1/p}    for all x ∈ C^n,

- for p = 2 we get the Euclidean norm:

    ‖x‖_2 = ( Σ_{j=1}^n |x_j|² )^{1/2}    for all x ∈ C^n,

- for p = 1 we get

    ‖x‖_1 = Σ_{j=1}^n |x_j|    for all x ∈ C^n,

- the infinity norm: ‖x‖_∞ = max_{1≤j≤n} |x_j|.

Definition 1.2. An inner product on C^n is a mapping ⟨·,·⟩: C^n × C^n → C satisfying:

a) ⟨x, x⟩ ∈ R_+ for all x ∈ C^n and ⟨x, x⟩ = 0 iff x = 0,

b) ⟨x, y⟩ equals the complex conjugate of ⟨y, x⟩ for all x, y ∈ C^n,

c) ⟨x, λy⟩ = λ⟨x, y⟩ for all λ ∈ C, x, y ∈ C^n, and

d) ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩ for all x, y, z ∈ C^n.

Example. The standard inner product on C^n is given by

    ⟨x, y⟩ = Σ_{j=1}^n x̄_j y_j    for all x, y ∈ C^n.    (1.1)

Remark. Conditions c) and d) above state that ⟨·,·⟩ is linear in the second component. Using
the rules for inner products we get

    ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩    for all x, y, z ∈ C^n

and

    ⟨λx, y⟩ = λ̄ ⟨x, y⟩    for all λ ∈ C, x, y ∈ C^n,

i.e. the inner product is anti-linear in the first component.

Definition 1.3. Two vectors x, y are orthogonal with respect to the inner product ⟨·,·⟩ iff
⟨x, y⟩ = 0.

Lemma 1.4. Let ⟨·,·⟩: C^n × C^n → C be an inner product. Then ‖·‖: C^n → R defined by

    ‖x‖ = √⟨x, x⟩    for all x ∈ C^n

is a vector norm.

Proof. a) Since ⟨·,·⟩ is an inner product we have ⟨x, x⟩ ≥ 0 for all x ∈ C^n, i.e. √⟨x, x⟩ is
defined without problems and non-negative. Also we get

    ‖x‖ = 0  ⟺  ⟨x, x⟩ = 0  ⟺  x = 0.

b) We have

    ‖λx‖ = √⟨λx, λx⟩ = √( λ̄λ ⟨x, x⟩ ) = |λ| ‖x‖.

c) Note that the proof of the Cauchy-Schwarz inequality

    |⟨x, y⟩| ≤ ‖x‖ ‖y‖    for all x, y ∈ C^n

only uses properties of the inner product. So we can use it here even before we know that ‖·‖
is a norm. We get

    ‖x + y‖² = ⟨x + y, x + y⟩
             = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
             ≤ ‖x‖² + 2 |⟨x, y⟩| + ‖y‖²
             ≤ ‖x‖² + 2 ‖x‖ ‖y‖ + ‖y‖²
             = ( ‖x‖ + ‖y‖ )²    for all x, y ∈ C^n.

This finishes the proof.

We write matrices A ∈ C^{m×n} as

    A = ( a_11  a_12  ...  a_1n )
        ( a_21  a_22  ...  a_2n )
        (  ⋮     ⋮           ⋮  )    = (a_ij)_ij.    (1.2)
        ( a_m1  a_m2  ...  a_mn )

Definition 1.5. Given A ∈ C^{m×n} we define the adjoint A* ∈ C^{n×m} by (A*)_ij = ā_ji. (For
A ∈ R^{m×n} we get A* = A^T.)
Using this definition we can write the standard inner product as

    ⟨x, y⟩ = x* y.

Definition 1.6. A matrix Q ∈ C^{m×n}, m ≥ n, is unitary if Q*Q = I, i.e. if the columns of Q
are orthonormal with respect to the standard inner product. (If Q ∈ R^{m×n} satisfies Q^T Q = I, we
say that the matrix Q is orthogonal.)
A matrix A ∈ C^{n×n} is Hermitian if A* = A (for real matrices: symmetric).
A Hermitian matrix A ∈ C^{n×n} is positive definite if x*Ax > 0 for all x ∈ C^n \ {0} (positive
semi-definite for ≥ 0).

Remarks. 1) Unless otherwise specified, ⟨·,·⟩ will denote the standard inner product (1.1).
The standard inner product satisfies

    ⟨Ax, y⟩ = (Ax)* y = x* A* y = ⟨x, A* y⟩

for all x, y ∈ C^n.
2) If A ∈ C^{n×n} is Hermitian and positive definite, then

    ⟨x, y⟩_A = ⟨x, Ay⟩    for all x, y ∈ C^n

defines an inner product and

    ‖x‖_A = √⟨x, x⟩_A    for all x ∈ C^n

defines a norm on C^n.

1.2 Matrix Norms

Definition 1.7. A matrix norm on C^{n×n} is a mapping ‖·‖: C^{n×n} → R with

a) ‖A‖ ≥ 0 for all A ∈ C^{n×n} and ‖A‖ = 0 iff A = 0,

b) ‖λA‖ = |λ| ‖A‖ for all λ ∈ C, A ∈ C^{n×n},

c) ‖A + B‖ ≤ ‖A‖ + ‖B‖ for all A, B ∈ C^{n×n}, and

d) ‖AB‖ ≤ ‖A‖ ‖B‖ for all A, B ∈ C^{n×n}.

Remark. Conditions a), b) and c) state that ‖·‖ is a vector norm on the vector space C^{n×n}.
Condition d) only makes sense for matrices, since general vector spaces are not equipped with
a product.

Definition 1.8. Given a vector norm ‖·‖_v on C^n we define the induced norm ‖·‖_m on C^{n×n}
by

    ‖A‖_m = max_{x≠0} ‖Ax‖_v / ‖x‖_v

for all A ∈ C^{n×n}.

Theorem 1.9. The induced norm ‖·‖_m of a vector norm ‖·‖_v is a matrix norm with ‖I‖_m = 1
and

    ‖Ax‖_v ≤ ‖A‖_m ‖x‖_v

for all A ∈ C^{n×n} and x ∈ C^n.
Proof. a) ‖A‖_m ∈ R and ‖A‖_m ≥ 0 for all A ∈ C^{n×n} is obvious from the definition. Also from
the definition we get

    ‖A‖_m = 0  ⟺  ‖Ax‖_v / ‖x‖_v = 0 for all x ≠ 0
           ⟺  ‖Ax‖_v = 0 for all x ≠ 0  ⟺  Ax = 0 for all x ≠ 0  ⟺  A = 0.

b) For λ ∈ C and A ∈ C^{n×n} we get

    ‖λA‖_m = max_{x≠0} ‖λAx‖_v / ‖x‖_v = max_{x≠0} |λ| ‖Ax‖_v / ‖x‖_v = |λ| ‖A‖_m.

c) For A, B ∈ C^{n×n} we get

    ‖A + B‖_m = max_{x≠0} ‖Ax + Bx‖_v / ‖x‖_v
              ≤ max_{x≠0} ( ‖Ax‖_v + ‖Bx‖_v ) / ‖x‖_v
              ≤ max_{x≠0} ‖Ax‖_v / ‖x‖_v + max_{x≠0} ‖Bx‖_v / ‖x‖_v = ‖A‖_m + ‖B‖_m.

Before we check condition d) from the definition of a matrix norm we verify

    ‖I‖_m = max_{x≠0} ‖Ix‖_v / ‖x‖_v = max_{x≠0} ‖x‖_v / ‖x‖_v = 1

and

    ‖A‖_m = max_{y≠0} ‖Ay‖_v / ‖y‖_v ≥ ‖Ax‖_v / ‖x‖_v    for all x ∈ C^n \ {0},

which gives

    ‖Ax‖_v ≤ ‖A‖_m ‖x‖_v    for all x ∈ C^n.

d) Using this estimate we find

    ‖AB‖_m = max_{x≠0} ‖ABx‖_v / ‖x‖_v
           ≤ max_{x≠0} ‖A‖_m ‖Bx‖_v / ‖x‖_v = ‖A‖_m ‖B‖_m.

Usually one denotes the induced matrix norm with the same symbol as the corresponding
vector norm. For the remaining part of this text we will follow this convention.

Recall that x ∈ C^n is an eigenvector with eigenvalue λ ∈ C of a matrix A ∈ C^{n×n} if

    Ax = λx and x ≠ 0.    (1.3)

Definition 1.10. The spectral radius of a matrix A ∈ C^{n×n} is defined by

    ρ(A) = max{ |λ| : λ is an eigenvalue of A }.

Theorem 1.11. For any matrix norm ‖·‖, any matrix A ∈ C^{n×n} and any ℓ ∈ N we have

    ρ(A)^ℓ ≤ ‖A^ℓ‖ ≤ ‖A‖^ℓ.
Proof. By definition of the spectral radius ρ(A) we can find an eigenvector x with Ax = λx and
ρ(A) = |λ|. Let X ∈ C^{n×n} be the matrix where all n columns are equal to x. Then we have
A^ℓ X = λ^ℓ X and thus

    ‖A^ℓ‖ ‖X‖ ≥ ‖A^ℓ X‖ = ‖λ^ℓ X‖ = |λ|^ℓ ‖X‖ = ρ(A)^ℓ ‖X‖.

Dividing by ‖X‖ gives ρ(A)^ℓ ≤ ‖A^ℓ‖. The second inequality follows from property d) in the
definition of a matrix norm.

Definition 1.12. A matrix A ∈ C^{n×n} is normal if A*A = AA*.

Lemma 1.13. A ∈ C^{n×n} is normal iff it has n orthonormal eigenvectors, i.e. eigenvectors
x_1, . . . , x_n with ⟨x_i, x_j⟩ = δ_ij for all i, j = 1, . . . , n.

Theorem 1.14. If A ∈ C^{n×n} is normal, then

    ρ(A)^ℓ = ‖A^ℓ‖_2 = ‖A‖_2^ℓ    for all ℓ ∈ N.

Proof. Let x_1, . . . , x_n be an orthonormal basis composed of eigenvectors of A with corresponding
eigenvalues λ_1, . . . , λ_n. Without loss of generality we have ρ(A) = |λ_1|.
Let x ∈ C^n. Then we can write

    x = Σ_{j=1}^n α_j x_j

and get

    ‖x‖_2² = Σ_{j=1}^n |α_j|².

Similarly we find

    Ax = Σ_{j=1}^n α_j λ_j x_j    and    ‖Ax‖_2² = Σ_{j=1}^n |α_j λ_j|².

This shows

    ‖Ax‖_2 / ‖x‖_2 = ( Σ_{j=1}^n |α_j|² |λ_j|² )^{1/2} / ( Σ_{j=1}^n |α_j|² )^{1/2}
                   ≤ ( Σ_{j=1}^n |α_j|² |λ_1|² / Σ_{j=1}^n |α_j|² )^{1/2}
                   = |λ_1| = ρ(A)    for all x ∈ C^n

and consequently ‖A‖_2 ≤ ρ(A).

Using theorem 1.11 we get

    ρ(A)^ℓ ≤ ‖A^ℓ‖_2 ≤ ‖A‖_2^ℓ ≤ ρ(A)^ℓ

for all ℓ ∈ N. This finishes the proof.

Similar methods to those used in the proof of the previous result yield the following theorem.

Theorem 1.15. For all matrices A ∈ C^{n×n} we have

    ‖A‖_2 = √ρ(A*A).

With the previous theorems we have among other things identified the matrix norm which
is induced by the 2-vector norm. The following theorem explicitly identifies the matrix norm
associated to the infinity norm.
Theorem 1.16. The matrix norm induced by the infinity norm is the maximum row sum norm:

    ‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^n |a_ij|.

Proof. For x ∈ C^n we get

    ‖Ax‖_∞ = max_{1≤i≤n} |(Ax)_i| = max_{1≤i≤n} | Σ_{j=1}^n a_ij x_j |
           ≤ ( max_{1≤i≤n} Σ_{j=1}^n |a_ij| ) ‖x‖_∞,

which gives

    ‖Ax‖_∞ / ‖x‖_∞ ≤ max_{1≤i≤n} Σ_{j=1}^n |a_ij|

for all x ∈ C^n and thus ‖A‖_∞ ≤ max_{1≤i≤n} Σ_{j=1}^n |a_ij|.
For the lower bound choose k ∈ {1, 2, . . . , n} such that

    max_{1≤i≤n} Σ_{j=1}^n |a_ij| = Σ_{j=1}^n |a_kj|

and define x ∈ C^n by x_j = ā_kj / |a_kj| for all j = 1, . . . , n (setting x_j = 1 whenever a_kj = 0).
Then we have ‖x‖_∞ = 1 and

    ‖A‖_∞ ≥ ‖Ax‖_∞ / ‖x‖_∞ = max_{1≤i≤n} | Σ_{j=1}^n a_ij x_j |
          ≥ | Σ_{j=1}^n a_kj ā_kj / |a_kj| |
          = Σ_{j=1}^n |a_kj|
          = max_{1≤i≤n} Σ_{j=1}^n |a_ij|.

This is the required result.
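As a quick numerical sanity check (added here, not part of the original notes), the following NumPy snippet compares the induced 2-norm and infinity norm of a random complex matrix with the formulas from theorems 1.15 and 1.16.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))

    # Theorem 1.15: the induced 2-norm equals sqrt(rho(A* A)).
    rho = max(abs(np.linalg.eigvals(A.conj().T @ A)))
    print(np.sqrt(rho), np.linalg.norm(A, 2))                    # the two values agree

    # Theorem 1.16: the induced infinity norm is the maximum row sum.
    print(abs(A).sum(axis=1).max(), np.linalg.norm(A, np.inf))   # the two values agree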

Exercises
1) Show that the matrix norm induced by the ‖·‖_1-norm on C^n is the maximum column sum
norm

    ‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^n |a_ij|.

2) Show that ‖A‖_max = max_{i,j} |a_ij| for all A ∈ C^{n×n} defines a vector norm on the space of
n×n-matrices, but not a matrix norm.
3) Let A, B, C ∈ C^{n×n} with A = BC. Show that

    ‖A‖_max ≤ ‖B‖_∞ ‖C‖_max

and

    ‖A‖_max ≤ ‖B‖_max ‖C‖_1.

Chapter 2

Complexity of Algorithms
In this chapter we learn how to analyse how long it takes to solve a numerical problem on a
computer. Specifically we are interested in the question of how the cost to perform an algorithm
depends on the input size.

2.1 Computational Cost

The computational cost of an algorithm is the amount of resources it takes to perform this algo-
rithm on a computer. For simplicity here we just count the number of floating point operations
(additions, subtractions, multiplications, divisions) performed during one run of the algorithm.
A more detailed analysis would also take factors like memory usage into account.

Definition 2.1. The cost of an algorithm is

    C(n) = number of additions, subtractions, multiplications and divisions,

where n is the size of the input data (e.g. the number of equations etc.).

The following definition provides the notation we will use to describe the asymptotic compu-
tational cost of an algorithm, that is the behaviour of the cost C(n) for n → ∞.

Definition 2.2. For f, g: N → N or f, g: R_+ → R_+ we write

    g(x) = O(f(x))    if  limsup_{x→∞} g(x)/f(x) < ∞,

    g(x) = Ω(f(x))    if  liminf_{x→∞} g(x)/f(x) > 0,

    g(x) = Θ(f(x))    if  g(x) = Ω(f(x)) and g(x) = O(f(x)).

Example. Using this notation we can write 5n² + 2n − 3 = Θ(n²), n² = O(n³) and n² = Ω(n).

Standard inner product algorithm.

input: x = (x_1, . . . , x_n), y = (y_1, . . . , y_n)
output: s = ⟨x, y⟩
1: s = 0
2: for i = 1, . . . , n do
3:   s = s + x̄_i y_i
4: end for
5: return s
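A direct Python transcription of this pseudocode (added for illustration only) could look as follows.

    def inner(x, y):
        """Standard inner product <x, y>: sum of conj(x_i) * y_i."""
        s = 0
        for xi, yi in zip(x, y):
            s += xi.conjugate() * yi   # n multiplications and n additions in total
        return s

    print(inner([1 + 1j, 2], [3, 4 - 2j]))   # (11-7j)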

Theorem 2.3. The standard inner-product algorithm on C^n has computational cost C(n) = Θ(n).
Any algorithm for the inner product has C(n) = Ω(n).

Proof. The standard inner-product algorithm above uses n multiplications and n additions, i.e.
C(n) = 2n = Θ(n).
Sketch for the proof of C(n) = Ω(n): since each of the products x̄_i y_i is independent of
the others, we have to calculate all n of them.

Remark. Giving a real proof for the lower bound in the above theorem would require a detailed
model of what an algorithm actually is. One would for example need to be able to prove
that guessing the result in just one operation and returning it is not a proper algorithm. We
avoid these difficulties here by only giving sketches for lower bounds.

Theorem 2.4. The standard method for C^{n×n} matrix-matrix multiplication satisfies C(n) =
Θ(n³). Any method has C(n) = Ω(n²).

Proof. Denote the rows of A ∈ C^{n×n} by a_1*, . . . , a_n* and the columns of B ∈ C^{n×n} by b_1, . . . , b_n.
Since

    (AB)_ij = a_i* b_j    for all i, j = 1, . . . , n

we have to calculate n² inner products. Thus the asymptotic computational cost is C(n) =
n² Θ(n) = Θ(n³).
Sketch for the lower bound: we have to calculate the n² entries of the resulting matrix and thus
C(n) ≥ n².

2.2 Analysis of Matrix-Matrix Multiplication

In theorem 2.4 there is a gap between the order Θ(n³) of the standard method for multiplying
matrices and the lower bound Ω(n²). The purpose of this section is to show that there are
actually algorithms with an asymptotic order which is better than Θ(n³).
For A, B ∈ C^{n×n}, n even, and D = AB write

    A = ( A11  A12 ),    B = ( B11  B12 ),    D = ( D11  D12 )
        ( A21  A22 )         ( B21  B22 )         ( D21  D22 )

where A_ij, B_ij, D_ij ∈ C^{n/2×n/2}. Then we have

    D11 = A11 B11 + A12 B21
    D12 = A11 B12 + A12 B22
    D21 = A21 B11 + A22 B21
    D22 = A21 B12 + A22 B22.

The above method calculates the product of two n×n-matrices using eight multiplications
of (n/2)×(n/2)-matrices. There is another way to calculate the entries of the matrix D, which
looks more complicated at first but only uses seven multiplications of (n/2)×(n/2)-matrices. It
will transpire that this fact can be utilised to get an asymptotically faster method of multiplying
matrices. Using

    P1 = (A11 + A22)(B11 + B22)
    P2 = (A21 + A22) B11
    P3 = A11 (B12 − B22)
    P4 = A22 (B21 − B11)
    P5 = (A11 + A12) B22
    P6 = (A21 − A11)(B11 + B12)
    P7 = (A12 − A22)(B21 + B22)

we can write

    D11 = P1 + P4 − P5 + P7
    D12 = P3 + P5
    D21 = P2 + P4
    D22 = P1 + P3 − P2 + P6.

Algorithm (Strassen Multiplication).

input: A, B ∈ C^{n×n} with n = 2^k for some k ∈ N_0
output: AB ∈ C^{n×n}
1: if n = 1 then
2:   return AB
3: else
4:   calculate P1, . . . , P7 (using recursion)
5:   calculate D11, D12, D21 and D22
6:   return D
7: end if
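The recursion can be written down directly in NumPy; the following sketch (added here, not an optimised implementation) assumes that n is a power of two.

    import numpy as np

    def strassen(A, B):
        """Strassen multiplication for n x n matrices, n a power of two."""
        n = A.shape[0]
        if n == 1:
            return A * B
        m = n // 2
        A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
        B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
        P1 = strassen(A11 + A22, B11 + B22)
        P2 = strassen(A21 + A22, B11)
        P3 = strassen(A11, B12 - B22)
        P4 = strassen(A22, B21 - B11)
        P5 = strassen(A11 + A12, B22)
        P6 = strassen(A21 - A11, B11 + B12)
        P7 = strassen(A12 - A22, B21 + B22)
        return np.block([[P1 + P4 - P5 + P7, P3 + P5],
                         [P2 + P4, P1 + P3 - P2 + P6]])

    A = np.random.rand(8, 8); B = np.random.rand(8, 8)
    print(np.allclose(strassen(A, B), A @ B))   # True

In practice one would stop the recursion at a moderate block size and switch to the standard method, since the constant hidden in the Θ-notation is larger for the Strassen method.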
Remark. Recursive algorithms of this kind are called divide and conquer algorithms.

Using the Strassen multiplication we can calculate D with 7 multiplications of (n/2)×(n/2)-matrices
and 18 additions/subtractions of (n/2)×(n/2)-matrices. Thus we find

    C(n) = 7 C(n/2) + 18 n²/4.    (2.1)

Lemma 2.5. The Strassen multiplication has computational cost C(2^k) = 7·7^k − 6·4^k for all
k ∈ N_0.

Proof. For k = 0 we get C(2^0) = C(1) = 1 = 7 − 6.
Assume the claim is true for k ∈ N_0. Then, using (2.1) with n = 2^{k+1},

    C(2^{k+1}) = 7 C(2^k) + 18·(2^{k+1})²/4
               = 7 (7·7^k − 6·4^k) + 18·4^k
               = 7·7^{k+1} − (7·6 − 18)·4^k
               = 7·7^{k+1} − 6·4^{k+1}.

Now the claim follows by induction.

Theorem 2.6. The Strassen algorithm for matrix-matrix multiplication has asymptotic compu-
tational cost C(n) = Θ(n^{log₂ 7}).

Remark. We will prove the theorem for n = 2^k, k ∈ N_0. If n is not of this form we can extend
the matrices: choose k ∈ N_0 with 2^k ≥ n > 2^{k−1} and define Ã, B̃ ∈ C^{2^k×2^k} by

    Ã = ( A     0_12 ),    B̃ = ( B     0_12 )
        ( 0_21  0_22 )          ( 0_21  0_22 )

where 0_12 ∈ C^{n×(2^k−n)}, 0_21 ∈ C^{(2^k−n)×n} and 0_22 ∈ C^{(2^k−n)×(2^k−n)} are zero matrices of appro-
priate size. The product of Ã and B̃ may again be written in block form:

    Ã B̃ = ( AB    0_12 )
           ( 0_21  0_22 )

Thus we can find the product of the n×n-matrices A and B by multiplying the 2^k×2^k-matrices
Ã and B̃ with the Strassen algorithm. Since we have n ≤ 2^k ≤ 2n, the extended matrices are at
most double the size of the original ones, and because (2n)^α = Θ(n^α) for every α > 0 the result
for n = 2^k implies C(n) = Θ(n^{log₂ 7}) for every n ∈ N.
Proof (of the theorem). Let n = 2^k. Then we have

    7^k = 2^{log₂(7^k)} = 2^{k log₂ 7} = (2^k)^{log₂ 7} = n^{log₂ 7}

and

    4^k = (2^k)² = n².

Using the lemma we get

    C(n) = 7·7^k − 6·4^k = 7 n^{log₂ 7} − 6 n² = Θ(n^{log₂ 7}).

This finishes the proof.

Theorem 2.7. If there is a method to multiply n×n-matrices with asymptotic computational
cost O(n^α) for some α > 0, then it is also possible to invert n×n-matrices with cost O(n^α).

Proof. Let A ∈ C^{n×n} be invertible. The proof consists of three steps.
1) As in the previous theorem we can restrict ourselves to the case n = 2^k for some k ∈ N_0.
If n is not of this form we extend the matrix: choose k ∈ N_0 with 2^k ≥ n > 2^{k−1} and define the
matrix Ã ∈ C^{2^k×2^k} by

    Ã = ( A     0_12 )
        ( 0_21  I_22 )

where 0_12 ∈ C^{n×(2^k−n)} and 0_21 ∈ C^{(2^k−n)×n} are zero matrices and I_22 is the (2^k−n)×(2^k−n)
identity matrix. Then the inverse of this matrix is

    Ã^{−1} = ( A^{−1}  0_12 )
             ( 0_21    I_22 )

and we can invert A by inverting the 2^k×2^k-matrix Ã. Since (2n)^α = Θ(n^α) the asymptotic
cost is unchanged.
2) Since A is invertible we have

    x*(A*A)x = (Ax)*(Ax) = ‖Ax‖_2² > 0

for every x ≠ 0 and thus A*A is positive definite and therefore invertible. We can write

    A^{−1} = (A*A)^{−1} A*.

This allows us to invert A with cost C(n) = D(n) + O(n^α) where D is the cost of inverting a
Hermitian, positive definite matrix and O(n^α) is the cost for matrix-matrix multiplication. So
we can restrict ourselves to the case of Hermitian, positive definite matrices.
3) To determine the cost function D let B be Hermitian and positive definite:

    B = ( B11   B12 )
        ( B12*  B22 )

where the B_jk are (n/2)×(n/2)-matrices. Let S = B22 − B12* B11^{−1} B12. A direct calculation shows that

    B^{−1} = ( B11^{−1} + B11^{−1} B12 S^{−1} B12* B11^{−1}    −B11^{−1} B12 S^{−1} )
             ( −S^{−1} B12* B11^{−1}                            S^{−1}              ).

(The matrix S is called the Schur complement of B11 in B.)
Exercise: show that B11 and S are invertible.
This method to calculate B^{−1} needs 2 inversions of (n/2)×(n/2)-matrices (namely of B11 and of
S), a products of (n/2)×(n/2)-matrices and b sums/subtractions of (n/2)×(n/2)-matrices, where a and b are
independent of n. This shows that

    D(n) ≤ 2 D(n/2) + O(n^α) + O(n²)

where O(n^α) is the cost for the multiplications and O(n²) is the cost for the additions and
subtractions.
From theorem 2.4 we already know α ≥ 2, so we can simplify the above estimate to

    D(n) ≤ 2 D(n/2) + c n^α

for some constant c > 0. With an induction argument (see exercise 4 below) one can conclude

    D(2^k) ≤ c/(1 − 2^{1−α}) · (2^k)^α

and thus we get

    D(2^k) = O((2^k)^α)

for k → ∞. This finishes the proof.
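The recursive inversion in step 3 can be made concrete. The following NumPy sketch (added for illustration; it uses ordinary matrix multiplication in place of a fast method) inverts a Hermitian positive definite matrix of size n = 2^k via the Schur complement.

    import numpy as np

    def invert_hpd(B):
        """Invert a Hermitian positive definite matrix of size n = 2^k
        recursively, using the Schur complement S = B22 - B12* B11^{-1} B12."""
        n = B.shape[0]
        if n == 1:
            return np.array([[1.0 / B[0, 0]]])
        m = n // 2
        B11, B12, B22 = B[:m, :m], B[:m, m:], B[m:, m:]
        B11inv = invert_hpd(B11)
        S = B22 - B12.conj().T @ B11inv @ B12
        Sinv = invert_hpd(S)
        T = B11inv @ B12 @ Sinv                    # the block B11^{-1} B12 S^{-1}
        top_left = B11inv + T @ B12.conj().T @ B11inv
        return np.block([[top_left, -T], [-T.conj().T, Sinv]])

    rng = np.random.default_rng(1)
    M = rng.standard_normal((8, 8))
    B = M @ M.T + 8 * np.eye(8)                    # Hermitian (here real symmetric) positive definite
    print(np.allclose(invert_hpd(B) @ B, np.eye(8)))   # True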

Exercises
1) Let (i) f(n) = n²[1 + sin(n)] and (ii) f(n) = n + n². In each case, which of the following
are true:

    f(n) = O(1);  f(n) = O(n);  f(n) = O(n²);
    f(n) = Ω(1);  f(n) = Ω(n);  f(n) = Ω(n²)?

2) Show that the standard method for matrix-vector multiplication has asymptotic compu-
tational cost C(n) = Θ(n²).
3) Show that the matrices S and B11 in the proof of theorem 2.7 are invertible.
4) Show by induction that D(1) = 1 and

    D(n) ≤ 2 D(n/2) + c n^α

for some constant c > 0 and α ≥ 2 implies

    D(2^k) ≤ c/(1 − 2^{1−α}) · (2^k)^α

for all k ∈ N_0.

Chapter 3

Stability and Conditioning


Rounding errors lead to computational results which are different from the theoretical ones.
The methods from this chapter will help us to answer the following question: how close is the
calculated result to the correct one?

3.1 Conditioning

Definition 3.1. A problem is called well conditioned if small changes in the problem only lead
to small changes in the solution and badly conditioned if small changes in the problem can lead
to large changes in the solution.

For this chapter fix a vector norm ‖·‖ and a matrix norm ‖·‖ which is compatible with the
vector norm, that is, which satisfies

    ‖Ax‖ ≤ ‖A‖ ‖x‖    for all x ∈ C^n, A ∈ C^{n×n}.

This condition is for example satisfied when the matrix norm is induced by the vector norm.

Definition 3.2. The condition number κ(A) of a matrix A ∈ C^{n×n} is the number

    κ(A) = ‖A‖ ‖A^{−1}‖    if A is invertible, and    κ(A) = +∞    otherwise.

Remark. We always have ‖I‖ = ‖AA^{−1}‖ ≤ ‖A‖ ‖A^{−1}‖ = κ(A). For induced matrix norms this
implies κ(A) ≥ 1 for every A ∈ C^{n×n}.

Example. Let A be real, symmetric and positive definite with eigenvalues

    λ_max = λ_1 ≥ λ_2 ≥ · · · ≥ λ_n = λ_min > 0.

Then we have ‖A‖_2 = λ_max. Since the matrix A^{−1} has eigenvalues 1/λ_1, . . . , 1/λ_n we find
‖A^{−1}‖_2 = 1/λ_min and thus the condition number of A in the 2-norm is

    κ(A) = λ_max / λ_min.

Lemma 3.3. Let Ax = b and A(x + Δx) = b + Δb. Assume b ≠ 0. Then

    ‖Δx‖/‖x‖ ≤ κ(A) ‖Δb‖/‖b‖.

Proof. If A is not invertible the right hand side of the inequality is +∞ and everything is clear.
Otherwise we have

    ‖b‖ = ‖Ax‖ ≤ ‖A‖ ‖x‖    (3.1)

and

    A^{−1}(b + Δb) = x + Δx,    A^{−1}Δb = x + Δx − x = Δx.

Therefore we get

    ‖Δx‖/‖x‖ = ‖A^{−1}Δb‖/‖x‖ ≤ ‖A^{−1}‖ ‖Δb‖/‖x‖ ≤ ‖A‖ ‖A^{−1}‖ ‖Δb‖/‖b‖,

where the last inequality is a consequence of (3.1).

The previous lemma gave an upper bound on how much the solution of the equation Ax = b
can change if the right hand side is slightly perturbed. The result shows that the problem is well
conditioned if the condition number κ(A) is small. Theorem 3.5 below gives a similar result for
perturbations of the matrix A instead of the vector b. For the proof we will need the following
lemma.

Lemma 3.4. If A ∈ C^{n×n} satisfies ‖A‖ < 1 in any induced matrix norm, then I + A is invertible
and

    ‖(I + A)^{−1}‖ ≤ (1 − ‖A‖)^{−1}.
Proof. With the triangle inequality we get

    ‖x‖ = ‖(I + A)x − Ax‖
        ≤ ‖(I + A)x‖ + ‖Ax‖
        ≤ ‖(I + A)x‖ + ‖A‖ ‖x‖

and thus

    ‖(I + A)x‖ ≥ (1 − ‖A‖) ‖x‖

for every x ∈ C^n. This implies (I + A)x ≠ 0 for every x ≠ 0, and thus the matrix I + A is
invertible.
Now let b ≠ 0 and x = (I + A)^{−1}b. Then

    ‖(I + A)^{−1}b‖ / ‖b‖ = ‖x‖ / ‖(I + A)x‖ ≤ 1/(1 − ‖A‖).

Since this is true for all b ≠ 0, we have

    ‖(I + A)^{−1}‖ = sup_{b≠0} ‖(I + A)^{−1}b‖ / ‖b‖ ≤ 1/(1 − ‖A‖).

This completes the proof.

Theorem 3.5 (conditioning of SLE). Let x and x + Δx solve the equations

    Ax = b    and    (A + ΔA)(x + Δx) = b.

Assume that A is invertible with ‖A^{−1}‖ ‖ΔA‖ < 1 in some induced matrix norm. Then we have

    ‖Δx‖/‖x‖ ≤ κ(A) (‖ΔA‖/‖A‖) / ( 1 − κ(A) ‖ΔA‖/‖A‖ ).

Proof. We find

    (A + ΔA)Δx = b − (A + ΔA)x = −ΔA x

and thus (I + A^{−1}ΔA)Δx = −A^{−1}ΔA x. Using lemma 3.4 we can write

    Δx = −(I + A^{−1}ΔA)^{−1} A^{−1}ΔA x

and we get

    ‖Δx‖ ≤ ‖(I + A^{−1}ΔA)^{−1}‖ ‖A^{−1}ΔA‖ ‖x‖ ≤ ( ‖A^{−1}ΔA‖ / (1 − ‖A^{−1}ΔA‖) ) ‖x‖.

Since

    ‖A^{−1}ΔA‖ ≤ ‖A^{−1}‖ ‖ΔA‖ = κ(A) ‖ΔA‖/‖A‖

and since the map t ↦ t/(1 − t) is increasing on the interval [0, 1) we get

    ‖Δx‖/‖x‖ ≤ κ(A) (‖ΔA‖/‖A‖) / ( 1 − κ(A) ‖ΔA‖/‖A‖ ).

This is the required result.

3.2 Stability

Stability of an algorithm measures how susceptible it is to rounding errors occurring during the
computation. In this section we will for the first time distinguish between the computed result as
returned by a computer and the exact result which would be the mathematically correct solution
of the problem.

Definition 3.6. Assume we want to numerically calculate a value y = f(x), but the algorithm
returns the computed result ỹ ≠ y which can be represented as the exact image ỹ = f(x̃) of a
different input value x̃. Then Δy = ỹ − y is called the forward error and Δx = x̃ − x is called
the backward error.

[Diagram: the exact map f takes the input x to the exact result y = f(x) and the perturbed
input x̃ to ỹ = f(x̃); the calculated result for input x is ỹ, so the backward error Δx lives in the
input data and the forward error Δy in the results.]

If x̃ is not unique then we choose the one which results in minimal ‖Δx‖. Typically we
consider the relative backward error ‖Δx‖/‖x‖ and the relative forward error ‖Δy‖/‖y‖.

Theorem 3.7. For the problem SLE we have

    relative forward error ≤ κ(A) · relative backward error.

Proof. Note that in the notation of SLE the relative forward error is ‖Δx‖/‖x‖ and the relative
backward error is ‖Δb‖/‖b‖. Therefore the theorem is an immediate consequence of lemma 3.3.

Internally computers represent real numbers using only a finite number of bits. Thus they
can only represent finitely many numbers and when dealing with general real numbers rounding
errors will occur. Let F ⊆ R be the set of representable numbers and let fl: R → F be the
rounding to the closest element of F.
In this course we will use a simplified model for computer arithmetic which is described
by the following two assumptions. The main simplification is that we ignore the problems of
numbers which are unrepresentable because they are very large (overflows) or very close to zero
(underflows).

Assumptions. There is a parameter ε_m > 0 (the machine epsilon) such that the following
conditions hold.

A1: For all x ∈ R there is an ε ∈ (−ε_m, +ε_m) with

    fl(x) = x (1 + ε).

A2: For each operation ∗ ∈ {+, −, ·, /} and every x, y ∈ F there is an ε ∈ (−ε_m, +ε_m) with

    x ⊛ y = (x ∗ y)(1 + ε),

where ⊛ denotes the computed version of ∗.
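For IEEE double precision arithmetic, the format used by standard Python and NumPy floats, ε_m is about 1.1·10⁻¹⁶. The following two lines (added as an illustration) show this value and a rounding error of that order of magnitude.

    import numpy as np

    print(np.finfo(float).eps / 2)   # unit roundoff, about 1.1e-16, playing the role of eps_m
    print(0.1 + 0.2 - 0.3)           # not exactly zero: about 5.6e-17, an error of order eps_m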

Definition 3.8. An algorithm is called backward stable if the relative backward error satisfies

    ‖Δx‖/‖x‖ = O(ε_m).

The typical way to use this concept is a two-step procedure called backward error analysis.
In a first step one shows that the algorithm in question is backward stable, i.e. that the influence
of rounding errors can be represented as a small perturbation Δx of the original problem. In
the second step one uses results like theorem 3.7 about the conditioning of the problem (which
does not depend on the algorithm used) to show that the forward error is also small. Together
these steps show that the calculated result is close to the exact result.

Lemma 3.9. The calculated subtraction is backward stable.

Proof. The exact result of a subtraction is given by f(x_1, x_2) = x_1 − x_2, the computed result is
f̃(x_1, x_2) = fl(x_1) ⊖ fl(x_2). Using assumption A1 we get

    fl(x_1) = x_1 (1 + ε_1)    and    fl(x_2) = x_2 (1 + ε_2)

with |ε_1|, |ε_2| < ε_m. Using assumption A2 we get

    fl(x_1) ⊖ fl(x_2) = ( fl(x_1) − fl(x_2) )(1 + ε_3)

where |ε_3| < ε_m. This gives

    fl(x_1) ⊖ fl(x_2) = ( x_1(1 + ε_1) − x_2(1 + ε_2) )(1 + ε_3)
                      = x_1(1 + ε_1)(1 + ε_3) − x_2(1 + ε_2)(1 + ε_3)
                      = x_1(1 + ε_4) − x_2(1 + ε_5)

where ε_4 = ε_1 + ε_3 + ε_1ε_3 and ε_5 = ε_2 + ε_3 + ε_2ε_3 satisfy |ε_4|, |ε_5| ≤ 2ε_m + O(ε_m²) = O(ε_m) for
ε_m → 0.
Thus for the input error we can choose

    x = (x_1, x_2),    x̃ = ( x_1(1 + ε_4), x_2(1 + ε_5) ),    Δx = x̃ − x

and we get ‖Δx‖_2 = ( ε_4² x_1² + ε_5² x_2² )^{1/2} ≤ O(ε_m) ‖x‖_2. This completes the proof.

Remarks. 1) The above proof is a case where the x̃ from the definition of the backward error
is not uniquely determined. But since we are only interested in the x̃ which minimises the
backward error, we can choose any x̃ which gives the result ‖Δx‖_2 ≤ O(ε_m)‖x‖_2. The true
minimiser can only be better.
2) Similar proofs show that the operations ⊕, ⊙ and ⊘ (computed addition, multiplication
and division) are also backward stable.
3) Proofs of backward stability have to analyse the influence of rounding errors and thus are
always based on our assumptions A1 and A2 about computer arithmetic. Since they tend to be
long but not difficult we omit most of these proofs in the course.

Exercises
1) Choose a matrix norm and calculate the condition number of the matrix

    A = ( ε  1 )
        ( 1  1 )

in this norm.
2) Let x be a solution of Ax = b and let x̃ be a solution of (A + ΔA)x̃ = b + Δb. Show that

    ‖x̃ − x‖/‖x‖ ≤ ( κ(A) / ( 1 − κ(A) ‖ΔA‖/‖A‖ ) ) ( ‖ΔA‖/‖A‖ + ‖Δb‖/‖b‖ ).

3) Show that the calculated arithmetic operations ⊕, ⊙ and ⊘ are backward stable.

Chapter 4

Systems of Linear Equations


In this chapter we analyse the following problem: given A ∈ C^{n×n} and b ∈ C^n, find an x ∈ C^n
such that Ax = b. This is the problem we denote by SLE.
We present several methods to solve this problem. The common idea is to split the matrix
A into simpler matrices M and N as A = MN and to solve first the system My = b and then
Nx = y. The result is a vector x with Ax = MNx = My = b and thus we have solved SLE.

4.1 Gaussian Elimination

Gaussian elimination is the most commonly known method to solve systems of linear equations.
The mathematical background of the algorithm is the following theorem.

Definition 4.1. A matrix A ∈ C^{m×n} is said to be upper triangular if a_ij = 0 for all i > j and
lower triangular if a_ij = 0 for all i < j. A triangular matrix is said to be unit triangular if all
diagonal entries are equal to 1.

Definition 4.2. The jth principal sub-matrix of a matrix A ∈ C^{n×n} is the matrix A^j ∈ C^{j×j}
with (A^j)_kl = a_kl for 1 ≤ k, l ≤ j.

Theorem 4.3 (LU Factorisation). a) Let A ∈ C^{n×n} be a matrix such that A^j is invertible for
j = 1, . . . , n. Then there is a unique factorisation A = LU where L is unit lower triangular and
U is non-singular upper triangular. b) If A^j is singular for some j ∈ {1, . . . , n} then there is no
such factorisation.
The following picture gives a graphical representation of the LU-factorisation.

Proof. a) We use a proof by induction. If n = 1 we can set L = (1) ∈ C^{1×1} and U = (a_11) ∈ C^{1×1}
to get A = LU. Since L is the only unit lower triangular 1×1-matrix the factorisation is unique.
Now let n > 1 and assume that any matrix in C^{(n−1)×(n−1)} all of whose principal sub-matrices
are invertible can be uniquely factorised in the required form. We write A ∈ C^{n×n} as

    A = ( A^{n−1}  b    )
        ( c*       a_nn )    (4.1)

where A^{n−1} is the (n−1)th principal sub-matrix of A, and b, c ∈ C^{n−1} and a_nn ∈ C are the
remaining blocks. We are looking for a factorisation of the form

    A = ( L   0 ) ( U   u )  =  ( LU    Lu       )
        ( ℓ*  1 ) ( 0   η )     ( ℓ*U   ℓ*u + η  )    (4.2)

with L ∈ C^{(n−1)×(n−1)} unit lower triangular, U ∈ C^{(n−1)×(n−1)} invertible upper triangular,
ℓ, u ∈ C^{n−1} and η ∈ C. We compare the blocks of (4.1) and (4.2).
By the induction hypothesis L and U with A^{n−1} = LU exist and are unique. Since the matrix
L is invertible the condition Lu = b determines a unique vector u. Since U is invertible there is
a uniquely determined ℓ with U*ℓ = c and thus ℓ*U = c*. Finally the condition ℓ*u + η = a_nn
uniquely determines η ∈ C. This shows that the required factorisation for A exists and is unique.
Since 0 ≠ det(A) = 1 · det(U) · η, the resulting upper triangular factor of A is non-singular.
b) Assume that A has an LU-factorisation and let j ∈ {1, . . . , n}. Then we can write A = LU
in block form as

    ( A11  A12 )  =  ( L11  0   ) ( U11  U12 )  =  ( L11 U11   L11 U12           )
    ( A21  A22 )     ( L21  L22 ) ( 0    U22 )     ( L21 U11   L21 U12 + L22 U22 )

where A11, L11, U11 ∈ C^{j×j}. We get

    det(A^j) = det(A11) = det(L11 U11) = det(L11) det(U11) = 1 · det(U11) ≠ 0

and thus A^j is non-singular.

Example. The matrix A can be converted into upper triangular shape by multiplying lower
triangular matrices from the left. Let for example

    A = ( 2  1  1 )
        ( 4  3  3 )
        ( 8  7  9 ).

Then we can create zeros in the first column below the diagonal by subtracting multiples of the
first row from the other rows. In matrix notation this can be written as

    L1 A = (  1       ) ( 2  1  1 )   ( 2  1  1 )
           ( −2  1    ) ( 4  3  3 ) = ( 0  1  1 )
           ( −4     1 ) ( 8  7  9 )   ( 0  3  5 ).

Repeating this for the second column gives

    L2 L1 A = ( 1       ) ( 2  1  1 )   ( 2  1  1 )
              (    1    ) ( 0  1  1 ) = ( 0  1  1 )
              (   −3  1 ) ( 0  3  5 )   ( 0  0  2 ).

We have found lower triangular matrices L1 and L2 and an upper triangular matrix R with
A = (L2 L1)^{−1} R. The following lemma helps to calculate (L2 L1)^{−1} = L1^{−1} L2^{−1}.

Lemma 4.4. a) Let L = (ℓ_ij) be unit lower triangular with non-zero entries below the diagonal
only in column k. Then L^{−1} is also unit lower triangular with non-zero entries below the diagonal
only in column k and we have (L^{−1})_ik = −ℓ_ik for all i > k.
b) Let A = (a_ij) and B = (b_ij) be unit lower triangular n×n-matrices where A has non-
zero entries below the diagonal only in columns 1, . . . , k and B has non-zero entries below the
diagonal only in columns k+1, . . . , n. Then AB is unit lower triangular with (AB)_ij = a_ij for
j ∈ {1, . . . , k} and (AB)_ij = b_ij for j ∈ {k+1, . . . , n}.

Proof. a) Multiplying L with the suggested inverse gives the identity. b) Direct calculation.

Example. For the matrices L1 and L2 from the previous example we get

    (L2 L1)^{−1} = L1^{−1} L2^{−1} = ( 1       ) ( 1       )   ( 1       )
                                     ( 2  1    ) (    1    ) = ( 2  1    )
                                     ( 4     1 ) (    3  1 )   ( 4  3  1 ).

Thus we found the LU-factorisation

    ( 2  1  1 )   ( 1       ) ( 2  1  1 )
    ( 4  3  3 ) = ( 2  1    ) (    1  1 )
    ( 8  7  9 )   ( 4  3  1 ) (       2 ).

The technique to convert A into an upper triangular matrix by multiplying lower triangular
matrices leads to the following algorithm:

Algorithm LU (LU-factorisation).
input: A ∈ C^{n×n} with det(A^j) ≠ 0 for j = 1, . . . , n
output: L, U ∈ C^{n×n} where A = LU is the LU-factorisation of A
1: U = A, L = I
2: for k = 1, . . . , n − 1 do
3:   for j = k + 1, . . . , n do
4:     l_jk = u_jk / u_kk
5:     (u_j,k, . . . , u_j,n) = (u_j,k, . . . , u_j,n) − l_j,k (u_k,k, . . . , u_k,n)
6:   end for
7: end for
Remarks. Line 5 of the algorithm subtracts a multiple of row k from row j, causing u_jk = 0
without changing columns 1, . . . , k−1. This corresponds to multiplication with a lower triangular
matrix L_k as in the example above. Thus after the loop ending in line 6 is finished, the current
value of the matrix U is L_k · · · L_1 A and it has zeros below the diagonal in columns 1, . . . , k.
Since the principal sub-matrices A^j are non-singular and the matrices L_j are unit triangular
we get

    det (L_k · · · L_1 A)^{k+1} = det A^{k+1} ≠ 0

and thus we have u_kk ≠ 0 in line 4. Lemma 4.4 shows that the algorithm calculates the correct
entry l_jk for the matrix L = (L_{n−1} · · · L_1)^{−1}.
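A direct NumPy transcription of algorithm LU (added as an illustration; it assumes the principal sub-matrices are non-singular and performs no pivoting):

    import numpy as np

    def lu_factorise(A):
        """LU-factorisation without pivoting, following algorithm LU."""
        n = A.shape[0]
        U = np.array(A, dtype=float)
        L = np.eye(n)
        for k in range(n - 1):
            for j in range(k + 1, n):
                L[j, k] = U[j, k] / U[k, k]        # line 4
                U[j, k:] -= L[j, k] * U[k, k:]     # line 5
        return L, U

    A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
    L, U = lu_factorise(A)
    print(L)                        # matches the matrices from the example above
    print(U)
    print(np.allclose(L @ U, A))    # True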

The last missing building block for the Gaussian elimination method is the following algorithm
to solve systems of linear equations when the coefficient matrix is triangular.

Algorithm BS (back substitution).
input: U ∈ C^{n×n} non-singular, upper triangular and b ∈ C^n
output: x ∈ C^n with Ux = b
1: for j = n, . . . , 1 do
2:   x_j = ( b_j − Σ_{k=j+1}^n u_jk x_k ) / u_jj
3: end for
Remarks. 1) Since U is triangular we get

    (Ux)_i = Σ_{j=i}^n u_ij x_j
           = u_ii · (1/u_ii) ( b_i − Σ_{k=i+1}^n u_ik x_k ) + Σ_{j=i+1}^n u_ij x_j
           = b_i.

Thus the algorithm is correct.
2) The corresponding algorithm to solve Lx = b where L is a lower triangular matrix is called
forward substitution.
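Back substitution is equally short in NumPy (an added sketch, not from the original notes):

    import numpy as np

    def back_substitute(U, b):
        """Solve Ux = b for a non-singular upper triangular matrix U."""
        n = U.shape[0]
        x = np.zeros(n, dtype=complex)
        for j in range(n - 1, -1, -1):                        # j = n, ..., 1
            x[j] = (b[j] - U[j, j + 1:] @ x[j + 1:]) / U[j, j]
        return x

    U = np.array([[2., 1., 1.], [0., 1., 1.], [0., 0., 2.]])
    print(back_substitute(U, np.array([1., 2., 3.])))         # [-0.5  0.5  1.5] (as complex numbers)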

Combining all our preparations we get the Gaussian elimination algorithm to solve the prob-
lem SLE:

Algorithm GE (Gaussian elimination).
input: A ∈ C^{n×n} with det(A^j) ≠ 0 for j = 1, . . . , n, and b ∈ C^n
output: x ∈ C^n with Ax = b
1: find the LU-factorisation of A
2: solve Ly = b using forward substitution
3: solve Ux = y using back substitution

The result of this algorithm is an x ∈ C^n with Ax = LUx = Ly = b and thus the algorithm
gives the correct result.

Computational Complexity of Gaussian Elimination

Lemma 4.5. The LU-factorisation algorithm has computational cost

    C(n) = (2/3) n³ + (1/2) n² − (7/6) n.

Proof. We have to count the number of floating point operations in the LU-factorisation algo-
rithm. Line 5 requires (n−k+1) multiplications and (n−k+1) subtractions, i.e. 2(n−k+1)
operations. Line 4 contributes one division. Thus the loop starting at line 3 needs (n−k)( 1 +
2(n−k+1) ) operations. Considering the outer loop the total number of operations is

    C(n) = Σ_{k=1}^{n−1} (n−k)( 1 + 2(n−k+1) ) = Σ_{k=1}^{n−1} ( 2(n−k)² + 3(n−k) ).

The claim now follows by induction (or by evaluating the sums directly).

Notation. For f, g: N → N or f, g: R_+ → R_+ we write

    f(x) ∼ g(x)    for x → ∞

if lim_{x→∞} f(x)/g(x) = 1. This implies f(x) = Θ(g(x)) but is a stronger statement since we also
compare the leading constants. Using this notation the claim of the lemma becomes C(n) ∼ (2/3) n³.

Lemma 4.6. Forward substitution and back substitution have computational cost C(n) ∼ n².

Proof. Calculating x_j in the back substitution algorithm needs 2(n−j) + 1 operations. Thus
the total cost is

    C(n) = Σ_{j=1}^n ( 2(n−j) + 1 ) = 2 Σ_{k=0}^{n−1} k + n = n(n−1) + n = n².

A similar argument applies to the situation of forward substitution.

Theorem 4.7 (Computational complexity of Gaussian Elimination). The asymptotic computa-
tional cost of the GE algorithm is of order

    C(n) ∼ (2/3) n³.

Proof. This is an immediate consequence of the previous two lemmas.

Error Analysis of Gaussian Elimination

Theorem 4.8. The back substitution algorithm is backward stable: the computed solution x̃
satisfies (U + ΔU)x̃ = b for some upper triangular matrix ΔU ∈ C^{n×n} with

    ‖ΔU‖/‖U‖ = O(ε_m).

The proof makes extensive use of assumptions A1 and A2 about computer arithmetic and
we omit it here. Using theorem 3.5 about the conditioning of SLE we get an upper bound on
the error in the computed result of back substitution:

    ‖x̃ − x‖/‖x‖ ≤ κ(U) (‖ΔU‖/‖U‖) / ( 1 − κ(U) ‖ΔU‖/‖U‖ ) = κ(U) O(ε_m).

Thus the back substitution step of Gaussian elimination is numerically stable and introduces
no problem. The same holds for forward substitution.

Problem. LU-factorisation is not backward stable! The effect is illustrated by the following
example. Let

    A = ( ε  1 )
        ( 1  1 )

for some ε > 0. Then A has an LU-factorisation A = LU with

    L = ( 1       0 ),    U = ( ε   1          ).
        ( ε^{−1}  1 )         ( 0   1 − ε^{−1} )

Now assume ε ≪ 1. Then ε^{−1} is a huge number and the representation of these matrices stored
in a computer will be rounded. The matrices might be represented as

    L̃ = ( 1       0 ),    Ũ = ( ε   1        ),
        ( ε^{−1}  1 )          ( 0   −ε^{−1} )

which is compatible with assumption A1 on rounding errors. We have L̃ ≈ L and Ũ ≈ U. But
multiplying the two rounded matrices gives

    L̃ Ũ = ( ε  1 ) = A + ( 0   0 ).
           ( 1  0 )       ( 0  −1 )

A small rounding error led to a large difference in the result! The example shows that for
Gaussian elimination a backward error analysis will, in general, lead to the conclusion that the
perturbed problem is not close to the original one.
Note that this problem is not related to the conditioning of the matrix A. We have

    A^{−1} = (1 − ε)^{−1} ( −1   1 )
                          (  1  −ε )

and thus κ_∞(A) = ‖A‖_∞ ‖A^{−1}‖_∞ ≈ 4 for small ε > 0, so the matrix A is well conditioned.

Because of this instability the classical Gaussian elimination method is not used in numerical
software. The next section introduces a modification of Gaussian elimination which cures this
problem.

4.2 Gaussian Elimination with Partial Pivoting

Definition 4.9. A matrix P ∈ R^{n×n} is called a permutation matrix if every row and every
column contains n−1 zeros and one 1.

Example. If σ: {1, . . . , n} → {1, . . . , n} is a permutation, then the matrix P = (p_ij) with

    p_ij = 1 if j = σ(i), and p_ij = 0 otherwise,

is a permutation matrix. (Every permutation matrix is of this form.) In particular the identity
matrix is a permutation matrix.

Remarks. 1) If P is a permutation matrix then we have

    (P^T P)_ij = Σ_{k=1}^n p_ki p_kj = δ_ij

and thus P^T P = I. This shows that permutation matrices are orthogonal and have P^{−1} = P^T.
2) If P is the permutation matrix corresponding to the permutation σ, then (P^{−1})_ij = 1 if
and only if j = σ^{−1}(i). Thus the permutation matrix P^{−1} corresponds to the permutation σ^{−1}.
3) We get

    (PA)_ij = Σ_{k=1}^n p_ik a_kj = a_{σ(i),j}

for all i, j ∈ {1, . . . , n}. This shows that multiplying with a permutation matrix from the left reorders
the rows of A. Furthermore we have

    (AP)_ij = Σ_{k=1}^n a_ik p_kj = a_{i,σ^{−1}(j)}

and hence multiplying with a permutation matrix from the right reorders the columns of A.

The problem in our example for the instability of Gaussian elimination was caused by the fact
that we had to divide by the tiny number u_kk = ε in step 4 of the LU-factorisation algorithm. We
will avoid this problem in the improved version of the algorithm by rearranging rows k, . . . , n at
the beginning of the kth iteration in order to maximise the modulus of the element u_kk. The following argument
shows that the modified algorithm still works correctly.
We want to calculate

    U = L_{n−1} P_{n−1} · · · L_1 P_1 A.

Multiplying with P_k from the left exchanges rows k and i_k where i_k is chosen to maximise the
element |u_{i_k,k}|. We can rewrite this as

    U = L′_{n−1} · · · L′_1 P_{n−1} · · · P_1 A

where

    L′_k = P_{n−1} · · · P_{k+1} L_k P_{k+1}^{−1} · · · P_{n−1}^{−1}

for k = 1, . . . , n−1. Since P_{n−1} · · · P_{k+1} exchanges rows k+1, . . . , n and P_{k+1}^{−1} · · · P_{n−1}^{−1} performs
the corresponding permutation on the columns k+1, . . . , n, the shape of L′_k is the same as the
shape of L_k: it is unit lower triangular and the only non-vanishing entries below the diagonal
are in column k. Hence we can still use lemma 4.4 to calculate L = (L′_{n−1} · · · L′_1)^{−1}. The above
arguments lead to the following algorithm.

Algorithm LUPP (LU-factorisation with partial pivoting).
input: A ∈ C^{n×n} non-singular
output: L, U, P ∈ C^{n×n} where PA = LU with L unit lower triangular,
        U non-singular upper triangular and P a permutation matrix
1: U = A, L = I, P = I
2: for k = 1, . . . , n − 1 do
3:   choose i ∈ {k, . . . , n} which maximises |u_ik|
4:   exchange row (u_k,k, . . . , u_k,n) with (u_i,k, . . . , u_i,n)
5:   exchange row (l_k,1, . . . , l_k,k−1) with (l_i,1, . . . , l_i,k−1)
6:   exchange row (p_k,1, . . . , p_k,n) with (p_i,1, . . . , p_i,n)
7:   for j = k + 1, . . . , n do
8:     l_jk = u_jk / u_kk
9:     (u_j,k, . . . , u_j,n) = (u_j,k, . . . , u_j,n) − l_j,k (u_k,k, . . . , u_k,n)
10:  end for
11: end for
Remarks. 1) The resulting matrix L has |l_ij| ≤ 1 for all i, j ∈ {1, . . . , n}.
2) The computational complexity is the same as for the LU-algorithm. This is trivial in our
simplified analysis since we only added steps to exchange rows to the LU-algorithm and we do
not count these operations. But the result still holds for a more detailed complexity analysis: the
number of additional assignments is of order O(n²) and can be neglected for a Θ(n³) algorithm.
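A NumPy version of algorithm LUPP (an illustrative sketch, not part of the original notes):

    import numpy as np

    def lu_partial_pivoting(A):
        """LU-factorisation with partial pivoting: returns P, L, U with P A = L U."""
        n = A.shape[0]
        U = np.array(A, dtype=float)
        L = np.eye(n)
        P = np.eye(n)
        for k in range(n - 1):
            i = k + np.argmax(abs(U[k:, k]))      # pivot row (line 3)
            U[[k, i], k:] = U[[i, k], k:]         # line 4
            L[[k, i], :k] = L[[i, k], :k]         # line 5
            P[[k, i], :] = P[[i, k], :]           # line 6
            for j in range(k + 1, n):
                L[j, k] = U[j, k] / U[k, k]       # line 8
                U[j, k:] -= L[j, k] * U[k, k:]    # line 9
        return P, L, U

    A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
    P, L, U = lu_partial_pivoting(A)
    print(np.allclose(P @ A, L @ U))              # True
    print(abs(L).max() <= 1)                      # True: the entries of L have modulus at most 1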

Gaussian elimination with partial pivoting now works as follows:

Algorithm GEPP (Gaussian elimination with partial pivoting).
input: A ∈ C^{n×n} non-singular, b ∈ C^n
output: x ∈ C^n with Ax = b
1: find PA = LU using algorithm LUPP
2: solve Ly = Pb using forward substitution
3: solve Ux = y using back substitution

The result of this algorithm is an x ∈ C^n with Ax = P^{−1}LUx = P^{−1}Ly = P^{−1}Pb = b and
thus the algorithm is correct.

Error Analysis of GEPP

Theorem 4.10 (Backward Error Analysis for LUPP). The computed result of algorithm LUPP
satisfies

    L̃ Ũ = P(A + ΔA)

with

    ‖ΔA‖/‖A‖ ≤ ε_m p(n) g_n(A)

where p is a polynomial and g_n(A) (the growth factor) is defined as

    g_n(A) = max_ij |u_ij| / max_ij |a_ij|.
Lemma 4.11. The growth factor g_n(A) satisfies g_n(A) ≤ 2^{n−1} for every A ∈ C^{n×n}.

Proof. We have U = L′_{n−1} · · · L′_1 PA. For any k ∈ {1, . . . , n−1} and any matrix A ∈ C^{n×n} we
find

    max_{i,j} |(L′_k A)_ij| = max_{i,j} | Σ_{l=1}^n (L′_k)_il a_lj |
                            ≤ ( max_{1≤i≤n} Σ_{l=1}^n |(L′_k)_il| ) max_{i,j} |a_ij|
                            ≤ 2 max_{i,j} |a_ij|,

since every row of L′_k contains the diagonal entry 1 and at most one further entry of modulus
at most 1. Thus by applying the matrices L′_1, . . . , L′_{n−1} in order we get

    max_{i,j} |u_ij| ≤ 2^{n−1} max_{i,j} |(PA)_ij| = 2^{n−1} max_{i,j} |a_ij|.

This completes the proof.

Remarks. Theorem 4.10 and lemma 4.11 together show that algorithm LUPP, and hence
algorithm GEPP, is backward stable.

Example. Let

    A = (  1                  1 )
        ( −1   1              1 )
        ( −1  −1   1          1 )
        (  ⋮            ⋱     ⋮ )
        ( −1  −1  · · ·  −1   1 )    ∈ R^{n×n}.

Since all relevant elements have modulus 1 we do not need to use pivoting and LU-factorisation
gives

    U = ( 1                 1       )
        (    1              2       )
        (       1           4       )
        (           ⋱       ⋮       )
        (              1  2^{n−2}   )
        (                 2^{n−1}   )    ∈ R^{n×n}.

Thus for the matrix A as defined above we get

    g_n(A) = max_ij |u_ij| / max_ij |a_ij| = 2^{n−1}.

The example shows that the bound from lemma 4.11 is sharp. Thus the constant for the
backward error in theorem 4.10 can grow exponentially fast in n. We showed that GEPP is
backward stable, but there are matrices where algorithm GEPP does not work very well. Since experience
shows that these matrices hardly ever occur in practice, the GEPP method is nevertheless commonly
used.

4.3 The QR-Factorisation

The Householder QR-factorisation is another method to solve SLE. QR-factorisation avoids the
instability of the LU-factorisation presented at the end of the previous section, but the computation
takes about double the number of operations. The method is based on the following theorem.

Theorem 4.12 (QR Factorisation). Every matrix A ∈ C^{m×n} with m ≥ n can be written as
A = QR where Q ∈ C^{m×m} is unitary and R ∈ C^{m×n} is upper triangular.

Remark. The factorisation in the theorem is called the full QR-factorisation. Since all entries
below the diagonal of R are 0, the columns n+1, . . . , m of Q do not contribute to the product QR.
Let Q̂ ∈ C^{m×n} consist of the first n columns of Q and R̂ ∈ C^{n×n} consist of the first n rows of R.
Then we have A = Q̂R̂. This is called the reduced QR-factorisation of A. The following picture
illustrates the situation.

Proof. Let a_1, . . . , a_n ∈ C^m be the columns of A and q_1, . . . , q_m ∈ C^m be the columns of Q. The
proof is based on the Gram-Schmidt orthonormalisation method to construct Q and R:

1: for j = 1, . . . , n do
2:   r_kj = ⟨q_k, a_j⟩ for k = 1, . . . , j−1
3:   q̃_j = a_j − Σ_{k=1}^{j−1} r_kj q_k
4:   r_jj = ‖q̃_j‖_2
5:   if r_jj > 0 then
6:     q_j = q̃_j / r_jj
7:   else
8:     let q_j be an arbitrary normalised vector orthogonal to q_1, . . . , q_{j−1}
9:   end if
10: end for
11: choose q_{n+1}, . . . , q_m to make q_1, . . . , q_m an orthonormal system.

This algorithm calculates the columns q_1, . . . , q_m of the matrix Q and the entries of R which
are on or above the diagonal. The entries of R below the diagonal are 0. For the matrices Q
and R we get

    (QR)_ij = ( Σ_{k=1}^j q_k r_kj )_i = ( Σ_{k=1}^{j−1} q_k r_kj + q̃_j )_i = (a_j)_i

and thus A = QR.
By construction we have ‖q_j‖_2 = 1 for j = 1, . . . , m. We use induction to show that the
columns q_1, . . . , q_j are orthogonal for all j ∈ {1, . . . , m}. For j = 1 there is nothing to show.
Now let j > 1 and assume that q_1, . . . , q_{j−1} are orthogonal. We have to prove ⟨q_i, q_j⟩ = 0 for
i = 1, . . . , j−1. If r_jj = 0, this holds by definition of q_j. Otherwise we have

    ⟨q_i, q_j⟩ = (1/r_jj) ⟨q_i, q̃_j⟩
              = (1/r_jj) ( ⟨q_i, a_j⟩ − Σ_{k=1}^{j−1} r_kj ⟨q_i, q_k⟩ )
              = (1/r_jj) ( ⟨q_i, a_j⟩ − r_ij ) = 0.

Thus induction shows that the columns of Q are orthonormal and that Q is unitary.

Remarks. 1) The Gram-Schmidt orthonormalisation used in the proof is numerically unstable
and should not be used to calculate a QR-factorisation in practice.
2) For m = n we get square matrices Q, R ∈ C^{n×n}. Since

    det(A) = det(QR) = det(Q) det(R)

and |det(Q)| = 1, the matrix R is invertible if and only if A is invertible.

The following algorithm solves the problem SLE using QR-factorisation. In order to apply
it we will need a numerically stable method to calculate the QR-factorisation and we need to
calculate the matrix-vector product Q*b.

Algorithm (solving SLE by QR-factorisation).
input: A ∈ C^{n×n} non-singular, b ∈ C^n
output: x ∈ C^n with Ax = b
1: find the QR-factorisation A = QR
2: calculate y = Q*b
3: solve Rx = y using back substitution

The result of this algorithm is an x ∈ C^n with Ax = QRx = Qy = QQ*b = b and thus the
algorithm is correct. To calculate the QR-factorisation we will use the following algorithm. We
present the full algorithm first and then analyse it to see how it works. The algorithm uses the
sign function, which is defined as follows:

    sign(x) = +1 if x ≥ 0, and sign(x) = −1 otherwise.

Algorithm QR (Householder QR-factorisation).
input: A ∈ R^{m×n} with m ≥ n
output: Q ∈ R^{m×m} orthogonal, R ∈ R^{m×n} upper triangular with A = QR
1: Q = I, R = A
2: for k = 1, . . . , n − 1 do
3:   u = (r_kk, . . . , r_mk) ∈ R^{m−k+1}
4:   v = sign(u_1) ‖u‖_2 e_1 + u, where e_1 = (1, 0, . . . , 0) ∈ R^{m−k+1}
5:   v = v / ‖v‖_2
6:   H_k = I_{m−k+1} − 2vv* ∈ R^{(m−k+1)×(m−k+1)}
7:   Q_k = ( I_{k−1}  0   )
           ( 0        H_k )
8:   R = Q_k R
9:   Q = Q Q_k
10: end for
Remarks. 1) The algorithm calculates matrices Q_k with Q_k* = Q_k for k = 1, . . . , n−1 as well
as R = Q_{n−1} · · · Q_1 A and Q = Q_1 · · · Q_{n−1}.
2) We will see that Q_k · · · Q_1 A has zeros below the diagonal in columns 1, . . . , k and thus the
final result R = Q_{n−1} · · · Q_1 A is upper triangular.
3) The only use we make of the matrix Q when solving SLE by QR-factorisation is to
calculate Q*b. Thus for solving SLE we can omit the explicit calculation of Q by replacing line 9
of algorithm QR with the statement b = Q_k b. The final result in the variable b will then be

    Q_{n−1} · · · Q_1 b = (Q_1 · · · Q_{n−1})* b = Q* b.

Householder Reflections
In step 8 of algorithm QR we calculate a product of the form

    Q_k ( R11  R12 )  =  ( R11  R12      )
        ( 0    R22 )     ( 0    H_k R22 )    (4.3)

where R11 ∈ R^{(k−1)×(k−1)} and H_k, R22 ∈ R^{(m−k+1)×(m−k+1)}. The purpose of the current section
is to understand this step of the algorithm.
If H_k as calculated in step 6 of algorithm QR is applied to a vector x ∈ R^{m−k+1} the result is

    H_k x = x − 2vv*x = x − 2v⟨v, x⟩.

Since the vector v⟨v, x⟩ is the projection of x onto v, the value x − v⟨v, x⟩ is the projection of
x onto the plane which is orthogonal to v, and x − 2v⟨v, x⟩ is the reflection of x at that plane.
Reflecting twice at the same plane gives back the original vector and thus we find

    H_k* H_k = H_k H_k = I.

This shows that the matrices H_k, and then also Q_k, are orthogonal for every k ∈ {1, . . . , n−1}.
The vector which defines the reflection plane is either v = u + ‖u‖_2 e_1 or v = u − ‖u‖_2 e_1,
depending on the sign of u_1. The corresponding reflection maps the vector u to H_k u = −‖u‖_2 e_1
or H_k u = ‖u‖_2 e_1 respectively. In either case the image is a multiple of e_1 and since u is the
first column of the matrix block R22 the product H_k R22 has zeros below the diagonal in the
first column. The first column of R22 is the kth column of R and thus Q_k R has zeros below
the diagonal in columns 1, . . . , k. For k = n−1 we find that R = Q_{n−1} · · · Q_1 A is an upper
triangular matrix as required.

Remarks. 1) The matrices H_k and sometimes also Q_k are called Householder reflections.
2) The choice of sign in the definition of v helps to increase the stability of the algorithm by
avoiding cancellation in the cases u ≈ ‖u‖_2 e_1 and u ≈ −‖u‖_2 e_1.
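The following NumPy sketch (added for illustration) implements algorithm QR, accumulating the matrix Q explicitly; for solving SLE one would instead apply the reflections H_k directly to b as described in the remarks above. The handling of the case u = 0 is omitted.

    import numpy as np

    def householder_qr(A):
        """Householder QR-factorisation of an m x n matrix with m >= n.
        For a square matrix this performs the n-1 reflections of algorithm QR;
        for m > n one extra reflection handles the last column."""
        m, n = A.shape
        R = np.array(A, dtype=float)
        Q = np.eye(m)
        for k in range(min(n, m - 1)):
            u = R[k:, k]
            s = 1.0 if u[0] >= 0 else -1.0                 # sign(u_1) as defined above
            v = u + s * np.linalg.norm(u) * np.eye(m - k)[0]
            v = v / np.linalg.norm(v)
            H = np.eye(m - k) - 2.0 * np.outer(v, v)       # the Householder reflection H_k
            R[k:, :] = H @ R[k:, :]                        # line 8: R = Q_k R
            Q[:, k:] = Q[:, k:] @ H                        # line 9: Q = Q Q_k
        return Q, R

    A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
    Q, R = householder_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))   # True True
    print(np.allclose(np.tril(R, -1), 0))                           # True: R is upper triangular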

Computational Cost
We considered two variants of algorithm QR, either calculating the full matrix Q as formulated
in line 9 of the algorithm or only calculating Q*b by replacing line 9 with the statement b = Q_k b.
We handle the different cases by first analysing the operation count for the algorithm with line 9
omitted.

Lemma 4.13. The computational cost C(m, n) for algorithm QR applied to an m×n-matrix,
without calculating Q or Q*b, is asymptotically

    C(m, n) ∼ 2mn² − (2/3) n³    for m, n → ∞ with m = Ω(n).
Proof. We count the number of operations for the individual steps of algorithm QR. From
equation (4.3) we can see that for calculating the product Q_k R in step 8 we only have to
calculate H_k R22 = R22 − 2vv*R22. Since v = ṽ/‖ṽ‖_2 (where ṽ is the unnormalised vector from
step 4) and ‖ṽ‖_2² = ṽ*ṽ we can calculate this as

    H_k R22 = R22 − ( ṽ / (ṽ*ṽ/2) ) ṽ* R22.    (4.4)

Using this formula we get the following operation count:

- construction of ṽ: 2(m−k+1) + 1 operations (counting the square root as 1),
- computing ṽ*R22: since each of the n−k+1 components of the vector-matrix product ṽ*R22
  needs m−k+1 multiplications and m−k additions, the computation of ṽ*R22
  requires (n−k+1)( (m−k+1) + (m−k) ) operations,
- calculating ṽ*ṽ/2 needs 2(m−k+1) operations and dividing ṽ by the result requires
  another (m−k+1) divisions,
- calculating the product ( ṽ/(ṽ*ṽ/2) )( ṽ*R22 ) from this needs (m−k+1)(n−k+1) multiplications,
- calculating R22 − (· · ·) requires (m−k+1)(n−k+1) subtractions.

Thus the total operation count is

    C(m, n) = Σ_{k=1}^{n−1} ( 5(m−k+1) + 1 + (n−k+1)( 4(m−k+1) − 1 ) )
            = Σ_{l=m−n+2}^{m} ( 5l + 1 + (n−m+l)(4l − 1) )
            = 2mn² − (2/3) n³ + terms with at most two factors m, n
            ∼ 2mn² − (2/3) n³

for m, n → ∞ with m = Ω(n).
If we need to calculate the full matrix Q we have to perform an (m−k+1)×(m−k+1)
matrix-matrix multiplication in step 9. Assuming that we use the standard matrix multiplication
algorithm this contributes asymptotic cost Θ(m³) and so the asymptotic cost of algorithm QR
will be increased by this step. But if we apply algorithm QR only to solve SLE, we just have
to calculate Q*b instead of Q. Algorithmically this is the same as appending the vector b as an
additional column to the matrix A. Thus the computational cost for this algorithm is C(m, n+1)
and since

    2m(n+1)² − (2/3)(n+1)³ ∼ 2mn² − (2/3) n³

for m, n → ∞ with m = Ω(n), the asymptotic cost does not change. For solving SLE we also
have m = n and thus we find that the asymptotic computational cost of solving SLE using
Householder QR-factorisation is

    C(n) ∼ 2n·n² − (2/3) n³ = (4/3) n³    for n → ∞.

This analysis shows that solving SLE using Householder QR-factorisation requires asymp-
totically double the number of operations of algorithm GEPP. This is the price we have to
pay for the better stability properties of the QR-algorithm.

Error Analysis
Theorem 4.14. a) For A ∈ R^{m×n} let R̃, ṽ_1, . . . , ṽ_{n−1} be the computed results of the Householder
QR-algorithm. Let

    Q̃_k = ( I_{k−1}   0                       )
           ( 0         I_{m−k+1} − 2 ṽ_k ṽ_k* )

and Q̃ = Q̃_1 · · · Q̃_{n−1}. Then we have Q̃R̃ = A + δA for some δA ∈ R^{m×n} with

    ‖δA‖/‖A‖ = O(ε_m).

b) Let ỹ be the computed value for Q*b. Then

    (Q̃ + δQ) ỹ = b,    ‖δQ‖ = O(ε_m).

Remarks. 1) The values ṽ_1, . . . , ṽ_{n−1} in the theorem denote the values calculated for the vec-
tor v in step 5 during iteration k. Since the matrices Q_k are not explicitly calculated (we use
formula (4.4) instead), ṽ_1, . . . , ṽ_{n−1} are the relevant computed values. For our analysis we con-
sider the matrices Q̃_k which are exactly calculated from the vectors ṽ_k and thus are exactly
orthogonal.
2) Note that part b) of the theorem only gives an absolute error for Q̃, which seems not
very useful at first sight. The following theorem about backward stability of the Householder
algorithm shows how we can use this nevertheless.

Theorem 4.15. The Householder algorithm for solving SLE is backward stable: the computed
solution x̃ for Ax = b satisfies

    (A + ΔA)x̃ = b,    ‖ΔA‖/‖A‖ = O(ε_m).

Proof. Let Q̃, R̃ and ỹ be as in theorem 4.14. The final result x̃ is computed from R̃x̃ = ỹ using
back substitution. From the backward stability of the back substitution algorithm we get that
the calculated x̃ satisfies

    (R̃ + δR)x̃ = ỹ,    ‖δR‖/‖R̃‖ = O(ε_m).

This gives

    b = (Q̃ + δQ)ỹ
      = (Q̃ + δQ)(R̃ + δR)x̃
      = (Q̃R̃ + δQ·R̃ + Q̃·δR + δQ·δR)x̃
      = (A + ΔA)x̃

where ΔA = δA + δQ·R̃ + Q̃·δR + δQ·δR and δA is the error of the computed QR-factorisation
from part a) of theorem 4.14.
We study the four terms in the definition of ΔA one by one. From theorem 4.14 we know
‖δA‖/‖A‖ = O(ε_m). Since Q̃ is orthogonal we have R̃ = Q̃*(A + δA) and thus

    ‖R̃‖/‖A‖ ≤ ‖Q̃*‖ ‖A + δA‖/‖A‖ = O(1).

Therefore we also get

    ‖δQ·R̃‖/‖A‖ ≤ ‖δQ‖ O(1) = O(ε_m).

Similarly we find

    ‖Q̃·δR‖/‖A‖ ≤ ‖Q̃‖ (‖δR‖/‖R̃‖) (‖R̃‖/‖A‖) = O(ε_m)

and

    ‖δQ·δR‖/‖A‖ ≤ ‖δQ‖ (‖δR‖/‖R̃‖) (‖R̃‖/‖A‖) = O(ε_m²).

Together this shows ‖ΔA‖/‖A‖ = O(ε_m).

The theorem shows that the Householder method for solving SLE is backward stable. For
simplicity we did not consider the constants in the stability estimate ‖ΔA‖/‖A‖ = O(ε_m). A
more detailed analysis shows that the constant here does not suffer from the exponential growth
problem described at the end of section 4.2.

Exercises
1) In analogy to the back substitution algorithm formulate the forward substitution algorithm
to solve Ly = b where L Cnn is a lower triangular, invertible matrix and b Cn is a vector.
Show that your algorithm computes the correct result and that it has asymptotic computational
cost of order (n2 ).
2) a) Find the LU factorisation of

    A = ( 1  2
          3  1 ).

What is g_n(A) in this case?
b) Find a QR factorisation of the same matrix, using both Gram-Schmidt and Householder.
c) Find the condition number of A in the ‖·‖_∞, ‖·‖_2 and ‖·‖_1 norms.

3) a) Let A = ab∗ (for vectors a, b ∈ C^n). Find all eigenvalues and eigenvectors of A. When is
A a normal matrix?
b) Let H ∈ R^{n×n} be a Householder reflection. Show that H has a single eigenvalue −1 and
an eigenvalue +1 of multiplicity (n − 1). What is the value of det(H)?
4) Determine the asymptotic computational cost of algorithm QR when calculating the full
matrix Q.
5) Let A ∈ R^{n×n} where n = 2^k. Noting that the LU-factorisation of a matrix A can be
written as

    A = LU = ( L_11   0   ) ( U_11  U_12 )
             ( L_21  L_22 ) (  0    U_22 ),

design a divide and conquer strategy which results in an LU-factorisation of A in O(n^a) operations,
where O(n^a) is the cost of matrix multiplication.

Chapter 5

Iterative Methods
Until now we considered methods which directly calculate x = A^{−1} b using Θ(n³) operations. In
this chapter we will introduce the idea of iterative methods. These methods construct a sequence
(x_k)_{k∈N} with

    x_k → x    for k → ∞.

For special matrices A the error ‖x_k − x‖ becomes reasonably small after less than Θ(n³) operations.

5.1 Linear Methods

The basic idea of the methods described in this section is to write the matrix A as A = M + N
where the matrix M is easier to invert than A. Then, given x_{k−1} for k ∈ N, we can define x_k by

    M x_k = b − N x_{k−1}.                                            (5.1)

If we assume for the moment that lim_{k→∞} x_k = x, then the limit x satisfies

    M x = lim_{k→∞} M x_k = lim_{k→∞} (b − N x_{k−1}) = b − N x

and thus for the limit we get Ax = b. This shows that the only possible limit for this sequence
is the solution of SLE.
To study convergence of the method we consider the error e_k = x_k − x where x is the exact
solution of SLE. We get the recursive relation

    M e_k = M x_k − M x = (b − N x_{k−1}) − (A − N) x = −N e_{k−1}

and thus e_k = −M^{−1} N e_{k−1}. The method converges if e_k → 0 for k → ∞.

Before we settle the question of convergence in theorem 5.4 we need a few theoretical tools.
The first of these is the Jordan canonical form of a matrix. We state the result without a proof.

Definition 5.1. A Jordan block J_n(λ) ∈ C^{n×n} for λ ∈ C is the matrix satisfying J_n(λ)_{ii} = λ,
J_n(λ)_{i,i+1} = 1, and J_n(λ)_{ij} = 0 else, for i, j = 1, …, n. A Jordan matrix is a block diagonal
matrix J ∈ C^{n×n} of the form

    J = ( J_{n_1}(λ_1)
                       J_{n_2}(λ_2)
                                    ⋱
                                       J_{n_k}(λ_k) )

where Σ_{j=1}^k n_j = n.

Theorem 5.2 (Jordan canonical form). For any A ∈ C^{n×n} there is an invertible S ∈ C^{n×n} and
a Jordan matrix J ∈ C^{n×n} satisfying

    A = S J S^{−1}

where the diagonal elements λ_1, …, λ_k of the Jordan blocks are the eigenvalues of A.
Using the Jordan canonical form we can derive the following property of the spectral radius.

Lemma 5.3. Let A ∈ C^{n×n} and ε > 0. Then there is a vector norm on C^n such that the induced
matrix norm satisfies ρ(A) ≤ ‖A‖ ≤ ρ(A) + ε.
Proof. From theorem 1.11 we already know ρ(A) ≤ ‖A‖ for every matrix norm ‖·‖. Thus we
only have to show the second inequality of the claim.
Let J = S^{−1} A S be the Jordan form of A and D_ε = diag(1, ε, ε², …, ε^{n−1}). Then

    (S D_ε)^{−1} A (S D_ε) = D_ε^{−1} J D_ε

is the matrix obtained from J by replacing each entry 1 above the diagonal with ε, i.e. it has
the eigenvalues λ_i on the diagonal, the value ε in some of the positions directly above the
diagonal, and zeros elsewhere.
Define a vector norm ‖·‖ on C^n by

    ‖x‖ = ‖ (S D_ε)^{−1} x ‖_∞

for all x ∈ C^n. Then the induced matrix norm satisfies

    ‖A‖ = max_{x≠0} ‖Ax‖ / ‖x‖
        = max_{x≠0} ‖(S D_ε)^{−1} A x‖_∞ / ‖(S D_ε)^{−1} x‖_∞
        = max_{y≠0} ‖(S D_ε)^{−1} A (S D_ε) y‖_∞ / ‖y‖_∞
        = ‖ (S D_ε)^{−1} A (S D_ε) ‖_∞.

Since we know the ‖·‖_∞-matrix norm from theorem 1.16 and we have described the explicit
form of the matrix (S D_ε)^{−1} A (S D_ε) above, this is easy to evaluate. We get
‖A‖ ≤ max_i |λ_i| + ε = ρ(A) + ε. This completes the proof.

Now we can come back to the iteration defined in (5.1). There the error e_k = x_k − x satisfies
e_k = −M^{−1} N e_{k−1} for all k ∈ N and the method converges if and only if e_k → 0 for k → ∞. The
following theorem characterises convergence with the help of the spectral radius of the matrix

    C = −M^{−1} N.                                                    (5.2)

Theorem 5.4. Let C ∈ C^{n×n}, e_0 ∈ C^n and e_k = C^k e_0 for all k ∈ N. Then e_k → 0 for all
e_0 ∈ C^n if and only if ρ(C) < 1.

Proof. Assume first ρ(C) < 1. Then by lemma 5.3 we find an induced matrix norm with ‖C‖ < 1
and we get

    ‖e_k‖ = ‖C^k e_0‖ ≤ ‖C‖^k ‖e_0‖ → 0

for k → ∞.
On the other hand, if ρ(C) ≥ 1, then there is an e_0 ∈ C^n with C e_0 = λ e_0 for some λ ∈ C
with |λ| ≥ 1. For this vector e_0 we get

    ‖e_k‖ = ‖C^k e_0‖ = |λ|^k ‖e_0‖

and thus e_k does not converge to 0.

Remarks. 1) The theorem shows that the linear iterative method defined by (5.1) converges if
and only if ρ(C) < 1 for C = −M^{−1} N. The convergence is fast if ρ(C) is small.
2) Since ‖A‖ ≥ ρ(A) for every matrix norm ‖·‖, a sufficient criterion for convergence of an
iterative method is ‖C‖ < 1 for any matrix norm ‖·‖.

In the remaining part of the section we will consider specific methods which are obtained by
choosing the matrices M and N in (5.1). For a matrix A ∈ C^{n×n} define the three n×n-matrices

    L = (  0
          a_21    0
          a_31   a_32    0
           ⋮              ⋱
          a_n1   a_n2   ⋯   a_{n,n−1}   0 ),

    D = diag(a_11, a_22, …, a_nn),

    U = ( 0   a_12   a_13   ⋯   a_1n
               0     a_23   ⋯   a_2n
                       ⋱          ⋮
                             0   a_{n−1,n}
                                  0 ).                                (5.3)

Then we have A = L + D + U.

The Jacobi Method

The iterative method obtained by choosing M = D and N = L + U in (5.1) is called the Jacobi
method. This choice of M and N leads to the iteration

    x_k = D^{−1} ( b − (L + U) x_{k−1} )

for all k ∈ N, and the convergence properties are determined by the matrix
C = −M^{−1} N = −D^{−1}(L + U). Since D is diagonal, the inverse D^{−1} is trivial to compute.
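A minimal sketch of the Jacobi iteration in Python/NumPy (not part of the original notes; the residual-based stopping rule and the tolerance tol are illustrative choices):

import numpy as np

def jacobi(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Jacobi iteration x_k = D^{-1} (b - (L+U) x_{k-1})."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    d = np.diag(A)                    # diagonal part D, stored as a vector
    R = A - np.diag(d)                # off-diagonal part L + U
    for _ in range(max_iter):
        x_new = (b - R @ x) / d
        if np.linalg.norm(A @ x_new - b) <= tol * np.linalg.norm(b):
            return x_new
        x = x_new
    return x

A = np.array([[10.0, 1.0, 1.0], [1.0, 10.0, 2.0], [0.0, 2.0, 10.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b))                   # compare with np.linalg.solve(A, b)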
Theorem 5.5. a) The Jacobi method is convergent for all matrices A with

    |a_ii| > Σ_{j≠i} |a_ij|                                           (5.4)

for i = 1, …, n. (This condition is called the strong row sum criterion.)

b) The Jacobi method is convergent for all matrices A with

    |a_jj| > Σ_{i≠j} |a_ij|                                           (5.5)

for j = 1, …, n. (This condition is called the strong column sum criterion.)


Proof. a) The matrix C = −D^{−1}(L + U) has entries

    c_ij = { −a_ij / a_ii   if i ≠ j, and
           {  0             else.

Using theorem 1.16 we find

    ‖C‖_∞ = max_{i=1,…,n} (1/|a_ii|) Σ_{j≠i} |a_ij|.

Thus the strong row sum criterion gives ‖C‖_∞ < 1 which implies ρ(C) < 1 and the method
converges.
b) If the strong column sum criterion (5.5) holds for A, then the strong row sum criterion (5.4)
holds for A^T and thus the method converges for A^T. From theorem 5.4 we then know
ρ(D^{−1}(U^T + L^T)) < 1. Since for every matrix C the matrices C, C^T and also D^{−1} C^T D have the
same eigenvalues, we get ρ(D^{−1}(U + L)) = ρ(D^{−1}(U^T + L^T)) < 1 and the method converges
for A.

Definition 5.6. A matrix A ∈ C^{n×n} is called irreducible if there is no permutation matrix P
such that

    P^T A P = ( A_11  A_12
                 0    A_22 )

where A_11 ∈ C^{p×p}, A_12 ∈ C^{p×q}, A_22 ∈ C^{q×q} and 0 is a q×p zero-matrix with p + q = n and
p, q > 0.
There is an alternative description of irreducibility, which is often easier to check than the
definition. To A associate the oriented graph G(A) with vertices 1, …, n and edges i → j for
all i, j ∈ {1, …, n} with a_ij ≠ 0. Then the matrix A is irreducible if and only if the graph G(A)
is connected, i.e. if you can reach any vertex j from any vertex i by following edges.

Example. Consider the matrix

    A = ( 1  1  0
          0  1  1
          0  0  1 ).

The associated graph G(A) has the edges 1 → 1, 1 → 2, 2 → 2, 2 → 3 and 3 → 3.
The matrix A is not irreducible (indeed P = I in the definition is enough to see this) and, since
there is no path from 3 to 1, the graph G(A) is not connected.

Example. In continuation of the previous example consider the modified matrix

    A = ( 1  1  0
          1  2  1
          0  1  1 ).

The associated graph G(A) now also contains the edges 2 → 1 and 3 → 2. The graph G(A)
is connected and the matrix is thus irreducible.

Remarks. The proof of equivalence of the two characterisations of irreducibility is based on the
following observations. 1) The graphs G(A) and G(P^T A P) are isomorphic, only the vertices are
numbered in a different way. 2) The block A_22 in the definition of irreducibility corresponds to
a set of states from which there is no path into the states corresponding to the block A_11.
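For illustration, a small sketch (assuming NumPy; this is one straightforward way, not taken from the notes) which checks irreducibility by testing strong connectivity of the graph G(A) with a breadth-first search from every vertex:

import numpy as np
from collections import deque

def is_irreducible(A):
    """Check irreducibility of A by testing whether G(A) is connected."""
    n = A.shape[0]
    adj = [np.nonzero(A[i])[0] for i in range(n)]   # edges i -> j with a_ij != 0
    for start in range(n):
        seen = {start}
        queue = deque([start])
        while queue:                                 # breadth-first search from 'start'
            i = queue.popleft()
            for j in adj[i]:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        if len(seen) < n:                            # some vertex is unreachable
            return False
    return True

A1 = np.array([[1, 1, 0], [0, 1, 1], [0, 0, 1]])    # first example: reducible
A2 = np.array([[1, 1, 0], [1, 2, 1], [0, 1, 1]])    # second example: irreducible
print(is_irreducible(A1), is_irreducible(A2))       # False True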

Theorem 5.7. If A is irreducible and satisfies

    |a_ii| ≥ Σ_{j≠i} |a_ij|                                           (5.6)

for i = 1, …, n but

    |a_kk| > Σ_{j≠k} |a_kj|                                           (5.7)

for one index k, then the Jacobi method converges. (This condition is called the weak row sum
criterion.)
Proof. We have to show that ρ(C) < 1 where C = −D^{−1}(L + U). Define the matrix
|C| = (|c_ij|)_{ij} ∈ R^{n×n} and the vector e = (1, 1, …, 1) ∈ R^n. Then we have

    |C| e = ( Σ_{j=1}^n |c_ij| · 1 )_i = ( Σ_{j≠i} |a_ij| / |a_ii| )_i ≤ e

where this and some of the following ≤-signs are to be read componentwise. Therefore we get

    |C|^{l+1} e ≤ |C|^l e ≤ e

for all l ∈ N.
Let t^(l) = e − |C|^l e ≥ 0. Then the vectors t^(l) are componentwise increasing. Let ν_l be the
number of non-vanishing components of t^(l). We will show that ν_l is strictly increasing until
it reaches the value n: Assume ν_{l+1} = ν_l = k < n. Since one row of A satisfies the strict
inequality (5.7) we have |C| e ≠ e and thus k > 0. Then without loss of generality (since we can
reorder the rows and columns of A) we have

    t^(l) = ( a ),    t^(l+1) = ( b )
            ( 0 )               ( 0 )

where a, b ∈ R^k and a, b > 0. We can conclude

    ( b )  =  t^(l+1)  =  e − |C|^{l+1} e
    ( 0 )
           ≥  |C| e − |C|^{l+1} e  =  |C| t^(l)
           =  ( |C_11|  |C_12| ) ( a )
              ( |C_21|  |C_22| ) ( 0 ).

From a > 0 we see that the last n − k components can only vanish if |C_21| = 0, and thus
C_21 = 0 and C would not be irreducible. Since this implies that A would not be irreducible, we
have found the required contradiction and can conclude ν_{l+1} > ν_l whenever ν_l < n. This proves
t^(n) > 0 componentwise.
Because of e > |C|^n e we get

    ρ(C^n) ≤ ‖C^n‖_∞ = ‖ |C^n| ‖_∞ ≤ ‖ |C|^n ‖_∞ = max_{i=1,…,n} ( |C|^n e )_i < 1

and thus ρ(C) < 1. This completes the proof.

Computational Complexity
The sequence (x_k)_{k∈N} from the Jacobi method is defined by the relation

    x_k = D^{−1} ( b − (L + U) x_{k−1} ).

For each iteration we have to calculate

1) the product (L + U) x_{k−1},
2) the difference b − (L + U) x_{k−1} (needs n subtractions), and
3) the product with D^{−1} (needs n divisions because D is diagonal).

For general matrices step 1) needs Θ(n²) operations and thus dominates the computational cost.
Iterative methods are most powerful if the matrix A is sparse, i.e. if it contains many zeros, since
then the matrix-vector multiplication can be performed using fewer operations.

Example. Assume that each row of A contains at most ℓ non-zero entries outside the diagonal.
Then step 1) requires O(ℓn) operations and performing one step of the Jacobi method has cost O(ℓn).
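As an illustration, a sketch (assuming SciPy's sparse matrix format) of one Jacobi step whose cost is proportional to the number of non-zero entries of A rather than to n²:

import numpy as np
import scipy.sparse as sp

def jacobi_step_sparse(A, b, x):
    """One Jacobi step for a sparse matrix A stored in CSR format."""
    d = A.diagonal()                  # D as a vector
    return (b - A @ x + d * x) / d    # b - (L+U)x = b - Ax + Dx, then divide by D

n = 1000
main = 4.0 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sp.diags([off, main, off], [-1, 0, 1], format="csr")   # tridiagonal test matrix
b = np.ones(n)
x = np.zeros(n)
for _ in range(50):
    x = jacobi_step_sparse(A, b, x)
print(np.linalg.norm(A @ x - b))      # residual after 50 steps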

The next question is of course how many steps of the method we have to perform. Given
ε > 0 we can calculate x_1, …, x_k until

    ‖A x_k − b‖ / ‖b‖ ≤ ε.

From the conditioning of SLE (lemma 3.3) we then know

    ‖x_k − x‖ / ‖x‖ ≤ κ(A) ε                                          (5.8)

and by choosing ε small enough we can make the relative error of the solution arbitrarily small.
Also we know that the absolute error satisfies x_k − x = C^k (x_0 − x). If the matrix A is
symmetric, then C = −D^{−1}(L + U) is normal and from theorem 1.14 we get

    ‖x_k − x‖_2 ≤ ‖C^k‖_2 ‖x_0 − x‖_2 = ρ(C)^k ‖x_0 − x‖_2.           (5.9)

Thus the error converges exponentially fast to 0 as k → ∞. If we know both the spectral
radius ρ(C) and the condition number κ(A), we can combine estimate (5.9) with the conditioning
of the problem to get an upper bound on the number of remaining iteration steps until the relative
error of the result satisfies (5.8).

Remarks. 1) If the matrix A is known explicitly, the above arguments can be used to derive
more specific results about the computational complexity of the Jacobi method.
2) For large sparse matrices A the intermediate results in the LU- or QR-factorisation will
often not be sparse. If there is not enough memory in the system to store a full matrix, then it
might still be possible to use iterative methods whereas direct methods will fail.

The Gauss-Seidel Method

If we use M = L + D and N = U for the matrices L, D and U from (5.3) then we still have
A = M + N and we get the Gauss-Seidel method. In this situation the iterative scheme (5.1)
has the form

    (L + D) x_k = b − U x_{k−1}.

Since L + D is lower triangular we can use forward substitution to calculate x_k in each step.
The convergence properties of this method are determined by the matrix C = −(L + D)^{−1} U.
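A minimal sketch of the Gauss-Seidel iteration (not from the notes), which solves (L + D)x_k = b − U x_{k−1} by forward substitution, i.e. by updating the components of x in place:

import numpy as np

def gauss_seidel(A, b, x0=None, tol=1e-10, max_iter=1000):
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(max_iter):
        for i in range(n):            # forward substitution on (L + D) x_k = b - U x_{k-1}
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(A @ x - b) <= tol * np.linalg.norm(b):
            break
    return x

A = np.array([[10.0, 1.0, 1.0], [1.0, 10.0, 2.0], [0.0, 2.0, 10.0]])
b = np.array([1.0, 2.0, 3.0])
print(gauss_seidel(A, b))             # compare with np.linalg.solve(A, b)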
Theorem 5.8. Assume either
a) A satisfies the strong row sum criterion, or
b) A is irreducible and satisfies the weak row sum criterion.
Then the Gauss-Seidel method converges.

5.2 The Conjugate-Gradient Method

Let A ∈ R^{n×n} be positive definite, b ∈ R^n and x = A^{−1} b. Consider the norm ‖z‖_A = √(z^T A z)
and define F : R^n → R by

    F(z) = (1/2) ‖z − x‖_A²                                           (5.10)

for all z ∈ R^n. The function F can also be expressed as a function of the residual r = b − Az:
for every z ∈ R^n we get

    F(z) = (1/2) (z − x)^T A (z − x)
         = (1/2) z^T A z − b^T z + (1/2) b^T A^{−1} b
         = (1/2) (b − Az)^T A^{−1} (b − Az).

Since x is the unique minimum of F we can solve SLE by minimising F. We will convert this
idea into an iterative method by taking this minimum over an increasing sequence of subspaces.

Idea. Given x_k, find x_{k+1} with

    F(x_{k+1}) = min_{u_0,…,u_k} F( x_k + u_0 r_0 + ⋯ + u_k r_k )

where F(z) = (1/2) ‖z − x‖_A² and r_k = b − A x_k.

Algorithm CG (Conjugate-Gradient method).

input: A ∈ R^{n×n} positive definite, b ∈ R^n, x_0 ∈ R^n
output: x_k ∈ R^n with x_k = A^{−1} b

1: p_0 = b − A x_0 , r_0 = b − A x_0
2: for k = 0, 1, 2, … do
3:   if p_k = 0 then
4:     return x_k
5:   else
6:     α_k = r_k^T r_k / p_k^T A p_k
7:     x_{k+1} = x_k + α_k p_k
8:     r_{k+1} = r_k − α_k A p_k
9:     β_k = r_{k+1}^T r_{k+1} / r_k^T r_k
10:    p_{k+1} = r_{k+1} + β_k p_k
11:  end if
12: end for
Remarks. At first sight this seems not to be a proper algorithm, since it is not clear whether
it terminates after a finite number of steps. We will later see that (in the absence of rounding
errors) we get p_k = 0 for some k ≤ n and thus the algorithm really terminates.
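A straightforward translation of algorithm CG into Python/NumPy might look as follows (a sketch only; the tolerance-based stopping rule replaces the exact test p_k = 0 to account for rounding errors):

import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-12):
    x = np.zeros_like(b) if x0 is None else x0.astype(float)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(len(b)):              # at most n steps in exact arithmetic
        if np.sqrt(rs) <= tol:           # residual small enough: stop
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)            # step 6
        x = x + alpha * p                # step 7
        r = r - alpha * Ap               # step 8
        rs_new = r @ r
        beta = rs_new / rs               # step 9
        p = r + beta * p                 # step 10
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b), np.linalg.solve(A, b))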

Lemma 5.9. Consider the algorithm CG. Let m = min{ k | p_k = 0 }. Then for all 0 ≤ k ≤ m
the following statement holds.

    (1)  r_j ⊥ p_i            for all 0 ≤ i < j ≤ k
    (2a) r_i^T r_i > 0        for all 0 ≤ i < k
    (2b) r_i^T p_i = r_i^T r_i  for all 0 ≤ i ≤ k                      (A_k)
    (3)  p_i^T A p_j = 0      for all 0 ≤ i < j ≤ k
    (4)  r_i^T r_j = 0        for all 0 ≤ i < j ≤ k
    (5)  r_i = b − A x_i      for all 0 ≤ i ≤ k

Proof. We use induction over k. First note that the claim (A_0) is trivially true. Assume that (A_k)
holds for some k < m. We show properties (1) to (5) for (A_{k+1}).

(1) For i = k we get

    r_{k+1}^T p_k = (r_k − α_k A p_k)^T p_k = r_k^T p_k − (r_k^T r_k / p_k^T A p_k) p_k^T A p_k
                  = r_k^T p_k − r_k^T r_k = 0                           (5.11)

where the last equality comes from (2b) of (A_k). For i < k we can use (1) and (3) to get

    r_{k+1}^T p_i = (r_k − α_k A p_k)^T p_i = 0.

(2a) If we had r_k = 0, then we would have p_k = β_{k−1} p_{k−1} and using property (3) we would get the
contradiction

    0 < p_k^T A p_k = β_{k−1} p_k^T A p_{k−1} = 0.

Thus we have r_k ≠ 0 and consequently r_k^T r_k > 0.
(2b) Using (5.11) we get r_{k+1}^T p_{k+1} = r_{k+1}^T (r_{k+1} + β_k p_k) = r_{k+1}^T r_{k+1}.
(3) Applying the definitions of r_{k+1} and p_{k+1} from the algorithm we find

    p_i^T A p_{k+1} = p_i^T A (r_{k+1} + β_k p_k)
                    = (1/α_i) (r_i − r_{i+1})^T r_{k+1} + β_k p_i^T A p_k
                    = (1/α_i) (p_i − β_{i−1} p_{i−1} − p_{i+1} + β_i p_i)^T r_{k+1} + β_k p_i^T A p_k.

For i < k the right-hand side is zero because of properties (1) and (3). For i = k we can use
the definitions of α_i and β_k as well as properties (1) and (2) to see that the right-hand side still
vanishes. Thus we get p_i^T A p_{k+1} = 0 for every i < k + 1.
(4) From the definition of p_{k+1} and property (1) we find r_i^T r_{k+1} = (p_i − β_{i−1} p_{i−1})^T r_{k+1} = 0.
(5) By definition of x_{k+1} and r_{k+1} we get b − A x_{k+1} = b − A(x_k + α_k p_k) = r_k − α_k A p_k = r_{k+1}.
The claim of the lemma follows by induction.

Remarks. 1) Since the lemma shows that the vectors r_0, …, r_{m−1} ∈ R^n are non-zero and
orthogonal, we have m ≤ n. From p_m = 0 we can conclude r_m^T r_m = r_m^T p_m = 0 and thus r_m = 0.
Therefore the algorithm terminates after at most n steps and on termination we have A x_m = b.
2) When the computation is performed on a computer, rounding errors will cause p̃_m ≠ 0
where p̃_m is the calculated value for p_m. In practice one just treats the method as an iterative
method and continues the iteration until the residual error ‖r_k‖ is small enough.
3) From the construction of the vectors r_k and p_k in the algorithm we find
span(r_0, …, r_k) = span(p_0, …, p_k). Define

    Φ(u_0, …, u_k) = F( x_k + Σ_{i=0}^k u_i p_i )

for all u_0, …, u_k ∈ R, where F is given by (5.10) and x_k is the value constructed in step k of
the algorithm. Since Φ is non-negative and grows at infinity, it has only one minimum and there
the gradient is zero. We show that x_{k+1} is the value which minimises Φ. We get

    ∂Φ/∂u_j (u_0, …, u_k) = ( x_k + Σ_{i=0}^k u_i p_i − x )^T A p_j = −r^T p_j

where r = A( x − x_k − Σ_{i=0}^k u_i p_i ) = b − A( x_k + Σ_{i=0}^k u_i p_i ). For (u_0, u_1, …, u_k) = (0, 0, …, α_k)
we get

    x_k + Σ_{i=0}^k u_i p_i = x_k + α_k p_k = x_{k+1}

and

    r = b − A x_{k+1} = r_{k+1}.

Using this value of r and conditions (2b) and (1) from the lemma we also find

    ∂Φ/∂u_j (u_0, …, u_k) = −r_{k+1}^T p_j = 0

for j = 0, …, k. Together this shows that we really have

    F(x_{k+1}) = min_{u_0,…,u_k} Φ(u_0, …, u_k).

4) From the algorithm we see p_k ∈ span(r_0, A r_0, …, A^k r_0) for every k < m and thus we get

    span(p_0, …, p_k) = span(r_0, A r_0, …, A^k r_0) =: K_{k+1}(r_0, A).

The space K_{k+1}(r_0, A) is called a Krylov subspace of R^n for the matrix A.

Theorem 5.10 (speed of convergence). The CG-method satisfies

    ‖e_k‖_A ≤ 2 ( (√κ(A) − 1) / (√κ(A) + 1) )^k ‖e_0‖_A

where e_k = x_k − x and κ(A) is the condition number of A in the ‖·‖_2-norm.

Exercises
1) Which of the following matrices are irreducible? For which of these matrices does the
Jacobi method converge?


4 2 1 1 3 8
2 1 , 3 25 4 , 4 2
1 4 1 2 1 1

2) Show that the Jacobi method for the solution of


1 1 x1 1
1 10 2 x2 = 2
2 1 x3 3

converges and, for the iteration starting with x_0 = 0, give an upper bound on the number of
steps required to get the relative error of the result below 10^{−6}.

Chapter 6

Least Square Problems


In this chapter we want to solve the following problem: given A ∈ C^{m×n} and b ∈ C^m with
m ≥ n, find x ∈ C^n which minimises ‖Ax − b‖_2.

6.1 Singular Value Decomposition

Definition 6.1. Let A ∈ C^{m×n} with m, n ∈ N. A factorisation

    A = U Σ V∗

is called a singular value decomposition (SVD) of A, if U ∈ C^{m×m} and V ∈ C^{n×n} are unitary,
Σ ∈ C^{m×n} is diagonal, and the diagonal entries of Σ are σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_p ≥ 0 where
p = min(m, n). The values σ_1, …, σ_p are called singular values of A.
Theorem 6.2. Every matrix has a singular value decomposition and the singular values are
uniquely determined.
Proof. Let A ∈ C^{m×n}. We prove existence of the SVD by induction over min(m, n). If m = 0
or n = 0 the matrices U, V, and Σ are just the appropriately shaped empty matrices (one
dimension is zero) and nothing is to show.
Assume min(m, n) > 0 and that the existence of the SVD is already known for matrices where
one dimension is smaller than min(m, n). Let σ_1 = ‖A‖_2 = max_{x≠0} ‖Ax‖_2/‖x‖_2 = max_{‖x‖_2=1} ‖Ax‖_2.
Since the map v ↦ Av is continuous and the set { x | ‖x‖_2 = 1 } ⊆ C^n is compact, the image
{ Ax | ‖x‖_2 = 1 } ⊆ C^m is also compact. Since ‖·‖_2 : C^m → R is continuous it attains its
supremum: there is a v_1 ∈ C^n with ‖v_1‖_2 = 1 and

    ‖A v_1‖_2 = max_{‖x‖_2=1} ‖Ax‖_2 = σ_1.

Defining u_1 = A v_1 / σ_1 we get ‖u_1‖_2 = 1.


Extend {v_1} to an orthonormal basis {v_1, …, v_n} of C^n and {u_1} to an orthonormal basis
{u_1, …, u_m} of C^m. Consider the matrices

    U_1 = (u_1, …, u_m) ∈ C^{m×m}    and    V_1 = (v_1, …, v_n) ∈ C^{n×n}.

Then the product U_1∗ A V_1 is of the form

    S = U_1∗ A V_1 = ( σ_1  w∗
                        0   B )

with w ∈ C^{n−1} and B ∈ C^{(m−1)×(n−1)}.
For unitary matrices U we have

    ‖U x‖_2² = ⟨U x, U x⟩ = ⟨x, U∗ U x⟩ = ⟨x, x⟩ = ‖x‖_2²

and thus

    ‖S‖_2 = max_{x≠0} ‖U_1∗ A V_1 x‖_2 / ‖x‖_2 = max_{x≠0} ‖A V_1 x‖_2 / ‖V_1 x‖_2 = ‖A‖_2 = σ_1.

On the other hand we get

    ‖ S ( σ_1 ) ‖   =  ‖ ( σ_1² + w∗w ) ‖   ≥  σ_1² + w∗w  =  (σ_1² + w∗w)^{1/2} ‖ ( σ_1 ) ‖
    ‖   (  w  ) ‖_2     ‖ (    Bw     ) ‖_2                                       ‖ (  w  ) ‖_2

and thus ‖S‖_2 ≥ (σ_1² + w∗w)^{1/2}. Together this allows us to conclude w = 0 and thus

    A = U_1 S V_1∗ = U_1 ( σ_1  0 ) V_1∗.
                         (  0   B )

By the induction hypothesis the (m−1)×(n−1)-matrix B has a singular value decomposition

    B = U_2 Σ_2 V_2∗.

Then

    A = U_1 ( 1   0  ) ( σ_1   0  ) ( 1   0  )∗ V_1∗
            ( 0  U_2 ) (  0   Σ_2 ) ( 0  V_2 )

is a SVD of A and existence of the SVD is proved.
Uniqueness of the largest singular value σ_1 holds, since σ_1 is uniquely determined by the
relation

    ‖A‖_2 = max_{x≠0} ‖U Σ V∗ x‖_2 / ‖x‖_2 = max_{x≠0} ‖Σ x‖_2 / ‖x‖_2 = σ_1.

Uniqueness of σ_2, …, σ_n follows by induction as above.

Remarks. 1) Inspection of the above proof reveals that for real matrices A the matrices U and
V are also real.
2) If m > n then the last m − n columns of U do not contribute to the factorisation A = U Σ V∗:
we can also write A = Û Σ̂ V∗ where Û ∈ C^{m×n} consists of the first n columns of U and
Σ̂ ∈ C^{n×n} consists of the first n rows of Σ. This factorisation is called the reduced singular value
decomposition (reduced SVD) of A.
3) Since we have A∗A = V Σ∗ U∗ U Σ V∗ = V Σ∗Σ V∗ and thus A∗A V = V Σ∗Σ, we find
A∗A v_j = σ_j² v_j for the columns v_1, …, v_n of V. This shows that the vectors v_j are eigenvectors
of A∗A with eigenvalues σ_j².
4) From the proof we see that we can get the ‖·‖_2-norm of a matrix from its SVD: we have
‖A‖_2 = σ_1.

For the rest of this section let A ∈ C^{m×n} be a matrix with singular value decomposition
A = U Σ V∗ and singular values σ_1 ≥ ⋯ ≥ σ_r > σ_{r+1} = ⋯ = 0. To illustrate the usefulness of the
SVD we prove a few basic results.

Theorem 6.3. The rank of A is equal to r.

Proof. Since U and V are invertible we have rank(A) = rank(Σ) = r.

Theorem 6.4. We have range(A) = span(u_1, …, u_r) and ker(A) = span(v_{r+1}, …, v_n).

Proof. a) Since Σ is diagonal and V is invertible we have

    range(Σ V∗) = range(Σ) = span(e_1, …, e_r) ⊆ C^m.

This shows

    range(A) = range(U Σ V∗) = span(u_1, …, u_r) ⊆ C^m.

b) We have

    ker(A) = ker(U Σ V∗) = ker(Σ V∗).

Since V is unitary we can conclude

    ker(A) = span(v_{r+1}, …, v_n) ⊆ C^n.

Theorem 6.5. If A = A∗, then the singular values of A are |λ_1|, …, |λ_n| where λ_1, …, λ_n are
the eigenvalues of A.

6.2 The Normal Equations

In this section we will first derive a theoretical criterion to find the solution of LSQ and then we
will derive three different algorithms to solve LSQ.

Theorem 6.6. A vector x ∈ C^n minimises ‖Ax − b‖_2 for A ∈ C^{m×n} and b ∈ C^m, if and only
if Ax − b is orthogonal to range(A).

Proof. Let g(x) = (1/2) ‖Ax − b‖_2². Then minimising ‖Ax − b‖_2 is equivalent to minimising the
function g.
(1) Assume that Ax − b is orthogonal to range(A) and let y ∈ C^n. Then

    Ay − Ax = A(y − x) ⊥ Ax − b

and Pythagoras' theorem gives

    ‖Ay − b‖_2² = ‖Ay − Ax‖_2² + ‖Ax − b‖_2² ≥ ‖Ax − b‖_2²

for all y ∈ C^n. Thus x minimises g.
(2) Now assume that the vector x minimises g. Then for every y ∈ C^n we have

    0 = (d/dε)|_{ε=0} g(x + εy) = (1/2)( ⟨Ay, Ax − b⟩ + ⟨Ax − b, Ay⟩ ) = Re ⟨Ax − b, Ay⟩

and

    0 = (d/dε)|_{ε=0} g(x + iεy) = (1/2)( −i⟨Ay, Ax − b⟩ + i⟨Ax − b, Ay⟩ ) = −Im ⟨Ax − b, Ay⟩.

This shows ⟨Ax − b, Ay⟩ = 0 and thus Ax − b ⊥ Ay for all y ∈ C^n.

Corollary 6.7. A vector x ∈ C^n solves LSQ if and only if

    A∗A x = A∗ b.                                                     (6.1)

Proof. From the theorem we know that x solves LSQ if and only if Ax − b ⊥ range(A). This in
turn is equivalent to Ax − b ⊥ a_i for every column a_i of A, i.e. to A∗(Ax − b) = 0.
Definition 6.8. The system (6.1) of linear equations is called the normal equations for LSQ.
We will consider different algorithms to solve LSQ. The first one is directly based on the
normal equations.

Algorithm (LSQ via normal equations).

input: A ∈ C^{m×n} with m ≥ n and rank(A) = n, b ∈ C^m
output: x ∈ C^n with minimal ‖Ax − b‖_2

1: calculate A∗A and A∗b
2: solve (A∗A) x = A∗ b

Remarks. Calculating A∗A and A∗b requires asymptotically 2mn² operations for m, n → ∞.
Since A∗A is symmetric we only need to calculate half of the entries, which can be done in
mn² operations. Solving (A∗A) x = A∗b with QR-factorisation requires (4/3)n³ operations.
Together this gives a total asymptotic cost of order

    C(m, n) ∼ mn² + (4/3)n³.

There are methods to solve SLE for symmetric matrices with fewer steps. Using Cholesky
factorisation one gets a total asymptotic cost of order C(m, n) ∼ mn² + (1/3)n³.
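A sketch of this approach in Python (assuming SciPy is available for the Cholesky factorisation of A∗A mentioned above; the data points are made up):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lsq_normal_equations(A, b):
    """Solve min ||Ax - b||_2 via the normal equations A*A x = A*b."""
    G = A.conj().T @ A                # A*A, Hermitian positive definite if rank(A) = n
    c = A.conj().T @ b                # A*b
    return cho_solve(cho_factor(G), c)

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # fit a line through 3 points
b = np.array([1.0, 2.0, 2.0])
print(lsq_normal_equations(A, b))     # compare np.linalg.lstsq(A, b, rcond=None)[0]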

Algorithm (LSQ via QR-factorisation).

input: A ∈ C^{m×n} with m ≥ n and rank(A) = n, b ∈ C^m
output: x ∈ C^n with minimal ‖Ax − b‖_2

1: compute the reduced QR-factorisation A = Q̂ R̂
2: compute Q̂∗ b ∈ C^n
3: solve R̂ x = Q̂∗ b using back substitution

The result of the algorithm satisfies

    A∗A x = A∗ Q̂ R̂ x = A∗ Q̂ Q̂∗ b = A∗ b

and thus solves LSQ. Steps 1 and 2 together have asymptotic cost C_1(m, n) ∼ 2mn² − (2/3)n³, and
step 3 has asymptotic cost C_2(m, n) = O(n²). Thus we get total asymptotic cost

    C(m, n) ∼ 2mn² − (2/3)n³

for m, n → ∞ with m = Θ(n).

Algorithm (LSQ via SVD).

input: A ∈ C^{m×n} with m ≥ n and rank(A) = n, b ∈ C^m
output: x ∈ C^n with minimal ‖Ax − b‖_2

1: compute the reduced SVD A = Û Σ̂ V∗
2: compute Û∗ b ∈ C^n
3: solve Σ̂ y = Û∗ b
4: return x = V y

The result x of the calculation satisfies

    A∗A x = A∗ Û Σ̂ V∗ V y = A∗ Û Σ̂ y = A∗ Û Û∗ b = A∗ b

and thus it is the solution of LSQ.
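A sketch of the SVD variant in NumPy (numpy.linalg.svd with full_matrices=False returns the reduced SVD used in step 1):

import numpy as np

def lsq_svd(A, b):
    """Solve min ||Ax - b||_2 via the reduced SVD A = U S V*."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)   # reduced SVD
    y = (U.conj().T @ b) / s                           # solve S y = U* b
    return Vh.conj().T @ y                             # x = V y

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])
print(lsq_svd(A, b))      # same solution as the other two algorithms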

Remarks. 1) The three algorithms were presented in order of increasing stability.
2) The ratio of computational cost between the first two algorithms depends on the ratio
between m and n. The algorithm using SVD is more expensive than the other two.

6.3 Conditioning of LSQ

As before let A ∈ C^{m×n} with m ≥ n, rank(A) = n and b ∈ C^m. From corollary 6.7 we know
that the solution of LSQ can be calculated as

    x = (A∗A)^{−1} A∗ b.

Definition 6.9. The matrix A⁺ = (A∗A)^{−1} A∗ is called the pseudo-inverse of A ∈ C^{m×n}.

Definition 6.10. For A ∈ C^{m×n} define the condition number of A to be κ(A) = ‖A‖ ‖A⁺‖.

Remarks. If rank(A) < n then A∗A is not invertible and we set κ(A) = +∞. Since for square,
invertible matrices A we have A⁺ = A^{−1}, the definition of κ(A) is consistent with the previous
one. As before the condition number depends on the chosen matrix norm.

Lemma 6.11. The condition number in the ‖·‖_2-norm of a matrix A ∈ C^{m×n} is

    κ(A) = σ_1 / σ_n.

Proof. Using the singular value decomposition of A we find

    A⁺ = (A∗A)^{−1} A∗ = (V Σ∗Σ V∗)^{−1} V Σ∗ U∗ = V (Σ∗Σ)^{−1} Σ∗ U∗.   (6.2)

The matrix (Σ∗Σ)^{−1} Σ∗ ∈ C^{n×m} is diagonal with diagonal elements

    (1/σ_i²) σ_i = 1/σ_i

for i = 1, …, n. Thus equation (6.2) is (modulo ordering of the singular values) a singular value
decomposition of A⁺ and we find

    κ(A) = ‖A‖_2 ‖A⁺‖_2 = σ_1 · (1/σ_n) = σ_1 / σ_n.
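A quick numerical check of this lemma (np.linalg.cond with p = 2 should return exactly this ratio of singular values):

import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
s = np.linalg.svd(A, compute_uv=False)       # singular values, sorted decreasingly
print(s[0] / s[-1], np.linalg.cond(A, 2))    # both give kappa_2(A) = sigma_1 / sigma_n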

Theorem 6.12. Assume that x solves LSQ for b and x + δx solves LSQ for b + δb. Define
θ ∈ [0, π/2] by cos(θ) = ‖Ax‖_2 / ‖b‖_2 and let η = ‖A‖_2 ‖x‖_2 / ‖Ax‖_2 ≥ 1. Then we have

    ‖δx‖_2 / ‖x‖_2 ≤ ( κ(A) / (η cos θ) ) · ‖δb‖_2 / ‖b‖_2

where κ(A) is the condition number of A in the ‖·‖_2-norm.

Proof. We have x = A⁺ b and x + δx = A⁺(b + δb). Linearity then gives δx = A⁺ δb and we
get

    ‖δx‖_2 / ‖x‖_2 ≤ ‖A⁺‖_2 ‖δb‖_2 / ‖x‖_2 = κ(A) ‖δb‖_2 / ( ‖A‖_2 ‖x‖_2 )
                   = κ(A) ‖δb‖_2 / ( η ‖Ax‖_2 ) = ( κ(A) / (η cos θ) ) · ‖δb‖_2 / ‖b‖_2.

Remarks. The constant κ(A)/(η cos θ) becomes large if either κ(A) is large or θ → π/2. In
either of these cases the problem is badly conditioned.

Theorem 6.13. Let θ and η be as above. Assume that x solves LSQ for A, b and x + δx solves
LSQ for A + δA, b. Then

    ‖δx‖_2 / ‖x‖_2 ≤ ( κ(A) + κ(A)² tan(θ)/η ) · ‖δA‖_2 / ‖A‖_2

where κ(A) is the condition number of A in the ‖·‖_2-norm.

Theorem 6.14. The algorithm to solve LSQ via Householder QR-factorisation is backward
stable: the computed solution x̃ minimises ‖(A + δA) x̃ − b‖_2 for some matrix δA with

    ‖δA‖_2 / ‖A‖_2 = O(ε_m).

Theorems 6.13 and 6.14 together give estimates for the accuracy of the computed result x̃,
given that θ is bounded away from π/2.

Exercises
1) By following the proof of the existence of a singular value decomposition, find the SVD
of the matrix

    A = ( 1  2
          3  1 ).

2) You are given m data points (u^(i), v^(i)) where u^(i) ∈ R^{n−1} and v^(i) ∈ R for i = 1, …, m.
We would like to find α ∈ R^{n−1} and β ∈ R to minimise

    Σ_{i=1}^m | α^T u^(i) + β − v^(i) |².

Show that this problem may be reformulated as a standard least squares problem by specifying
appropriate choices of A ∈ R^{m×n} and b ∈ R^m.

Chapter 7

The Eigenvalue Problem


A vector x ∈ C^n is an eigenvector of a matrix A ∈ C^{n×n} with eigenvalue λ ∈ C if

    Ax = λx,    x ≠ 0.

In this chapter we will learn to solve the following problem: given A, compute the eigenvalues
and eigenvectors.

Definition 7.1. Given a matrix A ∈ C^{n×n} we define the characteristic polynomial of A as

    χ_A(z) := det(A − zI).

Theorem 7.2. A value λ ∈ C is an eigenvalue of the matrix A, if and only if χ_A(λ) = 0.

Proof. λ is an eigenvalue of A, if and only if there is an x ≠ 0 with (A − λI)x = 0. This is
equivalent to the condition that A − λI is singular, which in turn is equivalent to det(A − λI) = 0.
This shows that, if we can find the roots of arbitrary polynomials, we can also find the
eigenvalues of arbitrary matrices. We will now see that the two problems are even equivalent,
i.e. that for every polynomial we can find a matrix with this polynomial as its characteristic
polynomial. Let p(z) = a_0 + a_1 z + ⋯ + a_{n−1} z^{n−1} + z^n be given. Define A ∈ C^{n×n} by

    A = ( 0                    −a_0
          1   0                −a_1
              1   ⋱             ⋮
                  ⋱   0        −a_{n−2}
                      1        −a_{n−1} ).                            (7.1)

By induction one can show that χ_A(z) = det(A − zI) = (−1)^n p(z). This shows that the problem
of finding eigenvalues is equivalent to the problem of finding roots of a polynomial.
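A small numerical illustration of this equivalence (the polynomial coefficients below are an arbitrary example):

import numpy as np

def companion(coeffs):
    """Matrix (7.1) for p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1} + z^n,
    where coeffs = [a_0, ..., a_{n-1}]."""
    n = len(coeffs)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)        # ones on the subdiagonal
    A[:, -1] = -np.array(coeffs)      # last column -a_0, ..., -a_{n-1}
    return A

a = [2.0, -3.0, 1.0]                  # p(z) = 2 - 3z + z^2 + z^3
A = companion(a)
print(np.sort(np.linalg.eigvals(A)))          # eigenvalues of A
print(np.sort(np.roots([1.0] + a[::-1])))     # roots of p: same values up to rounding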

Theorem 7.3 (Abel, 1824). For every n ≥ 5 there is a polynomial p of degree n with rational
coefficients that has a real root which cannot be expressed only using rational numbers, addition,
subtraction, multiplication, division and taking kth roots.

As the computed result of a computer program will always be based on the operations
mentioned in the theorem, we find that it is not possible to find an algorithm which calculates
eigenvalues exactly after a finite number of steps. Conclusion: any eigenvalue solver must be
iterative.

Theorem 7.4. If A ∈ C^{n×n} is Hermitian, then there exists a unitary matrix Q ∈ C^{n×n} and a
diagonal matrix Λ ∈ R^{n×n} with

    A = Q Λ Q∗.

Remarks. 1) The orthonormal columns of Q are the eigenvectors of A and the diagonal entries
of Λ are the corresponding eigenvalues.
2) The theorem shows that Hermitian matrices have real eigenvalues.

The iterative methods presented in this chapter will calculate approximations to eigenvectors.
The following consideration will help to get an approximation for the corresponding eigenvalue
from this. Given A ∈ C^{n×n} and x ∈ C^n we try to find the λ ∈ C which minimises ‖Ax − λx‖_2.
If x is an eigenvector then the minimum is attained for the corresponding eigenvalue. Otherwise
we consider the normal equations: in the distance

    ‖Ax − λx‖_2 = ‖λx − Ax‖_2

we consider x to be an n×1-matrix, λ ∈ C^1 to be the unknown vector and Ax ∈ C^n to be the
right hand side. Then according to corollary 6.7 the minimum is attained for

    λ = (x∗x)^{−1} x∗(Ax) = ⟨x, Ax⟩ / ⟨x, x⟩.

Definition 7.5. The Rayleigh quotient of a matrix A ∈ C^{n×n} is defined by

    r_A(x) = ⟨x, Ax⟩ / ⟨x, x⟩

for all x ∈ C^n.
Theorem 7.6. Let A ∈ R^{n×n} be symmetric and x ∈ R^n with x ≠ 0. Then x is an eigenvector
of A with eigenvalue λ if and only if ∇r_A(x) = 0 and r_A(x) = λ.

Proof. The gradient of r_A can be calculated as

    ∇r_A(x) = ( ∂/∂x_i [ Σ_{j,k} x_j a_jk x_k / Σ_j x_j² ] )_{i=1,…,n}
            = ( [ ( Σ_{k≠i} a_ik x_k + Σ_{j≠i} x_j a_ji + 2 a_ii x_i ) Σ_j x_j²
                  − ( Σ_{j,k} x_j a_jk x_k ) 2 x_i ] / ( Σ_j x_j² )² )_{i=1,…,n}
            = ( 2 Ax ⟨x, x⟩ − 2 ⟨x, Ax⟩ x ) / ⟨x, x⟩²
            = (2 / ‖x‖_2²) ( Ax − r_A(x) x ).

Assume Ax = λx. Then r_A(x) = λ⟨x, x⟩/⟨x, x⟩ = λ and

    ∇r_A(x) = (2 / ‖x‖_2²) ( λx − λx ) = 0.

If on the other hand ∇r_A(x) = 0, then Ax − r_A(x) x = 0 and thus we get Ax = r_A(x) x.

For the remaining part of the chapter let x_1, …, x_n be an orthonormal system of eigenvectors
and λ_1, …, λ_n the corresponding eigenvalues of a real, symmetric matrix A. We order the
eigenvalues such that |λ_1| ≥ |λ_2| ≥ ⋯ ≥ |λ_n|.

Algorithm (power iteration).

input: A ∈ R^{n×n} symmetric with |λ_1| > |λ_2|
output: z^(k) ∈ R^n, λ^(k) ∈ R with z^(k) ≈ x_1 and λ^(k) ≈ λ_1

1: choose z^(0) ∈ R^n with ‖z^(0)‖_2 = 1
2: for k = 1, 2, 3, … do
3:   w^(k) = A z^(k−1)
4:   z^(k) = w^(k) / ‖w^(k)‖_2
5:   λ^(k) = ⟨z^(k), A z^(k)⟩
6: end for
Remarks. 1) For practical usage the method, as is every iterative method, is stopped at some
point where the result is close enough to the real one.
2) The algorithm calculates z^(k) = A^k z^(0) / ‖A^k z^(0)‖_2 and λ^(k) = r_A(z^(k)). To avoid
overflow/underflow errors the vector z^(k) is normalised in every step of the iteration.
3) The method is based on the following idea: if we express z^(0) in the basis x_1, …, x_n we
get

    z^(0) = Σ_{i=1}^n α_i x_i

and

    A^k z^(0) = Σ_{i=1}^n α_i A^k x_i = Σ_{i=1}^n α_i λ_i^k x_i.

For large k this expression is dominated by the term corresponding to the eigenvalue with the
largest modulus.
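A compact sketch of the power iteration in Python/NumPy (the fixed number of steps is an arbitrary choice; in practice one would stop once λ^(k) has stabilised):

import numpy as np

def power_iteration(A, z0, steps=100):
    z = z0 / np.linalg.norm(z0)
    for _ in range(steps):
        w = A @ z                    # step 3
        z = w / np.linalg.norm(w)    # step 4: normalise
        lam = z @ (A @ z)            # step 5: Rayleigh quotient r_A(z)
    return z, lam

A = np.array([[2.0, 1.0], [1.0, 3.0]])
z, lam = power_iteration(A, np.array([1.0, 0.0]))
print(lam, np.max(np.abs(np.linalg.eigvalsh(A))))   # dominant eigenvalue, two ways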

Theorem 7.7. Let |λ_1| > |λ_2| ≥ ⋯ ≥ |λ_n| and ⟨z^(0), x_1⟩ ≠ 0. Then there is a sequence
(σ^(k))_{k∈N} with σ^(k) ∈ {−1, +1} for all k ∈ N such that the sequences (z^(k)) and (λ^(k)) from the
power iteration algorithm satisfy

    ‖z^(k) − σ^(k) x_1‖_2 = O( |λ_2/λ_1|^k )                           (7.2)

and

    |λ^(k) − λ_1| = O( |λ_2/λ_1|^{2k} ).                               (7.3)
Proof. a) Let x_1, …, x_n be an orthonormal system of eigenvectors with eigenvalues λ_1, …, λ_n
and

    z^(0) = Σ_{i=1}^n α_i x_i.

Since α_1 = ⟨z^(0), x_1⟩ ≠ 0, we get

    A^k z^(0) = Σ_{i=1}^n α_i λ_i^k x_i = α_1 λ_1^k ( x_1 + Σ_{i=2}^n (α_i/α_1)(λ_i/λ_1)^k x_i )

and Pythagoras' theorem gives

    ‖A^k z^(0)‖_2² = |α_1 λ_1^k|² ( 1 + Σ_{i=2}^n |α_i/α_1|² |λ_i/λ_1|^{2k} )
                  ≤ |α_1 λ_1^k|² ( 1 + |λ_2/λ_1|^{2k} Σ_{i=2}^n |α_i/α_1|² ).

Using the Taylor approximation √(1 + x²) ≈ 1 + x²/2 we can conclude

    ‖A^k z^(0)‖_2 = |α_1 λ_1^k| ( 1 + O( |λ_2/λ_1|^{2k} ) ).

Now define σ^(k) = sgn(α_1 λ_1^k). Then

    ‖ A^k z^(0)/|α_1 λ_1^k| − σ^(k) x_1 ‖_2² = ‖ A^k z^(0)/(α_1 λ_1^k) − x_1 ‖_2²
        = Σ_{i=2}^n |α_i/α_1|² |λ_i/λ_1|^{2k} = O( |λ_2/λ_1|^{2k} )

and thus

    ‖z^(k) − σ^(k) x_1‖_2 ≤ ‖ A^k z^(0)/‖A^k z^(0)‖_2 − A^k z^(0)/|α_1 λ_1^k| ‖_2
                              + ‖ A^k z^(0)/|α_1 λ_1^k| − σ^(k) x_1 ‖_2
                          = O( |λ_2/λ_1|^{2k} ) + O( |λ_2/λ_1|^k ) = O( |λ_2/λ_1|^k ).

This finishes the proof of (7.2).
b) From theorem 7.6 we know r_A(σ^(k) x_1) = λ_1 and ∇r_A(σ^(k) x_1) = 0. Taylor expansion of
r_A around σ^(k) x_1 gives

    r_A(z^(k)) = r_A(σ^(k) x_1) + ⟨∇r_A(σ^(k) x_1), z^(k) − σ^(k) x_1⟩ + O( ‖z^(k) − σ^(k) x_1‖_2² )
              = λ_1 + 0 + O( ‖z^(k) − σ^(k) x_1‖_2² ).

Using (7.2) we get

    |λ^(k) − λ_1| = |r_A(z^(k)) − λ_1| = O( ‖z^(k) − σ^(k) x_1‖_2² ) = O( |λ_2/λ_1|^{2k} ).

This completes the proof of relation (7.3).

The power iteration algorithm helps to find the eigenvalue with the largest modulus and the
corresponding eigenvector. The method can be modified to find different eigenvalues of A. This
is done in the following algorithm.

Algorithm (inverse iteration).

input: A ∈ R^{n×n} symmetric, μ ∈ R
output: z^(k) ∈ R^n, λ^(k) ∈ R with z^(k) ≈ x_j and λ^(k) ≈ λ_j,
        where λ_j is the eigenvalue closest to μ

1: choose z^(0) ∈ R^n with ‖z^(0)‖_2 = 1
2: for k = 1, 2, 3, … do
3:   solve (A − μI) w^(k) = z^(k−1)
4:   z^(k) = w^(k) / ‖w^(k)‖_2
5:   μ^(k) = ⟨z^(k), (A − μI)^{−1} z^(k)⟩
6: end for
7: return λ^(k) = μ + 1/μ^(k)

Remarks. 1) For practical usage the method is stopped at some point where the result is close
enough to the real one.
2) By comparison with the power iteration method we find that μ^(k) approximates the eigenvalue
of (A − μI)^{−1} with the largest modulus. Since (A − μI)^{−1} has eigenvalues μ_i = 1/(λ_i − μ),
this value is μ_j where λ_j is the eigenvalue closest to μ, and we get

    λ^(k) = μ + 1/μ^(k) ≈ μ + 1/μ_j = μ + (λ_j − μ) = λ_j.

Using this argument it is easy to convert theorem 7.7 into a theorem about the speed of convergence
for the inverse iteration algorithm.
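A sketch of inverse iteration in NumPy (for simplicity the eigenvalue is estimated here by the Rayleigh quotient of A at z^(k), which converges to the same limit λ_j as the estimate λ^(k) = μ + 1/μ^(k) used above; the test matrix and shift μ are arbitrary):

import numpy as np

def inverse_iteration(A, mu, z0, steps=50):
    n = A.shape[0]
    z = z0 / np.linalg.norm(z0)
    B = A - mu * np.eye(n)
    for _ in range(steps):
        w = np.linalg.solve(B, z)        # step 3: solve (A - mu I) w = z
        z = w / np.linalg.norm(w)        # step 4
    lam = z @ (A @ z)                    # Rayleigh quotient of A at z
    return z, lam

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
z, lam = inverse_iteration(A, mu=3.2, z0=np.ones(3))
print(lam, np.linalg.eigvalsh(A))        # lam is the eigenvalue of A closest to 3.2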

The power iteration method only works if the initial vector z^(0) ∈ R^n satisfies the condition
⟨z^(0), x_1⟩ ≠ 0. This is no problem since dim{ z | ⟨z, x_1⟩ = 0 } = n − 1 < n. The probability
of hitting this hyperplane with a randomly chosen initial vector z^(0) is zero if the distribution
has a density with respect to the Lebesgue measure on R^n, and also at least one of the basis
vectors e_1, …, e_n satisfies the condition ⟨e_i, x_1⟩ ≠ 0.

The following algorithm extends the idea of the power iteration algorithm: it runs the power
iteration method for n orthonormal vectors simultaneously, re-orthonormalising them at every
step. The result is an algorithm which approximates all eigenvectors and all eigenvalues at once.

Algorithm (simultaneous iteration).

input: A ∈ R^{n×n} symmetric
output: Q^(k), Λ^(k) ∈ R^{n×n} with Q^(k) ≈ (x_1, x_2, …, x_n) and Λ^(k)_{ii} ≈ λ_i for i = 1, …, n

1: choose an orthogonal matrix Q^(0) ∈ R^{n×n}
2: for k = 1, 2, 3, … do
3:   W^(k) = A Q^(k−1)
4:   calculate the QR-factorisation W^(k) = Q^(k) R^(k)
5:   Λ^(k) = (Q^(k))^T A Q^(k)
6: end for
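A sketch of simultaneous iteration in NumPy (numpy.linalg.qr provides the QR-factorisation in step 4; the symmetric test matrix is the one used for inverse iteration above):

import numpy as np

def simultaneous_iteration(A, steps=200):
    n = A.shape[0]
    Q = np.eye(n)                         # Q^(0): any orthogonal matrix
    for _ in range(steps):
        W = A @ Q                         # step 3
        Q, R = np.linalg.qr(W)            # step 4
    Lam = Q.T @ A @ Q                     # step 5: approximately diagonal
    return Q, Lam

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
Q, Lam = simultaneous_iteration(A)
print(np.diag(Lam))                       # approximations of the eigenvalues
print(np.linalg.eigvalsh(A)[::-1])        # exact eigenvalues, largest first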
Theorem 7.8. Let A ∈ R^{n×n} be symmetric with eigenvalues λ_1, …, λ_n satisfying |λ_1| > ⋯ > |λ_n|.
Assume ⟨q_i^(0), x_i⟩ ≠ 0 for i = 1, …, n. Then there are sequences (σ_i^(k))_{k∈N} for i = 1, …, n with
σ_i^(k) ∈ {+1, −1} for all i, k with

    ‖ q_i^(k) − σ_i^(k) x_i ‖_2 = O( ρ^k )

and

    | Λ^(k)_{ii} − λ_i | = O( ρ^{2k} )

for i = 1, …, n, where q_1^(k), …, q_n^(k) are the columns of Q^(k) and ρ = max_{i=1,…,n−1} |λ_{i+1}| / |λ_i|.

Remarks. Since the matrices R^(k) are upper-triangular, the first column of Q^(k) in step 4 of
the algorithm is a multiple of the first column of the matrix W^(k). Thus the first column q_1^(k) of
the matrix Q^(k) performs the original power iteration algorithm with initial vector q_1^(0).

Exercises
1) Give a proof by induction which shows that the matrix A from (7.1) really satisfies
det(A − zI) = (−1)^n p(z) where p(z) = a_0 + a_1 z + ⋯ + a_{n−1} z^{n−1} + z^n.
2) Give a proof of theorem 7.8.

