Professional Documents
Culture Documents
1 2
Xiao-Qing JIN Yi-Min WEI
1
Department of Mathematics, University of Macau, Macau, P. R. China.
2
Department of Mathematics, Fudan University, Shanghai, P.R. China
2
i
To Our Families
ii
CONTENTS
page
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
Chapter 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Basic symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Basic problems in NLA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Why shall we study numerical methods? . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Matrix factorizations (decompositions) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.5 Perturbation and error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Operation cost and convergence rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
4.3 QR decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185
v
Preface
Numerical linear algebra, also called matrix computation, has been a center of sci-
entific and engineering computing since 1946, the first modern computer was born.
Most of problems in science and engineering finally become problems in matrix compu-
tation. Therefore, it is important for us to study numerical linear algebra. This book
gives an elementary introduction to matrix computation and it also includes some new
results obtained in recent years. In the beginning of this book, we first give an outline
of numerical linear algebra in Chapter 1.
In Chapter 2, we introduce Gaussian elimination, a basic direct method, for solving
general linear systems. Usually, Gaussian elimination is used for solving a dense linear
system with median size and no special structure. The operation cost of Gaussian
elimination is O(n3 ) where n is the size of the system. The pivoting technique is also
studied.
In Chapter 3, in order to discuss effects of perturbation and error on numerical
solutions, we introduce vector and matrix norms and study their properties. The error
analysis on floating point operations and on partial pivoting technique is also given.
In Chapter 4, linear least squares problems are studied. We will concentrate on
the problem of finding the least squares solution of an overdetermined linear system
Ax = b where A has more rows than columns. Some orthogonal transformations and
the QR decomposition are used to design efficient algorithms for solving least squares
problems.
We study classical iterative methods for the solution of Ax = b in Chapter 5.
Iterative methods are quite different from direct methods such as Gaussian elimination.
Direct methods based on an LU factorization of the matrix A are prohibitive in terms
of computing time and computer storage if A is quite large. Usually, in most large
problems, the matrices are sparse. The sparsity may be lost during the LU factorization
procedure and then at the end of LU factorization, the storage becomes a crucial issue.
For such kind of problem, we can use a class of methods called iterative methods. We
only consider some classical iterative methods in this chapter.
In Chapter 6, we introduce another class of iterative methods called Krylov sub-
space methods proposed recently. We will only study two versions among those Krylov
subspace methods: the conjugate gradient (CG) method and the generalized mini-
mum residual (GMRES) method. The CG method proposed in 1952 is one of the best
known iterative method for solving symmetric positive definite linear systems. The
GMRES method was proposed in 1986 for solving nonsymmetric linear systems. The
preconditioning technique is also studied.
Eigenvalue problems are particularly interesting in scientific computing. In Chapter
vi
Authors words on the corrected and revised second printing: In its second
printing, we corrected some minor mathematical and typographical mistakes in the
first printing of the book. We would like to thank all those people who pointed these
out to us. Additional comments and some revision have been made in Chapter 7.
The references have been updated. More exercises are also to be found in the book.
The second printing of the book is supported by the research grant No.RG081/04-
05S/JXQ/FST.
viii
Chapter 1
Introduction
Numerical linear algebra (NLA) is also called matrix computation. It has been a center
of scientific and engineering computing since the first modern computer came to this
world around 1946. Most of problems in science and engineering are finally transferred
into problems in NLA. Thus, it is very important for us to study NLA. This book gives
an elementary introduction to NLA and it also includes some new results obtained in
recent years.
Let
R denote the set of real numbers, C denote the set of complex numbers and
i 1.
Let Rn denote the set of real n-vectors and Cn denote the set of complex n-vectors.
Vectors will almost always be column vectors.
Let Rmn denote the linear vector space of m-by-n real matrices and Cmn denote
the linear vector space of m-by-n complex matrices.
We will use the upper case letters such as A, B, C, and , etc, to denote
matrices and use the lower case letters such as x, y, z, etc, to denote vectors.
The symbol AT will denote the transpose of the matrix A and A will denote the
conjugate transpose of the matrix A.
1
2 CHAPTER 1. INTRODUCTION
We will use det(A) to denote the determinant of the matrix A and use diag(a11 , , ann )
to denote the n-by-n diagonal matrix:
a11 0 0
..
0 a22 . . . .
diag(a11 , , ann ) = . .
.. ..
.. . . 0
0 0 ann
For matrix A = [aij ], the symbol |A| will denote the matrix with entries (|A|)ij =
|aij |.
and ei will denote the i-th unit vector, i.e., the i-th column vector of I.
As in MATLAB, in algorithms, A(i, j) will denote the (i, j)-th entry of matrix A;
A(i, :) and A(:, j) will denote the i-th row and the j-th column of A, respectively;
A(i1 : i2 , k) will express the column vector constructed by using entries from the
i1 -th entry to the i2 -th entry in the k-th column of A; A(k, j1 : j2 ) will express the
row vector constructed by using entries from the j1 -th entry to the j2 -th entry
in the k-th row of A; A(k : l, p : q) will denote the (l k + 1)-by-(q p + 1)
submatrix constructed by using the rows from the k-th row to the l-th row and
the columns from the p-th column to the q-th column.
Ax = b
(2) Linear least squares problems: For any m-by-n matrix A and an m-vector b, find
an n-vector x such that
(3) Eigenvalues problems: For any n-by-n matrix A, find a part (or all) of its eigen-
values and corresponding eigenvectors. We remark here that a complex number
is called an eigenvalue of A if there exists a nonzero vector x Cn such that
Ax = x,
Besides these main problems, there are many other fundamental problems in NLA,
for instance, total least squares problems, matrix equations, generalized inverses, in-
verse problems of eigenvalues, and singular value problems, etc.
Ax = b
where Ai , for i = 1, 2, , n, are matrices with the i-th column replaced by the vector
b. Then we should compute n + 1 determinants det(Ai ), i = 1, 2, , n, and det(A).
There are
[n!(n 1)](n + 1) = (n 1)(n + 1)!
multiplications. When n = 25, by using a computer with 10 billion operations/sec., we
need
24 26!
10
30.6 billion years.
10 3600 24 365
4 CHAPTER 1. INTRODUCTION
multiplications. Then less than 1 second, we could solve 25-by-25 linear systems by
using the same computer. From above discussions, we note that for solving the same
problem by using different numerical methods, the results are much different. There-
fore, it is essential for us to study the properties of numerical methods.
By substituting, we can easily solve (1.1) and then Ax = b. Therefore, matrix factor-
izations (decompositions) are very important tools in NLA. The following theorem is
basic and useful in linear algebra, see [17].
X 1 AX = J diag(J1 , J2 , , Jp ),
for i = 1, 2, , p, are called Jordan blocks with n1 + +np = n. The Jordan canonical
form of A is unique up to the permutation of diagonal Jordan blocks. If A Rnn with
only real eigenvalues, then the matrix X can be taken to be real.
1.5. PERTURBATION AND ERROR ANALYSIS 5
(1) Perturbation.
For a given x, we want to compute the value of function f (x). Suppose there
is a perturbation x of x and |x|/|x| is very small. We want to find a positive
number c(x) as small as possible such that
|f (x + x) f (x)| |x|
c(x) .
|f (x)| |x|
Then c(x) is called the condition number of f (x) at x. If c(x) is large, we say that
the function f is ill-conditioned at x; if c(x) is small, we say that the function f
is well-conditioned at x.
Remark: A computational problem being ill-conditioned or not has no relation
with numerical methods that we used.
(2) Error.
By using some numerical methods, we calculate the value of a function f at a
point x and we obtain y. Because of the rounding error (or chopping error),
usually
y 6= f (x).
If there exists x such that
y = f (x + x), |x| |x|,
where is a positive constant having a closed relation with numerical methods
and computers used, then we say that the method is stable if is small; the
method is unstable if is large.
Remark: A numerical method being stable or not has no relation with computa-
tional problems that we faced.
By using direct methods, one can obtain an accurate solution of computational prob-
lems within finite steps in exact arithmetic. By using iterative methods, one can only
obtain an approximate solution of computational problems within finite steps.
The operation cost is an important measurement of algorithms. The operation
cost of an algorithm is the total operations of +, , , used in the algorithm.
We remark that the speed of algorithms is only partially depending on the operation
cost. In modern computers, the speed of operations is much faster than that of data
transfer. Therefore, sometimes, the speed of an algorithm is mainly depending on the
total amount of data transfers.
For direct methods, usually, we use the operation cost as a main measurement of
the speed of algorithms. For iterative methods, we need to consider
(i) operation cost in each iteration;
(ii) convergence rate of the method.
where 0 < c < 1 and k k is any vector norm (see Chapter 3 for a detail), then we say
that the convergence rate is linear. If it satisfies
where 0 < c < 1 and p > 1, then we say that the convergence rate is superlinear.
Exercises:
1. A matrix is strictly upper triangular if it is upper triangular with zero diagonal elements.
Show that if A is a strictly upper triangular matrix of order n, then An = 0.
2. Let A Cnm and B Cml . Prove that
3. Let
A11 A12
A= ,
A21 A22
where Aij , for i, j = 1, 2, are square matrices with det(A11 ) 6= 0, and satisfy
Then
det(A) = det(A11 A22 A21 A12 ).
12
P
n
2
where aj = A(:, j) and kaj k2 = |A(i, j)| . When does the equality hold?
i=1
6. Let B be nilpotent, i.e., there exists an integer k > 0 such that B k = 0. Show that if
AB = BA, then
det(A + B) = det(A).
7. Let A be an m-by-n matrix and B be an n-by-m matrix. Show that the matrices
AB 0 0 0
and
B 0 B BA
are similar. Conclude that the nonzero eigenvalues of AB are the same as those of BA,
and
det(Im + AB) = det(In + BA).
M = M , x M x > 0,
9. Show that any matrix A Cnn can be written uniquely in the form
A = B + iC,
11. Let
A11 A12
A= .
A21 A22
Assume that A11 , A22 are square, and A11 , A22 A21 A1
11 A12 are nonsingular. Let
B11 B12
B=
B21 B22
12. Suppose that A and B are Hermitian with A being positive definite. Show that A + B
is positive definite if and only if all the eigenvalues of A1 B are greater than 1.
13. Let A be idempotent, i.e., A2 = A. Show that each eigenvalue of A is either 0 or 1.
14. Let A be a matrix with all entries equal to one. Show that A can be written as A =
eeT , where eT = (1, 1, , 1), and A is positive semi-definite. Find the eigenvalues and
eigenvectors of A.
15. Prove that any matrix A Cnn has a polar decomposition A = HQ, where H is
Hermitian positive semi-definite and Q is unitary. We recall that M Cnn is a unitary
matrix if M 1 = M . Moreover, if A is nonsingular, then H is Hermitian positive definite
and the polar decomposition of A is unique.
Chapter 2
The problem of solving linear systems is central in NLA. For solving linear systems, in
general, we have two classes of methods. One is called the direct method and the other
is called the iterative method. By using direct methods, within finite steps, one can
obtain an accurate solution of computational problems in exact arithmetic. By using
iterative methods, within finite steps, one can only obtain an approximate solution of
computational problems.
In this chapter, we will introduce a basic direct method called Gaussian elimination
for solving general linear systems. Usually, Gaussian elimination is used for solving a
dense linear system with median size and no special structure.
Ly = b (2.1)
9
10 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS
This algorithm is called the forward substitution method which needs O(n2 ) operations.
Now, we consider the following nonsingular upper triangular linear system
Ux = y (2.2)
where x = (x1 , x2 , , xn )T is an unknown vector, and U Rnn is given by
u11 u12 u13 u1n
..
0 u22 u23 .
.. ..
U = 0 0 u33 . .
.. .. .. .. ..
.
. . . .
0 0 unn
with uii 6= 0, i = 1, 2, , n. Beginning from the last equation of (2.2), we can obtain
xn , xn1 , , x1 step by step. The xn = yn /unn and xi is given by
1
n
X
xi = yi uij xj
uii
j=i+1
2.1. TRIANGULAR LINEAR SYSTEMS AND LU FACTORIZATION 11
A = LU
is called the Gaussian transform matrix. Such a matrix is a unit lower triangular
matrix. We remark that a unit triangular matrix is a triangular matrix with ones on
its diagonal. For any given vector
x = (x1 , x2 , , xn )T Rn ,
we have
Lk x = (x1 , , xk , xk+1 xk lk+1,k , , xn xk lnk )T
= (x1 , , xk , 0, , 0)T
12 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS
if we take
xi
lik = , i = k + 1, , n
xk
with xk 6= 0. It is easy to check that
L1 T
k = I + lk ek
Lk A = (I lk eTk )A = A lk (eTk A)
and
rank(lk (eTk A)) = 1.
Therefore, Lk A is a rank-one modification of the matrix A.
we have
1 5 9
L1 A = 0 6 11 .
0 12 17
Followed by using the Gaussian transform matrix
1 0 0
L2 = 0 1 0 ,
0 2 1
we have
1 5 9
L2 (L1 A) U = 0 6 11 .
0 0 5
2.1. TRIANGULAR LINEAR SYSTEMS AND LU FACTORIZATION 13
where
1 0 0
L (L2 L1 )1 = L1 1
1 L2 = 2 1 0 .
3 2 1
For general n-by-n matrix A, we can use n 1 Gaussian transform matrices L1 ,
L2 , , Ln1 such that Ln1 L1 A is an upper triangular matrix. In fact, let A(0) A
and assume that we have already found k1 Gaussian transform matrices L1 , , Lk1
Rnn such that " #
(k1) (k1)
(k1) A11 A12
A = Lk1 L1 A = (k1)
0 A22
(k1)
where A11 is a (k 1)-by-(k 1) upper triangular matrix and
(k1) (k1)
akk akn
(k1) .. .. ..
A22 = . . . .
(k1) (k1)
ank ann
(k1)
If akk 6= 0, then we can use the Gaussian transform matrix
Lk = I lk eTk
where
lk = (0, , 0, lk+1,k , , lnk )T
with
(k1)
aik
lik = (k1)
, i = k + 1, , n,
akk
such that the last n k entries in the k-th column of Lk A(k1) become zeros. We
therefore have " #
(k) (k)
(k) (k1) A11 A12
A Lk A = (k)
0 A22
(k)
where A11 is a k-by-k upper triangular matrix. After n 1 steps, we obtain A(n1)
which is an upper triangular matrix that we need. Let
L = (Ln1 L1 )1 , U = A(n1) ,
14 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS
L = L1 1
1 Ln1
= I + [l1 , l2 , , ln1 , 0]
1 0 0 0
l21 1 0 0
. . ..
=
l31 l32 1 . .
.
.. .. .. . .
. . . . 0
ln1 ln2 ln3 1
X
n1 n(n 1) n(n 1)(2n 1)
(n k) + 2(n k)2 = +
2 3
k=1
2 3
= n + O(n2 ) = O(n3 ).
3
(k1)
We remark that in Gaussian elimination, akk , k = 1, , n 1, are required to
be nonzero. We have the following theorem.
(i1)
Theorem 2.1 The entries aii 6= 0, i = 1, , k, if and only if all the leading
principal submatrices Ai of A, i = 1, , k, are nonsingular.
2.2. LU FACTORIZATION WITH PIVOTING 15
(k1) (i1)
where A11 is an upper triangular matrix with nonzero diagonal entries aii , i =
1, , k 1. Therefore, the k-th leading principal submatrix of A(k1) has the following
form " #
(k1)
A11
(k1) .
0 akk
Let (L1 )k , , (Lk1 )k denote the k-th leading principal submatrices of L1 , , Lk1 ,
respectively. By using (2.4), we obtain
" #
(k1)
A11
(Lk1 )k (L1 )k Ak = (k1) .
0 akk
Thus, we have
Theorem 2.2 If all the leading principal submatrices Ai of a matrix A Rnn are
nonsingular for i = 1, , n 1, then there exists a unique LU factorization of A.
x = (0.0000000000, 0.7000000000)T
x = (0.2000000000006 , 0.6999999999994 )T .
If we just interchange the first equation and the second equation, we have
1 1 x1 0.9
= .
0.3 1011 1 x2 0.7
By using Gaussian elimination with the 10-decimal-digit floating point arithmetic again,
we have
b 1 0 b 1 1
L= , U= .
0.3 1011 1 0 1
Then the computational solution is
x = (0.2000000000, 0.7000000000)T
The important properties of the permutation matrix are included in the following
lemma. Its proof is straightforward.
(i) P X is the same as X with its rows permuted. XP is the same as X with its
columns permuted.
(ii) P 1 = P T .
2.2. LU FACTORIZATION WITH PIVOTING 17
(iii) det(P ) = 1.
and
L21 u11 = A21 , e22 .
A22 = L21 U12 + A
Therefore, we obtain
e22 ) 6= 0.
det(A
Since
0 0
det(P1 AP2 ) = det(A) 6= 0
18 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS
and also
0 0 1 0 u11 U12
det(P1 AP2 ) = det det e22
L21 I 0 A
e22 ),
= u11 det(A
we know that
e22 ) 6= 0.
det(A
Therefore, by the assumption of induction, there exist permutation matrices Pe1 and
e
P2 such that
Pe1 A
e22 Pe2 = L
eUe, (2.6)
1 0 1 0 u11 U12
= e e e PeT
L21 I 0 P1T L 0 U 2
" #
1 0 u11 U12 Pe2 1 0
= e e
L21 P1T L 0 e
U 0 Pe2T
" #
1 0 1 0 u11 U12 Pe2 1 0
= ,
0 Pe1T e e
P1 L21 L 0 e
U 0 Pe2T
for k = 1 : n 1
choose p, q, (k p, q n) such that
|A(p, q)| = max {|A(i, j)| : i = k : n, j = k : n}
A(k, 1 : n) A(p, 1 : n)
A(1 : n, k) A(1 : n, q)
if A(k, k) =6 0
A(k + 1 : n, k) = A(k + 1 : n, k)/A(k, k)
A(k + 1 : n, k + 1 : n) = A(k + 1 : n, k + 1 : n)
A(k + 1 : n, k)A(k, k + 1 : n)
else
stop
end
end
We remark that although the LU factorization with complete pivoting can overcome
some shortcomings of the LU factorization without pivoting, the cost of complete
pivoting is very high. Usually, it requires O(n3 ) operations in comparison with entries
of the matrix for pivoting.
In order to reduce the operation cost of pivoting, the LU factorization with partial
(k1)
pivoting is proposed. In partial pivoting, at the k-th step, we choose apk from the
(k1)
submatrix A22 which satisfies
n o
(k1) (k1)
|apk | = max |aik | : k i n .
When A is nonsingular, the LU factorization with partial pivoting can be carried out
until we finally obtain
P A = LU.
In this algorithm, the operation cost in comparison with entries of the matrix for
pivoting is O(n2 ). We have
20 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS
A = AT , xT Ax > 0,
Theorem 2.4 Let A Rnn be symmetric positive definite. Then there exists a lower
triangular matrix L Rnn with positive diagonal entries such that
A = LLT .
Proof: Since A is positive definite, all the principal submatrices of A should be positive
definite. By Theorem 2.2, there exist a unit lower triangular matrix L e and an upper
triangular matrix U such that
e
A = LU.
Let
D = diag(u11 , , unn ), e = D1 U,
U
where uii > 0, for i = 1, , n. Then we have
e T DL
U e T = AT = A = LD
e Ue.
Therefore,
L e 1 = D1 U
eT U e
e T LD.
2.3. CHOLESKY FACTORIZATION 21
L e 1 = I = D1 U
eT U e T LD
e
e =L
which implies U e T . Thus
e L
A = LD eT .
Let
e
L = Ldiag( u11 , , unn ).
We finally have
A = LLT .
Thus, when a matrix A is symmetric positive definite, we could find the solution of
the system Ax = b by the following three steps:
From Theorem 2.4, we know that we do not need a pivoting in Cholesky factor-
ization. Also we could calculate L directly through a comparison in the corresponding
entries between two sides of A = LLT . We have the following algorithm.
Exercises:
1. Let S, T Rnn be upper triangular matrices such that
(ST I)x = b
Prove that a strictly diagonally dominant matrix is nonsingular, and a strictly diagonally
dominant symmetric matrix with positive diagonal entries is positive definite.
5. Let
A11 A12
A=
A21 A22
with A11 being a k-by-k nonsingular matrix. Then
S = A22 A21 A1
11 A12
is called the Schur complement of A11 in A. Show that after k steps of Gaussian elimi-
(k1)
nation without pivoting, A22 = S.
6. Let A be a symmetric positive definite matrix. At the end of the first step of Gaussian
elimination, we have
a11 aT1
.
0 A22
Prove that A22 is also symmetric positive definite.
7. Let A = [aij ] Rnn be a strictly diagonally dominant matrix. After one step of
Gaussian elimination, we have
a11 aT1
.
0 A22
Show that A22 is also strictly diagonally dominant.
8. Show that if P AQ = LU is obtained via Gaussian elimination with pivoting, then |uii |
|uij |, for j = i + 1, , n.
9. Let H = A + iB be a Hermitian positive definite matrix, where A, B Rnn .
2.3. CHOLESKY FACTORIZATION 23
10. Develop an algorithm to solve a tridiagonal system by using Gaussian elimination with
partial pivoting.
11. Show that if a singular matrix A Rnn has a unique LU factorization, then Ak is
nonsingular for k = 1, 2, , n 1.
24 CHAPTER 2. DIRECT METHODS FOR LINEAR SYSTEMS
Chapter 3
In this chapter, we will discuss effects of perturbation and error on numerical solutions.
The error analysis on floating point operations and on partial pivoting technique is also
given. It is well-known that the essential notions of distance and size in linear vector
spaces are captured by norms. We therefore need to introduce vector and matrix norms
and study their properties before we develop our perturbation and error analysis.
kxkp |xi |p
i=1
25
26 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS
where 1 p. The following p-norms are the most commonly used norms in practice:
n
n !1
X X 2
Theorem 3.1 If k k and k k are two norms on Rn , then there exist two positive
constants c1 and c2 such that
for all x Rn .
and
kxk kxk1 nkxk .
(k) (k)
We remark that for any sequence of vectors {xk } where xk = (x1 , , xn )T Rn ,
and x = (x1 , , xn )T Rn , by Theorem 3.1, one can prove that
(k)
lim kxk xk = 0 lim |xi xi | = 0,
k k
for i = 1, , n.
3.1. VECTOR AND MATRIX NORMS 27
Definition 3.2 A matrix norm is a function that assigns to each A Rnn a real
number kAk, called the norm of A, such that the following four properties are satisfied
for all A, B Rnn and all R:
An important property of matrix norms on Rnn is that all the matrix norms on
Rnn are equivalent. For the relation between a vector norm and a matrix norm, we
have
for A Rnn and x Rn , then these norms are called mutually consistent.
For any vector norm k kv , we can define a matrix norm in the following natural
way:
kAxkv
kAkM max = max kAxkv .
x6=0 kxkv kxkv =1
The most important matrix norms are the matrix p-norms induced by the vector p-
norms for p = 1, 2, . We have the following theorem.
P
n
(ii) kAk = max |aij |.
1in j=1
p
(iii) kAk2 = max (AT A), where max (AT A) is the largest eigenvalue of AT A.
Proof: We only give the proof of (i) and (iii). In the following, we always assume that
A 6= 0.
For (i), we partition the matrix A by columns:
A = [a1 , , an ].
Let
= kaj0 k1 = max kaj k1 .
1jn
P
n
Then for any vector x Rn which satisfies kxk1 = |xi | = 1, we have
i=1
P P
n n
kAxk1 = xj aj |xj | kaj k1
j=1 1 j=1
P
n
( |xj |) max kaj k1
j=1 1jn
= kaj0 k1 = .
kAej0 k1 = kaj0 k1 = .
Therefore
n
X
kAk1 = max kAxk1 = = max kaj k1 = max |aij |.
kxk1 =1 1jn 1jn
i=1
Let
v1 , v2 , , vn Rn
denote the orthonormal eigenvectors corresponding to 1 , 2 , , n , respectively. Then
for any vector x Rn with kxk2 = 1, we have
n
X n
X
x= i vi , i2 = 1.
i=1 i=1
Therefore,
n
X
xT AT Ax = i i2 1 .
i=1
On the other hand, let x = v1 , we have
Thus p q
kAk2 = max kAxk2 = 1 = max (AT A).
kxk2 =1
(iii) kAk2 = kQAZk2 , for any orthogonal matrices Q and Z. We recall that a matrix
M Rnn is called orthogonal if M 1 = M T .
Proof: We only prove (i). We first introduce the dual norm k kD of a vector norm
k k defined as follows,
kykD = max |y x|.
kxk=1
|y x| kyk2 kxk2
1
with equality when x = kyk2 y. Therefore, the dual norm of k k2 is given by
kykD
2 = max |y x| = max kyk2 kxk2 = kyk2 .
kxk2 =1 kxk2 =1
30 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS
One of the most important properties of k kF is that for any orthogonal matrices Q
and Z,
kAkF = kQAZkF .
In the following, we will extend our discussion on norms to the field of C. We remark
that from the viewpoint of norms, there is no essential difference between matrices or
vectors in the field of R and matrices or vectors in the field of C.
Definition 3.4 Let A Cnn . Then the set of all the eigenvalues of A is called the
spectrum of A and
For the relation between the spectral radius and matrix norms, we have
(A) kAk.
(ii) For any > 0, there exists a norm defined on Cnn such that
kAk (A) + .
3.1. VECTOR AND MATRIX NORMS 31
where n = 0.
(k)
We remark that for any sequence of matrices {A(k) } where A(k) = [aij ] Rnn ,
and A = [aij ] Rnn ,
(k)
lim kA(k) Ak = 0 lim aij = aij ,
k k
32 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS
for i, j = 1, , n.
Therefore,
lim (A)k = 0
k
which implies (A) < 1.
Conversely, assume that (A) < 1. By Theorem 3.4 (ii), there exists a matrix norm
k k such that kAk < 1. Therefore, we have
0 kAk k kAkk 0, k ,
i.e.,
lim Ak = 0.
k
By using Theorem 3.5, one can easily prove that following important theorem.
P
(ii) When Ak converges, we have
k=0
X
Ak = (I A)1 .
k=0
Moreover, there exists a norm defined on Cnn such that for any m,
m
X kAkm+1
1 k
(I A) A .
1 kAk
k=0
3.2. PERTURBATION ANALYSIS FOR LINEAR SYSTEMS 33
Corollary 3.1 Let k k be a norm defined on Cnn with kIk = 1 and A Cnn satisfy
kAk < 1. Then I A is nonsingular and satisfies
1
k(I A)1 k .
1 kAk
The solution of this linear system is x = (1, 1)T . If there is a small perturbation on b,
say,
= (1 104 , 1 104 )T ,
the system becomes
2.0001 1.9999 x1 4.0001
= .
1.9999 2.0001 x2 3.9999
kx xk 1 kk 1
= , = ,
kxk 2 kbk 40000
i.e., the relative error of the solution is 20000 times of that of the perturbation on b.
Thus, when we solve a linear system Ax = b, a good measurement, which can
tell us how sensitive the computed solution is to input small perturbations, is needed.
The condition number of matrices is then defined. It relates perturbations of x to
perturbations of A and b.
Definition 3.5 Let k k be any norm of matrix and A be a nonsingular matrix. The
condition number of A is defined as follows,
Obviously, the condition number depends on the matrix norm used. When (A) is
small, then A is said to be well-conditioned, whereas if (A) is large, then A is said to
be ill-conditioned. Note that for any p-norm, we have
kek kx xk
= .
kxk kxk
A(x + e) = Ax + Ae = b.
Therefore,
Ax = b Ae = b.
The x is the exact solution of Ax = b where b is a perturbed vector of b. Since x = A1 b
and x = A1 b, we have
Similarly,
kbk = kAxk kAk kxk,
i.e.,
1 kAk
. (3.4)
kxk kbk
Combining (3.3), (3.4) and (3.1), we obtain the following theorem which gives the effect
of perturbations of the vector b on the solution of Ax = b in terms of the condition
number.
3.2. PERTURBATION ANALYSIS FOR LINEAR SYSTEMS 35
The next theorem includes the effect of perturbations of the coefficient matrix A
on the solution of Ax = b in terms of the condition number.
Proof: Let
b and = b b.
E =AA
b = b, we have
By subtracting Ax = b from Ax
A(x x) = E x + .
Furthermore, we get
kx xk kxk kAxk kk
kA1 Ek + kA1 k .
kxk kxk kxk kbk
By using
kxk kx xk + kxk and kAxk kAk kxk,
we then have
kx xk kx xk kk
kA1 Ek + kA1 Ek + kA1 kkAk ,
kxk kxk kbk
i.e.,
kx xk kk
(1 kA1 Ek) kA1 Ek + (A) .
kxk kbk
Since
b < 1,
kA1 Ek kA1 k kEk = kA1 k kA Ak
36 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS
we get
kx xk 1 1 1 kk
(1 kA Ek) kA Ek + (A) .
kxk kbk
By using
kEk
kA1 Ek kA1 k kEk = (A) ,
kAk
we finally have
kx xk (A) kEk kk
+ .
kxk 1 (A) kEk
kAk
kAk kbk
Theorems 3.7 and 3.8 give upper bounds for the relative error of x in terms of
the condition number of A. From Theorems 3.7 and 3.8, we know that if A is well-
conditioned, i.e., (A) is small, the relative error in x will be small if the relative errors
in both A and b are small.
Corollary 3.2 Let k k be any matrix norm with kIk = 1 and A be a nonsingular
e being a perturbed matrix of A such that
matrix with A + A
e < 1.
kA1 Ak
e is nonsingular and
Then A + A
e 1 A1 k
k(A + A) (A) e
kAk
.
kA1 k e kAk
1 (A) kkAk
Ak
e kA1 k2
kAk
e 1 A1 k
k(A + A) .
e
1 kA1 Ak
Note that
A+A e
e = A[I (A1 A)].
e = A(I + A1 A)
e and r = kA1 Ak.
Let F = A1 A e Now,
e 1 = (I F )1 A1 .
(A + A)
By using identity
B 1 = A1 B 1 (B A)A1 ,
we have,
e 1 A1 = (A + A)
(A + A) e 1 AA
e 1 .
Then
1 2 e
e 1 A1 k kA1 k kAk
k(A + A) e 1 k kA k kAk .
e k(A + A)
1r
Finally, we obtain
e 1 A1 k
k(A + A) e
kA1 k kAk e
kA1 k kAk (A) e
kAk
.
kA1 k 1r e
1 kA1 k kAk e kAk
1 (A) kkAk
Ak
f = J , L J U,
where is the base, J is the order, and is the fraction. Usually, has the following
form:
= 0.d1 d2 dt
where t is the length (precision) of , d1 6= 0, and 0 di < , for i = 2, , t.
Let
F = {0} {f : f = J , 0 di < , d1 6= 0, L J U }.
Then F contains
2( 1) t1 (U L + 1) + 1
floating point numbers. These numbers are symmetrically distributed in the intervals
[m, M ] and [M, m], where
m = L1 , M = U (1 t ). (3.5)
We remark that F is only a finite set which cannot contain all the real numbers in
these two intervals.
Let f l(x) denote the floating point number of any real number x. Then
f l(x) = 0, for x = 0.
38 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS
Theorem 3.9 Let m |x| M , where m and M are defined by (3.5). Then
f l(x) = x(1 + ), || u,
Proof: In the following, we assume that x 6= 0 and x > 0. Let be an integer and
satisfy
1 x < . (3.6)
Since the order of floating point numbers in [ 1 , ) is , all the numbers
0.d1 d2 dt
are distributed in the interval with distance t . For the rounding error, by (3.6), we
have
1 1 1
|f l(x) x| t = 1 1t x 1t ,
2 2 2
i.e.,
|f l(x) x| 1
1t .
x 2
For the chopping error, we have
|f l(x) x| t = 1 1t x 1t ,
i.e.,
|f l(x) x|
1t .
x
3.3. ERROR ANALYSIS ON FLOATING POINT ARITHMETIC 39
Let us now consider the rounding error of elementary operations. Let a, b F and
represent any elementary operations: +, , , . By Theorem 3.9, we immedi-
ately have
f l(a b) = (a b)(1 + ), || u.
For the lower bound of (1 u)n , by using the Taylor expansion of (1 x)n , i.e.,
n(n 1)
(1 x)n = 1 nx + (1 x)n2 x2 ,
2
we have
1 nx (1 x)n .
Therefore,
1 1.01nu 1 nu (1 u)n . (3.8)
Now, we estimate the upper bound of (1 + u)n . By using the Taylor expansion of ex ,
we have
x2 x3
ex = 1 + x + + +
2! 3!
x x
= 1 + x + x 1 + + .
2 3
Therefore, when 0 x 0.01, we know that by using e0.01 < 2,
0.01 x
1 + x ex 1 + x + xe 1 + 1.01x. (3.9)
2
40 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS
|f l(xT y) xT y|.
Let
X
k
Sk = f l xi yi .
i=1
By Theorem 3.10, we have
S1 = x1 y1 (1 + 1 ), |1 | u,
and
Sk = f l(Sk1 + f l(xk yk ))
= [Sk1 + xk yk (1 + k )](1 + k ), |k |, |k | u.
Therefore,
P
n Q
n
f l(xT y) = Sn = xi yi (1 + i ) (1 + j )
i=1 j=i
P
n
= (1 + i )xi yi ,
i=1
where
n
Y
1 + i = (1 + i ) (1 + j )
j=i
Before we finish this section, let us briefly discuss the floating point analysis on
elementary matrix operations. We first introduce the following notations:
and
f l(A + B) = (A + B) + E, |E| u|A + B|.
Note that |A| |B| maybe is much larger than |AB|. Therefore the relative error of AB
may not be small.
(A + E)x = b,
where E is an error matrix. An upper bound of E is also given. We first study the
rounding error of the LU factorization of A.
Lemma 3.1 Let A Rnn with floating point entries. Assume that A has an LU
factorization and 6nu 1 where u is the machine precision. Then by using Gaussian
elimination, we have
eU
L e =A+E
where
e |U
|E| 3nu(|A| + |L| e |).
42 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS
Proof: We use induction on n. Obviously, Lemma 3.1 is true for n = 1. Assume that
the lemma holds for n 1. Now, we consider a matrix A Rnn :
wT
A= ,
v A1
l1 = v/ + f, u
|f | |v| (3.13)
||
and
e1 = A1 l1 wT + F,
A |F | (2 + u)u(|A1 | + |l1 | |w|T ). (3.14)
For Ae1 , by using the assumption, we obtain an LU factorization with a unit lower
triangular matrix Le 1 and an upper triangular matrix U
e1 such that
e1 U
L e1 = A
e1 + E1
where
e1 | + |L
|E1 | 3(n 1)u(|A e 1 | |U
e1 |).
Thus, we have
eUe= 1 0 wT
L l1 L
e1 e1 = A + E,
0 U
where
0 0
E= .
f E1 + F
|E1 + F | |E1 | + |F |
e1 | + |L
3(n 1)u(|A e1 |) + (2 + u)u(|A1 | + |l1 | |w|T )
e 1 | |U
3(n 1)u (1 + 2u + u2 )(|A1 | + |l1 | |w|T ) + |L
e 1 | |U
e1 |
e 1 | |U
+3(n 1)u(|L e1 |)
e |U
= 3nu(|A| + |L| e |).
Corollary 3.3 Let A Rnn be nonsingular with floating point entries and 6nu 1.
Assume that by using Gaussian elimination with partial pivoting, we obtain
eU
L e = PA + E
Lemma 3.2 Let S Rnn be a nonsingular triangular matrix with floating point
entries and 1.01nu 0.01. By using the method proposed in Section 2.1.1 to solve
Sx = b, we then obtain a computational solution x which satisfies
(S + H)x = b,
where
|H| 1.01nu|S|.
L1 y = f l(c x1 l1 ).
By assumption, we have
(L1 + H1 )y = f l(c x1 l1 )
where
|H1 | 1.01(n 1)u|L1 |. (3.16)
By Theorem 3.10 again, we obtain
where
D = diag(2 , , n ), D = diag(2 , , n )
3.4. ERROR ANALYSIS ON PARTIAL PIVOTING 45
with
|i |, |i | u, i = 2, , n.
Therefore,
x1 l1 + x1 D l1 + (I + D )(L1 + H1 )y = c,
and then
(L + H)x = b,
where
1 l11 0
H= .
D l1 H1 + D (L1 + H1 )
By using (3.15), (3.16) and the condition 1.01nu 0.01, we have
|1 | |l11 | 0
|H|
|D | |l1 | |H1 | + |D |(|L1 | + |H1 |)
u|l11 | 0
u|l1 | |H1 | + u(|L1 | + |H1 |)
|l11 | 0
u
|l1 | [1.01(n 1) + 1 + 1.01(n 1)u]|L1 |
1.01nu|L|.
Theorem 3.12 Let A Rnn be a nonsingular matrix with floating point entries and
1.01nu 0.01. If Gaussian elimination with partial pivoting is used to solve Ax = b,
we then obtain a computational solution x which satisfies
(A + A)x = b,
where
kAk u(3n + 5.04n3 )kAk (3.17)
with the growth factor
1 (k)
max |a |.
kAk i,j,k ij
Proof: By using Gaussian elimination with partial pivoting, we have the following two
triangular systems:
e = P b,
Ly e x = y.
U
46 CHAPTER 3. PERTURBATION AND ERROR ANALYSIS
e + F )(U
(L e + G)x = P b,
i.e.,
eU
(L e + FU
e + LG
e + F G)x = P b, (3.18)
where
e
|F | 1.01nu|L|, e |.
|G| 1.01nu|U (3.19)
eU
Substituting L e = P A + E into (3.18), we have
(A + A)x = b,
where
e + LG
A = P T (E + F U e + F G).
By using (3.19), Corollary 3.3 and the condition 1.01nu 0.01, we have
e |U
|A| P T (3nu|P A| + (3n + 2.04n)u|L| e |)
(3.20)
= nuP T (3|P A| e |U
+ 5.04|L| e |).
We remark that kAk usually is very small comparing with the initial error from
given data. Thus, Gaussian elimination with partial pivoting is numerically stable.
Exercises:
1. Let
1 0.999999
A= .
0.999999 1
Compute A1 , det(A) and the condition number of A.
3.4. ERROR ANALYSIS ON PARTIAL PIVOTING 47
kA1 k1
2 = min kAxk2 .
kxk2 =1
(I S)1 (I + S)
D = diag(d11 , , dnn ).
Show that
max{dii }
i
2 (A) .
min{dii }
i
T = I + V T A1 U
(A + U V T )1 = A1 A1 U T 1 V T A1 .
Chapter 4
where the data matrix A Rmn with m n and the observation vector b Rm
are given. We introduce some well-known orthogonal transformations and the QR
decomposition for constructing efficient algorithms for these problems. For a literature
on least squares problems, we refer to [15, 21, 42, 44, 45, 48].
49
50 CHAPTER 4. LEAST SQUARES PROBLEMS
and
b = (y1 , , ym )T , x = (x1 , , xn )T , r(x) = (r1 (x), , rm (x))T .
When m = n, we can require that r(x) = 0 and x can be found by solving the system
Ax = b. When m > n, we require that r(x) can reach its minimum under the norm
k k2 . We therefore introduce the following definition of the least squares problem.
It is called the least squares (LS) problem and r(x) is called the residual.
rank(A) = n < m.
Ax = b, A Rmn . (4.2)
R(A) {y Rm : y = Ax, x Rn }.
N (A) {x Rn : Ax = 0}.
Theorem 4.1 The equation (4.2) has solutions rank(A) = rank([A, b]).
Theorem 4.2 Let x be a special solution of (4.2). Then the solution set of (4.2) is
given by x + N (A).
4.1. LEAST SQUARES PROBLEMS 51
Corollary 4.1 Assume that the equation (4.2) has some solution. The solution is
unique null(A) = 0.
Theorem 4.3 The LS problem (4.1) always has solutions. The solution is unique if
and only if null(A) = 0.
Proof: Since
Rm = R(A) R(A) ,
the vector b can be expressed uniquely by
b = b1 + b2
Note that kr(x)k22 reaches the minimum if and only if kb1 Axk22 reaches the minimum.
Since b1 R(A), kr(x)k22 reaches its minimum if and only if
Ax = b1 ,
i.e.,
kb1 Axk22 = 0.
Thus, by Corollary 4.1, we know that the solution of Ax = b1 is unique, i.e., the solution
of (4.1) is unique, if and only if
null(A) = 0.
Let
X = {x Rn : x is a solution of (4.1)}.
We have
AT Ax = AT b. (4.3)
52 CHAPTER 4. LEAST SQUARES PROBLEMS
r(x) = b Ax = b b1 = b2 R(A) .
Therefore
AT r(x) = AT b2 = 0.
Substituting r(x) = b Ax into AT r(x) = 0, we obtain (4.3).
Conversely, let x Rn satisfy
AT Ax = AT b,
= kb Axk22 + kAyk22
kb Axk22 .
Thus, x X .
We remark that in computation of AT A, usually, the operation cost is O(n2 m), and
some information of matrix A could be lost. For example, we consider
1 1 1
0 0
A= 0 0 .
0 0
We have
1 + 2 1 1
AT A = 1 1 + 2 1 .
1 1 1 + 2
Assume that = 103 and a 6-digital decimal floating system is used. Then 1 + 2 =
1 + 106 is rounded off to be 1, which means that AT A is singular!
4.1. LEAST SQUARES PROBLEMS 53
x = (AT A)1 AT b.
If we let
A = (AT A)1 AT ,
then the LS solution x could be written as
x = A b.
Then
x = A b,
and
x + x = A (b + b) = A b
where b = b + b. We have
Theorem 4.5 Let b1 and b1 denote orthogonal projections of b and b on R(A), respec-
tively. If b1 6= 0, then
kxk2 kb1 k2
2 (A)
kxk2 kb1 k2
where 2 (A) = kAk2 kA k2 and b1 = b1 + b1 .
A b = A b1 + A b2 = A b1 + (AT A)1 AT b2 = A b1 .
54 CHAPTER 4. LEAST SQUARES PROBLEMS
Similarly, A b = A b1 . Therefore,
Since Ax = b1 , we have
kb1 k2 kAk2 kxk2 . (4.5)
By combining (4.4) and (4.5), the proof is complete.
We remark that the condition number 2 (A) is important for LS problems. When
2 (A) is large, we say that the LS problem is ill-conditioned. When 2 (A) is small, we
say that the LS problem is well-conditioned.
Theorem 4.6 Suppose that column vectors of A are linearly independent. Then
Therefore,
H = I 2 T . (4.6)
Theorem 4.7 Let H be defined as in (4.6). Then H has the following properties:
(ii) H 2 = I.
Proof: We only prove (iii). Note that for any vector x Rn , it can be expressed as:
x = u +
Hx = (I 2 T )(u + ) = u + 2 T u 2 T = u .
Theorem 4.8 For any 0 6= x Rn , we can construct a unit vector such that the
Householder transformation defined as in (4.6) satisfies
Hx = e1
where = kxk2 .
We then have
Hx = x 2( T x)
2 T
= x 2 (x eT1 )x (x e1 )
kx e1 k2
where x1 is the first component of the vector x. Let the coefficient of x be zero and
then we have the following equation:
2(kxk22 x1 )
1 = 0.
kx e1 k22
Solving this equation for , we have = kxk2 . Substituting it into (4.7), we therefore
have
Hx = kxk2 e1 .
We remark that for any vector 0 6= x Rn , by Theorem 4.8, one can construct a
Householder matrix H such that the last n 1 components of Hx are zeros. We can
use the following two steps to construct the unit vector of H:
Now a natural question is: how to choose the sign in front of kxk2 ? Usually, we
choose
v = x + sign(x1 )kxk2 e1 ,
where x1 6= 0 is the first component of the vector x, see [38]. Since
2
H = I 2 T = I vv T = I vv T
vT v
where = 2/v T v, we only need to compute and v instead of forming . Thus, we
have the following algorithm.
where c = cos and s = sin . It is easy to prove that G(i, k, ) is an orthogonal matrix.
Let x Rn and y = G(i, k, )x. We then have
Therefore, q
yi = x2i + x2k , yk = 0.
We remark that for any vector 0 6= x Rn , one can construct a Givens rotation
G(i, k, ) acting on x to make a nonzero component of x be zero.
4.3 QR decomposition
Let A Rmn and b Rm . By Theorem 3.3 (iii), for any orthogonal matrix Q, we
have
kAx bk2 = kQT (Ax b)k2 .
Therefore, the LS problem
min kQT Ax QT bk2
x
is equivalent to (4.1). We wish that we could find a suitable orthogonal matrix Q such
that the original LS problem becomes an easier solvable LS problem. We have
58 CHAPTER 4. LEAST SQUARES PROBLEMS
QT1 a1 = ka1 k2 e1 .
Therefore, we have
ka1 k2 v T
QT1 A = .
0 A1
For the matrix A1 R(m1)(n1) , we obtain by assumption,
R2
A1 = Q2 ,
0
Then Q and R are the matrices satisfying the conditions of the theorem.
When A Rmm is nonsingular, we want to show that the QR decomposition is
unique. Let
A = QR = Q eR
e
where Q, Qe Rmm are orthogonal matrices, and R, R e Rmm are upper triangular
matrices with nonnegative diagonal entries. Since A is nonsingular, we know that the
e are positive. Therefore, the matrices
diagonal entries of R and R
e T Q = RR
Q e 1
are both orthogonal and upper triangular matrices with positive diagonal entries. Thus
e T Q = RR
Q e 1 = I,
4.3. QR DECOMPOSITION 59
i.e.,
e = Q,
Q e = R.
R
Exercises:
4.3. QR DECOMPOSITION 61
1. Let A Rmn have full column rank. Prove that A + E also has full column rank if E
satisfies kEk2 kA1 k2 , where A = (AT A)1 AT .
2. Let U = [uij ] be a nonsingular upper triangular matrix. Show that
max |uii |
i
(U ) ,
min |uii |
i
where (U ) = kU k kU 1 k .
3. Let A Rmn with m n and have full column rank. Show that
I A r b
=
AT 0 x 0
P x = kxk2 e1 .
(a) Choose a small such that rank(A) = 3. Then compute 2 (A) to show that A is
ill-conditioned.
(b) Find the LS solution with A given as above and b = (3, , , )T by using
(i) the normalized equation method;
(ii) the QR method.
Show that
A = C (CC )1 (B B)1 B .
A = V U .
N (AA ) = N (AA ) = N (A ) = N (A ),
N (A A) = N (A A) = N (A).
63
64 CHAPTER 5. CLASSICAL ITERATIVE METHODS
0
a21 0
a31 a32 0
L= ,
.. .. .. ..
. . . .
an1 an2 an,n1 0
and
0 a12 a13 a1n
0 a a2n
23
. .. . .. ..
U = . .
0 an1,n
0
Then it is easy to see that
x = BJ x + g,
where
BJ = D1 (L + U ), g = D1 b.
The matrix BJ is called the Jacobi iteration matrix. The corresponding iteration
xk = BJ xk1 + g, k = 1, 2, , (5.1)
T
(0) (0) (0)
is known as the Jacobi method if an initial vector x0 = x1 , x2 , , xn is given.
BJ = D1 (L + U ), g = D1 b;
The iteration (5.3) is called linear stationary iteration, where B Rnn is called the
iteration matrix, g Rn the constant term, and x0 Rn the initial vector. In the
following, we give a convergence theorem.
Theorem 5.1 The iteration (5.3) converges with an arbitrary initial guess x0 if and
only if the matrix B k 0 as k .
x xk+1 = B 2 (x xk1 ).
x xk+1 = B k+1 (x x0 ).
This shows that {xk } converges to the solution x for any choice x0 if and only if B k 0
as k .
66 CHAPTER 5. CLASSICAL ITERATIVE METHODS
Recall that B k 0 as k if and only if the spectral radius (B) < 1. Since
|i | kBk, a good way to see whether (B) < 1 is to see whether kBk < 1 by
computing kBk with a row-sum or column-sum norm. Note that the converse is not
true. Combining the result of Theorem 5.1 with the above observation, we have the
following theorem.
Theorem 5.2 The iteration (5.3) converges for any choice of x0 if and only if (B) <
1. Moreover, if kBk < 1 for some matrix norm, then the iteration (5.3) converges.
It is easy to verify that for A1 , the Jacobi method converges even if the Gauss-Seidel
method does not. For A2 , the Jacobi method diverges while the Gauss-Seidel method
converges.
then the matrix A is called strictly diagonally dominant. The matrix A is called weakly
diagonally dominant if for i = 1, 2, , n,
n
X
|aii | |aij |
j=1
j6=i
Theorem 5.3 If A is strictly diagonally dominant, then the Jacobi method converges
for any initial approximation x0 .
5.2. CONVERGENCE ANALYSIS 67
BJ = D1 (L + U )
is given by
0 aa12
11
aa1n11
a21 0 aa23 aa2n
a22 22 22
.. .. .. .. ..
BJ =
. . . . . .
.. .. .. a
. . . n1,n
an1,n1
an,n1
aann
n1
ann 0
We know that the absolute row sum of each row is less than 1, which means
kBJ k < 1.
Theorem 5.4 If A is strictly diagonally dominant, then the Gauss-Seidel method con-
verges for any initial approximation x0 .
BGS = (D L)1 U.
BGS x = x,
we have
U x = (D L)x,
i.e.,
n
X i
X
aij xj = aij xj , 1 i n,
j=i+1 j=1
68 CHAPTER 5. CLASSICAL ITERATIVE METHODS
Let xk be the largest component having the magnitude 1 of the vector x. Then by
(5.6), we have
k1
X n
X
|| |akk | || |akj | + |akj |
j=1 j=k+1
i.e.,
k1
X n
X
||(|akk | |akj |) |akj |
j=1 j=k+1
or
P
n
|akj |
j=k+1
|| (5.7)
P
k1
|akk | |akj |
j=1
Thus from (5.7), we conclude that || < 1, i.e., (BGS ) < 1. By Theorem 5.2, the
Gauss-Seidel method converges.
We now discuss the convergence of the Jacobi, Gauss-Seidel methods for the sym-
metric positive definite matrices.
Theorem 5.5 Let A be symmetric with diagonal entries aii > 0, i = 1, 2, , n. Then
the Jacobi method converges if and only if both A and 2D A are positive definite.
Proof: Since
BJ = D1 (L + U ) = D1 (D A) = I D1 A,
and
D = diag(a11 , a22 , , ann )
with aii > 0, i = 1, 2, , n, then
It is easy to see that I D1/2 AD1/2 symmetric and similar to BJ . Then the eigen-
values of BJ are real.
Now, we first suppose that the Jacobi method converges, then (BJ ) < 1 by Theo-
rem 5.2. The absolute value of the eigenvalues of
I D1/2 AD1/2
is less than 1, i.e., the eigenvalues of D1/2 AD1/2 lies on (0, 2). Thus A is positive
definite. On the other hand, the eigenvalues of 2I D1/2 AD1/2 are positive, so the
matrix
2I D1/2 AD1/2
is positive definite. Since
and A is positive definite, it follows that the eigenvalues of I BJ are positive, i.e., the
eigenvalues of BJ are less than 1. Because 2D A is positive definite and
we can deduce that the eigenvalues of I + BJ are positive, i.e., the eigenvalues of BJ
are greater than 1. Thus (BJ ) < 1. By Theorem 5.2 again, the Jacobi method
converges.
Theorem 5.6 Let A be a symmetric positive definite matrix. Then the Gauss-Seidel
method converges for any initial approximation x0 .
Proof: Let be an eigenvalue of the iteration matrix BGS and u the corresponding
eigenvector. Then
(D L)1 U u = u.
Since A is symmetric positive definite, we have U = LT and
(D L)u = LT u.
Therefore,
u (D L)u = u LT u.
70 CHAPTER 5. CLASSICAL ITERATIVE METHODS
Let
u Du = , u Lu = + i.
We then have
u LT u = (Lu) u = u Lu = i.
Thus
[ ( + i)] = i.
Taking the modulus of both sides, we have
2 + 2
||2 = .
( )2 + 2
0 < u Au = u (D L LT )u = 2.
Hence,
( )2 + 2 = 2 + 2 + 2 2
= ( 2) + 2 + 2
> 2 + 2 .
Furthermore,
(i) If A is strictly diagonally dominant, then both the Jacobi and the Gauss-Seidel
methods converge. In fact,
(ii) If A is irreducible and weakly diagonally dominant, then both the Jacobi and the
Gauss-Seidel methods converge. Moreover,
xk+1 = M xk + g, k = 0, 1, ,
yk = M yk1 = = M k y0 .
kyk k kM k k ky0 k
with the equality possible for each k for some vector y0 . Thus, if y0 is not the zero
vector, then kM k k gives us a sharp upper-bound estimate for the ratio kyk k/ky0 k. Since
the initial vector y0 is unknown in practical problems, kM k k serves as a measurement
of comparison of different iterative methods.
Definition 5.3 Let M be an n-by-n iteration matrix. If kM k k < 1 for some positive
integer k, then
ln kM k k
Rk (M ) ln[(kM k k)1/k ] =
k
is called the average rate of convergence for k iterations of M .
72 CHAPTER 5. CLASSICAL ITERATIVE METHODS
is the average reduction factor per iteration for the norm of error. If kM k k < 1, then
by Definition 5.3, we have
kM k k2 = [(M )]k ,
and thus,
Rk (M ) = ln (M ),
which is independent of k.
Next we will consider the asymptotic convergence rate
R (M ) lim Rk (M ).
k
lim kM k k1/k = (M ).
k
Since
[(M )]k = (M k ) kM k k,
we have
(M ) kM k k1/k .
On the other hand, for any > 0, consider the matrix
1
B = M.
(M ) +
It is obvious that (B ) < 1 and then lim Bk = 0. Hence, there exists a natural
k
number K, for k K, we have
kBk k 1,
5.4. SOR METHOD 73
i.e.,
kM k k [(M ) + ]k .
Thus,
(M ) kM k k1/k (M ) + ,
which means
lim kM k k1/k = (M ).
k
(k) (k) (k) T
Here D1 (L+U ) = [cij ], xk = x1 , x2 , , xn , and g = D1 b = (g1 , g2 , , gn )T .
In matrix form, we have
xk+1 = L xk + (D L)1 b
where
L = (D L)1 [(1 )D + U ]
is called the iteration matrix of the SOR method and is the relaxation parameter.
We have three cases depending on the values of :
(1) if = 1, (5.8) is equivalent to the Gauss-Seidel method;
(2) if < 1, (5.8) is called underrelaxation;
(3) if > 1, (5.8) is called overrelaxation.
74 CHAPTER 5. CLASSICAL ITERATIVE METHODS
Theorem 5.9 The SOR iteration cannot converge for any initial approximation if
lies outside the interval (0,2).
L = (D L)1 [(1 )D + U ],
where A = [aij ] = D L U .
The matrix (D L)1 is a lower triangular matrix with 1/aii , i = 1, 2, , n, as
diagonal entries, and the matrix
(1 )D + U
(L ) < 1
L = (D L)1 [(1 )D + LT ]
[(1 )D + LT ]x = (D L)x
5.4. SOR METHOD 75
or
x [(1 )D + LT ]x = x (D L)x.
Let
x Dx = , x Lx = + i.
Therefore, x LT x = i and then
(1 ) + ( i) = [ ( + i)].
Taking modulus of both sides, we obtain
[(1 ) + ]2 + 2 2
||2 = . (5.9)
( )2 + 2 2
Note that
[(1 ) + ]2 + 2 2 ( )2 2 2
= [ ( )]2 ( )2
= ( 2)( 2).
Since A is symmetric positive definite, we have
> 0, 2 > 0.
Therefore, if 0 < < 2, we have
[(1 ) + ]2 + 2 2 < ( )2 + 2 2 . (5.10)
Thus, for 0 < < 2, we obtain by (5.9) and (5.10),
||2 < 1,
i.e., the SOR method converges.
Definition 5.4 A matrix M has property A if there exists a permutation P such that
T D11 M12
PMP =
M21 D22
where D11 and D22 are diagonal matrices.
76 CHAPTER 5. CLASSICAL ITERATIVE METHODS
where
b= D11 0 b= 0 0 b= 0 M12
D , L , U .
0 D22 M21 0 0 0
Let
cJ () D
B b + 1D
b 1 L b 1 U
b.
We have
cJ () are independent of .
Theorem 5.11 The eigenvalues of B
Note that BJ (1) = BJ is the Jacobi iteration matrix. From Theorem 5.11, we have
known that if M has property A, then P M P T is consistently ordered where P is a
permutation matrix such that
T D11 M12
PMP =
M21 D22
with D11 and D22 being diagonal. It is not true that consistent ordering implies prop-
erty A.
5.4. SOR METHOD 77
( + 1)2 = 2 2 , (5.11)
then is an eigenvalue of L .
This means that the convergence rate of the Gauss-Seidel method is twice as fast as
that of the Jacobi method.
= 2 .
To get the most benefit from overrelaxation, we would like to find an optimal ,
denoted by opt , minimizing (L ). We have the following theorem, see [14, 41].
78 CHAPTER 5. CLASSICAL ITERATIVE METHODS
Theorem 5.13 Suppose that A is consistently ordered and BJ has real eigenvalues
with = (BJ ) < 1. Then
2
opt = p ,
1 + 1 2
2
(Lopt ) = p ,
(1 + 1 2 )2
and
1,
opt < 2,
(L ) = q
1 + 1 2 2 + 1 + 1 2 2 ,
2 4 0 < opt .
Exercises:
1. Judge the convergence of the Jacobi method and the Gauss-Seidel method for the fol-
lowing examples:
1 2 2 2 1 1
A1 = 1 1 1 , A2 = 1 1 1 .
2 2 1 1 1 2
2. Show that the Jacobi method converges for 2-by-2 symmetric positive definite systems.
3. Show that if A = M N is singular, then we can never have (M 1 N ) < 1 even if M is
nonsingular.
4. Consider Ax = b where
1 0
A= 0 1 0 .
0 1
5. Let A Rnn be nonsingular. Show that there exists a permutation matrix P such that
the diagonal entries of P A are nonzero.
6. Prove that
4 1 1 0
1 4 0 1
A=
1 0
4 1
0 1 1 4
is consistently ordered.
5.4. SOR METHOD 79
7. Let A Rnn . Then (A) < 1 if and only if I A nonsingular and each eigenvalue of
(I A)1 (I + A) has a positive real part.
8. Let B Rnn satisfy (B) = 0. Show that for any g, x0 Rn , the iterative formula
xk+1 = Bxk + g, k = 0, 1,
(2)
kAk kp
lim = 2, p = 1, 2, .
k k
(3) (Ak ) = 1.
12. Let
Bk = Bk1 + Bk1 (I ABk1 ), k = 1, 2, .
Show that if kI AB0 k = c < 1, then
lim Bk = A1
k
and k
c2
kA1 Bk k kB0 k.
1c
13. Prove Theorem 5.12.
14. Prove Theorem 5.13.
80 CHAPTER 5. CLASSICAL ITERATIVE METHODS
Chapter 6
In this chapter, we will introduce a class of iterative methods called Krylov subspace
methods. Among Krylov subspace methods developed for large sparse problems, we will
mainly study two methods: the conjugate gradient (CG) method and the generalized
minimum residual (GMRES) method. The CG method proposed by Hestenes and
Stiefel in 1952 is one of the best known iterative methods for solving symmetric positive
definite linear systems, see [16]. The GMRES method was proposed by Saad and
Schultz in 1986 for solving nonsymmetric linear systems, see [34]. As usual, let us
begin our discussion from the steepest descent method.
Theorem 6.1 Let A Rnn be a symmetric positive definite matrix. Then finding
the solution of Ax = b is equivalent to finding the minimum of function (6.1).
81
82 CHAPTER 6. KRYLOV SUBSPACE METHODS
where grad(x) denotes the gradient of (x) and r = b Ax. If (x) reaches its
minimum at a point x , then
grad(x ) = 0,
i.e., Ax = b which means that x is the solution of the system.
Conversely, if x is the solution of the system, then for any vector y, we have
= xT Ax 2bT x + y T Ay = (x ) + y T Ay.
(x + y) (x ),
How to find the minimum of (6.1)? Usually, for any given initial vector x0 , we
choose a direction p0 , and then we try to find a point
x1 = x0 + 0 p0
It means that along this line, (x) reaches its minimum at point x1 . Afterwards,
starting from x1 , we choose another direction p1 , and then we try to find a point
x2 = x1 + 1 p1
i.e., along this line, (x) reaches its minimum at point x2 . Step by step, we have
p0 , p 1 , , and 0 , 1 , ,
where {pk } are line search directions and {k } are step sizes. In general, at a point xk ,
we choose a direction pk and then determine a step size k along the line x = xk + pk
such that
(xk + k pk ) (xk + pk ).
We then obtain xk+1 = xk + k pk . We remark that different ways for choosing line
search directions and step sizes give different algorithms for solving (6.1).
6.1. STEEPEST DESCENT METHOD 83
In the following, we first consider how to determine a step size k . Starting from a
point xk along a direction pk , we want to find a step size k on the line x = xk + pk
such that
(xk + k pk ) (xk + pk ).
Let
f () = (xk + pk )
xk+1 = xk + k pk .
(rkT pk )2
= 0.
pTk Apk
P
n P
n
= i i2 P 2 (i ) max P 2 (j ) i i2
i=1 1jn i=1
Then
kP (A)xkA max |P (j )| kxkA .
1jn
For the steepest descent method, we have the following convergence theorem.
By noting that
(x) + xT Ax = (x x )T A(x x ),
we have
(xk x )T A(xk x ) (xk1 + rk1 x )T A(xk1 + rk1 x )
(6.4)
= [(I A)(xk1 x )]T A[(I A)(xk1 x )],
for any R. Let P (t) = 1 t. By using Lemma 6.1, we have from (6.4),
kxk x kA kP (A)(xk1 x )kA
(6.5)
max |P (j )| kxk1 x kA ,
1jn
6.2. CONJUGATE GRADIENT METHOD 85
for any R. By using properties of the Chebyshev approximation, see [33], we have
n 1
min max |1 t| = . (6.6)
1 tn n + 1
Substituting (6.6) into (6.5), we obtain
n 1
kxk x kA kxk1 x kA .
n + 1
Thus,
k
n 1
kxk x kA kx0 x kA .
n + 1
= 2(rkT Apk1 + pTk1 Apk1 ),
86 CHAPTER 6. KRYLOV SUBSPACE METHODS
x = xk + 0 rk + 0 pk1
rkT Apk1
k1 = .
pTk1 Apk1
Note that pk satisfies pTk Apk1 = 0 (see Theorem 6.3 later), i.e., pk and pk1 are
mutually A-conjugate.
Once we get pk , we can determine k by using (6.3) and then compute
xk+1 = xk + k pk .
rkT pk
k = ,
pTk Apk
xk+1 = xk + k pk ,
rk+1 = b Axk+1 ,
T Ap
rk+1 k
k = ,
pTk Apk
pk+1 = rk+1 + k pk .
6.2. CONJUGATE GRADIENT METHOD 87
Thus, the scheme of the CG method, one of the most popular and successful iterative
methods for solving symmetric positive definite systems Ax = b, is given as follows. At
the initialization step, for k = 0, we choose x0 and then calculate
r0 = b Ax0 .
Theorem 6.3 The vectors {ri } and {pi } satisfy the following properties:
(2) riT rj = 0, i 6= j, 0 i, j k;
p0 = r0 , r1 = r0 0 Ap0 , p1 = r1 + 0 p0 .
Then
pT0 r1 = r0T r1 = r0T (r0 0 Ap0 ) = r0T r0 0 pT0 Ap0 = 0
provided 0 = r0T r0 /pT0 Ap0 , and
r1T Ar0 T
pT1 Ap0 = (r1 + 0 r0 )T Ar0 = r1T Ar0 r Ar0 = 0.
r0T Ar0 0
Now, we assume that the theorem is true for k and we try to prove that it also holds
for k + 1.
For (1), by using assumption and rk+1 = rk k Apk , we have
and also
pTk rk T
pTk rk+1 = pTk rk p Apk = 0.
pTk Apk k
Thus, (1) is true for k + 1.
For (2), we have by assumption,
span{r0 , , rk } = span{p0 , , pk }.
By (1), we know that rk+1 is orthogonal to this subspace. Therefore, (2) is true for
k + 1.
For (3), by using assumption, (2) and
we have
1 T
pTk+1 Api = r (ri ri+1 ) + k pTk Api = 0, i = 0, 1, , k 1.
i k+1
By the definition of k , we have
T Ap
rk+1 k
pTk+1 Apk = (rk+1 + k pk )T Apk = rk+1
T
Apk pTk Apk = 0.
pTk Apk
Therefore
and
pk+1 = rk+1 + k pk K(A, r0 , k + 2) = span{r0 , Ar0 , , Ak+1 r0 }.
By (2) and (3), we note that the vectors r0 , , rk+1 and p0 , , pk+1 are linearly
independent. Thus, (4) is true for k + 1.
We remark that by Theorem 6.3, at most n steps, we can obtain the exact solution
of an n-by-n system by using the CG method. Therefore, from a theoretical viewpoint,
the CG method is a direct method.
or
kxk x kA = min{kx x kA : x x0 + K(A, r0 , k)}, (6.9)
where kxkA = xT Ax and x is the exact solution of Ax = b.
Proof: Since (6.8) and (6.9) are equivalent, we only need to prove (6.9). Suppose that
rl = 0 at the l-th step of the CG method, then we have
x = xl = xl1 + l1 pl1
= x0 + 0 p0 + + l1 pl1 .
Let x be any vector in x0 + K(A, r0 , k). Then by Theorem 6.3 (4), we have
x = x0 + 0 p0 + + k1 pk1 .
Moreover,
Since
x xk = k pk + + l1 pl1 ,
kk pk + + l1 pl1 k2A
= kx xk k2A .
From Theorem 6.3, we know that the CG method would obtain an accurate solution
after n steps in exact arithmetic, where n is the size of the system. In other words,
the CG method is thought as a direct method rather than an iterative method. When
n is very large, in practice, we use the CG method as an iterative method and stop
iterations when
(i) krk k is less than , where rk = b Axk and is a given error bound; or
(ii) the number of iterations reaches kmax , the largest number of iterations provided
by us, where kmax n.
We then have the following practical algorithm for solving symmetric positive definite
systems Ax = b. At the initialization step k = 0, we choose a initial vector x and
calculate
r = b Ax, = rT r.
6.3. PRACTICAL CG METHOD AND CONVERGENCE ANALYSIS 91
While > kbk2 and k < kmax , in iteration steps, we have
k =k+1
if k = 1
p=r
else
= /; p = r + p
end
w = Ap; = /pT w; x = x + p
r = r w; = ; = rT r
(1) We only have the matrix-vector multiplications in the algorithm. If the matrix
is sparse or it has a special structure, then these multiplications can be done
efficiently by using some sparse solvers or fast solvers.
(2) We do not need to estimate any parameter in the algorithm unlike the SOR
method.
(3) For each iteration, we could use the parallel algorithms for the vector operations.
Now, we briefly discuss how to use the CG method to solve general linear systems
Ax = b. Since we cannot use the CG method directly to the system, instead of solving
Ax = b, we can use the CG method to solve the normalized system,
AT Ax = AT b.
When the system is well-conditioned, then the normalized CG method is suitable. But
if the system is ill-conditioned, then the condition number of the normalized system
could become very large because of 2 (AT A) = (2 (A))2 . Hence the normalized CG
method is not suitable for ill-conditioned systems.
and also
riT rp+1 = 0, i = 0, 1, , p.
Therefore rp+1 = 0, i.e., Axp+1 = b.
Theorem 6.6 Let A Rnn be symmetric positive definite and x be the exact solution
of Ax = b. We then have
2 1 k
kx xk kA 2 kx x0 kA
2 + 1
= A1 Pk (A)r0 ,
P
k
where Pk () = akj j with Pk (0) = 1. Let Pk be the set of all the polynomials Pk
j=0
with order less than or equal to k and Pk (0) = 1. By Theorem 6.4 and Lemma 6.1, we
have
kxk x kA = min{kx x kA : x x0 + K(A, r0 , k)}
where
0 < a1 = 1 n = a2
are the eigenvalues of A. By the well-known Approximation Theorem of Chebyshev
polynomials, see [33], we know that there exists a unique solution of the optimal prob-
lem
min max |Pk ()|
Pk Pk a1 a2
given by
Tk ( a2a+a 1 2
2 a1
)
Pek () = .
Tk ( aa22 a
+a1
1
)
Here Tk (z) is the k-th Chebyshev polynomial defined recursively by
with T0 (z) = 1 and T1 (z) = z. By the properties of Chebyshev polynomials, see [33]
again, we know that
1 2 1 k
max |Pek ()| = 2 ,
a1 a2 Tk ( aa22 +a
a1 )
1 2 + 1
Theorem 6.7 If the eigenvalues j of a symmetric positive definite matrix A are or-
dered such that
From Theorem 6.7, we note that when n is increased, if p, q are constants that do
not depend on n and 1 is uniformly bounded from zero, then the convergence rate is
linear, i.e., the number of iterations is independent of n. We also notice that the more
clustered the eigenvalues are, the faster the convergence rate will be.
where k p + q.
1 1
b2 2 1+ 2
= .
b1 1
Therefore,
1 1 1 2
= < .
+1
For 1 j p and [1 , 1 + ], we have
j 1+
0 .
j
p
1+
2 kpq .
6.4. PRECONDITIONING 95
6.4 Preconditioning
From Section 6.3, we know that if the matrix A of the system
Ax = b
is well-conditioned or its spectrum is clustered, then the convergence rate of the CG
method will be very quick. Therefore, in order to speed up the convergence rate, we
usually precondition the system, i.e., instead of solving the original system, we solve
the following preconditioned system
e = b,
Ax (6.10)
where
e = C 1 AC 1 ,
A x = Cx, b = C 1 b,
e could
and C is symmetric positive definite. We wish that the preconditioned matrix A
have better spectral properties than those of A.
By using the CG method on (6.10), we have
rkT rk
k = ,
p e k
T Ap
k
xk+1 = xk + k pk ,
rk+1 = rk k Ap e k, (6.11)
T r
rk+1 k+1
= ,
k
r T r
k k
pk+1 = rk+1 + k pk ,
e 0 and p0 = r0 . Let
where x0 is any given initial vector, r0 = b Ax
xk = Cxk , rk = C 1 rk , pk = Cpk , M = C 2.
Substituting them into (6.11), we actually have
wk = Apk , k = k /pTk wk ,
xk+1 = xk + k pk , rk+1 = rk k wk ,
zk+1 = M 1 rk+1 , T z
k+1 = rk+1 k+1 ,
where z, r, p, w are vectors and , , are scalars. This algorithm is called the
preconditioned conjugate gradient (PCG) method. Note that the PCG method has the
following properties:
(i) riT M 1 rj = 0, for i 6= j.
Usually, it is not easy to choose a preconditioner which satisfies all these two criteria.
Now we briefly discuss the following three classes of preconditioners.
M = diag(a11 , , ann )
6.4. PRECONDITIONING 97
if A1
ii are easily to be obtained, then one could use the block diagonal matrix
M = diag(A11 , , Akk )
as a preconditioner.
A = LLT + R,
then one can use the matrix M = LLT as a preconditioner. We could require
that the matrix L has the same sparse structure as the matrix A and also the
matrix LLT A.
Cn = Fn n Fn , (6.12)
and Cn1 y for any vector y can be computed by FFTs in O(n log n) operations.
For the Fourier matrix Fn , when there is no ambiguity, we shall denote F .
Now, we study a kind of preconditioner called the optimal preconditioner, see
[9, 11, 22]. Given any unitary matrix U Cnn , let MU be the set of all
matrices simultaneously diagonalized by U , i.e.,
MU = {U U | is an n-by-n diagonal matrix}. (6.13)
We note that in (6.13), when U = F , the Fourier matrix, MF is the set of all the
circulant matrices. Let (A) denote the diagonal matrix whose diagonal is equal
to the diagonal of the matrix A. We have the following lemma, see [22, 27, 39].
Lemma 6.2 For any arbitrary A = [apq ] Cnn , let cU (A) be the minimizer of
kW AkF over all W MU . Then
The matrix cU (A) is called the optimal preconditioner of A and the matrix cF (A)
is called the optimal circulant preconditioner of A. We remark that cF (A) is a
good preconditioner for solving a large class of structured linear systems Ax = b,
for instance, Toeplitz systems, Hankel systems, etc, see [9, 11, 12, 22].
6.5. GMRES METHOD 99
where
K(A, r0 , k) span{r0 , Ar0 , , Ak1 r0 }
with r0 = b Ax0 . Let x x0 + K(A, r0 , k). We have
k1
X
x = x0 + j Aj r0
j=0
and then
k1
X k
X
r = b Ax = b Ax0 j Aj+1 r0 = r0 j1 Aj r0 .
j=0 j=1
Hence
r = Pk (A)r0
where Pk Pk with Pk being the set of all the polynomials Pk with order less than or
equal to k and Pk (0) = 1. We therefore have the following theorem.
Theorem 6.8 Let xk be the solution after the k-th GMRES iteration. Then we have
Furthermore,
krk k2
kPk (A)k2 .
kr0 k2
Moreover, we have
100 CHAPTER 6. KRYLOV SUBSPACE METHODS
Theorem 6.9 The GMRES method will obtain the exact solution of Ax = b within n
iterations, where A Rnn .
pA (z) = det(zI A)
pA (z)
Pn (z) = Pn .
pA (0)
Pn (A) = pA (A) = 0.
P (A) = V P ()V 1 .
krk k2
min max |P (i )|.
kr0 k2 P Pk i
Theorem 6.11 If A is diagonalizable and has exactly k distinct eigenvalues, then the
GMRES method will terminate in at most k iterations.
where i are the eigenvalues of A. By Theorem 6.10, we know that rk = 0, i.e., Axk = b.
We should emphasize that in general, the behavior of the GMRES method cannot
be determined by eigenvalues alone. In fact, it is shown in [20] that any nonincreasing
convergence rate is possible for the GMRES method applied to some problem with
nonnormal matrix. Moreover, that problem can have any desired distribution of eigen-
values. Thus, for instance, eigenvalues tightly clustered around 1 are not necessarily
good for nonnormal matrices, as they are for normal ones. However, we have the
following two theorems.
Proof: We first extend the set of eigenvectors {uil } to be a basis of Rn , i.e., the vectors
Moreover, we have
P
k P
nk
Ax = l Auil + j Avj
l=1 j=1
P
k P
nk
= l il uil + j Avj
l=1 j=1
P
k
= b= l uil .
l=1
Hence,
j = 0, j = 1, 2, , n k;
l = l /il , l = 1, 2, , k.
We therefore have
k
X
x = (l /il )uil .
l=1
Let
k
Y il z
Pk (z) = Pk .
il
l=1
k
X
Pk (A)x = Pk (il )(l /il )uil = 0.
l=1
We thus have
krk k2 kPk (A)r0 k2 = kPk (A)bk2
Theorem 6.13 When the GMRES method is applied for solving a linear system Ax =
b where A = I + L, the method will converge in at most rank(L) + 1 iterations.
Proof: We first recall that the minimal polynomial of r0 with respect to A is the
nonzero monic polynomial p of the lowest degree such that p(A)r0 = 0, see [33]. By
Theorem 6.8, the GMRES method must converge within iterations, where is the
6.5. GMRES METHOD 103
implies
k
X
i Li r0 = 0
i=0
for some constants i and vice versa, we have = . Moreover, from the definition of
, the set
{r0 , Lr0 , , L1 r0 }
is linearly independent. Let B be the column vector space of L. Then, the dimension
of B is equal to the rank of L. Since Li r0 B for i 1, we have
{Lr0 , , L1 r0 } B.
min kb Axk2 .
xx0 +K(A,r0 ,k)
whose columns form an orthonormal basis of K(A, r0 , k). Then for any z K(A, r0 , k),
it can be written as
Xk
z= ul vlk = Vk u,
l=1
)T Rk .
where u = (u1 , u2 , , uk Thus, once we find Vk , we can convert the original
LS problem in the Krylov subspace into a LS problem in Rk as follows. Let xk be the
solution after the k-th iteration. We then have
xk = x0 + Vk yk
104 CHAPTER 6. KRYLOV SUBSPACE METHODS
for i = 1, 2, , k 1.
This algorithm produces the columns of the matrix Vk which are also an orthonormal
basis for K(A, r0 , k). We note that a breakdown happens when a division by zero occurs.
We have the following theorem for the breakdown happening.
Theorem 6.14 Let A be nonsingular, the vectors vj be generated by the above algo-
rithm, and i be the smallest integer for which
i
X
Avi ((Avi )T vj )vj = 0. (6.14)
j=1
we know that
AK(A, r0 , i) K(A, r0 , i).
Note that the columns of Vi = [v1 , v2 , , vi ] form an orthonormal basis for K(A, r0 , i),
i.e.,
K(A, r0 , i) = span{v1 , v2 , , vi },
6.5. GMRES METHOD 105
and then
AVi = Vi H (6.15)
where H Rii is nonsingular since A is nonsingular. There exists a vector y Ri
such that xi x0 = Vi y because xi x0 K(A, r0 , i). We therefore have
xi = A1 b x0 + K(A, r0 , i).
If the Gram-Schmidt process does not breakdown, we can use it to carry out the
GMRES method in the following efficient way. Let hij = (Avj )T vi . By the Gram-
Schmidt algorithm, we have a (k + 1)-by-k matrix Hk which is upper Hessenberg, i.e.,
its entries hij satisfy hij = 0 if i > j + 1. This process produces a sequence of matrices
{Vk } with orthonormal columns such that
AVk = Vk+1 Hk .
Therefore, we have
Hence
xk = x0 + Vk yk .
Let the GMRES iterations be ended when one finds a vector x such that for a given ,
Ax = b,
106 CHAPTER 6. KRYLOV SUBSPACE METHODS
where A Rnn and b Rn are known, see [33]. At the initialization step, let
r0 = b Ax0 , = kr0 k2 , v1 = r0 /.
Exercises:
1. Suppose that xk is generated by the steepest descent method. Prove that
1
(xk ) 1 (xk1 ),
2 (A)
7. Show that if A Rnn is symmetric positive definite and has exactly k distinct eigen-
values, then the CG method will terminate in at most k iterations.
8. Let the initial vector x0 = 0. When the GMRES method is used to solve the linear
system Ax = b where
0 1
1 0
A=
1 0
1 0
1 0
and b = (1, 0, 0, 0, 0)T , what is the convergence rate?
9. Let
I Y
A= .
0 I
When the GMRES method is used to solve Ax = b, what is the maximum number of
iterations required to converge?
10. Prove that the LS problem in the GMRES method has full column rank.
11. Show that cU (A) = U (U AU )U .
108 CHAPTER 6. KRYLOV SUBSPACE METHODS
Chapter 7
Nonsymmetric Eigenvalue
Problems
Ax = x.
109
110 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
B = XAX 1 .
The transformation
A B = XAX 1
is called a similarity transformation by the similarity matrix X. The similar matrices
have the same eigenvalues. If x is an eigenvector of A, then y = Xx is an eigenvector of
B. By the Jordan Decomposition Theorem (Theorem 1.1), we know that any n-by-n
matrix is similar to its Jordan canonical form. If the similarity matrix is required to be
a unitary matrix, we then have the following perhaps the most fundamentally useful
theorem in NLA, see [17].
Theorem 7.1 (Schur Decomposition Theorem) Let A Cnn with the eigenval-
ues 1 , , n in any prescribed order. Then there exists a unitary matrix U Cnn
such that
U AU = T = [tij ]
where T is an upper triangular matrix with diagonal entries tii = i , i = 1, , n.
Furthermore, if A Rnn and if all the eigenvalues of A are real, then U may be
chosen to be real and orthogonal.
x1 , y2 , , yn .
x1 , z2 , , zn .
Let
U1 = [x1 , z2 , , zn ]
7.1. BASIC PROPERTIES 111
Let
1 0
U2 = .
0 V2
Then the matrices U2 and U1 U2 are unitary, and
1
U2 U1 AU1 U2 = 2 .
0 A2
Continuing this process, we can produce unitary matrices U1 , U2 , , Un1 such that
the matrix
U = U1 U2 Un1
is unitary and U AU yields the desired form.
Theorem 7.2 (Real Schur Decomposition Theorem [17]) Let A Rnn . Then
there exists an orthogonal matrix Q Rnn such that
R11 R12 R1m
R22 R2m
QT AQ = .. .. ,
. .
0 Rmm
where Rii is either a real number or a 2-by-2 matrix having a pair of complex conjugate
eigenvalues.
In general, one cannot hope to reduce a real matrix to a strictly upper triangular
form by using an orthogonal similarity transformation because the diagonal entries
would then be eigenvalues, which could not be real.
112 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
!
P
n k
j
= k1 1 x1 + j xj ,
j=2 1
and then
Ak u 0
lim = 1 x1 .
k k
1
When 1 6= 0 and k is sufficiently large, we know that the vector
Ak u0
uk = (7.1)
k1
is a good approximate eigenvector of A.
In practice, we cannot use (7.1) directly to compute an approximate eigenvector
since we do not know the eigenvalue 1 in advance and the operation cost of Ak is very
large when k is large. We therefore propose the following iterative algorithm:
yk = Auk1 ,
(k)
k = j , (7.2)
u = y / ,
k k k
7.2. POWER METHOD 113
(k)
where u0 Cn is any given initial vector with ku0 k = 1 usually, and j is the
largest absolute value of components of yk . This iterative algorithm is called the power
method. We have the following theorem for the convergence of the power method.
|1 | > |2 | |p |.
A = Xdiag(J1 , , Jp )X 1 , (7.3)
where X Cnn , Ji Cni ni is the Jordan block associated with i (i = 1, , p), and
n1 + n2 + + np = n.
Since the geometric multiplicity of 1 is the same as its algebraic multiplicity, we have
J1 = 1 In1
where In1 Rn1 n1 is the identity matrix. Let y = X 1 u0 and then decompose y and
X as follows:
y = (y1T , y2T , , ypT )T , X = [X1 , X2 , , Xp ]
where yi Cni and Xi Cnni , for i = 1, , p. By using (7.3), we have
Ak u0 = Xdiag(J1k , , Jpk )X 1 u0
= k1 X1 y1 + X2 J2k y2 + + Xp Jpk yp
= k1 X1 y1 + X2 (1 k 1 k
1 J2 ) y2 + + Xp (1 Jp ) yp .
(1
1 Ji ) = |i |/|1 | < 1,
114 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
for i = 2, 3, , p. Therefore,
1 k
lim A u 0 = X 1 y1 . (7.4)
k k1
Since the projection of the initial vector u0 on the eigenspace of 1 is nonzero, we have
X1 y1 6= 0. Let
x1 = 1 X1 y1
where is the largest absolute value of components of X1 y1 . Obviously, x1 is an
eigenvector of A associated with 1 . Let k be the largest absolute value of components
of Ak u0 . Then the largest absolute value of components of k k k
1 A u0 is k 1 . By (7.2),
we have
Auk1 Ak u 0 Ak u 0 Ak u0 /k1
uk = = = = .
k k k1 1 k k /k1
By using (7.4), we know that {uk } is convergent and
By using Auk1 = k uk and the fact that {uk } converges to an eigenvector associated
with 1 having the largest absolute value of components equaling to 1, we immediately
know that {k } converges to 1 .
We remark that from the proof of Theorem 7.3, the convergence rate of the power
method is determined by the value of |2 |/|1 |. Under the conditions of the theorem,
we know that
|2 |
< 1.
|1 |
The smaller the |2 |/|1 | is, the faster the convergence rate will be. When |2 |/|1 |
is closed to 1, then the convergence rate will be very slow. In order to speed up the
convergence of the power method, we could use the method on A I where is called
a shift. The could be chosen such that the distance between the eigenvalue with
the largest absolute value and the other eigenvalues becomes larger. Therefore, the
convergence rate of the power method can be increased.
(k)
where z0 Cn is any given initial vector and j is the largest absolute value of
components of yk . From Theorem 7.3, we know that if the eigenvalues of A satisfy
|n | < |n1 | |1 |,
A I
From (7.5), we know that in each iteration of the inverse power method, one needs to
solve a linear system. Hence, its operation cost is much larger than that of the power
method. However, one could use the LU factorization with partial pivoting in advance
and then in each iteration later on, one only needs to solve two triangular systems.
Suppose that the eigenvalues of the A I are ordered as follows:
0 < |1 | < |2 | |3 | |n |.
From Theorem 7.3 again, we know that the sequence {zk } produced by (7.5) converges
to an eigenvector associated with 1 . The convergence rate is determined by the value
of |1 |/|2 |. The more closer of to 1 , the faster the convergence rate will
be. But if is closed to an eigenvalue of A, then A I is closed to a singular matrix.
Therefore, one needs to solve an ill-conditioned linear system in each iteration of the
inverse power method. However, from practical computations, the illness of systems
has no effect on the convergence rate of the method. Usually, only one iteration could
produce a good approximate eigenvector of A if is closed to an eigenvalue of A.
116 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
7.4 QR method
In this section, we introduce the well-known QR method which is one of main important
developments in matrix computations. For any given A0 = A Cnn , the basic
iterative scheme of the QR algorithm is given as follows:
Am1 = Qm Rm ,
(7.6)
Am = Rm Qm ,
for m = 1, 2, , where Qm is a unitary matrix and Rm is an upper triangular matrix.
For simplicity in later analysis, we require that diagonal entries of Rm are nonnegative.
By (7.6), one can easily obtain
Am = Qm Am1 Qm , (7.7)
i.e., each matrix in the sequence {Am } is similar to the matrix A. By using (7.7) again
and again, we have
Am = Q e m AQ
em , (7.8)
e
where Qm = Q1 Q2 Qm . Substituting Am = Qm+1 Rm+1 into (7.8), we obtain
e m Qm+1 Rm+1 = AQ
Q em .
Therefore,
e m Qm+1 Rm+1 Rm R1 = AQ
Q e m Rm R1 ,
i.e.,
e m+1 R
Q em+1 = AQ
em R
em ,
ek = Rk Rk1 R1 , for k = m, m + 1. Moreover, we have
where R
em R
Am = Q em . (7.9)
Proof: Let
X = Y 1 , = diag(1 , , n ).
Then A = XY . By assumption, Y has an LU factorization
Y = LU
where L is a unit lower triangular matrix and U is an upper triangular matrix. Hence
Am = Xm Y = Xm LU = X(m Lm )m U
(7.10)
= X(I + Em )m U,
b m )(R
Am = (QQ bm Rm U ),
and
u11 unn
D2 = diag ,, ,
|u11 | |unn |
118 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
Comparing with (7.9) and noting that the QR decomposition is unique, we obtain
e m = QQ
Q b m Dm D2 , em = D1 Dm R
R bm Rm U.
1 2 1
Note that
A = XY = XX 1 = QRR1 Q .
We finally have
b RR1 Q
Am = D2 (D1 )m Q b m Dm D2 .
m 1
When m , we know that by (7.14) the entries under the diagonal of the matrix
Am produced by (7.6) tend to zero. At the same time,
(m)
ii i ,
for i = 1, 2, , n.
From Theorem 7.4, we know that the sequence {Am } produced by (7.6) converges to
the Schur decomposition of A.
has some special structure (Hessenberg) with many zero entries. Afterwards, we can
apply the QR algorithm (7.15) to the matrix H in (7.16). Then the operation cost per
iteration can be dramatically reduced.
For A = [ij ] Rnn , at the first step, we can choose a Householder transformation
H1 such that the first column of H1 A has many zero entries (at most n 1 zero
entries). In order to keep a similarity transformation, we need to add one more column
transformation:
H1 AH1 .
Hence H1 could have the following form
1 0
H1 = e1 (7.17)
0 H
to keep the zero entries in the first column unchanged. By using H1 defined as in (7.17),
we have " #
11 e1
aT2 H
H1 AH1 = e 1 A22 H
e 1 a1 H e1 (7.18)
H
e 1 a1 = pe1
H (7.19)
where p R and e1 Rn1 is the first unit vector. Therefore, the first column of the
matrix in (7.18) has n 2 zeros by using H1 defined by (7.17) and (7.19).
Afterwards, for Ae22 = H e 1 , we could find a Householder transformation
e 1 A22 H
e 1 0
H2 = b2
0 H
120 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
such that
e2A
(H e22 H
e 2 )e1 = (, , 0, , 0)T .
Let
1 0
H2 = e2 .
0 H
We therefore have
h11 h12
. ..
h21 h22 .. .
.
0 h32 ..
H2 H1 AH1 H2 =
..
.
0 0 .
.. .. .. ..
. . . .
0 0
After n 2 steps, we have found n 2 Householder transformations H1 , H2 , , Hn2 ,
such that
Hn2 H2 H1 AH1 H2 Hn2 = H
where
h11 h12 h13 h1,n1 h1n
h21 h22 h23 h2,n1 h2n
0 h32 h33 h3,n1 h3n
H=
.. .. ..
. 0 h43 . .
.. .. .. .. ..
. . . . .
0 0 0 hn,n1 hnn
with hij = 0 for i > j + 1, and is called the upper Hessenberg matrix. Let
Q0 = H1 H2 Hn2
and therefore,
QT0 AQ0 = H,
which is called the upper Hessenberg decomposition of A. We have the following
algorithm by using Householder transformations.
Theorem 7.5 Suppose that A Rnn has the following two upper Hessenberg decom-
positions:
U T AU = H, V T AV = G, (7.20)
where
U = [u1 , u2 , , un ], V = [v1 , v2 , , vn ]
are n-by-n orthogonal matrices, and H = [hij ], G = [gij ] are upper Hessenberg matrices.
If u1 = v1 and all the entries hi+1,i are not zero, then there exists a diagonal matrix D
with diagonal entries being either 1 or 1 such that
U = V D, H = DGD.
uj = j vj , j = 1, 2, , m, (7.21)
and
Avm = g1m v1 + + gmm vm + gm+1,m vm+1 . (7.23)
Multiplying (7.22) and (7.23) by uTi and viT , respectively, we have
Therefore by (7.21),
him = i m gim , i = 1, 2, , m. (7.24)
122 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
Substituting (7.24) into (7.22), and using (7.21) and (7.23), we obtain
= m gm+1,m vm+1 .
Hence,
|hm+1,m | = |gm+1,m |.
Since hm+1,m 6= 0, from (7.25), we know that
Then we construct a Givens rotation P34 = G(3, 4, 3 ) such that the 3 satisfies
cos 3 sin 3 h33
= .
sin 3 cos 3 h43 0
Hence,
0
P34 P23 P12 H =
0 0 .
0 0 0
0 0 0
7.5. REAL VERSION OF QR ALGORITHM 123
Therefore, it is easy to see that for n-by-n upper Hessenberg matrix H, we can construct
n 1 Givens rotations P12 , P23 , , Pn1,n such that
i.e., (7.27) holds. Through a shift, the convergence rate of the QR iteration is expected
to be quadratic. If H has complex eigenvalues, then a double shift strategy can be used
to speed up the convergence rate of the QR iteration.
in the lower right corner of Hm has a pair of complex conjugate eigenvalues 1 and 2 .
(m)
We cannot expect that hnn tends to an eigenvalue of A. A way around this difficulty
is to perform the following QR algorithm with double shifts:
H 1 I = Q1 R1 ,
H1 = R1 Q1 + 1 I,
(7.28)
H1 2 I = Q2 R2 ,
H2 = R2 Q2 + 2 I,
where H = Hm . Let
M (H 1 I)(H 2 I). (7.29)
By a few simple computations, we have
M = QR, (7.30)
and
H2 = QT HQ, (7.31)
where
Q = Q1 Q2 , R = R 2 R1 .
By (7.29), we obtain
M = H 2 sH + tI,
where
s = 1 + 2 = h(m) (m)
pp + hnn R,
and
t = 1 2 = det(G) R.
126 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
Exercises:
1. Show that if T Cnn is upper triangular and normal, then T is diagonal.
2. Let A, B Cnn . Prove that the spectrum of AB is equal to the spectrum of BA.
3. Let A Cnn , x Cn and X = [x, Ax, , An1 x]. Show that if X is nonsingular, then
X 1 AX is an upper Hessenberg matrix.
4. Suppose that A Cnn has distinct eigenvalues. Show that if Q AQ = T is the Schur
decomposition and AB = BA, then Q BQ is upper triangular.
5. Suppose that A Rnn and z Rn . Find a detailed algorithm for computing an
orthogonal matrix Q such that QT AQ is upper Hessenberg and QT z is a multiple of e1
where e1 is the first unit vector.
6. Suppose that W , Y Rnn and define matrices C, B by
W Y
C = W + iY, B = .
Y W
Show that if R is an eigenvalue of C, then is also an eigenvalue of B. What is the
relation between two corresponding eigenvectors?
7. Suppose that
w x
A= R22
y z
has eigenvalues i, where 6= 0. Find an algorithm that determines c = cos and
s = sin stably such that
T
c s w x c s
= ,
s c y z s c
7.5. REAL VERSION OF QR ALGORITHM 127
where = 2 .
Hk k I = Qk Rk , Hk+1 = Rk Qk + k I.
Show that
(Q1 Qj )(Rj R1 ) = (H 1 I) (H j I).
where is any eigenvalue of A and Im() denotes the imaginary part of a complex number.
12. Show that if
1
(A + AT )
2
is positive definite, then Re() > 0, where is any eigenvalue of A and Re() denotes the
real part of a complex number.
13. Let B be a matrix with kBk2 < 1. Show that I B is nonsingular and the eigenvalues
of I + 2(B I)1 have negative real parts.
128 CHAPTER 7. NONSYMMETRIC EIGENVALUE PROBLEMS
Chapter 8
The symmetric eigenvalue problem with its nice properties and rich mathematical the-
ory is one of the most pleasing topics in NLA. In this chapter, we will study symmetric
eigenvalue problems. We begin by introducing some basic spectral properties of sym-
metric matrices. Then the symmetric QR method, the Jacobi method and the bisection
method are discussed. Finally, a divide-and-conquer algorithm is described.
QT AQ = diag(1 , , n ).
The eigenvalues of a symmetric matrix have minimax properties based on the values
called the Rayleigh quotient:
xT Ax
.
xT x
We have the following theorem and its proof can be found in [19].
1 n .
129
130 CHAPTER 8. SYMMETRIC EIGENVALUE PROBLEMS
Then
uT Au uT Au
i = max min = min max ,
dim(S)=i 06=uS uT u dim(S)=ni+1 06=uS uT u
The next theorem [46] shows the sensitivity of eigenvalues of symmetric matrices.
then
|i (A) i (B)| kA Bk2 , i = 1, 2, , n,
and
n
X
(i (A) i (B))2 kA Bk2F .
i=1
Theorem 8.3 tells us that the eigenvalues of any symmetric matrix are well conditioned,
i.e., small perturbations on the entries of A cause only small changes in the eigenvalues
of A.
for r = 1, 2, , n 1.
As for the sensitivity of eigenvectors, we have the following theorem, see [46].
Q = [q1 , Q2 ] Rnn
where is any eigenvalue of D22 , then there exists a unit eigenvector q1 such that
q
4 4
sin = 1 |q1T q1 |2 kek2 kEk2 ,
d d
where = arccos |q1T q1 |.
It seems that could be a good measurement between q1 and q1 . We can see that the
sensitivity to a perturbation of a single eigenvector depends on the separation of its
corresponding eigenvalue from the rest of eigenvalues.
The eigenvalues of any symmetric matrix are closely related with the singular values
of the matrix. The singular value decomposition [19] is essential in NLA.
Corollary 8.1 Let A, B Rnn and their singular values be ordered respectively as
follows:
1 (A) n (A), 1 (B) n (B),
then
|i (A) i (B)| kA Bk2 , i = 1, 2, , n,
and
n
X
(i (A) i (B))2 kA Bk2F .
i=1
The corollary shows that the singular values of any real matrix are also well conditioned,
i.e., small perturbations on the entries of A cause only small changes in the singular
values of A.
132 CHAPTER 8. SYMMETRIC EIGENVALUE PROBLEMS
QT AQ = T,
Hk vk1 = k e1 , k R.
(2) Compute
k+1 vkT
= Hk Ak1 Hk ,
vk Ak
where Ak R(nk1)(nk1) .
then we have
QT AQ = T.
From the reduction above, it is easy to see that the main operation cost of the k-th
step is to compute Hk Ak1 Hk . Let
Hk = I vv T , v Rnk .
where
1
w = u (v T u)v, u = Ak1 v.
2
Since only the upper triangular portion of this matrix needs to be computed, we see
that the transition from Ak1 to Ak can be computed in 4(nk)2 operations only. Given
a symmetric matrix A Rnn , the following algorithm overwrites A with T = QT AQ,
where T is a tridiagonal matrix and Q is a product of Householder transformations.
irreducible, i.e., sub-diagonal entries are nonzero. Let us discuss how to choose the
shift k . We can take k = Tk (n, n), the (n, n)-th entry at each iteration as a shift.
However, a better way is to select
q
2 ,
k = n + sign() 2 + n1
where = (n1 n )/2. This is the well-known Wilkinson shift, see [46]. Note that
k is just the eigenvalue of the matrix
n1 n1
Tk (n 1 : n, n 1 : n) =
n1 n
which is closer to n .
T I = QR
to
T = RQ + I
without explicitly forming the matrix T I. The essence of (8.1) is to transform
T to T by orthogonal similarity transformations. It follows from Theorem 7.5 that T
can be determined completely by the first column of Q. From the process of the QR
decomposition of T I by Givens rotations, we know that
Qe1 = G1 e1 ,
where G1 = G(1, 2, 1 ) is a rotation which makes the second entry in the first column
of T I to be zero. The 1 can be computed from
cos 1 sin 1 1
= .
sin 1 cos 1 1 0
Let
B = G1 T GT1 .
Then B has the following form (n = 4),
+ 0
0
B= +
.
0 0
8.2. SYMMETRIC QR METHOD 135
Let Gi = G(i, i+1, i ), i = 2, 3. Then Gi of this form can be used to chase the unwanted
nonzero entry + out of the matrix B as follows:
0 0 0 0
G2 +
G3 0
B 0 0 .
0 + 0 0
Ze1 = G1 e1 = Qe1
This algorithm requires about 30n operations and n square roots. Of course, the tridi-
agonal matrix T would be stored in a pair of n-vectors in any practical implementation.
T = U0T AU0 .
Set Q = U0 .
If we only need to compute the eigenvalues, this algorithm requires about 4n3 /3
operations. If we need both the eigenvalues and eigenvectors, it requires about 9n3
operations. It can be shown [46] that the computed eigenvalues i , i = 1, 2, , n,
obtained by Algorithm 8.3, satisfy
QT (A + E)Q = diag(1 , , n ),
where Q Rnn is orthogonal and kEk2 kAk2 u where u is the machine precision.
Using Theorem 8.3, we have
|i i | kAk2 u, i = 1, 2, , n,
where {i } are the eigenvalues of A. The absolute error in each i is small and the
relative error is less than the machine precision u. If Q = [q1 , , qn ] is the matrix
of computed orthonormal eigenvectors, then the accuracy of each qi depends on the
separation of i from the rest of eigenvalues.
8.3. JACOBI METHOD 137
1/2
n
!1/2 n X
n
X X
off(A) kAk2F a2ii =
a2ij
.
i=1 i=1 j=1
j6=i
The idea of Jacobi method is to systematically reduce the off(A) to be zero. The basic
tools for doing this are called Jacobi rotations defined as follows,
where p < q and ek denotes the k-th unit vector. Note that Jacobi rotations are no
different from Givens rotations, see Section 4.2.2. We change the name in this section
to honour the inventor. The basic step in a Jacobi procedure involves:
T
bpp bpq c s app apq c s
= (8.2)
bqp bqq s c aqp aqq s c
Note that the matrix B agrees with the matrix A except the p-th row (column) and
the q-th row (column). The relations are:
t2 + 2 t 1 = 0.
Then, p
t = 1 + 2.
We select t to be the smaller of the two roots which ensures that || /4 and has the
effect of minimizing of kB Ak2F because
n
X 2a2pq
kB Ak2F = 4(1 c) (a2ip + a2iq ) + .
i=1
c2
i6=p,q
Algorithm 8.4
function : [c, s] = sym(A, p, q)
if A(p, q) 6= 0
= (A(q, q) A(p, p))/(2A(p, q))
if 0
t = 1/( + 1 + 2)
else
t = 1/( + 1 + 2 )
end
c = 1/ 1 + t2
s = tc
else
c=1
s=0
end
A J(p, q, )T AJ(p, q, )
and then
P
n
off(B)2 = kBk2F b2ii
i=1
P
n
(8.4)
= kAk2F a2ii + (a2pp + a2qq b2pp b2qq )
i=1
= off(A)2 2a2pq .
[c, s] = sym(A, p, q)
A = J(p, q, )T AJ(p, q, )
U = U J(p, q, )
end
off(Ak+N ) c off(Ak )2 ,
see [19] and references therein. Therefore, the off-diagonal norm will approach to
zero at a quadratic rate after a sufficient number of iterations.
Another advantage of the Jacobi method is easy to compute the eigenvectors. If
the iteration stops after the k-th rotation, we then have
Ak = JkT Jk1
T
J1T AJ1 J2 Jk .
Denote
Qk = J1 J2 Jk .
8.3. JACOBI METHOD 141
Thus
AQk = Qk Ak .
Since off-diagonal entries of Ak are tiny and then diagonal entries of Ak are good
approximations to the eigenvalues of A, the identity above shows that the columns of Qk
are good approximations to the eigenvectors of A and all the approximate eigenvectors
are orthonormal. We can obtain Qk , the approximate eigenvectors, during Jacobi
iterative process.
group (1) : (1, 2), (3, 4), (5, 6), (7, 8);
group (2) : (1, 3), (2, 4), (5, 7), (6, 8);
group (3) : (1, 4), (2, 3), (5, 8), (6, 7);
group (4) : (1, 5), (2, 6), (3, 7), (4, 8);
group (5) : (1, 6), (2, 5), (3, 8), (4, 7);
group (6) : (1, 7), (2, 8), (3, 5), (4, 6);
group (7) : (1, 8), (2, 7), (3, 6), (4, 5).
Note that all 4 rotations within each group are nonconflicting. For instance, the
subproblems J(2i 1, 2i, i ), i = 1, 2, 3, 4, in the first group can be carried out in
parallel. When we compute J(1, 2, 1 )T AJ(1, 2, 1 ), it has no effect on the rotations
(3, 4), (5, 6) and (7, 8). Then the computation of
A = AJ(1, 2, 1 ), A = AJ(3, 4, 2 ),
A = AJ(5, 6, 3 ), A = AJ(7, 8, 4 ),
A = J(1, 2, 1 )T A, A = J(3, 4, 2 )T A,
A = J(5, 6, 3 )T A, A = J(7, 8, 4 )T A,
can also be carried out in parallel by 4 processors. For the example above, it only needs
1/4 computing time of a computer with a single processor. A parallel Jacobi algorithm
can be found in [19].
142 CHAPTER 8. SYMMETRIC EIGENVALUE PROBLEMS
Theorem 8.7 (Sturm Sequence Property) Let the symmetric tridiagonal matrix
T in (8.5) be irreducible. Then the eigenvalues of Ti1 strictly separate the eigenvalues
of Ti :
i (Ti ) < i1 (Ti1 ) < i1 (Ti ) < < 2 (Ti ) < 1 (Ti1 ) < 1 (Ti ).
then sn () is equal to the number of eigenvalues of T that are less than , where pi ()
are defined by (8.6). If pi () = 0, then pi1 ()pi+1 () < 0.
Proof: It follows from Theorem 8.4 that the eigenvalues of Ti1 weakly separate
those of Ti . Next we will show that the separation must be strict. Assume that
pi () = pi1 () = 0 for some i and . Since T is irreducible and, we note that by (8.6),
p0 () = p1 () = = pi () = 0,
8.4. BISECTION METHOD 143
Algorithm 8.6 (Bisection algorithm) Let 1 < 2 < < n be the eigenvalues
of T , i.e., the roots of pn (), and be a tolerance. Suppose that the desired eigenvalue
is m for a given m n. Then
l0 + u0
(2) Compute r1 = and sn (r1 ).
2
l1 + u1
(4) If |l1 u1 | < , take r2 = as an approximate value of m . Otherwise go
2
to (2).
From the algorithm above, we can see that the main operation cost is to compute
sn (). However in practice, sn () cannot be obtained through computing the value of
pi () because it is difficult to evaluate polynomials of high order. In order to avoid
such a problem, we define
pi ()
qi () = , i = 1, 2, , n.
pi1 ()
b2i1
q1 () = p1 () = a1 , qi () = ai , i = 2, 3, , n.
qi1 ()
It is easy to check that sn () is exactly the number of negative values in the sequence
of q1 (), , qn (). The following is a practical algorithm for computing sn ().
144 CHAPTER 8. SYMMETRIC EIGENVALUE PROBLEMS
x = [a1 , a2 , , an ]
y = [0, b1 , , bn1 ]
s = 0; q = x(1)
for k = 1 : n
if q < 0
s=s+1
end
if k < n
if q = 0
q = |y(k + 1)|u
end
q = x(k + 1) y(k + 1)2 /q
end
end
8.5.1 Tearing
Let T Rnn be given as follows,
a1 b1 0
..
b1 a2 .
T =
.. ..
.
. . bn1
0 bn1 an
Without loss of generality, assume n = 2m. Let
v = (0, , 0, 1, , 0, , 0)T Rn .
| {z } | {z }
m1 m1
8.5.2 Combining
Once we obtained the spectral decompositions of T1 and T2 :
QT1 T1 Q1 = D1 , QT2 T2 Q2 = D2 ,
V T T V = diag(1 , , n ).
Let
Q1 0
U= ,
0 Q2
then
T
Q1 0 T1 0 Q1 0
UT TU = + vv T
0 Q2 0 T2 0 Q2
= D + zz T ,
where
D = diag(D1 , D2 ), z = U T v.
Now the problem of finding the spectral decomposition of T is reduced to the problem
of computing the spectral decomposition of D +zz T . We will consider how to compute
the spectral decomposition of D + zz T quickly and stably.
Lemma 8.1 Let D = diag(d1 , , dn ) Rnn with d1 > d2 > > dn . Assume that
0 6= R and z = (z1 , z2 , , zn )T Rn with zi 6= 0 for all i. Let u Rn and R
satisfy
(D + zz T )u = u, u 6= 0.
Then z T u 6= 0 and D I is nonsingular.
On the other hand, if D I is singular, then there exists some i such that eTi (D
I) = 0, and then
0 = eTi (D I)u = z T ueTi z.
Since z T u 6= 0, we have eTi z = zi = 0, a contradiction. Thus, D I is nonsingular.
Theorem 8.8 Let D = diag(d1 , , dn ) Rnn with d1 > d2 > > dn . Assume
that 0 6= R and z = (z1 , z2 , , zn )T Rn with zi 6= 0 for all i. Suppose that the
spectral decomposition of D + zz T is
V T (D + zz T )V = diag(1 , , n ),
f () = 1 + z T (D I)1 z.
vi = i (D i I)1 z, i = 1, 2, , n.
(D + zz T )vi = i vi , kvi k2 = 1.
vi = z T vi (D i I)1 z, i = 1, 2, , n, (8.7)
1 = z T (D i I)1 z,
148 CHAPTER 8. SYMMETRIC EIGENVALUE PROBLEMS
i.e.,
f (i ) = 0, i = 1, 2, , n.
Thus, i , i = 1, 2, , n, are the roots of f (). Next we prove that f () has exactly n
zeros. Note that
z12 zn2
f () = 1 + + + ,
d1 dn
and moreover,
0 z12 zn2
f () = + + .
(d1 )2 (dn )2
Thus, f () is strictly monotone between the poles di and di+1 . If > 0, f () is strictly
increasing; if < 0, f () is strictly decreasing. Therefore, it is easy to see that f ()
has exactly n roots, one of each in the intervals
(2) Compute
(D i I)1 z
vi = , i = 1, 2, , n.
k(D i I)1 zk2
(ii) V T DV = diag(d(1) , , d(n) ) where d(1) > d(2) > > d(r) .
8.5. DIVIDE-AND-CONQUER METHOD 149
Proof: Suppose that two indices i < j satisfy di = dj . Then we can set a rotation
Pij = G(i, j, ) such that the j-th component of Pij z is zero. It is easy to show that
PijT DPij = D. After several steps, we can find an orthogonal matrix V1 which is a
product of some rotations such that
with
1 (i) 6= 0, i = 1, 2, , r,
and
1 (i) = 0, i = r + 1, , n.
Here 1 is a permutation of {1, 2, , n}. It follows from the construction of P1 that
V = V1 P1 diag(P2 , Inr )
V T DV = diag(d(1) , , d(n) )
with
d(1) > d(2) > > d(r) .
The proof is complete.
150 CHAPTER 8. SYMMETRIC EIGENVALUE PROBLEMS
where
D1 = diag(d(1) , , d(r) ) Rrr , d(1) > > d(r) ;
(1) Tear
T1 0
T = + vv T , T1 R2N 2N , v R4N ;
0 T2
and
Ti1 0
Ti = + i i iT ,
0 Ti2
where Tij RN N and i R2N , for i = 1, 2.
(2) Compute the spectral decompositions of T11 , T12 , T21 and T22 by 4 processors in
parallel.
(3) Combine the spectral decompositions of T11 , T12 to form a spectral decomposition
of T1 , and combine the spectral decompositions of T21 , T22 to form a spectral
decomposition of T2 . These can be implemented by 4 processors at the same
time.
From discussions above, we know that the divide-and-conquer method can be used
for computing all the eigenvalues and eigenvectors of any large symmetric tridiagonal
matrix in parallel.
8.5. DIVIDE-AND-CONQUER METHOD 151
Exercises:
1. Compute the Schur decomposition of
1 2
A= .
2 3
min (X) 1 ,
where B, C Rmn .
5. Use the singular value decomposition to show that if A Rmn with m n, then there
exist a matrix Q Rmn with QT Q = I and a positive semi-definite matrix P Rnn
such that A = QP .
6. Let
I B
A=
B I
with kBk2 < 1. Show that
1 + kBk2
kAk2 kA1 k2 = .
1 kBk2
7. Let
a1 b1 0
c1 a2 b2
.. .. ..
A=
. . . ,
.. ..
. . bn1
0 cn1 an
8. Let
2 1
1 2 1
T =
.
1 2 1
1 2
9. Let A, E Rnn be two symmetric matrices. Show that if A is positive definite and
kA1 k2 kEk2 < 1, then A + E is also positive definite.
10. Let A Rmn with m n, and assume that the singular values of A are ordered as
1 2 n .
Show that
kAuk2 kAuk2
i = max min = min max ,
dim(S)=i 06=uS kuk2 dim(S)=ni+1 06=uS kuk2
Applications
In this chapter, we will briefly survey some of the latest developments in using bound-
ary value methods (BVMs) for solving initial value problems of systems of ordinary
differential equations (ODEs). These methods require the solution of one or more
nonsymmetric, large and sparse linear systems. Therefore, we will use the GMRES
method studied in Chapter 6 with some preconditioners for solving these linear sys-
tems. One of the main results is that if an A1 ,2 -stable BVM is used for an n-by-n
system of ODEs, then the preconditioned matrix can be decomposed as I + L where
I is the identity matrix and the rank of L is at most 2n(1 + 2 ). When the GMRES
method is applied to the preconditioned systems, the method will converge in at most
2n(1 + 2 ) + 1 iterations. Applications to different kinds of delay differential equations
(DDEs) are also given. For a literature on BVMs for ODEs and DDEs, we refer to
[3, 4, 5, 8, 10, 23, 24, 25, 26, 29, 30].
9.1 Introduction
Let us begin with the initial value problem:
0
y (t) = Jn y(t) + g(t), t (t0 , T ],
(9.1)
y(t0 ) = z,
where y(t), g(t) : R Rn , z Rn , and Jn Rnn . The initial value methods (IVMs),
such as the Runge-Kutta methods, are well-known methods for solving (9.1), see [40].
Recently, another class of methods called the boundary value methods (BVMs) has
been proposed in [5]. Using BVMs to discretize (9.1), we obtain a linear system
M u = b.
The advantage of using BVMs is that the methods are more stable and the resulting
linear system M u = b is hence more well-conditioned. However, this system is in
153
154 CHAPTER 9. APPLICATIONS
general large and sparse (with band-structure), and solving it is a major problem in
the application of BVMs. The GMRES method studied in Chapter 6 will be used for
solving M u = b. In order to speed up the convergence of the GMRES iterations, a
preconditioner S called the Strang-type block-circulant preconditioner [10] is used to
precondition the discrete system. The advantage of the Strang-type preconditioner is
that if an A1 ,2 -stable BVM is used for solving (9.1), then S is invertible and the
preconditioned matrix can be decomposed as
S 1 M = I + L,
where the rank of L is at most 2n(1 + 2 ) which is independent of the integration step
size. It follows that the GMRES method applied to the preconditioned system will
converge in at most 2n(1 + 2 ) + 1 iterations in exact arithmetic.
The outline of this chapter is as follows. In Section 9.2, we will give some background
knowledge about the linear multistep formulas (LMFs) and BVMs. Then, we will
investigate the properties of the Strang-type block-circulant preconditioner for ODEs
in Section 9.3. The convergence and cost analysis of the method will also be given
with a numerical example. Finally, we discuss the applications of the Strang-type
preconditioner with BVMs for solving different kinds of delay differential equations
(DDEs) in Sections 9.49.6.
where y(t) : R R and f (t, y) : R2 R. The -step linear multistep formula (LMF)
over a uniform mesh with step size h is defined as follows:
X
X
j ym+j = h j fm+j , m = 0, 1, , (9.2)
j=0 j=0
y0 , y1 , , y1 .
9.2. BACKGROUND OF BVMS 155
Since only y0 is provided from the original problem, we have to find additional condi-
tions for the remaining values
y1 , y2 , , y1 .
The equation (9.2) with 1 additional conditions is called initial value methods
(IVMs). An IVM is called implicit if 6= 0 and explicit if = 0. If an IVM is
applied to an initial value problem on the interval [t0 , tN +1 ], we have the following
discrete problem 1
P
(i yi hi fi )
i=0
..
.
AN y = hBN f + 0 y1 h0 f1
, (9.3)
0
..
.
0
where
y = (y , y+1 , , yN +1 )T , f = (f , f+1 , , fN +1 )T ,
.. . . .. . .
. . . .
.. ..
AN = 0 . , BN = 0 . .
.. .. .. ..
. . . .
0 0
Note that the matrices AN , BN RN N are lower triangular band Toeplitz matrices
with lower bandwidth . We recall that a matrix is said to be Toeplitz if its entries
are constant along its diagonals. Moreover, the linear system (9.3) can be solved
easily by forward recursion. A classical example of IVM is the second order backward
differentiation formula (BDF),
y0 , y1 , , y1 1 ,
yN , yN +1 , , yN +2 1 ,
156 CHAPTER 9. APPLICATIONS
which are called (1 , 2 )-boundary conditions. Note that the class of BVMs contains
the class of IVMs (i.e., 1 = , 2 = 0).
The discrete problem generated by a -step BVM with (1 , 2 )-boundary conditions
can be written in the following matrix form
P
1 1
(i yi hi fi )
i=0
.
..
0 y1 1 h0 f1 1
0
..
Ay = hBf + . ,
0
yN h fN
..
.
P 2
(1 +i yN 1+i h1 +i fN 1+i )
i=1
where
y = (y1 , y1 +1 , , yN 1 )T , f = (f1 , f1 +1 , , fN 1 )T ,
A and B R(N 1 )(N 1 ) are defined as follows,
1 1
.. .. .. .. .. .. .. ..
. . . . . . . .
A = 0 . . . . . . ..
. , B = 0 .. .. ..
. . . . (9.4)
.. .. .. .. .. ..
. . . . . .
0 1 0 1
Note that the coefficient matrices are band Toeplitz with lower bandwidth 1 and
upper bandwidth 2 . An example of BVMs is the third order generalized backward
differentiation formula (GBDF),
0 = 1, 1 = 6, 2 = 3, 3 = 2, 2 = 6.
Although IVMs are more efficient than BVMs (which cannot be solved by forward
recursion), the advantage in using BVMs over IVMs comes from their stability prop-
erties. For example, the usual BDF are not A-stable for > 2 but the GBDF are
A1 ,2 -stable for any 1, see for instance [1] and [5, p. 79 and Figures 5.15.3].
9.2. BACKGROUND OF BVMS 157
Let = 1 + 2 . By using the -step block-BVM based on LMF over a uniform mesh
h = (T t0 )/s for solving (9.1), we have:
2
X 2
X
i+1 ym+i = h i+1 fm+i , m = 1 , . . . , s 2 . (9.5)
i=1 i=1
fm = Jn ym + gm , gm = g(tm ).
Also, (9.5) requires 1 initial conditions and 2 final conditions which are provided by
the following 1 additional equations:
X
X
(j) (j)
i yi = h i fi , j = 1, . . . , 1 1, (9.6)
i=0 i=0
and
X
X
(j) (j)
i ysi = h i fsi , j = s 2 + 1, . . . , s. (9.7)
i=0 i=0
The coefficients {(j) }, { (j) } in (9.6) and (9.7) should be chosen such that the
truncation errors for these initial and final conditions are of the same order as that in
(9.5). By combining (9.5), (9.6), (9.7) and the initial condition y(t0 ) = y0 = z, the
discrete system of (9.1) is given by the following block form
e In hB
M y (A e Jn )y = e1 z + h(B
e In )g. (9.8)
Here
e B
A, e R(s+1)(s+1) given by:
1 0
(1)
(1)
0
. .. ..
.. . . 0
(1 1) ( 1)
0 1
0
0
Ae= .. .. ..
,
. . .
.. .. ..
. . .
0
(s +1) (s +1)
0 0 2 2
.. ..
. .
(s) (s)
0
0 0
0
(1)
(1)
.. .. ..
. . .
(1 1) ( 1)
0 1 0
0
0
e=
B .. .. ..
,
. . .
.. .. ..
. . .
0
(s +1) (s +1)
0 0 2 2
.. ..
. .
(s) (s)
0
and is the tensor product.
We recall that the tensor product of A = (aij ) Rmn and B Rpq is defined as
follows:
a11 B a12 B a1n B
a21 B a22 B a2n B
AB .. .. ..
. . .
am1 B am2 B amn B
which is an mp-by-nq matrix. The basic properties of the tensor product can be found
in [19, 22].
9.3. STRANG-TYPE PRECONDITIONER FOR ODES 159
We remark that usually the linear system (9.8) is large and sparse (with band-
structure), and solving it is a major problem in the application of the BVMs. We
will use the GMRES method in Chapter 6 for solving (9.8). In order to speed up the
convergence rate of the GMRES iterations, we will use a preconidtioner S called the
Strang-type block-circulant preconditioner.
(3) The operation cost for each iteration of the preconditioned GMRES method is
smaller than that of direct solvers.
We first recall the definition of Strangs circulant preconditioner for Toeplitz matrices.
Given any Toeplitz matrix
Tl = [tij ]li,j=1 = [tq ],
where
1 0 1 1
.. .. .. .. ..
. . . . .
.. ..
0 . . 0
.. .. ..
. . . 0
.. .. ..
s(A) =
. . .
.. .. ..
0 . . .
.. ..
. .
.. .. .. .. ..
. . . . .
1 +1 0 1
and s(B) is defined similarly by using {i }i=0 instead of {i }i=0 in s(A). The {i }i=0
and {i }i=0 here are the coefficients given in (9.5). We remark that actually s(A),
s(B) are just Strangs circulant preconditioners for Toeplitz matrices A, B respectively,
where A, B are given by (9.4).
We will show that the preconditioner S is invertible provided that the given BVM
is A1 ,2 -stable and the eigenvalues of Jn are in
C {q C : Re(q) < 0}
where Re() denotes the real part of a complex number. The stability of a BVM is
closely related to two characteristic polynomials of degree = 1 + 2 , defined as
follows:
X2 X2
j+1
(z) j+1 z and (z) j+1 z j+1 . (9.10)
j=1 j=1
where z, q C.
Consider now the equation (z, q) = 0. It defines a mapping between the complex
z-plane and the complex q-plane. For every z C which is a root of (z, q), (9.11)
provides
(z)
q = q(z) = .
(z)
Let
(ei )
qC:q= , 0 < 2 . (9.12)
(ei )
9.3. STRANG-TYPE PRECONDITIONER FOR ODES 161
The is the set corresponding to the roots on the unit circumference and is called the
boundary locus. We have the following definition and lemma, see [5].
is called the region of A1 ,2 -stability of the given BVM. Moreover, the BVM is said to
be A1 ,2 -stable if
C D1 ,2 .
Lemma 9.1 If a BVM is A1 ,2 -stable and is defined by (9.12), then Re(q) 0 for
all q .
Now, we want to show that the preconditioner S is invertible under the stability
condition.
Theorem 9.1 If the BVM for (9.1) is A1 ,2 -stable and hk (Jn ) D1 ,2 where k (Jn ),
k = 1, , n, are the eigenvalues of Jn , then the preconditioner S defined by (9.9) is
invertible.
Proof: Since s(A) and s(B) are circulant matrices, their eigenvalues are given by
1 1 (z)
gA (z) z 2 + . . . + 1 + 1 1 + . . . + 0 1 = 1
z z z
and
1 1 (z)
gB (z) z 2 + . . . + 1 + 1 1 + . . . + 0 1 = 1 ,
z z z
2ij
evaluated at j = e s+1 where i 1, for j = 0, , s, see [9, 13]. The eigenvalues
jk (S) of S are therefore given by
has no roots on the unit circle |z| = 1 if hk (Jn ) D1 ,2 . Thus for all k = 1, , n,
and any arbitrary |z| = 1, we have
1
gA (z) hk (Jn )gB (z) = [z, hk (Jn )] 6= 0.
z 1
162 CHAPTER 9. APPLICATIONS
It follows that
jk (S) 6= 0, j = 0, , s, k = 1, , n.
Thus S is invertible.
In particular, we have
Corollary 9.1 If the BVM is A1 ,2 -stable and k (Jn ) C , then the preconditioner
S is invertible.
It is easy to check that LA and LB are (s + 1)-by-(s + 1) matrices with nonzero entries
only in the following four corners: a 1 -by-( + 1) block in the upper left; a 1 -by-1
block in the upper right; a 2 -by-( + 1) block in the lower right; and a 2 -by-2 block
in the lower left. By noting that = 1 + 2 , we then have
rank(LA ) , rank(LB ) .
Therefore,
rank(LA In ) = rank(LA ) n n
and
rank(LB Jn ) = rank(LB ) n n.
Thus,
S 1 M = In(s+1) + S 1 E = In(s+1) + L,
where the rank of L is at most 2n.
S 1 M y = S 1 b,
9.3. STRANG-TYPE PRECONDITIONER FOR ODES 163
by Theorems 6.13 and 9.2, we know that the method will converge in at most 2n + 1
iterations in exact arithmetic.
Regarding the cost per iteration, the main work in each iteration for the GMRES
method is the matrix-vector multiplication
e In hB
S 1 M z = (s(A) In hs(B) Jn )1 (A e Jn )z,
e B
see Section 6.5. Since A, e are band matrices and Jn is assumed to be sparse, the
matrix-vector multiplication
e In hB
M z = (A e Jn )z
s(A) = F A F , s(B) = F B F
where A , B are diagonal matrices containing the eigenvalues of s(A), s(B) respec-
tively. It follows that
This product can be obtained by using FFTs and solving s + 1 linear systems of order
n. Since Jn is sparse, the matrix
A In hB Jn
where q is the number of nonzeros of Jn , and 1 and 2 are some positive constants.
For comparing the computational cost of the method with direct solvers for the linear
system (9.8), we refer to [10].
emphasize that in all of our tests in this chapter, the zero vector is the initial guess
and the stopping criterion is
krq k2
< 106 ,
kr0 k2
where rq is the residual after q iterations. The BVM we used is the third order general-
ized Adams method (GAM). Its formula and the initial and final additional conditions
can be found in [5].
Example 9.1. Heat equation:
u 2u
= ,
t x2
u
u(0, t) = (, t) = 0, t [0, 2],
x
u(x, 0) = x, x [0, ].
We discretize the partial differential operator 2 /x2 with central differences and step
size equals to /(n + 1). The system of ODEs obtained is:
0
y (t) = Tn y(t), t [0, 2]
y(0) = (x1 , x2 , , xn )T ,
where Tn is a scaled discrete Laplacian matrix
2 1
1 ... ...
(n + 1)2
.
Tn = .. .. ..
2 . . .
1 2 1
1 1
Table 9.1 lists the number of iterations required for convergence of the GMRES
method for different n and s. In the table, I means no preconditioner is used and S
denotes the Strang-type block-circulant preconditioner defined by (9.9). We see that
the number of iterations required for convergence, when S is used, is much less than
that when no preconditioner is used. The numbers under the column S stay almost a
constant for increasing s and n.
n s I S n s I S
24 6 19 4 48 6 47 4
12 70 4 12 167 4
24 152 4 24 359 4
48 227 3 48 >400 3
96 314 3 96 >400 3
In the following, for simplicity, we only consider the case of s = 2 in (9.13). The
generalization to any arbitrary s is straightforward. Let
h = 1 /m1 = 2 /m2
be the step size where m1 and m2 are positive integers with m2 > m1 (2 > 1 ). For
(9.13), by using a BVM with (1 , 2 )-boundary conditions over a uniform mesh
tj = t0 + jh, j = 0, , r1 ,
166 CHAPTER 9. APPLICATIONS
X
X
i yp+i1 = h i (Jn yp+i1 + Dn(1) yp+i1 m1 + Dn(2) yp+i1 m2 + fp+i1 ),
i=0 i=0
(9.15)
for p = 1 , , r1 1, where = 1 + 2 . By providing the values
Ry = b
where
R A In hB Jn hC (1) Dn(1) hC (2) Dn(2) , (9.17)
b Rn(r1 1 ) depends on f , the boundary values and the coefficients of the method.
The matrices A, B R(r1 1 )(r1 1 ) are defined as in (9.4) and C (1) , C (2) R(r1 1 )(r1 1 )
are defined as follows:
0 0
. . . . . .
.. . . . . .. . . . .
. . . . . .
(1)
C = , C =(2) .
. .
0 . . 0 . .
.. .. .. .. .. ..
. . . . . .
0 0 0 0
(0, , 0, , , 0 , 0, , 0 )T
| {z } | {z }
m1 2 r1 m1 21 1
(0, , 0, , , 0 , 0, , 0 )T .
| {z } | {z }
m2 2 r1 m2 21 1
9.4. STRANG-TYPE PRECONDITIONER FOR DDES 167
is given by
Sej = [A ]jj In h[B ]jj Jn h[C (1) ]jj Dn(1) h[C (2) ]jj Dn(2) ,
for j = 1, 2, , r1 1 . Let
2ij
wj = e r1 1 .
We have
[A ]jj = (wj )/wj1 , [B ]jj = (wj )/wj1 ,
1 h i
Sej = wjm2 (wj )In h(wj )Jn hwjm1 (wj )Dn(1) h(wj )Dn(2) .
wjm2 +1
Theorem 9.3 If the BVM with (1 , 2 )-boundary conditions is A1 ,2 -stable and (9.14)
holds, then for any arbitrary R, the matrix
eim2 (ei )In h(ei )Jn heim1 (ei )Dn(1) h(ei )Dn(2)
Proof: Suppose that there exist x Cn with kxk2 = 1 and R such that
h i
eim2 (ei )In h(ei )Jn heim1 (ei )Dn(1) h(ei )Dn(2) x = 0.
Then
h i
x (ei )In h(ei )Jn heim1 (ei )Dn(1) heim2 (ei )Dn(2) x = 0,
i.e.,
We therefore have
(1) (2)
(ei ) (hx Jn x + heim1 x Dn x + heim2 x Dn x)(ei )
(1) (2)
= ei , h(x Jn x + eim1 x Dn x + eim2 x Dn x) = 0
where is the boundary locus defined by (9.12). Since the BVM is A1 ,2 -stable, from
Lemma 9.1, we know that
Thus we have
(Jn ) + kDn(1) k2 + kDn(2) k2 0,
Theorem 9.4 Let R be given by (9.17) and Se be given by (9.18). Then we have
Se1 R = In(r1 1 ) + L
where
rank(L) (2 + m1 + m2 + 21 + 2)n.
Se1 Ry = Se1 b,
n h I Se n h I Se
24 1/10 52 9 48 1/10 53 12
1/20 97 11 1/20 98 14
1/40 185 15 1/40 189 14
1/80 367 19 1/80 378 17
Lemma 9.3 Let Ln , Mn and Nn be any matrices with kLn k2 < 1. Then the solution of
(9.19) is asymptotically stable if Re(i ) < 0 where i , i = 1, , n, are the eigenvalues
of matrix
(In Ln )1 (Mn + Nn )
with || 1.
Let h = /k1 be the step size where k1 is a positive integer. For (9.19), by using a
BVM with (1 , 2 )-boundary conditions over a uniform mesh
tj = t0 + jh, j = 0, , r2 ,
Hy = b
where
H A In A(1) Ln hB Mn hB (1) Nn , (9.22)
y= (yT1 , yT1 +1 , , yrT2 1 )T R n(r2 1 )
,
b Rn(r2 1 ) depends on the boundary values and the coefficients of the method. In
(9.22), the matrices A, B R(r2 1 )(r2 1 ) are defined as in (9.4), and A(1) , B (1)
R(r2 1 )(r2 1 ) are given as follows:
0 0
. . . . . .
.. . . . . .. . . . .
. . . . . .
(1)
A = , B (1)
= ,
0 . . . 0 . . .
.. .. .. .. .. ..
. . . . . .
0 0 0 0
see [3]. We remark that the first column of A(1) is given by:
(0, , 0, , , 0 , 0, , 0 )T
| {z } | {z }
k1 2 r2 k1 21 1
(0, , 0, , , 0 , 0, , 0 )T .
| {z } | {z }
k1 2 r2 k1 21 1
S = (F In ) (A In A(1) Ln hB Mn hB (1) Nn ) (F In ),
where E is the diagonal matrix holding the eigenvalues of s(E), for E = A, B, A(1) ,
B (1) respectively.
9.5. STRANG-TYPE PRECONDITIONER FOR NDDES 173
We have
[A ]jj = (wj )/wj1 , [B ]jj = (wj )/wj1 ,
A In A(1) Ln hB Mn hB (1) Nn
in S is given by
1 h i
= wjk1 ((wj )In h(wj )Mn ) (wj )Ln h(wj )Nn ,
wjk1 +1
where
D (ei )In h(In eik1 Ln )1 (Mn + eik1 Nn )(ei ). (9.24)
Hence, we are required to show that is invertible for any R in order to prove
that Sj is invertible. Assume that kLn k2 < 1, we have In eik1 Ln is nonsingular for
any R. Therefore, we only need to show D is invertible for any R. We have
the following theorem, see [3]
Theorem 9.5 If the BVM with (1 , 2 )-boundary conditions is A1 ,2 -stable and Re(i ) <
0 where i , i = 1, , n, are the eigenvalues of matrix
(In Ln )1 (Mn + Nn )
with || 1, then for any R, the matrix D defined by (9.24) is invertible. It follows
that the Strang-type preconditioner S defined as in (9.23) is also invertible.
174 CHAPTER 9. APPLICATIONS
Proof: Let
$$U_\theta \equiv (I_n - e^{-ik_1\theta}L_n)^{-1}(M_n + e^{-ik_1\theta}N_n).$$
Then $D_\theta$ can be written as
$$D_\theta = \rho(z)I_n - hU_\theta\,\sigma(z), \qquad z = e^{i\theta}.$$
Note that the eigenvalues of $D_\theta$ are given by
$$\lambda_i(D_\theta) = \rho(z) - h\lambda_i(U_\theta)\sigma(z), \qquad i = 1, \ldots, n,$$
where $\lambda_i(U_\theta)$, $i = 1, \ldots, n$, denote the eigenvalues of $U_\theta$. Since we know that
$$\mathrm{Re}[\lambda_i(U_\theta)] < 0, \qquad i = 1, \ldots, n,$$
it follows that $h\lambda_i(U_\theta) \in \mathbb{C}^-$. Note that the BVM is $A_{\nu_1,\nu_2}$-stable, and then we have
$$h\lambda_i(U_\theta) \in \mathbb{C}^- \subseteq D_{\nu_1,\nu_2}.$$
Therefore, the $A_{\nu_1,\nu_2}$-stability polynomial defined by (9.11),
$$\pi[z, h\lambda_i(U_\theta)] \equiv \rho(z) - h\lambda_i(U_\theta)\sigma(z),$$
has no roots on the unit circle $|z| = 1$. Thus, for any $|z| = 1$, we have
$$\lambda_i(D_\theta) = \rho(z) - h\lambda_i(U_\theta)\sigma(z) = \pi[z, h\lambda_i(U_\theta)] \neq 0, \qquad i = 1, \ldots, n.$$
It follows that $D_\theta$ is invertible. Therefore, the Strang-type preconditioner $S$ defined as in (9.23) is also invertible.
Theorem 9.6 Let $H$ be given by (9.22) and $S$ be given by (9.23). Then we have
$$S^{-1}H = I_{n(r_2-\nu_1)} + L$$
where
$$\mathrm{rank}(L) \leq 2(\nu + k_1 + \nu_1 + 1)n.$$
By Theorems 6.13 and 9.6, when the GMRES method is applied to
$$S^{-1}Hy = S^{-1}b,$$
the method will converge in at most $2(\nu + k_1 + \nu_1 + 1)n + 1$ iterations in exact arithmetic. We observe from Theorem 9.6 that if the step size $h = \tau/k_1$ is fixed, the number of iterations for convergence of the GMRES method, when applied to solving $S^{-1}Hy = S^{-1}b$, is independent of $r_2$, i.e., of the length of the interval considered.
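The iteration bound comes purely from the low-rank structure: when the preconditioned matrix is the identity plus a rank-$k$ term, its minimal polynomial has degree at most $k + 1$, so the solution already lies in the Krylov space of dimension $k + 1$. The following self-contained illustration uses a generic random low-rank perturbation (not the actual $H$ and $S$):

    import numpy as np

    rng = np.random.default_rng(0)
    N, k = 400, 5
    # Identity plus a scaled random rank-k perturbation, mimicking S^{-1}H.
    X = rng.standard_normal((N, k)) / N**0.25
    Y = rng.standard_normal((N, k)) / N**0.25
    A = np.eye(N) + X @ Y.T
    b = rng.standard_normal(N)

    # GMRES at step k+1 minimizes the residual over the Krylov space
    # K_{k+1} = span{b, Ab, ..., A^k b}; the least-squares residual below is
    # therefore exactly the GMRES residual after k+1 steps.
    K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k + 1)])
    c = np.linalg.lstsq(A @ K, b, rcond=None)[0]
    print(np.linalg.norm(b - A @ K @ c))  # rounding level: solved in k+1 steps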
Example 9.3 Consider the NDDE (9.19) where
$$L_n = \frac{1}{n}\begin{pmatrix} 2 & 1 & & \\ 1 & 2 & \ddots & \\ & \ddots & \ddots & 1 \\ & & 1 & 2 \end{pmatrix}, \qquad M_n = \begin{pmatrix} -8 & 2 & 1 & & \\ 2 & -8 & 2 & \ddots & \\ 1 & 2 & \ddots & \ddots & 1 \\ & \ddots & \ddots & -8 & 2 \\ & & 1 & 2 & -8 \end{pmatrix},$$
and
$$N_n = \frac{1}{n}\begin{pmatrix} 1 & 2 & & \\ & 1 & \ddots & \\ & & \ddots & 2 \\ & & & 1 \end{pmatrix}.$$
Example 9.3 is solved by using the fifth order GAM for $t \in [0, 4]$. In practice, we do not have the boundary values
$$y_1, \ldots, y_{\nu_1-1}, \qquad y_{r_2}, \ldots, y_{r_2+\nu_2-1},$$
required by (9.21). Again, as in Section 9.2, instead of giving the above values, $\nu_1 - 1$ initial additional equations and $\nu_2$ final additional equations are given. After introducing the additional equations, the matrices $A$, $A^{(1)}$, $B$ and $B^{(1)}$ in (9.22) are Toeplitz matrices with small-rank perturbations. We can also construct the Strang-type preconditioner (9.23) by neglecting the small-rank perturbations.
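Neglecting those perturbations amounts to building Strang's circulant approximation from the central band of each Toeplitz matrix. One way to write this down, as a sketch that assumes the bandwidth is below $N/2$:

    import numpy as np
    from scipy.linalg import circulant

    def strang(T):
        # Strang-type circulant s(T) of a banded Toeplitz matrix T:
        # keep the central band and wrap it around periodically.
        N = T.shape[0]
        c = np.empty(N, dtype=T.dtype)
        for d in range(N):
            c[d] = T[d, 0] if d <= N // 2 else T[0, N - d]
        return circulant(c)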
Table 9.3 lists the number of iterations required for convergence of the GMRES method for different $n$ and $k_1$. In the table, $I$ means that no preconditioner is used and $S$ denotes the Strang-type block-circulant preconditioner defined by (9.23). We see that the number of iterations required for convergence when $S$ is used is much smaller than when no preconditioner is used. We should emphasize that our numerical example shows a much faster convergence rate than that predicted by the estimate of Theorem 9.6.
Table 9.3. Number of iterations for convergence of the GMRES method for Example 9.3.

     n    k1      I     S         n    k1      I     S
    24    10     43     7        48    10     44     6
          20     83     7              20     83     6
          40    161     7              40    163     6
          80      *     7              80      *     6
9.6 Strang-type preconditioner for SPDDEs

We next consider the singular perturbation delay differential equation (SPDDE) (9.25), where
$$x(t), \phi(t): \mathbb{R} \to \mathbb{R}^m; \qquad y(t), \psi(t): \mathbb{R} \to \mathbb{R}^n;$$
and $\tau > 0$, $0 < \epsilon \ll 1$ are constants. We can rewrite the SPDDE (9.25) as the following initial value problem:
$$\begin{cases} z'(t) = Pz(t) + Qz(t - \tau), & t \geq t_0, \\ z(t) = \begin{pmatrix} \phi(t) \\ \psi(t) \end{pmatrix}, & t \leq t_0. \end{cases} \qquad (9.26)$$
By using a BVM with $(\nu_1, \nu_2)$-boundary conditions over a uniform mesh
$$t_j = t_0 + jh, \qquad j = 0, \ldots, v,$$
we obtain a linear system
$$K\mathbf{v} = b, \qquad (9.29)$$
where
$$K \equiv A \otimes I_{m+n} - hB \otimes P - hU \otimes Q. \qquad (9.30)$$
The vector $\mathbf{v}$ in (9.29) is defined by
$$\mathbf{v} = (z_{\nu_1}^T, z_{\nu_1+1}^T, \ldots, z_{v-1}^T)^T \in \mathbb{R}^{(m+n)(v-\nu_1)},$$
where $z_j$ denotes the numerical approximation of $z(t_j)$. The right-hand side $b \in \mathbb{R}^{(m+n)(v-\nu_1)}$ of (9.29) depends on the boundary values and the coefficients of the method. The matrices $A, B \in \mathbb{R}^{(v-\nu_1)\times(v-\nu_1)}$ in (9.30) are defined as in (9.4), and $U \in \mathbb{R}^{(v-\nu_1)\times(v-\nu_1)}$ in (9.30) is defined as the matrix $C^{(1)}$ in (9.17).
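For completeness, (9.30) is straightforward to assemble directly with Kronecker products; a small sketch, with $A$, $B$, $U$, $P$, $Q$ as in the text:

    import numpy as np

    def assemble_K(A, B, U, P, Q, h):
        # K = A kron I_{m+n} - h (B kron P) - h (U kron Q), cf. (9.30).
        I = np.eye(P.shape[0])
        return np.kron(A, I) - h * np.kron(B, P) - h * np.kron(U, Q)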
As in the previous sections, the GMRES method is then applied to the preconditioned system
$$\hat{S}^{-1}K\mathbf{v} = \hat{S}^{-1}b.$$
In Example 9.4, the coefficient matrices involve the $n$-by-$n$ tridiagonal Toeplitz matrix
$$H = \begin{pmatrix} 5 & 1 & & \\ 2 & 5 & \ddots & \\ & \ddots & \ddots & 1 \\ & & 2 & 5 \end{pmatrix}_{n \times n}.$$
Example 9.4 is solved by using the third order GAM for $t \in [0, 4]$. Table 9.4 lists the number of iterations required for convergence of the GMRES method for different $m$, $n$ and $k_2$. In the table, $\hat{S}$ denotes the Strang-type block-circulant preconditioner defined by (9.31).
Table 9.4. Number of iterations for convergence of the GMRES method for Example 9.4.

    k2     m     n      I     Ŝ        k2     m     n      I     Ŝ
    24     8     2     55    29        48     8     2    100    34
          16     4     83    57              16     4    131    63
          32     8    109    90              32     8    177    90
Bibliography
[1] P. Amodio, F. Mazzia and D. Trigiante, Stability of Some Boundary Value Methods for the Solution of Initial Value Problems, BIT, vol. 33 (1993), pp. 434–451.
[3] Z. Bai, X. Jin and L. Song, Strang-type Preconditioners for Solving Linear Systems from Neutral Delay Differential Equations, Calcolo, vol. 40 (2003), pp. 21–31.
[6] Z. Cao, Numerical Linear Algebra (in Chinese), Fudan University Press, Shanghai,
1996.
[7] R. Chan and X. Jin, A Family of Block Preconditioners for Block Systems, SIAM J. Sci. Statist. Comput., vol. 13 (1992), pp. 1218–1235.
[8] R. Chan, X. Jin and Y. Tam, Strang-type Preconditioners for Solving System of ODEs by Boundary Value Methods, Electron. J. Math. Phys. Sci., vol. 1 (2002), pp. 14–46.
[9] R. Chan and M. Ng, Conjugate Gradient Methods for Toeplitz Systems, SIAM Review, vol. 38 (1996), pp. 427–482.
[11] T. Chan, An Optimal Circulant Preconditioner for Toeplitz Systems, SIAM J. Sci. Statist. Comput., vol. 9 (1988), pp. 766–771.
[12] W. Ching, Iterative Methods for Queueing and Manufacturing Systems, Springer-Verlag, London, 2001.
[13] P. Davis, Circulant Matrices, 2nd edition, AMS Chelsea Publishing, Rhode Island,
1994.
[14] J. Demmel, Applied Numerical Linear Algebra, SIAM Press, Philadelphia, 1997.
[15] H. Diao, Y. Wei and S. Qiao, Displacement Rank of the Drazin Inverse, J. Comput. Appl. Math., vol. 167 (2004), pp. 147–161.
[16] M. Hestenes and E. Stiefel, Methods of Conjugate Gradients for Solving Linear Systems, J. Res. Nat. Bur. Stand., vol. 49 (1952), pp. 409–436.
[17] R. Horn and C. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1985.
[19] G. Golub and C. Van Loan, Matrix Computations, 3rd edition, Johns Hopkins
University Press, Baltimore, 1996.
[20] A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM Press, Philadelphia, 1997.
[21] M. Gulliksson, X. Jin and Y. Wei, Perturbation Bounds for Constrained and Weighted Least Squares Problems, Linear Algebra Appl., vol. 349 (2002), pp. 221–232.
[22] X. Jin, Developments and Applications of Block Toeplitz Iterative Solvers, Kluwer
Academic Publishers, Dordrecht; and Science Press, Beijing, 2002.
[23] X. Jin, S. Lei and Y. Wei, Circulant Preconditioners for Solving Differential Equations with Multi-Delays, Comput. Math. Appl., vol. 47 (2004), pp. 1429–1436.
[24] X. Jin, S. Lei and Y. Wei, Circulant Preconditioners for Solving Singular Perturbation Delay Differential Equations, Numer. Linear Algebra Appl., vol. 12 (2005), pp. 327–336.
[25] X. Jin, V. Sin and L. Song, Circulant Preconditioned WR-BVM Methods for ODE Systems, J. Comput. Appl. Math., vol. 162 (2004), pp. 201–211.
[26] X. Jin, V. Sin and L. Song, Preconditioned WR-LMF-Based Method for ODE Systems, J. Comput. Appl. Math., vol. 162 (2004), pp. 431–444.
[27] X. Jin, Y. Wei and W. Xu, A Stability Property of T. Chan's Preconditioner, SIAM J. Matrix Anal. Appl., vol. 25 (2003), pp. 627–629.
[28] J. Kuang, J. Xiang and H. Tian, The Asymptotic Stability of One-Parameter Methods for Neutral Differential Equations, BIT, vol. 34 (1994), pp. 400–408.
[29] S. Lei and X. Jin, BCCB Preconditioners for Systems of BVM-Based Numerical Integrators, Numer. Linear Algebra Appl., vol. 11 (2004), pp. 25–40.
[30] F. Lin, X. Jin and S. Lei, Strang-type Preconditioners for Solving Linear Systems from Delay Differential Equations, BIT, vol. 43 (2003), pp. 136–149.
[31] T. Mori, N. Fukuma and M. Kuwahara, Simple Stability Criteria for Single and Composite Linear Systems with Time Delays, Int. J. Control, vol. 34 (1981), pp. 1175–1184.
[32] T. Mori, E. Noldus and M. Kuwahara, A Way to Stabilize Linear Systems with Delayed State, Automatica, vol. 19 (1983), pp. 571–573.
[33] Y. Saad, Iterative Methods for Sparse Linear Systems, PWS Publishing Company,
Boston, 1996.
[34] Y. Saad and M. Schultz, GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM J. Sci. Stat. Comput., vol. 7 (1986), pp. 856–869.
[35] G. Stewart and J. Sun, Matrix Perturbation Theory, Academic Press, San Diego,
1990.
[37] G. Strang, A Proposal for Toeplitz Matrix Calculations, Stud. Appl. Math., vol. 74 (1986), pp. 171–176.
[38] L. Trefethen and D. Bau III, Numerical Linear Algebra, SIAM Press, Philadelphia,
1997.
[40] S. Vandewalle and R. Piessens, On Dynamic Iteration Methods for Solving Time-Periodic Differential Equations, SIAM J. Num. Anal., vol. 30 (1993), pp. 286–303.
[41] R. Varga, Matrix Iterative Analysis, 2nd edition, Springer-Verlag, Berlin, 2000.
[42] G. Wang, Y. Wei and S. Qiao, Generalized Inverses: Theory and Computations,
Science Press, Beijing, 2004.
[43] S. Wang, Further Results on Stability of $\dot{X}(t) = AX(t) + BX(t-\tau)$, Syst. Cont. Letters, vol. 19 (1992), pp. 165–168.
[44] Y. Wei, J. Cai and M. Ng, Computing Moore-Penrose Inverses of Toeplitz Matrices by Newton's Iteration, Math. Comput. Modelling, vol. 40 (2004), pp. 181–191.
[45] Y. Wei and N. Zhang, Condition Number Related with Generalized Inverse $A_{T,S}^{(2)}$ and Constrained Linear Systems, J. Comput. Appl. Math., vol. 157 (2003), pp. 57–72.
[46] J. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965.
[47] S. Xu, L. Gao and P. Zhang, Numerical Linear Algebra (in Chinese), Peking University Press, Beijing, 2000.
[48] N. Zhang and Y. Wei, Solving EP Singular Linear Systems, Int. J. Computer Mathematics, vol. 81 (2004), pp. 1395–1405.
Index
QR algorithm, 120
range, 50
rank-one matrix, 147
rank-one modification, 12
rounding error, 5, 38