7.1 Introduction
In Chapter 6 we discussed several methods for solving the linear system
Ax = b,
where A was assumed to be square and nonsingular. However, in several practical situations, such as in statistical applications, geometric modeling, signal processing, etc., one needs to solve a system where the matrix A is non-square and/or singular. In such cases, solutions may not exist at all; in cases where there are solutions, there may be infinitely many. For example, when A is m × n and m > n, we have an overdetermined system (that is, the number of equations is greater than the number of unknowns), and an overdetermined system typically has no solution. In contrast, an underdetermined system (m < n) typically has an infinite number of solutions.
In these cases, the best one can hope for is to find a vector x which will make Ax as close as possible to the vector b. In other words, we seek a vector x such that r(x) = ||Ax - b|| is minimized. When the Euclidean norm || · ||_2 is used, this solution is referred to as a least squares solution to the system Ax = b. The term "least squares solution" is justified, because it is a solution that minimizes the Euclidean norm of the residual vector and, by definition, the square of the Euclidean norm of a vector is just the sum of squares of the components of the vector. The problem of finding least squares solutions to the linear system Ax = b is known as the linear least squares problem (LSP). The linear least squares problem is formally defined as follows:
If the least squares problem has more than one solution, the one having the minimum Euclidean
norm is called the minimum length solution or the minimum norm solution.
This chapter is devoted to the study of such problems. The organization of the chapter is as
follows.
In Section 7.2 we show how a very simple business application leads to an overdetermined least
squares problem. In this section we simply formulate the problem as a least squares problem and
later in Section 7.5 we present a solution of the problem using normal equations.
In Section 7.3 we prove a theorem on the existence and uniqueness of the solution of an overdetermined least squares problem.
In Section 7.7 we analyze the sensitivity of the least squares problem to perturbations in the data. We prove only a simple result here and state other results without proofs.
Section 7.8 deals with computational methods for both full-rank and rank-deficient overdetermined problems. We discuss the normal equations method and the QR factorization methods using Householder transformations, modified Gram-Schmidt and classical Gram-Schmidt orthogonalizations.
Underdetermined least squares problems are considered in Section 7.9. We again discuss here the normal equations and the QR methods for an underdetermined problem.
In Section 7.10 an iterative improvement procedure for refining approximate solutions is presented.
In Section 7.11 we describe an efficient way of computing the variance-covariance matrix of a least squares solution, which is (A^T A)^{-1}.
If the data in the table has satisfied the above relation, we have
162 = x1 + 274 x2 + 2450 x3
120 = x1 + 180 x2 + 3254 x3
223 = x1 + 375 x2 + 3802 x3
131 = x1 + 205 x2 + 2838 x3
 67 = x1 +  86 x2 + 2347 x3
or
Ax = b,
where
A = [1 274 2450; 1 180 3254; 1 375 3802; 1 205 2838; 1 86 2347],  b = [162; 120; 223; 131; 67],  x = [x1; x2; x3].
The above is an overdetermined system of five equations in three unknowns.
(Figure: the overdetermined system Ax = b, with A of size m × n, m > n.)
The underdetermined case will be discussed later in this chapter.
Theorem 7.3.1 (Least Squares Existence and Uniqueness Theorem)
There always exists a solution to the linear least squares problem. This solution is unique if and only if A has full rank, that is, rank(A) = n. If A is rank deficient, then the least squares problem has infinitely many solutions.
We present a proof here in the full-rank case, that is, in the case when A has full rank. The rank-deficient case will be treated later in this chapter and in the chapter on the Singular Value Decomposition (SVD) (Chapter 10).
First, we observe the following.
Proof. We denote the residual r = b - Ax by r(x) to emphasize that, given A and b, r is a function of x. Let y be an n-vector. Then r(y) = b - Ay = r(x) + Ax - Ay = r(x) + A(x - y). So,
||r(y)||_2^2 = ||r(x)||_2^2 + 2(x - y)^T A^T r(x) + ||A(x - y)||_2^2.
for any α > 0 if Az = 0, and for 0 < α < 2||z||_2^2 / ||Az||_2^2 if Az ≠ 0. This implies that x is not a least squares solution.
Proof. (of Theorem 7.3.1) Since A has full rank, A^T A is symmetric and positive definite and is thus, in particular, a nonsingular matrix. The theorem, in the full-rank case, now follows from the fact that the linear system
A^T Ax = A^T b
has a unique solution if and only if A^T A is nonsingular.
(Figure 7.1: the vector b, its projection Ax onto R(A), and the residual b - Ax, which is orthogonal to R(A).)
From this interpretation, it is easy to understand that a solution of the least squares problem for the linear system Ax = b always exists. This is because one can project b onto the "plane" R(A) to obtain a vector u in R(A), and there is an x in R^n such that u = Ax. This x is a solution.
Because b - Ax is perpendicular to R(A) and every vector in R(A) is a linear combination of the column vectors of A, b - Ax is orthogonal to every column of A. That is,
A^T (b - Ax) = 0
or
A^T Ax = A^T b.
∂E/∂a1 = -2 Σ_{i=1}^{n} x_i (y_i - a0 - a1 x_i - a2 x_i^2 - ... - a_m x_i^m)        (7.5.4)
  ...
∂E/∂a_m = -2 Σ_{i=1}^{n} x_i^m (y_i - a0 - a1 x_i - ... - a_m x_i^m)
Setting these equations to zero, we have
a0 n      + a1 Σ x_i     + a2 Σ x_i^2   + ... + a_m Σ x_i^m     = Σ y_i
a0 Σ x_i  + a1 Σ x_i^2   + ...          + a_m Σ x_i^{m+1}       = Σ x_i y_i          (7.5.5)
  ...
a0 Σ x_i^m + a1 Σ x_i^{m+1} + ...       + a_m Σ x_i^{2m}        = Σ x_i^m y_i
(where Σ denotes the summation from i = 1 to n).
Setting Σ x_i^k = S_k, k = 0, 1, ..., 2m, and denoting the entries of the right hand side, respectively, by b0, b1, ..., b_m, the system of equations can be written as:
[S0 S1 ... S_m; S1 S2 ... S_{m+1}; ... ; S_m S_{m+1} ... S_{2m}] [a0; a1; ...; a_m] = [b0; b1; ...; b_m].        (7.5.6)
(Note that S0 = n.)
This is a system of (m + 1) equations in (m + 1) unknowns a0, a1, ..., a_m.
This is really a system of normal equations. To see this, define
V = [1 x1 x1^2 ... x1^m; 1 x2 x2^2 ... x2^m; ... ; 1 x_n x_n^2 ... x_n^m].        (7.5.7)
Then the above system becomes
V^T V a = b,        (7.5.8)
where a = (a0, a1, ..., a_m)^T and b = (b0, b1, ..., b_m)^T.
This is a system of normal equations; furthermore, if the x_i's are all distinct, then the matrix V has full rank.
The matrix V is known as the Vandermonde matrix. From our discussion in the previous section, we see that a is the least squares solution to the system (7.5.8).
Example 7.5.1
Suppose that an electrical engineer has gathered the following experimental data consisting of measurements of the current in an electric wire for various voltages.
x = voltage:  0    2    5    7    9    13    24
y = current:  0    6   7.9  8.5  12   21.5   35
We would like to derive the normal equations for the above data corresponding to the best fit of the data to (a) a straight line, (b) a quadratic, and would like to see a comparison of the predicted results with the actual result when v = 5.
a1 = 1.4353.
The solution of these normal equations is:
a = [a0; a1; a2] = [0.8977; 1.3695; 0.0027].
The value of a0 + a1 x + a2 x^2 at x = 5 is 7.8127.
Note: The use of a higher degree polynomial may not necessarily give a better result. The matrix of the normal equations in this case is ill-conditioned:
Cond_2(V^T V) = 2.3260 × 10^5.
Indeed, it is well known that Vandermonde matrices become progressively more ill-conditioned as the order of the matrices increases.
7.6 Pseudoinverse and the Least Squares Problem
Denote (A^T A)^{-1} A^T = A^†.
Definition 7.6.1 The matrix
A^† = (A^T A)^{-1} A^T
is called the pseudoinverse or the Moore-Penrose generalized inverse of A. From (7.3.1), it therefore follows that the unique least squares solution is x = A^† b.
Clearly, the above definition of the pseudoinverse generalizes the ordinary definition of the inverse of a square matrix A. Note that when A is square and invertible,
A^† = (A^T A)^{-1} A^T = A^{-1} (A^T)^{-1} A^T = A^{-1}.
We shall discuss this important concept in some more detail in Chapter 10. An excellent reference on the subject is the book Generalized Inverses by C. R. Rao and S. K. Mitra (1971). Having defined the generalized inverse of a rectangular matrix, we now define the condition number of such a matrix as Cond(A) = ||A|| ||A^†||.
Definition 7.6.2 If an m × n matrix A has full rank, then Cond(A) = ||A|| ||A^†||.
Note: If not explicitly stated, all the norms used in the rest of this chapter are 2-norms, and Cond(A) is the condition number with respect to the 2-norm. That is, Cond(A) = ||A||_2 ||A^†||_2.
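In MATLAB the full-rank pseudoinverse of Definition 7.6.1 can be checked against the built-in pinv. A minimal sketch (not part of the original text), using the matrix of Example 7.6.1 below:

% Sketch: pseudoinverse of a full-rank A via (A'A)^(-1)A', compared with pinv.
A = [1 2; 2 3; 4 5];
Adag  = (A'*A) \ A';              % A-dagger, computed by solving rather than inverting
err   = norm(Adag - pinv(A));     % should be of the order of roundoff
condA = norm(A)*norm(Adag);       % Cond(A) = ||A||_2 ||A-dagger||_2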
Example 7.6.1
01 21
B 2 3 CC ; rank(A) = 2:
A=B
@ A
4 5
Thus, A has full rank.
y ; ;1:2857 ;0:5714 0:8577 !
A = (A A) A =
T T
1
1 0:5000 ;0:5000
Cond (A) = kAk kAky = 7:6656 2:0487 = 15:7047 :
2 2 2
Variance-Covariance Matrix
In certain applications, the matrix A^T A is called the information matrix, since it measures the information contained in the experiment, and the matrix (A^T A)^{-1} is known as the variance-covariance matrix. An algorithm for computing the variance-covariance matrix without explicitly computing the inverse is given in Section 7.11.
7.7 Sensitivity of the Least Squares Problem
In this section we study the sensitivity of a least squares solution to perturbations in the data, that is, we investigate how a least squares solution changes with respect to small changes in the data. This study is important in understanding the different behaviors of the different methods for solving the least squares problem that will be discussed in the next section. We consider two cases: perturbation in the vector b and perturbation in the matrix A. The results in this section are norm-wise perturbation results. For component-wise perturbation results, see Bjorck (1992), and Chandrasekaran and Ipsen (1994).
Case 1: Perturbation in the vector b
Here we assume that the vector b has been perturbed to b̂ = b + δb, but A has remained unchanged.
Proof. Since x and x̂ are the unique least squares solutions to the original and the perturbed problems, we have
x = A^† b,
x̂ = A^† (b + δb).
Thus,
x̂ - x = A^† b + A^† δb - A^† b = A^† δb.        (7.7.1)
Let δb_N denote the projection of δb onto the orthogonal complement of R(A). That is,
δb = δb_R + δb_N.
Since δb_N lies in the orthogonal complement of R(A), which is N(A^T), we have A^T (δb_N) = 0. So
x̂ - x = A^† δb = A^† (δb_R + δb_N) = A^† δb_R + (A^T A)^{-1} A^T δb_N = A^† δb_R.
Combining (7.7.1) and (7.7.2), we have the theorem.
show this is indeed the case.
x = A^† b = [0.6667; 0.3333]
x̂ = A^† (b + δb) = [0.6667; 0.3334]
||x̂ - x|| / ||x|| = 2.4495 × 10^{-4}
Example 7.7.2 A Sensitive Problem
A = [1 1; 10^{-4} 0; 0 10^{-4}],  b = [2; 10^{-4}; 10^{-4}]
δb = 10^{-3} [1; 0.1; 0],  Cond(A) = O(10^4)
x = [1; 1],  x̂ = [1.5005; 0.5005].
Let x and x̂ denote the unique least squares solutions, respectively, to the original and to the perturbed problem. Let E_A and E_N denote the projections of E onto R(A) and onto the orthogonal complement of R(A), respectively. Then if b_R ≠ 0, we have the following theorem due to Stewart (1969).
Theorem 7.7.2 tells us that in the case where only the matrix A is perturbed, the sensitivity of the unique least squares solution, in general, depends upon the square of the condition number of A. However, if ||E_N|| or ||b_N|| is zero or small, then the sensitivity will depend only on Cond(A). Note that the residual r = b - Ax is zero if b_N = 0.
A = [1 1; 0.0001 0; 0 0.0001],  b = [1; 1; 1]
b_R = P_A b = [1.0001; 0.0001; 0.0001],  b_N = [-0.0001; 0.9999; 0.9999],  ||b_N|| / ||b_R|| = 1.4142.
(Using P_A from Example 5.6.3 of Chapter 5.) Let
E = 10^{-4} [0 -0.0001; 0 0.9999; 0 0.9999]
A + E = [1 1; 0.0001 0.0001; 0 0.0002],  E_N = 10^{-4} [0 -10^{-4}; 0 0.9999; 0 0.9999]
||E_N|| / ||A|| = ||E|| / ||A|| = 9.999 × 10^{-5}.
Though the product of ||E_N|| / ||A|| and ||b_N|| / ||b_R|| is rather small, since (Cond(A))^2 = 2 × 10^8 is large, we should expect a drastic departure of the computed solution from the true solution. This is indeed true, as the following computations show:
x̂ = 10^3 [-4.999; 5],  x = [0.5; 0.5]
||x̂ - x|| / ||x|| = 9.999 × 10^3  (Large!).
Note that
(||E_N|| / ||A||) (Cond(A))^2 = 9.999 × 10^{-5} × 2 × 10^8 = 1.9998 × 10^4.
Example 7.7.4 Sensitivity Depending Upon the Condition Number
Let
E = 10^{-4} [0 -0.0001; 0 0.9999; 0 0.9999]  (same as in Example 7.7.3)
A = [1 1; 0.0001 0; 0 0.0001],  b = [2; 0.0001; 0.0001].
In this case, b_R = b and b_N = [0; 0; 0]. Note that
P_A = [1 10^{-4} 10^{-4}; 10^{-4} 0.5000 -0.5000; 10^{-4} -0.5000 0.5000].
(See Example 5.6.3 of Chapter 5.)
Thus, according to Theorem 7.7.2, the square of Cond(A) does not have any effect; the least squares solution is affected only by Cond(A). We verify this as follows:
x = A^† b = [1; 1]
x̂ = [1.4999; 0.5000]
Cond(A) = 1.4142 × 10^4
E_A = 10^{-4} [1 1; 10^{-4} 0; 0 10^{-4}]
||E_A|| / ||A|| = 10^{-4}
||x̂ - x|| / ||x|| = 0.5000.
Residual Sensitivity. We have just seen that the sensitivities of the least squares solutions due to perturbations in the matrix A are different for different solutions; however, the following theorem shows that the residual sensitivity always depends upon the condition number of the matrix A. We state the result in a somewhat simplified and crude form. See Golub and Van Loan, MC (1983, pp. 141-144) for a precise statement and proof of a result on residual sensitivity.
The above result tells us that the sensitivity of the residual depends at most on the condition number of A.
Example 7.7.5
A = [1 1; 10^{-4} 0; 0 10^{-4}],  b = [1; 1; 1]
E = 10^{-4} [0 -0.0001; 0 0.9999; 0 0.9999]
x = [0.5; 0.5],  x̂ = 10^3 [-4.9999; 5]
r = b - Ax = [-0.0001; 0.9999; 0.9999]
r̂ = b - (A + E)x̂ = [-1; 0.9998; 0.9999]
||r̂ - r|| / ||b|| = 0.5,   E_N = P_N E = 10^{-4} [0 -10^{-4}; 0 0.9999; 0 0.9999]
Cond(A) = 1.4142 × 10^4
and
Cond(A) ||E_N|| / ||A|| = 1.4142.
The inequality in Theorem 7.7.3 is now easily verified.
Sensitivity of the Pseudoinverse. The following result, due to Wedin (1973), shows that it is Cond(A) again that plays a role in the sensitivity analysis of the pseudoinverse of a matrix.
Theorem 7.7.4 (Pseudoinverse Sensitivity Theorem) Let A be m × n, where m ≥ n. Let A^† and Ã^† be, respectively, the pseudoinverses of A and of Ã = A + E. Then, provided that rank(A) = rank(Ã), we have
||Ã^† - A^†|| / ||Ã^†|| ≤ √2 Cond(A) ||E|| / ||A||.
Example 7.7.6
A = [1 2; 2 3; 4 5],  E = 10^{-3} A = [0.0010 0.0020; 0.0020 0.0030; 0.0040 0.0050]
A^† = [-1.2857 -0.5714 0.8571; 1 0.5000 -0.5000]
A + E = Ã = [1.001 2.002; 2.002 3.003; 4.004 5.005]
Ã^† = [-1.2844 -0.5709 0.8563; 0.9990 0.4995 -0.4995]
||Ã^† - A^†|| / ||Ã^†|| = 10^{-3} = ||E|| / ||A||.
Note that
Cond(A) = 15.7047.
of A^T A, about mn^2/2 + n^3/6 flops, and n^2 flops to solve the two triangular systems. Thus, the method is quite efficient.
given model is
(1  220  2500) [7.0325; 0.5044; 0.0070] = 135.5005.
Note: In this example, the residuals are small in spite of the fact that the data matrix A is ill-conditioned.
Example 7.8.2
A = [1 2; 2 3; 3 4],  b = [3; 5; 9]
rank(A) = 2, rank(A, b) = 3. Thus the system Ax = b does not have a solution. We therefore calculate the least squares solution.
(1) c = A^T b = [40; 57]
(2) Cholesky factorization of A^T A:
H = [3.7417 0; 5.3452 0.6547]
(3) Solutions of the triangular systems:
y = [10.6904; -0.2182],  x = [3.3333; -0.3333].
The unique least squares solution is x = [3.3333; -0.3333].
Note that
A^† = (A^T A)^{-1} A^T = [-1.8333 -0.3333 1.1667; 1.3333 0.3333 -0.6667]
and
A^† b = [3.3333; -0.3333].
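The steps just illustrated are easy to express in MATLAB. The following minimal sketch (not part of the original text) repeats the computation of Example 7.8.2: form c = A^T b, take the Cholesky factor of A^T A, and solve two triangular systems.

% Sketch of the normal equations method (data of Example 7.8.2).
A = [1 2; 2 3; 3 4];  b = [3; 5; 9];
c = A'*b;                  % step (1)
H = chol(A'*A, 'lower');   % step (2): A'A = H*H'
y = H \ c;                 % step (3): forward substitution
x = H' \ y;                %           back substitution gives the least squares solution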
Numerical difficulties with the normal equations method
The normal equations method, though easy to understand and implement, may give rise to numerical difficulties in certain cases.
First, we might lose some significant digits during the explicit formation of A^T A, and the computed matrix A^T A may be far from positive definite; computationally, it may even be singular. Indeed, it has been shown by Stewart (IMC, pp. 225-226) that, unless Cond(A) is less than 10^{t/2}, where it is assumed that A^T A has been computed exactly and then rounded to t significant digits, the computed A^T A may fail to be positive definite or may even be singular. The following simple example illustrates this fact.
Example 7.8.3
Consider
A = [1 1; 10^{-4} 0; 0 10^{-4}].
Second, the normal equations approach may, in certain cases, introduce more errors than those which are inherent in the problem. This is seen as follows.
From the perturbation analysis done in Chapter 6, we easily see that, if x̂ is the computed least squares solution obtained by the normal equations method, then
||x̂ - x|| / ||x|| is proportional to Cond(A^T A) = (Cond(A))^2  (Exercise).
Thus, the accuracy of the least squares solution using normal equations will depend upon the square of the condition number of the matrix A. However, we have just seen in the section on perturbation analysis of the least squares problem that in certain cases, such as when the residual is zero, the sensitivity of the problem depends only on the condition number of A (see Theorems 7.7.1 and 7.7.2). Thus, in these cases, the normal equations method will introduce more errors in the solution than what is warranted by the data.
A Special Note on the Normal Equations Method
In spite of the drawbacks mentioned above, we must stress that the normal equations method is still regarded as a useful tool for solving the least squares problem, at least in the case where the matrix A is well conditioned. In fact, it is routinely used in many practical applications and seems to be quite popular with practicing engineers and statisticians. Note that in the example above, if we use extended precision in our computations, the computed matrix A^T A is obtained as a symmetric positive definite matrix and the normal equations method yields an accurate answer, despite the ill-conditioning, as the following computations show:
A^T A = [1.00000001 1; 1 1.00000001]
c = A^T b = [2.00000001; 2.00000001]
H = the Cholesky factor of A^T A = [1.000000005 0; 0.999999995 0.00014142135581].
Solution of Hy = c:
y = [2; 0.00014142135651].
Solution of H^T x = y:
x = [0.999999999500183; 1.0000000499817] ≈ [1; 1]  (the exact solution).
||Ax - b||_2^2 = ||R1 x - c||_2^2 + ||d||_2^2,
where Q^T b = [c; d]. Thus, ||Ax - b||_2 will be minimized if x is chosen so that
R1 x - c = 0.
The corresponding residual then is given by
||r||_2 = ||d||_2.
This observation immediately suggests the following QR approach for solving the least squares problem:
where
R = [R1; 0].
Flop-count. Since the cost of the algorithm is dominated by the cost of the QR decomposition of A, the overall flop-count for the full-rank least squares solution using Householder's method is about mn^2 - n^3/3 (Exercise).
Round-off Error and Stability. The method is stable. It has been shown in Lawson and Hanson (SLP, p. 90) that the computed solution x̂ is such that it minimizes
||(A + E)x̂ - (b + δb)||_2,
where ||E|| ≤ cμ||A|| with c ≈ (6m - 3n + 41)n and μ is the machine precision. That is, the computed solution is the exact least squares solution of a nearby problem.
Example 7.8.4
A = [1 2; 2 3; 3 4],  b = [3; 5; 9]
(1) A = QR:
Q = [-0.2673 0.8729 0.4082; -0.5345 0.2182 -0.8165; -0.8018 -0.4364 0.4082]
R = [-3.7417 -5.3452; 0 0.6547; 0 0] = [R1; 0].
(2)
Q^T b = [c; d] = [-10.6904; -0.2182; 0.8165].
(3) Solution of the system R1 x = c:
[-3.7417 -5.3452; 0 0.6547] x = [-10.6904; -0.2182]
The least squares solution is
x = [3.3332; -0.3333].
Norm of the residual = ||r||_2 = ||d|| = 0.8165.
Example 7.8.5
A = [1 1; 0.0001 0; 0 0.0001],  b = [2; 0.0001; 0.0001]
(1) A = QR:
Q = [-1 0.0001 -0.0001; -0.0001 -0.7071 0.7071; 0 0.7071 0.7071]
R = [-1 -1; 0 0.0001; 0 0] = [R1; 0].
(See Example 5.4.3 from Chapter 5.)
(2)
Q^T b = [-2; 0.0001; 0] = [c; d].
(3) Solution of the system R1 x = c:
[-1 -1; 0 0.0001] [x1; x2] = [-2; 0.0001]
x1 = 1,  x2 = 1.
The unique least squares solution is [1; 1]. Note that
A^† = 10^3 [0.0005 5 -5; 0.0005 -5 5],
A^† b = [1; 1].
Norm of the residual = ||r||_2 = ||d|| = 0.
matrix Q_{m×n} = (q1, ..., qn) with orthonormal columns and an upper triangular matrix R = (r_ij)_{n×n} such that A = Q_{m×n} R_{n×n}.
For k = 1, 2, ..., n do
  For i = 1, 2, ..., k-1 do
    r_ik ≡ q_i^T a_k
  q_k ≡ a_k - Σ_{i=1}^{k-1} r_ik q_i
  r_kk = ||q_k||_2
  q_k ≡ q_k / r_kk.
The algorithm, as outlined above, is known to have serious numerical difficulties. During the computation of the q_k's, cancellation can take place and, as a result, the computed q_k's can be far from orthogonal. (See later in this section for details.)
The algorithm, however, can be modified to have better numerical properties. The following algorithm, known as the modified Gram-Schmidt algorithm, computes the QR factorization of A in which, at the kth step, the kth column of Q and the kth row of R are computed (note that the Gram-Schmidt algorithm computes the kth columns of Q and R at the kth step).
Algorithm 7.8.4 Modified Gram-Schmidt (MGS) for QR Factorization
Set q_k = a_k, k = 1, 2, ..., n.
For k = 1, 2, ..., n do
  r_kk = ||q_k||_2
  q_k ≡ q_k / r_kk
  For j = k+1, ..., n do
    r_kj ≡ q_k^T q_j
    q_j ≡ q_j - r_kj q_k.
The above is the row-oriented modified Gram-Schmidt method. The column-oriented version can be developed similarly (Exercise #17). The two versions are numerically equivalent.
(Note that MGS works with the full-length column vectors at each step, whereas the Householder method deals with successively shorter columns.)
Although with only two columns the CGS and MGS algorithms produce the same results, we use a two-column example to illustrate how the computational arrangements differ for the same matrix. All computations are performed with 4-digit arithmetic.
Example 7.8.6
A = [1 1; 0.0001 0; 0 0.0001],  b = [2; 0.0001; 0.0001]
Gram-Schmidt
k = 1:
q1 = a1 = [1; 0.0001; 0],  r11 = ||q1||_2 = 1
q1 = q1 / r11 = [1; 0.0001; 0]
k = 2:
r12 = 1,  q2 = a2 - r12 q1 = 10^{-4} [0; -1; 1]
q2 = [0; -0.7071; 0.7071]
q1^T q2 = -7.0711 × 10^{-5}
Form Q and R:
Q = (q1, q2) = [1 0; 0.0001 -0.7071; 0 0.7071]
R = [r11 r12; 0 r22] = [1 1; 0 1.414 × 10^{-4}]
Form c:
c = Q^T b = [2; 0].
The least squares solution is x = [2; 0].
Modified Gram-Schmidt
q1 = a1,  q2 = a2
k = 1:
r11 = ||q1||_2 = 1
q1 = [1; 0.0001; 0]
r12 = q1^T q2 = 1,  q2 = q2 - r12 q1 = [0; -0.0001; 0.0001]
k = 2:
r22 = ||q2||_2 = 1.4142 × 10^{-4}
q2 = q2 / r22 = [0; -0.7071; 0.7071]
Form Q and R:
Q = [1 0; 0.0001 -0.7071; 0 0.7071],  R = [1 1; 0 1.4142 × 10^{-4}].
The least squares solution is x = [2; 0]. (Note that for the same problem, the Householder-QR method produced x = [1; 1], which is correct (Example 7.8.5).)
Modified Gram-Schmidt versus Classical Gram-Schmidt Algorithms
Mathematically, the CGS and MGS algorithms are equivalent. However, as remarked earlier, their numerical properties are different. For example, consider the computation of q2 by the CGS method, given q1 with ||q1||_2 = 1. We have
q2 ≡ a2 - r12 q1,
where r12 = q1^T a2. Then it can be shown (Bjorck (1992)) that
||fl(q2) - q2||_2 ≤ (1.06)(2m + 3) μ ||a2||_2.
This shows that in CGS the two computed vectors, q1 and q2, can be far from being orthogonal. On the other hand, it can be shown (Bjorck (1992)) that in MGS the loss of orthogonality depends upon the condition number of the matrix A. Specifically, it has been shown that
||I - Q̂^T Q̂||_2 ≤ c1 μ Cond_2(A) / (1 - c2 μ Cond_2(A)).
TABLE 7.1
Comparison of QR Factorization with Different Methods
Method                      ||I - Q̂^T Q̂||
Gram-Schmidt                1.178648780488241 × 10^{-7}
Modified Gram-Schmidt       4.504305729523455 × 10^{-12}
Householder                 4.841449989971538 × 10^{-16}
Remark: The table clearly shows the superiority of the Householder method over both the Gram-Schmidt and modified Gram-Schmidt methods; of the latter two methods, the MGS is clearly preferred over the CGS.
= Q1(Rx - z) - ρ q_{n+1}.
Chris Paige is an Australian-Canadian numerical linear algebraist, who rejuvenated the use of the Lanczos algorithm in matrix computations by a detailed study of the breakdown of the algorithm. He is a professor of computer science at McGill University, Montreal, Canada.
If q_{n+1} is orthogonal to Q1, then ||Ax - b||_2 will be a minimum when Rx = z. Thus, the least squares solution can be obtained by solving
Rx = z,
and the residual r will be given by r = ρ q_{n+1}. Details can be found in Bjorck (1990). The above discussion leads to the following least squares algorithm.
Algorithm 7.8.5 Least Squares Solution by MGS
1. Apply MGS to A (m × n) to obtain Q1 = (q1, ..., qn) and R.
2. For k = 1, ..., n do
   δ_k = q_k^T b
   b ≡ b - δ_k q_k
3. Solve Rx = (δ_1, ..., δ_n)^T.
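The following minimal sketch (not part of the original text) carries out Algorithm 7.8.5 in MATLAB; it calls the mgs function sketched earlier, and the data are those of Example 7.8.7 below. The key point is that the right-hand side is orthogonalized against the columns of Q one at a time instead of forming Q^T b all at once.

% Sketch of Algorithm 7.8.5: least squares solution by MGS.
A = [1 1; 0.0001 0; 0 0.0001];  b = [2; 0.0001; 0.0001];
[Q, R] = mgs(A);                 % step 1 (mgs as sketched above)
n = size(A, 2);  delta = zeros(n, 1);
for k = 1:n                      % step 2
  delta(k) = Q(:, k)'*b;
  b = b - delta(k)*Q(:, k);
end
x = R \ delta;                   % step 3: solve R*x = (delta_1,...,delta_n)'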
Example 7.8.7
Consider solving the least squares problem using the MGS with
A = [1 1; 0.0001 0; 0 0.0001],  b = [2; 0.0001; 0.0001].
The exact solution is x = [1; 1].
Q = [1 0; 0.0001 -0.7071; 0 0.7071],  R = [1 1; 0 0.0001].
If we now form c = Q^T b and solve Rx = c, we obtain x = [2; 0]. On the other hand, if we obtain x using the algorithm above, we get (δ_1, δ_2) = (2, 0.0001), and the solution of Rx = (δ_1, δ_2)^T is x = [1; 1].
Round-off property and flop-count. It can be shown (see Bjorck and Paige (1992)) that the MGS QR factorization method is numerically equivalent to the Householder method applied to the augmented matrix with a block of zeros on top; that is,
H_n H_{n-1} ... H_2 H_1 [0 0; A b] = [R c1; 0 c2].
From this equivalence, it follows that the MGS method is backward stable for the least squares problem. The method is slightly more expensive than the Householder method. It requires about mn^2 flops, compared to the mn^2 - n^3/3 flops needed by the Householder method.
7.8.3 The QR Factorization Method for the Rank-Deficient Case
In this section we consider the rank-deficient overdetermined problem. As stated in Theorem 7.3.1, there are infinitely many solutions in this case. There are instances where the rank-deficiency is actually desirable, because it provides a rich family of solutions which might be used for optimizing some other aspects of the original problem.
In case the m × n matrix A, m ≥ n, has rank r < n, the matrix R in a QR factorization of A is singular. However, we have seen that the use of the QR factorization with column pivoting can theoretically reveal the rank of A. Recall that Householder's method with column pivoting yields
Q^T A P = [R11 R12; 0 0],
where P is a permutation matrix, R11 is an r × r nonsingular upper triangular matrix and R12 is r × (n - r). This factorization can obviously be used to solve the rank-deficient least squares problem as follows.
Let
P^T x = y,  AP = Ã,
and
Q^T b = [c; d].
Then
||Ax - b||_2^2 = ||Q^T A P P^T x - Q^T b||_2^2
             = || [R11 R12; 0 0] [y1; y2] - [c; d] ||_2^2
             = ||R11 y1 + R12 y2 - c||_2^2 + ||d||_2^2.
Thus, ||Ax - b||_2^2 will be minimized if y is chosen so that
R11 y1 = c - R12 y2.
Moreover, the norm of the residual in this case is
||r||_2 = ||b - Ax||_2 = ||d||_2.
This observation suggests the following QR factorization algorithm for rank-deficient least squares solutions.
Algorithm 7.8.6: Least Squares Solutions for the Rank-Deficient Problem Using QR
Step 1: Decompose AP = QR using the Householder method with column pivoting.
Step 2: Form Q^T b = [c; d].
Step 3: Choose an arbitrary vector y2.
Step 4: Solve R11 y1 = c - R12 y2 for y1, and form x = P [y1; y2].
Remarks:
1. A note on the use of column pivoting: We have shown that column pivoting is useful for the rank-deficient least squares problem. However, even in the full-rank case, the use of column pivoting is suggested (see Golub and Van Loan, MC 1984).
2. For the rank-decient least squares problem, the most reliable approach is the
use of singular value decomposition (see Section 7.8.4 and Chapter 10).
Round-off property. It can be shown (Lawson and Hanson, SLP p. 95) that for the minimum length least squares problem, the computed vector x̂ is close to the exact solution of a perturbed problem. That is, there exist a matrix E and vectors x̂ and δb such that x̂ is the minimum length least squares solution of (A + E)x̂ ≅ b + δb, where
||E||_F ≤ (6m + 6n - 6k - 3s + 84) s μ ||A||_F + O(μ^2)
and ||δb|| ≤ (6m - 3k + 40) k μ ||b|| + O(μ^2), where k is the rank of A and s = min(m, n). Moreover,
||x - x̂|| ≤ (6n - 6k + 43) s μ ||x|| + O(μ^2).
Note: In the above result we have assumed that R22 in R = [R11 R12; 0 R22] is zero. But in practical computations it will not be identically zero. In that case, if R̂22 is the computed version of R22, then we have
||E||_F ≤ ||R̂22||_F + (6m + 6n - 6k - 3s + 84) s μ ||A||_F + O(μ^2).
Step 1: AP = QR,  P = [1 0; 0 1],
Q = [-0.4472 0.8944 0; -0.8944 -0.4472 0; 0 0 1],
R = [-2.2361 0; 0 0; 0 0].
Step 2: Q^T b = [-6.7082; 0; 0];  c = (-6.7082).
Step 3: Choose y2 = 0.
Step 4: Solve R11 y1 = c - R12 y2:
y1 = c / R11 = -6.7082 / (-2.2361) = 3.
The minimum norm least squares solution is
x = y = [3; 0].
Example 7.8.9 An Inconsistent Rank-Deficient Problem
Example 7.8.9 An Inconsistent Rank-Decient-Problem
01 01 011
B 2 0 CC ;
A=B
B 2 CC :
b=B
@ A @ A
0 0 0
0 ;0:4472 ;0:8944 0 1 0 ;2:2361 0 1
! B C ; R = BB 0
Step 1: PA = QR; P =
1 0
,Q=B ;0:8944 ;0:4472 0 C 0C
C:
0 1 @ A @ A
0 0 1 0 0
c !
Step 2: QT b = ; c = ;2:2361.
d
Step 3: Choose y = 0.
2
1
!
Step 4: The minimum norm least squares solution is x = y = .
0
(Note that R11y1 = c1 ; R12y2 gives y1 = 1)
7.8.4 Least Squares Solution Using the SVD
Acting on the remark in the previous section, for the sake of curious readers and those who know how to compute the Singular Value Decomposition (SVD) of a matrix (for instance, using a software package such as MATLAB, LAPACK or LINPACK), we just state the following results which show how the SVD can be used to solve a least squares problem. A full treatment will be given in Chapter 10.
Let
A = U Σ V^T
be the SVD of A, where A is m × n with m ≥ n and Σ = diag(σ1, ..., σn). Let
b' = U^T b = (b'_1, ..., b'_m)^T.
Example 7.8.10
A = [1 1; 0.0001 0; 0 0.0001],  b = [2; 0.0001; 0.0001]
A = U Σ V^T gives
U = [1 0 -0.0001; 0 -0.7071 0.7071; 0 0.7071 0.7071]
Σ = [1.4142 0; 0 0.0001; 0 0]
V = [0.7071 -0.7071; 0.7071 0.7071]
b' = U^T b = [2; 0; 0],  y = [y1; y2] = [2/1.4142; 0] = [1.4142; 0],  x = V y = [1; 1].
(Figure: the underdetermined system Ax = b, with A of size m × n, m < n.)
Underdetermined systems arise in a variety of practical applications. Unfortunately, underde-
termined systems are not widely discussed in the literature. An excellent source is the survey paper
by Cline and Plemmons (1976). An underdetermined system has either no solution or an infinite
number of solutions. This can be seen from the following theorem:
Theorem 7.9.1 Let Ax = b be an underdetermined system. Then every solution
x of the system can be represented as
x = xR + xN ;
where xR is any solution; that is,
AxR = b;
and xN is in the null space of A, that is,
AxN = 0:
In the following, we describe two approaches for computing the minimum norm solution
assuming that A has full rank, i.e., rank(A) = m.
7.9.1 The Minimum Norm Solution of the Full-Rank Underdetermined Problem Using Normal Equations
Theorem 7.9.2 Let A be m × n (m < n) and have full rank. Then the unique minimum norm solution x to the underdetermined system
Ax = b        (7.9.1)
is given by
x = A^T (A A^T)^{-1} b.        (7.9.2)
Proof. Since A has full rank, A A^T is nonsingular, and clearly x given by (7.9.2) is a solution of the system (7.9.1). To see that x is indeed the minimum norm solution, let us assume that y is another solution and define
z = y - x.
Then
||y||_2^2 = ||x + z||_2^2
Step 2: Form the minimum norm solution x:
x = A^T y.
A = [1 2 3; 2 3 4],  b = [6; 9].
Step 1:
A A^T = [14 20; 20 29]
y = [-1; 1].
Step 2:
x = A^T y = [1; 1; 1].
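A minimal MATLAB sketch of the normal equations approach of Theorem 7.9.2 (not part of the original text), using the data of the example above:

% Sketch: minimum norm solution of a full-rank underdetermined system.
A = [1 2 3; 2 3 4];  b = [6; 9];
y = (A*A') \ b;     % step 1: solve A*A'*y = b
x = A'*y;           % step 2: minimum norm solution x = A'*y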
Remarks: The same difficulties with the normal equations, as pointed out in the overdetermined case, may arise in the underdetermined case as well. For example, consider the explicit formation of A A^T, with t digits, when
A = [1 δ 0; 1 0 δ].
If δ is such that 10^{-t} < δ < 10^{-t/2}, then
A A^T = [1 + δ^2  1; 1  1 + δ^2],
computed with t digits, is singular.
7.9.2 The QR Approach for the Full-Rank Underdetermined Problem
A QR factorization of A can, of course, be used to solve an underdetermined system and, in particular, to compute the minimum norm solution.
Since A^T in this case has more rows than columns, we decompose A^T into QR instead of A. Thus, we have
Q^T A^T = [R; 0].
(Note that A^T is n × m, n ≥ m.) So, the system Ax = b becomes
(R^T, 0) Q^T x = b.
Denote y = Q^T x. Then we have
(R^T, 0) y = b.
Partitioning y conformably, we get
(R^T, 0) [y_R; y_N] = b
or
R^T y_R = b.
The unique minimum norm least squares solution is obtained by setting
y_N = 0.
So, the minimum norm solution is given by
x = Q y = (Q1, Q2) [y_R; 0] = Q1 y_R.
Step 3: Solve
R^T y_R = b.
Step 4: Form x = Q1 y_R.
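A minimal MATLAB sketch of this QR approach (not part of the original text), again with the data of Example 7.9.2 below:

% Sketch: minimum norm solution via QR factorization of A'.
A = [1 2 3; 2 3 4];  b = [6; 9];
[Q, R] = qr(A');                 % A' is n-by-m, n >= m
m  = size(A, 1);
yR = R(1:m, 1:m)' \ b;           % solve R1'*yR = b (lower triangular system)
x  = Q(:, 1:m)*yR;               % minimum norm solution x = Q1*yR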
A note on implementation. Note that if we use the Householder method to compute the QR factorization of A^T, the product Q1 y_R can be computed from the factored form of Q as a product of Householder matrices.
Round-off property. It has been shown (Lawson and Hanson, SLP, p. 93) that the computed vector x̂ is close to the exact minimum length least squares solution of a perturbed problem. That is, there exist a matrix E and a vector x̂ such that x̂ is the minimum length solution of
(A + E) x̂ ≅ b,
where
||E||_F ≤ (6n - 3m + 41) m μ ||A||_F + O(μ^2).
Example 7.9.2
A = [1 2 3; 2 3 4],  b = [6; 9].
Step 2:
Q1 = [-0.2673 0.8729; -0.5345 0.2182; -0.8018 -0.4364]
R = [-3.7417 -5.3452; 0 0.6547; 0 0].
Step 3:
y_R = [-1.6036; 0.6547].
Step 4: The minimum norm solution:
x = Q1 y_R = [1; 1; 1].
7.10 Iterative Refinement
It is natural to wonder if a computed least squares solution x can be improved cheaply in an iterative manner, as was done in the case of a linear system. A natural analog of the iterative refinement procedure for the linear system problem described in Chapter 6 is the following scheme, proposed by Golub (1965).
Algorithm 7.10.1 Linear System Analog of Iterative Refinement for Least Squares Solution
Let x^(1) be an approximate solution of the least squares problem. Then for k = 1, 2, ..., do until convergence occurs:
(1) r^(k) = b - Ax^(k) (compute the residual).
(2) Compute c^(k) such that ||Ac^(k) - r^(k)||_2 is minimum.
(3) Correct the solution: x^(k+1) = x^(k) + c^(k).
An analysis of the above refinement procedure by Golub and Wilkinson (1966) reveals that the method is satisfactory only when the residual vector r = b - Ax is sufficiently small. A successful procedure used widely in practice now follows. The method is based upon an interesting observation made by Golub that the least squares solution x and the corresponding residual vector r satisfy the linear system
[I A; A^T 0] [r; x] = [b; 0].
Note that the above system is equivalent to
Ax + r = b
and
A^T r = 0,
which means that x is the solution of the normal equations
A^T Ax = A^T b.
Thus one can apply the iterative refinement procedure described in Chapter 6 to the above augmented linear system, yielding the following scheme, which is due to Bjorck (1967a, 1968).
Ake Bjorck is a Swedish numerical linear algebraist, well known for his outstanding contributions to the solution of least squares problems. He is a professor of mathematics at Linkoping University in Sweden.
Algorithm 7.10.2 Iterative Refinement for Least Squares Solutions
Set r^(0) = 0, x^(0) = 0.
For k = 1, 2, ... do
(1) Compute
    [r1^(k); r2^(k)] = [b; 0] - [I A; A^T 0] [r^(k); x^(k)]
    (using accumulation of inner products in double precision).
(2) Solve the system
    [I A; A^T 0] [c1^(k); c2^(k)] = [r1^(k); r2^(k)].
(3) Update:
    [r^(k+1); x^(k+1)] = [r^(k); x^(k)] + [c1^(k); c2^(k)].
Remark: Note that for satisfactory performance of the algorithm, step 1 must be performed with accumulation of inner products in double precision; this is in contrast with the iterative refinement procedure for the linear system problem given in Section 6.6, where accumulation of inner products is not necessary.
Implementation of Step 2
Since the matrix [I A; A^T 0] is of order m + n, the above scheme would be quite expensive when m is large. Fortunately, one can implement the scheme rather cheaply. Observe that if
Q^T A = [R1; 0]
is the QR decomposition of A, then the system
[I A; A^T 0] [c1; c2] = [r1; r2]
is equivalent to c1 + A c2 = r1 and A^T c1 = r2; that is,
(R1^T, 0) Q^T c1 = r2.
This shows that the above augmented system can be solved by solving two triangular systems and two matrix-vector multiplications as follows:
(1) Form Q^T r1 = [r1'; r2'].
Flop-count. With the above formulation each iteration requires only about 4mn - n^2 flops, assuming that the Householder method has been used and that Q has not been formed explicitly. Note that for the matrix-vector multiplications in steps 1 and 3, Q does not need to be formed explicitly; these products can be obtained when Q is known only in implicit form.
Round-off error. It can be shown (Bjorck (1992)) that the solution x^(s) obtained at the sth iteration satisfies an error bound in which c is an error constant.
An Interpretation of the Result and Remarks
The above result tells us that the iterative refinement procedure is quite satisfactory. It is even more satisfactory for least squares problems with large residuals. Note that for these problems (Cond(A))^2 serves as the condition number. On the other hand, the above result shows that the error at an iterative refinement step depends upon the condition number of A. The procedure "may give solutions to full single precision accuracy even when the initial solution may have no correct significant figures" (Bjorck (1992)). For a well-conditioned matrix, convergence may occur even in one iteration. Bjorck and Golub (1967) have shown that with an 8 × 8 ill-conditioned Hilbert matrix, three digits of accuracy per step, both for the solution and the residual, can be obtained.
Example 7.10.1
A = [1 2; 2 3; 3 4],  b = [3; 5; 9]
r^(0) = 0,  x^(0) = 0,  k = 0:
(1)
[r1^(0); r2^(0)] = [b; 0] = [3; 5; 9; 0; 0].
(2) Solve the system:
[I A; A^T 0] [c1^(0); c2^(0)] = [r1^(0); r2^(0)]
[c1^(0); c2^(0)] = [0.3333; -0.6667; 0.3333; 3.3333; -0.3333].
(3) Update the solution and the residual:
[r^(1); x^(1)] = [r^(0); x^(0)] + [c1^(0); c2^(0)] = [0.3333; -0.6667; 0.3333; 3.3333; -0.3333].
The computations of c1^(0) and c2^(0) are shown below.
r1^(0) = b,  r2^(0) = [0; 0].
Q^T r1^(0) = [r1'; r2'] = [-10.6904; -0.2182; 0.8165]. Thus,
r1' = [-10.6904; -0.2182],
r2' = (0.8165).
c2^(0) = [3.3333; -0.3333].
c1^(0) = [0.3333; -0.6667; 0.3333].
Note that
x^(1) = [3.3333; -0.3333]
is the same least squares solution as obtained by the QR and normal equations methods.
7.11 Computing the Variance-Covariance Matrix (A^T A)^{-1}
In statistical and image processing applications one very often needs to compute the variance-covariance matrix
X = (A^T A)^{-1}.
In order to see how this matrix arises in statistical analysis, consider the classical linear regression model
b = Ax + ε,
with the properties
1. E(ε) = 0, and
2. cov(ε) = E(ε ε^T) = σ^2 I.
Here b is the response vector, A is the design matrix, and ε is the error term. The parameters x and σ^2 are unknown.
Regression analysis is concerned with prediction of one or more dependent (response) variables from the values of a group of predictor (independent) variables. Assuming that A has full rank, the least squares estimate of x in the above model is, clearly, given by
x̂ = (A^T A)^{-1} A^T b.
Also,
ε̂ = b - b̂ = (I - A (A^T A)^{-1} A^T) b
(note that A^T ε̂ = 0 and b̂^T ε̂ = 0). A common estimate of σ^2 is s^2 = ||b - A x̂||_2^2 / (m - n), where A is m × n.
For details, see Johnson and Wichern (1988), and Bjorck (1992).
Since vital information might be lost in computing AT A explicitly, one wonders if it is possible
to compute X from the QR factorization of A. We show below how this can be done.
Consider the QR factorization of A:
Q^T A = [R̃; 0].
Then
A^T A = (R̃)^T R̃
and, thus,
X = (A^T A)^{-1} = (R̃)^{-1} (R̃)^{-T}.
One can then easily compute (R̃)^{-1}, since R̃ is an upper triangular matrix. Note that since X is symmetric, only one-half of its entries need to be computed, and X can overwrite R. However, computing (A^T A)^{-1} this way requires the explicit multiplication (R̃)^{-1} (R̃)^{-T}.
This explicit multiplication can easily be avoided if the computations are reorganized. Thus, if x1 through xn are the successive columns of X, then from
X = (R̃)^{-1} (R̃)^{-T},
we have
R̃ (x1, ..., xn) = (R̃)^{-T}.
Since the last column of (R̃)^{-T} is just (1/r_nn) e_n, the last column, x_n, is easily computed by solving the upper triangular system
R̃ x_n = (1/r_nn) e_n.        (7.11.1)
By symmetry we also get the last row of X.
Suppose now that we have already determined x_ij = x_ji, j = n, ..., k+1, i ≤ j. We now determine x_ik, i ≤ k.
For x_kk we have
x_kk r_kk + Σ_{j=k+1}^{n} r_kj x_jk = 1/r_kk,
from which we obtain
x_kk = (1/r_kk) (1/r_kk - Σ_{j=k+1}^{n} r_kj x_kj).        (7.11.2)
(Note that x_kj = x_jk, j = k+1, ..., n, have already been computed.) For i = k-1, ..., 1, we have
x_ik = -(1/r_ii) ( Σ_{j=i+1}^{k} r_ij x_jk + Σ_{j=k+1}^{n} r_ij x_kj ).        (7.11.3)
Thus, (7.11.1)-(7.11.3) determine all the entries of X.
Algorithm 7.11.1 Computing the Variance-Covariance Matrix X = (A^T A)^{-1}
Flop-count. The computation of X = (A^T A)^{-1} using the above formulas requires about n^3/3 flops.
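A minimal MATLAB sketch based on formulas (7.11.1)-(7.11.3) (not part of the original text); it takes R from MATLAB's qr and never forms (A^T A)^{-1} explicitly. The data are those of Example 7.11.1 below.

% Sketch: variance-covariance matrix X = (A'A)^(-1) from the R factor of A.
A = [1 5 7; 2 3 4; 1 1 1];
[~, R] = qr(A);  n = size(A, 2);
X  = zeros(n);
en = zeros(n, 1);  en(n) = 1/R(n, n);
X(:, n) = R \ en;                        % (7.11.1): last column of X
X(n, :) = X(:, n)';                      % symmetry gives the last row
for k = n-1:-1:1
  X(k, k) = (1/R(k, k))*(1/R(k, k) - R(k, k+1:n)*X(k, k+1:n)');   % (7.11.2)
  for i = k-1:-1:1
    X(i, k) = -(1/R(i, i))*(R(i, i+1:k)*X(i+1:k, k) + R(i, k+1:n)*X(k, k+1:n)');  % (7.11.3)
    X(k, i) = X(i, k);                   % fill in by symmetry
  end
end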
Example 7.11.1
A = [1 5 7; 2 3 4; 1 1 1].
A = QR gives
Q = [-0.4082 0.9045 0.1231; -0.8165 -0.3015 -0.4924; -0.4082 -0.3015 0.8616]
R = [-2.4495 -4.8990 -6.5320; 0 3.3166 4.8242; 0 0 -0.2462].
The last column x3 = [4; -24; 16.5]  (using (7.11.1))
x22 = 35  (using (7.11.2))
x12 = -6  (using (7.11.3))
x11 = 1.5
Then X = (A^T A)^{-1} = [1.5 -6 4; -6 35 -24; 4 -24 16.5].
7.12 Review and Summary, and Table of Comparisons
1. Existence and Uniqueness. The least squares solution to the problem Ax = b always exists. In the overdetermined case, it is unique if and only if A has full rank (Theorem 7.3.1).
2. Overdetermined Problems. We discussed two methods: the normal equations method and
the QR factorization method.
The normal equations method is simple to implement, but there are numerical difficulties with this method in certain cases.
The QR factorization methods are more expensive than the normal equations method, but are more reliable numerically. If one insists on using normal equations, the use of extended precision is recommended. One can use Householder or Givens transformations or the modified Gram-Schmidt method to achieve the QR factorization of A needed for the solution of the least squares problem.
The Householder QR factorization is the most efficient and numerically viable; the modified Gram-Schmidt is only slightly more expensive than the Householder method and is not as reliable as the Householder or the Givens method numerically for QR factorization. However, with some reorganization it can be made competitive with the Householder method for computing the least squares solution. Indeed, the popularity of the MGS method for solving the least squares problem is growing. Some numerical experiments even suggest the superiority of this method over other QR methods. See Bjorck (1992). Rice (1966) first noted the superiority of this algorithm.
For the rank-deficient overdetermined problem, there are infinitely many solutions. Here we have discussed only the QR factorization method with column pivoting. It is reasonably accurate, but the most numerically viable approach to deal with rank deficiency is the SVD approach to be discussed in Chapter 10.
3. Underdetermined Problems. For the underdetermined problem there are infinitely many solutions in the full-rank case. We have discussed the normal equations method and the QR factorization method for the minimum-norm solution. The normal equations here are:
A A^T y = b,
whereas those of the overdetermined problem are:
A^T Ax = A^T b.
The numerical difficulties with the normal equations are the same as those of the overdetermined problem.
The round-off error analyses for the underdetermined and the rank-deficient overdetermined problems are somewhat complicated, because solutions in these cases are not unique unless the minimum-norm solution is required. The backward round-off analyses of the QR factorization methods using Householder transformations in these cases show that the computed minimum-norm solution in each case is near the minimum-norm solution of a perturbed problem. This is in contrast with the results obtained by the Householder-QR factorization method for the full-rank overdetermined problem, where it is shown that the computed solution is the exact solution of a perturbed problem.
4. Perturbation Analysis: The results of the perturbation analyses are different for different cases of perturbation of the data.
If only b is perturbed, then Cond(A) = ||A|| ||A^†|| serves as the condition number for the unique least squares solution (Theorem 7.7.1).
If A is perturbed, then the sensitivity of the unique least squares solution, in general, depends upon the square of the condition number (Theorem 7.7.2). In certain cases, such as when the residual is zero, the sensitivity depends only on the condition number of A.
5. Iterative Refinement: As in the case of the linear system problem, it is possible to improve the accuracy of a computed least squares solution in an iterative fashion. An algorithm which is a natural analog of the one for the linear system (Section 6.9) is satisfactory only when the residual vector r = b - Ax is sufficiently small. A widely used algorithm due to Bjorck is presented in Section 7.10. This algorithm requires the solution of an augmented system of order m + n (where A is m × n). It is shown how to solve the system in a rather inexpensive way using the QR factorization of A.
The solution obtained by this iterative refinement algorithm is quite satisfactory.
6. Computing the Covariance Matrix: An efficient computation of the matrix (A^T A)^{-1} is important in statistical and image processing applications. In Section 7.11, we show how to compute this matrix using the QR factorization of A and without finding the inverse explicitly.
TABLE 7.2
COMPARISONS OF DIFFERENT LEAST SQUARES METHODS

Problem: Overdetermined, full-rank.  Method: Normal equations.
Flop-count: mn^2/2 + n^3/6.
Numerical properties: (1) difficulties with the formation of A^T A; (2) in certain cases produces more errors in the solution than are warranted by the data.

Problem: Overdetermined, full-rank.  Method: Householder-QR.
Flop-count: mn^2 - n^3/3.
Numerical properties: stable; the computed solution is the exact solution of a nearby problem.

Problem: Overdetermined, full-rank.  Method: MGS-QR.
Flop-count: mn^2.
Numerical properties: almost as stable as the Householder-QR.

Problem: Overdetermined, rank-deficient.  Method: Householder-QR with column pivoting.
Flop-count: 2mnr - r^2(m + n) + 2r^3/3, where r = rank(A).
Numerical properties: mildly stable; the computed minimum-norm solution is close to the minimum-norm solution of a perturbed problem.

Problem: Underdetermined, full-rank.  Method: Normal equations.
Flop-count: m^2 n/2 + m^3/6.
Numerical properties: same difficulties as in the case of the overdetermined problem.

Problem: Underdetermined, full-rank.  Method: Householder-QR.
Flop-count: m^2 n - m^3/3.
Numerical properties: same as the rank-deficient overdetermined problem.
7.13 Suggestions for Further Reading
The techniques of least squares solutions are covered in any numerical linear algebra text, and in
some numerical analysis texts. The emphasis in most books is on the overdetermined problems. For
a thorough treatment of the subject we refer the readers to the book by Golub and Van Loan (MC).
The books by Stewart (IMC), by Watkins (FMC), and by Gil, Murray and Philip Numerical
Linear Algebra and Optimization also contain detailed discussions on perturbation analyses.
The books MC and FMC contain a very thorough treatment of the perturbation analysis.
A complete book devoted to the subject is the (almost classical) book by C. L. Lawson and
R. J. Hanson (SLP). There is a nice recent monograph by Bjorck (1992) that contains a thorough
discussion on least squares problems. These two books are a "must" for readers interested in further study of the subject. The SLP by Lawson and Hanson, in particular, gives the proofs of the round-off error analyses of the various algorithms described in the present book.
Any book on regression analysis in statistics will contain applications of least squares problem
in statistics. We have, in particular, used the book Applied Linear Regression Models, by
John Neter, William Wasserman and Michael Kutner, Richard D. Irwin, Inc., Illinois (1983). A
classical survey paper of Golub (1969) contains an excellent exposition of the numerical linear
algebra techniques for the least squares and the singular value decomposition problems arising in
statistics and elsewhere. A paper by Stewart (1987) is also interesting to read.
The readers are highly recommended to read these papers along with other papers in the area
by Golub, Bjorck, Stewart, etc. representing the most fundamental contributions in this area. See,
for details, the references given in the book on these papers and the list of references appearing in
the recent monograph of Bjorck (1992). For more on underdetermined problems, see the paper by
Cline and Plemmons (1976).
Exercises on Chapter 7
PROBLEMS ON SECTION 7.3
1. Let A be m × n, m ≥ n. Prove that the matrix A^T A is positive definite if and only if A has rank n.
2. Show that the residual vector r = b - Ax is orthogonal to all vectors in R(A).
3. Prove that x is a least squares solution if and only if
Ax = b_R and b - Ax = b_N,
where b_R and b_N are, respectively, the components of the vector b in R(A) and in its orthogonal complement.
4. Prove Theorem 7.3.1, both in the full-rank and rank-deficient cases, based on the QR factorization of A.
8. Let A and Ã have full rank. Let x and x̃ be, respectively, the unique least squares solutions to the problems
Ax = b
and
Ã x̃ = b,
where Ã = A + E. Then prove that
A = [1 2; 3 4; 5 6],  b = [3; 7; 11],  E = 10^{-4} A.
10. Verify the inequality of Theorem 7.7.4 in each of the following cases.
(a) A = [1 2; 3 4; 5 6],  E = 10^{-4} A
(b) A = [1 1; 10^{-4} 0; 0 10^{-4}],  E = 10^{-4} A
(c) A = [1 1; 0 1; 0 0],  E = 10^{-3} A.
11. Work out a proof of Theorem 7.7.2.
PROBLEMS ON SECTIONS 7.7 AND 7.8
12. Develop an algorithm to solve the normal equations whose system matrix is positive semidefinite and singular, based on complete pivoting of the system matrix. Apply your algorithm to the normal equations with
A = [1 1; 1 1]
and
b = [2; 2].
13. Let A = [1 1; 2 3; 0 1],  b = [0; 5; 1].
(a) Find the unique least squares solution x using
(i) x = A^† b,
(ii) the normal equations method,
(iii) the Householder and the Givens QR factorization methods,
(iv) the CGS and MGS methods.
(b) Find Cond(A).
(c) Show that for this problem, the sensitivity of the least squares problem, when only A is perturbed, depends upon Cond(A).
(d) Let ΔA = E = 10^{-4} A and let x̃ = Ã^† b, where Ã = A + E. Find ||x̃ - x|| / ||x|| and verify the inequality of Theorem 7.7.2 for this problem.
(e) Find r and r̂ and verify the inequality of Theorem 7.7.3.
14. Construct your own example where the sensitivity of the least squares problem will depend
upon the square of the condition number of the matrix. (Show all your work.)
15. Consider the following well-known ill-conditioned matrix
A = [1 1 1; ε 0 0; 0 ε 0; 0 0 ε],  |ε| ≪ 1
(Bjorck (1990)).
(a) Choose ε small, so that rank(A) = 3. Then compute Cond_2(A) to check that A is ill-conditioned.
(b) Find the least squares solution to Ax = b with b = [3; ε; ε; ε] using
(i) the normal equations method,
(ii) the QR factorization method with the Householder, CGS and MGS methods.
(c) Change b to b1 = [3; 0; 0; 0]. Keep A unchanged. Find an upper bound for the relative error in the least squares solution.
(d) Change A to A1 = A + ΔA, where ΔA = 10^{-3} A. Keep b unchanged. Find an upper bound for the relative error in the least squares solution.
(e) Find the maximum departure from orthogonality of the computed columns of the Q matrix using the CGS and MGS methods.
(f) Compute the least squares solution of the problem in (b) using the SVD.
16. (Square-root free Cholesky) Given a symmetric positive definite matrix A, develop an algorithm for finding the Cholesky decomposition of A without any square roots:
A = LDL^T,
where L is a unit lower triangular matrix and D is a diagonal matrix with positive diagonal entries.
Apply your algorithm to solve the full-rank least squares problem based on solving the normal equations.
17. (a) Derive a column-oriented version of the modified Gram-Schmidt process.
(b) Show that the flop-count for both the Gram-Schmidt and the modified Gram-Schmidt methods is mn^2, where A is m × n (m ≥ n).
(c) Show that the flop-count for the Householder method is n^2 m - n^3/3.
18. (a) Construct an example of the least squares problem where Gram-Schmidt will yield a poor result but modified Gram-Schmidt will do fairly well.
(b) Apply the CGS and MGS methods to the matrix A = [8 21; 13 34; 21 35; 34 39], using t = 4. Show that in both cases the loss of orthogonality between q1 and q2 is unacceptable: q1^T q2 = 0.09065.
19. Show that the flop-count for QR factorization with column pivoting using the Householder method is 2mnr - r^2(m + n) + 2r^3/3, where r = rank(A).
20. Show that for the QR method with column pivoting in the rank-deficient case, the basic solution cannot be the minimum-norm solution unless R12 is zero.
21. (a) Show that the minimum-norm solution to Ax = b, with rank(A) = r, obtained by complete QR factorization with column pivoting is given by
x = PW [T^{-1} c; 0],
where c is a vector consisting of the first r elements of Q^T b.
(b) Find the minimum norm solution to the least squares problem with
A = [1 1 1; 1 1 1; 1 1 1],  b = [3; 3; 3].
23. Develop an algorithm similar to the one given in section 7.8.2 for computing the minimum
norm solution of the full-rank underdetermined problem. Apply your algorithm to problem
#22.
24. Using Theorem 7.9.1, prove that the minimum norm solution to an underdetermined system can be obtained by projecting any solution of the system onto R(A^†). That is, if P_{A^†} is the orthogonal projection onto R(A^†), then the minimum norm solution x is given by
x = P_{A^†} y,
where y is any solution. Using the above formula, compute the minimum norm solution to the system
[1 10^{-4} 0 0; 1 0 10^{-4} 0; 1 0 0 10^{-4}] [x1; x2; x3; x4] = [1; 1; 1].
0
2. For k = 1; 2; : : : do
2.1 r(k) = b ; Ax(k) . (Compute the residual.)
2.2 Solve the least squares problem: Find c(k) such that
k k
Ac ; r
is minimum.
( ) ( )
2
MATLAB PROGRAMS AND PROBLEMS ON CHAPTER 7
You will need the programs housqr, givqr, clgrsch, mdgrsch, lsfrqrh, lsfrmgs,
lsfrnme, lsudnme, lsudqrh, lsitrn1, lsitrn2, reganal from MATCOM.
ΔA = E_A = 10^{-4} [0 -0.0001; 0 0.0009; 0 0.0003; 0 0.0001],  δb = 10^{-4} [0.0001; 0.0001; 0.0001; 0.0001].
Using MATLAB commands pinv, cond, norm, orth, null, etc., verify the inequalities of Theorems 7.7.1-7.7.4 on the different sensitivities of least-squares problems.
3. (Implementation of the Least-Squares QR Algorithm using Givens Rotations).
Using givqr and bcksub, write a MATLAB program to implement the QR algorithm using Givens rotations for the full-rank overdetermined least-squares problem:
Test Data for Problems 3, 4 and 5
For problems 4 and 5 use the following sets of test data:
1. A randomly generated matrix of order 10.
2. The Hilbert matrix of order 10.
3. A = [1 1 1; 10^{-3} 0 0; 0 10^{-3} 0; 0 0 10^{-3}].
For problem #5, generate b so that the least-squares solution x in each case has all entries equal to 1 (e.g., for the data matrix in #3, b = [3; 10^{-3}; 10^{-3}; 10^{-3}]).
4. (The purpose of this exercise is to compare the accuracy, flop-count and timing of different methods for QR factorization of a matrix.)
a. Compute the QR factorization of each matrix A in the data set using
i. [Q,R] = qr(A) from MATLAB or [Q,R] = housqr(A) from Chapter 5 (Householder QR),
ii. [Q,R] = givqr(A) (Givens QR),
iii. [Q,R] = clgrsch(A) from MATCOM (Classical Gram-Schmidt),
iv. [Q,R] = mdgrsch(A) from MATCOM (Modified Gram-Schmidt).
b. Using the results of (a), make the following table for each matrix A. Q̂ and R̂ stand for the computed Q and R.
Table
(COMPARISON OF DIFFERENT QR FACTORIZATION METHODS).
housqr
givqr
clgrsch
mdgrsch
Table
(COMPARISON OF DIFFERENT METHODS FOR THE FULL-RANK OVERDETERMINED LEAST-SQUARES PROBLEM)
METHOD                 ||x - x̂||_2 / ||x||_2    ||Ax̂ - b||_2    Flop-Count    Elapsed Time
lsfrmgs
lsfrqrh
lsfrqrg
lsfrnme
generalized-inverse
c. Write your conclusions.
6. Using housqr from MATCOM or [Q,R] = qr(A) from MATLAB, and backsub from Chapter 6, write a MATLAB program, called lsrdqrh(A,b), to compute the minimum norm least-squares solution x to the rank-deficient overdetermined problem Ax = b, and the corresponding residual r, using the Householder QR factorization of A:
[x, r] = lsrdqrh(A, b)
(This program should implement Algorithm 7.8.6 of the book.)
Test Data: A is a 20 × 2 matrix with all entries equal to 1, and b is a vector with all entries equal to 2.
7. Using the MATLAB function
[U,S,V] = svd(A)
to compute the singular value decomposition of A, write a MATLAB program, called lsrdsvd,
to compute the minimum norm least-squares solution x to the rank-deficient overdetermined
system Ax = b:
[x] = lsrdsvd(A, b)
Use the same test data as in problem #6 and compare the results with respect to accuracy,
flop-count, and elapsed time.
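A hedged sketch of an lsrdsvd-style routine; the name and interface follow the exercise, but this is not the book's implementation, and the rank tolerance below is an assumption.

function x = lsrdsvd(A, b)
  % Minimum-norm least-squares solution via the SVD.
  [U, S, V] = svd(A);
  s   = diag(S);
  tol = max(size(A)) * eps(max(s));   % rank tolerance (an assumption)
  r   = sum(s > tol);                 % numerical rank
  x   = V(:, 1:r) * ((U(:, 1:r)' * b) ./ s(1:r));
end

For the test data of problem #6 (A = ones(20,2), b = 2*ones(20,1)), this sketch returns x = (1, 1)^T.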
454
8. Run the programs lsudnme (least-squares solution for the underdetermined full-rank
problem using normal equations) and lsudqrh (least-squares solution for the under-
determined full-rank problem using Householder QR factorization) from MATCOM
on the following sets of data to compute the minimum norm solution x to the full-rank un-
derdetermined problem Ax = b, and compare the results with respect to accuracy, elapsed
time, and flop-count.
1 2 3 4 5 6
! 1 1 1 1 1 1 1 1
!
A= ; A= ;
0 1 2 3 4 5 6 7
0 10 10 10 10 10 10 10 1
B 0 1 0 0 0 0 0 CC :
A=B
@ A
0 1 0 0 0 0 0
Construct b for each A so that the minimum norm solution x has all its entries equal to 1.
9. Run the programs lsitrn1 (based on Algorithm 7.10.1) and lsitrn2 (based on Algorithm
7.10.2) from MATCOM on the 8 x 8 Hilbert matrix A, constructing b randomly. Compare
the algorithms on this data with respect to the number of iterations, flop-count, and elapsed
time, both for the solution and the residual.
10.
a. Compute (A^T A)^-1 for each of the following matrices A as follows:
i. Compute (A^T A)^-1 explicitly using the MATLAB command inv.
ii. Run the program reganal from MATCOM.
b. Compare the results with respect to accuracy, flop-count, and elapsed time.
Test Data:
A = the 8 x 8 Hilbert matrix;
A = [ 1      1      1
      10^-3  0      0
      0      10^-3  0
      0      0      10^-3 ].
455
8. NUMERICAL MATRIX EIGENVALUE PROBLEMS
8.1 Introduction : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 457
8.2 Some Basic Results on Eigenvalues and Eigenvectors : : : : : : : : : : : : : : : : : : 458
8.2.1 Eigenvalues and Eigenvectors : : : : : : : : : : : : : : : : : : : : : : : : : : : 458
8.2.2 The Schur Triangularization Theorem and its Applications : : : : : : : : : : 460
8.2.3 Diagonalization of a Hermitian Matrix : : : : : : : : : : : : : : : : : : : : : : 463
8.2.4 The Cayley-Hamilton Theorem : : : : : : : : : : : : : : : : : : : : : : : : : : 466
8.3 The Eigenvalue Problems Arising in Practical Applications : : : : : : : : : : : : : : 466
8.3.1 Stability Problems for Differential and Difference Equations : : : : : : : : : : 467
8.3.2 Vibration Problem, Buckling Problem and Simulating Transient Current of
an Electrical Circuit : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 473
8.3.3 An Example of the Eigenvalue Problem Arising in Statistics: Principal Com-
ponents Analysis : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 488
8.4 Localization of Eigenvalues : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 490
8.4.1 The Gersgorin Disk Theorems : : : : : : : : : : : : : : : : : : : : : : : : : : : 490
8.4.2 Eigenvalue Bounds and Matrix Norms : : : : : : : : : : : : : : : : : : : : : : 494
8.5 Computing Selected Eigenvalues and Eigenvectors : : : : : : : : : : : : : : : : : : : 495
8.5.1 Discussions on the Importance of the Largest and Smallest Eigenvalues : : : 495
8.5.2 The Role of Dominant Eigenvalues and Eigenvectors in Dynamical Systems : 496
8.5.3 The Power Method, The Inverse Iteration and the Rayleigh Quotient Iteration : : 497
8.5.4 Computing the Subdominant Eigenvalues and Eigenvectors: Deflation : : : : : : : 508
8.6 Similarity Transformations and Eigenvalue Computations : : : : : : : : : : : : : : : 515
8.6.1 Eigenvalue Computations Using the Characteristic Polynomial : : : : : : : : 515
8.6.2 Eigenvalue Computations via Jordan-Canonical Form : : : : : : : : : : : : : 520
8.6.3 Hyman's Method : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 520
8.7 Eigenvalue Sensitivity : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 521
8.7.1 The Bauer-Fike Theorem : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 521
8.7.2 Sensitivity of the Individual Eigenvalues : : : : : : : : : : : : : : : : : : : : : 524
8.8 Eigenvector Sensitivity : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 527
8.9 The Real Schur Form and QR Iterations : : : : : : : : : : : : : : : : : : : : : : : : : 529
8.9.1 The Basic QR Iteration : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 532
8.9.2 The Hessenberg QR Iteration : : : : : : : : : : : : : : : : : : : : : : : : : : : 535
8.9.3 Convergence of the QR Iterations and the Shift of Origin : : : : : : : : : : : 536
8.9.4 The Single-Shift QR Iteration : : : : : : : : : : : : : : : : : : : : : : : : : : : 537
8.9.5 The Double-Shift QR Iteration : : : : : : : : : : : : : : : : : : : : : : : : : : 539
8.9.6 Implicit QR Iteration : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 543
8.9.7 Obtaining the Real Schur Form A : : : : : : : : : : : : : : : : : : : : : : : : 547
8.9.8 The Real Schur Form and Invariant Subspaces : : : : : : : : : : : : : : : : : 550
8.10 Computing the Eigenvectors : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 552
8.10.1 The Hessenberg-Inverse Iteration : : : : : : : : : : : : : : : : : : : : : : : : : 552
8.10.2 Calculating the Eigenvectors from the Real Schur Form : : : : : : : : : : : : 553
8.11 The Symmetric Eigenvalue Problem : : : : : : : : : : : : : : : : : : : : : : : : : : : 556
8.11.1 The Sturm Sequence and the Bisection Method : : : : : : : : : : : : : : : : : 558
8.11.2 The Symmetric QR Iteration Method : : : : : : : : : : : : : : : : : : : : : : 564
8.12 The Lanczos Algorithm For Symmetric Matrices : : : : : : : : : : : : : : : : : : : : 566
8.13 Review and Summary : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 573
8.14 Suggestions for Further Reading : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 578
CHAPTER 8
NUMERICAL MATRIX EIGENVALUE PROBLEMS
8. NUMERICAL MATRIX EIGENVALUE PROBLEMS
Objectives
The major objectives of this chapter are to study numerical methods for the matrix eigenvalue
problem. Here are some of the highlights of the chapter.
- Some practical applications (Section 8.3) giving rise to the eigenvalue problem.
- The power method and the inverse power method for finding selected eigenvalues
and eigenvectors (Section 8.5).
- Eigenvalue and eigenvector sensitivity (Sections 8.7 and 8.8).
- The QR iteration algorithm for finding the eigenvalues of a matrix of moderate size
(Section 8.9).
- Inverse iteration and real Schur methods for finding the eigenvectors (Section 8.10).
- Special methods for the symmetric eigenvalue problem: symmetric QR iteration with the
Wilkinson shift and the bisection method (Section 8.11).
- The Lanczos method for finding the extremal eigenvalues of a large and sparse symmetric
matrix (Section 8.12).
Required Background
The following concepts and tools developed earlier in this book will be required for smooth
reading and understanding of the material of this chapter.
1. Norm Properties of Matrices (Section 1.7)
2. The QR factorization of an arbitrary and a Hessenberg matrix using Householder and
Givens transformations (Sections 5.4.1 and 5.5.1)
3. Linear system solutions with arbitrary, Hessenberg, and triangular matrices (Section 6.4)
4. The condition number and its properties (Section 6.7)
5. Reduction to Hessenberg and tridiagonal form (sections 5.4.3 and 5.5.3).
456
8.1 Introduction
This chapter is devoted to the study of the numerical matrix eigenvalue problem. The problem is a
very important practical problem and arises in a variety of application areas, including engineering,
statistics, economics, etc.
Since the eigenvalues of a matrix A are the zeros of the characteristic polynomial det(A - lambda*I),
one would naively think of computing the eigenvalues of A by finding its characteristic polynomial
and then computing its zeros by a standard root-finding method. Unfortunately, this is not a
practical approach.
A standard practical algorithm for nding the eigenvalues of a matrix is the QR iteration
method with a single or double shift. Several applications do not need the knowledge of the whole
spectrum. A few selected eigenvalues, usually a few of the largest or smallest ones, suffice. A classical
method, based on implicit powering of A, known as the power method is useful for this purpose.
The symmetric eigenvalue problem enjoys certain remarkable special properties; one such prop-
erty is that the eigenvalues of a symmetric matrix are well conditioned. A method based
on the exploitation of some of the special properties of the symmetric eigenvalue problem, known as
the Sturm Sequence-Bisection method, is useful for finding the eigenvalues of a symmetric matrix
in an interval (note that the eigenvalues of a symmetric matrix are all real).
The QR iteration method, unfortunately, cannot be used to compute the eigenvalues and eigen-
vectors of a very large and sparse matrix. The sparsity gets lost and the storage becomes an
issue. An almost classical method by Lanczos has been rejuvenated by numerical linear algebraists
recently. The symmetric Lanczos method is useful in particular to extract a few extremal
eigenvalues and the associated eigenvectors of a very large and sparse matrix. The organization of
this chapter is as follows.
Some of the basic theoretical results on eigenvalues and eigenvectors are stated (and proved
sometimes) in Section 8.2.
Section 8.3 is devoted to the discussions of how the eigenvalue problem arises in some practical
applications such as stability analyses of a system of differential and difference equations, vibra-
tion analysis, transient behaviour of an electrical circuit, buckling problem, principal component
analysis in statistics, etc.
In Section 8.4 some classical results on eigenvalues locations such as the Gersgorin's disk
theorem etc. are stated and proved.
Section 8.5 describes the power method, the inverse power method, the Rayleigh-Quotient
iteration, etc. for nding a selected number of eigenvalues and the corresponding eigenvectors.
In Section 8.6 the difficulties of computing the eigenvalues of a matrix via the characteristic
polynomial and the Jordan canonical form are highlighted.
The eigenvalue and eigenvector sensitivity are discussed in sections 8.7 and 8.8. The most
important result in this section is the Bauer-Fike theorem.
Section 8.9 is the most important section of this chapter. The QR iteration method with and
without shifts and their implementations are described in this section.
The Hessenberg-inverse iteration and computations of eigenvectors from the Real Schur
form are described in Section 8.10. The methods based on specific properties of symmetric ma-
trices are discussed in Section 8.11. These include the Sturm sequence-bisection method and
symmetric QR iteration method with the Wilkinson shift.
The chapter concludes with a brief introduction of the symmetric Lanczos algorithm for
finding the extremal eigenvalues of a large and sparse symmetric matrix.
Ax = lambda*x,
or (A - lambda*I)x = 0.
The vector x is a right eigenvector (or just an eigenvector) of A associated with the eigenvalue
lambda. A vector y satisfying
y*A = lambda*y*
is called a left eigenvector of A associated with the eigenvalue lambda. It is customary to call a right
eigenvector just an eigenvector.
The homogeneous system (A - lambda*I)x = 0 has a nontrivial solution if and only if
det(A - lambda*I) = 0.
p_A(lambda) = det(A - lambda*I) is a polynomial in lambda of degree n and is called the characteristic polynomial
of A.
Thus the n eigenvalues of A are the n roots of the characteristic polynomial.
The sum of the eigenvalues of a matrix A is called the trace of A. It is denoted by trace (A)
or Tr (A).
458
Recall also from Chapter 1 that if A = (a_ij), then trace(A) = a_11 + a_22 + ... + a_nn.
Theorem 8.2.1 The eigenvalues of a triangular matrix are its diagonal entries.
Theorem 8.2.2 The matrices A and TAT^-1 have the same eigenvalues. In other
words, the eigenvalues of a matrix remain invariant under a similarity transformation.
Proof.
det(TAT^-1 - lambda*I) = det(T(A - lambda*I)T^-1)
                       = det(T) det(A - lambda*I) det(T^-1)
                       = det(T) det(T^-1) det(A - lambda*I)
                       = det(TT^-1) det(A - lambda*I)
                       = det(A - lambda*I).
Thus, TAT ;1 and A have the same characteristic polynomial and, therefore, have the same
eigenvalues.
459
Note: The converse is not true. Two matrices having the same eigenvalues are not necessarily
similar. Here is a simple example. Take
A = [1 1; 0 1],   B = I_2x2 = [1 0; 0 1].
A and B have the same eigenvalues, but they cannot be similar, because every matrix similar to
the identity is the identity itself.
Theorem 8.2.3 For any complex matrix A there exists a unitary matrix U such
that
U*AU = T
is a triangular matrix. The eigenvalues of A are the diagonal entries of T.
The theorem is of significant theoretical importance. As we shall see below, several important
results on eigenvalues and eigenvectors can be derived using this theorem. We shall give a proof of
this theorem later in the chapter (Theorem 8.9.1).
460
In the general case,
U* p(A) U = U*(c_0 A^n + c_1 A^(n-1) + ... + c_n I)U
          = c_0 U*A^n U + c_1 U*A^(n-1) U + ... + c_n U*U.
The diagonal entries of the triangular matrix T_1 are p(lambda_1), p(lambda_2), ..., p(lambda_n). Since p(A) and T_1 have
the same eigenvalues, the theorem is proved.
Theorem 8.2.5 The eigenvalues of a Hermitian matrix A are real. The eigenvec-
tors corresponding to the distinct eigenvalues are pairwise orthogonal.
Proof. Since
U*AU = T,
we have
(U*AU)* = T*,  or  U*A*U = T*,  or  U*AU = T*,  or  T = T*.
Thus, T is also Hermitian and, therefore, diagonal.
Since the eigenvalues of A are the diagonal entries of T , and the diagonal entries of a
Hermitian matrix must be real, it follows that the eigenvalues of A are real.
Remark: Note that if A is real, then A = A implies that A is symmetric (AT = A). The
eigenvalues of a real symmetric matrix are also real.
To prove the second part, let lambda_1 and lambda_2 be two distinct eigenvalues of A, and let x and y be the
associated eigenvectors. Then by definition, we have
Ax = lambda_1 x,  x != 0,
and
Ay = lambda_2 y,  y != 0,
and
y*Ax = (Ay)*x = (lambda_2 y)*x = conj(lambda_2) y*x = lambda_2 y*x   (since conj(lambda_2) = lambda_2).
Since A* = A, we also have y*Ax = lambda_1 y*x. Thus
lambda_1 y*x = lambda_2 y*x,
or (lambda_1 - lambda_2) y*x = 0. Since lambda_1 != lambda_2, it follows that y*x = 0; that is, x and y are orthogonal.
Theorem 8.2.6 The eigenvalues of a Hermitian positive denite matrix are positive
and conversely, if a Hermitian matrix has all its eigenvalues positive, it must be
positive denite.
Define now x = Uy; then x*Ax = y*U*AUy = sum_i lambda_i |y_i|^2. Since the lambda_i's are real and positive,
x*Ax > 0. Again, every nonzero y corresponds to a nonzero x. Thus, for every nonzero x, we have
x*Ax > 0, proving that A is positive definite.
Theorem 8.2.7 The eigenvectors associated with the distinct eigenvalues of a ma-
trix are linearly independent.
462
Proof. Let lambda_1, ..., lambda_m be the distinct eigenvalues and x_1, ..., x_m the corresponding eigenvectors.
Consider
c_1 x_1 + ... + c_m x_m = 0.
Multiplying both sides to the left by (A - lambda_1 I) and using Ax_1 = lambda_1 x_1, we get
c_2(lambda_2 - lambda_1)x_2 + ... + c_m(lambda_m - lambda_1)x_m = 0.
Multiplying now to the left of the last equation by (A - lambda_2 I), we get
c_3(lambda_3 - lambda_1)(lambda_3 - lambda_2)x_3 + ... + c_m(lambda_m - lambda_1)(lambda_m - lambda_2)x_m = 0.
Theorem 8.2.8 For any Hermitian matrix A, there exists a unitary matrix U
such that
U AU = D
is a diagonal matrix. The diagonal entries of D are eigenvalues of A.
463
Theorem 8.2.9 An arbitrary matrix A is similar to a diagonal matrix if its eigen-
values are distinct.
X = (x_1, ..., x_n).
Then
AX = A(x_1, ..., x_n) = (lambda_1 x_1, ..., lambda_n x_n) = (x_1, ..., x_n) diag(lambda_1, ..., lambda_n) = XD,
where D = diag(lambda_1, ..., lambda_n).
Since x_1, ..., x_n are the eigenvectors corresponding to the distinct eigenvalues lambda_1, ..., lambda_n, they
are linearly independent (Theorem 8.2.7); thus X is nonsingular and X^-1 A X = D.
464
Theorem 8.2.10 (The Jordan Theorem) For an n x n matrix A, there exists a
nonsingular matrix T such that
T^-1 A T = diag(J_1, ..., J_m),
where each Jordan block J_i is an upper bidiagonal matrix of the form
J_i = [ lambda_i  1
                  lambda_i  1
                            ...
                                 lambda_i  1
                                           lambda_i ],
and the lambda_i are eigenvalues of A. If J_i is of order v_i, then
v_1 + ... + v_m = n.
Theorem 8.2.11 A square matrix A satises its own characteristic equation; that
is, if A = (aij ) is an n n matrix and Pn () is the characteristic polynomial of A,
then
Pn (A) is a ZERO matrix.
0 1
0 0A
= @ :
0 0
8.3 The Eigenvalue Problems Arising in Practical Applications
The problem of finding eigenvalues and eigenvectors arises in a wide variety of practical ap-
plications, in almost all branches of science and engineering. As we have seen before,
the mathematical models of many engineering problems are systems of differential and difference
equations and the solutions of these equations are often expressed in terms of the eigenvalues and
eigenvectors of the matrices of these systems. Furthermore, many important characteristics of phys-
ical and engineering systems, such as stability, etc., often can be determined only by knowing the
nature and location of the eigenvalues. We will give a few representative examples in this section.
466
8.3.1 Stability Problems for Differential and Difference Equations
A homogeneous linear system of differential equations with constant coefficients of the form
dx_1(t)/dt = a_11 x_1(t) + a_12 x_2(t) + ... + a_1n x_n(t)
    ...
dx_n(t)/dt = a_n1 x_1(t) + a_n2 x_2(t) + ... + a_nn x_n(t),
or, in matrix form,
x'(t) = Ax(t),   (8.3.1)
where A = (a_ij)_{n x n} and
x'(t) = d/dt (x_1(t), ..., x_n(t))^T,
arises in a wide variety of physical and engineering systems. The solution of this system is intimately
related to the eigenvalue problem for the matrix A.
To see this, assume that (8.3.1) has a solution of the form x(t) = v e^{lambda*t}, where v does not depend on t.
Then we must have
lambda v e^{lambda*t} = A v e^{lambda*t},   (8.3.2)
that is,
Av = lambda v,   (8.3.3)
showing that lambda is an eigenvalue of A and v is a corresponding eigenvector. Thus the eigenpair (lambda, v)
of A can be used to compute a solution x(t) of (8.3.1). If A has n linearly independent eigenvectors
(which will happen, as we have seen in the last section, when the eigenvalues of A are all distinct),
then the general solution of the system can be written as
x(t) = c_1 v_1 e^{lambda_1 t} + c_2 v_2 e^{lambda_2 t} + ... + c_n v_n e^{lambda_n t},   (8.3.4)
where lambda_1, ..., lambda_n are the eigenvalues of A and v_1, v_2, ..., v_n are the corresponding eigenvectors.
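A small MATLAB sketch (not from the book) of formula (8.3.4); the matrix A and the initial vector are assumptions chosen only for illustration.

A  = [0 1; -2 -3];                  % example matrix with distinct eigenvalues -1, -2
x0 = [1; 0];
[V, D] = eig(A);                    % columns of V are eigenvectors, D holds eigenvalues
c  = V \ x0;                        % coefficients c_i with x0 = sum c_i v_i
t  = 1.5;
xt = V * (c .* exp(diag(D) * t));   % x(t) = sum c_i v_i e^{lambda_i t}
disp(norm(xt - expm(A * t) * x0))   % agrees with the matrix exponential solution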
In the general case, the solution of (8.3.1) with x(0) = x_0 is given by
x(t) = e^{At} x_0,   (8.3.5)
where
e^{At} = I + At + A^2 t^2 / 2! + ...
The eigenvalues lambda_i and the corresponding eigenvectors now appear in the computation of e^{At}.
For example, if A has the Jordan canonical form
V^-1 A V = diag(J_1, J_2, ..., J_k),   (8.3.6)
where each Jordan block has the form
J_i = [ lambda_i  1
                  ...  ...
                       lambda_i  1
                                 lambda_i ],   (8.3.7)
then
e^{At} = V diag(e^{J_1 t}, e^{J_2 t}, ..., e^{J_k t}) V^-1,   (8.3.8)
where, for a block J_i of order p + 1,
e^{J_i t} = e^{lambda_i t} [ 1  t  t^2/2!  ...  t^p/p!
                                1  t       ...  t^{p-1}/(p-1)!
                                   ...          ...
                                            1   t
                                                1 ].   (8.3.9)
Thus, the system of differential equations (8.3.1) is completely solved by knowing the eigenvalues
and eigenvectors of the system matrix A.
Furthermore, as said before, many interesting and desirable properties of physical and engineer-
ing systems can be studied just by knowing the location or the nature of the eigenvalues of the
system matrix A. Stability is one such property. The stability is dened with respect to an
equilibrium solution.
Definition 8.3.1 An equilibrium solution of the system
x'(t) = Ax(t),  x(0) = x_0,   (8.3.10)
is a vector x_e satisfying
Ax_e = 0.
Clearly, x_e = 0 is an equilibrium solution, and it is the unique equilibrium solution
if and only if A is nonsingular.
A mathematical definition of stability is now in order.
Definition 8.3.2 An equilibrium solution x_e is said to be stable if, for any t_0 and eps > 0, there
exists a real number delta(eps, t_0) > 0 such that ||x(t) - x_e|| < eps whenever ||x_0 - x_e|| <= delta.
In many situations, stability is not enough, one needs more than that.
Definition 8.3.3 An equilibrium solution is asymptotically stable if it is stable and if, for any
t_0, there is a delta such that
||x(t) - x_e|| -> 0 as t -> infinity
whenever
||x_0 - x_e|| < delta.
Definition 8.3.4 The system (8.3.1) is asymptotically stable if the equilibrium solution x_e = 0 is
asymptotically stable.
Since an asymptotically stable system is necessarily stable, but not conversely, the following
convention is normally adopted.
Definition 8.3.5 A system is called marginally stable if it is stable, but not asymptotically
stable.
Definition 8.3.6 A system that is not stable is called unstable.
Mathematical Criteria for Asymptotic Stability
469
Theorem 8.3.1 (Stability Theorem for a Homogeneous System of Differ-
ential Equations) A necessary and sufficient condition for the equilibrium solution
x_e = 0 of the homogeneous system (8.3.1) to be asymptotically stable is that the
eigenvalues of the matrix A all have negative real parts.
Proof. It is enough to prove that x(t) -> 0 as t -> infinity. Since the general solution of the system
x'(t) = Ax(t) is given by x(t) = e^{At} x(0), the proof follows from (8.3.6)-(8.3.9).
Note that if lambda_j = alpha_j + i*beta_j, j = 1, 2, ..., n, then e^{lambda_j t} = e^{alpha_j t} e^{i beta_j t}, and e^{lambda_j t} -> 0 as t -> infinity
if and only if alpha_j < 0.
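A one-line MATLAB check of this criterion (not from the book); the test matrix is an assumption used only for illustration.

A = [-1 2; 0 -3];                      % example matrix
isStable = all(real(eig(A)) < 0)       % true => the system x' = Ax is asymptotically stable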
Stability of a Nonhomogeneous System
Many practical situations give rise to mathematical model of the form
x_ (t) = Ax(t) + b; (8.3.11)
where b is a constant vector. The stability of such a system is also governed by the eigenvalues of
A. This can be seen as follows.
Let x_e be an equilibrium solution of (8.3.11). Define
z(t) = x(t) - x_e.
Then
z'(t) = x'(t) = Ax(t) + b - (Ax_e + b) = A(x(t) - x_e) = Az(t).
Thus, x(t) -> x_e if and only if z(t) -> 0. It therefore follows from Theorem 8.3.1 that:
470
Remarks: Since the stability of an equilibrium solution depends upon the eigenvalues of the
matrix A of the system, it is usual to refer to stability of the system itself, or just the stability
of the matrix A.
Stability of a System of Dierence Equations
Like the system of dierential equations (8.3.11), many practical systems are modeled by a system
of dierence equations: xk+1 = Axk + b.
A well-known mathematical criterion for asymptotic stability of such a system is given in the
following theorem. We leave the proof to the readers.
471
First consider the following crude (but simple) mathematical model of war between two coun-
tries:
dx_1/dt = k_1 x_2 - alpha_1 x_1 + g_1
dx_2/dt = k_2 x_1 - alpha_2 x_2 + g_2,
where
x_i(t) = the war potential of country i, i = 1, 2,
g_i(t) = the grievances that country i has against the other, i = 1, 2.
g_i, alpha_i and k_i, i = 1, 2, are all positive constants. alpha_i x_i denotes the cost of armaments of country
i. This mathematical model is due to L. F. Richardson, and is known as the Richardson model.
Note that this simple model is realistic in the sense that the rate of change of the war potential
of one country depends upon the war potential of the other country, on the grievances that
one country has against its enemy, and on the cost of the armaments the country can afford.
While the first two factors cause the rate to increase, the last factor has a slowing effect
(that is why a minus sign is associated with that term).
In matrix form, this model can be written as
x'(t) = Ax(t) + g,
where
A = [ -alpha_1   k_1
       k_2      -alpha_2 ],   x(t) = [ x_1(t)
                                       x_2(t) ],   g = [ g_1
                                                         g_2 ].
Thus the equilibrium solution is asymptotically stable if alpha_1 alpha_2 - k_1 k_2 > 0, and unstable if
alpha_1 alpha_2 - k_1 k_2 < 0. This is because when alpha_1 alpha_2 - k_1 k_2 > 0, both eigenvalues have negative
real parts; when it is negative, one eigenvalue has a positive real part.
For the above European arms race, the estimates of alpha_1, alpha_2 and k_1, k_2 were made under some
realistic assumptions and were found to be as follows:
alpha_1 = alpha_2 = 0.2,
k_1 = k_2 = 0.9.
(For details of how these estimates were obtained, see the book by M. Braun (1978). The main
assumptions are that both alliances have roughly the same strength, and that alpha_1 and alpha_2 are the
same as for Great Britain, which is usually taken to be the reciprocal of the life-time of the British
Parliament (five years).)
With these values of alpha_1, alpha_2 and k_1, k_2, we have
alpha_1 alpha_2 - k_1 k_2 = alpha_1^2 - k_1^2 = -0.7700.
Thus the equilibrium is unstable. In fact, the two eigenvalues are 1.4000 and -2.2000.
For a general model of Richardson's theory of arm races and the role of eigenvalues there, see
the book Introduction to Dynamic Systems by David G. Luenberger, John Wiley and Sons,
New York, 1979 (pp. 209-214).
Convergence of Iterative Schemes for Linear Systems
In Chapter 6 (Theorem 6.10.1) we have seen that the iteration
x_{k+1} = Bx_k + d   (8.3.13)
for solving the linear system
Ax = b
converges to the solution x for an arbitrary choice of the initial approximation x_1 if and only if
the spectral radius rho(B) < 1.
We thus see again that only an implicit knowledge of the eigenvalues of B is needed
to see if an iterative scheme based upon (8.3.13) is convergent.
8.3.2 Vibration Problem, Buckling Problem and Simulating Transient Current of an
Electrical Circuit
Analysis of vibration and buckling of structures, simulation of transient current of electrical circuits,
etc., often give rise to a system of second-order differential equations of the form
B y''(t) + A y(t) = 0,   (8.3.14)
where
y(t) = (y_1(t), y_2(t), ..., y_n(t))^T.   (8.3.15)
473
The solution of such a system leads to the solution of an eigenvalue problem of the type
Ax = lambda Bx.   (8.3.16)
This can be seen as follows. Let y = x e^{iwt} be a solution of the system (8.3.14). Then from (8.3.14)
we must have
w^2 Bx = Ax.
In vibration problems the matrices B and A are, respectively, called the mass and stiffness
matrices, and are denoted by M and K, giving rise to the symmetric generalized eigenvalue
problem
Kx = lambda Mx.
474
Definition 8.3.7 The quantities w_i = sqrt(lambda_i), i = 1, ..., n, are called the natural frequencies, and
x_1, ..., x_n are called the amplitudes of vibration of the masses.
The frequencies can be used to determine the periods T_{p_i} of the vibrations. Thus
T_{p_i} = 2*pi / w_i
is the period of vibration for the ith mode.
As we will see, the behavior of a vibrating system can be analyzed by knowing the
natural frequencies and the amplitudes . We will give a simple example below to illustrate
this. An entire chapter (Chapter 9) will be devoted to the generalized eigenvalue problem later.
Other vibration problems, such as the buckling problem of a beam, give rise to boundary
value problems for second-order differential equations. The solutions of such problems using
finite differences also lead to eigenvalue problems. We will illustrate this in this section as well.
We now describe below how the frequencies and amplitudes can be used to predict the phe-
nomenon of resonance in vibration engineering.
All machines and structures such as bridges, buildings, aircrafts, etc., possessing mass and
elasticity experience vibration to some degree, and their design require consideration of their os-
cillatory behavior. Free vibration takes place when a system oscillates due to the forces inherent
in the system, and without any external forces. Under free vibration such systems will vibrate at
one or more of its natural frequencies, which are properties of the dynamical system and depends
on the associated mass and stiness distribution. For forced vibration, systems oscillate under the
excitation of external forces. When such excitation is oscillatory the system is also forced to vibrate
at the excitation frequency.
475
Phenomenon of Resonance
If the excitation frequency coincides or becomes close to
one of the natural frequencies of the system, dangerously
large oscillations may result, and a condition of resonance
is encountered. This is the kind of situation an engineer
would very much like to avoid. The collapse of the Tacoma
Narrows (also known as Galloping Gerty) Bridgea at Puget
Sound in the state of Washington in 1940 and that of the
Broughton Suspension Bridge in England are attributed to
such a phenomenon.
a For a complete story of the collapse of the Tacoma bridge, see the book
Differential Equations and Their Applications by M. Braun, Springer-
Verlag, 1978 (p. 167{169).
In both the above cases, a periodic force of very large amplitude was generated and, the fre-
quency of this force was equal to one of the natural frequencies of the bridge at the time of collapse.
In the case of the Broughton Bridge, the large force was set up by soldiers marching in cadence
over the bridge. In case of the Tacoma Bridge, it was the wind. Because of what happened
in Broughton, soldiers are no longer permitted to march in cadence over a bridge.
Buildings can be constructed with active devices so that the force due to wind can be controlled.
The famous Chicago Sears Tower in the windy city, Chicago, has such a device.
Another important property of dynamical systems is damping which is present in all systems
due to energy dissipation by friction and other resistances. However, for small values of damping,
it has very little eect on the natural frequencies, and is normally not included in the estimation
of natural frequencies. Damping becomes important in limiting the amplitude of oscillation at
resonance, major eect of damping is to reduce amplitude with time.
For a continuous elastic body the number of independent co-ordinates or degrees of freedom
needed to describe the motion is innite. However, under many situations, parts of such bodies
may be assumed to be rigid, and the system may be treated dynamically equivalent to one having
a nite number of the degrees of freedom.
In summary, the behavior of a vibrating system can be analyzed by knowing the frequencies
and the amplitudes of the masses, and the eigenvalue and the eigenvectors of the matrix of the
mathematical model of the system are related to these quantities. Specically, the frequencies
476
are the square roots of the eigenvalues and the relative amplitudes are represented by
the components of the eigenvectors.
Example 8.3.2 Vibration of a Building
Consider a two-story building (Figure 8.1(a)) with rigid floors, shown below. It is assumed that
the weight distribution of the building can be represented as a concentrated weight at each
floor level, as shown in Figure 8.1(b), and that the stiffnesses of the supporting columns are
represented by the spring constants k_i.
Figure 8.1 A two-story building (a) and its equivalent spring-mass model (b), with masses m_1, m_2,
spring constants k_1, k_2, and floor displacements y_1, y_2.
The equations of motion are
m_1 y_1'' + (k_1 + k_2) y_1 - k_2 y_2 = 0
m_2 y_2'' - k_2 y_1 + k_2 y_2 = 0,
or
[ m_1  0    ] [ y_1'' ]   [ k_1 + k_2  -k_2 ] [ y_1 ]
[ 0    m_2  ] [ y_2'' ] + [ -k_2        k_2 ] [ y_2 ] = 0.   (8.3.18)
477
Taking m_1 = m_2 = m and k_1 = k_2 = k, we have
[ m  0 ] [ y_1'' ]   [ 2k  -k ] [ y_1 ]
[ 0  m ] [ y_2'' ] + [ -k   k ] [ y_2 ] = 0.   (8.3.19)
Defining
the mass matrix M = [ m  0
                      0  m ]
and
the stiffness matrix K = [ 2k  -k
                           -k   k ],
the above equation becomes
M y'' + K y = 0,  where y = [ y_1
                              y_2 ].   (8.3.20)
The eigenvalues lambda_1 and lambda_2 and the corresponding eigenvectors representing the two normal modes
for this 2 x 2 problem are easily calculated. The eigenvalues are
lambda_1 = (k/m)(0.3820),   lambda_2 = (k/m)(2.6180).
478
Figure 8.2 The two normal modes. The natural frequencies are w_1 = 0.618 sqrt(k/m) and
w_2 = 1.618 sqrt(k/m); in the first mode the two floor amplitudes have the same sign (0.526 and 0.851),
while in the second mode they have opposite signs (0.851 and -0.526).
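A brief MATLAB check (not from the book) of the eigenvalues quoted above, using the generalized eigensolver; m = k = 1 is an illustrative choice.

m = 1;  k = 1;
M = [m 0; 0 m];
K = [2*k -k; -k k];
[V, D] = eig(K, M);        % solves K*V = M*V*D
lambda = diag(D)           % approximately 0.3820 and 2.6180 (times k/m)
w = sqrt(lambda)           % natural frequencies, approximately 0.618 and 1.618
V                          % columns give the relative amplitudes of the two floors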
Buckling Problem (A Boundary Value Problem)
Consider a thin, uniform beam of length l. An axial load P is applied to the beam at one of
the ends.
Figure 8.3 A simply supported beam of length l under an axial load P; y denotes the deflection
at distance x from the left support.
We are interested in the stability of the beam, that is, how and when the beam buckles. We
will show below how this problem gives rise to an eigenvalue problem and what role the eigenvalues
play.
Let y denote the vertical displacement (deflection) of a point of the beam at a distance x from
the left support. Suppose that both ends of the beam are simply supported, i.e., y(0) = y(l) = 0.
From beam theory, the curvature d^2y/dx^2 is proportional to the bending moment produced by
the axial load; discretizing the resulting differential equation by finite differences on a uniform mesh
of width h and taking into account the boundary conditions (8.3.23), we obtain the following
symmetric tridiagonal matrix eigenvalue problem:
[  2  -1            ] [ y_1 ]       [ y_1 ]
[ -1   2  -1        ] [ y_2 ]       [ y_2 ]
[      ...  ... ... ] [ ... ] = lambda [ ... ],   (8.3.25)
[           -1   2  ] [ y_n ]       [ y_n ]
where
lambda = P h^2 / (EI).   (8.3.26)
Each value of lambda determines a load
P = lambda EI / h^2,   (8.3.27)
which is called a critical load. These critical loads are the ones of practical interest,
because they determine the possible onset of buckling of the beam.
In general, the smallest value of P is of primary importance, since buckling
associated with the larger critical loads cannot be reached without failure first occurring
under the action of the lowest critical load P.
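A short MATLAB sketch (not from the book) of the computation just described; the beam data are illustrative assumptions. The smallest eigenvalue of (8.3.25) yields the lowest critical load, which approaches the Euler load pi^2 EI / l^2 as the mesh is refined.

n = 50;  l = 1;  E = 1;  I = 1;           % illustrative values (assumptions)
h = l / (n + 1);                          % mesh width
T = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
lambda_min = min(eig(T));                 % smallest eigenvalue of (8.3.25)
P_crit = lambda_min * E * I / h^2         % approximates pi^2*E*I/l^2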
Simulating Transient Current for an Electric Circuit ( Chapra and Canale (1988) ).
Given an electric circuit consisting of several loops, suppose we are interested in the transient
behavior of the electric circuit. In particular, we want to know the oscillation of each loop with
respect to the other. First consider the following single-loop circuit.
Figure 8.4 A single loop containing an inductor L, a resistor R and a capacitor C, carrying
current i.
The voltage drop across a capacitor is
V_C = q / C,   q = the charge on the capacitor, C = the capacitance.
The voltage drop across an inductor is
V_L = L di/dt,   L = the inductance.
The voltage drop across a resistor is
V_R = iR,   R = the resistance.
Kirchhoff's voltage law states that the algebraic sum of the voltage drops around a closed loop
is zero. We then have for this circuit
L di/dt + Ri + q/C = 0,
or
L di/dt + iR + (1/C) * integral_{-inf}^{t} i dt = 0
(because V_C = q/C = (1/C) * integral i dt).
Now consider the network with four loops.
Figure 8.5 A network of four loops with inductances L_1, L_2, L_3, L_4, capacitances C_1, C_2, C_3, C_4,
and loop currents i_1, i_2, i_3, i_4.
Kircho's voltage law applied to each loop gives
Loop 1: Zt
; L didt ; C1
1
1
(i ; i )dt = 0 1 2 (8.3.28)
;1 1
Loop 2: Zt Zt
di 1
; L dt ; C2 1
(i ; i )dt + C (i ; i )dt = 0 (8.3.29)
2 2 3 1 2
;1 2 ;1 1
Loop 3: Zt Zt
di 1
; L dt ; C3 1
(i ; i )dt + C (i ; i )dt = 0 (8.3.30)
3 3 4 2 3
;1 3 ;1 2
Loop 4: Zt Zt
di 1
; L dt ; C 4 1
i dt + C (i ; i )dt = 0: (8.3.31)
4 4 3 4
;1 4;1 3
482
The system of ODEs given by (8.3.28)-(8.3.31) can be dierentiated and rearranged to give
L ddti + C1 (i ; i ) = 0
2
1 2
1
1 2 (8.3.32)
1
L ddti + C1 (i ; i ) ; C1 (i ; i ) = 0
2
2 2
2
2 3 1 2 (8.3.33)
2 1
L ddti + C1 (i ; i ) ; C1 (i ; i ) = 0
2
3 2
3
3 4 2 3 (8.3.34)
3 2
L ddti + C1 i ; C1 (i ; i ) = 0:
2
4
4
2 4 3 4 (8.3.35)
4 3
Assume
ij = Aj sin(wt); j = 1; 2; 3; 4: (8.3.36)
(Recall that ij is the current at the j th loop.)
>From (8.3.32)
;L1A1w2 sin wt + C1 A1 sin wt ; C1 A2 sin wt = 0
1 1
or
( C1 ; L1w2 )A1 ; C1 A2 = 0: (8.3.37)
1 1
>From (8.3.33)
;L A w sin wt + C1 A sin wt ; C1 A sin wt ; C1 A sin wt + C1 A sin wt = 0
2 2
2
2 2 1 2
2 2 1 1
or
; C1 A + ( C1 + C1 ; L w )A ; C1 A = 0:
1 2
2
2 3 (8.3.38)
1 1 2 2
>From (8.3.34)
;L A w sin wt + C1 A sin wt ; C1 A sin wt ; C1 A sin wt + C1 A sin wt = 0
3 3
2
3 4 2 3
3 3 2 2
or
; C1 A + ( C1 + C1 ; L w )A ; C1 A = 0:
2 3
2
3 4 (8.3.39)
2 2 3 3
>From (8.3.35)
;L A w sin wt + C1 A sin wt ; C1 A sin wt ; C1 A sin wt = 0
4 4
2
4 3 4
4 3 3
or
; C1 A + ( C1 + C1 ; L w )A = 0:
3 4
2
4 (8.3.40)
3 3 4
483
Gathering the equations (8.3.37){(8.3.40) together, we have
( 1 ; L w 2 )A ; 1 A = 0
C C 1 1 2 (8.3.41)
1 1
1
; C A + ( C + C ; L w )A ; C1 A = 0
1
1
1
2
2
2 3 (8.3.42)
1 1 2 2
; C1 A + ( C1 ; C1 ; L w )A ; C1 A = 0
2 3
2
3 4 (8.3.43)
2 2 3 3
; C1 A + ( C1 + C1 ; L w )A = 0
3 4
2
4 (8.3.44)
3 3 4
or
(1 ; L1 C1w2 )A1 ; A2 = 0 (8.3.45)
; C2 A1 + ( C2 + 1 ; L2C2w2)A2 ; A3 = 0
C C (8.3.46)
1 1
C
; C A + ( CC + 1 ; L C w )A ; A = 0
3
2
3
3 3
2
3 4 (8.3.47)
2 2
; CC A + ( CC + 1 ; L C w )A = 0:
4
3
4
4 4
2
4 (8.3.48)
3 3
or
A ;A
1 2 = L1 C1w2A1
; CC A + CC A + A ; A
2
1
2
2 2 3 = L2 C2w2A2
(8.3.49)
; CC A + ( C2 + 1)A ; A
1 1
3
2
3
3 4 = L3 C3w2A3
; CC A + ( CC + 1) + A
2
4
3
4
4 = L4 C4w2A4
3 3
@ ; CC ( CC + 1) ;1 C
3
2 A @
3
2 3
A
; CC ( CC + 1) A 4
3
4
2 4
0 10 1
BB L C 0 0 1 0 CC BB A
1 1
CC
=w B
B 0 LC 0 0 C CC BBB A CC : (8.3.50)
BB 0 CC
2 2 2 2
@ 0 LC 0 C A B@ A 3 3 3
A
0 0 0 LC 4 4 A 4
The above is an eigenvalue problem. To see it more clearly, consider the special case
C_1 = C_2 = C_3 = C_4 = C
and
L_1 = L_2 = L_3 = L_4 = L,
and set
lambda = L C w^2.
Then the system reduces to
[  1  -1   0   0 ] [ A_1 ]          [ A_1 ]
[ -1   2  -1   0 ] [ A_2 ] = lambda [ A_2 ].   (8.3.53)
[  0  -1   2  -1 ] [ A_3 ]          [ A_3 ]
[  0   0  -1   2 ] [ A_4 ]          [ A_4 ]
The solution of this eigenvalue problem gives the natural frequencies (w_i^2 = lambda_i / LC). Moreover,
the knowledge of the eigenvectors can be used to study the circuit's physical behavior, such as the
natural modes of oscillation.
The eigenvalues and the corresponding normalized eigenvectors (in four-digit arithmetic) are:
lambda_1 = 0.1206,  lambda_2 = 1,  lambda_3 = 2.3473,  lambda_4 = 3.5321;
485
( 0.6565, 0.5774, 0.4285, 0.2280 )^T,
( 0.5774, -0.0000, -0.5774, -0.5774 )^T,
( -0.4285, 0.5774, 0.2289, -0.6565 )^T,
( -0.2280, 0.5774, -0.6565, 0.4285 )^T.
From the signs of the eigenvector components we conclude that for lambda_1 all the loops oscillate
in the same direction. For lambda_3 the second and third loops oscillate in directions opposite to the
first and fourth, and so on. This is shown in the following diagram.
486
Figure 8.6 The modes of oscillation of the four loops corresponding to lambda_1 = 0.1206, lambda_2 = 1,
lambda_3 = 2.3473 and lambda_4 = 3.5321.
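A hedged MATLAB sketch (not from the book) reproducing these modes; the component values L and C below are arbitrary assumptions.

Cmat = [1 -1 0 0; -1 2 -1 0; 0 -1 2 -1; 0 0 -1 2];
[V, D] = eig(Cmat);
lambda = diag(D)              % approximately 0.1206, 1.0000, 2.3473, 3.5321
L = 1e-3;  C = 1e-6;          % illustrative component values (assumptions)
w = sqrt(lambda / (L * C))    % natural frequencies w_i = sqrt(lambda_i/(L*C))
V                             % columns give the relative oscillation of the four loops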
487
8.3.3 An Example of the Eigenvalue Problem Arising in Statistics: Principal Com-
ponents Analysis
Many practical-life applications involving statistical analysis (e.g. stock market, weather prediction,
etc.) involve a huge amount of data. The volume and complexities of the data in these cases can
make the computations required for analysis practically infeasible. In order to handle and analyze
such a voluminous amount of data in practice, it is therefore necessary to reduce the data. The basic
idea then will be to choose judiciously `k' components from a data set consisting of n measurement
on p(p > k) original variables, in such a way that much of the information (if not most) in the
original p variables is contained in the k chosen components.
Such k components are called the rst k \principal components" in statistics.
The knowledge of eigenvalues and eigenvectors of the covariance matrix is needed to nd these
principal components.
Specifically, if Sigma is the covariance matrix corresponding to the random vector
X = (X_1, X_2, ..., X_p),
Note: The covariance matrix is symmetric positive semidenite, and, therefore, its
eigenvalues are all nonnegative.
If the rst k ratios constitute the most of the total population variance, then the rst k principal
components can be used in statistical analysis.
Note that in computing the kth ratio, we need to know only the kth eigenvalue of the covariance
matrix; the entire spectrum does not need to be computed. To end this section, we remark that
many real-life practices, such as computing the index of Dow Jones Industrial Average, etc.,
can now be better understood and explained through the principal components analysis. This is
shown in the example below.
488
A Stock-Market Example (Taken from Johnson and Wichern (1992))
Suppose that the covariance matrix for the weekly rates of return for stocks of ve major
companies (Allied Chemical, DuPont, Union Carbide, Exxon, and Texaco) in a given period of
time is given by
R = [ 1.000  0.577  0.509  0.387  0.462
      0.577  1.000  0.599  0.389  0.322
      0.509  0.599  1.000  0.436  0.426
      0.387  0.389  0.436  1.000  0.523
      0.462  0.322  0.426  0.523  1.000 ].
The first two eigenvalues of R are
lambda_1 = 2.857,   (8.3.55)
lambda_2 = 0.809.   (8.3.56)
The proportion of the total population variance due to the first component is approximately
2.857 / 5 = 57%.   (8.3.57)
The proportion of the total population variance due to the second component is approximately
0.809 / 5 = 16%.   (8.3.58)
Thus the first two principal components account for 73% of the total population variance. The
eigenvectors corresponding to these principal components are
x_1^T = (0.464, 0.457, 0.470, 0.421, 0.421)   (8.3.59)
and
x_2^T = (0.240, 0.509, 0.260, -0.526, -0.582).   (8.3.60)
These eigenvectors have interesting interpretations. From the expression for x_1 we see that the
first component is (roughly) an equally weighted sum of the five stocks. This component is generally
called the market component. The expression for x_2, however, tells us that the second component
represents a contrast between the chemical stocks and the oil-industry stocks. This component
is generally called an industry component. Thus, we conclude that about 57% of the total
variation in these stock returns is due to market activity and 16% is due to industry activity.
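A brief MATLAB sketch (not from the book) reproducing the proportions quoted above from the matrix R of this example.

R = [1.000 0.577 0.509 0.387 0.462;
     0.577 1.000 0.599 0.389 0.322;
     0.509 0.599 1.000 0.436 0.426;
     0.387 0.389 0.436 1.000 0.523;
     0.462 0.322 0.426 0.523 1.000];
[V, D] = eig(R);
[lambda, idx] = sort(diag(D), 'descend');
V = V(:, idx);
proportions = lambda / trace(R)   % first two entries are about 0.57 and 0.16
x1 = V(:, 1)                      % "market" component
x2 = V(:, 2)                      % "industry" component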
The eigenvalue problem also arises in many other important statistical analysis, for example,
in computing the canonical correlations, etc. Interested readers are referred to the book by
Johnson and Wichern (1992) for further reading.
489
A nal comment: Most eigenvalue problems arising in statistics,
such as in principal components analysis, canonical correlations, etc.,
are actually singular value decomposition problems and should be
handled computationally using the singular value decomposition to
be described in Chapter 10.
490
we have
(lambda - a_ii) x_i = sum_{j=1, j != i}^{n} a_ij x_j,   i = 1, ..., n,
where x_i is the ith component of the vector x. Let x_k be the largest component of x (in absolute
value). Then, since |x_j| / |x_k| <= 1 for j != k, we have from above
|lambda - a_kk| <= sum_{j=1, j != k}^{n} |a_kj| |x_j| / |x_k| <= sum_{j=1, j != k}^{n} |a_kj|.
r = 12
2
r = 2:
3
R : jz ; 4j 12
2
R : jz ; 1j 2:
3
Remark: It is clear from the above example that Gersgorin's first theorem gives only very
crude estimates of the eigenvalues.
491
(A sketch of the Gersgorin disks R_1, R_2, R_3 in the complex plane.)
R_2: |z - 4| <= 0.5,
R_3: |z - 8| <= 0.9.
All three disks are disjoint from each other. Therefore, by Theorem 8.4.2, each disk must
contain exactly one eigenvalue of A. This is indeed true: the eigenvalues of A are 0.9834, 3.9671,
and 8.0495.
(A sketch of the three disjoint disks R_1, R_2, R_3 centered at 1, 4 and 8 on the real axis.)
493
8.4.2 Eigenvalue Bounds and Matrix Norms
Simple matrix norms can sometimes be used to obtain useful bounds for the eigenvalues. Here are
two examples.
Corollary 8.4.1
rho(A) <= ||A^T||.
Combining these two results and taking the infinity norm in particular, we obtain
Theorem 8.4.4
rho(A) <= min{ max_i sum_{j=1}^{n} |a_ij|,  max_j sum_{i=1}^{n} |a_ij| }.
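A one-line MATLAB check of Theorem 8.4.4 (not from the book); the matrix A is an arbitrary illustrative choice.

A = [4 1 0; 2 3 1; 0 1 5];                     % example matrix (an assumption)
rho = max(abs(eig(A)));                        % spectral radius
bound = min(norm(A, inf), norm(A, 1));         % row-sum and column-sum norms
fprintf('rho = %g <= bound = %g\n', rho, bound);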
494
Theorem 8.4.5 Let lambda_1, lambda_2, ..., lambda_n be the eigenvalues of A. Then
sum_{i=1}^{n} |lambda_i|^2 <= ||A||_F^2.
Proof. The Schur Triangularization Theorem (Theorem 8.2.3) tells us that there exists a unitary
matrix U such that
U*AU = T, an upper triangular matrix.
Thus
T*T = U*A*AU.
So, A*A is unitarily similar to T*T. Since similar matrices have the same traces,
Tr(T*T) = Tr(A*A) = ||A||_F^2.
Again, Tr(T*T) = sum_i sum_j |t_ij|^2, so
sum_{i=1}^{n} sum_{j=1}^{n} |t_ij|^2 = ||A||_F^2,
and
sum_{i=1}^{n} |t_ii|^2 = sum_{i=1}^{n} |lambda_i|^2.
Thus,
sum_{i=1}^{n} |lambda_i|^2 = sum_{i=1}^{n} |t_ii|^2 <= sum_{i=1}^{n} sum_{j=1}^{n} |t_ij|^2 = ||A||_F^2.
Let lambda_1 be the dominant eigenvalue of A; that is, |lambda_1| > |lambda_2| >= |lambda_3| >= ... >= |lambda_n|, where
lambda_1, lambda_2, ..., lambda_n are the eigenvalues of A. Suppose that A has a set of independent eigenvectors
v_1, v_2, ..., v_n. Then
x_k = A^k x_0 = alpha_1 lambda_1^k v_1 + alpha_2 lambda_2^k v_2 + ... + alpha_n lambda_n^k v_n,
where x_0 = alpha_1 v_1 + ... + alpha_n v_n. Since |lambda_1|^k > |lambda_i|^k, i = 2, 3, ..., n, it follows that for large
values of k,
|alpha_1 lambda_1^k| >> |alpha_i lambda_i^k|,   i = 2, 3, ..., n,
provided that alpha_1 != 0. This means that for large values of k, the state vector x_k will approach the
direction of the vector v_1 corresponding to the dominant eigenvalue lambda_1. Furthermore, the rate at
which the state vector approaches v_1 is determined by the ratio of the second dominant eigenvalue
to the first.
In the case alpha_1 = 0, the second dominant eigenvalue lambda_2 and the corresponding eigenvector assume
the role of the first dominant eigenpair. Similar conclusions hold for the continuous-time system
x'(t) = Ax(t).
For details, see the book by Luenberger (1979).
496
The second dominant eigenpair is particularly important in the case alpha_1 = 0; in this case, the
long-term behavior of the system is determined by this pair.
An Example on the Population Study
Let's take the case of a population system to illustrate this.
It is well known (see Luenberger (1979), p. 170) that such a system can be modeled by
p_{k+1} = A p_k,   k = 0, 1, 2, ...,
where p_k is the population vector. If the dominant eigenvalue lambda_1 of the matrix A is less than one
in magnitude (that is, if |lambda_1| < 1), then it follows from
p_k = alpha_1 lambda_1^k v_1 + ... + alpha_n lambda_n^k v_n
that the population decreases to zero as k becomes large. Similarly, if |lambda_1| > 1, then there is
long-term growth in the population. In the latter case the original population approaches a final
distribution that is defined by the eigenvector of the dominant eigenvalue. Moreover, it is the
second dominant eigenvalue of A that determines how fast the original population distribution
approaches the final distribution. Finally, if the dominant eigenvalue is 1, then over the long term
there is neither growth nor decay in the population.
8.5.3 The Power Method, The Inverse Iteration and the Rayleigh Quotient Iteration
In this section we will briefly describe two well-known classical methods for finding the dominant
eigenvalues and the corresponding eigenvectors of a matrix. The methods are particularly suitable
for sparse matrices, because they rely on matrix-vector multiplications only (and therefore the
zero entries in a sparse matrix do not get filled in during the process).
The Power Method
The power method is frequently used to find the dominant eigenvalue and the corresponding
eigenvector of a matrix. It is so named because it is based on the implicit construction of the
powers of A.
Let the eigenvalues lambda_1, lambda_2, ..., lambda_n of A be such that
|lambda_1| > |lambda_2| >= |lambda_3| >= ... >= |lambda_n|;
that is, lambda_1 is the dominant eigenvalue of A. Let v_1 be the corresponding eigenvector. Let max(g)
denote the element of maximum modulus of the vector g. Let eps be the tolerance and N the
maximum number of iterations.
497
Algorithm 8.5.1 Power Method
Step 1. Choose x_0.
Step 2. For k = 1, 2, 3, ... do
   x^_k = A x_{k-1},
   x_k = x^_k / max(x^_k).
   Stop if |max(x^_k) - max(x^_{k-1})| < eps or if k > N.
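A hedged MATLAB sketch of Algorithm 8.5.1; the function name and the way max(g) is realized are choices made here, not the book's code.

function [lambda, x] = powermethod(A, x0, tol, N)
  maxentry = @(g) g(find(abs(g) == max(abs(g)), 1));  % element of maximum modulus
  x = x0;  lambdaOld = Inf;
  for k = 1:N
    xhat   = A * x;
    lambda = maxentry(xhat);     % approximates the dominant eigenvalue
    x      = xhat / lambda;      % normalized iterate
    if abs(lambda - lambdaOld) < tol, break; end
    lambdaOld = lambda;
  end
end

For example, [lambda, v] = powermethod([1 2 3; 2 3 4; 3 4 5], [1;1;1], 1e-6, 100) converges to lambda approximately 9.6235, matching the iteration shown below.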
Let the eigenvectors v_1 through v_n associated with lambda_1, ..., lambda_n be linearly independent. We can
then write x_0 = alpha_1 v_1 + alpha_2 v_2 + ... + alpha_n v_n, with alpha_1 != 0. So,
A^k x_0 = A^k(alpha_1 v_1 + alpha_2 v_2 + ... + alpha_n v_n)
        = alpha_1 lambda_1^k v_1 + alpha_2 lambda_2^k v_2 + ... + alpha_n lambda_n^k v_n
        = lambda_1^k [ alpha_1 v_1 + alpha_2 (lambda_2/lambda_1)^k v_2 + ... + alpha_n (lambda_n/lambda_1)^k v_n ].
Thus,
x_k = A^k x_0 / max(A^k x_0) -> c v_1   (a multiple of v_1),
and
{max(x^_k)} -> lambda_1.
k = 1:
x^_1 = A x_0 = (6, 9, 12)^T,   max(x^_1) = 12,
x_1 = x^_1 / max(x^_1) = (0.50, 0.75, 1)^T.
k = 2:
x^_2 = A x_1 = (5.00, 7.25, 9.50)^T,   max(x^_2) = 9.50,
x_2 = x^_2 / max(x^_2) = (0.5263, 0.7632, 1.0000)^T.
k = 3:
x^_3 = A x_2 = (5.0526, 7.3421, 9.6316)^T,   max(x^_3) = 9.6316,
x_3 = x^_3 / max(x^_3) = (0.5246, 0.7623, 1.000)^T.
Thus {max(x^_k)} is converging towards the largest eigenvalue 9.6235 and {x_k} is converging towards
the direction of the eigenvector associated with this eigenvalue. (Note that the normalized dominant
eigenvector (0.3851, 0.5595, 0.7339)^T is a scalar multiple of x_3.)
Convergence of the Power Method
The rate of convergence of the power method is determined by the ratio lambda_2/lambda_1, as is easily seen
from the following:
||x_k - alpha_1 v_1|| = ||alpha_2 (lambda_2/lambda_1)^k v_2 + ... + alpha_n (lambda_n/lambda_1)^k v_n||
                     <= |lambda_2/lambda_1|^k |alpha_2| ||v_2|| + ... + |lambda_n/lambda_1|^k |alpha_n| ||v_n||
                     <= |lambda_2/lambda_1|^k (|alpha_2| ||v_2|| + ... + |alpha_n| ||v_n||)
                     = c |lambda_2/lambda_1|^k,
where
c = |alpha_2| ||v_2|| + ... + |alpha_n| ||v_n||.
This shows that the rate at which x_k approaches alpha_1 v_1 depends upon how fast |lambda_2/lambda_1|^k goes to
zero. The error at each step decreases roughly by the ratio |lambda_2/lambda_1|; that is, if lambda_2 is close to
lambda_1, then the convergence will be very slow; if this ratio is small, the convergence will
be fast.
500
The Power Method with Shift
In some cases, convergence can be significantly improved by using a suitable shift. Thus, if sigma
is a suitable shift so that lambda_1 - sigma is the dominant eigenvalue of A - sigma*I, and if the power method is
applied to the shifted matrix A - sigma*I, then the rate of convergence will be determined by the ratio
|lambda_2 - sigma| / |lambda_1 - sigma|, rather than |lambda_2/lambda_1|. (Note that by shifting the matrix A by sigma, the eigenvalues get
shifted by sigma as well, but the eigenvectors remain unaltered.)
By choosing sigma appropriately, in some cases the ratio |lambda_2 - sigma| / |lambda_1 - sigma| can be made significantly smaller
than |lambda_2/lambda_1|, thus yielding faster convergence. An optimal choice (Wilkinson AEP, p. 572) of sigma,
assuming that the lambda_i are all real, is (lambda_1 + lambda_n)/2. This simple choice of sigma sometimes indeed yields very
fast convergence, but there are many common examples where the convergence can still be slow
with this choice of sigma.
Consider a 20 x 20 matrix A with the eigenvalues 20, 19, ..., 2, 1. The choice sigma = (1 + 20)/2 =
10.5 yields the ratio (lambda_2 - sigma)/(lambda_1 - sigma) = 8.5/9.5, still close to one. Therefore, the rate of convergence will still
be slow. Furthermore, the above choice of sigma is not useful in practice, because the eigenvalues are
not known a priori.
The Inverse Power Method/Inverse Iteration
The following iterative method, known as the inverse iteration, is an eective method for
computing an eigenvector when a reasonably good approximation to an eigenvalue is known.
Algorithm 8.5.2 Inverse Iteration
Let be an approximation to a real eigenvalue 1 such that j1 ; j ji ; j(i 6= 1); that is,
is much closer to 1 than to the other eigenvalues. Let be the tolerance and N be the maximum
number of iterations.
Step 1. Choose x : 0
Step 2. For k = 1; 2; 3; : : :; do
(A ; I )^xk = xk; ; 1
xk = x^k = max(^xk):
Stop if (kxk ; xk; k=kxk k) < or if k > N .
1
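A hedged MATLAB sketch of Algorithm 8.5.2; the single LU factorization reused at every step and the 2-norm scaling are implementation choices made here, not the book's code.

function [x, k] = inviteration(A, sigma, x0, tol, N)
  n = size(A, 1);
  [L, U, P] = lu(A - sigma * eye(n));   % factor A - sigma*I once
  x = x0 / norm(x0);
  for k = 1:N
    xhat = U \ (L \ (P * x));           % solve (A - sigma*I) xhat = x
    xnew = xhat / norm(xhat);           % scaling is immaterial: only the direction matters
    if norm(xnew - sign(xnew' * x) * x) < tol, x = xnew; return; end
    x = xnew;
  end
  x = xnew;
end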
Theorem 8.5.2 The sequence fxkg converges to the direction of the eigenvector
corresponding to 1.
501
Remark: Note that inverse iteration is simply the power method applied to (A - sigma*I)^-1, whose
eigenvectors are the same as those of A. Thus, as in the case of the power method, we can write
x^_k = c_1/(lambda_1 - sigma)^k v_1 + c_2/(lambda_2 - sigma)^k v_2 + ... + c_n/(lambda_n - sigma)^k v_n
     = 1/(lambda_1 - sigma)^k [ c_1 v_1 + c_2 ((lambda_1 - sigma)/(lambda_2 - sigma))^k v_2 + ... + c_n ((lambda_1 - sigma)/(lambda_n - sigma))^k v_n ].
Since sigma is closer to lambda_1 than to any other eigenvalue, the first term on the right-hand side is the
dominating one and, therefore, x_k converges to the direction of v_1. It is the direction of v_1 which
we seek.
Indeed, since lambda_1 is closer to sigma than any other eigenvalue, the coefficient of the first term in the expansion,
namely 1/(lambda_1 - sigma), is the dominant one (it is the largest in magnitude). Thus, already x^_1 is roughly a multiple of v_1,
which is what we desire.
Numerical Stability of the Inverse Iteration
At first sight inverse iteration seems to be a dangerous procedure, because if sigma is near lambda_1, the
matrix (A - sigma*I) is obviously ill-conditioned. Consequently, this ill-conditioning might be expected to affect the
computed approximations of the eigenvector. Fortunately, in practice the ill-conditioning of the
matrix (A - sigma*I) is exactly what we want: the error at each iteration grows towards the direction
of the eigenvector, and it is the direction of the eigenvector that we are interested in.
Wilkinson (AEP pp. 620{621) has remarked that in practice x^k is remarkably close
to the solution of
(A ; I + F )xk = xk;1;
where F is small. For details see Wilkinson (AEP pp. 620{621). a
a \The iterated vectors do indeed converge eventually to the eigenvectors of A + F ."
502
Example 8.5.2 01 2 31
B C
A=B
@ 2 3 4 CA
3 4 5
x0 = (1; 1; 1)T ; = 9:
k = 1:
x^ = (1; 1:5; 2)T
1
k = 2:
x^ = (:619; :8975; 1:1761)T
2
k = 3:
x^ = (:6176; :8974; 1:1772)T
3
k = 4:
x^ = (:6176; :8974; 1:1772)T
4
k = 5:
x^ = (:6177; :8974; 1:1772)T
5
Remark: In the above example we have used norm k k as a scaling for the vector xi to empha-
2
size that the scaling is immaterial since we are working towards the direction of the eigenvector.
Choosing the Initial Vector x 0
To choose the initial vector x0 we can run a few iterations of the power method and then switch
to the inverse iteration with the last vector generated by the power method as the initial vector x0
in the inverse iteration. Wilkinson (AEP, p. 627) has stated that if x_0 is chosen such that
Le = Px_0,
where P is the permutation matrix satisfying
P(A - sigma*I) = LU,
and e is the vector of unit elements, then only "two iterations are usually adequate", provided
that sigma is a good approximation to the eigenvalue.
Note: If x_0 is chosen as above, then the computation of x^_1 involves only the solution of the
triangular system
U x^_1 = e.
Example 8.5.3 01 2 31
B C
A=B
@ 2 3 4 CA :
3 4 5
The eigenvalues of A are:
0; ;0:6235; and 9:6235:
Choose = 9:1 0 ;8:1 2:0 3:0 1
B C
A ; I = B
@ 2:0 ;6:1 4:0 CA
3:0 4:0 ;4:1
0 1 0 0
1 0 ;8:1 2 3
1
B C B C
L=B
@ ;:2469 1 0 CA ; U = B@ 0 ;5:6062 4:7407 CA
;:3704 ;:8456 1 0 0 1:0200
01 0 01
B C
P =B @ 0 1 0 CA :
0 0 1
0 1:000 1
B CC
x0 = P ;1Le = B
@ :7531 A:
;:2160
k=1: 0 :4003 1
B :6507 CC
x^ = (A ; I ); x = B
1
1
@ 0 A
:9804
0 :4083 1
B :6637 CC :
x = x^ = max(^x ) = B
1 1 1 @ A
1:000
504
k=2: 0 :9367 1
B 1:3540 CC
x^ = (1 ; I ); x = B
2 @ A 1
1
1:7624
0 :5315 1
B :7683 CC ;
x =B
2 @ A
1:000
which is about 1.3 times the normalized eigenvector.0
:3851 1
B :5595 CC.
The normalized eigenvector correct to four digits is B
@ A
:7339
The Rayleigh Quotient
Proof. Since A is symmetric, there exists a set of orthogonal eigenvectors v_1, v_2, ..., v_n. Therefore
we can write
x = c_1 v_1 + ... + c_n v_n.
Assume that the v_i, i = 1, ..., n, are normalized, that is, v_i^T v_i = 1. Then, since Av_i = lambda_i v_i, i = 1, ..., n,
and noting that v_i^T v_j = 0, i != j, we have
sigma = x^T A x / x^T x
      = (c_1 v_1 + ... + c_n v_n)^T A (c_1 v_1 + ... + c_n v_n) / (c_1 v_1 + ... + c_n v_n)^T (c_1 v_1 + ... + c_n v_n)
      = (lambda_1 c_1^2 + lambda_2 c_2^2 + ... + lambda_n c_n^2) / (c_1^2 + c_2^2 + ... + c_n^2)
      = lambda_1 [ (1 + (lambda_2/lambda_1)(c_2/c_1)^2 + ... + (lambda_n/lambda_1)(c_n/c_1)^2) / (1 + (c_2/c_1)^2 + ... + (c_n/c_1)^2) ].
Because of our assumption that x is a good approximation to v_1, c_1 is larger than the other c_i, i =
2, ..., n. Thus, the expression within [ ] is close to 1, which means that sigma is close to lambda_1.
where n and 1 are the smallest and the largest eigenvalue of A respectively.
Rayleigh-Quotient Iteration
The above idea of approximating an eigenvalue can be combined with inverse iteration to com-
pute successive approximations of an eigenvalue and the corresponding eigenvector in an iterative
fashion, known as Rayleigh-quotient iteration, described as follows. Let N be the maximum
number of iterations to be performed.
Algorithm 8.5.3 Rayleigh-Quotient Iteration
For k = 0, 1, 2, ..., do
1. Compute
   sigma_k = x_k^T A x_k / x_k^T x_k.
2. Solve for x^_{k+1}:
   (A - sigma_k I) x^_{k+1} = x_k.
3. Compute
   x_{k+1} = x^_{k+1} / max(x^_{k+1}).
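A hedged MATLAB sketch of Algorithm 8.5.3 for a symmetric matrix; the fixed iteration count and 2-norm scaling are choices made here. Near convergence the shifted solve becomes nearly singular, which is expected and harmless for the direction.

function [sigma, x] = rqiteration(A, x0, N)
  n = size(A, 1);
  x = x0;
  for k = 1:N
    sigma = (x' * A * x) / (x' * x);    % Rayleigh quotient
    xhat  = (A - sigma * eye(n)) \ x;   % inverse-iteration step with shift sigma
    x     = xhat / norm(xhat);          % normalize (any scaling works)
  end
end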
506
Convergence: It can be shown (Wilkinson, AEP, p. 630) that the rate of convergence of the
method is cubic.
Choice of x_0: As for choosing the initial vector x_0, perhaps the best thing to do is to use the
power method itself a few times and then use the last approximation as x_0.
Remark: Rayleigh Quotient iteration can also be dened in the nonsymmetric case, where one
nds both left and right eigenvectors at each step. We omit the discussion of the nonsymmetric
case here and refer the reader to Wilkinson AEP, p. 636. See also Parlett (1974).
Example 8.5.5 01 2 31
B 2 3 4 CC
A=B
@ A
3 4 5
Let us take
0 :5246 1
B C
x =B
0 @ :7622 CA ;
1:000
which is obtained after 3 iterations of the power method. Then
k=0:
= xT Ax =(xT x ) = 9:6235
0 0 0 0 0
0 :5247 1
B :7623 CC
x =B
@1 A
1:000
k=1:
= xT Ax =(xT x ) = 9:6235
0 1:000 1
1 1 1 1 1
B C
x =B
2 @ 1:4529 CA ;
1:9059
0 :3851 1
B C
The normalized eigenvector associated with 9.6255 is B @ :5595 CA.
:7339
Note that .3851 times x is this eigenvector to three digits. Thus two iterations were sucient.
2
507
8.5.4 Computing the Subdominant Eigenvalues and Eigenvectors: Deflation
Once the dominant eigenvalue lambda_1 and the corresponding eigenvector v_1 have been computed, the
next dominant eigenvalue lambda_2 can be computed by using deflation. The basic idea behind deflation
is to replace the original matrix by another matrix of the same or smaller dimension, using a computed
eigenpair, such that the deflated matrix has the same eigenvalues as the original one except the one
used to deflate.
Hotelling Deflation
The Hotelling deflation is a process which replaces the original matrix A = A_1 by a matrix A_2
of the same order such that all the eigenvalues of A_2 are the same as those of A_1 except the one
used to construct A_2 from A_1.
Case 1: A is Symmetric
First suppose that A = A_1 is symmetric. Let (lambda_1, x_1) be an eigenpair of A_1 with x_1^T x_1 = 1. Define
A_2 = A_1 - lambda_1 x_1 x_1^T.
For i = 1:
A_2 x_1 = A_1 x_1 - lambda_1 x_1 x_1^T x_1 = lambda_1 x_1 - lambda_1 x_1 = 0 = 0 * x_1.
For i != 1:
A_2 x_i = A_1 x_i - 0 = lambda_i x_i.
Thus the eigenvalues of A_2 are 0, lambda_2, lambda_3, ..., lambda_n, and x_1 through x_n are the corresponding eigenvec-
tors.
Case 2: A is Nonsymmetric
The idea above can be easily generalized to a nonsymmetric matrix; however, we need both
left and right eigenvectors here. Let (x_1, y_1) be the pair of right and left eigenvectors of A = A_1
corresponding to lambda_1, normalized so that
y_1^T x_1 = 1,
and define
A_2 = A_1 - lambda_1 x_1 y_1^T.
Then, using the bi-orthogonality of the eigenvectors x_i and y_i, we have
For i = 1:
A_2 x_1 = A_1 x_1 - lambda_1 x_1 y_1^T x_1 = lambda_1 x_1 - lambda_1 x_1 = 0.
For i != 1:
A_2 x_i = A_1 x_i - lambda_1 x_1 y_1^T x_i = lambda_i x_i - 0 = lambda_i x_i.
Again, we see that lambda_1 has been replaced by 0, and lambda_2 through lambda_n are the eigenvalues of A_2, corresponding to the
eigenvectors x_1 through x_n.
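A short MATLAB sketch (not from the book) of Hotelling deflation in the symmetric case; eig is used here only to supply the dominant eigenpair that the power method would normally provide.

A  = [1 2 3; 2 3 4; 3 4 5];
[V, D] = eig(A);
[~, j]  = max(abs(diag(D)));
lambda1 = D(j, j);  x1 = V(:, j) / norm(V(:, j));
A2 = A - lambda1 * (x1 * x1');        % eigenvalues of A2: 0 and the remaining eigenvalues of A
subdominant = max(abs(eig(A2)))       % approximates |lambda_2| of A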
Householder Deflation
We will now construct a deflation using a similarity transformation on A_1 with Householder
matrices. Of course, the technique will work with any similarity transformation; however, we will
use Householder matrices for reasons of numerical stability.
The method is based upon the following result.
Theorem 8.5.4 Let (lambda_1, v_1) be an eigenpair of A and let H be a Householder matrix such that Hv_1
is a multiple of e_1. Then
HAH = [ lambda_1  b^T
        0        A_2 ],
where A_2 is (n - 1) x (n - 1), and the eigenvalues of A_2 are the same as those of A except for lambda_1; in
particular, if |lambda_1| > |lambda_2| >= |lambda_3| >= ... >= |lambda_n|, then the dominant eigenvalue of A_2 is lambda_2, which is the second
dominant (subdominant) eigenvalue of A.
509
Proof. From Av_1 = lambda_1 v_1 we have
HAH(Hv_1) = lambda_1 Hv_1   (since H^2 = I).
That is, HAH(k e_1) = lambda_1 k e_1 (since Hv_1 = k e_1), or HAH e_1 = lambda_1 e_1. (This means that the first
column of HAH is lambda_1 times the first column of the identity matrix.) Thus HAH must have the
form
HAH = [ lambda_1  b^T
        0        A_2 ].
Since det(HAH - lambda*I) = (lambda_1 - lambda) det(A_2 - lambda*I), it follows that the eigenvalues of HAH are lambda_1 plus the (n - 1)
eigenvalues of A_2. Moreover, if
|lambda_1| > |lambda_2| > |lambda_3| >= |lambda_4| >= ... >= |lambda_n|,
then lambda_2 is the dominant eigenvalue of A_2.
3. Compute HAH.
4. Discard the first row and the first column of HAH and find the dominant eigenvalue of the
(n - 1) x (n - 1) matrix thus obtained.
Example 8.5.6

A = [ .2190  .6793  .5194
      .0470  .9347  .8310
      .6789  .3835  .0346 ].

The eigenvalues of A are .0018, -0.3083, and 1.4947.

1. λ1 = 1.4947,  v1 = (-.5552, -.7039, -.4430)^T.

2. u = v1 - ||v1||2 e1 = (-1.5552, -0.7039, -0.4430)^T,

   H = I - 2uu^T/(u^T u) = [ -.5552  -.7039  -.4430
                             -.7039   .6814  -.2005
                             -.4430  -.2005  -.8738 ],

   Hv1 = (1, 0, 0)^T.

3. HAH = [ 1.4947  -.3223  -.3331
           0        .1987   .2672
           0        .3736  -.5052 ] = [ λ1  b^T
                                        0   A2 ].

4. The dominant eigenvalue of A2 is -0.3083, which is the subdominant eigenvalue of A.
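The steps of this example are easily reproduced. The sketch below is illustrative (the helper householder is our own); it forms H from the computed v1, carries out the similarity transformation, and deflates:

import numpy as np

def householder(v):
    # Householder matrix H with H v = ||v||_2 e1
    u = v.copy()
    u[0] -= np.linalg.norm(v)
    return np.eye(len(v)) - 2.0 * np.outer(u, u) / (u @ u)

A = np.array([[.2190, .6793, .5194],
              [.0470, .9347, .8310],
              [.6789, .3835, .0346]])
w, V = np.linalg.eig(A)
i = np.argmax(abs(w))
lam1, v1 = w[i].real, V[:, i].real         # dominant eigenpair (power method in practice)
H = householder(v1)
B = H @ A @ H                              # = [[lam1, b^T], [0, A2]] up to roundoff
A2 = B[1:, 1:]
print(np.linalg.eigvals(A2))               # ~0.0018 and ~-0.3083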
Algorithm 8.5.5 Computing the Subdominant Eigenvector

1. Compute the eigenvector v2^(2) of A2 corresponding to λ2.

2. Compute α given by

   α = b^T v2^(2) / (λ2 - λ1).

For the matrices of Example 8.5.6:

A2 = [ .1987   .2672
       .3736  -.5052 ],   b^T = (-.3223, -.3331).

1. v2^(2) = (-.4662, .8847)^T.

2. α = b^T v2^(2) / (λ2 - λ1) = .0801.

3. Form v̂2 = (α, (v2^(2))^T)^T = (.0801, -.4662, .8847)^T.

4. v2 = H v̂2 = (-.1082, -.5514, 0.8314)^T.
Computing the Other Largest Eigenvalues and Eigenvectors

Once the pair (λ2, v2) is computed, the matrix A2 can be deflated using this pair to compute the pair (λ3, v3). From the pair (λ3, v3) we then compute the pair (λ4, v4), and so on. Thus, by repeated application of the process, we can compute successively all the n eigenvalues and eigenvectors of A.
Then the eigenvalues of A^{-1} (which are the reciprocals of the eigenvalues of A) are arranged as

1/|λn| > 1/|λ_{n-1}| ≥ ... ≥ 1/|λ1| > 0.

That is, 1/λn is the dominant eigenvalue of A^{-1}. This suggests that the reciprocal of the smallest eigenvalue can be computed by applying the power method to A^{-1}.

Algorithm 8.5.6 Computing the Smallest Eigenvalue in Magnitude

1. Apply the power method to A^{-1}.
2. Take the reciprocal of the eigenvalue obtained in step 1.
Example 8.5.8

A = [ 1  4  5
      2  3  3
      1  1  1 ].

The power method (without shift) applied to A^{-1} with the starting vector x0 = (1, -1, 1)^T gave σ = 9.5145. Thus the smallest eigenvalue of A is

1/σ = .1051.

(Note that the eigenvalues of A are 6.3850, -1.4901 and .1051.)
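A sketch of Algorithm 8.5.6 in Python (an illustration of ours). In practice A^{-1} is never formed explicitly; each power step solves a linear system with a factorization computed once:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[1., 4, 5], [2, 3, 3], [1, 1, 1]])
x = np.array([1., -1, 1])                  # starting vector of Example 8.5.8
lu, piv = lu_factor(A)                     # factor A once
for _ in range(100):
    y = lu_solve((lu, piv), x)             # y = A^{-1} x without forming A^{-1}
    sigma = np.max(np.abs(y))              # estimate of the dominant eigenvalue of A^{-1}
    x = y / sigma
print(1.0 / sigma)                         # ~0.1051, the smallest eigenvalue of A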
8.6 Similarity Transformations and Eigenvalue Computations
Recall (Theorem 8.2.2) that two similar matrices A and B have the same eigenvalues, that is, if
X is a nonsingular matrix such that
X^{-1} A X = B,
then the eigenvalues of A are the same as those of B .
One obvious approach to compute the eigenvalues of A, therefore, will be to reduce A to a suitable "simpler" form B by similarity so that the eigenvalues of B can be more easily computed.
However, extreme caution must be taken here. It can be shown (Golub and Van Loan MC 1984, p. 198) that the error matrix E introduced by carrying out the similarity transformation in finite precision satisfies

||E||2 ≈ μ ||X||2 ||X^{-1}||2 ||A||2,

where μ is the machine precision.

Remark: Since the error matrix E clearly depends upon Cond2(X) = ||X||2 ||X^{-1}||2, the above bound tells us that large errors must be expected whenever the transforming matrix X is ill-conditioned.
Difficulties with Eigenvalue Computations Using the Characteristic Polynomial

First, the process of explicitly computing the coefficients of the characteristic polynomial may be numerically unstable.

Second, the zeros of the characteristic polynomial may be very sensitive to perturbations in the coefficients of the characteristic polynomial. Thus, if the coefficients of the characteristic polynomial are not computed accurately, there will be errors in the computed eigenvalues.

In Chapter 3 we illustrated the sensitivity of the root-finding problem by means of the Wilkinson polynomial and other examples. We will now discuss the difficulty of computing the characteristic polynomial in some detail here.
Computing the characteristic polynomial explicitly amounts to transforming the matrix to a block-companion (or Frobenius) form. Every matrix A can be reduced by similarity to

C = [ C1       0
          ...
      0        Ck ],

where each Ci is a companion matrix. The matrix C is said to be in Frobenius form. If k = 1, the matrix A is nonderogatory.

Assume that A is nonderogatory and let's see how A can be reduced to a companion matrix by similarity. This can be achieved in two stages:

Stage 1: Reduce A to an unreduced Hessenberg matrix H.
Stage 2: Reduce the unreduced Hessenberg matrix H to a companion matrix C.

We have already seen in Chapter 5 how a nonderogatory matrix A can be transformed to an unreduced Hessenberg matrix by orthogonal similarity in a numerically stable way using the Householder or Givens method.

Consider now stage 2, that is, the transformation of the unreduced Hessenberg matrix H to a companion matrix C.
Let X be the nonsingular transforming matrix, that is, HX = XC, where

C = [ 0  0  ...  0  c1
      1  0  ...  0  c2
      0  1  ...  0  c3
      .  .  ...  .  .
      0  0  ...  1  cn ].

If x1, x2, ..., xn are the n successive columns of X, then from

HX = XC

we have

Hxi = x_{i+1},   i = 1, ..., n - 1,

and

Hxn = c1 x1 + c2 x2 + ... + cn xn.

Choose x1 = (1, 0, ..., 0)^T; then the matrix

X = (x1, Hx1, ..., H^{n-1} x1) = [ 1
                                   0  h21
                                   0  0    h21 h32
                                   .  .    .        ...
                                   0  0    0    0   h21 h32 ... h_{n,n-1} ]

is lower triangular and is nonsingular because h_{i+1,i} ≠ 0, i = 1, ..., n - 1.

So, if one or more subdiagonal entries h_{i+1,i} of H are significantly small, then the inverse of X will have large entries, and consequently X will be ill-conditioned. Thus, in such a case, the transformation of an unreduced Hessenberg matrix H to a companion matrix will be unstable.
Example.

H = [ 1       2  3
      0.0001  1  1
      0       2  3 ],

x1 = (1, 0, 0)^T,   x2 = Hx1 = (1, 0.0001, 0)^T,   x3 = Hx2 = (1.0002, 0.0002, 0.0002)^T,

X = [ 1  1       1.0002
      0  0.0001  0.0002
      0  0       0.0002 ],

X^{-1} H X = C = [ 0  0  1
                   1  0  -4.9998
                   0  1  5 ],

Cond2(X) = 3.1326 × 10^4.

(Note that the existence of a small subdiagonal entry of H, namely h21, made the transforming matrix X ill-conditioned.)
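The example is easy to verify numerically; the following illustrative lines reproduce the condition number and the companion matrix:

import numpy as np

H = np.array([[1., 2, 3], [0.0001, 1, 1], [0, 2, 3]])
x1 = np.array([1., 0, 0])
X = np.column_stack([x1, H @ x1, H @ (H @ x1)])   # X = (x1, Hx1, H^2 x1)
print(np.linalg.cond(X))                          # ~3.13e4; blows up as h21 -> 0
C = np.linalg.solve(X, H @ X)                     # X^{-1} H X, a companion matrix
print(np.round(C, 4))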
There are also other equivalent methods for reducing H to C. For example, Wilkinson (AEP, p. 406) describes a pivoting method for transforming an unreduced Hessenberg matrix H to a companion matrix C using Gaussian elimination, which also shows that small subdiagonal entries can make the method highly unstable. The subdiagonal entries are used as pivots, and we have seen before that small pivots can be dangerous.

Note that there are other approaches for finding the characteristic polynomial of a matrix. For example, LeVerrier's method (Wilkinson, AEP pp. 434-435) computes the coefficients of the characteristic polynomial using the traces of the various powers of A. Here Wilkinson has shown that in LeVerrier's method, severe cancellation can take place while computing the coefficients from the traces using Newton's sums. The Newton's sums determining the coefficients ci of the characteristic polynomial

det(λI - A) = λ^n + c_{n-1} λ^{n-1} + ... + c1 λ + c0

are given by

c_{n-1} = -trace(A),
c_{n-k} = -(1/k) [ trace(A^k) + c_{n-1} trace(A^{k-1}) + ... + c_{n-k+1} trace(A) ],   k = 2, ..., n.

For details, see Wilkinson (AEP, p. 434).
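A compact sketch of LeVerrier's method (an illustration of ours). The subtractions in the Newton's-sums recurrence are where the cancellation Wilkinson warns about can occur:

import numpy as np

def charpoly_coeffs(A):
    # Coefficients c_{n-1}, ..., c_0 of det(lambda*I - A)
    # = lambda^n + c_{n-1} lambda^(n-1) + ... + c_0, via Newton's sums.
    n = A.shape[0]
    s = [np.trace(np.linalg.matrix_power(A, k)) for k in range(1, n + 1)]
    c = []
    for k in range(1, n + 1):
        c.append(-(s[k - 1] + sum(c[j] * s[k - 2 - j] for j in range(k - 1))) / k)
    return c                              # [c_{n-1}, c_{n-2}, ..., c_0]

A = np.array([[1., 2], [3, 4]])
print(charpoly_coeffs(A))                 # [-5.0, -2.0]: lambda^2 - 5 lambda - 2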
Having emphasized the danger of using the Frobenius form in the eigenvalue-computations of
a matrix, let's point out some remarks of Wilkinson about Frobenius forms of matrices arising in
certain applications such as mechanical and electrical systems.
"Although we have made it clear that we regard the use of the Frobenius form as dangerous, in that it may well be catastrophically worse-conditioned than the original matrix, we have found the program based on its use surprisingly satisfactory in general for matrices arising from damped mechanical or electrical systems. It is common for the corresponding characteristic polynomial to be well-conditioned. When this is true, methods based on the use of the explicit characteristic polynomial are both fast and accurate."
Remarks: The above remarks of Wilkinson clearly support a long tradition by engineers of
computing the eigenvalues via the characteristic polynomial.
Every matrix A can also be reduced by similarity to the form

X^{-1} A X = diag(J1, J2, ..., Jk),

where

Ji = [ λi  1
           λi  ...
               ...  1
                    λi ]

is a Jordan block. If Ji is of order ni, then

n1 + n2 + ... + nk = n.

The matrix on the right-hand side is the Jordan Canonical Form (JCF) of A; the λi's are the eigenvalues. Thus, the eigenvalues of A are displayed as soon as the JCF is computed.

Unfortunately, this computation can also be highly unstable.
Compute p_{n-1}(λ) through p1(λ) using the recurrence

p_i(λ) = - [ Σ_{j=i+1}^{n} h_{i+1,j} p_j(λ) - λ p_{i+1}(λ) ] / h_{i+1,i},   i = n - 1, ..., 1.
Theorem 8.7.1 (The Bauer-Fike Theorem) Let A be diagonalizable; that is, there is a nonsingular X such that X^{-1}AX is a diagonal matrix D. Then for an eigenvalue λ of A + E we have

min_i |λi - λ| ≤ ||X|| ||X^{-1}|| ||E||,

where ||·|| is a subordinate matrix norm and λ1, λ2, ..., λn are the eigenvalues of A.

Proof. Let (λ, x) be an eigenpair of A + E, so that (λI - A)x = Ex. Set X^{-1}x = y. Then from the last equation we have, multiplying the equation by X^{-1} on the left,

(λI - D) y = X^{-1} E x,

or

y = (λI - D)^{-1} X^{-1} E X y   (note that x = Xy).

So,

1 ≤ ||(λI - D)^{-1}|| ||X^{-1}|| ||X|| ||E|| = (1 / min_i |λ - λi|) ||X^{-1}|| ||X|| ||E||,

or

min_i |λ - λi| ≤ ||X^{-1}|| ||X|| ||E||.
Implications of the Theorem

The above theorem tells us that if

Cond(X) = ||X|| ||X^{-1}||

is large, then a small perturbation E in A can cause a large change in some eigenvalue; that is, Cond(X) serves as a condition number for the eigenvalue problem.
Remark: In case A is not diagonalizable, a similar result also holds. (For details, see Golub
and Van Loan, MC 1984 p. 209.)
Example 8.7.1

A = [ 1  -0.2113   0.0957
      0   2       -0.0756
      0   0        3 ],

X = 10^5 [ 0.0000  -0.2113   0.1705
           0        1.0000  -1.0636
           0        0        1.2147 ],

X^{-1} A X = diag(1, 2, 3),

Cond2(X) ||E||2 = 18.2936.
with respect to changes in the coefficients of the matrix. However, as we have seen from the examples in Chapter 3, some eigenvalues of A may be more sensitive than others. In fact, some may be very well-conditioned while others are ill-conditioned. Similarly, some eigenvectors may be well-conditioned while others are not.

It is therefore more appropriate to talk about the conditioning of the individual eigenvalues, rather than the conditioning of the eigenvalue problem. Recall that in Chapter 3 an analysis of the ill-conditioning of the individual eigenvalues of the slightly perturbed Wilkinson matrix was given in terms of the numbers si. In general, this can be done for any diagonalizable matrix.

Let X^{-1}AX = diag(λ1, ..., λn). Then the normalized right and left eigenvectors corresponding to an eigenvalue λi are given by

xi = X ei / ||X ei||2,   yi = (X^{-1})^T ei / ||(X^{-1})^T ei||2.
Definition 8.7.1 The number 1/si, where si is defined by

si = |yi^T xi|,

is called the condition number of the eigenvalue λi. If this number is large, then λi is an ill-conditioned eigenvalue. This typically happens when A is close to a matrix having some nonlinear elementary divisors (see Wilkinson, AEP p. 183).
Notes:
1. There are n condition numbers associated with the n eigenvalues of A.
2. If xi and yi are real, then si is the cosine of the angle between xi and yi .
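The numbers si are easy to compute once both sets of eigenvectors are available. A minimal illustrative sketch (np.linalg.eig returns unit-norm right eigenvectors, and the rows of X^{-1} supply left eigenvectors):

import numpy as np

A = np.array([[1., 2, 3], [0, 0.999, 1], [0, 0, 2]])
w, X = np.linalg.eig(A)
Xinv = np.linalg.inv(X)                    # row i is an (unnormalized) left eigenvector
for i in range(len(w)):
    y = Xinv[i] / np.linalg.norm(Xinv[i])  # normalized left eigenvector
    s = abs(y @ X[:, i])                   # s_i = |y_i^T x_i|
    print(w[i].real, 1.0 / s)              # a large 1/s_i flags an ill-conditioned eigenvalue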
Example 8.7.2

A = [ 1  2      3
      0  0.999  1
      0  0      2 ],

x1 = (1, 0, 0)^T,
x2 = (1, -0.0005, 0)^T,
x3 = (.9623, .1923, .1925)^T,
y1 = (0.0004, .7066, -0.7076)^T,
y2 = (0, -0.7075, .7068)^T,
y3 = (0, 0, 1)^T,

s1 = |y1^T x1| = 3.5329 × 10^{-4},
s2 = |y2^T x2| = 3.5373 × 10^{-4},
s3 = |y3^T x3| = .1925.

Thus, the above computations clearly show that λ1 = 1 and λ2 = .999 are ill-conditioned, while λ3 = 2 is well-conditioned.

Indeed, when a(3,1) was perturbed to 0.000001 and the eigenvalues of the perturbed matrix were computed, the first two eigenvalues of the perturbed matrix (those corresponding to the eigenvalues 1 and .999 of the original matrix) became complex. The computed eigenvalues of the perturbed matrix were (to three digits):

0.999 + 0.001i,  0.999 - 0.001i,  and 2.
For yet another nontrivial example, see exercise #23.
A Relationship Between si and Cond(X)

It is easy to see that the condition numbers si and Cond2(X) with respect to the 2-norm are related. We have

si = |yi^T xi| = |ei^T X^{-1} X ei| / (||X ei||2 ||(X^{-1})^T ei||2) = 1 / (||X ei||2 ||(X^{-1})^T ei||2),

and

||X ei||2 ≤ ||X||2,
||(X^{-1})^T ei||2 ≤ ||(X^{-1})^T||2 ||ei||2 = ||(X^{-1})^T||2 = ||X^{-1}||2.

So,

1/si ≤ ||X||2 ||X^{-1}||2 = Cond2(X).
Example 8.7.3

A = [ 1  2      3
      0  0.999  1
      0  0      2 ],

1/s1 = 2.8305 × 10^3,
1/s2 = 2.8270 × 10^3,
1/s3 = 5.1940.

Cond2(X) is large here, since the eigenvectors x1 and x2, corresponding to the two close eigenvalues 1 and .999, are nearly dependent.
The Eigenvalue Sensitivity of a Normal Matrix

A matrix A is called normal if

AA* = A*A,

where A* = (Ā)^T. A Hermitian matrix is normal. Normal matrices are diagonalizable. A remarkable property of a normal matrix A is that the transforming matrix X that transforms A to a diagonal matrix can be chosen to be unitary, so that

Cond2(X) = 1.

Thus an immediate consequence of the Bauer-Fike theorem is:

Corollary to the Bauer-Fike Theorem: Let A be a normal matrix, and let λ1, ..., λn be its eigenvalues. Then for every eigenvalue λ of A + E,

min_i |λi - λ| ≤ ||E||2.
Remark: The normal matrices most commonly found in practical applications are symmetric (or
Hermitian, if complex) matrices. Thus, by the Corollary above, the eigenvalues of a symmet-
ric (or Hermitian) matrix are well-conditioned. We will discuss the symmetric eigenvalue
problem in more detail in Section 8.11.
Theorem 8.8.1 Let ΔA be a very small perturbation of A and let the eigenvalue λk of A be perturbed by δλk; that is, λk + δλk is an eigenvalue of A + ΔA. Let xk + δxk be the eigenvector corresponding to λk + δλk. Then, assuming that the eigenvalues of A are all distinct, we have

xk + δxk = xk + Σ_{j≠k} [ εjk / ((λk - λj) sj) ] xj + O(||ΔA||²),

where

εjk = yj^T (ΔA) xk.

In the example, the eigenvector (0, 0, 1)^T has not changed, while the other two eigenvectors have changed, because of the proximity of the eigenvalues 1 and .99.
8.9 The Real Schur Form and QR Iterations
In the preceding discussions we have seen that computing the eigenvalues of A via reduction of A to the Frobenius or to the Jordan Canonical Form is not numerically effective. If the transforming matrix is ill-conditioned, then there may be large errors in the computed canonical form, and this in turn will introduce large errors in the eigenvalues.
A question therefore arises as to whether we can obtain a similarity reduction of
A to a suitable canonical form using a well-conditioned transforming matrix.
A perfectly well-conditioned matrix, for example, is an orthogonal matrix (or a unitary matrix, if complex); the condition number (with respect to the 2-norm and the F-norm) of such a matrix is 1.
Indeed, if a matrix A is transformed to a matrix B using unitary similarity trans-
formation, then a perturbation in A will result in a perturbation in B of the same
magnitude. That is, if
B = U* A U

and

U* (A + ΔA) U = B + ΔB,

then

||ΔB||2 = ||ΔA||2.
Example 8.9.1

A = [ 1  2  3
      3  4  5
      6  7  8 ],   U = [ -.5774  -.5774  -.5774
                         -.5774   .7887  -.2113
                         -.5774  -.2113   .7887 ],

B = U* A U = [ 13       -.6340  -2.3660
               -.9019    0       0
               -6.0981   0       0 ],

ΔA = 10^{-5} I_{3×3},

A1 = A + ΔA = [ 1.00001  2        3
                3        4.00001  5
                6        7        8.00001 ],

B1 = U* (A + ΔA) U = [ 13.00001  -.633974  -2.3660
                       -.9019     .00001    0
                       -6.0981    0         .00001 ],

ΔB = B1 - B = 10^{-5} I_{3×3}.
A perfect canonical form displaying the eigenvalues is a triangular form (the diagonal entries
are the eigenvalues). In this context we now recall a classical result due to Schur (Theorem 8.2.3).
We restate this important result below and give a proof.
Theorem (Schur; cf. Theorem 8.2.3) For every n × n matrix A there exists a unitary matrix U such that U* A U = T is triangular, with the eigenvalues of A on the diagonal of T.

Proof. The proof is by induction on the order k of the matrix. The result trivially holds for k = 1. Assume it holds for matrices of order k - 1. Let (λ1, v1) be an eigenpair of A with ||v1||2 = 1, and let U1 be a unitary matrix having v1 as its first column. Then

U1* A U1 = [ λ1  â^T
             0   Â ],

where Â is (k - 1) × (k - 1). By our hypothesis there exists a unitary matrix V1 of order (k - 1) such that

T̂ = V1* Â V1

is triangular. Then, defining

U2 = [ 1  0
       0  V1 ],

we see that U2 is unitary (because V1 is so), and with U = U1 U2,

U2* (U1* A U1) U2 = U* A U.

So,

U* A U = [ λ1  *
           0   V1* Â V1 = T̂ ].

Since T̂ is triangular, so is U* A U. Since the eigenvalues of a triangular matrix appear on the diagonal, we are done.
Since a real matrix can have complex eigenvalues (occurring in complex conjugate pairs), even for a real matrix A, the matrices U and T in the Schur Theorem above can be complex. However, we can choose U to be real orthogonal if T is replaced by a quasi-triangular matrix R, known as the Real Schur Form of A (RSF).

Theorem 8.9.2 Let A be an n × n real matrix. Then there exists an orthogonal matrix Q such that

Q^T A Q = R = [ R11  R12  ...  R1k
                0    R22  ...  R2k
                .         ...  .
                0    0    ...  Rkk ],

where each Rii is either a scalar or a 2 × 2 matrix. The scalar diagonal entries correspond to real eigenvalues, and the 2 × 2 matrices on the diagonal correspond to complex conjugate eigenvalues.
Proof. The proof is similar to that of Theorem 8.9.1.

Definition 8.9.1 The matrix R in Theorem 8.9.2 is known as the Real Schur Form (RSF) of A.

Notes:

1. The 2 × 2 matrices on the diagonal are usually referred to as "bumps".

2. The columns of Q are called Schur vectors. For each k (1 ≤ k ≤ n), the first k columns of Q form an orthonormal basis for the invariant subspace corresponding to the first k eigenvalues.

Remark: Since the proofs of both theorems are based on knowledge of the eigenvalues and eigenvectors of the matrix A, they cannot be considered constructive. They do not help us in computing the eigenvalues and eigenvectors.
We present below a method known as the QR iteration method, for computing the Real-Schur
form of A. A properly implemented QR-method is widely used nowadays for computing
the eigenvalues of an arbitrary matrix. As the name suggests, the method is based on the
QR factorization and is iterative in nature. The QR iteration method was proposed in algorithmic
form by J. G. Francis (1961), though its roots can be traced to a work of Rutishauser (1958). The
method was also independently discovered by the Russian mathematician Kublanovskaya (1961).
Note: Since the eigenvalues of a matrix A are the n zeros of the characteristic polynomial, and it is well known (proved by Galois more than a century ago) that the roots of a polynomial equation of degree higher than four cannot be found in a finite number of steps, any numerical eigenvalue method for an arbitrary matrix has to be iterative in nature.
Compute now a sequence of matrices {Ak} defined by

A0 = A = Q0 R0,
A1 = R0 Q0 = Q1 R1,
A2 = R1 Q1 = Q2 R2,
...

The matrices in the sequence {Ak} have a very interesting property. Each matrix in the sequence is orthogonally similar to the previous one and is therefore orthogonally similar to the original matrix. It is easy to see this. For example,

A1 = R0 Q0 = Q0^T A0 Q0   (since Q0^T A0 = R0),
A2 = R1 Q1 = Q1^T A1 Q1.

Since each matrix is orthogonally similar to the original matrix A, and therefore has the same eigenvalues as A, if the sequence {Ak} converges to a triangular or quasi-triangular matrix, we will be done. The following result shows that under certain conditions this indeed happens (see Wilkinson AEP, pp. 518-519).
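The basic iteration is only a few lines of code. The following sketch is illustrative (deliberately the naive version: no shifts, no Hessenberg reduction), using the matrix of Example 8.9.2 below:

import numpy as np

A = np.array([[1., 2], [3, 4]])
Ak = A.copy()
for k in range(30):
    Q, R = np.linalg.qr(Ak)
    Ak = R @ Q                     # A_{k+1} = R_k Q_k = Q_k^T A_k Q_k
print(np.round(Ak, 4))             # diagonal approaches 5.3723 and -0.3723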
A Condition for Convergence

Let the eigenvalues λ1, ..., λn of A satisfy |λ1| > |λ2| > ... > |λn|, and let the matrix X of the left eigenvectors (that is, X^{-1}) be such that its leading principal minors are nonzero. Then {Ak} converges to an upper triangular matrix or to the Real Schur Form.

In fact, it can be shown that under the above conditions the first column of Ak approaches a multiple of e1. Thus, for sufficiently large k we get

Ak ≈ [ λ1  *
       0   Âk ].

We can apply the QR iteration again to Âk, and the process can be continued to see that the sequence converges to an upper triangular matrix.
Example 8.9.2

A = [ 1  2
      3  4 ].

The eigenvalues of A are 5.3723 and -0.3723; |λ1| > |λ2|.

k = 0:

A0 = A = Q0 R0,

Q0 = [ -0.3162  -0.9487
       -0.9487   0.3162 ],   R0 = [ -3.1623  -4.4272
                                     0       -0.6325 ].

k = 1:

A1 = R0 Q0 = [ 5.2  1.6
               .6   -.2 ] = Q1 R1,

Q1 = [ -0.9934  -0.1146
       -0.1146   0.9934 ],   R1 = [ -5.2345  -1.5665
                                     0       -0.3821 ].

k = 2:

A2 = R1 Q1 = [ 5.3796   -0.9562
               -0.0438  -0.3796 ] = Q2 R2.

(Note that we have already made some progress towards obtaining the eigenvalues.)

Q2 = [ -1       -0.0082
       -0.0081   1 ],   R2 = [ -5.3797   0.9593
                                0       -0.3718 ].

k = 3:

A3 = R2 Q2 = [ 5.3718  1.0030
               0.0030  -0.3718 ] = Q3 R3,

Q3 = [ -1       -0.0006
       -0.0006   1 ],   R3 = [ -5.3718  -1.0028
                                0       -0.3723 ].

k = 4:

A4 = R3 Q3 = [ 5.3723   -0.9998
               -0.0002  -0.3723 ].
8.9.2 The Hessenberg QR Iteration

The QR iteration method as presented above is not efficient if the matrix A is full and dense. We have seen before that the QR factorization of such a matrix A requires O(n³) flops, and thus n iterations of the QR method will require O(n⁴) flops, making the method impractical.

Fortunately, something simple can be done: reduce A once to a Hessenberg matrix H; the Hessenberg form is preserved by the QR iteration, and each iteration then requires only O(n²) flops.
k = 1:

A1 = R0 Q0 = [ 0.4949  0.8265  -0.2132
               0.9697  0.6928   0.7490
               0       0        ...    ].

Thus, the rate of convergence will be very slow if the moduli of two eigenvalues λi and λ_{i-1} are close to each other. Fortunately, the rate of convergence can be significantly improved by shifting the origin.
Let λ̂i be an approximation of an eigenvalue λi of H. Let the QR iteration be applied to the matrix

Ĥ = H - λ̂i I.

The eigenvalues of Ĥ are λ1 - λ̂i, λ2 - λ̂i, ..., λn - λ̂i. Let these eigenvalues be ordered so that

|λ1 - λ̂i| ≥ |λ2 - λ̂i| ≥ ... ≥ |λn - λ̂i|.

Then, in this case, the ith subdiagonal entry of Ĥk will converge to zero at a rate determined by the ratio

| (λi - λ̂i) / (λ_{i-1} - λ̂i) |^k,

rather than by the ratio |λi / λ_{i-1}|^k. The former is usually smaller than the latter.

Consider the convincing example from Ortega and Poole (INMD, p. 227). Let λi = .99, λ_{i-1} = 1.1, λ̂i = 1. Then

| (λi - λ̂i) / (λ_{i-1} - λ̂i) | = .1,

while

| λi / λ_{i-1} | = .9.

This observation tells us that if we apply the QR iteration to the shifted matrix Ĥ rather than to the original matrix H, then the rate of convergence will be faster. Of course, once an eigenvalue of Ĥ is found, the corresponding eigenvalue of H can be computed just by adding the shift back.
One step of the single-shift QR iteration is then

Hk - h_nn^(k) I = Qk Rk,   Hk+1 = Rk Qk + h_nn^(k) I.

In the above, h_ij^(k) denotes the (i, j)th entry of Hk. Of course, each of the matrices {Hk} can overwrite H.

To implement the single-shift QR iteration, we need an approximate value of the eigenvalue. Experimentally it has been observed that if we let the unshifted QR iteration (the Basic QR) run for a few iterations (say s), then h_nn^(s) can be taken as a reasonably good approximation to an eigenvalue. Thus, starting with h_nn^(s) as a shift, we can continue the iterations using the (n, n)th element of the current matrix as the next shift.
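In code, one step of the single-shift iteration is as short as the formula suggests. A sketch of ours (scipy.linalg.hessenberg does the one-time reduction; the matrix is that of Example 8.9.4 below):

import numpy as np
from scipy.linalg import hessenberg

A = np.array([[1., 1, 1], [1, 2, 3], [0, 1, 1]])
H = hessenberg(A)                          # one-time reduction to Hessenberg form
n = H.shape[0]
for k in range(20):
    mu = H[n - 1, n - 1]                   # shift = current (n, n) entry
    Q, R = np.linalg.qr(H - mu * np.eye(n))
    H = R @ Q + mu * np.eye(n)             # add the shift back
print(np.round(H[n - 1, n - 1], 4))        # converges to an eigenvalue (~0.7261)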
Remark: An implicit version of the single shift iteration known as the implicit QR iteration,
can be worked out, where one does not subtract the shifts but implicitly constructs the matrix
Hk+1. This is explained in the exercise #29.
Example 8.9.4 Single-Shift QR Iteration

H = H0 = [ 1  1  1
           1  2  3
           0  1  1 ].

k = 0:

H0 - h33^(0) I = [ 0  1  1
                   1  1  3
                   0  1  0 ] = Q0 R0,

Q0 = [  0  0.7071  -0.7071
       -1  0        0
        0  0.7071   0.7071 ],

R0 = [ -1.0000  -1.0000  -3.000
        0        1.4142   0.7071
        0        0       -0.7071 ],

H1 = R0 Q0 + h33^(0) I = [  2.000   -2.8284  -1.4142
                           -1.4142   1.5000   0.5000
                            0       -0.5000   0.5000 ].

k = 1:

H1 - h33^(1) I = Q1 R1,

Q1 = [ -0.7276  -0.6342  -0.2615
        0.6860  -0.6727  -0.2774
        0       -0.3812   0.9245 ],

R1 = [ -2.0616  2.7440  1.3720
        0       1.3117  0.5606
        0       0       0.2311 ],

H2 = R1 Q1 + h33^(1) I = [ 3.8824  -1.0613  1.0464
                           0.8998  -0.5960  0.1544
                           0       -0.0881  0.7137 ].

k = 2:

H2 - h33^(2) I = Q2 R2,

Q2 = [ -0.9620   0.2721   0.0247
       -0.2732  -0.9580  -0.0870
        0       -0.0905   0.9959 ],

R2 = [ -3.2940  1.3787  -1.0488
        0       0.9740   0.1367
        0       0        0.0124 ],

H3 = R2 Q2 + h33^(2) I = [  3.5057  -2.1221  -1.2459
                           -0.2661  -0.2318  -.0514
                            0       -0.0011   0.7260 ].

k = 3:

H3 - h33^(3) I = Q3 R3,

Q3 = [ -0.9955  -0.0953  -0.0001
        0.0953  -0.9955  -0.0010
        0       -0.0010   1.000 ],

R3 = [ -2.7924  2.0212  1.2451
        0       1.1556  0.0675
        0       0       0.0001 ],

H4 = R3 Q3 + h33^(3) I = [ 3.6983  -1.7472  1.2434
                           0.1101  -0.4244  0.0664
                           0       0.0000   0.7261 ].

The iteration is clearly converging towards the eigenvalue .7261. (The eigenvalues of H, in four-digit arithmetic, are .7261, 3.6511, -0.3772.)
When H has a pair of complex conjugate eigenvalues, (i) the trailing 2 × 2 submatrix of the iterates will tend to have complex eigenvalues; in that case, (ii) the (n, n)th entry of the trailing 2 × 2 matrix, which is real, will not be a good approximation, so (iii) it is natural to use the eigenvalues of that 2 × 2 matrix as shift parameters, yielding the double-shift QR iteration.

One Iteration Step of the Double-Shift QR (Complex)

Let the eigenvalues of the 2 × 2 bottom right-hand corner of the Hessenberg matrix Hs be k1 and k2 = k̄1. Then one iteration step of the double-shift QR iteration is:

Hs - k1 I = Qs Rs;   Hs+1 = Rs Qs + k1 I,
Hs+1 - k2 I = Qs+1 Rs+1;   Hs+2 = Rs+1 Qs+1 + k2 I.
Example 8.9.5

H0 = H = [ 1   2  2
           0   0  1
           0  -1  0 ],

k1 = i,  k2 = -i.

H0 - k1 I = Q0 R0:

Q0 = [ -1   0        0
        0  -.7071   -0.7071i
        0   .7071i   .7071 ],

R0 = [ -1 + i  -2        -2
        0       1.4142i  -1.4142
        0       0         0 ],

H1 = R0 Q0 + k1 I = [ 1  1.4142 - 1.4142i  -1.4142 + 1.4142i
                      0  -i                 0
                      0   0                 i ].

H1 - k2 I = Q1 R1:

Q1 = [ -1   0        0
        0  -.9624   -.2717i
        0   .2717i   .9624 ],

R1 = [ -1 - i  -1.4142 + 1.4142i  1.4142 - 1.4142i
        0       0                  .5434
        0       0                  1.9248i ],

H2 = R1 Q1 + k2 I = [ 1  1.7453 - .9717i  1.7453 - .9717i
                      0  -.8523i           .5230
                      0  -.5230            .8523i ].

Note that the eigenvalues of the 2 × 2 bottom right-hand corner matrix

[ -.8523i   .5230
  -.5230    .8523i ]

are -i and i. Thus the eigenvalues of H are 1, i and -i.
The two complex steps can be combined into one real step. Define

N = (Hs - k1 I)(Hs - k2 I);

the product Qs Qs+1 then serves as the transforming matrix, and can be formed directly from Hs without computing Hs+1. Since k2 = k̄1, the matrix N is real. Next, we show that (Qs Qs+1)(Rs+1 Rs) is the QR factorization of N:

N = (Hs - k2 I)(Hs - k1 I)
  = (Hs - k2 I) Qs Rs
  = Qs (Hs+1 - k2 I) Rs
  = Qs Qs+1 Rs+1 Rs.

Since N is real and (Qs Qs+1)(Rs+1 Rs) is its QR factorization, the matrix Qs Qs+1 can be chosen to be real.

Finally, we show that Hs+2 is orthogonally similar to Hs through this real transforming matrix Qs Qs+1:

Hs+2 = Rs+1 Qs+1 + k2 I
     = (Qs Qs+1)* Hs (Qs Qs+1).
Though computing the eigenvalues of a 2 × 2 matrix is almost a trivial job, we note that k1 and k2 need not be computed explicitly. To form the matrix

N = (Hs - k1 I)(Hs - k2 I) = Hs² - (k1 + k2) Hs + k1 k2 I,

all we need to compute is the trace and the determinant of the 2 × 2 matrix. Let

[ h_{n-1,n-1}  h_{n-1,n}
  h_{n,n-1}    h_{nn} ]

be the 2 × 2 right-hand corner matrix of the current matrix Hs. Then

t = k1 + k2 = sum of the eigenvalues = trace = h_{n-1,n-1} + h_{nn} is real;
d = k1 k2 = product of the eigenvalues = determinant = h_{n-1,n-1} h_{nn} - h_{n,n-1} h_{n-1,n} is real.

This allows us to write one step of the double-shift QR iteration in real arithmetic as follows: compute N = Hs² - t Hs + d I, find its QR factorization N = QR, and form Hs+2 = Q^T Hs Q. We will call the above computation the Explicit Double-Shift QR iteration, for reasons to be stated in the next section.
Example 8.9.6

H = H0 = [ 1   2  3
           1   0  1
           0  -2  2 ],

t = 2,  d = 2,

N = H² - tH + dI = [  3  -8  5
                     -1   2  3
                     -2   0  0 ].

Find the QR factorization of N:

Q = [ -.8018  -.5470  -.2408
       .2673   .0322  -0.9631
       .5345  -0.8365  0.1204 ],

H2 = Q^T H0 Q = [ -.8571   1.1007   2.5740
                  -1.1867  3.0455  -0.8289
                   0.0000  1.8437   0.8116 ].
8.9.6 Implicit QR Iteration

After all this, we note, with utter disappointment, that the above double-shift (explicit) QR iteration is not practical. The reason is that forming the matrix N itself requires O(n³) flops. Fortunately, a little trick again allows us to implement the step in O(n²) flops.

Using the implicit Q-theorem (Theorem 5.4.3), we can show that the matrix Hs+2 of the explicit QR and the matrix H's+2 of the implicit QR are both unreduced upper Hessenberg matrices produced by transformations with the same first column, and are therefore essentially the same.
Let

n1 = N e1 = ( n11, n21, n31, 0, ..., 0 )^T;

then

n11 = h11² - t h11 + d + h12 h21,
n21 = h21 (h11 + h22 - t),
n31 = h21 h32.
Here hij refers to the (i, j)th entry of Hs. Second, because only three elements of n1 are nonzero, the Householder matrix P0 has the form

P0 = [ P̂0  0
       0   I_{n-3} ],

where P̂0 is a 3 × 3 Householder matrix. Because of this form of P0, and Hs being Hessenberg, the matrix Hs' = P0 Hs P0 is not a full matrix. It is a Hessenberg matrix with a bulge. For example, when n = 6, we have

Hs' = P0 Hs P0 = [ x  x  x  x  x  x
                   x  x  x  x  x  x
                   x  x  x  x  x  x
                   x  x  x  x  x  x
                   0  0  0  x  x  x
                   0  0  0  0  x  x ].

A bulge will be created at each step of the reduction of Hs' to Hessenberg form, and the construction of the Householder matrices P1 through Pn-2 amounts to chasing these bulges systematically. Each Pk, k = 1, ..., n - 3, has the form

Pk = [ Ik  0    0
       0   P̂k  0
       0   0    I_{n-k-3} ],

where P̂k is a 3 × 3 Householder matrix. The last Householder matrix Pn-2 has the form

Pn-2 = [ I_{n-2}  0
         0        P̂_{n-2} ],

where P̂_{n-2} is 2 × 2.
Taking into consideration the above structure of the computations, it can be shown that one step of the implicit QR iteration requires only O(n²) flops. For details of this O(n²) count for one iteration of the double-shifted implicit QR, see the book by Stewart (IMC, pp. 375-378) and the recent book by Watkins (FMC, pp. 277-278).
Algorithm 8.9.1 One Iteration Step of the Double-Shift Implicit QR

Let H be an n × n unreduced upper Hessenberg matrix. Then the following algorithm, which constitutes one step of the double-shift implicit QR iteration, produces orthogonal matrices P0, P1, ..., Pn-2 such that Q^T H Q is upper Hessenberg, where Q = P0 P1 ... Pn-2.

1. Compute t = h_{n-1,n-1} + h_{nn} and d = h_{n-1,n-1} h_{nn} - h_{n,n-1} h_{n-1,n}.

2. Compute the three nonzero entries of the first column of N:

   x = n11 = h11² - t h11 + d + h12 h21,
   y = n21 = h21 (h11 + h22 - t),
   z = n31 = h21 h32.

3. Compute the Householder matrices P0, P1, ..., Pn-2 such that the final matrix is upper Hessenberg.

   For k = 0, 1, 2, ..., n - 3 do:

   (a) Find a 3 × 3 Householder matrix P̂k such that

       P̂k ( x, y, z )^T = ( *, 0, 0 )^T.

       Form

       Pk = [ Ik  0    0
              0   P̂k  0
              0   0    I_{n-k-3} ].

       Form Pk^T H Pk and overwrite with H:

       H ← Pk^T H Pk.

       Update x, y and z:

       x ← h_{k+2,k+1},
       y ← h_{k+3,k+1},
       z ← h_{k+4,k+1}   (if k < n - 3).

   (b) For k = n - 2: find a 2 × 2 Householder matrix P̂_{n-2} such that P̂_{n-2} ( x, y )^T = ( *, 0 )^T, form Pn-2 = diag(I_{n-2}, P̂_{n-2}), and overwrite H ← Pn-2^T H Pn-2.
Flop-count. One step of the double-shift implicit QR iteration takes about 6n² flops. If the transforming matrix Q is needed and accumulated, then another 6n² flops will be needed (see Golub and Van Loan MC 1983, p. 234).
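The following sketch of one double-shift step is ours, not the book's code, and it exploits the implicit Q-theorem in the laziest possible way: it applies only P0 and then lets a library Hessenberg reduction chase the bulge, which reproduces Hs+2 up to signs of rows and columns:

import numpy as np
from scipy.linalg import hessenberg

def double_shift_step(H):
    n = H.shape[0]
    t = H[n - 2, n - 2] + H[n - 1, n - 1]              # trace of trailing 2x2
    d = (H[n - 2, n - 2] * H[n - 1, n - 1]
         - H[n - 1, n - 2] * H[n - 2, n - 1])          # det of trailing 2x2
    x = H[0, 0] ** 2 - t * H[0, 0] + d + H[0, 1] * H[1, 0]
    y = H[1, 0] * (H[0, 0] + H[1, 1] - t)
    z = H[1, 0] * H[2, 1]
    u = np.array([x, y, z])
    u[0] += (1.0 if x >= 0 else -1.0) * np.linalg.norm(u)
    P0 = np.eye(n)
    P0[:3, :3] -= 2.0 * np.outer(u, u) / (u @ u)       # Householder acting on rows/cols 1-3
    return hessenberg(P0 @ H @ P0)                     # bulge chasing, done by the library

H = np.array([[1., 2, 3], [1, 0, 1], [0, -2, 2]])
print(np.round(double_shift_step(H), 4))               # cf. H2 of Example 8.9.6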
Example 8.9.7

1. t = 2, d = 2,

   H = [ 1   2  3
         1   0  1
         0  -2  2 ].

2. x = n11 = 3,  y = n21 = -1,  z = n31 = -2.

3. k = 0:

P0 = I - 2uu^T/(u^T u), where u = ( 3 + √14, -1, -2 )^T:

P0 = [ -.8018   .2673   .5345
        .2673   .9604  -.0793
        .5345  -.0793   .8414 ],

H ← P0^T H P0 = [ -.8571  -2.6248  -0.9733
                   .0581   0.8666   1.9505
                  1.1852  -0.7221   2.9906 ].

Update x and y:

x = h21 = .0581,  y = h31 = 1.1852.

Find P1:

P̂1 = [ -0.0490  -0.9988
       -0.9988   0.0490 ],   P̂1 ( x, y )^T = ( -1.1866, 0 )^T,

P1 = [ 1   0        0
       0  -0.0490  -0.9988
       0  -0.9988   0.0490 ],

H ← P1^T H P1 = [ -0.8571   1.1008   2.5739
                  -1.1867   3.0456  -0.8290
                   0        1.8436   0.8116 ].

Note that the matrix H obtained by the implicit QR is the same as the matrix H2 obtained earlier in Section 8.9.5 by the explicit QR (Example 8.9.6).
8.9.7 Obtaining the Real Schur Form of A

1. Transform the matrix A to Hessenberg form.

2. Iterate with the implicit double-shift QR method.

Typically, after two to three steps of the double-shift implicit QR iteration, one or two (and sometimes more) subdiagonal entries from the bottom of the Hessenberg matrix converge to zero. This then gives us a real eigenvalue or a pair of complex conjugate eigenvalues. Once a real eigenvalue or a pair of complex conjugate eigenvalues is computed, the last row and the last column in the first case, or the last two rows and the last two columns in the second case, can be deleted, and the computation of the other eigenvalues can be continued with the submatrix. This process is also known as deflation.

Note that the eigenvalues of the deflated submatrix are also eigenvalues of the original matrix. For, suppose immediately before deflation the matrix has the form

Hk = [ A'  C'
       0   B' ],

where B' is the 2 × 2 trailing submatrix or is a 1 × 1 matrix. Then the characteristic polynomial of Hk factors as

det(λI - Hk) = det(λI - A') det(λI - B').

Thus, the eigenvalues of Hk are the eigenvalues of A' together with those of B'. But Hk is orthogonally similar to the original matrix A and therefore has the same eigenvalues as A.

When to Accept a Subdiagonal Entry as Zero

A major decision that we have to make during the iteration procedure is when to accept a subdiagonal entry as zero so that the matrix can be deflated. It seems that there are no clear-cut conventions here; a commonly used criterion is to accept a subdiagonal entry as zero when it is small compared with its neighboring diagonal entries. For a good discussion on this matter, see the book "The Symmetric Eigenvalue Problem" by B. N. Parlett (1980).
The progress of the iteration is shown by the subdiagonal entry h21:

Iteration    h21
1           -1.1867
2            0.3543
3            0.0129
4            0.0000

The RSF is

[ -1.1663  -1.3326  -2.0531
   0        1.2384   1.6659
   0       -1.9409   2.9279 ].

The eigenvalues of the 2 × 2 right-hand lower corner submatrix are 2.0832 ± 1.5874i.
Beresford Parlett is a professor of mathematics at the University of California at Berkeley. He has made outstanding contributions in the area of the numerical matrix eigenvalue problem, especially for large and sparse problems. His book "The Symmetric Eigenvalue Problem" is an authoritative book in this area.
Example 8.9.9 Find the Real Schur Form of

H = [  0.2190  -0.0756   0.6787  -0.6391
      -0.9615   0.9032  -0.4571   0.8804
       0       -0.3822   0.4526  -0.0641
       0        0       -0.1069  -0.0252 ].

The subdiagonal entries during the iteration:

Iteration    h21    h32    h43
In practice the method requires only a couple of QR iterations per eigenvalue. Thus, it will require about 8n³ flops to compute all the eigenvalues (Golub and Van Loan, MC 1984 p. 235). If the transforming matrix Q and the final quasi-triangular matrix T are also needed, then the cost will be about 15n³ flops.

Round-off Property

The QR iteration method is quite stable. An analysis of the round-off property of the algorithm shows that the computed Real Schur Form (RSF) is orthogonally similar to a nearby matrix A + E, where

||E||F ≤ φ(n) μ ||A||F,

μ is the machine precision and φ(n) is a slowly growing function of n. The computed matrix Q is also nearly orthogonal.
Partition the Real Schur Form as

Q^T A Q = R = [ R11  R12
                0    R22 ],

and let us assume that R11 and R22 do not have eigenvalues in common. Then the first p columns of Q, where p is the order of R11, form a basis for the invariant subspace associated with the eigenvalues of R11.
In many applications, such as in the solution of algebraic Riccati equations (see Laub (1979)),
one needs to compute the orthonormal bases of an invariant subspace associated with a selected
number of eigenvalues. Unfortunately, the transformed Real Schur Form obtained by QR iteration
may not give the eigenvalues in some desired order. Thus, if the eigenvalues are not in a desired
order, one wonders if some extra work can be done to bring them into that order. That this can
indeed be done is seen from the following simple discussion. Let A be 2 2.
Let

Q1^T A Q1 = [ λ1  r12
              0   λ2 ],   λ1 ≠ λ2.

If λ1 and λ2 are not in the right order, all we need to do to reverse the order is to form a Givens rotation J(1, 2, θ) such that

J(1, 2, θ) ( r12, λ2 - λ1 )^T = ( *, 0 )^T.

Then Q = Q1 J(1, 2, θ)^T is such that

Q^T A Q = [ λ2  r12
            0   λ1 ].
Example 8.9.10

A = [ 1  2
      2  3 ],

Q1 = [  .8507  .5257
       -.5257  .8507 ],

Q1^T A Q1 = [ -0.2361  0.0000
               0.0000  4.2361 ],

J(1, 2, θ) = [  0  -1
               -1   0 ],

J(1, 2, θ) ( 0, 4.4722 )^T = ( -4.4722, 0 )^T,

Q = Q1 J(1, 2, θ)^T = [ -0.5257  -0.8507
                        -0.8507   0.5257 ],

Q^T A Q = [ 4.2361   0.00
            0.00    -0.2361 ].
The above simple process can be easily extended to achieve any desired ordering of the eigenvalues in the Real Schur Form. For details see Golub and Van Loan (MC 1984, p. 241). The process is quite inexpensive: it requires only about 8kn flops, where k is the number of interchanges required to achieve the desired order. Stewart (1976) has provided useful Fortran routines for such an ordering of the eigenvalues.
8.10 Computing the Eigenvectors

8.10.1 The Hessenberg-Inverse Iteration

As soon as an eigenvalue is computed by the QR iteration, we can invoke inverse iteration (Algorithm 8.5.2) to compute the corresponding eigenvector. However, since A is initially reduced to a Hessenberg matrix H for the QR iteration, it is natural to take advantage of the structure of the Hessenberg matrix H in the solution of the linear systems that need to be solved in the process of inverse iteration. Thus, the Hessenberg-inverse iteration can be stated as follows:

Algorithm 8.10.1 The Hessenberg-Inverse Iteration

1. Reduce the matrix A to an upper Hessenberg matrix H:

   P^T A P = H.

2. Take an approximate eigenvalue σ computed by the QR iteration, and choose an initial unit vector y^(0).

3. For k = 1, 2, ... solve

   (H - σ I) ŷ^(k) = y^(k-1),   y^(k) = ŷ^(k) / ||ŷ^(k)||2.

   Stop if ||(y^(k) - y^(k-1)) / y^(k)|| < ε or if k > N, the maximum number of iterations.

4. Recover the eigenvector x:

   x = P y^(k).
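A sketch of the algorithm (illustrative; the shift 0.7261 is the eigenvalue approximation computed for this matrix in Example 8.9.4):

import numpy as np
from scipy.linalg import hessenberg, lu_factor, lu_solve

A = np.array([[1., 1, 1], [1, 2, 3], [0, 1, 1]])
H, P = hessenberg(A, calc_q=True)              # A = P H P^T
sigma = 0.7261                                 # eigenvalue estimate from the QR iteration
n = H.shape[0]
lu, piv = lu_factor(H - sigma * np.eye(n))     # factor the (nearly singular) shifted matrix once
y = np.ones(n)
for _ in range(5):
    y = lu_solve((lu, piv), y)                 # one inverse iteration step
    y /= np.linalg.norm(y)
x = P @ y                                      # recover the eigenvector of A
print(np.linalg.norm(A @ x - sigma * x))       # small residual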
8.10.2 Calculating the Eigenvectors from the Real Schur Form

The eigenvectors can also be calculated directly from the Real Schur Form without invoking inverse iteration. The process is described as follows. Let A be transformed to the RSF T by the implicit QR iteration:

Q* A Q = T.

Then Ax = λx can be written as

Q* A Q (Q* x) = λ Q* x.

That is, writing Q* x = y, we have

T y = λ y.

Thus, after A has been transformed to the RSF T, an eigenvector x corresponding to an eigenvalue λ can be computed as follows:

1. Solve the homogeneous triangular system T y = λ y.
2. Compute x = Q y.

We now show how the solution of T y = λ y can be simplified, assuming that T is triangular and that all the eigenvalues of A (that is, the diagonal entries of T) are distinct. Let λ = tkk; that is, we are trying to find a y such that

(T - tkk I) y = 0.

Write

T - tkk I = [ T11  T12
              0    T22 ];

that is,

T11 y1 + T12 y2 = 0,   T22 y2 = 0.
Now, T22 is nonsingular, because its diagonal entries are tjj - tkk, j = k + 1, ..., n, which are different from zero. So the homogeneous system

T22 y2 = 0

has only the trivial solution y2 = 0. Since the kth (that is, the last) diagonal entry of T11 is zero, T11 is singular; therefore a nonzero y1 exists. Note that T11 has the form

T11 = [ T̂11  s
        0    0 ],

where T̂11 is (k - 1) × (k - 1) and nonsingular. Thus T11 y1 = 0 reduces to

T̂11 ŷ1 + s z = 0,

where

y1 = ( ŷ1, z )^T;

z can be chosen to be any nonzero number. Since T̂11 is upper triangular, ŷ1 can be computed by back substitution.
Algorithm 8.10.2 Computing an Eigenvector Directly from the RSF

1. Transform A to RSF by the implicit QR iteration:

   Q* A Q = T.

   (Assume that T is triangular and that the diagonal entries of T are distinct.)

2. Select the eigenvalue λ = tkk whose eigenvector x is to be computed.

3. Partition

   T - tkk I = [ T11  T12
                 0    T22 ].

4. Partition

   T11 = [ T̂11  s
           0    0 ].

5. Solve by back substitution, choosing z ≠ 0:

   T̂11 ŷ1 = -s z.

6. Set y1 = ( ŷ1, z )^T, y2 = 0, y = ( y1, y2 )^T, and compute x = Q y.
For example, with k = 2 (so that T̂11 is 1 × 1):

T̂11 = 10.5428,   s = 2.5145.

Choose z = 1:

ŷ1 = -(T̂11)^{-1} s z = -.2385,

y1 = ( ŷ1, z )^T = ( -.2385, 1 )^T.

Set y2 = 0. Then

y = ( y1, y2 )^T = ( -.2385, 1, 0 )^T,

x = Q y = ( -0.9283, 0.3910, 0.2058 )^T.

It is easily verified that

A x = ( -0.3315, 0.1396, 0.0734 )^T

and

λ2 x = ( -0.3315, 0.1396, 0.0734 )^T.
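A sketch of Algorithm 8.10.2 (illustrative; scipy.linalg.schur supplies Q and T, and the matrix chosen here has real distinct eigenvalues, so T comes out triangular, as the algorithm assumes):

import numpy as np
from scipy.linalg import schur

A = np.array([[1., 4, 5], [2, 3, 3], [1, 1, 1]])
T, Q = schur(A)                                # Q^T A Q = T
k = 2                                          # select lambda = t_kk
lam = T[k, k]
y = np.zeros(A.shape[0])
y[k] = 1.0                                     # z = 1; y2 = 0 by construction
# back substitution: solve (T[:k,:k] - lam I) y_hat = -s, with s = T[:k,k]
y[:k] = np.linalg.solve(T[:k, :k] - lam * np.eye(k), -T[:k, k])
x = Q @ y
print(np.linalg.norm(A @ x - lam * x))         # ~0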
8.11 The Symmetric Eigenvalue Problem

The QR iteration can, of course, be applied to find the eigenvalues of a symmetric matrix. Indeed, in the symmetric case the method simplifies to a large extent. We will discuss the symmetric QR iteration briefly below. However, since the eigenvalues and eigenvectors of a symmetric matrix enjoy certain special remarkable properties over those of the nonsymmetric case, some special methods exploiting these properties can be developed for the symmetric problem. One such method is based on the Sturm sequence property of the characteristic polynomial of the matrix.
Some Special Properties

A. The eigenvalues of a real symmetric matrix are real. The eigenvectors associated with distinct eigenvalues are orthogonal (Theorem 8.2.5).

B. The Real Schur Form of a real symmetric matrix is a diagonal matrix.

Proof. From the Real Schur Triangularization Theorem (Theorem 8.9.2) we have

Q^T A Q = R,

where R is in Real Schur Form; that is, R is quasi-triangular with each diagonal entry either a scalar or a 2 × 2 matrix. Now, each 2 × 2 matrix on the diagonal corresponds to a pair of complex conjugate eigenvalues. Since a real symmetric matrix cannot have a complex eigenvalue, it follows that R cannot have a 2 × 2 matrix on the diagonal; therefore R is a diagonal matrix.
Let A be an n × n real symmetric matrix. Let A' = A + E, where E is a real symmetric perturbation of the matrix A, and let λ1 ≥ λ2 ≥ ... ≥ λn and λ1' ≥ λ2' ≥ ... ≥ λn' be the eigenvalues of A and A', respectively. Then it follows from the Bauer-Fike Theorem (Theorem 8.7.1) that

λi - ||E||2 ≤ λi' ≤ λi + ||E||2,   i = 1, 2, ..., n.

This result is remarkable.
Example 8.11.1

A = [ 1  2  3
      2  3  4
      3  4  6 ],   E = 10^{-4} I_{3×3}.

The eigenvalues of A are -0.4203, 0.2336, and 10.1867. The eigenvalues of A + E are -0.4203, .2337, and 10.1868. Note that ||E||2 = 10^{-4}.
Eigenvalues of a Rank-One Perturbed Matrix

Theorem 8.11.1 Suppose B = A + α b b^T, where A is an n × n symmetric matrix, α is a scalar and b is an n-vector. Let λ1 ≥ λ2 ≥ ... ≥ λn be the eigenvalues of A and μ1 ≥ μ2 ≥ ... ≥ μn those of B. Then the eigenvalues of B interlace with those of A: if α ≥ 0, then μi ∈ [λi, λ_{i-1}] for i = 2, ..., n; if α ≤ 0, then μi ∈ [λ_{i+1}, λi] for i = 1, ..., n - 1.

Assume now that A has been reduced by orthogonal similarity to an unreduced symmetric tridiagonal matrix T:

P A P^T = T = [ α1  β1
                β1  α2  β2
                    β2  α3  ...
                        ...  ...      β_{n-1}
                             β_{n-1}  αn ].
Let pi(λ) denote the characteristic polynomial of the i × i principal submatrix of T. Then these polynomials satisfy a three-term recursion:

pi(λ) = (αi - λ) p_{i-1}(λ) - β_{i-1}² p_{i-2}(λ),   i = 2, 3, ..., n,

with

p0(λ) = 1 and p1(λ) = α1 - λ.
Let λ1^(k) < λ2^(k) < ... < λk^(k) denote the zeros of pk(λ); they are real and distinct for an unreduced symmetric tridiagonal matrix. The zeros of successive polynomials strictly interlace:

λ1^(k+1) < λ1^(k) < λ2^(k+1) < ... < λi^(k+1) < λi^(k) < λ_{i+1}^(k+1) < ... < λk^(k+1) < λk^(k) < λ_{k+1}^(k+1).
We shall show that this interlacing property leads to an interesting result on the number of eigenvalues of T.

It is clear that pk(λ) = (-1)^k λ^k + .... Thus pk(λ) is positive if λ is negative with large magnitude. The zeros λ1^(k) < λ2^(k) < ... < λk^(k) of pk separate the real line into k + 1 intervals

(-∞, λ1^(k)), (λ1^(k), λ2^(k)), ..., (λ_{k-1}^(k), λk^(k)), (λk^(k), ∞).

pk(λ) is positive in the first interval and takes alternating signs in consecutive intervals. The interlacing property and the sign changes of pk(λ) can be illustrated in the following figure.
[Figure: sign patterns of p0(λ), p1(λ), ..., p5(λ) along the real line; each pk is positive to the left of its smallest zero and alternates in sign between consecutive zeros, the zeros of pk interlacing those of pk+1.]
Let μ be a real number. There are two cases for the signs of p0(μ) and p1(μ):
[Figure: the two cases. p0(λ) is identically positive, and p1(λ) changes sign at its single zero λ1^(1). In the first case μ < λ1^(1), so p1(μ) > 0 and the signs agree; in the second case μ > λ1^(1), so p1(μ) < 0 and the signs disagree.]
Thus, a sign agreement "pushes" one eigenvalue of A (that is, of T), as well as one eigenvalue of each k × k submatrix, to the right of μ.

Suppose we have counted the sign agreements between consecutive terms of p0(μ), p1(μ), ..., pk(μ). Let us consider the signs of pk(μ) and pk+1(μ) together with the interlacing property:
[Figure: signs of pk(λ) and pk+1(λ) near the zeros λ_{i-1}^(k), λ_i^(k) and λ_{i-1}^(k+1), λ_i^(k+1), λ_{i+1}^(k+1), for the cases of sign agreement and disagreement at μ.]
Let μ be between λ_{i-1}^(k) and λ_i^(k). It is clear from the above figure that a sign agreement between pk(μ) and pk+1(μ) "pushes" λ_{i+1}^(k+1) to the right of μ; that is, it means that pk+1 has one more zero than pk in [μ, ∞).

There is no zero of p0 in [μ, ∞). We know the number of zeros of p1 in [μ, ∞) by the sign agreement or disagreement of p0(μ) and p1(μ). Generally, when we know the number of zeros of pk in [μ, ∞), then we know the number of zeros of pk+1 in [μ, ∞) by checking the sign agreement between pk(μ) and pk+1(μ). This recursion can be continued until we know the number of zeros of pn, i.e., the number of eigenvalues of T. The rule is clear:

The number of sign agreements between consecutive terms of the sequence p0(μ), p1(μ), ..., pn(μ) equals the number of eigenvalues of T larger than μ.

Note: It is also clear that pk+1(μ) = 0 should be considered a sign agreement with pk(μ); that is, a zero term takes the sign of its predecessor.
Example. Consider

T = [ 2  1  0
      1  2  1
      0  1  2 ]   (αi = 2, βi = 1).

Then

p0(λ) = 1,   p1(λ) = 2 - λ,
p2(λ) = (2 - λ)² - 1,
p3(λ) = (2 - λ)[(2 - λ)² - 1] - (2 - λ) = (2 - λ)³ - 2(2 - λ).

Let μ = 0. Then the sequence p0(μ), p1(μ), p2(μ), p3(μ) is 1, 2, 3, 4. There are three agreements in sign. Thus all the eigenvalues of T are greater than or equal to zero. In fact, since p3(0) = 4 ≠ 0, it follows that all the eigenvalues of T are positive. The eigenvalues of T are easily seen to be

2,  2 + √2,  2 - √2.

Let μ = 2. Then the sequence p0(μ), p1(μ), p2(μ), p3(μ) is

1, 0, -1, 0.

We count the agreements in sign of

( + + - - )

(a zero takes the sign of the preceding term). There are two agreements, confirming that T has two eigenvalues greater than or equal to 2.
The Sturm sequence and bisection method for computing a specified zero of pn(λ) can now be stated as follows:

Algorithm 8.11.1 The Sturm Sequence-Bisection Algorithm

Let λ1 < λ2 < ... < λn be the eigenvalues of T, that is, the zeros of pn(λ). Suppose the desired eigenvalue is λ_{n-m+1} for a given m ≤ n. Then:

I. Find an interval [s1, s2] containing λ_{n-m+1}. Since |λi| ≤ ||T||∞, initially we can take

   s1 = -||T||∞,   s2 = ||T||∞.

II. Compute s3 = (s1 + s2)/2.

III. Compute N(s3) = the number of agreements in sign in the sequence

   1, p1(s3), p2(s3), ..., pn(s3).

   If N(s3) < m, set s2 = s3; otherwise, set s1 = s3.

Let ε > 0 be a preassigned small positive number. Test if |s2 - s1| < ε. If so, accept s3 = (s1 + s2)/2 as an approximate value of λ_{n-m+1}. Otherwise go to II.
Note: After k steps, the desired zero is located in an interval of width (s2 - s1)/2^k.
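A sketch of Algorithm 8.11.1 (illustrative; alpha and beta hold the diagonal and off-diagonal of T, a zero in the sequence takes the sign of its predecessor as noted above, and the test matrix is the tridiagonal T of the example that follows):

import numpy as np

def n_agreements(alpha, beta, mu):
    # N(mu): sign agreements in 1, p1(mu), ..., pn(mu)
    # = number of eigenvalues of T larger than mu.
    agree, sgn = 0, 1
    p_prev, p = 1.0, alpha[0] - mu
    seq = [p]
    for i in range(1, len(alpha)):
        p_prev, p = p, (alpha[i] - mu) * p - beta[i - 1] ** 2 * p_prev
        seq.append(p)
    for v in seq:
        s = sgn if v == 0 else (1 if v > 0 else -1)   # a zero keeps the previous sign
        if s == sgn:
            agree += 1
        sgn = s
    return agree

def sturm_bisection(alpha, beta, m, eps=1e-10):
    # computes lambda_{n-m+1}, the m-th largest eigenvalue of T
    bound = max(map(abs, alpha)) + 2 * max(map(abs, beta))   # >= ||T||_inf
    s1, s2 = -bound, bound
    while s2 - s1 > eps:
        s3 = (s1 + s2) / 2
        if n_agreements(alpha, beta, s3) < m:
            s2 = s3
        else:
            s1 = s3
    return (s1 + s2) / 2

print(sturm_bisection([2., 2., 2.], [1., 1.], m=3))   # ~0.5858 = 2 - sqrt(2)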
Example. For the matrix T of the preceding example, take m = 3 (so the desired eigenvalue is λ1).

Iteration 0. Initially,

s1 = -4,  s2 = 4,
s3 = 0,
N(s3) = N(0) = 3.

Set s1 = s3.

Iteration 1.

s1 = 0,  s2 = 4,
s3 = (0 + 4)/2 = 2,
N(s3) = N(2) = 2 < 3.

Set s2 = s3.

Iteration 2.

s1 = 0,  s2 = 2,  s3 = (0 + 2)/2 = 1.

The Sturm sequence at s3 = 1 is

1, 1, 0, -1,   with signs ( + + + - ),

so

N(s3) = 2 < 3.

Set s2 = s3.

Iteration 3.

s1 = 0,  s2 = 1,  s3 = .5,
N(s3) = 3.

Set s1 = s3.

The eigenvalue λ1 is clearly in the interval [.5, 1], which is, in fact, the case (λ1 = 2 - √2 ≈ .5858). We can continue the iterations until the length of the interval satisfies |s2 - s1| < ε.
The symmetric QR iteration uses the Wilkinson shift: the eigenvalue of the trailing 2 × 2 submatrix

[ t_{n-1,n-1}^(k)  t_{n-1,n}^(k)
  t_{n,n-1}^(k)    t_{nn}^(k) ]

that is closer to t_{nn}^(k), namely

σ = t_{nn}^(k) - sign(r) (t_{n,n-1}^(k))² / ( |r| + √(r² + (t_{n,n-1}^(k))²) ),

where

r = ( t_{n-1,n-1}^(k) - t_{nn}^(k) ) / 2.

Thus the symmetric method comes in two stages:

Algorithm 8.11.2 Symmetric QR Iteration

I. Transform A to a symmetric tridiagonal matrix T using orthogonal similarity transformations:

   P A P^T = T.
II. Apply the single-shift QR iteration to T with the Wilkinson shift σ:

   Set T = T1.
   For k = 1, 2, ... do until convergence occurs:
      Tk - σ I = Qk Rk,
      Tk+1 = Rk Qk + σ I.

Remark: It is possible to compute Tk+1 from Tk without explicitly forming the matrix Tk - σk I. This is known as the implicit symmetric QR. For details, see Golub and Van Loan, pp. 278-281. Also see exercise #28 in this chapter.
Convergence of the Symmetric QR Iteration

It can be shown (see Lawson and Hanson, SLP p. 109) that the convergence of t_{n,n-1}^(k) to zero is quadratic; that is, there exists a constant c > 0, depending upon A, such that for all k

|t_{n,n-1}^(k+1)| ≤ c |t_{n,n-1}^(k)|².

Remark: In practice, it has been seen that the convergence is almost always cubic, but quadratic convergence is all that has been proved (for a proof, see Lawson and Hanson SLP, pp. 240-244).
Flop-count: The symmetric QR algorithm requires only about (2/3)n³ flops if only the eigenvalues are needed; the count rises if the transforming matrix is also accumulated.

Round-off error property. As in the general nonsymmetric case, the symmetric QR with implicit shift is stable. It can be shown that, given a symmetric matrix A, the symmetric QR algorithm with implicit shift generates an orthogonal matrix Q and a diagonal matrix D such that

Q^T A Q = D + E,

where

||E||F ≤ φ(n) μ ||A||F,

μ is the machine precision and φ(n) is a slowly growing function of n.
Furthermore, each computed eigenvalue λ̂i satisfies the inequality

|λ̂i - λi| ≤ p(n) μ ||A||2,

where p(n) is a slowly growing function of n.
Note: If the starting matrix itself is a symmetric tridiagonal matrix, then the
eigenvalues and the eigenvectors can be computed much more accurately.
8.12 The Lanczos Algorithm For Symmetric Matrices
There are areas of applications such as power systems, space science, quantum physics and chem-
istry, nuclear studies, etc., where the eigenvalue problems for matrices of very large order are
commonly found.
Most large problems arising in applications are sparse. Research in this area is very active.
The symmetric problem is better understood than the nonsymmetric problem. There are now well-
established techniques to compute the spectrum, at least the extremal eigenvalues of very large and
sparse symmetric matrices. A method originally devised by Lanczos has received considerable at-
tention in this context. We will discuss this method and its applications to eigenvalue computations
in this section. The QR iteration method described in the last section is not practical for large and sparse eigenvalue problems: the sparsity gets destroyed and storage becomes an issue.
The Symmetric Lanczos Algorithm

Given an n × n symmetric matrix A and a unit vector v1, the Lanczos algorithm constructs, column by column, an orthonormal matrix V = (v1, v2, ..., vn) and a symmetric tridiagonal matrix T such that V^T A V = T.
Let

T = [ α1  β1
      β1  α2  β2
          β2  ...  ...
              ...  ...      β_{n-1}
                   β_{n-1}  αn ]

and V = (v1, v2, ..., vn). Then the equation

V^T A V = T,

or

A V = V T,

gives

A vj = αj vj + β_{j-1} v_{j-1} + βj v_{j+1},   j = 1, 2, ..., n - 1   (8.12.2)
(where we assume that β0 v0 = 0). Multiplying both sides of the above equation by vj^T to the left, and observing that the orthonormality conditions give

vj^T vj = 1,   vj^T vk = 0, j ≠ k,

we obtain

αj = vj^T A vj,   j = 1, 2, ..., n.   (8.12.3)

Also, if we write

rj = A vj - αj vj - β_{j-1} v_{j-1},   (8.12.4)

then from (8.12.2) and (8.12.4) we get

v_{j+1} = rj / βj,

provided that

βj ≠ 0.

The nonvanishing of βj can be assured if we take βj = ||rj||2.
Algorithm 8.12.1 The Basic Symmetric Lanczos

Given an n × n symmetric matrix A and a unit vector v1, the following algorithm constructs simultaneously the entries of a symmetric tridiagonal matrix T and an orthonormal matrix V = (v1, ..., vn) such that

V^T A V = T.

Set v0 = 0, β0 = 1, r0 = v1.

For j = 1, 2, ..., n do:

   vj = r_{j-1} / β_{j-1}
   αj = vj^T A vj
   rj = (A - αj I) vj - β_{j-1} v_{j-1}
   βj = ||rj||2
Notes:
1. The vectors v1 ; v2; : : :; vn are called Lanczos vectors.
2. Each Lanczos vector vj +1 is orthogonal to all the previous ones; provided j 6= 0.
3. The whole algorithm can be implemented just by using a subroutine that can perform matrix-
vector multiplication, and thus the sparsity of the original matrix A can be maintained. In
fact, in contrast with Householder's method or Givens' method, the matrix A
is never altered during the whole procedure. Such a feature makes the Lanczos
algorithm so attractive for sparse computations.
4. The vectors {v1, ..., vj} form an orthonormal basis of the Krylov subspace spanned by {v1, Av1, ..., A^{j-1} v1}.

5. The Arnoldi method described in Chapter 6 reduces to the symmetric Lanczos method described above when A is symmetric.
We now show that the Lanczos algorithm can be reformulated using only three n-vectors. This is important when n is large.
Algorithm 8.12.2 The Reformulated Lanczos

Set v0 = 0, β0 = 1, r0 = v1.

For j = 1, 2, ..., n do:

1. vj = r_{j-1} / β_{j-1}
2. uj = A vj
3. rj = uj - β_{j-1} v_{j-1}
4. αj = vj^T uj
5. rj = rj - αj vj
6. βj = ||rj||2
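A sketch of Algorithm 8.12.2 (an illustration of ours). Note that A enters only through matrix-vector products, and only the two most recent Lanczos vectors are needed to continue; V is kept here just to check V^T A V = T:

import numpy as np

def lanczos(A, v1, m):
    n = len(v1)
    alphas, betas, V = [], [], []
    v_prev, beta_prev = np.zeros(n), 1.0
    r = v1 / np.linalg.norm(v1)
    for j in range(m):
        v = r / beta_prev                  # step 1
        u = A @ v                          # step 2: the only use of A
        r = u - beta_prev * v_prev         # step 3
        a = v @ u                          # step 4
        r = r - a * v                      # step 5
        b = np.linalg.norm(r)              # step 6
        alphas.append(a); betas.append(b); V.append(v)
        v_prev, beta_prev = v, b
        if b == 0:                         # invariant subspace found
            break
    T = np.diag(alphas) + np.diag(betas[:-1], 1) + np.diag(betas[:-1], -1)
    return T, np.column_stack(V)

A = np.array([[1., 2, 3], [2, 3, 4], [3, 4, 5]])
T, V = lanczos(A, np.array([1., 0, 0]), 3)
print(np.round(T, 4))                      # the tridiagonal T of Example 8.12.1
print(np.round(V.T @ A @ V - T, 8))        # ~0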
Example 8.12.1

A = [ 1  2  3
      2  3  4
      3  4  5 ],   r0 = v1 = ( 1, 0, 0 )^T.

j = 1:

α1 = 1,   β1 = 3.6056,   v1 = ( 1, 0, 0 )^T.

j = 2:

α2 = 8.0769,   β2 = .6154,   v2 = ( 0, .5547, .8321 )^T,

T2 = [ α1  β1
       β1  α2 ] = [ 1       3.6056
                    3.6056  8.0769 ].

j = 3:

α3 = -0.0769,   β3 = 1.5466 × 10^{-14},   v3 = ( 0, .8321, -0.5547 )^T,

T = T3 = [ 1       3.6056   0
           3.6056  8.0769   .6154
           0       .6154   -0.0769 ],

V = [ 1  0      0
      0  .5547  .8321
      0  .8321  -0.5547 ].

Note that V^T A V = T.
Round-off Properties of the Symmetric Lanczos Algorithm

It is clear from (8.12.2) that if the symmetric Lanczos algorithm is run from j = 1 to k (k < n), then we have

A Vj = Vj Tj + rj ej^T,

where Vj = (v1, ..., vj) and Tj is the j × j principal submatrix of T. If Ṽj, T̃j and r̃j denote the respective computed quantities, then it can be shown (see Paige 1980) that

A Ṽj = Ṽj T̃j + r̃j ej^T + Ej,

where ||Ej||2 is of the order of μ ||A||2, which shows that the Lanczos algorithm has very favorable numerical properties as far as the equation A Vj = Vj Tj + rj ej^T is concerned. However, the loss of orthogonality among the Lanczos vectors is the real concern, as explained in the following.
Loss of Orthogonality

The Lanczos algorithm clearly breaks down when some βj = 0. However, as we will see later, computationally this is a blessing in disguise: we immediately obtain an invariant subspace. This, however, seldom happens in practice. The real difficulty is that the computed β̃j can be very small due to the cancellation that takes place during the computation of rj, and a small β̃j can cause a severe loss of orthogonality among the computed vectors ṽj, as can be seen from the following result (Golub and Van Loan, MC 1984 p. 333):
Computing the Eigenvalues of A

The sole purpose of presenting the Lanczos algorithm here was to show that the Lanczos matrices Tj can be used to compute certain eigenvalues of the matrix A.

We remarked earlier that when a βj is exactly equal to zero, we have an invariant subspace. This is indeed good news; unfortunately, it happens very rarely in practice. In practice, for large enough values of j, the eigenvalues of Tj provide very good approximations to the extremal eigenvalues of A. The question arises: can we give a posteriori bounds? To this end, we introduce the following definitions.

Definition 8.12.1 Let (θi, zi) be an eigenpair of Tj. Then the pair (θi, yi), where yi is defined by yi = Vj zi, is called a Ritz pair; the θi's are known as the Ritz values and the yi's are called the Ritz vectors.

Now, returning to the question of how well a Ritz pair approximates an eigenpair of A, we state the following result (for a proof, see Golub and Van Loan, MC 1984 p. 327). It follows from that theorem that ||Ri||2 = ||Ayi - θi yi||2 is a good measure of how accurate the Ritz pair (θi, yi) is.
How do we compute ||Ri||2?

Fortunately, we can compute ||Ri||2 from Tj without computing θi and yi at every step. Let

Sj^T Tj Sj = diag(θ1, θ2, ..., θj),

and

Yj = (y1, y2, ..., yj) = Vj Sj = Vj (s1, s2, ..., sj).

Then

||Ri|| = ||A yi - yi θi|| = ||A Vj si - Vj si θi||
       = ||(A Vj - Vj Tj) si||        (because Tj si = si θi)
       = ||(βj v_{j+1} ej^T) si||     (note that A Vj - Vj Tj = βj v_{j+1} ej^T)
       = |βj| |sji|,

where sji is the (j, i)th entry of Sj.
The above discussion can be summarized in the following theorem:

Theorem 8.12.2 (Residual Theorem for Ritz Pairs) Let Tj denote the j × j symmetric tridiagonal matrix obtained after j steps of the Lanczos algorithm, and let

Sj^T Tj Sj = diag(θ1, ..., θj).

Then, for each Ritz pair (θi, yi = Vj si),

||A yi - θi yi||2 = |βj| |sji|,

where sji is the (j, i)th entry of Sj.
Example 8.12.2

A = [ 1  2  3
      2  3  4
      3  4  5 ].

The eigenvalues of A are 0, -.6235, and 9.6235.

j = 2:

T2 = [ 1       3.6056
       3.6056  8.0769 ],   β2 = .6154,

S2 = [  .9221  .3870
       -.3870  .9221 ],

θ1 = -.5133,   θ2 = 9.5903.

(Note that θ2 = 9.5903 is a good approximation of the largest eigenvalue 9.6235 of A.)

||R1|| = |β2| |s21| = .6154 × .3870 = .2382,
||R2|| = |β2| |s22| = .5674.

j = 3:

T3 = [ 1       3.6056   0
       3.6056  8.0769   .6154
       0       .6154   -0.0769 ],

S3 = [  .8277  -.4082  .3851
       -.3727   .1132  .9210
        .4196   .9058  .0584 ],

||R1|| = |β3| |s31| = 6.487 × 10^{-15},
||R2|| = |β3| |s32| = 1.4009 × 10^{-14},
||R3|| = |β3| |s33| = 9.0367 × 10^{-16}.
The Gersgorin disk theorems (Theorems 8.4.1 and 8.4.2) can be used to obtain a region
of the complex plane containing all the eigenvalues, or in some cases, a number of the
eigenvalues in a region. The estimates are, however, very crude.
Also, |λ| ≤ ||A|| for every eigenvalue λ of A (Theorem 8.4.3). This result says that an upper bound on the magnitude of any eigenvalue of A can be found by computing a norm of A. Recall that this result is important in the convergence analysis of iterative methods for linear systems.
3. The Power Method and The Inverse Iteration. There are applications such as analysis
of dynamical systems, vibration analysis of structures, buckling of a beam, principal com-
ponent analysis in statistics, etc., where only the largest and the smallest (in magnitudes)
eigenvalues or only the rst or last few eigenvalues and their corresponding eigenvectors are
needed.
The power method and the inverse power method based on implicit construction of powers
of A can be used to compute these eigenvalues and the eigenvectors. The power method is
extremely simple to implement and is suitable for large and sparse matrices, but there are
certain numerical limitations.
In practice, the power method should be used with a suitable shift. The inverse
power method is simply the power method applied to (A ; I );1; where is a suitable shift.
It is widely used to compute an eigenvector when a reasonably good approximation to an
eigenvalue is known.
4. The Rayleigh Quotient Iteration. The quotient

   Rq = (x^T A x) / (x^T x),
A for which x is the corresponding eigenvector.
This idea, when combined with the inverse iteration method, can be used to compute an
approximation to an eigenvalue and the corresponding eigenvector. The process is known as
the Rayleigh quotient iteration.
5. Sensitivity of Eigenvalues and Eigenvectors.
The Bauer-Fike Theorem (Theorem 8.7.1) tells us that if A is a diagonalizable matrix, then the condition number of the transforming matrix X, Cond(X) = ||X|| ||X^{-1}||, plays the role of the condition number of the eigenvalue problem. If this number is large, then a small change in A can cause significant changes in the eigenvalues.
Since a symmetric matrix A can be transformed to a diagonal matrix by orthogonal
similarity and the condition number of an orthogonal matrix (with respect to 2-norm)
is 1, it immediately follows from the Bauer-Fike Theorem that the eigenvalues of a
symmetric matrix are insensitive to small perturbations.
If an eigenvalue problem is ill-conditioned, then it might happen that some eigenvalues
are more sensitive than the others. It is thus important to know the sensitivity of
the individual eigenvalues. Unfortunately, to measure the sensitivity of an individual
eigenvalue, one needs the knowledge of both left and right eigenvectors corresponding to
that eigenvalue.
The condition number of the eigenvalue λi is the reciprocal of the number |yi^T xi|, where xi and yi are, respectively, the normalized right and left eigenvectors corresponding to λi.
The sensitivity of an eigenvector xk corresponding to an eigenvalue k depends upon (i)
the condition number of all the eigenvalues other than k and (ii) the distance of k
from the other eigenvalues (Theorem 8.8.1).
Thus, if the eigenvalues are well-separated and well-conditioned, then the
eigenvectors are well-conditioned. On the other hand, if there is a multiple eigen-
value or there is an eigenvalue close to another eigenvalue, then there are some ill-
conditioned eigenvectors. This is especially signicant for a symmetric matrix. The
eigenvalues of a symmetric matrix are well-conditioned, but the eigenvectors
can be quite ill-conditioned.
6. Eigenvalue Computation via the Characteristic Polynomial and the Jordan Canonical Form. A similarity transformation preserves the eigenvalues, and it is well known that a matrix A can be transformed by similarity to the Jordan Canonical Form (JCF) and to the Frobenius form (or a companion form, if A is nonderogatory). The eigenvalues of these condensed forms are rather easily computed. The JCF displays the eigenvalues explicitly, and with the companion or Frobenius form, the characteristic polynomial of A is trivially computed; a root-finding method can then be applied to the characteristic polynomial to obtain the eigenvalues, which are the zeros of the characteristic polynomial.
However, computation of eigenvalues via the characteristic polynomial or the JCF is not
recommended in practice. Obtaining these forms may require a very ill-conditioned transforming
matrix, and the sensitivity of the eigenvalue problem depends upon the condition number of
this transforming matrix.
In general, ill-conditioned similarity transformations should be avoided in eigenvalue
computation.
The use of well-conditioned transforming matrices, such as orthogonal matrices, is desirable.
7. The QR Iteration Algorithm. The most widely used algorithm for nding the eigenvalues
of a matrix is the QR iteration algorithm.
For a real matrix A, the algorithm basically constructs iteratively the Real Schur Form (RSF)
of A by orthogonal similarity. Since the algorithm is based on repeated QR factorizations
and each QR factorization of an n × n matrix requires O(n³) flops, the n steps of the QR iteration
algorithm, if implemented naively (which we call the basic QR iteration), will require O(n⁴)
flops, making the algorithm impractical.
The matrix A is, therefore, initially reduced to a Hessenberg matrix H by orthogonal
similarity before the start of the QR iteration. The key observations here are: the
reduction of A to H has to be made only once, and the Hessenberg form is
preserved at each iteration.
The convergence of the Hessenberg-QR iteration algorithm, however, can be quite slow
in the presence of a near-multiple or a multiple eigenvalue. The convergence can be
accelerated by using suitable shifts.
In practice, double shifts are used. At each iteration, the shifts are the eigenvalues of
the 2 × 2 submatrix at the bottom right-hand corner. Since the eigenvalues of a real
matrix can be complex, complex arithmetic is usually required. However, computations
can be arranged so that complex arithmetic can be avoided. Also, the eigenvalues of the
2 × 2 bottom right-hand corner matrix at each iteration do not need to be computed
explicitly. The process is known as the double shift implicit QR iteration.
With double shifts, the eigenvalues are computed two at a time. Once two eigenvalues
are computed, the matrix is deflated, and the process is applied to the deflated matrix.
The double shift implicit QR iteration seems to be the most practical algo-
rithm for computing the eigenvalues of a dense matrix of modest size.
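The following MATLAB sketch illustrates only the idea of the Hessenberg-QR iteration with a simple single shift; the practical algorithm uses implicit double shifts and deflation, as described above:

H = hess(A);                         % one-time reduction of A to upper Hessenberg form
n = size(H,1); maxit = 100;
for k = 1:maxit
    mu = H(n,n);                     % a simple single shift
    [Q, R] = qr(H - mu*eye(n));
    H = R*Q + mu*eye(n);             % the Hessenberg form is preserved
end
% the eigenvalues of A are approximated by the diagonal (or 2-by-2 blocks) of H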
8. Ordering the Eigenvalues. The eigenvalues appearing in the RSF obtained by the QR iteration
algorithm do not, in general, appear in any desired order, although there are some applications
that require a specific ordering. However, with a little extra work, the eigenvalues can be put
in the desired order. There is an excellent Fortran routine, designed by Stewart, which will
accomplish this.
9. Computing the Eigenvectors. Once an approximation to an eigenvalue is obtained for
the QR iteration, inverse iteration can be invoked to compute the corresponding eigenvector.
Since the matrix A is initially reduced to a Hessenberg matrix for practical implementation of
the QR iteration algorithm, advantage can be taken of the structure of a Hessenberg matrix
in computing an eigenvector using inverse iteration.
Alternatively, one can compute the eigenvectors directly from the RSF of A.
10. The Symmetric Eigenvalue Problem.
The QR iteration algorithm, of course, can be used to compute the eigenvalues of a sym-
metric matrix. A shift called the Wilkinson shift is normally used here. The convergence
of the symmetric QR iteration algorithm with the Wilkinson-shift has been proven to
be quadratic; however, in practice very often it is cubic.
The eigenvalues and eigenvectors of a symmetric matrix A enjoy some remarkable special
properties, and there are methods for the symmetric problem that can exploit these
properties. One such method is the bisection method.
The given symmetric matrix A is first transformed to a symmetric tridiagonal matrix
T using orthogonal similarity, and then the well-known bisection algorithm for the root-
finding problem is applied to the characteristic polynomial of T, obtained by a simple
recursion (Section 8.10.1). This recursion not only gives the characteristic polynomial of
T , but also gives the characteristic polynomials of all the leading principal submatrices.
A remarkable fact is that these polynomials have the Sturm sequence property,
which is used in the implementation of the bisection method. The bisection method is
especially useful for nding eigenvalues of a symmetric matrix in a given interval of the
real line.
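A small MATLAB sketch of the recursion for the characteristic polynomials of the leading principal submatrices of a symmetric tridiagonal T (diagonal a(1:n), off-diagonal b(1:n-1)), evaluated at a point x; the signs of p(1), ..., p(n+1) give the Sturm sequence used by bisection (the name is illustrative, not a MATCOM routine):

function p = sturmseq_sketch(a, b, x)
n = length(a);
p = zeros(n+1,1);
p(1) = 1;                            % p_0(x) = 1
p(2) = a(1) - x;                     % p_1(x) = a_1 - x
for i = 2:n
    p(i+1) = (a(i) - x)*p(i) - b(i-1)^2*p(i-1);   % three-term recursion
end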
There are other methods, such as the divide and conquer method, the Jacobi
method, etc., for the symmetric eigenvalue problem. We have not discussed these
methods in this chapter. These methods are important primarily for parallel
computations.
11. Large and Sparse Eigenvalue Problem. The eigenvalue problem for large and sparse ma-
trices is an active area of research. The state-of-the-art techniques using Lanczos or Arnoldi
type of methods with some sort of reorthogonalization and proper preconditioning can com-
pute only a few extremal eigenvalues.
The techniques for symmetric eigenvalue problems are more well-established and better-
understood than those for the nonsymmetric problem.
For the sake of completeness and to give the readers an idea of how the Lanczos-type methods
are used in eigenvalue computations, we have included only a very brief description of the
symmetric Lanczos method (Section 8.12).
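For orientation only, here is a minimal MATLAB sketch of the symmetric Lanczos process without reorthogonalization; after j steps the eigenvalues of the tridiagonal matrix T_j built from alpha and beta approximate a few extremal eigenvalues of A (a sketch, not the algorithm of Section 8.12 verbatim):

function [alpha, beta] = lanczos_sketch(A, v1, j)
n = size(A,1);
alpha = zeros(j,1); beta = zeros(j,1);
v = v1/norm(v1); vold = zeros(n,1); b = 0;
for k = 1:j
    w = A*v - b*vold;
    alpha(k) = v'*w;
    w = w - alpha(k)*v;
    beta(k) = norm(w);
    if beta(k) == 0, break, end
    vold = v; v = w/beta(k); b = beta(k);
end
% T_j = diag(alpha) + diag(beta(1:j-1),1) + diag(beta(1:j-1),-1)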
8.14 Suggestions for Further Reading
Most books on vibration discuss eigenvalue problems arising in vibration of structures. However,
almost all eigenvalue problems here are generalized eigenvalue problems; as a matter of fact, they
are symmetric denite problems.
For references of the well-known books in vibration, see section 9.10 in the next chapter. The
books by Inman and by Thompson are, in particular, very useful and important books in this
area.
For learning more about how the eigenvalue problem arises in other areas of engineering, see
the books on numerical methods in engineering by Chapra and Canale, and by Peter O'Neil,
referenced in Chapter 6. There are other engineering books (too numerous to list here), especially
in the areas of electrical, mechanical, civil and chemical engineering, containing discussions on
eigenvalue problems in engineering. The Real Schur Form (RSF) of a matrix is an important tool
in numerically eective solutions of many important control problems, such as solutions of the Lya-
punov matrix equation (Bartels and Stewart (1972)), the Sylvester Matrix Equation (Golub,
Nash, and Van Loan (1979)), Algebraic Riccati equations (Laub (1979), Van Dooren (1982)),
Byers (1984), etc. For details, see the book Computational Methods for Linear Control
Systems, by Petkov, Christov and Konstantinov, Prentice-Hall, 1991, the IEEE Reprint, Vol-
ume by Patel et al. (1994), and the book Numerical Methods in Control Theory, by B. N.
Datta (in preparation).
Some references of how eigenvalue problems (especially large and sparse eigenvalue problems)
arise in other areas of sciences and engineering such as power systems, physics, chemistry, struc-
tural mechanics, oceanography, etc., are given in the book Lanczos Algorithms for Large
Symmetric Eigenvalue Computations, vol. 1, by Jane Cullum and Ralph Willoughby,
Birkhauser, Boston, 1985.
For some generalizations of the Gersgorin disk theorems, see the recent paper by Brualdi
(1993). This paper contains results giving a region of the complex plane for each eigenvalue; for a
full description of the Gersgorin disk theorems and applications, see Matrix Analysis by Roger
Horn and Charles Johnson, Cambridge University Press, Cambridge, 1985 (Chapter 6).
A nice description of stability theory in dynamic systems is given in Introduction to Dy-
namic Systems: Theory, Models, and Applications by David Luenberger, John Wiley &
Sons, New York, 1979.
For results on eigenvalue bounds, see the paper by Varah (1968).
For computation of Jordan Canonical Form, see the papers by Golub and Wilkinson (1976),
Kagstrom and Ruhe (1980a and 1980b), and Demmel (1983).
Descriptions of the usual techniques for eigenvalue and eigenvector computations (the power
method, the inverse power method, the Rayleigh-Quotient iteration method, the QR iteration
method, etc.) can be found in all numerical linear algebra books: Golub and Van Loan (MC, 1983
and 1989), Stewart (IMC), Hager (ANLA), Watkins (FMC), Wilkinson (AEP). The Wilkinson
AEP is, of course, the most authoritative book in this area.
The papers by Varah (1968), (1970), etc. and Peters and Wilkinson (1979) are important
in the context of inverse iteration.
A Fortran program for ordering the eigenvalues of a real upper Hessenberg matrix appears in
Stewart (1976).
An important book in the area of symmetric eigenvalue problem is the book The Symmetric
Eigenvalue Problem by Beresford Parlett, Prentice-Hall, Englewood Clis, NJ, 1980.
For a proof of the global convergence of the symmetric QR iteration with Wilkinson-shifts, see
the book SLP by Lawson and Hanson, pp. 240{247.
For a description of the divide and conquer method, see Dongarra and Sorensen (1987)
and the original paper of Cuppen (1981). Watkins (FMC) contains a nice description of the Jacobi
method (Chapter 6).
Again, the book by Golub and Van Loan (MC 1984 and 1989) is a rich source of references.
A basic discussion on Lanczos methods with and without reorthogonalizations, and their appli-
cations to solutions of positive denite linear systems and eigenvalue computations, is contained in
the book by Golub and Van Loan (MC 1984, Chapter 9). Two authoritative books in this area
are: Lanczos Algorithms for Large Symmetric Eigenvalue Computations, volumes I
and II, by J. K. Cullum and R. A. Willoughby, Birkhauser, 1985, and The Symmetric Eigen-
value Problem by B. N. Parlett, Prentice-Hall. Another recent book in this area is Eigenvalue
Problem for Large Matrices, by Y. Saad (1993). Since the pioneering work by Paige in the
1970's, many papers in this area have been published, and this is an active area of research. See
the list of references given in Golub and Van Loan (MC 1989, Chapter 9).
The doctoral thesis of Chris Paige (1971) and several of his follow-up papers (Paige (1976),
Paige (1980), etc.) are also well worth reading. These are considered to be the \seed papers" for
further recent active research in this area.
Exercises on Chapter 8
(Use MATLAB whenever needed and appropriate)
PROBLEMS ON SECTION 8.3
1. Consider the following model for the vertical vibration of a motor car:
[Figure: a quarter-car model -- the body mass m1 (displacement y2) rests on the suspension spring k1 with shock absorber d1, which sits on the wheel mass m2 (displacement y1); the tire is modeled by a spring k2 and damping d2 connecting m2 to the road.]
(a) Formulate the equation of motion of the car, neglecting the damping constants d1 of the
shock absorber and d2 of the tire:
M ÿ + K y = 0,
where M = diag(m1, m2),
K = [ k1  -k1 ;  -k1  k1 + k2 ],
and y = ( y2, y1 )^T is the vector of displacements of m1 and m2.
Take k1 = k2 = 300 N/cm.
(b) Formulate the equation of motion when just the damping d2 of the tire is neglected:
M ÿ + D ẏ + K y = 0,
where M and K are the same as in part (a),
D = [ d1  -d1 ;  -d1  d1 ],
and
x(t) = ( y(t), ẏ(t) )^T.
2. Write the solution of the equation M y + Ky = 0 with M and K as given in #1(a), using
initial conditions y (0) = 0 and y_ (0) = (1; 1; : : :; 1)T .
3. Develop an eigenvalue problem for an LC network similar to the case study given in section
8.2.2, but with only three loops.
Show that in this case the natural frequencies are given by
ω = 0.4451/√(LC),  1.2470/√(LC),  1.8019/√(LC).
Find the modes; and illustrate how the currents oscillate in these modes.
PROBLEMS ON SECTION 8.4
4. Apply the Gersgorin disk theorems to obtain bounds of the eigenvalues for the following
matrices:
0 10 1 1 1
(a) A = B
B 2 10 1 CC ;
@ A
2 2 10
01 0 01
(b) B
B2 5 0C C
@ A;
0 12 1 ;61 0 0 1
BB C
B ;1 2 ;1 0 C C
(c) BB@ 0 ;1 2 ;1 CCA ;
0 01 ;01 ;01 20 1
BB ;1 2 ;1 0 CC
(d) B BB CC ;
CA
@ 0 ; 1 2 ; 1
0 10:000 0 :577;1 :509
2
:387 :462 1
BB :577 1:000 :599 :389 :322 CC
BB CC
(e) B :509 :599 1:000 :436 :426 C
B CC ;
BB
@ :387 :389 :436 1:000 :523 CA
0 1:452 ;i 0
:322 :426 :523 1:000
1
1
B 1
(f) B 1 1 C
C:
@ A
0 1;i 1+i
5. Using a Gersgorin disk theorem, prove that a diagonally dominant matrix is nonsingular.
6. Let x be an eigenvector corresponding to an isolated eigenvalue in the Gersgorin disk Rk .
Prove that jxk j > jxij for i 6= k.
7. Let A = (aij be an n n symmetric matrix. Then using a Gersgorin disk theorem prove that
each eigenvalue of A will lie in one of the intervals: [aij ; ri ; aij + ri].
Find an interval where all the eigenvalues of A must lie.
PROBLEMS ON SECTION 8.5
8. Applying the power method and inverse power method nd the dominant eigenvalue and the
corresponding eigenvector for each of the matrices in the exercise #4.
9. Prove that if 1; : : :; n are the eigenvalues of A and v1; : : :; vn are the corresponding eigenvec-
tors, then 1 ; ; : : :; n ; are the eigenvalues of A ; I , and the corresponding eigenvectors
are v1; : : :; vn.
10. Explain the slow rate of convergence of the power method with the following matrices:
03 2 31
B C
(a) A = B
@ 0 2:9 1 C A;
0 01 00 10 1
B C
(b) A = B
@ 1 10 0 CA ;
1 1 9:8
Choose a suitable shift and then apply the shifted power method to each of the matrices
and observe the improvement of the rates of convergence.
11. (Orthogonal Iteration) The following iterative procedure generalizes the power method
and is known as the orthogonal iteration process. The process can be used to compute
p(p > 1) largest eigenvalues and the corresponding eigenvectors.
Let Q1 be an n p orthonormal matrix.
Then
For k = 2; 3; : : : do
1) Compute Bk = AQk;1
2) Factorize into QR: Bk = Qk Rk :
Apply the above method to compute the rst two dominant eigenvalues and eigenvectors for
each of the matrices in Exercise #4.
12. (Inverse Orthogonal Iteration) The following iteration, called the Inverse Orthogonal
iteration, generalizes the inverse power method; and, can be used to compute the p smallest
eigenvalues.
Let Q1 be an n p orthonormal matrix.
For k = 2; 3; : : :
1) Solve for Bk : ABk = Qk;1.
2) Factorize into QR: Bk = Qk Rk :
Apply the Inverse Orthogonal Iteration to compute the 2 smallest (least dominant) eigenvalues
of each of the matrices in Exercise #4.
13. Let T be a symmetric tridiagonal matrix. Let the Rayleigh-Quotient Iteration be applied to
T with x0 = en , then prove that x1 = qn, where qn is the last column of Q in (T ; 0I ) = QR:
14. Compute the subdominant eigenvalue of each of the matrices in Exercise #4 using Householder
deflation. Then compute the corresponding eigenvector without invoking the inverse
iteration.
15. Compute the smallest eigenvalue of each of the matrices A in Exercise #4 by applying the
power method to A;1 , without explicitly computing A;1.
16. Deflation Using Invariant Subspace. Suppose that we have an n × m matrix X with
independent columns and an m × m matrix M such that
AX = XM.
Consider the QR factorization of X:
Q^T X = [ R ; 0 ].
Then show that
(a) Q A Q^T = [ A1  A2 ;  0  A3 ],
or (simply)
H - μI = QR,   Ĥ = RQ + μI,   μ is real.
(a) Prove that the first column of Q is a multiple of the first column of H - μI, and therefore
contains only two nonzero entries.
(b) Denote the first column of H - μI by h1 = ( h11 - μ, h21, 0, ..., 0 )^T. Find a Givens
rotation P0 such that P0 h1 is a multiple of e1. Show that the first column of P0 is the
same as the first column of Q, except possibly for the sign.
(c) Form H′ = P0^T H P0. Find Givens rotations J32, J43, ..., J_{n,n-1} such that
H′_1 = ( J32 J43 ⋯ J_{n,n-1} )^T H′ ( J32 ⋯ J_{n,n-1} )
has the same first column as P0 and hence the same first column as Q. Conclude finally
from the Implicit Q Theorem that the Hessenberg matrix H′_1 is essentially the same as
Ĥ.
The steps (a) to (c) constitute one step of the implicit single-shift QR iteration.
Apply one step of the implicit single-shift QR iteration to the symmetric tridiagonal matrix
[  2  -1            0
  -1   2   ⋱
       ⋱   ⋱   -1
   0       -1    2 ]   (symmetric tridiagonal, 2 on the diagonal and -1 on the off-diagonals).
30. Construct one step of the explicit double-shift QR iteration (real arithmetic) and one step of
the implicit double-shift QR iteration for the matrix
A = [ 1 2 3 4 ;  3 4 5 6 ;  0 1 0 1 ;  0 0 -2 2 ]
and show that the nal Hessenberg matrices are (essentially) the same.
31. LR Iteration. In analogy with the QR iteration algorithm, develop an LR iteration algorithm,
based on the LU decomposition of A, making the necessary assumptions.
1) Set A1 = A.
2) Compute A_k = L_k R_k;  A_{k+1} = R_k L_k,  k = 1, 2, ...
Why is this algorithm not to be preferred over the QR iteration algorithm?
32. Considering the structures of the matrices Pi ; i = 0; 1; : : :; n ; 2 in the implicit double-shift
QR iteration step (Exercise #29), show that it requires only 4n2
ops to implement this step.
33. Show that the matrices Hs and Hs+2 in the double-shift QR iteration have the same eigen-
values.
34. Prove the following:
Let H = H0. Generate the sequence {H_k}:
H_k - μ_k I = Q_k R_k,   H_{k+1} = R_k Q_k + μ_k I.
Then
∏_{i=1}^{n} ( H - μ_i I ) = ( Q1 ⋯ Qn )( Rn ⋯ R1 ).
(b) Test your algorithm with the matrix A of problem #35.
38. Let
A = [ a  b ;  b  c ].
Prove that the eigenvalue of A closest to c is given by
λ = c - ( sign(δ) b² ) / ( |δ| + √(δ² + b²) ),
where δ = (a - c)/2.
39. Let A = A1 + iA2 be a Hermitian matrix. Then prove that
B = [ A1  -A2 ;  A2  A1 ]
is symmetric.
How are the eigenvalues and eigenvectors of A related to those of B?
40. Take the 50 50 symmetric tridiagonal matrix T with 2 along the diagonal and -1 along the
sub diagonal and super diagonal. Apply Algorithm 8.12.3 to T with j = 1; 2; : : :; 20. Find
pproximations to a few extreme eigenvalues of T using Theorems 8.12.1 and 8.12.2.
MATLAB PROGRAMS AND PROBLEMS ON CHAPTER 8
1. Write a MATLAB program to compute the dominant eigenvalue of a matrix using the power
method.
[lambda1] = power(A,x0,epsilon,n)
(a) Modify the program power to incorporate a shift sigma
[lambda1] = powershift(A,x0,sigma,epsilon,n)
(b) Apply power and powershift to the following matrices and compare the speed of
convergence
Test Data:
A = [ 3  2  3 ;  0  0.99  1 ;  0  0  2.9 ],
A = A randomly generated matrix of order 5.
A = The Wilkinson bidiagonal matrix of order 20.
2. (a) Using linsyspp from Chapter 6 or the MATLAB command `\', write a MATLAB pro-
gram called invitr to implement the inverse iteration algorithm
x = invitr(A,x0,sigma,epsilon,n)
(b) Using linsyspp from Chapter 6 or the MATLAB command `\', write a MATLAB pro-
gram called powersmall to compute the smallest eigenvalue (in magnitude) of a matrix
A
lambdan = powersmall(A,x0,epsilon,n)
Test Data and Experiment:
(a) Take the 20 20 symmetric tridiagonal matrix appearing in the buckling problem of
section 8.3.2. Apply power to compute the dominant eigenvalue lambda1 by choosing
x0 arbitrarily.
(b) Now compute the smallest eigenvalue in magnitude, lambdan, by using
(i) powershift with sigma = lambda1
(ii) powersmall with the same x0 as used to implement power.
(c) Compare the speed of (i) and (ii) in (b).
(d) Find the smallest critical load that will buckle the beam
(e) Taking sigma = lambdan, nd the eigenvector corresponding to the smallest eigenvalue
n using invitr.
3. Using power, invitr, housmat (from Chapter 5) and housmul (from Chapter 4), write
a MATLAB program to implement the Householder deflation algorithm that computes the
subdominant eigenvalue of a matrix.
Test data:
A single-sex, cohort population model can be represented by the following system of difference
equations:
p_{i+1}(k + 1) = β_i p_i(k),   i = 0, 1, ..., n - 1,
or in matrix form
p_{k+1} = A p_k.
Here α_i is the birth rate of the ith age group and β_i is the rate of survival of that group (see
Luenberger (1979), p. 170).
Taking α_0 = 0, α_1 = α_2 = ... = α_n = 1, and β_i = 1, i = 0, 1, ..., n - 1, determine by using
power, invitr, and housdeflt whether there is long-term growth in the population; if so, what
is the final population distribution, and how fast does the original distribution approach the
final distribution?
Consult the example on population study in Section 8.4.2.
4. (The purpose of this exercise is to study how the eigenvalues of a matrix A
are affected by the conditioning of the transforming matrix.)
(a) For each of the following matrices construct a matrix X of appropriate order which is
upper triangular with all the entries equal to 1 except for a few very small diagonal
entries. Then compute the eigenvalues of A and those of X^{-1} A X using the MATLAB
commands eig and inv:
A = [ 1 1 0 0 ;  0 1 0 0 ;  0 0 0.9 1 ;  0 0 0 1 ];    A = [ 0 0 2 ;  1 0 -5 ;  0 1 4 ];
A = The Wilkinson bidiagonal matrix of order 20.
(b) Repeat part (a) by taking X as a Householder matrix of appropriate order.
(c) Compare the results of (a) and (b).
5. (a) Compute the eigenvalues of the following matrices using:
1) MATLAB commands poly and roots.
2) MATLAB command eig.
A = [ 0 0 2 ;  1 0 -5 ;  0 1 4 ];    A = [ 1 1 1 ;  0 0.19 1 ;  0 0 0.9999 ];
A = The Wilkinson bidiagonal matrix of order 20.
(b) Compare your results of (1) and (2) for each matrix.
6. (The purpose of this exercise is to study the sensitivities of the eigenvalues of
some well-known matrices with ill-conditioned eigenvalues.)
Perform the following on each of the matrices in the test data:
(a) Using the MATLAB command [V, D] = eig(A), find the eigenvalues and the matrix of
right eigenvectors. Then find the matrix of left eigenvectors W as follows: W = (inv(V))'
/norm(inv(V)')
(b) Compute s_i = w_i^T v_i, i = 1, ..., n, where w_i and v_i are the ith columns of W and V.
(c) Compute c_i = the condition number of the ith eigenvalue = 1/s_i, i = 1, 2, ..., n.
(d) Perturb the (n,1)th entry of A by ε = 10^{-10}. Then compute the eigenvalues λ̂_i, i = 1, ..., n
of the perturbed matrix using the MATLAB command eig.
(e) Make the following table for each matrix.
λ_i    λ̂_i    |λ_i - λ̂_i|    Cond(V)    c_i
01 0 0 0
1
BB 0 :999 0 0 C
C
A=BBB CC
@0 0 0 2 C A
0 0 0 0:0005
1) A = diag(1; :9999; 1; :9999; 1)
2) A randomly generated matrix of order 5.
8. Write a MATLAB program called qritrb to implement the basic QR iteration using givqr
from Chapter 5 :
(a) [A] = qritrb(A; num); num is the maximum number of iterations.
(b) Modify the program now to implement the Hessenberg QR iteration [A] = qritrh(A; num)
num is the number of iterations
(c) Compare the speed of the programs by actually computing the
op-count and elapsed
time .
9. (a) Write a MATLAB program to compute one step of the explicit double shift QR iteration:
[A] = qritrdse(A)
(b) Write a MATLAB program to compute one step of the implicit QR iteration with double
shift: [A] = qritrdsi(A).
(c) Compare your results of (a) and (b) and conclude that they are essentially the same.
10. Write a MATLAB program to de
ate the last k rows and k columns of a Hessenberg matrix.
Test your program with a randomly generated matrix with dierent values of k. Note that
for k = 1, hprime will be of order n ; 1, for k = 2 hprime will be of order n ; 2, and so on.
11. Using Qritrdsi and de
at, write a MATLAB program to determine the Real Schur form of
a Hessenberg matrix A;
[h] = rsf(h; eps);
where eps is the tolerance.
Test: Generate randomly a 20 20 Hessenberg matrix and make the following table using
rsf.
Iteration h21 h32 h43 h54 h20;1
TABLE
12. (a) Write a MATLAB program, called polysymtri to compute the characteristic polynomial
pn() of an unreduced symmetric tridiagonal matrix T , based on the recursion in Section
8.11.1
[valpoly] = polysymtri(A; lambda):
(b) Using polysymtri, write a MATLAB program called signagree that nds the number
of eigenvalues of T greater than a given real number , based on Theorem 8.11.1:
[number] = signagree (T; meu).
(c) Using polysmtri and signagree, implement the Bisection algorithm of Section
8.11.1:
[lambda] = bisection (T; m; n).
Compute n;m+1 for m = 1; 2; 3; ::: , using bisection, and then compare your results
with those obtained by using eig(T).
Test Data:
A = The dymmetric tridiagonal matrix arising in Buckling Problem in Section 8.3.2, with
n = 20.
13. Write a MATLAB program, called lanczossym, to implement the reformulated symmetric
Lanczos algorithm of Section 8.11.3.
Using lanczossym nd the rst ten eigenvalues of a randomly generated symmetric matrix
of order 50. (To generate a symmetric matrix B, generate A rst, then take B = A + AT ).
(Note : The programs power, invitr, qritrb, qritrh, qritrdsi, etc. are in MATCOM. But it is
a good idea to write your own codes).
9. THE GENERALIZED EIGENVALUE PROBLEM
9.1 Introduction 597
9.2 Generalized Schur Decomposition 599
9.3 The QZ algorithm 601
9.3.1 Reduction to Hessenberg-Triangular Form 602
9.3.2 Reduction to the Generalized Schur Form 604
9.4 Computations of the Generalized Eigenvectors 611
9.5 The Symmetric-Definite Generalized Eigenvalue Problem 612
9.5.1 The QZ Method for the Symmetric-Definite Pencil 614
9.5.2 The Cholesky-QR Algorithm 614
9.5.3 Diagonalization of the Symmetric-Definite Pencil 616
9.6 Symmetric-Definite Generalized Eigenvalue Problems Arising in Vibrations of Structures 619
9.6.1 A Case Study on the Vibration of a Spring-Mass System 619
9.6.2 A Case Study on the Vibration of a Building 622
9.6.3 Forced Harmonic Vibration 625
9.7 Applications to Decoupling and Model Reduction 629
9.7.1 Decoupling of a Second-Order System 630
9.7.2 The Reduction of a Large Model 637
9.7.3 A Case Study on the Potential Damage of a Building Due to an Earthquake 639
9.8 The Quadratic Eigenvalue Problem 643
9.9 The Generalized Symmetric Eigenvalue Problems for Large and Structured Matrices 646
9.9.1 The Sturm Sequence Method for Tridiagonal A and B 646
9.9.2 The Lanczos Algorithm for the Pencil A - λB 647
9.9.3 Estimating the Generalized Eigenvalues 649
9.10 The Lanczos Method for Generalized Eigenvalues in an Interval 651
9.11 Review and Summary 652
9.12 Suggestions for Further Reading 655
CHAPTER 9
THE GENERALIZED EIGENVALUE PROBLEM
Objectives
The objective of this chapter is to study engineering applications and numerical methods for
the generalized eigenvalue problem Ax = λBx. Some of the highlights of this chapter are:
The QZ algorithm for generalized Schur form (Section 9.3).
The Cholesky-QR algorithm and simultaneous diagonalization techniques for a symmetric
denite pencil (Section 9.5).
Engineering vibration problems giving rise to generalized symmetric denite eigenvalue
problems (Section 9.6).
Applications of simultaneous diagonalization techniques to decoupling and model
reduction (Section 9.7).
The Lanczos algorithm for large and sparse symmetric denite problems (Section
9.9).
The Lanczos algorithm for generalized eigenvalues in an interval (Section 9.10).
9.1 Introduction
In this chapter we consider the following eigenvalue problem, known as the generalized eigenvalue
problem.
Statement of the Generalized Eigenvalue Problem
Given n × n matrices A and B, find scalars λ and nonzero vectors x such
that
Ax = λBx.
Note that the standard eigenvalue problem for the matrix A is a special case of this problem
(take B = I).
Definition 9.1.1 In the problem Ax = λBx, λ is called a generalized eigenvalue and the
vector x is a generalized eigenvector associated with λ.
It is easy to see that λ is a root of the characteristic equation
det(A - λB) = 0.
Definition 9.1.2 The matrix A - λB is called a matrix pencil. If B is nonsingular, then the
pencil is called a regular pencil.
Theorem 9.2.1 Let A and B be n × n matrices and B be nonsingular. Let U1 and
U2 be unitary matrices such that
U1 A U2 = T1   and   U1 B U2 = T2,
where T1 and T2 are upper triangular with diagonal entries t_11, ..., t_nn and
t'_11, ..., t'_nn, respectively. Then the generalized eigenvalues λ_i, i = 1, ..., n of the
regular pencil A - λB are given by
λ_i = t_ii / t'_ii.
Remarks:
1. Note that U2 x = y is an eigenvector associated with λ.
2. If A and B are real matrices, then U1 can be chosen such that U1 A U2 = T1 is in Real Schur
Form.
9.3 The QZ algorithm
We will describe an analog of the QR iteration, known as the QZ iteration algorithm, for computing
the generalized eigenvalues, developed by Cleve Moler and G. W. Stewart. Assume that B is
nonsingular. Then the basic idea is to apply the QR iteration algorithm to the matrix C = B^{-1}A
(or to AB^{-1}), without explicitly forming the matrix C, because if B is nearly singular, then it is
not desirable to form B^{-1}. In this case the elements of C will be much larger than those of A and
B, and the eigenvalues of C will be computed inaccurately. Note that the eigenvalues of B^{-1}A are
the same as those of AB^{-1}, because AB^{-1} = B(B^{-1}A)B^{-1}. Thus, they are similar.
If AB^{-1} or B^{-1}A is not to be computed explicitly, then the next best alternative, of course, is
to transform A and B to some reduced forms and then extract the generalized eigenvalues λ_i from
these reduced forms. The simultaneous reduction of A and B to triangular forms by equivalence is
guaranteed by the following theorem:
Stage I. A and B are reduced to an upper Hessenberg and an upper triangular matrix, respectively,
by simultaneous orthogonal equivalence:
A ← Q^T A Z : an upper Hessenberg matrix
B ← Q^T B Z : an upper triangular matrix
Dr. Cleve Moler is the developer of the popular software package MATLAB. He was a professor and head of the
Computer Science Department at the University of New Mexico. He is currently with MathWorks, Inc. He was also
one of the developers of another popular mathematical software \LINPACK".
Stage II. The Hessenberg-Triangular pair (A, B) is reduced further to the generalized Real
Schur Form by applying implicit QR iteration to AB^{-1}.
This process is known as the QZ Algorithm.
We will now briefly sketch these two stages in the sequel.
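In MATLAB, the end results of these two stages are available through built-in commands; the following two lines are only a convenience sketch, not the algorithm itself:

lambda = eig(A, B);                  % generalized eigenvalues of the pencil A - lambda*B
[AA, BB, Q, Z] = qz(A, B, 'real');   % Q*A*Z = AA (quasi-triangular), Q*B*Z = BB (upper triangular)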
Example 9.3.1
A = [ 1 2 3 ;  1 3 4 ;  1 3 3 ],    B = [ 1 1 1 ;  0 1 2 ;  0 0 2 ].
1. Form Q23 to make a31 zero:
Q23 = [ 1  0  0 ;  0  0.7071  0.7071 ;  0  -0.7071  0.7071 ]
A ← A(1) = Q23 A = [ 1  2  3 ;  1.4142  4.2426  4.9497 ;  0  0  -0.7071 ].
2. Update B:
B ← B(1) = Q23 B = [ 1  1  1 ;  0  0.7071  2.8284 ;  0  -0.7071  0 ].
3. Form Z23 to make b32 zero:
Z23 = [ 1 0 0 ;  0 0 -1 ;  0 1 0 ]
B ← B(1) Z23 = Q23 B Z23 = [ 1  1  -1 ;  0  2.8284  -0.7071 ;  0  0  0.7071 ].
4. Update A:
A ← A(1) Z23 = Q23 A Z23 = [ 1  3  -2 ;  1.4142  4.9497  -4.2426 ;  0  -0.7071  0 ].
A is in upper Hessenberg and B is in upper triangular form.
where a_i, i = 1, 2 are the first two columns of A. Note that c1 has at most two nonzero entries and
c2 has at most three. Let
c1 = ( c11, c21, 0, ..., 0 )^T   and   c2 = ( c12, c22, c32, 0, ..., 0 )^T.
Then it is easy to see that
( x, y, z )^T = ( (c11 - μ1)(c11 - μ2) + c12 c21,   c21(c11 - μ2) + c21(c22 - μ1),   c21 c32 )^T.
Example 9.3.2
Let
A = [ 1 1 1 1 ;  2 1 4 1 ;  0 1 1 1 ;  0 0 1 1 ],    B = [ 1 2 3 4 ;  0 1 1 1 ;  0 0 1 2 ;  0 0 0 3 ].
The 2 × 2 leading principal submatrix of B = [ 1 2 ;  0 1 ].
c1 = ( 1, 2, 0, 0 )^T,    c2 = ( -1, -3, 1, 0 )^T.
Choose μ1 = 1, μ2 = 1;
then x = -2, y = -8, z = 2.
Computation of A1 and B1
Since the first column n1 of N = (C - μ1 I)(C - μ2 I) has at most three nonzero entries, the
Householder matrix Q1 that transforms n1 to a multiple of e1 has the form
Q1 = [ Q̂1  0 ;  0  I_{n-3} ],
where Q̂1 is a 3 × 3 Householder matrix. Therefore
A ← Q1 A   (upper Hessenberg form except for an unwanted nonzero in the (3,1) position),
B ← Q1 B   (upper triangular form except for unwanted nonzeros in the (2,1), (3,1) and (3,2) positions).
That is, both the Hessenberg form of A and the triangular form of B are now lost, in that there
is now an unwanted nonzero in the (3,1) position of Q1 A and unwanted nonzeros in the (2,1),
(3,1) and (3,2) positions of Q1 B. The job now will be to cleverly chase the unwanted nonzero entries
and make them zero using orthogonal transformations.
To do this, Householder matrices Z1 and Z2 are first constructed to make B triangular, that is,
to make the nonzero entries b31, b32 and b21 zero. We then have
B ← B Z1 Z2   (upper triangular once again),
A ← A Z1 Z2   (upper Hessenberg except for unwanted nonzeros in the (3,1), (4,1) and (4,2) positions).
(Note that we now have unwanted nonzeros in the (3,1), (4,1) and (4,2) positions of A.)
Next, a Householder matrix Q2 is created to introduce zeros in the (3,1) and (4,1) positions of
A; B is then updated. We now have
A ← Q2 A   and   B ← Q2 B,
where the unwanted nonzero entries have been chased one row and one column further down, into
trailing submatrices A′ and B′.
The process is now repeated on the submatrices A′ and B′, which have the same structures as A
and B to start with. The continuation of this process will finally yield A1 and B1 in the desired
forms.
In view of the above discussion, let's now summarize one step of the QZ iteration.
Algorithm 9.3.1 One step of the QZ Algorithm
Given A, an unreduced upper Hessenberg matrix, and B, an upper triangular matrix, the
following steps constitute one step of the QZ iteration; that is, these steps construct orthogonal
matrices Q and Z such that QAZ is upper Hessenberg and QBZ is upper triangular.
1. Choose the shifts μ1 and μ2.
2. Compute the first column of N = (C - μ1 I)(C - μ2 I), where C = AB^{-1}, without explicitly
forming B^{-1}:
Let (c1, c2) be the first two columns of C. Then (c1, c2) = (a1, a2) [ b11 b12 ; 0 b22 ]^{-1}.
The three nonzero entries of the first column of N are given by:
x = (c11 - μ1)(c11 - μ2) + c12 c21
y = c21(c11 - μ2) + c21(c22 - μ1)
z = c21 c32.
The first column of N is n1 = ( x, y, z, 0, ..., 0 )^T.
3. Find a Householder matrix Q1 such that Q1 n1 is a multiple of e1, that is, Q1 n1 = ( ×, 0, ..., 0 )^T.
4. Form Q1 A and Q1 B .
5. Transform Q1A and Q1 B , respectively, to an upper Hessenberg matrix A1 and a triangular
matrix B1 by orthogonal equivalence in the way shown above, by taking advantage of the
special structures of the matrices, creating orthogonal matrices Q2 through Qn;2 and Z1
through Zn;2 .
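A small MATLAB sketch of Steps 2 and 3 (a sketch only; it assumes A, B, mu1, mu2 are given and simply transcribes the formulas above):

c = A(:,1:2) / B(1:2,1:2);           % (c1, c2) = (a1, a2)*inv([b11 b12; 0 b22]); inv(B) is never formed
c11 = c(1,1); c21 = c(2,1); c12 = c(1,2); c22 = c(2,2); c32 = c(3,2);
x = (c11 - mu1)*(c11 - mu2) + c12*c21;
y = c21*(c11 - mu2) + c21*(c22 - mu1);
z = c21*c32;
n1 = [x; y; z; zeros(size(A,1)-3,1)];     % first column of N
u = n1; u(1) = u(1) + sign(u(1))*norm(u); % assumes x ~= 0
Q1 = eye(size(A,1)) - 2*(u*u')/(u'*u);    % Householder matrix with Q1*n1 a multiple of e1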
Example 9.4.1
A = 10^9 [ 3  -1.5  0 ;  -1.5  3  -1.5 ;  0  -1.5  1.5 ],
B = 10^3 [ 2 0 0 ;  0 3 0 ;  0 0 4 ].
v̂1 = ( · , -0.0102, 0.0024 )^T,
v̂1 ← v̂1/||v̂1|| = ( 0.8507, -0.5114, 0.1217 )^T.
9.5 The Symmetric-Denite Generalized Eigenvalue Problem
In this section, we study the symmetric definite eigenvalue problem
Ax = λBx,
where A and B are symmetric and B is positive definite.
As said before, the symmetric denite generalized eigenvalue problem arises in a wide variety
of engineering applications.
It is routinely solved in vibration analysis of structures.
Frequencies, Modes, and Mode Shapes
The eigenvalues are related to the natural frequencies, and \the size and sign of
each element of an eigenvector determines the shape of the vibration at any instant
of time". The eigenvectors are therefore referred to as mode shapes or simply as
modes.
\The language of modes, mode shapes, and natural frequencies form the basis for
discussing vibration phenomena of complex systems. An entire industry has been
formed around the concept of modes" (Inman (1994)).
We start with an important (but not surprising) property of the symmetric denite pencil.
Theorem 9.5.1 The symmetric-definite pencil A - λB has real eigenvalues and
linearly independent eigenvectors.
An Interval Containing the Eigenvalues of a Symmetric Definite Pencil
The eigenvalues of the symmetric definite pencil A - λB lie in the interval
[ -||B^{-1}A||, ||B^{-1}A|| ]. (Exercise #26.)
Computational Algorithms
9.5.1 The QZ Method for the Symmetric-Denite Pencil
The QZ algorithm described in the previous section for the regular pencil A ; B can, of course,
be applied to a symmetric-denite pencil. However, the drawback here is that both the symmetry
and deniteness of the problem will be lost in general. We describe now a specialized algorithm for
a symmetric-denite pencil.
Algorithm 9.5.1 The Cholesky-QR Algorithm for the Symmetric-Definite Pencil
1. Find the Cholesky factorization of B:
B = LL^T.
2. Form
L = UD^{-1/2} = U diag(√d1, ..., √dn)^{-1}.
3. Form
C = L^{-1} A^T (L^T)^{-1}.
4. Compute the eigenvalues of C.
Example 9.5.1
A = [ 1 2 3 ;  2 3 4 ;  3 4 5 ],    B = [ 1 0 0 ;  0 0.00001 0 ;  0 0 1 ].
Cond(B) = 10^5.
1. U^T B U = D = [ 0.00001 0 0 ;  0 1 0 ;  0 0 1 ],    U = [ 0 1 0 ;  1 0 0 ;  0 0 1 ].
2. L = UD^{-1/2} = [ 0 1 0 ;  316.2278 0 0 ;  0 0 1 ].
3. C = L^{-1} A^T (L^T)^{-1} = [ 0.0003 0.0063 0.0126 ;  0.0063 1 3 ;  0.0126 3 5 ].
4. The eigenvalues of C are: 0, -0.6055, 6.6055.
5. The two smallest generalized eigenvalues are: 0, -0.6665.
Remark: The two smallest generalized eigenvalues have been computed by the procedure rather
accurately. However, the largest one is completely wrong.
This example shows how ill-conditioning of B can impair the accuracy of the computed eigenvalues.
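A minimal MATLAB sketch of the Cholesky-QR idea for a symmetric-definite pencil (a sketch only; it uses the Cholesky factor directly and assumes B is positive definite and not too ill-conditioned, unlike the B of this example):

L = chol(B, 'lower');                % B = L*L'
C = L \ A / L';                      % C = inv(L)*A*inv(L'), symmetric
[V, D] = eig((C + C')/2);            % eigenvalues of C = generalized eigenvalues
lambda = diag(D);
X = L' \ V;                          % generalized eigenvectors: A*X = B*X*diag(lambda)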
Set P = (L;1 )T Q. Then
P T AP = QT L;1A(L;1)T Q = QT CQ = diag(c1; c2; : : :; cn);
and
P T BP = QT L;1B (L;1)T Q = QT L;1LLT (L;1)T Q = I:
Algorithm 9.5.3 Simultaneous Diagonalization of a Symmetric Denite Pencil
Given a symmetric matrix A and a symmetric positive denite matrix B , the following algorithm
computes a nonsingular matrix P such that P T BP is the identity matrix and P T AP is a diagonal
matrix.
1. Compute the Cholesky factorization of B :
B = LLT :
Flop Count: The algorithm for simultaneous diagonalization requires 7n³ flops.
Example 9.5.2
Consider
B = [ 10 1 1 ;  1 10 1 ;  1 1 10 ],    A = [ 1 2 3 ;  2 3 4 ;  3 4 5 ].
A is symmetric and B is symmetric positive definite.
1. The Cholesky decomposition of B = LL^T:
L = [ 3.1623 0 0 ;  0.3162 3.1464 0 ;  0.3162 0.2860 3.1334 ].
2. Form C = L^{-1} A (L^{-1})^T:
C = [ 0.1000 0.1910 0.2752 ;  0.1910 0.2636 0.3320 ;  0.2752 0.3320 0.3864 ].
3. Find an orthogonal Q such that Q^T C Q = diag(c1, ..., cn):
Q = [ 0.4220 -0.8197 -0.3873 ;  0.5684 -0.0936 0.8174 ;  0.7063 0.5651 -0.4262 ].
4. Form
P = (L^{-1})^T Q = [ 0.0941 -0.2726 -0.1361 ;  0.1601 -0.0462 0.2722 ;  0.2254 0.1803 -0.1361 ].
5. Verify:
P^T A P = diag( 0.8179, -0.0679, 0 ),
P^T B P = Identity = diag( 1, 1, 1 ).
Modal Matrix
In vibration problems involving the mass and stiffness matrices, that is, when K is the stiffness
matrix and M is the mass matrix in the symmetric-definite generalized eigenvalue problem
Kx = λMx,
the matrix P that simultaneously diagonalizes M and K, respectively, to the identity and a diagonal
matrix, is called the modal matrix, and the columns of the matrix P are called the normal
modes.
Orthogonality of the Eigenvectors
Note that if p1, ..., pn are the columns of P, then it is easy to see that
p_i^T B p_i = 1, i = 1, ..., n;    p_i^T B p_j = 0, i ≠ j;
and
p_i^T A p_i = c_i, i = 1, ..., n;    p_i^T A p_j = 0, i ≠ j.
Generalized Rayleigh Quotient
The Rayleigh quotient defined for a symmetric matrix A in Chapter 8 can easily be generalized
for the pair (A, B).
[Figure: a spring-mass system with three masses -- springs k1, k2, k3 and masses m1, m2, m3 with displacements y1, y2, y3.]
m2 ÿ2 - k2 y1 + (k1 + k2) y2 - k3 y3 = 0
m3 ÿ3 - k3 y2 + k3 y3 = 0
or, in matrix form,
[ m1 0 0 ;  0 m2 0 ;  0 0 m3 ] ( ÿ1, ÿ2, ÿ3 )^T + [ k1+k2 -k2 0 ;  -k2 k2+k3 -k3 ;  0 -k3 k3 ] ( y1, y2, y3 )^T = ( 0, 0, 0 )^T,
that is,
M ÿ + K y = 0,
where M = diag(m1, m2, m3), and
K = [ k1+k2 -k2 0 ;  -k2 k2+k3 -k3 ;  0 -k3 k3 ].
Assuming harmonic motion, we can write
y1 = x1 e^{iωt},  y2 = x2 e^{iωt},  y3 = x3 e^{iωt},
where x1, x2 and x3 are, respectively, the amplitudes of the masses m1, m2 and m3, and ω
denotes the natural angular frequency. Substituting these expressions for y1, y2 and y3 into the
equations of motion, and noting that
ÿk = -ω² x_k e^{iωt},  k = 1, 2, 3,
we have
-m1 x1 ω² + (k1 + k2) x1 - k2 x2 = 0
-m2 x2 ω² - k2 x1 + (k2 + k3) x2 - k3 x3 = 0
-m3 x3 ω² - k3 x2 + k3 x3 = 0,
which can be written in matrix form as
[ k1+k2 -k2 0 ;  -k2 k2+k3 -k3 ;  0 -k3 k3 ] ( x1, x2, x3 )^T = ω² [ m1 0 0 ;  0 m2 0 ;  0 0 m3 ] ( x1, x2, x3 )^T,
or
Kx = λMx,
where λ = ω². The eigenvalues λ_i = ω_i², i = 1, 2, 3 are the squares of the natural frequencies
of the first, second, and third modes of vibration, respectively.
2. Solve the Generalized Eigenvalue Problem
Using the Cholesky-QR Algorithm
Let's take m1 = 20000, m2 = 30000, m3 = 40000, k1 = k2 = k3 = 1.5 × 10^9. Then
M = diag( 20000, 30000, 40000 ),
K = 10^9 [ 3 -1.5 0 ;  -1.5 3 -1.5 ;  0 -1.5 1.5 ],
L = [ 141.4214 0 0 ;  0 173.2051 0 ;  0 0 200.000 ],
C = 10^5 [ 1.5 -0.6124 0 ;  -0.6124 1 -0.4330 ;  0 -0.4330 0.3750 ].
The generalized eigenvalues = the eigenvalues of C are: 10^5 ( 1.9508, 0.8382, 0.0860 ).
The natural frequencies are: 10^2 ( 4.4168, 2.8951, 0.9273 ).
The generalized eigenvectors are:
[ 0.0056 -0.0040 0.0017 ;  -0.0034 -0.0035 0.0031 ;  0.0008 0.0028 0.0040 ].
These eigenvectors can be used to determine the different configurations of the system for
different modes. That is, they can be used to see how the structure vibrates in its different
modes.
9.6.2 A Case Study on the Vibration of a Building
Consider a four-story reinforced concrete building as shown in the figure below. The floors and
roofs, which are fairly rigid, are represented by lumped masses m1 to m4 having a horizontal motion
caused by shear deformation of columns, and k1 to k4 are equivalent spring constants of the columns
that act as springs in parallel.
[Figure: a four-story building modeled as masses m1, m2, m3, m4 (bottom to top) with horizontal displacements y1, y2, y3, y4 and interstory stiffnesses k1, k2, k3, k4.]
We would like to study the configuration when the building is vibrating in its first two modes
(corresponding to the two smallest eigenvalues).
1. Formulate the problem as a symmetric-definite generalized eigenvalue problem in
terms of mass and stiffness matrices:
Kx = λMx.
As in the previous example, M is diagonal and K is tridiagonal:
M = diag( m1, m2, m3, m4 ),
K = [ k1+k2 -k2 0 0 ;  -k2 k2+k3 -k3 0 ;  0 -k3 k3+k4 -k4 ;  0 0 -k4 k4 ].
Taking m1 = 5 × 10^7, m2 = 4 × 10^7, m3 = 3 × 10^7, m4 = 2 × 10^7, and k1 = 10 × 10^14, k2 =
8 × 10^14, k3 = 6 × 10^14, k4 = 4 × 10^14, we have
K = 10^14 [ 18 -8 0 0 ;  -8 14 -6 0 ;  0 -6 10 -4 ;  0 0 -4 4 ]
and
M = 10^7 diag( 5, 4, 3, 2 ).
2. Find the generalized eigenvalues and eigenvectors using the Cholesky-QR algorithm.
L = 10^3 diag( 7.0711, 6.3246, 5.4772, 4.4721 ),
C = 10^7 [ 3.6 -1.7889 0 0 ;  -1.7889 3.5000 -1.7321 0 ;  0 -1.7321 3.3333 -1.6330 ;  0 0 -1.6330 2.000 ].
The eigenvalues of C, which are the generalized eigenvalues, are:
10^7 ( 6.1432, 1.8516, 0.3435, 4.0950 ).
The corresponding generalized eigenvectors are:
10^{-3} ( 0.0666, -0.1058, 0.0977, -0.0472 )^T,   10^{-3} ( -0.0785, -0.0858, 0.0104, 0.1403 )^T,
10^{-3} ( 0.0370, 0.0753, 0.1091, 0.1318 )^T,   10^{-3} ( 0.0896, -0.0277, -0.1085, 0.1036 )^T.
The eigenvectors corresponding to the two smallest eigenvalues 10^7 (0.3435) and 10^7 (1.8516) are:
10^{-3} ( 0.0370, 0.0753, 0.1091, 0.1318 )^T   and   10^{-3} ( -0.0785, -0.0858, 0.0104, 0.1403 )^T.
The first two modes of vibration are shown in Figures 9.3 and 9.4, respectively.
[Figure 9.3: First Mode of Vibration of 4-Story Building. Figure 9.4: Second Mode of Vibration of 4-Story Building. Both show the masses m1-m4 with modal displacements x1-x4 and stiffnesses k1-k4.]
[Figure: a two-mass system -- masses m1 and m2 with displacements y1 and y2, a spring k, and an applied force F1 sin ωt -- used in the discussion of forced harmonic vibration below.]
where ω1 and ω2 are the modal frequencies. For the special case when m1 = m2 = m, ω1 and ω2
are given by
ω1 = √(k/m)   and   ω2 = √(3k/m),
and x1 and x2 are given by
x1 = (2k - mω²) F1 / [ m² (ω1² - ω²)(ω2² - ω²) ],
x2 = k F1 / [ m² (ω1² - ω²)(ω2² - ω²) ].                     (9.6.4)
From the above, it follows immediately that whenever ω is equal to or close to ω1 or ω2, the amplitude
becomes arbitrarily large, signaling the occurrence of resonance. Note that in this case, the
denominator is zero or close to it. This situation is very alarming to an engineer.
In other words, if the frequency of the imposed periodic force is equal or nearly
equal to one of the natural frequencies of a system, resonance results, a situation which
is quite alarming.
Example 9.6.1
To demonstrate the consequences of excitation at or near resonance, let us consider an airplane
landing on a rough runway. The fuselage and engine are assumed to have a combined mass m1.
The wings are modeled by lumped masses m2 and m3, with stiffnesses k2 and k3; k1 represents the
combined stiffness of the landing gear and tires. The masses of the wheels and landing gear are
assumed negligible compared to the mass of the fuselage and the wings. The system is modeled in
Figure 9.6.
[Figure 9.6: the runway profile, modeled as a sinusoidal curve of amplitude y0 and wavelength l = 20 m, traversed at speed v.]
The runway is modeled by a sinusoidal curve as shown in Figure 9.6. Let the contour be
described by y = y0 sin ωt, and let the airplane be subjected to a forcing input of f1 sin ωt, where
f1 = k1 y0. The equations of motion for the 3 degree of freedom system so described are given by:
[ m1 0 0 ;  0 m2 0 ;  0 0 m3 ] ( ÿ1, ÿ2, ÿ3 )^T + [ k1+k2+k3 -k2 -k3 ;  -k2 k2 0 ;  -k3 0 k3 ] ( y1, y2, y3 )^T = ( f1 sin ωt, 0, 0 )^T.
The airplane is shown schematically in Figure 9.7, where m1 represents the mass of the fuselage,
m2 and m3 represent the lumped masses of the wings. The combined stiffness of the landing gear
and tires is denoted by k1, and k2 and k3 are the stiffnesses of the wings. Finally, y1, y2, and y3
represent motion relative to the ground.
[Figure 9.7: schematic of the airplane -- fuselage mass m1 (displacement y1), wing masses m2 and m3 (displacements y2, y3), wing stiffnesses k2 and k3, and landing-gear stiffness k1.]
and
y = ( y1, y2, ..., yn )^T
is an n-vector. The matrices M and K are, as usual, the mass and stiffness matrices. Assuming
that these matrices are symmetric and M is positive definite, we will now show how the simultaneous
diagonalization technique described earlier can be profitably employed to solve this system
of second-order equations.
The idea is to decouple the system into n uncoupled equations, so that each of these uncoupled
equations can be solved using a standard technique. Let P be the modal matrix such that
P^T M P = I,                                                  (9.7.2)
P^T K P = Λ = diag( ω1², ..., ωn² ).                          (9.7.3)
Let y = Pz, so the homogeneous system
M ÿ + K y = 0
becomes
M P z̈ + K P z = 0.
Next, premultiplying by P^T,
P^T M P z̈ + P^T K P z = 0
or
z̈ + Λ z = 0.
The homogeneous system therefore decouples:
z̈_i + ω_i² z_i = 0,  i = 1, 2, ..., n,
or
z̈_1 + ω_1² z_1 = 0
z̈_2 + ω_2² z_2 = 0
⋮                                                             (9.7.4)
z̈_n + ω_n² z_n = 0,
where z = ( z1, z2, ..., zn )^T and Λ = diag( ω1², ω2², ..., ωn² ).
The solution of the original system now can be obtained by solving these n decoupled equations
using standard techniques and then recovering the original solution y from
y = Pz:
Thus, if the solutions of the transformed system (9.7.4) are given by
z_i = A_i cos ω_i t + B_i sin ω_i t,  i = 1, 2, ..., n,
then the solutions of the original system (9.7.1) are given by
( y1, y2, ..., yn )^T = P ( A_1 cos ω_1 t + B_1 sin ω_1 t, ..., A_n cos ω_n t + B_n sin ω_n t )^T.   (9.7.5)
The constants A_i and B_i are to be determined from the initial conditions such as:
y_i|_{t=0} = displacement at time t = 0,
ẏ_i|_{t=0} = initial velocity.
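A short MATLAB sketch of this decoupling procedure for the undamped free-vibration problem (a sketch only; it assumes M is symmetric positive definite, K symmetric, all ω_i > 0, and given initial displacement y0 and velocity v0):

L = chol(M, 'lower');
C = L \ K / L';
[Q, Om2] = eig((C + C')/2);          % Om2 = diag(omega_1^2, ..., omega_n^2)
P = L' \ Q;                          % modal matrix: P'*M*P = I, P'*K*P = Om2
w = sqrt(diag(Om2));
Acoef = P \ y0;                      % z(0) gives the A_i
Bcoef = (P \ v0) ./ w;               % z'(0) gives B_i*omega_i, hence the B_i
yt = @(t) P*(Acoef.*cos(w*t) + Bcoef.*sin(w*t));   % y(t) = P*z(t), as in (9.7.5)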
Example 9.7.1
Consider the system with three masses in the example of the Section 9.6.1. Suppose that the
system, when released from rest at t = 0, is subjected to the displacement as given below:
We would like to nd the undamped time response of the system. The initial conditions are:
y1 = 1,  y2 = 2,  y3 = 3;    ẏ1 = ẏ2 = ẏ3 = 0.
Since the initial velocities are zeros, we obtain
ẏ(0) = P ( B1 ω1, B2 ω2, B3 ω3 )^T = 0.
These give B1 = B2 = B3 = 0.
Again, at t = 0, we have
( y1, y2, y3 )^T = P ( A1, A2, A3 )^T = ( 1, 2, 3 )^T.
Recall that for this problem
P = [ 0.0056 -0.0040 0.0017 ;  -0.0034 -0.0035 0.0031 ;  0.0008 0.0028 0.0040 ].
Then the solution of the linear system
P ( A1, A2, A3 )^T = ( 1, 2, 3 )^T
is
A1 = 2.5154,  A2 = 55.5358,  A3 = 710.6218.
Substituting these values of A1, A2, A3 and the values of ω1, ω2, and ω3 obtained earlier in
y = Pz,
or
( y1, y2, y3 )^T = P ( z1, z2, z3 )^T = P ( A1 cos ω1 t, A2 cos ω2 t, A3 cos ω3 t )^T,
we obtain the values of y1, y2, and y3 that give the undamped time response of the system subject
to the given initial conditions.
Case 2. The Damped System
Some damping, such as that due to air resistance, fluid and solid friction, etc., is present in all real
systems. Let us now consider damped homogeneous systems.
Let D be the damping matrix. Then the equations of motion of the damped system become
M ÿ + D ẏ + K y = 0.                                          (9.7.6)
Assume that D is a linear combination of M and K; that is,
D = αM + βK,                                                  (9.7.7)
where α and β are constants. Damping of this type is called proportional or Rayleigh damping.
Let P be the modal matrix. Then we have
P^T D P = α P^T M P + β P^T K P = αI + βΛ.
Let y = Pz. Then the above homogeneous damped equations are transformed to n decoupled
equations:
z̈_i + (α + β ω_i²) ż_i + ω_i² z_i = 0,  i = 1, 2, ..., n.    (9.7.8)
In engineering practice it is customary to assume modal damping. In modal damping, the
damping is proportional to mass or stiffness or a combination of them in a certain special manner.
Let α and β be chosen so that
α + β ω_i² = 2 ζ_i ω_i.
ζ_i is called the modal damping ratio of the ith mode. ζ_i is usually taken as a small number
between 0 and 1. The most common values are 0 ≤ ζ ≤ 0.05. (See Inman (1994), p. 196.)
However, in some applications, such as in the design of flexible structures, the ζ_i are taken to be as
low as 0.005. On the other hand, for an automatic shock absorber, a value as high as ζ = 0.5 is
possible.
Assuming modal damping, the decoupled equations (9.7.8) become:
z̈_i + 2 ζ_i ω_i ż_i + ω_i² z_i = 0,  i = 1, 2, ..., n.
The solutions of these equations are then given by
z_i = e^{-ζ_i ω_i t} ( A_i cos ω_i √(1 - ζ_i²) t + B_i sin ω_i √(1 - ζ_i²) t ),  i = 1, 2, ..., n,
where the constants A_i and B_i are to be determined from the given initial conditions.
The original system can now be solved by solving these n uncoupled equations separately, and
then recovering the original solution y from y = Pz.
Example 9.7.2 Usefulness of Proportional Damping
Consider the following system with two degrees of freedom (DOF). We will show here why
proportional damping is useful.
[Figure: a two-degree-of-freedom system -- masses m and 2m with displacements y1 and y2, springs k, k, and 2k, and dampers d1, d2, d3.]
The equations of motion of the system are developed by considering a free body diagram for
each mass.
For mass m:
For mass 2m:
2 = 5α + 4β                                                   (9.7.9)
-1 = -2β                                                      (9.7.10)
3 = 10α + 6β                                                  (9.7.11)
Equations (9.7.9), (9.7.10) and (9.7.11) are uniquely satisfied with β = 1/2 and α = 0. So, this is a
case of proportional damping.
Case 2: d1 = 2, d2 = 4, d3 = 1
[ 6 -4 ;  -4 5 ] = α [ 5 0 ;  0 10 ] + β [ 4 -2 ;  -2 6 ]
6 = 5α + 4β                                                   (9.7.12)
-4 = -2β                                                      (9.7.13)
5 = 10α + 6β.                                                 (9.7.14)
Now equations (9.7.12), (9.7.13) and (9.7.14) cannot all be satisfied with any set of values of α and
β. This is a case of nonproportional damping.
In the first case, the equations of motion can be decoupled using y = Pz, obtaining
real mode shapes and real natural frequencies. However, in the second case, such de-
coupling is not possible. This type of damping will lead to complex natural frequencies
and mode shapes.
Damped Systems Under Force Excitation
When a damped system is subject to an external force F, the equations of motion are given by
M ÿ + D ẏ + K y = F(t) = ( F1(t), F2(t), ..., Fn(t) )^T.      (9.7.15)
Assuming that M is symmetric positive denite, K is symmetric, and damping is proportional,
it is easy to see from our previous discussion that the above equations can be decoupled using
simultaneous diagonalization.
Let P = (p_ij) be the modal matrix. Then
P^T F = [ p11 p21 ⋯ pn1 ;  p12 p22 ⋯ pn2 ;  ⋮ ;  p1i p2i ⋯ pni ;  ⋮ ;  p1n p2n ⋯ pnn ] ( F1, F2, ..., Fi, ..., Fn )^T.
The decoupled equations will then be given by
z̈_i + 2 ζ_i ω_i ż_i + ω_i² z_i = p1i F1 + p2i F2 + ⋯ + pni Fn,
or
z̈_i + 2 ζ_i ω_i ż_i + ω_i² z_i = E_i(t),                     (9.7.16)
where E_i(t) = Σ_{j=1}^{n} p_ji F_j,  i = 1, 2, ..., n.
The function E_i(t) is called the exciting or forcing function of the ith mode. If each force F_i
is written as
F_i = f_i s(t),
then
E_i(t) = s(t) Σ_{j=1}^{n} p_ji f_j.
Definition 9.7.1 The expression
Σ_{j=1}^{n} p_ji f_j
is called the mode participation factor for the ith mode.
Once the decoupled equations are solved for z_i, we can recover the solutions of the original
equations from
y = Pz,  that is,  ( y1, y2, ..., yn )^T = P ( z1, z2, ..., zn )^T.
The solutions of the decoupled equations
z̈_i + 2 ζ_i ω_i ż_i + ω_i² z_i = E_i(t)
depend upon the nature of the force F (t). For example, when the force is a shock type force, such
as an earthquake, etc., one is normally interested in maximum responses, and the maximum values
of z1; z2; : : :; zn can be obtained from the responses of a single equation of one degree of freedom
(see the example below).
For example, the large space structure (LSS) is a distributed parameter system. It is therefore
innite dimensional in theory, but of very large dimension in practice. A nite element generated
reduced order model can be a large second order system. The dimension of the problem in such
an application can be several thousand. Naturally, the solution of a large system will lead to
a solution of a very large generalized eigenvalue problem. Unfortunately, eective numerical
techniques for computing a large number of generalized eigenvalues and eigenvectors
are virtually nonexistent and not very well developed (see the section on the Generalized
Symmetric Eigenvalue Problem for Large and Structure Matrices). It is therefore natural to think
of solving the problem by constructing a reduced-order model with the help of a few eigenvalues
and eigenvectors which are feasible to compute. Such a thought is based on an assumption that in
many instances the response of the structure depends mainly on a rst couple of eigenvalues (low
frequencies); usually the higher modes do not get excited.
We will now show how the computations can be simplied by using only the knowledge of a few
eigenvalues and eigenvectors.
Suppose that, under the usual assumption that M and K are symmetric and of order n and M
is positive definite, we were able to compute only the first few normal modes, say m of them,
where m ≪ n. Let the matrix of these normal modes be P (of order n × m). Then
P^T M P = I_{m×m},
P^T K P = Λ_{m×m} = diag( ω1², ..., ωm² ).
Setting y = Pz and assuming that the damping is proportional to mass or stiffness, the system of
n differential equations (9.7.17) then reduces to m equations:
z̈_i + 2 ζ_i ω_i ż_i + ω_i² z_i = E_i(t),  i = 1, 2, ..., m,
where E_i is the ith coordinate of the vector P^T F. Once this small number of equations is solved,
the displacement of any of the masses under the external force can be computed from:
y_i = (Pz)_i,  i = 1, ..., n.
Sometimes only the maximum value of the displacement is of interest.
Several vibration groups in industry and the military use the following approximation to obtain
the maximum value of yi (see Thompson (1988)):
|y_i|_max = | p_i1 z_1(max) | + ( Σ_{j=2}^{m} | p_ij z_j(max) |² )^{1/2}.
[Figure: a four-story building on a moving support -- masses with displacements y1, y2, y3, y4, stiffnesses k1, k2, k3, k4, and ground displacement y0.]
The decoupled normal mode equations in modal form in this case can be written as:
z̈_i + 2 ζ_i ω_i ż_i + ω_i² z_i = -E_i ÿ_0,  i = 1, 2,
where
ÿ_0 = absolute acceleration of the moving support,
E_i = Σ_{j=1}^{4} p_ji m_j = mode participation factor of the chosen mode p_i due to support excitation,
and the p_ji are the coordinates of the participating mode p_i; that is,
P = ( p1, p2 ) = [ p11 p12 ;  p21 p22 ;  p31 p32 ;  p41 p42 ],
where p1 and p2 are the two chosen participating modes.
Let R_1 and R_2 denote the maximum relative responses for z_1(max) and z_2(max), obtained from
previous experience. Then we can take
z_1(max) = E_1 R_1,
z_2(max) = E_2 R_2.
This observation immediately gives
( y_1 )                 ( p_11 )            ( p_12 )
( y_2 )       = E_1 R_1 ( p_21 )  + E_2 R_2 ( p_22 )
( y_3 )                 ( p_31 )            ( p_32 )
( y_4 ) max             ( p_41 )            ( p_42 ).
Using now the data of the example in Section 9.6.2, we have
m_1 = 5 x 10^7,  m_2 = 4 x 10^7,  m_3 = 3 x 10^7,  m_4 = 2 x 10^7,
p_1 = the eigenvector corresponding to the smallest eigenvalue
    = 10^{-3} (0.0370, 0.0753, 0.1091, 0.1318)^T,
p_2 = the eigenvector corresponding to the second smallest eigenvalue
    = 10^{-3} (-0.0785, 0.0858, 0.0104, 0.1403)^T,
E_1 = m_1 p_11 + m_2 p_21 + m_3 p_31 + m_4 p_41 = 1.0772 x 10^4,
E_2 = m_1 p_12 + m_2 p_22 + m_3 p_32 + m_4 p_42 = -4.2417 x 10^3.
Assume that R_1 = 1.5 inches and R_2 = 0.25 inches. We obtain
( y_1 )                                ( 0.0370 )                                 ( -0.0785 )
( y_2 )       = 10^{-3} 10^4 (1.0772)(1.5) ( 0.0753 )  + 10^{-3} 10^3 (-4.2417)(0.25) (  0.0858 )
( y_3 )                                ( 0.1091 )                                 (  0.0104 )
( y_4 ) max                            ( 0.1318 )                                 (  0.1403 )
              ( 0.5979 )   (  0.0833 )   ( 0.6812 in. )
            = ( 1.2169 ) + (  0.0910 ) = ( 1.3079 in. )
              ( 1.7636 )   ( -0.0110 )   ( 1.7526 in. )
              ( 2.1293 )   ( -0.1488 )   ( 1.9805 in. ).
Thus, the maximum displacement (relative to the moving support) of the first floor is
0.6812 in., that of the second floor is 1.3079 in., etc.
The absolute maximum relative displacements are obtained by adding the terms using their
absolute values:
( y_1 )                 ( 0.6812 in. )
( y_2 )               = ( 1.3079 in. )
( y_3 )                 ( 1.7747 in. )
( y_4 ) (abs. max.)     ( 2.2781 in. ).
(Figure: the four-story spring-mass model of the example, with masses m_4, m_3, m_2, m_1 at displacements y_4, y_3, y_2, y_1 and story stiffnesses k_4, k_3, k_2, k_1.)
Note: The contribution of the second participating mode to the responses is small in comparison
with the contribution of the first mode.
Average Maximum Relative Displacement of the Masses
The absolute maximum relative displacement of the masses provides us with an upper bound
for the largest relative displacements the masses can have, and thus helps us to choose design
parameters. Another practice for such a measure in the engineering literature has been to use the root
sum square of the same terms, giving the "average" maximum relative displacement values:
(y_i)_average max = sqrt( (E_1 R_1 p_{i1})^2 + (E_2 R_2 p_{i2})^2 + ... + (E_k R_k p_{ik})^2 ).
For the above example, k = 2 and the average maximum relative displacements are given by
(y_1)_average max = sqrt( (E_1 R_1 p_11)^2 + (E_2 R_2 p_12)^2 ) = 0.9975 inches,
(y_2)_average max = sqrt( (E_1 R_1 p_21)^2 + (E_2 R_2 p_22)^2 ) = 1.5610 inches,
and so on.
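The participation factors and the absolute-maximum combination above are easy to reproduce. The following MATLAB fragment is a minimal sketch (the variable names are ours, not the book's) that recomputes E_1, E_2 and the absolute maximum relative displacements from the data of the example:
m = 1e7*[5; 4; 3; 2];                 % masses m1,...,m4
P = 1e-3*[ 0.0370  -0.0785
           0.0753   0.0858
           0.1091   0.0104
           0.1318   0.1403 ];         % participating modes p1, p2 (columns)
E = P' * m;                           % participation factors E1, E2
R = [1.5; 0.25];                      % assumed maximum relative responses (inches)
zmax = E .* R;                        % z1(max) = E1*R1, z2(max) = E2*R2
yabs = abs(P) * abs(zmax);            % absolute-maximum relative displacements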
9.8 The Quadratic Eigenvalue Problem
So far we have considered the second-order system
M\ddot{y} + D\dot{y} + Ky = 0   (9.8.1)
under the assumption of proportional or modal damping. This assumption allowed us to use simultaneous
diagonalization of M, K, and D (see Section 9.7.1, Case 2), which, of course, as we have
seen in Section 9.5, amounts to solving a generalized eigenvalue problem.
In the case of general damping, we will, however, have the quadratic eigenvalue problem
(\lambda^2 M + \lambda D + K)x = 0.
This problem has 2n eigenvalues and there are 2n eigenvectors corresponding to them. We describe
two existing approaches for solving this quadratic eigenvalue problem and show how to extract
the frequencies and modal damping ratios from these eigenvalues and eigenvectors. We assume, as before, that M is symmetric
positive definite.
Approach 1: Reduction to a standard Eigenvalue Problem.
Multiplying both sides of equation (9.8.1) by M^{-1}, we have
\ddot{y} + M^{-1}D\dot{y} + M^{-1}Ky = 0.   (9.8.2)
Write
\dot{y} = \dot{z}_1 = z_2.   (9.8.3)
Then
\ddot{y} = \dot{z}_2 = -M^{-1}D\dot{y} - M^{-1}Ky   (from 9.8.2)
                     = -M^{-1}Dz_2 - M^{-1}Kz_1.    (from 9.8.3)   (9.8.4)
Equations (9.8.3) and (9.8.4) can now be combined into the single matrix equation
( \dot{z}_1 )   (  0         I        ) ( z_1 )
( \dot{z}_2 ) = ( -M^{-1}K  -M^{-1}D  ) ( z_2 ),   (9.8.5)
that is,
\dot{z} = Az,   (9.8.6)
where
z = ( z_1 ),   A = (  0         I
    ( z_2 )         -M^{-1}K  -M^{-1}D ).   (9.8.7)
Assuming a solution of equation (9.8.6) of the form z = xe^{\lambda t}, we have the standard eigenvalue
problem
Ax = \lambda x,   (9.8.8)
where A is 2n x 2n as given by (9.8.7). There are 2n eigenvalues \lambda_i, i = 1, ..., 2n, of the above problem
and 2n corresponding eigenvectors.
If the second-order system is the model of a vibrating structure, then the natural frequencies
and the modal damping ratios can be computed from the eigenvalues and eigenvectors as follows.
That is,
B\dot{z} = Az,   (9.8.14)
where
A = (  0  -K        B = ( -K  0        z = ( z_1
      -K  -D ),           0  M ),           z_2 ).   (9.8.15)
Assuming again a solution of (9.8.14) of the form z = xe^{\lambda t}, equation (9.8.14) yields the generalized
eigenvalue problem
Ax = \lambda Bx,   (9.8.16)
where A and B are both symmetric and are given by (9.8.15).
Since A and B are both 2n x 2n, the solution of (9.8.16) will give 2n eigenvalues and eigenvectors.
The natural frequencies and modal damping ratios can be extracted from the
eigenvalues as in Approach 1.
Summary: The quadratic eigenvalue problem
(\lambda^2 M + \lambda D + K)x = 0
can either be reduced to the standard nonsymmetric eigenvalue problem
Ax = \lambda x,  where  A = (  0         I
                              -M^{-1}K  -M^{-1}D ),
or to the symmetric generalized eigenvalue problem
Ax = \lambda Bx,  where  A = (  0  -K        B = ( -K  0
                               -K  -D ),           0  M ).
Once the eigenvalues and eigenvectors are computed, the natural frequencies and the modal damping
ratios of a vibrating structure whose mathematical model is the second-order system (9.8.1)
can be computed using (9.8.9) and (9.8.10), respectively.
A Word of Caution About Using Approach 1
If M is ill-conditioned, the explicit formation of the matrix A given by (9.8.7) by actually
computing M ;1 will be computationally disastrous. A will be computed inaccurately and so will
be the eigenvalues and eigenvectors. Also note that A is nonsymmetric.
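To see the two reductions side by side, here is a minimal MATLAB sketch; the test matrices are chosen only for illustration and are not part of the text, and polyeig is MATLAB's built-in solver for the quadratic problem:
n = 3;
M = eye(n);
K = [2 -1 0; -1 2 -1; 0 -1 2];
D = 0.1*eye(n);
% Approach 1: 2n-by-2n standard (nonsymmetric) eigenvalue problem
MK = M\K;  MD = M\D;                       % solves with M; inv(M) is not formed
A1 = [zeros(n), eye(n); -MK, -MD];
lam1 = sort(eig(A1));
% Approach 2: 2n-by-2n symmetric generalized eigenvalue problem
Abig = [zeros(n), -K; -K, -D];
Bbig = [-K, zeros(n); zeros(n), M];
lam2 = sort(eig(Abig, Bbig));
% MATLAB's built-in solver for (lambda^2*M + lambda*D + K)x = 0
lam3 = sort(polyeig(K, D, M));
All three computations return the same 2n eigenvalues (up to round-off and ordering).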
9.9 The Generalized Symmetric Eigenvalue Problems for Large and Structured
Matrices
As we have seen in several previous case studies, in many practical situations the matrices A and
B are structured: tridiagonal and banded cases are quite common. Unfortunately, the
Cholesky algorithm when applied to such structured problems, will very often destroy the sparsity.
Even though A and B are banded, the matrix C = L;1 A(LT );1 will in general be full. Thus,
simultaneous diagonalization techniques is not practical for large and sparse matrices.
9.9.1 The Sturm Sequence Method for Tridiagonal A and B
We now present a method when A and B are both symmetric tridiagonal and B is positive denite.
The method is a generalization of the Sturm-sequence algorithm for the symmetric eigenvalue
problem described earlier in Chapter 8, and takes advantage of the tridiagonal forms of A and B .
Let
A = [ alpha_1  beta_1                                    B = [ alpha'_1  beta'_1
      beta_1   alpha_2  beta_2                                 beta'_1   alpha'_2  beta'_2
               ...      ...     beta_{n-1}                               ...       ...      beta'_{n-1}
                        beta_{n-1}  alpha_n ],                                     beta'_{n-1}  alpha'_n ].
Define the sequence of polynomials {p_r(lambda)} given by
p_0(lambda) = 1,   (9.9.1)
p_1(lambda) = alpha_1 - lambda alpha'_1,   (9.9.2)
p_r(lambda) = (alpha_r - lambda alpha'_r) p_{r-1}(lambda) - (beta_{r-1} - lambda beta'_{r-1})^2 p_{r-2}(lambda),
r = 2, 3, ..., n.   (9.9.3)
Then it can be shown (exercise #22) that these polynomials form a Sturm sequence.
The generalized eigenvalues of the pencil (A ; B ) are then given by the zeros of pn (). The
zeros can be found by bisection or any other suitable root-nding method for polynomials. For a
proof of the algorithm, see Wilkinson AEP, pp. 340-341.
Example 9.9.1
A = [ 1 0 0          B = [ 1 1 0
      0 2 0                1 4 1
      0 0 3 ],             0 1 5 ].
p_0(lambda) = 1,   p_1(lambda) = 1 - lambda,
p_2(lambda) = (2 - 4 lambda)(1 - lambda) - (0 - lambda)^2 (1) = 3 lambda^2 - 6 lambda + 2,
p_3(lambda) = (3 - 5 lambda)(3 lambda^2 - 6 lambda + 2) - (0 - lambda)^2 (1 - lambda)
            = -14 lambda^3 + 38 lambda^2 - 28 lambda + 6.
The roots of p_3(lambda) = 0 are 1.6708, 0.6472, and 0.3964. It can easily be checked that these are
the generalized eigenvalues of the pencil (A - lambda B).
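The check can be carried out in a few lines of MATLAB; the following minimal sketch (the data are those of Example 9.9.1) compares the roots of p_3 with the output of MATLAB's generalized eigensolver:
A = diag([1 2 3]);
B = [1 1 0; 1 4 1; 0 1 5];
lam_sturm  = sort(real(roots([-14 38 -28 6])));   % zeros of p_3(lambda)
lam_pencil = sort(eig(A, B));                     % generalized eigenvalues of (A, B)
disp([lam_sturm lam_pencil])                      % the two columns agree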
Remarks:
1. About two iterations per eigenvector are adequate.
2. Note that the system (A - sigma B)x_{i+1} = y_i can be solved by taking advantage of the tridiagonal
forms of A and B.
NOTES:
1. The Lanczos vectors in this case are B-orthonormal:
v_i^T B v_i = 1,
v_i^T B v_j = 0,  i not equal to j.
2. For j = n:
V^T B V = I_{n x n},
V^T A V = T_{n x n}, a symmetric tridiagonal matrix.
3. If (theta, s) is an eigenpair of T_j, then (theta, V_j s) is the corresponding Ritz pair for the pencil (A, B).
Computational Remarks
Note that L^{-1} r_i can be computed by solving the lower triangular system
L y_i = r_i.
Also,
B v_i = r_{i-1}/beta_{i-1}
is equivalent to
L^T v_i = L^{-1}(r_{i-1}/beta_{i-1}),  since B = LL^T.
Thus, to implement the scheme, we need (i) a Cholesky factorization routine for a
sparse matrix, (ii) routines for sparse triangular systems, and (iii) a routine for matrix-vector
multiplication.
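A minimal MATLAB sketch of these three kernels follows; the sparse test matrices and variable names are ours and are chosen only for illustration:
n = 100;
e = ones(n,1);
A = spdiags([-e 2*e -e], -1:1, n, n);     % sparse symmetric tridiagonal A
B = spdiags([ e 4*e  e], -1:1, n, n);     % sparse symmetric positive definite B
L = chol(B, 'lower');                     % sparse Cholesky factorization, B = L*L'
r = randn(n,1);  rprev = randn(n,1);  beta = norm(rprev);
y = L \ r;                                % L^{-1} r by a sparse triangular solve
v = L' \ (L \ (rprev/beta));              % v from B v = rprev/beta
w = A * v;                                % sparse matrix-vector multiplication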
9.9.3 Estimating the Generalized Eigenvalues
Fact: For large enough j, the extreme eigenvalues of the tridiagonal matrices
T_j = [ alpha_1  beta_2
        beta_2   alpha_2  beta_3
                 ...      ...     beta_j
                          beta_j  alpha_j ]
will approximate the generalized eigenvalues of the pencil (A - lambda B).
Example 9.9.2
A = [ 1 2 3
      2 3 4
      3 4 5 ],
B = diag(1, 2, 3),
L = diag(1, 1.4142, 1.7321),
beta_0 = 1,  v_1 = (1, 0, 0)^T,  r_0 = v_1.
(Note that v_1^T B v_1 = 1.)
i = 1:
alpha_1 = v_1^T (A v_1 - beta_0 B v_0) = 1,
r_1 = (0, 2, 3)^T,
beta_1 = 2.2361.
i = 2:
v_2 = (0, 0.4472, 0.4472)^T,
alpha_2 = v_2^T (A v_2 - beta_1 B v_1) = 3.2000,
T_2 = ( alpha_1  beta_1  ) = ( 1       2.2361
        beta_1   alpha_2 )    2.2361  3.2000 ).
The eigenvalues of T_2 are -0.3920 and 4.5919. The generalized eigenvalues of (A, B) are 4.6013,
-0.4347, and 0. (Note that the largest eigenvalue of T_2, 4.5919, is a reasonable approximation
of the largest generalized eigenvalue 4.6013 of the pencil (A - lambda B).)
i = 3:
v_3 = (0, -0.5477, 0.3651)^T,
alpha_3 = v_3^T (A v_3 - beta_2 B v_2) = -0.0333.
Thus,
T_3 = T = [ 1       2.2361  0
            2.2361  3.2000  0.2449
            0       0.2449  -0.0333 ].
The eigenvalues of T are -0.4347, 0, and 4.6013. The eigenpairs of T_2 are
( -0.3920, (0.8489, -0.5285)^T )  and  ( 4.5919, (0.5285, 0.8489)^T ).
9.10 The Lanczos Method for Generalized Eigenvalues in an Interval
In several important applications, such as in vibration and structural analysis, one is frequently
interested in computing a specied number of eigenvalues in certain parts of the spectrum. We give
a brief outline of a Lanczos algorithm for the generalized symmetric eigenvalue problem to do this.
The algorithm was devised by Ericsson and Ruhe (1980).
Algorithm 9.10.1 The Lanczos Method for Generalized Eigenvalues in an Interval
Given (A, B), A symmetric and B symmetric positive definite, the following algorithm computes
the generalized eigenvalues lambda_k of the pencil (A - lambda B) in a specified interval [alpha, beta].
Denote
S(alpha, beta) = the set of eigenvalues in (alpha, beta),
C(alpha, beta) = the set of converged eigenvalues in (alpha, beta).
Set
sigma_1 = alpha,  C = the empty set.
For i = 1, 2, ... do until |C(alpha, beta)| = |S(alpha, beta)|:
1. Factorize (A - sigma_i B):
   A - sigma_i B = LDL^T.
   Count the number of negative diagonal entries of D. Set that count equal to |S(alpha, sigma_i)|.
2. Choose a random unit vector v_1.
3. Apply the symmetric Lanczos algorithm to the matrix
   W^T (A - sigma_i B)^{-1} W,
   where B = WW^T, with v_1 as the starting Lanczos vector.
4. Let theta_1, theta_2, ..., theta_r be the converged eigenvalues from Step 3. Then add sigma_i + 1/theta_s to C,
   s = 1, ..., r.
5. Determine a new shift sigma_{i+1}.
Remarks: Note that in applying the symmetric Lanczos algorithm to
W^T (A - sigma_i B)^{-1} W,
we need a matrix-vector multiply of the form
y = W^T (A - sigma_i B)^{-1} W x,
which can be obtained as follows:
1. Compute z = Wx.
2. Solve for p:
   (A - sigma_i B) p = z,
   which can be done by making use of the factorization in Step 1 of the algorithm.
3. Compute y = W^T p.
For details of this algorithm, its implementation and proof, see the paper by Ericsson and Ruhe
(1980).
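A minimal MATLAB sketch of this three-step product follows; here W is taken to be the lower Cholesky factor of B (so that B = WW^T), the symmetric indefinite factorization is done with MATLAB's ldl, and the small test matrices are ours:
n = 50;  sigma = 0.5;
A = diag(2*ones(n,1)) - diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);   % symmetric A
B = diag(4*ones(n,1)) + diag(ones(n-1,1),1) + diag(ones(n-1,1),-1);   % symmetric positive definite B
x = randn(n,1);
W = chol(B, 'lower');                    % B = W*W'
[Lf, Df, Pf] = ldl(A - sigma*B);         % Pf'*(A - sigma*B)*Pf = Lf*Df*Lf'
z = W * x;                               % step 1
p = Pf * (Lf' \ (Df \ (Lf \ (Pf' * z))));% step 2: solve (A - sigma*B)*p = z
y = W' * p;                              % step 3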
3. The Generalized Symmetric Eigenvalue Problem. Almost all eigenvalue problems
arising in structural and vibration engineering are of the form
lambda Mx = Kx,
where M is symmetric positive definite and K is symmetric positive semidefinite. This is
called the symmetric-definite generalized eigenvalue problem. Because of the importance
of this problem, it has been studied in some depth here.
Several case studies from vibration engineering have been presented in section 9.6 to show
how this problem arises in important practical applications. These include:
(i) vibration of a free spring-mass system,
(ii) vibration of a building,
(iii) and forced harmonic vibration of a spring-mass system.
The natural frequencies and amplitudes of a vibrating system are related, respectively, to
the generalized eigenvalues and eigenvectors. If the frequency of the imposed periodic
force becomes equal or nearly equal to one of the natural frequencies of the
system, then resonance occurs, and the situation is quite alarming.
The fall of Tacoma Bridge in the state of Washington, in the USA, and of Boughton Bridge
in England are related to such a phenomenon. (See Chapter 8.)
The QZ method can, of course, be used to solve a symmetric denite generalized eigen-
value problem. However, both symmetry and deniteness will be lost in general.
A symmetry-preserving method is the Cholesky-QR algorithm. This has been de-
scribed in section 9.5.1. The accuracy obtained by this algorithm can be severely im-
paired if the matrix B in
Ax = Bx
is ill-conditioned.
A variation of this method due to Wilkinson that computes the smallest eigenvalues
with reasonable accuracy has been described in this section. The method constructs an
ordered real Schur form of B rather than the Cholesky factorization.
4. Simultaneous Diagonalization and Applications. The Cholesky-QR algorithm for the
symmetric definite problem
Ax = lambda Bx
basically constructs a nonsingular matrix P that transforms A and B simultaneously to
diagonal forms by congruence:
P T AP = a diagonal matrix
P T BP = I:
This is called simultaneous diagonalization of A and B . In vibration and other engineering
applications, this decomposition is called modal decomposition and the matrix P is called
a modal matrix.
The technique of simultaneous diagonalization is a very useful technique in engineering prac-
tice. Its applications include
(i) decoupling of a second-order system of differential equations
M\ddot{y} + D\dot{y} + Ky = 0
into n independent equations
\ddot{z}_i + (alpha + beta omega_i^2)\dot{z}_i + omega_i^2 z_i = 0,  i = 1, 2, ..., n,
where D = alpha M + beta K, and
(ii) making of a reduced order model from a very large system of second-order systems.
Decoupling and model reduction are certainly very useful approaches for handling a large
second-order system of dierential equations. Unfortunately, however, the simultaneous
diagonalization technique is not practical for large and sparse problems. On the
other hand, many practical problems, such as the design of large space structures, etc., give
rise to very large and sparse symmetric denite eigenvalue problems.
The simultaneous diagonalization technique preserves symmetry, but destroys the other ex-
ploitable properties oered by the data of the problem, such as sparsity. (Note that most
practical large problems are sparse and maintaining sparsity is a major concern
to the algorithm developers to economize the storage requirements.)
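For small dense problems, the simultaneous diagonalization produced by the Cholesky-QR approach can be sketched in a few lines of MATLAB; the test matrices below are those of exercise 14(a) and are used only for illustration:
A = [1 1 1; 1 1 1; 1 1 1];
B = [10 1 0; 1 10 1; 0 1 10];        % symmetric positive definite
L = chol(B, 'lower');                % B = L*L'
C = L \ A / L';                      % C = inv(L)*A*inv(L'), symmetric
[Q, Lambda] = eig((C + C')/2);       % symmetrize to guard against round-off
P = L' \ Q;                          % modal matrix
disp(P'*A*P)                         % approximately diagonal (= Lambda)
disp(P'*B*P)                         % approximately the identity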
5. The Quadratic Eigenvalue Problem: The eigenvalue problem (lambda^2 M + lambda D + K)x = 0 is
discussed in Section 9.8. It is shown that the problem can be reduced either to a standard
2n x 2n nonsymmetric problem (9.8.8) or to a 2n x 2n symmetric generalized eigenvalue
problem (9.8.16). It is also shown how to extract the frequencies and modal damping ratios
of a vibrating system governed by a second-order system once the eigenvalues are obtained
(Equations 9.8.9 and 9.8.10).
6. The Sturm-sequence and Lanczos Methods. We have given an outline of the Sturm-
sequence method for the generalized eigenvalue problem with tridiagonal matrices A and
B . The symmetric denite tridiagonal problems arise in several applications.
We have also given a very brief description of the symmetric Lanczos method for a generalized
eigenvalue problem.
The chapter concludes with a Lanczos-algorithm due to Ericsson and Ruhe for nding a specied
number of generalized eigenvalues in an interval of a symmetric denite generalized eigenvalue
problem (section 9.9).
For applications of the symmetric denite generalized eigenvalue problem to earthquake en-
gineering, see the book Introduction to Earthquake Engineering, (Second Edition), by
S. Okamoto, University of Tokyo Press, 1984.
A technique more ecient than the Cholesky-QR iteration method for computing the general-
ized eigenvalues of a symmetric denite pencil for banded matrices has been proposed by Crawford
(1973). See also the recent work of Wang and Zhao (1991) and Kaufman (1993). For a look-ahead
Lanczos algorithm for the quadratic eigenvalue problem see Parlett and Chen (1991).
Laub and Williams (1992) have considered the simultaneous triangularizations of matrices
M; D; and K of the second-order system M y + Dy_ + Ky = 0.
Chapter 15 of Parlett's book \The Symmetric Eigenvalue Problem" (1980) is a rich source
of knowledge in this area.
656
Exercises on Chapter 9
PROBLEMS ON SECTION 9.2
1. Show that when A and B have a common null vector, the generalized characteristic
polynomial is identically zero:
det(A - lambda B) = 0 for all lambda.
2. (a) Prove that if A and B are n x n matrices, then det(A - lambda B) is a polynomial of degree
at most n.
(b) Show that the degree of det(A - lambda B) is equal to n if and only if B is nonsingular.
10. Consider the Hessenberg-triangular reduction of (A, B) with B singular. Show that in this
case, if the dimension of the null space of B is k, then the Hessenberg-triangular structure
takes the form
A = ( A11  A12        B = ( 0  B12
      0    A22 ),           0  B22 ),
where A11 is a k x k upper triangular matrix, A22 is upper Hessenberg, and B22 is an
(n - k) x (n - k) upper triangular nonsingular matrix. How does this help in the reduction process to the
generalized Schur form?
11. Work out the
op-count for the Cholesky-QR algorithm of the symmetric denite pencil.
12. Let A and B be positive denite. Show that the algorithm for simultaneous diagonalization
requires about 7n3
ops.
13. Generalized Orthogonal Iteration. Consider the following iterative algorithm, known as
the generalized orthogonal iteration:
(a) Choose an n x m orthonormal matrix Q_0 such that Q_0^T Q_0 = I_{m x m}.
(b) For k = 1, 2, ... do
    i. Solve for Z_k:  B Z_k = A Q_{k-1}.
    ii. Find the QR factorization of Z_k:  Z_k = Q_k R_k.
Apply the above algorithm to
A = [ 1 1 1 1          B = [ 10  1  1  1
      1 2 3 4                 1 10  1  1
      1 3 4 5                 1  1 10  1
      1 4 5 6 ],              1  1  1 10 ].
14. Given
(a) A = [ 1 1 1          B = [ 10  1  0
          1 1 1                 1 10  1
          1 1 1 ],              0  1 10 ];
(b) A = [ 10  1  1       B = [ 1    1/2  1/3
           1 10  1             1/2  1/3  1/4
           1  1  1 ],          1/3  1/4  1/5 ].
Find the generalized eigenvalues and eigenvectors for each of the above pairs using
i. the QZ iteration followed by inverse iteration,
ii. the generalized Rayleigh-quotient iteration,
iii. techniques of simultaneous diagonalization.
(Figure: a spring-mass system with masses 2m, spring stiffnesses 3k and k, and displacements y_1, y_2, y_3.)
18. Consider the four-story building depicted in the following diagram:
(Figure: a four-story building with masses m_4, m_3, m_2, m_1.)
Given
m_1 = 1.0 x 10^5 kg,  m_2 = 0.8 x 10^5 kg,  m_3 = 0.5 x 10^5 kg,  m_4 = 0.6 x 10^5 kg,
k_1 = 15 x 10^8 N/m,  k_2 = 12 x 10^8 N/m,  k_3 = 15 x 10^8 N/m,  k_4 = 10 x 10^8 N/m,
find the maximum amplitude of each floor for a horizontal displacement of 3 mm with a
period of 0.25 second.
19. Consider the following diagram of a motor car suspension with vibration absorber (taken
from the book Linear Vibrations by P. C. Muller and W. O. Schiehlen, p. 226).
(Figure: the car body of mass m_1 with displacement y_1(t) and the absorber of mass m_3 with displacement y_3(t) are connected through the suspension k_1, d_1 and the absorber spring-damper k_3, d_3 to the axle and wheel of mass m_2 with displacement y_2(t), which rests on the guide way y_e(t) through the tire stiffness k_2.)
Given
m_1 = 1200 kg,  m_2 = 80 kg,  m_3 = 20 kg,
k_1 = 300 N/cm,  k_2 = 3200 N/cm,  k_3 = 600 N/cm,
find the various amplitude frequency responses with different damping values of the absorber,
d_3 = 0, 300, 600, 1000 Ns/m. (The response of a system is measured by the amplitude
ratios.)
PROBLEMS ON SECTIONS 9.8 AND 9.9
20. Find the eigenvalues of the quadratic pencil (M lambda^2 + K lambda + D)x = 0 using both Approach 1
and Approach 2 of Section 9.8, and compare the results, where
M = I_{3 x 3},   K = [ 2 -1  0          D = [ 1 1 1
                      -1  2 -1                1 1 1
                       0 -1  2 ],             1 1 1 ].
21. Deduce the equations (9.8.9) and (9.8.10) for frequencies and the mode shapes.
22. Show that the polynomials dened by (9.9.1){(9.9.3) form a Sturm sequence.
23. Find the generalized eigenvalues for the pair (A; B ), where
A = [ 1 1 0          B = [ 10  1  0
      1 1 1                 1 10  1
      0 1 1 ],              0  1 10 ],
using the Sturm sequence method described in section 9.9.1.
24. Estimate the generalized eigenvalues of the pair (A; B ) of problem #14 using the generalized
symmetric Lanczos algorithm.
25. Find the generalized eigenvalues of the pair (A; B ) of problem #23 in the interval (0; 0:3).
26. Prove that the eigenvalues of the symmetric denite pencil A;B lie in the interval [;kB ;1 Ak; kB ;1 Ak].
in exercise #23.
MATLAB PROGRAMS AND PROBLEMS ON CHAPTER 9
1. Write a MATLAB program, called hesstri, to reduce a pair of matrices (A; B ) to a (Hessenberg-
triangular) pair:
[H; T ] = hesstri(A; B ):
Test your program by randomly generating A and B , each of order 15.
2. (a) Write a MATLAB program, called qzitri, to implement one step of QZ iteration algo-
rithm:
[A1; B 1] = qzitri (A; B ):
(b) Now apply one step of qritrdsi. (Double-shift implicit QR iteration from Chapter 8) to
C = AB;1 :
[C ] = qritrdsi (C ):
(c) Compute D = A1 B 1;1 , where (A1; B 1) is the Hessenberg-triangular pair obtained in
step (a).
(d) Compare C and D to verify that they are essentially the same.
Test-Data:
A = A 15 15 randomly generated unreduced upper Hessenberg matrix.
B = A 15 15, upper triangular matrix with all entries equal to 1, except two diagonal
entries each equal to 10;5.
3. Using lynsyspp from Chapter 6 or the MATLAB Command `n' and hesstri, write a MAT-
LAB program, called eigenvecgen to compute a generalized eigenvector u corresponding to
a given approximation to a generalized eigenvalue :
[u] = eigvecgen(A; B; ):
Test your program using randomly generated matrices A and B , each of order 15, and then
compare the result with that obtained by running the MATLAB command :
[U; D] = eig(A; B ):
4. (The purpose of this exercise is to compare the accuracy of dierent ways to
nd the generalized eigenvalues and eigenvectors of a symmetric denite pencil
(K ; M )).
(a) Using the MATLAB commands chol, inv, eig, and `n' (or bascksub from Chapter 3),
write a MATLAB program, called geneigsymdf, to implement the Cholesky algorithm
for the symmetric-denite pencil (K ; M ) (Algorithm 9.5.1).
[V 1; D1] = geneigsymdf (K; M );
where V 1 is a matrix containing the generalized eigenvectors as its columns and D1 is a
diagonal matrix containing the generalized eigenvalues of the symmetric-denite pencil
K ; M .
(b) Run eig(K; M ) from MATLAB to compute (V 2; D2):
[V 2; D2] = eig(K; M ):
(c) Run eig(inv(M ) K ) from MATLAB to compute (V 3; D3) :
[V 3; D3] = eig (inv (M ) K ):
(d) Compare the results of (a), (b), and (c).
Test-data:
M = diag(m, m, ..., m)  (25 x 25),
K = [  k  -k
      -k  2k  -k
           ...  ...  ...
                -k   2k  -k
                     -k   2k ]  (25 x 25),
m = 100,
k = 10^8.
5. Using chol, inv, eig from MATLAB, write a MATLAB program, called
diagsimul, to simultaneously diagonalize a pair of matrices (M; K ), where M is symmetric
positive denite and K is symmetric (Algorithm 9.5.2):
command
[U; D] = eig(A; B )].
10. THE SINGULAR VALUE DECOMPOSITION (SVD)
10.1 Introduction 669
10.2 The Singular Value Decomposition Theorem 670
10.3 A Relationship between the Singular Values and the Eigenvalues 673
10.4 Applications of the SVD 675
10.5 Sensitivity of the Singular Values 678
10.6 The Singular Value Decomposition and the Structure of a Matrix 679
10.6.1 The Norms and the Condition Number 680
10.6.2 Orthonormal Bases for the Null Space and the Range of a Matrix 681
10.6.3 The Rank and the Rank-Deficiency of a Matrix 682
10.6.4 An Outer Product Expansion of A and Its Consequences 686
10.6.5 Numerical Rank 687
10.7 Computing the Variance-Covariance Matrix 688
10.8 The Singular Value Decomposition, the Least Squares Problem, and the Pseudo-Inverse 689
10.8.1 The SVD and Least Squares Problem 689
10.8.2 Solving the Linear System Using the Singular Value Decomposition 692
10.8.3 The SVD and the Pseudoinverse 692
10.9 Computing the Singular Value Decomposition 695
10.9.1 Computing the SVD from the Eigenvalue Decomposition of A^T A 695
10.9.2 The Golub-Kahan-Reinsch Algorithm 696
10.9.3 The Chan SVD 703
10.9.4 Computing the Singular Values with High Accuracy: The Demmel-Kahan Algorithm 705
10.9.5 The Differential QD Algorithm 709
10.9.6 The Bisection Method 709
10.10 Generalized SVD 710
10.11 Review and Summary 711
10.12 Some Suggestions for Further Reading 712
CHAPTER 10
THE SINGULAR VALUE DECOMPOSITION (SVD)
Objectives
The major objectives of this chapter are to study theory, methods, and applications of the
Singular Value Decomposition (SVD).Here are the highlights of the Chapter.
The SVD existence theorem (Section 10.2)
A relationship between singular values and the eigenvalues(Section 10.3)
An application of the SVD to the fetal ECG (Section 10.4)
Sensitivity of singular values (Section 10.5)
Applications of the SVD to rank-related problems, computing orthonormal bases, matrix
approximations, etc. (Section 10.6)
Use of the SVD in least squares and the pseudoinverse problems (Section 10.8)
The Golub-Kahan-Reinsch algorithm (Section 10.9)
The Demmel-Kahan algorithm (Section 10.9)
Background Material Needed for this Chapter
The following background material will be needed for smooth reading of this chapter:
1. Norm concepts and related results (Section 1.7)
2. Rank concept (Section 1.3)
3. Orthonormal bases (Sections 1.3 and 5.6)
4. Creating zeros with Householder and Givens matrices (Sections 5.4 and 5.5)
5. QR factorization of a nonsquare matrix (Section 5.4.2)
6. Statements and methods for least-squares problems and the pseudoinverse (Chapter 7)
7. The Bauer-Fike Theorem (Theorem 8.7.1)
10.1 Introduction
So far we have studied two important decompositions of a matrix A, namely, the LU and the QR
decompositions. In this chapter we will study another very important decomposition of a complex
matrix A: the singular value decomposition, or, in short, the SVD, defined by
A = U Sigma V*,
where U and V are unitary and Sigma is a diagonal matrix. If A is real, U and V are real orthogonal.
We will assume that A is real throughout this chapter.
The SVD has a long and fascinating history. The names of at least ve classical and celebrated
mathematicians|E. Beltrami (1835-1899), C. Jordan (1838-1921), J. Sylvester (1814-1897), E.
Schmidt (1876-1959), and H. Weyl (1885-1955)|can be associated with the development of the
theory of the SVD. Some details of the contributions of these mathematicians to the SVD can be
found in an interesting recent paper by G. W. Stewart (1993). Also the book by Horn and Johnson
(1991) contains a nice history of the SVD.
In recent years the SVD has become a computationally viable tool for solving a wide variety of
problems arising in many practical applications. However, this has only been possible due to the
pioneering work of some contemporary numerical analysts such as: G. H. Golub, W. Kahan, etc.,
who showed us how to compute the SVD in a numerically ecient and stable way.
In this chapter we will discuss the important properties, some applications, and the existing
computational methods for the SVD. The organization of this chapter is as follows.
In section 10.2 we prove the existence theorem on the SVD. There are several proofs available
in the literature: Golub and Van Loan (MC, 1984, p. 17) Horn and Johnson (1991), Pan and
Sigmon (1992), etc. We will, however, give a more traditional and constructive proof that relies
on the relationship between the singular values and singular vectors of A with the eigenvalues and
eigenvectors of AT A: The later aspect is treated fully in Section 10.3.
In section 10.4 we give a real-life application of the SVD on separating out the Electro-
Cardiogram (ECG) of a fetus from that of the mother, taken from the work of Callaerts, De
Moor, Van Dewalle and Sansen (1991).
In section 10.5 we discuss the sensitivity of the singular values. The singular values are
insensitive to perturbations, and this is remarkable.
In section 10.6 we outline several important applications of the SVD relating to the structure
of a matrix: nding the numerical rank, orthonormal bases for the range and null space
of A, approximation of A by another matrix of lower rank, etc. The ability of the SVD
to perform the above computations in a numerically reliable way makes the use of the SVD so
669
attractive in practical applications.
In section 10.8 we discuss the use of the SVD in least squares solutions and computation
of the pseudoinverse. The SVD is the most numerically reliable approach to handle
these problems.
Finally, section 10.9 is devoted to computing the SVD. Here, besides describing the widely-used
Golub-Reinsch method, we brie
y outline the recent improvements to the method by Demmel and
Kahan (1990). A brief mention of the very recent method of Fernando and Parlett (1992) will be
mentioned too.
Using (10.2.4), with u_i^T = (1/sigma_i) v_i^T A^T for i = 1, ..., r, we have
U^T A V = (u_1, u_2, ..., u_m)^T A (v_1, ..., v_n)   (10.2.6)
        = diag(sigma_1, sigma_2, ..., sigma_r, 0, ..., 0) = Sigma,   (10.2.7)
where Sigma is an m x n diagonal matrix.
The statement about the rank follows immediately from the fact that the rank is invariant under
multiplication by unitary matrices:
rank(A) = rank(U Sigma V^T) = rank(Sigma),
and that the rank of a diagonal matrix is equal to the number of its nonzero diagonal entries.
The decomposition (10.2.6) is known as the Singular Value Decomposition (SVD) of A.
Denition 10.2.1 The scalars 1; 2; : : :; r are called the nonzero-Singular Values of A.
Note that the other (n ; r) singular values are zero singular values.
Denition 10.2.2 The columns of U are called the left singular vectors and those of V are
called the right singular vectors.
A Convention
For the remainder of this chapter we will assume, without any loss of generality,
that m >= n; for if m < n, we consider the SVD of A^T, and if the
SVD of A^T is U Sigma V^T, then the SVD of A is V Sigma^T U^T.
We will follow the convention that the singular values are ordered as above. Thus
sigma_1 = sigma_max is the largest singular value and sigma_n = sigma_min is the smallest singular
value. Also, by sigma(A) we will denote the set of singular values of A. This is also
referred to as the singular spectrum.
Notes:
1. When m n, we have n singular values.
2. The singular values of A are uniquely determined while the matrices U and V are not unique
(Why?)
Example 10.2.1
A = [ 1 2          Sigma = [ 6.5468  0
      2 3                    0       0.3742
      3 4 ],                 0       0      ]  (3 x 2),
U = [ 0.3381   0.8480   0.4082
      0.5506   0.1735  -0.8165
      0.7632  -0.5009   0.4082 ]  (3 x 3),
V = [ 0.5696  -0.8219
      0.8219   0.5696 ]  (2 x 2).
There are two singular values; they are 6.5468 and 0.3742.
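The example is easy to check with MATLAB's built-in svd; the following minimal fragment reproduces the singular values (the columns of U and V are determined only up to sign and may differ from the ones displayed above by a factor of -1):
A = [1 2; 2 3; 3 4];
[U, S, V] = svd(A);
disp(diag(S)')        % 6.5468   0.3742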
Proof.
A^T A = (V Sigma^T U^T)(U Sigma V^T) = V Sigma^T Sigma V^T = V Sigma' V^T,   (10.3.1)
where Sigma' is an n x n diagonal matrix with sigma_1^2, ..., sigma_n^2 as its diagonal entries. Thus
V^T A^T A V = Sigma' = diag(sigma_1^2, sigma_2^2, ..., sigma_n^2). Since the singular values are nonnegative numbers, the theorem is proved.
Corollary 10.3.1 Let A be a symmetric matrix with eigenvalues lambda_1, ..., lambda_n.
Then the singular values of A are |lambda_i|, i = 1, ..., n.
Proof. Since
A = A^T,   (10.3.2)
A^T A = A^2.   (10.3.3)
Therefore, from Theorem 10.3.1 above, the n singular values of A are the square roots
of the n eigenvalues of A^2. Since the eigenvalues of A^2 are lambda_1^2, ..., lambda_n^2, we have the result.
Proof. We know that det(A^T A) = (det(A))^2. Thus A is nonsingular if and only if A^T A is nonsingular.
Since a matrix is nonsingular if and only if its eigenvalues are all different from zero, the proof is now
immediate from Theorem 10.3.1.
Proof. Let
A = U Sigma V^T.
Write
U = (U_1, U_2),  with U_1 of order m x n and U_2 of order m x (m - n),
Sigma = ( Sigma_1 )  with Sigma_1 of order n x n and a zero block of order (m - n) x n.
        ( 0       )
Define
P = ( U~_1   -U~_1   U_2
      V~      V~     0   ),
where
U~_1 = U_1 / sqrt(2)  and  V~ = V / sqrt(2).
Then it is easy to see that
P^T C P = ( Sigma_1   0         0
            0        -Sigma_1   0
            0         0         0 ),
which shows that the nonzero eigenvalues of C are sigma_1, ..., sigma_k, -sigma_1, ..., -sigma_k, where sigma_1 through sigma_k
are the nonzero singular values of A.
disturbs the observation of the fetal ECG (FECG), because the contributions of the maternal heart
signals are much stronger than those of the fetal heart. The objective then will be to detect the
FECG while simultaneously suppressing the MECG and the noise.
Suppose there are p measurements signals m1 (t); : : :; mp (t). Let these measurements be ar-
ranged in a vector called the measurement vector m(t):
m(t) = (m1(t); m2(t); : : :; mp(t))T : (10.4.1)
Let there be r source signals s1 (t); s2(t); : : :; sr (t) arranged in the source signal vector s(t):
s(t) = (s1(t); s2(t); : : :; sr (t))T : (10.4.2)
Obviously, the measurement signals are corrupted by an additive noise signal, and there exists a
relationship between the measurement signals and the source signals. It can be assumed (see Van
Dewalle and de Moor (1988)) that this relationship is linear, and, indeed, each measurement signal
m_i(t) can be written as a linear combination of the r source signals s_j(t) plus an additive noise signal n_i(t).
This leads to the following equations:
m_1(t) = t_11 s_1(t) + t_12 s_2(t) + ... + t_1r s_r(t) + n_1(t)
m_2(t) = t_21 s_1(t) + t_22 s_2(t) + ... + t_2r s_r(t) + n_2(t)
...                                                        (10.4.3)
m_p(t) = t_p1 s_1(t) + t_p2 s_2(t) + ... + t_pr s_r(t) + n_p(t),
or
m(t) = T s(t) + n(t),   (10.4.4)
where
T = (t_ij)  and  n(t) = (n_1(t), n_2(t), ..., n_p(t))^T.   (10.4.5)
The matrix T is called the transfer matrix and depends upon the geometry of the body, the
positions of the electrodes and sources, and the conductivities of the body tissues.
The problem now is to get an estimate of the source signals s(t) knowing only m(t), and from
that estimate, separate out the estimate of fetal source signals.
Let each measurement consist of q samples. Then the measurements can be stored in a matrix
M of order p q.
We now show that the SVD of M can be used to get estimates of the source signals. Let
M = U V T (10.4.6)
be the SVD of M .
Then the p q matrix S^ dened by
S^ = U T M (10.4.7)
will contain p estimates of the source signals. Next, we need to extract the estimates of the fetal
source signals from S^; let this be called S^F :
Partition the matrix Sigma of singular values of M as follows:
Sigma = ( Sigma_M   0         0
          0         Sigma_F   0
          0         0         Sigma_0 ),   (10.4.8)
where Sigma_M contains the r_m large singular values associated with the maternal heart, Sigma_F contains the r_f
smaller singular values associated with the fetal heart, and Sigma_0 contains the remaining
singular values associated with noise, etc.
Let U = (U_M, U_F, U_0) be a conformable partitioning of U. Then, obviously, we have
S^ = U^T M = ( U_M^T         ( S^_M
               U_F^T )  M  =   S^_F     (10.4.9)
               U_0^T           S^_0 ).
Thus S^_F = U_F^T M.
Once S^_F is determined, we can also construct a matrix F containing only the contributions of
the fetus in each measured signal, as follows:
F = U_F S^_F = \sum_{i = r_m + 1}^{r_m + r_f} sigma_i u_i v_i^T,
where u_i and v_i are the ith columns of U and V, and sigma_i is the ith singular value of M. The signals
in S^_F are called the principal fetal signals.
The above method has been automated and an on-line adaptive algorithm to compute the U -
matrix has been designed. For the details of the method and test results, see the paper by Callaerts,
DeMoor, Van Dewalle and Sansen (1990).
Note that if the SVD of M is given by
M = U Sigma V^T = (U_1, U_2) ( Sigma_1  0        ( V_1^T
                               0        Sigma_2 )  V_2^T ),
then S can be estimated by S^ = U_1^T M.
10.5 Sensitivity of the Singular Values
Since the squares of the singular values of A are just the eigenvalues of the real symmetric matrix
AT A, and we have seen before that the eigenvalues of a symmetric matrix are insensitive to small
perturbations, the same thing should be expected of the singular values as well.
Indeed, the following important result holds. The result is analogous to a result on the sensitivity
of the eigenvalues of a symmetric matrix stated in Chapter 8. The result basically states that, like
the eigenvalues of a symmetric matrix, the singular values of a matrix are well-conditioned.
Proof. The proof is based on the interesting relationship between the singular values of the matrix
A and the eigenvalues of the symmetric matrix
A~ = ( 0    A
       A^T  0 ),
displayed in Theorem 10.3.2. It has been shown there that the nonzero eigenvalues of A~ are
sigma_1, ..., sigma_k, -sigma_1, ..., -sigma_k, where sigma_1 through sigma_k are the nonzero singular values of A. The remaining
eigenvalues of A~ are, of course, zero. Define now
B~ = ( 0    B        E~ = ( 0    E
       B^T  0 ),            E^T  0 );
then B~ - A~ = E~.
The eigenvalues of B~ and E~ are related, respectively, to the singular values of B and E in the
same way the eigenvalues of A~ are related to the singular values of A. The result
now follows immediately by applying the Bauer-Fike Theorem (Theorem 8.7.1) to B~, A~, and E~.
Example 10.5.1
A = [ 1 2 3          E = [ 0 0 0
      3 4 5                0 0 0
      6 7 8 ],             0 0 0.0002 ].   (10.5.1)
The singular values of A:
sigma_1 = 14.5576,  sigma_2 = 1.0372,  sigma_3 = 0.0000.   (10.5.2)
The singular values of A + E:
sigma~_1 = 14.5577,  sigma~_2 = 1.0372,  sigma~_3 = 0.0000.
||E||_2 = 0.0002,   (10.5.3)
|sigma~_1 - sigma_1| = 1.1373 x 10^{-4},
|sigma~_2 - sigma_2| = 0,
|sigma~_3 - sigma_3| = 0.
We now present a result (without proof) on the perturbation of the singular values that uses the Frobenius
norm instead of the 2-norm. The result of Theorem 10.5.2 will be used later in deriving a result
on the closeness of A to a matrix of lower rank (Theorem 10.6.2).
10.6.1 The Norms and the Condition Number
Proof.
1. ||A||_2 = ||U Sigma V^T||_2 = ||Sigma||_2 = max_i(sigma_i) = sigma_1.
(Note that the 2-norm and the Frobenius norm are invariant under multiplication by orthogonal matrices.)
2. ||A||_F = ||U Sigma V^T||_F = ||Sigma||_F = (sigma_1^2 + sigma_2^2 + ... + sigma_n^2)^{1/2}.
3. To prove 3, we note that the largest singular value of A^{-1} is 1/sigma_n. (Note that when A is
invertible, sigma_n is nonzero.) The result then follows from 1.
4. Follows from the definition of Cond_2(A) and 1 and 3.
Remark: When A is rank-deficient, sigma_min = 0, and we say that Cond(A) is infinite.
Example 10.6.1
A = [ 1 2 3
      3 4 5
      6 7 8 ],
sigma_1 = 14.5576,  sigma_2 = 1.0372,  sigma_3 = 0.0000.
1. ||A||_2 = sigma_1 = 14.5576.
2. ||A||_F = sqrt(sigma_1^2 + sigma_2^2 + sigma_3^2) = 14.5945.
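A minimal MATLAB sketch of these computations (reading the norms and the condition number off the singular values):
A = [1 2 3; 3 4 5; 6 7 8];
s = svd(A);
norm2 = s(1)                 % ||A||_2 = sigma_1 = 14.5576
normF = sqrt(sum(s.^2))      % ||A||_F = 14.5945
cond2 = s(1)/s(end)          % very large, since sigma_3 is zero up to round-off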
10.6.2 Orthonormal Bases for the Null Space and the Range of a Matrix
In Chapter 5 we have shown how to construct orthonormal bases for the range and the null-space
of a matrix A using QR decomposition. In this section we show how the singular vectors can be
used to construct these bases.
Let r be the rank of A, that is,
1 2 r > 0; (10.6.1)
r+1 = = n = 0: (10.6.2)
Let uj and vj be the columns of U and V in the SVD of A. Then the set of columns vj
corresponding to the zero singular values of A form an orthonormal basis for the null-
space of A. This is because, when j = 0; vj satises Avj = 0 and is therefore in the null-space of
A. Similarly, the set of columns uj corresponding to the nonzero singular values is an
orthonormal basis for the range of A.
Orthogonal Projections
Once the orthonormal bases for the range R(A) and the null-space N (A) of A are obtained, the
orthogonal projections can be easily computed. Thus if we partition U and V as
U = (U1 ; U2); V = (V1; V2);
where U1 and V1 consist of the rst r columns of U and V , then
Example 10.6.2
A = [ 1 2 3
      3 4 5
      6 7 8 ],
sigma_1 = 14.5576,  sigma_2 = 1.0372,  sigma_3 = 0,
U = [ 0.2500   0.8371   0.4867
      0.4852   0.3267  -0.8111
      0.8378  -0.4379   0.3244 ],
V = [ 0.4625  -0.7870   0.4082
      0.5706  -0.0882  -0.8165
      0.6786   0.6106   0.4082 ].
An orthonormal basis for the null space of A is (0.4082, -0.8165, 0.4082)^T.
An orthonormal basis for the range of A is
[ 0.2500   0.8371
  0.4852   0.3267
  0.8378  -0.4379 ].
(Compute now the four orthogonal projections yourself.)
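A minimal MATLAB sketch of the construction (the tolerance used to decide which singular values are "zero" is ours):
A = [1 2 3; 3 4 5; 6 7 8];
[U, S, V] = svd(A);
tol = max(size(A)) * eps(S(1,1));
r = sum(diag(S) > tol);        % numerical rank
rangeBasis = U(:, 1:r);        % columns of U for the nonzero singular values
nullBasis  = V(:, r+1:end);    % columns of V for the zero singular values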
10.6.3 The Rank and the Rank-Deciency of a Matrix
Finding the rank of an m n matrix and, in particular, determining the nonsingularity of a square
matrix are very important and frequently arising tasks in linear algebra and many important
applications. The most obvious and the least expensive way of determining the rank of a matrix
is, of course, to triangularize the matrix using Gaussian elimination and then to nd the rank
of the reduced upper triangular matrix. Theoretically, nding the rank of a triangular matrix is
trivial; one can just read it o from the diagonal. Unfortunately, however, this is not a very reliable
approach in
oating point computations. In practice, it is more important, as we will see below, to
determine if the given matrix is near a matrix of a certain rank and in particular, to know if it is near
a rank-decient matrix. The Gaussian elimination method which uses elementary transformations,
may transform a rank-decient matrix rank into one having full rank, due to numerical round-o
error.
Another approach to determine the rank and rank-deficiency, more reliable than Gaussian elimination,
is the QR factorization with column pivoting. As we have seen before in
Chapter 5, this method constructs a permutation matrix P and an orthogonal matrix Q such that
Q^T A P = ( R_1  R_2
            0    0   ),   (10.6.3)
where R_1 is upper triangular and nonsingular, and the dimension r of R_1 is the rank of A.
This result above, however, does not determine reliably the nearness of A to a rank-decient
matrix. For details, see again Golub and Van Loan (MC, 1984, pp. 166-167) and our discussions
on rank-revealing QR factorization in Chapter 5.
The most reliable way to determine the rank and nearness to rank-deciency is to
use the SVD.
Since the number of nonzero singular values determines the rank of a matrix, we can say that
a matrix A is arbitrarily near a matrix of full rank: just change each zero singular value
by a small number . It is, therefore, more meaningful to know if a matrix is near a
matrix of a certain rank, rather than knowing what the rank is. The singular value
decomposition exactly answers this question.
Suppose that A has rank r, that is, 1 2 r > 0 and r+1 = = n = 0. Then
the question is how far A is from a matrix of rank k < r. The following theorems can be
used to answer the question. Theorem 10.6.2 is generally known as the Eckart-Young Theorem (see
Eckart and Young (1939)).
Theorem 10.6.2 Let A = U Sigma V^T be the SVD of A, and let rank(A) = r > 0. Let
k <= r. Define
A_k = U Sigma_k V^T,
where
Sigma_k = diag(sigma_1, ..., sigma_k, 0, ..., 0).
Then (i) rank(A_k) = k, and (ii) out of all the matrices of rank k, A_k is closest to A in the
Frobenius norm.
To prove the second part, we first prove that if A has (n - k) small singular values, then A is
close to A_k. Indeed,
||A_k - A||_F^2 = ||U(Sigma_k - Sigma)V^T||_F^2 = ||Sigma_k - Sigma||_F^2 = sigma_{k+1}^2 + ... + sigma_n^2.
Thus if the (n - k) singular values sigma_{k+1}, ..., sigma_n are small, then A is close to A_k. Next, we show that if B is any
other matrix of rank k, then
||B - A||_F^2 >= ||A_k - A||_F^2.
To see this, denote the singular values of B by tau_1 >= tau_2 >= ... >= tau_n. Then, since B has rank k,
tau_{k+1} = tau_{k+2} = ... = tau_n = 0. By Theorem 10.5.2 we then have
||B - A||_F^2 >= \sum_{i=1}^{n} |tau_i - sigma_i|^2 >= sigma_{k+1}^2 + ... + sigma_n^2 = ||A_k - A||_F^2.
Theorem 10.6.3 Out of all the matrices of rank k (k < r), the matrix B closest
to A in the 2-norm is given by
B = U Sigma_k V^T,
where
Sigma_k = diag(sigma_1, ..., sigma_k, 0, ..., 0).
Furthermore, the distance of B from A is ||B - A||_2 = sigma_{k+1}.
Implication of the above results
Example 10.6.3
A = [ 1 0 0
      0 2 0
      0 0 0.0000004 ],
rank(A) = 3,  sigma_3 = 0.0000004,
U = [ 0 1 0          V = [ 0 1 0
      1 0 0                1 0 0
      0 0 1 ],             0 0 1 ],
A_0 = A - sigma_3 u_3 v_3^T = [ 1 0 0
                                0 2 0
                                0 0 0 ],
rank(A_0) = 2.
The required perturbation sigma_3 u_3 v_3^T to make A singular is very small:
10^{-6} x [ 0 0 0
            0 0 0
            0 0 0.4 ].
10.6.4 An Outer Product Expansion of A and Its Consequences
In many applications the matrix size is so large that the storage of A may require a major part of
total available memory space.
The knowledge of nonzero singular values and the associated left and right singular vectors can
help store the matrix with a substantial amount of savings in storage space.
Let A be m n (m n), and 1 ; : : :; r be the nonzero singular values of A. Let the associated
right and left singular vectors be v1; : : :; vr , and u1 ; : : :; ur , respectively. Then the following theorem
shows that A can be written as the sum of r matrices each of rank 1. Since a rank-one matrix of
order m n can be stored using only (m + n) storage locations rather than mn locations needed
to store an arbitrary matrix, this will result in a very substantial savings, if r is substantially less
than n.
Theorem 10.6.4
A = \sum_{j=1}^{r} sigma_j u_j v_j^T.
Example 10.6.4
A = [ 1 2 3
      3 4 5
      6 7 8 ],
sigma_1 = 14.5576,  sigma_2 = 1.0372,  sigma_3 = 0,  r = 2,
u_1 = ( 0.2500, 0.4852, 0.8379 )^T,    u_2 = ( 0.8371, 0.3267, -0.4379 )^T,
v_1 = ( 0.4625, 0.5706, 0.6786 )^T,    v_2 = ( -0.7870, -0.0882, 0.6106 )^T,
sigma_1 u_1 v_1^T + sigma_2 u_2 v_2^T = [ 1 2 3
                                          3 4 5
                                          6 7 8 ].
An Important Consequence
Theorem 10.6.4 has an important practical consequences. In practical applications, the matrix
A generally has a large number of small singular values. Suppose there are (n ; k) small singular
values of A which can be neglected. Then the matrix Ak = 1u1 v1T + a2 u2v2T + + k uk vkT is a
very good approximation of A, and such an approximation can be adequate in applications. For
example, in digital image processing, even when k is chosen much less than n, the digital image
corresponding to Ak can be very close to the original image. For details see Andrews and Hunt
(1977). If A is n n, the storage of Ak will require 2nk locations compared to n2 locations needed
to store A, resulting in a large amount of savings.
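A minimal MATLAB sketch of the truncated-SVD approximation (using the small matrix of Example 10.6.4 only for illustration):
A = [1 2 3; 3 4 5; 6 7 8];
[U, S, V] = svd(A);
k = 2;
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % rank-k approximation of A
err = norm(A - Ak, 'fro')                 % equals sqrt(sigma_{k+1}^2 + ... + sigma_n^2)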
687
A has \numerical rank" r if the computed singular values ~1; ~2; : : :; ~ n
satisfy
~1 ~2 ~r > ~r+1 ~n (10.6.4)
Thus to determine the numerical rank of a matrix A, count the \large" singular
values only. If this number is r, then A has numerical rank r.
Remark: Note that nding the numerical rank of a matrix will be \tricky" if there is no
suitable gap between a set of singular values.
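As a minimal MATLAB illustration (the data are those of Example 10.6.3 and the tolerance is chosen here only for illustration):
A = [1 0 0; 0 2 0; 0 0 4e-7];
s = svd(A);
tol = 1e-6;                   % illustrative tolerance
numrank = sum(s > tol)        % numerical rank: 2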
Example 10.7.1
A = [ 1 2
      2 3
      3 4 ],
c_22 = v_21^2 / sigma_1^2 + v_22^2 / sigma_2^2 = 2.3333.
10.8 The Singular Value Decomposition, the Least Squares Problem, and the
Pseudo-Inverse
In Chapter 7 we have seen that the full-rank linear least squares problem can be eectively solved
using the QR decomposition, but pivoting will be needed to handle the rank-decient case.
We have just seen that the singular values of A are needed to determine the reliably nearness
to rank deciency. We will now see that the SVD is also an eective tool to solve the least squares
problem, both in the full rank and rank-decient cases.
689
where k is the number of nonzero singular values of A. Thus the vector
y = (y_1, y_2, ..., y_n)^T
that minimizes ||Sigma y - b'||_2 is given by
y_i = b'_i / sigma_i,  when sigma_i is nonzero,
y_i = arbitrary,       when sigma_i = 0.
(Note that when k < n, y_{k+1} through y_m do not appear in the above expression and therefore do
not have any effect on the residual.) Of course, once y is computed, the solution can be recovered
from x = Vy.
Since y_i can be set arbitrarily for each "zero" singular value sigma_i, in the rank-deficient
case we will have infinitely many solutions to the least squares problem. There are instances
where this rank deficiency is actually desirable, because it provides a rich family of solutions
which might be used for optimizing some other aspect of the original problem. In the full-rank case
the least squares solution is, of course, unique.
The above discussion can be summarized in the following algorithm.
Algorithm 10.8.1 Least Squares Solutions Using the SVD
1. Find the SVD of A:
   A = U Sigma V^T.
2. Form b' = U^T b = (b'_1, b'_2, ..., b'_m)^T.
3. Compute y = (y_1, ..., y_n)^T, choosing
   y_i = b'_i / sigma_i,  when sigma_i is nonzero,
   y_i = arbitrary,       when sigma_i = 0.
4. Then the family of least squares solutions is
   x = Vy.
(Note that in the full-rank case the family has just one member.)
Flop-count: Using the Golub-Kahan-Reinsch method to be described later, it takes about
2mn2 + 4n3
ops to solve the least squares problem, when A is m n and m n. (In deriving
this
op-count, it is noted that the complete vector b does not need to be computed; then only the
columns of U that correspond to the nonzero singular values are needed in computation.)
An Expression for the Minimum Norm Least Squares Solution
It is clear from Step 3 above that in the rank-deficient case the minimum 2-norm least squares
solution is the one obtained by setting y_i = 0 whenever sigma_i = 0. Thus, from above, we have
the following expression for the minimum 2-norm solution:
x = \sum_{i=1}^{r} (u_i^T b / sigma_i) v_i,   (10.8.1)
where r = rank(A).
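A minimal MATLAB sketch of Algorithm 10.8.1 and of formula (10.8.1) follows; the data are those of Example 10.8.3 below, and the tolerance used to detect zero singular values is ours:
A = [1 2 3; 2 3 3; 1 2 3];
b = [6; 8; 6];
[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps(s(1));
r = sum(s > tol);                       % numerical rank
y = (U(:,1:r)' * b) ./ s(1:r);          % y_i = u_i'*b / sigma_i
x = V(:,1:r) * y                        % minimum 2-norm least squares solution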
10.8.2 Solving the Linear System Using the Singular Value Decomposition
Note that the idea of using the SVD in the solution of the least squares problem can be easily
applicable to determine if a linear system Ax = b has a solution, and, if so, how to compute it.
Thus if
A = U V T ;
then
Ax = b
is equivalent to
y = b0
where y = V T x; and b0 = U T b:
So, the system Ax = b is consistent i the diagonal system y = b0 is consistent
(which is trivial to check), and a solution of Ax = b can be computed by solving the diagonal
system y = b0 rst, and then recovering x from x = vy . However, this approach will be much
more expensive than the Gaussian elimination and QR methods. That is why the SVD is not
used in practice, in general, to solve a linear system.
10.8.3 The SVD and the Pseudoinverse
In Chapter 7 we have seen that when A is an m x n (m >= n) matrix having full rank, the
pseudoinverse of A is given by
A+ = (A^T A)^{-1} A^T.
A formal definition of the pseudoinverse of any matrix A (whether it has full rank or not) can be
given as follows:
Four Properties of the Pseudoinverse
The pseudoinverse of an m x n matrix A is an n x m matrix X
satisfying the following conditions:
1. AXA = A.
2. XAX = X.
3. (AX)^T = AX.
4. (XA)^T = XA.
The pseudoinverse of a matrix always exists and is unique. We now show that the SVD
provides a nice expression for the pseudoinverse.
Let A = U Sigma V^T be the SVD of A; then it is easy to verify that the matrix
V Sigma+ U^T,  where  Sigma+ = diag(1/sigma_j)   (10.8.2)
(if sigma_j = 0, use 1/sigma_j = 0),
satisfies all four conditions and therefore is the pseudoinverse of A. Note that this expression
for the pseudoinverse coincides with A^{-1} when A is nonsingular, because
A^{-1} = (A^T A)^{-1} A^T   (10.8.3)
       = (V Sigma^T U^T U Sigma V^T)^{-1} V Sigma^T U^T
       = V Sigma^{-1} (Sigma^T)^{-1} V^T V Sigma^T U^T
       = V Sigma^{-1} U^T
(note that in this case Sigma+ = Sigma^{-1}).
The process for computing the pseudoinverse Ay of A using the SVD of A can be summarized
as follows.
Algorithm 10.8.2 Computing the Pseudoinverse Using the SVD
1. Find the SVD of A:
   A = U Sigma V^T.
2. Compute
   Sigma+ = diag(1/sigma_1, 1/sigma_2, ..., 1/sigma_r, 0, ..., 0),
   where sigma_1, ..., sigma_r are the r nonzero singular values of A.
3. Compute A+ = V Sigma+ U^T.
Example 10.8.2 Find the pseudoinverse of
A = U Sigma V^T = [  0  0 -1     [ 1 0 0     [  1/3  -2/3  -2/3
                    -1  0  0   x   0 2 0   x   -2/3   1/3  -2/3
                     0 -1  0 ]     0 0 0 ]     -2/3  -2/3   1/3 ].
Here
Sigma+ = [ 1  0    0
           0  1/2  0
           0  0    0 ],
U^T = [  0 -1  0
         0  0 -1
        -1  0  0 ],
V^T = V.
Thus
A+ = V Sigma+ U^T = [ 0  -1/3   1/3
                      0   2/3  -1/6
                      0   2/3   1/3 ].
Example 10.8.3
A = [ 1 2 3          b = ( 6
      2 3 3                8
      1 2 3 ],             6 ),
A+ = [ -0.9167   1.3333  -0.9167
       -0.1667   0.3333  -0.1667
        0.5833  -0.6667   0.5833 ].
The minimum 2-norm least squares solution is
A+ b = (1, 1, 1)^T.
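A minimal MATLAB sketch of Algorithm 10.8.2, checked against the data of Example 10.8.3 (the tolerance is ours; pinv is MATLAB's built-in pseudoinverse):
A = [1 2 3; 2 3 3; 1 2 3];
b = [6; 8; 6];
[U, S, V] = svd(A);
s = diag(S);
tol = max(size(A)) * eps(s(1));
sinv = zeros(size(s));
sinv(s > tol) = 1 ./ s(s > tol);     % invert only the nonzero singular values
Adag = V * diag(sinv) * U';          % A+ = V * Sigma+ * U'
disp(Adag * b)                       % minimum-norm least squares solution (1,1,1)'
disp(norm(Adag - pinv(A)))           % agreement with MATLAB's pinv, at round-off level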
10.9 Computing the Singular Value Decomposition
As mentioned before, the ability of the singular values and the singular vectors to compute eectively
the numerical rank, the condition number, the orthonormal bases of the range and the null spaces,
the distance of a matrix A from a matrix of lower rank, etc., is what makes the singular value
decomposition so attractive in many practical applications. Therefore, it is important to know how
the SVD can be computed in a numerically eective way. In this section we will discuss this most
important aspect of the SVD.
(to four signicant digits).
The eigenvalues of AT A are 0.0000 and 4.0004.
Remark: In the numerical linear algebra literature, Stage I is known as the Golub-Kahan
bidiagonal procedure, and Stage II is known as the Golub-Reinsch algorithm. We will call the
combined two-stage procedure by the Golub-Kahan-Reinsch method.
High relative accuracy of the singular values of bidiagonal matrices.
The following result due to Demmel and Kahan (1990) shows that the singular values of a
bidiagonal matrix can be computed with very high accuracy.
Theorem 10.9.1 Let B = (b_ij) be an n x n bidiagonal matrix and let B + delta B = (b_ij + delta b_ij) be a perturbation of B.
Suppose that
b_ii + delta b_ii = alpha_{2i-1} b_ii  and  b_{i,i+1} + delta b_{i,i+1} = alpha_{2i} b_{i,i+1},  with all alpha_j nonzero.
Let
alpha- = \prod_{j=1}^{2n-1} max(|alpha_j|, |alpha_j|^{-1}),
and let sigma_1 >= ... >= sigma_n be the singular values of B and sigma-_1 >= ... >= sigma-_n those of B + delta B. Then
sigma_i / alpha- <= sigma-_i <= alpha- sigma_i,  i = 1, ..., n.
Householder matrices U_02 and V_02 are constructed so that
U_02 A^(2) V_02 = [ x  x  0  0
                    0  x  x  0
                    0  0  x  x
                    0  0  x  x
                    0  0  x  x ].
Of course, in this step we will work with the 4 x 3 matrix A' rather than the matrix A^(2). Thus,
the orthogonal matrices U'_02 and V'_02 will be constructed first such that
U'_02 A' V'_02 = [ x  x  0
                   0  x  x
                   0  x  x
                   0  x  x ];
then U_02 and V_02 will be constructed from U'_02 and V'_02 in the usual way, that is, by embedding them
in identity matrices of appropriate orders. The process is continued until the bidiagonal matrix B
is obtained.
Example 10.9.2
A = [ 1 2 3
      3 4 5
      6 7 8 ].
Step 1.
U_01 = [ -0.1474  -0.4423  -0.8847
         -0.4423   0.8295  -0.3410
         -0.8847  -0.3410   0.3180 ],
A^(1) = U_01 A = [ -6.7823  -8.2567  -9.7312
                    0        0.0461   0.0923
                    0       -0.9077  -1.8154 ].
Step 2.
V_01 = [ 1   0        0
         0  -0.6470   0.7625
         0  -0.7625  -0.6470 ],
A^(2) = A^(1) V_01 = [ -6.7823  12.7620   0
                        0       -1.0002   0.0245
                        0        1.9716  -0.4824 ].
Step 3.
U_02 = [ 1   0        0
         0  -0.0508   0.9987
         0   0.9987   0.0508 ],
B = U_02 A^(2) = U_02 A^(1) V_01 = U_02 U_01 A V_01 = [ -6.7823  12.7620   0
                                                         0       -1.0081  -1.8178
                                                         0        0        0 ].
Note that from the above expression for B it immediately follows that zero is a singular value
of A.
Note that from the above expression of B , it immediately follows that zero is a singular value
of A.
Finding the SVD of the Bidiagonal Matrix
The process is a variant of the QR iteration. Starting from the n n bidiagonal matrix B
obtained in Stage 1, it successively constructs a sequence of bidiagonal matrices fBk g such that each
Bi has possibly smaller o-diagonal entries than the previous one. The ith iteration is equivalent to
applying the implicit symmetric QR, described in Chapter 8, with Wilkinson shift to the symmetric
tridiagonal matrix BiT Bi without, of course, forming the product BiT Bi explicitly. The eective
tridiagonal matrices are assumed to be unreduced (note that the implicit symmetric QR works
with unreduced matrices); otherwise we would work with decoupled SVD problems. For example,
if bk;k+1 = 0, then B can be written as the direct sum of two bidiagonal matrices B1 and B2 and
(B ) = (B1) [ (B2).
The process has guaranteed convergence and the rate of convergence is quite fast. The details of
the process can be found in the book by Golub and Van Loan (MC 1984, pp. 292-293). We outline
the process briefly in the following.
In the following, just one iteration step of the method is described. To simplify the notation, let
B = [ alpha_1  beta_2
               alpha_2  beta_3
                        ...      ...
                                 alpha_{n-1}  beta_n
                                              alpha_n ]   (10.9.9)
be an n x n bidiagonal matrix. Then the Wilkinson shift for the symmetric matrix B^T B is the
eigenvalue of the 2 x 2 right-hand corner submatrix
( alpha_{n-1}^2 + beta_{n-1}^2    alpha_{n-1} beta_n
  alpha_{n-1} beta_n              alpha_n^2 + beta_n^2 )   (10.9.10)
that is closer to alpha_n^2 + beta_n^2.
1. Compute a Givens rotation J_1' such that
\[
J_1' \begin{pmatrix} \alpha_1^2 - \sigma \\ \alpha_1 \beta_2 \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}.
\quad\text{Form}\quad
J_1 = \begin{pmatrix} J_1' & 0 \\ 0 & I_{n-2} \end{pmatrix}.
\]
2. Apply J_1 to the right of B, that is, form B \equiv BJ_1. This will give a fill-in at the (2,1) position of B. That is, we will have
\[
B \equiv BJ_1 = \begin{pmatrix}
\times & \times & & & \\
+ & \times & \times & & \\
 & & \ddots & \ddots & \\
 & & & \ddots & \times \\
 & & & & \times
\end{pmatrix}
\tag{10.9.11}
\]
where + indicates a fill-in. (The fill-in is at the (2,1) position.)
The idea is now to chase the nonzero entry `+' down the subdiagonal to the end of the matrix by applying Givens rotations in an appropriate order, as indicated by the following:
3.
\[
B \equiv J_2^T B = \begin{pmatrix}
\times & \times & + & & \\
 & \times & \times & & \\
 & & \ddots & \ddots & \\
 & & & & \times
\end{pmatrix}
\;\text{(fill-in at the (1,3) position)},
\qquad
B \equiv BJ_3 = \begin{pmatrix}
\times & \times & & & \\
 & \times & \times & & \\
 & + & \times & \ddots & \\
 & & & & \times
\end{pmatrix}
\;\text{(fill-in at the (3,2) position)}.
\]
4.
\[
B \equiv J_4^T B = \begin{pmatrix}
\times & \times & & & \\
 & \times & \times & + & \\
 & & \times & \ddots & \\
 & & & & \times
\end{pmatrix}
\;\text{(fill-in at the (2,4) position)},
\qquad
B \equiv BJ_5 = \begin{pmatrix}
\times & \times & & & \\
 & \times & \times & & \\
 & & \times & \ddots & \\
 & & + & \ddots & \times
\end{pmatrix}
\;\text{(fill-in at the (4,3) position)}.
\]
And so on.
At the end of one iteration we will have a new bidiagonal matrix \bar{B} orthogonally equivalent to the original bidiagonal matrix B:
\[
\bar{B} = (J_{2n-2}^T \cdots J_4^T J_2^T)\, B\, (J_1 J_3 \cdots J_{2n-3}).
\]
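Each J_i above is just a plane (Givens) rotation. A minimal MATLAB sketch of how one such rotation is formed and applied (this is only an illustration in our own notation; MATCOM's givcs.m and givrot.m are the book's versions). Here it removes the (2,1) fill-in created by BJ_1, as in step 3:

% Form the rotation [c s; -s c] with [c s; -s c]*[a; b] = [r; 0]
a = B(1,1);  b = B(2,1);          % the (2,1) fill-in (assumed nonzero)
r = hypot(a, b);                  % sqrt(a^2 + b^2) computed safely
c = a/r;  s = b/r;
J2 = eye(size(B,1));
J2([1 2], [1 2]) = [c s; -s c];
B  = J2*B;                        % applying it from the left annihilates B(2,1)

Applied to the matrix BJ_1 of Example 10.9.3 below, this reproduces the rotation J_2 shown there.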
Example 10.9.3
\[
B = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 2 & 3 \\ 0 & 0 & 1 \end{pmatrix}.
\]
1. The Wilkinson shift \sigma = 15.0828,
\[
J_1 = \begin{pmatrix} -0.9901 & 0.1406 & 0 \\ -0.1406 & -0.9901 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
2.
\[
B \equiv BJ_1 = \begin{pmatrix} -1.2713 & -1.8395 & 0 \\ -0.2812 & -1.9801 & 3 \\ 0 & 0 & 1 \end{pmatrix}
\]
(the fill-in is at the (2,1) position).
3.
\[
J_2 = \begin{pmatrix} -0.9764 & -0.2160 & 0 \\ 0.2160 & -0.9764 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
B \equiv J_2 B = \begin{pmatrix} 1.3020 & 2.2238 & -0.6480 \\ 0 & 1.5361 & -2.9292 \\ 0 & 0 & 1 \end{pmatrix}
\]
(the fill-in is at the (1,3) position);
\[
J_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.9601 & 0.2797 \\ 0 & -0.2797 & 0.9601 \end{pmatrix},
\qquad
B \equiv BJ_3 = \begin{pmatrix} 1.3020 & 2.3163 & 0 \\ 0 & 2.2942 & -2.3825 \\ 0 & -0.2797 & 0.9601 \end{pmatrix}
\]
(fill-in at the (3,2) position).
4.
\[
J_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.9926 & -0.1210 \\ 0 & 0.1210 & 0.9926 \end{pmatrix},
\qquad
B \equiv J_4 B = \begin{pmatrix} 1.3020 & 2.3163 & 0 \\ 0 & 2.3112 & -2.4812 \\ 0 & 0 & 0.6646 \end{pmatrix}.
\]
Stopping Criterion
The algorithm typically requires a few iteration steps before the off-diagonal entry \beta_n becomes negligible. A criterion for off-diagonal negligibility given in Golub and Van Loan (MC 1984, p. 434) is to regard \beta_i as negligible (and set it to zero) whenever
\[
|\beta_i| \le \epsilon\,(|\alpha_{i-1}| + |\alpha_i|),
\]
where \epsilon is a small multiple of the machine precision.
Flop-Count: The cost of the method is determined by the cost of Stage I; Stage II is iterative and is quite cheap. The estimated flop-count is 2m^2 n + 4mn^2 + \frac{9}{2}n^3 (m \ge n). This count includes the cost of computing U, \Sigma, and V. There are applications (e.g., least squares) where all three matrices are not explicitly required. A nice table of the flop-counts of the Golub-Kahan-Reinsch SVD and the Chan SVD (to be described in the next section) for different requirements of U, \Sigma, and V appears in Golub and Van Loan (MC 1984, p. 175).
Round-off Property: It can be shown (Demmel and Kahan (1990)) that the computed SVD, \hat{U}\hat{\Sigma}(\hat{V})^T, produced by the Golub-Kahan-Reinsch algorithm is nearly the exact SVD of A + E; that is,
\[
A + E = (\hat{U} + \delta\hat{U})\,\hat{\Sigma}\,(\hat{V} + \delta\hat{V})^T,
\]
where \hat{U} + \delta\hat{U} and \hat{V} + \delta\hat{V} are orthogonal,
\[
\frac{\|E\|_2}{\|A\|_2} \le p(m,n)\,\mu, \qquad \|\delta\hat{U}\| \le p(m,n)\,\mu, \qquad \|\delta\hat{V}\| \le p(m,n)\,\mu,
\]
\mu is the machine precision, and p(m,n) is a slowly growing function of m and n.
Entrywise Errors of the Singular Values
Furthermore, let \hat{\sigma}_i be a computed singular value; then
\[
|\hat{\sigma}_i - \sigma_i| \le p(n)\,\mu\,\|A\|_2 = p(n)\,\mu\,\sigma_{\max},
\]
where p(n) is a slowly growing function of n.
The result says that the computed singular values cannot differ from the true singular values by an amount larger than \delta = p(n)\,\mu\,\sigma_{\max}. Thus, the singular values which are not much smaller than \sigma_{\max} will be computed by the algorithm quite accurately, while the small ones may be computed inaccurately.
10.9.3 The Chan SVD
T. Chan (1982) observed that the Golub-Kahan-Reinsch algorithm for computing the SVD, described in the last section, can be improved in the case m > \frac{13}{8}n if the matrix A is first factored into QR and then the bidiagonalization is performed on the upper triangular matrix R. The improvement naturally comes from the fact that the work required to bidiagonalize the triangular matrix R is much less than that required originally to bidiagonalize the matrix A. Of course, once the SVD of R is obtained, one can easily retrieve the SVD of A. Thus the Chan SVD can be described as follows:
1. Find the QR factorization of A:
\[
Q^T A = \begin{pmatrix} R \\ 0 \end{pmatrix}. \tag{10.9.12}
\]
2. Find the SVD of R using the Golub-Kahan-Reinsch algorithm:
\[
R = X \Sigma Y^T. \tag{10.9.13}
\]
Tony Chan, a Chinese-American mathematician, is a professor of mathematics at the University of California, Los Angeles. He developed this improved algorithm for the SVD when he was a graduate student at Stanford University.
Then the singular values of A are just the singular values of R. The singular vector matrices U and V are given by:
\[
U = Q\,[X \mid 0], \qquad V = Y. \tag{10.9.14}
\]
Flop-Count: The Chan SVD requires about 2m^2 n + 11n^3 flops to compute \Sigma, U and V, compared to the 2m^2 n + 4mn^2 + \frac{9}{2}n^3 flops required by the Golub-Kahan-Reinsch SVD algorithm. Clearly, there will be savings with the Chan SVD when m is sufficiently larger than n.
Remark: The Chan SVD should play an important role in several applications such as control theory, where typically m \gg n.
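The two steps above can be prototyped directly with MATLAB's built-in qr and svd; the following is only a sketch (the full program chansvd is what problem 12 in the MATLAB exercises of this chapter asks for):

% Sketch of the Chan SVD for an m x n matrix A with m >= n.
[m, n] = size(A);
[Q, R] = qr(A);                          % Step 1: Q'*A = [R1; 0]
R1 = R(1:n, :);                          % the n x n upper triangular block
[X, S, Y] = svd(R1);                     % Step 2: Golub-Kahan-Reinsch SVD of R1
U = Q*blkdiag(X, eye(m-n));              % singular vectors of A
V = Y;
Sigma = [S; zeros(m-n, n)];
% check: norm(A - U*Sigma*V') should be a small multiple of eps*norm(A)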
Example 10.9.4
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 4 & 5 \end{pmatrix}.
\]
1. The QR factorization of A (Q^T A = R):
\[
Q = \begin{pmatrix} -0.2182 & -0.8165 & -0.5345 \\ -0.4364 & -0.4082 & 0.8018 \\ -0.8729 & 0.4082 & -0.2673 \end{pmatrix},
\qquad
R = \begin{pmatrix} -4.5826 & -6.1101 \\ 0 & -0.8165 \\ 0 & 0 \end{pmatrix}.
\]
2. The SVD of R: R = X\Sigma Y^T, where
\[
X = \begin{pmatrix} -0.9963 & 0.0856 & 0 \\ -0.0856 & -0.9963 & 0 \\ 0 & 0 & 1.0000 \end{pmatrix},
\qquad
Y = \begin{pmatrix} 0.5956 & -0.8033 \\ 0.8033 & 0.5956 \end{pmatrix},
\qquad
\Sigma = \begin{pmatrix} 7.6656 & 0 \\ 0 & 0.4881 \\ 0 & 0 \end{pmatrix}.
\]
3. The singular value decomposition of A: A = U\Sigma V^T. The singular values of A are 7.6656 and 0.4881,
\[
U = Q\,[X \mid 0] = \begin{pmatrix} 0.2873 & 0.7948 & -0.5345 \\ 0.4698 & 0.3694 & 0.8018 \\ 0.8347 & 0.4814 & -0.2673 \end{pmatrix},
\qquad
V = Y.
\]
Recall that the flop-count for the normal equations approach to the least squares problem is \frac{mn^2}{2} + \frac{n^3}{6}, and that using QR factorization (with Householder matrices) it is mn^2 - \frac{n^3}{3}. Again, a nice table comparing the efficiency of different least squares methods appears in Golub and Van Loan (MC 1984, p. 177).
10.9.4 Computing the Singular Values with High Accuracy: The Demmel-Kahan Algorithm
The round-off analysis of the Golub-Kahan-Reinsch algorithm tells us that the algorithm can compute the singular values that are not much smaller than \|A\|_2 quite accurately; however, the smaller singular values may not be computed with high relative accuracy.
Demmel and Kahan (Demmel and Kahan, 1990), in an award-winning paper, presented a new algorithm that computes all the singular values, including the smaller ones, with very high relative accuracy.
They call this algorithm "QR iteration with a zero shift". The reason for this is that the new algorithm corresponds to the Golub-Kahan-Reinsch algorithm when the shift is zero. The new algorithm is based on the remarkable observation that when the shift is zero, no cancellation due to subtraction occurs, and therefore the very small singular values can be found (almost) as accurately as the data permits.
James Demmel is a professor of Computer Science at the University of California, Berkeley. He has won several awards, including the prestigious Householder Award (jointly with Ralph Byers of the University of Kansas) in 1984, the Presidential Young Investigator Award in 1986, the SIAM (Society for Industrial and Applied Mathematics) Award in 1988, the prize for the best linear algebra paper (jointly with William Kahan) in 1991, and the Wilkinson Prize in 1993.
Indeed the effect of the zero shift is remarkable. This is demonstrated in the following. Let J_1' be the Givens rotation such that
\[
J_1' \begin{pmatrix} \alpha_1^2 \\ \alpha_1\beta_2 \end{pmatrix} = \begin{pmatrix} \times \\ 0 \end{pmatrix}.
\]
Compute
\[
J_1 = \begin{pmatrix} J_1' & 0 \\ 0 & I_{n-2} \end{pmatrix}.
\]
Then
\[
BJ_1 = \begin{pmatrix}
\times & 0 & & & \\
+ & \times & \times & & \\
 & & \ddots & \ddots & \\
 & & & & \times
\end{pmatrix},
\]
where there is a fill-in at the (2,1) position as before, but note that the (1,2) entry now is zero instead of being nonzero. This zero will now propagate through the rest of the algorithm and is indeed the key to the effectiveness of this algorithm.
Let us now apply J_2 to the left of BJ_1 to zero the nonzero entry at the (2,1) position. Then
\[
J_2\,BJ_1 = \begin{pmatrix}
\times & \times & + & & \\
 & \times & \times & & \\
 & & \ddots & \ddots & \\
 & & & & \times
\end{pmatrix}.
\]
There is now a fill-in at the (1,3) position, as in the Golub-Kahan-Reinsch algorithm. Since the rank of the 2 x 2 matrix
\[
\begin{pmatrix} b_{12} & b_{13} \\ b_{22} & b_{23} \end{pmatrix}
\]
is 1, it follows that when J_3 is applied to J_2 BJ_1 on the right to zero out the (1,3) entry, it will zero out the (2,3) entry as well; that is, we will have
\[
J_2\,BJ_1\,J_3 = \begin{pmatrix}
\times & \times & 0 & & \\
 & \times & 0 & & \\
 & + & \times & \ddots & \\
 & & & \ddots & \times
\end{pmatrix}.
\]
Thus, we now have one extra zero on the superdiagonal compared to the same stage of the Golub-Kahan algorithm.
This phenomenon continues. Indeed, the rotation J_4 zeros out the (3,4) entry as well as the (2,4) entry, and so on.
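The extra zeros are easy to observe numerically. The following small MATLAB sketch (our own illustration, not part of the book's programs) applies just the first right rotation of a zero-shift step to the bidiagonal matrix used in Example 10.9.5 below:

B = [1 2 0; 0 2 3; 0 0 1];            % bidiagonal test matrix
a = B(1,1)^2;  b = B(1,1)*B(1,2);     % first column of B'*B (zero shift)
r = hypot(a, b);  c = a/r;  s = b/r;
J1 = [c -s 0; s c 0; 0 0 1];          % right rotation J1
BJ1 = B*J1;
% BJ1 has the usual fill-in at (2,1), but its (1,2) entry is zero
% (up to roundoff); in the actual algorithm it is never formed at all.
disp(BJ1)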
Example 10.9.5
\[
B = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 2 & 3 \\ 0 & 0 & 1 \end{pmatrix}.
\]
1.
\[
J_1 = \begin{pmatrix} 0.4472 & -0.8944 & 0 \\ 0.8944 & 0.4472 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
2.
\[
B \equiv BJ_1 = \begin{pmatrix} 2.2361 & 0 & 0 \\ 1.7889 & 0.8944 & 3 \\ 0 & 0 & 1 \end{pmatrix}.
\]
(Note that there is a fill-in at the (2,1) position, but the (1,2) entry is zero.)
3.
\[
J_2 = \begin{pmatrix} 0.7809 & 0.6247 & 0 \\ -0.6247 & 0.7809 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
B \equiv J_2 B = \begin{pmatrix} 2.8636 & 0.5588 & 1.8741 \\ 0 & 0.6984 & 2.3426 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Note that the rank of the 2 x 2 matrix
\[
\begin{pmatrix} 0.5588 & 1.8741 \\ 0.6984 & 2.3426 \end{pmatrix}
\]
is one.
\[
J_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.2857 & -0.9583 \\ 0 & 0.9583 & 0.2857 \end{pmatrix},
\qquad
B \equiv BJ_3 = \begin{pmatrix} 2.8636 & 1.9557 & 0 \\ 0 & 2.4445 & 0 \\ 0 & 0.9583 & 0.2857 \end{pmatrix}.
\]
(Note that both the (1,3) and (2,3) entries are zero.)
4.
\[
J_4 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0.9310 & 0.3650 \\ 0 & -0.3650 & 0.9310 \end{pmatrix},
\qquad
B \equiv J_4 B = \begin{pmatrix} 2.8636 & 1.9557 & 0 \\ 0 & 2.6256 & 0.1043 \\ 0 & 0 & 0.2660 \end{pmatrix}.
\]
Note that the singular values of B are: 3.8990, 1.9306, and 0.2657.
Stopping Criterion: Let the nonzero diagonal entries of B be denoted by \alpha_1, \ldots, \alpha_n, and the nonzero off-diagonal entries by \beta_2, \ldots, \beta_n. Define the two sequences \{\lambda_k\} and \{\mu_k\} as follows:
1st sequence: \lambda_n = |\alpha_n|; for j = n-1 down to 1,
\[
\lambda_j = |\alpha_j|\,\frac{\lambda_{j+1}}{\lambda_{j+1} + |\beta_{j+1}|};
\]
2nd sequence: \mu_1 = |\alpha_1|; for j = 1 to n-1,
\[
\mu_{j+1} = |\alpha_{j+1}|\,\frac{\mu_j}{\mu_j + |\beta_{j+1}|}.
\]
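In MATLAB the two sequences can be generated directly from the recurrences; a small sketch (our own variable names; alpha and beta hold the diagonal and superdiagonal of B, with beta(1) unused):

n = length(alpha);
lambda = zeros(n,1);  mu = zeros(n,1);
lambda(n) = abs(alpha(n));
for j = n-1:-1:1
    lambda(j) = abs(alpha(j)) * (lambda(j+1)/(lambda(j+1) + abs(beta(j+1))));
end
mu(1) = abs(alpha(1));
for j = 1:n-1
    mu(j+1) = abs(alpha(j+1)) * (mu(j)/(mu(j) + abs(beta(j+1))));
end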
Rate of convergence: The convergence of the last off-diagonal element to zero is linear with constant factor \sigma_n^2/\sigma_{n-1}^2. If there is a cluster of singular values of multiplicity m different from the remaining ones, then the convergence will be linear with constant factor \sigma_{n-m+1}^2/\sigma_{n-m}^2.
Round-off error: Let \bar{\sigma}_i, i = 1, \ldots, n, be the singular values of the computed bidiagonal matrix \bar{B} obtained by applying one implicit zero-shift QR step to B. Let \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n. It has been shown by Demmel and Kahan (1990) that, if
\[
w \equiv 69 n^2 \mu < 1,
\]
where \mu is the machine precision, then
\[
|\bar{\sigma}_i - \sigma_i| \le \frac{w}{1-w}\,\sigma_i, \qquad i = 1, \ldots, n.
\]
Moreover, if after k iterations the singular values of the then current bidiagonal matrix B_k are \sigma_1^{(k)} \ge \cdots \ge \sigma_n^{(k)}, then, when w \equiv 69 n^2 \mu < 1, we have
\[
|\sigma_i - \sigma_i^{(k)}| \le \left(\frac{1}{(1-w)^k} - 1\right)\sigma_i \approx 69\,k\,n^2\,\mu\,\sigma_i.
\]
The above result basically states that "the relative error in the computed singular values can only grow with the squares of the dimension". This is a rather pessimistic result. The authors have given another round-off result which states that, as the algorithm approaches convergence, "errors do not accumulate at all and the error in the computed \alpha_i and \beta_i is bounded by c\mu, c another modest constant".
Charles F. Van Loan is a professor of Computer Science at Cornell University. He is the co-author of the celebrated book "Matrix Computations".
710
10.11 Review and Summary
In this section we summarize, for easy reference, the most important results discussed in this chapter.
1. Existence and Uniqueness of the SVD: The singular value decomposition (SVD) of a matrix A always exists (Theorem 10.2.1):
\[
A = U \Sigma V^T.
\]
The singular values (the diagonal entries of \Sigma) are unique, but U and V are not unique.
2. Relationship of the singular values and singular vectors to the eigenvalues: The singular values of A are the nonnegative square roots of the eigenvalues of A^T A (Theorem 10.3.1; see also Theorem 10.3.2).
3. Sensitivity of the singular values: The singular values are insensitive to small perturbations (Theorem 10.5.1).
4. Applications of the SVD: The singular values and the singular vectors of a matrix A are the most useful and reliable tools for determining the (numerical) rank and the rank-deficiency of A; finding orthonormal bases for the range and the null space of A; finding the distance of A from another matrix of lower rank (in particular, the nearness to singularity of a nonsingular matrix); solving both full-rank and rank-deficient least squares problems; and finding the pseudoinverse of A, etc.
These remarkable abilities, and the fact that the singular values are insensitive to small perturbations, have made the SVD an indispensable tool in a wide variety of application areas such as control and systems theory, signal processing, statistics, biomedical engineering, image processing, etc.
5. Computing the SVD: The most widely used approach for computing the SVD of A is the Golub-Kahan-Reinsch algorithm. This algorithm proceeds in two phases. In phase 1, the matrix A is reduced to a bidiagonal matrix by orthogonal equivalence, and in phase 2, the bidiagonal matrix is further reduced to a diagonal matrix using implicit QR iteration with the Wilkinson shift. Unfortunately, very tiny singular values may not be computed with very high relative accuracy by this method. A modification of this method, known as the zero-shift QR iteration or the QR iteration with a zero shift, was proposed by Demmel and Kahan in 1990. The Demmel-Kahan method computes all the singular values with high relative accuracy.
711
10.12 Some Suggestions for Further Reading
As mentioned before, applications of the SVD are varied. There are books in each area of application where the SVD plays an important role. For applications of the SVD to classical control problems, see the earlier survey of Klema and Laub (25).
The SVD also plays an important role in modern control theory, especially in H-infinity and robust control theory. Curious readers are referred to a growing number of interesting papers in these areas. For applications of the SVD to the robust pole assignment problem, see Kautsky, Nichols and Van Dooren (1985).
For some applications of the SVD to system identification and signal processing problems, see the interesting survey paper on a variety of applications of the singular value decomposition in identification and signal processing by Joos Vandewalle and Bart De Moor (1988).
Two important books in the area of "SVD and Signal Processing" are: 1) SVD and Signal Processing, Algorithms, Applications and Architecture, edited by Ed. F. Deprettere, Elsevier Science Publishers B.V. (North Holland), Amsterdam, 1988; 2) SVD and Signal Processing II, Algorithms, Analysis and Applications, edited by R. Vaccaro, Elsevier, Amsterdam, 1991.
For applications to image processing, see the books Fundamentals of Digital Image Processing by A. K. Jain, Prentice Hall, Information and System Sciences Series, edited by Thomas Kailath, 1989, and Digital Image Processing by H. C. Andrews and B. R. Hunt, Prentice-Hall, Inc., 1977.
Many matrices arising in signal processing, control, and systems theory applications are structured, such as Hankel, Toeplitz, etc. Finding efficient and reliable algorithms for different types of computations that can exploit the structures of these matrices is a challenging problem for researchers. Much work has been done, and it is presently an active area of research. For some interesting papers in this context, see the book Linear Algebra in Signals, Systems, and Control, edited by B. N. Datta, et al., SIAM, 1988. See also the recent paper of Van Dooren (1990).
For some statistical applications, see the excellent survey papers by Golub (1969) and Hammarling (1985).
Discussions of mathematical applications of the SVD, such as finding the numerical rank, nearness to singularity, orthonormal bases for the range and null space, etc., are contained in all numerical linear algebra books. The books by Golub and Van Loan (MC) and Watkins (FMC), in particular, have treated these aspects very well. For discussions of singular value sensitivity, see Stewart (1979) and (1984).
For computational aspects of singular values and singular vectors, the original paper by Golub and Kahan (1965), and the recent papers by Demmel and Kahan (1990) and Fernando and Parlett (1992), are valuable. Codes for the Golub-Kahan-Reinsch method appear in Golub and Reinsch (1970), and Businger and Golub (1969).
Fortran codes for SVD computations also appear in Forsythe, Malcolm and Moler (CMMC), and in Lawson and Hanson (SLS).
The SVD Theorem (Theorem 10.2.1) has been generalized to a pair of matrices (A, B) by Van Loan (1976). This is known as the generalized singular value decomposition theorem. For the statement and a proof of this theorem, see Golub and Van Loan (MC 1984, pp. 318-319). For other papers in this area see Kagstrom (1985) and Paige and Saunders (1981).
For computational algorithms for the generalized SVD, see Stewart (1983), Van Loan (1985), and Paige (1986).
For further generalizations of the SVD, see the excellent paper by De Moor and Zha (1991).
For the SVD of the product of two matrices, see Heath, Laub, Paige and Ward (1986).
713
Exercises on Chapter 10
PROBLEMS ON SECTIONS 10.2 AND 10.3
1. (a) Let A be m x n, and let U and V be orthogonal. Then, from the definition of singular values, prove that the singular values of A and of U^T A V are the same.
(b) How are the singular vectors of A related to those of U^T A V?
(c) If m = n and A is symmetric positive definite, then prove that the eigenvalues of A are the same as its singular values.
2. An economic version of the SVD. Let A be m x n (m \ge n) and let rank(A) = r \le n. Then prove that
\[
A = V S U,
\]
where V is an m x r matrix with orthonormal columns, S is a nonsingular r x r diagonal matrix, and U is an r x n matrix with orthonormal rows. Why is this version called the Economic Version?
3. Let \sigma be a singular value of A with multiplicity \ell; that is, \sigma_i = \sigma_{i+1} = \cdots = \sigma_{i+\ell-1} = \sigma. Let U\Sigma V^T be the singular value decomposition of A. Then construct \tilde{U} and \tilde{V} such that \tilde{U}\Sigma(\tilde{V})^T is also a singular value decomposition of A.
4. (a) Derive Theorem 10.3.1 from Theorem 10.3.2.
(b) Given
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix},
\]
find the singular values \sigma_1 and \sigma_2 of A by computing the eigenvalues of A^T A. Then find the orthogonal matrix P such that
\[
P^T S P = \mathrm{diag}(\sigma_1, \sigma_2, -\sigma_1, -\sigma_2, 0),
\qquad \text{where } S = \begin{pmatrix} 0_{3\times 3} & A \\ A^T & 0_{2\times 2} \end{pmatrix}.
\]
714
5. Using the constructive proof of Theorem 10.2.1, find the SVD of the following matrices:
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}; \quad
A = \begin{pmatrix} 1 & 2 & 3 \end{pmatrix}; \quad
A = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}; \quad
A = \mathrm{diag}(1, 0, 2, 0, -5); \quad
A = \begin{pmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{pmatrix}, \text{ where } \epsilon \text{ is small.}
\]
6. (a) Find the rank, \|\cdot\|_2, \|\cdot\|_F, Cond(A), and orthonormal bases for the null space and the range of each of the matrices in problem #5. Find the orthogonal projections onto the range and the null space of A and onto their orthogonal complements.
(b) Prove that for an m x n matrix A:
i. rank(A^T A) = rank(AA^T) = rank(A) = rank(A^T);
ii. A^T A and AA^T have the same nonzero eigenvalues;
iii. if the eigenvectors u_1 and u_2 of A^T A are orthogonal, then Au_1 and Au_2 are orthogonal.
7. (a) Let A be an invertible matrix. Then show that Cond_2(A) = 1 if and only if A is a multiple of an orthogonal matrix.
(b) Let U be an orthogonal matrix. Then prove that
\[
\|AU\|_2 = \|A\|_2 \quad \text{and} \quad \|AU\|_F = \|A\|_F.
\]
(c) Let A be an m x n matrix. Using the SVD of A, prove that
i. Cond_2(A^T A) = (Cond_2(A))^2;
ii. \|A^T A\|_2 = \|A\|_2^2;
iii. Cond_2(A) = Cond_2(U^T A V), where U and V are orthogonal.
715
(d) Let rank(A_{m\times n}) = n, and let B_{m\times r} be a matrix obtained by deleting (n - r) columns from A. Then prove that Cond_2(B) \le Cond_2(A).
8. Prove that if A is an m x n matrix with rank r, and if B is another m x n matrix satisfying \|A - B\|_2 < \sigma_r, then B has at least rank r.
9. (a) Give a proof of Theorem 10.6.3.
(b) Given
\[
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix},
\]
find a rank-one matrix B closest to A in the 2-norm. What is \|B - A\|_2?
10. Let A and B be n x n real matrices. Let Q be an orthogonal matrix such that \|A - BQ\|_F \le \|A - BX\|_F for every orthogonal matrix X. Then prove that Q = UV^T, where B^T A = U\Sigma V^T.
11. Given
\[
A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 5 & 6 & 7 \end{pmatrix},
\]
and using the result of problem #10, show that out of all the orthogonal matrices X, the matrix
\[
Q = \begin{pmatrix} -0.2310 & -0.3905 & 0.8912 \\ -0.4824 & 0.8414 & 0.2436 \\ 0.8449 & 0.3736 & 0.3827 \end{pmatrix}
\]
is such that \|A - Q\|_F \le \|A - X\|_F. Find \|A - Q\|_F by computing the singular values of A - Q.
12. (a) Let
\[
A = \begin{pmatrix} 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{pmatrix}.
\]
Find the outer product expansion of A using the SVD of A.
(b) Compute (A^T A)^{-1} using the SVD of A.
716
PROBLEMS ON SECTION 10.8
13. Let
\[
D = \mathrm{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0), \qquad \sigma_i > 0,\ i = 1, \ldots, r.
\]
Then show that
\[
D^\dagger = \mathrm{diag}\!\left(\frac{1}{\sigma_1}, \ldots, \frac{1}{\sigma_r}, 0, \ldots, 0\right).
\]
14. Verify that the matrix A^\dagger = V\Sigma^\dagger U^T, where \Sigma^\dagger = \mathrm{diag}\!\left(\frac{1}{\sigma_j}\right) (with the convention that if \sigma_j = 0 we use \frac{1}{\sigma_j} = 0), is the pseudoinverse of A. (Check all four conditions of the definition of the pseudoinverse.)
15. For any nonzero matrix A, show that
(a) AA^\dagger v = v for any vector v in R(A);
(b) A^\dagger x = 0 for any x in N(A^T);
(c) (A^T)^\dagger = (A^\dagger)^T;
(d) (A^\dagger)^\dagger = A.
16. Let A be an m x n matrix. Show that
(a) if A has full column rank, then A^\dagger = (A^T A)^{-1} A^T;
(b) if A has full row rank, then A^\dagger = A^T (AA^T)^{-1}.
17. From the SVD of A, compute the singular value decompositions of the projection matrices P_1 = A^\dagger A, P_2 = I - A^\dagger A, P_3 = AA^\dagger, and P_4 = I - AA^\dagger. Also verify that each of these is a projection matrix.
717
18. Let
\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \\ 3 & 6 \end{pmatrix}, \qquad b = \begin{pmatrix} 3 \\ 6 \\ 9 \end{pmatrix}.
\]
Find the minimum 2-norm solution x to the least squares problem
\[
\min_x \|Ax - b\|_2.
\]
What is \|x\|_2? Obtain x and \|x\|_2 using both the SVD of A and the pseudoinverse of A.
19. Given
\[
u = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \qquad v = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix},
\]
find A^\dagger, where A = uv^T.
719
MATLAB PROGRAMS AND PROBLEMS ON CHAPTER 10
(i)
\[
A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 0 \end{pmatrix};
\]
(ii) A = a randomly generated matrix of order 10.
4. Compute the rank of each of the following matrices using (i) the MATLAB command rank (which uses the singular values of A), and (ii) the programs housqrp (Householder QR factorization with pivoting) and parpiv from Chapter 5; compare the results.
Test Data and the Experiment
(a) (Kahan Matrix)
\[
A = \mathrm{diag}(1, s, \ldots, s^{n-1})
\begin{pmatrix}
1 & -c & -c & \cdots & -c \\
0 & 1 & -c & \cdots & -c \\
\vdots & & \ddots & \ddots & \vdots \\
\vdots & & & \ddots & -c \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix},
\]
where c^2 + s^2 = 1, c, s > 0; take c = 0.2, s = 0.6, n = 100.
(b) A 15 x 10 matrix A created as follows: A = x*y', where
x = round(10*rand(15,1)),
y = round(10*rand(10,1)).
5. (a) Generate randomly a matrix A of order 6 x 4 by using the MATLAB command rand(6,4). Now run the following MATLAB command:
[U, S, V] = svd(A).
Set S(4,4) = 0 and compute B = U*S*V'. What is rank(B)?
(b) Construct a matrix C of rank 3 as follows:
C = qr(rand(3)).
Verify that ||C - A||_F^2 >= ||B - A||_F^2, using the MATLAB command norm to compute the Frobenius norm.
(c) What is the distance of B from A?
(d) Find a matrix D of rank 2 that is closest to A.
(This exercise is based on Theorems 10.6.2 and 10.6.3.)
6. Let
\[
A = \begin{pmatrix}
1 & 1 & 1 & 1 \\
0 & 0.0001 & 1 & 1 \\
0 & 0 & 0.0001 & 1 \\
0 & 0 & 0 & 1
\end{pmatrix}.
\]
Find the distance of A from the nearest singular matrix. Find a perturbation which will make A singular. Compare the size of this perturbation with the smallest singular value of A.
7. Let A = U\Sigma V^T be the singular value decomposition (SVD) of a randomly generated 15 x 10 matrix A = rand(15,10), obtained by using the MATLAB command [U, S, V] = svd(A).
Set S(8,8), S(9,9), and S(10,10) all equal to zero and compute B = U*S*V'.
Find the best approximation of the matrix B in the form
\[
B \approx \sum_{i=1}^{r} x_i y_i^T
\]
such that \|B - \sum_{i=1}^{r} x_i y_i^T\|_2 is minimum, where the x_i and y_i are vectors and r is the rank of B.
8. For the matrices A and B in problem #7, find a unitary matrix Q such that \|A - BQ\| is minimized.
(Hint: Q = VU^T, where A^T B = U\Sigma V^T.)
(Use the MATLAB command svd to solve this problem.)
9. (a) Write a MATLAB program, called covsvd, to compute (A^T A)^{-1} using the singular value decomposition.
Use it to find (A^T A)^{-1} for the 8 x 8 Hilbert matrix, and compare your results and flop-count with those obtained by running reganl from MATCOM.
(b) Using covsvd, find the pseudoinverse of A and compare your result with that obtained by running the MATLAB command pinv.
(A is the same as in part (a).)
10. Let A be the 10 x 10 Hilbert matrix and let b be a vector generated such that all entries of the vector x of Ax = b are equal to 1.
Solve Ax = b using the SVD of A, and compare the accuracy, flop-count, and elapsed time with those obtained by linsyspp from Chapter 6.
722
11. Let A = rand(10,3), and
X = pinv(A).
Verify, using MATLAB, that X satisfies all four conditions of the pseudoinverse: AXA = A, XAX = X, (AX)^T = AX, (XA)^T = XA.
12. Write a MATLAB program, called chansvd, to implement the improved SVD algorithm of Chan given in Section 10.9.3, using the MATLAB commands qr and svd:
[U, S, V] = chansvd(A).
Run your program with a randomly generated 30 x 5 matrix A = rand(30,5) and compare the flop-count and elapsed time with those obtained by using svd(A).
13. Write a MATLAB program, called bidiag, to bidiagonalize a matrix A (Section 10.9.2):
[B] = bidiag(A, tol),
where B is a bidiagonal matrix and tol is the tolerance.
Test your program using A = rand(15,10).
14. (The purpose of this exercise is to compare three related approaches for finding the small singular values of a bidiagonal matrix.) Write a MATLAB program to implement the Demmel-Kahan algorithm for computing the singular values of a bidiagonal matrix:
[s] = dksvd(A),
where s is the vector containing the singular values of A.
Experiment:
Set A = rand(15,10). Compute [U, S, V] = svd(A). Set S(10,10) = S(9,9) = S(8,8) = 10^{-5}. Compute B = U*S*V'.
Run the program bidiag on B:
C = bidiag(B).
Now compute the singular values of C using (i) svd, (ii) dksvd, and (iii) chansvd, and compare the results with respect to accuracy (especially for the smallest singular values), flop-count, and elapsed time.
723
11. A TASTE OF ROUND-OFF ERROR ANALYSIS
11.1 Basic Laws of Floating Point Arithmetic ........................................ 724
11.2 Backward Error Analysis for Forward Elimination and Back Substitutions ........... 724
11.3 Backward Error Analysis for Triangularization by Gaussian Elimination ............ 727
11.4 Backward Error Analysis for Solving Ax = b ....................................... 734
11.5 Review and Summary ............................................................... 736
11.6 Suggestions for Further Reading .................................................. 737
CHAPTER 11
A TASTE OF ROUND-OFF ERROR ANALYSIS
Here we give the readers a taste of round-off error analysis in matrix computations by giving backward error analyses of some basic computations: the solution of triangular systems, LU factorization using Gaussian elimination, and the solution of a linear system.
Let us recall that by backward error analysis we mean an analysis that shows that the solution computed by an algorithm is the exact solution of a perturbed problem. When the perturbed problem is close to the original problem, we say that the algorithm is backward stable.
11.2 Backward Error Analysis for Forward Elimination and Back Substitutions
Case 1. Lower Triangular System
Consider solving the lower triangular system
\[
Ly = b, \tag{11.2.1}
\]
where
\[
L = (l_{ij}), \qquad b = (b_1, \ldots, b_n)^T, \tag{11.2.2}
\]
and
\[
y = (y_1, \ldots, y_n)^T.
\]
Step 1.
\[
\hat{y}_1 = \mathrm{fl}\!\left(\frac{b_1}{l_{11}}\right) = \frac{b_1}{l_{11}(1+\delta_1)},
\qquad\text{where } |\delta_1| \le \mu;
\]
that is,
\[
\hat{l}_{11}\hat{y}_1 = b_1, \qquad\text{where } \hat{l}_{11} = l_{11}(1+\delta_1).
\]
This shows that \hat{y}_1 is the exact solution of an equation whose coefficient is a number close to l_{11}.
Step 2.
Similarly,
\[
\hat{y}_2 = \mathrm{fl}\!\left(\frac{b_2 - l_{21}\hat{y}_1}{l_{22}}\right)
= \frac{\bigl(b_2 - l_{21}\hat{y}_1(1+\delta_{21})\bigr)(1+\delta_{22})}{l_{22}(1+\delta_2)}. \tag{11.2.3}
\]
That is,
\[
l_{21}(1+\epsilon_{21})\hat{y}_1 + l_{22}(1+\epsilon_{22})\hat{y}_2 = b_2,
\]
where \epsilon_{21} and \epsilon_{22} are small quantities built up from the individual rounding errors \delta_{21}, \delta_{22} and \delta_2 (neglecting products of two such errors, which are small). Thus, we can say that \hat{y}_1 and \hat{y}_2 satisfy
\[
\hat{l}_{21}\hat{y}_1 + \hat{l}_{22}\hat{y}_2 = b_2, \tag{11.2.5}
\]
where \hat{l}_{2j} = l_{2j}(1+\epsilon_{2j}), j = 1, 2.
725
Step k.
The above can easily be generalized, and we can say that at the kth step the unknowns \hat{y}_1 through \hat{y}_k satisfy
\[
\hat{l}_{k1}\hat{y}_1 + \hat{l}_{k2}\hat{y}_2 + \cdots + \hat{l}_{kk}\hat{y}_k = b_k, \tag{11.2.6}
\]
where \hat{l}_{kj} = l_{kj}(1+\epsilon_{kj}), j = 1, \ldots, k. The process can be continued until k = n.
Thus, we see that the computed \hat{y}_1 through \hat{y}_n satisfy the following perturbed triangular system:
\[
\begin{aligned}
\hat{l}_{11}\hat{y}_1 &= b_1 \\
\hat{l}_{21}\hat{y}_1 + \hat{l}_{22}\hat{y}_2 &= b_2 \\
&\;\;\vdots \\
\hat{l}_{n1}\hat{y}_1 + \hat{l}_{n2}\hat{y}_2 + \cdots + \hat{l}_{nn}\hat{y}_n &= b_n.
\end{aligned}
\]
Knowing the bounds for the \delta's, the bounds for the \epsilon's can easily be computed. For example, if n is small enough so that n\mu < \frac{1}{100}, then |\epsilon_{kj}| \le 1.06\,(k - j + 2)\,\mu (see Chapter 2, Section 2.3).
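For reference, the forward-elimination loop whose rounding errors are tracked above is simply the following (a minimal MATLAB sketch; forelm.m in MATCOM is the book's version):

% Solve the lower triangular system L*y = b by forward elimination.
n = length(b);
y = zeros(n,1);
for k = 1:n
    y(k) = (b(k) - L(k,1:k-1)*y(1:k-1)) / L(k,k);
end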
11.3 Backward Error Analysis for Triangularization by Gaussian Elimination
Let A^{(k)} denote the matrix obtained at the kth step of Gaussian elimination:
\[
A^{(k)} = \begin{pmatrix}
a_{11}^{(k)} & \cdots & \cdots & \cdots & a_{1n}^{(k)} \\
 & \ddots & & & \vdots \\
 & & a_{kk}^{(k)} & \cdots & a_{kn}^{(k)} \\
 & & \vdots & & \vdots \\
 & & a_{nk}^{(k)} & \cdots & a_{nn}^{(k)}
\end{pmatrix}. \tag{11.3.1}
\]
The final matrix A^{(n-1)} is triangular. We shall assume that the quantities a_{ij}^{(k)} are the computed numbers.
First, let's analyze the computation of the entries of A^{(1)} from A in the first step. Let the computed multipliers be
\[
\hat{m}_{i1}, \qquad i = 2, 3, \ldots, n.
\]
Then
\[
\hat{m}_{i1} = \mathrm{fl}\!\left(\frac{a_{i1}}{a_{11}}\right) = \frac{a_{i1}}{a_{11}}(1+\delta_{i1}),
\qquad\text{where } |\delta_{i1}| \le \mu. \tag{11.3.2}
\]
727
Thus, the error e_{i1}^{(0)} in setting a_{i1}^{(1)} to zero is given by
\[
e_{i1}^{(0)} = a_{i1}^{(1)} - a_{i1} + \hat{m}_{i1} a_{11}
= 0 - a_{i1} + \frac{a_{i1}}{a_{11}}(1+\delta_{i1})a_{11}
= \delta_{i1}\, a_{i1}. \tag{11.3.3}
\]
Let us now compute the errors in computing the other elements a_{ij}^{(1)} of A^{(1)}. The computed elements are
\[
a_{ij}^{(1)} = \mathrm{fl}\!\bigl(a_{ij} - \mathrm{fl}(\hat{m}_{i1} a_{1j})\bigr)
= \bigl[a_{ij} - \hat{m}_{i1} a_{1j}(1+\alpha_{ij}^{(1)})\bigr](1+\beta_{ij}^{(1)}), \qquad i, j = 2, \ldots, n,
\]
where
\[
|\alpha_{ij}^{(1)}| \le \mu, \qquad |\beta_{ij}^{(1)}| \le \mu.
\]
In matrix form, the first step can therefore be written as
\[
A^{(1)} = A - L^{(0)}A + E^{(0)}, \tag{11.3.6}
\]
where
\[
L^{(0)} = \begin{pmatrix}
0 & 0 & \cdots & 0 \\
\hat{m}_{21} & 0 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
\hat{m}_{n1} & 0 & \cdots & 0
\end{pmatrix}, \qquad
E^{(0)} = \begin{pmatrix}
0 & 0 & \cdots & 0 \\
e_{21}^{(0)} & e_{22}^{(0)} & \cdots & e_{2n}^{(0)} \\
\vdots & \vdots & & \vdots \\
e_{n1}^{(0)} & e_{n2}^{(0)} & \cdots & e_{nn}^{(0)}
\end{pmatrix}.
\]
Similarly, for the second step,
\[
A^{(2)} = A^{(1)} - L^{(1)}A^{(1)} + E^{(1)}, \tag{11.3.7}
\]
where L^{(1)} and E^{(1)} are similarly defined.
728
Substituting (11.3.6) into (11.3.7), we have
\[
A^{(2)} = A^{(1)} - L^{(1)}A^{(1)} + E^{(1)}
= A - L^{(0)}A + E^{(0)} - L^{(1)}A^{(1)} + E^{(1)}. \tag{11.3.8}
\]
Since
\[
L^{(k-1)} = \begin{pmatrix}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & \hat{m}_{k+1,k} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & \hat{m}_{n,k} & 0 & \cdots & 0
\end{pmatrix},
\]
whose only nonzero entries lie in one column below the diagonal, we have
\[
L^{(k)}A^{(k)} = L^{(k)}A^{(n-1)}, \qquad k = 0, 1, 2, \ldots, n-2. \tag{11.3.9}
\]
Thus, from (11.3.8) and (11.3.9), continuing the substitution through all n-1 steps and noting that
\[
I + L^{(0)} + L^{(1)} + \cdots + L^{(n-2)} =
\begin{pmatrix}
1 & & & \\
\hat{m}_{21} & 1 & & \\
\hat{m}_{31} & \hat{m}_{32} & \ddots & \\
\vdots & \vdots & \ddots & \ddots \\
\hat{m}_{n1} & \hat{m}_{n2} & \cdots & \hat{m}_{n,n-1} & 1
\end{pmatrix}
= \hat{L} \quad (\text{the computed } L),
\]
we obtain
\[
A + E = \hat{L}\hat{U}, \qquad E = E^{(0)} + E^{(1)} + \cdots + E^{(n-2)}, \tag{11.3.13}
\]
where the entries of the matrices E^{(0)}, \ldots, E^{(n-2)} are given by:
\[
E^{(k-1)} = \begin{pmatrix}
0 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \cdots & 0 & e_{k+1,k}^{(k-1)} & \cdots & e_{k+1,n}^{(k-1)} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & e_{n,k}^{(k-1)} & \cdots & e_{n,n}^{(k-1)}
\end{pmatrix}, \qquad k = 1, 2, \ldots, n-1, \tag{11.3.14}
\]
with
\[
e_{i,k}^{(k-1)} = \delta_{ik}\, a_{i,k}^{(k-1)}, \tag{11.3.15}
\]
\[
e_{ij}^{(k-1)} = a_{ij}^{(k)}\,\frac{\beta_{ij}^{(k)}}{1+\beta_{ij}^{(k)}} - \hat{m}_{ik}\, a_{kj}^{(k-1)}\, \alpha_{ij}^{(k)}, \qquad i, j = k+1, \ldots, n, \tag{11.3.16}
\]
and
\[
|\delta_{ik}| \le \mu, \qquad |\alpha_{ij}^{(k)}| \le \mu, \tag{11.3.17}
\]
\[
|\beta_{ij}^{(k)}| \le \mu. \tag{11.3.18}
\]
We formalize the above discussion in the following theorem:
730
Theorem 11.3.1 The computed upper and lower triangular matrices \hat{U} and \hat{L} produced by Gaussian elimination satisfy
\[
A + E = \hat{L}\hat{U},
\]
where \hat{U} = A^{(n-1)} and \hat{L} is the lower triangular matrix of computed multipliers:
\[
\hat{L} = \begin{pmatrix}
1 & & & \\
\hat{m}_{21} & 1 & & \\
\vdots & \ddots & \ddots & \\
\hat{m}_{n1} & \cdots & \hat{m}_{n,n-1} & 1
\end{pmatrix}.
\]
For example, in low-precision arithmetic one obtains
\[
\hat{m}_{21} = \mathrm{fl}\!\left(\frac{0.11}{0.21}\right) = 0.52, \qquad
a_{23}^{(1)} = \mathrm{fl}(0.22 - 0.52 \times 0.11) = 0.16,
\]
\[
\hat{m}_{31} = \mathrm{fl}\!\left(\frac{0.33}{0.21}\right) = 1.57, \qquad
a_{32}^{(1)} = \mathrm{fl}(0.22 - 1.57 \times 0.35) = -0.33,
\]
\[
\begin{aligned}
e_{21}^{(0)} &= 0 - [0.11 - 0.52 \times 0.21] = -0.0008, \\
e_{22}^{(0)} &= 0.63 - [0.81 - 0.52 \times 0.35] = 0.0020, \\
e_{23}^{(0)} &= 0.16 - [0.22 - 0.52 \times 0.11] = -0.0028, \\
e_{31}^{(0)} &= 0 - [0.33 - 1.57 \times 0.21] = -0.0003, \\
e_{32}^{(0)} &= -0.33 - [0.22 - 1.57 \times 0.35] = -0.0005, \\
e_{33}^{(0)} &= 0.22 - [0.39 - 1.57 \times 0.11] = 0.0027,
\end{aligned}
\]
\[
E^{(0)} = \begin{pmatrix} 0 & 0 & 0 \\ -0.0008 & 0.0020 & -0.0028 \\ -0.0003 & -0.0005 & 0.0027 \end{pmatrix};
\]
\[
e_{32}^{(1)} = 0 - [-0.33 + 0.52 \times 0.63] = 0.0024, \qquad
e_{33}^{(1)} = 0.30 - [0.22 + 0.52 \times 0.16] = -0.0032,
\]
\[
E^{(1)} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0.0024 & -0.0032 \end{pmatrix}.
\]
Thus
\[
E = E^{(0)} + E^{(1)} = \begin{pmatrix} 0 & 0 & 0 \\ -0.0008 & 0.0020 & -0.0028 \\ -0.0003 & 0.0019 & -0.0005 \end{pmatrix}.
\]
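Theorem 11.3.1 is also easy to observe numerically by computing the factorization in a lower precision and measuring E in a higher one. A small MATLAB sketch (our own illustration; note that MATLAB's lu uses partial pivoting, so the permutation is folded into the computed L):

A = rand(6);                          % test matrix
[L, U] = lu(single(A));               % LU factorization computed in single precision
E = double(L)*double(U) - A;          % backward error E with A + E = L*U
relerr = norm(E, inf)/norm(A, inf)    % typically a modest multiple of eps('single')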
The growth factor \rho is defined by
\[
\rho = \frac{\max_{i,j,k} |a_{ij}^{(k)}|}{\max_{i,j} |a_{ij}|}.
\]
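The growth factor is easy to monitor while the elimination is carried out; a minimal sketch without pivoting (our own code, not MATCOM's lugsel.m, assuming all pivots are nonzero):

% Growth factor rho = max_k max_ij |a_ij^(k)| / max_ij |a_ij|
function rho = growth_sketch(A)
n = size(A,1);
amax = max(max(abs(A)));
rho = 1;
for k = 1:n-1
    for i = k+1:n
        m = A(i,k)/A(k,k);
        A(i,k:n) = A(i,k:n) - m*A(k,k:n);
    end
    rho = max(rho, max(max(abs(A)))/amax);
end
end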
732
Let a = \max_{i,j} |a_{ij}|. Then from (11.3.14) and (11.3.15), we have
\[
|e_{ik}^{(k-1)}| \le \rho\, a\, \mu, \qquad k = 1, 2, \ldots, n-1, \quad i = k+1, \ldots, n,
\]
and, for i, j = k+1, \ldots, n,
\[
|e_{ij}^{(k-1)}| \le \frac{\mu}{1-\mu}\,|a_{ij}^{(k)}| + \mu\,|a_{kj}^{(k-1)}| \le \frac{2\mu}{1-\mu}\,\rho\, a \qquad (\text{since } |\hat{m}_{ik}| \le 1).
\]
Denote \frac{\mu}{1-\mu} by \bar{\mu}. Then
\[
|E| = |E^{(0)} + \cdots + E^{(n-2)}| \le |E^{(0)}| + \cdots + |E^{(n-2)}|
\]
\[
\le \bar{\mu}\,\rho\, a
\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
1 & 2 & 2 & \cdots & 2 \\
1 & 2 & 2 & \cdots & 2 \\
\vdots & & & & \vdots \\
1 & 2 & 2 & \cdots & 2
\end{pmatrix}
+ \bar{\mu}\,\rho\, a
\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 2 & \cdots & 2 \\
\vdots & & & & \vdots \\
0 & 1 & 2 & \cdots & 2
\end{pmatrix}
+ \cdots
\]
\[
\le \bar{\mu}\,\rho\, a
\begin{pmatrix}
0 & 0 & 0 & \cdots & 0 \\
1 & 2 & 2 & \cdots & 2 \\
1 & 3 & 4 & \cdots & 4 \\
\vdots & & & \ddots & \vdots \\
1 & 3 & 5 & \cdots & 2n-2
\end{pmatrix}. \tag{11.3.19}
\]
Remark: The inequalities (11.3.19) hold element-wise. We can immediately obtain a bound in terms of norms; for example, taking row sums gives \|E\|_\infty \le (n^2 - 1)\,\bar{\mu}\,\rho\, a.
733
11.4 Backward Error Analysis for Solving Ax = b
We are now ready to give a backward round-off error analysis for solving Ax = b using triangularization by Gaussian elimination, followed by forward elimination and back substitution.
First, from Theorem 11.3.1, we know that triangularization of A using Gaussian elimination yields \hat{L} and \hat{U} such that A + E = \hat{L}\hat{U}.
These \hat{L} and \hat{U} are then used to solve
\[
\hat{L}y = b, \qquad \hat{U}x = y.
\]
From Theorem 11.2.1 and Theorem 11.2.2, we know that the computed solutions \hat{y} and \hat{x} of the above two triangular systems satisfy
\[
(\hat{L} + \Delta L)\hat{y} = b
\quad\text{and}\quad
(\hat{U} + \Delta U)\hat{x} = \hat{y}.
\]
From these equations we have
\[
(\hat{U} + \Delta U)\hat{x} = (\hat{L} + \Delta L)^{-1} b,
\]
or
\[
(\hat{L} + \Delta L)(\hat{U} + \Delta U)\hat{x} = b,
\]
or
\[
(A + F)\hat{x} = b, \tag{11.4.1}
\]
where
\[
F = E + (\Delta L)\hat{U} + \hat{L}(\Delta U) + (\Delta L)(\Delta U). \tag{11.4.2}
\]
(Note that A + E = \hat{L}\hat{U}.)
Bounds for F
From (11.4.2) we have
\[
\|F\|_\infty \le \|E\|_\infty + \|\Delta L\|_\infty \|\hat{U}\|_\infty + \|\hat{L}\|_\infty \|\Delta U\|_\infty + \|\Delta L\|_\infty \|\Delta U\|_\infty.
\]
We now obtain bounds for \|\hat{L}\|_\infty, \|\hat{U}\|_\infty, \|\Delta L\|_\infty and \|\Delta U\|_\infty. Since
\[
\hat{L} = \begin{pmatrix}
1 & & & \\
\hat{m}_{21} & \ddots & & \\
\vdots & & \ddots & \\
\hat{m}_{n1} & \cdots & \hat{m}_{n,n-1} & 1
\end{pmatrix},
\]
from (11.2.8) we obtain
\[
|\Delta L| \le 1.06\,\mu \begin{pmatrix}
2 & & & \\
3|\hat{m}_{21}| & 2 & & \\
\vdots & & \ddots & \\
(n+1)|\hat{m}_{n1}| & \cdots & 3|\hat{m}_{n,n-1}| & 2
\end{pmatrix}, \tag{11.4.3}
\]
\[
\|\hat{L}\|_\infty \le n, \tag{11.4.4}
\]
\[
\|\Delta L\|_\infty \le \frac{n(n+1)}{2}\,(1.06)\,\mu. \tag{11.4.5}
\]
Similarly,
\[
\|\hat{U}\|_\infty \le n\,\rho\, a \tag{11.4.6}
\]
(note that \hat{U} = A^{(n-1)}), and
\[
\|\Delta U\|_\infty \le \frac{n(n+1)}{2}\,(1.06)\,\rho\, a\,\mu \tag{11.4.7}
\]
(note that \max_{i,j} |u_{ij}| \le \rho a). Combining these bounds (and neglecting the second-order term), we obtain
\[
\|F\|_\infty \le 1.06\,(n^3 + 3n^2)\,\rho\,\mu\,\|A\|_\infty. \tag{11.4.11}
\]
The above discussion leads to the following theorem.
735
Theorem 11.4.1 The computed solution \hat{x} to the linear system
\[
Ax = b,
\]
using Gaussian elimination with partial pivoting, satisfies a perturbed system
\[
(A + F)\hat{x} = b,
\]
where F is defined by (11.4.2) and \|F\|_\infty \le 1.06\,(n^3 + 3n^2)\,\rho\,\mu\,\|A\|_\infty.
Remarks:
1. The above bound for F is grossly overestimated; in practice, this bound is very rarely attained. Wilkinson (AEP) states that in practice \|F\|_\infty is usually less than or equal to n\,\mu\,\|A\|_\infty.
2. Making use of (11.2.8), (11.2.10), and (11.3.17), we can also obtain an element-wise bound for F. (Exercise)
To repeat, these results say that the forward elimination and back substitution methods for triangular systems are backward stable, whereas the stability of the Gaussian elimination process for LU factorization, and therefore of the solution of the linear system Ax = b using this process, depends upon the size of the growth factor \rho.
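In practice one rarely forms F explicitly. A common numerical check of backward stability (our own illustration, not the bound (11.4.11) itself) is the normwise relative backward error ||b - A*xhat|| / (||A|| ||xhat||), which is small precisely when the computed solution solves a nearby system:

A = hilb(8);  b = ones(8,1);          % an ill-conditioned test problem
xhat = A\b;                           % Gaussian elimination with partial pivoting
backerr = norm(b - A*xhat, inf)/(norm(A, inf)*norm(xhat, inf))
% backerr is a modest multiple of eps even though A is very ill-conditioned;
% backward stability does not imply that xhat itself is accurate.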
737
Exercises on Chapter 11
1. Using \beta = 10 and t = 2, compute the LU decomposition using Gaussian elimination (without pivoting) for the following matrices, and find the error matrix E in each case such that A + E = LU:
\[
\text{(a) } \begin{pmatrix} 3 & 4 \\ 5 & 6 \end{pmatrix}; \quad
\text{(b) } \begin{pmatrix} 0.25 & 0.79 \\ 0.01 & 0.12 \end{pmatrix}; \quad
\text{(c) } \begin{pmatrix} 10 & 9 \\ 8 & 5 \end{pmatrix}; \quad
\text{(d) } \begin{pmatrix} 4 & 1 \\ 1 & 2 \end{pmatrix}; \quad
\text{(e) } \begin{pmatrix} 0.01 & 0.05 \\ 0.03 & 0.01 \end{pmatrix}.
\]
2. Suppose now that partial pivoting has been used in computing the LU factorization of each of the matrices of problem #1. Find again the error matrix E in each case, and compare the bounds on the entries of E predicted by (11.3.12) with the actual errors.
3. Consider the problem of solving the linear systems
\[
Ax = b,
\]
using Gaussian elimination with partial pivoting, with each of the matrices from problem #1 and taking b = \begin{pmatrix} 1 \\ 1 \end{pmatrix} in each case. Find F in each case such that the computed solution \hat{x} satisfies
\[
(A + F)\hat{x} = b.
\]
Compare the bounds predicted by (11.4.11) with the actual errors.
4. Making use of (11.2.8), (11.2.10), and (11.3.13)-(11.3.17), find an element-wise bound for F in Theorem 11.4.1, where
\[
(A + F)\hat{x} = b.
\]
5. From Theorems 11.2.1 and 11.2.2, show that the processes of forward elimination and back substitution for lower and upper triangular systems, respectively, are backward stable.
6. From (11.3.18), conclude that the backward stability of Gaussian elimination is essentially determined by the size of the growth factor \rho.
738
7. Consider the problem of evaluating the polynomial
\[
p(\lambda) = a_n\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_0
\]
by synthetic division:
\[
p_n = a_n, \qquad p_{i-1} = \mathrm{fl}(\lambda\, p_i + a_{i-1}), \quad i = n, n-1, \ldots, 1.
\]
Then p_0 = p(\lambda). Show that the computed value satisfies
\[
p_0 = a_n\lambda^n(1+\epsilon_n) + a_{n-1}\lambda^{n-1}(1+\epsilon_{n-1}) + \cdots + a_0(1+\epsilon_0).
\]
Find a bound for each \epsilon_i, i = 0, 1, \ldots, n. What can you say about the backward stability of this process of polynomial evaluation?
739
A. A BRIEF INTRODUCTION TO MATLAB
A.1 Some Basic Information on MATLAB : : : : : : : : : : : : : : : : : : : : : : 740
A.1.1 What is MATLAB ? : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 740
A.1.2 Entering and Exiting MATLAB : : : : : : : : : : : : : : : : : : : : : : 740
A.1.3 Two most important commands: HELP and DEMO : : : : : : : : : 740
A.1.4 Most frequently used MATLAB operations and functions : : : : : 741
A.1.5 Numerical Examples : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 743
A.2 Writing Your Own Programs Using MATLAB Commands : : : : : : : : : 746
A.2.1 Some Relational Operators in MATLAB : : : : : : : : : : : : : : : : 746
A.2.2 Some Matrix Building Functions : : : : : : : : : : : : : : : : : : : : : 747
A.2.3 Colon Notation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 747
A.2.4 for, while, if commands : : : : : : : : : : : : : : : : : : : : : : : : : : : 748
A.2.5 Computing Flop-Count and Elapsed Time of An Algorithm. : : : : 748
A.2.6 Saving a MATLAB Program : : : : : : : : : : : : : : : : : : : : : : : : 748
A.2.7 Getting a Hard Copy : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 748
A.2.8 Examples of Some Simple MATLAB Programs : : : : : : : : : : : : 749
A.2.9 Use of `diary' Command and Printing The Output : : : : : : : : : : 750
APPENDIX A
A BRIEF INTRODUCTION TO MATLAB
A. A BRIEF INTRODUCTION TO MATLAB
A.1 Some Basic Information on MATLAB
A.1.1 What is MATLAB ?
MATLAB stands for MATrix LABoratory. It is an interactive software package for solving problems arising in scientific and engineering computations. MATLAB contains programs for all fundamental matrix computations such as solutions of linear systems, various matrix factorizations, solutions of least squares problems, eigenvalue and eigenvector computations, singular value and singular vector computations, etc. It was developed by Cleve Moler. The most current version contains programs for many other types of computations, including 2-D and 3-D graphics capabilities.
Example A.1.1
>> help norm
For vectors..
NORM(V,P) = sum(abs(V)^P)^(1/P).
NORM(V) = norm(V,2).
NORM(V,inf) = max(abs(V)).
NORM(V,-inf) = min(abs(V)).
Demo teaches you how to use the MATLAB functions, for example how to enter values into a matrix, how to find its transpose, how to find the rank of a matrix, etc.
741
rcond - LINPACK reciprocal condition estimator.
rank - Number of linearly independent rows or columns.
det - Determinant.
trace - Sum of diagonal elements.
null - Null space.
orth - Orthogonalization.
rref - Reduced row echelon form.
Linear equations.
\ and / - Linear equation solution; use "help slash".
chol - Cholesky factorization.
lu - Factors from Gaussian elimination.
inv - Matrix inverse.
qr - Orthogonal-triangular decomposition.
qrdelete - Delete a column from the QR factorization.
qrinsert - Insert a column in the QR factorization.
pinv - Pseudoinverse.
lscov - Least squares in the presence of known covariance.
Using help you can get information on any of the above routines. Here is an example.
Example A.1.2
>> help lu
742
LU Factors from Gaussian elimination.
[L,U] = LU(X) stores an upper triangular matrix in U and a
"psychologically lower triangular matrix", i.e. a product
of lower triangular and permutation matrices, in L, so
that X = L*U.
A =
1 3 5
2 4 6
1 3 9
b =
1
1
1
x =
743
-0.5000
0.5000
0
ans =
ans =
-0.3246
12.3246
2.0000
ans =
To nd the rank of A
744
>> rank(A)
ans =
ans =
13.3538
ans =
42.1539
q =
r =
745
To compute the lu factorization of A using partial pivoting.
>> [l,u] = lu(A)
l =
0.5000 1.0000 0
1.0000 0 0
0.5000 1.0000 1.0000
u =
2 4 6
0 1 2
0 0 4
746
>= greater than or equal to
== equal
Examples
1. rand(5,3) will create a 5 x 3 randomly generated matrix.
2. If x is a vector, diag(x) is the diagonal matrix with the entries of x on the diagonal; diag(A) is the vector consisting of the diagonal of A.
3. hilb(5) will create the 5 x 5 Hilbert matrix.
4. max(A(:,2)) will give the maximum value of the second column of A; max(max(A)) will give the maximum entry of the whole matrix A.
747
A.2.4 for, while, if commands
These commands are most useful in writing MATLAB programs for matrix algorithms. The uses of these commands will be illustrated in the following examples.
A.2.5 Computing Flop-Count and Elapsed Time of An Algorithm
The command sequence flops(0); x = a\b; flops will give the total flops needed to solve a linear system with a given matrix a and a vector b.
The function cputime returns the CPU time in seconds that has been used by the MATLAB process since the MATLAB process started. For example:
t = cputime; your operation; cputime - t
returns the cputime to run your operation. Since the PC version of MATLAB does not have a cputime function, MATCOM contains a program cputime.m. If the version of MATLAB that you use contains the cputime function, then you can delete the cputime.m program from MATCOM.
748
A.2.8 Examples of Some Simple MATLAB Programs
Example A.2.1
The following code will make the elements below the diagonal of the 4 x 4 matrix A = a(i,j) zero.
a = rand(4,4)
for i = 1:4
for j = 1:4
if i > j
a(i,j) = 0
end;
end;
end;
Example A.2.2
The following code will create a matrix A such that the (i; j )-th entry of the matrix A = a(i; j ) is
(i + j ).
a= zeros(4,4)
for i = 1:4
for j = 1:4
a(i,j) = i+j
end;
end;
Example A.2.3
The following MATLAB program computes the matrix-matrix product of two upper triangular
matrices. This program will be called matmat.m
% Matrix-Matrix product with upper triangular matrices
% input U and V two upper triangular matrices of order n
% output C = U * V
% function C = matmat(U,V)
749
function C = matmat(U,V)
[n,m] = size(U)
C = zeros(n,n)
for i = 1:n
for j = i:n
for k = i:j
C(i,j) = C(i,j) + U(i,k) * V(k,j)
end;
end;
end;
end;
750
The diary command can be used to create a listing.
Example:
>> diary B:diary8      <----- turns the diary on
>> C = matmat(U,V)     <----- execute your program
>> diary off           <----- turns the diary off
The above commands will store all the output produced by the program matmat.m in the file diary8. Only the data which is printed on the screen is stored in diary8. This file can then be printed. In case you do not want the output to be printed on the screen, place a semicolon at the end of the line.
To write a comment line, put the % sign in the first column. Examples of some M-files are given in Appendix B.
751
B. MATLAB AND SELECTED MATLAB PROGRAMS
B.1 MATCOM and Some Selected Programs From MATCOM : : : : : : : : : 752
B.1.1 What is MATCOM ? : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 752
B.2 How To Use MATCOM ? : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 752
B.3 Chapter-wise Listing Of MATCOM Programs : : : : : : : : : : : : : : : : : 755
B.4 Some Selected MATCOM Programs : : : : : : : : : : : : : : : : : : : : : : : 759
APPENDIX B
MATCOM AND SELECTED MATLAB PROGRAMS FROM
MATCOM
B. MATLAB AND SELECTED MATLAB PROGRAMS
B.1 MATCOM and Some Selected Programs From MATCOM
B.1.1 What is MATCOM ?
MATCOM is a MATLAB-based interactive software package containing implementations of all the major algorithms of Chapters 3 through 8 of the book 'Numerical Linear Algebra and Applications' by Prof. Biswa Nath Datta.
For each problem considered in this book, there is more than one (in some cases several) algorithm.
By using the programs in MATCOM the students will be able to compare different algorithms for the same problem with respect to accuracy, flop-count, elapsed time, etc. The students will be able to verify the statements about the goodness and badness of different algorithms. In particular, they will be able to distinguish between a good and a bad algorithm. MATCOM has been written by Conrad Fernandes, a graduate student of Professor Datta.
To use a program in MATLAB, you need to know its input and output variables. To find out these variables, type
>> help housmat
The input is a vector x and the outputs are a matrix H and a vector u.
To execute the program you have to do the following:
1. create the input vector x;
2. then type
[u,H] = housmat(x)
As output you will get the Householder matrix H such that Hx is a multiple of e1, and the vector u out of which the Householder matrix H has been formed.
>> x = rand(4,1)
x =
0.2190
0.0470
0.6789
0.6793
u =
1.7740
0.0693
0.9994
1.0000
H =
753
-0.6884 -0.0269 0.6122 -0.3880
-0.6888 -0.0269 -0.3880 0.6117
754
B.3 Chapter-wise Listing Of MATCOM Programs
CHAPTER 3
Title Program Name Number
Back substitution backsub.m 3.1.3
The Inverse of an Upper Triangular Matrix invuptr.m 3.1.4
Basic Gaussian Elimination gauss.m 3.1.5
CHAPTER 4
Title Program Name Number
Computing (I - 2(u u^T)/(u^T u)) A    housmul.m    4.2.1
Computing A (I - 2(u u^T)/(u^T u))    houspostmul.m    4.2.1 (sec. num.)
CHAPTER 5
Title Program Name Number
Triangularization Using Gaussian elimination lugsel.m 5.2.2
Without Pivoting
Triangularization Using Gaussian Elimination parpiv.m 5.2.3
With Partial Pivoting
Triangularization Using Gaussian Elimination
With Complete Pivoting compiv.m 5.2.4
Creating Zeros in a Vector With a Householder
Matrix housmat.m 5.4.1
Householder QR Factorization housqr.m 5.4.2
Householder QR Factorization for a
Nonsquare Matrix housqrnon.m 5.4.2
Householder Hessenberg Reduction houshess.m 5.4.3
Creating Zeros in a Vector
Using Givens Rotations givcs.m and givrot.m 5.5.1 (sec. num.)
Creating Zeros in a Specified Position
of a Matrix Using Givens Rotations givrota.m 5.5.1
QR factorization Using Givens Rotations givqr.m 5.5.2
Givens Hessenberg Reduction givhs.m 5.5.3
755
CHAPTER 6
Forward Elimination forelm.m 6.4.1
Solving Ax = b with Partial Pivoting gausswf 6.4.3
without Explicit Factorization
Cholesky Algorithm choles.m 6.4.4
Sherman Morrison Formula shermor.m 6.5.2 (sec. num.)
Inverse by LU Factorization inlu.m 6.5.3 (sec. num.)
without Pivoting
Inverse by Partial Pivoting inparpiv.m 6.5.3 (sec. num.)
Inverse by Complete Pivoting incompiv.m 6.5.3 (sec. num.)
Hager's norm-1 condition number estimator hagnormin1.m 6.7.1
Iterative Refinement iterref.m 6.9.1
The Jacobi Method jacobi.m 6.10.1
The Gauss-Seidel Method gaused.m 6.10.2
The Successive Overrelaxation Method sucov.m 6.10.3
The Basic Conjugate Gradient Algorithm congrd.m 6.10.4
Incomplete Cholesky Factorization icholes.m 6.10.6
No-Fill Incomplete LDLT nichol.m 6.10.7
756
CHAPTER 7
Title Program Name Number
Least Squares Solution Using Normal Equations lsfrnme.m 7.8.1
The Householder-Golub Method For the Full-Rank lsfrqrh.m 7.8.2
Least Squares Problem
Classical Gram-Schmidt for QR factorization clgrsch.m 7.8.3
Modified Gram-Schmidt for QR factorization mdgrsch.m 7.8.4
Least Squares Solution by MGS lsfrmgs.m 7.8.5
Least Squares Solution for the
Rank-Deficient Problem Using QR lsrdqrh.m 7.8.6
Minimum Norm Solution for the
Full-Rank Underdetermined Problem Using lsudnme.m 7.9.1
Normal Equations
Minimum Norm solution for the Full-Rank
Underdetermined Problem Using QR lsudqrh.m 7.9.2
Linear Systems Analog Least Squares
Iterative Refinement lsitrn1.m 7.10.1
Iterative Refinement for Least Squares Solution lsitrn2.m 7.10.2
Computing the Variance-Covariance Matrix reganal.m 7.11.1
CHAPTER 8
Title Program Name Number
Power Method power.m 8.5.1
Inverse Iteration invitr.m 8.5.2
Rayleigh-Quotient Iteration rayqot.m 8.5.3
Sensitivities of Eigenvalues senseig.m 8.7.2 (sec. num.)
The Basic QR Iteration qritrb 8.9.1 (sec. num.)
The Hessenberg QR Iteration qritrh.m 8.9.2 (sec. num.)
The Explicit Single Shift QR
Iteration qritrsse.m 8.9.4 (sec. num.)
The Explicit Double Shift QR Iteration qritrdse 8.9.5 (sec. num.)
One Iteration-Step of the Implicit Double Shift QR qritrdsi.m 8.9.1 (section 8.9.6)
NO CHAPTER
757
Title Program Name Number
The Absolute Maximum of a Vector absmax.m
Interchange Two Vectors inter.m
Computing the CPU Time cputime.m
758
B.4 Some Selected MATCOM Programs
% Back substitution 3.1.3 backsub.m
% input upper triangular T and vector b
% output [ y] the solution to Ty = b by back subst
% function [y] = backsub(T,b);
function [y] = backsub(T,b);
% !rm diary8
% diary diary8
[m,n] = size(T);
if m~=n
disp('matrix T is not square')
return
end;
y = zeros(n,1)
for i = n:-1:1
sum = 0
for j = i+1:n
sum = sum + T(i,j)*y(j)
end;
y(i) = (b(i) - sum ) / T(i,i)
end;
end;
759
s = eye(n,n)
for k = n:-1:1
T(k,k) = 1/T(k,k)
for i = k-1 :-1 :1
sum = 0
for j = i+1:k
sum = sum + T(i,j)*T(j,k)
end;
T(i,k) = -sum/T(i,i)
end;
end;
end;
beta = 2/(u'*u)
for j = 1 : n
alpha = 0
for i = 1 : m1
alpha = alpha + u(i) * A(i,j)
end;
alpha = beta * alpha
for i = 1:m1
A(i,j) = A(i,j) - alpha * u(i)
end;
end;
760
end;
end;
761
function [A] = givrota(i,j,A)
[m,n] = size(A)
% !rm diary89
% diary diary89
x = zeros(2,1)
x(1) = A(i,i)
x(2) = A(j,i)
[c,s] = givcs(x)
J = eye(n,n)
J(i,i) = c
J(i,j) = s
J(j,i) =-s
J(j,j) = c
A = J*A
end;
762
for i = k+1:n
b(i) = b(i) + A(i,k) * b(k)
for j = k+1 :n
A(i,j) = A(i,j) - A(i,k) * A(k,j)
end;
end;
end;
u = triu(A)
l = tril(A,-1)
for i = 1:n
l(i,i) = 1
end;
end;
763
end;
bsub = zeros(n,1)
for i = 1 : n
bsub(i) = b(i) / A(i,i)
end;
for i = 1 : numitr
disp('the iteration number')
i
xnew = Bsub * xold + bsub
xold = xnew;
end
end;
[q,r1] = qr(A)
c = q'*b
ran = rank(r1)
r = r1(1:ran,:)
[r,c,x] = backsub(r,c)
[m1,n] = size(A);
if m1~=n
disp('matrix A is not square')
764
return
end;
[q,r] = qr(A)
for k = 1 : numitr
disp('the iteration number')
k
Anew = r*q
[q,r] = qr(Anew)
end;
xold = diag(Anew)
end;
765