Professional Documents
Culture Documents
Von der Fakultt fr Mathematik und Informatik der Technischen Universitt Bergakademie Freiberg genehmigte
Dissertation
zur Erlangung des akademischen Grades
vorgelegt
von geboren am
Gutachter:
Prof. Dr. Michael Eiermann (Freiberg) Prof. Dr. Marlis Hochbruck (Dsseldorf ) Prof. Dr. Ivo Marek (Prag)
Danksagung
An dieser Stelle mchte ich mich bei allen bedanken, die zum Gelingen der Arbeit beigetragen haben. Am Graduiertenkolleg Rumliche Statistik der TU Bergakademie Freiberg hatte ich fr drei Jahre die Mglichkeit, meinen Forschungen und Neigungen nachzugehen. Zudem bot sich vielfach Gelegenheit, Neues zu lernen und ber den sprichwrtlichen Tellerrand zu schauen. Mein Dank gilt allen Kollegiaten, Professoren und Mitarbeitern fr die freundliche Aufnahme und Untersttzung. Ein weiterer Angelpunkt meiner Freiberger Jahre war das Institut fr Angewandte Mathematik II. Ich danke den dortigen Kollegen fr die Zusammenarbeit, insbesondere Oliver Ernst, Werner Queck und Volker Sperling. Am Institut fr Wissenschaftliches Rechnen des Forschungszentrums Karlsruhe fand ich schlielich ein neues Wirkungsfeld. Zudem durfte ich Zeit und Ressourcen im Institut fr den Abschluss der Arbeit nutzen. Mein Dank dafr gilt dem Institutsleiter, Herrn Mickel, sowie Frank Schmitz und den anderen Kollegen fr ihr Verstndnis. Besonderer Dank gebhrt Prof. Eiermann, der mich schon whrend des Studiums fr die numerische Mathematik begeisterte und seither als Betreuer meiner Diplomarbeit und der Dissertation meine Arbeit begleitet hat. die Mglichkeit gegeben, die vorliegende Arbeit zu beenden. Danken mchte ich auerdem Herrn Marcos Khrmann fr Beistand in einer schwierigen Lebensphase. Nicht zuletzt musste er meine unstete Arbeitsweise ertragen und hat mir trotz einer langen Unterbrechnung
ii
Contents
1 3 7
8 9 13 16 26
35
35 37 45 49 60 61
63
63 69 71 80 83 86
88 89
iii
1 Introduction
Krylov subspace methods are a standard technique for solving linear equations. The great majority of existing literature is devoted to the study of these methods for a regular coecient matrix (cf. [22] and the references therein). The analysis of this methods applied to a singular operator is not as well developed as for the nonsingular case. On the other hand, there are applications where linear systems of equations with a singular matrix arise, for example the discretization of partial dierential equations with Neumann boundary conditions or discrete inverse problems (cf. [33]). space, which is also stressed in this thesis. There are studies of the behavior of widespread used Krylov methods for singular systems, such as [34] for the Conjugate Residual (CR) method, [35] for the Generalized Conjugate Residual (GCR) method, [7, 64] for the Generalized Minimal Residual (GMRES) method, and [29] for the Quasi-Minimal Residual (QMR) method. Other authors have proposed modied versions of established Krylov methods (cf. [61, 62, 63]). Though, the analysis is often short-coming and lacks a consistent presentation of results. Recent advances in unifying the theory of Krylov methods use abstract orthogonal residual (OR) and minimal residual (MR) approximations to describe all kinds of subspace correction methods in a concise manner (cf. [22, 25]). These advances are not yet adopted for the singular case. The main objective of the present work is to close this open issues. In studying singular systems of linear equations there naturally arise generalizations of the concept of an inverse operator. We shall widely use such generalized inverses. Therefore we start with a short survey about this topic in Section 2. A less common representative of the class of generalized inverses is the subspace inverse, which we introduce in Section 3.2. This section is part of the exposition of abstract MR and OR approximation methods. Our presentation of the material in Section 3 is greatly inspired by Eiermann and Ernst (cf. [22]). We solely augment their results to cover the singular operator case. To this end, after some introductory remarks about operator equations and subspace correction methods in Section 3.1, we study minimal residual approximations for the solution of operator equations in Section 3.2. The standard implementation of this methods using nested orthonormal bases is described in Section 3.3. The breakdown behavior of the underlying Gram-Schmidt orthonormalization process is studied in in Section 3.4. We show in Section 3.5, how most of the results in [22] about MR and OR methods for solving regular systems may be preserved in the context of a singular operator. Another important realm of applications is the study of Markov chains with large nite state
1 Introduction
The Drazin inverse, another generalized inverse, plays a prominent role in Section 4 where we specialize the abstract setting to Krylov subspace methods. After the basic denition we point out the polynomial structure of Krylov spaces in Section 4.1. Section 4.2 provides tools to deal with the Drazin inverse and the minimal polynomials of a matrix and a vector. They are used in Section 4.3, where the important connection between Krylov subspaces and the Drazin inverse is established. In Section 4.4 we investigate the question, when a Krylov method yields least squares solution. This leads to new identities involving the Drazin and the Moore-Penrose inverse. We close with some remarks about the stability of Krylov methods and the modications necessary to obtain Drazin inverse solutions. As a real world problem involving the solution of singular linear operator equations we present the analysis of continuous-time Markov chains (CTMC) in Section 5. After some preliminaries about the background in stochastics and tools from matrix theory we investigate the transition function of a Markov chain in a purely analytical and matrix algebraic context. The results of this analysis are then interpreted in terms of the stochastic process and used to explain the usefulness of Krylov subspace methods for the analysis of Markov chains. We conclude with some nal remarks on possible topics of further research.
2 Generalized Inverses
In this section we summarize some facts about generalizations of the inverse of a linear operator. This is not only done for introducing the reader into this subject, but also to emphasize common properties and dierences between the proper inverse and its various generalizations. To keep things simple we restrict ourselves to the case, when A Cmn is a matrix. Considerable introductions to this topic can be found in the monographs of Campbell and Meyer [14] or Ben-Israel and Greville [4]. 1 The inverse X = A of a regular (and thus square) matrix A fullls the matrix identities
AX = I XA = I ,
where
and
(2.1) (2.2)
A.
A simple
AX = XA.
Other obvious identities are
(2.3)
k+1
X =A
k 0.
to given inner
M denotes the adjoint of a matrix (or operator) M with respect n m products (, ) on H1 = C and H2 = C . An obvious way to generalize
Here
the term inverse is to require a subset of the above identities to be satised. Equations (2.4) to (2.7) are known as the Penrose conditions. Together, they dene the Moore-Penrose generalized inverse or pseudoinverse A of an arbitrary matrix A Cmn . A generalization of A for a linear operator A : H1 H2 on arbitrary Hilbert spaces that
Hj
Note also, that each of the Penrose conditions (2.4) and (2.5) immediately implies,
AX
and
XA are projections.
2 Generalized Inverses
ditions (2.6) and (2.7) yield that these projections are orthogonal ones. which
Moore-Penrose inverse can be characterized as the uniquely determined operator the subspace
for
AX = PR(A) and XA = PR(A ) . Here PU denotes the orthogonal projection onto U and R(M ) := {M v : v H1 } H2 is the range of the operator (or matrix) M : H1 H2 .
If it exists, the solution of equations (2.3), (2.4) and (2.5) is called the group inverse
A# of the (necessarily square) matrix A. It is called so, because it is the inverse element of A in any multiplicative group of matrices. Another well known generalized inverse is the Drazin inverse of a square matrix A, which is the solution of equations (2.3), (2.5) and (2.8) for k index(A). The index of a matrix A is related to the Jordan canonical form: It is simply the dimension of the largest Jordan block corresponding to the eigenvalue 0 (and index(A) = 0 if A is nonsingular). Note that the group inverse is just the Drazin inverse of a matrix A with index(A) 1.
Beside its nice algebraic properties, the inverse of a linear operator is a powerful tool in solving operator equations. Consider
Ax = b ,
(2.9)
n where we are looking for solutions x C = H1 for a given right hand side b m C = H2 . If A is square and regular, the unique solution of this equation is given by x = A1 b . It seems natural to generalize this term for singular or even rectangular
A.
be consistent, i. e.,
b R(A),
usual sense. Second, even if the system is solvable, the solution need not be unique. More precisely, the solution of a consistent linear equation is unique if and only if
N(A) = {v : Av = 0 , v H1 }
vector.
A generalized inverse which supplies a solution of (2.9) if the equation is consistent is called an equation solving generalized inverse. The class of all equation solving inverses of
is usually denoted by
A{1}
A denotes an arbitrary element of it. A{1} is If X A{1} additionally satises (2.6) it is called a since for every right hand side b there holds
and
b AX b = min b Ax
x H1
in this case. The norm
(, ).
X b = min x
x ,Ax =b
for all
b R(A) X
is called a minimal norm
generalized inverse of
A.
2 Generalized Inverses
norm property if
Penrose inverse
Xb < w
norm property (cf. [14, Theorem 2.1.1]). All generalized inverses introduced so far coincide with
A1
if
A is regular.
But there
are useful generalizations which lack this property. An example is the constrained or
subspace generalized inverse, which satises the least squares and the minimal norm n
condition above only on a certain subspace
C C
[14, Section 3.6]). Note that the Drazin inverse provides a solution of (2.9) if the right index(A) hand side lies in a certain subspace of R(A) namely the range of A . In general, 1 each of the above properties of A may be restricted to a certain subset of vectors to get a new class of generalized inverses. Again, the subspace inverse may illustrate this. In Section 3.2 we will introduce it as the Moore-Penrose inverse of a certain operator. Alternatively it could be dened as the solution of the following equations:
for all
v C H1 , y , z H2 , v , w C H1 ,
(2.4') (2.5 )
(2.6') (2.7')
Generalized inverses of this type are rst studied by Minimade and Nakamura [50], where the term restricted pseudoinverse is used. Ben-Israel and Greville also discuss
(or the initial residual). In an iterative solution method each step is associated with a specic generalized inverse and the whole iteration can be described by specifying this sequence of operators. We illustrate this in short for the classical stationary iteration methods: Given a splitting iteration is
A = M N
with
x = T x 1 + c ,
with
= 1, 2, . . .
this can be rewritten as
T = M 1 N
and
c = M 1 b .
Using
r0 = b Ax0
1
x = x0 + I + T + + T
Thus, the
M 1 r0 .
This correction may be regarded as an approximate solution of the residual equation. In formulas, setting
A := I + T + + T
=M
1 j =0
(NM 1 )j ,
2 Generalized Inverses
the solution of
AX is a generalized inverse of A. For a nonsingular matrix A a reasonable iteration matrix T satises lim T = O . X X Since A satises the identity A A = I T we conclude that (2.1) holds asymptotically X X for A . In other words, A converges to the proper inverse of A. That an iteration is Ac = r0
is approximated by
AX r0 ,
where
reasonable means of course, that the iterates converge to the solution of (2.9), which is for nonsingular If
equivalent to
(T ) < 1,
1.
is
is singular and
b R(A),
(namely that
semiconvergent, cf. Denition 5.2.9), the stationary iteration converges to a solution of the linear system which can be expressed in terms of the Drazin inverse of for general semiiterative methods.
I T
(cf. [5, Section 7.6]). Eiermann, Marek and Niethammer derive in [24] similar results
(, )
. Given
h MR W,
The OR approximation
r MR := r h MR W.
h OR
is obtained by imposing
h OR W,
where
r OR := r h OR V, W.
We shall see later in this
section, how this approximations may be expressed in terms of orthogonal and oblique projections onto In what case where
W. follows A : H H A : H1 H2
invertible. All results and conclusions in this section remain true for the more general is an operator between dierent Hilbert spaces (e. g., a We do without this distinction here both rectangular matrix in nite dimensions).
for ease of notation and since the most relevant application of this theory in Krylov subspace methods (cf. Section 4) is covered by our assumptions. We also assume that
H is separable and that the associated scalar eld is algebraically closed, or, yet easier,
that the scalars are complex numbers. Some further remarks about the notation: independent vectors in Let
H.
{u1 , . . . , uk }
{u1 , . . . , uk }
Uk = u1 . . . uk and abbreviate the subspace Uk = span{u1 , . . . , uk } by span{Uk }. Similarly, Uk g with g = 1 . . . k Ck denotes the linear combination 1 u1 + + k uk and the notation Uk is an abbreviation k k of the mapping H C : (, uj ) . For a linear operator A on H we denote the j =1 range by R(A) = {Av : v H} and the nullspace by N(A) = {v : Av = 0 , v H}. The adjoint A of A is the operator satisfying (Ax , y ) = (x , A y ) for all x , y H.
with its representation as row vector
Ax = b
with given right hand side guess
(3.1)
b H
x0
{0 } = C0 C1 C2 Cm Cm+1 H,
we investigate iterates of the form obtain a (hopefully better) Setting initial approximation of the solution of (3.1) by a correction
(3.2)
xm = x0 + cm , cm Cm . That is, we correct the cm determined from Cm to approximate solution xm . Methods of this kind are known
hm := Acm ,
rm = b Axm = r0 Acm = r0 hm .
Thus,
space
The residual
rm
hm .
not depend on the operator equation (3.1). A simple but often used result is provided by
Lemma 3.1.1.
xm = x0 + cm
consistent.
Thus, if
If and only if
r0 Wm
(3.1).
cm Cm
such that
is a solution of
r0 Wm ,
there exists a
cm Cm with Acm = r0 and a reasonable approxihm = r0 with the approximation error rm = 0 . hm Wm to r0 , any solution cm of the consistent linear A|Cm c = hm
(3.4)
denes an iterate
xm = x0 + cm
of (3.4) dier only in directions of the subspace uniquely determined if and only if
Cm N(A) = {0 },
i. e., if all
Cm .
Obviously, if
m.
{0 } = W0 W1 Wm1 Wm .
If we attempt to apply the results of Eiermann and Ernst to our setting, the main diculty arise from the fact that
dim Wm = m
Wm1 = Wm ,
the successive
approximation problems remain unchanged. If we are only interested in abstract MR and OR methods without reference to any operator equation we may simply renumber the sequence of approximation spaces. If, however, this deciency.
Wm
is to require the
b Ax MR = r Ac MR = min r Ac ,
c C
where MR
(3.6)
h AC:
r = b Ax0 is the residual of the initial guess. The corresponding approximation = Ac MR is characterized by the orthogonal projection PW : H H onto W := h MR = PW r , r MR = r h MR = (I PW )r W.
(3.7)
The solutions of (3.6) can be described in terms of a special constrained generalized C inverse of A (cf. Section 2 and [14, Section 3.6]). Given a subspace C we call A := MR (APC ) the subspace inverse of A with respect to C. The solution c of (3.6) with minimal norm is
MR cmin = AC r
c MR = AC r + z
with
(I AC A)u : u C
(3.8)
(cf. [14, Corollary 3.6.1]). We summarize some properties of this generalized inverse
Proposition 3.2.1.
(i) (ii) (iii) (iv)
AC = (APC )
of
with respect to
satises
10
(v) (vi)
AC A
is a projection onto
R(AC )
and
N(AC A) = {z H : Az AC}.
From the denition of the constrained inverse and known properties of the
Proof.
C which prove (ii) and the inclusion R(A ) C in (i). C C This inclusion implies PC A = A and thus
AAC = APC AC = (APC )(APC ) = PR(APC ) = PW (AC A)2 = AC A follows immediately from C C the second Penrose condition (iii). So, demonstrating R(A A) = R(A ) proves (v) C C and the remaining part of (i). The inclusion R(A A) R(A ) is trivial. Now, let C v R(A ). Hence, there exists a vector u H such that v = AC u . Using AC AAC = AC we compute AC Av = AC AAC u = AC u = v , i. e., v R(AC A), which implies R(AC ) R(AC A). C Assertion (vi) follows easily: A vector z H belongs to the null-space of A A if and C only if Az N(A ) = W (note that this includes all z N(A)).
As earlier stated, the projection property since
R(APC ) = AC = W.
Corollary 3.2.2.
c MR = AC r + z | z N(A) C
Proof.
We have to show that
(I AC A)u : u C
Let
= N(A) C.
z = (I AC A)u with u C. Since R(AC ) C we have z C. A simple C C computation gives Az = A(I A A)u = (I AA )Au = (I PW )Au = 0 where the last equation follows from Au W = AC. C C Now let z N(A) C and set u = (I A )z . Then u C and (I A A)u = (I AC A)(I AC )z = (I AC A AC + AC AAC )z = (I AC A AC + AC )z = z since Az = 0 .
11
Note that the proof of the rst inclusion is a reformulation of (3.4) and the subsequent considerations. One may ask, when the inclusion in (i) of Proposition 3.2.1 is in fact an equality. It turns out, that this is characterized by the assumption mentioned in (3.5) which also simplies some other issues.
Proposition 3.2.3.
are equivalent
(i) the least squares problem (3.6) has a unique solution, (ii) (iii) (iv)
Proof.
Since
lary 3.2.2.
AC A
w +z
with
w R(AC A)
this decomposition. C
z = c d N(A A) C.
v H can be uniquely decomposed as v = z N(AC A). Let c C and denote by c = d + z d R(AC A) and, since c and d belong to C, we get
R(AC A) N(AC A) C = C.
This shows the equivalence of (iii) and (iv). We nish the proof by demonstrating the equivalence of (ii) and (iv). The direction C from left to right follows from equation (vi) in Proposition 3.2.1: Assume z N(A A)
C and z = 0 . Then (ii) implies Az = 0 and since Az W we have a contradiction to Az W. To prove the other direction assume z C with Az = 0 . Then there holds C also A Az = 0 which by (iv) implies that z is zero.
the generalized inverse In the case characterized by Proposition 3.2.3, the subspace inverse coincides with (2) AT,S described in [5, Theorem 2.12]. It is characterized there as the unique operator (if it exists) satisfying the second Penrose condition and having prescribed range and nullspace. For later use we state
Lemma 3.2.4.
mal basis
c MR denotes an arbitrary solution of (3.6). Assume an orthonor{z1 , . . . , z } of N(A) C is given and let Z = z1 . . . z . Then the subspace
Let
MR cmin = c MR Z (Z c MR ).
inverse solution (i. e., the solution of (3.6) which has minimal norm) is given by
12
If
N(A) C
is one-dimensional and
z N(A) C
Proof.
c MR
c MR z | z N(A) C .
We are looking for
MR cmin
with
MR cmin =
Writing
z N(A)C
min
c MR z .
z = Zg
with
g C
min c MR Z g
g C
which is solved by
g = Z c MR = Z c MR
In
N(A) C
can be done
Remark 3.2.5. Regarding the original linear equation (3.1) and the corresponding least MR squares problem (3.6) it is more appropriate to ask for a solution xmin with minimal
norm property. Fortunately, the answer can be easily derived from the results above. Note that Z Z = PZ is the orthogonal projection onto Z := C N(A). Thus, the main result of Lemma 3.2.4 can be rewritten as
MR cmin = (I PZ )c MR ,
which suggests an alternative proof using Pythagoras' theorem. The squared norm of MR an arbitrary least squares solution c z with z Z can be rewritten as
c MR z
and thus is minimized for get
= PZ c MR z
+ (I PZ )c MR
z = PZ c MR .
x0 + c MR = x0 + AC r z
with
z Z
we
MR xmin = (I PZ )(x0 + AC r ) = (I PZ )x MR ,
(3.9)
where
MR
13
Finally, we investigate the question of when the MR solution supplies a least square solution or even a pseudoinverse solution of (3.1). lemmata is straight forward. The proof of the two following
Lemma 3.2.6.
equivalent
(i) (ii) (iii) (iv) (v) (vi)
With the notation used in this section the following statements are
x MR
x MR = A b + z c MR = A r + z h MR = PR(A) r ,
z N(A), N(A), z
Lemma 3.2.7.
equivalent
(i) (ii) (iii) (iv)
MR xmin
MR xmin = A b ,
c MR = A r PN(A) x0 , A b x0 + C.
Vm+1 := span{r0 } + Wm .
The basis of
(3.10)
Acm against a (previVm using the (modied) GramSchmidt algorithm. Here cm is an arbitrary vector from Cm \ Cm1 . If PVm denotes the orthogonal projection onto Vm , a compact presentation of this procedure reads
is generated inductively by orthonormalizing
Vm+1
{v 1 , . . . , v m }
of
v1 = r0 /, vm+1 =
where
:= r0 ,
(3.11)
(m = 1, 2, . . . ).
14
if (and only if )
Acm Vm .
If such an index
(3.12)
exists we dene
L := min{m : Acm Vm }
and set With
(3.13)
L :=
otherwise.
H, m
L < . A,
(3.14)
Cm := c1 c2 . . . cm
and
Vm+1 := v1 v2 . . . vm+1
, the rst
m = Vm Hm + 0 . . . ACm = Vm+1 H
where
0 m+1,m vm+1 ,
m = j,k m+1,m C(m+1)m is an upper Hessenberg matrix and Hm := H j,k=1 m Cmm is the square matrix obtained by deleting the last row of H m. Im 0 H m are given by j,k = (Ack , vj ), 1 k j m, and The nonzero entries of H k+1,k = (I PVk )Ack 0, with equality holding if and only if k = L. In other m is an unreduced upper Hessenberg matrix (and thus of full rank m) as long words, H as m < L. For notational convenience we set vL+1 = 0 , so (3.14) holds true for m = L. Note L has a zero last row. also, that the Hessenberg matrix H With respect to the orthonormal basis Vm+1 of Vm+1 , the vector r0 = v1 = (m+1) (m+1) (m+1) Vm+1 u1 has the coordinates u1 (where u1 denotes the rst unit vecm+1 tor of C ). Consequently, writing c Cm as c = Cm y with a coordinate vector y Cm , the corresponding residual is given by b Ax = r0 ACm y = Vm+1 ( u1
(
(m+1)
(m+1) m y ) = u1 my H H MR MR cm = Cm ym ,
Cm+1 ).
For
u1
(m+1)
MR m ym H
= min u1 m
y C
(m+1)
my H
2.
(3.15)
This problem has a unique solution if and only if true as long as the nullspace of
m H
m, which is certainly
no directions in
m < L. A.
least squares problem has a unique solution if and only if Thus, for
m < L,
we have
MR MR m u1 cm = Cm ym = Cm H
= ACm r0 H m
(3.16)
is the (uniquely determined) MR correction. We should mention here that, if full rank
m,
m has H HH 1 H = (H m m ) Hm
15
m. H
m1 = Rm1 , Qm1 H 0
with
Qm1 Cmm
= Im )
and
Rm1 C(m1)(m1)
upper trian-
m1 H
Qm1 0 Qm1 0 Hm = 0 1 0 1
We dene a Givens rotation
m1 H hm 0 m+1,m
Rm1 t . = 0 0 m+1,m
(3.17)
2 (cm , sm 0, c2 m + sm = 1, m R)
(3.18)
Gm
and set
Qm1 0
Rm1 t 0 Hm = 0 1 0 0
and
(3.19)
Qm := Gm
Qm1 0 0 1
Rm :=
Rm1 t . 0
(3.20)
We can verify by a simple calculation that the appropriate parameters of the Givens rotation are
cm :=
| |
2 | |2 + m +1,m
, sm :=
m+1,m
2 | |2 + m +1,m
,
(3.21)
:=
im 2 | |2 + m . +1,m e
Qm = Gm
and
Gm1 0 0 1
Gm2 O G O 1 , O I2 O Im1
m = Rm . Qm H 0
(3.22)
16
Since
is nonzero as long as m+1,m = 0 the triangular matrix Rm is nonsingular if m < L. Even if m+1,m = 0, i. e., m = L, we may obtain a nonsingular RL , namely if and only if = 0 (cf. (3.28)). If = 0, Equation (3.22) holds true for cm and sm 2 2 chosen arbitrarily subject to cm + sm = 1.
Using this QR decomposition we can rewrite (3.15) as
y C
u1 min m
y C
(m+1)
my H
(m+1)
= min QH m Qm u1 m
y C
(m+1)
Rm y 0 ,
2
= min Qm u1 m
where
Rm y 0
= min m
2 y C
qm Rm y (m) qm+1,1
Cm ) denotes the rst column of Qm . The unique MR 1 solution of the above least-squares problem is ym = Rm qm and the associated least(m) MR squares error is given by rm = |qm+1,1 |. [qm , qm+1,1 ] = Qm u1
(q m More generally, (3.22) implies
(m)
(m+1)
m = QH Rm H m 0
for
and
1 = Rm 0 Qm H m
(3.23)
m < L.
rewritten as
m < L (a similar relations hold true for m = L, cf. (3.28) and (3.32)). The Vm is often referred as Paige-Saunders basis (cf. [52]) and forms an orthonormal of Wm .
L,
ACL = VL HL
and the least squares problem (3.6) is equivalent to
u1
where
(L)
MR HL yL
= min u1 m
y C
(L)
HL y
2,
(3.26)
We distinguish
two cases:
17
Denition 3.4.1.
the linear system
occurs if
HL y = u1
(L)
has a unique solution. Otherwise, the termination is called a singular breakdown . The situation in case of a regular breakdown is the same as if
is nonsingular:
Proposition 3.4.2.
some step
L,
then the linear system (3.1) is necessarily consistent and the (uniquely
determined) MR correction
1 MR cL = ACL r0 = CL HL u1 (L)
(3.27)
leads to a solution
MR MR xL = x0 + cL
of (3.1), i. e.,
MR AxL = b.
Proof. Obviously, if HL is regular, the solution of the least squares problem (3.26) is 1 (L) MR MR MR MR MR yL = HL u1 . The correction cL = CL yL satises r0 AcL = rL = 0 , i. e. MR MR AxL = Ax0 + AcL = Ax0 + r0 = b .
The coordinate vector QR factorization of
MR yL
HL .
Recall that
(3.28)
=0
we obtain
cL = 1
and
sL = 0
1 MR yL = RL QL1 u1 .
Note further, that our proof above shows This characterizes the regular breakdown:
(L)
MR r0 = AcL WL
Proposition 3.4.3.
(ii) (iii) (iv)
m,
r0 Wm = span{Ac1 , . . . , Acm },
MR xm
is a solution of (3.1),
Wm = Vm ,
1 In the context of Krylov subspace methods (cf. Section 4) Brown and Walker use in [7] the terms 2 breakdown
breakdown through degeneracy of the Krylov space and through rank deciency of the least-squares problem.
18
Proof.
We have already seen in Proposition 3.4.2, that (i) implies (ii) and (iii).
In
fact, by Lemma 3.1.1, (ii) and (iii) are equivalent. Suppose we have a regular breakdown in step
m = L,
that is
ACL = VL HL
where
HL
is nonsingular. Then
WL = span{ACL } = span{VL } = VL ,
which proves (i)(iv). By the denition that is (iv)(ii). Now, suppose
Vm = span{r0 } + Wm1
Since
we have, that
Vm = Wm
implies
r0 Wm ,
such that
r0 Wm .
Wm = ACm
there exists an
f Cm
r0 = ACm f .
r0 = Vm+1 u1
(m+1)
(m+1) m+1 f = u1 H .
Due to the upper Hessenberg structure of unreduced, that is
m+1 H
m+1 H
is not
m+1,m = 0
and
m = L.
L L1 hL f = u1 HL f = H .
The existence of a vector column span of
HL . Since is an unreduced upper Hessenberg matrix, the rst L 1 columns of HL are linearly independent. Assuming that HL is singular is therefore L L1 }. But this would imply that u1 equivalent to stating hL span{H lies already in the column span of HL1 , which is impossible since the columns of the triangular matrix L L1 forms a linear independent set of vectors in CL . Thus we conclude that u1 H HL is regular and have now demonstrated (ii)(i). Taking all together we have (i)(iv)(ii)(i) and (ii)(iii).
We now turn to the singular breakdown.
L1 H
L u1
lies in the
Proposition 3.4.4.
(i) (3.11) breaks down singularly in step (ii) the constrained least squares problem
m;
r0 Ac MR = min r Ac = min u1 m
c Cm y C
(m+1)
my H
(3.30)
19
m ) < m, rank(H HH H m m Rm
is singular,
is singular,
Proof.
m H
m,
also equivalent to (iv) and (v) (cf. (3.22)). (iii)(i): As previously noted,
m) < m rank(H
m = L. Therefore the Arnoldi-type decomposition reduces to (3.25) L ) shows, that HL is singular if and only if rank(H L ) < L. rank(H L1 has full rank L 1 (i)(vii): Let HL be singular. Since HL = HL1 hL and H L1 , i. e., there the last column of HL must be a linear combination of the columns of H L1 exists a vector g C such that hL = HL1 g . The last column of (3.25) can be
rewritten as
L1 g = ACL1 g ACL1 = WL1 . AcL = VL hL = VL H Acm span{Ac1 , . . . , Acm1 }. Thus, there exists 1 , . . . , m1 := 1 c1 + + m1 cm1 cm such that Acm = 1 Ac1 + + m1 Acm1 . Hence c = 0 since by denition cm Cm1 . Cm N(A) and c (vi)(i): Suppose 0 = z Cm N(A), that is z = Cm f with a nonzero coordinate m vector f C . Using (3.14) we get mf . 0 = Az = ACm f = Vm+1 H
So, either (vii)(vi): We have
Vm+1 are vm+1 = 0 and hm+1,m = 0. But then m = L and the equation above reduces to 0 = VL HL f . Since the basis VL is linearly independent, we conclude HL f = 0 , i. e., HL is singular.
which implies (iii) and further (i), or the rows of linearly dependent, which by construction implies Now, we nish the proof by noting that (vii) is equivalent to each of the subse-
mf = 0 , H
quent conditions: For (viii) this can be seen directly from the expanded formulation
span{Ac1 , . . . Acm1 , Acm } = span{Ac1 , . . . Acm1 }. Note further, denition dim Cm = m. Together with the nested structure of the (cf. (3.2)) and Wm = ACm this yields the remaining equivalence.
the determinant of
3 In context of Krylov subspace methods this condition is derived by Smoch in [64], who examines
HH H m m.
20
Note that the equivalence of (ii) and (vi) could be also derived directly from Proposition 3.2.3. Note further, that the vector of
L1 y = hL . H
N(HL )
solution by
= g
hL g H L1 = . 1 1 rank(HL ) < L
and
(3.31)
rank
u1
(L)
HL
rank(HL ) = L 1. Thus, the nullspace of HL } and N(A) CL = (just as N(A) CL ) is one-dimensional. We get N(HL ) = span{g }. Applying the previous Givens rotations to HL results in (cf. (3.17)) span{CL g L
which shows that
u1
(L)
R(HL )
and
(3.32)
HL
cL
and
sL
in (3.21) could
be chosen arbitrarily. This transforms the least squares problem (3.26) into
yL
min u1
CL
(L)
HL yL
min
yL1 C
CL1
QL1 u1
(L)
RL1 t 0 0
yL1
=
2
min
yL1 C (L1)
CL1
,
2
where
qL1 qL,1
= QL1 u1
(L)
1 block can be forced to be zero by setting yL1 := RL1 ( qL1 t ) the minimum is 1 hL = g (cf. (3.23)) is the unique independent from . Note further, that RL1 t = H L1 1 MR solution of HL1 y = hL and RL1 qL1 = yL1 is the coordinate vector of the previous
MR correction. In other words: All solutions of the above least-squares problem are in
MR yL 1 g
: C
MR yL 1 : C g 0 MR MR rL = |qL,1 | = rL 1 (L1)
.
(3.33)
Proposition 3.4.5.
holds: The linear system
Lth
step, there
HL y = u1
(L)
is inconsistent and
(3.34)
21
(L 1)th
in the terminating step. Thus, the MR approximation process makes no progress, i. e. MR MR rL = rL 1 . All MR corrections are in the set
MR cL 1 (CL1 g cL ) : C
(3.35)
and the correction with minimal norm (i. e., the subspace inverse correction) can be computed as
MR ACL r0 = cL 1 MR (z , cL 1 ) z (z , z )
with and
z = CL1 g cL hL = R 1 t g =H L1 L1
(3.36)
where Proof.
RL1
and
CL
together with
equations (3.34) and (3.35). The remaining assertion follows using Lemma 3.2.4 with
= CL1 g cL N(A) CL . z = CL g
The equivalences in Proposition 3.4.4 are often formulated in its negated form, for example: The constrained least squares problem (3.30) has a unique solution (or, equivalently,
or or or
mH m is regular, or Rm is regular) if and only if no singular breakdown occur in step H m. If A is nonsingular this conditions are always satised, as we had already stated
regarding (3.5). Thus
Corollary 3.4.6.
singular operator
If
In other words, a singular breakdown is only possible (but not necessary) for a
A.
often a concern to detect near singularity. Brown and Walker suggest in [7] to monitor the condition number of of
Rm .
Rm ,
specically, if
The two following propositions provide several equivalent denitions for the breakdown index
L.
22
Proposition 3.4.7.
be characterized as
can
Proposition 3.4.8.
be characterized as
can
Acm Vm } Acm Wm1 } Wm = Wm1 } dim Cm > dim Wm } dim Wm < m} Cm N(A) = {0 } } m ) < m} = min{m : rank(H H m = min{m : det (H Hm ) = 0} = min{m : rm,m = 0}, : : : : : : Rm
(cf. (3.20)).
where
rm,m =
Lth
A.
cL+1 CL+1 \ CL . The only interesting information, which should be saved, is the direction of N(A) CL . This can be done by computing and storing z as described in
(3.36). Throwing away from
cL CL \ CL1
CL+1 \ CL .
cL
cL+1
cL
is stored in
we can cancel
23
Cm
with
m > L+1
by
where
m = m 1.
This strategy to avoid a singular breakdown can be applied more than once and by orthonormalizing the vectors
we obtain a basis
Z = z1 . . . z
of the intersection of
the subspace inverse solutions with respect to the original correction spaces, i. e., to minimize the norm of the correction by adding a suitable vector from (cf. Remark 3.2.5). Example 3.4.10 illustrates this technique. Of course, if
span{z1 , . . . , z }
only up to a nite index and will stagnate henceforth. It may happen that also the termination at this index is a singular breakdown. Indeed, Proposition 3.4.2 implies that if the linear system (3.1) is inconsistent, any termination is a singular breakdown. Thus, the look-ahead strategy does not work in any situation.
Example 3.4.10. The sequence of correction spaces is dened by the nested bases
1 1 1 1 1 0 c1 = 0 , c2 = 0 , c3 = 1 , . . . .
. . . . . . . . . We restrict ourselves to three dimensions. Consider
1 0 0 A = 1 0 1 , 0 0 0
Thus we get
1 b = 0 0
and
0 x0 = 0 . 0
v1 = r0 = b and orthonormalizing Ac1 = 1 1 0 against v1 yields in v2 = 0 1 0 . In the second step the Gram-Schmidt procedure breaks down singularly
with the Arnoldi-type decomposition
1 1 1 0 1 1 A 0 1 = 0 1 , 1 1 0 0 0 0
At this point we cancel with orthonormalizing
which results in
g= 1
and
0 z = 1. 0
c2 and the last column of the Hessenberg matrix and continue Ac3 against v1 and v2 . This leads to the decomposition 1 1 1 0 1 1 A 0 1 = 0 1 , 1 0 0 1 0 0 1 1 1 1 = 0 1 1 0 0 1 1 1 1 1 0 = 0 1 = 1 0 1 0 1 1
c MR
24
Ax = b .
Since
span{c1 , c2 , c3 } = H,
space, which has minimal norm, coincides with the pseudoinverse solution and can be computed by solving the minimization problem
min c MR + z ,
C
which is given by
(z , c ) = 1 (z , z )
MR
and
1 MR c + z = 0 . 1 , C
Regarding Remark 3.4.9 and the subsequent example we should give a reformulation of the algorithm stated in (3.11). We assume a given correction space all possible correction spaces and may contain also directions in constructs a decomposition which covers
N(A) C
m+ = Cm Z C C
with
dim Cm = m
and
dim Z =
Cm , Z L m.
they are constructed by Algorithm 3.4.11. The breakdown index Moreover there holds
Cm N(A) = {0 }
for
m<L
and the
m = L.
with a look-ahead as long as there are unused directions in the global correction space.
25
Algorithm 3.4.11.
1 Given: 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
A, b , x0 , C m := 0, := 0, r0 := b Ax0 , := r0 , v1 := r0 /, V1 := v1 0 Assume empty matrices Z0 , C0 , Q0 and H >m+ while dim C Cm = span{Cm }, Z = span{Z }, Vm+1 = span{Vm+1 } \ Cm Z C Choose c m+1 := (Ac , vj ) j =1 h := (I PVm+1 )Ac , v := v t ( := (Ac , v1 ) for m = 0) := Qm h if =0 m := m + 1
Update the Arnoldi-type decomposition:
, Cm := Cm1 c
Compute Compute
, Vm+1 := Vm v
m := Hm1 h H 0
(3.18) and (3.21).
else
if = 0
g := Rm 1 t (z := c for m = 0) z := Cm g c = (I PZ )z / (I PZ )z z Z +1 := Z z := + 1
, m1 h Hm := H
(m)
Rm :=
Rm1 t 0
end end
end
break
MR cm := Cm Rm 1 Qm1 u1
MR MR xmin := (I PZ )(x0 + cm )
Cm Z
by:
26
Denition 3.5.1.
x, y H |(x , y )| . x y x H
the angle
(x , y ) [0, /2]
is
cos (x , y ) :=
We dene the angle between a nonzero vector as
UH
(x , U) := inf (x , u ),
0 =u U
i. e.,
Given two nite dimensional subspaces V and W of H with m := min(dim V, dim W), m the canonical (or principal ) angles {j }j =1 between V and W are recursively dened by
0 =v V 0 =w W
|(vj , wj )| |(v , w )| =: , v w vj wj
The angle between the spaces
v v1 , . . . vj 1
and
w w1 , . . . wj 1 .
and
(V, W) := m
(cf. [30, Section 12.4.3] and [22, p. 259]).
Lemma 3.5.2.
Let
U.
For each
x H
and denote by
PU
the
there holds
(3.37)
(x , U) = (x , PU x )
and, with
sin (x , U) :=
mth
OR hm Wm ,
i. e., we require that the previous error space
mth
approximation error
Vm
(cf. (3.10)).
27
Lemma 3.5.3.
h W
Let
such that
determined if and
V, W be subspaces of the Hilbert space H and r H. There exists r h V if and only if r W + V . Such an h is uniquely only if W V = {0 }.
V PW
exists.
H = W V . dim V = dim W
and
is regular or not.
Merely the
occurrence of a singular breakdown has to be considered with greater care. At rst we cite some results which are independent of the nested structure of the approximation and residual spaces. Using the notation of Section 3.2 and Lemma 3.5.3 we get
Proposition 3.5.4.
If
V PW = (PV PW ) ,
1 cos (V, W)
and
V r OR = r h OR = (I PW )r
r . cos (V, W)
Applying the notation and results of Section 3.3 (especially the Arnoldi-type decomOR OR OR position) yields in a representation of the OR correction cm , with hm = Acm , in the coordinate space:
Hm Cmm denote the square Hessenberg matrix obtained by m as in (3.14). There exists a uniquely determined mth OR deleting the last row of H OR approximation if and only if Hm is nonsingular. The correction cm which corresponds OR OR to the approximation hm = Acm satises
Let
OR OR cm = Cm ym
Proposition 3.5.5.
with
OR 1 ym = Hm u1 .
(m)
28
WL = WL1 VL
VL WL = VL , i. e., PW = PWL , L
Corollary 3.5.6.
holds
correction are uniquely determined and coincide with the corresponding MR quantities. If a singular breakdown occurs the OR approximation does not exists. In any case there
(VL , WL ) = 0.
The close connection between MR and OR quantities is not restricted to the termination step as we will see in the remainder of this section. For m L we dene w1 , . . . , wm such that {w1 , . . . , wm } is an orthonormal basis of Wm and span{w1 , . . . , wm1 } = Wm1 . For notational convenience we set wL := 0 if dim WL = L 1 (i. e., if a singular breakdown occurs). In terms of these vectors, the MR approximation can be expressed as a truncated Fourier expansion
m MR hm = PWm r0 = j =1
The norm of the approximation error, i e., the residual
(r0 , wj )wj .
MR MR = r0 hm rm m
is given by
MR 2 rm
= (I PWm )r0
= r0
j =1
|(r0 , wj )|2 .
The
mth MR approximation can also be computed by updating the previous approxm MR hm = j =1 MR MR = hm 1 + (r0 , wm )wm = hm1 + PWm r0 PWm1 r0 MR MR MR MR = hm 1 + PWm (r0 hm1 ) = hm1 + PWm rm1 .
imation:
(r0 , wj )wj
(3.42)
(3.43)
The angle which occurs in (3.43) is strongly connected with the QR decomposition (3.22), as the next proposition shows.
29
Proposition 3.5.7.
Suppose that no singular breakdown occurs in step m (m L) (m) m+1 and denote by Qm = [qj,k ]j,k=1 the uniquely determined unitary matrix in the QR m = j,k m+1,m . Then there holds decomposition (3.22) of the Hessenberg matrix H j,k=1
(m)
(3.44)
|qm+1,1 | |qm,1 |
(m1)
(m)
m+1,m | |2 +
2 m +1,m
(3.45)
where Proof.
is dened by
t = Qm1 hm
and
hm = [j,m ]m j =1
Hm .
With respect to the orthonormal basis {v1 , . . . , vm+1 } of Vm+1 the vector r0 = (m+1) m ). This v1 possesses the coordinates u1 and Wm = ACm is represented by R(H implies (m+1) m )), , R(H (r0 , Wm ) = (v1 , Wm ) = 2 (u1 where the last angle is dened with respect to the Euclidean inner product on which is indicated by the subscript
Cm+1 ,
2. H Obviously, the columns of Vm+1 Qm form another orthonormal basis of Vm+1 (cf. (m+1) (3.24)). The coordinate vector of v1 with respect to this basis is Qm u1 , the rst m+1 column of Qm , and Wm is represented by all vectors in C with zero last component, m m+1 m a subspace which we identify with C . The orthogonal projection of C onto C is Im 0 (m+1) given by has unit length we get from (3.39) and since Qm u1 0 0 sin (r0 , Wm ) = sin 2 (Qm u1
(m+1)
, Cm ) = Im+1 Im 0 0 0
(m+1) Qm u1
Qm u1
2
(m+1) 2
= |qm+1,1 |,
(m)
MR sin (rm 1 , Wm ) =
where we assume that
MR rm sin (r0 , Wm ) (I PWm )r0 = = , MR (I PWm1 )r0 sin (r0 , Wm1 ) rm1
i. e. no regular breakdown occurs before step
(3.46)
a regular breakdown occurs in step m = L then (r0 , Wm ) = 0, since r0 Wm and MR rL = 0 , i. e., the quantities in (3.44) and (3.45) become zero. (m) (m1) i From (3.18) and (3.20) we conclude qm+1,1 = sm e m qm,1 , where sm and m are as dened in (3.21). Together this yields (3.45).
r0 Wm1 ,
m.
If
30
Since
MR MR PWm rm rm 1 W m MR rm 2 MR = rm 1 2
MR PWm rm 1
MR = rm 1
|(r0 , wm )|2 .
(3.47)
This shows that the MR approximation is not improved whenever the direction, in which
Wm1
is enlarged, is orthogonal to
r0 ,
or if
Wm1
mth
wm = 0
in this
cm =
1 s2 m =
MR 2 rm |(wm , r0 )| = , MR MR rm1 rm 1
(3.48)
mth
cm = 0.
Next, we further investigate the question of when the OR approximation is welldened, i. e., if cos (Vm , Wm ) = 0 and dim Wm = m. Noting that the set MR MR m } with w m := rm {w1 , . . . , wm1 , w / r is an orthonormal basis of Vm we see, 1 m1 that the cosines of the canonical angles between Vm and Wm are the singular values of the matrix (cf. [30])
|(w1 , w1 )|
...
|(wm1 , w1 )|
. . .
|(wm , w1 )|
. . .
0 . m )| |(wm , w
Proposition 3.5.8.
mth
step, we have
In the exceptional case of a singular breakdown the angle between VL and WL hapMR rL 1 is orthogonal to WL = WL1 . As remarked earlier regarding (3.32) we can arbitrarily dene the last Givens rotation GL in this case. pens to be zero whereas Taking the identity relates
GL
to the
4 Note, that we use a slightly dierent denition for the canonical angles then in [30], since we take
the absolute value of the inner products.
31
MR (rL 1 , WL ).
Proposition 3.4.5, the following consequence of Proposition 3.5.8 holds also true for a
m = L. r0
with respect to
Corollary 3.5.9.
i. e., if and only if
The OR approximation of
Vm
and
Wm
exists and
(wm , r0 ) = 0.
Remark 3.5.10. We can apply this corollary to the special case that the MR approximation detects a least squares solution of (3.1), which is, in view of Lemma 3.2.6, equivalent to
PR(A) r0 Wm :
PR(A) r0 = r0 , we conclude from (ii), Proposition 3.4.3, that a regular breakdown occurs in step m. Suppose now, that r0 R(A), i. e., the linear equation (3.1) is inconsistent, and PR(A) r0 Wm . Then a singular breakdown occurs in step m L. Since the MR approximation becomes the global best approximation over H, it can not improve in the subsequent steps m + 1, m + 2, . . . and the corresponding OR approximations
If (3.1) is consistent, i. e., fail to exist. This holds true, even if we proceed the approximation process after the singular breakdown using the look-ahead strategy described in Remark 3.4.9. Moreover, equation (viii) in Proposition 3.4.4 guarantees
PR(A) r0 WL1 ,
one iteration with stagnation before the singular breakdown. The question of when an MR method determines a least squares solution can be further investigated. correction spaces. The next proposition generalizes a result which was stated by Brown and Walker in [7] for Krylov spaces, and can be analogously proved for general
Proposition 3.5.11.
gular breakdown in the
mth
dim Wm = m x0 + c
m.
c Cm
such that
0 = A (b A(x0 + c )) = A r0 A Ac .
This holds true if and only if A r0 + A Wm = A Wm .
A r0 A Wm
A Vm+1 =
Finally, we show that dim Wm = dim A Wm : Clearly we have dim A Wm dim Wm . Suppose dim A Wm < dim Wm , i. e., there exists c Cm such that 0 = Ac Wm and A Ac = 0 . Then 0 = (c , A Ac ) = (Ac , Ac ) = Ac 2 , which is a contradiction.
32
We close this section with a theorem which summarizes the connections between MR and OR approximations and solutions.
MR m , wm ) = 1 and m = rm (wm , r0 ) = 0 we can dene w 1 /(wm , r0 ), which implies (w m , wj ) = 0 (j = 1, . . . , m 1). Using this denition we can express the oblique OR (w
If projection by the expansion
m1 Vm PW m
=
j =1
m )wm . (, wj )wj + (, w
Vm OR MR PWm )r0 = hm hm = (P W m
(3.49)
Theorem 3.5.12.
cm =
With the notation of Algorithm 3.4.11, MR cos (rm1 , Wm ) the MR approximations satisfy
MR sm = sin (rm 1 , Wm )
and
MR MR rm = sm rm 1 = s1 s2 sm r0 ,
(3.50)
If the
mth
= s1 s2 sm r0 /cm , = = =
MR 2 OR s2 m xm1 + cm xm + MR 2 OR s2 m hm1 + cm hm MR 2 OR s2 m rm1 + cm rm .
z,
with
z Z
mth
MR xm MR 2 rm
=
j =0 m
1 rjOR 2 1 rjOR 2
xjOR + z , hjOR
with
z Z
(3.56)
1
MR 2 rm
MR = hm j =0 m
(3.57)
1
MR rm
r MR = 2 m
j =0
1 r OR , OR 2 j rj
(3.58)
33
1
MR 2 rm
1
MR 2 rm 1
1
OR 2 rm
=
j =0
1 rjOR 2
(3.59)
for which
cj = 0
MR OR rm = rm + MR rm wm ,
MR 2 rm wm . (wm , r0 )
Since
MR rm
1+
MR 2 rm (wm , r0 )
OR rm
MR 2 rm 1 r OR 2 , (wm , r0 ) m
MR MR hm hm 1 ,
OR hm
2 2 which implies by sm + cm = 1the relationships (3.54) and (3.55). MR MR Since hm = hm1 if the mth OR approximation does not exist, i. e., if repeated application of (3.54) leads to m MR hm
cm = 0,
the
=
j =0 cj =0
2 m,j hjOR ,
(3.60)
where
m,0 :=
and
m,j := cj
s
=j +1 c =0
(1
jm
and
cj = 0).
=0 c =0
34
m,j
rjMR := rjOR
=j +1 c =0
MR rm r MR = , r MR rjMR 1
which obviously holds true also for j = 0. As long as no (regular) breakdown occur, we MR 2 MR to get (3.57). Similarly we can derive have rm = 0 and can divide (3.60) by rm (3.58) from (3.55). The rst equality in (3.59) is a reformulation of the Pythagoras identity
2 1 = s2 m + cm =
MR rm MR rm 1
2 2
MR rm OR rm
2 2
(cf. (3.46) and (3.51)), and the second follows by induction if we take into account that MR MR = hm hm 1 if cm = 0. Any identity for the approximations hm implies an analogous result for the corrections
cm
or iterates
xm :
A|Cm Z
(cf. (3.4) and Algorithm 3.4.11) shows that the corrections satises the identity up to a dierence
z Z
. Adding
x0
(4.1)
Cm = Km (A, r0 )
the correction space coincides with the preceding residual space, we can use the same basis for both in computations, which reduces the computing and storage requirements. Moreover, if
A is regular, a MR or OR method determines the exact solution whenever the Krylov space becomes A-invariant (cf. [60] or [22]). We will see below, that this is no longer true for singular A, that is, there exists combinations of right hand side b and initial guess x0 for which the system Ax = b is solvable, i. e., r0 = b Ax0 R(A), but no Krylov space contains a correction c with Ac = r0 .
(m = 1, 2, . . . ), m.
The
Pm
mth Krylov space if and only if there exists no nonzero polynomial q Pm1 with q (A)r0 = 0 . If such a polynomial exists for some m then there also exists a (unique) monic polynomial mr0 ,A of minimal degree with mr0 ,A (A)r0 = 0 which is called the minimal polynomial of r0 with respect to A.
polynomial space is isomorphic to the Since (3.12) can be rewritten as
Pm
(4.2)
35
36
the degree of the minimal polynomial equals the breakdown index implies, that polynomial integer
is nite. Conversely, if
L < ,
we conclude from (4.2) and the minthis index also equals the smallest
imality property of
(cf.
mr0 ,A
of degree
such that
(4.3)
Km (A, r0 ) = KL (A, r0 )
for all
mL
and
dim Km (A, r0 ) = m
if and
m L. Km (A, r0 ) N(A) = {0 }
as long as
This shows that a singular breakdown cannot occur as long as the sequence of Krylov subspaces does not terminate, i. e.,
m < L.
Thus
it is not necessary to employ a look-ahead strategy as described in Remark 3.4.9. A rst consequence of the polynomial connection is a representation of the residuals in terms of special polynomials: Since every vector
xm = x0 + cm = x0 + qm1 (A)r0 with some qm1 rm = b Axm = r0 Acm can be written as rm = r0 + Aqm1 (A)r0 = pm (A)r0 ,
The residual polynomial where
pm ( ) := 1 qm1 ( ) Pm . pm (0) = 1.
pm
belong to the MR and OR iterates as well as their zeros can be found in the literature (cf. [23] and the references therein). The results about the OR polynomials follow by the usual line of argument, which did not care about if
is singular or not.
The zeros of the MR polynomials can be characterized as the orthogonal section of 1 A onto Km (A1 , Am r0 ) if A is nonsingular. To obtain analogous results for singular
A,
The operator of choice is the subspace inverse with respect to the space, as long as no singular breakdown occur. step
mth
Krylov sub-
m=L
(cf. Proposition 3.4.5) and thus the MR residual polynomial remains the
ACm A is a projection onto Cm = k C Noting A r0 R(A m A) for 0 k m 1 Km+1 (ACm , Am r0 ) = Km+1 (A, r0 ) m = V m Rm AVm = Vm+1 H m,
this can be rewritten as
and
As a second tool we need an inverse version of the relation (cf. (3.24)). If no singular breakdown occur in step
m = Vm Rm 1 ACm V
37
since
Cm
Rm is nonsingular (cf. Proposition 3.4.4, (v)) and ACm A is a = Km (A, r0 ) = span{Vm }. This relations are the essentials to
argumentation and results of [23, Section 3.3] remain valid if we replace the inverse of A by the subspace inverse ACm . C The disadvantage of A m is its dependence on the mth Krylov subspace (and thus it depends on r0 ). The dependence on m is not signicant since we could just as well C choose A where C is the largest Krylov subspace with Km (A, r0 ) N(A) = {0 } (that is
KL1 (A, r0 )
or
KL (A, r0 )
r0
persists.
In Remark 4.3.6 we will see how to remedy this deciency under certain conditions by using a more handy generalized inverse. A careful analysis of our arguments above shows, that as long as no singular breakdown occurs in step m, every generalized inverse X X 2 X for which A A is a projection (i. e., (A A) = A A) and Km (A, r0 )
AX is suitable, R(AXA).
with respect to
is a generalization of the
mA
of a
nn
matrix
A.
mA ( ) = k0
i=1
where
( i )ki ,
(4.5)
i (i = 1, . . . , ) are the distinct nonzero eigenvalues of A. The number d := k0 is called the index of A. Setting 0 = 0, the numbers ki are the dimensions of the largest Jordan block associated with the eigenvalue i in the Jordan canonical form of A.
For later use we state three lemmata
A Cnn
is invariant under
similarity transformations.
N Cnn is a nilpotent matrix of index d (i. e., N d = O and m smaller index m < d such that N = O ), then its minimal polynomial is
If If
Lemma 4.2.3.
nomial of
A Cnn
and
B Cmm
A O O B
is the product
mA ( )mB ( ).
matrix
AD
of a
nn
AD AAD = AD , AD A = AAD A
d+1
and (4.6)
A =A A ,
38
where
d = index(A)
is the index of
A=T
where
C O T 1 , O N
(4.7)
T Cnn
and
is nilpotent of index
C Crr , r = rank(Ad ), are nonsingular and N C(nr)(nr) d, the Drazin inverse of A can be written as AD = T C 1 O T 1 O O AD = O
if and only if (4.8)
(cf. [14, Theorem 7.2.1.]). This implies The Drazin inverse of holds
there
Proposition 4.2.4.
1 = deg(mA ) 1
Let
A Cnn
denote
i=1
M = d+
ki
and
i = 1, 2, . . . , ).
(4.9)
p(A) = AD .
Proof.
The transformation of
(4.10)
A=T
where the diagonal blocks
J O T 1 O N
(4.11)
and
J = diag(J1 , . . . , Jh ) and
N = diag(N1 , . . . , Ng ). Each Jk is a Jordan block associated with a non-zero eigenvalue i . The matrices Nj are the Jordan blocks associated with the zero eigenvalue which mm implies Nj C with m d = k0 . Moreover there is at least one d d Jordan block in N . Thus, N is nilpotent of index d, which shows, that the Jordan canonical d form is a special form of (4.7). Lemma 4.2.2 implies that mN ( ) = is the minimal polynomial of N . p(J ) 0 Let p be an arbitrary polynomial. Then there holds p(A) = T T 1 , 0 p(N )
and, observing (4.8) and the identities
39
1 we conclude, that equation (4.10) is equivalent to p(Jk ) = Jk for k d p(N ) = O . Therefore p is divided by , which can be rewritten as = p(d1) = 0. Given
i 1 0 . . . 0 0 i 1 0 . .. .. Jk = . Cmm . . . 0 0 i 1 0 0 . . . 0 i
an easy computation shows that
p (i ) 2! p (i ) 1! .. .
... ...
.. ..
p(i ) .
J k 1
0 . = . . 0 0
1 2 i 1 i
1 3 i
1 2 i .. .
. .
...
0 i
1 2 i 1 i
have maximum dimension ki , and there (1)j j ! (j ) is a block with exactly this dimension, we obtain the ki equations p (i ) = j +1 Since the Jordan blocks corresponding to (j
= 0, 1, . . . , ki 1)
i .
Thus, the uniqueness of the Hermite inter-
p.
M = deg(mA ) 1.
Then
1 A = [0 0 0]
mA ( ) = 2 .
AD = O
p( ) = 0
than
of degree
satises (4.10).
M = deg(mA ) 1,
Let
Proposition 4.2.6.
A is not nilpotent the M = deg(mA ) 1.
A Cnn , mA
degree of a polynomial
its minimal polynomial and d = index(A). If p with p(A) = AD can not be smaller than
40
d = 0, the matrix A is regular. Suppose p(A) = AD = A1 for some polynomial p. By multiplying with A and rearranging this equation we see that s( ) := 1 p( ) is divided by mA , i. e., deg p deg(mA ) 1 Now, let d > 0. The coecients of the Hermite interpolation polynomial which
Proof.
If satises (4.10) can be expressed by determinants.
Consider
p( ) 0 0 0
. . .
1 1 0 0
. . .
0 1 0
2 0 0 2
... ...
d1 0
d 0
... ...
M 0
..
. . .
. . .
. . .
M +1
(4.9). The rst column consists of the right hand sides of the interpolation conditions and the subsequent columns contain the values/derivatives of the polynomials in the rst row evaluated at
i .
mial can be computed by expanding this determinant along the rst row and isolating
p( ).
The uniqueness of the Hermite interpolation guarantees that the lower right
in the expansion) is nonzero, i. e., its M columns are linearly independent. The coecient of is zero if and only if
p( )
0 0 0
. . .
1 0 0
. . .
0 1 0 0
0 0 2 ... 2 1
...
...
..
. . .
. . .
. . .
0
1 1 1 2 1
. . .
(d 1)! ...
...
0
1 M 1 2 (M 1)M 1
. . .
1 1 0 1
21 . . .
= 0.
0 2 2 ...
d! dj (dj )! i
1 2
(M 1)! M k1 (M k1 )! 1 1 M 2
......................................................
(1)j j ! +1 j i (M 1)! M 1 j (M 1j )! i
...................................................... (1)(k
1) (k 1)! k
(M 1)! M k (M k )!
41qo7a$h3l@ixnews3.ix.netcom.com.
sci.math.research,
Message-ID:
41
Since we only ask if the determinant is zero or not, we can scale each row or column by a suitable nonzero factor. Thus, in the rst be scaled to rows the only nonzero entry can j +1 (1)j j ! and multiplying the rows, which start with by i = 0 leads to j +1 j!
i
d1
0 0 0
. . .
1 0 0
. . .
0 1 0 0
0 0 1 ... 3 1
...
...
..
. . .
. . .
. . .
0 1 1
. . .
1 ...
...
0 M 1 (M 1)M 1
. . .
1 2 1 0
3 2 1 21 . . .
0 3 2 ...
d j +1 d i
2 2 2
M 1 M 1 k1 1 M 2
..........................................
M 1 j
M i M
..........................................
1)
M 1 k 1
i 0
. . .
are sorted according to the size of the largest Jordan block, i. e.,
ki k j
for
i > j, 1
0
.. .
0
. . .
0
. . .
0 1
. . .
0 1 2 1 0 2
1 3 1 3
0 M 1
. . . M
1 1
3 2 1 21
(M 1)M 1
........................................ , +1 d M 1 +1 d M (1)j 0 0 j 1 1 1 j j
. . .
+1 j j
d j
dj+1
. . . M 1 j
M j
........................................ 0 0 0 0
M 1 M 1 k1 1 . . . M 1 M k1 1 k1 1
42
= max{i : 1 i , ki > j .
Each
j , 0 j < k1 ,
corresponds
to a row block in this determinant. Obviously, the rows in dierent blocks are linearly On the other hand, the rows in each block are linearly independent, since they are up to suitable column scaling and additional zero columns rows of Vandermonde determinants. Thus, the determinant cannot be zero unless it reduces to the upper left
dd
A third equivalent denition of the Drazin inverse can be given in functional terms (cf. [14, Section 7.2]): The index of d d+1 which R(A ) = R(A ) and N(Ad )
integer for
H = R(Ad ) N(Ad )
(4.12)
d and the restriction A|R(Ad ) of A to the range of A is an invertible linear operator on R(Ad ). Given a vector x H, with the unique decomposition x = y + z , y R(Ad ), z N(Ad ), the Drazin inverse is dened by AD x = (A|R(Ad ) )1 y .
R(A ) with respect to the basis T n (recall that the columns of T forms a basis of H = C if T is nonsingular). Similarly, N d describes the restriction of A to N(A ) in terms of this basis. We retain the distinction n between H and the coordinate vectors and matrices even if H = C is assumed. Thus H we write, e. g., A for the adjoint of an operator A while we use C to denote the conjugate transpose of the coordinate matrix C .
matrix
Remark 4.2.7. With regard to the canonical forms (4.7) and (4.8) we note that the d
restricted to
For the reader's convenience we state an important property of the Drazin inverse, which can be found e. g. in [14].
Lemma 4.2.8.
onto
Let
R(Ad )
along
d denote the index of A. The (uniquely determined) N(Ad ) is given by PR(Ad ),N(Ad ) = AD A and there holds
and
projection
j d,
R(Adj ).
Then there
Lemma 4.2.9.
holds
Denote by
the index of
and let
j N0 , 0 j d.
v R(Adj ).
Using (4.12) this vector has the unique decomposition s N(Ad ). Applied to v this gives
43
where
y = Adj r R(Ad )
and
z = Adj s N(Ad )
v.
Aj z = Aj Adj s = Ad s = 0 , v R(Ad ) N(Aj ), i. e., R(Adj ) R(Ad ) N(Aj ). d j d d Let now v = y + z with y R(A ) and z N(A ) N(A ). Now, choose r R(A ) d dj dj dj and s N(A ) such that y = A r and z = A s . Then v = y + z = A (r + s ) R(Adj ), which completes the proof.
which shows that A well-known property of the minimal polynomial of a matrix every nonzero polynomial
is that
mA
divides
with
p(A) = O.
with respect to
A.
Proposition 4.2.10.
mial
m r ,A
which satises
p(A)r = 0 .
Clearly we have
Proof.
deg p
deg mr ,A .
by
m r ,A
to obtain
p( ) = s( )mr ,A ( ) + r( ),
with
there is a contra-
In particular, this shows that the minimal polynomial the minimal polynomial of
m r ,A
of any vector
divides
A.
mr ,A ( ) = q
i=1
where is
( i )ni ,
(4.13)
i = 0, i = 1, . . . , k , are nonzero eigenvalues of A. The quantity q = index(r , A) called the index of r with respect to A. It can be characterized independently of the
If
above representation.
Proposition 4.2.11.
holds where
with respect to
A Cnn
then there
(4.14)
q = min{j N0 : 0 j d d = index(A).
Moreover there holds
and
r R(Adj )},
(4.15)
is given
r =T
y 0 +T . 0 z
44
with respect to
A by mr ,A = q c( ) with c(0) = 0. 0 z
0 = mr ,A (A)r = T =T
i. e.,
mr ,A (C ) O O mr ,A (N )
y mr ,A (C ) O +T 0 O mr ,A (N )
mr ,A (C )y 0 +T , 0 mr ,A (N )z
mr ,A (C )y = 0 and mr ,A (N )z = 0 . Hence the minimal polynomials my ,C mz ,N divide mr ,A . Since N is nilpotent, mz ,N ( ) has to be of the form j q j (cf. Lemma 4.2.2). The fact, that c( ) is a multiple of mz ,N ( ) = implies j q . Now the assumption j < q contradicts the minimality property of mr ,A consider p( ) := j c( ). Hence mz ,N ( ) = q , which implies N q z = 0 and, using (4.7),
and
Aq T
This proves
0 0 =T = 0, z N qz
i. e.,
0 N(Aq ). z
r R(Ad ) N(Aq ) = R(Adq ), where the last equation follows from q Lemma 4.2.9. Moreover, since is the minimal polynomial of z with respect to N , 0 there can not exist a smaller index j < q , such that T N(Aj ). z
All assertions now follow from the obvious equivalence
R(Adq ) R(Adj ) q j.
Corollary 4.2.12.
then
If
and
d = index(A),
Km (A, r ) R(Adq )
Remark 4.2.13. If
m.
We call the se-
bounded linear operators which possess a decomposition like (4.12). quence of subspaces
a=
otherwise.
Similarly we call
45
with
N(An ) =
integer, d := a = n. The existence of a nite index d d d is equivalent to the existence of the direct decomposition H = R(A ) N(A ) (cf. [38, d Satz 72.4]) and the restriction A|R(Ad ) of A to R(A ) is invertible. 1 D The Drazin inverse now may be dened in functional terms as A v = A|R(Ad ) y, d d where v has the unique decomposition v = y + z with y R(A ) and z N(A ). D Equivalently, we can dene A as the unique solution of (4.6). The index q of a vector with respect to
a and n A by this
are nite, they must coincide (cf. [38, Satz 72.3]) and
m r ,A
L=q+
i=1
Another characterization of D contains A r .
ni .
Proposition 4.3.1.
L
Suppose that
r N(Ad ), d = index(A).
with respect to
always contains
L as dened in (3.13)). We rst show that the Krylov space AD r . Since KL (A, r ) = {p(A)r : p PL1 }, this is equivalent to the D polynomial of degree at most L 1 which satises A r = p(A)r . This
A (or KL (A, r )
and
can be
J O RJ RN P T 1 A = TP O N O O R
and
y r = TP z , 0
46
where the Jordan blocks in the block diagonal matrix as in the proof of Proposition 4.2.4.
corresponding to an eigenvalue J is nilpotent of index q . Using that we can proceed i have maximal dimension ni and N
By the same argument as in Proposition 4.2.6 we conclude that
deg p = L 1
if
L > q.
Remark 4.3.2. Consider a Krylov subspace method for solving (3.1) with initial residual r0 = b Ax0 . If r0 N(Ad ) = N(AD ) we have AD r0 = 0 and Km (A, r0 ) N(Ad ) for
all
m.
r0 with respect to A is the smallest integer q for which r0 N(Aq ) d d (cf. (4.15)). If r0 = 0 (and thus d = 0) we have q > 0. Since R(A ) N(A ) = {0 } q there holds mr0 ,A ( ) = and the sequence of Krylov spaces terminates with L = q . Using (4.4) we get rm = pm (A)r0 with pm ( ) := 1 qm1 ( ) Pm and qm1 (A)r0 = cm for each correction cm Km (A, r0 ). Therefore the residual is nonzero if m < L and rL = rq = 0 . Moreover, the breakdown in step L = q is singular since Aq r0 = 0 q 1 r0 , Aq r0 } = span{Ar0 , . . . , Aq1 r0 } = and WL = AKL (A, r0 ) = span{Ar0 , . . . , A AKL1 (A, r0 ) = WL1 (cf. (viii) in Proposition 3.4.4).
The index of in the largest Krylov space KL (A, r ) and the A-invariance of D this space yields that it is invariant under A too (and it is not dicult to see, that L d is the smallest index for which this invariance holds, provided that r N(A )). This The inclusion of
AD r
leads to a decomposition of
KL (A, r ) L
Ad
Corollary 4.3.3.
spaces of with holds
Denote by
r with respect to A and by q the index of r with respect to A. Let r = s + z s R(Ad ), i. e., s = PR(Ad ),N(Ad ) r , and z = PN(Ad ),R(Ad ) r N(Ad ). Then here KL (A, s ) = AD AKL (A, r ) = R(Ad ) KL (A, r ) KL (A, r ), KL (A, z ) = N(Ad ) KL (A, r ) = N(Aq ) KL (A, r ) KL (A, r ) KL (A, s ) KL (A, z ) = KL (A, r ).
and
(4.18) (4.17)
In the rightmost relation of (4.17) equality holds if and only if d to z = 0 or N(A ) KL (A, r ) = {0 }.
q > 0)
As already noted, the largest Krylov space is invariant under the Drazin inverse, D more precisely, in view of Corollary 4.3.3, we have A KL (A, r ) = KL (A, s ) = AD AKL (A, r ) = R(Ad ) KL (A, r ). In essence, we are able to express quantities related D with A using the Drazin inverse of HL , the orthogonal section of A onto the invariant Krylov subspace.
47
Proposition 4.3.4. Suppose, for r H with q = index(r , A), that we have established
the Arnoldi decomposition
HL
equals
AVL = VL HL with span{VL } = KL (A, r ). Then v KL (A, r ) with v = VL y , y CL , there holds R(Ad ) KL (A, r ),
the index of
D PR(Ad ),N(Ad ) v = VL HL HL y
and
Using the QR decomposition (3.32) of HL we can compute a basis of the nullspace q q of HL (and at the same time a basis of N(A ) KL (A, r )). Assume q > 0, i. e., HL is singular. Recall from (3.31) and (3.36), that the nullspace of HL is spanned by
= g (1) := g
Thus, a vector
1 hL g RL H 1 t L1 = = . 1 1 1
but
g (2)
for which
2 (2) HL g =0
HL g (2) = 0
HL g (2) = g (1)
for some
= 0.
index(HL ) > 1, f
q 2.
Multiplying (4.19) by
QL1
RL1 t 0 0
s f = ,
where
= QL1 g
and
y=
s .
, happens , which is the last component of QL1 g normalization = 1 and = 1 we result in the solution
g (2) =
and
1 1 s RL RL 1 f + g 1 f = = + g (1) 1 0
2 span{g (1) , g (2) } = N(HL ). An obvious induction shows, that we can compute a q (1) (q ) basis {g , . . . , g } of N(HL ) and the basis {VL g (1) , . . . , VL g (q) } of N(Aq )KL (A, r ) = d N(A ) KL (A, r ) in this way. The occurrence of a regular or singular breakdown only depends on the index q of the initial residual with respect to A:
Theorem 4.3.5.
Cm
A, b and x0 be given and apply an MR or OR method with MR OR = Km (A, r0 ), r0 = b Ax0 , to compute approximate solutions xm or xm of
Let
q = index(r0 , A) = 0,
i. e., if
r0 R(Ad ).
48
r0 Wm = AKm (A, r0 ) (cf. (ii) in Proposition 3.4.3). Then there exists a polynomial p of degree m 1 such that r0 A p(A)r0 = 0 . Thus, the polynomial r( ) := 1 p( ) is divided by the minimal polynomial of r0 with respect to A (cf. Lemma 4.2.10), which shows that the latter has a nonzero constant term, i. e., q = 0. Suppose now, that q = 0 and denote the minimal polynomial by
Proof.
A regular breakdown occurs if and only if
mr0 ,A ( ) = L + L1 L1 + + 1 + 0 ,
where
0 = 0.
Consider
1 L1 L2 L1 2 1 + + + + . 0 0 0 0 Obviously, r0 A p(A)r0 = 0 and, since p(A)r0 KL (A, r0 ), we conclude that r0 AKL (A, r0 ). From the properties of the minimal polynomial it follows also that there exists no smaller index m < L with r0 AKm (A, r0 ) = Wm . D D Since A r0 KL (A, r0 ) a suitable correction is cL = A r0 . We compute p( ) := rL = r0 AcL = r0 AAD r0 = (I AAD )r0 = 0 ,
where we made use of the fact, that (4.20)
r0 R(Ad )
cL
and Corollary 3.5.6). Note that we have only made use of the fact that the largest Krylov space contains D A r0 . The minimality property (4.16) is not needed in general and in the regular case
q=0
this property is a simple consequence of (4.20) and Proposition 3.4.7 (see also
Lemma 3.1.1). Theorem 4.3.5 is a reformulation of a result by Ipsen and Meyer [44]: A Krylov space contains a solution of (3.1) if and only if the right hand side b belongs to the range of Ad and the initial guess x0 is chosen such that r0 = b Ax0 R(Ad ). For b = 0 this can always be achieved by
b R(Ad ), then no Krylov space can contain a solution, unless x0 is chosen such that Ax0 has the same d components as b in the direction of N(A ), i. e., PN(Ad ),R(Ad ) b = PN(Ad ),R(Ad ) Ax0 . But D in general, computing of the projection PN(Ad ),R(Ad ) = (I AA ) is as dicult as the
In other words: If the linear equation (3.1) is consistent but original problem of solving a linear equation.
x0 = 0 .
r0 R(Ad ) the Drazin inverse is an equation solving inverse on Km (A, r0 ) R(Ad ) and no Krylov subspace contains components in the nullspace of A, i. e., N(A) Km (A, r0 ) = {0 } for all m L. Thus it is not surprising that
Remark 4.3.6. If
all known results about Krylov subspace methods for a nonsingular operator (cf. [22] and [23]) remain valid for a singular operator
r0
such that
q = index(r0 , A) = 0,
if we replace the proper inverse by the Drazin inverse. In particD ular, the MR residual polynomials can be characterized in terms of Drazin inverse A (instead of the subspace inverse) using the same line of argument as in Remark 4.1.1.
49
index(r0 , A) = 0
Theorem 4.3.5 and Corollary 3.5.6). If (3.1) is inconsistent and the steps up to the singular breakdown in step
mth
Krylov space
contains a least squares solution, the MR method makes no progress in the subsequent
L > m.
behavior is provided by the backward shift operator with the rst unit vector as right hand side and the all zero vector as initial guess (cf. [7, Example 1.2]). A natural question is, under which conditions the possible Krylov correction in the
AD r0
Lth
MR method. We will answer this later in this section. In [7] it is shown that GMRES yields in the least squares solution of (3.1) for all right hand sides b and all initial vectors x if and only if N(A) = N(A ). We cite from Campbell and Meyer several characterizations of this situation (cf. [14, Denition 4.3.1 and Theorems 4.3.1 and 7.3.4]).
Theorem 4.4.1.
(i) (ii) (iii) (iv)
Suppose
A Cnn .
Cn = R(A) N(A)) U
and an nonsingular matrix
C Cr r ,
where
A,
such that
A=U
C O U O O
see, that this is equivalent to the condition also, that (iv) can be rewritten as
Remark 4.4.2. A matrix with the property (iii) is called range hermitian. It is easy to
Note
R(A) = N(A),
which is used by Hayami in [34].
50
R(A) N(A)
index(A) 1. This may be seen directly or one observes that the orthogonality implies R(A) N(A) = {0 } and uses Proposition 4.2.11 below. The direct decomposition is
then provided by (4.12). From the condition
index(A) 1
also the Moore-Penrose inverse coincide with the group inverse (cf. Section 2), i. e., A = AD = A# .
r0 R(A) for every choice of x0 and this means, by (4.15), that the index q of r0 with respect to A is zero. Applying
Now, if (3.1) is consistent it follows immediately that Theorem 4.3.5 shows that any Krylov method breaks down regularly with the solution
MR OR xL = xL = x0 + A# r0 = A# b + (I A# A)x0 = A b + PN(A) x0 .
If, on the other hand, the equation is inconsistent, we have
(4.21)
for every initial guess x0 and an MR Krylov methods breaks down singularly in step L. Since A r0 = AD r0 KL (A, r0 ) = CL we have
q =1
m<L
for which
and thus an MR subspace correction method yields a MR least squares solution. By Lemma 3.2.6 the correction is of the form cm = A r0 + z N(A) and the corresponding least squares solution is with z
MR xm = A b + z
In the terminating step
with
z N(A).
since
KL (A, r0 ) = {0 }. Using (3.36) we may compute the subspace inverse solution. MR cL = A# r0 = A r0 leads to
MR xL = A# b + (I A# A)x0 = A b + PN(A) x0 ,
which coincides with the MR solution for a consistent problem. For we get the pseudoinverse solution A b in both cases. If
N(A)
Setting
x0 R(A) N(A)
Note that our considerations prove one direction in the result of Brown and Walker:
arbitrary vectors
and
x0 .
1 0 0 A = 1 0 1 R33 . 0 0 1
51
A2 = A,
in particular
Moreover,
0 1 0 . Obviously, the right hand side b = 1 1 1 does not lie in the range of A. We apply a Krylov method with the zero vector as initial guess. MR MR In the second step the Arnoldi process breaks down singularly and c2 = c1 +z MR = 1 1 1 with z K2 (A, b ) N(A) = N(A). The rst MR correction is c1 where minimizes 1 1 1 1 1 A( 1) = 1 0 , 1 1 1 1
and i. e., = A r0 =
N(A) = span
1 and the residual norm is 1. This is not 4/3 0 2/3 has residual norm 1/ 3.
A b =
MR MR MR x2 = c2 = x1 z
0 1 1 1 , 0 1
i. e., the subspace
which is done by
= 1
and results in
MR c2 = AK2 (A,b ) b = AD b ,
= 1 0 0 R(A) shows. Again b MR = 1/2 0 0 and this is we get a singular breakdown after two steps. Now c1 D = 1 1 0 and also the solution with minimal norm after the second step, but A b = 2/3 0 1/3 . A b
But also this need not to be true in general as This example shows that, if
converge neither to a least squares solution nor to the Drazin inverse solution even if the index of the operator is one.
index(A) = 1 deserves closer attention. Of course, the index is zero if and only if A is nonsingular. Thus, stating index(A) = 1 is equivalent to demand index(A) 1 for singular A.
Though, the case
Proposition 4.4.4.
(i) (ii) (iii) (iv)
52
Proof.
If
For the equivalence of (i) and (iv) we refer to [14, Denition 7.2.4 and Theo-
rem 7.2.5].
A is nonsingular, the assertion is trivial since N(A) = {0 }, R(A) = H and index(A) = 0. Therefore we assume A is singular or, equivalently, index(A) 1. The implications (i)(ii)(iii) are direct consequences of (4.12). Now, suppose (iii) and denote by d the index of A. From Lemma 4.2.9 we obtain R(A) = R(Ad ) N(Ad1 ).
with each component of this direct decomposition are d assumed to be the trivial space containing only 0 . For R(A ) this is evidently The intersections of true (regardless of d1
N(A)
d = 1 or greater). Thus, the only restriction for d arise from N(A ) N(A) = {0 }. If d = 1, even this is evident. On the other hand, if d > 1 d1 there holds N(A) N(A ) and we have a contradiction. index(A) = 1, a similar result as in (4.21) is implied along the same line of argument if we replace PN(A) by the oblique projection PN(A),R(A) .
If is not range hermitian but
Proposition 4.4.5.
If
index(A) = 1
and
b R(A),
OR MR = b. = AxL AxL
This was rst observed by Freund and Hochbruck in [29, Corollary 3] in the context of the QMR method. the solution in terms of Similar results, though without stating the representation of A# , are provided by Brown and Walker for GMRES (cf. [7,
Theorem 2.6]) and by Hayami for the conjugate residual (CR) method (cf. [34, Theorem 3.21]) and GCR (cf. [35]). Assume, a Krylov subspace method yields a solution of (3.1) for any initial guess
x0 ,
there holds
r0 Wm = AKm (A, r0 )
regularly (cf. Proposition 3.4.3). By Theorem 4.3.5 this is equivalent to where choosing
d = index(A) and r0 = b Ax0 . This must hold true for arbitrary x0 . By x0 = 0 we conclude b R(Ad ). Thus, for any x0 we have Ax0 R(Ad ). d d We decompose x0 = s + t subject to s R(A ) and t N(A ). Then Ax0 = As + At = As , i. e., At = 0. This applies for any t N(Ad ), which is only possible if d = 1 or d = 0.
Thus we have proved: If an (MR or OR) Krylov method yields a solution of (3.1) for
any initial guess x0 , then index(A) 1 and b R(A). Vice versa, if index(A) 1 and b R(A), we already know from Proposition 4.4.5 that the Krylov method determines a solution regardless of x0 . Let us summarize:
x0 ,
if and only if
index(A) 1
and
b R(A).
53
Note the dierence between our result and that in [7, Theorem 2.4]: squares solution. Theorem 4.4.1 characterizes the matrices for which answers the question when there holds
We ask for
proper solutions, which requires consistency, while Brown and Walker seek for a least
A = AD ,
in other words, it
A r = AD r
for any vector the matrix for some
r . We now look for a characterization of the vectors for A, the relation (4.22) is satised. We rst assume r R(A), y Cn . Using (4.22) and AA A = A we obtain r = Ay = AA Ay = AA r = AAD r , R(Ad ), where d = index(A).
r = Ay
Similarly, for
r R(A) = N(A ) =
(I AAD )r = r AAD r = r AA r = r
that
r N(Ad ).
Suppose
Proposition 4.4.7.
Proof.
d = index(A).
s := PR(A) r R(Ad ) and t := PN(A ) r N(Ad ). d Thus we have established a decomposition r = s + t into the components on R(A ) d d d n and N(A ). Since R(A ) N(A ) = C (cf. (4.12)) this decomposition is unique D and the components are given by the oblique projection PR(Ad ),N(Ad ) = AA of r d d D onto R(A ) along N(A ) and its complementary projection I AA = PN(Ad ),R(Ad )
(cf. Lemma 4.2.8).
D Proposition 4.4.7 implies, that (4.22) is sucient for A r0 to be a least squares D solution of the residual equation, i. e., for x0 + A r0 to be a least square solution of
(3.1).
Proposition 4.4.8.
The correction
MR cL = AD r
Ac = r
(4.24)
if and only if
AAD r = AA r
Proof.
(or equivalently
AD A r N(A))
MR Note that (4.24) is merely another formulation for (4.23). A vector c is a MR least squares solution of Ac = r if and only if PR(A) r Ac = 0 or, equivalently, MR MR if and only if PR(A) r = AA r = Ac . Using the last equation, we get: If cL = AD r
is a least squares solution, then there holds (4.24) (and (4.23)). D If, on the other hand, AA r = AA r holds true, we choose
is possible in the termination step L. A simple calculation AA r = PR(A) r , which implies that c MR is a least squares solution.
54
Another way to generalize the results of Brown and Walker in [7] is to investigate under which conditions a Krylov method constructs a least squares solution up to some projection. The close connection of Krylov spaces and the Drazin inverse suggests the D projection PR(Ad ),N(Ad ) = A A.
Proposition 4.4.9.
Let
be nite dimensional. If
R(Ad ) N(Ad ),
there holds
(4.25)
AD = A AAD = AD AA .
Remark 4.4.10. Note, that the identities in (4.25) can be rewritten in several ways using PR(Ad ),N(Ad ) = AD A = AAD , PR(A) = AA and PR(A ) = A A. In particular, (4.25) is D equivalent to A = PR(Ad ),N(Ad ) A = A PR(Ad ),N(Ad ) , which means that the Drazin and
the Moore-Penrose inverse coincide up to the projection as a generalization of (i) in Theorem 4.4.1. Note further that the orthogonality assumption implies that the orthogonal projection
PR(Ad ) .
Proof.
R(Ad ) R(A). We can extend UC to an orthonormal basis W = UC UW of R(A). Since H = R(A) N(A ) and R(A) N(A ), we can further extend this to an orthonormal basis UC UW Y of H, where Y is a basis of N(A ). We set UY := UW Y . Obviously, H = span{UC } span{UY } d is an orthogonal decomposition of H and, since UC is a basis of R(A ), its orthogonal d complement N(A ) is spanned by the basis UY . d Now, denote by Z an orthonormal basis of N(A) N(A ) and extend it to an Z of N(Ad ). Since R(Ad ) N(Ad ) we may join UC orthonormal basis UZ = UV Z of H. We set with UZ to obtain another orthonormal basis UC UZ = UC UV V := UC UV . Obviously, H = span{V } span{Z } is an orthogonal decomposition of H. But span{Z } is the nullspace of A and thus the orthogonal complement has to coincide with R(A ). In other words, V is an orthonormal basis of R(A ).
Let
UC
be an orthonormal basis of
The remainder of the proof will be done by observing, how the operators act on these orthonormal bases. Note rst, that
A UC UY = UC UY A UC UZ = UC UZ
and
C O O NY C O O NZ O O O O = UC C 1 O O O O O O O O = UC C 1 O O O O
(4.26)
AD UC UW Y = UC UW
AD UC UV
Z = UC UV
1 C Y O O 1 C Z O O
55
Since
AA
R(A)
it follows that
AA W Y = W O = UC UW
which leads to
I O O Y O I O , O O O
AD AA UC UW Y = AD UC UW
UC C
I O O O O O I O = UC C 1 O O = AD UC UW Y . O O O
implies
I O O Y O I O = O O O
Similarly,
A A = PR(A )
A A V
and
Z = UC UV
A AAD UC UV
Z = A A UC UV
UC UV
1 C O O O O O O = UC C 1 O O = AD UC UV O O O
1 C O O Z O O O = O O O
Z ,
can be written as
A UC UV
Z = UC UW
C O O Y O M O , O O O
(4.27)
C Crr and M Css are nonsingular, r = rank(Ad ) and s = rank(A)r. The zero blocks in the last column are a simple consequence of span{Z } = N(A). That the last row is zero can be seen from the adjoint equation together with span{Y } = N(A ). d Since R(A ) is A-invariant, there holds AUC = UC C , which shows that the (2, 1) block d is zero. Similarly, the A-invariance of N(A ) implies AUV = UW M + Y B and the (1, 2) block has to be zero, since UW and Y are orthogonal to UC . The representation (4.27)
where can be regarded as a generalization of (v) in Theorem 4.4.1.
56
Proposition 4.4.11. If R(Ad ) N(Ad ), then for any right hand side b and any initial
guess
x0
the
Lth
MR correction
MR cL
with respect to
KL (A, r0 ), r0 = b Ax0
satises
Proof.
The
Lth
MR correction minimizes
c KL (A, r0 ).
tion this can be done independently for the components in the range and nullspace of Ad . But PR(Ad ) r0 APR(Ad ) c = 0 for PR(Ad ) c = AD r0 KL (A, r0 ) R(Ad ). The identities in (4.25) can be proved under much weaker conditions.
Proposition 4.4.12.
(i) (ii)
There holds
AD = A AAD R(Ad ) R(A ), AD = AD AA N(A ) N(Ad ). = R(Ad ) and A A = PR(A ) . AA ) = O and vice versa.
SimThe
Proof.
for any
R(AD ) = R(Ad ) R(A) = R(AA ) and R(I A A) = N(A) N(Ad ) = N(AD ) A, there holds AD = AD A A = AA AD .
d Together with (i) of the above proposition, we conclude that R(A ) R(A ) implies AA AD = A AAD , i. e., A commutes with its pseudoinverse on R(Ad ) (which is a
weakened version of (ii) in Theorem 4.4.1). Similarly, for any vector d d dierence AA v A Av lies in N(A ) if N(A ) N(A ). An example for a matrix
v H,
the
R(Ad ) R(A )
N(A ) N(Ad ),
(4.28)
R(Ad ) N(Ad ),
d = index(A),
(4.29)
is not obvious. This is impossible in three dimensions, as may be veried by a short exercise (but see also below). We give here general construction guidelines and illustrate them by a concrete example. Before we start some observations should be mentioned.
57
First, (4.29) implies (4.28). This follows from a combination of Propositions 4.4.9 and 4.4.11 or may be seen from the bases constructed in the poof of Proposition 4.4.9. Now, assume that (4.28) is satised and that actually equality holds in one relation, d d e. g. N(A ) = N(A ). Then dim N(A ) = dim N(A ) = dim N(A), and, since N(A) d d N(A ), this implies N(A) = N(A ), which means that d = index(A) 1. But then using the second inclusion we get
d = 1.
For
R(Ad ) = R(A )
we conclude similarly
d1
C O TA = T O N , (nr )(nr ) where C C is nonsingular, r = rank(A ), d = index(A), and N C is nilpotent of index d. We make no further initial assumption on C and N . At rst we choose the nullspace of A . Let Y = y1 . . . yt denote an orthonormal basis of N(A ). From its orthogonal complement R(A), we choose an orthonormal basis U d of an r -dimensional subspace which becomes the range of A . We want to construct d the basis T such that T = U X Y , where X Y forms a basis of N(A ). This d construction ensures that the condition N(A ) N(A ) is satised (the inclusion has
to be proper, since otherwise (4.29) holds). For later use, we note that, given Y and U as above, one can choose a basis V = v1 . . . vs with s = n r t such that U V is an orthonormal basis of R(A) and Y is an orthonormal basis of H, i. e. U V Y U V Y = I. thus, U V t An arbitrary vector y N(A ) is of the form y = Y g with g C . Using U Y = O we obtain
0 = T A y = (A T ) y =
C O
O NH
0 U H O X Y g = C X Yg . O NH Y Y Y g
Since g is arbitrary, we conclude that the t columns of X Y Y lie in the nullspace of H N . This null space must have the same dimension t as the nullspace of A . Therefore,
if
S=
SX SY
denotes a basis of
and
is restricted equations
ts
2 With regard to the notation with and H , we recall our distinction between quantities belonging to
the general Hilbert space and their coordinates with respect to a specic basis, cf. Remark 4.2.7.
58
and
Y orthonormal as above, this condition reduces t components of the vectors in N(N H ) can vary arbitrarily. The Jordan canon1 ical form JN of N = TN JN TN must contain exactly t Jordan blocks, i. e., there are exactly t columns and rows with all zero components. Thus it is possible to choose TN as a permutation such that the last t rows of N are zero. In other words, the Jordan structure of N (and A) can be chosen arbitrarily. This denes the number t, and now we choose t orthonormal vectors, which form the columns of Y . d Since N(A) N(A ) = span{X, Y }, we can write any vector in the nullspace of A
as
z = X Y
Let
gX , gY
where
g=
gX gY
is a zero eigenvector of
N.
{g1 , . . . , gt }
denote a basis of
N(N ),
(j )
whereas
g gj = X (j ) gY
The condition
for
j = 1, . . . , t.
R(Ad ) R(A ) = N(A) implies that the vectors of the basis U are (j ) X Y gj = U X O gj = U X gX for orthogonal to any z N(A), i. e., 0 = U j = 1, . . . , t. This yields another set of tr equations for X . If we write the vectors (j ) gX as columns in a matrix G Cst , a short representation of these equations is U X G = O . Altogether, we have a system of tr + ts linear equations for the s unknown vectors H in X = x1 . . . xs . Setting X = Y SX , we conclude that the system is solvable. Obviously, this solution violates a third condition on the basis X , which arises from R(Ad ) N(Ad ), namely U X = O . If there exists a matrix F Cts such that GF = Is
any solution (4.30)
implication holds true. Note rst that (4.30) implies that (since there holds to
GFG = F ). U V Y
rank G = s.
Remember that is an orthonormal basis of H. We can express any H r s in terms of this basis. Let X = Y SX + V + U M with M C . Then is satised and condition U X = O is equivalent to M = O . The matrix reduces to
X SX = X Y equation U X G = O
solution is given by
MG = O .
H (I GG ) | H Crs .
Thus, a nonzero solution of
GG = I rank G = s.
if
MG = O
59
N:
has a rank dierent g1 . . . gt of N(N ) such that the s t upper block of G If s = 1, the matrix G degenerates to a row vector g and (4.30) holds if we set F = g /(g , g )2 . This can be done since g = 0 , which follows from the fact that N(A) d has to be a proper subset of N(A ). Consequently, if s = 1, it is not possible to construct a matrix A with the desired properties. d On the other hand, s > 1 means that dim N(A ) dim N(A) + 2. In other words, N must at least consist of a three-dimensional Jordan block or of two two-dimensional
Jordan blocks. This explains, why the smallest possible example is four-dimensional. st Evidently, if t < s, the rank of G C is always smaller than s. The example given below is also of this type. It remains an open question whether there exists an The condition t < s can be formulated more comprehensibly Ad has to be large in comparison with the nullspace of A. More precisely, the dimension of N(Ad ) has to exceed twice the dimension of the example with as follows: The nullspace of nullspace of
= G from s.
t s.
A. A R4 4
as described above. For the
we choose the
33
Jordan block
0 1 0 N = 0 0 1 . 0 0 0
The regular component is simply C = 1 . This implies index(A) = 3 and r = rank(A3 ) = 1. Obviously, N(A ) is spanned by the last unit vector, in particular; t = 1 H and consequently s = n r t = 2. Thus we have SX = 0 0 and SY = 1 . The latter means that we can choose any vector of unit norm as the basis
Y.
Let
1 0 Y = 0 , 0
The nullspace of nonzero
0 1 U = 0 0
and
0 0 V = 1 0
0 0 . 0 1 G = 1 0
. A
with
MG = O
1 0 H + V + UM = X = Y SX 0 0
M = 0 1 0 0 0 0 + 1 0
and thus
0 0 0 1 0 0 + 0 1 = 1 0 0 1 0 0
0 1 . 0 1
60
0 1 A= 0 1
which has the desired properties.
0 1 0 0
0 0 0 1 0 1 0 0
K MR : H H,
its
Lth
A:
MR K MR r = cL = AKL (A,r ) r .
For simplicity we assume that
H.
K MR
is well dened on
the degree of the minimal polynomial of r with respect to A, is not MR continuous as function of r , we cannot expect that K is a continuous operator. On MR the other hand, for regular A it is known that K = A1 is linear. More generally, MR = AD on R(Ad ), i. e., K MR is continuous from Theorem 4.3.5 we conclude that K index(A) on R(A ). Since
L,
0 1 0 A = 0 0 1 0 0 0
and the initial residual
r= 0 1 0 0 , 1 0 2.
H2 =
=0
and
MR MR cL = 0 , which 1 = c1
is also the subspace inverse solution, i. e., the MR approximation in the Krylov space MR MR K2 (A, r ) with minimal norm, so that cL = c2 = 0. 3 If = 0, in exact arithmetic we get L = index(r , A) = 3 and K3 (A, r ) = H = R . Thus, the subspace inverse coincides with the Moore-Penrose inverse and yields
MR MR cL = c3
0 0 0 0 0 = 1 0 1 1 = 0 0 1 0 1
MR c2 = 1/2
MR ||. The previous Krylov approximation cL 1 with the same residual 0 1 . Note further that 3,2 = (I PV2 )A2 r = O(2 ).
61
K MR
tion may become unstable in presence of roundo errors. Due to the growing computational and storage requirement for larger correction spaces in practice a restart strategy is often used: After a xed number algorithm (3.14) the calculated basis of with the current iterate
of steps in
Vm
xm
In terms of polynomials and Krylov subspaces this means, that we have formed
rm = b Axm = r0 Aqm1 (A)r0 = pm (A)r0 with respect to Km (A, r0 ) and then continue the MR approximation process with Cj = Kj (A, rm ), j = 1, 2, . . . . With regard to our results about the role of the index one may ask, when index(rm , A) = 0 holds true. Unfortunately, this is only possible if index(r0 , A) = 0 since the residual polynomial satises the normalization condition pm (0) = 1 (cf. [64]). On the other hand, if the restart index m is smaller then the termination index L (which is the
typical case in practice), we will never be annoyed with a singular breakdown. This seems to be good news, but we can not expect that the residuals of the restarted method index(A) converge to zero unless r0 R(A ). even if the initial residual satises To add insult to injury, due to round-o errors we may end up with a nonzero index r0 R(Aindex(A) ) or, equivalently, index(r0 , A) = 0. index(A) This problems can be resolved if a projection on R(A ) is known. For this was also observed by Brown and Walker (cf. [7, p. 44]). An ap-
index(A) = 1
plication of this kind of stabilizing Krylov methods is given in Section 5 in context of Markov chains.
A.
subspaces to approximate the Drazin inverse solution of (3.1). The basic idea is to d force all computations into R(A ). Corollary 4.2.12 shows that this can be done by q multiplying all relevant quantities by A , q = index(r0 , A). Of course, this is only possible if known. The simplest case arises for then one additional multiplication with
d = index(A),
is explicitly
d = 1 (for example, if A is symmetric or skew-symmetric), A suces. For b R(A) even this is not R(A) N(A)
necessary (cf. Proposition 4.4.5). If the linear system is symmetric (but inconsistent) the analysis of the resulting method benets additionally from the fact (cf. [26]). Unfortunately, in general a simple bound for equations of nite rank, namely for always given by the dimension
nn
or
is
n.
computational eort for large systems and since the method becomes ill-conditioned: k The condition number of A increases of one power per each multiplication with A.
62
To describe the general approach in details, suppose we know an upper bound of q . d Since R(A ) is A-invariant, without lost of generality we can base our considerations on
Aq Ax = Aq b .
Given an initial guess
(4.31)
q q q the corresponding residual is now A b A Ax0 = A r0 . The d correction spaces have to be chosen such that Cm R(A ). This can be done by
x0
Aq r0
with respect to
r0
with respect to
A.
m, Aq+1 Cm = Vm+1 H
q where Vm+1 = v1 . . . vm is an orthonormal basis of Vm with v1 = A r0 / ( MR MR have to minimize = Cm ym Aq r0 ). Since we solve (4.31) the MR correction cm MR MR Aq rm = Aq r0 Aq+1 cm =
:=
= Vm+1 u1
implies that
(m+1)
MR Aq+1 Cm ym = u1
(m+1)
MR m ym H , MR cm R(Ad )
MR MR xm = x0 + cm
and
and
m=Lq
we know that
cLq = AD r0
Aq rLq = 0 .
S. To be precise, {X (t) | t T } is a family of random variables on a probability space (, F, Pr {}) with values in S. The sample space is arbitrary, F is a sigma algebra whose elements are subsets of and sometimes called events. Pr {} : F [0, 1] is a probability measure. For each xed t T the mapping X (t, ) = X (t) is a measurable function, i. e., the pre-images { | X (t, ) = i} = {X (t) = i} are measurable sets for all i S. Thus, we can measure probabilities such as Pr {X (t) = i} or Pr {X (s) = x(s)} (with a deterministic function x : T S). We denote the states by positive integers, S = {1, 2, . . . }. We will refer to the parameter set T = [0, ) as time, an element s T is a moment or time instant and for the moments s, s + t T we refer to t as the period between them. In other words, X (t) is a random function. For every xed the (deterministic) function t X (t, ) = X (t) is a realization of the stochastic process (also called
a sample path or trajectory ). Under relatively weak regularity conditions one may
63
64
conclude, that the sample paths of a stochastic process are piecewise constant functions (step functions).
processes, such that the realizations are right continuous step functions.
The jumps are often called events or transitions of the process. We say that an event
s. The event is said to be the transition from state i to state j if X (s) = j and X (t) = i for t < s in the neighborhood of s. The probability of being in state j at time s is Pr {X (s) = j }. Similarly, Pr {X (t) = i, X (s) = j } is the joint probability that the process is in i at time instant t and in j at s. Moreover, we will use conditional probabilities, dened as usual by Pr {X (s) = j | X (t) = i} = Pr {X (t) = i, X (s) = j } / Pr {X (t) = i}. For t < s this may be interpreted as the probability that the process stays in j after time s given that it was in state i at the previous moment t.
happens at time if the considered realization makes a jump at As common in Markov chain literature, we use row vectors for distributions and denote them by (bold) small Greek letters. The state space distribution or state distri-
sT
is the vector
(t) := Pr {X (t) = i}
The
iS
=: i (t)
iS
ith S.
entry
t.
i (t) of (t) is the probability, that the process is in state i at the time t T the vector (t) forms a probability distribution over the state 0 i (t) 1 1 := 1 . . . 1 i (t) = 1,
iS
space
1, (t) 0 (t)1 = 1.
(5.1)
Here and henceforth inequalities between vectors or matrices are interpreted as inequalities in each component. Comparing
(t)
view-points regarding a stochastic process: At each point of the time scale we have a probability distribution on the state space which describes the stochastic behavior of the process at this moment. If we take for each moment of the entire process.
tT
a realization of these
Example 5.1.1 (Homogeneous Poisson process). The Poisson process with intensity
>
is a stochastic process
X (t)
S = {0, 1, 2, . . . }
having stochastically
1 This is not true in general, even not for Markov processes. Many Markov chain books (cf. [19, 28,
27]) are in great parts devoted to the construction of such pathological examples. However, all those example seems to require a countable innite state space.
65
Pr {X (t) X (s) = n} =
for
[(t s)]n (ts) e n! X (s) X (t) for s t and a natural X (0) = 0. The process is
n = 0, 1, 2, . . .
and
s < t.
Evidently we have
called homogeneous or stationary since the distribution of the increments depends only
t s. X (t)
are non-decreasing step functions and the heights of their variable distributed as
jumps are equal to 1 (which implies that the Poisson process is a so called simple point
n let Tn denote a real valued random Pr {Tn t} = Pr {X (t) = n}. These jump
of random variables, i. e.,
lim Tn = +.
Yn = Tn+1 Tn
Pr {Yn t} = 1 et .
If we consider a time interval
[s, t]
s+T
h . ts
Therefore, the Poisson process can be seen as the most random stochastic process. It is in some sense the best model for an unordered and rather unpredictable sequence of events in continuous time.
= p i,j P
n denote i,j =1
nn
0 P
and
1 = 1 P i
in the
We dene a discrete time stochastic process by the following recursive construction: Under the condition, that the process is in state the next step
+1
. P
In other words,
p i,j
is the conditioned
Pr {X +1 = j | X = i} = p i,j .
The entries
p i,j
probability matrix. The process is homogeneous since the transition probabilities do not depend on
66
after
ith
entry
, := 0 P
where
0,
The entries of
are the
step transition probabilities. The sequence of random variables the probability distribution
(X ) 0 ,
where
has
S = {1, 2, . . . , n}.
Both processes in the examples above are memoryless in the sense, that the state after a jump depends only on the preceding state right before the jump. This is characteristic for Markov processes.
Let
S.
s, t 0
i, j, x(u)
there holds
i, j, i1 , . . . , ik S
and times
t>0
there holds
For all
(5.2)
s, t, h T
with
0s<t<h
and
i, j, k S
there holds
Pr {X (s + t) = j | X (s) = i}
s.
The rst two conditions express that the next possible event depends only on the directly preceding state and is independent from all other previous states. The third characterization may be paraphrased as: Given the present the future
X (t),
the past
X (s)
and
X (h)
67
For a homogeneous CTMC with nite state space we collect the transition probabil-
ities
i, j S,
X (t).
For each
i S
the row
pi,j (t)
j S
pi,j (t) = 1,
j S
moreover, since the entries are probabilities, they lie between we have
and
1.
P (t) O ,
This identies
P (t)1 = 1 .
P (t)
to deal with transition functions, which do not satisfy this condition in full strength, for example, if we derive a process from another process by excluding some states. Even then the transition function satises
P (t) O
and
P (t)1 1 .
rather the transition function) is said to be honest if there holds (5.3) and is said to be
dishonest otherwise. Note that it is simply possible to make a dishonest Markov chain
honest by adding an auxiliary state (cf. [1, Proposition 1.1]). An often useful fact is, that a CTMC is associated with a family of discrete time Markov chains: For each xed the
which gener-
ates a discrete time Markov chain as described in Example 5.1.2. This chain is called
-skeleton
of
, where i (0) = Pr {X (0) = i}, together iS with the transition function completely describes the stochastic behavior of a CTMC: The initial distribution Using the law of total probability one easily deduces
(5.4)
P (t)
An important subclass of Markov chains are irreducible CTMCs: The Markov chain is called irreducible, if there is a positive probability to reach each state starting in any other state. Equivalently, this can be expressed in terms of the transition probabilities
P (t) = pi,j (t) i,j S : The Markov chain is irreducible i = j , there exists an t > 0 such that pi,j (t) > 0.
i, j ,
68
P (t) =
t T,
(5.5)
it is called invariant or stationary. A stationary distribution additionally satises the normalization condition (5.1). If the state space is nite, any given stationary vector
:= /( 1 ).
The process
X (t)
exhibits time-constant
state distributions if the initial distribution is stationary, i. e., distribution naturally arise.
. (t) = (0) =
In
the analysis of a Markov chain the question of existence and uniqueness of a stationary At rst glance, the description of a CTMC as given in (5.4) appears not as satisfactory as for discrete time Markov chains (cf. Example 5.1.2), where a constant matrix suces to specify the process. Instead, we have to deal with a matrix of functions. A deeper study of the transition function is therefore required. As a consequence of the Markov property, the transition function satises the
Chapman-Kolmogorov equation
P (t + s) = P (t)P (s),
which shows that the functions forming
P (t)
In applications of Markov theory an urgent question is the long term behavior of the process. The Markov chain is called ergodic if there exists a limit distribution
(5.6)
(0)
>0
the columns of the identity matrix one after the other for and conclude that
lim P (t)
i. e.,
lim P (t) = 1 =: .
for all
A simple calculation using the Chapman-Kolmogorov equation shows, that the limit distribution satises
P (t) =
t T.
is stationary (cf. (5.5)). Two other properties of the transition function not yet mentioned are
P (t)
The further study of Markov chains can be based on the properties of the transition function only without any reference to the stochastic origin of the problem. Before we continue with it, we recapitulate some background from matrix analysis.
69
Denition 5.2.1.
A=
n ai,j i,j =1
1,
in formulas:
nn
0 ai,j 1,
j =1
ai,j 1
(row) stochastic.
for all
and
j.
If equality holds in the last equation we call In matrix notation, the real matrix
AO
and
A1 = 1 1
(A1
1 ).
(5.7)
With regard to the notation we remind that inequalities between matrices are interpreted as inequalities in each component and dimension containing radius of denotes the column vector of suitable
(A)
the spectral
A.
Stochastic and substochastic matrices have many algebraic properties, which often stem from the fact that these matrices are in particular nonnegative and we can apply the Perron-Frobenius theory. We start, however, with a very basic result concerning the location of eigenvalues.
Proposition 5.2.2.
if
If
A Rnn
is a stochastic matrix
(A) 1. Equality holds and then (A) = 1 is an eigenvalue of A and, moreover, left eigenvector corresponding to 1.
is substochastic there holds
A1 = 1 ,
then obviously
is an eigenvalue of
A.
follows from (5.7) and Gergorin's Theorem (cf. [39, Theorem 6.1.1]). The existence of a nonnegative left eigenvector of a stochastic matrix follows from [39, Theorem 8.3.1] applied to
The main results of the Perron-Frobenius theory can be stated for irreducible matrices.
Denition 5.2.3.
permutation matrix
A matrix
and
Q Cnn , n > 1, is called reducible some integer r , 0 < r < n, such that T QT = A B O C
zero matrix
if there is some
A Crr , B Cr(nr) , C C(nr)(nr) and a matrix Q is called irreducible if it is not reducible (cf.
with
O C(nr)r .
The
[39, p. 360]).
70
Theorem 5.2.4. A matrix Q Cnn is irreducible if and only if the following condition
is satised: For every pair of distinct indices
i, j
k1 =
i, k2 , k3 , . . . , km1 , km = j ,
are nonzero. Proof.
Corollary 5.2.5.
holds for all
and
A, B Cnn j , i = j , that
Let
with
A = ai,j
n and i,j =1
B = bi,j
n . If there i,j =1
ai,j = 0
then
bi,j = 0,
is irreducible.
Corollary 5.2.6.
A Cnn .
Then
is irreducible.
The main result about irreducible stochastic matrices guarantees the existence and uniqueness of a positive left eigenvector corresponding to the spectral radius.
Theorem 5.2.7.
Suppose that
P Rnn
an algebraically (and hence geometrically) simple eigenvalue of positive left eigenvector Proof.
(see Proposi-
Theorem 5.2.8.
sponding to
(P ) = 1
TPT
P 1 ,1 O . . . O O P 2 ,2 . . . O = . , . .. . . . . . O . . . O Pm,m
(5.8)
where with
T Rnn is a permutation matrix and each diagonal block Pj,j (Pj,j ) = 1 (i. e., the diagonal blocks are itself stochastic). 1 ),
is irreducible
Proof.
Noting the fact, that a row stochastic matrix has a positive right eigenvector the assertion follows from [5, Theo-
71
Denition 5.2.9.
A matrix
T Rnn
lim T k
Proposition 5.2.10.
the conditions
(a) (b)
Assume that
P = pi,j
P P
(that is,
(c)
1);
pi,i > 0
for all
i = 1, . . . , n.
= (b) = (a).
Proof.
Denition 5.2.11.
A Rnn
of the form
A = I B ,
for which
> 0,
B O, M-matrix
if
(5.9)
(B ),
is called an An
M-matrix. A
is a singular
= (B ).
Denition 5.2.12.
is said to have property c if the splitting (5.9) B can be chosen such that the matrix / is semiconvergent.
M-matrix A
Proposition 5.2.13.
Proof.
An
M-matrix A
index(A) 1.
P (t) Rnn , t 0
(5.10)
s, t 0.
(5.11)
72
Thus,
{P (t) | t 0}
and it is a subsemigroup
of the multiplicative semigroup of real matrices. Equation (5.11) is sometimes referred to as semigroup property. We will claim various additional properties on non-negativity, (a special kind of ) boundedness, or continuity:
P (t),
such as
h0
t 0, all t 0, t, h 0.
(5.12) (5.13)
It can be proved that a semigroup is continuous if and only if it is continuous at the origin:
h0
lim P (h) = I . P (t) P (t) are bounded by 1). t 0 (i. e., if P (t) is a
(5.14) is a substochastic matrix for The semigroup is said stochastic matrix) and If it is
t 0
to be stochastic if
P (t)1 = 1
for all
P (t)
Let
Proposition 5.3.1.
P (t)
P (t)1
t; t 0.
t > 0,
i = 1, . . . , n
there holds:
0
j =1 j =i
for all
t 0;
(iv)
implies (5.14).
If
P (t)
(v)
t 0; t > 0, then pi,i (t) = 1 for all t 0 pi,j (t) = 0 for all j = i and t 0);
(vi) if
pi,i (t) = 1
for some
(moreover, by (iii),
2 Some authors call the semigroup honest or dishonest likewise the Markov chain.
73
(vii)
|pi,j (t + h) pi,j (t)| 1 pi,i (|h|), for t 0 functions of P (t) are uniformly continuous.
such that
h < t,
Proof.
The above results can be found, e. g., in Anderson's book (cf. [1, Proposi-
tions 1.1.2 and 1.1.3]). The main result is about dierentiability of the semigroup, which follows from the uniform continuity.
Theorem 5.3.2.
on
P (t)
[0, ),
which satises
P (t) =
where
(5.15)
Q := P (0)
is the derivative at the origin. The unique solution of this system of is given by:
P (t) = etQ =
k=0
tk k Q . k! P (t) = QP (t)
(5.16)
Denition 5.3.3.
semigroup
The matrix
Q = P (0)
P (t).
P (t) = P (t)Q
and
in (5.15)
Proof.
P (t) d t, where the integral is dened for each scalar component. 0 Our rst objective is to show that S ( ) is invertible for suciently small > 0. This 1 is certainly true if I S ( ) < 1 in some matrix norm . One may accept the
Let
S ( ) :=
P (t) is uniformly continuous (cf. (vii) in 1 Proposition 5.3.1). A more detailed and elementary proof is given here: Let c ( , 1) 2 be xed. Then we choose an > 0 suciently small so that pi,i (t) > c for all i and 0 < t < . This is possible since pi,i (0) = 1 and since pi,i (t) is uniformly continuous with respect to t (and even so, with respect to i, however, this is trivial, since there are only nitely many i's). Take as the maximum row sum norm. Then we have
claim that this holds for any matrix norm since
S ( )
= max
i=1 j =1
i,j
1
n
pi,j (t) d t
0
= max
i=1
pi,i (t) d t +
0
pi,j (t) d t ,
j =i 0
1 p (t) d t 0 i,i term on the right can be estimated using (iii) of Proposition 5.3.1 as
where we may omit the absolute value in the left term, since
1.
The
pi,j (t) d t
0 j =i
(1 pi,i (t)) d t = 1
0
pi,i (t) d t.
0
74
Thus we get
I 1 S ( )
max 2 1
i=1
pi,i (t) d t
0
max 2 (1 c) < 1
i=1
The two rightmost inequalities follow from the choice of and c: On the one hand, 1 pi,i (t) > c for all t (0, ) implies 0 pi,i (t) d t c, on the other hand, 1 c < 2 since 1 c ( 2 , 1). Now, using (5.11), we write
1 (P (h) I ) h
at
P (t) d t =
0
1 h
P (t + h) d t
0 0
P (t) d t .
h,
the invertibility of
S ( )
we nally get
1 (P (h) I ) = h
Letting namely
1 h
+h
P (t) d t
1 h
P (t) d t
0 0
P (t) d t
h 0
P (0)
d Q := P (t) dt
Using (5.11) again, we write
= (P ( ) I )
t=0 0
P (t) d t
(5.17)
h0
istence and uniqueness of the solution (5.16) are common result on systems of linear dierential equations (cf. [40, p. 432]).
P (t)
is
S.
[53], where the above proof of the existence and niteness of the derivative remarkable formula (5.17). This formula provides an expression for
Q = P (0)
was found (cf. [53, Theorem 1.1.2]). We restated it here mainly because it yields the
in terms of a parameter
that has to
be chosen suciently small. In terms of the Markov process, suciently small means that the probability of leaving the current state within any time interval smaller then
is exceeded by the probability of staying in the current state, regardless which one
the current state is. Roughly speaking, the process makes probably no jump within a period of length
75
In Markov chain literature the results of Theorem 5.3.2 are commonly proved in a setting which covers the case of countable many states too and allows innite values for the derivatives (cf. [1, 6, 19, 28, 70]). The usual proof can be sketched as follows: Show that exists, but may be innite. Then show that is always nite. Assume that all The dierentiability of
qi := limt0 (1 pi,i (t))/t qi,j := limt0 pi,j (t))/t for i = j exists and
qi
at
pi,i (t)
is proved by regarding the subadditive function A common lemma on such coincides with
which satises
supt>0 h(t)/t,
(for details see [6, Theorem A.1.11 and Theorem 8.2.1] or [70,
Lemma 8.3.1 and Proposition 8.3.2]). This proof does not require that the matrices
P (t)
are neither stochastic nor substochastic (cf. [1, Lemma 1.2.1 and Proposition
1.2.2]). The existence and niteness of the derivative pi,j (0) for all i = j is usually proved using a stochastic argument: One considers the discrete time Markov chain generated by the matrix
P ( )
>0
(the
-skeleton).
Details may
be found in [70, Proposition 8.3.3] or [6, Theorem 8.2.1]. Chung [19, Theorem II.2.6] additionally remarks that the stochastic background is merely for sake of motivation. Freedman [28] shows that the proof even holds for a substochastic semigroup by previously constructing a discrete time Markov chain from a substochastic matrix. Comparing the proofs one also observes, that some details are apparently similar. 1 appears in the same manner t such that pi,i (t) > c > 2 in the beginning of Todorovic's proof ([70, p. 208], see also [6, p. 334]). For example, the choice of an Next, we state some properties of the derivative
Q = P (0).
Corollary 5.3.5.
(i) (ii)
Denote by
Q = qi,j
transition semigroup
P (t).
There holds
Q1 0 ,
P (t)
is stochastic;
qi := qi,i
j =1 j =i
qi,j i=j
P (t)
is stochastic;
qi,j 0
for all
qi 0;
t 0;
for all
pi,i (t) = 1
Proof.
functions of
t.
d P (t)1 dt
= P (0)Q 1 = Q 1 QP (t)1 =
t=0
76
where we used (5.12) and again (5.15) for the rightmost relations. Therefrom we derive (i) and (ii). A closer look at the limits constituting the derivative
P (0) = Q = qi,j
and
n yields i,j =1
(5.18) (5.19)
for
i=j
Proposition 5.3.6.
semigroup
of a continuous (sub)stochastic
P (t)
can be written as
I) Q = (P
with a suciently large
(5.20)
. P
The parameter
may
are positive.
1 Q such that the diagonal
suggest to rescale it as
entries are smaller (or equal) then one and then to dene
:= I + 1 Q P
to obtain a nonnegative matrix
(5.21)
= p i,j P
n := max qi i=1
is (sub)stochastic. Using
where equality holds if and only if the semigroup is stochastic (cf. (i) in Corollary 5.3.5).
In view of Proposition 5.2.2 and Denition 5.2.11, a reformulation of Proposition 5.3.6 is the statement that
Q
If
is a negative
M-matrix.
Proposition 5.3.7.
group then
is a
Q Rnn is the innitesimal generator of a stochastic semisingular M-matrix with property c. In particular there holds index(Q ) = 1.
77
(B ) where equality holds if P (and likewise the semigroup) is stochastic. Hence Q is a negative singular M-matrix. In view of Denition 5.2.12 and Proposition 5.2.13, it remains to show that B / = P i,i > 0 this follows from is semiconvergent. Since we can choose in (5.20) such that p
Proof.
With we have
B = P
Q = I B
and
are only consequences of the sign structure and row sum feature established in Corollary 5.3.5. We show next that these relatively minor conditions conversely imply that
is an innitesimal generator.
Proposition 5.3.8.
Any matrix
Q = qi,j i = j,
qi,j 0
for
and
qi,i =
j =1 j =i
qi,j
P (t).
the well-known fact
We show that
Obviously,
P (t) = etQ has the desired properties. P (0) = I . The semigroup property follows from
that the exponential addition theorem holds true for the matrix exponential if the summands commute with each other (cf. [40, Theorem 6.2.38]). That
P (t)
P (t)1 =
k=0
since
tk k Q 1 = I1 + k!
k=1
tk k1 Q k!
Q1 = 1
t, we have P (t) = I + tQ + O . Then by the P ( t) = P (t ) O and thus P (t) O for all t 0. tQ It is well known, that the matrix exponential t e is a smooth map (see [40, Theorem 6.2.34]). Thus, P (t) is continuous and satises P (0) = Q . Q1 = 0 .
For suciently small Noting that each
M-matrix
Theorem 5.3.9. P (t) is a continuous stochastic semigroup if and only if P (t) = etA
with an
M-matrix A
satisfying
A1 = 0 . M-matrix
with zero
In particular, the matrix exponential transforms every negative row sum into a stochastic matrix, a notable fact on its own.
Corollary 5.3.10.
M-matrix)
then
If
Q Rn
Q1 = 0 ,
then
Q is an eQ is a
stochastic matrix.
78
Proposition 5.3.11.
in formulas
P (t) = etQ
is irreducible,
(5.22)
is irreducible.
Q Rnn is reducible. Given a permutation matrix T , we tT QT have the identity T P (t)T = e . Thus, we may assume without loss of generality Q= A B O C
an
with
A Rrr , B Rr(nr) C R(nr)(nr) and a zero matrix O R(nr)r for tQ as a power series integer r , 0 < r < n (cf. Denition 5.2.3). We rewrite P (t) = e
P (t) =
k=0
tk k t2 Q = I + tQ + Q 2 + . k! 2
Q also Q 2 , Q 3 , . . . has a (n r) r zero block in the lower left corner. Thus, also P (t) possesses this zero block for all t 0, i. e., pi,j (t) = 0 for all t 0, r < i n and 1 j r, which contradicts the irreducibility of P (t). Assume now, that Q is irreducible but pi,j (t) = 0 for all t > 0 and some states i and j . Using Theorem 5.2.4 there exists a sequence k1 = i, k2 , k3 , . . . , km1 , km = j , such that qki ,ki+1 = 0, i = 1, . . . , m 1. Since pi,j (t) = 0 for all t > 0 we get from the
A short calculation shows, that with Kolmogorov backward equation
d 0 = pi,j (t) = dt = k2
Writing
qi, p ,j (t).
=1
All terms in the sum on the right hand side are nonnegative and thus has to vanish if the sum is zero. For we know that
pk2 ,j (t) = 0
for all
t > 0.
d 0 = pk2 ,j (t) = dt
we conclude by a similar argument that induction we get
qk2 , p ,j (t).
=1
for all
pk3 ,j (t) = 0
t > 0.
By an obvious
km = j ,
this yields in
continuity assumption on
pj,j (t) = 0 for all t > 0. This contradicts P (t) since limt0 pj,j (t) = 0 but pj,j (0) = 1.
to the
79
Note that, the proof above does not use the fact that tic. We only require entries.
P (t) 0,
which is guaranteed
Q is reducible, then P (t) is a reducible matrix t 0. Conversely, if there exists an > 0 for which P ( ) is an irreducible matrix, then Q is irreducible. On the other hand, if P (t) is a reducible matrix for every t 0, then the semigroup P (t) is not irreducible, or, conversely, if the semigroup P (t) is irreducible, then there exists an > 0 such that P ( ) is an irreducible matrix.
The rst part of the proof shows: If for every Taking all together, we have
Corollary 5.3.12.
(i) (ii)
P (t) Q
is an irreducible semigroup,
is irreducible,
>0 >0
such that
P ( )
is an irreducible matrix.
Actually, there holds a lot more then stated in (iii), namely that matrix for every turns out that the matrices we need Lvy's Dichotomy.
P ( ) is an irreducible
P ( )
for all
t > 0,
or
pi,j (t) = 0
for all
t > 0.
pi,j (t) are analytic functions of t. In view of (v) in Proposition 5.3.1 we have pi,i (t) > 0 for all t 0 and all i = 1, . . . , n. Thus, it remains to prove the assertion for the o-diagonal entries. Let i = j be xed. Assume pi,j (t) = 0 for some t > 0. Using the semigroup property (5.11) we calculate
Note that the entries
0 = pi,j (t) =
k=1
pi,k
t m
pk,j
m1 t m
pi,j pi,j
t m
pj,j
m1 t m
Since the diagonal entries are positive, we conclude That is, set set of zeros of function,
pi,j (s)
t = 0 for any integer m > 0. m has a limit point (namely 0). Then, as an analytic
pi,j (s)
pi,j (s) = 0
for all
s > 0.
Chung
Remark 5.3.14. The idea for the proof above was found in [45, p. 263].
orems II.1.5]).
gives a proof of Lvy's Dichotomy for measurable transition functions (cf. [19, The-
Corollary 5.3.15.
P (t)
is irreducible, i. e., it
P (t) > O
for all
t > 0.
80
Theorem 5.3.16.
ces
Let
P (t)
are positive
P (t) = etQ denote a standard transition semigroup. for all t > 0 if and only if Q is irreducible.
The matri-
The following lemma will be used later to establish the uniqueness of invariant (stationary) vectors. It states, that the nullspace of is irreducible. Since we know already, that the rank of the generator is
n 1.
is an irreducible stochastic semigroup, then there holds
Lemma 5.3.17.
dim N(Q ) = 1
Proof.
matrix only if
If
P (t) = etQ
0. P
in terms of a stochastic
= I + 1Q P 1 + /
with some
> 0. P
is irreducible. Moreover,
is an eigenvalue of
if and
is an eigenvalue of
with a corresponding positive 1 is a simple eigenvalue of P > 0 . Therefore 0 is a simple eigenvalue of Q corresponding to the left eigenvector . In other word, N(Q ) is one-dimensional and spanned by . eigenvector
Theorem 5.2.7 implies that
(0)
a semigroup denes by
(5.23)
n a vector valued function. We denote by R+ the set of vectors with nonnegative compon n nents. For (0) R+ and a substochastic semigroup P (t) we have () : [0, ) R+ . Moreover, if the semigroup is standard, (t) is a continuous function of t. We rst state
the representation of
(t)
Proposition 5.4.1.
The function
(t)
d (t) = (t)Q , dt
The solution is given by
t 0,
(0)
given.
(5.24)
t 0. (0) Rn +.
(5.25)
81
Equation (5.24) is called the balance equation. with respect to the semigroup
is invariant
P (t),
if
= P (t)
t 0. P (t)
if and only if it satises
Rn Proposition 5.4.2.
Q = 0 .
If there holds Proof.
(5.26)
(t) =
for all
for some
t 0,
t,
i. e.,
(t)
is constant.
Suppose,
. (0) =
(t) =
(0)P (t) =
(5.24) we get
t 0,
(t)
is zero. Using
d Q. (t) = (t)Q = dt Q = 0 for some vector . Then, using the Taylor series Assume now, there holds representation of P (t), we conclude 0 =
P (t) =
k=0
tk k I + Q Q = k!
k=1
tk k1 Q k!
. =
be invariant for some s > 0. To prove the nal remark in the assertion, let (s) = tQ Since e is invertible (cf. [40, Theorem 6.2.38]), the denition of an invariant vector = etQ . Then for all t 0 we have can be rewritten as e(ts)Q = . (t) = (0) etQ =
Proposition 5.4.3.
P (t) = etQ
possesses a non-
(cf. Proposition 5.3.6). By Proposition 5.2.2, there exists a nonnegative left eigenvector
of
invariant vector of
P (t).
The next result is an application the Perron-Frobenius Theorem for irreducible matrices.
Proposition 5.4.4.
vector
Let
P (t).
subject to
1 = 1.
>0
Q = 0
master equation.
82
Proof.
of
Q.
eigenvector vector
Thus, setting
:= /( 1)
We show next, that also the converse holds true, that is, the existence and uniqueness of a positive invariant vector implies the irreducibility of the semigroup (or respectively).
Q,
Proposition 5.4.5. If $P(t) = e^{tQ}$ possesses a uniquely determined invariant vector $\pi$ which is positive, $\pi > 0$ and $\pi^\top \mathbf{1} = 1$, then $Q$ (and with it $P(t)$) is irreducible.

Proof. The stochastic matrix $\tilde{P}$ in (5.20) has the positive left eigenvector $\pi > 0$ corresponding to the eigenvalue $\rho(\tilde{P}) = 1$. Then, by Theorem 5.2.8, $\tilde{P}$ has the block diagonal form (5.8) with irreducible diagonal blocks. Since the invariant vector is uniquely determined, only one diagonal block can occur, i.e., $\tilde{P}$ is irreducible. Then, by Corollary 5.2.5, also $Q$ is irreducible and $P(t)$ is irreducible.
To finish our investigation of stochastic semigroups we show, that the invariant vector of an irreducible semigroup coincides with the limit $\lim_{t \to \infty} \pi(t)$.

Theorem 5.4.6. If $P(t) = e^{tQ}$ is irreducible, then

$$\lim_{t \to \infty} P(t) = \mathbf{1} \pi^\top,$$

where $\pi$ is uniquely determined by $\pi^\top Q = 0^\top$, $\pi^\top \mathbf{1} = 1$, and $\pi > 0$.

Proof. By Corollary 5.3.15 we know that $P(\tau)$ is a positive matrix for every $\tau > 0$. The Perron-Frobenius Theorem (cf. [39, Theorem 8.2.11]) guarantees the existence of

$$\lim_{m \to \infty} P(m\tau) = \lim_{m \to \infty} P(\tau)^m = \mathbf{1} \pi^\top,$$

where $\pi$ is the uniquely determined left eigenvector of $P(\tau)$ subject to $\pi^\top \mathbf{1} = 1$ (moreover there holds $\pi > 0$). This is true for any $\tau > 0$. The uniform continuity of $P(t)$ then implies, that the limit is independent of $\tau$ and coincides with the limit of $P(t)$ as $t \to \infty$ (see, e.g., [1, Lemma 5.1.2]). Writing $\Pi := \mathbf{1}\pi^\top$ and using the semigroup property $P(s + t) = P(s)P(t)$, letting $s \to \infty$ yields $\Pi = \Pi P(t)$, i.e., $\pi^\top = \pi^\top P(t)$ for all $t \ge 0$. Hence $\pi$ is the invariant vector of $P(t)$ and, by Proposition 5.4.2, $\pi^\top Q = 0^\top$.
Remark 5.4.7. The limit of $P(t)$ as $t \to \infty$ always exists, even if the semigroup is not irreducible (cf. [1, Theorem 5.1.3]). The limit matrix then no longer has all rows identical; its structure is determined by the communicating classes of $P(\tau)$.
Corollary 5.4.8. Let $P(t) = e^{tQ}$ be irreducible and $\pi(0) \in \mathbb{R}^n_+$ a given nonzero vector satisfying the normalization condition $\pi(0)^\top \mathbf{1} = 1$. Then the vector-valued function $\pi(t)$ converges for $t \to \infty$ to a limit independent of $\pi(0)$, namely

$$\lim_{t \to \infty} \pi(t) = \pi,$$

where $\pi$ is uniquely determined by $\pi^\top Q = 0^\top$ and $\pi^\top \mathbf{1} = 1$.
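Numerically, the convergence stated in Theorem 5.4.6 and Corollary 5.4.8 is plainly visible: all rows of $e^{tQ}$ collapse onto $\pi^\top$ already for moderate $t$. A small sketch (same made-up generator as in the earlier snippet; NumPy/SciPy assumed):

```python
import numpy as np
from scipy.linalg import expm, null_space

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  2.0, -4.0]])
pi = null_space(Q.T)[:, 0]
pi /= pi.sum()

# Theorem 5.4.6: P(t) -> 1 pi^T, i.e., every row of P(t) tends to pi^T.
P_large = expm(50.0 * Q)
print(np.allclose(P_large, np.outer(np.ones(3), pi)))  # True
```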
Remark 5.5.1 (Transition Rates and Classification of States). The results in Section 5.3 show that the transition semigroup of a CTMC on a finite state space is characterized by the infinitesimal generator $Q$. Since the transition function in turn determines the chain (cf. [1, pp. 3–4]), we have a one-to-one correspondence between the infinitesimal generator and the Markov chain. The numbers forming the matrix $Q$ have a meaningful interpretation in terms of the stochastic process: The off-diagonal entries $q_{i,j} \ge 0$, $i \ne j$, are the rates at which the process leaves state $i$ and enters state $j$. The negative diagonal entries $q_i := -q_{i,i}$ of $Q$ are called the jump rates associated with state $i$. The reciprocal value $1/q_i$ is the mean of the sojourn time $Y_i$, that the process stays in state $i$. The state $i$ is said to be stable if $q_i > 0$, permanent or absorbing if $q_i = 0$, and instantaneous if $q_i = \infty$ (impossible in our setting). Thus, for a permanent state we have $\Pr\{Y_i \le t\} = 1 - e^{-q_i t} = 0$ for any $t \in T$, i.e., the probability of a finite sojourn is zero. If $i$ is a permanent state, then the entire $i$th row of $Q$ is zero (cf. (ii) and (iii) of Corollary 5.3.5), and the off-diagonal entries of the $i$th row of $P(t)$ are zero (cf. (vi) in Proposition 5.3.1), i.e., the $i$th row of $P(t)$ is independent of $t$. This means, that the Markov chain will never leave state $i$ after reaching it.
State $i$ is thus permanent precisely if the $i$th row of $P(t)$ equals $e_i^\top$ for all $t \ge 0$. This name is introduced, since for infinitely many states this is not true in general, even not for a standard semigroup $P(t)$ whose generator satisfies $Q\mathbf{1} = 0$.
Remark 5.5.2 (Uniform Markov Chain). Every homogeneous CTMC on a finite state space is uniformizable. Consider the stochastic matrix $\tilde{P}$ in (5.20) and the Poisson process $N(t)$ of Example 5.1.2 with intensity parameter $\lambda > 0$. Using the discrete-time Markov chain $\hat{X}_k$ with transition matrix $\tilde{P}$, the process

$$\tilde{X}(t) := \hat{X}_{N(t)}$$

is a homogeneous CTMC, which is called the uniform Markov chain with clock $\lambda$. Note also, that the continuous-time Markov chain may stay in the current state while the Poisson process makes a jump. This happens precisely if $\hat{X}_{k+1} = \hat{X}_k$, which is possible since the diagonal elements of $\tilde{P}$ can be nonzero. The process constructed this way coincides with our original Markov chain, since it has the same infinitesimal generator, which is given by (5.20). The transition function has the representation

$$P(t) = \sum_{k=0}^{\infty} e^{-\lambda t} \frac{(\lambda t)^k}{k!}\, \tilde{P}^k,$$

as can be seen from the Taylor series of $e^{tQ} = e^{-\lambda t} e^{\lambda t \tilde{P}}$. Note further, that we could choose $\lambda = 0$ in (5.20) if and only if $q_i = 0$ for all $i$, which means that all states are permanent. In view of Corollary 5.3.5, then $P(t) = I$ for all $t \ge 0$. Thus, $\lambda = 0$ corresponds to an entirely motionless chain.
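As a sketch of the series representation above (the function name and truncation level $K$ are our choices; NumPy/SciPy assumed), the truncated sum $\sum_{k=0}^{K} e^{-\lambda t} \frac{(\lambda t)^k}{k!} \tilde{P}^k$ reproduces $e^{tQ}$ up to machine precision:

```python
import numpy as np
from scipy.linalg import expm

def uniformized_P(Q, t, K=60):
    """Truncated uniformization series for P(t) = e^{tQ}."""
    lam = max(-np.diag(Q))                    # clock rate, lambda >= max_i q_i
    P_tilde = np.eye(len(Q)) + Q / lam        # the stochastic matrix from (5.20)
    term = np.exp(-lam * t) * np.eye(len(Q))  # k = 0 summand
    S = term.copy()
    for k in range(1, K + 1):
        term = term @ P_tilde * (lam * t / k)  # Poisson weight times P~^k
        S += term
    return S

Q = np.array([[-2.0,  2.0],
              [ 3.0, -3.0]])
print(np.max(np.abs(uniformized_P(Q, 1.5) - expm(1.5 * Q))))  # ~ 1e-16
```

A practical appeal of this recursion is that it uses only matrix-vector or matrix-matrix products with the stochastic matrix $\tilde{P}$ and nonnegative weights, so no cancellation occurs.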
Remark 5.5.3 (The Embedded Chain). We have seen so far two kinds of discrete-time Markov chains connected with a continuous-time chain, namely the chain with transition matrix $P(\tau)$ for a fixed time step $\tau > 0$ and the chain with the transition matrix $\tilde{P}$ corresponding to some equivalent uniform Markov chain. A third discrete-time Markov chain linked with $P(t)$ is the embedded chain: $\hat{X}_k$ is defined as the $k$th state visited by the process. That is, we just ignore the time spent in each state and regard only the sequence of states. The transition probabilities of the embedded chain are given by

$$\hat{p}_{i,j} := \Pr\{\hat{X}_{n+1} = j \mid \hat{X}_n = i\} = \frac{q_{i,j}}{q_i} \quad \text{for } i \ne j, \qquad \hat{p}_{i,i} = 0.$$

Note that these probabilities are well defined as long as the chain has no permanent states. The matrix $\hat{P} = (\hat{p}_{i,j})_{i,j \in S}$ can be written as

$$\hat{P} = I + D^{-1} Q, \qquad D := -\operatorname{diag}(Q).$$

Comparing the reformulation $Q = D(\hat{P} - I)$ with (5.20) explains the matrix algebra behind these things: To define $\tilde{P}$ we applied a single scaling factor $\lambda$ to all states, whereas in $\hat{P}$ each state $i$ is scaled by its own jump rate $q_i$. Comparing the embedded chain and the uniform chain, we remark, that the difference between them is the following: The embedded chain corresponding to the CTMC changes its state in every step; on the other hand, the uniform chain may stay in the same state in some step. This reflects, that the embedded chain regards the underlying CTMC at the random times when a state change occurs, while the uniform chain consists of snapshots of the CTMC at the jump times of its Poisson clock, which occur frequently enough not to miss a state change of the process.
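A sketch of the embedded-chain construction under the stated assumption of no permanent states (our code and naming; NumPy assumed):

```python
import numpy as np

def embedded_chain(Q):
    """Jump-chain matrix P_hat = I + D^{-1} Q with D = -diag(Q).

    Assumes all jump rates q_i > 0, i.e., no permanent states."""
    q = -np.diag(Q)                        # jump rates q_i
    return np.eye(len(Q)) + Q / q[:, None]

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  2.0, -4.0]])
P_hat = embedded_chain(Q)
print(P_hat.sum(axis=1))   # rows sum to 1: P_hat is stochastic
print(np.diag(P_hat))      # zero diagonal: the jump chain never stays put
```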
Remark 5.5.4 (Irreducible Markov Chains). The infinitesimal generator of an irreducible Markov chain is irreducible. Recall that a state $i$ is permanent if the $i$th diagonal element of $Q$ is zero. The structure of $Q$ implies then, that the entire $i$th row equals zero, which means that $Q$ is reducible. Then, by Proposition 5.3.11, the Markov chain is not irreducible. Conversely, there holds: An irreducible homogeneous Markov chain with finite state space has no permanent states. Note that the stochastic matrices $P(\tau)$ and $\tilde{P}$ corresponding to an irreducible Markov chain are primitive: For $P(\tau)$ this is clear since its entries are positive for any $\tau > 0$; for $\tilde{P}$ choose $\lambda$ large enough such that the diagonal entries of $\tilde{P}$ are positive (cf. [39, Lemma 8.5.5]). Finally, a CTMC on a finite state space is ergodic if and only if it is irreducible. This can be seen as follows: Assume, the transition semigroup of an ergodic chain is not irreducible, i.e., $p_{i,j}(t) = 0$ for all $t > 0$ and some states $i$ and $j$. Since by definition of ergodicity the limit distribution $\pi = (\pi_1, \dots, \pi_n)^\top$ exists, is positive and is independent of the initial state, we get $\pi_j = \lim_{t \to \infty} p_{i,j}(t) = 0$, which contradicts the positivity of the limit distribution. The converse implication follows from Corollary 5.4.8.
Remark 5.5.5 (State Distributions). According to Proposition 5.4.1, the state distributions of the Markov chain are given by $\pi(t)^\top = \pi(0)^\top e^{tQ}$. A stationary distribution of the CTMC as introduced in (5.5) can be characterized using Proposition 5.4.2 as a nonnegative solution of the homogeneous balance equation $\pi^\top Q = 0^\top$ subject to $\pi^\top \mathbf{1} = 1$. By Proposition 5.4.3, a stationary distribution exists. An irreducible Markov chain has a uniquely determined stationary distribution, which is positive and coincides with the limit distribution (cf. Corollary 5.4.8).
The computation of a stationary distribution thus requires solving $\pi^\top Q = 0^\top$, which may be rewritten using

$$A := Q^\top, \qquad x := \pi, \qquad b := 0,$$

such that the notation corresponds to that of Sections 3 and 4. The treatment of this problem benefits from many characteristics of the generator matrix, which we restate here in summarized form and give links to the relevant results about Krylov subspace methods.

- All eigenvalues $\lambda$ of $A$ satisfy $\operatorname{Re} \lambda \le 0$; in particular, $0$ is an eigenvalue of $A$.
- The linear system (3.1) is consistent. The nontrivial solutions of (3.1) are nonnegative and can be normalized such that

  $$\mathbf{1}^\top x = 1. \qquad (5.27)$$

- There holds $\operatorname{index}(A) = 1$ and $b \in R(A)$. Thus, $A^{\#}$ exists and Theorem 4.4.6 applies. In particular, no singular breakdown occurs. Since $b = 0$, there holds

  $$x_L^{MR} = x_L^{OR} = A^{\#} b + (I - A^{\#} A)\, x_0 = (I - A^{\#} A)\, x_0,$$

  i.e., the Krylov solution in the termination step is the projection of $x_0$ onto the nullspace of $A$ along $R(A)$. This is a nontrivial solution of (3.1) if and only if $x_0 \notin R(A)$. Since $\mathbf{1} \in N(A^\top) = R(A)^\perp$, we may choose $x_0$ as a scalar multiple of $\mathbf{1}$ in order to get a nontrivial solution. The most popular choice is $x_0 = \frac{1}{n} \mathbf{1} \in \mathbb{R}^n$.
- Since $\dim N(A) = 1$ (which is the case, e.g., if $A$ is irreducible), there exists a unique solution $x$ of (3.1) subject to the normalization (5.27). Moreover, since $R(A^{\operatorname{index}(A)}) = R(A)$ and $N(A^\top) = \operatorname{span}\{\mathbf{1}\}$, the operation $v \mapsto v - \frac{1}{n}(\mathbf{1}^\top v)\, \mathbf{1}$ provides a cheap possibility to project a vector $v$ onto $R(A^{\operatorname{index}(A)})$. We may apply this projection to the generated orthonormal basis of the Krylov space to avoid stability problems. It is then guaranteed, that the initial residual and the generated basis vectors lie in $R(A)$.
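As an illustration of the preceding list we sketch the stationary-vector computation with SciPy's GMRES as a stand-in for the abstract MR method (the generator is a made-up example; the splitting $x = x_0 + e$ with the correction $e \in K_m(A, r_0) \subseteq R(A)$ realizes the projection $(I - A^{\#}A)\,x_0$ at termination):

```python
import numpy as np
from scipy.sparse.linalg import gmres

# Made-up irreducible generator; stationary problem A x = 0 with A = Q^T.
Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  2.0, -4.0]])
A = Q.T
n = A.shape[0]

x0 = np.ones(n) / n      # popular choice x0 = (1/n) 1, which is not in R(A)
r0 = -A @ x0             # initial residual; r0 = b - A x0 with b = 0

# GMRES builds its correction in K_m(A, r0), a subspace of R(A), so the
# final iterate x = x0 + e is the projection of x0 onto N(A) along R(A).
e, info = gmres(A, r0)
x = x0 + e
x /= x.sum()             # normalization (5.27)

print(info, np.linalg.norm(A @ x))  # info == 0 and a tiny residual
print(x)                            # the stationary distribution, x > 0
```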
We may also use Krylov subspace methods to compute transient solutions approximately, i.e., $\pi(t)^\top = \pi(0)^\top e^{tQ}$. As above let $A = Q^\top$ and set $r = \pi(0)$. We look for the MR approximation $s_m$ to $e^{tA} r = \pi(t)$ from $K_m(A, r)$. The basic idea is to approximate the matrix exponential by $V_m e^{tH_m} V_m^\top$ with $V_m$ and $H_m$ from the Arnoldi decomposition. This can be seen as one step of a single-step solver for the ordinary differential system (5.24). The approximation has the form $s_m = p_{m-1}(tA)\, r$; note that the normalization of the polynomial $p_{m-1}$ corresponding to this approximation differs from the way a residual polynomial is normalized. More details can be found in [67, Chapter 8] and the references therein.
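A compact sketch of the Arnoldi idea just described (our implementation and naming, not necessarily identical to the MR variant analyzed in the text; NumPy/SciPy assumed): build $V_m$, $H_m$ for $K_m(A, r)$ and take $e^{tA} r \approx \beta\, V_m e^{tH_m} e_1$ with $\beta = \lVert r \rVert$.

```python
import numpy as np
from scipy.linalg import expm

def arnoldi_expm(A, r, t, m):
    """Approximate e^{tA} r from the Krylov space K_m(A, r)."""
    n = len(r)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(r)
    V[:, 0] = r / beta
    for j in range(m):                    # Arnoldi with modified Gram-Schmidt
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:           # invariant subspace found
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m)
    e1[0] = beta
    return V[:, :m] @ (expm(t * H[:m, :m]) @ e1)

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -1.0,  0.0],
              [ 2.0,  2.0, -4.0]])
pi0 = np.array([1.0, 0.0, 0.0])
print(arnoldi_expm(Q.T, pi0, 2.0, 3))  # approximates pi(2)
print(expm(2.0 * Q).T @ pi0)           # reference value
```

Only the small $m \times m$ matrix $H_m$ is exponentiated, which is the point of the approach when $n$ is large and $m \ll n$.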
6 Concluding Remarks
We have not studied in detail MR implementations relying on an orthonormal basis of $W_m$. If $W_m = W_{m-1}$ in some step, i.e., $C_m \cap N(A) \ne \{0\}$, we have a singular breakdown. During the orthonormalization process this results in a loss of orthogonality in the generated basis of $W_m$. One could employ a look-ahead strategy similar to that in Remark 3.4.9 and Algorithm 3.4.11 as long as the sequence of correction spaces does not terminate. If, however, the correction spaces are Krylov subspaces, our results ensure $C_m \cap N(A) = \{0\}$ as long as $m < L$. In either case the difficulty is to detect a singular breakdown. This requires the stable implementation of an orthogonal factorization for a matrix not having full column rank. It is not clear if the results of Strakoš et al. (cf. [57, 56]) apply for a singular $A$.

Krylov subspace methods cannot be applied directly if the matrix $A \in \mathbb{R}^{m \times n}$ of the linear system is not square. An obvious remedy is to apply a standard iterative method to the normal equations $A^\top A x = A^\top b$. One may instead solve $AB z = b$, $B z = x$, with some matrix $B \in \mathbb{R}^{n \times m}$ in place of the normal equations; this approach goes back to Zhang and Oyanagi (cf. [72]). In view of the results in Section 4, the matrix $B$ should be determined such that the resulting square matrix $AB$ is range hermitian. Methods of this type are studied by Ito and Hayami (cf. [46, 47]).

Matrix identities similar to those in Propositions 4.4.9 and 4.4.12 are studied by Cheng and Tian (cf. [18]). Their equations involve the Moore-Penrose and the group inverse, and the objective is to characterize range-hermitian and normal matrices. Our conditions (4.28) and (4.29) and their investigation provide a first step toward generalizations of the concept of a range-hermitian matrix. More study is necessary to anticipate the ramifications of this idea.

The study of semigroups of stochastic matrices treated in Sections 5.3 and 5.4 can be extended in several directions. In particular, a more complete theory should involve generalized inverses, namely the group inverse of the infinitesimal generator. For example, there holds $Q^{\#} = \int_0^{\infty} (\Pi - P(t))\, dt$ with $\Pi = \mathbf{1}\pi^\top$ (cf. [20]), which is connected with the mean first passage times of the corresponding Markov chain.
Bibliography
[1] William J. Anderson. Continuous-time Markov chains: an applications-oriented approach. Springer, New York, 1991.
[4] Adi Ben-Israel and Thomas N. E. Greville. Generalized inverses: theory and applications. Pure and Applied Mathematics. John Wiley & Sons, New York etc., 1974.
[5] Abraham Berman and Robert J. Plemmons. Nonnegative matrices in the mathematical sciences. Academic Press, New York, 1979.
[12] Peter Buchholz. Structured analysis approaches for large Markov chains. Applied Numerical Mathematics.
[18] Shizhen Cheng and Yongge Tian. Two sets of new characterizations for normal and EP matrices. Linear Algebra and its Applications, 375:181–195, 2003.
[19] Kai Lai Chung. Markov Chains with Stationary Transition Probabilities. Number 104 in Grundlehren der Mathematischen Wissenschaften. Springer, Göttingen, Berlin, Heidelberg, 1960.
[20] Pauline Coolen-Schrijner and Erik A. van Doorn. The deviation matrix of a continuous-time Markov chain. Probab. Eng. Inf. Sci., 16(3):351–366, 2002.
[21] P. J. Courtois. Decomposability: Queueing and computer system applications. ACM Monograph Series. Academic Press, New York, San Francisco, London, 1977.
[22] Michael Eiermann and Oliver G. Ernst. Geometric aspects of the theory of Krylov subspace methods. Acta Numerica, 10:251–312, 2001.
[23] Michael Eiermann, Oliver G. Ernst, and Olaf Schneider. Analysis of acceleration strategies for restarted minimal residual methods. J. Comput. Appl. Math., 123(1–2):261–292, 2000.
[24] Michael Eiermann, Ivo Marek, and Wilhelm Niethammer. On the solution of singular linear systems of algebraic equations by semiiterative methods. Numerische Mathematik, 53:265–283, 1988.
[26] Bernd Fischer, Martin Hanke, and Marlis Hochbruck. A note on conjugate-gradient type methods for indefinite and/or inconsistent linear systems. Numerical Algorithms.
[38] Harro Heuser. Funktionalanalysis. Mathematische Leitfäden. Teubner, Stuttgart, 3rd edition, 1992.
[39] Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press, repr. corr. edition, 1990.
[40] Roger A. Horn and Charles R. Johnson. Topics in matrix analysis. Cambridge University Press, 1st paperback edition, 1994.
[41] Bertram Huppert. Angewandte Lineare Algebra. de Gruyter, Berlin, New York, 1990.
[42] Ilse C. F. Ipsen and Steve Kirkland. Convergence analysis of a PageRank updating algorithm by Langville and Meyer. To appear in SIAM J. Matrix Anal. Appl.
[43] Ilse C. F. Ipsen and Steve Kirkland. Convergence analysis of an improved PageRank algorithm. Technical Report CRSC-TR04-02, North Carolina State University, 2004.
[44] Ilse C. F. Ipsen and Carl D. Meyer. The idea behind Krylov methods. American Mathematical Monthly, 105(10):889–899, 1998.
[45] Robert B. Israel, Jeffrey S. Rosenthal, and Jason Z. Wei. Finding generators for Markov chains via empirical transition matrices, with applications to credit ratings. Mathematical Finance, 11(2):245–265, 2001.
[49] Ivo Marek and Petr Mayer. Convergence analysis of an iterative aggregation/disaggregation method for computing stationary probability vectors of stochastic matrices. Numerical Linear Algebra with Applications, 5:253–274, 1998.
[50] Nariyasu Minamide and Kahei Nakamura. A restricted pseudoinverse and its application to constrained minima. SIAM Journal on Applied Mathematics, 19:167–177, 1970.
[51] Dianne P. O'Leary. Iterative methods for finding the stationary vector for Markov chains. In Carl D. Meyer and Robert J. Plemmons, editors, Linear Algebra, Markov Chains, and Queueing Models, volume 48 of The IMA Volumes in Mathematics and its Applications, pages 125–133, New York etc., 1993. Springer.
[52] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12(4):617–629, 1975.
[53] Amnon Pazy. Semigroups of Linear Operators and Applications to Partial Differential Equations. Applied Mathematical Sciences, 44. Springer, New York etc., 1983.
[54] Bernard Philippe, Yousef Saad, and William Stewart. Numerical methods in Markov chain modeling. Oper. Res., 40(6):1156–1179, 1992.
[55] Bernard Philippe and Roger B. Sidje. Transient solutions of Markov processes by Krylov subspaces. In Stewart [68]. Proceedings of the 2nd international workshop on the numerical solution of Markov chains, Raleigh, NC, USA, January 16–18, 1995.
[56] Miroslav Rozložník, Zdeněk Strakoš, and Miroslav Tůma. Numerical stability of GMRES. BIT, 35:309–330, 1995.
[57] Miroslav Rozložník, Zdeněk Strakoš, and Miroslav Tůma. On the role of orthogonality in the GMRES method. In Conference on Current Trends in Theory and Practice of Informatics.
[59] Yousef Saad. Preconditioned Krylov subspace methods for the numerical solution of Markov chains. In Stewart [68], pages 49–64. Proceedings of the 2nd international workshop on the numerical solution of Markov chains, Raleigh, NC, USA, January 16–18, 1995.
[60] Yousef Saad and Martin H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Stat. Comput., 7(3):856–869, 1986.
[61] Avram Sidi. A unified approach to Krylov subspace methods for the Drazin-inverse solution of singular nonsymmetric linear systems. Linear Algebra and its Applications.
[63] Avram Sidi. DGMRES: A GMRES-type algorithm for Drazin-inverse solution of singular inconsistent nonsymmetric linear systems. Linear Algebra and its Applications.
[65] William J. Stewart. MARCA: Markov chain analyzer, a software package for Markov modeling. In Stewart [66], pages 37–61. Proceedings of the 1st International Workshop on the Numerical Solution of Markov Chains 1990.
[66] William J. Stewart, editor. Numerical solution of Markov chains. Probab., Pure Appl. 8. Marcel Dekker Inc., New York, 1991. Proceedings of the 1st International Workshop on the Numerical Solution of Markov Chains 1990.
[67] William J. Stewart. Introduction to the Numerical Solution of Markov Chains. Princeton University Press, 1994.
[68] William J. Stewart, editor. Computations with Markov chains. Kluwer Academic Publishers, Boston, MA, 1995. Proceedings of the 2nd international workshop on the numerical solution of Markov chains, Raleigh, NC, USA, January 16–18, 1995.
[69] William J. Stewart and Wei Wu. Numerical experiments with iteration and aggregation for Markov chains. ORSA Journal on Computing, 4(3):336–350, 1992.
[70] Petar Todorović. An Introduction to Stochastic Processes and Their Applications. Springer, New York, 1992.
[71] Per Åke Wedin. On angles between subspaces of a finite dimensional inner product space. In B. Kågström and A. Ruhe, editors, Matrix Pencils, Lecture Notes in Mathematics 973, pages 263–285. Springer, Berlin, Heidelberg, New York, 1982.
[72] Shao Liang Zhang and Yoshio Oyanagi. A necessary and sufficient convergence condition of Orthomin(k) methods for least squares problem with weight. Annals