Iterative Methods for Sparse Linear Systems (Part 1)
Yousef Saad
University of Minnesota
Computer Science and Engineering
CEA-EDF-INRIA school
Sophia Antipolis, Mar. 30 - Apr. 3, 2009
Outline
Part 1
Introduction, sparse matrices and sparsity
Basic projection methods and (briefly) Krylov subspace methods
Preconditioned iterations
Preconditioning techniques
Part 2
Multilevel preconditioners
Nonsymmetric permutations
Parallel implementations
Parallel Preconditioners
Software
Matlab codes
Material of this course will be supported by matlab scripts for
demos.
The matlab suite used for the demos is located here:
http://www.cs.umn.edu/saad/software
[Note: Updated at the occasion of each tutorial I give. Make sure to get the most recent version.]
INTRODUCTION TO SPARSE MATRICES
Typical Problem:
Physical Model → Nonlinear PDEs → Discretization → Linearization (Newton) → Sequence of sparse linear systems Ax = b
Steepest descent minimizes the quadratic function
f(x) = (1/2) ‖x − x*‖_A^2 = (1/2) (A(x − x*), (x − x*)),
where x* = A^{-1}b is the exact solution.
Note: 1. f(x) = (1/2)(Ax, x) − (b, x) + constant
2. ∇f(x) = Ax − b → descent direction = b − Ax ≡ r
Idea: take a step of the form x_new = x + αr which minimizes f(x).
Best α = (r, r)/(Ar, r).
Iteration:
r ← b − Ax,
α ← (r, r)/(Ar, r),
x ← x + αr
Can show: convergence guaranteed if A is SPD.
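A minimal MATLAB sketch of this iteration (the arguments tol and maxit are illustrative, not from the slides):

% Steepest descent for SPD A: x <- x + alpha*r with alpha = (r,r)/(Ar,r)
function x = steepest_descent(A, b, x, tol, maxit)
  for it = 1:maxit
    r = b - A*x;                % current residual
    if norm(r) < tol, break; end
    Ar = A*r;
    alpha = (r'*r) / (Ar'*r);   % optimal step length along r
    x = x + alpha*r;
  end
end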
Residual norm steepest descent: now A is arbitrary.
Minimize instead f(x) = (1/2) ‖b − Ax‖_2^2 in the direction −∇f.
∇f(x) = −A^T (b − Ax) = −A^T r.
Iteration:
r ← b − Ax, d = A^T r,
α ← ‖d‖_2^2 / ‖Ad‖_2^2,
x ← x + αd
Important Note: equivalent to the usual steepest descent applied to the normal equations A^T A x = A^T b.
Converges under the condition that A is nonsingular.
But convergence can be very slow
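A matching MATLAB sketch (same illustrative arguments as before); the only changes from steepest descent are the direction d = A^T r and the step length:

% Residual norm steepest descent: minimizes ||b - Ax||_2^2 along d = A'*r
function x = rnsd(A, b, x, tol, maxit)
  for it = 1:maxit
    r = b - A*x;
    d = A'*r;                    % direction A^T r
    if norm(d) < tol, break; end
    Ad = A*d;
    alpha = (d'*d) / (Ad'*Ad);   % alpha = ||d||^2 / ||Ad||^2
    x = x + alpha*d;
  end
end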
Minimal residual iteration: Assume A is positive definite (A + A^T is SPD).
The objective function is still (1/2) ‖b − Ax‖_2^2, but the direction of search is r = b − Ax instead of −∇f(x).
Iteration:
r ← b − Ax,
α ← (Ar, r)/(Ar, Ar),
x ← x + αr
Each step minimizes f(x) = ‖b − Ax‖_2^2 in the direction r.
Converges under the condition that A + A^T is SPD.
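A MATLAB sketch of the MR iteration (illustrative arguments as before):

% Minimal residual iteration: requires A + A' to be SPD
function x = mr_iteration(A, b, x, tol, maxit)
  for it = 1:maxit
    r = b - A*x;
    if norm(r) < tol, break; end
    Ar = A*r;
    alpha = (Ar'*r) / (Ar'*Ar);  % alpha = (Ar,r)/(Ar,Ar)
    x = x + alpha*r;
  end
end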
Common feature of these techniques: x_new = x + αd, where d = a certain direction.
α is defined to optimize a certain quadratic function.
Equivalent to determining α by an orthogonality constraint.
Example: In MR,
x(α) = x + αd, with d = b − Ax.
min_α ‖b − Ax(α)‖_2 is reached iff b − Ax(α) ⊥ r
One-dimensional projection methods → can we generalize to m-dimensional techniques?
General Projection Methods
Initial Problem: b − Ax = 0
Given two subspaces K and L of R^N, define the approximate problem:
Find x̃ ∈ K such that b − Ax̃ ⊥ L
This leads to a small linear system (the "projected problem"). This is a basic projection step. Typically, a sequence of such steps is applied.
With a nonzero initial guess x_0, the approximate problem is:
Find x̃ ∈ x_0 + K such that b − Ax̃ ⊥ L
Write x̃ = x_0 + δ and r_0 = b − Ax_0. This leads to a system for δ:
Find δ ∈ K such that r_0 − Aδ ⊥ L
Matrix representation:
Let
V = [v_1, . . . , v_m], a basis of K, and
W = [w_1, . . . , w_m], a basis of L.
Then, letting the approximate solution be x̃ = x_0 + δ ≡ x_0 + V y, where y is a vector of R^m, the Petrov-Galerkin condition yields
W^T (r_0 − A V y) = 0
and therefore
x̃ = x_0 + V [W^T A V]^{-1} W^T r_0
Remark: In practice W^T A V is known from the algorithm and has a simple structure [tridiagonal, Hessenberg, ...].
Prototype Projection Method
Until convergence, Do:
1. Select a pair of subspaces K and L;
2. Choose bases V = [v_1, . . . , v_m] for K and W = [w_1, . . . , w_m] for L.
3. Compute:
   r ← b − Ax,
   y ← (W^T A V)^{-1} W^T r,
   x ← x + V y.
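A one-step MATLAB sketch of this prototype (the bases V and W are assumed given as dense n-by-m matrices; in practice they come from the particular method):

% One generic projection step: x <- x + V*((W'*A*V) \ (W'*r))
function x = projection_step(A, b, x, V, W)
  r = b - A*x;                  % current residual
  y = (W' * (A*V)) \ (W' * r);  % small m-by-m projected system
  x = x + V*y;
end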
Operator Form Representation
Let P be the orthogonal projector onto K and
Q the (oblique) projector onto K and orthogonally to L.
Px ∈ K, x − Px ⊥ K
Qx ∈ K, x − Qx ⊥ L
[Figure: x, its orthogonal projection Px onto K, and its oblique projection Qx onto K orthogonally to L.]
Let x* be the exact solution. Then:
1) We cannot get better accuracy than ‖(I − P)x*‖_2, i.e.,
‖x̃ − x*‖_2 ≥ ‖(I − P)x*‖_2
2) The residual of the exact solution for the approximate problem satisfies:
‖b − QAPx*‖_2 ≤ ‖QA(I − P)‖_2 ‖(I − P)x*‖_2
Two important particular cases.
1. L = AK. Then ‖b − Ax̃‖_2 = min_{z ∈ K} ‖b − Az‖_2
→ class of minimal residual methods: CR, GCR, ORTHOMIN, GMRES, CGNR, ...
2. L = K → class of Galerkin or orthogonal projection methods.
When A is SPD, then ‖x* − x̃‖_A = min_{z ∈ K} ‖x* − z‖_A.
One-dimensional projection processes
K = span{d} and L = span{e}
Then x̃ = x + αd, and the Petrov-Galerkin condition r − αAd ⊥ e yields
α = (r, e)/(Ad, e)
(I) Steepest descent: K = span{r}, L = K
(II) Residual norm steepest descent: K = span{A^T r}, L = AK
(III) Minimal residual iteration: K = span{r}, L = AK
Krylov Subspace Methods
Principle: Projection methods on Krylov subspaces:
K_m(A, v_1) = span{v_1, Av_1, . . . , A^{m−1} v_1}
probably the most important class of iterative methods.
many variants exist depending on the subspace L.
Simple properties of K_m. Let μ = deg. of the minimal polynomial of v:
K_m = {p(A)v | p = polynomial of degree ≤ m − 1}
K_m = K_μ for all m ≥ μ, and K_μ is invariant under A.
dim(K_m) = m iff μ ≥ m.
Overview
K = K_m(A, r_0), L = K: Full Orthogonalization Method [YS, 81], ORTHORES [Young & Jea, 82], Axelsson 81.
K = K_m(A, r_0), L = AK: GMRES [YS & Schultz], GCR, Orthomin (Vinsome, 1980), Orthodir (Young & Jea, 83), Axelsson's CGLS 83, ...
K = K_m(A, r_0), L = K_m(A^T, w): Bi-CG (Fletcher, 75).
Many variants of (3) that avoid the transpose: CGS (Sonneveld, 84), BiCGSTAB (van der Vorst, 92), TFQMR (Freund, 93), ...
BASIC RELAXATION METHODS
Basic Relaxation Schemes
Relaxation schemes: based on the decomposition A = D − E − F
[Figure: splitting of A into its diagonal D, strict lower part −E, and strict upper part −F.]
D = diag(A), −E = the strict lower part of A, and −F = its strict upper part.
For example, the Gauss-Seidel iteration:
(D − E) x^{(k+1)} = F x^{(k)} + b
The most common iterative procedures 50 years ago. However,
nowadays: seldom used by themselves.
Still used as smoothers in Multigrid schemes or sometimes as
preconditioners to Krylov subspace methods.
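A small MATLAB sketch of one Gauss-Seidel sweep in this notation (tril/triu extract D − E and −F from A):

% One Gauss-Seidel sweep: (D - E) x_{k+1} = F x_k + b
function x = gauss_seidel_sweep(A, b, x)
  DmE = tril(A);        % D - E = lower triangular part of A, with diagonal
  F   = -triu(A, 1);    % F = -(strict upper triangular part of A)
  x = DmE \ (F*x + b);  % one forward triangular solve
end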
Iteration matrices
Jacobi, Gauss-Seidel, SOR, and SSOR iterations are of the form
x^{(k+1)} = M x^{(k)} + f
with
M_Jac = D^{-1}(E + F) = I − D^{-1}A
M_GS(A) = (D − E)^{-1}F = I − (D − E)^{-1}A
M_SOR(A) = (D − ωE)^{-1}(ωF + (1 − ω)D) = I − (ω^{-1}D − E)^{-1}A
M_SSOR(A) = I − (2ω^{-1} − 1)(ω^{-1}D − F)^{-1} D (ω^{-1}D − E)^{-1} A
          = I − ω(2 − ω)(D − ωF)^{-1} D (D − ωE)^{-1} A
An observation & Introduction to Preconditioning
The iteration x^{(k+1)} = M x^{(k)} + f is attempting to solve (I − M)x = f. Since M is of the form M = I − P^{-1}A, this system can be rewritten as
P^{-1} A x = P^{-1} b
where, for SSOR (with ω = 1), we have
P_SSOR = (D − E) D^{-1} (D − F)
referred to as the SSOR preconditioning matrix.
In other words:
Relaxation Scheme ⟺ Preconditioned Fixed-Point Iteration
PRECONDITIONING
Preconditioning Basic principles
Basic idea is to use the Krylov subspace method on a modified system such as
M^{-1} A x = M^{-1} b.
The matrix M^{-1}A need not be formed explicitly; one only needs to solve Mw = v whenever needed.
Consequence: the fundamental requirement is that it should be easy to compute M^{-1}v for an arbitrary vector v.
Left, Right, and Split preconditioning
Left preconditioning: M^{-1} A x = M^{-1} b
Right preconditioning: A M^{-1} u = b, with x = M^{-1} u
Split preconditioning: M_L^{-1} A M_R^{-1} u = M_L^{-1} b, with x = M_R^{-1} u
[Assume M is factored: M = M_L M_R.]
Preconditioned CG (PCG)
Assume: A and M are both SPD.
Applying CG directly to M^{-1}Ax = M^{-1}b or AM^{-1}u = b won't work because the coefficient matrices are not symmetric.
Alternative: when M = LL^T, use the split preconditioner option.
Second alternative: observe that M^{-1}A is self-adjoint w.r.t. the M inner product:
(M^{-1}Ax, y)_M = (Ax, y) = (x, Ay) = (x, M^{-1}Ay)_M
Preconditioned CG (PCG)
ALGORITHM : 1 Preconditioned Conjugate Gradient
1. Compute r_0 := b − Ax_0, z_0 = M^{-1} r_0, and p_0 := z_0
2. For j = 0, 1, . . ., until convergence Do:
3.   α_j := (r_j, z_j)/(Ap_j, p_j)
4.   x_{j+1} := x_j + α_j p_j
5.   r_{j+1} := r_j − α_j Ap_j
6.   z_{j+1} := M^{-1} r_{j+1}
7.   β_j := (r_{j+1}, z_{j+1})/(r_j, z_j)
8.   p_{j+1} := z_{j+1} + β_j p_j
9. EndDo
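A compact MATLAB sketch of this algorithm; M is passed as a function handle Msolve that applies M^{-1} (an illustrative interface, e.g. built from an incomplete Cholesky factorization):

% Preconditioned CG; Msolve(v) should return M \ v for an SPD M
function x = pcg_sketch(A, b, x, Msolve, tol, maxit)
  r = b - A*x;  z = Msolve(r);  p = z;
  rz = r'*z;
  for j = 0:maxit-1
    Ap = A*p;
    alpha = rz / (Ap'*p);   % step length
    x = x + alpha*p;
    r = r - alpha*Ap;
    if norm(r) < tol, break; end
    z = Msolve(r);          % preconditioning solve
    rznew = r'*z;
    beta = rznew / rz;      % update coefficient
    p = z + beta*p;
    rz = rznew;
  end
end

For example, with M = LL^T from IC(0) for a sparse SPD A: L = ichol(A); Msolve = @(v) L' \ (L \ v);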
Note: M^{-1}A is also self-adjoint with respect to (·, ·)_A:
(M^{-1}Ax, y)_A = (AM^{-1}Ax, y) = (x, AM^{-1}Ay) = (x, M^{-1}Ay)_A
A similar algorithm can be obtained in this case.
Assume that M is given in Cholesky-factored form: M = LL^T.
Then another possibility is the split preconditioning option, which applies CG to the system
L^{-1} A L^{-T} u = L^{-1} b, with x = L^{-T} u
Notation: Â = L^{-1} A L^{-T}. All quantities related to the preconditioned system are indicated by a hat (ˆ).
ALGORITHM : 2 CG with Split Preconditioner
1. Compute r_0 := b − Ax_0; r̂_0 = L^{-1} r_0; and p_0 := L^{-T} r̂_0.
2. For j = 0, 1, . . ., until convergence Do:
3.   α_j := (r̂_j, r̂_j)/(Ap_j, p_j)
4.   x_{j+1} := x_j + α_j p_j
5.   r̂_{j+1} := r̂_j − α_j L^{-1} Ap_j
6.   β_j := (r̂_{j+1}, r̂_{j+1})/(r̂_j, r̂_j)
7.   p_{j+1} := L^{-T} r̂_{j+1} + β_j p_j
8. EndDo
The x_j's produced by the above algorithm and by PCG are identical (if the same initial guess is used).
Flexible accelerators
Question: What can we do in case M is defined only approximately, i.e., if it can vary from one step to the next?
Applications:
Iterative techniques as preconditioners: block-SOR, SSOR, Multigrid, etc.
Chaotic relaxation type preconditioners (e.g., in a parallel computing environment)
Mixing preconditioners, e.g., mixing coarse-mesh / fine-mesh preconditioners.
ALGORITHM : 3 GMRES, No Preconditioning
1. Start: Choose x_0 and a dimension m of the Krylov subspaces.
2. Arnoldi process:
   Compute r_0 = b − Ax_0, β = ‖r_0‖_2, and v_1 = r_0/β.
   For j = 1, . . . , m do
     Compute w := Av_j
     For i = 1, . . . , j, do: h_{i,j} := (w, v_i); w := w − h_{i,j} v_i
     h_{j+1,j} = ‖w‖_2; v_{j+1} = w/h_{j+1,j}
   Define V_m := [v_1, . . . , v_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: Compute x_m = x_0 + V_m y_m, where
   y_m = argmin_y ‖βe_1 − H̄_m y‖_2 and e_1 = [1, 0, . . . , 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
ALGORITHM : 4 GMRES with Right Preconditioning
1. Start: Choose x_0 and a dimension m.
2. Arnoldi process:
   Compute r_0 = b − Ax_0, β = ‖r_0‖_2, and v_1 = r_0/β.
   For j = 1, . . . , m do
     Compute z_j := M^{-1} v_j; Compute w := Az_j
     For i = 1, . . . , j, do: h_{i,j} := (w, v_i); w := w − h_{i,j} v_i
     h_{j+1,j} = ‖w‖_2; v_{j+1} = w/h_{j+1,j}
   Define V_m := [v_1, . . . , v_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: x_m = x_0 + M^{-1} V_m y_m, where
   y_m = argmin_y ‖βe_1 − H̄_m y‖_2 and e_1 = [1, 0, . . . , 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
ALGORITHM : 5 GMRES with Variable Preconditioning (FGMRES)
1. Start: Choose x_0 and a dimension m of the Krylov subspaces.
2. Arnoldi process:
   Compute r_0 = b − Ax_0, β = ‖r_0‖_2, and v_1 = r_0/β.
   For j = 1, . . . , m do
     Compute z_j := M_j^{-1} v_j; Compute w := Az_j
     For i = 1, . . . , j, do: h_{i,j} := (w, v_i); w := w − h_{i,j} v_i
     h_{j+1,j} = ‖w‖_2; v_{j+1} = w/h_{j+1,j}
   Define Z_m := [z_1, . . . , z_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: Compute x_m = x_0 + Z_m y_m, where
   y_m = argmin_y ‖βe_1 − H̄_m y‖_2 and e_1 = [1, 0, . . . , 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
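A MATLAB sketch of one cycle of Algorithm 5 (Msolve(v, j) applies the step-dependent M_j^{-1}; this interface and the absence of breakdown handling are simplifications):

% One restart cycle of flexible GMRES (variable preconditioner)
function [x, resnorm] = fgmres_cycle(A, b, x, m, Msolve)
  n = length(b);
  V = zeros(n, m+1);  Z = zeros(n, m);  H = zeros(m+1, m);
  r = b - A*x;  beta = norm(r);  V(:,1) = r/beta;
  for j = 1:m
    Z(:,j) = Msolve(V(:,j), j);     % z_j = M_j^{-1} v_j
    w = A*Z(:,j);
    for i = 1:j                     % modified Gram-Schmidt
      H(i,j) = w' * V(:,i);
      w = w - H(i,j)*V(:,i);
    end
    H(j+1,j) = norm(w);
    V(:,j+1) = w / H(j+1,j);
  end
  e1 = zeros(m+1,1);  e1(1) = beta;
  y = H \ e1;                       % least-squares min ||beta*e1 - H*y||_2
  x = x + Z*y;
  resnorm = norm(b - A*x);
end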
Properties
x_m minimizes ‖b − Ax‖_2 over x_0 + Span{Z_m}.
If Az_j = v_j (i.e., if the preconditioning is 'exact' at step j), then the approximation x_j is exact.
If M_j is constant, then the method is equivalent to right-preconditioned GMRES.
Additional Costs:
Arithmetic: none.
Memory: must save the additional set of vectors {z_j}_{j=1,...,m}
Advantage: Flexibility
Standard preconditioners
Simplest preconditioner: M = Diag(A) → poor convergence.
Next to simplest: SSOR, M = (D − E) D^{-1} (D − F)
Still simple but often more efficient: ILU(0).
ILU(p) (ILU with level of fill p): more complex.
Class of ILU preconditioners with threshold
Class of approximate inverse preconditioners
Class of multilevel ILU preconditioners: Multigrid, Algebraic Multigrid, M-level ILU, ...
The SOR/SSOR preconditioner
[Figure: the D, −E, −F splitting of A.]
SOR preconditioning: M_SOR = (D − ωE)
SSOR preconditioning: M_SSOR = (D − ωE) D^{-1} (D − ωF)
M_SSOR = LU, where L = unit lower triangular and U = upper triangular. One solve with M_SSOR ≈ same cost as a MAT-VEC.
k-step SOR (resp. SSOR) preconditioning: k steps of SOR (resp. SSOR).
Questions: Best ω? For preconditioning one can take ω = 1:
M = (D − E) D^{-1} (D − F)
Observe: M = LU = A + R with R = ED^{-1}F.
Best k? k = 1 is rarely the best; there can be substantial differences in performance.
Write a matlab script for the k-step SSOR preconditioner (ω = 1).
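A possible sketch for this exercise, taking one "step" to be one application of M^{-1} = (D − F)^{-1} D (D − E)^{-1} within a fixed-point update (this interpretation of "k-step" is an assumption):

% Apply the k-step SSOR (omega = 1) preconditioner: w ~ M_k^{-1} v
function w = ssor_k(A, v, k)
  DmE = tril(A);         % D - E
  DmF = triu(A);         % D - F
  D   = diag(diag(A));
  w = zeros(size(v));
  for step = 1:k         % k fixed-point steps: w <- w + M^{-1}(v - A*w)
    r = v - A*w;
    w = w + (DmF \ (D * (DmE \ r)));
  end
end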
[Figure: iteration times versus k for SOR(k)-preconditioned GMRES.]
ILU(0) and IC(0) preconditioners
Notation: NZ(X) = {(i, j) | X_{i,j} ≠ 0}
Formal definition of ILU(0):
A = LU + R
NZ(L) ∪ NZ(U) = NZ(A)
r_{ij} = 0 for (i, j) ∈ NZ(A)
This does not define ILU(0) in a unique way.
Constructive definition: compute the LU factorization of A but drop any fill-in in L and U outside of Struct(A).
ILU factorizations are often based on the i, k, j version of GE.
What is the IKJ version of GE?
Different computational patterns for Gaussian elimination: KJI, IJK, IKJ, JKI, ...
ALGORITHM : 6 Gaussian Elimination, IKJ Variant
1. For i = 2, . . . , n Do:
2.   For k = 1, . . . , i − 1 Do:
3.     a_{ik} := a_{ik}/a_{kk}
4.     For j = k + 1, . . . , n Do:
5.       a_{ij} := a_{ij} − a_{ik} a_{kj}
6.     EndDo
7.   EndDo
8. EndDo
[Figure: data access pattern of the IKJ variant at step i: rows 1, . . . , i − 1 accessed but not modified; row i accessed and modified; remaining rows not accessed.]
ILU(0): zero-fill ILU
ALGORITHM : 7 ILU(0)
For i = 1, . . . , N Do:
  For k = 1, . . . , i − 1 and if (i, k) ∈ NZ(A) Do:
    Compute a_{ik} := a_{ik}/a_{kk}
    For j = k + 1, . . . , N and if (i, j) ∈ NZ(A) Do:
      Compute a_{ij} := a_{ij} − a_{ik} a_{kj}
    EndFor
  EndFor
EndFor
When A is SPD, the ILU(0) factorization = the Incomplete Cholesky factorization IC(0). Meijerink and van der Vorst [1977].
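A MATLAB sketch of Algorithm 7 (kept simple rather than efficient; MATLAB's built-in ilu(A, struct('type','nofill')) computes the same kind of factorization):

% ILU(0): eliminate within the sparsity pattern of A, drop all fill-in
function [L, U] = ilu0_sketch(A)
  n = size(A,1);
  B = A;                               % work on a copy of A
  for i = 2:n
    for k = find(B(i,1:i-1))           % k with (i,k) in NZ(A)
      B(i,k) = B(i,k) / B(k,k);        % multiplier l_{ik}
      j = find(B(i,k+1:n)) + k;        % update only existing entries
      B(i,j) = B(i,j) - B(i,k)*B(k,j);
    end
  end
  L = speye(n) + tril(B,-1);           % unit lower triangular factor
  U = triu(B);
end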
Typical eigenvalue distribution of preconditioned matrix
Pattern of ILU(0) for 5-point matrix
Stencils and ILU factorization
Stencils of A and the L and U parts of A:
Higher order ILU factorization
Higher accuracy incomplete Cholesky: for regularly structured problems, IC(p) allows p additional diagonals in L.
Can be generalized to irregular sparse matrices using the notion of level of fill-in [Watts III, 1979].
Initially:
Lev_{ij} = 0 for a_{ij} ≠ 0, Lev_{ij} = ∞ for a_{ij} = 0
At a given step i of Gaussian elimination:
Lev_{ij} = min{ Lev_{ij} ; Lev_{ik} + Lev_{kj} + 1 }
ILU(p): strategy = drop anything with level of fill-in exceeding p.
* Increasing the level of fill-in usually results in a more accurate ILU and...
* ...typically in fewer steps and fewer arithmetic operations.
ILU(1)
ALGORITHM : 8 ILU(p)
For i = 2, . . . , N Do:
  For each k = 1, . . . , i − 1 and if a_{ik} ≠ 0 Do:
    Compute a_{ik} := a_{ik}/a_{kk}
    Compute a_{i,*} := a_{i,*} − a_{ik} a_{k,*}
    Update the levels of a_{i,*}
  EndFor
  Replace any element in row i with lev(a_{ij}) > p by zero.
EndFor
The algorithm can be split into a symbolic and a numerical phase: the levels of fill are computed in the symbolic phase.
ILU with threshold: generic algorithms
ILU(p) factorizations are based on structure only, not on numerical values → potential problems for non-M-matrices.
One remedy: ILU with threshold (generic name: ILUT).
Two broad approaches:
First approach [derived from direct solvers]: use any (direct) sparse solver and incorporate a dropping strategy. [Munksgaard (?), Osterby & Zlatev, Sameh & Zlatev [90], D. Young et al. (Boeing), etc.]
Second approach [derived from the iterative solvers viewpoint]:
1. use a (row or column) version of the (i, k, j) version of GE;
2. apply a dropping strategy to the element l_{ik} as it is computed;
3. perform the linear combinations to get a_{i,*}; use a full row expansion of a_{i,*};
4. apply a dropping strategy to the fill-ins.
ILU with threshold: ILUT(k, ε)
Do the i, k, j version of Gaussian Elimination (GE).
During each i-th step in GE, discard any pivot or fill-in whose magnitude is below ε · ‖row_i(A)‖.
Once the i-th row of L + U (L-part + U-part) is computed, retain only the k largest elements in both parts.
Advantages: controlled fill-in; smaller memory overhead.
Easy to implement.
Can be made quite inexpensive.
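For experiments, MATLAB's built-in ilu provides a threshold-based variant that can stand in for ILUT (the droptol value below is only illustrative):

% Threshold-based ILU via MATLAB's built-in ilu (Crout variant)
setup.type = 'crout';  setup.droptol = 1e-3;
[L, U] = ilu(A, setup);
% preconditioner application: w = U \ (L \ v)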
[Figure: CPU time plot; details lost in extraction.]

ALGORITHM : GMRES with eigenvector deflation (start of the algorithm lost in extraction)
1. Arnoldi process: Compute r_0 = b − Ax_0, β = ‖r_0‖_2, and v_1 = r_0/β.
   For j = 1, . . . , m do
     If j ≤ m − p then z_j := v_j, else z_j := u_{j−(m−p)} (eigenvector)
     w = Az_j
     For i = 1, . . . , j, do: h_{i,j} := (w, v_i); w := w − h_{i,j} v_i
     h_{j+1,j} = ‖w‖_2, v_{j+1} = w/‖w‖_2.
   EndDo
   Define Z_m := [z_1, . . . , z_m] and H̄_m = {h_{i,j}}.
2. Form the approximate solution:
   Compute x_m = x_0 + Z_m y_m where y_m = argmin_y ‖βe_1 − H̄_m y‖_2.
3. Get the next eigenvector estimates u_1, . . . , u_p from H̄_m, V_m, Z_m, ...
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 1.
Question 1: Which eigenvectors to add?
Answer: those associated with the smallest eigenvalues.
Question 2: How to compute the eigenvectors from F-GMRES?
Answer: use the relation
A Z_m = V_{m+1} H̄_m
Approximation: Au ≈ λu with u = Z_m y.
The Galerkin condition r ⊥ AZ_m gives the generalized eigenvalue problem
H̄_m^H H̄_m y = λ H̄_m^H V_{m+1}^H Z_m y
In addition, in GMRES: H̄_m = Q_m R_m, so H̄_m^H H̄_m = R_m^H R_m.
See: Morgan (1993).
An example: Shell problems
Can be very hard to solve!
A matrix of size N = 38,002, with Nz = 949,452 nonzero elements.
Actually symmetric; not exploited in the test.
Most simplistic methods fail.
ILUT(50,0) does not work, even with GMRES(80).
This is an example where a large subspace is required.
[Figure: convergence histories (log_10 of the residual norm versus number of GMRES steps) for the Shell problem, N = 38002, Nz = 949452, m = 80, with 0, 5, 10, 20, and 30 added eigenvectors.]
An example: Euler's equations on an unstructured mesh
Contributed by Larry Wigton from Boeing
Size = 3,864 (966 mesh points).
Nonzero elements: 238,252 (about 62 per row).
Difficult to solve in spite of its small size.
Results with ILUT(lfil, ε), tol = 10^{-8}:

lfil | Iterations | estimate of ‖(LU)^{-1}‖
100  |     -      | 0.19E+56
110  |     -      | 0.34E+9
120  |    30      | 0.70E+5
130  |    25      | 0.33E+7
140  |    20      | 0.17E+4
150  |    19      | 0.69E+4
Results with Block Jacobi Preconditioning
with Eigenvalue Deflation
Reduction in residual norm in 1200 GMRES steps with m = 49:

       | 4x4 block | 16x16 block
p = 0  |  0.8E-0   |  0.8E-0
p = 4  |  0.8E-0   |  4.0E-5
p = 8  |  1.2E-2   |  2.9E-7
p = 12 |  1.9E-2   |  3.8E-6
Theory (Hermitian case only)
Assume that A is SPD and let K = K_m + W, where W is such that
dist(AW, U) = ε,
with U = the exact invariant subspace associated with λ_1, . . . , λ_s.
Then the residual r obtained from the minimal residual projection process onto the augmented Krylov subspace K satisfies the inequality
‖r‖_2 ≤ ‖r_0‖_2 [ 1/T_m^2(γ) + ε^2 ]^{1/2},
where γ = (λ_n + λ_{s+1})/(λ_n − λ_{s+1}) and T_m = the Chebyshev polynomial of degree m of the 1st kind.
See [YS, SIMAX vol. 4, pp 43-66 (1997)] for other results.
Crout-based ILUT (ILUTC)
Terminology: Crout versions of LU compute the k-th row of U and the k-th column of L at the k-th step.
[Figure: computational pattern: black = part computed at step k, blue = part accessed.]
Main advantages:
1. Less expensive than ILUT (avoids sorting).
2. Allows better techniques for dropping.
References:
[1] M. Jones and P. Plassmann. An improved incomplete Cholesky factorization. ACM Transactions on Mathematical Software, 21:5-17, 1995.
[2] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman. Algorithms and data structures for sparse symmetric Gaussian elimination. SIAM Journal on Scientific and Statistical Computing, 2:225-237, 1981.
[3] M. Bollhöfer. A robust ILU with pivoting based on monitoring the growth of the inverse factors. Linear Algebra and its Applications, 338(1-3):201-218, 2001.
[4] N. Li, Y. Saad, and E. Chow. Crout versions of ILU. MSI technical report, 2002.
Crout LU (dense case)
Go back to the delayed-update (IKJ) algorithm and observe: we could do both a column and a row version.
Left: U computed by rows. Right: L computed by columns.
Note: entries 1 : k − 1 in the k-th row of the figure need not be computed; they are available from the already computed columns of L. A similar observation holds for L (right).
ALGORITHM : 9 Crout LU Factorization (dense case)
1. For k = 1 : n Do:
2.   For i = 1 : k − 1 and if a_{ki} ≠ 0 Do:
3.     a_{k,k:n} = a_{k,k:n} − a_{ki} a_{i,k:n}
4.   EndDo
5.   For i = 1 : k − 1 and if a_{ik} ≠ 0 Do:
6.     a_{k+1:n,k} = a_{k+1:n,k} − a_{ik} a_{k+1:n,i}
7.   EndDo
8.   a_{ik} = a_{ik}/a_{kk} for i = k + 1, . . . , n
9. EndDo
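A direct MATLAB transcription of Algorithm 9 (dense case, no pivoting; assumes all pivots a_{kk} are nonzero):

% Crout LU: at step k, compute row k of U and column k of L (in place)
function A = crout_lu(A)
  n = size(A,1);
  for k = 1:n
    for i = 1:k-1                      % update row k of U
      if A(k,i) ~= 0
        A(k,k:n) = A(k,k:n) - A(k,i)*A(i,k:n);
      end
    end
    for i = 1:k-1                      % update column k of L
      if A(i,k) ~= 0
        A(k+1:n,k) = A(k+1:n,k) - A(i,k)*A(k+1:n,i);
      end
    end
    A(k+1:n,k) = A(k+1:n,k) / A(k,k);  % scale to obtain l_{ik}
  end
end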
Comparison with standard techniques
[Figure: preconditioning time vs. Lfil for RAEFSKY3.]
Preconditioning time vs. Lfil for ILUC (solid), row-ILUT (circles), column-ILUT (triangles), and row-ILUT with binary search trees (stars).
Inverse-based dropping strategies
Method developed mainly by Matthias Bollhöfer.
Observation: the norms of the inverses of the factors are more important than the errors in the factors themselves. If A = L̃Ũ + E, then
L̃^{-1} A Ũ^{-1} = I + L̃^{-1} E Ũ^{-1}
In many cases ‖L̃^{-1}‖ and ‖Ũ^{-1}‖ are *very* large → bad.
In contrast, assume A = LU is the exact LU factorization, and let
L̃^{-1} = L^{-1} + X, Ũ^{-1} = U^{-1} + Y. Then:
L̃^{-1} A Ũ^{-1} = (L^{-1} + X) A (U^{-1} + Y) = I + L^{-1}AY + XAU^{-1} + XAY.
X, Y small → preconditioned matrix close to the identity.
Let L_k = the matrix formed by the first k columns of L and the last n − k columns of the identity matrix.
Consider a term l_{jk} with j > k that is dropped at step k. The perturbed matrix L̃_k differs from L_k by l_{jk} e_j e_k^T. Note that L_k e_j = e_j, so
L̃_k = L_k − l_{jk} e_j e_k^T = L_k (I − l_{jk} e_j e_k^T)
→ L̃_k^{-1} = (I − l_{jk} e_j e_k^T)^{-1} L_k^{-1} = L_k^{-1} + l_{jk} e_j e_k^T L_k^{-1}.
→ The j-th row of the inverse of L_k is perturbed by l_{jk} times the k-th row of L_k^{-1}.
Need to limit the norm of this perturbing row, i.e.,
|l_{jk}| ‖e_k^T L_k^{-1}‖ should be small.
L^{-1} is not available → Bollhöfer's idea: use techniques for estimating condition numbers [see, e.g., Golub and Van Loan].
ALGORITHM : 10 Estimating the norms ‖e_k^T L^{-1}‖
1. Set x_1 = 1 and ν_i = 0, i = 1, . . . , n
2. For k = 2, . . . , n do:
3.   μ⁺ = 1 − ν_k; μ⁻ = −1 − ν_k
4.   if |μ⁺| > |μ⁻| then x_k = μ⁺ else x_k = μ⁻
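A MATLAB sketch of this kind of greedy ±1 estimator (the slide is truncated here, so the handling of the partial sums ν is an assumption, in the spirit of the condition estimators in Golub and Van Loan):

% Greedy growth estimate for a unit lower triangular L: choose b_k = +/-1
% so that |x_k| in Lx = b becomes large; large |x_k| signals a large
% k-th row/column of inv(L).
function x = growth_estimate(L)
  n = size(L,1);
  x = zeros(n,1);  x(1) = 1;
  for k = 2:n
    nu  = L(k,1:k-1) * x(1:k-1);   % nu_k, partial sum from earlier steps
    muP = 1 - nu;  muM = -1 - nu;  % the two candidates for x_k
    if abs(muP) > abs(muM), x(k) = muP; else, x(k) = muM; end
  end
end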