Iterative Methods for Sparse Linear Systems (Part 1)
Yousef Saad
University of Minnesota
Computer Science and Engineering
CEA-EDF-INRIA school
Sophia Antipolis, Mar. 30 - Apr. 3, 2009
Outline
Part 1
Introduction, sparse matrices and sparsity
Basic projection methods and (briefly) Krylov subspace methods
Preconditioned iterations
Preconditioning techniques
Part 2
Multilevel preconditioners
Nonsymmetric permutations
Parallel implementations
Parallel Preconditioners
Software
Matlab codes
Material of this course will be supported by matlab scripts for
demos.
The matlab suite used for the demos is located here:
http://www.cs.umn.edu/saad/software
[Note: Updated at the occasion of each tutorial I give. Make sure to get the most recent version.]
INTRODUCTION TO SPARSE MATRICES
Typical Problem:
Physical Model → Nonlinear PDEs → Discretization → Linearization (Newton) → Sequence of sparse linear systems Ax = b
Steepest descent minimizes the quadratic function
f(x) = (1/2) ‖x − x*‖_A^2 = (1/2) (A(x − x*), (x − x*)),
where x* = A^{-1}b is the exact solution.
Note: 1. f(x) = (1/2)(Ax, x) − (b, x) + constant
2. ∇f(x) = Ax − b → descent direction = b − Ax ≡ r
Idea: take a step of the form x_new = x + αr which minimizes f(x).
Best α = (r, r)/(Ar, r).
Iteration:
r ← b − Ax,
α ← (r, r)/(Ar, r),
x ← x + αr
Can show: convergence guaranteed if A is SPD.
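A minimal MATLAB sketch of this iteration (the arguments tol and maxit are illustrative, not from the slides):

% Steepest descent for SPD A: x <- x + alpha*r with alpha = (r,r)/(Ar,r)
function x = steepest_descent(A, b, x, tol, maxit)
  for it = 1:maxit
    r = b - A*x;                % current residual
    if norm(r) < tol, break; end
    Ar = A*r;
    alpha = (r'*r) / (Ar'*r);   % optimal step length along r
    x = x + alpha*r;
  end
end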
Residual norm steepest descent: now A is arbitrary.
Minimize instead f(x) = (1/2) ‖b − Ax‖_2^2 in the direction −∇f.
∇f(x) = −A^T (b − Ax) = −A^T r.
Iteration:
r ← b − Ax, d = A^T r,
α ← ‖d‖_2^2 / ‖Ad‖_2^2,
x ← x + αd
Important Note: equivalent to the usual steepest descent applied to the normal equations A^T A x = A^T b.
Converges under the condition that A is nonsingular.
But convergence can be very slow
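A matching MATLAB sketch (same illustrative arguments as before); the only changes from steepest descent are the direction d = A^T r and the step length:

% Residual norm steepest descent: minimizes ||b - Ax||_2^2 along d = A'*r
function x = rnsd(A, b, x, tol, maxit)
  for it = 1:maxit
    r = b - A*x;
    d = A'*r;                    % direction A^T r
    if norm(d) < tol, break; end
    Ad = A*d;
    alpha = (d'*d) / (Ad'*Ad);   % alpha = ||d||^2 / ||Ad||^2
    x = x + alpha*d;
  end
end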
Minimal residual iteration: Assume A is positive definite (A + A^T is SPD).
The objective function is still (1/2) ‖b − Ax‖_2^2, but the direction of search is r = b − Ax instead of −∇f(x).
Iteration:
r ← b − Ax,
α ← (Ar, r)/(Ar, Ar),
x ← x + αr
Each step minimizes f(x) = ‖b − Ax‖_2^2 in the direction r.
Converges under the condition that A + A^T is SPD.
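A MATLAB sketch of the MR iteration (illustrative arguments as before):

% Minimal residual iteration: requires A + A' to be SPD
function x = mr_iteration(A, b, x, tol, maxit)
  for it = 1:maxit
    r = b - A*x;
    if norm(r) < tol, break; end
    Ar = A*r;
    alpha = (Ar'*r) / (Ar'*Ar);  % alpha = (Ar,r)/(Ar,Ar)
    x = x + alpha*r;
  end
end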
Common feature of these techniques: x_new = x + αd, where d = a certain direction.
α is defined to optimize a certain quadratic function.
Equivalent to determining α by an orthogonality constraint.
Example: In MR,
x(α) = x + αd, with d = b − Ax.
min_α ‖b − Ax(α)‖_2 is reached iff b − Ax(α) ⊥ r
One-dimensional projection methods → can we generalize to m-dimensional techniques?
General Projection Methods
Initial Problem: b − Ax = 0
Given two subspaces K and L of R^N, define the approximate problem:
Find x̃ ∈ K such that b − Ax̃ ⊥ L
This leads to a small linear system (the "projected problem"). This is a basic projection step. Typically, a sequence of such steps is applied.
With a nonzero initial guess x_0, the approximate problem is:
Find x̃ ∈ x_0 + K such that b − Ax̃ ⊥ L
Write x̃ = x_0 + δ and r_0 = b − Ax_0. This leads to a system for δ:
Find δ ∈ K such that r_0 − Aδ ⊥ L
Matrix representation:
Let
V = [v_1, . . . , v_m], a basis of K, and
W = [w_1, . . . , w_m], a basis of L.
Then, letting the approximate solution be x̃ = x_0 + δ ≡ x_0 + V y, where y is a vector of R^m, the Petrov-Galerkin condition yields
W^T (r_0 − A V y) = 0
and therefore
x̃ = x_0 + V [W^T A V]^{-1} W^T r_0
Remark: In practice W^T A V is known from the algorithm and has a simple structure [tridiagonal, Hessenberg, ...].
Prototype Projection Method
Until convergence, Do:
1. Select a pair of subspaces K and L;
2. Choose bases V = [v_1, . . . , v_m] for K and W = [w_1, . . . , w_m] for L.
3. Compute:
   r ← b − Ax,
   y ← (W^T A V)^{-1} W^T r,
   x ← x + V y.
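A one-step MATLAB sketch of this prototype (the bases V and W are assumed given as dense n-by-m matrices; in practice they come from the particular method):

% One generic projection step: x <- x + V*((W'*A*V) \ (W'*r))
function x = projection_step(A, b, x, V, W)
  r = b - A*x;                  % current residual
  y = (W' * (A*V)) \ (W' * r);  % small m-by-m projected system
  x = x + V*y;
end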
Operator Form Representation
Let P be the orthogonal projector onto K and
Q the (oblique) projector onto K and orthogonally to L.
Px ∈ K, x − Px ⊥ K
Qx ∈ K, x − Qx ⊥ L
[Figure: x, its orthogonal projection Px onto K, and its oblique projection Qx onto K orthogonally to L.]
Let x* be the exact solution. Then:
1) We cannot get better accuracy than ‖(I − P)x*‖_2, i.e.,
‖x̃ − x*‖_2 ≥ ‖(I − P)x*‖_2
2) The residual of the exact solution for the approximate problem satisfies:
‖b − QAPx*‖_2 ≤ ‖QA(I − P)‖_2 ‖(I − P)x*‖_2
Two important particular cases.
1. L = AK. Then ‖b − Ax̃‖_2 = min_{z ∈ K} ‖b − Az‖_2
→ class of minimal residual methods: CR, GCR, ORTHOMIN, GMRES, CGNR, ...
2. L = K → class of Galerkin or orthogonal projection methods.
When A is SPD, then ‖x* − x̃‖_A = min_{z ∈ K} ‖x* − z‖_A.
One-dimensional projection processes
K = span{d} and L = span{e}
Then x̃ = x + αd, and the Petrov-Galerkin condition r − αAd ⊥ e yields
α = (r, e)/(Ad, e)
(I) Steepest descent: K = span{r}, L = K
(II) Residual norm steepest descent: K = span{A^T r}, L = AK
(III) Minimal residual iteration: K = span{r}, L = AK
Krylov Subspace Methods
Principle: Projection methods on Krylov subspaces:
K_m(A, v_1) = span{v_1, Av_1, . . . , A^{m−1} v_1}
probably the most important class of iterative methods.
many variants exist depending on the subspace L.
Simple properties of K_m. Let μ = deg. of the minimal polynomial of v:
K_m = {p(A)v | p = polynomial of degree ≤ m − 1}
K_m = K_μ for all m ≥ μ, and K_μ is invariant under A.
dim(K_m) = m iff μ ≥ m.
Overview
K = K_m(A, r_0), L = K: Full Orthogonalization Method [YS, 81], ORTHORES [Young & Jea, 82], Axelsson 81.
K = K_m(A, r_0), L = AK: GMRES [YS & Schultz], GCR, Orthomin (Vinsome, 1980), Orthodir (Young & Jea, 83), Axelsson's CGLS 83, ...
K = K_m(A, r_0), L = K_m(A^T, w): Bi-CG (Fletcher, 75).
Many variants of (3) that avoid the transpose: CGS (Sonneveld, 84), BiCGSTAB (van der Vorst, 92), TFQMR (Freund, 93), ...
BASIC RELAXATION METHODS
Basic Relaxation Schemes
Relaxation schemes: based on the decomposition A = D − E − F
[Figure: splitting of A into its diagonal D, strict lower part −E, and strict upper part −F.]
D = diag(A), −E = the strict lower part of A, and −F = its strict upper part.
For example, the Gauss-Seidel iteration:
(D − E) x^{(k+1)} = F x^{(k)} + b
The most common iterative procedures 50 years ago. However,
nowadays: seldom used by themselves.
Still used as smoothers in Multigrid schemes or sometimes as
preconditioners to Krylov subspace methods.
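A small MATLAB sketch of one Gauss-Seidel sweep in this notation (tril/triu extract D − E and −F from A):

% One Gauss-Seidel sweep: (D - E) x_{k+1} = F x_k + b
function x = gauss_seidel_sweep(A, b, x)
  DmE = tril(A);        % D - E = lower triangular part of A, with diagonal
  F   = -triu(A, 1);    % F = -(strict upper triangular part of A)
  x = DmE \ (F*x + b);  % one forward triangular solve
end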
Iteration matrices
Jacobi, Gauss-Seidel, SOR, and SSOR iterations are of the form
x^{(k+1)} = M x^{(k)} + f
with
M_Jac = D^{-1}(E + F) = I − D^{-1}A
M_GS(A) = (D − E)^{-1}F = I − (D − E)^{-1}A
M_SOR(A) = (D − ωE)^{-1}(ωF + (1 − ω)D) = I − (ω^{-1}D − E)^{-1}A
M_SSOR(A) = I − (2ω^{-1} − 1)(ω^{-1}D − F)^{-1} D (ω^{-1}D − E)^{-1} A
          = I − ω(2 − ω)(D − ωF)^{-1} D (D − ωE)^{-1} A
An observation & Introduction to Preconditioning
The iteration x^{(k+1)} = M x^{(k)} + f is attempting to solve (I − M)x = f. Since M is of the form M = I − P^{-1}A, this system can be rewritten as
P^{-1} A x = P^{-1} b
where, for SSOR (with ω = 1), we have
P_SSOR = (D − E) D^{-1} (D − F)
referred to as the SSOR preconditioning matrix.
In other words:
Relaxation Scheme ⟺ Preconditioned Fixed-Point Iteration
PRECONDITIONING
Preconditioning Basic principles
Basic idea is to use the Krylov subspace method on a modified system such as
M^{-1} A x = M^{-1} b.
The matrix M^{-1}A need not be formed explicitly; one only needs to solve Mw = v whenever needed.
Consequence: the fundamental requirement is that it should be easy to compute M^{-1}v for an arbitrary vector v.
Left, Right, and Split preconditioning
Left preconditioning: M^{-1} A x = M^{-1} b
Right preconditioning: A M^{-1} u = b, with x = M^{-1} u
Split preconditioning: M_L^{-1} A M_R^{-1} u = M_L^{-1} b, with x = M_R^{-1} u
[Assume M is factored: M = M_L M_R.]
Preconditioned CG (PCG)
Assume: A and M are both SPD.
Applying CG directly to M^{-1}Ax = M^{-1}b or AM^{-1}u = b won't work because the coefficient matrices are not symmetric.
Alternative: when M = LL^T, use the split preconditioner option.
Second alternative: observe that M^{-1}A is self-adjoint w.r.t. the M inner product:
(M^{-1}Ax, y)_M = (Ax, y) = (x, Ay) = (x, M^{-1}Ay)_M
Preconditioned CG (PCG)
ALGORITHM : 1 Preconditioned Conjugate Gradient
1. Compute r_0 := b − Ax_0, z_0 = M^{-1} r_0, and p_0 := z_0
2. For j = 0, 1, . . ., until convergence Do:
3.   α_j := (r_j, z_j)/(Ap_j, p_j)
4.   x_{j+1} := x_j + α_j p_j
5.   r_{j+1} := r_j − α_j Ap_j
6.   z_{j+1} := M^{-1} r_{j+1}
7.   β_j := (r_{j+1}, z_{j+1})/(r_j, z_j)
8.   p_{j+1} := z_{j+1} + β_j p_j
9. EndDo
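A compact MATLAB sketch of this algorithm; M is passed as a function handle Msolve that applies M^{-1} (an illustrative interface, e.g. built from an incomplete Cholesky factorization):

% Preconditioned CG; Msolve(v) should return M \ v for an SPD M
function x = pcg_sketch(A, b, x, Msolve, tol, maxit)
  r = b - A*x;  z = Msolve(r);  p = z;
  rz = r'*z;
  for j = 0:maxit-1
    Ap = A*p;
    alpha = rz / (Ap'*p);   % step length
    x = x + alpha*p;
    r = r - alpha*Ap;
    if norm(r) < tol, break; end
    z = Msolve(r);          % preconditioning solve
    rznew = r'*z;
    beta = rznew / rz;      % update coefficient
    p = z + beta*p;
    rz = rznew;
  end
end

For example, with M = LL^T from IC(0) for a sparse SPD A: L = ichol(A); Msolve = @(v) L' \ (L \ v);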
Note: M^{-1}A is also self-adjoint with respect to (·, ·)_A:
(M^{-1}Ax, y)_A = (AM^{-1}Ax, y) = (x, AM^{-1}Ay) = (x, M^{-1}Ay)_A
A similar algorithm can be obtained in this case.
Assume that M is given in Cholesky-factored form: M = LL^T.
Then another possibility is the split preconditioning option, which applies CG to the system
L^{-1} A L^{-T} u = L^{-1} b, with x = L^{-T} u
Notation: Â = L^{-1} A L^{-T}. All quantities related to the preconditioned system are indicated by a hat (ˆ).
ALGORITHM : 2 CG with Split Preconditioner
1. Compute r_0 := b − Ax_0; r̂_0 = L^{-1} r_0; and p_0 := L^{-T} r̂_0.
2. For j = 0, 1, . . ., until convergence Do:
3.   α_j := (r̂_j, r̂_j)/(Ap_j, p_j)
4.   x_{j+1} := x_j + α_j p_j
5.   r̂_{j+1} := r̂_j − α_j L^{-1} Ap_j
6.   β_j := (r̂_{j+1}, r̂_{j+1})/(r̂_j, r̂_j)
7.   p_{j+1} := L^{-T} r̂_{j+1} + β_j p_j
8. EndDo
The x_j's produced by the above algorithm and by PCG are identical (if the same initial guess is used).
Flexible accelerators
Question: What can we do in case M is defined only approximately, i.e., if it can vary from one step to the next?
Applications:
Iterative techniques as preconditioners: block-SOR, SSOR, Multigrid, etc.
Chaotic relaxation type preconditioners (e.g., in a parallel computing environment)
Mixing preconditioners, e.g., mixing coarse-mesh / fine-mesh preconditioners.
ALGORITHM : 3 GMRES, No Preconditioning
1. Start: Choose x_0 and a dimension m of the Krylov subspaces.
2. Arnoldi process:
   Compute r_0 = b − Ax_0, β = ‖r_0‖_2, and v_1 = r_0/β.
   For j = 1, . . . , m do
     Compute w := Av_j
     For i = 1, . . . , j, do: h_{i,j} := (w, v_i); w := w − h_{i,j} v_i
     h_{j+1,j} = ‖w‖_2; v_{j+1} = w/h_{j+1,j}
   Define V_m := [v_1, . . . , v_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: Compute x_m = x_0 + V_m y_m, where
   y_m = argmin_y ‖βe_1 − H̄_m y‖_2 and e_1 = [1, 0, . . . , 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
ALGORITHM : 4 GMRES with Right Preconditioning
1. Start: Choose x_0 and a dimension m.
2. Arnoldi process:
   Compute r_0 = b − Ax_0, β = ‖r_0‖_2, and v_1 = r_0/β.
   For j = 1, . . . , m do
     Compute z_j := M^{-1} v_j; Compute w := Az_j
     For i = 1, . . . , j, do: h_{i,j} := (w, v_i); w := w − h_{i,j} v_i
     h_{j+1,j} = ‖w‖_2; v_{j+1} = w/h_{j+1,j}
   Define V_m := [v_1, . . . , v_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: x_m = x_0 + M^{-1} V_m y_m, where
   y_m = argmin_y ‖βe_1 − H̄_m y‖_2 and e_1 = [1, 0, . . . , 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
ALGORITHM : 5 GMRES with Variable Preconditioning (FGMRES)
1. Start: Choose x_0 and a dimension m of the Krylov subspaces.
2. Arnoldi process:
   Compute r_0 = b − Ax_0, β = ‖r_0‖_2, and v_1 = r_0/β.
   For j = 1, . . . , m do
     Compute z_j := M_j^{-1} v_j; Compute w := Az_j
     For i = 1, . . . , j, do: h_{i,j} := (w, v_i); w := w − h_{i,j} v_i
     h_{j+1,j} = ‖w‖_2; v_{j+1} = w/h_{j+1,j}
   Define Z_m := [z_1, . . . , z_m] and H̄_m = {h_{i,j}}.
3. Form the approximate solution: Compute x_m = x_0 + Z_m y_m, where
   y_m = argmin_y ‖βe_1 − H̄_m y‖_2 and e_1 = [1, 0, . . . , 0]^T.
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 2.
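A MATLAB sketch of one cycle of Algorithm 5 (Msolve(v, j) applies the step-dependent M_j^{-1}; this interface and the absence of breakdown handling are simplifications):

% One restart cycle of flexible GMRES (variable preconditioner)
function [x, resnorm] = fgmres_cycle(A, b, x, m, Msolve)
  n = length(b);
  V = zeros(n, m+1);  Z = zeros(n, m);  H = zeros(m+1, m);
  r = b - A*x;  beta = norm(r);  V(:,1) = r/beta;
  for j = 1:m
    Z(:,j) = Msolve(V(:,j), j);     % z_j = M_j^{-1} v_j
    w = A*Z(:,j);
    for i = 1:j                     % modified Gram-Schmidt
      H(i,j) = w' * V(:,i);
      w = w - H(i,j)*V(:,i);
    end
    H(j+1,j) = norm(w);
    V(:,j+1) = w / H(j+1,j);
  end
  e1 = zeros(m+1,1);  e1(1) = beta;
  y = H \ e1;                       % least-squares min ||beta*e1 - H*y||_2
  x = x + Z*y;
  resnorm = norm(b - A*x);
end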
Properties
x_m minimizes ‖b − Ax‖_2 over x_0 + Span{Z_m}.
If Az_j = v_j (i.e., if the preconditioning is 'exact' at step j), then the approximation x_j is exact.
If M_j is constant, then the method is equivalent to right-preconditioned GMRES.
Additional Costs:
Arithmetic: none.
Memory: must save the additional set of vectors {z_j}_{j=1,...,m}
Advantage: Flexibility
Standard preconditioners
Simplest preconditioner: M = Diag(A) → poor convergence.
Next to simplest: SSOR, M = (D − E) D^{-1} (D − F)
Still simple but often more efficient: ILU(0).
ILU(p) (ILU with level of fill p): more complex.
Class of ILU preconditioners with threshold
Class of approximate inverse preconditioners
Class of multilevel ILU preconditioners: Multigrid, Algebraic Multigrid, M-level ILU, ...
The SOR/SSOR preconditioner
[Figure: the D, −E, −F splitting of A.]
SOR preconditioning: M_SOR = (D − ωE)
SSOR preconditioning: M_SSOR = (D − ωE) D^{-1} (D − ωF)
M_SSOR = LU, where L = unit lower triangular and U = upper triangular. One solve with M_SSOR ≈ same cost as a MAT-VEC.
k-step SOR (resp. SSOR) preconditioning: k steps of SOR (resp. SSOR).
Questions: Best ω? For preconditioning one can take ω = 1:
M = (D − E) D^{-1} (D − F)
Observe: M = LU = A + R with R = ED^{-1}F.
Best k? k = 1 is rarely the best; there can be substantial differences in performance.
Write a matlab script for the k-step SSOR preconditioner (ω = 1).
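A possible sketch for this exercise, taking one "step" to be one application of M^{-1} = (D − F)^{-1} D (D − E)^{-1} within a fixed-point update (this interpretation of "k-step" is an assumption):

% Apply the k-step SSOR (omega = 1) preconditioner: w ~ M_k^{-1} v
function w = ssor_k(A, v, k)
  DmE = tril(A);         % D - E
  DmF = triu(A);         % D - F
  D   = diag(diag(A));
  w = zeros(size(v));
  for step = 1:k         % k fixed-point steps: w <- w + M^{-1}(v - A*w)
    r = v - A*w;
    w = w + (DmF \ (D * (DmE \ r)));
  end
end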
[Figure: iteration times versus k for SOR(k)-preconditioned GMRES.]
ILU(0) and IC(0) preconditioners
Notation: NZ(X) = {(i, j) | X_{i,j} ≠ 0}
Formal definition of ILU(0):
A = LU + R
NZ(L) ∪ NZ(U) = NZ(A)
r_{ij} = 0 for (i, j) ∈ NZ(A)
This does not define ILU(0) in a unique way.
Constructive definition: compute the LU factorization of A but drop any fill-in in L and U outside of Struct(A).
ILU factorizations are often based on the i, k, j version of GE.
What is the IKJ version of GE?
Different computational patterns for Gaussian elimination: KJI, IJK, IKJ, JKI, ...
ALGORITHM : 6 Gaussian Elimination, IKJ Variant
1. For i = 2, . . . , n Do:
2.   For k = 1, . . . , i − 1 Do:
3.     a_{ik} := a_{ik}/a_{kk}
4.     For j = k + 1, . . . , n Do:
5.       a_{ij} := a_{ij} − a_{ik} a_{kj}
6.     EndDo
7.   EndDo
8. EndDo
[Figure: data access pattern of the IKJ variant at step i: rows 1, . . . , i − 1 accessed but not modified; row i accessed and modified; remaining rows not accessed.]
ILU(0): zero-fill ILU
ALGORITHM : 7 ILU(0)
For i = 1, . . . , N Do:
  For k = 1, . . . , i − 1 and if (i, k) ∈ NZ(A) Do:
    Compute a_{ik} := a_{ik}/a_{kk}
    For j = k + 1, . . . , N and if (i, j) ∈ NZ(A) Do:
      Compute a_{ij} := a_{ij} − a_{ik} a_{kj}
    EndFor
  EndFor
EndFor
When A is SPD, the ILU(0) factorization = the Incomplete Cholesky factorization IC(0). Meijerink and van der Vorst [1977].
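A MATLAB sketch of Algorithm 7 (kept simple rather than efficient; MATLAB's built-in ilu(A, struct('type','nofill')) computes the same kind of factorization):

% ILU(0): eliminate within the sparsity pattern of A, drop all fill-in
function [L, U] = ilu0_sketch(A)
  n = size(A,1);
  B = A;                               % work on a copy of A
  for i = 2:n
    for k = find(B(i,1:i-1))           % k with (i,k) in NZ(A)
      B(i,k) = B(i,k) / B(k,k);        % multiplier l_{ik}
      j = find(B(i,k+1:n)) + k;        % update only existing entries
      B(i,j) = B(i,j) - B(i,k)*B(k,j);
    end
  end
  L = speye(n) + tril(B,-1);           % unit lower triangular factor
  U = triu(B);
end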
Typical eigenvalue distribution of preconditioned matrix
Pattern of ILU(0) for 5-point matrix
Stencils and ILU factorization
Stencils of A and the L and U parts of A:
Higher order ILU factorization
Higher accuracy incomplete Cholesky: for regularly structured problems, IC(p) allows p additional diagonals in L.
Can be generalized to irregular sparse matrices using the notion of level of fill-in [Watts III, 1979].
Initially:
Lev_{ij} = 0 for a_{ij} ≠ 0, Lev_{ij} = ∞ for a_{ij} = 0
At a given step i of Gaussian elimination:
Lev_{ij} = min{ Lev_{ij} ; Lev_{ik} + Lev_{kj} + 1 }
ILU(p): strategy = drop anything with level of fill-in exceeding p.
* Increasing the level of fill-in usually results in a more accurate ILU and...
* ...typically in fewer steps and fewer arithmetic operations.
ILU(1)
ALGORITHM : 8 ILU(p)
For i = 2, . . . , N Do:
  For each k = 1, . . . , i − 1 and if a_{ik} ≠ 0 Do:
    Compute a_{ik} := a_{ik}/a_{kk}
    Compute a_{i,*} := a_{i,*} − a_{ik} a_{k,*}
    Update the levels of a_{i,*}
  EndFor
  Replace any element in row i with lev(a_{ij}) > p by zero.
EndFor
The algorithm can be split into a symbolic and a numerical phase: the levels of fill are computed in the symbolic phase.
ILU with threshold: generic algorithms
ILU(p) factorizations are based on structure only, not on numerical values → potential problems for non-M-matrices.
One remedy: ILU with threshold (generic name: ILUT).
Two broad approaches:
First approach [derived from direct solvers]: use any (direct) sparse solver and incorporate a dropping strategy. [Munksgaard (?), Osterby & Zlatev, Sameh & Zlatev [90], D. Young et al. (Boeing), etc.]
Second approach [derived from the iterative solvers viewpoint]:
1. use a (row or column) version of the (i, k, j) version of GE;
2. apply a dropping strategy to the element l_{ik} as it is computed;
3. perform the linear combinations to get a_{i,*}; use a full row expansion of a_{i,*};
4. apply a dropping strategy to the fill-ins.
ILU with threshold: ILUT(k, ε)
Do the i, k, j version of Gaussian Elimination (GE).
During each i-th step in GE, discard any pivot or fill-in whose magnitude is below ε · ‖row_i(A)‖.
Once the i-th row of L + U (L-part + U-part) is computed, retain only the k largest elements in both parts.
Advantages: controlled fill-in; smaller memory overhead.
Easy to implement.
Can be made quite inexpensive.
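For experiments, MATLAB's built-in ilu provides a threshold-based variant that can stand in for ILUT (the droptol value below is only illustrative):

% Threshold-based ILU via MATLAB's built-in ilu (Crout variant)
setup.type = 'crout';  setup.droptol = 1e-3;
[L, U] = ilu(A, setup);
% preconditioner application: w = U \ (L \ v)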
[Figure: CPU time plot; details lost in extraction.]

ALGORITHM : GMRES with eigenvector deflation (start of the algorithm lost in extraction)
1. Arnoldi process: Compute r_0 = b − Ax_0, β = ‖r_0‖_2, and v_1 = r_0/β.
   For j = 1, . . . , m do
     If j ≤ m − p then z_j := v_j, else z_j := u_{j−(m−p)} (eigenvector)
     w = Az_j
     For i = 1, . . . , j, do: h_{i,j} := (w, v_i); w := w − h_{i,j} v_i
     h_{j+1,j} = ‖w‖_2, v_{j+1} = w/‖w‖_2.
   EndDo
   Define Z_m := [z_1, . . . , z_m] and H̄_m = {h_{i,j}}.
2. Form the approximate solution:
   Compute x_m = x_0 + Z_m y_m where y_m = argmin_y ‖βe_1 − H̄_m y‖_2.
3. Get the next eigenvector estimates u_1, . . . , u_p from H̄_m, V_m, Z_m, ...
4. Restart: If satisfied stop, else set x_0 ← x_m and goto 1.
Question 1: Which eigenvectors to add?
Answer: those associated with the smallest eigenvalues.
Question 2: How to compute the eigenvectors from F-GMRES?
Answer: use the relation
A Z_m = V_{m+1} H̄_m
Approximation: Au ≈ λu with u = Z_m y.
The Galerkin condition r ⊥ AZ_m gives the generalized eigenvalue problem
H̄_m^H H̄_m y = λ H̄_m^H V_{m+1}^H Z_m y
In addition, in GMRES: H̄_m = Q_m R_m, so H̄_m^H H̄_m = R_m^H R_m.
See: Morgan (1993).
An example: Shell problems
Can be very hard to solve!
A matrix of size N = 38,002, with Nz = 949,452 nonzero elements.
Actually symmetric; not exploited in the test.
Most simplistic methods fail.
ILUT(50,0) does not work, even with GMRES(80).
This is an example where a large subspace is required.
[Figure: convergence histories (log_10 of the residual norm versus number of GMRES steps) for the Shell problem, N = 38002, Nz = 949452, m = 80, with 0, 5, 10, 20, and 30 added eigenvectors.]
An example: Euler's equations on an unstructured mesh
Contributed by Larry Wigton from Boeing
Size = 3,864 (966 mesh points).
Nonzero elements: 238,252 (about 62 per row).
Difficult to solve in spite of its small size.
Results with ILUT(lfil, ε), tol = 10^{-8}:

lfil | Iterations | estimate of ‖(LU)^{-1}‖
100  |     -      | 0.19E+56
110  |     -      | 0.34E+9
120  |    30      | 0.70E+5
130  |    25      | 0.33E+7
140  |    20      | 0.17E+4
150  |    19      | 0.69E+4
Results with Block Jacobi Preconditioning
with Eigenvalue Deflation
Reduction in residual norm in 1200 GMRES steps with m = 49:

       | 4x4 block | 16x16 block
p = 0  |  0.8E-0   |  0.8E-0
p = 4  |  0.8E-0   |  4.0E-5
p = 8  |  1.2E-2   |  2.9E-7
p = 12 |  1.9E-2   |  3.8E-6
Theory (Hermitian case only)
Assume that A is SPD and let K = K_m + W, where W is such that
dist(AW, U) = ε,
with U = the exact invariant subspace associated with λ_1, . . . , λ_s.
Then the residual r obtained from the minimal residual projection process onto the augmented Krylov subspace K satisfies the inequality
‖r‖_2 ≤ ‖r_0‖_2 [ 1/T_m^2(γ) + ε^2 ]^{1/2},
where γ = (λ_n + λ_{s+1})/(λ_n − λ_{s+1}) and T_m = the Chebyshev polynomial of degree m of the 1st kind.
See [YS, SIMAX vol. 4, pp 43-66 (1997)] for other results.
Crout-based ILUT (ILUTC)
Terminology: Crout versions of LU compute the k-th row of U and the k-th column of L at the k-th step.
[Figure: computational pattern: black = part computed at step k, blue = part accessed.]
Main advantages:
1. Less expensive than ILUT (avoids sorting).
2. Allows better techniques for dropping.
References:
[1] M. Jones and P. Plassmann. An improved incomplete Cholesky factorization. ACM Transactions on Mathematical Software, 21:5-17, 1995.
[2] S. C. Eisenstat, M. H. Schultz, and A. H. Sherman. Algorithms and data structures for sparse symmetric Gaussian elimination. SIAM Journal on Scientific and Statistical Computing, 2:225-237, 1981.
[3] M. Bollhöfer. A robust ILU with pivoting based on monitoring the growth of the inverse factors. Linear Algebra and its Applications, 338(1-3):201-218, 2001.
[4] N. Li, Y. Saad, and E. Chow. Crout versions of ILU. MSI technical report, 2002.
Crout LU (dense case)
Go back to the delayed-update (IKJ) algorithm and observe: we could do both a column and a row version.
Left: U computed by rows. Right: L computed by columns.
Note: entries 1 : k − 1 in the k-th row of the figure need not be computed; they are available from the already computed columns of L. A similar observation holds for L (right).
ALGORITHM : 9 Crout LU Factorization (dense case)
1. For k = 1 : n Do:
2.   For i = 1 : k − 1 and if a_{ki} ≠ 0 Do:
3.     a_{k,k:n} = a_{k,k:n} − a_{ki} a_{i,k:n}
4.   EndDo
5.   For i = 1 : k − 1 and if a_{ik} ≠ 0 Do:
6.     a_{k+1:n,k} = a_{k+1:n,k} − a_{ik} a_{k+1:n,i}
7.   EndDo
8.   a_{ik} = a_{ik}/a_{kk} for i = k + 1, . . . , n
9. EndDo
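A direct MATLAB transcription of Algorithm 9 (dense case, no pivoting; assumes all pivots a_{kk} are nonzero):

% Crout LU: at step k, compute row k of U and column k of L (in place)
function A = crout_lu(A)
  n = size(A,1);
  for k = 1:n
    for i = 1:k-1                      % update row k of U
      if A(k,i) ~= 0
        A(k,k:n) = A(k,k:n) - A(k,i)*A(i,k:n);
      end
    end
    for i = 1:k-1                      % update column k of L
      if A(i,k) ~= 0
        A(k+1:n,k) = A(k+1:n,k) - A(i,k)*A(k+1:n,i);
      end
    end
    A(k+1:n,k) = A(k+1:n,k) / A(k,k);  % scale to obtain l_{ik}
  end
end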
Comparison with standard techniques
[Figure: preconditioning time vs. Lfil for RAEFSKY3.]
Preconditioning time vs. Lfil for ILUC (solid), row-ILUT (circles), column-ILUT (triangles), and row-ILUT with binary search trees (stars).
Inverse-based dropping strategies
Method developed mainly by Matthias Bollhöfer.
Observation: the norms of the inverses of the factors are more important than the errors in the factors themselves. If A = L̃Ũ + E, then
L̃^{-1} A Ũ^{-1} = I + L̃^{-1} E Ũ^{-1}
In many cases ‖L̃^{-1}‖ and ‖Ũ^{-1}‖ are *very* large → bad.
In contrast, assume A = LU is the exact LU factorization, and let
L̃^{-1} = L^{-1} + X, Ũ^{-1} = U^{-1} + Y. Then:
L̃^{-1} A Ũ^{-1} = (L^{-1} + X) A (U^{-1} + Y) = I + L^{-1}AY + XAU^{-1} + XAY.
X, Y small → preconditioned matrix close to the identity.
Let L_k = the matrix formed by the first k columns of L and the last n − k columns of the identity matrix.
Consider a term l_{jk} with j > k that is dropped at step k. The perturbed matrix L̃_k differs from L_k by l_{jk} e_j e_k^T. Note that L_k e_j = e_j, so
L̃_k = L_k − l_{jk} e_j e_k^T = L_k (I − l_{jk} e_j e_k^T)
→ L̃_k^{-1} = (I − l_{jk} e_j e_k^T)^{-1} L_k^{-1} = L_k^{-1} + l_{jk} e_j e_k^T L_k^{-1}.
→ The j-th row of the inverse of L_k is perturbed by l_{jk} times the k-th row of L_k^{-1}.
Need to limit the norm of this perturbing row, i.e.,
|l_{jk}| ‖e_k^T L_k^{-1}‖ should be small.
L^{-1} is not available → Bollhöfer's idea: use techniques for estimating condition numbers [see, e.g., Golub and Van Loan].
ALGORITHM : 10 Estimating the norms ‖e_k^T L^{-1}‖
1. Set x_1 = 1 and ν_i = 0, i = 1, . . . , n
2. For k = 2, . . . , n do:
3.   μ⁺ = 1 − ν_k; μ⁻ = −1 − ν_k
4.   if |μ⁺| > |μ⁻| then x_k = μ⁺ else x_k = μ⁻
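A MATLAB sketch of this kind of greedy ±1 estimator (the slide is truncated here, so the handling of the partial sums ν is an assumption, in the spirit of the condition estimators in Golub and Van Loan):

% Greedy growth estimate for a unit lower triangular L: choose b_k = +/-1
% so that |x_k| in Lx = b becomes large; large |x_k| signals a large
% k-th row/column of inv(L).
function x = growth_estimate(L)
  n = size(L,1);
  x = zeros(n,1);  x(1) = 1;
  for k = 2:n
    nu  = L(k,1:k-1) * x(1:k-1);   % nu_k, partial sum from earlier steps
    muP = 1 - nu;  muM = -1 - nu;  % the two candidates for x_k
    if abs(muP) > abs(muM), x(k) = muP; else, x(k) = muM; end
  end
end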