Ryuji NISHIDE Kanada laboratory Department of Frontier Informatics, The University of Tokyo
Introduction to sparse direct solvers
First experimental results of auto-tuning
Next problem
Direct method
Iterative method
Load:Low Stability:Low
Typical direct solver
A is decomposed into the product of L and U.
The two triangular systems Ly=b and Ux=y are then solved to obtain the solution x.
For sparse matrices: 3 famous algorithms
Left-looking algorithm: supernodal method
Right-looking algorithm: multi-frontal method
Please refer to the previous presentation for details.
LU decomposition: L * U = A
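The decomposition and the two triangular solves can be sketched on a small dense matrix. This is only an illustration: it stores the matrix densely, performs no pivoting (it assumes every pivot is nonzero), and the function names are hypothetical, not taken from the presentation.

```python
def lu_decompose(a):
    """Doolittle LU without pivoting; returns (L, U) with a unit-diagonal L."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(v) for v in row] for row in a]  # work on a copy of A
    for k in range(n):
        for i in range(k + 1, n):
            factor = upper[i][k] / upper[k][k]  # assumes a nonzero pivot
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def forward_substitution(lower, b):
    """Solve Ly = b for a lower-triangular L."""
    y = []
    for i, row in enumerate(lower):
        y.append((b[i] - sum(row[j] * y[j] for j in range(i))) / row[i])
    return y


def back_substitution(upper, y):
    """Solve Ux = y for an upper-triangular U."""
    n = len(upper)
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(upper[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - tail) / upper[i][i]
    return x


def solve(a, b):
    """Solve Ax = b through A = LU, then Ly = b, then Ux = y."""
    lower, upper = lu_decompose(a)
    return back_substitution(upper, forward_substitution(lower, b))


x = solve([[4, 3], [6, 3]], [10, 12])
print(x)
```

A sparse solver performs the same steps but stores and touches only the nonzero entries, which is where ordering and symbolic factorization come in.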
Sparse direct solvers usually divide the solution process into 4 phases
Ordering: in order to reduce the fill-in
Symbolic factorization: in order to determine the nonzero structure of L and U
LU decomposition: actual computation of L and U
Solution of Ly=b and Ux=y
How to find the minimum fill-in ordering
How to reduce indirect addressing and cache misses
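Why the ordering matters for fill-in can be seen on a tiny "arrow" matrix whose graph is a star. The sketch below counts fill edges by symbolic elimination for two orderings. The graph, the function name, and the dictionary-based representation are illustrative assumptions.

```python
def count_fill_in(adjacency, order):
    """Eliminate nodes in the given order and count the fill edges created."""
    adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
    fill = 0
    for v in order:
        nbrs = sorted(adj.pop(v))  # remaining (not yet eliminated) neighbours
        for u in nbrs:
            adj[u].discard(v)
        # Eliminating v makes its remaining neighbours pairwise adjacent.
        for i, u in enumerate(nbrs):
            for w in nbrs[i + 1:]:
                if w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
                    fill += 1
    return fill


# Star graph: node 0 is coupled to every other node (arrow-shaped matrix).
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}

hub_first = count_fill_in(star, [0, 1, 2, 3, 4])  # 6 fill edges
hub_last = count_fill_in(star, [1, 2, 3, 4, 0])   # 0 fill edges
print(hub_first, hub_last)
```

Eliminating the hub first turns the four leaves into a clique, while eliminating it last creates no fill at all. Finding the ordering with minimum fill-in in general is much harder than this toy case suggests, so practical solvers rely on heuristics.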
Group together columns with the same nonzero structure (= supernode)
The nonzero structure of L and U must be predicted beforehand in order to find the dependencies among columns
Block size settings
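A minimal sketch of supernode detection, assuming the column structures of L are already known from symbolic factorization. Here `col_struct[j]` is a hypothetical input holding the row indices of the nonzeros in column j (diagonal included), and `max_size` is an illustrative stand-in for a block size setting.

```python
def find_supernodes(col_struct, max_size=None):
    """Group consecutive columns of L that share the same nonzero structure.

    Column j joins the supernode of column j-1 when the structure of
    column j-1, minus its diagonal entry, equals the structure of column j.
    Returns a list of (first_column, last_column) pairs.
    """
    n = len(col_struct)
    supernodes = []
    start = 0
    for j in range(1, n + 1):
        extends = (
            j < n
            and col_struct[j - 1] - {j - 1} == col_struct[j]
            and (max_size is None or j - start < max_size)
        )
        if not extends:
            supernodes.append((start, j - 1))
            start = j
    return supernodes


# Structures of the columns of a 5x5 lower-triangular factor.
cols = [{0, 1, 2, 4}, {1, 2, 4}, {2, 4}, {3, 4}, {4}]
print(find_supernodes(cols))              # [(0, 2), (3, 4)]
print(find_supernodes(cols, max_size=2))  # [(0, 1), (2, 2), (3, 4)]
```

Capping the supernode width splits the first supernode into two blocks, which is the kind of parameter a tuning step would choose.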
Modeling
Conventional sparse direct solver algorithms cannot decide the blocking size automatically.
The optimum blocking size has to be chosen by each user.
Update
Blocking parameters and algorithms need to be decided automatically [R. Nishide et al. 2002]
Diagonal matrices such as Ex15 and Goodwin always show better performance.
Nondiagonal matrices such as Bayer10 and Orani678 may show better performance, but the result varies with the parameters.
This result indicates that the auto-tuning method is effective.
The difference among parameters becomes larger if the search range is expanded.
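The simplest form of such a parameter search is to time the kernel for every candidate blocking size and keep the fastest. The sketch below shows that exhaustive search only. It is not the tuning method used in the experiments, and `tune_block_size`, `blocked_sum`, and the candidate list are hypothetical names and values chosen for illustration.

```python
import time


def tune_block_size(kernel, candidates, repeats=3, timer=time.perf_counter):
    """Time kernel(w) for each candidate w and return (best_w, timings).

    timings maps each w to its best time over `repeats` runs.
    """
    timings = {}
    for w in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = timer()
            kernel(w)
            best = min(best, timer() - start)
        timings[w] = best
    best_w = min(timings, key=timings.get)
    return best_w, timings


data = list(range(10_000))


def blocked_sum(w):
    # Toy stand-in for a blocked numerical kernel: sum the data in chunks of w.
    return sum(sum(data[i:i + w]) for i in range(0, len(data), w))


best_w, timings = tune_block_size(blocked_sum, [8, 16, 32, 64])
print("selected block size:", best_w)
```

Widening the candidate list makes the search more expensive, which is one reason a performance model that predicts good parameters is attractive.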
Next problem
Ordering algorithms are being studied intensively.
In symbolic factorization, postordering algorithms are being developed.
Column elimination tree
Modeling
Performance can be predicted.
The optimum parameter can be found efficiently.
[Figure: Goodwin, Ex15]
The elimination tree (etree) is defined using the Cholesky factor L.
Using the etree, we can find the dependencies among columns and determine the supernode size.
Etree(G)
  Set all nodes unmarked
  Iterate the following step n times:
    Select an unmarked node v, add edges to G so that all of v's unmarked neighbors are adjacent (clique), and then mark v
End Etree
[Figure: graph G(A), filled graph G+(A) with the clique formed by an elimination step, and elimination tree T(A), drawn on nodes 1 to 10]
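The elimination procedure and the resulting tree can be sketched as follows. The sketch assumes that nodes are numbered 0 to n-1 and eliminated in that order, so the "unmarked" neighbors of v are simply its neighbors with a larger number. The parent of v is taken as its smallest higher-numbered neighbor in the filled graph, which is the usual etree definition. The function name and the edge-list input are illustrative.

```python
def elimination_tree(n, edges):
    """Eliminate nodes 0..n-1 in order; return (fill_edges, parent).

    parent[v] is the smallest higher-numbered neighbour of v in the
    filled graph G+, or None when v is a root.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    fill_edges = []
    parent = [None] * n
    for v in range(n):
        later = sorted(u for u in adj[v] if u > v)  # unmarked neighbours of v
        # Make the unmarked neighbours a clique, recording new (fill) edges.
        for i, u in enumerate(later):
            for w in later[i + 1:]:
                if w not in adj[u]:
                    adj[u].add(w)
                    adj[w].add(u)
                    fill_edges.append((u, w))
        if later:
            parent[v] = later[0]
    return fill_edges, parent


# A 4-cycle: 0-1, 0-2, 1-3, 2-3.
fill, parent = elimination_tree(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(fill)    # [(1, 2)]
print(parent)  # [1, 2, 3, None]
```

Eliminating node 0 adds the fill edge (1, 2), and the tree becomes the chain 0, 1, 2, 3: each column depends on the one before it.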
Postordering(1)
The column elimination tree is the etree of A^T A, where A is unsymmetric.
The structure of L and U changes dynamically due to pivoting.
The column etree can overestimate the true column dependencies [M. Cosnard et al. 2000]
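A minimal sketch of a column etree computation, assuming the pattern of A is given as one set of column indices per row. It first forms the pattern of A^T A explicitly, which is affordable only for small examples, and then builds the tree with Liu's ancestor-compression algorithm. The function names and the input format are illustrative choices, not taken from the presentation.

```python
def ata_pattern(rows, n):
    """Adjacency of A^T A: columns j and k are adjacent if some row of A
    has nonzeros in both of them."""
    adj = [set() for _ in range(n)]
    for cols in rows:
        for j in cols:
            adj[j].update(c for c in cols if c != j)
    return adj


def etree_liu(adj):
    """Elimination tree of a symmetric pattern (Liu's algorithm with
    path compression). parent[v] is None for a root."""
    n = len(adj)
    parent = [None] * n
    ancestor = [None] * n  # compressed shortcut towards the current root
    for j in range(n):
        for i in sorted(adj[j]):
            if i >= j:
                continue
            # Climb from i to the root of its subtree, redirecting to j.
            while i is not None and i != j:
                nxt = ancestor[i]
                ancestor[i] = j
                if nxt is None:
                    parent[i] = j  # i was a root: j becomes its parent
                i = nxt
    return parent


# Pattern of a 4x4 unsymmetric A, one set of column indices per row.
rows = [{0, 2}, {1, 2}, {2, 3}, {0, 3}]
column_etree = etree_liu(ata_pattern(rows, 4))
print(column_etree)  # [2, 2, 3, None]
```

Because the pattern of A^T A ignores which pivots are actually chosen, the tree bounds the dependencies for any row pivoting, and that is also why it may overestimate them.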
The storage and computational costs of A and PAP^T are equivalent if the column etrees of both matrices are isomorphic.
P: equivalent reordering
[Figure: nonzero patterns of A and A^T A, and two drawings of T(A) with nodes numbered 1 to 10]
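A postordering renumbers the nodes so that every subtree of the etree receives consecutive labels, which leaves the tree isomorphic to the original. The sketch below computes one postorder by depth-first search and relabels the parent array. The helper names and the example tree are hypothetical.

```python
def postorder(parent):
    """Return the nodes of the forest in depth-first postorder."""
    n = len(parent)
    children = [[] for _ in range(n)]
    roots = []
    for v, p in enumerate(parent):
        if p is None:
            roots.append(v)
        else:
            children[p].append(v)
    order = []
    for root in roots:
        stack = [(root, False)]
        while stack:
            v, expanded = stack.pop()
            if expanded:
                order.append(v)  # all children of v are already listed
            else:
                stack.append((v, True))
                for c in reversed(children[v]):
                    stack.append((c, False))
    return order


def relabel(parent, order):
    """Parent array of the same tree after renumbering nodes by `order`."""
    new_label = {old: new for new, old in enumerate(order)}
    new_parent = [None] * len(parent)
    for old, p in enumerate(parent):
        if p is not None:
            new_parent[new_label[old]] = new_label[p]
    return new_parent


parent = [2, 4, 4, 4, None]  # the subtree of node 2 is {0, 2}: not contiguous
order = postorder(parent)
print(order)                   # [1, 0, 2, 3, 4]
print(relabel(parent, order))  # [4, 2, 4, 4, None]
```

After relabeling, the subtree rooted at node 2 consists of nodes 1 and 2, so its columns sit next to each other, which is what makes larger supernodes possible.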
Postordering(2)
To make supernodes larger, we will use functions that evaluate data locality.
[D.B. Heras et al. 2001] 3 functions
$D_1(x, y) = max_{elems} - a_{elems}(x, y)$
$D_2(x, y) = n_{elems}(x) + n_{elems}(y) - 2 \, a_{elems}(x, y)$
$D_3(x, y) = n_{lines}(x) + n_{lines}(y) - 2 \, a_{lines}(x, y)$
Depth-first search
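A minimal sketch of the three distance functions, read as D1 = max_elems - a_elems(x, y), D2 = n_elems(x) + n_elems(y) - 2 * a_elems(x, y), and D3 = n_lines(x) + n_lines(y) - 2 * a_lines(x, y). The interpretation is an assumption: each column is a set of row indices, a_elems counts the positions where both columns have a nonzero, and a "line" is a group of `line_size` consecutive positions, as in a cache line. Function and parameter names are hypothetical.

```python
def d1(x, y, max_elems):
    """max_elems minus the number of positions shared by x and y."""
    return max_elems - len(set(x) & set(y))


def d2(x, y):
    """Number of positions held by exactly one of the two columns."""
    x, y = set(x), set(y)
    return len(x) + len(y) - 2 * len(x & y)


def d3(x, y, line_size):
    """Same as d2, but counted in lines of `line_size` positions."""
    lines_x = {i // line_size for i in x}
    lines_y = {i // line_size for i in y}
    return len(lines_x) + len(lines_y) - 2 * len(lines_x & lines_y)


col_a = {0, 1, 4}
col_b = {1, 4, 5}
print(d1(col_a, col_b, max_elems=3))  # 1
print(d2(col_a, col_b))               # 2
print(d3(col_a, col_b, line_size=4))  # 0
```

Under this reading a small distance means that two columns touch nearly the same positions, so placing them next to each other should improve locality.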
[Figure: Goodwin]
A better postordering may yield further improvement.
Full implementation of the original sparse direct solver