
Direct Solver of Unsymmetric Sparse Linear Equations

Ryuji NISHIDE, Kanada Laboratory, Department of Frontier Informatics, The University of Tokyo

Contents of presentation
- Introduction of sparse direct solver
- First experimental results of auto-tuning
- Next problem
- Modeling the sparse matrices
- Second experimental results
- Conclusion and future work

Overview of sparse direct solver (1)


Ax = b

Overview of sparse direct solver (2)

Two families of methods for solving Ax = b:
- Direct method (LU decomposition): load high, stability high
- Iterative method (Gauss-Seidel method, Jacobi method): load low, stability low; sometimes cannot find the solution

Typical direct solver (see the code sketch below):
- A is decomposed into the product of L and U
- The two triangular systems Ly = b and Ux = y are solved to obtain the solution x

Well-known algorithm families for sparse matrices (please refer to the previous presentation for details):
- Left-looking algorithm: supernodal method
- Right-looking algorithm: multifrontal method
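As a concrete illustration of this two-step pattern, here is a minimal sketch using SciPy's SuperLU wrapper (a generic library, not the solver developed in this work):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# A small unsymmetric sparse system Ax = b.
A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 3.0, 6.0]]))
b = np.array([1.0, 2.0, 3.0])

lu = splu(A)      # decompose A into triangular factors L and U (with pivoting)
x = lu.solve(b)   # solve Ly = b, then Ux = y, internally

print(np.allclose(A @ x, b))  # True
```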


Overview of sparse direct solver (3)


A sparse direct solver usually divides the solution process into 4 phases:
1. Ordering: in order to reduce the fill-in
2. Symbolic factorization: in order to determine the nonzero structure of L and U
3. LU decomposition: actual computation of L and U
4. Forward and backward substitution: solution of Ly = b and Ux = y
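A sketch of phase 4 alone, using hand-computed factors of a small example matrix (the numbers below are illustrative, not taken from the slides):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve_triangular

# Triangular factors of A = [[4,1,0],[2,5,1],[0,3,6]] from the LU phase.
L = csr_matrix(np.array([[1.0, 0.0, 0.0],
                         [0.5, 1.0, 0.0],
                         [0.0, 2.0 / 3.0, 1.0]]))
U = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [0.0, 4.5, 1.0],
                         [0.0, 0.0, 16.0 / 3.0]]))
b = np.array([1.0, 2.0, 3.0])

y = spsolve_triangular(L, b, lower=True)    # forward substitution:  Ly = b
x = spsolve_triangular(U, y, lower=False)   # backward substitution: Ux = y
print(x)  # [0.1875, 0.25, 0.375]
```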

Overview of sparse direct solver (4)

[Figure: dependence vs. independence among columns]

Difficult problems of sparse direct solver
- How to find the minimum fill-in ordering
- How to reduce the indirect addressing and cache misses
- Group together columns with the same nonzero structure (= supernode); see the sketch below
- Need to predict the nonzero structure of L and U beforehand in order to find the dependencies among columns
- Block size settings
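A toy sketch of the grouping idea. Real supernodes are contiguous column blocks of the predicted factor L, not of A, so the helper below is only illustrative:

```python
from collections import defaultdict
import numpy as np
from scipy.sparse import csc_matrix

def group_identical_columns(A):
    """Group columns of a CSC matrix by identical nonzero row structure."""
    groups = defaultdict(list)
    for j in range(A.shape[1]):
        rows = tuple(A.indices[A.indptr[j]:A.indptr[j + 1]])  # nonzero rows of column j
        groups[rows].append(j)
    return list(groups.values())

A = csc_matrix(np.array([[1, 2, 0],
                         [3, 4, 0],
                         [0, 0, 5]]))
print(group_identical_columns(A))  # [[0, 1], [2]]
```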

Modeling

Experimental result of auto-tuning (1)

- Conventional algorithms of sparse direct solver cannot automatically decide the blocking size: the optimum blocking size is decided by each user
- A user's bad setting means possible poor performance and an increase of the user's load
- Blocking parameters and algorithms need to be decided automatically [R.Nishide et al. 2002]
- Proposition of the auto-tuning method and evaluation of its effect

Experimental result of auto-tuning (2)

[Figure: blocked matrix-vector update; r: parameter of supernode, t: parameter of amalgamation]
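A minimal sketch of one possible auto-tuning loop over the blocking parameters r and t named above. The callable `factorize` is hypothetical; the actual method of [R.Nishide et al. 2002] is not reproduced here:

```python
import itertools
import time

def autotune(A, b, factorize, r_values, t_values):
    """Exhaustively time every (r, t) blocking setting and keep the fastest.

    `factorize` is a hypothetical callable: factorize(A, r=..., t=...) returns
    a solver with a .solve(b) method; r = supernode parameter, t = amalgamation
    parameter.
    """
    best = None
    for r, t in itertools.product(r_values, t_values):
        start = time.perf_counter()
        factorize(A, r=r, t=t).solve(b)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[0]:
            best = (elapsed, r, t)
    return best  # (best time, best r, best t)
```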


Experimental result of auto-tuning (3)

Difference between worst and best performance:
- At most 1.55 times faster; 1.15 times faster on average

Difference between default and best performance:
- At most 1.33 times faster; 1.05 times faster on average

This result implies the effect of the auto-tuning method. The difference among parameters becomes large if the search range is expanded, so a careful search is needed.

Experimental result of auto-tuning (4)

Relation between the structure of matrices and auto-tuning performance:
- Diagonal matrices such as Ex15 and Goodwin always show better performance
- Nondiagonal matrices such as Bayer10 and Orani678 may show better performance, but the results vary with the parameters

Modeling of the matrix structure:
- Performance can be predicted
- The optimum parameter can be found effectively

Next problem

We search for better algorithms for ordering, symbolic factorization, and LU factorization:
- Ordering algorithms are studied intensively
- In symbolic factorization, postordering algorithms are being developed, based on the column elimination tree

Column elimination tree (1)

- The elimination tree (etree) is defined using the Cholesky factor L
- Using the etree, we can find the dependencies among columns and determine the supernode size

Column elimination tree (2)

Etree(G):
  Set all nodes unmarked.
  Repeat the following step n times:
    Select an unmarked node v, add edges to G so that all of v's neighbors become pairwise adjacent (a clique), and then mark v.
End
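A sketch of computing the etree parent array directly from a symmetric sparse pattern. It uses the standard path-compression algorithm rather than playing the elimination game above; both are understood to yield the same tree:

```python
import numpy as np
from scipy.sparse import csc_matrix

def etree(A):
    """Parent array of the elimination tree of a symmetric pattern (-1 = root)."""
    n = A.shape[0]
    parent = np.full(n, -1)
    ancestor = np.full(n, -1)                 # path-compressed ancestor links
    for j in range(n):                        # columns in elimination order
        for i in A.indices[A.indptr[j]:A.indptr[j + 1]]:
            i = int(i)
            while i < j:                      # climb from row i toward the root
                next_anc = int(ancestor[i])
                ancestor[i] = j               # compress the path to j
                if next_anc == -1:            # reached a root: j adopts it
                    parent[i] = j
                    break
                i = next_anc
    return parent

# Tridiagonal pattern -> a simple chain: parent = [1, 2, -1].
A = csc_matrix(np.array([[1, 1, 0],
                         [1, 1, 1],
                         [0, 1, 1]]))
print(etree(A))
```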

[Figure: the graph G(A), the filled graph G+(A) produced by the elimination steps, and the elimination tree T(A), with nodes numbered 1 to 10]

Column elimination tree (3)

- The column elimination tree is the etree of A^T A, where A is unsymmetric
- The structure of L and U changes dynamically due to pivoting
- The column etree can overestimate the true column dependencies [M.Cosnard et al. 2000]
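Following the definition above, a sketch that computes the column etree by forming the pattern of A^T A and reusing `etree()` from the previous sketch. Forming A^T A explicitly is for illustration only; production solvers derive the column etree straight from A:

```python
from scipy.sparse import csc_matrix

def column_etree(A):
    """Column etree of an unsymmetric A: the etree of the pattern of A^T A.

    Only the stored nonzero pattern of A^T A matters; the numerical values
    are irrelevant to the tree.
    """
    S = (A.T @ A).tocsc()   # symmetric pattern of A^T A
    return etree(S)         # etree() as defined in the previous sketch
```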

Postordering (1)

- The storage and computational costs of A and PAP^T are equivalent if the column etrees of both matrices are isomorphic (P: an equivalent reordering)
- Column elimination forest (eforest)

[Figure: two labelings of the same 10-node elimination tree related by an equivalent reordering]
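One standard source of such equivalent reorderings (a fact from the sparse-factorization literature, assumed here rather than stated on the slide) is a postorder of the etree. A sketch of computing a postorder from the parent array produced above:

```python
def postorder(parent):
    """Depth-first postorder of a forest given as a parent array (-1 = root)."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p != -1:
            children[p].append(v)
    order = []

    def dfs(v):
        for c in children[v]:
            dfs(c)
        order.append(v)          # emit v after all of its descendants

    for v in range(n):
        if parent[v] == -1:      # visit every root of the forest
            dfs(v)
    return order                 # order[k] = old index of the k-th new column
```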

[Figure: nonzero patterns of an unsymmetric matrix A and of A^T A, with the corresponding trees T(A)]

Postordering (2)

Experiment of modeling the etree (1)

- In order to make the supernode size large, we use functions that evaluate data locality
- We compare the first experimental results with the structure of the etree
- Ex15, Orani678, Goodwin, and Bayer10 are selected

3 functions [D.B. Heras et al. 2001]:
  D1(x, y) = max_elems - a_elems(x, y)
  D2(x, y) = n_elems(x) + n_elems(y) - 2 * a_elems(x, y)
  D3(x, y) = n_lines(x) + n_lines(y) - 2 * a_lines(x, y)
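A sketch evaluating the three functions for a pair of rows of a CSR matrix. The interpretations below are assumptions: n_elems counts a row's nonzeros, a_elems counts nonzeros the two rows share in the same columns, n_lines/a_lines measure the same in cache lines of `line_elems` entries, and max_elems is the longest row of A:

```python
import numpy as np
from scipy.sparse import csr_matrix

def locality_distances(A, x, y, line_elems=8):
    """Locality distances D1, D2, D3 between rows x and y of a CSR matrix."""
    cols_x = set(A.indices[A.indptr[x]:A.indptr[x + 1]])
    cols_y = set(A.indices[A.indptr[y]:A.indptr[y + 1]])
    lines_x = {c // line_elems for c in cols_x}       # cache lines touched by x
    lines_y = {c // line_elems for c in cols_y}
    max_elems = max(A.indptr[i + 1] - A.indptr[i] for i in range(A.shape[0]))

    d1 = max_elems - len(cols_x & cols_y)
    d2 = len(cols_x) + len(cols_y) - 2 * len(cols_x & cols_y)
    d3 = len(lines_x) + len(lines_y) - 2 * len(lines_x & lines_y)
    return d1, d2, d3
```

Rows with small distances touch mostly the same columns (or cache lines), so merging them into one supernode-like block should cost little extra fill while improving locality.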

Experiment of modeling the etree (2)

[Figure: modeling results for Orani678]

Experiment of modeling the etree (3)

[Figure: modeling results for Bayer10]

We must search for the optimum parameter carefully.

Experiment of modeling the etree (4)

[Figure: modeling results for Goodwin]

Conclusion and future work

- We showed the effect of the auto-tuning method in the first experiment
- The auto-tuning effect may depend on the structure of A
- Using the column etree, we can estimate the supernode size and computational performance
- From the second experimental results, the search range can become narrow
- We can obtain a possible improvement with a better postordering
- Future work: total implementation of the original sparse direct solver
