
Direct Solver of Unsymmetric Sparse Linear Equations

Ryuji NISHIDE, Kanada Laboratory, Department of Frontier Informatics, The University of Tokyo

Contents of presentation
- Introduction of sparse direct solver
- First experimental results of auto-tuning
- Next problem
- Modeling the sparse matrices
- Second experimental results
- Conclusion and future work

Overview of sparse direct solver (1)


Ax = b

Overview of sparse direct solver (2)

Two families of methods for solving Ax = b:
- Direct method (LU decomposition): load high, stability high
- Iterative method (Gauss-Seidel method, Jacobi method): load low, stability low; sometimes cannot find the solution

Typical direct solver (see the code sketch below):
- A is decomposed into the product of L and U
- The two triangular systems Ly = b and Ux = y are solved to obtain the solution x

Well-known algorithm families for sparse matrices (please refer to the previous presentation for details):
- Left-looking algorithm: supernodal method
- Right-looking algorithm: multifrontal method
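As a concrete illustration of this two-step pattern, here is a minimal sketch using SciPy's SuperLU wrapper (a generic library, not the solver developed in this work):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

# A small unsymmetric sparse system Ax = b.
A = csc_matrix(np.array([[4.0, 1.0, 0.0],
                         [2.0, 5.0, 1.0],
                         [0.0, 3.0, 6.0]]))
b = np.array([1.0, 2.0, 3.0])

lu = splu(A)      # decompose A into triangular factors L and U (with pivoting)
x = lu.solve(b)   # solve Ly = b, then Ux = y, internally

print(np.allclose(A @ x, b))  # True
```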


Overview of sparse direct solver (3)


A sparse direct solver usually divides the solution process into 4 phases:
1. Ordering: in order to reduce the fill-in
2. Symbolic factorization: in order to determine the nonzero structure of L and U
3. LU decomposition: actual computation of L and U
4. Forward and backward substitution: solution of Ly = b and Ux = y
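A sketch of phase 4 alone, using hand-computed factors of a small example matrix (the numbers below are illustrative, not taken from the slides):

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve_triangular

# Triangular factors of A = [[4,1,0],[2,5,1],[0,3,6]] from the LU phase.
L = csr_matrix(np.array([[1.0, 0.0, 0.0],
                         [0.5, 1.0, 0.0],
                         [0.0, 2.0 / 3.0, 1.0]]))
U = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [0.0, 4.5, 1.0],
                         [0.0, 0.0, 16.0 / 3.0]]))
b = np.array([1.0, 2.0, 3.0])

y = spsolve_triangular(L, b, lower=True)    # forward substitution:  Ly = b
x = spsolve_triangular(U, y, lower=False)   # backward substitution: Ux = y
print(x)  # [0.1875, 0.25, 0.375]
```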

Overview of sparse direct solver (4)

[Figure: dependence vs. independence among columns]

Difficult problems of sparse direct solver
- How to find the minimum fill-in ordering
- How to reduce the indirect addressing and cache misses
- Group together columns with the same nonzero structure (= supernode); see the sketch below
- Need to predict the nonzero structure of L and U beforehand in order to find the dependencies among columns
- Block size settings
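A toy sketch of the grouping idea. Real supernodes are contiguous column blocks of the predicted factor L, not of A, so the helper below is only illustrative:

```python
from collections import defaultdict
import numpy as np
from scipy.sparse import csc_matrix

def group_identical_columns(A):
    """Group columns of a CSC matrix by identical nonzero row structure."""
    groups = defaultdict(list)
    for j in range(A.shape[1]):
        rows = tuple(A.indices[A.indptr[j]:A.indptr[j + 1]])  # nonzero rows of column j
        groups[rows].append(j)
    return list(groups.values())

A = csc_matrix(np.array([[1, 2, 0],
                         [3, 4, 0],
                         [0, 0, 5]]))
print(group_identical_columns(A))  # [[0, 1], [2]]
```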

Modeling

Experimental result of auto-tuning (1)

- Conventional algorithms of sparse direct solver cannot automatically decide the blocking size: the optimum blocking size is decided by each user
- A user's bad setting means possible poor performance and an increase of the user's load
- Blocking parameters and algorithms need to be decided automatically [R.Nishide et al. 2002]
- Proposition of the auto-tuning method and evaluation of its effect

Experimental result of auto-tuning (2)

[Figure: blocked matrix-vector update; r: parameter of supernode, t: parameter of amalgamation]
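A minimal sketch of one possible auto-tuning loop over the blocking parameters r and t named above. The callable `factorize` is hypothetical; the actual method of [R.Nishide et al. 2002] is not reproduced here:

```python
import itertools
import time

def autotune(A, b, factorize, r_values, t_values):
    """Exhaustively time every (r, t) blocking setting and keep the fastest.

    `factorize` is a hypothetical callable: factorize(A, r=..., t=...) returns
    a solver with a .solve(b) method; r = supernode parameter, t = amalgamation
    parameter.
    """
    best = None
    for r, t in itertools.product(r_values, t_values):
        start = time.perf_counter()
        factorize(A, r=r, t=t).solve(b)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[0]:
            best = (elapsed, r, t)
    return best  # (best time, best r, best t)
```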


Experimental result of auto-tuning (3)

Difference between worst and best performance:
- At most 1.55 times faster; 1.15 times faster on average

Difference between default and best performance:
- At most 1.33 times faster; 1.05 times faster on average

This result implies the effect of the auto-tuning method. The difference among parameters becomes large if the search range is expanded, so a careful search is needed.

Experimental result of auto-tuning (4)

Relation between the structure of matrices and auto-tuning performance:
- Diagonal matrices such as Ex15 and Goodwin always show better performance
- Nondiagonal matrices such as Bayer10 and Orani678 may show better performance, but the results vary with the parameters

Modeling of the matrix structure:
- Performance can be predicted
- The optimum parameter can be found effectively

Next problem

We search for better algorithms for ordering, symbolic factorization, and LU factorization:
- Ordering algorithms are studied intensively
- In symbolic factorization, postordering algorithms are being developed, based on the column elimination tree

Column elimination tree (1)

- The elimination tree (etree) is defined using the Cholesky factor L
- Using the etree, we can find the dependencies among columns and determine the supernode size

Column elimination tree (2)

Etree(G):
  Set all nodes unmarked.
  Repeat the following step n times:
    Select an unmarked node v, add edges to G so that all of v's neighbors become pairwise adjacent (a clique), and then mark v.
End
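A sketch of computing the etree parent array directly from a symmetric sparse pattern. It uses the standard path-compression algorithm rather than playing the elimination game above; both are understood to yield the same tree:

```python
import numpy as np
from scipy.sparse import csc_matrix

def etree(A):
    """Parent array of the elimination tree of a symmetric pattern (-1 = root)."""
    n = A.shape[0]
    parent = np.full(n, -1)
    ancestor = np.full(n, -1)                 # path-compressed ancestor links
    for j in range(n):                        # columns in elimination order
        for i in A.indices[A.indptr[j]:A.indptr[j + 1]]:
            i = int(i)
            while i < j:                      # climb from row i toward the root
                next_anc = int(ancestor[i])
                ancestor[i] = j               # compress the path to j
                if next_anc == -1:            # reached a root: j adopts it
                    parent[i] = j
                    break
                i = next_anc
    return parent

# Tridiagonal pattern -> a simple chain: parent = [1, 2, -1].
A = csc_matrix(np.array([[1, 1, 0],
                         [1, 1, 1],
                         [0, 1, 1]]))
print(etree(A))
```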

[Figure: the graph G(A), the filled graph G+(A) produced by the elimination steps, and the elimination tree T(A), with nodes numbered 1 to 10]

Column elimination tree (3)

- The column elimination tree is the etree of A^T A, where A is unsymmetric
- The structure of L and U changes dynamically due to pivoting
- The column etree can overestimate the true column dependencies [M.Cosnard et al. 2000]
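Following the definition above, a sketch that computes the column etree by forming the pattern of A^T A and reusing `etree()` from the previous sketch. Forming A^T A explicitly is for illustration only; production solvers derive the column etree straight from A:

```python
from scipy.sparse import csc_matrix

def column_etree(A):
    """Column etree of an unsymmetric A: the etree of the pattern of A^T A.

    Only the stored nonzero pattern of A^T A matters; the numerical values
    are irrelevant to the tree.
    """
    S = (A.T @ A).tocsc()   # symmetric pattern of A^T A
    return etree(S)         # etree() as defined in the previous sketch
```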

Postordering (1)

- The storage and computational costs of A and PAP^T are equivalent if the column etrees of both matrices are isomorphic (P: an equivalent reordering)
- Column elimination forest (eforest)

[Figure: two labelings of the same 10-node elimination tree related by an equivalent reordering]
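One standard source of such equivalent reorderings (a fact from the sparse-factorization literature, assumed here rather than stated on the slide) is a postorder of the etree. A sketch of computing a postorder from the parent array produced above:

```python
def postorder(parent):
    """Depth-first postorder of a forest given as a parent array (-1 = root)."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if p != -1:
            children[p].append(v)
    order = []

    def dfs(v):
        for c in children[v]:
            dfs(c)
        order.append(v)          # emit v after all of its descendants

    for v in range(n):
        if parent[v] == -1:      # visit every root of the forest
            dfs(v)
    return order                 # order[k] = old index of the k-th new column
```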

[Figure: nonzero patterns of an unsymmetric matrix A and of A^T A, with the corresponding trees T(A)]

Postordering (2)

Experiment of modeling the etree (1)

- In order to make the supernode size large, we use functions that evaluate data locality
- We compare the first experimental results with the structure of the etree
- Ex15, Orani678, Goodwin, and Bayer10 are selected

3 functions [D.B. Heras et al. 2001]:
  D1(x, y) = max_elems - a_elems(x, y)
  D2(x, y) = n_elems(x) + n_elems(y) - 2 * a_elems(x, y)
  D3(x, y) = n_lines(x) + n_lines(y) - 2 * a_lines(x, y)
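A sketch evaluating the three functions for a pair of rows of a CSR matrix. The interpretations below are assumptions: n_elems counts a row's nonzeros, a_elems counts nonzeros the two rows share in the same columns, n_lines/a_lines measure the same in cache lines of `line_elems` entries, and max_elems is the longest row of A:

```python
import numpy as np
from scipy.sparse import csr_matrix

def locality_distances(A, x, y, line_elems=8):
    """Locality distances D1, D2, D3 between rows x and y of a CSR matrix."""
    cols_x = set(A.indices[A.indptr[x]:A.indptr[x + 1]])
    cols_y = set(A.indices[A.indptr[y]:A.indptr[y + 1]])
    lines_x = {c // line_elems for c in cols_x}       # cache lines touched by x
    lines_y = {c // line_elems for c in cols_y}
    max_elems = max(A.indptr[i + 1] - A.indptr[i] for i in range(A.shape[0]))

    d1 = max_elems - len(cols_x & cols_y)
    d2 = len(cols_x) + len(cols_y) - 2 * len(cols_x & cols_y)
    d3 = len(lines_x) + len(lines_y) - 2 * len(lines_x & lines_y)
    return d1, d2, d3
```

Rows with small distances touch mostly the same columns (or cache lines), so merging them into one supernode-like block should cost little extra fill while improving locality.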

Experiment of modeling the etree (2)

[Figure: modeling results for Orani678]

Experiment of modeling the etree (3)

[Figure: modeling results for Bayer10]

We must search for the optimum parameter carefully.

Experiment of modeling the etree (4)

[Figure: modeling results for Goodwin]

Conclusion and future work

- We showed the effect of the auto-tuning method in the first experiment
- The auto-tuning effect may depend on the structure of A
- Using the column etree, we can estimate the supernode size and computational performance
- From the second experimental results, the search range can become narrow
- We can obtain a possible improvement with a better postordering
- Future work: total implementation of the original sparse direct solver
