
Computing the diagonal of the inverse of a sparse matrix

Yousef Saad
Department of Computer Science and Engineering, University of Minnesota

Sparse Days, Toulouse, June 15, 2010

Motivation: DMFT
Dynamic Mean Field Theory: quantum-mechanical studies of highly correlated particles. The equation to be solved (repeatedly) is Dyson's equation

G(ω) = [(ω + μ)I − V − Σ(ω) + T]⁻¹

where ω (frequency) and μ (chemical potential) are real; V = trap potential, a real diagonal; Σ(ω) = local self-energy, a complex diagonal; T is the hopping matrix (sparse, real).


We are interested only in the diagonal of G(ω). In addition, the equation must be solved self-consistently, and this must be done for many ω's. Related approach: the Non-Equilibrium Green's Function (NEGF) approach, used to model nanoscale transistors. There are many new applications of the diagonal of the inverse [and related problems]. A few examples follow.


Introduction: A few examples


Problem 1: Compute Tr[inv(A)], the trace of the inverse. Arises in cross-validation: minimize

‖(I − A(λ))g‖² / [Tr(I − A(λ))]²,  with A(λ) ≡ D(DᵀD + λLLᵀ)⁻¹Dᵀ

where D == blurring operator and L is the regularization operator. In [Hutchinson '90], Tr[inv(A)] is estimated stochastically. Many authors have addressed this problem.
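As a concrete illustration (mine, not from the talk): a minimal numpy sketch of Hutchinson's stochastic estimator for Tr(inv(A)), using Rademacher probe vectors and one linear solve per sample. The function name and the test matrix are my own.

```python
import numpy as np

def hutchinson_trace_inv(A, s=500, seed=0):
    """Hutchinson's estimator: Tr(inv(A)) ~ (1/s) sum_j v_j^T A^{-1} v_j
    over s random Rademacher (+/-1) probe vectors v_j.  Each sample
    costs one linear solve with A; no inverse is ever formed."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(s):
        v = rng.choice([-1.0, 1.0], size=n)
        total += v @ np.linalg.solve(A, v)
    return total / s

# Small diagonally dominant test matrix
n = 50
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
exact = np.trace(np.linalg.inv(A))
approx = hutchinson_trace_inv(A)
```

In practice the dense solve would be replaced by a sparse direct or iterative solver; the estimator itself only needs matrix-vector solves.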


Problem 2: Compute Tr[f(A)], f a certain function. Arises in many applications in physics. Example: stochastic estimation of Tr(f(A)) is used extensively by quantum chemists to estimate the Density of States, see [H. Röder, R. N. Silver, D. A. Drabold, J. J. Dong, Phys. Rev. B 55, 15382 (1997)].


Problem 3: Compute diag[inv(A)], the diagonal of the inverse. Arises in Dynamic Mean Field Theory [DMFT, the motivation for this work]. In DMFT, we seek the diagonal of a Green's function which solves (self-consistently) Dyson's equation [see J. Freericks 2005]. Related approach: the Non-Equilibrium Green's Function (NEGF) approach used to model nanoscale transistors. In uncertainty quantification, the diagonal of the inverse of a covariance matrix is needed [Bekas, Curioni, Fedulova '09].


Problem 4: Compute diag[f(A)]; f = a certain function. Arises in any density-matrix approach in quantum modeling, for example Density Functional Theory. Here f = the Fermi-Dirac operator:

f(ε) = 1 / (1 + exp((ε − μ)/(k_B T)))

Note: as T → 0, f becomes a step function.

Note: if f is approximated by a rational function, then diag[f(A)] ≈ a linear combination of terms of the form diag[(A − σᵢI)⁻¹]. Linear-scaling methods based on approximating f(H) and diag(f(H)) avoid diagonalization of H.
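An illustrative sketch (mine, not from the talk) of what diag[f(A)] is, computed by brute-force diagonalization of a small tight-binding Hamiltonian. This O(n³) route is exactly what linear-scaling methods try to avoid; it serves only to define the target quantity.

```python
import numpy as np

def fermi_dirac(eps, mu=0.0, kT=0.05):
    """Fermi-Dirac function f(eps) = 1 / (1 + exp((eps - mu)/kT))."""
    return 1.0 / (1.0 + np.exp((eps - mu) / kT))

def diag_f(A, mu=0.0, kT=0.05):
    """diag[f(A)] for a Hermitian A by full diagonalization:
    f(A) = Q f(Lambda) Q^T, so diag entries are sums Q_ik^2 f(lam_k)."""
    lam, Q = np.linalg.eigh(A)
    return np.einsum('ik,k,ik->i', Q, fermi_dirac(lam, mu, kT), Q)

# Tight-binding chain: spectrum lies in (-2, 2), symmetric about 0,
# so at mu = 0 the "occupations" sum to exactly n/2.
n = 20
A = -np.eye(n, k=1) - np.eye(n, k=-1)
d = diag_f(A, mu=0.0, kT=0.05)
```

As kT shrinks, f approaches the step function and d approaches the diagonal of the zero-temperature density matrix.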


Methods based on the sparse LU factorization

Basic reference: K. Takahashi, J. Fagan, and M.-S. Chin, "Formation of a sparse bus impedance matrix and its application to short circuit study," in Proc. of the Eighth Inst. PICA Conf., Minneapolis, MN, IEEE Power Engineering Soc., 1973, pp. 63-69. Described in [Duff, Erisman, Reid, p. 273]. Algorithm used by Erisman and Tinney [Num. Math. 1975].


Main idea. If A = LDU and B = A⁻¹, then

B = U⁻¹D⁻¹ + B(I − L);  B = D⁻¹L⁻¹ + (I − U)B.

Not all entries are needed to compute selected entries of B. For example, consider the lower part, i > j; use the first equation:

b_ij = (B(I − L))_ij = − Σ_{k>j} b_ik l_kj

We only need the entries b_ik of row i for which l_kj ≠ 0, k > j. The entries of B belonging to the pattern of (L, U)ᵀ can be extracted without computing any other entries outside the pattern.
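A minimal numerical check (mine) of the identity B = U⁻¹D⁻¹ + B(I − L), using a hand-rolled no-pivoting LDU factorization on a small diagonally dominant matrix:

```python
import numpy as np

def ldu(A):
    """LDU factorization without pivoting (adequate for a diagonally
    dominant A): L unit lower, D diagonal, U unit upper triangular."""
    n = A.shape[0]
    M = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = M[i, k] / M[k, k]
            M[i, k:] -= L[i, k] * M[k, k:]
    D = np.diag(np.diag(M))
    U = np.diag(1.0 / np.diag(M)) @ np.triu(M)  # rescale rows to unit diagonal
    return L, D, U

n = 8
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L, D, U = ldu(A)
B = np.linalg.inv(A)
# The Takahashi/Erisman-Tinney identity: B = U^{-1} D^{-1} + B (I - L)
rhs = np.linalg.inv(U) @ np.linalg.inv(D) + B @ (np.eye(n) - L)
```

Since U⁻¹D⁻¹ is upper triangular and I − L is strictly lower triangular, the lower part of B is determined by already-known entries of B against the sparse column of L, which is what the selected-inversion algorithms exploit.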


More recently exploited in a different form in: L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, W. E, "SelInv: An algorithm for selected inversion of a sparse symmetric matrix," Tech. Report, Princeton Univ. An algorithm based on a form of nested dissection is described in Li, Ahmed, Klimeck, Darve [2008]. A close relative of this technique is presented in: L. Lin, J. Lu, L. Ying, R. Car, W. E, "Fast algorithm for extracting the diagonal of the inverse matrix with application to the electronic structure analysis of metallic systems," Comm. Math. Sci., 2009. Difficulty: 3-D problems.


Stochastic Estimator

Notation:
A = original matrix, B = A⁻¹. 𝒟(B) = diag(B) [matlab notation]
D(B) = diagonal matrix with diagonal 𝒟(B)
⊙ and ⊘: elementwise multiplication and division of vectors
{v_j}: sequence of s random vectors

Result:

𝒟(B) ≈ (Σ_{j=1}^{s} v_j ⊙ B v_j) ⊘ (Σ_{j=1}^{s} v_j ⊙ v_j)

Refs: C. Bekas, E. Kokiopoulou & YS ('05); recent: C. Bekas, A. Curioni, I. Fedulova ('09).

Let Vs = [v1, v2, . . . , vs]. Then, an alternative expression:

D(B) ≈ D(BVsVsᵀ) D⁻¹(VsVsᵀ)

Question: when is this result exact?

Main Proposition. Let Vs ∈ ℝ^{n×s} with rows {v_{j,:}}, and B ∈ ℂ^{n×n} with entries {b_jk}. Assume that ⟨v_{j,:}, v_{k,:}⟩ = 0 for all j ≠ k such that b_jk ≠ 0. Then:

D(B) = D(BVsVsᵀ) D⁻¹(VsVsᵀ)

The approximation to b_ij is exact when rows i and j of Vs are orthogonal.

Ideas from information theory: Hadamard matrices

Consider the matrix V: we want its rows to be as orthogonal as possible among each other, i.e., we want to minimize

E_rms = ‖I − VVᵀ‖_F / √(n(n−1))   or   E_max = max_{i≠j} |VVᵀ|_ij

Problems that arise in coding: find a code book [rows of V = code words] that minimizes the cross-correlation amplitude. Welch bounds:

E_rms ≥ √((n−s)/((n−1)s)),   E_max ≥ √((n−s)/((n−1)s))

Result: a sequence of s vectors v_k with binary entries achieves the first Welch bound iff s = 2 or s = 4k.

Hadamard matrices are a special class: n × n matrices with entries ±1 such that HHᵀ = nI. Examples:

H2 = [1 1; 1 −1]   and   H4 = [1 1 1 1; 1 −1 1 −1; 1 1 −1 −1; 1 −1 −1 1]

They achieve both Welch bounds. One can build larger Hadamard matrices recursively: given two Hadamard matrices H1 and H2, the Kronecker product H1 ⊗ H2 is a Hadamard matrix. It is too expensive to use the whole matrix of size n; one can use Vs = the matrix of the first s columns of Hn.
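A short numpy sketch (mine) of the recursive Kronecker (Sylvester) construction and the defining property HHᵀ = nI:

```python
import numpy as np

def hadamard(n):
    """n x n Hadamard matrix, n a power of 2, built by repeated
    Kronecker products with the 2 x 2 seed (Sylvester construction)."""
    H = np.array([[1.0]])
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    while H.shape[0] < n:
        H = np.kron(H, H2)
    return H

n, s = 64, 8
H = hadamard(n)
Vs = H[:, :s]   # first s columns used as the probing block
```

The columns are mutually orthogonal as well (HᵀH = nI), so the block Vs of the first s columns satisfies VsᵀVs = nI.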

[Figure: pattern of VsVsᵀ, for s = 32 (left) and s = 64 (right).]


A Lanczos approach
Given a Hermitian matrix A, generate Lanczos vectors via:

β_{i+1} q_{i+1} = A q_i − α_i q_i − β_i q_{i−1}

with α_i, β_{i+1} selected so that ‖q_{i+1}‖₂ = 1 and q_{i+1} ⊥ q_i, q_{i+1} ⊥ q_{i−1}.

Result:

A Q_m = Q_m T_m + β_{m+1} q_{m+1} e_mᵀ

When m = n, A = Q_n T_n Q_nᵀ and A⁻¹ = Q_n T_n⁻¹ Q_nᵀ. For m < n use the approximations: A⁻¹ ≈ Q_m T_m⁻¹ Q_mᵀ and D(A⁻¹) ≈ D[Q_m T_m⁻¹ Q_mᵀ].


ALGORITHM 1: diagInv via Lanczos

For j = 1, 2, · · ·, Do:
  β_{j+1} q_{j+1} := A q_j − α_j q_j − β_j q_{j−1}   [Lanczos step]
  p_j := q_j − (β_j / δ_{j−1}) p_{j−1}
  δ_j := α_j − β_j² / δ_{j−1}
  d_j := d_{j−1} + (p_j ⊙ p_j) / δ_j   [update of diag(inv(A))]
EndDo

(with β_1 = 0, p_0 = 0, d_0 = 0; the δ_j are the pivots of the LDLᵀ factorization of T_j)

The vector d_k will converge to the diagonal of A⁻¹. Limitation: often requires all n steps to converge. One advantage: Lanczos is shift-invariant, so this can be used for many ω's. Potential: use as a direct method, exploiting sparsity.
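A numpy sketch (mine) of this Lanczos-based diagonal update for a symmetric positive definite A. The tridiagonal T_m is factored as LDLᵀ on the fly; with p_j the columns of Q_m L⁻ᵀ, the running sum d = Σ_j (p_j ⊙ p_j)/δ_j equals diag(Q_m T_m⁻¹ Q_mᵀ). Full reorthogonalization is added purely for numerical robustness of this small demo; at m = n the result is the exact diagonal.

```python
import numpy as np

def lanczos_diag_inv(A, m, seed=0):
    """Estimate diag(inv(A)) for symmetric positive definite A with
    m Lanczos steps, accumulating d_j = d_{j-1} + (p_j*p_j)/delta_j."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = [q.copy()]
    q_prev, beta = np.zeros(n), 0.0
    p_prev, delta_prev = np.zeros(n), 1.0
    d = np.zeros(n)
    for j in range(m):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        for qi in Q:                       # full reorthogonalization (demo only)
            w = w - (qi @ w) * qi
        p = q - (beta / delta_prev) * p_prev   # column of Q L^{-T}
        delta = alpha - beta * beta / delta_prev   # LDL^T pivot of T_j
        d = d + (p * p) / delta            # running diag(inv(A)) update
        p_prev, delta_prev = p, delta
        if j < m - 1:
            beta = np.linalg.norm(w)
            q_prev, q = q, w / beta
            Q.append(q)
    return d

n = 30
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
d = lanczos_diag_inv(A, n)
```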

Using a sparse V: Probing

Goal: find Vs such that (1) s is small and (2) Vs satisfies the Proposition (rows i and j orthogonal for any nonzero b_ij). Difficulty: this can work only for sparse matrices, but B = A⁻¹ is usually dense.

B can sometimes be approximated by a sparse matrix. Consider, for some ε:

(B_ε)_ij = b_ij if |b_ij| > ε, and 0 if |b_ij| ≤ ε

B_ε will be sparse under certain conditions, e.g., when A is diagonally dominant. In what follows we assume B_ε is sparse and set B := B_ε. Its pattern will be required by standard probing methods.

Generic Probing Algorithm

ALGORITHM 2: Probing
Input: A, s. Output: 𝒟(B)
Determine Vs := [v1, v2, . . . , vs]
for j ← 1 to s:
  Solve A x_j = v_j
end
Construct Xs := [x1, x2, . . . , xs]
Compute 𝒟(B) := 𝒟(XsVsᵀ) D⁻¹(VsVsᵀ)

Note: the rows of Vs are typically scaled to have unit 2-norm, so D⁻¹(VsVsᵀ) = I.


Standard probing (e.g. to compute a Jacobian)

Several names for the same method: probing; CPR; sparse Jacobian estimators, ... Basis of the method: one can compute a Jacobian if a coloring of the columns is known such that no two columns of the same color overlap.

All entries of the same color can be computed with one matvec. Example: for all blue entries, multiply B by the 0/1 vector with ones in the blue positions.

[Figure: one color class of columns of B and the corresponding 0/1 probing vector on the right.]


What about Diag(inv(A))?

Define v_i, the probing vector associated with color i:

[v_i]_k = 1 if color(k) == i, and 0 otherwise

Standard probing satisfies the requirement of the Proposition, but this coloring is not what is needed! [It is overkill.] Alternative: color the graph of B with a standard graph coloring algorithm [the adjacency graph, not the graph of column-overlaps]. Result: graph coloring yields a valid set of probing vectors for 𝒟(B).

Proof: Column v_c has a one for each node i whose color is c, and zeros elsewhere. Row i of Vs has a 1 in column c, where c = color(i), and zeros elsewhere. If b_ij ≠ 0, then i and j are adjacent in the graph of B and so have different colors: the i-th row has its 1 in column color(i), the j-th row has its 1 in column color(j) ≠ color(i). The two rows are orthogonal.

Example: two colors are required for this graph → two probing vectors. The standard method would need 6 colors [coloring the graph of BᵀB].


Next issue: guessing the pattern of B

Recall that we are dealing with B := B_ε [pruned B]. Assume A is diagonally dominant. Write A = D − E, with D = D(A). Then:

A = D(I − F)  with  F ≡ D⁻¹E

A⁻¹ ≈ (I + F + F² + · · · + F^k) D⁻¹ ≡ B^(k)

When A is diagonally dominant, F^k decreases rapidly, so we can approximate the pattern of B by that of B^(k) for some k. Interpretation: the pattern of B^(k) corresponds to paths of length ≤ k in the graph of A.
Q: How to select k? A: Inspect A⁻¹e_j for some j; the values of the solution outside the pattern of (A^k e_j) should be small. If during the calculations we get larger than expected errors, then redo with a larger k, more colors, etc. Can we salvage what was already done? This question is still open.


Preliminary experiments

Problem setup (DMFT): calculate the imaginary-time Green's function. DMFT parameters: a set of physical parameters is provided. DMFT loop: at most 10 outer iterations, each consisting of 62 inner iterations. Each inner iteration: find 𝒟(B). Matrix: based on a five-point stencil with a_jj = iω + μ − V(j) − s(j).

Probing setup: probing tolerance ε = 10⁻¹⁰; GMRES tolerance = 10⁻¹².

[Figure: five-point stencil pattern around the diagonal entry a_jj.]


Results
CPU times (sec) for one inner iteration of DMFT:

n      LAPACK   Lanczos   Probing
21²    0.5      0.2       0.02
41²    26       9.9       0.19
61²    282      115       0.79
81²    > 1000   838       2.0

[Figure: path length k, number of probing vectors, and number of GMRES iterations over the 62 DMFT inner iterations, for the case n = 81².]

Challenge: the indefinite case

The DMFT code deals with a separate case which uses real-axis sampling. The matrix A is then no longer diagonally dominant, far from it. This is a much more challenging case. One option: solve Ax_j = e_j FOR ALL j's, with the ARMS solver using the ddPQ ordering, and exploit the multiple right-hand sides. More appealing: DD-type approaches.


Divide & Conquer approach

Let A be a 5-point matrix (2-D problem): block tridiagonal, with tridiagonal diagonal blocks A_1, . . . , A_ny and identity off-diagonal blocks. Split it roughly in two:

A = [ A11  A12 ]  =  [ A11   0  ]  +  [  0   A12 ]
    [ A21  A22 ]     [  0   A22 ]     [ A21   0  ]

with A11 ∈ ℂ^{m×m} and A22 ∈ ℂ^{(n−m)×(n−m)}.



Observation:

A = [ A11 + E1E1ᵀ       0       ]  −  [ E1E1ᵀ  E1E2ᵀ ]
    [      0       A22 + E2E2ᵀ  ]     [ E2E1ᵀ  E2E2ᵀ ]

where E1 ∈ ℂ^{m×nx} and E2 ∈ ℂ^{(n−m)×nx} are (relatively) low-rank matrices: E1 := [0; . . . ; 0; I] (nonzero in its last block) and E2 := [I; 0; . . . ; 0] (nonzero in its first block). So A is of the form

A = C − EEᵀ,   C := [ C1  0; 0  C2 ],   E := [ E1; E2 ]

Idea: use the Sherman-Morrison formula.



A⁻¹ = C⁻¹ + U G⁻¹Uᵀ, with U = C⁻¹E ∈ ℂ^{n×nx} and G = I_{nx} − EᵀU ∈ ℂ^{nx×nx}. 𝒟(A⁻¹) can be found from

𝒟(A⁻¹) = [ 𝒟(C1⁻¹); 𝒟(C2⁻¹) ] + 𝒟(U G⁻¹Uᵀ)

where 𝒟(C1⁻¹) and 𝒟(C2⁻¹) are obtained by recursion.

U: solve CU = E, i.e., solve iteratively C1U1 = E1 and C2U2 = E2.
G: G = I_{nx} − EᵀU = I_{nx} − E1ᵀU1 − E2ᵀU2.
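A minimal numerical check (mine) of this Sherman-Morrison splitting on a 1-D Laplacian cut into two halves. The coupling is rank one: E has ones at the two interface nodes, so C = A + EEᵀ is block diagonal.

```python
import numpy as np

# 1-D Laplacian on 12 points, split into two halves of size m = 6
n, m = 12, 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Rank-1 coupling: ones at the two interface nodes, so that
# C = A + E E^T has its off-diagonal -1 couplings cancelled.
E = np.zeros((n, 1))
E[m - 1, 0] = E[m, 0] = 1.0
C = A + E @ E.T                       # block diagonal: blkdiag(C1, C2)

U = np.linalg.solve(C, E)             # C U = E (two independent block solves)
G = np.eye(1) - E.T @ U               # G = I_nx - E^T U
d = np.diag(np.linalg.inv(C)) + np.diag(U @ np.linalg.solve(G, U.T))
```

In the real algorithm the diagonals of C1⁻¹ and C2⁻¹ would themselves come from a recursive application of the same splitting; here they are computed directly since the blocks are tiny.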


Domain Decomposition approach

Domain decomposition with p = 3 subdomains; zoom into subdomain 2 (interfaces with subdomains 1 and 3).

Under the usual ordering [interior points first, then interface points]:

A = [ B1                 F1 ]
    [      B2            F2 ]
    [           . .      .. ]
    [              Bp    Fp ]
    [ F1ᵀ  F2ᵀ  · · ·  Fpᵀ  C ]

[Figure: pattern of a matrix A based on a DDM ordering with p = 4 subdomains; n = 25², nz = 3025.]

Inverse of A [assuming both B and S are nonsingular]:

A⁻¹ = [ B⁻¹ + B⁻¹F S⁻¹FᵀB⁻¹    −B⁻¹F S⁻¹ ]      S = C − FᵀB⁻¹F,
      [ −S⁻¹FᵀB⁻¹              S⁻¹       ]

𝒟(A⁻¹) = [ 𝒟(B⁻¹) + 𝒟(B⁻¹F S⁻¹FᵀB⁻¹);  𝒟(S⁻¹) ]

Note: each diagonal block decouples from the others. Inverse of A in the i-th block (domain):

𝒟((A⁻¹)_ii) = 𝒟(B_i⁻¹) + 𝒟(H_i S⁻¹ H_iᵀ),   H_i = B_i⁻¹F_i

Note: the only nonzero columns of F_i are those related to interface vertices. The approach is similar to Divide and Conquer, but not recursive.
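A minimal check (mine) of the Schur-complement formula for 𝒟(A⁻¹) on a small SPD matrix with one "interior" block and one "interface" block (a single subdomain, p = 1, for simplicity):

```python
import numpy as np

# Symmetric positive definite test matrix, partitioned as
# A = [[B, F], [F^T, C]] with interior block B and interface block C
rng = np.random.default_rng(2)
n, nB = 11, 8
R = rng.standard_normal((n, n))
A = R @ R.T + n * np.eye(n)                # SPD by construction
Bm, F, Cm = A[:nB, :nB], A[:nB, nB:], A[nB:, nB:]

H = np.linalg.solve(Bm, F)                 # H = B^{-1} F
Sinv = np.linalg.inv(Cm - F.T @ H)         # S = C - F^T B^{-1} F
d = np.concatenate([np.diag(np.linalg.inv(Bm)) + np.diag(H @ Sinv @ H.T),
                    np.diag(Sinv)])        # [D(B^{-1}) + D(H S^{-1} H^T); D(S^{-1})]
```

With p > 1 subdomains B is block diagonal, so the solves with B split into p independent local solves, and only the interface columns of each F_i contribute to H_i.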


DMFT experiment. Times (in seconds, in matlab) for direct inversion (INV), divide-and-conquer (D&C), and domain decomposition (DD, with p = 4 subdomains) methods, for 2-D problems of various sizes:

n    INV   D&C   DD
21   .3    .1    .1
51   12    1.4   .7
81   88    7.1   3.2

NOTE: work still in progress.


Conclusion
The Diag(inv(A)) problem: easy in the diagonally dominant case, very challenging in the (highly) indefinite case. Domain decomposition methods can be a bridge between the two cases. Approach [specifically for the DMFT problem]:

Use direct methods in the strongly diagonally dominant case.
Use DD-type methods in the nearly diagonally dominant case.
Use direct methods in all other cases [until we find better means :-)].

