
Computing the diagonal of the inverse of a sparse matrix

Yousef Saad
Department of Computer Science and Engineering, University of Minnesota

Sparse Days, Toulouse, June 15, 2010

Motivation: DMFT
Dynamic Mean Field Theory: quantum-mechanical studies of highly correlated particles. The equation to be solved (repeatedly) is Dyson's equation

G(ω) = [(ω + μ)I − V − Σ(ω) + T]⁻¹

where ω (frequency) and μ (chemical potential) are real; V = trap potential, a real diagonal; Σ(ω) = local self-energy, a complex diagonal; T is the hopping matrix (sparse, real).


We are interested only in the diagonal of G(ω). In addition, the equation must be solved self-consistently, and this must be done for many ω's. Related approach: the Non-Equilibrium Green's Function (NEGF) approach, used to model nanoscale transistors. There are many new applications of the diagonal of the inverse [and related problems]. A few examples follow.


Introduction: A few examples


Problem 1: Compute Tr[inv(A)], the trace of the inverse. Arises in cross-validation: minimize

‖(I − A(λ))g‖² / [Tr(I − A(λ))]²,  with A(λ) ≡ D(DᵀD + λLLᵀ)⁻¹Dᵀ

where D == blurring operator and L is the regularization operator. In [Hutchinson '90], Tr[inv(A)] is estimated stochastically. Many authors have addressed this problem.
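As a concrete illustration (mine, not from the talk): a minimal numpy sketch of Hutchinson's stochastic estimator for Tr(inv(A)), using Rademacher probe vectors and one linear solve per sample. The function name and the test matrix are my own.

```python
import numpy as np

def hutchinson_trace_inv(A, s=500, seed=0):
    """Hutchinson's estimator: Tr(inv(A)) ~ (1/s) sum_j v_j^T A^{-1} v_j
    over s random Rademacher (+/-1) probe vectors v_j.  Each sample
    costs one linear solve with A; no inverse is ever formed."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = 0.0
    for _ in range(s):
        v = rng.choice([-1.0, 1.0], size=n)
        total += v @ np.linalg.solve(A, v)
    return total / s

# Small diagonally dominant test matrix
n = 50
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
exact = np.trace(np.linalg.inv(A))
approx = hutchinson_trace_inv(A)
```

In practice the dense solve would be replaced by a sparse direct or iterative solver; the estimator itself only needs matrix-vector solves.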


Problem 2: Compute Tr[f(A)], f a certain function. Arises in many applications in physics. Example: stochastic estimation of Tr(f(A)) is used extensively by quantum chemists to estimate the Density of States, see [H. Röder, R. N. Silver, D. A. Drabold, J. J. Dong, Phys. Rev. B 55, 15382 (1997)].


Problem 3: Compute diag[inv(A)], the diagonal of the inverse. Arises in Dynamic Mean Field Theory [DMFT, the motivation for this work]. In DMFT, we seek the diagonal of a Green's function which solves (self-consistently) Dyson's equation [see J. Freericks 2005]. Related approach: the Non-Equilibrium Green's Function (NEGF) approach used to model nanoscale transistors. In uncertainty quantification, the diagonal of the inverse of a covariance matrix is needed [Bekas, Curioni, Fedulova '09].


Problem 4: Compute diag[f(A)]; f = a certain function. Arises in any density-matrix approach in quantum modeling, for example Density Functional Theory. Here f = the Fermi-Dirac operator:

f(ε) = 1 / (1 + exp((ε − μ)/(k_B T)))

Note: as T → 0, f becomes a step function.

Note: if f is approximated by a rational function, then diag[f(A)] ≈ a linear combination of terms of the form diag[(A − σᵢI)⁻¹]. Linear-scaling methods based on approximating f(H) and diag(f(H)) avoid diagonalization of H.
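An illustrative sketch (mine, not from the talk) of what diag[f(A)] is, computed by brute-force diagonalization of a small tight-binding Hamiltonian. This O(n³) route is exactly what linear-scaling methods try to avoid; it serves only to define the target quantity.

```python
import numpy as np

def fermi_dirac(eps, mu=0.0, kT=0.05):
    """Fermi-Dirac function f(eps) = 1 / (1 + exp((eps - mu)/kT))."""
    return 1.0 / (1.0 + np.exp((eps - mu) / kT))

def diag_f(A, mu=0.0, kT=0.05):
    """diag[f(A)] for a Hermitian A by full diagonalization:
    f(A) = Q f(Lambda) Q^T, so diag entries are sums Q_ik^2 f(lam_k)."""
    lam, Q = np.linalg.eigh(A)
    return np.einsum('ik,k,ik->i', Q, fermi_dirac(lam, mu, kT), Q)

# Tight-binding chain: spectrum lies in (-2, 2), symmetric about 0,
# so at mu = 0 the "occupations" sum to exactly n/2.
n = 20
A = -np.eye(n, k=1) - np.eye(n, k=-1)
d = diag_f(A, mu=0.0, kT=0.05)
```

As kT shrinks, f approaches the step function and d approaches the diagonal of the zero-temperature density matrix.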


Methods based on the sparse LU factorization

Basic reference: K. Takahashi, J. Fagan, and M.-S. Chin, "Formation of a sparse bus impedance matrix and its application to short circuit study," in Proc. of the Eighth Inst. PICA Conf., Minneapolis, MN, IEEE Power Engineering Soc., 1973, pp. 63-69. Described in [Duff, Erisman, Reid, p. 273]. Algorithm used by Erisman and Tinney [Num. Math. 1975].


Main idea. If A = LDU and B = A⁻¹, then

B = U⁻¹D⁻¹ + B(I − L);  B = D⁻¹L⁻¹ + (I − U)B.

Not all entries are needed to compute selected entries of B. For example, consider the lower part, i > j; use the first equation:

b_ij = (B(I − L))_ij = − Σ_{k>j} b_ik l_kj

We only need the entries b_ik of row i for which l_kj ≠ 0, k > j. The entries of B belonging to the pattern of (L, U)ᵀ can be extracted without computing any other entries outside the pattern.
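A minimal numerical check (mine) of the identity B = U⁻¹D⁻¹ + B(I − L), using a hand-rolled no-pivoting LDU factorization on a small diagonally dominant matrix:

```python
import numpy as np

def ldu(A):
    """LDU factorization without pivoting (adequate for a diagonally
    dominant A): L unit lower, D diagonal, U unit upper triangular."""
    n = A.shape[0]
    M = A.astype(float).copy()
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = M[i, k] / M[k, k]
            M[i, k:] -= L[i, k] * M[k, k:]
    D = np.diag(np.diag(M))
    U = np.diag(1.0 / np.diag(M)) @ np.triu(M)  # rescale rows to unit diagonal
    return L, D, U

n = 8
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L, D, U = ldu(A)
B = np.linalg.inv(A)
# The Takahashi/Erisman-Tinney identity: B = U^{-1} D^{-1} + B (I - L)
rhs = np.linalg.inv(U) @ np.linalg.inv(D) + B @ (np.eye(n) - L)
```

Since U⁻¹D⁻¹ is upper triangular and I − L is strictly lower triangular, the lower part of B is determined by already-known entries of B against the sparse column of L, which is what the selected-inversion algorithms exploit.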


More recently exploited in a different form in: L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, W. E, "SelInv: An algorithm for selected inversion of a sparse symmetric matrix," Tech. Report, Princeton Univ. An algorithm based on a form of nested dissection is described in Li, Ahmed, Klimeck, Darve [2008]. A close relative of this technique is presented in: L. Lin, J. Lu, L. Ying, R. Car, W. E, "Fast algorithm for extracting the diagonal of the inverse matrix with application to the electronic structure analysis of metallic systems," Comm. Math. Sci., 2009. Difficulty: 3-D problems.


Stochastic Estimator

Notation:
A = original matrix, B = A⁻¹. 𝒟(B) = diag(B) [matlab notation]
D(B) = diagonal matrix with diagonal 𝒟(B)
⊙ and ⊘: elementwise multiplication and division of vectors
{v_j}: sequence of s random vectors

Result:

𝒟(B) ≈ (Σ_{j=1}^{s} v_j ⊙ B v_j) ⊘ (Σ_{j=1}^{s} v_j ⊙ v_j)

Refs: C. Bekas, E. Kokiopoulou & YS ('05); recent: C. Bekas, A. Curioni, I. Fedulova ('09).

Let Vs = [v1, v2, . . . , vs]. Then, an alternative expression:

D(B) ≈ D(BVsVsᵀ) D⁻¹(VsVsᵀ)

Question: when is this result exact?

Main Proposition. Let Vs ∈ ℝ^{n×s} with rows {v_{j,:}}, and B ∈ ℂ^{n×n} with entries {b_jk}. Assume that ⟨v_{j,:}, v_{k,:}⟩ = 0 for all j ≠ k such that b_jk ≠ 0. Then:

D(B) = D(BVsVsᵀ) D⁻¹(VsVsᵀ)

The approximation to b_ij is exact when rows i and j of Vs are orthogonal.

Ideas from information theory: Hadamard matrices

Consider the matrix V: we want its rows to be as orthogonal as possible among each other, i.e., we want to minimize

E_rms = ‖I − VVᵀ‖_F / √(n(n−1))   or   E_max = max_{i≠j} |VVᵀ|_ij

Problems that arise in coding: find a code book [rows of V = code words] that minimizes the cross-correlation amplitude. Welch bounds:

E_rms ≥ √((n−s)/((n−1)s)),   E_max ≥ √((n−s)/((n−1)s))

Result: a sequence of s vectors v_k with binary entries achieves the first Welch bound iff s = 2 or s = 4k.

Hadamard matrices are a special class: n × n matrices with entries ±1 such that HHᵀ = nI. Examples:

H2 = [1 1; 1 −1]   and   H4 = [1 1 1 1; 1 −1 1 −1; 1 1 −1 −1; 1 −1 −1 1]

They achieve both Welch bounds. One can build larger Hadamard matrices recursively: given two Hadamard matrices H1 and H2, the Kronecker product H1 ⊗ H2 is a Hadamard matrix. It is too expensive to use the whole matrix of size n; one can use Vs = the matrix of the first s columns of Hn.
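A short numpy sketch (mine) of the recursive Kronecker (Sylvester) construction and the defining property HHᵀ = nI:

```python
import numpy as np

def hadamard(n):
    """n x n Hadamard matrix, n a power of 2, built by repeated
    Kronecker products with the 2 x 2 seed (Sylvester construction)."""
    H = np.array([[1.0]])
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
    while H.shape[0] < n:
        H = np.kron(H, H2)
    return H

n, s = 64, 8
H = hadamard(n)
Vs = H[:, :s]   # first s columns used as the probing block
```

The columns are mutually orthogonal as well (HᵀH = nI), so the block Vs of the first s columns satisfies VsᵀVs = nI.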

[Figure: pattern of VsVsᵀ, for s = 32 (left) and s = 64 (right).]


A Lanczos approach
Given a Hermitian matrix A, generate Lanczos vectors via:

β_{i+1} q_{i+1} = A q_i − α_i q_i − β_i q_{i−1}

with α_i, β_{i+1} selected so that ‖q_{i+1}‖₂ = 1 and q_{i+1} ⊥ q_i, q_{i+1} ⊥ q_{i−1}.

Result:

A Q_m = Q_m T_m + β_{m+1} q_{m+1} e_mᵀ

When m = n, A = Q_n T_n Q_nᵀ and A⁻¹ = Q_n T_n⁻¹ Q_nᵀ. For m < n use the approximations: A⁻¹ ≈ Q_m T_m⁻¹ Q_mᵀ and D(A⁻¹) ≈ D[Q_m T_m⁻¹ Q_mᵀ].


ALGORITHM 1: diagInv via Lanczos

For j = 1, 2, · · ·, Do:
  β_{j+1} q_{j+1} := A q_j − α_j q_j − β_j q_{j−1}   [Lanczos step]
  p_j := q_j − (β_j / δ_{j−1}) p_{j−1}
  δ_j := α_j − β_j² / δ_{j−1}
  d_j := d_{j−1} + (p_j ⊙ p_j) / δ_j   [update of diag(inv(A))]
EndDo

(with β_1 = 0, p_0 = 0, d_0 = 0; the δ_j are the pivots of the LDLᵀ factorization of T_j)

The vector d_k will converge to the diagonal of A⁻¹. Limitation: often requires all n steps to converge. One advantage: Lanczos is shift-invariant, so this can be used for many ω's. Potential: use as a direct method, exploiting sparsity.
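A numpy sketch (mine) of this Lanczos-based diagonal update for a symmetric positive definite A. The tridiagonal T_m is factored as LDLᵀ on the fly; with p_j the columns of Q_m L⁻ᵀ, the running sum d = Σ_j (p_j ⊙ p_j)/δ_j equals diag(Q_m T_m⁻¹ Q_mᵀ). Full reorthogonalization is added purely for numerical robustness of this small demo; at m = n the result is the exact diagonal.

```python
import numpy as np

def lanczos_diag_inv(A, m, seed=0):
    """Estimate diag(inv(A)) for symmetric positive definite A with
    m Lanczos steps, accumulating d_j = d_{j-1} + (p_j*p_j)/delta_j."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    q /= np.linalg.norm(q)
    Q = [q.copy()]
    q_prev, beta = np.zeros(n), 0.0
    p_prev, delta_prev = np.zeros(n), 1.0
    d = np.zeros(n)
    for j in range(m):
        w = A @ q - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        for qi in Q:                       # full reorthogonalization (demo only)
            w = w - (qi @ w) * qi
        p = q - (beta / delta_prev) * p_prev   # column of Q L^{-T}
        delta = alpha - beta * beta / delta_prev   # LDL^T pivot of T_j
        d = d + (p * p) / delta            # running diag(inv(A)) update
        p_prev, delta_prev = p, delta
        if j < m - 1:
            beta = np.linalg.norm(w)
            q_prev, q = q, w / beta
            Q.append(q)
    return d

n = 30
A = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
d = lanczos_diag_inv(A, n)
```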

Using a sparse V: Probing

Goal: find Vs such that (1) s is small and (2) Vs satisfies the Proposition (rows i and j orthogonal for any nonzero b_ij). Difficulty: this can work only for sparse matrices, but B = A⁻¹ is usually dense.

B can sometimes be approximated by a sparse matrix. Consider, for some ε:

(B_ε)_ij = b_ij if |b_ij| > ε, and 0 if |b_ij| ≤ ε

B_ε will be sparse under certain conditions, e.g., when A is diagonally dominant. In what follows we assume B_ε is sparse and set B := B_ε. Its pattern will be required by standard probing methods.

Generic Probing Algorithm

ALGORITHM 2: Probing
Input: A, s. Output: 𝒟(B)
Determine Vs := [v1, v2, . . . , vs]
for j ← 1 to s:
  Solve A x_j = v_j
end
Construct Xs := [x1, x2, . . . , xs]
Compute 𝒟(B) := 𝒟(XsVsᵀ) D⁻¹(VsVsᵀ)

Note: the rows of Vs are typically scaled to have unit 2-norm, so D⁻¹(VsVsᵀ) = I.


Standard probing (e.g. to compute a Jacobian)

Several names for the same method: probing; CPR; sparse Jacobian estimators, ... Basis of the method: one can compute a Jacobian if a coloring of the columns is known such that no two columns of the same color overlap.

All entries of the same color can be computed with one matvec. Example: for all blue entries, multiply B by the 0/1 vector with ones in the blue positions.

[Figure: one color class of columns of B and the corresponding 0/1 probing vector on the right.]


What about Diag(inv(A))?

Define v_i, the probing vector associated with color i:

[v_i]_k = 1 if color(k) == i, and 0 otherwise

Standard probing satisfies the requirement of the Proposition, but this coloring is not what is needed! [It is overkill.] Alternative: color the graph of B with a standard graph coloring algorithm [the adjacency graph, not the graph of column-overlaps]. Result: graph coloring yields a valid set of probing vectors for 𝒟(B).

Proof: Column v_c has a one for each node i whose color is c, and zeros elsewhere. Row i of Vs has a 1 in column c, where c = color(i), and zeros elsewhere. If b_ij ≠ 0, then i and j are adjacent in the graph of B and so have different colors: the i-th row has its 1 in column color(i), the j-th row has its 1 in column color(j) ≠ color(i). The two rows are orthogonal.

Example: two colors are required for this graph → two probing vectors. The standard method would need 6 colors [coloring the graph of BᵀB].


Next issue: guessing the pattern of B

Recall that we are dealing with B := B_ε [pruned B]. Assume A is diagonally dominant. Write A = D − E, with D = D(A). Then:

A = D(I − F)  with  F ≡ D⁻¹E

A⁻¹ ≈ (I + F + F² + · · · + F^k) D⁻¹ ≡ B^(k)

When A is diagonally dominant, F^k decreases rapidly, so we can approximate the pattern of B by that of B^(k) for some k. Interpretation: the pattern of B^(k) corresponds to paths of length ≤ k in the graph of A.
Q: How to select k? A: Inspect A⁻¹e_j for some j; the values of the solution outside the pattern of (A^k e_j) should be small. If during the calculations we get larger than expected errors, then redo with a larger k, more colors, etc. Can we salvage what was already done? This question is still open.


Preliminary experiments

Problem setup (DMFT): calculate the imaginary-time Green's function. DMFT parameters: a set of physical parameters is provided. DMFT loop: at most 10 outer iterations, each consisting of 62 inner iterations. Each inner iteration: find 𝒟(B). Matrix: based on a five-point stencil with a_jj = iω + μ − V(j) − s(j).

Probing setup: probing tolerance ε = 10⁻¹⁰; GMRES tolerance = 10⁻¹².

[Figure: five-point stencil pattern around the diagonal entry a_jj.]


Results
CPU times (sec) for one inner iteration of DMFT:

n      LAPACK   Lanczos   Probing
21²    0.5      0.2       0.02
41²    26       9.9       0.19
61²    282      115       0.79
81²    > 1000   838       2.0

[Figure: path length k, number of probing vectors, and number of GMRES iterations over the 62 DMFT inner iterations, for the case n = 81².]

Challenge: the indefinite case

The DMFT code deals with a separate case which uses real-axis sampling. The matrix A is then no longer diagonally dominant, far from it. This is a much more challenging case. One option: solve Ax_j = e_j FOR ALL j's, with the ARMS solver using the ddPQ ordering, and exploit the multiple right-hand sides. More appealing: DD-type approaches.


Divide & Conquer approach

Let A be a 5-point matrix (2-D problem): block tridiagonal, with tridiagonal diagonal blocks A_1, . . . , A_ny and identity off-diagonal blocks. Split it roughly in two:

A = [ A11  A12 ]  =  [ A11   0  ]  +  [  0   A12 ]
    [ A21  A22 ]     [  0   A22 ]     [ A21   0  ]

with A11 ∈ ℂ^{m×m} and A22 ∈ ℂ^{(n−m)×(n−m)}.



Observation:

A = [ A11 + E1E1ᵀ       0       ]  −  [ E1E1ᵀ  E1E2ᵀ ]
    [      0       A22 + E2E2ᵀ  ]     [ E2E1ᵀ  E2E2ᵀ ]

where E1 ∈ ℂ^{m×nx} and E2 ∈ ℂ^{(n−m)×nx} are (relatively) low-rank matrices: E1 := [0; . . . ; 0; I] (nonzero in its last block) and E2 := [I; 0; . . . ; 0] (nonzero in its first block). So A is of the form

A = C − EEᵀ,   C := [ C1  0; 0  C2 ],   E := [ E1; E2 ]

Idea: use the Sherman-Morrison formula.



A⁻¹ = C⁻¹ + U G⁻¹Uᵀ, with U = C⁻¹E ∈ ℂ^{n×nx} and G = I_{nx} − EᵀU ∈ ℂ^{nx×nx}. 𝒟(A⁻¹) can be found from

𝒟(A⁻¹) = [ 𝒟(C1⁻¹); 𝒟(C2⁻¹) ] + 𝒟(U G⁻¹Uᵀ)

where 𝒟(C1⁻¹) and 𝒟(C2⁻¹) are obtained by recursion.

U: solve CU = E, i.e., solve iteratively C1U1 = E1 and C2U2 = E2.
G: G = I_{nx} − EᵀU = I_{nx} − E1ᵀU1 − E2ᵀU2.
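A minimal numerical check (mine) of this Sherman-Morrison splitting on a 1-D Laplacian cut into two halves. The coupling is rank one: E has ones at the two interface nodes, so C = A + EEᵀ is block diagonal.

```python
import numpy as np

# 1-D Laplacian on 12 points, split into two halves of size m = 6
n, m = 12, 6
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

# Rank-1 coupling: ones at the two interface nodes, so that
# C = A + E E^T has its off-diagonal -1 couplings cancelled.
E = np.zeros((n, 1))
E[m - 1, 0] = E[m, 0] = 1.0
C = A + E @ E.T                       # block diagonal: blkdiag(C1, C2)

U = np.linalg.solve(C, E)             # C U = E (two independent block solves)
G = np.eye(1) - E.T @ U               # G = I_nx - E^T U
d = np.diag(np.linalg.inv(C)) + np.diag(U @ np.linalg.solve(G, U.T))
```

In the real algorithm the diagonals of C1⁻¹ and C2⁻¹ would themselves come from a recursive application of the same splitting; here they are computed directly since the blocks are tiny.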


Domain Decomposition approach

Domain decomposition with p = 3 subdomains; zoom into subdomain 2 (interfaces with subdomains 1 and 3).

Under the usual ordering [interior points first, then interface points]:

A = [ B1                 F1 ]
    [      B2            F2 ]
    [           . .      .. ]
    [              Bp    Fp ]
    [ F1ᵀ  F2ᵀ  · · ·  Fpᵀ  C ]

[Figure: pattern of a matrix A based on a DDM ordering with p = 4 subdomains; n = 25², nz = 3025.]

Inverse of A [assuming both B and S are nonsingular]:

A⁻¹ = [ B⁻¹ + B⁻¹F S⁻¹FᵀB⁻¹    −B⁻¹F S⁻¹ ]      S = C − FᵀB⁻¹F,
      [ −S⁻¹FᵀB⁻¹              S⁻¹       ]

𝒟(A⁻¹) = [ 𝒟(B⁻¹) + 𝒟(B⁻¹F S⁻¹FᵀB⁻¹);  𝒟(S⁻¹) ]

Note: each diagonal block decouples from the others. Inverse of A in the i-th block (domain):

𝒟((A⁻¹)_ii) = 𝒟(B_i⁻¹) + 𝒟(H_i S⁻¹ H_iᵀ),   H_i = B_i⁻¹F_i

Note: the only nonzero columns of F_i are those related to interface vertices. The approach is similar to Divide and Conquer, but not recursive.
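A minimal check (mine) of the Schur-complement formula for 𝒟(A⁻¹) on a small SPD matrix with one "interior" block and one "interface" block (a single subdomain, p = 1, for simplicity):

```python
import numpy as np

# Symmetric positive definite test matrix, partitioned as
# A = [[B, F], [F^T, C]] with interior block B and interface block C
rng = np.random.default_rng(2)
n, nB = 11, 8
R = rng.standard_normal((n, n))
A = R @ R.T + n * np.eye(n)                # SPD by construction
Bm, F, Cm = A[:nB, :nB], A[:nB, nB:], A[nB:, nB:]

H = np.linalg.solve(Bm, F)                 # H = B^{-1} F
Sinv = np.linalg.inv(Cm - F.T @ H)         # S = C - F^T B^{-1} F
d = np.concatenate([np.diag(np.linalg.inv(Bm)) + np.diag(H @ Sinv @ H.T),
                    np.diag(Sinv)])        # [D(B^{-1}) + D(H S^{-1} H^T); D(S^{-1})]
```

With p > 1 subdomains B is block diagonal, so the solves with B split into p independent local solves, and only the interface columns of each F_i contribute to H_i.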


DMFT experiment. Times (in seconds, in matlab) for direct inversion (INV), divide-and-conquer (D&C), and domain decomposition (DD, with p = 4 subdomains) methods, for 2-D problems of various sizes:

n    INV   D&C   DD
21   .3    .1    .1
51   12    1.4   .7
81   88    7.1   3.2

NOTE: work still in progress.


Conclusion
The Diag(inv(A)) problem: easy in the diagonally dominant case, very challenging in the (highly) indefinite case. Domain decomposition methods can be a bridge between the two cases. Approach [specifically for the DMFT problem]:

Use direct methods in the strongly diagonally dominant case.
Use DD-type methods in the nearly diagonally dominant case.
Use direct methods in all other cases [until we find better means :-)].

