
SPE

Society of Petroleum Engineers of AIME

SPE 13535

Sparse Gaussian Elimination for Complex Reservoir Models


by A.H. Sherman, J.S. Nolen & Assocs. Inc.
SPE Member

Copyright 1985, Society of Petroleum Engineers

This paper was presented at the SPE 1985 Reservoir Simulation Symposium held in Dallas, Texas, February 10-13, 1985. The material is subject to correction by the author. Permission to copy is restricted to an abstract of not more than 300 words. Write SPE, P.O. Box 833836, Richardson, Texas 75083-3836. Telex: 730989 SPE DAL.
ABSTRACT

The simulation industry is rapidly moving towards the frequent use of complex reservoir models including such features as faults, implicit well terms, and thermal or compositional effects. All of these things can pose great difficulties for the standard direct elimination methods typically found in reservoir simulators. This paper takes a new look at sparse Gaussian elimination methods in the context of the current technology and trends in simulation and computing. A novel implementation is described, and numerical results are presented that indicate that this implementation can be a more effective tool for simulation than previous versions of sparse elimination.

INTRODUCTION

It is well known that the solution of linear equations can account for far more than half of the total computational effort in a numerical reservoir simulation.10 Because of this, a great deal of attention has been given to the development of effective (i.e., accurate and efficient) methods for solving such equations. Major recent advances have been made in the technology of iterative methods, especially those based on conjugate-gradient-like iterations (cf. References 1, 8, and 12, for example). Despite this, however, there is still strong evidence that direct methods, based on Gaussian elimination, can play an important role. In addition to the moderate-size two-dimensional models on which such methods have traditionally been used, it appears that direct methods can also be useful for a variety of complex fully-implicit reservoir models incorporating such features as faults, implicit well models, or compositional or thermal effects. Such models, which are coming into increasingly frequent use, can pose great difficulties for even the best iterative methods.12 Moreover, some of the newer iterative methods that may be capable of solving extremely difficult linear systems require the solution of certain submodels that seem most amenable to efficient solution by direct methods.12

Most direct methods traditionally used in reservoir simulators are based on band Gaussian elimination in combination with a clever re-ordering of the unknowns to exploit certain structural features of the linear equations. The most commonly mentioned method is the D4-Gauss method due to Price and Coats.11 In the past ten years, however, there has been substantial interest in more general sparse Gaussian elimination methods that can more fully exploit the large number of zero entries in the coefficient matrices arising from reservoir simulation. Within the simulation industry, there have been a number of proposals for such methods, and several papers have examined the performance of sparse Gaussian elimination in the context of reservoir simulation.2,13-15 By and large, the methodology of these studies has involved the application, with little modification, of high-quality general-purpose codes developed by specialists in numerical linear algebra. The usual conclusion has been that, especially for larger problems, sparse methods can offer some advantages over simpler alternatives such as band elimination or D4-Gauss, but that the gain does not warrant the significant increase in code complexity. This has seemed to be particularly true on the vector and array processors now so widespread in the industry; in fact, the sparse methods tested were often very uncompetitive on such machines.14

The studies just summarized offer a reasonable assessment of the utility of the sparse matrix software of the late 1970's. Available codes were not designed to exploit the special structure inherent in the linear equations arising in reservoir simulation, particularly those from fully-implicit models having several equations and unknowns per grid block. Moreover, the sparse matrix technology on which they were based had not yet advanced to the point of allowing substantial vectorization on machines such as the CRAY-1 or Cyber 205. In addition, the test problems used in the studies were mainly confined to IMPES-type models without complexities, such as faults or implicit well treatments, that create non-standard grid connections and make simpler elimination methods (and many iterative methods) less attractive. As a result, while the studies were extremely useful when they were presented, the insights offered in them may no longer be accurate in the light of recent developments in sparse matrix technology and the industry's move toward larger, more implicit, more complex reservoir models.
The goal of this paper is to reexamine the merits of sparse Gaussian elimination in the context of current technology and trends in simulation. The first part of the paper outlines an implementation of sparse Gaussian elimination that can effectively exploit the structure of the simulation equations and seems better suited to vector processors than previous sparse methods. Following this, we present the results of several numerical experiments designed to investigate the capabilities of the new method. We conclude by assessing the current state of technology for direct methods in reservoir simulation.

IMPLEMENTING SPARSE GAUSSIAN ELIMINATION

In this section we outline an implementation of sparse Gaussian elimination that is appropriate for the systems of linear equations arising in reservoir simulation. The linear systems of interest here have the form

    Ax = b    (1)

with

    A = | M1   C  |
        | R    M2 |    (2)

where M1 is a block NB x NB matrix with NVB x NVB blocks, M2 is a block NW x NW matrix with NVW x NVW blocks, and R and C are rectangular matrices with commensurate block order and size. The overall order of the system is N = NB*NVB + NW*NVW. Typically, NB >> NW. In this paper we will consider only NVB <= 3 and NVW = 1, but the methods described are not restricted to those values.
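To make the block structure of equation (2) concrete, the following is a small illustrative sketch (our construction, not the paper's Fortran; the one-dimensional grid connectivity and the completion locations are hypothetical choices for the example):

```python
import numpy as np

# Illustrative sketch only: assemble a dense matrix with the block shape of
# equation (2) for a toy model with NB = 4 grid blocks, NW = 1 well,
# NVB = 3, NVW = 1, so N = 4*3 + 1*1 = 13.
NB, NW, NVB, NVW = 4, 1, 3, 1
N = NB * NVB + NW * NVW

rng = np.random.default_rng(0)
A = np.zeros((N, N))

# M1: block NB x NB with NVB x NVB blocks; here a 1-D chain of grid blocks,
# so block (i, j) is nonzero only for |i - j| <= 1 (a 3-point operator).
for i in range(NB):
    for j in range(NB):
        if abs(i - j) <= 1:
            A[i*NVB:(i+1)*NVB, j*NVB:(j+1)*NVB] = rng.standard_normal((NVB, NVB))

# M2: block NW x NW with NVW x NVW blocks (implicit well variables).
w0 = NB * NVB
A[w0:, w0:] = rng.standard_normal((NW * NVW, NW * NVW))

# R and C: well-to-reservoir coupling; nonzero only in the completed blocks.
completed = [1, 2]            # hypothetical completion locations
for i in completed:
    A[w0:, i*NVB:(i+1)*NVB] = rng.standard_normal((NVW, NVB))   # R
    A[i*NVB:(i+1)*NVB, w0:] = rng.standard_normal((NVB, NVW))   # C

# The block incidence pattern is symmetric (R and C occupy transposed block
# positions) even though the numerical entries are not.
assert np.array_equal((A != 0), (A != 0).T)
```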
Equation (2) is general enough to include most simulation models in use today. It may be thought of as corresponding to a reservoir model containing NB grid blocks (including inactive zero-pore-volume blocks) and NW wells, in which the interblock flow equations are formulated with NVB implicit equations/unknowns per grid block, and NVW parameters per well are solved for implicitly. Our restrictions here on NVB and NVW still allow for general black oil models, which typically have NVB = 1 (IMPES or sequential method pressure equations) or NVB = 3 (simultaneous, fully implicit method equations) and NVW = 1 (implicit bottomhole pressure in each well). Compositional or thermal models typically require larger values for NVB and NVW.

We assume that the matrix A has a block-symmetric incidence matrix; that is, the pattern of nonzero blocks is symmetric. This is unlikely to be a significant restriction in practice, since essentially all reservoir simulation models give rise to matrices satisfying the assumption. The nonzero entries of A, and, indeed, its (nonblock) incidence matrix will frequently be nonsymmetric.
The sparse Gaussian elimination method described here solves the linear system (1) by applying nonsymmetric Gaussian elimination to a symmetrically permuted system of equations

    (P A P^T)(P x) = P b    (3)

where the permutation matrix P reorders the blocks of M1 and M2 so as to reduce the number of operations or amount of storage required to solve (3). Our actual implementation uses the minimum degree ordering9 to reorder the blocks of M1 and leaves M2 in original order, but other choices for P can be easily accommodated. In general, P is chosen at the beginning of a simulation and retained throughout. Clearly, (3) satisfies all our assumptions about (1).
Once P has been chosen, the solution x is obtained by a two-step process. First, we compute an upper triangular matrix U and a vector y satisfying

    U = L^-1 Q P A P^T,    y = L^-1 Q P b    (4)

where L is a unit lower triangular matrix and Q is a second permutation matrix that reorders the rows of P A P^T corresponding to a restricted form of partial pivoting during the elimination process. We then complete the process by solving

    U z = y,    P x = z    (5)
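The two-step process of equations (3)-(5) can be sketched with dense linear algebra. This is our own illustration under stated assumptions, not the paper's implementation (which stores U sparsely and discards most of L): here scipy's lu_factor supplies the pivoting permutation that plays the role of Q, and perm stands in for a fill-reducing ordering such as the minimum degree ordering used for P.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def permuted_solve(A, b, perm):
    """Sketch of equations (3)-(5): solve Ax = b via the symmetrically
    permuted system (P A P^T)(P x) = P b. `perm` plays the role of P;
    LAPACK's partial pivoting inside lu_factor plays the role of Q."""
    B = A[np.ix_(perm, perm)]         # form P A P^T
    lu, piv = lu_factor(B)            # L U = Q (P A P^T), eq. (4)
    z = lu_solve((lu, piv), b[perm])  # forward/back substitution: U z = y
    x = np.empty_like(z)
    x[perm] = z                       # undo P: since P x = z, x = P^T z
    return x

# Usage check against a reference solver.
rng = np.random.default_rng(1)
A = rng.standard_normal((13, 13)) + 13 * np.eye(13)
b = rng.standard_normal(13)
perm = rng.permutation(13)            # stand-in for a fill-reducing ordering
assert np.allclose(permuted_solve(A, b, perm), np.linalg.solve(A, b))
```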
In implementing sparse Gaussian elimination for the simulation equations, it is important to exploit several properties of the system (3). First, the individual matrix blocks (for NVB, NVW > 1) are typically either almost dense initially or rapidly filled in by elimination operations. Hence, there is little motivation to treat intrablock sparsity, and by not doing so, it is possible to achieve a significant savings in the amount of overhead integer storage. Second, each linear system (3) will be solved for only one righthand side. By computing the vector y simultaneously with the matrix U, it is possible to avoid storing all of the nonzeroes in the matrix L (although a significant portion of L must be retained temporarily to allow for partial pivoting). Finally, by suitably restricting the row interchanges due to partial pivoting and exploiting the block symmetry of P A P^T, it is possible to avoid storing even a structural description of L that might otherwise be required to control the elimination process. This yields a substantial further reduction in pointer storage. Empirically, it appears that the required restriction on the partial pivoting process does not impact the numerical stability of the overall method.

With this discussion in mind, it is possible to design a number of different implementations of sparse Gaussian elimination. A straightforward approach might be to adapt a code such as the Yale Sparse Matrix Package (YSMP).7 Such an implementation would be particularly efficient on scalar machines,13 but it would suffer on vector machines like the CYBER 205 or CRAY-1.14 In addition, the incorporation of partial pivoting would be awkward and might lead to a substantial increase in the required amount of integer pointer storage.
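The restricted row interchanges discussed above can be illustrated with a simplified dense sketch (ours, not the paper's sparse block code): the pivot search at each step is confined to the remaining rows of the current grid block's NVB-row group, so interchanges never cross grid-block boundaries.

```python
import numpy as np

def eliminate_with_local_pivoting(A, nvb):
    """Sketch of restricted partial pivoting: the pivot row is chosen only
    from the remaining rows of the current NVB-row grid-block group."""
    A = A.astype(float).copy()
    n = A.shape[0]
    perm = np.arange(n)
    for k in range(n - 1):
        block_end = (k // nvb + 1) * nvb              # end of this block's rows
        p = k + int(np.argmax(np.abs(A[k:block_end, k])))  # restricted search
        if p != k:
            A[[k, p]] = A[[p, k]]                     # swap rows within block
            perm[[k, p]] = perm[[p, k]]
        A[k+1:, k] /= A[k, k]                         # multipliers (L part)
        A[k+1:, k+1:] -= np.outer(A[k+1:, k], A[k, k+1:])
    return A, perm                                    # L and U packed together

# The factors reproduce the row-permuted matrix: L @ U == A[perm].
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6)) + 6 * np.eye(6)
LU, perm = eliminate_with_local_pivoting(A, nvb=3)
L = np.tril(LU, -1) + np.eye(6)
U = np.triu(LU)
assert np.allclose(L @ U, A[perm])
```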

The alternative considered in this work is based on an algebraic interpretation of the multifrontal elimination method from structural engineering, in combination with an optional tradeoff between computation and storage by which certain matrix entries in U are computed several times in order to avoid storing them. Many of the details of the approach are similar to those described by Eisenstat, et al.6 and Duff and Reid.4,5 The principal modifications in the code itself concern adaptations to the standard simulation data structures in which the matrix and righthand side are presented.

The tradeoff between computation and storage is user-selectable. The code offers three options: OPT=3 (maximum speed, no recomputation), OPT=2 (limited recomputation to save storage), and OPT=1 (full recomputation to minimize storage). As will be evident from the numerical results in the next section, OPT=3 and OPT=2 seem the most useful.
The major features of the implementation include the following:

1. The innermost loops involve computations on dense submatrices rather than on sparse vectors. This allows more efficient coding of the loops and enhances vectorization. In particular, the Gaussian elimination computation itself is really dense Gaussian elimination; and, in order to enhance performance on vector machines, we have adopted the algorithmic approach suggested by Dongarra and Eisenstat.3 In our code, the elimination loops are unrolled to a depth of four (a sketch of this kind of unrolled update follows this list).

2. Extremely large systems of equations may be solved by the method without the use of any auxiliary storage such as disk. The major reason for this is that the required amount of integer pointer storage is quite low (as a percentage of the number of nonzeroes in L or U). Further enhancement of the method to include the use of auxiliary storage would be quite straightforward.

3. Partial pivoting may be incorporated in the elimination computation at very little additional cost.5 While this may not be especially important for black-oil problems, it is likely that pivoting is required on a significant portion of thermal and compositional models. Experience indicates that local pivoting by interchanging equations within individual grid blocks is usually sufficient to insure numerical stability, and a slight generalization of this restricted pivoting is quite natural with our method. In the long run, it may turn out that our method has its greatest utility when pivoting is required, since the use of pivoting can substantially increase the storage demands of other direct methods. (We note, however, that the results presented later in this paper were obtained without pivoting.)

4. As compared to the D4-Gauss or band Gaussian elimination methods, our sparse method requires substantially less arithmetic and generates a sparser set of factors for A. This is especially true when the matrix blocks R and C are nonzero, as is the case when implicit bottomhole pressure is required for wells completed in multiple layers of a model. It is worth noting, however, that a well-coded D4-Gauss method may be able to make more efficient use of storage in certain situations where partial pivoting is not required.
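The following sketch suggests the flavor of the depth-four unrolling mentioned in item 1. It is a scalar illustration of the Dongarra-Eisenstat device, not the paper's Fortran kernels: four multipliers are applied per pass over the target row, so each target element is loaded and stored once per four updates rather than once per update (on a vector machine the intermediate results stay in registers).

```python
def apply_updates_unrolled(target, rows, mults):
    """Illustrative depth-4 unrolled elimination update: computes
    target -= sum_i mults[i] * rows[i], four source rows per pass."""
    k, n = len(rows), len(target)
    for i in range(0, k - k % 4, 4):
        m0, m1, m2, m3 = mults[i:i+4]
        r0, r1, r2, r3 = rows[i:i+4]
        for j in range(n):
            target[j] -= m0*r0[j] + m1*r1[j] + m2*r2[j] + m3*r3[j]
    for i in range(k - k % 4, k):   # remainder loop when k is not a multiple of 4
        m, r = mults[i], rows[i]
        for j in range(n):
            target[j] -= m * r[j]
    return target

# Usage: eliminate the contributions of five pivot rows from one row.
row = [10.0, 9.0, 8.0]
srcs = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0],
        [1.0, 1.0, 1.0], [2.0, 0.0, 0.0]]
ms = [1.0, 2.0, 0.5, 1.0, 0.25]
apply_updates_unrolled(row, srcs, ms)
```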
NUMERICAL RESULTS

In this section we report on several numerical experiments designed to investigate aspects of the performance of our implementation of sparse Gaussian elimination. Three sets of experiments were run. The first set provides information on possible tradeoffs between storage and time and on the effect of a vectorizing compiler on speed. The second set of experiments illustrates the performance of the method on realistic reservoir models. Finally, the last set of experiments contains performance data for a range of problem sizes in two and three dimensions and allows direct comparison with previous implementations of sparse Gaussian elimination for reservoir simulation models. As noted earlier, none of our examples require partial pivoting, so this feature of our method was turned off for these experiments.

All the examples in Test Sets 1 and 3 arise from reservoir models on regular grids. The grid dimensions and number of variables/equations per grid block are shown in Table 1. These models are based on 5-point (2-D) or 7-point (3-D) difference operators and contain neither faults nor implicitly-treated well variables.

The two examples used in Test Set 2 are based on actual field models. FIELD1 has a 35 x 11 x 13 grid containing 5005 cells, of which only 2898 have nonzero pore volume. The model contains 18 wells and includes an implicit bottomhole pressure treatment. The most effective approach to simulating the model uses a sequential formulation, and it has been found that the pressure equation can be extremely difficult for some standard iterative methods. (However, see Wallis, et al.12 for discussion of an iterative method that is extremely effective for this problem.) For models such as this, it can be useful to apply direct methods, particularly during the initial period of model setup and testing, even if it is possible to tune up an iterative method for the most efficient final set of production runs.

FIELD2, the second example in Test Set 2, has a 41 x 25 x 4 grid containing 4100 grid blocks. Again there are a large number of zero-pore-volume blocks, so only 2125 blocks are active in the model. The model contains faults that both create 175 non-standard inter-block connections and seal off certain parts of the reservoir from the remainder. In addition, it is necessary to use an implicit bottomhole pressure treatment for the seven wells. As with FIELD1, the combination of features creates a difficult situation for most iterative methods and all but eliminates any Gaussian elimination methods based on two-cyclic matrices or banded solvers.

All the test runs were made on a CRAY-1-S using a program written entirely in Fortran. The compiler used was CRAY's CFT 1.10 compiler (in some cases with vectorization turned off). Reported CPU times are in seconds, and storage figures are in thousands
of words. Storage figures include all storage except that for the original matrix and righthand side. The reported rates are millions of floating point operations per second, counting only operations involved in Gaussian elimination and backsolution. We note that where unvectorized times are given, the relative comparisons among times are qualitatively similar to what has been observed on a Digital Equipment Co. VAX 11/780 computer.

Apart from loop unrolling and the overall algorithm design discussed in the previous section, no special effort was made to vectorize the sparse Gaussian elimination code. It is highly probable that improvement could be achieved through the use of certain assembly language routines available as part of system or mathematical libraries on the CRAY-1-S, but this has not been attempted for this study.

The results from Test Set 1 are shown in Table 2. The data give insight into several aspects of the new method:
1. The method allows quite large problems to be solved in relatively small amounts of storage. For example, by choosing OPT=2, it is possible to solve the problem corresponding to the 50 x 50 grid using approximately 11 vectors of length 2500 (NGB). Even choosing OPT=3 is reasonably storage-efficient when compared to band Gaussian elimination or other implementations of sparse Gaussian elimination. When NVB > 1, the integer storage overhead for the method may be reduced below 10%.

2. Depending on the availability of computing resources and the particular charging algorithm in use, the use of OPT=2 may provide an attractive tradeoff between storage and time. As a conservative example, consider a situation in which the charging algorithm defines cost as the product of storage and time. In such a case, the costs of solving the linear systems in Test Set 1 may be reduced by up to 40% (30 x 30 grid) by using OPT=2 (this arithmetic is checked in the short sketch after this list). This example is conservative because many cost algorithms use a higher power of the storage measure, and, in virtual storage systems, the use of extra storage may lead to significant increases in other costs that are not measured in our data.

3. It would appear that using OPT=1 is almost never an attractive alternative, since it increases CPU time substantially without providing much additional storage reduction over OPT=2.

4. The method can achieve a significant level of vectorization, particularly when NVB > 1. In the case of the 8 x 8 x 8 grid for OPT=3, vectorization leads to a 3.5-fold increase in speed. For NVB = 1, the relatively small amount of vectorization is somewhat disappointing and is due largely to the overall code complexity and the amount of time spent outside the innermost loops.
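The figures cited in items 2 and 4 can be checked directly against the Table 2 rows (this is our arithmetic on the published data, nothing more):

```python
# 30 x 30, NVB = 3: cost = (integer + real storage) * vectorized time.
cost_opt3 = (13.0 + 133.2) * 0.92    # OPT=3: ~134.5
cost_opt2 = (7.0 + 40.5) * 1.72      # OPT=2: ~81.7
print(1 - cost_opt2 / cost_opt3)     # ~0.39, i.e., the ~40% reduction cited

# 8 x 8 x 8, NVB = 3, OPT=3: unvectorized vs. vectorized time.
print(4.77 / 1.34)                   # ~3.56, the ~3.5-fold speedup cited
```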
Table 3 contains the results from Test Set 2. The main purpose of running these examples was to demonstrate that the qualities of the method observed on model data carry over to realistic reservoir models. We believe that this is borne out by the data in Table 3, since we again observe many of the properties noticed in our discussion of Test Set 1. In particular, there seems to be a significant amount of vectorization despite the decrease in structural regularity of the linear equations due to faults, implicit bottomhole pressure, and inactive grid blocks.

The results from Test Set 3 are given in Table 4. One striking feature of these data is the significant improvement in the speed of the method with increasing problem size. There are two main reasons for this: first, as problem size increases, a greater portion of total time is spent in the innermost loops (which are the most efficient); and, second, the average vector length increases with the problem size. Since there is little evidence that performance is reaching an asymptotic level, it is to be expected that the speed would continue to improve for larger examples than those considered in this work.

Several of the examples in Test Set 3 are identical to examples considered by Woo and Levesque.14 A comparison of results shows that our method is more dependent on problem size (i.e., vector length) than their YSMP-like implementation, but that its peak performance can be substantially better for large problems. For small problems, the vector lengths are short, and the YSMP-like approach appears to be faster than our new method. However, the situation is entirely reversed when the example problems approach the sizes more typically encountered in actual modeling. For these problems, the new method is up to three times faster.

DISCUSSION AND CONCLUSION

In this paper we have reported on experience with a new implementation of sparse Gaussian elimination. We began with the goal of evaluating such methods in the context of current trends in simulation and computing. While our results are certainly not conclusive, it does appear that our new implementation offers significantly better performance on computers like the CRAY-1 than implementations reported previously. Moreover, we have seen that it is now possible to efficiently incorporate partial pivoting into a sparse Gaussian elimination code and to use such a code to deal effectively with linear equations including the effects of a variety of modeling complexities such as faults, implicit well terms, and inactive grid blocks.

Despite these comments, however, we do not recommend that sparse Gaussian elimination be the sole direct method in a reservoir simulator. For a significant range of problems not requiring partial pivoting, it appears from our experience that a well-coded D4-Gauss routine may be both faster and more storage-efficient than sparse Gaussian elimination. This is especially true for two-dimensional black-oil models with NVB = 1 on vector machines. On scalar machines, and for large values of NVB, sparse Gaussian elimination becomes preferable on much

smaller problems. Of course, it remains the case that the use of implicit well equations, faulted grids, or denser finite difference operators that destroy the two-cyclic nature of the coefficient matrix will cause D4-Gauss to be inapplicable. In such instances, sparse Gaussian elimination will very likely be the only suitable direct method.
REFERENCES

1. Appleyard, J.R. and Cheshire, I.M.: "Nested Factorization," Paper SPE 12265 presented at the 7th SPE Symposium on Reservoir Simulation (1983) 315-324.

2. Calahan, D.A.: "Vectorized Direct Solvers for 2-D Grids," Paper SPE 10522 presented at the 6th SPE Symposium on Reservoir Simulation, New Orleans, LA (1982) 489-497.

3. Dongarra, J.J. and Eisenstat, S.C.: "Squeezing the Most out of an Algorithm in CRAY FORTRAN," ACM Trans. Math. Software 10 (1984) 221-230.

4. Duff, I.S. and Reid, J.K.: "The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations," ACM Trans. Math. Software 9 (1983) 302-325.

5. Duff, I.S. and Reid, J.K.: "The Multifrontal Solution of Unsymmetric Sets of Linear Equations," SIAM J. Sci. Stat. Comput. 5 (1984) 633-641.

6. Eisenstat, S.C., Schultz, M.H., and Sherman, A.H.: "Software for Sparse Gaussian Elimination with Limited Core Storage," in Sparse Matrix Proceedings 1978, I.S. Duff and G.W. Stewart, eds., SIAM Press (1979) 135-153.

7. Eisenstat, S.C., Schultz, M.H., Sherman, A.H., and Gursky, M.C.: "The Yale Sparse Matrix Package II: Nonsymmetric Matrices," Report 114, Department of Computer Science, Yale University (1977).

8. Elman, H., Eisenstat, S.C., and Schultz, M.H.: "Block-Preconditioned Conjugate Gradient-Like Methods for Numerical Reservoir Simulation," Paper SPE 13534 presented at the 8th SPE Symposium on Reservoir Simulation, Dallas, TX (1985).

9. George, J.A. and Liu, J.W.: Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall (1981) 115-137.

10. Stanat, P.L. and Nolen, J.S.: "Performance Comparisons for Reservoir Simulation Problems on Three Supercomputers," Paper SPE 10640 presented at the 6th SPE Symposium on Reservoir Simulation, New Orleans, LA (1982) 593-602.

11. Price, H.S. and Coats, K.H.: "Direct Methods in Reservoir Simulation," SPE Journal (1974) 295-308.

12. Wallis, J.R., Kendall, R.P., and Little, T.E.: "Constrained Residual Acceleration of Conjugate Residual Methods," Paper SPE 13536 presented at the 8th SPE Symposium on Reservoir Simulation, Dallas, TX (1985).

13. Woo, P.T., Eisenstat, S.C., Schultz, M.H., and Sherman, A.H.: "Application of Sparse Matrix Techniques to Reservoir Simulation," in Sparse Matrix Computations, J.R. Bunch and D.J. Rose, eds., Academic Press (1976) 427-438.

14. Woo, P.T. and Levesque, J.M.: "Benchmarking a Sparse Elimination Routine on the Cyber 205 and the CRAY-1," Paper SPE 10526 presented at the 6th SPE Symposium on Reservoir Simulation, New Orleans, LA (1982) 535-538.

15. Woo, P.T., Roberts, S.J., and Gustavson, F.G.: "Application of Sparse Matrix Techniques in Reservoir Simulation," Paper SPE 4544 presented at the 48th Annual Meeting of the SPE, Las Vegas, Nevada (1973).
                         TABLE 1
             Models Used in Test Sets 1 and 3

Grid            NGB    NVB (Test Set 1)   NVB (Test Set 3)

20 x 20          400          --                1, 3
30 x 30          900           3                1, 3
40 x 40         1600          --                1, 3
50 x 50         2500           1                1, 3
80 x 40         3200          --                1, 3
6 x 6 x 6        216          --                1, 3
8 x 8 x 8        512           3                1, 3
10 x 10 x 10    1000           1                1

                              TABLE 2
                       Results of Test Set 1

                            Storage           Vectorized     Unvectorized
Grid           NVB  OPT  Integer   Real      Time   Rate     Time   Rate

50 x 50         1    3     31.7     50.4     0.57    4.1     0.92    2.5
                1    2     16.2     12.2     1.25    3.1     1.88    2.1
                1    1     12.7     12.2     3.37    1.8     4.50    1.3

10 x 10 x 10    1    3     16.5     71.1     0.54   10.5     1.34    4.2
                1    2      6.1     48.8     0.90    8.0     1.97    3.7
                1    1      5.4     48.8     1.74    4.5     3.00    2.6

30 x 30         3    3     13.0    133.2     0.92   12.2     2.34    4.8
                3    2      7.0     40.5     1.72    9.9     3.96    4.3
                3    1      6.5     40.5     3.49    6.9     6.82    3.5

8 x 8 x 8       3    3      8.9    211.0     1.34   21.7     4.77    6.1
                3    2      4.3    149.3     1.93   17.6     6.08    5.6
                3    1      3.8    149.3     3.06   12.2     7.81    4.8
                              TABLE 3
                       Results of Test Set 2

                       Storage           Vectorized     Unvectorized
Model    NVB  OPT  Integer   Real       Time   Rate     Time   Rate

FIELD1    --   3     55.0    153.4      1.66   11.2     4.03    4.6
FIELD1    --   2     33.1     57.8      3.13    9.4     7.05    4.2
FIELD2    --   3     45.9     90.0      1.17    6.9      --     --
FIELD2    --   2     25.1     53.3      2.45    5.3      --     --

                          TABLE 4
                   Results of Test Set 3

                             Storage         Vectorized
Grid           NVB  OPT  Integer   Real     Time   Rate

20 x 20         1    3      4.9      5.7    0.07    1.5
30 x 30         1    3     11.2     15.1    0.18    2.4
40 x 40         1    3     20.1     30.3    0.34    3.2
50 x 50         1    3     31.7     50.4    0.57    4.1
80 x 40         1    3     40.6     62.6    0.73    4.2
6 x 6 x 6       1    3      3.2      6.5    0.07    2.7
8 x 8 x 8       1    3      7.9     23.7    0.20    5.6
10 x 10 x 10    1    3     16.5     71.1    0.54   10.5
20 x 20         3    3      5.7     49.6    0.33    8.2
30 x 30         3    3     13.0    133.2    0.92   12.2
40 x 40         3    2     13.4     75.2    3.63   12.4
50 x 50         3    2     21.2    115.8    6.83   14.9
80 x 40         3    2     27.3    131.8    9.20   15.1
6 x 6 x 6       3    3      3.7     57.4    0.36   12.1
8 x 8 x 8       3    3      8.9    211.0    1.34   21.7
