Editorial
Multigrid Methods
SUMMARY
This special issue contains papers from the Thirteenth Copper Mountain Conference on Multigrid Methods,
held in the Colorado Rocky Mountains on March 19–23, 2007, co-chaired by Van Henson and Joel
Dendy. The papers address a variety of applications and cover a breadth of topics, ranging from theory
to high-performance computing. Copyright © 2008 John Wiley & Sons, Ltd.
KEY WORDS: multigrid; image processing; adaptive refinement; domain decomposition; Karhunen–Loève
expansion; eigensolver; Hodge decomposition
The First Copper Mountain Conference on Multigrid Methods was organized in 1983 by Steve
McCormick, who persevered to chair nine more in this biennial series before handing over the reins
in 2003. Today, the conference is widely regarded as one of the premier international conferences
on multigrid methods. In 1990, it was joined by the equally successful conference on iterative
methods, chaired by Tom Manteuffel. The 2007 multigrid meeting was co-chaired by the now
three-time veterans Van Henson and Joel Dendy.
The conference began with three tutorial sessions given by Van Henson and Craig Douglas. The
sessions covered multigrid basics as well as more advanced topics such as nonlinear multigrid and
algebraic multigrid (AMG). The remaining five days of the conference were organized around a
series of 25-min talks, allowing ample time for individual research discussions with colleagues.
The student paper competition produced three winners, Hengguang Li (Penn State University),
Christian Mense (Technical University of Bonn), and Hisham Zubair (University of Delft), who
presented their papers in the student session.
This special issue contains 10 papers from the conference, which are summarized below.
De Sterck et al. [1] explore two efficiency-based refinement strategies for the adaptive finite
element solution of partial differential equations (PDEs). The goal is to reach a pre-specified bound
on the global discretization error with a minimal amount of work. The methods described require a
multigrid method that is optimal on adaptive grids with potentially higher-order elements.
De Sterck et al. [2] introduce long-range interpolation strategies for AMG. The resulting AMG
methods exhibit dramatic reductions in complexity costs on parallel computers while maintaining
near-optimal multigrid convergence properties.
Rosseel et al. [3] describe an AMG method for solving stochastic PDEs. The stochastic finite
element method is used to transform the problem to a large system of coupled PDEs, and the
AMG method is used to solve the system.
Bell and Olson [4] propose a general AMG approach for the solution of discrete k-form
Laplacians. The method uses an aggregation approach and maintains commutativity of the coarse
and fine de Rham complexes.
Stürmer et al. [5] introduce a fast multigrid solver for applications in image processing, including
image denoising and non-rigid diffusion-based image registration. The solver utilizes architecture-
aware optimizations and is compared with solvers based on fast Fourier transforms.
Köstler et al. [6] develop a geometric multigrid solver for optical flow and image registration
problems. The collective pointwise smoothers used are analyzed with Fourier analysis, and the
method is applied to synthetic and real-world images.
Michelini and Coyle [7] introduce an alternative to classical local Fourier analysis (LFA) as a
tool for designing intergrid transfer operators in multigrid methods. A harmonic aliasing property
is introduced and the approach is compared and contrasted with LFA.
Brezina et al. [8] introduce an eigensolver based on the smoothed aggregation (SA) method
that produces an approximation to the minimal eigenvector of the system. The ultimate aim of the
work is to improve the so-called adaptive SA method, which has been shown to be a highly robust
solver.
Zhu [9] derives convergence theory for overlapping domain decomposition methods for second-
order elliptic equations with large jumps in coefficients. It is shown that the convergence rate is
nearly uniform with respect to the jumps and mesh size.
Brannick et al. [10] analyze a multigrid V-cycle scheme for solving the discretized 2D Poisson
equation with corner singularities. The method is proven to be uniformly convergent for finite
element discretizations of the Poisson equation on graded meshes, and supporting numerical
experiments are supplied.
The 2007 conference was held in cooperation with the Society for Industrial and Applied Math-
ematics and sponsored by the Lawrence Livermore and Los Alamos National Laboratories, Front
Range Scientific Computation, Inc., the Department of Energy, the National Science Foundation,
and IBM Corporation. The Program Committee members for the conference were Susanne Brenner,
Craig Douglas, Robert Falgout, Jim Jones, Kirk Jordan, Tom Manteuffel, Steve McCormick, David
Moulton, Kees Oosterlee, Joseph Pasciak, Ulrich Rüde, John Ruge, Klaus Stüben, Olof Widlund,
Ulrike Yang, Irad Yavneh, and Ludmil Zikatanov. The Program Committee served as Guest Editors
for the special issue.
We thank the editors of Numerical Linear Algebra with Applications for hosting this special issue,
especially Panayot Vassilevski, for his invaluable help and guidance. This work was performed
under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory
under Contract DE-AC52-07NA27344.
REFERENCES
1. De Sterck H, Manteuffel T, McCormick S, Nolting J, Ruge J, Tang L. Efficiency-based h- and hp-refinement
strategies for finite element methods. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.567.
2. De Sterck H, Falgout RD, Nolting JW, Yang UM. Distance-two interpolation for parallel algebraic multigrid.
Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.559.
3. Rosseel E, Boonen T, Vandewalle S. Algebraic multigrid for stationary and time-dependent partial differential
equations with stochastic coefficients. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.568.
4. Bell N, Olson LN. Algebraic multigrid for k-form Laplacians. Numerical Linear Algebra with Applications
2008; DOI: 10.1002/nla.577.
5. Stürmer M, Köstler H, Rüde U. A fast full multigrid solver for applications in image processing. Numerical
Linear Algebra with Applications 2008; DOI: 10.1002/nla.563.
6. Köstler H, Ruhnau K, Wienands R. Multigrid solution of the optical flow system using a combined diffusion-
and curvature-based regularizer. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.576.
7. Michelini PN, Coyle EJ. A semi-algebraic approach that enables the design of inter-grid operators to optimize
multigrid convergence. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.579.
Copyright © 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:85–87
ROBERT D. FALGOUT
GUEST EDITOR
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
Livermore, CA, U.S.A.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:89–114
Published online 17 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.567
SUMMARY
Two efficiency-based grid refinement strategies are investigated for adaptive finite element solution of
partial differential equations. In each refinement step, the elements are ordered in terms of decreasing
local error, and the optimal fraction of elements to be refined is determined based on efficiency measures
that take both error reduction and work into account. The goal is to reach a pre-specified bound on
the global error with minimal amount of work. Two efficiency measures are discussed, ‘work times
error’ and ‘accuracy per computational cost’. The resulting refinement strategies are first compared for
a one-dimensional (1D) model problem that may have a singularity. Modified versions of the efficiency
strategies are proposed for the singular case, and the resulting adaptive methods are compared with a
threshold-based refinement strategy. Next, the efficiency strategies are applied to the case of hp-refinement
for the 1D model problem. The use of the efficiency-based refinement strategies is then explored for
problems with spatial dimension greater than one. The ‘work times error’ strategy is inefficient when the
spatial dimension, d, is larger than the finite element order, p, but the ‘accuracy per computational cost’
strategy provides an efficient refinement mechanism for any combination of d and p. Copyright © 2008
John Wiley & Sons, Ltd.
∗Correspondence to: H. De Sterck, Department of Applied Mathematics, University of Waterloo, Waterloo, Ont., Canada.
†E-mail: hdesterck@uwaterloo.ca
1. INTRODUCTION
Adaptive finite element methods are being used extensively as powerful tools for approximating
solutions of partial differential equations (PDEs) in a variety of application fields, see, e.g. [1–3].
This paper investigates the behavior of two efficiency-based grid refinement strategies for adaptive
finite element solution of PDEs. It is assumed that a sharp, easily computed local a posteriori
error estimator is available for the finite element method. In each refinement step, the elements
are ordered in terms of decreasing local error, and the optimal fraction of elements to be refined
in the current step is determined based on efficiency measures that take both error reduction
and work into account. The goal is to reach a pre-specified bound on the global error with a
minimal amount of work. It is assumed that optimal solvers are used for the discrete linear
systems and that the computational work for solving these systems is, thus, proportional to the
number of degrees of freedom (DOF). Two efficiency measures are discussed. The first efficiency
measure is ‘work times error’ efficiency (WEE), which was originally proposed in [4]. A second
measure proposed in this paper is called ‘accuracy per computational cost’ efficiency (ACE). In
the first part of the paper, the performance of the two measures is compared for a standard
one-dimensional (1D) model problem with solution $x^\alpha$, which may exhibit a singularity at the origin,
depending on the value of the parameter $\alpha$. The accuracy of the resulting grid is compared with
the asymptotically optimal ‘radical grid’ [3, 5]. Modified versions of the efficiency strategies are
proposed for the singular case, and the resulting adaptive methods are compared with a threshold-
based refinement strategy. The efficiency strategies are also applied to the hp-refinement case
for the 1D model problem, and the results are compared with the ‘optimal geometric grid’ for
hp-refinement that was derived in [5]. In the last part of the paper, the use of the efficiency-
based refinement strategies is explored for problems with spatial dimension d>1. The ‘work times
error’ strategy turns out to be inefficient when the spatial dimension, d, is larger than the finite
element order, p, but the ‘accuracy per computational cost’ strategy provides an efficient refinement
mechanism for any combination of d and p. This is illustrated for a model problem in two
dimensions (2D).
This paper is organized as follows. In the following section, the efficiency-based h-refinement
strategies are described, along with the notation used in this paper, the model problem, and
assumptions on the PDE problems, finite element methods, error estimators, and linear solvers
considered. The performance of the WEE and ACE refinement strategies for the 1D model problem
is discussed in Section 3. Modified WEE and ACE refinement strategies for the singular case
are considered in Section 4. In Section 5, efficiency-based hp-refinement strategies are discussed
and illustrated for the 1D test problem. Section 6 describes how the efficiency-based refinement
strategies can be applied to 2D problems. Throughout the paper, numerical tests illustrate the
performance of the proposed methods. Smooth and singular 1D model problems are introduced in
Section 2.2, and the performance of the proposed h- and hp-refinement strategies in 1D is discussed
in Sections 3–5. A smooth 2D test problem is proposed in Section 6.2, and 2D h-refinement results
are discussed in Section 6.3. Conclusions are formulated in Section 7.
2.1. Assumptions on PDE problem, error estimate, refinement process, and linear solver
Consider a PDE expressed abstractly as
$$Lu = f \quad \text{in } \Omega \subset \mathbb{R}^d \qquad (1)$$
with appropriate boundary conditions and solution space $V$. Assume that continuity and coercivity
bounds for the corresponding bilinear form can be verified in some suitable norm. Let
$\mathcal{T}_h$ be a regular partition of the domain, $\Omega$, into finite elements [3, 6], i.e. $\overline{\Omega} = \bigcup_{\tau \in \mathcal{T}_h} \overline{\tau}$, with
$h = \max\{\mathrm{diam}(\tau) : \tau \in \mathcal{T}_h\}$. In this paper we assume, for simplicity, that the elements are squares in
2D and cubes in three dimensions (3D). Let $V_h$ be a finite-dimensional subspace of $V$ and $u_h \in V_h$
a finite element approximation such that the following error estimate holds:
$$\|u - u_h\|_{H^m(\Omega)} \le C h^{s-m} \|u\|_{H^s(\Omega)} \qquad (2)$$
where $0 \le m < s \le p+1$, $m$ and $p$ are integers and $s$ is a real number. Here, $p$ is the polynomial
order of the finite element method. Furthermore, assume that we obtain a sharp a posteriori error
estimate $E(u_h, f)$ that is equivalent to $\|u - u_h\|_{H^m(\Omega)}$. The associated error functional is given by
$F(u_h, f) = E^2(u_h, f)$. For example, the $L^2$ functional is a natural a posteriori error estimate for
first-order system least squares (FOSLS) finite element methods, and equivalence to the $H^1$ norm
has been proved for several relevant second-order PDE systems of elliptic type [4, 7–9]. The local
value of the error, $E$, on element $\tau_j$ is denoted by $\varepsilon_j$.
Consider an adaptive hp-refinement process of the following form. The refinement process
starts on a coarse grid with uniform element size $h$ and order $p = 1$ (level 0) and proceeds
through levels $\ell = 1, 2, \ldots$, until the error measure, $E_\ell(u_h, f)$, has a value less than a given
bound. In each step, some elements may be refined in $h$ by splitting them into $2^d$ sub-elements,
and some elements may be refined in p by doubling the element order. The decision of which
elements to refine is based on the information provided by the local error estimator, and by
heuristics that may take into account predicted error reduction and work. In particular, we consider
strategies where the elements are ordered in terms of decreasing local error, such that elements
with larger error are considered for refinement first. Standard threshold-based approaches then
may refine, for example, a fixed fraction of the elements in every step or a fixed fraction of
the total error functional. Let the work needed to solve the discrete linear system on level $\ell$ be
given by $W_\ell$. Our goal is to reach a pre-specified bound on the global error, $E_L(u_h, f)$, with a
minimal amount of total work, $\sum_{\ell=1}^{L} W_\ell$. Finding this optimal grid sequence may be difficult,
even if we restrict the process to h-refinement alone. Hence, we turn to seeking nearly optimal
solutions by using heuristics of greedy type. We consider refinement heuristics that determine the
fraction of elements to be refined based on optimizing an efficiency measure in every step. We
expect that a desirable grid sequence needs to be a high accuracy sequence, i.e. a grid sequence
for which the error, $E_\ell(N_\ell)$, decreases with nearly optimal order as a function of the number
of DOF, $N_\ell$, on grid level $\ell$. Note that our strategy also results in an approximate solution to
the following problem: find a mesh with a fixed number of DOF that minimizes the error. To
this end, one can simply stop the above described process when the specified number of DOF is
reached.
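The standard threshold-based alternative mentioned above, which refines the elements with the largest local errors until they account for a fixed fraction of the total error functional (often called bulk or Dörfler marking), can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. The function name `mark_fixed_error_fraction` and the threshold `theta` are hypothetical, and the local errors are assumed to be supplied as squared values $\varepsilon_j^2$.

```python
def mark_fixed_error_fraction(eps2, theta):
    """Indices of the elements to refine under fixed-fraction (bulk) marking.

    eps2  -- local squared errors eps_j^2, one per element
    theta -- hypothetical threshold in (0, 1]
    Elements are taken in order of decreasing local error until their
    errors add up to at least theta times the total error functional.
    """
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    order = sorted(range(len(eps2)), key=lambda j: eps2[j], reverse=True)
    target = theta * sum(eps2)
    marked, acc = [], 0.0
    for j in order:
        marked.append(j)
        acc += eps2[j]
        if acc >= target:
            break
    return marked


local_errors = [0.01, 0.5, 0.02, 0.3, 0.01]
print(mark_fixed_error_fraction(local_errors, 0.8))  # [1, 3]
```

The efficiency-based strategies keep the same ordering by decreasing local error. The difference is that they replace the fixed `theta` with a fraction chosen anew in every step from predicted error reduction and work.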
We allow the domain to contain singularities, i.e. points or lines in whose neighborhood the
full convergence order of the finite element method cannot be attained due to lack of smoothness
of the solution. For simplicity, assume that those singular points or lines can be located only at
coarse-level grid points or grid lines and that their power and location are known. This includes
the case where the singularities occur at the boundaries of the simulation domain. If the location
and strength of the singularities are not known in advance, they can be estimated by monitoring
reduction rates of local error functionals during a few steps of initial uniform refinement.
It is assumed that optimal solvers, e.g. multigrid, are used for the discrete linear systems. The
computational work for solving these systems is, thus, assumed to be a fixed constant times the
number of DOF: $W_\ell = c N_\ell$.
with exact solution given by $u = x^\alpha$. While the efficiency-based refinement strategies can be applied
to various types of finite element methods and associated error estimates, we choose to illustrate
the refinement strategies for model problem (3) using standard Galerkin finite element methods of
order $p$, with the error estimated by the $H^1$ seminorm of the actual error, $e = u - u_h$, i.e. $F(u_h, f) =
\|(u - u_h)'\|^2_{L^2(\Omega)}$ and $\varepsilon_j^2 = \|(u - u_h)'\|^2_{L^2(\tau_j)}$. These are equivalent to the $H^1$ norm, since it turns out that
$e(x_i) = 0$ at each grid point for our model problem [3, 5]. Note that $u \in H^{1+(\alpha-1/2)-\epsilon}((0, 1))$ for
any $\epsilon > 0$. If we choose $\frac{1}{2} < \alpha < \frac{3}{2}$ such that $u \in H^1((0, 1))$ but $u \notin H^2((0, 1))$, then there is an $x^\alpha$-type
singularity at $x = 0$. We choose this model problem and this error estimator because asymptotically
optimal h- and hp-finite element grids have been developed for them [3, 5], which can be used as
a point of comparison for the refinement strategies to be presented in this paper. In addition, it
turns out that the finite element approximations can be obtained easily, namely, by interpolation
for $p = 1$ and by integrating a truncated Legendre expansion of $u'(x)$ for $p > 1$. The refinement
strategies presented in this paper can be equally applied to other finite element methods, as is
illustrated in the second part of the paper, where we present results for a 2D problem using the
FOSLS finite element method [4, 8].
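For $p = 1$, the approximation is the linear interpolant. For $u = x^\alpha$ with $\alpha > 1/2$, the $H^1$-seminorm error on each element therefore has a closed form. The Python sketch below uses it to measure the convergence rate on uniform grids. It is our illustration, and the function names are hypothetical. For $\alpha = 2.1$ the error decays like $N^{-1}$. For $\alpha = 0.6$ it decays only like $N^{-(\alpha - 1/2)} = N^{-0.1}$, which is why adaptive refinement is needed in the singular case.

```python
import math


def h1_seminorm_err2(alpha, a, b):
    """Squared H^1-seminorm error of the linear interpolant of u = x**alpha on [a, b].

    With slope m = (b**alpha - a**alpha) / (b - a), the integral of (u' - m)**2 equals
    alpha**2 * (b**(2*alpha-1) - a**(2*alpha-1)) / (2*alpha - 1) - m * (b**alpha - a**alpha).
    Valid for alpha > 1/2, so that u' is square integrable at x = 0.
    """
    du = b**alpha - a**alpha
    m = du / (b - a)
    return alpha**2 * (b**(2 * alpha - 1) - a**(2 * alpha - 1)) / (2 * alpha - 1) - m * du


def uniform_error(alpha, n):
    """Global error E = sqrt(F) for p = 1 on a uniform grid of n elements on (0, 1)."""
    h = 1.0 / n
    return math.sqrt(sum(h1_seminorm_err2(alpha, j * h, (j + 1) * h) for j in range(n)))


def observed_rate(alpha, n):
    """Observed q in E ~ N**(-q), measured between n and 2n elements."""
    return math.log2(uniform_error(alpha, n) / uniform_error(alpha, 2 * n))


print(f"alpha = 2.1: rate {observed_rate(2.1, 100):.3f}")  # close to p = 1
print(f"alpha = 0.6: rate {observed_rate(0.6, 100):.3f}")  # close to alpha - 1/2 = 0.1
```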
i.e.
$$r_{\mathrm{opt}} = \arg\min_{r \in (0,1]} \gamma(r)\,\eta(r) \qquad (5)$$
The motivation for this heuristic is as follows: more work on the current level is justified when it
results in increased error reduction that offsets the extra work. While this choice does not guarantee
that a globally optimal grid sequence is obtained, this local optimization in each step results in an
overall strategy of greedy type, which can be expected to lead to a reasonable approximation to
the optimal grid sequence.
We also propose a second strategy, ACE. We define the predicted effective functional reduction
factor
$$\gamma(r)_{\mathrm{eff}} = \gamma(r)^{1/\eta(r)} \qquad (6)$$
The fraction, $r$, of elements to be refined on the current level is determined by minimizing this
effective reduction factor, which is the same as minimizing $\log(\gamma(r)_{\mathrm{eff}})$, i.e.
$$r_{\mathrm{opt}} = \arg\min_{r \in (0,1]} \frac{\log(\gamma(r))}{\eta(r)} \qquad (7)$$
The effective functional reduction factor, $\gamma(r)_{\mathrm{eff}}$, measures the functional reduction per unit work.
Indeed, compare two hypothetical error-reducing processes with functional reduction factors $\gamma_1$ and
$\gamma_2$, and work proportional to $\eta_1$ and $\eta_2$. Assume that process 2 requires double the work of process
1, $\eta_2 = 2\eta_1$. Then the two processes would be equally effective when $\gamma_2 = \gamma_1^2$, because process 1
could be applied twice to obtain the same error reduction as process 2, using the same total amount
of work as process 2. Minimizing the effective functional reduction in every step, thus, chooses
the fraction, r , of elements to be refined by locally minimizing the functional reduction per unit
work.
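In terms of the effective factor, the equal-effectiveness claim reduces to a one-line identity. Write $\gamma_i$ and $\eta_i$ for the reduction factors and work of the two processes, and take $\gamma_{\mathrm{eff}} = \gamma^{1/\eta}$, the form implied by minimizing $\log(\gamma(r))/\eta(r)$ in (7). Then:

```latex
\gamma_{2,\mathrm{eff}} = \gamma_2^{1/\eta_2}
  = \left(\gamma_1^{2}\right)^{1/(2\eta_1)}
  = \gamma_1^{1/\eta_1}
  = \gamma_{1,\mathrm{eff}}
```

For instance, $\gamma_1 = 1/2$ with $\eta_1 = 1$ and $\gamma_2 = 1/4$ with $\eta_2 = 2$ both give $\gamma_{\mathrm{eff}} = 1/2$.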
Both the strategies of minimizing work times error reduction and minimizing the effective
functional reduction factor are ways of optimizing the efficiency of the refinement process at each
level. Hence, we call the two proposed strategies efficiency-based refinement strategies.
The predicted functional reduction factor, $\gamma(r)$, depends on the error estimate and the smoothness
of the solution. As mentioned above, we consider the case that the error estimate is equivalent to
the $H^1$ norm of $u - u_h$, i.e. $F(u_h, f) \approx \|u - u_h\|^2_{H^1(\Omega)}$ and $\varepsilon_j^2 \approx \|u - u_h\|^2_{H^1(\tau_j)}$. The error has the
following asymptotic behavior [6].
For elements $\tau_j$ in which the solution is smooth (at least in $H^{p+1}(\tau_j)$ if order $p$ elements are
used), we have
$$\varepsilon_j^2 \approx \|u_h - u\|^2_{H^1(\tau_j)} \le C h_j^{2p} \|u\|^2_{H^{p+1}(\tau_j)} \le C M_{p+1} h_j^{2p} h_j \qquad (9)$$
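Here, we can take $M_{p+1} = \sum_{i=0}^{p+1} \|u^{(i)}\|^2_{\infty,\tau_j}$, such that $\|u\|^2_{H^{p+1}(\tau_j)} \le M_{p+1} h_j$. If $\tau_j$ is split into
two equal parts, we have two new elements, $\tau_{j,1}$ and $\tau_{j,2}$, and we can assume that
$$\frac{\varepsilon_{j,1}^2 + \varepsilon_{j,2}^2}{\varepsilon_j^2} \approx \left(\frac{1}{2}\right)^{2p} \qquad (10)$$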
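Estimate (10) can be checked numerically for $p = 1$ on an element where $u = x^\alpha$ is smooth. The sketch below is an illustrative check, not part of the original method. It uses the closed-form $H^1$-seminorm error of the linear interpolant (the helper names are hypothetical) and should give a ratio close to $(1/2)^2 = 0.25$.

```python
def h1_seminorm_err2(alpha, a, b):
    """Squared H^1-seminorm error of the linear (p = 1) interpolant of u = x**alpha on [a, b]."""
    du = b**alpha - a**alpha
    slope = du / (b - a)
    return alpha**2 * (b**(2 * alpha - 1) - a**(2 * alpha - 1)) / (2 * alpha - 1) - slope * du


def split_ratio(alpha, a, h):
    """(eps_{j,1}^2 + eps_{j,2}^2) / eps_j^2 when the element [a, a + h] is halved."""
    mid = a + h / 2
    children = h1_seminorm_err2(alpha, a, mid) + h1_seminorm_err2(alpha, mid, a + h)
    return children / h1_seminorm_err2(alpha, a, a + h)


# Element [0.5, 0.52], where u = x**2.1 is smooth; (10) predicts (1/2)**(2p) = 0.25 for p = 1.
print(f"{split_ratio(2.1, 0.5, 0.02):.4f}")
```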
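However, if $u$ is less smooth in some element $\tau_j$, i.e. if we can assume only that $u \in H^s(\tau_j)$
with $s \in \mathbb{R}$, $s < p+1$, then we have
$$\varepsilon_j^2 \le C h_j^{2(s-1)} \|u\|^2_{s,\tau_j} \qquad (11)$$
For simplicity, we consider only the highly singular case here, for which $s \ll p+1$. If, again, $\tau_j$ is
split into two, assuming element $\tau_{j,1}$ contains the singularity, then $\varepsilon_{j,1} \gg \varepsilon_{j,2}$ and $\|u\|_{s,\tau_{j,1}} \approx \|u\|_{s,\tau_j}$.
We then obtain
$$\frac{\varepsilon_{j,1}^2 + \varepsilon_{j,2}^2}{\varepsilon_j^2} \approx \frac{\varepsilon_{j,1}^2}{\varepsilon_j^2} \approx \left(\frac{1}{2}\right)^{2(s-1)} \qquad (12)$$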
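The same closed-form error illustrates (12) on the element $[0, h]$ containing the singularity of $u = x^{0.6}$. This sketch again uses a hypothetical helper and $p = 1$, and it takes $s = \alpha + 1/2$ in line with the regularity of $x^\alpha$ noted earlier. The left child dominates the right one by more than two orders of magnitude, and the ratio is close to $(1/2)^{2(s-1)} = (1/2)^{0.2} \approx 0.87$.

```python
def h1_seminorm_err2(alpha, a, b):
    """Squared H^1-seminorm error of the linear (p = 1) interpolant of u = x**alpha on [a, b]."""
    du = b**alpha - a**alpha
    slope = du / (b - a)
    return alpha**2 * (b**(2 * alpha - 1) - a**(2 * alpha - 1)) / (2 * alpha - 1) - slope * du


alpha, h = 0.6, 0.01
s = alpha + 0.5                                 # regularity index of x**alpha, up to epsilon
parent = h1_seminorm_err2(alpha, 0.0, h)        # element containing the singularity
left = h1_seminorm_err2(alpha, 0.0, h / 2)      # child tau_{j,1}, still singular
right = h1_seminorm_err2(alpha, h / 2, h)       # child tau_{j,2}, smooth
ratio = (left + right) / parent
predicted = 0.5 ** (2 * (s - 1))                # right-hand side of (12)

print(f"left / right = {left / right:.0f}")
print(f"ratio = {ratio:.4f}, predicted = {predicted:.4f}")
```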
Suppose the solution is sufficiently smooth in the whole domain. Then the predicted functional
reduction factor, $\gamma(r)$, can be obtained as follows. We apply (10) to the elements that are refined.
The elements that are not refined carry a fraction, $1 - f(r)$, of the error functional, and we assume
that their errors are not reduced. This results in
$$\gamma(r) = 1 - f(r) + f(r)\left(\frac{1}{2}\right)^{2p} \qquad (13)$$
It is cumbersome to give a general expression for the singular case. However, assuming that we
know the power and location of the singularities in advance, one can easily compute $\gamma(r)$ using
(10) and (12).
$$\gamma(r) = 1 - \tfrac{3}{4} f(r) \qquad (14)$$
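To make (5) and (7) concrete, the Python sketch below evaluates both criteria over the candidate fractions $r = i/N$ for a given set of local errors and returns the minimizer. It is an illustration under stated assumptions, not the authors' code. It uses the smooth-case prediction $\gamma(r) = 1 - f(r) + f(r)(1/2)^{2p}$, which reduces to (14) for $p = 1$. It also assumes 1D h-refinement in which each refined element is split in two, so the work increase ratio is taken as $\eta(r) = 1 + r$. The function name is hypothetical.

```python
import math


def efficiency_r_opt(eps2, p, strategy):
    """Fraction r of elements to refine under WEE or ACE (1D h-refinement sketch).

    eps2 -- local squared errors eps_j^2 (any order; sorted here by decreasing error)
    Assumed predictions:
      gamma(r) = 1 - f(r) + f(r) * (1/2)**(2p)   (smooth case)
      eta(r)   = 1 + r   (each refined 1D element splits in two; work ~ N)
    """
    if strategy not in ("WEE", "ACE"):
        raise ValueError("strategy must be 'WEE' or 'ACE'")
    errs = sorted(eps2, reverse=True)
    n, total = len(errs), sum(errs)
    best_r, best_val, acc = None, math.inf, 0.0
    for i in range(1, n + 1):
        acc += errs[i - 1]
        r = i / n
        f = acc / total                       # refined fraction of the error functional
        gamma = 1.0 - f + f * 0.5 ** (2 * p)  # predicted functional reduction factor
        eta = 1.0 + r                         # predicted work increase ratio
        # WEE: work times error, cf. (5); ACE: log of the effective factor, cf. (7)
        val = gamma * eta if strategy == "WEE" else math.log(gamma) / eta
        if val < best_val:
            best_r, best_val = r, val
    return best_r


print(efficiency_r_opt([1.0] * 50, 1, "WEE"))               # equally distributed errors
print(efficiency_r_opt([100.0] + [1e-6] * 99, 1, "ACE"))    # one dominant element
```

With equally distributed errors, both criteria choose uniform refinement ($r = 1$). With one dominant element, they refine only that element ($r = 1/N$).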
Note that, for a given error bound, our ultimate goal is to choose a grid sequence that minimizes
the total work, $\sum_{\ell=1}^{L} W_\ell$, which is the same as minimizing $\sum_{\ell=1}^{L} N_\ell$, based on our assumption
that the work is proportional to $N_\ell$. For a given error bound, the number of elements on the final
grid, $N_L$, is determined by the convergence rate of the global error w.r.t. the DOF, which in fact is
determined by the refinement strategy. For our model problem, it has been shown in [5] that the
rate of convergence is never better than $(Np)^{-p}$, where $N$ is the number of elements and $p$ is the
degree of the polynomial.
For our example problem, an asymptotically optimal final grid, called a radical grid, is described
in [3, 5]:
This grid is optimal in the sense that, in the limit of large N , it results in the smallest error as
a function of the number of DOF. If the WEE or the ACE strategy results in a grid sequence
with approximately optimal convergence rate of the global error w.r.t. DOF, then the number of
elements on the final grid must be close to the optimal number of elements, which depends only
on the given error bound. Because we wish to minimize work, it follows that, among the methods
with approximately optimal convergence rate, the methods for which the sequence $\{N_\ell\}$ increases
fast are preferable. Large refinements are, thus, advantageous.
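A quick calculation shows why fast growth of $\{N_\ell\}$ pays off. If $N_\ell$ grows by a constant factor $q$ per level, the total work $\sum_{\ell=1}^{L} N_\ell$ is about $q/(q-1)$ times $N_L$. That is roughly $2N_L$ for doubling but close to $11N_L$ for 10% growth. This agrees with the ACE run of Figure 2, whose total work of 65 520 is exactly twice its $N_L = 32\,760$. The geometric-growth model and the numbers in the Python sketch below are hypothetical.

```python
def total_work(n0, growth, n_final):
    """Levels, final size and total work sum_{ell=1}^{L} N_ell for geometric growth.

    Hypothetical model: N_0 = n0 and N_ell = growth * N_{ell-1}, stopping at the
    first level with N_L >= n_final; W_ell = c * N_ell with c = 1.
    """
    levels, n, work = 0, float(n0), 0.0
    while n < n_final:
        n *= growth
        levels += 1
        work += n
    return levels, n, work


for q in (2.0, 1.1):
    L, n_L, w = total_work(2, q, 32_000)
    print(f"growth {q}: L = {L}, total work / N_L = {w / n_L:.2f}")
```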
We compare the numerical results of the WEE and ACE strategies and the radical grid for $\alpha = 2.1$
and $p = 1$ in Figures 1–6. In the numerical results, we carry out the refinement process until
$E_L(u_h, f) \le 2\times 10^{-5}$ on the final grid level $L$.
From Figure 1, it can be observed that both strategies result in a highly accurate grid sequence.
Thus, for a given error bound, the difference in the number of elements on the final grid is very
small. This can be verified in Figure 2. Figures 3 and 4 show that the ACE strategy is slightly
more efficient than the WEE strategy for our model problem in the smooth case. There are two
small refinements in the WEE refinement process, while there are no small refinements for the
ACE strategy. It follows that for a given error bound on the final grid, the WEE strategy may
require slightly more total work than the ACE strategy, see Figure 5. Figure 2 shows that, for
both strategies, the local errors in all elements tend to be equally distributed. This explains why
the values of $f(r_{\mathrm{opt}})$ and $r_{\mathrm{opt}}$ are close in Figure 3. From Figure 6 one can see that the predicted
reduction factor $\gamma(r_{\mathrm{opt}})$ is very accurate. This suggests a modification of the refinement process
that can be considered to increase performance: one does not need to solve the linear systems until
the new level is refined enough to have a significant number of additional elements in it. In this
way complexity is never a problem, and we can still have a highly accurate grid sequence.
Figure 2. Local error functional, $\varepsilon_i^2$, versus grid location on the final grid, $\alpha = 2.1$ (no singularity),
$p = 1$: (a) WEE: $N_L = 32\,741$, $E_L = 1.859\times 10^{-5}$, $L = 18$, total work $= 102\,313$ and (b) ACE: $N_L = 32\,760$,
$E_L = 1.858\times 10^{-5}$, $L = 16$, total work $= 65\,520$.
Figure 3. Refined fraction of error functional, $f(r_{\mathrm{opt}})$, versus level, $\ell$, and refined fraction of elements,
$r_{\mathrm{opt}}$, versus level, $\ell$, $\alpha = 2.1$ (no singularity), $p = 1$: (a) WEE and (b) ACE.
Figure 4. Number of elements, $N_\ell$, versus level, $\ell$, $\alpha = 2.1$ (no singularity), $p = 1$: (a) WEE and (b) ACE.
Figure 5. Final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$, for WEE and ACE, $\alpha = 2.1$ (no singularity), $p = 1$.
Here, we assume that the local error in the element that contains x = 0 is always the largest.
Figure 6. Predicted functional reduction factor, $\gamma(r_{\mathrm{opt}})$, and actual functional reduction factor, $g$, versus
level, $\ell$, $\alpha = 2.1$ (no singularity), $p = 1$: (a) WEE and (b) ACE.
The numerical results in Figures 7–12 show that the two refinement strategies fail for this singular
case. Figure 7 shows that the WEE strategy results in a highly accurate grid sequence, while the
ACE strategy becomes inaccurate by comparison with the radical grid. For both strategies, the
local error in the first element, which contains the singularity, is always the largest, see Figure 8.
Hence, it is refined by both the WEE and the ACE strategy in every step. This also confirms that the predicted
reduction factor can be given by (17). The WEE strategy generates a grid sequence with local
errors being nearly equally distributed, but the ACE strategy does not: more than 90% of the global
error accumulates in only 10% of the elements; see Figures 8 and 9.
Figure 8. Local error functional, $\varepsilon_i^2$, versus grid location, $x$, on the final grid, $\alpha = 0.6$ (singular case),
$p = 1$: (a) WEE: $N_L = 6925$, $E_L = 6.169\times 10^{-4}$, $L = 154$, total work $= 192\,775$ and (b) ACE: $N_L = 24\,986$,
$E_L = 6.411\times 10^{-4}$, $L = 106$, total work $= 365\,420$.
Figure 9. Refined fraction of error functional, $f(r_{\mathrm{opt}})$, versus level, $\ell$, and refined fraction of elements,
$r_{\mathrm{opt}}$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) WEE and (b) ACE.
Most refinement steps of the WEE strategy are small refinements: only the first element (possibly with a few other elements)
is continuously being refined (see Figures 9 and 10). This implies that the number of elements
increases slowly as a function of refinement level. It follows that the total work is very large. The
ACE strategy does choose a refinement region with a large fraction of the error in it. However, this
large fraction of error is contained only in a few elements. As a result, only a small fraction of
elements are refined. Thus, the required total work is still large; see Figures 10 and 11.
Figure 11. Final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$, for WEE and ACE, $\alpha = 0.6$ (singular case), $p = 1$.
Compared with the nonsingular case (Figure 5), the slope of the error versus total work plot in Figure 11 is
much less steep, especially in the initial phase of the refinement process. The predicted reduction
factors for both strategies are accurate, see Figure 12. This suggests that we can make the same
modification as for the smooth case to increase performance: one can postpone solving the linear
systems until the number of elements has increased sufficiently. In this way, one can ensure that
the complexity is never a problem, but calculating and minimizing the WEE and ACE functions
many times may be costly as well. In conclusion, for the highly singular case, the WEE strategy
results in an accurate grid sequence but is not efficient due to too many small refinements; the ACE
strategy is worse than the WEE strategy in this case, because the grid sequence is not accurate
and many small refinements are performed.
Figure 12. Predicted functional reduction factor, $\gamma(r_{\mathrm{opt}})$, and actual functional reduction factor, $g$, versus
level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) WEE and (b) ACE.
$m_j$ satisfying
$$\left(\frac{1}{2}\right)^{2 m_j (s_j - 1)} \approx \left(\frac{1}{2}\right)^{2p} \;\Rightarrow\; m_j = \frac{p}{s_j - 1}$$
Note that we assume here that the error in the first, singular new element dominates the sum
of the errors in the other new elements of the graded grid. This is a good approximation
for a strong singularity. For elements in which the solution is smooth, single refinement is
performed: $m_j = 1$. Let $k_j$ be the number of new elements after $\tau_j$ is refined: $k_j = m_j + 1$.
(3) The predicted functional reduction factor, $\gamma(r)$, and the work increase ratio, $\eta(r)$, are
given by
$$\eta(r) = 1 - r + \frac{1}{N}\sum_{j \le rN} k_j, \qquad \gamma(r) = 1 - f(r) + f(r)\left(\frac{1}{2}\right)^{2p} \qquad (18)$$
(4) Find the optimal r defined in (5) for the MWEE strategy and in (7) for the MACE strategy.
(5) Repeat.
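The steps above can be sketched in Python for a single refinement level. This is a minimal illustration, not the authors' code. It assumes that the elements are already sorted by decreasing local error functional, that the refined fraction runs over r = n/N, that m_j = p/(s_j − 1) is rounded up to an integer, and that the WEE measure (5) is η(r)√γ(r), the form given for it in Section 6. Here η(r) is the work increase ratio and γ(r) the predicted functional reduction factor of (18). The names `graded_levels` and `wee_step` are hypothetical.

```python
import math


def graded_levels(p, s=None):
    """Hypothetical helper: number of graded halvings m_j for one element.

    s is the exponent in the single-bisection bound (1/2)**(2*(s - 1)) of a
    singular element; s=None marks a smooth element, which is bisected once.
    """
    if s is None:
        return 1
    # m_j = p / (s_j - 1), rounded up; the tiny shift absorbs round-off
    # such as 1 / (1.1 - 1) = 9.999999999999991.
    return max(1, math.ceil(p / (s - 1.0) - 1e-9))


def wee_step(eps2, p, s):
    """One efficiency-based step using the assumed WEE measure eta*sqrt(gamma).

    eps2: local error functionals, sorted in decreasing order.
    s:    per-element singularity exponents (None for smooth elements).
    Returns (r_opt, eta(r_opt), gamma(r_opt)).
    """
    N = len(eps2)
    total = sum(eps2)
    k = [graded_levels(p, sj) + 1 for sj in s]  # k_j = m_j + 1 new elements
    best = None
    cum_err, cum_k = 0.0, 0
    for n in range(1, N + 1):  # refine the n elements with the largest error
        r = n / N
        cum_err += eps2[n - 1]
        cum_k += k[n - 1]
        f = cum_err / total                     # refined error fraction f(r)
        gamma = 1.0 - f + 0.5 ** (2 * p) * f    # Equation (18)
        eta = 1.0 - r + cum_k / N               # Equation (18)
        measure = eta * math.sqrt(gamma)        # assumed WEE measure
        if best is None or measure < best[0]:
            best = (measure, r, eta, gamma)
    return best[1:]


r_opt, eta, gamma = wee_step([0.6, 0.3, 0.05, 0.05], p=1, s=[None] * 4)
print(r_opt, eta, gamma)
```

On these sample errors the minimizer is r = 0.5, so the two worst elements are bisected. If the first element is instead singular with s = 1.1, each refinement of that element creates k = 11 new elements, and η(r) charges for this extra work.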
4.2. Performance of the modified WEE and ACE h-refinement strategies for singular solutions
We again choose $\alpha = 0.6$ and $p = 1$ for our example problem. There is a singularity at $x = 0$, with error reduction factor bound $(\tfrac{1}{2})^{0.2}$. Therefore, for the element that contains $x = 0$, we use 11-graded refinement ($m = 1/0.1 = 10$). Numerical results are shown in Figures 13–18.
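One plausible reading of "11-graded refinement" assumes the grading factor 1/2 that is mentioned later in this section. Under that reading, the element [0, h] is split at h/2^10, h/2^9, ..., h/2, which gives m + 1 = 11 new elements. The helper name `graded_subdivision` is hypothetical.

```python
def graded_subdivision(a, b, m):
    """Nodes of a geometrically graded split of [a, b] toward a singularity at a.

    Grading factor 1/2 (assumed): interior nodes a + h/2**m, ..., a + h/2,
    which gives m + 1 new elements. Hypothetical helper.
    """
    h = b - a
    return [a] + [a + h * 0.5 ** i for i in range(m, -1, -1)]


nodes = graded_subdivision(0.0, 1.0, 10)
print(len(nodes) - 1)  # 11 new elements
print(nodes[1])        # width of the element touching x = 0
```

The element touching the singularity has width h/2^10, and (2^{-10})^{0.2} = (1/2)^2 = (1/2)^{2p}. This is the balance that the choice m = p/(s − 1) is designed to achieve.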
By comparing the numerical results for the modified strategies with the results for the original methods, we see the following. Both the MWEE and MACE strategies result in highly accurate grid sequences: the convergence rate is very close to the optimal rate (Figure 13). Local error functionals on the final MWEE grid are more equally distributed than on the final MACE grid. For the MWEE strategy, the local error functional in the singular element is only three times larger than in the smooth elements; for the MACE strategy, however, that ratio is as large as 1000 (Figure 14).
[Plot residue, caption not recovered: $\log_{10}(E)$ versus $\log_{10}(N)$ with linear fits MWEE: $-1.0157x + 0.68384$; MACE: $-1.0119x + 0.70888$; Radical Grid: $-0.9938x + 0.58166$.]
Figure 14. Local error functional, $\epsilon_i^2$, versus grid location on the final grid, $\alpha = 0.6$ (singular case), $p = 1$: (a) MWEE: $N_L = 6975$, $E_L = 6.125\times 10^{-4}$, $L = 15$, total work $= 21\,176$ and (b) MACE: $N_L = 8517$, $E_L = 5.443\times 10^{-4}$, $L = 12$, total work $= 17\,044$.
Figure 15. Refined fraction of error functional, $f(r_{\mathrm{opt}})$, versus level, $\ell$, and refined fraction of elements, $r_{\mathrm{opt}}$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) MWEE and (b) MACE.
For the MWEE strategy, the number of elements, $N_\ell$, increases much faster than for the WEE strategy, which reduces the work considerably (Figure 15). However, a few small refinement steps still occur. For the MACE strategy, the refinement seems to become uniform after several initial steps (Figure 15(b)). As in the smooth solution case, the MWEE strategy may need slightly more work than the MACE strategy to reach the same error bound, due to a few steps of small refinement (Figure 17). However, since the MWEE strategy is slightly more accurate, the difference is very small. Again, the predicted functional reduction factors are good approximations of the actual factors for both strategies (Figure 18).
[Plot residue, caption not recovered: two panels, (a) and (b), with vertical axes running from 0 to about 7000 and 9000, respectively, versus level (0–15 in (a), 0–12 in (b)).]
Figure 17. Final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$, $\alpha = 0.6$ (singular case), $p = 1$ (curves: MWEE, MACE, WEE and ACE).
Figure 18. Predicted functional reduction factor, $\gamma(r_{\mathrm{opt}})$, and actual functional reduction factor, $g$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) MWEE and (b) MACE.
Figure 19. Efficiency-based and threshold-based refinement strategies: (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$. (Both $\alpha = 0.6$ (singular case), $p = 1$.) Curves: MWEE, MACE, and graded threshold 1.0, 0.8 and 0.2.
If we choose to refine a fixed fraction of the global error that is too small (less than the average of $f(r_{\mathrm{opt}})$ in the modified efficiency-based strategies), e.g. $\theta = 0.2$ in Figure 19, then the resulting grid sequence is almost of optimal accuracy, but the total work increases significantly because $N_\ell$ increases slowly. A threshold value that is too large (larger than the average of $f(r_{\mathrm{opt}})$ in the modified efficiency-based strategies), e.g. $\theta = 1.0$ in Figure 19, makes the number of elements, $\{N_\ell\}_{\ell=1}^{L}$, increase faster, but the large threshold results in a less accurate grid sequence. This implies that more total work is required to reach the same error bound. A threshold value that is close to the average of $f(r_{\mathrm{opt}})$ in the modified efficiency-based strategies, namely $\theta = 0.8$ in Figure 19, results in a refinement process that performs similarly to the efficiency-based refinement processes.
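For comparison, a threshold-based strategy of this kind can be sketched as follows. The sketch assumes the common "bulk" (Dörfler) reading of refining a fixed fraction θ of the global error: mark the fewest elements, largest errors first, whose error functionals add up to at least θ times the total. The section does not spell out the selection rule or how the singular element is graded, so treat this only as an illustration. `threshold_marking` is a hypothetical name.

```python
def threshold_marking(eps2, theta):
    """Mark the fewest elements whose error functionals sum to >= theta * total.

    eps2: local error functionals (any order); theta: fraction in (0, 1].
    Returns the indices of the marked elements, largest error first.
    """
    order = sorted(range(len(eps2)), key=lambda j: eps2[j], reverse=True)
    target = theta * sum(eps2)
    marked, acc = [], 0.0
    for j in order:
        if acc >= target:
            break
        marked.append(j)
        acc += eps2[j]
    return marked


eps2 = [0.1, 0.4, 0.3, 0.2]
for theta in (0.2, 0.5, 1.0):
    print(theta, threshold_marking(eps2, theta))
```

With θ = 1.0 every element is marked. This is consistent with the observation that a large threshold makes $N_\ell$ grow fast at the cost of accuracy per step.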
Figure 20. Efficiency-based refinement strategies for a smooth problem with $p = 2$ ($\alpha = 3.1$): (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$.
Figure 21. Efficiency-based refinement strategies for a singular problem with $p = 2$ ($\alpha = 0.6$): (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$.
resulting, as before, in much less work for the modified strategies. It has to be noted, however, that the MWEE and MACE grids contain many more elements than optimal graded grids. This is probably because the singularity is very strong for $\alpha = 0.6$ and $p = 2$, so that a geometrically graded grid with a grading factor of $\tfrac{1}{2}$ does not decrease the grid size fast enough in the vicinity of the singularity. Nevertheless, we can conclude that, within the constraint of refinement based on splitting cells in two, the MWEE and MACE strategies lead to an efficient refinement process.
Assuming that we know a good approximation for the p-refinement error reduction factor for each
element, we can apply the efficiency-based refinement strategies to hp-refinement processes.
We only consider h-refinement for the first element, which contains the singularity. Then we have the error functional reduction factor bound $(\tfrac{1}{2})^{2\alpha-1}$ as in (12). For an element $j$ that does
not contain the singularity, note that j is small, and again we obtain the same h-reduction factor
bound, $(\tfrac{1}{2})^{2p_j}$, as before (see (10)). Moreover, if we double the polynomial degree $p_j$, we obtain the p-reduction factor bound as follows:
p j 2
j (2 p j ) j
≈ (21)
j(pj) 2
$$\epsilon_j^2(p_j) \le C(p_j)\, h_j^{2p_j}\, \|u\|^2_{H^{p_j+1}(\Omega_j)}$$
Assuming that $(1/(2p_j)!)\,\|u\|^2_{H^{p_j+1}(\Omega_j)} \le M$, where $M$ is a constant, we obtain the following general p-error reduction factor:
$$\frac{\epsilon_j^2(2p_j)}{\epsilon_j^2(p_j)} \approx \left(\frac{h_j}{2}\right)^{2p_j} \qquad (23)$$
$$x_j = q^{N-j}, \qquad 0 < q < 1, \qquad j = 1, 2, \ldots, N \qquad (24)$$
[Plot residue, caption not recovered: curves labeled geometric $q_{\mathrm{opt}}$ and geometric $q = 0.5$, plotted versus DOF.]
Let $\tau_j = \tau = (1-\sqrt{q})/(1+\sqrt{q})$, $\forall j: 1 \le j \le N$. It was shown in [5] that the optimal degree distribution of $p$ for these grid locations tends to a linear distribution with slope
$$s_o = \left(\alpha - \tfrac{1}{2}\right)\frac{\log q}{\log \tau} \qquad (25)$$
Furthermore, the optimal combination of the geometric grid factor $q$ and the linear slope $s_o$ is given by
$$q_{\mathrm{opt}} = (\sqrt{2} - 1)^2, \qquad s_{\mathrm{opt}} = 2\alpha - 1 \qquad (26)$$
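Equations (24)–(26) can be checked numerically. The sketch below makes two assumptions: it prepends a node x_0 = 0 to the geometric nodes of (24), and it writes τ for the per-element quantity (1 − √q)/(1 + √q). It confirms that the slope (25), evaluated at q_opt, reduces to 2α − 1. It does not build the optimal degree distribution itself. All function names are hypothetical.

```python
import math


def tau(q):
    """tau = (1 - sqrt(q)) / (1 + sqrt(q)) for geometric grid factor q."""
    rq = math.sqrt(q)
    return (1.0 - rq) / (1.0 + rq)


def degree_slope(alpha, q):
    """Slope s_o of the limiting linear degree distribution, Equation (25)."""
    return (alpha - 0.5) * math.log(q) / math.log(tau(q))


def geometric_nodes(q, N):
    """Nodes x_j = q**(N - j), j = 1..N (Equation (24)), with x_0 = 0 prepended."""
    return [0.0] + [q ** (N - j) for j in range(1, N + 1)]


q_opt = (math.sqrt(2.0) - 1.0) ** 2          # Equation (26)
print(q_opt, tau(q_opt), degree_slope(0.6, q_opt))
print(geometric_nodes(0.5, 4))
```

At q_opt we have √q = √2 − 1, so τ = √2 − 1 and log q / log τ = 2. The slope therefore becomes 2(α − 1/2) = 2α − 1, in agreement with (26).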
Figure 23. Final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$, $\alpha = 0.6$ (singular case), $p = 1$.
6. 2D RESULTS
In this section, we explore the use of the proposed efficiency-based refinement strategies in two
spatial dimensions. In these initial considerations, we discuss only problems with sufficiently
smooth solutions.
Suppose the solution is sufficiently smooth in the whole domain. As in the 1D case, the predicted functional reduction factor, $\gamma(r)$, is given by
$$\gamma(r) = 1 - f(r) + \left(\tfrac{1}{2}\right)^{2p} f(r)$$
The WEE and ACE strategies can then be used to determine the fraction of elements to be refined,
$r_{\mathrm{opt}}$, according to Equations (5) and (7), respectively.
It should be noted here that the WEE measure may be problematic in dimensions higher than one. This can be seen as follows. The WEE measure determines $r_{\mathrm{opt}}$ by minimizing $M_{\mathrm{WEE}} \equiv \eta(r)\sqrt{\gamma(r)}$ over $r \in [1/N, 1]$. For smooth solutions, $\gamma(1/N) \approx 1$ and $\eta(1/N) \approx 1$, such that $M_{\mathrm{WEE}}(1/N) \approx 1$. For $r = 1$, however, $\eta(1) = 2^d$ and $\gamma(1) = (\tfrac{1}{2})^{2p}$, such that $M_{\mathrm{WEE}}(1) = 2^{d-p}$. This means that $M_{\mathrm{WEE}}(1) > 1$ when $d > p$. Since $M_{\mathrm{WEE}}(r)$ is often a very smooth function, $r_{\mathrm{opt}}$ is then likely to be close to $1/N$, resulting in small refinements, which are inefficient. We,
thus, expect that the WEE strategy may not be efficient when d> p. We investigate this issue in
the numerical results presented below. Also, it can be noted that this problem does not occur for
the ACE strategy.
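The argument can be made concrete with a small numerical experiment. For illustration only, the sketch makes two simplifying assumptions. First, all elements carry the same error, so f(r) = r. Second, each refined element is replaced by 2^d children, so η(r) = 1 + (2^d − 1)r. The measure η(r)√γ(r) is the one stated above. `m_wee` and `r_opt` are hypothetical names.

```python
import math


def m_wee(r, d, p):
    """WEE measure eta(r) * sqrt(gamma(r)) under illustrative assumptions:
    equal local errors (f(r) = r) and 2**d children per refined element."""
    eta = 1.0 + (2 ** d - 1) * r
    gamma = 1.0 - r + 0.5 ** (2 * p) * r
    return eta * math.sqrt(gamma)


def r_opt(d, p, N=100):
    """Minimize the measure over the admissible fractions r = 1/N, ..., 1."""
    rs = [n / N for n in range(1, N + 1)]
    return min(rs, key=lambda r: m_wee(r, d, p))


for d, p in [(1, 1), (2, 1), (2, 2), (3, 2)]:
    # M_WEE(1) = 2**(d - p); the minimizer collapses to r = 1/N when d > p
    print(d, p, m_wee(1.0, d, p), r_opt(d, p))
```

For d = 2 and p = 1, the minimizer over r = 1/100, ..., 1 is the smallest admissible r. For d = p = 1, it is r = 1 (uniform refinement). With unequally distributed errors the picture can differ.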
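$$\begin{aligned} -\Delta p &= f \quad \text{in } \Omega,\\ p &= g \quad \text{on } \partial\Omega,\\ \Omega &= (0,1)\times(0,1) \end{aligned} \qquad (29)$$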
with the right-hand side f and boundary conditions g chosen such that the solution is given by
$$p(r,\theta) = \begin{cases} 1, & r \le r_0 \\ h(r), & r_0 \le r \le r_1 \\ 0, & r_1 \le r \end{cases} \qquad (30)$$
Here, $(r, \theta)$ are the usual polar coordinates and $h(r)$ is the unique polynomial of degree five such that $p \in C^2(\Omega)$. We choose $r_0 = 0.7$ and $r_1 = 0.8$. The solution of this test problem takes the value one in the lower left corner of the domain and is zero elsewhere, except for a steep gradient in the thin strip $0.7 \le r \le 0.8$. Figure 24(a) shows the grid obtained for this model problem after several refinement steps.
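The text does not print h(r) explicitly. Requiring h(r_0) = 1 and h(r_1) = 0, with first and second derivatives vanishing at both ends, gives six conditions for the six coefficients of a quintic. One way to write the result (our derivation, not taken from the paper) is h(r) = 1 − (10t^3 − 15t^4 + 6t^5) with t = (r − r_0)/(r_1 − r_0). The sketch evaluates this together with the piecewise solution (30). It assumes the polar coordinates are centered at the lower left corner (0, 0).

```python
import math


def h(r, r0=0.7, r1=0.8):
    """Quintic blend: 1 at r0, 0 at r1, zero first and second derivatives at
    both ends, so the piecewise solution (30) is C^2 (our derivation)."""
    t = (r - r0) / (r1 - r0)
    return 1.0 - (10 * t**3 - 15 * t**4 + 6 * t**5)


def exact_solution(x, y, r0=0.7, r1=0.8):
    """Piecewise solution (30), polar coordinates centered at (0, 0) (assumed)."""
    r = math.hypot(x, y)
    if r <= r0:
        return 1.0
    if r >= r1:
        return 0.0
    return h(r, r0, r1)


print(h(0.75), exact_solution(0.5, 0.5), exact_solution(0.6, 0.6))
```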
Figure 24. Adaptively refined grids using the ACE refinement strategy for 2D problems with p = 2:
(a) single arc on a unit square domain and (b) double arc on a unit square domain.
To illustrate the broad applicability of our refinement strategies, we solve this model problem
using a FOSLS finite element method, rather than the Galerkin method that was used for the 1D
test problems. BVP (29) is rewritten as a first-order system BVP [8]
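$$\begin{aligned} -\nabla\cdot U &= f \quad \text{in } \Omega,\\ U &= \nabla p,\\ \nabla\times U &= 0,\\ p &= g \quad \text{on } \partial\Omega,\\ s\cdot U &= \frac{\partial g}{\partial s},\\ \Omega &= (0,1)\times(0,1) \end{aligned} \qquad (31)$$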
where $U$ is a vector of auxiliary unknowns, and $s$ is the unit vector tangent to $\partial\Omega$. The FOSLS error estimator is given by $F(p_h, U_h; f) = \|\nabla\cdot U_h + f\|^2_{L^2(\Omega)} + \|U_h - \nabla p_h\|^2_{(L^2(\Omega))^2} + \|\nabla\times U_h\|^2_{L^2(\Omega)}$.
Under certain smoothness assumptions, the FOSLS error estimator is equivalent to the $H^1$-norm [8]: $F(p_h, U_h; f) \approx \|p - p_h\|^2_{H^1(\Omega)} + \|U - U_h\|^2_{(H^1(\Omega))^2}$.
Note that in our approach refinement is performed in such a way that new nodes are introduced
on element edges and faces; hence, local refinement introduces hanging nodes (see Figure 24(a)).
To maintain a C 0 solution, we treat these as slave nodes, enforcing a continuity constraint across
element boundaries. This results in a conforming finite element method, and the approximation
properties discussed in this paper still hold on this type of grid.
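For p = 1, the slave-node constraint has a simple form. The trace of a linear or bilinear element along an edge is linear, so a hanging node at the midpoint of an edge must take the average of the edge's endpoint values. The sketch below applies that rule. It assumes hanging nodes sit at edge midpoints and does not cover the p = 2 case, whose constraint involves more nodes. `resolve_hanging` is a hypothetical helper.

```python
def resolve_hanging(values, hanging):
    """Fill in hanging-node values from their master nodes (p = 1 only).

    values:  dict node -> value for master (unconstrained) nodes.
    hanging: dict node -> (a, b), the endpoints of the edge the node bisects.
    Continuity of a linear edge trace forces the midpoint value to be the
    average of the endpoint values; hanging endpoints are resolved recursively.
    """
    out = dict(values)

    def value(node):
        if node not in out:
            a, b = hanging[node]
            out[node] = 0.5 * (value(a) + value(b))
        return out[node]

    for node in hanging:
        value(node)
    return out


print(resolve_hanging({0: 1.0, 1: 3.0}, {2: (0, 1), 3: (0, 2)}))
```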
Figure 25. Efficiency-based refinement strategies for the 2D model problem with $p = 1$: (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$.
Figure 26. Efficiency-based refinement strategies for the 2D model problem with $p = 2$: (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$.
Figure 26 shows that, for p = 2 (d = p), both the ACE and WEE strategies produce an efficient
refinement process.
Figure 24(b) shows the resulting grid when the ACE strategy is applied to a slightly more
complicated test problem, in which two circular steps are superimposed (u = 1 in the lower left
corner, u = 2 in the lower right corner, u = 3 where the two steps overlap, and u = 0 in the top part
of the domain). The adaptive refinement process adequately captures the error generated at the
steep gradients.
7. CONCLUSIONS
Two efficiency-based adaptive refinement strategies for finite element methods, WEE and ACE,
were discussed. The two strategies take both error reduction and work into account. The two
strategies were first compared for a 1D model problem. For the case of h-refinement with smooth
solutions, the efficiency-based strategies generate a highly accurate grid sequence and an efficient
refinement process. However, for singular solutions, the refinement process becomes inefficient
due to many steps of small refinements. Use of a graded grid for elements with a singularity leads
to significant improvement. For both the WEE and ACE strategies, this modification saves a lot
of work and also results in a highly accurate grid sequence. For the hp-refinement case, similar
conclusions are obtained. However, for general problems, the difficulty here may lie in how to
find a good approximation for the p-error reduction factor. Application to problems with spatial
dimension larger than one shows that the WEE strategy is inefficient when the dimension, d, is
larger than the finite element order, p. The ACE strategy, however, produces an efficient refinement
process for any combination of d and p.
Future work will include application of these grid refinement strategies to problems with singularities
in multiple spatial dimensions. Also, an idea to be explored in the future is to enhance
the refinement strategies by allowing double or triple refinement for some elements, and determining,
in each step, the optimal number of elements to be refined once, twice and thrice. More
realistic measures for computational work must be considered that may, for instance, take into
account matrix assembly costs and multigrid convergence factors, and their dependence on the
finite element order and the spatial dimension of the problem. Another topic of interest is the
REFERENCES
1. Ruede U. Mathematical and Computational Techniques for Multilevel Adaptive Methods. Frontiers in Applied
Mathematics, vol. 13. SIAM: Philadelphia, 1993.
2. Verfuerth R. A Review of a Posteriori Error Estimation and Adaptive Mesh-Refinement Techniques. Teubner,
Wiley: Stuttgart, 1996.
3. Schwab C. p- and hp-Finite Element Methods. Clarendon Press: Oxford, 1998.
4. Berndt M, Manteuffel TA, McCormick SF. Local error estimates and adaptive refinement for first-order system
least squares (FOSLS). Electronic Transactions on Numerical Analysis 1997; 6:35–43.
5. Gui W, Babuška I. The h, p and hp versions of the finite element method in 1 dimension, parts I, II, III.
Numerische Mathematik 1986; 49:577–683.
6. Brenner SC, Scott LR. The Mathematical Theory of Finite Element Methods. Springer: New York, 1996.
7. Cai Z, Lazarov R, Manteuffel TA, McCormick SF. First-order system least squares for second-order partial
differential equations. I. SIAM Journal on Numerical Analysis 1994; 31:1785–1799.
8. Cai Z, Manteuffel TA, McCormick SF. First-order system least squares for second-order partial differential
equations. II. SIAM Journal on Numerical Analysis 1997; 34:425–454.
9. Bochev PB, Gunzburger MD. Finite element methods of least-squares type. SIAM Review 1998; 40:789–837.
10. Bank RE, Holst MJ. A new paradigm for parallel adaptive meshing algorithms. SIAM Review 2003; 45:292–323.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:115–139
Published online 29 October 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.559
Hans De Sterck1 , Robert D. Falgout2 , Joshua W. Nolting3 and Ulrike Meier Yang2, ∗, †
1 Department of Applied Mathematics, University of Waterloo, Waterloo, Ont., Canada N2L 3G1
2 Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, P.O. Box 808, Livermore,
CA 94551, U.S.A.
3 Department of Applied Mathematics, University of Colorado at Boulder, Campus Box 526, Boulder,
CO 80302, U.S.A.
SUMMARY
Algebraic multigrid (AMG) is one of the most efficient and scalable parallel algorithms for solving sparse
linear systems on unstructured grids. However, for large 3D problems, the coarse grids that are normally
used in AMG often lead to growing complexity in terms of memory use and execution time per AMG
V-cycle. Sparser coarse grids, such as those obtained by the parallel modified independent set (PMIS)
coarsening algorithm, remedy this complexity growth but lead to nonscalable AMG convergence factors
when traditional distance-one interpolation methods are used. In this paper, we study the scalability of AMG
methods that combine PMIS coarse grids with long-distance interpolation methods. AMG performance
and scalability are compared for previously introduced interpolation methods as well as new variants
of them for a variety of relevant test problems on parallel computers. It is shown that the increased
interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened
grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains
a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers.
Copyright © 2007 John Wiley & Sons, Ltd.
KEY WORDS: algebraic multigrid; long-range interpolation; parallel implementation; reduced complexity;
truncation
∗ Correspondence to: Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551, U.S.A.
† E-mail: umyang@llnl.gov
1. INTRODUCTION
Algebraic multigrid (AMG) [1–4] is an efficient, potentially scalable algorithm for sparse linear systems on unstructured grids. However, when applied to large 3D problems, the classical algorithm often generates unreasonably large complexities with regard to memory use as well as computational operations. Recently, we suggested a new parallel coarsening algorithm, called the parallel
modified independent set (PMIS) algorithm [5], which is based on a parallel independent set
algorithm suggested in [6]. The use of this coarsening algorithm in combination with a slight
modification of Ruge and Stüben’s classical interpolation scheme [2] leads to significantly lower
complexities as well as significantly lower setup and cycle times. For various test problems, such
as isotropic and grid-aligned anisotropic diffusion operators, one obtains scalable results, particularly
when AMG is used in combination with Krylov methods. However, AMG convergence
factors are severely impacted for more complicated problems, such as problems with rotated
anisotropies or highly discontinuous material properties. Since we realized that classical interpolation
methods, which use only distance-one neighbors for their interpolatory set, were not sufficient
for these coarse grids, we decided to investigate interpolation operators that also include distance-
two neighbors. In this paper, we focus on the following distance-two interpolation operators:
we study three methods proposed in [3], namely, standard interpolation, multipass interpolation,
and the use of Jacobi interpolation to improve other interpolation operators, and we investigate
two extensions of classical interpolation, which we denote with ‘extended’ and ‘extended+i’
interpolation.
Our investigation shows that all of the long-distance interpolation strategies, except for multipass
interpolation, significantly improve AMG convergence factors compared with classical interpolation.
Multipass interpolation shows poor numerical scalability, which, however, can be improved
with a Krylov accelerator, but it has very small computational complexity. All other long-distance
interpolation operators showed increased complexities. While the increase is not very significant
for 2D problems, it is of concern in the 3D case. Therefore, we also investigated complexity
reducing strategies, such as the use of smaller sets of interpolation points and interpolation truncation.
The use of these strategies led to AMG methods with significantly improved overall
scalability.
The paper is organized as follows. In Section 2, we briefly describe AMG. In Section 3,
distance-one interpolation operators are presented, and Section 4 describes long-range interpolation
operators. In Section 5, the computational cost of the interpolation strategies is investigated,
and in Section 6 some sequential numerical results are given, which motivate the following
sections. Section 7 presents various complexity reducing strategies. Section 8 investigates the
parallel implementation of the methods. Section 9 presents parallel scaling results for a variety of
test problems, and Section 10 contains the conclusions.
2. ALGEBRAIC MULTIGRID
In this section, we give an outline of the basic principles and techniques that comprise AMG, and
we define terminology and notation. Detailed explanations may be found in [2, 3, 7]. Consider a
problem of the form
$$Au = f \qquad (1)$$
where $A$ is an $n \times n$ matrix with entries $a_{ij}$. For convenience, the indices are identified with grid points, so that $u_i$ denotes the value of $u$ at point $i$, and the grid is denoted by $\Omega = \{1, 2, \ldots, n\}$.
Copyright © 2007 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:115–139
DOI: 10.1002/nla
DISTANCE-TWO INTERPOLATION FOR PARALLEL AMG 117
In any multigrid method, the central idea is that ‘smooth error,’ e, that is not eliminated by
relaxation must be removed by coarse-grid correction. This is done by solving the residual equation
Ae =r on a coarser grid, then interpolating the error back to the fine grid and using it to correct
the fine-grid approximation.
Using superscripts to indicate level number, where 1 denotes the finest level so that $A^1 = A$ and $\Omega^1 = \Omega$, AMG needs the following components: ‘grids’ $\Omega^1 \supset \Omega^2 \supset \cdots \supset \Omega^M$, grid operators $A^1, A^2, \ldots, A^M$, interpolation operators $P^k$, restriction operators $R^k$ (often $R^k = (P^k)^T$), and smoothers $S^k$, where $k = 1, 2, \ldots, M-1$.
Most of these components of AMG are determined in a first step, known as the setup phase.
During the setup phase, on each level $k$, $k = 1, \ldots, M-1$, $\Omega^{k+1}$ is determined using a coarsening algorithm, $P^k$ and $R^k$ are defined, and $A^{k+1}$ is determined using the Galerkin condition $A^{k+1} = R^k A^k P^k$. Once the setup phase is completed, the solve phase, a recursively defined cycle,
can be performed as follows:
Algorithm MGV($A^k, R^k, P^k, S^k, u^k, f^k$).
  If $k = M$, solve $A^M u^M = f^M$ with a direct solver.
  Otherwise:
    Apply smoother $S^k$ $\nu_1$ times to $A^k u^k = f^k$.
    Perform coarse-grid correction:
      Set $r^k = f^k - A^k u^k$.
      Set $r^{k+1} = R^k r^k$.
      Set $e^{k+1} = 0$.
      Apply MGV($A^{k+1}, R^{k+1}, P^{k+1}, S^{k+1}, e^{k+1}, r^{k+1}$).
      Interpolate $e^k = P^k e^{k+1}$.
      Correct the solution by $u^k \leftarrow u^k + e^k$.
    Apply smoother $S^k$ $\nu_2$ times to $A^k u^k = f^k$.
In the remainder of the paper, index $k$ will be dropped for simplicity. The algorithm above describes
a V($\nu_1, \nu_2$)-cycle; other more complex cycles such as W-cycles are described in [7]. In every
V-cycle, the error is reduced by a certain factor, which is called the convergence factor. A sequence
of V-cycles is executed until the error is reduced below a specified tolerance. For a scalable AMG
method, the convergence factor is bounded away from one independently of the problem size n, and
the computational work in both the setup and solve phases is linearly proportional to the problem
size n. While AMG was originally developed in the context of symmetric M-matrix problems,
AMG has been applied successfully to a much wider class of problems. We assume in this paper
that A has positive diagonal elements.
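What follows is a compact, dense NumPy sketch of the MGV algorithm, for illustration only. The smoother is weighted Jacobi (ω = 2/3), R^k = (P^k)^T, each A^{k+1} comes from the Galerkin condition, and the coarsest system is solved directly. No AMG coarsening is implemented here. Instead, the interpolation operators are geometric linear interpolation for a 1D Poisson matrix, a stand-in for AMG-generated P^k. The convergence factor is estimated from the residual reduction over ten V(1,1)-cycles.

```python
import numpy as np


def jacobi(A, u, f, nu, omega=2.0 / 3.0):
    """nu sweeps of weighted Jacobi (the smoother S^k in this sketch)."""
    d = A.diagonal()
    for _ in range(nu):
        u = u + omega * (f - A @ u) / d
    return u


def galerkin_hierarchy(A, Ps):
    """Setup: A^{k+1} = R^k A^k P^k with R^k = (P^k)^T."""
    As = [A]
    for P in Ps:
        As.append(P.T @ As[-1] @ P)
    return As


def mgv(As, Ps, k, u, f, nu1=1, nu2=1):
    """Recursive V(nu1, nu2)-cycle following the MGV algorithm."""
    A = As[k]
    if k == len(As) - 1:
        return np.linalg.solve(A, f)        # direct solve on the coarsest grid
    u = jacobi(A, u, f, nu1)                # pre-smoothing
    r = f - A @ u
    rc = Ps[k].T @ r                        # restrict with R^k = (P^k)^T
    ec = mgv(As, Ps, k + 1, np.zeros_like(rc), rc, nu1, nu2)
    u = u + Ps[k] @ ec                      # interpolate and correct
    return jacobi(A, u, f, nu2)             # post-smoothing


def linear_interp(nc):
    """Linear interpolation from nc to 2*nc + 1 points (stand-in for AMG P)."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j] = 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] = 0.5
    return P


n = 31
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
Ps = [linear_interp(nc) for nc in (15, 7, 3)]
As = galerkin_hierarchy(A, Ps)

rng = np.random.default_rng(0)
u_exact = rng.standard_normal(n)
f = A @ u_exact
u = np.zeros(n)
res = [np.linalg.norm(f - A @ u)]
for _ in range(10):
    u = mgv(As, Ps, 0, u, f)
    res.append(np.linalg.norm(f - A @ u))
factor = (res[-1] / res[0]) ** (1.0 / 10)
print("estimated convergence factor:", factor)
```

If `linear_interp` is replaced by interpolation operators built in an AMG setup phase, `galerkin_hierarchy` and `mgv` stay the same. For a scalable method, the printed factor should stay bounded away from one as n grows.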
In this section, we first give some definitions as well as some general remarks, and then recall perhaps the simplest interpolation strategy, the so-called direct interpolation strategy [3]. This is followed by a description of the classical distance-one AMG interpolation method that was introduced by Ruge and Stüben [2].
In classical AMG [2], the interpolation of the error at the F-point $i$ takes the form
$$e_i = \sum_{j \in C_i} w_{ij} e_j \qquad (3)$$
or
$$a_{ii} e_i \approx -\sum_{j \in C_i^s} a_{ij} e_j - \sum_{j \in F_i^s} a_{ij} e_j - \sum_{j \in N_i^w} a_{ij} e_j \qquad (5)$$
From this expression, various interpolation formulae can be derived. We use the terminology of
[3] for the various interpolation strategies.
This leads to an interpolation, which is often not accurate enough. Nevertheless, we mention this
approach here, since various other interpolation operators which we consider are based on it. This
method is denoted by ‘direct’ in the tables presented below. In [3] it is also suggested to separate
positive and negative coefficients when determining the weights, a strategy which can help when
one encounters large positive off-diagonal matrix coefficients. We do not consider this approach
here, since the strategy did not lead to an improvement for the problems studied in this paper.
This approximation can be justified by the observation that smooth error varies slowly in the
direction of strong connection. The denominator simply ensures that constants are interpolated
exactly. Replacing the $e_j$ with a sum over the elements $k$ of the coarse interpolatory set $C_i$ corresponds to taking into account strong F–F connections using C-points that are common between the F-points. Note that, when the two F-points $i$ and $j$ do not have a common C-point in $C_i^s$ and $C_j^s$, the denominator in (7) is small or vanishing. Weak connections (from the points in $N_i^w$) are generally not important and, in (5), errors $e_j$, $j \in N_i^w$, are replaced by $e_i$. This leads to the following formula for the interpolation weights:
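$$w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s} \frac{a_{ik} a_{kj}}{\sum_{m \in C_i^s} a_{km}} \right), \qquad j \in C_i^s \qquad (8)$$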
In our experiments this interpolation is further modified as proposed in [8] to avoid extremely
large interpolation weights that can lead to divergence.
Figure 1. Example illustrating a situation occurring with PMIS coarsening, which will not correctly be treated by direct or classical interpolation. Black points denote C-points, white points denote F-points, and the arrow from i to l denotes that i strongly depends on l.
Now, the interpolation above was suggested based on a coarsening algorithm that ensured that two strongly connected F-points always have a common coarse neighbor. Since this condition is no longer guaranteed when using PMIS coarsening [5], it may happen that the term $\sum_{m \in C_i^s} a_{km}$ in Equation (8) vanishes. In our previous paper on the PMIS-coarsening method [5], we modified interpolation formula (8) such that if this case occurs, $a_{ik}$ is added to the diagonal term (the term $a_{ii} + \sum_{k \in N_i^w} a_{ik}$ in Equation (8)), i.e. the strongly influencing neighbor point $k$ of $i$ is treated similarly to a weak connection of $i$. In what follows, we denote the set of strongly connected neighbors $k$ of $i$ that are F-points but do not have a common C-point, i.e. $C_i^s \cap C_k^s = \emptyset$, by $F_i^{s*}$. Combining this with the modification suggested in [8], we obtain the following interpolation formula:
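$$w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w \cup F_i^{s*}} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s \setminus F_i^{s*}} \frac{a_{ik} \bar{a}_{kj}}{\sum_{m \in C_i^s} \bar{a}_{km}} \right), \qquad j \in C_i^s \qquad (9)$$
where
$$\bar{a}_{ij} = \begin{cases} 0 & \text{if } \operatorname{sign}(a_{ij}) = \operatorname{sign}(a_{ii}) \\ a_{ij} & \text{otherwise} \end{cases}$$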
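Formula (9) can be evaluated directly once the neighborhood sets are known. The sketch takes C_i^s, F_i^s and N_i^w as given, so the strength-of-connection test that produces them is not shown, and it stores the matrix densely. The test matrix is our reading of the four-point example discussed later in this section: points 0 and 3 are coarse, points 1 and 2 are fine, and the stencil is −1, 2, −1. On it, the sketch reproduces the classical weights w_{1,0} = 1, w_{1,3} = 0 quoted there. Function names are hypothetical.

```python
def abar(A, k, j):
    """a-bar_kj of formula (9): 0 if a_kj has the sign of a_kk, else a_kj."""
    return 0 if A[k][j] * A[k][k] > 0 else A[k][j]


def classical_weights(A, i, Cs, Fs, Nw):
    """Interpolation weights w_ij, j in C_i^s, following formula (9).

    A: dense matrix (list of lists); Cs, Fs, Nw: dicts mapping a point to
    its strong C-neighbors, strong F-neighbors and weak neighbors.
    """
    # F_i^{s*}: strong F-neighbors sharing no strong C-point with i
    Fstar = {k for k in Fs[i] if not (Cs[i] & Cs[k])}
    # these neighbors are lumped into the diagonal, like weak connections
    diag = A[i][i] + sum(A[i][k] for k in Nw[i] | Fstar)
    w = {}
    for j in Cs[i]:
        num = A[i][j]
        for k in Fs[i] - Fstar:
            num += A[i][k] * abar(A, k, j) / sum(abar(A, k, m) for m in Cs[i])
        w[j] = -num / diag
    return w


A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
Cs = {1: {0}, 2: {3}}
Fs = {1: {2}, 2: {1}}
Nw = {1: set(), 2: set()}
print(classical_weights(A, 1, Cs, Fs, Nw))
```

Point 3 does not appear in the result because it is not in C_1^s, so its weight is zero. The strong F-neighbor 2 has no C-point in common with point 1, so it is moved onto the diagonal.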
In this paper we refer to formula (9) as ‘classical interpolation’. The numerical results that were
presented in [5] showed that this interpolation formula, which is based on Ruge and Stüben’s
original distance-one interpolation formula [2], resulted in AMG methods with acceptable performance
when used with PMIS-coarsened grids for various problems, but only when the AMG cycle
is accelerated by a Krylov subspace method. Without such acceleration, interpolation formula (9)
is not accurate enough on PMIS-coarsened grids: AMG convergence factors deteriorate quickly
as a function of problem size, and scalability is lost. For various problems, such as problems with
rotated anisotropies or problems with large discontinuities, adding Krylov acceleration did not
remedy the scalability problems.
One of the issues is that distance-one interpolation schemes do not correctly treat situations similar to the one illustrated in Figure 1. Here we have an F-point with measure smaller than 1 that has no coarse neighbors. This situation can occur, for example, if we use a fairly large strength threshold. For both classical and direct interpolation, the interpolated error at this point will vanish, and coarse-grid correction will not be able to reduce the error at this point.
A major topic of this paper is to investigate whether distance-two interpolation methods are able
to restore grid-independent convergence to AMG cycles that use PMIS-coarsened grids, without
compromising scalability in terms of memory use and execution time per AMG V-cycle.
In this section, various long-distance interpolation methods are described. Parallel implementation
of some of these interpolation methods and parallel scalability results on PMIS-coarsened grids
are discussed later in this paper.
interpolation scheme to fix some of the problems that we saw when using our classical interpolation scheme (9). Multipass interpolation proceeds as follows:
1. Use direct interpolation for all F-points $i$ for which $C_i^s \neq \emptyset$. Place these points in set $F^*$.
2. For all $i \in F \setminus F^*$ with $F^* \cap F_i^s \neq \emptyset$, replace, in Equation (4), for all $j \in F_i^s \cap F^*$, $e_j$ by $\sum_{k \in C_j} w_{jk} e_k$, where $C_j$ is the interpolatory set for $e_j$. Apply direct interpolation to the new equation. Add $i$ to $F^*$. Repeat step 2 until $F^* = F$.
Multipass interpolation is fairly cheap. However, it is not very powerful, since it is based on
direct interpolation. If applied to PMIS, it still ends up being direct interpolation for most F-points.
However, it fixes the situation illustrated in Figure 1. If we apply multipass interpolation, the
point i will be interpolated by the coarse neighbors (black points) of F-points k and l.
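A minimal sketch of the two multipass steps follows. It rests on three assumptions. First, direct interpolation (6) has the usual form w_ij = −(a_ij/a_ii)(Σ_{k∈N_i} a_ik)/(Σ_{k∈C_i^s} a_ik), applied to whichever equation row is current. Second, Equation (4) is row i of the system. Third, points interpolated in one pass join F* only at the end of that pass, which is one reasonable reading of "Repeat step 2". The test chain 0–1–2–3–4, with C-points 0 and 4, has a middle F-point without coarse neighbors, which is the situation of Figure 1. All names are hypothetical.

```python
def direct_weights(row, i, interp):
    """Direct interpolation for the equation sum_j row[j] e_j = 0, where
    row[i] is the diagonal (assumed form of direct interpolation (6))."""
    off = sum(v for j, v in row.items() if j != i)
    cs = sum(row[j] for j in interp)
    return {j: -(row[j] / row[i]) * off / cs for j in interp}


def multipass(A, C, S):
    """Multipass interpolation sketch.

    A: dict-of-dicts sparse matrix; C: set of C-points; S[i]: strong
    neighbors of i. Returns dict: F-point -> {C-point: weight}.
    """
    F = set(A) - C
    W, Fstar = {}, set()
    # Step 1: direct interpolation wherever C_i^s is nonempty
    for i in F:
        Cs = S[i] & C
        if Cs:
            W[i] = direct_weights(A[i], i, Cs)
            Fstar.add(i)
    # Step 2: substitute already-interpolated strong F-neighbors, then repeat
    while Fstar != F:
        new = {}
        for i in F - Fstar:
            done = S[i] & Fstar
            if not done:
                continue
            row, interp = dict(A[i]), set()
            for j in done:
                a_ij = row.pop(j)
                for k, w_jk in W[j].items():   # e_j -> sum_k w_jk e_k
                    row[k] = row.get(k, 0.0) + a_ij * w_jk
                    interp.add(k)
            new[i] = direct_weights(row, i, interp)
        if not new:
            raise ValueError("some F-points have no strong path to a C-point")
        W.update(new)
        Fstar |= set(new)
    return W


A = {0: {0: 2.0, 1: -1.0}, 1: {0: -1.0, 1: 2.0, 2: -1.0},
     2: {1: -1.0, 2: 2.0, 3: -1.0}, 3: {2: -1.0, 3: 2.0, 4: -1.0},
     4: {3: -1.0, 4: 2.0}}
S = {i: {j for j in A[i] if j != i} for i in A}
W = multipass(A, {0, 4}, S)
print(W)
```

Point 2 has no coarse neighbor, yet it ends up interpolated from C-points 0 and 4 with weight 1/2 each, through its F-neighbors 1 and 3.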
Figure 2. Example of the interpolatory points for a 5-point stencil (left) and a 9-point stencil (right). The gray point is the point to be interpolated, black points are C-points and white points are F-points.
See the left example in Figure 2. Consider point i. Using direct or classical interpolation, i
would only be interpolated by the two distance-one coarse points. However, when we include the
coarse points of its strong fine neighbors m and n, two additional interpolatory points k and l are
added, leading to a potentially more accurate interpolation formula. Standard interpolation is now
defined by applying direct interpolation to the new stencil, leading to
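$$w_{ij} = -\frac{\hat{a}_{ij}}{\hat{a}_{ii}} \, \frac{\sum_{k \in \hat{N}_i} \hat{a}_{ik}}{\sum_{k \in \hat{C}_i} \hat{a}_{ik}} \qquad (13)$$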
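Formula (13) for a single F-point can be sketched in Python. The definition of the new stencil precedes this excerpt, so the sketch assumes the usual construction: each strong F-neighbor $j \in F_i^s$ is eliminated from equation $i$ using row $j$, i.e. $\hat a_{ik} = a_{ik} - \sum_{j \in F_i^s} (a_{ij}/a_{jj})\, a_{jk}$, and $\hat C_i$ consists of $C_i^s$ together with the strong C-neighbors of the points in $F_i^s$. The function name and the dense storage are hypothetical.

```python
def standard_weights(A, i, is_coarse, strong):
    """Standard interpolation weights (13) for F-point i (dense, illustrative)."""
    n = len(A)
    fs = [j for j in strong[i] if not is_coarse[j]]          # F_i^s
    # C-hat_i: strong C-neighbors of i plus those of every j in F_i^s.
    c_hat = {j for j in strong[i] if is_coarse[j]}
    for j in fs:
        c_hat |= {k for k in strong[j] if is_coarse[k]}

    # New stencil (assumed construction): row i minus (a_ij / a_jj) * row j
    # for every j in F_i^s. For k == j this cancels a_ij up to rounding;
    # coefficients that other F-rows contribute to j stay in N-hat_i.
    a_hat = [float(v) for v in A[i]]
    for j in fs:
        factor = A[i][j] / A[j][j]
        for k in range(n):
            a_hat[k] -= factor * A[j][k]

    total = sum(a_hat[k] for k in range(n) if k != i)        # sum over N-hat_i
    csum = sum(a_hat[k] for k in c_hat)                       # sum over C-hat_i
    return {j: -a_hat[j] / a_hat[i] * total / csum for j in c_hat}
```

For the 1D Laplacian stencil $[-1\ 2\ -1]$ on points 0 to 3 with C-points 0 and 3 (a hypothetical test case), the sketch returns the linear-interpolation weights $2/3$ and $1/3$ for point 1.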
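It then follows immediately that the interpolation weights using the extended coarse interpolatory
set $\hat{C}_i$ can be defined as
$$w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w \setminus \hat{C}_i} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s} a_{ik} \frac{\bar{a}_{kj}}{\sum_{m \in \hat{C}_i} \bar{a}_{km}} \right), \quad j \in \hat{C}_i \qquad (15)$$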
[Figure 3: one-dimensional example on points 0, 1, 2, 3 with matrix entries 2 and −1.]
Note that this may lead to some weak coarse points in $N_i^w$ being included in the interpolatory
set $\hat{C}_i$ if they are strongly connected to a neighbor point of $i$. This new interpolation formula
deals efficiently with strong F–F connections that do not share a common C-point. We call this
interpolation strategy ‘extended interpolation’ (‘ext’).
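Formula (15) can be evaluated directly. The sketch below assumes the classical convention that $\bar a_{kj} = a_{kj}$ when $a_{kj}$ has the opposite sign of $a_{kk}$ and $\bar a_{kj} = 0$ otherwise (the definition is not part of this excerpt), stores $A$ densely, and treats every neighbor of $i$ that is not in `strong[i]` as weak. The function names are hypothetical.

```python
def abar(A, k, j):
    """a-bar_kj: keep a_kj only if its sign is opposite to that of a_kk (assumed)."""
    return A[k][j] if A[k][j] * A[k][k] < 0 else 0.0


def extended_weights(A, i, is_coarse, strong):
    """Extended ('ext') interpolation weights (15) for F-point i (illustrative)."""
    n = len(A)
    fs = [k for k in strong[i] if not is_coarse[k]]            # F_i^s
    c_hat = {j for j in strong[i] if is_coarse[j]}               # C_i^s ...
    for k in fs:
        c_hat |= {m for m in strong[k] if is_coarse[m]}          # ... plus C_k^s
    weak = [m for m in range(n)                                   # N_i^w \ C-hat_i
            if m != i and A[i][m] != 0 and m not in strong[i] and m not in c_hat]

    denom = A[i][i] + sum(A[i][m] for m in weak)
    weights = {}
    for j in c_hat:
        s = A[i][j]
        for k in fs:
            bsum = sum(abar(A, k, m) for m in c_hat)
            if bsum != 0:  # guard against division by zero in this sketch
                s += A[i][k] * abar(A, k, j) / bsum
        weights[j] = -s / denom
    return weights
```

On the 1D stencil $[-1\ 2\ -1]$ with C-points 0 and 3 (a hypothetical test case), point 1 receives the weights $1/2$ and $1/2$: the connection from the F-neighbor 2 back to point 1 is ignored, so the interpolation is not the linear one.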
This is a better result than we would obtain for direct interpolation (6) and classical interpolation (9):
$$w_{1,0} = 1, \quad w_{1,3} = 0,$$
but worse than standard interpolation (13), for which we obtain the intuitively best interpolation
weights:
This can be remedied if we include not only connections $a_{jk}$ from strong fine neighbors $j$ of $i$
to points $k$ of the interpolatory set, but also connections $a_{ji}$ from $j$ to point $i$ itself. An alternative
to expression (7) for the error in strongly connected F-points is then given by
$$e_j \approx \frac{\sum_{k \in C_i \cup \{i\}} a_{jk} e_k}{\sum_{k \in C_i \cup \{i\}} a_{jk}} \qquad (17)$$
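with now
$$\tilde{a}_{ii} = a_{ii} + \sum_{n \in N_i^w \setminus \hat{C}_i} a_{in} + \sum_{k \in F_i^s} a_{ik} \frac{\bar{a}_{ki}}{\sum_{l \in \hat{C}_i \cup \{i\}} \bar{a}_{kl}} \qquad (20)$$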
We call this modified extended interpolation ‘extended+i’, and refer to it by ‘ext+i’ (or
sometimes ‘e+i’ to save space) in the tables below. If we apply it to the example illustrated in
Figure 3 we obtain weights (16).
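A minimal sketch of extended+i interpolation follows. The weight formula itself precedes this excerpt; substituting (17) into the residual equation with $\hat C_i$ in place of $C_i$ gives, as our reading, $w_{ij} = -\frac{1}{\tilde a_{ii}}\bigl(a_{ij} + \sum_{k \in F_i^s} a_{ik}\, \bar a_{kj} / \sum_{l \in \hat C_i \cup \{i\}} \bar a_{kl}\bigr)$ for $j \in \hat C_i$, with $\tilde a_{ii}$ from (20). As before, the sign convention for $\bar a$, the dense storage and the function names are assumptions.

```python
def abar(A, k, j):
    """a-bar_kj: keep a_kj only if its sign is opposite to that of a_kk (assumed)."""
    return A[k][j] if A[k][j] * A[k][k] < 0 else 0.0


def extended_plus_i_weights(A, i, is_coarse, strong):
    """Extended+i ('ext+i') interpolation weights for F-point i (illustrative)."""
    n = len(A)
    fs = [k for k in strong[i] if not is_coarse[k]]            # F_i^s
    c_hat = {j for j in strong[i] if is_coarse[j]}
    for k in fs:
        c_hat |= {m for m in strong[k] if is_coarse[m]}
    weak = [m for m in range(n)                                   # N_i^w \ C-hat_i
            if m != i and A[i][m] != 0 and m not in strong[i] and m not in c_hat]

    # Modified diagonal (20): the connections a-bar_ki back to i are kept.
    a_tilde = A[i][i] + sum(A[i][m] for m in weak)
    bsums = {}
    for k in fs:
        bsums[k] = sum(abar(A, k, l) for l in c_hat | {i})       # over C-hat_i and i
        if bsums[k] != 0:
            a_tilde += A[i][k] * abar(A, k, i) / bsums[k]

    weights = {}
    for j in c_hat:
        s = A[i][j]
        for k in fs:
            if bsums[k] != 0:
                s += A[i][k] * abar(A, k, j) / bsums[k]
        weights[j] = -s / a_tilde
    return weights
```

On the 1D stencil $[-1\ 2\ -1]$ with C-points 0 and 3 and F-points 1 and 2 (a hypothetical test case), this gives $w_{1,0} = 2/3$ and $w_{1,3} = 1/3$, the linear-interpolation weights, whereas extended interpolation (15) gives $1/2$ and $1/2$ on the same example.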
In this section we consider the cost of some of the interpolation operators described in the previous
sections. We use the following notations:
Here $f^w$ denotes the number of strong F-neighbors that are treated weakly, which occur only for
classical interpolation (8). Also, $s_k$ denotes the average number of C-points that are distance-one
neighbors of $j \in F_i^s$ and also distance-$k$ interpolatory points for $i$, the point to be interpolated; i.e.
$s_k$ is the number of nonzero coefficients $a_{jl}$, where $j \in F_i^s$ and $l$ is a distance-$k$ interpolatory point,
divided by the number of distance-$k$ interpolatory points for $i$. Note that $s_k \le c_k$, and $s_k$ is usually
strictly smaller than $c_k$. Note also that $n_k = f_k + c_k + w_k$.
In our considerations we assume a compressed sparse row data format, i.e. three arrays are used
to store the matrix: a real array that contains the coefficients of the matrix, an integer array that
contains the column indices for each coefficient and an integer array that contains pointers to the
beginning of each row for the other two arrays. We also assume an additional integer array that
indicates whether a point is an F- or a C-point.
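The storage scheme just described can be sketched as follows: three CSR arrays plus an integer F/C marker array. The convention `1` for C-points and `-1` for F-points, and the helper names, are assumptions for illustration; the lookup of coarse neighbors uses only integer comparisons, as in the text.

```python
def dense_to_csr(dense):
    """Convert a dense list-of-lists matrix into the three CSR arrays."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, val in enumerate(row):
            if val != 0:
                data.append(float(val))   # real array of coefficients
                indices.append(col)       # integer array of column indices
        indptr.append(len(data))          # integer array of row pointers
    return data, indices, indptr


def coarse_neighbors(i, indices, indptr, cf_marker):
    """Distance-one C-neighbors of row i, found by integer comparisons only."""
    return [indices[jj] for jj in range(indptr[i], indptr[i + 1])
            if indices[jj] != i and cf_marker[indices[jj]] == 1]


# 1D Laplacian on four points; cf_marker: 1 = C-point, -1 = F-point (assumed).
A = [[2, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 2]]
data, indices, indptr = dense_to_csr(A)
cf_marker = [1, -1, -1, 1]
```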
For all interpolation operators mentioned above, the interpolatory set must be determined first.
At the same time, the data structure for the interpolation operator can be determined.
This can be accomplished by sweeping through each row that belongs to an F-point: coarse neighbors
are identified via integer comparisons, and the pointer array for the interpolation operator
is generated. For the distance-two interpolation schemes, the neighbors of strong fine neighbors
must also be checked. This requires $n_1$ comparisons for direct and classical interpolation, and
$(f_1+1)n_1$ comparisons for extended, extended+i and standard interpolation. The final data structure
contains $N_c + N_f c_1$ coefficients for classical and direct interpolation, and $N_c + N_f(c_1+c_2)$
coefficients for extended(+i) and standard interpolation.
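The sweep that builds the pointer array of the interpolation operator can be sketched as below. It assumes CSR arrays `indptr`/`indices`, a marker array with `1` for C-points, and strong neighbor sets; a Python set stands in for the integer marker array a C implementation would typically use to avoid duplicate entries. The function name is hypothetical. Each C-point contributes one coefficient, so the final pointer value is $N_c + N_f c_1$ for distance-one sets and $N_c + N_f(c_1+c_2)$ for distance-two sets.

```python
def interp_row_pointers(indptr, indices, cf_marker, strong, distance_two):
    """Row-pointer array of the interpolation matrix P (illustrative sketch).

    With distance_two=False the interpolatory set of an F-point i is C_i^s;
    with distance_two=True it also contains the strong C-neighbors of the
    strong F-neighbors of i.
    """
    n = len(indptr) - 1
    p_ptr = [0]
    for i in range(n):
        if cf_marker[i] == 1:
            count = 1                    # C-points are injected: one coefficient
        else:
            interp = set()               # stands in for an integer marker array
            for jj in range(indptr[i], indptr[i + 1]):
                j = indices[jj]
                if j == i or j not in strong[i]:
                    continue
                if cf_marker[j] == 1:
                    interp.add(j)
                elif distance_two:       # look through the strong F-neighbor j
                    for kk in range(indptr[j], indptr[j + 1]):
                        k = indices[kk]
                        if cf_marker[k] == 1 and k in strong[j]:
                            interp.add(k)
            count = len(interp)
        p_ptr.append(p_ptr[-1] + count)
    return p_ptr
```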
Next, the interpolation data structure is filled.
Table III. Average number of distance-one (c1) and distance-two (c2) interpolatory
points and times for various interpolation operators.
                            Interpolation (time)
Stencil     c1    c2    Direct   clas   std    ext    ext+i
5-point     2.3   1.9   0.27     0.35   0.64   0.51   0.54
9-point     1.8   2.8   0.36     1.11   2.16   2.09   2.48
7-point     2.7   4.1   0.19     0.31   0.80   0.73   0.81
27-point    2.3   7.2   0.40     3.72   8.00   7.43   8.32
for distance-two interpolation operators, particularly for the 3D problems. These effects are
significant, especially since the stencils become larger on coarser levels and thus have a greater
impact on the total setup time.
While the previous section examined the computational cost of the interpolation operator, we are of
course mainly interested in the performance of the complete solver, which also includes coarsening,
the generation of the coarse-grid operator and the solve phase. We apply the new and old
interpolation operators here to a variety of test problems from [5] to compare their efficiency. We did
not include results using direct interpolation, since it performs worse than classical and multipass
interpolation for the problems considered, nor results using multipass interpolation followed by
Jacobi interpolation, since these results were very similar to those obtained for ‘clas+j’. All
tests used AMG as a solver with a strength threshold of 0.25 and coarse–fine
Gauss–Seidel as a smoother. The iterations were stopped when the relative residual was smaller
than $10^{-8}$. We also report the operator complexity $C_{op}$, defined as the sum of the numbers
of nonzeros of all matrices $A_k$ divided by the number of nonzeros of the original matrix $A = A_1$.
$C_{op}$ is an indicator of computational complexity and memory use: large operator complexities
lead to large setup times, times per cycle and memory requirements.
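The operator complexity defined above is a simple ratio of nonzero counts. A minimal sketch, with a hypothetical hierarchy of nonzero counts (in CSR storage, the nonzero count of each $A_k$ is the last entry of its row-pointer array):

```python
def operator_complexity(nnz_per_level):
    """C_op: total nonzeros of all level matrices A_1, A_2, ... divided by nnz(A_1)."""
    if not nnz_per_level or nnz_per_level[0] <= 0:
        raise ValueError("the fine-grid matrix A_1 must have nonzeros")
    return sum(nnz_per_level) / nnz_per_level[0]


# Hypothetical hierarchy: nonzeros of A_1 (fine grid) down to the coarsest level.
print(operator_complexity([5_000_000, 3_200_000, 1_400_000, 300_000]))  # prints 1.98
```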
Table IV shows results for the 2D Poisson problem $-\Delta u = f$ using a 5-point finite difference
discretization and a 9-point finite element discretization. Table V shows results for the 2D rotated
Table IV. AMG for the 5- and 9-point 2D Laplace problem on a 1000×1000 square
with random right-hand side using different interpolation operators.
5-point 9-point
Method Cop # its Time Cop # its Time
clas 1.92 244 151.60 1.24 157 100.44
clas+j 2.65 15 26.09 1.65 9 21.03
mp 1.92 244 152.34 1.24 183 115.72
ext 2.54 16 20.24 1.60 10 18.26
ext+i 2.57 11 16.93 1.60 10 18.40
std 2.56 16 20.63 1.60 17 23.06
Table V. AMG for a problem with 45◦ and 60◦ rotated anisotropy on a 512×512 square
using different interpolation operators.
45◦ 60◦
Method Cop # its Time Cop # its Time
clas 1.90 168 38.60 1.82 >1000
clas+j 2.39 29 10.50 3.40 424 131.85
mp 1.90 163 37.16 1.82 >1000
ext 2.07 31 8.75 2.69 217 59.70
ext+i 2.07 11 4.05 2.89 97 29.78
std 2.07 13 4.53 2.89 148 43.68
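anisotropic problem
$$-(c^2 + \varepsilon s^2)\, u_{xx} + 2(1-\varepsilon)\, s c\, u_{xy} - (s^2 + \varepsilon c^2)\, u_{yy} = 1 \qquad (21)$$
with $s = \sin\gamma$, $c = \cos\gamma$ and $\varepsilon = 0.001$, for rotation angles $\gamma = 45^\circ$ and $60^\circ$.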
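The coefficients of (21) for a given rotation angle can be computed as follows. This is a minimal sketch; the function name and the use of degrees for $\gamma$ are our own choices. For $\gamma = 0$ it reduces to $-u_{xx} - \varepsilon u_{yy}$, i.e. strong coupling along the $x$-axis; for $\gamma = 45^\circ$ both second-derivative coefficients equal $-(1+\varepsilon)/2$ and the mixed term dominates the anisotropy.

```python
import math


def rotated_anisotropy_coefficients(gamma_deg, eps):
    """Coefficients of u_xx, u_xy, u_yy in (21) for rotation angle gamma (degrees)."""
    g = math.radians(gamma_deg)
    s, c = math.sin(g), math.cos(g)
    cxx = -(c ** 2 + eps * s ** 2)
    cxy = 2.0 * (1.0 - eps) * s * c
    cyy = -(s ** 2 + eps * c ** 2)
    return cxx, cxy, cyy


for gamma in (45, 60):
    print(gamma, rotated_anisotropy_coefficients(gamma, 0.001))
```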
The use of the distance-two interpolation operators combined with PMIS shows significant
improvements over classical and multipass interpolations with regard to number of iterations as
well as time. The best interpolation operator here is the ext+i interpolation, which has the lowest
number of iterations and times in general. The difference is especially significant in the case of the
problems with rotated anisotropies. The operator complexity is larger, however, as was expected.
This increase becomes more significant for 3D problems. Here we consider the partial differential
equation
$$-(a u_x)_x - (a u_y)_y - (a u_z)_z = f \qquad (22)$$
on an $n \times n \times n$ cube. For the Laplace problem $a(x, y, z) = 1$; for the problem denoted by ‘Jumps’
we take $a(x, y, z) = 1000$ in the interior cube $0.1 < x, y, z < 0.9$, $a(x, y, z) = 0.01$
in $0 < x, y, z < 0.1$ and in the other cubes of size $0.1 \times 0.1 \times 0.1$ located at the corners of the
unit cube, and $a(x, y, z) = 1$ elsewhere. The 27-point problem is defined by a matrix with a 27-point stencil
with the value 26 at the center and −1 at all other stencil points; it is included to test
a problem with a larger stencil.
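The piecewise-constant coefficient of the ‘Jumps’ problem can be written down directly from the description above. A minimal sketch, assuming points lie in the open unit cube; points exactly on an interface (a coordinate equal to 0.1 or 0.9) fall into the "elsewhere" case here, a choice the text does not specify.

```python
def a_jumps(x, y, z):
    """Diffusion coefficient of the 'Jumps' problem on the unit cube (sketch)."""
    coords = (x, y, z)
    if all(0.1 < t < 0.9 for t in coords):
        return 1000.0                    # interior cube
    if all(t < 0.1 or t > 0.9 for t in coords):
        return 0.01                      # one of the eight 0.1 x 0.1 x 0.1 corner cubes
    return 1.0                           # everywhere else
```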
Table VI. AMG for a 7-point 3D Laplace problem, a problem with a 27-point stencil
and a 3D structured PDE problem with jumps on a 60×60×60 cube with a random
right-hand side using different interpolation operators.
7-point 27-point Jumps
Method Cop # its Time Cop # its Time Cop # its Time
clas 2.34 45 10.21 1.09 28 10.58 2.50 >1000
clas+j 5.12 11 20.35 1.34 8 17.10 5.37 15 20.99
mp 2.35 47 10.40 1.10 30 9.39 2.50 80 17.37
ext 4.93 11 16.70 1.35 8 21.32 5.27 15 16.89
ext+i 4.27 9 14.48 1.35 8 21.55 5.10 11 15.96
std 4.20 10 12.78 1.38 10 18.58 5.21 18 17.47
[Plot: number of iterations versus n for clas, mp, clas+j, std, ext and ext+i.]
Figure 4. Number of iterations for PMIS with various interpolation operators for a 3D 7-point
Laplace problem on an n×n×n grid.
While for these problems AMG convergence factors for distance-two interpolation improve
significantly compared with classical and multipass interpolation, as can be seen in Table VI,
overall times are worse for the 7-point 3D Laplace problem as well as for the 27-point problem
on a 60×60×60 grid. The only problem on the 60×60×60 grid for which distance-two interpolation
operators also pay off in time is the problem with jumps, which requires
long-distance interpolation even to converge. Distance-two interpolation operators lead to
complexities up to about twice as large as those obtained with classical or multipass interpolation,
which work relatively well for the 7- and 27-point problems on the 60×60×60 grid. However,
when the problem size is scaled up, the distance-two operators show very good scalability in terms of AMG convergence
factors, as can be seen in Figure 4, which shows the number of iterations for a 3D 7-point Laplace
problem on an n×n×n grid for increasing n. The anticipated large differences in numbers of
iterations between distance-one and distance-two interpolation show up in the 2D results of Tables
IV and V on grids with 1000 and 512 points per direction, respectively, but are not yet particularly significant in the
3D results of Table VI with only 60 points per direction. It is expected, however, that for the
large problems that we want to solve on a parallel computer, distance-two interpolation operators
will lead to overall better times than classical or multipass interpolation due to scalable AMG
convergence factors, if the operator complexity can be kept under control. See Section 9 for actual
test results.
While the methods described in the previous section largely restore grid-independent convergence
to AMG cycles that use PMIS-coarsened grids, they also lead to much larger operator complexities
for the V-cycles. Therefore, it is necessary to consider ways to reduce these complexities while
(hopefully) retaining the improved convergence. In this section we describe a few ways of achieving
this.
Table VII. AMG for the 9-point 2D Laplace problem on a 1000×1000 square with random right-hand
side using different interpolation operators and rotated anisotropies of 0.001 on a 512×512 grid.
9-point 45◦ 60◦
Method Cop # its Time Cop # its Time Cop # its Time
ext 1.60 10 18.26 2.07 31 8.67 2.69 217 59.70
ext-cc 1.45 14 17.35 2.06 34 9.33 2.62 247 66.22
ext-ccs 1.43 15 17.46 2.05 34 9.13 2.42 270 67.96
ext+i 1.60 10 18.40 2.07 11 4.05 2.89 97 29.78
ext+i-cc 1.45 14 17.91 2.05 14 4.72 2.80 117 34.63
ext+i-ccs 1.42 15 17.98 2.04 14 4.73 2.51 143 38.87
Table VIII. AMG for a 7- and 27-point 3D Laplace problem and a 3D structured PDE problem with
jumps on a 60×60×60 cube with a random right-hand side using different interpolation operators.
7-point 27-point Jumps
Method Cop # its Time Cop # its Time Cop # its Time
ext 4.93 11 16.70 1.35 8 21.32 5.27 15 16.89
ext-cc 4.62 12 11.11 1.33 7 11.82 4.86 16 11.59
ext-ccs 4.00 12 8.46 1.31 7 10.34 4.23 17 9.61
ext+i 4.27 9 14.48 1.35 8 21.55 5.10 11 15.96
ext+i-cc 4.12 9 9.16 1.33 7 12.48 4.66 13 10.35
ext+i-ccs 3.64 9 7.23 1.31 7 10.95 4.00 14 8.37
smaller than the Cop value for the ‘x’ interpolations, and the number of iterations is nearly constant
as a function of problem size, and only slightly larger than the number of iterations for the full
‘x’ interpolation formulas [9]. This shows that using distance-two interpolation formulas with
reduced complexities restores the grid-independent convergence and scalability of AMG on PMIS-
coarsened grids, without the need for GMRES acceleration. This makes these methods suitable
algorithms for large problems on parallel computers, as is discussed below.
Table IX. Effect of truncation on AMG with ext+i interpolation for a 7-point 3D
Laplace problem on a 60×60×60 cube with a random right-hand side.
Truncation factor                     Max. # of weights
factor   Cop    # its   Time          kmax   Cop    # its   Time
0        4.27   9       14.48
0.1      4.13   9       10.72         7      3.75   9       8.63
0.2      3.88   9       8.52          6      3.42   9       7.41
0.3      3.39   10      6.82          5      3.01   10      6.42
0.4      3.02   13      6.60          4      2.73   14      6.30
0.5      2.75   20      7.67          3      2.48   24      7.41
truncation leads to an increase in total time, similar to what was reported for interpolatory set restriction
in the previous section.
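Table IX compares two truncation strategies: a truncation factor and a maximal number of weights $k_{max}$ per row. Their precise definitions are not reproduced in this excerpt, so the sketch below assumes a common form: weights smaller in magnitude than the truncation factor times the largest weight magnitude in the row are dropped, then at most $k_{max}$ of the largest remaining weights are kept, and the kept weights are rescaled so that the row sum is unchanged. Some implementations rescale positive and negative weights separately instead; the function name is hypothetical.

```python
def truncate_row(weights, trunc_factor=0.0, max_elmts=None):
    """Truncate one row of interpolation weights {j: w_ij} (illustrative sketch)."""
    if not weights:
        return {}
    total = sum(weights.values())
    wmax = max(abs(w) for w in weights.values())
    # Truncation factor: drop weights below trunc_factor * max |w|.
    kept = {j: w for j, w in weights.items() if abs(w) >= trunc_factor * wmax}
    # Maximal number of weights: keep the max_elmts largest in magnitude.
    if max_elmts is not None and len(kept) > max_elmts:
        largest = sorted(kept, key=lambda j: abs(kept[j]), reverse=True)[:max_elmts]
        kept = {j: kept[j] for j in largest}
    # Rescale so that the row sum of the truncated row matches the original.
    kept_sum = sum(kept.values())
    if kept_sum != 0:
        scale = total / kept_sum
        kept = {j: w * scale for j, w in kept.items()}
    return kept
```

Fewer kept weights mean fewer nonzeros in the interpolation matrix and hence in the coarse-grid operators, which is what lowers $C_{op}$ in Table IX.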
8. PARALLEL IMPLEMENTATION
This section describes the parallel implementation and gives a rough idea of the cost involved,
with particular focus on the increase in communication required for the distance-two interpolation
formulae compared with distance-one interpolation. Since the core computation for the interpolation
routines is approximately the same as in the sequential case, we only focus on the additional
computations that are required for inter-communication between processors.
In parallel, each matrix is stored in a parallel data format, the ParCSR matrix data structure,
which is described and analyzed in detail in [10]. Matrices are distributed across processors
by contiguous blocks of rows, which are stored via two compressed sparse row matrices, one
storing the local entries and the other storing the off-processor entries. An additional
array contains a mapping for the off-processor neighbor points. The data structure also contains
the information necessary to retrieve information from distance-one off-processor neighbors.
However, it does not contain information on off-processor distance-two neighbors, which complicates
the parallel implementation of distance-two interpolation operators. When determining these
neighbors, four scenarios need to be considered; see Figure 5. Consider point i,
the point to be interpolated, which resides on Processor 0. A distance-two neighbor
can reside on the same processor as i, like point j; it can be a distance-one neighbor of
another point on Proc. 0, like point l, and therefore already be contained in the off-processor
mapping; it can be a new point on a neighbor processor, like point k; or it can be located on
a processor that is currently not a neighbor processor of Proc. 0, like point m.
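The four scenarios can be expressed as a small classification routine. This is a sketch under simplifying assumptions: every processor is assumed to know the full contiguous row partition `row_starts`, which is the kind of global information the assumed partition algorithm [11] is designed to avoid storing on large processor counts; the names are hypothetical and not part of the ParCSR data structure.

```python
from bisect import bisect_right


def classify_distance_two_point(g, my_rank, row_starts, offd_map, neighbor_procs):
    """Return which of the four scenarios applies to the global point g.

    row_starts: row_starts[p] is the first global row owned by processor p;
                the last entry is the global number of rows
    offd_map: set of global indices already in the off-processor mapping
    neighbor_procs: set of processors that are already neighbor processors
    """
    owner = bisect_right(row_starts, g) - 1      # contiguous blocks of rows
    if owner == my_rank:
        return "local"                           # like point j
    if g in offd_map:
        return "known off-processor point"       # like point l
    if owner in neighbor_procs:
        return "new point, known processor"      # like point k
    return "new point, new processor"            # like point m
```

Only the last two cases require new communication information, which is why step 3 below must also update the list of neighbor processors.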
Five additional steps are required for the parallel implementation; rough estimates of their
cost are given below. Operations include floating point and integer operations as well as the
message passing (sends and receives) required to communicate data across processors. We use the
following notation: $n_1$ denotes the average number of distance-one neighbors per point, as defined
previously, $p$ is the total number of processors, $q_i$ is the average number of distance-$i$ neighbor
processors per processor, and $N_i^o$ is the average number of distance-$i$ off-processor points, which
equals the sum of the average number of distance-$i$ off-processor C-points, $C_i^o$, and of distance-$i$
off-processor F-points, $F_i^o$. Note that the estimates of the number of operations and the number of
processors involved given below are per processor.
[Figure 5: point i on Proc. 0 and points k, l and m on Processors 1–3.]
1. Communication of the C/F splitting for all off-processor neighbor points: This is required for
all interpolation operators and takes $O(N_1^o) + O(q_1)$ operations.
2. Obtaining the row information for off-processor distance-one F-points: This step is necessary
for classical and distance-two interpolation but not for direct interpolation, which only
uses local matrix coefficients to generate the interpolation formula. It requires $O(n_1 F_1^o) +
O(F_1^o) + O(q_1)$ operations.
3. Determining off-processor distance-two points and additional communication information:
This step is only required for distance-two interpolation operators. Finding the new off-processor
points, which requires checking whether they are already contained in the map,
and describing the off-processor connections takes $O(n_1 F_1^o \log N_1^o)$ operations. Sorting the
new information takes $O(N_2^o \log N_2^o)$ operations. Obtaining the communication information
for the new points using an assumed partition algorithm [11] requires $O(N_2^o) + O(\log p) +
O((q_1+q_2)\log(q_1+q_2))$ operations. Obtaining the additional C/F splitting information takes
$O(N_2^o) + O(q_1+q_2)$ operations.
4. Communication of fine-to-coarse mappings: This step requires $O(N_1^o) + O(q_1)$ operations
for distance-one interpolation and $O(N_1^o + N_2^o) + O(q_1+q_2)$ operations for the distance-two
interpolation schemes.
5. Generating the interpolation matrix communication package: This step requires $O(C_1^o) +
O(\log p) + O(q_1 \log q_1)$ operations for distance-one interpolation and $O(C_1^o + C_2^o) +
O(\log p) + O((q_1+q_2)\log(q_1+q_2))$ operations for distance-two interpolation. Note that if
truncation is used, $C_i^o$ should be replaced by $\tilde{C}_i^o$, with $\tilde{C}_i^o < C_i^o$ for $i = 1, 2$.
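The first part of step 3, finding off-processor distance-two points that are not yet in the sorted off-processor map, can be sketched with binary search, which gives the $\log N_1^o$ factor per lookup, followed by a sort of the new points. The names and the half-open local row range `[first_row, end_row)` are assumptions for illustration.

```python
from bisect import bisect_left


def new_offd_points(offd_map_sorted, candidates, first_row, end_row):
    """Sorted global indices in `candidates` that are neither owned by this
    processor (first_row <= g < end_row) nor already in offd_map_sorted."""
    new = set()
    for g in candidates:
        if first_row <= g < end_row:
            continue                                 # local point
        pos = bisect_left(offd_map_sorted, g)        # O(log N_1^o) lookup
        if pos < len(offd_map_sorted) and offd_map_sorted[pos] == g:
            continue                                 # already in the map
        new.add(g)
    return sorted(new)                               # O(N_2^o log N_2^o)
```

The candidates would be the column indices found in the rows of off-processor distance-one F-points obtained in step 2.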
Summarizing these results, direct interpolation requires the least amount of communication,
followed by classical interpolation. Parallel implementation of distance-two interpolation requires
more communication steps and additional data manipulation, and involves more data and neighbor
processors. How significantly this overhead impacts the total time depends on many factors, such
as the problem size per processor, the stencil size, the computer architecture and more. Parallel
scalability results are presented below.
In this section, we investigate weak scalability of the new interpolation operators by applying the
resulting AMG methods to various problems.
The following problems were run on Thunder, an Intel Itanium2 machine with 1024 nodes of
four processors each, located at Lawrence Livermore National Laboratory, unless stated otherwise.
In this section, p denotes the number of processors used.
Table X. Times in seconds (number of iterations) for a 9-point 2D Laplace problem with 300×300 points
per processor; ‘n.c.’ denotes ‘not converging within 500 iterations’.
p clas clas+j mp std ext e+i e+i-cc e+i-ccs
1 15(88) 3(9) 18(105) 4(15) 3(10) 3(10) 3(12) 3(13)
64 48(245) 4(11) 57(278) 6(20) 4(12) 4(12) 5(16) 5(19)
256 79(400) 5(12) 85(436) 8(27) 5(13) 5(13) 5(19) 6(21)
1024 104(494) 6(13) n.c. 9(27) 6(14) 6(14) 7(21) 7(21)
Table XI. Times in seconds (number of iterations) for a 2D problem with a 45◦ rotated anisotropy of
0.001 with 300×300 points per processor; ‘n.c.’ denotes ‘not converging within 500 iterations’.
p clas clas+j mp std ext e+i e+i-cc e+i-ccs
1 24(116) 7(27) 24(119) 3(11) 7(29) 3(10) 3(12) 3(12)
64 n.c. 12(36) 96(401) 7(22) 11(39) 5(16) 7(21) 7(23)
256 n.c. 13(37) n.c. 8(25) 12(42) 6(18) 8(25) 8(27)
1024 n.c. 15(40) n.c. 10(29) 14(45) 8(21) 10(29) 11(31)
Table XII. Times in seconds (number of iterations) for a 2D problem with a 60◦ rotated anisotropy of
0.001 with 300×300 points per processor; ‘n.c.’ denotes ‘not converging within 500 iterations’.
p clas clas+j mp std ext e+i e+i-cc e+i-ccs
1 n.c. 105(342) n.c. 30(107) 45(172) 22(79) 24(87) 28(112)
64 n.c. n.c. n.c. 79(256) 96(330) 47(152) 59(196) 70(254)
256 n.c. n.c. n.c. 95(305) 110(374) 56(176) 70(227) 84(299)
1024 n.c. n.c. n.c. 113(357) 123(408) 62(193) 82(263) 100(347)
Table XIII. Total times in seconds (number of iterations) for a 7-point 3D Laplace problem
with 40×40×40 points per processor.
p clas clas+j mp std ext e+i ext-ccs e+i-ccs
1 5(33) 8(11) 6(34) 5(9) 7(11) 6(8) 4(12) 3(9)
64 17(80) 18(12) 16(79) 14(18) 16(12) 12(10) 9(14) 7(11)
512 33(149) 26(12) 28(126) 20(26) 20(14) 17(15) 11(14) 11(11)
1000 39(175) 41(12) 31(138) 26(31) 30(13) 31(39) 15(15) 16(14)
1728 51(229) 63(12) 37(159) 35(41) 46(13) 40(33) 22(15) 24(16)
Table XIV. Total times in seconds (number of iterations) for a 7-point 3D Laplace problem
with 40×40×40 points per processor.
p ext4 ext5 e+i4 e+i5 ext-cc5 e+i-cc5 std5 clas+j0.1
1 3(13) 3(11) 3(12) 3(9) 3(11) 3(9) 4(12) 3(13)
64 6(19) 7(15) 7(19) 7(13) 6(14) 6(13) 9(25) 9(23)
512 9(25) 8(18) 11(28) 10(19) 8(17) 8(17) 15(39) 13(36)
1000 10(25) 11(18) 11(30) 12(20) 10(18) 9(17) 17(39) 15(37)
1728 12(29) 12(21) 13(35) 14(24) 11(21) 11(20) 28(46) 19(45)
Table XV. Total times in seconds (number of iterations) for a structured 3D problem with
jumps with 40×40×40 points per processor.
p mp clas+j ext e+i ext-ccs e+i-ccs std4 ext-cc5
1 11(64) 8(14) 7(14) 6(10) 5(18) 4(15) 6(26) 5(17)
64 35(176) 20(17) 17(17) 15(14) 11(21) 9(19) 18(71) 11(24)
512 58(280) 31(20) 24(24) 21(20) 15(24) 13(21) 27(98) 11(30)
1000 65(306) 35(21) 27(20) 26(21) 19(24) 18(22) 33(113) 14(33)
1728 77(350) 60(21) 73(70) 43(26) 25(29) 29(23) 53(169) 17(36)
ext-ccs (3.9–4.2), e+i-cc (4.0–4.3), and e+i-ccs (3.6–3.8). For the sake of saving space, we
did not record the results for ext-cc or e+i-cc, but the times and number of iterations for these
methods were in between those of ext and ext-ccs, or e+i and e+i-ccs, respectively. Interestingly,
the complexity-reducing strategies e+i-cc and e+i-ccs show not only better scalability with
regard to time, but also better scalability of convergence factors than e+i interpolation in this
case.
Thus, for this problem the complexity-reducing strategies pay off. Table XIV shows results
for various truncated interpolation schemes. We used the truncation strategy that restricts the number
of weights per row, with a maximum of either 4 or 5 elements. While we present
results for both ext and e+i, we present only the faster result for each of the remaining interpolation
schemes to save space. We used a truncation factor of 0.1 for clas+j. Operator
complexities were fairly consistent across increasing numbers of processors: we obtained 2.9
for ext4, 3.2 for ext5, 2.8 for e+i4, 3.1 for e+i5, 3.2 for ext-cc5, 3.1 for e+i-cc5, 3.2 for std5,
and 3.0 for clas+j0.1. Clearly, using four rather than five weights leads to lower complexities,
but larger numbers of iterations. Total times are not significantly different. Comparing the fastest
method, e+i-cc5, on 1728 processors with PMIS with classical interpolation, we see an improvement
by a factor of 11 in the number of iterations and by a factor of 5 in total time,
with a slight increase in complexity.
Table XV shows results for the problem with jumps (22), for which PMIS with classical
interpolation was shown to fail completely. Multipass interpolation converges here, with low
complexities of 2.4, but its scalability degrades severely. Applying Jacobi interpolation to classical
interpolation leads to very good convergence but, owing to operator complexities between 5.1 and
5.7, to a much more expensive setup and solve cycle. Applying a truncation factor of 0.1
as in the previous example leads to extremely poor convergence and is not helpful here. Standard
interpolation converges very well for a small number of processors, but diverges for p ≥ 64.
Interestingly, std4 converges, albeit slowly.
Table XVI. Number of iterations (operator complexities) for the unstructured 3D problem with jumps.
AMG is used here as a preconditioner for GMRES(10).
CLJP PMIS
p clas clas4 clas mp e+i e+i-cc e+i4
1 9(5.6) 9(4.2) 18(1.5) 20(1.5) 9(2.7) 10(2.2) 9(1.8)
64 11(6.7) 12(4.6) 62(1.5) 34(1.5) 11(3.0) 13(2.3) 13(1.9)
256 11(7.8) 12(5.0) 72(1.5) 34(1.5) 12(2.9) 12(2.3) 13(1.8)
512 11(7.2) 13(4.6) 118(1.5) 35(1.5) 12(3.0) 13(2.4) 12(1.8)
1024 10(8.6) 12(5.2) 162(1.6) 39(1.6) 12(3.4) 12(2.6) 14(2.0)
[Plot: total time in seconds versus number of processors for CLJP/clas, CLJP/clas4, PMIS/clas, PMIS/e+i, PMIS/mp, PMIS/e+i-cc and PMIS/e+i4.]
Figure 7. Total times for a diffusion problem with highly discontinuous material properties. AMG is used
here as a preconditioner for GMRES(10).
still did not achieve perfect scalability. Total times for CLJP with clas4 interpolation are comparable
with those for PMIS with classical interpolation: the small complexities of PMIS offset its
significantly worse convergence factors. Extended+i and e+i-cc interpolation lead to
better scalability than the methods mentioned above, owing to their lower complexities compared
with CLJP and their better convergence factors compared with PMIS with classical interpolation.
Multipass interpolation leads to even better timings, but the overall best time and scalability
are achieved by extended+i interpolation truncated to four weights per fine point.
For this problem, extended interpolation performs similarly to extended+i interpolation. Standard
interpolation gives similar results on one processor, but its number of iterations gradually increases
from 11 on one processor to 34 on 1024 processors.
The second problem is a 3D linear elasticity problem on the same domain as above. However,
a smaller grid is used, since this problem requires more memory, leading to about 30 000
degrees of freedom per processor. The Poisson ratio of the pile driver in the middle of the
domain was chosen to be 0.4, and the Poisson ratios in the surrounding regions were 0.1, 0.3, 0.3
Table XVII. Number of iterations for the 3D elasticity problem; range of operator complexities. AMG is
used here as a preconditioner for conjugate gradient.
CLJP PMIS
p clas clas4 clas mp e+i e+i-cc e+i-ccs e+i4
1 64 63 94 93 68 69 72 72
8 83 84 159 131 89 95 96 90
64 92 96 210 179 97 105 112 107
512 — 112 319 247 108 109 123 123
[Plot: total time in seconds versus number of processors for CLJP/clas, CLJP/clas4, PMIS/clas, PMIS/mp, PMIS/e+i, PMIS/e+i-cc, PMIS/e+i-ccs and PMIS/e+i4.]
Figure 8. Total times for the 3D elasticity problem. AMG is used here as a
preconditioner for conjugate gradient.
and 0.2. Since this is a systems problem, the unknown-based AMG method for systems of PDEs
was used. For this problem, the conjugate gradient method was used as an accelerator, and hybrid
symmetric Gauss–Seidel as a smoother. The results are given in Table XVII and Figure 8. CLJP
ran out of memory for the 512-processor run. Here, too, extended+i interpolation with truncation
leads to the lowest run times and best scalability. Extended interpolation performed similarly to
extended+i interpolation. While standard interpolation performs similarly to the other distance-two
interpolation methods for a small number of processors, it performed significantly worse on 512
processors.
10. CONCLUSIONS
We have studied the performance of AMG methods using the PMIS-coarsening algorithm in
combination with various interpolation operators. PMIS with classical, distance-one interpolation
leads to an AMG method with low complexity, but has poor scalability in terms of AMG convergence
factors. The use of distance-two interpolation operators restores this scalability. However, it leads
to an increase in operator complexity. While this increase was fairly small for 2D problems and was
far outweighed by the much improved convergence, for 3D problems complexities were often twice
as large, which impaired scalability. To counter this complexity growth, we implemented various
complexity reducing strategies, such as the use of smaller interpolatory sets and interpolation
truncation. The resulting AMG methods, particularly the extended+i interpolation in combination
with truncation, lead to very good scalability for a variety of difficult PDE problems on large
parallel computers.
ACKNOWLEDGEMENTS
We thank Tzanio Kolev for providing the unstructured problem generator and Jeff Painter for the Jacobi
interpolation routine. This work was performed under the auspices of the U.S. Department of Energy by
University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.
REFERENCES
1. Brandt A, McCormick SF, Ruge JW. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and
its Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, 1984.
2. Ruge JW, Stüben K. Algebraic multigrid (AMG). In Multigrid Methods, Vol. 3 of Frontiers in Applied Mathematics,
McCormick SF (ed.). SIAM: Philadelphia, PA, 1987; 73–130.
3. Stüben K. Algebraic multigrid (AMG): an introduction with applications. In Multigrid, Trottenberg U, Oosterlee C,
Schüller A (eds). Academic Press: New York, 2000.
4. Cleary AJ, Falgout RD, Henson VE, Jones JE, Manteuffel TA, McCormick SF, Miranda GN, Ruge JW. Robustness
and scalability of algebraic multigrid. SIAM Journal on Scientific Computing 2000; 21:1886–1908.
5. De Sterck H, Yang UM, Heys JJ. Reducing complexity in parallel algebraic multigrid preconditioners. SIAM
Journal on Matrix Analysis and Applications 2006; 27:1019–1039.
6. Luby M. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing
1986; 15:1036–1053.
7. Briggs WL, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000.
8. Henson VE, Yang UM. BoomerAMG: a parallel algebraic multigrid solver and preconditioner. Applied Numerical
Mathematics 2002; 41:155–177.
9. Butler J. Improving coarsening and interpolation for algebraic multigrid. Master’s Thesis, Applied Mathematics,
University of Waterloo, 2006.
10. Falgout RD, Jones JE, Yang UM. Pursuing scalability for hypre’s conceptual interfaces. ACM Transactions on
Mathematical Software 2005; 31:326–350.
11. Baker A, Falgout RD, Yang UM. An assumed partition algorithm for determining processor inter-communication.
Parallel Computing 2006; 32:394–414.
12. Cleary AJ, Falgout RD, Henson VE, Jones JE. Coarse grid selection for parallel algebraic multigrid. In Proceedings
of the Fifth International Symposium on Solving Irregularly Structured Problems in Parallel. Springer: New York,
1998.
Copyright © 2007 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:115–139
DOI: 10.1002/nla
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:141–163
Published online 28 December 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.568
SUMMARY
We consider the numerical solution of time-dependent partial differential equations (PDEs) with random
coefficients. A spectral approach, called the stochastic finite element method, is used to compute the statistical
characteristics of the solution. This method transforms a stochastic PDE into a coupled system of
deterministic equations by means of a Galerkin projection onto a generalized polynomial chaos. An
algebraic multigrid (AMG) method is presented to solve the algebraic systems that result after discretization
of this coupled system. High-order time integration schemes of an implicit Runge–Kutta type and spatial
discretization on unstructured finite element meshes are considered. The convergence properties of the
AMG method are demonstrated by a convergence analysis and by numerical tests. Copyright © 2008
John Wiley & Sons, Ltd.
KEY WORDS: partial differential equations with random coefficients; Karhunen–Loève expansion; polynomial
chaos; algebraic multigrid; implicit Runge–Kutta time discretization
1. INTRODUCTION
∗Correspondence to: S. Vandewalle, Computer Science Department, Katholieke Universiteit Leuven, Celestijnenlaan
200A, B-3001 Leuven, Belgium.
†E-mail: stefan.vandewalle@cs.kuleuven.be
Monte Carlo methods are often easy to implement but rapidly become prohibitively expensive
as the accuracy demands increase. Examples of deterministic approaches include perturbation
methods [5], Neumann expansion methods [6] and the spectral stochastic finite element method
[7, 8]. Perturbation and Neumann expansion methods are restricted to small parameter variances
and compute only a few statistical moments of the solution. These restrictions do not hold for
the stochastic finite element method, which, in principle, enables one to compute the full statistical
characteristics of the solution; in particular, the probability distribution of the solution can also be
extracted. As such, it provides a valuable alternative to Monte Carlo simulations; see [4] for a
comparison.
The stochastic finite element method transforms a stochastic PDE into a system of coupled
deterministic PDEs by projecting the random solution onto a suitable finite-dimensional
random space. We will use the stochastic finite element approach to discretize the random part
of the PDE. For time-dependent PDEs, we will employ an implicit Runge–Kutta (IRK) time
integration scheme [9]. For IRK methods, the dimension of the linear systems to be solved at each
time step is proportional to the number of IRK stages. Multigrid methods are available for IRK
discretizations of deterministic parabolic PDEs [10, 11]. In this paper, we extend these methods
to PDEs with random coefficients. In particular, we study an algebraic multigrid (AMG)
approach suited for unstructured finite element meshes.
The paper is organized as follows. Section 2 describes the discretization of time-dependent
stochastic PDEs by means of the stochastic finite element and IRK method. The AMG method is
presented in Section 3. Its convergence properties are analyzed in Section 4 and further discussed
in Section 5. Section 6 addresses some implementation issues. In Section 7, numerical experiments
are given to illustrate the convergence behavior. Conclusions are presented in Section 8.
$$\frac{\partial u(x,t,\omega)}{\partial t} - \nabla\cdot\big(\kappa(x,t,\omega)\,\nabla u(x,t,\omega)\big) = b(x,t,\omega) \tag{1}$$
Copyright © 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:141–163
DOI: 10.1002/nla
ALGEBRAIC MULTIGRID FOR PDES WITH STOCHASTIC COEFFICIENTS 143
where $x \in D$, $t \in \mathcal{T} = [0, T]$ and $\omega \in \Omega$, a sample space. In what follows, we consider only the case
of a deterministic source term $b(x, t)$; the method, however, extends immediately to the more
general case of a stochastic source term. Model problem (1) is completed with suitable boundary
conditions and, in the time-dependent case, initial conditions; only deterministic conditions are
considered here.
2.2.1. Generalized polynomial chaos. Consider a Hilbert space $L^2(\Omega, \mathcal{F}, E)$ of square integrable
functions of $L$ independent random variables $\xi_i$ on $(\Omega, \mathcal{F}, E)$. A finite-dimensional subspace $S$ of
$L^2(\Omega, \mathcal{F}, E)$ is defined through a set of $Q$ basis functions $\{\Psi_q\}_{q=1,\ldots,Q}$ in the random variables
$\xi_1, \ldots, \xi_L$. Let $\xi$ denote a vector containing the random variables $\xi_1, \ldots, \xi_L$. The space $S$ is equipped
with an inner product defined by $\langle ab\rangle = \int_\Gamma a(y)\,b(y)\,w(y)\,dy$, with $w(y)$ denoting the joint probability
density corresponding to $\xi$, $\Gamma$ the support of $\xi$, and $a, b \in S$. This inner product corresponds
to the expectation of the product of its arguments. Several approaches have been proposed to
construct $S$, e.g. [7, 12–14]. Here, we employ an orthonormal basis of multivariate polynomials
$\Psi_l$ that are globally defined in each random variable $\xi_i$. These multivariate polynomials are built
as a product of univariate polynomials $\{\phi_{m_i}(\xi_i)\}_{i=1,\ldots,L}$ of degree $m_i$ in $\xi_i$, orthonormal w.r.t. the
probability measure corresponding to $\xi_i$.
Two criteria are often considered to determine the basis functions. One may limit the total degree
of the polynomial to a given value $P$, i.e. $\sum_{i=1}^{L} m_i \le P$. The total number of basis functions, $Q$, is
then given by $(L + P)!/(L!\,P!)$ [15]. Alternatively, one may limit the degrees of the univariate factors
separately, i.e. $m_i \le p_i$, $i = 1, \ldots, L$, for a given set of $p_i$-values. In this case $Q = \prod_{i=1}^{L}(p_i + 1)$ [4].
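Both truncation criteria are easy to check by brute-force enumeration of the multi-indices $(m_1, \ldots, m_L)$. The following Python sketch is our own illustration (the function names are hypothetical, not taken from the paper); it compares the enumeration with the closed-form counts quoted above.

```python
from itertools import product
from math import comb, prod


def total_degree_indices(L, P):
    """Multi-indices (m_1, ..., m_L) with m_1 + ... + m_L <= P (first criterion)."""
    return [m for m in product(range(P + 1), repeat=L) if sum(m) <= P]


def tensor_indices(p):
    """Multi-indices with m_i <= p_i for every i separately (second criterion)."""
    return list(product(*(range(pi + 1) for pi in p)))


def q_total_degree(L, P):
    """(L + P)! / (L! P!), i.e. the binomial coefficient C(L + P, P)."""
    return comb(L + P, P)


def q_tensor(p):
    """Product over i of (p_i + 1)."""
    return prod(pi + 1 for pi in p)


for L, P in [(1, 2), (5, 2), (10, 2), (5, 5)]:
    print(L, P, q_total_degree(L, P), len(total_degree_indices(L, P)))
print(q_tensor([2] * 5), len(tensor_indices([2] * 5)))
```

For $L = 5$ and $P = 2$ both counts give $Q = 21$, the value used in several experiments of Section 7. The enumeration is only practical for small $L$ and $P$, since it loops over $(P+1)^L$ candidates.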
Using the first criterion, a so-called generalized polynomial chaos basis [1, 12] can be
constructed. The univariate polynomials are chosen from the Wiener–Askey scheme according
to the probability distributions of the random variables $\xi_i$. The second criterion can be used to
create an alternative set of basis functions $\{\Psi_q\}$ [4, 13, 16], which possess a double orthogonality
property:
$$\langle\Psi_j\Psi_k\rangle = \delta_{j,k} \quad\text{and}\quad \langle\xi_i\Psi_j\Psi_k\rangle = \gamma_{ijk}\,\delta_{j,k} \tag{2}$$
with $\gamma_{ijk}$ a constant and $\delta_{j,k}$ the Kronecker delta. This property allows one to transform a linear
stochastic PDE into a system of uncoupled deterministic PDEs.
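For a single standard normal variable, one way to obtain a doubly orthogonal basis of degree $p$ is to take the Lagrange polynomials $\ell_j$ on the $p+1$ Gauss–Hermite nodes $z_j$, normalized as $\ell_j/\sqrt{w_j}$ with the quadrature weights $w_j$. Because the $(p+1)$-point Gauss rule integrates polynomials up to degree $2p+1$ exactly, $\langle\ell_j\ell_k\rangle = w_j\delta_{jk}$ and $\langle\xi\ell_j\ell_k\rangle = w_jz_j\delta_{jk}$, so the constant in (2) is simply the node $z_j$. This is our illustration of the double orthogonality property for one variable, not necessarily the construction used in [4, 13, 16]; the sketch verifies it with an independent 40-point quadrature rule.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def lagrange_basis(nodes, x):
    """Evaluate all Lagrange polynomials on `nodes` at points x; shape (len(nodes), len(x))."""
    x = np.asarray(x, dtype=float)
    out = np.ones((len(nodes), x.size))
    for j, zj in enumerate(nodes):
        for k, zk in enumerate(nodes):
            if k != j:
                out[j] *= (x - zk) / (zj - zk)
    return out


def double_orthogonal_nodes(p):
    """Gauss-Hermite nodes (roots of He_{p+1}) and probability weights for N(0, 1)."""
    z, w = hermegauss(p + 1)          # weight function exp(-x^2 / 2)
    return z, w / w.sum()             # normalize to the standard normal density


p = 3
z, w = double_orthogonal_nodes(p)
# Independent reference quadrature, exact for the degree-7 integrands used below.
xq, wq = hermegauss(40)
wq = wq / wq.sum()
Psi = lagrange_basis(z, xq) / np.sqrt(w)[:, None]   # Psi_j = l_j / sqrt(w_j)
G0 = (Psi * wq) @ Psi.T              # <Psi_j Psi_k>
G1 = (Psi * (wq * xq)) @ Psi.T       # <xi Psi_j Psi_k>
print(np.round(G0, 10))
print(np.round(np.diag(G1), 6), np.round(z, 6))
```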
Having specified an appropriate random basis, the solution $u(x, t, \omega)$ can be approximated by
a linear combination of basis functions with deterministic coefficients $u_q(x, t)$. When the basis
functions are collected in the column vector $\Psi$ and the coefficients in a column vector $u(x, t)$, we
can write
$$u(x,t,\omega) \approx \sum_{q=1}^{Q} u_q(x,t)\,\Psi_q(\xi) = \Psi^T u(x,t) \tag{3}$$
144 E. ROSSEEL, T. BOONEN AND S. VANDEWALLE
2.2.2. Discretization of random inputs. The random inputs are typically discretized either by
a generalized polynomial chaos expansion similar to (3), see e.g. [17, 18], or by a
Karhunen–Loève (KL) expansion [19]. The former leads to an approximation of the form
$$\kappa(x,t,\omega) \approx \sum_{i=1}^{Q}\kappa_i(x,t)\,\Psi_i(\xi) \quad\text{with}\quad \kappa_i(x,t) = \frac{\langle\kappa(x,t,\omega)\,\Psi_i\rangle}{\langle\Psi_i^2\rangle} \tag{4}$$
We will apply this type of discretization to model random inputs with a lognormal marginal
distribution. Analytical expressions for the corresponding coefficients $\kappa_i(x, t)$ are given in [20].
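For a single standard normal variable $\xi$ and a lognormal coefficient $\kappa = \exp(\mu + \sigma\xi)$ (standard deviation $\sigma$ of the underlying Gaussian), the coefficients in (4) with respect to the unnormalized probabilists' Hermite polynomials $He_i$, for which $\langle He_i^2\rangle = i!$, have the well-known closed form $\kappa_i = e^{\mu+\sigma^2/2}\,\sigma^i/i!$. The sketch below is restricted to this one-variable case (the general expressions are those of [20]); it checks the closed form against the projection formula in (4), evaluated by Gauss–Hermite quadrature. The values of $\sigma^2 = 0.3$ and $\mu = -\sigma^2/2$ (mean of $\kappa$ equal to 1) are chosen to mirror the settings described in Section 7.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def lognormal_chaos_coeffs(mu, sigma, P):
    """Closed-form coefficients of exp(mu + sigma * xi) in the He_i basis, i = 0..P."""
    return np.array([math.exp(mu + 0.5 * sigma**2) * sigma**i / math.factorial(i)
                     for i in range(P + 1)])


def projected_coeffs(mu, sigma, P, nquad=60):
    """The same coefficients via (4): <kappa He_i> / <He_i^2>, by Gauss-Hermite quadrature."""
    x, w = hermegauss(nquad)
    w = w / w.sum()                               # standard normal probability weights
    kappa = np.exp(mu + sigma * x)
    coeffs = []
    for i in range(P + 1):
        He_i = hermeval(x, [0] * i + [1])         # probabilists' Hermite polynomial He_i
        coeffs.append(np.sum(w * kappa * He_i) / np.sum(w * He_i**2))
    return np.array(coeffs)


sigma = math.sqrt(0.3)        # variance 0.3 of the underlying Gaussian variable
mu = -0.5 * sigma**2          # makes the mean of kappa equal to 1
print(lognormal_chaos_coeffs(mu, sigma, 4))
print(projected_coeffs(mu, sigma, 4))
```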
The truncated KL expansion approximates a random field $\kappa(x, t, \omega)$ as
$$\kappa(x,t,\omega) \approx \kappa_1(x,t) + \sum_{i=1}^{L}\kappa_{i+1}(x,t)\,\xi_i(\omega) \tag{5}$$
The function $\kappa_1(x, t)$ corresponds to the mean of $\kappa(x, t, \omega)$. The functions $\kappa_{i+1}(x, t)$ are eigenfunctions
of the covariance function $C_\kappa(x_1, t_1; x_2, t_2)$, scaled by the square root of the corresponding
eigenvalues. The random variables $\xi_i$ are uncorrelated, with zero mean and unit
variance [21]. We assume that these random variables are independent. Note that $L + 1$ terms
are needed to express a random input in an $L$-dimensional random space by a KL expansion,
compared with $Q$ terms in the case of a chaos expansion. Hence, a chaos expansion will be used
only when the KL expansion is difficult to compute.
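A minimal discrete KL sketch follows, assuming a 1-D field on $[0, 1]$ with the exponential covariance used in Section 7 (variance 0.1, correlation length 1). The eigenfunctions are approximated by eigenvectors of the covariance matrix on a uniform grid, a simple collocation stand-in for the continuous eigenproblem; the function name `discrete_kl` is hypothetical. The scaled modes play the role of $\kappa_{i+1}$ in (5).

```python
import numpy as np


def discrete_kl(x, variance, corr_len, L):
    """Truncated discrete KL expansion of a 1-D random field sampled at points x.

    Covariance: variance * exp(-|x1 - x2| / corr_len). Returns the L largest
    eigenvalues and the modes sqrt(lambda_i) * phi_i (one mode per column).
    """
    C = variance * np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)
    lam, phi = np.linalg.eigh(C)                  # ascending eigenvalues, orthonormal columns
    lam, phi = lam[::-1][:L], phi[:, ::-1][:, :L]  # keep the L largest
    return lam, phi * np.sqrt(lam)


x = np.linspace(0.0, 1.0, 101)
lam, modes = discrete_kl(x, variance=0.1, corr_len=1.0, L=5)
C_full = 0.1 * np.exp(-np.abs(x[:, None] - x[None, :]))
C_trunc = modes @ modes.T                         # covariance reproduced by 5 terms
captured = lam.sum() / np.trace(C_full)           # fraction of total variance captured
rng = np.random.default_rng(0)
sample = 1.0 + modes @ rng.standard_normal(5)     # one realization with mean kappa_1 = 1
print(lam, captured)
```

With this fairly long correlation length, a handful of modes already captures most of the variance, which is why small values of $L$ are often sufficient.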
2.2.3. Galerkin approach. The stochastic PDE (1) can be converted into a system of deterministic
PDEs for the unknown coefficients $u_q(x, t)$ that appear in (3). This is done by replacing $\kappa(x, t, \omega)$
by its approximation (4) or (5), inserting the right-hand side of (3) into the PDE and imposing
orthogonality of the resulting residual w.r.t. the chosen random basis. This results in
$$C_1\frac{\partial u(x,t)}{\partial t} - \sum_{i=1}^{L^*} C_i\,\nabla\cdot\big[\kappa_i(x,t)\,\nabla u(x,t)\big] = b(x,t)\,c \tag{6}$$
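For a KL input (5) with standard normal $\xi_i$ and an orthonormal Hermite chaos of total degree $P$, we assume the Galerkin matrices take the form $C_1 = \langle\Psi\Psi^T\rangle = I_Q$ and $C_{i+1} = \langle\xi_i\Psi\Psi^T\rangle$ (our reading of the definitions referenced as (7), which are not reproduced here). Their entries follow from the three-term recurrence $\xi\psi_n = \sqrt{n+1}\,\psi_{n+1} + \sqrt{n}\,\psi_{n-1}$ of the normalized probabilists' Hermite polynomials, as in this sketch (function names are ours):

```python
import numpy as np
from itertools import product


def total_degree_indices(L, P):
    """Multi-indices (m_1, ..., m_L) with m_1 + ... + m_L <= P."""
    return [m for m in product(range(P + 1), repeat=L) if sum(m) <= P]


def kl_galerkin_matrices(L, P):
    """C_1 = <Psi Psi^T> and C_{i+1} = <xi_i Psi Psi^T>, i = 1..L, for an orthonormal
    total-degree Hermite chaos in L independent standard normal variables."""
    idx = total_degree_indices(L, P)
    Q = len(idx)
    C = [np.eye(Q)]                           # orthonormal basis: C_1 = I_Q
    for i in range(L):
        Ci = np.zeros((Q, Q))
        for a, alpha in enumerate(idx):
            for b, beta in enumerate(idx):
                # Nonzero only if all other degrees agree and degree i differs by one.
                others_equal = all(alpha[j] == beta[j] for j in range(L) if j != i)
                if others_equal and abs(alpha[i] - beta[i]) == 1:
                    Ci[a, b] = np.sqrt(max(alpha[i], beta[i]))
        C.append(Ci)
    return idx, C


idx, C = kl_galerkin_matrices(L=2, P=2)
print(len(idx), [int(np.count_nonzero(Ci)) for Ci in C])
```

The matrices are symmetric and very sparse, which is what keeps the block operations of the AMG method affordable.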
After spatial discretization, the stochastic finite element method yields a system of ordinary
differential equations (ODEs):
$$(C_1\otimes M)\frac{du(t)}{dt} + \Big(\sum_{i=1}^{L^*} C_i\otimes K_i\Big)u(t) = c\otimes b(t) \quad\text{with}\quad u(t) = \begin{bmatrix}u_1(t)\\ \vdots\\ u_Q(t)\end{bmatrix} \text{ and } u_q(t)\in\mathbb{R}^N \tag{10}$$
Here, $M \in \mathbb{R}^{N\times N}$ is the mass matrix defined as $[M]_{kl} = \int_D s_k(x)\,s_l(x)\,dx$, and the matrices $C_i \in
\mathbb{R}^{Q\times Q}$ and the vector $c \in \mathbb{R}^Q$ are defined by (7) or (8).
$$x_i = u_m + \Delta t\sum_{j=1}^{s} a_{ij}\,f(t_m + c_j\Delta t,\, x_j), \quad i = 1,\ldots,s \tag{12}$$
Equation (11) expresses $u_{m+1}$ as an update to $u_m$ in terms of the stage values $\{x_i\}_{i=1,\ldots,s}$. Equation
(12) describes the system of equations to be solved to compute the stage values. The method is
fully characterized by the parameters $A_{irk} = [a_{ij}]$, $b_{irk} = [b_1 \ldots b_s]^T$ and $c_{irk} = [c_1 \ldots c_s]^T$. Equations
(11) and (12) are often rewritten in terms of the stage value increments $\Delta x_j := x_j - u_m$:
$$\Delta x_i = \Delta t\sum_{j=1}^{s} a_{ij}\,f(t_m + c_j\Delta t,\, u_m + \Delta x_j), \quad i = 1,\ldots,s \tag{14}$$
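For a deterministic linear system $M\,du/dt + Ku = g(t)$ (the case $Q = 1$ of (10)), the stage-increment equations (14) form one coupled linear system with the same Kronecker structure that reappears in (15) and (18). The sketch below is illustrative only: the two-stage Radau IIA coefficients, the dense solve and the name `irk_step` are our choices. It assumes the standard update $u_{m+1} = u_m + \Delta t\sum_j b_j f(t_m + c_j\Delta t, x_j)$ for (11), which by (14) can be written as $u_{m+1} = u_m + \Delta X A_{irk}^{-T}b_{irk}$ when $A_{irk}$ is invertible.

```python
import numpy as np

# Two-stage Radau IIA (order 3), chosen here only as an example scheme.
A_IRK = np.array([[5 / 12, -1 / 12],
                  [3 / 4, 1 / 4]])
B_IRK = np.array([3 / 4, 1 / 4])
C_IRK = np.array([1 / 3, 1.0])


def irk_step(M, K, g, u, t, dt, A=A_IRK, b=B_IRK, c=C_IRK):
    """One IRK step for M du/dt + K u = g(t), i.e. f(t, u) = M^{-1}(g(t) - K u).

    Multiplying (14) by M gives M dx_i + dt sum_j a_ij K dx_j = dt sum_j a_ij (g_j - K u_m),
    solved as (M kron I_s + dt K kron A) dx = rhs with unknowns ordered node by node
    and the s stage increments of one node stored consecutively.
    """
    N, s = len(u), len(c)
    G = np.column_stack([g(t + cj * dt) - K @ u for cj in c])   # column j: g_j - K u_m
    rhs = dt * (G @ A.T).reshape(-1)          # entry (n, i): dt * sum_j a_ij G[n, j]
    system = np.kron(M, np.eye(s)) + dt * np.kron(K, A)
    dX = np.linalg.solve(system, rhs).reshape(N, s)   # column i holds increment dx_i
    # dt * f_j can be recovered as dX A^{-T}, hence u_{m+1} = u_m + dX (A^{-T} b).
    return u + dX @ np.linalg.solve(A.T, b)


# Scalar test problem u' = -u: one step reproduces the Radau IIA stability function.
u1 = irk_step(np.eye(1), np.eye(1), lambda t: np.zeros(1), np.ones(1), 0.0, 0.1)
z = -0.1
R = (1 + z / 3) / (1 - 2 * z / 3 + z**2 / 6)
print(u1[0], R, np.exp(-0.1))
```

For Radau IIA methods the vector $A_{irk}^{-T}b_{irk}$ is the last unit vector, so the update reduces to $u_{m+1} = u_m + \Delta x_s$.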
We will apply the IRK method to system (10). The approximation at time $t_{m+1}$ to the solution
$u(t_{m+1})$ will be represented as $u_{m+1}$. Formulation (13)–(14) is used, with the stage vector increments
denoted simply as $x_j$, $j = 1, \ldots, s$. They are grouped together into a long vector $x \in \mathbb{R}^{NQs}$, where
the increments are numbered first along the random dimension, then along the spatial dimension
and finally according to the stages.
When the coefficient $\kappa(x, \omega)$ is time independent, system (14) discretizing (10) becomes
$$\Big(C_1\otimes M\otimes I_s + \Delta t\sum_{i=1}^{L^*} C_i\otimes K_i\otimes A_{irk}\Big)x = \tilde b \tag{15}$$
with $\tilde b$ being a known vector depending on $u_m$ and on the right-hand side of (10):
$$\tilde b = \Delta t\left((I_{NQ}\otimes A_{irk})\,P^T\left(c\otimes\begin{bmatrix}b(t_m + c_1\Delta t)\\ \vdots\\ b(t_m + c_s\Delta t)\end{bmatrix}\right) - \Big(\sum_{i=1}^{L^*} C_i\otimes K_i\otimes A_{irk}\Big)[u_m\otimes 1_s]\right) \tag{16}$$
and $1_s = [1 \ldots 1]^T \in \mathbb{R}^s$. The matrix $P^T$ permutes the rows of the vector it multiplies
so that all variables are grouped in the same order as the unknowns $x$.
In case of a time-dependent stochastic coefficient $\kappa(x, t, \omega)$, each of the elements of the stiffness
matrices $K_i$ (9) is time dependent. According to Equation (14), every stiffness matrix $K_i(t)$ is
evaluated at $s$ time positions $t = t_m + c_j\Delta t$, $j = 1, \ldots, s$. This leads to a total of $L^*\cdot s$ stiffness
matrices at each time step. Applying (14) yields the following system to be solved for the stage
vector increments:
$$\Big((C_1\otimes M\otimes I_s) + \Delta t\Big[\sum_{i=1}^{L^*} C_i\otimes K_i(t_m + c_1\Delta t)\otimes A_{irk}(:,1)\;\;\cdots\;\;\sum_{i=1}^{L^*} C_i\otimes K_i(t_m + c_s\Delta t)\otimes A_{irk}(:,s)\Big]P\Big)x = B \tag{17}$$
Matrix $P$ is an $NQs \times NQs$ permutation matrix. It permutes the columns of the matrix it
multiplies so that consecutive IRK stages are grouped together in blocks of $s$ columns. In the
remainder of the paper, the multigrid formulation and analysis are presented for a time-independent
coefficient $\kappa(x, \omega)$. The extension to the general case of $\kappa(x, t, \omega)$ is straightforward.
Remark 2.3
In Equation (15) the unknowns are ordered block-wise. The vector $x$ consists of $Q$ consecutive
blocks, each corresponding to the unknowns associated with a random mode. These
blocks can further be subdivided into $N$ blocks, each of which contains the IRK unknowns of one
spatial node. Similar to the discussion in [23] on unknown-based and point-wise ordering of
variables, the unknowns in Equation (15) can be reordered per spatial point. This yields the system
$$\Big(M\otimes C_1\otimes I_s + \Delta t\sum_{i=1}^{L^*} K_i\otimes C_i\otimes A_{irk}\Big)\hat x = \hat b \tag{18}$$
with $\hat b$ being the reordered version of $\tilde b$ (16). The vector $\hat x$ contains $N$ blocks, each
corresponding to the $Qs$ unknowns related to a spatial point. This point-based ordering is more
convenient to illustrate the block operations of the point-based AMG method presented in Section 3;
see Remark 3.1.
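The reordering between (15) and (18) is a fixed permutation of rows and columns. The following sketch uses random test matrices of tiny size and writes the index map out explicitly; it checks that permuting the matrix of (15) symmetrically yields the matrix of (18). The helper names are ours.

```python
import numpy as np


def random_spd(n, rng):
    """Random symmetric positive definite test matrix."""
    X = rng.standard_normal((n, n))
    return X @ X.T + n * np.eye(n)


def point_ordering(N, Q, s):
    """perm such that x_hat = x[perm] maps the ordering of (15) (random mode, node, stage)
    to the node-based ordering of (18) (node, random mode, stage)."""
    perm = np.empty(N * Q * s, dtype=int)
    for n in range(N):
        for q in range(Q):
            for j in range(s):
                perm[n * Q * s + q * s + j] = q * N * s + n * s + j
    return perm


rng = np.random.default_rng(1)
N, Q, s, Lstar, dt = 4, 3, 2, 3, 0.1
M = random_spd(N, rng)
K = [random_spd(N, rng) for _ in range(Lstar)]
C = [np.eye(Q)] + [rng.standard_normal((Q, Q)) for _ in range(Lstar - 1)]
C = [0.5 * (Ci + Ci.T) for Ci in C]          # symmetric, like the Galerkin matrices
A_irk = rng.standard_normal((s, s))

A15 = np.kron(np.kron(C[0], M), np.eye(s)) + dt * sum(
    np.kron(np.kron(Ci, Ki), A_irk) for Ci, Ki in zip(C, K))
A18 = np.kron(np.kron(M, C[0]), np.eye(s)) + dt * sum(
    np.kron(np.kron(Ki, Ci), A_irk) for Ci, Ki in zip(C, K))
perm = point_ordering(N, Q, s)
print(np.allclose(A15[np.ix_(perm, perm)], A18))
```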
Next, we present an AMG method to solve the stochastic finite element discretization (15) or (17).
We will also consider the case of a stationary, i.e. time-independent, problem. In that case, the
discretization reads
$$\Big(\sum_{i=1}^{L^*} C_i\otimes K_i\Big)u = \tilde b \quad\text{with}\quad u = \begin{bmatrix}u_1\\ \vdots\\ u_Q\end{bmatrix},\; u_q\in\mathbb{R}^N \text{ and } \tilde b = c\otimes b \tag{19}$$
The basis of the method is the classical multigrid iteration as shown in Algorithm 1. The algorithm
uses a hierarchy of $K$ levels, $k = 1, \ldots, K$, with $A_K u_K = b_K$ being the discretization of the
(stochastic) PDE on the given (fine) mesh. The recursion scheme is determined by a parameter
$\gamma$; for example, the case $\gamma = 1$ is called a V-cycle, the case $\gamma = 2$ a W-cycle. An AMG method
requires a setup phase to algebraically construct the restriction and prolongation operators, $R_k^{k-1}$
and $P_{k-1}^{k}$, $k = 2, \ldots, K$. The coarse level operators $A_{k-1}$, $k = 2, \ldots, K$, are assembled by using the
Galerkin principle [24], i.e. $A_k = R_{k+1}^{k}A_{k+1}P_{k}^{k+1}$. To construct an AMG method for stochastic
finite element and IRK discretizations, the AMG components are built so that all unknowns per
spatial node are updated together. A block smoother will be used, and the prolongation and restriction
operators will have a tensor structure.
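One consequence of a tensor structure can be checked directly. Assume, as an illustration in the unknown ordering of (19), that the prolongation is $I_Q\otimes P_d$ for some deterministic prolongation $P_d$ and that the restriction is its transpose. By the mixed-product rule of the Kronecker product, the Galerkin coarse operator then keeps the form of (19): $(I_Q\otimes P_d)^T\big(\sum_i C_i\otimes K_i\big)(I_Q\otimes P_d) = \sum_i C_i\otimes(P_d^TK_iP_d)$, so only coarse versions of the $K_i$ need to be formed. The check below uses random matrices, including a random stand-in for $P_d$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, Nc, Q, Lstar = 7, 3, 4, 3
K = [rng.standard_normal((N, N)) for _ in range(Lstar)]
K = [Ki + Ki.T for Ki in K]                   # symmetric test "stiffness" matrices
C = [np.eye(Q)] + [rng.standard_normal((Q, Q)) for _ in range(Lstar - 1)]
# Hypothetical deterministic prolongation (in AMG it would come from the setup phase).
P_d = rng.standard_normal((N, Nc))

A_fine = sum(np.kron(Ci, Ki) for Ci, Ki in zip(C, K))
P = np.kron(np.eye(Q), P_d)                   # same P_d for every random mode
A_coarse_galerkin = P.T @ A_fine @ P          # restriction R = P^T
A_coarse_structured = sum(np.kron(Ci, P_d.T @ Ki @ P_d) for Ci, Ki in zip(C, K))
print(np.allclose(A_coarse_galerkin, A_coarse_structured))
```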
with $x_{[n]} \in \mathbb{R}^{Qs}$ being the unknowns associated with node $n$. For stationary problems, the local
system simplifies to the $Q \times Q$ system:
$$\Big(\sum_{i=1}^{L^*} K_{i[n,n]}\,C_i\Big)u_{[n]} = b_{[n]} - \sum_{m\neq n}\sum_{i=1}^{L^*} K_{i[n,m]}\,C_i\,u_{[m]} \tag{21}$$
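The local solves (21) define a point-wise block Gauss–Seidel smoother. The dense NumPy sketch below stores the unknowns as an $N\times Q$ array whose row $n$ is $u_{[n]}$ and sweeps over the nodes; it is our illustration, on hypothetical toy data (the same 1-D Laplacian for $K_1$ and $K_2$, and $C_2$ of a second-order Hermite chaos in one variable scaled by 0.2), and it checks the converged result against a direct solve of (19).

```python
import numpy as np


def block_gauss_seidel(K, C, B, U, sweeps):
    """Point-wise block Gauss-Seidel for (sum_i C_i kron K_i) vec(U) = vec(B).

    Row n of the N x Q array U holds u_[n], the Q random unknowns of node n.
    Each update solves the Q x Q local system (21):
        (sum_i K_i[n, n] C_i) u_[n] = b_[n] - sum_{m != n} sum_i K_i[n, m] C_i u_[m],
    using the rows m < n that were already updated in the current sweep.
    """
    N = U.shape[0]
    for _ in range(sweeps):
        for n in range(N):
            block = sum(Ki[n, n] * Ci for Ki, Ci in zip(K, C))
            rhs = B[n].copy()
            for Ki, Ci in zip(K, C):
                off_diag = Ki[n] @ U - Ki[n, n] * U[n]   # sum over m != n of K_i[n, m] u_[m]
                rhs -= Ci @ off_diag
            U[n] = np.linalg.solve(block, rhs)
    return U


N, Q = 10, 3
K1 = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
C2 = 0.2 * np.array([[0.0, 1.0, 0.0],
                     [1.0, 0.0, np.sqrt(2.0)],
                     [0.0, np.sqrt(2.0), 0.0]])
K, C = [K1, K1], [np.eye(Q), C2]
c = np.array([1.0, 0.0, 0.0])          # right-hand side only in the mean mode
b = np.ones(N)
B = np.outer(b, c)                     # stacking the columns gives c kron b, as in (19)

U = block_gauss_seidel(K, C, B, np.zeros((N, Q)), sweeps=300)
A = sum(np.kron(Ci, Ki) for Ci, Ki in zip(C, K))     # explicit matrix, for checking only
u_direct = np.linalg.solve(A, np.kron(c, b))
print(np.max(np.abs(U.reshape(-1, order="F") - u_direct)))
```

Used as a standalone solver this smoother converges slowly (300 sweeps for only 10 nodes); within AMG only a few sweeps per level are applied.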
The block Gauss–Seidel iteration step can be expressed as a linear iteration based on a matrix
splitting of the stiffness matrices $K_i$, $K_i = K_i^+ + K_i^-$ ($i = 1, \ldots, L^*$), and the mass matrix $M$,
$M = M^+ + M^-$. Here, $K_i^+$ and $M^+$ are the lower triangular parts (including the diagonal) of $K_i$ and $M$, respectively. The
block Gauss–Seidel iteration in the $\nu$th iteration step can then be formulated as
$$\Big(C_1\otimes M^+\otimes I_s + \Delta t\sum_{i=1}^{L^*} C_i\otimes K_i^+\otimes A_{irk}\Big)x^{(\nu+1)} = \tilde b - \Big(C_1\otimes M^-\otimes I_s + \Delta t\sum_{i=1}^{L^*} C_i\otimes K_i^-\otimes A_{irk}\Big)x^{(\nu)} \tag{22}$$
Remark 3.1
Each block Gauss–Seidel iteration requires the solution of a block triangular system. The
triangular shape of these systems can be visualized by reordering the unknowns according to
Equation (18). The block Gauss–Seidel iteration (22) can then be formulated as
$$\begin{bmatrix} M_{[1,1]}I_{Qs} + \Delta t\sum_{i=1}^{L^*}K_{i[1,1]}C_i^{irk} & & 0\\ \vdots & \ddots & \\ M_{[N,1]}I_{Qs} + \Delta t\sum_{i=1}^{L^*}K_{i[N,1]}C_i^{irk} & \cdots & M_{[N,N]}I_{Qs} + \Delta t\sum_{i=1}^{L^*}K_{i[N,N]}C_i^{irk}\end{bmatrix}\hat x^{(\nu+1)} = \hat b_{GS}$$
with $C_i^{irk} = C_i\otimes A_{irk}$, $C_1$ replaced by $I_Q$, and $\hat b_{GS} = \hat b - \big(M^-\otimes I_{Qs} + \Delta t\sum_{i=1}^{L^*}K_i^-\otimes C_i^{irk}\big)\hat x^{(\nu)}$.
4. CONVERGENCE ANALYSIS
Using Fourier analysis [24, 25], valuable insight into the convergence behavior of geometric multigrid
methods can be obtained. A local Fourier analysis of geometric multigrid for stochastic,
stationary PDEs can be found in [16]. This analysis cannot be directly applied to AMG methods.
Instead, we follow the methodology of [10, 11].
Our analysis for stationary and time-dependent problems, detailed in Sections 4.1 and
4.2, respectively, is restricted to the case $L^* = 2$. This corresponds to a diffusion coefficient
discretized with one random variable; see also Equation (5). In Section 5, the extension to the
general case, $L^* > 2$, is discussed.
and $e^{(\nu)} = u_{exact} - u^{(\nu)}$ the error at iteration step $\nu$. The asymptotic convergence is characterized
by the spectral radius of the iteration operator $S$, denoted by $\rho(S)$. Assume that the random basis
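functions $\Psi_1, \ldots, \Psi_Q$ are normalized in such a way that $C_1$ equals the identity matrix $I_Q$. The
matrix $C_2$ is a real symmetric matrix (7)–(8) with eigenvalue decomposition $C_2 = V_{C_2}\Lambda_{C_2}V_{C_2}^T$.
Applying the similarity transform $V_{C_2}\otimes I_N$ to $S$ leads to
$$\sigma(S) = \bigcup_{q=1}^{Q}\sigma\big(-(K_1^+ + \lambda_qK_2^+)^{-1}(K_1^- + \lambda_qK_2^-)\big) = \bigcup_{q=1}^{Q}\sigma\big(\hat S(\lambda_q)\big) \quad\text{with } \lambda_q\in\sigma(C_2)$$
$$\rho(S) = \max_{\lambda_q\in\sigma(C_2)}\rho\big(\hat S(\lambda_q)\big) \tag{27}$$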
To characterize the convergence properties of a two-level multigrid cycle, we define the matrix-valued
function $\hat T(r)$:
$$\hat T(r) = \big(\hat S(r)\big)^{s_2}\Big(I_N - P_d\big(P_d^T(K_1 + rK_2)P_d\big)^{-1}P_d^T(K_1 + rK_2)\Big)\big(\hat S(r)\big)^{s_1}$$
with $\hat S(r)$ defined by (26), and $s_1$ and $s_2$ the numbers of pre- and postsmoothing iterations.
An analogous derivation shows that the asymptotic convergence factor of the two-level
cycle can be determined from the spectral radius of the corresponding iteration matrix $T$ as
$$\rho(T) = \max_{\lambda_q\in\sigma(C_2)}\rho\big(\hat T(\lambda_q)\big) \tag{28}$$
Formulas (27) and (28) allow the following intuitive interpretation. The convergence for the
stationary stochastic finite element discretization with $L^* = 2$ equals the worst convergence of
the corresponding Gauss–Seidel or multigrid method, applied to a set of deterministic problems
of the form:
$$(K_1 + \lambda_qK_2)u = b \quad\text{with } \lambda_q\in\sigma(C_2) \tag{29}$$
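Formula (27) can be verified numerically on a small example. In the sketch below (our construction, not from the paper), $K_1$ is a 1-D linear finite element Laplacian, $K_2$ is a hypothetical stiffness matrix with element coefficients between 0.2 and 1, scaled by 0.3, and $C_2$ is the $3\times 3$ matrix of a second-order Hermite chaos in one variable, with eigenvalues $0, \pm\sqrt3$. Since $1 - 0.3\sqrt3 > 0$, every $K_1 + \lambda_qK_2$ stays symmetric positive definite. The spectral radius of the coupled block Gauss–Seidel iteration matrix is compared with the maximum of the deterministic Gauss–Seidel spectral radii.

```python
import numpy as np


def stiffness_1d(a):
    """1-D linear FE stiffness matrix on interior nodes (Dirichlet ends, unit mesh width)
    for element coefficients a[0..N]; returns an N x N tridiagonal matrix."""
    N = len(a) - 1
    K = np.zeros((N, N))
    for n in range(N):
        K[n, n] = a[n] + a[n + 1]
        if n + 1 < N:
            K[n, n + 1] = K[n + 1, n] = -a[n + 1]
    return K


def gs_matrix(K_plus, K_minus):
    """Gauss-Seidel iteration matrix -(K^+)^{-1} K^- for the splitting K = K^+ + K^-."""
    return -np.linalg.solve(K_plus, K_minus)


def spectral_radius(S):
    return np.max(np.abs(np.linalg.eigvals(S)))


N, Q = 12, 3
K1 = stiffness_1d(np.ones(N + 1))                        # mean coefficient 1
K2 = 0.3 * stiffness_1d(np.linspace(0.2, 1.0, N + 1))    # hypothetical random mode
C2 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, np.sqrt(2.0)],
               [0.0, np.sqrt(2.0), 0.0]])

# K_i^+ = lower triangle including the diagonal, K_i^- = strictly upper triangle.
K1p, K1m = np.tril(K1), np.triu(K1, 1)
K2p, K2m = np.tril(K2), np.triu(K2, 1)

# Block Gauss-Seidel on the coupled stationary system with C_1 = I_Q.
S = gs_matrix(np.kron(np.eye(Q), K1p) + np.kron(C2, K2p),
              np.kron(np.eye(Q), K1m) + np.kron(C2, K2m))
rho_coupled = spectral_radius(S)

# Formula (27): worst case over the deterministic problems (29).
rho_decoupled = max(spectral_radius(gs_matrix(K1p + lam * K2p, K1m + lam * K2m))
                    for lam in np.linalg.eigvalsh(C2))
print(rho_coupled, rho_decoupled)
```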
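with corresponding iteration matrix denoted as $S$. This matrix can be decoupled by applying
the similarity transform $V_{C_2}\otimes I_N\otimes V_{irk}$, with $V_{irk}$ resulting from the eigenvalue decomposition
$A_{irk} = V_{irk}\Lambda_{irk}V_{irk}^{-1}$ and $V_{C_2}$ from $C_2 = V_{C_2}\Lambda_{C_2}V_{C_2}^T$. This enables one to express the spectrum of $S$ as
$$\sigma(S) = \bigcup_{r=1}^{s}\bigcup_{q=1}^{Q}\sigma\big(\hat S(\lambda_q, \Delta t\,\lambda_r^{irk})\big), \quad \lambda_r^{irk}\in\sigma(A_{irk}),\ \lambda_q\in\sigma(C_2)$$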
The analysis of the two-level multigrid cycle proceeds in a similar way. It is based on the
matrix-valued function $\hat T(r, z)$ defined as
$$\hat T(r,z) = \big(\hat S(r,z)\big)^{s_2}\Big(I_N - P_d\big(P_d^T(M + zK_1 + zrK_2)P_d\big)^{-1}P_d^T(M + zK_1 + zrK_2)\Big)\big(\hat S(r,z)\big)^{s_1}$$
with $\hat S(r, z)$ given by (31), and $s_1$ and $s_2$ the numbers of pre- and postsmoothing steps. Using
this matrix function, the asymptotic convergence factor of the two-level multigrid cycle becomes
$$\rho(T) = \max_{\lambda^{irk}\in\sigma(A_{irk})}\ \max_{\lambda_q\in\sigma(C_2)}\rho\big(\hat T(\lambda_q, \Delta t\,\lambda^{irk})\big) \tag{33}$$
As in the stationary case, this value corresponds to the worst-case asymptotic convergence factor
of multigrid applied to the set of deterministic problems:
$$(M + \Delta t\lambda^{irk}K_1 + \Delta t\lambda^{irk}\lambda_qK_2)x = b \quad\text{with } \lambda_q\in\sigma(C_2),\ \lambda^{irk}\in\sigma(A_{irk}) \tag{34}$$
These deterministic systems can be derived from backward Euler discretizations, with scaled time
step $\Delta t\lambda^{irk}$, of the ODE systems
$$M\frac{dx}{dt} + (K_1 + \lambda_qK_2)x = b \tag{35}$$
4.3. General discretizations with $L^* > 2$
The case $L^* = 2$ enables a decoupling of the stochastic and spatial dimensions, using a similarity
transform based on $C_2$. Hence, the analysis can be reduced to the analysis of smaller problems of
the form (29) and (34). For these problems, local Fourier analysis [24, 26] allows one to derive sharp
convergence factors, at least for the geometric multigrid variant on regular meshes.
In general, no decoupling between the spatial and random parts of the discretization is possible,
since the matrices $C_i$ cannot be diagonalized simultaneously. An exception occurs when
double orthogonal polynomials are used as basis functions: then all matrices $C_i$ are diagonal;
see (2) and (7). Denoting the double orthogonal random basis by $\bar\Psi$ and the corresponding matrices
$C_i$ by $G_i$, we can determine the spectral radius of the two-level multigrid iteration matrix $T$ as
$$\rho(T) = \max_{\lambda_1\in\sigma(G_1)}\cdots\max_{\lambda_{L^*}\in\sigma(G_{L^*})}\rho\big(\hat T(\lambda_1,\ldots,\lambda_{L^*})\big)$$
$$\cdots\,\big(\hat S(r_1,\ldots,r_{L^*})\big)^{s_1}$$
and the matrix-valued function $\hat S(r_1,\ldots,r_{L^*}) = -\big(\sum_{i=1}^{L^*}r_iK_i^+\big)^{-1}\big(\sum_{i=1}^{L^*}r_iK_i^-\big)$. In Section 5 we will
point out a relation between the eigenvalues of the matrices $C_i$ and the diagonal elements of the
matrices $G_i$. On the basis of that relation, we can show that the AMG convergence properties in
the case of a double orthogonal random basis are similar to those of the case $L^* = 2$ treated in the previous
section. Moreover, even when the double orthogonal basis is not used, the analysis
of the case $L^* = 2$ is likely to provide valuable insight into the general case $L^* > 2$. The first two terms
of system (19), i.e. $C_1\otimes K_1 + C_2\otimes K_2$, represent the mean behavior of the stochastic PDE and the
main stochastic variation. This follows from the stochastic discretization of the random coefficient
as a truncated series of terms of decreasing importance; see Section 2.2. The sum involving
the matrices $C_3, \ldots, C_{L^*}$ can be seen as a perturbation of the system matrix. A more thorough
(geometric) multigrid analysis for the general stationary case can be found in [16].
The convergence analysis of the previous section shows that both the matrices K 1 and K 2 as well
as the eigenvalues of C2 and Airk determine the AMG convergence, see Equations (28) and (33). In
this section we discuss the AMG convergence behavior with respect to the stochastic discretization.
The conclusions agree with the properties of the geometric multigrid variant, as observed in [15]
and theoretically analyzed in [16, 27]. The AMG convergence behavior with respect to the IRK
discretization, i.e. the influence of the eigenvalues of Airk , is detailed in [11].
Thus, the eigenvalues of $\bar C_2$ correspond to the diagonal entries of the diagonal matrix $G_2$. It can
be shown that these values coincide with the roots of univariate orthogonal polynomials from the
Askey scheme, as explained in [16]. Moreover, as the matrix $C_2$ is a principal submatrix of $\bar C_2$,
the eigenvalues of $\bar C_2$ provide upper and lower bounds for the eigenvalues of $C_2$ (Cauchy interlacing). This allows
one to bound the eigenvalues of $C_2$ by the roots of certain univariate orthogonal
polynomials.
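The interlacing argument can be illustrated for a Hermite chaos, assuming $C_2 = \langle\xi_1\Psi\Psi^T\rangle$ for a KL input and building it from the three-term recurrence of the normalized Hermite polynomials. With the tensor-type basis ($m_i \le P$ for every $i$), this matrix has exactly the roots of $He_{P+1}$ as eigenvalues; the total-degree matrix is a principal submatrix of it, so its eigenvalues lie between the smallest and largest of these roots. Because the largest root grows with $P$, this is consistent with the widening eigenvalue range shown in Figure 1(a). The sketch and its function name are ours.

```python
import numpy as np
from itertools import product
from numpy.polynomial.hermite_e import hermeroots


def hermite_C2(L, P, total_degree=True):
    """C_2 = <xi_1 Psi Psi^T> for an orthonormal Hermite basis in L variables, with
    total degree <= P (first criterion) or every degree <= P (second criterion)."""
    idx = [m for m in product(range(P + 1), repeat=L) if not total_degree or sum(m) <= P]
    C2 = np.zeros((len(idx), len(idx)))
    for a, alpha in enumerate(idx):
        for b, beta in enumerate(idx):
            if alpha[1:] == beta[1:] and abs(alpha[0] - beta[0]) == 1:
                C2[a, b] = np.sqrt(max(alpha[0], beta[0]))
    return C2


P = 3
roots = np.sort(hermeroots([0] * (P + 1) + [1]))     # roots of He_{P+1}
eig_tensor = np.linalg.eigvalsh(hermite_C2(L=3, P=P, total_degree=False))
eig_total = np.linalg.eigvalsh(hermite_C2(L=3, P=P))
print(roots)
print(eig_total.min(), eig_total.max())
```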
[Figure 1: (a) eigenvalues versus Hermite chaos order; (b) eigenvalues versus number of random dimensions.]
Figure 1. Effect of the polynomial chaos order and the number of random variables on the eigenvalues
of C2 (7) in case of a Hermite polynomial chaos: (a) fixed number of random variables and increasing
P, L = 4 and (b) fixed order and increasing value of L, P = 2.
The stiffness matrix $K_1$ corresponds to a discrete Laplace operator, while the second stiffness
matrix $K_2$ contains only contributions from $\partial^2u/\partial y^2$. The AMG convergence rate equals the worst
multigrid convergence for deterministic problems of the form:
2
* u √ *2 u
− +(1+q ) 2 u = b with q ∈ (C2 ), C2 = 1 T
*x 2 *y
Increasing the polynomial order broadens the range of the eigenvalues $\lambda_q$ and consequently increases the anisotropy
of the problem. This results in slower AMG convergence. Eventually, for a sufficiently large
order, the problem loses ellipticity and AMG diverges.
6. IMPLEMENTATION ASPECTS
The effectiveness of an AMG method depends strongly on the efficiency of its implementation. In
this section we point out some implementation issues that help to reduce the computation time
and memory usage.
with the unknowns $u$ and $x$ being collected in the multivectors $U \in \mathbb{R}^{N\times Q}$ and $X \in \mathbb{R}^{N\times Qs}$. Note
that the $N$ rows of $X$ equal the $N$ blocks of the unknown vector $\hat x$ in Equation (18). This matrix
representation gives easy access to all the unknowns per nodal point: they correspond to a
row of the matrix $U$ or $X$. Such access is frequently needed for the block smoothing operator, the
matrix–vector multiplication in the residual computation, and the block restriction and prolongation
operators. Note also that storing these multivectors in a row-by-row storage format enables a cache-efficient
implementation: with one memory access, a whole set of values that will be used in the
subsequent operations can be retrieved.
Obviously, the entire system of dimension $NQ\times NQ$ (in the stationary case) or $NQs\times NQs$ (in
the time-dependent case) is never stored or constructed explicitly. Only one mass
matrix $M$, $L^*$ stiffness matrices $K_i$ and $L^*$ matrices $C_i$ need to be stored. These matrices can be stored
in a sparse matrix format. In general, all stiffness matrices $K_i$ have the same sparsity structure;
hence, the description of this structure has to be stored only once.
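With the multivector $U$ (column $q$ holding $u_q$, so that stacking the columns gives $u$ as in (19)), the operator of (19) can be applied without forming any Kronecker product, using the identity $(C_i\otimes K_i)\,\mathrm{vec}(U) = \mathrm{vec}(K_iUC_i^T)$. The dense NumPy sketch below is our illustration; the function name is hypothetical, and in practice the $K_i$ would be sparse matrices. NumPy's default row-major layout stores each row of $U$, i.e. all random unknowns of one node, contiguously, in line with the row-by-row storage discussed above. The explicit Kronecker matrix is built only to check the result.

```python
import numpy as np


def stochastic_matvec(K, C, U):
    """Apply sum_i C_i kron K_i to vec(U) without forming the Kronecker products.

    U is the N x Q multivector (row n: the Q random unknowns of node n); the result
    is returned in the same N x Q layout, using (C kron K) vec(U) = vec(K U C^T).
    """
    return sum(Ki @ U @ Ci.T for Ki, Ci in zip(K, C))


rng = np.random.default_rng(3)
N, Q, Lstar = 6, 4, 3
K = [rng.standard_normal((N, N)) for _ in range(Lstar)]
C = [rng.standard_normal((Q, Q)) for _ in range(Lstar)]
U = rng.standard_normal((N, Q))
V = stochastic_matvec(K, C, U)
A = sum(np.kron(Ci, Ki) for Ci, Ki in zip(C, K))     # NQ x NQ, for checking only
print(np.allclose(V.reshape(-1, order="F"), A @ U.reshape(-1, order="F")))
```

The cost is that of $L^*$ sparse products with $K_i$ plus $L^*$ small dense products with $C_i$, instead of storing an $NQ\times NQ$ matrix.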
[Figure 2: average solution time (sec.), panels (a) and (b).]
Figure 2. Average computation time to solve one local system (21) or (20) in case of the model problem
(1) with $\kappa(x, t, \omega)$ modelled as a Gaussian random field $\kappa(x, \omega)$ with an exponential covariance function:
(a) stationary problem and (b) time-dependent problem. A Hermite chaos random discretization
and a Radau IIA IRK method are used.
sparse, see [31]. As a consequence, the local system solves are more time consuming. The corre-
sponding computation times for different solution methods follow, however, the same pattern as in
Figure 2.
7. NUMERICAL RESULTS
In this section we present some numerical results obtained with the AMG method. First, we
investigate the AMG convergence with respect to several discretization parameters for the stationary
diffusion equation. The tests use a square spatial domain, $D = [0, 1]^2$, and piecewise linear triangular
finite elements. We consider homogeneous Dirichlet boundary conditions, and the source term
$b(x, t)$ is set to zero. The AMG prolongation operators are built with classical Ruge–Stüben AMG
[32]. The stopping criterion for the AMG method is a residual norm smaller than $10^{-10}$. A random
initial approximation to the solution was used. We consider several configurations for the random
input $\kappa(x, t, \omega)$. In case of a random field, $\kappa(x, \omega)$, the stochastic diffusion coefficient depends
on the spatial position, e.g. representing a heterogeneous material. In case of a random process,
$\kappa(t, \omega)$, the stochastic diffusion coefficient remains the same at all spatial points but evolves in
Table III. Number of iterations required to solve the steady-state diffusion equation corresponding to (1)
with W(2,1)-cycles, using AMG as standalone solver, or as preconditioner for CG (between brackets).

Spatial nodes (Q = 21, P = 2, L = 5)
          N = 10 177   N = 50 499   N = 113 981   N = 257 488   N = 356 806
κ_g       31 (15)      35 (16)      36 (17)       36 (17)       37 (17)
κ_u       31 (15)      34 (16)      36 (17)       36 (17)       37 (17)
κ_ln      32 (16)      37 (17)      39 (18)       38 (17)       39 (18)

Random variables (N = 20 611, P = 2)
          L = 1        L = 5        L = 10        L = 15        L = 20
          Q = 3        Q = 21       Q = 66        Q = 136       Q = 231
κ_g       32 (15)      34 (16)      34 (16)       35 (16)       35 (16)
κ_u       32 (15)      33 (16)      34 (16)       35 (16)       35 (16)
κ_ln      35 (16)      36 (17)      36 (17)       36 (17)       37 (17)

Chaos order (N = 20 611, L = 5)
          P = 1        P = 2        P = 3         P = 4         P = 5
          Q = 6        Q = 21       Q = 56        Q = 126       Q = 252
κ_g       33 (15)      34 (16)      34 (16)       35 (16)       36 (17)
κ_u       33 (15)      33 (16)      34 (16)       35 (16)       35 (17)
κ_ln      34 (16)      36 (17)      37 (17)       37 (17)       38 (18)
time. For each case, Table II indicates which expansion is used to construct the random input
and what type of random variables are present in that expansion. In case of a KL expansion, an
exponential covariance function is assumed, $C_\kappa(x, x') = \exp(-|x - x'|/l_c)$, with variance 0.1
and correlation length $l_c = 1$. In case of the lognormal random field $\kappa_{ln}$, the variance of the
underlying Gaussian field equals 0.3. For each configuration of $\kappa$, the mean value of the
random input equals the constant function 1. When the stochastic discretization is based
on uniformly distributed random variables, a Legendre polynomial chaos is used; in the case of
standard normally distributed random variables, a Hermite chaos is used. Next, the AMG performance will
be illustrated for a more complex test problem.
Figure 3. Total solution time when solving the steady-state problem with $\kappa_u(x, \omega)$, $L = 5$ and (2,1)-cycles
of AMG iterations: (a) a second-order Legendre chaos is used, resulting in $Q = 21$, and (b) the discretization
is based on a first- up to fifth-order Legendre chaos and a mesh with 20 611 nodes.
Table IV. The number of iterations required to solve problem (38) with W(2,1)-cycles, using AMG as
standalone solver, or as preconditioner for CG (between brackets), until residual < $10^{-10}$.

Polynomial chaos order    P = 1     P = 2     P = 3     P = 4     P = 5
κ_g (Hermite chaos)       24 (16)   25 (17)   27 (19)   35 (22)   86 (61)
κ_u (Legendre chaos)      24 (16)   24 (16)   25 (17)   25 (17)   25 (17)
κ_ln (Hermite chaos)      43 (25)   48 (28)   58 (33)   72 (40)   97 (53)

The finite element mesh consists of N = 20 611 nodes, and five random variables are used to discretize the
random space.
Table V. The number of iterations required to solve the time-dependent problem (1) with V(2,1)-cycles,
using AMG as standalone solver, or as preconditioner for BiCGStab (between brackets).

Time discretization order   1         3         5         7         9         11
IRK stages                  s = 1     s = 2     s = 3     s = 4     s = 5     s = 6
κ_g                         41 (18)   33 (19)   29 (19)   27 (20)   27 (18)   27 (19)
κ_t                         39 (18)   32 (19)   28 (19)   27 (18)   27 (19)   27 (19)

The discretization is based on a finite element mesh with 20 611 nodes, a second-order Hermite chaos with
L = 5, corresponding to Q = 21, and a Radau IIA implicit Runge–Kutta scheme with Δt = 0.01.
iterations is independent of the stochastic and spatial discretization when applied to our model
problem. For a Hermite chaos, the independence of the polynomial chaos order is maintained for low to
moderate chaos orders. Applying Krylov acceleration results in more robust
convergence and reduced computing times.
The computation times for the calculations in Table III are presented graphically in Figure 3.
The total AMG solution time is shown as a function of the number of spatial nodes and of the
number of random unknowns. For this problem, the matrices Ci are defined as in Equation (7). By
increasing the number of spatial nodes, the number of local solves in the block smoother increases
proportionally. In addition, extra coarse levels may be introduced, so that the total increase in
computation time is no longer linear. The results in the right panel were obtained by increasing
the polynomial chaos order while keeping the number of random variables $L$ constant. Thus,
only the dimension of the matrices $C_i$ increases; the number of stiffness matrices $L^*$ remains the
same. This mainly affects the cost of the block solves in the smoother. With CG, the number of
iterations required is roughly proportional to the square root of the condition number of (21). In practice,
this condition number is close to 1, so that the number of CG iterations is more or less independent
of the dimension of the systems. The cost of each CG iteration depends on the sparsity of the
matrix, whose number of nonzeros is of the order $O(Q)$ when $C_i$ is defined by (7). This results in a cost $O(Q)$ to solve
one block system in the smoother. The linear increase of the computation time as a function of the
number of random unknowns $Q$ is clearly observed in Figure 3. If the number of stiffness matrices
is also increased, then the total computing time tends to grow faster than linearly. Higher computing
times are also observed in the case of a polynomial chaos expansion of the random input, as in the
lognormal field example. This is caused by the larger number of stiffness matrices, $Q$ instead
of $L + 1$, and by the decreased sparsity of the matrices $C_i$ (8).
As discussed in Section 5, the convergence analysis indicates that the convergence of AMG is asymptotically independent of the polynomial chaos order in the case of a Legendre chaos, but not in the case of a Hermite chaos. For model problem (1), only a large polynomial chaos order has an impact on the multigrid convergence. For some problems, however, even small values of the polynomial chaos order affect the AMG convergence rate. This is illustrated by the problem

$$\frac{\partial^2 u(x,\xi)}{\partial x^2} + \kappa(x,\xi)\,\frac{\partial^2 u(x,\xi)}{\partial y^2} = 0 \qquad (38)$$

which is discretized similarly to our model problem. Table IV shows the AMG convergence for increasing values of the polynomial chaos order. In the case of a Hermite chaos, the AMG convergence deteriorates. As expected, the number of iterations remains unchanged in the case of a Legendre chaos.
Figure 5. Mean and variance of the solution of Equation (39). The configuration of Figure 4 is used with
a three-stage Radau IIA IRK discretization and time step 0.05. The stochastic discretization is based on
a second-order Legendre chaos. The electric potential is zero initially.
corresponding to the different material regions of the cable. The stochastic PDE models the effect
of deviations in permittivity on the resulting electric potential as a function of space and time.
Figure 5 shows the mean value and the variance of the electric potential at several instants in
Figure 6. Residual norms as a function of the number of iterations when solving Equation (39) on
the domain represented by Figure 4, discretized with 166 245 nodes. A three-stage Radau IIA IRK
discretization with time step 0.05 is used together with a second-order Legendre chaos resulting in a total
of 1.7×10^6 unknowns in the steady-state case and 5.0×10^6 in the time-dependent case: (a) Steady-state
problem, F(2, 1) cycles and (b) transient problem, V(2, 1) cycles.
time. Applying AMG results in convergence properties similar to those described above. An illustration of the convergence history as a function of the iteration index is given in Figure 6. Observe that GMRESR [33] converges more robustly than BiCGStab, as is typically also the case for classical deterministic PDEs. To limit the memory requirements of GMRESR, the method is restarted every five iterations.
8. CONCLUSIONS
We have constructed and analyzed an AMG method for stochastic finite element discretizations of
time-dependent stochastic PDEs. This work extends previous research on multigrid for stochastic
finite element problems [16, 27] towards unstructured finite element meshes and high-order time
discretizations. The presented AMG method has very favorable convergence properties with respect
to the spatial, random and time discretization.
To solve real engineering stochastic PDEs by the stochastic finite element method, however,
further research is necessary. By using the knowledge of the stochastic discretization, the AMG
components may be enhanced and optimized.
The Hermite chaos of order P defined over L standard normal variables $\xi_1, \ldots, \xi_L$ is constructed as a set of Q multivariate Hermite polynomials $\Psi_q$, each defined as [34]

$$\Psi_q = \prod_{i=1}^{L} \frac{1}{\sqrt{\alpha_{q,i}!}}\, H_{\alpha_{q,i}}(\xi_i)$$
with $H_n(z)$ being a one-dimensional Hermite polynomial of order n and $\alpha_q = (\alpha_{q,1}, \ldots, \alpha_{q,L})$ a set of non-negative integers with only a finite number of nonzeros and $\sum_{i=1}^{L} \alpha_{q,i} \le P$. The factor $1/\sqrt{\alpha_{q,i}!}$ guarantees the normalization of the multivariate polynomials. The one-dimensional polynomials $H_n(z)$ are recursively defined as [35]

$$H_0(z) = 1, \qquad H_1(z) = z, \qquad H_{n+1}(z) = z\,H_n(z) - n\,H_{n-1}(z)$$
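The recurrence above generates the probabilists' Hermite polynomials, which are orthogonal with respect to the standard normal density and satisfy E[H_n(ξ)^2] = n!; this is why dividing by the square root of α_{q,i}! normalizes each factor of Ψ_q. A small pure-Python sketch (integer coefficient lists, lowest degree first; the helper names are illustrative) that builds H_n from the recurrence and checks the normalization using the Gaussian moments E[ξ^p] = (p−1)!! for even p:

```python
from math import factorial


def hermite_coeffs(n):
    """Coefficients (lowest degree first) of H_n from
    H_0 = 1, H_1 = z, H_{m+1} = z H_m - m H_{m-1}."""
    prev, cur = [1], [0, 1]              # H_0, H_1
    if n == 0:
        return prev
    for m in range(1, n):
        nxt = [0] + cur                  # z * H_m: shift every coefficient up one degree
        for i, c in enumerate(prev):     # subtract m * H_{m-1}
            nxt[i] -= m * c
        prev, cur = cur, nxt
    return cur


def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def gaussian_moment(p):
    """E[z^p] for z ~ N(0, 1): zero for odd p, (p - 1)!! for even p."""
    if p % 2:
        return 0
    result = 1
    for k in range(p - 1, 0, -2):
        result *= k
    return result


def expectation(coeffs):
    """E[f(z)] for the polynomial f with the given coefficients."""
    return sum(c * gaussian_moment(p) for p, c in enumerate(coeffs))


print(hermite_coeffs(3))                 # [0, -3, 0, 1], i.e. z^3 - 3z
# Normalization check behind the 1/sqrt(alpha!) factor: E[H_n^2] = n!
print(all(expectation(poly_mul(hermite_coeffs(n), hermite_coeffs(n))) == factorial(n)
          for n in range(6)))            # True
```

Exact integer arithmetic is used on purpose, so the orthogonality and normalization checks hold exactly rather than up to rounding.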
Based on the properties of Hermite polynomials [28, p. 390], the inner product of three multivariate
Hermite polynomials can be calculated as
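$$\langle \Psi_i \Psi_j \Psi_k \rangle = \prod_{m=1}^{L} \frac{\sqrt{\alpha_{i,m}!\,\alpha_{j,m}!\,\alpha_{k,m}!}}{(s_m - \alpha_{i,m})!\,(s_m - \alpha_{j,m})!\,(s_m - \alpha_{k,m})!} \qquad (A1)$$

where $s_m = (\alpha_{i,m} + \alpha_{j,m} + \alpha_{k,m})/2$; a factor is zero whenever $\alpha_{i,m} + \alpha_{j,m} + \alpha_{k,m}$ is odd or $s_m$ is smaller than one of the three indices.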
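Formula (A1) reduces each one-dimensional factor to factorials of the indices. The sketch below evaluates it, assuming the normalized probabilists' Hermite polynomials of this Appendix and the selection rule stated with (A1), and cross-checks every one-dimensional factor against a brute-force expectation computed from Gaussian moments. Function names are hypothetical.

```python
from math import factorial, prod, sqrt


def triple_1d(i, j, k):
    """E[h_i h_j h_k] for normalized 1-D Hermite polynomials h_n = H_n / sqrt(n!)."""
    total = i + j + k
    s = total // 2
    if total % 2 or s < max(i, j, k):
        return 0.0
    return sqrt(factorial(i) * factorial(j) * factorial(k)) / (
        factorial(s - i) * factorial(s - j) * factorial(s - k))


def triple_product(a_i, a_j, a_k):
    """<Psi_i Psi_j Psi_k> for multi-indices a_i, a_j, a_k: product of 1-D factors (A1)."""
    return prod(triple_1d(p, q, r) for p, q, r in zip(a_i, a_j, a_k))


# --- brute-force check of the 1-D factor ---
def hermite_coeffs(n):
    """H_n from the recurrence H_{m+1} = z H_m - m H_{m-1} (lowest degree first)."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for m in range(1, n):
        nxt = [0] + cur
        for idx, c in enumerate(prev):
            nxt[idx] -= m * c
        prev, cur = cur, nxt
    return cur


def expectation_of_product(*degrees):
    """E[H_{d1}(z) H_{d2}(z) ...] for z ~ N(0, 1), using E[z^p] = (p - 1)!! for even p."""
    coeffs = [1]
    for n in degrees:                    # multiply the polynomials together
        h = hermite_coeffs(n)
        out = [0] * (len(coeffs) + len(h) - 1)
        for a, x in enumerate(coeffs):
            for b, y in enumerate(h):
                out[a + b] += x * y
        coeffs = out

    def moment(p):
        return 0 if p % 2 else prod(range(p - 1, 0, -2))

    return sum(c * moment(p) for p, c in enumerate(coeffs))


def triple_1d_brute(i, j, k):
    return expectation_of_product(i, j, k) / sqrt(factorial(i) * factorial(j) * factorial(k))


print(triple_1d(1, 1, 2))                        # 1.4142135623730951
print(triple_product((1, 0), (1, 0), (2, 0)))    # 1.4142135623730951
print(all(abs(triple_1d(i, j, k) - triple_1d_brute(i, j, k)) < 1e-9
          for i in range(5) for j in range(5) for k in range(5)))   # True
```

Because the multivariate polynomials are products over independent variables, the multivariate inner product is simply the product of the one-dimensional factors, which is what `triple_product` computes.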
ACKNOWLEDGEMENTS
This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and
Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State,
Science Policy Office. The scientific responsibility rests with its authors. This research was supported in
part by the Research Council K.U.Leuven, CoE EF/05/006 Optimization in Engineering (OPTEC).
REFERENCES
1. Karniadakis G, Su C-H, Xiu D, Lucor D, Schwab C, Todor R. Generalized polynomial chaos solution for
differential equations with random inputs. Research Report 2005-01, Seminar for Applied Mathematics, ETH
Zürich, January 2005.
2. Xiu D, Karniadakis G. Modeling uncertainty in steady state diffusion problems via generalized polynomial chaos.
Computer Methods in Applied Mechanics and Engineering 2002; 191:4927–4948.
3. Schuëller GI. A state-of-the-art report on computational stochastic mechanics. Probabilistic Engineering Mechanics
1997; 12(4):197–322.
4. Babuška I, Tempone R, Zouraris GE. Solving elliptic boundary value problems with uncertain coefficients by
the finite element method: the stochastic formulation. Computer Methods in Applied Mechanics and Engineering
2005; 194:1251–1294.
5. Babuška I, Chatzipantelidis P. On solving elliptic stochastic partial differential equations. Computer Methods in
Applied Mechanics and Engineering 2002; 191:4093–4122.
6. Shinozuka M, Deodatis G. Response variability of stochastic finite element systems. Journal of Engineering
Mechanics 1988; 114:499–519.
7. Ghanem R, Spanos P. Stochastic Finite Elements, a Spectral Approach. Dover: Mineola, NY, 2003.
8. Ghanem R, Spanos P. A spectral stochastic finite element formulation for reliability analysis. Journal of
Engineering Mechanics (ASCE) 1991; 117:2351–2372.
9. Hairer E, Wanner G. Solving Ordinary Differential Equations II: Stiff and Differential-algebraic Problems.
Springer: Berlin, Germany, 1991.
10. Van lent J, Vandewalle S. Multigrid methods for implicit Runge–Kutta and boundary value method discretizations
of parabolic pdes. SIAM Journal on Scientific Computing 2005; 27(1):67–92.
11. Boonen T, Van lent J, Vandewalle S. An algebraic multigrid method for high order time-discretization of the
div–grad and the curl–curl equations. Applied Numerical Mathematics 2007; in press.
12. Xiu D, Karniadakis G. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM Journal
on Scientific Computing 2002; 24(2):619–644.
13. Babuška I, Tempone R, Zouraris GE. Galerkin finite element approximations of stochastic elliptic partial differential
equations. SIAM Journal on Numerical Analysis 2004; 42:800–825.
14. Wan X, Karniadakis GE. An adaptive multi-element generalized polynomial chaos method for stochastic differential
equations. Journal of Computational Physics 2005; 209(2):617–642.
15. Le Maı̂tre O, Knio O, Debusschere B, Najm H, Ghanem R. A multigrid solver for two-dimensional stochastic
diffusion equations. Computer Methods in Applied Mechanics and Engineering 2003; 192:4723–4744.
16. Seynaeve B, Rosseel E, Nicolaı̈ B, Vandewalle S. Fourier mode analysis of multigrid methods for partial
differential equations with random coefficients. Journal of Computational Physics 2007; 224:132–149.
17. Ghanem R, Saad G, Doostan A. Efficient solution of stochastic systems: application to the embankment dam
problem. Structural Safety 2007; 29(3):238–251.
18. Xiu D, Karniadakis G. Modeling uncertainty in flow simulations via generalized polynomial chaos. Journal of
Computational Physics 2003; 187:137–167.
19. Loève M. Probability Theory. Springer: New York, U.S.A., 1977.
20. Ghanem R. The nonlinear Gaussian spectrum of log-normal stochastic processes and variables. Journal of Applied
Mechanics—Transactions of the ASME 1999; 66(4):964–973.
21. Phoon K, Huang S, Quek S. Simulation of second-order processes using Karhunen–Loève expansion. Computers
and Structures 2002; 80:1049–1060.
22. Sudret B, Der Kiureghian A. Stochastic finite elements and reliability: a state-of-the-art report. Technical Report
UCB/SEMM-2000/08, University of California, Berkeley, 2000.
23. Ruge JW, Stüben K. Algebraic multigrid. In Multigrid Methods, McCormick SF (ed.). Frontiers in Applied
Mathematics. SIAM: Philadelphia, U.S.A., 1987; 73–130.
24. Trottenberg U, Oosterlee C, Schüller A. Multigrid. Academic Press: San Diego, U.S.A., 2001.
25. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977; 31:
333–390.
26. Wienands R, Joppich W. Practical Fourier Analysis for Multigrid Methods. CRC Press: Boca Raton, FL, U.S.A.,
2005.
27. Elman H, Furnival D. Solving the stochastic steady-state diffusion problem using multigrid. IMA Journal of
Numerical Analysis 2007; 27(4):675–688.
28. Szegö G. Orthogonal Polynomials (4th edn). American Mathematical Society: Providence, U.S.A., 1967.
29. Davis TA. Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern multifrontal method. ACM Transactions
on Mathematical Software 2004; 30(2):196–199.
30. Demmel JW, Eisenstat SC, Gilbert JR, Li XS, Liu JWH. A supernodal approach to sparse partial pivoting. SIAM
Journal on Matrix Analysis and Applications 1999; 20(3):720–755.
31. Eiermann M, Ernst OG, Ullmann E. Computational aspects of the stochastic finite element method. Computing
and Visualization in Science 2007; 10(1):3–15.
32. Stüben K. A review of algebraic multigrid. Journal of Computational and Applied Mathematics 2001; 128:
281–309.
33. Van der Vorst H, Vuik C. GMRESR: a family of nested GMRES methods. Numerical Linear Algebra with
Applications 1994; 1(4):369–386.
34. Matthies H, Keese A. Galerkin methods for linear and nonlinear elliptic stochastic partial differential equations.
Computer Methods in Applied Mechanics and Engineering 2005; 194:1295–1331.
35. Soize C, Ghanem R. Physical systems with random uncertainties: chaos representations with arbitrary probability
measure. SIAM Journal on Scientific Computing 2004; 26(2):395–410.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:165–185
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.577
SUMMARY
In this paper we describe an aggregation-based algebraic multigrid method for the solution of discrete
k-form Laplacians. Our work generalizes Reitzinger and Schöberl’s algorithm to higher-dimensional
discrete forms. We provide conditions on the tentative prolongators under which the commutativity of
the coarse and fine de Rham complexes is maintained. Further, a practical algorithm that satisfies these
conditions is outlined, and smoothed prolongation operators and the associated finite element spaces are
highlighted. Numerical evidence of the efficiency and generality of the proposed method is presented in
the context of discrete Hodge decompositions. Copyright © 2008 John Wiley & Sons, Ltd.
KEY WORDS: algebraic multigrid; Hodge decomposition; discrete forms; mimetic methods; Whitney
forms
1. INTRODUCTION
Discrete differential k-forms arise in scientific disciplines ranging from computational electro-
magnetics to computer graphics. Examples include stable discretizations of the eddy-current
problem [1–3], topological methods for sensor network coverage [4], visualization of complex
flows [5, 6], and the design of vector fields on meshes [7].
In this paper we consider solving problems of the form
$$\delta d\,\alpha^k = \beta^k \qquad (1)$$

where d denotes the exterior derivative and δ the codifferential relating k-forms α and β. For k = 0, 1, 2, δd is also expressed as ∇·∇, ∇×∇×, and ∇∇·, respectively. We refer to operator δd generically as a Laplacian, although it does not correspond to the Laplace–de Rham operator Δ = δd + dδ except for the case k = 0. We assume that (1) is discretized with mimetic first-order elements
∗ Correspondence to: Nathan Bell, Siebel Center for Computer Science, University of Illinois at Urbana-Champaign,
201 North Goodwin Avenue, Urbana, IL 61801, U.S.A.
† E-mail: wnbell@uiuc.edu, wnbell@gmail.com
such as Whitney forms [8, 9] on simplicial meshes or the analog on hexahedral [10] or polyhedral
elements [11]. In general, we use $I_k$ to denote the map from discrete k-forms (cochains) to their respective finite elements. Such discretizations give rise to a discrete exterior k-form derivative $D_k$ and discrete k-form innerproduct $M_k(i, j) = \langle I_k e_i, I_k e_j \rangle$, which allows implementation of (1)
in weak form as
under the additional assumption that d commutes with I , i.e. Ik+1 Dk = dk Ik . This relationship is
depicted as
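$$\begin{array}{ccc}
\Omega^k & \xrightarrow{\;d_k\;} & \Omega^{k+1} \\
\uparrow{\scriptstyle I_k} & & \uparrow{\scriptstyle I_{k+1}} \\
\Omega^k_d & \xrightarrow{\;D_k\;} & \Omega^{k+1}_d
\end{array} \qquad (3)$$

where $\Omega^k$ and $\Omega^k_d$ denote the spaces of differential k-forms and discrete k-forms, respectively. For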
the remainder of the paper, we restrict our attention to solving (2) on structured or unstructured
meshes of arbitrary dimension and element type, provided the elements satisfy the aforementioned
commutativity property.
Figure 1. Enumeration of nodes (left), oriented edges (center), and oriented triangles (right) for a simple
triangle mesh. We say that vertices 2 and 3 are upper adjacent since they are joined by edge 4. Similarly,
edges 5 and 6 are both faces of triangle 2 and therefore upper adjacent.
Copyright q 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:165–185
DOI: 10.1002/nla
AMG FOR k-FORM LAPLACIANS 167
1.1. Example
Although our results hold more generally, it is instructive to examine a concrete example that
satisfies the assumptions set out in Section 1. To this end, consider the three-element simplicial
mesh depicted in Figure 1, with the enumeration and orientation of vertices, edges, and triangles
as shown. In this example, we choose Whitney forms [8] to define the interpolation operators
I0 , I1 , I2 which in turn determine the discrete innerproducts M0 , M1 , M2 . Finally, sparse matrices
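$$D_{-1} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad
D_0 = \begin{bmatrix}
-1 & 1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & 0 \\
0 & -1 & 1 & 0 & 0 \\
0 & -1 & 0 & 1 & 0 \\
0 & 0 & -1 & 1 & 0 \\
0 & 0 & -1 & 0 & 1 \\
0 & 0 & 0 & -1 & 1
\end{bmatrix} \qquad (4)$$

$$D_1 = \begin{bmatrix}
1 & -1 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & -1 & 1 & -1
\end{bmatrix}, \qquad D_2 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \qquad (5)$$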
implement the discrete k-form derivative operators. A discrete k-form (cochain) is represented by a column vector with entries corresponding to each of the k-simplices in the mesh. For example, the Whitney-interpolated fields corresponding to the 0-form $[0, 1, 2, 1, 2]^T$, its gradient $D_0 [0, 1, 2, 1, 2]^T = [1, 1, 1, 0, -1, 0, 1]^T$, and another 1-form $[1, 0, 1, 0, 0, 1, 0]^T$ are shown in Figure 2.
By convention, D−1 and D2 are included to complete the exact sequence.
$$\Omega^0_d \xrightarrow{\;D_0\;} \Omega^1_d$$
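The matrices (4)–(5) are small enough to check by hand, but a few lines of Python make two claims of this example concrete: $D_0$ maps the 0-form $[0, 1, 2, 1, 2]^T$ to the stated gradient, and consecutive derivatives compose to zero ($D_1 D_0 = 0$), the property used again in Section 2.5. This sketch uses plain nested lists instead of the sparse matrices the paper refers to; the helper names are illustrative.

```python
# Discrete derivatives of the three-triangle mesh of Figure 1, transcribed from (4)-(5).
# Rows of D0 are the 7 oriented edges, columns the 5 vertices; rows of D1 are the
# 3 triangles, columns the 7 edges.
D0 = [
    [-1, 1, 0, 0, 0],
    [-1, 0, 0, 1, 0],
    [0, -1, 1, 0, 0],
    [0, -1, 0, 1, 0],
    [0, 0, -1, 1, 0],
    [0, 0, -1, 0, 1],
    [0, 0, 0, -1, 1],
]
D1 = [
    [1, -1, 0, 1, 0, 0, 0],
    [0, 0, 1, -1, 1, 0, 0],
    [0, 0, 0, 0, -1, 1, -1],
]


def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def matmul(A, B):
    """Dense matrix-matrix product A B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


sigma0 = [0, 1, 2, 1, 2]            # the 0-form shown in Figure 2
grad = matvec(D0, sigma0)           # its discrete gradient, a 1-form
print(grad)                         # [1, 1, 1, 0, -1, 0, 1]
print(matmul(D1, D0))               # 3 x 5 zero matrix: D1 D0 = 0
```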
$$\Omega^0_d \xrightarrow{\;D_0\;} \Omega^1_d \xrightarrow{\;D_1\;} \Omega^2_d \;\cdots\; \Omega^k_d \xrightarrow{\;D_k\;} \Omega^{k+1}_d$$
exposes topological information through differential forms. For example, the two harmonic 1-forms
shown in Figure 3 exist because the manifold has genus 1. The efficient solution of discrete k-form
Laplacians has substantial utility in computational topology. For instance, sufficient conditions on
the coverage of sensor networks reduce to the discovery of harmonic forms on the simplicial Rips
complex [4]. In such applications, we do not encounter variable coefficients and often take the
identity matrix for Mk .
2. PROPOSED METHOD
Figure 4. Nodal aggregates (left) determine coarse edges (center) through the algorithm induced_aggregates. Fine edges crossing between node aggregates interpolate from the corresponding coarse edge with weight 1 or −1 depending on their relative orientation. Edges contained within an aggregate do not correspond to any coarse edge and receive weight 0. These weights are determined by lines 10–13 of induced_aggregates.
aggregates. Furthermore, when two nonzero rows are equal up to a sign (i.e. linearly dependent),
they interpolate from a common coarse edge.
Therefore, the procedure of aggregating edges reduces to computing sets of linearly dependent rows in D. Each set of dependent rows yields a coarse edge and thus a column of $P_1$. In the general case, sets of dependent rows in $D = D_k P_k$ are identified and used to produce $P_{k+1}$. The process can be repeated to coarsen the entire de Rham complex. Alternatively, the coarsening can be stopped at a specific k < N. In Section 2.5, we discuss the coarse derivative operator

$$\hat D_k \Leftarrow (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T D_k P_k$$

and show that it satisfies diagram (8).
where R(A) denotes the range of matrix A. Note that property (9) is clearly necessary to satisfy
diagram (8).
Using disjoint sets of dependent rows $A_0, A_1, \ldots$, the function induced_aggregates constructs the aggregation operator $P_{k+1}$ described above. A nonzero entry $P_{k+1}(i, j)$ indicates membership of the ith row of D (i.e. the ith (k+1)-dimensional element) in the jth aggregate $A_j$.
2.4. Example
In this section, we describe the steps of our algorithm applied to the three-element simplicial mesh
depicted in Figure 1. Matrices D−1 , D0 , D1 , and D2 , shown in Section 1.1, are first computed
Figure 5. Example where contiguous (center) and noncontiguous (right) aggregation differ. Contiguous aggregates are reflected through our choice of G defined in induced_aggregates and later used in dependent_rows.
and then passed to coarsen_complex. The externally defined procedure aggregate_nodes is then called to produce the piecewise-constant nodal aggregation operator
$$P_0 = \begin{bmatrix}
1 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 1
\end{bmatrix} \qquad (10)$$
whose corresponding aggregates are shown in Figure 6. At this stage of the procedure, a more general nodal problem $D_0^T M_1 D_0$ may be utilized in determining the coarse aggregates. Next, induced_aggregates is invoked with arguments $P_0$, $D_0$, $D_1$ and the sparse matrix
$$D = D_0 P_0 = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
-1 & 1 & 0 \\
0 & 0 & 0 \\
1 & -1 & 0 \\
0 & -1 & 1 \\
-1 & 0 & 1
\end{bmatrix} \qquad (11)$$
is constructed. Recall from Section 2.2 that the rows of D are used to determine the induced
edge aggregates. The zero rows of D, namely rows 0, 1, and 3, correspond to interior edges,
which is confirmed by Figure 6. Linear dependence between rows 2 and 4 indicates that
edges 2 and 4 have common coarse endpoints, with the difference in sign indicating opposite
orientations.
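The edge-aggregation step just described can be sketched as follows. This is a simplified stand-in for induced_aggregates, not the paper's algorithm: it groups any nonzero rows of $D = D_0 P_0$ that are equal up to sign, whereas the paper only compares upper-adjacent edges (via dependent_rows) to keep aggregates contiguous. On the Figure 1 mesh both rules give the same grouping, and the sketch reproduces the operator $P_1$ of (13). The function names are hypothetical.

```python
# D0 from (4) and the nodal aggregation operator P0 from (10).
D0 = [
    [-1, 1, 0, 0, 0], [-1, 0, 0, 1, 0], [0, -1, 1, 0, 0], [0, -1, 0, 1, 0],
    [0, 0, -1, 1, 0], [0, 0, -1, 0, 1], [0, 0, 0, -1, 1],
]
P0 = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def edge_aggregates(D):
    """Group nonzero rows of D that are equal up to sign.

    Returns a list of aggregates, each a list of (row index, sign) pairs, where the
    sign records the orientation relative to the aggregate's first row."""
    aggregates = []                      # list of (representative row, members)
    for i, row in enumerate(D):
        if not any(row):
            continue                     # zero row: edge interior to a nodal aggregate
        for rep, members in aggregates:
            if row == rep:
                members.append((i, 1))
                break
            if row == [-x for x in rep]:
                members.append((i, -1))
                break
        else:                            # no match: this row starts a new coarse edge
            aggregates.append((row, [(i, 1)]))
    return [members for _, members in aggregates]


def tentative_prolongator(D):
    """One column per aggregate, entries +1/-1 by relative orientation, else 0."""
    aggregates = edge_aggregates(D)
    P = [[0] * len(aggregates) for _ in D]
    for j, members in enumerate(aggregates):
        for i, sign in members:
            P[i][j] = sign
    return P


D = matmul(D0, P0)                       # the matrix (11)
P1 = tentative_prolongator(D)
for row in P1:
    print(row)                           # rows of the operator (13)
```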
Figure 6. Original mesh with nodal aggregates (left), coarse nodes (center), and coarse edges (right).
to find dependent rows among upper-adjacent edges. In this case, edges 3 and 4 are upper adjacent to edge 2; however, only row 4 of D is linearly dependent on row 2. Rows 5 and 6 of D are not linearly dependent on any other rows, so edges 5 and 6 each form a single-edge aggregate. The resulting aggregation operator
$$P_1 = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
1 & 0 & 0 \\
0 & 0 & 0 \\
-1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix} \qquad (13)$$
$$\hat D_0 = (P_1^T P_1)^{-1} P_1^T D_0 P_0 = \begin{bmatrix}
-1 & 1 & 0 \\
0 & -1 & 1 \\
-1 & 0 & 1
\end{bmatrix} \qquad (14)$$
for the mesh in Figure 6. Subsequent iterations of the algorithm produce operators
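$$P_2 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \qquad
\hat D_1 = (P_2^T P_2)^{-1} P_2^T D_1 P_1 = \begin{bmatrix} 1 & 1 & -1 \end{bmatrix}, \qquad
\hat D_2 = D_2 P_2 = \begin{bmatrix} 0 \end{bmatrix} \qquad (15)$$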
which complete the coarse de Rham complex.
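Formulas (14) and (15) can be checked numerically. The sketch below uses exact rational arithmetic (Python's `fractions` module; helper names are illustrative) to compute $\hat D_k = (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T D_k P_k$ for the example, exploiting that $P_{k+1}^T P_{k+1}$ is diagonal for disjoint aggregates. It then verifies both the commutativity $D_k P_k = P_{k+1} \hat D_k$ and the coarse complex property $\hat D_1 \hat D_0 = 0$.

```python
from fractions import Fraction

# Fine derivatives (4)-(5) and tentative prolongators (10), (13), (15).
D0 = [
    [-1, 1, 0, 0, 0], [-1, 0, 0, 1, 0], [0, -1, 1, 0, 0], [0, -1, 0, 1, 0],
    [0, 0, -1, 1, 0], [0, 0, -1, 0, 1], [0, 0, 0, -1, 1],
]
D1 = [[1, -1, 0, 1, 0, 0, 0], [0, 0, 1, -1, 1, 0, 0], [0, 0, 0, 0, -1, 1, -1]]
D2 = [[0, 0, 0]]
P0 = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
P1 = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0], [-1, 0, 0], [0, 1, 0], [0, 0, 1]]
P2 = [[0], [0], [1]]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def coarse_derivative(P_next, D, P):
    """(P_next^T P_next)^{-1} P_next^T D P.  With disjoint aggregates P_next^T P_next
    is diagonal (aggregate sizes), so the inverse is a row-wise division."""
    PtP = matmul(transpose(P_next), P_next)
    B = matmul(transpose(P_next), matmul(D, P))
    return [[Fraction(b, PtP[r][r]) for b in row] for r, row in enumerate(B)]


Dh0 = coarse_derivative(P1, D0, P0)      # (14)
Dh1 = coarse_derivative(P2, D1, P1)      # (15)
Dh2 = matmul(D2, P2)                     # (15): D2 P2, no scaling needed
print([[int(x) for x in row] for row in Dh0])   # [[-1, 1, 0], [0, -1, 1], [-1, 0, 1]]
print([[int(x) for x in row] for row in Dh1])   # [[1, 1, -1]]
print(matmul(D0, P0) == matmul(P1, Dh0))        # True: D0 P0 = P1 Dh0
print(all(x == 0 for x in matmul(Dh1, Dh0)[0])) # True: Dh1 Dh0 = 0
```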
2.5. Commutativity
We now prove that the tentative prolongators $P_0, P_1, \ldots, P_K$ and coarse derivative operators $\hat D_0, \hat D_1, \ldots, \hat D_K$ produced by Algorithm 1 satisfy commutative diagram (8). The result is summarized by the following theorem.
Theorem 1
Let $P_k : \hat\Omega^k_d \to \Omega^k_d$ denote the discrete k-form prolongation operators with the following properties:

$$\hat D_k \Leftarrow (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T D_k P_k \qquad (16c)$$
Proof
Since $P_{k+1}$ has full column rank, the pseudoinverse is given by

$$P_{k+1}^{+} = (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T \qquad (18)$$

Since Algorithm 1 meets assumptions (16a)–(16c), it follows that diagram (8) is satisfied. Also, assuming disjoint aggregates, the matrix $P_{k+1}^T P_{k+1}$ appearing in (18) is a diagonal matrix; hence, its inverse is easily computed.
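$$\begin{aligned}
\hat D_{k+1} \hat D_k &= P_{k+2}^{+} D_{k+1} P_{k+1} P_{k+1}^{+} D_k P_k \\
&= P_{k+2}^{+} D_{k+1} P_{k+1} P_{k+1}^{+} P_{k+1} X \\
&= P_{k+2}^{+} D_{k+1} P_{k+1} X \\
&= P_{k+2}^{+} D_{k+1} D_k P_k \\
&= 0
\end{aligned}$$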
since Dk+1 Dk = 0 by assumption. From diagram (3), we infer the same result for the associated
finite element spaces.
Theorem 2
Given discrete k-form prolongation operators $P_k$ satisfying (16a)–(16c), let $\bar P_k : \hat\Omega^k_d \to \Omega^k_d$ denote the smoothed discrete k-form prolongation operators with the following properties:

$$\bar P_k = S_k P_k \qquad (20a)$$
where $S_k$ defines the type of prolongation smoother. Then, diagram (8) holds. That is,

$$D_k \bar P_k = \bar P_{k+1} \hat D_k \qquad (21)$$
Proof
First, if

$$D_k S_k = S_{k+1} D_k \qquad (22)$$

then

$$D_k \bar P_k = D_k S_k P_k = S_{k+1} D_k P_k = S_{k+1} P_{k+1} \hat D_k = \bar P_{k+1} \hat D_k$$
Therefore, it suffices to show that (22) holds for all k. For k = 0, we have
= D0 (I −S0 DT0 M1 D0 )
= D0 S0
$$= \hat D_k^T P_{k+1}^T M_{k+1} P_{k+1} \hat D_k = \hat D_k^T \hat M_{k+1} \hat D_k$$
2.8. Extensions and applications
Note that condition (9) permits some freedom in our choice of aggregates. For instance, in restricting
ourselves to contiguous aggregates we have slightly enriched the range of Pk+1 beyond what
is necessary. Provided that Pk+1 already satisfies (9), additional coarse basis functions can be
introduced to better approximate low-energy modes. As in smoothed aggregation, these additional
columns of Pk+1 can be chosen to exactly interpolate given near-nullspace vectors [17].
So far we have only discussed coarsening the cochain complex (8). It is worth noting that
coarsen_complex works equally well on the chain complex formed by the mesh boundary operators $\partial_k = D_{k-1}^T$,

$$0 \xleftarrow{\;D_{-1}^T\;} \Omega^0_d \xleftarrow{\;D_0^T\;} \cdots \xleftarrow{\;D_{N-2}^T\;} \Omega^{N-1}_d \xleftarrow{\;D_{N-1}^T\;} \Omega^N_d \xleftarrow{\;D_N^T\;} 0 \qquad (23)$$

by simply reversing the order of the complex, i.e. $(D_{-1}, D_0, \ldots, D_N) \Rightarrow (D_N^T, D_{N-1}^T, \ldots, D_{-1}^T)$. In this case, aggregate_nodes will aggregate the top-level elements, for instance, the triangles in Figure 1. Intuitively, $\partial_k$ acts like a derivative operator that maps k-cochains to (k−1)-cochains; however, one typically refers to these as k-chains rather than cochains [20]. In Section 3, we coarsen both complexes when computing Hodge decompositions.
3. HODGE DECOMPOSITION
The Hodge decomposition [21] states that the space of k-forms on a closed manifold can be
decomposed into three orthogonal subspaces
$$\Omega^k = d_{k-1}\Omega^{k-1} \oplus \delta_{k+1}\Omega^{k+1} \oplus H^k \qquad (24)$$

where $H^k$ is the space of harmonic k-forms, $H^k = \{h \in \Omega^k \mid \Delta_k h = 0\}$. The analogous result holds for the space of discrete k-forms $\Omega^k_d$, where the derived codifferential [22]

$$\delta_k = M_{k-1}^{-1} D_{k-1}^T M_k \qquad (25)$$

is defined to be the adjoint of $D_{k-1}$ in the discrete innerproduct $M_k$. Convergence of the discrete approximations to the Hodge decomposition is examined in [23].
In practice, for a discrete k-form $\omega^k$ we seek a decomposition

$$\omega^k = D_{k-1}\alpha^{k-1} + M_k^{-1} D_k^T M_{k+1} \beta^{k+1} + h^k \qquad (26)$$
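$$D_{k-1}^T M_k D_{k-1} \alpha^{k-1} = D_{k-1}^T M_k \omega^k \qquad (27)$$

$$D_k M_k^{-1} D_k^T M_{k+1} \beta^{k+1} = D_k \omega^k \qquad (28)$$

Note that (28) involves the explicit inverse $M_k^{-1}$, which is typically dense.‡ In the following sections, we first consider the special case $M_k = I$ and then show how (28) can be circumvented in the general case. Equation (27) is obtained by left multiplying both sides of (26) by $D_{k-1}^T M_k$ (that is, by $M_{k-1}\delta_k$); the remaining terms vanish because $D_k D_{k-1} = 0$ and $h^k$ is orthogonal to the range of $D_{k-1}$ in the $M_k$ innerproduct. Likewise, applying $D_k$ to both sides of (26) yields (28). Equivalently, one may seek minima of the following functionals: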
5  end
6  for l = 0 to NUM_LEVELS − 1
7      P_l ⇐ smooth_prolongator(A_l, P^l_{k−1})
8      A_{l+1} ⇐ P_l^T A_l P_l
9  end
10 return MG_solver(A_0, A_1, ..., A_NUM_LEVELS, P_0, P_1, ..., P_{NUM_LEVELS−1})
‡ The covolume Hodge star is a notable exception.
§ In the case M = I, the cohomology basis is also a homology basis.
Algorithm 5 demonstrates how the proposed method is used to compute Hodge decompositions
in the special case. Multigrid solvers solver1 and solver2 are constructed for the solution of linear systems (31) and (32), respectively. In the latter case, the direction of the chain complex is reversed when being passed as an argument to construct_solver. As mentioned in Section 2.8, coarsen_complex coarsens the reversed complex with this simple change of arguments. Using the identity innerproduct, construct_solver applies the proposed method recursively to produce a progressively coarser hierarchy of tentative prolongators $P^l_k$ and discrete derivatives $D^l_k$. The tentative prolongators are then smoothed by a user-defined function smooth_prolongator to produce the final prolongators $P_l$ and Galerkin products $A_{l+1} \Leftarrow P_l^T A_l P_l$. Finally, the matrices $A_0, \ldots, A_{NUM\_LEVELS}$ and $P_0, \ldots, P_{NUM\_LEVELS-1}$ determine the multigrid cycle in a user-defined class MG_solver. Choices for smooth_prolongator and MG_solver are discussed in Section 4.
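For the identity innerproduct, (27) and (28) reduce to $D_{k-1}^T D_{k-1} \alpha^{k-1} = D_{k-1}^T \omega^k$ and $D_k D_k^T \beta^{k+1} = D_k \omega^k$, and the harmonic part is whatever remains of $\omega^k$. The sketch below applies this to a 1-form on the Figure 1 mesh. It is not Algorithm 5: plain, unpreconditioned conjugate gradients stand in for the multigrid-preconditioned solver1 and solver2, the operators are applied matrix-free from dense lists, and the helper names are hypothetical. Because the Figure 1 mesh is a strip of three triangles without holes, the harmonic part should vanish up to rounding.

```python
from math import sqrt

# D0 and D1 of the Figure 1 mesh, from (4)-(5).
D0 = [
    [-1, 1, 0, 0, 0], [-1, 0, 0, 1, 0], [0, -1, 1, 0, 0], [0, -1, 0, 1, 0],
    [0, 0, -1, 1, 0], [0, 0, -1, 0, 1], [0, 0, 0, -1, 1],
]
D1 = [[1, -1, 0, 1, 0, 0, 0], [0, 0, 1, -1, 1, 0, 0], [0, 0, 0, 0, -1, 1, -1]]


def matvec(A, x):
    """A x for a dense list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    """A^T y without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(apply_A, b, rtol=1e-12, maxiter=100):
    """Conjugate gradients for a consistent symmetric positive-semidefinite system,
    started from zero so the iterates stay in the range of A."""
    x = [0.0] * len(b)
    r = list(b)
    p = list(r)
    rr = dot(r, r)
    stop = rtol * rtol * rr
    for _ in range(maxiter):
        if rr <= stop:
            break
        Ap = apply_A(p)
        pAp = dot(p, Ap)
        if pAp <= 0.0:
            break                        # search direction fell into the nullspace
        alpha = rr / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def hodge_decompose_identity(omega):
    """Split a 1-form into exact, coexact and harmonic parts with M_k = I."""
    alpha = cg(lambda x: rmatvec(D0, matvec(D0, x)), rmatvec(D0, omega))   # (27), M = I
    beta = cg(lambda y: matvec(D1, rmatvec(D1, y)), matvec(D1, omega))     # (28), M = I
    exact = matvec(D0, alpha)
    coexact = rmatvec(D1, beta)
    harmonic = [w - e - c for w, e, c in zip(omega, exact, coexact)]
    return exact, coexact, harmonic


omega = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]      # the 1-form shown in Figure 2
exact, coexact, harmonic = hodge_decompose_identity(omega)
print(max(abs(v) for v in harmonic) < 1e-8)      # True: no harmonic 1-forms on this mesh
```

On a mesh with holes the same routine leaves a nonzero harmonic remainder, which is the situation the following paragraphs address.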
Therefore, the task of computing general Hodge decompositions can be reduced to computing
a basis for Hk . Sometimes, a basis is known a priori. For instance, H0 , which corresponds to
the nullspace of the pure-Neumann problem, is spanned by constant vectors on each connected
component of the domain. Furthermore, if the domain is contractible, then $H^k = \{0\}$ for $k > 0$.
However, in many cases of interest we cannot assume that a basis for Hk is known and, therefore,
it must be computed.
Note that decompose_special can be used to determine a harmonic k-form basis for the identity innerproduct by decomposing randomly generated k-forms until their respective harmonic components become linearly dependent. We denote this basis $\{h^k_0, h^k_1, \ldots, h^k_m\}$ and their span $H^k$.
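The random-sampling idea can be illustrated on the smallest complex with a hole: a hypothetical "hollow triangle" with 3 vertices, 3 edges oriented 0→1, 1→2, 2→0, and no faces (this complex is not from the paper). With no 2-forms, the harmonic part of a 1-form is what remains after removing its exact part, and the harmonic space is one-dimensional, spanned by $[1, 1, 1]^T$. The sketch assumes the identity innerproduct, uses plain CG as a stand-in for the multigrid solver, and uses Gram–Schmidt to detect when a new harmonic component is (numerically) dependent on those already found; the tolerance and seed are arbitrary choices.

```python
import random
from math import sqrt

# Hypothetical hollow triangle: edges 0->1, 1->2, 2->0; no triangles, so no coexact part.
D0 = [[-1, 1, 0], [0, -1, 1], [1, 0, -1]]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    out = [0.0] * len(A[0])
    for row, yi in zip(A, y):
        for j, a in enumerate(row):
            out[j] += a * yi
    return out


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cg(apply_A, b, rtol=1e-12, maxiter=100):
    """Conjugate gradients for a consistent symmetric positive-semidefinite system."""
    x, r = [0.0] * len(b), list(b)
    p, rr = list(r), dot(r, r)
    stop = rtol * rtol * rr
    for _ in range(maxiter):
        if rr <= stop:
            break
        Ap = apply_A(p)
        pAp = dot(p, Ap)
        if pAp <= 0.0:
            break
        alpha = rr / pAp
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rr_new = dot(r, r)
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x


def harmonic_part(omega):
    """omega minus its exact part D0 alpha, where D0^T D0 alpha = D0^T omega."""
    alpha = cg(lambda x: rmatvec(D0, matvec(D0, x)), rmatvec(D0, omega))
    return [w - e for w, e in zip(omega, matvec(D0, alpha))]


def harmonic_basis(n_edges, rtol=1e-8, max_samples=20, seed=0):
    """Decompose random 1-forms until the new harmonic component is dependent
    on the ones already collected; return an orthonormal basis."""
    rng = random.Random(seed)
    basis = []
    for _ in range(max_samples):
        h = harmonic_part([rng.uniform(-1.0, 1.0) for _ in range(n_edges)])
        size = sqrt(dot(h, h))
        for q in basis:                  # Gram-Schmidt against the basis so far
            c = dot(h, q)
            h = [hi - c * qi for hi, qi in zip(h, q)]
        norm = sqrt(dot(h, h))
        if norm <= rtol * size:
            break                        # linearly dependent: the basis is complete
        basis.append([hi / norm for hi in h])
    return basis


basis = harmonic_basis(3)
print(len(basis))                                # 1
print([round(abs(v), 6) for v in basis[0]])      # [0.57735, 0.57735, 0.57735]
```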
Using these k-forms, a basis for the harmonic k-forms with innerproduct Mk can be produced by
solving
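$$M_{k-1}^{-1} D_{k-1}^T M_k \hat h^k_i = M_{k-1}^{-1}\left(D_{k-1}^T M_k h^k_i - D_{k-1}^T M_k D_{k-1} \alpha^{k-1}_i\right) = 0 \qquad (37)$$

$$\begin{aligned}
0 &= \sum_{i=0}^{m} c_i \hat h^k_i \\
&= \sum_{i=0}^{m} c_i \left(h^k_i - D_{k-1}\alpha^{k-1}_i\right) \\
&= \sum_{i=0}^{m} c_i h^k_i - \sum_{i=0}^{m} c_i D_{k-1}\alpha^{k-1}_i
\end{aligned}$$

which is a contradiction, since $\sum_{i=0}^{m} c_i h^k_i \in H^k$ is nonzero and $H^k \perp R(D_{k-1})$. Note that the harmonic forms $\hat h^k_0, \ldots, \hat h^k_m$ are not generally the same as the harmonic components of the random k-forms used to produce $h^k_0, \ldots, h^k_m$.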
4. NUMERICAL RESULTS
We have applied the proposed method to a number of structured and unstructured problems. In all cases, a multigrid V(1, 1)-cycle is used as a preconditioner for conjugate gradient iteration. Unless stated otherwise, a symmetric Gauss–Seidel sweep is used during the pre- and post-smoothing stages. Iteration on the positive-semidefinite systems

$$D_k^T D_k, \qquad D_k D_k^T, \qquad D_k^T M_{k+1} D_k \qquad (38)$$

proceeds until the relative residual falls below $10^{-10}$. The matrix $D_0^T M_1 D_0$ corresponds to
a Poisson problem with pure-Neumann boundary conditions. Similarly, $D_1^T M_2 D_1$ corresponds to the eddy-current problem (6) with its zeroth-order coefficient set to 0. As explained in Section 3, matrices (38) arise in discrete Hodge decompositions.
The multigrid hierarchy extends until the number of unknowns falls below 500, at which point
a pseudoinverse is used to perform the coarse level solve. The tentative prolongators are smoothed
are associated with a coarse aggregate; therefore, the tentative prolongator has no zero rows.
As described in Section 2.4, the tentative prolongator for nonscalar problems has zero rows for
elements contained in the interior of a nodal aggregate. In the nonscalar case, additional smoothing
operations incorporate a greater proportion of these degrees of freedom into the range of the final
prolongator.
The influence of higher-degree prolongation smoothers on solver performance is reported in Table IV. Column ‘Degree’ records the degree d of the prolongation smoother $\bar P = S^d P$, whereas ‘Percent zero’ reflects the percentage of zero rows in the first-level prolongator. As expected, the operator complexity increases with smoother degree. However, up to a point, this increase is outweighed by the corresponding improvement in solver convergence. Second-degree smoothers exhibit the best efficiency in both instances of the problem $D_1^T M_2 D_1$ and remain competitive with higher-degree smoothers in the last test. Since the work-per-digit figures exclude the cost of constructing the multigrid transfer operators, these higher-degree smoothers may be less efficient in practice.
5. CONCLUSION
REFERENCES
1. Yee KS. Numerical solution of initial boundary value problems involving Maxwell’s equations in isotropic media.
IEEE Transactions on Antennas and Propagation 1966; AP-14(3):302–307.
2. Bossavit A. On the numerical analysis of eddy-current problems. Computer Methods in Applied Mechanics and
Engineering 1981; 27(3):303–318.
3. Arnold DN. Differential complexes and numerical stability. Proceedings of the International Congress of
Mathematicians, Beijing. Plenary Lectures, vol. 1, 2002.
4. de Silva V, Ghrist R. Homological sensor networks. Notices of the American Mathematical Society 2007;
54:10–17.
5. Polthier K, Preuss E. Identifying vector field singularities using a discrete Hodge decomposition. In Visualization
and Mathematics, VisMath, Hege HC, Polthier K (eds). Springer: Berlin, 2002.
6. Tong Y, Lombeyda S, Hirani AN, Desbrun M. Discrete multiscale vector field decomposition. ACM Transactions
on Graphics (Special issue of SIGGRAPH 2003 Proceedings) 2003; 22(3):445–452.
7. Fisher M, Schröder P, Desbrun M, Hoppe H. Design of tangent vector fields. SIGGRAPH ’07: ACM SIGGRAPH
2007 Papers, New York, NY, U.S.A. ACM: New York, 2007; 56.
8. Whitney H. Geometric Integration Theory. Princeton University Press: Princeton, NJ, 1957.
9. Bossavit A. Whitney forms: a class of finite elements for three-dimensional computations in electromagnetism.
IEE Proceedings 1988; 135(Part A(8)):493–500.
10. Bochev PB, Robinson AC. Matching algorithms with physics: exact sequences of finite element spaces. In
Collected Lectures on Preservation of Stability Under Discretization, Chapter 8, Estep D, Tavener S (eds). SIAM:
Philadelphia, PA, 2002; 145–166.
11. Gradinaru V, Hiptmair R. Whitney elements on pyramids. Electronic Transactions on Numerical Analysis 1999;
8:154–168.
12. Hiptmair R. Multigrid method for maxwell’s equations. SIAM Journal on Numerical Analysis 1999; 36(1):
204–225.
13. Arnold DN, Falk RS, Winther R. Multigrid in H (div) and H (curl). Numerische Mathematik 2000; 85(2):197–217.
14. Reitzinger S, Schöberl J. An algebraic multigrid method for finite element discretizations with edge elements.
Numerical Linear Algebra with Applications 2002; 9:223–238.
15. Hu JJ, Tuminaro RS, Bochev PB, Garasi CJ, Robinson AC. Toward an h-independent algebraic multigrid method
for Maxwell’s equations. SIAM Journal on Scientific Computing 2006; 27:1669–1688.
16. Jones J, Lee B. A multigrid method for variable coefficient maxwell’s equations. SIAM Journal on Scientific
Computing 2006; 27(5):1689–1708.
17. Vaněk P, Mandel J, Brezina M. Algebraic multigrid by smoothed aggregation for second and fourth order elliptic
problems. Computing 1996; 56(3):179–196.
18. Muhammad A, Egerstedt M. Control using higher order Laplacians in network topologies. Proceedings of the 17th
International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, 2006; 1024–1038.
19. Adams M, Brezina M, Hu J, Tuminaro R. Parallel multigrid smoothing: polynomial versus Gauss–Seidel. Journal
of Computational Physics 2003; 188(2):593–610.
20. Hirani AN. Discrete exterior calculus. Ph.D. Thesis, California Institute of Technology, May 2003.
21. Frankel T. An introduction. The Geometry of Physics (2nd edn). Cambridge University Press: Cambridge, 2004.
22. Bochev PB, Hyman JM. Principles of mimetic discretizations of differential operators. In Compatible Spatial
Discretizations, Arnold DN, Bochev PB, Lehoucq RB, Nicolaides RA, Shashkov M (eds). The IMA Volumes in
Mathematics and its Applications, vol. 142. Springer: Berlin, 2006; 89–119.
23. Dodziuk J. Finite-difference approach to the Hodge theory of harmonic forms. American Journal of Mathematics
1976; 98(1):79–104.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:187–200
Published online 7 December 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.563
SUMMARY
We present a fast, cell-centered multigrid solver and apply it to image denoising and non-rigid diffusion-based image registration. In both applications, real-time performance is required in 3D, and the multigrid method has to be compared with solvers based on the fast Fourier transform (FFT). For image denoising, the optimization of the underlying variational approach leads directly to one time step of a parabolic linear heat equation; for image registration, a non-linear second-order system of partial differential equations is obtained. This system is solved by a fixed-point iteration using a semi-implicit time discretization, where each time step again results in an elliptic linear heat equation. The multigrid implementation comes close to real-time performance for medium-sized medical images in 3D for both applications and is compared with a solver based on FFT using available libraries. Copyright © 2007 John Wiley & Sons, Ltd.
KEY WORDS: multigrid; performance optimization; FFT; image processing; image registration; image
denoising
1. INTRODUCTION
In recent years, data sizes in image-processing applications have increased drastically due to improved image acquisition systems. Modern computed tomography (CT) scanners can create volume data sets of 512³ voxels or more [1, 2]. At the same time, users expect real-time image manipulation and analysis. Thus, fast algorithms and implementations are needed to fulfill these tasks.
Many image-processing problems can be formulated in a variational framework and require
the solution of a large, sparse, linear system arising from the discretization of partial differential
equations (PDEs). Often these PDEs are inherently based on some kind of diffusion process. In
simple cases, it is possible to solve these PDEs with fast Fourier transform (FFT)-based techniques of complexity O(n log n). The FFT algorithm was introduced in 1965 by Cooley and Tukey [3]; for an overview of Fourier transform methods, we refer, e.g., to [4–6]. Multigrid methods are a more general alternative and can reach the asymptotically optimal complexity of O(n).
For discrete Fourier transforms, flexible and highly efficient libraries optimized for special
CPU architectures such as the FFTW library [7] or the Intel Math Kernel Library (MKL) [8]
are available. However, we are currently not aware of similarly tuned multigrid libraries in 3D, and only of DiMEPACK [9] for 2D problems. The purpose of this paper is to close this gap by implementing a multigrid solver, optimized especially for the Intel x86 architecture, that is competitive with highly optimized FFT libraries, and to apply it to typical applications in the area of image processing.
The outline of this paper is as follows: We describe the multigrid scheme, including some results on its convergence, and discuss implementation and optimization issues in Section 2. Then, the variational approaches used for image denoising and non-rigid diffusion registration are introduced in Section 3. Finally, we compare the computational times of our multigrid solver and the FFTW package for image denoising and non-rigid registration of medical CT images.
2. MULTIGRID
For a comprehensive overview of multigrid methods, we refer, e.g., to [10–15]. In this paper, we implement a multigrid solver for the linear heat equation

$$\frac{\partial u}{\partial t}(x,t) - \Delta u(x,t) = f(x), \qquad u(x,0) = u_0(x) \tag{1}$$

with time $t \in \mathbb{R}^+$, $u, f : \Omega \subset \mathbb{R}^3 \to \mathbb{R}$, $x \in \Omega$, initial solution $u_0 : \Omega \subset \mathbb{R}^3 \to \mathbb{R}$, and homogeneous Neumann boundary conditions. Note that in practice $u(x,t)$ is often computed only for a finite $t$, and that in the limit $t \to \infty$ the solution tends to that of the well-known Poisson equation. We discretize (1) with finite differences

$$\frac{u_h(x,\tau) - u_0(x)}{\tau} - \Delta_h u_h(x,\tau) = f_h(x) \tag{2}$$

on a regular grid $\Omega_h$ with mesh size $h$ and time step $\tau$, where $\Delta_h$ denotes the well-known 7-point stencil for the Laplacian. In the following, we consider only a single time step, in which we have to solve the elliptic equation

$$(I - \tau\Delta_h)\, u_h(x,\tau) = \tau f_h(x) + u_0(x) \tag{3}$$
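As a reference point for the optimized kernels discussed below, here is a plain scalar sketch (not the authors' SIMD code) of applying the operator on the left-hand side of (3) on an n×n×n cell-centered grid. The flat index order x + n(y + nz) and the function name are our assumptions. Homogeneous Neumann conditions are modelled by a ghost copy: a missing neighbour takes the cell's own value, so its contribution to the stencil vanishes.

```python
def apply_heat_operator(u, n, h, tau):
    """Return (I - tau * Delta_h) u for an n*n*n cell-centered grid stored as a flat list.

    Cell (x, y, z) has index x + n * (y + n * z). Neighbours outside the domain are
    replaced by the cell's own value (ghost copy), which realizes a zero normal
    derivative, i.e. homogeneous Neumann conditions, on the boundary faces.
    """
    c = tau / (h * h)
    out = [0.0] * (n ** 3)
    offsets = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))
    for z in range(n):
        for y in range(n):
            for x in range(n):
                i = x + n * (y + n * z)
                lap = 0.0  # h^2 times the 7-point Laplacian at cell i
                for dx, dy, dz in offsets:
                    xx, yy, zz = x + dx, y + dy, z + dz
                    if 0 <= xx < n and 0 <= yy < n and 0 <= zz < n:
                        lap += u[xx + n * (yy + n * zz)] - u[i]
                    # outside: ghost value equals u[i], so the difference is zero
                out[i] = u[i] - c * lap
    return out


# A unit spike in the centre of a 3x3x3 grid (h = tau = 1):
u = [0.0] * 27
u[13] = 1.0
out = apply_heat_operator(u, 3, 1.0, 1.0)
print(out[13], out[12], sum(out))  # 7.0 -1.0 1.0
```

Because the Neumann operator has zero row and column sums, it leaves constants unchanged and conserves the total sum of the grid function, which is a cheap sanity check for any optimized variant.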
In this paper, we deal with image-processing problems, where we can think of the discrete voxels as located in the cell centers. Therefore, we have chosen a cell-centered multigrid scheme with constant interpolation and 8-point restriction. Note that this combination of intergrid transfer operators leads to multigrid convergence rates significantly worse than what could ideally be obtained [15, 16]. This will be shown by local Fourier analysis (LFA) and numerical experiments. However, it leads to a relatively simple algorithm that satisfies our numerical requirements and is well suited for careful machine-specific performance optimization. For relaxation we choose
Copyright © 2007 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:187–200
DOI: 10.1002/nla
A FAST FULL MULTIGRID SOLVER FOR APPLICATIONS IN IMAGE PROCESSING 189
2.1.1. Memory layout. The best performance on current x86 processors can be achieved by using the SIMD (single instruction, multiple data) unit, which was introduced to the architecture in 1999 with the Pentium III as the Streaming SIMD Extensions (SSE). These instructions perform vector-like operations on units of 16 bytes, which in our case can be seen as a SIMD vector data type containing four single-precision floating-point numbers. When operating on naturally aligned SIMD vectors (i.e., at addresses that are multiples of their size), the SSE unit provides high bandwidth, especially to the caches. Consequently, the memory layout must support as many aligned data accesses as possible in all multigrid components. To enable efficient handling of the boundary conditions, we chose to store boundary points explicitly around the grid; by copying the outer unknowns before smoothing or calculating the point-wise residuals, we need no special handling of the homogeneous Neumann boundary conditions. In addition, the first unknown of every line is aligned to a multiple of 16 bytes by padding, i.e., by filling up the line with unused values up to a length that is a multiple of four. This enables SIMD processing for any line length, since the boundary values, which are generated just-in-time, and the padding area can be overwritten with fake results.
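The padding rule can be illustrated with a small index calculation. This is a sketch of one possible layout consistent with the description, not the authors' actual one: we assume (hypothetically) that each stored line reserves one four-float slot in front whose last float holds the left ghost value, followed by the n unknowns, one right ghost value, and padding up to a multiple of four floats. With single precision (4 bytes), the first unknown of every line then starts on a 16-byte boundary, provided the array itself is 16-byte aligned.

```python
FLOAT_BYTES = 4           # single precision, as in the text
SIMD_FLOATS = 4           # one 16-byte SSE register holds four floats
FRONT_PAD = SIMD_FLOATS   # hypothetical: one SIMD slot in front; left ghost is its last float


def round_up(value, multiple):
    return (value + multiple - 1) // multiple * multiple


def line_stride(n):
    """Floats per stored line: front slot + n unknowns + right ghost, padded to a multiple of 4."""
    return round_up(FRONT_PAD + n + 1, SIMD_FLOATS)


def first_unknown_offset(line_index, n):
    """Float offset of the first unknown of a line (array base assumed 16-byte aligned)."""
    return line_index * line_stride(n) + FRONT_PAD


def copy_neumann_ghosts(line, n):
    """Homogeneous Neumann: copy the outermost unknowns into the ghost slots before a sweep."""
    line[FRONT_PAD - 1] = line[FRONT_PAD]
    line[FRONT_PAD + n] = line[FRONT_PAD + n - 1]


print(line_stride(10), line_stride(12))   # 16 20
line = [0.0] * line_stride(10)
line[4:14] = [float(k) for k in range(1, 11)]
copy_neumann_ghosts(line, 10)
print(line[3], line[14])                  # 1.0 10.0
```

Rounding every line up to whole SIMD vectors is what lets a vectorized sweep run over the full line without a scalar remainder loop: the extra lanes land on ghost or padding slots, whose values are simply overwritten or ignored.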
190 M. STÜRMER, H. KÖSTLER AND U. RÜDE
contributes only to a single coarse grid point. Hence, the calculation of the residual and its restriction can be done in SIMD and without storing residuals to memory. The idea is to compute four SIMD registers containing residuals from four neighboring lines and first average them into a single SIMD vector. Its values are reordered by special shuffle instructions, so that two coarse-grid right-hand side values can be generated by averaging its first and second, and its third and fourth values. Reusing some common subexpressions simplifies this further. The constant interpolation can also be executed very efficiently in the SIMD unit with shuffle operations.
Additionally, the loops are unrolled and the instructions are scheduled carefully by hand to help the compiler produce fast code.
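Stripped of the SIMD details, the two transfer operators are simple. The following scalar sketch (our illustration; names and the flat x + n(y + nz) index order are assumptions) shows the 8-point restriction, in which each coarse cell receives the average of its 2×2×2 fine children, and the constant interpolation, in which each fine cell copies its coarse parent. Averaging four lines first and then pairs within a line, as described above, is just a reordering of the same eight-term sum.

```python
def restrict_8point(fine, n):
    """Average each 2x2x2 block of an n^3 cell-centered grid (n even) into one coarse cell."""
    nc = n // 2
    coarse = [0.0] * nc ** 3
    for zc in range(nc):
        for yc in range(nc):
            for xc in range(nc):
                s = 0.0
                for dz in (0, 1):
                    for dy in (0, 1):
                        for dx in (0, 1):
                            s += fine[(2 * xc + dx) + n * ((2 * yc + dy) + n * (2 * zc + dz))]
                coarse[xc + nc * (yc + nc * zc)] = s / 8.0
    return coarse


def interpolate_constant(coarse, nc):
    """Piecewise-constant interpolation: every fine cell copies its coarse parent."""
    n = 2 * nc
    fine = [0.0] * n ** 3
    for z in range(n):
        for y in range(n):
            for x in range(n):
                fine[x + n * (y + n * z)] = coarse[x // 2 + nc * (y // 2 + nc * (z // 2))]
    return fine


c = [float(k) for k in range(8)]     # a 2x2x2 coarse grid
f = interpolate_constant(c, 2)       # a 4x4x4 fine grid
print(restrict_8point(f, 4) == c)    # True: restriction undoes constant interpolation
```

Each fine cell belongs to exactly one coarse cell in both operators, which is the property that allows residual computation and restriction to be fused without writing the fine residual to memory.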
2.1.3. Blocking and fusion of components. SIMD optimization is most useful when combined with techniques that enhance spatial and temporal data locality, developed in [20, 22–24], and that exploit the higher bandwidth of the caches. For smaller grids, the post-smoother uses a simple blocking method, as illustrated in Figure 1(I): after preparing the first boundary (I(a)), it follows the red update in line y, z immediately with the black update in line y, z−1 (I(b)), proceeds in this way through the whole grid (I(c)), and finishes the sweep with a black update in the last plane (I(d)). As long as the data from the last block can be held in the cache hierarchy, the solution and right-hand side grids must be transferred from and to memory only once. For larger grids, this is no longer possible, and another blocking level must be introduced, as illustrated in Figure 1(II): the grid is then divided in the x–z direction, and every resulting super-block is processed in a similar manner as in the simple case, but the red update in line y, z is followed by the black update in line y−1, z to respect data dependencies between two super-blocks. Therefore, the first and last super-blocks need special boundary handling (II(a–d)). This two-fold blocking method is slightly less effective, since the super-blocks overlap and some values are read from main memory twice. The optimal super-block height depends on the cache size and the line length.

Figure 1. Illustration of the different blocking methods on a 10×10×10 cube. (I) Simple plane blocking of one RBGS update: (a) initial boundary handling; (b) first block; (c) blocking complete; and (d) final boundary handling. (II) Super-blocking of one RBGS update: (a) first sub-block of first super-block; (b) first super-block complete; (c) middle super-block complete; and (d) last super-block complete. (III) Super-blocking of one RBGS update fused with calculation of residual and restriction: (a) initial boundary handling; (b) first sub-block of first super-block; (c) first super-block complete; and (d) only final boundary handling missing.
The pre-smoother extends these blocking methods further by fusing the smoothing step with the calculation and restriction of the residuals. For smaller grids, the simpler blocking method working on whole planes (I) is extended: the right-hand side values of coarse-grid plane z are computed immediately after smoothing in planes 2z and 2z+1 is complete. This leads to slightly more complex handling at the first and last planes. For larger planes, however, super-blocks must be used again, as depicted in Figure 1(III).
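Why the simple blocking scheme is legal can be checked directly: with a 7-point stencil, red points only have black neighbours and vice versa, so the black update of line (y, z−1) may run as soon as the red updates of lines (y, z−1), (y+1, z−1), (y, z−2) and (y, z) are done. The scalar sketch below (our illustration, not the authors' code; all names hypothetical) compares the textbook order, a full red sweep followed by a full black sweep, with this pipelined single-pass order for the cell-centered Neumann version of (3), where `f` stands for the right-hand side τf_h + u_0 and `c` = τ/h².

```python
RED, BLACK = 0, 1


def relax_line(u, f, n, c, y, z, color):
    """Gauss-Seidel update of the points of one color in grid line (y, z).

    Solves (1 + c*k) u_i - c * sum(neighbours) = f_i pointwise, where k counts the
    neighbours inside the domain (cell-centered homogeneous Neumann), c = tau / h^2.
    """
    for x in range(n):
        if (x + y + z) % 2 != color:
            continue
        s, k = 0.0, 0
        for xx, yy, zz in ((x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                           (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)):
            if 0 <= xx < n and 0 <= yy < n and 0 <= zz < n:
                s += u[xx + n * (yy + n * zz)]
                k += 1
        i = x + n * (y + n * z)
        u[i] = (f[i] + c * s) / (1.0 + c * k)


def rbgs_two_sweeps(u, f, n, c):
    """Textbook order: full red sweep, then full black sweep."""
    for color in (RED, BLACK):
        for z in range(n):
            for y in range(n):
                relax_line(u, f, n, c, y, z, color)


def rbgs_pipelined(u, f, n, c):
    """One pass: the red update of line (y, z) is followed by the black update of line (y, z-1)."""
    for y in range(n):                  # initial boundary handling: red in the first plane
        relax_line(u, f, n, c, y, 0, RED)
    for z in range(1, n):
        for y in range(n):
            relax_line(u, f, n, c, y, z, RED)
            relax_line(u, f, n, c, y, z - 1, BLACK)
    for y in range(n):                  # final boundary handling: black in the last plane
        relax_line(u, f, n, c, y, n - 1, BLACK)


n = 5
f = [((7 * i) % 11) - 5.0 for i in range(n ** 3)]
u0 = [((3 * i) % 7) * 0.5 for i in range(n ** 3)]
u1, u2 = list(u0), list(u0)
rbgs_two_sweeps(u1, f, n, 0.8)
rbgs_pipelined(u2, f, n, 0.8)
print(u1 == u2)  # True: identical results, but the pipelined order touches each plane once
```

Every point sees exactly the same neighbour values in both orders, so the results agree bit for bit; the gain of the pipelined order lies purely in memory traffic, since a plane is finished while it is still in cache.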
Table III. Wall-clock times in ms for FFT (real type, out of place, forward and backward) and the optimized multigrid on an AMD Opteron 248 2.2 GHz cluster node.
Size | V(1,1) | FMG-V(1,1) | FMG-V(2,2) | FFT (FFTW) | DCT (FFTW)
32   | 0.63   | 0.80       | 1.38       | 0.85       | 2.27
64   | 6.97   | 9.55       | 14.9       | 10.4       | 19.1
128  | 56.0   | 78.7       | 122        | 107        | 197
256  | 445    | 622        | 976        | 992        | 2024
512  | 3669   | 5175       | 7943       | 9274       | 67 766

Table IV. Wall-clock times in ms for FFT (real type, out of place, forward and backward) and the optimized multigrid on an Intel Core2 Duo 2.4 GHz (Conroe) workstation.
Size | V(1,1) | FMG-V(1,1) | FMG-V(2,2) | FFT (FFTW) | DCT (FFTW) | FFT (MKL)
32   | 0.43   | 0.55       | 0.93       | 0.40       | 1.43       | 0.71
64   | 3.33   | 4.29       | 7.12       | 3.73       | 12.2       | 5.27
128  | 31.6   | 44.1       | 68.3       | 50.4       | 123        | 45.8
256  | 264    | 370        | 574        | 473        | 1246       | 401
512  | 2168   | 3026       | 4699       | 4174       | 11 067     | 3510
discrete cosine transform (DCT) used for Neumann boundary conditions, respectively. These timings do not include the time necessary for actually solving the problem in Fourier space as described in Section 8, which is highly dependent on the code quality. For our applications, the accuracy of a simple FMG-V(1, 1) or even a simple V(1, 1)-cycle is often sufficient, as will be explained in Section 3. On both platforms, we compare the performance of our code-optimized multigrid implementation with that of the well-known FFTW package [25] (version 3.1.2).
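The "solving in Fourier space" step that the timings exclude can be illustrated in 1D. In this sketch (our illustration, using naive O(n²) transforms instead of FFTW), the cosine modes cos(πk(j + 1/2)/n) are eigenvectors of the cell-centered Neumann Laplacian with eigenvalues −(4/h²)sin²(πk/(2n)), so solving (3) reduces to a forward DCT-II, one division per coefficient, and an inverse transform. As far as we know, FFTW exposes this transform pair as the real-to-real kinds `FFTW_REDFT10` (DCT-II) and `FFTW_REDFT01` (DCT-III); the function names below are hypothetical.

```python
import math


def dct2(x):
    """Unnormalized DCT-II: X_k = sum_j x_j cos(pi k (j + 1/2) / n)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n)]


def idct2(coeffs):
    """Inverse of dct2 (a scaled DCT-III)."""
    n = len(coeffs)
    return [coeffs[0] / n
            + (2.0 / n) * sum(coeffs[k] * math.cos(math.pi * k * (j + 0.5) / n)
                              for k in range(1, n))
            for j in range(n)]


def solve_heat_step_dct(b, h, tau):
    """Solve (I - tau * Delta_h) u = b on a 1D cell-centered grid with Neumann boundaries."""
    n = len(b)
    coeffs = dct2(b)
    for k in range(n):
        # eigenvalue of (I - tau * Delta_h) for cosine mode k
        coeffs[k] /= 1.0 + 4.0 * tau / (h * h) * math.sin(math.pi * k / (2 * n)) ** 2
    return idct2(coeffs)


def apply_heat_operator_1d(u, h, tau):
    """(I - tau * Delta_h) u with ghost-copy Neumann boundaries, used to check the solve."""
    n, c = len(u), tau / (h * h)
    out = []
    for j in range(n):
        left = u[j - 1] if j > 0 else u[j]
        right = u[j + 1] if j < n - 1 else u[j]
        out.append(u[j] - c * (left - 2.0 * u[j] + right))
    return out


b = [1.0, 0.0, 2.0, 5.0, 3.0, 0.0, -1.0, 4.0]
u = solve_heat_step_dct(b, h=1.0, tau=0.7)
residual = max(abs(r - s) for r, s in zip(apply_heat_operator_1d(u, 1.0, 0.7), b))
print(residual)  # of the order of machine precision
```

In 3D the same idea applies with a separable transform along each axis and eigenvalues summed over the three directions; the per-coefficient division is the part whose cost the tables leave out.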
The first test platform is an AMD Opteron 248 cluster node. The CPUs run at 2.2 GHz, provide a 1 MB unified L2 cache and a 64 kB L1 data cache, and are connected to DDR-333 memory. On this platform, the GNU C and C++ compilers (version 4.1.0 for the 64-bit environment) were used. Measurements (see Table III) show that a full multigrid with V(1, 1)-cycles can outperform FFTW's FFTs and is much faster than its DCTs even with V(2, 2)-cycles.
The second test platform is an Intel Core2 Duo (Conroe) workstation. The CPU runs at 2.4 GHz; both cores have an L1 data cache of 16 kB, share 4 MB of unified L2 cache, and are connected to DDR2-667 memory. On this platform, the Intel 64 compiler suite (version 9.1) was used. We also
present results for a beta version of the Intel MKL [8] (version 9.0.06 beta for the 64-bit environment), which provides an FFTW-compatible interface for FFTs through wrapper functions, but no DCT functions at all.
Although an instruction scheduling better suited to this CPU type is used, the multigrid variants are slower than the FFTs of FFTW at small problem sizes on this platform (see Table IV), and the FMG with V(2, 2)-cycles is slower than the FFTs of both FFTW and the MKL at all problem sizes. Again, the DCTs take much more time than the code-optimized multigrid at all problem sizes tested.
Variational approaches in image processing are often considered too slow for real-time applications, especially in 3D. Nevertheless, they are attractive due to their flexibility and the quality of their results; see, e.g., [1, 26–31]. In the following, we introduce two very simple variational prototype problems. Most of the more complicated image-processing tasks are extensions of these approaches, e.g., by introducing local anisotropy in the PDEs. We restrict ourselves to these simple approaches because they can be solved both by FFT-based methods and by multigrid, which makes them good benchmark problems for testing the best possible speed of variational image-processing methods.
with $x \in \mathbb{R}^d$ and $\alpha \in \mathbb{R}^+$ over the image domain $\Omega \subset \mathbb{R}^d$. A minimizer $u : \Omega \to \mathbb{R}$, the denoised image, necessarily satisfies the Euler–Lagrange equation

$$u - u_0 - \alpha\Delta u = 0 \tag{5}$$

with homogeneous Neumann boundary conditions. This is equivalent to (3) with $f_h = 0$ and $\tau = \alpha$. In an infinite domain, an explicit solution of the corresponding heat equation, i.e., (1) with $f = 0$, is given by

$$u(x,t) = \int_{\mathbb{R}^d} G_{\sqrt{2t}}(x-y)\, u_0(y)\, dy = \bigl(G_{\sqrt{2t}} * u_0\bigr)(x) \tag{6}$$

where the operator $*$ denotes the convolution of the grid function $u_0$ and the Gaussian kernel

$$G_\sigma(x) = \frac{1}{2\pi\sigma^2}\, e^{-|x|^2/(2\sigma^2)} \tag{7}$$

with standard deviation $\sigma \in \mathbb{R}^+$. This is equivalent to applying a low-pass filter and can be transformed into Fourier space, where a convolution corresponds to a multiplication of the
it follows that

$$\mathcal{F}[G_\sigma * u_0](w) = e^{-|w|^2/(2/\sigma^2)}\, \mathcal{F}[u_0](w) \tag{8}$$
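The relation between (5) and (6) can be made explicit in Fourier space. Equation (5) is one implicit Euler step of size α for the heat equation ∂u/∂t = Δu, whereas (6) is the exact heat flow at time t. Using the usual convention in which Δ has the Fourier symbol −|w|², and setting t = α, i.e., σ = √(2α) in (8), a short worked comparison gives

```latex
\widehat{(I-\alpha\Delta)^{-1}}(w) = \frac{1}{1+\alpha|w|^2}
  = 1 - \alpha|w|^2 + O\!\left(\alpha^2|w|^4\right),
\qquad
e^{-|w|^2/(2/\sigma^2)}\Big|_{\sigma=\sqrt{2\alpha}} = e^{-\alpha|w|^2}
  = 1 - \alpha|w|^2 + O\!\left(\alpha^2|w|^4\right).
```

Both are low-pass filters that agree to first order at low frequencies; the implicit step damps high frequencies only algebraically, the Gaussian exponentially.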
Figure 2. Rendered 3D MRI image with added Gaussian noise (σ = 10) (left) and after denoising (right) using a V(1, 1)-cycle of the cell-centered multigrid method.
$$T, R : \Omega \subset \mathbb{R}^d \to \mathbb{R} \tag{9}$$
where T and R are the template image and the reference image, respectively, and Ω is the image domain. The task of non-rigid image registration is to find a transformation $\varphi_u(x)$ such that the deformed image $T(\varphi_u(x))$ can be matched to the image $R(x)$. The transformation is defined as
that consists of two parts. The first term, $(T(x-u(x)) - R(x))^2$, is a distance measure that evaluates the similarity of the two images. Here, we restrict ourselves to the sum of squared differences (SSD), as represented in the integral in (10). When discretized, this results in a point-wise 'least-squares' difference of gray values. The second term, the regularizer, controls the smoothness or regularity of the transformation. Many different regularizers have been discussed in the literature [29]. We restrict ourselves here to the so-called diffusion regularizer $\sum_{l=1}^{d} \|\nabla u_l\|^2$ [35]. By choosing different parameters $\alpha \in \mathbb{R}^+$, one can control the relative weight of the two terms in the functional [40, 41].
The optimization of the energy functional results in nonlinear Euler–Lagrange equations with homogeneous Neumann boundary conditions, which can be discretized by finite differences on a regular grid $\Omega_h$ with mesh size h. To treat the nonlinearity, an artificial time is often
introduced
which is discretized by a semi-implicit scheme with a discrete time step τ, where the nonlinear term is evaluated at the old time level:

$$\frac{u_h^{k+1} - u_h^k}{\tau} - \alpha\Delta_h u_h^{k+1} = \nabla_h T(x - u_h^k)\,\bigl(T(x - u_h^k) - R(x)\bigr) \tag{13}$$
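To show the structure of one time step of (13), here is a 1D sketch under assumptions that are ours: smooth synthetic "images" (Gaussian bumps), a central-difference approximation of ∇T, and a tridiagonal (Thomas) solver in place of multigrid for the linear system (I − ταΔ_h)u^{k+1} = u^k + τ·force. All names and parameter values are hypothetical. In 3D, the same step decouples into d scalar systems of type (3).

```python
import math


def solve_tridiagonal(sub, diag, sup, rhs):
    """Thomas algorithm; sub[0] and sup[-1] are ignored."""
    n = len(rhs)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = sup[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - sub[i] * cp[i - 1]
        cp[i] = sup[i] / m if i < n - 1 else 0.0
        dp[i] = (rhs[i] - sub[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def registration_step(u, xs, template, reference, tau, alpha, h, eps=1e-6):
    """One semi-implicit step: (I - tau*alpha*Delta_h) u_new = u + tau * force(u_old)."""
    n = len(u)
    rhs = []
    for j in range(n):
        s = xs[j] - u[j]
        grad_t = (template(s + eps) - template(s - eps)) / (2.0 * eps)  # central difference
        rhs.append(u[j] + tau * grad_t * (template(s) - reference(xs[j])))
    c = tau * alpha / (h * h)
    diag = [1.0 + 2.0 * c] * n
    diag[0] = diag[-1] = 1.0 + c          # cell-centered homogeneous Neumann ends
    off = [-c] * n
    return solve_tridiagonal(off, diag, off, rhs)


def ssd(u, xs, template, reference, h):
    return h * sum((template(x - uj) - reference(x)) ** 2 for x, uj in zip(xs, u))


def bump(center, width=0.1):
    return lambda x: math.exp(-((x - center) / width) ** 2)


n, h = 64, 1.0 / 64
xs = [(j + 0.5) * h for j in range(n)]
template, reference = bump(0.45), bump(0.5)   # hypothetical 1D "images"
u = [0.0] * n
start = ssd(u, xs, template, reference, h)
for _ in range(50):
    u = registration_step(u, xs, template, reference, tau=1e-3, alpha=1e-4, h=h)
print(start, ssd(u, xs, template, reference, h))  # the SSD decreases
```

Only the right-hand side depends nonlinearly on u; the matrix stays the same from step to step, which is why a fast solver for (3) carries over directly to registration.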
The complete image registration scheme can be found in Algorithm 1. Note that in each time step, line 6 of Algorithm 1 requires the solution of d decoupled scalar linear heat equations of type (3). This can be accomplished by the same multigrid algorithms as for the image denoising in the last section. To minimize the number of time steps, we use a technique described in [42] to adapt the parameters α and τ. The idea is to start with large α and τ (we use α = 1000, τ = 10), penalizing higher oscillations in the solution and preferring global transformations, and then to decrease the parameters by factors of 0.1 and 0.5 when the improvement of the SSD stagnates. Note that for small α the transformations are localized and sensitive to discontinuities or noise in the images. The development of the relative SSD error for an image registration example is shown in Figure 3. As the initial deformation for the first time step, we take an interpolated solution of the image registration from the next coarser grid, which explains why the initial relative SSD error is below 1.0. The bends in the curve arise when α and τ are adapted.

[Figure 3: relative SSD error (vertical axis, 0 to 0.8) versus time step (horizontal axis, 10 to 80).]

Figure 4. Slice of reference image (upper left), template image (upper right), distance image T_k − R (lower left) and registered image (lower right).

Figure 4 shows slices of
the corresponding medical data sets and the registration result. For medical applications, it is not always useful to drive the registration problem to a very small SSD; it is often more important to maintain the topology of the medical data. Table VI summarizes the runtimes of different methods for solving (13). A whole time step in the registration algorithm, including three linear solves and the computation of the new right-hand side and the SSD error, takes 1.4 s. Starting with an FMG-V(2, 1) for the first iterations, it is sufficient to perform an FMG-V(1, 1) once the time steps become smaller, without losing any accuracy in the solution. The DCT-based implementation is described, e.g., in [29]. Here, about 65% of the time was spent computing the forward and backward transforms, and the rest on the non-optimized multiplication by the inverse eigenvalues. Note that in practice, an additive operator splitting (AOS) scheme is sometimes also used to solve the registration problem [29, 43]. It is fast, but the time step has to be chosen sufficiently small [29].

Table VI. Runtime for one linear solve in one time step of the image registration algorithm for an image of size 256×256×160.
Method      | Runtime (ms)
FMG-V(2, 2) | 608
FMG-V(2, 1) | 499
FMG-V(1, 1) | 390
DCT         | 2107
AOS         | 1971
A fast cell-centered full multigrid implementation for variational image-processing problems is shown to be highly competitive in terms of computing time with alternative techniques such as FFT-based algorithms. However, this requires careful machine-specific code optimization. As a next step, this work has to be extended to an arbitrary number of grid points in each direction and to anisotropic or nonlinear diffusion models. Furthermore, we are considering the parallelization of the optimized multigrid solver.
ACKNOWLEDGEMENTS
This research is being supported in part by the Deutsche Forschungsgemeinschaft (German Science
Foundation), projects Ru 422/7-1, 2, 3 and the Bavarian KONWIHR supercomputing research consortium
[44, 45].
REFERENCES
1. Jain AK. Fundamentals of Digital Image Processing. Prentice-Hall: Englewood Cliffs, NJ, U.S.A., 1989.
2. Oppenheim A, Schafer R. Discrete-time Signal Processing. Prentice-Hall: Englewood Cliffs, NJ, U.S.A., 1989.
3. Cooley J, Tukey J. An algorithm for the machine calculation of complex Fourier series. Mathematics of
Computation 1965; 19:297–301.
4. Duhamel P, Vetterli M. Fast Fourier transforms: a tutorial review and a state of the art. Signal Processing 1990;
19:259–299.
5. Rader CM. Discrete Fourier transforms when the number of data samples is prime. Proceedings of the IEEE
1968; 56:1107–1108.
6. Pennebaker W, Mitchell J. JPEG: Still Image Data Compression Standard. Van Nostrand Reinhold: New York,
1993.
7. Frigo M, Johnson S. FFTW: an adaptive software architecture for the FFT. Proceedings of the International
Conference on Acoustics, Speech, and Signal Processing, Seattle, WA, U.S.A., vol. 3, 1998; 1381–1384.
8. MKL. http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/.
9. Kowarschik M, Weiß C, Rüde U. DiMEPACK—a cache-optimized multigrid library. In Proceedings of the
International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001),
vol. I, Las Vegas, NV, U.S.A., Arabnia HR (ed.). CSREA Press: Irvine, CA, U.S.A., 2001; 425–430.
10. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977;
31(138):333–390.
11. Hackbusch W. Multi-grid Methods and Applications. Springer: Berlin, Heidelberg, New York, 1985.
12. Briggs W, Henson V, McCormick S. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, U.S.A., 2000.
13. Trottenberg U, Oosterlee C, Schüller A. Multigrid. Academic Press: San Diego, CA, U.S.A., 2001.
14. Wienands R, Joppich W. Practical Fourier analysis for multigrid methods. Numerical Insights, vol. 5. Chapman
& Hall/CRC Press: Boca Raton, FL, U.S.A., 2005.
15. Wesseling P. Multigrid Methods. Edwards: Philadelphia, PA, U.S.A., 2004.
16. Mohr M, Wienands R. Cell-centred multigrid revisited. Computing and Visualization in Science 2004; 7(3):
129–140.
17. Yavneh I. On red–black SOR smoothing in multigrid. SIAM Journal on Scientific Computing 1996; 17(1):180–192.
18. Barkai D, Brandt A. Vectorized multigrid Poisson solver for the CDC CYBER 205. Applied Mathematics and
Computation 1983; 13(3–4):215–228. (Special Issue, Proceedings of the First Copper Mountain Conference on
Multigrid Methods, Copper Mountain, CO, McCormick S, Trottenberg U (eds).)
19. Kowarschik M, Rüde U, Thürey N, Weiß C. Performance optimization of 3D multigrid on hierarchical memory
architectures. Proceedings of the 6th International Conference on Applied Parallel Computing (PARA 2002),
Lecture Notes in Computer Science, vol. 2367. Springer: Berlin, Heidelberg, New York, 2002; 307–316.
20. Kowarschik M. Data Locality Optimizations for Iterative Numerical Algorithms and Cellular Automata on
Hierarchical Memory Architectures. Advances in Simulation, vol. 13. SCS Publishing House: Erlangen, Germany,
2004.
21. Bergen B, Gradl T, Hülsemann F, Rüde U. A massively parallel multigrid method for finite elements. Computing
in Science and Engineering 2006; 8(6):56–62.
22. Douglas C, Hu J, Kowarschik M, Rüde U, Weiß C. Cache optimization for structured and unstructured grid
multigrid. Electronic Transactions on Numerical Analysis (ETNA) 2000; 10:21–40.
23. Weiß C. Data locality optimizations for multigrid methods on structured grids. Ph.D. Thesis, Lehrstuhl für
Rechnertechnik und Rechnerorganisation, Institut für Informatik, Technische Universität München, Germany,
2001.
24. Stürmer M. Optimierung von Mehrgitteralgorithmen auf der IA-64 Rechnerarchitektur. Lehrstuhl für Informatik 10
(Systemsimulation), Institut für Informatik, University of Erlangen-Nuremberg, Germany, May 2006. Diplomarbeit.
25. FFTW. http://www.fftw.org.
26. Horn B. Robot Vision. MIT Press: Cambridge, MA, U.S.A., 1986.
27. Lehmann T, Oberschelp W, Pelikan E, Repges R. Bildverarbeitung für die Medizin. Springer: Berlin, Heidelberg,
New York, 1997.
28. Jähne B. Digitale Bildverarbeitung (6th edn). Springer: Berlin, Heidelberg, New York, 2006.
29. Modersitzki J. Numerical Methods for Image Registration. Oxford University Press: Oxford, 2004.
30. Morel J, Solimini S. Variational Methods in Image Segmentation. Progress in Nonlinear Differential Equations
and their Applications, vol. 14. Birkhäuser: Boston, 1995.
31. Weickert J. Anisotropic Diffusion in Image Processing. Teubner Verlag: Stuttgart, Germany, 1998.
32. Tikhonov AN, Arsenin VY. Solution of Ill-posed Problems. Winston and Sons: New York, NY, U.S.A., 1977.
33. Hermosillo G. Variational methods for multi-model image matching. Ph.D. Thesis, Université de Nice, France,
2002.
34. Viola P, Wells W. Alignment by maximization of mutual information. International Journal of Computer Vision
1997; 24(2):137–154.
35. Fischer B, Modersitzki J. Fast diffusion registration. AMS Contemporary Mathematics, Inverse Problems, Image
Analysis, and Medical Imaging 2002; 313:117–129.
36. Haber E, Modersitzki J. A multilevel method for image registration. SIAM Journal on Scientific Computing 2006;
27(5):1594–1607.
37. Clarenz U, Droske M, Henn S, Rumpf M, Witsch K. Computational methods for nonlinear image registration.
Technical Report, Mathematical Institute, Gerhard-Mercator University Duisburg, Germany, 2006.
38. Fischer B, Modersitzki J. Curvature based image registration. Journal of Mathematical Imaging and Vision 2003;
18(1):81–85.
39. Henn S. A multigrid method for a fourth-order diffusion equation with application to image processing. SIAM
Journal on Scientific Computing 2005; 27(3):831–849.
40. Jäger F, Han J, Hornegger J, Kuwert T. A variational approach to spatially dependent non-rigid registration. In
Proceedings of SPIE, vol. 6144, Reinhardt J, Pluim J (eds). SPIE: Bellingham, U.S.A., 2006; 860–869.
41. Kabus S, Franz A, Fischer B. On elastic image registration with varying material parameters. In Proceedings
of Bildverarbeitung für die Medizin (BVM), Maintzer H-P, Handels H, Horsch A, Tolxdorff T (eds). Springer:
Berlin, Heidelberg, New York, 2005; 330–334.
42. Henn S, Witsch K. Image registration based on multiscale energy information. Multiscale Modeling and Simulation
2005; 4(2):584–609.
43. Weickert J, ter Haar Romeny B, Viergever M. Efficient and reliable schemes for nonlinear diffusion filtering.
IEEE Transactions on Image Processing 1998; 7(3):398–410.
44. Hülsemann F, Meinlschmidt S, Bergen B, Greiner G, Rüde U. Gridlib—a parallel, object-oriented framework
for hierarchical-hybrid grid structures in technical simulation and scientific visualization. In High Performance
Computing in Science and Engineering, KONWIHR Results Workshop, Garching, Bode A, Durst F (eds). Springer:
Berlin, Heidelberg, New York, 2005; 117–128.
45. Freundl C, Bergen B, Hülsemann F, Rüde U. ParEXPDE: expression templates and advanced PDE software
design on the Hitachi SR8000. In High Performance Computing in Science and Engineering, KONWIHR Results
Workshop, Garching, Bode A, Durst F (eds). Springer: Berlin, Heidelberg, New York, 2005; 167–179.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:201–218
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.576
SUMMARY
Optical flow techniques are used to compute an approximate motion field in an image sequence. We
apply a variational approach for the optical flow using a simple data term but introducing a combined
diffusion- and curvature-based regularizer. The same data term arises in image registration problems where
a deformation field between two images is computed. For optical flow problems, usually a diffusion-
based regularizer should dominate, whereas for image registration a curvature-based regularizer is more
appropriate. The combined regularizer enables us to handle optical flow and image registration problems
with the same solver and it improves the results of each of the two regularizers used on their own. We
develop a geometric multigrid method for the solution of the resulting fourth-order systems of partial
differential equations associated with the variational approach for optical flow and image registration
problems. The adequacy of using (collective) pointwise smoothers within the multigrid algorithm is
demonstrated with the help of local Fourier analysis. Galerkin-based coarse grid operators are applied for
an efficient treatment of jumping coefficients. We show some multigrid convergence rates, timings and
investigate the visual quality of the approximated motion or deformation field for synthetic and real-world
images. Copyright © 2008 John Wiley & Sons, Ltd.
KEY WORDS: multigrid; optical flow; image registration; variational approaches in computer vision
1. INTRODUCTION
Optical flow is commonly defined as the motion of brightness patterns in a sequence of images. It was introduced by Horn and Schunck [1], who proposed a differential method to compute the optical flow from pairs of images using a brightness constancy assumption and an additional smoothness constraint on the magnitude of the gradient of the velocity field in order to regularize the problem, which we refer to as diffusion-based regularization. Since then, optical flow has been studied intensively and many extensions of that simple variational approach, e.g. considering different regularizing terms, have been investigated [2–9].
∗ Correspondence to: R. Wienands, Mathematical Institute, University of Cologne, Weyertal 86-90, 50931 Cologne, Germany.
† E-mail: wienands@math.uni-koeln.de
Optical flow applications range from robotics to video compression and particle image velocimetry (PIV), where optical flow provides the approximate motion of fluid flows. For PIV in particular, physically more meaningful regularizers are needed to impose, e.g., an incompressibility condition on the velocity field. Suter [10] therefore introduced a smoothness constraint on the divergence and curl of the velocity field, which has since been used extensively [11–14]. A well-known regularizer in image registration, a problem related to optical flow [15], is the curvature-based regularizer [16], which is a special case of a second-order div–curl-based regularizer [10]. The curvature-based regularizer leaves affine motion unpenalized, while higher-order motion is still penalized to enforce smoothness. Another advantage of a higher-order regularizer is that, in some applications, additional information from features or landmarks is available for the optical flow computation [17]. In this case, a higher-order regularizer is required to avoid singularities in the solution [18, 19].
We present a variational approach for optical flow with a combined diffusion- and curvature-based regularizer in Section 2. Note that the accuracy of optical flow models is usually dominated by the data term. Our main focus is on the impact of the regularization, so we use a rather simple data term that also arises in image registration, which allows us to treat both applications with the same solver. As a consequence, we cannot expect to reach the accuracy obtained, for example, in [20], where very accurate optical flow models based on an advanced data term are presented.
Besides the accuracy of the approximate motion field, an important goal in many applications is real-time or close to real-time performance, which makes an efficient numerical solution of the underlying system of partial differential equations (PDEs) mandatory. The first attempts to use multilevel techniques to speed up optical flow computations are due to Glazer [21] and Terzopoulos [22]. Since then, several multigrid-based solvers have been proposed for different optical flow regularizers (see, e.g. [23–27]). Efficient cell-centered (nonlinear) multigrid solvers for various optical flow models with diffusion-based regularizers are discussed in [28, 29], and multigrid methods for image registration are presented, e.g., in [30–32]. In Section 3, we develop a geometric multigrid method that solves the fourth-order system of PDEs derived from our variational approach efficiently. In particular, the existence and efficiency of pointwise smoothing methods are investigated in some detail. Rather than applying the classical multigrid theory based on the smoothing and approximation properties [33], as is done in [34] for a similar application, we use local Fourier analysis techniques [35–37].
Section 4 presents optical flow and image registration results obtained with the combined diffusion and curvature regularizer for both synthetic and real-world images. We end this paper with an outlook on future developments, e.g. the extension to isotropic or anisotropic versions of the combined regularizer to deal with discontinuities in the velocity field.
Copyright © 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:201–218
DOI: 10.1002/nla
MULTIGRID SOLUTION OF THE OPTICAL FLOW SYSTEM 203
object in the image does not change its gray value, which means, for example, that changes of illumination are neglected. For an image sequence $I : \Omega \times T \to \mathbb{R}$, $\Omega \subset \mathbb{R}^2$, describing the gray value intensities for each point $\mathbf{x} = (x, y)$ in the regular image domain $\Omega$ at time $t \in T = [0, t_{\max}]$, $t_{\max} \in \mathbb{N}$, this so-called brightness constancy assumption reads

$$\frac{dI}{dt} = 0 \tag{1}$$

This yields the following identity for the movement of a gray value at $(x, y, t)$:

$$I(\mathbf{x}, t) = I(x + dx, y + dy, t + dt) \tag{2}$$

A Taylor expansion of $I(x + dx, y + dy, t + dt)$ around $(x, y, t)$ that neglects higher-order terms, together with (2), gives

$$I_x u + I_y v + I_t \approx 0$$

with the partial image derivatives $\partial I/\partial x = I_x$, $\partial I/\partial y = I_y$, $\partial I/\partial t = I_t$ and the optical flow velocity vector $\mathbf{u} = (u, v)^T$, $u := dx/dt$, $v := dy/dt$. Note that, in general, $I$ is not differentiable for real-world images. However, these images are usually preprocessed by several steps of a Gaussian filter [2], which ensures that the function $I$ is sufficiently smooth.
The brightness constancy assumption (1) is used throughout this paper, but by itself it results in an ill-posed, under-determined problem. Therefore, additional regularization is required. As a second assumption, Horn and Schunck proposed a smoothness constraint, i.e. a diffusion-based regularizer

$$S_1(\mathbf{u}) = \|\nabla u\|^2 + \|\nabla v\|^2$$

and combined both in an energy functional

$$E_1(\mathbf{u}) := \int_\Omega (I_x u + I_y v + I_t)^2 + \alpha S_1(\mathbf{u}) \, d\mathbf{x} \tag{3}$$
204 H. KÖSTLER, K. RUHNAU AND R. WIENANDS
the complete variational problem based on $E_3(\mathbf{u})$, compare with [15] and the references therein. Considerations concerning well-posedness in a less regular case are covered in [34].
The diffusion-based regularizer only allows small changes between neighboring vectors and produces very smooth motion fields, but it also smooths out edges. The curvature-based regularizer leaves affine motions unpenalized, since they lie in its kernel; smoothness is instead enforced by penalizing higher-order motions. We will show for the problems under consideration that the optical flow (and the deformation field in image registration, see below) based on the combined regularizer can be computed efficiently and that it is more accurate than the solutions produced by each of the two regularizers on their own.
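$$\langle \nabla u, \mathbf{n} \rangle = 0, \quad \langle \nabla v, \mathbf{n} \rangle = 0 \tag{5a}$$
$$\langle \nabla(\Delta u), \mathbf{n} \rangle = 0, \quad \langle \nabla(\Delta v), \mathbf{n} \rangle = 0 \tag{5b}$$
with outward normal $\mathbf{n}$. For $\beta = 0$, we obtain a fourth-order system, whereas for $\beta = 1$ the original Horn and Schunck second-order system results, for which only the two boundary conditions (5a) are required.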
The biharmonic operator $\Delta^2$, which appears in (4a), is known to lead to poor multigrid performance. Therefore, a common approach is to split the biharmonic operator into a system of two Poisson-type equations [36]. Employing this idea, (4a) can be transformed into the following system using the additional unknown functions $w^1 = -\Delta u$ and $w^2 = -\Delta v$:

$$L \begin{pmatrix} u \\ v \\ w^1 \\ w^2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -I_x I_t \\ -I_y I_t \end{pmatrix} \tag{6a}$$

with

$$L = \begin{pmatrix} -\Delta & 0 & -1 & 0 \\ 0 & -\Delta & 0 & -1 \\ I_x^2 & I_x I_y & \alpha(-(1-\beta)\Delta + \beta) & 0 \\ I_x I_y & I_y^2 & 0 & \alpha(-(1-\beta)\Delta + \beta) \end{pmatrix} \tag{6b}$$
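$$\langle \nabla u, \mathbf{n} \rangle = 0, \quad \langle \nabla v, \mathbf{n} \rangle = 0, \quad \langle \nabla w^1, \mathbf{n} \rangle = 0, \quad \langle \nabla w^2, \mathbf{n} \rangle = 0$$
respectively. The principal part of $\det(L)$ is proportional to $\Delta^m$ with $m = 4$ for $\beta \in [0, 1)$ and $m = 2$ for $\beta = 1$, due to $\alpha > 0$. Hence, four boundary conditions are required for $\beta \neq 1$ and two boundary conditions for $\beta = 1$ (see, e.g. [35, 36]). This requirement is met by our choice of boundary conditions, since we use natural homogeneous Neumann boundary conditions on $u$, $v$ and, if $\beta \neq 1$, additionally on $-\Delta u = w^1$, $-\Delta v = w^2$, according to the minimization of the energy functional, see above.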
2.3. Discretization
The continuous system (6a), (6b) of four PDEs is discretized by finite differences using the standard five-point central discretization $\Delta_h$ of the Laplacian (see, e.g. [36]) with $\mathbf{x} \in \Omega_h$ and discrete functions $u_h$, $v_h$, $w_h^1$, $w_h^2$. Here, $\Omega_h$ denotes the discrete image domain, i.e. each $\mathbf{x} \in \Omega_h$ refers to a pixel. The mesh size $h$ is usually set to 1 for optical flow applications. The corresponding homogeneous Neumann boundary conditions for the four unknown functions are discretized by central differences as well. Finally, the image derivatives have to be approximated by sufficiently accurate finite difference schemes; a proper accuracy of these derivatives is often essential for the quality of the image-processing result. The discrete operator $L_h$ is then simply given by (6b), where $\Delta$ is replaced by $\Delta_h$ and $I_x$, $I_y$ by their finite difference approximations $I_x^h$, $I_y^h$.
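To make the discretization concrete, here is a minimal Python sketch (plain lists, no external libraries) that applies $L_h$ to grid functions $u_h, v_h, w_h^1, w_h^2$. It assumes that the homogeneous Neumann conditions are imposed by eliminating ghost points with central differences, i.e. the value just outside the image is mirrored from the first interior neighbor. The function names `mirror`, `laplacian_neumann` and `apply_L` are hypothetical and chosen for illustration.

```python
def mirror(i, n):
    """Ghost-point index for a central-difference homogeneous Neumann condition:
    the value at -1 equals the value at 1, the value at n equals the value at n-2."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * (n - 1) - i
    return i


def laplacian_neumann(a, h=1.0):
    """Five-point Laplacian Delta_h of the grid function a (rows = y, columns = x)."""
    ny, nx = len(a), len(a[0])
    return [[(a[mirror(i - 1, ny)][j] + a[mirror(i + 1, ny)][j]
              + a[i][mirror(j - 1, nx)] + a[i][mirror(j + 1, nx)]
              - 4.0 * a[i][j]) / h**2
             for j in range(nx)] for i in range(ny)]


def apply_L(u, v, w1, w2, Ix, Iy, alpha, beta, h=1.0):
    """Apply the discrete operator L_h of (6b) to (u, v, w1, w2); returns four grids."""
    lu, lv = laplacian_neumann(u, h), laplacian_neumann(v, h)
    lw1, lw2 = laplacian_neumann(w1, h), laplacian_neumann(w2, h)
    rows, cols = range(len(u)), range(len(u[0]))
    r1 = [[-lu[i][j] - w1[i][j] for j in cols] for i in rows]
    r2 = [[-lv[i][j] - w2[i][j] for j in cols] for i in rows]
    r3 = [[Ix[i][j] ** 2 * u[i][j] + Ix[i][j] * Iy[i][j] * v[i][j]
           + alpha * (-(1.0 - beta) * lw1[i][j] + beta * w1[i][j])
           for j in cols] for i in rows]
    r4 = [[Ix[i][j] * Iy[i][j] * u[i][j] + Iy[i][j] ** 2 * v[i][j]
           + alpha * (-(1.0 - beta) * lw2[i][j] + beta * w2[i][j])
           for j in cols] for i in rows]
    return r1, r2, r3, r4
```

With the right-hand side of (6a), i.e. $(0, 0, -I_x^h I_t^h, -I_y^h I_t^h)$ per pixel, the defect needed by a multigrid solver is that vector minus the output of `apply_L`.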
with the same boundary conditions as above. Note that the data term is now nonlinear. To minimize (8), we linearize the whole energy functional and apply an inexact Newton method as described in detail in [30, 32]. Starting with an initial approximation $\mathbf{u}^0$, the $(k+1)$th iterate is computed via
$$\mathbf{u}^{k+1} = \mathbf{u}^k + \tau_k \mathbf{v}$$
where we choose the parameter $\tau_k \in \mathbb{R}^+$ such that the energy decreases in each step, and the correction $\mathbf{v}$ is derived from
$$H_E(\mathbf{u}^k)\,\mathbf{v} = -J_E(\mathbf{u}^k) \tag{9}$$
Here, $J_E := -\nabla T_{\mathbf{u}}\,(R(\mathbf{x}) - T_{\mathbf{u}}) + \alpha(-\beta\Delta\mathbf{u} + (1-\beta)\Delta^2\mathbf{u})$ denotes the Jacobian of (8), and $H_E$ its Hessian, which is approximated by $H_E \approx \nabla T_{\mathbf{u}}\,\nabla T_{\mathbf{u}}^{T} + \alpha(-\beta\Delta + (1-\beta)\Delta^2)$. We drop the term $\nabla^2 T_{\mathbf{u}}\,(R(\mathbf{x}) - T_{\mathbf{u}})$, since the difference $R(\mathbf{x}) - T_{\mathbf{u}}$ should be small for registered images and since second image derivatives are very sensitive to noise and hard to estimate robustly. System (9) is equivalent to the optical flow system (4) with a slightly different right-hand side and can be treated numerically in the same way.
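The structure of (9) can be illustrated on a deliberately tiny scalar toy problem: a Newton-type step whose Hessian omits the second-derivative term, followed by a step length chosen so that the energy decreases. This sketch is only an analogue, not the registration solver. It assumes a one-dimensional "template" $T(u) = \sin u$, a reference value `r` and a quadratic regularizer $\tfrac{\alpha}{2}u^2$, and it halves the step length until the energy decreases (a simple backtracking rule assumed here).

```python
import math


def energy(u, r, alpha):
    """Toy energy 0.5 * (r - T(u))^2 + 0.5 * alpha * u^2 with T(u) = sin(u)."""
    return 0.5 * (r - math.sin(u)) ** 2 + 0.5 * alpha * u * u


def gradient(u, r, alpha):
    """Exact derivative of the toy energy (the analogue of J_E)."""
    return -math.cos(u) * (r - math.sin(u)) + alpha * u


def approx_hessian(u, alpha):
    """Analogue of H_E: the term -T''(u) * (r - T(u)) = sin(u) * (r - sin(u))
    is dropped, keeping only T'(u)^2 plus the regularizer."""
    return math.cos(u) ** 2 + alpha


def damped_newton(u0, r, alpha, max_iter=50, tol=1e-12):
    u = u0
    history = [energy(u, r, alpha)]
    for _ in range(max_iter):
        g = gradient(u, r, alpha)
        if abs(g) < tol:
            break
        v = -g / approx_hessian(u, alpha)   # solve H v = -J (a scalar here)
        tau = 1.0
        while tau > 1e-8:
            e_new = energy(u + tau * v, r, alpha)
            if e_new < history[-1]:
                break                        # accept: the energy decreased
            tau *= 0.5                       # otherwise halve the step length
        else:
            break                            # no decrease detectable in round-off
        u += tau * v
        history.append(e_new)
    return u, history
```

Because the dropped term is not used, convergence is only linear near the minimizer; the step-length control guarantees that the energy decreases monotonically, as required in the text.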
3. MULTIGRID SOLVER
In recent applications, a real-time solution of the optical flow system has become increasingly important. Hence, a suitable multigrid solver is an obvious choice for the numerical solution of the resulting linear system, since multigrid methods are known to be among the fastest solvers for discretized elliptic PDEs.
Multigrid methods (see, e.g. [33, 35, 36, 38, 39]) are mainly motivated by two basic principles.
1. Smoothing principle: Many iterative methods have a strong error smoothing effect if they
are applied to discrete elliptic problems.
2. Coarse grid correction principle: A smooth error term can be well represented on a coarser
grid where its approximation is substantially less expensive.
These two principles suggest the following structure of a two-grid cycle: perform $\nu_1$ steps of an iterative relaxation method $S_h$ on the fine grid (pre-smoothing), compute the defect of the current fine grid approximation, restrict the defect to the coarse grid, solve the coarse grid defect equation, interpolate the obtained error correction to the fine grid, add the interpolated correction to the current fine grid approximation (coarse grid correction), and perform $\nu_2$ steps of an iterative relaxation method on the fine grid (post-smoothing). Instead of solving the coarse grid equation exactly, one can solve it approximately by a recursive application of the two-grid iteration, which yields a multigrid method. We assume standard coarsening here, i.e. the sequence of coarse grids is obtained by repeatedly doubling the mesh size in each space direction ($h \to 2h$).
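The cycle described above can be sketched in a few lines of plain Python. For brevity, the sketch uses the scalar model problem (the five-point discretization of $-\Delta u = f$ on the unit square with homogeneous Dirichlet boundary values) instead of the 4×4 optical flow system. It assumes red-black Gauss–Seidel smoothing, full-weighting restriction and bilinear interpolation, which are common textbook defaults rather than necessarily the exact components used here. Grids have $2^k + 1$ points per direction, and the coarsest grid, with one interior unknown, is solved exactly.

```python
import math
import random


def zeros(n):
    return [[0.0] * n for _ in range(n)]


def residual(u, f, h):
    """Defect r = f - A_h u for the five-point discretization of -Laplace."""
    n = len(u)
    r = zeros(n)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            au = (4 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                  - u[i][j - 1] - u[i][j + 1]) / h**2
            r[i][j] = f[i][j] - au
    return r


def smooth_gs_rb(u, f, h, sweeps):
    """Red-black Gauss-Seidel: points with i + j even first, then odd."""
    n = len(u)
    for _ in range(sweeps):
        for color in (0, 1):
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == color:
                        u[i][j] = (u[i - 1][j] + u[i + 1][j] + u[i][j - 1]
                                   + u[i][j + 1] + h * h * f[i][j]) / 4


def restrict_fw(r):
    """Full weighting of the defect onto the grid with mesh size 2h."""
    nc = (len(r) - 1) // 2 + 1
    rc = zeros(nc)
    for I in range(1, nc - 1):
        for J in range(1, nc - 1):
            i, j = 2 * I, 2 * J
            rc[I][J] = (4 * r[i][j]
                        + 2 * (r[i - 1][j] + r[i + 1][j] + r[i][j - 1] + r[i][j + 1])
                        + r[i - 1][j - 1] + r[i - 1][j + 1]
                        + r[i + 1][j - 1] + r[i + 1][j + 1]) / 16
    return rc


def prolong_bilinear(ec, n):
    """Bilinear interpolation of a coarse grid correction to n x n points."""
    e = zeros(n)
    for I, row in enumerate(ec):
        for J, val in enumerate(row):
            e[2 * I][2 * J] = val
    for i in range(0, n, 2):
        for j in range(1, n, 2):
            e[i][j] = 0.5 * (e[i][j - 1] + e[i][j + 1])
    for i in range(1, n, 2):
        for j in range(n):
            e[i][j] = 0.5 * (e[i - 1][j] + e[i + 1][j])
    return e


def v_cycle(u, f, h, nu1=2, nu2=2):
    n = len(u)
    if n == 3:  # coarsest grid: a single interior unknown, solved exactly
        u[1][1] = (u[0][1] + u[2][1] + u[1][0] + u[1][2] + h * h * f[1][1]) / 4
        return
    smooth_gs_rb(u, f, h, nu1)                  # pre-smoothing
    rc = restrict_fw(residual(u, f, h))         # restrict the defect
    ec = zeros(len(rc))
    v_cycle(ec, rc, 2 * h, nu1, nu2)            # recursive coarse grid solve
    e = prolong_bilinear(ec, n)                 # interpolate the correction
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            u[i][j] += e[i][j]                  # coarse grid correction
    smooth_gs_rb(u, f, h, nu2)                  # post-smoothing


def convergence_factors(n=33, cycles=6, seed=0):
    """Residual reduction per V(2,2) cycle, starting from a random error (f = 0)."""
    rng = random.Random(seed)
    h = 1.0 / (n - 1)
    u, f = zeros(n), zeros(n)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            u[i][j] = rng.uniform(-1.0, 1.0)
    norm = lambda g: math.sqrt(sum(x * x for row in g for x in row))
    res = [norm(residual(u, f, h))]
    for _ in range(cycles):
        v_cycle(u, f, h)
        res.append(norm(residual(u, f, h)))
    return [b / a for a, b in zip(res, res[1:])]
```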
The crucial point for any multigrid method is to identify the ‘correct’ multigrid components (i.e.
relaxation method, restriction, interpolation, etc.) yielding an efficient interplay between relaxation
and coarse grid correction. A useful tool for a proper selection is local Fourier analysis.
components—which form a unitary basis of the space of bounded infinite grid functions, the
Fourier space. Regarding our optical flow system composed of four discrete equations, a proper
unitary basis of vector-valued Fourier components is given by
Then, the main idea of local Fourier analysis is to analyze different multigrid components, or even complete two-grid cycles, by evaluating their effect on the Fourier components. In particular, the analysis of the smoothing method is based on a distinction between 'high' and 'low' Fourier frequencies, governed by the coarsening strategy under consideration. If standard coarsening is selected, each 'low frequency'
in the transition from $G_h$ to $G_{2h}$. That is, the related three high-frequency components are not visible on the coarse grid $G_{2h}$, as they coincide with the coupled low-frequency component:
$$E_h(L_h(\mathbf{n})) := \frac{\min\{|\det(\tilde{L}_h(\mathbf{n}, \boldsymbol{\theta}))| : \boldsymbol{\theta} \in \Theta_{\mathrm{high}}\}}{\max\{|\det(\tilde{L}_h(\mathbf{n}, \boldsymbol{\theta}))| : \boldsymbol{\theta} \in \Theta\}}$$
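is the Fourier symbol (for details concerning Fourier symbols for systems of equations, etc. we refer to [35–37]) of $L_h(\mathbf{n})$, i.e.
$$L_h(\mathbf{n})\,\mathbf{u}_h(\boldsymbol{\theta}, \mathbf{x}) = \tilde{L}_h(\mathbf{n}, \boldsymbol{\theta})\,\mathbf{u}_h(\boldsymbol{\theta}, \mathbf{x})$$
The Fourier symbol $\tilde{L}_h(\mathbf{n}, \boldsymbol{\theta})$ of the system of PDEs is composed of the Fourier symbol of the Laplacian and several constants. The Fourier symbol of the Laplacian reads (compare with [35–37])
$$-\tilde{\Delta}_h(\boldsymbol{\theta}) = \frac{4}{h^2}\left(\sin^2(\theta_1/2) + \sin^2(\theta_2/2)\right), \quad \boldsymbol{\theta} \in \Theta$$
Now, $\det(\tilde{L}_h(\mathbf{n}, \boldsymbol{\theta}))$ is simply given by (7), where $-\Delta_h$ is replaced by $-\tilde{\Delta}_h(\boldsymbol{\theta})$ and the image derivatives by the related frozen constants. For the derivation of $E_h(L_h(\mathbf{n}))$, it is important to note that $-\tilde{\Delta}_h(\boldsymbol{\theta}) \ge 0$. Moreover, for the four coefficients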
As a consequence, the measure of h-ellipticity for the discrete operator $L_h(\mathbf{n})$ turns out to be
$$E_h(L_h(\mathbf{n})) = \frac{4\alpha + I_c h^4}{1024\alpha + 16 I_c h^4} \quad \text{and} \quad E_h(L_h(\mathbf{n})) = \frac{2\alpha + I_c h^2}{32\alpha + 4 I_c h^2}$$
for $\beta = 0$ and $\beta = 1$, respectively. Note that $E_h(L_h(\mathbf{n})) > 0$ for all possible choices of $\alpha, h > 0$, $\beta \in [0, 1]$, $I_c \ge 0$. In particular, this means that $E_h(L_h(\mathbf{n})) > 0$ for all possible values of $I_x^h(\mathbf{n})$, $I_y^h(\mathbf{n})$ over the whole discrete image domain, i.e. for arbitrary $\mathbf{n} \in \Omega_h$. This is a strong and very satisfactory robustness result
for such a complicated system involving several parameters. Even in the limit of small mesh size $h \to 0$, the measure of h-ellipticity is bounded away from zero, since
$$\lim_{h \to 0} E_h(L_h(\mathbf{n})) = \begin{cases} 1/16 & \text{for } \beta = 1 \\ 1/256 & \text{for } \beta \neq 1 \end{cases}$$
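The closed-form values above can be checked numerically. The sketch below builds the frozen-coefficient symbol $\tilde{L}_h(\mathbf{n}, \boldsymbol{\theta})$ of (6b) at a single pixel, with the symbol of $-\Delta_h$ on the diagonal, evaluates its determinant by Gaussian elimination, and divides the minimum over high frequencies by the maximum over all frequencies on a sampled grid. It assumes the usual frequency domain $\Theta = [-\pi/2, 3\pi/2)^2$ with low frequencies $[-\pi/2, \pi/2)^2$, and that $I_c$ stands for $(I_x^h)^2 + (I_y^h)^2$ at the pixel (for this system the determinant depends on $I_x^h$ and $I_y^h$ only through that sum); both are assumptions of the sketch.

```python
import math


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if a[p][c] == 0.0:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def symbol(t1, t2, h, alpha, beta, ix, iy):
    """Frozen-coefficient Fourier symbol of L_h from (6b) at frequency (t1, t2)."""
    lam = 4.0 / h**2 * (math.sin(t1 / 2) ** 2 + math.sin(t2 / 2) ** 2)  # -Delta_h
    g = alpha * ((1.0 - beta) * lam + beta)
    return [[lam, 0.0, -1.0, 0.0],
            [0.0, lam, 0.0, -1.0],
            [ix * ix, ix * iy, g, 0.0],
            [ix * iy, iy * iy, 0.0, g]]


def h_ellipticity(h, alpha, beta, ix, iy, n=64):
    """Sampled E_h: min |det| over high frequencies / max |det| over Theta."""
    half = math.pi / 2
    thetas = [-half + 2 * math.pi * k / n for k in range(n)]
    min_high, max_all = math.inf, 0.0
    for t1 in thetas:
        for t2 in thetas:
            d = abs(det(symbol(t1, t2, h, alpha, beta, ix, iy)))
            max_all = max(max_all, d)
            if not (-half <= t1 < half and -half <= t2 < half):
                min_high = min(min_high, d)
    return min_high / max_all
```

Since the determinant grows monotonically with $-\tilde{\Delta}_h(\boldsymbol{\theta})$, the extrema sit at $\boldsymbol{\theta} = (\pi/2, 0)$ and $(\pi, \pi)$. Both points lie on the sample grid, so the sampled ratio reproduces the closed forms up to rounding.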
[Figure: residual norm ||Residuum|| (logarithmic axis, 1e-06 to 10000) for collective Jacobi, GS-RB and GS-LEX relaxation]
by local Fourier analysis. $\mu_{\mathrm{loc}}$ is defined as the worst asymptotic error reduction by one relaxation step over all high-frequency error components. For more details on local Fourier smoothing analysis, we refer to the literature [35–37].
In the case of smoothly varying coefficients, the smoothing factor for $L_h(\mathbf{x})$ can be bounded by the maximum of the smoothing factors of the locally frozen operators, i.e.
As a popular test case, we consider frame 8 of the Yosemite sequence shown in Figure 4. Table I presents the corresponding smoothing factors calculated via (10) for GS-LEX and GS-RB with varying $\beta$. $\alpha$ is fixed at 1500, which turned out to be a proper choice with respect to the average angular error (AAE) (11) in many situations, see below. The parameter $\beta$ has hardly any influence on the resulting smoothing factor: we always observe nearly the same smoothing factors as are well known for the Poisson equation (i.e. $\mu = 0.5$ for GS-LEX and $\mu = 0.25$ for GS-RB). Systematic tests show that the same holds for the parameter $\alpha$. As a consequence, we can expect to obtain the typical multigrid efficiency as long as the coarse grid correction works properly, compare with Section 3.4. The situation is considerably more complicated if decoupled relaxations are applied (compare with [36]); these will be discussed elsewhere.
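The Poisson reference values quoted above can be reproduced with a few lines of local Fourier smoothing analysis. This sketch covers only the scalar five-point Laplacian, not the collective 4×4 relaxation of the optical flow system. It evaluates the amplification factor of lexicographic Gauss–Seidel, $\tilde{S}(\boldsymbol{\theta}) = (e^{i\theta_1} + e^{i\theta_2})/(4 - e^{-i\theta_1} - e^{-i\theta_2})$, and, for comparison, that of damped Jacobi with an assumed damping factor $\omega = 0.8$. It then takes the maximum over sampled high frequencies in $[-\pi, \pi)^2$. GS-RB is omitted because its analysis couples several frequencies and needs a longer derivation.

```python
import cmath
import math


def gs_lex_amplification(t1, t2):
    """|S(theta)| for lexicographic Gauss-Seidel on the five-point Laplacian."""
    num = cmath.exp(1j * t1) + cmath.exp(1j * t2)
    den = 4 - cmath.exp(-1j * t1) - cmath.exp(-1j * t2)
    return abs(num / den)


def jacobi_amplification(t1, t2, omega=0.8):
    """|S(theta)| for omega-damped Jacobi on the five-point Laplacian."""
    return abs(1 - omega + omega * (math.cos(t1) + math.cos(t2)) / 2)


def smoothing_factor(amplification, n=256):
    """max |S(theta)| over sampled high frequencies theta in [-pi, pi)^2."""
    half = math.pi / 2
    thetas = [-math.pi + 2 * math.pi * k / n for k in range(n)]
    return max(amplification(t1, t2)
               for t1 in thetas for t2 in thetas
               if not (-half <= t1 < half and -half <= t2 < half))
```

The sampled GS-LEX maximum approaches the well-known value 0.5 from below (the supremum is attained at $\theta_1 = \pi/2$, $\sin\theta_2 = 3/5$, which is not exactly on the sample grid), and damped Jacobi with $\omega = 0.8$ gives 0.6.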
Note that $I_x$ and $I_y$ do not vary smoothly over the image domain $\Omega_h$ for this test case; instead, the coefficients have moderate jumps. As a consequence, the smoothing factors from Table I are not rigorously justified. However, practical experience shows that they can be considered heuristic but reliable estimates of the actual smoothing properties, especially since the jumps are only moderate. To back up the theoretical results from smoothing analysis, we also tested the smoothing effect of the collective relaxations numerically. The smoothing effect of GS-LEX can be clearly seen in Figure 2, which shows the first component $u$ of the optical flow velocity vector: the initial (random) error on a 33×33 grid (a scaled-down version of frame 8 from the Yosemite sequence) and the error after five collective GS-LEX steps.
In summary, there is sufficient evidence that collective damped Jacobi, GS-LEX and GS-RB relaxation are reasonable smoothing methods, even though they might diverge when used as stand-alone solvers.
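Collective (pointwise coupled) relaxation means that at each pixel all four unknowns $u, v, w^1, w^2$ are updated together. This is done by solving the local 4×4 system obtained from (6b) with the neighboring values frozen. The following Python sketch of one collective GS-LEX sweep assumes mesh size $h$, homogeneous Neumann boundaries handled by mirrored ghost values, and the right-hand side $(0, 0, -I_x I_t, -I_y I_t)$ from (6a); the function names are hypothetical. Its check is deliberately modest: after a lexicographic sweep, the four equations at the last visited pixel must hold exactly, because none of its neighbors changes afterwards.

```python
import random


def solve4(a, b):
    """Solve the local 4x4 system a x = b (Gaussian elimination, partial pivoting)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(4):
        p = max(range(c, 4), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, 4):
            f = m[r][c] / m[c][c]
            for k in range(c, 5):
                m[r][k] -= f * m[c][k]
    x = [0.0] * 4
    for r in range(3, -1, -1):
        x[r] = (m[r][4] - sum(m[r][k] * x[k] for k in range(r + 1, 4))) / m[r][r]
    return x


def mirror(i, n):
    """Ghost-point index for a central-difference homogeneous Neumann condition."""
    return -i if i < 0 else (2 * (n - 1) - i if i >= n else i)


def nb_sum(a, i, j):
    """Sum of the four neighbors of pixel (i, j), mirrored at the image border."""
    ny, nx = len(a), len(a[0])
    return sum(a[mirror(p, ny)][mirror(q, nx)]
               for p, q in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)))


def collective_gs_lex(U, Ix, Iy, It, alpha, beta, h=1.0):
    """One lexicographic sweep; U = [u, v, w1, w2] is updated in place."""
    u, v, w1, w2 = U
    d = 4.0 / h**2                           # center entry of -Delta_h
    c = alpha * ((1.0 - beta) * d + beta)    # center entry of rows 3 and 4
    s = alpha * (1.0 - beta) / h**2          # weight of the frozen w-neighbors
    for i in range(len(u)):
        for j in range(len(u[0])):
            ix, iy = Ix[i][j], Iy[i][j]
            a = [[d, 0.0, -1.0, 0.0],
                 [0.0, d, 0.0, -1.0],
                 [ix * ix, ix * iy, c, 0.0],
                 [ix * iy, iy * iy, 0.0, c]]
            b = [nb_sum(u, i, j) / h**2,
                 nb_sum(v, i, j) / h**2,
                 -ix * It[i][j] + s * nb_sum(w1, i, j),
                 -iy * It[i][j] + s * nb_sum(w2, i, j)]
            u[i][j], v[i][j], w1[i][j], w2[i][j] = solve4(a, b)


def point_residual(U, Ix, Iy, It, alpha, beta, i, j, h=1.0):
    """Defect of the four equations (6a), (6b) at pixel (i, j)."""
    u, v, w1, w2 = U
    lap = lambda g: (nb_sum(g, i, j) - 4.0 * g[i][j]) / h**2
    ix, iy = Ix[i][j], Iy[i][j]
    return [-lap(u) - w1[i][j],
            -lap(v) - w2[i][j],
            ix * ix * u[i][j] + ix * iy * v[i][j]
            + alpha * (-(1.0 - beta) * lap(w1) + beta * w1[i][j]) + ix * It[i][j],
            ix * iy * u[i][j] + iy * iy * v[i][j]
            + alpha * (-(1.0 - beta) * lap(w2) + beta * w2[i][j]) + iy * It[i][j]]


def demo(ny=6, nx=7, alpha=1500.0, beta=0.4, seed=1):
    """One sweep on random data; returns the defect at the last visited pixel."""
    rng = random.Random(seed)
    grid = lambda: [[rng.uniform(-1.0, 1.0) for _ in range(nx)] for _ in range(ny)]
    Ix, Iy, It = grid(), grid(), grid()
    U = [grid() for _ in range(4)]
    collective_gs_lex(U, Ix, Iy, It, alpha, beta)
    return point_residual(U, Ix, Iy, It, alpha, beta, ny - 1, nx - 1)
```

Solving the four equations together keeps the strong coupling between $(u, v)$ and $(w^1, w^2)$ inside the local solve. A decoupled, equation-by-equation update would ignore that coupling.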
post-relaxations). For details concerning these multigrid components, we again refer to the well-known literature [33, 35, 36, 38, 39].
Since we are interested in a real-time solution, it is necessary to use the full multigrid (FMG) technique (see, e.g. [35, 36]). Here, the initial approximation on the fine grid is obtained by computing approximations on coarser grids and interpolating them. A properly adjusted FMG algorithm yields an asymptotically optimal method, i.e. the number of arithmetic operations is proportional to the number of grid points, and at the same time the error of the resulting fine grid solution is approximately equal to the discretization error.
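The FMG idea can be illustrated on the simplest possible model problem. The sketch below is a one-dimensional analogue, not the optical flow solver. It applies FMG with one V(2,2) cycle per level to $-u'' = \pi^2 \sin(\pi x)$ on $(0, 1)$ with $u(0) = u(1) = 0$, using red-black Gauss–Seidel, full weighting and linear interpolation. All of these choices are assumed for illustration; in particular, FMG implementations often use a higher-order interpolation for the initial fine grid approximation. The sketch returns both the error of the FMG result and the discretization error, so the claim that the two are of similar size can be checked directly.

```python
import math


def residual(u, f, h):
    """Defect f - A_h u for the three-point discretization of -u''."""
    n = len(u)
    return ([0.0] + [f[i] - (2 * u[i] - u[i - 1] - u[i + 1]) / h**2
                     for i in range(1, n - 1)] + [0.0])


def gs_rb(u, f, h, sweeps):
    """Red-black Gauss-Seidel: odd-indexed points first, then even-indexed."""
    for _ in range(sweeps):
        for start in (1, 2):
            for i in range(start, len(u) - 1, 2):
                u[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])


def restrict(r):
    """Full weighting with stencil (1/4, 1/2, 1/4)."""
    nc = (len(r) - 1) // 2 + 1
    return ([0.0] + [0.25 * r[2 * I - 1] + 0.5 * r[2 * I] + 0.25 * r[2 * I + 1]
                     for I in range(1, nc - 1)] + [0.0])


def interpolate(ec):
    """Linear interpolation to the next finer grid."""
    e = [0.0] * (2 * len(ec) - 1)
    for I, val in enumerate(ec):
        e[2 * I] = val
    for i in range(1, len(e), 2):
        e[i] = 0.5 * (e[i - 1] + e[i + 1])
    return e


def v_cycle(u, f, h, nu1=2, nu2=2):
    if len(u) == 3:                          # one unknown: solve exactly
        u[1] = 0.5 * (u[0] + u[2] + h * h * f[1])
        return
    gs_rb(u, f, h, nu1)
    rc = restrict(residual(u, f, h))
    ec = [0.0] * len(rc)
    v_cycle(ec, rc, 2 * h, nu1, nu2)
    for i, e in enumerate(interpolate(ec)):
        u[i] += e
    gs_rb(u, f, h, nu2)


def fmg(levels, rhs):
    """Solve on the coarsest grid, then interpolate + one V(2,2) cycle per level."""
    h = 0.5
    u = [0.0, 0.0, 0.0]
    v_cycle(u, [rhs(i * h) for i in range(3)], h)
    for _ in range(levels - 1):
        h /= 2
        u = interpolate(u)
        v_cycle(u, [rhs(i * h) for i in range(len(u))], h)
    return u, h


def demo(levels=6):
    """Returns (FMG error, discretization error) in the maximum norm."""
    rhs = lambda x: math.pi ** 2 * math.sin(math.pi * x)
    u, h = fmg(levels, rhs)
    xs = [i * h for i in range(len(u))]
    err = lambda w: max(abs(a - math.sin(math.pi * x)) for a, x in zip(w, xs))
    w, f = u[:], [rhs(x) for x in xs]
    for _ in range(30):
        v_cycle(w, f, h)                     # iterate to the discrete solution
    return err(u), err(w)
```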
4. EXPERIMENTAL RESULTS
Next, the numerical performance of the multigrid solver described above is investigated, and the
quality of the variational model is demonstrated.
where $\mathbf{u}_c = (u_c, v_c, 1)$ is the ground truth and $\mathbf{u}_e = (u_e, v_e, 1)$ the estimated optical flow vector.
Most real-world image sequences do not come with a ground truth motion field; in this case, the quality of the optical flow is often assessed visually by plotting the vector field and comparing it with the expected result. For example, one can check whether the vector field is smooth inside objects and whether edges between different movements are preserved, e.g. for objects moving over a static background.
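Since the definition (11) is not reproduced here, the following sketch assumes the usual average angular error: the angle between the three-dimensional vectors $(u_c, v_c, 1)$ and $(u_e, v_e, 1)$ described above, averaged over all pixels and reported in degrees. The function names are hypothetical, and flows are passed as plain lists of $(u, v)$ pairs.

```python
import math


def angular_error_deg(uc, vc, ue, ve):
    """Angle in degrees between (uc, vc, 1) and (ue, ve, 1).

    atan2(|a x b|, a . b) is used instead of acos(a . b / (|a| |b|)), which
    loses accuracy for nearly parallel vectors."""
    cx = vc - ve                  # components of the cross product a x b
    cy = ue - uc
    cz = uc * ve - vc * ue
    dot = uc * ue + vc * ve + 1.0
    return math.degrees(math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot))


def average_angular_error(flow_c, flow_e):
    """Mean angular error over all pixels; flows are lists of (u, v) pairs."""
    errs = [angular_error_deg(uc, vc, ue, ve)
            for (uc, vc), (ue, ve) in zip(flow_c, flow_e)]
    return sum(errs) / len(errs)
```

Appending the constant third component 1 makes the measure finite even for zero flow vectors, and it penalizes errors in small displacements more strongly than errors in large ones.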
4.1.1. Multigrid performance. All experiments for different combinations of $\alpha$ and $\beta$ (see below) were performed using a single FMG-V(2,2) cycle with collective GS-RB as the smoother. The same visual and AAE results can also be obtained with five V(2,2) cycles. Input images are smoothed by a discrete Gaussian filter mask (standard deviation $\sigma = 1.2$) in order to ensure a robust computation of the image derivatives by finite difference approximations.
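A minimal sketch of this preprocessing step follows. The text only fixes $\sigma = 1.2$, so several details are assumptions: a Gaussian mask truncated at radius $\lceil 3\sigma \rceil$ and normalized to sum one, separable filtering with replicated border values, spatial derivatives by central differences (one-sided at the border) on the mean of the two smoothed frames, and $I_t$ as the difference of the smoothed frames with unit time step. The function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian mask truncated at radius ceil(3 * sigma)."""
    r = math.ceil(3 * sigma)
    k = [math.exp(-x * x / (2 * sigma * sigma)) for x in range(-r, r + 1)]
    s = sum(k)
    return [w / s for w in k]


def clamp(i, n):
    return min(max(i, 0), n - 1)             # replicate border values


def smooth(img, sigma):
    """Separable Gaussian filtering: along rows, then along columns."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    ny, nx = len(img), len(img[0])
    tmp = [[sum(k[t + r] * img[i][clamp(j + t, nx)] for t in range(-r, r + 1))
            for j in range(nx)] for i in range(ny)]
    return [[sum(k[t + r] * tmp[clamp(i + t, ny)][j] for t in range(-r, r + 1))
             for j in range(nx)] for i in range(ny)]


def d_dx(m, h=1.0):
    """Central differences along x (columns), one-sided at the left/right border."""
    nx = len(m[0])
    out = []
    for row in m:
        d = [(row[j + 1] - row[j - 1]) / (2 * h) for j in range(1, nx - 1)]
        out.append([(row[1] - row[0]) / h] + d + [(row[-1] - row[-2]) / h])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def image_derivatives(frame0, frame1, sigma=1.2, h=1.0):
    """Return I_x, I_y, I_t for two frames (rows = y, columns = x)."""
    a, b = smooth(frame0, sigma), smooth(frame1, sigma)
    mean = [[0.5 * (p + q) for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
    Ix = d_dx(mean, h)
    Iy = transpose(d_dx(transpose(mean), h))
    It = [[q - p for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
    return Ix, Iy, It


def demo():
    """Linear ramp I = 2x + 3y + t on a 12 x 12 grid; derivatives at pixel (6, 6)."""
    f0 = [[2.0 * x + 3.0 * y for x in range(12)] for y in range(12)]
    f1 = [[val + 1.0 for val in row] for row in f0]
    Ix, Iy, It = image_derivatives(f0, f1)
    return Ix[6][6], Iy[6][6], It[6][6]
```

Because the mask is symmetric and normalized, smoothing leaves linear intensity ramps unchanged away from the border, so the demo recovers $I_x = 2$, $I_y = 3$ and $I_t = 1$ at an interior pixel.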
For constant coefficients $I_x$ and $I_y$, one obtains typical multigrid convergence factors similar to those for the Poisson equation, which are accurately predicted by local Fourier analysis. For jumping coefficients, a slight deterioration of the convergence rate can be observed. Table II lists some representative results. Values of $\alpha$ that are useful for the application do not have a substantial impact on the convergence rates. The best convergence rates are achieved when the combination of $\alpha$ and $\beta$ is optimal with respect to the quality of the solution, which is an interesting observation in itself. Figure 3 shows the AAE (11) plotted over $\beta$ for $\alpha = 1500$. The best quality with
Table II. Convergence rates for the computation of the optical flow from frames 8 and 9 of the Yosemite sequence with $\alpha = 1500$.

          GS-LEX                      GS-RB
Cycle     β = 0   β = 0.4   β = 1     β = 0   β = 0.4   β = 1
1         0.053   0.051     0.048     0.091   0.090     0.074
2         0.054   0.042     0.045     0.070   0.055     0.044
3         0.096   0.065     0.148     0.115   0.069     0.127
4         0.124   0.086     0.196     0.156   0.093     0.181
5         0.131   0.093     0.232     0.172   0.110     0.233
[Plot: AAE (vertical axis) versus β (horizontal axis, 0 to 1), one curve each for α = 500, 1500 and 5000]
Figure 3. AAE plot of the calculated optical flow between pictures 8 and 9 from the Yosemite sequence for $\alpha = 500$, 1500 and 5000.
respect to AAE is obtained for $\beta \approx 0.4$, and the best convergence rates for $\alpha = 1500$ are also obtained for $\beta \approx 0.4$ (see Table II).
To give an impression of the performance of our optical flow algorithm, Table III lists runtimes of an FMG-V(2,2) cycle for different image sizes. The time measurements were done on an AMD Opteron 248 cluster node with 2.2 GHz, 64 kB L1 cache, 1 MB L2 cache and 4 GB DDR-333 RAM. For real applications, these times can of course be improved by a hardware-specific performance optimization of the multigrid solver on current architectures [41, 42].
In summary, the multigrid algorithm behaves very robustly, as indicated by the investigation of the measure of h-ellipticity. For all choices of $\alpha$, $\beta$ and the image derivatives considered here, one obtains nearly the same (excellent) convergence factors as are known for the Poisson equation.
4.1.2. Quality of the optical flow model. In the following, we use two sequences, one synthetic and one real-world [43], to evaluate our optical flow model.
The Yosemite sequence with clouds, created by Lynn Quam [44], is a rather complex test case (see Figure 4). It consists of 15 frames of size 316×252 and depicts a flight through Yosemite National Park. Both translational (clouds) and divergent (flight) motion are present in this sequence. Additionally, the illumination varies in the cloud region, so our constant brightness assumption is not fulfilled there.
All tests were performed with frames 8 and 9 of the Yosemite sequence. First, we consider in Figure 3 the AAE for $\alpha = 500$, 1500, 5000 and varying $\beta$. The value $\alpha = 500$ was chosen because tests showed it to be optimal, with respect to a minimal AAE, for the second-order system. The combined regularizer produces the best result, outperforming both the diffusion-based and the curvature-based regularizer. Since the AAE is measured over the whole image domain, even small improvements of the AAE can correspond to a substantial improvement in the local visual quality of the resulting optical flow field.
Figure 4 shows image details of the resulting velocity fields for the Yosemite sequence, where we choose $\alpha = 1500$ for a visual comparison of different values of $\beta$. The right half of this detail includes the high mountain from the middle of the images. The mountains move from right to left, whereas the cloud region moves purely horizontally from left to right. For $\beta = 1$, one can see the usual behavior of the original Horn and Schunck regularizer, which tries to produce a smooth solution even across the mountain crest. The fourth-order system performs better in this regard, as its region of influence is notably smaller, for example, at the right crossover. The combined regularizer with $\beta = 0.4$ exhibits a mixture of both effects and leads to a smaller AAE over the whole image. One can also observe that all methods fail to compute the purely horizontal flow in the cloud region. This is because the brightness varies there, so the constant brightness assumption of the data term does not hold.
Figure 4. First line: frames 8 and 9 from the Yosemite sequence. Second line: a detail from the optical flow located left of the highest mountain in the middle of the image (marked in frame 8), calculated with $\alpha = 1500$ and (from left to right) $\beta = 0$, 0.4 and 1.
The second sequence shows rotating particles and is related to PIV. However, instead of standard PIV models such as a div–curl regularizer, we apply our variational approach. Our goal is to visualize the difference between the diffusion- and the curvature-based regularizer at a vortex; the latter resolves the vortex much better, as can be clearly observed in Figure 5.
Figure 5. First line: two frames of a rotating particle sequence (size 512×512). Second line: the resulting optical flow field for $\alpha = 500$ at the vortex for the diffusion-based regularizer (left) and the curvature-based regularizer (right).
We presented and evaluated a combined diffusion- and curvature-based regularizer for optical flow and the related image registration problem. The arising fourth-order system of PDEs was solved efficiently by a geometric multigrid solver. It turned out that the best results are obtained when the weighting between the regularizer and the brightness constancy assumption is chosen such that the multigrid solver shows an optimal convergence rate. This is an interesting observation, and it remains to be investigated whether it can be used to choose the weighting parameter automatically.
Figure 6. First line: template image (left) and reference image (right) showing a human brain (size 256×256). Second line: registration results (from left to right) with $\alpha = 3$ for $\beta = 0$ and 0.05.
To improve on the static weighting of the regularizer, which produces an equally smooth solution throughout the picture, one could allow a space-dependent parameter in order to deal with discontinuities in the solution.
Next steps are the extension of the regularizer to the physically motivated div–curl-based regularizer, or to nonlinear regularizers, where $\alpha$ and $\beta$ depend on the velocity field.
Furthermore, we wish to apply the curvature-based regularizer to motion blur computed by a combined optical flow and ray tracer motion field [17]. This should help to overcome a problem of the diffusion-based regularizer, which introduces singularities in the Euler–Lagrange equations since some motion vectors are fixed within the optical flow model.
For image registration, an interesting task is to extend the model to 3D in order to register 3D medical data sets.
REFERENCES
1. Horn B, Schunck B. Determining optical flow. Artificial Intelligence 1981; 17:185–203.
2. Horn B. Robot Vision. MIT Press: Cambridge, MA, U.S.A., 1986.
3. Nagel H-H, Enkelmann W. An investigation of smoothness constraints for the estimation of displacement
vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence 1986;
8(5):565–593.
4. Galvin B, McCane B, Novins K, Mason D, Mills S. Recovering motion fields: an evaluation of eight optical
flow algorithms. British Machine Vision Conference, Southampton, 1998.
5. Verri A, Poggio T. Motion field and optical flow: qualitative properties. IEEE Transactions on Pattern Analysis
and Machine Intelligence 1989; 11(5):490–498.
6. Haussecker H, Fleet D. Computing optical flow with physical models of brightness variation. IEEE Transactions
on Pattern Analysis and Machine Intelligence 2001; 23(6):661–673.
7. Weickert J, Schnörr C. A theoretical framework for convex regularizers in PDE-based computation of image
motion. International Journal of Computer Vision 2001; 45(3):245–264.
8. Weickert J, Schnörr C. Variational optic flow computation with a spatio-temporal smoothness constraint. Journal
of Mathematical Imaging and Vision 2001; 14(3):245–255.
9. Brox T, Weickert J. Nonlinear matrix diffusion for optic flow estimation. In Pattern Recognition, van Gool L
(ed.). Lecture Notes in Computer Science, vol. 2449. Springer: Berlin, 2002; 446–453.
10. Suter D. Motion estimation and vector splines. Proceedings of the Conference on Computer Vision and Pattern
Recognition, Los Alamos, U.S.A., 1994; 939–948.
11. Gupta S, Prince J. Stochastic models for div–curl optical flow methods. IEEE Signal Processing Letters 1996;
3(2):32–34.
12. Corpetti T, Mémin E, Pérez P. Dense estimation of fluid flows. IEEE Transactions on Pattern Analysis and
Machine Intelligence 2002; 24(3):365–380.
13. Kohlberger T, Mémin E, Schnörr C. Variational dense motion estimation using the Helmholtz decomposition.
In Fourth International Conference on Scale Space Methods in Computer Vision, Griffin L, Lillholm M (eds),
Isle of Skye, U.K. Lecture Notes in Computer Science, vol. 2695. Springer: Berlin, 2003; 432–448.
14. Corpetti T, Heitz D, Arroyo G, Mémin E, Santa-Cruz A. Fluid experimental flow estimation based on an optical-
flow scheme. Experiments in Fluids 2006; 40(1):80–97.
15. Modersitzki J. Numerical Methods for Image Registration. Oxford University Press: Oxford, 2004.
16. Fischer B, Modersitzki J. Curvature based image registration. Journal of Mathematical Imaging and Vision 2003;
18(1):81–85.
17. Zheng Y, Köstler H, Thürey N, Rüde U. Enhanced motion blur calculation with optical flow. Proceedings of
Vision, Modeling and Visualization, RWTH Aachen, Germany. Aka GmbH, IOS Press: Berlin, 2006; 253–260.
18. Fischer B, Modersitzki J. Combining landmark and intensity driven registrations. PAMM 2003; 3(1):32–35.
19. Galic I, Weickert J, Welk M, Bruhn A, Belyaev A, Seidel H. Towards PDE-based image compression. Proceedings
of Variational, Geometric, and Level Set Methods in Computer Vision. Lecture Notes in Computer Science.
Springer: Berlin, Heidelberg, New York, 2005; 37–48.
20. Papenberg N, Bruhn A, Brox T, Didas S, Weickert J. Highly accurate optic flow computation with theoretically
justified warping. International Journal of Computer Vision 2006; 67(2):141–158.
21. Glazer F. Multilevel relaxation in low-level computer vision. In Multi-Resolution Image Processing and Analysis,
Rosenfeld A (ed.). Springer: Berlin, 1984; 312–330.
22. Terzopoulos D. Image analysis using multigrid methods. IEEE Transactions on Pattern Analysis and Machine
Intelligence 1986; 8:129–139.
23. Enkelmann W. Investigations of multigrid algorithms for the estimation of optical flow fields in image sequences.
Computer Vision, Graphics, and Image Processing 1988; 43:150–177.
24. Battiti R, Amaldi E, Koch C. Computing optical flow across multiple scales: an adaptive coarse-to-fine strategy.
International Journal of Computer Vision 1991; 6(2):133–145.
25. Kalmoun EM, Rüde U. A variational multigrid for computing the optical flow. In Vision, Modeling and
Visualization, Ertl T, Girod B, Greiner G, Niemann H, Seidel HP, Steinbach E, Westermann R (eds). Akademische
Verlagsgesellschaft: Berlin, 2003; 577–584.
26. Kalmoun EM, Köstler H, Rüde U. 3D optical flow computation using a parallel variational multigrid scheme
with application to cardiac C-arm CT motion. Image and Vision Computing 2007; 25(9):1482–1494.
27. Christadler I, Köstler H, Rüde U. Robust and efficient multigrid techniques for the optical flow problem using
different regularizers. In Proceedings of 18th Symposium Simulations Technique ASIM 2005, Hülsemann F,
Kowarschik M, Rüde U (eds). Frontiers in Simulation, vol. 15. SCS Publishing House: Erlangen, 2005; 341–346.
Preprint version published as Technical Report 05-6.
28. Bruhn A. Variational optic flow computation: accurate modeling and efficient numerics. Ph.D. Thesis, Department
of Mathematics and Computer Science, Saarland University, Saarbrücken, Germany, 2006.
29. Bruhn A, Weickert J, Kohlberger T, Schnörr C. A multigrid platform for real-time motion computation with
discontinuity-preserving variational methods. International Journal of Computer Vision 2006; 70(3):257–277.
30. Haber E, Modersitzki J. A multilevel method for image registration. SIAM Journal on Scientific Computing 2006;
27(5):1594–1607.
31. Henn S. A multigrid method for a fourth-order diffusion equation with application to image processing. SIAM
Journal on Scientific Computing 2005; 27(3):831–849.
32. Hömke L. A multigrid method for anisotropic PDEs in elastic image registration. Numerical Linear Algebra with
Applications 2006; 13(2–3):215–229.
33. Hackbusch W. Multi-grid Methods and Applications. Springer: Berlin, Heidelberg, New York, 1985.
34. Keeling SL, Haase G. Geometric multigrid for high-order regularizations of early vision problems. Applied
Mathematics and Computation 2007; 184(2):536–556.
35. Brandt A. Multigrid techniques: 1984 guide with applications to fluid dynamics. GMD-Studie Nr. 85, Sankt
Augustin, West Germany, 1984.
36. Trottenberg U, Oosterlee C, Schüller A. Multigrid. Academic Press: San Diego, CA, U.S.A., 2001.
37. Wienands R, Joppich W. Practical Fourier analysis for multigrid methods. In Numerical Insights, vol. 5. Chapman &
Hall/CRC Press: Boca Raton, FL, U.S.A., 2005.
38. Briggs W, Henson V, McCormick S. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, U.S.A., 2000.
39. Wesseling P. Multigrid Methods. Edwards: Philadelphia, PA, U.S.A., 2004.
40. McCane B, Novins K, Crannitch D, Galvin B. On benchmarking optical flow. Computer Vision and Image
Understanding 2001; 84(1):126–143.
41. Douglas C, Hu J, Kowarschik M, Rüde U, Weiß C. Cache optimization for structured and unstructured grid
multigrid. Electronic Transactions on Numerical Analysis 2000; 10:21–40.
42. Hülsemann F, Kowarschik M, Mohr M, Rüde U. Parallel geometric multigrid. In Numerical Solution of
Partial Differential Equations on Parallel Computers, Chapter 5, Bruaset A, Tveito A (eds). Lecture Notes in
Computational Science and Engineering, vol. 51. Springer: Berlin, Heidelberg, New York, 2005; 165–208.
43. Barron J, Fleet D, Beauchemin S. Performance of optical flow techniques. International Journal of Computer
Vision 1994; 12(1):43–77.
44. Heeger D. Model for the extraction of image flow. Journal of the Optical Society of America A: Optics, Image
Science, and Vision 1987; 4(8):1455–1471.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:219–247
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.579
SUMMARY
We study the effect of inter-grid operators—the interpolation and restriction operators—on the convergence
of two-grid algorithms for linear models. We show how a modal analysis of linear systems, along with
some assumptions on the normal modes of the system, allows us to understand the role of inter-grid
operators in the speed and accuracy of a full-multigrid step.
We state an assumption that generalizes local Fourier analysis (LFA) by means of a precise description
of aliasing effects on the system. This assumption condenses, in a single algebraic property called the
harmonic aliasing property, all the information needed from the geometry of the discretization and the
structure of the system’s eigenvectors. We first state a harmonic aliasing property based on the standard
coarsening strategies of 1D problems. Then, we extend this property to a more aggressive coarsening
typically used in 2D problems with the help of additional assumptions on the structure of the system
matrix.
Under our general assumptions, we determine the exact rates at which groups of modal components
of the error evolve and interact. With this knowledge, we are then able to design inter-grid operators
that optimize the two-grid algorithm convergence. By different choices of operators, we verify the classic
heuristics based on Fourier harmonic analysis, show a trade-off between the rate of convergence and the
number of computations required per iteration, and show how our analysis differs from LFA. Copyright © 2008 John Wiley & Sons, Ltd.
KEY WORDS: multigrid algorithms; inter-grid operators; convergence analysis; modal analysis; aliasing
∗ Correspondence to: Pablo Navarrete Michelini, Departamento de Ingeniería Eléctrica, Universidad de Chile,
Av. Tupper 2007, Santiago, RM 8370451, Chile.
† E-mail: pnavarre@purdue.edu
1. INTRODUCTION
We are interested in applications of the multigrid algorithm in the distributed sensing and processing
tasks that arise in the design of wireless sensor networks. In such scenarios, the inexpensive,
low-power, low-complexity sensor motes that are the nodes of the network must perform all
computation and communication tasks. This is very different from the scenarios encountered
in the implementation of multigrid algorithms on large parallel machines, for the following
reasons:
• Sensor motes are battery powered and must operate unattended for long periods of time. The
design of algorithms that run on them must therefore attempt to minimize the number of
computations each node must perform and the number of times it must communicate because
both functions consume energy. Of the two functions, communication is the more energy
intensive per bit of data.
• Communication between sensor motes is carried out in hop-by-hop fashion, since the energy
required to send data over a distance $d$ is proportional to $d^\alpha$ with $2 \le \alpha \le 4$. Thus, the sensor
motes communicate directly only with their nearest neighbors in any direction.
• Re-executing an algorithm after adjusting parameters or models is very difficult or might not
even be possible because of the remote deployment of the network. It is thus critical that the
algorithms used to perform various tasks be as robust and well understood as possible before
they are deployed.
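As a rough illustration of the hop-by-hop argument, the sketch below compares the energy of one direct transmission over distance $d$ with that of $k$ equal hops, under the simplified model $E(d) = d^\alpha$. It ignores constant factors and per-hop reception and processing costs. Those omissions are assumptions of this sketch, not statements from the text, and `direct_energy` and `multihop_energy` are hypothetical helper names.

```python
def direct_energy(d: float, alpha: float) -> float:
    """Energy to send one unit of data over distance d in one transmission (model d**alpha)."""
    return d ** alpha


def multihop_energy(d: float, alpha: float, hops: int) -> float:
    """Energy to cover distance d with `hops` equal-length relays under the same model.

    Per-hop reception and processing costs are ignored (an assumption of this sketch).
    """
    return hops * (d / hops) ** alpha


if __name__ == "__main__":
    d, hops = 100.0, 10
    for alpha in (2.0, 3.0, 4.0):
        ratio = multihop_energy(d, alpha, hops) / direct_energy(d, alpha)
        # Under this model the ratio equals hops ** (1 - alpha).
        print(f"alpha = {alpha}: multi-hop / direct energy = {ratio:g}")
```

Under this model the multi-hop energy is $k^{1-\alpha}$ times the direct energy, so short hops pay off more as $\alpha$ grows. In practice, the per-hop overhead that the sketch leaves out limits how far this can be pushed.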
‡ Originally called local mode analysis (LMA); we chose the nomenclature used in [7] as it emphasizes the essential
difference with the approach introduced in this paper.
Copyright © 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:219–247
DOI: 10.1002/nla
DESIGN OF INTER-GRID OPERATORS 221
Fourier harmonic modes. The properties of the system must thus be constrained in some way in
order to develop new tools for convergence analysis. The requirement we focus on is an explicit
description of the aliasing effects produced by the coarsening strategy.
The aliasing of Fourier harmonic modes is present in LFA through the concept of spaces of
harmonics [7]. We identify its simple form as one of the reasons why LFA is so powerful. Based on
this fact, we assume a more general aliasing pattern that still allows us to characterize convergence
behavior. This assumption condenses, in a single algebraic property called the harmonic aliasing
property, all the information needed from the geometry of the discretization and the structure of the
eigenvectors. If this property is satisfied, then no more information is needed from the system and
the analysis is completely algebraic. Therefore, our analysis could be considered a semi-algebraic
approach to the study of convergence issues and the design of efficient inter-grid operators.
One of the practical advantages of our approach is that we are able to separate the problem
of coarsening from what we call filtering, i.e. interpolation/restriction weights and smoothing
operations. The analysis of each problem makes no use of heuristics. The coarsening strategy
is designed to ensure a convenient aliasing pattern whereas the design of the filters is meant to
optimize multigrid convergence.
The main difficulty of our approach is that the assumptions depend on the eigenvectors of
the system. In practical applications, this information is unlikely to be available, so verifying
the assumptions remains an open problem. This problem is, however, shared by
many fields in which transient or local phenomena preclude a proper use of Fourier analysis [8].
There have been many efforts to identify suitable bases for specific problems, and the goal of this
work is to raise this problem in multigrid analysis. For these reasons, the results of this paper
are not entirely conclusive about optimization strategies for coarsening and filtering. They are,
however, an important first step toward this goal.
In Section 2 we provide the notation and the essential properties of the multigrid algorithm
for further analysis. In Section 3 we list the assumptions needed on the algorithm and system in
order to apply our analysis. In Section 4 we list the additional assumptions needed on 2D systems
in order to extend our analysis. In Section 5 we derive the main results about the influence of
inter-grid operators on multigrid convergence and verify the classic heuristics of Fourier harmonic
analysis. In Section 6 we provide examples that show how to use our analysis and how it
differs from the classical LFA.
We wish to solve discrete linear systems of the form $Au = f$, defined on a grid $\Omega_h$ with step size
$h \in \mathbb{R}^+$ defined as the largest distance between neighboring grid nodes. A coarse grid $\Omega_s$ is defined
as a set of nodes such that $\Omega_s \subset \Omega_h$ and $s > h$.
We define the so-called inter-grid operators, regardless of their use in the multigrid algorithm,
as any linear transformation between scalar fields on $\Omega_h$ and $\Omega_s$. That is,
$$I_s^h : \mathbb{R}^{|\Omega_s|} \to \mathbb{R}^{|\Omega_h|} \quad\text{and}\quad I_h^s : \mathbb{R}^{|\Omega_h|} \to \mathbb{R}^{|\Omega_s|}$$
where $I_s^h$ is the interpolation operator and $I_h^s$ is the restriction operator. We introduce a notation
with markers ‘ˇ’ or ‘ˆ’ to indicate transfers from a finer or coarser grid, respectively. We are
222 P. NAVARRETE MICHELINI AND E. J. COYLE
$y \in \mathbb{R}^{|\Omega_s|}$,
$$\hat{y} = I_s^h y, \qquad (3)$$
and
$A \in \mathbb{R}^{|\Omega_h| \times |\Omega_h|}$,
$$\check{A} = I_h^s A I_s^h, \qquad (4)$$
The definition of the coarsening operator in (4) follows the Galerkin condition and is standard in
most multigrid applications [9].
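The following is a minimal sketch of the Galerkin construction (4). It assumes a 1D Poisson matrix with stencil $[-1\ 2\ -1]$ ($h = 1$, Dirichlet boundaries), linear interpolation as $I_s^h$, and full weighting $I_h^s = \tfrac{1}{2}(I_s^h)^T$. These choices and the helper names are ours for illustration, not the paper's.

```python
import numpy as np


def laplacian_1d(n: int) -> np.ndarray:
    """Matrix with stencil [-1 2 -1] and homogeneous Dirichlet boundaries (h = 1)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def linear_interpolation(n_coarse: int) -> np.ndarray:
    """Linear interpolation from n_coarse coarse nodes to 2 * n_coarse + 1 fine nodes.

    Coarse node j coincides with fine node 2j + 1 (0-based); its two fine
    neighbours receive half of its value.
    """
    P = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P


n_coarse = 3
A = laplacian_1d(2 * n_coarse + 1)
P = linear_interpolation(n_coarse)   # plays the role of I_s^h
R = 0.5 * P.T                        # full weighting, plays the role of I_h^s
A_coarse = R @ A @ P                 # Galerkin coarse matrix, as in (4)
print(A_coarse)                      # equals laplacian_1d(3) / 4
```

For this choice, $\check{A}$ is exactly the stencil $[-1\ 2\ -1]/4$, that is, the same operator rediscretized with step $2h$. This is one reason the Galerkin condition is a convenient default.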
We consider a full two-grid approach consisting of a nested iteration step, as shown in Figure 1,
and $\gamma$ iterations of the Correction Scheme, including $\nu_1$ pre-smoothing and $\nu_2$ post-smoothing
iterations, as shown in Figure 2. Here, the vector $v_k$ is the $k$th approximation of the exact solution
of the linear system, $u \in \mathbb{R}^{|\Omega_h|}$. Similarly, the vector $e_k = u - v_k$ is the approximation error after the
$k$th step of the algorithm. One smoothing iteration is characterized by the smoothing operator $S$;
after each iteration the approximation error evolves as $e_{k+1} = S e_k$. Because of this property we
also call $S$ the smoothing filter.
From these diagrams, it follows that the approximation error between smoothing iterations in
the correction scheme is given by
$$e_{\nu_1+1} = K e_{\nu_1} \qquad (5)$$
Figure 1. Diagram of a nested iterations step. The dotted line separates problems from the
fine and coarse grid domains. The interpolation (restriction) operation is applied to vectors
crossing the dotted line from below (above).
and similarly, the initial approximation error, $e_0$, using nested iteration is given by
$$e_0 = K u \qquad (6)$$
where $u$ is the exact solution of the linear system and $K$ is the so-called coarse grid correction
matrix [10] defined as
$$K = I - I_s^h \check{A}^{-1} I_h^s A \qquad (7)$$
This matrix is the target of our analysis in Section 5 as it controls all of the convergence features
of the two-grid scheme. Considering the effect of smoothing iterations, the error in the whole
correction scheme evolves as
In the multiple-grid case, a recursive application of nested iterations and the correction scheme
is used to solve coarse system equations, as shown in Figure 3. Since coarse systems are not
solved with exact accuracy, the approximation error evolves differently. Here, the error depends
on the accuracy of the solutions from the coarse grids. Thus, matrix K used above is replaced by
a different matrix, denoted by $K_1$, which is obtained from the following recursions:
$$K_L = 0, \qquad A_1 = A, \qquad A_j = \check{A}_{j-1}, \quad j = 2, \ldots, L-1,$$
$$K_{j-1} = I - I_j^{j-1}\left[I - (S_j^{\nu_2} K_j S_j^{\nu_1})^{\gamma_j} K_j\right](\check{A}_{j-1})^{-1} I_{j-1}^{j} A_{j-1}, \quad j = L, \ldots, 2 \qquad (9)$$
where $S_j$, $I_j^{j-1}$, and $I_{j-1}^{j}$ are the smoothing, interpolation, and restriction operators chosen at
level $j$, and $\gamma_j$ is the number of iterations of the correction scheme used at level $j$. Then, the
approximation error evolves as $e_0 = K_1 u$ in nested iterations and as $e_{\nu_1+1} = K_1 e_{\nu_1}$ between
smoothing iterations of the correction scheme.
Although our analysis is technically applicable to the full multiple-grid case, the coupling
between different levels makes the algebra tedious. Therefore, we concentrate on the two-level
case. For the multiple-grid case, we assume that the coarse-level problems have been solved
accurately enough that the matrices $(S_j^{\nu_2} K_j S_j^{\nu_1})^{\gamma_j} K_j$ can be neglected, so that we can work under
the two-grid assumptions.
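The following sketch illustrates the two-grid coarse grid correction matrix in its usual Galerkin form, $K = I - I_s^h \check{A}^{-1} I_h^s A$, for a 1D model problem. The model problem, linear interpolation and full weighting are assumptions chosen for illustration. The sketch checks two standard facts. First, $K$ is a projection ($K^2 = K$). Second, it annihilates every vector in the range of the interpolation. A projection has eigenvalues 0 and 1, so the coarse grid correction alone cannot reduce every error component. This is why the smoothing iterations are needed.

```python
import numpy as np


def laplacian_1d(n: int) -> np.ndarray:
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def linear_interpolation(n_coarse: int) -> np.ndarray:
    P = np.zeros((2 * n_coarse + 1, n_coarse))
    for j in range(n_coarse):
        P[2 * j:2 * j + 3, j] = [0.5, 1.0, 0.5]
    return P


n_coarse = 7
A = laplacian_1d(2 * n_coarse + 1)
P = linear_interpolation(n_coarse)    # I_s^h
R = 0.5 * P.T                         # I_h^s (full weighting)
A_coarse = R @ A @ P                  # Galerkin coarse matrix

# K = I - I_s^h A_coarse^{-1} I_h^s A, using a linear solve instead of an explicit inverse.
K = np.eye(A.shape[0]) - P @ np.linalg.solve(A_coarse, R @ A)

print(np.allclose(K @ K, K))          # K is a projection
print(np.allclose(K @ P, 0.0))        # interpolated coarse vectors are removed exactly
print(np.max(np.abs(np.linalg.eigvals(K))))  # spectral radius 1 (up to round-off)
```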
Figure 3. Diagram of the recursive full multigrid approach using one iteration of the correction
scheme per level. Each box represents a number of pre- or post-smoothing iterations. The
particular choice of using the same combination of pre-/post-smoothing iterations on different
correction scheme steps is considered.
Two assumptions are needed in order to derive our convergence results. First, we introduce a
decomposition of the inter-grid interpolation/restriction operators into up-/down-sampling and
filtering operations, a standard approach in digital signal processing [8, 11]. Second, we assume that
the operators and the system possess the same basis of eigenvectors and we establish a condition
on these eigenvectors under (up-/down)-sampling operations. These conditions are motivated by
standard Fourier harmonic analysis but they are not restricted to systems with Fourier harmonic
modes as eigenvectors.
$$A = W \Lambda V^T \qquad (10)$$
Here, the diagonal matrix $\Lambda$ contains the eigenvalues of $A$ on its diagonal. The columns of the
matrix $W$ are the right-eigenvectors of $A$, i.e. $AW = W\Lambda$. The columns of the matrix $V$ contain
the left-eigenvectors of $A$, i.e. $V^T A = \Lambda V^T$.
The column vectors of $W$ and $V$ form a biorthogonal basis since it follows from the above
definitions that
$$V^T W = I \qquad (11)$$
If $A$ is a symmetric matrix, then $V = W$ and the column vectors of $W$ form an orthonormal basis.
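To illustrate the biorthogonal basis in (10) and (11) for a non-symmetric matrix, the sketch below builds a random matrix with known real eigenvalues. It takes the right eigenvectors from `numpy.linalg.eig` and obtains the left eigenvectors as $V = W^{-T}$, which enforces $V^T W = I$. The random construction is an assumption made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Non-symmetric test matrix with known, distinct real eigenvalues 1, ..., n.
X = rng.standard_normal((n, n)) + n * np.eye(n)
A = X @ np.diag(np.arange(1.0, n + 1.0)) @ np.linalg.inv(X)

eigvals, W = np.linalg.eig(A)   # columns of W: right eigenvectors (A W = W Lam)
Lam = np.diag(eigvals)
V = np.linalg.inv(W).T          # columns of V: left eigenvectors, scaled so that V^T W = I

assert np.allclose(A @ W, W @ Lam)         # right eigenvectors
assert np.allclose(V.T @ A, Lam @ V.T)     # left eigenvectors
assert np.allclose(V.T @ W, np.eye(n))     # biorthogonality, as in (11)
assert np.allclose(A, W @ Lam @ V.T)       # eigen-decomposition, as in (10)
assert not np.allclose(V, W)               # for non-symmetric A, V differs from W
```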
It is important to note that from this point on our analysis differs from LFA. In LFA it is
assumed that the stencil of $A$, denoted as the row vector $s$, does not depend on the position of
the grid nodes to which it is applied. When this is true, the operation $Ax$ can be expressed as the
convolution
$$(Ax)_n = \sum_k (s)_k (x)_{n+k} \qquad (12)$$
where $(Ax)_n$ denotes the $n$th component of the vector $Ax$. This implies that the eigenvectors of $A$
are Fourier harmonic modes. In other words, if $(w)_k = e^{i\theta k}$ then $Aw = \hat{s}(\theta) w$, where $\hat{s}(\theta) = \sum_k (s)_k e^{i\theta k}$ is the
Fourier transform of the stencil sequence. In our analysis, the stencil can depend on the position
of the grid nodes to which it is applied. In this case, the operation $Ax$ can be expressed as
$$(Ax)_n = \sum_k (s_n)_k (x)_{n+k} \qquad (13)$$
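The claim following (12) can be checked numerically on a periodic grid, where the convolution holds exactly at every node. The periodic boundary is our assumption: it is the setting in which Fourier modes are exact eigenvectors. The hypothetical helper `periodic_stencil_matrix` builds $A$ from a constant stencil. Each mode $e^{i\theta k}$ is then verified to be an eigenvector with eigenvalue $\hat{s}(\theta) = \sum_k (s)_k e^{i\theta k}$.

```python
import numpy as np


def periodic_stencil_matrix(stencil: dict, n: int) -> np.ndarray:
    """Matrix of (Ax)_n = sum_k s_k x_{n+k} on a periodic grid with n nodes."""
    A = np.zeros((n, n))
    for row in range(n):
        for k, s_k in stencil.items():
            A[row, (row + k) % n] += s_k
    return A


n = 16
stencil = {-1: -1.0, 0: 2.0, 1: -1.0}   # constant stencil s = [-1 2 -1]
A = periodic_stencil_matrix(stencil, n)
nodes = np.arange(n)

for m in range(n):
    theta = 2.0 * np.pi * m / n                      # frequencies resolved by the periodic grid
    w = np.exp(1j * theta * nodes)                   # Fourier mode (w)_k = exp(i theta k)
    s_hat = sum(s_k * np.exp(1j * theta * k) for k, s_k in stencil.items())
    assert np.allclose(A @ w, s_hat * w)             # A w = s_hat(theta) w
```

With a position-dependent stencil as in (13), the same check generally fails. That is the situation the analysis in this paper is meant to cover.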
$$S = W \Sigma V^T \qquad (14)$$
where $\Sigma$ is a diagonal matrix with the eigenvalues of matrix $S$. The diagonal values in $\Sigma$ represent
the factor by which each modal component of the approximation error is multiplied after one
smoothing iteration.
As in LFA, our analysis is also applicable to smoothers of the form $A_+ e_{k+1} = A_- e_k$ with
$A = A_+ - A_-$ [7], e.g. Gauss–Seidel with lexicographical ordering for constant stencil operators,
assuming that both $A_+$ and $A_-$ have the same eigenvectors as $A$. The smoothing operator is then
given by
$$S = A_+^{-1} A_- = W \Lambda_+^{-1} \Lambda_- V^T$$
where $\Lambda_+$ and $\Lambda_-$ are diagonal matrices with the eigenvalues of $A_+$ and $A_-$, respectively.
$$U = D^T \qquad (17)$$
$$DU = \tilde{I} \qquad (19)$$
where $\tilde{I} \in \mathbb{R}^{|\Omega_s| \times |\Omega_s|}$ is the identity matrix on the coarse grid. On the other hand, the matrix
$UD \in \mathbb{R}^{|\Omega_h| \times |\Omega_h|}$ is a diagonal matrix whose diagonal entry $(UD)_{i,i}$ is 1 whenever node $i$ is a selected node and
0 otherwise.
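The down-sampling and up-sampling matrices in (17) and (19) can be pictured concretely. The sketch below assumes that $D$ is a selection matrix keeping every other node. This is our reading for illustration, since the exact definition of $D$ is not reproduced here. It checks $DU = \tilde{I}$ and that $UD$ is the diagonal 0/1 mask of the selected nodes. `downsampling_matrix` is a hypothetical helper.

```python
import numpy as np


def downsampling_matrix(n_fine: int, selected: list) -> np.ndarray:
    """Selection matrix D: (D x)_j = x[selected[j]]."""
    D = np.zeros((len(selected), n_fine))
    D[np.arange(len(selected)), selected] = 1.0
    return D


n_fine = 8
selected = list(range(0, n_fine, 2))   # keep every other node: 0, 2, 4, 6 (0-based)
D = downsampling_matrix(n_fine, selected)
U = D.T                                # up-sampling: inserts zeros at the skipped nodes

assert np.allclose(D @ U, np.eye(len(selected)))     # DU is the coarse-grid identity
mask = np.zeros(n_fine)
mask[selected] = 1.0
assert np.allclose(U @ D, np.diag(mask))             # UD keeps selected nodes, zeroes the rest

x = np.arange(1.0, n_fine + 1.0)
print(D @ x)      # [1. 3. 5. 7.]
print(U @ D @ x)  # [1. 0. 3. 0. 5. 0. 7. 0.]
```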
Now, we can decompose the inter-grid operators $I_s^h$ and $I_h^s$, as defined in Section 2, into the
following matrix products:
$$I_s^h = F_I U \quad\text{and}\quad I_h^s = D F_R \qquad (20)$$
$$F = F_R = c\,(F_I)^T \qquad (21)$$
The inter-grid operator decomposition applies to any kind of inter-grid operators. Now, we
restrict our analysis to the set of inter-grid filters that have the same eigenvectors as the system
matrix A. That is, we assume inter-grid filters of the form
$$F_I = W \Lambda_I V^T \quad\text{and}\quad F_R = W \Lambda_R V^T \qquad (22)$$
where $\Lambda_I$ and $\Lambda_R$ are diagonal matrices and their diagonal coefficients represent the damping
effect of the filters on the corresponding eigenvector.
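This example illustrates the filter decomposition, assuming $I_s^h = F_I U$ and $I_h^s = D F_R$ as in the expression for $K$ later in Section 5. The sketch takes $F_I = F$ with stencil $[0.5\ 1\ 0.5]$ and $F_R = \tfrac{1}{2} F_I^T$, that is, the relation $F_R = c\,(F_I)^T$ of (21) with $c = 1/2$. It verifies three things:

- $FU$ is linear interpolation.
- $D F_R$ is full weighting.
- $F$ shares the sine eigenvectors of the 1D model matrix ($F = 2I - A/2$), with eigenvalues $1 + \cos\theta_j$.

The grid size and coarse-node placement are illustrative assumptions.

```python
import numpy as np

n_coarse = 7
n = 2 * n_coarse + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)       # stencil [-1 2 -1], Dirichlet
F = np.eye(n) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))      # filter stencil [0.5 1 0.5]

D = np.zeros((n_coarse, n))
D[np.arange(n_coarse), 2 * np.arange(n_coarse) + 1] = 1.0    # coarse node j on fine node 2j + 1
U = D.T

F_I = F
F_R = 0.5 * F_I.T               # F_R = c (F_I)^T with c = 1/2
P = F_I @ U                     # interpolation  I_s^h = F_I U
R = D @ F_R                     # restriction    I_h^s = D F_R

expected_P = np.zeros((n, n_coarse))
for jj in range(n_coarse):
    expected_P[2 * jj:2 * jj + 3, jj] = [0.5, 1.0, 0.5]
assert np.allclose(P, expected_P)          # linear interpolation
assert np.allclose(R, 0.5 * P.T)           # full weighting [1/4 1/2 1/4]

# F = 2I - A/2 commutes with A, so it shares A's sine eigenvectors, as assumed in (22).
j = np.arange(1, n + 1)
theta = np.pi * j / (n + 1)
W = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(j, theta))
assert np.allclose(F, 2.0 * np.eye(n) - A / 2.0)
assert np.allclose(F @ W, W @ np.diag(1.0 + np.cos(theta)))   # eigenvalues of F: 1 + cos(theta_j)
```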
where $U_x = D_x^T$ is the up-sampling matrix and $N_x$ is the harmonic aliasing pattern that we define
to be
$$N_x = \frac{1}{2}\begin{bmatrix} \tilde{I}_x & \tilde{I}_x \\ \tilde{I}_x & \tilde{I}_x \end{bmatrix} \qquad (24)$$
We must note that the harmonic aliasing property only involves the eigenvectors of the system
and the down-/up-sampling operator. Although this is a strong assumption on the system, it
only involves the down-sampling operator from the multigrid algorithm. It does not depend on
the smoothing and inter-grid filters. This is an important consequence of the inter-grid operator
decomposition.
The definition above implicitly assumes a down-sampling by a factor of 2 and naturally induces
a partition of the eigenvectors into two sets, say $W_x = [W_{L_x}\; W_{H_x}]$ for the right-eigenvectors
and $V_x = [V_{L_x}\; V_{H_x}]$ for the left-eigenvectors. The subscripts $L_x$ and $H_x$ resemble the notation of standard
Fourier harmonic analysis used to distinguish between low- and high-frequency modes (see for
instance [10]). Using these partitions, we can restate the harmonic aliasing property. For that
purpose we state the following definition:
$$D_x W_{L_x} = D_x W_{H_x} \qquad (25)$$
and
$$D_x V_{L_x} = D_x V_{H_x} \qquad (26)$$
Theorem 1
The surjective property is equivalent to the harmonic aliasing property.
Proof
First, we have to note that, given the partitions $W_x = [W_{L_x}\; W_{H_x}]$ and $V_x = [V_{L_x}\; V_{H_x}]$, we can
rewrite the harmonic aliasing property as the following set of biorthogonal relationships:
and
From here, if we assume the surjective property, then Equation (32) immediately implies the set
of biorthogonal relationships above, and the harmonic aliasing property is fulfilled.
Now, we assume the harmonic aliasing property holds and we pre-multiply Equation (32) by
$(D_x V_{L_x})^T$. Using Equations (27) and (28) we obtain
Similarly, we post-multiply Equation (32) by $D_x W_{H_x}$. Using Equations (28) and (30), we obtain
In Section 3 we stated assumptions that will allow us to understand the role of the smoothing
and inter-grid filters in multigrid convergence. The assumptions stated in Section 3 do not allow
the study of many multigrid applications. Specifically, when using the multigrid algorithm in
$d$-dimensional problems, the down-sampling is often designed to reduce the number of grid nodes
by a factor of $2^d$. On the other hand, the harmonic aliasing property, as stated in Section 3.4,
is essentially applicable only for cases where the grids are down-sampled by a factor of 2.
Down-sampling by a factor of $2^d$ is important to reduce the computational and storage costs of the
algorithm. In this section, we assume further properties of the algorithm and system so that our
analysis can be extended to these cases.
For these extensions we use the tensor product defined as:
The most useful properties of Kronecker products for the purpose of our analysis are
$$(A \otimes B)(C \otimes D) = AC \otimes BD \qquad (36)$$
and
$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1} \qquad (37)$$
For further properties, we refer the reader to [12, 13].
where $A_{x,i} \in \mathbb{R}^{m \times m}$ and $A_{y,i} \in \mathbb{R}^{n \times n}$, with $i = 1, \ldots, r$, representing $r$ possible operators acting on
the dimensions $x$ and $y$, respectively.
We assume that the matrices $A_{x,i}$, $i = 1, \ldots, r$, have the same set of eigenvectors $W_x$ and $V_x$,
the matrices $A_{y,i}$, $i = 1, \ldots, r$, have the same set of eigenvectors $W_y$ and $V_y$, but each matrix can
have a different set of eigenvalues. We denote the matrix of eigenvalues as $\Lambda_{x,i}$ for each matrix
$A_{x,i}$, and $\Lambda_{y,i}$ for each matrix $A_{y,i}$. Thus, we have the following eigen-decompositions:
for which the sets of eigenvectors satisfy the biorthogonal relationships $V_x^T W_x = I_x$ and $V_y^T W_y = I_y$,
where $I_x$ is an $m \times m$ identity matrix and $I_y$ is an $n \times n$ identity matrix.
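It follows from these assumptions that the right-eigenvectors of the system matrix $A_{xy}$, denoted
as $W_{xy}$, and its eigenvalues, denoted as $\Lambda_{xy}$, are given by
$$W_{xy} = W_x \otimes W_y \quad\text{and}\quad \Lambda_{xy} = \sum_{i=1}^{r} \Lambda_{x,i} \otimes \Lambda_{y,i} \qquad (42)$$
$$V_{xy}^T = W_{xy}^{-1} = (W_x \otimes W_y)^{-1} = W_x^{-1} \otimes W_y^{-1} = V_x^T \otimes V_y^T = (V_x \otimes V_y)^T \qquad (43)$$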
We refer to the assumptions above as the separability assumptions because they allow us to apply
the assumptions from Section 3 for separate sets of eigenvectors. This kind of factorization for the
system matrix often appears in the discretization of partial differential equations (PDEs) (e.g. in
finite difference discretization of the Laplacian, divergence and other operators). Thus, the analysis
under these extended assumptions will be more suitable for applications.
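The separability assumptions are satisfied, for instance, by the standard 5-point finite-difference Laplacian, $A_{xy} = A_x \otimes I_y + I_x \otimes A_y$, which uses $r = 2$ terms in the sum. The sketch below uses hypothetical helpers `laplacian_1d` and `sine_basis` and small grid sizes chosen for illustration. It checks numerically that $W_{xy} = W_x \otimes W_y$ diagonalizes $A_{xy}$ with eigenvalues $\sum_i \Lambda_{x,i} \otimes \Lambda_{y,i}$, as in (42).

```python
import numpy as np


def laplacian_1d(n: int) -> np.ndarray:
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def sine_basis(n: int):
    """Orthonormal eigenvectors and eigenvalues of laplacian_1d(n)."""
    j = np.arange(1, n + 1)
    theta = np.pi * j / (n + 1)
    W = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(j, theta))
    return W, 2.0 - 2.0 * np.cos(theta)


m, n = 6, 4
Ax, Ay = laplacian_1d(m), laplacian_1d(n)
# r = 2 terms: A_x,1 = A_x, A_y,1 = I_y and A_x,2 = I_x, A_y,2 = A_y.
Axy = np.kron(Ax, np.eye(n)) + np.kron(np.eye(m), Ay)

Wx, lam_x = sine_basis(m)
Wy, lam_y = sine_basis(n)
Wxy = np.kron(Wx, Wy)                                    # W_xy = W_x (x) W_y
Lam_xy = (np.kron(np.diag(lam_x), np.eye(n))             # sum_i Lambda_x,i (x) Lambda_y,i
          + np.kron(np.eye(m), np.diag(lam_y)))

assert np.allclose(Axy @ Wxy, Wxy @ Lam_xy)              # A_xy W_xy = W_xy Lambda_xy
assert np.allclose(Wxy.T @ Wxy, np.eye(m * n))           # V_xy = W_xy for these symmetric factors
```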
$$D_{xy} = D_x \otimes D_y \qquad (44)$$
In this way the down-sampling matrix $D_{xy}$ is designed to reduce the total number of nodes by a
factor of 4.
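We use inter-grid filters, denoted by $F_{I,xy}$ and $F_{R,xy}$, and expressed as
$$F_{I,xy} = F_{I,x} \otimes F_{I,y} \quad\text{and}\quad F_{R,xy} = F_{R,x} \otimes F_{R,y} \qquad (45)$$
where $F_{I,x}$, $F_{R,x}$ and $F_{I,y}$, $F_{R,y}$ are interpolation and restriction filters with eigenvectors $W_x$ and
$W_y$, respectively, and with eigenvalues $\Lambda_{I,x}$, $\Lambda_{R,x}$ and $\Lambda_{I,y}$, $\Lambda_{R,y}$, respectively. Therefore, $F_{I,xy}$
and $F_{R,xy}$ have right-eigenvectors $W_{xy}$, left-eigenvectors $V_{xy}$ and eigenvalues given by
$$\Lambda_{I,xy} = \Lambda_{I,x} \otimes \Lambda_{I,y} \quad\text{and}\quad \Lambda_{R,xy} = \Lambda_{R,x} \otimes \Lambda_{R,y} \qquad (46)$$
We note that due to the properties of Kronecker products, the decomposition in (20) is valid for
both 1D and 2D operators.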
Similarly, the smoothing operator $S_{xy}$ is designed such that
$$S_{xy} = S_x \otimes S_y \qquad (47)$$
where $S_x$ and $S_y$ are smoothing operators with eigenvectors $W_x$ and $W_y$, respectively, with eigenvalues
$\Sigma_x$ and $\Sigma_y$, respectively. The eigenvalues of $S_{xy}$ are given by
$$\Sigma_{xy} = \Sigma_x \otimes \Sigma_y \qquad (48)$$
$$= (V_x^T U_x D_x W_x) \otimes (V_y^T U_y D_y W_y) = N_x \otimes N_y \qquad (49)$$
5. ERROR ANALYSIS
$$= \{D F_R A F_I U\}^{-1} = \{(DW)\,\Lambda_R \Lambda \Lambda_I\,(DV)^T\}^{-1} \qquad (53)$$
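From here, we first consider the assumptions in Section 3. Using the partition of eigenvectors
induced by the harmonic aliasing property, we define the matrix
$$\Theta_x = \Lambda_{R,L_x}\Lambda_{L_x}\Lambda_{I,L_x} + \Lambda_{R,H_x}\Lambda_{H_x}\Lambda_{I,H_x} \qquad (54)$$
Then, we follow the last step in (53) and obtain
$$\check{A}_x^{-1} = 4\,(D_x W_{L_x})\,\Theta_x^{-1}\,(D_x V_{L_x})^T \qquad (55)$$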
Now we consider the assumptions in Section 4. Similarly, for this case we define the matrices
$$= 16\,(D_x W_{L_x} \otimes D_y W_{L_y})\,\Theta_{xy}^{-1}\,(D_x V_{L_x} \otimes D_y V_{L_y})^T = 16\,(D_{xy} W_{L_{xy}})\,\Theta_{xy}^{-1}\,(D_{xy} V_{L_{xy}})^T \qquad (59)$$
where we use, first, the surjective property and, second, the biorthogonal relationships (27)–(30),
and finally, we simply define $W_{L_{xy}} = W_{L_x} \otimes W_{L_y}$ and $V_{L_{xy}} = V_{L_x} \otimes V_{L_y}$.
We note that in both (55) and (59) the Galerkin coarse matrix $\check{A}$ has an eigen-decomposition
with eigenvectors given by the down-sampled eigenvectors of $A$. This is a useful property, as it
ensures that the assumptions stated for the system on the fine grid are satisfied on coarser grids as
well.
$$= I - F_I U \check{A}^{-1} D F_R W \Lambda V^T = I - W \Lambda_I V^T U \check{A}^{-1} D W \Lambda_R \Lambda V^T = I - 2^{2d}\, W \Lambda_I\, (V^T U D W_L)\, \Theta^{-1}\, (V_L^T U D W)\, \Lambda_R \Lambda V^T \qquad (60)$$
where $d$ represents the dimension of the problem. In parentheses we see how the harmonic aliasing
property appears naturally in this matrix.
For the assumptions from Section 3, we follow the algebra to obtain
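$$K_x = W_x V_x^T - 4\, W_x \Lambda_{I,x} \left(\frac{1}{2}\begin{bmatrix}\tilde{I}_x \\ \tilde{I}_x\end{bmatrix}\right) \Theta_x^{-1} \left(\frac{1}{2}\begin{bmatrix}\tilde{I}_x & \tilde{I}_x\end{bmatrix}\right) \Lambda_{R,x} \Lambda_x V_x^T$$
$$= W_x \begin{bmatrix} \tilde{I}_x - \Lambda_{I,L_x}\Theta_x^{-1}\Lambda_{R,L_x}\Lambda_{L_x} & -\Lambda_{I,L_x}\Theta_x^{-1}\Lambda_{R,H_x}\Lambda_{H_x} \\ -\Lambda_{I,H_x}\Theta_x^{-1}\Lambda_{R,L_x}\Lambda_{L_x} & \tilde{I}_x - \Lambda_{I,H_x}\Theta_x^{-1}\Lambda_{R,H_x}\Lambda_{H_x} \end{bmatrix} V_x^T \qquad (61)$$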
Note that matrix $K$ is not diagonalized by the eigenvectors of the system. Instead, we obtain a
block-tridiagonal matrix that shows how each group of modes from $W_{L_x}$ and $W_{H_x}$ is damped
and mixed. In order to simplify this result, we define the convergence operator, $\Psi_x$, such that
$$K_x = W_x \Psi_x V_x^T = W_x \begin{bmatrix} \Psi_{L_x\to L_x} & \Psi_{H_x\to L_x} \\ \Psi_{L_x\to H_x} & \Psi_{H_x\to H_x} \end{bmatrix} V_x^T \qquad (62)$$
Each one of the four submatrices in $\Psi_x$ is diagonal and we call them the modal convergence
operators. Their diagonal values represent the factor by which each modal component of the
error is multiplied and transferred between $L_x$ and $H_x$ modes according to the subscripts. Their
diagonal values can be simplified as follows:
$$(\Psi_{L_x\to L_x})_{i,i} = \frac{1}{1+a_i b_i}, \qquad (\Psi_{H_x\to L_x})_{i,i} = \frac{-b_i}{1+a_i b_i},$$
$$(\Psi_{L_x\to H_x})_{i,i} = \frac{-a_i}{1+a_i b_i} \quad\text{and}\quad (\Psi_{H_x\to H_x})_{i,i} = \frac{a_i b_i}{1+a_i b_i} \qquad (63)$$
where
$$a_i = \frac{(\Lambda_{R,L_x})_{i,i}(\Lambda_{L_x})_{i,i}}{(\Lambda_{R,H_x})_{i,i}(\Lambda_{H_x})_{i,i}} \quad\text{and}\quad b_i = \frac{(\Lambda_{I,L_x})_{i,i}}{(\Lambda_{I,H_x})_{i,i}} \qquad (64)$$
The convergence of a two-grid algorithm depends on the smoother $S_x$ and the coarse grid correction
matrix $K_x$, which in the domain of the system’s eigenvectors are represented by the matrices $\Sigma_x$ and
$\Psi_x$, respectively. The matrix $\Psi_x$ and its four modal convergence operators allow us to focus on
the performance of the inter-grid operators; therefore, $\Psi_x$ is the main object of study for the design
of inter-grid filters. In Section 6 we will show examples of how to apply this analysis.
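The closed-form entries (63) and (64) can be checked against the blocks of (61) directly. The sketch below draws hypothetical positive diagonal entries for the system eigenvalues $\Lambda_{L_x}, \Lambda_{H_x}$ and the filter eigenvalues $\Lambda_{I,\cdot}, \Lambda_{R,\cdot}$. It forms the diagonal of the matrix in (54) and the four blocks of the convergence operator. It then compares them with
$1/(1+a_ib_i)$, $-b_i/(1+a_ib_i)$, $-a_i/(1+a_ib_i)$ and $a_ib_i/(1+a_ib_i)$, where
$a_i = (\Lambda_{R,L_x})_{ii}(\Lambda_{L_x})_{ii}/((\Lambda_{R,H_x})_{ii}(\Lambda_{H_x})_{ii})$ and $b_i = (\Lambda_{I,L_x})_{ii}/(\Lambda_{I,H_x})_{ii}$.
The random values are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 5   # number of modes in each group (L_x and H_x)
# Hypothetical positive diagonals: system (lam_*), interpolation filter (i*), restriction filter (r*).
lam_L, lam_H, iL, iH, rL, rH = rng.uniform(0.1, 2.0, size=(6, k))

theta_x = rL * lam_L * iL + rH * lam_H * iH      # diagonal of the matrix defined in (54)

# Diagonals of the four blocks in (61)/(62).
psi_LL = 1.0 - iL * rL * lam_L / theta_x
psi_HL = -iL * rH * lam_H / theta_x
psi_LH = -iH * rL * lam_L / theta_x
psi_HH = 1.0 - iH * rH * lam_H / theta_x

# Closed forms (63) with the ratios (64).
a = (rL * lam_L) / (rH * lam_H)
b = iL / iH
assert np.allclose(psi_LL, 1.0 / (1.0 + a * b))
assert np.allclose(psi_HL, -b / (1.0 + a * b))
assert np.allclose(psi_LH, -a / (1.0 + a * b))
assert np.allclose(psi_HH, a * b / (1.0 + a * b))
```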
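Under the assumptions in Section 4, the algebra is different. We obtain
$$K_{xy} = I_{xy} - 16\, W_{xy} \Lambda_{I,xy} \left(\frac{1}{2}\begin{bmatrix}\tilde{I}_x \\ \tilde{I}_x\end{bmatrix} \otimes \frac{1}{2}\begin{bmatrix}\tilde{I}_y \\ \tilde{I}_y\end{bmatrix}\right) \Theta_{xy}^{-1} \left(\frac{1}{2}\begin{bmatrix}\tilde{I}_x & \tilde{I}_x\end{bmatrix} \otimes \frac{1}{2}\begin{bmatrix}\tilde{I}_y & \tilde{I}_y\end{bmatrix}\right) \Lambda_{R,xy} \Lambda_{xy} V_{xy}^T$$
$$= I_{xy} - W_{xy} \left(\begin{bmatrix}\Lambda_{I,L_x} \\ \Lambda_{I,H_x}\end{bmatrix} \otimes \begin{bmatrix}\Lambda_{I,L_y} \\ \Lambda_{I,H_y}\end{bmatrix}\right) \Theta_{xy}^{-1} \left(\sum_{i=1}^{r} \begin{bmatrix}\Lambda_{R,L_x}\Lambda_{L_x,i} \\ \Lambda_{R,H_x}\Lambda_{H_x,i}\end{bmatrix}^T \otimes \begin{bmatrix}\Lambda_{R,L_y}\Lambda_{L_y,i} \\ \Lambda_{R,H_y}\Lambda_{H_y,i}\end{bmatrix}^T\right) V_{xy}^T$$
$$= W_{xy} \Psi_{xy} V_{xy}^T \qquad (65)$$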
Here, the structure of the convergence operator $\Psi_{xy}$ is not apparent because of the
Kronecker products involved. Since the matrix $\Theta_{xy}^{-1}$ cannot in general be factored as a Kronecker
product, we cannot analyze the convergence of the algorithm along each dimension independently
of the other. We then need to consider the four possible combinations of $x$, $y$-dimensions and
$L$, $H$ groups. The products involving these combinations are mixed in $\Psi_{xy}$, and we need to
reorder them to identify the modal convergence operators. Thus, we introduce a permutation
matrix $P \in \{0, 1\}^{mn \times mn}$ such that for arbitrary matrices $X_L, X_H \in \mathbb{R}^{m/2 \times m/2}$ and $Y_L, Y_H \in \mathbb{R}^{n/2 \times n/2}$
one has
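$$P \left(\begin{bmatrix} X_L \\ X_H \end{bmatrix} \otimes \begin{bmatrix} Y_L \\ Y_H \end{bmatrix}\right) = \begin{bmatrix} X_L \otimes Y_L \\ X_H \otimes Y_L \\ X_L \otimes Y_H \\ X_H \otimes Y_H \end{bmatrix} \qquad (66)$$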
Then, applying this permutation to reorder the rows and columns of $\Psi_{xy}$, we obtain the following
structure:
$$P \Psi_{xy} P^T = \begin{bmatrix} \Psi_{L_xL_y\to L_xL_y} & \Psi_{H_xL_y\to L_xL_y} & \Psi_{L_xH_y\to L_xL_y} & \Psi_{H_xH_y\to L_xL_y} \\ \Psi_{L_xL_y\to H_xL_y} & \Psi_{H_xL_y\to H_xL_y} & \Psi_{L_xH_y\to H_xL_y} & \Psi_{H_xH_y\to H_xL_y} \\ \Psi_{L_xL_y\to L_xH_y} & \Psi_{H_xL_y\to L_xH_y} & \Psi_{L_xH_y\to L_xH_y} & \Psi_{H_xH_y\to L_xH_y} \\ \Psi_{L_xL_y\to H_xH_y} & \Psi_{H_xL_y\to H_xH_y} & \Psi_{L_xH_y\to H_xH_y} & \Psi_{H_xH_y\to H_xH_y} \end{bmatrix} \qquad (67)$$
where we identify the modal convergence operators representing the 16 possible ways to transfer
modal components of the error between the four combinations of $x$, $y$-dimensions and $L$, $H$ groups
according to the subscripts. The values of each one of these groups can be expressed in a generic
form as
$$\Psi_{A_xB_y\to C_xD_y} = \Delta_{ACBD} - (\Lambda_{I,C_x} \otimes \Lambda_{I,D_y})\,\Theta_{xy}^{-1} \sum_{i=1}^{r} (\Lambda_{R,A_x}\Lambda_{A_x,i}) \otimes (\Lambda_{R,B_y}\Lambda_{B_y,i}) \qquad (68)$$
where $A, B, C, D \in \{L, H\}$ and $\Delta_{ACBD}$ is an identity matrix only if $A = C$ and $B = D$ (and the zero
matrix otherwise).
The convergence operator $\Psi_{xy}$ and its 16 modal convergence operators allow us to focus on the
performance of the inter-grid operators and are always the main object of study for the design of
inter-grid filters. Compared with the 1D case, the analysis is now more complicated, as the modal
components of the error are transferred not only between two groups of modes but also between
different dimensions. In Section 6.3 we will show an example of how to design inter-grid filters
under this scenario.
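One concrete way to build the permutation $P$ of (66) is to map row $(a, b)$ of $X \otimes Y$ (row $a$ of $X$, row $b$ of $Y$) to its position in the stacked order $L_xL_y$, $H_xL_y$, $L_xH_y$, $H_xH_y$. The sketch below does this with a hypothetical helper `reorder_permutation` and checks (66) for random blocks. It assumes that $m$ and $n$ are even.

```python
import numpy as np


def reorder_permutation(m: int, n: int) -> np.ndarray:
    """Permutation P with P (X kron Y) = [X_L kron Y_L; X_H kron Y_L; X_L kron Y_H; X_H kron Y_H].

    X = [X_L; X_H] has m rows and Y = [Y_L; Y_H] has n rows; m and n must be even.
    """
    hm, hn = m // 2, n // 2
    P = np.zeros((m * n, m * n))
    for a in range(m):
        for b in range(n):
            gx, a2 = divmod(a, hm)      # gx = 0: row of X_L, gx = 1: row of X_H
            gy, b2 = divmod(b, hn)      # gy = 0: row of Y_L, gy = 1: row of Y_H
            group = gx + 2 * gy         # block order L_xL_y, H_xL_y, L_xH_y, H_xH_y
            P[group * hm * hn + a2 * hn + b2, a * n + b] = 1.0
    return P


m, n = 4, 6
rng = np.random.default_rng(2)
X_L, X_H = rng.standard_normal((2, m // 2, m // 2))
Y_L, Y_H = rng.standard_normal((2, n // 2, n // 2))
P = reorder_permutation(m, n)

lhs = P @ np.kron(np.vstack([X_L, X_H]), np.vstack([Y_L, Y_H]))
rhs = np.vstack([np.kron(X_L, Y_L), np.kron(X_H, Y_L), np.kron(X_L, Y_H), np.kron(X_H, Y_H)])
assert np.allclose(lhs, rhs)                 # identity (66)
assert np.allclose(P @ P.T, np.eye(m * n))   # P is a permutation, hence orthogonal
```

Applying the same $P$ to both the rows and the columns of the convergence operator groups its 16 modal blocks in the order shown in (67).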
We call this filter the sharp inter-grid filter. In Fourier harmonic analysis, this would correspond
to what is called a ‘perfect low-pass filter’ [11]. This definition is more general as we can now
apply it to a more general kind of basis, that is, to any basis with the harmonic aliasing property.
By using the eigen-decomposition of $A$ and the sharp inter-grid filter in (63), we obtain
$$K_{\text{sharp},x} = W_{H_x} W_{H_x}^T \qquad (70)$$
Therefore, for this choice of inter-grid operators, the coarse grid correction matrix is a projection,
and applying it several times does not reduce the error any further: it just cancels the $W_{L_x}$ components of the
error. We then need to apply smoothing iterations in order to reduce the $W_{H_x}$ components of the
error. We also verify that the error reduction achieved by multigrid iterations does not depend on
the step size $h$, as the iteration matrix does not depend on the eigenvalues of $A$. The simplicity of
this result illustrates the general principles of multigrid algorithm design. In Section 6 we will see how
this idealized scenario does not always lead to an optimal algorithm for solving linear systems.
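The sharp-filter result (70) can be reproduced numerically for the 1D model problem of Section 6.1. This sketch makes the following assumptions:

- sine eigenvectors with columns $j = 9$ to $16$ in reversed order;
- coarse nodes at the odd-numbered fine nodes;
- a sharp filter normalized to eigenvalue 1 on $W_{L_x}$ and 0 on $W_{H_x}$.

Under these assumptions, the coarse grid correction matrix coincides with $W_{H_x}W_{H_x}^T$ and is a projection.

```python
import numpy as np

N, half = 16, 8
i = np.arange(1, N + 1)
A = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
W = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(i, i) / (N + 1))
W = np.hstack([W[:, :half], W[:, half:][:, ::-1]])   # columns: j = 1..8, then j = 16..9
W_L, W_H = W[:, :half], W[:, half:]

F = W_L @ W_L.T                  # sharp filter: passes W_L unchanged, removes W_H
D = np.zeros((half, N))
D[np.arange(half), 2 * np.arange(half)] = 1.0        # coarse nodes 1, 3, ..., 15 (1-based)
U = D.T

A_coarse = D @ F @ A @ F @ U                         # Galerkin matrix with F_I = F_R = F
K = np.eye(N) - F @ U @ np.linalg.solve(A_coarse, D @ F @ A)

assert np.allclose(K, W_H @ W_H.T)   # K_sharp = W_H W_H^T, as in (70)
assert np.allclose(K @ K, K)         # a second correction changes nothing
```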
In Section 5 we obtained theoretical results for the convergence rates based on the assumptions
stated in previous sections. In this section, we introduce examples to show how these results can be
applied to different kinds of systems. We consider systems based on different sets of eigenvectors:
Fourier harmonic modes, Hadamard harmonic modes, and a mixture of Fourier and Hadamard
harmonic modes.
6.1. Fourier harmonic analysis: trade-off between computational complexity and convergence
rate
We consider a 1D system in which A is a standard finite-difference discretization of a second-order
derivative with step size $h = 1$; i.e. the stencil of $A$ is $s = [-1\ 2\ -1]$, whose center entry is the
diagonal element. We apply Dirichlet boundary conditions, i.e. stencil $[2\ -1]$ at the left boundary
and $[-1\ 2]$ at the right boundary, which lead to an invertible system. The number of nodes in the
discretization is set to $N = 16$ and we consider a two-grid algorithm with a coarse-grid step size
of $2h = 2$. In addition, we assume the variational property that leads to a single inter-grid filter $F$.
The eigenvectors of $A$ are given by $(W)_{i,j} = \sqrt{2/17}\,\sin(\pi i j/17)$, with $i, j = 1, \ldots, 16$. The
eigenvector matrix $W$ is orthonormal and, after reversing the order of the columns $j = 9, \ldots, 16$, it also
fulfills the harmonic aliasing property. Therefore, our modal analysis can be directly applied to this
system. On the other hand, the extension of Fourier analysis from complex- to real-valued harmonic
functions is well known and LFA can therefore be applied to this system. Thus, the purpose of this
example is to (i) show how our method is applied to a standard system in which the eigenvectors can
be labeled by frequencies, thus giving an intuitive picture of what is happening and (ii) show how to
design inter-grid filters within our new framework and thus demonstrate the issue we discover in this
process.
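The eigenpairs quoted above can be checked numerically. This NumPy sketch (our own illustration, not code from the paper) builds A with the Dirichlet stencil described in the text, forms W with its columns in natural frequency order j = 1, . . . , 16, and confirms that W is orthonormal and that $W^TAW$ is diagonal with entries $2 - 2\cos(\pi j/17)$, the standard eigenvalues of this stencil. The last line only reproduces the column reversal mentioned in the text; the harmonic aliasing property itself is not checked here.

```python
import numpy as np

N = 16
# Stencil [-1, 2, -1] with Dirichlet rows [2, -1] (left) and [-1, 2] (right).
A = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

idx = np.arange(1, N + 1)
# (W)_{i,j} = sqrt(2/17) sin(pi i j / 17), columns in natural frequency order.
W = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(idx, idx) / (N + 1))
eigenvalues = 2.0 - 2.0 * np.cos(np.pi * idx / (N + 1))

print(np.allclose(W.T @ W, np.eye(N)))                 # True: orthonormal
print(np.allclose(W.T @ A @ W, np.diag(eigenvalues)))  # True: W diagonalizes A

# Column order described in the text: keep j = 1..8, reverse j = 9..16.
W_reordered = np.hstack([W[:, :8], W[:, 8:][:, ::-1]])
```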
For the inter-grid filter, we start with the common choice of linear interpolation and full weighting (LI/FW), and we consider applying them over an increasing number of neighbors per node. The standard choice for this system considers two neighbors per node, which leads to an
inter-grid filter F with stencil s = [0.5 1 0.5] and Dirichlet boundary conditions. Considering more
neighbors per node is equivalent to applying the inter-grid filter F several times in interpolation
Copyright q 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:219–247
DOI: 10.1002/nla
236 P. NAVARRETE MICHELINI AND E. J. COYLE
Table I. Spectral radii of modal convergence operators for the system in Section 6.1.
Filter            Lx→Lx    Hx→Lx    Lx→Hx    Hx→Hx
LI/FW 1-pass      0.4539   0.9915   0.4539   0.9915
LI/FW 2-passes    0.3647   0.5280   0.4388   1.0000
LI/FW 3-passes    0.2839   0.4946   0.4110   1.0000
LI/FW 4-passes    0.2149   0.4506   0.3745   1.0000
LI/FW 5-passes    0.1590   0.4011   0.3334   1.0000
LI/FW 6-passes    0.1155   0.3506   0.2914   1.0000
The results consider a two-grid approach using several passes of LI/FW as inter-grid operators.
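The multi-pass LI/FW filters of Table I can be sketched as follows, under our assumption that k passes means the plain matrix power $F^k$ with no rescaling. Because the 1-pass filter satisfies $F = 2I - A/2$, every power of F is diagonalized by the same sine basis, with eigenvalues $(1 + \cos(\pi j/17))^k$, while each extra pass widens the stencil by one neighbor on each side.

```python
import numpy as np

N = 16
A = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
idx = np.arange(1, N + 1)
W = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(idx, idx) / (N + 1))

# 1-pass LI/FW filter: stencil [0.5, 1, 0.5], truncated at the Dirichlet boundaries.
F = np.eye(N) + 0.5 * (np.eye(N, k=1) + np.eye(N, k=-1))
assert np.allclose(F, 2.0 * np.eye(N) - 0.5 * A)  # F is a polynomial in A


def bandwidth(M, tol=1e-12):
    """Largest |i - j| over the entries of M whose magnitude exceeds tol."""
    rows, cols = np.nonzero(np.abs(M) > tol)
    return int(np.max(np.abs(rows - cols)))


for passes in range(1, 7):
    F_k = np.linalg.matrix_power(F, passes)
    expected = (1.0 + np.cos(np.pi * idx / (N + 1))) ** passes
    assert np.allclose(W.T @ F_k @ W, np.diag(expected))  # same eigenvectors as A
    print(passes, bandwidth(F_k))  # neighbors reached on each side of a node
```

Since all entries of F are nonnegative, no cancellation occurs and the bandwidth grows exactly by one per pass; this is the communication cost that the trade-off discussed in this section refers to.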
DESIGN OF INTER-GRID OPERATORS 237
As a different choice of inter-grid operators, we try to approach the sharp inter-grid filter with a procedure common in signal processing. We select the eigenvalues of F in analogy with a Butterworth filter of order n [11]. We start at order n = 1 with a cut-off frequency of π/16, which attenuates all frequencies except the lowest frequency mode, and as the order n increases the cut-off frequency approaches π/2 geometrically, at which point the filter becomes perfectly sharp. That is,
$$B_n(i) = \frac{1}{1 + \left( \dfrac{2}{1-(7/8)^n} \cdot \dfrac{i-1}{N-1} \right)^{2n}}, \qquad i = 1, \ldots, 16 \qquad (71)$$
from which we construct the inter-grid filter as $F = W \Lambda W^T$ with $\Lambda = \operatorname{diag}(B_n)$. The main reason for moving the cut-off frequency with the order of the filter is to prevent the eigenvalues in Hx from producing large cross-frequency convergence rates.
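A minimal NumPy sketch of the Butterworth construction (71). We assume the columns of W are in natural frequency order, so that $B_n(i)$ multiplies the mode with frequency index i; with the reordered basis, $B_n$ would have to be permuted in the same way. As one possible measure (our choice) of the non-sparsity discussed next, the sketch also reports the fraction of each filter's Frobenius norm that lies outside the tridiagonal band.

```python
import numpy as np

N = 16
idx = np.arange(1, N + 1)
W = np.sqrt(2.0 / (N + 1)) * np.sin(np.pi * np.outer(idx, idx) / (N + 1))


def butterworth_eigs(n, N=16):
    """B_n(i) for i = 1..N, following (71)."""
    i = np.arange(1, N + 1)
    scale = 2.0 / (1.0 - (7.0 / 8.0) ** n)  # cut-off moves from pi/16 toward pi/2
    return 1.0 / (1.0 + (scale * (i - 1) / (N - 1)) ** (2 * n))


def butterworth_filter(n):
    """F = W diag(B_n) W^T, with the columns of W in natural frequency order."""
    return W @ np.diag(butterworth_eigs(n)) @ W.T


band = np.abs(np.subtract.outer(idx, idx)) <= 1  # tridiagonal mask
for n in (1, 3, 5, 7):
    F = butterworth_filter(n)
    nnz = np.count_nonzero(np.abs(F) > 1e-8)
    off_band = np.linalg.norm(F[~band]) / np.linalg.norm(F)
    print(n, nnz, round(off_band, 3))
```

The filter is symmetric by construction, and $B_n(1) = 1$ for every n, so the lowest mode is always passed unchanged.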
In Table II we show the spectral radii of Lx→Lx, Hx→Lx, Lx→Hx, and Hx→Hx for a two-grid approach using Butterworth filters of different orders. The Butterworth filter performs better than LI/FW, especially in terms of the cross-frequency convergence rate Hx→Lx. The main disadvantage of the Butterworth filter is that it is never sparse, as shown in Figure 4(e)–(h). Although increasing the order n makes the filter appear increasingly sparse, the combined contribution of the small entries remains comparable to that of the largest entries. Increasing n also concentrates the largest entries close to the diagonal, and the tridiagonal entries become similar to the LI/FW entries. This hints at the optimality of LI/FW as a tridiagonal inter-grid filter for this specific problem.
An important conclusion of these tests is that in the design of inter-grid filters for systems with
Fourier harmonic modes as eigenvectors, we face a trade-off between the number of multigrid steps
that can be saved by moving toward a sharp inter-grid filter and the number of communications
between neighboring nodes required for interpolation/restriction tasks. This is a consequence of
the Gibbs phenomenon, which is well known in Fourier analysis [11].
Table II. Spectral radii of modal convergence operators for the system in Section 6.1.
Filter   Lx→Lx    Hx→Lx    Lx→Hx    Hx→Hx
B1       0.4156   0.5826   0.4493   0.9982
B2       0.2932   0.4994   0.4150   1.0000
B3       0.1954   0.4350   0.3615   1.0000
B4       0.1246   0.3623   0.3011   1.0000
B5       0.0770   0.2925   0.2431   1.0000
B6       0.0467   0.2314   0.1923   1.0000
B7       0.0279   0.1807   0.1502   1.0000
The results consider a two-grid approach using Butterworth filters of different orders as the inter-grid filter.
Figure 4. Images of the magnitude of entries for different inter-grid filter matrices. Gray levels range from white for the largest magnitude to black for the smallest magnitude, on a logarithmic scale that accentuates the difference between small and zero entries: (a) LI/FW 1-pass; (b) LI/FW 3-passes; (c) LI/FW 5-passes; (d) LI/FW 7-passes; (e) B1; (f) B3; (g) B5; and (h) B7.
solution of the problem. Thus, the structure of the system is given by the transition probability
matrix within the transient states, which is obtained by the following recursion:
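$$T_1 = \tfrac{1}{2} \qquad (72)$$
$$T_l = \begin{bmatrix} T_{l-1} & 2^{-l}\,\tilde{I}_c \\ 2^{-l}\,\tilde{I}_c & T_{l-1} \end{bmatrix} \quad \text{for } l > 1 \qquad (73)$$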
where $\tilde{I}_c$ is a counter-diagonal matrix (ones on the anti-diagonal, zeros elsewhere) of the same size as $T_{l-1}$. The recursion (73) creates a matrix $T_l \in (\mathbb{R}^+)^{2^{l-1}\times 2^{l-1}}$ that is sub-stochastic, since the entries in each of its rows sum to at most 1. In fact, every row of $T_l$ sums to exactly $1 - 1/2^l$. Thus, in this Markov chain, each transient state has a probability of $1/2^l$ of jumping to one or more recurrent states in one step. An example of this structure is shown in Figure 5, where we can see the state transition diagram of the transient states for l = 4.
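The recursion (72)–(73) translates directly into a few lines of NumPy. This is a sketch of our own; the helper name `transient_matrix` is hypothetical. It checks the size, the symmetry and nonnegativity, and the constant row sums $1 - 2^{-l}$ stated above.

```python
import numpy as np


def transient_matrix(l):
    """T_l from (72)-(73): T_1 = 1/2, then a 2x2 block recursion."""
    T = np.array([[0.5]])
    for k in range(2, l + 1):
        J = np.fliplr(np.eye(T.shape[0]))  # counter-diagonal, same size as T_{k-1}
        c = 2.0 ** (-k)
        T = np.block([[T, c * J], [c * J, T]])
    return T


for l in range(1, 8):
    T = transient_matrix(l)
    assert T.shape == (2 ** (l - 1), 2 ** (l - 1))
    assert np.all(T >= 0) and np.allclose(T, T.T)
    # Every row sums to 1 - 1/2^l: the probability of leaving the transient
    # states in one step is 1/2^l, whatever the starting state.
    assert np.allclose(T.sum(axis=1), 1.0 - 2.0 ** (-l))

print(transient_matrix(3))
```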
Since, by definition, no recurrent state is connected to any transient state, once the process jumps from a transient to a recurrent state it never returns to any transient state and is said to have been absorbed. Starting from a given transient state i, $1 \le i \le 2^{l-1}$, the number of jumps within the transient states before jumping to a recurrent state is called the absorbing time, $t_i$. There are many applications associated with these so-called absorbing chains [14], for instance in the study of discrete phase-type distributions in queueing theory [15].
Here, we consider the problem of computing the expected value of the absorbing time when we start at node i, denoted by $(x_l)_i = E[t_i]$. The vector $x_l \in \mathbb{R}^{2^{l-1}}$ is given by the solution of the linear system
$$(I - T_l)\,x_l = \mathbf{1} \qquad (74)$$
where $(\mathbf{1})_i = 1$ for $i = 1, \ldots, 2^{l-1}$.
Figure 5. State transition diagram of the transient states for the Markov chain used in Section 6.2 with l = 4 (N = 8 nodes). Each connection with a solid line shows the probability of state transitions. The dashed lines with double arrows show the probability of transition to one or more recurrent states that do not appear in this figure.
Here, our system matrix is $A_l = I - T_l$, which is a non-singular, symmetric, positive-definite M-matrix. Furthermore, $A_l$ becomes ill-conditioned as l increases, creating a problem similar to that found in the numerical solution of linear PDEs. In the general context of absorbing chains, the matrix $A_l = I - T_l$ is called the fundamental matrix [14]. The inversion of this matrix is important because it also appears in the computation of moments of discrete phase-type distributions and of the probabilities of absorption by recurrent classes, among other problems.
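Two consequences of the constant row sums can be checked with a short NumPy sketch (ours, reusing the recursion (72)–(73) through a hypothetical helper). Since $T_l\mathbf{1} = (1 - 2^{-l})\mathbf{1}$, the solution of (74) is $x_l = 2^l\,\mathbf{1}$, so the expected absorbing time is the same from every transient state. Moreover, the recursion shifts every eigenvalue of $T_{l-1}$ by $\pm 2^{-l}$, which suggests that the eigenvalues of $A_l$ are the odd multiples of $2^{-l}$ in (0, 1) and that $\kappa_2(A_l) = 2^l - 1$; the code checks these claims numerically rather than proving them.

```python
import numpy as np


def transient_matrix(l):
    """T_l from (72)-(73)."""
    T = np.array([[0.5]])
    for k in range(2, l + 1):
        J = np.fliplr(np.eye(T.shape[0]))
        T = np.block([[T, 2.0 ** (-k) * J], [2.0 ** (-k) * J, T]])
    return T


for l in range(2, 9):
    N = 2 ** (l - 1)
    A = np.eye(N) - transient_matrix(l)
    x = np.linalg.solve(A, np.ones(N))  # expected absorbing times, system (74)
    eigs = np.linalg.eigvalsh(A)        # A_l is symmetric; eigenvalues ascending
    odd_multiples = np.arange(1, 2 ** l, 2) / 2.0 ** l
    assert np.allclose(x, 2.0 ** l)                   # same from every state
    assert np.allclose(eigs, odd_multiples)           # SPD, spectrum inside (0, 1)
    assert np.all(A - np.diag(np.diag(A)) <= 0)       # M-matrix sign pattern
    print(l, N, round(np.linalg.cond(A), 6))          # condition number 2^l - 1
```

The linear growth of the condition number with $2^{l-1}$ is what makes the problem resemble a discretized PDE.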
In the transition graph of this Markov chain, each node representing a transient state is connected to l neighboring nodes. However, the structure of connections changes from node to node, so that the stencil of $A_l$ varies from row to row. For instance, in the Markov chain of Figure 5, the fundamental matrix is
$$A_4 = \begin{bmatrix}
0.5 & -0.25 & 0 & -0.125 & 0 & 0 & 0 & -0.0625 \\
-0.25 & 0.5 & -0.125 & 0 & 0 & 0 & -0.0625 & 0 \\
0 & -0.125 & 0.5 & -0.25 & 0 & -0.0625 & 0 & 0 \\
-0.125 & 0 & -0.25 & 0.5 & -0.0625 & 0 & 0 & 0 \\
0 & 0 & 0 & -0.0625 & 0.5 & -0.25 & 0 & -0.125 \\
0 & 0 & -0.0625 & 0 & -0.25 & 0.5 & -0.125 & 0 \\
0 & -0.0625 & 0 & 0 & 0 & -0.125 & 0.5 & -0.25 \\
-0.0625 & 0 & 0 & 0 & -0.125 & 0 & -0.25 & 0.5
\end{bmatrix} \qquad (75)$$
Here, the stencil at the 3rd row is s3 = [−0.125, 0.5, −0.25, 0, −0.0625], whose second entry (0.5) is the diagonal element, whereas the stencil at the 4th row is s4 = [−0.125, 0, −0.25, 0.5, −0.0625], whose fourth entry (0.5) is the diagonal element. Therefore, the assumptions of LFA are not fulfilled and its analysis does not apply to this system. Nevertheless, in the tests that follow we will ignore this fact, as we wish to see what convergence rates LFA predicts for a system where its assumptions do not hold.
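The following NumPy sketch (our own check; the helper names are hypothetical) confirms that the recursion (72)–(73) reproduces the matrix (75) and lists the row stencils as maps from column offset to entry, which makes the row-to-row variation explicit.

```python
import numpy as np


def transient_matrix(l):
    """T_l from (72)-(73)."""
    T = np.array([[0.5]])
    for k in range(2, l + 1):
        J = np.fliplr(np.eye(T.shape[0]))
        T = np.block([[T, 2.0 ** (-k) * J], [2.0 ** (-k) * J, T]])
    return T


A4 = np.eye(8) - transient_matrix(4)

# Matrix (75), entered by hand.
A4_printed = np.array([
    [0.5, -0.25, 0, -0.125, 0, 0, 0, -0.0625],
    [-0.25, 0.5, -0.125, 0, 0, 0, -0.0625, 0],
    [0, -0.125, 0.5, -0.25, 0, -0.0625, 0, 0],
    [-0.125, 0, -0.25, 0.5, -0.0625, 0, 0, 0],
    [0, 0, 0, -0.0625, 0.5, -0.25, 0, -0.125],
    [0, 0, -0.0625, 0, -0.25, 0.5, -0.125, 0],
    [0, -0.0625, 0, 0, 0, -0.125, 0.5, -0.25],
    [-0.0625, 0, 0, 0, -0.125, 0, -0.25, 0.5],
])
print(np.allclose(A4, A4_printed))  # True


def row_stencil(A, r):
    """Map column offset (j - r) -> entry, for the nonzeros of row r (0-based)."""
    return {int(j - r): float(A[r, j]) for j in np.flatnonzero(A[r])}


print(row_stencil(A4, 2))  # {-1: -0.125, 0: 0.5, 1: -0.25, 3: -0.0625}
print(row_stencil(A4, 3))  # {-3: -0.125, -1: -0.25, 0: 0.5, 1: -0.0625}
```

A constant-coefficient operator, as LFA assumes, would give the same dictionary for every row.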
In fact, the eigenvectors of the fundamental matrix $A_l$ do not correspond to the Fourier harmonic modes of LFA; instead, they form a Hadamard matrix of order $N = 2^{l-1}$. One of the standard ways to construct this matrix is Sylvester's construction [16], but the basis obtained by this procedure does not fulfill the harmonic aliasing property. As in the previous example, we need to reorder the columns of the eigenvector matrix in order to obtain the right structure. Therefore, we introduce a column-reordered variation of Sylvester's construction as follows:
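$$W_1 = 1 \qquad (76)$$
$$W_{l+1} = \frac{1}{\sqrt{2}} \begin{bmatrix} U & \bar{U} \end{bmatrix} \begin{bmatrix} W_l & W_l \\ W_l & -W_l \end{bmatrix} \qquad (77)$$
where U and Ū correspond to uniform up-sampling and up-unselecting matrices of size $2^l \times 2^{l-1}$.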
The matrix [U Ū] acts as a permutation matrix that reorders the columns of the new basis. From this construction, it can easily be checked by induction that the matrix $W_l$ is orthonormal and that it fulfills the harmonic aliasing property. The same arguments show that $W_l$ diagonalizes the system matrix $A_l$. Furthermore, the orthogonality of $W_l$ and Equation (77) allow us to obtain a closed-form expression for the sharp inter-grid filter, as defined in (69). That is,
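$$F_{\text{sharp},l+1} = W_{l+1} \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} W_{l+1}^T = \frac{1}{2}\left(I + U\bar{D} + \bar{U}D\right) = \frac{1}{2} \begin{bmatrix} B & & \\ & \ddots & \\ & & B \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \qquad (78)$$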
The structure of the filter turns out to be very sparse, unlike the sharp filter for the previous
example. This filter alternately averages the values at each node with its left neighbor and then its
right neighbor.
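The construction (77) and the closed form (78) can be checked with the sketch below. It is our own illustration and rests on one assumed convention: U places the coarse entries at the odd-numbered positions (1-based) and Ū at the even-numbered ones. With that choice the first half of the columns of $W_l$ spans the vectors that are constant on node pairs, and $W_L W_L^T$ reproduces the block form of (78).

```python
import numpy as np


def transient_matrix(l):
    """T_l from (72)-(73)."""
    T = np.array([[0.5]])
    for k in range(2, l + 1):
        J = np.fliplr(np.eye(T.shape[0]))
        T = np.block([[T, 2.0 ** (-k) * J], [2.0 ** (-k) * J, T]])
    return T


def hadamard_basis(l):
    """W_l from (76)-(77), under the assumed U / U_bar convention."""
    W = np.array([[1.0]])
    for _ in range(1, l):
        n = W.shape[0]
        U = np.zeros((2 * n, n))
        U[0::2, :] = np.eye(n)       # up-sampling to odd-numbered positions
        U_bar = np.zeros((2 * n, n))
        U_bar[1::2, :] = np.eye(n)   # complementary even-numbered positions
        W = np.hstack([U, U_bar]) @ np.block([[W, W], [W, -W]]) / np.sqrt(2.0)
    return W


l = 5
N = 2 ** (l - 1)
A = np.eye(N) - transient_matrix(l)
W = hadamard_basis(l)
D = W.T @ A @ W

print(np.allclose(W.T @ W, np.eye(N)))      # True: orthonormal
print(np.allclose(D, np.diag(np.diag(D))))  # True: W diagonalizes A_l

W_L = W[:, : N // 2]
F_sharp = W_L @ W_L.T                       # W diag(I, 0) W^T
pairs = 0.5 * np.kron(np.eye(N // 2), np.ones((2, 2)))
print(np.allclose(F_sharp, pairs))          # True: block form of (78)
```

In this ordering the first half of the eigenvalues of $A_l$ lies below 1/2 and the second half above it, so the split into L and H groups coincides with the split into pair-constant and pair-alternating vectors.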
In our analysis, the inter-grid filter $F_l$ and the smoothing operator $S_l$ should be designed to match the structure of the system. For this reason, our analysis would not work if we used standard inter-grid operators such as LI/FW, because the eigenvectors of the LI/FW filter are Fourier harmonic modes, which differ from the Hadamard harmonic modes. As the sharp inter-grid filter in (78) has a sparse structure, we choose it as the inter-grid filter. As in the previous example, for the smoothing filter we use the Richardson iteration scheme, which leads to a smoothing filter $S_l = I - (1/\omega)A_l$, with $\omega = 1 - 2^{-l}$ obtained from the Gershgorin bound of $A_l$. Since the sharp inter-grid filter removes all the Lx components of the error, the only parameters to configure are the
number of smoothing iterations. This means that we need only one iteration of the full two-grid algorithm with O(1) smoothing iterations to make the algorithm converge. In contrast, a standard choice of LI/FW inter-grid operators performs worse than the sharp inter-grid configuration, as shown in Table III.
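The sharp-filter column of Table III can be reproduced with a few lines of NumPy under assumptions that we state explicitly. The coarse-grid correction is taken as $K = I - P(P^TA_lP)^{-1}P^TA_l$, with P the piecewise-constant aggregation of node pairs, whose range equals the range of $F_{\text{sharp},l}$. The reported quantity is the spectral radius of $S_l K S_l$, i.e. one pre-smoothing and one post-smoothing Richardson step. With these choices the sketch gives 0.0000, 0.0816, 0.1600 and 0.2040 for N = 2, 4, 8 and 16, in agreement with Table III; the LI/FW column is not reproduced here.

```python
import numpy as np


def transient_matrix(l):
    """T_l from (72)-(73)."""
    T = np.array([[0.5]])
    for k in range(2, l + 1):
        J = np.fliplr(np.eye(T.shape[0]))
        T = np.block([[T, 2.0 ** (-k) * J], [2.0 ** (-k) * J, T]])
    return T


def sharp_two_grid_rate(l):
    """Spectral radius of S K S for the Markov-chain system A_l (assumed setup)."""
    N = 2 ** (l - 1)
    A = np.eye(N) - transient_matrix(l)
    omega = 1.0 - 2.0 ** (-l)                     # Gershgorin bound of A_l
    S = np.eye(N) - A / omega                     # Richardson smoother
    P = np.kron(np.eye(N // 2), np.ones((2, 1)))  # aggregates node pairs
    A_coarse = P.T @ A @ P                        # Galerkin coarse matrix
    K = np.eye(N) - P @ np.linalg.solve(A_coarse, P.T @ A)  # coarse-grid correction
    return float(np.max(np.abs(np.linalg.eigvals(S @ K @ S))))


for l in range(2, 6):
    print(2 ** (l - 1), f"{sharp_two_grid_rate(l):.4f}")
# 2 0.0000
# 4 0.0816
# 8 0.1600
# 16 0.2040
```

Because S, K and $A_l$ share the eigenvectors $W_l$, the rate is simply the square of the largest smoothing factor over the H modes, $\bigl((2^{l-1} - 2)/(2^l - 1)\bigr)^2$, which tends to 1/4 as N grows; this is consistent with the slow approach to 0.25 in Table III.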
As this scenario is rather unusual in the context of PDEs, where the eigenvectors are typically similar to Fourier harmonic modes (which are subject to the Gibbs phenomenon, as shown in Section 6.1), we would like to understand how the sparse inter-grid filter arranges the information to reach convergence in one step. To understand this, we need to consider three facts. First, the sharp inter-grid filter alternately averages the value at each node with its left and then its right neighbor. Second, the coarse-grid matrix $\check{A}_l$ constructed from $A_l$ and $F_{\text{sharp},l}$ using the Galerkin condition is equal to $A_{l-1}$ as defined by the recursion (72)–(73) (this can be checked by induction). This would not have been the case had we used a different inter-grid filter such as LI/FW. We can therefore say that the sharp inter-grid filter unveils the recursive structure by which we defined the system. This property is also attractive because the coarse-grid problem again represents an absorbing Markov chain; thus the sharp inter-grid filter makes the two-grid algorithm an aggregation method, similar to what is sought in [17] using a different multi-level approach.
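The second fact can be checked numerically with the sketch below. As our own convention, interpolation is taken as $P = 2F_{\text{sharp},l}U$, with U placing coarse values at the odd-numbered nodes, which amounts to piecewise-constant interpolation over node pairs, and restriction as $P^T$. The factor 2 is our choice of scaling. With it, the Galerkin product $P^TA_lP$ coincides with $A_{l-1}$ from (72)–(73), and $I - P^TA_lP$ is again sub-stochastic, i.e. the coarse problem is again an absorbing chain.

```python
import numpy as np


def transient_matrix(l):
    """T_l from (72)-(73)."""
    T = np.array([[0.5]])
    for k in range(2, l + 1):
        J = np.fliplr(np.eye(T.shape[0]))
        T = np.block([[T, 2.0 ** (-k) * J], [2.0 ** (-k) * J, T]])
    return T


def galerkin_coarse(l):
    """P^T A_l P with P = 2 F_sharp U (assumed scaling and U convention)."""
    N = 2 ** (l - 1)
    A_fine = np.eye(N) - transient_matrix(l)
    F_sharp = 0.5 * np.kron(np.eye(N // 2), np.ones((2, 2)))  # block form of (78)
    U = np.zeros((N, N // 2))
    U[0::2, :] = np.eye(N // 2)  # up-sampling to odd-numbered nodes
    P = 2.0 * F_sharp @ U         # piecewise-constant over node pairs
    return P.T @ A_fine @ P


for l in range(2, 8):
    A_coarse = galerkin_coarse(l)
    T_coarse = np.eye(A_coarse.shape[0]) - A_coarse
    assert np.allclose(A_coarse, np.eye(2 ** (l - 2)) - transient_matrix(l - 1))
    assert np.all(T_coarse >= -1e-15) and np.all(T_coarse.sum(axis=1) < 1.0)
print("Galerkin coarse matrix equals A_{l-1} for l = 2..7")
```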
The third fact is that the structure of our system induces a hierarchical classification of nodes. Namely, we can define classes of nodes by the strength of their connections, as is usually done in AMG methods [2]. Two nodes i and j belong to the same class if they have a transition probability $(P)_{i,j} \ge 1/2^c$, with $1 \le c \le l$. For instance, in the system of Figure 5, for c = 1 we have eight singleton classes, each containing one transient state; for c = 2 we have four classes: {1, 2}, {3, 4}, {5, 6}, and {7, 8}; for c = 3 we have two classes: {1, 2, 7, 8} and {3, 4, 5, 6}; and finally for c = 4 we have one class with the whole set of nodes. This classification of nodes is shown in Figure 6.
Finally, we can see how these three facts combine. The sharp inter-grid filter averages the most strongly connected nodes, which lie alternately to the left and to the right of each node. These nodes belong to the same class defined above for c = 2 and, since the classes for $1 \le c \le l$ are nested (see Figure 6), the sharp inter-grid filter guarantees a similar structure on the coarse grid. This did not happen in the example of Section 6.1 because in that case we could not separate classes with a nested structure. This fact seems to be crucial for obtaining an optimal inter-grid filter for the Markov chain problem.
        (SK)^2
N       Sharp filter   LI/FW
2       0.0000         0.2500
4       0.0816         0.2030
8       0.1600         0.2700
16      0.2040         0.3447
32      0.2268         0.3955
64      0.2383         0.4428
128     0.2442         0.4817
256     0.2471         0.5156
The configuration considers one step of the full two-grid algorithm with one pre-smoothing and one post-smoothing Richardson iteration. The results compare the convergence rates obtained using a sharp inter-grid filter or LI/FW as inter-grid operators.
Figure 6. Classification of nodes by the strength of their connection for the Markov chain in Figure 5. Considering only the strongest connections, we start (white) with eight singleton classes. As weaker connections are included, we obtain four classes, two classes, and finally one class with the whole set of nodes, represented in light to dark gray, respectively. The classification leads to a nested structure of classes.
Table IV. Spectral radii of modal convergence operators for the system in Section 6.2.
Analysis       Lx→Lx    Hx→Lx    Lx→Hx    Hx→Hx
MA ∀N          0        0        0        1
LFA N = 2      0        0        0        1
LFA N = 4      0.0528   0.2236   0.2236   0.9472
LFA N = 8      0.1702   0.3758   0.3758   0.9803
LFA N = 16     0.2877   0.4527   0.4527   0.9936
LFA N = 32     0.3739   0.4838   0.4838   0.9981
LFA N = 64     0.4283   0.4948   0.4948   0.9995
LFA N = 128    0.4602   0.4984   0.4984   0.9999
LFA N = 256    0.4783   0.4995   0.4995   1.0000
The results consider a two-grid approach using the sharp inter-grid filter from (78). The first row shows the results of our modal analysis (MA), which do not change with the problem size. The following rows show the estimates of LFA (working under incorrect assumptions) for systems of increasing size.
In terms of convergence factors for this example, our analysis and LFA give different results when LFA is applied while ignoring the fact that its assumptions are not fulfilled. This is shown in Table IV, where the convergence estimated by our method and by LFA coincide only for grid size N = 2. This is because N = 2 is the only size for which the Hadamard basis is the same as the Fourier basis. For N > 2, LFA gives increasingly pessimistic estimates of the convergence factors.
We can also check how different the convergence analysis would be if we chose LI/FW for the inter-grid operators. The multigrid algorithm can still be run with these inter-grid operators, but then neither LFA nor our analysis can be applied to obtain information about modal convergence. This is because the Fourier harmonic modes of the LI/FW inter-grid filter do not match the Hadamard harmonic modes of the system. If we ignore this limitation and use the Hadamard harmonic basis to estimate the convergence of a two-grid step, we obtain the results of Table V. If, on the other hand, we use a Fourier harmonic basis to estimate convergence rates (which corresponds to LFA), we obtain the results in Table VI. The Hadamard analysis leads to a more pessimistic estimate, but it is not possible to determine which result is more accurate, because the definitions of the L and H groups of modes do not strictly apply under either analysis.
The conclusion of this experiment is that, with an arbitrary choice of inter-grid operators, the heuristics of the multigrid methodology cannot be applied if groups of L and H modes cannot be defined. The choice of LI/FW inter-grid operators still seems to keep the algorithm stable, because the estimated convergence factors are always less than 1, but its performance is clearly inferior to that of the optimal sharp inter-grid filter for this system.
Thus, in this case our analysis has proved more useful than LFA for studying convergence rates. Its main advantage appears in the design of inter-grid filters and smoothing operators.
Table VI. Spectral radii of modal convergence operators for different sizes of the system in Section 6.2.
N     Lx→Lx    Hx→Lx    Lx→Hx    Hx→Hx
4     0.2205   0.6765   0.3841   0.8843
8     0.2782   0.7038   0.4527   0.9630
16    0.3597   0.6907   0.4660   0.9915
32    0.4150   0.7805   0.4770   0.9978
64    0.4514   0.8945   0.4879   0.9995
The results consider a two-grid approach, using LI/FW as the inter-grid operators and assuming Fourier harmonic modes as eigenvectors of the system matrix (wrong assumption) and of the inter-grid filter (valid assumption).
Thus, the system matrix $A_{xy} \in \mathbb{R}^{256\times 256}$ is a mixture of matrices with different eigenvectors. Although the problem does not represent any well-known system in applications, we choose it in order to show how our analysis applies to mixtures of very different systems. A more realistic scenario of this kind would be, for example, a 2D diffusion equation with a diffusion coefficient that varies along one of the dimensions. The difficulty in that case is to check the harmonic aliasing property, which remains a problem for future research.
Since $A_y$ does not have constant stencil coefficients, neither does $A_{xy}$. Therefore the assumptions of LFA are not fulfilled. However, since the system fulfills the assumptions introduced in Section 4, we are able to apply our modal analysis.
Here, the eigenvectors of the system matrix $A_{xy}$ are given by $W_x \otimes W_y$, where $W_x$ are Fourier harmonic modes and $W_y$ are Hadamard harmonic modes. From the results of Section 5.2, we know that although the eigenvectors of a system represented by sums of Kronecker products are separable, the convergence rates are not. Thus, the design of inter-grid operators cannot, in general, treat each dimension independently of the others. However, since in the y-dimension we can implement optimal inter-grid operators using the sharp inter-grid filter in (78), the two problems can be decoupled. If we choose the inter-grid filter $F_{xy} = F_x \otimes F_y$, with the 1-pass LI/FW inter-grid filter as $F_x$ (suitable for Fourier harmonic eigenvectors) and the sharp inter-grid filter in (78) as $F_y$ (optimal for Hadamard harmonic modes), we obtain the convergence rates shown in Table VII for the two-grid algorithm. This combination of inter-grid filters completely removes the cross-modal convergence factors for the modal transfers Hy → Ly and Ly → Hy. For the modal transfers Hy → Hy, we observe complete removal of cross-modal error components (HxHy → LxHy and LxHy → HxHy) and complete transfer of self-mode error components (LxHy → LxHy and HxHy → HxHy). For the modal transfers Ly → Ly, we observe results similar to those obtained for the 1-pass LI/FW inter-grid filter in Section 6.1.
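The separability of the eigenvectors, and the reason the tensor-product filter does not mix the two dimensions, can be illustrated with the sketch below. The exact definition of $A_{xy}$ is not reproduced here; purely for illustration we assume the Kronecker-sum form $A_{xy} = A_x \otimes I + I \otimes A_y$, with $A_x$ the 16-node Dirichlet matrix of Section 6.1 and $A_y = A_5$ from Section 6.2, which matches the stated size 256 × 256. The Hadamard-type basis uses a U / Ū convention we chose so that the pair-constant vectors come first.

```python
import numpy as np


def transient_matrix(l):
    """T_l from (72)-(73)."""
    T = np.array([[0.5]])
    for k in range(2, l + 1):
        J = np.fliplr(np.eye(T.shape[0]))
        T = np.block([[T, 2.0 ** (-k) * J], [2.0 ** (-k) * J, T]])
    return T


def hadamard_basis(l):
    """W_l from (76)-(77), with U filling odd positions (assumed convention)."""
    W = np.array([[1.0]])
    for _ in range(1, l):
        n = W.shape[0]
        U = np.zeros((2 * n, n))
        U[0::2, :] = np.eye(n)
        U_bar = np.zeros((2 * n, n))
        U_bar[1::2, :] = np.eye(n)
        W = np.hstack([U, U_bar]) @ np.block([[W, W], [W, -W]]) / np.sqrt(2.0)
    return W


# x-direction: Fourier (sine) modes of the 16-node Dirichlet stencil [-1, 2, -1]
Nx = 16
idx = np.arange(1, Nx + 1)
A_x = 2.0 * np.eye(Nx) - np.eye(Nx, k=1) - np.eye(Nx, k=-1)
W_x = np.sqrt(2.0 / (Nx + 1)) * np.sin(np.pi * np.outer(idx, idx) / (Nx + 1))
F_x = np.eye(Nx) + 0.5 * (np.eye(Nx, k=1) + np.eye(Nx, k=-1))  # 1-pass LI/FW

# y-direction: Hadamard modes of the Markov-chain matrix A_5 (16 nodes)
Ny = 16
A_y = np.eye(Ny) - transient_matrix(5)
W_y = hadamard_basis(5)
F_y = W_y[:, : Ny // 2] @ W_y[:, : Ny // 2].T  # sharp filter (78)

# Hypothetical Kronecker-sum mixture and tensor-product inter-grid filter
A_xy = np.kron(A_x, np.eye(Ny)) + np.kron(np.eye(Nx), A_y)
F_xy = np.kron(F_x, F_y)
W_xy = np.kron(W_x, W_y)


def is_diagonal(M, tol=1e-10):
    """True if M has no off-diagonal entries above tol."""
    return np.allclose(M, np.diag(np.diag(M)), atol=tol)


print(A_xy.shape)                         # (256, 256)
print(is_diagonal(W_xy.T @ A_xy @ W_xy))  # True: separable eigenvectors
print(is_diagonal(W_xy.T @ F_xy @ W_xy))  # True: F_xy acts mode by mode
```

Diagonal filters in a separable basis are necessary for the decoupling seen in Table VII, but not sufficient on their own: the convergence rates still depend on how the smoother and coarse-grid correction combine across dimensions, as discussed above.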
As in the previous example, we can ignore the fact that the assumptions of LFA are not fulfilled in this problem and compute its estimates of the convergence rates. These results are shown in Table VIII, where we see that the estimates are not too far from those of our modal analysis. Apart from being an approximation, LFA has the further disadvantage that its results are harder to interpret, since they show no decoupling between the two dimensions of the problem.
Finally, we consider a different, common choice of inter-grid operators: a 2D LI/FW operator. This operator leads to an inter-grid filter $F_{xy} = F_x \otimes F_y$, where both $F_x$ and $F_y$ are 1D, 1-pass LI/FW filters. As in the example of Section 6.2, this choice of inter-grid operators makes neither our modal analysis nor LFA applicable to this problem. Tables IX and X show the estimates of our analysis, based on a Fourier–Hadamard basis, and of LFA, respectively. The results are very similar, with our analysis giving slightly more pessimistic results than LFA.
Table VII. Spectral radii of modal convergence operators for the system in Section 6.3 using our modal analysis.
          LxLy     HxLy     LxHy     HxHy
→ LxLy    0.4532   0.8503   0        0
→ HxLy    0.4611   0.9994   0        0
→ LxHy    0        0        1        0
→ HxHy    0        0        0        1
The 16 convergence factors are organized according to the subscripts of modal convergence operators indicating transfer from the four combinations of modes in the columns to the four combinations of modes in the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter for the x-dimension and the sharp inter-grid filter in (78) for the y-dimension.
Table VIII. Spectral radii of modal convergence operators for the system in Section 6.3 using LFA (under wrong assumptions).
          LxLy     HxLy     LxHy     HxHy
→ LxLy    0.6063   0.8420   0.4523   0.2935
→ HxLy    0.4547   0.9995   0.2080   0.2024
→ LxHy    0.4523   0.2935   0.9965   0.1878
→ HxHy    0.2080   0.2024   0.1322   1.0000
The 16 convergence factors are organized according to the subscripts of modal convergence operators indicating transfer from the four combinations of modes in the columns to the four combinations of modes in the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter for the x-dimension and the sharp inter-grid filter in (78) for the y-dimension.
Table IX. Spectral radii of modal convergence operators for the system in Section 6.3 using our modal analysis (under incorrect assumptions).
          LxLy     HxLy     LxHy     HxHy
→ LxLy    0.7126   0.8287   0.7548   0.2509
→ HxLy    0.4533   0.9997   0.1892   0.1798
→ LxHy    0.3730   0.2177   0.9982   0.2957
→ HxHy    0.1432   0.1433   0.2226   1.0000
The 16 convergence factors are organized according to the subscripts of modal convergence operators indicating transfer from the four combinations of modes in the columns to the four combinations of modes in the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter in both x- and y-dimensions. It is assumed that Fourier harmonic modes are eigenvectors of the operators in the x-dimension (valid assumption) and that Hadamard basis vectors are eigenvectors of the operators in the y-dimension (valid for the system matrix and false for the inter-grid filter).
Table X. Spectral radii of modal convergence operators for the system in Section 6.3 using LFA (under incorrect assumptions).
          LxLy     HxLy     LxHy     HxHy
→ LxLy    0.6722   0.8313   0.6119   0.3030
→ HxLy    0.4553   0.9996   0.2253   0.2177
→ LxHy    0.4714   0.3026   0.9999   0.2528
→ HxHy    0.2257   0.2177   0.1890   1.0000
The 16 convergence factors are organized according to the subscripts of modal convergence operators indicating transfer from the four combinations of modes in the columns to the four combinations of modes in the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter in both x- and y-dimensions. It is assumed that Fourier harmonic modes are eigenvectors of the operators in both x- and y-dimensions (false only for the system matrix in the y-dimension).
This choice of inter-grid operators has several disadvantages. First and most important, it does not allow us to define groups of L and H modes. Second, if these groups of modes are defined arbitrarily, using either our analysis or LFA, we observe strong coupling in the cross-modal convergence rates. Finally, the convergence rate for the modal transfer LxLy → LxLy, which is the most important task of the two-grid algorithm, is far from the rate achieved by the Fourier–Hadamard inter-grid operators in Table VII. This last fact has a visible consequence for the final algorithm, which can be observed by using a smoothing filter $S_{xy} = S_x \otimes S_y$, where $S_x$ and $S_y$ correspond to the Richardson iteration scheme as configured in Sections 6.1 and 6.2, respectively. A single full two-grid step with one pre-smoothing and one post-smoothing iteration then shows a convergence rate of (SK)^2 = 0.2301 for our inter-grid configuration, compared with (SK)^2 = 0.3037 obtained by using a 2D LI/FW inter-grid operator.
Here, our analysis has proved more useful than LFA for the design of a 2D inter-grid filter, as the combination of LI/FW with a sharp inter-grid filter shows good performance and perfect decoupling between the convergence rates of the different dimensions.
7. CONCLUSIONS
In this paper we introduced new tools for the analysis of the linear multigrid algorithm. These tools allowed us to reveal and study the roles of the smoothing and inter-grid operators in multigrid convergence. In most applications of multigrid methods, these operators are designed based on the geometry and heuristics of the problem. We see this as a significant limitation for distributed applications, because in such scenarios it is essential to minimize the number of iterations the algorithm requires to converge.
The main contribution of this paper is a new approach to convergence analysis, together with new design techniques for inter-grid and smoothing operators. We have shown how this analysis differs from LFA, which is considered to be the standard tool for the analysis and design of multigrid methods [7]. Our study shows the clear advantages of our approach for systems with non-uniform stencils. By considering different systems, we showed that there is no general approach to optimizing the multigrid operators for a given system. For systems with Fourier harmonic modes as eigenvectors, we face a trade-off between the computational complexity and the convergence rate of each multigrid step. For systems with a Hadamard basis as eigenvectors, we are able to obtain optimal multigrid operators that make the algorithm converge in one step, with O(1) smoothing iterations, which is possible due to the particular structure of the system. The same multigrid operators show perfect decoupling in a mixture of two different systems where one of the operators has a Hadamard basis as eigenvectors. Our modal analysis proved crucial for unveiling these properties and for showing the exact influence of each operator on the convergence behavior of the algorithm.
We note that, given the assumptions imposed on the system, we were able to analyze multigrid convergence without heuristics based on the geometry of the problem. This opens the possibility of designing a fully algebraic multigrid method if the required assumptions are satisfied. Nevertheless, this is not a straightforward step, because the harmonic aliasing property is strongly connected with the geometry of the problem. The main difficulty in our approach is checking our assumptions on the eigenvectors of the system. For future research, we are studying practical methods to check these assumptions, as well as modifications that make them more flexible and easier to check and manage.
REFERENCES
1. Brandt A. Algebraic multigrid theory: the symmetric case. Applied Mathematics and Computation 1986; 19:23–56.
2. Ruge JW, Stüben K. Algebraic multigrid (AMG). In Multigrid Methods, Frontiers in Applied Mathematics, vol.
3, McCormick SF (ed.). SIAM: Philadelphia, PA, 1987; 73–130.
3. Brandt A, McCormick SF, Ruge JW. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and
its Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, 1984.
4. Yang UM. Parallel algebraic multigrid methods – high performance preconditioners. In Numerical Solutions of PDEs on Parallel Computers, Bruaset AM, Bjørstad P, Tveito A (eds), Lecture Notes in Computational Science and Engineering. Springer: Berlin, 2005.
5. Brandt A. Rigorous quantitative analysis of multigrid, I: constant coefficients two-level cycle with l2-norm. SIAM
Journal on Numerical Analysis 1994; 31(6):1695–1730.
6. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977; 31:
333–390.
7. Trottenberg U, Oosterlee CW, Schüller A. Multigrid. Academic Press: London, 2000.
8. Mallat S. A Wavelet Tour of Signal Processing (2nd edn), Wavelet Analysis and its Applications. Academic
Press: New York, 1999.
9. Briggs WL, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000.
10. Wesseling P. An Introduction to Multigrid Methods. Wiley: Chichester, 1992.
11. Proakis JG, Manolakis DG. Digital Signal Processing (2nd edn), Principles, Algorithms, and Applications.
Macmillan: Indianapolis, IN, 1992.
12. Laub AJ. Matrix Analysis for Scientists and Engineers. SIAM: Philadelphia, PA, 2005.
13. Davis PJ. Circulant Matrices. A Wiley-Interscience Publication, Pure and Applied Mathematics. Wiley: New
York, Chichester, Brisbane, 1979.
14. Brémaud P. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer: New York, 1999.
15. Neuts MF. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University
Press: Baltimore, MD, 1981.
16. Sylvester JJ. Thoughts on inverse orthogonal matrices, simultaneous sign-successions, and tesselated pavements in two or more colours, with applications to Newton's rule, ornamental tile-work, and the theory of numbers. Philosophical Magazine 1867; 34:461–475.
17. De Sterck H, Manteuffel T, McCormick SF, Nguyen Q, Ruge JW. Markov chains and web ranking: a multilevel
adaptive aggregation method. Thirteenth Copper Mountain Conference on Multigrid Methods, Copper Mountain,
CO, U.S.A., 2007.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:249–269
Published online 15 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.575
SUMMARY
Consider the linear system Ax = b, where A is a large, sparse, real, symmetric, and positive-definite
matrix and b is a known vector. Solving this system for the unknown vector x using a smoothed aggregation
(SA) multigrid algorithm requires a characterization of the algebraically smooth error, meaning error
that is poorly attenuated by the algorithm’s relaxation process. For many common relaxation processes,
algebraically smooth error corresponds to the near-nullspace of A. Therefore, a good approximation
to a minimal eigenvector is useful for characterizing the algebraically smooth error when forming a linear
SA solver. We discuss the details of a generalized eigensolver based on smoothed aggregation (GES-SA)
that is designed to produce an approximation to a minimal eigenvector of A. GES-SA may be applied
as a stand-alone eigensolver for applications that require an approximate minimal eigenvector, but the
primary purpose here is to apply an eigensolver to the specific application of forming robust, adaptive
linear solvers. This paper reports the first stage in our study of incorporating eigensolvers into the existing
adaptive SA framework. Copyright © 2008 John Wiley & Sons, Ltd.
∗ Correspondence to: G. Sanders, Department of Applied Mathematics, University of Colorado at Boulder, UCB 526,
Boulder, CO 80309-0526, U.S.A.
† E-mail: sandersg@colorado.edu
‡ University of Colorado at Boulder and Front Range Scientific Computing.
Contract/grant sponsor: University of California Lawrence Livermore National Laboratory; contract/grant number:
W-7405-Eng-48
1. INTRODUCTION
In the spirit of algebraic multigrid (AMG) [1–5], smoothed aggregation (SA) multigrid [6] has
been designed to solve a linear system of equations with little or no prior knowledge regarding
the geometry or physical properties of the underlying problem. Therefore, SA is often an efficient
solver for problems discretized on unstructured meshes, with varying coefficients, or with no
associated geometry. The relaxation processes commonly used in multigrid solvers are computationally
cheap, but they often fail to reduce certain types of error adequately; we call such error
algebraically smooth with respect to the given relaxation. If a characterization of
algebraically smooth error is known, in the form of a small set of prototype vectors, the SA
framework constructs intergrid transfer operators that allow such error to be eliminated on coarser
grids, where relaxation is more economical. For example, in a 3D elasticity problem, six such
components (the so-called rigid body modes) form an adequate characterization of the algebraically
smooth error. Rigid body modes are often available from discretization packages, and a solver can
be produced with these vectors in the SA framework [6]. However, such a characterization is not
always readily available (even for some scalar problems) and must be developed in an adaptive
process.
Adaptive SA (αSA), as presented in [7], was designed specifically to create a representative set
of vectors for cases where a characterization of algebraically smooth error is not known. Initially,
simple relaxation is performed on a homogeneous version of the problem for all levels of the
multigrid hierarchy being constructed. These coarse-level approximations are used to achieve a
global-scale update that serves as our first prototype vector that is algebraically smooth with respect
to relaxation. Using this one resulting component, the SA framework is employed to construct a
linear multigrid solver, and the whole process can be repeated with the updated solver playing the
role of relaxation on each multigrid level. At each step, the adequacy of the solver is assessed by
monitoring convergence factors, and if the current solver is deemed adequate, then the adaptive
process is terminated and the current solver is retained.
We consider applying SA to an algebraic system of equations Ax = b, where $A = (a_{ij})$ is an n × n
symmetric, positive-definite (SPD) matrix that is symmetrically scaled so that its diagonal entries
are all ones. For simplicity, we use damped Jacobi for our initial relaxation. The SA framework
provides an interpolation operator, P, that is used to define a coarse level with standard Galerkin
variational corrections. If the relaxation process is a convergent iteration, then it is known from the
literature (e.g. [1, 8]) that a sufficient condition for two-level convergence factors bounded away from
one is that for any u on the fine grid, there exists a v on the coarse grid such that
$$\|u - Pv\|_2^2 \le \frac{C}{\|A\|_2}\,(Au, u) \qquad (1)$$
with some constant C. The quality of the bound on the convergence factor depends on the size of C, as
shown in [9]. This requirement is known in the literature as the weak approximation property and
reflects the observation noted in [8, 10] that any minimal eigenvector (an eigenvector associated
with the smallest eigenvalue) of A needs to be interpolated with accuracy inversely proportional
to the size of its eigenvalue. For this reason, this paper proposes a generalized eigensolver based
on smoothed aggregation (GES-SA) to approximate a minimal eigenvector of A.
Solving an eigenvalue problem as a means of developing an efficient linear solver may appear
counterintuitive. However, we aim to compute only an appropriately accurate approximation of
the minimal eigenvector to develop an efficient linear solver with that approximation at O(n)
cost. In this context, many existing efficient methods for generating a minimal eigenvector are
appealing (see [11, 12] for short lists of such methods). Here, we propose GES-SA because it
takes advantage of the same data structures as the existing SA framework. Our intention is to
eventually incorporate GES-SA into the adaptive SA framework to enhance the robustness of our adaptive
solvers for difficult problems that may benefit from such enhancement (such as system problems,
corner-singularity problems, or problems with a geometrically oscillatory near-kernel).
The GES-SA algorithm performs a series of iterations that minimize the Rayleigh quotient
(RQ) over various subspaces, as discussed in later sections. In short, GES-SA is a variant of
algebraic Rayleigh quotient multigrid (RQMG [13]) that uses overlapping block RQ Gauss–Seidel
for its relaxation process and SA RQ minimization for coarse-grid updates. In [14], Hetmaniuk
developed an algebraic RQMG algorithm that performs point RQ Gauss–Seidel for relaxation and
coarse-grid corrections based on a hierarchy of static intergrid transfer operators that are supplied
to his algorithm. This supplied hierarchy is assumed to have adequate approximation properties.
In contrast, GES-SA initializes the hierarchy of intergrid transfer operators and modifies it with
each cycle, with the goal of developing a hierarchy with adequate approximation properties, as in
the setup phase of SA. This is discussed in more detail in Section 3.2.
This paper is organized as follows. The rest of Section 1 gives a simple example and background
on SA multigrid. Section 2 introduces the components of GES-SA. Section 3 presents how the
components introduced in Section 2 are put together to form the full GES-SA algorithm. Section 4
presents a numerical example with results that demonstrate how the linear SA solvers produced with
GES-SA have desirable performance for particular problems. Finally, Section 5 makes concluding
remarks.
Example 1
Consider the linear problem Ax = b and its associated generalized eigenvalue problem $Ax = \lambda Bx$.
Matrix A is the 1D Laplacian with Dirichlet boundary conditions, discretized with equidistant
second-order central differences, symmetrically scaled so that the diagonal entries are all ones:
$$A = \frac{1}{2}\begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix} \qquad (2)$$
an n × n tridiagonal matrix. Matrix B for this example is $I_n$, the identity operator on $\mathbb{R}^n$. The full
set of nodes for this problem is $\Omega_n = \{1, 2, \ldots, n\}$. The problem size, n = 9, is used throughout
this paper to illustrate various concepts regarding the algorithm. Note that the 1D problem is used
merely to show concepts and is not of further interest, as its tridiagonal structure is treated with
optimal computational complexity using a direct solver. However, the example is useful in the
sense that it captures the concepts we present in their simplest form.
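To make Example 1 concrete, the sketch below builds matrix (2) for n = 9 and checks its minimal eigenpair against the closed form λ_k = 1 − cos(kπ/(n+1)) of the scaled 1D Laplacian. It is a NumPy-only illustration under these assumptions, not the authors' MATLAB implementation; the helper name `scaled_laplacian_1d` is hypothetical, and Python indices are 0-based.

```python
import numpy as np

def scaled_laplacian_1d(n):
    """Hypothetical helper: matrix (2), i.e. (1/2) * tridiag(-1, 2, -1), with unit diagonal."""
    return 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))

n = 9
A = scaled_laplacian_1d(n)
B = np.eye(n)  # B = I_n in Example 1

# With B = I_n, Ax = lambda Bx is a standard symmetric eigenproblem; eigh sorts eigenvalues ascending.
eigvals, eigvecs = np.linalg.eigh(A)
lam1, v1 = eigvals[0], eigvecs[:, 0]

# Closed form for this matrix: lambda_k = 1 - cos(k*pi/(n+1)).
lam1_exact = 1.0 - np.cos(np.pi / (n + 1))
print(lam1, lam1_exact)
```

The minimal eigenvector is one-signed (a sampled sine arch), which is the global behavior a coarse space must capture.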
252 M. BREZINA ET AL.
1.2. SA multigrid
In this section, we briefly recall the SA multigrid framework for constructing a multigrid hierarchy.
Like any algebraic multilevel method, SA requires a setup phase. Here, we follow the version
presented in [6, 15]. Given a relaxation process and a set of vectors K characterizing algebraically
smooth error, the SA setup phase produces a multigrid hierarchy that defines a linear solver.
For symmetric problems, such as those we consider here, standard SA produces a coarse grid
using interpolation operator P and restriction operator, $R = P^T$. This gives the variational (or
Galerkin) coarse-grid operator, $A_c = P^T A P$, commonly used in AMG methods. This process is
repeated recursively on all grids, constructing a multigrid hierarchy. The interpolation operator is
produced by applying a smoothing operator, S, to a tentative interpolation operator, P̂, that satisfies
the weak approximation property.
At the heart of forming P̂ is a discrete partitioning of fine-level nodes into a disjoint covering
of the full set of nodes, $\Omega_n = \{1, 2, \ldots, n\}$. Members of this partition are locally grouped based
on matrix $A_G$, representing the graph of strong connections [6]. $A_G$ is created by filtering the
original problem matrix A with regard to strength of coupling (Figure 1). For the scalar problems
considered here, we define node i to be strongly connected to node j with respect to the parameter
$\theta \in (0, 1)$ if
$$|a_{ij}| > \theta \sqrt{a_{ii}\, a_{jj}} \qquad (3)$$
Any connection that violates this requirement is a weak connection. Entry $(A_G)_{ij} = 1$ if the connection
between i and j is strong, and $(A_G)_{ij} = 0$ otherwise.
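Criterion (3) can be sketched for a dense NumPy matrix as follows. The helper `strength_graph` and both θ values are illustrative assumptions; note that the diagonal always satisfies (3), so this sketch also records self-connections as strong.

```python
import numpy as np

def strength_graph(A, theta):
    """Hypothetical helper: binary A_G with (A_G)_ij = 1 iff |a_ij| > theta * sqrt(a_ii * a_jj)."""
    d = np.sqrt(np.abs(np.diag(A)))
    return (np.abs(A) > theta * np.outer(d, d)).astype(int)

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))  # matrix (2)

# Illustrative theta values. Off-diagonal entries have |a_ij| = 1/2 and a_ii = 1,
# so neighbors are strong exactly when theta < 1/2; the diagonal always passes the test.
AG_strong = strength_graph(A, theta=0.25)
AG_weak = strength_graph(A, theta=0.6)
print(AG_strong[:3, :4])
```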
Definition 1.1
A collection of m subsets $\{\mathcal{A}_j\}_{j=1}^m$ of $\Omega_n = \{1, 2, \ldots, n\}$ is an aggregation with respect to $A_G$ if
the following conditions hold.
• Covering: $\bigcup_{j=1}^m \mathcal{A}_j = \Omega_n$.
• Disjoint: For any $j \ne k$, $\mathcal{A}_j \cap \mathcal{A}_k = \emptyset$.
• Connected: For any j, if two nodes $p, q \in \mathcal{A}_j$, then there exists a sequence of edges with
end points in $\mathcal{A}_j$ that connects p to q within the graph of $A_G$.
Each individual subset $\mathcal{A}_j$ within the aggregation is called an aggregate.
The method we use to form aggregations is given in [6], where each aggregate has a central
node, or seed, numbered i, and covers this node’s entire strong neighborhood (the support of the
ith row in the graph of $A_G$). This is a very common way of forming aggregations because of its computational
benefits, but it is not mandatory. We return to Example 1 to explain the aggregation concept. An
acceptable aggregation of $\Omega_9$ with respect to $A_G$ would be m = 3 aggregates, each of size 3, defined
as follows:
$$\mathcal{A}_1 = \{1, 2, 3\}, \quad \mathcal{A}_2 = \{4, 5, 6\}, \quad \mathcal{A}_3 = \{7, 8, 9\} \qquad (4)$$
It is easily verified that this partitioning satisfies Definition 1.1. This aggregation is pictured in
Figure 2. 2D examples are presented in Section 4.
Figure 1. Graph of matrix $A_G$ from Example 1 with n = 9. The nine nodes are enumerated, edges of the
graph represent nonzero off-diagonal entries in A, and the Dirichlet boundary conditions are represented
with the hollow dots at the end points.
We find it useful to represent an aggregation $\{\mathcal{A}_j\}_{j=1}^m$ with an n × m sparse, binary aggregation
matrix, which we denote by [A]. Each column of [A] represents a single aggregate, with a one in
the (i, j)th entry if point i is contained in aggregate $\mathcal{A}_j$, and a zero otherwise. In our 1D example,
with n = 9, we represent the aggregation given in (4) as
$$[\mathcal{A}] = \begin{bmatrix} 1 & & \\ 1 & & \\ 1 & & \\ & 1 & \\ & 1 & \\ & 1 & \\ & & 1 \\ & & 1 \\ & & 1 \end{bmatrix} \qquad (5)$$
Based on the sparsity structure of [A], the SA setup phase constructs P̂ with a range that
represents a given, small collection of linearly independent vectors, K. This is done by simply
restricting the values of each vector in K to the sparsity pattern specified by [A].
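A minimal sketch of the aggregation matrix (5) and of the tentative prolongator for the single-vector case K = {1}, assuming NumPy and 0-based node numbering. The helper names are hypothetical, and the sketch handles only one vector in K; with several vectors, the local blocks would also have to be kept linearly independent.

```python
import numpy as np

def aggregation_matrix(aggregates, n):
    """Hypothetical helper: binary n x m matrix [A]; entry (i, j) is 1 iff node i is in aggregate j."""
    Agg = np.zeros((n, len(aggregates)))
    for j, nodes in enumerate(aggregates):
        Agg[nodes, j] = 1.0
    return Agg

def tentative_prolongator(Agg, k):
    """Hypothetical helper: restrict one near-kernel vector k to the sparsity pattern of [A]."""
    return k[:, None] * Agg   # column j keeps k on aggregate j and is zero elsewhere

n = 9
aggregates = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]   # aggregation (4), 0-based
Agg = aggregation_matrix(aggregates, n)
k = np.ones(n)                                    # K = {1}, a common choice for scalar problems
P_hat = tentative_prolongator(Agg, k)

# k lies in the range of P_hat: the sum of the columns reproduces it exactly.
print(np.allclose(P_hat @ np.ones(len(aggregates)), k))
```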
Under the above construction, the vectors in K are ensured to be in R( P̂), the range of the
tentative interpolation operator, and are therefore well attenuated by a corresponding coarse-grid
correction. However, K is only a small number of near-kernel components. Other vectors in R( P̂)
may actually be quite algebraically oscillatory, which can be harmful to the coarsening process
because it may lead to a coarse-grid operator with higher condition number than desired. This
degrades the effect of coarse-grid relaxation on vectors that are moderately algebraically smooth.
Of greater importance, some algebraically smooth vectors are typically not well represented by
R( P̂) and are therefore not reduced by coarse-grid corrections. To remedy the situation, SA does
not use P̂ as its interpolation operator directly, but instead utilizes a smoothed version, P = S P̂,
where S is an appropriately chosen polynomial smoothing operator. As a result, a much richer
set of algebraically smooth error is accurately represented by the coarse grid. A typical choice
for S is one step of the error propagation operator of damped-Jacobi relaxation. In this paper,
we use damped-Jacobi smoothing under the assumption that the system is diagonally scaled so
that its diagonal entries are one.
Figure 2. Graph of matrix $A_G$ from Example 1 with n = 9, split into three aggregates. Each box encloses
a group of nodes in its respective aggregate.
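The smoothing and Galerkin steps can be sketched for Example 1 with K = {1}. This sketch assumes dense NumPy arrays and borrows the damping ω = 4/(3‖D⁻¹A‖₂) used later in (21); it is illustrative only.

```python
import numpy as np

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))  # matrix (2); D = I
Agg = np.kron(np.eye(3), np.ones((3, 1)))                      # [A] from (5)
P_hat = Agg * np.ones(n)[:, None]                               # tentative P_hat with K = {1}

# Prolongation smoother: one damped-Jacobi step. Because A has unit diagonal, D^{-1} A = A.
omega = 4.0 / (3.0 * np.linalg.norm(A, 2))
S = np.eye(n) - omega * A
P = S @ P_hat                # smoothed interpolation P = S P_hat
R = P.T                      # restriction R = P^T
Ac = R @ A @ P               # Galerkin coarse-grid operator A_c = P^T A P

print(Ac.shape)
```

Smoothing widens each column's support by one node on each side, which is what lets the columns of P overlap and represent smooth vectors better than P̂ does.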
The underlying set, K, that induces a linear SA solver can be either supplied, as in standard SA,
or computed, as in αSA methods. We now describe a new approach to constructing K that can be
used within the existing SA framework.
Consider the generalized eigenvalue problem, $Av = \lambda Bv$, where A and B are given n × n real SPD
matrices, v is an unknown eigenvector of length n, and λ is an unknown eigenvalue. Our target
problem is stated as follows:
$$\text{find an eigenvector, } v_1 \ne 0, \text{ corresponding to the smallest eigenvalue, } \lambda_1, \text{ in the problem } Av = \lambda Bv \qquad (6)$$
For convenience, $v_1$ is called a minimal eigenvector and the corresponding eigenvalue, $\lambda_1$, is called
the minimal eigenvalue.
First, we review a well-known general strategy for approximating the solution of (6), an approach
that has also been used in [13, 16]; we use it here to introduce our method. This strategy is to select a subspace of
$\mathbb{R}^n$ and choose a vector in the subspace that minimizes the RQ. In GES-SA, we essentially do
two types of subspace selection: one uses local groupings to select local subspaces that update
our approximations locally; the other uses SA to select low-resolution subspaces that use coarse
grids to update our approximation globally. These two minimization schemes are used together in
a typical multigrid way.
We recall the RQ to introduce a minimization principle that we use to update an iterate within
a given subspace.
Definition 2.1
The RQ of a vector, v, with respect to matrices A and B is the value
$$\rho_{A,B}(v) \equiv \frac{v^T A v}{v^T B v} \qquad (7)$$
Since we restrict ourselves to the case when A and B are SPD, the RQ is always real and
positive. The solution we seek minimizes the RQ:
$$\rho_{A,B}(v_1) = \min_{v \in \mathbb{R}^n} \rho_{A,B}(v) = \lambda_1 > 0 \qquad (8)$$
If two vectors w and v are such that $\rho_{A,B}(w) < \rho_{A,B}(v)$, then w is considered to be a better
approximate solution to (6) than v. Therefore, problem (6) is restated as a minimization problem:
$$\text{find } v_1 \ne 0 \text{ such that } \rho_{A,B}(v_1) = \min_{v \in \mathbb{R}^n} \rho_{A,B}(v) \qquad (9)$$
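Definition 2.1 and the minimization property (8) are easy to check numerically for Example 1. The function name `rayleigh_quotient` is hypothetical, B = I_n as in Example 1, and the random trial vector uses an arbitrary seed.

```python
import numpy as np

def rayleigh_quotient(A, B, v):
    """Hypothetical helper: rho_{A,B}(v) = (v^T A v) / (v^T B v) as in (7); B SPD, v nonzero."""
    return float(v @ A @ v) / float(v @ B @ v)

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))   # matrix (2)
B = np.eye(n)

lam, V = np.linalg.eigh(A)   # B = I_n here, so a standard symmetric eigensolver suffices
v1 = V[:, 0]

rng = np.random.default_rng(0)
trial = rng.standard_normal(n)
# (8): the minimal eigenvector attains lambda_1; any other vector gives an RQ at least as large.
print(rayleigh_quotient(A, B, v1), rayleigh_quotient(A, B, trial))
```

The RQ is also invariant to scaling of v, which is why only the direction of an iterate matters in what follows.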
Given a current approximation, ṽ, we use the minimization principle to construct a subspace,
$\mathcal{V} \subset \mathbb{R}^n$, such that $\dim(\mathcal{V}) = m \ll n$ and
$$\min_{v \in \mathcal{V}} \rho_{A,B}(v) \le \rho_{A,B}(\tilde v) \qquad (10)$$
The new approximation, w̃, is a vector in V with minimal RQ. Note that if ṽ is already of minimal
RQ, then lowering the RQ is not possible. In general, we must carefully construct the subspace to
ensure that the RQ is indeed lowered.
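To select w̃, we must solve a restricted minimization problem within $\mathcal{V}$:
$$\text{find } \tilde w \in \mathcal{V} \text{ such that } \rho_{A,B}(\tilde w) = \min_{v \in \mathcal{V}} \rho_{A,B}(v) \qquad (11)$$
This restricted minimization problem is solved for w̃ by restating the minimization problem within
the lower-dimensional vector space, $\mathbb{R}^m$, and then mapping the low-dimensional solution to the
corresponding vector in $\mathcal{V}$. To do so, we construct an n × m matrix, Q, whose m column vectors are
a basis for $\mathcal{V}$. Note that, for any $v \in \mathcal{V}$, there exists a unique $y \in \mathbb{R}^m$ such that v = Qy. Moreover,
the RQ of v with respect to A and B and the RQ of y with respect to coarse versions of A and B
are equal:
$$\rho_{A,B}(v) = \frac{v^T A v}{v^T B v} = \frac{y^T Q^T A Q y}{y^T Q^T B Q y} = \rho_{Q^T A Q,\, Q^T B Q}(y) = \rho_{A_{\mathcal{V}}, B_{\mathcal{V}}}(y) \qquad (12)$$
Hence, (11) is equivalent to the low-dimensional minimization problem
$$\text{find } y_1 \ne 0 \text{ such that } \rho_{A_{\mathcal{V}}, B_{\mathcal{V}}}(y_1) = \min_{y \in \mathbb{R}^m} \rho_{A_{\mathcal{V}}, B_{\mathcal{V}}}(y) \qquad (13)$$
whose solution is a minimal eigenvector of the low-dimensional generalized eigenvalue problem
$$A_{\mathcal{V}}\, y = \lambda B_{\mathcal{V}}\, y \qquad (14)$$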
After either approximating the solution to low-dimensional minimization problem (13) or solving
low-dimensional eigenvalue problem (14) for y1 with a standard eigensolver, the solution to the
minimization problem restricted to V defined in (11) is w̃ ← Qy1 . The whole process is then
repeated: update ṽ ← w̃, use ṽ to form a new subspace, V, and corresponding Q, solve (14) for
y1 , and set w̃ ← Qy1 .
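The subspace step (11) to (14) reduces to a small generalized eigenproblem. The sketch below assumes NumPy only and solves A_V y = λ B_V y by a Cholesky reduction of B_V (one standard way to handle a symmetric-definite pencil), then maps the minimizer back with w̃ = Q y₁. The helper names and the particular subspace Q are hypothetical; Q is chosen to contain ṽ, so (10) is guaranteed.

```python
import numpy as np

def min_gen_eig(A, B):
    """Hypothetical helper: smallest eigenpair of A y = lam B y (A symmetric, B SPD)."""
    L = np.linalg.cholesky(B)            # B = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T                # symmetric matrix with the same eigenvalues
    vals, vecs = np.linalg.eigh(0.5 * (C + C.T))
    return vals[0], Linv.T @ vecs[:, 0]  # back-transform the eigenvector

def subspace_minimize(A, B, Q):
    """Minimize rho_{A,B} over range(Q) via the projected problem (14), then w = Q y1."""
    AV, BV = Q.T @ A @ Q, Q.T @ B @ Q
    lam, y1 = min_gen_eig(AV, BV)
    return lam, Q @ y1

def rq(A, B, x):
    return float(x @ A @ x) / float(x @ B @ x)

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.eye(n)
rng = np.random.default_rng(1)
v_tilde = rng.standard_normal(n)

# Hypothetical subspace: v_tilde itself plus two extra directions, so that (10) holds.
Q = np.column_stack([v_tilde, np.ones(n), np.arange(n, dtype=float)])
lam, w_tilde = subspace_minimize(A, B, Q)
print(rq(A, B, v_tilde), rq(A, B, w_tilde))
```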
The specific methods we use for constructing subspaces are the defining features of GES-SA
and are explained in the following three sections. In Section 2.1, we focus on how a reasonable
initial approximation is obtained using a nonoverlapping version of the subspace minimization
algorithm. In Section 2.2, we present the global subspace minimization based on SA that serves
as our nonlinear coarse-grid update. In Section 2.3, we describe the local subspace minimizations
that play the role of nonlinear relaxation.
First, we require that an aggregation, $\{\mathcal{A}_j\}_{j=1}^m$, be provided. Each aggregate induces a subspace,
$\mathcal{V}_j \subset \mathbb{R}^n$, defined by all vectors v whose support is contained entirely in $\mathcal{A}_j$. We form a local
selection matrix, $Q_j$, that maps $\mathbb{R}^{m_j}$ onto $\mathcal{V}_j$, where $m_j$ is the number of nodes in the jth
aggregate. This matrix is given by
$$Q_j = \begin{bmatrix} \hat e_{p_1} & \cdots & \hat e_{p_{m_j}} \end{bmatrix} \qquad (15)$$
where $\hat e_p$ is the pth canonical basis vector, and $\{p_q\}_{q=1}^{m_j}$ are the nodes in the jth aggregate. We
then form local principal submatrices, $A_j \leftarrow Q_j^T A Q_j$ and $B_j \leftarrow Q_j^T B Q_j$. A solution, $y_1 \ne 0$, to
generalized eigenvalue problem (14) of size $m_j$ is then found using a standard eigensolver. Nodes
within the jth aggregate are set as $\tilde w_j \leftarrow Q_j y_1$. After $\tilde w_j$ is found for each aggregate, the initial
approximation is the sum of disjoint locally supported vectors: $\tilde v \leftarrow \sum_{j=1}^m \tilde w_j$.
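Initial guess development can be sketched directly from the description above: one small eigenproblem per aggregate, followed by a sum of the disjointly supported pieces. This assumes NumPy, the 0-based aggregation (4), and a hypothetical helper name; the magnitudes reproduce the vector (18) derived below for Example 1, while the sign on each aggregate depends on the eigensolver, which is the ambiguity discussed in Remark 2.1.

```python
import numpy as np

def initial_guess(A, B, aggregates):
    """Hypothetical helper: minimize the RQ on each aggregate separately, then sum the pieces."""
    n = A.shape[0]
    v = np.zeros(n)
    for nodes in aggregates:
        Qj = np.eye(n)[:, nodes]                   # local selection matrix (15)
        Aj, Bj = Qj.T @ A @ Qj, Qj.T @ B @ Qj      # local principal submatrices
        # Minimal eigenvector of A_j y = lam B_j y via a Cholesky reduction of B_j.
        L = np.linalg.cholesky(Bj)
        Linv = np.linalg.inv(L)
        C = Linv @ Aj @ Linv.T
        _, vecs = np.linalg.eigh(0.5 * (C + C.T))
        y1 = Linv.T @ vecs[:, 0]
        v += Qj @ y1                               # supports are disjoint, so this fills in entries
    return v

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))   # matrix (2)
B = np.eye(n)
v = initial_guess(A, B, [[0, 1, 2], [3, 4, 5], [6, 7, 8]])     # aggregation (4), 0-based
print(np.round(v, 4))
```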
Remark 2.1
There is no guarantee that w̃ j is of the same sign as the w̃k that are supported within adjacent
aggregates. For example, w̃ j may have all negative entries on A j and w̃k may have all positive
entries on an adjacent aggregate. In fact, discrepancies in the sign of entries on neighboring
aggregates usually occur in practice because $\alpha y_1$ is still a solution to the local eigenproblem for
any $\alpha \ne 0$. However, this is not a concern, because the subsequent coarse-grid update
presented in Section 2.2 uses the same aggregation as the initial guess development. The coarse
space is invariant to such scaling; hence, the result of the coarse-grid update is independent of it as well.
In any case, we emphasize that this may occur only in the initial guess development phase of the
algorithm. Example 2 in Section 4 is designed to show that the success of GES-SA is invariant
with respect to these sign changes.
We summarize initial guess development in the form of an algorithm. This algorithm is used
on every level in the full GES-SA (Algorithm 3 of Section 3) as pre-relaxation only for the first
multigrid cycle.
Algorithm 1
Initial guess development.
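Hence, solutions to the restricted eigenproblems are all of the form $\tilde y_1 = \alpha_j \left[\tfrac{1}{2}, \tfrac{1}{\sqrt 2}, \tfrac{1}{2}\right]^T$, with a
scaling term $|\alpha_j| = 1$. Hence, the initial guess developed is the vector
$$\tilde v = \left[\frac{\alpha_1}{2}, \frac{\alpha_1}{\sqrt 2}, \frac{\alpha_1}{2}, \frac{\alpha_2}{2}, \frac{\alpha_2}{\sqrt 2}, \frac{\alpha_2}{2}, \frac{\alpha_3}{2}, \frac{\alpha_3}{\sqrt 2}, \frac{\alpha_3}{2}\right]^T \qquad (18)$$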
For the case $\alpha_j = 1$ for all three aggregates, the initial guess is seen in Figure 3. We reiterate
what is stated in Remark 2.1: if, for example, $\alpha_1 = \alpha_3 = 1$ and $\alpha_2 = -1$, then the initial guess
causes no difficulty, even though the RQ of this vector is much higher than that of the vector formed from
$\alpha_1 = \alpha_2 = \alpha_3 = 1$. For either vector, the subsequent coarse-grid update uses the same subspace to
find a set of coefficients that correspond to some new vector of minimal RQ within that subspace.
In the context of multigrid, initial guess development is used in place of pre-relaxation for
the first GES-SA multigrid cycle performed. Subsequent pre-relaxations and post-relaxations are
applied as local subspace relaxation as presented in Section 2.3. We now describe how SA is used
for global subspace updates.
Figure 3. Initial guess for the 1D model problem produced by the initial guess development algorithm;
the RQ has been minimized over each aggregate individually.
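$$\hat P := \operatorname{diag}(\tilde v)\,[\mathcal{A}] \qquad (19)$$
Operator P̂ is such that $\tilde v \in \mathcal{R}(\hat P)$. Specifically, $\tilde v = \hat P \mathbf{1}_m$, where $\mathbf{1}_m$ is the column vector of all
ones with length m. This means that we are guaranteed to have a vector within $\mathcal{R}(\hat P)$ with an RQ no
larger than that of ṽ:
$$\min_{v \in \mathcal{R}(\hat P)} \rho_{A,B}(v) \le \rho_{A,B}(\tilde v) \qquad (20)$$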
Many of the vectors in R( P̂) are of high RQ, because the columns of P̂ have local support and
are not individually algebraically smooth with respect to relaxation. Therefore, as in standard SA,
we apply a polynomial smoothing operator of low degree, S, to P̂, and use the resulting operator,
instead of P̂, as a basis for our coarse space. This gives a coarse space with better approximation
to the sought eigenvector at a reasonable increase in computational complexity. This smoothing
consists of just one application of the error propagation operator of damped Jacobi:
$$S := I_n - \omega D^{-1} A \qquad (21)$$
where $I_n$ is the identity operator on $\mathbb{R}^n$, D is the diagonal part of A, and $\omega = \frac{4}{3\|D^{-1}A\|_2}$.
Normalization of the columns of interpolation is also performed, which does not change the
range of interpolation, but does control the scaling of the coarse-grid problems. This scaling is
used so that the diagonal entries of coarse-grid matrix Ac are all one. The scaling is done by
multiplying by a diagonal matrix, N, whose entries are given by
$$N_{ii} := \frac{1}{\|S(\hat P)_i\|_A} \qquad (22)$$
where ( P̂)i is the ith column of P̂. Note that we must assume that ṽ is nonzero on every aggregate.
The interpolation matrix is
P := S P̂ N (23)
Under this construction, S ṽ is in the range of P. Therefore, if S ṽ has lower RQ than that of
ṽ, we have guaranteed that a vector in Vc = R(P) improves the RQ of our iterate. The vector of
minimal RQ we select from Vc is typically a vector of much lower RQ than that of S ṽ due to the
localization provided by prolongation.
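Equations (19) to (23) translate almost line by line into the following sketch for Example 1, assuming NumPy, dense matrices, and the initial guess (18) with all α_j = 1. It checks that the normalization (22) gives A_c a unit diagonal and that S ṽ lies in the range of P.

```python
import numpy as np

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))   # matrix (2)
Agg = np.kron(np.eye(3), np.ones((3, 1)))                          # [A] from (5)
v = np.tile([0.5, 1.0 / np.sqrt(2.0), 0.5], 3)                     # (18) with all alpha_j = 1

P_hat = np.diag(v) @ Agg                                   # (19): v = P_hat @ 1_m
Dinv = np.diag(1.0 / np.diag(A))
omega = 4.0 / (3.0 * np.linalg.norm(Dinv @ A, 2))
S = np.eye(n) - omega * (Dinv @ A)                         # (21): one damped-Jacobi step

SP = S @ P_hat
a_norms = np.sqrt(np.einsum("ij,ij->j", SP, A @ SP))       # ||S (P_hat)_i||_A for each column i
N = np.diag(1.0 / a_norms)                                 # (22)
P = SP @ N                                                 # (23)

Ac = P.T @ A @ P
print(np.diag(Ac))
```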
Note that a choice of ω could be computed to minimize the RQ of ṽ, a single vector in the
range of interpolation. However, this choice of ω may not be best for all other vectors in the range.
Therefore, we retain the standard choice of ω, known from the literature [15].
The columns of P form a basis for Vc because our construction ensures that there is at least one
point in the support of each column that is not present in any other column. Forming aggregates
that are at least a neighborhood in size and using damped-Jacobi smoothing does not allow
columns to ever share support with an aggregate’s central node. Therefore, under the assumption
ṽ is nonzero on every aggregate, $A_c \leftarrow P^T A P$ and $B_c \leftarrow P^T B P$ are both SPD. In the multigrid
vocabulary, restricted problem (14) is now the coarse-grid problem. A coarse-grid update is given
by interpolating the solution of the coarse-grid problem: $\tilde w \leftarrow P y_1$. This problem, $A_c y = \lambda B_c y$, is
either solved using a standard eigensolver or posed as a coarse-grid minimization problem as in
(13), where local and global updates may be applied in a recursive fashion. This process forms
the coarse-grid update step of Algorithm 3 of Section 3, the full GES-SA algorithm.
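A two-level version of the coarse-grid update, in which the coarse problem A_c y = λ B_c y is solved directly, can be sketched as follows. It rebuilds P from (19) to (23) so that it is self-contained, assumes NumPy, and is not the recursive multilevel version used in Algorithm 3.

```python
import numpy as np

def rq(A, B, x):
    return float(x @ A @ x) / float(x @ B @ x)

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.eye(n)
Agg = np.kron(np.eye(3), np.ones((3, 1)))
v = np.tile([0.5, 1.0 / np.sqrt(2.0), 0.5], 3)          # current iterate: initial guess (18)

# Interpolation P = S P_hat N from (19) to (23); A has unit diagonal, so D = I.
omega = 4.0 / (3.0 * np.linalg.norm(A, 2))
SP = (np.eye(n) - omega * A) @ (np.diag(v) @ Agg)
P = SP / np.sqrt(np.einsum("ij,ij->j", SP, A @ SP))      # scale each column to unit A-norm

# Coarse-grid problem A_c y = lam B_c y, solved directly (two-level case).
Ac, Bc = P.T @ A @ P, P.T @ B @ P
L = np.linalg.cholesky(Bc)
Linv = np.linalg.inv(L)
C = Linv @ Ac @ Linv.T
vals, vecs = np.linalg.eigh(0.5 * (C + C.T))
y1 = Linv.T @ vecs[:, 0]
w = P @ y1                                               # coarse-grid update w = P y1

print(rq(A, B, v), rq(A, B, w))
```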
As in linear multigrid, the coarse-grid update needs to be complemented by an appropriately
chosen relaxation process, on which we next focus.
a subspace of $\mathbb{R}^n$ with dimension $(m_j + 1)$ used to form and solve (11) for an updated approximation,
w̃, that has minimum RQ within $\mathcal{V}_j^{\tilde v}$. We allow changes to the entries of the current iterate
ṽ only at nodes in set $\mathcal{W}_j$ to minimize the RQ, while leaving ṽ unchanged at nodes outside of $\mathcal{W}_j$,
up to a scaling factor, $w_0$.
Remark 2.2
If ṽ has a relatively high RQ, then a vector in $\mathcal{V}_j^{\tilde v}$ that has minimal RQ may have $w_0 = 0$. Essentially,
the subspace iteration throws away all information outside of $\mathcal{W}_j$. This is potentially disastrous to
our algorithm because, for typical problems, minimal eigenvectors are globally supported. Avoiding
this situation is the primary reason we develop initial guesses with Algorithm 1 instead of randomly.
Our current implementation does not update the iterate for subspaces in which w0 = 0. However,
this situation did not occur for the problems presented in the numerical results in Section 4.
We now explain how subsets $\mathcal{W}_j$ are chosen and then explain the iteration procedure. One step
of the local subspace relaxation scheme minimizes the approximate eigenvalue locally over one
small portion of the full set of nodes, $\Omega_n$. We utilize a sequence of subsets $\{\mathcal{W}_j\}_{j=1}^m \subset \Omega_n$ that
form an overlapping covering of $\Omega_n$. We then perform local subspace relaxation with each of these
subsets in a multiplicative fashion.
Similar to aggregation matrix [A], we represent these subset coverings with a sparse, binary,
overlapping subset matrix, [W]. One way to obtain an overlapping subset covering is by dilating
aggregates. This is accomplished by taking each aggregate $\mathcal{A}_j$ within the aggregation and
expanding $\mathcal{A}_j$ once with respect to the graph of matrix $A_G$ (Figure 4). Let $[A_G]$ be an n × n
binary version of $A_G$ that stores strong connections in the graph of A, defined as
$$[A_G]_{ij} := \begin{cases} 1, & (A_G)_{ij} \ne 0 \\ 0, & (A_G)_{ij} = 0 \end{cases} \qquad (25)$$
Then define [W] by creating a binary version of the matrix product $[A_G][\mathcal{A}]$, a dilation:
$$[\mathcal{W}]_{ij} := \begin{cases} 1, & ([A_G][\mathcal{A}])_{ij} \ne 0 \\ 0, & ([A_G][\mathcal{A}])_{ij} = 0 \end{cases} \qquad (26)$$
Our choice of the overlapping subsets is not limited to this construction; however, we make this
choice for simplicity and convenience.
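The dilation (26) is a single sparse-pattern product. In this NumPy sketch for Example 1, every nonzero of A counts as strong (true for any θ < 1/2 in (3)), self-connections are included in [A_G], and nodes are numbered from 0; the result is three overlapping subsets.

```python
import numpy as np

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
AG = (A != 0).astype(int)                                         # [A_G]: every connection is strong here
Agg = np.kron(np.eye(3, dtype=int), np.ones((3, 1), dtype=int))   # [A] from (5)

W = ((AG @ Agg) != 0).astype(int)          # (26): binary pattern of [A_G][A]
subsets = [np.flatnonzero(W[:, j]).tolist() for j in range(W.shape[1])]
print(subsets)  # [[0, 1, 2, 3], [2, 3, 4, 5, 6], [5, 6, 7, 8]]
```

Each aggregate grows by its strong neighbors, so consecutive subsets share the nodes on either side of an aggregate boundary.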
In practice, each local RQ minimization is accomplished by rewriting minimization problem
(11) as a generalized eigenvalue problem of low dimension, as in (14), and solving for minimal
eigenvector $y_1$ with a standard eigensolver. Note that here we use $Q_j^{\tilde v}$ to represent matrices that
span each subspace, $\mathcal{V}_j^{\tilde v}$, to distinguish them from the $Q_j$ used in the initial guess section. We construct
an $n \times (m_j + 1)$ matrix, $Q_j^{\tilde v}$, so that its columns are an orthogonal basis for subspace $\mathcal{V}_j^{\tilde v}$. To define
$Q_j^{\tilde v}$ explicitly, first define vector $v_0$ by
$$(v_0)_i := \begin{cases} \tilde v_i, & i \notin \mathcal{W}_j \\ 0, & i \in \mathcal{W}_j \end{cases} \qquad (27)$$
Figure 4. Graph of matrix $A_G$ from Example 1, with n = 9, grouped into three overlapping subsets. Each
box encloses a group of nodes in a respective subset.
For each point $p \in \mathcal{W}_j$, define canonical basis vectors $\hat e_p$. Then, we form $Q_j^{\tilde v}$ by appending these
$(m_j + 1)$ vectors in a matrix of column vectors:
$$Q_j^{\tilde v} = \begin{bmatrix} v_0 & \hat e_{p_1} & \cdots & \hat e_{p_{m_j}} \end{bmatrix} \qquad (28)$$
where the sequence of points, $\{p_i\}_{i=1}^{m_j}$, is a list of all points within local subset $\mathcal{W}_j$. This makes
the columns of $Q_j^{\tilde v}$ orthogonal; $Q_j^{\tilde v}$ maps $\mathbb{R}^{m_j+1}$ onto $\mathcal{V}_j^{\tilde v}$. For the 1D example,
Next, we compute $A_j^{\tilde v} \leftarrow (Q_j^{\tilde v})^T A Q_j^{\tilde v}$ and $B_j^{\tilde v} \leftarrow (Q_j^{\tilde v})^T B Q_j^{\tilde v}$. Then (14) is solved with a standard
eigensolver for $y_1$, which is normalized so that $(y_1)_1 = 1$. This normalization is the same as
requiring $w_0 = 1$, which leaves all nodes outside of $\mathcal{W}_j$ unchanged by the update. The updated
iterate is then given by $\tilde w \leftarrow Q_j^{\tilde v} y_1$.
Local subspace relaxation is summarized in the following algorithm.
Algorithm 2
Local subspace relaxation.
Function: ṽ ← LSR(A, B, ṽ, $\{\mathcal{W}_j\}_{j=1}^m$).
Input: SPD matrices A and B, current approximation to the minimal eigenvector ṽ, and overlapping
subset covering $\{\mathcal{W}_j\}_{j=1}^m$.
Output: Updated iterate ṽ.
1. For j = 1, . . . , m, do the following:
(a) Form $Q_j^{\tilde v}$ based on ṽ and $\mathcal{W}_j$ as in (28).
(b) Form $A_j^{\tilde v} \leftarrow (Q_j^{\tilde v})^T A Q_j^{\tilde v}$ and $B_j^{\tilde v} \leftarrow (Q_j^{\tilde v})^T B Q_j^{\tilde v}$.
(c) Find $y_1$ by solving (14) via a standard eigensolver.
(d) If $w_0 \ne 0$, normalize $y_1$ so that $(y_1)_1 = 1$ and set $\tilde v \leftarrow Q_j^{\tilde v} y_1$.
2. Output ṽ.
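Algorithm 2 can be sketched as a loop over the overlapping subsets. The names `lsr` and `min_gen_eig` are hypothetical, a Cholesky reduction stands in for the "standard eigensolver", and the sketch assumes ṽ is nonzero outside every W_j so that (Q_j^ṽ)^T B Q_j^ṽ is SPD. Because ṽ itself lies in each local subspace, every accepted step can only lower the RQ.

```python
import numpy as np

def min_gen_eig(A, B):
    """Hypothetical helper: minimal eigenvector of A y = lam B y (A symmetric, B SPD)."""
    L = np.linalg.cholesky(B)
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T
    _, vecs = np.linalg.eigh(0.5 * (C + C.T))
    return Linv.T @ vecs[:, 0]

def lsr(A, B, v, subsets, tol=1e-14):
    """One multiplicative sweep of local subspace relaxation (sketch of Algorithm 2).

    Assumes v is nonzero outside every subset, so that v0 below is nonzero.
    """
    n = v.size
    v = v.copy()
    for Wj in subsets:
        v0 = v.copy()
        v0[Wj] = 0.0                                    # (27)
        Q = np.column_stack([v0, np.eye(n)[:, Wj]])     # (28): columns are orthogonal
        y1 = min_gen_eig(Q.T @ A @ Q, Q.T @ B @ Q)      # steps (b) and (c)
        if abs(y1[0]) > tol * np.linalg.norm(y1):       # step (d): skip the update if w0 = 0
            v = Q @ (y1 / y1[0])                        # w0 = 1 keeps v fixed outside W_j
    return v

def rq(A, B, x):
    return float(x @ A @ x) / float(x @ B @ x)

n = 9
A = 0.5 * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1))
B = np.eye(n)
subsets = [[0, 1, 2, 3], [2, 3, 4, 5, 6], [5, 6, 7, 8]]   # dilated aggregates, 0-based
rng = np.random.default_rng(2)
v = rng.standard_normal(n)
w = lsr(A, B, v, subsets)
print(rq(A, B, v), rq(A, B, w))
```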
Figure 5 shows how a single sweep of local subspace relaxation acts on a random initial guess for
the 1D example. Although the guess is never actually random in the full algorithm, because of the initial
guess development, we show this case so that it is clear how the algorithm behaves. This algorithm gives
the relaxed iterate w̃ local characteristics of the actual minimal eigenvector. For problems with a large
number of nodes, however, the global characteristics of the iterate remain far from those of the actual minimal
eigenvector. This is where the coarse-grid iteration complements local subspace relaxation. When
done in an alternating sequence, as in a standard multigrid method, the complementary processes
achieve both local and global characteristics of the approximate minimal eigenvector, forming an
eigensolver. Their explicit use is presented in the following section.
Figure 5. A typical local subspace relaxation sweep on a random iterate for the 1D example with
n = 9. The top left vector is the initial iterate, ṽ; top right shows a subspace update on subset $\mathcal{W}_1$;
bottom left shows a subsequent update over $\mathcal{W}_2$; and bottom right shows the final relaxed iterate w̃ after
a subsequent subspace update over $\mathcal{W}_3$.
3. GES-SA
Because GES-SA is a multilevel method, we describe it in multilevel notation. Any
symbol with subscript l refers to an object on grid l, with l = 1 the finest or original grid and l = L
the coarsest. For example, the matrix associated with the problem on level l is denoted by $A_l$; in
particular, $A_1 = A$, the matrix from our original problem. Interpolation from level l + 1 to level l
is denoted by $P_{l+1}^{l}$ instead of P, and restriction from level l to level l + 1 is denoted by $(P_{l+1}^{l})^T$. The
dimension of $A_l$ is written $n_l$. Other level l objects are denoted with a subscript and superscript l,
as appropriate.
Algorithm 3
Generalized eigensolver based on smoothed aggregation.
Function: $\tilde v_l \leftarrow$ GESSA($A_l$, $B_l$, ν, μ, γ, l).
Input: SPD matrices $A_l$ and $B_l$, number of relaxations to perform ν, number of cycles μ, number
of coarse-grid problem iterations γ, and current level l.
could be used in subsequent cycles to develop several eigenvectors at once, which is currently not a
feature of GES-SA. This would be a useful approach to initialize linear solvers for system problems.
This study does not quantitatively investigate the use of RQMG in the context of an adaptive
process. These possible expansions of the current adaptive methodology are under consideration
for our future research.
4. NUMERICAL RESULTS
Many linear systems that come from the discretization of scalar partial differential equations (PDEs) are solved efficiently by SA using the vector of all ones as the near-kernel component, and the resulting linear solvers have good convergence rates. However, we present examples of matrices for which the vector of all ones is not a near-kernel component, and using it as one in SA may not produce a linear solver with acceptable convergence rates.
All results in this section were obtained by running one GES-SA V-cycle (a single cycle with one coarse-grid problem iteration per level) with two post-relaxation steps. Our GES-SA implementation is currently written in MATLAB, so we make no rigorous timing comparisons with competing eigensolvers; we intend to explore these details in further investigations. The small eigenproblems arising in GES-SA were all solved using MATLAB's eigs() function with flags set for real symmetric matrices, which calls ARPACK [17] routines. No 2D problem used more than five iterations to solve small
eigenproblems; no 3D problem used more than 10 iterations.
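The small eigenproblems were solved with MATLAB's eigs(), which calls ARPACK; in Python the corresponding ARPACK interface is scipy.sparse.linalg.eigsh. For the tiny dense problems that arise on coarse levels a dense solve also suffices, so the sketch below swaps in a dense Cholesky reduction instead of ARPACK. It returns the minimal eigenpair of a generalized problem A x = λ B x with SPD A and B (Algorithm 3 takes such an SPD pair as input); the function name is hypothetical.

```python
import numpy as np


def min_generalized_eigenpair(A, B):
    """Smallest eigenpair of A x = lam B x for small dense SPD A and B,
    via the Cholesky reduction B = L L^T (a dense stand-in for ARPACK)."""
    L = np.linalg.cholesky(B)                 # B = L L^T, L lower triangular
    Linv = np.linalg.inv(L)
    C = Linv @ A @ Linv.T                     # equivalent standard problem C y = lam y
    evals, evecs = np.linalg.eigh(C)          # eigenvalues in ascending order
    x = Linv.T @ evecs[:, 0]                  # back-transform x = L^{-T} y
    return evals[0], x / np.sqrt(x @ B @ x)   # B-normalized eigenvector


A = np.array([[2.0, -1.0], [-1.0, 2.0]])
B = np.eye(2)
lam, x = min_generalized_eigenpair(A, B)
print(round(lam, 6))  # 1.0
```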
Example 2
We present the random-signed discrete Laplacian. Consider the d-dimensional Poisson problem
with Dirichlet boundary conditions:
Either way we discretize the problem, we obtain a sparse n × n matrix Â. We then define the diagonal, random-signed matrix D± whose diagonal entries are randomly assigned the values +1 and −1. Finally, we form the random-signed discrete Laplacian matrix A by
$$A \leftarrow D_{\pm} \hat{A} D_{\pm} \qquad (31)$$
In our results, we also symmetrically scale the matrix A to have ones on its diagonal for Example 2.
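The construction (31) and the unit-diagonal scaling take only a few lines of NumPy. The sketch below assumes a 5-point finite-difference Laplacian on a 9 × 9 interior grid (n = 81, the smallest 2D size in Table I) as Â; the helper names are hypothetical. Because D± is orthogonal and equal to its own inverse, A has the same spectrum as the scaled Â and its minimal eigenvector is D± times that of Â, but the all-ones vector is no longer the smooth direction: the vector D±·1 now plays the role that the constant vector plays for Â.

```python
import numpy as np


def laplacian_2d_fd(m):
    """5-point finite-difference Laplacian (unscaled) on an m x m interior grid."""
    T = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    I = np.eye(m)
    return np.kron(I, T) + np.kron(T, I)


def random_signed(A_hat, rng):
    """Form D A_hat D with a random diagonal D of +-1 entries, then scale
    symmetrically so that the result has ones on its diagonal."""
    d = rng.choice([-1.0, 1.0], size=A_hat.shape[0])
    A = d[:, None] * A_hat * d[None, :]       # equals D @ A_hat @ D
    s = 1.0 / np.sqrt(np.diag(A))             # diag(A) = diag(A_hat) > 0
    return s[:, None] * A * s[None, :], d


def rayleigh_quotient(A, v):
    return (v @ A @ v) / (v @ v)


rng = np.random.default_rng(0)
A_hat = laplacian_2d_fd(9)                    # n = 81 unknowns
A, d = random_signed(A_hat, rng)
lam_min = np.linalg.eigvalsh(A)[0]
ones = np.ones(A.shape[0])
print(rayleigh_quotient(A, ones), rayleigh_quotient(A, d), lam_min)
```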
Now consider solving Ax = b given vector b. Note that the vector of all ones is not algebraically
smooth with respect to standard relaxation methods. As shown in Table I, using the vector of all
ones produces SA solvers that have unacceptable convergence factors for these problems. Instead,
we use one GES-SA cycle to produce an approximate minimal eigenvector, ṽ, and use K = {ṽ}
in the setup phase of SA to produce a linear SA solver. The convergence factors of the resulting
solver are comparable to those obtained using the actual minimal eigenvector to build the linear
SA solver. Convergence factors are reported as estimates of asymptotic convergence factors, computed as the geometric average of the A-norm error reduction over the last 5 of 25 linear SA V(2, 2)-cycles,
$$\text{Asymptotic convergence factor} \approx \left(\frac{\|e^{(25)}\|_A}{\|e^{(20)}\|_A}\right)^{1/5} \qquad (32)$$
for the homogeneous problem, Ax = 0, starting with a random initial guess. Operator complexity is also reported for the linear solver that uses the vector developed with GES-SA. We use
Table I. Asymptotic convergence factors for the 2D and 3D finite difference (FD) and
finite element (FE) versions of the random-signed Laplacian problem.
Problem   Size n   Levels   Ones    GES-SA   Eigen   Comp
2D, FE    81       2        0.620   0.074    0.074   1.078
2D, FE    729      3        0.892   0.176    0.179   1.108
2D, FE    6561     4        0.965   0.193    0.196   1.119
2D, FE    59 049   5        0.977   0.215    0.214   1.123
2D, FD    81       2        0.849   0.219    0.219   1.317
2D, FD    729      3        0.947   0.294    0.290   1.357
2D, FD    6561     4        0.962   0.306    0.305   1.348
2D, FD    59 049   5        0.978   0.312    0.312   1.342
3D, FE    729      2        0.598   0.114    0.111   1.054
3D, FE    19 683   3        0.934   0.188    0.189   1.112
3D, FD    729      2        0.825   0.289    0.292   1.389
3D, FD    19 683   3        0.944   0.360    0.358   1.495
3D, FD    64 000   4        0.961   0.418    0.413   1.511
Factors in the column labeled ‘Ones’ correspond to solvers created using the vector of all ones; factors in the ‘GES-SA’ column correspond to solvers that use our approximate minimal eigenvector computed with GES-SA; and factors in the ‘Eigen’ column correspond to solvers that use the actual minimal eigenvector. The last column, ‘Comp’, shows the operator complexity for all three types of solvers.
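The estimate (32) is easy to reproduce for any linear iteration. The NumPy sketch below assumes the cycle is available as a function mapping the current error to the error after one cycle (for Ax = 0 the iterate is itself the error). Since no SA implementation is given here, one weighted Jacobi sweep on a 1D Laplacian stands in, purely hypothetically, for the SA V(2, 2)-cycle; the names `a_norm`, `asymptotic_factor` and `jacobi` are illustrative.

```python
import numpy as np


def a_norm(A, e):
    """Energy norm ||e||_A = sqrt(e^T A e) of an SPD matrix A."""
    return np.sqrt(e @ (A @ e))


def asymptotic_factor(A, cycle, num_cycles=25, last=5, seed=0):
    """Estimate (32): (||e^(25)||_A / ||e^(20)||_A)^(1/5) for the default arguments.
    `cycle` maps the current error to the error after one cycle; for Ax = 0 the
    iterate itself is the error, so we start from a random vector."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(A.shape[0])
    norms = [a_norm(A, e)]
    for _ in range(num_cycles):
        e = cycle(e)
        norms.append(a_norm(A, e))
    return (norms[num_cycles] / norms[num_cycles - last]) ** (1.0 / last)


# Hypothetical stand-in for an SA V(2,2)-cycle: one weighted Jacobi sweep
# (weight 2/3) on a 1D Laplacian, e <- e - (2/3) D^{-1} A e.
n = 15
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def jacobi(e):
    return e - (2.0 / 3.0) * (A @ e) / np.diag(A)


print(asymptotic_factor(A, jacobi))
```

Applied to an iteration that simply halves the error, the estimate returns 0.5, a quick sanity check that cycles 20 to 25 are the ones being averaged.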
Example 3
We also investigate GES-SA on ‘shifted’ Laplacian, or Helmholtz, problems to show the invariance of performance with respect to such shifts. Consider the d-dimensional Poisson problem with Dirichlet boundary conditions, shifted by a parameter $\sigma_s > 0$:
Figure 7. Aggregation examples displayed for 2D test problems of low dimension. On the left is an
aggregation formed with a geometric aggregation method used for the finite element problems; on the
right is an aggregation formed with an algebraic aggregation method used for finite-difference problems.
Black edges represent strong connections within the graph of matrix AG; each gray box represents a separate aggregate that contains the nodes enclosed.
Table II. Relative errors between the RQ of the GES-SA approximate minimal eigenvector and the minimal eigenvalue, λ1, for 2D and 3D finite element and finite difference versions of Example 2.
Problem   Size n   Levels   RQ          λ1          Relative error
2D, FE    81       2        7.222e−02   7.222e−02   0.0000034
2D, FE    729      3        9.413e−03   9.412e−03   0.0001608
2D, FE    6561     4        1.101e−03   1.100e−03   0.0002491
2D, FE    59 049   5        1.243e−04   1.243e−04   0.0001224
2D, FD    81       2        4.895e−02   4.894e−02   0.0000582
2D, FD    729      3        6.307e−03   6.288e−03   0.0031257
2D, FD    6561     4        7.501e−04   7.338e−04   0.0222547
2D, FD    59 049   5        9.306e−05   8.289e−05   0.1227465
3D, FE    729      2        1.066e−01   1.066e−01   0.0000017
3D, FE    19 683   3        1.412e−02   1.409e−02   0.0022805
3D, FD    729      2        4.896e−02   4.894e−02   0.0003230
3D, FD    19 683   3        6.303e−03   6.288e−03   0.0024756
3D, FD    64 000   4        2.981e−03   2.934e−03   0.0158771
Here, $\sigma_s$ is chosen to make the continuous problem nearly singular. The minimal eigenvalue of the Laplacian operator on $(0, 1)^d$ is $d\pi^2$. Therefore, setting
$$\sigma_s = (1 - 10^{-s})\, d\pi^2 \qquad (35)$$
for an integer $s > 0$ makes the shifted operator $(-\Delta - \sigma_s)$ have a minimal eigenvalue of $\lambda_1 = 10^{-s} d\pi^2$. Here, we consider the d = 2 and d = 3 cases for various shifts $\sigma_s$. We discretized the 2D case with nodal bilinear functions on square elements, with $h = 1/244$. This gave us a system with n = 59 049 degrees of freedom. All aggregations done in these tests were geometric, and aggregate diameters were never greater than 3. For each shift, the solvers we developed (using both GES-SA and the actual minimal eigenvector) have operator complexity 1.119 and five levels with 59 049, 6561, 729, 81, and 9 degrees of freedom on the respective levels. Similarly, the 3D case was discretized with nodal trilinear functions on cube elements with $h = 1/37$. This gave us a system with n = 46 656 degrees of freedom. Again, for each shift the solvers have operator complexity 1.033 and four levels with 46 656, 1728, 64, and 8 degrees of freedom on the respective levels. In either case, the minimal eigenvalue of the discretized matrix A is $\lambda_1 \approx 10^{-s} d\pi^2 h^d$.
For all cases, we produced two SA solvers: the first solver was based on the actual minimal
eigenvector of A and the second was based on the approximation to the minimal eigenvector
created by one cycle of GES-SA. In Table III, we show asymptotic convergence factors (32) for
these solvers for 2D and 3D and specific shift parameters.
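The arithmetic linking the shift, the continuous eigenvalue and the discrete estimate can be checked directly. The plain-Python sketch below (function names hypothetical) computes $\sigma_s = (1 - 10^{-s}) d\pi^2$, the resulting continuous eigenvalue $10^{-s} d\pi^2$, and the estimate $10^{-s} d\pi^2 h^d$ for the two grids. For s = 1 the estimate agrees with the λ1 entries of Table III (3.32e−05 in 2D, 5.86e−05 in 3D) to within one percent, while for larger s the tabulated discrete values depart from it.

```python
import math


def shift(s, d):
    """sigma_s = (1 - 10**-s) * d * pi**2; d * pi**2 is the minimal eigenvalue of -Laplace on (0,1)^d."""
    return (1.0 - 10.0 ** (-s)) * d * math.pi ** 2


def continuous_lambda1(s, d):
    """Minimal eigenvalue of (-Laplace - sigma_s) on (0,1)^d."""
    return d * math.pi ** 2 - shift(s, d)


def discrete_lambda1_estimate(s, d, h):
    """Estimate lambda_1 ~ 10**-s * d * pi**2 * h**d for the FE matrix on a uniform grid."""
    return 10.0 ** (-s) * d * math.pi ** 2 * h ** d


for d, h in [(2, 1 / 244), (3, 1 / 37)]:
    n = (round(1 / h) - 1) ** d               # interior nodes of the uniform grid
    est = [discrete_lambda1_estimate(s, d, h) for s in range(1, 6)]
    print(d, n, ["%.2e" % v for v in est])
```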
We assume that the prolongation P from the first coarse grid to the fine grid satisfies the weak approximation property with constant
$$C := \sup_{u \in \mathbb{R}^{n_f}} \min_{v \in \mathbb{R}^{n_c}} \frac{\|u - Pv\|_2^2 \, \|A\|_2}{(Au, u)} \qquad (36)$$
Table III. Asymptotic convergence factors and measures of approximation for Example 3.
Problem               Row      s = 1      s = 2      s = 3      s = 4      s = 5
2D, FE (n = 59 049)   λ1       3.32e−05   3.32e−06   3.36e−07   3.77e−08   7.90e−09
2D, FE (n = 59 049)   RQ       3.32e−05   3.37e−06   3.88e−07   9.11e−08   6.03e−08
2D, FE (n = 59 049)   eigen    0.196      0.198      0.198      0.199      0.197
2D, FE (n = 59 049)   GES-SA   0.197      0.197      0.196      0.199      0.430
2D, FE (n = 59 049)   M1(P)    1.14e−05   1.13e−04   1.11e−03   1.01e−02   4.83e−02
2D, FE (n = 59 049)   M2(P)    9.45e−11   9.37e−11   9.36e−11   9.54e−11   9.54e−11
3D, FE (n = 46 656)   λ1       5.86e−05   6.17e−06   9.32e−07   4.08e−07   3.56e−07
3D, FE (n = 46 656)   RQ       5.88e−05   6.30e−06   1.06e−06   5.40e−07   4.86e−07
3D, FE (n = 46 656)   eigen    0.187      0.187      0.190      0.188      0.183
3D, FE (n = 46 656)   GES-SA   0.188      0.185      0.188      0.187      0.185
3D, FE (n = 46 656)   M1(P)    7.07e−05   6.67e−04   4.43e−03   1.04e−02   1.18e−02
3D, FE (n = 46 656)   M2(P)    3.85e−08   3.83e−08   3.84e−08   3.94e−08   3.91e−08
The s values in the columns give shift sizes σ_s as in (35). The first block of rows is for 2D problems, the second for 3D problems. The rows labeled ‘λ1’ show the minimal eigenvalue for the specific discrete problem, and those labeled ‘RQ’ show the RQs of the GES-SA vectors. Rows labeled ‘eigen’ show convergence factors for solvers based on the actual minimal eigenvector. Rows labeled ‘GES-SA’ show convergence factors for solvers based on the approximation to the minimal eigenvector given by one GES-SA cycle. The measures of approximation, M1(P) and M2(P), are in the rows with the respective labels.
Based on the knowledge that A comes from a scalar PDE, we further assume that it is most essential to approximate a minimal eigenvector, $u_1$. The denominator, (Au, u), is smallest for this vector, and other vectors that have comparable denominators are locally well represented by $u_1$. Under these assumptions, we find it insightful to monitor the following measure of approximation for any P that we develop:
$$M_2(P) := \min_{v \in \mathbb{R}^{n_c}} \frac{\|u_1 - Pv\|_2^2}{\|u_1\|_2^2} \qquad (38)$$
Again, this measure is shown in Table III for each problem. As $\sigma_s$ increases, we see that $M_2(P)$ is
essentially constant for the linear solvers that GES-SA produced, with fixed computation, indicating
that the degradation is only due to the approximation requirements getting stricter.
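Because (38) is a linear least-squares problem, M2(P) can be evaluated with a single least-squares solve. The sketch below assumes a dense P and uses numpy.linalg.lstsq; the interpolation (piecewise constants over three aggregates of three nodes) and the vector sin(πx) are hypothetical stand-ins for the paper's P and u1.

```python
import numpy as np


def measure_m2(P, u1):
    """M2(P) = min_v ||u1 - P v||_2^2 / ||u1||_2^2, via a linear least-squares solve."""
    v, *_ = np.linalg.lstsq(P, u1, rcond=None)
    r = u1 - P @ v
    return (r @ r) / (u1 @ u1)


# Hypothetical example: piecewise-constant interpolation over 3 aggregates of 3 nodes,
# and u1 = sampled minimal eigenvector sin(pi x) of the 1D Dirichlet Laplacian.
n, size = 9, 3
P = np.kron(np.eye(n // size), np.ones((size, 1)))
x = np.arange(1, n + 1) / (n + 1)
u1 = np.sin(np.pi * x)
print(measure_m2(P, u1))
```

A vector in the range of P, such as the constant vector here, gives M2(P) = 0 up to round-off, while the sine vector gives roughly 0.05.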
5. CONCLUSION
This paper develops a multilevel eigensolver, GES-SA, in the SA framework for the specific
application of enhancing robustness of current adaptive linear SA solvers. We show preliminary
numerical results that support approximate eigensolvers as potentially useful for initialization
within the adaptive AMG process. This paper serves as a proof of concept, and due to our high-
level implementation, we are not making claims about the efficiency of this algorithm versus
purely relaxation-based initialization given in [7]. This question will be investigated as we begin
incorporating eigensolvers into our low-level adaptive software.
ACKNOWLEDGEMENTS
The work of the last author was performed under the auspices of the U.S. Department of Energy by the
University of California Lawrence Livermore National Laboratory under contract W-7405-Eng-48.
REFERENCES
1. Brandt A. Algebraic multigrid theory: the symmetric case. Applied Mathematics and Computation 1986; 9:23–26.
2. Brandt A, McCormick S, Ruge J. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and its
Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, U.K., 1984.
3. Briggs W, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000.
4. Ruge J, Stüben K. Algebraic multigrid (AMG). In Multigrid Methods, vol. 5, McCormick SF (ed.). SIAM: Philadelphia, PA, 1986.
5. Trottenberg U, Oosterlee CW, Schüller A (Appendix by K. Stüben). Multigrid (Appendix A: An Introduction to Algebraic Multigrid). Academic Press: New York, 2000.
6. Vaněk P, Mandel J, Brezina M. Algebraic multigrid by smoothed aggregation for second and fourth order elliptic
problems. Computing 1996; 56:179–196.
7. Brezina M, Falgout R, MacLachlan S, Manteuffel T, McCormick S, Ruge J. Adaptive smoothed aggregation
(SA). SIAM Journal on Scientific Computing 2004; 25:1896–1920.
8. McCormick SF, Ruge J. Multigrid methods for variational problems. SIAM Journal on Numerical Analysis 1982;
19:925–929.
9. Brezina M. Robust iterative methods on unstructured meshes. Ph.D. Thesis, University of Colorado, Denver, CO,
1997.
10. Ruge J. Multigrid methods for variational and differential eigenvalue problems and unigrid for multigrid simulation.
Ph.D. Thesis, Colorado State University, Fort Collins, CO, 1981.
11. Hetmaniuk U, Lehoucq RB. Multilevel methods for eigenspace computations in structural dynamics. Domain
Decomposition Methods in Science and Engineering, Lecture Notes in Computational Science and Engineering,
vol. 55. Springer: Berlin, 2007; 103–114.
12. Neymeyr K. Solving mesh eigenproblems with multigrid efficiency. In Numerical Methods for Scientific
Computing, Variational Problems and Applications, Kuznetsov Y, Neittaanmäki P, Pironneau O (eds). Wiley:
Chichester, U.K., 2003.
13. Cai Z, Mandel J, McCormick SF. Multigrid methods for nearly singular linear equations and eigenvalue problems.
SIAM Journal on Numerical Analysis 1997; 34:178–200.
14. Hetmaniuk U. A Rayleigh quotient minimization algorithm based on algebraic multigrid. Numerical Linear
Algebra with Applications 2007; 14:563–580.
15. Vaněk P, Brezina M, Mandel J. Convergence of algebraic multigrid based on smoothed aggregation. Numerische
Mathematik 2001; 88:559–579.
16. Chan TF, Sharapov I. Subspace correction multi-level methods for elliptic eigenvalue problems. Numerical Linear
Algebra with Applications 2002; 9:1–20.
17. Lehoucq RB, Sorensen DC, Yang C. ARPACK USERS GUIDE: Solution of Large Scale Eigenvalue Problems
with Implicitly Restarted Arnoldi Methods. SIAM: Philadelphia, PA, 1998.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:271–289
Published online 7 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.566
Yunrong Zhu∗, †
Department of Mathematics, Pennsylvania State University, University Park, PA 16802, U.S.A.
SUMMARY
This paper provides a proof of the robustness of the overlapping domain decomposition preconditioners
for the linear finite element approximation of second-order elliptic boundary value problems with strongly
discontinuous coefficients. By analyzing the eigenvalue distribution of the domain decomposition precon-
ditioned system, we prove that only a small number of eigenvalues may deteriorate with respect to the
discontinuous jump or mesh size, and all the other eigenvalues are bounded below and above nearly
uniformly with respect to the jump and mesh size. As a result, we prove that the convergence rate of the
preconditioned conjugate gradient methods is nearly uniform with respect to the large jump and mesh
size. Copyright © 2008 John Wiley & Sons, Ltd.
KEY WORDS: jump coefficients; domain decomposition; conjugate gradient; effective condition number
1. INTRODUCTION
In this paper, we will discuss the overlapping domain decomposition preconditioned conjugate
gradient (PCG) methods for the linear finite element approximation of the second-order elliptic
∗ Correspondence to: Yunrong Zhu, Department of Mathematics, Pennsylvania State University, University Park, PA
16802, U.S.A.
†
E-mail: zhu y@math.psu.edu, yrzhu@psu.edu
where each $\omega_m > 0$ is a constant. The analysis can be carried through to a more general case when $\omega(x)$ varies moderately in each subregion.
We assume that the subregions $\{\Omega^0_m : m = 1, \ldots, M\}$ are given and fixed but may have complicated geometry. We are concerned with the robustness of the PCG method with regard to both the fineness of the discretization of the overall problem and the severity of the discontinuities in $\omega$. This model problem is relevant to many applications, such as groundwater flow [1, 2], fluid
pressure prediction [3], electromagnetics [4], semiconductor modeling [5], electrical power network
modeling [6] and fuel cell modeling [7, 8], where the coefficients have large discontinuities across
interfaces between regions with different material properties.
When the above problem is discretized by the finite element method, for example, the condi-
tioning of the resulting discrete system will depend on both the (discontinuous) coefficients and
also the mesh size. There has been much interest in the development of iterative methods (such
as domain decomposition and multigrid methods) whose convergence rates will be robust with
respect to the change of jump size and mesh size (see [9–14] and the references cited therein). In
two dimensions, it is not too difficult to see that both domain decomposition [15–18] and multigrid
[14, 19, 20] methods lead to robust iterative methods. In three dimensions, some nonoverlapping
domain decomposition methods have been shown to be robust with respect to both the jump size
and mesh size (see [12, 14, 21, 22]). As was pointed out in [20, Remark 6.3], in some circumstances the deterioration is not significantly severe. In fact, using the estimates related to the weighted L²-projection in [23], it can be proved that $\kappa(BA) \le C|\log H|$ in some cases for d = 3, where H is the mesh size of the coarse space. For example, if the interface has no cross points, or if every subdomain touches part of the Dirichlet boundary [23–25], or if the coefficients satisfy the quasi-monotonicity condition (cf. [26, 27]), then the multilevel or domain decomposition method was proved to be robust. However, in general, the situation for overlapping domain decomposition and multilevel methods is still unclear. Technically, the difficulty is due to the lack of uniform or nearly uniform error and stability estimates for the weighted L²-projection, as demonstrated in [24, 28].
Recently [29, 30], we have proved that both the BPX and the multigrid V -cycle preconditioners
will lead to nearly uniformly convergent PCG methods for the finite element approximations of
(1), although the resulting condition numbers can deteriorate severely as mentioned above. Our
work was motivated by the work of Graham and Hagger [31]. In their work, they proved that a simple diagonal scaling leads to a preconditioned system that has only a fixed number of small eigenvalues, which are severely affected by the discontinuous jumps. More precisely, they proved that the ratio of the extreme values of the remaining eigenvalues, the effective condition number (cf. [30]), can be bounded by $Ch^{-2}$, where C is a constant independent of the coefficients and mesh size.
Copyright © 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:271–289
DOI: 10.1002/nla
DD PRECONDITIONER FOR JUMP COEFFICIENTS PROBLEM 273
The aim of this paper is to provide a rigorous proof of the robustness of the overlapping domain decomposition preconditioners. As in [30], the main idea is to analyze the eigenvalue distribution of the preconditioned systems and to prove that, except for a few ‘bad’ eigenvalues, the effective condition numbers are bounded uniformly with respect to the jump and logarithmically with respect to the mesh size. Thanks to a standard theory for the conjugate gradient method (see [31–33]), these small eigenvalues will not deteriorate the efficiency of the method significantly. More specifically, the asymptotic convergence rate of the PCG method will be $1 - 2/(C\sqrt{|\log H|} + 1)$, which is uniform with respect to the size of the discontinuous jump. When d = 3, if each subregion $\Omega^0_m$ ($m = 1, \ldots, M$) is assumed to be a polyhedral domain with each edge of length of order $H_0$, then the effective condition number of BA can be bounded by $C(1 + \log(H_0/H))$. Consequently, the asymptotic convergence rate of the corresponding PCG algorithm is $1 - 2/(C\sqrt{1 + \log(H_0/H)} + 1)$. In particular, if the coarse grid satisfies $H \eqsim H_0$, then the asymptotic convergence rate of the PCG algorithm is bounded uniformly.
The rest of the paper is organized as follows. In Section 2, we introduce some basic notation,
the PCG algorithm and some theoretical foundations. In Section 3, we quote some main results on
the weighted L²-projection from [23]. We also consider the approximation property and stability of the weighted L²-projection in some special cases mentioned above. In Section 4, we analyze the eigenvalue distribution of the domain decomposition preconditioned system and prove the convergence rate of the PCG algorithm. In Section 5, we give some concluding remarks.
Following [20], we will use the following short notation: $x \lesssim y$ means $x \le Cy$; $x \gtrsim y$ means $x \ge cy$; and $x \eqsim y$ means $cx \le y \le Cx$, where c and C are generic positive constants independent of the variables in the inequalities and of any other parameters related to mesh, space and, especially, the coefficients.
2. PRELIMINARY
2.1. Notation
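We introduce the bilinear form
$$a(u, v) = \sum_{m=1}^{M} \omega_m (\nabla u, \nabla v)_{L^2(\Omega^0_m)} \qquad \forall u, v \in H^1_D(\Omega)$$
where $H^1_D(\Omega) = \{v \in H^1(\Omega) : v|_{\Gamma_D} = 0\}$, and introduce the H¹-norm and seminorm with respect to any subregion $\Omega^0_m$ by
Thus,
$$a(u, u) = \sum_{m=1}^{M} \omega_m |u|^2_{1,\Omega^0_m} := |u|^2_{1,\omega}$$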
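be the vector representation of $u, v \in V_h$, respectively, i.e. $u = \sum_{i=1}^{n} \tilde u_i \phi_i$ and $v = \sum_{i=1}^{n} \tilde v_i \phi_i$. Define
$$(\tilde u, \tilde v)_{2,\omega} = \sum_{i=1}^{n} \bar\omega_i \tilde u_i \tilde v_i$$
where $\bar\omega_j = \int_{o_j} \omega \, dx / |o_j|$ is the average of the coefficient on the local patch $o_j = \mathrm{supp}(\phi_j)$. By definition and quasi-uniformity, we can easily see that
$$h^d (\tilde u, \tilde u)_{2,\omega} \eqsim \|u\|^2_{0,\omega}$$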
Let $\kappa(A)$ be the condition number of A, i.e. the ratio between the largest and the smallest eigenvalues. By the standard finite element theory (cf. [14]), it is apparent that
$$\kappa(A) \eqsim h^{-2} \mathcal{J}(\omega) \qquad \text{with} \qquad \mathcal{J}(\omega) = \frac{\max_m \omega_m}{\min_m \omega_m}$$
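$$k \ge m + \left(\log\frac{2}{\epsilon} + m \log\big(\kappa(BA) - 1\big)\right) c_0^{-1} \qquad (7)$$
where $c_0 = \log\left(\frac{\sqrt{b/a} + 1}{\sqrt{b/a} - 1}\right)$. More detailed discussions on the iteration number of PCG methods can be found in [32, 38].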
Observing the convergence estimate (6), if there are only a few small eigenvalues of BA in $\sigma_0(BA)$, then the convergence rate of the PCG method will be dominated by the factor $(\sqrt{b/a} - 1)/(\sqrt{b/a} + 1)$, i.e. by $b/a$, where $b = \lambda_n(BA)$ and $a = \lambda_{m+1}(BA)$. We define this quantity
as the ‘effective condition number.’
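The iteration estimate (7) separates the cost of the m small eigenvalues (through κ(BA)) from the asymptotic rate (through the effective condition number b/a). A small plain-Python sketch, assuming 0 < ε < 1, b/a > 1 and, when m > 0, κ(BA) > 1, evaluates the smallest integer k allowed by $k \ge m + (\log(2/\epsilon) + m \log(\kappa(BA) - 1))/c_0$ with $c_0 = \log((\sqrt{b/a} + 1)/(\sqrt{b/a} - 1))$; the function name is hypothetical.

```python
import math


def pcg_iteration_bound(kappa_eff, kappa, m, eps):
    """Smallest integer k with k >= m + (log(2/eps) + m*log(kappa - 1)) / c0, where
    c0 = log((sqrt(kappa_eff) + 1) / (sqrt(kappa_eff) - 1)),
    kappa_eff = b/a = lambda_n / lambda_{m+1} and kappa = lambda_n / lambda_1."""
    r = math.sqrt(kappa_eff)
    c0 = math.log((r + 1.0) / (r - 1.0))
    penalty = m * math.log(kappa - 1.0) if m > 0 else 0.0   # cost of the m small eigenvalues
    return math.ceil(m + (math.log(2.0 / eps) + penalty) / c0)


print(pcg_iteration_bound(4.0, 4.0, 0, 1e-6))       # no outliers, b/a = 4
print(pcg_iteration_bound(4.0, 101.0, 2, 1e-6))     # two outliers, b/a = 4, kappa = 101
print(pcg_iteration_bound(101.0, 101.0, 0, 1e-6))   # bound driven by the full kappa only
```

With ε = 10⁻⁶ the three calls give 14, 24 and 73: two outlying eigenvalues with κ(BA) = 101 cost about ten extra iterations when b/a = 4, far fewer than a bound driven by the full condition number alone.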
To estimate the effective condition number, we need to estimate $\lambda_{m+1}(A)$. A fundamental tool is the Courant–Fischer ‘minimax’ principle (see, e.g. [34]):
Lemma 2.3
Let V be an n-dimensional Hilbert space and let A : V → V be an SPD operator on V. Suppose that $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ are the eigenvalues of A. Then
$$\lambda_{m+1}(A) = \max_{\dim(S) = m} \ \min_{0 \ne v \in S^{\perp}} \frac{(Av, v)}{(v, v)}$$
for $m = 0, 1, \ldots, n-1$. In particular, for any subspace $V_0 \subset V$ with $\dim(V_0) = n - m$, the following estimate of $\lambda_{m+1}(A)$ holds:
$$\lambda_{m+1}(A) \ge \min_{0 \ne v \in V_0} \frac{(Av, v)}{(v, v)} \qquad (8)$$
Inequality (8) is the starting point for our analysis of the eigenvalue distribution. It enables us to obtain a lower bound for each eigenvalue $\lambda_{m+1}(A)$ if we can estimate $\min_{0 \ne v \in V_0} (Av, v)/(v, v)$ for some
suitable subspace V0 .
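Inequality (8) can be checked numerically: for an orthonormal basis Q of V0, the minimum Rayleigh quotient over V0 is the smallest eigenvalue of QᵀAQ. The NumPy sketch below (a random SPD matrix and a random subspace, both hypothetical) confirms that this minimum never exceeds λ_{m+1}(A), and that equality holds when V0 is spanned by the eigenvectors belonging to λ_{m+1}, …, λ_n.

```python
import numpy as np


def min_rayleigh_quotient(A, basis):
    """min over nonzero v in span(basis) of (Av, v)/(v, v)."""
    Q, _ = np.linalg.qr(basis)                # orthonormal basis of the subspace V0
    return np.linalg.eigvalsh(Q.T @ A @ Q)[0]


rng = np.random.default_rng(1)
n, m = 8, 3
X = rng.standard_normal((n, n))
A = X @ X.T + np.eye(n)                       # a random SPD matrix
lam, U = np.linalg.eigh(A)                    # lam[0] <= ... <= lam[n-1]

V0_random = rng.standard_normal((n, n - m))   # generic subspace with dim(V0) = n - m
V0_best = U[:, m:]                            # eigenvectors for lambda_{m+1}, ..., lambda_n
print(min_rayleigh_quotient(A, V0_random), "<=", lam[m])   # lam[m] is lambda_{m+1}
print(min_rayleigh_quotient(A, V0_best), "==", lam[m])
```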
3. WEIGHTED L²-PROJECTION
Similar to [30], a major tool to analyze the overlapping domain decomposition preconditioner is the weighted L²-projection $Q^{\omega}_H : L^2(\Omega) \to V_H$ defined by
$$(Q^{\omega}_H u, v_H)_{0,\omega} = (u, v_H)_{0,\omega} \qquad \forall v_H \in V_H$$
In this section, we shall recall some main results on the weighted L²-projection from [23]. Most of the results in this section can also be found in [30].
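and
$$|Q^{\omega}_H u|_{1,\omega} \lesssim c_d(h, H)\, |u|_{1,\omega}$$
where
$$c_d(h, H) = C \cdot \begin{cases} \left(\log \dfrac{H}{h}\right)^{1/2} & \text{if } d = 2 \\[1ex] \left(\dfrac{H}{h}\right)^{1/2} & \text{if } d = 3 \end{cases}$$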
The proof of this lemma is based on the properties of the standard interpolation operator and the Sobolev imbedding theorem (for details, see [23]). The above lemma is not necessarily true for a general H¹(Ω) function. However, if we use the full weighted H¹-norm, then we have
$$\|(I - Q^{\omega}_H) u\|_{0,\omega} \lesssim H |\log H|^{1/2} \|u\|_{1,\omega}$$
In general, we cannot replace $\|u\|_{1,\omega}$ by the seminorm $|u|_{1,\omega}$ in the above lemma. For this purpose, $u \in H^1_D(\Omega)$ must satisfy a certain condition. We introduce a subspace $\widetilde{H}^1_D(\Omega)$ of $H^1_D(\Omega)$ as follows:
$$\widetilde{H}^1_D(\Omega) = \left\{ v \in H^1_D(\Omega) : \int_{\Omega^0_m} v \, dx = 0, \ \forall m \in I \right\}$$
Remark 3.3
The condition $\int_{\Omega^0_m} v = 0$ is not essential. The main idea is to introduce a subspace such that the Poincaré–Friedrichs inequality (9) holds. It can be replaced by some other conditions. For example, we can use
$$\int_{F_m} v \, dx = 0, \qquad F_m \subset \partial\Omega^0_m \ \text{and} \ \mathrm{meas}(F_m) > 0$$
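$$\|(I - Q^{\omega}_H) v\|_{0,\omega} \lesssim H |\log H|^{1/2} |v|_{1,\omega} \qquad (10)$$
and
$$|Q^{\omega}_H v|_{1,\omega} \lesssim |\log H|^{1/2} |v|_{1,\omega} \qquad (11)$$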
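Proof
From the assumption, v satisfies the Poincaré–Friedrichs inequality (9). Inequality (10) then follows by Lemma 3.2.
The proof of inequality (11) relies on (10) and the local L² projection $Q_\tau : L^2(\tau) \to P_1(\tau)$ defined by $(Q_\tau u, \chi) = (u, \chi)$ for all $\chi \in P_1(\tau)$. Then on each element $\tau \in T_H$, we have
$$\begin{aligned}
|Q^{\omega}_H v|^2_{1,\tau} &\lesssim |Q^{\omega}_H v - Q_\tau v|^2_{1,\tau} + |Q_\tau v|^2_{1,\tau} \\
&\lesssim H^{-2} \|Q^{\omega}_H v - Q_\tau v\|^2_{0,\tau} + |Q_\tau v|^2_{1,\tau} \\
&\lesssim H^{-2} \left( \|v - Q^{\omega}_H v\|^2_{0,\tau} + \|v - Q_\tau v\|^2_{0,\tau} \right) + |Q_\tau v|^2_{1,\tau} \\
&\lesssim H^{-2} \|v - Q^{\omega}_H v\|^2_{0,\tau} + |v|^2_{1,\tau}
\end{aligned}$$
In the last inequality, we used the stability and approximation properties of $Q_\tau$; see [23, Lemma 3.3]. By multiplying suitable weights and summing up over all $\tau \in T_H$ on both sides, we obtain
$$|Q^{\omega}_H v|^2_{1,\omega} \lesssim H^{-2} \|v - Q^{\omega}_H v\|^2_{0,\omega} + |v|^2_{1,\omega} \lesssim |\log H| \, |v|^2_{1,\omega}$$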
By the Poincaré–Friedrichs inequality (9), for each $v \in \widetilde{H}^1_D(\Omega)$, we have
In this case, we can obtain the following approximation and stability properties for the weighted L²-projection:
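Lemma 3.6
In $\mathbb{R}^3$, assume that each subregion $\Omega^0_m$ ($m = 1, \ldots, M$) satisfies $H_0 \lesssim \mathrm{length}(E)$ for each edge E of $\Omega^0_m$. Then for all $v \in \widetilde{H}^1_D(\Omega)$, we have
$$\|(I - Q^{\omega}_H) v\|_{0,\omega} \lesssim H \left( \log\frac{H_0}{H} \right)^{1/2} |v|_{1,\omega} \qquad (13)$$
and
$$|Q^{\omega}_H v|_{1,\omega} \lesssim \left( \log\frac{H_0}{H} \right)^{1/2} |v|_{1,\omega} \qquad (14)$$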
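Proof
Define $w \in V_H$ by
$$w = \begin{cases} w_m & \text{at the nodes inside } \Omega^0_m \\ Q_F u & \text{at the nodes inside } F \subset \partial\Omega^0_m \\ 0 & \text{at the nodes elsewhere} \end{cases}$$
where $w_m = Q_H v$ is the standard L²-projection of v, $F \subset \partial\Omega^0_m$ is any face of $\Omega^0_m$, and $Q_F : L^2(F) \to V_H(F)$ is the orthogonal $L^2(F)$ projection. Then
$$\begin{aligned}
\|w - w_m\|^2_{L^2(\Omega^0_m)} &\lesssim H^3 \sum_{x \in \partial\Omega^0_m} (w - w_m)^2(x) \\
&\lesssim H^3 \sum_{F \subset \partial\Omega^0_m} \sum_{x \in F} (w - w_m)^2(x) \\
&\lesssim H^3 \sum_{F \subset \partial\Omega^0_m} \Big( \sum_{x \in F} (w_m - Q_F u)^2(x) + \sum_{x \in \partial F} w_m^2(x) \Big) \\
&\lesssim \sum_{F \subset \partial\Omega^0_m} \left( H \|w_m - Q_F u\|^2_{L^2(F)} + H^2 \|w_m\|^2_{L^2(\partial F)} \right)
\end{aligned}$$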
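We need to bound two terms appearing in the last expression. For the first term, we have
$$\sum_{F \subset \partial\Omega^0_m} H \|w_m - Q_F u\|^2_{L^2(F)} \lesssim H \|u - w_m\|^2_{L^2(\partial\Omega^0_m)} \lesssim H^2 |u|^2_{1,\Omega^0_m}$$
$$\|v\|_{L^2(\partial\Omega^0_m)} \lesssim \epsilon^{-1} \|v\|_{0,\Omega^0_m} + \epsilon \|v\|_{1,\Omega^0_m} \qquad (15)$$
In the last step, we used the stability of $Q_H$: $|w_m|_{1,\Omega^0_m} = |Q_H u|_{1,\Omega^0_m} \lesssim |u|_{1,\Omega^0_m}$. Consequently,
$$\|w - w_m\|_{0,\Omega^0_m} \lesssim H \left( \log\frac{H_0}{H} \right)^{1/2} |u|_{1,\Omega^0_m}$$
This proves (13). The proof of the stability estimate (14) is the same as in Lemma 3.4.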
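Remark 3.7
In addition to the condition in Lemma 3.6, if $H \eqsim H_0$, then for all $v \in \widetilde{H}^1_D(\Omega)$ we have
$$\|(I - Q^{\omega}_H) v\|_{0,\omega} \lesssim H |v|_{1,\omega} \qquad (16)$$
$$|Q^{\omega}_H v|_{1,\omega} \lesssim |v|_{1,\omega} \qquad (17)$$
Inequalities (16) and (17) follow by the same proof as Lemma 3.6.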
In this section, we consider the two level overlapping domain decomposition methods. Specifically,
there is a fine grid Th with mesh size h as described in Section 2.2, on which the solution is
sought. There is also a coarse grid TH with mesh size H. For simplicity, we assume that each
element in TH is a union of some elements in Th , and we also assume that TH aligns with
the jump interface. Let V := Vh and V0 := VH be the piecewise linear continuous finite element
spaces on Th and TH , respectively.
from $\Omega_l$. Here, we make no assumption on the relationship between this partition and the jump regions $\Omega^0_m$ ($m = 1, \ldots, M$). Based on the partition, a natural decomposition of the finite element space V is
$$V = \sum_{l=1}^{L} V_l, \qquad \text{where } V_l := \{v \in V : v = 0 \text{ in } \Omega \setminus \Omega_l\}$$
As usual, we introduce the coarse space $V_0$ to provide the global coupling between subdomains. Obviously, we have the space decomposition
$$V = \sum_{l=0}^{L} V_l$$
$$Q_l A = A_l P_l \qquad \text{and} \qquad Q_l Q_k = Q_k Q_l = Q_k \quad \text{for } k \le l$$
$$B = \sum_{l=0}^{L} A_l^{-1} Q_l \qquad (18)$$
Obviously, we have
$$BA = \sum_{l=0}^{L} A_l^{-1} Q_l A = \sum_{l=0}^{L} P_l$$
k (D −1 A)C1 k (BA)C2
$$\lambda_{m_0+1}(BA) \gtrsim \lambda_{m_0+1}(D^{-1}A) \gtrsim h^2$$
From this relationship, we can see that the $m_0$th effective condition number satisfies $\kappa_{m_0+1}(BA) \lesssim h^{-2}$, a bound independent of the coefficients. We refer to [30] for a simple analytic proof of this fact. However, this estimate is too rough. It was pointed out that $\kappa_{m_0+1}(BA)$ could be much better than this estimate, but no rigorous proof was given in [31]. In the following subsection, we analyze the
eigenvalue distribution of BA and prove the robustness of the additive Schwarz preconditioner.
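To see the preconditioner (18) at work, the following NumPy sketch assembles its usual matrix form, B = I0 A0⁻¹ I0ᵀ + Σ_l R_lᵀ A_l⁻¹ R_l with A0 = I0ᵀ A I0, for a 1D model problem −(ωu′)′ = 1 with piecewise-constant ω aligned with the coarse grid, and runs PCG with it. Everything here is an assumption for illustration: the 1D setting, the coefficient values 1 and 10⁴, the overlap of two fine elements, the dense inverses, and all function names. It is not the code used in the paper and does not reproduce its 3D estimates.

```python
import numpy as np


def stiffness_1d(omega):
    """Linear FE stiffness matrix of -(omega u')' on (0,1), u(0) = u(1) = 0,
    with omega constant on each of the N = len(omega) uniform elements."""
    N = len(omega)
    A = np.zeros((N + 1, N + 1))
    for e in range(N):
        A[e:e + 2, e:e + 2] += omega[e] * N * np.array([[1.0, -1.0], [-1.0, 1.0]])
    return A[1:-1, 1:-1]                         # keep interior nodes only


def coarse_prolongation(N, Nc):
    """I_0: interior coarse hat functions (mesh size 1/Nc) evaluated at interior fine nodes."""
    x = np.arange(1, N) / N
    X = np.arange(1, Nc) / Nc
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - X[None, :]) * Nc)


def overlapping_subdomains(N, Nc, overlap):
    """0-based indices of interior fine nodes strictly inside each coarse element
    extended by `overlap` fine elements on both sides."""
    ratio = N // Nc
    subs = []
    for k in range(Nc):
        first = max(1, k * ratio - overlap + 1)
        last = min(N - 1, (k + 1) * ratio + overlap - 1)
        subs.append(np.arange(first, last + 1) - 1)
    return subs


def additive_schwarz(A, I0, subdomains):
    """r -> B r with B = I0 A0^{-1} I0^T + sum_l R_l^T A_l^{-1} R_l, A0 = I0^T A I0.
    Dense inverses are used only because this is a tiny illustration."""
    A0_inv = np.linalg.inv(I0.T @ A @ I0)
    local = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in subdomains]

    def apply(r):
        z = I0 @ (A0_inv @ (I0.T @ r))           # coarse-space correction (l = 0)
        for idx, Al_inv in local:                # overlapping local solves (l >= 1)
            z[idx] += Al_inv @ r[idx]
        return z
    return apply


def pcg(A, b, apply_B, rtol=1e-8, maxiter=500):
    """Preconditioned conjugate gradients from x0 = 0; returns (x, iterations)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_B(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= rtol * np.linalg.norm(b):
            return x, k
        z = apply_B(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter


N, Nc = 64, 8
rng = np.random.default_rng(0)
# Hypothetical coefficients: constant on each coarse element (so T_H aligns with the
# jumps), randomly equal to 1 or 1e4.
omega = np.repeat(np.where(rng.random(Nc) < 0.5, 1.0, 1e4), N // Nc)
A = stiffness_1d(omega)
b = np.full(N - 1, 1.0 / N)                      # load vector for f = 1
B = additive_schwarz(A, coarse_prolongation(N, Nc), overlapping_subdomains(N, Nc, 2))
x, iterations = pcg(A, b, B)
print(iterations, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

The printed relative residual confirms the solve; varying the seed or the jump size is a simple way to probe the robustness with respect to the coefficients discussed in this section.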
We shall emphasize here that $\dim(\widetilde{V}^{\perp}) = m_0$ and that the Poincaré–Friedrichs inequality (9) holds for any $v \in \widetilde{V}$. Then we have the following stable decomposition result:
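Lemma 4.2
For any $v \in V$, there exist $v_l \in V_l$ such that $v = \sum_{l=0}^{L} v_l$ and
$$\sum_{l=0}^{L} a(v_l, v_l) \lesssim c_d(h, H)^2 \, a(v, v) \qquad (19)$$
For any $v \in \widetilde{V}$, there exist $v_l \in V_l$ such that $v = \sum_{l=0}^{L} v_l$ and
$$\sum_{l=0}^{L} a(v_l, v_l) \lesssim |\log H| \, a(v, v) \qquad (20)$$
Furthermore, if each subdomain $\Omega^0_m$ satisfies $\mathrm{length}(E) \gtrsim H_0$ for any edge E of $\Omega^0_m$, then for any $v \in \widetilde{V}$, there exist $v_l \in V_l$ such that $v = \sum_{l=0}^{L} v_l$ and
$$\sum_{l=0}^{L} a(v_l, v_l) \lesssim \left(1 + \log\frac{H_0}{H}\right) a(v, v) \qquad (21)$$
In particular, in this case, if the coarse grid satisfies $H \eqsim H_0$, then $\sum_{l=0}^{L} a(v_l, v_l) \lesssim a(v, v)$.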
Proof
The ideas to prove inequalities (19)–(21) are the same. The main difference is that we use different
properties of weighted L 2 -projection in Section 3. Here, we follow the idea from [20].
Let $\{\theta_l\}_{l=1}^{L}$ be a partition of unity defined on $\Omega$ satisfying $\sum_{l=1}^{L} \theta_l = 1$ and, for $l = 1, 2, \ldots, L$,
Theorem 4.3
For the additive Schwarz preconditioner B defined by (18), the eigenvalues of BA satisfy
$$\lambda_{\min}(BA) \gtrsim c_d(h, H)^{-2}, \qquad \lambda_{m_0+1}(BA) \ge C |\log H|^{-1} \qquad \text{and} \qquad \lambda_{\max}(BA) \le C$$
Moreover, when d = 3 and each subregion $\Omega^0_m$ is a polyhedral domain with each edge of length of order $H_0$, then
$$\lambda_{m_0+1}(BA) \ge C \left(1 + \log\frac{H_0}{H}\right)^{-1}$$
In particular, if $H \eqsim H_0$, then $\lambda_{m_0+1}(BA) \ge C$.
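Proof
Note that $BA = \sum_{l=0}^{L} P_l$; by a standard coloring argument, we have
$$\lambda_{\max}(BA) \le C$$
For the minimum eigenvalue, for any $v \in V$ consider the decomposition $v = \sum_{l=0}^{L} v_l$ as in Lemma 4.2. By the Schwarz inequality, we obtain
$$\begin{aligned}
a(v, v) &= \sum_{l=0}^{L} a(v_l, v) = \sum_{l=0}^{L} a(v_l, P_l v) \\
&\le \Big( \sum_{l=0}^{L} a(v_l, v_l) \Big)^{1/2} \Big( \sum_{l=0}^{L} a(P_l v, P_l v) \Big)^{1/2} \\
&= \Big( \sum_{l=0}^{L} a(v_l, v_l) \Big)^{1/2} \big( a(BAv, v) \big)^{1/2}
\end{aligned}$$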
Remark 4.4
Theorem 4.3 gives a direct proof of the robustness of the overlapping domain decomposition preconditioner for the variable coefficient problem (1). That is, the preconditioned system has only $m_0$ small
Theorem 4.5
In $\mathbb{R}^3$, assume that each subregion $\Omega^0_m$ ($m = 1, \ldots, M$) is a polyhedral domain with each edge of length of order $H_0$. Let $u \in V$ be the exact solution to Equation (2) and let $\{u_k : k = 0, 1, 2, \ldots\}$ be the solution sequence of the PCG algorithm. Then we have
$$\frac{\|u - u_k\|_A}{\|u - u_0\|_A} \le 2 \left( \frac{C_0 H}{h} - 1 \right)^{m_0} \delta^{\,k - m_0} \qquad \text{for } k \ge m_0$$
where $\delta = 1 - 2/(C\sqrt{1 + \log(H_0/H)} + 1) < 1$ and $C_0$, $C$ are constants independent of the coefficients and mesh size. Moreover, given a tolerance $0 < \epsilon < 1$, the number of iterations needed for $\|u - u_k\|_A / \|u - u_0\|_A < \epsilon$ satisfies
$$k \ge m_0 + \left( \log\frac{2}{\epsilon} + m_0 \log\left( \frac{C_0 H}{h} - 1 \right) \right) |\log \delta|^{-1}$$
In particular, if $H \eqsim H_0$, then the asymptotic convergence rate of the PCG algorithm is uniformly bounded with respect to both the coefficients and the mesh size.
Theorem 4.5 is a direct consequence of inequalities (6) and (7) and Theorem 4.3.
Remark 4.6
From Theorem 4.5, although the convergence rate deteriorates slightly because of the condition number $\kappa(BA)$, the number $m_0$ is fixed, so the asymptotic convergence rate can be bounded by $\delta < 1$, which is uniform with respect to the coefficients and the mesh size.
Without the assumption on the subdomains $\Omega^0_m$ ($m = 1, \ldots, M$), Theorem 4.5 becomes
$$\frac{\|u - u_k\|_A}{\|u - u_0\|_A} \le 2 \left( \frac{C_0 H}{h} - 1 \right)^{m_0} \left( 1 - \frac{2}{C_1 \sqrt{|\log H|} + 1} \right)^{k - m_0} \qquad \text{for } k \ge m_0$$
and
$$k \ge m_0 + \left( \log\frac{2}{\epsilon} + m_0 \log\left( \frac{C_0 H}{h} - 1 \right) \right) c_0(H)^{-1}$$
where
$$c_0(H) = \log \frac{C_1 \sqrt{|\log H|} + 1}{C_1 \sqrt{|\log H|} - 1}$$
286 Y. ZHU
Remark 4.7
By similar arguments, the results above can be generalized to additive Schwarz preconditioners with inexact subdomain solvers (cf. [39]) and to multilevel additive Schwarz preconditioners (cf. [41]). For the BPX preconditioner and the multigrid V-cycle preconditioner, similar results can be found in [30].
5. MATRIX REPRESENTATIONS
So far, our analysis has been based on the operator form (19). In this section, we examine the algebraic representation of this preconditioner and show that the weighted $L^2$-projection $Q_l$ is introduced for theoretical purposes only; that is, the matrix representation of $B$ is independent of $Q_l$.
Let $V$ be the finite element space with the nodal basis $\{\phi_1, \ldots, \phi_n\}$. Then, given any function $v \in V$, there exists a unique $\mu \in \mathbb{R}^n$ such that
$$v = \sum_{i=1}^{n}\mu_i\phi_i.$$
Let $\tilde v = \mu$ be the vector representation of $v$. Given two linear vector spaces $V$ and $W$ and a linear operator $A \in L(V, W)$, the matrix representation of $A$ with respect to a basis $\{\phi_1, \ldots, \phi_n\}$ of $V$ and a basis $\{\psi_1, \ldots, \psi_k\}$ of $W$ is the matrix $\tilde A = (\tilde a_{ij}) \in \mathbb{R}^{k\times n}$ satisfying
$$A\phi_j = \sum_{i=1}^{k}\tilde a_{ij}\psi_i \quad \text{for } 1 \le j \le n.$$
From the above definitions, it is easy to see that for any two operators $A, B \in L(V)$ and $v \in V$, we have
$$\widetilde{AB} = \tilde A\tilde B \quad \text{and} \quad \widetilde{Av} = \tilde A\tilde v. \qquad (22)$$
Let $V_0 \subset V$ be a subspace equipped with a basis $\{\phi^0_1, \ldots, \phi^0_{n_0}\}$. Then there exists a unique matrix $I_0 = (e_{ij}) \in \mathbb{R}^{n\times n_0}$ such that
$$\phi^0_j = \sum_{i=1}^{n}e_{ij}\phi_i \quad \text{for } j = 1, \ldots, n_0.$$
This matrix is the matrix representation of the natural inclusion $I_0 : V_0 \to V$ and is known as a prolongation matrix. Its transpose $I_0^t$ is known as a restriction matrix.
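For nodal bases, evaluating the defining relation at the node $x_i$ of $\phi_i$ gives $e_{ij} = \phi^0_j(x_i)$; that is, column $j$ of $I_0$ lists the values of the coarse basis function $\phi^0_j$ at the fine nodes. The sketch below illustrates this in one dimension, which is an assumption made for illustration only (uniform nested piecewise linear meshes on $(0,1)$ with zero Dirichlet data); all function names are hypothetical.

```python
def hat(x, center, width):
    """Nodal P1 basis function: 1 at `center`, 0 outside (center - width, center + width)."""
    return max(0.0, 1.0 - abs(x - center) / width)


def prolongation_1d(n0):
    """Prolongation matrix I0 = (e_ij) for nested P1 spaces on (0, 1).

    Illustrative assumptions: uniform coarse mesh with n0 interior nodes,
    one uniform refinement, interior nodes only (zero Dirichlet data).
    Because the fine basis is nodal, e_ij = phi0_j(x_i).
    """
    H = 1.0 / (n0 + 1)          # coarse mesh size
    n = 2 * n0 + 1              # number of fine interior nodes
    h = 1.0 / (n + 1)           # fine mesh size
    return [[hat((i + 1) * h, (j + 1) * H, H) for j in range(n0)]
            for i in range(n)]


def transpose(mat):
    """Restriction matrix I0^t."""
    return [list(col) for col in zip(*mat)]


def matvec(mat, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in mat]


I0 = prolongation_1d(2)
# Fine-grid coefficients of phi0_1 + phi0_2.
print(matvec(I0, [1.0, 1.0]))
```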
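Define the mass matrix $M$ and the stiffness matrix $A$ by $M = \big((\phi_j, \phi_i)\big)_{i,j=1}^{n}$ and $A = \big(a(\phi_j, \phi_i)\big)_{i,j=1}^{n}$. By definition, we have $(u, v)_{0,\Omega} = (\tilde u, M\tilde v)_{\ell^2}$, and we can easily show that $A = M\tilde A$. Obviously, the prolongation and restriction matrices satisfy the following important relation: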
6. CONCLUSION
In this paper, we discussed the eigenvalue distribution of the additive and multiplicative overlapping domain decomposition methods for second-order elliptic equations with large jump coefficients. We proved that there are only a few small eigenvalues affected by the large jump and that the effective condition number of the preconditioned system is of $O(|\log H|)$. As a result, the asymptotic convergence rate of the PCG algorithm with the additive Schwarz preconditioner is $1 - 2/(C\sqrt{|\log H|} + 1)$. With additional assumptions on the subregions $\Omega^0_m$ $(m = 1, \ldots, M)$, we also proved that the effective condition number of the preconditioned system is uniformly bounded with respect to the coefficients and the mesh size. In this case, the asymptotic convergence rate of the PCG algorithm is bounded uniformly.
ACKNOWLEDGEMENTS
This work was supported in part by NSF DMS-0609727, NSFC-10528102 and the Center for Computational Mathematics and Applications at Pennsylvania State. The author would like to thank Professor Jinchao Xu for his valuable advice and comments on this paper.
REFERENCES
1. Alcouffe RE, Brandt A, Dendy JE, Painter JW. The multi-grid methods for the diffusion equation with strongly
discontinuous coefficients. SIAM Journal on Scientific and Statistical Computing 1981; 2:430–454.
2. Kees CE, Miller CT, Jenkins EW, Kelley CT. Versatile two-level Schwarz preconditioners for multiphase flow.
Computers and Geosciences 2003; 7(2):91–114.
3. Vuik C, Segal A, Meijerink JA. An efficient preconditioned CG method for the solution of a class of layered
problems with extreme contrasts in the coefficients. Journal of Computational Physics 1999; 152(1):385–403.
4. Heise B, Kuhn M. Parallel solvers for linear and nonlinear exterior magnetic field problems based upon coupled
FE/BE formulations. Computing 1996; 56(3):237–258.
5. Coomer RK, Graham IG. Massively parallel methods for semiconductor device modelling. Computing 1996;
56(1):1–27.
6. Howle VE, Vavasis SA. An iterative method for solving complex-symmetric systems arising in electrical power
modeling. SIAM Journal on Matrix Analysis and Applications 2005; 26(4):1150–1178.
7. Wang C. Fundamental models for fuel cell engineering. Chemical Reviews 2004; 104:4727–4766.
8. Wang Z, Wang C, Chen K. Two phase flow and transport in the air cathode of proton exchange membrane fuel
cells. Journal of Power Sources 2001; 94:40–50.
9. Bramble JH, Pasciak JE, Schatz AH. The construction of preconditioners for elliptic problems by substructuring.
IV. Mathematics of Computation 1989; 53(187):1–24.
10. Chan T, Wan W. Robust multigrid methods for nonsmooth coefficient elliptic linear systems. Journal of
Computational and Applied Mathematics 2000; 123:323–352.
11. Dryja M, Widlund OB. Schwarz methods of Neumann–Neumann type for three-dimensional elliptic finite element
problems. Communications on Pure and Applied Mathematics 1995; 48(2):121–155.
12. Mandel J, Brezina M. Balancing domain decomposition for problems with large jumps in coefficients. Mathematics
of Computation 1996; 65(216):1387–1401.
13. Smith BF. A domain decomposition algorithm for elliptic problems in three dimensions. Numerische Mathematik
1991; 60(2):219–234.
14. Xu J, Zou J. Some nonoverlapping domain decomposition methods. SIAM Review 1998; 40(4):857–914
(electronic).
15. Bramble JH, Pasciak JE, Schatz AH. The construction of preconditioners for elliptic problems by substructuring.
III. Mathematics of Computation 1988; 51(184):415–430.
16. Cho S, Nepomnyaschikh SV, Park E-J. Domain decomposition preconditioning for elliptic problems with jumps in
coefficients. Technical Report rep05-22, Radon Institute for Computational and Applied Mathematics (RICAM),
2005.
17. Nepomnyaschikh SV. Preconditioning operators for elliptic problems with bad parameters. Eleventh International
Conference on Domain Decomposition Methods, London. DDM.org: Augsburg, 1999; 82–88 (electronic).
18. Wang J, Xie R. Domain decomposition for elliptic problems with large jumps in coefficients. Proceedings of
Conference on Scientific and Engineering Computing. National Defense Industry Press: Beijing, China, 1994;
74–86.
19. Bramble JH, Pasciak JE, Wang J, Xu J. Convergence estimates for multigrid algorithms without regularity
assumption. Mathematics of Computation 1991; 57(195):23–45.
20. Xu J. Iterative methods by space decomposition and subspace correction. SIAM Review 1992; 34:581–613.
21. Le Tallec P. Domain decomposition methods in computational mechanics. Computational Mechanics Advances
1994; 1(2):121–220.
22. Smith BF, Bjørstad PE, Gropp WD. Domain Decomposition. Cambridge University Press: Cambridge, 1996.
23. Bramble JH, Xu J. Some estimates for a weighted L 2 projection. Mathematics of Computation 1991; 56(194):
463–476.
24. Oswald P. On the robustness of the BPX-preconditioner with respect to jumps in the coefficients. Mathematics
of Computation 1999; 68:633–650.
25. Wang J. New convergence estimates for multilevel algorithms for finite-element approximations. Journal of
Computational and Applied Mathematics 1994; 50(1–3):593–604.
26. Dryja M, Sarkis MV, Widlund OB. Multilevel Schwarz methods for elliptic problems with discontinuous
coefficients in three dimensions. Numerische Mathematik 1996; 72(3):313–348.
27. Dryja M, Smith BF, Widlund OB. Schwarz analysis of iterative substructuring algorithms for elliptic problems
in three dimensions. SIAM Journal on Numerical Analysis 1994; 31(6):1662–1694.
28. Xu J. Counter examples concerning a weighted L 2 projection. Mathematics of Computation 1991; 57:563–568.
29. Xu J, Zhu Y. Multilevel preconditioners for elliptic equations with jump coefficients on bisection grids. 2007,
preprint.
30. Xu J, Zhu Y. Uniform convergent multigrid methods for elliptic problems with strongly discontinuous coefficients.
Mathematical Models and Methods in Applied Sciences 2008; 18(2):1–29.
31. Graham IG, Hagger MJ. Unstructured additive Schwarz-conjugate gradient method for elliptic problems with
highly discontinuous coefficients. SIAM Journal on Scientific Computing 1999; 20:2041–2066.
32. Axelsson O. Iteration number for the conjugate gradient method. Mathematics and Computers in Simulation
2003; 61(3–6):421–435.
33. Xu J. Lecture Notes Multigrid Methods. Penn State MATH 552 (Fall 2006).
34. Golub GH, van Loan CF. Matrix Computations. Johns Hopkins University Press: Baltimore, MD, 1996.
35. Kelley CT. Iterative Methods for Linear and Nonlinear Equations, vol. 16. SIAM: Philadelphia, PA, 1995.
36. Saad Y. Iterative Methods for Sparse Linear Systems. SIAM: Philadelphia, PA, 2003.
37. Hackbusch W. Iterative Solution of Large Sparse Systems of Equations, vol. 95. Springer: New York, 1994.
38. Axelsson O. Iterative Solution Methods. Cambridge University Press: Cambridge, 1994.
39. Chan TF, Mathew TP. Domain decomposition algorithms. Acta Numerica 1994; 3:61–143.
40. Toselli A, Widlund O. Domain Decomposition Methods—Algorithms and Theory, vol. 34. Springer: Berlin, 2005.
41. Zhang X. Multilevel Schwarz methods. Numerische Mathematik 1992; 63(1):521–539.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:291–306
Published online 15 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.574
SUMMARY
This paper analyzes a multigrid (MG) V-cycle scheme for solving the discretized 2D Poisson equation with corner singularities. Using weighted Sobolev spaces $\mathcal{K}^m_a(\Omega)$ and a space decomposition based on elliptic projections, we prove that the MG V-cycle with standard smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) and piecewise linear interpolation converges uniformly for the linear systems obtained by finite element discretization of the Poisson equation on graded meshes. In addition, we provide numerical experiments to demonstrate the optimality of the proposed approach. Copyright © 2008 John Wiley & Sons, Ltd.
KEY WORDS: multigrid methods; graded meshes; uniform convergence; corner-like singularities
1. INTRODUCTION
Multigrid (MG) methods are arguably one of the most efficient techniques for solving the large
systems of algebraic equations resulting from finite element discretizations of elliptic boundary
value problems. Many of the known results on the convergence properties of MG methods for
elliptic equations can be found in monographs and survey papers by Bramble [1], Hackbusch [2],
Trottenberg et al. [3], Xu [4] and the references therein.
It is well known that the geometry of the boundary and changes in the boundary condition can influence the regularity of the solution [5–12]. In particular, if the domain possesses re-entrant corners or cracks, or if there are abrupt changes in the boundary conditions, then the solution of the elliptic boundary value problem may have singularities in $H^2$; we hereafter refer to singularities of these types as corner-like singularities. One possible approach for obtaining accurate numerical approximations to solutions near these types of singularities is to make use of graded meshes [6, 13–15], for which the quasi-optimal convergence rates of the numerical solutions can be recovered by using an analysis based on weighted Sobolev spaces. The analysis of the convergence rate of MG methods in such settings is, however, non-trivial. The difficulties that arise are due primarily to the lack of regularity of the solution and the non-uniformity of the mesh.
∗Correspondence to: Hengguang Li, Department of Mathematics, The Pennsylvania State University, University Park, PA 16802, U.S.A.
†E-mail: li h@math.psu.edu
A result for the uniform convergence of the MG method assuming full regularity was derived by Braess and Hackbusch [16]; in Brenner's paper [17], the convergence rate was analyzed under only partial regularity; Bramble et al. [18] developed a convergence estimate without regularity assumptions for an $L^2$-projection-based decomposition. In addition, on graded meshes, using the approximation property in [14], Yserentant [19] proved the uniform convergence of the MG W-cycle with a particular iterative method on each level for piecewise linear functions. There are also many other, more classical convergence proofs that use algebraic techniques and derive convergence results based on assumptions related to, but nevertheless different from, the regularity of the underlying partial differential equation [20, 21].
In this paper, using a space decomposition for elliptic projections and an estimate on the weighted Sobolev space $\mathcal{K}^m_a$, we prove the uniform convergence of the MG V-cycle with standard subspace smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) for elliptic problems with corner-like singularities, discretized using graded meshes. To date, this type of convergence analysis has been carried out only for problems with full elliptic regularity. The result presented here establishes the uniform convergence of the MG method for problems with less regular solutions, discretized using graded meshes that capture the behavior of the solution near the singularities. Although the main convergence theorem can be modified for elliptic problems discretized on general graded meshes, for exposition we restrict our discussion to the graded mesh refinement (GMR) strategy developed by Băcuţă et al. [6]. Before proceeding, we mention that, with appropriate modifications, our analysis for linear elements can also be applied to higher-order finite element methods.
where $\partial_D$ and $\partial_N$ consist of segments of the boundary, and we assume that the Neumann boundary condition is not imposed on adjacent sides of the boundary. We note that, in the Sobolev space $H^m$, corner-like singularities appear in the solution near vertices of the domain. Here, by vertices, we mean the points on $\bar\Omega$ where corner-like singularities in $H^2(\Omega)$ are located, namely, the geometric vertices of re-entrant corners, crack points, or points with an interior angle $> \pi/2$ where the boundary conditions change.
Copyright © 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:291–306
DOI: 10.1002/nla
MULTIGRID METHOD ON GRADED MESHES 293
Let $H^1_D(\Omega) = \{u \in H^1(\Omega) \mid u = 0 \text{ on } \partial_D\}$ be the space of $H^1(\Omega)$ functions with zero trace on $\partial_D$, let $\mathcal{T}_j$, $0 \le j \le J$, be a sequence of appropriately graded and nested triangulations of $\Omega$, and let $M_j$, $0 \le j \le J$, be the finite element space associated with the linear Lagrange triangle [22] on $\mathcal{T}_j$. Then
$$M_0 \subset M_1 \subset \cdots \subset M_j \subset \cdots \subset M_J \subset H^1_D(\Omega).$$
Let $A$ be the differential operator associated with Equation (1). Solving (1) amounts to finding an approximation $u_J \in M_J$ such that
$$a(u_J, v_J) = (Au_J, v_J) = (\nabla u_J, \nabla v_J) = (f, v_J) \quad \forall v_J \in M_J.$$
Denoting by $N_J$ the dimension of the space $M_J$, one can recover, by using a GMR strategy, the following quasi-optimal rate of convergence for the finite element approximation $u_J \in M_J$ on $\mathcal{T}_J$:
$$\|u - u_J\|_{H^1(\Omega)} \le C N_J^{-1/2}\|f\|_{L^2(\Omega)}.$$
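A practical way to check the $N_J^{-1/2}$ rate on a computed sequence of graded meshes is to fit the slope of $\log\|u - u_J\|_{H^1}$ against $\log N_J$. The sketch below assumes that the errors have already been computed (for example, against a reference solution); the data in the usage lines are synthetic and follow an exact $N^{-1/2}$ law, so they are not results from this paper. The helper name `observed_rate` is hypothetical.

```python
import math


def observed_rate(dofs, errors):
    """Least-squares slope of log(error) against log(N).

    For the estimate ||u - u_J||_{H^1} <= C N_J^{-1/2} ||f||_{L^2}, the
    slope observed on a sequence of graded meshes should approach -1/2.
    """
    xs = [math.log(n) for n in dofs]
    ys = [math.log(e) for e in errors]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic data following an exact N^{-1/2} law (not results from the paper).
dofs = [4 ** j * 25 for j in range(5)]
errors = [3.0 * n ** -0.5 for n in dofs]
print(round(observed_rate(dofs, errors), 6))  # -0.5
```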
The main objective of this paper is to prove the uniform convergence of the MG V-cycle with standard subspace smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) and linear interpolation applied to the 2D Poisson equation discretized using piecewise linear functions on graded meshes obtained via the GMR strategy introduced in [6]. Moreover, we shall show that the convergence rate $c$ of the MG V-cycle satisfies
$$c \le \frac{c_1}{c_1 + c_2 n},$$
where $c_1$ and $c_2$ are mesh-independent constants related to the elliptic equation and the smoother, respectively, and $n$ is the number of iterative solves on each subspace. We note that this result can also be used to estimate the efficiency of other subspace smoothers on graded meshes.
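The bound $c \le c_1/(c_1 + c_2 n)$ can be inverted to estimate how many smoothing steps are needed to reach a target rate. The constants $c_1$ and $c_2$ are not given explicitly in the paper, so the values used below are hypothetical, as are the helper names.

```python
import math


def vcycle_rate_bound(c1, c2, n):
    """Upper bound c1 / (c1 + c2 * n) on the V-cycle convergence rate."""
    return c1 / (c1 + c2 * n)


def smoothing_steps_for(c1, c2, target):
    """Smallest n >= 1 with c1 / (c1 + c2 * n) <= target, for 0 < target < 1."""
    # c1 / (c1 + c2 n) <= t  <=>  n >= c1 (1 - t) / (c2 t)
    return max(1, math.ceil(c1 * (1.0 - target) / (c2 * target)))


# Hypothetical constants c1 = 1, c2 = 0.25.
for n in (1, 2, 4, 8):
    print(n, vcycle_rate_bound(1.0, 0.25, n))
print(smoothing_steps_for(1.0, 0.25, 0.5))  # 4
```

Because the bound decays only like $1/n$, halving the guaranteed rate requires roughly doubling $n$ once $c_2 n \gg c_1$.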
The rest of this paper is organized as follows. In Section 2, we introduce the weighted Sobolev space $\mathcal{K}^m_a(\Omega)$ for boundary value problem (1) and review the method of subspace corrections (MSC). In addition, we briefly describe the GMR strategy under consideration for generating the sequence of graded meshes. Then, in Section 3, we prove the approximation and smoothing properties, which in turn lead to our main MG convergence theorem. Section 4 contains numerical results of the proposed method applied to problem (1).
In this section, we begin by introducing the weighted Sobolev space $\mathcal{K}^m_a(\Omega)$ and the mesh refinement strategy used to recover quasi-optimal rates of convergence of the finite element solution. Then, we present the MSC and a technique for estimating the norm of the product of non-expansive operators.
294 J. J. BRANNICK, H. LI AND L. T. ZIKATANOV
Let $(x, y) \in \bar\Omega$ be an arbitrary point and $S = \{S_i\}$ be the set of vertices of the domain at which the solution has singularities in $H^2(\Omega)$. Denote by $r_i(x, y)$ the distance from $(x, y)$ to the vertex $S_i \in S$, and let $\rho(x, y)$ be a smooth function on $\bar\Omega$ such that $\rho = r_i$ in a neighborhood of $S_i$ and $\rho \ge C > 0$ otherwise. Then, the weighted Sobolev space $\mathcal{K}^m_a(\Omega)$, $m \ge 0$, is defined as follows [6, 11]:
$$\mathcal{K}^m_a(\Omega) = \{u \in H^m_{\mathrm{loc}}(\Omega) \mid \rho^{i+j-a}\partial_x^i\partial_y^j u \in L^2(\Omega),\ i + j \le m\}.$$
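The corresponding $\mathcal{K}^m_a$-norm and seminorm for any function $v \in \mathcal{K}^m_a(\Omega)$ are
$$\|v\|^2_{\mathcal{K}^m_a(\Omega)} := \sum_{i+j\le m}\|\rho^{i+j-a}\partial_x^i\partial_y^j v\|^2_{L^2(\Omega)}, \qquad |v|^2_{\mathcal{K}^m_a(\Omega)} := \sum_{i+j=m}\|\rho^{m-a}\partial_x^i\partial_y^j v\|^2_{L^2(\Omega)}.$$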
Note that $\rho$ is equal to the distance function $r_i(x, y)$ near the vertex $S_i$. Thus, we have the following proposition and mesh refinements as in [6, 15].
Proposition 2.1
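We have $|v|_{\mathcal{K}^1_1(\Omega)} \simeq |v|_{H^1(\Omega)}$, $\|v\|_{\mathcal{K}^0_1(\Omega)} \ge C\|v\|_{L^2(\Omega)}$, and the Poincaré-type inequality $\|v\|_{\mathcal{K}^0_1(\Omega)} \le C|v|_{\mathcal{K}^1_1(\Omega)}$ for $v \in \mathcal{K}^1_1(\Omega) \cap \{v|_{\partial_D} = 0\}$.
Here, $a \simeq b$ means that there exist positive constants $C_1$, $C_2$ such that $C_1 b \le a \le C_2 b$.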
Definition 2.2
Let $\kappa$ be the ratio of decay of triangles near a vertex $S_i \in S$. Then, for every $a < \min_i(\pi/(t\omega_i))$, one can choose $\kappa = 2^{-1/a}$, where $\omega_i$ is the interior angle at vertex $S_i$, $t = 1$ at vertices with Dirichlet boundary conditions on both sides, and $t = 2$ if the boundary condition changes type at $S_i$. For example, $\omega_i = 2\pi$ and $t = 1$ at crack points with Dirichlet boundary conditions on both sides. In the initial triangulation, we require that each triangle contains at most one point of $S$ and that each $S_i$ is a vertex of some triangle. In other words, no point of $S$ lies on an edge or in the interior of a triangle. Let $\mathcal{T}_j = \{T_k\}$ be the triangulation after $j$ refinements. Then, for the $(j+1)$th refinement, if the function $\rho$ is bounded away from 0 on a triangle (no point of $S$ contained), new triangles are obtained by connecting the midpoints of the edges of the old one. However, if $S_i$ is one of the vertices of a triangle $S_iBC$, then we choose a point $D$ on $S_iB$ and another point $E$ on $S_iC$ such that the following holds for the ratios of the lengths:
$$\kappa = \frac{|S_iD|}{|S_iB|} = \frac{|S_iE|}{|S_iC|}.$$
In this way, the triangle $S_iBC$ is divided into four smaller triangles by connecting $D$, $E$, and the midpoint of $BC$ (see Figure 1).
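The refinement rule of Definition 2.2 for a single triangle can be sketched as follows. The sketch assumes that the singular vertex is listed first, takes $0 < a < 1$, and uses plain coordinate tuples; it refines one triangle rather than a whole mesh, and the function names are hypothetical.

```python
def grading_ratio(a):
    """kappa = 2**(-1/a) for a grading parameter 0 < a < 1 (Definition 2.2)."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must lie in (0, 1)")
    return 2.0 ** (-1.0 / a)


def lerp(p, q, t):
    """Point p + t (q - p)."""
    return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))


def refine_triangle(tri, kappa=None):
    """Split the triangle tri = (S, B, C) into four children.

    If kappa is None, S is not singular and the usual midpoint rule is used.
    Otherwise S is the singular vertex: D on SB and E on SC satisfy
    |SD|/|SB| = |SE|/|SC| = kappa, and M is the midpoint of BC.
    """
    s, b, c = tri
    t = 0.5 if kappa is None else kappa
    d, e = lerp(s, b, t), lerp(s, c, t)
    m = lerp(b, c, 0.5)
    return [(s, d, e), (d, b, m), (e, m, c), (d, m, e)]


def area(tri):
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


kappa = grading_ratio(0.5)                     # a = 1/2 gives kappa = 1/4
children = refine_triangle(((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)), kappa)
print([round(area(t), 4) for t in children])
```

Passing `kappa=None` reproduces the midpoint rule used away from $S$, so the same routine covers both cases of the definition.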
We note that other refinements, for example, those found in [13, 14] also satisfy this condition,
although they follow different constructions. We now conclude this subsection by restating the
following theorem derived in [6, 15].
Theorem 2.3
Let $u_j \in M_j$ be the finite element solution of Equation (1) and denote by $N_j$ the dimension of $M_j$. Then, there exists a constant $B_1$, independent of $j$ and $f$, such that
$$\|u - u_j\|_{H^1(\Omega)} \le B_1 N_j^{-1/2}\|f\|_{\mathcal{K}^0_{a-1}(\Omega)} \le B_1 N_j^{-1/2}\|f\|_{L^2(\Omega)}$$
for every $f \in L^2(\Omega)$, where $a < 1$ is determined from Definition 2.2 and $M_j$ is the finite element space of linear functions on the graded mesh $\mathcal{T}_j$, as described in the introduction.
Remark 2.4
For $u \notin H^2(\Omega)$, this theorem follows from the fact that the differential operator $A : \mathcal{K}^{m+1}_{a+1}(\Omega) \cap \{u = 0 \text{ on } \partial_D\} \to \mathcal{K}^{m-1}_{a-1}(\Omega)$, $m \ge 0$, in Equation (1) is an isomorphism between the weighted Sobolev spaces.
where the pairing $(\cdot, \cdot)$ is the inner product in $L^2(\Omega)$. Here, $a(\cdot, \cdot)$ is a continuous bilinear form on $H^1_D(\Omega) \times H^1_D(\Omega)$ and, by the Poincaré inequality, is also coercive. In addition, since the $\mathcal{T}_j$ are nested,
$$M_0 \subset M_1 \subset \cdots \subset M_j \subset \cdots \subset M_J \subset H^1_D(\Omega),$$
$$(Au_j, v_j) = (A_ju_j, v_j) \quad \forall u \in H^1_D(\Omega),\ \forall u_j, v_j \in M_j.$$
Let $\mathcal{N}_j = \{x_i^j\}$ be the set of nodal points in $\mathcal{T}_j$ and let $\phi_k$, with $\phi_k(x_i^j) = \delta_{i,k}$, be the linear finite element nodal basis function corresponding to node $x_k^j$. Then, the $j$th level finite element discretization reads: Find $u_j \in M_j$ such that
$$A_ju_j = f_j, \qquad (2)$$
where $f_j \in M_j$ satisfies $(f_j, v_j) = (f, v_j)$, $\forall v_j \in M_j$.
The MSC reduces an MG process to choosing a sequence of subspaces and corresponding operators $B_j : M_j \to M_j$ approximating $A_j^{-1}$, $j = 1, \ldots, J$. For example, in the MSC framework, the standard MG backslash cycle for solving (2) is defined by the following subspace correction scheme:
$$u_{j,l} = u_{j,l-1} + B_j(f_j - A_ju_{j,l-1}),$$
where the operators $B_j : M_j \to M_j$, $0 \le j \le J$, are recursively defined as follows [24].
Algorithm 2.5
Let $R_j \approx A_j^{-1}$, $j > 0$, denote a local relaxation method. For $j = 0$, define $B_0 = A_0^{-1}$. Assume that $B_{j-1} : M_{j-1} \to M_{j-1}$ is defined. Then,
1. Fine grid smoothing: For $u_j^0 = 0$ and $k = 1, 2, \ldots, n$,
$$u_j^k = u_j^{k-1} + R_j(f_j - A_ju_j^{k-1}). \qquad (3)$$
2. Coarse grid correction: Find the corrector $e_{j-1} \in M_{j-1}$ by the iterator $B_{j-1}$:
$$e_{j-1} = B_{j-1}Q_{j-1}(f_j - A_ju_j^n).$$
Then, $B_jf_j = u_j^n + e_{j-1}$.
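Algorithm 2.5 translates almost line by line into matrix form. The following sketch is a simplified stand-in, not the paper's setting: it uses 1D piecewise linear elements on uniform (not graded) nested meshes of $(0,1)$, weighted Jacobi with $\omega = 2/3$ as $R_j$ with $n = 2$ steps, and works with load vectors, for which $Q_{j-1}$ applied to the residual becomes multiplication by the transpose of the prolongation matrix. The number of interior nodes must be $2^k - 1$, and all names are hypothetical.

```python
def apply_A(x, h):
    """1D P1 stiffness matrix (1/h) tridiag(-1, 2, -1) with zero Dirichlet data."""
    n = len(x)
    return [(2.0 * x[i] - (x[i - 1] if i > 0 else 0.0)
             - (x[i + 1] if i < n - 1 else 0.0)) / h for i in range(n)]


def prolong(c):
    """Natural inclusion M_{j-1} -> M_j in nodal coordinates (linear interpolation)."""
    f = [0.0] * (2 * len(c) + 1)
    for k, ck in enumerate(c):
        f[2 * k] += 0.5 * ck
        f[2 * k + 1] += ck
        f[2 * k + 2] += 0.5 * ck
    return f


def restrict(r):
    """Transpose of `prolong`: maps a fine load vector to the coarse one (Q_{j-1})."""
    return [0.5 * r[2 * k] + r[2 * k + 1] + 0.5 * r[2 * k + 2]
            for k in range((len(r) - 1) // 2)]


def apply_B(b, n_smooth=2, omega=2.0 / 3.0):
    """Return B_j b following Algorithm 2.5, for a load vector b."""
    h = 1.0 / (len(b) + 1)
    if len(b) == 1:                       # coarsest level: B_0 = A_0^{-1}, A_0 = 2/h
        return [b[0] * h / 2.0]
    u = [0.0] * len(b)                    # step 1: n smoothing steps from u^0 = 0
    for _ in range(n_smooth):
        r = [bi - ai for bi, ai in zip(b, apply_A(u, h))]
        u = [ui + omega * (h / 2.0) * ri for ui, ri in zip(u, r)]  # D^{-1} = h/2
    r = [bi - ai for bi, ai in zip(b, apply_A(u, h))]
    e = prolong(apply_B(restrict(r), n_smooth, omega))  # step 2: coarse correction
    return [ui + ei for ui, ei in zip(u, e)]


def solve(b, iterations=50):
    """Backslash-cycle iteration u <- u + B_J (b - A_J u)."""
    h = 1.0 / (len(b) + 1)
    u = [0.0] * len(b)
    for _ in range(iterations):
        r = [bi - ai for bi, ai in zip(b, apply_A(u, h))]
        u = [ui + ci for ui, ci in zip(u, apply_B(r))]
    return u


n = 63
h = 1.0 / (n + 1)
u = solve([h] * n)   # load vector of f = 1, since the integral of each hat is h
err = max(abs(ui - (i + 1) * h * (1 - (i + 1) * h) / 2) for i, ui in enumerate(u))
print(err)
```

For $-u'' = 1$ the 1D piecewise linear solution is exact at the nodes, which is why the sketch compares against $x(1-x)/2$ directly.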
Recursive application of Algorithm 2.5 results in an MG V-cycle for which the following identity holds: $I - B_J^vA_J = (I - B_JA_J)^*(I - B_JA_J)$ [24], where $B_J^v$ is the iterator for the MG V-cycle. Direct computation gives the following useful result:
$$\begin{aligned} u_j^n &= (I - R_jA_j)u_j^{n-1} + R_jA_ju_j\\ &= (I - R_jA_j)^2u_j^{n-2} - (I - R_jA_j)^2u_j + u_j\\ &= \cdots = -(I - R_jA_j)^nu_j + u_j,\end{aligned}$$
where $u_j$ is the finite element solution of (2) and $u_j^n$ is the approximation after $n$ iterations of (3) on the $j$th level. Let $T_j = (I - (I - R_jA_j)^n)P_j$ be a linear operator and define $T_0 = P_0$. We have the following identity:
$$(I - B_JA_J)u_J = u_J - u_J^n - e_{J-1} = (I - T_J)u_J - e_{J-1} = (I - B_{J-1}A_{J-1}P_{J-1})(I - T_J)u_J,$$
where, for $B_{J-1} = A_{J-1}^{-1}$, this becomes a two-level method. Recursive application of this identity then yields the error propagation operator of an MG V-cycle:
$$I - B_JA_J = (I - T_0)(I - T_1)\cdots(I - T_J).$$
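The identity $u_j^n = u_j - (I - R_jA_j)^n u_j$ (valid for the starting guess $u_j^0 = 0$) is easy to confirm numerically. In the sketch below, the $3\times 3$ matrix, the Jacobi smoother and the solution vector are arbitrary illustrative choices, and the helper names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def smooth_from_zero(A, R, f, n):
    """n steps of u^k = u^{k-1} + R (f - A u^{k-1}) starting from u^0 = 0."""
    u = [0.0] * len(f)
    for _ in range(n):
        r = [fi - ai for fi, ai in zip(f, matvec(A, u))]
        u = [ui + ci for ui, ci in zip(u, matvec(R, r))]
    return u


def error_propagated(A, R, u, n):
    """Apply (I - R A)^n to u."""
    e = list(u)
    for _ in range(n):
        e = [ei - ci for ei, ci in zip(e, matvec(R, matvec(A, e)))]
    return e


A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
R = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]   # Jacobi, R = D^{-1}
u = [1.0, 2.0, 3.0]        # plays the role of the exact solution u_j
f = matvec(A, u)           # so that A u = f
un = smooth_from_zero(A, R, f, 4)
rhs = [ui - ei for ui, ei in zip(u, error_propagated(A, R, u, 4))]
print(max(abs(x - y) for x, y in zip(un, rhs)))   # should be (close to) zero
```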
To prove the uniform convergence of the MG V-cycle, we thus need to show that
$$\|I - B_J^vA_J\|_a = \|I - B_JA_J\|_a^2 \le c < 1,$$
where $c$ is independent of $J$ and $\|u\|_a^2 = a(u, u) = (Au, u)$.
Associated with each $T_j$, we introduce its symmetrization
$$\bar T_j = T_j + T_j^* - T_j^*T_j,$$
where $T_j^*$ is the adjoint operator of $T_j$ with respect to the inner product $a(\cdot, \cdot)$. By a well-known result found in [25], the following estimate holds:
$$\|I - B_JA_J\|_a^2 = \frac{c_0}{1 + c_0},$$
where
$$c_0 \le \sup_{\|v\|_a = 1}\sum_{j=1}^{J}a\big((\bar T_j^{-1} - I)(P_j - P_{j-1})v,\ (P_j - P_{j-1})v\big). \qquad (4)$$
Now, to prove the uniform convergence of the proposed MG scheme, we must derive a uniform bound on the constant $c_0$.
Although the above presentation is in terms of operators, the matrix representation of the smoothing step (3) is often used in practice. By the matrix representation $\mathbf{R}$ of an operator $R$ on $M_j$, we mean that, with respect to the basis $\{\phi_i\}_{i=1}^{N_j}$ of $M_j$,
$$R(\phi_k) = \sum_{i=1}^{N_j}\mathbf{R}_{i,k}\phi_i,$$
where $\mathbf{R}_{i,k}$ is the $(i, k)$ component of the matrix $\mathbf{R}$. Throughout the paper, we use boldface letters to denote vectors and matrices.
Let $\mathbf{A}_S = \mathbf{D} - \mathbf{L} - \mathbf{U}$ be the stiffness matrix associated with the operator $A_j$, where the matrix $\mathbf{D}$ consists of only the diagonal entries of $\mathbf{A}_S$, while the matrices $-\mathbf{L}$ and $-\mathbf{U}$ are the strictly lower and upper triangular parts of $\mathbf{A}_S$, respectively. Denote by $\mathbf{R}_M$ the corresponding matrix of the smoother $R_j$ on the $j$th level. For example, $\mathbf{R}_M = \mathbf{D}^{-1}$ for the Jacobi method, and $\mathbf{R}_M = (\mathbf{D} - \mathbf{L})^{-1}$ for the Gauss–Seidel method. In addition, let $\mathbf{u}^l$, $\mathbf{u}^{l-1}$, and $\mathbf{f}$ be the vectors containing the coordinates of $u_j^l$, $u_j^{l-1}$, $f_j \in M_j$ in the basis $\{\phi_i\}_{i=1}^{N_j}$, namely $u_j^l = \sum_{i=1}^{N_j}\mathbf{u}_i^l\phi_i$. Then, one smoothing step for solving (2) on a single level $j$ reads, in terms of matrices,
$$\mathbf{u}^l = \mathbf{u}^{l-1} + \mathbf{R}_M(\mathbf{M}\mathbf{f} - \mathbf{A}_S\mathbf{u}^{l-1}), \qquad (5)$$
where $\mathbf{M}$ is the mass matrix, $\mathbf{M}_{i,k} = (\phi_i, \phi_k)$.
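In code, the matrix smoothing step (5) for the two smoothers named above might look as follows. Here `b` stands for the load vector $\mathbf{M}\mathbf{f}$, matrices are dense Python lists for readability, and the function names are hypothetical. The Gauss–Seidel step is written as a forward sweep, which is algebraically the same as adding $(\mathbf{D} - \mathbf{L})^{-1}$ applied to the residual.

```python
def jacobi_step(A, b, u, omega=1.0):
    """u + omega D^{-1} (b - A u): weighted Jacobi, i.e. R_M = omega D^{-1}."""
    n = len(u)
    r = [b[i] - sum(A[i][k] * u[k] for k in range(n)) for i in range(n)]
    return [u[i] + omega * r[i] / A[i][i] for i in range(n)]


def gauss_seidel_step(A, b, u):
    """u + (D - L)^{-1} (b - A u), computed as one forward sweep."""
    n = len(u)
    v = list(u)
    for i in range(n):
        # entries k < i are already updated, entries k > i still hold old values
        s = sum(A[i][k] * v[k] for k in range(n) if k != i)
        v[i] = (b[i] - s) / A[i][i]
    return v


A = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]            # equals A applied to [1, 1, 1]
u_j = [0.0, 0.0, 0.0]
u_gs = [0.0, 0.0, 0.0]
for _ in range(100):
    u_j = jacobi_step(A, b, u_j, omega=2.0 / 3.0)
    u_gs = gauss_seidel_step(A, b, u_gs)
print(u_j, u_gs)
```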
Lemma 2.6
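Let $\mathbf{R}$ be the matrix representation of the smoother $R_j$ in Equation (3). Then,
$$\mathbf{R} = \mathbf{R}_M\mathbf{M}.$$
Hence,
$$R_j(\phi_k) = \sum_{i=1}^{N_j}\mathbf{R}_{i,k}\phi_i = \sum_{i=1}^{N_j}(\mathbf{R}_M\mathbf{M})_{i,k}\phi_i$$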
and
Proof
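Denote by $\mathbf{A}$ the matrix representation of the operator $A_j$. Note that
$$(A_j\phi_i, \phi_k) = \Big(\sum_{m=1}^{N_j}\mathbf{A}_{m,i}\phi_m, \phi_k\Big) = (\nabla\phi_k, \nabla\phi_i) = (\mathbf{A}_S)_{k,i},$$
which indicates $\mathbf{A}_S = \mathbf{M}\mathbf{A}$. Moreover, in terms of matrices and vectors, Equation (3) also reads
$$\sum_{i=1}^{N_j}\mathbf{u}_i^l\phi_i = \sum_{i=1}^{N_j}\mathbf{u}_i^{l-1}\phi_i + \sum_{i=1}^{N_j}\sum_{k=1}^{N_j}\mathbf{R}_{k,i}\mathbf{f}_i\phi_k - \sum_{i=1}^{N_j}\sum_{k=1}^{N_j}\sum_{m=1}^{N_j}\mathbf{R}_{m,k}\mathbf{A}_{k,i}\mathbf{u}_i^{l-1}\phi_m,$$
that is,
$$\mathbf{u}^l = \mathbf{u}^{l-1} + \mathbf{R}(\mathbf{f} - \mathbf{A}\mathbf{u}^{l-1}).$$
Taking into account that Equations (3) and (5) represent the same iteration, we have
$$\mathbf{R}\mathbf{f} = \mathbf{R}_M\mathbf{M}\mathbf{f}.$$
Note that the above equation holds for any $\mathbf{f} \in \mathbb{R}^{N_j}$. Therefore, $\mathbf{R} = \mathbf{R}_M\mathbf{M}$, which completes the proof.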
Next, we derive an estimate for the constant $c_0$ in (4) of Section 2 and then proceed to establish the main convergence theorem of the paper. We begin by proving several lemmas that are needed in the convergence proof. For simplicity, we assume that there is only a single point $S_0 \in \bar\Omega$ at which the solution of Equation (1) has a singularity in $H^2(\Omega)$, and that a nested sequence of graded meshes has been constructed as described in Definition 2.2. The same argument, however, carries over to problems on domains with multiple singularities and to similar refinement strategies.
Denote by $\{T_i^{S_0}\}$ all the initial triangles with the common vertex $S_0$. Recall that the function $\rho$ in the weight equals the distance to $S_0$ on these triangles. Based on the process in Definition 2.2, after $N$ refinements, the region $\cup T_i^{S_0}$ is partitioned into $N + 1$ subdomains (layers) $D_n$, $0 \le n \le N$, whose sizes decrease by the factor $\kappa$ as they approach $S_0$ (see Figure 2). In addition, $\rho(x, y) \simeq \kappa^n$ on $D_n$ for $0 \le n < N$ and $\rho(x, y) \le C\kappa^N$ on $D_N$. Meanwhile, sub-triangles (nested meshes) are generated in these layers $D_n$, $0 \le n \le N$, with corresponding mesh size of order $O(\kappa^n2^{n-N})$.
Figure 2. Initial triangles with vertex $S_0$ (left); layers $D_0$ and $D_1$ after one refinement (right), $\kappa = 0.2$.
Note that $\Omega = (\cup D_n) \cup (\Omega \setminus \cup D_n)$. Let $\partial D_n$ be the boundary of $D_n$. Then, we define a piecewise constant function $r_p(x, y)$ on $\bar\Omega$ as follows:
$$r_p(x, y) = \begin{cases}(1/(2\kappa))^n & \text{on } \bar D_n \setminus \partial D_{n-1} \text{ for } 1 < n \le N,\\ 1 & \text{otherwise,}\end{cases}$$
where $N = J$ is the number of refinements for $\mathcal{T}_J$. Therefore, the restriction of $r_p$ to every $T_i^{S_0} \cap D_n$ is a constant. Recall that $a < 1$ is the parameter for $\kappa$ such that $\kappa = 2^{-1/a}$. Define the weighted inner product with respect to $r_p$:
$$(u, v)_{r_p} = (r_pu, r_pv) = \int_\Omega r_p^2uv.$$
In addition, the above inner product induces the norm
$$\|u\|_{r_p} = (u, u)_{r_p}^{1/2}.$$
Then, the following estimate holds.
Lemma 3.1
$$(u_j - P_{j-1}u_j, u_j - P_{j-1}u_j)_{r_p} \le \frac{c_1}{N_j}a(u_j - P_{j-1}u_j, u_j - P_{j-1}u_j) \quad \forall u_j \in M_j,$$
where $N_j = O(2^{2j})$ is the dimension of $M_j$.
Proof
This lemma can be proved by the duality argument as follows.
Consider the following boundary value problem:
$$-\Delta w = r_p^2(u_j - P_{j-1}u_j) \text{ in } \Omega, \qquad w = 0 \text{ on } \partial_D, \qquad \partial w/\partial n = 0 \text{ on } \partial_N.$$
We note that $w$ is a piecewise linear function on the graded triangulation $\mathcal{T}_j$ that is derived after $j$ refinements. From the results of Theorem 2.3, we conclude
The inequalities above are based on the definitions of $\rho$, $r_p$, and the related norms. Now, since $N_j = O(N_{j-1})$, combining the results above, we have
$$(u_j - P_{j-1}u_j, u_j - P_{j-1}u_j)_{r_p} \le \frac{c_1}{N_j}|u_j - P_{j-1}u_j|_{H^1}^2 = \frac{c_1}{N_j}a(u_j - P_{j-1}u_j, u_j - P_{j-1}u_j).$$
Recall from Lemma 2.6 that the matrix form $\mathbf{R}_M$ and the matrix representation $\mathbf{R}$ of a smoother $R_j$ are different. We then have the following result regarding the smoother $\bar R_j = R_j + R_j^t - R_j^tA_jR_j$ on $M_j$, which is the symmetrization of $R_j$; here $R_j^t$ is the adjoint of $R_j$ with respect to $(\cdot, \cdot)$.
Lemma 3.2
For the subspace smoother $\bar R_j : M_j \to M_j$, we assume that there is a constant $C > 0$ independent of $j$ such that the corresponding matrix form $\bar{\mathbf{R}}_M$ satisfies
$$\mathbf{v}^T\bar{\mathbf{R}}_M\mathbf{v} \ge C\mathbf{v}^T\mathbf{v} \quad \forall\mathbf{v} \in \mathbb{R}^{N_j}$$
on every level $j$, where $N_j$ is the dimension of the subspace $M_j$. Then, there exists $c_2 > 0$, also independent of the level $j$, such that the following estimate holds on each graded mesh $\mathcal{T}_j$:
$$\frac{c_2}{N_j}(\bar R_jv, v) \le (\bar R_jv, \bar R_jv)_{r_p} \quad \forall v \in M_j.$$
Proof
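For any $v = \sum_i\mathbf{v}_i\phi_i \in M_j$, from Lemma 2.6 we have
$$(\bar R_jv, v) = \Big(\sum_m\mathbf{v}_m\sum_k(\bar{\mathbf{R}}_M\mathbf{M})_{k,m}\phi_k, \sum_i\mathbf{v}_i\phi_i\Big) = \mathbf{v}^T\mathbf{M}^T\bar{\mathbf{R}}_M\mathbf{M}\mathbf{v}.$$
On the other hand,
$$(\bar R_jv, \bar R_jv)_{r_p} = \Big(\sum_m\mathbf{v}_m\sum_k(\bar{\mathbf{R}}_M\mathbf{M})_{k,m}\phi_k, \sum_l\mathbf{v}_l\sum_i(\bar{\mathbf{R}}_M\mathbf{M})_{i,l}\phi_i\Big)_{r_p} = \mathbf{v}^T\mathbf{M}^T\bar{\mathbf{R}}_M\tilde{\mathbf{M}}\bar{\mathbf{R}}_M\mathbf{M}\mathbf{v},$$
where $\tilde{\mathbf{M}}$ is the matrix with entries $\tilde{\mathbf{M}}_{i,k} = (r_p\phi_i, r_p\phi_k)$. Note that both $\mathbf{M}$ and $\tilde{\mathbf{M}}$ are symmetric positive definite (SPD). Now, suppose $\operatorname{supp}(\phi_i) \cap D_n \ne \emptyset$, $0 \le n \le j$. Then, on $\operatorname{supp}(\phi_i)$, the mesh size is $O(\kappa^n2^{n-j})$ and $r_p \simeq (1/(2\kappa))^n$, since $\operatorname{supp}(\phi_i)$ is covered by at most two adjacent layers. Thus, all the non-zero elements of $\tilde{\mathbf{M}}$ are positive and $\tilde{\mathbf{M}} \simeq 2^{-2j} \simeq 1/N_j$. To complete the proof, it is sufficient to show that there exists $C > 0$ such that
$$\mathbf{w}^T\bar{\mathbf{R}}_M^{1/2}\tilde{\mathbf{M}}\bar{\mathbf{R}}_M^{1/2}\mathbf{w} \ge (C/N_j)\mathbf{w}^T\mathbf{w},$$
where $\mathbf{w} = \bar{\mathbf{R}}_M^{1/2}\mathbf{M}\mathbf{v}$. From the condition on $\bar{\mathbf{R}}_M$ and the estimates on $\tilde{\mathbf{M}}$, it follows that
$$\mathbf{w}^T\bar{\mathbf{R}}_M^{1/2}\tilde{\mathbf{M}}\bar{\mathbf{R}}_M^{1/2}\mathbf{w} \simeq (1/N_j)\mathbf{w}^T\bar{\mathbf{R}}_M\mathbf{w} \ge (C/N_j)\mathbf{w}^T\mathbf{w}.$$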
Remark 3.3
For our choice of graded meshes, the triangles remain shape-regular, that is, the minimum angles of the triangles are bounded away from 0. Therefore, the stiffness matrix $\mathbf{A}_S$ has a bounded number of non-zero entries per row and each entry is of order $O(1)$. Hence, the maximum eigenvalue of $\mathbf{A}_S$ is bounded. For this reason, standard smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) satisfy the assumption of Lemma 3.2, and $(\mathbf{R}_M)_{i,j} = O(1)$ as well, since they are all built from parts of the matrix $\mathbf{A}_S$. Moreover, if $\mathbf{R}_M$ is SPD and the spectral radius of $\mathbf{R}_M\mathbf{A}_S$ is at most $\omega$ for some $0 < \omega < 1$, then, based on Lemma 2.6,
$$a(R_jA_jv, v) = (A_jR_jA_jv, v) = \mathbf{v}^T\mathbf{A}_S\mathbf{R}_M\mathbf{A}_S\mathbf{v} \le \omega\,a(v, v).$$
The last inequality follows from the similarity of the matrix $\mathbf{A}_S^{1/2}\mathbf{R}_M\mathbf{A}_S^{1/2}$ and the matrix $\mathbf{R}_M\mathbf{A}_S$. Note that the above inequality implies that the spectral radius of $R_jA_j$ is at most $\omega$, since $R_jA_j$ is symmetric with respect to $a(\cdot, \cdot)$.
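The assumption of Lemma 3.2 concerns the matrix form of the symmetrized smoother, $\bar{\mathbf{R}}_M = \mathbf{R}_M + \mathbf{R}_M^T - \mathbf{R}_M^T\mathbf{A}_S\mathbf{R}_M$, which follows from $\bar R_j = R_j + R_j^t - R_j^tA_jR_j$, Lemma 2.6 and $\mathbf{A}_S = \mathbf{M}\mathbf{A}$. The sketch below forms $\bar{\mathbf{R}}_M$ for Gauss–Seidel and weighted Jacobi on a small 1D stiffness stencil and tests positive definiteness by attempting a Cholesky factorization. It only checks positivity for one fixed matrix, not the level-independent constant $C$ asserted in Remark 3.3; the matrix and all names are illustrative assumptions.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def symmetrized(R, A):
    """Matrix form R + R^T - R^T A R of the symmetrized smoother."""
    RtAR = matmul(transpose(R), matmul(A, R))
    n = len(A)
    return [[R[i][j] + R[j][i] - RtAR[i][j] for j in range(n)] for i in range(n)]


def is_spd(S, tol=1e-12):
    """Attempt a Cholesky factorization; success means S is SPD."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def lower_inverse(T):
    """Inverse of a lower-triangular matrix by forward substitution, column by column."""
    n = len(T)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, n):
            rhs = (1.0 if i == j else 0.0) - sum(T[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = rhs / T[i][i]
    return inv


def jacobi_matrix(A, omega):
    """R_M = omega D^{-1}."""
    n = len(A)
    return [[omega / A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]


n = 5
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]
gs = lower_inverse([[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)])
print(is_spd(symmetrized(gs, A)))                        # True  (Gauss-Seidel)
print(is_spd(symmetrized(jacobi_matrix(A, 2 / 3), A)))   # True  (omega = 2/3)
print(is_spd(symmetrized(jacobi_matrix(A, 1.5), A)))     # False (omega too large)
```

For Gauss–Seidel the symmetrized matrix equals $(\mathbf{D}-\mathbf{L})^{-T}\mathbf{D}(\mathbf{D}-\mathbf{L})^{-1}$, which explains why it is always SPD, while weighted Jacobi requires $\omega$ to be small relative to the largest eigenvalue of $\mathbf{D}^{-1}\mathbf{A}_S$.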
We then define the following operators for the MG V-cycle. Recall $T_j$ from Section 2 and let $R_j$ denote a subspace smoother satisfying Lemma 3.2. Recall the symmetrization $\bar R_j$ of $R_j$, and assume that the spectral radius of $\bar R_jA_j$ is at most $\omega$ for some $0 < \omega < 1$. Note that $R_j^t$ is the adjoint of $R_j$ with respect to $(\cdot, \cdot)$ and $T_j^*$ is the adjoint of $T_j$ with respect to $a(\cdot, \cdot)$. With $n$ smoothing steps, where $R_j$ and $R_j^t$ are applied alternatingly, the operators $G_j$ and $G_j^*$ are defined as follows:
$$G_j = I - R_jA_j, \qquad G_j^* = I - R_j^tA_j.$$
Therefore, if we define
$$G_{j,n} = \begin{cases}G_j^*G_j & \text{for even } n,\\ G_jG_j^* & \text{for odd } n,\end{cases}$$
then, since $P_j^2 = P_j$,
$$\bar T_j = T_j + T_j^* - T_j^*T_j = (I - G_{j,n}^n)P_j.$$
Theorem 3.4
On every triangulation $\mathcal{T}_j$, suppose that the smoother on each subspace $\mathcal{M}_j$ satisfies Lemma 3.2. Then, following the algorithm described above, we have
$$
\|I - B_J A_J\|_a^2 = \frac{c_0}{1+c_0} \le \frac{c_1}{c_1 + c_2 n},
$$
where $c_1$ and $c_2$ are the constants from Lemmas 3.1 and 3.2.
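Proof
Recall (4) from Section 2. To estimate the constant $c_0$, we first consider, for any $v \in \mathcal{M}_J$, the decomposition $v = \sum_j v_j$ with
$$
v_j = (P_j - P_{j-1}) v \in \mathcal{M}_j, \qquad N_j (v_j, v_j)_{rp} \le c_1\, a(v_j, v_j),
$$
where the inequality is from Lemma 3.1.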
MULTIGRID METHOD ON GRADED MESHES 303
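For each $j$,
$$
\begin{aligned}
a(\bar T_j^{-1}(I - \bar T_j) v_j, v_j) &= (\bar R_j^{-1} \bar R_j A_j (I - G_{j,n}^n)^{-1} G_{j,n}^n v_j, v_j)\\
&= (\bar R_j^{-1} (I - G_{j,n})(I - G_{j,n}^n)^{-1} G_{j,n}^n v_j, v_j).
\end{aligned}
$$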
Note that $G_{j,n}^k$, $k \le n$, is in fact a polynomial of $\bar R_j A_j$. Therefore, $\bar R_j^{-1/2}(I - G_{j,n}) \bar R_j^{1/2}$, $\bar R_j^{-1/2} G_{j,n}^n \bar R_j^{1/2}$, and $\bar R_j^{-1/2}(I - G_{j,n}^n)^{-1} \bar R_j^{1/2}$ are all polynomials of $\bar R_j^{1/2} A_j \bar R_j^{1/2}$, where $\bar R_j^{-1/2}(I - G_{j,n}^n)^{-1} \bar R_j^{1/2} = \big(\bar R_j^{-1/2}(I - G_{j,n}^n) \bar R_j^{1/2}\big)^{-1}$. Thus, these three operators commute with each other; hence, $\bar R_j^{-1/2}(I - G_{j,n})(I - G_{j,n}^n)^{-1} G_{j,n}^n \bar R_j^{1/2}$ is symmetric, and
$$
(\bar R_j^{-1}(I - G_{j,n})(I - G_{j,n}^n)^{-1} G_{j,n}^n v_j, v_j) \le \frac{1}{n} (\bar R_j^{-1} v_j, v_j) \le \frac{N_j}{c_2 n} (v_j, v_j)_{rp},
$$
where the last inequality is from Lemma 3.2. Moreover,
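$$
\sum_{j=0}^{J} a(\bar T_j^{-1}(I - \bar T_j) v_j, v_j)
\le \sum_{j=0}^{J} \frac{N_j}{c_2 n} (v_j, v_j)_{rp}
\le \frac{c_1}{c_2 n} \sum_{j=0}^{J} a(v_j, v_j)
= \frac{c_1}{c_2 n}\, a(v, v),
$$
where the last equality holds because the components $v_j = (P_j - P_{j-1})v$ are mutually $a$-orthogonal.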
Therefore, $c_0 \le c_1/(c_2 n)$, and consequently the MSC yields the following convergence estimate for the MG $V$-cycle:
$$
\|I - B_J A_J\|_a^2 = \frac{c_0}{1+c_0} \le \frac{c_1}{c_1 + c_2 n},
$$
which completes the proof.
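Two elementary steps are compressed in the proof above: the factor $1/n$ in the first inequality of the eigenvalue chain, and the passage from the bound on $c_0$ to the final estimate. A worked sketch, assuming (as the proof does) that $G_{j,n} = I - \bar R_j A_j$ and $\rho(\bar R_j A_j) \le \omega < 1$, so that every eigenvalue $\lambda$ of the symmetric matrix $\bar R_j^{-1/2} G_{j,n} \bar R_j^{1/2} = I - \bar R_j^{1/2} A_j \bar R_j^{1/2}$ lies in $[1-\omega, 1)$:

```latex
% Step 1: the eigenvalues of \bar R_j^{-1/2}(I - G_{j,n})(I - G_{j,n}^n)^{-1} G_{j,n}^n \bar R_j^{1/2}
% are (1-\lambda)\lambda^n / (1-\lambda^n) with 0 < \lambda < 1, and
\frac{(1-\lambda)\,\lambda^{n}}{1-\lambda^{n}}
  = \frac{\lambda^{n}}{\sum_{k=0}^{n-1}\lambda^{k}}
  \le \frac{\lambda^{n}}{n\,\lambda^{n}} = \frac{1}{n},
  \qquad \text{since } \lambda^{k} \ge \lambda^{n} \text{ for } 0 \le k \le n-1 .

% Substituting w = \bar R_j^{-1/2} v_j then gives
% (\bar R_j^{-1}(I - G_{j,n})(I - G_{j,n}^n)^{-1} G_{j,n}^n v_j, v_j) \le (1/n)(\bar R_j^{-1} v_j, v_j).

% Step 2: x \mapsto x/(1+x) is increasing on [0, \infty), so c_0 \le c_1/(c_2 n) yields
\frac{c_0}{1+c_0}
  \le \frac{c_1/(c_2 n)}{1 + c_1/(c_2 n)}
  = \frac{c_1}{c_1 + c_2 n}.
```

Step 2 also shows why the bound improves with more smoothing steps: for fixed $c_1$ and $c_2$, the right-hand side decreases as $n$ grows, and it does not depend on $J$.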
4. NUMERICAL ILLUSTRATION
This section contains numerical results for the proposed MG $V$-cycle applied to the 2D Poisson equation with a single corner-like singularity. The model test problem we consider here is given by
$$
-\Delta u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega, \tag{6}
$$
where the singularity occurs at the tip of the crack $\{(x, y) : 0 \le x \le 0.5,\ y = 0.5\}$ for $\Omega = (0,1)\times(0,1)$, as in Figure 3.
The MG scheme used to solve (6) is a standard MG $V$-cycle with linear interpolation. The sequence of coarse-level problems defining the MG hierarchy is obtained by re-discretizing (6) on the nested meshes constructed using the GMR strategy described in Section 2. The reported results are for $V(1,1)$-cycles with Gauss–Seidel (GS) as the smoother. The asymptotic convergence factors are computed using 100 $V(1,1)$-cycles applied to the homogeneous problem, starting with an $O(1)$ random initial approximation.
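The measurement protocol described above (homogeneous problem, $O(1)$ random initial guess, 100 $V(1,1)$-cycles) can be sketched as follows. This illustrates the protocol only and does not reproduce Table I: it swaps the 2D graded crack mesh for a uniformly refined 1D model problem, uses Galerkin coarse operators $P^t A P$ (for P1 elements on nested uniform 1D meshes these coincide with re-discretization), and applies one forward GS sweep before and one backward sweep after the coarse-grid correction so that the cycle is symmetric. All function names are hypothetical.

```python
import numpy as np

def stiffness_1d(n):
    """P1 stiffness matrix of -u'' on (0, 1) with n interior nodes, h = 1/(n+1)."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h

def interpolation(nc):
    """Linear interpolation from nc coarse to 2*nc + 1 fine interior nodes."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        i = 2 * j + 1                      # fine-grid index of coarse node j
        P[i - 1, j], P[i, j], P[i + 1, j] = 0.5, 1.0, 0.5
    return P

def build_hierarchy(num_levels):
    """Operators and interpolations, finest first (finest n = 2**num_levels - 1)."""
    n = 2 ** num_levels - 1
    As, Ps = [stiffness_1d(n)], []
    while len(As) < num_levels:
        nc = (n - 1) // 2
        P = interpolation(nc)
        Ps.append(P)
        As.append(P.T @ As[-1] @ P)        # Galerkin coarse operator
        n = nc
    return As, Ps

def gauss_seidel(A, u, f, backward=False):
    """One in-place Gauss-Seidel sweep for A u = f."""
    order = range(len(u) - 1, -1, -1) if backward else range(len(u))
    for i in order:
        u[i] += (f[i] - A[i] @ u) / A[i, i]
    return u

def v_cycle(As, Ps, level, u, f):
    """One V(1,1)-cycle: forward GS, coarse-grid correction, backward GS."""
    A = As[level]
    if level == len(As) - 1:
        return np.linalg.solve(A, f)       # exact solve on the coarsest level
    gauss_seidel(A, u, f)
    r = f - A @ u
    P = Ps[level]
    e_c = v_cycle(As, Ps, level + 1, np.zeros(P.shape[1]), P.T @ r)
    u += P @ e_c
    gauss_seidel(A, u, f, backward=True)
    return u

def asymptotic_factor(num_levels, cycles=100, seed=0):
    """Estimate rho_MG from `cycles` V(1,1)-cycles on the homogeneous problem."""
    As, Ps = build_hierarchy(num_levels)
    rng = np.random.default_rng(seed)
    u = rng.uniform(-1.0, 1.0, As[0].shape[0])   # O(1) random initial guess
    f = np.zeros_like(u)                          # exact solution is 0: u is the error
    ratio = 0.0
    for _ in range(cycles):
        u /= np.linalg.norm(u)             # rescale (the cycle is linear) to avoid underflow
        u = v_cycle(As, Ps, 0, u, f)
        ratio = np.linalg.norm(u)          # error reduction in this cycle
    return ratio

for L in range(2, 7):
    print(L, round(asymptotic_factor(L), 3))
```

Rescaling after every cycle turns the loop into a power iteration on the error propagation operator, so the last ratio approximates its dominant eigenvalue without the error underflowing over 100 cycles.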
The asymptotic convergence factors reported in Table I are consistent with our theoretical estimates: they remain bounded, between 0.40 and 0.56, as the number of refinement levels increases from 2 to 6. To obtain a more complete picture of the overall effectiveness of our MG solver, we also examine storage and work-per-cycle measures. These are usually expressed in terms of operator complexity, defined as the number of non-zero entries stored in the operators on all levels divided by the number of non-zero entries in the finest-level matrix, and grid complexity, defined as the sum of the dimensions of the operators over all levels divided by the dimension of the finest-level operator. The grid and, especially, the operator complexities can be viewed as proportionality constants that indicate how expensive the entire $V$-cycle is compared with performing only the finest-level relaxations of the $V$-cycle. For our test problem, the grid and operator complexities were 1.2 and 1.3, respectively, independent of the number of levels. Given these low grid and operator complexities, the performance of the resulting MG solver applied to problem (6) is comparable to that of standard geometric MG applied to the Poisson equation with full regularity, i.e. without corner-like singularities; for the Poisson equation discretized on uniformly refined grids, standard MG with a GS smoother and linear interpolation yields $\rho_{MG} \approx 0.35$.
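The two complexity measures defined above are simple ratios over the level operators. The sketch below computes them for a list of level matrices ordered from finest to coarsest; the function names are hypothetical, and the usage example uses a uniformly coarsened 1D tridiagonal hierarchy rather than the paper's 2D graded hierarchy, so its values (just under 2) are not comparable with the 1.2 and 1.3 reported above.

```python
import numpy as np

def complexities(operators):
    """Return (operator complexity, grid complexity) for level matrices
    ordered from the finest level (index 0) to the coarsest."""
    nnz = [np.count_nonzero(A) for A in operators]
    dims = [A.shape[0] for A in operators]
    operator_complexity = sum(nnz) / nnz[0]   # non-zeros on all levels / finest non-zeros
    grid_complexity = sum(dims) / dims[0]     # unknowns on all levels / finest unknowns
    return operator_complexity, grid_complexity

def tridiag(n):
    """1D three-point matrix used only as an example level operator."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

levels = [tridiag(2 ** k - 1) for k in range(6, 0, -1)]   # sizes 63, 31, 15, 7, 3, 1
op_c, grid_c = complexities(levels)
print(f"operator complexity = {op_c:.3f}, grid complexity = {grid_c:.3f}")
```

In practice one would count non-zeros of sparse matrices directly rather than of dense arrays; the ratios are the same.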
Figure 3. Crack: initial triangulation (left) and the triangulation after one refinement (right), = 0.2.
Table I. Asymptotic convergence factors ($\rho_{MG}$) for the MG $V(1,1)$-cycle applied to problem (6) with Gauss–Seidel smoother.
Levels              2      3      4      5      6
$\rho_{MG}$ (GS)    0.40   0.53   0.56   0.53   0.50
ACKNOWLEDGEMENTS
We would like to thank Long Chen, Victor Nistor and Jinchao Xu for their useful suggestions and
discussions during the preparation of this manuscript.
The work of the second author was supported in part by the NSF (DMS-0555831). The work of the first and third authors was supported in part by the NSF (DMS-058110) and Lawrence Livermore National Lab (B568399).
REFERENCES
1. Bramble JH. Multigrid Methods. Chapman & Hall, CRC Press: London, Boca Raton, FL, 1993.
2. Hackbusch W. Multi-Grid Methods and Applications. Computational Mathematics. Springer: New York, 1995.
3. Trottenberg U, Oosterlee CW, Schüller A. Multigrid. Academic Press: San Diego, CA, 2001 (With contributions
by A. Brandt, P. Oswald, K. Stüben).
4. Xu J. Iterative methods by space decomposition and subspace correction. SIAM Review 1992; 34(4):581–613.
5. Babuška I, Aziz AK. The Mathematical Foundations of the Finite Element Method with Applications to Partial
Differential Equations. Academic Press: New York, 1972.
6. Băcuţă C, Nistor V, Zikatanov LT. Improving the rate of convergence of ‘high order finite elements’ on polygons
and domains with cusps. Numerische Mathematik 2005; 100(2):165–184.
7. Bourlard M, Dauge M, Lubuma MS, Nicaise S. Coefficients of the singularities for elliptic boundary value
problems on domains with conical points. III. Finite element methods on polygonal domains. SIAM Journal on
Numerical Analysis 1992; 29(1):136–155.
8. Dauge M. Elliptic Boundary Value Problems on Corner Domains. Lecture Notes in Mathematics, vol. 1341.
Springer: Berlin, 1988.
9. Grisvard P. Singularities in Boundary Value Problems. Research Notes in Applied Mathematics, vol. 22. Springer:
New York, 1992.
10. Kellogg RB, Osborn JE. A regularity result for the Stokes problem in a convex polygon. Journal of Functional
Analysis 1976; 21(4):397–431.
11. Kondratiev VA. Boundary value problems for elliptic equations in domains with conical or angular points.
Transactions of the Moscow Mathematical Society 1967; 16:227–313.
12. Kozlov VA, Mazya V, Rossmann J. Elliptic Boundary Value Problems in Domains with Point Singularities.
American Mathematical Society: Rhode Island, 1997.
13. Apel T, Sändig A, Whiteman JR. Graded mesh refinement and error estimates for finite element solutions of
elliptic boundary value problems in non-smooth domains. Mathematical Methods in the Applied Sciences 1996;
19(1):63–85.
14. Babuška I, Kellogg RB, Pitkäranta J. Direct and inverse error estimates for finite elements with mesh refinements.
Numerische Mathematik 1979; 33(4):447–471.
15. Li H, Mazzucato A, Nistor V. On the analysis of the finite element method on general polygonal domains II:
mesh refinements and interpolation estimates. 2007, in preparation.
16. Braess D, Hackbusch W. A new convergence proof for the multigrid method including the V -cycle. SIAM Journal
on Numerical Analysis 1983; 20(5):967–975.
17. Brenner SC. Convergence of the multigrid V -cycle algorithm for second-order boundary value problems without
full elliptic regularity. Mathematics of Computation 2002; 71(238):507–525 (electronic).
18. Bramble JH, Pasciak JE, Wang JP, Xu J. Convergence estimates for multigrid algorithms without regularity
assumptions. Mathematics of Computation 1991; 57(195):23–45.
19. Yserentant H. The convergence of multilevel methods for solving finite-element equations in the presence of
singularities. Mathematics of Computation 1986; 47(176):399–409.
20. Brandt A, McCormick S, Ruge J. Algebraic multigrid (AMG) for sparse matrix equations. Sparsity and its
Applications (Loughborough, 1983). Cambridge University Press: Cambridge, 1985; 257–284.
21. Vassilevski P. Multilevel Block Factorization Preconditioners. Springer: Berlin, 2008.
22. Ciarlet P. The Finite Element Method for Elliptic Problems. Studies in Mathematics and its Applications, vol. 4.
North-Holland: Amsterdam, 1978.
23. Li H, Mazzucato A, Nistor V. On the analysis of the finite element method on general polygonal domains I:
transmission problems and a priori estimates. CCMA Preprint AM319, 2007.
Copyright q 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:291–306
DOI: 10.1002/nla
306 J. J. BRANNICK, H. LI AND L. T. ZIKATANOV
24. Xu J. An introduction to multigrid convergence theory. Iterative Methods in Scientific Computing, Hong Kong,
1995. Springer: Singapore, 1997; 169–241.
25. Xu J, Zikatanov L. The method of alternating projections and the method of subspace corrections in Hilbert
space. Journal of the American Mathematical Society 2002; 15(3):573–597 (electronic).
26. Adams R. Sobolev Spaces. Pure and Applied Mathematics, vol. 65. Academic Press: New York, London, 1975.
27. Ammann B, Nistor V. Weighted sobolev spaces and regularity for polyhedral domains. Preprint, 2005.
28. Apel T, Schöberl J. Multigrid methods for anisotropic edge refinement. SIAM Journal on Numerical Analysis
2002; 40(5):1993–2006 (electronic).
29. Băcuţă C, Nistor V, Zikatanov LT. Regularity and well posedness for the Laplace operator on polyhedral domains.
IMA Preprint, 2004.
30. Bramble JH, Pasciak JE. New convergence estimates for multigrid algorithms. Mathematics of Computation 1987;
49(180):311–329.
31. Bramble JH, Xu J. Some estimates for a weighted L 2 projection. Mathematics of Computation 1991; 56(194):
463–476.
32. Bramble JH, Zhang X. Uniform convergence of the multigrid V -cycle for an anisotropic problem. Mathematics
of Computation 2001; 70(234):453–470.
33. Brenner S, Scott LR. The Mathematical Theory of Finite Element Methods. Texts in Applied Mathematics,
vol. 15. Springer: New York, 1994.
34. Brenner SC. Multigrid methods for the computation of singular solutions and stress intensity factors. I. Corner
singularities. Mathematics of Computation 1999; 68(226):559–583.
35. Brenner SC, Sung L. Multigrid methods for the computation of singular solutions and stress intensity factors. II.
Crack singularities. BIT 1997; 37(3):623–643 (Direct methods, linear algebra in optimization, iterative methods,
Toulouse, 1995/1996).
36. Brenner SC, Sung L. Multigrid methods for the computation of singular solutions and stress intensity factors. III.
Interface singularities. Computer Methods in Applied Mechanics and Engineering 2003; 192(41–42):4687–4702.
37. Wu H, Chen Z. Uniform convergence of multigrid v-cycle on adaptively refined finite element meshes for second
order elliptic problems. Science in China 2006; 49:1405–1429.
38. Yosida K. Functional Analysis (5th edn). A Series of Comprehensive Studies in Mathematics, vol. 123. Springer:
New York, 1978.
39. Yserentant H. On the convergence of multilevel methods for strongly nonuniform families of grids and any
number of smoothing steps per level. Computing 1983; 30(4):305–313.
40. Yserentant H. Old and new convergence proofs for multigrid methods. Acta Numerica, 1993. Cambridge University
Press: Cambridge, 1993; 285–326.
Copyright q 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:291–306
DOI: 10.1002/nla