
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS

Numer. Linear Algebra Appl. 2008; 15:85–87


Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.586

Editorial

Multigrid Methods
SUMMARY
This special issue contains papers from the Thirteenth Copper Mountain Conference on Multigrid Methods,
held in the Colorado Rocky Mountains on March 19–23, 2007, co-chaired by Van Henson and Joel
Dendy. The papers address a variety of applications and cover a breadth of topics, ranging from theory
to high-performance computing. Copyright © 2008 John Wiley & Sons, Ltd.

KEY WORDS: multigrid; image processing; adaptive refinement; domain decomposition; Karhunen–
Loève expansion; eigensolver; Hodge decomposition

The First Copper Mountain Conference on Multigrid Methods was organized in 1983 by Steve
McCormick, who persevered to chair nine more in this biennial series before handing over the reins
in 2003. Today, the conference is widely regarded as one of the premier international conferences
on multigrid methods. In 1990, it was joined by the equally successful conference on iterative
methods, chaired by Tom Manteuffel. The 2007 multigrid meeting was co-chaired by the now
three-time veterans Van Henson and Joel Dendy.
The conference began with three tutorial sessions given by Van Henson and Craig Douglas. The
sessions covered multigrid basics as well as more advanced topics such as nonlinear multigrid and
algebraic multigrid (AMG). The remaining five days of the conference were organized around a
series of 25-min talks, allowing ample time for individual research discussions with colleagues.
The student paper competition produced three winners, Hengguang Li (Penn State University),
Christian Mense (Technical University of Bonn), and Hisham Zubair (University of Delft), who
presented their papers in the student session.
This special issue contains 10 papers from the Thirteenth Copper Mountain Conference on
Multigrid Methods, held in the Colorado Rocky Mountains on March 19–23, 2007. The papers
address a variety of applications and cover a breadth of topics, ranging from theory to high-
performance computing.
De Sterck et al. [1] explore two efficiency-based refinement strategies for the adaptive finite
element solution of partial differential equations (PDEs). The goal is to reach a pre-specified bound
on the global discretization error with a minimal amount of work. The methods described require a
multigrid method that is optimal on adaptive grids with potentially higher-order elements.
De Sterck et al. [2] introduce long-range interpolation strategies for AMG. The resulting AMG
methods exhibit dramatic reductions in complexity costs on parallel computers while maintaining
near-optimal multigrid convergence properties.
Rosseel et al. [3] describe an AMG method for solving stochastic PDEs. The stochastic finite
element method is used to transform the problem to a large system of coupled PDEs, and the
AMG method is used to solve the system.
Bell and Olson [4] propose a general AMG approach for the solution of discrete k-form
Laplacians. The method uses an aggregation approach and maintains commutativity of the coarse
and fine de Rham complexes.


Stürmer et al. [5] introduce a fast multigrid solver for applications in image processing, including
image denoising and non-rigid diffusion-based image registration. The solver utilizes architecture-
aware optimizations and is compared with solvers based on fast Fourier transforms.
Köstler et al. [6] develop a geometric multigrid solver for optical flow and image registration
problems. The collective pointwise smoothers used are analyzed with Fourier analysis, and the
method is applied to synthetic and real world images.
Michelini and Coyle [7] introduce an alternative to classical local Fourier analysis (LFA) as a
tool for designing intergrid transfer operators in multigrid methods. A harmonic aliasing property
is introduced and the approach is compared and contrasted with LFA.
Brezina et al. [8] introduce an eigensolver based on the smoothed aggregation (SA) method
that produces an approximation to the minimal eigenvector of the system. The ultimate aim of the
work is to improve the so-called adaptive SA method, which has been shown to be a highly robust
solver.
Zhu [9] derives convergence theory for overlapping domain decomposition methods for second-
order elliptic equations with large jumps in coefficients. It is shown that the convergence rate is
nearly uniform with respect to the jumps and mesh size.
Brannick et al. [10] analyze a multigrid V-cycle scheme for solving the discretized 2D Poisson
equation with corner singularities. The method is proven to be uniformly convergent for finite
element discretizations of the Poisson equation on graded meshes, and supporting numerical
experiments are supplied.
The 2007 conference was held in cooperation with the Society for Industrial and Applied Math-
ematics and sponsored by the Lawrence Livermore and Los Alamos National Laboratories, Front
Range Scientific Computation, Inc., the Department of Energy, the National Science Foundation,
and IBM Corporation. The Program Committee members for the conference were Susanne Brenner,
Craig Douglas, Robert Falgout, Jim Jones, Kirk Jordan, Tom Manteuffel, Steve McCormick, David
Moulton, Kees Oosterlee, Joseph Pasciak, Ulrich Rüde, John Ruge, Klaus Stüben, Olof Widlund,
Ulrike Yang, Irad Yavneh, and Ludmil Zikatanov. The Program Committee served as Guest Editors
for the special issue.
We thank the editors of Numerical Linear Algebra with Applications for hosting this special issue,
especially Panayot Vassilevski, for his invaluable help and guidance. This work was performed
under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory
under Contract DE-AC52-07NA27344.

REFERENCES
1. De Sterck H, Manteuffel T, McCormick S, Nolting J, Ruge J, Tang L. Efficiency-based h- and hp-refinement
strategies for finite element methods. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.567.
2. De Sterck H, Falgout RD, Nolting JW, Yang UM. Distance-two interpolation for parallel algebraic multigrid.
Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.559.
3. Rosseel E, Boonen T, Vandewalle S. Algebraic multigrid for stationary and time-dependent partial differential
equations with stochastic coefficients. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.568.
4. Bell N, Olson LN. Algebraic multigrid for k-form Laplacians. Numerical Linear Algebra with Applications
2008; DOI: 10.1002/nla.577.
5. Stürmer M, Köstler H, Rüde U. A fast full multigrid solver for applications in image processing. Numerical
Linear Algebra with Applications 2008; DOI: 10.1002/nla.563.
6. Köstler H, Ruhnau K, Wienands R. Multigrid solution of the optical flow system using a combined diffusion-
and curvature-based regularizer. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.576.
7. Michelini PN, Coyle EJ. A semi-algebraic approach that enables the design of inter-grid operators to optimize
multigrid convergence. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.579.


8. Brezina M, Manteuffel T, McCormick S, Ruge J, Sanders G, Vassilevski P. A generalized eigensolver based on smoothed aggregation (GES-SA) for initializing smoothed aggregation (SA) multigrid. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.575.
9. Zhu Y. Domain decomposition preconditioners for elliptic equations with jump coefficients. Numerical Linear
Algebra with Applications 2008; DOI: 10.1002/nla.566.
10. Brannick JJ, Li H, Zikatanov LT. Uniform convergence of the multigrid V -cycle on graded meshes for corner
singularities. Numerical Linear Algebra with Applications 2008; DOI: 10.1002/nla.574.

ROBERT D. FALGOUT
GUEST EDITOR
Center for Applied Scientific Computing
Lawrence Livermore National Laboratory
Livermore, CA, U.S.A.

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:89–114
Published online 17 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.567

Efficiency-based h- and hp-refinement strategies for finite element methods

H. De Sterck1,∗,†, T. Manteuffel2, S. McCormick2, J. Nolting2, J. Ruge2 and L. Tang1

1 Department of Applied Mathematics, University of Waterloo, Waterloo, Ont., Canada
2 Department of Applied Mathematics, University of Colorado at Boulder, Boulder, CO, U.S.A.

∗ Correspondence to: H. De Sterck, Department of Applied Mathematics, University of Waterloo, Waterloo, Ont., Canada.
† E-mail: hdesterck@uwaterloo.ca

SUMMARY
Two efficiency-based grid refinement strategies are investigated for adaptive finite element solution of
partial differential equations. In each refinement step, the elements are ordered in terms of decreasing
local error, and the optimal fraction of elements to be refined is determined based on efficiency measures
that take both error reduction and work into account. The goal is to reach a pre-specified bound on
the global error with a minimal amount of work. Two efficiency measures are discussed, ‘work times
error’ and ‘accuracy per computational cost’. The resulting refinement strategies are first compared for
a one-dimensional (1D) model problem that may have a singularity. Modified versions of the efficiency
strategies are proposed for the singular case, and the resulting adaptive methods are compared with a
threshold-based refinement strategy. Next, the efficiency strategies are applied to the case of hp-refinement
for the 1D model problem. The use of the efficiency-based refinement strategies is then explored for
problems with spatial dimension greater than one. The ‘work times error’ strategy is inefficient when the
spatial dimension, d, is larger than the finite element order, p, but the ‘accuracy per computational cost’
strategy provides an efficient refinement mechanism for any combination of d and p. Copyright © 2008 John Wiley & Sons, Ltd.

Received 19 April 2007; Accepted 1 November 2007

KEY WORDS: adaptive refinement; finite element methods; hp-refinement

1. INTRODUCTION

Adaptive finite element methods are being used extensively as powerful tools for approximating
solutions of partial differential equations (PDEs) in a variety of application fields, see, e.g. [1–3].
This paper investigates the behavior of two efficiency-based grid refinement strategies for adaptive
finite element solution of PDEs. It is assumed that a sharp, easily computed local a posteriori
error estimator is available for the finite element method. In each refinement step, the elements
are ordered in terms of decreasing local error, and the optimal fraction of elements to be refined
in the current step is determined based on efficiency measures that take both error reduction
and work into account. The goal is to reach a pre-specified bound on the global error with a
minimal amount of work. It is assumed that optimal solvers are used for the discrete linear
systems and that the computational work for solving these systems is, thus, proportional to the
number of degrees of freedom (DOF). Two efficiency measures are discussed. The first efficiency
measure is ‘work times error’ efficiency (WEE), which was originally proposed in [4]. A second
measure proposed in this paper is called ‘accuracy per computational cost’ efficiency (ACE). In
the first part of the paper, the performance of the two measures is compared for a standard one-
dimensional (1D) model problem with solution $x^\alpha$, which may exhibit a singularity at the origin,
depending on the value of the parameter $\alpha$. The accuracy of the resulting grid is compared with
the asymptotically optimal ‘radical grid’ [3, 5]. Modified versions of the efficiency strategies are
proposed for the singular case, and the resulting adaptive methods are compared with a threshold-
based refinement strategy. The efficiency strategies are also applied to the hp-refinement case
for the 1D model problem, and the results are compared with the ‘optimal geometric grid’ for
hp-refinement that was derived in [5]. In the last part of the paper, the use of the efficiency-
based refinement strategies is explored for problems with spatial dimension d>1. The ‘work times
error’ strategy turns out to be inefficient when the spatial dimension, d, is larger than the finite
element order, p, but the ‘accuracy per computational cost’ strategy provides an efficient refinement
mechanism for any combination of d and p. This is illustrated for a model problem in two
dimensions (2D).
This paper is organized as follows. In the following section, the efficiency-based h-refinement
strategies are described, along with the notation used in this paper, the model problem, and
assumptions on the PDE problems, finite element methods, error estimators, and linear solvers
considered. The performance of the WEE and ACE refinement strategies for the 1D model problem
is discussed in Section 3. Modified WEE and ACE refinement strategies for the singular case
are considered in Section 4. In Section 5, efficiency-based hp-refinement strategies are discussed
and illustrated for the 1D test problem. Section 6 describes how the efficiency-based refinement
strategies can be applied for 2D problems. Throughout the paper, numerical tests illustrate the
performance of the proposed methods. Smooth and singular 1D model problems are introduced in
Section 2.2, and the performance of the proposed h- and hp-refinement strategies in 1D is discussed
in Sections 3–5. A smooth 2D test problem is proposed in Section 6.2, and 2D h-refinement results
are discussed in Section 6.3. Conclusions are formulated in Section 7.

2. EFFICIENCY-BASED h-REFINEMENT STRATEGIES

2.1. Assumptions on PDE problem, error estimate, refinement process, and linear solver
Consider a PDE expressed abstractly as

$Lu = f$ in $\Omega \subset \mathbb{R}^d$ (1)

with appropriate boundary conditions and solution space V . Assume that continuity and coer-
civity bounds for the corresponding bilinear form can be verified in some suitable norm. Let
$\mathcal{T}_h$ be a regular partition of the domain, $\Omega$, into finite elements [3, 6], i.e. $\bar{\Omega} = \bigcup_{\tau \in \mathcal{T}_h} \bar{\tau}$, with
$h = \max\{\operatorname{diam}(\tau) : \tau \in \mathcal{T}_h\}$. In this paper we assume, for simplicity, that the elements are squares in
2D and cubes in three dimensions (3D). Let $V_h$ be a finite-dimensional subspace of $V$ and $u_h \in V_h$
a finite element approximation such that the following error estimate holds:

$\|u - u_h\|_{H^m(\Omega)} \le C h^{s-m} \|u\|_{H^s(\Omega)}$ (2)

where $0 \le m < s < p+1$, $m$ and $p$ are integers and $s$ is a real number. Here, $p$ is the polynomial
order of the finite element method. Furthermore, assume that we obtain a sharp a posteriori error
estimate $E(u_h, f)$ that is equivalent to $\|u - u_h\|_{H^m(\Omega)}$. The associated error functional is given by
$F(u_h, f) = E^2(u_h, f)$. For example, the $L^2$ functional is a natural a posteriori error estimate for
first-order system least squares (FOSLS) finite element methods, and equivalence to the $H^1$ norm
has been proved for several relevant second-order PDE systems of elliptic type [4, 7–9]. The local
value of the error, $E$, on element $\tau_j$ is denoted by $\epsilon_j$.
Consider an adaptive hp-refinement process of the following form. The refinement process
starts on a coarse grid with uniform element size h and order p = 1 (level 0) and proceeds
through levels $\ell = 1, 2, \ldots$, until the error measure, $E_\ell(u_h, f)$, has a value less than a given
bound. In each step, some elements may be refined in h by splitting them into $2^d$ sub-elements,
and some elements may be refined in p by doubling the element order. The decision of which
elements to refine is based on the information provided by the local error estimator, and by
heuristics that may take into account predicted error reduction and work. In particular, we consider
strategies where the elements are ordered in terms of decreasing local error, such that elements
with larger error are considered for refinement first. Standard threshold-based approaches then
may refine, for example, a fixed fraction of the elements in every step or a fixed fraction of
the total error functional. Let the work needed to solve the discrete linear system on level $\ell$ be
given by $W_\ell$. Our goal is to reach a pre-specified bound on the global error, $E_L(u_h, f)$, with a
minimal amount of total work, $\sum_{\ell=1}^{L} W_\ell$. Finding this optimal grid sequence may be difficult,
even if we restrict the process to h-refinement alone. Hence, we turn to seeking nearly optimal
solutions by using heuristics of greedy type. We consider refinement heuristics that determine the
fraction of elements to be refined based on optimizing an efficiency measure in every step. We
expect that a desirable grid sequence needs to be a high accuracy sequence, i.e. a grid sequence
for which the error, $E_\ell(N_\ell)$, decreases with nearly optimal order as a function of the number
of DOF, $N_\ell$, on grid level $\ell$. Note that our strategy also results in an approximate solution to
the following problem: find a mesh with a fixed number of DOF that minimizes the error. To
this end, one can simply stop the above described process when the specified number of DOF is
reached.
We allow the domain to contain singularities, i.e. points or lines in whose neighborhood the
full convergence order of the finite element method cannot be attained due to lack of smoothness
of the solution. For simplicity, assume that those singular points or lines can be located only at
coarse-level grid points or grid lines and that their power and location are known. This includes
the case where the singularities occur at the boundaries of the simulation domain. If the location
and strength of the singularities are not known in advance, they can be estimated by monitoring
reduction rates of local error functionals during a few steps of initial uniform refinement.
It is assumed that optimal solvers, e.g. multigrid, are used for the discrete linear systems. The
computational work for solving these systems is, thus, assumed to be a fixed constant times the
number of DOF: $W_\ell = c N_\ell$.


2.2. 1D model problem and finite element method


In the first part of this paper, we study the performance of the proposed efficiency-based refinement
strategies for a standard model problem in 1D [3, 5]:

$u'' = \alpha(\alpha - 1)x^{\alpha-2}$, $u(0) = 0$, $u(1) = 1$ (3)

with exact solution given by $u = x^\alpha$. While the efficiency-based refinement strategies can be applied
to various types of finite element methods and associated error estimates, we choose to illustrate
the refinement strategies for model problem (3) using standard Galerkin finite element methods of
order $p$, with the error estimated by the $H^1$ seminorm of the actual error, $e = u - u_h$, i.e. $F(u_h, f) =
\|u' - u_h'\|^2_{L^2(\Omega)}$ and $\epsilon_j^2 = \|u' - u_h'\|^2_{L^2(\tau_j)}$. These are equivalent to the $H^1$ norm, since it turns out that
$e(x_i) = 0$ at each grid point for our model problem [3, 5]. Note that $u \in H^{1+\alpha-(1/2)-\epsilon}((0, 1))$ for
any $\epsilon > 0$. If we choose $\frac{1}{2} < \alpha < \frac{3}{2}$ such that $u \in H^1((0, 1))$ but $u \notin H^2((0, 1))$, then there is an $x^\alpha$-type
singularity at x = 0. We choose this model problem and this error estimator because asymptotically
optimal h- and hp-finite element grids have been developed for them [3, 5], which can be used as
a point of comparison for the refinement strategies to be presented in this paper. In addition, it
turns out that the finite element approximations can be obtained easily, namely, by interpolation
for $p = 1$ and by integrating a truncated Legendre expansion of $u'(x)$ for $p > 1$. The refinement
strategies presented in this paper can be equally applied to other finite element methods, as is
illustrated in the second part of the paper, where we present results for a 2D problem using the
FOSLS finite element method [4, 8].
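To make this construction concrete, here is a minimal Python sketch (ours, for illustration only; the quadrature order is an arbitrary choice) of how the local error functional $\epsilon_j^2$ can be evaluated for model problem (3) from a truncated Legendre expansion of $u'$ on one element:

import numpy as np

def element_flux_error(alpha, a, b, p, nquad=50):
    # epsilon_j^2 = ||u' - u_h'||_{L2(a,b)}^2 for u = x**alpha, with u_h' the
    # degree-(p-1) truncated Legendre expansion of u' on the element [a, b].
    x, w = np.polynomial.legendre.leggauss(nquad)   # Gauss nodes/weights on [-1, 1]
    t = 0.5 * (b - a) * x + 0.5 * (a + b)           # map nodes to [a, b]
    du = alpha * t ** (alpha - 1.0)                 # u'(t) for u = t**alpha
    P = np.polynomial.legendre.legvander(x, p - 1)  # P_0, ..., P_{p-1} at the nodes
    k = np.arange(p)
    c = (2.0 * k + 1.0) / 2.0 * (P * (w * du)[:, None]).sum(axis=0)  # Legendre coefficients
    resid = du - P @ c                              # pointwise error in u'
    return 0.5 * (b - a) * np.sum(w * resid ** 2)   # transform the L2 norm back to [a, b]

For $p = 1$ this reduces to the best constant approximation of $u'$, i.e. the derivative of the linear interpolant, consistent with the interpolation remark above.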

2.3. ‘WEE’ and ‘ACE’ strategies


On each level, order the elements such that the local error, $\epsilon_j$, satisfies $\epsilon_1 \ge \cdots \ge \epsilon_N$. With $r \in (0, 1]$
denoting the to-be-determined fraction of elements that will be refined, let $f(r) \in [0, 1]$ be the
fraction of the total error functional in the refinement region, $\gamma(r) \in [0, 1]$ the predicted functional
reduction, and $\eta(r) \in [1, 2]$ the ratio of the number of DOF on level $\ell+1$ and level $\ell$, i.e. $N_{\ell+1} =
\eta(r) N_\ell$. The first refinement strategy, WEE, was initially proposed in [4]. Here, the fraction, $r$, of
elements to be refined on the current level is determined by minimizing the following efficiency
measure:

$\text{work} \times \text{error reduction} = \eta(r)\sqrt{\gamma(r)}$ (4)

i.e.

$r_{\text{opt}} = \arg\min_{r \in (0,1]} \eta(r)\sqrt{\gamma(r)}$ (5)

The motivation for this heuristic is as follows: more work on the current level is justified when it
results in increased error reduction that offsets the extra work. While this choice does not guarantee
that a globally optimal grid sequence is obtained, this local optimization in each step results in an
overall strategy of greedy type, which can be expected to lead to a reasonable approximation to
the optimal grid sequence.
We also propose a second strategy, ACE. We define the predicted effective functional reduction
factor

$\gamma(r)_{\text{eff}} = \gamma(r)^{1/\eta(r)}$ (6)

The fraction, $r$, of elements to be refined on the current level is determined by minimizing this
effective reduction factor, which is the same as minimizing $\log(\gamma(r)_{\text{eff}})$, i.e.

$r_{\text{opt}} = \arg\min_{r \in (0,1]} \frac{\log(\gamma(r))}{\eta(r)}$ (7)

The effective functional reduction factor, $\gamma(r)_{\text{eff}}$, measures the functional reduction per unit work.
Indeed, compare two hypothetical error-reducing processes with functional reduction factors $\gamma_1$ and
$\gamma_2$, and work proportional to $\eta_1$ and $\eta_2$. Assume that process 2 requires double the work of process
1, $\eta_2 = 2\eta_1$. Then the two processes would be equally effective when $\gamma_2 = \gamma_1^2$, because process 1
could be applied twice to obtain the same error reduction as process 2, using the same total amount
of work as process 2. Minimizing the effective functional reduction in every step, thus, chooses
the fraction, $r$, of elements to be refined by locally minimizing the functional reduction per unit
work.
Both the strategies of minimizing work times error reduction and minimizing the effective
functional reduction factor are ways for optimizing the efficiency of the refinement process at each
level. Hence, we call the two proposed strategies efficiency-based refinement strategies.

2.4. Error and work estimates for h-refinement in 1D


The predicted functional reduction ratio, $\gamma(r)$, and element growth ratio, $\eta(r)$, can be determined
as follows for the case of h-refinement in 1D with fixed finite element order $p$.
The element growth ratio, $\eta(r)$, can be determined easily. We have $N_\ell$ elements on level $\ell$. Of
these, $r N_\ell$ are refined into two new elements each, while $(1-r)N_\ell$ elements are left unrefined.
Thus, the number of elements on level $\ell+1$ is $N_{\ell+1} = (1-r)N_\ell + 2r N_\ell = (1+r)N_\ell$. This yields

$\eta(r) = 1 + r$ (8)

The predicted functional reduction factor, $\gamma(r)$, depends on the error estimate and the smoothness
of the solution. As mentioned above, we consider the case that the error estimate is equivalent to
the $H^1$ norm of $u - u_h$, i.e. $F(u_h, f) \approx \|u - u_h\|^2_{H^1(\Omega)}$ and $\epsilon_j^2 \approx \|u - u_h\|^2_{H^1(\tau_j)}$. The error has the
following asymptotic behavior [6].
For elements $\tau_j$ in which the solution is smooth (at least in $H^{p+1}(\tau_j)$ if order $p$ elements are
used), we have

$\epsilon_j^2 \approx \|u_h - u\|^2_{H^1(\tau_j)} \le C h_j^{2p} \|u\|^2_{H^{p+1}(\tau_j)} \le C M_{p+1} h_j^{2p} h_j$ (9)

Here, we can take $M_{p+1} = \sum_{i=0}^{p+1} \|u^{(i)}\|^2_{\infty,\tau_j}$, such that $\|u\|^2_{H^{p+1}(\tau_j)} \le M_{p+1} h_j$. If $\tau_j$ is split into
two equal parts, we have two new elements, $\tau_{j,1}$ and $\tau_{j,2}$, and we can assume that

$\dfrac{\epsilon_{j,1}^2 + \epsilon_{j,2}^2}{\epsilon_j^2} \approx \left(\dfrac{1}{2}\right)^{2p}$ (10)


However, if $u$ is less smooth in some element $\tau_j$, i.e. if we can assume only that $u \in H^s(\tau_j)$
with $s \in \mathbb{R}$, $s < p+1$, then we have

$\epsilon_j^2 \le C h_j^{2(s-1)} \|u\|^2_{s,\tau_j}$ (11)

For simplicity, we consider only the highly singular case here, for which $s \ll p+1$. If, again, $\tau_j$ is
split into two, assuming element $\tau_{j,1}$ contains the singularity, then $\epsilon_{j,1} \gg \epsilon_{j,2}$ and $\|u\|_{s,\tau_{j,1}} \approx \|u\|_{s,\tau_j}$.
We then obtain

$\dfrac{\epsilon_{j,1}^2 + \epsilon_{j,2}^2}{\epsilon_j^2} \approx \dfrac{\epsilon_{j,1}^2}{\epsilon_j^2} \approx \left(\dfrac{1}{2}\right)^{2(s-1)}$ (12)

Suppose the solution is sufficiently smooth in the whole domain. Then the predicted functional
reduction factor, $\gamma(r)$, can be obtained as follows. We apply (10) to the elements that are refined.
A fraction, $1 - f(r)$, of the total error functional lies in elements that do not get refined; hence, we assume that this part of the error is not
reduced. This results in

$\gamma(r) = 1 - f(r) + \left(\frac{1}{2}\right)^{2p} f(r)$ (13)

It is cumbersome to give a general expression for the singular case. However, assuming that we
know the power and location of the singularities in advance, one can easily compute $\gamma(r)$ using
(10) and (12).
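To make the optimization in (5) and (7) concrete, the following minimal Python sketch (ours; names are illustrative, and the WEE objective is written as $\eta(r)\sqrt{\gamma(r)}$, consistent with Section 6.1) selects $r_{\text{opt}}$ over the discrete candidates $r = k/N$ using (8) and (13):

import numpy as np

def optimal_fraction(eps2_sorted, p, strategy="ACE"):
    # Fraction r_opt of elements to refine (smooth 1D case), chosen over the
    # discrete candidates r = k/N. eps2_sorted holds the local error
    # functionals eps_j^2, sorted in decreasing order.
    eps2 = np.asarray(eps2_sorted, dtype=float)
    N = len(eps2)
    f = np.cumsum(eps2) / eps2.sum()      # f(k/N): error fraction in the k largest elements
    r = np.arange(1, N + 1) / N           # candidate fractions
    eta = 1.0 + r                         # DOF growth ratio, eq. (8)
    gamma = 1.0 - f + 0.25 ** p * f       # predicted functional reduction, eq. (13)
    if strategy == "WEE":
        objective = eta * np.sqrt(gamma)  # work times error reduction, eqs. (4)-(5)
    else:
        objective = np.log(gamma) / eta   # ACE criterion, eq. (7)
    return r[np.argmin(objective)]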

3. PERFORMANCE OF THE WEE AND ACE h-REFINEMENT STRATEGIES IN 1D

3.1. Performance of WEE and ACE for smooth solutions


We apply the WEE and ACE strategies to our 1D model problem (3) with $p = 1$. On each level
$\ell$, each element is allowed to be refined at most once. We first consider the nonsingular case and
choose $\alpha > \frac{3}{2}$ such that $u \in H^2((0, 1))$. It follows that the predicted functional reduction factor, $\gamma(r)$,
is given by

$\gamma(r) = 1 - \frac{3}{4} f(r)$ (14)

Note that, for a given error bound, our ultimate goal is to choose a grid sequence that minimizes
the total work, $\sum_{\ell=1}^{L} W_\ell$, which is the same as minimizing $\sum_{\ell=1}^{L} N_\ell$, based on our assumption
that the work is proportional to $N_\ell$. For a given error bound, the number of elements on the final
grid, $N_L$, is determined by the convergence rate of the global error w.r.t. the DOF, which in fact is
determined by the refinement strategy. For our model problem, it has been shown in [5] that the
rate of convergence is never better than $(Np)^{-p}$, where $N$ is the number of elements and $p$ is the
degree of the polynomial.


Theorem 1 (Gui and Babuška [5])
Let $E = (\sum_i \epsilon_i^2)^{1/2}$. Then there is a constant, $C = C(\alpha, p) > 0$, such that, for any grid $\{0 = x_0 < x_1 < \cdots < x_N = 1\}$,

$E \ge C (Np)^{-p}$ (15)

For our example problem, an asymptotically optimal final grid, called a radical grid, is described
in [3, 5]:

$x_j = (j/N)^{(p+1/2)/(\alpha-1/2)}, \quad j = 0, \ldots, N$ (16)

This grid is optimal in the sense that, in the limit of large N , it results in the smallest error as
a function of the number of DOF. If the WEE or the ACE strategy results in a grid sequence
with approximately optimal convergence rate of the global error w.r.t. DOF, then the number of
elements on the final grid must be close to the optimal number of elements, which depends only
on the given error bound. Because we wish to minimize work, it follows that, among the methods
with approximately optimal convergence rate, the methods for which the sequence $\{N_\ell\}$ increases
fast are preferable. Large refinements are, thus, advantageous.
We compare the numerical results of the WEE and ACE strategies, and the radical grid, for $\alpha = 2.1$
and $p = 1$ in Figures 1–6. In the numerical results, we carry out the refinement process until
$E_L(u_h, f) \le 2\mathrm{e}{-5}$ on the final grid level $L$.
From Figure 1, it can be observed that both strategies result in a highly accurate grid sequence.
Thus, for a given error bound, the difference in the number of elements on the final grid is very
small. This can be verified in Figure 2. Figures 3 and 4 show that the ACE strategy is slightly
more efficient than the WEE strategy for our model problem in the smooth case. There are two
small refinements in the WEE refinement process, while there are no small refinements for the
ACE strategy. It follows that for a given error bound on the final grid, the WEE strategy may
Figure 1. Error versus DOF, $\alpha = 2.1$ (no singularity), $p = 1$.


Figure 2. Local error functional, $\epsilon_i^2$, versus grid location on the final grid, $\alpha = 2.1$ (no singularity), $p = 1$: (a) WEE: $N_L = 32\,741$, $E_L = 1.859\mathrm{e}{-5}$, $L = 18$, total work $= 102\,313$ and (b) ACE: $N_L = 32\,760$, $E_L = 1.858\mathrm{e}{-5}$, $L = 16$, total work $= 65\,520$.

Figure 3. Refined fraction of error functional, $f(r_{\text{opt}})$, versus level, $\ell$, and refined fraction of elements, $r_{\text{opt}}$, versus level, $\ell$, $\alpha = 2.1$ (no singularity), $p = 1$: (a) WEE and (b) ACE.

require slightly more total work than the ACE strategy, see Figure 5. Figure 2 shows that, for
both strategies, the local errors in all elements tend to be equally distributed. This explains why
the values of $f(r_{\text{opt}})$ and $r_{\text{opt}}$ are close in Figure 3. From Figure 6 one can see that the predicted
reduction factor $\gamma(r_{\text{opt}})$ is very accurate. This suggests a modification of the refinement process
that can be considered to increase performance: one does not need to solve the linear systems until
the new level is refined enough to have a significant number of additional elements in it. In this
way complexity is never a problem, and we can still have a highly accurate grid sequence.


Figure 4. Number of elements, $N_\ell$, versus level, $\ell$, $\alpha = 2.1$ (no singularity), $p = 1$: (a) WEE and (b) ACE.

Figure 5. Final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$, $\alpha = 2.1$ (no singularity), $p = 1$.

3.2. Performance of WEE and ACE for singular solutions


Next, we consider a singular example: let $\alpha = 0.6$, so that $u \in H^{1.1}((0, 1))$. In the numerical results,
we carry out the refinement process until $E_L(u_h, f) \le 7\mathrm{e}{-4}$ on the final grid level $L$. For $p = 1$, the
error reduction in the element that contains $x = 0$ can be approximately given by $(\frac{1}{2})^{0.2}$, see (12).
The predicted reduction factor $\gamma(r)$ is given by

$\gamma(r) = 1 - \frac{3}{4} f(r) + \left(\left(\frac{1}{2}\right)^{0.2} - \frac{1}{4}\right) f\!\left(\frac{1}{N}\right)$ (17)

Here, we assume that the local error in the element that contains x = 0 is always the largest.


Figure 6. Predicted functional reduction factor, $\gamma(r_{\text{opt}})$, and actual functional reduction factor, $g$, versus level, $\ell$, $\alpha = 2.1$ (no singularity), $p = 1$: (a) WEE and (b) ACE.
Figure 7. Error versus DOF, $\alpha = 0.6$ (singular case), $p = 1$.

The numerical results in Figures 7–12 show that the two refinement strategies fail for this singular
case. Figure 7 shows that the WEE strategy results in a highly accurate grid sequence, while the
ACE strategy becomes inaccurate by comparison with the radical grid. For both strategies, the
local error in the first element, which contains the singularity, is always the largest, see Figure 8.
Hence, it is refined by the WEE and the ACE in every step. This also confirms that the predicted
reduction factor can be given by (17). The WEE strategy generates a grid sequence with local
errors being nearly equally distributed, but the ACE strategy does not: more than 90% of the global
error accumulates in only 10% of the elements; see Figures 8 and 9. Most refinement steps of the

Copyright q 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:89–114
DOI: 10.1002/nla
EFFICIENCY-BASED h- AND hp-REFINEMENT STRATEGIES 99

Figure 8. Local error functional, $\epsilon_i^2$, versus grid location on the final grid, $\alpha = 0.6$ (singular case), $p = 1$: (a) WEE: $N_L = 6925$, $E_L = 6.169\mathrm{e}{-4}$, $L = 154$, total work $= 192\,775$ and (b) ACE: $N_L = 24\,986$, $E_L = 6.411\mathrm{e}{-4}$, $L = 106$, total work $= 365\,420$.

Figure 9. Refined fraction of error functional, $f(r_{\text{opt}})$, versus level, $\ell$, and refined fraction of elements, $r_{\text{opt}}$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) WEE and (b) ACE.

WEE strategy are small refinements: only the first element (possibly with a few other elements)
is continuously being refined (see Figures 9 and 10). This implies that the number of elements
increases slowly as a function of refinement level. It follows that the total work is very large. The
ACE strategy does choose a refinement region with a large fraction of the error in it. However, this
large fraction of error is contained only in a few elements. As a result, only a small fraction of
elements are refined. Thus, the required total work is still large; see Figures 10 and 11. Compared
with the nonsingular case (Figure 5), the slope of the error versus total work plot in Figure 11 is


Figure 10. Number of elements, $N_\ell$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) WEE and (b) ACE.

Figure 11. Final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$, $\alpha = 0.6$ (singular case), $p = 1$.

much less steep, especially in the initial phase of the refinement process. The predicted reduction
factors for both strategies are accurate, see Figure 12. This suggests that we can make the same
modification as for the smooth case to increase performance: one can wait on solving the linear
systems until the number of elements has increased sufficiently. In this way, one can assure that
the complexity is never a problem, but calculating and minimizing the WEE and ACE functions
many times may be costly as well. In conclusion, for the highly singular case, the WEE strategy
results in an accurate grid sequence but is not efficient due to too many small refinements; the ACE


Figure 12. Predicted functional reduction factor, $\gamma_\ell(r_{\text{opt}})$, and actual functional reduction factor, $g_\ell$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) WEE and (b) ACE.

strategy is worse than the WEE strategy in this case, because the grid sequence is not accurate
and many small refinements are performed.

4. MODIFIED WEE AND ACE h-REFINEMENT STRATEGIES FOR SINGULAR SOLUTIONS IN 1D

4.1. Modified WEE and ACE h-refinement strategies


The inefficiency of the WEE and ACE strategies for the highly singular solution is due to many
steps of small refinement for the singular elements. Therefore, we attempt to avoid these steps by
using a geometrically graded grid starting from the singular point, with the aim of saving work
while attempting to keep the grid sequence accurate.
As was discussed before, we assume that singularities can be located only at coarse-level grid
points and that we know the location and the power of the singularities in advance. We propose
to do graded grid refinement for elements containing a singularity, in such a way that we obtain
the same error reduction factor as in elements in which the solution is smooth. For example, for a
singularity located at a domain boundary, the element at the boundary is split into two, and then,
within the same refinement step, the new element at the singularity is repeatedly split into two
again, until the predicted error reduction factor matches the desired error reduction. We modify the
predicted functional reduction factor, $\gamma(r)$, and the work increase ratio, $\eta(r)$, accordingly. We expect
the correspondingly modified WEE and ACE strategies (MWEE and MACE) to generate a highly
accurate grid sequence in an efficient way. This results in the following modified efficiency-based
refinement strategies:
(1) Order the elements such that the local error, $\epsilon_j$, satisfies $\epsilon_1 \ge \epsilon_2 \ge \cdots \ge \epsilon_N$.
(2) Perform graded grid refinement for elements containing a singularity, i.e. if $u \in H^{s_j}(\tau_j)$,
then graded grid refinement with $m_j$ levels is used for any $\tau_j$ that needs to be refined, with
$m_j$ satisfying

$\left(\frac{1}{2}\right)^{2 m_j (s_j - 1)} \approx \left(\frac{1}{2}\right)^{2p} \;\Rightarrow\; m_j = \left\lceil \frac{p}{s_j - 1} \right\rceil$

Note that we assume here that the error in the first, singular new element dominates the sum
of the errors in the other new elements of the graded grid. This is a good approximation
for a strong singularity. For elements in which the solution is smooth, single refinement is
performed: $m_j = 1$. Let $k_j$ be the number of new elements after $\tau_j$ is refined: $k_j = m_j + 1$.
(A short sketch of this step follows the list.)
(3) The predicted functional reduction factor, $\gamma(r)$, and the work increase ratio, $\eta(r)$, are
given by

$\eta(r) = 1 - r + \frac{1}{N} \sum_{j \le r N} k_j, \qquad \gamma(r) = 1 - f(r) + \left(\frac{1}{2}\right)^{2p} f(r)$ (18)

(4) Find the optimal $r$ defined in (5) for the MWEE strategy and in (7) for the MACE strategy.
(5) Repeat.
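As a minimal sketch of steps (2) and (3) (our illustration; function names are ours), the grading depth $m_j$ and the modified work ratio $\eta(r)$ of (18) can be computed as:

import math

def grading_levels(p, s):
    # Grading depth m_j for a singular element with u in H^s(tau_j):
    # (1/2)**(2*m*(s-1)) ~ (1/2)**(2*p)  =>  m_j = ceil(p / (s - 1)).
    return math.ceil(p / (s - 1.0))

def eta_modified(ks, N):
    # Work-growth ratio eta(r) of eq. (18): the len(ks) selected elements are
    # replaced by k_j sub-elements each; the remaining N - len(ks) stay intact.
    return 1.0 - len(ks) / N + sum(ks) / N

# Example from the paper: p = 1 and s_j = 1.1 (alpha = 0.6) give m_j = 10,
# hence k_j = m_j + 1 = 11 new elements ("11-graded refinement").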

4.2. Performance of the modified WEE and ACE h-refinement strategies for singular solutions
We again choose $\alpha = 0.6$ and $p = 1$ for our example problem. There is a singularity at $x = 0$,
with error reduction factor bound $(\frac{1}{2})^{0.2}$. Therefore, for the element that contains $x = 0$, we use
11-graded refinement ($m = \frac{1}{0.1} = 10$). Numerical results are shown in Figures 13–18.
By comparing the numerical results for the modified strategies with the results for the original
methods, we see the following. Both the MWEE and MACE strategies result in highly accurate

Figure 13. Error versus DOF, $\alpha = 0.6$ (singular case), $p = 1$. (Least-squares fits: MWEE $-1.0157x + 0.68384$, MACE $-1.0119x + 0.70888$, radical grid $-0.9938x + 0.58166$.)


Figure 14. Local error functional, $\epsilon_i^2$, versus grid location on the final grid, $\alpha = 0.6$ (singular case), $p = 1$: (a) MWEE: $N_L = 6975$, $E_L = 6.125\mathrm{e}{-4}$, $L = 15$, total work $= 21\,176$ and (b) MACE: $N_L = 8517$, $E_L = 5.443\mathrm{e}{-4}$, $L = 12$, total work $= 17\,044$.

Figure 15. Refined fraction of error functional, $f(r_{\text{opt}})$, versus level, $\ell$, and refined fraction of elements, $r_{\text{opt}}$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) MWEE and (b) MACE.

grid sequences: the convergence rate is very close to the optimal rate (Figure 13). Local error
functionals on the final MWEE grid are more equally distributed than for the MACE grid. For the
MWEE strategy, the local error functional in the singular element is only three times larger than in
the smooth elements. However, for the MACE strategy, that ratio is as large as 1000 (Figure 14).
For the MWEE strategy, the number of elements, $N_\ell$, increases much faster than for the WEE
strategy, which reduces the work considerably (Figure 15). However, there still exist a few small
refinement steps. For the MACE strategy, it seems that the strategy tends to do uniform refinement
after several initial steps (Figure 15(b)). Similar to the smooth solution case, the MWEE strategy
may need slightly more work to reach the same error bound than the MACE strategy due to a


Figure 16. Number of elements, $N_\ell$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) MWEE and (b) MACE.

Figure 17. Final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$, $\alpha = 0.6$ (singular case), $p = 1$.

few steps of small refinement (Figure 17). However, since the MWEE strategy is slightly more
accurate, the difference is very small. Again, the predicted functional reduction factors are good
approximations of the actual factors for both strategies (Figure 18).

4.3. Comparison with threshold-based refinement strategy


It is instructive to compare the MWEE and MACE strategies with the threshold-based refinement
strategy that chooses to refine a fixed fraction, $\delta$, of the error functional on each level, i.e. $f_\ell(r) \equiv \delta$.
The same graded grid refinement strategy is used for the elements that contain a singularity. We
find the following for our example problem.


Figure 18. Predicted functional reduction factor, $\gamma(r_{\text{opt}})$, and actual functional reduction factor, $g$, versus level, $\ell$, $\alpha = 0.6$ (singular case), $p = 1$: (a) MWEE and (b) MACE.

Figure 19. Efficiency-based and threshold-based refinement strategies: (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$. (Both $\alpha = 0.6$ (singular case), $p = 1$.)

If we choose to refine a fixed fraction of the global error that is too small (less than the average
of $f(r_{\text{opt}})$ in the modified efficiency-based strategies), e.g. $\delta = 0.2$ in Figure 19, then the resulting
grid sequence is almost of optimal accuracy, but the total work increases significantly since $N_\ell$
increases slowly. A threshold value that is too large (larger than the average of $f_\ell(r_{\text{opt}})$ in the
modified efficiency-based strategies), e.g. $\delta = 1.0$ in Figure 19, makes the number of elements,
$\{N_\ell\}_{\ell=1}^{L}$, increase faster, but the large threshold results in a less accurate grid sequence. This
implies that more total work is required to reach the same error bound. A threshold value that
is close to the average of $f(r_{\text{opt}})$ in the modified efficiency-based strategies, namely, $\delta = 0.8$ in
Figure 19, results in a refinement process that performs similarly to the efficiency-based refinement
processes.


In conclusion, the efficiency-based refinement strategies automatically and adaptively choose a
nearly optimal fraction of the error to be refined. As a result, they generate nearly optimal grid
sequences in an efficient way, and there is no need to determine the optimal value of a threshold
parameter.

4.4. Results for p = 2


In this section, we briefly illustrate how the (M)WEE and (M)ACE strategies perform for finite
element polynomial order p = 2.
First, consider a smooth case with $\alpha = 3.1$, such that $u \in H^3$ and $u \notin H^4$. Error versus DOF and
total work are plotted for WEE and ACE in Figure 20. Both strategies lead to global refinement in
every step for this example and produce a sequence of grids that are very close to optimal radical
grids.
Figure 21 shows results for a highly singular case, with $\alpha = 0.6$, such that $u \in H^1$ and $u \notin H^2$.
WEE and ACE produce small refinements, but this is remedied by the MWEE and MACE strategies,

Figure 20. Efficiency-based refinement strategies for a smooth problem with $p = 2$ ($\alpha = 3.1$): (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$.

Figure 21. Efficiency-based refinement strategies for a singular problem with $p = 2$ ($\alpha = 0.6$): (a) error versus DOF and (b) final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$.


resulting, as before, in much less work for the modified strategies. It has to be noted, however,
that the MWEE and MACE grids contain many more elements than optimal graded grids. This
is probably due to the fact that the singularity is very strong for $\alpha = 0.6$ and $p = 2$, such that a
geometrically graded grid with a grading factor of $\frac{1}{2}$ does not decrease the grid size fast enough
in the vicinity of the singularity. Nevertheless, we can conclude that, within the constraint of
refinement based on splitting cells in two, the MWEE and MACE strategies lead to an efficient
refinement process.

5. EFFICIENCY-BASED hp-REFINEMENT STRATEGIES IN 1D

Assuming that we know a good approximation for the p-refinement error reduction factor for each
element, we can apply the efficiency-based refinement strategies to hp-refinement processes.

5.1. hp-version of the (M)WEE and (M)ACE refinement strategies


Consider an hp-finite element method for our simple example problem (3). Let $\mathcal{T}_h = \{0 =
x_0 < x_1 < \cdots < x_N = 1\}$ be the grid and let $\mathbf{p} = \{p_1, p_2, \ldots, p_N\}$ be the degrees of the polynomials in
the elements. Let $u_h$ be the Galerkin finite element solution of (3) and $\epsilon_j^2(p_j) = \|u' - u_h'\|^2_{0,\tau_j}$ the
local error functional in element $\tau_j = [x_{j-1}, x_j]$ with polynomial of degree $p_j$. We choose local
Legendre polynomials as the modal basis functions [3]. Then we have the following theorem:

Theorem 2 (Gui and Babuška [5])
Let $\epsilon_j^2(p_j)$ be the local error of the finite element solution of problem (3), and let

$\tau_j = [x_{j-1}, x_j], \qquad \lambda_j = \dfrac{\sqrt{x_j} - \sqrt{x_{j-1}}}{\sqrt{x_j} + \sqrt{x_{j-1}}}$

Then

$\epsilon_1^2(p_1) \approx \dfrac{h_1^{2\alpha-1}}{p_1^{4\alpha-2}}$ (19)

If $\lambda_j$ ($2 \le j \le N$) is not close to 1, then

$\epsilon_j^2(p_j) \approx h_j \left\{ \dfrac{(1-\lambda_j^2)^{\alpha-1/2}\, \lambda_j^{p_j}}{\sqrt{2\lambda_j}\; p_j} \right\}^2$ (20)

We only consider h-refinement for the first element, which contains the singularity. Then we
have the error functional reduction factor bound $(\frac{1}{2})^{2\alpha-1}$ as in (12). For an element $\tau_j$ that does
not contain the singularity, note that $\lambda_j$ is small, and again we obtain the same h-reduction factor
bound, $(\frac{1}{2})^{2 p_j}$, as before (see (10)). Moreover, if we double the degree of polynomial $p_j$, we
obtain the p-reduction factor bound as follows:

$\dfrac{\epsilon_j(2 p_j)}{\epsilon_j(p_j)} \approx \dfrac{\lambda_j^{p_j}}{2}$ (21)


We can then develop an hp-version of the MWEE strategy as follows:

(1) Order the elements such that the local error, $\epsilon_j$, satisfies $\epsilon_1 \ge \epsilon_2 \ge \cdots \ge \epsilon_N$.
(2) Let $p_{\max}$ be the maximal polynomial order to be used in the refinement process. Three types
of refinement are used, depending on the element. We use a graded grid with $p = 1$ for the
elements containing a singularity, in such a way that the predicted error-reduction factor
attains $\frac{1}{4}$. (Note that a target reduction factor of up to $1/2^{p_{\max}}$ could be used, but we choose
$\frac{1}{4}$ for simplicity in our numerical tests.) For elements without a singularity, p-refinement
(doubling $p$) is used if the solution is locally smooth enough (which, in general, can be
detected a posteriori by comparing predicted and observed error-functional reduction ratios)
and $p < p_{\max}$. Otherwise, h-refinement is used and the degree $p$ is inherited by both sub-
elements. As before, we assume that the work of solving the linear systems is proportional
to the number of DOF. Then, doubling $p$ or splitting the element into two elements with
order $p$ has the same computational complexity. (A sketch of this decision rule follows the list.)
(3) Calculate the MWEE or MACE efficiency function and find the optimal fraction of elements
to be refined, $r_{\text{opt}}$.
(4) Refine elements $\tau_j$, $1 \le j \le r_{\text{opt}} N$.
(5) Repeat.
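A minimal Python sketch of the per-element decision in step (2), using the general p-reduction factor (23) for the smooth branch (our illustration; the smoothness flag is assumed to come from the a posteriori comparison mentioned above):

def predicted_reduction(h, p, smooth, p_max):
    # Predicted error-functional reduction factor for one non-singular element:
    # p-refinement (double p) if the solution is locally smooth and p < p_max,
    # h-refinement otherwise. Both options add the same number of DOF.
    if smooth and p < p_max:
        return (h / 2.0) ** (2 * p)  # doubling p: general factor, eq. (23)
    return 0.5 ** (2 * p)            # splitting in two, both children keep p: eq. (10)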
For a general problem different from (3), it may be difficult to find a sharp approximation
formula for the error reduction in the case of p-refinement. Hence, we are interested in seeking a
more general but possibly less sharp p-error reduction factor. Recall that for elements $\tau_j$ in which
the solution is smooth (at least in $H^{p_j+1}(\tau_j)$ if order $p_j$ elements are used), we have

$\epsilon_j^2(p_j) \le C(p_j)\, h_j^{2 p_j} \|u\|^2_{H^{p_j+1}(\tau_j)}$

More precisely, we have the following approximation [3]:

$\epsilon_j^2(p_j) \le c \left(\dfrac{h_j}{2}\right)^{2 p_j} \dfrac{1}{(2 p_j)!} \|u\|^2_{H^{p_j+1}(\tau_j)}$ (22)

Assuming that $\frac{1}{(2 p_j)!} \|u\|^2_{H^{p_j+1}(\tau_j)} \le M$, where $M$ is a constant, we obtain the following general
p-error reduction factor

$\dfrac{\epsilon_j^2(2 p_j)}{\epsilon_j^2(p_j)} \approx \left(\dfrac{h_j}{2}\right)^{2 p_j}$ (23)

for elements $\tau_j$ that do not contain a singularity.

5.2. Optimal geometric hp-grid for the model problem


Just as in the case of h-refinement, we seek some kind of optimal grid for comparison. Suppose
the locations of the grid points are given by

$x_j = q^{N-j}, \quad 0 < q < 1, \quad j = 1, 2, \ldots, N$ (24)


Figure 22. Error versus DOF, $\alpha = 0.6$ (singular case), $p = 1$. (Curves include geometric grids with $q = q_{\text{opt}}$ and $q = 0.5$.)

Let $\lambda_j = \lambda = (1 - \sqrt{q})/(1 + \sqrt{q})$, $\forall j: 1 \le j \le N$. It was shown in [5] that the optimal degree distri-
bution of $p$ for these grid locations tends to a linear distribution with slope

$s_o = (\alpha - 1/2) \dfrac{\log q}{\log \lambda}$ (25)

Furthermore, the optimal geometric grid factor $q$ and linear slope $s_o$ combination is given by

$q_{\text{opt}} = (\sqrt{2} - 1)^2, \qquad s_{\text{opt}} = 2\alpha - 1$ (26)
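As a concrete check (our arithmetic, not from [5]): (26) gives $q_{\text{opt}} = (\sqrt{2} - 1)^2 \approx 0.172$ independently of $\alpha$, and for the singular case $\alpha = 0.6$ studied below, $s_{\text{opt}} = 2(0.6) - 1 = 0.2$, i.e. the optimal polynomial degree grows by one for every five elements away from the singularity.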

5.3. Numerical results and comparisons


We apply the hp-version MWEE and MACE strategies with the two p-refinement reduction
factors given by (21) and (23) to our model problem (3) with $\alpha = 0.6$ and compare the numer-
ical results with the optimal geometric grid with $q = q_{\text{opt}}$ and $q = \frac{1}{2}$; see Figures 22 and 23. In
the numerical results, we carry out the refinement process until $E_L(u_h, f) \le 5\mathrm{e}{-3}$ on the final grid
level $L$.
Observe that the hp-finite element methods result in much faster error convergence rates than the
h-finite element method. Both the hp-MACE and hp-MWEE strategies result in a highly accurate
grid sequence with rate-of-error convergence very close to the geometrical grid with grading
number q = 0.5. Also, the refinement process is efficient, i.e. the number of DOF increases fast
w.r.t. the refinement level. Surprisingly, hp-refinement strategies using the more general, but less
accurate, error reduction factor (23), result in a better grid sequence than with the more accurate
Babuška factor, (21). The results are even better than the optimal geometric grid sequence when
the number of DOF is small. More work needs to be done to verify whether the general factor
(23) works well for more general problems.


Figure 23. Final error, $E_L$, versus total work, $\sum_{\ell=1}^{L} N_\ell$, $\alpha = 0.6$ (singular case), $p = 1$.

6. 2D RESULTS

In this section, we explore the use of the proposed efficiency-based refinement strategies in two
spatial dimensions. In these initial considerations, we discuss only problems with sufficiently
smooth solutions.

6.1. Efficiency strategies in $\mathbb{R}^d$


The efficiency-based WEE and ACE refinement strategies presented above can readily be applied
to problems in $d$ spatial dimensions. Let $\Omega \subset \mathbb{R}^d$. Assume again that the error estimator, $F(u_h, f)$,
is equivalent to the $H^1$ norm of $u - u_h$: $F(u_h, f) \approx \|u - u_h\|^2_{H^1(\Omega)}$. Assume that the refinement
process, in each step, splits elements into $2^d$ sub-elements. Then the element growth ratio, $\eta(r)$,
is given by

$\eta(r) = 1 + (2^d - 1)r$ (27)

Suppose the solution is sufficiently smooth in the whole domain. As in the 1D case, the predicted
functional reduction factor, $\gamma(r)$, is given by

$\gamma(r) = 1 - f(r) + \left(\frac{1}{2}\right)^{2p} f(r)$ (28)

The WEE and ACE strategies can then be used to determine the fraction of elements to be refined,
$r_{\text{opt}}$, according to Equations (5) and (7), respectively.
It should be noted here that the WEE measure may be problematic in dimensions higher than one.
This can be seen as follows. The WEE measure determines $r_{\text{opt}}$ by minimizing $M_{\text{WEE}} \equiv \eta(r)\sqrt{\gamma(r)}$
over $r \in [1/N, 1]$. For smooth solutions, $\gamma(1/N) \approx 1$ and $\eta(1/N) \approx 1$, such that $M_{\text{WEE}}(1/N) \approx 1$.
For $r = 1$, however, it can be observed that $\eta(1) = 2^d$ and $\gamma(1) = (\frac{1}{2})^{2p}$, such that $M_{\text{WEE}}(1) = 2^{d-p}$.
This means that $M_{\text{WEE}}(1) > 1$ when $d > p$. $M_{\text{WEE}}(r)$ is often a very smooth function; hence, $r_{\text{opt}}$ is
likely to be close to $1/N$ when $d > p$, resulting in small refinements, which are inefficient. We,
thus, expect that the WEE strategy may not be efficient when $d > p$. We investigate this issue in
the numerical results presented below. Also, it can be noted that this problem does not occur for
the ACE strategy.
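As a concrete instance (our arithmetic): for $d = 2$ and $p = 1$, uniform refinement gives $M_{\text{WEE}}(1) = \eta(1)\sqrt{\gamma(1)} = 4 \cdot \frac{1}{2} = 2 > 1 \approx M_{\text{WEE}}(1/N)$, so the WEE measure prefers refining almost nothing, whereas the ACE objective, $\log(\gamma(1))/\eta(1) = -(2\log 2)/4 < 0$, still favors the large refinement.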

6.2. Model problem and finite element method


The following 2D finite element problem is considered to illustrate the efficiency-based refinement
strategies. We solve the Poisson boundary value problem (BVP)

$-\Delta p = f$ in $\Omega$
$p = g$ on $\partial\Omega$ (29)
$\Omega = (0, 1) \times (0, 1)$

with the right-hand side $f$ and boundary conditions $g$ chosen such that the solution is given by

$p(r, \theta) = \begin{cases} 1, & r \le r_0 \\ h(r), & r_0 \le r \le r_1 \\ 0, & r_1 \le r \end{cases}$ (30)

Here, $(r, \theta)$ are the usual polar coordinates and $h(r)$ is the unique polynomial of degree five such
that $p \in C^2(\Omega)$. We choose $r_0 = 0.7$ and $r_1 = 0.8$. The solution of this test problem takes on the
unit value in the lower left corner of the domain and is zero elsewhere, except for a steep gradient
in the thin strip $0.7 \le r \le 0.8$. Figure 24(a) shows the grid obtained for this model problem after
several refinement steps.

Figure 24. Adaptively refined grids using the ACE refinement strategy for 2D problems with p = 2:
(a) single arc on a unit square domain and (b) double arc on a unit square domain.

Copyright q 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:89–114
DOI: 10.1002/nla
112 H. DE STERCK ET AL.

To illustrate the broad applicability of our refinement strategies, we solve this model problem
using a FOSLS finite element method, rather than the Galerkin method that was used for the 1D
test problems. BVP (29) is rewritten as a first-order system BVP [8]

$-\nabla \cdot U = f$ in $\Omega$
$U = \nabla p$
$\nabla \times U = 0$
$p = g$ on $\partial\Omega$ (31)
$s \cdot U = \dfrac{\partial g}{\partial s}$ on $\partial\Omega$
$\Omega = (0, 1) \times (0, 1)$

where $U$ is a vector of auxiliary unknowns, and $s$ is the unit vector tangent to $\partial\Omega$. The FOSLS error
estimator is given by $F(p_h, U_h; f) = \|\nabla \cdot U_h + f\|^2_{L^2(\Omega)} + \|U_h - \nabla p_h\|^2_{(L^2(\Omega))^2} + \|\nabla \times U_h\|^2_{L^2(\Omega)}$.
Under certain smoothness assumptions, the FOSLS error estimator is equivalent to the $H^1$ norm
[8]: $F(p_h, U_h; f) \approx \|p - p_h\|^2_{H^1(\Omega)} + \|U - U_h\|^2_{(H^1(\Omega))^2}$.
Note that in our approach refinement is performed in such a way that new nodes are introduced
on element edges and faces; hence, local refinement introduces hanging nodes (see Figure 24(a)).
To maintain a C 0 solution, we treat these as slave nodes, enforcing a continuity constraint across
element boundaries. This results in a conforming finite element method, and the approximation
properties discussed in this paper still hold on this type of grid.

6.3. Numerical results


We present numerical results for the 2D model problem with p = 1 and 2 in Figures 25 and 26,
respectively. The figures show error versus DOF and total work for the WEE and ACE refinement
strategies, compared with global refinement in every step.
For p = 1, the ACE strategy results in an efficient algorithm but, as expected, the WEE strategy
produces many small refinement steps for this case where d > p and is, thus, not efficient (Figure 25).

Figure 25. Efficiency-based refinement strategies for the 2D model problem with p = 1: (a) error versus
DOF and (b) final error, E_L, versus total work, \sum_{\ell=1}^{L} N_\ell.


Figure 26. Efficiency-based refinement strategies for the 2D model problem with p = 2: (a) error versus
DOF and (b) final error, E_L, versus total work, \sum_{\ell=1}^{L} N_\ell.

Figure 26 shows that, for p = 2 (d = p), both the ACE and WEE strategies produce an efficient
refinement process.
Figure 24(b) shows the resulting grid when the ACE strategy is applied to a slightly more
complicated test problem, in which two circular steps are superimposed (u = 1 in the lower left
corner, u = 2 in the lower right corner, u = 3 where the two steps overlap, and u = 0 in the top part
of the domain). The adaptive refinement process adequately captures the error generated at the
steep gradients.

7. CONCLUSIONS

Two efficiency-based adaptive refinement strategies for finite element methods, WEE and ACE,
were discussed. The two strategies take both error reduction and work into account. The two
strategies were first compared for a 1D model problem. For the case of h-refinement with smooth
solutions, the efficiency-based strategies generate a highly accurate grid sequence and an efficient
refinement process. However, for singular solutions, the refinement process becomes inefficient
due to many steps of small refinements. Use of a graded grid for elements with a singularity leads
to significant improvement: for both the WEE and ACE strategies, this modification substantially
reduces the work and also results in a highly accurate grid sequence. For the hp-refinement case,
similar conclusions are obtained. However, for general problems, the difficulty may lie in finding
a good approximation for the p-error reduction factor. Application to problems with spatial
dimension larger than one shows that the WEE strategy is inefficient when the dimension, d, is
larger than the finite element order, p. The ACE strategy, however, produces an efficient refinement
process for any combination of d and p.
Future work will include application of these grid refinement strategies to problems with singu-
larities in multiple spatial dimensions. Also, an idea to be explored in the future is to enhance
the refinement strategies by allowing double or triple refinement for some elements, and deter-
mining, in each step, the optimal number of elements to be refined once, twice and thrice. More
realistic measures for computational work must be considered that may, for instance, take into
account matrix assembly costs and multigrid convergence factors, and their dependence on the
finite element order and the spatial dimension of the problem. Another topic of interest is the


parallelization of the efficiency-based refinement strategies. Binning strategies need to be considered
to reduce the work for minimizing the efficiency measures and to reduce the communication
between processors [4]. Also, load balancing issues are important for parallel adaptive methods
(see, e.g. [10]). After initial solution of a coarse level problem on a single processor, the domain
may be partitioned such that each parallel processor receives a subdomain with approximately
the same amount of error. This may be a fruitful strategy for load balancing in that, as the grid
becomes finer, the optimal refinement approaches global refinement, which requires minimal load
balancing. This will be explored in future research.

REFERENCES
1. Rüde U. Mathematical and Computational Techniques for Multilevel Adaptive Methods. Frontiers in Applied
Mathematics, vol. 13. SIAM: Philadelphia, 1993.
2. Verfürth R. A Review of A Posteriori Error Estimation and Adaptive Mesh-Refinement Techniques. Wiley-Teubner:
Stuttgart, 1996.
3. Schwab C. p- and hp-Finite Element Methods. Clarendon Press: Oxford, 1998.
4. Berndt M, Manteuffel TA, McCormick SF. Local error estimates and adaptive refinement for first-order system
least squares (FOSLS). Electronic Transactions on Numerical Analysis 1997; 6:35–43.
5. Gui W, Babuška I. The h, p and hp versions of the finite element method in 1 dimension, parts I, II, III.
Numerische Mathematik 1986; 49:577–683.
6. Brenner SC, Scott LR. The Mathematical Theory of Finite Element Methods. Springer: New York, 1996.
7. Cai Z, Lazarov R, Manteuffel TA, McCormick SF. First-order system least squares for second-order partial
differential equations. I. SIAM Journal on Numerical Analysis 1994; 31:1785–1799.
8. Cai Z, Manteuffel TA, McCormick SF. First-order system least squares for second-order partial differential
equations. II. SIAM Journal on Numerical Analysis 1997; 34:425–454.
9. Bochev PB, Gunzburger MD. Finite element methods of least-squares type. SIAM Review 1998; 40:789–837.
10. Bank RE, Holst MJ. A new paradigm for parallel adaptive meshing algorithms. SIAM Review 2003; 45:292–323.

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:115–139
Published online 29 October 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.559

Distance-two interpolation for parallel algebraic multigrid

Hans De Sterck1, Robert D. Falgout2, Joshua W. Nolting3 and Ulrike Meier Yang2,∗,†

1 Department of Applied Mathematics, University of Waterloo, Waterloo, Ont., Canada N2L 3G1
2 Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, P.O. Box 808, Livermore, CA 94551, U.S.A.
3 Department of Applied Mathematics, University of Colorado at Boulder, Campus Box 526, Boulder, CO 80302, U.S.A.

SUMMARY
Algebraic multigrid (AMG) is one of the most efficient and scalable parallel algorithms for solving sparse
linear systems on unstructured grids. However, for large 3D problems, the coarse grids that are normally
used in AMG often lead to growing complexity in terms of memory use and execution time per AMG
V-cycle. Sparser coarse grids, such as those obtained by the parallel modified independent set (PMIS)
coarsening algorithm, remedy this complexity growth but lead to nonscalable AMG convergence factors
when traditional distance-one interpolation methods are used. In this paper, we study the scalability of AMG
methods that combine PMIS coarse grids with long-distance interpolation methods. AMG performance
and scalability are compared for previously introduced interpolation methods as well as new variants
of them for a variety of relevant test problems on parallel computers. It is shown that the increased
interpolation accuracy largely restores the scalability of AMG convergence factors for PMIS-coarsened
grids, and in combination with complexity reducing methods, such as interpolation truncation, one obtains
a class of parallel AMG methods that enjoy excellent scalability properties on large parallel computers.
Copyright q 2007 John Wiley & Sons, Ltd.

Received 11 May 2007; Revised 20 September 2007; Accepted 21 September 2007

KEY WORDS: algebraic multigrid; long-range interpolation; parallel implementation; reduced complexity;
truncation

∗Correspondence to: Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National
Laboratory, P.O. Box 808, Livermore, CA 94551, U.S.A.
†E-mail: umyang@llnl.gov

Contract/grant sponsor: U.S. Department of Energy; contract/grant number: W-7405-Eng-48

1. INTRODUCTION

Algebraic multigrid (AMG) [1–4] is an efficient and potentially scalable algorithm for sparse linear
systems on unstructured grids. However, when applied to large 3D problems, the classical algorithm

often generates unreasonably large complexities with regard to memory use as well as computa-
tional operations. Recently, we suggested a new parallel coarsening algorithm, called the parallel
modified independent set (PMIS) algorithm [5], which is based on a parallel independent set
algorithm suggested in [6]. The use of this coarsening algorithm in combination with a slight
modification of Ruge and Stüben’s classical interpolation scheme [2] leads to significantly lower
complexities as well as significantly lower setup and cycle times. For various test problems, such
as isotropic and grid-aligned anisotropic diffusion operators, one obtains scalable results, partic-
ularly when AMG is used in combination with Krylov methods. However, AMG convergence
factors are severely impacted for more complicated problems, such as problems with rotated
anisotropies or highly discontinuous material properties. Since we realized that classical interpola-
tion methods, which use only distance-one neighbors for their interpolatory set, were not sufficient
for these coarse grids, we decided to investigate interpolation operators that also include distance-
two neighbors. In this paper, we focus on the following distance-two interpolation operators:
we study three methods proposed in [3], namely, standard interpolation, multipass interpolation,
and the use of Jacobi interpolation to improve other interpolation operators, and we investigate
two extensions of classical interpolation, which we denote with ‘extended’ and ‘extended+i’
interpolation.
Our investigation shows that all of the long-distance interpolation strategies, except for multipass
interpolation, significantly improve AMG convergence factors compared with classical interpola-
tion. Multipass interpolation shows poor numerical scalability, which, however, can be improved
with a Krylov accelerator, but it has very small computational complexity. All other long-distance
interpolation operators showed increased complexities. While the increase is not very significant
for 2D problems, it is of concern in the 3D case. Therefore, we also investigated complexity
reducing strategies, such as the use of smaller sets of interpolation points and interpolation trun-
cation. The use of these strategies led to AMG methods with significantly improved overall
scalability.
The paper is organized as follows. In Section 2, we briefly describe AMG. In Section 3,
distance-one interpolation operators are presented, and Section 4 describes long-range interpolation
operators. In Section 5, the computational cost of the interpolation strategies is investigated,
and in Section 6 some sequential numerical results are given, which motivate the following
sections. Section 7 presents various complexity reducing strategies. Section 8 investigates the
parallel implementation of the methods. Section 9 presents parallel scaling results for a variety of
test problems, and Section 10 contains the conclusions.

2. ALGEBRAIC MULTIGRID

In this section, we give an outline of the basic principles and techniques that comprise AMG, and
we define terminology and notation. Detailed explanations may be found in [2, 3, 7]. Consider a
problem of the form

Au = f (1)

where A is an n×n matrix with entries a_{ij}. For convenience, the indices are identified with grid
points, so that u_i denotes the value of u at point i, and the grid is denoted by Ω = {1, 2, . . . , n}.


In any multigrid method, the central idea is that ‘smooth error,’ e, that is not eliminated by
relaxation must be removed by coarse-grid correction. This is done by solving the residual equation
Ae =r on a coarser grid, then interpolating the error back to the fine grid and using it to correct
the fine-grid approximation.
Using superscripts to indicate the level number, where 1 denotes the finest level so that A^1 = A
and Ω^1 = Ω, AMG needs the following components: grids Ω^1 ⊃ Ω^2 ⊃ · · · ⊃ Ω^M, grid operators
A^1, A^2, . . . , A^M, interpolation operators P^k, restriction operators R^k (often R^k = (P^k)^T), and
smoothers S^k, where k = 1, 2, . . . , M−1.
Most of these components of AMG are determined in a first step, known as the setup phase.
During the setup phase, on each level k, k = 1, . . . , M−1, Ω^{k+1} is determined using a coarsening
algorithm, P^k and R^k are defined, and A^{k+1} is determined using the Galerkin condition
A^{k+1} = R^k A^k P^k. Once the setup phase is completed, the solve phase, a recursively defined cycle,
can be performed as follows:
can be performed as follows:

Algorithm MGV(A^k, R^k, P^k, S^k, u^k, f^k):
If k = M, solve A^M u^M = f^M with a direct solver.
Otherwise:
    Apply smoother S^k ν_1 times to A^k u^k = f^k.
    Perform coarse-grid correction:
        Set r^k = f^k − A^k u^k.
        Set r^{k+1} = R^k r^k.
        Set e^{k+1} = 0.
        Apply MGV(A^{k+1}, R^{k+1}, P^{k+1}, S^{k+1}, e^{k+1}, r^{k+1}).
        Interpolate e^k = P^k e^{k+1}.
        Correct the solution by u^k ← u^k + e^k.
    Apply smoother S^k ν_2 times to A^k u^k = f^k.
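A compact sketch of the MGV cycle above, written with dense NumPy operators for readability (a real AMG code stores A^k, R^k, and P^k as sparse matrices); the Gauss-Seidel routine shown is one possible choice of smoother S^k:

```python
import numpy as np

def gauss_seidel(A, u, f, nu):
    """nu sweeps of lexicographic Gauss-Seidel relaxation."""
    L = np.tril(A)                                 # lower triangle including diagonal
    for _ in range(nu):
        u = u + np.linalg.solve(L, f - A @ u)
    return u

def mgv(A, R, P, u, f, k=0, nu1=1, nu2=1):
    """Recursive V(nu1, nu2)-cycle; A, R, P are per-level operator lists,
    with P[k] interpolating from level k+1 to level k."""
    if k == len(A) - 1:
        return np.linalg.solve(A[k], f)            # direct solve on level M
    u = gauss_seidel(A[k], u, f, nu1)              # pre-smoothing
    rc = R[k] @ (f - A[k] @ u)                     # restrict the residual
    ec = mgv(A, R, P, np.zeros_like(rc), rc, k + 1, nu1, nu2)
    u = u + P[k] @ ec                              # interpolate and correct
    return gauss_seidel(A[k], u, f, nu2)           # post-smoothing
```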

In the remainder of the paper, the index k will be dropped for simplicity. The algorithm above describes
a V(ν_1, ν_2)-cycle; other, more complex cycles such as W-cycles are described in [7]. In every
V-cycle, the error is reduced by a certain factor, which is called the convergence factor. A sequence
of V-cycles is executed until the error is reduced below a specified tolerance. For a scalable AMG
method, the convergence factor is bounded away from one independently of the problem size n, and
the computational work in both the setup and solve phases is linearly proportional to the problem
size n. While AMG was originally developed in the context of symmetric M-matrix problems,
AMG has been applied successfully to a much wider class of problems. We assume in this paper
that A has positive diagonal elements.

3. DISTANCE-ONE INTERPOLATION STRATEGIES

In this section, we first give some definitions as well as some general remarks, and then recall
the possibly simplest interpolation strategy, the so-called direct interpolation strategy [3]. This
is followed by a description of the classical distance-one AMG interpolation method that was
introduced by Ruge and Stüben [2].


3.1. Definitions and remarks


One of the concepts used in the following sections is strength of connection. A point j strongly
influences a point i, or i strongly depends on j, if

-a_{ij} > \alpha \max_{k \neq i} (-a_{ik}) \qquad (2)

where 0 < α < 1. We set α = 0.25 in the remainder of the paper.


We define the measure of a point i as the number of points which strongly depend on i. When
PMIS coarsening is used, a positive random number that is smaller than 1 is added to the measure
to distinguish between neighboring points that strongly influence the same number of points. In the
PMIS-coarsening algorithm, points that do not strongly influence any other points are initialized
as F-points.
Using this concept of strength of connection, we define the following sets:

N_i = { j | a_{ij} ≠ 0 }
S_i = { j ∈ N_i | j strongly influences i }
F_i^s = F ∩ S_i
C_i^s = C ∩ S_i
N_i^w = N_i \ (F_i^s ∪ C_i^s)
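For illustration, the strong-dependency sets can be extracted from a CSR matrix as in the following sketch (A is assumed to be a scipy.sparse.csr_matrix; the sign convention follows the M-matrix setting of the paper, and the helper name is ours):

```python
def strong_sets(A, C, alpha=0.25):
    """For each row i of the CSR matrix A, return (S_i, F_i^s, C_i^s, N_i^w).
    C is the set of C-point indices; the F-points are the remaining indices."""
    result = []
    for i in range(A.shape[0]):
        cols = A.indices[A.indptr[i]:A.indptr[i + 1]]
        vals = A.data[A.indptr[i]:A.indptr[i + 1]]
        N_i = {j for j in cols if j != i}                     # neighbors of i
        offd = [(-v, j) for v, j in zip(vals, cols) if j != i]
        m = max((mv for mv, _ in offd), default=0.0)
        S_i = {j for mv, j in offd if m > 0 and mv > alpha * m}
        F_s, C_s = S_i - C, S_i & C
        result.append((S_i, F_s, C_s, N_i - (F_s | C_s)))
    return result
```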

In classical AMG [2], the interpolation of the error at the F-point i takes the form

e_i = \sum_{j \in C_i} w_{ij} e_j \qquad (3)

where w_{ij} is an interpolation weight determining the contribution of the value e_j to e_i, and C_i ⊂ C
is the coarse interpolatory set of F-point i. In most classical approaches to AMG interpolation,
C_i is a subset of the nearest neighbors of grid point i, i.e. C_i ⊂ N_i, and longer-range interpolation
is not considered.
The points to which i is connected comprise three sets: C_i^s, F_i^s, and N_i^w. Based on assumptions
on small residuals for smooth error [1–3, 7], an interpolation formula can be derived as follows.
The assumption that algebraically smooth error has small residuals after relaxation,

Ae \approx 0,

can be rewritten as

a_{ii} e_i \approx -\sum_{j \in N_i} a_{ij} e_j \qquad (4)

or

a_{ii} e_i \approx -\sum_{j \in C_i^s} a_{ij} e_j - \sum_{j \in F_i^s} a_{ij} e_j - \sum_{j \in N_i^w} a_{ij} e_j \qquad (5)

From this expression, various interpolation formulae can be derived. We use the terminology of
[3] for the various interpolation strategies.


3.2. Direct interpolation


The so-called 'direct interpolation' strategy [3] has one of the simplest interpolation formulae.
The coarse interpolatory set is chosen as C_i = C_i^s, and

w_{ij} = -\frac{a_{ij}}{a_{ii}} \frac{\sum_{k \in N_i} a_{ik}}{\sum_{k \in C_i^s} a_{ik}}, \quad j \in C_i^s \qquad (6)

This leads to an interpolation operator that is often not accurate enough. Nevertheless, we mention this
approach here, since various other interpolation operators that we consider are based on it. This
method is denoted by 'direct' in the tables presented below. In [3] it is also suggested to separate
positive and negative coefficients when determining the weights, a strategy that can help when
one encounters large positive off-diagonal matrix coefficients. We do not consider this approach,
since it did not lead to an improvement for the problems considered here.
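For concreteness, a sketch of Equation (6) applied to a single F-point, operating on one CSR row (the helper name and calling convention are ours; the positive/negative coefficient splitting of [3] is not included):

```python
def direct_weights(A, i, C_s):
    """Direct interpolation weights w_ij of Equation (6) for F-point i.
    A is a scipy CSR matrix and C_s the set of strong C-points of i."""
    cols = A.indices[A.indptr[i]:A.indptr[i + 1]]
    vals = A.data[A.indptr[i]:A.indptr[i + 1]]
    a = dict(zip(cols, vals))                          # row i as {j: a_ij}
    row_sum = sum(v for j, v in a.items() if j != i)   # sum over N_i
    cs_sum = sum(a[j] for j in C_s)                    # sum over C_i^s
    return {j: -(a[j] / a[i]) * (row_sum / cs_sum) for j in C_s}
```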

3.3. Classical interpolation


A generally more accurate distance-one interpolation formula is the interpolation suggested by
Ruge and Stüben in [2], which we call 'classical interpolation' ('clas'). Again, C_i = C_i^s, but the
contribution from strongly influencing F-points (the points in F_i^s) in (5) is taken into account more
carefully. An appropriate approximation for the errors e_j of those strongly influencing F-points
may be defined as

e_j \approx \frac{\sum_{k \in C_i} a_{jk} e_k}{\sum_{k \in C_i} a_{jk}} \qquad (7)

This approximation can be justified by the observation that smooth error varies slowly in the
direction of strong connection. The denominator simply ensures that constants are interpolated
exactly. Replacing e_j with a sum over the elements k of the coarse interpolatory set C_i
corresponds to taking into account strong F–F connections using C-points that are common
between the F-points. Note that, when the two F-points i and j do not have a common C-point
in C_i^s and C_j^s, the denominator in (7) is small or vanishing. Weak connections (from the points in
N_i^w) are generally not important and, in (5), the errors e_j, j ∈ N_i^w, are replaced by e_i. This leads to the
following formula for the interpolation weights:

w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s} \frac{a_{ik}\, a_{kj}}{\sum_{m \in C_i^s} a_{km}} \right), \quad j \in C_i^s \qquad (8)

In our experiments this interpolation is further modified as proposed in [8] to avoid extremely
large interpolation weights that can lead to divergence.
Now the interpolation above was suggested based on a coarsening algorithm that ensured that
two strongly connected F-points always have a common coarse neighbor. Since this condition is
no longer guaranteed when using PMIS coarsening [5], it may happen that the term \sum_{m \in C_i^s} a_{km}
in Equation (8) vanishes. In our previous paper on the PMIS-coarsening method [5], we modified
interpolation formula (8) such that, if this case occurs, a_{ik} is added to the diagonal term (the term
a_{ii} + \sum_{k \in N_i^w} a_{ik} in Equation (8)), i.e. the strongly influencing neighbor point k of i is treated
similarly to a weak connection of i. In what follows, we denote the set of strongly connected neighbors
k of i that are F-points but do not have a common C-point with i, i.e. C_i^s \cap C_k^s = \emptyset, by F_i^{s∗}.
Combining this with the modification suggested in [8], we obtain the following interpolation formula:

w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w \cup F_i^{s*}} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s \setminus F_i^{s*}} \frac{a_{ik}\, \bar{a}_{kj}}{\sum_{m \in C_i^s} \bar{a}_{km}} \right), \quad j \in C_i^s \qquad (9)

where

\bar{a}_{ij} = \begin{cases} 0 & \text{if } \operatorname{sign}(a_{ij}) = \operatorname{sign}(a_{ii}) \\ a_{ij} & \text{otherwise} \end{cases}

Figure 1. Example illustrating a situation occurring with PMIS coarsening that will not be treated
correctly by direct or classical interpolation. Black points denote C-points, white points denote F-points,
and the arrow from i to l denotes that i strongly depends on l.

In this paper we refer to formula (9) as ‘classical interpolation’. The numerical results that were
presented in [5] showed that this interpolation formula, which is based on Ruge and Stüben’s
original distance-one interpolation formula [2], resulted in AMG methods with acceptable perfor-
mance when used with PMIS-coarsened grids for various problems, but only when the AMG cycle
is accelerated by a Krylov subspace method. Without such acceleration, interpolation formula (9)
is not accurate enough on PMIS-coarsened grids: AMG convergence factors deteriorate quickly
as a function of problem size, and scalability is lost. For various problems, such as problems with
rotated anisotropies or problems with large discontinuities, adding Krylov acceleration did not
remedy the scalability problems.
One of the issues is that distance-one interpolation schemes do not treat situations similar to the
one illustrated in Figure 1 correctly. Here we have an F-point with measure smaller than 1 that
has no coarse neighbors. This situation can occur for example if we have a fairly large strength
threshold. Both for classical and direct interpolation, the interpolated error in this point will vanish,
and coarse-grid correction will not be able to reduce the error in this point.
A major topic of this paper is to investigate whether distance-two interpolation methods are able
to restore grid-independent convergence to AMG cycles that use PMIS-coarsened grids, without
compromising scalability in terms of memory use and execution time per AMG V-cycle.

4. LONG-RANGE INTERPOLATION STRATEGIES

In this section, various long-distance interpolation methods are described. Parallel implementation
of some of these interpolation methods and parallel scalability results on PMIS-coarsened grids
are discussed later in this paper.

4.1. Multipass interpolation


Multipass interpolation (‘mp’) is suggested in [3], and is useful for low-complexity coarsening
algorithms, particularly the so-called aggressive coarsening [3]. We suggested it in [5] as a possible


interpolation scheme to fix some of the problems that we observed when using our classical interpolation
scheme (9). Multipass interpolation proceeds as follows:

1. Use direct interpolation for all F-points i for which C_i^s ≠ ∅. Place these points in the set F∗.
2. For all i ∈ F \ F∗ with F∗ ∩ F_i^s ≠ ∅, replace, in Equation (4), e_j by \sum_{k \in C_j} w_{jk} e_k for all
   j ∈ F_i^s ∩ F∗, where C_j is the interpolatory set of e_j. Apply direct interpolation to the new
   equation. Add i to F∗. Repeat step 2 until F∗ = F.

Multipass interpolation is fairly cheap. It is, however, not very powerful, since it is based on
direct interpolation; if applied to PMIS, it still reduces to direct interpolation for most F-points.
It does fix the situation illustrated in Figure 1: if we apply multipass interpolation, the
point i will be interpolated by the coarse neighbors (black points) of the F-points k and l.

4.2. Jacobi interpolation


Another approach that remedies convergence issues caused by distance-one interpolation formulae
is Jacobi interpolation [3]. This approach uses an existing interpolation operator P^{(0)} and applies
one or more Jacobi iteration steps to the F-point portion of the interpolation operator, leading to
a more accurate interpolation operator P^{(n)}.
Assuming that A and the interpolation operator P^{(n)} are reordered according to the C/F-splitting,
they can be written in the following way:

A = \begin{pmatrix} A_{FF} & A_{FC} \\ A_{CF} & A_{CC} \end{pmatrix}, \quad P^{(n)} = \begin{pmatrix} P_{FC}^{(n)} \\ I_{CC} \end{pmatrix} \qquad (10)

Jacobi iteration on A_{FF} e_F + A_{FC} e_C = 0, with initial guess e_F^{(0)} = P_{FC}^{(0)} e_C, then leads to

P_{FC}^{(n)} = (I_{FF} - D_{FF}^{-1} A_{FF}) P_{FC}^{(n-1)} - D_{FF}^{-1} A_{FC} \qquad (11)

where D_{FF} is the diagonal matrix containing the diagonal of A_{FF}, and I_{FF} and I_{CC} are identity
matrices.
Applying this approach to a distance-one interpolation operator such as classical interpolation
yields an improved long-distance interpolation operator. The approach is also recommended
for improving multipass interpolation. We include results where classical interpolation is
followed by one step of Jacobi interpolation in our numerical experiments and denote them
by 'clas+j'.
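One Jacobi step of Equation (11) amounts to a few sparse matrix operations; a sketch with scipy, assuming the matrices have been pre-ordered into the C/F block form of (10):

```python
from scipy.sparse import diags, identity

def jacobi_interpolation_step(A_FF, A_FC, P_FC):
    """P_FC^(n) = (I_FF - D_FF^{-1} A_FF) P_FC^(n-1) - D_FF^{-1} A_FC,
    cf. Equation (11), with D_FF the diagonal of A_FF."""
    D_inv = diags(1.0 / A_FF.diagonal())
    I_FF = identity(A_FF.shape[0], format='csr')
    return (I_FF - D_inv @ A_FF) @ P_FC - D_inv @ A_FC
```

Each such step widens the interpolation stencil, which is one reason the resulting operators are typically combined with the complexity-reducing strategies discussed in Section 7.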

4.3. Standard interpolation


Standard interpolation ('std') extends the interpolatory set that is used for direct interpolation [3].
This is done by extending the stencil obtained through (4) via substitution of every e_j with j ∈ F_i^s
by -\frac{1}{a_{jj}} \sum_{k \in N_j} a_{jk} e_k. This leads to the following formula:

\hat{a}_{ii} e_i + \sum_{j \in \hat{N}_i} \hat{a}_{ij} e_j \approx 0 \qquad (12)

with the new neighborhood \hat{N}_i = N_i \cup \bigcup_{j \in F_i^s} N_j and the new coarse point set
\hat{C}_i = C_i \cup \bigcup_{j \in F_i^s} C_j. This can greatly increase the size of the interpolatory set.


Figure 2. Example of the interpolatory points for a 5-point stencil (left) and a 9-point stencil (right). The
gray point is the point to be interpolated, black points are C-points, and white points are F-points.

See the left example in Figure 2. Consider point i. Using direct or classical interpolation, i
would only be interpolated by the two distance-one coarse points. However, when we include the
coarse points of its strong fine neighbors m and n, two additional interpolatory points k and l are
added, leading to a potentially more accurate interpolation formula. Standard interpolation is now
defined by applying direct interpolation to the new stencil, leading to

w_{ij} = -\frac{\hat{a}_{ij}}{\hat{a}_{ii}} \frac{\sum_{k \in \hat{N}_i} \hat{a}_{ik}}{\sum_{k \in \hat{C}_i} \hat{a}_{ik}} \qquad (13)

4.4. Extended interpolation


It is possible to extend the classical interpolation formula so that the interpolatory set includes
C-points that are distance two away from the F-point to be interpolated, i.e. applying the classical
interpolation formula, but using the same interpolatory set that is used in standard interpolation
(see Figure 2): \hat{C}_i = C_i \cup \bigcup_{j \in F_i^s} C_j.
i
Using the same reasoning that leads to the classical interpolation formula (8), the following
approximate statement can be made regarding the error at an F-point i:

\left( a_{ii} + \sum_{j \in N_i^w} a_{ij} \right) e_i \approx -\sum_{j \in \hat{C}_i} a_{ij} e_j - \sum_{j \in F_i^s} a_{ij} \frac{\sum_{k \in \hat{C}_i} a_{jk} e_k}{\sum_{k \in \hat{C}_i} a_{jk}} \qquad (14)

It then follows immediately that the interpolation weights using the extended coarse interpolatory
set \hat{C}_i can be defined as

w_{ij} = -\frac{1}{a_{ii} + \sum_{k \in N_i^w \setminus \hat{C}_i} a_{ik}} \left( a_{ij} + \sum_{k \in F_i^s} \frac{a_{ik}\, \bar{a}_{kj}}{\sum_{m \in \hat{C}_i} \bar{a}_{km}} \right), \quad j \in \hat{C}_i \qquad (15)


Figure 3. Finite difference 1D Laplace example: four points 0, 1, 2, 3 with diagonal entries 2 and
off-diagonal entries −1.

Note that this may lead to some weak coarse points in Niw being included in the interpolatory
set Ĉi , if they are strongly connected to a neighbor point of i. This new interpolation formula
deals efficiently with strong F–F connections that do not share a common C-point. We call this
interpolation strategy ‘extended interpolation’ (‘ext’).

4.5. Extended+i interpolation


While extended interpolation remedies many problems that occur with classical interpolation, it
does not always lead to the desired weights. Consider the case given in Figure 3. Here we have a 1D
Laplace problem generated by finite differences. Points 1 and 2 are strongly connected F-points,
and points 0 and 3 are coarse points. Clearly, {0, 3} is the interpolatory set for point 1 in the case
of extended interpolation. If we apply formula (15) to this example to calculate w_{1,0} and w_{1,3}, we
obtain

w_{1,0} = 0.5, \quad w_{1,3} = 0.5

This is a better result than we would obtain for direct interpolation (6) and classical interpolation (9):

w_{1,0} = 1, \quad w_{1,3} = 0

but worse than standard interpolation (13), for which we obtain the intuitively best interpolation
weights:

w_{1,0} = \tfrac{2}{3}, \quad w_{1,3} = \tfrac{1}{3} \qquad (16)
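The weights (16) can be read off from a single substitution step; for the stencil of Figure 3:

```latex
\text{row 2: } 2e_2 - e_1 - e_3 \approx 0 \;\Rightarrow\; e_2 \approx \tfrac{1}{2}(e_1 + e_3),\\
\text{row 1: } 2e_1 - e_0 - e_2 \approx 0 \;\Rightarrow\; \tfrac{3}{2}e_1 \approx e_0 + \tfrac{1}{2}e_3
\;\Rightarrow\; e_1 \approx \tfrac{2}{3}e_0 + \tfrac{1}{3}e_3 .
```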

This deficiency can be remedied if we include not only the connections a_{jk} from strong fine
neighbors j of i to points k of the interpolatory set, but also the connections a_{ji} from j to point i
itself. An alternative to expression (7) for the error in strongly connected F-points is then given by

e_j \approx \frac{\sum_{k \in C_i \cup \{i\}} a_{jk} e_k}{\sum_{k \in C_i \cup \{i\}} a_{jk}} \qquad (17)

This can be rewritten as



e_j \approx \frac{\sum_{k \in C_i} a_{jk} e_k}{\sum_{k \in C_i \cup \{i\}} a_{jk}} + \frac{a_{ji} e_i}{\sum_{k \in C_i \cup \{i\}} a_{jk}} \qquad (18)

which then, in a similar way as before, leads to interpolation weights


 
w_{ij} = -\frac{1}{\tilde{a}_{ii}} \left( a_{ij} + \sum_{k \in F_i^s} a_{ik} \frac{\bar{a}_{kj}}{\sum_{l \in \hat{C}_i \cup \{i\}} \bar{a}_{kl}} \right), \quad j \in \hat{C}_i \qquad (19)


with now

\tilde{a}_{ii} = a_{ii} + \sum_{n \in N_i^w \setminus \hat{C}_i} a_{in} + \sum_{k \in F_i^s} a_{ik} \frac{\bar{a}_{ki}}{\sum_{l \in \hat{C}_i \cup \{i\}} \bar{a}_{kl}} \qquad (20)

We call this modified extended interpolation ‘extended+i’, and refer to it by ‘ext+i’ (or
sometimes ‘e+i’ to save space) in the tables below. If we apply it to the example illustrated in
Figure 3 we obtain weights (16).

5. COMPUTATIONAL COST OF INTERPOLATION STRATEGIES

In this section we consider the cost of some of the interpolation operators described in the previous
sections. We use the following notations:

N_c   total number of coarse points
N_f   total number of fine points
n_k   average number of distance-k neighbor points
c_k   average number of distance-k interpolatory points
f_k   average number of strong fine distance-k neighbors
w_k   average number of weak distance-k neighbors
s_k   average number of common distance-k interpolatory points
f_w   average number of strong neighbors treated weakly

Here, f_w indicates the number of strong F-neighbors that are treated weakly, which occur only for
classical interpolation (8). Also, s_k denotes the average number of C-points that are distance-one
neighbors of j ∈ F_i^s and also distance-k interpolatory points for i, the point to be interpolated, i.e.
s_k is the number of nonzero coefficients a_{jl}, where j ∈ F_i^s and l is a distance-k interpolatory point,
divided by the number of distance-k interpolatory points for i. Note that s_k is usually smaller than,
and at most equal to, c_k. Note also that n_k = f_k + c_k + w_k.
In our considerations we assume a compressed sparse row data format, i.e. three arrays are used
to store the matrix: a real array that contains the coefficients of the matrix, an integer array that
contains the column indices for each coefficient and an integer array that contains pointers to the
beginning of each row for the other two arrays. We also assume an additional integer array that
indicates whether a point is an F- or a C-point.
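For illustration, the three CSR arrays and the additional C/F array for a small matrix (the marker convention, +1 for C-points and −1 for F-points, is one common choice and not prescribed here):

```python
import numpy as np

# CSR storage of [[ 4, -1,  0],
#                 [-1,  4, -1],
#                 [ 0, -1,  4]]:
data    = np.array([4., -1., -1., 4., -1., -1., 4.])  # nonzero coefficients
indices = np.array([0, 1, 0, 1, 2, 1, 2])             # column index of each coefficient
indptr  = np.array([0, 2, 5, 7])                      # row i occupies data[indptr[i]:indptr[i+1]]
cf_marker = np.array([1, -1, 1])                      # assumed flag: C-point / F-point per row
```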
For all interpolation operators mentioned before, it is necessary to determine at first the inter-
polatory set. At the same time, the data structure for the interpolation operator can be determined.
This can be accomplished by sweeping through each row that belongs to an F-point: coarse neigh-
bors are identified via integer comparisons, and the pointer array for the interpolation operator
is generated. For the distance-two interpolation schemes, it is also necessary to check neighbors
of strong fine neighbors. This requires n_1 comparisons for direct and classical interpolations, and
(f_1+1)n_1 comparisons for extended, extended+i, and standard interpolations. The final data structure
contains N_c + N_f c_1 coefficients for classical and direct interpolations, and N_c + N_f(c_1+c_2)
coefficients for extended(+i) and standard interpolations.
Next, the interpolation data structure is filled.


For direct interpolation, all that is required is to sweep through a whole row once to compute the
common factor

\beta_i = -\frac{\sum_{k \in N_i} a_{ik}}{a_{ii} \sum_{k \in C_i^s} a_{ik}}

and then multiply the relevant matrix elements a_{ij} by \beta_i. The sum in the denominator requires
an additional n_1 comparisons, and the two summations require n_1 + 1 additions.
For classical, extended, and extended+i interpolations, one needs to first compute, for each
point k \in F_i^s \setminus F_i^{s*}, the factor \beta_{ik} = a_{ik} / \sum_{m \in D_i^s} a_{km}, where D_i^s = C_i^s for classical, D_i^s = \hat{C}_i for extended, and
D_i^s = \hat{C}_i \cup \{i\} for ext+i interpolation. For example, for classical interpolation, this requires f_1 n_1
comparisons. After this step, all these coefficients need to be processed again in order to add
\beta_{ik} a_{kj} to the appropriate weights. This requires an additional f_1 n_1 comparisons. The numbers of
additions, multiplications, and divisions can be determined similarly.
For standard interpolation, at first the new stencil needs to be computed, leading to f_1 n_1 additions
and multiplications and f_1 divisions. This can be done when setting up the data structure to avoid
n_1 comparisons. After this, one proceeds just as for direct interpolation, with a much larger stencil
of size n_1 + n_2.
The numbers of floating point additions, multiplications, and divisions needed to compute all interpolation
weights for each F-point are given in Table I. Note that a sum over m elements is treated
as m additions, assuming that we are adding to a variable that was originally 0. Also note
that occurrences of products of variables, such as f_i c_i or f_i s_i, are of order n_i^2, since f_i, c_i, and
s_i are dependent on n_i. This is also reflected in the results given in Table II for two specific
examples.
Let us look at some examples to get an idea of the actual cost involved. First, consider
a 5-point stencil as in Figure 2. Here, we have the following parameters: c_1 = f_1 = 2, w_1 =
w_2 = 0, n_1 = 4, f_w = 2, s_1 = 0, c_2 = 2, f_2 = 3, n_2 = 5, s_2 = 1.5. Table II shows the resulting
interpolation cost. Next, we look at an example with a bigger stencil, see the 9-point stencil in
Figure 2 and Table II. The parameters are now c_1 = 2, f_1 = 6, w_1 = w_2 = 0, n_1 = 8, f_w = 1, s_1 = 1,
c_2 = 3, f_2 = 12, n_2 = 15, s_2 = 1. We clearly see that a larger stencil significantly increases the
ratio of classical over direct interpolation, as well as that of distance-two over distance-one
interpolations.
Table III shows the times for calculating these interpolation operators for matrices with stencils
of various sizes. Two 2D examples, one with a 5-point and another with a 9-point stencil, were
examined on a 1000×1000 grid. The 3D examples, with a 7-point and a 27-point stencil, were
examined for an 80×80×80 grid. We have also included actual measurements of the average
number of interpolatory points for these examples. As expected, larger stencils lead to a larger
number of operations for each interpolation operator, with a much more significant increase

Table I. Computational cost for various interpolation operators.

Interpolation   Additions              Multiplications           Divisions     Comparisons
direct          n_1+1                  c_1+1                     1             2n_1
clas            2f_1s_1+w_1+f_w        f_1s_1+c_1                f_1−f_w+1     (2f_1+1)n_1
std             f_1n_1+n_1+n_2+1       f_1n_1+c_1+c_2+1          f_1+1         (f_1+2)n_1+n_2
ext             2f_1(s_1+s_2)+w_1      f_1(s_1+s_2)+c_1+c_2      f_1+1         (3f_1+1)n_1
ext+i           2f_1(s_1+s_2+1)+w_1    f_1(s_1+s_2+1)+c_1+c_2    f_1+1         (3f_1+1)n_1


Table II. Cost for the examples in Figure 2.

                Left example in Figure 2      Right example in Figure 2
Interpolation   Adds   Mults   Divs   Comps   Adds   Mults   Divs   Comps
direct            5      3      1       8       9      3      1      16
clas              2      2      1      20      13      8      6     104
std              18     13      3      21      72     54      7      79
ext               6      7      3      28      24     17      7     152
ext+i            10      9      3      28      36     23      7     152

Table III. Average number of distance-one (c_1) and distance-two (c_2) interpolatory
points and times for various interpolation operators.

Stencil     c_1    c_2    direct   clas   std    ext    ext+i
5-point     2.3    1.9     0.27    0.35   0.64   0.51   0.54
9-point     1.8    2.8     0.36    1.11   2.16   2.09   2.48
7-point     2.7    4.1     0.19    0.31   0.80   0.73   0.81
27-point    2.3    7.2     0.40    3.72   8.00   7.43   8.32

for distance-two interpolation operators, particularly for the 3D problems. These effects are signif-
icant, especially since on coarser levels the stencils become larger and, thus, impact the total
setup time.

6. SEQUENTIAL NUMERICAL RESULTS

While the previous section examined the computational cost for the interpolation operator, we are of
course mainly interested in the performance of the complete solver, which also includes coarsening,
the generation of the coarse-grid operator as well as the solve phase. We apply the new and old
interpolation operators here to a variety of test problems from [5] to compare their efficiency. We did
not include results using direct interpolation, since it performs worse than classical and multipass
interpolation for the problems considered, nor results using multipass interpolation followed by
Jacobi interpolation, since these results were very similar to those obtained for ‘clas+j’. All these
tests were obtained using AMG as a solver with a strength threshold of α = 0.25, and coarse–fine
Gauss–Seidel as the smoother. The iterations were stopped when the relative residual was smaller
than 10^{-8}. We also include the operator complexity C_op, which is defined as the sum of the numbers
of nonzeros of all matrices A^k divided by the number of nonzeros of the original matrix A = A^1.
C_op is an indicator of computational complexity and memory use, i.e. large operator complexities
lead to large setup times, times per cycle, and memory requirements.
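In code, C_op is a one-line reduction over the level hierarchy; a sketch assuming a list of scipy sparse level operators:

```python
def operator_complexity(levels):
    """C_op = (sum of nnz(A^k) over all levels k) / nnz(A^1)."""
    return sum(A.nnz for A in levels) / levels[0].nnz
```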
Table IV shows results for the 2D Poisson problem −Δu = f using a 5-point finite difference
discretization and a 9-point finite element discretization. Table V shows results for the 2D rotated


Table IV. AMG for the 5- and 9-point 2D Laplace problems on a 1000×1000 square
with random right-hand side using different interpolation operators.

            5-point                    9-point
Method    C_op   # its    Time      C_op   # its    Time
clas      1.92    244    151.60     1.24    157    100.44
clas+j    2.65     15     26.09     1.65      9     21.03
mp        1.92    244    152.34     1.24    183    115.72
ext       2.54     16     20.24     1.60     10     18.26
ext+i     2.57     11     16.93     1.60     10     18.40
std       2.56     16     20.63     1.60     17     23.06

Table V. AMG for a problem with 45° and 60° rotated anisotropy on a 512×512 square
using different interpolation operators.

            45°                        60°
Method    C_op   # its    Time      C_op   # its    Time
clas      1.90    168     38.60     1.82   >1000
clas+j    2.39     29     10.50     3.40    424    131.85
mp        1.90    163     37.16     1.82   >1000
ext       2.07     31      8.75     2.69    217     59.70
ext+i     2.07     11      4.05     2.89     97     29.78
std       2.07     13      4.53     2.89    148     43.68

anisotropic problem

-(c^2 + \varepsilon s^2) u_{xx} + 2(1-\varepsilon) s c\, u_{xy} - (s^2 + \varepsilon c^2) u_{yy} = 1 \qquad (21)

with s = \sin\theta, c = \cos\theta, and \varepsilon = 0.001, with rotation angles \theta = 45° and 60°.
The use of the distance-two interpolation operators combined with PMIS shows significant
improvements over classical and multipass interpolations with regard to number of iterations as
well as time. The best interpolation operator here is the ext+i interpolation, which has the lowest
number of iterations and times in general. The difference is especially significant in the case of the
problems with rotated anisotropies. The operator complexity is larger, however, as was expected.
This increase becomes more significant for 3D problems. Here we consider the partial differential
equation
-(a u_x)_x - (a u_y)_y - (a u_z)_z = f \qquad (22)

on an n×n×n cube. For the Laplace problem, a(x, y, z) = 1. For the problem denoted by 'Jumps',
we consider the function a(x, y, z) = 1000 for the interior cube 0.1 < x, y, z < 0.9, a(x, y, z) = 0.01
for 0 < x, y, z < 0.1 and for the other cubes of size 0.1×0.1×0.1 located at the corners of the
unit cube, and a(x, y, z) = 1 elsewhere. The 27-point problem is a matrix with a 27-point stencil
with the value 26 in the interior and −1 elsewhere; it is included because we also wanted to
consider a problem with a larger stencil.


Table VI. AMG for a 7-point 3D Laplace problem, a problem with a 27-point stencil,
and a 3D structured PDE problem with jumps on a 60×60×60 cube with a random
right-hand side using different interpolation operators.

            7-point                27-point               Jumps
Method    C_op  # its   Time     C_op  # its   Time     C_op  # its   Time
clas      2.34    45   10.21     1.09    28   10.58     2.50  >1000
clas+j    5.12    11   20.35     1.34     8   17.10     5.37    15   20.99
mp        2.35    47   10.40     1.10    30    9.39     2.50    80   17.37
ext       4.93    11   16.70     1.35     8   21.32     5.27    15   16.89
ext+i     4.27     9   14.48     1.35     8   21.55     5.10    11   15.96
std       4.20    10   12.78     1.38    10   18.58     5.21    18   17.47

Figure 4. Number of iterations for PMIS with various interpolation operators (clas, mp, clas+j, std,
ext, ext+i) for a 3D 7-point Laplace problem on an n×n×n grid.

While for these problems AMG convergence factors for distance-two interpolation improve
significantly compared with classical and multipass interpolations, as can be seen in Table VI,
overall times are worse for the 7-point 3D Laplace problem as well as the 27-point problem
on a 60×60×60 grid. The only problem on the 60×60×60 grid that benefits from distance-
two interpolation operators also with regard to time is the problem with jumps, which requires
long-distance interpolation to even converge. Using distance-two interpolation operators leads to
complexities about twice as large as those obtained when using classical or multipass interpolation,
which work relatively well for the 7- and 27-point problems on the 60×60×60 grid. However,
when we scale up the problem sizes, the distance-two operators show very good scalability in terms
of AMG convergence factors, as can be seen in Figure 4, which shows the number of iterations for
a 3D 7-point Laplace problem on an n×n×n grid for increasing n. The anticipated large differences in numbers of
iterations between distance-one and distance-two interpolations show up in the 2D results of Tables
IV and V on grids with 1000 points per direction, but are not yet particularly significant in the
3D results of Table VI with only 60 points per direction. It is expected, however, that for the


large problems that we want to solve on a parallel computer, distance-two interpolation operators
will lead to overall better times than classical or multipass interpolation due to scalable AMG
convergence factors, provided the operator complexity can be kept under control. See Section 9
for actual test results.

7. REDUCING OPERATOR COMPLEXITY

While the methods described in the previous section largely restore grid-independent convergence
to AMG cycles that use PMIS-coarsened grids, they also lead to much larger operator complexities
for the V-cycles. Therefore, it is necessary to consider ways to reduce these complexities while
(hopefully) retaining the improved convergence. In this section we describe a few ways of achieving
this.

7.1. Choosing smaller interpolatory sets


It is certainly possible to consider other interpolatory sets that are larger than C_i^s but smaller
than \hat{C}_i. In particular, a good interpolatory set appears to be one that extends C_i^s only
for strong F–F connections without a common C-point, since in the other cases point i is
likely already surrounded by interpolatory points and an extension is not necessary. If we look
at the right example in Figure 2, we see that neighbor k of i is the only fine neighbor that does
not share a C-point with i. Consequently, it may be sufficient to only include points n and l in
the extended interpolatory set. Applying this approach to the extended interpolation leads to the
so-called F–F interpolation [9]. The size of the interpolatory set can be decreased further if we
limit the number of C-points added when an F-point is encountered that does not have common
coarse neighbors. This has been done in the so-called F–F1 interpolation [9], where only the first
C-point is added. For the right example in Figure 2, this means that only point n or l would be
added to the interpolatory set for i.
Choosing a smaller interpolatory set decreases c2 and with it s2 , leading to fewer multiplications
and additions for the extended interpolation methods. On the other hand, additional operations
are needed to determine which coarse neighbors of strong F-points are common C-points. This
means that the actual determination of the interpolation operator might not be faster than creating
the extended interpolation operators. The real benefit, however, is achieved by the fact that the
use of smaller interpolatory sets leads to smaller stencils for the coarse-grid operator and hence
to smaller overall operator complexities.
Applying these methods to some of our previous test problems, we attain the results shown in
Tables VII and VIII. Here, ‘x-cc’ denotes that interpolation ‘x’ is used, but the interpolatory set is
only extended when there are no common C-points. Similarly, ‘x-ccs’ is just like ‘x-cc’, except
that for every strong F-point without a common C-point only a single C-point is added.
The results show that 2D problems do not benefit from this strategy, since operator complexities
are only slightly decreased, while the number of iterations increases. Therefore, total times increase.
However, 3D problems can be solved much faster due to significantly decreased setup times, leading
to roughly half the total time when the 'ccs' strategy is employed. Again, these beneficial effects
are expected to be stronger on larger grids. Indeed, additional numerical tests (not presented here,
see [9]) also show that the ‘x-cc’ and ‘x-ccs’ distance-two interpolations result in algorithms that
are highly scalable as a function of problem size: Cop tends to a constant that is significantly


Table VII. AMG for the 9-point 2D Laplace problem on a 1000×1000 square with random right-hand
side and for problems with rotated anisotropies of 0.001 on a 512×512 grid, using different
interpolation operators.

             9-point               45°                   60°
Method     C_op  # its  Time     C_op  # its  Time     C_op  # its  Time
ext        1.60   10   18.26    2.07    31    8.67    2.69   217   59.70
ext-cc     1.45   14   17.35    2.06    34    9.33    2.62   247   66.22
ext-ccs    1.43   15   17.46    2.05    34    9.13    2.42   270   67.96
ext+i      1.60   10   18.40    2.07    11    4.05    2.89    97   29.78
ext+i-cc   1.45   14   17.91    2.05    14    4.72    2.80   117   34.63
ext+i-ccs  1.42   15   17.98    2.04    14    4.73    2.51   143   38.87

Table VIII. AMG for 7- and 27-point 3D Laplace problems and a 3D structured PDE problem with
jumps on a 60×60×60 cube with a random right-hand side, using different interpolation operators.

             7-point              27-point             Jumps
Method     C_op  # its  Time    C_op  # its  Time    C_op  # its  Time
ext        4.93   11   16.70   1.35     8   21.32   5.27    15   16.89
ext-cc     4.62   12   11.11   1.33     7   11.82   4.86    16   11.59
ext-ccs    4.00   12    8.46   1.31     7   10.34   4.23    17    9.61
ext+i      4.27    9   14.48   1.35     8   21.55   5.10    11   15.96
ext+i-cc   4.12    9    9.16   1.33     7   12.48   4.66    13   10.35
ext+i-ccs  3.64    9    7.23   1.31     7   10.95   4.00    14    8.37

smaller than the Cop value for the ‘x’ interpolations, and the number of iterations is nearly constant
as a function of problem size, and only slightly larger than the number of iterations for the full
‘x’ interpolation formulas [9]. This shows that using distance-two interpolation formulas with
reduced complexities restores the grid-independent convergence and scalability of AMG on PMIS-
coarsened grids, without the need for GMRES acceleration. This makes these methods suitable
algorithms for large problems on parallel computers, as is discussed below.

7.2. Interpolation truncation


Another very effective way to reduce complexities is the use of interpolation truncation. There are
essentially two ways in which we can truncate interpolation operators: we can choose a truncation
factor τ and eliminate every weight whose absolute value is smaller than this factor, i.e. for which
|w_{ij}| < τ [3], or we can limit the number of coefficients per row, i.e. choose only the k_max largest
weights in absolute value. In both cases, the new weights need to be re-scaled so that the total
sums remain unchanged.
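A sketch of both truncation variants for one row of the interpolation operator, including the re-scaling that keeps the row sum unchanged (whether the drop tolerance is applied absolutely or relative to the largest weight in the row is an implementation choice; here it is relative):

```python
def truncate_row(weights, tau=0.0, kmax=None):
    """Truncate one row {j: w_ij} of interpolation weights and re-scale so
    that the row sum is preserved."""
    total = sum(weights.values())
    items = sorted(weights.items(), key=lambda kv: -abs(kv[1]))
    if kmax is not None:
        items = items[:kmax]                # keep only the kmax largest |w_ij|
    wmax = max(abs(w) for _, w in items)
    kept = {j: w for j, w in items if abs(w) >= tau * wmax}
    scale = total / sum(kept.values())      # re-scale: row sum unchanged
    return {j: w * scale for j, w in kept.items()}
```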
Both approaches can lead to significant reductions in setup times and operator complexities,
particularly for 3D problems, but if too much is truncated, the number of iterations rises signifi-
cantly, as one would expect.
We only report results for one interpolation formula (ext+i) for a 3D example here, see Table IX.
However, similar results can be obtained using the other interpolation operators. For 2D problems,


Table IX. Effect of truncation on AMG with ext+i interpolation for a 7-point 3D
Laplace problem on a 60×60×60 cube with a random right-hand side.

Truncation factor                  Max. # of weights
  τ      C_op   # its   Time       k_max   C_op   # its   Time
  0      4.27     9    14.48
  0.1    4.13     9    10.72         7     3.75     9     8.63
  0.2    3.88     9     8.52         6     3.42     9     7.41
  0.3    3.39    10     6.82         5     3.01    10     6.42
  0.4    3.02    13     6.60         4     2.73    14     6.30
  0.5    2.75    20     7.67         3     2.48    24     7.41

truncation leads to an increase in total time, similar to what was reported for interpolatory set
restriction in the previous section.

8. PARALLEL IMPLEMENTATION

This section describes the parallel implementation and gives a rough idea of the cost involved,
with particular focus on the increase in communication required for the distance-two interpolation
formulae compared with distance-one interpolation. Since the core computation for the interpolation
routines is approximately the same as in the sequential case, we only focus on the additional
computations that are required for inter-communication between processors.
In parallel, each matrix is stored using a parallel data format, the ParCSR matrix data struc-
ture, which is described and analyzed in detail in [10]. Matrices are distributed across processors
by contiguous blocks of rows, which are stored via two compressed sparse row matrices, one
storing the local entries and the other one storing the off-processor entries. There is an additional
array containing a mapping for the off-processor neighbor points. The data structure also contains
the information necessary to retrieve information from distance-one off-processor neighbors. It,
however, does not contain information on off-processor distance-two neighbors, which compli-
cates the parallel implementation of distance-two interpolation operators. When determining these
neighbors, there are four scenarios that need to be considered, see Figure 5. Consider point i,
which is the point to be interpolated to, and is residing on Processor 0. A distance-two neighbor
can reside on the same processor as i, similar to point j; it can be a distance-one neighbor to
another point on Proc. 0, similar to point l, and therefore be already contained in the off-processor
mapping; it can be a new point on a neighbor processor, similar to point k, or it can be located on
a processor, which is currently not a neighbor processor to Proc. 0, similar to point m.
There are basically five additional parts that are required for the parallel implementation, and for
which we give rough estimates of the cost involved below. Operations include floating point and
integer operations as well as message passing and sends and receives required to communicate data
across processors. We use the following notations: n_1 denotes the average number of distance-one
neighbors per point, as defined previously, p is the total number of processors, q_i is the average
number of distance-i neighbor processors per processor, and N_i^o is the average number of distance-i
off-processor points, which equals the sum of the average number of distance-i off-processor C-points,
C_i^o, and distance-i off-processor F-points, F_i^o. Note that the estimates of the numbers of
operations and processors involved given below are per processor.

Figure 5. Example of off-processor distance-two neighbors of point i. Black points are
C-points, and white points are F-points.

1. Communication of the C/F splitting for all off-processor neighbor points: this is required for
   all interpolation operators and takes O(N_1^o) + O(q_1) operations.
2. Obtaining the row information for off-processor distance-one F-points: this step is necessary
   for classical and distance-two interpolations but not for direct interpolation, which only
   uses local matrix coefficients to generate the interpolation formula. It requires O(n_1 F_1^o) +
   O(F_1^o) + O(q_1) operations.
3. Determining off-processor distance-two points and additional communication information:
   this step is only required for distance-two interpolation operators. Finding the new off-
   processor points, which requires checking whether they are already contained in the map
   and describing the off-processor connections, takes O(n_1 F_1^o log(N_1^o)) operations. Sorting the
   new information takes O(N_2^o log(N_2^o)) operations. Obtaining the communication information
   for the new points, using an assumed partition algorithm [11], requires O(N_2^o) + O(log p) +
   O((q_1 + q_2) log(q_1 + q_2)) operations. Obtaining the additional C/F splitting information takes
   O(N_2^o) + O(q_1 + q_2) operations.
4. Communication of fine-to-coarse mappings: this step requires O(N_1^o) + O(q_1) operations
   for distance-one interpolation and O(N_1^o + N_2^o) + O(q_1 + q_2) operations for the distance-two
   interpolation schemes.
5. Generating the interpolation matrix communication package: this step requires O(C_1^o) +
   O(log p) + O(q_1 log q_1) operations for distance-one interpolation and O(C_1^o + C_2^o) +
   O(log p) + O((q_1 + q_2) log(q_1 + q_2)) operations for distance-two interpolation. Note that if
   truncation is used, C_i^o should be replaced by C̃_i^o with C̃_i^o < C_i^o for i = 1, 2.

Summarizing these results, direct interpolation requires the least amount of communication,
followed by classical interpolation. Parallel implementation of distance-two interpolation requires
more communication steps and additional data manipulation, and involves more data and neighbor
processors. How significantly this overhead impacts the total time depends on many factors, such
as the problem size per processor, the stencil size, the computer architecture and more. Parallel
scalability results are presented below.


9. PARALLEL NUMERICAL RESULTS

In this section, we investigate weak scalability of the new interpolation operators by applying the
resulting AMG methods to various problems.
The following problems were run on Thunder, an Intel Itanium2 machine with 1024 nodes of
four processors each, located at Lawrence Livermore National Laboratory, unless we say otherwise.
In this section, p denotes the number of processors used.

9.1. Two-dimensional problems


We first consider 2D Laplace problems, which perform very poorly for PMIS with classical
interpolation. The results we obtained for 5-point and 9-point stencils are very similar. Therefore,
we list only the results for the 9-point 2D Laplace problem here.
Table X, which contains the number of iterations and total times for this problem, shows that
classical interpolation performs very poorly, and multipass interpolation even worse. Neverthe-
less, these methods lead to the lowest operator complexities: 1.24. All long-range interpolation
schemes lead to good scalable convergence, with standard interpolation performing slightly worse
than classical followed by Jacobi, extended or extended+i interpolation, which are the overall
fastest methods here with the best scalability. Operator complexities are highest for clas+j with
1.65 and about 1.6 for the other three interpolation operators. Also, when choosing the lower
complexity versions e+i-cc and e+i-ccs, with complexities of 1.45 and 1.43, convergence dete-
riorates somewhat compared with e+i. Since for the 2D problems setup times are fairly small
and the improvement in complexities is not very significant, this increase in number of iterations
hurts the total times, and therefore there is no advantage in using low-complexity schemes for this
problem. Truncated versions lead to even larger total times.
Next, we consider the 2D problem with rotated anisotropy (21). The first problem has an anisotropy of 0.001 rotated by 45°, see Table XI. Operator complexities are 1.9 for classical and multipass interpolation, 2.4 for classical interpolation followed by Jacobi, and 2.1 for all remaining interpolation operators. Here, extended interpolation performs worse than standard and extended+i interpolation, with the latter giving the best results.
In Table XII, we consider the harder problem, where the anisotropy is rotated by 60°. Operator complexities are now 1.8 for classical and multipass, 3.4 for clas+j, 2.9 for e+i and std, 2.7 for ext, 2.8 for e+i-cc and 2.5 for e+i-ccs. The fastest convergence is obtained with extended+i interpolation, followed by e+i-cc, e+i-ccs, std, and ext. The other interpolations fail to converge within 500 iterations. While long-range interpolation operators improve convergence, it is still not good enough; hence, this problem should be solved using Krylov subspace acceleration.

Table X. Times in seconds (number of iterations) for a 9-point 2D Laplace problem with 300×300 points
per processor; ‘n.c.’ denotes ‘not converging within 500 iterations’.
p clas clas+j mp std ext e+i e+i-cc e+i-ccs
1 15(88) 3(9) 18(105) 4(15) 3(10) 3(10) 3(12) 3(13)
64 48(245) 4(11) 57(278) 6(20) 4(12) 4(12) 5(16) 5(19)
256 79(400) 5(12) 85(436) 8(27) 5(13) 5(13) 5(19) 6(21)
1024 104(494) 6(13) n.c. 9(27) 6(14) 6(14) 7(21) 7(21)


Table XI. Times in seconds (number of iterations) for a 2D problem with a 45◦ rotated anisotropy of
0.001 with 300×300 points per processor; ‘n.c.’ denotes ‘not converging within 500 iterations’.
p clas clas+j mp std ext e+i e+i-cc e+i-ccs
1 24(116) 7(27) 24(119) 3(11) 7(29) 3(10) 3(12) 3(12)
64 n.c. 12(36) 96(401) 7(22) 11(39) 5(16) 7(21) 7(23)
256 n.c. 13(37) n.c. 8(25) 12(42) 6(18) 8(25) 8(27)
1024 n.c. 15(40) n.c. 10(29) 14(45) 8(21) 10(29) 11(31)

Table XII. Times in seconds (number of iterations) for a 2D problem with a 60◦ rotated anisotropy of
0.001 with 300×300 points per processor; ‘n.c.’ denotes ‘not converging within 500 iterations’.
p clas clas+j mp std ext e+i e+i-cc e+i-ccs
1 n.c. 105(342) n.c. 30(107) 45(172) 22(79) 24(87) 28(112)
64 n.c. n.c. n.c. 79(256) 96(330) 47(152) 59(196) 70(254)
256 n.c. n.c. n.c. 95(305) 110(374) 56(176) 70(227) 84(299)
1024 n.c. n.c. n.c. 113(357) 123(408) 62(193) 82(263) 100(347)

Table XIII. Total times in seconds (number of iterations) for a 7-point 3D Laplace problem
with 40×40×40 points per processor.
p clas clas+j mp std ext e+i ext-ccs e+i-ccs
1 5(33) 8(11) 6(34) 5(9) 7(11) 6(8) 4(12) 3(9)
64 17(80) 18(12) 16(79) 14(18) 16(12) 12(10) 9(14) 7(11)
512 33(149) 26(12) 28(126) 20(26) 20(14) 17(15) 11(14) 11(11)
1000 39(175) 41(12) 31(138) 26(31) 30(13) 31(39) 15(15) 16(14)
1728 51(229) 63(12) 37(159) 35(41) 46(13) 40(33) 22(15) 24(16)

9.2. Three-dimensional structured problems


We now consider 3D problems. Based on the sequential results in Section 6, we expect complexity reduction schemes to make a difference here.
The first problem is a 7-point 3D Laplace problem on a structured cube with 40×40×40 unknowns per processor. Table XIII shows total times in seconds and numbers of iterations. While classical interpolation solves the problem, the number of iterations increases rapidly with increasing numbers of processors and problem size. Multipass interpolation performs better for larger numbers of processors, but still shows unscalable convergence factors. Applying one step of Jacobi interpolation to classical interpolation leads to perfect scalability in terms of convergence factors, but unfortunately also to rising operator complexities (4.9–5.7), which are twice as large as for classical and multipass interpolation (2.3–2.4). Interestingly, while both standard and extended+i interpolation need fewer iterations than extended interpolation for small numbers of processors, they scale worse numerically, so that for large numbers of processors extended interpolation needs far fewer iterations. However, extended interpolation leads to larger complexities (4.7–5.3) compared with extended+i (4.2–4.5) and standard interpolation (4.1–4.4).
The complexity reducing strategies lead to the following complexities: ext-cc (4.5–4.9),


Table XIV. Total times in seconds (number of iterations) for a 7-point 3D Laplace problem
with 40×40×40 points per processor.
p ext4 ext5 e+i4 e+i5 ext-cc5 e+i-cc5 std5 clas+j0.1
1 3(13) 3(11) 3(12) 3(9) 3(11) 3(9) 4(12) 3(13)
64 6(19) 7(15) 7(19) 7(13) 6(14) 6(13) 9(25) 9(23)
512 9(25) 8(18) 11(28) 10(19) 8(17) 8(17) 15(39) 13(36)
1000 10(25) 11(18) 11(30) 12(20) 10(18) 9(17) 17(39) 15(37)
1728 12(29) 12(21) 13(35) 14(24) 11(21) 11(20) 28(46) 19(45)

Table XV. Total times in seconds (number of iterations) for a structured 3D problem with
jumps with 40×40×40 points per processor.
p mp clas+j ext e+i ext-ccs e+i-ccs std4 ext-cc5
1 11(64) 8(14) 7(14) 6(10) 5(18) 4(15) 6(26) 5(17)
64 35(176) 20(17) 17(17) 15(14) 11(21) 9(19) 18(71) 11(24)
512 58(280) 31(20) 24(24) 21(20) 15(24) 13(21) 27(98) 11(30)
1000 65(306) 35(21) 27(20) 26(21) 19(24) 18(22) 33(113) 14(33)
1728 77(350) 60(21) 73(70) 43(26) 25(29) 29(23) 53(169) 17(36)

ext-ccs (3.9–4.2), e+i-cc (4.0–4.3), and e+i-ccs (3.6–3.8). To save space, we do not record the results for ext-cc or e+i-cc, but the times and numbers of iterations for these methods lie between those of ext and ext-ccs, or e+i and e+i-ccs, respectively. Interestingly, the complexity reducing strategies e+i-cc and e+i-ccs show not only better scalability with regard to time, but also better scalability of convergence factors than e+i interpolation in this case.
For this problem, the complexity reducing strategies thus pay off. Table XIV shows results for various truncated interpolation schemes. We used the truncation strategy that restricts the number of weights per row, allowing at most four or five elements. While we present results for both ext and e+i, we present only the faster results for the remaining interpolation schemes, again to save space. We used a truncation factor of 0.1 for clas+j. Operator complexities were fairly consistent across increasing numbers of processors: we obtained 2.9 for ext4, 3.2 for ext5, 2.8 for e+i4, 3.1 for e+i5, 3.2 for ext-cc5, 3.1 for e+i-cc5, 3.2 for std5, and 3.0 for clas+j0.1. Clearly, using four weights instead of five leads to lower complexities but larger numbers of iterations; total times are not significantly different. Comparing the fastest method, e+i-cc5, on 1728 processors with PMIS with classical interpolation, we see a factor of 11 improvement in the number of iterations and a factor of 5 improvement in total time, with a slight increase in complexity.
Table XV shows results for the problem with jumps (22), for which PMIS with classical interpolation was shown to fail completely. Multipass interpolation converges here with highly degrading scalability but good complexities of 2.4. Applying Jacobi interpolation to classical interpolation leads to very good convergence but, due to operator complexities between 5.1 and 5.7, to a much more expensive setup and solve cycle. Applying a truncation factor of 0.1 as in the previous example leads to extremely bad convergence and is not helpful here. Standard interpolation converges very well for small numbers of processors, but diverges for p ≥ 64. Interestingly enough, std4 converges, albeit not very well.


9.3. Unstructured problems


In this section, we consider various linear systems on unstructured grids, generated by finite element discretizations. All of these problems were run on an Intel Xeon Linux cluster at Lawrence Livermore National Laboratory. The first problem is the 3D diffusion problem −a_1(x,y,z)u_xx − a_2(x,y,z)u_yy − a_3(x,y,z)u_zz = f with Dirichlet boundary conditions on an unstructured cube. The material properties are discontinuous, and there are approximately 90 000 degrees of freedom per processor. See Figure 6 for an illustration of the grid used. There are five regions: four layers and the thin stick in the middle of the domain. This grid is further refined when a larger number of processors is used. The functions a_i(x,y,z), i = 1, 2, 3, are constant within each of the five regions of the domain, with the values (4, 0.2, 1, 1, 10^4) for a_1(x,y,z), (1, 0.2, 3, 1, 10^4) for a_2(x,y,z), and (1, 0.01, 1, 1, 10^4) for a_3(x,y,z). We also include some results obtained with CLJP coarsening, a parallel coarsening scheme designed to ensure that two fine neighbors always have a common coarse neighbor, and for which classical interpolation is therefore suitable [12, 8]. As a smoother we used hybrid Gauss–Seidel, which leads to a nonsymmetric preconditioner. Since in practice more complicated problems are usually solved using AMG as a preconditioner for Krylov subspace methods, we use AMG here as a preconditioner for GMRES(10). Note that neither classical nor multipass interpolation converges within 1000 iterations for these problems when used without a Krylov subspace method, whereas both extended and extended+i interpolation, as well as classical interpolation on CLJP-coarsened grids, converge well without it, with somewhat larger numbers of iterations and slightly slower total times.
The results in Table XVI show that the long-range interpolation operators, with the exception of multipass interpolation, restore the good convergence obtained with CLJP. CLJP has very large complexities, however. We also used a truncated version of classical interpolation, restricting the number of weights per fine point to at most 4 to control the complexities. While this hardly affected convergence factors, it significantly improved the total times to solution, see Figure 7, but

Figure 6. Grid for the elasticity problem.


Table XVI. Number of iterations (operator complexities) for the unstructured 3D problem with jumps.
AMG is used here as a preconditioner for GMRES(10).
p      CLJP/clas  CLJP/clas4  PMIS/clas  PMIS/mp  PMIS/e+i  PMIS/e+i-cc  PMIS/e+i4
1      9(5.6)     9(4.2)      18(1.5)    20(1.5)  9(2.7)    10(2.2)      9(1.8)
64     11(6.7)    12(4.6)     62(1.5)    34(1.5)  11(3.0)   13(2.3)      13(1.9)
256    11(7.8)    12(5.0)     72(1.5)    34(1.5)  12(2.9)   12(2.3)      13(1.8)
512    11(7.2)    13(4.6)     118(1.5)   35(1.5)  12(3.0)   13(2.4)      12(1.8)
1024   10(8.6)    12(5.2)     162(1.6)   39(1.6)  12(3.4)   12(2.6)      14(2.0)

[Figure 7: total times in seconds (0–200) versus number of processors (0–1000) for CLJP/clas, CLJP/clas4, PMIS/clas, PMIS/e+i, PMIS/mp, PMIS/e+i-cc, and PMIS/e+i4.]

Figure 7. Total times for a diffusion problem with highly discontinuous material properties. AMG is used
here as a preconditioner for GMRES(10).

still did not achieve perfect scalability. Total times for CLJP with clas4 interpolation are comparable with those for PMIS with classical interpolation: the small complexities of PMIS compensate for its significantly worse convergence factors. The use of extended+i and e+i-cc interpolation leads to better scalability than the methods mentioned before, due to their lower complexities compared with CLJP and their better convergence factors compared with PMIS with classical interpolation. Multipass interpolation leads to even better timings, but the overall best time and scalability are achieved by truncating extended+i interpolation to four weights per fine point. For this problem, extended interpolation performs similarly to extended+i interpolation. Standard interpolation gives similar results on one processor, but its number of iterations gradually increases from 11 on one processor to 34 on 1024 processors.
The second problem is a 3D linear elasticity problem on the same domain as above. However, a smaller grid size is used, since this problem requires more memory, leading to about 30 000 degrees of freedom per processor. The Poisson ratio for the pile driver in the middle of the domain was chosen to be 0.4, and the Poisson ratios in the surrounding regions were 0.1, 0.3, 0.3


Table XVII. Number of iterations for the 3D elasticity problem; the last row gives the range of operator complexities. AMG is used here as a preconditioner for conjugate gradient.
p      CLJP/clas  CLJP/clas4  PMIS/clas  PMIS/mp  PMIS/e+i  PMIS/e+i-cc  PMIS/e+i-ccs  PMIS/e+i4
1      64         63          94         93       68        69           72            72
8      83         84          159        131      89        95           96            90
64     92         96          210        179      97        105          112           107
512    —          112         319        247      108       109          123           123
C_op   4.5–7.3    3.6–5.4     1.5        1.5      2.5–3.0   2.1–2.4      1.9–2.1       1.9–2.0

[Figure 8: total times in seconds (0–400) versus number of processors (0–500) for CLJP/clas, CLJP/clas4, PMIS/clas, PMIS/mp, PMIS/e+i, PMIS/e+i-cc, PMIS/e+i-ccs, and PMIS/e+i4.]

Figure 8. Total times for the 3D elasticity problem. AMG is used here as a
preconditioner for conjugate gradient.

and 0.2. Since this is a systems problem, the unknown-based AMG method for systems of PDEs was used. For this problem, the conjugate gradient method was used as an accelerator, and hybrid symmetric Gauss–Seidel as a smoother. The results are given in Table XVII and Figure 8. CLJP ran out of memory for the 512 processor run. Here also, extended+i interpolation with truncation leads to the lowest run times and the best scalability. Extended interpolation performed similarly to extended+i interpolation. While standard interpolation performs similarly to the other distance-two interpolation methods for small numbers of processors, it performed significantly worse on 512 processors.

10. CONCLUSIONS

We have studied the performance of AMG methods using the PMIS-coarsening algorithm in
combination with various interpolation operators. PMIS with classical, distance-one interpolation


leads to an AMG method with low complexity, but with poor scalability in terms of AMG convergence factors. The use of distance-two interpolation operators restores this scalability, but leads to an increase in operator complexity. While this increase was fairly small for 2D problems and was far outweighed by the much improved convergence, for 3D problems complexities were often twice as large and impacted scalability. To counter this complexity growth, we implemented various complexity reducing strategies, such as the use of smaller interpolatory sets and interpolation truncation. The resulting AMG methods, particularly extended+i interpolation in combination with truncation, lead to very good scalability for a variety of difficult PDE problems on large parallel computers.

ACKNOWLEDGEMENTS
We thank Tzanio Kolev for providing the unstructured problem generator and Jeff Painter for the Jacobi
interpolation routine. This work was performed under the auspices of the U.S. Department of Energy by
University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.

REFERENCES
1. Brandt A, McCormick SF, Ruge JW. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and
its Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, 1984.
2. Ruge JW, Stüben K. Algebraic multigrid (AMG). In Multigrid Methods, Vol. 3 of Frontiers in Applied Mathematics,
McCormick SF (ed.). SIAM: Philadelphia, PA, 1987; 73–130.
3. Stüben K. Algebraic multigrid (AMG): an introduction with applications. In Multigrid, Trottenberg U, Oosterlee C,
Schüller A (eds). Academic Press: New York, 2000.
4. Cleary AJ, Falgout RD, Henson VE, Jones JE, Manteuffel TA, McCormick SF, Miranda GN, Ruge JW. Robustness
and scalability of algebraic multigrid. SIAM Journal on Scientific Computing 2000; 21:1886–1908.
5. De Sterck H, Yang UM, Heys JJ. Reducing complexity in parallel algebraic multigrid preconditioners. SIAM
Journal on Matrix Analysis and Applications 2006; 27:1019–1039.
6. Luby M. A simple parallel algorithm for the maximal independent set problem. SIAM Journal on Computing
1986; 15:1036–1053.
7. Briggs WL, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000.
8. Henson VE, Yang UM. BoomerAMG: a parallel algebraic multigrid solver and preconditioner. Applied Numerical
Mathematics 2002; 41:155–177.
9. Butler J. Improving coarsening and interpolation for algebraic multigrid. Master’s Thesis, Applied Mathematics,
University of Waterloo, 2006.
10. Falgout RD, Jones JE, Yang UM. Pursuing scalability for hypre’s conceptual interfaces. ACM Transactions on
Mathematical Software 2005; 31:326–350.
11. Baker A, Falgout RD, Yang UM. An assumed partition algorithm for determining processor inter-communication.
Parallel Computing 2006; 32:394–414.
12. Cleary AJ, Falgout RD, Henson VE, Jones JE. Coarse grid selection for parallel algebraic multigrid. In Proceedings
of the Fifth International Symposium on Solving Irregularly Structured Problems in Parallel. Springer: New York,
1998.

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:141–163
Published online 28 December 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.568

Algebraic multigrid for stationary and time-dependent partial differential equations with stochastic coefficients

E. Rosseel, T. Boonen and S. Vandewalle∗,†

Computer Science Department, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium

SUMMARY
We consider the numerical solution of time-dependent partial differential equations (PDEs) with random
coefficients. A spectral approach, called stochastic finite element method, is used to compute the statistical
characteristics of the solution. This method transforms a stochastic PDE into a coupled system of
deterministic equations by means of a Galerkin projection onto a generalized polynomial chaos. An
algebraic multigrid (AMG) method is presented to solve the algebraic systems that result after discretization
of this coupled system. High-order time integration schemes of an implicit Runge–Kutta type and spatial
discretization on unstructured finite element meshes are considered. The convergence properties of the
AMG method are demonstrated by a convergence analysis and by numerical tests. Copyright q 2008
John Wiley & Sons, Ltd.

Received 14 May 2007; Revised 8 November 2007; Accepted 8 November 2007

KEY WORDS: partial differential equations with random coefficients; Karhunen–Loève expansion; poly-
nomial chaos; algebraic multigrid; implicit Runge–Kutta time discretization

1. INTRODUCTION

Randomness in a physical problem can be modelled mathematically by using stochastic partial differential equations (PDEs). These may contain stochastic or random parameters, for example, in the coefficients of the differential operator, in the boundary and initial conditions, or in the forcing term. Their solution allows one to extract statistical information concerning the solution of the physical model, as required, e.g., in uncertainty propagation problems.
The solution of a stochastic PDE can be obtained by a statistical or a deterministic approach [1, 2]. The former is typically based on Monte Carlo simulation techniques, see, for example, [3, 4].

∗Correspondence to: S. Vandewalle, Computer Science Department, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001 Leuven, Belgium.
†E-mail: stefan.vandewalle@cs.kuleuven.be

Contract/grant sponsor: Belgian State, Science Policy Office
Contract/grant sponsor: Research Council K.U.Leuven, CoE EF/05/006 Optimization in Engineering (OPTEC)


Monte Carlo methods are often easy to implement but rapidly become prohibitively expensive with increasing accuracy demands. Examples of deterministic approaches include perturbation methods [5], Neumann expansion methods [6] and the spectral stochastic finite element method [7, 8]. Perturbation and Neumann expansion methods are restricted to small parameter variances and calculate only a few statistical moments of the solution. These restrictions do not hold for the stochastic finite element method, which, in principle, makes it possible to compute the full statistical characteristics of the solution; that is, even the probability distribution of the solution can be extracted. As such, it provides a valuable alternative to Monte Carlo simulation, see [4] for a comparison.
The stochastic finite element method transforms a stochastic PDE into a system of coupled
deterministic PDEs after a projection of the random solution onto a suitable finite dimensional
random space. We will use the stochastic finite element approach to discretize the random part
of the PDE. For time-dependent PDEs, we will employ an implicit Runge–Kutta (IRK) time
integration scheme [9]. For IRK methods, the dimension of the linear systems to be solved at each
time step is proportional to the number of IRK stages. Multigrid methods are available for IRK
discretizations of deterministic parabolic PDEs [10, 11]. In this paper, we extend these methods
towards PDEs with random coefficients. In particular, we shall study an algebraic multigrid (AMG)
approach, suited for unstructured finite element meshes.
The paper is organized as follows. Section 2 describes the discretization of time-dependent
stochastic PDEs by means of the stochastic finite element and IRK method. The AMG method is
presented in Section 3. Its convergence properties are analyzed in Section 4 and further discussed
in Section 5. Section 6 addresses some implementation issues. In Section 7, numerical experiments
are given to illustrate the convergence behavior. Conclusions are presented in Section 8.

2. THE STOCHASTIC MODEL PROBLEM AND ITS DISCRETIZATION

2.1. A two-dimensional diffusion equation


To describe the model problem, we first introduce some concepts from probability theory. Consider a complete probability space (Ω, F, P) defined by a sample space Ω, a σ-algebra F and a probability measure P. A random variable ξ(ω) is defined as a function ξ : Ω → R. Further on, we will simply write ξ instead of ξ(ω). The expected value of ξ corresponds to ⟨ξ⟩ = ∫_Ω ξ dP = ∫_Γ y p(y) dy. In the last equality, p(y) represents the probability density function of ξ, with support Γ and y ∈ Γ. A random field α(x, ω) is defined by the mapping α : D×Ω → R, with D being a spatial domain. Hence, at every point x ∈ D, a random field corresponds to a certain random variable. A typical application is a stochastic material parameter that represents the properties of a heterogeneous mixture of materials. A random process β(t, ω), with t ∈ T = [0, T], is defined as β : T×Ω → R. This concept models stochastic time series, for example, the evolution of shares at a stock market. A random wave α(x, t, ω) generalizes the previous concepts and is defined by the mapping α : D×T×Ω → R.
We consider a diffusion problem with a random, spatially varying and time-dependent diffusion coefficient α(x, t, ω), defined over a two-dimensional domain D:

∂u(x, t, ω)/∂t − ∇·(α(x, t, ω) ∇u(x, t, ω)) = b(x, t, ω)    (1)


where x ∈ D, t ∈ T = [0, T] and ω ∈ Ω, a sample space. Further on, we shall consider only the case of a deterministic source term b(x, t). The method, however, immediately extends to the more general case of a stochastic source term. Model problem (1) is completed with suitable boundary conditions and, in the time-dependent case, initial conditions; only deterministic conditions are considered here.

2.2. Discretization of the random part of the problem


To transform the stochastic PDE (1) into a system of deterministic time-dependent PDEs, we follow the three-step procedure of [12]. First, the random inputs are expressed in terms of a finite number of random variables; second, the solution is approximated by a finite-term expansion in random basis polynomials; and third, a Galerkin projection onto the set of polynomial basis functions is performed. The second step of this procedure is addressed in Section 2.2.1; steps one and three are explained in Sections 2.2.2 and 2.2.3, respectively.

2.2.1. Generalized polynomial chaos. Consider a Hilbert space L²(Ω, F, P) of square integrable functions of L independent random variables ξ_i on (Ω, F, P). A finite dimensional subspace S of L²(Ω, F, P) is defined through a set of Q basis functions {Ψ_q}_{q=1,...,Q} in the random variables ξ_1, ..., ξ_L. Let ξ denote the vector containing the random variables ξ_1, ..., ξ_L. The space S is equipped with an inner product defined by ⟨ab⟩ = ∫_Γ a b w(y) dy, with w(y) denoting the joint probability density corresponding to ξ, Γ the support of ξ, and a, b ∈ S. This inner product actually corresponds to the expectation of the product of its arguments. Several approaches have been proposed to construct S, e.g. [7, 12–14]. Here, we shall employ an orthonormal basis of multivariate polynomials Ψ_l that are globally defined in each random variable ξ_i. These multivariate polynomials are built as products of univariate polynomials {ψ_{m_i}}_{i=1,...,L} of degree m_i in ξ_i, orthonormal w.r.t. the probability measure corresponding to ξ_i.
Two criteria are often considered to determine the basis functions. One may limit the total degree of the polynomial to a given value P, i.e. Σ_{i=1}^{L} m_i ≤ P. The total number of basis functions, Q, is then given by Q = (L+P)!/(L! P!) [15]. Alternatively, one may limit the degrees of the univariate factors separately, i.e. m_i ≤ p_i, i = 1, ..., L, for a given set of p_i values. In this case Q = Π_{i=1}^{L} (p_i + 1) [4].
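As a quick numerical check of these two counting formulas, consider the following Python sketch (ours, not from the paper):

    from math import comb, prod

    def q_total_degree(L, P):
        # Q = (L + P)! / (L! P!) when the total polynomial degree is limited to P
        return comb(L + P, P)

    def q_per_variable(p):
        # Q = prod_i (p_i + 1) when each univariate degree is limited separately
        return prod(pi + 1 for pi in p)

    print(q_total_degree(4, 2))          # 15, matching Table I for L = 4, P = 2
    print(q_per_variable([2, 2, 2, 2]))  # 81

As the example suggests, limiting the total degree is far more economical in higher dimensions than limiting each univariate degree separately.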
Using the first criterion, a so-called generalized polynomial chaos basis [1, 12] can be constructed. The univariate polynomials are chosen from the Wiener–Askey scheme according to the probability distributions of the random variables ξ_i. The second criterion can be used to create an alternative set of basis functions {Ψ′_q} [4, 13, 16], which possess a double orthogonality property:

⟨Ψ′_j Ψ′_k⟩ = δ_{j,k} and ⟨ξ_i Ψ′_j Ψ′_k⟩ = c_{ijk} δ_{j,k}    (2)

with c_{ijk} a constant and δ_{j,k} the Kronecker delta. This property allows one to transform a linear stochastic PDE into a system of uncoupled deterministic PDEs.
Having specified an appropriate random basis, the solution u(x, t, ω) can be approximated by a linear combination of basis functions with deterministic coefficients u_q(x, t). When the basis functions are collected in the column vector Ψ and the coefficients in a column vector u(x, t), we can write

u(x, t, ω) ≈ Σ_{q=1}^{Q} u_q(x, t) Ψ_q(ξ) = Ψ^T u(x, t)    (3)


2.2.2. Discretization of random inputs. The random inputs are typically discretized either by a generalized polynomial chaos expansion approach similar to (3), see e.g. [17, 18], or by a Karhunen–Loève (KL) expansion [19]. The former leads to an approximation of the form

α(x, t, ω) ≈ Σ_{i=1}^{Q} α_i(x, t) Ψ_i(ξ) with α_i(x, t) = ⟨α(x, t, ω) Ψ_i⟩ / ⟨Ψ_i²⟩    (4)

We will apply this type of discretization to model random inputs with a lognormal marginal distribution. Analytical expressions for the corresponding α_i(x, t) coefficients are given in [20].
The truncated KL expansion approximates a random wave α(x, t, ω) as

α(x, t, ω) ≈ α_1(x, t) + Σ_{i=1}^{L} α_{i+1}(x, t) ξ_i(ω)    (5)

The function α_1(x, t) corresponds to the mean of α(x, t, ω). The functions α_{i+1}(x, t) are eigenfunctions of the covariance function C_α(x_1, t_1; x_2, t_2), scaled by the square roots of the corresponding eigenvalues. The random variables ξ_i are uncorrelated, with zero mean and unit variance [21]; we assume that they are independent. Note that L+1 terms are needed to express a random input in an L-dimensional random space by a KL expansion, compared with Q terms in the case of a chaos expansion. Hence, a chaos expansion will be used only when the KL expansion is difficult to compute.

2.2.3. Galerkin approach. The stochastic PDE (1) can be converted into a system of deterministic PDEs for the unknown coefficients u_q(x, t) that appear in (3). This is done by replacing α(x, t, ω) by its approximation (4) or (5), by inserting the right-hand side of (3) into the PDE, and by imposing orthogonality of the resulting residual w.r.t. the chosen random basis. This results in

C_1 ∂u(x, t)/∂t − Σ_{i=1}^{L*} C_i ∇·[α_i(x, t) ∇u(x, t)] = b(x, t) c    (6)

with the vector c = ⟨Ψ⟩ and the matrices C_i defined as

C_i = ⟨ξ_{i−1} Ψ Ψ^T⟩, i = 1, ..., L* := L+1, if α(x, t, ω) is represented by a KL expansion    (7)
C_i = ⟨Ψ_i Ψ Ψ^T⟩, i = 1, ..., L* := Q, if α(x, t, ω) is represented by a chaos expansion    (8)

and ξ_0 = 1. The matrix C_1 equals the identity matrix I_Q of dimension Q if the polynomial chaos functions are suitably normalized. Analytical expressions for ⟨ξ_i Ψ Ψ^T⟩ can be found in [22]; in Appendix A we present expressions for ⟨Ψ_i Ψ Ψ^T⟩, see Equation (A1).
Remark 2.1
In case of a double orthogonal polynomial chaos and a KL expansion of the random input, the PDEs are uncoupled. Indeed, each matrix C_i is diagonal due to the orthogonality properties (2). However, the resulting number of PDEs rapidly becomes too large for practical purposes when the number of random variables or the degree of the polynomials increases.
Remark 2.2
The uniqueness and existence of the solution to (6) can be proved from the Lax–Milgram lemma under certain conditions on the stochastic parameters, as detailed in [4, 13].


2.3. Spatial finite element discretization


We shall use classical finite elements for the spatial discretization and assume that each of the deterministic coefficients is discretized on the same mesh, with the same (number of) elements. Hence, each u_q(x, t), q = 1, ..., Q, is approximated as a linear combination of the form u_q(x, t) ≈ Σ_{n=1}^{N} u_{q,n}(t) s_n(x), in terms of N basis functions s_n(x). The coefficients are grouped together in the vectors b(t), u_q(t) ∈ R^N. The discretization of (6) can be compactly expressed after the introduction of a set of L* deterministic stiffness matrices K_i ∈ R^{N×N}, defined as K_i = K(α_i(x, t)) with

[K(α_i(x, t))]_{kl} = ∫_D α_i(x, t) ∇s_k(x)·∇s_l(x) dx, i = 1, ..., L*; k, l = 1, ..., N    (9)

After spatial discretization, the stochastic finite element method yields a system of ordinary differential equations (ODEs):

(C_1 ⊗ M) du(t)/dt + (Σ_{i=1}^{L*} C_i ⊗ K_i) u(t) = c ⊗ b(t) with u(t) = [u_1(t); ...; u_Q(t)] and u_q(t) ∈ R^N    (10)

Here, M ∈ R^{N×N} is the mass matrix defined as [M]_{kl} = ∫_D s_k(x) s_l(x) dx, and the matrices C_i ∈ R^{Q×Q} and the vector c ∈ R^Q are defined by (7) or (8).
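Owing to the Kronecker structure, the operator in (10) never has to be formed as an NQ×NQ matrix: (C_i ⊗ K_i) vec(U) = vec(K_i U C_i^T) for a multivector U ∈ R^{N×Q} whose columns are the coefficient vectors u_q. A minimal Python sketch of this matrix-free application (our illustration, exploiting the symmetry of the C_i):

    import numpy as np

    def apply_stiffness_part(U, C, K):
        # Apply sum_i (C_i kron K_i) to vec(U) without forming the NQ x NQ
        # matrix, using (C kron K) vec(U) = vec(K U C^T) and C_i^T = C_i.
        Y = np.zeros_like(U)
        for Ci, Ki in zip(C, K):
            Y += Ki @ U @ Ci
        return Y

The mass term (C_1 ⊗ M) du/dt can be applied in the same way.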

2.4. Time discretization


We consider time discretization by an IRK method [9]. To introduce some notation, we recall the basic formulation of such a method, as applied to an ODE of the form du/dt = f(t, u). An IRK method computes an approximation u_{m+1} to the solution u(t_{m+1}) at time t_{m+1} from an approximation u_m at time t_m. To this end, it introduces a number of auxiliary variables x_j, j = 1, ..., s, called stage values or stage vectors, at times t_m + c_j Δt with Δt = t_{m+1} − t_m. The procedure corresponds to the following set of equations:

u_{m+1} = u_m + Δt Σ_{j=1}^{s} b_j f(t_m + c_j Δt, x_j)    (11)

x_i = u_m + Δt Σ_{j=1}^{s} a_{ij} f(t_m + c_j Δt, x_j), i = 1, ..., s    (12)

Equation (11) expresses u_{m+1} as an update to u_m in terms of the stage values {x_i}_{i=1,...,s}. Equation (12) describes the system of equations to be solved to compute the stage values. The method is fully characterized by the parameters A_irk = [a_{ij}], b_irk = [b_1 ... b_s]^T and c_irk = [c_1 ... c_s]^T. Equations (11) and (12) are often rewritten in terms of the stage value increments Δx_j := x_j − u_m:

u_{m+1} = u_m + [Δx_1 ··· Δx_s] A_irk^{−T} b_irk    (13)

Δx_i = Δt Σ_{j=1}^{s} a_{ij} f(t_m + c_j Δt, u_m + Δx_j), i = 1, ..., s    (14)


We will apply the IRK method to system (10). The approximation at time t_{m+1} to the solution u(t_{m+1}) will be represented as u_{m+1}. Formulation (13)–(14) is used, with the stage vector increments denoted simply as x_j, j = 1, ..., s. They are grouped together into a long vector x ∈ R^{NQs}, where the increments are numbered first along the random dimension, then along the spatial dimension, and finally according to the stages.
When the coefficient α(x, ω) is time independent, system (14) discretizing (10) becomes

(C_1 ⊗ M ⊗ I_s + Δt Σ_{i=1}^{L*} C_i ⊗ K_i ⊗ A_irk) x = b̃    (15)

with b̃ a known vector depending on u_m and on the right-hand side of (10):

b̃ = Δt ( (I_NQ ⊗ A_irk) P^T (c ⊗ [b(t_m + c_1 Δt); ...; b(t_m + c_s Δt)]) − (Σ_{i=1}^{L*} C_i ⊗ K_i ⊗ A_irk)(u_m ⊗ 1_s) )    (16)

where 1_s = [1 ... 1]^T ∈ R^s. The matrix P^T permutes the rows of the vector it multiplies so that all variables are grouped in the same order as the unknowns x.
In case of a time-dependent stochastic coefficient α(x, t, ω), each of the elements of the stiffness matrices K_i (9) is time dependent. According to Equation (14), every stiffness matrix K_i(t) is evaluated at the s time positions t = t_m + c_j Δt, j = 1, ..., s. This leads to a total of L*·s stiffness matrices at each time step. Applying (14) yields the following system to be solved for the stage vector increments:

( (C_1 ⊗ M ⊗ I_s) + Δt [ Σ_{i=1}^{L*} C_i ⊗ K_i(t_m + c_1 Δt) ⊗ A_irk(:, 1) ... Σ_{i=1}^{L*} C_i ⊗ K_i(t_m + c_s Δt) ⊗ A_irk(:, s) ] P ) x = b̃    (17)

The matrix P is an NQs×NQs permutation matrix. It permutes the columns of the matrix it multiplies so that consecutive IRK stages are grouped together in blocks of s columns. In the remainder of the paper, the multigrid formulation and analysis are presented for a time-independent coefficient α(x, ω). The extension to the general case of α(x, t, ω) is straightforward.
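For small test cases, the operator of (15) can be assembled explicitly with nested Kronecker products. A hedged sketch (for illustration only; as Section 6 explains, the full matrix is never constructed in practice):

    import scipy.sparse as sp

    def irk_operator(C, K, M, A_irk, dt):
        # C_1 kron M kron I_s + dt * sum_i C_i kron K_i kron A_irk, cf. (15)
        s = A_irk.shape[0]
        A = sp.kron(C[0], sp.kron(M, sp.identity(s)))
        for Ci, Ki in zip(C, K):
            A = A + dt * sp.kron(Ci, sp.kron(Ki, A_irk))
        return A.tocsr()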

Remark 2.3
In Equation (15) the unknowns are ordered block-wise. The vector x consists of Q consecutive blocks, each corresponding to the unknowns associated with one random mode. These blocks can be further subdivided into N blocks, each containing the IRK unknowns per spatial node. Similar to the discussion in [23] on unknown-based and point-wise orderings of variables, the unknowns in Equation (15) can be reordered per spatial point. This yields the system

(M ⊗ C_1 ⊗ I_s + Δt Σ_{i=1}^{L*} K_i ⊗ C_i ⊗ A_irk) x̂ = b̂    (18)
i=1


with b̂ being the reordered version of b̃ (16). The vector x̂ contains N blocks, each corresponding to the Qs unknowns related to a spatial point. This point-based ordering is more convenient for illustrating the block operations of the point-based AMG method presented in Section 3, see Remark 3.1.

3. AMG FOR THE STOCHASTIC FINITE ELEMENT METHOD

Next, we present an AMG method to solve the stochastic finite element discretization (15) or (17). We also consider the case of a stationary, i.e. time-independent, problem. In that case, the discretization reads

(Σ_{i=1}^{L*} C_i ⊗ K_i) u = b with u = [u_1; ...; u_Q], u_q ∈ R^N, and b = c ⊗ b    (19)

The basis of the method is the classical multigrid iteration shown in Algorithm 1. The algorithm uses a hierarchy of K levels, k = 1, ..., K, with A_K u_K = b_K being the discretization of the (stochastic) PDE on the given (fine) mesh. The recursion scheme is determined by a parameter γ; for example, the case γ = 1 is called a V-cycle and the case γ = 2 a W-cycle. An AMG method requires a setup phase to algebraically construct the restriction and prolongation operators, R_k^{k−1} and P_{k−1}^k, k = 2, ..., K. The coarse level operators A_{k−1}, k = 2, ..., K, are assembled using the Galerkin principle [24], i.e. A_k = R_{k+1}^k A_{k+1} P_k^{k+1}. To construct an AMG method for stochastic finite element and IRK discretizations, the AMG components are built so that all unknowns per spatial node are updated together. A block smoother is used, and the prolongation and restriction operators have a tensor structure.

Algorithm 1. Standard multigrid iteration for Au = b (γ = 1: V-cycle, γ = 2: W-cycle).

u_k^(1) = multigrid(u_k^(0), A_k, b_k, k):
• Presmoothing: u_k^(0) = smooth^{s1}(u_k^(0), A_k, b_k)
• Restrict residual: b_{k−1} = R_k^{k−1}(b_k − A_k u_k^(0))
• Coarse grid correction: solve A_{k−1} v̂_{k−1}^(0) = b_{k−1}
  — if k−1 = 1: v̂_{k−1}^(0) = A_{k−1}^{−1} b_{k−1}
  — if k−1 > 1: v̂_{k−1}^(0) = multigrid(0, A_{k−1}, b_{k−1}, k−1), followed by
    v̂_{k−1}^(0) = multigrid^{γ−1}(v̂_{k−1}^(0), A_{k−1}, b_{k−1}, k−1)
• Prolongate correction and update solution: û_k^(0) = u_k^(0) + P_{k−1}^k v̂_{k−1}^(0)
• Postsmoothing: u_k^(1) = smooth^{s2}(û_k^(0), A_k, b_k)
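A compact runnable rendering of Algorithm 1 in Python (our sketch: a plain pointwise Gauss–Seidel sweep stands in for the block smoother of Section 3.1, and the per-level operators A, R, P are assumed given, with level 1 the coarsest):

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import spsolve, spsolve_triangular

    def smooth(u, A, b):
        # one lexicographic Gauss-Seidel sweep: (lower part of A) u_new = b - (rest) u
        Lp = sp.tril(A, format="csr")
        return spsolve_triangular(Lp, b - (A - Lp).tocsr() @ u, lower=True)

    def multigrid(u, A, R, P, b, k, gamma=1, s1=1, s2=1):
        # A[k]: level-k operator, R[k]: restriction k -> k-1, P[k]: prolongation k-1 -> k
        for _ in range(s1):
            u = smooth(u, A[k], b)                      # presmoothing
        bc = R[k] @ (b - A[k] @ u)                      # restrict the residual
        if k - 1 == 1:
            v = spsolve(A[1], bc)                       # direct solve on the coarsest level
        else:
            v = multigrid(np.zeros_like(bc), A, R, P, bc, k - 1, gamma, s1, s2)
            for _ in range(gamma - 1):                  # gamma - 1 extra cycles: W-cycle etc.
                v = multigrid(v, A, R, P, bc, k - 1, gamma, s1, s2)
        u = u + P[k] @ v                                # prolongate correction and update
        for _ in range(s2):
            u = smooth(u, A[k], b)                      # postsmoothing
        return u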


3.1. A smoother for the stochastic finite element method


We suggest using a block lexicographic Gauss–Seidel smoothing method. More precisely, one smoothing step consists of a loop over all spatial nodes, in which all random and IRK unknowns per node are updated simultaneously. Hence, every iteration involves a sequence of N local solves of a linear system. The local system at node n corresponds to the Qs×Qs system

(M_[n,n] C_1 ⊗ I_s + Δt Σ_{i=1}^{L*} K_i[n,n] C_i ⊗ A_irk) x_[n] = b̃_[n] − Σ_{m≠n} (M_[n,m] C_1 ⊗ I_s + Δt Σ_{i=1}^{L*} K_i[n,m] C_i ⊗ A_irk) x_[m]    (20)

with x_[n] ∈ R^{Qs} being the unknowns associated with node n. For stationary problems, the local system simplifies to the Q×Q system

(Σ_{i=1}^{L*} K_i[n,n] C_i) u_[n] = b_[n] − Σ_{m≠n} (Σ_{i=1}^{L*} K_i[n,m] C_i) u_[m]    (21)
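One sweep of this smoother for the stationary local systems (21) might look as follows in Python (our sketch; the K_i are sparse CSR matrices and u, b are N×Q multivectors as in Section 6.1):

    import numpy as np

    def block_gauss_seidel_sweep(u, K, C, b):
        # Loop over the spatial nodes; at node n, assemble and solve the
        # Q x Q local system (21) using the most recent neighbor values.
        N, Q = u.shape
        for n in range(N):
            D = np.zeros((Q, Q))      # diagonal block sum_i K_i[n,n] C_i
            r = b[n].copy()           # right-hand side of (21)
            for Ki, Ci in zip(K, C):
                row = Ki.getrow(n)
                for m, k_nm in zip(row.indices, row.data):
                    if m == n:
                        D += k_nm * Ci
                    else:
                        r -= k_nm * (Ci @ u[m])
            u[n] = np.linalg.solve(D, r)
        return u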

The block Gauss–Seidel iteration step can be expressed as a linear iteration based on a matrix splitting of the stiffness matrices K_i, K_i = K_i^+ + K_i^− (i = 1, ..., L*), and of the mass matrix M, M = M^+ + M^−. Here, K_i^+ and M^+ are the lower triangular parts of K_i and M, respectively. The block Gauss–Seidel iteration in the νth iteration step can then be formulated as

(C_1 ⊗ M^+ ⊗ I_s + Δt Σ_{i=1}^{L*} C_i ⊗ K_i^+ ⊗ A_irk) x^(ν+1) = b̃ − (C_1 ⊗ M^− ⊗ I_s + Δt Σ_{i=1}^{L*} C_i ⊗ K_i^− ⊗ A_irk) x^(ν)    (22)

Remark 3.1
The block Gauss–Seidel method entails a block triangular system solve in every iteration. The triangular shape of these systems can be visualized by reordering the unknowns according to Equation (18). Writing C_i^irk := C_i ⊗ A_irk and replacing C_1 by I_Q, the block Gauss–Seidel iteration (22) becomes a block lower triangular system for x̂^(ν+1): its (n, m) block, for m ≤ n, equals M_[n,m] I_Qs + Δt Σ_{i=1}^{L*} K_i[n,m] C_i^irk, and the right-hand side is b̂_GS = b̂ − (M^− ⊗ I_Qs + Δt Σ_{i=1}^{L*} K_i^− ⊗ C_i^irk) x̂^(ν).


3.2. Multigrid hierarchy and intergrid transfer operators


We suggest deriving the multigrid hierarchy from the dominant term in (4) and (5), i.e. from the stiffness matrix of the averaged deterministic problem. This is the PDE that results from the stochastic PDE by replacing all random parameters by their mean values. Such a hierarchy can be derived by applying a classical AMG strategy to the stiffness matrix K_1. Suppose P_d is a prolongation operator constructed for K_1; then tensor prolongation and restriction operators for (15), denoted by P and R, respectively, are built as

P = I_Q ⊗ P_d ⊗ I_s and R = I_Q ⊗ P_d^T ⊗ I_s    (23)

This simplifies in the stationary case to

P = I_Q ⊗ P_d and R = I_Q ⊗ P_d^T    (24)

Note that all random and IRK unknowns associated with a particular spatial node are prolongated and restricted in a decoupled way. Only the spatial dimension is coarsened; the stochastic and time discretizations are kept unaltered throughout the multigrid hierarchy.
The coarse grid operator, denoted by A_H, is deduced from (23) and (24) by using the Galerkin principle. That is, the coarse grid operator corresponds to R A_h P, with A_h being the fine grid operator. Thus, applying formulas (23)–(24) to Equations (15) and (19) results in

A_H = C_1 ⊗ P_d^T M P_d ⊗ I_s + Δt Σ_{i=1}^{L*} C_i ⊗ P_d^T K_i P_d ⊗ A_irk

for the time-dependent case. For the time-independent case, we have

A_H = Σ_{i=1}^{L*} C_i ⊗ P_d^T K_i P_d
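In the stationary case, the Galerkin coarse operator therefore only requires coarsening the spatial factors. A minimal sketch (assuming sparse P_d and K_i):

    import scipy.sparse as sp

    def galerkin_coarse_factors(K, Pd):
        # P_d^T K_i P_d for every stiffness matrix; the coarse operator is then
        # A_H = sum_i C_i kron (P_d^T K_i P_d), with the C_i factors reused unchanged
        Rd = Pd.T.tocsr()
        return [(Rd @ Ki @ Pd).tocsr() for Ki in K]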

4. CONVERGENCE ANALYSIS

Using Fourier analysis [24, 25], valuable insights into the convergence behavior of geometric multigrid methods can be obtained. A local Fourier analysis of geometric multigrid for stochastic, stationary PDEs can be found in [16]. This analysis cannot be directly applied to AMG methods. Instead, the methodology from [10, 11] is followed.
Our analysis for stationary and time-dependent problems, detailed in Sections 4.1 and 4.2, respectively, is restricted to the case L* = 2. This corresponds to a diffusion coefficient discretized with one random variable, see also Equation (5). In Section 5, the extension to the general case, L* > 2, is discussed.

4.1. Stationary problems


First, we derive some results for the block smoother. We start from the error iteration

e^(ν+1) = S e^(ν) with S = −(C_1 ⊗ K_1^+ + C_2 ⊗ K_2^+)^{−1} (C_1 ⊗ K_1^− + C_2 ⊗ K_2^−)    (25)

where e^(ν) = u_exact − u^(ν) is the error at iteration step ν. The asymptotic convergence is characterized by the spectral radius of the iteration operator S, denoted by ρ(S). Assume that the random basis
functions Ψ_1, ..., Ψ_Q are normalized in such a way that C_1 equals the identity matrix I_Q. The matrix C_2 is a real symmetric matrix (7)–(8) with eigenvalue decomposition C_2 = V_{C_2} Λ_{C_2} V_{C_2}^T. Applying the similarity transform V_{C_2} ⊗ I_N to S leads to

ρ(S) = ρ((V_{C_2}^T ⊗ I_N) S (V_{C_2} ⊗ I_N))
     = ρ(−(I_Q ⊗ K_1^+ + Λ_{C_2} ⊗ K_2^+)^{−1} (I_Q ⊗ K_1^− + Λ_{C_2} ⊗ K_2^−))
     = max_{q=1,...,Q} ρ(−(K_1^+ + λ_q K_2^+)^{−1} (K_1^− + λ_q K_2^−)), λ_q ∈ σ(C_2)
     = max_{q=1,...,Q} ρ(Ŝ(λ_q))

with Ŝ being the matrix-valued function defined as

Ŝ(r) = −(K_1^+ + r K_2^+)^{−1} (K_1^− + r K_2^−)    (26)

Thus, the asymptotic convergence factor of block Gauss–Seidel is given by

ρ(S) = max_{λ_q ∈ σ(C_2)} ρ(Ŝ(λ_q))    (27)

To characterize the convergence properties of a two-level multigrid cycle, we define the matrix-valued function T̂(r):

T̂(r) = (Ŝ(r))^{s_2} (I_N − P_d (P_d^T (K_1 + r K_2) P_d)^{−1} P_d^T (K_1 + r K_2)) (Ŝ(r))^{s_1}

with Ŝ(r) defined by (26), and s_1 and s_2 the numbers of pre- and postsmoothing iterations. A derivation analogous to the one above shows that the asymptotic convergence factor of the two-level cycle can be determined from the spectral radius of the corresponding iteration matrix T as

ρ(T) = max_{λ_q ∈ σ(C_2)} ρ(T̂(λ_q))    (28)

Formulas (27) and (28) allow the following intuitive interpretation: the convergence factor for the stationary stochastic finite element discretization with L* = 2 equals the worst convergence factor of the corresponding Gauss–Seidel or multigrid method applied to the set of deterministic problems

(K_1 + λ_q K_2) u = b with λ_q ∈ σ(C_2)    (29)
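Formula (27) can be checked numerically with dense linear algebra for small test matrices (a sketch under our own conventions, not the authors' code):

    import numpy as np

    def gs_convergence_factor(K1, K2, C2):
        # rho(S) = max over eigenvalues lambda_q of C2 of rho(S_hat(lambda_q)),
        # with S_hat(r) = -(K1+ + r K2+)^{-1} (K1- + r K2-), cf. (26)-(27)
        rho = 0.0
        for lam in np.linalg.eigvalsh(C2):      # C2 is real symmetric
            Kp = np.tril(K1 + lam * K2)         # lower triangular part, K1+ + lam K2+
            Km = (K1 + lam * K2) - Kp           # strictly upper part, K1- + lam K2-
            S_hat = -np.linalg.solve(Kp, Km)
            rho = max(rho, np.abs(np.linalg.eigvals(S_hat)).max())
        return rho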

4.2. Time-dependent problems


The error iteration of the block Gauss–Seidel smoother (22) for C_1 = I_Q and L* = 2 is given by

(I_Q ⊗ M^+ ⊗ I_s + Δt (I_Q ⊗ K_1^+ + C_2 ⊗ K_2^+) ⊗ A_irk) e^(ν+1) = −(I_Q ⊗ M^− ⊗ I_s + Δt (I_Q ⊗ K_1^− + C_2 ⊗ K_2^−) ⊗ A_irk) e^(ν)    (30)


with the corresponding iteration matrix denoted by S. This matrix can be decoupled by applying the similarity transform V_{C_2} ⊗ I_N ⊗ V_irk, with V_irk resulting from the eigenvalue decomposition A_irk = V_irk Λ_irk V_irk^{−1} and V_{C_2} from C_2 = V_{C_2} Λ_{C_2} V_{C_2}^T. This enables us to express the spectrum of S as

σ(S) = ∪_{r=1}^{s} ∪_{q=1}^{Q} σ(Ŝ(λ_q, Δt μ_r^irk)), μ_r^irk ∈ σ(A_irk), λ_q ∈ σ(C_2)

with Ŝ the matrix-valued function defined as

Ŝ(r, z) = −(M^+ + z K_1^+ + z r K_2^+)^{−1} (M^− + z K_1^− + z r K_2^−)    (31)

Thus, the asymptotic convergence factor of lexicographic block Gauss–Seidel applied to system (15) with L* = 2 corresponds to

ρ(S) = max_{μ^irk ∈ σ(A_irk)} max_{λ_q ∈ σ(C_2)} ρ(Ŝ(λ_q, Δt μ^irk))    (32)

The analysis of the two-level multigrid cycle proceeds in a similar way. It is based on the matrix-valued function T̂(r, z) defined as

T̂(r, z) = (Ŝ(r, z))^{s_2} (I_N − P_d (P_d^T (M + z K_1 + z r K_2) P_d)^{−1} P_d^T (M + z K_1 + z r K_2)) (Ŝ(r, z))^{s_1}

with Ŝ(r, z) given by (31), and s_1 and s_2 the numbers of pre- and postsmoothing steps. Using this matrix function, the asymptotic convergence factor of the two-level multigrid cycle becomes

ρ(T) = max_{μ^irk ∈ σ(A_irk)} max_{λ_q ∈ σ(C_2)} ρ(T̂(λ_q, Δt μ^irk))    (33)

As in the stationary case, this value corresponds to the worst case asymptotic convergence factor of multigrid applied to the set of deterministic problems

(M + Δt μ^irk K_1 + Δt μ^irk λ_q K_2) x = b with λ_q ∈ σ(C_2), μ^irk ∈ σ(A_irk)    (34)

These deterministic systems can be derived from backward Euler discretizations, with scaled time step Δt μ^irk, of ODE systems of the form

M dx/dt + (K_1 + λ_q K_2) x = b    (35)
4.3. General discretizations with L* > 2
The case L* = 2 enables a decoupling of the stochastic and spatial dimensions, using a similarity transform based on C_2. Hence, the analysis can be reduced to the analysis of smaller problems of the form (29) and (34). For these problems, local Fourier analysis [24, 26] allows one to derive sharp convergence factors, at least for the geometric multigrid variant on regular meshes.
In general, no decoupling between the spatial and random parts of the discretization is possible, since the matrices C_i cannot be diagonalized simultaneously. An exception occurs when double orthogonal polynomials are used as basis functions. Indeed, then all matrices C_i are diagonal, see (2) and (7). Denoting the double orthogonal random basis by Ψ′ and the corresponding matrices C_i by G_i, we can determine the spectral radius of the two-level multigrid iteration matrix T′ as

ρ(T′) = max_{λ_1 ∈ σ(G_1)} ... max_{λ_{L*} ∈ σ(G_{L*})} ρ(T̂(λ_1, ..., λ_{L*}))


with the matrix-valued function T̂(r_1, ..., r_{L*}) defined as

T̂(r_1, ..., r_{L*}) = (Ŝ(r_1, ..., r_{L*}))^{s_2} [I_N − P_d (P_d^T (Σ_{i=1}^{L*} r_i K_i) P_d)^{−1} P_d^T (Σ_{i=1}^{L*} r_i K_i)] (Ŝ(r_1, ..., r_{L*}))^{s_1}

and the matrix-valued function Ŝ(r_1, ..., r_{L*}) = −(Σ_{i=1}^{L*} r_i K_i^+)^{−1} (Σ_{i=1}^{L*} r_i K_i^−). In Section 5 we point out a relation between the eigenvalues of the matrices C_i and the diagonal elements of the matrices G_i. On the basis of that relation, we can show that the AMG convergence properties in the case of a double orthogonal random basis are similar to those of the case L* = 2 treated in the previous section. Moreover, even when the double orthogonal basis is not used, we can argue that the analysis of the case L* = 2 is likely to provide valuable insights into the general case L* > 2. The first terms of system (19), i.e. C_1 ⊗ K_1 + C_2 ⊗ K_2, represent the mean behavior of the stochastic PDE and the main stochastic variation. This follows from the stochastic discretization of the random coefficient as a truncation of a series of terms of decreasing importance, see Section 2.2. The sum involving the matrices C_3, ..., C_{L*} can be seen as a perturbation of the system matrix. A more thorough (geometric) multigrid analysis for the general stationary case can be found in [16].

5. A DISCUSSION OF THE THEORETICAL CONVERGENCE PROPERTIES

The convergence analysis of the previous section shows that the matrices K_1 and K_2 as well as the eigenvalues of C_2 and A_irk determine the AMG convergence, see Equations (28) and (33). In this section we discuss the AMG convergence behavior with respect to the stochastic discretization. The conclusions agree with the properties of the geometric multigrid variant, as observed in [15] and theoretically analyzed in [16, 27]. The AMG convergence behavior with respect to the IRK discretization, i.e. the influence of the eigenvalues of A_irk, is detailed in [11].

5.1. A bound for the eigenvalues of C2


The range of the eigenvalues of C_2 can be computed by using properties of double orthogonal random polynomials [16]. Consider the set of Q generalized polynomial chaos functions Ψ = [Ψ_1, ..., Ψ_Q]^T. This set can be expanded to an orthonormal set Ψ̄ = [Ψ_1, ..., Ψ_Q, Ψ_{Q+1}, ..., Ψ_{Q̄}]^T that spans the same vector space as the double orthogonal polynomial chaos basis Ψ′. Hence, an orthogonal matrix Z exists such that Ψ̄ = Z Ψ′. As a consequence, the matrix C̄_2 can be diagonalized by the matrix Z:

C̄_2 := ⟨ξ_1 Ψ̄ Ψ̄^T⟩ = ⟨ξ_1 Z Ψ′ (Ψ′)^T Z^T⟩ = Z ⟨ξ_1 Ψ′ (Ψ′)^T⟩ Z^T = Z G_2 Z^T    (36)

Thus, the eigenvalues of C̄_2 correspond to the diagonal entries of the diagonal matrix G_2. It can be shown that these values coincide with the roots of univariate orthogonal polynomials from the Askey scheme, as explained in [16]. Moreover, as the matrix C_2 is a principal submatrix of C̄_2, the eigenvalues of C̄_2 determine upper and lower bounds for the eigenvalues of C_2. This allows one to determine bounds for the eigenvalues of C_2 from the roots of certain univariate orthogonal polynomials.
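For a Hermite chaos, the relevant roots are those of probabilists' Hermite polynomials, which numpy can generate directly. A sketch (the degree P+1 used here is our own illustrative assumption, not taken from the paper):

    import numpy as np

    for P in range(1, 6):
        # roots of the probabilists' Hermite polynomial He_{P+1}; their extremes
        # bound the spectrum of C2 for a Hermite chaos (degree choice assumed)
        roots, _ = np.polynomial.hermite_e.hermegauss(P + 1)
        print(P, roots.min(), roots.max())

Consistent with Figure 1(a), the printed range widens as the order P grows.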


[Figure 1: two panels plotting the eigenvalues of C_2 against (a) the Hermite chaos order and (b) the number of random dimensions.]

Figure 1. Effect of the polynomial chaos order and the number of random variables on the eigenvalues
of C2 (7) in case of a Hermite polynomial chaos: (a) fixed number of random variables and increasing
P, L = 4 and (b) fixed order and increasing value of L, P = 2.

5.2. Polynomial chaos type and order


The influence of the polynomial chaos type and the polynomial order can be derived from the properties of the roots of the corresponding orthogonal polynomials [28]. In the case of a Legendre chaos, the eigenvalues of the matrix C_2 take values between −1 and 1; thus, AMG convergence is asymptotically independent of the polynomial chaos order. In the case of a Hermite chaos, the eigenvalues can become arbitrarily large, with a range that increases with increasing polynomial order. This effect is illustrated in Figure 1(a). Whether the large eigenvalue range affects the convergence or not depends on the particular PDE problem. Consider, e.g., model problem (1) with α a Gaussian variable with mean 1 and variance σ, α := 1 + √σ ξ_1. Then, K_1 is a discretized Laplacian and K_2 = √σ K_1. According to (29), the AMG convergence corresponds to the worst convergence of multigrid applied to a set of problems of the form

(1 + √σ λ_q) K_1 u = b with λ_q ∈ σ(C_2), C_2 = ⟨ξ_1 Ψ Ψ^T⟩    (37)

The multiplicative factor 1 + √σ λ_q can be shifted to the right-hand side. Hence, the AMG convergence is independent of the variance and of the polynomial chaos order. However, when α is modelled as a random field α(x, ω), and L* = 2, system (29) corresponds to K̃u = b, with K̃ being a discretized diffusion problem with diffusion coefficient α̃(x) := α_1(x) + λ_q α_2(x). When α̃(x) violates the ellipticity conditions of the bilinear form upon which K̃ is based (cf. Equation (9)), e.g. due to a large negative λ_q, the AMG convergence degrades severely and eventually divergence is possible. This typically occurs only for a large polynomial chaos order. For other problems, the Hermite polynomial order can have an even more serious impact on AMG convergence. For example, consider the following problem involving a Gaussian random variable √σ ξ_1 with zero mean and finite variance σ:

−∂²u/∂x² − (1 + √σ ξ_1) ∂²u/∂y² = b


The stiffness matrix K_1 corresponds to a discrete Laplace operator, while the second stiffness matrix K_2 contains only the contributions from the ∂²u/∂y² term. The AMG convergence rate equals the worst multigrid convergence for deterministic problems of the form

−∂²u/∂x² − (1 + √σ λ_q) ∂²u/∂y² = b with λ_q ∈ σ(C_2), C_2 = ⟨ξ_1 Ψ Ψ^T⟩

Increasing the polynomial order broadens the range of λ_q and consequently increases the anisotropy of the problem. This results in degraded AMG convergence. Eventually, for a sufficiently large order, the problem will lose ellipticity and AMG will diverge.

5.3. Number of random variables


As our convergence analysis is restricted to L* = 2, it does not provide direct information about the influence of the number of random variables L. However, the eigenvalue bounds of the matrices C_i (7) do not depend on the number of random variables. This follows from the fact that the eigenvalues of G_i (36) equal the roots of one-dimensional polynomials [16] and thus are independent of the number of random variables. This property suggests that AMG convergence is independent of the number of random variables. Figure 1(b) shows the eigenvalues of C_2 as a function of the number of random variables. The same eigenvalues are found for C_3, ..., C_{L*} (7), while all eigenvalues of C_1 equal 1.
Only in the case of a double orthogonal polynomial chaos basis is a quantitative analysis for L* > 2 possible; then the independence of AMG convergence from the number of random variables can be demonstrated theoretically.

6. IMPLEMENTATION ASPECTS

The effectiveness of an AMG method depends strongly on the efficiency of its implementation. In this section we point out some implementation issues that reduce the computation time and memory usage.

6.1. Matrix formulation and storage


Reordering the unknowns shows that the tensor product formulations (19) and (15) are mathematically equivalent to the matrix systems

Σ_{i=1}^{L*} K_i U C_i = B and M X (C_1 ⊗ I_s) + Δt Σ_{i=1}^{L*} K_i X (C_i ⊗ A_irk^T) = B̃

with the unknowns u and x collected in the multivectors U ∈ R^{N×Q} and X ∈ R^{N×Qs}. Note that the N rows of X equal the N blocks of the unknown vector x̂ in Equation (18). This matrix representation gives easy access to all the unknowns per nodal point: they correspond to a row of the matrix U or X. Such access is frequently needed for the block smoothing operator, for the matrix–vector multiplication in the residual computation, and for the block restriction and prolongation operators. Note also that storing these multivectors in a row-by-row format enables a cache efficient implementation: with one memory access, a whole set of values that will be used in the subsequent operations is retrieved from memory.


Obviously, the entire system of dimension NQ×NQ (in the stationary case) or NQs×NQs (in the time-dependent case) is never stored or constructed explicitly. Only the storage of one mass matrix M, of L* stiffness matrices K_i and of L* matrices C_i is required. These matrices can be stored in a sparse matrix format. In general, all stiffness matrices K_i have the same sparsity structure; hence, the description of this structure has to be stored only once.

6.2. Krylov acceleration


Typically, AMG is used as a preconditioner for a Krylov method. This makes the scheme more robust and often significantly improves the convergence rates.
The matrix–vector multiplication needed for Krylov methods can be implemented in a cache efficient way by using the row-by-row storage format suggested above. As explained in [11], the matrix–vector product Y = AX of a sparse matrix A ∈ R^{N×N} and a multivector X ∈ R^{N×Q} is implemented as a sequence of three nested loops, where the inner loop runs over the columns of the multivectors instead of over their rows. This results in optimal reuse of the cache, since the data access patterns of X and Y match their storage layout.
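A sketch of that loop ordering for CSR storage (our Python rendering of the idea; [11] describes the actual implementation):

    import numpy as np

    def csr_matvec_multivector(indptr, indices, data, X):
        # Y = A X for a CSR matrix A (N x N) and a row-major multivector X (N x Q);
        # the innermost loop runs over the Q columns, so the access patterns of
        # X and Y match their row-by-row storage
        N, Q = X.shape
        Y = np.zeros((N, Q))
        for row in range(N):                                  # rows of A
            for idx in range(indptr[row], indptr[row + 1]):   # nonzeros in this row
                a, col = data[idx], indices[idx]
                Y[row, :] += a * X[col, :]                    # loop over the Q columns
        return Y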
For the stationary systems, conjugate gradients (CG) can be used, as the matrices C_i and the stiffness matrices K_i are symmetric. In the time-dependent case, we shall use BiCGStab or one of the GMRES variants because of the non-symmetry of the matrix A_irk.

6.3. Block smoothing


A large part of the computation time is spent in the smoothing steps. At each smoothing iteration,
N systems of size $Q\times Q$ or $Qs\times Qs$ have to be solved. Optimizing the solution time of these local
systems is therefore of utmost importance. One possible approach is to factorize these systems
already during the setup phase, so that at every smoothing step only matrix–vector multiplications or back
substitutions are required. However, the storage of N matrix factorizations may lead to excessive
memory requirements for large values of N and Q. Hence, we will not consider this further. In
our experiments with direct solvers, the factorization is done on the fly.
Depending on the properties of the local systems, different solution methods can be selected.
Figure 2 shows the average computation time of several approaches to solving one local
system. The considered methods include an LU solver without pivoting, sparse LU solvers
(UMFPACK [29] and SuperLU [30]), and a Krylov method. The tests were performed on a Pentium
IV 2.4 GHz machine with 512 MByte RAM. Values for Q as a function of the number of random
variables L and the chaos order P are given in Table I. These values are to be multiplied by s to
get the system dimension in the IRK case.
In the stationary case, considering our model problem (1) discretized with a Hermite or a
Legendre chaos, the local systems (21) are sparse and symmetric positive-definite, with clustered
eigenvalues and a condition number O(1). For large problem sizes, the CG solver leads to the
best performance. No preconditioning is necessary because the systems are well conditioned. In
the time-dependent case, the local systems (20) are non-symmetric and sparse. The matrices have
clustered, complex eigenvalues and a condition number typically of the order O(10). The sparse LU
solver SuperLU [30] yields the smallest execution times for non-trivial problem sizes. In both the
stationary and time-dependent cases, a direct solver is the most efficient method if the dimension,
Q or Qs, is small enough.
When the random model parameter is discretized by a generalized polynomial chaos (4) instead
of by a KL expansion (5), the local systems have the same dimension Q but become less sparse,
see [31]. As a consequence, the local system solves are more time consuming. The corresponding
computation times for the different solution methods follow, however, the same pattern as in Figure 2.


[Figure 2: log–log plots of the average solution time (sec.) versus the dimension of the local system. Panel (a) compares Gauss elimination, CG, sparse LU (UMFPACK), and sparse LU (SuperLU); panel (b) compares Gauss elimination, BiCGStab, sparse LU (UMFPACK), and sparse LU (SuperLU).]

Figure 2. Average computation time to solve one local system (21) or (20) in case of model problem
(1) with the random coefficient $\alpha(x,t,\omega)$ modelled as a Gaussian random field $\alpha(x,\omega)$ with an exponential
covariance function: (a) stationary problem and (b) time-dependent problem. A Hermite chaos random
discretization and a Radau IIA IRK method are used.

Table I. The number of random unknowns Q as a function of the number
of random variables L and of the polynomial chaos order P.

          L = 1   L = 2   L = 4   L = 8   L = 10   L = 15   L = 20
  P = 1     2       3       5       9      11       16       21
  P = 2     3       6      15      45      66      136      231
  P = 4     5      15      70     495    1001     3876   10 626

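The kind of solver comparison summarized in Figure 2 can be scripted along the following lines. This is a sketch using SciPy stand-ins (SuperLU via splu, unpreconditioned CG), not the benchmark code behind Figure 2, and time_local_solvers is a hypothetical helper name.

    import time
    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def time_local_solvers(C, rhs, repeats=100):
        # Average wall-clock time of two candidate local solvers on one
        # Q x Q system; C is assumed sparse and, for CG, symmetric
        # positive-definite as in the stationary case (21).
        out = {}
        t0 = time.perf_counter()
        for _ in range(repeats):
            spla.splu(sp.csc_matrix(C)).solve(rhs)    # sparse LU, factored on the fly
        out["sparse LU"] = (time.perf_counter() - t0) / repeats
        t0 = time.perf_counter()
        for _ in range(repeats):
            x, info = spla.cg(C, rhs)                 # unpreconditioned CG
        out["CG"] = (time.perf_counter() - t0) / repeats
        return out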

7. NUMERICAL RESULTS

In this section we present some numerical results obtained with the AMG method. First, we
investigate the AMG convergence with respect to several discretization parameters for the stationary
diffusion equation. The tests use a square spatial domain, D = [0, 1]2 , and piecewise linear, triangular
finite elements. We consider homogeneous Dirichlet boundary conditions, and the source term
b(x, t) is set to zero. The AMG prolongation operators are built with classical Ruge–Stüben AMG
[32]. The stopping criterion for the AMG method is a residual norm smaller than 10−10 . A random
initial approximation to the solution was used. We consider several configurations for the random
input $\alpha(x,t,\omega)$. In the case of a random field, $\alpha(x,\omega)$, the stochastic diffusion coefficient depends
on the spatial position, e.g. representing a heterogeneous material. In the case of a random process,
$\alpha(t,\omega)$, the stochastic diffusion coefficient remains the same at all spatial points but evolves in time.


Table II. Configurations of the random coefficient $\alpha(x,t,\omega)$ in Equation (1).

  Name                                              Random discretization        Distribution of $\xi$
  Random field     $\alpha_g(x,\omega)$             Karhunen–Loève expansion     Standard normal
                   $\alpha_u(x,\omega)$             Karhunen–Loève expansion     Uniform on [−1, 1]
                   $\alpha_{\ln}(x,\omega)=\exp(\alpha_g)$   Polynomial chaos expansion   Standard normal
  Random process   $\alpha_t(t,\omega)$             Karhunen–Loève expansion     Standard normal

Table III. Number of iterations required to solve the steady-state diffusion equation corresponding to (1)
with W(2,1)-cycles, using AMG as standalone solver, or as preconditioner for CG (between brackets).

  Spatial nodes (Q = 21, P = 2, L = 5)
                 N = 10 177   N = 50 499   N = 113 981   N = 257 488   N = 356 806
  $\alpha_g$     31 (15)      35 (16)      36 (17)       36 (17)       37 (17)
  $\alpha_u$     31 (15)      34 (16)      36 (17)       36 (17)       37 (17)
  $\alpha_{\ln}$ 32 (16)      37 (17)      39 (18)       38 (17)       39 (18)

  Random variables (N = 20 611, P = 2)
                 L = 1, Q = 3   L = 5, Q = 21   L = 10, Q = 66   L = 15, Q = 136   L = 20, Q = 231
  $\alpha_g$     32 (15)        34 (16)         34 (16)          35 (16)           35 (16)
  $\alpha_u$     32 (15)        33 (16)         34 (16)          35 (16)           35 (16)
  $\alpha_{\ln}$ 35 (16)        36 (17)         36 (17)          36 (17)           37 (17)

  Chaos order (N = 20 611, L = 5)
                 P = 1, Q = 6   P = 2, Q = 21   P = 3, Q = 56   P = 4, Q = 126   P = 5, Q = 252
  $\alpha_g$     33 (15)        34 (16)         34 (16)         35 (16)          36 (17)
  $\alpha_u$     33 (15)        33 (16)         34 (16)         35 (16)          35 (17)
  $\alpha_{\ln}$ 34 (16)        36 (17)         37 (17)         37 (17)          38 (18)

For each case, Table II indicates which expansion is used to construct the random input
and what type of random variables are present in that expansion. In the case of a KL expansion, an
exponential covariance function is assumed, $C(x,x') = \sigma^2\exp(-|x-x'|/l_c)$, with variance $\sigma^2 = 0.1$
and correlation length $l_c = 1$. In the case of the lognormal random field $\alpha_{\ln}$, the variance of the
underlying Gaussian field $\alpha_g$ equals 0.3. For each configuration of $\alpha$, the mean value of the
random input always equals the constant function 1. When the stochastic discretization is based
on uniformly distributed random variables, a Legendre polynomial chaos is used; in the case of
standard normally distributed random variables, a Hermite chaos. Next, the AMG performance will
be illustrated for a more complex test problem.

7.1. Stationary problems


The dependence of the AMG convergence properties on the spatial and stochastic discretization
parameters is illustrated by the numerical results displayed in Table III. As AMG cycle type,
W(2,1)-cycles are used, since these result in a lower overall solution time compared with V- or
F-cycles; see also Figure 3. As expected from the discussion in Section 5, the number of AMG
iterations is independent of the stochastic and spatial discretization when applied to our model
problem. The independence of the polynomial chaos order is maintained in the case of a Hermite
chaos for a low to moderate chaos order. Applying Krylov acceleration results in a more robust
convergence and reduced computing times.


Figure 3. Total solution time when solving the steady-state problem with $\alpha_u(x,\omega)$, L = 5 and W(2,1)-cycles
of AMG iterations: (a) a second-order Legendre chaos is used, resulting in Q = 21, and (b) the discretization
is based on a first- up to a fifth-order Legendre chaos and a mesh with 20 611 nodes.

Table IV. The number of iterations required to solve problem (38) with W(2,1)-cycles, using AMG as
standalone solver, or as preconditioner for CG (between brackets), until the residual is smaller than $10^{-10}$.

  Polynomial chaos order          P = 1     P = 2     P = 3     P = 4     P = 5
  $\alpha_g$ (Hermite chaos)      24 (16)   25 (17)   27 (19)   35 (22)   86 (61)
  $\alpha_u$ (Legendre chaos)     24 (16)   24 (16)   25 (17)   25 (17)   25 (17)
  $\alpha_{\ln}$ (Hermite chaos)  43 (25)   48 (28)   58 (33)   72 (40)   97 (53)

The finite element mesh consists of N = 20 611 nodes, and five random variables are used to discretize the
random space.

Table V. The number of iterations required to solve the time-dependent problem (1) with V(2,1)-cycles,
using AMG as standalone solver, or as preconditioner for BiCGStab (between brackets).

  Time discretization order    1         3         5         7         9         11
  IRK stages                   s = 1     s = 2     s = 3     s = 4     s = 5     s = 6
  $\alpha_g$                   41 (18)   33 (19)   29 (19)   27 (20)   27 (18)   27 (19)
  $\alpha_t$                   39 (18)   32 (19)   28 (19)   27 (18)   27 (19)   27 (19)

The discretization is based on a finite element mesh with 20 611 nodes, a second-order Hermite chaos with
L = 5, corresponding to Q = 21, and a Radau IIA implicit Runge–Kutta scheme with $\Delta t = 0.01$.

The computation times for the calculations in Table III are presented graphically in Figure 3.
The total AMG solution time is shown as a function of the number of spatial nodes and of the
number of random unknowns. For this problem, the matrices Ci are defined as in Equation (7). By


increasing the number of spatial nodes, the number of local solves in the block smoother increases
proportionally. In addition, extra coarse levels may be introduced, so that the total increase in
computation time is no longer linear. The results in the right figure were obtained by increasing
the polynomial chaos order while keeping the number of random variables L constant. Thus,
only the dimension of the matrices $C_i$ increases; the number of stiffness matrices $L^*$ remains the
same. This mainly affects the cost of the block solves in the smoother. With CG, the number of
iterations required is proportional to the square root of the condition number of (21). In practice
this condition number is close to 1, so that the number of CG iterations is more or less independent
of the dimension of the systems. The cost of each CG iteration depends on the sparsity of the
matrix, which, with $C_i$ defined by (7), is of the order O(Q). This results in a cost O(Q) to solve
one block system in the smoother. The linear increase of the computation time as a function of the
number of random unknowns Q is clearly observed in Figure 3. If the number of stiffness matrices
is also increased, the total computing time tends to grow faster than linearly. Also in the case
of a polynomial chaos expansion of the random input, as in the lognormal field example, higher
computing times are observed. This is caused by the larger number of stiffness matrices, Q instead
of L+1, and by the decreased sparsity of the matrices $C_i$ (8).
As discussed in Section 5, the convergence analysis indicates that the convergence of AMG is
asymptotically independent of the polynomial chaos order in the case of a Legendre chaos, but not in
the case of a Hermite chaos. For model problem (1), only a large polynomial chaos order has
an impact on the multigrid convergence. For some problems, however, small values of the
polynomial chaos order also affect the AMG convergence rate. This is illustrated by the problem
\[
\frac{\partial^2 u(x,\omega)}{\partial x^2} + \alpha(x,\omega)\,\frac{\partial^2 u(x,\omega)}{\partial y^2} = 0 \tag{38}
\]
which is discretized similarly to our model problem. Table IV shows the AMG convergence for
increasing values of the polynomial chaos order. In the case of a Hermite chaos, a deteriorating
AMG convergence is observed. As expected, the number of iterations remains unchanged in the case
of a Legendre chaos.

7.2. Time-dependent problems


First, we illustrate the influence of the number of IRK stages on the AMG convergence for model
problem (1). The results are presented in Table V for a number of stages s increasing from 1
up to 6. The former corresponds to a first-order method, while the latter leads to a time integration
scheme of order 11. As a test case, a Gaussian field $\alpha_g(x,\omega)$ and a Gaussian process $\alpha_t(t,\omega)$ were
used; see Table II for their characteristics. Note that the convergence analysis predicts an increased
AMG convergence rate when the number of IRK stages is increased. This effect is visible in the
numerical results.
Finally, we consider a more challenging transient potential equation:
\[
\frac{\partial V(x,t,\omega)}{\partial t} - \nabla\cdot\big(\varepsilon(x,\omega)\,\nabla V(x,t,\omega)\big) = 0 \tag{39}
\]
on a complex domain. Here, V denotes the electric potential and $\varepsilon$ the electric permittivity. No
charge density is present. The domain, the boundary conditions, and the setup for $\varepsilon$ are presented
in Figure 4. The model represents a three-phase cable, with four constant potentials along the
outer and inner boundaries. The permittivity is expressed as a piecewise constant random field


Figure 4. Configuration of a two-dimensional stochastic problem. The random variables $\xi_1, \xi_2, \xi_3$ are
independent and each uniformly distributed on [−1, 1]. The permittivity of free space is $\varepsilon_0 = 8.85\times10^{-12}$.

[Figure 5: contour plots of the mean (top row, values ranging over roughly −0.906 to 0.819) and of the variance (bottom row, values up to 0.00106, 0.00187, and 0.002, respectively) of the potential at t = 0.1, t = 0.5, and t = 2.]

Figure 5. Mean and variance of the solution of Equation (39). The configuration of Figure 4 is used with
a three-stage Radau IIA IRK discretization and time step 0.05. The stochastic discretization is based on
a second-order Legendre chaos. The electric potential is zero initially.

corresponding to the different material regions of the cable. The stochastic PDE models the effect
of deviations in permittivity on the resulting electric potential as a function of space and time.
Figure 5 shows the mean value and the variance of the electric potential at several instances in time.


Figure 6. Residual norms as a function of the number of iterations when solving Equation (39) on
the domain represented by Figure 4, discretized with 166 245 nodes. A three-stage Radau IIA IRK
discretization with time step 0.05 is used, together with a second-order Legendre chaos, resulting in a total
of $1.7\times10^6$ unknowns in the steady-state case and $5.0\times10^6$ in the time-dependent case: (a) steady-state
problem, F(2,1)-cycles and (b) transient problem, V(2,1)-cycles.

Applying AMG results in convergence properties similar to the ones described above. An
illustration of the convergence history as a function of the iteration index is given in Figure 6.
Observe that the use of GMRESR [33] results in a more robust convergence than BiCGStab. This
is typically also the case for classical deterministic PDEs. To limit the memory requirements of
GMRESR, the method is restarted every five iterations.

8. CONCLUSIONS

We have constructed and analyzed an AMG method for stochastic finite element discretizations of
time-dependent stochastic PDEs. This work extends previous research on multigrid for stochastic
finite element problems [16, 27] towards unstructured finite element meshes and high-order time
discretizations. The presented AMG method has very favorable convergence properties with respect
to the spatial, random and time discretization.
To solve real engineering stochastic PDEs by the stochastic finite element method, however,
further research is necessary. By using the knowledge of the stochastic discretization, the AMG
components may be enhanced and optimized.

APPENDIX A: THE COMPUTATION OF $\langle\Psi_i\Psi_j\Psi_k\rangle$ IN CASE OF A HERMITE CHAOS

The Hermite chaos of order P defined over L standard normal variables $\xi_1,\dots,\xi_L$ is
constructed as a set of Q multivariate Hermite polynomials $\Psi_q$, each defined as [34]
\[
\Psi_q = \prod_{i=1}^{L} \frac{1}{\sqrt{\gamma_{q,i}!}}\, H_{\gamma_{q,i}}(\xi_i)
\]


with $H_n(z)$ a one-dimensional Hermite polynomial of order n and $\gamma_q = (\gamma_{q,1},\dots,\gamma_{q,L})$ a set
of non-negative integers with only a finite number of non-zeros and $\sum_{i=1}^{L}\gamma_{q,i}\le P$. The factor $1/\sqrt{\gamma_{q,i}!}$
guarantees the normalization of the multivariate polynomials. The one-dimensional polynomials
$H_n(z)$ are recursively defined as [35]
\[
H_0(z) = 1,\qquad H_1(z) = z,\qquad H_{n+1}(z) = z\,H_n(z) - n\,H_{n-1}(z)
\]
Based on the properties of Hermite polynomials [28, p. 390], the inner product of three multivariate
Hermite polynomials can be calculated as
\[
\langle\Psi_i\Psi_j\Psi_k\rangle = \prod_{m=1}^{L} \frac{\sqrt{i_m!\, j_m!\, k_m!}}{(s_m - i_m)!\,(s_m - j_m)!\,(s_m - k_m)!} \tag{A1}
\]
if $2s_m = i_m + j_m + k_m$ is an even integer and $s_m \ge i_m$, $s_m \ge j_m$, $s_m \ge k_m$; otherwise the inner
product is zero.
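A direct transcription of (A1) into Python, under the stated conditions on the multi-indices, might look as follows; hermite_triple_product is a hypothetical helper name, not from the paper.

    from math import factorial, sqrt

    def hermite_triple_product(i, j, k):
        # <Psi_i Psi_j Psi_k> for the normalized multivariate Hermite chaos,
        # evaluated factor by factor via (A1); i, j, k are multi-indices of
        # equal length L (tuples of non-negative integers).
        value = 1.0
        for im, jm, km in zip(i, j, k):
            if (im + jm + km) % 2 == 1:          # 2 s_m must be an even integer
                return 0.0
            s = (im + jm + km) // 2
            if s < max(im, jm, km):              # requires s_m >= i_m, j_m, k_m
                return 0.0
            value *= sqrt(factorial(im) * factorial(jm) * factorial(km)) / (
                factorial(s - im) * factorial(s - jm) * factorial(s - km))
        return value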

ACKNOWLEDGEMENTS
This paper presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and
Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State,
Science Policy Office. The scientific responsibility rests with its authors. This research was supported in
part by the Research Council K.U.Leuven, CoE EF/05/006 Optimization in Engineering (OPTEC).

REFERENCES
1. Karniadakis G, Su C-H, Xiu D, Lucor D, Schwab C, Todor R. Generalized polynomial chaos solution for
differential equations with random inputs. Research Report 2005-01, Seminar for Applied Mathematics, ETH
Zürich, January 2005.
2. Xiu D, Karniadakis G. Modeling uncertainty in steady state diffusion problems via generalized polynomial chaos.
Computer Methods in Applied Mechanics and Engineering 2002; 191:4927–4948.
3. Schuëller GI. A state-of-the-art report on computational stochastic mechanics. Probabilistic Engineering Mechanics
1997; 12(4):197–322.
4. Babuška I, Tempone R, Zouraris GE. Solving elliptic boundary value problems with uncertain coefficients by
the finite element method: the stochastic formulation. Computer Methods in Applied Mechanics and Engineering
2005; 194:1251–1294.
5. Babuška I, Chatzipantelidis P. On solving elliptic stochastic partial differential equations. Computer Methods in
Applied Mechanics and Engineering 2002; 191:4093–4122.
6. Shinozuka M, Deodatis G. Response variability of stochastic finite element systems. Journal of Engineering
Mechanics 1988; 114:499–519.
7. Ghanem R, Spanos P. Stochastic Finite Elements, a Spectral Approach. Dover: Mineola, NY, 2003.
8. Ghanem R, Spanos P. A spectral stochastic finite element formulation for reliability analysis. Journal of
Engineering Mechanics (ASCE) 1991; 17:2351–2372.
9. Hairer E, Wanner G. Solving Ordinary Differential Equations II: Stiff and Differential-algebraic Problems.
Springer: Berlin, Germany, 1991.
10. Van lent J, Vandewalle S. Multigrid methods for implicit Runge–Kutta and boundary value method discretizations
of parabolic PDEs. SIAM Journal on Scientific Computing 2005; 27(1):67–92.
11. Boonen T, Van lent J, Vandewalle S. An algebraic multigrid method for high order time-discretization of the
div–grad and the curl–curl equations. Applied Numerical Mathematics 2007; in press.
12. Xiu D, Karniadakis G. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM Journal
on Scientific Computing 2002; 24(2):619–644.


13. Babuška I, Tempone R, Zouraris GE. Galerkin finite element approximations of stochastic elliptic partial differential
equations. SIAM Journal on Numerical Analysis 2004; 42:800–825.
14. Wan X, Karniadakis GE. An adaptive multi-element generalized polynomial chaos method for stochastic differential
equations. Journal of Computational Physics 2005; 209(2):617–642.
15. Le Maître O, Knio O, Debusschere B, Najm H, Ghanem R. A multigrid solver for two-dimensional stochastic
diffusion equations. Computer Methods in Applied Mechanics and Engineering 2003; 192:4723–4744.
16. Seynaeve B, Rosseel E, Nicolaï B, Vandewalle S. Fourier mode analysis of multigrid methods for partial
differential equations with random coefficients. Journal of Computational Physics 2007; 224:132–149.
17. Ghanem R, Saad G, Doostan A. Efficient solution of stochastic systems: application to the embankment dam
problem. Structural Safety 2007; 29(3):238–251.
18. Xiu D, Karniadakis G. Modeling uncertainty in flow simulations via generalized polynomial chaos. Journal of
Computational Physics 2003; 187:137–167.
19. Loève M. Probability Theory. Springer: New York, U.S.A., 1977.
20. Ghanem R. The nonlinear Gaussian spectrum of log-normal stochastic processes and variables. Journal of Applied
Mechanics—Transactions of the ASME 1999; 66(4):964–973.
21. Phoon K, Huang S, Quek S. Simulation of second-order processes using Karhunen–Loève expansion. Computers
and Structures 2002; 80:1049–1060.
22. Sudret B, Der Kiureghian A. Stochastic finite elements and reliability: a state-of-the-art report. Technical Report
UCB/SEMM-2000/08, University of California, Berkeley, 2000.
23. Ruge JW, Stüben K. Algebraic multigrid. In Multigrid Methods, McCormick SF (ed.). Frontiers in Applied
Mathematics. SIAM: Philadelphia, U.S.A., 1987; 73–130.
24. Trottenberg U, Oosterlee C, Schüller A. Multigrid. Academic Press: San Diego, U.S.A., 2001.
25. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977; 31:
333–390.
26. Wienands R, Joppich W. Practical Fourier Analysis for Multigrid Methods. CRC Press: Boca Raton, FL, U.S.A.,
2005.
27. Elman H, Furnival D. Solving the stochastic steady-state diffusion problem using multigrid. IMA Journal of
Numerical Analysis 2007; 27(4):675–688.
28. Szegö G. Orthogonal Polynomials (4th edn). American Mathematical Society: Providence, U.S.A., 1967.
29. Davis TA. Algorithm 832: UMFPACK V4.3—an unsymmetric-pattern multifrontal method. ACM Transactions
on Mathematical Software 2004; 30(2):196–199.
30. Demmel JW, Eisenstat SC, Gilbert JR, Li XS, Liu JWH. A supernodal approach to sparse partial pivoting. SIAM
Journal on Matrix Analysis and Applications 1999; 20(3):720–755.
31. Eiermann M, Ernst OG, Ullmann E. Computational aspects of the stochastic finite element method. Computing
and Visualization in Science 2007; 10(1):3–15.
32. Stüben K. A review of algebraic multigrid. Journal of Computational and Applied Mathematics 2001; 128:
281–309.
33. Van der Vorst H, Vuik C. GMRESR: a family of nested GMRES methods. Numerical Linear Algebra with
Applications 1994; 1(4):369–386.
34. Matthies H, Keese A. Galerkin methods for linear and nonlinear elliptic stochastic partial differential equations.
Computer Methods in Applied Mechanics and Engineering 2005; 194:1295–1331.
35. Soize C, Ghanem R. Physical systems with random uncertainties: chaos representations with arbitrary probability
measure. SIAM Journal on Scientific Computing 2004; 26(2):395–410.

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:165–185
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.577

Algebraic multigrid for k-form Laplacians

Nathan Bell∗, † and Luke N. Olson


Siebel Center for Computer Science, University of Illinois at Urbana-Champaign, 201 North Goodwin Avenue,
Urbana, IL 61801, U.S.A.

SUMMARY
In this paper we describe an aggregation-based algebraic multigrid method for the solution of discrete
k-form Laplacians. Our work generalizes Reitzinger and Schöberl’s algorithm to higher-dimensional
discrete forms. We provide conditions on the tentative prolongators under which the commutativity of
the coarse and fine de Rham complexes is maintained. Further, a practical algorithm that satisfies these
conditions is outlined, and smoothed prolongation operators and the associated finite element spaces are
highlighted. Numerical evidence of the efficiency and generality of the proposed method is presented in
the context of discrete Hodge decompositions. Copyright q 2008 John Wiley & Sons, Ltd.

Received 14 May 2007; Revised 4 December 2007; Accepted 5 December 2007

KEY WORDS: algebraic multigrid; Hodge decomposition; discrete forms; mimetic methods; Whitney
forms

1. INTRODUCTION

Discrete differential k-forms arise in scientific disciplines ranging from computational electro-
magnetics to computer graphics. Examples include stable discretizations of the eddy-current
problem [1–3], topological methods for sensor network coverage [4], visualization of complex
flows [5, 6], and the design of vector fields on meshes [7].
In this paper we consider solving problems of the form
\[
\delta d\,\omega^k = \beta^k \tag{1}
\]
where d denotes the exterior derivative and $\delta$ the codifferential relating k-forms $\omega$ and $\beta$. For
$k = 0, 1, 2$, $\delta d$ is also expressed as $\nabla\cdot\nabla$, $\nabla\times\nabla\times$, and $\nabla\nabla\cdot$, respectively. We refer to the operator $\delta d$
generically as a Laplacian, although it does not correspond to the Laplace–de Rham operator $\Delta =
d\delta + \delta d$ except for the case $k = 0$. We assume that (1) is discretized with mimetic first-order elements

∗ Correspondence to: Nathan Bell, Siebel Center for Computer Science, University of Illinois at Urbana-Champaign,
201 North Goodwin Avenue, Urbana, IL 61801, U.S.A.

E-mail: wnbell@uiuc.edu, wnbell@gmail.com



166 N. BELL AND L. N. OLSON

such as Whitney forms [8, 9] on simplicial meshes or the analog on hexahedral [10] or polyhedral
elements [11]. In general, we use $I_k$ to denote the map from discrete k-forms (cochains) to their
respective finite elements. Such discretizations give rise to a discrete exterior k-form derivative
$D_k$ and a discrete k-form innerproduct $M_k(i,j) = \langle I_k e_i, I_k e_j\rangle$, which allows implementation of (1)
in weak form as
\[
D_k^T M_{k+1} D_k\, x = b \tag{2}
\]
under the additional assumption that d commutes with I, i.e. $I_{k+1} D_k = d_k I_k$. This relationship is
depicted as
\[
\begin{array}{ccc}
\Lambda^k & \xrightarrow{\ d_k\ } & \Lambda^{k+1}\\[2pt]
\big\uparrow I_k & & \big\uparrow I_{k+1}\\[2pt]
\Lambda^k_d & \xrightarrow{\ D_k\ } & \Lambda^{k+1}_d
\end{array} \tag{3}
\]
where $\Lambda^k$ and $\Lambda^k_d$ denote the spaces of differential k-forms and discrete k-forms, respectively. For
the remainder of the paper, we restrict our attention to solving (2) on structured or unstructured
meshes of arbitrary dimension and element type, provided the elements satisfy the aforementioned
commutativity property.

Figure 1. Enumeration of nodes (left), oriented edges (center), and oriented triangles (right) for a simple
triangle mesh. We say that vertices 2 and 3 are upper adjacent since they are joined by edge 4. Similarly,
edges 5 and 6 are both faces of triangle 2 and therefore upper adjacent.

Figure 2. Forms $I_0\,\omega^0$, $I_1 D_0\,\omega^0$, and $I_1\,\omega^1$, where I denotes Whitney interpolation. The left
and center figures illustrate property (3): whether the derivative is applied before or after
interpolation, the result is the same.


1.1. Example
Although our results hold more generally, it is instructive to examine a concrete example that
satisfies the assumptions set out in Section 1. To this end, consider the three-element simplicial
mesh depicted in Figure 1, with the enumeration and orientation of vertices, edges, and triangles
as shown. In this example, we choose Whitney forms [8] to define the interpolation operators
I0 , I1 , I2 which in turn determine the discrete innerproducts M0 , M1 , M2 . Finally, sparse matrices

\[
D_{-1} = \begin{bmatrix} 0\\0\\0\\0\\0 \end{bmatrix},\qquad
D_0 = \begin{bmatrix}
-1 & 1 & 0 & 0 & 0\\
-1 & 0 & 0 & 1 & 0\\
0 & -1 & 1 & 0 & 0\\
0 & -1 & 0 & 1 & 0\\
0 & 0 & -1 & 1 & 0\\
0 & 0 & -1 & 0 & 1\\
0 & 0 & 0 & -1 & 1
\end{bmatrix} \tag{4}
\]
\[
D_1 = \begin{bmatrix}
1 & -1 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & -1 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & -1 & 1 & -1
\end{bmatrix},\qquad
D_2 = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \tag{5}
\]

implement the discrete k-form derivative operators. A discrete k-form (cochain), denoted $\omega^k$, is
represented by a column vector with entries corresponding to each of the k-simplices in the mesh.
For example, the Whitney-interpolated fields corresponding to $\omega^0 = [0, 1, 2, 1, 2]^T$, the gradient
$D_0\omega^0 = [1, 1, 1, 0, -1, 0, 1]^T$, and another 1-form $\omega^1 = [1, 0, 1, 0, 0, 1, 0]^T$ are shown in Figure 2.
By convention, $D_{-1}$ and $D_2$ are included to complete the exact sequence.
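The following NumPy sketch reproduces the operators (4)–(5) and checks the exactness property $D_{k+1}D_k = 0$ mentioned above; it also assembles the weak-form operator (2) for k = 0 under the simplifying assumption $M_1 = I$.

    import numpy as np

    # Incidence matrices (4)-(5) for the mesh of Figure 1: D0 maps 0-forms
    # (nodes) to 1-forms (edges), D1 maps 1-forms to 2-forms (triangles).
    D0 = np.array([[-1,  1,  0,  0,  0],
                   [-1,  0,  0,  1,  0],
                   [ 0, -1,  1,  0,  0],
                   [ 0, -1,  0,  1,  0],
                   [ 0,  0, -1,  1,  0],
                   [ 0,  0, -1,  0,  1],
                   [ 0,  0,  0, -1,  1]])
    D1 = np.array([[ 1, -1,  0,  1,  0,  0,  0],
                   [ 0,  0,  1, -1,  1,  0,  0],
                   [ 0,  0,  0,  0, -1,  1, -1]])

    assert not (D1 @ D0).any()        # the sequence is exact: D_{k+1} D_k = 0

    # Weak-form operator (2) for k = 0 with M_1 = I: a graph Laplacian.
    A0 = D0.T @ D0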

1.2. Related work


There is significant interest in efficient solution methods for Maxwell’s eddy-current problem
\[
\nabla\times\nabla\times E + \sigma E = f \tag{6}
\]
In particular, recent approaches focus on multilevel methods for both structured and unstructured
meshes [12–15]. Scalar multigrid performs poorly on edge element discretizations of (6) since
error modes that lie in the kernel of ∇×∇× are not effectively damped by standard relaxation
methods. Fortunately, the problematic modes are easily identified by the range of the discrete
gradient operator D0 , and an appropriate hybrid smoother [12, 13] can be constructed. An important
property of these multigrid methods is commutativity between coarse and fine finite element spaces.
The relationship is described as
\[
\begin{array}{ccc}
\Lambda^0_d & \xrightarrow{\ D_0\ } & \Lambda^1_d\\[2pt]
\big\uparrow P_0 & & \big\uparrow P_1\\[2pt]
\hat\Lambda^0_d & \xrightarrow{\ \hat D_0\ } & \hat\Lambda^1_d
\end{array} \tag{7}
\]


where $\hat\Lambda^k_d$ is the space of coarse discrete k-forms, $\hat D_0$ the coarse gradient operator, and $P_0$ and
$P_1$ are the nodal and edge prolongation operators, respectively. Combining (7) with (3) yields the
same result for the corresponding fine and coarse finite element spaces.
In [14], Reitzinger and Schöberl describe an algebraic multigrid method for solving (6) on
unstructured meshes. In their method, property (7) is maintained by choosing nodal aggregates
and using these aggregates to obtain compatible edge aggregates. The nodal and edge aggregates
then give rise to piecewise-constant prolongators P0 and P1 , which can be smoothed to achieve
better multigrid convergence rates [15] while retaining property (7).
The method we present can be viewed as a natural extension of Reitzinger and Schöberl’s work
from 1-forms to general k-forms. Commutativity of the coarse and fine de Rham complexes is
maintained for all k-forms and their associated finite element spaces $I_k\Lambda^k_d \subset \Lambda^k$. The relationship
is described by
\[
\begin{array}{ccccccccc}
\Lambda^0_d & \xrightarrow{\ D_0\ } & \Lambda^1_d & \xrightarrow{\ D_1\ } & \Lambda^2_d & \cdots & \Lambda^k_d & \xrightarrow{\ D_k\ } & \Lambda^{k+1}_d\\[2pt]
\big\uparrow \bar P_0 & & \big\uparrow \bar P_1 & & \big\uparrow \bar P_2 & \cdots & \big\uparrow \bar P_k & & \big\uparrow \bar P_{k+1}\\[2pt]
\hat\Lambda^0_d & \xrightarrow{\ \hat D_0\ } & \hat\Lambda^1_d & \xrightarrow{\ \hat D_1\ } & \hat\Lambda^2_d & \cdots & \hat\Lambda^k_d & \xrightarrow{\ \hat D_k\ } & \hat\Lambda^{k+1}_d
\end{array} \tag{8}
\]
where $\bar P_k$ denotes either the tentative prolongator $P_k$ or the smoothed prolongator $S_k P_k$.

1.3. Focus and applications


While our work is largely inspired by multigrid solvers for (6), our intended applications do not
focus specifically on the eddy-current problem. Indeed, recent work suggests that the emphasis on
multilevel commutativity, a property further developed in this paper, is at odds with developing
efficient solvers for (6) in the presence of highly variable coefficients [16]. Although our method
generalizes the work of Reitzinger and Schöberl [14] and Hu et al. [15], this additional generality
does not specifically address the aforementioned eddy-current issues.
In Section 3, we discuss computing Hodge decompositions of discrete k-forms with the proposed
method. The Hodge decomposition is a fundamental tool in both pure and applied mathematics that

Figure 3. The two harmonic 1-forms of a rocker arm surface mesh.


exposes topological information through differential forms. For example, the two harmonic 1-forms
shown in Figure 3 exist because the manifold has genus 1. The efficient solution of discrete k-form
Laplacians has substantial utility in computational topology. For instance, sufficient conditions on
the coverage of sensor networks reduce to the discovery of harmonic forms on the simplicial Rips
complex [4]. In such applications, we do not encounter variable coefficients and often take the
identity matrix for Mk .

2. PROPOSED METHOD

2.1. Complex coarsening


In this section we describe the construction of tentative prolongators $P_k$ and coarse operators $\hat D_k$,
which satisfy (8). In practice, the two-level commutativity depicted in (8) is extended recursively
for use in a multilevel method. Also, it is important to note that when solving (2) for a specific k,
it is not necessary to coarsen the entire complex.
As in [14], we presume the existence of a nodal aggregation algorithm that produces a piecewise-constant
tentative prolongator $P_0$. This procedure, called aggregate_nodes in Algorithm 1,
is fulfilled by either smoothed aggregation [17] or a graph partitioner applied to the matrices $D_0^T M_1 D_0$ or
$D_0^T D_0$. Ideally, the nodal aggregates are contiguous and have a small number of interfaces with
other aggregates.

Algorithm 1. coarsen_complex($D_{-1}, D_0, \dots, D_N$)

1  $P_0 \Leftarrow$ aggregate_nodes($D_0, \dots$)
2  for $k = 0$ to $N-1$
3      $P_{k+1} \Leftarrow$ induced_aggregates($P_k, D_k, D_{k+1}$)
4      $\hat D_k \Leftarrow (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T D_k P_k$
5  end
6  $\hat D_{-1} \Leftarrow P_0^T D_{-1}$
7  $\hat D_N \Leftarrow D_N P_N$
8  return $P_0, P_1, \dots, P_N$ and $\hat D_{-1}, \hat D_0, \dots, \hat D_N$

2.2. Induced aggregates


The key concept in [14], which we apply and extend here, is that nodal aggregates induce edge
aggregates; we denote by $P_1$ the resulting edge aggregation operator. As depicted in Figure 4, a
coarse edge exists between two coarse nodal aggregates when any fine edge joins them. Multiple
fine edges between the same two coarse nodal aggregates interpolate from a common coarse edge
with weight 1 or −1, depending on their orientation relative to the coarse edge. The coarse nodes
and coarse edges define a coarse derivative operator $\hat D_0$, which satisfies diagram (7).
We now restate the previous process in an algebraic manner that generalizes to arbitrary k-forms.
Given P0 as before, form the product D = D0 P0 that relates coarse nodes to fine edges. Observe
that each row of D corresponds to a fine edge and each column to a coarse node. Notice that
the ith row of D is zero when the end points of fine edge i lie within the same nodal aggregate.
Conversely, the ith row of D is nonzero when the end points of fine edge i lie in different nodal


Figure 4. Nodal aggregates (left) determine coarse edges (center) through the algorithm
induced_aggregates. Fine edges crossing between node aggregates interpolate from
the corresponding coarse edge with weight 1 or −1, depending on their relative orientation.
Edges contained within an aggregate do not correspond to any coarse edge and receive
weight 0. These weights are determined by lines 10–13 of induced_aggregates.

aggregates. Furthermore, when two nonzero rows are equal up to a sign (i.e. linearly dependent),
they interpolate from a common coarse edge.
Therefore, the procedure of aggregating edges reduces to computing sets of linearly dependent
rows in D. Each set of dependent rows yields a coarse edge and thus a column of $P_1$. In the
general case, sets of dependent rows in $D = D_k P_k$ are identified and used to produce $P_{k+1}$. The
process can be repeated to coarsen the entire de Rham complex. Alternatively, the coarsening
can be stopped at a specific $k < N$. In Section 2.5, we discuss the coarse derivative operator
$\hat D_k \Leftarrow (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T D_k P_k$ and show that it satisfies diagram (8).

Algorithm 2. induced_aggregates($P_k, D_k, D_{k+1}$)

1  $D \Leftarrow D_k P_k$
2  $G \Leftarrow D_{k+1}^T D_{k+1}$
3  $V \Leftarrow \{\}$
4  $n \Leftarrow 0$
5
6  for $i$ in rows($D$) such that $D(i,:) \ne 0$
7      if $i \notin V$
8          $A_n \Leftarrow$ dependent_rows($G, D, i$)
9          for $j \in A_n$
10             if $D(i,:) = D(j,:)$
11                 $P_{k+1}(j, n) \Leftarrow 1$
12             else
13                 $P_{k+1}(j, n) \Leftarrow -1$
14             end
15         end
16         $n \Leftarrow n+1$
17         $V \Leftarrow V \cup A_n$
18     end
19 end
20 return $P_{k+1}$


Intuitively, linear dependence between rows in $D = D_k P_k$ indicates redundancy created by the
operator $P_k$. Aggregating dependent rows together removes redundancy from the output of D and
compresses the remaining degrees of freedom into a smaller set of variables. By construction, the
tentative prolongators have full column rank and satisfy
\[
\mathcal{R}(D_k P_k) \subset \mathcal{R}(P_{k+1}) \tag{9}
\]
where $\mathcal{R}(A)$ denotes the range of matrix A. Note that property (9) is clearly necessary to satisfy
diagram (8).
Using disjoint sets of dependent rows $A_0, A_1, \dots$, the function induced_aggregates
constructs the aggregation operator $P_{k+1}$ described above. A nonzero entry $P_{k+1}(i,j)$ indicates
membership of the ith row of D—i.e. the ith (k+1)-dimensional element—in the jth aggregate $A_j$.

2.3. Computing aggregates


For a given row index i, the function dependent_rows constructs a set of rows that are linearly
dependent on $D(i,:)$. In the matrix graph of G, a nonzero entry $G(i,j)$ indicates that the
(k+1)-dimensional elements with indices i and j are upper adjacent [18]. In other words, i and j are
both faces of some (k+2)-dimensional element. For example, two edges in a simplicial mesh are
upper adjacent if they belong to the same triangle. All linearly dependent rows that are adjacent
in the matrix graph of G are aggregated together. This construction ensures that the aggregates
produced by dependent_rows are contiguous. As shown in Figure 5, such aggregates are more
natural than those that result from aggregating all dependent rows together (i.e. using $G = DD^T$).

Algorithm 3. dependent_rows($G, D, i$)

1  $Q \Leftarrow \{i\}$
2  $A \Leftarrow \{i\}$
3  while $Q \ne \{\}$
4      $j \Leftarrow$ pop($Q$)
5      $Q \Leftarrow Q \setminus \{j\}$
6      for $k$ such that $G(j,k) \ne 0$
7          if $k \notin A$ and $D(i,:) = \pm D(k,:)$
8              $A \Leftarrow A \cup \{k\}$
9              $Q \Leftarrow Q \cup \{k\}$
10         end
11     end
12 end
13 return $A$
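A direct Python/SciPy transcription of Algorithms 2 and 3 is sketched below; this is an illustrative sketch (0-based indexing, dense row comparisons for clarity, not optimized), not the authors' implementation.

    import numpy as np
    from scipy.sparse import csr_matrix, lil_matrix

    def dependent_rows(G, D, i):
        # Algorithm 3: gather rows of D equal to +/- D(i,:) that are
        # reachable from i through the upper-adjacency graph G.
        queue, agg = [i], {i}
        Di = D[i].toarray().ravel()
        while queue:
            j = queue.pop()
            for k in G[j].indices:                   # neighbors of j in G
                Dk = D[k].toarray().ravel()
                if k not in agg and (np.array_equal(Di, Dk)
                                     or np.array_equal(Di, -Dk)):
                    agg.add(k)
                    queue.append(k)
        return agg

    def induced_aggregates(Pk, Dk, Dk1):
        # Algorithm 2: aggregate linearly dependent rows of D = D_k P_k
        # into columns of the next tentative prolongator P_{k+1}.
        D = csr_matrix(Dk @ Pk)
        D.eliminate_zeros()
        G = csr_matrix(Dk1.T @ Dk1)                  # upper adjacency of (k+1)-elements
        visited, cols = set(), []
        for i in range(D.shape[0]):
            if D[i].nnz == 0 or i in visited:
                continue
            agg = dependent_rows(G, D, i)
            Di = D[i].toarray().ravel()
            cols.append({j: 1 if np.array_equal(D[j].toarray().ravel(), Di)
                         else -1 for j in agg})
            visited |= agg
        P = lil_matrix((D.shape[0], len(cols)))
        for n, col in enumerate(cols):
            for j, s in col.items():
                P[j, n] = s
        return csr_matrix(P)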

2.4. Example
In this section, we describe the steps of our algorithm applied to the three-element simplicial mesh
depicted in Figure 1. Matrices D−1 , D0 , D1 , and D2 , shown in Section 1.1, are first computed

Copyright q 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:165–185
DOI: 10.1002/nla
172 N. BELL AND L. N. OLSON

Figure 5. Example where contiguous (center) and noncontiguous (right) aggregation differ.
Contiguous aggregates are reflected through our choice of G defined in induced_aggregates
and later used in dependent_rows.

and then passed to coarsen_complex. The externally defined procedure aggregate_nodes
is then called to produce the piecewise-constant nodal aggregation operator
\[
P_0 = \begin{bmatrix}
1 & 0 & 0\\
1 & 0 & 0\\
0 & 1 & 0\\
1 & 0 & 0\\
0 & 0 & 1
\end{bmatrix} \tag{10}
\]

whose corresponding aggregates are shown in Figure 6. At this stage of the procedure, a
more general nodal problem $D_0^T M_1 D_0$ may be utilized in determining the coarse aggregates.
Next, induced_aggregates is invoked with arguments $P_0, D_0, D_1$, and the sparse matrix
\[
D = D_0 P_0 = \begin{bmatrix}
0 & 0 & 0\\
0 & 0 & 0\\
-1 & 1 & 0\\
0 & 0 & 0\\
1 & -1 & 0\\
0 & -1 & 1\\
-1 & 0 & 1
\end{bmatrix} \tag{11}
\]

is constructed. Recall from Section 2.2 that the rows of D are used to determine the induced
edge aggregates. The zero rows of D, namely rows 0, 1, and 3, correspond to interior edges,
which is confirmed by Figure 6. Linear dependence between rows 2 and 4 indicates that
edges 2 and 4 have common coarse endpoints, with the difference in sign indicating opposite
orientations.


Figure 6. Original mesh with nodal aggregates (left), coarse nodes (center), and coarse edges (right).

For each nonzero and un-aggregated row of D, dependent_rows traverses
\[
G = D_1^T D_1 = \begin{bmatrix}
1 & -1 & 0 & 1 & 0 & 0 & 0\\
-1 & 1 & 0 & -1 & 0 & 0 & 0\\
0 & 0 & 1 & -1 & 1 & 0 & 0\\
1 & -1 & -1 & 2 & -1 & 0 & 0\\
0 & 0 & 1 & -1 & 2 & -1 & 1\\
0 & 0 & 0 & 0 & -1 & 1 & -1\\
0 & 0 & 0 & 0 & 1 & -1 & 1
\end{bmatrix} \tag{12}
\]

to find dependent rows among upper-adjacent edges. In this case, edges 3 and 4 are upper adjacent
to edge 2; however, only row 4 of D is linearly dependent on row 2. Rows 5 and 6 of D are
not linearly dependent on any other rows, thus forming single aggregates for edges 5 and 6. The
resulting aggregation operator
\[
P_1 = \begin{bmatrix}
0 & 0 & 0\\
0 & 0 & 0\\
1 & 0 & 0\\
0 & 0 & 0\\
-1 & 0 & 0\\
0 & 1 & 0\\
0 & 0 & 1
\end{bmatrix} \tag{13}
\]

is then used to produce the coarse discrete derivative operator
\[
\hat D_0 = (P_1^T P_1)^{-1} P_1^T D_0 P_0 = \begin{bmatrix}
-1 & 1 & 0\\
0 & -1 & 1\\
-1 & 0 & 1
\end{bmatrix} \tag{14}
\]


for the mesh in Figure 6. Subsequent iterations of the algorithm produce the operators
\[
P_2 = \begin{bmatrix} 0\\0\\1 \end{bmatrix},\qquad
\hat D_1 = (P_2^T P_2)^{-1} P_2^T D_1 P_1 = \begin{bmatrix} 1 & 1 & -1 \end{bmatrix},\qquad
\hat D_2 = D_2 P_2 = [0] \tag{15}
\]
which complete the coarse de Rham complex.
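Continuing the NumPy sketch from Section 1.1 (reusing D0 and D1 defined there), the operators (10), (13)–(15) can be reproduced and the commutativity (17) and coarse exactness checked directly:

    import numpy as np

    # Tentative prolongators (10) and (13) for the aggregates of Figure 6.
    P0 = np.array([[1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]])
    P1 = np.array([[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 0],
                   [-1, 0, 0], [0, 1, 0], [0, 0, 1]])
    P2 = np.array([[0], [0], [1]])

    D0c = np.linalg.solve(P1.T @ P1, P1.T @ D0 @ P0)   # coarse derivative (14)
    D1c = np.linalg.solve(P2.T @ P2, P2.T @ D1 @ P1)   # coarse derivative (15)

    assert np.allclose(D0 @ P0, P1 @ D0c)              # commutativity (17), k = 0
    assert np.allclose(D1 @ P1, P2 @ D1c)              # commutativity (17), k = 1
    assert np.allclose(D1c @ D0c, 0)                   # coarse exactness (Section 2.6)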

2.5. Commutativity

We now prove that the tentative prolongators $P_0, P_1, \dots, P_K$ and coarse derivative operators $\hat D_0, \hat D_1, \dots, \hat D_K$
produced by Algorithm 1 satisfy the commutative diagram (8). The result is summarized by the
following theorem.

Theorem 1
Let $P_k : \hat\Lambda^k_d \to \Lambda^k_d$ denote the discrete k-form prolongation operators with the following properties:
\begin{align}
&P_{k+1}\ \text{has full column rank} \tag{16a}\\
&\mathcal{R}(D_k P_k) \subset \mathcal{R}(P_{k+1}) \tag{16b}\\
&\hat D_k \Leftarrow (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T D_k P_k \tag{16c}
\end{align}
Then, diagram (8) holds. That is,
\[
D_k P_k = P_{k+1}\hat D_k \tag{17}
\]

Proof
Since $P_{k+1}$ has full column rank, its pseudoinverse is given by
\[
P_{k+1}^{+} = (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T \tag{18}
\]
Recall that for an arbitrary matrix A, the pseudoinverse satisfies $A A^{+} A = A$. Furthermore,
$\mathcal{R}(D_k P_k) \subset \mathcal{R}(P_{k+1})$ implies that $D_k P_k = P_{k+1} X$ for some matrix X. Combining these observations,
\begin{align*}
P_{k+1}\hat D_k &= P_{k+1} P_{k+1}^{+} D_k P_k\\
&= P_{k+1} P_{k+1}^{+} P_{k+1} X\\
&= P_{k+1} X\\
&= D_k P_k \qquad\square
\end{align*}

Since Algorithm 1 meets assumptions (16a)–(16c), it follows that diagram (8) is satisfied. Also,
assuming disjoint aggregates, the matrix $(P_{k+1}^T P_{k+1})$ appearing in (18) is a diagonal matrix; hence,
its inverse is easily computed.


2.6. Exact sequences


The de Rham complex formed by the fine-level discrete derivative operators
\[
0 \xrightarrow{\ D_{-1}\ } \Lambda^0_d \xrightarrow{\ D_0\ } \Lambda^1_d \xrightarrow{\ D_1\ } \cdots \xrightarrow{\ D_{N-1}\ } \Lambda^N_d \xrightarrow{\ D_N\ } 0 \tag{19}
\]
is an exact sequence, i.e. $\operatorname{img}(D_k) \subset \ker(D_{k+1})$, or equivalently $D_{k+1} D_k = 0$. A natural question to
ask is whether the coarse complex retains this property. As argued in Section 2.5, $D_k P_k = P_{k+1} X$
for some matrix X; therefore, it follows that
\begin{align*}
\hat D_{k+1}\hat D_k &= P_{k+2}^{+} D_{k+1} P_{k+1} P_{k+1}^{+} D_k P_k\\
&= P_{k+2}^{+} D_{k+1} P_{k+1} P_{k+1}^{+} P_{k+1} X\\
&= P_{k+2}^{+} D_{k+1} P_{k+1} X\\
&= P_{k+2}^{+} D_{k+1} D_k P_k\\
&= 0
\end{align*}
since $D_{k+1} D_k = 0$ by assumption. From diagram (3), we infer the same result for the associated
finite element spaces.

2.7. Smoothed prolongators


While the tentative prolongators $P_0, P_1, \dots$ produced by coarsen_complex commute with $D_k$
and give rise to a coarse exact sequence, their piecewise-constant nature leads to suboptimal
multigrid scaling [14, 15]. In smoothed aggregation [17], the tentative prolongator P is smoothed
to produce another prolongator $\tilde P = S P$ with superior approximation characteristics. We consider
prolongation smoothers of the form $S = (I - \Sigma A)$. Possible implementations include Richardson
$\Sigma = \sigma I$, Jacobi $\Sigma = \sigma\,\mathrm{diag}(A)^{-1}$, and polynomial $\Sigma = p(A)$ [19].
Smoothed prolongation operators are desirable, but straightforward application of smoothers
to each of $P_0, P_1, \dots$ violates commutativity. The solution proposed in [15] smooths $P_0$ and
$P_1$ with compatible smoothers $S_0, S_1$ such that commutativity of the smoothed prolongators
$\tilde P_0, \tilde P_1$ is maintained, i.e. $D_0\tilde P_0 = \tilde P_1\hat D_0$. In the following theorem, we generalize this result to
arbitrary k.

Theorem 2
Given discrete k-form prolongation operators $P_k$ satisfying (16a)–(16c), let $\tilde P_k : \hat\Lambda^k_d \to \Lambda^k_d$ denote
the smoothed discrete k-form prolongation operators with the following properties:
\begin{align}
\tilde P_k &= S_k P_k \tag{20a}\\
S_0 &= (I - \Sigma_0 D_0^T M_1 D_0) \tag{20b}\\
S_k &= (I - \Sigma_k D_k^T M_{k+1} D_k - D_{k-1}\Sigma_{k-1} D_{k-1}^T M_k)\quad\text{for } k>0 \tag{20c}
\end{align}


where $\Sigma_k$ defines the type of prolongation smoother. Then, diagram (8) holds. That is,
\[
D_k\tilde P_k = \tilde P_{k+1}\hat D_k \tag{21}
\]

Proof
First, if
\[
D_k S_k = S_{k+1} D_k \tag{22}
\]
then
\begin{align*}
\tilde P_{k+1}\hat D_k &= S_{k+1} P_{k+1}\hat D_k\\
&= S_{k+1} P_{k+1} (P_{k+1}^T P_{k+1})^{-1} P_{k+1}^T D_k P_k\\
&= S_{k+1} D_k P_k\\
&= D_k S_k P_k\\
&= D_k\tilde P_k
\end{align*}
Therefore, it suffices to show that (22) holds for all k. For $k = 0$, we have
\begin{align*}
S_1 D_0 &= (I - \Sigma_1 D_1^T M_2 D_1 - D_0\Sigma_0 D_0^T M_1) D_0\\
&= (D_0 - \Sigma_1 D_1^T M_2 D_1 D_0 - D_0\Sigma_0 D_0^T M_1 D_0)\\
&= (D_0 - D_0\Sigma_0 D_0^T M_1 D_0)\\
&= D_0 (I - \Sigma_0 D_0^T M_1 D_0)\\
&= D_0 S_0
\end{align*}
and for all $k \ge 1$ we have
\begin{align*}
S_{k+1} D_k &= (I - \Sigma_{k+1} D_{k+1}^T M_{k+2} D_{k+1} - D_k\Sigma_k D_k^T M_{k+1}) D_k\\
&= (D_k - \Sigma_{k+1} D_{k+1}^T M_{k+2} D_{k+1} D_k - D_k\Sigma_k D_k^T M_{k+1} D_k)\\
&= (D_k - D_k\Sigma_k D_k^T M_{k+1} D_k)\\
&= (D_k - D_k\Sigma_k D_k^T M_{k+1} D_k - D_k D_{k-1}\Sigma_{k-1} D_{k-1}^T M_k)\\
&= D_k (I - \Sigma_k D_k^T M_{k+1} D_k - D_{k-1}\Sigma_{k-1} D_{k-1}^T M_k)\\
&= D_k S_k
\end{align*}
which completes the proof of (21). $\square$
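In code, the compatible smoothers (20b)–(20c) can be assembled level by level; the following dense NumPy sketch (compatible_smoothers is a hypothetical helper, not the authors' code) takes user-chosen type matrices $\Sigma_k$ and returns the $S_k$.

    import numpy as np

    def compatible_smoothers(Sigma, D, M):
        # Assemble S_0, ..., S_K per (20b)-(20c); dense for clarity.
        # Sigma[k]: chosen 'type' matrix for k-forms (Richardson, Jacobi, ...),
        # D[k]: derivative from k-forms to (k+1)-forms (k = 0..K),
        # M[k]: innerproduct on k-forms (required for k = 0..K+1).
        S = [np.eye(D[0].shape[1]) - Sigma[0] @ D[0].T @ M[1] @ D[0]]
        for k in range(1, len(D)):
            S.append(np.eye(D[k].shape[1])
                     - Sigma[k] @ D[k].T @ M[k + 1] @ D[k]
                     - D[k - 1] @ Sigma[k - 1] @ D[k - 1].T @ M[k])
        return S

The commuting relation (22) can then be verified directly, e.g. np.allclose(D[k] @ S[k], S[k+1] @ D[k]) for k < K, provided the supplied derivatives satisfy $D_{k+1}D_k = 0$.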

On subsequent levels, the coarse innerproducts $\hat M_k = P_k^T M_k P_k$ and derivatives $\hat D_k$ replace $M_k$
and $D_k$ in the definition of $S_k$. As shown below, the Galerkin product $\hat A_k = P_k^T A_k P_k$ can also be
expressed in terms of the coarse operators:
\begin{align*}
\hat A_k &= P_k^T A_k P_k\\
&= P_k^T D_k^T M_{k+1} D_k P_k\\
&= \hat D_k^T P_{k+1}^T M_{k+1} P_{k+1}\hat D_k\\
&= \hat D_k^T\hat M_{k+1}\hat D_k
\end{align*}
2.8. Extensions and applications
Note that condition (9) permits some freedom in our choice of aggregates. For instance, in restricting
ourselves to contiguous aggregates we have slightly enriched the range of $P_{k+1}$ beyond what
is necessary. Provided that $P_{k+1}$ already satisfies (9), additional coarse basis functions can be
introduced to better approximate low-energy modes. As in smoothed aggregation, these additional
columns of $P_{k+1}$ can be chosen to exactly interpolate given near-nullspace vectors [17].
So far we have only discussed coarsening the cochain complex (8). It is worth noting that
coarsen_complex works equally well on the chain complex formed by the mesh boundary
operators $\partial_k = D_{k-1}^T$,
\[
0 \xleftarrow{\ D_{-1}^T\ } \Lambda^0_d \xleftarrow{\ D_0^T\ } \cdots \xleftarrow{\ D_{N-2}^T\ } \Lambda^{N-1}_d \xleftarrow{\ D_{N-1}^T\ } \Lambda^N_d \xleftarrow{\ D_N^T\ } 0 \tag{23}
\]
by simply reversing the order of the complex, i.e. $(D_{-1}, D_0, \dots, D_N) \Rightarrow (D_N^T, D_{N-1}^T, \dots, D_{-1}^T)$.
In this case, aggregate_nodes will aggregate the top-level elements, for instance, the triangles
in Figure 1. Intuitively, $\partial_k$ acts like a derivative operator that maps k-cochains to (k+1)-cochains;
however, one typically refers to these as k-chains rather than cochains [20]. In Section 3, we
coarsen both complexes when computing Hodge decompositions.

3. HODGE DECOMPOSITION

The Hodge decomposition [21] states that the space of k-forms on a closed manifold can be
decomposed into three orthogonal subspaces
\[
\Lambda^k = d_{k-1}\Lambda^{k-1} \oplus \delta_{k+1}\Lambda^{k+1} \oplus \mathcal{H}^k \tag{24}
\]
where $\mathcal{H}^k$ is the space of harmonic k-forms, $\mathcal{H}^k = \{h \in \Lambda^k \mid \Delta_k h = 0\}$. The analogous result holds
for the space of discrete k-forms $\Lambda^k_d$, where the derived codifferential [22]
\[
\delta_k = M_{k-1}^{-1} D_{k-1}^T M_k \tag{25}
\]
is defined to be the adjoint of $D_{k-1}$ in the discrete innerproduct $M_k$. Convergence of the discrete
approximations to the Hodge decomposition is examined in [23].
In practice, for a discrete k-form $\omega^k$ we seek a decomposition
\[
\omega^k = D_{k-1}\alpha^{k-1} + M_k^{-1} D_k^T M_{k+1}\beta^{k+1} + h^k \tag{26}
\]
for some $\alpha^{k-1} \in \Lambda^{k-1}_d$, $\beta^{k+1} \in \Lambda^{k+1}_d$, and $h^k \in \Lambda^k_d$, where $\Delta_k h^k = 0$. Note that $\alpha^{k-1}$ and $\beta^{k+1}$ are
generally not unique, since the kernels of $D_{k-1}$ and $M_k^{-1} D_k^T M_{k+1}$ are nonempty. However, the


discrete k-forms $(D_{k-1}\alpha^{k-1})$ and $(M_k^{-1} D_k^T M_{k+1}\beta^{k+1})$ are uniquely determined. We decompose
$\omega^k$ into (26) by solving
\begin{align}
D_{k-1}^T M_k D_{k-1}\,\alpha^{k-1} &= D_{k-1}^T M_k\,\omega^k \tag{27}\\
D_k M_k^{-1} D_k^T M_{k+1}\,\beta^{k+1} &= D_k\,\omega^k \tag{28}\\
h^k &= \omega^k - D_{k-1}\alpha^{k-1} - M_k^{-1} D_k^T M_{k+1}\beta^{k+1} \tag{29}
\end{align}
Note that (28) involves the explicit inverse $M_k^{-1}$, which is typically dense.‡ In the following
sections, we first consider the special case $M_k = I$ and then show how (28) can be circumvented
in the general case. Equation (27) is obtained by left multiplying both sides of (26) by $M_{k-1}^{-1} D_{k-1}^T M_k$.
Likewise, applying $D_k$ to both sides of (26) yields (28). Equivalently, one may seek minima
of the following functionals:
\[
\|D_{k-1}\alpha^{k-1} - \omega^k\|_{M_k},\qquad \|M_k^{-1} D_k^T M_{k+1}\beta^{k+1} - \omega^k\|_{M_k} \tag{30}
\]

3.1. Special case


Taking the appropriate identity matrix for all discrete innerproducts $M_k$ in (27)–(29) yields
\begin{align}
D_{k-1}^T D_{k-1}\,\alpha^{k-1} &= D_{k-1}^T\,\omega^k \tag{31}\\
D_k D_k^T\,\beta^{k+1} &= D_k\,\omega^k \tag{32}\\
h^k &= \omega^k - D_{k-1}\alpha^{k-1} - D_k^T\beta^{k+1} \tag{33}
\end{align}


Although (31)–(33) are devoid of metric information, some fundamental topological properties of
the mesh are retained. For instance, the number of harmonic k-forms, which together form a coho-
mology basis, is independent of the choice of innerproduct.§ In applications where metric infor-
mation is either irrelevant or simply unavailable [4], these ‘nonphysical’ equations are sufficient.
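Since (31) and (32) are the normal equations of two least-squares problems, the special-case decomposition can be prototyped with SciPy's lsqr standing in for the multigrid solvers of Algorithm 5; hodge_decompose_identity is a hypothetical helper written for illustration.

    import scipy.sparse.linalg as spla

    def hodge_decompose_identity(w, Dkm1, Dk):
        # Decompose the k-cochain w per (31)-(33) with M = I.
        alpha = spla.lsqr(Dkm1, w)[0]        # min || D_{k-1} a - w ||, cf. (31)
        beta = spla.lsqr(Dk.T, w)[0]         # min || D_k^T b - w ||, cf. (32)
        h = w - Dkm1 @ alpha - Dk.T @ beta   # harmonic remainder (33)
        return alpha, beta, h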

Algorithm 4. construct_solver($k, M_k, D_{-1}, D_0, \dots, D_N$)

1  $A^0 \Leftarrow D_{k-1}^T M_k D_{k-1}$
2  $D^0_{-1}, \dots, D^0_N \Leftarrow D_{-1}, \dots, D_N$
3  for $l = 0$ to NUM_LEVELS − 1
4      $P^l_0, \dots, P^l_N, D^{l+1}_{-1}, \dots, D^{l+1}_N \Leftarrow$ coarsen_complex($D^l_{-1}, \dots, D^l_N$)
5  end
6  for $l = 0$ to NUM_LEVELS − 1
7      $P^l \Leftarrow$ smooth_prolongator($A^l, P^l_{k-1}$)
8      $A^{l+1} \Leftarrow (P^l)^T A^l P^l$
9  end
10 return MG_solver($A^0, A^1, \dots, A^{\mathrm{NUM\_LEVELS}}, P^0, P^1, \dots, P^{\mathrm{NUM\_LEVELS}-1}$)


‡ The covolume Hodge star is a notable exception.
§ In the case of M = I, the cohomology basis is actually also a homology basis.


Algorithm 5. decompose_special($\omega^k, D_{-1}, D_0, \dots, D_N$)

1  solver1 $\Leftarrow$ construct_solver($k, I, D_{-1}, D_0, \dots, D_N$)
2  solver2 $\Leftarrow$ construct_solver($N-k-1, I, D_N^T, D_{N-1}^T, \dots, D_{-1}^T$)
3
4  $\alpha^{k-1} \Leftarrow$ solver1($D_{k-1}^T\,\omega^k$)
5  $\beta^{k+1} \Leftarrow$ solver2($D_k\,\omega^k$)
6  $h^k \Leftarrow \omega^k - D_{k-1}\alpha^{k-1} - D_k^T\beta^{k+1}$
7
8  return $\alpha^{k-1}, \beta^{k+1}, h^k$

Algorithm 5 demonstrates how the proposed method is used to compute Hodge decompositions
in the special case. Multigrid solvers solver1 and solver2 are constructed for the solution of the
linear systems (31) and (32), respectively. In the latter case, the direction of the chain complex is
reversed when being passed as an argument to construct_solver. As mentioned in Section
2.8, coarsen_complex coarsens the reversed complex with this simple change of arguments.
Using the identity innerproduct, construct_solver applies the proposed method recursively
to produce a progressively coarser hierarchy of tentative prolongators $P^l_k$ and discrete
derivatives $D^l_k$. The tentative prolongators are then smoothed by a user-defined function
smooth_prolongator to produce the final prolongators $P^l$ and Galerkin products $A^{l+1} \Leftarrow
(P^l)^T A^l P^l$. Finally, the matrices $A^0, \dots, A^{\mathrm{NUM\_LEVELS}}$ and $P^0, \dots, P^{\mathrm{NUM\_LEVELS}-1}$ determine the
multigrid cycle in a user-defined class MG_solver. Choices for smooth_prolongator and
MG_solver are discussed in Section 4.

3.2. General case


The multilevel solver outlined in Section 3.1 can be directly applied to linear system (27) by
passing the innerproduct $M_k$, instead of the identity, in the arguments to construct_solver.
However, a different strategy is needed to solve (28), since $M_k^{-1}$ is generally dense and cannot be
formed explicitly. In the following, we outline a method for computing Hodge decompositions in
the general case.
We first remark that if a basis for the space of harmonic k-forms, $\mathcal{H}^k = \mathrm{span}\{h_0^k, h_1^k, \dots, h_H^k\}$,
is known, then the harmonic component of the Hodge decomposition is easily computed by
projecting $\omega^k$ onto the basis elements. Furthermore, since $\alpha^{k-1}$ in (27) can also be obtained, we
can compute the value of the remaining component $(\omega^k - D_{k-1}\alpha^{k-1} - h^k)$, which must lie in the
range of $M_k^{-1} D_k^T M_{k+1}$ due to the orthogonality of the three spaces.
Therefore, the task of computing general Hodge decompositions can be reduced to computing
a basis for $\mathcal{H}^k$. Sometimes, a basis is known a priori. For instance, $\mathcal{H}^0$, which corresponds to
the nullspace of the pure-Neumann problem, is spanned by constant vectors on each connected
component of the domain. Furthermore, if the domain is contractible then $\mathcal{H}^k = \{\}$ for $k>0$.
However, in many cases of interest we cannot assume that a basis for $\mathcal{H}^k$ is known and, therefore,
it must be computed.
Note that decompose_special can be used to determine a harmonic k-form basis for the
identity innerproduct by decomposing randomly generated k-forms until their respective harmonic
components become linearly dependent. We denote this basis $\{\hat h_0^k, \hat h_1^k, \dots, \hat h_m^k\}$ and their span $\hat{\mathcal{H}}^k$.


Using these k-forms, a basis for the harmonic k-forms with innerproduct $M_k$ can be produced by
solving
\begin{align}
D_{k-1}^T M_k D_{k-1}\,\alpha_i^{k-1} &= D_{k-1}^T M_k\,\hat h_i^k \tag{34}\\
h_i^k &= \hat h_i^k - D_{k-1}\alpha_i^{k-1} \tag{35}
\end{align}
It is readily verified that $h_0^k, \dots, h_m^k$ are harmonic
\begin{align}
D_k h_i^k &= D_k\hat h_i^k - D_k D_{k-1}\alpha_i^{k-1} = 0 \tag{36}\\
M_{k-1}^{-1} D_{k-1}^T M_k h_i^k &= M_{k-1}^{-1}\big(D_{k-1}^T M_k\hat h_i^k - D_{k-1}^T M_k D_{k-1}\alpha_i^{k-1}\big) = 0 \tag{37}
\end{align}

since $D_k D_{k-1} = 0$ and $D_k\hat h_i^k = 0$ by assumption. It remains to be shown that $h_0^k, \dots, h_m^k$ are linearly
independent. Supposing $h_0^k, \dots, h_m^k$ to be linearly dependent, there exist scalars $c_0, \dots, c_m$, not all
zero, such that
\begin{align*}
0 &= \sum_{i=0}^{m} c_i h_i^k\\
&= \sum_{i=0}^{m} c_i\big(\hat h_i^k - D_{k-1}\alpha_i^{k-1}\big)\\
&= \sum_{i=0}^{m} c_i\hat h_i^k - \sum_{i=0}^{m} c_i D_{k-1}\alpha_i^{k-1}
\end{align*}
which is a contradiction, since $\big(\sum_{i=0}^{m} c_i\hat h_i^k\big) \in \hat{\mathcal{H}}^k$ is nonzero and $\hat{\mathcal{H}}^k \perp \mathcal{R}(D_{k-1})$. Note that the
harmonic forms $h_0^k, \dots, h_m^k$ are not generally the same as the harmonic components of the random
k-forms used to produce $\hat h_0^k, \dots, \hat h_m^k$.
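The projection onto the harmonic basis mentioned at the start of this subsection is a standard $M_k$-orthogonal projection; a minimal sketch follows, where harmonic_component is a hypothetical helper and H a matrix holding the basis forms as columns.

    import numpy as np

    def harmonic_component(w, H, Mk):
        # M_k-orthogonal projection of w onto the span of the columns of H,
        # the (assumed linearly independent) harmonic basis forms.
        G = H.T @ Mk @ H                         # Gram matrix of the basis
        c = np.linalg.solve(G, H.T @ (Mk @ w))   # expansion coefficients
        return H @ c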

4. NUMERICAL RESULTS

We have applied the proposed method to a number of structured and unstructured problems. In all
cases, a multigrid V(1,1)-cycle is used as a preconditioner for conjugate gradient iteration. Unless
stated otherwise, a symmetric Gauss–Seidel sweep is used during the pre- and post-smoothing stages.
Iteration on the positive-semidefinite systems
\[
D_k^T D_k,\qquad D_k D_k^T,\qquad D_k^T M_{k+1} D_k \tag{38}
\]
proceeds until the relative residual is reduced by a factor of $10^{-10}$. The matrix $D_0^T M_1 D_0$ corresponds to
a Poisson problem with pure-Neumann boundary conditions. Similarly, $D_1^T M_2 D_1$ is an eddy-current
problem (6) with $\sigma = 0$. As explained in Section 3, the matrices (38) arise in discrete Hodge
decompositions.
The multigrid hierarchy extends until the number of unknowns falls below 500, at which point
a pseudoinverse is used to perform the coarse level solve. The tentative prolongators are smoothed


twice with a Jacobi smoother
\begin{align}
S &= I - \frac{4}{3\lambda_{\max}}\,\mathrm{diag}(A)^{-1} A \tag{39}\\
\tilde P &= S S P \tag{40}
\end{align}
where $\lambda_{\max}$ is an upper bound on the spectral radius of $\mathrm{diag}(A)^{-1} A$. When zero or near-zero
values appear on the diagonal of the Galerkin product $\tilde P^T A\tilde P$, the corresponding rows and columns
are zeroed and ignored during smoothing. We discuss this choice of prolongation smoother in
Section 4.1.
Tables I and II show the result of applying the proposed method to regular quadrilateral and
hexahedral meshes of increasing size. In both cases, the finite element spaces described in [10]
are used to produce the innerproducts $M_k$. The systems are solved with a random initial value
for x. Since the matrices are singular, the solution x is an arbitrary null vector. The column labels are
explained as follows:
• 'Grid'—dimensions of the quadrilateral/hexahedral grid.
• 'Convergence'—geometric mean of the residual convergence factors, $\sqrt[N]{\|r_N\|/\|r_0\|}$.
• 'Work/digit'—average operation cost of one digit of residual reduction, in units of nnz(A).¶
• 'Complexity'—total memory cost of the multigrid hierarchy relative to 'System'.
• 'Levels'—number of levels in the multigrid hierarchy.

Table I. Two-dimensional scaling results.

  System             Grid      Unknowns    Convergence  Work/digit  Complexity  Levels
  $D_0^T D_0$        $250^2$   63 001      0.075        8.172       1.636       4
                     $500^2$   251 001     0.100        9.321       1.661       4
                     $1000^2$  1 002 001   0.063        7.866       1.686       5
  $D_1^T D_1$        $250^2$   125 500     0.096        8.370       1.506       4
                     $500^2$   501 000     0.103        8.741       1.527       5
                     $1000^2$  2 002 000   0.085        8.142       1.545       5
  $D_0 D_0^T$        $250^2$   125 500     0.124        9.529       1.530       4
                     $500^2$   501 000     0.133        9.932       1.542       5
                     $1000^2$  2 002 000   0.094        8.550       1.553       5
  $D_1 D_1^T$        $250^2$   62 500      0.063        7.664       1.641       4
                     $500^2$   250 000     0.063        7.758       1.664       4
                     $1000^2$  1 000 000   0.063        7.868       1.687       5
  $D_0^T M_1 D_0$    $250^2$   63 001      0.043        5.894       1.415       4
                     $500^2$   251 001     0.055        6.480       1.432       4
                     $1000^2$  1 002 001   0.041        5.963       1.448       5
  $D_1^T M_2 D_1$    $250^2$   125 500     0.095        8.362       1.506       4
                     $500^2$   501 000     0.103        8.738       1.527       5
                     $1000^2$  2 002 000   0.085        8.140       1.545       5

¶ Including the cost of conjugate gradient iteration.


Table II. Three-dimensional scaling results.

  System             Grid      Unknowns    Convergence  Work/digit  Complexity  Levels
  $D_0^T D_0$        $25^3$    17 576      0.120        7.976       1.268       3
                     $50^3$    132 651     0.151        9.118       1.300       3
                     $100^3$   1 030 301   0.105        7.960       1.358       4
  $D_1^T D_1$        $25^3$    50 700      0.192        10.432      1.296       3
                     $50^3$    390 150     0.216        11.587      1.342       4
                     $100^3$   3 060 300   0.208        11.849      1.415       4
  $D_2^T D_2$        $25^3$    48 750      0.188        9.342       1.156       3
                     $50^3$    382 500     0.218        10.447      1.180       3
                     $100^3$   3 030 000   0.267        12.350      1.217       4
  $D_0 D_0^T$        $25^3$    50 700      0.287        13.323      1.246       3
                     $50^3$    390 150     0.391        17.594      1.235       4
                     $100^3$   3 060 300   0.323        14.811      1.252       4
  $D_1 D_1^T$        $25^3$    48 750      0.187        10.928      1.389       3
                     $50^3$    382 500     0.264        13.855      1.403       4
                     $100^3$   3 030 000   0.194        11.630      1.455       4
  $D_2 D_2^T$        $25^3$    15 625      0.089        7.152       1.302       3
                     $50^3$    125 000     0.102        7.649       1.318       3
                     $100^3$   1 000 000   0.103        7.949       1.368       4
  $D_0^T M_1 D_0$    $25^3$    17 576      0.037        4.804       1.178       3
                     $50^3$    132 651     0.053        5.495       1.200       3
                     $100^3$   1 030 301   0.038        5.054       1.241       4
  $D_1^T M_2 D_1$    $25^3$    50 700      0.097        6.838       1.184       3
                     $50^3$    390 150     0.113        7.461       1.214       4
                     $100^3$   3 060 300   0.088        6.932       1.264       4
  $D_2^T M_3 D_2$    $25^3$    48 750      0.188        9.334       1.156       3
                     $50^3$    382 500     0.223        10.585      1.180       3
                     $100^3$   3 030 000   0.265        12.294      1.217       4

• ‘Complexity’—total memory cost of multigrid hierarchy relative to ‘System’.


• ‘Levels’—number of levels in the multigrid hierarchy.
For each k, the algorithm exhibits competitive convergence factors while maintaining low operator
complexity; as a result, the work per digit of accuracy remains bounded as the problem size
increases.
In Table III, numerical results are presented for the unstructured tetrahedral mesh depicted in
Figure 7. As with classical algebraic multigrid methods, performance degrades in moving from
a structured to an unstructured tessellation. However, the decrease in performance for the scalar
problems D_0^T D_0 and D_0^T M_1 D_0 is less significant than that of the other problems.


Table III. Solver performance on the unstructured tetrahedral mesh in Figure 7.

System          Unknowns   Convergence   Work/digit   Complexity   Levels
D_0^T D_0          84 280   0.073          6.601       1.304        3
D_1^T D_1         554 213   0.378         18.816       1.391        4
D_2^T D_2         920 168   0.366         15.856       1.186        4
D_0 D_0^T         554 213   0.236         19.848       2.289        4
D_1 D_1^T         920 168   0.390         17.068       1.197        4
D_2 D_2^T         450 235   0.370         14.400       1.043        3
D_0^T M_1 D_0      84 280   0.144          8.949       1.304        3
D_1^T M_2 D_1     554 213   0.518         29.428       1.483        4
D_2^T M_3 D_2     920 168   0.348         15.111       1.187        4

Figure 7. Titan IV rocket mesh.

4.1. Prolongation smoother


On the nonscalar problems considered, we found second-degree prolongation smoothers (39)–(40)
noticeably more efficient than first-degree prolongation smoothers. While additional smoothing
operations generally improve the convergence rate of smoothed aggregation methods, this improvement
is typically offset by an increase in operator complexity, so the resultant work per
digit of accuracy is not improved. However, there is an important difference between the tentative
prolongators in the scalar and nonscalar problems. In the scalar case, all degrees of freedom
are associated with a coarse aggregate; therefore, the tentative prolongator has no zero rows.
As described in Section 2.4, the tentative prolongator for nonscalar problems has zero rows for
elements contained in the interior of a nodal aggregate. In the nonscalar case, additional smoothing
operations incorporate a greater proportion of these degrees of freedom into the range of the final
prolongator.

Table IV. Comparison of prolongation smoothers.

System          Grid    Degree   Percent zero   Convergence   Work/digit   Complexity
D_1^T M_2 D_1   250^2   0        66.8           0.697         42.255       1.123
                        1        66.8           0.357         14.774       1.123
                        2        22.9           0.096          8.379       1.506
                        3         0.4           0.063          9.515       2.084
                        4         0.0           0.063         10.188       2.250
D_1^T M_2 D_1   50^3    0        67.6           0.567         25.043       1.034
                        1        66.5           0.290         11.497       1.035
                        2         8.8           0.096          7.460       1.214
                        3         0.3           0.063          9.011       1.577
                        4         0.0           0.063          9.074       1.632
D_2^T M_3 D_2   50^3    0        89.63          0.549         23.670       1.034
                        1        89.63          0.382         14.753       1.034
                        2        63.93          0.214         10.304       1.180
                        3        23.77          0.122          9.203       1.481
                        4         6.48          0.098          8.348       1.487
                        5         2.07          0.089         10.267       1.953
The influence of higher-degree prolongation smoothers on solver performance is reported in
Table IV. Column 'Degree' records the degree d of the prolongation smoother P = S^d P̂, whereas
'Percent zero' reflects the percentage of zero rows in the first-level prolongator. As expected,
the operator complexity increases with smoother degree. However, up to a point, this increase is
less significant than the corresponding reduction in solver convergence. Second-degree smoothers
exhibit the best efficiency in both instances of the problem D_1^T M_2 D_1 and remain competitive
with higher-degree smoothers in the last test. Since work per digit figures exclude the cost of
constructing multigrid transfer operators, these higher-degree smoothers may be less efficient in
practice.
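The 'Percent zero' column of Table IV can be reproduced directly from the tentative prolongator. The sketch below is our own illustration, assuming the prolongation smoother S and the tentative prolongator P̂ are given as SciPy sparse matrices:

```python
import numpy as np

def percent_zero_rows(S, P_tent, degree):
    """Percentage of identically zero rows in P = S^degree * P_tent."""
    P = P_tent.tocsr()
    for _ in range(degree):
        P = (S @ P).tocsr()
    P.eliminate_zeros()                   # drop explicitly stored zeros
    row_nnz = np.diff(P.indptr)           # stored entries per row
    return 100.0 * np.count_nonzero(row_nnz == 0) / P.shape[0]
```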

5. CONCLUSION

We have described an extension of Reitzinger and Schöberl's methodology [14] to higher-dimensional
k-forms with the addition of smoothed prolongation operators. Furthermore, we
have detailed properties of the prolongation operator that arise from this generalized setting.
Specifically, we have identified necessary and sufficient conditions under which commutativity is
maintained. The prolongation operators give rise to a hierarchy of exact finite element sequences.
The generality of the method is appealing since the components are constructed independently of
a particular mimetic discretization. Finally, we have initiated a study of algebraic multigrid for
the Hodge decomposition of general k-forms.


REFERENCES
1. Yee KS. Numerical solution of initial boundary value problems involving Maxwell's equations in isotropic media.
IEEE Transactions on Antennas and Propagation 1966; AP-14(3):302–307.
2. Bossavit A. On the numerical analysis of eddy-current problems. Computer Methods in Applied Mechanics and
Engineering 1981; 27(3):303–318.
3. Arnold DN. Differential complexes and numerical stability. Proceedings of the International Congress of
Mathematicians, Beijing. Plenary Lectures, vol. 1, 2002.
4. de Silva V, Ghrist R. Homological sensor networks. Notices of the American Mathematical Society 2007;
54:10–17.
5. Polthier K, Preuss E. Identifying vector field singularities using a discrete Hodge decomposition. In Visualization
and Mathematics, VisMath, Hege HC, Polthier K (eds). Springer: Berlin, 2002.
6. Tong Y, Lombeyda S, Hirani AN, Desbrun M. Discrete multiscale vector field decomposition. ACM Transactions
on Graphics (Special issue of SIGGRAPH 2003 Proceedings) 2003; 22(3):445–452.
7. Fisher M, Schröder P, Desbrun M, Hoppe H. Design of tangent vector fields. SIGGRAPH ’07: ACM SIGGRAPH
2007 Papers, New York, NY, U.S.A. ACM: New York, 2007; 56.
8. Whitney H. Geometric Integration Theory. Princeton University Press: Princeton, NJ, 1957.
9. Bossavit A. Whitney forms: a class of finite elements for three-dimensional computations in electromagnetism.
IEE Proceedings 1988; 135(Part A(8)):493–500.
10. Bochev PB, Robinson AC. Matching algorithms with physics: exact sequences of finite element spaces. In
Collected Lectures on Preservation of Stability Under Discretization, Chapter 8, Estep D, Tavener S (eds). SIAM:
Philadelphia, PA, 2002; 145–166.
11. Gradinaru V, Hiptmair R. Whitney elements on pyramids. Electronic Transactions on Numerical Analysis 1999;
8:154–168.
12. Hiptmair R. Multigrid method for Maxwell's equations. SIAM Journal on Numerical Analysis 1999; 36(1):
204–225.
13. Arnold DN, Falk RS, Winther R. Multigrid in H (div) and H (curl). Numerische Mathematik 2000; 85(2):197–217.
14. Reitzinger S, Schöberl J. An algebraic multigrid method for finite element discretizations with edge elements.
Numerical Linear Algebra with Applications 2002; 9:223–238.
15. Hu JJ, Tuminaro RS, Bochev PB, Garasi CJ, Robinson AC. Toward an h-independent algebraic multigrid method
for Maxwell’s equations. SIAM Journal on Scientific Computing 2006; 27:1669–1688.
16. Jones J, Lee B. A multigrid method for variable coefficient Maxwell's equations. SIAM Journal on Scientific
Computing 2006; 27(5):1689–1708.
17. Vaněk P, Mandel J, Brezina M. Algebraic multigrid by smoothed aggregation for second and fourth order elliptic
problems. Computing 1996; 56(3):179–196.
18. Muhammad A, Egerstedt M. Control using higher order Laplacians in network topologies. Proceedings of the 17th
International Symposium on Mathematical Theory of Networks and Systems, Kyoto, Japan, 2006; 1024–1038.
19. Adams M, Brezina M, Hu J, Tuminaro R. Parallel multigrid smoothing: polynomial versus Gauss–Seidel. Journal
of Computational Physics 2003; 188(2):593–610.
20. Hirani AN. Discrete exterior calculus. Ph.D. Thesis, California Institute of Technology, May 2003.
21. Frankel T. The Geometry of Physics: An Introduction (2nd edn). Cambridge University Press: Cambridge, 2004.
22. Bochev PB, Hyman JM. Principles of mimetic discretizations of differential operators. In Compatible Spatial
Discretizations, Arnold DN, Bochev PB, Lehoucq RB, Nicolaides RA, Shashkov M (eds). The IMA Volumes in
Mathematics and its Applications, vol. 142. Springer: Berlin, 2006; 89–119.
23. Dodziuk J. Finite-difference approach to the Hodge theory of harmonic forms. American Journal of Mathematics
1976; 98(1):79–104.

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:187–200
Published online 7 December 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.563

A fast full multigrid solver for applications in image processing

M. Stürmer∗, † , H. Köstler and U. Rüde


Department of Computer Science 10, University of Erlangen-Nuremberg,
Cauerstrasse 6, 91058 Erlangen, Germany

SUMMARY
We present a fast, cell-centered multigrid solver and apply it to image denoising and non-rigid diffusion-
based image registration. In both applications, real-time performance is required in 3D and the multigrid
method has to be compared with solvers based on fast Fourier transform (FFT). The optimization of the
underlying variational approach leads, for image denoising, directly to one time step of a parabolic linear
heat equation; for image registration, a nonlinear second-order system of partial differential equations is
obtained. This system is solved by a fixed-point iteration using a semi-implicit time discretization, where
each time step again results in an elliptic linear heat equation. The multigrid implementation comes close
to real-time performance for medium-size medical images in 3D for both applications and is compared
with a solver based on FFT using available libraries. Copyright © 2007 John Wiley & Sons, Ltd.

Received 15 May 2007; Accepted 21 September 2007

KEY WORDS: multigrid; performance optimization; FFT; image processing; image registration; image
denoising

1. INTRODUCTION

In recent years, data sizes in image-processing applications have increased drastically due to
improved image acquisition systems. Modern computer tomography (CT) scanners can create
volume data sets of 5123 voxels or more [1, 2]. However, users expect real-time image manipulation
and analysis. Thus, fast algorithms and implementations are needed to fulfill these tasks.
Many image-processing problems can be formulated in a variational framework and require
the solution of a large, sparse, linear system arising from the discretization of partial differential

∗ Correspondence to: M. Stürmer, Department of Computer Science 10, University of Erlangen-Nuremberg,
Cauerstrasse 6, 91058 Erlangen, Germany.
† E-mail: markus.stuermer@informatik.uni-erlangen.de

Contract/grant sponsor: Deutsche Forschungsgemeinschaft (German Science Foundation); contract/grant number: Ru 422/7-1, 2, 3
Contract/grant sponsor: Bavarian KONWIHR supercomputing research consortium



equations (PDEs). Often these PDEs are inherently based on some kind of diffusion process. In
simple cases, it is possible to use fast Fourier transform (FFT)-based techniques to solve these
PDEs that are of complexity O(n log n). The FFT algorithm was introduced in 1965 by Cooley and
Tukey [3]; for an overview of Fourier transform methods, we refer e.g. to [4–6]. As an alternative,
multigrid methods are more general and can reach an asymptotically optimal complexity of O(n).
For discrete Fourier transforms, flexible and highly efficient libraries optimized for special
CPU architectures such as the FFTW library [7] or the Intel Math Kernel Library (MKL) [8]
are available. However, we are currently not aware of similarly tuned multigrid libraries in 3D;
for 2D problems we know only of DiMEPACK [9]. The purpose of this paper is to close this gap
by implementing a multigrid solver optimized especially for the Intel x86 architecture that is
competitive with highly optimized FFT libraries, and by applying it to typical applications in the area of image
processing.
The outline of this paper is as follows: We describe the multigrid scheme including some
results on its convergence and discuss some implementation and optimization issues in Section 2.
Then, the variational approaches used for image denoising and non-rigid diffusion registration are
introduced in Section 3. Finally, we compare computational times of our multigrid solver and the
FFTW package as obtained for image denoising and non-rigid registration of medical CT images.

2. MULTIGRID

For a comprehensive overview on multigrid methods we refer to, e.g. [10–15]. In this paper, we
implement a multigrid solver for the linear heat equation
    ∂u/∂t (x, t) − Δu(x, t) = f(x),   u(x, 0) = u_0(x)                          (1)

with time t ∈ R^+, u, f : Ω ⊂ R^3 → R, x ∈ Ω, initial solution u_0 : Ω ⊂ R^3 → R and homogeneous
Neumann boundary conditions. Note that in practice u(x, t) is often computed for a finite t
only, and that the solution tends to that of the well-known Poisson equation in the limit t → ∞. We
discretize (1) with finite differences

    (u_h(x, τ) − u_0(x)) / τ − Δ_h u_h(x, τ) = f_h(x)                            (2)

on a regular grid Ω_h with mesh size h and time step τ. Δ_h denotes the well-known 7-point stencil
for the Laplacian. We consider in the following only a single time step, where we have to solve
the elliptic equation

    (I − τ Δ_h) u_h(x, τ) = τ f_h(x) + u_0(x)                                    (3)
In this paper, we are dealing with image-processing problems, where we can think of the discrete
voxels located in the cell centers. Therefore, we have chosen to use a cell-centered multigrid scheme
with constant interpolation and 8-point restriction. Note that this combination of intergrid transfer
operators will lead to multigrid convergence rates significantly worse than what could be ideally
obtained [15, 16]. This will be shown by local Fourier analysis (LFA) and numerical experiments.
However, this leads to a relatively simple algorithm that satisfies our numerical requirements and
is quite suitable for a careful machine-specific performance optimization. For relaxation we choose


an ω-Red–Black Gauss–Seidel smoother (ω-RBGS) using ω = 1.15, which is known to be a better
choice in 3D for the given problem than simple Gauss–Seidel relaxation [13, 17].
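As an illustration of this choice, the two intergrid transfer operators reduce to a few lines of array arithmetic. The following numpy fragment is a minimal sketch of ours, assuming even grid dimensions and ignoring the padding and SIMD layout discussed below:

```python
import numpy as np

def restrict_8pt(r):
    """8-point restriction: each coarse cell receives the average of its
    2x2x2 fine-grid children (grid dimensions assumed even)."""
    return 0.125 * sum(r[i::2, j::2, k::2]
                       for i in (0, 1) for j in (0, 1) for k in (0, 1))

def interpolate_const(e):
    """Constant interpolation: each coarse value is copied into its eight
    fine-grid children."""
    return np.repeat(np.repeat(np.repeat(e, 2, axis=0), 2, axis=1), 2, axis=2)
```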

2.1. Efficient multigrid implementation


This section describes our multigrid implementation. All floating point calculations are done with
single precision (four bytes per value), as this accuracy is already far beyond that of the source
image data. Performance of multigrid implementations can be improved significantly if code
optimization techniques are used as shown in [18–21]. In this paper we will focus on the x86
processor architecture, since it is currently the most common desktop PC platform.

2.1.1. Memory layout. Best performance on current x86 processors can be achieved by using the
SIMD (single instruction multiple data) unit, which was introduced to the architecture in 1999
with the Pentium III as streaming SIMD extension (SSE). These instructions perform vector-
like operations on units of 16 bytes, which can be seen as a SIMD vector data type containing
four single precision floating point numbers in our case. Operating on naturally aligned (i.e. at
addresses that are multiples of their size) SIMD vectors, the SSE unit provides high bandwidth, especially
to the caches. Consequently, the memory layout must support as many aligned data accesses in
all multigrid components as possible. To enable efficient handling of the boundary conditions, we
chose to explicitly store boundary points around the grid; by copying the outer unknowns before
smoothing or calculating the point-wise residuals, we need no special handling of the homogeneous
Neumann boundary conditions. The first unknown of every line is further aligned to a multiple
of 16 bytes by padding, i.e. filling up the line with unused values up to a length of multiples of
four. This enables SIMD processing for any line length, as boundary values, which are generated
just-in-time, and the padding area can be overwritten with fake results.

2.1.2. SIMD-aware implementation. Unfortunately, current compilers fail to generate SIMD


instruction code from a scalar description in most real-world programs. The SIMD unit can be
programmed in assembly language but, to keep the code portable and maintainable,
our C++ implementation uses compiler intrinsics, which extend the programming language with
assembly-like instructions for SIMD vector data types.
Implementing the RBGS relaxation in SIMD is not straightforward, as only red or black points
must be updated, while every SIMD vector contains two values of each color. The idea of the
SIMD-aware RBGS is to first calculate a SIMD vector of relaxed values, like for a Jacobi method.
Subsequently, a SIMD multiplication with appropriately initialized SIMD registers is performed
such that every other value is preserved and the remaining ones are relaxed, which can be illustrated as

    u_new(x,   y) = (1−ω) · u_old(x,   y) + ω · u_relax(x,   y)
    u_new(x+1, y) =   1   · u_old(x+1, y) + 0 · u_relax(x+1, y)
    u_new(x+2, y) = (1−ω) · u_old(x+2, y) + ω · u_relax(x+2, y)
    u_new(x+3, y) =   1   · u_old(x+3, y) + 0 · u_relax(x+3, y)
The better internal and external bandwidths of SIMD over the scalar floating point unit lead to
a real performance gain, even if we actually double the number of floating point operations.
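In scalar numpy code, the blend above can be mimicked as follows (a minimal sketch of the idea only; the actual implementation operates on 16-byte SSE registers via intrinsics):

```python
import numpy as np

omega = 1.15
keep = np.tile([1.0 - omega, 1.0], 2)     # (1-w, 1, 1-w, 1)
take = np.tile([omega, 0.0], 2)           # (w,   0, w,   0)

def blend(u_old4, u_relax4):
    """One 4-wide 'SIMD vector': relax the red entries with weight omega,
    keep the black entries unchanged (cf. the identity above)."""
    return keep * u_old4 + take * u_relax4

print(blend(np.array([1.0, 2.0, 3.0, 4.0]),
            np.array([1.5, 2.5, 3.5, 4.5])))   # [1.575, 2.0, 3.575, 4.0]
```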
The cell-centered approach is advantageous especially for restriction and interpolation. Coars-
ening is done by averaging eight neighboring fine grid residuals, where every fine grid residual


contributes only to a single coarse grid point. Hence, calculation of the residual and its restriction
can be done in SIMD and without storing residuals to memory. The idea is to compute four SIMD
registers containing residuals from four neighboring lines and averaging them into a single SIMD
vector first. Its values are reordered by special shuffle instructions, so that two coarse grid right-
hand side values can be generated by averaging its first and second, and its third and fourth values.
By reusing some common expressions, this can be further simplified. The constant interpolation
can also be executed very efficiently in the SIMD unit with shuffle operations.
Additionally, the loops are unrolled and the instructions scheduled carefully by hand to support
the compiler in producing fast code.

2.1.3. Blocking and fusion of components. SIMD optimization is most useful when combined with
techniques to enhance spatial and temporal data locality developed in [20, 22–24] and to exploit
the higher bandwidth of the caches. For smaller grids the post-smoother uses a simple blocking
method as illustrated in Figure 1(I): After preparing the first boundary (I(a)), it continues after



the red update in line y, z immediately with the black update in line y, z −1 (I(b)) through the
whole grid (I(c)) and finishes the sweep with a black update in the last plane (I(d)). As long as
data from the last block can be held in the cache hierarchy, the solution and right-hand side grid
must be transferred from and to memory only once. For larger grids this is not possible anymore
and another blocking level must be introduced as illustrated in Figure 1(II): the grid is then divided
in the x–z direction, and every resulting super-block is processed in a similar manner as in
the simple case, but the red update in line y, z is followed by the black update in line y −1, z to
respect data dependencies between two super-blocks. Therefore, the first and last super-blocks need
a special boundary handling (II(a–d)). This two-fold blocking method is slightly less effective,
since the super-blocks overlap and some values are read from main memory twice. The optimal
super-block height depends on the cache size and the line length.
The pre-smoother extends these blocking methods further by fusing the smoothing step with
calculation and restriction of the residuals. For smaller grids, the simpler blocking method working
on whole planes (I) is extended: the right-hand side values of the coarser grid plane z are computed
immediately after smoothing in the planes 2z and 2z +1 is done. This leads to a slightly more
complex handling at the first and last planes. For larger planes, however, super-blocks must be
used again as depicted in Figure 1(III).

Figure 1. Illustration of the different blocking methods on a 10×10×10 cube. (I) Simple plane blocking
of one RBGS update: (a) initial boundary handling; (b) first block; (c) blocking complete; and (d) final
boundary handling. (II) Super-blocking of one RBGS update: (a) first sub-block of first super-block;
(b) first super-block complete; (c) middle super-block complete; and (d) last super-block complete.
(III) Super-blocking of one RBGS update fused with calculation of residual and restriction: (a) initial
boundary handling; (b) first sub-block of first super-block; (c) first super-block complete; and (d) only
final boundary handling missing.
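The loop structure of the simple plane blocking (I) can be sketched as follows (our Python rendering with hypothetical `update_red`/`update_black` callbacks; the real code additionally fuses the boundary handling and works on SIMD vectors):

```python
def rbgs_plane_blocked(update_red, update_black, nz):
    """Sweep order of the simple plane blocking (Figure 1(I)): the black
    update of plane z-1 directly follows the red update of plane z, so the
    data of plane z-1 is still cached; only the first red and the last
    black plane need separate boundary handling."""
    update_red(0)                              # (a) initial boundary handling
    for z in range(1, nz):                     # (b)-(c) march through the grid
        update_red(z)
        update_black(z - 1)
    update_black(nz - 1)                       # (d) final boundary handling

# toy run with stub updates, printing the interleaved plane order
rbgs_plane_blocked(lambda z: print("red", z), lambda z: print("black", z), 4)
```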

2.2. Convergence rates


The asymptotic convergence rates of our algorithm are evaluated by a power iteration for Equation
(3), i.e. setting the right-hand side f_h and u_0 to zero and scaling the discrete L_2-norm of the
solution u_h to 1 after each multigrid V-cycle iteration step. Table I shows asymptotic convergence
rates (after 100 iterations) for different sizes. These values refer to the case when (1)
degenerates to the Poisson equation, simulated by setting τ = 10^{30}. As expected, the convergence
rates are even better for finite and smaller τ. We compare these results with LFA predictions
computed by the lfa package [14] in Table II, again for the case of Poisson's equation. This confirms
our observation that, due to the constant interpolation, the asymptotic convergence rates get worse
for smaller mesh sizes. Note that using a simple RBGS smoother by setting ω = 1 leads to a worse
asymptotic convergence factor.

Table I. Asymptotic convergence rates for different time steps, measured experimentally with
mesh size h = 1.0 on the finest grid and one grid point on the coarsest level.

Size    V(1, 1)   V(2, 2)
64^3    0.27      0.07
128^3   0.29      0.07
256^3   0.31      0.07
512^3   0.34      0.07

Note: For τ → ∞, this results effectively in the Poisson equation.

Table II. Smoothing factor μ and three-grid asymptotic convergence factor ρ(M^{3L})
for size 64^3 and τ = 10^{30} obtained by LFA.

                       V(1, 1)             V(2, 2)
Interpolation   ω      μ(S)   ρ(M^{3L})    μ(S)   ρ(M^{3L})
Constant        1.0    0.20   0.47         0.04   0.07
Constant        1.15   0.08   0.20         0.04   0.06
LIN             1.15   0.08   0.10         0.04   0.06

Note: Settings are equivalent to Table I.
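The power iteration used for these measurements can be sketched generically (our code; `vcycle(u, b)` is a hypothetical stand-in for one multigrid cycle applied to the current iterate):

```python
import numpy as np

def asymptotic_rate(vcycle, n, n_iter=100, seed=0):
    """Power iteration on the error propagation operator: start from a
    random vector, iterate the cycle on a zero right-hand side, and
    renormalize; the norm before renormalization converges to the
    asymptotic convergence rate."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    b = np.zeros(n)
    rate = 1.0
    for _ in range(n_iter):
        u = vcycle(u, b)
        rate = np.linalg.norm(u)          # ||e_new||, since ||e_old|| = 1
        u /= rate
    return rate

# toy check: damped Jacobi on a 1D Poisson matrix as a stand-in "cycle"
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
M = np.eye(n) - 0.3 * A                   # error propagator, omega = 0.6
print(asymptotic_rate(lambda u, b: M @ u, n))   # ~0.999 for this poor solver
```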

2.3. Performance results


Next we discuss performance results measured on two different test platforms. As reference we
present run time for a forward and backward FFT used for periodic boundary conditions and





discrete cosine transform (DCT) used for Neumann boundary conditions, respectively. This does
not contain the time necessary for actually solving the problem in Fourier space as described in
Section 3, which is highly dependent on the code quality. For our applications, the accuracy of
a simple FMG-V(1, 1) or even a simple V(1, 1)-cycle is often sufficient, as will be explained
in Section 3. On both platforms, we compare the performance of our code-optimized multigrid
implementation with the performance of the well-known FFTW package [25] (version 3.1.2).
The first test platform is an AMD Opteron 248 cluster node. The CPUs run at 2.2 GHz,
provide a 1 MB unified L2 and 64 kB L1 data cache, and are connected to DDR-333 memory. For
this platform, the GNU C and C++ compiler (version 4.1.0 for 64-bit environment) was used.
Measurements (see Table III) show that a full multigrid with V(1, 1)-cycles can outperform
FFTW's FFTs and is much faster than its DCTs even with V(2, 2)-cycles.

Table III. Wallclock times in ms for FFT (real type, out of place, forward and backward)
and the optimized multigrid on an AMD Opteron 248 2.2 GHz cluster node.

Size   V(1, 1)   FMG V(1, 1)   FMG V(2, 2)   FFT (FFTW)   DCT (FFTW)
32     0.63      0.80          1.38          0.85         2.27
64     6.97      9.55          14.9          10.4         19.1
128    56.0      78.7          122           107          197
256    445       622           976           992          2024
512    3669      5175          7943          9274         67 766
The second test platform is an Intel Core2 Duo (Conroe) workstation. The CPU runs at 2.4 GHz,
both cores have an L1 data cache of 16 kB, share 4 MB of unified L2 cache and are connected to
DDR2-667 memory. For this platform, the Intel 64 compiler suite (version 9.1) was used. We also


present results for a beta version of the Intel MKL [8] (version 9.0.06 beta for 64-bit environment),
which provides an FFTW-compatible interface for FFTs through wrapper functions, but no DCT
functions at all.
Although a slightly different instruction scheduling, more suitable for this CPU type, is used,
all multigrid variants are slower than the FFTs of FFTW and the MKL at smaller problem sizes
on this platform (see Table IV), and the FMG with V(2, 2)-cycles is slower even at all problem sizes.
Again, the DCTs take much more time than the code-optimized multigrid at all problem sizes
tested.

Table IV. Wallclock times in ms for FFT (real type, out of place, forward and backward)
and the optimized multigrid on an Intel Core2 Duo 2.4 GHz (Conroe) workstation.

Size   V(1, 1)   FMG V(1, 1)   FMG V(2, 2)   FFT (FFTW)   DCT (FFTW)   FFT (MKL)
32     0.43      0.55          0.93          0.40         1.43         0.71
64     3.33      4.29          7.12          3.73         12.2         5.27
128    31.6      44.1          68.3          50.4         123          45.8
256    264       370           574           473          1246         401
512    2168      3026          4699          4174         11 067       3510

3. VARIATIONAL APPROACHES IN IMAGE PROCESSING

Variational approaches in image processing are often considered too slow for real-time appli-
cations, especially in 3D. Nevertheless, they are attractive due to their flexibility and the quality
of the results, see e.g. [1, 26–31]. In the following, we introduce two very simple variational
prototype problems. Most of the more complicated image-processing tasks consist of extensions
of these approaches that include, e.g. introducing local anisotropy in the PDEs. The reason why
we restrict ourselves to these simple approaches is that they can be solved by FFT-based methods
and by multigrid and they are therefore good benchmark problems to test the best possible speed
of variational image-processing methods.

3.1. Image denoising


The task of image denoising is to remove the noise from a given d-dimensional image u_0 :
Ω ⊂ R^d → R. One simple variational approach, based on Tikhonov regularization [32], is to minimize the
functional

    E_1(u) = ∫_Ω (u_0 − u)^2 + α|∇u|^2 dx                                        (4)

with x ∈ R^d and α ∈ R^+ over the image domain Ω ⊂ R^d. A necessary condition for a minimizer
u : Ω → R, the denoised image, is characterized by the Euler–Lagrange equation

    u − u_0 − αΔu = 0                                                            (5)

with homogeneous Neumann boundary conditions. This is equivalent to (3) with f_h = 0 and τ = α.
In an infinite domain, an explicit solution is given by

    u(x, t) = ∫_{R^d} G_{√(2t)}(x − y) u_0(y) dy = (G_{√(2t)} ∗ u_0)(x)          (6)

where the operator ∗ denotes the convolution of the grid function u_0 with the Gaussian kernel

    G_σ(x) = (2πσ^2)^{−d/2} e^{−|x|^2/(2σ^2)}                                    (7)

with standard deviation σ ∈ R^+. This is equivalent to applying a low-pass filter and can be
transformed into Fourier space, where a convolution corresponds to a multiplication of the


transformed signals. If we denote the Fourier transform of a signal f : R^d → R by F[f] and use

    F[G_σ](w) = e^{−|w|^2 σ^2 / 2},   w ∈ R^d

it follows that

    F[G_σ ∗ u_0](w) = e^{−|w|^2 σ^2 / 2} F[u_0](w)                               (8)

Summarizing, we have three choices to compute the denoised image:

1. the convolution of the image with a discrete version of the Gaussian kernel (7),
2. the use of an FFT to solve (8) (a minimal sketch follows this list), or
3. the application of a multigrid method to (3).
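A minimal sketch of choice 2 is given below (our code, not the FFTW-based implementation benchmarked in Section 2.3). It assumes h = 1 and periodic boundaries and solves the discretized Equation (3) with f_h = 0 directly in Fourier space by dividing by the symbol of the 7-point operator; the variant described in the text multiplies with the Gaussian (8) instead, which differs only in the choice of multiplier.

```python
import numpy as np

def denoise_fft(u0, tau):
    """Solve (I - tau*Lap_h) u = u0 (Equation (3) with f_h = 0, h = 1) under
    periodic boundary conditions: one 3D FFT, a division by the symbol of
    the 7-point operator, and one inverse FFT."""
    freqs = [2.0 * np.pi * np.fft.fftfreq(n) for n in u0.shape]
    # symbol of -Lap_h: sum over the axes of 2 - 2*cos(k) = 4*sin^2(k/2)
    sym = (4.0 * np.sin(freqs[0] / 2) ** 2)[:, None, None] \
        + (4.0 * np.sin(freqs[1] / 2) ** 2)[None, :, None] \
        + (4.0 * np.sin(freqs[2] / 2) ** 2)[None, None, :]
    return np.real(np.fft.ifftn(np.fft.fftn(u0) / (1.0 + tau * sym)))
```

For the denoising experiment below, this would be called as `denoise_fft(u0, 1.21)` on the noisy volume.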
In the first two methods, we extend the image symmetrically and use periodic boundary conditions,
while we assume homogeneous Neumann boundary conditions for the third method. In most
applications, applying a filter mask to the image constructed from a discrete version of the Gaussian
kernel (7) is an easy and efficient way to denoise the image. However, if a large σ (and thus
a large t) is required, the filter masks become large and computationally inefficient. To show this,
we add Gaussian noise to a rendered 3D MRI image (size 256×256×160) of a human head (see
Figure 2) and filter it using masks of sizes 5×5×5 and 3×3×3. We apply the masks in each
direction separately to the image, but do not decompose them as described in [28] to speed up the
computation further.
Then we use our cell-based multigrid method to solve (3) for τ = 1.21. Figure 2 shows the
resulting blurred volume. Larger time steps would blur image edges too much. Runtimes for
different methods measured on the AMD Opteron platform described in Section 2.3 are shown
in Table V. Times for FFT-based denoising include applying (8) besides forward and backward
transforms. The multiplication with the exponential was not optimized and took about 50% of the
time.
Note that the Laplacian has very strong isotropic smoothing properties and does not preserve
edges. Therefore, in practice, model (4) is not used to restore deteriorated images, but to presmooth
the image, e.g. in order to ensure a robust estimation of the image gradient.
Next, we turn to another prototype problem in image processing that involves also the solution
of several problems of type (3).

Figure 2. Rendered 3D MRI image with added Gaussian noise (σ = 10) (left) and after denoising
(right) using a V(1, 1)-cycle of the cell-centered multigrid method.


Table V. Runtime for denoising a 3D MRI image (size 256×256×160) of a human
head with added Gaussian noise, measured on the AMD Opteron platform.

Method                                  Runtime (ms)
Filtering with a mask of size 5×5×5     1200
Filtering with a mask of size 3×3×3      680
FMG-V(1, 1)                              390
FFT                                     1140

3.2. Non-rigid image registration


The task of image registration is to align two or more images from the same or different modalities
[33, 34]. We consider here only mono-modal registration. This requires finding a suitable spatial
transformation such that a transformed image becomes similar to another one, see e.g. [29, 35–39].
This deformation is independent of the motion of the object, e.g. a rotation. For image registration,
two d-dimensional images are given by

    T, R : Ω ⊂ R^d → R                                                           (9)

where T and R are the template image and reference image, respectively, and Ω is the image domain.
The task of non-rigid image registration is to find a transformation φ(x) such that the deformed
image T(φ_u(x)) can be matched to the image R(x). The transformation is defined as

    φ_u(·) : R^d → R^d,   φ_u(x) := x − u(x),   x ∈ Ω

where the displacement u(x) : R^d → R^d, u = (u_1, ..., u_d), is a d-dimensional vector field.
Mathematically, we again use a variational approach to minimize the energy functional

    E_2(u) = ∫_Ω (T(x − u(x)) − R(x))^2 + α Σ_{l=1}^{d} ‖∇u_l‖^2 dx              (10)

that consists of two parts. The first term (T(x − u(x)) − R(x))^2 is a distance measure that evaluates
the similarity of the two images. Here, we restrict ourselves to the sum of squared differences
(SSD) as represented in the integral in (10). When discretized, this results in a point-wise ‘least-
squares’ difference of gray values. The second term, the regularizer, controls the smoothness or
regularity of the transformation. In the literature, many different regularizers have been discussed [29].
We restrict ourselves here to the so-called diffusion regularizer Σ_{l=1}^{d} ‖∇u_l‖^2 [35]. By choosing
different parameters α ∈ R^+, one can control the relative weight of the two terms in the functional
[40, 41].
The optimization of the energy functional results in nonlinear Euler–Lagrange equations

    ∇T(x − u(x)) (T(x − u(x)) − R(x)) + αΔu = 0                                  (11)

with homogeneous Neumann boundary conditions that can be discretized by finite differences
on a regular grid Ω_h with mesh size h. To treat the nonlinearity, an artificial time is often introduced:

Copyright q 2007 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:187–200
DOI: 10.1002/nla
196 M. STÜRMER, H. KÖSTLER AND U. RÜDE


    ∂_t u(x, t) − αΔu(x, t) = ∇T(x − u(x, t)) (T(x − u(x, t)) − R(x))            (12)

which is discretized by a semi-implicit scheme with a discrete time step τ, where the nonlinear
term is evaluated at the old time level:

    (u_h^{k+1} − u_h^k)/τ − αΔ_h u_h^{k+1} = ∇_h T(x − u_h^k) (T(x − u_h^k) − R(x))    (13)

The complete image registration scheme is summarized in Algorithm 1.

Algorithm 1. Image registration scheme.
1. Set u^0; f^0 = ∇_h T(u^0) (T(u^0) − R)
2. for k = 0 to k_max do
3.   Compute f^k = ∇_h T(u^k) (T(u^k) − R)
4.   Update α := γ_α · α, τ := γ_τ · τ if necessary
5.   Compute r^k = τ f^k + u^k
6.   Solve (I − τα Δ_h) u^{k+1} = r^k
7. end for

Note that in each time step,
line 6 of Algorithm 1 requires the solution of d decoupled scalar linear heat equations of type (3).
This can be accomplished by the same multigrid algorithms as for the image denoising in the last
section. To minimize the number of time steps, we use a technique described in [42] to adapt the
parameters α and τ. The idea is to start with large α and τ (we use α = 1000, τ = 10), penalizing
higher oscillations in the solution and preferring global transformations, and then to decrease the
parameters by factors γ_α = 0.1 and γ_τ = 0.5 when the improvement of the SSD stagnates. Note
that for small α the transformations are localized and sensitive to discontinuities or noise in the
images. The development of the relative SSD error for an image registration example is found
in Figure 3. As initial deformation for the first time step we take an interpolated solution of the

image registration from the next coarser grid, which explains that the initial relative SSD error
is below 1.0. The bends in the curve arise when adapting α and τ. Figure 4 shows slices of
the corresponding medical data sets and the registration result. For medical applications, it is
not always useful to drive the registration problem to a very small SSD; rather, the
topology of the medical data should be maintained. Table VI summarizes the runtimes for different methods to solve (13).

Figure 3. Relative SSD error for image registration over time.

Figure 4. Slice of the reference image (upper left), template image (upper right), distance image
T_k − R (lower left) and registered image (lower right).
A whole time step in the registration algorithm including three linear solves and the computation
of the new right-hand side and the SSD error takes 1.4 s. Starting with an FMG-V(2, 1) for the
first iterations, it is sufficient to perform an FMG-V(1, 1) after time steps become smaller without
losing any accuracy in the solution. The DCT-based implementation is described, e.g. in [29]. Here
about 65% of the time was spent to compute the forward and backward transforms, the rest for the
non-optimized multiplication with the inverse eigenvalues. Note that in practice sometimes also



an additive operator splitting (AOS) scheme is used to solve the registration problem [29, 43]. It
is fast, but the time step has to be chosen sufficiently small [29].

Table VI. Runtime for one linear solve in one time step of the image
registration algorithm for an image of size 256×256×160.

Method         Runtime (ms)
FMG-V(2, 2)    608
FMG-V(2, 1)    499
FMG-V(1, 1)    390
DCT            2107
AOS            1971
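A compact transliteration of Algorithm 1 reads as follows (our sketch; `warp`, `grad_T` and `solve_heat` are hypothetical helpers for image warping, image gradients and the multigrid solution of (I − ταΔ_h)u = r, and the stagnation test is a stand-in for the criterion of [42]):

```python
import numpy as np

def register(T, R, warp, grad_T, solve_heat, alpha=1000.0, tau=10.0,
             n_steps=80, gamma_alpha=0.1, gamma_tau=0.5):
    """Sketch of Algorithm 1: semi-implicit time stepping for (13)."""
    u = [np.zeros(T.shape) for _ in range(T.ndim)]   # displacement u^0 = 0
    ssd_prev = np.inf
    for k in range(n_steps):
        Tu = warp(T, u)                              # T(x - u^k(x))
        f = [g * (Tu - R) for g in grad_T(Tu)]       # force term of (13)
        ssd = np.sum((Tu - R) ** 2)
        if ssd >= 0.999 * ssd_prev:                  # SSD improvement stagnates:
            alpha *= gamma_alpha                     # relax regularization ...
            tau *= gamma_tau                         # ... and shrink the time step
        ssd_prev = ssd
        # line 6: d decoupled scalar heat-equation solves, one per component
        u = [solve_heat(tau * fl + ul, tau * alpha) for fl, ul in zip(f, u)]
    return u
```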

4. CONCLUSIONS AND FURTHER WORK

A fast cell-based full multigrid implementation for variational image-processing problems is shown
to be highly competitive in terms of computing times with alternative techniques such as approaches
using FFT-based algorithms. However, this requires a careful machine-specific code optimization.
Next, this first step has to be extended to an arbitrary number of grid points in each direction
and to anisotropic or nonlinear diffusion models. Furthermore, we consider parallelization of the
optimized multigrid solver.

ACKNOWLEDGEMENTS
This research is being supported in part by the Deutsche Forschungsgemeinschaft (German Science
Foundation), projects Ru 422/7-1, 2, 3 and the Bavarian KONWIHR supercomputing research consortium
[44, 45].

REFERENCES
1. Jain AK. Fundamentals of Digital Image Processing. Prentice-Hall: Englewood Cliffs, NJ, U.S.A., 1989.
2. Oppenheim A, Schafer R. Discrete-time Signal Processing. Prentice-Hall: Englewood Cliffs, NJ, U.S.A., 1989.
3. Cooley J, Tukey J. An algorithm for the machine computation of the complex Fourier series. Mathematics of
Computation 1965; 19:297–301.
4. Duhamel P, Vetterli M. Fast Fourier transforms: a tutorial review and a state of the art. Signal Processing 1990;
19:259–299.
5. Rader CM. Discrete Fourier transforms when the number of data samples is prime. Proceedings of the IEEE
1968; 56:1107–1108.
6. Pennebaker W, Mitchell J. JPEG: Still Image Data Compression Standard. Van Nostrand Reinhold: New York,
1993.
7. Frigo M, Johnson S. FFTW: an adaptive software architecture for the FFT. Proceedings of the International
Conference on Acoustics, Speech, and Signal Processing, Seattle, WA, U.S.A., vol. 3, 1998; 1381–1384.
8. MKL. http://www.intel.com/cd/software/products/asmo-na/eng/perflib/mkl/.
9. Kowarschik M, Weiß C, Rüde U. DiMEPACK—a cache-optimized multigrid library. In Proceedings of the
International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2001),
vol. I, Las Vegas, NV, U.S.A., Arabnia HR (ed.). CSREA Press: Irvine, CA, U.S.A., 2001; 425–430.
10. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977;
31(138):333–390.


11. Hackbusch W. Multi-grid Methods and Applications. Springer: Berlin, Heidelberg, New York, 1985.
12. Briggs W, Henson V, McCormick S. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, U.S.A., 2000.
13. Trottenberg U, Oosterlee C, Schüller A. Multigrid. Academic Press: San Diego, CA, U.S.A., 2001.
14. Wienands R, Joppich W. Practical Fourier analysis for multigrid methods. Numerical Insights, vol. 5. Chapman
& Hall/CRC Press: Boca Raton, FL, U.S.A., 2005.
15. Wesseling P. Multigrid Methods. Edwards: Philadelphia, PA, U.S.A., 2004.
16. Mohr M, Wienands R. Cell-centred multigrid revisited. Computing and Visualization in Science 2004; 7(3):
129–140.
17. Yavneh I. On red–black SOR smoothing in multigrid. SIAM Journal on Scientific Computing 1996; 17(1):180–192.
18. Barkai D, Brandt A. Vectorized multigrid Poisson solver for the CDC CYBER 205. Applied Mathematics and
Computation 1983; 13(3–4):215–228. (Special Issue, Proceedings of the First Copper Mountain Conference on
Multigrid Methods, Copper Mountain, CO, McCormick S, Trottenberg U (eds).)
19. Kowarschik M, Rüde U, Thürey N, Weiß C. Performance optimization of 3D multigrid on hierarchical memory
architectures. Proceedings of the 6th International Conference on Applied Parallel Computing (PARA 2002),
Lecture Notes in Computer Science, vol. 2367. Springer: Berlin, Heidelberg, New York, 2002; 307–316.
20. Kowarschik M. Data Locality Optimizations for Iterative Numerical Algorithms and Cellular Automata on
Hierarchical Memory Architectures. Advances in Simulation, vol. 13. SCS Publishing House: Erlangen, Germany,
2004.
21. Bergen B, Gradl T, Hülsemann F, Rüde U. A massively parallel multigrid method for finite elements. Computing
in Science and Engineering 2006; 8(6):56–62.
22. Douglas C, Hu J, Kowarschik M, Rüde U, Weiß C. Cache optimization for structured and unstructured grid
multigrid. Electronic Transactions on Numerical Analysis (ETNA) 2000; 10:21–40.
23. Weiß C. Data locality optimizations for multigrid methods on structured grids. Ph.D. Thesis, Lehrstuhl für
Rechnertechnik und Rechnerorganisation, Institut für Informatik, Technische Universität München, Germany,
2001.
24. Stürmer M. Optimierung von Mehrgitteralgorithmen auf der IA-64 Rechnerarchitektur. Diplomarbeit, Lehrstuhl für
Informatik 10 (Systemsimulation), Institut für Informatik, University of Erlangen-Nuremberg, Germany, May 2006.
25. FFTW. http://www.fftw.org.
26. Horn B. Robot Vision. MIT Press: Cambridge, MA, U.S.A., 1986.
27. Lehmann T, Oberschelp W, Pelikan E, Repges R. Bildverarbeitung für die Medizin. Springer: Berlin, Heidelberg,
New York, 1997.
28. Jähne B. Digitale Bildverarbeitung (6th edn). Springer: Berlin, Heidelberg, New York, 2006.
29. Modersitzki J. Numerical Methods for Image Registration. Oxford University Press: Oxford, 2004.
30. Morel J, Solimini S. Variational Methods in Image Segmentation. Progress in Nonlinear Differential Equations
and their Applications, vol. 14. Birkhaeuser: Boston, 1995.
31. Weickert J. Anisotropic Diffusion in Image Processing. Teubner Verlag: Stuttgart, Germany, 1998.
32. Tikhonov AN, Arsenin VY. Solution of Ill-posed Problems. Winston and Sons: New York, NY, U.S.A., 1977.
33. Hermosillo G. Variational methods for multi-model image matching. Ph.D. Thesis, Université de Nice, France,
2002.
34. Viola P, Wells W. Alignment by maximization of mutual information. International Journal of Computer Vision
1997; 24(2):137–154.
35. Fischer B, Modersitzki J. Fast diffusion registration. AMS Contemporary Mathematics, Inverse Problems, Image
Analysis, and Medical Imaging 2002; 313:117–129.
36. Haber E, Modersitzki J. A multilevel method for image registration. SIAM Journal on Scientific Computing 2006;
27(5):1594–1607.
37. Clarenz U, Droske M, Henn S, Rumpf M, Witsch K. Computational methods for nonlinear image registration.
Technical Report, Mathematical Institute, Gerhard-Mercator University Duisburg, Germany, 2006.
38. Fischer B, Modersitzki J. Curvature based image registration. Journal of Mathematical Imaging and Vision 2003;
18(1):81–85.
39. Henn S. A multigrid method for a fourth-order diffusion equation with application to image processing. SIAM
Journal on Scientific Computing 2005; 27(3):831–849.
40. Jäger F, Han J, Hornegger J, Kuwert T. A variational approach to spatially dependent non-rigid registration. In
Proceedings of SPIE, vol. 6144, Reinhardt J, Pluim J (eds). SPIE: Bellingham, U.S.A., 2006; 860–869.
41. Kabus S, Franz A, Fischer B. On elastic image registration with varying material parameters. In Proceedings
of Bildverarbeitung für die Medizin (BVM), Maintzer H-P, Handels H, Horsch A, Tolxdorff T (eds). Springer:
Berlin, Heidelberg, New York, 2005; 330–334.


42. Henn S, Witsch K. Image registration based on multiscale energy information. Multiscale Modeling and Simulation
2005; 4(2):584–609.
43. Weickert J, ter Haar Romeny B, Viergever M. Efficient and reliable schemes for nonlinear diffusion filtering.
IEEE Transactions on Image Processing 1998; 7(3):398–410.
44. Hülsemann F, Meinlschmidt S, Bergen B, Greiner G, Rüde U. Gridlib—a parallel, object-oriented framework
for hierarchical-hybrid grid structures in technical simulation and scientific visualization. In High Performance
Computing in Science and Engineering, KONWIHR Results Workshop, Garching, Bode A, Durst F (eds). Springer:
Berlin, Heidelberg, New York, 2005; 117–128.
45. Freundl C, Bergen B, Hülsemann F, Rüde U. ParEXPDE: expression templates and advanced PDE software
design on the Hitachi SR8000. In High Performance Computing in Science and Engineering, KONWIHR Results
Workshop, Garching, Bode A, Durst F (eds). Springer: Berlin, Heidelberg, New York, 2005; 167–179.

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:201–218
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.576

Multigrid solution of the optical flow system using a combined diffusion- and curvature-based regularizer

H. Köstler1 , K. Ruhnau2 and R. Wienands2, ∗, †


1 Department of Computer Science 10, University of Erlangen-Nuremberg, Erlangen, Germany
2 Mathematical Institute, University of Cologne, Cologne, Germany

SUMMARY
Optical flow techniques are used to compute an approximate motion field in an image sequence. We
apply a variational approach for the optical flow using a simple data term but introducing a combined
diffusion- and curvature-based regularizer. The same data term arises in image registration problems where
a deformation field between two images is computed. For optical flow problems, usually a diffusion-
based regularizer should dominate, whereas for image registration a curvature-based regularizer is more
appropriate. The combined regularizer enables us to handle optical flow and image registration problems
with the same solver and it improves the results of each of the two regularizers used on their own. We
develop a geometric multigrid method for the solution of the resulting fourth-order systems of partial
differential equations associated with the variational approach for optical flow and image registration
problems. The adequacy of using (collective) pointwise smoothers within the multigrid algorithm is
demonstrated with the help of local Fourier analysis. Galerkin-based coarse grid operators are applied for
an efficient treatment of jumping coefficients. We show some multigrid convergence rates, timings and
investigate the visual quality of the approximated motion or deformation field for synthetic and real-world
images. Copyright q 2008 John Wiley & Sons, Ltd.

Received 15 May 2007; Revised 6 December 2007; Accepted 6 December 2007

KEY WORDS: multigrid; optical flow; image registration; variational approaches in computer vision

1. INTRODUCTION

Optical flow is commonly defined to be the motion of brightness patterns in a sequence of images.
It was introduced by Horn and Schunck [1], who proposed a differential method to compute
the optical flow from pairs of images using a brightness constancy assumption and an additional
smoothness constraint on the magnitude of the gradient of the velocity field in order to regularize
the problem, which we call diffusion-based regularization. Since then optical flow has been studied

∗ Correspondence to: R. Wienands, Mathematical Institute, University of Cologne, Weyertal 86-90, 50931 Cologne,
Germany.
† E-mail: wienands@math.uni-koeln.de


intensively and many extensions to that simple variational approach, e.g. considering different
regularizing terms, were investigated [2–9].
Optical flow applications range from robotics to video compression and particle image
velocimetry (PIV), where optical flow provides approximate motion of fluid flows. Especially
for PIV, it is necessary to incorporate physically more meaningful regularizers to be able to
impose, e.g. an incompressibility condition on the velocity field. Suter [10] therefore introduced a
smoothness constraint on the divergence and curl of the velocity field that was used intensively
in the following [11–14]. A well-known regularizer in image registration that is related to optical
flow [15] and a special case of a second-order div–curl-based regularizer [10] is the curvature-
based regularizer [16]. The purpose of the curvature-based regularizer is to leave affine motion
unpenalized while higher-order motions are still penalized to enforce smoothness. Another advantage
of a higher-order regularizer is that for some applications additional information from features
or landmarks is given for the optical flow computation [17]. Here, the higher-order regularizer is
required to avoid singularities in the solution [18, 19].
We present a variational approach for optical flow with a combined diffusion- and curvature-
based regularizer in Section 2. Please note that the accuracy of optical flow models is usually
dominated by the data term. Our main focus is on the impact of the regularization and we use
a rather simple data term that also arises in image registration in order to treat both applications
with the same solver. As a consequence, we cannot expect to achieve the same accuracy as is
obtained, for example, in [20], where very accurate optical flow models are presented based on an
advanced data term.
Besides accuracy of the approximate motion field obtained by optical flow, an important goal
is to achieve real time or close to real-time performance in many applications, which makes
an efficient numerical solution of the underlying system of partial differential equations (PDEs)
mandatory. First attempts to use multilevel techniques to speed up optical flow computations are due
to Glazer [21] and Terzopoulos [22]. After that, several multigrid-based solvers were proposed for
different optical flow regularizers (see, e.g. [23–27]). In [28, 29] efficient cell-centered (nonlinear)
multigrid solvers for various optical flow models with diffusion-based regularizers are discussed.
Multigrid methods for image registration are e.g. presented in [30–32]. We develop a geometric
multigrid method in Section 3 in order to solve the fourth-order system of PDEs derived from
our variational approach efficiently. Especially, the existence and efficiency of point smoothing
methods are investigated in some detail. Here, we do not apply the classical multigrid theory based
on smoothing and approximation property [33] as it is done in [34] for a similar application but
we use local Fourier analysis techniques [35–37].
In Section 4, optical flow and image registration results using the combined diffusion and
curvature regularizer both for synthetic and real-world images are found. We end this paper with
an outlook for future developments, e.g. the extension to isotropic or anisotropic versions of the
combined regularizer to deal with discontinuities in the velocity field.

2. VARIATIONAL MODEL AND DISCRETIZATION

2.1. Optical flow


The variational approach to compute the motion field as proposed by Horn and Schunck [1] is
composed of a data term and a regularizer. The data term is based on the assumption that a moving


object in the image does not change its gray values, which means that, for example, changes of
illumination are neglected. For an image sequence I : Ω×T → R, Ω ⊂ R^2, describing the gray value
intensities for each point x = (x, y) in the regular image domain Ω at time t ∈ T = [0, t_max], t_max ∈ N,
this so-called brightness constancy assumption reads

    dI/dt = 0                                                                    (1)
This yields the following identity for the movement of a gray value at (x, y, t):

    I(x, t) = I(x + dx, y + dy, t + dt)                                          (2)

Taylor expansion of I(x + dx, y + dy, t + dt) around (x, y, t), neglecting higher-order terms and
using (2), gives

    I_x u + I_y v + I_t ≈ 0

with the partial image derivatives ∂I/∂x = I_x, ∂I/∂y = I_y, ∂I/∂t = I_t and the optical flow velocity
vector u = (u, v)^T, u := dx/dt, v := dy/dt. Please note that in general I is not differentiable for
real-world images. However, usually these images are preprocessed by several steps of a Gaussian
filter [2], making sure that the function I is sufficiently smooth.
The brightness constancy assumption (1) is used throughout this paper, but by itself results in
an ill-posed, under-determined problem. Therefore, additional regularization is required. Horn and
Schunck proposed as a second assumption a smoothness constraint or diffusion-based regularizer

    S_1(u) = ‖∇u‖^2 + ‖∇v‖^2

and combined both in an energy functional

    E_1(u) := ∫_Ω (I_x u + I_y v + I_t)^2 + αS_1(u) dx                           (3)

that is to be minimized. α ∈ R^+ represents a weighting parameter. The curvature-based regularizer
penalizes second derivatives instead and can be expressed as

    S_2(u) = (Δu)^2 + (Δv)^2

As already mentioned, it is a special case of the div–curl-based regularizer [10]

    S̃_2(u) = μ_1 ‖∇ div u‖^2 + μ_2 ‖∇ curl u‖^2

with μ_1 = μ_2 = 1. We propose a combination of the regularizers S_1(u) and S_2(u), resulting in the
combined diffusion- and curvature-based regularizer

    S_3(u) = βS_1(u) + (1−β)S_2(u)

where β ∈ [0, 1]. The corresponding energy functional E_3(u) is obtained by simply replacing S_1
by S_3 in (3). The resulting minimization problem is indeed a well-posed problem, which can be
seen as follows. Considering only the regularizing part of the energy functional, it can be easily
interpreted as a symmetric, positive and elliptic bilinear form for u and v. In such cases, it is
well known that the corresponding minimization problem has a unique solution. Since the data
term is assumed to be sufficiently smooth (see above), the well-posedness can be concluded for


the complete variational problem based on E_3(u); compare with [15] and the references therein.
Considerations concerning the well-posedness in a less regular case are covered in [34].
The diffusion-based regularizer only allows small changes between neighboring vectors and produces very
smooth motion fields, but it also smoothes out edges. The curvature-based regularizer leaves affine
motions unpenalized since they are in its kernel. Here, smoothness is achieved by using higher-
order motions. We will show for the problems under consideration that the optical flow (and the
deformation field derived in image registration, see below) based on the combined regularizer can
be computed efficiently and that we obtain more accurate solutions than are produced by each
of the two regularizers used on their own.
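For reference, the discrete energy E_3 can be evaluated directly, as in the following minimal numpy sketch under our own simplifications: h = 1, np.gradient for the first derivatives, and a periodic 5-point Laplacian in place of the Neumann treatment used later in the paper.

```python
import numpy as np

def energy_E3(u, v, Ix, Iy, It, alpha, beta):
    """Pointwise data term plus combined regularizer beta*S_1 + (1-beta)*S_2,
    summed over all pixels (a crude quadrature of E_3)."""
    data = (Ix * u + Iy * v + It) ** 2
    def grad_sq(w):                       # |grad w|^2 via central differences
        gy, gx = np.gradient(w)
        return gx ** 2 + gy ** 2
    def lap(w):                           # 5-point Laplacian, periodic wrap
        return (np.roll(w, 1, 0) + np.roll(w, -1, 0) +
                np.roll(w, 1, 1) + np.roll(w, -1, 1) - 4.0 * w)
    S1 = grad_sq(u) + grad_sq(v)
    S2 = lap(u) ** 2 + lap(v) ** 2
    return float(np.sum(data + alpha * (beta * S1 + (1.0 - beta) * S2)))
```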

2.2. System of PDEs


To solve the variational problem introduced above we consider the corresponding Euler–Lagrange
equations. Equipped with natural homogeneous Neumann boundary conditions on u, v, Δu and
Δv, they form a well-posed boundary value problem, which constitutes a necessary condition for
a minimum of E_3(u) (see, e.g. [15]). The Euler–Lagrange equations in the image domain Ω read

    α((1−β)(−Δ)^2 u + β(−Δ)u) + I_x (I_x u + I_y v + I_t) = 0                    (4a)

    α((1−β)(−Δ)^2 v + β(−Δ)v) + I_y (I_x u + I_y v + I_t) = 0                    (4b)

The appropriate set of four boundary conditions for β ≠ 1 is given by

    ⟨∇u, n⟩ = 0,   ⟨∇v, n⟩ = 0                                                   (5a)

    ⟨∇(Δu), n⟩ = 0,   ⟨∇(Δv), n⟩ = 0                                             (5b)

with outward normal n. For β ≠ 1 we obtain a fourth-order system, whereas for β = 1 the original
second-order system of Horn and Schunck results, for which only the two boundary conditions (5a)
are required.
The biharmonic operator Δ², which appears in (4a), is known to lead to poor multigrid perfor-
mance. Therefore, it is a common approach to split up the biharmonic operator into a system of
two Poisson-type equations [36]. Employing this idea, (4a), (4b) can be transformed into the following
system using the additional unknown functions w¹ = −Δu and w² = −Δv:

L (u, v, w¹, w²)ᵀ = (0, 0, −I_x I_t, −I_y I_t)ᵀ   (6a)

with

L = ⎛ −Δ        0         −1                0              ⎞
    ⎜ 0         −Δ        0                 −1             ⎟
    ⎜ I_x²      I_x I_y   α(−(1−β)Δ + β)    0              ⎟   (6b)
    ⎝ I_x I_y   I_y²      0                 α(−(1−β)Δ + β) ⎠


The boundary conditions (5a) and (5b) are transferred into

⟨∇u, n⟩ = 0,   ⟨∇v, n⟩ = 0,   ⟨∇w¹, n⟩ = 0,   ⟨∇w², n⟩ = 0

The determinant of (6b) is given by

det(L) = α²(β−1)²(−Δ)⁴ + 2α²(β−β²)(−Δ)³ + (α²β² + α(1−β)(I_x² + I_y²))(−Δ)² + αβ(I_x² + I_y²)(−Δ)   (7)

For the special cases β = 0 and 1, we obtain

det(L) = α²Δ⁴ + α(I_x² + I_y²)Δ²   and   det(L) = α²Δ² − α(I_x² + I_y²)Δ

respectively. The principal part of det(L) is Δᵐ with m = 4 for β ∈ [0, 1) and m = 2 for β = 1, due
to α > 0. Hence, four boundary conditions are required for β ≠ 1 and two boundary conditions for
β = 1 (see, e.g. [35, 36]). This requirement is met by our choice of boundary conditions, since we
use natural homogeneous Neumann boundary conditions on u, v and, if β ≠ 1, additionally on
−Δu = w¹ and −Δv = w², according to the minimization of the energy functional, see above.

2.3. Discretization
The continuous system (6a), (6b) of four PDEs is discretized by finite differences using the standard
five-point central discretization Δ_h of the Laplacian (see, e.g. [36]) with x ∈ Ω_h and discrete
functions u_h, v_h, w_h¹, w_h². Here, Ω_h denotes the discrete image domain, i.e. each x ∈ Ω_h refers
to a pixel. The mesh size h is usually set to 1 for optical flow applications. The corresponding
homogeneous Neumann boundary conditions for the four unknown functions are discretized by
central differences as well. Finally, the image derivatives have to be approximated by sufficiently
accurate finite difference schemes; a proper accuracy of these derivatives is often essential for
the quality of the image-processing result. The discrete operator L_h is then simply given by (6b)
where Δ has to be replaced by Δ_h and I_x, I_y by their finite difference approximations I_x^h, I_y^h.
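As an illustration of this discretization, the following sketch (not from the paper; a minimal Python/SciPy construction under our own naming) assembles the discrete operator L_h for an n_x × n_y image as a sparse block matrix, using the five-point Laplacian with homogeneous Neumann boundary conditions and given derivative images I_x^h, I_y^h:

import numpy as np
import scipy.sparse as sp

def neumann_laplacian(nx, ny, h=1.0):
    # Five-point discretization of -Laplacian with homogeneous Neumann BCs.
    ex, ey = np.ones(nx), np.ones(ny)
    Tx = sp.diags([-ex[:-1], 2*ex, -ex[:-1]], [-1, 0, 1]).tolil()
    Tx[0, 0] = Tx[-1, -1] = 1.0           # Neumann correction at the ends
    Ty = sp.diags([-ey[:-1], 2*ey, -ey[:-1]], [-1, 0, 1]).tolil()
    Ty[0, 0] = Ty[-1, -1] = 1.0
    return (sp.kron(sp.identity(ny), Tx) + sp.kron(Ty, sp.identity(nx))) / h**2

def assemble_Lh(Ix, Iy, alpha, beta, h=1.0):
    # Ix, Iy: (ny, nx) arrays of image derivatives, frozen per pixel.
    ny, nx = Ix.shape
    A = neumann_laplacian(nx, ny, h)          # -Delta_h
    I = sp.identity(nx * ny)
    P = alpha * ((1 - beta) * A + beta * I)   # alpha(-(1-beta)Delta_h + beta)
    Dxx, Dyy = sp.diags(Ix.ravel()**2), sp.diags(Iy.ravel()**2)
    Dxy = sp.diags((Ix * Iy).ravel())
    return sp.bmat([[A,    None, -I,   None],
                    [None, A,    None, -I  ],
                    [Dxx,  Dxy,  P,    None],
                    [Dxy,  Dyy,  None, P   ]], format='csr')

The right-hand side is then the vector (0, 0, −I_x I_t, −I_y I_t)ᵀ stacked in the same pixel ordering.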

2.4. Image registration


Image registration is closely related to the optical flow problem. Here, the goal is to compute
a deformation field between two images, called the reference (R(x) := I(x + dx, y + dy, t + dt)) and
template (T(x − u(x)) := T_u := I(x, t)) image in the following. We briefly summarize the mathe-
matical model. We also use assumption (1) but do not linearize the data term as for optical flow.
That means we try to minimize the energy functional

E_reg(u) := ∫_Ω (R(x) − T_u)² + α S₃(u) dx   (8)

with the same boundary conditions as above. Please note that now the data term is nonlinear. To
minimize (8), we linearize the whole energy functional and apply an inexact Newton method as
described in detail in [30, 32]. Then, starting with an initial approximation u⁰, the (k+1)th iterate
is computed via

u^{k+1} = u^k + τ_k v


where we choose the parameter τ_k ∈ ℝ⁺ such that the energy becomes smaller after each step, and
the correction v is derived from

H_E(u^k) v = −J_E(u^k)   (9)

J_E := ∇T_u (R(x) − T_u) + α(βΔu + (1−β)Δ²u) denotes the Jacobian and H_E the Hessian of (8), which
is approximated by H_E ≈ (∇T_u)² + α(βΔ + (1−β)Δ²). We drop the term ∇²T_u (R(x) − T_u) since
the difference R(x) − T_u should be small for registered images and since second image derivatives
are very sensitive to noise and hard to estimate robustly. System (9) is equivalent to the optical
flow system (4) with a slightly different right-hand side and can be treated numerically in the same
way.
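A compact sketch of this iteration (ours, not the authors' code; the callables warp, solve_linearized and energy, as well as the simple step-halving control for τ_k, are assumptions made for illustration):

import numpy as np

def register(R, T, warp, solve_linearized, energy, alpha, beta, n_newton=5):
    # Inexact Newton iteration u^{k+1} = u^k + tau_k * v for functional (8).
    u = np.zeros(R.shape + (2,))                     # initial deformation field
    for k in range(n_newton):
        Tu = warp(T, u)                              # T(x - u(x))
        v = solve_linearized(R, Tu, u, alpha, beta)  # H_E(u) v = -J_E(u), Eq. (9)
        tau, E0 = 1.0, energy(R, T, u, alpha, beta)
        while energy(R, T, u + tau*v, alpha, beta) >= E0 and tau > 1e-4:
            tau *= 0.5                               # shrink until energy decreases
        u = u + tau*v
    return u

The inner linear solve is exactly where the multigrid method of the next section is used.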

3. MULTIGRID SOLVER

In recent applications, a real-time solution of the optical flow system has become more and more
important. Hence, an appropriate multigrid solver is an obvious choice for the numerical solution
of the resulting linear system, since multigrid methods are known to be among the fastest solvers
for discretized elliptic PDEs.
Multigrid methods (see, e.g. [33, 35, 36, 38, 39]) are mainly motivated by two basic principles.

1. Smoothing principle: Many iterative methods have a strong error smoothing effect if they
are applied to discrete elliptic problems.
2. Coarse grid correction principle: A smooth error term can be well represented on a coarser
grid where its approximation is substantially less expensive.

These two principles suggest the following structure of a two-grid cycle: perform ν₁ steps of an
iterative relaxation method S_h on the fine grid (pre-smoothing), compute the defect of the current
fine grid approximation, restrict the defect to the coarse grid, solve the coarse grid defect equation,
interpolate the obtained error correction to the fine grid, add the interpolated correction to the
current fine grid approximation (coarse grid correction), and perform ν₂ steps of an iterative relaxation
method on the fine grid (post-smoothing). Instead of solving the coarse grid equation exactly,
it can be solved by a recursive application of the two-grid iteration, yielding a multigrid method.
We assume standard coarsening here, i.e. the sequence of coarse grids is obtained by repeatedly
doubling the mesh size in each space direction, i.e. h → 2h.
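A generic two-grid correction cycle along these lines might look as follows (a minimal sketch under our own naming; it assumes a smoother smooth(A, f, v, nu) and full-weighting/bilinear transfer matrices R and P; recursing on the coarse solve instead of solving it exactly turns this into a V-cycle):

import scipy.sparse.linalg as spla

def two_grid(A, f, v, smooth, R, P, nu1=2, nu2=2):
    v = smooth(A, f, v, nu1)          # pre-smoothing
    d = R @ (f - A @ v)               # restrict the defect
    Ac = (R @ A @ P).tocsr()          # Galerkin coarse grid operator
    e = spla.spsolve(Ac, d)           # solve the coarse defect equation
    v = v + P @ e                     # coarse grid correction
    return smooth(A, f, v, nu2)       # post-smoothing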
The crucial point for any multigrid method is to identify the ‘correct’ multigrid components (i.e.
relaxation method, restriction, interpolation, etc.) yielding an efficient interplay between relaxation
and coarse grid correction. A useful tool for a proper selection is local Fourier analysis.

3.1. Basic elements of local Fourier analysis


Local Fourier analysis [35–37] is mainly valid for operators with constant or smoothly varying
coefficients. It is based on the simplification that boundary conditions are neglected and all occurring
operators are extended to an infinite grid
G_h := {x = (x, y)ᵀ = h(n_x, n_y)ᵀ with (n_x, n_y) ∈ ℤ²}
On an infinite grid, the discrete solution, its current approximation and the corresponding error or
residual can be represented by linear combinations of certain exponential functions—the Fourier


components—which form a unitary basis of the space of bounded infinite grid functions, the
Fourier space. Regarding our optical flow system composed of four discrete equations, a proper
unitary basis of vector-valued Fourier components is given by

uh (h, x) := exp(i hx/ h)·I with I = (1, 1, 1, 1)T , h ∈  := (−, ]2 , x ∈ Gh



and complex unit i = −1 yielding the Fourier space

F(G h ) := span{uh (h, x) : h ∈ }

Then, the main idea of local Fourier analysis is to analyze different multigrid components, or
even complete two-grid cycles, by evaluating their effect on the Fourier components. In particular,
the analysis of the smoothing method is based on a distinction between 'high' and 'low' Fourier
frequencies governed by the coarsening strategy under consideration. If standard coarsening is
selected, each 'low frequency'

θ = θ⁰⁰ ∈ Θ_low := (−π/2, π/2]²

is coupled with three 'high frequencies'

θ¹¹ := θ⁰⁰ − (sign(θ₁), sign(θ₂))π,   θ¹⁰ := θ⁰⁰ − (sign(θ₁), 0)π,
θ⁰¹ := θ⁰⁰ − (0, sign(θ₂))π   (θ¹¹, θ¹⁰, θ⁰¹ ∈ Θ_high := Θ\Θ_low)

in the transition from G_h to G_{2h}. That is, the related three high-frequency components are not
visible on the coarse grid G_{2h} as they coincide with the coupled low-frequency component:

u_h(θ⁰⁰, x) = u_h(θ¹¹, x) = u_h(θ¹⁰, x) = u_h(θ⁰¹, x)   for x ∈ G_{2h}

This is, of course, due to the 2π-periodicity of the exponential function.

3.2. Measure of h-ellipticity


A well-chosen relaxation method obviously has to take care of the high-frequency error components
since they cannot be reduced on coarser grids by the coarse grid correction. The measure of
h-ellipticity is often used to decide whether or not this can be accomplished by a point relaxation
method [35–37]. A sufficient amount of h-ellipticity indicates that pointwise error smoothing
procedures can be constructed for the discrete operator under consideration. Dealing with operators
based on variable coefficients prevents a direct application of local Fourier analysis. In our discrete
system, variable coefficients occur for the image derivatives. However, the analysis can be applied
to the locally frozen operator at a fixed grid point n. Replacing the variable x by a constant n, one
obtains an operator Lh (n) with constant frozen coefficients. The measure of h-ellipticity for our
frozen system of equations is then defined by

E_h(L_h(n)) := min{|det(L̃_h(n, θ))| : θ ∈ Θ_high} / max{|det(L̃_h(n, θ))| : θ ∈ Θ}


where the complex (4×4)-matrix

L̃_h(n, θ) = ⎛ −Δ̃_h(θ)            0                   −1                     0                    ⎞
            ⎜ 0                  −Δ̃_h(θ)             0                      −1                   ⎟
            ⎜ (I_x^h(n))²        I_x^h(n) I_y^h(n)    α(−(1−β)Δ̃_h(θ) + β)   0                    ⎟
            ⎝ I_x^h(n) I_y^h(n)  (I_y^h(n))²          0                      α(−(1−β)Δ̃_h(θ) + β) ⎠

is the Fourier symbol (for details concerning Fourier symbols for systems of equations, etc. we
refer to [35–37]) of L_h(n), i.e.

L_h(n) u_h(θ, x) = L̃_h(n, θ) u_h(θ, x)

The Fourier symbol L̃_h(n, θ) for the system of PDEs is composed of the Fourier symbol of the
Laplacian and several constants. The Fourier symbol of the Laplacian reads (compare with [35–37])

−Δ̃_h(θ) = (4/h²)(sin²(θ₁/2) + sin²(θ₂/2))   with θ ∈ Θ

Now, det(L̃_h(n, θ)) is simply given by (7) where −Δ has to be replaced by −Δ̃_h(θ) and the image
derivatives by the related frozen constants. For the derivation of E_h(L_h(n)), it is important to note
that −Δ̃_h(θ) ≥ 0. Moreover, for the four coefficients

c₁ := αβI_c,   c₂ := α²β² + α(1−β)I_c,   c₃ := 2α²(β−β²),   c₄ := α²(β−1)²

with I_c = (I_x^h(n))² + (I_y^h(n))² occurring in det(L̃_h(n, θ)), we have c₁, c₂, c₃, c₄ ≥ 0 for α > 0, β ∈ [0, 1].
Since

f(x) = c₁x + c₂x² + c₃x³ + c₄x⁴

is monotonically increasing for x, c₁, c₂, c₃, c₄ ≥ 0, the minimal (θ ∈ Θ_high) and maximal (θ ∈ Θ)
values of −Δ̃_h(θ) and |det(L̃_h(n, θ))| are attained at the same frequencies. In particular, we have

min_{θ ∈ Θ_high} (−Δ̃_h(θ)) = −Δ̃_h(−π/2, 0) = 2/h²,   max_{θ ∈ Θ} (−Δ̃_h(θ)) = −Δ̃_h(π, π) = 8/h²

As a consequence, the measure of h-ellipticity for the discrete operator L_h(n) turns out to be

E_h(L_h(n)) = (8α²(β−1)² + 8α²(β−β²)h² + 2(α²β² + α(1−β)I_c)h⁴ + αβI_c h⁶)
            / (2048α²(β−1)² + 512α²(β−β²)h² + 32(α²β² + α(1−β)I_c)h⁴ + 4αβI_c h⁶)

For the special cases β = 0, 1 this gives

E_h(L_h(n)) = (4α + I_c h⁴)/(1024α + 16I_c h⁴)   and   E_h(L_h) = (2α + I_c h²)/(32α + 4I_c h²)

respectively. Note that E_h(L_h(n)) > 0 for all possible choices of α, h > 0, β ∈ [0, 1], I_c ≥ 0. In parti-
cular, this means that E_h(L_h(n)) > 0 for all possible values of I_x^h(n), I_y^h(n) over the whole discrete
image domain, i.e. for arbitrary n ∈ Ω_h. This is a strong and very satisfactory robustness result


for such a complicated system involving several parameters. Even in the limit of small mesh size
h → 0, the measure of h-ellipticity is bounded away from zero, since we have

lim_{h→0} E_h(L_h(n)) = 1/256 for β ≠ 1   and   1/16 for β = 1
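A quick numerical cross-check of these formulas (our own sketch, not part of the paper) samples the symbol determinant over a discretization of Θ:

import numpy as np

def h_ellipticity(alpha, beta, Ic, h=1.0, n=256):
    # E_h = min over Theta_high |det| / max over Theta |det|, with det per (7).
    th = np.linspace(-np.pi, np.pi, n, endpoint=False) + 2*np.pi/n  # (-pi, pi]
    t1, t2 = np.meshgrid(th, th)
    x = 4.0/h**2 * (np.sin(t1/2)**2 + np.sin(t2/2)**2)    # -Delta~_h(theta)
    c1, c2 = alpha*beta*Ic, alpha**2*beta**2 + alpha*(1 - beta)*Ic
    c3, c4 = 2*alpha**2*(beta - beta**2), alpha**2*(beta - 1)**2
    det = c1*x + c2*x**2 + c3*x**3 + c4*x**4
    low = (t1 > -np.pi/2) & (t1 <= np.pi/2) & (t2 > -np.pi/2) & (t2 <= np.pi/2)
    return np.abs(det[~low]).min() / np.abs(det).max()

print(h_ellipticity(1500.0, 0.4, Ic=1.0))   # bounded away from zero, as predicted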

3.3. Smoothing method


Owing to the above derivations, it can be expected that the optical flow system under consideration
is amenable to point smoothing. The straightforward generalization of a scalar smoothing method
to a system of PDEs is a collective relaxation method. This relaxation method sweeps over all grid
points x ∈ Ω_h in a certain order, for example, in a lexicographic or a red–black manner. At each
grid point, the four difference equations are solved simultaneously, i.e. the corresponding variables
u_h(x), v_h(x), w_h¹(x) and w_h²(x) are updated simultaneously. This means that a (4×4)-system has
to be solved at each grid point.
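In code, one such sweep might be organized as follows (a schematic sketch under our own naming; local_system is assumed to return the 4×4 diagonal block and the residual of the four equations at pixel (i, j)):

import numpy as np

def collective_gs_lex_sweep(fields, local_system):
    # fields: array of shape (ny, nx, 4) holding (u, v, w1, w2) per pixel.
    ny, nx, _ = fields.shape
    for j in range(ny):                             # lexicographic ordering
        for i in range(nx):
            B, r = local_system(fields, i, j)       # 4x4 block, local residual
            fields[j, i] += np.linalg.solve(B, r)   # collective update
    return fields

A red–black variant (GS-RB) merely changes the visiting order: all 'red' pixels first, then all 'black' ones.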
First of all, we note that the large sparse matrix that corresponds to the discrete
system is neither symmetric nor diagonally dominant. Furthermore, it is not an M-matrix due
to positive off-diagonal entries. As a consequence, most of the classical convergence criteria for
standard iterative methods such as Jacobi or Gauss–Seidel relaxation do not apply, and it has to be
expected that these methods might diverge for certain parameter choices. In our numerical tests for
collective lexicographic and red–black Gauss–Seidel relaxation (abbreviated GS-LEX and GS-
RB, respectively) we always observed overall convergence, although for certain combinations
of α, β, I_x, I_y there were single relaxation steps with an increasing residual. An example of such a
convergence history is shown in Figure 1 for collective Jacobi, GS-LEX and GS-RB relaxation.
However, if a relaxation method is applied within a multigrid algorithm, then we are mainly
interested in its smoothing properties. That is, the relaxation is aimed at a sufficient reduction of the
high-frequency components of the error between the exact solution and the current approximation,
see above. A quantitative measure of this efficiency is the smoothing factor μ_loc obtained

Figure 1. Residual improvement of relaxations (residual norm versus iterations for collective Jacobi, GS-RB and GS-LEX).


by local Fourier analysis. μ_loc is defined as the worst asymptotic error reduction by one relaxation
step over all high-frequency error components. For more details on local Fourier smoothing analysis,
we refer to the literature [35–37].
In the case of smoothly varying coefficients, the smoothing factor for L_h(x) can be bounded by the
maximum over the smoothing factors for the locally frozen operators, i.e.

μ_loc(L_h(x)) = max_{n ∈ Ω_h} μ_loc(L_h(n))   (10)

As a popular test case, we consider frame 8 of the Yosemite sequence shown in Figure 4. Table I
presents the corresponding smoothing factors calculated via (10) for GS-LEX and GS-RB with
varying β. α is fixed at 1500, which turned out to be a proper choice w.r.t. the average angular error
(AAE) (11) in many situations, see below. Obviously, there is hardly any influence of the parameter
β on the resulting smoothing factor. We always observe nearly the same smoothing factors as
are well known for the Poisson equation (i.e. μ = 0.5 for GS-LEX and μ = 0.25 for GS-RB).
Systematic tests show that the same statement is also valid for the parameter α. As a consequence,
we can expect to obtain the typical multigrid efficiency as long as the coarse grid correction works
properly, compare with Section 3.4. The situation is considerably more complicated if we apply
decoupled relaxations (compare with [36]), which will be discussed elsewhere.
Note that I_x and I_y do not vary smoothly over the image domain Ω_h for this test case.
Instead, we have moderate jumps in the coefficients. As a consequence, the smoothing factors from
Table I are not rigorously justified. However, from practical experience, they can be considered
heuristic but reliable estimates for the actual smoothing properties, especially since we only have
moderate jumps. To back up the theoretical results from smoothing analysis, we also tested the
smoothing effect of the collective relaxations numerically. The smoothing effect of GS-LEX can
be clearly seen in Figure 2. Here, the initial (random) error on a 33×33 grid (a scaled-down
version of frame 8 from the Yosemite sequence) and the error after five collective GS-LEX steps
are shown for the first component u of the optical flow velocity vector.
Summarizing, there is sufficient evidence that collective damped Jacobi, GS-LEX and GS-RB
relaxation are reasonable smoothing methods, even though they might diverge for single relaxation
steps as stand-alone solvers.
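To see where such factors come from, the following sketch (ours; it treats only the scalar Poisson limit β = 1 with negligible data term, not the full 4×4 symbol analysis of the paper) evaluates the GS-LEX smoothing factor μ_loc = max over Θ_high of |S̃(θ)| numerically:

import numpy as np

def gs_lex_smoothing_factor_poisson(n=512):
    # Symbol of lexicographic GS for the five-point stencil: A = A+ - A-,
    # S~(theta) = (symbol of A-)/(symbol of A+), maximized over Theta_high.
    th = np.linspace(-np.pi, np.pi, n, endpoint=False) + 2*np.pi/n
    t1, t2 = np.meshgrid(th, th)
    S = np.abs((np.exp(1j*t1) + np.exp(1j*t2)) /
               (4.0 - np.exp(-1j*t1) - np.exp(-1j*t2)))
    high = ~((np.abs(t1) <= np.pi/2) & (np.abs(t2) <= np.pi/2))
    return S[high].max()

print(gs_lex_smoothing_factor_poisson())   # approx 0.5, cf. Table I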

3.4. Coarse grid correction


Next to the collective GS relaxation, standard multigrid components are applied. To handle the
jumping coefficients in I_x and I_y, we use Galerkin coarse grid operators. Since there are only
moderate jumps, it is not necessary to consider operator-dependent transfers; we can stay
with straightforward geometric transfers like full weighting and bilinear interpolation. Throughout
our numerical experiments, V(2,2)-cycles are employed (i.e. ν₁ = 2 pre-relaxations and ν₂ = 2

Table I. Smoothing factors for GS-LEX and GS-RB, α = 1500.

β         0         0.4       1
GS-LEX    0.49973   0.49980   0.49970
GS-RB     0.25003   0.25009   0.25000


Figure 2. Error smoothing of GS-LEX relaxation for a scaled-down version of frame 8 from the Yosemite sequence.

post-relaxations). For details concerning these multigrid components, we refer to the well-known
literature again [33, 35, 36, 38, 39].
Since we are interested in a real-time solution, it is necessary to use the full multigrid (FMG)
technique (see, e.g. [35, 36]). Here, the initial approximation on the fine grid is obtained by the
computation and interpolation of approximations on coarser grids. A properly adjusted FMG
algorithm yields an asymptotically optimal method, i.e. the number of arithmetic operations is
proportional to the number of grid points, and at the same time, the error of the resulting fine grid
solution is approximately equal to the discretization error.
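Schematically, FMG can be layered on top of the two-grid/V-cycle routine sketched earlier (again our own helper names: prolong interpolates to the next finer grid, v_cycle is a recursive variant of the two_grid sketch above):

import scipy.sparse.linalg as spla

def fmg(levels, v_cycle, prolong, n_vcycles=1):
    # levels[k] = (A_k, f_k); level 0 is the coarsest, level -1 the finest.
    A, f = levels[0]
    v = spla.spsolve(A.tocsr(), f)        # exact solve on the coarsest grid
    for k in range(1, len(levels)):
        v = prolong(v, k)                 # FMG interpolation to level k
        A, f = levels[k]
        for _ in range(n_vcycles):        # e.g. a single FMG-V(2,2) per level
            v = v_cycle(A, f, v, level=k)
    return v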

4. EXPERIMENTAL RESULTS

Next, the numerical performance of the multigrid solver described above is investigated, and the
quality of the variational model is demonstrated.

4.1. Optical flow


In general, it is very hard to quantify the quality of the optical flow velocity field. For synthetic
image sequences, often a ground truth motion field (see [40] for details) is used to measure the
quality of a computed optical flow field by the AAE. It is calculated via (cf. [28])

AAE(u_c, u_e) = (1/|Ω|) ∫_Ω arccos( u_cᵀ u_e / (|u_c| |u_e|) ) dx   (11)

where u_c = (u_c, v_c, 1) is the ground truth and u_e = (u_e, v_e, 1) the estimated optical flow vector.
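Per (11), the AAE is straightforward to compute; a small sketch (ours), assuming flow fields stored as (ny, nx, 2) arrays:

import numpy as np

def aae_degrees(flow_true, flow_est):
    # Average angular error between the 3D vectors (u, v, 1), cf. (11).
    uc = np.dstack([flow_true, np.ones(flow_true.shape[:2])])
    ue = np.dstack([flow_est, np.ones(flow_est.shape[:2])])
    cos = (uc * ue).sum(-1) / (np.linalg.norm(uc, axis=-1) *
                               np.linalg.norm(ue, axis=-1))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean()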
Most real-world image sequences do not offer a ground truth motion field; therefore, in this
case the quality of the optical flow is often measured visually by plotting the vector field and
comparing it with the expected result. For example, one can check whether the vector field is
smooth inside objects and whether edges between different movements are preserved, e.g. objects
moving over a static background.


4.1.1. Multigrid performance. All experiments for different combinations of α and β (see below)
were performed using a single FMG-V(2,2) cycle with collective GS-RB as the smoother. The same
visual and AAE results can also be obtained by five V(2,2) cycles. Input images are smoothed by
a discrete Gaussian filter mask (standard deviation σ = 1.2) in order to ensure a robust computation
of the image derivatives by finite difference approximations.
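This preprocessing step might look as follows (a sketch assuming SciPy; the specific choice of central differences for I_x, I_y and a two-frame forward difference for I_t is our illustration of one common variant, not necessarily the authors' exact scheme):

import numpy as np
from scipy.ndimage import gaussian_filter

def image_derivatives(frame1, frame2, sigma=1.2):
    # Gaussian presmoothing followed by finite-difference derivatives.
    f1 = gaussian_filter(frame1.astype(float), sigma)
    f2 = gaussian_filter(frame2.astype(float), sigma)
    Iy, Ix = np.gradient(f1)     # central differences, h = 1
    It = f2 - f1                 # temporal forward difference, dt = 1
    return Ix, Iy, It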
For constant coefficients I_x and I_y, one obtains typical multigrid convergence factors similar
to those for the Poisson equation, which can be nicely predicted by local Fourier analysis. For jumping
coefficients, a slight deterioration of the convergence rate can be observed. Table II lists some
representative results. Different values of α that are useful for the application do not have a
substantial impact on the convergence rates. The best convergence rates are achieved when the
combination of α and β is optimal with respect to the quality of the solution, which is an interesting
observation in itself. Figure 3 shows an AAE (11) plot over β for α = 1500. The best quality with

Table II. Convergence rates for the computation of the optical flow from frames 8 and 9
of the Yosemite sequence with α = 1500.

                 GS-LEX                       GS-RB
Cycle    β = 0    β = 0.4   β = 1    β = 0    β = 0.4   β = 1
1        0.053    0.051     0.048    0.091    0.090     0.074
2        0.054    0.042     0.045    0.070    0.055     0.044
3        0.096    0.065     0.148    0.115    0.069     0.127
4        0.124    0.086     0.196    0.156    0.093     0.181
5        0.131    0.093     0.232    0.172    0.110     0.233

Figure 3. AAE plot of the calculated optical flow between pictures 8 and 9 from the Yosemite sequence
for α = 500, 1500 and 5000 (AAE versus β).


Table III. Runtimes of the optical flow FMG-V(2,2) multigrid solver for different image sizes.

Size       Runtime (ms)
256×192    305
256×256    420
316×252    560
640×480    1900

respect to AAE is obtained for β ≈ 0.4. On the other hand, the best convergence rates for α = 1500
are also obtained for β ≈ 0.4 (see Table II).
To give an impression of the performance of our optical flow algorithm, we list in Table III
runtimes for an FMG-V(2,2) cycle for different image sizes. The time measurements were done on
an AMD Opteron 248 cluster node with 2.2 GHz, 64 kB L1 cache, 1 MB L2 cache and 4 GByte
DDR-333 RAM. Of course, these times can be improved for real applications by a hardware-specific
performance optimization of the multigrid solver on current architectures [41, 42].
Summarizing, the multigrid algorithm exhibits very robust behavior, as indicated by
the investigation of the measure of h-ellipticity. For all possible choices of α, β and the image
derivatives, one obtains nearly the same (excellent) convergence factors as are known for the
Poisson equation.

4.1.2. Quality of the optical flow model. In the following, we use two sequences, one synthetic
and one real world [43], to evaluate our optical flow model.
The Yosemite sequence with clouds, created by Lynn Quam [44], is a rather complex test
case (see Figure 4). It consists of 15 frames of size 316×252 and depicts a flight through the
Yosemite national park. In this sequence, translational (clouds) and divergent (flight) motion are
present. Additionally, we have varying illumination in the region of the clouds; thus, our constant
brightness assumption is not fulfilled there.
All tests were performed with frames 8 and 9 of the Yosemite sequence. First, we consider in
Figure 3 the AAE for α = 500, 1500, 5000 and varying β. α = 500 was chosen because it was tested
to give the optimal value (w.r.t. a minimal AAE) for the second-order system. The combined
regularizer produces the best result: it is able to outperform both the diffusion-based and the
curvature-based regularizer. Since the AAE is measured over the whole image domain, even small
improvements of the AAE can lead to a substantial improvement in the local visual quality of the
resulting optical flow field.
Figure 4 shows image details of the resulting velocity fields for the Yosemite sequence, where
we choose α = 1500 for a visual comparison of different values of β. The right half of this detail
includes the high mountain from the middle of the images. The mountains are moving from right
to left, whereas the clouds region is moving (purely horizontally) from left to right. For β = 1, one
can see the usual behavior of the original Horn and Schunck regularizer, which tries to produce
a smooth solution even over the mountain crest. The fourth-order system performs better in this
regard, as the region of influence is notably smaller, for example, at the right crossover. The
combined regularizer with β = 0.4 exhibits a mixture of both effects and leads to a smaller AAE
over the whole image. One can also observe that all methods fail to recover the purely horizontal
flow in the clouds region. This is due to the fact that the brightness varies here and thus the
constant brightness assumption of the data term does not hold.


Figure 4. First line: frames 8 and 9 from the Yosemite sequence. Second line: a detail from the optical flow
located left of the highest mountain in the middle of the image (marked in frame 8). It was calculated
with α = 1500 and (from left to right) β = 0, 0.4 and 1.

The second sequence shows rotating particles and is related to particle image velocimetry (PIV).
However, we do not use the standard models like a div–curl regularizer for PIV but our variational
approach. Our goal is to visualize the difference between the diffusion- and curvature-based
regularizers at a vortex, where the latter is able to resolve the vortex much better, as can be nicely
observed in Figure 5.

4.2. Medical image registration


For simplicity, we quantify the registration error by the relative sum of squared differences (SSD)
error (see, e.g. [15])

SSD := ‖R(x) − T_u‖ / ‖R(x) − T(x)‖
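In code (our sketch):

import numpy as np

def relative_ssd(R, T_warped, T):
    # Relative SSD error ||R - T_u|| / ||R - T||.
    return np.linalg.norm(R - T_warped) / np.linalg.norm(R - T)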
However, for medical applications it is not always useful to force a very small relative SSD error;
rather, one wants to maintain the topology of the medical data, i.e. to keep structures like bones.
In Figure 6, we depict two medical images of a human brain and their registration results. After
five Newton steps, we achieve SSD = 0.1 for β = 0 and SSD = 0.08 for β = 0.05. A diffusion-based
regularizer is not suitable here and leads to SSD = 0.3.


Figure 5. First line: two frames of a rotating particle sequence (size 512×512). Second line:
the resulting optical flow field for α = 500 at the vortex for the diffusion-based regularizer
(left) and the curvature-based regularizer (right).

5. CONCLUSIONS AND OUTLOOK

We presented and evaluated a combined diffusion- and curvature-based regularizer for optical flow
and the related image registration problem. The arising fourth-order system of PDEs was solved
efficiently by a geometric multigrid solver. It turns out that the best results are obtained when the
weighting between regularizer and brightness constancy assumption is chosen such that the multigrid
solver shows an optimal convergence rate. This is an interesting observation, and it has to be
investigated whether it can be used to choose the weighting parameter automatically.


Figure 6. First line: template image (left) and reference image (right) showing a human brain (size
256×256). Second line: registration results (from left to right) with α = 3 for β = 0 and 0.05.

To improve on the static weighting of the regularizer, which produces an equally smooth solution
throughout the picture, one could allow a space-dependent β parameter in order to deal with
discontinuities in the solution.
Next steps are the extension of the regularizer to the physically motivated div–curl-based regu-
larizer, or to nonlinear regularizers, where α and β depend on the velocity field.
Furthermore, we wish to apply the curvature-based regularizer to motion blur computed by a
combined optical flow and ray tracer motion field [17]. This should help to overcome the problem
that the diffusion-based regularizer introduces singularities in the Euler–Lagrange equations,
since some motion vectors are fixed within the optical flow model.
For image registration, it is an interesting task to extend the model to 3D in order to be able to
register 3D medical data sets.

REFERENCES
1. Horn B, Schunck B. Determining optical flow. Artificial Intelligence 1981; 17:185–203.
2. Horn B. Robot Vision. MIT Press: Cambridge, MA, U.S.A., 1986.
3. Nagel H-H, Enkelmann W. An investigation of smoothness constraints for the estimation of displacement
vector fields from image sequences. IEEE Transactions on Pattern Analysis and Machine Intelligence 1986;
8(5):565–593.


4. Galvin B, McCane B, Novins K, Mason D, Mills S. Recovering motion fields: an evaluation of eight optical
flow algorithms. British Machine Vision Conference, Southampton, 1998.
5. Verri A, Poggio T. Motion field and optical flow: qualitative properties. IEEE Transactions on Pattern Analysis
and Machine Intelligence 1989; 11(5):490–498.
6. Haussecker H, Fleet D. Computing optical flow with physical models of brightness variation. IEEE Transactions
on Pattern Analysis and Machine Intelligence 2001; 23(6):661–673.
7. Weickert J, Schnörr C. A theoretical framework for convex regularizers in PDE-based computation of image
motion. International Journal of Computer Vision 2001; 45(3):245–264.
8. Weickert J, Schnörr C. Variational optic flow computation with a spatio-temporal smoothness constraint. Journal
of Mathematical Imaging and Vision 2001; 14(3):245–255.
9. Brox T, Weickert J. Nonlinear matrix diffusion for optic flow estimation. In Pattern Recognition, van Gool L
(ed.). Lecture Notes in Computer Science, vol. 2449. Springer: Berlin, 2002; 446–453.
10. Suter D. Motion estimation and vector splines. Proceedings of the Conference on Computer Vision and Pattern
Recognition, Los Alamos, U.S.A., 1994; 939–948.
11. Gupta S, Prince J. Stochastic models for div–curl optical flow methods. IEEE Signal Processing Letters 1996;
3(2):32–34.
12. Corpetti T, Mémin E, Pérez P. Dense estimation of fluid flows. IEEE Transactions on Pattern Analysis and
Machine Intelligence 2002; 24(3):365–380.
13. Kohlberger T, Mémin E, Schnörr Ch. Variational dense motion estimation using the Helmholtz decomposition.
In Fourth International Conference on Scale Space Methods in Computer Vision, Griffin L, Lillholm M (eds),
Isle of Skye, U.K. Lecture Notes in Computer Science, vol. 2695. Springer: Berlin, 2003; 432–448.
14. Corpetti T, Heitz D, Arroyo G, Mémin E, Santa-Cruz A. Fluid experimental flow estimation based on an optical-
flow scheme. Experiments in Fluids 2006; 40(1):80–97.
15. Modersitzki J. Numerical Methods for Image Registration. Oxford University Press: Oxford, 2004.
16. Fischer B, Modersitzki J. Curvature based image registration. Journal of Mathematical Imaging and Vision 2003;
18(1):81–85.
17. Zheng Y, Köstler H, Thürey N, Rüde U. Enhanced motion Blur calculation with optical flow. Proceedings of
Vision, Modeling and Visualization, RWTH Aachen, Germany. Aka GmbH, IOS Press: Berlin, 2006; 253–260.
18. Fischer B, Modersitzki J. Combining landmark and intensity driven registrations. PAMM 2003; 3(1):32–35.
19. Galic I, Weickert J, Welk M, Bruhn A, Belyaev A, Seidel H. Towards PDE-based image compression. Proceedings
of Variational, Geometric, and Level Set Methods in Computer Vision. Lecture Notes in Computer Science.
Springer: Berlin, Heidelberg, New York, 2005; 37–48.
20. Papenberg N, Bruhn A, Brox T, Didas S, Weickert J. Highly accurate optic flow computation with theoretically
justified warping. International Journal of Computer Vision 2006; 67(2):141–158.
21. Glazer F. Multilevel relaxation in low-level computer vision. In Multi-Resolution Image Processing and Analysis,
Rosenfeld A (ed.). Springer: Berlin, 1984; 312–330.
22. Terzopoulos D. Image analysis using multigrid methods. IEEE Transactions on Pattern Analysis and Machine
Intelligence 1986; 8:129–139.
23. Enkelmann W. Investigations of multigrid algorithms for the estimation of optical flow fields in image sequences.
Computer Vision, Graphics, and Image Processing 1988; 43:150–177.
24. Battiti R, Amaldi E, Koch C. Computing optical flow across multiple scales: an adaptive coarse-to-fine strategy.
International Journal of Computer Vision 1991; 6(2):133–145.
25. Kalmoun EM, Rüde U. A variational multigrid for computing the optical flow. In Vision, Modeling and
Visualization, Ertl T, Girod B, Greiner G, Niemann H, Seidel HP, Steinbach E, Westermann R (eds). Akademische
Verlagsgesellschaft: Berlin, 2003; 577–584.
26. Kalmoun EM, Köstler H, Rüde U. 3D optical flow computation using a parallel variational multigrid scheme
with application to cardiac C-arm CT motion. Image and Vision Computing 2007; 25(9):1482–1494.
27. Christadler I, Köstler H, Rüde U. Robust and efficient multigrid techniques for the optical flow problem using
different regularizers. In Proceedings of 18th Symposium Simulations Technique ASIM 2005, Hülsemann F,
Kowarschik M, Rüde U (eds). Frontiers in Simulation, vol. 15. SCS Publishing House: Erlangen, 2005; 341–346.
Preprint version published as Technical Report 05-6.
28. Bruhn A. Variational optic flow computation: accurate modeling and efficient numerics. Ph.D. Thesis, Department
of Mathematics and Computer Science, Saarland University, Saarbrücken, Germany, 2006.
29. Bruhn A, Weickert J, Kohlberger T, Schnörr C. A multigrid platform for real-time motion computation with
discontinuity-preserving variational methods. International Journal of Computer Vision 2006; 70(3):257–277.


30. Haber E, Modersitzki J. A multilevel method for image registration. SIAM Journal on Scientific Computing 2006;
27(5):1594–1607.
31. Henn S. A multigrid method for a fourth-order diffusion equation with application to image processing. SIAM
Journal on Scientific Computing 2005; 27(3):831–849.
32. Hömke L. A multigrid method for anisotropic PDEs in elastic image registration. Numerical Linear Algebra with
Applications 2006; 13(2–3):215–229.
33. Hackbusch W. Multi-grid Methods and Applications. Springer: Berlin, Heidelberg, New York, 1985.
34. Keeling SL, Haase G. Geometric multigrid for high-order regularizations of early vision problems. Applied
Mathematics and Computation 2007; 184(2):536–556.
35. Brandt A. Multigrid techniques: 1984 guide with applications to fluid dynamics. GMD-Studie Nr. 85, Sankt
Augustin, West Germany, 1984.
36. Trottenberg U, Oosterlee C, Schüller A. Multigrid. Academic Press: San Diego, CA, U.S.A., 2001.
37. Wienands R, Joppich W. Practical Fourier analysis for multigrid methods. In Numerical Insights, vol. 5. Chapman &
Hall/CRC Press: Boca Raton, FL, U.S.A., 2005.
38. Briggs W, Henson V, McCormick S. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, U.S.A., 2000.
39. Wesseling P. Multigrid Methods. Edwards: Philadelphia, PA, U.S.A., 2004.
40. McCane B, Novins K, Crannitch D, Galvin B. On benchmarking optical flow. Computer Vision and Image
Understanding 2001; 84(1):126–143.
41. Douglas C, Hu J, Kowarschik M, Rüde U, Weiß C. Cache optimization for structured and unstructured grid
multigrid. Electronic Transactions on Numerical Analysis 2000; 10:21–40.
42. Hülsemann F, Kowarschik M, Mohr M, Rüde U. Parallel geometric multigrid. In Numerical Solution of
Partial Differential Equations on Parallel Computers, Chapter 5, Bruaset A, Tveito A (eds). Lecture Notes in
Computational Science and Engineering, vol. 51. Springer: Berlin, Heidelberg, New York, 2005; 165–208.
43. Barron J, Fleet D, Beauchemin S. Performance of optical flow techniques. International Journal of Computer
Vision 1994; 12(1):43–77.
44. Heeger D. Model for the extraction of image flow. Journal of the Optical Society of America A: Optics, Image
Science, and Vision 1987; 4(8):1455–1471.

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:219–247
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.579

A semi-algebraic approach that enables the design of inter-grid
operators to optimize multigrid convergence

Pablo Navarrete Michelini1, 2, ∗, † and Edward J. Coyle3


1 Center for Wireless Systems and Applications, Purdue University, 465 Northwestern Ave., West Lafayette,
IN 47907-2035, U.S.A.
2 Department of Electrical Engineering, Universidad de Chile, Av. Tupper 2007, Santiago, RM 8370451, Chile
3 School of Electrical and Computer Engineering, Georgia Institute of Technology, 777 Atlantic Dr. NW,

Atlanta, GA 30332-0250, U.S.A.

SUMMARY
We study the effect of inter-grid operators—the interpolation and restriction operators—on the convergence
of two-grid algorithms for linear models. We show how a modal analysis of linear systems, along with
some assumptions on the normal modes of the system, allows us to understand the role of inter-grid
operators in the speed and accuracy of a full-multigrid step.
We state an assumption that generalizes local Fourier analysis (LFA) by means of a precise description
of aliasing effects on the system. This assumption condenses, in a single algebraic property called the
harmonic aliasing property, all the information needed from the geometry of the discretization and the
structure of the system’s eigenvectors. We first state a harmonic aliasing property based on the standard
coarsening strategies of 1D problems. Then, we extend this property to a more aggressive coarsening
typically used in 2D problems with the help of additional assumptions on the structure of the system
matrix.
Under our general assumptions, we determine the exact rates at which groups of modal components
of the error evolve and interact. With this knowledge, we are then able to design inter-grid operators
that optimize the two-grid algorithm convergence. By different choices of operators, we verify the classic
heuristics based on Fourier harmonic analysis, show a trade-off between the rate of convergence and the
number of computations required per iteration, and show how our analysis differs from LFA. Copyright
q 2008 John Wiley & Sons, Ltd.

Received 15 May 2007; Revised 9 November 2007; Accepted 14 December 2007

KEY WORDS: multigrid algorithms; inter-grid operators; convergence analysis; modal analysis; aliasing

∗ Correspondence to: Pablo Navarrete Michelini, Departamento de Ingenierı́a Eléctrica, Universidad de Chile,
Av. Tupper 2007, Santiago, RM 8370451, Chile.

E-mail: pnavarre@purdue.edu


1. INTRODUCTION

We are interested in applications of the multigrid algorithm in the distributed sensing and processing
tasks that arise in the design of wireless sensor networks. In such scenarios, the inexpensive,
low-power, low-complexity sensor motes that are the nodes of the network must perform all
computation and communication tasks. This is very different than the scenarios encountered
in the implementation of multigrid algorithms on large parallel machines for the following
reasons:

• Sensor motes are battery powered and must operate unattended for long periods of time. The
design of algorithms that run on them must therefore attempt to minimize the number of
computations each node must perform and the number of times it must communicate because
both functions consume energy. Of the two functions, communication is the most energy
intensive per bit of data.
• Communication between sensor motes is carried out in hop-by-hop fashion, since the energy
required to send data over a distance d is proportional to d^α with 2 ≤ α ≤ 4. Thus, the sensor
motes communicate directly only with their nearest neighbors in any direction.
• Re-executing an algorithm after adjusting parameters or models is very difficult or might not
even be possible because of the remote deployment of the network. It is thus critical that the
algorithms used to perform various tasks be as robust and well understood as possible before
they are deployed.

In implementations of multigrid algorithms on networks like these, as in many other applications


of multigrid algorithms, it is thus essential that the convergence rate of the algorithm be optimized.
This minimizes the number of communication and computation steps of the algorithm. It also leads
to interesting insights in the design of each step, highlighting both trade-offs between the different
costs of computations within each node and communications between nodes, and the need for low
complexity in each step of the algorithm.
Finally, in such applications the multigrid methods must be very robust in order to ensure
the continuous operation of the whole system. This task is difficult because it is likely that the
system model varies throughout the field. The current theory of algebraic multigrid (AMG) offers
one possible solution to this problem [1–4]. Unfortunately, the convergence results obtained so
far in the theory of AMG are not as strong as the theory for linear operators with constant
stencil coefficients [5]. As optimal convergence behavior is critical under our particular distributed
scenario, we seek a more flexible yet still rigorous convergence analysis.
The goal of this paper is thus to introduce a new convergence analysis based on a modal
decomposition of the system and a precise description of aliasing phenomena on coarse systems.
The purpose of this analysis is to provide tools that enable the design of coarsening strategies as
well as inter-grid and smoothing operators. We try to stay close to the technique of local Fourier
analysis (LFA)‡ introduced by Achi Brandt [5, 6] as it is a powerful technique for quantitative
convergence analysis. The essential difference between LFA and our approach is that we drop the
requirement of constant stencil coefficients. By doing so, the eigenvectors of a linear operator will
no longer be the so-called harmonic grid functions used in LFA [7], which in this paper we call


‡ Originally called local mode analysis (LMA); we chose the nomenclature used in [7] as it emphasizes the essential
difference with the approach introduced in this paper.


Fourier harmonic modes. The properties of the system must thus be constrained in some way in
order to develop new tools for convergence analysis. The requirement we focus on is an explicit
description of the aliasing effects produced by the coarsening strategy.
The aliasing of Fourier harmonic modes is present in LFA through the concept of spaces of
harmonics [7]. We identify its simple form as one of the reasons why LFA is so powerful. Based on
this fact, we assume a more general aliasing pattern that still allows us to characterize convergence
behavior. This assumption condenses, in a single algebraic property called the harmonic aliasing
property, all the information needed from the geometry of the discretization and the structure of the
eigenvectors. If this property is satisfied, then no more information is needed from the system and
the analysis is completely algebraic. Therefore, our analysis could be considered a semi-algebraic
approach to the study of convergence issues and the design of efficient inter-grid operators.
One of the practical advantages of our approach is that we are able to separate the problem
of coarsening from what we call filtering, i.e. interpolation/restriction weights and smoothing
operations. The analysis of each problem makes no use of heuristics. The coarsening strategy
is designed to ensure a convenient aliasing pattern whereas the design of the filters is meant to
optimize multigrid convergence.
The main difficulty of our approach is the dependence of the assumptions on the eigenvectors of
the system. In practical applications, it is very unlikely that this information is available. Therefore
the verification of the assumptions remains unsolved. Nevertheless, this problem is also shared in
many fields in which transient or local phenomena do not allow a proper use of Fourier analysis [8].
There have been many efforts to identify suitable bases for specific problems and the goal of this
work is to open this problem in multigrid analysis. For these reasons, the results of this paper
are not entirely conclusive about optimization strategies for coarsening and filtering. They are,
however, an important first step toward this goal.
In Section 2 we provide the notation and the essential properties of the multigrid algorithm
for further analysis. In Section 3 we list the assumptions needed on the algorithm and system in
order to apply our analysis. In Section 4 we list the additional assumptions needed on 2D systems
in order to extend our analysis. In Section 5 we derive the main results about the influence of
inter-grid operators on multigrid convergence and verify the classic heuristics of Fourier harmonic
analysis. In Section 6 we provide examples that show how to use our analysis and also on how
our analysis differs from the classical LFA.

2. THE ELEMENTS OF MULTIGRID ALGORITHMS

We wish to solve discrete linear systems of the form Au = f, defined on a grid Ω_h with step size
h ∈ ℝ⁺ defined as the largest distance between neighboring grid nodes. A coarse grid Ω_s is defined
as a set of nodes such that Ω_s ⊂ Ω_h and s > h.
We define the so-called inter-grid operators, regardless of their use in the multigrid algorithm,
as any linear transformation between scalar fields on Ω_h and Ω_s. That is,

I_s^h ∈ ℝ^{|Ω_h|×|Ω_s|}   and   I_h^s ∈ ℝ^{|Ω_s|×|Ω_h|}   (1)

where I_s^h is the interpolation operator and I_h^s is the restriction operator. We introduce a notation
with markers 'ˇ' or 'ˆ' to indicate transfers from a finer or coarser grid, respectively. We are


then interested in the following operations:

x̌ = I_h^s x,   x ∈ ℝ^{|Ω_h|}   (2)

ŷ = I_s^h y,   y ∈ ℝ^{|Ω_s|}   (3)

and

Ǎ = I_h^s A I_s^h,   A ∈ ℝ^{|Ω_h|×|Ω_h|}   (4)

The definition of the coarsening operator in (4) follows the Galerkin condition and is standard in
most multigrid applications [9].
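With the transfer operators stored as sparse matrices, the Galerkin condition of (4) is a one-line product; a small 1D sketch (ours, for factor-2 coarsening with linear interpolation and full weighting):

import scipy.sparse as sp

def linear_interpolation_1d(n_coarse):
    # I_s^h for a fine grid of 2*n_coarse + 1 interior nodes.
    n_fine = 2*n_coarse + 1
    P = sp.lil_matrix((n_fine, n_coarse))
    for j in range(n_coarse):
        i = 2*j + 1                  # fine index of coarse node j
        P[i, j] = 1.0
        P[i-1, j] += 0.5             # linear weights to the fine neighbors
        P[i+1, j] += 0.5
    return P.tocsr()

n_c = 31
P = linear_interpolation_1d(n_c)
R = 0.5 * P.T                        # full weighting (variational scaling)
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(2*n_c + 1,)*2)
A_coarse = R @ A @ P                 # Galerkin condition, Equation (4)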
We consider a full two-grid approach consisting of a nested iteration step, as shown in Figure 1,
and γ₁ iterations of the correction scheme, including ν₁ pre-smoothing and ν₂ post-smoothing
iterations, as shown in Figure 2. Here, the vector v^k is the kth approximation of the exact solution
of the linear system, u ∈ ℝ^{|Ω_h|}. Similarly, the vector e^k = u − v^k is the approximation error after the
kth step of the algorithm. One smoothing iteration is characterized by the smoothing operator S;
after each iteration the approximation error evolves as e^{k+1} = S e^k. Because of this property we
also call S the smoothing filter.
From these diagrams, it follows that the approximation error between smoothing iterations in
the correction scheme is given by

e^{ν₁+1} = K e^{ν₁}   (5)

Figure 1. Diagram of a nested iterations step. The dotted line separates problems from the
fine and coarse grid domains. The interpolation (restriction) operation is applied to vectors
crossing the dotted line from below (above).

Figure 2. Diagram of a correction scheme step using ν₁ pre-smoothing iterations and ν₂
post-smoothing iterations (e.g. Gauss–Seidel, Jacobi, Richardson, etc.). The dotted line separates
problems from the fine and coarse grid domains. The interpolation (restriction) operation is applied
to vectors crossing the dotted line from below (above).


and similarly, the initial approximation error, e⁰, using nested iteration is given by

e⁰ = K u   (6)

where u is the exact solution of the linear system and K is the so-called coarse grid correction
matrix [10] defined as

K = I − I_s^h Ǎ⁻¹ I_h^s A   (7)

This matrix is the target of our analysis in Section 5 as it controls all of the convergence features
of the two-grid scheme. Considering the effect of smoothing iterations, the error in the whole
correction scheme evolves as

e^{ν₁+1+ν₂} = S^{ν₂} K S^{ν₁} e⁰   (8)
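Reusing linear_interpolation_1d from the sketch in Section 2, one can form K explicitly and observe a defining property, K I_s^h = 0: the coarse grid correction annihilates every error the coarse grid can represent (dense algebra, for illustration only):

import numpy as np
import scipy.sparse as sp

n_c = 31
P = linear_interpolation_1d(n_c).toarray()
R = 0.5 * P.T
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(2*n_c + 1,)*2).toarray()

K = np.eye(A.shape[0]) - P @ np.linalg.solve(R @ A @ P, R @ A)   # Eq. (7)
print(np.abs(K @ P).max())      # ~1e-16: interpolated errors are eliminated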

In the multiple-grid case, a recursive application of nested iterations and the correction scheme
is used to solve the coarse system equations, as shown in Figure 3. Since the coarse systems are not
solved with exact accuracy, the approximation error evolves differently. Here, the error depends
on the accuracy of the solutions from the coarse grids. Thus, the matrix K used above is replaced by
a different matrix, denoted by K₁, which is obtained from the following recursion:

K_L = 0,   A₁ = A

A_j = Ǎ_{j−1},   with j = 2, ..., L−1   and   (9)

K_{j−1} = I − I_j^{j−1} [I − (S_j^{ν₂} K_j S_j^{ν₁})^{γ_j} K_j] (Ǎ_{j−1})⁻¹ I_{j−1}^j A_{j−1},   with j = L, ..., 2

where S_j, I_j^{j−1}, and I_{j−1}^j are the smoothing, interpolation, and restriction operators chosen at
level j, and γ_j is the number of iterations of the correction scheme used at level j. Then, the
approximation error evolves as e⁰ = K₁u in nested iterations and as e^{ν₁+1} = K₁e^{ν₁} between
smoothing iterations of the correction scheme.
Although our analysis is technically applicable to the full multiple-grid case, the coupling
between different levels makes the algebra tedious. Therefore, we concentrate on the two-level
case, and for the multiple-grid case we assume that the problem on the coarse levels has been solved
with enough accuracy that the matrices (S_j^{ν₂} K_j S_j^{ν₁})^{γ_j} K_j can be neglected and we can work under
the two-grid assumptions.

Figure 3. Diagram of the recursive full multigrid approach using one iteration of the correction
scheme per level. Each box represents a number of pre- or post-smoothing iterations. The
particular choice of using the same combination of pre-/post-smoothing iterations on different
correction scheme steps is considered.


3. ASSUMPTIONS ABOUT THE ALGORITHM AND THE SYSTEM

Two assumptions are needed in order to derive our convergence results. First, we introduce a
decomposition of the inter-grid interpolation/restriction operators into up-/down-sampling and
filtering operations, a standard approach in digital signal processing [8, 11]. Second, we assume that
the operators and the system possess the same basis of eigenvectors and we establish a condition
on these eigenvectors under (up-/down)-sampling operations. These conditions are motivated by
standard Fourier harmonic analysis but they are not restricted to systems with Fourier harmonic
modes as eigenvectors.

3.1. System modes


Assuming that A is a diagonalizable square matrix, we define its eigen-decomposition as

A = W Λ Vᵀ   (10)

Here, the diagonal matrix Λ contains the eigenvalues of A on its diagonal. The columns of the
matrix W are the right-eigenvectors of A, i.e. AW = WΛ. The columns of the matrix V contain
the left-eigenvectors of A, i.e. VᵀA = ΛVᵀ.
The column vectors of W and V form a biorthogonal basis since it follows from the above
definitions that

Vᵀ W = I   (11)

If A is a symmetric matrix, then V = W and the column vectors of W form an orthogonal basis.
It is important to note that from this point on our analysis differs from LFA. In LFA it is
assumed that the stencil of A, denoted as the row vector s, is not dependent on the position of
the grid nodes to which it is applied. When this is true, the operation Ax can be expressed as the
convolution:

(Ax)n = (s)k (x)n+k (12)
k

where (Ax)n denotes the nth component of the vector Ax. This implies that the eigenvectors of A
are Fourier harmonic modes. In other words, if (w)k = ei k then Aw = s()w where s() is the
Fourier transform of the stencil sequence. In our analysis, the stencil can depend on the position
of the grid nodes to which it is applied. In this case, the operation Ax can be expressed as

(Ax)n = (sn )k (x)n+k (13)
k

and then the eigenvectors of A need not be Fourier harmonic modes.


Later on we will make assumptions about the eigenvectors of A that are related to the coarsening
strategy of the multigrid approach. This does, of course, limit the scope of our analytical approach,
but it can still be applied to a broader family of operators than LFA. The examples in Sections 6.2
and 6.3 will make this point very clear.


3.2. Smoothing filters


We assume that the smoothing operator S used in the two-grid algorithm, as defined in Section 2,
has the same eigenvectors as A. That is,

S = W Σ Vᵀ   (14)

where Σ is a diagonal matrix with the eigenvalues of the matrix S. The diagonal values in Σ represent
the factor by which each modal component of the approximation error is multiplied after one
smoothing iteration.
As in LFA, our analysis is also applicable to smoothers of the form A⁺e^{k+1} = A⁻e^k with
A = A⁺ − A⁻ [7], e.g. Gauss–Seidel with lexicographical ordering for constant stencil operators,
assuming that both A⁺ and A⁻ have the same eigenvectors as A. The smoothing operator is then
given by

S = W (Λ⁺)⁻¹ Λ⁻ Vᵀ   (15)

where Λ⁺ and Λ⁻ are diagonal matrices with the eigenvalues of A⁺ and A⁻, respectively.

3.3. Inter-grid filters


In our analysis of multigrid convergence, it is useful to decompose the inter-grid operators defined
in Section 2 into two consecutive operations. For two grid levels, with the fine grid Ω_h and the
coarse grid Ω_s, we first identify the operation of selecting nodes from the fine grid for the coarse
grid. This leads to the following definitions:

Definition 1 (Down-/up-sampling matrices)
The down-sampling matrix D ∈ ℝ^{|Ω_s|×|Ω_h|} is defined as

(D)_{i,j} = 1 if node j ∈ Ω_h is the ith selected node, and 0 otherwise   (16)

The up-sampling matrix U ∈ ℝ^{|Ω_h|×|Ω_s|} is defined as

U = Dᵀ   (17)

A similar definition for an unselecting operation, which will be useful in Section 6, is

Definition 2 (Down-/up-unselecting matrices)
The down-unselecting matrix D̄ is defined as

(D̄)_{i,j} = 1 if node j ∈ Ω_h is the ith unselected node, and 0 otherwise   (18)

The up-unselecting matrix Ū is defined as Ū = D̄ᵀ.

An important property that follows from these definitions is

DU = Ĩ   (19)


where Ĩ ∈ ℝ^{|Ω_s|×|Ω_s|} is the identity matrix on the coarse grid. On the other hand, the matrix
UD ∈ ℝ^{|Ω_h|×|Ω_h|} is a diagonal matrix with 1 on the diagonal whenever i = j is a selected node and
0 otherwise.
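Concretely (our sketch), for a 1D grid with factor-2 coarsening that keeps the even-indexed nodes:

import numpy as np

def downsampling_matrix(n_fine):
    # D selects every other fine node; U = D.T (Definition 1).
    sel = np.arange(0, n_fine, 2)
    D = np.zeros((len(sel), n_fine))
    D[np.arange(len(sel)), sel] = 1.0
    return D

D = downsampling_matrix(9)
U = D.T
assert np.allclose(D @ U, np.eye(D.shape[0]))   # property (19): DU = I~
# U @ D is diagonal, with ones exactly at the selected fine nodes.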
Now, we can decompose the inter-grid operators I_s^h and I_h^s, as defined in Section 2, into the
following matrix products:

I_s^h = F_I U,   with F_I ∈ ℝ^{|Ω_h|×|Ω_h|}   and
I_h^s = D F_R,   with F_R ∈ ℝ^{|Ω_h|×|Ω_h|}   (20)

where the square matrices F_I and F_R are called the interpolation and restriction filters, respectively.
Although this kind of decomposition is widely used in digital signal processing [8, 11], it has
not been used for convergence analysis of multigrid algorithms. In the case that the variational
property I_h^s = c(I_s^h)ᵀ is assumed, the inter-grid filters reduce to a single filter F given by

F = F_R = c(F_I)ᵀ   (21)
The inter-grid operator decomposition applies to any kind of inter-grid operators. Now, we
restrict our analysis to the set of inter-grid filters that have the same eigenvectors as the system
matrix A. That is, we assume inter-grid filters of the form

F_I = W Φ_I Vᵀ   and   F_R = W Φ_R Vᵀ   (22)

where Φ_I and Φ_R are diagonal matrices whose diagonal coefficients represent the damping
effect of the filters on the corresponding eigenvector.

3.4. The harmonic aliasing property


From its earliest formulation, multigrid heuristics have always been based on Fourier harmonic
analysis. The idea of reducing high- and low-frequency components of the approximation error
can be found in almost any book or tutorial on the subject. In this paper, we generalize this to
a modal analysis where the eigenvectors (or modes) are not necessarily Fourier harmonic modes.
We keep the notion of harmonic analysis in a more general way. By harmonic modes now we
mean a set of vectors with a certain property that, generally speaking, will preserve the notion
of self-similarity through the aliasing of different modes after down-sampling. As an example, in
Section 6.2 we will mention ‘square-wave’ like functions that do not fit within the scope of LFA.
We introduce this property because the aliasing effects of Fourier harmonic modes are essential to
revealing the role of the smoothing and inter-grid filters in multigrid convergence. Therefore, we
need to define this property for our more general modal analysis.
Since the application of the following property will be constrained to 1D systems, we will start
using a subindex x as a label that indicates the dimension where the operations apply. Then, we
state the harmonic aliasing property as follows:
Definition 3 (Harmonic aliasing property)
A set of biorthogonal eigenvectors, W_x and V_x, and a down-sampling matrix D_x have the harmonic
aliasing property if there exists an ordering of eigenvectors for which

$$ V_x^T U_x D_x W_x = N_x \qquad (23) $$


where U_x = D_x^T is the up-sampling matrix and N_x is the harmonic aliasing pattern that we define
to be

$$ N_x = \frac{1}{2} \begin{bmatrix} Ĩ_x & Ĩ_x \\ Ĩ_x & Ĩ_x \end{bmatrix} \qquad (24) $$

We must note that the harmonic aliasing property only involves the eigenvectors of the system
and the down-/up-sampling operator. Although this is a strong assumption on the system, it
only involves the down-sampling operator from the multigrid algorithm. It does not depend on
the smoothing and inter-grid filters. This is an important consequence of the inter-grid operator
decomposition.
The definition above implicitly assumes a down-sampling by a factor of 2 and naturally induces
a partition of the eigenvectors into two sets, say W_x = [W_{Lx} W_{Hx}] for the right-eigenvectors
and V_x = [V_{Lx} V_{Hx}] for the left-eigenvectors. The subscripts Lx and Hx resemble the standard
Fourier harmonic analysis used to distinguish between low- and high-frequency modes (see for
instance [10]). Using these partitions, we can restate the harmonic aliasing property. For that
purpose we state the following definition:

Definition 4 (Surjective property)
A set of biorthogonal eigenvectors, W_x and V_x, and a down-sampling matrix D_x have the surjective
property if there exists an ordering of the eigenvectors for which the partitions W_x = [W_{Lx} W_{Hx}]
and V_x = [V_{Lx} V_{Hx}] fulfill the following conditions:

$$ D_x W_{Lx} = D_x W_{Hx} \qquad (25) $$

and

$$ D_x V_{Lx} = D_x V_{Hx} \qquad (26) $$

Theorem 1
The surjective property is equivalent to the harmonic aliasing property.

Proof
First, we have to note that, given the partitions W_x = [W_{Lx} W_{Hx}] and V_x = [V_{Lx} V_{Hx}], we can
rewrite the harmonic aliasing property as the following set of biorthogonal relationships:

$$ (D_x V_{Lx})^T (D_x W_{Lx}) = \tfrac{1}{2} Ĩ_x \qquad (27) $$

$$ (D_x V_{Lx})^T (D_x W_{Hx}) = \tfrac{1}{2} Ĩ_x \qquad (28) $$

$$ (D_x V_{Hx})^T (D_x W_{Lx}) = \tfrac{1}{2} Ĩ_x \qquad (29) $$

and

$$ (D_x V_{Hx})^T (D_x W_{Hx}) = \tfrac{1}{2} Ĩ_x \qquad (30) $$

Then, since W_x and V_x form a biorthogonal basis, we have

$$ W_x V_x^T = W_{Lx} V_{Lx}^T + W_{Hx} V_{Hx}^T = I_x \qquad (31) $$


By pre-multiplication by D_x and post-multiplication by U_x, we obtain

$$ (D_x W_x)(D_x V_x)^T = (D_x W_{Lx})(D_x V_{Lx})^T + (D_x W_{Hx})(D_x V_{Hx})^T = Ĩ_x \qquad (32) $$

From here, if we assume the surjective property, then Equation (32) immediately implies the set
of biorthogonal relationships above, and the harmonic aliasing property is fulfilled.
Now, we assume the harmonic aliasing property holds and we pre-multiply Equation (32) by
(D_x V_{Lx})^T. Using Equations (27) and (28) we obtain

$$ \begin{aligned} (D_x V_{Lx})^T (D_x W_{Lx})(D_x V_{Lx})^T + (D_x V_{Lx})^T (D_x W_{Hx})(D_x V_{Hx})^T &= (D_x V_{Lx})^T \\ \tfrac{1}{2}(D_x V_{Lx})^T + \tfrac{1}{2}(D_x V_{Hx})^T &= (D_x V_{Lx})^T \\ (D_x V_{Lx})^T &= (D_x V_{Hx})^T \end{aligned} \qquad (33) $$

Similarly, we post-multiply Equation (32) by D_x W_{Hx}. Using Equations (28) and (30), we obtain

$$ \begin{aligned} (D_x W_{Lx})(D_x V_{Lx})^T (D_x W_{Hx}) + (D_x W_{Hx})(D_x V_{Hx})^T (D_x W_{Hx}) &= D_x W_{Hx} \\ \tfrac{1}{2}(D_x W_{Lx}) + \tfrac{1}{2}(D_x W_{Hx}) &= D_x W_{Hx} \\ D_x W_{Lx} &= D_x W_{Hx} \end{aligned} \qquad (34) $$

Therefore, the harmonic aliasing property implies the surjective property. □
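The smallest nontrivial case may help fix ideas. The following sketch (ours; it assumes an orthonormal basis, so V = W, and selection of the first node) verifies Definition 3 numerically for N = 2:

    import numpy as np

    W = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)   # orthonormal, so V = W
    V = W
    D = np.array([[1.0, 0.0]])                               # select node 1
    U = D.T

    N_x = V.T @ U @ D @ W
    assert np.allclose(N_x, 0.5 * np.ones((2, 2)))           # the pattern (24), with I~ = [1]
    assert np.allclose(D @ W[:, :1], D @ W[:, 1:])           # surjective property: D W_L = D W_H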

4. ASSUMPTIONS FOR SEPARABLE BASIS SYSTEMS

In Section 3 we stated assumptions that will allow us to understand the role of the smoothing
and inter-grid filters in multigrid convergence. The assumptions stated in Section 3, however, do not
cover many multigrid applications. Specifically, when using the multigrid algorithm in
d-dimensional problems, the down-sampling is often designed to reduce the number of grid nodes
by a factor of 2^d. On the other hand, the harmonic aliasing property, as stated in Section 3.4,
is essentially applicable only for cases where the grids are down-sampled by a factor of 2. The
down-sampling by a factor of 2^d is important to reduce the computational and space costs of the
algorithm. In this section, we assume further properties of the algorithm and system so that our
analysis can be extended to these cases.
For these extensions we use the tensor product defined as:

Definition 5 (Kronecker product)
If A is an m×n matrix and B is a p×q matrix, then the Kronecker product A ⊗ B is the mp×nq
block matrix:

$$ A ⊗ B = \begin{bmatrix} (A)_{1,1} B & \cdots & (A)_{1,n} B \\ \vdots & \ddots & \vdots \\ (A)_{m,1} B & \cdots & (A)_{m,n} B \end{bmatrix} \qquad (35) $$


The most useful properties of Kronecker products for the purpose of our analysis are

$$ (A ⊗ B)(C ⊗ D) = AC ⊗ BD \qquad (36) $$

and

$$ (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1} \qquad (37) $$

For further properties, we refer the reader to [12, 13].
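A quick numerical check of (36) and (37), with small random matrices (almost surely invertible), reads:

    import numpy as np

    rng = np.random.default_rng(0)
    A, B = rng.standard_normal((3, 3)), rng.standard_normal((2, 2))
    C, E = rng.standard_normal((3, 3)), rng.standard_normal((2, 2))

    assert np.allclose(np.kron(A, B) @ np.kron(C, E), np.kron(A @ C, B @ E))   # Eq. (36)
    assert np.allclose(np.linalg.inv(np.kron(A, B)),
                       np.kron(np.linalg.inv(A), np.linalg.inv(B)))            # Eq. (37)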

4.1. Separability assumptions


We now assume that we have a system matrix representing a 2D system with coordinates x
and y. We denote the system matrix as A_xy ∈ R^{mn×mn}, where the integers m and n represent the
discretization size of the dimensions corresponding to x and y, respectively. We assume that the
system matrix can be expressed as the sum of Kronecker products:

$$ A_{xy} = A_{x,1} ⊗ A_{y,1} + \cdots + A_{x,r} ⊗ A_{y,r} = \sum_{i=1}^{r} A_{x,i} ⊗ A_{y,i} \qquad (38)–(39) $$

where A_{x,i} ∈ R^{m×m} and A_{y,i} ∈ R^{n×n}, with i = 1, . . . , r, represent r possible operators acting on
the dimensions x and y, respectively.
We assume that the matrices A_{x,i}, i = 1, . . . , r, have the same set of eigenvectors W_x and V_x,
and that the matrices A_{y,i}, i = 1, . . . , r, have the same set of eigenvectors W_y and V_y, but each matrix
can have a different set of eigenvalues. We denote the matrix of eigenvalues as Λ_{x,i} for each matrix
A_{x,i}, and Λ_{y,i} for each matrix A_{y,i}. Thus, we have the following eigen-decompositions:

$$ A_{x,i} = W_x Λ_{x,i} V_x^T, \quad i = 1, . . . , r \qquad (40) $$

and

$$ A_{y,i} = W_y Λ_{y,i} V_y^T, \quad i = 1, . . . , r \qquad (41) $$

for which the sets of eigenvectors satisfy the biorthogonal relationships V_x^T W_x = I_x and V_y^T W_y = I_y,
where I_x is an m×m identity matrix and I_y is an n×n identity matrix.
It follows from these assumptions that the right-eigenvectors of the system matrix A_xy, denoted
as W_xy, and its eigenvalues, denoted as Λ_xy, are given by

$$ W_{xy} = W_x ⊗ W_y \quad \text{and} \quad Λ_{xy} = \sum_{i=1}^{r} Λ_{x,i} ⊗ Λ_{y,i} \qquad (42) $$

The left-eigenvectors, denoted as V_xy, are given by

$$ V_{xy}^T = W_{xy}^{-1} = (W_x ⊗ W_y)^{-1} = W_x^{-1} ⊗ W_y^{-1} = V_x^T ⊗ V_y^T = (V_x ⊗ V_y)^T \qquad (43) $$
We refer to the assumptions above as the separability assumptions because they allow us to apply
the assumptions from Section 3 for separate sets of eigenvectors. This kind of factorization for the
system matrix often appears in the discretization of partial differential equations (PDEs) (e.g. in
finite difference discretization of the Laplacian, divergence and other operators). Thus, the analysis
under these extended assumptions will be more suitable for applications.
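For instance (our sketch, using the standard 2D Laplacian as a separable example), a Kronecker-sum system with r = 2 terms can be assembled and its separated eigen-decomposition verified numerically:

    import numpy as np

    def laplacian(n):
        return 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

    m, n = 6, 4
    Ax, Ay = laplacian(m), laplacian(n)
    Axy = np.kron(Ax, np.eye(n)) + np.kron(np.eye(m), Ay)    # Eqs. (38)-(39), with r = 2 terms

    lx, Wx = np.linalg.eigh(Ax)
    ly, Wy = np.linalg.eigh(Ay)
    Wxy = np.kron(Wx, Wy)                                    # Eq. (42)
    Lxy = np.kron(np.diag(lx), np.eye(n)) + np.kron(np.eye(m), np.diag(ly))

    assert np.allclose(Axy @ Wxy, Wxy @ Lxy)                 # W_xy diagonalizes A_xy
    assert np.allclose(np.linalg.inv(Wxy), Wxy.T)            # orthonormal case: V_xy = W_xy, Eq. (43)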


4.2. Separable filters


The purpose of the assumptions in this section is to apply more aggressive coarsening in the
multi-dimensional case. We start from two down-sampling matrices Dx and D y independently
designed to down-sample the nodes of the x- and y-dimensions by a factor of 2. Then, we define
the down-sampling matrix for the 2D system, denoted as Dx y , as

$$ D_{xy} = D_x ⊗ D_y \qquad (44) $$

In this way the down-sampling matrix Dx y is designed to reduce the total number of nodes by a
factor of 4.
We use inter-grid filters, denoted by F_{I,xy} and F_{R,xy}, and expressed as

$$ F_{I,xy} = F_{I,x} ⊗ F_{I,y} \quad \text{and} \quad F_{R,xy} = F_{R,x} ⊗ F_{R,y} \qquad (45) $$

where F_{I,x}, F_{R,x} and F_{I,y}, F_{R,y} are interpolation and restriction filters with eigenvectors W_x and
W_y, respectively, and with eigenvalues Σ_{I,x}, Σ_{R,x} and Σ_{I,y}, Σ_{R,y}, respectively. Therefore, F_{I,xy}
and F_{R,xy} have right-eigenvectors W_xy, left-eigenvectors V_xy and eigenvalues given by

$$ Σ_{I,xy} = Σ_{I,x} ⊗ Σ_{I,y} \quad \text{and} \quad Σ_{R,xy} = Σ_{R,x} ⊗ Σ_{R,y} \qquad (46) $$

We note that due to the properties of Kronecker products, the decomposition in (20) is valid for
both 1D and 2D operators.
Similarly, the smoothing operator S_xy is designed such that

$$ S_{xy} = S_x ⊗ S_y \qquad (47) $$

where S_x and S_y are smoothing operators with eigenvectors W_x and W_y and eigenvalues Σ_x and
Σ_y, respectively. The eigenvalues of S_xy are given by

$$ Σ_{xy} = Σ_x ⊗ Σ_y \qquad (48) $$

4.3. The separable harmonic aliasing property


Under the separability assumptions stated in the sections above, we assume the harmonic aliasing
property on each set W_x, D_x and W_y, D_y. Then, a generalization of the harmonic aliasing property
that we call the separable harmonic aliasing property follows for the set W_xy, D_xy. That is,

$$ V_{xy}^T U_{xy} D_{xy} W_{xy} = (V_x ⊗ V_y)^T (D_x ⊗ D_y)^T (D_x ⊗ D_y)(W_x ⊗ W_y) = (V_x^T U_x D_x W_x) ⊗ (V_y^T U_y D_y W_y) = N_x ⊗ N_y \qquad (49) $$

where N_x and N_y are harmonic aliasing patterns as defined in (24).


5. ERROR ANALYSIS

In Section 2 the coarse grid correction matrix K was defined as

$$ K = I − I_s^h Ǎ^{-1} I_h^s A \qquad (50) $$

This is the main object of study in this section, as it shows the evolution of the approximation
error in both nested iteration and the correction scheme. Namely, the approximation error after a
full two-grid step with μ_1 correction scheme iterations, each of them with ν_1 pre-smoothing and
ν_2 post-smoothing iterations, is given by

$$ e^{((ν_1+1+ν_2)μ_1)} = (S^{ν_2} K S^{ν_1})^{μ_1} K u \qquad (51) $$
In the following sub-sections, we use the assumptions stated in Sections 3 and 4 to see how the
eigenvectors of the system are affected by these iterations. Based on the partition of eigenvectors
introduced in Section 3, we apply the same principle to create the following partition of eigenvalues:

$$ Λ_x = \begin{bmatrix} Λ_{Lx} & 0 \\ 0 & Λ_{Hx} \end{bmatrix}, \quad Σ_{I,x} = \begin{bmatrix} Σ_{I,Lx} & 0 \\ 0 & Σ_{I,Hx} \end{bmatrix} \quad \text{and} \quad Σ_{R,x} = \begin{bmatrix} Σ_{R,Lx} & 0 \\ 0 & Σ_{R,Hx} \end{bmatrix} \qquad (52) $$

with the smoothing eigenvalues Σ_x partitioned analogously into Σ_{Lx} and Σ_{Hx}.
Within this section, we will use the convention to omit any subscript x, y or xy whenever the
analysis leads to the same formulas. For example, the eigen-decomposition A = W Λ V^T is valid in
both 1D and 2D because the eigen-decomposition A_x = W_x Λ_x V_x^T is assumed in the 1D case, and the
properties of Kronecker products imply A_xy = W_xy Λ_xy V_xy^T in the 2D case.

5.1. Galerkin coarsening


From the assumptions in both Sections 3 and 4, the Galerkin condition stated in (4) can be expressed
as

$$ Ǎ^{-1} = \{I_h^s A I_s^h\}^{-1} = \{D F_R A F_I U\}^{-1} = \{(D W) Σ_R Λ Σ_I (D V)^T\}^{-1} \qquad (53) $$

From here, we first consider the assumptions in Section 3. Using the partition of eigenvectors
induced by the harmonic aliasing property, we define the matrix

$$ Δ_x = Σ_{R,Lx} Λ_{Lx} Σ_{I,Lx} + Σ_{R,Hx} Λ_{Hx} Σ_{I,Hx} \qquad (54) $$

Then, we follow the last step in (53) and obtain

$$ (Ǎ_x)^{-1} = \{(D_x W_x) Σ_{R,x} Λ_x Σ_{I,x} (D_x V_x)^T\}^{-1} = \{(D_x W_{Lx}) Δ_x (D_x V_{Lx})^T\}^{-1} = 4 (D_x W_{Lx}) Δ_x^{-1} (D_x V_{Lx})^T \qquad (55) $$

where we use, first, the surjective property and, second, the biorthogonal relationships (27)–(30).


Now we consider the assumptions in Section 4. Similarly, for this case we define the matrices

$$ Δ_{x,i} = Σ_{R,Lx} Λ_{Lx,i} Σ_{I,Lx} + Σ_{R,Hx} Λ_{Hx,i} Σ_{I,Hx} \qquad (56) $$

$$ Δ_{y,i} = Σ_{R,Ly} Λ_{Ly,i} Σ_{I,Ly} + Σ_{R,Hy} Λ_{Hy,i} Σ_{I,Hy} \qquad (57) $$

and, based on these definitions,

$$ Δ_{xy} = \sum_{i=1}^{r} Δ_{x,i} ⊗ Δ_{y,i} \qquad (58) $$

Then, we follow the last step in (53) to obtain

$$ \begin{aligned} (Ǎ_{xy})^{-1} &= \{(D_{xy} W_{xy}) Σ_{R,xy} Λ_{xy} Σ_{I,xy} (D_{xy} V_{xy})^T\}^{-1} \\ &= \{(D_x W_{Lx} ⊗ D_y W_{Ly}) Δ_{xy} (D_x V_{Lx} ⊗ D_y V_{Ly})^T\}^{-1} \\ &= 16 (D_x W_{Lx} ⊗ D_y W_{Ly}) Δ_{xy}^{-1} (D_x V_{Lx} ⊗ D_y V_{Ly})^T \\ &= 16 (D_{xy} W_{Lxy}) Δ_{xy}^{-1} (D_{xy} V_{Lxy})^T \end{aligned} \qquad (59) $$

where we use, first, the surjective property and, second, the biorthogonal relationships (27)–(30),
and finally, we simply define W_{Lxy} = W_{Lx} ⊗ W_{Ly} and V_{Lxy} = V_{Lx} ⊗ V_{Ly}.
We note that in both (55) and (59) the Galerkin coarse matrix Ǎ has an eigen-decomposition
whose eigenvectors are the down-sampled eigenvectors of A. This is a useful property, as it
ensures that the assumptions stated for the system on the fine grid are satisfied on coarser grids as
well.

5.2. Convergence rates


Using the assumptions in Sections 3 and 4 and the results from Section 5.1, we can express the
coarse grid correction matrix as follows:

$$ \begin{aligned} K &= I − I_s^h Ǎ^{-1} I_h^s A \\ &= I − F_I U Ǎ^{-1} D F_R W Λ V^T \\ &= I − W Σ_I (V^T U) Ǎ^{-1} (D W) Σ_R Λ V^T \\ &= I − (2^{2d}) W Σ_I (V^T U D W_L) Δ^{-1} (V_L^T U D W) Σ_R Λ V^T \end{aligned} \qquad (60) $$

where d represents the dimension of the problem. In parentheses we see how the harmonic aliasing
property appears naturally in this matrix.
For the assumptions from Section 3, we follow the algebra to obtain

$$ \begin{aligned} K_x &= I_x − 4 W_x Σ_{I,x} (V_x^T U_x D_x W_{Lx}) Δ_x^{-1} (V_{Lx}^T U_x D_x W_x) Σ_{R,x} Λ_x V_x^T \\ &= W_x V_x^T − 4 W_x Σ_{I,x} \left( \frac{1}{2} \begin{bmatrix} Ĩ_x \\ Ĩ_x \end{bmatrix} \right) Δ_x^{-1} \left( \frac{1}{2} \begin{bmatrix} Ĩ_x & Ĩ_x \end{bmatrix} \right) Σ_{R,x} Λ_x V_x^T \\ &= W_x \begin{bmatrix} Ĩ_x − Σ_{I,Lx} Δ_x^{-1} Σ_{R,Lx} Λ_{Lx} & −Σ_{I,Lx} Δ_x^{-1} Σ_{R,Hx} Λ_{Hx} \\ −Σ_{I,Hx} Δ_x^{-1} Σ_{R,Lx} Λ_{Lx} & Ĩ_x − Σ_{I,Hx} Δ_x^{-1} Σ_{R,Hx} Λ_{Hx} \end{bmatrix} V_x^T \end{aligned} \qquad (61) $$

Note that the matrix K is not diagonalized by the eigenvectors of the system. Instead, we obtain a
2×2 block matrix with diagonal blocks that shows how each group of modes from W_{Lx} and W_{Hx}
is damped and mixed. In order to simplify this result, we define the convergence operator, Γ_x, such that

$$ K_x = W_x Γ_x V_x^T = W_x \begin{bmatrix} Γ_{Lx→Lx} & Γ_{Hx→Lx} \\ Γ_{Lx→Hx} & Γ_{Hx→Hx} \end{bmatrix} V_x^T \qquad (62) $$

Each one of the four submatrices in Γ_x is diagonal and we call them the modal convergence
operators. Their diagonal values represent the factor by which each modal component of the
error is multiplied and transferred between Lx and Hx modes according to the subscripts. Their
diagonal values can be simplified as follows:

$$ (Γ_{Lx→Lx})_{i,i} = \frac{1}{1+a_i b_i}, \quad (Γ_{Hx→Lx})_{i,i} = \frac{−b_i}{1+a_i b_i}, \quad (Γ_{Lx→Hx})_{i,i} = \frac{−a_i}{1+a_i b_i} \quad \text{and} \quad (Γ_{Hx→Hx})_{i,i} = \frac{a_i b_i}{1+a_i b_i} \qquad (63) $$

where

$$ a_i = \frac{(Σ_{R,Lx})_{i,i} (Λ_{Lx})_{i,i}}{(Σ_{R,Hx})_{i,i} (Λ_{Hx})_{i,i}} \quad \text{and} \quad b_i = \frac{(Σ_{I,Lx})_{i,i}}{(Σ_{I,Hx})_{i,i}} \qquad (64) $$
The convergence of a two-grid algorithm depends on the smoother S_x and the coarse grid correction
matrix K_x, whose actions in the basis of the system's eigenvectors are contained in the matrices Σ_x and
Γ_x, respectively. The matrix Γ_x and its four modal convergence operators allow us to focus on
the performance of the inter-grid operators; therefore, Γ_x is the main object of study for the design
of inter-grid filters. In Section 6 we will show examples of how to apply this analysis.
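As an illustration (all diagonal inputs below are made up for the example), Equations (63)–(64) can be evaluated directly once the diagonal blocks of Λ_x, Σ_{I,x} and Σ_{R,x} are known:

    import numpy as np

    rng = np.random.default_rng(1)
    n2 = 4                                    # number of (L, H) mode pairs
    lam_L = rng.uniform(0.1, 1.0, n2)         # diagonal of Lambda_{Lx}
    lam_H = rng.uniform(1.0, 4.0, n2)         # diagonal of Lambda_{Hx}
    sR_L, sR_H = rng.uniform(0.5, 1.0, n2), rng.uniform(0.05, 0.5, n2)   # Sigma_{R,Lx}, Sigma_{R,Hx}
    sI_L, sI_H = sR_L, sR_H                   # variational case: F_I = F_R assumed

    a = (sR_L * lam_L) / (sR_H * lam_H)       # Eq. (64)
    b = sI_L / sI_H
    G_LL, G_HL = 1 / (1 + a * b), -b / (1 + a * b)        # Eq. (63)
    G_LH, G_HH = -a / (1 + a * b), a * b / (1 + a * b)
    print("rho(Gamma_Lx->Lx) =", np.abs(G_LL).max())      # spectral radius of a diagonal block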
From the assumptions in Section 4, we follow a different algebra. This is

$$ \begin{aligned} K_{xy} &= I_{xy} − 16 W_{xy} Σ_{I,xy} (V_{xy}^T U_{xy} D_{xy} W_{Lxy}) Δ_{xy}^{-1} (V_{Lxy}^T U_{xy} D_{xy} W_{xy}) Σ_{R,xy} Λ_{xy} V_{xy}^T \\ &= I_{xy} − 16 W_{xy} Σ_{I,xy} \left( \frac{1}{2} \begin{bmatrix} Ĩ_x \\ Ĩ_x \end{bmatrix} ⊗ \frac{1}{2} \begin{bmatrix} Ĩ_y \\ Ĩ_y \end{bmatrix} \right) Δ_{xy}^{-1} \left( \frac{1}{2} \begin{bmatrix} Ĩ_x & Ĩ_x \end{bmatrix} ⊗ \frac{1}{2} \begin{bmatrix} Ĩ_y & Ĩ_y \end{bmatrix} \right) Σ_{R,xy} Λ_{xy} V_{xy}^T \\ &= I_{xy} − W_{xy} \left( \begin{bmatrix} Σ_{I,Lx} \\ Σ_{I,Hx} \end{bmatrix} ⊗ \begin{bmatrix} Σ_{I,Ly} \\ Σ_{I,Hy} \end{bmatrix} \right) Δ_{xy}^{-1} \left( \sum_{i=1}^{r} \begin{bmatrix} Σ_{R,Lx} Λ_{Lx,i} & Σ_{R,Hx} Λ_{Hx,i} \end{bmatrix} ⊗ \begin{bmatrix} Σ_{R,Ly} Λ_{Ly,i} & Σ_{R,Hy} Λ_{Hy,i} \end{bmatrix} \right) V_{xy}^T \\ &= W_{xy} Γ_{xy} V_{xy}^T \end{aligned} \qquad (65) $$
Here, a simple structure for the convergence operator Γ_xy does not appear clearly because of the
Kronecker products involved. Since the matrix Δ_{xy}^{-1} cannot in general be factored as a Kronecker
product, we cannot analyze the convergence of the algorithm for each dimension independently
of the other. We then need to consider the four possible combinations of x, y-dimensions and
L, H groups. The products corresponding to these combinations are mixed in Γ_xy and we need to
reorder them to identify the modal convergence operators. Thus, we introduce a permutation
matrix P ∈ {0, 1}^{mn×mn} such that for arbitrary matrices X_L, X_H ∈ R^{m/2×m/2} and Y_L, Y_H ∈ R^{n/2×n/2}
one has

$$ P \left( \begin{bmatrix} X_L \\ X_H \end{bmatrix} ⊗ \begin{bmatrix} Y_L \\ Y_H \end{bmatrix} \right) = \begin{bmatrix} X_L ⊗ Y_L \\ X_H ⊗ Y_L \\ X_L ⊗ Y_H \\ X_H ⊗ Y_H \end{bmatrix} \qquad (66) $$

Then, applying this permutation to reorder the rows and columns of Γ_xy, we obtain the following
structure:

$$ P Γ_{xy} P^T = \begin{bmatrix} Γ_{LxLy→LxLy} & Γ_{HxLy→LxLy} & Γ_{LxHy→LxLy} & Γ_{HxHy→LxLy} \\ Γ_{LxLy→HxLy} & Γ_{HxLy→HxLy} & Γ_{LxHy→HxLy} & Γ_{HxHy→HxLy} \\ Γ_{LxLy→LxHy} & Γ_{HxLy→LxHy} & Γ_{LxHy→LxHy} & Γ_{HxHy→LxHy} \\ Γ_{LxLy→HxHy} & Γ_{HxLy→HxHy} & Γ_{LxHy→HxHy} & Γ_{HxHy→HxHy} \end{bmatrix} \qquad (67) $$

where we identify the modal convergence operators representing the 16 possible ways to transfer
modal components of the error between the four combinations of x, y-dimensions and L, H groups
according to the subscripts. The values of each one of these groups can be expressed in a generic
form as

$$ Γ_{AxBy→CxDy} = δ_{AC} δ_{BD} − (Σ_{I,Cx} ⊗ Σ_{I,Dy}) Δ_{xy}^{-1} \sum_{i=1}^{r} (Σ_{R,Ax} Λ_{Ax,i}) ⊗ (Σ_{R,By} Λ_{By,i}) \qquad (68) $$

where A, B, C, D ∈ {H, L} and δ_{AC} δ_{BD} denotes the identity matrix if A = C and B = D, and zero
otherwise.
The convergence operator Γ_xy and its 16 modal convergence operators allow us to focus on the
performance of the inter-grid operators, and Γ_xy is again the main object of study for the design of
inter-grid filters. Compared with the 1D case, the analysis is now more complicated as the modal
components of the error are transferred not only between two groups of modes but also between
different dimensions. In Section 6.3 we will show an example of how to design inter-grid filters
under this scenario.

5.3. The heuristics in error analysis


We consider an ideal scenario for a 1D problem in order to check the heuristic behavior of the
multigrid algorithm. By using the variational property, we define the single inter-grid filter F_{sharp,x}
such that

$$ Σ_{Lx} = Ĩ \quad \text{and} \quad Σ_{Hx} = 0 \qquad (69) $$


We call this filter the sharp inter-grid filter. In Fourier harmonic analysis, it would correspond
to what is called a 'perfect low-pass filter' [11]. Our definition is more general, as it can now
be applied to a more general kind of basis, namely any basis with the harmonic aliasing property.
By using the eigen-decomposition of A and the sharp inter-grid filter in (63), we obtain

$$ K_{sharp,x} = W_{Hx} W_{Hx}^T \qquad (70) $$

Therefore, for this choice of inter-grid operators, we can see that repeated applications of the coarse
grid correction matrix do not help to further reduce the error: the correction simply cancels the W_{Lx}
components of the error. We then need to apply smoothing iterations in order to reduce the W_{Hx}
components of the error. We also verify that the error reduction achieved by multigrid iterations does
not depend on the step size h, as the iteration matrix does not depend on the eigenvalues of A. The
simplicity of this result shows the general principles of multigrid algorithm design. In Section 6 we
will see how this idealistic scenario does not always lead to an optimal algorithm for solving linear
systems.
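A numerical sketch of (70), anticipating the 1D Laplacian and reordered sine basis of Section 6.1 (the odd-node choice for D is our assumption, not stated in the original), is:

    import numpy as np

    N = 16
    A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)        # stencil [-1 2 -1], Dirichlet
    i, j = np.meshgrid(np.arange(1, N + 1), np.arange(1, N + 1), indexing="ij")
    W = np.sqrt(2 / 17) * np.sin(i * j * np.pi / 17)            # orthonormal eigenvectors of A
    W = W[:, list(range(8)) + list(range(15, 7, -1))]           # reorder columns: modes 1..8, 16..9

    D = np.zeros((8, N)); D[np.arange(8), np.arange(0, N, 2)] = 1.0   # odd-node selection (assumed)
    U = D.T
    F = W @ np.diag([1.0] * 8 + [0.0] * 8) @ W.T                # sharp inter-grid filter, Eq. (69)

    A_c = D @ F @ A @ F @ U                                     # Galerkin coarse-grid operator
    K = np.eye(N) - F @ U @ np.linalg.solve(A_c, D @ F @ A)
    assert np.allclose(K, W[:, 8:] @ W[:, 8:].T)                # K_sharp = W_H W_H^T, Eq. (70)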

6. EXAMPLES OF INTER-GRID FILTER DESIGN

In Section 5 we obtained theoretical results for the convergence rates based on the assumptions
stated in previous sections. In this section, we introduce examples to show how these results can be
applied to different kinds of systems. We consider systems based on different sets of eigenvectors:
Fourier harmonic modes, Hadamard harmonic modes, and a mixture of Fourier and Hadamard
harmonic modes.

6.1. Fourier harmonic analysis: trade-off between computational complexity and convergence rate

We consider a 1D system in which A is a standard finite-difference discretization of a second-order
derivative with step size h = 1; i.e. the stencil of A is s = [−1 2 −1] (the middle entry is the
diagonal element). We apply Dirichlet boundary conditions, i.e. stencil [2 −1] at the left corner
and [−1 2] at the right corner, which lead to an invertible system. The number of nodes in the
discretization is set to N = 16 and we consider a two-grid algorithm with a coarse-grid step size
of 2h = 2. In addition we assume the variational property, which leads to a single inter-grid filter F.
The eigenvectors of A are given by (W)_{i,j} = √(2/17) sin(i j π/17), with i, j = 1, . . . , 16. The eigen-
vector matrix W is orthonormal and, after reversing the order of the columns j = 9, . . . , 16, it also
system. On the other hand, the extension of Fourier analysis from complex- to real-valued harmonic
functions is well known and LFA can therefore be applied to this system. Thus, the purpose of this
example is to (i) show how our method is applied to a standard system in which the eigenvectors can
be labeled by frequencies, thus giving an intuitive picture of what is happening and (ii) show how to
design inter-grid filters within our new framework and thus demonstrate the issue we discover in this
process.
For the inter-grid filter, we start with the common choice of linear interpolation and full
weighting (LI/FW), and we consider their application over an increasing number of neighbors per
node. The standard choice for this system considers two neighbors per node, which leads to an
inter-grid filter F with stencil s = [0.5 1 0.5] and Dirichlet boundary conditions. Considering more
neighbors per node is equivalent to applying the inter-grid filter F several times in interpolation
or restriction operations. Thus, the inter-grid filters F, F^2, F^3, F^4, . . . represent LI/FW operations
over 2, 4, 6, 8, . . . neighbors per node, respectively.
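The following sketch (ours) assembles this two-grid setup and should reproduce the first row of Table I below; the odd-node selection for D is an assumption on our part:

    import numpy as np

    N = 16
    A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    i, j = np.meshgrid(np.arange(1, N + 1), np.arange(1, N + 1), indexing="ij")
    W = np.sqrt(2 / 17) * np.sin(i * j * np.pi / 17)
    W = W[:, list(range(8)) + list(range(15, 7, -1))]           # harmonic-aliasing ordering

    D = np.zeros((8, N)); D[np.arange(8), np.arange(0, N, 2)] = 1.0   # odd-node selection (assumed)
    U = D.T
    F = np.eye(N) + 0.5 * (np.eye(N, k=1) + np.eye(N, k=-1))    # LI/FW stencil [0.5 1 0.5]

    A_c = D @ F @ A @ F @ U                                     # Galerkin coarse-grid operator
    K = np.eye(N) - F @ U @ np.linalg.solve(A_c, D @ F @ A)
    G = W.T @ K @ W                                             # Gamma_x (V = W here), Eq. (62)
    for name, blk in [("L->L", G[:8, :8]), ("H->L", G[:8, 8:]),
                      ("L->H", G[8:, :8]), ("H->H", G[8:, 8:])]:
        print(name, np.abs(np.diag(blk)).max())                 # values reported in Table I, row 1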
In Table I we show the spectral radii of Γ_{Lx→Lx}, Γ_{Hx→Lx}, Γ_{Lx→Hx}, and Γ_{Hx→Hx} for a
two-grid approach using different numbers of LI/FW passes. Here, the most important factor is
the spectral radius of Γ_{Lx→Lx}. It shows the worst-case reduction of modal components of the error
for low-frequency modes that are mapped to themselves. In LFA the spectral radius ‖Γ_{Lx→Lx}‖ is
called the asymptotic convergence factor, ρ_loc [7]. The reduction of these components of the error
is the main task of the two-grid approach. We do not see much reduction of the high-frequency
components of the error that are mapped to themselves, as the spectral radius of Γ_{Hx→Hx} is always
close to 1, leaving this task to the smoothing iterations. The cross-frequency rates Γ_{Hx→Lx} and
Γ_{Lx→Hx} represent the aliasing effect in which high- and low-frequency components of the error
are reduced and mapped to low- and high-frequency components of the error, respectively.
The spectral radius of Γ_{Hx→Lx} in Table I appears to be close to 1, which means an almost
complete transfer of high-to-low frequency components of the error at each iteration. A careful
look at the convergence rates shows that this large number comes from the transfer of the highest-
frequency error to the lowest-frequency error. Although this transfer is not ideal, it is not critical
because the pre-smoothing iterations will reduce the highest-frequency error very effectively. As
expected, all the convergence rates in Table I are further reduced as we increase the number of
LI/FW passes. The disadvantage of increasing the number of passes is that the inter-grid filter,
as well as the coarse system matrix, becomes less and less sparse (see Figure 4(a)–(d)), thus
increasing the computational complexity of the algorithm.
To complete the convergence analysis, we need to consider a smoothing filter and select the
number of smoothing iterations. A simple choice is the Richardson iteration scheme, which
leads to a smoothing filter S = I − (1/ω)A, with ω = 4 obtained from the Gershgorin bound of A. This
filter satisfies our assumptions because it has the same eigenvectors as A. Since the task of the
smoothing filter is to reduce the high-frequency components of the error, we suggest choosing
the number of smoothing iterations such that the reduction of the high-frequency components of
the error, given by Σ_{Hx}, is equal to or less than the reduction of low-frequency components of
the error achieved by the coarse grid correction matrix, given by Γ_{Lx→Lx}. For this example,
using a 1-pass LI/FW inter-grid filter we achieve the same reduction of low-frequency error as
the reduction of high-frequency error achieved by one Richardson iteration. For instance, using
one pre-smoothing (ν_1 = 1) and one post-smoothing (ν_2 = 1) Richardson iteration in the correction
scheme, the approximation error after one full two-grid step (μ_1 = 1) will be given by e^{(3)} = (SK)^2 u,
with a convergence rate of ‖(SK)^2‖ = 0.2458.

Table I. Spectral radii of modal convergence operators for the system in Section 6.1.

Filter            ‖Γ_{Lx→Lx}‖   ‖Γ_{Hx→Lx}‖   ‖Γ_{Lx→Hx}‖   ‖Γ_{Hx→Hx}‖
LI/FW 1-pass      0.4539        0.9915        0.4539        0.9915
LI/FW 2-passes    0.3647        0.5280        0.4388        1.0000
LI/FW 3-passes    0.2839        0.4946        0.4110        1.0000
LI/FW 4-passes    0.2149        0.4506        0.3745        1.0000
LI/FW 5-passes    0.1590        0.4011        0.3334        1.0000
LI/FW 6-passes    0.1155        0.3506        0.2914        1.0000

The results consider a two-grid approach using several passes of LI/FW as inter-grid operators.
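The two-grid rate quoted above, ‖(SK)^2‖ = 0.2458, can be checked with a short sketch (ours; the odd-node selection for D and the spectral 2-norm for ‖·‖ are assumptions):

    import numpy as np

    N = 16
    A = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
    D = np.zeros((8, N)); D[np.arange(8), np.arange(0, N, 2)] = 1.0   # odd-node selection (assumed)
    U = D.T
    F = np.eye(N) + 0.5 * (np.eye(N, k=1) + np.eye(N, k=-1))    # 1-pass LI/FW filter

    A_c = D @ F @ A @ F @ U                                     # Galerkin coarse-grid operator
    K = np.eye(N) - F @ U @ np.linalg.solve(A_c, D @ F @ A)
    S = np.eye(N) - A / 4.0                                     # Richardson smoother, omega = 4
    M = (S @ K @ S) @ K                                         # (S K S) K = (S K)^2 for nu1 = nu2 = mu1 = 1
    print(np.linalg.norm(M, 2))                                 # approximately 0.2458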


As a different choice of inter-grid operators, we try to approach the sharp inter-grid filter with
a procedure common in signal processing. We select the eigenvalues of F in analogy with
a Butterworth filter of order n [11]. We start at order n = 1 with a cut-off frequency of π/16 that
tries to reduce all frequencies except for the lowest frequency mode, and as we increase the order
n the cut-off frequency approaches π/2 geometrically, at which point the filter becomes perfectly
sharp. That is,

$$ B_n(i) = \frac{1}{1 + \left( \dfrac{2}{1−(7/8)^n} \, \dfrac{i−1}{N−1} \right)^{2n}}, \quad i = 1, . . . , 16 \qquad (71) $$

from which we construct the inter-grid filter as F = W Σ W^T with Σ = diag(B_n). The main reason
to move the cut-off frequency with the order of the filter is to prevent the eigenvalues in Σ_{Hx}
from producing large cross-frequency convergence rates.
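Under our reading of Equation (71) (the formula above is a reconstruction, so this sketch is hedged accordingly), the filter can be assembled as follows; note that F = W Σ W^T is unchanged if W and Σ are permuted together, so the natural frequency ordering can be used:

    import numpy as np

    N, n = 16, 3                                        # filter order n = 3 as a sample choice
    idx = np.arange(1, N + 1)
    Bn = 1.0 / (1.0 + (2.0 / (1.0 - (7.0 / 8.0) ** n)
                       * (idx - 1) / (N - 1)) ** (2 * n))   # Eq. (71), our reading

    i, j = np.meshgrid(idx, idx, indexing="ij")
    W = np.sqrt(2 / 17) * np.sin(i * j * np.pi / 17)    # eigenvectors, natural frequency order
    F = W @ np.diag(Bn) @ W.T                           # dense filter, unlike LI/FW
    print(np.round(Bn, 3))                              # cut-off moves toward pi/2 as n grows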
In Table II we show the spectral radii of Γ_{Lx→Lx}, Γ_{Hx→Lx}, Γ_{Lx→Hx}, and Γ_{Hx→Hx} for
a two-grid approach using Butterworth filters of different orders. The Butterworth filter is
better than LI/FW, especially in terms of the cross-frequency convergence rate Γ_{Hx→Lx}.
The main disadvantage of the Butterworth filter is that it is always non-sparse, as shown in
Figure 4(e)–(h). Even if increasing the order n makes the filter appear more and more sparse, the
overall contribution of the small terms is comparable to that of the largest entries. Moreover, increasing
the order n concentrates the largest entries close to the diagonal, and the tridiagonal elements become
similar to the LI/FW entries. This hints at the optimality of LI/FW as a tridiagonal inter-grid
filter for this specific problem.
An important conclusion of these tests is that in the design of inter-grid filters for systems with
Fourier harmonic modes as eigenvectors, we face a trade-off between the number of multigrid steps
that can be saved by moving toward a sharp inter-grid filter and the number of communications
between neighboring nodes required for interpolation/restriction tasks. This is a consequence of
the Gibbs phenomenon, which is well known in Fourier analysis [11].

6.2. Hadamard harmonic analysis: optimality of the sharp inter-grid filter


Now, we consider a system based on an application of Markov chains. The system will have a
variable size with 2^{l−1}, l ∈ N^+, transient states and at least one recurrent state. We ignore the
precise number of recurrent states and their interconnections as they will not play any role in the

Table II. Spectral radii of modal convergence operators for the system in Section 6.1.

Filter   ‖Γ_{Lx→Lx}‖   ‖Γ_{Hx→Lx}‖   ‖Γ_{Lx→Hx}‖   ‖Γ_{Hx→Hx}‖
B1       0.4156        0.5826        0.4493        0.9982
B2       0.2932        0.4994        0.4150        1.0000
B3       0.1954        0.4350        0.3615        1.0000
B4       0.1246        0.3623        0.3011        1.0000
B5       0.0770        0.2925        0.2431        1.0000
B6       0.0467        0.2314        0.1923        1.0000
B7       0.0279        0.1807        0.1502        1.0000

The results consider a two-grid approach using Butterworth filters of different orders as the inter-grid filter.


Figure 4. Images of the magnitudes of the entries of different inter-grid filter matrices. The
gray intensity is white for the largest magnitude and black for the smallest; the scale between
black and white is logarithmic in order to increase the visual difference between small and
zero entries: (a) LI/FW 1-pass; (b) LI/FW 3-passes; (c) LI/FW 5-passes; (d) LI/FW
7-passes; (e) B_1; (f) B_3; (g) B_5; and (h) B_7.
solution of the problem. Thus, the structure of the system is given by the transition probability
matrix within the transient states, which is obtained by the following recursion:

$$ T_1 = \tfrac{1}{2} \qquad (72) $$

$$ T_l = \begin{bmatrix} T_{l−1} & 2^{−l} · Ĩ_c \\ 2^{−l} · Ĩ_c & T_{l−1} \end{bmatrix} \quad \text{for } l > 1 \qquad (73) $$

where Ĩ_c is a counter-diagonal matrix of the same size as T_{l−1}. The recursion (73) creates a matrix
T_l ∈ (R^+)^{2^{l−1}×2^{l−1}} that is sub-stochastic, since the sum of the entries in each row is always less
than or equal to 1. In fact, the sum of the entries in each row of T_l is equal to 1 − 1/2^l. Thus, in
this Markov chain, each transient state has a probability of 1/2^l of jumping to one or more recurrent
states in one step. An example of this structure is shown in Figure 5, where we can see the state
transition diagram of the transient states for l = 4.
Since, by definition, no recurrent state is connected to any transient state, once the process
jumps from a transient to a recurrent state it will never return to any transient state and it is said
to have been absorbed. Starting from a given transient state i, 1 ≤ i ≤ 2^{l−1}, the number of jumps
within the transient states before jumping to a recurrent state is called the absorbing time, t_i. There
are many applications associated with these so-called absorbing chains [14]; for instance, in the
study of discrete phase-type distributions in queueing theory [15].
Here, we will consider the problem of computing the expected value of the absorbing time when
we start at node i, denoted by (x_l)_i = E[t_i]. The vector x_l ∈ R^{2^{l−1}} is given by the solution of the linear


Figure 5. State transition diagram of the transient states for the Markov chain used in Section 6.2
with l = 4 (N = 8 nodes). Each connection with solid line shows the probability of state transitions.
The dashed lines with double arrows show the probability of transition to one or more recurrent
states that do not appear in this figure.

system

$$ (I − T_l) x_l = 1 \qquad (74) $$

where (1)_i = 1 for i = 1, . . . , 2^{l−1}. Here, our system matrix is given by A_l = I − T_l, which is a
non-singular, symmetric, positive-definite M-matrix. Furthermore, the matrix A_l becomes ill-
conditioned as we increase l, creating a problem similar to that found in the numerical solution
of linear PDEs. In the general context of absorbing chains, the matrix A_l = I − T_l is called the
fundamental matrix [14]. The inversion of this matrix is important as it also appears in the
computation of moments of discrete phase-type distributions and the probability of absorption by
recurrent classes, among other problems.
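The recursion (72)–(73) and the system (74) are easy to reproduce. The following sketch (ours) builds T_l; for l = 4, the matrix A_l = I − T_l matches the fundamental matrix displayed below in (75):

    import numpy as np

    def T(l):                                       # recursion (72)-(73)
        if l == 1:
            return np.array([[0.5]])
        Tp = T(l - 1)
        C = 2.0 ** (-l) * np.fliplr(np.eye(Tp.shape[0]))   # 2^{-l} times the counter-diagonal
        return np.block([[Tp, C], [C, Tp]])

    l = 4
    Tl = T(l)
    Al = np.eye(2 ** (l - 1)) - Tl                  # fundamental matrix A_l = I - T_l
    assert np.allclose(Tl.sum(axis=1), 1.0 - 2.0 ** (-l))   # row sums 1 - 1/2^l (sub-stochastic)

    x = np.linalg.solve(Al, np.ones(2 ** (l - 1)))  # expected absorbing times, Eq. (74)
    print(x)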
In the transition graph of this Markov chain, each node representing a transient state is connected
to l neighboring nodes. However, the structure of connections changes from node to node such
that the stencil of Al is not constant throughout the rows. For instance, in the Markov chain of
Figure 5, the fundamental matrix is
$$ A_4 = \begin{bmatrix} 0.5 & −0.25 & 0 & −0.125 & 0 & 0 & 0 & −0.0625 \\ −0.25 & 0.5 & −0.125 & 0 & 0 & 0 & −0.0625 & 0 \\ 0 & −0.125 & 0.5 & −0.25 & 0 & −0.0625 & 0 & 0 \\ −0.125 & 0 & −0.25 & 0.5 & −0.0625 & 0 & 0 & 0 \\ 0 & 0 & 0 & −0.0625 & 0.5 & −0.25 & 0 & −0.125 \\ 0 & 0 & −0.0625 & 0 & −0.25 & 0.5 & −0.125 & 0 \\ 0 & −0.0625 & 0 & 0 & 0 & −0.125 & 0.5 & −0.25 \\ −0.0625 & 0 & 0 & 0 & −0.125 & 0 & −0.25 & 0.5 \end{bmatrix} \qquad (75) $$


Here, the stencil at the 3rd row is s_3 = [−0.125, 0.5, −0.25, 0, −0.0625] (the 0.5 entry is the
diagonal element), whereas the stencil at the 4th row is s_4 = [−0.125, 0, −0.25, 0.5, −0.0625].
Therefore, the assumptions of LFA are not fulfilled and its analysis does not apply to this system.
Nevertheless, in the tests that follow we will ignore this fact as we wish to see what convergence
rates LFA predicts for a system where its assumptions do not apply.
In fact, the eigenvectors of the fundamental matrix Al do not correspond to the Fourier harmonic
modes of LFA but instead form a Hadamard matrix of order N = 2l−1 . One of the standard ways
to construct this matrix is Sylvester’s construction [16], but the basis obtained by this procedure
does not fulfill the harmonic aliasing property. As in the previous example, we need to reorder the
columns of the eigenvector matrix in order to obtain the right structure. Therefore, we introduce
a column-reordered variation of Sylvester’s construction as follows:

$$ W_1 = 1 \qquad (76) $$

$$ W_{l+1} = \frac{1}{\sqrt{2}} \, [U \; Ū] \begin{bmatrix} W_l & W_l \\ W_l & −W_l \end{bmatrix} \qquad (77) $$

where U and Ū correspond to uniform up-sampling and up-unselecting matrices of size 2^l × 2^{l−1}.
The matrix [U Ū] acts as a permutation matrix that reorders the new basis. From
this construction, it can be checked through induction arguments that the matrix W_l is
orthonormal and that it fulfills the harmonic aliasing property. The same arguments can be used
to check that W_l diagonalizes the system matrix A_l. Furthermore, the orthogonality of W_l
and Equation (77) allow us to obtain a closed-form expression for the sharp inter-grid filter, as
defined in (69). That is,
$$ F_{sharp,l+1} = W_{l+1} \begin{bmatrix} Ĩ & 0 \\ 0 & 0 \end{bmatrix} W_{l+1}^T = \frac{1}{2}(I + U D̄ + Ū D) = \frac{1}{2} \begin{bmatrix} 1 & 1 & & & & \\ 1 & 1 & & & & \\ & & 1 & 1 & & \\ & & 1 & 1 & & \\ & & & & \ddots & \\ & & & & & \ddots \end{bmatrix} \qquad (78) $$

The structure of the filter turns out to be very sparse, unlike the sharp filter for the previous
example. This filter alternately averages the value at each node with that of its left neighbor and then its
right neighbor.
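The construction (76)–(77) and the identity (78) can be checked numerically. In the sketch below (ours), uniform every-other-node sampling is assumed for U and Ū:

    import numpy as np

    def sampling(n):                      # U (up-sampling) and Ubar (up-unselecting), n x n/2
        D = np.zeros((n // 2, n)); D[np.arange(n // 2), np.arange(0, n, 2)] = 1.0
        Db = np.zeros((n // 2, n)); Db[np.arange(n // 2), np.arange(1, n, 2)] = 1.0
        return D.T, Db.T

    def hadamard(l):                      # reordered Sylvester construction, Eq. (77)
        W = np.array([[1.0]])
        for _ in range(l - 1):
            U, Ub = sampling(2 * W.shape[0])
            W = (1 / np.sqrt(2)) * np.hstack([U, Ub]) @ np.block([[W, W], [W, -W]])
        return W

    l = 4
    n = 2 ** (l - 1)
    W = hadamard(l)
    U, Ub = sampling(n)
    D, Db = U.T, Ub.T

    assert np.allclose(W.T @ W, np.eye(n))                 # orthonormality
    pattern = 0.5 * np.tile(np.eye(n // 2), (2, 2))
    assert np.allclose(W.T @ U @ D @ W, pattern)           # harmonic aliasing, Eqs. (23)-(24)

    F_sharp = W @ np.diag([1.0] * (n // 2) + [0.0] * (n // 2)) @ W.T
    assert np.allclose(F_sharp, 0.5 * (np.eye(n) + U @ Db + Ub @ D))   # Eq. (78)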
In our analysis, the inter-grid filter F_l and the smoothing operator S_l should be designed to
match the structure of the system. For this reason, our analysis would not work if we used standard
inter-grid operators such as LI/FW, because the eigenvectors of the LI/FW filter are Fourier
harmonic modes, which differ from the Hadamard harmonic modes. As the sharp inter-grid
filter in (78) has a sparse structure, we choose it as the inter-grid filter. As in the previous example,
for the smoothing filter we use the Richardson iteration scheme, which leads to a smoothing filter
S_l = I − (1/ω)A_l, with ω = 1 − 2^{−l} obtained from the Gershgorin bound of A_l. Since the sharp inter-grid
filter removes all the Lx components of the error, the only parameters to configure are the

number of smoothing iterations. This means that we need only one iteration of the full two-grid
algorithm with O(1) smoothing iterations to make the algorithm converge. On the other hand,
a standard choice of LI/FW inter-grid operators does not work better than the sharp inter-grid
configuration as shown in Table III.
As this scenario is rather unusual in the context of PDEs, where the eigenvectors are typically
similar to Fourier harmonic modes (that come with Gibbs phenomenon, as shown in Section 6.1),
we would like to understand how the sparse inter-grid filter arranges the information to reach
convergence in one step. To understand this, we need to consider three facts. First, the fact that the
sharp inter-grid filter is alternately averaging the values at each node with its left and then right
neighbor. Second, we need to note that the coarse grid matrix Ǎl constructed from Al and Fsharp,l ,
using the Galerkin condition, is equal to our definition of Al−1 constructed by recursion (this can
be checked by induction). This would not have been the case if we used a different inter-grid
filter such as LI/FW. Then, we can say that the sharp inter-grid filter has been able to unveil the
recursive structure by which we defined the system. It is also a nice property in the sense that
the coarse grid problem also represents an absorbing Markov chain; thus the sharp inter-grid filter
makes the two-grid algorithm an aggregation method similar to what is sought in [17] using a
different multi-level approach.
The third fact is that the structure of our system induces a hierarchical classification of nodes.
Namely, we can define classes of nodes by the strength of their connections, as is usually done in
AMG methods [2]. Two nodes i and j belong to the same class if they have a transition probability
(P)_{i,j} ≥ 1/2^c, with 1 ≤ c ≤ l. For instance, in the system of Figure 5, for c = 1 we have eight singleton
classes, each containing an individual transient state; for c = 2 we have four classes: {1, 2},
{3, 4}, {5, 6}, and {7, 8}; for c = 3 we have two classes: {1, 2, 7, 8} and {3, 4, 5, 6}; and finally for
c = 4 we have one class with the whole set of nodes. This classification of nodes is shown in
Figure 6.
Finally we can see how these three facts combine. The sharp inter-grid filter averages the
strongest connected nodes, which correspond alternately to nodes at the left and right of each

Table III. Convergence rates of the full two-grid algorithm for different inter-grid
operators and different sizes of the system in Section 6.2.

        ‖(SK)^2‖
N       Sharp filter   LI/FW
2       0.0000         0.2500
4       0.0816         0.2030
8       0.1600         0.2700
16      0.2040         0.3447
32      0.2268         0.3955
64      0.2383         0.4428
128     0.2442         0.4817
256     0.2471         0.5156

The configuration considers one step of the full two-grid algorithm with one pre-smoothing
and one post-smoothing Richardson iteration. The results compare the convergence rates
obtained by using a sharp inter-grid filter or LI/FW as inter-grid operators.


Figure 6. Classification of nodes by the strength of their connection for the Markov chain in
Figure 5. By considering only the strongest connections, we start in the white color with eight
singleton classes. As we consider weaker connections, we obtain four classes, two classes and
finally one class with the whole set of nodes, represented in light to dark gray colors, respectively.
The classification leads to a nested structure of classes.

Table IV. Spectral radii of modal convergence operators for the system in Section 6.2.

Analysis      ‖Γ_{Lx→Lx}‖   ‖Γ_{Hx→Lx}‖   ‖Γ_{Lx→Hx}‖   ‖Γ_{Hx→Hx}‖
MA, all N     0             0             0             1
LFA N = 2     0             0             0             1
LFA N = 4     0.0528        0.2236        0.2236        0.9472
LFA N = 8     0.1702        0.3758        0.3758        0.9803
LFA N = 16    0.2877        0.4527        0.4527        0.9936
LFA N = 32    0.3739        0.4838        0.4838        0.9981
LFA N = 64    0.4283        0.4948        0.4948        0.9995
LFA N = 128   0.4602        0.4984        0.4984        0.9999
LFA N = 256   0.4783        0.4995        0.4995        1.0000

The results consider a two-grid approach using the sharp inter-grid filter from (78). The first row shows the
results for our modal analysis (MA), which do not change with the problem size. The following rows show
the estimates of LFA (working under incorrect assumptions) for systems of increasing size.

node. These nodes belong to the same class defined above for c = 2 and, since the different classes
for 1 ≤ c ≤ l are nested (see Figure 6), the sharp inter-grid filter guarantees a similar structure on the
coarse grid. This did not happen in the example of Section 6.1 because in that case we could not
separate classes with a nested structure. This fact seems to be crucial in order to obtain an optimal
inter-grid filter for the Markov chain problem.
In terms of convergence factors for this example, our analysis gives results different from those of
LFA applied while ignoring the fact that the assumptions for LFA are not fulfilled. This is shown in
Table IV, where we can see that the convergence estimated by our method agrees with LFA
only for grid size N = 2. This is because N = 2 is the only size for which the Hadamard
basis is the same as the Fourier basis. For N > 2 we see how LFA gives increasingly pessimistic
estimates of the convergence factors.
We can also check how different the convergence analysis would be if we chose LI/FW for the
inter-grid operators. The multigrid algorithm lets us use these inter-grid operators but then neither
LFA nor our analysis can be applied to get information about modal convergence. This is because


the Fourier harmonic modes of the LI/FW inter-grid filter do not match the Hadamard harmonic
modes of the system. If we ignore this limitation and we use the Hadamard harmonic basis to
estimate the convergence of a two-grid step, we obtain the results of Table V. On the other hand,
if we use a Fourier harmonic basis to estimate convergence rates (which corresponds to LFA), we
obtain the results in Table VI. The Hadamard analysis leads to a more pessimistic estimation, but
it is not possible to determine which result is more accurate because the definitions of the L and
H groups of modes technically do not apply under either analysis.
The conclusion of this approach is that an arbitrary choice of inter-grid operators does not let us
apply the heuristics of the multigrid methodology if we cannot define groups of L and H modes.
The choice of LI/FW inter-grid operators still seems to make the algorithm stable because the
estimated convergence factors are always less than 1, but its performance is obviously inferior to
that of the optimal sharp inter-grid filter for this system.
Thus, in this case our analysis has been shown to be better than LFA in terms of its usefulness
for studying convergence rates. Its main advantage appears in the design of inter-grid filters and
smoothing operators.

6.3. Fourier–Hadamard harmonic analysis: the mixture of two different bases


We now consider a 2D system that corresponds to a mixture of the system from Section 6.1
and the system from Section 6.2. Let A_x ∈ R^{16×16} be the system matrix from Section 6.1 and let
A_y ∈ R^{16×16} be the system matrix from Section 6.2 for l = 5, N = 16. Then, we define a 2D system
by taking the Kronecker sum of these two operators. That is,

$$ A_{xy} = A_x ⊕ A_y = A_x ⊗ I_y + I_x ⊗ A_y \qquad (79)–(80) $$
Table V. Spectral radii of modal convergence operators for different sizes of the system in Section 6.2.

N     ‖Γ_{Lx→Lx}‖   ‖Γ_{Hx→Lx}‖   ‖Γ_{Lx→Hx}‖   ‖Γ_{Hx→Hx}‖
4     0.4375        0.7844        0.2296        0.8438
8     0.5179        0.8122        0.2641        0.9183
16    0.5843        0.8466        0.3737        0.9586
32    0.6279        0.8893        0.4322        0.9791
64    0.6624        0.9708        0.4645        0.9895

The results consider a two-grid approach, using LI/FW as the inter-grid operators, and assuming the Hadamard
basis as eigenvectors of the system matrix (valid assumption) and of the inter-grid filter (wrong assumption).

Table VI. Spectral radii of modal convergence operators for different sizes of the system in Section 6.2.

N     ‖Γ_{Lx→Lx}‖   ‖Γ_{Hx→Lx}‖   ‖Γ_{Lx→Hx}‖   ‖Γ_{Hx→Hx}‖
4     0.2205        0.6765        0.3841        0.8843
8     0.2782        0.7038        0.4527        0.9630
16    0.3597        0.6907        0.4660        0.9915
32    0.4150        0.7805        0.4770        0.9978
64    0.4514        0.8945        0.4879        0.9995

The results consider a two-grid approach, using LI/FW as the inter-grid operators, and assuming Fourier
harmonic modes as eigenvectors of the system matrix (wrong assumption) and of the inter-grid filter (valid assumption).


Thus, the system matrix A_xy ∈ R^{256×256} is a mixture of matrices with different eigenvectors.
Although the problem does not represent any well-known system in applications, we choose it in
order to show how our analysis applies to mixtures of very different systems. A more realistic
scenario of this kind would be, for example, a 2D diffusion equation with a diffusion coefficient
that varies along one of the dimensions. The difficulty in that case is to check the harmonic aliasing
property, which thus remains a problem for future research.
Since A y does not have constant stencil coefficients, neither does A x y . Therefore the assumptions
of LFA are not fulfilled. However, since the system fulfills the assumptions introduced in Section 4,
we are able to apply our modal analysis.
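The following sketch (ours) assembles this mixed system and the separable filter used below; the sharp filter in y is written in its explicit pairwise-averaging form (78):

    import numpy as np

    N = 16
    Ax = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)       # Section 6.1 system

    def T(l):                                                   # Section 6.2 recursion (72)-(73)
        if l == 1:
            return np.array([[0.5]])
        Tp = T(l - 1)
        C = 2.0 ** (-l) * np.fliplr(np.eye(Tp.shape[0]))
        return np.block([[Tp, C], [C, Tp]])

    Ay = np.eye(N) - T(5)                                       # l = 5 gives a 16 x 16 system
    Axy = np.kron(Ax, np.eye(N)) + np.kron(np.eye(N), Ay)       # Kronecker sum, Eqs. (79)-(80)

    Fx = np.eye(N) + 0.5 * (np.eye(N, k=1) + np.eye(N, k=-1))   # 1-pass LI/FW in x
    swap = np.kron(np.eye(N // 2), np.array([[0.0, 1.0], [1.0, 0.0]]))
    Fy = 0.5 * (np.eye(N) + swap)                               # sharp Hadamard filter, Eq. (78)
    Fxy = np.kron(Fx, Fy)                                       # separable inter-grid filter
    print(Axy.shape, Fxy.shape)                                 # (256, 256) for both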
Here, the eigenvectors of the system matrix A x y are given by Wx ⊗ W y , where Wx are Fourier
harmonic modes and W y are Hadamard harmonic modes. From the results of Section 5.2, we
know that although the eigenvectors of a system represented by sums of Kronecker products
are separable, the convergence rates are not. Thus, the problem of design of inter-grid opera-
tors cannot, in general, be considered with any one dimension independent of any other. Now,
since in the y-dimension we can actually implement optimal inter-grid operators using the sharp
inter-grid filter in (78), this allows us to decouple the two problems. Then, if we choose the
inter-grid filter Fx y = Fx ⊗ Fy with the 1-pass LI/FW inter-grid filter as Fx (suitable for Fourier
harmonic eigenvectors) and the sharp inter-grid filter in (78) as Fy (optimal for Hadamard harmonic
modes) we obtain the convergence rates shown in Table VII for the two-grid algorithm. This
combination of inter-grid filters completely removes the cross-modal convergence factors with
modal transfers H y → L y and L y → H y. For the modal transfers H y → H y, we observe complete
removal of cross-modal error components (HxHy → LxHy and LxHy → HxHy) and complete transfer
of self-mode error components (LxHy → LxHy and HxHy → HxHy). For the modal transfers
L y → L y, we observe results similar to those obtained for the 1-pass LI/FW inter-grid filter in
Section 6.1.
As we did in the previous example, we can ignore the fact that the assumptions for LFA are not
fulfilled in this problem and we can compute its estimates for the convergence rates. These results
are shown in Table VIII, where we see that the estimates are not too far from the estimates of
our modal analysis. The disadvantage of LFA, other than working as an approximation, is in the
interpretation of these results as it shows that there is no decoupling between the two dimensions
of the problem.
Finally, we consider the use of different inter-grid operators for which we make a common
choice of using a 2D LI/FW operator. This operator leads to an inter-grid filter Fx y = Fx ⊗ Fy

Table VII. Spectral radii of modal convergence operators for the system in Section 6.3
using our modal analysis.

Γ_xy        from LxLy   from HxLy   from LxHy   from HxHy
→ LxLy      0.4532      0.8503      0           0
→ HxLy      0.4611      0.9994      0           0
→ LxHy      0           0           1           0
→ HxHy      0           0           0           1

The 16 convergence factors are organized according to the subscripts of the modal convergence
operators, indicating transfer from the four combinations of modes in the columns to the
four combinations of modes in the rows. The results consider a two-grid approach, using
a 1-pass LI/FW inter-grid filter for the x-dimension and the sharp inter-grid filter in (78)
for the y-dimension.


Table VIII. Spectral radii of modal convergence operators for the system
in Section 6.3 using LFA (under wrong assumptions).

Γ_xy        from LxLy   from HxLy   from LxHy   from HxHy
→ LxLy      0.6063      0.8420      0.4523      0.2935
→ HxLy      0.4547      0.9995      0.2080      0.2024
→ LxHy      0.4523      0.2935      0.9965      0.1878
→ HxHy      0.2080      0.2024      0.1322      1.0000

The 16 convergence factors are organized according to the subscripts of the modal convergence operators,
indicating transfer from the four combinations of modes in the columns to the four combinations of modes in
the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter for the x-dimension
and the sharp inter-grid filter in (78) for the y-dimension.

Table IX. Spectral radii of modal convergence operators for the system in Section 6.3 using our modal
analysis (under incorrect assumptions).

Γ_xy        from LxLy   from HxLy   from LxHy   from HxHy
→ LxLy      0.7126      0.8287      0.7548      0.2509
→ HxLy      0.4533      0.9997      0.1892      0.1798
→ LxHy      0.3730      0.2177      0.9982      0.2957
→ HxHy      0.1432      0.1433      0.2226      1.0000

The 16 convergence factors are organized according to the subscripts of the modal convergence operators,
indicating transfer from the four combinations of modes in the columns to the four combinations of modes in
the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter in both the x- and
y-dimensions. It is assumed that Fourier harmonic modes are eigenvectors of the operators in the x-dimension
(valid assumption) and the Hadamard basis are eigenvectors of the operators in the y-dimension (valid for the
system matrix and false for the inter-grid filter).

Table X. Spectral radii of modal convergence operators for the system in Section 6.3 using LFA (under
incorrect assumptions).

Γ_xy        from LxLy   from HxLy   from LxHy   from HxHy
→ LxLy      0.6722      0.8313      0.6119      0.3030
→ HxLy      0.4553      0.9996      0.2253      0.2177
→ LxHy      0.4714      0.3026      0.9999      0.2528
→ HxHy      0.2257      0.2177      0.1890      1.0000

The 16 convergence factors are organized according to the subscripts of the modal convergence operators,
indicating transfer from the four combinations of modes in the columns to the four combinations of modes in
the rows. The results consider a two-grid approach, using a 1-pass LI/FW inter-grid filter in both the x- and
y-dimensions. It is assumed that Fourier harmonic modes are eigenvectors of the operators in both the x- and
y-dimensions (false only for the system matrix in the y-dimension).

where both Fx and Fy are 1D, 1-pass LI/FW filters. As in the example of Section 6.2, this choice
of inter-grid operators makes both our modal analysis and LFA not applicable for this problem. In
Tables IX and X, we can see the estimates of our analysis, based on a Fourier–Hadamard basis and
LFA, respectively. The results are very similar and our analysis shows slightly pessimistic results
compared with LFA.


There are many disadvantages to this choice of inter-grid operators. First and most important,
it does not allow us to define groups of L and H modes. Second, under an arbitrary definition
of these groups of modes, using either our analysis or LFA, we see a high coupling in the
cross-modal convergence rates. Finally, the convergence rate for the modal transfer LxLy → LxLy,
which is the most important task for the two-grid algorithm, is far from the convergence
rate achieved by the Fourier–Hadamard inter-grid operators in Table VII. This last fact has a
consequence for the final algorithm, which can be observed by using a smoothing filter S_xy = S_x ⊗ S_y,
where S_x and S_y correspond to the Richardson iteration scheme as configured in Sections 6.1 and
6.2, respectively. Then, a single full two-grid step (μ_1 = 1) with ν_1 = ν_2 = 1 shows a convergence rate
of ‖(SK)^2‖ = 0.2301 for our inter-grid configuration, compared with ‖(SK)^2‖ = 0.3037 obtained
by using a 2D LI/FW inter-grid operator.
Here, our analysis has been found to be better than LFA for the design of a 2D inter-grid filter,
as the combination of LI/FW with a sharp inter-grid filter shows good performance and perfect
decoupling between the convergence rates of different dimensions.

7. CONCLUSIONS

In this paper we introduced new tools for the analysis of the linear multigrid algorithm. These
tools allowed us to reveal and study the roles of the smoothing and inter-grid operators in multigrid
convergence. In most applications of multigrid methods, these operators are designed based on the
geometry and heuristics of the problem. We see this as a big problem for distributed applications
because in such scenarios it is essential to minimize the number of iterations the algorithm requires
to converge.
The main contribution of this paper is the establishment of a new approach to convergence
analysis and new design techniques for inter-grid and smoothing operators. We have shown how
this analysis differs from LFA, which is considered to be the standard tool for the analysis
and design of multigrid methods [7]. Our study shows the clear advantages of our approach when
facing systems with non-uniform stencils. By considering different systems, we showed that there
is no general approach to optimizing the multigrid operators for a given system. For systems
with Fourier harmonic modes as eigenvectors, we face a trade-off between the computational
complexity and the convergence rate of each multigrid step. For systems with a Hadamard basis as
eigenvectors, we are able to obtain optimal multigrid operators that make the algorithm converge
in one step, with O(1) smoothing iterations, which is possible due to the particular structure of
the system. The same multigrid operators show a perfect decoupling in a mixture of two different
systems where one of the operators has a Hadamard basis as eigenvectors. Our modal analysis
has been shown to be crucial to unveil these properties and to show the exact influence of each
operator on the convergence behavior of the algorithm.
We note that, given the assumptions imposed on the system, we were able to analyze multigrid
convergence with no heuristics based on the geometry of the problem. This opens the possibility
of designing a fully AMG method if the correct assumptions are satisfied. Nevertheless, this is
not a straightforward step because the harmonic aliasing property is strongly connected with the
geometry of the problem. The main difficulty in our approach is to check our assumptions on the
eigenvectors of the system. For future research, we are studying practical methods to check these
assumptions and modifications that can make them more flexible to check and manage.


REFERENCES
1. Brandt A. Algebraic multigrid theory: the symmetric case. Applied Mathematics and Computation 1986; 19:
23–56.
2. Ruge JW, Stüben K. Algebraic multigrid (AMG). In Multigrid Methods, Frontiers in Applied Mathematics, vol.
3, McCormick SF (ed.). SIAM: Philadelphia, PA, 1987; 73–130.
3. Brandt A, McCormick SF, Ruge JW. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and
its Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, 1984.
4. Yang UM. Parallel algebraic multigrid methods – high performance preconditioners. In Numerical Solution of
PDEs on Parallel Computers, Bruaset AM, Bjørstad P, Tveito A (eds), Lecture Notes in Computational Science
and Engineering. Springer: Berlin, 2005.
5. Brandt A. Rigorous quantitative analysis of multigrid, I: constant coefficients two-level cycle with L2-norm.
SIAM Journal on Numerical Analysis 1994; 31(6):1695–1730.
6. Brandt A. Multi-level adaptive solutions to boundary-value problems. Mathematics of Computation 1977; 31:
333–390.
7. Trottenberg U, Oosterlee CW, Schüller A. Multigrid. Academic Press: London, 2000.
8. Mallat S. A Wavelet Tour of Signal Processing (2nd edn), Wavelet Analysis and its Applications. Academic
Press: New York, 1999.
9. Briggs WL, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000.
10. Wesseling P. An Introduction to Multigrid Methods. Wiley: Chichester, 1992.
11. Proakis JG, Manolakis DG. Digital Signal Processing (2nd edn), Principles, Algorithms, and Applications.
Macmillan: Indianapolis, IN, 1992.
12. Laub AJ. Matrix Analysis for Scientists and Engineers. SIAM: Philadelphia, PA, 2005.
13. Davis PJ. Circulant Matrices. A Wiley-Interscience Publication, Pure and Applied Mathematics. Wiley: New
York, Chichester, Brisbane, 1979.
14. Brémaud P. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer: New York, 1999.
15. Neuts MF. Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach. Johns Hopkins University
Press: Baltimore, MD, 1981.
16. Sylvester JJ. Thoughts on inverse orthogonal matrices, simultaneous sign-successions, and tessellated pavements
in two or more colours, with applications to Newton's rule, ornamental tile-work, and the theory of numbers.
Philosophical Magazine 1867; 34:461–475.
17. De Sterck H, Manteuffel T, McCormick SF, Nguyen Q, Ruge JW. Markov chains and web ranking: a multilevel
adaptive aggregation method. Thirteenth Copper Mountain Conference on Multigrid Methods, Copper Mountain,
CO, U.S.A., 2007.

Copyright q 2008 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2008; 15:219–247
DOI: 10.1002/nla
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:249–269
Published online 15 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.575

A generalized eigensolver based on smoothed aggregation (GES-SA)
for initializing smoothed aggregation (SA) multigrid

M. Brezina¹,‡, T. Manteuffel¹,‡, S. McCormick¹,‡, J. Ruge¹,‡, G. Sanders¹,∗,†,‡ and P. Vassilevski²

¹ Department of Applied Mathematics, University of Colorado at Boulder, UCB 526, Boulder, CO 80309-0526, U.S.A.
² Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, 7000 East Avenue, Mail Stop L-560, Livermore, CA 94550, U.S.A.

SUMMARY
Consider the linear system Ax = b, where A is a large, sparse, real, symmetric, and positive-definite
matrix and b is a known vector. Solving this system for unknown vector x using a smoothed aggregation
(SA) multigrid algorithm requires a characterization of the algebraically smooth error, meaning error
that is poorly attenuated by the algorithm’s relaxation process. For many common relaxation processes,
algebraically smooth error corresponds to the near-nullspace of A. Therefore, having a good approximation
to a minimal eigenvector is useful to characterize the algebraically smooth error when forming a linear
SA solver. We discuss the details of a generalized eigensolver based on smoothed aggregation (GES-SA)
that is designed to produce an approximation to a minimal eigenvector of A. GES-SA may be applied
as a stand-alone eigensolver for applications that desire an approximate minimal eigenvector, but the
primary purpose here is to apply an eigensolver to the specific application of forming robust, adaptive
linear solvers. This paper reports the first stage in our study of incorporating eigensolvers into the existing
adaptive SA framework. Copyright q 2008 John Wiley & Sons, Ltd.

Received 16 May 2007; Revised 5 December 2007; Accepted 5 December 2007

KEY WORDS: generalized eigensolver; smoothed aggregation; multigrid; adaptive solver

∗ Correspondence to: G. Sanders, Department of Applied Mathematics, University of Colorado at Boulder, UCB 526, Boulder, CO 80309-0526, U.S.A.
† E-mail: sandersg@colorado.edu
‡ University of Colorado at Boulder and Front Range Scientific Computing.

Contract/grant sponsor: University of California Lawrence Livermore National Laboratory; contract/grant number: W-7405-Eng-48


1. INTRODUCTION

In the spirit of algebraic multigrid (AMG) [1–5], smoothed aggregation (SA) multigrid [6] has been designed to solve a linear system of equations with little or no prior knowledge regarding the geometry or physical properties of the underlying problem. Therefore, SA is often an efficient solver for problems discretized on unstructured meshes, with varying coefficients, or with no associated geometry. The relaxation processes commonly used in multigrid solvers are computationally cheap, but often fail to adequately reduce certain types of error, which we call error that is algebraically smooth with respect to the given relaxation. If a characterization of algebraically smooth error is known, in the form of a small set of prototype vectors, the SA framework constructs intergrid transfer operators that allow such error to be eliminated on coarser grids, where relaxation is more economical. For example, in a 3D elasticity problem, six such components (the so-called rigid body modes) form an adequate characterization of the algebraically smooth error. Rigid body modes are often available from discretization packages, and a solver can be produced with these vectors in the SA framework [6]. However, such a characterization is not always readily available (even for some scalar problems) and must then be developed in an adaptive process.
Adaptive SA (αSA), as presented in [7], was designed specifically to create a representative set
of vectors for cases where a characterization of algebraically smooth error is not known. Initially,
simple relaxation is performed on a homogeneous version of the problem for all levels of the
multigrid hierarchy being constructed. These coarse-level approximations are used to achieve a
global-scale update that serves as our first prototype vector that is algebraically smooth with respect
to relaxation. Using this one resulting component, the SA framework is employed to construct a
linear multigrid solver, and the whole process can be repeated with the updated solver playing the
role of relaxation on each multigrid level. At each step, the adequacy of the solver is assessed by
monitoring convergence factors, and if the current solver is deemed adequate, then the adaptive
process is terminated and the current solver is retained.
We consider applying SA to an algebraic system of equations Ax = b, where A = (a_ij) is an n×n symmetric, positive-definite (SPD) matrix that is symmetrically scaled so that its diagonal entries are all ones. For simplicity, we use damped Jacobi for our initial relaxation. The SA framework provides an interpolation operator, P, that is used to define a coarse level with standard Galerkin variational corrections. If the relaxation process is a convergent iteration, then it is known from the literature (e.g. [1, 8]) that a sufficient condition for two-level convergence factors bounded away from one is that, for any u on the fine grid, there exists a v on the coarse grid such that

    ‖u − Pv‖₂² ≤ (C/‖A‖₂) (Au, u)                                        (1)

with some constant C. The quality of the bound on the convergence factor depends on the size of C, as shown in [9]. This requirement is known in the literature as the weak approximation property and reflects the observation, noted in [8, 10], that any minimal eigenvector (an eigenvector associated with the smallest eigenvalue) of A needs to be interpolated with accuracy inversely proportional to the size of its eigenvalue. For this reason, this paper proposes a generalized eigensolver based on smoothed aggregation (GES-SA) to approximate a minimal eigenvector of A.
Solving an eigenvalue problem as an efficient means to developing a linear solver may appear
counterintuitive. However, we aim to compute only an appropriately accurate approximation of
the minimal eigenvector, and then to develop an efficient linear solver with that approximation, at O(n) cost. In this context, many existing efficient methods for generating a minimal eigenvector are
appealing (see [11, 12] for short lists of such methods). Here, we propose GES-SA because it
takes advantage of the same data structures as the existing SA framework. Our intention is to eventually incorporate GES-SA into the αSA framework to enhance the robustness of our adaptive solvers for difficult problems that may benefit from such enhancement (such as systems problems, corner-singularity problems, or problems with geometrically oscillatory near-kernel components).
The GES-SA algorithm performs a series of iterations that minimize the Rayleigh quotient
(RQ) over various subspaces, as discussed in the later sections. In short, GES-SA is a variant of
algebraic Rayleigh quotient multigrid (RQMG [13]) that uses overlapping block RQ Gauss–Seidel
for its relaxation process and SA RQ minimization for coarse-grid updates. In [14], Hetmaniuk
developed an algebraic RQMG algorithm that performs point RQ Gauss–Seidel for relaxation and
coarse-grid corrections based on a hierarchy of static intergrid transfer operators that are supplied
to his algorithm. This supplied hierarchy is assumed to have adequate approximation properties.
In contrast, GES-SA initializes the hierarchy of intergrid transfer operators and modifies it with
each cycle, with the goal of developing a hierarchy with adequate approximation properties, as in
the setup phase of αSA. This is discussed in more detail in Section 3.2.
This paper is organized as follows. The rest of Section 1 gives a simple example and a background
on SA multigrid. Section 2 introduces the components of GES-SA. Section 3 presents how the
components introduced in Section 2 are put together to form the full GES-SA algorithm. Section 4
presents a numerical example with results that demonstrate how the linear SA solvers produced with
GES-SA have desirable performance for particular problems. Finally, Section 5 makes concluding
remarks.

1.1. The model problem

Example 1
Consider the linear problem Ax = b and its associated generalized eigenvalue problem Av = λBv. Matrix A is the 1D Laplacian with Dirichlet boundary conditions, discretized with equidistant second-order central differences and symmetrically scaled so that the diagonal entries are all ones:

            ⎡  2  −1              ⎤
            ⎢ −1   2  −1          ⎥
    A = ½ ⎢      ⋱   ⋱   ⋱      ⎥                                      (2)
            ⎢         −1   2  −1  ⎥
            ⎣             −1   2  ⎦

an n×n tridiagonal matrix. Matrix B for this example is I_n, the identity operator on Rⁿ. The full set of nodes for this problem is Ω_n = {1, 2, ..., n}. The problem size n = 9 is used throughout this paper to illustrate various concepts regarding the algorithm. Note that the 1D problem is used merely to illustrate concepts and is not of further interest in itself, as its tridiagonal structure can be treated with optimal computational complexity by a direct solver. However, the example is useful in the sense that it captures the concepts we present in their simplest form.
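
For concreteness, the model pair (A, B) of Example 1 can be assembled in a few lines. The following NumPy/SciPy sketch is ours, not the paper's; it builds the scaled tridiagonal matrix of (2) and B = I_n.

    import numpy as np
    import scipy.sparse as sp

    n = 9  # illustration size used throughout the paper
    # Scaled 1D Dirichlet Laplacian of (2): (1/2)*tridiag(-1, 2, -1), unit diagonal
    A = sp.diags([-0.5, 1.0, -0.5], offsets=[-1, 0, 1], shape=(n, n), format='csr')
    B = sp.identity(n, format='csr')  # B = I_n in Example 1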


1.2. SA multigrid
In this section, we briefly recall the SA multigrid framework for constructing a multigrid hierarchy.
Like any algebraic multilevel method, SA requires a setup phase. Here, we follow the version
presented in [6, 15]. Given a relaxation process and a set of vectors K characterizing algebraically
smooth error, the SA setup phase produces a multigrid hierarchy that defines a linear solver.
For symmetric problems, such as those we consider here, standard SA produces a coarse grid
using interpolation operator P and restriction operator, R = P T . This gives the variational (or
Galerkin) coarse-grid operator, Ac = P T A P, commonly used in AMG methods. This process is
repeated recursively on all grids, constructing a multigrid hierarchy. The interpolation operator is
produced by applying a smoothing operator, S, to a tentative interpolation operator, P̂, that satisfies
the weak approximation property.
At the heart of forming P̂ is a discrete partitioning of fine-level nodes into a disjoint covering
of the full set of nodes, Ω_n = {1, 2, ..., n}. Members of this partition are locally grouped based on the matrix A_G, representing the graph of strong connections [6]. A_G is created by filtering the original problem matrix A with regard to strength of coupling (Figure 1). For the scalar problems considered here, we define node i to be strongly connected to node j, with respect to the parameter θ ∈ (0, 1), if

    |a_ij| > θ √(a_ii a_jj)                                              (3)

Any connection that violates this requirement is a weak connection. Entry (A_G)_ij = 1 if the connection between i and j is strong, and (A_G)_ij = 0 otherwise.
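
As an illustration of (3), the strong-connection matrix A_G can be formed by thresholding |a_ij| against θ√(a_ii a_jj). The sketch below is a hypothetical dense implementation (the function name, and the default θ = 0.1 used later in Section 4, are our choices).

    import numpy as np

    def strength_graph(A, theta=0.1):
        """Binary matrix A_G of (3): entry 1 where |a_ij| > theta*sqrt(a_ii*a_jj)."""
        A = np.asarray(A)
        d = np.sqrt(np.abs(np.diag(A)))     # sqrt(a_ii)
        G = (np.abs(A) > theta * np.outer(d, d)).astype(int)
        np.fill_diagonal(G, 0)              # keep only off-diagonal couplings
        return G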

Definition 1.1
A collection of m subsets {A_j}_{j=1}^m of Ω_n = {1, 2, ..., n} is an aggregation with respect to A_G if the following conditions hold:
• Covering: ∪_{j=1}^m A_j = Ω_n.
• Disjointness: for any j ≠ k, A_j ∩ A_k = ∅.
• Connectedness: for any j, if two nodes p, q ∈ A_j, then there exists a sequence of edges with end points in A_j that connects p to q within the graph of A_G.
Each individual subset A_j within the aggregation is called an aggregate.

The method we use to form aggregations is given in [6], where each aggregate has a central node, or seed, numbered i, and covers this node's entire strong neighborhood (the support of the ith row in the graph of A_G). This is a very common way of forming aggregations, chosen for its computational benefits, but it is not mandatory. We return to Example 1 to explain the aggregation concept. An acceptable aggregation of Ω₉ with respect to A consists of m = 3 aggregates, each of size 3, defined

Figure 1. Graph of matrix A_G from Example 1 with n = 9. The nine nodes are enumerated, edges of the graph represent nonzero off-diagonal entries in A, and the Dirichlet boundary conditions are represented by the hollow dots at the end points.


as follows:

    A_1 = {1, 2, 3},   A_2 = {4, 5, 6},   A_3 = {7, 8, 9}               (4)

It is easily verified that this partitioning satisfies Definition 1.1. This aggregation is pictured in
Figure 2. 2D examples are presented in Section 4.
We find it useful to represent an aggregation {A_j}_{j=1}^m by an n×m sparse, binary aggregation matrix, which we denote by [A]. Each column of [A] represents a single aggregate, with a one in the (i, j)th entry if node i is contained in aggregate A_j, and a zero otherwise. In our 1D example, with n = 9, we represent the aggregation given in (4) as

            ⎡ 1       ⎤
            ⎢ 1       ⎥
            ⎢ 1       ⎥
            ⎢    1    ⎥
    [A] =   ⎢    1    ⎥                                                  (5)
            ⎢    1    ⎥
            ⎢       1 ⎥
            ⎢       1 ⎥
            ⎣       1 ⎦

Based on the sparsity structure of [A], the SA setup phase constructs P̂ with a range that
represents a given, small collection of linearly independent vectors, K. This is done by simply
restricting the values of each vector in K to the sparsity pattern specified by [A].
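
A minimal sketch of this construction for a single prototype vector (the function names are ours): build the aggregation matrix [A] of (5) from node index sets, then restrict the vector columnwise onto its sparsity pattern.

    import numpy as np

    def aggregation_matrix(aggregates, n):
        """n x m binary aggregation matrix [A] of (5); column j marks aggregate A_j."""
        Agg = np.zeros((n, len(aggregates)), dtype=int)
        for j, nodes in enumerate(aggregates):
            Agg[list(nodes), j] = 1
        return Agg

    def tentative_interp(Agg, v):
        """Restrict prototype vector v to the pattern of [A]; equals diag(v) @ [A]."""
        return Agg * v[:, None]

    # 1D example, aggregation (4)-(5), with 0-based node indices
    Agg = aggregation_matrix([range(0, 3), range(3, 6), range(6, 9)], n=9)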
Under the above construction, the vectors in K are ensured to be in R( P̂), the range of the
tentative interpolation operator, and are therefore well attenuated by a corresponding coarse-grid
correction. However, K is only a small number of near-kernel components. Other vectors in R( P̂)
may actually be quite algebraically oscillatory, which can be harmful to the coarsening process
because it may lead to a coarse-grid operator with higher condition number than desired. This
degrades the effect of coarse-grid relaxation on vectors that are moderately algebraically smooth.
Of greater importance, some algebraically smooth vectors are typically not well represented by
R( P̂) and are therefore not reduced by coarse-grid corrections. To remedy the situation, SA does
not use P̂ as its interpolation operator directly, but instead utilizes a smoothed version, P = S P̂,
where S is an appropriately chosen polynomial smoothing operator. As a result, a much richer
set of algebraically smooth error is accurately represented by the coarse grid. A typical choice
for S is one step of the error propagation operator of damped-Jacobi relaxation. In this paper,

Figure 2. Graph of matrix A_G from Example 1 with n = 9, split into three aggregates. Each box encloses the group of nodes in its respective aggregate.


we use damped-Jacobi smoothing under the assumption that the system is diagonally scaled so
that diagonal elements are one.
The underlying set, K, that induces a linear SA solver can be either supplied, as in standard SA, or computed, as in αSA methods. We now describe a new approach to constructing K that can be used within the existing SA framework.

2. RQ MINIMIZATION WITHIN SUBSPACES

Consider the generalized eigenvalue problem, Av = λBv, where A and B are given n×n real SPD matrices, v is an unknown eigenvector of length n, and λ is an unknown eigenvalue. Our target problem is stated as follows:

    find an eigenvector, v₁ ≠ 0, corresponding to the
    smallest eigenvalue, λ₁, in the problem Av = λBv                     (6)

For convenience, v₁ is called a minimal eigenvector and the corresponding eigenvalue, λ₁, is called the minimal eigenvalue.
First, we review a well-known general strategy for approximating the solution of (6), an approach
that has been used in [13, 16] to introduce our method. This strategy is to select a subspace of
Rn and choose a vector in the subspace that minimizes the RQ. In GES-SA, we essentially do
two types of subspace selection: one uses local groupings to select local subspaces that update
our approximations locally; the other uses SA to select low-resolution subspaces that use coarse
grids to update our approximation globally. These two minimization schemes are used together in
a typical multigrid way.
We recall the RQ to introduce a minimization principle that we use to update an iterate within
a given subspace.

Definition 2.1
The RQ of a vector, v, with respect to matrices A and B is the value

    ρ_{A,B}(v) ≡ (vᵀAv)/(vᵀBv)                                           (7)

Since we restrict ourselves to the case where A and B are SPD, the RQ is always real and positive. The solution we seek minimizes the RQ:

    ρ_{A,B}(v₁) = min_{v∈Rⁿ} ρ_{A,B}(v) = λ₁ > 0                         (8)

If two vectors w and v are such that ρ_{A,B}(w) < ρ_{A,B}(v), then w is considered a better approximate solution to (6) than v. Therefore, problem (6) is restated as a minimization problem:

    find v₁ ≠ 0 such that ρ_{A,B}(v₁) = min_{v∈Rⁿ} ρ_{A,B}(v)            (9)

Given a current approximation, ṽ, we use the minimization principle to construct a subspace, V ⊂ Rⁿ, with dim(V) = m ≪ n, such that

    min_{v∈V} ρ_{A,B}(v) ≤ ρ_{A,B}(ṽ)                                    (10)


The new approximation, w̃, is a vector in V with minimal RQ. Note that if ṽ is already of minimal
RQ, then lowering the RQ is not possible. In general, we must carefully construct the subspace to
ensure that the RQ is indeed lowered.
To select w̃, we must solve a restricted minimization problem within V:

    find w̃ ≠ 0 such that ρ_{A,B}(w̃) = min_{v∈V} ρ_{A,B}(v)             (11)

This restricted minimization problem is solved for w̃ by restating the minimization problem within the lower-dimensional vector space, Rᵐ, and then mapping the low-dimensional solution to the corresponding vector in V. To do so, we construct an n×m matrix, Q, whose m column vectors are a basis for V. Note that, for any v ∈ V, there exists a unique y ∈ Rᵐ such that v = Qy. Moreover, the RQ of v with respect to A and B and the RQ of y with respect to coarse versions of A and B are equivalent:

    ρ_{A,B}(v) = (vᵀAv)/(vᵀBv) = (yᵀQᵀAQy)/(yᵀQᵀBQy) = ρ_{QᵀAQ, QᵀBQ}(y) = ρ_{A_V, B_V}(y)   (12)

for A_V = QᵀAQ and B_V = QᵀBQ. Thus, the solution of restricted minimization problem (11) is found by solving a low-dimensional minimization problem:

    find y₁ ≠ 0 such that ρ_{A_V,B_V}(y₁) = min_{y∈Rᵐ} ρ_{A_V,B_V}(y)    (13)

or, equivalently, a low-dimensional eigenproblem:

    find an eigenvector, y₁ ≠ 0, corresponding to the smallest
    eigenvalue, λ₁, in the eigenproblem A_V y = λ B_V y                  (14)

After either approximating the solution to low-dimensional minimization problem (13) or solving low-dimensional eigenvalue problem (14) for y₁ with a standard eigensolver, the solution to minimization problem (11), restricted to V, is w̃ ← Qy₁. The whole process is then repeated: update ṽ ← w̃, use ṽ to form a new subspace, V, and corresponding Q, solve (14) for y₁, and set w̃ ← Qy₁.
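
The whole step condenses into a few dense linear algebra calls. The sketch below is ours (dense matrices assumed): it projects onto V = R(Q), solves the small generalized eigenproblem (14) with scipy.linalg.eigh, and interpolates the result back.

    import numpy as np
    from scipy.linalg import eigh

    def subspace_rq_min(A, B, Q):
        """One RQ minimization over V = range(Q), per (12)-(14)."""
        AV = Q.T @ A @ Q                 # A_V = Q^T A Q
        BV = Q.T @ B @ Q                 # B_V = Q^T B Q
        lam, Y = eigh(AV, BV)            # ascending eigenvalues of A_V y = lambda*B_V y
        y1 = Y[:, 0]                     # minimal eigenvector y_1
        return Q @ y1, lam[0]            # w-tilde = Q y_1 and its RQ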
The specific methods we use for constructing subspaces are the defining features of GES-SA
and are explained in the following three sections. In Section 2.1, we focus on how a reasonable
initial approximation is obtained using a nonoverlapping version of the subspace minimization
algorithm. In Section 2.2, we present the global subspace minimization based on SA that serves
as our nonlinear coarse-grid update. In Section 2.3, we describe the local subspace minimizations
that play the role of nonlinear relaxation.

2.1. Initial guess development


Because the RQ minimization problem we wish to solve is nonlinear, it is helpful to develop a
fairly accurate initial approximation to a minimal eigenvector. The algorithm presented in this
section is very similar to the local subspace iteration that is presented in Section 2.3. The difference
is that here we perform nonoverlapping additive updates with the zero-vector as an initial iterate.


First, we require that an aggregation, {A_j}_{j=1}^m, be provided. Each aggregate induces a subspace, V_j ⊂ Rⁿ, defined as the set of all vectors v whose support is contained entirely in A_j. We form a local selection matrix, Q_j, that maps R^{m_j} onto V_j, where m_j is the number of nodes in the jth aggregate. This matrix is given by

    Q_j = [ ê_{p₁} | ê_{p₂} | ⋯ | ê_{p_{m_j}} ]                          (15)

where ê_p is the pth canonical basis vector and {p_q}_{q=1}^{m_j} are the nodes in the jth aggregate. We then form local principal submatrices, A_j ← Q_jᵀ A Q_j and B_j ← Q_jᵀ B Q_j. A solution, y₁ ≠ 0, to generalized eigenvalue problem (14) of size m_j is then found using a standard eigensolver. Nodes within the jth aggregate are set as w̃_j ← Q_j y₁. After w̃_j is found for each aggregate, the initial approximation is the sum of the disjoint, locally supported vectors: ṽ ← Σ_{j=1}^m w̃_j.

Remark 2.1
There is no guarantee that w̃_j is of the same sign as the w̃_k that are supported within adjacent aggregates. For example, w̃_j may have all negative entries on A_j and w̃_k may have all positive entries on an adjacent aggregate. In fact, discrepancies in the sign of entries on neighboring aggregates usually occur in practice, because αy₁ is still a solution to the local eigenproblem for any α ≠ 0. However, this is not an issue of concern, because the subsequent coarse-grid update presented in Section 2.2 uses the same aggregation as the initial guess development. The coarse space is invariant to such scaling; hence, the result of the coarse-grid update is independent of it as well. In any case, we emphasize that this may occur only during the initial guess development phase of the algorithm. Example 2 in Section 4 is designed to show that the success of GES-SA is invariant with respect to these sign changes.

We summarize initial guess development in the form of an algorithm. This algorithm is used on every level in the full GES-SA method (Algorithm 3 of Section 3), as pre-relaxation, only during the first multigrid cycle.

Algorithm 1
Initial guess development.
• Function: ṽ ← IGD(A, B, {A_j}_{j=1}^m).
• Input: SPD matrices A and B, and aggregation {A_j}_{j=1}^m.
• Output: Initial approximate solution ṽ to (6).
1. For j = 1, ..., m, do the following:
   (a) Form Q_j based on A_j as in (15).
   (b) Compute A_j ← Q_jᵀ A Q_j and B_j ← Q_jᵀ B Q_j.
   (c) Find any y₁, ‖y₁‖₂ = 1, by solving (14) with a standard eigensolver.
   (d) Interpolate w̃_j ← Q_j y₁.
2. Output ṽ ← Σ_{j=1}^m w̃_j.
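
A dense sketch of Algorithm 1 (ours, under the same assumptions as the sketches above): each aggregate contributes the minimal eigenvector of its principal submatrices, normalized as in step (c).

    import numpy as np
    from scipy.linalg import eigh

    def initial_guess(A, B, aggregates):
        """Algorithm 1 (IGD): sum of disjoint, locally supported minimal eigenvectors."""
        v = np.zeros(A.shape[0])
        for nodes in aggregates:
            idx = np.asarray(list(nodes))
            Aj = A[np.ix_(idx, idx)]     # Q_j^T A Q_j, a principal submatrix
            Bj = B[np.ix_(idx, idx)]
            _, Y = eigh(Aj, Bj)
            y1 = Y[:, 0]
            v[idx] = y1 / np.linalg.norm(y1)   # normalize so ||y_1||_2 = 1
        return v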


Algorithm 1 is demonstrated through Example 1. The selection matrices are

    Q₁ = [ê₁ ê₂ ê₃],   Q₂ = [ê₄ ê₅ ê₆]   and   Q₃ = [ê₇ ê₈ ê₉]           (16)

each a 9×3 matrix whose columns are canonical basis vectors of R⁹. Here, for all aggregates, j = 1, 2, 3, the restricted matrices are identical:

            ⎡  2  −1   0 ⎤               ⎡ 1  0  0 ⎤
    A_j = ½ ⎢ −1   2  −1 ⎥   and   B_j = ⎢ 0  1  0 ⎥                     (17)
            ⎣  0  −1   2 ⎦               ⎣ 0  0  1 ⎦

Hence, the solutions to the restricted eigenproblems are all of the form ỹ₁ = α_j [½, 1/√2, ½]ᵀ, with a scaling factor satisfying |α_j| = 1. Thus, the initial guess developed is the vector

    ṽ = [α₁/2, α₁/√2, α₁/2, α₂/2, α₂/√2, α₂/2, α₃/2, α₃/√2, α₃/2]ᵀ       (18)

For the case α_j = 1 for all three aggregates, the initial guess is shown in Figure 3. We reiterate what is stated in Remark 2.1: if, for example, α₁ = α₃ = 1 and α₂ = −1, then the initial guess causes no difficulty, even though the RQ of this vector is much higher than that of the vector formed with α₁ = α₂ = α₃ = 1. For either vector, the subsequent coarse-grid update uses the same subspace to find a set of coefficients that correspond to some new vector of minimal RQ within that subspace.
In the context of multigrid, initial guess development is used in place of pre-relaxation for
the first GES-SA multigrid cycle performed. Subsequent pre-relaxations and post-relaxations are
applied as local subspace relaxation as presented in Section 2.3. We now describe how SA is used
for global subspace updates.

Figure 3. Initial guess for the 1D model problem produced by the initial guess development algorithm;
the RQ has been minimized over each aggregate individually.


2.2. Global coarse-grid RQ minimization


Typically, SA has been used to form intergrid transfer operators within multigrid schemes for
linear systems, as in [6, 7]. Here, we use SA in a similar fashion to form coarse subspaces of lower
dimension that are used to compute iterates with lower RQ.
SA defines a sparse n ×m interpolation operator, P, that maps from a coarse set of m variables
to the original fine set of n variables. Here, we use the same aggregation that was used for initial
guess development in Section 2.1. This is essential for the initial guess to be a suitable one, as
stated in Remark 2.1. Given a current iterate, ṽ, we form a space V that is designed to contain a vector with an RQ less than or equal to that of ṽ. Our construction first forms a tentative interpolation operator, P̂, that has ṽ in its range. This is done in the usual way, by restricting the values of ṽ to individual aggregates according to the sparsity pattern defined by the aggregation matrix, [A]:

    P̂ := diag(ṽ)[A]                                                     (19)

Operator P̂ is such that ṽ ∈ R(P̂). Specifically, ṽ = P̂1_m, where 1_m is the column vector of all ones of length m. This means that we are guaranteed to have a vector within R(P̂) with no larger an RQ than that of ṽ:

    min_{v∈R(P̂)} ρ_{A,B}(v) ≤ ρ_{A,B}(ṽ)                                (20)

Many of the vectors in R(P̂) have high RQ, because the columns of P̂ have local support and are not individually algebraically smooth with respect to relaxation. Therefore, as in standard SA, we apply a polynomial smoothing operator of low degree, S, to P̂, and use the resulting operator, instead of P̂, as the basis for our coarse space. This gives a coarse space with better approximation to the sought eigenvector at a reasonable increase in computational complexity. This smoothing consists of just one application of the error propagation operator of damped Jacobi:

    S := I_n − ω D⁻¹A                                                    (21)

where I_n is the identity operator on Rⁿ and ω = 4/(3‖D⁻¹A‖₂). Normalization of the columns of interpolation is also performed; this does not change the range of interpolation, but does control the scaling of the coarse-grid problems. This scaling is used so that the diagonal entries of coarse-grid matrix A_c are all one. The scaling is done by multiplying with a diagonal matrix, N, whose entries are given by

    N_ii := 1 / ‖S(P̂)_i‖_A                                              (22)

where (P̂)_i is the ith column of P̂. Note that we must assume that ṽ is nonzero on every aggregate. The interpolation matrix is

    P := S P̂ N                                                          (23)
Under this construction, Sṽ is in the range of P. Therefore, if Sṽ has a lower RQ than ṽ, we are guaranteed that a vector in V_c = R(P) improves the RQ of our iterate. The vector of minimal RQ we select from V_c typically has a much lower RQ than Sṽ, due to the localization provided by prolongation.
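
A sketch of the construction (19)-(23), assuming, as in the text, that A has unit diagonal so that D = I (the function name is ours and the code is dense for simplicity):

    import numpy as np

    def smoothed_interp(A, P_hat):
        """P = S * P_hat * N of (23), with S = I - omega*A per (21) and N per (22)."""
        omega = 4.0 / (3.0 * np.linalg.norm(A, 2))   # omega = 4/(3*||D^{-1}A||_2), D = I
        SP = P_hat - omega * (A @ P_hat)             # apply S to each column
        col_Anorms = np.sqrt(np.einsum('ij,ij->j', SP, A @ SP))  # ||S(P-hat)_i||_A
        return SP / col_Anorms                       # scale columns by N_ii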


Note that a choice of ω could be computed to minimize the RQ of Sṽ, a single vector in the range of interpolation. However, this choice of ω may not be best for all other vectors in the range. Therefore, we retain the standard choice of ω known from the literature [15].
The columns of P form a basis for Vc because our construction ensures that there is at least one
point in the support of each column that is not present in any other column. Forming aggregates
that are at least a neighborhood in size and using damped-Jacobi smoothing does not allow
columns to ever share support with an aggregate’s central node. Therefore, under the assumption
ṽ is nonzero on every aggregate, A_c ← PᵀAP and B_c ← PᵀBP are both SPD. In the multigrid vocabulary, restricted problem (14) is now the coarse-grid problem. A coarse-grid update is given by interpolating the solution of the coarse-grid problem: w̃ ← Py₁. This problem, A_c y = λB_c y, is either solved using a standard eigensolver or posed as a coarse-grid minimization problem as in (13), where local and global updates may be applied in a recursive fashion. This process forms the coarse-grid update step of Algorithm 3 of Section 3, the full GES-SA algorithm.
As in linear multigrid, the coarse-grid update needs to be complemented by an appropriately
chosen relaxation process, on which we next focus.

2.3. Local subspace RQ relaxation


In the context of a nonlinear multilevel method, we use subspace minimization updates posed over
locally supported subsets as our relaxation process, which is a form of nonlinear overlapping-block
Gauss–Seidel method for minimizing the RQ. This section explains the specifics for choosing the
nodes that make up each block and presents the relaxation algorithm.
The original generalized eigenvalue problem, Av = λBv, is posed over a set of n nodes, Ω_n. To choose a subspace that provides a local update over a small cluster of m_j nodes, we construct W_j to be a subset of Ω_n with cardinality m_j. Subset W_j should be local and connected within the graph of A. Subspace V_j^ṽ is chosen to be the space of all vectors that differ from a constant multiple of our current approximation, ṽ, only by w, a vector with support in the subset W_j:

    V_j^ṽ := {v ∈ Rⁿ | v = w₀ṽ + w, where w₀ ∈ R and supp(w) ⊂ W_j}      (24)

a subspace of Rⁿ with dimension (m_j + 1), used to form and solve (11) for an updated approximation, w̃, that has minimum RQ within V_j^ṽ. We allow changes to the entries of current iterate ṽ at nodes in set W_j to minimize the RQ, while leaving ṽ unchanged, up to a scaling factor w₀, at nodes outside of W_j.

Remark 2.2
If ṽ has a relatively high RQ, then a vector in V_j^ṽ that has minimal RQ may have w₀ = 0. Essentially, the subspace iteration then throws away all information outside of W_j. This is potentially disastrous for our algorithm because, for typical problems, minimal eigenvectors are globally supported. Avoiding this situation is the primary reason we develop initial guesses with Algorithm 1 instead of randomly. Our current implementation does not update the iterate for subspaces in which w₀ = 0. However, this situation did not occur for the problems presented in the numerical results of Section 4.

We now explain how the subsets W_j are chosen and then describe the iteration procedure. One step of the local subspace relaxation scheme minimizes the approximate eigenvalue locally, over one small portion of the full set of nodes, Ω_n. We utilize a sequence of subsets {W_j}_{j=1}^m ⊂ Ω_n that form an overlapping covering of Ω_n. We then perform local subspace relaxation with each of these subsets in a multiplicative fashion.
Similar to aggregation matrix [A], we represent these subset coverings with a sparse, binary, overlapping subset matrix, [W]. One way to obtain an overlapping subset covering is by dilating aggregates: each aggregate A_j within the aggregation is expanded once with respect to the graph of matrix A_G (Figure 4). Let [A_G] be the n×n binary version of A_G that stores strong connections in the graph of A, defined as

    [A_G]_ij := 1 if (A_G)_ij ≠ 0, and 0 if (A_G)_ij = 0                 (25)

Then define [W] by creating a binary version of the matrix product [A_G][A], a dilation:

    [W]_ij := 1 if ([A_G][A])_ij ≠ 0, and 0 if ([A_G][A])_ij = 0         (26)

Our choice of the overlapping subsets is not limited to this construction; we make this choice for simplicity and convenience.
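
A sketch of the dilation (25)-(26) (ours; we add ones on the diagonal of [A_G] so that each aggregate is contained in its own dilation, which is an assumption of this sketch):

    import numpy as np

    def overlapping_subsets(AG_bin, Agg):
        """[W] of (26): binary pattern of the product [A_G][A]."""
        AGd = AG_bin.copy()
        np.fill_diagonal(AGd, 1)                 # assumed: each node dilates to itself
        return ((AGd @ Agg) != 0).astype(int)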
In practice, each local RQ minimization is accomplished by rewriting minimization problem (11) as a generalized eigenvalue problem of low dimension, as in (14), and solving for minimal eigenvector y₁ with a standard eigensolver. Note that here we use Q_j^ṽ to denote the matrices that span each subspace, V_j^ṽ, to distinguish them from the Q_j used in the initial guess section. We construct an n×(m_j + 1) matrix, Q_j^ṽ, so that its columns are an orthogonal basis for subspace V_j^ṽ. To define Q_j^ṽ explicitly, first define vector v₀ by

    (v₀)_i := ṽ_i if i ∉ W_j, and 0 if i ∈ W_j                           (27)

For each point p ∈ W_j, define canonical basis vectors ê_p. Then we form Q_j^ṽ by appending these (m_j + 1) vectors in a matrix of column vectors:

    Q_j^ṽ = [ v₀ | ê_{p₁} | ⋯ | ê_{p_{m_j}} ]                            (28)

Figure 4. Graph of matrix AG from Example 1, with n = 9, grouped into three overlapping subsets. Each
box encloses a group of nodes in a respective subset.


where the sequence of points, {p_i}_{i=1}^{m_j}, is a list of all points within local subset W_j. This makes the columns of Q_j^ṽ orthogonal, and Q_j^ṽ a matrix that maps from R^{m_j+1} onto V_j^ṽ. For the 1D example, with W₂ = {3, 4, 5, 6, 7}, the operator is given by

             ⎡ v₁               ⎤
             ⎢ v₂               ⎥
             ⎢ 0   1            ⎥
             ⎢ 0      1         ⎥
    Q₂^ṽ =   ⎢ 0         1      ⎥                                        (29)
             ⎢ 0            1   ⎥
             ⎢ 0               1⎥
             ⎢ v₈               ⎥
             ⎣ v₉               ⎦

Next, we compute A_j^ṽ ← (Q_j^ṽ)ᵀ A Q_j^ṽ and B_j^ṽ ← (Q_j^ṽ)ᵀ B Q_j^ṽ. Then (14) is solved with a standard eigensolver for y₁, which is normalized so that (y₁)₁ = 1. This normalization is the same as requiring w₀ = 1, which leaves all nodes outside of W_j unchanged by the update. The updated iterate is then given by w̃ ← Q_j^ṽ y₁.
Local subspace relaxation is summarized in the following algorithm.

Algorithm 2
Local subspace relaxation.
Function: ṽ ← LSR(A, B, ṽ, {W_j}_{j=1}^m).
Input: SPD matrices A and B, current approximation to the minimal eigenvector ṽ, and overlapping subset covering {W_j}_{j=1}^m.
Output: Updated iterate ṽ.
1. For j = 1, ..., m, do the following:
   (a) Form Q_j^ṽ based on ṽ and W_j as in (28).
   (b) Form A_j^ṽ ← (Q_j^ṽ)ᵀ A Q_j^ṽ and B_j^ṽ ← (Q_j^ṽ)ᵀ B Q_j^ṽ.
   (c) Find y₁ by solving (14) via a standard eigensolver.
   (d) If w₀ ≠ 0, normalize y₁ so that (y₁)₁ = 1 and set ṽ ← Q_j^ṽ y₁.
2. Output ṽ.
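
A dense sketch of Algorithm 2 (ours), building the basis Q_j^ṽ of (28)-(29) explicitly for each subset and keeping the update only when w₀ ≠ 0:

    import numpy as np
    from scipy.linalg import eigh

    def local_subspace_relaxation(A, B, v, subsets):
        """Algorithm 2 (LSR): multiplicative RQ minimization over overlapping subsets."""
        n = len(v)
        for nodes in subsets:
            idx = np.asarray(list(nodes))
            Q = np.zeros((n, len(idx) + 1))
            Q[:, 0] = v
            Q[idx, 0] = 0.0                           # v_0 of (27): v zeroed on W_j
            Q[idx, np.arange(1, len(idx) + 1)] = 1.0  # canonical columns e_p
            _, Y = eigh(Q.T @ A @ Q, Q.T @ B @ Q)
            y1 = Y[:, 0]
            if abs(y1[0]) > 1e-12:                    # w_0 != 0: accept the update
                v = Q @ (y1 / y1[0])                  # normalize so (y_1)_1 = 1
        return v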

Figure 5 shows how a single sweep of local subspace relaxation acts on a random initial guess for
the 1D example. Although the guess is never really random in the actual algorithm, due to the initial
guess development, we show this case so it is clear how the algorithm behaves. This algorithm gives
relaxed iterate w̃ local characteristics of the actual minimal eigenvector. For problems with a large
number of nodes, the global characteristics of the iterate are far from those of the actual minimal
eigenvector. This is where the coarse-grid iteration complements local subspace relaxation. When
done in an alternating sequence, as in a standard multigrid method, the complementary processes


Figure 5. A typical local subspace relaxation sweep on a random iterate for the 1D example with n = 9. The top left vector is the initial iterate, ṽ; the top right shows a subspace update on subset W₁; the bottom left shows a subsequent update over W₂; and the bottom right shows the final relaxed iterate w̃ after a subsequent subspace update over W₃.

achieve both local and global characteristics of the approximate minimal eigenvector, forming an
eigensolver. Their explicit use is presented in the following section.

3. GES-SA

Because GES-SA is a multilevel method, we describe it in multilevel notation. Any symbol with subscript l refers to an object on grid l, with l = 1 the finest or original grid and l = L the coarsest. For example, the matrix associated with the problem on level l is denoted by A_l; in particular, A₁ = A, the matrix from our original problem. Interpolation from level l+1 to level l is denoted by P_{l+1}^l instead of P, and restriction from level l to level l+1 is denoted by (P_{l+1}^l)ᵀ. The dimension of A_l is written n_l. Other level-l objects are denoted with a subscript and superscript l, as appropriate.

3.1. The full GES-SA algorithm


GES-SA performs multilevel cycles that are structured in a format similar to standard multigrid.
The very first cycle of the full GES-SA algorithm differs from the subsequent cycles: on each level,
the initial guess development given in Algorithm 1 is used in place of pre-relaxation. Subsequent
cycles use local subspace relaxation given in Algorithm 2. Coarse-grid updates are given by the
process presented in Section 2.2. A typical GES-SA cycling scheme is illustrated in Figure 6.

Figure 6. Diagram of how V-cycles are done in GES-SA for η = 1. We follow the diagram from left to right as the algorithm progresses. Gray dots represent the initial guess development phase of the algorithm, done only in the first cycle. Hollow dots represent solve steps done with a standard eigensolver on the coarsest eigenproblem. Black dots represent local subspace pre- and post-relaxation steps. A dot on top stands for a step on the finest grid and a dot on the bottom stands for a step on the coarsest grid.

Algorithm 3
Generalized eigensolver based on smoothed aggregation.
Function: ṽ_l ← GESSA(A_l, B_l, ν, γ, η, l).
Input: SPD matrices A_l and B_l, number of relaxations to perform ν, number of cycles γ, number of coarse-grid problem iterations η, and current level l.
Output: Approximate minimal eigenvector ṽ_l of the level-l problem.
0. If no aggregation of Ω_{n_l} is provided, compute {A_j^l}_{j=1}^{m_l}. Also, if no overlapping subset covering is provided, compute {W_j^l}_{j=1}^{m_l}. Step 0 is performed only once per level.
1. For c = 1, ..., γ, do the following:
   (a) If c = 1, form an initial guess, ṽ_l ← IGD(A_l, B_l, {A_j^l}_{j=1}^{m_l}). Otherwise, pre-relax the current approximation, ṽ_l ← LSR(A_l, B_l, ṽ_l, ν, {W_j^l}_{j=1}^{m_l}).
   (b) Form P_{l+1}^l with SA, based on ṽ_l and {A_j^l}_{j=1}^{m_l}, as in (23).
   (c) Form matrices A_{l+1} ← (P_{l+1}^l)ᵀ A_l P_{l+1}^l and B_{l+1} ← (P_{l+1}^l)ᵀ B_l P_{l+1}^l.
   (d) If n_{l+1} is small enough, solve (14) for y₁ with a standard eigensolver and set ṽ_{l+1} ← y₁. Else, ṽ_{l+1} ← GESSA(A_{l+1}, B_{l+1}, ν, η, η, l+1).
   (e) Interpolate the coarse-grid minimization, ṽ_l ← P_{l+1}^l ṽ_{l+1}.
   (f) Post-relax the current approximation, ṽ_l ← LSR(A_l, B_l, ṽ_l, ν, {W_j^l}_{j=1}^{m_l}).
2. Output ṽ_l.

3.2. A qualitative comparison with RQMG


The GES-SA algorithm differs from RQMG [13] and algebraic versions of RQMG [14] in three main aspects. First, iterations in RQMG are performed as corrections, whereas iterations in GES-SA are replacements or updates. Second, in terms of cost, cycles of RQMG are cheaper than those of GES-SA: RQMG does not update the multigrid hierarchy with each iteration, as GES-SA does. Also, the version of GES-SA presented here uses block relaxation, compared with the point relaxation used by the RQMG methods in the literature.
Perhaps more significant is that, while the RQMG methods are supplied with a fixed hierarchy of interpolation operators assumed to approximate the minimal eigenvector well, GES-SA starts with no multigrid hierarchy and creates one, changing the entries of the interpolation operators with each cycle. This is similar in spirit to running several initialization setup-phase cycles of the original, relaxation-based αSA. The GES-SA multigrid hierarchy is iteratively improved to have coarsening and approximation properties tailored to the problem at hand.
These differences suggest that GES-SA or a similar adaptive process may also be used to
initialize RQMG by supplying it with an initial hierarchy. Of even greater appeal, RQMG could be used in subsequent cycles to develop several eigenvectors at once, which is currently not a
feature of GES-SA. This would be a useful approach to initialize linear solvers for system problems.
This study does not quantitatively investigate the use of RQMG in the context of an adaptive
process. These possible expansions of the current adaptive methodology are under consideration
for our future research.

3.3. Simple adaptive linear solvers


Our primary purpose is to use GES-SA to create an adaptive linear SA solver for the problem
Ax = b. We first consider problems that require only one near-kernel vector for a successful solver.
Applications of GES-SA are repeated until the RQ improvement slows. This gives an approximate
minimal eigenvector, ṽ. Then, the setup phase of SA is run to form a solver that accurately
represents K = {ṽ}. This solver is tested on the homogeneous problem, Ax = 0. Section 4 presents
results only for such one-vector solvers.
However, if the current one-vector solver is not adequate, then we must develop a vector that represents error that is algebraically smooth with respect to this solver. Currently, our approach is to use the general setup phase of αSA in [7] to develop a secondary component, k₂. (A study regarding RQ optimization approaches for computing these secondary components is underway.) Then the setup phase of SA is run to form a solver that accurately represents K = {ṽ, k₂}. The updated solver is again tested on the homogeneous problem. If the updated solver is also inadequate, the αSA process can be repeated until an adequate solver is built.

4. NUMERICAL RESULTS

Many linear systems that come from the discretization of scalar partial differential equations (PDEs) are solved efficiently with SA using the vector of all ones as the near-kernel, and the resulting linear solver has decent convergence rates. However, we present examples of matrices where the vector of all ones is not a near-kernel component, and using it as one within SA may not produce a linear solver with acceptable convergence rates.
All the results in this section were obtained by running one GES-SA V-cycle (γ = 1, η = 1) with ν = 2 post-relaxation steps. Our implementation of GES-SA is currently in MATLAB, so we make no rigorous timing comparisons with competing eigensolvers; we intend to explore these details in further investigations. The small eigenproblems involved in GES-SA were all solved using the eigs() function with flags set for real symmetric matrices, which implements ARPACK [17] routines. No 2D problem used more than five iterations to solve the small eigenproblems; no 3D problem used more than 10.

Example 2
We present the random-signed discrete Laplacian. Consider the d-dimensional Poisson problem with Dirichlet boundary conditions:

    −Δu = f  in Ω = (0, 1)^d
      u = 0  on ∂Ω                                                       (30)

We discretize (30) both with finite element spaces with nodal bases and with second-order finite differences on equidistant rectilinear grids.


Either way we discretize the problem, we obtain a sparse n×n matrix Â. We then define the diagonal, random-signed matrix D± to have randomly assigned entries of +1 and −1 on its diagonal. Finally, we form the random-signed discrete Laplacian matrix A by

    A ← D± Â D±                                                          (31)

For Example 2, we also symmetrically scale the matrix A to have ones on its diagonal.
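
A sketch of construction (31) plus the diagonal rescaling (ours; dense arrays for simplicity):

    import numpy as np

    def random_signed_laplacian(A_hat, seed=0):
        """A = D± Â D± of (31), then symmetric scaling to unit diagonal."""
        rng = np.random.default_rng(seed)
        s = rng.choice([-1.0, 1.0], size=A_hat.shape[0])   # diagonal of D±
        A = (s[:, None] * A_hat) * s[None, :]              # D± Â D±
        d = 1.0 / np.sqrt(np.diag(A))                      # symmetric Jacobi scaling
        return (d[:, None] * A) * d[None, :]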
Now consider solving Ax = b for a given vector b. Note that the vector of all ones is not algebraically smooth with respect to standard relaxation methods. As shown in Table I, using the vector of all ones produces SA solvers that have unacceptable convergence factors for these problems. Instead, we use one GES-SA cycle to produce an approximate minimal eigenvector, ṽ, and use K = {ṽ} in the setup phase of SA to produce a linear SA solver. The convergence factors of the resulting solver are comparable to those obtained using the actual minimal eigenvector to build the linear SA solver. Note that convergence factors are reported as estimates of the asymptotic convergence factor, computed as the geometric average over the last 5 of 25 linear SA V(2,2)-cycles,

    asymptotic convergence factor ≈ ( ‖e⁽²⁵⁾‖_A / ‖e⁽²⁰⁾‖_A )^{1/5}       (32)

for the homogeneous problem, Ax = 0, starting from a random initial guess. Operator complexity is also reported for the linear solver that uses the vector developed with GES-SA. We use

Table I. Asymptotic convergence factors for the 2D and 3D finite difference (FD) and finite element (FE) versions of the random-signed Laplacian problem.

    Problem   Size     Levels   Ones    GES-SA   Eigen   Comp
    2D, FE    81       2        0.620   0.074    0.074   1.078
              729      3        0.892   0.176    0.179   1.108
              6561     4        0.965   0.193    0.196   1.119
              59 049   5        0.977   0.215    0.214   1.123
    2D, FD    81       2        0.849   0.219    0.219   1.317
              729      3        0.947   0.294    0.290   1.357
              6561     4        0.962   0.306    0.305   1.348
              59 049   5        0.978   0.312    0.312   1.342
    3D, FE    729      2        0.598   0.114    0.111   1.054
              19 683   3        0.934   0.188    0.189   1.112
    3D, FD    729      2        0.825   0.289    0.292   1.389
              19 683   3        0.944   0.360    0.358   1.495
              64 000   4        0.961   0.418    0.413   1.511

Factors in the column labeled 'ones' correspond to solvers created using the vector of all ones; factors in the 'GES-SA' column correspond to solvers that use our approximate minimal eigenvector computed with GES-SA; and factors in the 'eigen' column correspond to solvers that use the actual minimal eigenvector. The last column, 'comp', shows the operator complexity for all three types of solvers.


the usual definition of operator complexity,

    comp = ( Σ_{l=1}^L nz(A_l) ) / nz(A₁)                                (33)

where nz(M) is the number of nonzeros in sparse matrix M.
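
Both quantities are simple to measure. The sketch below (ours) estimates (32) by applying a given solver cycle to the homogeneous problem, and computes (33) from a list of level matrices; for SciPy sparse matrices, replace np.count_nonzero(Al) with Al.nnz.

    import numpy as np

    def convergence_factor(apply_cycle, A, n, cycles=25, tail=5, seed=1):
        """Estimate (32): geometric average of A-norm error reduction on Ax = 0."""
        e = np.random.default_rng(seed).standard_normal(n)
        norms = []
        for _ in range(cycles):
            e = apply_cycle(e)                  # one linear SA V(2,2)-cycle on the error
            norms.append(np.sqrt(e @ (A @ e)))  # ||e||_A
        return (norms[-1] / norms[-1 - tail]) ** (1.0 / tail)

    def operator_complexity(levels):
        """Operator complexity (33) for dense level matrices A_1, ..., A_L."""
        return sum(np.count_nonzero(Al) for Al in levels) / np.count_nonzero(levels[0])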


Both geometric and algebraic aggregations were used in our tests. For the finite element problems, we took advantage of knowing the geometry of the grid and formed aggregates that were blocks of 3^d nodes. For the finite difference problems, no geometric information was employed and aggregation was done algebraically, as in [6]. Small 2D examples of the difference between the two types of aggregation we used are shown in Figure 7. Algebraic aggregations were based on the strength-of-connection measure given in (3), with θ = 0.1.
Although it is not the primary purpose of this study, it is also interesting to view GES-SA as a stand-alone eigensolver. For the random-signed Laplacians, Table II shows that one GES-SA V-cycle with ν = 2 produces an approximate minimal eigenvector that is very close to the actual minimal eigenvector, in the sense that the relative error between its RQ and the minimal eigenvalue is small.
All the results in this section are produced using only one GES-SA cycle. However, we do not
believe that a decent approximate minimal eigenvector can be produced with one GES-SA cycle
for general problems. Note that the relative error of one cycle tends to increase as h decreases, or
as the discretization error decreases. For most problems, we anticipate having to do more GES-SA
cycles to achieve an acceptable approximate minimal eigenvector.

Example 3
We also investigate GES-SA on 'shifted' Laplacian, or Helmholtz, problems to show the invariance of performance with respect to such shifts. Consider the d-dimensional Poisson problem with Dirichlet boundary conditions, shifted by a parameter λ_s > 0:

    −Δu − λ_s u = f  in Ω = (0, 1)^d
              u = 0  on ∂Ω                                               (34)

Figure 7. Aggregation examples displayed for 2D test problems of low dimension. On the left is an aggregation formed with the geometric aggregation method used for the finite element problems; on the right is an aggregation formed with the algebraic aggregation method used for the finite difference problems. Black edges represent strong connections within the graph of matrix A_G; each gray box represents a separate aggregate that contains the nodes enclosed.


Table II. Relative errors between the RQ of the GES-SA approximate minimal eigenvector, ρ, and the minimal eigenvalue, λ₁, for 2D and 3D finite element and finite difference versions of Example 2.

    Problem   Size     Levels   ρ           λ₁          Relative error
    2D, FE    81       2        7.222e−02   7.222e−02   0.0000034
              729      3        9.413e−03   9.412e−03   0.0001608
              6561     4        1.101e−03   1.100e−03   0.0002491
              59 049   5        1.243e−04   1.243e−04   0.0001224
    2D, FD    81       2        4.895e−02   4.894e−02   0.0000582
              729      3        6.307e−03   6.288e−03   0.0031257
              6561     4        7.501e−04   7.338e−04   0.0222547
              59 049   5        9.306e−05   8.289e−05   0.1227465
    3D, FE    729      2        1.066e−01   1.066e−01   0.0000017
              19 683   3        1.412e−02   1.409e−02   0.0022805
    3D, FD    729      2        4.896e−02   4.894e−02   0.0003230
              19 683   3        6.303e−03   6.288e−03   0.0024756
              64 000   4        2.981e−03   2.934e−03   0.0158771

Here, λ_s is chosen to make the continuous problem nearly singular. The minimal eigenvalue of the Laplacian operator on (0, 1)^d is dπ². Therefore, setting

    λ_s = (1 − 10⁻ˢ) dπ²                                                 (35)

for an integer s > 0 makes the shifted operator (−Δ − λ_s) have a minimal eigenvalue of λ₁ = 10⁻ˢ dπ². Here, we consider the d = 2 and 3 cases for various shifts λ_s. We discretized the 2D case with nodal bilinear functions on square elements, with h = 1/244. This gave us a system with n = 59 049 degrees of freedom. All aggregations in these tests were geometric, and aggregate diameters were never greater than 3. For each shift, the solvers we developed (using both GES-SA and the actual minimal eigenvector) have operator complexity 1.119 and five levels with 59 049, 6561, 729, 81, and 9 degrees of freedom on the respective levels. Similarly, the 3D case was discretized with nodal trilinear functions on cube elements, with h = 1/37. This gave us a system with n = 46 656 degrees of freedom. Again, for each shift, the solvers have operator complexity 1.033 and four levels with 46 656, 1728, 64, and 8 degrees of freedom on the respective levels. In either case, the minimal eigenvalue of the discretized matrix A is λ₁ ≈ 10⁻ˢ dπ² h^d.
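
Assembling the shifted discrete operator is a one-liner once stiffness and mass matrices are available. The sketch below is ours and assumes K and M come from a nodal FE discretization on (0, 1)^d:

    import numpy as np

    def shifted_system(K, M, s, d):
        """A = K - lambda_s * M per (34)-(35), with lambda_s = (1 - 10^{-s}) d pi^2."""
        lam_s = (1.0 - 10.0 ** (-s)) * d * np.pi ** 2
        return K - lam_s * M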

For all cases, we produced two SA solvers: the first was based on the actual minimal eigenvector of A and the second on the approximation to the minimal eigenvector created by one cycle of GES-SA. In Table III, we show asymptotic convergence factors (32) for these solvers in 2D and 3D and for specific shift parameters.
We assume that prolongation P from the first coarse grid to the fine grid satisfies the weak approximation property with constant

    C := sup_{u∈Rⁿ} min_{v∈R^{n_c}} ( ‖u − Pv‖₂² ‖A‖₂ ) / (Au, u)         (36)


Table III. Asymptotic convergence factors and measures of approximation for Example 3.

                           s = 1      s = 2      s = 3      s = 4      s = 5
    2D, FE     λ₁         3.32e−05   3.32e−06   3.36e−07   3.77e−08   7.90e−09
    (n =       ρ          3.32e−05   3.37e−06   3.88e−07   9.11e−08   6.03e−08
    59 049)    eigen      0.196      0.198      0.198      0.199      0.197
               GES-SA     0.197      0.197      0.196      0.199      0.430
               M₁(P)      1.14e−05   1.13e−04   1.11e−03   1.01e−02   4.83e−02
               M₂(P)      9.45e−11   9.37e−11   9.36e−11   9.54e−11   9.54e−11
    3D, FE     λ₁         5.86e−05   6.17e−06   9.32e−07   4.08e−07   3.56e−07
    (n =       ρ          5.88e−05   6.30e−06   1.06e−06   5.40e−07   4.86e−07
    46 656)    eigen      0.187      0.187      0.190      0.188      0.183
               GES-SA     0.188      0.185      0.188      0.187      0.185
               M₁(P)      7.07e−05   6.67e−04   4.43e−03   1.04e−02   1.18e−02
               M₂(P)      3.85e−08   3.83e−08   3.84e−08   3.94e−08   3.91e−08

The column headings give the shift-size parameter s, with λ_s as in (35). The first block row is for the 2D problem, the second for the 3D problem. Rows labeled 'λ₁' show the minimal eigenvalue of the specific discrete problem, and rows labeled 'ρ' show the RQs of the GES-SA vectors. Rows labeled 'eigen' show convergence factors for solvers based on the actual minimal eigenvector. Rows labeled 'GES-SA' show convergence factors for solvers based on the approximation to the minimal eigenvector given by one GES-SA cycle. The measures of approximation, M₁(P) and M₂(P), are in the rows with the respective labels.

Based on the knowledge that A comes from a scalar PDE, we further assume that it is most essential to approximate a minimal eigenvector, u₁. The denominator, (Au, u), is smallest for this vector, and other vectors with comparable denominators are locally well represented by u₁. Under these assumptions, we find it insightful to monitor the following measure of approximation for any P that we develop:

    M₁(P) := min_{v∈R^{n_c}} ( ‖u₁ − Pv‖₂² ‖A‖₂ ) / (Au₁, u₁)             (37)

where u₁ is the minimal eigenvector of A. Note that this is a lower bound: M₁(P) ≤ C. We compute min_{v∈R^{n_c}} ‖u₁ − Pv‖₂ by directly projecting u₁ onto the range of P, a computationally costly operation that is merely a tool for analyzing test problems. Table III reports M₁(P) on the finest grid for the P developed using the GES-SA method. As s increases and the problem becomes more ill-conditioned, we see an increase in M₁(P) and, eventually, a degradation of the convergence factors for the 2D linear solvers that GES-SA produced.
We wish to investigate whether the degradation of the 2D GES-SA solver is due to GES-SA performing worse on the more ill-conditioned problems, or to the approximation requirements becoming stricter. To this purpose, we monitor a second measure of approximation:

    M₂(P) := min_{v∈R^{n_c}} ‖u₁ − Pv‖₂² / ‖u₁‖₂²                          (38)

Again, this measure is shown in Table III for each problem. As s increases, we see that M₂(P) is essentially constant for the linear solvers that GES-SA produced with fixed computation, indicating that the degradation is due only to the approximation requirements becoming stricter.
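
Both measures reduce to a least-squares projection of u₁ onto R(P). A dense sketch (ours):

    import numpy as np

    def approximation_measures(A, P, u1):
        """M1 of (37) and M2 of (38): project u1 onto range(P) by least squares."""
        v_star = np.linalg.lstsq(P, u1, rcond=None)[0]   # argmin_v ||u1 - P v||_2
        r2 = np.linalg.norm(u1 - P @ v_star) ** 2
        M1 = r2 * np.linalg.norm(A, 2) / (u1 @ (A @ u1))
        M2 = r2 / (u1 @ u1)
        return M1, M2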


5. CONCLUSION

This paper develops a multilevel eigensolver, GES-SA, within the SA framework for the specific application of enhancing the robustness of current adaptive linear SA solvers. We show preliminary numerical results that support approximate eigensolvers as potentially useful for initialization within the adaptive AMG process. This paper serves as a proof of concept; due to our high-level implementation, we make no claims about the efficiency of this algorithm versus the purely relaxation-based initialization given in [7]. This question will be investigated as we begin incorporating eigensolvers into our low-level adaptive software.

ACKNOWLEDGEMENTS
The work of the last author was performed under the auspices of the U.S. Department of Energy by the
University of California Lawrence Livermore National Laboratory under contract W-7405-Eng-48.

REFERENCES
1. Brandt A. Algebraic multigrid theory: the symmetric case. Applied Mathematics and Computation 1986; 9:23–26.
2. Brandt A, McCormick S, Ruge J. Algebraic multigrid (AMG) for sparse matrix equations. In Sparsity and its
Applications, Evans DJ (ed.). Cambridge University Press: Cambridge, U.K., 1984.
3. Briggs W, Henson VE, McCormick SF. A Multigrid Tutorial (2nd edn). SIAM: Philadelphia, PA, 2000.
4. Ruge J, Stüben K. Algebraic multigrid (AMG). In Multigrid Methods, vol. 5, McComrick SF (ed.). SIAM:
Philadelphia, PA, 1986.
5. Trottenberg U, Osterlee CW, Schuller A (Appendix by K. Stuben). Multigrid (Appendix A: An Introduction to
Algebraic Multigrid). Academic Press: New York, 2000.
6. Vaněk P, Mandel J, Brezina M. Algebraic multigrid by smoothed aggregation for second and fourth order elliptic
problems. Computing 1996; 56:179–196.
7. Brezina M, Falgout R, MacLachlan S, Manteuffel T, McCormick S, Ruge J. Adaptive smoothed aggregation
(SA). SIAM Journal on Scientific Computing 2004; 25:1896–1920.
8. McCormick SF, Ruge J. Multigrid methods for variational problems. SIAM Journal on Numerical Analysis 1982;
19:925–929.
9. Brezina M. Robust iterative methods on unstructured meshes. Ph.D. Thesis, University of Colorado, Denver, CO,
1997.
10. Ruge J. Multigrid methods for variational and differential eigenvalue problems and unigrid for multigrid simulation.
Ph.D. Thesis, Colorado State University, Fort Collins, CO, 1981.
11. Hetmaniuk U, Lehoucq RB. Multilevel methods for eigenspace computations in structural dynamics. Domain
Decomposition Methods in Science and Engineering, Lecture Notes in Computational Science and Engineering,
vol. 55. Springer: Berlin, 2007; 103–114.
12. Neymeyr K. Solving mesh eigenproblems with multigrid efficiency. In Numerical Methods for Scientific
Computing, Variational Problems and Applications, Kuznetsoz Y, Neittaanmäki P, Pironneau O (eds). Wiley:
Chichester, U.K., 2003.
13. Cai Z, Mandel J, McCormick SF. Multigrid methods for nearly singular linear equations and eigenvalue problems.
SIAM Journal on Numerical Analysis 1997; 34:178–200.
14. Hetmaniuk U. A Rayleigh quotient minimization algorithm based on algebraic multigrid. Numerical Linear
Algebra with Applications 2007; 14:563–580.
15. Vaněk P, Brezina M, Mandel J. Convergence of algebraic multigrid based on smoothed aggregation. Numerische
Mathematik 2001; 88:559–579.
16. Chan TF, Sharapov I. Subspace correction multi-level methods for elliptic eigenvalue problems. Numerical Linear
Algebra with Applications 2002; 9:1–20.
17. Lehoucq RB, Sorensen DC, Yang C. ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems
with Implicitly Restarted Arnoldi Methods. SIAM: Philadelphia, PA, 1998.
NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:271–289
Published online 7 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.566

Domain decomposition preconditioners for elliptic equations with jump coefficients

Yunrong Zhu∗, †
Department of Mathematics, Pennsylvania State University, University Park, PA 16802, U.S.A.

SUMMARY
This paper provides a proof of the robustness of the overlapping domain decomposition preconditioners
for the linear finite element approximation of second-order elliptic boundary value problems with strongly
discontinuous coefficients. By analyzing the eigenvalue distribution of the domain decomposition precon-
ditioned system, we prove that only a small number of eigenvalues may deteriorate with respect to the
discontinuous jump or mesh size, and all the other eigenvalues are bounded below and above nearly
uniformly with respect to the jump and mesh size. As a result, we prove that the convergence rate of the
preconditioned conjugate gradient methods is nearly uniform with respect to the large jump and mesh
size. Copyright © 2008 John Wiley & Sons, Ltd.

Received 19 May 2007; Accepted 1 November 2007

KEY WORDS: jump coefficients; domain decomposition; conjugate gradient; effective condition number

1. INTRODUCTION

In this paper, we will discuss the overlapping domain decomposition preconditioned conjugate
gradient (PCG) methods for the linear finite element approximation of the second-order elliptic

∗ Correspondence to: Yunrong Zhu, Department of Mathematics, Pennsylvania State University, University Park, PA
16802, U.S.A.

E-mail: zhu_y@math.psu.edu, yrzhu@psu.edu

Contract/grant sponsor: NSF; contract/grant number: DMS-0609727


Contract/grant sponsor: NSFC; contract/grant number: 10528102
Contract/grant sponsor: Center for Computational Mathematics and Applications


boundary value problem

$$-\nabla\cdot(\omega\nabla u) = f \quad \text{in } \Omega, \qquad u = g_D \quad \text{on } \Gamma_D, \qquad \omega\frac{\partial u}{\partial n} = g_N \quad \text{on } \Gamma_N \qquad (1)$$

where $\Omega\subset\mathbb{R}^d$ ($d = 2$ or $3$) is a polygonal or polyhedral domain with Dirichlet boundary $\Gamma_D$ and Neumann boundary $\Gamma_N$. The coefficient $\omega = \omega(x)$ is a positive, piecewise constant function. More precisely, we assume that there are $M$ open, disjoint polygonal or polyhedral subregions $\Omega^0_m$ ($m = 1,\ldots,M$) satisfying $\bigcup_{m=1}^M \overline{\Omega^0_m} = \overline\Omega$ with

$$\omega|_{\Omega^0_m} = \omega_m, \quad m = 1,\ldots,M$$

where each $\omega_m > 0$ is a constant. The analysis carries through to the more general case in which $\omega(x)$ varies moderately within each subregion.
We assume that the subregions $\{\Omega^0_m : m = 1,\ldots,M\}$ are given and fixed but may have complicated geometry. We are concerned with the robustness of the PCG method with respect to both the fineness of the discretization of the overall problem and the severity of the discontinuities in $\omega$. This model problem is relevant to many applications, such as groundwater flow [1, 2], fluid pressure prediction [3], electromagnetics [4], semiconductor modeling [5], electrical power network modeling [6] and fuel cell modeling [7, 8], where the coefficients have large discontinuities across interfaces between regions with different material properties.
When this problem is discretized by the finite element method, the conditioning of the resulting discrete system depends on both the (discontinuous) coefficients and the mesh size. There has been much interest in developing iterative methods (such as domain decomposition and multigrid methods) whose convergence rates are robust with respect to changes in both the jump size and the mesh size (see [9–14] and the references cited therein). In two dimensions, it is not too difficult to see that both domain decomposition [15–18] and multigrid [14, 19, 20] methods lead to robust iterative methods. In three dimensions, some nonoverlapping domain decomposition methods have been shown to be robust with respect to both the jump size and the mesh size (see [12, 14, 21, 22]). As was pointed out in [20, Remark 6.3], in some circumstances the deterioration is not significantly severe. In fact, using the estimates related to the weighted $L^2$-projection in [23], it can be proved that $\kappa(BA)\le C|\log H|$ in some cases for $d = 3$, where $H$ is the mesh size of the coarse space. For example, if the interface has no cross points, if every subdomain touches part of the Dirichlet boundary [23–25], or if the coefficient distribution satisfies quasi-monotonicity (cf. [26, 27]), then the multilevel or domain decomposition method was proved to be robust. However, in general, the situation for overlapping domain decomposition and multilevel methods is still unclear. Technically, the difficulty is due to the lack of uniform, or nearly uniform, error and stability estimates for the weighted $L^2$-projection, as demonstrated in [24, 28].
Recently [29, 30], we proved that both the BPX and the multigrid V-cycle preconditioners lead to nearly uniformly convergent PCG methods for the finite element approximation of (1), although the resulting condition numbers can deteriorate severely, as mentioned above. Our work was motivated by the work of Graham and Hagger [31]. In their work, they proved that a simple diagonal scaling leads to a preconditioned system that has only a fixed number of
small eigenvalues, which are severely affected by the discontinuous jumps. More precisely, they proved that the ratio of the extreme values of the remaining eigenvalues, the effective condition number (cf. [30]), can be bounded by $Ch^{-2}$, where $C$ is a constant independent of the coefficients and the mesh size.
The aim of this paper is to provide a rigorous proof of the robustness of the overlapping domain decomposition preconditioners. As in [30], the main idea is to analyze the eigenvalue distribution of the preconditioned systems and to prove that, except for a few 'bad' eigenvalues, the effective condition numbers are bounded uniformly with respect to the jump and logarithmically with respect to the mesh size. Thanks to a standard theory for the conjugate gradient method (see [31–33]), these small eigenvalues do not deteriorate the efficiency of the method significantly. More specifically, the asymptotic convergence rate of the PCG method is $1-2/(C\sqrt{|\log H|}+1)$, which is uniform with respect to the size of the discontinuous jump. When $d = 3$, if each subregion $\Omega^0_m$ ($m = 1,\ldots,M$) is assumed to be a polyhedral domain with each edge of length of size $H_0$, then the effective condition number of $BA$ can be bounded by $C(1+\log(H_0/H))$. Consequently, the asymptotic convergence rate of the corresponding PCG algorithm is $1-2/(C\sqrt{1+\log(H_0/H)}+1)$. In particular, if the coarse grid satisfies $H\eqsim H_0$, then the asymptotic convergence rate of the PCG algorithm is bounded uniformly.
The rest of the paper is organized as follows. In Section 2, we introduce some basic notation, the PCG algorithm and some theoretical foundations. In Section 3, we quote some main results on the weighted $L^2$-projection from [23]. We also consider the approximation property and stability of the weighted $L^2$-projection in some special cases mentioned above. In Section 4, we analyze the eigenvalue distribution of the domain decomposition preconditioned system and prove the convergence rate of the PCG algorithm. In Section 5, we give some concluding remarks.
Following [20], we use the following shorthand notation: $x\lesssim y$ means $x\le Cy$; $x\gtrsim y$ means $x\ge cy$; and $x\eqsim y$ means $cx\le y\le Cx$, where $c$ and $C$ are generic positive constants independent of the variables in the inequalities and of any other parameters related to the mesh, the spaces and, especially, the coefficients.

2. PRELIMINARY

2.1. Notation
We introduce the bilinear form

$$a(u,v) = \sum_{m=1}^M \omega_m(\nabla u, \nabla v)_{L^2(\Omega^0_m)} \quad \forall u, v \in H^1_D(\Omega)$$

where $H^1_D(\Omega) = \{v\in H^1(\Omega): v|_{\Gamma_D} = 0\}$, and introduce the $H^1$-norm and seminorm with respect to any subregion $\Omega^0_m$ by

$$|u|_{1,\Omega^0_m} = \|\nabla u\|_{0,\Omega^0_m}, \qquad \|u\|_{1,\Omega^0_m} = \left(\|u\|_{0,\Omega^0_m}^2 + |u|_{1,\Omega^0_m}^2\right)^{1/2}$$

Thus,

$$a(u,u) = \sum_{m=1}^M \omega_m |u|_{1,\Omega^0_m}^2 =: |u|_{1,\omega}^2$$


We also need the weighted $L^2$-inner product

$$(u,v)_{0,\omega} = \sum_{m=1}^M \omega_m(u,v)_{L^2(\Omega^0_m)}$$

and the weighted $L^2$- and $H^1$-norms

$$\|u\|_{0,\omega} = (u,u)_{0,\omega}^{1/2}, \qquad \|u\|_{1,\omega} = \left(\|u\|_{0,\omega}^2 + |u|_{1,\omega}^2\right)^{1/2}$$

For any subset $O\subset\Omega$, we denote by $|u|_{1,\omega,O}$ and $\|u\|_{0,\omega,O}$ the restrictions of $|u|_{1,\omega}$ and $\|u\|_{0,\omega}$ to the subset $O$, respectively.
For the distribution of the coefficients, we introduce the index set

$$\mathcal{I} = \{m : \mathrm{meas}(\partial\Omega^0_m\cap\Gamma_D) = 0\}$$

where $\mathrm{meas}(\cdot)$ is the $(d-1)$-dimensional measure; in other words, $\mathcal{I}$ is the index set of all subregions that do not touch the Dirichlet boundary. We assume that the cardinality of $\mathcal{I}$ is $m_0$. We emphasize that $m_0$ is a constant that depends only on the distribution of the coefficients.

2.2. The discrete system


Given a quasi-uniform triangulation $T_h$ with mesh size $h$, let

$$V_h = \{v\in H^1_D(\Omega) : v|_\tau \in P_1(\tau), \ \forall \tau\in T_h\}$$

be the piecewise linear finite element space, where $P_1$ denotes the set of linear polynomials. The finite element approximation of (1) is the function $u\in V_h$ such that

$$a(u,v) = (f,v) + \int_{\Gamma_N} g_N v \quad \forall v\in V_h$$

We define a linear symmetric positive definite (SPD) operator $A : V_h\to V_h$ by

$$(Au, v)_{0,\omega} = a(u,v)$$

The related inner product and the induced energy norm are denoted by

$$(\cdot,\cdot)_A := a(\cdot,\cdot), \qquad \|\cdot\|_A := \sqrt{a(\cdot,\cdot)}$$

Then we have the following operator equation:

$$Au = F \qquad (2)$$

where $F\in L^2(\Omega)$ is such that $(F,v)_{0,\omega} = (f,v) + \int_{\Gamma_N} g_N v$, $\forall v\in V_h$. The space $V_h$ has a natural nodal basis $\{\phi_i\}_{i=1}^n$ such that $\phi_i(x_j) = \delta_{ij}$ for each non-Dirichlet boundary node $x_j$. By means of these nodal basis functions, (2) can be reduced to the following linear algebraic equation:

$$\mathbf{A}\mu = b \qquad (3)$$

where $\mathbf{A} = (a_{ij})_{n\times n}$, with $a_{ij} = a(\phi_i,\phi_j) = \int_\Omega \omega\nabla\phi_i\cdot\nabla\phi_j$, is the stiffness matrix and $b = (b_1,\ldots,b_n)\in\mathbb{R}^n$ is such that $b_i = (f,\phi_i) + \int_{\Gamma_N} g_N\phi_i$. In this algebraic form, we shall also need the discrete weighted $\ell^2$ inner product corresponding to the weighted $L^2$-inner product. Let $\mu,\nu\in\mathbb{R}^n$ be the vector representations of $u, v\in V_h$, respectively, i.e. $u = \sum_{i=1}^n\mu_i\phi_i$ and $v = \sum_{i=1}^n\nu_i\phi_i$. Define

$$(\mu,\nu)_{\ell^2_\omega} = \sum_{i=1}^n \bar\omega_i\mu_i\nu_i$$

where $\bar\omega_j = \int_{\sigma_j}\omega/|\sigma_j|$ is the average of the coefficient $\omega$ on the local patch $\sigma_j = \mathrm{supp}(\phi_j)$. By definition and quasi-uniformity, we can easily see that

$$h^d(\mu,\mu)_{\ell^2_\omega} \eqsim \|u\|_{0,\omega}^2$$

Let $\kappa(A)$ be the condition number of $A$, i.e. the ratio between the largest and the smallest eigenvalues. By standard finite element theory (cf. [14]), it is apparent that

$$\kappa(A) = \kappa(\mathbf{A}) \lesssim h^{-2}J(\omega) \quad\text{with}\quad J(\omega) = \frac{\max_m\omega_m}{\min_m\omega_m}$$
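The joint dependence on $h$ and $J(\omega)$ is easy to observe numerically. The following sketch (a hypothetical 1D analogue, not taken from the paper) assembles the piecewise linear stiffness matrix on a uniform mesh with a two-region coefficient and reports the condition number:

```python
import numpy as np

def stiffness_1d(n, omega):
    """P1 stiffness matrix on [0,1] with n elements and homogeneous Dirichlet
    boundary conditions; omega[k] is the constant coefficient on element k."""
    h = 1.0 / n
    A = np.zeros((n - 1, n - 1))
    for k in range(n):                       # loop over elements
        for i in (k - 1, k):                 # interior-node indices of element k
            for j in (k - 1, k):
                if 0 <= i < n - 1 and 0 <= j < n - 1:
                    A[i, j] += omega[k] * (1 if i == j else -1) / h
    return A

for n in (16, 64):
    omega = np.where(np.arange(n) < n // 2, 1.0, 1e4)   # jump J(omega) = 1e4
    A = stiffness_1d(n, omega)
    print(n, np.linalg.cond(A))   # grows with both h^{-2} and the jump
```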

2.3. PCG methods


The well-known conjugate gradient method is the basis of all the preconditioning techniques to be studied in this paper. The PCG methods can be viewed as a conjugate gradient method applied to the preconditioned system

$$BAu = BF$$

Here, $B$ is an SPD operator, known as a preconditioner of $A$. Note that $BA$ is symmetric with respect to the inner product $(\cdot,\cdot)_{B^{-1}}$ (or $(\cdot,\cdot)_A$). For the implementation of the PCG algorithm, we refer to the monographs [34–36].
Let $u^k$, $k = 0, 1, 2,\ldots,$ be the solution sequence of the PCG algorithm. It is well known that

$$\|u-u^k\|_A \le 2\left(\frac{\sqrt{\kappa(BA)}-1}{\sqrt{\kappa(BA)}+1}\right)^k \|u-u^0\|_A \qquad (4)$$

which implies that the PCG method generally converges faster with a smaller condition number.
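For reference, a generic PCG iteration in the form used throughout this paper (a textbook sketch; `apply_B` stands for the action of any SPD preconditioner $B$, not the specific one analyzed here) is:

```python
import numpy as np

def pcg(A, b, apply_B, tol=1e-8, maxit=500):
    """PCG for an SPD matrix A with an SPD preconditioner given by apply_B."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_B(r)
    p = z.copy()
    rz = r @ z
    for k in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        z = apply_B(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p      # update search direction
        rz = rz_new
    return x, maxit
```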
Even though the estimate given in (4) is sufficient for many applications, in general it is not sharp. One way to improve the estimate is to look at the eigenvalue distribution of $BA$ (see [31–33, 37] for more details). More specifically, suppose that we can divide $\sigma(BA)$, the spectrum of $BA$, into two sets, $\sigma_0(BA)$ and $\sigma_1(BA)$, where $\sigma_0$ consists of all 'bad' eigenvalues and the remaining eigenvalues in $\sigma_1$ are bounded above and below; then we have the following theorem.

Theorem 2.1 (Axelsson [32] and Xu [33])
Suppose that $\sigma(BA) = \sigma_0(BA)\cup\sigma_1(BA)$ such that there are $m$ elements in $\sigma_0(BA)$ and $\lambda\in[a,b]$ for each $\lambda\in\sigma_1(BA)$. Then

$$\|u-u^k\|_A \le 2K\left(\frac{\sqrt{b/a}-1}{\sqrt{b/a}+1}\right)^{k-m}\|u-u^0\|_A \qquad (5)$$

where

$$K = \max_{\lambda\in\sigma_1(BA)}\ \prod_{\mu\in\sigma_0(BA)}\left|1-\frac{\lambda}{\mu}\right|$$

If there are only $m$ small eigenvalues in $\sigma_0$, say

$$0 < \lambda_1\le\lambda_2\le\cdots\le\lambda_m \ll \lambda_{m+1}\le\cdots\le\lambda_n$$

then

$$K = \prod_{i=1}^m\left(\frac{\lambda_n}{\lambda_i}-1\right) \le \left(\frac{\lambda_n}{\lambda_1}-1\right)^m = (\kappa(BA)-1)^m$$

In this case, the convergence rate estimate (5) becomes

$$\frac{\|u-u^k\|_A}{\|u-u^0\|_A} \le 2(\kappa(BA)-1)^m\left(\frac{\sqrt{b/a}-1}{\sqrt{b/a}+1}\right)^{k-m} \qquad (6)$$

Based on (6), given a tolerance $0<\epsilon<1$, the number of iterations of the PCG algorithm needed for $\|u-u^k\|_A/\|u-u^0\|_A<\epsilon$ is given by

$$k \gtrsim m + \left(\log\frac{2}{\epsilon} + m\log(\kappa(BA)-1)\right)c_0 \qquad (7)$$

where $c_0 = \left[\log\left(\frac{\sqrt{b/a}+1}{\sqrt{b/a}-1}\right)\right]^{-1}$. More detailed discussions on the iteration number of PCG methods can be found in [32, 38].
Observing the convergence estimate (6), if there are only a few small eigenvalues of $BA$ in $\sigma_0(BA)$, then the convergence rate of the PCG method is dominated by the factor $(\sqrt{b/a}-1)/(\sqrt{b/a}+1)$, i.e. by $b/a$, where $b = \lambda_n(BA)$ and $a = \lambda_{m+1}(BA)$. We define this quantity as the 'effective condition number.'

Definition 2.2 (Xu and Zhu [30])
Let $V$ be a Hilbert space. The $m$th effective condition number of an operator $A : V\to V$ is defined by

$$\kappa_{m+1}(A) = \frac{\lambda_{\max}(A)}{\lambda_{m+1}(A)}$$

where $\lambda_{m+1}(A)$ is the $(m+1)$th smallest eigenvalue of $A$.
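Definition 2.2 is straightforward to evaluate once the spectrum is known; a small sketch (with a made-up spectrum, purely for illustration) computing $\kappa_{m+1}$ and the corresponding asymptotic PCG factor $(\sqrt{b/a}-1)/(\sqrt{b/a}+1)$ from (6) is:

```python
import numpy as np

def effective_condition_number(eigs, m):
    """kappa_{m+1} = lambda_max / lambda_{m+1} for a spectrum given as an array."""
    lam = np.sort(eigs)
    return lam[-1] / lam[m]

eigs = np.array([1e-6, 2e-6, 0.3, 0.5, 0.8, 1.0])   # two 'bad' eigenvalues
kappa_eff = effective_condition_number(eigs, m=2)
rate = (np.sqrt(kappa_eff) - 1) / (np.sqrt(kappa_eff) + 1)
print(kappa_eff, rate)   # about 3.33 and a mild contraction factor ~0.29
```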

To estimate the effective condition number, we need to estimate $\lambda_{m+1}(A)$. A fundamental tool is the Courant–Fischer 'minimax' principle (see, e.g. [34]):

Lemma 2.3
Let $V$ be an $n$-dimensional Hilbert space and $A : V\to V$ an SPD operator on $V$. Suppose that $\lambda_1\le\lambda_2\le\cdots\le\lambda_n$ are the eigenvalues of $A$; then

$$\lambda_{m+1}(A) = \max_{\dim(S)=m}\ \min_{0\ne v\in S^\perp}\frac{(Av,v)}{(v,v)}$$

for $m = 1, 2,\ldots,n$. In particular, for any subspace $V_0\subset V$ with $\dim(V_0) = n-m$, the following estimate of $\lambda_{m+1}(A)$ holds:

$$\lambda_{m+1}(A) \ge \min_{0\ne v\in V_0}\frac{(Av,v)}{(v,v)} \qquad (8)$$


Inequality (8) is the starting point for our analysis of the eigenvalue distribution. It enables us to obtain a lower bound for an eigenvalue if we can estimate $\min_{0\ne v\in V_0}(Av,v)/(v,v)$ for some suitable subspace $V_0$.

3. WEIGHTED $L^2$-PROJECTION

Similar to [30], a major tool for analyzing the overlapping domain decomposition preconditioner is the weighted $L^2$-projection $Q^\omega_H : L^2(\Omega)\to V_H$ defined by

$$(Q^\omega_H u, v_H)_{0,\omega} = (u, v_H)_{0,\omega} \quad \forall v_H\in V_H$$

In this section, we recall some main results on the weighted $L^2$-projection from [23]. Most of the results in this section can also be found in [30].

Lemma 3.1 (Bramble and Xu [23])
Let $V_H\subset V_h$ be two nested linear finite element spaces. Then, for any $u\in V_h$,

$$\|(I-Q^\omega_H)u\|_{0,\omega} \lesssim c_d(h,H)\,H\,|u|_{1,\omega}$$

and

$$|Q^\omega_H u|_{1,\omega} \lesssim c_d(h,H)\,|u|_{1,\omega}$$

where

$$c_d(h,H) = C\cdot\begin{cases}\left(\log\dfrac{H}{h}\right)^{1/2} & \text{if } d = 2\\[2mm] \left(\dfrac{H}{h}\right)^{1/2} & \text{if } d = 3\end{cases}$$

The proof of this lemma is based on the properties of the standard interpolation operator and the Sobolev imbedding theorem (for details, see [23]). The above lemma is not necessarily true for general $H^1(\Omega)$ functions. However, if we use the full weighted $H^1$-norm, then we have

Lemma 3.2 (Bramble and Xu [23])
For all $u\in H^1_D(\Omega)$, we have

$$\|(I-Q^\omega_H)u\|_{0,\omega} \lesssim H|\log H|^{1/2}\|u\|_{1,\omega}$$

In general, we cannot replace $\|u\|_{1,\omega}$ by the seminorm $|u|_{1,\omega}$ in the above lemma. For this purpose, $u\in H^1_D(\Omega)$ must satisfy a certain condition. We introduce a subspace $\tilde H^1_D(\Omega)$ of $H^1_D(\Omega)$ as follows:

$$\tilde H^1_D(\Omega) = \left\{v\in H^1_D(\Omega) : \int_{\Omega^0_m} v\,dx = 0,\ \forall m\in\mathcal{I}\right\}$$

An important feature of this subspace is that the Poincaré–Friedrichs inequality holds:

$$\|v\|_{0,\omega}\lesssim |v|_{1,\omega} \quad \forall v\in\tilde H^1_D(\Omega) \qquad (9)$$


Remark 3.3
The condition $\int_{\Omega^0_m} v = 0$ is not essential. The main idea is to introduce a subspace such that the Poincaré–Friedrichs inequality (9) holds. It can be replaced by some other conditions. For example, we can use

$$\int_{F_m} v\,dx = 0, \quad F_m\subset\partial\Omega^0_m \text{ and } \mathrm{meas}(F_m)>0$$

for each $\Omega^0_m$ such that $m\in\mathcal{I}$. In this case, the Poincaré–Friedrichs inequality (9) is still true (see [11, 14] for more details).
Thanks to inequality (9), we have the following estimates for the weighted $L^2$-projection:

Lemma 3.4
For any $v\in\tilde H^1_D(\Omega)$, we have

$$\|(I-Q^\omega_H)v\|_{0,\omega}\lesssim H|\log H|^{1/2}|v|_{1,\omega} \qquad (10)$$

and

$$|Q^\omega_H v|_{1,\omega}\lesssim |\log H|^{1/2}|v|_{1,\omega} \qquad (11)$$

Proof
From the assumption, $v$ satisfies the Poincaré–Friedrichs inequality (9). Inequality (10) then follows from Lemma 3.2.
The proof of inequality (11) relies on (10) and the local $L^2$-projection $Q_\tau : L^2(\tau)\to P_1(\tau)$ defined by $(Q_\tau u,\chi)_\tau = (u,\chi)_\tau$ for all $\chi\in P_1(\tau)$. On each element $\tau\in T_H$, we have

$$|Q^\omega_H v|_{1,\tau}^2 \lesssim |Q^\omega_H v - Q_\tau v|_{1,\tau}^2 + |Q_\tau v|_{1,\tau}^2 \lesssim H^{-2}\|Q^\omega_H v - Q_\tau v\|_{0,\tau}^2 + |Q_\tau v|_{1,\tau}^2$$
$$\lesssim H^{-2}\left(\|v-Q^\omega_H v\|_{0,\tau}^2 + \|v-Q_\tau v\|_{0,\tau}^2\right) + |Q_\tau v|_{1,\tau}^2 \lesssim H^{-2}\|v-Q^\omega_H v\|_{0,\tau}^2 + |v|_{1,\tau}^2$$

In the last inequality, we used the stability and approximation properties of $Q_\tau$; see [23, Lemma 3.3]. Multiplying by suitable weights and summing over all $\tau\in T_H$ on both sides, we obtain

$$|Q^\omega_H v|_{1,\omega}^2 \lesssim H^{-2}\|v-Q^\omega_H v\|_{0,\omega}^2 + |v|_{1,\omega}^2 \lesssim |\log H|\,|v|_{1,\omega}^2$$

In the last step, we used inequality (10). $\square$


Although it is true for $d = 2$ or $3$, Lemma 3.4 is of interest only when $d = 3$; when $d = 2$, Lemma 3.1 is sufficient for our purposes. By Lemma 3.4, the approximation and stability constants of the weighted $L^2$-projection deteriorate by a factor of $|\log H|$. A sharper estimate can be obtained if we assume that each subregion $\Omega^0_m$ is a polyhedral domain with each edge of length $H_0$.

Lemma 3.5 (Bramble and Xu [23])
Assume that $G$ is a polyhedral domain in $\mathbb{R}^3$. Then

$$\|v\|_{L^2(E)}\lesssim |\log h|^{1/2}\|v\|_{1,G} \quad \forall v\in V_h(G)$$

where $E$ is any edge of $G$.


By the Poincaré–Friedrichs inequality (9), for each $v\in\tilde H^1_D(\Omega)$ we have

$$\|v\|_{1,\Omega^0_m}\lesssim |v|_{1,\Omega^0_m} \quad \text{for all } \Omega^0_m\ (m = 1,\ldots,M)$$

Then, by Lemma 3.5 and a standard scaling argument,

$$\|v\|_{L^2(E)}\lesssim\left(\log\frac{H_0}{H}\right)^{1/2}|v|_{1,\Omega^0_m} \quad \forall v\in V_H(\Omega^0_m)\cap\tilde H^1_D(\Omega) \qquad (12)$$

In this case, we can obtain the following approximation and stability properties for the weighted $L^2$-projection:

Lemma 3.6
In $\mathbb{R}^3$, assume that each subregion $\Omega^0_m$ ($m = 1,\ldots,M$) satisfies $H_0\eqsim\mathrm{length}(E)$ for each edge $E$ of $\Omega^0_m$. Then, for all $v\in\tilde H^1_D(\Omega)$, we have

$$\|(I-Q^\omega_H)v\|_{0,\omega}\lesssim H\left(\log\frac{H_0}{H}\right)^{1/2}|v|_{1,\omega} \qquad (13)$$

and

$$|Q^\omega_H v|_{1,\omega}\lesssim\left(\log\frac{H_0}{H}\right)^{1/2}|v|_{1,\omega} \qquad (14)$$
Proof
Define $w\in V_H$ by

$$w = \begin{cases} w_m & \text{at the nodes inside } \Omega^0_m\\ Q_F v & \text{at the nodes inside a face } F\subset\partial\Omega^0_m\\ 0 & \text{at the nodes elsewhere}\end{cases}$$

where $w_m = Q_H v$ is the standard $L^2$-projection of $v$, $F\subset\partial\Omega^0_m$ is any face of $\Omega^0_m$, and $Q_F : L^2(F)\to V_H(F)$ is the orthogonal $L^2(F)$-projection. Then

$$\|w-w_m\|_{L^2(\Omega^0_m)}^2 \lesssim H^3\sum_{x\in\partial\Omega^0_m}(w-w_m)^2(x) \lesssim H^3\sum_{F\subset\partial\Omega^0_m}\sum_{x\in F}(w-w_m)^2(x)$$
$$\lesssim H^3\sum_{F\subset\partial\Omega^0_m}\left(\sum_{x\in F}(w_m-Q_F v)^2(x) + \sum_{x\in\partial F}w_m^2(x)\right)$$
$$\lesssim \sum_{F\subset\partial\Omega^0_m}\left(H\|w_m-Q_F v\|_{L^2(F)}^2 + H^2\|w_m\|_{L^2(\partial F)}^2\right)$$

We need to bound the two terms appearing in the last expression. For the first term, we have

$$\sum_{F\subset\partial\Omega^0_m} H\|w_m-Q_F v\|_{L^2(F)}^2 \lesssim H\|v-w_m\|_{L^2(\partial\Omega^0_m)}^2 \lesssim \|v-w_m\|_{L^2(\Omega^0_m)}^2 + H^2\|v-w_m\|_{1,\Omega^0_m}^2 \lesssim H^2\|v\|_{1,\Omega^0_m}^2$$

In the second step, we used the inequality

$$\|v\|_{L^2(\partial\Omega^0_m)}^2 \lesssim \epsilon^{-1}\|v\|_{0,\Omega^0_m}^2 + \epsilon\|v\|_{1,\Omega^0_m}^2 \qquad (15)$$

The second term can be bounded by using inequality (12):

$$H^2\sum_{F\subset\partial\Omega^0_m}\|w_m\|_{L^2(\partial F)}^2 \lesssim H^2\log\frac{H_0}{H}\,|w_m|_{1,\Omega^0_m}^2 \lesssim H^2\log\frac{H_0}{H}\,|v|_{1,\Omega^0_m}^2$$

In the last step, we used the stability of $Q_H$: $|w_m|_{1,\Omega^0_m} = |Q_H v|_{1,\Omega^0_m}\lesssim |v|_{1,\Omega^0_m}$. Consequently,

$$\|w-w_m\|_{0,\Omega^0_m} \lesssim H\left(\log\frac{H_0}{H}\right)^{1/2}|v|_{1,\Omega^0_m}$$

This proves (13). The proof of the stability estimate (14) is the same as in Lemma 3.4. $\square$

Remark 3.7
In addition to the conditions of Lemma 3.6, if $H\eqsim H_0$, then for all $v\in\tilde H^1_D(\Omega)$ we have

$$\|(I-Q^\omega_H)v\|_{0,\omega}\lesssim H|v|_{1,\omega} \qquad (16)$$

$$|Q^\omega_H v|_{1,\omega}\lesssim |v|_{1,\omega} \qquad (17)$$

In fact, in this case inequality (12) obviously becomes

$$\|v\|_{L^2(E)}\lesssim |v|_{1,\Omega^0_m} \quad \forall v\in V_H(\Omega^0_m)\cap\tilde H^1_D(\Omega)$$

Then inequalities (16) and (17) follow by the same proof as in Lemma 3.6.

4. OVERLAPPING DOMAIN DECOMPOSITION METHODS

In this section, we consider two-level overlapping domain decomposition methods. Specifically, there is a fine grid $T_h$ with mesh size $h$, as described in Section 2.2, on which the solution is sought. There is also a coarse grid $T_H$ with mesh size $H$. For simplicity, we assume that each element in $T_H$ is a union of some elements in $T_h$, and we also assume that $T_H$ aligns with the jump interface. Let $V := V_h$ and $V_0 := V_H$ be the piecewise linear continuous finite element spaces on $T_h$ and $T_H$, respectively.


We partition the domain $\Omega$ into $L$ nonoverlapping subdomains $\Omega_l$ ($l = 1,\ldots,L$) such that $\overline\Omega = \bigcup_{l=1}^L\overline\Omega_l$. We enlarge each subdomain $\Omega_l$ to $\Omega_l'$ in such a way that the restriction of the triangulation $T_h$ to $\Omega_l'$ is also a triangulation of $\Omega_l'$ itself, and $\Omega_l'$ consists of all points in $\Omega$ within a distance of $CH$ from $\Omega_l$. Here, we make no assumption on the relationship between this partition and the jump regions $\Omega^0_m$ ($m = 1,\ldots,M$). Based on the partition, a natural decomposition of the finite element space $V$ is

$$V = \sum_{l=1}^L V_l \quad\text{where } V_l := \{v\in V : v = 0 \text{ in } \Omega\setminus\Omega_l'\}$$

As usual, we introduce the coarse space $V_0$ to provide the global coupling between subdomains. Obviously, we have the space decomposition

$$V = \sum_{l=0}^L V_l$$

For each $l = 0, 1,\ldots,L$, we define the projections $P_l, Q^\omega_l : V\to V_l$ by

$$a(P_l u, v_l) = a(u, v_l), \qquad (Q^\omega_l u, v_l)_{0,\omega} = (u, v_l)_{0,\omega} \quad \forall v_l\in V_l$$

and define the operator $A_l : V_l\to V_l$ by

$$(A_l u_l, v_l)_{0,\omega} = a(u_l, v_l) \quad \forall u_l, v_l\in V_l$$

For convenience, we denote $A = A_L$ and $Q^\omega_{-1} = 0$. It follows from the definitions that

$$Q^\omega_l A = A_l P_l \quad\text{and}\quad Q^\omega_l Q^\omega_k = Q^\omega_k Q^\omega_l = Q^\omega_k \quad \text{for } k\le l$$

The additive Schwarz preconditioner is defined by

$$B = \sum_{l=0}^L A_l^{-1}Q^\omega_l \qquad (18)$$

Obviously, we have

$$BA = \sum_{l=0}^L A_l^{-1}Q^\omega_l A = \sum_{l=0}^L P_l$$
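In matrix terms (made precise in Section 5, where $\mathbf{B} = \sum_l \mathbf{I}_l\mathbf{A}_l^{-1}\mathbf{I}_l^t$), applying $B$ to a residual amounts to independent local solves plus one coarse solve. A minimal sketch, assuming the prolongation matrices `I[l]` and the subspace stiffness matrices `A_loc[l]` have already been assembled (dense solves are used only for brevity):

```python
import numpy as np

def apply_additive_schwarz(r, I, A_loc):
    """Action of B = sum_l I_l A_l^{-1} I_l^t on a residual vector r.
    I[l]: prolongation matrix of subspace l (l = 0 is the coarse space);
    A_loc[l]: stiffness matrix restricted to subspace l."""
    z = np.zeros_like(r)
    for Il, Al in zip(I, A_loc):
        z += Il @ np.linalg.solve(Al, Il.T @ r)   # independent subspace solves
    return z
```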

4.1. Relation between additive Schwarz and diagonal scaling


In [31], it was proved that the additive Schwarz preconditioner and diagonal scaling (the Jacobi preconditioner) are related as follows:

Theorem 4.1 ([31])
There exist constants $C_1\ge 1$ and $C_2>0$ that depend only on the connectivity of the mesh such that, for all $k = 1,\ldots,n$,

$$\lambda_k(\mathbf{D}^{-1}\mathbf{A}) \le C_1\lambda_k(BA) \le C_2\lambda_k(\mathbf{D}^{-1}\mathbf{A})$$


By using this theorem, we have

$$\lambda_{m_0+1}(BA) \eqsim \lambda_{m_0+1}(\mathbf{D}^{-1}\mathbf{A}) \gtrsim h^2$$

From this relationship, we can see that the $m_0$th effective condition number $\kappa_{m_0+1}(BA)\lesssim h^{-2}$ is independent of the coefficients. We refer to [30] for a simple analytic proof of this fact. However, this estimate is too rough. It was pointed out in [31] that $\kappa_{m_0+1}(BA)$ could be much better than this estimate, but no rigorous proof was given there. In the following subsection, we analyze the eigenvalue distribution of $BA$ and prove the robustness of the additive Schwarz preconditioner.

4.2. Eigenvalue analysis of BA


By a standard coloring technique [39, 40], we can easily prove that $\lambda_{\max}(BA)\le C$, where $C$ is independent of the mesh and the coefficient. The analysis of the lower bound of the eigenvalues relies on a certain stable decomposition.
Similar to $\tilde H^1_D(\Omega)$ in Section 3, we introduce a subspace $\tilde V$ of $V$ by

$$\tilde V := \tilde H^1_D(\Omega)\cap V = \left\{v\in V : \int_{\Omega^0_m} v = 0 \text{ for } m\in\mathcal{I}\right\}$$

We emphasize here that $\mathrm{codim}(\tilde V) = m_0$ and that the Poincaré–Friedrichs inequality (9) holds for any $v\in\tilde V$. Then we have the following stable decomposition result:

Lemma 4.2
For any $v\in V$, there exist $v_l\in V_l$ such that $v = \sum_{l=0}^L v_l$ and

$$\sum_{l=0}^L a(v_l,v_l) \lesssim c_d(h,H)^2\, a(v,v) \qquad (19)$$

For any $v\in\tilde V$, there exist $v_l\in V_l$ such that $v = \sum_{l=0}^L v_l$ and

$$\sum_{l=0}^L a(v_l,v_l) \lesssim |\log H|\, a(v,v) \qquad (20)$$

Furthermore, if each subregion $\Omega^0_m$ satisfies $\mathrm{length}(E)\eqsim H_0$ for any edge $E$ of $\Omega^0_m$, then for any $v\in\tilde V$ there exist $v_l\in V_l$ such that $v = \sum_{l=0}^L v_l$ and

$$\sum_{l=0}^L a(v_l,v_l) \lesssim \left(1+\log\frac{H_0}{H}\right) a(v,v) \qquad (21)$$

In particular, in this case, if the coarse grid satisfies $H\eqsim H_0$, then $\sum_{l=0}^L a(v_l,v_l)\lesssim a(v,v)$.

Proof
The ideas for proving inequalities (19)–(21) are the same; the main difference is which property of the weighted $L^2$-projection from Section 3 is used. Here, we follow the idea from [20].
Let $\{\theta_l\}_{l=1}^L$ be a partition of unity defined on $\Omega$ satisfying $\sum_{l=1}^L\theta_l = 1$ and, for $l = 1, 2,\ldots,L$,

$$\mathrm{supp}\,\theta_l\subset\Omega_l'\cup\partial\Omega, \qquad 0\le\theta_l\le 1, \qquad \|\nabla\theta_l\|_{\infty,\Omega_l'}\le CH^{-1}$$

Here, $\|\cdot\|_{\infty,O}$ denotes the $L^\infty$-norm of a function defined on a subdomain $O$. The construction of such a partition of unity is standard. A decomposition $v = \sum_{l=0}^L v_l$ with $v_l\in V_l$ can then be obtained by taking $v_0 = Q^\omega_0 v$ and

$$v_l = I_h(\theta_l(v - Q^\omega_0 v))\in V_l, \quad l = 1,\ldots,L$$

where $I_h$ is the nodal value interpolant on $V$.
From this decomposition, we prove that inequalities (19) and (20) hold. For any $\tau\in T_h$, denoting by $\theta_{l,\tau}$ the average of $\theta_l$ on $\tau$, note that

$$\|\theta_l - \theta_{l,\tau}\|_{L^\infty(\tau)} \lesssim h\|\nabla\theta_l\|_{L^\infty(\tau)} \lesssim \frac{h}{H}$$

Let $w = v - Q^\omega_0 v$; then, by the inverse inequality,

$$|v_l|_{1,\tau} \le |\theta_{l,\tau}w|_{1,\tau} + |I_h(\theta_l-\theta_{l,\tau})w|_{1,\tau} \lesssim |w|_{1,\tau} + h^{-1}\|I_h(\theta_l-\theta_{l,\tau})w\|_{0,\tau}$$

It is easy to show that

$$\|I_h(\theta_l-\theta_{l,\tau})w\|_{0,\tau} \lesssim \frac{h}{H}\|w\|_{0,\tau}$$

Consequently,

$$|v_l|_{1,\tau}^2 \lesssim |w|_{1,\tau}^2 + \frac{1}{H^2}\|w\|_{0,\tau}^2$$

Summing over all $\tau\in T_h\cap\Omega_l'$ with appropriate weights gives

$$|v_l|_{1,\omega}^2 = |v_l|_{1,\omega,\Omega_l'}^2 \lesssim |w|_{1,\omega,\Omega_l'}^2 + \frac{1}{H^2}\|w\|_{0,\omega,\Omega_l'}^2$$

and

$$\sum_{l=1}^L a(v_l,v_l) \lesssim \sum_{l=1}^L\left(|w|_{1,\omega,\Omega_l'}^2 + \frac{1}{H^2}\|w\|_{0,\omega,\Omega_l'}^2\right) \lesssim |v-Q^\omega_0 v|_{1,\omega}^2 + \frac{1}{H^2}\|v-Q^\omega_0 v\|_{0,\omega}^2$$

From the above inequality, for any $v\in V$, applying Lemma 3.1 we obtain inequality (19). Applying Lemma 3.4 for $v\in\tilde V$ gives inequality (20), and applying Lemma 3.6 for any $v\in\tilde V$ we obtain inequality (21). This completes the proof. $\square$

Theorem 4.3
For the additive Schwarz preconditioner $B$ defined by (18), the eigenvalues of $BA$ satisfy

$$\lambda_{\min}(BA)\gtrsim c_d(h,H)^{-2}, \qquad \lambda_{m_0+1}(BA)\ge C|\log H|^{-1} \qquad\text{and}\qquad \lambda_{\max}(BA)\le C$$

Moreover, when $d = 3$, if each subregion $\Omega^0_m$ is a polyhedral domain with each edge of length $\eqsim H_0$, then

$$\lambda_{m_0+1}(BA)\ge C\left(1+\log\frac{H_0}{H}\right)^{-1}$$

In particular, if $H\eqsim H_0$, then $\lambda_{m_0+1}(BA)\ge C$.

Proof
Note that $BA = \sum_{l=0}^L P_l$; by a standard coloring argument, we have

$$\lambda_{\max}(BA)\le C$$

For the minimum eigenvalue, for any $v\in V$ consider the decomposition $v = \sum_{l=0}^L v_l$ from Lemma 4.2. By the Schwarz inequality, we obtain

$$a(v,v) = \sum_{l=0}^L a(v_l,v) = \sum_{l=0}^L a(v_l,P_l v) \le \left(\sum_{l=0}^L a(v_l,v_l)\right)^{1/2}\left(\sum_{l=0}^L a(P_l v,P_l v)\right)^{1/2} = \left(\sum_{l=0}^L a(v_l,v_l)\right)^{1/2}(a(BAv,v))^{1/2}$$

By (19), we have

$$a(v,v)\lesssim c_d(h,H)\,a(v,v)^{1/2}a(BAv,v)^{1/2} \quad \forall v\in V$$

This implies

$$\lambda_{\min}(BA)\gtrsim c_d(h,H)^{-2}$$

On the other hand, by (20), we have

$$a(v,v)\lesssim |\log H|^{1/2}a(v,v)^{1/2}a(BAv,v)^{1/2} \quad \forall v\in\tilde V$$

By the minimax Lemma 2.3, noting that $\dim(\tilde V^\perp) = m_0$, we obtain

$$\lambda_{m_0+1}(BA)\gtrsim |\log H|^{-1}$$

Similarly, from (21) and the minimax Lemma 2.3,

$$\lambda_{m_0+1}(BA)\ge C\left(1+\log\frac{H_0}{H}\right)^{-1}$$

when each subregion satisfies $\mathrm{length}(E)\eqsim H_0$. This completes the proof. $\square$

Remark 4.4
Theorem 4.3 gives a direct proof of the robustness of the overlapping domain decomposition preconditioner for the variable coefficient problem (1). That is, the preconditioned system has only $m_0$ small eigenvalues, and the effective condition number is bounded by $C|\log H|$, or by $C(1+\log(H_0/H))$ if each subregion is a polyhedral domain with each edge of length $\eqsim H_0$. In particular, when $H\eqsim H_0$, the effective condition number is bounded uniformly.
The estimates of the maximum and minimum eigenvalues of $BA$ are standard and can be found in many references (see, for example, [27, 39]). From the above theorem, we know that when $d = 2$, $\kappa(BA)\le C(1+\log(H/h))$, which is also quite robust. However, for the worst case in $d = 3$, we have $\kappa(BA)\le CH/h$, which grows rapidly as $h\to 0$. In this case, we have the following convergence estimate for the PCG algorithm.

Theorem 4.5
In $\mathbb{R}^3$, assume that each subregion $\Omega^0_m$ ($m = 1,\ldots,M$) is a polyhedral domain with each edge of length $\eqsim H_0$. Let $u\in V$ be the exact solution of Equation (2) and $\{u^k : k = 0, 1, 2,\ldots\}$ be the solution sequence of the PCG algorithm. Then we have

$$\frac{\|u-u^k\|_A}{\|u-u^0\|_A} \le 2\left(\frac{C_0 H}{h}-1\right)^{m_0}\delta^{k-m_0} \quad\text{for } k\ge m_0$$

where $\delta = 1-2/(C\sqrt{1+\log(H_0/H)}+1)<1$ and $C_0$, $C$ are constants independent of the coefficients and the mesh size. Moreover, given a tolerance $0<\epsilon<1$, the number of iterations needed for $\|u-u^k\|_A/\|u-u^0\|_A<\epsilon$ satisfies

$$k\gtrsim m_0 + \left(\log\frac{2}{\epsilon} + m_0\log\left(\frac{C_0H}{h}-1\right)\right)\Big/\,|\log\delta|$$

In particular, if $H\eqsim H_0$, then the asymptotic convergence rate of the PCG algorithm is uniformly bounded with respect to both the coefficients and the mesh size.

Theorem 4.5 is a direct consequence of inequalities (6) and (7) and Theorem 4.3.
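To get a feel for the bound of Theorem 4.5, one can evaluate it for sample parameters (the constants and mesh ratios below are assumed values, purely for illustration):

```python
import numpy as np

# Iteration bound of Theorem 4.5 for illustrative, assumed parameters.
m0, eps, C0, C = 3, 1e-8, 1.0, 1.0
H_over_h, H0_over_H = 8.0, 4.0
delta = 1 - 2 / (C * np.sqrt(1 + np.log(H0_over_H)) + 1)
k = m0 + (np.log(2 / eps) + m0 * np.log(C0 * H_over_h - 1)) / abs(np.log(delta))
print(delta, np.ceil(k))   # contraction factor ~0.21 and about 20 iterations
```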

Remark 4.6
From Theorem 4.5, although the convergence rate deteriorates slightly through the condition number $\kappa(BA)$, because $m_0$ is a fixed number the asymptotic convergence rate can be bounded by $\delta<1$, which is uniform with respect to the coefficients and the mesh size.
Without the assumption on the subregions $\Omega^0_m$ ($m = 1,\ldots,M$), Theorem 4.5 becomes

$$\frac{\|u-u^k\|_A}{\|u-u^0\|_A} \le 2\left(\frac{C_0H}{h}-1\right)^{m_0}\left(1-\frac{2}{C_1\sqrt{|\log H|}+1}\right)^{k-m_0} \quad\text{for } k\ge m_0$$

In this case, the number of iterations needed for $\|u-u^k\|_A/\|u-u^0\|_A<\epsilon$ with the given tolerance $0<\epsilon<1$ satisfies

$$k\gtrsim m_0 + \left(\log\frac{2}{\epsilon} + m_0\log\left(\frac{C_0H}{h}-1\right)\right)c_0(H)$$

where

$$c_0(H) = \left[\log\left(\frac{C_1\sqrt{|\log H|}+1}{C_1\sqrt{|\log H|}-1}\right)\right]^{-1}$$


Remark 4.7
By similar arguments, the results above can be generalized to additive Schwarz preconditioners with inexact subspace solvers (cf. [39]) and also to multilevel additive Schwarz preconditioners (cf. [41]). For the BPX preconditioner and the multigrid V-cycle preconditioner, similar results can be found in [30].

5. MATRIX REPRESENTATIONS

So far, our analysis has been based on the operator form (18) of the preconditioner. In this section, we look at the algebraic representation of this preconditioner and show that the weighted $L^2$-projection $Q^\omega_l$ is introduced for theoretical purposes only; that is, the matrix representation of $B$ is independent of $Q^\omega_l$.
Let $V$ be the finite element space with nodal basis $\{\phi_1,\ldots,\phi_n\}$. Then, given any function $v\in V$, there exists a unique $\nu\in\mathbb{R}^n$ such that

$$v = \sum_{i=1}^n \nu_i\phi_i$$

Let $\tilde v = \nu$ be the vector representation of $v$. Given two linear vector spaces $V$ and $W$ and a linear operator $A\in L(V,W)$, the matrix representation of $A$ with respect to a basis $\{\phi_1,\ldots,\phi_n\}$ of $V$ and a basis $\{\psi_1,\ldots,\psi_k\}$ of $W$ is the matrix $\tilde A = (\tilde a_{ij})\in\mathbb{R}^{k\times n}$ satisfying

$$A\phi_j = \sum_{i=1}^k \tilde a_{ij}\psi_i \quad\text{for } 1\le j\le n$$

From the above definitions, it is easy to see that, for any two operators $A, B\in L(V)$ and $v\in V$, we have

$$\widetilde{AB} = \tilde A\tilde B \quad\text{and}\quad \widetilde{Av} = \tilde A\tilde v \qquad (22)$$

Given any subspace $V_0\subset V$ equipped with a basis $\{\phi^0_1,\ldots,\phi^0_{n_0}\}$, there exists a unique matrix $\mathbf{I}_0 = (e_{ij})\in\mathbb{R}^{n\times n_0}$ such that

$$\phi^0_j = \sum_{i=1}^n e_{ij}\phi_i \quad\text{for } j = 1,\ldots,n_0$$

This matrix is the matrix representation of the natural inclusion $I_0 : V_0\to V$, and it is known as the prolongation matrix. Its transpose $\mathbf{I}_0^t$ is known as the restriction matrix.
Define the mass matrix $\mathbf{M}$ and the stiffness matrix $\mathbf{A}$ by

$$\mathbf{M} = ((\phi_i,\phi_j)_{0,\omega})_{n\times n} \quad\text{and}\quad \mathbf{A} = ((A\phi_i,\phi_j)_{0,\omega})_{n\times n}$$

By definition, we have $(u,v)_{0,\omega} = (\tilde u, \mathbf{M}\tilde v)_{\ell^2}$, and we can easily show that $\mathbf{A} = \mathbf{M}\tilde A$. Obviously, the prolongation and restriction matrices satisfy the following important relation:

$$(u,v_0)_{0,\omega} = (\mathbf{M}\tilde u, \mathbf{I}_0\tilde v_0)_{\ell^2} = (\mathbf{I}_0^t\mathbf{M}\tilde u, \tilde v_0)_{\ell^2}$$


For the weighted $L^2$-projection $Q^\omega_l : V\to V_l$, we have, by definition,

$$(u,v_l)_{0,\omega} = (Q^\omega_l u, v_l)_{0,\omega} = (\mathbf{M}_l\widetilde{Q^\omega_l u}, \tilde v_l)_{\ell^2} = (\mathbf{M}_l\mathbf{Q}_l\tilde u, \tilde v_l)_{\ell^2}$$

where $\mathbf{M}_l$ is the mass matrix on $V_l$. On the other hand, noting that

$$(u,v_l)_{0,\omega} = (u, I_l v_l)_{0,\omega} = (\tilde u, \mathbf{M}\mathbf{I}_l\tilde v_l)_{\ell^2} = (\mathbf{I}_l^t\mathbf{M}\tilde u, \tilde v_l)_{\ell^2}$$

we deduce that the matrix representation of the weighted $L^2$-projection $Q^\omega_l$, denoted by $\mathbf{Q}_l$, is

$$\mathbf{Q}_l = \mathbf{M}_l^{-1}\mathbf{I}_l^t\mathbf{M} \qquad (23)$$

To derive the algebraic representation of the preconditioner

$$B = \sum_{l=0}^L A_l^{-1}Q^\omega_l = \sum_{l=0}^L I_l A_l^{-1}Q^\omega_l$$

applying (22) and (23) gives

$$\tilde B = \sum_{l=0}^L \mathbf{I}_l\widetilde{A_l^{-1}}\mathbf{Q}_l = \sum_{l=0}^L \mathbf{I}_l(\mathbf{A}_l^{-1}\mathbf{M}_l)(\mathbf{M}_l^{-1}\mathbf{I}_l^t\mathbf{M}) = \mathbf{B}\mathbf{M}$$

where $\widetilde{A_l^{-1}} = \mathbf{A}_l^{-1}\mathbf{M}_l$ is the matrix representation of $A_l^{-1}$ and

$$\mathbf{B} = \sum_{l=0}^L \mathbf{I}_l\mathbf{A}_l^{-1}\mathbf{I}_l^t$$

is the standard matrix representation of the additive Schwarz preconditioner $B$.


We summarize this section with the following relationship between the operator equation (2) and the algebraic equation (3). The linear iterations for (2) and (3) can be expressed as

$$u^{k+1} = u^k + B(F-Au^k) \quad\text{and}\quad \mu^{k+1} = \mu^k + \mathbf{B}(b-\mathbf{A}\mu^k), \quad k = 0, 1,\ldots$$

respectively.

Proposition 5.1 (Xu [20])
$u$ is a solution of (2) if and only if $\mu = \tilde u$ is a solution of (3) with $b = \mathbf{M}\tilde F$. The linear iterations are equivalent if and only if $\tilde B = \mathbf{B}\mathbf{M}$. In this case, $\kappa(BA) = \kappa(\mathbf{B}\mathbf{A})$.
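The equivalence stated in Proposition 5.1 is easy to check numerically: iterating the operator form with $\tilde B = \mathbf{B}\mathbf{M}$ and the matrix form with $\mathbf{B}$ produces identical iterates. A small sketch with stand-in SPD matrices (not an actual finite element discretization):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
M = np.eye(n) + 0.1 * np.ones((n, n))        # stand-in SPD 'mass' matrix
AS = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # stand-in stiffness
B = np.linalg.inv(np.diag(np.diag(AS)))      # stand-in matrix preconditioner
F = rng.standard_normal(n)

A_op, B_op, b = np.linalg.solve(M, AS), B @ M, M @ F   # operator forms, rhs
u = np.zeros(n)
mu = np.zeros(n)
for _ in range(5):
    u = u + B_op @ (F - A_op @ u)            # operator iteration
    mu = mu + B @ (b - AS @ mu)              # matrix iteration
print(np.allclose(u, mu))                    # True: the iterations coincide
```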

6. CONCLUSION

In this paper, we discussed the eigenvalue distribution of the additive and multiplicative overlapping domain decomposition methods for second-order elliptic equations with large jump coefficients. We proved that there are only a few small eigenvalues affected by the large jump and that the effective condition number of the preconditioned system is of order $O(|\log H|)$. As a result, the asymptotic convergence rate of the PCG algorithm with the additive Schwarz preconditioner is $1-2/(C\sqrt{|\log H|}+1)$. With additional assumptions on the subregions $\Omega^0_m$ ($m = 1,\ldots,M$), we also proved that the effective condition number of the preconditioned system is uniformly bounded with respect to the coefficients and the mesh size. In this case, the asymptotic convergence rate of the PCG algorithm is bounded uniformly.


ACKNOWLEDGEMENTS
This work was supported in part by NSF DMS-0609727, NSFC-10528102 and the Center for Computational
Mathematics and Applications at Pennsylvania State. The author would like to thank Professor Jinchao
Xu for his valuable advice and comments on this paper.

REFERENCES
1. Alcouffe RE, Brandt A, Dendy JE, Painter JW. The multi-grid methods for the diffusion equation with strongly
discontinuous coefficients. SIAM Journal on Scientific and Statistical Computing 1981; 2:430–454.
2. Kees CE, Miller CT, Jenkins EW, Kelley CT. Versatile two-level Schwarz preconditioners for multiphase flow.
Computers and Geosciences 2003; 7(2):91–114.
3. Vuik C, Segal A, Meijerink JA. An efficient preconditioned CG method for the solution of a class of layered
problems with extreme contrasts in the coefficients. Journal of Computational Physics 1999; 152(1):385–403.
4. Heise B, Kuhn M. Parallel solvers for linear and nonlinear exterior magnetic field problems based upon coupled
FE/BE formulations. Computing 1996; 56(3):237–258.
5. Coomer RK, Graham IG. Massively parallel methods for semiconductor device modelling. Computing 1996;
56(1):1–27.
6. Howle VE, Vavasis SA. An iterative method for solving complex-symmetric systems arising in electrical power
modeling. SIAM Journal on Matrix Analysis and Applications 2005; 26(4):1150–1178.
7. Wang C. Fundamental models for fuel cell engineering. Chemical Reviews 2004; 104:4727–4766.
8. Wang Z, Wang C, Chen K. Two phase flow and transport in the air cathode of proton exchange membrane fuel
cells. Journal of Power Sources 2001; 94:40–50.
9. Bramble JH, Pasciak JE, Schatz AH. The construction of preconditioners for elliptic problems by substructuring.
IV. Mathematics of Computation 1989; 53(187):1–24.
10. Chan T, Wan W. Robust multigrid methods for nonsmooth coefficient elliptic linear systems. Journal of
Computational and Applied Mathematics 2000; 123:323–352.
11. Dryja M, Widlund OB. Schwarz methods of Neumann–Neumann type for three-dimensional elliptic finite element
problems. Communications on Pure and Applied Mathematics 1995; 48(2):121–155.
12. Mandel J, Brezina M. Balancing domain decomposition for problems with large jumps in coefficients. Mathematics
of Computation 1996; 65(216):1387–1401.
13. Smith BF. A domain decomposition algorithm for elliptic problems in three dimensions. Numerische Mathematik
1991; 60(2):219–234.
14. Xu J, Zou J. Some nonoverlapping domain decomposition methods. SIAM Review 1998; 40(4):857–914
(electronic).
15. Bramble JH, Pasciak JE, Schatz AH. The construction of preconditioners for elliptic problems by substructuring.
III. Mathematics of Computation 1988; 51(184):415–430.
16. Cho S, Nepomnyaschikh SV, Park E-J. Domain decomposition preconditioning for elliptic problems with jumps in
coefficients. Technical Report rep05-22, Radon Institute for Computational and Applied Mathematics (RICAM),
2005.
17. Nepomnyaschikh SV. Preconditioning operators for elliptic problems with bad parameters. Eleventh International
Conference on Domain Decomposition Methods, London. DDM.org: Augsburg, 1999; 82–88 (electronic).
18. Wang J, Xie R. Domain decomposition for elliptic problems with large jumps in coefficients. Proceedings of
Conference on Scientific and Engineering Computing. National Defense Industry Press: Beijing, China, 1994;
74–86.
19. Bramble JH, Pasciak JE, Wang J, Xu J. Convergence estimates for multigrid algorithms without regularity
assumption. Mathematics of Computation 1991; 57(195):23–45.
20. Xu J. Iterative methods by space decomposition and subspace correction. SIAM Review 1992; 34:581–613.
21. Le Tallec P. Domain decomposition methods in computational mechanics. Computational Mechanics Advances
1994; 1(2):121–220.
22. Smith BF, Bjørstad PE, Gropp WD. Domain Decomposition. Cambridge University Press: Cambridge, 1996.
23. Bramble JH, Xu J. Some estimates for a weighted $L^2$ projection. Mathematics of Computation 1991; 56(194):
463–476.
24. Oswald P. On the robustness of the BPX-preconditioner with respect to jumps in the coefficients. Mathematics
of Computation 1999; 68:633–650.


25. Wang J. New convergence estimates for multilevel algorithms for finite-element approximations. Journal of
Computational and Applied Mathematics 1994; 50(1–3):593–604.
26. Dryja M, Sarkis MV, Widlund OB. Multilevel Schwarz methods for elliptic problems with discontinuous
coefficients in three dimensions. Numerische Mathematik 1996; 72(3):313–348.
27. Dryja M, Smith BF, Widlund OB. Schwarz analysis of iterative substructuring algorithms for elliptic problems
in three dimensions. SIAM Journal on Numerical Analysis 1994; 31(6):1662–1694.
28. Xu J. Counterexamples concerning a weighted $L^2$ projection. Mathematics of Computation 1991; 57:563–568.
29. Xu J, Zhu Y. Multilevel preconditioners for elliptic equations with jump coefficients on bisection grids. 2007,
preprint.
30. Xu J, Zhu Y. Uniform convergent multigrid methods for elliptic problems with strongly discontinuous coefficients.
Mathematical Models and Methods in Applied Sciences 2008; 18(2):1–29.
31. Graham IG, Hagger MJ. Unstructured additive Schwarz-conjugate gradient method for elliptic problems with
highly discontinuous coefficients. SIAM Journal on Scientific Computing 1999; 20:2041–2066.
32. Axelsson O. Iteration number for the conjugate gradient method. Mathematics and Computers in Simulation
2003; 61(3–6):421–435.
33. Xu J. Lecture Notes Multigrid Methods. Penn State MATH 552 (Fall 2006).
34. Golub GH, van Loan CF. Matrix Computations. Johns Hopkins University Press: Baltimore, MD, 1996.
35. Kelley CT. Iterative Methods for Linear and Nonlinear Equations, vol. 16. SIAM: Philadelphia, PA, 1995.
36. Saad Y. Iterative Methods for Sparse Linear Systems. SIAM: Philadelphia, PA, 2003.
37. Hackbusch W. Iterative Solution of Large Sparse Systems of Equations, vol. 95. Springer: New York, 1994.
38. Axelsson O. Iterative Solution Methods. Cambridge University Press: Cambridge, 1994.
39. Chan TF, Mathew TP. Domain decomposition algorithms. Acta Numerica 1994; 3:61–143.
40. Toselli A, Widlund O. Domain Decomposition Methods—Algorithms and Theory, vol. 34. Springer: Berlin, 2005.
41. Zhang X. Multilevel Schwarz methods. Numerische Mathematik 1992; 63(1):521–539.

NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS
Numer. Linear Algebra Appl. 2008; 15:291–306
Published online 15 January 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/nla.574

Uniform convergence of the multigrid V-cycle on graded meshes for corner singularities

James J. Brannick, Hengguang Li∗, † and Ludmil T. Zikatanov


Department of Mathematics, The Pennsylvania State University, University Park, PA 16802, U.S.A.

SUMMARY
This paper analyzes a multigrid (MG) V-cycle scheme for solving the discretized 2D Poisson equation with corner singularities. Using weighted Sobolev spaces $K^m_a(\Omega)$ and a space decomposition based on elliptic projections, we prove that the MG V-cycle with standard smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) and piecewise linear interpolation converges uniformly for the linear systems obtained by finite element discretization of the Poisson equation on graded meshes. In addition, we provide numerical experiments to demonstrate the optimality of the proposed approach. Copyright © 2008 John Wiley & Sons, Ltd.

Received 23 May 2007; Revised 30 November 2007; Accepted 3 December 2007

KEY WORDS: multigrid methods; graded meshes; uniform convergence; corner-like singularities

1. INTRODUCTION

Multigrid (MG) methods are arguably one of the most efficient techniques for solving the large
systems of algebraic equations resulting from finite element discretizations of elliptic boundary
value problems. Many of the known results on the convergence properties of MG methods for
elliptic equations can be found in monographs and survey papers by Bramble [1], Hackbusch [2],
Trottenberg et al. [3], Xu [4] and the references therein.
It is well known that the geometry of the boundary and changes in the boundary condition can
influence the regularity of the solution [5–12]. In particular, if the domain possesses re-entrant
corners, cracks, or there exist abrupt changes in the boundary conditions, then the solution of

∗ Correspondence to: Hengguang Li, Department of Mathematics, The Pennsylvania State University, University Park,
PA 16802, U.S.A.

E-mail: li_h@math.psu.edu

Contract/grant sponsor: NSF; contract/grant numbers: DMS-0555831, DMS-058110


Contract/grant sponsor: Lawrence Livermore National Lab; contract/grant number: B568399


the elliptic boundary value problem may have singularities in $H^2$; we hereafter refer to singularities of these types as corner-like singularities. One possible approach for obtaining accurate numerical approximations of the solution near these types of singularities is to use graded meshes [6, 13–15], for which quasi-optimal convergence rates of the numerical solutions can be recovered by using an analysis based on weighted Sobolev spaces. The analysis of the convergence rate of MG methods in such settings is, however, non-trivial. The difficulties that arise are due primarily to the lack of regularity of the solution and the non-uniformity of the mesh.
A result for the uniform convergence of the MG method assuming full regularity was derived by
Braess and Hackbusch [16]; in Brenner’s paper [17], the analysis of the convergence rate for only
partial regularity was presented; Bramble et al. [18] developed the convergence estimate without
regularity assumptions for an L 2 -projection-based decomposition. In addition, on graded meshes,
using the approximation property in [14], Yserentant [19] proved the uniform convergence of the
MG W -cycle with a particular iterative method on each level for piecewise linear functions. There
are also many other more classical convergence proofs that use algebraic techniques and derive
convergence results based on assumptions related to, but nevertheless different from, the regularity
of the underlying partial differential equation [20, 21].
In this paper, using a space decomposition based on elliptic projections and an estimate in the weighted Sobolev space $K^m_a$, we prove the uniform convergence of the MG V-cycle with standard subspace smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) for elliptic problems with corner-like singularities, discretized using graded meshes. To date, this type of convergence analysis has been carried out only for problems with full elliptic regularity. The result presented here
establishes the uniform convergence of the MG method for problems with less regular solutions
discretized using graded meshes that appropriately capture the correct behavior of the solution near
the singularities. Although the main convergence theorem can be modified for elliptic problems
discretized on general graded meshes, for exposition, we restrict our discussion to the graded mesh
refinement (GMR) strategy developed by Băcuţă et al. [6]. Before proceeding, we mention that,
with appropriate modifications, our analysis for linear elements can also be applied to higher-order
finite element methods.

1.1. Preliminaries and notation


Let $\Omega$ be a bounded polygonal domain, possibly with cracks, in $\mathbb{R}^2$, and consider the following prototype elliptic equation with mixed boundary conditions:

$$-\Delta u = f \quad\text{in } \Omega, \qquad u = 0 \quad\text{on } \partial_D\Omega, \qquad \partial u/\partial n = 0 \quad\text{on } \partial_N\Omega \qquad (1)$$

where $\partial_D\Omega$ and $\partial_N\Omega$ consist of segments of the boundary, and we assume that the Neumann boundary condition is not imposed on adjacent sides of the boundary. We note that, in the Sobolev space $H^m$, corner-like singularities appear in the solution near vertices of the domain. Here, by vertices, we mean the points of $\overline\Omega$ where corner-like singularities in $H^2(\Omega)$ are located, namely, the geometric vertices of re-entrant corners, crack points, and points where the boundary conditions change and the interior angle exceeds $\pi/2$.


Let $H^1_D(\Omega) = \{u\in H^1(\Omega)\,|\,u = 0 \text{ on } \partial_D\Omega\}$ be the space of $H^1(\Omega)$ functions with zero trace on $\partial_D\Omega$, let $T_j$, $0\le j\le J$, be a sequence of appropriately graded and nested triangulations of $\Omega$, and let $M_j$, $0\le j\le J$, be the finite element space associated with the linear Lagrange triangle [22] on $T_j$. Then,

$$M_0\subset M_1\subset\cdots\subset M_j\subset\cdots\subset M_J\subset H^1_D(\Omega)$$

Let $A$ be the differential operator associated with Equation (1). Solving (1) amounts to finding an approximation $u_J\in M_J$ such that

$$a(u_J,v_J) = (Au_J,v_J) = (\nabla u_J,\nabla v_J) = (f,v_J) \quad \forall v_J\in M_J$$

Denoting by $N_J$ the dimension of the space $M_J$, by using a GMR strategy one can recover the following quasi-optimal rate of convergence for the finite element approximation $u_J\in M_J$ on $T_J$:

$$\|u-u_J\|_{H^1(\Omega)} \le CN_J^{-1/2}\|f\|_{L^2(\Omega)}$$

The main objective of this paper is to prove the uniform convergence of the MG V-cycle with standard subspace smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) and linear interpolation applied to the 2D Poisson equation discretized using piecewise linear functions on graded meshes obtained via the GMR strategy introduced in [6]. Moreover, we shall show that the convergence rate, $c$, of the MG V-cycle satisfies

$$c\le\frac{c_1}{c_1+c_2 n}$$

where $c_1$ and $c_2$ are mesh-independent constants related to the elliptic equation and the smoother, respectively, and $n$ is the number of iterative solves (smoothing steps) on each subspace. We note that this result can also be used to estimate the efficiency of other subspace smoothers on graded meshes.
The rest of this paper is organized as follows. In Section 2, we introduce the weighted Sobolev
space K am () for boundary value problem (1) and review the method of subspace corrections
(MSC). In addition, we briefly describe the GMR strategy under consideration here for generating
the sequence of graded meshes. Then, in Section 3, we prove the approximation and smoothing
properties, which in turn lead to our main MG convergence theorem. Section 4 contains numerical
results of the proposed method applied to problem (1).

2. WEIGHTED SOBOLEV SPACES AND THE MSC

In this section, we begin by introducing the weighted Sobolev space $K^m_a(\Omega)$ and the mesh refinement strategy under consideration for recovering quasi-optimal rates of convergence of the finite element solution. Then, we present the MSC and a technique for estimating the norm of the product of non-expansive operators.

2.1. Weighted Sobolev spaces and graded meshes


It has been shown in [6–8, 14, 23] that, with a careful choice of the parameters in the weight, the singular behavior of the solution of Equation (1) can be captured well in the following weighted Sobolev spaces. Namely, there is no loss of regularity of the solution in these spaces, and the corresponding mesh refinements are optimal in the sense of Theorem 2.3.
Let $(x,y)\in\overline\Omega$ be an arbitrary point and $S = \{S_i\}$ be the set of vertices of the domain at which the solution has singularities in $H^2(\Omega)$. Denote by $r_i(x,y)$ the distance from $(x,y)$ to the vertex $S_i\in S$, and let $\vartheta(x,y)$ be a smooth function on $\overline\Omega$ such that $\vartheta = r_i$ in a neighborhood of each $S_i$, and $\vartheta\ge C>0$ otherwise. Then, the weighted Sobolev space $K^m_a(\Omega)$, $m\ge 0$, is defined as follows [6, 11]:

$$K^m_a(\Omega) = \{u\in H^m_{loc}(\Omega)\,|\, \vartheta^{\,i+j-a}\partial^i_x\partial^j_y u\in L^2(\Omega),\ i+j\le m\}$$

The corresponding $K^m_a$-norm and seminorm for any function $v\in K^m_a(\Omega)$ are

$$\|v\|^2_{K^m_a(\Omega)} := \sum_{i+j\le m}\|\vartheta^{\,i+j-a}\partial^i_x\partial^j_y v\|^2_{L^2(\Omega)}, \qquad |v|^2_{K^m_a(\Omega)} := \sum_{i+j=m}\|\vartheta^{\,m-a}\partial^i_x\partial^j_y v\|^2_{L^2(\Omega)}$$

Note that $\vartheta$ is equal to the distance function $r_i(x,y)$ near the vertex $S_i$. Thus, we have the following proposition and mesh refinements, as in [6, 15].

Proposition 2.1
We have $|v|_{K^1_1(\Omega)} \eqsim |v|_{H^1(\Omega)}$, $\|v\|_{K^0_1(\Omega)}\le C\|v\|_{L^2(\Omega)}$, and the Poincaré-type inequality $\|v\|_{K^0_1(\Omega)}\le C|v|_{K^1_1(\Omega)}$ for $v\in K^1_1(\Omega)\cap\{v|_{\partial_D\Omega} = 0\}$.

Here, $a\eqsim b$ means that there exist positive constants $C_1$, $C_2$ such that $C_1 b\le a\le C_2 b$.
Definition 2.2
Let $\kappa$ be the ratio of decay of triangles near a vertex $S_i\in S$. Then, for every $a<\min_i(\pi/(t_i\theta_i))$, one can choose $\kappa = 2^{-1/a}$, where $\theta_i$ is the interior angle at the vertex $S_i$, $t_i = 1$ at vertices with Dirichlet boundary conditions on both sides, and $t_i = 2$ if the boundary condition changes type at $S_i$. For example, $\theta_i = 2\pi$ and $t_i = 1$ at crack points with Dirichlet boundary conditions on both sides. In the initial triangulation, we require that each triangle contain at most one point of $S$ and that each $S_i$ be a vertex of some triangle; in other words, no point of $S$ sits on an edge or in the interior of a triangle.
Let $T_j = \{T_k\}$ be the triangulation after $j$ refinements. Then, for the $(j+1)$th refinement, if the function $\vartheta$ is bounded away from $0$ on a triangle (i.e. the triangle contains no point of $S$), new triangles are obtained by connecting the midpoints of its edges. However, if $S_i$ is one of the vertices of a triangle $\triangle S_iBC$, then we choose a point $D$ on $S_iB$ and another point $E$ on $S_iC$ such that the following holds for the ratios of the lengths:

$$\kappa = |S_iD|/|S_iB| = |S_iE|/|S_iC|$$

In this way, the triangle $\triangle S_iBC$ is divided into four smaller triangles by connecting $D$, $E$, and the midpoint of $BC$ (see Figure 1).
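A minimal sketch of this subdivision step for a single triangle with singular vertex $S_i$ (vertices as 2D numpy arrays; setting `kappa = 0.5` reproduces uniform midpoint refinement):

```python
import numpy as np

def refine_singular_triangle(S, B, C, kappa):
    """Divide triangle (S, B, C), singular at S, into four children:
    D on SB and E on SC with |SD|/|SB| = |SE|/|SC| = kappa,
    plus the midpoint F of BC, per Definition 2.2."""
    D = S + kappa * (B - S)
    E = S + kappa * (C - S)
    F = 0.5 * (B + C)
    return [(S, D, E), (D, B, F), (E, F, C), (D, F, E)]

# One refinement with kappa = 0.2, as in Figure 1:
children = refine_singular_triangle(np.array([0.0, 0.0]), np.array([1.0, 0.0]),
                                    np.array([0.0, 1.0]), kappa=0.2)
```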

We note that other refinements, for example those found in [13, 14], also satisfy this condition, although they follow different constructions. We conclude this subsection by restating the following theorem, derived in [6, 15].


Figure 1. Mesh refinements: triangulation after one refinement, $\kappa = 0.2$.

Theorem 2.3
Let $u_j\in M_j$ be the finite element solution of Equation (1) and denote by $N_j$ the dimension of $M_j$. Then, there exists a constant $B_1 = B_1(\Omega,\kappa,a)$ such that

$$\|u-u_j\|_{H^1(\Omega)} \le B_1N_j^{-1/2}\|f\|_{K^0_{a-1}(\Omega)} \le B_1N_j^{-1/2}\|f\|_{L^2(\Omega)}$$

for every $f\in L^2(\Omega)$, where $\kappa<1$ is determined as in Definition 2.2 and $M_j$ is the finite element space of linear functions on the graded mesh $T_j$, as described in the introduction.

Remark 2.4
For $u\notin H^2(\Omega)$, this theorem follows from the fact that the differential operator $A : K^{m+1}_{1+a}(\Omega)\cap\{u = 0 \text{ on } \partial_D\Omega\}\to K^{m-1}_{-1+a}(\Omega)$, $m\ge 0$, in Equation (1) is an isomorphism between the weighted Sobolev spaces.

2.2. The method of subspace corrections


In this subsection, we review the MSC and provide an identity for estimating the norm of the
product of non-expansive operators. In addition, Lemma 2.6 reveals the connection between the
matrix representation and operator representation of the MG method.
Let H D1 () = {u ∈ H 1 ()|u = 0 on * D } be the Hilbert space associated with Equation (1), Tj
be the associated graded mesh, as defined in the previous subsection, Mj ∈ H D1 () be the space
of piecewise linear functions on Tj , and A : H D1 () → (H D1 ()) be the corresponding differential
operator. The weak form for (1) is then

a(u, v) = (Au, v) = (−u, v) = (∇u, ∇v) = ( f, v) ∀v ∈ H D1 ()

where the pairing (·, ·) is the inner product in L 2 (). Here, a(·, ·) is a continuous bilinear form
on H D1 ()× H D1 () and by the Poincare inequality is also coercive. In addition, since the Tj are
nested,

M0 ⊂ M1 ⊂ · · · ⊂ Mj ⊂ · · · ⊂ MJ ⊂ H D1 ()

Define Q j , P j : H D1 () → Mj and A j : Mj → Mj as orthogonal projectors and the restriction of


A on Mj , respectively,

(Q j u, v j ) = (u, v j ), a(P j u, v j ) = a(u, v j )

(Au j , v j ) = (A j u j , v j ) ∀u ∈ H D1 () ∀u j , v j ∈ Mj


Let $N_j = \{x^j_i\}$ be the set of nodal points of $T_j$ and let $\phi^j_k$, with $\phi^j_k(x^j_i) = \delta_{i,k}$, be the linear finite element nodal basis function corresponding to the node $x^j_k$. Then, the $j$th-level finite element discretization reads: find $u_j\in M_j$ such that

$$A_ju_j = f_j \qquad (2)$$

where $f_j\in M_j$ satisfies $(f_j,v_j) = (f,v_j)$, $\forall v_j\in M_j$.
The MSC reduces an MG process to choosing a sequence of subspaces and corresponding operators $B_j : M_j\to M_j$ approximating $A_j^{-1}$, $j = 1,\ldots,J$. For example, in the MSC framework, the standard MG backslash cycle for solving (2) is defined by the following subspace correction scheme:

$$u^{j,l} = u^{j,l-1} + B_j(f_j - A_ju^{j,l-1})$$

where the operators $B_j : M_j\to M_j$, $0\le j\le J$, are recursively defined as follows [24].

Algorithm 2.5
Let $R_j\approx A_j^{-1}$, $j>0$, denote a local relaxation method. For $j = 0$, define $B_0 = A_0^{-1}$. Assume that $B_{j-1} : M_{j-1}\to M_{j-1}$ is defined. Then:

1. Fine grid smoothing: for $u^0_j = 0$ and $k = 1, 2,\ldots,n$,

$$u^k_j = u^{k-1}_j + R_j(f_j - A_ju^{k-1}_j) \qquad (3)$$

2. Coarse grid correction: find the corrector $e_{j-1}\in M_{j-1}$ by the iterator $B_{j-1}$:

$$e_{j-1} = B_{j-1}Q_{j-1}(f_j - A_ju^n_j)$$

Then, $B_jf_j = u^n_j + e_{j-1}$.
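In matrix form, Algorithm 2.5 is the familiar recursion below (a sketch under the convention of Lemma 2.6; `A`, `smooth`, `restrict` and `prolong` are per-level data assumed to be given, with the restriction playing the role of $Q_{j-1}$):

```python
import numpy as np

def backslash_cycle(j, f, A, smooth, restrict, prolong, n):
    """One application of B_j to f (Algorithm 2.5, matrix form).
    A[j]: level-j stiffness matrix; smooth[j](r): smoother action R_j r;
    restrict[j], prolong[j]: inter-grid transfer matrices."""
    if j == 0:
        return np.linalg.solve(A[0], f)      # exact coarsest solve, B_0 = A_0^{-1}
    u = np.zeros_like(f)
    for _ in range(n):                       # step 1: fine grid smoothing, Eq. (3)
        u = u + smooth[j](f - A[j] @ u)
    r = restrict[j] @ (f - A[j] @ u)         # step 2: coarse grid correction
    e = backslash_cycle(j - 1, r, A, smooth, restrict, prolong, n)
    return u + prolong[j] @ e
```

A symmetric V-cycle additionally smooths after the correction; the analysis below uses this backslash iterator through the identity quoted next.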
Recursive application of Algorithm 2.5 results in an MG V-cycle for which the following identity holds: $I - B^v_JA_J = (I-B_JA_J)^*(I-B_JA_J)$ [24], where $B^v_J$ is the iterator of the MG V-cycle. Direct computation gives the following useful result:

$$u^n_j = (I-R_jA_j)u^{n-1}_j + R_jA_ju_j = (I-R_jA_j)^2u^{n-2}_j - (I-R_jA_j)^2u_j + u_j = \cdots = -(I-R_jA_j)^nu_j + u_j$$

where $u_j$ is the finite element solution of (2) and $u^n_j$ is the approximation after $n$ iterations of (3) on the $j$th level. Let $T_j = (I-(I-R_jA_j)^n)P_j$ be a linear operator and define $T_0 = P_0$. We have the following identity:

$$(I-B_JA_J)u_J = u_J - u^n_J - e_{J-1} = (I-T_J)u_J - e_{J-1} = (I-B_{J-1}A_{J-1}P_{J-1})(I-T_J)u_J$$

where, for $B_{J-1} = A^{-1}_{J-1}$, this becomes a two-level method. Recursive application of this identity then yields the error propagation operator of the MG V-cycle:

$$(I-B_JA_J) = (I-T_0)(I-T_1)\cdots(I-T_J)$$


To estimate the uniform convergence of the MG $V$-cycle, we thus need to show that
$$\|I - B_J^vA_J\|_a = \|I - B_JA_J\|_a^2 \le c < 1$$
where $c$ is independent of $J$ and $\|u\|_a^2 = a(u,u) = (Au,u)$ on $\Omega$.
Associated with each $T_j$, we introduce its symmetrization
$$\bar T_j = T_j + T_j^* - T_j^*T_j$$
where $T_j^*$ is the adjoint operator of $T_j$ with respect to the inner product $a(\cdot,\cdot)$. By a well-known result found in [25], the following estimate holds:
$$\|I - B_JA_J\|_a^2 = \frac{c_0}{1 + c_0}$$
where
$$c_0 = \sup_{\|v\|_a = 1}\ \sum_{j=1}^{J} a((\bar T_j^{-1} - I)(P_j - P_{j-1})v,\ (P_j - P_{j-1})v) \tag{4}$$

Now, to prove the uniform convergence of the proposed MG scheme, we must derive a uniform bound on the constant $c_0$.
Although the above presentation is in terms of operators, the matrix representation of the smoothing step (3) is often used in practice. By the matrix representation $\mathbf{R}$ of an operator $R$ on $M_j$, we here mean that, with respect to the basis $\{\phi_i\}_{i=1}^{N_j}$ of $M_j$,
$$R(\phi_k) = \sum_{i=1}^{N_j}\mathbf{R}_{i,k}\,\phi_i$$
where $\mathbf{R}_{i,k}$ is the $(i,k)$ entry of the matrix $\mathbf{R}$. Throughout the paper, we use boldfaced letters to denote vectors and matrices.
Let $\mathbf{A}_S = \mathbf{D} - \mathbf{L} - \mathbf{U}$ be the stiffness matrix associated with the operator $A_j$, where the matrix $\mathbf{D}$ consists of only the diagonal entries of $\mathbf{A}_S$, while the matrices $-\mathbf{L}$ and $-\mathbf{U}$ are the strictly lower and upper triangular parts of $\mathbf{A}_S$, respectively. Denote by $\mathbf{R}_M$ the corresponding matrix of the smoother $R_j$ on the $j$th level. For example, $\mathbf{R}_M = \mathbf{D}^{-1}$ for the Jacobi method, and $\mathbf{R}_M = (\mathbf{D} - \mathbf{L})^{-1}$ for the Gauss–Seidel method. In addition, let $\mathbf{u}^l$, $\mathbf{u}^{l-1}$, and $\mathbf{f}$ be the vectors containing the coordinates of $u_j^l$, $u_j^{l-1}$, $f_j \in M_j$ in the basis $\{\phi_i\}_{i=1}^{N_j}$, namely $u_j^l = \sum_{i=1}^{N_j}\mathbf{u}_i^l\phi_i$. Then, one smoothing step for solving (2) on a single level $j$ in terms of matrices reads
$$\mathbf{u}^l = \mathbf{u}^{l-1} + \mathbf{R}_M(\mathbf{M}\mathbf{f} - \mathbf{A}_S\mathbf{u}^{l-1}) \tag{5}$$
where $\mathbf{M}$ is the mass matrix, with $\mathbf{M}_{i,k} = (\phi_i, \phi_k)$.
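As a concrete illustration of (5), a single smoothing step might be written as follows; the dense storage and the function name are assumptions made only for the sketch.

```python
import numpy as np

def smoothing_step(A_S, M, u, f, method="gauss-seidel"):
    """One step of (5): u <- u + R_M (M f - A_S u)."""
    r = M @ f - A_S @ u
    if method == "jacobi":                       # R_M = D^{-1}
        return u + r / np.diag(A_S)
    # Gauss-Seidel: R_M = (D - L)^{-1}, applied via a lower triangular solve
    return u + np.linalg.solve(np.tril(A_S), r)
```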
Lemma 2.6
Let $\mathbf{R}$ be the matrix representation of the smoother $R_j$ in Equation (3). Then,
$$\mathbf{R} = \mathbf{R}_M\mathbf{M}$$
Hence,
$$R_j(\phi_k) = \sum_{i=1}^{N_j}\mathbf{R}_{i,k}\,\phi_i = \sum_{i=1}^{N_j}(\mathbf{R}_M\mathbf{M})_{i,k}\,\phi_i$$
and
$$\mathbf{u}^l = \mathbf{u}^{l-1} + \mathbf{R}_M(\mathbf{M}\mathbf{f} - \mathbf{A}_S\mathbf{u}^{l-1}) = \mathbf{u}^{l-1} + \mathbf{R}(\mathbf{f} - \mathbf{M}^{-1}\mathbf{A}_S\mathbf{u}^{l-1})$$
Proof
Denote by $\mathbf{A}$ the matrix representation of the operator $A$. Note that
$$(A\phi_i, \phi_k) = \Big(\sum_{m=1}^{N_j}\mathbf{A}_{m,i}\phi_m,\ \phi_k\Big) = (\nabla\phi_k, \nabla\phi_i) = (\mathbf{A}_S)_{k,i}$$
indicates $\mathbf{A}_S = \mathbf{M}\mathbf{A}$. Moreover, in terms of matrices and vectors, Equation (3) also reads
$$\sum_{i=1}^{N_j}\mathbf{u}_i^l\phi_i = \sum_{i=1}^{N_j}\mathbf{u}_i^{l-1}\phi_i + \sum_{k=1}^{N_j}\sum_{i=1}^{N_j}\mathbf{R}_{k,i}\mathbf{f}_i\phi_k - \sum_{m=1}^{N_j}\sum_{k=1}^{N_j}\sum_{i=1}^{N_j}\mathbf{R}_{m,k}\mathbf{A}_{k,i}\mathbf{u}_i^{l-1}\phi_m$$
Then, taking the inner product with $\phi_n$, $1 \le n \le N_j$, on both sides leads to
$$\mathbf{M}\mathbf{u}^l = \mathbf{M}\mathbf{u}^{l-1} + \mathbf{M}\mathbf{R}\mathbf{f} - \mathbf{M}\mathbf{R}\mathbf{A}\mathbf{u}^{l-1}$$
Multiplication by $\mathbf{M}^{-1}$ gives
$$\mathbf{u}^l = \mathbf{u}^{l-1} + \mathbf{R}(\mathbf{f} - \mathbf{A}\mathbf{u}^{l-1})$$
Taking into account that Equations (3) and (5) represent the same iteration, we have
$$\mathbf{R}\mathbf{f} = \mathbf{R}_M\mathbf{M}\mathbf{f}$$
Note that the above equation holds for any $\mathbf{f} \in \mathbb{R}^{N_j}$. Therefore, $\mathbf{R} = \mathbf{R}_M\mathbf{M}$, which completes the proof. □
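Lemma 2.6 is easy to confirm numerically: with 1D linear elements on a uniform grid (a setting chosen here purely for illustration), the coordinate iteration (5) coincides with the operator iteration written through $\mathbf{R} = \mathbf{R}_M\mathbf{M}$ and $\mathbf{A} = \mathbf{M}^{-1}\mathbf{A}_S$.

```python
import numpy as np

n, h = 6, 1.0 / 7                        # interior nodes of a uniform 1D grid
tri = lambda a, b: a * np.eye(n) + b * (np.eye(n, k=1) + np.eye(n, k=-1))
A_S = tri(2.0, -1.0) / h                 # P1 stiffness matrix
M   = tri(4.0,  1.0) * h / 6             # P1 mass matrix

R_M = np.diag(1.0 / np.diag(A_S))        # Jacobi in matrix form
R   = R_M @ M                            # matrix representation of R_j (Lemma 2.6)
A   = np.linalg.solve(M, A_S)            # matrix representation of A_j

rng = np.random.default_rng(1)
u, f = rng.random(n), rng.random(n)
u_5 = u + R_M @ (M @ f - A_S @ u)        # coordinate form (5)
u_3 = u + R @ (f - A @ u)                # operator form (3)
assert np.allclose(u_5, u_3)
```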


3. UNIFORM CONVERGENCE OF THE MG METHOD ON GRADED MESHES

Next, we derive an estimate for the constant $c_0$ in (4) of Section 2 and then proceed to establish the main convergence theorem of the paper. We begin by proving several lemmas that are needed in the convergence proof. For simplicity, we assume that there is only a single point $S_0 \in \bar\Omega$ for which the solution of Equation (1) has a singularity in $H^2(\Omega)$, and that a nested sequence of graded meshes has been constructed, as described in Definition 2.2. The same argument, however, carries over to problems on domains with multiple singularities and also to similar refinement strategies.
Denote by $\{T_i^{S_0}\}$ all the initial triangles with the common vertex $S_0$. Recall that the function $\vartheta$ in the weight equals the distance to $S_0$ on these triangles. Based on the process in Definition 2.2, after $N$ refinements, the region $\cup T_i^{S_0}$ is partitioned into $N+1$ sub-domains (layers) $D_n$, $0 \le n \le N$, whose sizes decrease by the factor $\kappa$ as they approach $S_0$ (see Figure 2). In addition, $\vartheta(x,y) \simeq \kappa^n$ on $D_n$ for $0 \le n < N$ and $\vartheta(x,y) \le C\kappa^N$ on $D_N$. Meanwhile, sub-triangles (nested meshes) are generated in these layers $D_n$, $0 \le n \le N$, with corresponding mesh size of order $O(\kappa^n 2^{n-N})$.
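The scales just described are easy to tabulate; the following sketch prints the layer diameters $\kappa^n$ and interior mesh sizes $\kappa^n 2^{n-N}$ (the value $\kappa = 0.2$ matches Figure 2 and is otherwise an arbitrary choice of the example).

```python
kappa, N = 0.2, 4                        # grading factor and number of refinements
for n in range(N + 1):
    print(f"D_{n}: diameter ~ {kappa**n:.4g}, "
          f"mesh size ~ {kappa**n * 2**(n - N):.4g}")
```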


Figure 2. Initial triangles with vertex $S_0$ (left); layers $D_0$ and $D_1$ after one refinement (right), $\kappa = 0.2$.

Note that $\Omega = (\cup D_n)\cup(\Omega\setminus\cup D_n)$. Let $\partial D_n$ be the boundary of $D_n$. Then, we define a piecewise constant function $r_p(x,y)$ on $\bar\Omega$ as follows:
$$r_p(x,y) = \begin{cases} (1/2)^n & \text{on } \bar D_n\setminus\partial D_{n-1} \text{ for } 1 \le n \le N\\ 1 & \text{otherwise}\end{cases}$$
where $N = J$ is the number of refinements for $\mathcal{T}_J$. Therefore, the restriction of $r_p$ to every $T_i^{S_0}\cap D_n$ is a constant. Recall that $\kappa < 1$ is the parameter for the graded refinement, chosen such that $\kappa = 2^{-1/(a-1)}$, where $a$ is the index of the weight. Define the weighted inner product with respect to $r_p$:
$$(u,v)_{r_p} = (r_pu,\ r_pv) = \int_\Omega r_p^2\,uv$$
In addition, the above inner product induces the norm
$$\|u\|_{r_p} = (u,u)_{r_p}^{1/2}$$
Then, the following estimate holds.
Lemma 3.1
$$(u_j - P_{j-1}u_j,\ u_j - P_{j-1}u_j)_{r_p} \le \frac{c_1}{N_j}\,a(u_j - P_{j-1}u_j,\ u_j - P_{j-1}u_j) \quad \forall u_j \in M_j$$
where $N_j = O(2^{2j})$ is the dimension of $M_j$.
Proof
This lemma can be proved by a duality argument as follows. Consider the following boundary value problem:
$$-\Delta w = r_p^2(u_j - P_{j-1}u_j) \quad \text{in } \Omega$$
$$w = 0 \ \text{ on } \partial_D\Omega, \qquad \partial w/\partial n = 0 \ \text{ on } \partial_N\Omega$$


Then, since $P_{j-1}w \in M_{j-1}$, from the equation above we have
$$(r_p(u_j - P_{j-1}u_j),\ r_p(u_j - P_{j-1}u_j)) = (r_p^2(u_j - P_{j-1}u_j),\ u_j - P_{j-1}u_j) = (\nabla w,\ \nabla(u_j - P_{j-1}u_j)) = (\nabla(w - P_{j-1}w),\ \nabla(u_j - P_{j-1}u_j))$$
We note that $P_{j-1}w$ is a piecewise linear function on the graded triangulation $\mathcal{T}_{j-1}$ obtained after $j-1$ refinements. From the results of Theorem 2.3, we conclude
$$|w - P_{j-1}w|^2_{H^1(\Omega)} \le \frac{C_1}{N_{j-1}}\|\Delta w\|^2_{\mathcal{K}^0_{a-1}(\Omega)} = \frac{C_1}{N_{j-1}}\Big(\sum_{n=0}^{j}\|\vartheta^{1-a}\Delta w\|^2_{L^2(D_n)} + \|\vartheta^{1-a}\Delta w\|^2_{L^2(\Omega\setminus\cup D_n)}\Big)$$
$$\le \frac{C}{N_{j-1}}\Big(\sum_{n=0}^{j}\kappa^{2n(1-a)}\|\Delta w\|^2_{L^2(D_n)} + \|\Delta w\|^2_{L^2(\Omega\setminus\cup D_n)}\Big) = \frac{C}{N_{j-1}}\Big(\sum_{n=0}^{j}2^{2n}\|\Delta w\|^2_{L^2(D_n)} + \|\Delta w\|^2_{L^2(\Omega\setminus\cup D_n)}\Big)$$
$$= \frac{C}{N_{j-1}}\Big(\sum_{n=0}^{j}\|r_p^{-1}\Delta w\|^2_{L^2(D_n)} + \|r_p^{-1}\Delta w\|^2_{L^2(\Omega\setminus\cup D_n)}\Big) = \frac{C}{N_{j-1}}\|r_p^{-1}\Delta w\|^2_{L^2(\Omega)}$$
The inequalities above are based on the definitions of $\vartheta$, $r_p$, and the related norms. Now, observe that $\|r_p^{-1}\Delta w\|^2_{L^2(\Omega)} = \|r_p(u_j - P_{j-1}u_j)\|^2_{L^2(\Omega)} = \|u_j - P_{j-1}u_j\|^2_{r_p}$. Since $N_j = O(N_{j-1})$, combining the results above with the Cauchy–Schwarz inequality applied to the first identity, we have
$$\|u_j - P_{j-1}u_j\|^2_{r_p} \le \frac{|w - P_{j-1}w|^2_{H^1}\,|u_j - P_{j-1}u_j|^2_{H^1}}{\|u_j - P_{j-1}u_j\|^2_{r_p}} = \frac{|w - P_{j-1}w|^2_{H^1}\,|u_j - P_{j-1}u_j|^2_{H^1}}{\|r_p^{-1}\Delta w\|^2_{L^2}} \le \frac{c_1}{N_j}|u_j - P_{j-1}u_j|^2_{H^1} = \frac{c_1}{N_j}a(u_j - P_{j-1}u_j,\ u_j - P_{j-1}u_j)$$
which completes the proof. □

Recall from Lemma 2.6 that the matrix form $\mathbf{R}_M$ and the matrix representation $\mathbf{R}$ of a smoother $R_j$ are different. We then have the following result regarding the smoother $\bar R_j = R_j + R_j^t - R_j^tA_jR_j$ on $M_j$, which is the symmetrization of $R_j$, where $R_j^t$ is the adjoint of $R_j$ with respect to $(\cdot,\cdot)$.


Lemma 3.2
For the subspace smoother $\bar R_j: M_j \to M_j$, we assume that there is a constant $C > 0$, independent of $j$, such that the corresponding matrix form $\bar{\mathbf{R}}_M$ satisfies
$$\mathbf{v}^T\bar{\mathbf{R}}_M\mathbf{v} \ge C\,\mathbf{v}^T\mathbf{v} \quad \forall\mathbf{v}\in\mathbb{R}^{N_j}$$
on every level $j$, where $N_j$ is the dimension of the subspace $M_j$. Then, there exists $c_2 > 0$, also independent of the level $j$, such that the following estimate holds on each graded mesh $\mathcal{T}_j$:
$$\frac{c_2}{N_j}(\bar R_jv,\ v) \le (\bar R_jv,\ \bar R_jv)_{r_p} \quad \forall v\in M_j$$
Proof
For any $v = \sum_i\mathbf{v}_i\phi_i\in M_j$, from Lemma 2.6 we have
$$(\bar R_jv,\ v) = \Big(\sum_m\sum_k\mathbf{v}_m(\bar{\mathbf{R}}_M\mathbf{M})_{k,m}\phi_k,\ \sum_i\mathbf{v}_i\phi_i\Big) = \mathbf{v}^T\mathbf{M}^T\bar{\mathbf{R}}_M\mathbf{M}\mathbf{v}$$
On the other hand,
$$(\bar R_jv,\ \bar R_jv)_{r_p} = \Big(\sum_m\sum_k\mathbf{v}_m(\bar{\mathbf{R}}_M\mathbf{M})_{k,m}\phi_k,\ \sum_l\sum_i\mathbf{v}_l(\bar{\mathbf{R}}_M\mathbf{M})_{i,l}\phi_i\Big)_{r_p} = \mathbf{v}^T\mathbf{M}^T\bar{\mathbf{R}}_M\tilde{\mathbf{M}}\bar{\mathbf{R}}_M\mathbf{M}\mathbf{v}$$
where $\tilde{\mathbf{M}}$ is the matrix satisfying $(\tilde{\mathbf{M}})_{i,k} = (r_p\phi_i,\ r_p\phi_k)$. Note that both $\mathbf{M}$ and $\tilde{\mathbf{M}}$ are symmetric positive definite (SPD). Now, suppose $\mathrm{supp}(\phi_i)\cap D_n \ne \emptyset$, $0 \le n \le j$. Then, on $\mathrm{supp}(\phi_i)$, the mesh size is $O(\kappa^n2^{n-j})$ and $r_p \simeq (1/2)^n$, respectively, since $\mathrm{supp}(\phi_i)$ is covered by at most two adjacent layers. Thus, all the non-zero elements of $\tilde{\mathbf{M}}$ are positive and $\tilde{\mathbf{M}} \simeq 2^{-2j} \simeq 1/N_j$. To complete the proof, it is sufficient to show that there exists $C > 0$ such that
$$\mathbf{w}^T\bar{\mathbf{R}}_M^{1/2}\tilde{\mathbf{M}}\bar{\mathbf{R}}_M^{1/2}\mathbf{w} \ge (C/N_j)\,\mathbf{w}^T\mathbf{w}$$
where $\mathbf{w} = \bar{\mathbf{R}}_M^{1/2}\mathbf{M}\mathbf{v}$. From the condition on $\bar{\mathbf{R}}_M$ and the estimates on $\tilde{\mathbf{M}}$, it follows that
$$\mathbf{w}^T\bar{\mathbf{R}}_M^{1/2}\tilde{\mathbf{M}}\bar{\mathbf{R}}_M^{1/2}\mathbf{w} \simeq (1/N_j)\,\mathbf{w}^T\bar{\mathbf{R}}_M\mathbf{w} \ge (C/N_j)\,\mathbf{w}^T\mathbf{w} \qquad \square$$

Remark 3.3
For our choice of graded meshes, the triangles remain shape-regular, that is, the minimum angles of the triangles are bounded away from 0. Therefore, the stiffness matrix $\mathbf{A}_S$ has a bounded number of non-zero entries per row and each entry is of order $O(1)$. Hence, the maximum eigenvalue of $\mathbf{A}_S$ is bounded. For this reason, standard smoothers (Richardson, weighted Jacobi, Gauss–Seidel, etc.) satisfy Lemma 3.2, and $(\mathbf{R}_M)_{i,j} = O(1)$ as well, since they are all formed from parts of the matrix $\mathbf{A}_S$. Moreover, if $\mathbf{R}_M$ is SPD and the spectral radius $\rho(\mathbf{R}_M\mathbf{A}_S) \le \omega$ for some $0 < \omega < 1$, then, based on Lemma 2.6,
$$a(R_jA_jv,\ v) = (A_jR_jA_jv,\ v) = \mathbf{v}^T\mathbf{A}_S\mathbf{R}_M\mathbf{A}_S\mathbf{v} \le \omega\,a(v,v)$$
The last inequality follows from the similarity of the matrices $\mathbf{A}_S^{1/2}\mathbf{R}_M\mathbf{A}_S^{1/2}$ and $\mathbf{R}_M\mathbf{A}_S$. Note that the above inequality implies that the spectral radius of $R_jA_j$ is at most $\omega$, since $R_jA_j$ is symmetric with respect to $a(\cdot,\cdot)$.

We then define the following operators for the MG $V$-cycle. Recall $T_j$ from Section 2 and let $R_j$ denote a subspace smoother satisfying Lemma 3.2. Recall the symmetrization $\bar R_j$ of $R_j$, and assume the spectral radius $\rho(\bar R_jA_j) \le \omega$ for some $0 < \omega < 1$. Note that $R_j^t$ is the adjoint of $R_j$ with respect to $(\cdot,\cdot)$ and $T_j^*$ is the adjoint of $T_j$ with respect to $a(\cdot,\cdot)$. With $n$ smoothing steps, in which $R_j$ and $R_j^t$ are applied alternately, the operators $G_j$ and $G_j^*$ are defined as follows:
$$G_j = I - R_jA_j, \qquad G_j^* = I - R_j^tA_j$$
With this choice,
$$T_j = \begin{cases} P_j - (G_j^*G_j)^{n/2}P_j & \text{for even } n\\ P_j - G_j(G_j^*G_j)^{(n-1)/2}P_j & \text{for odd } n\end{cases}$$
Therefore, if we define
$$G_{j,n} = \begin{cases} G_j^*G_j & \text{for even } n\\ G_jG_j^* & \text{for odd } n\end{cases}$$
then, since $P_j^2 = P_j$,
$$\bar T_j = T_j + T_j^* - T_j^*T_j = (I - G_{j,n}^n)P_j$$
Note that $\bar T_j$ is invertible on $M_j$, and hence $\bar T_j^{-1}$ exists.
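In matrix form, the relation underlying this construction, $I - \bar R_jA_j = G_j^*G_j$ (one application each of $R_j$ and $R_j^t$), can be verified numerically. The matrices below are matrix-form analogues of the operators, under the same illustrative model setup as before.

```python
import numpy as np

n = 30
A_S = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
R_M = np.linalg.inv(np.tril(A_S))                 # Gauss-Seidel smoother

G      = np.eye(n) - R_M @ A_S                    # matrix form of G_j
G_star = np.eye(n) - R_M.T @ A_S                  # matrix form of G_j^*
R_bar  = R_M + R_M.T - R_M.T @ A_S @ R_M          # symmetrization of R_M

assert np.allclose(np.eye(n) - R_bar @ A_S, G_star @ G)
```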


The main result concerning the uniform convergence of the MG V -cycle for our model problem
is summarized in the following theorem.

Theorem 3.4
On every triangulation $\mathcal{T}_j$, suppose that the smoother on each subspace $M_j$ satisfies Lemma 3.2. Then, following the algorithm described above, we have
$$\|I - B_JA_J\|_a^2 = \frac{c_0}{1 + c_0} \le \frac{c_1}{c_1 + c_2n}$$
where $c_1$ and $c_2$ are the constants from Lemmas 3.1 and 3.2.

Proof
Recall (4) from Section 2. To estimate the constant $c_0$, we first consider the decomposition $v = \sum_j v_j$ for any $v\in M_J$, with
$$v_j = (P_j - P_{j-1})v \in M_j$$
Then, Lemma 3.1 implies
$$N_j(v_j,\ v_j)_{r_p} \le c_1\,a(v_j,\ v_j)$$

Using the identity of Xu and Zikatanov [25], we have
$$a(\bar T_j^{-1}(I - \bar T_j)v_j,\ v_j) = a((I - G_{j,n}^n)^{-1}G_{j,n}^nv_j,\ v_j) = (\bar R_j^{-1}\bar R_jA_j(I - G_{j,n}^n)^{-1}G_{j,n}^nv_j,\ v_j) = (\bar R_j^{-1}(I - G_{j,n})(I - G_{j,n}^n)^{-1}G_{j,n}^nv_j,\ v_j)$$
Note that $G_{j,n}^k$, $k \le n$, is in fact a polynomial of $\bar R_jA_j$. Therefore, $\bar R_j^{-1/2}(I - G_{j,n})\bar R_j^{1/2}$, $\bar R_j^{-1/2}G_{j,n}^n\bar R_j^{1/2}$, and $\bar R_j^{-1/2}(I - G_{j,n}^n)\bar R_j^{1/2}$ are all polynomials of $\bar R_j^{1/2}A_j\bar R_j^{1/2}$, where $\bar R_j^{-1/2}(I - G_{j,n}^n)^{-1}\bar R_j^{1/2} = (\bar R_j^{-1/2}(I - G_{j,n}^n)\bar R_j^{1/2})^{-1}$. Thus, it can be seen that $\bar R_j^{-1/2}(I - G_{j,n})\bar R_j^{1/2}$, $\bar R_j^{-1/2}G_{j,n}^n\bar R_j^{1/2}$, and $\bar R_j^{-1/2}(I - G_{j,n}^n)^{-1}\bar R_j^{1/2}$ commute with each other; hence, $\bar R_j^{-1/2}(I - G_{j,n})(I - G_{j,n}^n)^{-1}G_{j,n}^n\bar R_j^{1/2}$ is symmetric with respect to $(\cdot,\cdot)$.
Then, based on the above argument, defining $w_j = \bar R_j^{-1/2}v_j$, we have
$$a(\bar T_j^{-1}(I - \bar T_j)v_j,\ v_j) = (\bar R_j^{-1/2}(I - G_{j,n})(I - G_{j,n}^n)^{-1}G_{j,n}^n\bar R_j^{1/2}w_j,\ w_j) \le \max_{t\in[0,1]}\big[(1-t)(1-t^n)^{-1}t^n\big]\,(\bar R_j^{-1}v_j,\ v_j) \le \frac{1}{n}(\bar R_j^{-1}v_j,\ v_j) \le \frac{N_j}{c_2n}(v_j,\ v_j)_{r_p}$$
where the last inequality is from Lemma 3.2. Moreover,
$$\sum_{j=0}^{J}a(\bar T_j^{-1}(I - \bar T_j)v_j,\ v_j) \le \sum_{j=0}^{J}\frac{N_j}{c_2n}(v_j,\ v_j)_{r_p} \le \sum_{j=0}^{J}\frac{c_1}{c_2n}a(v_j,\ v_j) = \frac{c_1}{c_2n}a(v,\ v)$$
Therefore, $c_0 \le c_1/(c_2n)$ and, consequently, the MSC yields the following convergence estimate for the MG $V$-cycle:
$$\|I - B_JA_J\|_a^2 = \frac{c_0}{1 + c_0} \le \frac{c_1}{c_1 + c_2n}$$
which completes the proof. □
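The elementary bound used in the proof, $\max_{t\in[0,1]}(1-t)(1-t^n)^{-1}t^n \le 1/n$, follows from $1 - t^n = (1-t)(1 + t + \cdots + t^{n-1}) \ge n(1-t)t^{n-1}$; a quick numerical confirmation:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100001)[:-1]   # [0, 1); the value at t = 1 is the limit 1/n
for n in (1, 2, 4, 8, 16):
    assert ((1 - t) * t**n / (1 - t**n)).max() <= 1.0 / n + 1e-12
```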

4. NUMERICAL ILLUSTRATION

This section contains numerical results for the proposed MG $V$-cycle applied to the 2D Poisson equation with a single corner-like singularity. The model test problem we consider here is given by
$$-\Delta u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega \tag{6}$$
where the singularity occurs at the tip of the crack $\{(x,y):\ 0 \le x \le 0.5,\ y = 0.5\}$ for $\Omega = (0,1)\times(0,1)$, as in Figure 3.

The MG scheme used to solve (6) is a standard MG $V$-cycle with linear interpolation. The sequence of coarse-level problems defining the MG hierarchy is obtained by re-discretizing (6) on the nested meshes constructed using the GMR strategy described in Section 2. The reported results are for $V(1,1)$-cycles with Gauss–Seidel (GS) as the smoother. The asymptotic convergence factors are computed using 100 $V(1,1)$-cycles applied to the homogeneous problem, starting with an $O(1)$ random initial approximation.
The asymptotic convergence factors reported in Table I clearly support our theoretical estimates in that they are independent of the number of refinement levels. To obtain a more complete picture of the overall effectiveness of our MG solver, we also examine storage and work-per-cycle measures. These are usually expressed in terms of operator complexity, defined as the number of non-zero entries stored in the operators on all levels divided by the number of non-zero entries in the finest-level matrix, and grid complexity, defined as the sum of the dimensions of the operators over all levels divided by the dimension of the finest-level operator. The grid and, especially, the operator complexities can be viewed as proportionality constants that indicate how expensive the entire $V$-cycle is compared with performing only the finest-level relaxations of the $V$-cycle. For our test problem, the grid and operator complexities were 1.2 and 1.3, respectively, independent of the number of levels. Considering the low grid and operator complexities, the performance of the resulting MG solver applied to problem (6) is comparable to that of standard geometric MG applied to the Poisson equation with full regularity, i.e. without corner-like singularities; for the Poisson equation discretized on uniformly refined grids, standard MG with a GS smoother and linear interpolation yields $\rho_{MG} \approx 0.35$.
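The measurement procedure described above is straightforward to script. The sketch below assumes a function v_cycle(u, f) implementing one $V(1,1)$-cycle on the finest level (hypothetical; the paper does not provide code) and estimates $\rho_{MG}$ from the homogeneous problem, for which the iterate itself is the error.

```python
import numpy as np

def estimate_rho(v_cycle, n_dofs, n_cycles=100, seed=0):
    """Estimate the asymptotic convergence factor rho_MG."""
    rng = np.random.default_rng(seed)
    u = rng.random(n_dofs)               # O(1) random initial approximation
    f = np.zeros(n_dofs)                 # homogeneous problem: exact solution is 0
    rho = 0.0
    for _ in range(n_cycles):
        e_old = np.linalg.norm(u)
        u = v_cycle(u, f)
        rho = np.linalg.norm(u) / e_old  # late-iteration ratios approach rho_MG
    return rho
```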

Figure 3. Crack: initial triangulation (left) and the triangulation after one refinement (right), $\kappa = 0.2$.

Table I. Asymptotic convergence factors ($\rho_{MG}$) for the MG $V(1,1)$-cycle applied to problem (6) with the Gauss–Seidel smoother.

levels              2      3      4      5      6
$\rho_{MG}$ (GS)    0.40   0.53   0.56   0.53   0.50


ACKNOWLEDGEMENTS
We would like to thank Long Chen, Victor Nistor and Jinchao Xu for their useful suggestions and
discussions during the preparation of this manuscript.
The work of the second author was supported in part by NSF (DMS-0555831). The work of the first
and the third author was supported in part by the NSF (DMS-058110) and Lawrence Livermore National
Lab (B568399).

REFERENCES
1. Bramble JH. Multigrid Methods. Chapman & Hall, CRC Press: London, Boca Raton, FL, 1993.
2. Hackbusch W. Multi-Grid Methods and Applications. Computational Mathematics. Springer: New York, 1995.
3. Trottenberg U, Oosterlee CW, Schüller A. Multigrid. Academic Press: San Diego, CA, 2001 (With contributions
by A. Brandt, P. Oswald, K. Stüben).
4. Xu J. Iterative methods by space decomposition and subspace correction. SIAM Review 1992; 34(4):581–613.
5. Babuška I, Aziz AK. The Mathematical Foundations of the Finite Element Method with Applications to Partial
Differential Equations. Academic Press: New York, 1972.
6. Băcuţă C, Nistor V, Zikatanov LT. Improving the rate of convergence of ‘high order finite elements’ on polygons
and domains with cusps. Numerische Mathematik 2005; 100(2):165–184.
7. Bourlard M, Dauge M, Lubuma MS, Nicaise S. Coefficients of the singularities for elliptic boundary value
problems on domains with conical points. III. Finite element methods on polygonal domains. SIAM Journal on
Numerical Analysis 1992; 29(1):136–155.
8. Dauge M. Elliptic Boundary Value Problems on Corner Domains. Lecture Notes in Mathematics, vol. 1341.
Springer: Berlin, 1988.
9. Grisvard P. Singularities in Boundary Value Problems. Research Notes in Applied Mathematics, vol. 22. Springer:
New York, 1992.
10. Kellogg RB, Osborn JE. A regularity result for the Stokes problem in a convex polygon. Journal of Functional
Analysis 1976; 21(4):397–431.
11. Kondratiev VA. Boundary value problems for elliptic equations in domains with conical or angular points.
Transactions of the Moscow Mathematical Society 1967; 16:227–313.
12. Kozlov VA, Mazya V, Rossmann J. Elliptic Boundary Value Problems in Domains with Point Singularities.
American Mathematical Society: Rhode Island, 1997.
13. Apel T, Sändig A, Whiteman JR. Graded mesh refinement and error estimates for finite element solutions of
elliptic boundary value problems in non-smooth domains. Mathematical Methods in the Applied Sciences 1996;
19(1):63–85.
14. Babuška I, Kellogg RB, Pitkäranta J. Direct and inverse error estimates for finite elements with mesh refinements.
Numerische Mathematik 1979; 33(4):447–471.
15. Li H, Mazzucato A, Nistor V. On the analysis of the finite element method on general polygonal domains II:
mesh refinements and interpolation estimates. 2007, in preparation.
16. Braess D, Hackbusch W. A new convergence proof for the multigrid method including the V -cycle. SIAM Journal
on Numerical Analysis 1983; 20(5):967–975.
17. Brenner SC. Convergence of the multigrid V -cycle algorithm for second-order boundary value problems without
full elliptic regularity. Mathematics of Computation 2002; 71(238):507–525 (electronic).
18. Bramble JH, Pasciak JE, Wang JP, Xu J. Convergence estimates for multigrid algorithms without regularity
assumptions. Mathematics of Computation 1991; 57(195):23–45.
19. Yserentant H. The convergence of multilevel methods for solving finite-element equations in the presence of
singularities. Mathematics of Computation 1986; 47(176):399–409.
20. Brandt A, McCormick S, Ruge J. Algebraic multigrid (AMG) for sparse matrix equations. Sparsity and its
Applications (Loughborough, 1983). Cambridge University Press: Cambridge, 1985; 257–284.
21. Vassilevski P. Multilevel Block Factorization Preconditioners. Springer: Berlin, 2008.
22. Ciarlet P. The Finite Element Method for Elliptic Problems. Studies in Mathematics and its Applications, vol. 4.
North-Holland: Amsterdam, 1978.
23. Li H, Mazzucato A, Nistor V. On the analysis of the finite element method on general polygonal domains I:
transmission problems and a priori estimates. CCMA Preprint AM319, 2007.

24. Xu J. An introduction to multigrid convergence theory. Iterative Methods in Scientific Computing, Hong Kong,
1995. Springer: Singapore, 1997; 169–241.
25. Xu J, Zikatanov L. The method of alternating projections and the method of subspace corrections in Hilbert
space. Journal of the American Mathematical Society 2002; 15(3):573–597 (electronic).
26. Adams R. Sobolev Spaces. Pure and Applied Mathematics, vol. 65. Academic Press: New York, London, 1975.
27. Ammann B, Nistor V. Weighted Sobolev spaces and regularity for polyhedral domains. Preprint, 2005.
28. Apel T, Schöberl J. Multigrid methods for anisotropic edge refinement. SIAM Journal on Numerical Analysis
2002; 40(5):1993–2006 (electronic).
29. Băcuţă C, Nistor V, Zikatanov LT. Regularity and well posedness for the Laplace operator on polyhedral domains.
IMA Preprint, 2004.
30. Bramble JH, Pasciak JE. New convergence estimates for multigrid algorithms. Mathematics of Computation 1987;
49(180):311–329.
31. Bramble JH, Xu J. Some estimates for a weighted $L^2$ projection. Mathematics of Computation 1991; 56(194):463–476.
32. Bramble JH, Zhang X. Uniform convergence of the multigrid V -cycle for an anisotropic problem. Mathematics
of Computation 2001; 70(234):453–470.
33. Brenner S, Scott LR. The Mathematical Theory of Finite Element Methods. Texts in Applied Mathematics,
vol. 15. Springer: New York, 1994.
34. Brenner SC. Multigrid methods for the computation of singular solutions and stress intensity factors. I. Corner
singularities. Mathematics of Computation 1999; 68(226):559–583.
35. Brenner SC, Sung L. Multigrid methods for the computation of singular solutions and stress intensity factors. II.
Crack singularities. BIT 1997; 37(3):623–643 (Direct methods, linear algebra in optimization, iterative methods,
Toulouse, 1995/1996).
36. Brenner SC, Sung L. Multigrid methods for the computation of singular solutions and stress intensity factors. III.
Interface singularities. Computer Methods in Applied Mechanics and Engineering 2003; 192(41–42):4687–4702.
37. Wu H, Chen Z. Uniform convergence of multigrid V-cycle on adaptively refined finite element meshes for second order elliptic problems. Science in China 2006; 49:1405–1429.
38. Yosida K. Functional Analysis (5th edn). A Series of Comprehensive Studies in Mathematics, vol. 123. Springer:
New York, 1978.
39. Yserentant H. On the convergence of multilevel methods for strongly nonuniform families of grids and any
number of smoothing steps per level. Computing 1983; 30(4):305–313.
40. Yserentant H. Old and new convergence proofs for multigrid methods. Acta Numerica, 1993. Cambridge University
Press: Cambridge, 1993; 285–326.
