
GEOPHYSICS, VOL. 66, NO. 1 (JANUARY-FEBRUARY 2001); P. 174–187, 7 FIGS., 4 TABLES.

Nonlinear conjugate gradients algorithm for 2-D magnetotelluric inversion

William Rodi and Randall L. Mackie

ABSTRACT

We investigate a new algorithm for computing regularized solutions of the 2-D magnetotelluric inverse problem. The algorithm employs a nonlinear conjugate gradients (NLCG) scheme to minimize an objective function that penalizes data residuals and second spatial derivatives of resistivity. We compare this algorithm theoretically and numerically to two previous algorithms for constructing such minimum-structure models: the Gauss-Newton method, which solves a sequence of linearized inverse problems and has been the standard approach to nonlinear inversion in geophysics, and an algorithm due to Mackie and Madden, which solves a sequence of linearized inverse problems incompletely using a (linear) conjugate gradients technique. Numerical experiments involving synthetic and field data indicate that the two algorithms based on conjugate gradients (NLCG and Mackie-Madden) are more efficient than the Gauss-Newton algorithm in terms of both computer memory requirements and CPU time needed to find accurate solutions to problems of realistic size. This owes largely to the fact that the conjugate gradients-based algorithms avoid two computationally intensive tasks that are performed at each step of a Gauss-Newton iteration: calculation of the full Jacobian matrix of the forward modeling operator, and complete solution of a linear system on the model space. The numerical tests also show that the Mackie-Madden algorithm reduces the objective function more quickly than our new NLCG algorithm in the early stages of minimization, but NLCG is more effective in the later computations. To help understand these results, we describe the Mackie-Madden and new NLCG algorithms in detail and couch each as a special case of a more general conjugate gradients scheme for nonlinear inversion.

INTRODUCTION

The standard approach to solving nonlinear inverse problems in geophysics has been iterated, linearized inversion. That is, the forward function (for predicting error-free data) is approximated with its first-order Taylor expansion about some reference model; a solution of the resulting linear inverse problem is computed; the solution is then taken as a new reference model, and the process is repeated. Such schemes are generally some form of Newton's method (typically Gauss-Newton or Levenberg-Marquardt). When run to convergence, they minimize an objective function over the space of models and, in this sense, produce an optimal solution of the nonlinear inverse problem. Most inversion algorithms for magnetotelluric (MT) data have been iterated, linearized methods. For 1-D earth models, these include the algorithms of Wu (1968) and Jupp and Vozoff (1975), which obtain nonlinear least-squares solutions, and those of Smith and Booker (1988) and Constable et al. (1987), which find nonlinear least-squares solutions subject to a smoothness constraint (regularized solutions). Jupp and Vozoff extended their algorithm to the case of 2-D models (Jupp and Vozoff, 1977), and algorithms for finding regularized solutions of the 2-D MT problem have been presented by Jiracek et al. (1987), Madden and Mackie (1989), Rodi (1989), deGroot-Hedlin and Constable (1990), and Smith and Booker (1991). Mackie and Madden (1993) implemented an iterated, linearized inversion algorithm for 3-D MT data, as did Newman (1995) and Newman and Alumbaugh (1997) for the related problem of crosswell electromagnetic data. However, the usefulness of such algorithms in 3-D electromagnetic inverse problems has been hampered by severe computational difficulties, which we now discuss.

Manuscript received by the Editor August 11, 1998; revised manuscript received June 6, 2000.
Massachusetts Institute of Technology, Earth Resources Laboratory, E34-458, Cambridge, Massachusetts 02139. E-mail: rodi@mit.edu.
GSY-USA, Inc., PMB #643, 2261 Market Street, San Francisco, California 94114-1600. E-mail: randy@gsy-usa.com.
© 2001 Society of Exploration Geophysicists. All rights reserved.


Compared to global optimization methods like grid search, Monte-Carlo search, and genetic algorithms, inversion methods that make use of the Jacobian (first-order derivative) of the forward function, like those cited in the previous paragraph, generally require the testing of many fewer models to obtain an optimal solution of an inverse problem. This fact is of critical importance in 2-D and 3-D electromagnetic inverse problems, where the forward function entails the numerical solution of Maxwell's equations, and is the reason that iterated, linearized methods have occupied center stage in electromagnetic inversion despite their greater susceptibility to finding locally rather than globally optimal solutions. On the other hand, generation of the Jacobian in these same problems multiplies the computational burden many times over that of evaluating the forward function alone, even when efficient reciprocity techniques (Madden, 1972; Rodi, 1976; McGillivray and Oldenburg, 1990) are exploited. Moreover, iterated, linearized inversion methods, done to prescription, have the additional computational chore of solving a linear system on the model space at each iteration step. These two tasks (generating the Jacobian and performing linear inversion) dominate the computations in 2-D and 3-D MT inversion, where the numbers of data and model parameters are typically in the hundreds or thousands. The computation of optimal solutions to the 2-D MT inverse problem can require several hours of CPU time on a modern workstation, whereas computing optimal solutions of the 3-D problem is impractical on the computers widely available today.

This computational challenge has motivated various algorithmic shortcuts in 2-D and 3-D MT inversion. One approach has been to approximate the Jacobian based on electromagnetic fields computed for homogeneous or 1-D earth models, which has been used in 2-D MT inversion by Smith and Booker (1991) in their rapid relaxation inverse (RRI) and by Farquharson and Oldenburg (1996) for more general 2-D and 3-D electromagnetic problems. Other workers have sought approximate solutions of the linearized inverse problem. In this category is the method of Mackie and Madden (1993), which solves each step of a Gauss-Newton iteration incompletely using a truncated conjugate gradients technique. In addition to bypassing the complete solution of a large linear system, the algorithm avoids computation of the full Jacobian matrix in favor of computing only its action on specific vectors. Although not as fast as RRI, the Mackie-Madden algorithm does not employ approximations to the Jacobian and requires much less computer time and memory than the traditional iterated, linearized inversion methods (as we will demonstrate in this paper). Also in this category is the subspace method, applied by Oldenburg et al. (1993) to dc resistivity inversion and by others to various other geophysical inverse problems. This method reduces the computational burden by solving each linearized inverse problem on a small set of judiciously calculated search directions in the model space.

In their use of incomplete solutions of the linearized inverse problem, the subspace and Mackie-Madden inversion methods depart from the strict schema of iterated, linearized inversion, with an accompanying reduction in the computer resources needed to solve large, nonlinear inverse problems. In this paper we investigate an approach to electromagnetic inversion that is a further departure from the geophysical tradition: nonlinear conjugate gradients (NLCG), or conjugate gradients applied directly to the minimization of the objective function prescribed for the nonlinear inverse problem. The use of conjugate gradients for function minimization is a well-established optimization technique (Fletcher and Reeves, 1959; Polak, 1971) and was suggested for nonlinear geophysical inverse problems by Tarantola (1987). It has been applied to varied geophysical problems, including crosswell traveltime tomography (Matarese and Rodi, 1991; Matarese, 1993), crosswell waveform tomography (Thompson, 1993; Reiter and Rodi, 1996), and dc resistivity (Ellis and Oldenburg, 1994; Shi et al., 1996).

Our investigation compares the numerical performance of three algorithms for 2-D magnetotelluric inversion: a Gauss-Newton algorithm, the Mackie-Madden algorithm, and a new NLCG algorithm. In tests involving synthetic and real data, the algorithms are applied to the minimization of a common objective function so that algorithm efficiency and accuracy can be compared directly. Rather than implement a published NLCG algorithm (e.g., Press et al., 1992), we designed our NLCG algorithm to avoid excessive evaluations of the forward problem and to fully exploit the computational techniques for Jacobian operations used in the Mackie-Madden algorithm. Conversely, we modified the original Mackie-Madden algorithm to include a preconditioner that we developed for NLCG. Given this, we can state two objectives of our study: to demonstrate quantitatively the computational advantages of the two algorithms that use conjugate gradients (Mackie-Madden and NLCG) over a traditional iterated, linearized inversion scheme (Gauss-Newton), and to determine whether the NLCG framework offers improvements over the Mackie-Madden approach as a conjugate gradients technique. Towards the latter end, and as a prelude to future research on the conjugate gradients approach to nonlinear inversion, we describe the Mackie-Madden and our new NLCG algorithms in common terms and in detail in an attempt to isolate the precise differences between them.

PROBLEM FORMULATION

Forward model for 2-D magnetotellurics

As is customary in 2-D magnetotellurics, we model the solid earth as a conductive halfspace, z ≥ 0, underlying a perfectly resistive atmosphere. The electromagnetic source is modeled as a plane current sheet at some height z = −h. Given that the physical parameters of the earth are independent of one cartesian coordinate (x), Maxwell's equations decouple into transverse electric (TE) and transverse magnetic (TM) polarizations. For the purpose of calculating MT data at low frequency, it suffices to solve (see, for example, Swift, 1971)

    ∂²E_x/∂y² + ∂²E_x/∂z² = iωμσE_x                          (1)

    ∂E_x/∂z |_{z=−h} = iωμ                                   (2)

for the TE polarization, and

    ∂/∂y (ρ ∂H_x/∂y) + ∂/∂z (ρ ∂H_x/∂z) = iωμH_x             (3)

    H_x |_{z=0} = 1                                          (4)


for the TM polarization, where E_x (H_x) is the x component of the electric (magnetic induction) field, ω is angular frequency, μ is the magnetic permeability (assumed to be that of free space), σ is the electrical conductivity, and ρ is the inverse of conductivity, or resistivity.

MT data are electric-to-magnetic-field ratios in the frequency domain, which can be expressed as complex apparent resistivities. For the TE polarization, the complex apparent resistivity is defined as

    ρ_app = (i/ωμ) (⟨E_x⟩/⟨H_y⟩)².                           (5)

⟨E_x⟩ denotes the value of E_x at an observation site, which is usually taken to be E_x at a point but, more generally, can be a spatial average of the E_x field. ⟨H_y⟩ is an analogous functional of the H_y field. We note that Maxwell's equations imply

    H_y = (1/iωμ) ∂E_x/∂z.                                   (6)

For the TM polarization, we have

    ρ_app = (i/ωμ) (⟨E_y⟩/⟨H_x⟩)²                            (7)

and

    E_y = ρ ∂H_x/∂z.                                         (8)

We point out that the traditional real apparent resistivity is the modulus of ρ_app.

Numerical modeling

To solve equations (1)–(8) approximately for a broad class of resistivity functions, the inversion algorithms in this paper employ the numerical forward modeling algorithm described by Mackie et al. (1988). In this algorithm, the halfspace z ≥ 0 is segmented into 2-D rectangular blocks of varying dimensions, each having a constant resistivity. Spatially heterogeneous resistivity models ensue from varying the resistivities among the blocks. The blocks abutting and outside a finite region are semi-infinite. Maxwell's equations are approximated by finite-difference equations derived using the transmission-network analog of Madden (1972).

For each polarization and frequency, the finite-difference equations can be expressed as a complex system of linear equations,

    Kv = s.                                                  (9)

In the case of the TE polarization, this linear system represents equations (1) and (2), with the vector v comprising samples of the E_x field on a grid. The complex symmetric matrix K and right-hand-side vector s are functions of frequency and the dimensions and resistivities of the model blocks. For a given observation site, the quantity ⟨E_x⟩ in equation (5) is calculated as a linear combination of the elements of v, representing some sort of linear interpolation and/or averaging of the E_x field. Likewise, ⟨H_y⟩ is calculated as a (different) linear function of v, in this case also representing numerical differentiation in accordance with equation (6). Thus, the complex apparent resistivity for one site is given by the formula

    ρ_app = (i/ωμ) (aᵀv / bᵀv)²                              (10)

where a and b are given vectors. An analogous discussion applies to the TM polarization, with v being a discretization of the H_x field and with different choices of K, s, a, and b.


Inversion method

We can write the inverse problem as

    d = F(m) + e

where d is a data vector, m is a model vector, e is an error vector, and F is a forward modeling function. We take d = [d₁ d₂ ... d_N]ᵀ, with each d_i being either the log amplitude or phase of ρ_app for a particular polarization (TE or TM), observation site, and frequency (ω). We take m = [m₁ m₂ ... m_M]ᵀ to be a vector of parameters that define the resistivity function. Being consistent with the numerical forward modeling scheme, we let M be the number of model blocks and each m_j be the logarithm of resistivity (log ρ) for a unique block. Given these definitions of d and m, the function F is defined implicitly by equations (9) and (10).

We solve the inverse problem in the sense of Tikhonov and Arsenin (1977), taking a regularized solution to be a model minimizing an objective function, Ψ, defined by

    Ψ(m) = (d − F(m))ᵀ V⁻¹ (d − F(m)) + λ mᵀLᵀLm             (11)

for given λ, V, and L. The regularization parameter, λ, is a positive number. The positive-definite matrix V plays the role of the variance of the error vector e. The second term of Ψ defines a stabilizing functional on the model space. In this study we choose the matrix L to be a simple second-difference operator such that, when the grid of model blocks is uniform, Lm approximates the Laplacian of log ρ.

The remainder of this paper deals with numerical algorithms for minimizing Ψ.

MINIMIZATION ALGORITHMS

We will consider three numerical algorithms for minimizing the objective function Ψ with respect to m: the Gauss-Newton method, the method of Mackie and Madden (1993), and nonlinear conjugate gradients. For the remainder of this paper, we will label our particular implementations of these algorithms as GN, MM, and NLCG, respectively. Each algorithm generates a sequence of models m₀, m₁, ..., with the hope that Ψ(m_ℓ) → min_m Ψ(m) as ℓ → ∞.

To describe the three algorithms in detail, we introduce the following notation. The gradient and Hessian of the objective function are the M-dimensional vector g and the M × M symmetric matrix H defined by

    g_j(m) = ∂_j Ψ(m)
    H_jk(m) = ∂_j ∂_k Ψ(m),    j, k = 1, ..., M

where ∂_j signifies partial differentiation with respect to the jth argument of a function [reading Ψ(m) as Ψ(m₁, m₂, ..., m_M)].

Let A denote the Jacobian matrix of the forward function F:

    A_ij(m) = ∂_j F_i(m),    i = 1, ..., N;  j = 1, ..., M.

Given equation (11), we have

    g(m) = −2A(m)ᵀV⁻¹(d − F(m)) + 2λLᵀLm                     (12)

    H(m) = 2A(m)ᵀV⁻¹A(m) + 2λLᵀL − 2 Σ_{i=1}^{N} q_i B_i(m)  (13)

where B_i is the Hessian of F_i and q = V⁻¹(d − F(m)). We also define an approximate objective function and its gradient and Hessian based on linearization of F. For linearization about a model m_ref, define

    F̃(m; m_ref) = F(m_ref) + A(m_ref)(m − m_ref)

    Ψ̃(m; m_ref) = (d − F̃(m; m_ref))ᵀV⁻¹(d − F̃(m; m_ref)) + λmᵀLᵀLm.

It is easy to show that the gradient and Hessian of Ψ̃ are given by the expressions

    g̃(m; m_ref) = −2A(m_ref)ᵀV⁻¹(d − F̃(m; m_ref)) + 2λLᵀLm

    H̃(m_ref) = 2A(m_ref)ᵀV⁻¹A(m_ref) + 2λLᵀL.               (14)

Ψ̃ is quadratic in m (its first argument), g̃ is linear in m, and H̃ is independent of m. In fact,

    Ψ̃(m; m_ref) = Ψ(m_ref) + g(m_ref)ᵀ(m − m_ref)
                  + ½ (m − m_ref)ᵀ H̃(m_ref) (m − m_ref)      (15)

    g̃(m; m_ref) = g(m_ref) + H̃(m_ref)(m − m_ref).           (16)

Clearly F̃(m_ref; m_ref) = F(m_ref), Ψ̃(m_ref; m_ref) = Ψ(m_ref), and g̃(m_ref; m_ref) = g(m_ref), but H̃(m_ref) is only an approximation to H(m_ref), obtained by dropping the last term in equation (13).

Gauss-Newton algorithm (GN)

One can describe the Gauss-Newton iteration as recursive minimization of Ψ̃; i.e., the model sequence satisfies

    m₀ = given
    Ψ̃(m_{ℓ+1}; m_ℓ) = min_m Ψ̃(m; m_ℓ),    ℓ = 0, 1, 2, ....  (17)

A consequence of equation (17) is that the gradient vector, g̃(m_{ℓ+1}; m_ℓ), is zero. In light of equation (16), m_{ℓ+1} satisfies the linear vector equation

    H̃_ℓ (m_{ℓ+1} − m_ℓ) = −g_ℓ,                              (18)

where we make the abbreviations

    g_ℓ ≡ g(m_ℓ)
    H̃_ℓ ≡ H̃(m_ℓ).

Presuming H̃_ℓ to be nonsingular, this necessary condition is also sufficient, and we can write the Gauss-Newton iteration as

    m_{ℓ+1} = m_ℓ − H̃_ℓ⁻¹ g_ℓ.

Levenberg (1944) and Marquardt (1963) proposed a modification of the Gauss-Newton method in which the model increment at each step is damped. The rationale for damping is to prevent unproductive movements through the solution space caused by the nonquadratic behavior of Ψ or poor conditioning of H̃. In algorithm GN, we employ a simple version of Levenberg-Marquardt damping and replace equation (18) with

    (H̃_ℓ + ε_ℓ I)(m_{ℓ+1} − m_ℓ) = −g_ℓ.                     (19)

Here, I is the identity matrix and ε_ℓ is a positive damping parameter allowed to vary with iteration step. Since the objective function we are minimizing includes its own damping in the form of the stabilizing (last) term in equation (11), and since this term is a quadratic function of m, a large amount of Levenberg-Marquardt damping is not needed in our problem. Algorithm GN chooses ε_ℓ to be quite small after the first few iteration steps and is therefore not a significant departure from the Gauss-Newton method.

Our implementation of the Gauss-Newton algorithm solves equation (19) using a linear, symmetric system solver from the Linpack software library (Dongarra et al., 1979). First, the damped Hessian matrix, H̃_ℓ + ε_ℓ I, is factored using Gaussian elimination with symmetric pivoting. The factored system is then solved with −g_ℓ as the right-hand-side vector. The Jacobian matrix, A(m_ℓ), is needed to compute g_ℓ and H̃_ℓ in accordance with equations (12) and (14). GN generates the Jacobian using the reciprocity method of Rodi (1976), which translates the task to that of solving a set of pseudoforward problems having the same structure as equation (9) (see Appendix). The memory requirements of GN are dominated by storage of the Jacobian (NM real numbers) and the Hessian (M² real numbers). We note that the memory needed for forward modeling and evaluating Ψ scales linearly with N and M.

Convergence of the Gauss-Newton, or Levenberg-Marquardt, iteration implies that the sequence g_ℓ converges to zero and thus that the solution is a stationary point of Ψ. Whether the stationary point corresponds to a minimum or otherwise depends on how strongly nonquadratic Ψ is. When the method does find a minimum of Ψ, there is no assurance that it is a global minimum.
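The damped update that GN performs [equations (12), (14), and (19)] is easy to state in dense linear algebra. The sketch below is ours, with numpy standing in for the Linpack solver of the original implementation; it assumes the full Jacobian A, the residual d − F(m), the diagonal of V, and the operator L are available as arrays.

    import numpy as np

    def gauss_newton_update(m, A, r, v, L, lam, eps):
        """One damped Gauss-Newton step: solve (H~ + eps I) dm = -g
        [equations (12), (14), (19)].  A: N x M Jacobian at m; r: the
        residual d - F(m); v: data variances (diagonal of V); lam: the
        regularization parameter; eps: Levenberg-Marquardt damping."""
        AtVi = A.T / v                   # A^T V^-1 for diagonal V
        LtL = L.T @ L
        g = -2.0 * AtVi @ r + 2.0 * lam * (LtL @ m)   # gradient, eq. (12)
        H = 2.0 * AtVi @ A + 2.0 * lam * LtL          # approx. Hessian, eq. (14)
        dm = np.linalg.solve(H + eps * np.eye(len(m)), -g)
        return m + dm

Note that both the N × M array A and the M × M array H must be formed and stored explicitly, which is precisely the cost the two conjugate gradients algorithms below avoid.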


Mackie-Madden algorithm (MM)

The second minimization algorithm we study is the algorithm first introduced by Madden and Mackie (1989) and fully implemented and more completely described by Mackie and Madden (1993). As adapted to 3-D dc resistivity inversion, the algorithm is also described by Zhang et al. (1995).

Mackie and Madden (1993) presented their algorithm as iterated, linearized inversion. Solution of the linear inverse problem at each iteration step was formulated in terms of a maximum-likelihood criterion. It is informative, and well serves our purpose, to recast the Mackie-Madden algorithm as a modification of the Gauss-Newton method which, like Gauss-Newton, performs a minimization of the nonquadratic objective function Ψ. That is, algorithm MM is a Gauss-Newton iteration in which the linear system (18) is solved incompletely by a conjugate gradients (CG) technique. The incompleteness results from halting the conjugate gradients iteration prematurely after a prescribed number of steps, K. Thus, for each ℓ, the updated model, m_{ℓ+1}, is generated as a sequence:

    m_{ℓ,0} = m_ℓ
    m_{ℓ,k+1} = m_{ℓ,k} + α_{ℓ,k} p_{ℓ,k},    k = 0, 1, ..., K − 1
    m_{ℓ+1} = m_{ℓ,K}.

For each k, the vector p_{ℓ,k} is a search direction in model space and the scalar α_{ℓ,k} is a step size. Let us make the additional abbreviation

    g̃_{ℓ,k} ≡ g̃(m_{ℓ,k}; m_ℓ).

In accordance with the CG algorithm (Hestenes and Stiefel, 1952), the step size is given by the formula

    α_{ℓ,k} = − g̃_{ℓ,k}ᵀ p_{ℓ,k} / (p_{ℓ,k}ᵀ H̃_ℓ p_{ℓ,k}),   (20)

which, we point out, solves the univariate minimization problem

    Ψ̃(m_{ℓ,k} + α_{ℓ,k} p_{ℓ,k}; m_ℓ) = min_α Ψ̃(m_{ℓ,k} + α p_{ℓ,k}; m_ℓ).

The search directions are iterated as

    p_{ℓ,0} = −C_ℓ g̃_{ℓ,0}
    p_{ℓ,k} = −C_ℓ g̃_{ℓ,k} + β_{ℓ,k} p_{ℓ,k−1},    k = 1, 2, ..., K − 1    (21)

where the M × M positive-definite matrix C_ℓ is known as a preconditioner, and where the scalars β_{ℓ,k} are calculated as

    β_{ℓ,k} = g̃_{ℓ,k}ᵀ C_ℓ g̃_{ℓ,k} / g̃_{ℓ,k−1}ᵀ C_ℓ g̃_{ℓ,k−1}.

The first term of equation (21) is a preconditioned steepest descent direction, which minimizes pᵀg̃_{ℓ,k}, the directional derivative of Ψ̃(m; m_ℓ) at m = m_{ℓ,k}, with pᵀC_ℓ⁻¹p fixed. The second term modifies the search direction so that it is conjugate to previous search directions, meaning

    p_{ℓ,k}ᵀ H̃_ℓ p_{ℓ,k′} = 0,    k′ < k.                    (22)

The final ingredient of the conjugate gradients algorithm is iteration of the gradient vectors:

    g̃_{ℓ,0} = g_ℓ
    g̃_{ℓ,k+1} = g̃_{ℓ,k} + α_{ℓ,k} H̃_ℓ p_{ℓ,k},    k = 0, 1, ..., K − 2,

which follows from equation (16).

The main computations entailed in algorithm MM are involved in the evaluation of the forward function, F(m_ℓ), for each ℓ [needed to compute Ψ(m_ℓ) and g_ℓ], and operation with the Jacobian matrix and its transpose for each k and ℓ. Regarding the latter, let

    A_ℓ ≡ A(m_ℓ)

and define

    f_{ℓ,k} = A_ℓ p_{ℓ,k},    k = 0, 1, ..., K − 1.           (23)

Then the denominator of equation (20) can be written

    p_{ℓ,k}ᵀ H̃_ℓ p_{ℓ,k} = 2 f_{ℓ,k}ᵀ V⁻¹ f_{ℓ,k} + 2λ p_{ℓ,k}ᵀ LᵀL p_{ℓ,k},

and the iteration for gradient vectors becomes

    g̃_{ℓ,0} = −2A_ℓᵀV⁻¹(d − F(m_ℓ)) + 2λLᵀLm_ℓ               (24)
    g̃_{ℓ,k+1} = g̃_{ℓ,k} + 2α_{ℓ,k} A_ℓᵀV⁻¹f_{ℓ,k} + 2λα_{ℓ,k} LᵀLp_{ℓ,k},
        k = 0, 1, ..., K − 2.                                 (25)

From equations (23)–(25), we see that A_ℓ and A_ℓᵀ each operate on K vectors, or one each per CG step. Mackie and Madden (1993) showed that operations with the Jacobian and its transpose can be accomplished without computing the Jacobian itself. Instead, the vector resulting from either of these operations can be found as the solution of a single pseudoforward problem requiring the same amount of computation as the actual forward problem, F. (We define one forward problem to include all frequencies and polarizations involved in the data vector.) The algorithms for operating with A_ℓ and A_ℓᵀ are detailed in the Appendix. The main memory used by MM comprises several vectors of length N (e.g., f_{ℓ,k}) and M (e.g., p_{ℓ,k}, g̃_{ℓ,k}, and C_ℓ g̃_{ℓ,k}). Our preconditioner (C_ℓ) requires no storage (see the section "Preconditioning" below). Thus, the memory needed by MM scales linearly with the number of data and model parameters, compared to the quadratic scaling for GN.

We apply algorithm MM using relatively few CG steps per Gauss-Newton step. The main purpose in doing so is to keep the computational effort needed for Jacobian operations under that which would be needed to generate the full Jacobian matrix. The Jacobian operations performed in K CG steps of MM require computations equivalent to solving 2K forward problems, as indicated above. The computational effort needed to generate the full Jacobian matrix is harder to characterize in general but, in the usual situation where the station set is common for all frequencies and polarizations, amounts to one forward problem per station. Therefore, MM will do less computation (related to the Jacobian) per Gauss-Newton step than GN when K is less than half the number of stations. Additionally, algorithm MM avoids the factorization of H̃_ℓ. Truncating the CG iteration also effects a kind of damping of the Gauss-Newton updates, achieving goals similar to those of Levenberg-Marquardt damping. It is for this reason that algorithm MM solves the undamped system (18) rather than system (19).
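The MM inner loop [equations (20)–(25)] can be written so that the Jacobian appears only through products with vectors. In the following minimal sketch (ours, not the authors' code), Avec and ATvec are hypothetical callables that apply A_ℓ and A_ℓᵀ by solving pseudoforward problems, and apply_C applies the preconditioner.

    import numpy as np

    def mm_inner_loop(m, g, Avec, ATvec, apply_C, v, L, lam, K=3):
        """K truncated CG steps on the linearized system (18)
        [equations (20)-(25)].  g: gradient at the reference model m;
        Avec/ATvec: Jacobian(-transpose)-vector products computed via
        pseudoforward solves; v: data variances (diagonal of V)."""
        LtL = L.T @ L
        g_t = g.copy()                   # g~_{l,0} = g_l, eq. (24)
        p = None
        gamma_last = None
        for k in range(K):
            h = apply_C(g_t)             # preconditioned gradient
            gamma = h @ g_t
            p = -h if k == 0 else -h + (gamma / gamma_last) * p  # eq. (21)
            gamma_last = gamma
            f = Avec(p)                  # f_{l,k} = A_l p, eq. (23)
            denom = 2.0 * (f @ (f / v)) + 2.0 * lam * (p @ (LtL @ p))
            alpha = -(g_t @ p) / denom   # step size, eq. (20)
            m = m + alpha * p
            g_t = (g_t + 2.0 * alpha * ATvec(f / v)
                   + 2.0 * lam * alpha * (LtL @ p))           # eq. (25)
        return m

Each pass through the loop costs one Avec and one ATvec call (two pseudoforward solves), consistent with the 2K forward-problem equivalents counted above.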


Nonlinear conjugate gradients (NLCG)

In algorithm MM, the method of conjugate gradients was applied inside a Gauss-Newton-style iteration to incompletely solve a linear system or, equivalently, to incompletely minimize a quadratic approximation to the objective function. Nonlinear conjugate gradients (see, for example, Luenberger, 1984) directly solve minimization problems that are not quadratic, abandoning the framework of iterated, linearized inversion. Algorithm NLCG employs the Polak-Ribiere variant of nonlinear conjugate gradients (Polak, 1971) to minimize the objective function Ψ of equation (11).

The model sequence for nonlinear CG is determined by a sequence of univariate minimizations, or line searches, along computed search directions:

    m₀ = given
    Ψ(m_ℓ + α_ℓ p_ℓ) = min_α Ψ(m_ℓ + α p_ℓ)                  (26)
    m_{ℓ+1} = m_ℓ + α_ℓ p_ℓ,    ℓ = 0, 1, 2, ....

The search directions are iterated similarly to linear CG:

    p₀ = −C₀ g₀
    p_ℓ = −C_ℓ g_ℓ + β_ℓ p_{ℓ−1},    ℓ = 1, 2, ...            (27)

where, in the Polak-Ribiere technique,

    β_ℓ = g_ℓᵀ C_ℓ (g_ℓ − g_{ℓ−1}) / g_{ℓ−1}ᵀ C_{ℓ−1} g_{ℓ−1}.

The quantity −C_ℓ g_ℓ is again the (preconditioned) steepest descent direction, minimizing the directional derivative of Ψ evaluated at m_ℓ. Unlike linear CG, the search directions are not necessarily conjugate with respect to some fixed matrix, as in equation (22), but they do satisfy the weaker condition

    p_ℓᵀ (g_ℓ − g_{ℓ−1}) = 0,    ℓ > 0.                       (28)

The minimization problem, equation (26), is not quadratic and requires some iterative technique to solve. Since it involves only a single unknown, it is tempting to attack the problem as one of global optimization, i.e., finding a global minimum of Ψ with respect to α. Doing so would gain one advantage over the Gauss-Newton method, which makes no attempt to distinguish local from global minima. However, global optimization potentially leads to many forward problem calculations per NLCG step. Given the computational intensity of the MT forward problem, algorithm NLCG does not attempt global line minimization but approaches equation (26) with computational parsimony as a primary consideration.

Our line search algorithm is a univariate version of the Gauss-Newton method, with certain modifications. To describe it efficiently, we denote the univariate function to be minimized as Φ_ℓ and its Gauss-Newton approximation as Φ̃_ℓ:

    Φ_ℓ(α) ≡ Ψ(m_ℓ + α p_ℓ)
    Φ̃_ℓ(α; m_ref) ≡ Ψ̃(m_ℓ + α p_ℓ; m_ref).

Our line search generates a sequence of models

    m_{ℓ,k} = m_ℓ + α_{ℓ,k} p_ℓ,    k = 0, 1, 2, ...,

where

    α_{ℓ,0} = 0
    Φ̃_ℓ(α_{ℓ,k+1}; m_{ℓ,k}) = min_α Φ̃_ℓ(α; m_{ℓ,k}),    k = 0, 1, 2, ....   (29)

Since Ψ̃(m; m_{ℓ,k}) is quadratic in m, Φ̃_ℓ(α; m_{ℓ,k}) is quadratic in α, and it is easy to show that the minimization in equation (29) is solved by

    α_{ℓ,k+1} = α_{ℓ,k} − g_{ℓ,k}ᵀ p_ℓ / (p_ℓᵀ H̃_{ℓ,k} p_ℓ).  (30)

Here we define

    g_{ℓ,k} ≡ g(m_{ℓ,k})
    H̃_{ℓ,k} ≡ H̃(m_{ℓ,k}).

Our modifications of this Gauss-Newton scheme are as follows:

1) We keep track of the best (smallest Φ_ℓ) model encountered in the line search. Let us denote this as m_{ℓ,best} ≡ m_ℓ + α_{ℓ,best} p_ℓ.
2) If Φ_ℓ increases during the iteration [Φ_ℓ(α_{ℓ,k}) > Φ_ℓ(α_{ℓ,k−1})], we calculate the next step size by bisection:

    α_{ℓ,k+1} = ½ (α_{ℓ,k} + α_{ℓ,best}).                    (31)

3) On the second or later steps of a line search, if the current and previous best models bracket a minimum, in the sense that (prime denotes derivative)

    Φ_ℓ′(α_{ℓ,best}) Φ_ℓ′(α_{ℓ,k}) < 0,

then, instead of equation (30), α_{ℓ,k+1} is calculated so as to yield the local minimum of a cubic approximation to Φ_ℓ(α). The cubic approximation matches Φ_ℓ and Φ_ℓ′ at α = α_{ℓ,k} and α = α_{ℓ,best}.

The line search is deemed to converge when the estimated value of the objective function for α_{ℓ,k+1}, predicted by the quadratic or cubic approximation as appropriate, agrees with Φ_ℓ(α_{ℓ,k+1}) within some prescribed tolerance. In the usual case of a Gauss-Newton update, the convergence condition is

    |Φ_ℓ(α_{ℓ,k+1}) − Φ̃_ℓ(α_{ℓ,k+1}; m_{ℓ,k})| ≤ τ Φ_ℓ(α_{ℓ,k+1})

where τ ≪ 1 is the tolerance. The line search is deemed to fail if it does not converge within a prescribed maximum number of steps, or if Φ_ℓ(α_{ℓ,k+1}) > 1.5 Φ_ℓ(α_{ℓ,best}) occurs. In any case, the final result of the ℓth line search is taken as the best model found:

    m_{ℓ+1} = m_{ℓ,best}.

If the line search converged, the new search direction, p_{ℓ+1}, is computed with equation (27). If it failed, p_{ℓ+1} is taken as the steepest descent direction [first term of equation (27)], breaking the conjugacy with previous search directions.

The main computations of algorithm NLCG are similar to those of MM. Evaluating g_{ℓ,k} and p_ℓᵀ H̃_{ℓ,k} p_ℓ in equation (30) entails the computation of the vectors A_{ℓ,k}ᵀ V⁻¹(d − F(m_{ℓ,k})) and A_{ℓ,k} p_ℓ [where A_{ℓ,k} ≡ A(m_{ℓ,k})]. Computing α_{ℓ,k+1} by cubic interpolation, however, does not require the second derivative of Φ_ℓ, in which case the product A_{ℓ,k} p_ℓ is not needed. The same pseudoforward algorithms as in MM are used in NLCG to perform Jacobian operations (see Appendix). NLCG, unlike MM, evaluates the forward function for each model update. Therefore, each line search step in NLCG solves the equivalent of two or three forward problems. The memory requirements of NLCG are also similar to those of MM, scaling linearly with N and M.


We close our description of NLCG by pointing out a potential pitfall, and a related computational benefit, of the line search stopping condition. Our condition compares Φ_ℓ at the newest model, m_{ℓ,k+1}, to the quadratic or cubic approximation extrapolated from the previous model, m_{ℓ,k}. The pitfall is that agreement between these does not guarantee that Φ_ℓ is near a minimum with respect to α, so the line search might stop prematurely. The benefit ensues when F is approximately linear between m_{ℓ,k} and the minimizing model. In this case, the stopping condition will be met and m_{ℓ,k+1} will be an accurate result of the line search, even though Φ_ℓ and its gradient may have changed greatly from their values at m_{ℓ,k}. The search stops without additional, unnecessary computations such as an additional update (m_{ℓ,k+2}) or second-derivative information at the new model (requiring A_{ℓ,k+1} p_ℓ). Consequently, when the nonlinear CG iteration has progressed to the point where F behaves linearly in all search directions, each line minimization will require only one step (m_{ℓ+1} = m_{ℓ,1}), and the remaining computations will be essentially the same as the linear CG computations in MM, with the exception that the forward function F is evaluated each time the model is updated.

Preconditioning

We recall that algorithms MM and NLCG each provide for the use of a preconditioner, C_ℓ, in their respective implementations of conjugate gradients. The preconditioner can have a big impact on efficiency in conjugate gradients. Two competing considerations in its choice are the computational cost of applying the preconditioner and its effectiveness in steering the gradient vector into a productive search direction.

This study compares two versions of each of algorithms MM and NLCG: one without preconditioning (C_ℓ = I) and one using

    C_ℓ = (γ_ℓ I + λLᵀL)⁻¹,                                  (32)

where γ_ℓ is a specified scalar. In the latter case, we apply the preconditioner to a vector g by solving the linear system

    (γ_ℓ I + λLᵀL) h = g

for h. We solve this system using a (linear) conjugate gradients technique.

The rationale for equation (32) is to have an operator that can be applied efficiently and that in some sense acts like the inverse of H̃_ℓ, the approximate Hessian matrix. The efficiency of applying C_ℓ stems from the simplicity and sparseness of the above linear system for h. The amount of computation needed to solve the system is less than one forward function evaluation and thus adds little overhead to either algorithm MM or NLCG. The approximation to the inverse Hessian arises from the second term of C_ℓ⁻¹, but we also attempt to choose γ_ℓ so that the first term is of comparable size to the matrix A_ℓᵀV⁻¹A_ℓ. In our later examples, we took γ_ℓ to be a constant (independent of ℓ) based on the Jacobian matrix of a homogeneous medium.
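Because γ_ℓ I + λLᵀL is sparse and well conditioned, applying the preconditioner of equation (32) reduces to a cheap inner linear solve. Below is a minimal sketch (ours); scipy's linear CG routine stands in for whichever CG solver was actually used, and L is assumed to be a scipy sparse matrix.

    import numpy as np
    import scipy.sparse as sp
    import scipy.sparse.linalg as spla

    def make_preconditioner(L, lam, gamma):
        """Return apply_C(g), which solves (gamma I + lam L^T L) h = g
        by linear CG [equation (32)]."""
        Cinv = (gamma * sp.identity(L.shape[1]) + lam * (L.T @ L)).tocsr()
        def apply_C(g):
            h, info = spla.cg(Cinv, g)
            assert info == 0, "inner CG did not converge"
            return h
        return apply_C

The returned closure can be passed directly as the apply_C argument of the MM and NLCG sketches given earlier.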


Theoretical comparison of MM and NLCG

In the three main applications of NLCG presented below ("Numerical Experiments"), updating of the step size, α, by cubic interpolation occurred nine times, updating by bisection [formula (31)] occurred zero times, and Gauss-Newton updating [formula (30)] occurred 211 times (for a total of 220 line search steps among the three examples). Moreover, none of the line searches failed to converge within the tolerance given. The line search algorithm in NLCG is thus primarily a univariate Gauss-Newton algorithm, and it is informative to compare a simplified NLCG, in which the line search enhancements (cubic interpolation and bisection) are ignored, to MM.

Algorithms MM and NLCG both generate a doubly indexed sequence of models, m_{ℓ,k}. In MM, the slower index (ℓ) indexes a Gauss-Newton iteration while the faster index (k) indexes a conjugate gradients loop. In our simplified NLCG, the opposite is the case, with ℓ a conjugate gradients counter and k a Gauss-Newton counter. However, the algorithms perform similar calculations at each step of their respective inner loops. The difference between the algorithms can be identified with the frequency with which the following events occur: calculating the forward function (F), changing the search direction (p) used in conjugate gradients, and resetting the search direction to be the steepest descent direction.

To demonstrate this, we sketch a simple algorithm having a single loop that subsumes MM and NLCG with the restricted line search. The input is a starting model, m₀:

    Algorithm CGI (m₀)
        m := m₀;
        for ℓ = 0, 1, 2, . . .
            if new_ref
                m_ref := m;
                e := d − F(m_ref);
            else
                e := e − αf;
            end
            g := −2A(m_ref)ᵀV⁻¹e + 2λLᵀLm;
            Ψ := eᵀV⁻¹e + λmᵀLᵀLm;
            if new_dir
                h := C(m_ref)g;
                if steep
                    β := 0;
                else
                    β := hᵀ(g − g_last)/δ_last;
                end
                p := −h + βp;
                g_last := g;
                δ_last := hᵀg;
            end
            f := A(m_ref)p;
            α := −pᵀg / [2(fᵀV⁻¹f + λpᵀLᵀLp)];
            m := m + αp;
        next ℓ

The reader can verify that this algorithm corresponds to our mathematical descriptions of MM and NLCG. To help, we point out that the formula for α above corresponds to the formula for α_{ℓ,k} in equation (20) (used in MM) as well as to that for α_{ℓ,k+1} − α_{ℓ,k} in equation (30) (used in the NLCG line search). Further, CGI replaces iteration of the gradient vector, as in equation (25), with iteration of an error vector, e.

Algorithm CGI has three flags: new_ref, new_dir, and steep. The flag new_ref is set to 1 (true) if the current model is to be used as a reference model for linearization. The flag new_dir is 1 if the search direction is to be changed. Flag steep is 1 if the newly computed search direction is to be reset to the steepest descent direction, thus breaking the conjugacy condition [equation (28)]. All three flags are initialized to 1. We can characterize algorithms MM and NLCG by how these flags are changed thereafter, as shown in Table 1. Algorithm CGI above does not show tests for line search convergence or failure, but these could be the same as in NLCG.

Table 1. How flags are set in algorithms MM and NLCG.

    Event          MM                 NLCG
    new_ref = 1    Every Kth update   Every update
    new_dir = 1    Every update       When line search converges or fails
    steep = 1      Every Kth update   When line search fails

The main computations in CGI are taken up in the evaluation of the three quantities F(m_ref), A(m_ref)p, and A(m_ref)ᵀV⁻¹e. Each of these quantities requires the same computational effort (see Appendix). The latter two quantities (operations with A and Aᵀ) are computed on each pass through the loop unconditionally, while the forward function is computed only when new_ref is 1. Therefore, each model update in CGI requires computations equal to two or three forward function evaluations, depending on how new_ref is determined.

NUMERICAL EXPERIMENTS

This section presents results of testing the three MT inversion algorithms described above on synthetic and field data. In each test, algorithms GN, MM, and NLCG were applied to the minimization of a common objective function Ψ [equation (11)] with a given data vector d, variance matrix V, regularization parameter λ, and regularization operator L. The data vector and error variance matrix are described below with each example. The regularization operator for each example was the second-order finite-difference operator described earlier. To choose the regularization parameter, we ran preliminary inversions with a few values of λ and then subjectively chose one that gave reasonable data residuals and model smoothness. We point out that none of the three inversion algorithms being tested determines λ as an output. Various other parameters specific to the inversion algorithms were selected as follows:

1) In GN, the Levenberg-Marquardt damping parameter was set to 0.001 times the current value of the objective function: ε_ℓ = 0.001 Ψ(m_ℓ).
2) In NLCG, the tolerance for deciding convergence of the line minimization, τ, was set to 0.003.
3) In MM and NLCG, the preconditioner was either that defined by equation (32) or, in one experiment, the identity (no preconditioning).
4) In MM, the number of conjugate gradient steps per Gauss-Newton step, K, was set to 3.

All results were computed on a 400-MHz Pentium II PC running the Linux operating system. The CPU times stated below are intended to reflect only the relative performance of the algorithms. We emphasize that the intent of these tests was to compare the speed and accuracy of GN, MM, and NLCG as minimization algorithms, not the quality of the inversion models in a geophysical sense.

Examples with synthetic data

We generated synthetic data by applying a 2-D MT forward modeling algorithm to specified models of the earth's resistivity and perturbing the results with random noise. The forward modeling algorithm we used for this purpose was intentionally different from that used in our inversion algorithms. Synthetic data were calculated using the finite-element algorithm of Wannamaker et al. (1986), whereas our inversion algorithms employ the transmission-network algorithm of Mackie et al. (1988). Each synthetic data set comprises complex apparent resistivities at multiple station locations, frequencies, and polarizations. Noise was included by adding an error to the complex logarithm of each apparent resistivity: log ρ_app + e_r + i e_i, where e_r and e_i are uncorrelated samples from a Gaussian distribution having zero mean and 0.05 standard deviation (5% noise). The noise was uncorrelated between frequencies, stations, and polarizations. For comparison, the accuracy of our forward modeling algorithm is approximately 1–3% for the range of parameters (grid dimensions, frequencies, and resistivities) involved in the test problems below (Madden and Mackie, 1989).
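The noise model just described is straightforward to reproduce. A minimal numpy sketch (our illustration) perturbs the complex log of each apparent resistivity with independent 5% Gaussian errors:

    import numpy as np

    def add_noise(rho_app, std=0.05, seed=0):
        """Perturb complex apparent resistivities with Gaussian noise on
        the complex logarithm: log rho_app + e_r + i e_i."""
        rng = np.random.default_rng(seed)
        e_r = rng.normal(0.0, std, rho_app.shape)
        e_i = rng.normal(0.0, std, rho_app.shape)
        return np.exp(np.log(rho_app) + e_r + 1j * e_i)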


Model 1. Our first tests employ a simple resistivity model consisting of a 10 ohm-m rectangular body embedded in a 100 ohm-m background. The anomalous body has dimensions of 10 × 10 km, and its top is 2 km below the earth's surface. The tests use synthetic data for the TM and TE polarizations at seven sites and five frequencies, yielding a total of 140 real-valued data. The frequencies range from 0.01 to 100 Hz and are evenly spaced on a logarithmic scale. The model parameterization for inversion divides the earth into a grid of blocks numbering 29 in the horizontal (y) direction and 27 in the vertical (z) direction, implying a total of 783 model parameters. The variance matrix (V) was set to 0.0025 times the identity matrix, and the regularization parameter (λ) was chosen as 30. The starting model for each inversion was a uniform halfspace with ρ = 30 ohm-m.

We applied inversion algorithm GN and two versions each of MM and NLCG (with and without preconditioning) to the synthetic data from model 1. Figure 1 shows the performance of each algorithm in terms of the value of the objective function (Ψ) it achieves as a function of CPU time expended. CPU time used to compute the objective function for the starting model is ignored, so the first symbol plotted for each algorithm is at zero CPU time. Following this, a symbol is plotted for each iteration step of an algorithm: a Gauss-Newton step for GN and MM, a conjugate gradients step for NLCG. It is immediately evident from Figure 1 that, in both MM and NLCG, the preconditioner enhances performance significantly, especially in the case of MM. With preconditioning, MM and NLCG effectively converge to a final result in less than one minute of CPU time, while without preconditioning they are far from convergence after a minute. We also infer from the spacing between symbols that preconditioning does not add significantly to the amount of computation in either algorithm. Henceforth, we will consider MM and NLCG only with preconditioning.

FIG. 1. Objective function versus CPU time resulting from the application of the following inversion algorithms to the first synthetic data set: the Gauss-Newton algorithm (GN, filled circles), the Mackie-Madden algorithm (MM) with and without preconditioning (up and down triangles), and nonlinear conjugate gradients (NLCG) with and without preconditioning (pluses and crosses). (The label "npc" denotes no preconditioning.)

Next, we compare algorithms MM, NLCG, and GN. We see from Figure 1 that GN, like MM and NLCG, effectively converges in less than one minute of CPU time. However, the rates of convergence differ among the algorithms. MM and NLCG reduce the objective function in the early stages of minimization at a noticeably faster rate than GN. This is quantified in Table 2, which gives the amount of CPU time expended by each algorithm to achieve various values of the objective function, determined by interpolating between iteration steps. Values of Ψ are referenced to the smallest value achieved by any of the algorithms (in this case GN), which is denoted Ψ_min in the table. It is clear that MM and NLCG achieve each level of the objective function, down to 1.05 Ψ_min, much faster than GN, with MM being slightly faster than NLCG. In the later stages of minimization (Ψ < 1.05 Ψ_min), NLCG becomes the most efficient, reaching within 1% of the minimum in about 20% less CPU time than GN and 40% less than MM.

Table 2. CPU times (s) versus objective function: first synthetic data set.

    Ψ/Ψ_min (Ψ_min = 443.82):   2.0   1.5   1.2   1.1   1.05  1.02  1.01
    GN                           23    28    33    36    41    45    46
    MM                            9    12    14    21    31    48    61
    NLCG                         11    13    18    23    27    32    36

Figure 2 displays one model from the model sequence generated by each of the three algorithms, i.e., the model yielding the objective function value closest to 1.01 Ψ_min. The images are truncated spatially to display the best-resolved parameters; deeper blocks and those laterally away from the station array are not shown. The models from the different algorithms are clearly very similar. Each model differs (block by block over the portion shown) from the best model generated (that yielding Ψ = Ψ_min) by less than a factor of 1.3 in resistivity, or a difference of 0.1 in log₁₀ ρ. Models later in each inversion sequence are even closer to each other and to the best model. This confirms numerically the premise of our formulation that it is the minimization criterion, and not the minimization algorithm, that determines the solution of the inverse problem.

FIG. 2. Inversion models from the first synthetic data set, computed with algorithms GN (top), MM with preconditioning (middle), and NLCG with preconditioning (bottom). Resistivity scales (right) have units log₁₀ ohm-meters. Station locations are marked with triangles. Each model yields Ψ = 1.01 Ψ_min (see Table 2).

We note that the number of steps until convergence and the CPU time used per step differ markedly among the algorithms (Figure 1). GN requires the fewest steps and takes the longest for each step, whereas NLCG requires the most steps and is fastest per step. In MM and NLCG, the time per iteration step largely reflects the number of forward problems (and pseudoforward problems) invoked. Given our input parameters, algorithm MM solves seven (i.e., 1 + 2K) forward problems per Gauss-Newton step (six devoted to operations with the Jacobian matrix). NLCG solves three forward problems per line search step (two for Jacobian operations). Since the stopping criterion for the line search was rather liberal (τ = 0.003), all but the first three line minimizations converged in one step. (The first three each required two steps.) GN solves eight forward problems per Gauss-Newton step (seven to compute the Jacobian matrix), which is only one more than MM. However, GN spends significant CPU time creating and factoring the Hessian matrix, which explains why its CPU time per Gauss-Newton step is so much larger than that of MM.

Also of interest in Figure 1 is the observation that MM had a larger initial reduction in the objective function than GN. This difference must be due to the difference between using Levenberg-Marquardt damping and truncated iteration for modifying the Gauss-Newton model update. Since we did not attempt to optimize the choice of ε_ℓ in GN or K in MM, we note this difference without drawing a general conclusion about the merits of the two damping techniques.


Model 2. The next experiment with synthetic data uses a more complicated model and a larger data set. The model represents a block-faulted structure with a resistive unit exposed at the surface of the upthrown block. On the downthrown block, the resistive unit is overlaid by a more conductive surface layer. The data set comprises complex TM and TE apparent resistivities for 12 sites and ten frequencies between 0.0032 and 100 Hz, giving a total of 480 data. The inversion model has 660 parameters corresponding to a 33 × 20 grid of blocks. The initial model for each algorithm was a homogeneous halfspace of 10 ohm-m. The variance matrix was the same as in the previous example, and the regularization parameter was set to 20.

The performance of the three inversion algorithms is presented in Figure 3 and Table 3. The algorithms differ in a manner similar to the previous example. In the beginning, the conjugate gradients-based algorithms (MM and NLCG) reduce the objective function much faster than the Gauss-Newton algorithm, with MM noticeably faster than NLCG. In the later stages of minimization, MM exhibits a slow convergence rate and is overtaken first by NLCG and then by GN in reducing the objective function. MM was halted after about 1000 s, at which point Ψ was 2.6% larger than Ψ_min (which again was achieved by GN); hence the dashes in the last two columns of Table 3. We note that only six of the iterative line searches performed by NLCG took more than a single step, five taking two steps and one taking three.

Table 3. CPU times (s) versus objective function: second synthetic data set.

    Ψ/Ψ_min (Ψ_min = 1890.7):   2.0   1.5   1.2   1.1   1.05  1.02  1.01
    GN                          125   139   162   180   222   353   531
    MM                           47    67   114   201   404   --    --
    NLCG                         51    61    82   109   150   229   296

FIG. 3. Objective function versus CPU time resulting from the application of inversion algorithms GN, MM, and NLCG to the second synthetic data set. Conventions are as in Figure 1.

Inversion models resulting from the second data set are shown in Figure 4. In the case of GN and NLCG, the models are for Ψ = 1.01 Ψ_min; for MM, it is the last model generated (Ψ = 1.026 Ψ_min). As in the previous example, there is great similarity among the models, although noticeable differences occur in the conductive overburden as well as beneath its right edge (x ≈ 5, z > 10 km). In the distance and depth range shown, the maximum departure of the displayed GN and NLCG models from the best model computed is a factor of 2 in resistivity, whereas for MM it is a factor of 5. For both GN and NLCG, the departure drops to about 1.5 when Ψ reaches 1.005 Ψ_min.

FIG. 4. Inversion models from the second synthetic data set, computed with algorithms GN (top), MM (middle), and NLCG (bottom). The resistivity models are displayed with the same conventions as Figure 2. The GN and NLCG models yield Ψ = 1.01 Ψ_min and the MM model Ψ = 1.026 Ψ_min (see Table 3).

Example with field data

Lastly, we demonstrate the various inversion algorithms on real MT data collected by P. Wannamaker in the Basin and Range (Wannamaker et al., 1997). The data set comprises TM complex apparent resistivities at 58 sites and 17 frequencies per site, for a total of 1972 real-valued data. The inversion model was parameterized with a 118 × 25 grid of blocks, yielding 2950 model parameters. Each algorithm was applied with a homogeneous initial model of resistivity 100 ohm-m. The diagonal elements of the variance matrix (V) were set equal to the squares of the reported standard errors, and the off-diagonal elements were set to zero. The regularization parameter was chosen as 8. The results are presented in Figures 5–7 and Table 4.


Looking at Figure 5, it is clear that NLCG and MM perform vastly better than GN on this real data set. NLCG achieved the smallest Ψ among the algorithms in roughly the same amount of time needed for one step of GN. GN took over 3 CPU hours to reach within 10% of this value (Table 4) and had reached only within 4.4% of Ψ_min when it was halted after about 7 hours. These results demonstrate the poor scalability of algorithm GN with problem size. In this problem, GN solves 59 forward problems per Gauss-Newton step (compared to seven for MM) and must factor a 2950 × 2950 matrix (the damped Hessian). The computer memory requirements are also extensive, as the Jacobian matrix contains 5.8 million (real) elements and the Hessian 8.7 million elements. MM and NLCG, on the other hand, require only several vectors of length 2950.

FIG. 5. Objective function versus CPU time resulting from the application of GN, MM, and NLCG to real MT data from the Basin and Range (Wannamaker et al., 1997). Conventions are as in Figure 1.

Table 4. CPU times (s) versus objective function: Basin and Range data set.

    Ψ/Ψ_min (Ψ_min = 9408.9):   2.0    1.5    1.2    1.1    1.05   1.02   1.01
    GN                          5143   6216   8245   11343  21608  --     --
    MM                            65    111    288     501    731  1232   1751
    NLCG                         158    224    342     425    536   712    827

Figure 6 replots the MM and NLCG results on an expanded time scale so that the performance of these conjugate gradients-based algorithms can be compared. We see the same pattern as in the synthetic data examples, only this time MM performs even more favorably than NLCG in the early stages of minimization. NLCG shows faster convergence at the later stages, overtaking MM when Ψ is between 1.2 and 1.1 times the minimum (Table 4). All but seven of the line searches in NLCG converged in a single step, and only the first took as many as three steps.

FIG. 6. The results of Figure 5 for algorithms MM and NLCG shown on an expanded time scale.

The MM and NLCG inversion models in Figure 7 yield Ψ = 1.01 Ψ_min, whereas the GN model yields Ψ = 1.044 Ψ_min. There are some significant differences between the GN model and the others in a vertical band near the rightmost station (x ≈ 60 km), which are difficult to see since the color scale covers almost a factor of 10 000 in resistivity. Otherwise the models are very similar. The maximum discrepancy from the model yielding Ψ = Ψ_min is about a factor of 4 for the GN model and a factor of 2 for the others.

FIG. 7. Inversion models from real MT data from the Basin and Range, computed with algorithms GN (top), MM (middle), and NLCG (bottom). The models are displayed with the same conventions as Figure 2, except that only the first and last of the 58 stations are marked. The MM and NLCG models yield Ψ = 1.01 Ψ_min and the GN model Ψ = 1.044 Ψ_min (see Table 4).


DISCUSSION AND CONCLUSIONS ing an optimal preconditioner. Our rst recommendation is


additional work on the development of an effective precondi-
We have compared three minimization algorithms for com-
tioner for conjugate gradients-based inversion. Second, since
puting regularized solutions of the 2-D magnetotelluric inverse
we have seen advantages to both MM and NLCG, we recom-
problem, both theoretically and with numerical experiments
mend research on hybrid algorithms that combine elements
involving synthetic and real data. We conclude that the conju-
of each. In our theoretical comparison of these algorithms, we
gate gradients-based algorithms, MM and NLCG, are superior
pointed out their similarity in structure and sketched a more
to a conventional Gauss-Newton algorithm (GN) with regard
general algorithm (CGI) that is a template for both. In light of
to the computational resources needed to compute accurate so-
the discussion above, avenues for an improved CGI are more
lutions to problems of realistic size. The explanation is that the
sophisticated tests for when to compute the forward function,
Gauss-Newton method entails the generation of a full Jacobian
when to change search directions, and when to revert to the
matrix and the complete solution of a linearized inverse prob-
steepest descent search direction.
lem at each step of an iteration. MM and NLCG replace these
We close by remarking that the algorithms of the type pre-
computations with ones that scale much more favorably with
sented and tested here, while not optimal, are a clear and
problem size in both CPU and memory usage. Moreover, we
needed improvement over the iterated, linearized inversion al-
enhanced performance by employing a good preconditioner in
gorithms in standard use. With some renement at least, they
both CG-based algorithms and a very simple line minimization
will allow MT practitioners to use larger model grids and data
scheme in NLCG.
sets (more frequencies and stations) in their studies, which in
Between the Mackie-Madden algorithm and nonlinear con-
the past have often been reduced to accommodate the limita-
jugate gradients, our numerical tests do not indicate that either
tions of the computer. Further, it is quite obvious to us that
algorithm is clearly superior to the other. In all three tests, and
the standard methods, like Gauss-Newton, are not practical
especially the largest one with real data, MM reduced the ob-
for realistic 3-D electromagnetic problems and, even allowing
jective function at a faster rate (versus CPU time) than NLCG
for improvements in computing hardware, will not be for some
in the early stages of minimization, whereas NLCG performed
time. Our results with 2-D MT suggest that conjugate gradi-
more efciently in the later computations. The early model
ents algorithms would be a much more feasible approach to
updates account for most of the reduction of the objective
3-D electromagnetic inversion.
We close by remarking that the algorithms of the type presented and tested here, while not optimal, are a clear and needed improvement over the iterated, linearized inversion algorithms in standard use. With some refinement at least, they will allow MT practitioners to use larger model grids and data sets (more frequencies and stations) in their studies, which in the past have often been reduced to accommodate the limitations of the computer. Further, it is quite obvious to us that the standard methods, like Gauss-Newton, are not practical for realistic 3-D electromagnetic problems and, even allowing for improvements in computing hardware, will not be for some time. Our results with 2-D MT suggest that conjugate gradients algorithms would be a much more feasible approach to 3-D electromagnetic inversion.

ACKNOWLEDGMENTS

We thank Joe Matarese for steering us, and taking the first steps, in the direction of nonlinear conjugate gradients as a technique for fast inversion algorithms. We also thank Phil Wannamaker for providing us with his Basin and Range dataset for use in our comparisons of inversion algorithms. We are grateful to Greg Newman and the anonymous reviewers of the paper, who provided helpful and insightful comments.

REFERENCES

Constable, S. C., Parker, R. L., and Constable, C. G., 1987, Occam's inversion: A practical algorithm for generating smooth models from electromagnetic sounding data: Geophysics, 52, 289–300.
deGroot-Hedlin, C., and Constable, S., 1990, Occam's inversion to generate smooth, two-dimensional models from magnetotelluric data: Geophysics, 55, 1613–1624.
Dongarra, J. J., Bunch, J. R., Moler, C. B., and Stewart, G. W., 1979, LINPACK: User's guide: Soc. Ind. Appl. Math.
Ellis, R. G., and Oldenburg, D. W., 1994, The pole-pole 3-D dc-resistivity inverse problem: A conjugate gradient approach: Geophys. J. Internat., 119, 187–194.
Farquharson, C. G., and Oldenburg, D. W., 1996, Approximate sensitivities for the electromagnetic inverse problem: Geophys. J. Internat., 126, 235–252.
Fletcher, R., and Reeves, C. M., 1959, Function minimization by conjugate gradients: Computer J., 7, 149–154.
Hestenes, M. R., and Stiefel, E., 1952, Methods of conjugate gradients for solving linear systems: J. Res. Nat. Bureau Stand., 49, 409–436.
Jiracek, G. R., Rodi, W. L., and Vanyan, L. L., 1987, Implications of magnetotelluric modeling for the deep crustal environment in the Rio Grande rift: Phys. Earth Plan. Int., 45, 179–192.
Jupp, D. L. B., and Vozoff, K., 1975, Stable iterative methods for the inversion of geophysical data: Geophys. J. Roy. Astr. Soc., 42, 957–976.
——— 1977, Two-dimensional magnetotelluric inversion: Geophys. J. Roy. Astr. Soc., 50, 333–352.
Levenberg, K., 1944, A method for the solution of certain non-linear problems in least squares: Quart. Appl. Math., 2, 164–168.
Luenberger, D. G., 1984, Linear and nonlinear programming, 2nd ed.: Addison-Wesley Publ. Co.
Mackie, R. L., Bennett, B. R., and Madden, T. R., 1988, Long-period magnetotelluric measurements near the central California coast: A land-locked view of the conductivity structure under the Pacific Ocean: Geophys. J., 95, 181–194.


Mackie, R. L., and Madden, T. R., 1993, Three-dimensional magnetotelluric inversion using conjugate gradients: Geophys. J. Internat., 115, 215–229.
Madden, T. R., 1972, Transmission systems and network analogies to geophysical forward and inverse problems: ONR Technical Report 72-3.
Madden, T. R., and Mackie, R. L., 1989, Three-dimensional magnetotelluric modeling and inversion: Proc. IEEE, 77, 318–333.
Marquardt, D. W., 1963, An algorithm for least-squares estimation of nonlinear parameters: J. Soc. Indust. Appl. Math., 11, 431–441.
Matarese, J. R., 1993, Nonlinear traveltime tomography: Ph.D. thesis, Massachusetts Institute of Technology.
Matarese, J. R., and Rodi, W. L., 1991, Nonlinear traveltime inversion of cross-well seismics: A minimum structure approach: 61st Ann. Internat. Mtg., Soc. Expl. Geophys., Expanded Abstracts, 917–921.
McGillivray, P. R., and Oldenburg, D. W., 1990, Methods for calculating Fréchet derivatives and sensitivities for the non-linear inverse problem: A comparative study: Geophys. Prosp., 38, 499–524.
Newman, G., 1995, Crosswell electromagnetic inversion using integral and differential equations: Geophysics, 60, 899–911.
Newman, G. A., and Alumbaugh, D. L., 1997, Three-dimensional massively parallel electromagnetic inversion–I. Theory: Geophys. J. Internat., 128, 345–354.
Oldenburg, D. W., McGillivray, P. R., and Ellis, R. G., 1993, Generalized subspace methods for large scale inverse problems: Geophys. J. Internat., 114, 12–20.
Polak, E., 1971, Computational methods in optimization: A unified approach: Academic Press.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P., 1992, Numerical recipes in FORTRAN: The art of scientific computing, 2nd ed.: Cambridge Univ. Press.
Reiter, D. T., and Rodi, W., 1996, Nonlinear waveform tomography applied to crosshole seismic data: Geophysics, 61, 902–913.
Rodi, W. L., 1976, A technique for improving the accuracy of finite element solutions for magnetotelluric data: Geophys. J. Roy. Astr. Soc., 44, 483–506.
——— 1989, Regularization and Backus-Gilbert estimation in nonlinear inverse problems: Application to magnetotellurics and surface waves: Ph.D. thesis, Pennsylvania State Univ.
Shi, W., Rodi, W., Mackie, R. L., and Zhang, J., 1996, 3-D d.c. electrical resistivity inversion with application to a contamination site in the Aberjona watershed: Proceedings from Symposium on the Application of Geophysics to Environmental and Engineering Problems (SAGEEP): Environmental and Engineering Geophys. Soc., 1257–1267.
Smith, J. T., and Booker, J. R., 1988, Magnetotelluric inversion for minimum structure: Geophysics, 53, 1565–1576.
——— 1991, Rapid inversion of two- and three-dimensional magnetotelluric data: J. Geophys. Res., 96, 3905–3922.
Swift, C. M., Jr., 1971, Theoretical magnetotelluric and Turam response from two-dimensional inhomogeneities: Geophysics, 36, 38–52.
Tarantola, A., 1987, Inverse problem theory: Elsevier.
Tikhonov, A. N., and Arsenin, V. Y., 1977, Solutions of ill-posed problems: V. H. Winston and Sons.
Thompson, D. R., 1993, Nonlinear waveform tomography: Theory and application to crosshole seismic data: Ph.D. thesis, Massachusetts Institute of Technology.
Wannamaker, P. E., Johnston, J. J., Stodt, J. A., and Booker, J. R., 1997, Anatomy of the southern Cordilleran hingeline, Utah and Nevada, from deep electrical resistivity profiling: Geophysics, 62, 1069–1086.
Wannamaker, P. E., Stodt, J. A., and Rijo, L., 1986, Two-dimensional topographic responses in magnetotellurics modeled using finite elements: Geophysics, 51, 2131–2144.
Wu, F. T., 1968, The inverse problem of magnetotelluric sounding: Geophysics, 33, 972–979.
Zhang, J., Mackie, R. L., and Madden, T. R., 1995, 3-D resistivity forward modeling and inversion using conjugate gradients: Geophysics, 60, 1313–1325.

APPENDIX
JACOBIAN COMPUTATIONS

The Gauss-Newton method (algorithm GN) requires the computation of each element of the Jacobian matrix, A. The Mackie-Madden algorithm (MM) and nonlinear conjugate gradients (NLCG), in contrast, employ A only in the computation of the quantities Ap and Aᵀq for specific vectors p and q [e.g., equations (23) and (25)]. This Appendix describes algorithms for the computation of A, Ap, and Aᵀq.

To begin, since each datum is the real or imaginary part of a complex quantity, we will convert our problem to one involving complex variables. Let d̃ be a complex vector such that each element of d is the real or imaginary part of a unique element of d̃:

    d = Re(E d̃),

where

    E_ik = 1    if d_i = Re d̃_k;
           −i   if d_i = Im d̃_k;
           0    otherwise.

We will denote the dimensionality of d̃ as Ñ, where clearly Ñ ≤ N in general and N = 2Ñ just in case amplitude and phase data are included in d equally. We can now write the forward function F as

    F(m) = Re[E F̃(m)],

where F̃ is a complex function. It follows that

    A = Re(E Ã),

with the complex matrix à being the Jacobian of F̃:

    Ã_ij(m) = ∂_j F̃_i(m).

We also have

    Ap = Re(E Ãp),
    Aᵀq = Re(Ãᵀ Eᵀ q).

Our task translates to finding Ã, Ãp, and Ãᵀq̃, where q̃ = Eᵀq.

To specify F̃, it is convenient to consider all frequencies and polarizations involved in the data vector d simultaneously. Let v be a vector comprising the parameterized E_x and/or H_x fields for all frequencies, and let the linear equation

    K(m) v(m) = s(m)    (A-1)

denote the finite-difference form of Maxwell's equations for all relevant polarizations and frequencies. K is a block-diagonal matrix (when v is partitioned by frequencies and polarizations), and s comprises the right-hand-side vectors for all frequencies and polarizations. We have shown the dependence of K and s, and hence v, on the model parameter vector m. We can now write

    F̃_i(m) = log { [a_i(m)ᵀ v(m)]² / (iω_i [b_i(m)ᵀ v(m)]²) },    (A-2)

where ω_i is the angular frequency associated with the ith complex datum and the vectors a_i and b_i are chosen to extract from v the relevant field averages for the polarization, frequency, and observation site associated with that datum.
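As a concrete illustration of this bookkeeping, here is a minimal sketch in Python with SciPy; the function name and the index-pair inputs are illustrative assumptions, not constructs from the paper.

import numpy as np
import scipy.sparse as sp

def build_E(re_pairs, im_pairs, n_complex):
    # E[i, k] = 1 if d_i = Re d~_k, -i if d_i = Im d~_k, 0 otherwise.
    # re_pairs/im_pairs list (i, k) index pairs; purely illustrative.
    rows = [i for i, _ in re_pairs] + [i for i, _ in im_pairs]
    cols = [k for _, k in re_pairs] + [k for _, k in im_pairs]
    vals = [1.0 + 0.0j] * len(re_pairs) + [-1.0j] * len(im_pairs)
    n_real = len(re_pairs) + len(im_pairs)
    return sp.csr_matrix((vals, (rows, cols)), shape=(n_real, n_complex))

# Two complex data with both parts kept, so N = 2*Ntilde as in the text:
E = build_E([(0, 0), (2, 1)], [(1, 0), (3, 1)], n_complex=2)
d_tilde = np.array([1.0 + 2.0j, 3.0 - 1.0j])
d = (E @ d_tilde).real        # d = Re(E d~)
q = np.ones(4)
q_tilde = E.T @ q             # q~ = E^T q (plain transpose, no conjugation)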

Downloaded 15 May 2010 to 95.176.68.210. Redistribution subject to SEG license or copyright; see Terms of Use at http://segdl.org/
NLCG Algorithm for 2-D MT Inversion 187

Computation of Ã

We consider the computation of à using two methods described by Rodi (1976). Differentiating equation (A-2),

    Ã_ij = A₁^ij + A₂^ij,    (A-3)

where

    A₁^ij = [ (2/(a_iᵀv)) ∂_j a_i − (2/(b_iᵀv)) ∂_j b_i ]ᵀ v

and

    A₂^ij = c_iᵀ ∂_j v,    (A-4)

where the vector c_i is defined by

    c_i = (2/(a_iᵀv)) a_i − (2/(b_iᵀv)) b_i.

The matrix A₁ accounts for the dependence of the apparent resistivity on m through the vectors a_i and b_i; the matrix A₂ accounts for the dependence of v on m. We assume the vectors a_i and b_i and their partial derivatives can be computed with closed-form expressions, so that A₁ can be as well. We turn to the more difficult task of computing A₂.

From equation (A-1), we can infer

    K ∂_j v = ∂_j s − (∂_j K) v,    j = 1, 2, . . . , M.    (A-5)

Again, we assume that K, s, and their partial derivatives are known analytically. The first method described by Rodi (1976) is to solve these M pseudoforward problems for the vectors ∂_j v and substitute them into equation (A-4).

The second method of Rodi (1976) exploits the reciprocity property of the forward problem, i.e., the symmetry of K. Solving equation (A-5) and plugging into equation (A-4), we get

    A₂^ij = c_iᵀ K⁻¹ [∂_j s − (∂_j K) v].    (A-6)

Let the vectors u_i satisfy

    K u_i = c_i,    i = 1, 2, . . . , Ñ.    (A-7)

Given the symmetry of K, we can then write equation (A-6) as

    A₂^ij = u_iᵀ [∂_j s − (∂_j K) v].    (A-8)

The second method is to solve equations (A-7) and then evaluate equation (A-8).

The matrices ∂_j K are very sparse since K is sparse and each of its elements depends on only a few of the m_j. The vectors ∂_j s, a_i, and b_i are likewise sparse, or zero. Therefore, in either method, construction of the right-hand-side vectors for the pseudoforward problems [equations (A-5) or (A-7)] and evaluation of the expression for A₂^ij [equations (A-4) or (A-8)] take relatively little computation. The major computational effort in either method is in solving the appropriate set of pseudoforward problems: equations (A-5) or (A-7). For this reason, the first method [equations (A-4) and (A-5)] is more efficient when Ñ > M (more data than model parameters), while the second, reciprocity method [equations (A-7) and (A-8)] is more efficient when M > Ñ.

However, this last statement does not take into account the particular structure of the matrix K and vectors a_i and b_i for 2-D magnetotellurics. K has a block-diagonal structure, with each block corresponding to one polarization and frequency combination. Furthermore, the nonzero elements of a_i and b_i, for any given i, are all associated with a common partition of v (since one 2-D MT datum conventionally involves only a single polarization and frequency). Therefore, only one block of each pseudoforward problem in equation (A-7) needs to be solved and, what is more, we may choose between the first and second methods independently for each polarization/frequency pair in computing its partition of A₂. The first (second) method is more efficient when the number of data for that polarization/frequency is larger (smaller) than the number of model parameters.
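As a hedged sketch of what the blockwise reciprocity method looks like in code (Python with SciPy; K_block, C, and G are assumed to be supplied by the finite-difference machinery and are not names from the paper), equations (A-7) and (A-8) become a factor-once, back-substitute-many computation:

import scipy.sparse.linalg as spla

def A2_block_reciprocity(K_block, C, G):
    # One polarization/frequency block of A2 via equations (A-7)-(A-8).
    # K_block : sparse complex-symmetric block of K (n x n), assumed given
    # C       : n x Ntilde dense array whose columns are the vectors c_i
    #           for this block's data
    # G       : n x M dense array whose columns are d_j s - (d_j K) v,
    #           assembled from the sparse derivative terms (assumed given)
    lu = spla.splu(K_block.tocsc())   # factor the block once ...
    U = lu.solve(C)                   # ... then solve K u_i = c_i for all i  (A-7)
    return U.T @ G                    # A2[i, j] = u_i^T [d_j s - (d_j K) v]  (A-8)

The first method is the mirror image: apply the same factorization to the M columns of G to obtain the vectors ∂_j v of equation (A-5), then form Cᵀ times the result. Per block, one simply picks whichever set of solves is smaller.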
Computation of Ãp and Ãᵀq̃

From equation (A-3), we have

    Ãp = A₁p + A₂p,
    Ãᵀq̃ = A₁ᵀq̃ + A₂ᵀq̃.

Again, we assume the first term of each expression can be computed explicitly, and we turn our attention to the second terms.

The algorithm of Mackie and Madden (1993) for A₂p may be derived as follows. From equation (A-4), we have

    (A₂p)_i = Σ_j A₂^ij p_j = c_iᵀ t,    (A-9)

where the vector t is given by

    t = Σ_j p_j ∂_j v.

From equation (A-5), it is clear that t satisfies

    K t = Σ_j p_j [∂_j s − (∂_j K) v].    (A-10)

The algorithm for A₂p is to solve the single forward problem, equation (A-10), for t and then evaluate equation (A-9).

The Mackie-Madden method for A₂ᵀq̃ can be derived similarly. From equation (A-8), we have

    (A₂ᵀq̃)_j = Σ_i q̃_i A₂^ij = rᵀ [∂_j s − (∂_j K) v],    (A-11)

where we define the vector r by

    r = Σ_i q̃_i u_i.

From equation (A-7), r satisfies

    K r = Σ_i q̃_i c_i.    (A-12)

The algorithm for A₂ᵀq̃ is to solve equation (A-12) and substitute into equation (A-11).

The major computation in each of these algorithms is the solution of one pseudoforward problem: for r in equation (A-12) or t in equation (A-10).
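In code, each Mackie-Madden product then costs a single sparse solve. A sketch under the same assumed inputs as above (K_lu an LU factor of K from scipy.sparse.linalg.splu, G the matrix whose columns are ∂_j s − (∂_j K)v, and C the matrix whose columns are c_i):

def A2_times_p(K_lu, G, C, p):
    # A2 p via equations (A-9)-(A-10): one pseudoforward solve.
    t = K_lu.solve(G @ p)         # K t = sum_j p_j [d_j s - (d_j K) v]   (A-10)
    return C.T @ t                # (A2 p)_i = c_i^T t                    (A-9)

def A2T_times_q(K_lu, G, C, q_tilde):
    # A2^T q~ via equations (A-11)-(A-12): one pseudoforward solve.
    r = K_lu.solve(C @ q_tilde)   # K r = sum_i q~_i c_i                  (A-12)
    return G.T @ r                # (A2^T q~)_j = r^T [d_j s - (d_j K) v] (A-11)

# Usage sketch: K_lu = scipy.sparse.linalg.splu(K.tocsc()), with G and C
# assembled as in the previous block.

Because K is symmetric rather than Hermitian, plain transposes, not conjugate transposes, appear throughout, matching equations (A-9) and (A-11).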

