COMPUTATIONAL AND
APPLIED MATHEMATICS
Abstract
A new model-trust region algorithm for problems in unconstrained optimization and nonlinear equations utilizing a
quadratic interpolant for step selection is presented and analyzed. This is offered as an alternative to the piecewise-linear
interpolant employed in the widely used "double dogleg" step selection strategy. After the new step selection algorithm has
been presented, we offer a summary, with proofs, of its desirable mathematical properties. Numerical results illustrating
the efficacy of this new approach are presented. © 1998 Elsevier Science B.V. All rights reserved.
Keywords: Model-trust region algorithms; Nonlinear equations; Unconstrained optimization
1. Introduction
We are concerned in this paper with a new procedure for solving unconstrained optimization
problems whether they arise directly or indirectly through minimization of the norm of the residual
of a system of nonlinear equations. Of particular interest are problems of sufficient complexity to
preclude the determination of an initial guess for the iteration that is close to the desired solution.
This situation is quite common in engineering design, one such problem arising in the numerical
simulation of semiconductor devices [4].
Newton's method [6] is a powerful tool for solving such problems since it exhibits local q-quadratic
convergence, i.e. it is q-quadratically convergent provided that we start the iteration sufficiently close
to the solution. In order to construct a globally convergent variant, Newton's method is hybridized
with the globally, yet typically slowly, convergent Cauchy's method (steepest descent). The resulting
so-called model-trust region algorithms [2] retain the best features of both methods: strong global
convergence coupled with rapid local convergence (i.e. they are globally convergent with a q-quadratic local rate).
The various model-trust region algorithms differ from one another in precisely how this hybridization
is achieved. This matter will be further elaborated upon in the next section.
The present paper offers an improvement to the standard "double dogleg" version of this algorithm
via replacement of its piecewise-linear approximant by a quadratic interpolant. In the ensuing pages,
we first introduce the model-trust region strategy. We then present our new step selection algorithm,
describe its mathematical properties, and compare it to the "double dogleg" strategy. Finally, we
report on the strenuous exercise of the new algorithm on a suite of standard test problems [5] as
well as on the numerical simulation of a p-n diode [4]. Additional details of the test problems
appear in [3]. We focus herein on those aspects of our algorithm which are innovative. For details on
model-trust region algorithms in general (e.g. scaling, initialization, termination, Hessian estimation,
linear solver), the reader is referred to the superlative text [1].
We consider the unconstrained minimization problem
$$\min_{x \in \mathbb{R}^n} f(x), \tag{1}$$
with $f(x)$ assumed twice continuously differentiable. We seek a local minimizer, $x_*$, for this problem.
For unconstrained maximization, simply replace $f(x)$ by $-f(x)$. In what follows, all unspecified
norms are Euclidean 2-norms, $\|x\| := \sqrt{x^T x}$, and $\|x\|_A := \sqrt{x^T A x}$ with $A$ symmetric and positive
definite (SPD).
In Newton's method [6], the step at each stage is selected to minimize the local quadratic model:
$$m_c(x_c + s) := f_c + \nabla f_c^T s + \tfrac{1}{2} s^T \nabla^2 f_c \, s, \tag{2}$$
where a subscript denotes the point of evaluation, in this instance the current point, $x_c$. Applying
the first derivative test for a local extremum, we arrive at the equations for the Newton step:
$$\nabla^2 f_c \, s_N = -\nabla f_c. \tag{3}$$
If $\nabla^2 f_c$ is not positive definite then we modify the local quadratic model as follows:
$$\hat{m}_c(x_c + s) := f_c + \nabla f_c^T s + \tfrac{1}{2} s^T H_c \, s, \tag{4}$$
where $\mu_c \ge 0$ is selected so that the modified model Hessian, $H_c := \nabla^2 f_c + \mu_c I$, is "safely positive
definite". Assuming that $\nabla^2 f_*$ is nonsingular, $\nabla^2 f_c$ will be SPD when we are close to the local
minimizer, so that $\mu_c = 0$ eventually. Further justification for this modification as well as details of
its implementation are available in [1].
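The modification of the model Hessian can be illustrated with a minimal sketch. It assumes a simple rule that is not taken from [1]: the shift `mu` is doubled from a hypothetical starting value `mu0` until a Cholesky factorization succeeds, which gives positive definiteness but not the "safe" margin that the text asks for. The function names are illustrative.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T, or None if a is not SPD."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None  # a non-positive pivot means a is not SPD
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def solve_spd(low, b):
    """Solve (L L^T) x = b by forward and then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def modified_newton_step(hess, grad, mu0=1e-3):
    """Solve (hess + mu I) s = -grad, doubling mu until Cholesky succeeds.

    The doubling rule and mu0 are illustrative assumptions, not the rule of [1].
    """
    n = len(grad)
    mu = 0.0
    while True:
        shifted = [[hess[i][j] + (mu if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        low = cholesky(shifted)
        if low is not None:
            return solve_spd(low, [-g for g in grad]), mu
        mu = mu0 if mu == 0.0 else 2.0 * mu
```

When the Hessian is already SPD the sketch returns the unmodified Newton step with `mu = 0`, mirroring the remark that $\mu_c = 0$ eventually.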
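Similarly, we may also address the root-finding problem:
$$F(x_*) = 0, \qquad x_* \in \mathbb{R}^n, \qquad F : \mathbb{R}^n \to \mathbb{R}^n, \tag{5}$$
via the associated minimization problem:
$$\min_{x \in \mathbb{R}^n} f(x), \qquad f(x) := \tfrac{1}{2} F(x)^T F(x) = \tfrac{1}{2} \|F(x)\|^2. \tag{6}$$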
B.J. McCartin / Journal of Computational and Applied Mathematics 91 (1998) 249-259
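Taylor series expansion about the current iterate, $x_c$, yields the following local approximation to
$f(x)$:
$$m_c(x_c + s) := \tfrac{1}{2} F_c^T F_c + (J_c^T F_c)^T s + \tfrac{1}{2} s^T \Big[ J_c^T J_c + \sum_{i=1}^{n} F_i(x_c) \nabla^2 F_i(x_c) \Big] s, \tag{7}$$
where $J_c$ denotes the Jacobian of $F$ at $x_c$. Rather than having to compute the Hessian matrices, $\nabla^2 F_i$ ($i = 1, \ldots, n$), we observe that their
coefficients vanish at $x_*$. Thus, we omit the above summation and introduce the modified local
quadratic model:
$$\hat{m}_c(x_c + s) := \tfrac{1}{2} F_c^T F_c + (J_c^T F_c)^T s + \tfrac{1}{2} s^T J_c^T J_c \, s. \tag{8}$$
Since the gradients of $\hat{m}_c$, $m_c$, and $f$ are identical at $x_c$, they share descent directions from that point.
In particular, $p_{SD} := -J_c^T F_c$ (steepest descent direction) and $p_N := -J_c^{-1} F_c$ (Newton direction) are
shared by all three functions. Eq. (8) may be recast in the form of Eq. (4) by the identifications
$\nabla f_c = J_c^T F_c$ and $H_c = J_c^T J_c$, which is positive definite provided that $J_c$ is nonsingular.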
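The identifications $\nabla f_c = J_c^T F_c$ and $H_c = J_c^T J_c$ can be sketched in a few lines. The function names and the 2-by-2 data below are illustrative assumptions; the example also evaluates the modified model at the Newton direction $p_N = -J_c^{-1} F_c$, where the linearized residual $F_c + J_c p_N$ vanishes.

```python
def gauss_newton_model(jac, res):
    """Return (grad, hess) with grad = J^T F and hess = J^T J.

    No second derivatives of F are needed, matching the modified model.
    """
    m, n = len(jac), len(jac[0])
    grad = [sum(jac[k][i] * res[k] for k in range(m)) for i in range(n)]
    hess = [[sum(jac[k][i] * jac[k][j] for k in range(m)) for j in range(n)]
            for i in range(n)]
    return grad, hess


def model_value(grad, hess, res, s):
    """Evaluate 0.5 F^T F + grad^T s + 0.5 s^T hess s."""
    n = len(s)
    h_s = [sum(hess[i][j] * s[j] for j in range(n)) for i in range(n)]
    return (0.5 * sum(r * r for r in res)
            + sum(grad[i] * s[i] + 0.5 * s[i] * h_s[i] for i in range(n)))


# Illustrative data (not from the paper): Jacobian and residual at x_c.
jac = [[1.0, 2.0], [3.0, 4.0]]
res = [1.0, 1.0]
grad, hess = gauss_newton_model(jac, res)

# For this jac, J^{-1} F = [-1, 1], so the Newton direction is [1, -1].
p_newton = [1.0, -1.0]
print(grad, hess, model_value(grad, hess, res, p_newton))
```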
A variant of Newton's method for either unconstrained minimization or nonlinear equations with
attractive global convergence properties may be derived as follows. A pure Newton iteration would
correspond to selecting $s$ to minimize $\hat{m}_c$ regardless of the quality of this local quadratic model. The
model-trust region algorithms improve upon this by utilizing an estimate, $\delta_c$, of the radius of the
region about $x_c$ in which $\hat{m}_c$ adequately represents $f$. Thus, the next iterate is determined by the
step, $s$, which solves the locally constrained minimization problem:
$$\min_{s} \hat{m}_c(x_c + s) \quad \text{subject to} \quad \|s\| \le \delta_c. \tag{9}$$
This stands in stark contrast to restricted-step Newton methods which retain the Newton step direction
but simply reduce the step length when far from $x_*$.
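Defining
$$s(\lambda) := -\Big( H_c + \frac{\lambda}{1 - \lambda} I \Big)^{-1} \nabla f_c \quad (0 \le \lambda < 1), \qquad s(1) := 0, \tag{10}$$
we may now state the fundamental theorem of model-trust region algorithms (see Fig. 1):
Theorem 1 (Fundamental Theorem of model-trust region algorithms). The solution to the locally
constrained minimization problem, Eq. (9), is given by
$$s_+ = s(\lambda_+), \tag{11}$$
where $\lambda_+ = 0$ if $\|s_N\| \le \delta_c$ and otherwise $\lambda_+$ is the unique value in $(0, 1)$ for which $\|s(\lambda_+)\| = \delta_c$.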
Note that the curve, $s(\lambda)$, has the following properties [1]: $s(0) = s_N$, $s(1) = 0$, $s'(1) = \nabla f_c$.
Thus, the associated step-directions range between the Newton ($\delta_c$ large) and the Cauchy ($\delta_c$ small)
directions. Also, $s(\lambda)$ is a descent direction from $x_c$ for all $0 \le \lambda < 1$ and distance from $x_c$ increases
as $\lambda$ decreases. In practice, the difficulty in applying Theorem 1 lies in the complexity of finding
$\lambda_+$. If we attempt to find it exactly, as is done in the optimal "hook step" strategy [1], then systems
of equations of the form
$$\Big( H_c + \frac{\lambda_k}{1 - \lambda_k} I \Big) s_k = -\nabla f_c \tag{12}$$
must be solved for a sequence of trial values $\lambda_k$.
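The cost of locating $\lambda_+$ exactly can be made concrete with a short sketch. It assumes plain bisection on $\lambda$, which is not the iteration used by the hook step of [1], and it counts one linear solve of the form of Eq. (12) per trial value. The helper names are hypothetical.

```python
import math


def solve(a, b):
    """Gaussian elimination without pivoting (adequate for SPD matrices)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def s_of_lambda(hess, grad, lam):
    """Solve (H + lam / (1 - lam) I) s = -grad for 0 <= lam < 1."""
    mu = lam / (1.0 - lam)
    n = len(grad)
    shifted = [[hess[i][j] + (mu if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    return solve(shifted, [-g for g in grad])


def hook_step(hess, grad, delta, tol=1e-10):
    """Bisect on lam until ||s(lam)|| = delta; return (step, lam, solves)."""
    def norm(v):
        return math.sqrt(sum(t * t for t in v))

    s = s_of_lambda(hess, grad, 0.0)  # Newton step
    solves = 1
    if norm(s) <= delta:
        return s, 0.0, solves
    lo, hi = 0.0, 1.0
    s_hi = [0.0] * len(grad)  # s(1) = 0 lies inside the trust region
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        s = s_of_lambda(hess, grad, mid)
        solves += 1
        if norm(s) > delta:
            lo = mid
        else:
            hi, s_hi = mid, s
    return s_hi, hi, solves
```

Each pass through the loop costs a fresh factorization of a shifted matrix, which is the expense that approximate step selection strategies try to avoid.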
In the new step selection algorithm (see Fig. 3), we approximate $s_+$ by replacing $s(\lambda)$ by the
quadratic interpolant:
$$\sigma(\eta) := (\eta - 1)\,[(\eta - 1) s_N + \eta \beta \nabla f_c]; \qquad 0 \le \eta \le 1, \tag{13}$$
when $s_N$ lies outside the trust region. For the moment, we leave $\beta > 0$ unspecified. We next summarize
some general properties of $\sigma(\eta)$.
Theorem 2. For any $\beta > 0$, $x_c + \sigma(\eta)$ passes through $x_c$ and $x_N$ and is tangent to $x_c + s(\lambda)$ at $x_c$.
Moreover, $\sigma(\eta)$ is a descent direction from $x_c$ for any $0 \le \eta < 1$.
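The claims of Theorem 2 can be checked numerically with a small sketch. The gradient, the Newton step, and the value of `beta` below are illustrative assumptions (the Newton step corresponds to a Hessian of diag(14, 2)); the function names are not from the paper.

```python
def sigma(eta, s_newton, grad, beta):
    """Quadratic interpolant of Eq. (13), evaluated componentwise."""
    return [(eta - 1.0) * ((eta - 1.0) * s + eta * beta * g)
            for s, g in zip(s_newton, grad)]


def sigma_prime(eta, s_newton, grad, beta):
    """Derivative of sigma with respect to eta."""
    return [2.0 * (eta - 1.0) * s + (2.0 * eta - 1.0) * beta * g
            for s, g in zip(s_newton, grad)]


def is_descent(eta, s_newton, grad, beta):
    """True when grad^T sigma(eta) < 0, i.e. sigma(eta) points downhill."""
    step = sigma(eta, s_newton, grad, beta)
    return sum(g * v for g, v in zip(grad, step)) < 0.0


# Illustrative data: gradient and Newton step for H = diag(14, 2).
grad = [6.0, 2.0]
s_newton = [-6.0 / 14.0, -1.0]
beta = 0.3  # any positive value; Theorem 2 does not depend on the choice

print(sigma(0.0, s_newton, grad, beta))        # the Newton step
print(sigma(1.0, s_newton, grad, beta))        # the zero vector
print(sigma_prime(1.0, s_newton, grad, beta))  # beta times the gradient
```

Since $\sigma'(1) = \beta \nabla f_c$ is a positive multiple of $s'(1) = \nabla f_c$, the two curves share a tangent at $x_c$.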
Ideally, the free parameter, $\beta$, would be chosen to achieve tangency at $x_N$ also. However, this is
equivalent to selecting $\beta$ so that $-2 s_N - \beta \nabla f_c \parallel -H_c^{-1} s_N$ and would thus require that an additional
system of equations be solved. We wish to avoid this added cost. In the next section, we show that
the choice
$$\beta := \sqrt{\frac{-2\, s_N^T \nabla f_c}{\nabla f_c^T H_c \nabla f_c}} \tag{14}$$
endows the curve $x_c + \sigma(\eta)$ with certain highly desirable properties, thus obviating the need for
multiple linear solves at each step of the model-trust region iteration.
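A minimal sketch of the computation of $\beta$ follows. It assumes that Eq. (14) reads $\beta = \sqrt{-2\, s_N^T \nabla f_c / (\nabla f_c^T H_c \nabla f_c)}$, which is consistent with the condition on $\beta^2$ quoted in the proof of Theorem 5 and with the value $\beta = 0.1336$ reported for the example. The function name is hypothetical.

```python
import math


def beta_eq14(hess, grad, s_newton):
    """Return sqrt(-2 s_N^T grad / (grad^T H grad)), Eq. (14) as read here.

    Only one matrix-vector product is required; no additional linear solve.
    """
    n = len(grad)
    h_grad = [sum(hess[i][j] * grad[j] for j in range(n)) for i in range(n)]
    curvature = sum(grad[i] * h_grad[i] for i in range(n))  # grad^T H grad > 0
    slope = sum(s_newton[i] * grad[i] for i in range(n))    # s_N^T grad < 0
    return math.sqrt(-2.0 * slope / curvature)


# Data of the two-variable example: H_c = diag(14, 2), gradient [6, 2].
print(beta_eq14([[14.0, 0.0], [0.0, 2.0]], [6.0, 2.0], [-6.0 / 14.0, -1.0]))
```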
4. Mathematical properties
In order to fully explore the consequences of the above choice for $\beta$, Eq. (14), we need to lay
some preliminary groundwork. We begin with:
Lemma 1. Let $A, B, C \in \mathbb{R}^{n \times n}$ with $A$ and $C$ positive definite and $x, y \in \mathbb{R}^n$; then the following are
equivalent:
$$|x^T B y|^2 \le (x^T A x)(y^T C y), \tag{15}$$
Proof. This is Theorem 7.7.7 [2, p. 473] specialized to real, square matrices. □
This lemma has the following important consequence.
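Theorem 4 (Cauchy-Newton inequality). The size of the Cauchy step is always bounded above by
the length of the projection of the Newton step onto the Cauchy step.
Proof.
$$s_{SD} := -\frac{\|\nabla f_c\|^2}{\nabla f_c^T H_c \nabla f_c}\, \nabla f_c \;\Rightarrow\; \|s_{SD}\| = \frac{\|\nabla f_c\|^3}{\nabla f_c^T H_c \nabla f_c};$$
$$s_N := -H_c^{-1} \nabla f_c \;\Rightarrow\; \|\text{projection of } s_N \text{ onto } \nabla f_c\| = \frac{\nabla f_c^T H_c^{-1} \nabla f_c}{\|\nabla f_c\|}.$$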
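The two lengths above can be compared by a Cauchy-Schwarz argument. The following is a sketch under the sole assumption that $H_c$ is SPD, so that $H_c^{1/2}$ exists; it is not necessarily the route taken through Lemma 1.

```latex
\begin{align*}
\|\nabla f_c\|^4
  &= \left[ \big(H_c^{1/2} \nabla f_c\big)^T \big(H_c^{-1/2} \nabla f_c\big) \right]^2 \\
  &\le \big(\nabla f_c^T H_c \nabla f_c\big)\,\big(\nabla f_c^T H_c^{-1} \nabla f_c\big), \\[4pt]
\text{so that}\quad
\|s_{SD}\| = \frac{\|\nabla f_c\|^3}{\nabla f_c^T H_c \nabla f_c}
  &\le \frac{\nabla f_c^T H_c^{-1} \nabla f_c}{\|\nabla f_c\|}.
\end{align*}
```

The last line follows on dividing the inequality by $\|\nabla f_c\| \, (\nabla f_c^T H_c \nabla f_c) > 0$.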
Lemma 2.
$$\beta \le -\sqrt{2} \cdot \frac{s_N^T \nabla f_c}{\|\nabla f_c\|^2}, \tag{21}$$
$$\beta \le -\sqrt{2} \cdot \frac{\|s_N\|^2}{s_N^T \nabla f_c}. \tag{22}$$
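Upper bounds on $\beta$ of this kind can be derived as follows. This is a sketch that assumes Eq. (14) reads $\beta^2 = -2\, s_N^T \nabla f_c / (\nabla f_c^T H_c \nabla f_c)$ and uses the Cauchy-Newton inequality in the form $\|\nabla f_c\|^4 \le (\nabla f_c^T H_c \nabla f_c)(-s_N^T \nabla f_c)$, which holds because $\nabla f_c^T H_c^{-1} \nabla f_c = -s_N^T \nabla f_c$.

```latex
\begin{align*}
\beta^2 = \frac{2\,\big(-s_N^T \nabla f_c\big)}{\nabla f_c^T H_c \nabla f_c}
  \le \frac{2\,\big(s_N^T \nabla f_c\big)^2}{\|\nabla f_c\|^4}
  \quad&\Longrightarrow\quad
  \beta \le -\sqrt{2}\, \frac{s_N^T \nabla f_c}{\|\nabla f_c\|^2}, \\[4pt]
\big(s_N^T \nabla f_c\big)^2 \le \|s_N\|^2 \|\nabla f_c\|^2
  \quad&\Longrightarrow\quad
  -\frac{s_N^T \nabla f_c}{\|\nabla f_c\|^2} \le -\frac{\|s_N\|^2}{s_N^T \nabla f_c}
  \quad\Longrightarrow\quad
  \beta \le -\sqrt{2}\, \frac{\|s_N\|^2}{s_N^T \nabla f_c}.
\end{align*}
```

Both right-hand sides are positive because $s_N^T \nabla f_c < 0$ when $H_c$ is SPD and $\nabla f_c \ne 0$.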
We are finally in a position to fully describe the mathematical properties of the curve $x_c + \sigma(\eta)$
with our choice of $\beta$. We summarize these as:
Theorem 5. With $\sigma(\eta)$ given by Eq. (13) and $\beta$ defined by Eq. (14):
(i) Distance from $x_c$ increases monotonically as we proceed along the curve, $x_c + \sigma(\eta)$, from
$x_c$ ($\eta = 1$) to $x_N$ ($\eta = 0$).
(ii) The value of the local quadratic model, $\hat{m}_c(x_c + \sigma(\eta))$, decreases monotonically as we traverse
the curve, $x_c + \sigma(\eta)$, from $x_c$ ($\eta = 1$) to $x_N$ ($\eta = 0$).
Proof.
(i) A direction, $p$, from $x_c + \sigma(\eta)$ increases distance from $x_c$ iff $p^T \sigma(\eta) > 0$. The tangent vector
to the path $x_c + \sigma(\eta)$ from $x_c$ to $x_N$ is parallel to $p := -\sigma'(\eta)$, so that we need to establish that
$\sigma'(\eta)^T \sigma(\eta) < 0$ for $0 < \eta < 1$. But, $\sigma'(\eta) = 2(\eta - 1) s_N + (2\eta - 1) \beta \nabla f_c$, so that $\sigma'(\eta)^T \sigma(\eta) = (\eta - 1) \cdot u(\eta)$
where
$$u(\eta) := \eta^2 \cdot [2 \|\nabla f_c\|^2 \beta^2 + 4 s_N^T \nabla f_c \beta + 2 \|s_N\|^2] - \eta \cdot [\|\nabla f_c\|^2 \beta^2 + 5 s_N^T \nabla f_c \beta + 4 \|s_N\|^2] + [s_N^T \nabla f_c \beta + 2 \|s_N\|^2].$$
We need to establish that $u(\eta) > 0$ for $0 < \eta < 1$. Note that $u(0) = 2 \|s_N\|^2 + s_N^T \nabla f_c \beta \ge 0$
by Lemma 2, and that $u(1) = \beta^2 \|\nabla f_c\|^2 > 0$. Thus, either both roots of this quadratic lie in $(0, 1)$
or neither one does. Furthermore, the product of the roots [7] of this quadratic exceeds 1, again by
Lemma 2, so that both roots cannot lie in $(0, 1)$ and hence neither root can either. Hence, $u(\eta) > 0$
for $0 < \eta < 1$.
(ii) The directional derivative of $\hat{m}_c(x_c + \sigma(\eta))$ is $[\nabla f_c + H_c \sigma(\eta)]^T \cdot \sigma'(\eta) / \|\sigma'(\eta)\|$ and must be
shown to be positive for $0 < \eta < 1$. We will accomplish this by establishing that
$$d(\eta) := [\nabla f_c^T + (\eta - 1)^2 s_N^T H_c + \eta (\eta - 1) \beta \nabla f_c^T H_c] \cdot [2(\eta - 1) s_N + (2\eta - 1) \beta \nabla f_c] > 0$$
for $0 < \eta < 1$, since it has the same sign as the directional derivative. Note that $d(0) = 0$, $d(1) = \beta \|\nabla f_c\|^2 > 0$,
$d'(0) = -4 \beta \|\nabla f_c\|^2 - 6 s_N^T \nabla f_c > 0$, once again by Lemma 2, and $d'(1) = 2 \beta \|\nabla f_c\|^2 > 0$. Thus, it will suffice to exclude either
of the other two roots of the cubic, $d(\eta)$, from the interval $(0, 1)$. This is achieved by rewriting
$d(\eta) = \eta \cdot Q(\eta)$ with the quadratic $Q(\eta) := A \eta^2 + B \eta + C$ and
$$A := 2 \beta^2 \nabla f_c^T H_c \nabla f_c - 4 \beta \|\nabla f_c\|^2 - 2 s_N^T \nabla f_c,$$
$$B := -3 \beta^2 \nabla f_c^T H_c \nabla f_c + 9 \beta \|\nabla f_c\|^2 + 6 s_N^T \nabla f_c,$$
$$C := \beta^2 \nabla f_c^T H_c \nabla f_c - 4 \beta \|\nabla f_c\|^2 - 4 s_N^T \nabla f_c.$$
The product of the roots [7] of $Q(\eta)$ is $C/A \ge 1$ provided that $\beta^2 \le -2 \cdot s_N^T \nabla f_c / \nabla f_c^T H_c \nabla f_c$, which is assured
by Eq. (14). Thus, for our choice of $\beta$, the two roots of $Q(\eta)$ cannot both reside on $(0, 1)$, so that
neither can separately. Hence, we have that $d(\eta)$ is nonnegative on $[0, 1]$ and is positive on $(0, 1)$.
□
Note that (i) precludes the quadratic trajectory, $x_c + \sigma(\eta)$, $\eta \in [0, 1]$, from "overshooting" $x_N$ and
"turning back on itself", so that there will be a unique point of intersection of the path with the
trust region boundary. Also, (ii) asserts that it is pointless to stop along the curve, $x_c + \sigma(\eta)$, before
reaching the trust region boundary since the local quadratic model, $\hat{m}_c$, which we "trust" is still
decreasing.
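Both monotonicity properties can be spot-checked by sampling the curve. The sketch below is a numerical illustration on a finite grid, not a substitute for the proof; it assumes that Eq. (14) reads $\beta = \sqrt{-2\, s_N^T \nabla f_c / (\nabla f_c^T H_c \nabla f_c)}$, and the function name and sample count are hypothetical.

```python
import math


def check_theorem5(hess, grad, s_newton, samples=200):
    """Sample the curve from eta = 1 (x_c) to eta = 0 (x_N) and test monotonicity.

    Returns (distance_increases, model_decreases) for the sampled points.
    """
    n = len(grad)
    h_grad = [sum(hess[i][j] * grad[j] for j in range(n)) for i in range(n)]
    curvature = sum(grad[i] * h_grad[i] for i in range(n))
    slope = sum(s_newton[i] * grad[i] for i in range(n))
    beta = math.sqrt(-2.0 * slope / curvature)  # Eq. (14) as read here

    dists, models = [], []
    for k in range(samples, -1, -1):
        eta = k / samples
        sig = [(eta - 1.0) * ((eta - 1.0) * s_newton[i] + eta * beta * grad[i])
               for i in range(n)]
        h_sig = [sum(hess[i][j] * sig[j] for j in range(n)) for i in range(n)]
        dists.append(math.sqrt(sum(v * v for v in sig)))
        # Model value relative to f_c: grad^T sigma + 0.5 sigma^T H sigma.
        models.append(sum(grad[i] * sig[i] + 0.5 * sig[i] * h_sig[i]
                          for i in range(n)))

    distance_increases = all(b > a for a, b in zip(dists, dists[1:]))
    model_decreases = all(b < a for a, b in zip(models, models[1:]))
    return distance_increases, model_decreases


# Illustrative data: H = diag(14, 2), gradient [6, 2], Newton step -H^{-1} grad.
print(check_theorem5([[14.0, 0.0], [0.0, 2.0]], [6.0, 2.0], [-6.0 / 14.0, -1.0]))
```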
Thus, it is well defined and reasonable to specify $\eta_+$ as the smallest positive root of the quartic
$$q(\eta) := \|\sigma(\eta)\|^2 - \delta_c^2, \tag{23}$$
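One way to locate $\eta_+$ is sketched below. It substitutes bisection for a dedicated quartic root-finder, relying on property (i) of Theorem 5, which makes $\|\sigma(\eta)\|$ monotone on $[0, 1]$ so that the root in that interval is unique. The function name and tolerance are assumptions.

```python
import math


def eta_plus(s_newton, grad, beta, delta, tol=1e-12):
    """Return (eta, sigma(eta)) with ||sigma(eta)|| = delta.

    If the Newton step already lies inside the trust region, return (0, s_newton).
    """
    def sigma(eta):
        return [(eta - 1.0) * ((eta - 1.0) * s + eta * beta * g)
                for s, g in zip(s_newton, grad)]

    def q(eta):  # the quartic of Eq. (23)
        return sum(v * v for v in sigma(eta)) - delta * delta

    if q(0.0) <= 0.0:
        return 0.0, list(s_newton)
    lo, hi = 0.0, 1.0  # q(lo) > 0 and q(hi) = -delta**2 < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    eta = 0.5 * (lo + hi)
    return eta, sigma(eta)


# Illustrative call with gradient [6, 2], Newton step [-3/7, -1], radius 0.5.
beta = math.sqrt((64.0 / 7.0) / 512.0)
print(eta_plus([-3.0 / 7.0, -1.0], [6.0, 2.0], beta, 0.5))
```

Each evaluation of $q$ costs only vector operations, so no linear systems are solved during the search.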
1 -
J,
$$f(x_1, x_2) := x_1^4 + x_1^2 + x_2^2, \tag{25}$$
with
$$x_c = [1, 1]^T, \qquad \delta_c = 0.5. \tag{26}$$
This produces
$$H_c = \begin{bmatrix} 14 & 0 \\ 0 & 2 \end{bmatrix} \tag{27}$$
and
$$s_N = \begin{bmatrix} -0.429 \\ -1 \end{bmatrix}, \qquad s_{SD} = \begin{bmatrix} -0.469 \\ -0.156 \end{bmatrix}. \tag{28}$$
The new algorithm yields (see Fig. 3) $\beta = 0.1336$, $\eta_+ = 0.444$, $\sigma_+ = [-0.330, -0.375]^T$, while
$s_+ = [-0.343, -0.365]^T$. Comparison to the standard "double dogleg" step of $[-0.340, -0.669]^T$ (see
Fig. 2) indicates that the new algorithm provides a dramatic increase in accuracy at essentially no
additional cost.
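The quality of the quoted steps can be compared through the model reduction each one achieves. The sketch below uses only the rounded vectors quoted above and assumes the gradient $\nabla f_c = [6, 2]^T$ and Hessian diag(14, 2), obtained by differentiating $f(x_1, x_2) = x_1^4 + x_1^2 + x_2^2$ at $x_c = [1, 1]^T$. The function name is hypothetical.

```python
def model_change(hess, grad, s):
    """Return m_hat_c(x_c + s) - f_c = grad^T s + 0.5 * s^T hess s."""
    n = len(s)
    h_s = [sum(hess[i][j] * s[j] for j in range(n)) for i in range(n)]
    return sum(grad[i] * s[i] + 0.5 * s[i] * h_s[i] for i in range(n))


hess = [[14.0, 0.0], [0.0, 2.0]]  # model Hessian of the example
grad = [6.0, 2.0]                 # assumed: gradient of f at x_c = [1, 1]^T

steps = {
    "Cauchy step s_SD": [-0.469, -0.156],
    "new step sigma_+": [-0.330, -0.375],
    "hook step s_+": [-0.343, -0.365],
}
changes = {name: model_change(hess, grad, s) for name, s in steps.items()}
for name, value in changes.items():
    print(f"{name}: {value:.4f}")
```

With these rounded inputs the new step recovers nearly all of the model reduction of the hook step and clearly more than the Cauchy step.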
Table 1
Summary of numerical results
6. Numerical results
With the dual aims of establishing the validity and the utility of our algorithm, Table 1 presents
the results of a sequence of numerical examples. They run the gamut from low-dimensional systems
with known solutions [5] to a cutting-edge problem in semiconductor device simulation (p-n diode)
[4]. All simulations were performed in double precision using the Fortran77-3.0.1 compiler under the
SunOS-5.5.1 operating system on a Sun Microsystems E3000 UltraEnterprise workstation. Complete
details are available elsewhere [3].
7. Conclusion
In the preceding sections, we have presented a flexible and powerful new model-trust region
algorithm for unconstrained optimization and systems of nonlinear equations based upon quadratic
interpolation. The mathematical properties of this algorithm have been described and full proofs
have been provided. Comparison to the popular "double dogleg" strategy has been made. The new
algorithm has been validated on a suite of widely used test problems from the literature [5] and its
practical utility has been demonstrated on a challenging problem from semiconductor device
simulation [4]. The full details pertaining to these test cases have appeared elsewhere [3]. In closing, we
make the observation that methods based upon higher order interpolation might be contemplated.
However, the concomitant increase in algorithmic complexity would demand numerical and/or
analytical justification.
8. Acknowledgements
The author would like to thank Ms. Barbara Rowe for her assistance in the production of this
paper.
References
[1] J.E. Dennis, R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-
Hall, Englewood Cliffs, NJ, 1983.
[2] R.A. Horn, C.R. Johnson, Matrix Analysis, Cambridge, 1990.
[3] B.J. McCartin, A new model-trust region algorithm for systems of nonlinear equations, Proc. 13th Ann. Conf. on
Applied Mathematics (CAM97), University of Central Oklahoma, 1997, pp. 172-186.
[4] B.J. McCartin, R.H. Hobbs, R.E. LaBarre, P.E. Kirschner, Solution of the discrete semiconductor device equations,
in: J.J.H. Miller (Ed.), NASECODE IV: Proc. 4th Internat. Conf. on the Numerical Analysis of Semiconductor
Devices and Integrated Circuits, Boole Press, 1985, pp. 411-416.
[5] J.J. Moré, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Software
7 (1) (1981) 17-41.
[6] J.J. Moré, D.C. Sorensen, Newton's Method, in: G.H. Golub (Ed.), Studies in Numerical Analysis, MAA, 1984, pp.
29-82.
[7] J.V. Uspensky, Theory of Equations, McGraw-Hill, New York, 1948.