Journal of Computational and Applied Mathematics 91 (1998) 249-259

A model-trust region algorithm utilizing a quadratic interpolant

Brian J. McCartin *

Applied Mathematics, Kettering University, 1700 West Third Avenue, Flint, MI 48504-4898, USA

Received 12 March 1997; received in revised form 12 February 1998

Abstract

A new model-trust region algorithm for problems in unconstrained optimization and nonlinear equations utilizing a quadratic interpolant for step selection is presented and analyzed. This is offered as an alternative to the piecewise-linear interpolant employed in the widely used "double dogleg" step selection strategy. After the new step selection algorithm has been presented, we offer a summary, with proofs, of its desirable mathematical properties. Numerical results illustrating the efficacy of this new approach are presented. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Model-trust region algorithms; Nonlinear equations; Unconstrained optimization

1. Introduction

We are concerned in this paper with a new procedure for solving unconstrained optimization problems, whether they arise directly or indirectly through minimization of the norm of the residual of a system of nonlinear equations. Of particular interest are problems of sufficient complexity to preclude the determination of an initial guess for the iteration that is close to the desired solution. This situation is quite common in engineering design, one such problem arising in the numerical simulation of semiconductor devices [4].

Newton's method [6] is a powerful tool for solving such problems since it exhibits local q-quadratic convergence, i.e. it is q-quadratically convergent provided that we start the iteration sufficiently close to the solution. In order to construct a globally convergent variant, Newton's method is hybridized with the globally, yet typically slowly, convergent Cauchy's method (steepest descent). The resulting so-called model-trust region algorithms [2] retain the best features of both methods: strong global convergence coupled with rapid local convergence (i.e. they are globally convergent and locally q-quadratically convergent).

* Tel.: (810) 762-7802; fax: (810) 762-9796; e-mail: bmccarti@kettering.edu


The various model-trust region algorithms differ from one another in precisely how this hybridization is achieved. This matter will be further elaborated upon in the next section.

The present paper offers an improvement to the standard "double dogleg" version of this algorithm via replacement of its piecewise-linear approximant by a quadratic interpolant. In the ensuing pages, we first introduce the model-trust region strategy. We then present our new step selection algorithm, describe its mathematical properties, and compare it to the "double dogleg" strategy. Finally, we report on the strenuous exercise of the new algorithm on a suite of standard test problems [5] as well as on the numerical simulation of a p-n diode [4]. Additional details of the test problems appear in [3]. We focus herein on those aspects of our algorithm which are innovative. For details on model-trust region algorithms in general (e.g. scaling, initialization, termination, Hessian estimation, linear solver), the reader is referred to the superlative text [1].

2. Model-trust region algorithms

Consider the unconstrained minimization problem

\min_{x \in \mathbb{R}^n} f(x), \qquad f : \mathbb{R}^n \to \mathbb{R},   (1)

with $f(x)$ assumed twice continuously differentiable. We seek a local minimizer, $x_*$, for this problem. For unconstrained maximization, simply replace $f(x)$ by $-f(x)$. In what follows, all unspecified norms are Euclidean 2-norms, $\|x\| := \sqrt{x^T x}$, and $\|x\|_A := \sqrt{x^T A x}$ with $A$ symmetric and positive definite (SPD).
In Newton's method [6], the step at each stage is selected to minimize the local quadratic model:

m_c(x_c + s) := f_c + \nabla f_c^T s + \frac{1}{2} s^T \nabla^2 f_c \, s,   (2)

where a subscript denotes the point of evaluation, in this instance the current point, $x_c$. Applying the first derivative test for a local extremum, we arrive at the equations for the Newton step:

\nabla^2 f_c \, s_N = -\nabla f_c.   (3)

If $\nabla^2 f_c$ is not positive definite then we modify the local quadratic model as follows:

\hat{m}_c(x_c + s) := f_c + \nabla f_c^T s + \frac{1}{2} s^T H_c s,   (4)

where $\mu_c \geq 0$ is selected so that the modified model Hessian, $H_c := \nabla^2 f_c + \mu_c I$, is "safely positive definite". Assuming that $\nabla^2 f_*$ is nonsingular, $\nabla^2 f_c$ will be SPD when we are close to the local minimizer, so that $\mu_c = 0$ eventually. Further justification for this modification as well as details of its implementation are available in [1].
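For readers who wish to experiment, a minimal Python sketch of such a modified Newton step follows; the strategy of doubling $\mu_c$ until a Cholesky factorization succeeds is an illustrative simplification, not the more refined perturbation scheme of [1]:

```python
import numpy as np

def modified_newton_step(grad, hess, mu_min=1e-4):
    """Solve H_c s = -grad f_c with H_c = hess + mu_c I 'safely positive
    definite'.  mu_c is doubled until the Cholesky factorization succeeds;
    near a minimizer with an SPD Hessian, mu_c = 0 on the first try."""
    n = hess.shape[0]
    mu = 0.0
    while True:
        try:
            L = np.linalg.cholesky(hess + mu * np.eye(n))
            break
        except np.linalg.LinAlgError:
            mu = max(2.0 * mu, mu_min)
    y = np.linalg.solve(L, -grad)      # forward substitution
    return np.linalg.solve(L.T, y)     # back substitution
```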
Similarly, we may also address the root-finding problem:

F(x_*) = 0, \qquad x_* \in \mathbb{R}^n, \quad F : \mathbb{R}^n \to \mathbb{R}^n,   (5)

via its reformulation as the unconstrained minimization problem:

\min_{x \in \mathbb{R}^n} f(x), \qquad f(x) := \frac{1}{2} F(x)^T F(x) = \frac{1}{2} \|F(x)\|^2.   (6)

Taylor series expansion about the current iterate, $x_c$, yields the following local approximation to $f(x)$:

m_c(x_c + s) := \frac{1}{2} F(x_c)^T F(x_c) + [J(x_c)^T F(x_c)]^T s + \frac{1}{2} s^T \left[ J(x_c)^T J(x_c) + \sum_{i=1}^{n} F_i(x_c) \nabla^2 F_i(x_c) \right] s.   (7)

Rather than having to compute the Hessian matrices, $\nabla^2 F_i$ $(i = 1, \ldots, n)$, we observe that their coefficients vanish at $x_*$. Thus, we omit the above summation and introduce the modified local quadratic model:

\hat{m}_c(x_c + s) := \frac{1}{2} F(x_c)^T F(x_c) + [J(x_c)^T F(x_c)]^T s + \frac{1}{2} s^T [J(x_c)^T J(x_c)] s
                    = \frac{1}{2} M_c(x_c + s)^T M_c(x_c + s); \qquad M_c(x_c + s) := F(x_c) + J(x_c) s.   (8)

Since the gradients of $m_c$, $\hat{m}_c$, and $f$ are identical at $x_c$, they share descent directions from that point. In particular, $p_{SD} := -J^T F$ (steepest descent direction) and $p_N := -J^{-1} F$ (Newton direction) are shared by all three functions. Eq. (8) may be recast in the form of Eq. (4) by the identifications $\nabla f_c = J_c^T F_c$ and $H_c = J_c^T J_c$, which is positive definite provided that $J_c$ is nonsingular.
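In code, these identifications amount to a pair of matrix products; a short sketch (the function name is ours, not from [1]):

```python
import numpy as np

def gauss_newton_model(F, J):
    """Given the residual F(x_c) and Jacobian J(x_c), return the gradient
    and model Hessian of f = (1/2)||F||^2 per Eq. (8):
    grad f_c = J^T F and H_c = J^T J (SPD when J is nonsingular)."""
    return J.T @ F, J.T @ J
```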
A variant of Newton's method for either unconstrained minimization or nonlinear equations with attractive global convergence properties may be derived as follows. A pure Newton iteration would correspond to selecting $s$ to minimize $\hat{m}_c$ regardless of the quality of this local quadratic model. The model-trust region algorithms improve upon this by utilizing an estimate, $\delta_c$, of the radius of the region about $x_c$ in which $\hat{m}_c$ adequately represents $f$. Thus, the next iterate is determined by the step, $s$, which solves the locally constrained minimization problem:

\min_s \left[ \hat{m}_c(x_c + s) = f_c + \nabla f_c^T s + \frac{1}{2} s^T H_c s \right] \quad \text{s.t.} \quad \|s\| \leq \delta_c.   (9)

This stands in stark contrast to restricted-step Newton methods which retain the Newton step direction but simply reduce the step length when far from $x_*$.
Defining

s(\lambda) := -\left( H_c + \frac{\lambda}{1 - \lambda} I \right)^{-1} \nabla f_c,   (10)

we may now state the fundamental theorem of model-trust region algorithms (see Fig. 1):

Theorem 1 (Fundamental theorem of model-trust region algorithms). The solution to the locally constrained minimization problem, Eq. (9), is given by

s_+ = \begin{cases} s(0) = s_N, & \text{if } \|s(0)\| \leq \delta_c, \\ s(\lambda_+) \text{ such that } \|s(\lambda_+)\| = \delta_c, & \text{otherwise.} \end{cases}   (11)

Fig. 1. Model-trust region.

Proof. See [1, pp. 131-132]. □

Note that the curve, $s(\lambda)$, has the following properties [1]: $s(0) = s_N$, $s(1) = 0$, $s'(1) = \nabla f_c$. Thus, the associated step directions range between the Newton ($\delta_c$ large) and the Cauchy ($\delta_c$ small) directions. Also, $s(\lambda)$ is a descent direction from $x_c$ for all $0 \leq \lambda < 1$, and distance from $x_c$ increases as $\lambda$ decreases. In practice, the difficulty in applying Theorem 1 lies in the complexity of finding $\lambda_+$. If we attempt to find it exactly, as is done in the optimal "hook step" strategy [1], then systems of equations of the form

\left( H_c + \frac{\lambda_k}{1 - \lambda_k} I \right) s_k = -\nabla f_c   (12)

need to be solved at each step.
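A sketch of one such trial solve, Eq. (12), for a given $\lambda_k$ (the surrounding safeguarded iteration on $\lambda_k$, detailed in [1], is omitted):

```python
import numpy as np

def hook_step_trial(H, grad, lam):
    """Solve (H_c + lam/(1 - lam) I) s = -grad f_c for one trial lam.
    lam = 0 gives the Newton step; lam -> 1 bends s toward steepest
    descent and shrinks it to zero.  The hook step repeats this solve
    until ||s(lam)|| = delta_c."""
    mu = lam / (1.0 - lam)
    return np.linalg.solve(H + mu * np.eye(H.shape[0]), -grad)
```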


A popular alternative to incurring this added computational burden is the use of the "double dogleg" strategy [1] whereby the curve $s(\lambda)$, $0 \leq \lambda \leq 1$, is approximated by a piecewise-linear curve $x_c \to x_{SD} \to \hat{x} \to x_N$ as shown in Fig. 2. Here, $x_{SD}$ is the Cauchy point, $x_N$ is the Newton point, $\hat{x} := x_c + \eta s_N$ ($\gamma \leq \eta \leq 1$), and $\gamma$ is the ratio of the Cauchy step length to the length of the projection of the Newton step onto the Cauchy direction, which we will show in the next section to be bounded by 1. The new point, $x_+$, is chosen so that $\|x_+ - x_c\| = \delta_c$ unless $\|H_c^{-1} \nabla f_c\| \leq \delta_c$, in which case $x_+ = x_N$.

It is shown in [1] that with such a choice for $\eta$ (they suggest setting $\eta = 0.8\gamma + 0.2$), the "double dogleg" curve possesses two very important properties. Firstly, as a point proceeds along the "double dogleg" curve from $x_c$ to $x_N$, distance from $x_c$ increases monotonically, so that this process is well defined. Secondly, the value of the modified local quadratic model, $\hat{m}_c(x_c + s)$, decreases monotonically as a point traverses this curve in the same direction, so that this process is reasonable. In the next section, we present a new quadratic interpolant to $s(\lambda)$ with these same desirable properties.
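For later comparison, here is a compact sketch of the double dogleg selection just described (a paraphrase of the strategy in [1], not its exact routine):

```python
import numpy as np

def _boundary_cross(p, q, delta):
    """Point on the segment p -> q with norm delta (assumes ||p|| <= delta <= ||q||)."""
    d = q - p
    a, b, c = d @ d, 2.0 * (p @ d), p @ p - delta ** 2
    t = (-b + np.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)
    return p + t * d

def double_dogleg_step(grad, H, delta):
    """Walk x_c -> x_SD -> x_hat -> x_N, stopping at the trust boundary."""
    s_N = np.linalg.solve(H, -grad)                 # Newton step
    if np.linalg.norm(s_N) <= delta:
        return s_N                                  # Newton point inside region
    gg, gHg = grad @ grad, grad @ H @ grad
    s_SD = -(gg / gHg) * grad                       # Cauchy step
    if np.linalg.norm(s_SD) >= delta:
        return (delta / np.linalg.norm(s_SD)) * s_SD
    gamma = gg ** 2 / (gHg * (grad @ np.linalg.solve(H, grad)))
    eta = 0.8 * gamma + 0.2                         # as suggested in [1]
    if np.linalg.norm(eta * s_N) >= delta:          # crossing on leg x_SD -> x_hat
        return _boundary_cross(s_SD, eta * s_N, delta)
    return _boundary_cross(eta * s_N, s_N, delta)   # crossing on leg x_hat -> x_N
```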

Fig. 2. Double dogleg strategy.

3. The quadratic interpolant

In the new step selection algorithm (see Fig. 3), we approximate $s_+$ by replacing $s(\lambda)$ by the quadratic interpolant:

\sigma(\eta) := (\eta - 1)[(\eta - 1) s_N + \eta \beta \nabla f_c], \qquad 0 \leq \eta \leq 1,   (13)

when $s_N$ lies outside the trust region. For the moment, we leave $\beta > 0$ unspecified. We next summarize some general properties of $\sigma(\eta)$.

Theorem 2. For any $\beta > 0$, $x_c + \sigma(\eta)$ passes through $x_c$ and $x_N$ and is tangent to $x_c + s(\lambda)$ at $x_c$. Moreover, $\sigma(\eta)$ is a descent direction from $x_c$ for any $0 \leq \eta < 1$.

Proof. $\sigma(0) = s_N$ implies interpolation at $x_N$. $\sigma(1) = 0$ implies interpolation at $x_c$. $\sigma'(\eta) = 2(\eta - 1) s_N + (2\eta - 1) \beta \nabla f_c$, so $\sigma'(1) = \beta \nabla f_c$, giving tangency at $x_c$. Finally, $\nabla f_c^T \sigma(\eta) = -(\eta - 1)^2 \|\nabla f_c\|_{H_c^{-1}}^2 + \eta(\eta - 1) \beta \|\nabla f_c\|^2 < 0$, so $\sigma(\eta)$ is a descent direction from $x_c$ for $0 \leq \eta < 1$. □

Ideally, the free parameter, $\beta$, would be chosen to achieve tangency at $x_N$ also. However, this is equivalent to selecting $\beta$ so that $-2 s_N - \beta \nabla f_c \parallel -H_c^{-1} s_N$ and would thus require that an additional system of equations be solved. We wish to avoid this added cost. In the next section, we show that the choice

\beta := \sqrt{\frac{-2\, s_N^T \nabla f_c}{\nabla f_c^T H_c \nabla f_c}}   (14)

endows the curve $x_c + \sigma(\eta)$ with certain highly desirable properties, thus obviating the need for multiple linear solves at each step of the model-trust region iteration.
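The two defining formulas, Eqs. (13) and (14), are one line each in code; a sketch (the naming is ours):

```python
import numpy as np

def beta_choice(grad, H, s_N):
    """beta of Eq. (14): sqrt(-2 s_N^T grad f_c / grad f_c^T H_c grad f_c)."""
    return np.sqrt(-2.0 * (s_N @ grad) / (grad @ H @ grad))

def sigma(eta, grad, s_N, beta):
    """Quadratic interpolant of Eq. (13); sigma(0) = s_N, sigma(1) = 0."""
    return (eta - 1.0) * ((eta - 1.0) * s_N + eta * beta * grad)
```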

Fig. 3. New step selection algorithm.

4. Mathematical properties

In order to fully explore the consequences of the above choice for $\beta$, Eq. (14), we need to lay some preliminary groundwork. We begin with:

Lemma 1. Let $A, B, C \in \mathbb{R}^{n \times n}$ with $A$ and $C$ positive definite and $x, y \in \mathbb{R}^n$; then the following are equivalent:

|x^T B y|^2 \leq (x^T A x)(y^T C y),   (15)

2 |x^T B y| \leq x^T A x + y^T C y,   (16)

\rho(B^T A^{-1} B C^{-1}) \leq 1.   (17)

Proof. This is Theorem 7.7.7 [2, p. 473] specialized to real, square matrices. □
This lemma has the following important consequence.

Theorem 3. Assume that $H_c$ is SPD; then

\|\nabla f_c\|^2 \leq \|\nabla f_c\|_{H_c} \cdot \|\nabla f_c\|_{H_c^{-1}},   (18)

\|\nabla f_c\|^2 \leq \frac{1}{2} \left[ \|\nabla f_c\|_{H_c}^2 + \|\nabla f_c\|_{H_c^{-1}}^2 \right],   (19)

\|\nabla f_c\| \leq \frac{1}{2} \left[ \|\nabla f_c\|_{H_c} + \|\nabla f_c\|_{H_c^{-1}} \right].   (20)

Proof. Let $A = H_c$, $B = I$, $C = H_c^{-1}$, $x = y = \nabla f_c$ in Lemma 1; then $\rho(B^T A^{-1} B C^{-1}) = \rho(I) = 1$. Thus,
(i) $\nabla f_c^T \nabla f_c \leq (\nabla f_c^T H_c \nabla f_c)^{1/2} \cdot (\nabla f_c^T H_c^{-1} \nabla f_c)^{1/2}$, which is (18);
(ii) $2 \nabla f_c^T \nabla f_c \leq \nabla f_c^T H_c \nabla f_c + \nabla f_c^T H_c^{-1} \nabla f_c$, which is (19);
(iii) taking the arithmetic mean of (i) and (ii) yields $\|\nabla f_c\|^2 \leq \frac{1}{4} [\|\nabla f_c\|_{H_c}^2 + 2 \|\nabla f_c\|_{H_c} \|\nabla f_c\|_{H_c^{-1}} + \|\nabla f_c\|_{H_c^{-1}}^2] = \frac{1}{4} [\|\nabla f_c\|_{H_c} + \|\nabla f_c\|_{H_c^{-1}}]^2$. Taking the square root of both sides of this results in (20). □
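These inequalities are easy to spot-check numerically; a sketch with a random SPD matrix standing in for $H_c$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
H = M @ M.T + 5.0 * np.eye(5)               # random SPD 'model Hessian'
g = rng.standard_normal(5)                  # stand-in for grad f_c

n2 = g @ g                                  # ||g||^2
nH = np.sqrt(g @ H @ g)                     # ||g||_H
nHinv = np.sqrt(g @ np.linalg.solve(H, g))  # ||g||_{H^{-1}}

assert n2 <= nH * nHinv + 1e-12                     # Eq. (18)
assert n2 <= 0.5 * (nH ** 2 + nHinv ** 2) + 1e-12   # Eq. (19)
assert np.sqrt(n2) <= 0.5 * (nH + nHinv) + 1e-12    # Eq. (20)
```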

This brings us to the fundamental result:

Theorem 4 (Cauchy-Newton inequality). The size of the Cauchy step is always bounded above by the length of the projection of the Newton step onto the Cauchy step:

\|s_{SD}\| \leq \frac{s_N^T s_{SD}}{\|s_{SD}\|}.   (21)

Proof.

s_{SD} := -\frac{\|\nabla f_c\|^2}{\nabla f_c^T H_c \nabla f_c} \nabla f_c \implies \|s_{SD}\| = \frac{\|\nabla f_c\|^3}{\nabla f_c^T H_c \nabla f_c};
s_N := -H_c^{-1} \nabla f_c \implies \|\text{projection of } s_N \text{ onto } \nabla f_c\| = \frac{\nabla f_c^T H_c^{-1} \nabla f_c}{\|\nabla f_c\|}.

By Eq. (18), $\|\nabla f_c\|^4 \leq \|\nabla f_c\|_{H_c}^2 \cdot \|\nabla f_c\|_{H_c^{-1}}^2$, so that

\frac{\|\nabla f_c\|^3}{\nabla f_c^T H_c \nabla f_c} \leq \frac{\nabla f_c^T H_c^{-1} \nabla f_c}{\|\nabla f_c\|},

which we now recognize as the desired inequality, Eq. (21). □
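The same sort of random test exercises Eq. (21); a self-contained sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
H = M @ M.T + 5.0 * np.eye(5)               # random SPD model Hessian H_c
g = rng.standard_normal(5)                  # stand-in for grad f_c

s_SD = -((g @ g) / (g @ H @ g)) * g         # Cauchy step
s_N = -np.linalg.solve(H, g)                # Newton step
# Eq. (21): Cauchy step length <= projection of Newton step onto it.
assert np.linalg.norm(s_SD) <= (s_N @ s_SD) / np.linalg.norm(s_SD) + 1e-12
```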

In what follows, we require an additional result:

Lemma 2.

\beta \leq \sqrt{2} \, \frac{-s_N^T \nabla f_c}{\|\nabla f_c\|^2} \leq \sqrt{2} \, \frac{\|s_N\|}{\|\nabla f_c\|}.   (22)

Proof. By Eq. (18), $\|\nabla f_c\|^4 \leq (\nabla f_c^T H_c^{-1} \nabla f_c) \cdot (\nabla f_c^T H_c \nabla f_c) = (-s_N^T \nabla f_c) \cdot (\nabla f_c^T H_c \nabla f_c)$. Thus,

\beta^2 = \frac{-2\, s_N^T \nabla f_c}{\nabla f_c^T H_c \nabla f_c} \leq 2 \cdot \frac{(s_N^T \nabla f_c)^2}{\|\nabla f_c\|^4}.

The first half of the inequality now follows by taking square roots. The second half of the inequality is a direct consequence of the Cauchy-Schwarz inequality. □

We are finally in a position to fully describe the mathematical properties of the curve $x_c + \sigma(\eta)$ with our choice of $\beta$. We summarize these as:

Theorem 5. With $\sigma(\eta)$ given by Eq. (13) and $\beta$ defined by Eq. (14):
(i) Distance from $x_c$ increases monotonically as we proceed along the curve, $x_c + \sigma(\eta)$, from $x_c$ ($\eta = 1$) to $x_N$ ($\eta = 0$).
(ii) The value of the local quadratic model, $\hat{m}_c(x_c + \sigma(\eta))$, decreases monotonically as we traverse the curve, $x_c + \sigma(\eta)$, from $x_c$ ($\eta = 1$) to $x_N$ ($\eta = 0$).
Proof.
(i) A direction, $p$, from $x_c + \sigma(\eta)$ increases distance from $x_c$ iff $p^T \sigma(\eta) > 0$. The tangent vector to the path $x_c + \sigma(\eta)$ from $x_c$ to $x_N$ is parallel to $p := -\sigma'(\eta)$, so that we need to establish that $\sigma'(\eta)^T \sigma(\eta) < 0$ for $0 < \eta < 1$. But $\sigma'(\eta) = 2(\eta - 1) s_N + (2\eta - 1) \beta \nabla f_c$, so that $\sigma'(\eta)^T \sigma(\eta) = (\eta - 1) \cdot u(\eta)$ where

u(\eta) := \eta^2 \left[ 2 \|\nabla f_c\|^2 \beta^2 + 4 s_N^T \nabla f_c \, \beta + 2 \|s_N\|^2 \right] - \eta \left[ \|\nabla f_c\|^2 \beta^2 + 5 s_N^T \nabla f_c \, \beta + 4 \|s_N\|^2 \right] + \left[ s_N^T \nabla f_c \, \beta + 2 \|s_N\|^2 \right].

We need to establish that $u(\eta) > 0$ for $0 < \eta < 1$. Note that $u(0) = s_N^T \nabla f_c \, \beta + 2 \|s_N\|^2 \geq 0$ by Lemma 2, and that $u(1) = \beta^2 \|\nabla f_c\|^2 > 0$. Thus, either both roots of this quadratic lie in $(0, 1)$ or neither one does. Furthermore, the product of the roots [7] of this quadratic exceeds 1, again by Lemma 2, so that both roots cannot lie in $(0, 1)$ and hence neither root can either. Hence, $u(\eta) > 0$ for $0 < \eta < 1$.

(ii) The directional derivative of $\hat{m}_c(x_c + \sigma(\eta))$ along the path is $[\nabla f_c + H_c \sigma(\eta)]^T \sigma'(\eta) / \|\sigma'(\eta)\|$ and must be shown to be positive for $0 < \eta < 1$. We will accomplish this by establishing that $d(\eta) := [\nabla f_c^T + (\eta - 1)^2 s_N^T H_c + \eta(\eta - 1) \beta \nabla f_c^T H_c] \cdot [2(\eta - 1) s_N + (2\eta - 1) \beta \nabla f_c] > 0$ for $0 < \eta < 1$, since it has the same sign as the directional derivative. Note that $d(0) = 0$, $d(1) = \beta \|\nabla f_c\|^2 > 0$, $d'(0) = -4\beta \|\nabla f_c\|^2 - 6 s_N^T \nabla f_c > 0$ (once again, by Lemma 2), and $d'(1) = 2\beta \|\nabla f_c\|^2 > 0$. Thus, it will suffice to exclude either of the other two roots of the cubic, $d(\eta)$, from the interval $(0, 1)$. This is achieved by rewriting $d(\eta) = \eta \cdot Q(\eta)$ with the quadratic $Q(\eta) := A\eta^2 + B\eta + C$ and

A := 2\beta^2 \nabla f_c^T H_c \nabla f_c - 4\beta \|\nabla f_c\|^2 - 2 s_N^T \nabla f_c,
B := -3\beta^2 \nabla f_c^T H_c \nabla f_c + 9\beta \|\nabla f_c\|^2 + 6 s_N^T \nabla f_c,
C := \beta^2 \nabla f_c^T H_c \nabla f_c - 4\beta \|\nabla f_c\|^2 - 4 s_N^T \nabla f_c.

The product of the roots [7] of $Q(\eta)$ is $C/A \geq 1$ provided that $\beta^2 \leq -2 s_N^T \nabla f_c / \nabla f_c^T H_c \nabla f_c$, which is assured by Eq. (14). Thus, for our choice of $\beta$, the two roots of $Q(\eta)$ cannot both reside in $(0, 1)$, so that neither can separately. Hence, we have that $d(\eta)$ is nonnegative on $[0, 1]$ and is positive on $(0, 1)$. □

Note that (i) precludes the quadratic trajectory, $x_c + \sigma(\eta)$, $\eta \in [0, 1]$, from "overshooting" $x_N$ and "turning back on itself", so that there will be a unique point of intersection of the path with the trust region boundary. Also, (ii) asserts that it is pointless to stop along the curve, $x_c + \sigma(\eta)$, before reaching the trust region boundary since the local quadratic model, $\hat{m}_c$, which we "trust", is still decreasing.
Thus, it is well defined and reasonable to specify $\eta_+$ as the smallest positive root of the quartic

q(\eta) := \|\sigma(\eta)\|^2 - \delta_c^2,   (23)

and to select as our next iterate (see Fig. 3)

x_+ := x_c + \sigma(\eta_+).   (24)

Since the quartic, Eq. (23), has leading coefficient and constant term of opposite sign, it has at least two real roots, one positive and one negative [7, p. 105]. In fact, by Theorem 5(i), $q(\eta)$ has a unique root in $[0, 1]$ provided that $\delta_c < \|s_N\|$.
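Since $\sigma(\eta)$ is a vector quadratic in $\eta$, the quartic of Eq. (23) can be expanded and its roots found directly; a sketch using numpy's polynomial root finder (the naming is ours; a safeguarded scalar iteration would serve equally well):

```python
import numpy as np

def eta_plus(grad, s_N, beta, delta):
    """Smallest root in [0, 1] of q(eta) = ||sigma(eta)||^2 - delta^2, Eq. (23).
    Writes sigma(eta) = a eta^2 + b eta + c and expands ||sigma(eta)||^2;
    by Theorem 5(i) the root in [0, 1] is unique when delta < ||s_N||."""
    a = s_N + beta * grad
    b = -2.0 * s_N - beta * grad
    c = s_N
    coeffs = [a @ a,                          # eta^4
              2.0 * (a @ b),                  # eta^3
              2.0 * (a @ c) + b @ b,          # eta^2
              2.0 * (b @ c),                  # eta^1
              c @ c - delta ** 2]             # constant term
    r = np.roots(coeffs)
    r = r[np.abs(r.imag) < 1e-10].real
    return r[(r >= 0.0) & (r <= 1.0)].min()
```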
Letting $\Delta f := f(x_+) - f(x_c)$ and $\Delta f_p := \hat{m}_c(x_+) - f(x_c)$, if $\Delta f > 10^{-4} \cdot \nabla f_c^T (x_+ - x_c)$ then we reject $x_+$ and reduce $\delta_c$ by a quadratic backtrack strategy [1]. Otherwise, we check to see if the decrease in $f$ was dramatic enough to justify an enlarged trust region with the present model. If not, then the model is updated with a $\delta_c$ determined by a comparison of the actual decrease in $f$, $\Delta f$, with the predicted decrease, $\Delta f_p$. This portion of the algorithm is not new and further details on the model-trust region update are available in [1]. The complete model-trust region algorithm is depicted in Fig. 4.
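A schematic of this accept/reject logic follows; the $10^{-4}$ sufficient-decrease test is from the text above, while the shrink/grow factors and the 0.75 agreement threshold are illustrative placeholders, not the tuned values of [1]:

```python
def trust_region_update(df, dfp, grad_dot_step, delta,
                        shrink=0.5, grow=2.0, good=0.75):
    """df = f(x_+) - f(x_c), dfp = m_hat_c(x_+) - f(x_c) (both negative on
    success), grad_dot_step = grad f_c^T (x_+ - x_c).
    Returns (accepted, new delta)."""
    if df > 1e-4 * grad_dot_step:
        return False, shrink * delta   # reject; [1] uses a quadratic backtrack
    if df <= good * dfp:
        return True, grow * delta      # model predicted well: enlarge region
    return True, delta                 # accept with the current trust radius
```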

Fig. 4. Model-trust region algorithm.

5. The Dennis-Schnabel example

Now, consider the example [1]

f(x_1, x_2) := x_1^4 + x_1^2 + x_2^2,   (25)

with

x_c = [1, 1]^T, \qquad \delta_c = 0.5.   (26)

This produces

\nabla f_c = \begin{bmatrix} 6 \\ 2 \end{bmatrix}, \qquad H_c = \begin{bmatrix} 14 & 0 \\ 0 & 2 \end{bmatrix},   (27)

and

s_N = \begin{bmatrix} -0.429 \\ -1 \end{bmatrix}, \qquad s_{SD} = \begin{bmatrix} -0.469 \\ -0.156 \end{bmatrix}.   (28)

The new algorithm yields (see Fig. 3) $\beta = 0.1336$, $\eta_+ = 0.444$, $\sigma_+ = [-0.330, -0.375]^T$, while $s_+ = [-0.343, -0.365]^T$. Comparison to the standard "double dogleg" step of $[-0.340, -0.669]^T$ (see Fig. 2) indicates that the new algorithm provides a dramatic increase in accuracy at essentially no additional cost.
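These figures are straightforward to reproduce; a sketch using Eqs. (13) and (14) with a bisection on $\|\sigma(\eta)\| = \delta_c$ (monotone by Theorem 5(i)):

```python
import numpy as np

grad = np.array([6.0, 2.0])                  # grad f_c at x_c = [1, 1]^T
H = np.array([[14.0, 0.0], [0.0, 2.0]])      # H_c of Eq. (27)
delta = 0.5
s_N = np.linalg.solve(H, -grad)              # [-0.429, -1]^T

beta = np.sqrt(-2.0 * (s_N @ grad) / (grad @ H @ grad))        # 0.1336
sigma = lambda eta: (eta - 1.0) * ((eta - 1.0) * s_N + eta * beta * grad)

lo, hi = 0.0, 1.0                            # ||sigma|| falls from ||s_N|| to 0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if np.linalg.norm(sigma(mid)) < delta:
        hi = mid
    else:
        lo = mid
print(beta, lo, sigma(lo))                   # 0.1336, 0.444, [-0.330, -0.375]
```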

Table 1
Summary of numerical results

Problem                               Starting point        Scaled f(x)   No. of steps
Rosenbrock banana function (n = 2)    [-1.2, 1]^T           9.86·10^-32   2
Freudenstein-Roth function (n = 2)    [6, 5]^T              7.32·10^-29   5
                                      [0.5, -2]^T           6.91·10^-29   19 (11+8)
Powell badly scaled function (n = 2)  [0, 1]^T              3.83·10^-27   12
Box 3D function (n = 3)               [0, 10, 20]^T         4.48·10^-32   5
Helical valley function (n = 3)       [-1, 0, 0]^T          2.89·10^-28   13
Powell singular function (n = 4)      [3, -1, 0, 1]^T       2.50·10^-13   20
P-N diode (n = 195)                   Space-charge neutral  3.52·10^-25   7

6. Numerical results

With the dual aims of establishing the validity and the utility of our algorithm, Table 1 presents
the results of a sequence of numerical examples. They run the gamut from low-dimensional systems
with known solutions [5] to a cutting-edge problem in semiconductor device simulation (p-n diode)
[4]. All simulations were performed in double precision using the Fortran77-3.0.1 compiler under the
SunOS-5.5.1 operating system on a Sun Microsystems E3000 UltraEnterprise workstation. Complete
details are available elsewhere [3].

7. Conclusion

In the preceding sections, we have presented a flexible and powerful new model-trust region al-
gorithm for unconstrained optimization and systems of nonlinear equations based upon quadratic
interpolation. The mathematical properties of this algorithm have been described and full proofs
have been provided. Comparison to the popular "double dogleg" strategy has been made. The new
algorithm has been validated on a suite of widely used test problems from the literature [5] and its
practical utility has been demonstrated on a challenging problem from semiconductor device simu-
lation [4]. The full details pertaining to these test cases have appeared elsewhere [3]. In closing, we
make the observation that methods based upon higher order interpolation might be contemplated.
However, the concomitant increase in algorithmic complexity would demand numerical and/or ana-
lytical justification.

8. Acknowledgements

The author would like to thank Ms. Barbara Rowe for her assistance in the production of this
paper.

References

[1] J.E. Dennis, R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ, 1983.
[2] R.A. Horn, C.R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1990.
[3] B.J. McCartin, A new model-trust region algorithm for systems of nonlinear equations, in: Proc. 13th Ann. Conf. on Applied Mathematics (CAM97), University of Central Oklahoma, 1997, pp. 172-186.
[4] B.J. McCartin, R.H. Hobbs, R.E. LaBarre, P.E. Kirschner, Solution of the discrete semiconductor device equations, in: J.J.H. Miller (Ed.), NASECODE IV: Proc. 4th Internat. Conf. on the Numerical Analysis of Semiconductor Devices and Integrated Circuits, Boole Press, 1985, pp. 411-416.
[5] J.J. Moré, B.S. Garbow, K.E. Hillstrom, Testing unconstrained optimization software, ACM Trans. Math. Software 7 (1) (1981) 17-41.
[6] J.J. Moré, D.C. Sorenson, Newton's method, in: G.H. Golub (Ed.), Studies in Numerical Analysis, MAA, 1984, pp. 29-82.
[7] J.V. Uspensky, Theory of Equations, McGraw-Hill, New York, 1948.
