
Quasi-Newton Method

Solving Nonlinear Systems of Equations


Contents
Articles
Quasi-Newton method
Davidon–Fletcher–Powell formula
BFGS method
Limited-memory BFGS
Broyden's method
Symmetric rank-one

References
Article Sources and Contributors

Article Licenses
License


Quasi-Newton method
In optimization, quasi-Newton methods (a special case of variable metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method to find the stationary point of a function, where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In quasi-Newton methods the Hessian matrix of second derivatives of the function to be minimized does not need to be computed; the Hessian is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In more than one dimension the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.

The first quasi-Newton algorithm was proposed by W. C. Davidon, a physicist working at Argonne National Laboratory. He developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for "symmetric rank one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970), and its low-memory extension, L-BFGS. The Broyden class is a linear combination of the DFP and BFGS methods. The SR1 formula does not guarantee that the update matrix maintains positive definiteness, and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric; it is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).

One of the chief advantages of quasi-Newton methods over Newton's method is that the Hessian matrix (or, in the case of quasi-Newton methods, its approximation) does not need to be inverted. Newton's method, and its derivatives such as interior point methods, require the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. In contrast, quasi-Newton methods usually generate an estimate of $B^{-1}$ directly.

Description of the method


As in Newton's method, one uses a second-order approximation to find the minimum of a function $f(x)$. The Taylor series of $f(x)$ around an iterate $x_k$ is

$$f(x_k + \Delta x) \approx f(x_k) + \nabla f(x_k)^T \Delta x + \tfrac{1}{2} \Delta x^T B\, \Delta x,$$

where $\nabla f$ is the gradient and $B$ an approximation to the Hessian matrix. The gradient of this approximation (with respect to $\Delta x$) is

$$\nabla f(x_k + \Delta x) \approx \nabla f(x_k) + B\, \Delta x,$$

and setting this gradient to zero provides the Newton step:

$$\Delta x = -B^{-1} \nabla f(x_k).$$

The Hessian approximation $B$ is chosen to satisfy

$$\nabla f(x_k + \Delta x) = \nabla f(x_k) + B\, \Delta x,$$

which is called the secant equation (the Taylor series of the gradient itself). In more than one dimension $B$ is underdetermined. In one dimension, solving for $B$ and applying the Newton step with the updated value is equivalent to the secant method. The various quasi-Newton methods differ in their choice of the solution to the secant equation (in one dimension, all the variants are equivalent). Most methods (but with exceptions, such as Broyden's method) seek a symmetric solution ($B^T = B$); furthermore, the variants listed below can be motivated by finding an update $B_{k+1}$ that is as close as possible to $B_k$ in some norm, that is, $B_{k+1} = \operatorname{argmin}_B \|B - B_k\|_V$, where $V$ is some positive-definite matrix that defines the norm. An approximate initial value $B_0 = I$ is often sufficient to achieve rapid convergence. The unknown $x_k$ is updated by applying the Newton step calculated using the current approximate Hessian matrix $B_k$:

1. $\Delta x_k = -\alpha_k B_k^{-1} \nabla f(x_k)$, with $\alpha_k$ chosen to satisfy the Wolfe conditions;
2. $x_{k+1} = x_k + \Delta x_k$;
3. the gradient computed at the new point, $\nabla f(x_{k+1})$, together with $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$, is used to update the approximate Hessian $B_{k+1}$, or directly its inverse $H_{k+1} = B_{k+1}^{-1}$ using the Sherman–Morrison formula.

A key property of the BFGS and DFP updates is that if $B_k$ is positive definite and $\alpha_k$ is chosen to satisfy the Wolfe conditions, then $B_{k+1}$ is also positive definite.

The most popular update formulas are (writing $s_k = \Delta x_k$ and $H_k = B_k^{-1}$):

DFP:
$$B_{k+1} = \left(I - \frac{y_k s_k^T}{y_k^T s_k}\right) B_k \left(I - \frac{s_k y_k^T}{y_k^T s_k}\right) + \frac{y_k y_k^T}{y_k^T s_k}, \qquad
H_{k+1} = H_k + \frac{s_k s_k^T}{y_k^T s_k} - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k}$$

BFGS:
$$B_{k+1} = B_k + \frac{y_k y_k^T}{y_k^T s_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}, \qquad
H_{k+1} = \left(I - \frac{s_k y_k^T}{y_k^T s_k}\right) H_k \left(I - \frac{y_k s_k^T}{y_k^T s_k}\right) + \frac{s_k s_k^T}{y_k^T s_k}$$

Broyden:
$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)\, s_k^T}{s_k^T s_k}, \qquad
H_{k+1} = H_k + \frac{(s_k - H_k y_k)\, s_k^T H_k}{s_k^T H_k y_k}$$

Broyden family:
$$B_{k+1} = (1 - \varphi_k)\, B_{k+1}^{\mathrm{BFGS}} + \varphi_k\, B_{k+1}^{\mathrm{DFP}}, \qquad \varphi_k \in [0, 1]$$

SR1:
$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}, \qquad
H_{k+1} = H_k + \frac{(s_k - H_k y_k)(s_k - H_k y_k)^T}{(s_k - H_k y_k)^T y_k}$$
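A quick way to see what these updates have in common is to check the secant equation $B_{k+1} s_k = y_k$ numerically. The following sketch (not from the article; the variable names and random data are illustrative) applies the BFGS and DFP updates from the table above with NumPy and verifies that both satisfy the secant equation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random symmetric positive-definite "current" approximation B_k,
# plus a step s and gradient change y with positive curvature y^T s > 0.
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if y @ s < 0:
    y = -y

# BFGS update of B (see table above)
B_bfgs = B + np.outer(y, y) / (y @ s) - (B @ np.outer(s, s) @ B) / (s @ B @ s)

# DFP update of B (see table above)
rho = 1.0 / (y @ s)
E = np.eye(n) - rho * np.outer(y, s)
B_dfp = E @ B @ E.T + rho * np.outer(y, y)

# Both updated matrices map the step s onto the gradient change y.
print(np.allclose(B_bfgs @ s, y), np.allclose(B_dfp @ s, y))   # True True
```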

Implementations
Owing to their success, there are implementations of quasi-Newton methods in almost all programming languages. The NAG Library contains several routines[1] for minimizing or maximizing a function[2] which use quasi-Newton algorithms. In MATLAB's Optimization Toolbox, the fminunc[3] function uses (among other methods) the BFGS quasi-Newton method; many of the constrained methods[4] of the toolbox use BFGS and the variant L-BFGS, and many user-contributed quasi-Newton routines are available on MATLAB's File Exchange[5]. Mathematica includes quasi-Newton solvers[6]. R's general-purpose optim routine uses the BFGS method when called with method="BFGS"[7]. In the SciPy extension to Python, the scipy.optimize.minimize[8] function includes, among other methods, a BFGS implementation.
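As a concrete illustration of the SciPy interface mentioned above (a minimal usage sketch, not taken from the article; it assumes a SciPy version that ships the built-in Rosenbrock test functions):

```python
from scipy.optimize import minimize, rosen, rosen_der

# Unconstrained BFGS run on the Rosenbrock test function.
# Supplying the gradient (jac) avoids finite-difference approximations.
result = minimize(rosen, x0=[-1.2, 1.0], method="BFGS", jac=rosen_der)
print(result.x)    # should be close to [1.0, 1.0]
print(result.nit)  # number of iterations taken
```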


References
[1] The Numerical Algorithms Group. "Keyword Index: Quasi-Newton" (http://www.nag.co.uk/numeric/fl/nagdoc_fl23/html/INDEXES/KWIC/quasi-newton.html). NAG Library Manual, Mark 23. Retrieved 2012-02-09.
[2] The Numerical Algorithms Group. "E04 Minimizing or Maximizing a Function" (http://www.nag.co.uk/numeric/fl/nagdoc_fl23/pdf/E04/e04intro.pdf). NAG Library Manual, Mark 23. Retrieved 2012-02-09.
[3] http://www.mathworks.com/help/toolbox/optim/ug/fminunc.html
[4] http://www.mathworks.com/help/toolbox/optim/ug/brnoxzl.html
[5] http://www.mathworks.com/matlabcentral/fileexchange/?term=BFGS
[6] http://reference.wolfram.com/mathematica/tutorial/UnconstrainedOptimizationQuasiNewtonMethods.html
[7] http://finzi.psych.upenn.edu/R/library/stats/html/optim.html
[8] http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

Further reading
Bonnans, J. F.; Gilbert, J. Ch.; Lemaréchal, C.; Sagastizábal, C. A. (2006), Numerical Optimization: Theoretical and Numerical Aspects (2nd ed.), Springer. ISBN 978-3-540-35445-1.
Davidon, William C. (1991), "Variable Metric Method for Minimization" (http://link.aip.org/link/?SJE/1/1/1), SIAM Journal on Optimization 1 (1): 1–17.
Fletcher, Roger (1987), Practical Methods of Optimization (2nd ed.), New York: John Wiley & Sons. ISBN 978-0-471-91547-8.
Nocedal, Jorge; Wright, Stephen J. (1999), Numerical Optimization, Springer-Verlag. ISBN 0-387-98793-2.
Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P. (2007), "Section 10.9. Quasi-Newton or Variable Metric Methods in Multidimensions" (http://apps.nrbook.com/empanel/index.html#pg=521), Numerical Recipes: The Art of Scientific Computing (3rd ed.), New York: Cambridge University Press. ISBN 978-0-521-88068-8.

Davidon–Fletcher–Powell formula
The Davidon–Fletcher–Powell formula (or DFP; named after William C. Davidon, Roger Fletcher, and Michael J. D. Powell) finds the solution to the secant equation that is closest to the current estimate and satisfies the curvature condition (see below). It was the first quasi-Newton method to generalize the secant method to a multidimensional problem. This update maintains the symmetry and positive definiteness of the Hessian approximation.

Given a function $f(x)$, its gradient ($\nabla f$), and a positive-definite Hessian matrix $B$, the Taylor series is

$$f(x_k + \Delta x) = f(x_k) + \nabla f(x_k)^T \Delta x + \tfrac{1}{2} \Delta x^T B\, \Delta x + \dots,$$

and the Taylor series of the gradient itself (the secant equation)

$$\nabla f(x_k + \Delta x) = \nabla f(x_k) + B\, \Delta x + \dots$$

is used to update $B$. The DFP formula finds a solution that is symmetric, positive definite and closest to the current approximate value of $B_k$:

$$B_{k+1} = (I - \gamma_k y_k \Delta x_k^T)\, B_k\, (I - \gamma_k \Delta x_k y_k^T) + \gamma_k y_k y_k^T,$$

where

$$y_k = \nabla f(x_k + \Delta x_k) - \nabla f(x_k), \qquad \gamma_k = \frac{1}{y_k^T \Delta x_k},$$

and $B_k$ is a symmetric and positive-definite matrix. The corresponding update to the inverse Hessian approximation $H_k = B_k^{-1}$ is given by

$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{\Delta x_k \Delta x_k^T}{y_k^T \Delta x_k}.$$

Here $B$ is assumed to be positive definite, and the vectors $\Delta x_k$ and $y_k$ must satisfy the curvature condition

$$\Delta x_k^T y_k > 0.$$
The DFP formula is quite effective, but it was soon superseded by the BFGS formula, which is its dual (obtained by interchanging the roles of $y_k$ and $\Delta x_k$).
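This duality can be made concrete in a few lines of code. The sketch below is illustrative only (the function names are not from the article): it implements the DFP update of the inverse Hessian approximation and obtains the direct BFGS update of $B$ by swapping the roles of the step and the gradient difference.

```python
import numpy as np

def dfp_update_inverse(H, s, y):
    """DFP update of the inverse-Hessian approximation H.
    s = x_{k+1} - x_k (step), y = grad f(x_{k+1}) - grad f(x_k)."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (s @ y)

def bfgs_update_direct(B, s, y):
    """Duality: the BFGS update of B has the same algebraic form with s and y interchanged."""
    return dfp_update_inverse(B, y, s)
```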

References
Davidon, W. C. (1991), "Variable metric method for minimization", SIAM Journal on Optimization 1: 1–17. doi:10.1137/0801001.
Fletcher, Roger (1987), Practical Methods of Optimization (2nd ed.), New York: John Wiley & Sons. ISBN 978-0-471-91547-8.
Nocedal, Jorge; Wright, Stephen J. (1999), Numerical Optimization, Springer-Verlag. ISBN 0-387-98793-2.

BFGS method
In numerical optimization, the Broyden–Fletcher–Goldfarb–Shanno[1][2] (BFGS) method is a method for solving unconstrained nonlinear optimization problems.

The BFGS method approximates Newton's method, a class of hill-climbing optimization techniques that seeks a stationary point of a (preferably twice continuously differentiable) function. For such problems, a necessary condition for optimality is that the gradient be zero. Newton's method and the BFGS methods are not guaranteed to converge unless the function has a quadratic Taylor expansion near an optimum. These methods use both first and second derivatives; nevertheless, BFGS has shown good performance even for non-smooth optimization problems.

In quasi-Newton methods, the Hessian matrix of second derivatives need not be evaluated directly. Instead, the Hessian matrix is approximated using rank-one updates specified by gradient evaluations (or approximate gradient evaluations). Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In more than one dimension the secant equation does not specify a unique solution, and quasi-Newton methods differ in how they constrain the solution. The BFGS method is one of the most popular members of this class.[3] Also in common use is L-BFGS, a limited-memory version of BFGS that is particularly suited to problems with very large numbers of variables (e.g., more than 1000). The L-BFGS-B variant[4] handles simple box constraints.

Rationale
The search direction $p_k$ at stage $k$ is given by the solution of the analogue of the Newton equation

$$B_k p_k = -\nabla f(x_k),$$

where $B_k$ is an approximation to the Hessian matrix, which is updated iteratively at each stage, and $\nabla f(x_k)$ is the gradient of the function evaluated at $x_k$. A line search in the direction $p_k$ is then used to find the next point $x_{k+1}$.

Instead of requiring the full Hessian matrix at the point $x_{k+1}$ to be computed as $B_{k+1}$, the approximate Hessian at stage $k$ is updated by the addition of two matrices:

$$B_{k+1} = B_k + U_k + V_k.$$

Both $U_k$ and $V_k$ are symmetric rank-one matrices, but they have different (matrix) bases. The symmetric rank-one assumption here means that we may write

$$B_{k+1} = B_k + \alpha u u^T + \beta v v^T.$$

So, equivalently, $U_k$ and $V_k$ construct a rank-two update matrix which is robust against the scale problem often suffered in gradient descent search (e.g., in Broyden's method). The quasi-Newton condition imposed on this update is

$$B_{k+1}(x_{k+1} - x_k) = \nabla f(x_{k+1}) - \nabla f(x_k).$$

Algorithm
From an initial guess $x_0$ and an approximate Hessian matrix $B_0$, the following steps are repeated as $x_k$ converges to the solution:

1. Obtain a direction $p_k$ by solving $B_k p_k = -\nabla f(x_k)$.
2. Perform a line search to find an acceptable stepsize $\alpha_k$ in the direction found in the first step, then update $x_{k+1} = x_k + \alpha_k p_k$.
3. Set $s_k = \alpha_k p_k$.
4. $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$.
5. $B_{k+1} = B_k + \dfrac{y_k y_k^T}{y_k^T s_k} - \dfrac{B_k s_k s_k^T B_k}{s_k^T B_k s_k}$.

Here $f(x)$ denotes the objective function to be minimized. Convergence can be checked by observing the norm of the gradient, $\|\nabla f(x_k)\|$. Practically, $B_0$ can be initialized with $B_0 = I$, so that the first step is equivalent to a gradient descent step, but further steps are more and more refined by $B_k$, the approximation to the Hessian.

The first step of the algorithm is carried out using the inverse of the matrix $B_k$, which is usually obtained efficiently by applying the Sherman–Morrison formula to the fifth line of the algorithm, giving

$$B_{k+1}^{-1} = B_k^{-1} + \frac{(s_k^T y_k + y_k^T B_k^{-1} y_k)\,(s_k s_k^T)}{(s_k^T y_k)^2} - \frac{B_k^{-1} y_k s_k^T + s_k y_k^T B_k^{-1}}{s_k^T y_k}.$$
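A direct transcription of these steps into NumPy might look like the following sketch (not from the article; the crude backtracking line search and the tolerances are illustrative simplifications, where a production solver would enforce the Wolfe conditions):

```python
import numpy as np

def bfgs(f, grad, x0, tol=1e-6, max_iter=200):
    """BFGS following the five steps above, maintaining the inverse approximation H_k = B_k^{-1}."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)                      # H_0 = I, so the first step is a gradient-descent step
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:    # convergence check on the gradient norm
            break
        p = -H @ g                     # step 1: solve B_k p_k = -grad f(x_k) via H_k
        alpha = 1.0                    # step 2: crude Armijo backtracking (stand-in for Wolfe)
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p) and alpha > 1e-12:
            alpha *= 0.5
        s = alpha * p                  # step 3: s_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                  # step 4: y_k
        sy = y @ s
        if sy > 1e-12:                 # step 5, written for H_k (equivalent to the formula above)
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Hypothetical test problem: quadratic bowl with minimum at (1, -2)
f = lambda v: (v[0] - 1.0) ** 2 + 5.0 * (v[1] + 2.0) ** 2
df = lambda v: np.array([2.0 * (v[0] - 1.0), 10.0 * (v[1] + 2.0)])
print(bfgs(f, df, [0.0, 0.0]))         # approximately [1, -2]
```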

In statistical estimation problems (such as maximum likelihood or Bayesian inference), credible intervals or confidence intervals for the solution can be estimated from the inverse of the final Hessian matrix. However, these quantities are technically defined by the true Hessian matrix, and the BFGS approximation may not converge to the true Hessian matrix.

Implementations
In the MATLAB Optimization Toolbox, the fminunc[3] function uses BFGS with a cubic line search when the problem size is set to "medium scale".[5] The GSL implements BFGS as gsl_multimin_fdfminimizer_vector_bfgs2[6]. In SciPy, the scipy.optimize.fmin_bfgs[7] function implements BFGS. It is also possible to emulate BFGS with any of the L-BFGS implementations by setting the memory parameter (the number of stored updates) to a very large number.

Notes
[1] http://www.columbia.edu/~goldfarb/
[2] http://rutcor.rutgers.edu/~shanno/
[3] Nocedal & Wright (2006), page 24.
[4] R. H. Byrd, P. Lu and J. Nocedal. "A Limited Memory Algorithm for Bound Constrained Optimization" (http://www.ece.northwestern.edu/~nocedal/PSfiles/limited.ps.gz) (1995), SIAM Journal on Scientific and Statistical Computing 16 (5), pp. 1190–1208.
[5] http://www.mathworks.com/help/toolbox/optim/ug/brnoxr7-1.html#brnpcye
[6] http://www.gnu.org/software/gsl/manual/html_node/Multimin-Algorithms-with-Derivatives.html
[7] http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_bfgs.html#scipy.optimize.fmin_bfgs


Bibliography
Avriel, Mordecai (2003), Nonlinear Programming: Analysis and Methods, Dover Publishing. ISBN 0-486-43227-0.
Bonnans, J. Frédéric; Gilbert, J. Charles; Lemaréchal, Claude; Sagastizábal, Claudia A. (2006), Numerical Optimization: Theoretical and Practical Aspects (http://www.springer.com/mathematics/applications/book/978-3-540-35445-1), Universitext (Second revised ed. of translation of 1997 French ed.), Berlin: Springer-Verlag, pp. xiv+490. doi:10.1007/978-3-540-35447-5. ISBN 3-540-35445-X. MR2265882.
Broyden, C. G. (1970), "The convergence of a class of double-rank minimization algorithms", Journal of the Institute of Mathematics and Its Applications 6: 76–90. doi:10.1093/imamat/6.1.76.
Fletcher, R. (1970), "A New Approach to Variable Metric Algorithms", Computer Journal 13 (3): 317–322. doi:10.1093/comjnl/13.3.317.
Fletcher, Roger (1987), Practical Methods of Optimization (2nd ed.), New York: John Wiley & Sons. ISBN 978-0-471-91547-8.
Goldfarb, D. (1970), "A Family of Variable Metric Updates Derived by Variational Means", Mathematics of Computation 24 (109): 23–26. doi:10.1090/S0025-5718-1970-0258249-6.
Luenberger, David G.; Ye, Yinyu (2008), Linear and Nonlinear Programming, International Series in Operations Research & Management Science 116 (Third ed.), New York: Springer, pp. xiv+546. ISBN 978-0-387-74502-2. MR2423726.
Nocedal, Jorge; Wright, Stephen J. (2006), Numerical Optimization (2nd ed.), Berlin, New York: Springer-Verlag. ISBN 978-0-387-30303-1.
Shanno, David F. (July 1970), "Conditioning of quasi-Newton methods for function minimization", Mathematics of Computation 24 (111): 647–656. doi:10.1090/S0025-5718-1970-0274029-X. MR42:8905.
Shanno, David F.; Kettler, Paul C. (July 1970), "Optimal conditioning of quasi-Newton methods", Mathematics of Computation 24 (111): 657–664. doi:10.1090/S0025-5718-1970-0274030-6. MR42:8906.


Limited-memory BFGS
The limited-memory BFGS (L-BFGS or LM-BFGS) algorithm is a member of the broad family of quasi-Newton optimization methods that uses a limited-memory variation of the Broyden–Fletcher–Goldfarb–Shanno (BFGS) update to approximate the inverse Hessian matrix (denoted by $H_k$). Unlike the original BFGS method, which stores a dense approximation, L-BFGS stores only a few vectors that represent the approximation implicitly. Due to its moderate memory requirement, the L-BFGS method is particularly well suited for optimization problems with a large number of variables.

L-BFGS never explicitly forms or stores $H_k$. Instead, it maintains a history of the past $m$ updates of the position $x$ and gradient $\nabla f(x)$, where generally the history size $m$ can be short, often less than 10. These updates are used to implicitly do operations requiring the $H_k$-vector product. While, strictly, a straightforward BFGS implementation at the $i$-th iteration would represent the inverse Hessian approximation as informed by all updates up to that point, L-BFGS does quite well using updates from only the most recent $m$ iterations.

Representation
L-BFGS shares many features with other quasi-Newton algorithms, but is very different in how the matrix-vector multiplication for finding the search direction $d_k = -H_k g_k$ is carried out. There are multiple published approaches to using a history of updates to form this direction vector. Here, we give a common approach, the so-called "two-loop recursion".[1][2]

We take as given $x_k$, the position at the $k$-th iteration, and $g_k \equiv \nabla f(x_k)$, where $f$ is the function being minimized and all vectors are column vectors. We also assume that we have stored the last $m$ updates of the form

$$s_i = x_{i+1} - x_i, \qquad y_i = g_{i+1} - g_i.$$

We define $\rho_i = \frac{1}{y_i^T s_i}$, and let $H_k^0$ be the 'initial' approximation of the inverse Hessian that our estimate at iteration $k$ begins with. Then we can compute the (uphill) direction as follows:

$q = g_k$
For $i = k-1, \dots, k-m$:
    $\alpha_i = \rho_i\, s_i^T q$
    $q = q - \alpha_i y_i$
$z = H_k^0 q$
For $i = k-m, \dots, k-1$:
    $\beta_i = \rho_i\, y_i^T z$
    $z = z + s_i (\alpha_i - \beta_i)$
Stop with $H_k g_k = z$.

This formulation is valid whether we are minimizing or maximizing. Note that if we are minimizing, the search direction would be the negative of $z$ (since $z$ is "uphill"), and if we are maximizing, $H_k^0$ should be negative definite rather than positive definite. We would typically do a backtracking line search in the search direction (any line search would be valid, but L-BFGS does not require exact line searches in order to converge).

Commonly, the initial inverse Hessian $H_k^0$ is represented as a diagonal matrix, so that computing $z = H_k^0 q$ requires only an element-by-element multiplication.

This two-loop update only works for the inverse Hessian. Approaches to implementing L-BFGS using the direct approximate Hessian $B_k$ have also been developed, as have other means of approximating the inverse Hessian.[3]
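The recursion translates almost line-for-line into NumPy. The sketch below is illustrative (not from the article); it assumes the stored pairs are ordered from oldest to newest and uses the common scaling choice $H_k^0 = \frac{s_{k-1}^T y_{k-1}}{y_{k-1}^T y_{k-1}} I$ for the initial matrix.

```python
import numpy as np

def two_loop_recursion(g, s_hist, y_hist):
    """Return z = H_k g using the L-BFGS two-loop recursion.
    s_hist, y_hist: lists of the m most recent s_i and y_i, oldest first."""
    q = g.copy()
    rho = [1.0 / (y @ s) for s, y in zip(s_hist, y_hist)]
    alpha = [0.0] * len(s_hist)
    for i in reversed(range(len(s_hist))):        # first loop: newest to oldest
        alpha[i] = rho[i] * (s_hist[i] @ q)
        q -= alpha[i] * y_hist[i]
    gamma = (s_hist[-1] @ y_hist[-1]) / (y_hist[-1] @ y_hist[-1])
    z = gamma * q                                 # z = H_k^0 q with H_k^0 = gamma * I
    for i in range(len(s_hist)):                  # second loop: oldest to newest
        beta = rho[i] * (y_hist[i] @ z)
        z += (alpha[i] - beta) * s_hist[i]
    return z                                      # descent direction for minimization is -z
```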


Applications
L-BFGS has been called "the algorithm of choice" for fitting log-linear (MaxEnt) models and conditional random fields with $\ell_2$-regularization.[4]

Variants
Since BFGS (and hence L-BFGS) is designed to minimize smooth functions without constraints, the L-BFGS algorithm must be modified to handle functions that include non-differentiable components or constraints. A popular class of modifications is called active-set methods, based on the concept of the active set. The idea is that when restricted to a small neighborhood of the current iterate, the function and constraints can be simplified.

L-BFGS-B
The L-BFGS-B algorithm extends L-BFGS to handle simple box constraints on variables.[5][6] The method works by identifying fixed and free variables at every step (using a simple gradient method), applying the L-BFGS method on the free variables only to get higher accuracy, and then repeating the process.
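For illustration, SciPy exposes this algorithm through scipy.optimize.minimize with method="L-BFGS-B" (a minimal usage sketch, not from the article; the bounds chosen here are arbitrary):

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

# Minimize the Rosenbrock function subject to the box constraints 0 <= x_i <= 0.9.
bounds = [(0.0, 0.9), (0.0, 0.9)]
res = minimize(rosen, x0=np.array([0.5, 0.5]), jac=rosen_der,
               method="L-BFGS-B", bounds=bounds)
print(res.x, res.fun)   # the unconstrained minimum (1, 1) lies outside the box,
                        # so the solution sits on the boundary
```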

OWL-QN
Orthant-wise limited-memory quasi-Newton (OWL-QN) is an L-BFGS variant for fitting $\ell_1$-regularized models, exploiting the inherent sparsity of such models.[4] It minimizes functions of the form

$$f(\vec{x}) = g(\vec{x}) + C \|\vec{x}\|_1,$$

where $g$ is a differentiable convex loss function and $C > 0$ is a constant. The method is an active-set type method: at each iterate, it estimates the sign of each component of the variable, and restricts the subsequent step to have the same sign. Once the sign is fixed, the non-differentiable $\|\vec{x}\|_1$ term becomes a smooth linear term which can be handled by L-BFGS. After an L-BFGS step, the method allows some variables to change sign, and repeats the process.

Implementations
An early, open-source implementation of L-BFGS in Fortran exists in Netlib as a shar archive[7]. Multiple other open-source implementations have been produced as translations of this Fortran code (e.g., Java[8] and Python[9] via SciPy). Other implementations exist (e.g., MATLAB Optimization Toolbox[10], MATLAB (BSD)[11]), frequently as part of generic optimization libraries (e.g., Mathematica[6], the FuncLib C# library[12], and the dlib C++ library[13]). libLBFGS[14] is a C implementation.

Implementations of variants
The L-BFGS-B variant also exists as ACM TOMS[15] algorithm 778.[6] In February 2011, some of the authors of the original L-BFGS-B code posted a major update (version 3.0);[16] this reference implementation is available in Fortran 77 (with a Fortran 90 interface) at the authors' website[17]. This version, as well as older versions, has been converted to many other languages, including: a Java wrapper[18] for v3.0; MATLAB interfaces for v3.0[19], v2.4[20], and v2.1[21]; a C++ interface[22] for v2.1; a Python interface[23] for v2.1 as part of scipy.optimize.minimize[8]; an OCaml interface[24] for v2.1 and v3.0; a C translation of version 2.3 produced by f2c, available at this website[25]; and R's general-purpose optim routine, which includes L-BFGS-B when called with method="L-BFGS-B".[26]

OWL-QN implementations are available in: a C++ implementation by its designers[27], which includes the original ICML paper on the algorithm;[4] a Python implementation[28] by Michael Subotin, intended for use with SciPy; and the CRF toolkit Wapiti[29], which includes a C implementation.


Works cited
[1] Matthies, H.; Strang, G. (1979). "The solution of non linear finite element equations". International Journal for Numerical Methods in Engineering 14 (11): 1613–1626. doi:10.1002/nme.1620141104.
[2] Nocedal, J. (1980). "Updating Quasi-Newton Matrices with Limited Storage". Mathematics of Computation 35 (151): 773–782. doi:10.1090/S0025-5718-1980-0572855-7.
[3] Byrd, R. H.; Nocedal, J.; Schnabel, R. B. (1994). "Representations of Quasi-Newton Matrices and their use in Limited Memory Methods". Mathematical Programming 63 (4): 129–156. doi:10.1007/BF01582063.
[4] Andrew, Galen; Gao, Jianfeng (2007). "Scalable Training of L1-Regularized Log-Linear Models" (http://research.microsoft.com/apps/pubs/default.aspx?id=78900). 24th International Conference on Machine Learning. doi:10.1145/1273496.1273501. ISBN 978-1-59593-793-3.
[5] Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C. (1995). "A Limited Memory Algorithm for Bound Constrained Optimization". SIAM Journal on Scientific Computing 16 (5): 1190. doi:10.1137/0916069.
[6] Zhu, C.; Byrd, Richard H.; Lu, Peihuang; Nocedal, Jorge (1997). "Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization". ACM Transactions on Mathematical Software 23 (4): 550–560. doi:10.1145/279232.279236.
[7] http://netlib.org/opt/lbfgs_um.shar
[8] http://riso.sourceforge.net/
[9] http://www.scipy.org/doc/api_docs/SciPy.optimize.lbfgsb.html#fmin_l_bfgs_b
[10] http://www.mathworks.com/help/toolbox/optim/ug/fmincon.html
[11] http://www.mathworks.com/matlabcentral/fileexchange/23245
[12] http://funclib.codeplex.com/
[13] http://dlib.net/optimization.html
[14] http://www.chokkan.org/software/liblbfgs/
[15] http://toms.acm.org/
[16] Morales, J. L.; Nocedal, J. (2011). "Remark on 'Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization'". ACM Transactions on Mathematical Software 38: 1. doi:10.1145/2049662.2049669.
[17] http://users.eecs.northwestern.edu/~nocedal/lbfgsb.html
[18] http://www.mini.pw.edu.pl/~mkobos/programs/lbfgsb_wrapper/index.html
[19] http://www.mathworks.com/matlabcentral/fileexchange/35104-lbfgsb-l-bfgs-b-mex-wrapper
[20] http://www.cs.toronto.edu/~liam/software.shtml
[21] http://www.cs.ubc.ca/~pcarbo/lbfgsb-for-matlab.html
[22] http://code.google.com/p/otkpp/source/browse/trunk/otkpp/localsolvers/lbfgsb/LBFGSB.cpp?r=51
[23] http://www.scipy.org/doc/api_docs/SciPy.optimize.lbfgsb.html
[24] http://forge.ocamlcore.org/projects/lbfgs/
[25] http://www.koders.com/c/fid4A53890DFB42BB9734639793C7BDD4EB1B8E6583.aspx?s=decomposition
[26] "General-purpose Optimization" (http://finzi.psych.upenn.edu/R/library/stats/html/optim.html). R documentation. Comprehensive R Archive Network.
[27] http://research.microsoft.com/en-us/downloads/b1eb1016-1738-4bd5-83a9-370c9d498a03/
[28] http://www.umiacs.umd.edu/~msubotin/owlqn.py
[29] http://wapiti.limsi.fr

Further reading
Liu, D. C.; Nocedal, J. (1989). "On the Limited Memory Method for Large Scale Optimization" (http://www.ece.northwestern.edu/~nocedal/PSfiles/limited-memory.ps.gz). Mathematical Programming B 45 (3): 503–528. doi:10.1007/BF01589116.
Byrd, Richard H.; Lu, Peihuang; Nocedal, Jorge; Zhu, Ciyou (1995). "A Limited Memory Algorithm for Bound Constrained Optimization" (http://www.ece.northwestern.edu/~nocedal/PSfiles/limited.ps.gz). SIAM Journal on Scientific and Statistical Computing 16 (5): 1190–1208. doi:10.1137/0916069.


Broyden's method
In numerical analysis, Broyden's method is a quasi-Newton method for finding roots of a system of equations in $k$ variables. It was originally described by C. G. Broyden in 1965.[1]

Newton's method for solving the equation $f(x) = 0$ uses the Jacobian matrix, $J$, at every iteration. However, computing this Jacobian is a difficult and expensive operation. The idea behind Broyden's method is to compute the whole Jacobian only at the first iteration, and to do a rank-one update at the other iterations.

In 1979 Gay proved that when Broyden's method is applied to a linear system of size $n \times n$, it terminates in $2n$ steps.[2]

Description of the method


Solving the single-variable equation

In the secant method, we replace the first derivative $f'$ at $x_n$ with the finite-difference approximation

$$f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}},$$

and proceed similarly to Newton's method ($n$ is the index of the iterations):

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \approx x_n - f(x_n)\,\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}.$$
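As a brief illustration (not part of the original article; the example function and starting points are arbitrary), the one-dimensional secant iteration can be written as:

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant iteration: replaces f'(x_n) with the finite-difference slope above."""
    f0 = f(x0)
    for _ in range(max_iter):
        f1 = f(x1)
        if abs(f1) < tol:
            break
        # x_{n+1} = x_n - f(x_n) * (x_n - x_{n-1}) / (f(x_n) - f(x_{n-1}))
        x0, x1, f0 = x1, x1 - f1 * (x1 - x0) / (f1 - f0), f1
    return x1

print(secant(lambda x: x * x - 2.0, 1.0, 2.0))   # ~1.4142135623730951
```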

Solving a set of nonlinear equations


To solve a set of nonlinear equations

$$f(x) = 0,$$

where the vector $f(x)$ is a function of the vector $x = (x_1, x_2, \dots, x_k)$, so that (if we have $k$ equations)

$$f(x) = \big(f_1(x_1, \dots, x_k),\; f_2(x_1, \dots, x_k),\; \dots,\; f_k(x_1, \dots, x_k)\big),$$

Broyden gives a generalization of the above one-dimensional formula, replacing the derivative $f'$ with the Jacobian $J$. The Jacobian matrix is determined iteratively, based on the secant equation in the finite-difference approximation:

$$J_n (x_n - x_{n-1}) \approx f(x_n) - f(x_{n-1}),$$

where $n$ is the index of iterations. However, the above equation is underdetermined in more than one dimension. Broyden suggested using the current estimate of the Jacobian matrix $J_{n-1}$ and improving upon it by taking the solution to the secant equation that is a minimal modification to $J_{n-1}$ (minimal in the sense of minimizing the Frobenius norm $\|J_n - J_{n-1}\|_F$):

$$J_n = J_{n-1} + \frac{\Delta f_n - J_{n-1} \Delta x_n}{\|\Delta x_n\|^2}\, \Delta x_n^T,$$

where

$$\Delta x_n = x_n - x_{n-1}, \qquad \Delta f_n = f(x_n) - f(x_{n-1}).$$

We then proceed in the Newton direction:

$$x_{n+1} = x_n - J_n^{-1} f(x_n).$$

Broyden also suggested using the Sherman–Morrison formula to update the inverse of the Jacobian matrix directly:

$$J_n^{-1} = J_{n-1}^{-1} + \frac{\Delta x_n - J_{n-1}^{-1} \Delta f_n}{\Delta x_n^T J_{n-1}^{-1} \Delta f_n}\, \Delta x_n^T J_{n-1}^{-1}.$$

This first method is commonly known as the "good Broyden's method". A similar technique can be derived by using a slightly different modification to $J_{n-1}$ (one that minimizes $\|J_n^{-1} - J_{n-1}^{-1}\|_F$ instead); this yields the so-called "bad Broyden's method" (but see [3]):

$$J_n^{-1} = J_{n-1}^{-1} + \frac{\Delta x_n - J_{n-1}^{-1} \Delta f_n}{\|\Delta f_n\|^2}\, \Delta f_n^T.$$

Many other quasi-Newton schemes have been suggested in optimization, where one seeks a maximum or minimum by finding the root of the first derivative (the gradient in multiple dimensions). The Jacobian of the gradient is called the Hessian and is symmetric, adding further constraints to its update.
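A minimal sketch of the "good" Broyden iteration in NumPy follows (illustrative only; the function names, the forward-difference initial Jacobian, and the test system are assumptions, not part of the article):

```python
import numpy as np

def fd_jacobian(f, x, eps=1e-7):
    """Forward-difference Jacobian, computed once at the starting point."""
    fx = f(x)
    J = np.empty((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        J[:, j] = (f(xp) - fx) / eps
    return J

def broyden_good(f, x0, tol=1e-10, max_iter=50):
    """'Good' Broyden method: rank-one update of the Jacobian approximation."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    J = fd_jacobian(f, x)                            # full Jacobian only at the first iteration
    for _ in range(max_iter):
        if np.linalg.norm(fx) < tol:
            break
        dx = np.linalg.solve(J, -fx)                 # Newton-like step with the approximate Jacobian
        x_new = x + dx
        f_new = f(x_new)
        df = f_new - fx
        J += np.outer(df - J @ dx, dx) / (dx @ dx)   # good Broyden rank-one update
        x, fx = x_new, f_new
    return x

# Hypothetical test system: x0^2 + x1^2 = 1 and x0 = x1
f = lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 1.0, v[0] - v[1]])
print(broyden_good(f, np.array([1.0, 0.5])))         # approximately [0.7071, 0.7071]
```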

References
[1] Broyden, C. G. (October 1965). "A Class of Methods for Solving Nonlinear Simultaneous Equations". Mathematics of Computation (American Mathematical Society) 19 (92): 577–593. doi:10.2307/2003941. JSTOR 2003941.
[2] Gay, D. M. (August 1979). "Some convergence properties of Broyden's method". SIAM Journal on Numerical Analysis 16 (4): 623–630. doi:10.1137/0716047.
[3] Kvaalen, Eric (November 1991). "A faster Broyden method". BIT Numerical Mathematics 31 (2): 369–372. doi:10.1007/BF01931297.

External links
Module for Broyden's Method by John H. Mathews (http://math.fullerton.edu/mathews/n2003/BroydenMethodMod.html)


Symmetric rank-one
The Symmetric Rank 1 (SR1) method is a quasi-Newton method to update the second derivative (Hessian) based on the derivatives (gradients) calculated at two points. It is a generalization of the secant method to a multidimensional problem. This update maintains the symmetry of the matrix but does not guarantee that the update be positive definite.

The sequence of Hessian approximations generated by the SR1 method converges to the true Hessian under mild conditions, in theory; in practice, the approximate Hessians generated by the SR1 method show faster progress towards the true Hessian than do popular alternatives (BFGS or DFP), in preliminary numerical experiments.[1] The SR1 method has computational advantages for sparse or partially separable problems. Alternative quasi-Newton methods such as BFGS impose positive definiteness, which is an inappropriate constraint on indefinite problems.[1]

A twice continuously differentiable function $f(x)$ has a gradient ($\nabla f$) and Hessian matrix $B$: the function has an expansion as a Taylor series at $x_0$, which can be truncated

$$f(x_0 + \Delta x) \approx f(x_0) + \nabla f(x_0)^T \Delta x + \tfrac{1}{2} \Delta x^T B\, \Delta x;$$

its gradient has a Taylor-series approximation also,

$$\nabla f(x_0 + \Delta x) \approx \nabla f(x_0) + B\, \Delta x,$$

which is used to update $B$. The above secant equation need not have a unique solution $B$. The SR1 formula computes (via an update of rank 1) the symmetric solution that is closest to the current approximate value $B_k$:

$$B_{k+1} = B_k + \frac{(y_k - B_k \Delta x_k)(y_k - B_k \Delta x_k)^T}{(y_k - B_k \Delta x_k)^T \Delta x_k},$$

where $\Delta x_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$. The corresponding update to the approximate inverse Hessian $H_k = B_k^{-1}$ is

$$H_{k+1} = H_k + \frac{(\Delta x_k - H_k y_k)(\Delta x_k - H_k y_k)^T}{(\Delta x_k - H_k y_k)^T y_k}.$$

The SR1 formula has been rediscovered a number of times. A drawback is that the denominator can vanish. Some authors have suggested that the update be applied only if

$$|\Delta x_k^T (y_k - B_k \Delta x_k)| \geq r\, \|\Delta x_k\|\, \|y_k - B_k \Delta x_k\|,$$

where $r$ is a small positive number.[2]
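A minimal sketch of the SR1 update with the safeguard just described (illustrative only; the function name and the default threshold r = 1e-8, chosen as a typical small value, are assumptions):

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """SR1 update of the Hessian approximation B; s = step, y = gradient difference.
    The update is skipped when the denominator is too small relative to ||s|| ||y - Bs||."""
    v = y - B @ s                               # residual of the secant equation
    denom = v @ s
    if abs(denom) < r * np.linalg.norm(s) * np.linalg.norm(v):
        return B                                # safeguard: keep B unchanged
    return B + np.outer(v, v) / denom
```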
Notes
[1] Conn, Gould & Toint (1991)
[2] Nocedal & Wright (1999)

References
Byrd, Richard H. (1996), "Analysis of a Symmetric Rank-One Trust Region Method", SIAM Journal on Optimization 6 (4).
Conn, A. R.; Gould, N. I. M.; Toint, Ph. L. (March 1991), "Convergence of quasi-Newton matrices generated by the symmetric rank one update", Mathematical Programming (Springer Berlin/Heidelberg) 50 (1): 177–195. doi:10.1007/BF01594934. ISSN 0025-5610. PDF file at Nick Gould's website (ftp://ftp.numerical.rl.ac.uk/pub/nimg/pubs/ConnGoulToin91_mp.pdf).

Khalfan, H. Fayez (1993), "A Theoretical and Experimental Study of the Symmetric Rank-One Update", SIAM Journal on Optimization 3 (1).
Nocedal, Jorge; Wright, Stephen J. (1999), Numerical Optimization, Springer-Verlag. ISBN 0-387-98793-2.



Article Sources and Contributors


Quasi-Newton method
  Source: http://en.wikipedia.org/w/index.php?oldid=495838291
  Contributors: AbnerCYH, Abnercyh, Aschmied, BenFrantzDale, Benwing, ChristianGruen, Dayson39, Donvinzk, Drybid, Grafen, JPRBW, Jitse Niesen, Kiefer.Wolfowitz, Lavaka, Male1979, Marcol, Mathemaduenn, Michael Hardy, MisterSheik, O18, Oleg Alexandrov, Qwertyus, R'n'B, Rubybrian, Shai mach, Smarchesini, Starshipenterprise, Thomas.kn, Zakke, 23 anonymous edits

Davidon–Fletcher–Powell formula
  Source: http://en.wikipedia.org/w/index.php?oldid=478142548
  Contributors: A. Pichler, Good Olfactory, Jitse Niesen, K.menin, Kiefer.Wolfowitz, Michael Hardy, Oleg Alexandrov, Omnipaedista, Rushbugled13, Smarchesini, 5 anonymous edits

BFGS method
  Source: http://en.wikipedia.org/w/index.php?oldid=504984544
  Contributors: A. Pichler, AManWithNoPlan, Alanb, Baccyak4H, Benwing, Booyabazooka, Charles Matthews, Chemuser, Drybid, Dtrebbien, Dycotiles, Gdm, Getreal123, Gnfnrf, Headbomb, Isilanes, Itub, JJL, Jitse Niesen, K.menin, Kiefer.Wolfowitz, Komap, Lavaka, Lunch, Male1979, Mebden, Michael Hardy, Nicolasbock, O18, Oleg Alexandrov, Omnipaedista, Paulck, Qwertyus, R'n'B, Roystgnr, Rushbugled13, Sabamo, Smarchesini, Stevenj, TenOfAllTrades, X7q, 54 anonymous edits

Limited-memory BFGS
  Source: http://en.wikipedia.org/w/index.php?oldid=503271989
  Contributors: AManWithNoPlan, Abeppu, Avraham, Benwing, Bobo192, Chris the speller, Danpovey, Delaszk, Eli Osherovich, Fredludlow, Glenn, JimVC3, Jitse Niesen, K.menin, Kiefer.Wolfowitz, Lavaka, LongHei, Maghnus, Marasmusine, Memming, Michael Hardy, NathanHagen, Nocedal, Noegenesis, Qwertyus, Soultaco, 50 anonymous edits

Broyden's method
  Source: http://en.wikipedia.org/w/index.php?oldid=505386623
  Contributors: Andrewhayes, BigJohnHenry, Calleman21, Cobi, Davidnavarroalarcon, Domination989, Dtrebbien, Haseldon, Imjustmatthew, Jitse Niesen, Lavaka, NathanHagen, Neuralwarp, Oleg Alexandrov, Paulnwatts, Rjwilmsi, Smarchesini, Timena dava, Tmonzenet, Wikielwikingo, 30 anonymous edits

Symmetric rank-one
  Source: http://en.wikipedia.org/w/index.php?oldid=502963045
  Contributors: BenFrantzDale, Bte99, Headbomb, Jitse Niesen, K.menin, Kiefer.Wolfowitz, Noegenesis, Qwertyus, Roystgnr, Smarchesini, WhatamIdoing, 1 anonymous edits


License
Creative Commons Attribution-Share Alike 3.0 Unported (http://creativecommons.org/licenses/by-sa/3.0/)
