Contents

1 Numerical optimization
1.1 Optimization of single-variable functions
1.1.1 Golden Section Search
1.1.2 Fibonacci Search
1.2 Algorithms for unconstrained multivariable optimization
1.2.1 Basic concepts
1.2.1.1 Line search algorithms
1.2.1.2 Trust region methods
1.2.2 Newton methods
1.2.2.1 The algorithm
1.2.2.2 Geometrical interpretation
1.2.2.3 Implementation aspects
1.2.2.4 Modified Newton method
1.2.3 Gradient methods
1.2.3.1 Directional derivative and the gradient
1.2.3.2 The method of steepest descent
1.2.4 Conjugate gradient methods
1.2.5 Quasi-Newton methods
1.2.6 Algorithms for minimization without derivatives
1.2.6.1 Nelder-Mead method
1.2.6.2 Rosenbrock method
Chapter 1

Numerical optimization
1.1 Algorithms for optimization of single-variable functions. Bracketing techniques
Consider a single-variable real-valued function f(x) for which it is required to find an optimum in an interval [a, b]. Among the algorithms for univariate optimization, golden section and Fibonacci search techniques are fast, accurate, robust and do not require derivatives (Sinha, 2007; Press et al., 2007; Mathews, 2005). They assume that the optimum is bracketed in a finite interval and the function is unimodal on [a, b]. The algorithms will be described for finding a minimum; the problem of finding a maximum is similar and may be solved with a very slight alteration of the algorithm or by changing the sign of the function.

A function f(x) is unimodal on an interval [a, b] if it has exactly one local minimum x* on the interval and is strictly decreasing for x < x* and strictly increasing for x > x*.
[Figure: plot of a unimodal function f(x).]

Two interior points x1 < x2 are placed in [a, b] so that each lies at the same fraction r of the interval length from one of the ends:

x2 - a = r(b - a)    (1.2)

b - x1 = r(b - a)    (1.3)
We shall assume now that the function has a smaller value at x1 than at x2; thus the new search interval will be [a, x2]. The demonstration is similar in the other case, when f(x2) > f(x1). A point x3 is introduced within the new interval so that it is located at the same distance, d2, from x2 as x1 is from a:

d2 = x1 - a = x2 - x3    (1.4)
The ratio r must remain constant on each new interval, which means:

x1 - a = r(x2 - a)    (1.5)

x2 - x3 = r(x2 - a)    (1.6)

If x2 from (1.2) and x1 from (1.3) are replaced into (1.5), a value for r can be determined as follows:

b - r(b - a) - a = r [a + r(b - a) - a]    (1.7)
Rearranging, we get:

(b - a)(1 - r - r^2) = 0    (1.8)

and since the initial length of the interval is nonzero, the ratio r is one of the roots of

1 - r - r^2 = 0    (1.9)

The solutions of (1.9) are:

r1 = (-1 + sqrt(5))/2 = 0.618,  r2 = (-1 - sqrt(5))/2 = -1.618    (1.10)

Because the ratio must be positive, the value retained is r = 0.618, the golden section ratio.
From Figure 1.3, the left-hand side of relation (1.13) can be written as:

x_{2k} - x_{1k} = (b_k - a_k) - (x_{1k} - a_k) - (b_k - x_{2k})
              = (b_k - a_k) - (1 - r_k)(b_k - a_k) - (1 - r_k)(b_k - a_k)
              = (b_k - a_k)(2 r_k - 1)    (1.14)
The ratio at the next step is therefore:

r_{k+1} = (1 - r_k)/r_k    (1.17)

Now we introduce the Fibonacci numbers. Replace:

r_k = F_{n-k}/F_{n-k+1}    (1.18)

and get the following:

r_{k+1} = (1 - F_{n-k}/F_{n-k+1}) / (F_{n-k}/F_{n-k+1}) = F_{n-k-1}/F_{n-k}    (1.19)

The sequence of ratios is therefore:

r_1 = F_{n-1}/F_n,  r_2 = F_{n-2}/F_{n-1},  ...,  r_{n-2} = F_2/F_3,  r_{n-1} = F_1/F_2 = 1/2    (1.20)
At the last step, n - 1, no new points can be added. The distance d_{n-1} = d_{n-2} r_{n-1} = d_{n-2}/2, i.e. at step n - 1 the points x_{1,n-1} and x_{2,n-1} are equal and:

r_n = F_0/F_1 = 1/1 = 1    (1.21)
The last interval obtained is d_n = r_n(b_n - a_n) = r_n d_{n-1} = d_{n-1} and no other steps are possible. It may be written as:

d_n = r_n d_{n-1} = (F_0/F_1) d_{n-1} = (F_0 F_1)/(F_1 F_2) d_{n-2} = ... = (F_0 F_1 ... F_{n-1})/(F_1 F_2 ... F_n) d_0 = (F_0/F_n) d_0 = d_0/F_n    (1.22)

Since d_0 = b - a, the length of the final interval is:

d_n = (b - a)/F_n    (1.23)
and the necessary number of iterations to reach the minimum with a maximum error ε is the smallest integer n that makes the following inequality true:

F_n > (b - a)/ε    (1.24)
Both algorithms for single-variable function optimization presented here can be applied when the function f is not differentiable. For a small number of iterations the Fibonacci method is more efficient, since it provides the smallest final interval of uncertainty in a given number of steps. When n is large, the methods are almost identical, because the ratio F_{n-k}/F_{n-k+1} converges to the golden section ratio 0.618 for large values of n - k.
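As a concrete sketch, the golden section procedure can be implemented in a few lines. The interior points follow relations (1.2)-(1.3), so one of them is reused at every iteration and only one new function evaluation is needed per step; the test function and tolerance below are illustrative choices, not taken from the text.

```python
import math

def golden_section_search(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    r = (math.sqrt(5.0) - 1.0) / 2.0          # golden section ratio 0.618...
    x1 = b - r * (b - a)                      # relation (1.3)
    x2 = a + r * (b - a)                      # relation (1.2)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 < f2:                           # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = f(x1)
        else:                                 # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = f(x2)
    return (a + b) / 2.0

# the function x^4 - x^2 is unimodal on [0, 1], with the minimum at 1/sqrt(2)
xmin = golden_section_search(lambda x: x**4 - x**2, 0.0, 1.0)
```

Note that only one function evaluation per iteration is needed, which is exactly the property that makes the golden section placement attractive.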
1.2 Algorithms for unconstrained multivariable optimization

1.2.1 Basic concepts

An iterative algorithm produces a sequence of points x_k and is stopped when a convergence criterion is satisfied. Common criteria require that: the distance between two successive points is smaller than a given tolerance ε1:

||x_{k+1} - x_k|| < ε1    (1.26)

the function value calculated at two successive points does not vary more than a given tolerance ε2:

|f(x_{k+1}) - f(x_k)| < ε2    (1.27)

or the gradient norm is smaller than a given tolerance ε3:

||∇f(x_k)|| < ε3    (1.28)

where ||X|| is the Euclidean norm of a vector X = [x1 x2 ... xn]^T, given by:

||X|| = sqrt(x1^2 + x2^2 + ... + xn^2)    (1.29)
In general, there are two strategies for moving from one point to the next: line search and trust region.
An extensive description of algorithms from both categories can be found in (Nocedal and Wright, 1999;
Snyman, 2005; Conn et al., 2000).
1.2.1.1 Line search algorithms
Searching along a line for a minimum point is a component of most nonlinear programming algorithms. To initiate a line search with respect to a function f, two vectors must be specified: the initial point x0 and the direction d in which the search is to be made (Luenberger, 2003).
[Figure: iterates x0, x1, ..., xn of a line search method on the surface of f(x1, x2) = 3 x1 sin(x2).]

Each iteration updates the current point by a step of length s_k along the search direction d_k:

x_{k+1} = x_k + s_k d_k    (1.30)

and the behaviour of f along the ray is described by the single-variable function:

F(s_k) = f(x_k + s_k d_k)    (1.31)
Large values of the step length may lead to the situation when the minimum point is not reached in a small number of iterations, as shown in Figure 1.5: the function f is reduced at every step, but by a very small amount compared to the step length.

repeat
    Choose a step length s_k, or compute a step length that provides a decrease in the value of f (e.g. using the Armijo, Goldstein or Wolfe rules)
    Compute x_{k+1} = x_k + s_k d_k
until one of the conditions (1.26), (1.27) or (1.28) is true
One possibility to control the step size is to implement a procedure for minimization of F(s_k) based on golden section or Fibonacci search, as presented in the previous sections.
Sophisticated algorithms for one-dimensional minimization may not be a reasonable choice because they would require a large number of function evaluations and thus a significant increase of the computation time. More practical strategies perform an inexact line search to identify a step length that achieves adequate reductions in f. Implementations of line search algorithms try out a sequence of candidates for s_k, accepting one of these values as soon as certain conditions are satisfied. A detailed description of the Armijo, Goldstein and Wolfe conditions is given in (Nocedal and Wright, 1999; Snyman, 2005; Luenberger, 2003).
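A minimal sketch of an inexact line search based on the Armijo (sufficient decrease) rule is given below; the constant c1 and the shrinking factor rho are conventional default values, and the quadratic test function is an assumption made for illustration only.

```python
import numpy as np

def backtracking_armijo(f, grad, x, d, s0=1.0, c1=1e-4, rho=0.5):
    """Shrink the step until the sufficient-decrease condition
    f(x + s d) <= f(x) + c1 * s * d^T grad f(x) holds."""
    s = s0
    fx = f(x)
    slope = d @ grad(x)          # directional derivative d^T grad f(x), negative
    while f(x + s * d) > fx + c1 * s * slope:
        s *= rho                 # reject the candidate, try a shorter step
    return s

# illustrative quadratic test function f(x) = x1^2 + 4 x2^2
f = lambda x: x[0]**2 + 4.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 8.0 * x[1]])
x = np.array([1.0, 1.0])
d = -grad(x)                     # steepest descent direction
s = backtracking_armijo(f, grad, x, d)
```

Because only function values at candidate steps are needed, the cost per iteration stays small, which is the point of an inexact search.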
The derivative of F with respect to the step length, and its value at s_k = 0, are:

F'(s_k) = d_k^T ∇f(x_k + s_k d_k)    (1.35)

F'(0) = d_k^T ∇f(x_k)    (1.36)

so the first-order approximation of F around 0 is:

F(0) + F'(0) s_k = f(x_k) + d_k^T ∇f(x_k) s_k    (1.37)
The Wolfe test requires a condition additional to the relation (1.37) to be satisfied:

f(x_k + s_k d_k) ≤ f(x_k) + σ1 d_k^T ∇f(x_k) s_k    (1.39)

d_k^T ∇f(x_k + s_k d_k) ≥ σ2 d_k^T ∇f(x_k)    (1.40)

where 0 < σ1 < σ2 < 1.
The third term in (1.41) was differentiated with respect to x_{k+1} according to the rule:

∂(x^T A x)/∂x = 2 A x,  for A symmetrical    (1.43)
The extremum points of f are found among the solutions of the stationarity condition:

∇f(x) = 0    (1.45)

For a single-variable function g, Newton's method finds a zero of g by repeatedly linearizing it around the current point x_k and taking the root of the tangent line. Since the new point calculated is denoted by x_{k+1}, the iterative scheme is:

x_{k+1} = x_k - g(x_k)/g'(x_k)    (1.50)

When the function g is the first derivative of a function f, g(x) = f'(x), the relation (1.50) can be written as:

x_{k+1} = x_k - f'(x_k)/f''(x_k)    (1.51)
For a quadratic function:

f(x) = (1/2) x^T A x + b^T x + c    (1.52)

where A is a symmetric n × n matrix, b is an n × 1 vector and c a scalar, the relation (1.41) is not an approximation and the stationary point can be determined in one step. The gradient of f results as:

∇f(x) = A x + b    (1.53)

and the solution of the optimization problem is:

x* = -A^{-1} b    (1.54)

A Newton method will converge to the exact stationary point in one step, no matter which initial point is selected.
Example 1.1 Consider the function f(x1, x2) = x1^2 + 2 x2^2 + x1 + 7. It can be written in the form (1.52) as:

f(x1, x2) = (1/2) [x1 x2] [2 0; 0 4] [x1; x2] + [1 0] [x1; x2] + 7    (1.55)
Let x0 = [x10 x20]^T be the initial point of a Newton algorithm. The gradient and Hessian matrix for the function f are:

∇f(x1, x2) = [2 x1 + 1; 4 x2],  H(x1, x2) = [2 0; 0 4]    (1.56)

The next point is calculated from:

x1 = x0 - H(x0)^{-1} ∇f(x0)    (1.57)

or:

[x11; x12] = [x10; x20] - [1/2 0; 0 1/4] [2 x10 + 1; 4 x20] = [-1/2; 0]    (1.58)

Notice that the gradient calculated at this new point, [-1/2, 0], is zero; thus the condition (1.45) is satisfied and the algorithm stops here.
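The one-step behaviour on Example 1.1 can be checked numerically. The starting point below is an arbitrary choice, and the step solves a linear system instead of forming H^{-1} explicitly:

```python
import numpy as np

# f(x1, x2) = x1^2 + 2 x2^2 + x1 + 7 written as (1/2) x^T A x + b^T x + c
A = np.array([[2.0, 0.0], [0.0, 4.0]])   # Hessian, constant for a quadratic
b = np.array([1.0, 0.0])

def grad(x):
    return A @ x + b                      # relation (1.53)

x0 = np.array([3.0, -2.0])                # arbitrary starting point
x1 = x0 - np.linalg.solve(A, grad(x0))    # Newton step (1.57)

# the stationary point is x* = -A^{-1} b = [-1/2, 0], reached in one step
```

Solving the system H d = ∇f instead of inverting H is the usual practice; it is cheaper and numerically safer.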
As a single-variable example, consider the function f(x) = x^4 - x^2, with the derivatives:

f'(x) = 4 x^3 - 2 x,  f''(x) = 12 x^2 - 2    (1.59)

A starting point x0 and a tolerance ε (i.e. ε = 10^-5) must be selected. The next points of the Newton search are calculated according to the iterative scheme:

x_{k+1} = x_k - f'(x_k)/f''(x_k)    (1.61)
[Figure 1.8: Newton iterations starting from x0 = 0.9 converge to x* = 0.707; f''(x*) = 4 > 0, so x* is a local minimum.]
[Figure 1.9: Newton iterations starting from x0 = 0.3 converge to x* = 0; f''(x*) = -2 < 0, so x* is a local maximum.]
Figures 1.8, 1.9 and 1.10 illustrate the result of the Newton algorithm for various values of the initial point x0, the extremum obtained x*, and the type of extremum, determined directly from the computation of the second derivative at x*.
[Figure 1.10: Newton iterations starting from x0 = -0.3 converge to x* = -0.707; f''(x*) = 4 > 0, so x* is a local minimum.]
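A minimal sketch of the one-dimensional Newton iteration (1.61); the test function x^4 - x^2 is chosen to be consistent with the second derivative 12 x^2 - 2 quoted above, and the tolerance matches the one suggested in the text.

```python
def newton_1d(df, d2f, x0, tol=1e-5, max_iter=50):
    """Newton iteration x_{k+1} = x_k - f'(x_k)/f''(x_k), scheme (1.61)."""
    x = x0
    for _ in range(max_iter):
        x_new = x - df(x) / d2f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# f(x) = x^4 - x^2
df = lambda x: 4.0 * x**3 - 2.0 * x      # first derivative
d2f = lambda x: 12.0 * x**2 - 2.0        # second derivative (1.59)

x_min = newton_1d(df, d2f, 0.9)   # converges to 1/sqrt(2) = 0.707, a minimum
x_max = newton_1d(df, d2f, 0.3)   # converges to 0, a maximum
```

Note that the iteration finds any stationary point, minimum or maximum; the type must be decided afterwards from the sign of f''(x*).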
The advantages of the Newton method include:
- The method is powerful and easy to implement.
- It will converge to a local extremum from any sufficiently close starting value.

However, the method has some drawbacks:
- The method requires the knowledge of the gradient and Hessian matrix, which is not always possible. For large problems, determining the Hessian matrix by the method of finite differences is costly in terms of function evaluations.
- There is no guarantee of convergence unless the initial point is reasonably close to the extremum; thus the global convergence properties are poor.
1.2.2.4 Modified Newton method

A classical Newton method requires calculation of the Hessian at each iteration. Moreover, the matrix has to be inverted, which may be inconvenient for large problems. A modified Newton method uses the Hessian calculated at the initial point. The convergence of the algorithm is still good for starting points close to extrema, but the convergence is linear instead of quadratic (as in the case of the classical method). Algorithm 7 describes the procedure.
Algorithm 7 Modified Newton method
Define f(x) and a tolerance ε
Choose an initial point x0
Compute the gradient ∇f(x) symbolically
Compute and evaluate the Hessian matrix at the initial point, H(x0)
Compute the inverse of the Hessian, H(x0)^{-1}
while ||H(x0)^{-1} ∇f(x_k)|| > ε do
    Calculate a new point x_{k+1} = x_k - H(x0)^{-1} ∇f(x_k)
    Set k ← k + 1
end while
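Algorithm 7 can be sketched as follows. The quadratic from Example 1.1 serves as a test; its Hessian is constant, so the frozen Hessian happens to be exact here, and the assumed starting point is arbitrary.

```python
import numpy as np

def modified_newton(grad, hess, x0, tol=1e-8, max_iter=500):
    """Modified Newton method: the Hessian is evaluated and inverted
    only once, at the initial point, and reused for every iteration."""
    H0_inv = np.linalg.inv(hess(x0))
    x = x0
    for _ in range(max_iter):
        step = H0_inv @ grad(x)
        if np.linalg.norm(step) <= tol:      # stop test of Algorithm 7
            break
        x = x - step
    return x

# f(x1, x2) = x1^2 + 2 x2^2 + x1 + 7 from Example 1.1
grad = lambda x: np.array([2.0 * x[0] + 1.0, 4.0 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
x_star = modified_newton(grad, hess, np.array([3.0, -2.0]))
```

For a non-quadratic function, the frozen Hessian is only an approximation, which is exactly why the convergence rate drops from quadratic to linear.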
1.2.3 Gradient methods

1.2.3.1 Directional derivative and the gradient

[Figure: the surface of a function f(x1, x2), with the direction of x1, the direction of x2 and the direction of an arbitrary unit vector u.]
The inner product from the definition of the directional derivative can be expressed as:

∇_u f = ∇f · u = ||∇f|| ||u|| cos θ = ||∇f|| cos θ    (1.63)

where ||∇f|| and ||u|| are the norms of the corresponding vectors and θ is the angle between them (Figure 1.12). Because u is a unit vector, its norm is equal to one. From (1.63) we obtain that:
- the largest possible value of ∇_u f is obtained when cos θ assumes its maximum value of 1, at θ = 0, i.e. the vector u points in the direction of the gradient ∇f. Thus, the directional derivative will be maximized in the direction of the gradient and this maximum value will be the magnitude of the gradient (||∇f||).
- if θ = π, the cosine has its smallest value of -1 and the unit vector u points in the opposite direction of the gradient, -∇f. The directional derivative will be minimized in the direction of -∇f and ∇_u f = -||∇f||.

In other words, the function f will increase most rapidly (i.e. it has the greatest positive rate of change) in the direction of the gradient and will decrease most rapidly in the direction of -∇f.
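This geometric fact is easy to verify numerically: sampling unit vectors u around the circle, the largest directional derivative is attained by the u closest to the gradient direction, and its value approaches ||∇f||. The gradient vector below is an arbitrary example value, not taken from the text.

```python
import numpy as np

grad = np.array([2.0, 1.0])          # example gradient of some f at a point
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
units = np.stack([np.cos(angles), np.sin(angles)], axis=1)   # unit vectors u

# grad . u = ||grad|| cos(theta), relation (1.63)
dir_derivs = units @ grad

best = units[np.argmax(dir_derivs)]  # the u maximizing the directional derivative
```

With one-degree sampling, `best` agrees with ∇f/||∇f|| to within the sampling resolution, and the maximum of `dir_derivs` is ||∇f|| up to the same resolution.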
The gradient of f represented as a vector field is shown in Figure 1.13; it points in the direction of maximum increase of the function.

[Figure 1.13: the surface of f(x1, x2) and its gradient field in the (x1, x2) plane.]
1.2.3.2 The method of steepest descent

The method of steepest descent moves from the current point in the direction of the negative gradient, normalized to unit length:

x_{k+1} = x_k - s_k ∇f(x_k)/||∇f(x_k)||    (1.66)

In this case, the distance between x_k and x_{k+1} is exactly the step length s_k. The next iterate is computed in the direction of the negative gradient at this new point x_{k+1}, and we get a zig-zag pattern as illustrated in the example from Figure 1.14. Iterations continue until the extremum has been determined within a chosen accuracy ε. The convergence criterion may be one of those given by the relations (1.26), (1.27) or (1.28).

The step size can be selected at the beginning of the procedure and fixed for all iterations. In this case it is possible to obtain oscillations around the local minimum and a very large number of iterations until the procedure eventually converges to a solution.
repeat
    Compute the optimal step length by solving the one-dimensional problem:
    min_{s_k > 0} f(x_k - s_k ∇f(x_k)/||∇f(x_k)||)
    Calculate a new point:
    x_{k+1} ← x_k - s_k ∇f(x_k)/||∇f(x_k)||
    Set k ← k + 1
until ||∇f(x_k)|| < ε (or ||x_{k+1} - x_k|| < ε, or |f(x_{k+1}) - f(x_k)| < ε)
Quadratic functions

In the particular case when the function to be minimized is quadratic, we are able to determine explicitly the optimal step size that minimizes f(x_k - s_k ∇f(x_k)). Consider a function having the form:

f(x) = (1/2) x^T Q x + b^T x + c    (1.67)

where the n × n matrix Q is symmetric and positive definite, b is an n × 1 vector and c is a scalar. The gradient of f is obtained as:

∇f(x) = Q x + b    (1.68)
When x_k is given, the function f(x_k - s_k ∇f(x_k)) depends only on s_k and, using (1.68), its derivative is:

df/ds_k = -∇f(x_k)^T ∇f(x_k) + s_k ∇f(x_k)^T Q ∇f(x_k)    (1.70)

If we set the derivative df/ds_k equal to zero, the optimal step is given by:

s_k = (∇f(x_k)^T ∇f(x_k)) / (∇f(x_k)^T Q ∇f(x_k))    (1.72)

Thus, for quadratic functions, the method does not require a line search procedure and the steepest descent iteration is given by:

x_{k+1} = x_k - [(∇f(x_k)^T ∇f(x_k)) / (∇f(x_k)^T Q ∇f(x_k))] ∇f(x_k)    (1.73)
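A sketch of steepest descent with the closed-form step (1.72). The matrix Q below is the one used in the conjugate gradient example further on, so the two methods can be compared on the same problem; the tolerance and iteration limit are illustrative choices.

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-8, max_iter=10000):
    """Steepest descent for f(x) = (1/2) x^T Q x + b^T x + c using
    the exact optimal step (1.72); no line search is needed."""
    x = x0
    for _ in range(max_iter):
        g = Q @ x + b                      # gradient (1.68)
        if np.linalg.norm(g) < tol:
            break
        s = (g @ g) / (g @ Q @ g)          # optimal step length (1.72)
        x = x - s * g                      # iteration (1.73)
    return x

Q = np.array([[2.0, 2.0], [2.0, 8.0]])     # symmetric positive definite
b = np.zeros(2)
x = steepest_descent_quadratic(Q, b, np.array([-2.5, 0.0]))
```

Even with the exact step, the iterates zig-zag toward the minimizer, in contrast with the conjugate gradient method of the next section, which reaches it in n steps.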
1.2.4 Conjugate gradient methods

[Figure: a) minimization along the directions d0 and d1; b) the corresponding path on the contour lines of a quadratic function in the (x1, x2) plane.]
The iterates are given by:

x_{k+1} = x_k + s_k d_k    (1.75)

The step s_k is calculated by an exact or approximate line search method and d_k, k = 0, ..., n, are the mutually conjugate search directions. Starting at an initial point x0, the first direction of move is minus the gradient:

d_0 = -∇f(x_0)    (1.76)

The following directions combine the new gradient with the previous direction:

d_{k+1} = -∇f(x_{k+1}) + β_k d_k    (1.77)
The coefficient β_k can be computed by one of the following expressions.

Fletcher-Reeves formula:

β_k = (∇f(x_{k+1})^T ∇f(x_{k+1})) / (∇f(x_k)^T ∇f(x_k))    (1.78)

Polak-Ribière formula:

β_k = (∇f(x_{k+1})^T (∇f(x_{k+1}) - ∇f(x_k))) / (∇f(x_k)^T ∇f(x_k))    (1.79)

Hestenes-Stiefel formula:

β_k = (∇f(x_{k+1})^T (∇f(x_{k+1}) - ∇f(x_k))) / (d_k^T (∇f(x_{k+1}) - ∇f(x_k)))    (1.80)

repeat
    Compute the step length s_k
    Compute x_{k+1} = x_k + s_k d_k
    Compute β_k using one of the FR (1.78), PR (1.79) or HS (1.80) formulas
    Compute the new direction d_{k+1} = -∇f(x_{k+1}) + β_k d_k
    Set k ← k + 1
until convergence criterion is satisfied
For quadratic functions of the form (1.67), the exact step length along d_k has a closed form:

s_k = -(∇f(x_k)^T d_k) / (d_k^T Q d_k)    (1.82)
Consider the quadratic function:

f(x1, x2) = (1/2) [x1 x2] [2 2; 2 8] [x1; x2] = (1/2) x^T Q x    (1.84)

and the gradient vector is computed as:

∇f(x) = [2 x1 + 2 x2; 2 x1 + 8 x2]    (1.85)
The search starts at x0 = [-2.5, 0]^T, where the first direction is:

d_0 = -∇f(x_0) = -∇f(-2.5, 0) = -[2(-2.5) + 2·0; 2(-2.5) + 8·0] = [5; 5]    (1.86)

The optimal step length for quadratic functions, given by (1.82), is:

s_0 = -(∇f(x_0)^T d_0) / (d_0^T Q d_0) = -([-5 -5] [5; 5]) / ([5 5] [2 2; 2 8] [5; 5]) = 50/350 = 0.1429    (1.87)
The next point is:

x_1 = x_0 + s_0 d_0 = [-2.5; 0] + 0.1429 [5; 5] = [-1.7857; 0.7143]    (1.88)

for which the gradient is:

∇f(x_1) = [-2.1429; 2.1429]    (1.89)
We shall compute the next iterate using the Fletcher-Reeves formula to determine the coefficient β_0:

β_0 = (∇f(x_1)^T ∇f(x_1)) / (∇f(x_0)^T ∇f(x_0)) = ([-2.1429 2.1429] [-2.1429; 2.1429]) / ([-5 -5] [-5; -5]) = 9.1837/50 = 0.1837    (1.90)
The new direction, Q-orthogonal (conjugate) to d_0, is given by:

d_1 = -∇f(x_1) + β_0 d_0 = [2.1429; -2.1429] + 0.1837 [5; 5] = [3.0612; -1.2245]    (1.91)

and the corresponding exact step length is:

s_1 = -(∇f(x_1)^T d_1) / (d_1^T Q d_1) = 0.5833    (1.92)
The new point is the exact minimizer of f:

x_2 = x_1 + s_1 d_1 = [-1.7857; 0.7143] + 0.5833 [3.0612; -1.2245] = [0; 0]    (1.93)
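The two iterations above can be reproduced in a few lines; the equations referenced in the comments are the ones derived in this section.

```python
import numpy as np

Q = np.array([[2.0, 2.0], [2.0, 8.0]])
grad = lambda x: Q @ x                     # gradient (1.85) of f = (1/2) x^T Q x

x = np.array([-2.5, 0.0])
d = -grad(x)                               # first direction (1.76)
for _ in range(2):                         # n = 2 conjugate steps suffice
    g = grad(x)
    s = -(g @ d) / (d @ Q @ d)             # exact step for quadratics (1.82)
    x_new = x + s * d
    beta = (grad(x_new) @ grad(x_new)) / (g @ g)   # Fletcher-Reeves (1.78)
    d = -grad(x_new) + beta * d            # next conjugate direction (1.77)
    x = x_new

# after two steps the exact minimizer x = [0, 0] is reached
```

This confirms the theoretical property that, for an n-dimensional quadratic, the conjugate gradient method terminates in at most n steps.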
[Figure: the minimization paths of the conjugate gradient method and of the method of steepest descent on the contour lines of f(x1, x2), from the same starting point.]
1.2.5 Quasi-Newton methods

Quasi-Newton methods replace the inverse of the Hessian with an approximation B_k that is updated at every iteration:

x_{k+1} = x_k - s_k B_k ∇f(x_k)    (1.95)

The update of the matrix B_k can be calculated using one of the widely used formulas: Davidon-Fletcher-Powell or Broyden-Fletcher-Goldfarb-Shanno. Both of them require the value of the point and the gradient at the current and previous iteration:

Δx_k = x_{k+1} - x_k    (1.96)

ΔG_k = ∇f(x_{k+1}) - ∇f(x_k)    (1.97)

The Davidon-Fletcher-Powell update is:

B_{k+1} = B_k + (Δx_k Δx_k^T)/(Δx_k^T ΔG_k) - (B_k ΔG_k ΔG_k^T B_k)/(ΔG_k^T B_k ΔG_k)    (1.98)

and the Broyden-Fletcher-Goldfarb-Shanno update is:

B_{k+1} = (I - (Δx_k ΔG_k^T)/(ΔG_k^T Δx_k)) B_k (I - (ΔG_k Δx_k^T)/(ΔG_k^T Δx_k)) + (Δx_k Δx_k^T)/(Δx_k^T ΔG_k)    (1.99)

The step length s_k is obtained by a line search:

min_{s_k > 0} f(x_k - s_k B_k ∇f(x_k))    (1.100)
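A minimal sketch of the BFGS update (1.99). The secant condition B_{k+1} ΔG_k = Δx_k, which any quasi-Newton update must satisfy, is checked on a quadratic; the matrix Q and the test points are arbitrary illustrative choices.

```python
import numpy as np

def bfgs_update(B, dx, dg):
    """BFGS update (1.99) of the inverse-Hessian approximation B,
    with dx = x_{k+1} - x_k and dg = grad f(x_{k+1}) - grad f(x_k)."""
    rho = 1.0 / (dg @ dx)
    I = np.eye(len(dx))
    V = I - rho * np.outer(dx, dg)
    return V @ B @ V.T + rho * np.outer(dx, dx)

# on a quadratic f = (1/2) x^T Q x the gradient difference is dg = Q dx
Q = np.array([[2.0, 2.0], [2.0, 8.0]])
x0, x1 = np.array([1.0, 1.0]), np.array([0.5, 0.2])
dx = x1 - x0                         # point difference (1.96)
dg = Q @ x1 - Q @ x0                 # gradient difference (1.97)
B1 = bfgs_update(np.eye(2), dx, dg)  # one update starting from B0 = I
```

The update needs only gradient differences, never second derivatives, which is the main appeal of quasi-Newton methods for large problems.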
1.2.6 Algorithms for minimization without derivatives

1.2.6.1 Nelder-Mead method

For a function of two variables, the Nelder-Mead method works with a triangle whose vertices are labeled B (best), G (good) and W (worst) according to the function values. Figure 1.20 shows an example of an initial triangle and the notations B, G and W according to the function value at the vertices.

[Figure 1.20: an initial triangle with the vertices B, G and W.]
Midpoint of the BG side. The procedure uses the midpoint M of the line segment joining B and G (Figure 1.21). Its coordinates are obtained as:

x_M = (x1 + x2)/2,  y_M = (y1 + y2)/2,  i.e.  M = ((x1 + x2)/2, (y1 + y2)/2)    (1.101)
Reflection. If we move from vertex W towards B or G, along the sides, the function decreases (because in B it has the smallest value); thus smaller values of f lie away from W, on the opposite side of the line between B and G. A point R is determined by reflecting the triangle through the side BG. It is located at the same distance d from M as W is, along the extended line that starts at W and passes through the midpoint of BG (Figure 1.21). The vector formula for the coordinates of R is:

R = M + (M - W) = 2M - W    (1.103)

Expansion. If the function value at R is smaller than the value at B, the move was in the right direction and the minimum may lie a bit further. The extended point E is obtained by moving an additional distance d along the line through M and R:

E = 2R - M    (1.104)

Figure 1.22: The triangle BGW, the reflected point R and the extended point E
Contraction. If the function values at R and W are the same, another point must be tested. Perhaps the function is smaller at M, but we cannot replace W with M because we need a triangle. Consider the two midpoints C1 and C2 of the line segments WM and MR, respectively (Figure 1.23). The point with the smaller function value is called C, and the new triangle is BGC. The points C1 and C2 have the coordinates:

C1 = W + (M - W)/2 = (W + M)/2,  C2 = R - (R - M)/2 = (R + M)/2    (1.105)
Shrinking. If the function value at C is not less than the function value at W, the points G and W must be shrunk toward B. The point G is replaced with M, and W is replaced with S, which is the midpoint of the line segment joining B and W (Figure 1.24). Similar to M, the midpoint S can be calculated from:

S = (B + W)/2    (1.106)
The algorithm must terminate in a finite number of steps or iterations. The stop criterion can be one or all of the following:
- The simplex becomes sufficiently small, or the vertices are close within a given tolerance.
- The function does not change too much between successive iterations.
- The number of iterations or function evaluations exceeds a maximum value allowed.

In two-dimensional cases, the logical details are explained in the algorithm 11 (Mathews and Fink, 2004).
Example 1.5 Use the Nelder-Mead algorithm to find the minimum of:

f(x, y) = (x - 10)^2 + (y - 10)^2    (1.107)

First iteration Start with three vertices conveniently chosen at the origin and on the axes (Figure 1.25):

V1 = (0, 0), V2 = (2, 0), V3 = (0, 6)    (1.108)

The function values at the vertices are f(0, 0) = 200, f(2, 0) = 164 and f(0, 6) = 116. Because f(0, 6) < f(2, 0) < f(0, 0), the vertices will be assigned the names:

B = (0, 6), G = (2, 0), W = (0, 0)    (1.110)

The midpoint of the BG side is:

M = (B + G)/2 = (1, 3)    (1.111)

and the reflected point:

R = 2M - W = (2, 6)    (1.112)

The function value at R, f(R) = f(2, 6) = 80, is smaller than f(B), so the extended point E will be calculated from:

E = 2R - M = (3, 9)    (1.113)

The new triangle is BGE.
Second iteration Since the function value at E is f(3, 9) = 50 < f(B) < f(G), the vertex labels for the current stage are (Figure 1.26):

B = (3, 9), G = (0, 6), W = (2, 0)    (1.114)

The new midpoint of BG and the reflected point R are:

M = (B + G)/2 = (1.5, 7.5),  R = 2M - W = (1, 15)    (1.115)

The value of f at the reflected point is f(1, 15) = 106. It is less than f(G) = f(0, 6) = 116 but greater than f(B) = f(3, 9) = 50. Therefore, the extended point will not be calculated and the new triangle is RGB.
Third iteration Because f(3, 9) < f(1, 15) < f(0, 6), the new labels are:

B = (3, 9), G = (1, 15), W = (0, 6)    (1.116)

The midpoint and the reflected point are:

M = (B + G)/2 = (2, 12),  R = 2M - W = (4, 18)    (1.117)

and the function takes at R the value f(R) = f(4, 18) = 100. This is again less than the value at G, but it is not less than the value at B; thus the point R will be a vertex of the next triangle RGB.
Fourth iteration Because f(3, 9) < f(4, 18) < f(1, 15), the new labels are (Figure 1.28):

B = (3, 9), G = (4, 18), W = (1, 15)    (1.118)

The midpoint and the reflected point are:

M = (B + G)/2 = (3.5, 13.5),  R = 2M - W = (6, 12)    (1.119)

Because f(R) = f(6, 12) = 20 < f(B), we shall build the extended point E:

E = 2R - M = (8.5, 10.5)    (1.120)

The function takes here the value f(E) = f(8.5, 10.5) = 2.5; thus the new triangle is GBE.
The calculations continue until the function values at the vertices are almost equal (within some allowable tolerance ε). It will still take a large number of steps until the procedure reaches the minimum point (10, 10), where the function value is 0, but the steps performed above show that the function is decreasing at each iteration.
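The relabel/reflect/extend logic of the example can be sketched as below. This is a simplified variant of the procedure described above (only the contraction point C1 is tested, and shrinking is omitted), so it is an illustration rather than a full Nelder-Mead implementation.

```python
import numpy as np

f = lambda p: (p[0] - 10.0)**2 + (p[1] - 10.0)**2   # function (1.107)

def bgw_step(V):
    """One step of the two-dimensional B/G/W scheme described above."""
    V = sorted(V, key=f)                   # V[0]=B (best), V[1]=G, V[2]=W (worst)
    B, G, W = (np.array(v, float) for v in V)
    M = (B + G) / 2.0                      # midpoint of BG, relation (1.101)
    R = 2.0 * M - W                        # reflected point (1.103)
    if f(R) < f(B):                        # good move: try the extended point
        E = 2.0 * R - M                    # extended point (1.104)
        new = E if f(E) < f(R) else R
    elif f(R) < f(G):
        new = R                            # keep the reflected point
    else:
        new = (W + M) / 2.0                # contraction point C1 from (1.105)
    return [tuple(B), tuple(G), tuple(new)]

V = [(0.0, 0.0), (2.0, 0.0), (0.0, 6.0)]
V = bgw_step(V)    # first iteration: replaces W = (0, 0) with E = (3, 9)
```

Calling `bgw_step` repeatedly reproduces the sequence of triangles worked out in the example above.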
1.2.6.2 Rosenbrock method

The Rosenbrock method (Rosenbrock, 1960) searches along a set of orthogonal directions that are rotated at each stage. Initially, the directions are the coordinate axes; in the two-variable case:

d1 = [1; 0],  d2 = [0; 1]    (1.121)
Consider an initial point A having the coordinates given by the vector x_A = [x_{1A} x_{2A}]^T. We shall move from A to a point B, a distance s1 in the direction of the x1-axis, or the direction d1. The coordinates of B are calculated as:

[x_{1B}; x_{2B}] = [x_{1A}; x_{2A}] + s1 [1; 0]    (1.122)

or, in vector form:

x_B = x_A + s1 d1    (1.123)

A further move of distance s2 along d2 leads to a point C:

x_C = x_B + s2 d2    (1.124)

The coordinate system is now rotated and the new orthogonal directions of the axes are d'1, d'2. If we move further on to the point D along d'1 and then to E along d'2, with the distances s'1 and s'2, the coordinates are given by similar relations:

x_D = x_C + s'1 d'1    (1.125)

x_E = x_D + s'2 d'2    (1.126)
In general, for n dimensions, let D_i be the total distance moved along the direction d_i during one stage, and define:

A_1 = D_1 d_1 + D_2 d_2 + ... + D_n d_n    (1.127)

A_2 = D_2 d_2 + ... + D_n d_n    (1.128)

...

A_n = D_n d_n    (1.129)

Thus A_i is the vector joining the initial and final points obtained by use of the vectors d_k, k = i, ..., n. Orthogonal unit vectors of the new coordinate system (d'1, d'2, ..., d'n) are obtained from A_i by the Gram-Schmidt orthogonalization procedure:

B_1 = A_1,  d'1 = B_1/||B_1||    (1.131)

B_2 = A_2 - (A_2^T d'1) d'1,  d'2 = B_2/||B_2||    (1.132)

...

B_n = A_n - Σ_{j=1}^{n-1} (A_n^T d'j) d'j,  d'n = B_n/||B_n||    (1.133)

The new set of directions was calculated such that d'1 lies along the direction of fastest decrease, d'2 along the best direction which can be found normal to d'1, and so on.
Example 1.7 Using the Rosenbrock method, minimize the banana function (or Rosenbrock function):

f(x1, x2) = 100 (x2 - x1^2)^2 + (1 - x1)^2    (1.135)

[Figure: the path of the Rosenbrock method on the contour lines of the banana function, from the starting point to the minimum at (1, 1).]
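A simplified sketch of Rosenbrock's method for this example is given below. The step-control constants (expand by 3 on success, shrink and reverse by 0.5 on failure) follow common practice for this method, while the stage and sweep counts and the initial step are arbitrary illustrative choices.

```python
import numpy as np

banana = lambda x: 100.0 * (x[1] - x[0]**2)**2 + (1.0 - x[0])**2

def rosenbrock_search(f, x0, step=0.1, alpha=3.0, beta=0.5, n_stages=60):
    """Simplified Rosenbrock method: trial steps along the current
    orthogonal directions, then a Gram-Schmidt re-orientation of the
    directions along the overall progress of the stage."""
    n = len(x0)
    x = np.array(x0, float)
    D = np.eye(n)                       # current orthogonal directions
    s = np.full(n, step)                # per-direction step lengths
    for _ in range(n_stages):
        x_start = x.copy()
        for _ in range(30):             # trial sweeps within the stage
            for i in range(n):
                trial = x + s[i] * D[i]
                if f(trial) < f(x):
                    x = trial
                    s[i] *= alpha       # success: expand the step
                else:
                    s[i] *= -beta       # failure: shrink and reverse
        A1 = x - x_start                # overall move of this stage
        if np.linalg.norm(A1) > 1e-12:  # first new axis along A1, as in (1.131)
            d1 = A1 / np.linalg.norm(A1)
            d2 = np.array([-d1[1], d1[0]])   # orthogonal complement in 2-D
            D = np.stack([d1, d2])
        s = np.full(n, step)            # reset step lengths for the next stage
    return x

x = rosenbrock_search(banana, [-1.0, 1.0])
```

Because a trial point is accepted only when it improves f, the function value decreases monotonically from f(-1, 1) = 4 as the rotated directions track the curved valley of the banana function.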
Bibliography
Mathews, J. H. and Fink, K. K. (2004). Numerical Methods Using Matlab. Prentice-Hall Inc., Upper Saddle
River, New Jersey, USA, 4th edition.
Nocedal, J. and Wright, S. J. (1999). Numerical Optimization. Springer series in operations research. Springer-Verlag.
Pierre, D. A. (1986). Optimization Theory with Applications. Courier Dover Publications.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (2007). Numerical Recipes: The Art of
Scientific Computing. Cambridge University Press.
Rosenbrock, H. (1960). An automatic method for finding the greatest or least value of a function. The Computer Journal, 3(3):175-184.
Sinha, P. (2007). From http://www.optimization-online.org/.
Snyman, J. A. (2005). Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and
Classical and New Gradient-based Algorithms. Springer.
Weisstein, E. (2004). Directional derivative. From MathWorld, A Wolfram Web Resource. http://mathworld.wolfram.com/DirectionalDerivative.html.