
Conjugate Gradients

Although the method of steepest ascent is a well-known procedure based upon a
very plausible idea, we have seen that the method is computationally not very
effective. We therefore turn our attention to some more advanced methods in
which the path taken is related to, though not identical with, the gradient
direction.
The first of these is known as the method of conjugate gradients. This method
was originally devised by Hestenes and Stiefel [1952] for solving the system of
linear algebraic equations

(3.2.20)

AX = B,
where A is a real, symmetric, positive-definite matrix, by minimizing the
corresponding quadratic form

(3.2.21)

y = \frac{1}{2} X^T A X - B^T X.
The equivalence of these two problems is established by the relationship

(3.2.22)

\nabla y = A X - B = 0.
Hence the vector X_0 that minimizes Equation (3.2.21) will also be the solution
vector of Equation (3.2.20). We shall proceed to develop the method based
upon the minimization of the quadratic form given by Equation (3.2.21). The
extension of the algorithm to the optimization of a more general objective
function will then be discussed.
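This equivalence is easy to verify numerically. The following sketch uses an arbitrary illustrative symmetric positive-definite matrix and right-hand side (assumed data, not from the text) and checks that the gradient of the quadratic form vanishes at the solution of AX = B:

```python
import numpy as np

# Illustrative data (assumed): A must be real, symmetric, positive-definite.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
B = np.array([1.0, 2.0, 3.0])

def y(X):
    """Quadratic form of Equation (3.2.21): y = (1/2) X^T A X - B^T X."""
    return 0.5 * X @ A @ X - B @ X

def grad_y(X):
    """Gradient, Equation (3.2.22): grad y = A X - B."""
    return A @ X - B

X_star = np.linalg.solve(A, B)           # solution of A X = B
print(np.linalg.norm(grad_y(X_star)))    # essentially zero: X_star minimizes y
```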
The idea behind the conjugate-gradient procedure is similar to that of
steepest descent in that a sequence of one-dimensional searches is carried out
in directions which are determined by the partial derivatives of the objective
function. Unlike the method of steepest descent, however, the search vectors are not
equal to the negative gradient vectors; rather, a sequence of search vectors is
determined in such a manner that each search vector is conjugate to those that
precede it. The algorithm is guaranteed to minimize a quadratic function of n
independent variables with no more than n iterations, a condition known as
quadratic convergence.
Proceeding from an arbitrary initial search point X_0, we locate a sequence
of points that are successively closer to the minimum as follows:

(3.2.23)

X_{i+1} = X_i + \alpha_i P_i,

where \alpha_i is a positive scalar that defines the distance between X_i and X_{i+1} along the
search vector P_i. Notice that the minimum along P_i will occur where P_i is
tangent to the family of contours given by Equation (3.2.21). Stated differently,


the gradient of y at X_{i+1} will be normal to P_i, i.e.,

(3.2.24)

\nabla y_{i+1}^T P_i = P_i^T \nabla y_{i+1} = 0.

If we apply Equation (3.2.23) recursively, we obtain

(3.2.25)

X_k = X_i + \sum_{j=i}^{k-1} \alpha_j P_j.

In particular,

(3.2.26)

X_n = X_{i+1} + \sum_{j=i+1}^{n-1} \alpha_j P_j.

Subtracting X_{i+1} from each side of Equation (3.2.26) and premultiplying by A,
we obtain

(3.2.27)

A(X_n - X_{i+1}) = \sum_{j=i+1}^{n-1} \alpha_j A P_j.

Forming the gradient of Equation (3.2.21), however, we see that

(3.2.28)

A(X_n - X_{i+1}) = \nabla y_n - \nabla y_{i+1}.

Hence Equation (3.2.27) becomes

(3.2.29)

\nabla y_n = \nabla y_{i+1} + \sum_{j=i+1}^{n-1} \alpha_j A P_j.

As a special case of the above expression we have

(3.2.30)

\nabla y_{i+1} = \nabla y_i + \alpha_i A P_i.
Let us now develop a criterion for defining the search vectors P_j. Premultiplying
Equation (3.2.29) by P_i^T gives

(3.2.31)

P_i^T \nabla y_n = P_i^T \nabla y_{i+1} + \sum_{j=i+1}^{n-1} \alpha_j P_i^T A P_j.

The first term on the right-hand side vanishes because of Equation (3.2.24). If
we now choose the P_j such that

(3.2.32)

P_i^T A P_j = 0

for i \neq j, then the summation term in Equation (3.2.31) will also vanish, so that

(3.2.33)

P_i^T \nabla y_n = 0.
The condition expressed by Equation (3.2.32) is known as A-conjugacy, and the
set of vectors P_i is said to be A-conjugate.
It can easily be shown that the A-conjugacy condition is sufficient to
ensure that the vectors P_0, P_1, \ldots, P_{n-1} are linearly independent. Therefore, the
vector set P_i is a basis in the n-dimensional vector space, so that \nabla y_n can be
expressed in terms of one or more of the P_i. Thus Equation (3.2.33) will be
nonzero for at least one i unless \nabla y_n = 0. We see, then, that the condition that
the vectors P_i be A-conjugate causes \nabla y_n to vanish identically. Since this
condition exists only where the quadratic form is minimized, we conclude that
the quadratic form is thus minimized after no more than n one-dimensional
searches in the directions P_0, P_1, \ldots, P_{n-1}.
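The linear-independence claim can be illustrated with a small numerical sketch (random illustrative data, not from the text): A-orthogonalizing the standard basis by a Gram-Schmidt process in the inner product u^T A v yields an A-conjugate set, which is indeed a basis:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)        # a random symmetric positive-definite matrix

# Gram-Schmidt in the A inner product: produce vectors with P_i^T A P_j = 0, i != j.
P = []
for e in np.eye(n):
    v = e.copy()
    for p in P:
        v -= (p @ A @ e) / (p @ A @ p) * p
    P.append(v)
P = np.array(P)

G = P @ A @ P.T                    # Gram matrix in the A inner product
off_diag = G - np.diag(np.diag(G))
print(np.max(np.abs(off_diag)))    # essentially zero: the set is A-conjugate
print(np.linalg.matrix_rank(P))    # equals n: the vectors are linearly independent
```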
We still have not specified exactly how the P_i are chosen. Let us
arbitrarily let P_0 = -\nabla y_0, and then let

(3.2.34)

P_{i+1} = -\nabla y_{i+1} + \beta_i P_i,

where the \beta_i are positive scalars that must be determined. This choice of
P_i can be shown to satisfy the A-conjugacy condition expressed by Equation
(3.2.32). For further development of this point the reader is referred to
Beckman [1960].
From Equation (3.2.32), we can write

(3.2.35)

P_i^T A P_{i+1} = 0.

Combining this result with Equation (3.2.34) gives

(3.2.36)

P_i^T A (-\nabla y_{i+1} + \beta_i P_i) = 0,

or

(3.2.37)

\beta_i = \frac{P_i^T A \nabla y_{i+1}}{P_i^T A P_i}.
Equation (3.2.37) can be used to compute the \beta_i if we so desire. In a practical
sense, however, notice that Equation (3.2.37) requires an explicit knowledge of
the matrix A. If a large problem (i.e., a problem of high dimensionality) is
being solved on a digital computer, then the n \times n matrix A must be stored in the
computer. This can occupy a significant portion of core storage and hence limit
the size of a problem that can be solved by the conjugate-gradient method.
For this reason we will not use Equation (3.2.37); instead we shall seek an
expression for \beta_i that does not contain the matrix A.
From Equation (3.2.30), we can write

(3.2.38)

P_i^T \nabla y_{i+1} = P_i^T \nabla y_i + \alpha_i P_i^T A P_i.

From Equation (3.2.24), however, we see that the left-hand side of Equation
(3.2.38) vanishes, so that

(3.2.39)

\alpha_i = -\frac{P_i^T \nabla y_i}{P_i^T A P_i}.

Now let us premultiply Equation (3.2.34) by \nabla y_{i+1}^T. This gives

(3.2.40)

\nabla y_{i+1}^T P_{i+1} = -\nabla y_{i+1}^T \nabla y_{i+1} + \beta_i \nabla y_{i+1}^T P_i.

Again referring to Equation (3.2.24), we see that the last term in Equation (3.2.40)
vanishes. Hence,

(3.2.41)

\nabla y_i^T P_i = -\nabla y_i^T \nabla y_i

(the index has been lowered by one; the relation also holds for i = 0, since
P_0 = -\nabla y_0). Substituting this result into Equation (3.2.39) gives

(3.2.42)

\alpha_i = \frac{\nabla y_i^T \nabla y_i}{P_i^T A P_i}.
Let us again make use of Equation (3.2.30) to write

(3.2.43)

\nabla y_{i+1} - \nabla y_i = \alpha_i A P_i,

which can be rearranged to give

(3.2.44)

A P_i = \frac{1}{\alpha_i}(\nabla y_{i+1} - \nabla y_i).

Recall that A is assumed to be symmetric; hence

(3.2.45)

P_i^T A = (A P_i)^T = \frac{1}{\alpha_i}(\nabla y_{i+1} - \nabla y_i)^T,

so that

(3.2.46)

\beta_i = \frac{(\nabla y_{i+1} - \nabla y_i)^T \nabla y_{i+1}}{\alpha_i P_i^T A P_i}.

Utilizing Equation (3.2.30) one more time, we can write

(3.2.47)

\nabla y_{i+1}^T P_{i-1} = \nabla y_i^T P_{i-1} + \alpha_i P_i^T A P_{i-1} = 0

because of Equations (3.2.24) and (3.2.32). From Equation (3.2.34), however,
we have

(3.2.48)

\nabla y_i = -P_i + \beta_{i-1} P_{i-1}.

Combining this result with Equation (3.2.47) gives

(3.2.49)

\nabla y_{i+1}^T \nabla y_i = -\nabla y_{i+1}^T P_i + \beta_{i-1} \nabla y_{i+1}^T P_{i-1} = -\nabla y_{i+1}^T P_i.

Since \nabla y_{i+1}^T P_i vanishes because of Equation (3.2.24), we conclude that

(3.2.50)

\nabla y_{i+1}^T \nabla y_i = 0.

Hence Equation (3.2.46) becomes

(3.2.51)

\beta_i = \frac{\nabla y_{i+1}^T \nabla y_{i+1}}{\alpha_i P_i^T A P_i}.

A simplified expression for \beta_i can now be obtained by substituting
Equation (3.2.42) into Equation (3.2.51). This yields

(3.2.52)

\beta_i = \frac{\nabla y_{i+1}^T \nabla y_{i+1}}{\nabla y_i^T \nabla y_i},

which does not require explicit knowledge of the matrix A.
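The recursions just derived can be exercised end to end on a small randomly generated positive-definite system (illustrative data, not from the text). The sketch below checks both the A-conjugacy of the resulting search vectors and convergence in n steps:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)            # symmetric positive-definite
B = rng.standard_normal(n)

X = np.zeros(n)
g = A @ X - B                          # gradient, Eq. (3.2.22)
P = -g                                 # P0 = -grad y0
directions = []
for _ in range(n):
    alpha = (g @ g) / (P @ A @ P)      # Eq. (3.2.42)
    X = X + alpha * P                  # Eq. (3.2.23)
    g_new = A @ X - B
    beta = (g_new @ g_new) / (g @ g)   # Eq. (3.2.52): no matrix A needed
    directions.append(P)
    P = -g_new + beta * P              # Eq. (3.2.34)
    g = g_new

# Search vectors satisfy Eq. (3.2.32), and the minimum is found in n steps:
conj = max(abs(directions[i] @ A @ directions[j])
           for i in range(n) for j in range(i + 1, n))
print(conj)                            # essentially zero
print(np.allclose(X, np.linalg.solve(A, B)))   # True
```

Note that only Equation (3.2.42) uses A explicitly here, and only because the exact line minimum of a quadratic is available in closed form; for a general objective function that step is replaced by a numerical one-dimensional search.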

Several other mathematical relationships can be shown to exist among
the various vectors that appear in the algorithm. For a detailed account of
these relationships, as well as a more rigorous exposition of the conjugate-gradient
method, the reader is referred to a paper by Beckman [1960].
Let us now summarize the algorithm, including a generalization to the
maximization problem. Choose an arbitrary point X_0 and evaluate the gradient
vector \nabla y_0. Let P_0 = \pm\nabla y_0 (the plus sign corresponding to a maximization
problem, the minus to a minimization). Obtain

(3.2.23)

X_{i+1} = X_i + \alpha_i P_i

as the point on the P_i vector where the objective function is extremized. This
point is located by conducting a one-dimensional search along P_i. Determine
\nabla y_{i+1}, the gradient vector, at X_{i+1}. Compute \beta_i in accordance with Equation
(3.2.52), and determine a new search vector

(3.2.53)

P_{i+1} = \pm\nabla y_{i+1} + \beta_i P_i.

Again the plus sign is chosen for a maximization problem and the minus sign
corresponds to a minimization.
Although the method is supposed to find the optimum of a quadratic
form with no more than n iterations, it is in fact rather sensitive to roundoff
error. Fletcher and Reeves [1964] suggest that the computation be restarted
with P_i = \pm\nabla y_i (i.e., with a pure gradient step) after every n + 1 iterations as
an effective way to minimize the problem of cumulative roundoff.
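The summarized algorithm (minimization case, with the periodic restart) can be sketched as follows. The golden-section line search, the bracket [0, a_max], and the test function are illustrative assumptions, not from the text:

```python
import numpy as np

def golden_section(phi, a, b, tol=1e-9):
    """One-dimensional search: minimize a unimodal function phi on [a, b]."""
    r = (np.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = phi(x1), phi(x2)
    while b - a > tol:
        if f1 < f2:
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = phi(x1)
        else:
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = phi(x2)
    return 0.5 * (a + b)

def fletcher_reeves(f, grad, X0, a_max=1.0, iters=60, eps=1e-8):
    """Conjugate gradients (minimization), restarted after every n + 1 steps."""
    X = np.asarray(X0, dtype=float)
    n = X.size
    g = grad(X)
    P = -g                                   # P0 = -grad y0
    for k in range(iters):
        alpha = golden_section(lambda a: f(X + a * P), 0.0, a_max)
        X = X + alpha * P                    # Eq. (3.2.23)
        g_new = grad(X)
        if np.max(np.abs(g_new)) <= eps:     # all partials small: converged
            break
        if (k + 1) % (n + 1) == 0:
            P = -g_new                       # periodic restart against roundoff
        else:
            beta = (g_new @ g_new) / (g @ g) # Eq. (3.2.52)
            P = -g_new + beta * P            # Eq. (3.2.53), minus sign
        g = g_new
    return X

# Illustrative use (assumed quadratic test function with minimum at (1, 2)):
f = lambda X: (X[0] - 1.0) ** 2 + 10.0 * (X[1] - 2.0) ** 2
df = lambda X: np.array([2.0 * (X[0] - 1.0), 20.0 * (X[1] - 2.0)])
X_min = fletcher_reeves(f, df, [0.0, 0.0])
print(X_min)                                 # close to the minimizer (1, 2)
```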
Our discussion thus far has been concerned exclusively with the
optimization of quadratic forms. We can see that the method is applicable to a
broader class of functions, however, by expanding the objective function in a
Taylor series:

(3.2.54)

y(X) = y(X_0) + \nabla y^T\big|_{X_0}(X - X_0) + \frac{1}{2}(X - X_0)^T H (X - X_0) + \cdots,

where H is the Hessian matrix consisting of the second partial derivatives of y
evaluated at X_0; i.e.,

(3.2.55)

h_{ij} = \frac{\partial^2 y}{\partial x_i \, \partial x_j}\bigg|_{X_0}.

Notice that H is a real, symmetric matrix, providing the objective function is not
linear.

If X_0 represents an extremum of y(X), then

(3.2.56)

\nabla y\big|_{X_0} = 0,

and Equation (3.2.54) becomes

(3.2.57)

y(X) - y(X_0) = \frac{1}{2}(X - X_0)^T H (X - X_0).

Hence the quantity y(X) - y(X_0) can be expressed by Equation (3.2.57), providing
X is sufficiently close to X_0 so that the higher-order terms in the Taylor-series
expansion vanish. In the neighborhood of an optimum, then, we would expect
the conjugate-gradient method to perform quite efficiently when applied to any
continuous and differentiable nonlinear function.
Implementation of the method does not require explicit knowledge of the matrix
H. The first partial derivatives of the objective function must be known, however,
in order to determine the gradient vector \nabla y. These partial derivatives can be
determined by finite differences. Unfortunately, the method is susceptible to
the accumulation of roundoff errors, a shortcoming that becomes particularly
troublesome when the partial derivatives are approximated numerically.
Therefore, the use of a restart after every n + 1 iterations is strongly advised.
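A finite-difference approximation of the kind alluded to here might look as follows (a sketch; the central-difference formula and the step size h are assumptions, with h trading truncation error against roundoff error):

```python
import numpy as np

def grad_fd(f, X, h=1e-6):
    """Approximate each partial derivative of f at X by a central difference."""
    X = np.asarray(X, dtype=float)
    g = np.zeros_like(X)
    for i in range(X.size):
        e = np.zeros_like(X)
        e[i] = h
        g[i] = (f(X + e) - f(X - e)) / (2.0 * h)
    return g

# Illustrative check against a function with a known gradient:
f = lambda X: X[0] ** 2 + 9.0 * X[1] ** 2
print(grad_fd(f, [1.0, 1.0]))      # close to the analytic gradient (2, 18)
```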
An appreciation for quadratically convergent methods can be obtained from
Figure 3.14, where a quadratic function is minimized by the method of steepest
descent and by a conjugate-gradient search. The simple gradient (steepest-descent)
procedure, which is shown by the solid lines, tends to zigzag back and
forth near the optimum, requiring many search iterations. This type of
oscillation near an optimum is very characteristic of steepest-descent
techniques. On the other hand, the conjugate-gradient procedure, shown by
dashed lines, finds the minimum of the function in only two iterations, thus
avoiding the problem of oscillations near an optimum. Quadratically
convergent gradient methods are sometimes referred to as second-order
methods (cf. Crockett and Chernoff [1955]).

Figure 3.14. Comparison of Behavior of Steepest Descent and Conjugate
Gradient Search in Minimization of Quadratic Function

EXAMPLE 3.2.4

Resolve Example 3.2.1 using the method of conjugate gradients, with x_{1,0} = 1
and x_{2,0} = 1 as an initial point.

In vector notation,

X_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},

and, from Example 3.2.1,

\nabla y\big|_{X_0} = -\begin{bmatrix} 4 \\ 72 \end{bmatrix},

so that P_0 = -\nabla y|_{X_0}. From Equation (3.2.23), we have

X_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} + \alpha_0 \begin{bmatrix} 4 \\ 72 \end{bmatrix}, \qquad \alpha_0 > 0.

The objective function can be expressed as a function of \alpha_0 as follows:

y(\alpha_0) = (4\alpha_0 - 2)^2 + 9(72\alpha_0 - 4)^2.

Minimizing y(\alpha_0), we obtain y = 3.1594 at \alpha_0 = 0.0557. Hence

X_1 = \begin{bmatrix} 1.223 \\ 5.011 \end{bmatrix}.

The gradient can now be determined as

\nabla y\big|_{X_1} = \begin{bmatrix} -3.554 \\ 0.197 \end{bmatrix},

and \beta_0 can be computed as

\beta_0 = \frac{(3.554)^2 + (0.197)^2}{(4)^2 + (72)^2} = 0.00244.

Making use of Equations (3.2.34) and (3.2.23), we obtain

P_1 = \begin{bmatrix} 3.554 \\ -0.197 \end{bmatrix} + 0.00244\begin{bmatrix} 4 \\ 72 \end{bmatrix} = \begin{bmatrix} 3.564 \\ -0.022 \end{bmatrix}.

Solving for \alpha_1 as before [i.e., expressing y(X_2) as a function of \alpha_1 and
minimizing with respect to \alpha_1] yields y = 5.91 \times 10^{-10} at \alpha_1 = 0.4986. Hence

X_2 = \begin{bmatrix} 3.0000 \\ 5.0000 \end{bmatrix},

which is, for all practical purposes, the desired result.


Notice that this two-dimensional function has been minimized with the
determination of only two points. This is, of course, to be expected, since the
objective function is a quadratic and the conjugate-gradient algorithm is
quadratically convergent.
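Example 3.2.1's objective function is not restated in this excerpt; from the gradient (-4, -72) at (1, 1) and the minimum at (3, 5) quoted above, it appears to be y = (x_1 - 3)^2 + 9(x_2 - 5)^2. Under that assumption the example can be reproduced numerically, using the closed-form line minimum of Equation (3.2.39) in place of an explicit one-dimensional search:

```python
import numpy as np

# Assumed reconstruction of Example 3.2.1: y = (x1 - 3)^2 + 9 (x2 - 5)^2,
# i.e. A = diag(2, 18) and B = (6, 90) in the notation of Equation (3.2.21).
A = np.diag([2.0, 18.0])
B = np.array([6.0, 90.0])

X = np.array([1.0, 1.0])
g = A @ X - B                         # = (-4, -72), as quoted in the example
P = -g                                # P0 = -grad y0
for _ in range(2):                    # two variables: two searches suffice
    alpha = -(P @ g) / (P @ A @ P)    # exact line minimum, Eq. (3.2.39)
    X = X + alpha * P                 # Eq. (3.2.23)
    g_new = A @ X - B
    beta = (g_new @ g_new) / (g @ g)  # Eq. (3.2.52)
    P = -g_new + beta * P             # Eq. (3.2.34)
    g = g_new

print(X)                              # approximately [3. 5.]
```

Under this reconstruction the intermediate quantities come out as alpha_0 ≈ 0.0557, beta_0 ≈ 0.00244, and alpha_1 ≈ 0.4986.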
EXAMPLE 3.2.5

Resolve Example 3.2.3 using the method of conjugate gradients and a
golden-ratio search. Let \epsilon = 10^{-5} be the largest permissible magnitude of each of
the partial derivatives for which the problem will be considered to have converged
to an optimum. The partial derivatives are to be evaluated numerically.

Some selected results of this problem, obtained with a digital computer,
are given in Table 3.8.
Table 3.8. RESULTS OF CONJUGATE GRADIENT OPTIMIZATION

  n       x_1,n       x_2,n       y_n
  0       0.5000      0.5000      5.7899
  1       0.1800      0.7716      7.1772 x 10^-3
  2       0.1703      0.7618      3.8202 x 10^-3
  3       0.1922      0.7264      1.2420 x 10^-3
  5       0.1977      0.6984      2.9656 x 10^-4
 10       0.1998      0.6685      4.0192 x 10^-7
 16       0.19998     0.66683     5.0831 x 10^-12

The computation was terminated after 16 iterations, since each partial
derivative became less than or equal to 10^{-5} in magnitude. Notice that this
problem required less than one tenth as many iterations as optimal steepest
descent for a comparable solution (cf. Example 3.2.3). The computational effort per
iteration is slightly greater than with optimal steepest descent.
Variable-metric algorithm

The variable-metric algorithm is another sophisticated gradient
technique, originally devised by Davidon [1959] to minimize a quadratic
function with no more than n steps. The method was later modified and
improved upon by Fletcher and Powell [1964]. This later version of the algorithm is an
extremely powerful gradient method for extremizing any unconstrained,
continuous, and differentiable objective function. We shall present the Fletcher
and Powell version of the algorithm and discuss its advantages and
disadvantages.
Like the method of conjugate gradients, the variable-metric algorithm is
designed to extremize the function

(3.2.58)

y = y(x_1, x_2, \ldots, x_n)

by conducting a sequence of one-dimensional searches. These searches begin
at some arbitrary point X_0 and proceed to locate a succession of improved
points in accordance with

(3.2.59)

X_{i+1} = X_i + \alpha_i P_i,

where \alpha_i is some positive constant.

Now let us recall a few significant relationships that relate to the
minimization of a quadratic form. First, each X_{i+1} represents the position of an
extremum along P_i. Hence the gradient of the objective function will be
orthogonal to P_i at X_{i+1}; i.e.,

(3.2.60)

\nabla y_{i+1}^T P_i = 0,

where

(3.2.61)

\nabla y_{i+1} = \nabla y(X)\big|_{X = X_{i+1}}.

[The validity of Equation (3.2.60) is, of course, not restricted to a quadratic
form.] Also, we have established that

(3.2.62)

\nabla y_{i+1} = \nabla y_i + \alpha_i A P_i.

Finally, we have shown that an extremization algorithm will be quadratically
convergent, providing the P_i are A-conjugate; i.e.,

(3.2.63)

P_i^T A P_j = 0, \qquad i \neq j.

Any algorithm that satisfies Equation (3.2.63) will, apart from numerical
roundoff errors, minimize a quadratic form with no more than n one-dimensional
searches in the P_i directions, i = 0, 1, \ldots, n - 1.
Thus far the description of the variable-metric algorithm parallels that of
the method of conjugate gradients. The algorithms differ in the manner in
which the search vectors P_1, P_2, \ldots, P_{n-1} are chosen.

In the variable-metric algorithm the search vectors are chosen as