Chapter Two: Numerical Optimization Using C1 and C2 Type Objects
C1 and C2 spaces are continuous vector spaces that are differentiable up to order 1 and order 2, respectively. The C1 and C2 type classes enable users of the VectorSpace C++ Library to work with numerically differentiable objects. Applications in numerical optimization, constrained or unconstrained, can be expressed easily with the C1 and C2 types. The C++ programs using the VectorSpace C++ Library in this chapter are projects in the project workspace file Cn.dsw under the directory vs\ex\Cn.
1. e.g., p.90 in W.L. Burke, 1985, Applied differential geometry, Cambridge University Press, Cambridge, U.K.
In the following we define the algebra of the C1 type. For x, y, and z, Tangent_Bundle objects of C1 type, with an abstract binary operator o, as in z = x o y, TABLE 2.1 summarizes the algebra of the four concrete basic operators (each taking the place of the abstract operator o).
Operator     base point    tangent vector
z = x + y    z = x + y     dz = dx + dy
z = x - y    z = x - y     dz = dx - dy
z = x y      z = x y       dz = y dx + x dy   (Leibniz rule)
z = x / y    z = x / y     dz = (y dx - x dy) / y^2

TABLE 2.1 Four basic binary operators for the algebra of C1 type objects.
The derivative for the multiplication operator in the third row simply follows the Leibniz rule of calculus: dz = d(x y) = y dx + x dy. The Leibniz rule can also be applied to the division operator: d(x / y) = d(x * (1/y)) = (y dx - x dy) / y^2. Other operators and transcendental functions can be defined accordingly, as in calculus.
It is now clear how a C1 type object works as a differentiable object. Since we keep track of the numerical values of the base point, p, and the tangent vector, w, of a C1 type object throughout all kinds of operations, the numerical values of the base point and tangent vector of intermediate (temporary) objects are always available. Inquiring the numerical value of the tangent vector, w, of a C1 type object gives the derivative information we need.
From a reverse-engineering point of view, should we ever want to do away with the C1 type object, this model has the advantage that its computing algorithm is quite compatible with traditional FORTRAN or C programming. Symbolic languages, on the other hand, process the intermediate analytical expression by looking up a dictionary, and defer the evaluation of actual numerical values until the user explicitly requests them; their computing algorithm is therefore completely different from that of FORTRAN or C. In retrospect, we note that the deferred-evaluation approach is known to have the advantage of fast response time in an interactive environment, since unnecessary evaluations are sometimes avoided.
C1 Type Objects
A C1 type Tangent_Bundle object has a C0* type Scalar, u, for its base point, p, and a Scalar/Vector, du, for its tangent vector, w (a Scalar when the spatial dimension = 1, and a Vector when the spatial dimension > 1). These two private data members in turn refer to double* type v and dv in physical memory, respectively. The constructor and the destructor of the C1 type Tangent_Bundle object encapsulate the details of low-level memory management for the double* type v and dv. In other words, both the base point and the tangent vector are represented at two levels: at the higher level as C0* type Scalar (u) and Scalar/Vector (du), and at the lower level as double* type v and dv. This dual abstraction in the Tangent_Bundle class facilitates (1) swift memory management at the lower level, and (2) mathematical abstraction at the higher level for the C1 type Tangent_Bundle class, as shown in TABLE 2.2.
Mathematics         Higher Level               Physical Memory
base point p        C0* of Scalar u            double* v
tangent vector w    C0* of Scalar/Vector du    double* dv

TABLE 2.2 Dual abstraction of a C1 type Tangent_Bundle class.
Constructors
A dedicated constructor for a C1 type Tangent_Bundle object can be written as (project: c1_examples)
1  C1 x(0.0);
2  cout << ((C0) x) << endl;
3  cout << d(x) << endl;
4  ((C0) x) = 3.0;
5  cout << ((C0) x) << endl;
The double constant 0.0 in line 1 is the argument passed to the dedicated constructor and is assigned as the value of the base point. The Tangent_Bundle so constructed has a default spatial dimension of 1, and its derivative (tangent vector) value defaults to du = 1.0. The C0 converter C1::operator C0(), cast on x in line 2, is used to retrieve the value of the base point of x. The free function d(const C1&) in line 3 can be used to retrieve the value of the derivative (tangent vector). Both the casting operator and the derivative function can be used as l-values, placed on the left-hand side to assign a value. The reason for the default value du = 1.0 becomes evident when we consider using x as a variable. For example, use x as a variable to define a function f1 (project: c1_examples)
C1 x(0.0),
   f = 2.0 * x * (sin(x)+1.0);
cout << ((C0) f) << endl;
cout << d(f) << endl;
1. Example taken from K.E. Gorlen, S.M. Orlow, and P.S. Plexico, 1991, Data Abstraction and Object-Oriented Programming in C++, John Wiley & Sons Ltd, pp. 92-93.
The function (or dependent variable) f is defined with the (independent) variable x as its parameter. A different view of the default value du = 1.0 is that if x is to serve as a variable, the derivative of x, dx/dx, equals 1.0 by ordinary differentiation in calculus. A dedicated constructor such as C1::C1(const double&), which constructs an object that can be used as a variable to define a more complicated function, is called a variable (dedicated) constructor.
When the spatial dimension is not equal to 1, we can use the following constructor (not a variable dedicated
constructor)
C1 y(3.0, 3);               // (project: c1_examples)
cout << ((C0) y) << endl;   // 3.0
cout << d(y) << endl;       // {0.0, 0.0, 0.0}T
The first argument of this dedicated constructor is a const double& that specifies the value of the base point, and the second argument is an int that gives the spatial dimension, i.e., the dimension of the tangent vector, w. The default value of the tangent vector has all of its components set to 0.0.
The constant strings for the Tangent_Bundle virtual constructor (use macro definition TANGENT_BUNDLE) and the autonomous virtual constructor are shown in the following box.

virtual constructor string              remark                                           priority
by reference
  C1&                                   C1 type Tangent_Bundle                           1
  C1*                                   pointer to C1 type Tangent_Bundle                2
  double*, double*                      base point, tangent vector, (spatial dim. = 1)   3
  double*, double*, int                 base point, tangent vector, spatial dim.         4
by value
  int                                   spatial dim.                                     5
  const double&, const double&          base point, tangent vector, (spatial dim. = 1)   6
  const C0&, const C0&                  base point, tangent vector, (spatial dim. = 1)   7
  const double*, const double*          base point, tangent vector, (spatial dim. = 1)   8
  const C0*, const C0*                  base point, tangent vector, (spatial dim. = 1)   9
  const double*, const double*, int     base point, tangent vector, spatial dim.         10
  const C0*, const C0*, int             base point, tangent vector, spatial dim.         11
  const C1&                             C1 type Tangent_Bundle                           12
  const C1*                             pointer to C1 type Tangent_Bundle                13
operator or function                      remark
symbolic operators
  C0& operator &= ( )                     assignment by reference
  C0& operator = ( )                      assignment by value
  operator C0()                           casting operator; retrieve base point
arithmetic operators
  C0 operator + ( ) const
  C0 operator - ( ) const
  C0 operator + (const C0&) const
  C0 operator - (const C0&) const
  C0 operator * (const C0&) const
  C0 operator / (const C0&) const
  C0& operator += (const C0&)
  C0& operator -= (const C0&)
  C0& operator *= (const C0&)
  C0& operator /= (const C0&)
logic operators (TRUE == 1, FALSE == 0)
  int operator == (const C0&) const       equal
  int operator != (const C0&) const       not equal
  int operator >= (const C0&) const       greater or equal
  int operator <= (const C0&) const       less or equal
  int operator > (const C0&) const        greater
  int operator < (const C0&) const        less
functions
  int col_length() const                  spatial dimension
  C0& d()                                 the first derivative; retrieve tangent vector
  C0 pow(int) const                       power (applied to each element of the Vector)
  C0 exp(const C0&) const                 exponent (applied to each element of the Vector)
  C0 log(const C0&) const                 log (applied to each element of the Vector)
  C0 sin(const C0&) const                 sin (applied to each element of the Vector)
  C0 cos(const C0&) const                 cos (applied to each element of the Vector)

Partial listing of C1 type Tangent_Bundle class arithmetic operators, logic operators and functions.
A C1 type Vector_of_Tangent_Bundle object is represented by a C0* type Vector object for its base point (length = m), and a C0* type Matrix object (row-length = m, column-length = n) for its tangent vector. The Matrix object is actually the Jacobian matrix of the form (for example, m = n = 3)

  df_i/dx_j = df/dx = | df1/dx1  df1/dx2  df1/dx3 |
                      | df2/dx1  df2/dx2  df2/dx3 |        Eq. 2.1
                      | df3/dx1  df3/dx2  df3/dx3 |
Constructors
The dedicated constructor for the C1 type Vector_of_Tangent_Bundle class can be written as (project:
c1_examples)
double v[3] = {1.0, 1.0, 1.0};
C1 x(3, v);
cout << ((C0)x) << endl;
cout << d(x) << endl;
C1::C1(int, const double*) is the variable dedicated constructor for the C1 type Vector_of_Tangent_Bundle class. An example using this variable dedicated constructor is a vector function f = {f1, f2, f3}T that depends on three independent variables x = {x1, x2, x3}T as1

  f1 = 16 x1^4 + 16 x2^4 + x3^4 - 16
  f2 = x1^2 + x2^2 + x3^2 - 3                              Eq. 2.2
  f3 = x1^3 - x2
Roots of f(x) = 0 for this non-linear problem can be obtained by an iterative algorithm. Consider the approximation of the vector function f by Taylor expansion in the neighborhood of an initial value xi with increment dx:

  f(xi + dx) = f(xi) + f,x(xi) dx + O(dx^2) = 0            Eq. 2.3

where O(dx^2) denotes error of second order in dx or above. Neglecting the higher-order error in Eq. 2.3 for small dx, we have

  dx = - f(xi) / f,x(xi)                                   Eq. 2.4
1. example taken from K.E. Gorlen, S.M. Orlow, and P.S. Plexico, 1991, Data Abstraction and Object-Oriented Programming in C++, John Wiley & Sons Ltd, p.93-97.
This is the root-finding formula, and xi+1 = xi + dx is the update. The implementation of this iterative algorithm (the Newton-Raphson method), shown in Program Listing 2.1, is very simple with the VectorSpace C++ Library; the C++ code is actually as concise as the mathematical expressions. The selector C1::operator [](int) is used to access the components of the C1 type Vector_of_Tangent_Bundle class; the return value of the selector is a Tangent_Bundle (see Figure 2.2). The solution of f(x) = 0 is x = {0.877966, 0.676757, 1.33086}T.
#include "include/vs.h"
#define EPSILON 1.e-12
#define MAX_ITER_NO 10
int main() {
  double v[3] = {1.0, 1.0, 1.0};
  C1 x(3, v), f(3, (double*)0);
  int count = 0;
  do {
    f[0] = 16.0*x[0].pow(4)+16.0*x[1].pow(4)+x[2].pow(4)-16.0;
    f[1] = x[0].pow(2)+x[1].pow(2)+x[2].pow(2)-3.0;
    f[2] = x[0].pow(3)-x[1];
    C0 dx = - ((C0)f) / d(f);
    (C0) x += dx;
  } while(++count < MAX_ITER_NO &&
          (double)norm((C0)f) > EPSILON);
  if(count == MAX_ITER_NO)
    cout << "Warning: convergence failed, residual norm: "
         << ((double)norm((C0)f)) << endl;
  else
    cout << "solution (" << count << "): " << ((C0)x) << endl;
  return 0;
}

Program Listing 2.1 Newton-Raphson method for the nonlinear equations of Eq. 2.2.
Figure 2.2 Memory layout of the C1 type Vector_of_Tangent_Bundle object x (declared as C1 x(3, v)): each component x[i], selected by C1::operator [](int), is a Tangent_Bundle whose base point is one component of the base-point Vector and whose tangent vector is one row of the 3 x 3 Jacobian Matrix of Eq. 2.2.
The constant strings for the C1 type Vector_of_Tangent_Bundle class virtual constructors (use macro definition VECTOR_OF_TANGENT_BUNDLE) and autonomous virtual constructors are shown in the following box.

virtual constructor string                      remark                              priority
by reference
  C1&                                           C1 type Vector_of_Tangent_Bundle
  C1*
  int, int, double*, double*
by value
  int, int
  int, int, const double*, const double*
  const C0&, const C0&
  const C0*, const C0*
  const C1&                                                                         15
  const C1*                                                                         16
operator or function                      remark
symbolic operators
  C0& operator &= ( )                     assignment by reference
  C0& operator = ( )                      assignment by value
  C0& operator [] (int)                   selector; return Tangent_Bundle
  operator C0()                           casting operator; retrieve base point
arithmetic operators
  C0 operator + ( ) const
  C0 operator - ( ) const
  C0 operator + (const C0&) const
  C0 operator - (const C0&) const
  C0 operator * (const C0&) const
  C0 operator / (const C0&) const
  C0& operator += (const C0&)
  C0& operator -= (const C0&)
  C0& operator *= (const C0&)
  C0& operator /= (const C0&)
logic operators (TRUE == 1, FALSE == 0)
  int operator == (const C0&) const       equal
  int operator != (const C0&) const       not equal
  int operator >= (const C0&) const       greater or equal
  int operator <= (const C0&) const       less or equal
  int operator > (const C0&) const        greater
  int operator < (const C0&) const        less
functions
  int row_length() const                  manifold dimension
  int col_length() const                  spatial dimension
  C0& d()                                 the first derivative; retrieve tangent vector
  C0 pow(int) const                       power (applied to each element of the Vector)
  C0 exp(const C0&) const                 exponent (applied to each element of the Vector)
  C0 log(const C0&) const                 log (applied to each element of the Vector)
  C0 sin(const C0&) const                 sin (applied to each element of the Vector)
  C0 cos(const C0&) const                 cos (applied to each element of the Vector)

Partial listing of Vector_of_Tangent_Bundle object arithmetic operators, logic operators and functions.
Operator     tangent vector               tangent of tangent vector
z = x + y    dz = dx + dy                 ddz = ddx + ddy
z = x - y    dz = dx - dy                 ddz = ddx - ddy
z = x y      dz = y dx + x dy             ddz = dx (x) dy + dy (x) dx + x ddy + y ddx
z = x / y    dz = (y dx - x dy) / y^2     ddz = ddx/y - (dx (x) dy + dy (x) dx + x ddy)/y^2 + 2x (dy (x) dy)/y^3

TABLE 2.3 Four basic binary operators for the algebra of C2 type objects.

where the operator (x) in the tangent of tangent vector denotes the tensor product.
Figure: storage of a C2 type Tangent_of_Tangent_Bundle object — base point u, tangent vector du, and tangent of tangent vector ddu, shown for spatial dimension = 1 and spatial dimension = 3.
C2 Type Objects
Constructors
An example of using a variable dedicated constructor for the C2 type Tangent_of_Tangent_Bundle class is (project: c2_examples)
C2 x(0.0);
cout << ((C0) x) << endl;
cout << d(x) << endl;
cout << dd(x) << endl;
For the purpose of a variable dedicated constructor, the default value ddu = 0.0 is just the derivative of the default du = 1.0. To access the second-derivative information, the free function dd(const C2&) (or d2(const C2&)) can be used to retrieve the value of ddu (project: c2_examples).
C2 x(0.0),
f = 2.0 * x * (sin(x)+1.0);
cout << ((C0) f) << endl;
cout << d(f) << endl;
cout << dd(f) << endl;
For spatial dimension greater than 1, we can write the dedicated constructor similarly to that of the C1 type Tangent_Bundle as (project: c2_examples)
C2 y(3.0, 3);
cout << ((C0) y) << endl;   // 3.0
cout << d(y) << endl;       // {0.0, 0.0, 0.0}T
cout << dd(y) << endl;      // {{0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}}
The constant strings for the C2 type Tangent_of_Tangent_Bundle virtual constructors (use macro definition TANGENT_OF_TANGENT_BUNDLE) and autonomous virtual constructors are shown in the following box.
virtual constructor string                          remark                                                                      priority
by reference
  C2&                                               C2 type Tangent_of_Tangent_Bundle                                           1
  C2*                                               pointer to C2 type Tangent_of_Tangent_Bundle                                2
  double*, double*, double*                         base point, tangent vector, tangent of tangent vector, (spatial dim. = 1)   3
  double*, double*, double*, int                    base point, tangent vector, tangent of tangent vector, spatial dim.         4
by value
  int                                               spatial dim.                                                                5
  const double&, const double&, const double&       base point, tangent vector, tangent of tangent vector, (spatial dim. = 1)   6
  const C0&, const C0&, const C0&                   base point, tangent vector, tangent of tangent vector, (spatial dim. = 1)   7
  const double*, const double*, const double*       base point, tangent vector, tangent of tangent vector, (spatial dim. = 1)   8
  const C0*, const C0*, const C0*                   base point, tangent vector, tangent of tangent vector, (spatial dim. = 1)   9
  const double*, const double*, const double*, int  base point, tangent vector, tangent of tangent vector, spatial dim.         10
  const C0*, const C0*, const C0*, int              base point, tangent vector, tangent of tangent vector, spatial dim.         11
  const C2&                                         C2 type Tangent_of_Tangent_Bundle                                           12
  const C2*                                         pointer to C2 type Tangent_of_Tangent_Bundle                                13
operator or function                      remark
symbolic operators
  C0& operator &= ( )                     assignment by reference
  C0& operator = ( )                      assignment by value
  operator C0()                           casting operator; retrieve base point
arithmetic operators
  C0 operator + ( ) const
  C0 operator - ( ) const
  C0 operator + (const C0&) const
  C0 operator - (const C0&) const
  C0 operator * (const C0&) const
  C0 operator / (const C0&) const
  C0& operator += (const C0&)
  C0& operator -= (const C0&)
  C0& operator *= (const C0&)
  C0& operator /= (const C0&)
logic operators (TRUE == 1, FALSE == 0)
  int operator == (const C0&) const       equal
  int operator != (const C0&) const       not equal
  int operator >= (const C0&) const       greater or equal
  int operator <= (const C0&) const       less or equal
  int operator > (const C0&) const        greater
  int operator < (const C0&) const        less
functions
  int col_length() const                  spatial dimension
  C0& d()                                 the first derivative
  C0& dd() or C0& d2()                    the second derivative
  C0 pow(int) const                       power (applied to each element of the Vector)
  C0 exp(const C0&) const                 exponent (applied to each element of the Vector)
  C0 log(const C0&) const                 log (applied to each element of the Vector)
  C0 sin(const C0&) const                 sin (applied to each element of the Vector)
  C0 cos(const C0&) const                 cos (applied to each element of the Vector)

Partial listing of C2 type Tangent_of_Tangent_Bundle object arithmetic operators, logic operators and functions.
Figure: storage of a C2 type Vector_of_Tangent_of_Tangent_Bundle object — base point, tangent vector, and tangent of tangent vector.
Constructors
A variable dedicated constructor for C2 type Vector_of_Tangent_of_Tangent_Bundle can be written as
(project: c2_examples)
double v[3] = {0.0, 1.0, 2.0};
C2 x(3, v);
cout << ((C0) x) << endl;
cout << d(x) << endl;
cout << (+dd(x)) << endl;
Note that dd(x) returns a Nominal_Submatrix, which cannot be directed to iostreams. We must use the primary casting operator + to convert it into a Matrix; without this conversion the program will throw an exception and stop.
We apply the variable dedicated constructor to a minimization problem1

  f(x1, x2) = 2 x1^2 + x1 x2 + x2^2 - 12 x1 - 10 x2        Eq. 2.5
This elliptic objective functional can be approximated by Taylor expansion to second order as2

  f(x) ~ f(xi) + f,x(xi) dx + (1/2) dxT H(xi) dx           Eq. 2.6
where f,x(xi) is the so-called Jacobian matrix, and H(xi) = f,xx(xi) is the so-called Hessian matrix. f(x) is minimized when its first derivative with respect to dx vanishes. Therefore, if we take the derivative of f(x), set it to zero, and solve for dx, we obtain

  dx = - f,x(xi) / H(xi)                                   Eq. 2.7

The elliptic nature of the objective functional guarantees that the Hessian matrix can be inverted. Eq. 2.7 is known as Newton's formula, and xi+1 = xi + dx is the update for the algorithm. For an elliptic objective functional such as Eq. 2.5, the approximation by the quadratic form of Eq. 2.6 is exact, so one iteration gives the exact answer. Program Listing 2.2 implements the classic Newton-Raphson method, which can also be used in less ideal cases where the objective functional is not exactly quadratic.
The minimum of this elliptic objective functional f is {2, 4}T, which is the center of the ellipses f = constant; we can verify this immediately with analytic geometry. The Newton-Raphson iterative procedure in this case achieves convergence in just one iteration, from the initial point (0, 0) to the final solution point (2, 4) (see Figure 2.5).
The constant strings for the Vector_of_Tangent_of_Tangent_Bundle virtual constructors (use macro definition VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE) and autonomous virtual constructors are shown in the following box.
1. Function without constraint conditions from D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc., Reading, MA., p.426.
2. A similar equation is in p.225, Eq. 43 of D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc., Reading, MA.
#include "include/vs.h"
#define EPSILON 1.e-12
#define MAX_ITER_NO 10
int main() {
  C2 x(2, (double*)0), f;
  int count = 0;
  do {
    f &= 2.0*x[0].pow(2) + x[0]*x[1] + x[1].pow(2)
         - 12.0*x[0] - 10.0*x[1];
    C0 dx = - d(f) / dd(f);
    (C0) x += dx;
  } while(++count < MAX_ITER_NO &&
          (double)norm(d(f)) > EPSILON);
  if(count == MAX_ITER_NO)
    cout << "Warning: convergence failed, gradient norm: "
         << ((double)norm(d(f))) << endl;
  else
    cout << "solution (" << count << "): " << ((C0)x) << endl;
  return 0;
}

Program Listing 2.2 Newton's method for the minimization of Eq. 2.5.
Figure 2.5 Contours of the objective functional f; the iteration converges from the initial point (0, 0) to the solution point (2, 4).
virtual constructor string                                 remark                                                                                priority
by reference
  C2&                                                      C2 type Vector_of_Tangent_of_Tangent_Bundle
  C2*                                                      pointer to C2 type Vector_of_Tangent_of_Tangent_Bundle
  int, int, double*, double*, double*                      manifold dim., spatial dim., base point, tangent vector, tangent of tangent vector
by value
  int, int
  int, int, const double*, const double*, const double*
  const C0&, const C0&, const C0&
  const C0*, const C0*, const C0*                                                                                                                14
  const C2&                                                                                                                                      15
  const C2*                                                                                                                                      16
operator or function                      remark
symbolic operators
  C0& operator &= ( )                     assignment by reference
  C0& operator = ( )                      assignment by value
  C0& operator [] (int)                   selector; return Tangent_of_Tangent_Bundle
  operator C0()                           casting operator; retrieve base point
arithmetic operators
  C0 operator + ( ) const
  C0 operator - ( ) const
  C0 operator + (const C0&) const
  C0 operator - (const C0&) const
  C0 operator * (const C0&) const
  C0 operator / (const C0&) const
  C0& operator += (const C0&)
  C0& operator -= (const C0&)
  C0& operator *= (const C0&)
  C0& operator /= (const C0&)
logic operators (TRUE == 1, FALSE == 0)
  int operator == (const C0&) const       equal
  int operator != (const C0&) const       not equal
  int operator >= (const C0&) const       greater or equal
  int operator <= (const C0&) const       less or equal
  int operator > (const C0&) const        greater
  int operator < (const C0&) const        less
functions
  int row_length() const                  manifold dimension
  int col_length() const                  spatial dimension
  C0& d()                                 the first derivative; retrieve tangent vector
  C0& dd()                                the second derivative; retrieve tangent of tangent vector
  C0 pow(int) const                       power (applied to each element of the Vector)
  C0 exp(const C0&) const                 exponent (applied to each element of the Vector)
  C0 log(const C0&) const                 log (applied to each element of the Vector)
  C0 sin(const C0&) const                 sin (applied to each element of the Vector)
  C0 cos(const C0&) const                 cos (applied to each element of the Vector)

Partial listing of C2 type Vector_of_Tangent_of_Tangent_Bundle object arithmetic operators, logic operators and functions.
The pre-processing step of this linear programming problem is to (1) multiply the objective functional by -1 to convert the maximization problem into a minimization problem, and (2) transform the first three inequality constraints into equality constraints by adding positive slack variables x4, x5, and x6. That is,

minimize the objective functional f(x) = -3 x1 - x2 - 3 x3 subject to

  2 x1 +   x2 +   x3 + x4 = 2
    x1 + 2 x2 + 3 x3 + x5 = 5
  2 x1 + 2 x2 +   x3 + x6 = 6
  x1 >= 0, x2 >= 0, x3 >= 0, x4 >= 0, x5 >= 0, x6 >= 0
1. example taken from D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company,
Inc., Reading, MA., p.46.
…feasible solutions. In other words, the optimal value of the objective functional is always achieved at a basic feasible solution for a linear objective functional.
The matrix form of the above problem can be written as

  minimize   f(x) = cDT xD + cBT xB
  subject to D xD + B xB = b                               Eq. 2.8
             xD >= 0, xB >= 0
xD = {x1, x2, x3} are the non-basic variables and xB = {x4, x5, x6} are the basic variables. We choose the slack variables as the initial basic variables for the obvious reason that the initial basic feasible solution is then clearly xB = {2, 5, 6}, without having to solve any set of equations. We can solve for xB from the equality constraints as

  xB = B^-1 (b - D xD)

Substituting this back into the objective functional gives

  f(x) = (cDT - cBT B^-1 D) xD + cBT B^-1 b                Eq. 2.9
In Eq. 2.9 the coefficient of the non-basic variables xD is rDT ≡ cDT - cBT B^-1 D; rD is known as the relative cost, which measures the cost of a non-basic variable relative to the current basic variables. Negative components of the relative cost in Eq. 2.9 decrease the value of the objective functional. The non-basic variable corresponding to the most negative relative cost component in rD is brought into the basic set, so that the objective functional decreases the most. Denote the column of D corresponding to the non-basic variable selected to enter the basic set as d. Bringing this non-basic variable into the current basic set means moving with p = - B^-1 d as a search direction, that is, moving away from xB along x = xB + α p. The smallest non-negative α at which a component of xB + α p reaches zero identifies the first basis in the current basic set to be encountered (an adjacent extreme point to d); hence this basis is selected to leave the basic set. The above process is repeated until the components of the relative cost rD are all positive.
The implementation of any non-trivial problem, such as the finite difference method discussed on page 67, contains many logical steps that are not highly mathematical. In the finite difference method we created a class FD to handle the mapping of the finite difference stencil to the global matrix, using the concept of data abstraction. Program Listing 2.3 (project: linear_programming_basic_set) implements the basic set method. In the basic set method we create a class Basic_Set to represent the basic and non-basic columns in the constraint equations and the coefficients of the objective functional, as shown in Figure 2.6.
For the example problem, we write

C1 X(6, (double*)0), C(3, 6, (double*)0), f;   // 6 variables, 3 constraints
C[0] = 2*X[0]+  X[1]+  X[2]+ X[3];
C[1] =   X[0]+2*X[1]+3*X[2]       + X[4];
C[2] = 2*X[0]+2*X[1]+  X[2]              + X[5];
f = -3.0*X[0] - X[1] - 3.0*X[2];

Notice that both the constraint equations and the objective functional are declared as objects of C1 type. The tangent vectors of the C1 type objects give the coefficients we need; i.e., the elements of A = d(C) and the elements of cT = d(f) in Eq. 2.8. The class Basic_Set is initialized by calling the constructor
#include "include/vs.h"
class Basic_Set {
C0 *_A, *_c;
int row_size, col_size, *_basic_order;
public:
Basic_Set(C0&, C0&);
~Basic_Set() { delete [] _basic_order; }
C0& A() { return *_A; }
C0& c() { return *_c; }
int basic_order(int i) {return _basic_order[i];}
void swap(int, int);
};
Basic_Set::Basic_Set(C0& dC, C0& df) {
row_size = dC.row_length(); col_size = dC.col_length();
_basic_order = new int[col_size];
for(int i = 0; i < col_size; i++) _basic_order[i] = i;
_A = &dC; _c = &df;
}
void Basic_Set::swap(int i, int j) {
int old_basic_order = _basic_order[i];
_basic_order[i] = _basic_order[j]; _basic_order[j] = old_basic_order;
C0 old_Ai(row_size, (double*)0); old_Ai = (*_A)(i);
(*_A)(i) = (*_A)(j); (*_A)(j) = old_Ai;
C0 old_ci(0.0); old_ci = (*_c)[i];
(*_c)[i] = (*_c)[j]; (*_c)[j] = old_ci;
}
Figure 2.6 The Basic_Set object: A = [D, B] and cT = [cDT, cBT]. The order array records each column's original variable order (initially 0, 1, ...), and Basic_Set::swap() exchanges a non-basic column with a basic column by swapping the order entries, the columns of A, and the objective functional coefficients.
The submatrices (D, B) and subvectors (cDT, cBT) in Eq. 2.8 can be written using a referenced Matrix and referenced Vector in the VectorSpace C++ Library as

C0 D(3, 3, BS.A(), 0,0), B(3, 3, BS.A(), 0,3), c_D(3, BS.c(),0), c_B(3, BS.c(),3);

where the public member functions Basic_Set::A() and Basic_Set::c() provide access to the matrix A and the vector cT referenced by the Basic_Set object. The most important service the class Basic_Set performs is to swap columns between the basic and non-basic sets. This is provided by the public member function Basic_Set::swap(int, int), whose two integer arguments indicate which two columns are to be swapped. A private member integer array, _basic_order, is used to keep track of the original variable order. This original variable order can be retrieved with the public member function Basic_Set::basic_order(int), whose integer argument is the current column number.
Program Listing 2.4 implements the steps of the so-called revised simplex method in linear programming.1 These steps are

Step 1: relative cost — compute rDT ≡ cDT - cBT B^-1 D; if rD >= 0, stop.
Step 2: in — select the non-basic column (say d) corresponding to the most negative component of rD to enter the basic set.
Step 3: out — compute p = B^-1 d, then α = xB / p. If no component of α is greater than 0, stop. The column in the basic set corresponding to the smallest positive α is to leave the basic set.
Step 4: swap — swap the columns selected in steps in and out.
The solution is x = {0.2, 0, 1.6, 0, 0, 4}T in the standard form, with the maximum value of the objective functional 5.4; that is, the solution is x = {0.2, 0, 1.6}T in the original form, neglecting the slack variables. The basic set method is the traditional simplex-tableau updating procedure, which can be explained step-by-step with minimal mathematics.2 Next is the active set method for linear programming, which is more readily modified for inequality-constrained nonlinear programming.
1. p.60 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc., Reading, MA.
2. see Chapter 3 in D.G. Luenberger, 1989, same as the above.
Program Listing 2.4 (project: linear_programming_basic_set) assembles the 6-variable, 3-constraint problem, computes α = xB / p, swaps the non-basic and basic columns (step 4), repeats until all rD > 0, and finally unscrambles the solution: x = {0.2, 0, 1.6, 0, 0, 4}T with objective value 5.4.
Figure 2.7 At an extremum point x*, the gradient of f is a linear combination of the gradients of the constraints C(x); the level set f = c is tangent to the constraint surface (the tangent plane).
With this relation between f,x and C,x at an extremum point, we introduce the Lagrange multiplier λ (an m-dimensional vector) as the coefficients of the linear combination of the components of C,x that forms f,x:

  f,x + λT C,x = 0                                         Eq. 2.10

In view of this, the Lagrangian functional l(x, λ) can be introduced to represent the constrained optimization problem as

  l(x, λ) = f + λT C                                       Eq. 2.11

For an extremum condition, setting the first-order derivatives of the Lagrangian functional Eq. 2.11 to zero gives the Euler-Lagrange equations

  l,x = f,x + λT C,x = 0
  l,λ = C = 0                                              Eq. 2.12
This states that the first-order condition of the Lagrangian functional (1) is exactly Eq. 2.10, and (2) requires x to stay on the constraint surface (C = 0). The second-order condition of the Lagrangian functional, in the case of the linear objective functional, is just that the Hessian H = f,xx is positive definite.
In the case of inequality constraints C(x) <= 0, one can state that

  either λi = 0 and Ci < 0, or λi > 0 and Ci = 0           Eq. 2.13

In the first part of Eq. 2.13 the constraint is satisfied in the interior of the feasible region (Ci < 0); the constraint is inactive, and the corresponding Lagrange multiplier is set to λi = 0. In the second part of Eq. 2.13 the constraint is on the edge of the feasible region; the constraint is active (Ci = 0), and the corresponding Lagrange multiplier is positive (λi > 0). Both cases can be summarized as

  λi >= 0, Ci <= 0, and λi Ci = 0                          Eq. 2.14
This is the so-called Kuhn-Tucker condition. Although Eq. 2.14 is aesthetically more satisfying, we use Eq. 2.13 for the practical coding of the class Active_Set. The kernel of the problem is to compute the Lagrange multiplier λ according to Eq. 2.10 as

  λ = - f,x / (A,x)T                                       Eq. 2.15

where A denotes the active subset of C. When λi >= 0 for every i in the active set, the Kuhn-Tucker condition is upheld and the solution is optimal. Otherwise, select the constraint corresponding to the most negative λi, and drop this constraint from the active set. The search direction p due to the deletion of this constraint satisfies1

  A,x p = - ei                                             Eq. 2.16

where ei is the basis vector (with only the i-th component = 1 and 0 elsewhere). Eq. 2.16 means that the active constraints other than the i-th one are to be strictly satisfied (= 0). We solve for p = - ei / A,x, and the next solution lies along the path x + α p. The first constraint to be encountered (Ci = 0) along this search path therefore corresponds to the minimum positive α satisfying Ci(x + α p) = 0. We have

  α = min { αi = - Ci(xcurrent) / (d(C)i p), i-th constraint in the inactive set }   Eq. 2.17

The constraint corresponding to this smallest positive αi (in the inactive set) is added to the active set, A.
Program Listing 2.3 (project: linear_programming_active_set) implements the class Active_Set. The criterion for determining whether a constraint is active is replaced by

  Ai > -ε   (instead of Ai = 0)                            Eq. 2.18

where ε is a small positive number. The Active_Set keeps track of the active state of each constraint in the original constraint equations. Upon calling the public member function Active_Set::activate(), the current active set is assembled and a coefficient matrix is formed. The public member function Active_Set::active_state(int) takes an integer argument, the order of a constraint in the original constraint equations, and returns its order in the current active set. The coefficient matrix can be retrieved using the free function d( ), as in
C1 X(3, x), C(6, 3, (double*)0, (double*)0);
C[0] = 2*X[0]+ X[1]+ X[2] - 2;
C[1] =   X[0]+2*X[1]+3*X[2] - 5;
C[2] = 2*X[0]+2*X[1]+ X[2] - 6;
C[3] = - X[0];
1. p. 176 in P.E. Gill, W. Murray, and M.H. Wright, 1981, Practical Optimization, Academic Press, Inc., San Diego.
// C is the constraints
// form active set and coefficient matrix
// Eq. 210: λ = - ∇f / (∇A)T
The problem has been recast into the standard form for the active set method. For problems with equality constraints the Active_Set constructor can be called as
Active_Set A(C, 3);
The second integer argument indicates the number of equality constraints. These equality constraints will
always be kept in the active set.
A minor technical detail of Active_Set is that the private data member _active_state is initialized to -1. When
a constraint is determined to be included in the active set, the value of _active_state is set to the order of the
constraint in the current active set. When a constraint is dropped from the active set, the value
of its _active_state is set to -2, which means this particular constraint can never be activated again. This
treatment avoids possible zigzagging or jamming, where the search path is caught in an infinite loop1.
Program Listing 23 implements the active set algorithm. The core steps are
Step 1: Lagrange multipliers — compute λ = - ∇f / (∇A)T; if all λi ≥ 0, stop, the solution is optimal.
Step 2: out — select the constraint corresponding to the most negative λi to be dropped from the active set.
Step 3: in — compute p = - ∇A-1 ei, then, for all inactive i, αi = - Ci / (d(C)i p). If no αi is greater than 0, stop.
The constraint corresponding to the smallest positive αi is to be added to the active set.
Step 4: repeat Steps 1-3.
Step 4 is written with a do-while control statement. The termination criterion is that all Lagrange multipliers corresponding to the active constraints are positive. This condition is the second part of Eq. 213.
1. see p. 330, Chapter 11 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing
Company, Inc., Reading, MA.
class Active_Set stores the coefficients of the active constraints and a C1* to the constraint equations.
int main() {
double x[3] = {0.0, 0.0, 0.0};
C1 X(3,x),
C(6,3, (double*)0, (double*)0),
f;
const int ALL_POSITIVE = 1;
const double EPSILON = 1.e-12;
int lambda_flag;
f &= -3*X[0]-X[1]-3*X[2];
C[0] = 2*X[0]+ X[1]+ X[2] - 2;
C[1] = X[0]+2*X[1]+3*X[2] - 5;
C[2] = 2*X[0]+2*X[1]+ X[2] - 6;
C[3] = - X[0];
C[4] = - X[1];
C[5] = - X[2];
Active_Set A(C);
do{
A.activate();
lambda_flag = ALL_POSITIVE;
C0 lambda = - d(f) / ~d(A);
int i_cache = -1;
double min_lambda = -EPSILON;
for(int i = 0; i < A.active_no(); i++)
if((double)lambda[i] < -EPSILON) {
lambda_flag = !ALL_POSITIVE;
if((double)lambda[i] < min_lambda) {
min_lambda = lambda[i];
i_cache = i;
}
}
if(!lambda_flag) {
A.deactivate(i_cache);
C0 e(3,(double*)0); e[i_cache] = 1.0;
C0 p = - e / d(A);
double min_alpha = 1.e20;
int activate_flag = FALSE;
for(int i = 0; i < 6; i++) {
if(A.active_state(i) <= -1) {
double temp = (double)(d(C)[i]*p);
if(fabs(temp) > EPSILON) {
double alpha = -(double)((((C0)C)[i])/temp);
if(alpha < min_alpha && alpha > 0.0) {
min_alpha = alpha;
activate_flag = TRUE;
}
}
}
}
if(activate_flag) {
((C0)X) += min_alpha * p;
((C0)C[0]) = 2*((C0)X[0])+ ((C0)X[1])+ ((C0)X[2]) - 2;
((C0)C[1]) = ((C0)X[0])+2*((C0)X[1])+3*((C0)X[2]) - 5;
((C0)C[2]) = 2*((C0)X[0])+2*((C0)X[1])+ ((C0)X[2]) - 6;
((C0)C[3]) = - ((C0)X[0]); ((C0)C[4]) = - ((C0)X[1]); ((C0)C[5]) = - ((C0)X[2]);
}
}
} while(!lambda_flag);
cout << "solution: " << ((C0)X) << endl << "maximum objective function: " <<
(3*((C0)X[0])+((C0)X[1])+3*((C0)X[2])) << endl;
return 0;
}
6 constraints, 3 variables
αi = - Ci / (d(C)i p)
select the smallest positive αi
update solution
update constraint values
f(x1, x2) = 100 (x2 - x1²)² + (1 - x1)²
Eq. 219
The unique minimum point (1, 1) lies in a banana-shaped valley. For all problems in this section, the initial point
is selected at (-1.2, 1) so that an intelligent search path will have to make a turn along the banana-shaped valley to arrive at the minimum point. This objective functional is often used to test the robustness of an algorithm.
Figure 2.8 Rosenbrock's function with minimum point at (1, 1); the initial point is (-1.2, 1).
1. from p. 96 in P.E. Gill, W. Murray, and M.H. Wright, 1981, Practical Optimization, Academic Press, Inc., San Diego.
#include "include/vs.h"
#define EPSILON 1.e-12
#define MAX_ITER_NO 20
int main() {
double v[2] = {-1.2, 1.0}, energy_norm;
C2 x(2, v), f;
int count = 0;
do {
f &= 100.0*(x[1]-x[0].pow(2)).pow(2)+(1.0-x[0]).pow(2);
C0 dx = - d(f) / dd(f);
(C0) x += dx;
energy_norm = norm(dx*(C0)f);
} while(++count < MAX_ITER_NO && energy_norm > EPSILON);
if(count == MAX_ITER_NO)
cout << "Warning: convergence failed, energy norm: " << energy_norm << endl;
else
cout << "solution (" << count << "): " << ((C0)x) << endl;
return 0;
}
Figure 2.9 The search path of the classical Newton's method on Rosenbrock's function.
p = - (∇²f)-1 ∇f
Eq. 220
Along p the solution is updated according to xi+1 = xi + αp, where α is a scalar parameter, and its optimal value is
determined by using a line search, or even a scalar version of the classical Newton's method (see the next section on
the steepest descent method). We consider bisection and golden section here. In a line search algorithm, the minimum of a function is searched by evaluating the function and then comparing its values at selected bracketing
points. The basic idea is to have the bracketing interval contain the point with the minimum function value, and
at the same time make the bracketing interval smaller and smaller in an iterative algorithm.
Given a bracketing interval [a, c] for bisection method, the interval contains the point corresponding to the
minimum function value. At the middle of the interval is the point, b = (a+c)/2. The next bracketing point x is
taken as the middle of [b, c]; i.e., x = (b+c)/2. If f(x) > f(b), the next bracketing points are [a, x], otherwise, the
next bracketing points are [b, c]. Repeating this process, the bracketing interval will become smaller and smaller.
In the worst scenario, the selected intervals always lie on the larger segments. The bracketing intervals will
reduce at the rate of 0.75^(2n) = 0.5625^n, where 2n is the number of repeated iterations. On the other hand, the best
case will be reducing at the rate of 0.25^(2n) = 0.0625^n. In the average case, there is a 50% chance of selecting either the
larger or the smaller segment, so the reduction rate is 0.25^n 0.75^n = 0.1875^n.
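The bisection bracketing described above can be sketched in a few lines of plain C++ (the function name bisection_min is an assumption, not code from the listings):

```cpp
#include <cassert>
#include <cmath>

// Bisection bracketing: keep the minimum of f inside [a, c].  Probe at the
// midpoint b = (a+c)/2 and at x = (b+c)/2; if f(x) > f(b) the next bracket is
// [a, x] (0.75 of the length), otherwise [b, c] (0.5 of the length).
double bisection_min(double (*f)(double), double a, double c, double tol) {
    while (c - a > tol) {
        double b = 0.5 * (a + c);
        double x = 0.5 * (b + c);
        if (f(x) > f(b)) c = x;   // minimum lies in [a, x]
        else             a = b;   // minimum lies in [b, c]
    }
    return 0.5 * (a + c);
}
```

For a unimodal function such as (t - 1)² the bracket contracts to the minimizer.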
Golden section finds an optimal ratio that avoids the worst case scenario of the bisection method. Consider a triplet of points [a, b, c] with the ratio of interval [a, b] to interval [a, c] as τ; the ratio of interval [b, c] to interval [a, c] is then 1-τ. The next bracketing point x lies to the right of b with the ratio of interval [b, x] to interval [a, c]
as β. First, since after comparison of function values the selected bracketing point can be either b or x, we
demand the symmetry of the two points by requiring that [a, x] (normalized length = τ + β) and [b, c] (normalized length = 1-τ) be equal. Therefore,
τ + β = 1 - τ
Eq. 221
Secondly, if x is selected as the next bracketing point, the ratio of interval [b, x] to interval [b, c] (= β/(1-τ))
should be self-similar to the original ratio of interval [a, b] to interval [a, c] (= τ). Therefore,
β / (1-τ) = τ
Eq. 222
Substituting β = 1 - 2τ from Eq. 221 into Eq. 222 gives
τ² - 3τ + 1 = 0
Eq. 223
The root of Eq. 223 that lies in [0, 1] is τ = 0.38197, with the ratio of selecting the larger
segment 1-τ = 0.61803. Now, for the worst scenario the convergence rate improves from 0.75^(2n) to 0.61803^(2n).
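As a check on the derivation, the root of τ² - 3τ + 1 = 0 and the resulting golden section search can be sketched in plain C++ (the names golden_tau and golden_min are illustrative, not from the listings):

```cpp
#include <cassert>
#include <cmath>

// The interior ratio tau solves tau^2 - 3*tau + 1 = 0 (Eq. 2-23); the root in
// [0, 1] is (3 - sqrt(5))/2 ~= 0.38197, and the retained segment is 0.61803.
double golden_tau() { return (3.0 - std::sqrt(5.0)) / 2.0; }

// Golden section search on [a, c].  The self-similarity (1-tau)^2 = tau lets
// one interior point be reused each step, so only one new f-evaluation is
// needed and the bracket shrinks by 0.61803 per comparison.
double golden_min(double (*f)(double), double a, double c, double tol) {
    double tau = golden_tau();
    double b = a + tau * (c - a);   // left interior point
    double x = c - tau * (c - a);   // right interior point (symmetric)
    while (c - a > tol) {
        if (f(b) < f(x)) { c = x; x = b; b = a + tau * (c - a); }
        else             { a = b; b = x; x = c - tau * (c - a); }
    }
    return 0.5 * (a + c);
}
```

The reuse of an interior point relies on (1-τ)² = τ, which is equivalent to Eq. 223.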
1. p. 350 for bisection, and p. 399 for golden section, in W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery,
1992, Numerical Recipes in C, Cambridge University Press, Cambridge, U.K.
However, it is easy to see that both the best case and the average case for golden section reduce
the bracketing length at a slower rate than the bisection method. Program Listing 28 implements golden section line
search with the classical Newton's method for defining the search direction. The result is shown in Figure 2.10,
where the search path follows the banana-shaped valley nicely. In Chapter 4, the least squares formulation for a
nonlinear finite element method on page 331 converges only after using the line search algorithm. In Chapter 5,
a relatively large incremental step can be taken in a finite deformation elastoplastic finite element problem only
when the line search method is applied.
#include "include/vs.h"
static double EPSILON = 1.e-12; static int MAX_ITER_NO = 100;
int main() {
double v[2] = {-1.2, 1.0}; C2 X(2, v); C0 d_x, p; int count = 0;
do {
C2 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1-X[0]).pow(2);
p &= -d(f)/dd(f);
double left = 0.0, right = 1.0, length = right-left;
C1 x0(0.0), x1(0.0), phi(0.0), alpha(0.0);
do {
double alpha_temp = (C0) alpha = (left + 0.618 * length);
x0 = ((C0)X)[0] + alpha * p[0], x1 = ((C0)X)[1] + alpha * p[1];
phi = 100*(x1-x0.pow(2)).pow(2)+(1-x0).pow(2);
double golden_phi = (C0)phi; (C0) alpha = (left + 0.382 * length);
x0 = ((C0)X)[0] + alpha * p[0], x1 = ((C0)X)[1] + alpha * p[1];
phi = 100*(x1-x0.pow(2)).pow(2)+(1-x0).pow(2);
double left_phi = (C0)phi;
if(golden_phi < left_phi) { left = left + 0.382 * length; (C0)alpha = alpha_temp;
} else { right = left + 0.618 * length; }
length = right-left;
} while(length > 1.e-3);
d_x &= ((C0)alpha)*p; ((C0)X) += d_x;
cout << "solution " << (++count) << ": " << "{" << ((C0)X)[0] << ", "
<< ((C0)X)[1] << "}" << endl;
} while(((double)norm(p)) > EPSILON && count < MAX_ITER_NO);
return 0;
}
Figure 2.10 Golden section line search with Newton's method for the search direction.
p = - ∇f(x)T = - g
Eq. 224
That is, the search direction is taken along the negative gradient direction. This search direction makes intuitive
sense: the objective functional decreases in the direction of the negative gradient. The solution is updated
through xi+1 = xi + αp, where the scalar α is the line search parameter. We seek a value of α that gives the minimum value of f along the search direction p. For this one-variable (α) optimization problem, the scalar version of
Newton's method can be used to solve for the optimal value of α. We may replace it with a more primitive
method such as the golden section line search1 described in the previous section.
#include "include/vs.h"
#define EPSILON 1.e-12
#define MAX_ITER_NO 20
int main() {
double v[2] = {-1.2, 1.0};
C1 X(2, v);
C0 dx;
int count = 0;
do {
C1 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1-X[0]).pow(2);
C0 g = d(f);
C2 alpha(0.0);
C0 d_alpha;
do {
C2 x0 = ((C0)X)[0] + alpha * -g[0],
x1 = ((C0)X)[1] + alpha * -g[1];
C2 phi = 100*(x1-x0.pow(2)).pow(2)+(1-x0).pow(2);
d_alpha &= - d(phi) / dd(phi);
((C0)alpha) += d_alpha;
} while(((double)norm(d_alpha)) > EPSILON);
dx &= ((C0)alpha)*(-g);
((C0)X) += dx;
cout << "solution " << (++count) << ": " << ((C0)X) << endl;
} while(((double)norm(dx)) > EPSILON &&count < MAX_ITER_NO);
cout << "solution: " << ((C0)X) << endl;
}
1. see p. 353 for bisection and p.397 for golden section search in W.H. Press, S.H. Teukolsky, W.T. Vetterling, and B.P. Flannery, 1992, Numerical Recipes in C, Cambridge University Press, Cambridge, U.K.
Program Listing 29 implements the steepest descent method. The line search parameter α is the only variable
in the updating formula xi+1 = xi + αp, where xi and p are regarded as constants, and α is the parameter to search for
the minimum objective functional value. Therefore, the definition of the objective functional in the line search
algorithm is
f(x1, x2) = f(xi+1(α)) = φ(α)
where f depends on 2 variables, or in the more general case is a multi-variable functional, while φ depends only on
the one variable α. We note the difference in cost between the multi-variable Newton's method as described on
page 110 and the one-parameter Newton's method for the line search algorithm here. The resultant search path of
the steepest descent method is shown in Figure 2.11. First of all, the path shows the typical zigzag pattern consisting
of alternating orthogonal search directions. The convergence rate is extremely slow. After 100 iterations the
solution is still at (0.6, 0.36). The convergence becomes ever slower as it approaches the true solution (1,
1). At 9660 iterations, the solution is still at (0.999941, 0.999883). Although each iteration in the steepest descent
method is much cheaper than in Newton's method, Newton's method takes only 6 iterations to get to (1, 1).
Figure 2.11 The search path of the steepest descent method up to 100 iterations.
xi+1 = xi - α M g
Eq. 225
where M is a weighting matrix (M = I gives the steepest descent method, and M = H-1 gives Newton's method), and α
is the line search parameter in the steepest descent method. However, the selection of the weighting parameter is
a matter of art. A more systematic, and probably more intelligent, way of implementing Eq. 225 is to use the
modified Cholesky decomposition2 introduced on page 32. The basic idea is to set M = H-1. Since the Hessian
matrix is symmetrical, we can apply the Cholesky decomposition. The problem with Newton's method is that the
objective functional may not be quadratic and the Hessian matrix may not be positive definite. When we apply
the Cholesky decomposition, we modify small or negative diagonals according to
d̄ = max {d, δ}
where d̄ is the modified diagonal and δ is a small positive number supplied to the modified Cholesky
decomposition on page 32. That is, the degeneration of Newton's method to the steepest descent method occurs only
when the positive definiteness of the Hessian matrix is in question.
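The max{d, δ} modification can be sketched with a plain LDL^T factorization in standard C++; this is an illustrative stand-in for the library's Cholesky class, not its actual implementation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Modified LDL^T factorization sketch: each diagonal pivot is forced up to a
// small positive delta, d_bar = max{d, delta}, so the factor is always
// positive definite and the Newton direction degenerates gracefully toward
// steepest descent when H is indefinite.
void modified_ldlt(const std::vector<std::vector<double>>& H, double delta,
                   std::vector<std::vector<double>>& L, std::vector<double>& D) {
    int n = (int)H.size();
    L.assign(n, std::vector<double>(n, 0.0));
    D.assign(n, 0.0);
    for (int j = 0; j < n; ++j) {
        double d = H[j][j];
        for (int k = 0; k < j; ++k) d -= L[j][k] * L[j][k] * D[k];
        D[j] = (d > delta) ? d : delta;        // the max{d, delta} modification
        L[j][j] = 1.0;
        for (int i = j + 1; i < n; ++i) {
            double s = H[i][j];
            for (int k = 0; k < j; ++k) s -= L[i][k] * L[j][k] * D[k];
            L[i][j] = s / D[j];
        }
    }
}
```

For a positive definite H the factorization is the ordinary one; only an indefinite H triggers the δ floor.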
Program Listing 210 implements the combined steepest descent and Newton method using modified Cholesky
decomposition. The search path is shown in Figure 2.12. It takes 13 iterations to get to the point (1, 1), about two
times the iterations compared to the classical Newton's method. However, the wild search path in Figure 2.9 has been
tamed successfully. The combined Newton and steepest descent method seems to be more robust than either the classical Newton method or the steepest descent method.
1. p. 226-227 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
2. p. 108-111 in P.E. Gill, W. Murray, and M.H. Wright, 1981, Practical Optimization, Academic Press, Inc., San Diego.
#include "include/vs.h"
#define EPSILON 1.e-12
#define MAX_ITER_NO 20
int main() {
double v[2] = {-1.2, 1.0};
C2 X(2, v);
C0 dx;
int k = 0;
do {
C2 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g = d(f);
Cholesky mcd(dd(f), EPSILON);
C0 p = mcd * (-g);
C2 alpha(0.0), x0, x1;
C0 d_alpha;
do {
x0 &= ((C0)X)[0] + alpha * p[0]; x1 &= ((C0)X)[1] + alpha * p[1];
C2 phi = 100.0*(x1-x0.pow(2)).pow(2)+(1.0-x0).pow(2);
if(fabs((double)dd(phi)) > EPSILON) d_alpha &= -d(phi) / dd(phi);
else break;
((C0)alpha) += d_alpha;
} while(((double)norm(d_alpha)) > 1.e-8);
dx = ((C0)alpha)*p;
((C0)X) += dx;
} while(((double)norm(dx)) > EPSILON);
cout << "final solution: " << ((C0)X) << endl;
}
Listing 210 Minimization of Rosenbrock's function using modified Cholesky decomposition (project:
combined_newton_and_steepest_descent).
Figure 2.12 The search path of the combined Newton and steepest descent method.
Eq. 226
The solution is sought along xi+1 = xi + αi pi, where dx = αi pi. Minimizing f(xi+1) with respect to dx (where xi+1
= xi + dx), we get
αi = - (pi)T ∇f i / ((pi)T H pi)
Eq. 227
gi+1 = gi + αi H pi
Eq. 228
Since the search directions pi are orthogonal (conjugate) to each other, we have (pi)T H pj = 0, for i ≠ j. From Eq. 228, we have
(gi+1 - gi)T pj = αi (pi)T H pj = 0.
Eq. 229
The conjugate direction pi+1 that is orthogonal to all its previous directions is taken as
pi+1 = - gi+1 + βi pi
Eq. 230
Pre-multiplying Eq. 230 with (gi+1 - gi)T, the left-hand side vanishes by Eq. 229. βi
can then be solved for as
βi = (gi+1 - gi)T gi+1 / [(gi+1 - gi)T pi]
Eq. 231
Applying the orthogonality relations to Eq. 231, we have the Fletcher-Reeves formula
βi = (gi+1)T gi+1 / [(gi)T gi]
Eq. 232
or the Polak-Ribiere formula
βi = (gi+1 - gi)T gi+1 / [(gi)T gi]
Eq. 233
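Both coefficients can be sketched in plain C++ for 2-component gradients (the names beta_fr and beta_pr are illustrative; the library listing computes the same quantity with g1.pow(2)/g.pow(2)):

```cpp
#include <cassert>
#include <cmath>

// Fletcher-Reeves coefficient (Eq. 2-32): beta = g_new.g_new / g_old.g_old.
double beta_fr(const double g_new[2], const double g_old[2]) {
    return (g_new[0]*g_new[0] + g_new[1]*g_new[1]) /
           (g_old[0]*g_old[0] + g_old[1]*g_old[1]);
}

// Polak-Ribiere coefficient (Eq. 2-33): beta = (g_new - g_old).g_new / g_old.g_old.
double beta_pr(const double g_new[2], const double g_old[2]) {
    return ((g_new[0]-g_old[0])*g_new[0] + (g_new[1]-g_old[1])*g_new[1]) /
           (g_old[0]*g_old[0] + g_old[1]*g_old[1]);
}
```

When the successive gradients are exactly orthogonal, as for a quadratic with exact line search, the two formulas coincide.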
Program Listing 211 implements the conjugate gradient method. The basic steps are1
Step 1: search direction — compute p0 = - g0 = - ∇f(x0)T,
Step 2: loop — partial conjugate gradient method, loop over the n dimensions
a: line search — xi+1 = xi + αi pi to minimize f(xi+1), in place of Eq. 227
b: gradient at xi+1 — gi+1 = ∇f(xi+1)T
c: search direction — pi+1 = - gi+1 + βi pi (Eq. 230), where
Fletcher-Reeves: βi = (gi+1)T gi+1 / [(gi)T gi] (Eq. 232), or
Polak-Ribiere: βi = (gi+1 - gi)T gi+1 / [(gi)T gi] (Eq. 233)
Step 3: restart — repeat Steps 1 and 2, and reset x0 = xn.
#include "include/vs.h"
int main() {
double v[2] = {-1.2, 1.0};
const double EPSILON = 1.e-12;
const int MAX_NO_OF_ITERATION = 30;
int k = 0;
C1 X(2, v);
C0 dx, p;
do {
for(int i = 0; i < 2; i++) {
C1 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g = d(f);
if(i == 0) p &= - g;
C2 alpha(0.0), x0, x1;
C0 d_alpha;
do {
x0 &= ((C0)X)[0] + alpha * p[0]; x1 &= ((C0)X)[1] + alpha * p[1];
C2 phi = 100.0*(x1-x0.pow(2)).pow(2)+(1.0-x0).pow(2);
if(fabs((double)dd(phi)) > EPSILON) d_alpha &= -d(phi) / dd(phi);
else break;
((C0)alpha) += d_alpha;
} while(((double)norm(d_alpha)) > 1.e-8);
dx &= ((C0)alpha)*p;
((C0)X) += dx;
if(i != 1 && ((double)norm(dx)) > EPSILON) {
C1 f1 = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g1 = d(f1),
beta = g1.pow(2) / g.pow(2);
p = -g1 + beta * p;
}
cout << "solution(" << ++k << "): " << ((C0)X) << endl;
}
} while(((double)norm(dx)) > EPSILON && k < MAX_NO_OF_ITERATION);
cout << "The final solution: " << ((C0)X) << endl;
}
Listing 211 Minimization of Rosenbrock's function using the conjugate gradient method (project:
conjugate_gardient_method).
1. p. 253 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
Figure 2.13 Conjugate gradient method using only first derivative information, and 30 iterations.
Quasi-Newton Method
The advantage of both the steepest descent method and the conjugate gradient method is that they require only first
derivative information. This is especially important for problems with a large number of variables. However, we
saw that the classical Newton's method, or its modifications, which use second derivative information, enjoys a faster convergence rate. The strategy is to stick with first derivative methods for economy.
Since along the iterative steps we have two sequences of first derivative information, {x0, x1, ... , xi} and {g0, g1,
..., gi}, we can use these sequences of first derivative information to construct approximated second derivative
information. For the search direction pi = xi+1 - xi, and qi = gi+1 - gi, the finite difference quotient gives
H(xi) ≈ (gi+1 - gi) / (xi+1 - xi)
Eq. 234
Eq. 234 is Hi pi = qi, which is also known as the quasi-Newton condition. pi = qi / Hi = Bi qi, where Bi = (Hi)-1 is the
inverse of the Hessian.
inverse of Hessian. We seek a rank one update formula with the form of
B
i+1
= B +uv
Eq. 235
where u and v are vectors, which need further constraints. Enforcing quasi-Newton condition first, it can be
shown that the second term is
i
i i
(p B q ) v
u v = ------------------------------------i
vp
Eq. 236
Bi+1 in Eq. 235 satisfy quasi-Newton condition but is not symmetrical. We can symmetrized it as (denoted with
superscript s)
(Bi+1)s = (Bi+1 + (Bi+1)T) / 2
Eq. 237
However, (Bi+1)s may not satisfy quasi-Newton condition. We can use the above two steps (1) quasi-Newton
condition, and (2) symmetrization repeatedly to yield a sequence of Bi+1. The limit of the sequence gives the
updating formula which is known as the Davidon-Fletcher-Powell (DFP) method
i
i i
i i
(B q ) (B q )
i p p
i+1
- ---------------------------------------B DFP = B + ----------------i
i
i
i i
p p
q (B q )
Eq. 238
This is a rank-two update formula. If the update is performed on the Hessian H itself instead of its inverse B, we
have a complementary formula by substituting B for H, and p for q and vice versa. Then, taking inverse of this
expression gives the alternative Broyden-Fletcher-Goldfarb-Shanno (BFGS) updating formula
i
q ( B q ) p p p ( B q ) + ( B q ) p
i+1
- ------------------ ---------------------------------------------------------------B BFGS = B i + 1 + -------------------------i
i
i
i
i
q pi p p
q p
i
i i
i i
i i
Eq. 239
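The BFGS update of Eq. 239 can be written out in plain C++ for n = 2; an illustrative sketch, not the library's operator % tensor-product code. The quasi-Newton condition B q = p serves as a check:

```cpp
#include <cassert>
#include <cmath>

// BFGS inverse-Hessian update (Eq. 2-39) for n = 2, with plain arrays:
// B += (1 + q.Bq/(q.p)) * (p p^T)/(p.q) - (p (Bq)^T + (Bq) p^T)/(q.p).
void bfgs_update(double B[2][2], const double p[2], const double q[2]) {
    double Bq[2] = { B[0][0]*q[0] + B[0][1]*q[1],
                     B[1][0]*q[0] + B[1][1]*q[1] };
    double qp  = q[0]*p[0] + q[1]*p[1];
    double qBq = q[0]*Bq[0] + q[1]*Bq[1];
    double c   = (1.0 + qBq/qp) / qp;         // scalar on the p p^T term
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            B[i][j] += c*p[i]*p[j] - (p[i]*Bq[j] + Bq[i]*p[j]) / qp;
}
```

After the update the new B satisfies the quasi-Newton condition B q = p and remains symmetric.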
Program Listing 212 implements the quasi-Newton method. The basic steps are1
Step 1: search direction — compute di = - Bi gi,
Step 2: loop — partial quasi-Newton method, loop over the n dimensions
a: line search — xi+1 = xi + αi di to minimize f(xi+1); we get xi+1, pi = αi di, and qi = gi+1 - gi
1. p. 265-267 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
b: update the approximated inverse Hessian with
DFP: Bi+1DFP = Bi + pi (pi)T / ((pi)T qi) - (Bi qi)(Bi qi)T / ((qi)T Bi qi) (Eq. 238), or
BFGS: Bi+1BFGS = Bi + [1 + (qi)T Bi qi / ((qi)T pi)] pi (pi)T / ((pi)T qi) - [pi (Bi qi)T + (Bi qi)(pi)T] / ((qi)T pi) (Eq. 239)
Step 3: restart — repeat Steps 1 and 2, and reset Bi.
There are two ways to think about the initial B. The first is that B may not be available at all, so initially B is set to the
identity matrix. The second is that the computation of B is very expensive, so B is only computed at the initial step of
every restart. In between, the quasi-Newton method takes over without needing the second derivative information. This method is popular in the finite element method, in which the formation of the global stiffness matrix and its
solution is equivalent to computing the inverse of the Hessian, B. The result of the BFGS computation is shown in Figure
2.14. It takes 34 iterations to arrive at the solution point (1, 1). In Chapter 5 we show an example of an elastoplastic finite element problem implemented with the BFGS method.
Figure 2.14 Searching path of the BFGS method. The solution point (1, 1) is arrived at after
34 iterations.
#include "include/vs.h"
int main() {
double v[2] = {-1.2, 1.0};
const double EPSILON = 1.e-12;
const int MAX_NO_OF_ITERATION = 100;
int k = 0;
C0 dx;
C1 X(2, v);
C2 x(2, v);
do {
((C0)x) = ((C0)X);
C2 F = 100.0*(x[1]-x[0].pow(2)).pow(2)+(1.0-x[0]).pow(2);
C0 B = dd(F).inverse();
for(int i = 0; i < 2; i++) {
C1 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g = d(f), d = B*(-g);
C2 alpha(1.0), x0, x1;
C0 d_alpha;
do {
x0 &= ((C0)X)[0] + alpha * d[0];
x1 &= ((C0)X)[1] + alpha * d[1];
C2 phi = 100.0*(x1-x0.pow(2)).pow(2)+(1.0-x0).pow(2);
if(fabs((double)dd(phi)) > EPSILON) d_alpha &= -d(phi) / dd(phi);
else break;
((C0)alpha) += d_alpha;
} while(((double)norm(d_alpha)) > 1.e-8);
dx &= ((C0)alpha)*d;
((C0)X) += dx; // update the solution
if(((double)norm(dx)) > EPSILON) {
C1 f1 = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g1 = d(f1), p = dx, q = g1 - g, Bq = B * q;
B += (1.0+(q*Bq)/(q*p))*((p%p)/(p*q)) - (p%Bq+Bq%p)/(q*p);
}
cout << "solution(" << (++k) << "): " << ((C0)X) << endl;
}
} while(k < MAX_NO_OF_ITERATION && ((double)norm(dx)) > EPSILON);
cout << "Final solution: " << ((C0)X) << endl;
}
x = {1, 1}T
Listing 212 Minimization of Rosenbrock's function using the BFGS method (project: quasi_newton_bfgs).
Eq. 240
1. p. 347 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
#include "include/vs.h"
class Basic_Set {
C1 *_A, *_c, *_X;
int row_size, col_size, *_basic_order;
public:
Basic_Set(C1&, C1&, C1&);
~Basic_Set() { delete [] _basic_order; }
int basic_order(int i) {return _basic_order[i];}
void swap(int i, int j);
C0& X();
friend C0& dC(Basic_Set&);
friend C0& df(Basic_Set&);
};
C0& Basic_Set::X() { return (*(_X)).F(); }
C0& dC(Basic_Set& a) { return d(*(a._A)); }
C0& df(Basic_Set& a) { return d(*(a._c)); }
Basic_Set::Basic_Set(C1& C, C1& f, C1& X) {
row_size = C.row_length();
col_size = C.col_length();
_basic_order = new int[col_size];
for(int i = 0; i < col_size; i++) _basic_order[i] = i;
_A = &C; _c = &f; _X = &X;
}
void Basic_Set::swap(int i, int j) {
int old_basic_order = _basic_order[i];
_basic_order[i] = _basic_order[j]; _basic_order[j] = old_basic_order;
C0 old_Ai(row_size, (double*)0);
old_Ai = d(*_A)(i); d(*_A)(i) = d(*_A)(j); d(*_A)(j) = old_Ai;
C0 old_ci(0.0); old_ci = d(*_c)[i];
d(*_c)[i] = d(*_c)[j]; d(*_c)[j] = old_ci;
C0 old_Xi(0.0); old_Xi = ((C0)(*_X))[i];
((C0)(*_X))[i] = ((C0)(*_X))[j]; ((C0)(*_X))[j] = old_Xi;
}
variables
constraint coefficients
objective functional coefficients
swap order
swap columns
swap objective functional coefficients
swap variables
Listing 213 class Basic_Set data abstraction for nonlinear problem (project: reduced_gradient).
Program Listing 214 implements the reduced gradient method using class Basic_Set in Program Listing
213. The basic steps are
Step 1: reduced gradient — rT = cD - cB B-1 D,
Step 2: ΔxD — if ri < 0 or xDi > 0, ΔxDi = -ri, else ΔxDi = 0
Step 3: ΔxB — if ΔxD = 0, the current solution is optimal; else ΔxB = - B-1 D ΔxD
Step 4: feasible bounds — max {αB: xB + αB ΔxB ≥ 0}, and max {αD: xD + αD ΔxD ≥ 0}
Step 5: line search — min {f(x + αΔx): 0 ≤ α ≤ αB, 0 ≤ α ≤ αD}, update with xi+1 = xi + αΔx
Step 6: swap — if α ≥ αB, swap the vanishing xB with the vanishing xD, and the corresponding columns of [B, D] and [cB, cD].
The difference from the linear programming version is now evident: the fundamental theorem of linear programming is no longer applicable; i.e., the extremum value may occur in the middle of an edge or even in the
interior of the feasible region.
#include "include/vs.h"
int main() {
double rhs[2] = {7.0, 6.0}, v[4] = {2.0, 2.0, 1.0, 0.0}, norm_dxd,
EPSILON = 1.e-12, HUGE = 1.e20, RELAXED = 1.e3;
int k = 0, MAX_NO_OF_ITER = 10;
C1 X(4, v), C = VECTOR_OF_TANGENT_BUNDLE("int, int", 2, 4);
C[0] = 2*X[0]+ X[1]+ X[2]+4*X[3];
C[1] = X[0]+ X[1]+2*X[2] + X[3];
C1 f = X[0].pow(2)+X[1].pow(2)+X[2].pow(2)+X[3].pow(2)-2*X[0]-3*X[3];
Basic_Set BS(C, f, X);
C0 B(2, 2, dC(BS), 0, 0), D(2, 2, dC(BS), 0, 2), c_B(2, df(BS), 0), c_D(2, df(BS), 2);
C0 X_B(2, BS.X(), 0), X_D(2, BS.X(), 2), b(2, rhs), x(4, (double*)0);
do {
C0 d_X_B(2, (double*)0),d_X_D(2, (double*)0),
B_inv = B.inverse(), r_D = c_D - c_B * B_inv * D;
for(int i = 0; i < 2; i++)
if((double) r_D[i] < -EPSILON || (double) X_D[i] > EPSILON) d_X_D[i] = - r_D[i];
else d_X_D[i] = 0.0;
if((norm_dxd = norm(d_X_D)) > RELAXED*EPSILON) {
d_X_B = - B_inv * D * d_X_D;
double alpha_B=HUGE,alpha_D=HUGE,ratio_B,ratio_D;int min_B=-1,min_D=-1;
for(int i = 0; i < 2; i++) if((double)d_X_B[i] < EPSILON) {
ratio_B = (double) - X_B[i]/d_X_B[i];
if(ratio_B < alpha_B) { alpha_B = ratio_B; min_B = i; }
}
for(int i = 0; i < 2; i++) if((double)d_X_D[i] < EPSILON) {
ratio_D = (double) - X_D[i]/d_X_D[i];
if(ratio_D < alpha_D) { alpha_D = ratio_D; min_D = i; }
}
C0 d_X = d_X_B & d_X_D, d_alpha;
C2 alpha(0.0), x[4];
do {
for(int i = 0; i < 4; i++)
x[i] = ((C0)X)[BS.basic_order(i)] + alpha * d_X[BS.basic_order(i)];
C2 phi = x[0].pow(2)+x[1].pow(2)+x[2].pow(2)+x[3].pow(2)-2*x[0]-3*x[3];
if(fabs((double)dd(phi)) > EPSILON) d_alpha &= -d(phi) / dd(phi);
else break;
((C0)alpha) += d_alpha;
} while((double)norm(d_alpha) > RELAXED*EPSILON);
if((double)(C0) alpha >= alpha_B) {
(C0) alpha = alpha_B;
BS.swap(min_B, min_D+2);
}
if((double)(C0) alpha > alpha_D) (C0) alpha = alpha_D;
d_X *= ((C0)alpha);
((C0)X) += d_X;
}
f = X[BS.basic_order(0)].pow(2)+X[BS.basic_order(1)].pow(2)+
X[BS.basic_order(2)].pow(2)+X[BS.basic_order(3)].pow(2)
-2*X[BS.basic_order(0)]-3*X[BS.basic_order(3)];
df(BS) = d(f);
} while(++k < MAX_NO_OF_ITER && norm_dxd > RELAXED*EPSILON);
for(int i = 0; i < 4; i++) x[i] = ((C0)X)[BS.basic_order(i)];
C0 fp = x[0].pow(2)+x[1].pow(2)+x[2].pow(2)+x[3].pow(2)-2*x[0]-3*x[3];
cout << "The final solution: " << x << endl << "f: " << fp << endl;
}
p = - ∇f - ∇AT λ
Eq. 241
where the second term - ∇AT λ is the component orthogonal to the tangent plane. In view of Eq. 241, p vanishes
when the left-hand side equals zero; i.e., the first-order condition of an extremum point is satisfied. Since the
search direction p on the tangent plane is orthogonal to the gradient of the constraint equations ∇A, we have the
orthogonal relationship ∇A p = 0. Pre-multiplying Eq. 241 with ∇A and solving for λ,1
λ = - (∇A ∇AT)-1 ∇A ∇f
Eq. 242
p = - [I - ∇AT (∇A ∇AT)-1 ∇A] g = - P g,
where P = [I - ∇AT (∇A ∇AT)-1 ∇A] is the projection matrix which projects the negative gradient - g onto the tangent plane to define the search direction p.
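Eq. 242 specialized to a single active constraint can be sketched in plain C++ (the name project_direction and the 2-component arrays are illustrative assumptions):

```cpp
#include <cassert>
#include <cmath>

// Projection of -g onto the tangent plane of one linear constraint whose
// gradient is a (a row of grad-A): P = I - a a^T/(a.a), and p = -P g.
// The result satisfies a.p = 0, i.e. p lies in the tangent plane.
void project_direction(const double a[2], const double g[2], double p[2]) {
    double aa = a[0]*a[0] + a[1]*a[1];
    double ag = a[0]*g[0] + a[1]*g[1];
    p[0] = -(g[0] - a[0]*ag/aa);
    p[1] = -(g[1] - a[1]*ag/aa);
}
```

The orthogonality check a·p = 0 mirrors the relationship ∇A p = 0 used in the derivation.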
Active_Set class has already been shown in Program Listing 23. Program Listing 215 implements the
gradient projection method. The basic steps are2
Figure 2.15 Projecting the negative gradient -g = -∇f onto the tangent plane gives the search direction p = -∇f - ∇AT λ.
1. p. 330-331 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
2. modified from p. 332-333 in D.G. Luenberger, 1989, same as the above.
1. p. 426-427 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
line search — min {f(x + αp): 0 ≤ α ≤ αi}
updates — xk+1 = xk + αp, C(xk+1), ∇f(xk+1)
Step 4: if p = 0, check the Kuhn-Tucker condition (Eq. 214): λi ≥ 0, Ai ≤ 0, and λi Ai = 0;
until λi ≥ 0 for all Ai = 0, drop the constraint corresponding to the most negative λi.
Figure 2.16 The feasible region with corner points (0, 0), (3, 0), (3, 1), and (1.5, 2.5), bounded in part by x1 + x2 = 4.
f(x) = ½ xT H x + gT x
subject to
A(x) = Ax - b = 0
where H = f,xx is the Hessian matrix, and g = f,x is the gradient vector. The Lagrangian functional using the
Lagrange multiplier method, such as Eq. 211 on page 118, is
l(x, λ) = f(x) + λT A(x) = ½ xT H x + gT x + λT (Ax - b)
Eq. 243
The Euler-Lagrange equations give the first-order optimality conditions of the Lagrangian functional with respect
to x and λ:
l,x(x, λ) = Hx + AT λ + g = 0
l,λ(x, λ) = Ax - b = 0
Eq. 244
The second-order optimality condition requires the Hessian matrix H to be positive definite, which is always true for a
quadratic functional. An incremental version with xi+1 = xi + Δx can be substituted into f(x) and A(x). One can view
the expression f(xi + Δx) ≈ f(xi) + gT Δx + ½ (Δx)T H Δx as an approximation using a second-order Taylor expansion of the objective functional, with the current active constraint equations as
A(xi+1) = A(xi + Δx) ≈ A(xi) + ∇A Δx = A(xi) + A Δx = 0
With these relations, the Euler-Lagrange equations can be re-written in matrix form with the incremental solution, Δx, as
[ H  AT ] [ Δx ]      [ ∇f(xi) ]
[ A  0  ] [ λ  ]  = - [ A(xi)  ]
Eq. 245
Using the first equation in Eq. 245, we get Δx = - H-1 (∇f(xi) + AT λ). Notice that we have relied on the symmetric
positive definiteness of H for its inverse. Substituting this back to eliminate Δx in the second equation
gives: - A H-1 (∇f(xi) + AT λ) = - A(xi). Solving this equation for λ gives
λ = (A H-1 AT)-1 [A(xi) - A H-1 ∇f(xi)]
Eq. 246
Substituting λ from Eq. 246 back into the first equation gives
Δx = - H-1 (∇f(xi) + AT λ)
Eq. 247
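Eq. 246 and Eq. 247 can be sketched in plain C++ for a 2-variable problem with one equality constraint; the name qp_step and the diagonal-H simplification (so that H-1 is trivial) are assumptions for illustration:

```cpp
#include <cassert>
#include <cmath>

// One Lagrange-method step for min 1/2 x^T H x + g^T x s.t. A x = b, with
// diagonal H and a single constraint row A:
//   lambda = (A H^-1 A^T)^-1 [A(x) - A H^-1 grad-f]   (Eq. 2-46)
//   dx     = -H^-1 (grad-f + A^T lambda)              (Eq. 2-47)
void qp_step(const double Hdiag[2], const double A[2], double Ax_minus_b,
             const double grad[2], double dx[2]) {
    double Hg[2]  = { grad[0]/Hdiag[0], grad[1]/Hdiag[1] };   // H^-1 grad-f
    double AHA    = A[0]*A[0]/Hdiag[0] + A[1]*A[1]/Hdiag[1];  // A H^-1 A^T
    double lambda = (Ax_minus_b - (A[0]*Hg[0] + A[1]*Hg[1])) / AHA;
    dx[0] = -(grad[0] + A[0]*lambda) / Hdiag[0];
    dx[1] = -(grad[1] + A[1]*lambda) / Hdiag[1];
}
```

For the quadratic f = ½(x1² + x2²) with x1 + x2 = 2, a single step from the origin lands exactly on the constrained minimum (1, 1), as expected for a quadratic objective.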
In summary, we compare the Lagrange method to the previous methods. One can view Eq. 241 and Eq. 242
for the gradient projection method as simplified approximations of Eq. 246 and Eq. 247 for the Lagrange
method. First, in the gradient projection method we use only first derivative information. If we set H-1 = I in
Eq. 247 we get Δx = - ∇f(xi) - AT λ, which is Eq. 241 (with p = Δx). Secondly, with H-1 = I in mind, the gradient
projection method projects the gradient on the tangent plane of the constraint surface A Δx = 0 instead of
the approximated constraint surface A Δx = - A(xi) in the Lagrange method. Plugging the tangent plane A Δx
= 0 and H-1 = I into Eq. 245, Eq. 246 becomes λ = - (A AT)-1 A ∇f, which is exactly Eq. 242. Therefore,
in the Lagrange method, we can use the gradient projection Eq. 242 in place of Eq. 246. The procedure of using such
an approximated Lagrange multiplier in the Lagrange method is an example of what is known as the multiplier update method.
The class Active_Set needs some modification (see Program Listing 216). In the previous examples, in linear programming and the gradient projection method, class Active_Set only needs to store and update the tangent plane (of the constraint surface) information. In the Lagrange method, besides the tangent plane information, the information on the constraint surface itself, specifically A(xi), also needs to be stored and updated; i.e., A Δx = -A(xi) instead of A Δx = 0.
#include "include/vs.h"
class Active_Set {
C1 _A, &Constraint;
int n_equality, size_c, n_active, *_active_state;
public:
Active_Set(C1& C, int n = 0);
~Active_Set() { delete [] _active_state; }
int active_state(int i) { return _active_state[i];}
int active_no() { return n_active; }
operator C0() { return ((C0)_A); }
void activate();
void deactivate(int i, int k = -2);
friend C0& d(Active_Set&);
};
Active_Set::Active_Set(C1& C, int n) : Constraint(C) {
n_equality = n;
size_c = Constraint.row_length();
_active_state = new int[size_c];
for(int i = 0; i < size_c; i++) _active_state[i] = -1;
}
void Active_Set::activate() {
n_active = 0;
for(int i = 0; i < n_equality; i++) _active_state[i] = n_active++;
for(int i = n_equality; i < size_c; i++)
if((double) ((C0)Constraint)[i] > -1.e-10 &&_active_state[i] >= -1)
_active_state[i] = n_active++;
if(n_active > 0) {
_A &=VECTOR_OF_TANGENT_BUNDLE("int, int", n_active, Constraint.col_length());
for(int i = 0; i < size_c; i++) if(_active_state[i] >= 0) _A[_active_state[i]] = Constraint[i];
}
}
void Active_Set::deactivate(int i, int k) {
for(int j = 0; j < size_c; j++) if(_active_state[j] == i) { _active_state[j] = k; break; } }
C0& d(Active_Set& a) { return d(a._A); }
(margin notes: class Active_Set stores the active constraints A and the full constraint set C, as well as A(xi) and ∇A; activate() initializes the active set and updates A from C; deactivate() drops a constraint.)
Listing 216 class Active_Set data abstraction for both Lagrange method and gradient projection method
(project: lagrangian_and_gradient_projection).
Figure 2.17 Lagrange method with active set method on a constrained quadratic functional. (The feasible region is bounded by x1 + x2 = 4, x1 ≥ 0, and x2 ≥ 0; the iterates visit (0, 0), (3, 0), and (2.6667, 1.3333) before reaching the solution (1.5, 2.5); the unconstrained minimum (2, 4) lies outside the feasible region.)
#include "include/vs.h"
int main() {
double v[2] = {0.0, 0.0}, EPSILON = 1.e-12; int ALL_POSITIVE = TRUE, lambda_flag;
C1 X(2,v), C = VECTOR_OF_TANGENT_BUNDLE("int, int", 3, 2);
#if defined(__LAGRANGE)
C2 X2(2, v), f = 2*X2[0].pow(2)+X2[0]*X2[1]+X2[1].pow(2)-12*X2[0]-10*X2[1];
#else
C1 f = 2*X[0].pow(2)+X[0]*X[1]+X[1].pow(2)-12*X[0]-10*X[1];
#endif
C[0] = X[0] + X[1] -4; C[1] = -X[0]; C[2] = - X[1];
Active_Set A(C);
for(;;) {
A.activate(); lambda_flag = !ALL_POSITIVE; C0 lambda, p;
if(A.active_no() == 0)
#if defined(__LAGRANGE)
p &= -d(f)/dd(f);
#else
p &= -d(f);
#endif
else {
#if defined(__LAGRANGE)
lambda &= (((C0)A)-d(A)*dd(f).inverse()*d(f))/(d(A)*dd(f).inverse()*(~d(A)));
p &= - dd(f).inverse()*(d(f)+(~d(A))*lambda);
#else
lambda &= - (d(A)*d(f)) / (d(A)*~d(A));
p &= -d(f)- d(A)*lambda;
#endif
}
if(fabs((double)norm(p)) > EPSILON) { double min_alpha = 1.e20;
for(int i = 0; i < 3; i++) if(A.active_state(i) <= -1) {
double alpha, temp = (double)(d(C)[i]*p);
if(fabs(temp) > EPSILON) alpha = -(double)(((C0)C)[i]/temp);
if(alpha < min_alpha && alpha > 0.0) min_alpha = alpha; } }
C0 d_alpha(0.0); C2 alpha(0.0), x0, x1, F;
do { x0 = ((C0)X[0]) + alpha * p[0]; x1 = ((C0)X[1]) + alpha * p[1];
F &= 2*x0.pow(2)+x0*x1+x1.pow(2)-12*x0-10*x1;
d_alpha = - d(F)/dd(F); ((C0)alpha) += d_alpha;
} while((double)norm(d_alpha) > EPSILON);
if((double)((C0)alpha) < min_alpha) min_alpha = (double)((C0)alpha);
C0 dx = min_alpha * p; ((C0)X) += dx;
((C0)C[0]) = ((C0)X[0]) + ((C0)X[1]) -4;
((C0)C[1]) = -((C0)X[0]); ((C0)C[2]) = -((C0)X[1]);
#if defined(__LAGRANGE)
((C0)X2) = ((C0)X) ;
f = 2*X2[0].pow(2)+X2[0]*X2[1]+X2[1].pow(2)-12*X2[0]-10*X2[1];
#else
f = 2*X[0].pow(2)+X[0]*X[1]+X[1].pow(2)-12*X[0]-10*X[1];
#endif
} else {
int i_cache = -1; double min_lambda = -EPSILON; lambda_flag = ALL_POSITIVE;
for(int i = 0; i < A.active_no(); i++)
if((double)lambda[i] < -EPSILON) { lambda_flag = !ALL_POSITIVE;
if((double)lambda[i] < min_lambda) { min_lambda = lambda[i]; i_cache = i; } }
if (lambda_flag) break; else A.deactivate(i_cache);
}
cout << ((C0)X) << endl;
} cout << "solution: " << ((C0)X) << endl;
(margin notes: line search min{f(x + αp): 0 ≤ α ≤ αi}; updates xk+1 = xk + αp, C(xk+1), f(xk+1); Step 4: if p = 0, check the Kuhn-Tucker condition, Eq. 214: λi ≥ 0, Ai ≤ 0, and λi Ai = 0; stop when λi ≥ 0 for all active constraints Ai = 0, otherwise drop the constraint corresponding to the most negative λi.)
Listing 217 Lagrange method and gradient projection method (project: lagrangian_and_gradient_projection).
From Eq. 241, p = -∇f - ATλ. The search direction p lies on the tangent plane of the constraint surface A; that is, p is orthogonal to the gradient of the constraint surface, ∇A (= A for linear constraints), as (see also Figure 2.15)
A p = 0.
Eq. 248
In other words, Eq. 248 expresses that the search direction p is in the null space of A. At the optimal condition, p = 0, the negative gradient of the objective functional, -∇f, is a linear combination of the range space of ∇A (= A); i.e., -∇f = ATλ.
Range Space Method: Recall Eq. 246 and Eq. 247 from the Lagrange method and consider the projection of the negative gradient -∇f on the tangent plane M = {y | Ay = 0} and the gradient of the constraint surface ∇A; we have
λ = -(AH-1AT)-1 AH-1∇f
p = -H-1 (∇f + ATλ)
Eq. 249
With Y an orthonormal basis for the range space of AT, the counterpart is
λ̄ = -(YTH-1Y)-1 YTH-1∇f
p = -H-1 (∇f + Yλ̄)
Eq. 250
On page 36 we discussed that the round-off error could accumulate in the multiplication of the normal form ATA in the least square problem, where the QR decomposition is used to control the condition number of the problem. In the first equation of Eq. 249, the condition number can increase by the multiplication operations in AH-1AT, and we may run into trouble when its inverse is taken. Since the columns of Y are orthonormal, in the first equation of Eq. 250, the condition number of YTH-1Y is as good as that of H-1. Therefore, the range space method with Eq. 250 is numerically superior to Eq. 249.
The range space method is implemented in Program Listing 218 for solving the same problem that the
reduced gradient method solved in Program Listing 214. The core steps are
1. see p. 183-184 in P.E. Gill, W. Murray, and M.H. Wright, 1981, Practical Optimization, Academic Press, Inc., San
Diego.
#include "include/vs.h"
int main() {
const double EPSILON = 1.e-12;
const double RELAXED = 1.e3;
const int MAX_NO_OF_ITERATION = 10;
double v[4] = {2.0, 2.0, 1.0, 0.0};
C2 X(4, v);
C2 C = VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE("int, int", 2, 4);
C[0] = 2*X[0]+ X[1]+ X[2]+4*X[3] - 7.0;
C[1] = X[0]+ X[1]+2*X[2]+ X[3] - 6.0;
C2 f = X[0].pow(2)+X[1].pow(2)+X[2].pow(2)+X[3].pow(2)-2*X[0]-3*X[3];
C0 p = VECTOR("int", 4);
C0 A = d(C), Q = QR(~A).Q();
C0 Y = MATRIX("int, int", 4, 2);
for(int i = 0; i < 2; i++) Y(i) = Q(i);
int k = 0;
do {
C0 H_inv = dd(f).inverse();
C0 lambda_bar = - ((~Y)*H_inv*Y).inverse() * (~Y) *H_inv* d(f);
p = - H_inv*(Y*lambda_bar+d(f));
((C0)X) += p;
f = X[0].pow(2)+X[1].pow(2)+X[2].pow(2)+X[3].pow(2)-2*X[0]-3*X[3];
++k;
cout << "solution{" << k << "): " << ((C0)X) << endl << "f: " << ((C0)f) << endl;
} while(k < MAX_NO_OF_ITERATION && (double)norm(p) > RELAXED*EPSILON);
cout << "The final solution: " << ((C0)X) << endl << "f: " << ((C0)f) << endl;
}
(margin notes: AT = QR; Y has columns in the range space of Q; λ̄ = -(YTH-1Y)-1 YTH-1∇f; p = -H-1(∇f + Yλ̄).)
Null Space Method: Alternatively, the search direction can be restricted to the tangent plane as
p = Z pZ
Eq. 251
where pZ is a vector of size n-m and the columns of Z span the null space of A. Taylor expansion to second-order such as Eq. 26 on page 109 is applied with xi+1 = xi + p.
Substituting Eq. 251 into the increment p of the last equation yields
f(xi + p) = f(xi + ZpZ) ≈ f(xi) + f,x(xi) ZpZ + (1/2) pZT ZT H(xi) ZpZ
Eq. 252
Denoting the projected Hessian as Hz = ZTHZ, and the projected gradient as (∇f)z = ZT∇f, we recover the classic Newton method on the null space as
pz = -(∇f)z / Hz
Eq. 253
Therefore, the meaning of the n-m vector pz is clear. From the first equation of Eq. 245 we have Hp + ATλ + ∇f = 0. Substituting Eq. 253 into this equation, we can solve for the Lagrange multiplier if necessary (such as for inequality constrained problems, where the value of λ is needed for the active set method)
λ = -(AAT)-1A(Hp + ∇f)
Eq. 254
Program Listing 219 implemented the null space method. The kernel steps are simple:
Step 1: null space, QR decomposition and form Z
Step 2: search direction, p = -Z(ZTHZ)-1 ZT∇f
In retrospect, we should discuss the counterpart (dual) of the projected Hessian and projected gradient of the null space, Hz = ZTHZ and (∇f)z = ZT∇f, respectively. Assume x* is a local solution of the primal problem: minimize f(x) subject to A(x) = 0. The Lagrangian functional from Eq. 211 can be re-written for the dual problem as
l(x*, λ) = φ(x*(λ), λ) = φ(λ) = f(x*(λ)) + λT A(x*(λ))
Eq. 255
The first-order derivative of φ(λ) is
∇φ(λ) = [∇f(x(λ)) + λT ∇A(x(λ))] ∇x(λ) + A(x(λ))
Eq. 256
and the second-order derivative is
∇2φ(λ) = ∇A(x(λ)) ∇x(λ)
Eq. 257
#include "include/vs.h"
int main() {
const double EPSILON = 1.e-12;
const double RELAXED = 1.e3;
const int MAX_NO_OF_ITERATION = 10;
double v[4] = {2.0, 2.0, 1.0, 0.0};
C2 X(4, v);
C2 C = VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE("int, int", 2, 4);
C[0] = 2*X[0]+ X[1]+ X[2]+4*X[3] - 7.0;
C[1] = X[0]+ X[1]+2*X[2]+ X[3] - 6.0;
C2 f = X[0].pow(2)+X[1].pow(2)+X[2].pow(2)+X[3].pow(2)-2*X[0]-3*X[3];
C0 p = VECTOR("int", 4);
C0 A = d(C);
C0 Q = QR(~A).Q();
C0 Z = MATRIX("int, int", 4, 2);
for(int i = 0; i < 2; i++)Z(i) = Q(i+2);
int k = 0;
do {
p = Z * ((~Z)*dd(f)*Z).inverse() * (~Z) * -d(f);
((C0)X) += p;
f = X[0].pow(2)+X[1].pow(2)+X[2].pow(2)+X[3].pow(2)-2*X[0]-3*X[3];
++k;
cout<< "solution{" << k << "): " << ((C0)X) << endl << "f: " << ((C0)f) << endl;
} while(k < MAX_NO_OF_ITERATION && (double)norm(p) > RELAXED*EPSILON);
cout << "The final solution: " << ((C0)X) << endl << "f: " << ((C0)f) << endl;
}
(margin notes: AT = QR; Z takes the trailing columns of Q, which span the null space of A; p = -Z(ZTHZ)-1 ZT∇f.)
Since x*(λ) minimizes the Lagrangian, the bracket in Eq. 256 vanishes: ∇f(x(λ)) + λT ∇A(x(λ)) = 0
Eq. 258
Therefore, we get
∇x(λ) = -H-1(x(λ), λ) ∇A(x(λ))T
Eq. 259
and, substituting into Eq. 256 and Eq. 257,
∇φ = A(x), and ∇2φ = -A H-1 AT
Eq. 260
The classic Newton method for the dual problem coincides with the first term in Eq. 246 of the Lagrange method. The Hessian of the dual, -A H-1 AT, governs the convergence rate of the dual problem.
It is helpful to point out that the existence and uniqueness of the constrained problem is known to be associated with the abstract form of a saddle-point problem1, where a saddle function is shown (see Figure 2.18).
1. p. 30 in M.M. Sewell, 1987, Maximum and minimum principles, Cambridge University Press, Cambridge, UK.
Penalty Methods
The penalty method transforms a constrained problem into an unconstrained problem by defining a penalty objective functional, for example, with a quadratic penalty term of active constraints as
minimize q(x) = f(x) + P(x) = f(x) + (ρ/2) A(x)T A(x)
Eq. 261
where ρ is the penalty parameter and the second term is designed to penalize the objective functional when the constraints are violated. The steps of the penalty method are to minimize q(x) for an increasing sequence of penalty parameters ρ, taking each solution as the starting point of the next minimization.
The final solution is the limiting point xρ, with ρ → ∞, although when ρ is too big the problem becomes ill-conditioned. The advantage of the penalty method lies in the simplicity of Eq. 261. No advanced concept needs to be introduced. The disadvantage is that we are left with an incremental procedure in which the problem needs to be solved many times with an empirical sequence of ρ. We emphasize that it is necessary to start with a smaller ρ, and then increase it subsequently. If we ignore the need for an incremental procedure and compute with only one big ρ, the solution can be completely different. Starting with too big a penalty parameter, the solution satisfies the constraints overwhelmingly with no concern for the minimization of the objective functional. However, in many engineering applications some magic ρ is often recommended for the application domain. The use of this magic ρ is an art rather than a science.
In view of the shape of a saddle function shown in Figure 2.18, it is obvious why the quadratic form of the penalty function P(x) = (ρ/2) || A(x) ||2 is the most popular one, where P,xx is positive semi-definite; i.e., the penalty term convexifies the primal (x-variables). Comparing Eq. 210, ∇f + λT ∇A = 0 (the first-order condition), with that of the penalty objective functional
∇q(x) = ∇f(x) + ρ A(x)T ∇A(x) = 0,
we see
λ = ρ A(x)
Eq. 262
This can be used as the updating formula λi+1 = λi + ρ A(x) for the simplest form of the multiplier update method discussed on page 146. This updating formula will be used for the augmented Lagrangian method introduced later.
Now consider a specific example we have been solving in previous sections
minimize f(x1, x2) = 2x12 + x1x2 + x22 - 12x1 - 10x2
subject to x1 + x2 ≤ 4
-x1 ≤ 0
-x2 ≤ 0
For simplicity we shall drop the inequality constrained part of the problem and consider only the equality constraint
x1 + x2 = 4
Assume that we are at the final constraint set of the active set method. We use x = (2, 2), which is clearly on the constraint line, and use the penalty method to search for the final solution (1.5, 2.5). The penalty objective functional q(x) is defined as
q(x1, x2) = 2x12 + x1x2 + x22 - 12x1 - 10x2 + (ρ/2) (A(x1, x2)T A(x1, x2))
These two equations are the kernel of the penalty method, and the original constrained problem has been transformed completely into a new unconstrained problem. Program Listing 220 implemented this simplified problem. This coding should be embedded into the active set method for a more general inequality constrained problem.
#include "include/vs.h"
int main() {
const int DOF = 2; const int MAX_NO_OF_ITERATION = 20;
const double EPSILON = 1.e-12; double x[DOF] = {2.0, 2.0};
C2 q, A = TANGENT_OF_TANGENT_BUNDLE("int", DOF), X(DOF,x);
double rho = 1.0, delta_X;
C0 d_x, X_cache = VECTOR("int", DOF);
int k0 = 0;
do {
rho *= 10.0;
int k1 = 0;
do {
A = X[0] + X[1] -4;
q &= 2*X[0].pow(2)+X[0]*X[1]+X[1].pow(2)-12*X[0]-10*X[1]
+ (0.5*rho)*A.pow(2);
d_x &= -d(q) / dd(q);
((C0)X) += d_x;
} while ((double)norm(d_x) > EPSILON && ++k1 < 10);
cout << "solution(rho=" << rho << ", " << k1 << "): " << ((C0)X) << endl;
delta_X = norm(X_cache - ((C0)X));
X_cache = ((C0)X);
} while(++k0 < MAX_NO_OF_ITERATION && delta_X > 1.e6*EPSILON);
cout << "The Final solution: " << ((C0)X) << endl;
}
(margin notes: A(x1, x2) = x1 + x2 - 4; q(x1, x2) = 2x12 + x1x2 + x22 - 12x1 - 10x2 + (ρ/2)(A(x1, x2)T A(x1, x2)); dx = -q,x(xi) / q,xx(xi); x += dx; converges to x = {1.5, 2.5}T.)
Listing 220 Penalty method with a single equality constraint (project: penalty_one_constraint).
Since the penalty method has transformed an equality constrained problem into an unconstrained problem, the various unconstrained optimization methods in Section 2.3.2 are applicable to the penalty method. We apply the classic Newton method, the conjugate gradient method, and the combined Newton and steepest descent method to the penalty method in the following. Consider a less trivial problem with 10 variables and 4 equality constraints such as1
1. from p. 381 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
minimize f(x) = Σi=1..10 i xi2
subject to 1.5x1 + x2 + x3 + 0.5x4 + 0.5x5 = 5.5
2x6 - 0.5x7 - 0.5x8 + x9 - x10 = 2
x1 + x3 + x5 + x7 + x9 = 10
x2 + x4 + x6 + x8 + x10 = 15
Listing 221 The classic Newton version for penalty method (project: penalty_newton).
Program Listing 222 implemented the conjugate gradient method version of the penalty method for the same equality constrained problem in the above. Conjugate gradient uses line search along the search direction p as xi+1 = xi + αp, and its objective functional is redefined to be a one-parameter function in α as φ(xi+1(α)) = φ(α). The one-parameter line search uses Newton's formula dα = -dφ(α)/d2φ(α) to find the minimum of φ(α). The conjugate direction is computed using the Fletcher-Reeves formula with gi+1 = ∇q(xi+1)T, and βi = (gi+1)Tgi+1 / [(gi)Tgi]
Eq. 263
For the Newton method on M⊥, any movement on this subspace can be expressed as xi+1 = xi + ATu. Define such an incremental movement on M⊥ for the penalty objective functional
Eq. 264
Eq. 265
1. p. 282-284, and p. 384-387 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing
Company, Inc., Reading, MA.
#include "include/vs.h"
int main() {
const int DOF = 10; const int MAX_NO_OF_ITERATION = 10; int k0 = 0, k1 = 0;
const double EPSILON = 1.e-12; const double RELAXED = 1.e3;
double rho = 1.0, delta_X, x[DOF] = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
C1 f, q, X(DOF,x), A = VECTOR_OF_TANGENT_BUNDLE("int, int", 4, DOF);
C0 d_x, p, delta_q, X_cache(DOF, (double*)0);
do {
rho *= 10.0;
do {
for(int i = 0; i < 10; i++) {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4] -5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8]-10.0;
A[3]=X[1] +X[3] +X[5] +X[7] +X[9]-15.0;
f &=X[0].pow(2)+2*X[1].pow(2)+3*X[2].pow(2)+4*X[3].pow(2)+
5*X[4].pow(2)+6*X[5].pow(2)+7*X[6].pow(2)+8*X[7].pow(2)+
9*X[8].pow(2)+10*X[9].pow(2);
q &= f + (0.5 * rho) * A.pow(2);
C2 alpha(0.0), x[10], psi[4]; C0 d_alpha, g = d(q); int k2 = 0;
if(i == 0) p &= - g;
do {
for(int j = 0; j < 10; j++) x[j] &= ((C0)X)[j] + alpha * p[j];
psi[0]=1.5*x[0]+x[1]+x[2]+0.5*x[3]+0.5*x[4]-5.5;
psi[1]= 2.0*x[5]-0.5*x[6]-0.5*x[7]+x[8]-x[9]-2.0;
psi[2]= x[0] + x[2] + x[4] + x[6] + x[8] -10.0;
psi[3]= x[1] + x[3] + x[5] + x[7] + x[9] -15.0;
C2 phi = x[0].pow(2)+2*x[1].pow(2)+3*x[2].pow(2)+4*x[3].pow(2)+
5*x[4].pow(2)+6*x[5].pow(2)+7*x[6].pow(2)+8*x[7].pow(2)+
9*x[8].pow(2)+10*x[9].pow(2)+(0.5 * rho) * (psi[0].pow(2)+
psi[1].pow(2)+psi[2].pow(2)+psi[3].pow(2));
if((double)dd(phi) > EPSILON) d_alpha &= -d(phi) / dd(phi); else break;
((C0)alpha) += d_alpha;
} while(++k2 < MAX_NO_OF_ITERATION &&
(double)norm(d_alpha) > RELAXED*EPSILON);
d_x &= ((C0)alpha)*p; ((C0)X) += d_x;
if(i != 9 && (double)norm(d_x) > EPSILON) {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4]-5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8]-10.0;
A[3]=X[1] +X[3] +X[5] +X[7] +X[9]-15.0;
C1 f1 = X[0].pow(2)+2*X[1].pow(2)+3*X[2].pow(2)+4*X[3].pow(2) +
5*X[4].pow(2)+6*X[5].pow(2)+7*X[6].pow(2)+8*X[7].pow(2)+
9*X[8].pow(2)+10*X[9].pow(2);
C1 q1 = f1 + (0.5 * rho) * A.pow(2);
C0 g1 = d(q1), beta = g1.pow(2) / g.pow(2);
p = -g1 + beta * p; delta_q &= (C0)q - (C0)q1;
if((double)norm(delta_q) < RELAXED*EPSILON ||
(double)norm(d_x) < RELAXED*EPSILON) break;
}
}
} while (++k1 < MAX_NO_OF_ITERATION && (double)norm(d_x) >
RELAXED*EPSILON && (double)norm(delta_q)> RELAXED*EPSILON);
delta_X = norm(X_cache - ((C0)X)); X_cache = ((C0)X);
} while(++k0 < MAX_NO_OF_ITERATION && delta_X > RELAXED*EPSILON);
cout << "The Final solution: " << ((C0)X) << endl;
}
(margin notes: f(x) = Σi i xi2; α += dα; dx = αp; update xi+1 = xi + dx; norm = || dx ||; conjugate direction: gi+1 = ∇q(xi+1)T, βi = (gi+1)Tgi+1 / [(gi)Tgi], pi+1 = -gi+1 + βi pi.)
Listing 222 The conjugate gradient method for penalty formulation (project: penalty_conjugate_gradient).
Eq. 266
The kernel steps of the combined Newton and steepest descent customized for the penalty method are
Step 1: Newton method on M⊥,
search direction: p = -(1/ρ) ∇A(xi)T (∇A(xi)∇A(xi)T)-2 ∇A(xi) ∇q(xi)T
line search: minimize q(xi + αp), update ri = xi + αp
Step 2: steepest descent on M,
search direction: p = -∇q(ri)T
line search: minimize q(ri + αp), update xi+1 = ri + αp
We implemented these two steps with three C++ functions in Program Listing 223: (1) the line search is performed in both steps. We factor out this procedure and code it into a line_search() function, (2) Newton method
applied to M is implemented as newton_on_orthogonal_complement_of_tangent(), and (3) steepest descent
on M is implemented as steepest_descent_on_tangent(). Program Listing 224 implemented the main program
of the combined Newton and steepest descent method using the above three functions.
#include "include/vs.h"
void line_search(C2& X, C0& p, C2& alpha, double rho) {
const int DOF = 10; const double EPSILON = 1.e-12;
const double RELAXED = 1.e6; const int MAX_NO_OF_ITERATION = 10;
((C0)alpha) = 0.0;
C2 x[DOF], A[4]; C0 d_alpha; int k2 = 0;
do {
C2 phi;
for(int j = 0; j < 10; j++) x[j] &= ((C0)X)[j] + alpha * p[j];
A[0]=1.5*x[0]+x[1]+x[2]+0.5*x[3]+0.5*x[4] -5.5;
A[1]= 2.0*x[5]-0.5*x[6]-0.5*x[7]+x[8]-x[9]-2.0;
A[2]=x[0] +x[2] +x[4] +x[6] +x[8] -10.0;
A[3]=x[1] +x[3] +x[5] +x[7] +x[9]-15.0;
phi &= x[0].pow(2)+2*x[1].pow(2)+3*x[2].pow(2)+4*x[3].pow(2)+5*x[4].pow(2)+
6*x[5].pow(2)+7*x[6].pow(2)+8*x[7].pow(2)+9*x[8].pow(2)+10*x[9].pow(2)+
(0.5 * rho) *A.pow(2);
if((double)dd(phi) > EPSILON) d_alpha &= -d(phi) / dd(phi); else break;
((C0)alpha) += d_alpha;
} while(++k2 < MAX_NO_OF_ITERATION &&
(double)norm(d_alpha) > RELAXED*EPSILON);
}
void newton_on_orthogonal_complement_of_tangent(
C2& X, C2& f, C2& A, C2& q, C0& p, double rho) {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4] -5.5;
A[1]= 2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8] -10.0;
A[3]=X[1] +X[3] +X[5] +X[7] +X[9]-15.0;
f &= X[0].pow(2)+2*X[1].pow(2)+3*X[2].pow(2)+4*X[3].pow(2)+5*X[4].pow(2)+
6*X[5].pow(2)+7*X[6].pow(2)+8*X[7].pow(2)+9*X[8].pow(2)+10*X[9].pow(2);
q &= f + (0.5 * rho) * A.pow(2);
C0 grad_A = d(A), approx_Q_bar_inv;
approx_Q_bar_inv &= 1.0/rho * (grad_A*(~grad_A)).pow(2).inverse();
p &= (~grad_A) * approx_Q_bar_inv * grad_A * -d(q);
}
void steepest_descent_on_tangent(C2& X, C2& f, C2& A, C2& q, C0& p, double rho) {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4]-5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8]-10.0;
A[3]= X[1] + X[3] + X[5] + X[7] + X[9]-15.0;
f &= X[0].pow(2)+2*X[1].pow(2)+3*X[2].pow(2)+4*X[3].pow(2)+5*X[4].pow(2)+
6*X[5].pow(2)+7*X[6].pow(2)+8*X[7].pow(2)+9*X[8].pow(2)+10*X[9].pow(2);
q &= f + (0.5 * rho) * A.pow(2);
p &= -d(q);
}
(margin notes: f(x) = Σi i xi2; steepest descent on M: p = -∇q(ri).)
Listing 223 Three functions of the combined Newton and steepest descent version of penalty method (project:
penalty_combined_newton_and_steepest_descent).
Listing 224 The main program of the combined Newton and steepest descent version of penalty method
(project: penalty_combined_newton_and_steepest_descent).
lA(x, λ) = f(x) + λT A(x) + (ρ/2) A(x)T A(x)
Eq. 267
First, the obvious feature is that now we are working on an augmented space of {x, λ} with the dimension n+m. Second, from the dual viewpoint, comparing Eq. 267 with the Lagrangian functional of the Lagrange method on page 145, Eq. 267 has an extra penalty term (ρ/2) A(x)T A(x). In view of the existence and uniqueness (strong) condition of the saddle function shown in Figure 2.18, this quadratic penalty term in x convexifies the primal (x-variables) of the Lagrangian functional l(x, λ). Third, from the penalty viewpoint, Eq. 267 without the middle term is the penalty objective functional (q(x) in Eq. 261). The penalty method is always plagued by being inconsistent with the minimum of the Lagrangian functional l(x, λ). We can easily show this by considering that the first-order conditions of the constrained problem (l,x) and the unconstrained problem (∇q) are not equal
l,x = ∇f(x) + λT ∇A(x) ≠ ∇q = ∇f(x) + ρ A(x)T ∇A(x)
unless the special condition λ = ρ A(x), Eq. 262, is met. On the other hand, the first derivative of the augmented Lagrangian functional (lA) is
lA,x = ∇f(x) + λT ∇A(x) + ρ A(x)T ∇A(x) = 0
Therefore, the middle term λT A(x) in Eq. 267 makes the penalty method consistent with the first-order condition of the Lagrange method. The algorithm of the augmented Lagrangian method can be implemented with a nested double loop. The outer loop is the penalty method, in which we need to increase the penalty parameter ρ from a smaller number to a considerably greater number until the solution converges. The inner loop is to update the Lagrange multiplier, in view of λ = ρ A(x), as
λi+1 = λi + ρ A(xi)
Eq. 268
in the hope that A(xi+1) = 0 can be achieved by this update. Program Listing 225 implemented the augmented Lagrangian method.
(margin notes: λi+1 = λi + ρ A(xi); lA = f(x) + λT A(x) + (ρ/2) A(x)T A(x); dx = -lA,x / lA,xx.)
The perturbed Lagrangian functional introduces a small perturbation parameter ε as
lp(x, λ) = f(x) + λT A(x) - (ε/2) λT λ
Eq. 269
The solution is achieved at the limit ε → 0. The gradient of the perturbed Lagrangian functional (the Euler-Lagrange equations) is
lp,x = ∇f(x) + λT ∇A(x) = 0
lp,λ = A(x) - ελ = 0
Eq. 270
It is consistent with the first-order condition of the Lagrange method (at the limit of ε → 0). For a quadratic programming problem
minimize f(x) = (1/2) xT H x + gT x
subject to A(x) = Ax - b = 0
Eq. 270 can be re-written in matrix form with the incremental solution Δx as

[ H  AT  ] [ Δx ]     [ ∇f(xi) ]
[ A  -εI ] [ λ  ] = - [ A(xi)  ]
Eq. 271
Since the left-hand-side matrix is symmetrical, we can use the modified Cholesky decomposition to solve Eq. 271. Or, recall from Eq. 246 that λ = (AH-1AT)-1 [A(xi) - AH-1∇f(xi)]. Set B = AH-1AT and apply the modified Cholesky decomposition to B. Not only is the implementation in Program Listing 226 extremely simple, but the computing speed is also lightning fast.
#include "include/vs.h"
int main() {
const int DOF = 10; const int MAX_NO_OF_ITERATION = 20;
const double EPSILON = 1.e-12; const double RELAXED = 1.e6;
double x[DOF] = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
int k = 0;
C2 f, X(DOF,x),
A = VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE("int, int", 4, DOF);
C0 d_x;
do {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4]-5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0]+X[2]+X[4]+X[6]+X[8]-10.0;
A[3]=X[1]+X[3]+X[5]+X[7]+X[9]-15.0;
f &= X[0].pow(2)+2*X[1].pow(2)+3*X[2].pow(2)+4*X[3].pow(2)+5*X[4].pow(2)+
6*X[5].pow(2)+7*X[6].pow(2)+8*X[7].pow(2)+9*X[8].pow(2)+10*X[9].pow(2);
C0 H_inv = dd(f).inverse();
Cholesky AHAt_inv( (d(A)*H_inv*(~d(A)) ), EPSILON);
C0 lambda = AHAt_inv * ( ((C0)A) - d(A)*(H_inv*d(f)) );
d_x &= H_inv*-( (~d(A))*lambda + d(f) );
((C0)X) += d_x;
cout << "solution(" << (++k) << "): "<<((C0)X) << ", objective functional: " << ((C0)f)
<< endl;
} while( k < MAX_NO_OF_ITERATION && (double)norm(d_x) > RELAXED*EPSILON );
cout << "The final solution: " << ((C0)X) << endl;
}