Op Tim Ization

CHAPTER
Two
Numerical Optimization
Using C1 and C2 Type
Objects
C1 and C2 spaces are continuous vector spaces which are differentiable up to order 1 and order 2, respectively. Classes in C1 and C2 types enable the users of VectorSpace C++ Library to deal with numerically differentiable objects. The applications in the subject of numerical optimization, constrained or unconstrained, can be
easily expressed with C1 and C2 types. C++ programs using VectorSpace C++ Library in this chapter are
projects in project workspace file Cn.dsw under directory vs\ex\Cn.
2.1 C1 Type Objects

VectorSpace C++ Library supports two classes, modeled after concept in differential geometry1,
Tangent_Bundle and Vector_of_Tangent_Bundle, in C1 type. A tangent bundle has a tangent vector w attached
to a base point p on a body; i.e., p , where a body m, and w n, with m n. We denote the tangent
bundle as a pair of a base point and a tangent vector (p, w) defined in T n. We represent the tangent bundle graphically in Figure 2.1. In VectorSpace C++ Library, the class name Tangent_Bundle is reserved for a
scalar tangent bundle where the base point has dimension of 1 (p ), and w n. A more general case of
tangent bundle C++ class with m, p , and w n; i.e., the base point p has the dimension of m
(m n) is called Vector_of_Tangent_Bundle (notice that may have lower dimension than its containing
space = n, for example, a rod (1-D; m =1), a plate (2-D; m = 2), or a shell (2-D; m = 2) in a three dimensional
physical space 3; n = 3.) Note that for more generalized spatial and body dimension (in case for including
1. e.g., p.90 in W.L. Burke, 1985, Applied differential geometry, Cambridge University Press, Cambridge, U.K.
Workbook of Applications in VectorSpace C++ Library 95
Chapter
2 Numerical Optimization Using C1 and C2 Type Objects
dimensions greater than 3), a generalized containing space

body in physical space 3.
is said to contain a manifold , in place of a
Figure 2.1 Tangent bundle of a body
in its containing space .
In the following we define the algebra of the C1 type. For x, y, and z, Tangent_Bundle objects of C1
type, with an abstract binary operator , such as in z = x y , TABLE 21 summaries the algebra of four concrete basic operators (in place of the abstract operator ).
Operator
base point
tangent vector
z=x+y
dz = dx + dy
z=x-y
dz = dx - dy
z=xy
dz = y dx + x dy (Lebniz rule)
z=x/y
dz = (y dx - x dy) / y2
TABLE 21 Four basic binary operators for the algebra of C1 type objects.
The derivatives for the multiplication operator in the third row simply follows the Lebniz rule in calculusd z =
d(x y) = y dx + x dy. Lebniz rule can also be applied to the division operatord(x / y) = d(x * (1/y)) = (y dx - x
dy) / y2. Other operators and transcendental functions can be defined accordingly as in calculus.
It is obvious how C1 type object works as a differentiable object. Since we keep track of numerical values of
base point, p, and tangent vector, w, of a C1 type object through out all kinds of operations, the resultant numerical values of base point and tangent vector of intermediate (temporary) objects will always be available. Inquiring on the numerical values of tangent vector, w, of the C1 type object gives the derivative information we need.
From reverse engineering point of view, when we want to do away with C1 type object, this model has the
advantage that the computing algorithm is quite compatible with traditional FORTRAN or C programming. On
the other hand, the symbolic languages process the intermediate analytical expression by looking up its dictionary, while defer the evaluation of actually numerical values until it is explicitly requested by users. Therefore
the computing algorithm is completely different from that of FORTRAN or C. In retrospect, we note that the
defer evaluation approach is known to have its advantage of fast response time in an interactive environment,
since un-necessary evaluations are avoided sometimes.
2.1.1 Tangent Bundle

The data abstraction for a C1 type Tangent_Bundle object is modeled mathematically by (1) a C0* type Scalar u for the base pointp (with the dimension of the base point being always equals 1), and (2) a C0* type
96
Workbook of Applications in VectorSpace C++ Library
C1 Type Objects
Scalar/Vector du for the tangent vectorw (a Scalar when that spatial dimension =1, and a Vector when the
spatial dimension > 1). The two private data members C0* type Scalar, u, and Scalar/Vector, du, in turn refer
to double* type v and dv in physical memory space, respectively. The constructor and the destructor of the
C1 type Tangent_Bundle object encapsulate the details of low-level memory management on the double* type
v and dv. In other words, both the base point and the tangent vector are represented in two levels. That is in
higher level as C0* type Scalar(u) and Scalar/Vector(du), and in lower level as double* type v and dv. This
dual abstraction in Tangent_Bundle class is to facilitate (1) swift memory management in lower level, and (2)
mathematical abstraction in higher level for the C1 type Tangent_Bundle class as shown in TABLE 22.
Tangent Bundle Abstraction
Mathematics
Physical Memory
base pointp
C0* of Scalaru
double*v
tangent vectorw
C0* of Scalar/Vectordu
double*dv
TABLE 22. Dual abstraction of a C1 type Tangent_Bundle class.
Constructors
A dedicated constructor for a C1 type Tangent_Bundle object can be written as (project: c1_examples)
1
2
3
4
5
C1 x(0.0);
cout << ((C0) x) << endl;
cout << d(x) << endl;
((C0) x) = 3.0;
// (*u) = 0.0; (*du) = 1.0 (default), spatial dimension = 1 (default)

// 0.0
// 1.0
// as l-value
// 3.0
A double constant 0.0 in line 1 is the argument passed to the dedicated constructor to be assigned as the value
of the base point. The Tangent_Bundle so constructed has default spatial dimension of 1, and its default derivative (tangent vector) value du = 1.0 as default. C0 converter C1::operator C0() in line 2 casting on x is used
to retrieve the value of the base point of x. The free function d(const C1&) in line 3 can be used to retrieve
the value of the derivative (tangent vector). Both the casting operator and the derivative function can be used as
l-value, to be put on the left-hand-side to assign value to it. The reason for the default value du = 1.0 is evident
when we consider using x as a variable. For example, use x as a variable to define a function f 1(project:
c1_examples)
1
2
3
4
5
C1 x(0.0),
f = 2.0 * x * (sin(x)+1.0);
cout << ((C0) f) << endl;
cout << d(f) << endl;
// (*u) = 0.0; (*du) = 1.0

// f(x) = 2x[sin(x)+1], x C1 and f C1
// f(0.0) = 0.0 = 2.0*(0.0)*[sin(0.0)+1.0]
// df(0.0) / dx = 2.0 = 2.0*(0.0)*cos(0.0) +
//
2.0*[sin(0.0)+1.0]
1. example taken from K.E. Gorlen, S.M.Orlow, and P.S. Plexico, 1991, Data Abstraction and Object-Oriented Programming in C++, John Wiley & Sons Ltd, p.92-93.
97
Chapter
The function (or dependent variable) f is defined with the (independent) variable x as its parameter. A different view on the default value of du = 1.0, is that if x is to serve as a variable, the derivative of xdx
equals 1.0 by the way of differentiation in calculus. The kind of dedicated constructor C1::C1(const double&) that can be used as a variable to define a more complicated function is called a variable (dedicated) constructor.
When the spatial dimension is not equal to 1, we can use the following constructor (not a variable dedicated
constructor)
C1 y(3.0, 3);
cout << ((C0) y) << endl;
cout << d(y) << endl;
// (project: c1_examples)
// 3.0
// {0.0, 0.0, 0.0}T
The first argument of this dedicated constructor is a const double& which specifies the value of the base point,
and the second argument is a int which gives the number of the spatial dimension, the dimension of a tangent
vector, w. The default value of the tangent vector is to have all its components set to 0.0.
The constant strings for Tangent_Bundle virtual constructor (use macro definition TANGENT_BUNDLE)
and autonomous virtual constructor are shown in the following box.
virtual constructor string
VectorSpace C++ Library definition
priority
by reference
C1&
C1*
double*, double*
double*, double*, int
C1 type Tangent_Bundle
pointer to C1 type Tangent_Bundle
base point, tangent vector, (spatial dim. = 1)
base point, tangnet vector, spatial dim.
1
2
3
4
by value
int
const double&, const double&
const C0&, const C0&
const double*, const double*
const C0*, const C0*
const double*, const double*, int
const C0*, const C0*, int
const C1&
const C1*
spatial dim.
base point, tangent vector, spatial dim.
base point, tangent vector, spatial dim.
C1 type Tangent_Bundle
pointer to C1 type Tangent_Bundle
5
6
7
8
9
10
11
12
13
Strings in C1 virtual constructor for C1 type Tangent_Bundle class.
Operators and Functions

We have introduced the casting operator C1::operator C0() and the free function d(const C1&) along
with the Tangent_Bundle constructors. These are the two most important operator and function in C1 type. The
rest of the operators and functions are listed in the following box.
98
C1 Type Objects
operator or function
VectorSpace C++ library definition
remark
symbolic operators
C0& operator &= ( )
C0& operator = ( )
operator C0()
assignment by reference
assignment by value
casting operator; retrieve base point
arithmetic operators
C0 operator + ( ) const
C0 operator - ( ) const
C0 operator + (const C0&) const
C0 operator - (const C0&) const
C0 operator * (const C0&) const
C0 operator / (const C0&) const
C0& operator += (const C0&)
C0& operator -= (const C0&)
C0& operator *= (const C0&)
C0& operator /= (const C0&)
positive (primary casting)

unary
negative
unary
addition
subtraction
multiplication by a scalar; scalar product of two Vectors
division (by a Scalar or a Matrix only)
return a Vector
replacement addition
replacement subtraction
replacement multiplication (by a Scalar only)
replacement division (by a Scalar only)
logic operators
int operator == (const C0&) const
int operator != (const C0&) const
int operator >= (const C0&) const
int operator <= (const C0&) const
int operator > (const C0&) const
int operator < (const C0&) const
equal
not equal
greater or equal
less or equal
greater
less
functions
int col_length() const
C0& d()
C0 pow(int) const
C0 exp(const C0&) const
C0 log(const C0&) const
C0 sin(const C0&) const
C0 cos(const C0&) const
spatial dimension
the first derivative; retrieve tangent vector
power (applied to each element of the Vector)
exponent (applied to each element of the Vector)
log (applied to each element of the Vector)
sin (applied to each element of the Vector)
cos (applied to each element of the Vector)
TRUE == 1
FALSE == 0
Partial listing of C1 type Tangent_Bundle class arithmetic operators, logic operators and functions.
2.1.2 Vector of Tangent Bundle

Recalled that a Tangent_Bundle object of C1 type is reserved for tangent bundle with a scalar base point (m =
1; i.e., the manifold dimension equals 1). For the dimension m of a manifold to be greater than 1, we have C1
type Vector_of_Tangent_Bundle objects. The data abstraction for the C1 type Vector_of_Tangent_Bundle class
99
Chapter
is represented by a C0* type Vector object for its base point (length = m), and a C0* type Matrix object
(row-length = m, column-length = n, and m n) for its tangent vector. The Matrix object actually is the
Jacobian matrix of the form (for example, m = n = 3)
f 1 x 1 f 1 x 2 f1 x 3
df i
df
------ = ------- = f 2 x 1 f 2 x 2 f2 x 3
dx j
dx
f 3 x 1 f 3 x 2 f3 x 3
Eq. 21
Constructors
The dedicated constructor for the C1 type Vector_of_Tangent_Bundle class can be written as (project:
c1_examples)
double v[3] = {1.0, 1.0, 1.0};
C1 x(3, v);
cout << ((C0)x) << endl;
// base point values (coordinates)

// m = n = 3; equal dimensions for body and physical space
// {1.0, 1.0, 1.0}T
// { {1.0, 0.0, 0.0},
// {0.0, 1.0, 0.0},
// {0.0, 0.0, 1.0} }
C1::C1(int, const double*) is the variable dedicated constructor for the C1 type Vector_of_Tangent_Bundle
class. The example for using this variable dedicated constructor is a vector function f = {f1, f2, f3}T, which
depends on three independent variable x = {x1, x2, x3}T as1
f 1 = 16x 14 + 16x 24 + x 34 16
f 2 = x 12 + x 22 + x 32 3
f 3 = x 13 + x 2
Eq. 22
Root finding of f(x) = 0 for this non-linear problem can be obtained by an iterative algorithm. Considering
approximation of the vector function f by Taylor expansion at the neighborhood of an initial value xi with its
increment dx as
f(xi+dx) = f(xi) + f,x(xi) dx + O(dx2) = 0
Eq. 23
where O(dx2) denotes the error in the second-order of dx or above. Neglecting higher-order errors in Eq. 23 for
small dx, we have
dx = - f(xi) / f,x(xi)
Eq. 24
1. example taken from K.E. Gorlen, S.M. Orlow, and P.S. Plexico, 1991, Data Abstraction and Object-Oriented Programming in C++, John Wiley & Sons Ltd, p.93-97.
100
C1 Type Objects
This is the formula for root finding, and xi+1 = xi+dx is the update. The implementation of this iterative algorithm
(Newton-Raphson Method) shown in Program Listing 21 with VectorSpace C++ Library is very simple. The
C++ codes are actually as concise as the mathematical expressions. The selector C1::operator [](int) is used to
access the components of the C1 type Vector_of_Tangent_Bundle class. The return value of the selector is a
Tangent_Bundle (see Figure 2.2). The solution for f(x) = 0 is x = {0.877966, 0.676757, 1.33086}T.
#include include/vs.h
#define EPSILON 1.e-12
#define MAX_ITER_NO 10
int main() {
double v[3] = {1.0, 1.0, 1.0};
C1 x(3, v), f(3, (double*)0);
int count = 0;
do {
f[0]=16.0*x[0].pow(4)+16.0*x[1].pow(4)+x[2].pow(4)-16.0;
f[1]=
x[0].pow(2)+
x[1].pow(2)+x[2].pow(2)-3.0;
f[2] =
x[0].pow(3)x[2];
C0 dx = - ((C0)f) / d(f);
(C0) x += dx;
} while(++count < MAX_ITER_NO &&
(double)norm((C0)f) > EPSILON);
if(count == MAX_ITER_NO)
cout << Warning: convergence failed, residual norm:
<< ((double)norm((C0)f)) << endl;
else
cout << solution ( << count << ): << ((C0)x) << endl;
return 0;
}
initial values, x0 = {1.0, 1.0, 1.0}T
f1 = 16x 14 + 16x 24 + x 34 16
f2 = x 12 + x 22 + x 32 3
f 3 = x 13 + x 2
Eq. 24, dx = - f(xi) / f,x(xi)

update xi+1 = xi+dx
check convergence condition
if convergence failed output
residual norm
x = {0.877966, 0.676757, 1.33086}T
Listing 21 Solving a nonlinear vector of function using C1 type Vector_of_Tangent_Bundle class

(project: newton_method_nd_root_finding).
x:
double v[3] = {1.0, 1.0, 1.0};
C1 x(3, v);
x[0] :
00 01 02
10 11 12
20 21 22
base
point
tangent
vector
00 01 02
Figure 2.2 Selector of the C1 type Vector_of_Tangent_Bundle class.
101
Chapter
The constant strings for C1 type Vector_of_Tangent_Bundle class virtual constructors (use macro definition
VECTOR_OF_TANGENT_BUNDLE) and autonomous virtual constructors are shown in the following box.
by reference
C1&
C1*
int, int, double*, double*
by value
int, int
int, int, const double*, const double*
const C0&, const C0&
const C1&
const C1*
priority
C1 type Vector_of_Tangent_Bundle
pointer to C1 type Vector_of_Tangent_Bundle
manifold dim, spatial dim, base point, tangnet vector

14
manifold dim., spatial dim.

base point (Vector), tangent vector (Matrix)
C1 type Vector_of_Tangent_Bundle
pointer to C1 type Vector_of_Tangent_Bundle
15
16
Strings in C1 virtual constructor for C1 type Vector_of_Tangent_Bundle class.

The most important new operator other than those which have already been introduced in the C1 type
Vector_of_Tangent_Bundle class is the selector. The rest of the operators and functions are listed in the following box.
102
C1 Type Objects
symbolic operators
C0& operator &= ( )
C0& operator = ( )
C0& operator [] (int)
operator C0()
assignment by value
selector
remark
return Tangent_
Bundle

unary
negative
unary
addition
subtraction
multiplication by a scalar or scalar product
division (by a Scalar only)
logic operators
equal
not equal
greater or equal
less or equal
greater
less
functions
int row_length() const
C0& d()
C0 pow(int) const
manifold dimension
spatial dimension
TRUE == 1
FALSE == 0
Partial listing of Vector_of_Tangent_Bundle object arithmetic operators, logic operators and functions.
103
Chapter
2.2 C2 Type Objects

C2 type models objects in C2 spaces. It is a generalization of the tangent bundle to derivatives of second
order. For the tangent bundle in C1 space, the generalization to second order derivative gives tangent of tangent bundle in C2 space. In the previous section we have a vector of tangent bundle in C1 space when the manifold dimension is greater than 1. Therefore, we can also have vector of tangent of tangent bundle. In
VectorSpace C++ library we implement the generalization of classes from C1 type to C2 type as classes
Tangent_of_Tangent_Bundle and Vector_of_Tangent_of_Tangent_Bundle. The nomenclature is unfortunately quite verbose. These cluttered class names are trade-off for showing generic clarity.
The algebra of the C2 type can be defined similar to that of the C1 type. For C2 type objects x, y, and
z with an abstract binary operator in the case of z = x y . TABLE 21 outlines the algebra of four concrete basic operators.
Operator
base point
tangent vector
tangent of tangent vector
z=x+y
dz = dx + dy
d2z = d2x + d2y
z=x-y
dz = dx - dy
d2z = d2x - d2y
z=xy
dz = y dx + x dy
z=x/y
dz = (y dx - x dy) / y2
dx dy + dy dx + xd y + yd x
d x y ( dx dy + dy dx + xd y ) y 2 +2x ( dy dy ) y 3
TABLE 23. Four basic binary operators for the algebra of C2 type objects.
where the operator for the tangent of tangent vector denotes tensor product.
2.2.1 Tangent of Tangent Bundle

The data abstraction for a C2 type Tangent_of_Tangent_Bundle (see Figure 2.3) is represented by a C0*
type Scalar u for the base point p, and a C0* type Scalar/Vector du for the tangent vector w (Vector in
the case of spatial dimension > 1), and a C0* type Scalar/Matrix ddu for the tangent of tangent vector dw
(Matrix in the case of spatial dimension > 1). As the dual abstraction for the C1 type object explained in
page 97, high-level representation u, du, and ddu are all referring to physical memory spaces pointed to
low-level representation by three double* v, dv, and ddv.
spatial dimension = 1
du
ddu
spatial dimension = 3
u
base point
u,x u,y u,z

tangent
vector
u,xx u,xy u,xz

u,yx u,yy u,yz
u,zx u,zy u,zz
tangent of tangent
vector
Figure 2.3 Data abstraction of C2 type Tangent_of_Tangent_Bundle class.
104
C2 Type Objects
Constructors
An example of using a variable dedicated constructor for C2 type Tangent_of_Tangent_Bundle class is
(project: c2_examples)
C2 x(0.0);
cout << dd(x) << endl;
// u = 0.0, du (= 1.0; default value), ddu (= 0.0; default value)

// 0.0
// 1.0
// 0.0, or write d2(x) instead
For the purpose of a variable dedicated constructor, the default value of ddu(=0.0) is just the derivative of du
(= 1.0). For access to the second derivative information, free function dd(const C2&) (or d2(const C2&)) can
be used to retrieve the value of ddu (project: c2_examples).
C2 x(0.0),
f = 2.0 * x * (sin(x)+1.0);
cout << ((C0) f) << endl;
cout << d(f) << endl;
cout << dd(f) << endl;
// (*u) = 0.0; (*du) = 1.0

// f(x) = 2x[sin(x)+1], x C2 and f C2
// f(0.0) = 0.0 = 2.0*(0.0)*[sin(0.0)+1.0]
// df(0.0) / dx = 2.0 = 2.0*(0.0)*cos(0.0) +
//
2.0*[sin(0.0)+1.0]
// d 2f(0.0)/dx2 = 4.0 = [2.0*(0.0)*-sin(0.0) +2.0*cos(0.0)]+
//
2.0*cos(0.0)
For spatial dimension greater than 1, we can write the dedicated constructor similarly to write that of the C1 type
Tangent_Bundle as (project: c2_examples)
C2 y(3.0, 3);
cout << ((C0) y) << endl;
cout << d(y) << endl;
cout << dd(y) << endl;
// 3.0
// {0.0, 0.0, 0.0}T
// {{0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}}
The constant strings for C2 type Tangnet_of_Tangent_Bundle virtual constructors (use macro definition
TANGENT_OF_TANGENT_BUNDLE) and autonomous virtual constructors are shown in the following box.

We have introduced the second derivative function dd(const C2&) (or d2(const C2&)) along with the C2
type Tangent_of_Tangent_Bundle constructors. This is the most important new function in C2 type. The rest of
the operators and functions are listed in the following box.
105
Chapter

by reference
C2&
C2*
double*, double*, double*
double*, double*, double*, int
by value
int
const double&, const double&,
const double&
const C0&, const C0&,
const C0&
const double*
const C0*
const double*, const double*,
const double*, int
const C0*, const C0*, int
const C0*, int
const C2&
const C2*
C2 type Tangent_of_Tangent_Bundle
pointer to C2 type Tangent_of_Tangent_Bundle
base point, tangent vector, tangent of tangent
vector, (spatial dim. = 1)
vector, spatial dim.
1
2
3
spatial dim.
C2 type Tangent_of_Tangent_Bundle
pointer to C2 type Tangent_of_Tangent_Bundle
5
6
Strings in C2 virtual constructor for Tangent_of_Tangent_Bundle object.
106
priority
7
8
9
10
11
12
13
C2 Type Objects
remark
symbolic operators
C0& operator &= ( )
C0& operator = ( )
operator C0()
assignment by value

unary
negative
unary
addition
subtraction
multiplication by a scalar; scalar product of two Vectors
division (by a Scalar or a Matrix only)
return a Vector
logic operators
equal
not equal
greater or equal
less or equal
greater
less
functions
C0& d()
C0& dd() or C0& d2()
C0 pow(int) const
spatial dimension
the first derivative
the second derivative
TRUE == 1
FALSE == 0
Partial listing of C2 type Tangent_of_Tangent_Bundle object arithmetic operators, logic operators and functions.
107
Chapter
2.2.2 Vector of Tangent of Tangent Bundle

The data abstraction for a C2 type Vector_of_Tangent_of_Tangent_Bundle (see Figure 2.4) is represented
by a C0* type Vector u for the base point p, and a C0* type Matrix du for the tangent vector w, and a
C0* type Submatrix ddu for the tangent of tangent vector dw. As in the dual abstraction for the C1 type
object explained in page 97, C2 type has high-level mathematical representation as u, du, and ddu which
in turn refer to physical memory spaces in low-level representation as double* type v, dv, and ddv.
spatial dimension = 3, manifold dimension = 3
u,x u,y u,z
u,xx u,xy u,xz

u,yx u,yy u,yz
u,zx u,zy u,zz
v,x v,y v,z
v,xx v,xy v,xz

v,yx v,yy v,yz
v,zx v,zy v,zz
w,x w,y w,z
w,xx w,xy w,xz

w,yx w,yy w,yz
w,zx w,zy w,zz
base point
tangent
vector
tangent of tangent
vector
Figure 2.4 Data abstraction of C2 type Vector_of_Tangent_of_Tangent_Bundle class.
Constructors
A variable dedicated constructor for C2 type Vector_of_Tangent_of_Tangent_Bundle can be written as
(project: c2_examples)
double v[3] = {0.0, 1.0, 2.0};
C2 x(3, v);
cout << (+dd(x)) << endl;
// du (= identity; default), ddu (= 0; default)

// {0.0, 1.0, 2.0}T
// {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}}
// {{0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0},
// {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0},
// {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}, {0.0, 0.0, 0.0}}
Note that dd(x) returns a Nominal_Submatrix which can not be directed to iostreams. We must use primary
casting + to convert it into a Matrix. Without this conversion the program will throw an exception and stop.
108
C2 Type Objects
We apply the variable dedicated constructor to a minimization problem1
f(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2
Eq. 25
This elliptic objective functional can be approximated by Taylor expansion to the second order as2
1
f ( x ) f ( x i ) + f ,x ( x i )dx + --- dx T H ( x i )dx
2
Eq. 26
where f,x(xi) is the so-called Jacobian matrix, and H(xi) = f,xx(xi) is the so-called Hessian matrix. f(x) is minimized if its first derivatives with respect to dx vanishes. Therefore, if we take derivatives of f(x), set to zero, then
solve for dx, we obtain,
dx = - f,x(xi) / H(xi)
Eq. 27
The elliptic nature of the objective functional guarantees that the Hessian matrix can be inverted. Eq. 27 is
known as the Newtons formula, and xi+1 = xi+dx is the update for the algorithm. For the elliptic objective functional such as Eq. 25, the approximation by Eq. 26 in quadratic form is exact. One iteration will give the exact
answer. Program Listing 22 in the following implemented the classic Newton-Raphson method that can be
used for less ideal cases when the objective functionals are not exactly quadratic.
The minimum solution of this elliptic objective functional f is {2, 4}T, which is the center of the ellipse f =
constant. We can verify this immediately with analytical geometry. The Newton-Raphson iterative procedure in
this case achieves convergence in just one iteration from the initial point (0, 0) to the final solution point (2, 4)
(see Figure 2.5).
The constant strings for Vector_of_Tangent_Bundle virtual constructors (use macro definition
VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE) and autonomous virtual constructors are shown in
the following box.

The most important operator other than those introduced in the Tangent_of_Tangent_Bundle is the selector
we have just used. The rest of the operators and functions are listed in the following box.
1. function without constrained conditions from D.G. Luenberger, 1989, Linear and Nonlinear Programming, AddisonWesley Publishing Company, Inc., Reading, MA., p.426.
2. A similar equation is in p.225, Eq. 43 of D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley
Publishing Company, Inc., Reading, MA.
109
Chapter
int main() {
C2 x(2, (double*)0), f;
int count = 0;
do {
f &= 2.0*x[0].pow(2) + x[0]*x[1] + x[1].pow(2) -12.0*x[0]
-10.0*x[1];
C0 dx = - d(f) / dd(f);
(C0) x += dx;
} while(++count < MAX_ITER_NO &&
(double)norm(d(f))>EPSILON);
cout << Warning: convergence failed, increment norm:
<< ((double)norm(dx)) << endl;
else
return 0;
}
initial values, x0 = {0.0, 0.0}T

f(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2
Eq. 27, dx = - f,x(xi) / f,xx(xi)
update xi+1 = xi+dx
check convergence condition
if convergence failed output
dx norm
x = {2, 4}T
Listing 22 Minimization of an elliptic objective functional using Vector_of_Tangent_of_Tangent _Bundle

(project: newton_method_optimization).
8
(2, 4) solution
(0,0) initial point

0
Figure 2.5 Newton-Raphson method for a quadratic functional.
110
C2 Type Objects
by reference
C2&
C2*
int, int, double*, double*,
double*
C2 type Vector_of_Tangent_of_Tangent_Bundle
C2* type Vector_of_Tangent_of_Tangent_Bundle
manifold dim, spatial dim, base point, tangent vector
tangent of tangent vector
by value
int, int
int, int, const double*,
const C0&, const C0&,
const C0&
const C0*, const C0*,
const C0*
const C2&
const C2*
manifold dim., spatial dim.

tangent of tangent vector (Submatrix)
tangent of tangent vector (Submatrix)
C1 type Vector_of_Tangent_of_Tangent_Bundle
C1* type Vector_of_Tangent_of_Tangent_Bundle
priority
14
15
16
Strings in C2 virtual constructor for C2 type Vector_of_Tangent_of_Tangent_Bundle class.
111
Chapter

symbolic operators
C0& operator &= ( )
C0& operator = ( )
C0& operator [] (int)
operator C0()
VectorSpace C++ library definition remark
assignment by value
selector
return Tangent_of_
Tangent_Bundle

unary
negative
unary
addition
subtraction
multiplication by a scalar or scalar product
division (by a Scalar only)
logic operators
equal
not equal
greater or equal
less or equal
greater
less
functions
int row_length() const
C0& d()
C0& dd()
C0 pow(int) const
manifold dimension
spatial dimension
the second derivative; retrieve tangent of tangent vector
TRUE == 1
FALSE == 0
Partial listing of C2 type Vector_of_Tangent_of_Tangent_Bundle object arithmetic operators, logic operators and functions.
112
Linear Programming and Non-linear Optimization

2.3 Linear Programming and Non-linear Optimization
2.3.1 Linear Programming
Two kinds of approaches in linear programming are explained in this section, namely the basic set method
(column pivoting) and the active set method (row pivoting). The example problem is1
maximize objective functional f(x) = 3x1 + x2 + 3x3 subject to
2x 1 + x 2 + x 3 2
x 1 + 2x 2 + 3x 3 5
2x 1 + 2x 2 + x 3 6
x 1 0, x 2 0, x 3 0
The pre-processing step of the linear programming is to (1) multiply -1 on the objective functional to convert the maximization problem into a minimization problem, (2) transform the first three inequality constraints
into equality constraints by adding positive slack variables x4, x5, and x6. That is
minimize objective functional f(x) = -3x1 - x2 - 3x3 subject to
2x 1 + x 2 + x 3 + x 4 = 2
x 1 + 2x 2 + 3x 3 + x 5 = 5
2x 1 + 2x 2 + x 3 + x 6 = 6
x 1 0, x 2 0, x 3 0, x 4 0, x 5 0, x 6 0
This is the so-called standard form in linear programming.
Basic Set Method

The above problem has 6 variables and 3 equality constraints. We can select 3 variables (set the others to
zero) and solve the 3 equality constraints. In condition that the 3 equations are linear independent we can have a
basic solution. If this basic solution satisfies non-negative constraints on the 6 variables, this solution is also
called basic feasible solution. The procedure of linear programming is to move from one basic feasible solution
to the other such that the value of the objective functional decreases to its minimum. For a linear objective functional, the fundamental theorem of linear programming states that it is sufficient to consider only those basic fea-
1. example taken from D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company,
Inc., Reading, MA., p.46.
113
Chapter
sible solutions. In other words, the optimal value of the objective functional is always achieved at a basic
feasible solution for a linear objective functional.
The matrix form of the above problem can be written as
minimize f ( x ) = c DT x D + c BT x B
subject to D x D + Bx B = b
x D 0, x B 0
A = [D, B], cT = [cTD, cTB], xT = [xTD, xTB]
Eq. 28
xD = {x1, x2, x3} is the non-basic variables, xB = {x4, x5, x6} is the basic variables. We choose the slack variables as initial basic variables for the obvious reason that the initial basic feasible solution is clearly xB = {2, 5,
6} without having to solve any set of equations. We can solve for xB from the equality constraints as
xB = B-1 (b-D xD)
Substituting this back to the objective functional gives
T c T B 1 D ) x + c T B 1 b
f ( x ) = ( cD
D
B
B
Eq. 29
In Eq. 29 the coefficient of the non-basic variables xD is rTD c DT c BT B 1 D . rD is known as the relative cost
which measures the cost of a non-basic variable relative to the current basic variables. Negative values of the relative cost in Eq. 29 decrease the value of the objective functional. A non-basic variable corresponding to the
most negative relative cost component in rD will be brought into the basic set, such that the objective functional
decreases the most. Denote a column in D corresponding to the non-basic variable selected to enter the basic set
as d. Bringing this non-basic variable into the current basic set means that we are moving with p = - B-1d as a
searching direction; that is moving away from xB along x = xB + p. The smallest non-negative component in
vector to satisfy xB + p = 0 will be the first basis in the current basic set to be encountered (an adjacent
extreme point to d). Hence, this basis is to be selected to leave the basic set. The above process is to be repeated
until the components of relative cost rD are all positive.
The implementation of any non-trivial problem such as the finite difference method discussed in page 67
contains many logic steps that are not highly mathematical. In finite difference method we create a class FD to
handle the mapping of finite difference stencil to the global matrix using the concept of data abstraction. Program Listing 23 (project: linear_programming_basic_set) implemented the basic set method. In current basic
set method we create a class Basic_Set to represent the basic and non-basic columns in the constraint equations and the coefficients of the objective functional as shown in Figure 2.6.
For the example problem, we write
C1 X(6, (double*)0), C(3, 6, (double*)0), f;
C[0] = 2*X[0]+ X[1]+ X[2]+ X[3]
C[1] =
X[0]+2*X[1]+3*X[2]
+ X[4]
114
// 6 variables, 3 constraints
;
;

C[2] = 2*X[0]+2*X[1]+ X[2]
f &= -3*X[0]-X[1]-3*X[2];
+ X[5];
Notice that both the constraint equations and the objective functional are declared as objects of C1 type. The tangent vector of the C1 type objects will give the coefficients we need; i.e, the elements of A = d(C), and the elements of cT = d(f) in Eq. 28. The class Basic_Set is initialized by calling constructor
class Basic_Set {
C0 *_A, *_c;
int row_size, col_size, *_basic_order;
public:
Basic_Set(C0&, C0&);
~Basic_Set() { delete [] _basic_order; }
C0& A() { return *_A; }
C0& c() { return *_c; }
int basic_order(int i) {return _basic_order[i];}
void swap(int, int);
};
Basic_Set::Basic_Set(C0& dC, C0& df) {
row_size = dC.row_length(); col_size = dC.col_length();
_basic_order = new int[col_size];
for(int i = 0; i < col_size; i++) _basic_order[i] = i;
_A = &dC; _c = &df;
}
void Basic_Set::swap(int i, int j) {
int old_basic_order = _basic_order[i];
_basic_order[i] = _basic_order[j]; _basic_order[j] = old_basic_order;
C0 old_Ai(row_size, (double*)0); old_Ai = (*_A)(i);
(*_A)(i) = (*_A)(j); (*_A)(j) = old_Ai;
C0 old_ci(0.0); old_ci = (*_c)[i];
(*_c)[i] = (*_c)[j]; (*_c)[j] = old_ci;
}
class basic set

constraint, and coefficients of the objective functional
initialize original variable order array
swap order
swap columns
swap objective functional coefficients
Listing 23 class Basic_Set data abstraction (project: linear_programming_basic_set).
swap
basic
non-basic
A=
= [D, B]
cT= [
order array:
] = [cTD, cTB]
0
initial order
after swapping
Figure 2.6 Data abstraction of the basic set.

115
Chapter

Basic_Set BS(d(C), d(f));
The submatrices (D, B) and subvectors (cTD, cTB) in Eq. 28 can be written using referenced Matrix and referenced Vector in VectorSpace C++ Library as
C0 D(3, 3, BS.A(), 0,0), B(3, 3, BS.A(), 0,3), c_D(3, BS.c(),0), c_B(3,BS.c(),3);
where the public member functions Basic_Set::A() and Basic_Set::c() provide access to the matrix A and
vector cT referenced to by the Basic_Set object. The most important function that the class Basic_Set performs is to swap columns between the basic and non-basic set. This is provided by the public member function
Basic_Set::swap(int, int) with the two integer arguments indicating which two columns are to be swapped. A
private member integer array _basic_order is used to keep track down the original variable order. This original
variable order can be retrieved by using the public member function Basic_Set::basic_order(int). Its integer
argument is the current column number.
Program Listing 24 implemented the steps of the so-called revised simplex method in linear programming. 1
These steps are
Step 1: relative costcompute rTD c DT c BT B 1 D , if r D 0 stop,
Step 2:inselect non-basic column (say d) corresponding to the most negative rD to enter the basic set,
Step 3:outcompute p = B-1d, then = xB / p. If no component in vector is greater than 0, stop.
The column in the basic set corresponding to the smallest positive is to leave the basic set.
Step 4: swapswap columns selected in in and out.
The solution is x = {0.2, 0, 1.6, 0, 0, 4}T in the standard form, with the maximum value of the objective functional as 5.4, or the solution is x = {0.2, 0, 1.6}T in original form, neglecting the artificial slack variables.
The basic set method is the traditional simplex-tableau updating procedure that can be explained step-by-step
with minimum mathematics.2 Following is the active set method for linear programming that will be more
readily modified to inequality constrained nonlinear programming.
1. p.60 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc., Reading, MA.
2. see Chapter 3 in D.G. Luenberger, 1989, same as the above.
116

int main() {
C1 X(6, (double*)0), C(3, 6, (double*)0, (double*)0);
C[0] = 2*X[0]+ X[1]+ X[2]+ X[3]
;
C[1] =
X[0]+2*X[1]+3*X[2]
+ X[4]
;
C[2] = 2*X[0]+2*X[1]+ X[2]
+ X[5];
C1 f = -3*X[0]- X[1]-3*X[2];
double rhs[3] = {2.0, 5.0, 6.0};
C0 b(3, rhs), x_B;
Basic_Set BS(d(C), d(f));
C0 D(3, 3, BS.A(), 0, 0), B(3, 3, BS.A(), 0, 3), c_D(3, BS.c(), 0), c_B(3, BS.c(), 3);
int min_q;
do {
C0 B_inv = B.inverse();
C0 r_D = c_D - c_B * B_inv * D;
x_B = B_inv * b;
min_q = -1; double min = 1.e-20;
for(int i = 0; i < 3; i++) if((double) r_D[i] < min) { min = r_D[i]; min_q = i; }
if(min_q != -1) {
C0 p = B_inv * D(min_q);
int min_i = -1; double min_alpha = 1.0e20;
for(int i = 0; i < 3; i++) if((double)p[i] > 0.0) {
double alpha = (double)(x_B[i] / p[i]);
if(alpha < min_alpha && alpha > 0.0) { min_alpha = alpha; min_i = i; }
}
if(min_i == -1) cout << "This linear progamming problem is unbounded" << endl;
BS.swap(min_i+3, min_q);
}
} while(min_q != -1);
for(int i = 0; i < 3; i++) ((C0)X)[BS.basic_order(i+3)] = x_B[i];
cout << "solution: " << ((C0)X) << endl << "maximum objective function: " <<
(3*((C0)X)[0]+((C0)X)[1]+3*((C0)X)[2]) << endl;
}
6 variables, 3 constraints
2x 1 + x 2 + x 3 + x 4 = 2
x 1 + 2x 2 + 3x 3 + x 5 = 5
2x 1 + 2x 2 + x 3 + x 6 = 6
f(x) = -3x1 - x2 - 3x3

r.h.s. and basic feasible solution
define submatrices and subvectors as
A = [D, B], cT = [cTD, cTB]
non-basic column index to be selected
do until all rD are positive
1
step 1: rTD = c DT c BT B D
step 2:select the most negative rD
step 3:determine adjacent extreme point
p = B-1d
= xB / p
step 4: swap non-basic and basic
until all rD > 0
unscramble the solution
x = {0.2, 0, 1.6, 0, 0, 4}T
5.4
Listing 24 Basic set method for linear programming (project: linear_programming_basic_set).
117
Chapter
Active Set Method

For simplicity we consider the equality constraint first. For objective functional f and constraint equations
C (with m number of constraints and n number of variables). An extremum point occurs at x* where f is
orthogonal to the tangent plane of C. That is f can be written as a linear combination of C. We can prove the
contrary. For example, at point x, if f is not orthogonal to the tangent plane, f can be projected as a vector p
to the tangent plane. This means that moving along curve C on the direction of p increases (or decreases) the
value of f. Therefore, x is not an extremum point.
C
C
Tangent plane
f
p
f
x*
f=c
Tangent plane
C(x)
Figure 2.7 Extremum point occurs at f is a linear combination of C.
With this relation between f and C at an extremum point, we introduce Lagrange multiplier (m
dimensional vector) as the coefficients of linear combination of components of C to form f
f +T C = 0
Eq. 210
In view of this, the Lagrangian functional, l(x, ), can be introduced to represent the constrained optimization
problem as
l(x, ) = f + T C
Eq. 211
For an extremum condition, setting the first-order derivatives of the Lagrangian functional Eq. 211 to zero gives
the Euler-Lagrange equations
l,x= f + T C = 0
l, = C = 0
Eq. 212
This states that the first-order condition of the Lagrangian functional (1) is exactly Eq. 210, and (2) requires x to
stay on the constraints surface (C = 0). The second-order condition of the Lagrangian functional in the case of
the linear objective functional is just the Hessian H= f,xx being positive definite.
In the case of inequality constraints C ( x ) 0 , one can state that
either i = 0 and Ci < 0, or i > 0 and Ci = 0
Eq. 213
In the first part of Eq. 213, the constraint is satisfied in the interior of a feasible region (Ci < 0). The constraint
is inactive by setting the corresponding Lagrange multiplier i = 0. In the second part of Eq. 213, the constraint
is on the edge of the feasible region. The constraint is active (Ci = 0), and the corresponding Lagrange multiplier
118

takes a positive value (i > 0). This is the essence of the active set method for dealing with the inequality constraints. This either - or condition can be expressed as a trio
i 0 , A i 0, and i A i = 0
Eq. 214
This is the so-called Kuhn-Tucker condition. Although Eq. 214 is aesthetically more satisfying, we use Eq. 213
for practical coding of class Active_Set. The kernel of the problem is to compute the Lagrange multiplier
according to Eq. 210 as
= - f / (A)T
Eq. 215
where A denotes the active subset of C. When i 0 , for i in the active set, the Kuhn-Tucker condition is
uphold, and the solution is optimal. Otherwise, select the constraint corresponding to the most negative i, and
drop this constraint from the active set. The search direction p due to the deletion of this constraint satisfies1
Ap= - ei
Eq. 216
where is the e i basis vector (with only i-th component = 1 and 0 elsewhere). Eq. 216 means that the active
constraints other than the i-th one is to be strictly satisfied (=0). We solve for p = - e i / A, and the next solution
is along the path x+ p. Therefore, the first constraint to be encountered (Ci = 0) along this search path is the
minimum positive to satisfy Ci (x+ p) = 0. We have
min {i = - Ci (xcurrent) / (d(C)i p), i-th constraint in the inactive set}
Eq. 217
The constraint corresponding to this smallest positive i (in the inactive set) will be added to the active set, A.
Program Listing 23 (project: linear_programming_active_set) implemented the class Active_Set. The
criterion to determine whether a constraint is active or not is replaced by
Ai > - , (instead of Ai = 0)
Eq. 218
where is a small positive number. The Active_Set keeps track of the active state of each constraint in the
original constraint equations. Upon calling the public member function Active_Set::activate() the current
active set is assembled and a coefficient matrix will be formed. A public member function
Active_Set::active_state(int) requires an integer argument as the order of the original constraint equations and
returns the order of the current constraint equations in the active set. The coefficient matrix can be retrieved
using another free function d( ) such as
C1 X(3, x), C(6, 3, (double*)0, (double*)0);
C[0] = 2*X[0]+ X[1]+ X[2] - 2;
C[1] =
X[0]+2*X[1]+3*X[2] - 5;
C[2] = 2*X[0]+2*X[1]+ X[2] - 6;
C[3] = - X[0];
1. p. 176 in P.E. Gill, W. Murray, and M.H. Wright, 1981, Practical Optimization, Academic Press, Inc., San Diego.
119
Chapter

C[4] = - X[1];
C[5] = - X[2];
Active_Set A(C);
A.activate();
C0 lambda = - d(f) / (~d(A));
// C is the constraints
// form active set and coefficient matrix
// Eq. 210 as = - f / (A)T
The problem has been recast to the standard form for the active set method. For problems with equality constraints the Active_Set constructor can be called as
Active_Set A(C, 3);
A second integer argument number indicates the number of equality constraints. These equality constraints will
be always kept in the active set.
A minor technical detail of Active_Set is the private data member _active_state is initialized to -1. When
a constraint is determined to be included in the active set the value of _active_state is set to the order of the
constraint in the current active set. When a constraint is determined to be dropped from the active set, the value
of its _active_state is set to -2, which means this particular constraint can never be activated again. This
treatment may avoid possible zigzagging or jamming of the searching path caught in an infinite loop1.
Program Listing 23 implemented the active set algorithm. The core steps are
Step 1: Lagrange multipliercompute = - f / (A)T, if i 0 stop, the solution is optimal.
Step 2: outselect the constraint corresponding to the most negative i to be dropped from the active set.
Step 3: in compute p = A-1ei, then, for all inactive i, i = - Ci / (d(C)i p). If no i is greater than 0, stop.
The constraint corresponding to the smallest positive i is to be added to the active set.
Step 4: repeat Step 1-3.
Step 4 is written with a do-while control statement. The termination criterion is to have all Lagrange multiplier corresponding to the active constraints to be positive. This condition is the second part of Eq. 213.
1. see p. 330, Chapter 11 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing
Company, Inc., Reading, MA.
120

#include "include/vs.h"
class Active_Set {
C0 _A;
C1 *Constraint_ptr;
int n_equality, size_c, n_active, *_active_state;
public:
Active_Set(C1& C, int n = 0);
~Active_Set() { delete [] _active_state; }
int active_state(int i) {return _active_state[i];}
int active_no() { return n_active; }
C0& A() { return _A; }
void activate();
void deactivate(int i);
friend C0& d(Active_Set&);
};
Active_Set::Active_Set(C1& C, int n) {
Constraint_ptr = &C;
n_equality = n;
size_c = Constraint_ptr->row_length();
_active_state = new int[size_c];
for(int i = 0; i < size_c; i++) _active_state[i] = -1;
}
void Active_Set::activate() {
n_active = 0;
for(int i = 0; i < n_equality; i++) {
_active_state[i] = n_active++;
}
for(int i = n_equality; i < size_c; i++)
if((double) ((C0)(*Constraint_ptr))[i] > -1.e-10)
if(_active_state[i] >= -1) {
}
if(n_active > 0) _A &= C0(n_active, Constraint_ptr->col_length(),(double*)0);
for(int i = 0; i < size_c; i++)
if(_active_state[i] >= 0) _A[_active_state[i]] = d(*Constraint_ptr)[i];
}
void Active_Set::deactivate(int i) {
for(int j = 0; j < size_c; j++)
if(_active_state[j] == i) { _active_state[j] = -2; break; }
}
C0& d(Active_Set& a) { return a._A; }
class Active_Set
coefficients of the active constraints
C1* to constraint equations
return order in active set

total number of active constraints
form active set
drop out of the active set
return active set coefficient matrix
number of equality constraints
the initial state for all constraints is

inactive
add equality constraints
check inequality constraints to add on
the active set
form coefficient matrix of the active set
drop from active set
Listing 25 class Active_Set data abstraction (project: linear_programming_active_set).
121
Chapter
int main() {
double x[3] = {0.0, 0.0, 0.0};
C1 X(3,x),
C(6,3, (double*)0, (double*)0),
f;
const int ALL_POSITIVE = 1;
const double EPSILON = 1.e-12;
int lambda_flag;
f &= -3*X[0]-X[1]-3*X[2];
C[0] = 2*X[0]+ X[1]+ X[2] - 2;
C[1] = X[0]+2*X[1]+3*X[2] - 5;
C[2] = 2*X[0]+2*X[1]+ X[2] - 6;
C[3] = - X[0];
C[4] = - X[1];
C[5] = - X[2];
Active_Set A(C);
do{
A.activate();
lambda_flag = ALL_POSITIVE;
C0 lambda = - d(f) / ~d(A);
int i_cache = -1;
double min_lambda = -EPSILON;
for(int i = 0; i < A.active_no(); i++)
if((double)lambda[i] < -EPSILON) {
lambda_flag = !ALL_POSITIVE;
if((double)lambda[i] < min_lambda) {
min_lambda = lambda[i];
i_cache = i;
}
}
if(!lambda_flag) {
A.deactivate(i_cache);
C0 e(3,(double*)0); e[i_cache] = 1.0;
C0 p = - e / d(A);
double min_alpha = 1.e20;
int activate_flag = FALSE;
for(int i = 0; i < 6; i++) {
if(A.active_state(i) <= -1) {
double alpha, temp = (double)(d(C)[i]*p);
if(fabs(temp) > EPSILON)
alpha = -(double)((((C0)C)[i])/temp);
if(alpha < min_alpha && alpha > 0.0) {
min_alpha = alpha;
activate_flag = TRUE;
}
}
}
if(activate_flag) {
((C0)X) += min_alpha * sd;
((C0)C[0]) = 2*((C0)X[0])+ ((C0)X[1])+ ((C0)X[2]) - 2;
((C0)C[1]) = ((C0)X[0])+2*((C0)X[1])+3*((C0)X[2]) - 5;
((C0)C[2]) = 2*((C0)X[0])+2*((C0)X[1])+ ((C0)X[2]) - 6;
((C0)C[3]) = - ((C0)X[0]); ((C0)C[4]) = - ((C0)X[1]); ((C0)C[5]) = - ((C0)X[2]);
}
}
} while(!lambda_flag);
cout << "solution: " << ((C0)X) << endl << "maximum objective functiona: " <<
(3*((C0)X[0])+((C0)X[1])+3*((C0)X[2])) << endl;
return 0;
}
6 constraints, 3 variables
active set standard form

f(x) = -3x1 - x2 - 3x3
2x 1 + x 2 + x 3 2 , x 1 + 2x 2 + 3x 3 5
2x 1 + 2x 2 + x 3 6 , -x4
-x5, -x6
form active set and matrix d(A)
step 1: = - f / (A)T
step 2:select the most negative i
drop from active set

step 3: adjacent extreme point
search direction p = A-1ei
i = - Ci / (d(C)i p)
select smallest positive i
update solution
update constraint values
until all i are positive

x = {0.2, 0, 1.6}T
5.4
Listing 26 Active set method for linear programming (project: linear_programming_active_set).
122

2.3.2 Unconstrained Optimization
In this section, we deal with nonlinear objective functional without any constraint. The Newton-Raphson
method on page 110 is applied to an elliptic objective functional. The convergence is achieved in just one step. In
general, we may not encounter such an ideal situation. Therefore, we introduce a more challenging Rosenbrocks
function.1
2
f(x1, x2) = 100 (x2-x12) + (1-x1)2
Eq. 219
The unique minimum point (1, 1) is at a banana-shaped valley. For all problems in this section, the initial point
is selected at (-1.2, 1) such that an intelligent search path will have to make a turn along the banana-shaped valley to arrive at the minimum point. This objective functional could be used to test the robustness of an algorithm.
2
(-1.2, 1)
(1, 1)
-1
-2
-2
-1
Figure 2.8 Rosenbrocks function with minimum point at point (1, 1).
Classic Newtons Method

Classic Newtons method applied to the minimization of the Rosenbrocks function is implemented in Program Listing 27. Apart from the objective functional definition, the code is essential the same as Program Listing 22. The search path of the Newton iteration is shown in Figure 2.9. It takes only 6 iterations to get to the
final solution, because the rate of convergence of classic Newtons method is known to be quadratic. However,
a wild search path was taken at the second iteration. We are just lucky that the search path did not bounce out of
1. from p. 96 in P.E. Gill, W. Murray, and M.H. Wright, 1981, Practical Optimization, Academic Press, Inc., San Diego.
123
Chapter
int main() {
double v[2] = {-1.2, 1.0}, energy_norm;
C2 x(2, v), f;
int count = 0;
do {
f &=100.0*(x[1]-x[0].pow(2)).pow(2)+(1.0-x[0]*x[1]).pow(2);
C0 dx = - d(f) / dd(f);
(C0) x += dx;
energy_norm = norm(dx*(C0)f);
} while(++count < MAX_ITER_NO && energy_norm > EPSILON);
cout << Warning: convergence failed, energy norm: << energy_norm << endl;
else
return 0;
}
initial values, x0 = {-1.2, 1}T

f(x1, x2) = 100 (x2-x12)2 + (1-x1)2
Eq. 27, dx = - f,x(xi) / f,xx(xi)
update xi+1 = xi+dx
energy norm = || dx f ||
if convergence failed, output
energy norm
x = {1, 1}T
Listing 27 Minimization of Rosenbrocks function using classic Newtons method (project:

newton_rosenbrock).
-1
-2
-3
-2
-1
Figure 2.9 Searching path of Rosenbrocks function using classic Newtons

method.
124

the banana-shaped valley. Indeed, there is no guarantee that classical Newtons method will always converge.
In fact, the wild search path in classical Newton method can be tamed by using line search method with either
bisection, or golden section1. The line search direction p can be taken from Newtons formula that
p = - f,x(xi) / f,xx(xi)
Eq. 220
Along p the solution is updated according to xi+1 = xi+ p, where is a scalar parameter, and its optimal value is
determined by using line search, or even a scalar version of classical Newtons method (see next section on
steepest descent method). We consider bisection and golden section here. For line search algorithm, the minimum of a function is searched by evaluating the function and then comparing its values at selected bracketing
points. The basic idea is to have the bracketing interval contains the point with the minimum function value, and
at the same time make the bracketing interval smaller and smaller in an iterative algorithm.
Given a bracketing interval [a, c] for bisection method, the interval contains the point corresponding to the
minimum function value. At the middle of the interval is the point, b = (a+c)/2. The next bracketing point x is
taken as the middle of [b, c]; i.e., x = (b+c)/2. If f(x) > f(b), the next bracketing points are [a, x], otherwise, the
next bracketing points are [b, c]. Repeating this process, the bracketing interval will become smaller and smaller.
In the worst scenario, the selected intervals always lie on the larger segments. The bracketing intervals will
reduce at the rate of 0.75 2n = 0.5625n, where 2n is the number of repeated iterations. On the other hand, the best
case will be reducing at the rate of 0.252n = 0.0625n. On an average case, there is 50% chance of selecting either
larger or smaller segments, so the reducing rate is 0.25n 0.75n = 0.1875n.
Golden section finds an optimal ratio to avoid the worst case scenario compared to bisection method. Considering a triplet of points [a, b, c] with the ratio of interval [a, b] to interval [a, c] as , the interval [b, c] to interval [a, c] ratio will be 1-. The next bracketing point x, lies to the right of b with interval [b, x] to interval [a, c]
ratio as . First, since after comparison of function values the selected bracketing point can be either b or x, we
demand the symmetry of the two points by requiring that [a, x] (normalized length = + ) and [b, c] (normalized length = 1-) to be equal. Therefore,
+ = 1-
Eq. 221
Secondly, if x is to be selected as the next bracketing point, the ratio of interval [b, x] to interval [b, c] (= /(1-))
should be self-similar to the original ratio of interval [a, b] to interval [a, c] (= ). Therefore,
/(1-) =
Eq. 222
Eliminating in Eq. 221 and Eq. 222, we have

2 - 3 + 1 = 0
Eq. 223
One of the roots of Eq. 223 that is between [0, 1] is the ratio = 0.381971, with the ratio of selecting the larger
segment as 1- = 0.61803. Now, for the worst scenario the convergence rate reduces from 0.752n to 0.618032n.
1. p. 350 for bisection, and p. 399 for golden section, in W.H. Press, S.A. Teukolsky, W.T. Vetterlin, and B.P. Flannery,
1992, Numerical Recipes, Cambridge University Press, Cambridge, UK.
125
Chapter
However, it is easy to see both the best case scenario and the average case scenario for golden section reduces
the bracketing length at a slower rate than bisection method. Program Listing 28 implements golden section line
search with classical Newtons method for defining the search direction. The result is shown in Figure 2.10,
where the search path follows the banana shaped valley nicely. In Chapter 4, the least squares formulation for a
nonlinear finite element method on page 331 converges only after using the line search algorithm. In Chapter 5,
a relative large incremental step can be taken in a finite deformation elastoplastic finite element problem only
when the line search method is applied.
static double EPSILON = 1.e-12; static int MAX_ITER_NO = 100
int main() {
double v[2] = {-1.2, 1.0}; C2 X(2, v); C0 d_x, p; int count = 0;
do {
C2 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1-X[0]).pow(2);
p &= -d(f)/dd(f);
double left = 0.0, right = 1.0, length = right-left;
C1 x0(0.0), x1(0.0), phi(0.0), alpha(0.0);
do {
double alpha_temp = (C0) alpha = (left + 0.618 * length);
x0 = ((C0)X)[0] + alpha * p[0], x1 = ((C0)X)[1] + alpha * p[1];
phi = 100*(x1-x0.pow(2)).pow(2)+(1-x0).pow(2);
double golden_phi = (C0)phi; (C0) alpha = (left + 0.382 * length);
x0 = ((C0)X)[0] + alpha * p[0], x1 = ((C0)X)[1] + alpha * p[1];
phi = 100*(x1-x0.pow(2)).pow(2)+(1-x0).pow(2);
double left_phi = (C0)phi;
if(golden_phi < left_phi) { left = left + 0.382 * length; (C0)alpha = alpha_temp;
} else { right = left + 0.618 * length; }
length = right-left;
} while(length > 1.e-3);
d_x &= ((C0)alpha)*p; ((C0)X) += d_x;
cout << "solution " << (++count) << ": " << "{" << ((C0)X)[0] << ", "
<< ((C0)X)[1] << "}" << endl;
} while(((double)norm(p)) > EPSILON && count < MAX_ITER_NO);
return 0;
}

f(x1, x2) = 100 (x2-x12)2 + (1-x1)2
p = - f,x(xi) / f,xx(xi)
initial bracket [0, 1]
golden section at x
(x)
golden section at b
(b)
comparing (x), and (b) and reduce
the bracketing interval
update xi+1 = xi+ p
until || p || converges
x = {1, 1}T
Listing 28 Minimization of Rosenbrocks function using classic Newtons method (project:

newton_rosenbrock with macro definition __TEST_GOLDEN_SECTION defined at compile time).
2
-1
-2
-2
-1
Figure 2.10 Golden section line search with Newtons method for search direction.
126

Steepest Descent Method
The classic Newtons method requires second derivative information. In practical computation, this information can be very costly. For example, in finite element method (see Chapter 4), this is equivalent to form the socalled stiffness matrix. Even worse is when the problem size is large to get the inverse of the second derivative
information means we need to have a costly computation using matrix solver. It can be very demanding in terms
of computing time and memory space. The steepest descent method requires only first derivative information in
which the search direction p is
p = - f
Eq. 224
That is the search direction is taken along the negative gradient direction. This search direction makes intuitive
sense for the objective functional to decrease in the direction of the negative gradient. The solution is updated
through xi+1 = xi + p, where the scalar is the line search parameter. We seek a value of that gives the minimum value of f along the search direction p. For this one variable() optimization problem, the scalar version of
the Newtons method can be used for solving optimal value of . We may replace it with a more primitive
method such as the golden section line search1 described in previous section.
int main() {
double v[2] = {-1.2, 1.0};
C1 X(2, v);
C0 dx;
int count = 0;
do {
C1 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1-X[0]).pow(2);
C0 g = d(f);
C2 alpha(0.0);
C0 d_alpha;
do {
C2 x0 = ((C0)X)[0] + alpha * -g[0],
x1 = ((C0)X)[1] + alpha * -g[1];
C2 phi = 100*(x1-x0.pow(2)).pow(2)+(1-x0).pow(2);
d_alpha &= - d(phi) / dd(phi);
((C0)alpha) += d_alpha;
} while(((double)norm(d_alpha)) > EPSILON);
dx &= ((C0)alpha)*(-g);
((C0)X) += dx;
cout << "solution " << (++count) << ": " << ((C0)X) << endl;
} while(((double)norm(dx)) > EPSILON &&count < MAX_ITER_NO);
cout << "solution: " << ((C0)X) << endl;
}

f(x1, x2) = 100 (x2-x12)2 + (1-x1)2
p = - g = - f,x(xi); the gradient
line search along negative gradient
xi+1 = xi + p
(xi+1)
d = -d / d 2; Newtons formula
+= d
dx = p
update xi+1 = xi+dx
norm = || dx ||
x = {1, 1}T
Listing 29 Minimization of Rosenbrocks function using steepest descent method (project:

steepest_descent).
1. see p. 353 for bisection and p.397 for golden section search in W.H. Press, S.H. Teukolsky, W.T. Vetterling, and B.P. Flannery, 1992, Numerical Recipes in C, Cambridge University Press, Cambridge, U.K.
127
Chapter
Program Listing 29 implements steepest descent method. The line search parameter , is the only variable
in updating formula xi+1 = xi + p, where xi and p are regarded as constants, and is the parameter to search for
the minimum objective functional value. Therefore, in the definition for the objective functional in line search
algorithm is
f(x1, x2) = (xi+1()) = ()
where f depends on 2 variables, or in more general case a multi-variable functional, while only depends on
one variable . We note the difference of cost between doing multi-variable Newtons method as described on
page 110 and one-parameter Newtons method for the line search algorithm here. The resultant search path of
steepest descent method is shown in Figure 2.11. First of all the path shows the typical zigzag pattern consisting
of alternating orthogonal search directions. The convergence rate is extremely slow. After 100 iterations the
solution is still at (0.6, 0.36). The convergence becomes ever slower when it is approaching the true solution (1,
1). At 9660 iterations, the solution is still at (0.999941, 0.999883). Although each iteration in steepest descent
method is much cheaper than in Newtons method, the Newtons method takes only 6 iteration to get to (1, 1).
-1
-2
-2
-1
Figure 2.11 The search path of the steepest descent method up to 100 iterations.
128

Combined Newton and Steepest Descent Method
Newtons method is fast, but we can only cross our fingers to see whether the initial point given is close
enough to the minimum that a convergent course will be taken. On the other hand, the steepest descent method
has a much more stable searching path, but its convergence rate is unacceptable. From Eq. 27 on page 109,
Newtons formula is
dx = - f,x(xi) / f,xx(xi) = - g / H
where g (= f,x) is the gradient vector and H (= f,xx) is Hessian matrix. Now consider taking H = I, the above equation becomes
dx = - g
that is Newtons method has degenerated into steepest descent method. A weighted approach to combine these
two methods together is1
dx = - M g, where M = [ I + H]-1
Eq. 225
is the line search parameter in steepest descent method. However, the selection of the weighting parameter is
a matter of art. A more systematic way of implementing Eq. 225, or a probably more intelligent way, is to use
modified Cholesky decomposition2 introduced on page 32. The basic idea is to set M = H-1. Since Hessian
matrix is symmetrical, we can apply Cholesky decomposition. The problem with the Newtons method is that the
objective functional may not be quadratic and the Hessian matrix may not be positive definite. When we apply
the Cholesky decomposition, we modify small or negative diagonals according to
d = max {d , }
where d is the modified diagonals and is a small positive number to be supplied to the modified Choelsky
decomposition on page 32. That is the degeneration of Newtons method to steepest descent method occurs only
when the positive definitiveness of the Hessian matrix is in question.
Program Listing 210 implements combined steepest descent and Newton method using modified Choleksy
decomposition. The search path is shown in Figure 2.12. It takes 13 iterations to get to point(1, 1) about two
times iterations compared to the classic Newtons method. However, the wild search path in Figure 2.9 has been
tamed successfully. Combined Newton and steepest descent method seems to be a more robust method than classic Newton method and steepest descent method.
1. p. 226-227 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
2. p. 108-111 in P.E. Gill, W. Murray, and M.H. Wright, 1981, Practical Optimization, Academic Press, Inc., San Diego.
129
Chapter
int main() {
double v[2] = {-1.2, 1.0};
C2 X(2, v);
C0 dx;
int k = 0;
do {
C2 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g = d(f);
Cholesky mcd(dd(f), EPSILON);
C0 p = mcd * (-g);
C2 alpha(0.0), x0, x1;
C0 d_alpha;
do {
x0 &= ((C0)X)[0] + alpha * p[0]; x1 &= ((C0)X)[1] + alpha * p[1];
C2 phi = 100.0*(x1-x0.pow(2)).pow(2)+(1.0-x0).pow(2);
if(fabs((double)dd(phi)) > EPSILON) d_alpha &= -d(phi) / dd(phi);
else break;
} while(((double)norm(d_alpha)) > 1.e-8);
dx = ((C0)alpha)*p;
((C0)X) += dx;
} while(((double)norm(dx)) > EPSILON);
cout << "final solution: " << ((C0)X) << endl;
}

f(x1, x2) = 100 (x2-x12)2 + (1-x1)2
modified Cholesky decomposition on M
p = - g / M;
line search along p
xi+1 = xi + p
(xi+1)
d = -d / d2; Newtons formula
+= d
dx = p
update xi+1 = xi+dx
norm = || dx ||
x = {1, 1}T
Listing 210 Minimization of Rosenbrocks function using modified Cholesky decomposition (project:
combined_newton_and_steepest_descent).
-1
-2
-2
-1
Figure 2.12 Search path using modified Cholesky decomposition.

130

Conjugate Gradient Method
When steepest descent method was introduced, one of the advantage mentioned is that the costly second
derivative information is not necessary, and the need to invert it can also be avoided. The inversion of the second
derivative information means that the solution of a matrix problem is bypassed. The above combined Newton
and steepest descent method although solves the problem of slow convergence of the steepest descent method, it
require the second derivative information and its inversion just as in classic Newtons method. For some problems the second derivative information may not be available or very expensive to compute; e.g., the global stiffness matrix of the finite element method in Chapter 4. What we should do is to fix the steepest descent method
that only requires first derivative information, and to improve its convergence rate. The basic idea is to make the
subsequent search direction pi+1 orthogonal to all previous search directions {p1. p2. ... pi }. For a quadratic
objective functional of dimension n, theoretically, n iterations will be sufficient to reach the exact solution.
We first approximate a general objective functional in quadratic form as
1
f ( x ) --- x T H x b T x
2
Eq. 226
The solution is sought along x i+1 = x i + p i , where dx = p i. Minimize f(x i+1) with respect to dx (where x i+1
= x i + dx) , we get
i = - (pi)Tf i / ( (pi)THpi)-1
Eq. 227
where f = Hx - b = g. Therefore, we have

gi+1 - gi = H (xi+1- xi) = H (i pi)
Eq. 228
Since search directions pi are orthogonal to each other, we have (pi)TH pj = 0, for i j. From Eq. 228, we have
(gi+1 - gi)T pj = i (pi)TH pj = 0.
Eq. 229
The conjugate direction pi+1 that is orthogonal to all its previous directions is taken as
pi+1 = - gi+1 + i pi
Eq. 230
Pre-multiplying Eq. 230 with (gi+1 - gi)T we have left-hand-side of Eq. 230 equal zero. In view of Eq. 229, i
can be solved for as
i = (gi+1 - gi)Tgi+1 / [(gi+1 - gi)Tpi]
Eq. 231
Applying the orthogonal relations to Eq. 231 we have the Fletcher-Reeves formula
i = (gi+1)Tgi+1 / [ (gi)Tgi]
Eq. 232
i = (gi+1 - gi)Tgi+1 / [ (gi)Tgi]
Eq. 233
or Polak-Ribiere formula
131
Chapter
Program Listing 211 implements conjugate gradient method. The basic steps are1
Step 1: search directioncompute p0 = - g0 = - f(x0)T,
Step 2: looppartial conjugate gradient method, loop over n dimension
a: line searchx i+1 = x i + p i to minimize f(x i+1) in place of Eq. 227
b: gradient at x i+1gi+1 = f(xi+1)T
c: search directionpi+1 = - gi+1 + i pi (Eq. 229), where
Fletcher-Reeves: i= (gi+1)Tgi+1 / [ (gi)Tgi] (Eq. 232)or,
Polak-Ribiere:
i = (gi+1 - gi)Tgi+1 / [ (gi)Tgi] (Eq. 233)
Step 3: restartrepeat Step 1 and 2, and reset x0 = xn.
int main() {
double v[2] = {-1.2, 1.0};
const int MAX_NO_OF_ITERATION = 30;
int k = 0;
C1 X(2, v);
C0 dx, p;
do {
for(int i = 0; i < 2; i++) {
C1 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g = d(f);
if(i == 0) p &= - g;
C2 alpha(0.0), x0, x1;
C0 d_alpha;
do {
x0 &= ((C0)X)[0] + alpha * p[0]; x1 &= ((C0)X)[1] + alpha * p[1];
else break;
dx &= ((C0)alpha)*p;
((C0)X) += dx;
if(i != 1 && ((double)norm(dx)) > EPSILON) {
C1 f1 = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g1 = d(f1),
beta = g1.pow(2) / g.pow(2);
p = -g1 + beta * p;
}
cout << "solution(" << ++k << "): " << ((C0)X) << endl;
}
} while(((double)norm(dx)) > EPSILON && k < MAX_NO_OF_ITERATION);
cout << "The final solution: " << ((C0)X) << endl;
}
f(x1, x2) = 100 (x2-x12)2 + (1-x1)2

p = - g;
line search along p
xi+1 = xi + p
(xi+1)
+= d
dx = p
update xi+1 = xi+dx
norm = || dx ||
gi+1 = f(xi+1)T
i= (gi+1)Tgi+1 / [ (gi)Tgi]
pi+1 = - gi+1 + i pi
x = {1, 1}T
Listing 211 Minimization of Rosenbrocks function using conjugate gradient method (project:
conjugate_gardient_method).
1. p. 253 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
132

In the above steps, conjugate gradient algorithm is reset every n search directions. This is known as partial
conjugate gradient method. Recall that for a quadratic functional, theoretically, the conjugate gradient method
should arrive at the solution in n iterations. However, there are truncation errors in numerical computation. The
repeated loop is necessary to get to the final solution.
The result of the conjugate gradient method using only first derivative information is shown in Figure 2.13.
We set the program to terminate at 30 iterations. The solution at this stage is (0.995522, 0.990224). The convergence rate has a dramatic improvement over the steepest descent method in which after 100 iterations the solution only gets up to the point (0.6, 0.36), and the result is an unacceptable in any standard. In Chapter 5, we
implemented the conjugate gradient method for an elastoplastic finite element problem.
-1
-2
-2
-1
Figure 2.13 Conjugate gradient method using only first derivative information, and 30
iterations.
Quasi-Newton Method
The advantage of both steepest descent method and conjugate gradient method is that it requires only first
derivative information. This is important especially for problems with large number of variables. However, we
saw classic Newtons method or its modification that have second derivative information enjoys a faster convergence rate. The strategy is we want to stick with first derivative method because of economical consideration.
Since along the iterative steps we have two sequences of first derivative information, {x0, x1, ... , xi} and {g0, g1,
133
Chapter
..., gi}, we can use these sequences of first derivative information to construct an approximated second derivative
information. For search direction pi = xi+1 - xi, and qi = gi+1 - gi, the finite difference quotient gives
H(xi) = (gi+1-gi)/(xi+1 - xi)
Eq. 234
Eq. 234 is Hi pi = qi, which is also known as quasi-Newton condition. pi = qi / Hi = Bi qi , where Bi = (Hi)-1 is the
inverse of Hessian. We seek a rank one update formula with the form of
B
i+1
= B +uv
Eq. 235
where u and v are vectors, which need further constraints. Enforcing quasi-Newton condition first, it can be
shown that the second term is
i
i i
(p B q ) v
u v = ------------------------------------i
vp
Eq. 236
Bi+1 in Eq. 235 satisfy quasi-Newton condition but is not symmetrical. We can symmetrized it as (denoted with
superscript s)
(Bi+1)s = (Bi+1 + (Bi+1)T) / 2
Eq. 237
However, (Bi+1)s may not satisfy quasi-Newton condition. We can use the above two steps (1) quasi-Newton
condition, and (2) symmetrization repeatedly to yield a sequence of Bi+1. The limit of the sequence gives the
updating formula which is known as the Davidon-Fletcher-Powell (DFP) method
i
i i
i i
(B q ) (B q )
i p p
i+1
- ---------------------------------------B DFP = B + ----------------i
i
i
i i
p p
q (B q )
Eq. 238
This is a rank-two update formula. If the update is performed on the Hessian H itself instead of its inverse B, we
have a complementary formula by substituting B for H, and p for q and vice versa. Then, taking inverse of this
expression gives the alternative Broyden-Fletcher-Goldfarb-Shanno (BFGS) updating formula
i
q ( B q ) p p p ( B q ) + ( B q ) p
i+1
- ------------------ ---------------------------------------------------------------B BFGS = B i + 1 + -------------------------i
i
i
i
i
q pi p p
q p
i
i i
i i
i i
Eq. 239
Program Listing 212 implemented the quasi-Newton method. The basic steps are1
Step 1: search directioncompute di = - Bi gi ,
Step 2: looppartial quasi-Newton method, loop over n dimension
a: line searchx i+1 = x i + d i to minimize f(x i+1), we get x i+1 , p i = d i and
Reading, MA.
134

b: gradient differencegi+1 = f(xi+1)T, qi = gi+1 - gi
c: inverse of Hessian update
i
i i
i i
(B q ) (B q )
i p p
i+1
- ---------------------------------------DFP: BDFP = B + ----------------i
i
i
i i
p p
q (B q )
i
, or
q ( B q ) p p p ( B q ) + ( B q ) p
+1
BFGS: B iBFGS
- ---------------------------------------------------------------- ----------------= B i + 1 + -------------------------i
i
i
i
i
q pi p p
q p
Step 3: restartrepeat Step 1 and 2, and reset Bi .
i
i i
i i
i i
There are two ways to think of initial B. The first is that B may not be available at all. So initially B is set to
identity matrix. The second is that B computation is very expensive, so B is only computed at the initial step of
every restart. In between quasi-Newton method takes over without having to have the second derivative information. This method is popular in finite element method, in which the formation of global stiffness matrix and its
solution is equivalent to compute the inverse of HessianB. The result of BFGS computation is shown in Figure
2.14. It takes 34 iterations to arrive at the solution point (1, 1). In Chapter 5 we show an example of an elastoplastic finite element problem implemented with the BFGS method.
-1
-2
-2
-1
1
2
0
Figure 2.14 Searching path of the BFGS method. The solution point (1, 1) is arrived at after
34 iterations.
135
Chapter
int main() {
double v[2] = {-1.2, 1.0};
int k = 0;
C0 dx;
C1 X(2, v);
C2 x(2, v);
do {
((C0)x) = ((C0)X);
C2 F = 100.0*(x[1]-x[0].pow(2)).pow(2)+(1.0-x[0]).pow(2);
C0 B = dd(F).inverse();
for(int i = 0; i < 2; i++) {
C1 f = 100.0*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g = d(f), d = B*(-g);
C2 alpha(1.0), x0, x1;
C0 d_alpha;
do {
x0 &= ((C0)X)[0] + alpha * d[0];
x1 &= ((C0)X)[1] + alpha * d[1];
else break;
dx &= ((C0)alpha)*d;
((C0)X) += dx; // update the solution
if(((double)norm(d_x)) > EPSILON) {
C1 f1 = 100*(X[1]-X[0].pow(2)).pow(2)+(1.0-X[0]).pow(2);
C0 g1 = - d(f1), p = dx, q = g1 -g, Bq = B * q;
B += (1.0+(q*Bq)/(q*p))*((p%p)/(p*q)) - (p%Bq+Bq%p)/(q*p);
}
cout << "solution(" << (++k) << "): " << ((C0)X) << endl;
}
} while(k < MAX_NO_OF_ITERATION && ((double)norm(dx)) > EPSILON);
cout << "Final solution: " << ((C0)X) << endl;
}
f(x1, x2) = 100 (x2-x12)2 + (1-x1)2

B = H-1, initial inverse of Hessian
d = - B g;
line search along d
xi+1 = xi + d
(xi+1)
+= d
dx = d
update xi+1 = xi+dx
norm = || dx ||
p i = dx , qi = gi+1 - gi
BFGS updating formula Eq. 239
x = {1, 1}T
Listing 212 Minimization of Rosenbrocks function using BFGS method (project: quasi_newton_bfgs).
136

2.3.3 Constrained Optimization
In Section 2.3.1 on linear programming, we introduced (1) the basic set method and (2) the active set method.
In this section we will introduce (1) the reduced gradient method and (2) the gradient projection method. For
nonlinear optimization, these two methods are parallel to the previous two methods, respectively.
Reduced Gradient Method

The problem is expressed in a form, keeping close to the notations in linear programming,
minimize f(xB, xD)
subject to B xB + D xD = b
x B 0, x D 0
where B and D split the basic and non-basic sets of the constraint coefficients and variables. From the constraint equations we have
xB = B-1 b - B-1 D xD
Therefore, the objective functional can be expressed in xD alone as
f(B-1 b - B-1 D xD, xD)
The reduced gradient rT is the derivative of the objective functional f with respect to xD as
d xB
r T = x B f ( x B, x D ) ---------- + x D f ( x B, x D ) = xD f ( x B, x D ) xB f ( x B, x D ) B 1 D = c D c B B 1 D
dx
Eq. 240
where c B = xB f ( x B, x D ) and c D = x D f ( x B, x D ) . Basic_Set class needs some modification for nonlinear

expression as shown in Program Listing 213. The coefficient vector of the objective functional cT = f, and the
coefficient matrix of the constraint equations A [B, D] = dC, are now accessed through free functions C0&
df(Basic_Set&) and C0& dC(Basic_Set&), which are declared as friend of class Basic_Set. The pointer to
variables are also declared as a private data member of the Basic_Set, which can be accessed by the member
function C0& X()
We consider a specific example1 already written in standard form as
minimize x12 +x22 + x32 + x42 - 2x1 - 3x4
subject to 2x1 + x2 + x3 + 4x4 = 7
x1 + x2 + 2x3 + x4 = 6
x 1 0, x 2 0, x 3 0, x 4 0
1. p. 347 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
137
Chapter
class Basic_Set {
C1 *_A, *_c, *_X;
int row_size, col_size, *_basic_order;
public:
Basic_Set(C1&, C1&, C1&);
~Basic_Set() { delete [] _basic_order; }
int basic_order(int i) {return _basic_order[i];}
void swap(int i, int j);
C0& X();
friend C0& dC(Basic_Set&);
friend C0& df(Basic_Set&);
};
C0& Basic_Set::X() { return (*(_X)).F(); }
C0& dC(Basic_Set& a) { return d(*(a._A)); }
C0& df(Basic_Set& a) { return d(*(a._c)); }
Basic_Set::Basic_Set(C1& C, C1& f, C1& X) {
row_size = C.row_length();
col_size = C.col_length();
_basic_order = new int[col_size];
for(int i = 0; i < col_size; i++) _basic_order[i] = i;
_A = &C; _c = &f; _X = &X;
}
void Basic_Set::swap(int i, int j) {
int old_basic_order = _basic_order[i];
_basic_order[i] = _basic_order[j]; _basic_order[j] = old_basic_order;
C0 old_Ai(row_size, (double*)0);
old_Ai = d(*_A)(i); d(*_A)(i) = d(*_A)(j); d(*_A)(j) = old_Ai;
C0 old_ci(0.0); old_ci = d(*_c)[i];
d(*_c)[i] = d(*_c)[j]; d(*_c)[j] = old_ci;
C0 old_Xi(0.0); old_Xi = ((C0)(*_X))[i];
((C0)(*_X))[i] = ((C0)(*_X))[j]; ((C0)(*_X))[j] = old_Xi;
}
class basic set

constraint, coefficients of the objective
functional, and variables
variables
constraint coefficients
objective functional coefficients
initialize original variable order array
swap order
swap columns
swap objective functional coefficients
swap variables
Listing 213 class Basic_Set data abstraction for nonlinear problem (project: reduced_gradient).
Program Listing 214 implements the reduced gradient method using class Basic_Set in Program Listing
213. The basic steps are
Step 1: reduced gradientrT = cD - cB B-1 D,
Step 2: xDif ri < 0 or xDi > 0, xDi = -ri else xDi = 0
Step 3: xBif xDi = 0, current solution is optimal, else xB= B-1 DxD
Step 4: feasible boundsmax { B: xB+B xB 0 }, and max { D: xD+D xD 0 }
Step 5: line searchmin{f(x + x): 0 B, 0 D }, update with xi+1 = xi + x
Step 6: swapif B swap the vanishing xB with vanishing xD, and corresponding [B, D] and [cB, cD] .
The difference from the linear programming version is evident now that the fundamental theorem of linear programming is not applicable any more; i.e., the extremum value may occur in the middle of an edge or even in the
138

interior of the feasible domain. It is not sufficient to check from vertex to vertex for optimal objective functional
value as was done in linear programming. The consequence is that a line search is performed for the minimum
value while the vertex value is used only as an upper bound (Step 4, and 5). The result of the optimal solution is
at x = {1.148, 0.683, 1.806, 0.553}T with the minimum objective functional value, f = 1.40348.
139
Chapter
int main() {
double rhs[2] = {7.0, 6.0}, v[4] = {2.0, 2.0, 1.0, 0.0}, norm_dxd,
EPSILON = 1.e-12, HUGE = 1.e20, RELAXED = 1.e3;
int k = 0, MAX_NO_OF_ITER = 10;
C1 X(4, v), C = VECTOR_OF_TANGENT_BUNDLE("int, int", 2, 4);
C[0] = 2*X[0]+ X[1]+ X[2]+4*X[3];
C[1] = X[0]+ X[1]+2*X[2] + X[3];
C1 f = X[0].pow(2)+X[1].pow(2)+X[2].pow(2)+X[3].pow(2)-2*X[0]-3*X[3];
Basic_Set BS(C, f, X);
C0 B(2, 2, dC(BS), 0, 0), D(2, 2, dC(BS), 0, 2), c_B(2, df(BS), 0), c_D(2, df(BS), 2);
C0 X_B(2, BS.X(), 0), X_D(2, BS.X(), 2), b(2, rhs), x(4, (double*)0);;
do {
C0 d_X_B(2, (double*)0),d_X_D(2, (double*)0),
B_inv = B.inverse(), r_D = c_D - c_B * B_inv * D;
for(int i = 0; i < 2; i++)
if((double) r_D[i] < -EPSILON || (double) X_D[i] > EPSILON) d_X_D[i] = - r_D[i];
else d_X_D[i] = 0.0;
if((norm_dxd = norm(d_X_D)) > RELAXED*EPSILON) {
d_X_B = - B_inv * D * d_X_D;
double alpha_B=HUGE,alpha_D=HUGE,ratio_B,ratio_D;int min_B=-1,min_D=-1;
for(int i = 0; i < 2; i++) if((double)d_X_B[i] < EPSILON) {
ratio_B = (double) - X_B[i]/d_X_B[i];
if(ratio_B < alpha_B) { alpha_B = ratio_B; min_B = i; }
}
for(int i = 0; i < 2; i++) if((double)d_X_D[i] < EPSILON) {
ratio_D = (double) - X_D[i]/d_X_D[i];
if(ratio_D < alpha_D) { alpha_D = ratio_D; min_D = i; }
}
C0 d_X = d_X_B & d_X_D, d_alpha;
C2 alpha(0.0), x[4];
do {
for(int i = 0; i < 4; i++)
x[i] = ((C0)X)[BS.basic_order(i)] + alpha * d_X[BS.basic_order(i)];
C2 phi = x[0].pow(2)+x[1].pow(2)+x[2].pow(2)+x[3].pow(2)-2*x[0]-3*x[3];
else break;
} while((double)norm(d_alpha) > RELAXED*EPSILON);
if((double)(C0) alpha >= alpha_B) {
(C0) alpha = alpha_B;
BS.swap(min_B, min_D+2);
}
if((double)(C0) alpha > alpha_D) (C0) alpha = alpha_D;
d_X *= ((C0)alpha);
((C0)X) += d_X;
}
f = X[BS.basic_order(0)].pow(2)+X[BS.basic_order(1)].pow(2)+
X[BS.basic_order(2)].pow(2)+X[BS.basic_order(3)].pow(2)
-2*X[BS.basic_order(0)]-3*X[BS.basic_order(3)];
df(BS) = d(f);
} while(++k < MAX_NO_OF_ITER && norm_dxd > RELAXED*EPSILON);
for(int i = 0; i < 4; i++) x[i] = ((C0)X)[BS.basic_order(i)];
C0 fp = x[0].pow(2)+x[1].pow(2)+x[2].pow(2)+x[3].pow(2)-2*x[0]-3*x[3];
cout << "The final solution: " << x << endl << "f: " << fp << endl;
}
Listing 214 Reduced gradient method (project: reduced_gradient).
140
initial feasible point, x0 = {2, 2, 1, 0}T

constraints and objective functional
C[0] = 2x1 + x2 + x3 + 4x4
C[1] = x1 + x2 + 2x3 + x4
f(x) = x12 +x22 + x32 + x42 - 2x1 - 3x4
A = [B, D], cT = [cBT, cDT],
xT = [xBT, xDT], bT = {7, 6}T
xB, xD
Step 1: rT = cD - cB B-1 D
Step 2: xDi = -ri
xDi = 0
Step 3: if not xDi = 0
xB= B-1 DxD
Step 4:
max { B: xB+B xB 0 },
max { D: xD+D xD 0 }
Step 5:
xT = [xBT, xDT]
line search along x
xi+1 = xi + x
(xi+1)
d = -d / d 2; Newtons formula
+= d
min{f(x + x):
0 B, 0 D }
Step 6: update B and D
dx = d
update xi+1 = xi+dx
update f
update c
update x
x = {1.148, 0.683, 1.806, 0.553}T
f = 1.40348

Gradient Projection Method
An inequality constrained problem will be tackled in this section using the active set method. In Figure 2.7 on
the introduction of Lagrange multiplier method we showed that for a point to be at extremum condition the gradient of the objective functional f can be expressed as linear combination of gradient of the active constraint
equations A such as the first-order condition in Eq. 210
f +T A = 0
This optimal condition states that the two gradients are parallel to each other (in 2-D representation). In general
during the course of optimization, where the convergence has not yet achieved, these two gradients are not parallel to each other, and we can project f on the tangent plane such as at the point x (as in Figure 2.7). The projected gradient on the tangent plane is a vector d to be used as search direction is expressible as (see also Figure
2.15
p = f A
T
Eq. 241
where the second term - A is the component orthogonal to the tangent plane. In view of Eq. 241, p vanishes
when the left-hand-side equals zero; i.e., the first-order condition of an extremum point is satisfied. Since the
search direction p on the tangent plane is orthogonal to the gradient of the constraint equations A, we have the
orthogonal relationship as A p = 0. Pre-multiply Eq. 241 with A and solve for ,1
T -1
= -(A A ) Af
Eq. 242
then, substitute back to Eq. 241 to eliminate , we obtain

T
T -1
p = -[I - A (A A ) A] g = - P g,
T
T -1
where P = [I - A (A A ) A] is the projection matrix which projects the negative gradient - g to the tangent plane to define the search direction p.
Active_Set class has already been shown in Program Listing 23. Program Listing 215 implements the
gradient projection method. The basic steps are2
hypotenuse: -g = f
x
A
T
p = f - A
tangent plane
Figure 2.15 Project negative gradient to the tangent plane as p (search direction).
Reading, MA.
2. modified from p. 332-333 in D.G. Luenberger, 1989, same as the above.
141
Chapter
Step 1: active set,

Step 2: projection matrix,
if number of constraints 1, P = [I - AT (A AT)-1 A] , p = - Pf
else p = - f (no constraint, degenerate to steepest descent method)
Step 3: if p 0,
feasible boundsmin { i: i = - Ci / (d(C)i p)}
line searchmin{f(x + p): 0 i }, update with xk+1 = xk + p
go to step 1
Step 4: else if p = 0,
Kuhn-Tucker conditionI = {i | Ci = 0}, for i I, if i 0; break
deactivateelse deactivate the constraint with the most negative .
go to step 1.
A specific example solved in Program Listing 215 is 1
f(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2
subject to x1 + x2 4
-x1 0
-x2 0
The objective functional is the same one that has been solved in Eq. 25 without constraints, and implemented in
Program Listing 22. The result of the present constrained optimization of Eq. 25 is shown in Figure 2.16. The
search path taken by this program is: (1) the initial feasible point is set at (0, 0). The first active set formed consists of the second and the third constraints (-x1 0, and -x2 0). This initial active set will produce no search
direction (p = 0). (2) The Lagrange multiplier of this active set is {-12, -10}T. Therefore, the second constraint (
-x1 0 with = -12) is dropped from the active set. (3) The search path, p, is along the x-axis. A line search
will be activated to find the minimum objective functional value of f(x1, x2) at (3, 0). (4) the third constraint is
dropped (-x2 0 with = -10) from the active set leaving no constraint in the set. A steepest descent of objective functional f(x1, x2) produces a search direction parallel to y-axis. The first constraint (x1 + x2 4) will be
encountered along this search direction at (3, 1). (5) With only one constraint left, a line search is performed
which gives (1.5, 2.5) as the minimum point. Since the only Lagrange multiplier is positive. (1.5, 2.5) is taken as
the final optimal solution.
In the gradient projection method, when there is no constraint, the method reduces to steepest descent
method. Figure 2.11 on page 128 shows an example of how steepest descent method can be a nightmare in the
computation. The steepest descent is really the backbone behind the gradient projection method. We need a second-order method to improve the convergence rate.
Reading, MA.
142

int main() {
double v[2] = {0.0, 0.0}, EPSILON = 1.e-12; int ALL_POSITIVE = TRUE, lambda_flag;
C1 X(2,v), C = VECTOR_OF_TANGENT_BUNDLE("int, int", 3, 2);
C1 f = TANGENT_BUNDLE("int", 2);
f = 2*X[0].pow(2)+X[0]*X[1]+X[1].pow(2)-12*X[0]-10*X[1];
C[0] = X[0] + X[1] -4;
C[1] = -X[0]
; C[2] =
- X[1];
Active_Set A(C);
for(;;) {
A.activate();
C0 lambda, p;
if(A.active_no() == 0) p &= -d(f);
else {
lambda &= - (d(A)*d(f)) / (d(A)*~d(A));
p &= -d(f)- d(A)*lambda;
}
if(fabs((double)norm(p)) > EPSILON) {
double min_alpha = 1.e20;
for(int i = 0; i < 3; i++)
if(A.active_state(i) <= -1) {
if(fabs(temp) > EPSILON) alpha = -(double)(((C0)C)[i]/temp);
if(alpha < min_alpha && alpha > 0.0) { min_alpha = alpha; active_flag = TRUE; }
}
C0 d_alpha(0.0); C2 alpha(0.0), x0, x1, F;
do {
x0 = ((C0)X[0]) + alpha * p[0]; x1 = ((C0)X[1]) + alpha * p[1];
F &= 2*x0.pow(2)+x0*x1+x1.pow(2)-12*x0-10*x1;
d_alpha = - d(F)/dd(F);
} while((double)norm(d_alpha) > EPSILON);
if((double)((C0)alpha) < min_alpha) min_alpha = (double)((C0)alpha);
C0 dx = min_alpha * p; ((C0)X) += dx;
((C0)C[0]) = ((C0)X[0]) + ((C0)X[1]) -4;
((C0)C[1]) = -((C0)X[0]) ; ((C0)C[2]) = ((C0)X[1]);
f = 2*X[0].pow(2)+X[0]*X[1]+X[1].pow(2)-12*X[0]-10*X[1];
} else {
int i_cache = -1; double min_lambda = -EPSILON;
lambda_flag = ALL_POSITIVE;
if((double)lambda[i] < -EPSILON) {
if((double)lambda[i] < min_lambda) { min_lambda = lambda[i]; i_cache = i; }
}
if (lambda_flag) break;
else A.deactivate(i_cache);
}
cout << ((C0)X) << endl;
}
cout << "solution: " << ((C0)X) << endl;
return 0;
}
initial feasible point (0, 0)

f(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2
x1 0, x2 0
initialize active set
Step 1. form active set
Step 2: projection matrix
no constraint degenerated to
steepest descent p = - f
T -1
= -(A A ) A f
p = - f - AT , Eq. 241
Step 3: if p 0
min{i:i =- Ci / (d(C)i p)}
line search
min{f(x + p): 0 i }
updates
xk+1 = xk + p
C(xk+1)
f(xk+1)
Step 4: if p = 0
most negative
Kuhn-Tucker condition;
214
Eq.
i 0 , A i 0, and i A i = 0
until for Ai = 0 i 0
drop constraint corresponding to
the most negative
Listing 215 Gradient projection method (project: gradient_projection).
143
Chapter

x2
f(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2

(2, 4)
feasible region
(1.5, 2.5)
2
(3, 1)
x1 + x2 = 4
x1
(0, 0)
0
Figure 2.16
144
(3, 0)
2
Active set method on a constrained quadratic functional

Lagrange Method
For simplicity we restrict the problem to linear equality constraints with quadratic objective functional; i.e.,
the equality constrained quadratic programming. For generalization to the inequality constrained problem with
inequality constraint equation as C(x). A(x), the subset of C(x), will be the currently active constraints. Consider
a quadratic program
minimize
1
f(x) = -2 x T H x + g T x
subject to
A(x) = Ax - b = 0
where H = f,xx is the Hessian matrix, and g = f,x is the gradient vector. The Lagrangian functional using the
Lagrange multiplier method such as Eq. 211 on page 118 is
T
T
1
l ( x, ) = f ( x ) + A ( x ) = --- x T H x + g T x + ( Ax b )
2
Eq. 243
The Euler-Lagrange equations give the first-order optimal conditions of the Lagrangian functional with respect
to x and .
l,x(x,) = Hx+AT + g = 0
l,(x,) = Ax - b = 0
Eq. 244
The second-order optimal condition requires the Hessian matrix H be positive definite. That is always true for a
quadratic functional. An incremental version with xi+1 = xi +x can be substituted in f(x) and A(x). One can view
the expression f ( x i + x ) f ( x i ) + gT x + 1--- ( x )T H x as an approximation using second-order Taylor expan2
sion to the objective functional, with the current active constraint equations as
A(xi+1) = A(xi +x) A(xi) + A x = A(xi) + A x = 0
With these relations, the Euler-Lagrange equations can be re-written in matrix form with the incremental solution, x, as
T
H A
A 0
f ( x i )
A ( x i )
Eq. 245
Using first equation in Eq. 245, we get x = H-1 (f(xi ) - AT). Notice that we have relied on the symmetrical
positive definitiveness of H to have its inverse. Substituting this back to eliminate x in the second equation
gives an equation: AH-1 (f(xi ) - AT) = -A(xi). Solving this equation for gives
= (AH-1AT)-1 [A(xi) - AH-1f(xi )]
Eq. 246
We first compute , then, solve x as

x = H-1 (f(xi ) - AT).
Eq. 247
145
Chapter
In summary, we compare the Lagrange method to the previous methods. One can view Eq. 241 and Eq. 242
for the gradient projection method as the simplified approximation of Eq. 246 and Eq. 247 for the Lagrange
method. First, in the gradient projection method we use only the first derivative information. If we set H-1 = I in
Eq. 247 we get x = - f(xi ) - AT which is Eq. 241 (with p = x). Secondly, having H-1 = I in mind, gradient
projection method is projecting the gradient on the tangent plane of the constraint surface A x = 0 instead of
the approximated constraint surface A x = - A(xi) in Lagrange method. Plugging in the tangent plane A x
T -1
= 0 in Eq. 245 and H-1 = I, the Eq. 246 becomes = -(A A ) Af, which is exactly Eq. 242. Therefore,
in Lagrange method, we use gradient projection Eq. 242 in place of Eq. 246. The procedure of using such
approximated Lagrange multiplier in Lagrange method is an example known as the multiplier update method.
The class Active_Set needs some modification (see Program Listing 216). In the previous examples, in
linear programing and gradient projection method, class Active_Set only needs to store and update the tangent
plane (of the constraint surface) information. In Lagrange method, besides the tangent plane information, the
information on the constraint surface itself, specifically A(xi), also needs to be stored and updated; i.e., A x =
- A(xi) instead of A x = 0.
class Active_Set {
C1 _A, &Constraint;
int n_equality, size_c, n_active, *_active_state;
public:
Active_Set(C1& C, int n = 0);
~Active_Set() { delete [] _active_state; }
int active_state(int i) { return _active_state[i];}
int active_no() { return n_active; }
operator C0() { return ((C0)_A); }
void activate();
void deactivate(int i, int k = -2);
friend C0& d(Active_Set&);
};
Active_Set::Active_Set(C1& C, int n) : Constraint(C) {
n_equality = n;
size_c = Constraint.row_length();
_active_state = new int[size_c];
for(int i = 0; i < size_c; i++) _active_state[i] = -1;
}
void Active_Set::activate() {
n_active = 0;
for(int i = 0; i < n_equality; i++) _active_state[i] = n_active++;
for(int i = n_equality; i < size_c; i++)
if((double) ((C0)Constraint)[i] > -1.e-10 &&_active_state[i] >= -1)
if(n_active > 0) {
_A &=VECTOR_OF_TANGENT_BUNDLE("int, int", n_active, Constraint.col_length());
for(int i = 0; i < size_c; i++) if(_active_state[i] >= 0) _A[_active_state[i]] = Constraint[i];
}
}
void Active_Set::deactivate(int i, int k) {
for(int j = 0; j < size_c; j++) if(_active_state[j] == i) { _active_state[j] = k; break; } }
C0& d(Active_Set& a) { return d(a._A); }
class Active_Set
A(active constraints) and C
A(xi)
A
activate active set
update A from C
deactive a constraint
Listing 216 class Active_Set data abstraction for both Lagrange method and gradient projection method
(project: lagrangian_and_gradient_projection).
146

With the close relationship of gradient projection method and Lagrange method, we should implement the
two procedures in one program (see Program Figure 2.17). The macro definition __LAGRANGE is to be set at
compile time to build an executable module for Lagrange method. Without this macro definition set, gradient
projection method will be built as default. Program Listing 215 for gradient projection method and the current
Program Listing 217 is almost identical.
The result of Lagrange method is shown in Figure 2.17. The obvious difference is that when there is no constraint, Lagrange method takes a classic Newton direction, instead of a steepest descent direction, aiming at the
point (2, 4) which encounters the constraint of x 1 + x2 = 4 at point (2 2--- , 1 1--- ). This different search direction of
3
3
course doesnt affect to the reach of the final solution at (1.5, 2.5).
f(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2
x2
(2, 4)
4
feasible region
(1.5, 2.5)
2
(2.6667, 1.3333)
x1 + x2 = 4
x1
(0, 0)
0
(3, 0)
2
Figure 2.17 Lagrange method with active set method on a constrained quadratic
functional
147
Chapter
int main() {
double v[2] = {0.0, 0.0}, EPSILON = 1.e-12; int ALL_POSITIVE = TRUE, lambda_flag;
C1 X(2,v), C = VECTOR_OF_TANGENT_BUNDLE("int, int", 3, 2);
#if defined(__LAGRANGE)
C2 X2(2, v), f = 2*X2[0].pow(2)+X2[0]*X2[1]+X2[1].pow(2)-12*X2[0]-10*X2[1];
#else
C1 f = 2*X[0].pow(2)+X[0]*X[1]+X[1].pow(2)-12*X[0]-10*X[1];
#endif
C[0] = X[0] + X[1] -4; C[1] = -X[0]; C[2] = - X[1];
Active_Set A(C);
for(;;) {
A.activate(); lambda_flag = !ALL_POSITIVE; C0 lambda, p;
if(A.active_no() == 0)
p &= -d(f)/dd(f)
#else
p &= -d(f);
#endif
else {
lambda &= (((C0)A)-d(A)*dd(f).inverse()*d(f))/(d(A)*dd(f).inverse()*(~d(A)));
p &= - dd(f).inverse()*(d(f)+(~d(A))*lambda);
#else
lambda &= - (d(A)*d(f)) / (d(A)*~d(A));
p &= -d(f)- d(A)*lambda;
#endif
}
if(fabs((double)norm(p)) > EPSILON) { double min_alpha = 1.e20;
for(int i = 0; i < 3; i++) if(A.active_state(i) <= -1) {
if(fabs(temp) > EPSILON) alpha = -(double)(((C0)C)[i]/temp);
if(alpha < min_alpha && alpha > 0.0) { min_alpha = alpha; active_flag =
TRUE;}}
C0 d_alpha(0.0); C2 alpha(0.0), x0, x1, F;
do { x0 = ((C0)X[0]) + alpha * p[0]; x1 = ((C0)X[1]) + alpha * p[1];
F &= 2*x0.pow(2)+x0*x1+x1.pow(2)-12*x0-10*x1;
d_alpha = - d(F)/dd(F); ((C0)alpha) += d_alpha;
} while((double)norm(d_alpha) > EPSILON);
if((double)((C0)alpha) < min_alpha) min_alpha = (double)((C0)alpha);
C0 dx = min_alpha * p; ((C0)X) += dx;
((C0)C[0]) = ((C0)X[0]) + ((C0)X[1]) -4;
((C0)C[1]) = -((C0)X[0]) ; ((C0)C[2]) = ((C0)X[1]);
((C0)X2) = ((C0)X) ;
f = 2*X2[0].pow(2)+X2[0]*X2[1]+X2[1].pow(2)-12*X2[0]-10*X2[1];
#else
f = 2*X[0].pow(2)+X[0]*X[1]+X[1].pow(2)-12*X[0]-10*X[1];
#endif
} else {
int i_cache = -1; double min_lambda = -EPSILON; lambda_flag = ALL_POSITIVE;
if((double)lambda[i] < -EPSILON) { lambda_flag = !ALL_POSITIVE;
if((double)lambda[i] < min_lambda) { min_lambda = lambda[i]; i_cache = i; } }
if (lambda_flag) break; else A.deactivate(i_cache);
}
cout << ((C0)X) << endl;
} cout << "solution: " << ((C0)X) << endl;
initial feasible point (0, 0)

f(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2
x1 0, x2 0
Step 1. form active set
Step 2: search direction p
no constraint degenerate to
classic newton p = - f / 2f,
or,
steepest descent p = - f
= (AH A )
-1
[A(xi) - AH f(xi )]
-1
i
p = H (f(x ) - AT)
T -1
or,
= -(A A ) Af
T
p = - f - A , Eq. 241
Step 3: if p 0
min{i : i = - Ci / (d(C)i p)}
-1 T -1
line search
min{f(x + p): 0 i}
updates
xk+1 = xk + p
C(xk+1)
f(xk+1)
Step 4: if p = 0
most negative
Kuhn-Tucker condition;
214
Eq.
i 0 , A i 0, and i Ai = 0
until for Ai = 0 i 0
drop constraint corresponding to
Listing 217 Lagrange method and gradient projection method (project: lagrangian_and_gradient_projection).
148

Range Space and Null Space Methods
Both the reduced gradient method and the gradient projection method are known as reduced gradient type
method collectively. They have strong flavor of linear programming in them. At this point, for those who are
familiar with numerical linear algebra, we introduce the range space method and the null space method. The
range space and null space methods use the relatively efficient and reliable QR decomposition, that we have
introduced in Chapter 1. When rank deficiency in the constraint spaces becomes a concern, the more costly singular value decomposition can be used.
T
From Eq. 241, p = - f - A . The search direction p is lying on the tangent plane of the constraint surface
A; that is p is orthogonal to the gradient of the constraint surface A (=A) as (see also Figure 2.15)
A p = 0.
Eq. 248
In other word, Eq. 248 expresses that the search direction p is in the null space of A. At the optimal condition, p
= 0, the negative gradient of the objective functional - f is the linear combination of the range space of A (=
A); i.e., - f = AT.
Range Space Method: Recall Eq. 246 and Eq. 247 from Lagrange method and consider projection of negative
gradient f on the tangent plane M= {y| Ay= 0}and gradient of constraint surface A, we have
-1 T -1
-1
= - (AH A ) AH f
-1
T
p = - H (f + A )
Eq. 249
T
A is of size m n and 0 m n. We can perform QR decomposition on A , that is A = QR. Assuming no

T
degeneracy condition, the first m columns of Q span the range space of A . Denoting Y (matrix of size n m)
T
YR into the first equation of Eq. 249, we have
consists the m columns of the range space. Substituting A
the range space method1
T
-1
-1
T -1
= - (Y H Y) Y H f
-1
p = - H (f + Y)
Eq. 250
T
On page 36 we discussed that the round-off error could accumulate in the multiplication of the normal form A A
in the least square problem, where the QR decomposition is used to control the condition number of the problem.
-1 T
In the first equation of Eq. 249, the condition number can increase by the multiplication operations in AH A ,
and we may run into trouble when its inverse is taken. Since columns of Y are orthonormal, in the first equation
-1
T -1
of Eq. 250, the condition number of Y H Y is as good as that of H . Therefore, the range space method with
Eq. 250 is numerically superior to Eq. 249.
The range space method is implemented in Program Listing 218 for solving the same problem that the
reduced gradient method solved in Program Listing 214. The core steps are
1. see p. 183-184 in P.E. Gill, W. Murray, and M.H. Wright, 1981, Practical Optimization, Academic Press, Inc., San
Diego.
149
Chapter
int main() {
const double RELAXED = 1.e3;
double v[4] = {2.0, 2.0, 1.0, 0.0};
C2 X(4, v);
C2 C = VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE("int, int", 2, 4);
C[0] = 2*X[0]+ X[1]+ X[2]+4*X[3] - 7.0;
C[1] = X[0]+ X[1]+2*X[2]+ X[3] - 6.0;
C0 p = VECTOR("int", 4);
C0 A = d(C), Q = QR(~A).Q();
C0 Y = MATRIX("int, int", 4, 2);
for(int i = 0; i < 2; i++) Y(i) = Q(i);
int k = 0;
do {
C0 H_inv = dd(f).inverse();
C0 lambda_bar = - ((~Y)*H_inv*Y).inverse() * (~Y) *H_inv* d(f);
p = - H_inv*(Y*lambda_bar+d(f));
((C0)X) += p;
f = X[0].pow(2)+X[1].pow(2)+X[2].pow(2)+X[3].pow(2)-2*X[0]-3*X[3];
++k;
cout << "solution{" << k << "): " << ((C0)X) << endl << "f: " << ((C0)f) << endl;
} while(k < MAX_NO_OF_ITERATION && (double)norm(p) > RELAXED*EPSILON);
cout << "The final solution: " << ((C0)X) << endl << "f: " << ((C0)f) << endl;
}
AT = QR
Y has columns in the range space of Q
-1
H
-1 -1
-1
= - (YTH Y) YTH f
-1
p = - H (f + Y)
x = {1.148, 0.683, 1.806, 0.553}T

f = 1.40348
Listing 218 Range space method (project: range_space).

Step 1: range space, QR decomposition and form Y
Step 2: search direction,
-1 -1
-1
Lagrange multiplier estimate: = - (YTH Y) YTH f
-1
search direction: p = -H (f + Y)
For inequality constrained problems, these core steps need to be embedded in a program with the aid of class
Active_Set such as what is implemented in Program Listing 217.
Null Space Method: Consider search direction p on tangent plane M= {y| Ay= 0}, we have any such direction
satisfies
Ap=0
Denote Z contains n - m columns of the null space. The search direction p is the linear combination of the columns of Z as
p = Z pz
Eq. 251
where pz is a vector of size n-m. Taylor expansion to second-order such as Eq. 26 on page 109 with xi+1 = xi +
p is
150

1
f ( x i + p ) f ( x i ) + f,x ( x i )p + --- p T H ( x i )p
2
Substituting Eq. 251 into the increment p of the last equation yields
1
f ( x i + p ) = f ( x i + Zp Z ) f ( x i ) + f ,x ( x i )Zp Z + --- p ZT Z T H ( x i )Zp Z
2
Eq. 252
Minimization of Eq. 252 gives

p (= Z pz) = - Z(ZTHZ) ZTf
-1
Eq. 253
Denoting the projected Hessian as Hz = ZTHZ, and the projected gradient as (f)z = ZTf, we recover the classic
Newton method on the null space as
pz = - (f)z / Hz
Therefore, the meaning of the n-m vector pz is clear. From first equation of Eq. 245 we have Hp+AT + f =
0. Substituting Eq. 253 into this equation, we can solve for the Lagrange multiplier if necessary (such as for inequality constrained problems where values of is needed for the active set method)
= - (AAT)-1A(Hp+ f)
Eq. 254
Program Listing 219 implemented the null space method. The kernel steps are simple
Step 1: null space, QR decomposition and form Z
-1
Step 2: search direction, p =- Z(ZTHZ) ZTf
In retrospect, we should discuss the counterpart (dual) of the projected Hessian and projected gradient of null
space Hz = ZTHZ, and (f)z = ZTf, respectively. Assume x* as a local solution of the primal problem: minimize
f(x) subject to A(x) = 0. The lagrangian functional from Eq. 211 can be re-written for the dual problem as
l(x*, ) = (x*(), ) = () = f(x*()) + T A(x*())
The first order derivative of () is
( ) = [ f ( x ( ) ) + T A ( x ( ) ) ] x ( ) + A ( x ( ) )
Eq. 255
From the term, f ( x ( ) ) + T A ( x ( ) ) = 0 , we have

( ) = A ( x ( ) )
Eq. 256
2
( ) = A ( x ( ) ) x ( )
Eq. 257
For the second order derivative of ()
151
Chapter
int main() {
const double RELAXED = 1.e3;
double v[4] = {2.0, 2.0, 1.0, 0.0};
C2 X(4, v);
C2 C = VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE("int, int", 2, 4);
C[0] = 2*X[0]+ X[1]+ X[2]+4*X[3] - 7.0;
C[1] = X[0]+ X[1]+2*X[2]+ X[3] - 6.0;
C0 p = VECTOR("int", 4);
C0 A = d(C);
C0 Q = QR(~A).Q();
C0 Z = MATRIX("int, int", 4, 2);
for(int i = 0; i < 2; i++)Z(i) = Q(i+2);
int k = 0;
do {
p = Z * ((~Z)*dd(f)*Z).inverse() * (~Z) * -d(f);
((C0)X) += p;
f = X[0].pow(2)+X[1].pow(2)+X[2].pow(2)+X[3].pow(2)-2*X[0]-3*X[3];
++k;
cout<< "solution{" << k << "): " << ((C0)X) << endl << "f: " << ((C0)f) << endl;
} while(k < MAX_NO_OF_ITERATION && (double)norm(p) > RELAXED*EPSILON);
cout << "The final solution: " << ((C0)X) << endl << "f: " << ((C0)f) << endl;
}
AT = QR
Z has columns in the null space of Q
p = - Z(ZTHZ)-1 ZTf
x = {1.148, 0.683, 1.806, 0.553}T

f = 1.40348
Listing 219 Null space method (project: null_space).

Taking derivative of f ( x ( ) ) + T A ( x ( ) ) = 0 with respect to we have
H ( x ( ), ) x ( ) + A ( x ( ) ) = 0
Eq. 258
Therefore, we get
T
x ( ) = H 1 ( x ( ), ) A ( x ( ) )
Eq. 259
Substituting Eq. 259 back to Eq. 257 we have

T
2
( ) = A ( x ( ) )H 1 ( x ( ), ) A ( x ( ) )
Eq. 260
That is = A(x), and = - AT H-1 A. The classic Newton method for the dual problem coincides with the
first term in Eq. 246 of Lagrange method. The Hessian of the dual, - AT H-1 A, governs the convergence rate
of the dual problem.
It is helpful to point out that the existence and uniqueness of the constrained problem is known to be associated with the abstract form of a saddle-point problem1 (see Figure 2.18). A saddle function is shown as the
1. p. 30 in M.M. Sewell, 1987, Maximum and minimum principles, Cambridge University Press, Cambridge, UK.
152

x
saddle point
Figure 2.18 An example saddle function l(x, )= x2 - 2, where x is the

minimizer and is the maximizer.
Lagrangian functional l(x, ), which is minimized with respect to x and maximized with respect to . Requiring
both positive definitiveness of l,xx and negative definitiveness of l, gives a strong (sufficient) condition for a
unique solution.
In summary, the range space method works on m dimensional space, where the most important step is to
estimate the Lagrange multiplier, a vector of length m. The null space method works on n-m dimensional
space, where we compute projected search direction pz (of length n-m) from projected gradient and projected
Hessian as pz = - (f)z / Hz. For small size problems, the null space method is numerically more reliable, because
in the range space method the inverse of the dual Hessian is a doubly inverted factor (YTH-1Y)-1 in the estimation
of Lagrange multiplier, which may cause significant cancellation errors. For a large size problem, the size of
(YTH-1Y)-1 in Eq. 250 of the range space method, and the size of (ZTHZ)-1 in Eq. 253 of the null space method
may significantly affect the size of the memory space, consequently, the cost of the inverse computation. For
number of constraints less than half of the variable number the range space method is less demanding, and for
number of constraints greater than half of the variable number null space has the advantage.
Penalty Methods
Penalty method transforms a constrained problem into an unconstrained problem by defining a penalty objective functional, for example, with a quadratic penalty term of active constraints as
minimize q(x) = f(x) + P(x) = f(x) + ( /2) A (x)T A(x)
Eq. 261
where is the penalty parameter and the second term is designed to penalized the objective functional when the
constraints are violated. The steps of the penalty method are
153
Chapter

Step 1: initial x0 and 0 , use x0 which satisfies A(x0) = 0, and starts with an initial small 0 to make
f(x0) has more weight compared to the P(x0) in the minimization to obtain a new solution x1.
Step 2: for loop, increase the penalty constant such that i+1 > i . Therefore, P(xi) has increasing
weight with respect to f(xi); i.e., active constraints are penalized increasingly.
The final solution is the limiting point x, with , although when is too big the problem becomes ill-conditioned. The advantage of penalty method lies on the simplicity of Eq. 261. No advanced concept needs to be
introduced. The disadvantage is that we are left with a incremental procedure in which the problem needs to be
solved many times with an empirical sequence of . We emphasize that it is necessary to start with a smaller ,
and then, increase it subsequently. If we ignore the need for a incremental procedure and compute with only one
big , the solution can be completely different. Starting with too big a penalty parameter the solution is to satisfy
constraints overwhelmingly with no concern of the minimization of the objective functional. However, in many
engineering applications some magic is often recommended for their own application domain. The use of this
magic is an art rather than science.
In view of the shape of a saddle function shown in Figure 2.18, it is obvious that the quadratic form of the
penalty function P(x) = (/2) || A(x) ||2 is the most popular one, where P,xx is positive semi-definite; i.e., the
penalty term convexifies the primal (x-variables). Comparing Eq. 210, f +T A = 0 (the first-order condition), with the penalty objective functional
q(x) = f(x) + A(x)T A(x) = 0,
we see
= A(x)
Eq. 262
This can be used as the updating formula i+1 = i + A(x) for the simplest form of multiplier update method
discussed on page 146. This updating formula can be used for the augmented lagrangian method introduced
later.
Now consider a specific example we have been solving in previous sections
f(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2
-x1 0
-x2 0
For simplicity we should drop the inequality constrained part of the problem and consider only the equality constraint
x1 + x2 = 4
Assume that we are at the final constraint set of the active set method. We use x = (2, 2) which is clearly on the
constraint line and use penalty method to search for the final solution (1.5, 2.5). The penalty objective functional
q(x) is defined as
154

A(x1, x2) = x1 + x2 - 4
q(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10 x2 + --- (A (x1, x2) T A(x1, x2) )
2
These two equations are the kernel of the penalty method, and the original constrained problem has been transform completely into an new unconstrained problem. Program Listing 220 implemented this simplified problem. This coding should be embedded into the active set method for a more general inequality constrained
problem.
int main() {
const int DOF = 2; const int MAX_NO_OF_ITERATION = 20;
const double EPSILON = 1.e-12; double x[DOF] = {2.0, 2.0};
C2 q, A = TANGENT_OF_TANGENT_BUNDLE("int", DOF), X(DOF,x);
double rho = 1.0, delta_X;
C0 d_x, X_cache = VECTOR("int", DOF);
int k0 = 0;
do {
rho *= 10.0;
int k1 = 0;
do {
A = X[0] + X[1] -4;
q &= 2*X[0].pow(2)+X[0]*X[1]+X[1].pow(2)-12*X[0]-10*X[1]
+ (0.5*rho)*A.pow(2);
d_x &= -d(q) / dd(q);
(C0)X += d_x;
} while ((double)norm(d_x) > EPSILON && ++k1 < 10);
cout << "solution(rho=" << rho << ", " << k1 << "): " << ((C0)X) << endl;
delta_X = norm(X_cache - ((C0)X));
X_cache = ((C0)X);
} while(++k0 < MAX_NO_OF_ITERATION && delta_X > 1.e6*EPSILON);
cout << "The Final solution: " << ((C0)X) << endl;
}
A(x1, x2) = x1 + x2 - 4
q(x1, x2) = 2x12 + x1x2 + x22 -12x1 -10
x2 + --- (A(x1, x2)T A(x1, x2) )
2
dx = - q,x(xi) / q,xx(xi)
x += dx
x = {1.5, 2.5}T
Listing 220 Penalty method with a single equality constraint (project: penalty_one_constraint).
Since the penalty problem has transformed an equality constrained problem into an unconstrained problem,
various unconstrained optimization methods in Section 2.3.2 are applicable to the penalty method. We apply
classic Newton method, conjugate gradient method, and combined Newton and steepest descent method to the
penalty method in the following. Consider a less trivial problem with 10 variables and 4 equality constraints such
as1
1. from p. 381 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, Inc.,
Reading, MA.
155
Chapter

10
minimize f(x) = ix i2
i=0
subject to 1.5 x1 + x2 + x3 +0.5 x4 + 0.5 x5 = 5.5

2.0 x6 - 0.5 x7 -0.5 x8 + x9 - x10 = 2.0
x1 + x3 + x5 + x7 + x9 = 10
x2 + x4 + x6 + x8 + x10 = 15
Program Listing 221 implemented the classic Newton method version of the penalty method for the above
constrained problem. Definition of the penalty objective functional is q(x) = f(x) + ( /2) (A (x)T A(x) ), and the
classic Newton formula is dx = - q,x(xi) / q,xx(xi) with the update x += dx. The solution is x = {-2.00, 2.66, 2.39,
3.62, 3.27, 2.87, 3.87, 3.16, 2.47, 2.69}T with the minimum objective functional value, f = 502.4.
int main() {
const double EPSILON = 1.e-12; const double RELAXED = 1.e6;
double rho = 1.0, delta_X, x[DOF] = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
C2 f, q, X(DOF,x),
A = VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE("int, int", 4, DOF);
int k0 = 0;
do {
rho *= 10.0;
int k1 = 0;
do {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4] -5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]= X[0] +X[2] + X[4] + X[6] + X[8] -10.0;
A[3]= X[1] + X[3] + X[5] + X[7] + X[9]-15.0;
f &= X[0].pow(2)+2*X[1].pow(2)+3*X[2].pow(2)+4*X[3].pow(2)+5*X[4].pow(2)+
6*X[5].pow(2)+7*X[6].pow(2)+8*X[7].pow(2)+9*X[8].pow(2)+10*X[9].pow(2);
q &= f + (0.5 * rho) * A.pow(2);
d_x &= -d(q) / dd(q);
((C0)X) += d_x;
} while ((double)norm(d_x) > RELAXED*EPSILON
&& ++k1 < MAX_NO_OF_ITERATION);
cout << "solution(rho=" << rho << ", " << k1 << "): " << ((C0)X) <<
" f: " << ((C0)f) << " q: " << ((C0)q) << endl;
X_cache = ((C0)X);
} while(++k0 < MAX_NO_OF_ITERATION && delta_X > RELAXED*EPSILON);
}
A0 =1.5 x1+x2+x3 +0.5 x4 +0.5 x5- 5.5

A1 =2.0 x6-0.5 x7-0.5 x8+x9-x10-2.0
A2 =x1 + x3 + x5 + x7 + x9 - 10
A3 =x2 + x4 + x6 + x8 + x10 - 15
10
f(x) = ix i2
i=0
q(x) = f(x) + ( /2) (A(x) T A(x) )

dx = - q,x(xi) / q,xx(xi)
x += dx
x = {-2.00, 2.66, 2.39, 3.62, 3.27, 2.87,

3.87, 3.16, 2.47, 2.69}T
f = 502.4
Listing 221 The classic Newton version for penalty method (project: penalty_newton).
Program Listing 222 implemented conjugate gradient method version of the penalty method for the same
equality constrained problem in the above. Conjugate gradient uses line search along search direction p as xi+1 =
xi + p, and its objective functional is redefined to be a one-parameter function in as (xi+1()) = (). The
one-parameter line search uses Newtons formula d = -d()/d 2() to find the minimum of (). Conjugate
gradient is computed using Fletcher-Reeves formula with gi+1=q(xi+1)T, and i= (gi+1)Tgi+1 / [(gi)Tgi] as Eq.
156

232 on page 131. The next search direction is computed using pi+1 = - gi+1 + i pi, accordingly. The final result is
essentially the same as the result of classic Newton method.
Combined Newton and steepest descent method version of penalty method is discussed next. When we introduced combined Newton and steepest descent method, Newton method and steepest descent method are applied
simultaneously using modified Cholesky decomposition. The implementation is extremely simple. Here, we go
back to the basics. Denote the tangent plane of the constraints as M = {y | A T y = 0}. First, Newton method is
applied on the orthogonal complement of the tangent plane M. Then, the steepest descent method is applied on
M. These two steps are executed in a repeated sequences1. We first recall the penalty objective functional from
Eq. 261
q(x) = f(x) + ( /2) (A(x)T A(x) )
Define the gradient of q(x) as
q(x) = f(x) + A(x)T A(x)
Denote Q(x) the Hessian of q(x) as
Q(x) = H(x) + A(x)T A(x)
Eq. 263
For the Newton method on M, any movement on this subspace can be expressed as xi+1 = xi + A u. Define
such incremental movement on M for the penalty objective functional
T
q(u) = q(xi + A (xi) u)

The gradient of q(u) is
q(u) =q(xi + A(xi)T u) A(xi)T
Then, denote Q(x) as the Hessian of q(u) at u = 0, and substitute Eq. 263 in place of Q(x), we have
T
Q(xi) = A(xi)( H(xi) + A(xi) A(xi)) A(xi)
Eq. 264
For , we can approximate Eq. 264 as

T
Q(xi) (A(xi) A(xi) )2
Eq. 265
Therefore, the Newton search direction p projected on M is
1. p. 282-284, and p. 384-387 in D.G. Luenberger, 1989, Linear and Nonlinear Programming, Addison-Wesley Publishing
Company, Inc., Reading, MA.
157
Chapter
int main() {
const int DOF = 10; const int MAX_NO_OF_ITERATION = 10; int k0 = 0, k1 = 0;
C1 f, q, X(DOF,x), A = VECTOR_OF_TANGENT_BUNDLE("int, int", 4, DOF);
double c = 1.0, delta_X, x[DOF] = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
C0 d_x, sd, delta_q, X_cache( DOF, (double*)0);
do {
rho *= 10.0;
do {
for(int i = 0; i < 10; i++) {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4] -5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8]-10.0;
A[3]=X[1] +X[3] +X[5] +X[7] +X[9]-15.0;
f &=X[0].pow(2)+2*X[1].pow(2)+3*X[2].pow(2)+4*X[3].pow(2)+
5*X[4].pow(2)+6*X[5].pow(2)+7*X[6].pow(2)+8*X[7].pow(2)+
9*X[8].pow(2)+10*X[9].pow(2);
q &= f + (0.5 * rho) * A.pow(2);
C2 alpha(0.0), x[10], psi[4]; C0 d_alpha, g = d(q); int k2 = 0;
if(i == 0) p &= - g;
do {
for(int j = 0; j < 10; j++) x[j] &= ((C0)X)[j] + alpha * p[j];
psi[0]=1.5*x[0]+x[1]+x[2]+0.5*x[3]+0.5*x[4]-5.5;
psi[1]= 2.0*x[5]-0.5*x[6]-0.5*x[7]+x[8]-x[9]-2.0;
psi[2]= x[0] + x[2] + x[4] + x[6] + x[8] -10.0;
psi[3]= x[1] + x[3] + x[5] + x[7] + x[9] -15.0;
C2 phi = x[0].pow(2)+2*x[1].pow(2)+3*x[2].pow(2)+4*x[3].pow(2)+
5*x[4].pow(2)+6*x[5].pow(2)+7*x[6].pow(2)+8*x[7].pow(2)+
9*x[8].pow(2)+10*x[9].pow(2)+(0.5 * rho) * (psi[0].pow(2)+
psi[1].pow(2)+psi[2].pow(2)+psi[3].pow(2));
if((double)dd(phi) > EPSILON) d_alpha &= -d(phi) / dd(phi); else break;
} while(++k2 < MAX_NO_OF_ITERATION &&
(double)norm(d_alpha) > RELAXED*EPSILON);
d_x &= ((C0)alpha)*p; ((C0)X) += d_x;
if(i != 9 && (double)norm(d_x) > EPSILON) {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4]-5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8]-10.0;
A[3]=X[1] +X[3] +X[5] +X[7] +X[9]-15.0;
C1 f1 = X[0].pow(2)+2*X[1].pow(2)+3*X[2].pow(2)+4*X[3].pow(2) +
5*X[4].pow(2)+6*X[5].pow(2)+7*X[6].pow(2)+8*X[7].pow(2)+
9*X[8].pow(2)+10*X[9].pow(2);
C1 q1 = f1 + (0.5 * rho) * A.pow(2);
C0 g1 = d(q1), beta = g1.pow(2) / g.pow(2);
p = -g1 + beta * p; delta_q &= (C0)q - (C0)q1;
if((double)norm(delta_q) < RELAXED*EPSILON ||
(double)norm(d_x) < RELAXED*EPSILON) break;
}
}
} while (++k1 < MAX_NO_OF_ITERATION && (double)norm(d_x) >
RELAXED*EPSILON && (double)norm(delta_q)> RELAXED*EPSILON);
delta_X = norm(X_cache - ((C0)X)); X_cache = ((C0)X);
}
A0 =1.5 x1+x2+x3 +0.5 x4 +0.5 x5- 5.5

A1 =2.0 x6-0.5 x7-0.5 x8+x9-x10-2.0
A2 =x1 + x3 + x5 + x7 + x9 - 10
A3 =x2 + x4 + x6 + x8 + x10 - 15
10
f(x) =
ixi2
i=0
q(x) = f(x) + ( /2) (A(x) A(x) )

p = - g;
line search along p
xi+1 = xi + p
(xi+1)
d= -d/d2; Newtons formula
+= d
dx = p
update xi+1 = xi+dx
norm = || dx ||
conjugate direction
T
gi+1 = q(xi+1)
T
i= (gi+1)Tgi+1 / [ (gi) gi]
pi+1 = - gi+1 + i pi
x = {-2.00, 2.66, 2.39, 3.62, 3.27, 2.87,

T
3.87, 3.16, 2.47, 2.69}
f = 502.4
Listing 222 The conjugate gradient method for penalty formulation (project: penalty_conjugate_gradient).
158

p = -A(xi) T q(0) T / Q(xi) A T(xi) q(xi)T A (xi) / ((A(xi) T A(xi))2)
= - --1- A T(xi) (A(xi)A(xi) T )-2 A (xi) q(xi)T
Eq. 266
The kernel steps of the combined Newton and steepest descent customized for penalty method are
Step 1: Newton method on M,
search direction: p = - --1- A T(xi) (A(xi)A(xi) T )-2 A (xi) q(xi)T
line search: minimize q(xi + p), update ri = xi + p
Step 2: steepest descent on M,
search direction: p = - q(ri)T
line search: minimize q(ri + p), update xi+1 = ri + p
We implemented these two steps with three C++ functions in Program Listing 223: (1) the line search is performed in both steps. We factor out this procedure and code it into a line_search() function, (2) Newton method
applied to M is implemented as newton_on_orthogonal_complement_of_tangent(), and (3) steepest descent
on M is implemented as steepest_descent_on_tangent(). Program Listing 224 implemented the main program
of the combined Newton and steepest descent method using the above three functions.
159
Chapter
void line_search(C2& X, C0& p, C2& alpha, double rho) {
const int DOF = 10; const double EPSILON = 1.e-12;
const double RELAXED = 1.e6; const int MAX_NO_OF_ITERATION = 10;
((C0)alpha) = 0.0;
C2 x[DOF], A[4]; C0 d_alpha; int k2 = 0;
do {
C2 phi;
for(int j = 0; j < 10; j++) x[j] &= ((C0)X)[j] + alpha * p[j];
A[0]=1.5*x[0]+x[1]+x[2]+0.5*x[3]+0.5*x[4] -5.5;
A[1]= 2.0*x[5]-0.5*x[6]-0.5*x[7]+x[8]-x[9]-2.0;
A[2]=x[0] +x[2] +x[4] +x[6] +x[8] -10.0;
A[3]=x[1] +x[3] +x[5] +x[7] +x[9]-15.0;
phi &= x[0].pow(2)+2*x[1].pow(2)+3*x[2].pow(2)+4*x[3].pow(2)+5*x[4].pow(2)+
6*x[5].pow(2)+7*x[6].pow(2)+8*x[7].pow(2)+9*x[8].pow(2)+10*x[9].pow(2)+
(0.5 * rho) *A.pow(2);
if((double)dd(phi) > EPSILON) d_alpha &= -d(phi) / dd(phi); else break;
} while(++k2 < MAX_NO_OF_ITERATION &&
(double)norm((double)d_alpha) > RELAXED*EPSILON);
}
void newton_on_orthogonal_complement_of_tangent(
C2& X, C2& f, C2& A, C2& q, C0& p, double rho) {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4] -5.5;
A[1]= 2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8] -10.0;
A[3]=X[1] +X[3] +X[5] +X[7] +X[9]-15.0;
q &= f + (0.5 * rho) * A.pow(2);
C0 grad_A = d(A), approx_Q_bar_inv;
approx_Q_bar_inv &= 1.0/rho * (grad_A*(~grad_A)).pow(2).inverse();
p &= (~grad_A) * approx_Q_bar_inv * grad_A * -d(q);
}
void steepest_descent_on_tangent(C2& X, C2& f, C2& A, C2& q, C0& p, double rho) {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4]-5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8]-10.0;
A[3]=
X[1] +
X[3] +
X[5] +
X[7] + X[9]-15.0;
q &= f + (0.5 * rho) * A.pow(2);
p &= -d(q);
}
line search along p

xi+1 = xi + p
A0 =1.5 x1+x2+x3 +0.5 x4 +0.5 x5- 5.5
A1 =2.0 x6-0.5 x7-0.5 x8+x9-x10-2.0
A2 =x1 + x3 + x5 + x7 + x9 - 10
A3 =x2 + x4 + x6 + x8 + x10 - 15
10
f(x) =
ixi2
i=0
(x) = f(x) + ( /2) (A A)

d= -d/d2; Newtons formula
Newton method on M
Q(xi) --1- (A(xi) A(xi) )-2

T
p = -A (xi) Q(xi) A (xi) q(xi)
steepest descent on M
p = - q(ri)
Listing 223 Three functions of the combined Newton and steepest descent version of penalty method (project:
penalty_combined_newton_and_steepest_descent).
160

int main() {
const int DOF = 10;
const int MAX_NO_OF_ITERATION = 12; int MAX_NO_OF_INNER_ITERATION = 6;
C2 f, q, X(DOF,x),
int k0 = 0;
double q_cache, d_q;
do {
rho *= 10.0;
int k1 = 0;
q_cache = d_q = 0.0;
MAX_NO_OF_INNER_ITERATION = ((rho < 1.e6) ? 1: 6);
do {
C0 p;
newton_on_orthogonal_complement_of_tangent(X, f, A, q, p, rho);
C2 alpha(0.0);
line_search(X, p, alpha, rho);
d_x &= ((C0)alpha)*p;
((C0)X) += d_x;
steepest_descent_on_tangent(X, f, A, q, p, rho);
((C0)alpha) = 0.0;
line_search(X, p, alpha, rho);
d_x &= ((C0)alpha)*p;
((C0)X) += d_x;
d_q = fabs( ((double)(C0)q)-q_cache );
q_cache = (double)(C0)q;
cout << "(c=" << c << ", k1=" << k1 << ")--" << " f: " << ((C0)f) << " q: " << ((C0)q)
<< " d_q: " << d_q << endl;
if((double)norm(d_x) < RELAXED*EPSILON && d_q < RELAXED*EPSILON)
break;
} while (++k1 < MAX_NO_OF_INNER_ITERATION && (double)norm(d_x) >
RELAXED*EPSILON && d_q > RELAXED*EPSILON);
cout << "solution(c=" << c << ", " << k1 << "): ";
cout << ((C0)X) << " f: " << ((C0)f) << " q: " << ((C0)q) << " d_q: " << d_q << endl;
X_cache = ((C0)X);
return 0;
}
call Newton method on M

line search
call steepest descent on M

line search
x = {-2.00, 2.66, 2.39, 3.62, 3.27, 2.87,

T
3.87, 3.16, 2.47, 2.69}
f = 502.4
Listing 224 The main program of the combined Newton and steepest descent version of penalty method
(project: penalty_combined_newton_and_steepest_descent).
161
Chapter
Augmented Lagrangian Method

Augmented Lagrangian method is ready to be tackled, since we have introduced both Lagrange method and
penalty method. For an equality constrained problem the augmented Lagragian functional is
lA(x, ) = f(x) + T A(x) + ( /2) A(x)T A(x)
Eq. 267
First, the obvious feature is now we are working on an augmented space of {x, } with the dimension n+m.
Second, from dual view point, comparing Eq. 267 with the Lagrangian functional in Lagrange method on
T
page 145, Eq. 267 has an extra penalty term ( /2) A(x) A(x). In view of the existence and uniqueness (strong)
condition of the saddle function shown in Figure 2.18. This quadratic penalty term in x convexifies the primal
(x-variables) of the Lagrangian functional l(x, ). Third, from penalty view point, Eq. 267 without the middle
term is the penalty objective functional ( q(x) in Eq. 261). The penalty method is always plagued by being not
consistent with the minimum of the Lagrangian functional l(x, ). We can easily show this by considering that
the first-order conditions of the constraint problem (l,x) and unconstrained problem (q) are not equal
l,x = f(x) + A(x) q = f(x) + A(x) A(x)
T
unless a special condition = A(x), Eq. 262, is met. On the other hand, the first derivative of the augmented
Lagrangian functional (lA) is
f(x) + A(x) + A(x) A(x) = 0
T
This is consistent with the first-order condition of Lagrange method as

f(x) + A(x) = 0
A(x) = 0
T
Therefore, the middle term A(x) in Eq. 267 makes the penalty method consistent with the first-order condition of Lagrange method. The algorithm of augmented Lagrangian method can be implemented with a nested
double loop. The outer loop is the penalty method in which we need to increase the penalty parameter from a
smaller number to a considerable greater number until the solution converge. The inner loop is to update the
Lagrange multiplier, in view of = A(x), as
T
i+1 = i + A(xi )
Eq. 268
in the hope that A(xi+1) = 0 can be achieved by this update. Program Listing 225 implemented augmented
Lagrangian method.
Perturbed Lagrangian Method

Augmented Lagrangian method has a quadratic penalty term that convexifies the primal. In the context of
quadratic programming, the Hessian of the primal is already positive definite. To make the condition for the
existence and uniqueness of a saddle point problem, we really want to convexfies the dual (-variables), which
162

int main() {
C2 f, l_A, X(DOF,x),
lambda = VECTOR_OF_TANGENT_OF_TANGENT_BUNDLE("int, int", 4, DOF);
C0 d_x, X_cache(DOF, (double*)0);
int k0 = 0;
do {
rho *= 10.0;
int k1 = 0;
do {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4]-5.5;
A[1]=2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]=X[0] +X[2] +X[4] +X[6] +X[8]-10.0;
A[3]=X[1] +X[3] +X[5] +X[7] +X[9]-15.0;
lambda += rho * A;
l_A &= f + lambda*A + (0.5 * rho) * A.pow(2);
d_x &= -d(l_A) / dd(l_A);
((C0)X) += d_x;
} while (++k1 < MAX_NO_OF_ITERATION &&
(double)norm(d_x) > RELAXED*EPSILON);
cout << "solution(rho=" << rho << ", " << k1 << "): " << ((C0)X) <<
" f: " << ((C0)f) << " q: " << ((C0)q) << endl;
X_cache = ((C0)X);
}
i+1 = i + A(xi)
lA =f(x) + T A(x) + ( /2) A(x)T A(x)
dx = - lA,x / lA,xx
x = {-2.00, 2.66, 2.39, 3.62, 3.27, 2.87,

3.87, 3.16, 2.47, 2.69}T
f = 502.4
Listing 225 Augmented Lagrangian method (project: augmented_lagrangian).

does not necessary have negative definite Hessian. Consider the Lagrangian functional l(x, ), a perturbed
Lagrangian functional lp(x, ) can be defined as
lp(x, ) = l(x, ) --- 2 = f(x) + T A(x) --- 2
Eq. 269
The solution is achieved at the limit 0 . The gradient of the perturbed Lagrangian function (the EulerLagrange equations) is
lp,x = f(x) + T A(x) = 0
lp, = A(x) - = 0
Eq. 270
It is consistent with the first-order condition of Lagrange method (at the limit of 0 ). For a quadratic programming problem
minimize
subject to
f(x) = -2 x T H x + g T x
A(x) = Ax - b = 0
163
Chapter
Eq. 270 can be re-written in matrix form with the incremental solution x as
T
HA
A
i
x = f ( x )
i
A ( x )
Eq. 271
Considering the limit of 0 , is in place of a null submatrix in Eq. 245

T
HA
A 0
f ( xi )
A ( xi )
Since the left-hand-side matrix is symmetrical we can use modified Cholesky decomposition to solve Eq. 271.
Or, recall Eq. 245 that = (AH-1AT)-1 [A(xi) - AH-1f(xi )] Set B-1 = (AH-1AT)-1 and apply modified Cholesky
decomposition to B. Not only the implementation in Program Listing 226 is extremely simple, but also the
computing speed is lightning fast.
int main() {
double x[DOF] = {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0};
int k = 0;
C2 f, X(DOF,x),
C0 d_x;
do {
A[0]=1.5*X[0]+X[1]+X[2]+0.5*X[3]+0.5*X[4]
-5.5;
A[1]=
2.0*X[5]-0.5*X[6]-0.5*X[7]+X[8]-X[9]-2.0;
A[2]= X[0] + X[2] +
X[4] +
X[6] +
X[8] -10.0;
A[3]=
X[1] +
X[3] +
X[5] +
X[7] + X[9]-15.0;
C0 H_inv = dd(f).inverse();
Cholesky AHAt_inv( (d(A)*H_inv*(~d(A)) ), EPSILON);
C0 lambda = AHAt _inv * ( ((C0)A) - d(A)*(H_inv*d(f)) );
d_x &= H_inv*-( (~d(A))*lambda + d(f) );
((C0)X) += d_x;
cout << "solution(" << (++k) << "): "<<((C0)X) << ", objective functional: " << ((C0)f)
<< endl;
} while( k < MAX_NO_OF_ITERATION && (double)norm(d_x) > RELAXED*EPSILON );
cout << "The final solution: " << ((C0)X) << endl;
}
modified Cholesky on (AH-1AT)-1

= (AH-1AT)-1 [A(xi) - AH-1f(xi )]
x = H-1 (f(xi ) - AT); i.e., Eq. 247
x = {-2.00, 2.66, 2.39, 3.62, 3.27, 2.87,
3.87, 3.16, 2.47, 2.69}T
f = 502.4
Listing 226 Perturbed Lagrangian method (project: perturbed_lagrangian).
164

Op Tim Ization

Uploaded by

Document Information

Original Description:

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Op Tim Ization

Uploaded by

Copyright:

Available Formats

CHAPTER

2.1 C1 Type Objects

Workbook of Applications in VectorSpace C++ Library 95

2 Numerical Optimization Using C1 and C2 Type Objects

dimensions greater than 3), a generalized containing space

is said to contain a manifold , in place of a

Figure 2.1 Tangent bundle of a body

in its containing space .

2.1.1 Tangent Bundle

Workbook of Applications in VectorSpace C++ Library

Tangent Bundle Abstraction

// (*u) = 0.0; (*du) = 1.0 (default), spatial dimension = 1 (default)

// (*u) = 0.0; (*du) = 1.0

Workbook of Applications in VectorSpace C++ Library

2 Numerical Optimization Using C1 and C2 Type Objects

VectorSpace C++ Library definition

Strings in C1 virtual constructor for C1 type Tangent_Bundle class.

Operators and Functions

Workbook of Applications in VectorSpace C++ Library

VectorSpace C++ library definition

positive (primary casting)

2.1.2 Vector of Tangent Bundle

Workbook of Applications in VectorSpace C++ Library

2 Numerical Optimization Using C1 and C2 Type Objects

// base point values (coordinates)

Workbook of Applications in VectorSpace C++ Library

initial values, x0 = {1.0, 1.0, 1.0}T

Eq. 24, dx = - f(xi) / f,x(xi)

Listing 21 Solving a nonlinear vector of function using C1 type Vector_of_Tangent_Bundle class

Figure 2.2 Selector of the C1 type Vector_of_Tangent_Bundle class.

Workbook of Applications in VectorSpace C++ Library

2 Numerical Optimization Using C1 and C2 Type Objects

VectorSpace C++ library definition

pointer to C1 type Vector_of_Tangent_Bundle

manifold dim, spatial dim, base point, tangnet vector

manifold dim., spatial dim.

Strings in C1 virtual constructor for C1 type Vector_of_Tangent_Bundle class.

Operators and Functions

Workbook of Applications in VectorSpace C++ Library

VectorSpace C++ library definition

casting operator; retrieve base point

positive (primary casting)

Workbook of Applications in VectorSpace C++ Library

2 Numerical Optimization Using C1 and C2 Type Objects

2.2 C2 Type Objects

tangent of tangent vector

d2z = d2x + d2y

d2z = d2x - d2y

2.2.1 Tangent of Tangent Bundle

u,x u,y u,z

u,xx u,xy u,xz

Figure 2.3 Data abstraction of C2 type Tangent_of_Tangent_Bundle class.

Workbook of Applications in VectorSpace C++ Library

// u = 0.0, du (= 1.0; default value), ddu (= 0.0; default value)

// (*u) = 0.0; (*du) = 1.0

Operators and Functions

Workbook of Applications in VectorSpace C++ Library

2 Numerical Optimization Using C1 and C2 Type Objects

VectorSpace C++ library definition

Strings in C2 virtual constructor for Tangent_of_Tangent_Bundle object.

Workbook of Applications in VectorSpace C++ Library

VectorSpace C++ library definition

positive (primary casting)

// (u) = 0.0; (du) = 1.0 (default), spatial dimension = 1 (default)

// (u) = 0.0; (du) = 1.0

// (u) = 0.0; (du) = 1.0