= 2 − 1/(2μ).
Note: As μ → ∞, [2 − 1/(2μ)] → 2 (= x*)

[Figure: the penalty function p and the auxiliary function f + μp², plotted for μ = 0.5 and μ = 1.5]
152
In general:
1. We convert the constrained problem to an unconstrained one by using the
penalty function.
2. The solution to the unconstrained problem can be made arbitrarily close to that
of the original one by choosing a sufficiently large μ.
In practice, if μ is very large, too much emphasis is placed on feasibility. Often,
algorithms for unconstrained optimization would stop prematurely because the step
sizes required for improvement become very small.
Usually, we solve a sequence of problems with successively increasing μ
values; the optimal point from one iteration becomes the starting point for the next
problem.
PROCEDURE: Choose the following initially:
1. a tolerance ε,
2. an increase factor β,
3. a starting point x¹, and
4. an initial μ¹.
At Iteration i
1. Solve the problem Minimize f(x) + μᵢ p(x), st x ∈ X (usually Rⁿ). Use xⁱ as the
starting point and let the optimal solution be xⁱ⁺¹.
2. If μᵢ p(xⁱ⁺¹) < ε then STOP; else let μᵢ₊₁ = β(μᵢ) and start iteration (i+1).
153
EXAMPLE: Minimize f(x) = x₁² + 2x₂²
st g(x) = 1 − x₁ − x₂ ≤ 0; x ∈ R²
Define the penalty function
p(x) = (1 − x₁ − x₂)²  if g(x) > 0
p(x) = 0               if g(x) ≤ 0
The unconstrained problem is
Minimize x₁² + 2x₂² + μp(x)
If p(x) = 0, then the optimal solution is x* = (0,0): INFEASIBLE!
So p(x) = (1 − x₁ − x₂)², the auxiliary function is f(x) + μp(x) = x₁² + 2x₂² + μ(1 − x₁ − x₂)², and the necessary
conditions for the optimal solution (∇[f(x) + μp(x)] = 0) yield the following:
∂/∂x₁:  2x₁ + 2μ(1 − x₁ − x₂)(−1) = 0,
∂/∂x₂:  4x₂ + 2μ(1 − x₁ − x₂)(−1) = 0
Thus x₁ = 2μ/(2 + 3μ)  and  x₂ = μ/(2 + 3μ)
Starting with μ = 0.1, β = 10 and x¹ = (0,0) and using a tolerance of 0.005 (say), we have
the following:
Iter. (i)   μᵢ      xⁱ⁺¹                g(xⁱ⁺¹)    μᵢ p(xⁱ⁺¹)
1           0.1     (0.087, 0.043)      0.87       0.0757
2           1.0     (0.4, 0.2)          0.40       0.16
3           10      (0.625, 0.3125)     0.0625     0.039
4           100     (0.6622, 0.3311)    0.0067     0.00449
5           1000    (0.666, 0.333)      0.001      0.001
Thus the optimal solution is x* = (2/3, 1/3).
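The sequence above can be reproduced with a short computation. A minimal sketch follows; it uses the closed-form minimizer of the auxiliary function derived above, so no inner numerical solver is needed:

```python
# Sketch of the sequential (exterior) penalty method on the example above,
# using the closed-form minimizer x(mu) = (2mu/(2+3mu), mu/(2+3mu)) derived
# from the stationarity conditions.

def p(x):                      # quadratic penalty for g(x) = 1 - x1 - x2 <= 0
    g = 1 - x[0] - x[1]
    return g * g if g > 0 else 0.0

mu, beta, tol = 0.1, 10.0, 0.005
while True:
    x = (2 * mu / (2 + 3 * mu), mu / (2 + 3 * mu))   # argmin of f + mu*p
    if mu * p(x) < tol:
        break
    mu *= beta

print(mu)    # 100.0: the method stops at the table's iteration 4
print(x)     # close to x* = (2/3, 1/3)
```

Each iterate is infeasible (g > 0) and approaches x* from outside the feasible region, which is characteristic of exterior penalty methods.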
154
INTERIOR PENALTY (BARRIER) FUNCTION METHODS
These methods also transform the original problem into an unconstrained one;
however the barrier functions prevent the current solution from ever leaving the
feasible region. These require that the interior of the feasible sets be nonempty, which
is impossible if equality constraints are present. Therefore, they are used with
problems having only inequality constraints.
A barrier function B is one that is continuous and nonnegative over the interior
of {x | g(x) ≤ 0}, i.e., over the set {x | g(x) < 0}, and approaches ∞ as the boundary is
approached from the interior.
Let φ(y) > 0 if y < 0 and lim_{y→0⁻} φ(y) = ∞.
Then B(x) = Σⱼ₌₁ᵐ φ(gⱼ(x))
Usually,
B(x) = −Σⱼ₌₁ᵐ 1/gⱼ(x)   or   B(x) = −Σⱼ₌₁ᵐ log(−gⱼ(x))
In both cases, note that B(x) → ∞ as gⱼ(x) → 0⁻.
The auxiliary function is now
f(x) + μB(x)
where μ is a SMALL positive number.
155
Q. Why should μ be SMALL?
A. Ideally we would like B(x) = 0 if gⱼ(x) < 0 and B(x) = ∞ if gⱼ(x) = 0, so that we never
leave the region {x | g(x) ≤ 0}. However, such a B(x) is discontinuous. This causes
serious computational problems during the unconstrained optimization.
Similar to exterior penalty functions, we don't just choose one small value for μ; rather,
we start with some μ¹ and generate a sequence of points.
PROCEDURE: Initially choose a tolerance ε, a decrease factor β, an interior
starting point x¹, and an initial μ¹.
At Iteration i
1. Solve the problem Minimize f(x) + μᵢ B(x), st x ∈ X (usually Rⁿ). Use xⁱ as the
starting point and let the optimal solution be xⁱ⁺¹.
2. If μᵢ B(xⁱ⁺¹) < ε then STOP; else let μᵢ₊₁ = β(μᵢ) and start iteration (i+1).
Consider the previous example once again
156
EXAMPLE: Minimize f(x) = x₁² + 2x₂²
st g(x) = 1 − x₁ − x₂ ≤ 0; x ∈ R²
Define the barrier function
B(x) = −log(−g(x)) = −log(x₁ + x₂ − 1)
The unconstrained problem is
Minimize x₁² + 2x₂² + μB(x) = x₁² + 2x₂² − μ log(x₁ + x₂ − 1)
The necessary conditions for the optimal solution (∇[f(x) + μB(x)] = 0) yield the following:
∂/∂x₁:  2x₁ − μ/(x₁ + x₂ − 1) = 0,
∂/∂x₂:  4x₂ − μ/(x₁ + x₂ − 1) = 0
Solving, we get x₁ = (1 ± √(1 + 3μ))/3  and  x₂ = (1 ± √(1 + 3μ))/6
Since the negative signs lead to infeasibility, we take
x₁ = (1 + √(1 + 3μ))/3  and  x₂ = (1 + √(1 + 3μ))/6
Starting with μ = 1, β = 0.1 and an interior starting point x¹ (the iterates below follow from the closed form above), and using a tolerance of 0.005 (say), we have
the following:
Iter. (i)   μᵢ       xⁱ⁺¹                  g(xⁱ⁺¹)    μᵢ B(xⁱ⁺¹)
1           1        (1.0, 0.5)            −0.5       0.693
2           0.1      (0.714, 0.357)        −0.071     0.265
3           0.01     (0.672, 0.336)        −0.008     0.048
4           0.001    (0.6672, 0.3336)      −0.0008    0.0070
5           0.0001   (0.6666, 0.3333)      −0.0001    0.0009
Thus the optimal solution is x* = (2/3, 1/3).
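As before, the barrier iteration can be sketched using the closed-form minimizer derived above, so no inner numerical solver is needed:

```python
# Sketch of the barrier (interior penalty) method on the same example, using
# the closed-form minimizer x(mu) = ((1+sqrt(1+3mu))/3, (1+sqrt(1+3mu))/6).
import math

def B(x):                      # log-barrier for g(x) = 1 - x1 - x2 <= 0
    return -math.log(x[0] + x[1] - 1)

mu, beta, tol = 1.0, 0.1, 0.005
while True:
    s = math.sqrt(1 + 3 * mu)
    x = ((1 + s) / 3, (1 + s) / 6)          # argmin of f + mu*B, interior point
    if mu * B(x) < tol:
        break
    mu *= beta

print(round(mu, 6))   # 0.0001: the method stops at the table's iteration 5
print(x)              # close to x* = (2/3, 1/3), approached from inside
```

In contrast to the exterior penalty method, every iterate here is strictly feasible (g < 0).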
157
Penalty Function Methods and Lagrange Multipliers
Consider the penalty function approach to the problem below:
Minimize f(x)
st gⱼ(x) ≤ 0; j = 1,2,…,m
hⱼ(x) = 0; j = m+1, m+2,…,l
x ∈ Rⁿ.
Suppose we use the usual exterior penalty function described earlier, with p = 2.
The auxiliary function that we minimize is then given by
P(x) = f(x) + μp(x) = f(x) + μ{Σⱼ [Max(0, gⱼ(x))]² + Σⱼ [hⱼ(x)]²}
The necessary condition for this to have a minimum is that
∇P(x) = ∇f(x) + μ∇p(x) = 0, i.e.,
∇f(x) + Σⱼ 2μ[Max(0, gⱼ(x))]∇gⱼ(x) + Σⱼ 2μ[hⱼ(x)]∇hⱼ(x) = 0   (1)
Suppose that the solution to (1) for a fixed μ (say μₖ > 0) is given by xᵏ.
Let us also designate
2μₖ[Max{0, gⱼ(xᵏ)}] = λⱼ(μₖ);  j = 1,2,…,m   (2)
2μₖ[hⱼ(xᵏ)] = λⱼ(μₖ);  j = m+1,…,l   (3)
so that for μ = μₖ we may rewrite (1) as
158
∇f(x) + Σⱼ λⱼ(μₖ)∇gⱼ(x) + Σⱼ λⱼ(μₖ)∇hⱼ(x) = 0   (4)
Now consider the Lagrangian for the original problem:
L(x, λ) = f(x) + Σⱼ λⱼgⱼ(x) + Σⱼ λⱼhⱼ(x)
The usual KKT necessary conditions yield
∇f(x) + Σⱼ₌₁…ₘ λⱼ∇gⱼ(x) + Σⱼ₌ₘ₊₁…ₗ λⱼ∇hⱼ(x) = 0   (5)
plus the original constraints, complementary slackness for the inequality constraints,
and λⱼ ≥ 0 for j = 1,…,m.
Comparing (4) and (5), we can see that when we minimize the auxiliary function
using μ = μₖ, the λⱼ(μₖ) values given by (2) and (3) estimate the Lagrange multipliers
in (5). In fact it may be shown that as the penalty function method proceeds,
μₖ → ∞ and xᵏ → x*, the optimum solution, and the values λⱼ(μₖ) → λⱼ*, the optimum
Lagrange multiplier value for constraint j.
Consider our example on page 153 once again. For this problem the Lagrangian is
given by L(x, λ) = x₁² + 2x₂² + λ(1 − x₁ − x₂).
The KKT conditions yield
∂L/∂x₁ = 2x₁ − λ = 0;  ∂L/∂x₂ = 4x₂ − λ = 0;  λ(1 − x₁ − x₂) = 0
Solving these results in x₁* = 2/3; x₂* = 1/3; λ* = 4/3 (λ = 0 yields an infeasible solution).
159
Recall that for fixed μₖ the optimum value of xᵏ was [2μₖ/(2+3μₖ), μₖ/(2+3μₖ)]ᵀ. As
we saw, when μₖ → ∞, these converge to the optimum solution x* = (2/3, 1/3).
Now, suppose we use (2) to define λ(μ) = 2μ[Max(0, gⱼ(xᵏ))]
= 2μ[1 − {2μ/(2+3μ)} − {μ/(2+3μ)}]   (since gⱼ(xᵏ) > 0 if μ > 0)
= 2μ[1 − {3μ/(2+3μ)}] = 4μ/(2+3μ)
Then it is readily seen that lim_{μ→∞} λ(μ) = lim_{μ→∞} 4μ/(2+3μ) = 4/3 = λ*
Similar statements can also be made for the barrier (interior penalty) function
approach; e.g., if we use the log-barrier function we have
P(x) = f(x) + μB(x) = f(x) − μΣⱼ log(−gⱼ(x)),
so that
∇P(x) = ∇f(x) − Σⱼ {μ/gⱼ(x)}∇gⱼ(x) = 0   (6)
If (like before) for a fixed μₖ we denote the solution by xᵏ and define −μₖ/gⱼ(xᵏ) =
λⱼ(μₖ), then from (6) and (5) we see that λⱼ(μₖ) approximates λⱼ. Furthermore, as
μₖ → 0 and xᵏ → x*, it can be shown that λⱼ(μₖ) → λⱼ*.
For our example (page 156),
−μₖ/gⱼ(xᵏ) = −μₖ/[1 − (1 + √(1+3μₖ))/3 − (1 + √(1+3μₖ))/6] = 2μₖ/(−1 + √(1+3μₖ)) → 4/3, as μₖ → 0.
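Both multiplier estimates can be checked numerically; a minimal sketch:

```python
# Numerical check that both multiplier estimates above approach lambda* = 4/3:
# the exterior-penalty estimate as mu -> infinity, the barrier one as mu -> 0.
import math

def lam_penalty(mu):           # 2*mu*Max(0, g(x^k)) = 4*mu/(2+3*mu)
    return 4 * mu / (2 + 3 * mu)

def lam_barrier(mu):           # -mu/g(x^k) = 2*mu/(sqrt(1+3*mu) - 1)
    return 2 * mu / (math.sqrt(1 + 3 * mu) - 1)

for mu in (1.0, 1e2, 1e4, 1e6):
    print(round(lam_penalty(mu), 6))     # approaches 4/3 = 1.3333...
for mu in (1.0, 1e-2, 1e-4, 1e-6):
    print(round(lam_barrier(mu), 6))     # approaches 4/3 = 1.3333...
```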
160
Penalty and Barrier function methods have been referred to as SEQUENTIAL
UNCONSTRAINED MINIMIZATION TECHNIQUES (SUMT) and studied in
detail by Fiacco and McCormick. While they are attractive in the simplicity of the
principle on which they are based, they also possess several undesirable properties.
When the parameters μ are very large in value, the penalty functions tend to be
ill-behaved near the boundary of the constraint set, where the optimum points usually
lie. Another problem is the choice of appropriate μ¹ and β values. The rate at which the μᵢ
change (i.e., the β values) can seriously affect the computational effort to find a
solution. Also, as μ increases, the Hessian of the unconstrained function becomes ill-conditioned.
Some of these problems are addressed by the so-called "multiplier" methods.
Here there is no need for μ to go to infinity, and the unconstrained function is better
conditioned with no singularities. Furthermore, they also have faster rates of
convergence than SUMT.
161
Ill-Conditioning of the Hessian Matrix
Consider the Hessian matrix of the auxiliary function P(x) = x₁² + 2x₂² + μ(1 − x₁ − x₂)²:
H = [ 2+2μ    2μ  ]
    [  2μ    4+2μ ]
Suppose we want to find its eigenvalues by solving
|H − λI| = (2+2μ−λ)(4+2μ−λ) − 4μ² = λ² − λ(6+4μ) + (8+12μ) = 0
This quadratic equation yields
λ = (3+2μ) ± √(4μ² + 1)
Taking the ratio of the largest and the smallest eigenvalue yields
[(3+2μ) + √(4μ² + 1)] / [(3+2μ) − √(4μ² + 1)].
It should be clear that as μ → ∞, the limit of the preceding ratio
also goes to ∞. This indicates that as the iterations proceed and we start to increase
the value of μ, the Hessian of the unconstrained function that we are minimizing
becomes increasingly ill-conditioned. This is a common situation and is especially
problematic if we are using a method for the unconstrained optimization that requires
the use of the Hessian.
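The growth of the condition number can be checked directly from the eigenvalue formula above:

```python
# Numerical check of the ill-conditioning: the eigenvalues of the 2x2 Hessian
# H = [[2+2mu, 2mu], [2mu, 4+2mu]] are (3+2mu) +/- sqrt(4mu^2+1), so the
# ratio of largest to smallest eigenvalue grows without bound as mu increases.
import math

def cond(mu):
    mid, spread = 3 + 2 * mu, math.sqrt(4 * mu * mu + 1)
    return (mid + spread) / (mid - spread)

for mu in (1.0, 10.0, 100.0, 1000.0):
    print(round(cond(mu), 1))    # grows roughly linearly in mu
```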
162
MULTIPLIER METHODS
Consider the problem: Min f(x), st gⱼ(x) ≤ 0, j = 1,2,…,m.
In multiplier methods the auxiliary function is given by
P(x) = f(x) + μ Σⱼ₌₁ᵐ [Max{0, gⱼ(x) + θⱼ}]²
where μ > 0. When θⱼ = 0 and μ → ∞, this reduces to the usual exterior penalty function.
The basic idea behind multiplier methods is as follows:
Let xⁱ be the minimum of the auxiliary function Pᵢ(x) at some iteration i. From the
optimality conditions we have
∇Pᵢ(xⁱ) = ∇f(xⁱ) + 2μ Σⱼ₌₁ᵐ Max{0, gⱼ(xⁱ) + θⱼ}∇gⱼ(xⁱ) = 0
Now, Max{0, gⱼ + θⱼ} = θⱼ + Max{−θⱼ, gⱼ}. Therefore ∇Pᵢ(xⁱ) = 0 implies
∇f(xⁱ) + 2μ Σⱼ₌₁ᵐ θⱼ∇gⱼ(xⁱ) + 2μ Σⱼ₌₁ᵐ Max{−θⱼ, gⱼ(xⁱ)}∇gⱼ(xⁱ) = 0
Let us assume for a moment that
θⱼ ≥ 0,  gⱼ(xⁱ) ≤ 0  and  θⱼgⱼ(xⁱ) = 0   (2)
⟹ Max(−θⱼ, gⱼ) = 0
163
Therefore, as long as (2) is satisfied, we have for the point xⁱ that minimizes Pᵢ(x)
∇Pᵢ(xⁱ) = ∇f(xⁱ) + 2μ Σⱼ₌₁ᵐ θⱼ∇gⱼ(xⁱ) = 0   (3)
If we let 2μθⱼ = λⱼ ≥ 0, with gⱼ(xⁱ) ≤ 0 and λⱼgⱼ(xⁱ) = 0 ∀j,   (A)
then (3) reduces to
∇f(xⁱ) + Σⱼ₌₁ᵐ λⱼ∇gⱼ(xⁱ) = 0.   (B)
It is readily seen that (A) and (B) are merely the KARUSH-KUHN-TUCKER
necessary conditions for xⁱ to be a solution to the original problem below!!
Min f(x), st gⱼ(x) ≤ 0, j = 1,2,…,m.
We may therefore conclude that
xⁱ = x*  and  2μθⱼ = 2μθⱼ* = λⱼ*
Here x* is the solution to the problem, and λⱼ* is the optimum Lagrange multiplier
associated with constraint j.
From the previous discussion it should be clear that at each iteration θⱼ should be
chosen in such a way that Max(−θⱼ, gⱼ) → 0, which results in xⁱ → x*. Note that xⁱ is
obtained here by minimizing the auxiliary function with respect to x.
164
ALGORITHM: A general algorithmic approach for the multiplier method may now
be stated as follows:
STEP 0: Set i = 0, choose vectors x⁰ and θ⁰.
STEP 1: Set i = i + 1.
STEP 2: Starting at xⁱ⁻¹, minimize Pᵢ(x) to find xⁱ.
STEP 3: Check convergence criteria and go to Step 4 only if the criteria are not
satisfied.
STEP 4: Modify θⁱ based on satisfying (2). Also modify μ if necessary and go to
Step 2.
(Usually θ⁰ is set to 0.)
One example of a formula for changing θⁱ is the following:
θⱼⁱ⁺¹ = θⱼⁱ + Max(gⱼ(xⁱ), −θⱼⁱ)
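The algorithm can be sketched on the example of page 153. This is only an illustration, not the notes' algorithm verbatim: the inner minimization of P(x) is replaced by its closed-form solution x = (2μ(1+θ)/(2+3μ), μ(1+θ)/(2+3μ)), an assumption that is valid here because g(x) + θ stays positive along these iterates, and μ is held fixed at 10:

```python
# Sketch of the multiplier method on: f = x1^2 + 2x2^2, g(x) = 1 - x1 - x2 <= 0.
# P(x) = f(x) + mu*[Max(0, g(x) + theta)]^2; only the theta-update is iterated,
# since the inner minimizer has a closed form (an assumption for this sketch).
# Note mu stays FIXED at 10: no need for mu -> infinity.

mu, theta = 10.0, 0.0
for i in range(50):
    x = (2 * mu * (1 + theta) / (2 + 3 * mu),   # argmin_x P(x)
         mu * (1 + theta) / (2 + 3 * mu))
    g = 1 - x[0] - x[1]
    theta = theta + max(g, -theta)              # the theta-update formula above

print(x)                 # converges to x* = (2/3, 1/3) with mu fixed at 10
print(2 * mu * theta)    # converges to lambda* = 4/3
```

This illustrates the two claims above: the constrained optimum is reached with a finite, fixed μ, and 2μθⱼ converges to the optimum Lagrange multiplier.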
165
PRIMAL METHODS
By a primal method we mean one that works directly on the original constrained
problem:
Min f(x)
st gⱼ(x) ≤ 0, j = 1,2,…,m.
Almost all methods are based on the following general strategy:
Let xⁱ be a design at iteration i. A new design xⁱ⁺¹ is found by the expression xⁱ⁺¹ = xⁱ + αᵢdⁱ, where αᵢ is a step size parameter and dⁱ is a search direction (vector). The
direction vector dⁱ is typically determined by an expression of the form:
dⁱ = −[β₀∇f(xⁱ) + Σⱼ∈Jε βⱼ∇gⱼ(xⁱ)]
where ∇f and ∇gⱼ are gradient vectors of the objective and constraint functions, and Jε
is an "active set" given by Jε = {j | gⱼ(xⁱ) + ε > 0, ε > 0}.
Unlike with transformation methods, it is evident that here the gradient vectors of
individual constraint functions need to be evaluated. This is a characteristic of all
primal methods.
166
METHOD OF FEASIBLE DIRECTIONS
The first class of primal methods we look at are the feasible directions methods,
where each xⁱ is within the feasible region. An advantage is that if the process is
terminated before reaching the true optimum solution (as is usually necessary with
large-scale problems), the terminating point is feasible and hence acceptable. The
general strategy at iteration i is:
(1) Let xⁱ be a feasible point.
(2) Choose a direction dⁱ so that
a) xⁱ + αdⁱ is feasible, at least for some "sufficiently" small α > 0, and
b) f(xⁱ + αdⁱ) < f(xⁱ).
(3) Do an unconstrained optimization Min_α f(xⁱ + αdⁱ) to obtain α* = αᵢ, and hence xⁱ⁺¹ = xⁱ + αᵢdⁱ.
DEFINITION: For the problem Minimize f(x), st x ∈ X,
d (≠ 0) is called a feasible direction at x' ∈ X if ∃ ᾱ > 0 such that (x' + αd) ∈ X for all α ∈ [0, ᾱ).
[Figure: a set X with a point x' on its boundary, showing directions d that are feasible and directions that are not.]
167
Further, if we also have f(x' + αd) < f(x') for all α ∈ (0, α'), α' ≤ ᾱ, then d is called an
improving feasible direction.
Under differentiability, recall that this implies
∇fᵀ(x')d < 0
Let F_x' = {d | ∇fᵀ(x')d < 0}, and
D_x' = {d | ∃ ᾱ > 0 such that (x' + αd) ∈ X for all α ∈ [0, ᾱ)}.
Thus F_x' is the set of improving directions at x', and
D_x' is the set of feasible directions at x'.
If x* is a local optimum then F_x* ∩ D_x* = ∅, as long as a constraint qualification is
met.
In general, for the problem
Minimize f(x), st gⱼ(x) ≤ 0, j = 1,2,…,m,
let J_x' = {j | gⱼ(x') = 0} (index set of active constraints).
We can show that the set of feasible directions at x' is
D_x' = {d | ∇gⱼᵀ(x')d < 0 for all j ∈ J_x'}
NOTE: If gⱼ(x) is a linear function, then the strict inequality (<) used to define the set
D_x' above can be relaxed to an inequality (≤).
168
Geometric Interpretation
Minimize f(x), st gⱼ(x) ≤ 0, j = 1,2,…,m
[Figure: two cases at a point x* on the boundary gⱼ(x) = 0. (i) dᵀ∇gⱼ(x*) < 0 ⟹ d is a feasible direction; (ii) dᵀ∇gⱼ(x*) = 0 ⟹ d is tangent to the constraint boundary.]
In (ii), any positive step along d is infeasible (however, we will often
consider this a feasible direction).
FARKAS' LEMMA: Given A ∈ R^(m×n), b ∈ Rᵐ, y ∈ Rᵐ, z ∈ Rⁿ, the following statements
are equivalent to each other:
1. yᵀA ≤ 0 ⟹ yᵀb ≤ 0
2. ∃ z such that Az = b, z ≥ 0
[Figure: the columns A₁, A₂, A₃ of A and the vector b. Let Hⱼ be the hyperplane through the origin that is orthogonal to Aⱼ, and let H̄ⱼ be the closed halfspace on the side of Hⱼ that does not contain Aⱼ. Then yᵀA ≤ 0 ⟺ y ∈ ∩ⱼ H̄ⱼ, so statement 1 simply says that (∩ⱼ H̄ⱼ) ⊆ H̄_b.]
169
Application to NLP
Let b ≡ −∇f(x*), Aⱼ ≡ ∇gⱼ(x*), y ≡ d (direction vector), and zⱼ ≡ λⱼ for j ∈ J_x* = {j | gⱼ(x*) = 0}.
Farkas' Lemma then implies that
(1) dᵀ∇gⱼ(x*) ≤ 0 ∀j ∈ J_x* ⟹ −dᵀ∇f(x*) ≤ 0, i.e., dᵀ∇f(x*) ≥ 0
(2) ∃ λⱼ ≥ 0 such that Σ_{j∈J_x*} λⱼ∇gⱼ(x*) = −∇f(x*)
are equivalent! Note that (1) indicates that
- directions satisfying dᵀ∇gⱼ(x*) ≤ 0 ∀j ∈ J_x* are feasible directions;
- directions satisfying dᵀ∇f(x*) ≥ 0 are ascent (nonimproving) directions.
On the other hand, (2) is just the KKT conditions. They are equivalent!
So at the optimum, there is no feasible direction that is improving.
Similarly for the general NLP problem
[Figure: a feasible region with constraints g₁(x) = 0, g₂(x) = 0, g₃(x) = 0 tight at x*, their gradients ∇g₁(x*), ∇g₂(x*), ∇g₃(x*), and ∇f(x*). KKT implies that the steepest descent direction is in the cone generated by the gradients of the tight constraints.]
170
Minimize f(x)
st gⱼ(x) ≤ 0, j = 1,2,…,m
hₖ(x) = 0, k = 1,2,…,p
the set of feasible directions is
D_x' = {d | ∇gⱼᵀ(x')d < 0 ∀j ∈ J_x', and ∇hₖᵀ(x')d = 0 ∀k}
====================================================
In a feasible directions algorithm we have the following:
STEP 1 (direction finding)
To generate an improving feasible direction from x', we need to find d ∈ F_x' ∩ D_x', i.e.,
∇fᵀ(x')d < 0, ∇gⱼᵀ(x')d < 0 for j ∈ J_x', and ∇hₖᵀ(x')d = 0 ∀k.
We therefore solve the subproblem
Minimize ∇fᵀ(x')d
st ∇gⱼᵀ(x')d < 0 for j ∈ J_x'   (≤ 0 for linear gⱼ)
∇hₖᵀ(x')d = 0 for all k,
along with some normalizing constraints such as
−1 ≤ dᵢ ≤ 1 for i = 1,2,…,n   or   ‖d‖² = dᵀd ≤ 1.
If the objective of this subproblem is negative, we have an improving feasible
direction dⁱ.
171
In practice, the actual subproblem we solve is slightly different, since strict inequality
(<) constraints are difficult to handle.
Let z = Max[∇fᵀ(x')d, ∇gⱼᵀ(x')d for j ∈ J_x']. Then we solve:
Minimize z
st ∇fᵀ(x')d ≤ z
∇gⱼᵀ(x')d ≤ z for j ∈ J_x'
∇hₖᵀ(x')d = 0 for k = 1,2,…,p,
−1 ≤ dᵢ ≤ 1 for i = 1,2,…,n
If (z*, d*) is the optimum solution for this, then
a) z* < 0 ⟹ d = d* is an improving feasible direction
b) z* = 0 ⟹ x' is a Fritz John point (or, if a CQ is met, a KKT point)
(Note that z* can never be greater than 0.)
STEP 2 (line search for step size)
Assuming that z* from the first step is negative, we now solve
Min f(xⁱ + αdⁱ)
st gⱼ(xⁱ + αdⁱ) ≤ 0, α ≥ 0
This can be rewritten as
Min f(xⁱ + αdⁱ)
st 0 ≤ α ≤ α_max
where α_max = supremum{α | gⱼ(xⁱ + αdⁱ) ≤ 0, j = 1,2,…,m}. Let the optimum solution to
this be at α* = αᵢ.
Then we set xⁱ⁺¹ = xⁱ + αᵢdⁱ and return to Step 1.
172
NOTE: (1) In practice the set J_x' is usually defined as J_x' = {j | gⱼ(xⁱ) + ε > 0}, where the
tolerance ε is referred to as the "constraint thickness," which may be
reduced as the algorithm proceeds.
(2) This procedure is usually attributed to ZOUTENDIJK.
EXAMPLE
Minimize 2x₁² + x₂² − 2x₁x₂ − 4x₁ − 6x₂
st x₁ + x₂ ≤ 8
−x₁ + 2x₂ ≤ 10
−x₁ ≤ 0
−x₂ ≤ 0   (a QP)
For this, we have
∇g₁(x) = [1, 1]ᵀ, ∇g₂(x) = [−1, 2]ᵀ, ∇g₃(x) = [−1, 0]ᵀ, ∇g₄(x) = [0, −1]ᵀ,
and ∇f(x) = [4x₁ − 2x₂ − 4, 2x₂ − 2x₁ − 6]ᵀ.
Let us begin with x⁰ = [0, 0]ᵀ, with f(x⁰) = 0.
ITERATION 1: J = {3, 4}. STEP 1 yields
Min z
st ∇fᵀ(x⁰)d ≤ z:   d₁(4x₁ − 2x₂ − 4) + d₂(2x₂ − 2x₁ − 6) ≤ z
∇g₃ᵀ(x⁰)d ≤ z:   −d₁ ≤ z
∇g₄ᵀ(x⁰)d ≤ z:   −d₂ ≤ z
−1 ≤ d₁, d₂ ≤ 1
173
i.e., Min z
st −4d₁ − 6d₂ ≤ z,  −d₁ ≤ z,  −d₂ ≤ z,  d₁, d₂ ∈ [−1, 1]
yielding z* = −1 (< 0), with d* = d⁰ = [1, 1]ᵀ
STEP 2 (line search):
x⁰ + αd⁰ = [0, 0]ᵀ + α[1, 1]ᵀ = [α, α]ᵀ
Thus α_max = sup{α | 2α ≤ 8, α ≤ 10, −α ≤ 0, −α ≤ 0} = 4
We therefore solve: Minimize f(x⁰ + αd⁰), st 0 ≤ α ≤ 4
α* = α₀ = 4 ⟹ x¹ = x⁰ + α₀d⁰ = [4, 4]ᵀ, with f(x¹) = −24.
ITERATION 2: J = {1}. STEP 1 yields
Min z
st d₁(4x₁ − 2x₂ − 4) + d₂(2x₂ − 2x₁ − 6) ≤ z
d₁ + d₂ ≤ z
−1 ≤ d₁, d₂ ≤ 1
i.e., Min z
st 4d₁ − 6d₂ ≤ z,  d₁ + d₂ ≤ z,  d₁, d₂ ∈ [−1, 1]
yielding z* < 0, with d* = d¹ = [−1, 0]ᵀ.
STEP 2 (line search):
x¹ + αd¹ = [4, 4]ᵀ + α[−1, 0]ᵀ = [4 − α, 4]ᵀ
174
Thus α_max = sup{α | 8 − α ≤ 8, α + 4 ≤ 10, α − 4 ≤ 0, −4 ≤ 0} = 4
We therefore solve: Minimize 2(4−α)² + 4² − 2(4−α)(4) − 4(4−α) − 6(4), st 0 ≤ α ≤ 4.
α* = α₁ = 1 ⟹ x² = x¹ + α₁d¹ = [3, 4]ᵀ, with f(x²) = −26.
ITERATION 3: J = ∅. STEP 1 yields
Min z
st d₁(4x₁ − 2x₂ − 4) + d₂(2x₂ − 2x₁ − 6) ≤ z
−1 ≤ d₁, d₂ ≤ 1
i.e., Min z
st −4d₂ ≤ z,  d₁, d₂ ∈ [−1, 1]
yielding z* = −4 (< 0), with d* = d² = [0, 1]ᵀ.
STEP 2 (line search):
x² + αd² = [3, 4]ᵀ + α[0, 1]ᵀ = [3, 4 + α]ᵀ
Thus α_max = sup{α | 7 + α ≤ 8, 5 + 2α ≤ 10, −3 ≤ 0, −4 − α ≤ 0} = 1
We therefore solve: Minimize f(x² + αd²), st 0 ≤ α ≤ 1
i.e., Minimize 2(3)² + (4+α)² − 2(3)(4+α) − 4(3) − 6(4+α), st 0 ≤ α ≤ 1
α* = α₂ = 1 ⟹ x³ = x² + α₂d² = [3, 5]ᵀ, with f(x³) = −29.
175
ITERATION 4: J = {1}. STEP 1 yields
Min z
st d₁(4x₁ − 2x₂ − 4) + d₂(2x₂ − 2x₁ − 6) ≤ z
d₁ + d₂ ≤ z
−1 ≤ d₁, d₂ ≤ 1
i.e., Min z
st −2d₁ − 2d₂ ≤ z,  d₁ + d₂ ≤ z,  d₁, d₂ ∈ [−1, 1]
yielding z* = 0, with d* = [0, 0]ᵀ. STOP.
Therefore the optimum solution is given by x* = [3, 5]ᵀ, with f(x*) = −29.
[Figure: the feasible region of the QP and the path x⁰ → x¹ → x² → x³ followed by the algorithm along its boundary.]
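The direction-finding subproblem can be checked computationally. The rough sketch below approximates each iteration's LP by brute force over a grid of directions (a crude stand-in for an LP solver, not part of the actual procedure), and it takes ∇g₂ = [−1, 2]ᵀ, consistent with the iterates worked above:

```python
# Brute-force check of Zoutendijk's direction-finding subproblem for the QP:
#     min z  s.t.  grad_f(x)^T d <= z,  grad_gj(x)^T d <= z (j active),
#                  d in [-1,1]^2
# approximated over a grid of candidate directions d.

def grad_f(x):
    return (4 * x[0] - 2 * x[1] - 4, 2 * x[1] - 2 * x[0] - 6)

GRAD_G = {1: (1, 1), 2: (-1, 2), 3: (-1, 0), 4: (0, -1)}  # grad g_j (assumed)

def best_direction(x, active, n=200):
    """Minimize max(grad_f^T d, grad_gj^T d) over a grid on [-1,1]^2."""
    rows = [grad_f(x)] + [GRAD_G[j] for j in active]
    best = (float("inf"), None)
    for i in range(n + 1):
        for k in range(n + 1):
            d = (2 * i / n - 1, 2 * k / n - 1)
            z = max(r[0] * d[0] + r[1] * d[1] for r in rows)
            if z < best[0]:
                best = (z, d)
    return best

z0, d0 = best_direction((0, 0), active=[3, 4])   # iteration 1
zs, _ = best_direction((3, 5), active=[1])       # at the reported optimum
print(z0, d0)   # -1.0 (1.0, 1.0): an improving feasible direction exists
print(zs)       # 0.0: no improving feasible direction, so (3, 5) is optimal
```

At x⁰ the subproblem value is negative with d = (1, 1), matching iteration 1; at (3, 5) the value is 0, confirming the stopping condition of iteration 4.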
176
GRADIENT PROJECTION METHOD (ROSEN)
Recall that the steepest descent direction is Vf(x). However, for constrained
problems, moving along Vf(x) may destroy feasibility. Rosen's method works by
projecting Vf(x) on to the hyperplane tangent to the set of active constraints. By
doing so it tries to improve the objective while simultaneously maintaining
feasibility. It uses d=PVf(x) as the search direction, where P is a projection matrix.
Properties of P (an n×n matrix)
1) Pᵀ = P and PᵀP = P (i.e., P is idempotent)
2) P is positive semidefinite
3) P is a projection matrix if, and only if, I − P is also a projection matrix.
4) Let Q = I − P and p = Px₁, q = Qx₂, where x₁, x₂ ∈ Rⁿ. Then
(a) pᵀq = qᵀp = 0 (p and q are orthogonal)
(b) Any x ∈ Rⁿ can be uniquely expressed as x = p + q, where p = Px, q = (I − P)x
[Figure: given x, we can find its orthogonal components p and q.]
177
Consider the simpler case where all constraints are linear. Assume that there are 2
linear constraints intersecting along a line, as shown below.
[Figure: constraints g₁(x) = 0 and g₂(x) = 0 with gradients ∇g₁ and ∇g₂; −∇f is decomposed into P(−∇f) and (I − P)(−∇f).]
Obviously, moving along −∇f would take us outside the feasible region. Say we
project −∇f onto the feasible region using the matrix P.
THEOREM: As long as P is a projection matrix and P∇f ≠ 0, d = −P∇f will be an
improving direction.
PROOF: ∇fᵀd = −∇fᵀ(P∇f) = −∇fᵀPᵀP∇f = −‖P∇f‖² < 0. (QED)
For −P∇f to also be feasible we must have ∇g₁ᵀ(−P∇f) ≤ 0 and ∇g₂ᵀ(−P∇f) ≤ 0 (by
definition).
178
Say we pick P such that ∇g₁ᵀ(P∇f) = ∇g₂ᵀ(P∇f) = 0, so that −P∇f is a feasible
direction. Thus, if we denote by M the (2×n) matrix whose Row 1 is ∇g₁ᵀ and
whose Row 2 is ∇g₂ᵀ, then we have M(P∇f) = 0   (*)
To find the form of the matrix P, we make use of Property 4(b) to write the vector ∇f
as ∇f = P(∇f) + [I − P](∇f)
Now, ∇g₁ and ∇g₂ are both orthogonal to P∇f, and so is [I − P](∇f). Hence we must
have
[I − P]∇f = λ₁∇g₁ + λ₂∇g₂ = Mᵀλ, where λ = [λ₁, λ₂]ᵀ
⟹ M[I − P]∇f = MMᵀλ ⟹ (MMᵀ)⁻¹M[I − P]∇f = (MMᵀ)⁻¹MMᵀλ = λ
⟹ λ = (MMᵀ)⁻¹M∇f − (MMᵀ)⁻¹M(P∇f)
= (MMᵀ)⁻¹M∇f   (from (*) above, M(P∇f) = 0)
Hence we have
∇f = P∇f + [I − P]∇f = P∇f + Mᵀλ = P∇f + Mᵀ(MMᵀ)⁻¹M∇f
i.e., P∇f = ∇f − Mᵀ(MMᵀ)⁻¹M∇f = [I − Mᵀ(MMᵀ)⁻¹M]∇f
i.e.,
P = [I − Mᵀ(MMᵀ)⁻¹M]
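The formula for P can be verified numerically on a small made-up example; the two constraint gradients below are hypothetical, chosen only for illustration:

```python
# Numeric check of Rosen's projection matrix P = I - M^T (M M^T)^{-1} M for two
# (hypothetical) active-constraint gradients in R^3, using plain lists.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

M = [[1.0, 1.0, 0.0],        # Row 1: grad g1^T  (assumed values)
     [0.0, 1.0, 2.0]]        # Row 2: grad g2^T  (assumed values)

MMt = matmul(M, transpose(M))                      # 2x2, inverted by hand
a, b, c, d = MMt[0][0], MMt[0][1], MMt[1][0], MMt[1][1]
det = a * d - b * c
MMt_inv = [[d / det, -b / det], [-c / det, a / det]]

K = matmul(transpose(M), matmul(MMt_inv, M))       # M^T (M M^T)^{-1} M
P = [[(1.0 if i == j else 0.0) - K[i][j] for j in range(3)] for i in range(3)]

grad_f = [3.0, -1.0, 2.0]
dvec = [-sum(P[i][j] * grad_f[j] for j in range(3)) for i in range(3)]

# d = -P grad_f is tangent to both active constraints (M d = 0) ...
print(all(abs(sum(M[k][j] * dvec[j] for j in range(3))) < 1e-9 for k in range(2)))  # True
# ... and is a descent direction (grad_f^T d = -||P grad_f||^2 < 0)
print(sum(g * x for g, x in zip(grad_f, dvec)) < 0)   # True
```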
179
Question: What if P∇f(x) = 0?
Answer: Then 0 = P∇f = [I − Mᵀ(MMᵀ)⁻¹M]∇f = ∇f + Mᵀw, where
w = −(MMᵀ)⁻¹M∇f. If w ≥ 0, then x satisfies the Karush-Kuhn-Tucker conditions and
we may stop; if not, a new projection matrix P̂ can be found for which
−P̂∇f is an improving feasible direction. In order to identify this matrix P̂, we merely pick
any component of w that is negative and drop the corresponding row from the matrix
M. Then we use the usual formula to obtain P̂.
====================
Now consider the general problem
Minimize f(x)
st gⱼ(x) ≤ 0, j = 1,2,…,m
hₖ(x) = 0, k = 1,2,…,p
The gradient projection algorithm can be generalized to nonlinear constraints by
projecting the gradient onto the tangent hyperplane at x (rather than onto the surface
of the feasible region itself).
180
STEP 1: SEARCH DIRECTION
Let xⁱ be a feasible point and let J be the "active set", i.e., J = {j | gⱼ(xⁱ) = 0}, or, more
practically, J = {j | gⱼ(xⁱ) + ε > 0}.
Suppose each j ∈ J is approximated using the Taylor series expansion by:
gⱼ(x) ≈ gⱼ(xⁱ) + (x − xⁱ)ᵀ∇gⱼ(xⁱ).
Notice that this approximation is a linear function of x with slope ∇gⱼ(xⁱ).
- Let M be the matrix whose rows are ∇gⱼᵀ(xⁱ) for j ∈ J (active constraints) and ∇hₖᵀ(xⁱ)
for k = 1,2,…,p (equality constraints).
- Let P = [I − Mᵀ(MMᵀ)⁻¹M] (projection matrix). If M does not exist, let P = I.
- Let dⁱ = −P∇f(xⁱ).
a) If dⁱ = 0 and M does not exist, STOP.
b) If dⁱ = 0 and M exists, find
w = [u; v] = −(MMᵀ)⁻¹M∇f(xⁱ)
where u corresponds to g and v to h. If u ≥ 0, STOP; else delete the row of M
corresponding to some uⱼ < 0 and return to Step 1.
c) If dⁱ ≠ 0, go to Step 2.
181
STEP 2: STEP SIZE (Line Search)
Solve Minimize_{0 ≤ λ ≤ λ_max} f(xⁱ + λdⁱ)
where λ_max = Supremum{λ | gⱼ(xⁱ + λdⁱ) ≤ 0 ∀j, hₖ(xⁱ + λdⁱ) = 0 ∀k}.
Call the optimal solution λ* and find xⁱ⁺¹ = xⁱ + λ*dⁱ. If all constraints are not linear,
make a "correction move" to return x to the feasible region. The need for this is
shown below. Generally, it could be done by solving g(x) = 0 starting at xⁱ⁺¹ using the
Newton-Raphson approach, or by moving orthogonally from xⁱ⁺¹, etc.
[Figure: two cases. When g₁ and g₂ are linear constraints, the step along −P∇f stays feasible and NO CORRECTION is REQUIRED; when g₁ is a nonlinear constraint, the step along −P∇f (tangent to g₁ = 0) leaves the feasible region and REQUIRES a CORRECTION move back to the boundary from xⁱ⁺¹.]
182
LINEARIZATION METHODS
The general idea of linearization methods is to replace the solution of the NLP by the
solution of a series of linear programs which approximate the original NLP.
The simplest method is RECURSIVE LP:
Transform Min f(x), st gⱼ(x) ≤ 0, j = 1,2,…,m,
into the LP
Min {f(xⁱ) + (x − xⁱ)ᵀ∇f(xⁱ)}
st gⱼ(xⁱ) + (x − xⁱ)ᵀ∇gⱼ(xⁱ) ≤ 0, for all j.
Thus we start with some initial xⁱ where the objective and constraints (usually only
the tight ones) are linearized, then solve the LP to obtain a new xⁱ⁺¹, and
continue…
Note that f(xⁱ) is a constant, and setting x − xⁱ = d makes the problem equivalent to
Min dᵀ∇f(xⁱ)
st dᵀ∇gⱼ(xⁱ) ≤ −gⱼ(xⁱ), for all j.
This is a direction-finding LP problem, and xⁱ⁺¹ = xⁱ + d implies that the step
size = 1.0!
EXAMPLE: Min f(x) = 4x₁ − x₂² − 12,
st g₁(x) = x₁² + x₂² − 25 ≤ 0,
g₂(x) = −x₁² − x₂² + 10x₁ + 10x₂ − 34 ≥ 0
If this is linearized at the point xⁱ = (2, 4), then since
183
∇f(xⁱ) = [4, −2x₂]ᵀ = [4, −8]ᵀ;  ∇g₁(xⁱ) = [2x₁, 2x₂]ᵀ = [4, 8]ᵀ;  ∇g₂(xⁱ) = [−2x₁ + 10, −2x₂ + 10]ᵀ = [6, 2]ᵀ,
it yields the following linear program (VERIFY):
Min f̄(x) = 4x₁ − 8x₂ + 4,
st ḡ₁(x) = 4x₁ + 8x₂ − 45 ≤ 0,
ḡ₂(x) = 6x₁ + 2x₂ − 14 ≥ 0
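The VERIFY step can be carried out with a short computation that expands each linearization f(xⁱ) + (x − xⁱ)ᵀ∇f(xⁱ) into its coefficients:

```python
# Verify the linearization at x^i = (2, 4): expand value + grad^T (x - x^i)
# into the form a1*x1 + a2*x2 + c and compare with the LP above.

def lin(fval, grad, xi):
    """Coefficients (a1, a2, c) of fval + grad^T (x - xi) = a1*x1 + a2*x2 + c."""
    return (grad[0], grad[1], fval - grad[0] * xi[0] - grad[1] * xi[1])

xi = (2.0, 4.0)
f  = 4 * xi[0] - xi[1] ** 2 - 12                               # f(2,4)  = -20
g1 = xi[0] ** 2 + xi[1] ** 2 - 25                              # g1(2,4) = -5
g2 = -xi[0] ** 2 - xi[1] ** 2 + 10 * xi[0] + 10 * xi[1] - 34   # g2(2,4) = 6

print(lin(f,  (4.0, -2 * xi[1]), xi))                    # (4.0, -8.0, 4.0)
print(lin(g1, (2 * xi[0], 2 * xi[1]), xi))               # (4.0, 8.0, -45.0)
print(lin(g2, (-2 * xi[0] + 10, -2 * xi[1] + 10), xi))   # (6.0, 2.0, -14.0)
```

The three coefficient triples match the linear program stated above.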
The method however has limited application, since it does not converge if the local
minimum occurs at a point that is not a vertex of the feasible region; in such cases it
oscillates between adjacent vertices. To avoid this one may limit the step size (for
instance, the Frank-Wolfe method minimizes f between xⁱ and xⁱ + d to get xⁱ⁺¹).
[Figure: contours of f and of the linearized objective f̄ over the region bounded by g₁ = 0 and g₂ = 0, together with their linearizations; the LP optimum lies at a vertex of the linearized set while the true optimum x* lies on the nonlinear boundary.]
184
RECURSIVE QUADRATIC PROGRAMMING (RQP)
RQP is similar in logic to RLP, except that it solves a sequence of quadratic
programming problems. RQP is a better approach because the second order terms
help capture curvature information.
Min d
T
VL(x
i
) + d
T
Bd
st d
T
Vg
j
(x
i
) s g
j
(x
i
) j
where B is the Hessian (or an approximation of the Hessian) of the Lagrangian
function of the objective function f at the point x
i
. Notice that this a QP in d with
linear constraints. Once the above QP is solved to get d
i
, we obtain x
i
by finding an
optimum step size along x
i
+d
i
and continue. The matrix B is never explicitly
computed  it is usually updated by schemes such as the BFGS approach using
information from Vf(x
i
) and Vf(x
i+1
) only. (Recall the quasiNewton methods for
unconstrained optimization)
185
KELLEY'S CUTTING PLANE METHOD
One of the better linearization methods is the cutting plane algorithm of J.E.Kelley
developed for convex programming problems.
Suppose we have a set of nonlinear constraints gⱼ(x) ≤ 0 for j = 1,2,…,m that determine
the feasible region G. Suppose further that we can find a polyhedral set H that
entirely contains the region determined by these constraints, i.e., G ⊆ H.
In general, a cutting plane algorithm would operate as follows:
1. Solve the problem with linear constraints.
2. If the optimal solution to this problem is feasible in the original (nonlinear)
constraints then STOP; it is also optimal for the original problem.
3. Otherwise add an extra linear constraint (i.e., an extra line/hyperplane) so that the
current optimal point becomes infeasible for the new problem with the additional
constraint (we thus "cut out" part of the infeasibility).
Now return to Step 1.
[Figure: the feasible region G, defined by nonlinear constraints g₁(x) ≤ 0, g₂(x) ≤ 0, g₃(x) ≤ 0, contained in a polyhedral set H defined by linear constraints h₁(x) ≤ 0, …, h₄(x) ≤ 0; since the hⱼ(x) are linear, H is polyhedral.]
186
JUSTIFICATION
Step 1: It is in general easier to solve problems with linear constraints than ones with
nonlinear constraints.
Step 2: G is determined by gⱼ(x) ≤ 0, j = 1,…,m, where each gⱼ is convex, while
H is determined by hₖ(x) ≤ 0 ∀k, where each hₖ is linear. G is wholly contained inside
the polyhedral set H.
Thus every point that satisfies gⱼ(x) ≤ 0 ∀j automatically satisfies hₖ(x) ≤ 0 ∀k, but not
vice versa. Thus G places "extra" constraints on the design variables, and therefore
the optimal solution over G can never be any better than the optimal solution over H.
Therefore, if for some region H the optimal solution x* is also feasible in G, it
MUST be optimal for G also. (NOTE: In general, the optimal solution for H at some
iteration will not be feasible in G.)
Step 3: Generating a CUT
In Step 2, let g(xⁱ) = Maximum_{j∈J} gⱼ(xⁱ), where J is the set of violated constraints,
J = {j | gⱼ(xⁱ) > 0}; i.e., g(xⁱ) is the value of the most violated constraint at xⁱ. By the
Taylor series approximation around xⁱ,
g(x) ≈ g(xⁱ) + (x − xⁱ)ᵀ∇g(xⁱ) + …
g(x) ≥ g(xⁱ) + (x − xⁱ)ᵀ∇g(xⁱ), for every x ∈ Rⁿ (since g is convex).
187
If ∇g(xⁱ) = 0, then g(x) ≥ g(xⁱ) ∀x.
But since g is violated at xⁱ, g(xⁱ) > 0. Therefore g(x) > 0 ∀x ∈ Rⁿ.
⟹ g is violated for all x,
⟹ the original problem is infeasible.
If ∇g(xⁱ) ≠ 0, then consider the following linear constraint:
(x − xⁱ)ᵀ∇g(xⁱ) ≤ −g(xⁱ),
i.e., hₖ(x) = g(xⁱ) + (x − xⁱ)ᵀ∇g(xⁱ) ≤ 0   (*)
(Note that xⁱ, ∇g(xⁱ) and g(xⁱ) are all known.)
The current point xⁱ is obviously infeasible in this constraint, since hₖ(xⁱ) ≤ 0
would imply that g(xⁱ) + (xⁱ − xⁱ)ᵀ∇g(xⁱ) = g(xⁱ) ≤ 0, which contradicts the fact that
g is violated at xⁱ (i.e., g(xⁱ) > 0).
So, if we add this extra linear constraint, then the optimal solution would
change, because the current optimal solution would no longer be feasible for the new
LP with the added constraint. Hence (*) defines a suitable cutting plane.
NOTE: If the objective for the original NLP was linear, then at each iteration we
merely solve a linear program!
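As an illustration (not from the notes), the cutting-plane loop can be sketched on a 1-D convex program, where the LP over an interval is trivial and the cut reduces to a Newton step on g:

```python
# Sketch of Kelley's cutting plane idea on a 1-D convex program (made up
# for illustration):
#     minimize -x  st  g(x) = x^2 - 4 <= 0,  starting polyhedron H = [0, 10].
# The LP "minimize -x over an interval" picks the right endpoint; each cut
# g(x^i) + (x - x^i) g'(x^i) <= 0 tightens that endpoint until x^i is feasible.

def g(x):  return x * x - 4
def dg(x): return 2 * x

ub = 10.0                          # right endpoint of the current polyhedron H
for i in range(100):
    x = ub                         # LP optimum for objective -x over [0, ub]
    if g(x) <= 1e-10:              # feasible in G: optimal for G as well, STOP
        break
    ub = x - g(x) / dg(x)          # the cut: x <= x^i - g(x^i)/g'(x^i)

print(round(x, 6))                 # 2.0, the true optimum of the original NLP
```

Each intermediate LP optimum is infeasible in G (g > 0), and only the final one is feasible, matching the NOTE in Step 2.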
188
An example (in R²) of how a cutting plane algorithm might proceed:
[Figure: four snapshots. Starting with nonnegativity plus 3 linear constraints defining H ⊇ G, the LP optimum x̄ is found; each subsequent snapshot adds one cutting plane (h₄, then h₅, then h₆) that makes the current x̄ infeasible, until x̄ = x*, the true optimum for G, with nonnegativity plus 6 linear constraints.]