
PART II
OPTIMIZATION THEORY AND METHODS
Part II describes modern techniques of optimization and translates these
concepts into computational methods and algorithms. A vast literature on
optimization techniques exists, hence we have focused solely on methods which
have been proved effective for a wide range of problems. Optimization methods
have matured sufficiently during the past twenty years so that fast and reliable
methods are available to solve each important class of problem.
Six chapters make up Part II of this book, covering the following areas:
1. Mathematical concepts (Chapter 4)
2. One-dimensional search (Chapter 5)
3. Unconstrained multivariable optimization (Chapter 6)
4. Linear programming (Chapter 7)
5. Nonlinear programming (Chapter 8)
6. Optimization involving staged processes and discrete variables (Chapter 9)
The topics are grouped so that unconstrained methods are presented first, fol-
lowed by constrained methods. The last chapter in Part II deals with discontinu-
ous (integer) variables, a common category of problem in chemical engineering
but one quite difficult to solve without great effort.
As optimization methods as well as computer hardware have been im-
proved over the past two decades, the degree of difficulty of the problems that
can be solved has expanded significantly. Continued improvements in optimiza-
tion algorithms and computer hardware and software should enable optimiza-
tion of large-scale nonlinear problems involving thousands of variables, both
continuous and integer, some of which may be stochastic in nature.
CHAPTER 4
BASIC CONCEPTS OF OPTIMIZATION
4.1 Continuity of Functions
4.2 Unimodal Versus Multimodal Functions
4.3 Convex and Concave Functions
4.4 Convex Region
4.5 Necessary and Sufficient Conditions for an Extremum of an Unconstrained Function
4.6 Interpretation of the Objective Function in Terms of Its Quadratic Approximation
References
Problems
In order to understand the strategy of optimization procedures, certain
basic concepts must be described. In this chapter we will examine the properties
of objective functions and constraints to establish a basis for analyzing optimiza-
tion problems. Those features which are desirable (and also undesirable) in the
formulation of an optimization problem are identified. Both qualitative and
quantitative characteristics of functions will be described. In addition, we will
present the necessary and sufficient conditions to guarantee that a supposed
extremum is indeed a minimum or a maximum.
4.1 CONTINUITY OF FUNCTIONS
In carrying out analytical or numerical optimization you will find it preferable
and more convenient to work with continuous functions of one or more vari-
ables than with functions containing discontinuities. Functions having contin-
uous derivatives are also preferred. What does continuity mean? Examine
Fig. 4.1. In case A, the function is clearly discontinuous. Is case B also discon-
tinuous?
We define the property of continuity as follows. A function of a single variable x is continuous at a point x₀ if

(a) f(x₀) exists
(b) lim_{x→x₀} f(x) exists
(c) lim_{x→x₀} f(x) = f(x₀)
If f(x) is continuous at every point in region R, then f(x) is said to be continuous throughout R. For case B in Fig. 4.1, the function of x has a "kink" in it, but f(x) does satisfy the property of continuity. However, f′(x) ≡ df(x)/dx does not. Therefore, the function in case B is continuous but not continuously differentiable.
Figure 4.1 Functions with discontinuities in the function and/or derivatives: in Case A the function f(x) itself is discontinuous; in Case B the function has a "kink" where its derivative is discontinuous.
EXAMPLE 4.1 ANALYSIS OF FUNCTIONS FOR CONTINUITY
Are the following functions continuous? (a) f(x) = 1/x; (b) f(x) = ln x. In each case specify the range of x for which f(x) and f′(x) are continuous.

Solution
(a) f(x) = 1/x is continuous except at x = 0; f(0) is not defined. f′(x) = −1/x² is continuous except at x = 0.
(b) f(x) = ln x is continuous for x > 0. For x ≤ 0, ln x is not defined. As to f′(x) = 1/x, see (a).
A discontinuity in a function may or may not cause difficulty in optimization. In case A in Fig. 4.1, the maximum occurs reasonably far from the discontinuity, which may or may not be encountered in the search for the optimum. In case B, if a method of optimization that does not use derivatives is employed, then the "kink" in f(x) will probably be unimportant, but methods employing derivatives might fail, because the derivative becomes undefined at the discontinuity and has different signs on each side of it. Hence small changes in x do not lead to convergence.
One type of discontinuous objective function, one that allows only discrete
values of the independent variable(s), occurs frequently in process design be-
cause the process variables assume only specific values rather than continuous
values. Examples are the cost per unit diameter of pipe, the cost per unit area
for heat exchanger surface, or the insulation cost considered in Example 1.1. For
a pipe, we might represent the installed cost as a function of the pipe diameter
as shown in Fig. 4.2. See also Noltie (1978). Although in fact discontinuous, the
cost function can for most purposes be approximated as a continuous function
because of the relatively small differences in available pipe diameters. You could
Figure 4.2 Installed pipe cost as a function of diameter (commercially available pipe diameters occur at discrete values).
then disregard the discrete nature of the function and optimize the cost as if the diameter were a continuous variable. Once the optimum value of the diameter is obtained for the continuous function, the discrete-valued diameter nearest to the optimum that is commercially available can be selected. A suboptimal value for installed cost will result, but such a solution should be adequate for engineering purposes because of the narrow intervals between discrete values of the diameter.
EXAMPLE 4.2 OPTIMIZATION INVOLVING AN INTEGER-VALUED
VARIABLE
Consider a catalytic regeneration cycle in which there is a simple trade-off between costs incurred during regeneration and the increased revenues due to the regenerated catalyst. Let x₁ be the number of days during which the catalyst is used in the reactor, and x₂ be the number of days for regeneration. The reactor start-up crew is only available in the morning shift, so x₁ + x₂ must be an integer.

We will assume that the reactor feed-flow rate q (kg/day) is constant, as are the cost of the feed C₁ ($/kg), the value of the product C₂ ($/kg), and the regeneration cost C₃ ($/regeneration cycle). We will further assume that the catalyst deteriorates gradually according to the linear relation

d = 1.0 − kx₁

where 1.0 represents the weight-fraction conversion of feed at the start of the operating cycle, and k is the deterioration factor in units of weight fraction per day. Define an objective function and find the optimal value of x₁.

Solution. For one complete cycle of operation and regeneration, the objective function for the total profit per day is comprised of

Profit/Day = product value − feed cost − (regeneration cost per cycle)(cycles per day)

or, in the defined notation,

f(x) = [q C₂ x₁ d_avg − q C₁ x₁ − C₃] / (x₁ + x₂)        (a)

where d_avg = 1.0 − (kx₁/2).

The maximum daily profit for an entire cycle is obtained by maximizing Eq. (a) with respect to x₁. When the first derivative of Eq. (a) is set equal to zero and the resulting equation is solved for x₁, the optimum is

x₁^opt = −x₂ + [x₂² + (2/k)(x₂ − C₁x₂/C₂ + C₃/(qC₂))]^{1/2}

Suppose x₂ = 2, k = 0.02, q = 1000, C₁ = 0.4, C₂ = 1.0, and C₃ = 1000. Then x₁^opt = 12.97 (rounded to 13 days if x₁ is an integer).

Clearly, treating x₁ as a continuous variable may be improper if x₁ is 1, 2, 3, etc., but is probably satisfactory if x₁ is 15, 16, 17, etc. You might specify x₁ in terms of shifts of four or eight hours instead of days to obtain finer subdivisions of time.
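A short numerical check of this example is sketched below (a minimal illustration, not part of the original text, using the parameter values quoted above). It evaluates the daily-profit expression (a) directly, reproduces the continuous optimum from the closed-form solution, and then compares the neighboring integer values of x₁.

```python
import math

# Parameter values quoted in Example 4.2
q, C1, C2, C3 = 1000.0, 0.4, 1.0, 1000.0   # feed rate (kg/day), feed cost, product value, regeneration cost
k, x2 = 0.02, 2.0                           # deterioration factor (1/day), regeneration time (days)

def daily_profit(x1):
    """Equation (a): average profit per day over one operation-plus-regeneration cycle."""
    d_avg = 1.0 - k * x1 / 2.0
    return (q * C2 * x1 * d_avg - q * C1 * x1 - C3) / (x1 + x2)

# Continuous optimum from setting df/dx1 = 0
x1_opt = -x2 + math.sqrt(x2**2 + (2.0 / k) * (x2 - C1 * x2 / C2 + C3 / (q * C2)))
print(f"continuous optimum: x1 = {x1_opt:.2f} days, profit = {daily_profit(x1_opt):.2f} $/day")

# x1 + x2 must be an integer, so compare the neighboring integer values of x1
for x1 in (math.floor(x1_opt), math.ceil(x1_opt)):
    print(f"x1 = {x1:2d} days, profit = {daily_profit(x1):.2f} $/day")
```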
You will find in real life that other problems involving discrete variables
may not be so nicely posed. For example, if the cost is a function of the number
of discrete pieces of equipment, such as compressors, the optimization procedure
cannot ignore the integer character of the cost function because usually only a
small number of pieces of equipment are involved. You cannot install 1.54 com-
pressors, and rounding off to 1 or 2 compressors may be quite unsatisfactory.
This subject will be discussed in more detail in Chap. 9.
4.2 UNIMODAL VERSUS
MULTIMODAL FUNCTIONS
In formulating an objective function, it is far better, if possible, to choose a
unimodal than a multimodal function as a performance criterion. Compare Fig.
4.3a with Fig. 4.3b. A unimodal function f(x) (in the range specified for x) has a
single extremum (minimum or maximum) whereas a multimodal function has
two or more extrema. If f'(x) = 0 at the extremum, the point is called a station-
ary point (and can be a maximum or a minimum). Multiple stationary points
occur for multimodal functions. The distinction between the global extremum,
the biggest or smallest among a set of extrema, and local extrema (any extre-
mum) becomes significant in numerous practical optimization problems involv-
ing nonlinear functions. (Numerical procedures usually terminate at a local
extremum, and that point may not be the point you are seeking.) More pre-
cisely, for a function of a single variable, if x* is the point where f(x) reaches a
maximum (see Fig. 4.3a), unimodality is defined as
f(x₁) < f(x₂) < f(x*)    for x₁ < x₂ < x*
f(x₄) < f(x₃) < f(x*)    for x* < x₃ < x₄        (4.1)

Figure 4.3a A unimodal function.
Figure 4.3b A multimodal function.
For a maximum, the function f(x) monotonically increases from the left to the
maximum, and monotonically decreases to the right of the maximum. An analo-
gous set of relations to (4.1) can be written for a minimum. In Fig. 4.4, the point x₄ represents a saddle point (the concept of which will become clear when you examine Fig. 4.14 later). We shall in Chap. 5 describe numerical techniques that are
based on the underlying assumption that the function treated is unimodal. The
property of unimodality is difficult to establish analytically, as demonstrated by
Wilde and Beightler (1967). For functions of one or two variables, the function
can be plotted, making it obvious whether or not the function is unimodal. For
example, Fig. 4.4 illustrates the contours of a multimodal function of two vari-
ables with two minima.
Figure 4.4 Contours of f(x₁, x₂), a multimodal function.
4.3 CONVEX AND CONCAVE
FUNCTIONS
Determination of convexity, or concavity, will help you establish whether a local
optimal solution is also the global optimal solution (the best among all solu-
tions), a matter of some concern in view of the remarks made in Sec. 4.2 and in
Chap. 1 regarding multiple optima. When the objective function is known to
have certain properties (defined below), computation of the optimum can be ac-
celerated by using appropriate optimization algorithms. A function is called concave over the region R if the following relation holds for any two different values of x (x may be a vector x), x_a and x_b, lying in the region R:

f(θx_a + (1 − θ)x_b) ≥ θf(x_a) + (1 − θ)f(x_b)        (4.2)

where θ is a scalar having a value between 0 and 1. The function is strictly concave if the greater-than-or-equal-to sign in (4.2) is replaced with the greater-than (>) sign.
Figure 4.5 illustrates the concepts involved in Eq. (4.2). If the values of f(x) on each straight line (in the figure, the dashed line) connecting pairs of function values f(x_a) and f(x_b), for all pairs x_a and x_b, lie on or below the function values themselves, then the function is concave. If the values of f(x) on each straight line between f(x_a) and f(x_b) lie above or on the function values themselves, then the function is said to be convex. For a convex function the inequality sign in Eq. (4.2) is reversed. What would strictly convex imply with respect to the ≤ sign? The expressions "strictly concave" and "strictly convex" imply that the dashed lines between f(x_a) and f(x_b) cannot fall on the function itself except at the end points of the line. The common example of a function that is convex but not strictly convex is a straight line; the line is also concave but not strictly concave.
Equation (4.2) is not a convenient relation to use in testing for convexity or concavity. Instead, we will make use of the second derivative of f(x), or ∇²f(x) if x is a vector. ∇²f(x) is called the Hessian matrix of f(x), often denoted by the symbol H(x), and is the symmetric matrix of second partial derivatives of f(x) (see App. B).

Figure 4.5 Comparison of concave (a) and convex (b) functions.
Figure 4.6 Plot of the first derivatives of a quadratic concave and convex function of a single variable.
For example, if f(x) is a quadratic function of two variables, H(x) is the constant 2 × 2 matrix of second partial derivatives ∂²f/∂xᵢ∂xⱼ. Suppose we examine the simplest function of one variable that can have curvature, a quadratic function, such as shown in Figs. 4.5a and b. A plot of the first derivative of f(x) would appear as indicated in Fig. 4.6. Figure 4.7 illustrates the value of the second derivative, f″(x) = d²f(x)/dx².
From Fig. 4.7, we conclude that if the sign of the second derivative of f(x) in the range a ≤ x ≤ b is always negative or zero, then f(x) is concave; if the sign of f″(x) is positive or zero (nonnegative) for a ≤ x ≤ b, then f(x) is convex. Strictly concave means that the sign of f″(x) is always negative, and strictly convex means that the sign of f″(x) is always positive, in the specified range of x. A general proof of these statements is available in Aoki (1971).
Figure 4.7 Second derivative of f(x) for a quadratic function: (a) concave function; (b) convex function.
EXAMPLE 4.3 ANALYSIS FOR CONVEXITY AND CONCAVITY
For each of the functions below, determine if f(x) is convex, concave, strictly convex, strictly concave, all, or none of these classes in the range −∞ ≤ x ≤ ∞.

(a) f(x) = 3x²
(b) f(x) = 2x
(c) f(x) = −5x²
(d) f(x) = 2x² − x³

Solution
(a) f″(x) = 6, always positive, hence f(x) is both strictly convex and convex.
(b) f″(x) = 0 for all values of x, hence f(x) is both convex and concave; note that straight lines are both convex and concave simultaneously.
(c) f″(x) = −10, always negative, hence f(x) is both strictly concave and concave.
(d) f″(x) = 4 − 6x may be positive or negative depending on the value of x, hence f(x) is not convex or concave over the entire range of x.
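The second-derivative test used in this example is easy to mechanize with a symbolic package. The sketch below (an illustration, not part of the original text) computes f″(x) for each of the four functions with SymPy and checks its sign at a few sample points on the real line.

```python
import sympy as sp

x = sp.symbols('x', real=True)
functions = {
    '(a) 3x^2':       3 * x**2,
    '(b) 2x':         2 * x,
    '(c) -5x^2':      -5 * x**2,
    '(d) 2x^2 - x^3': 2 * x**2 - x**3,
}

sample = [-10, -1, 0, 1, 10]                  # a few test points (sampling, not a proof)
for label, f in functions.items():
    f2 = sp.diff(f, x, 2)                     # second derivative f''(x)
    values = [f2.subs(x, s) for s in sample]
    convex = all(v >= 0 for v in values)      # evidence for f'' >= 0 everywhere
    concave = all(v <= 0 for v in values)     # evidence for f'' <= 0 everywhere
    print(f"{label:16s} f''(x) = {f2},  convex? {convex}  concave? {concave}")
```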
The concepts of concavity and convexity also apply to a multivariable function f(x). For a multivariable objective function, the Hessian matrix H(x) must be evaluated to determine the nature of f(x). First let us summarize the definitions of matrix types:
1. H is positive definite if and only if xᵀHx > 0 for all x ≠ 0.
2. H is negative definite if and only if xᵀHx < 0 for all x ≠ 0.
3. H is indefinite if xᵀHx < 0 for some x and > 0 for other x.

Definitions 1 and 2 can be extended to H being positive semidefinite (xᵀHx ≥ 0) or negative semidefinite (xᵀHx ≤ 0) for all x. It can be shown from a Taylor series expansion that if f(x) has continuous second partial derivatives, f(x) is concave if and only if the Hessian matrix is negative semidefinite. For f(x) to be strictly concave, H must be negative definite. For f(x) to be convex, H(x) must be positive semidefinite, and for f(x) to be strictly convex, H(x) must be positive definite.
Two convenient tests can be used to establish the status of H(x) for strict convexity:

(1) All diagonal elements must be positive, and the determinants of all leading principal minors, det {Mᵢ(H)}, and of H(x) itself, det (H), are positive (> 0). Keep in mind that H(x) must be a symmetric matrix.

Another test is

(2) All the eigenvalues of H(x) are positive (> 0).
Table 4.1 Relationship between the character of f(x) and the state of H(x)

f(x) is            H(x) is                 All the eigenvalues     Determinants of the leading
                                           of H(x) are             principal minors of H (Δᵢ)
Strictly convex    Positive definite       > 0                     Δ₁ > 0, Δ₂ > 0, ...
Convex             Positive semidefinite   ≥ 0                     Δ₁ ≥ 0, Δ₂ ≥ 0, ...
Concave            Negative semidefinite   ≤ 0                     Δ₁ ≤ 0, Δ₂ ≥ 0, Δ₃ ≤ 0, ... (alternating sign)
Strictly concave   Negative definite       < 0                     Δ₁ < 0, Δ₂ > 0, Δ₃ < 0, ... (alternating sign)
Refer to App. B for details of how to calculate the leading principal minors and eigenvalues. For strict concavity, two alternative tests similarly can be employed:

(1) All diagonal elements must be negative, and det (H) and det {Mᵢ(H)} > 0 if i is even (i = 2, 4, 6, ...); det (H) and det {Mᵢ(H)} < 0 if i is odd (i = 1, 3, 5, ...), where Mᵢ is the ith leading principal minor.
(2) All the eigenvalues of H(x) are negative (< 0).

To establish convexity and concavity, the strict inequalities > or <, respectively, in the above tests are replaced by ≥ or ≤, respectively. If a function has a stationary point where the Hessian has eigenvalues of mixed signs, the function is neither convex nor concave.
Table 4.1 summarizes the relations between convexity, concavity, and the state of the Hessian matrix of f(x). We have omitted the indefinite case for H, that is, when f(x) is neither convex nor concave.
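The two tests above are simple to carry out numerically. The sketch below (an illustration, not from the original text) evaluates the leading principal minors and eigenvalues of a constant symmetric Hessian with NumPy and reports the corresponding row of Table 4.1; it is applied to the Hessian that appears in Example 4.4 below.

```python
import numpy as np

def classify_hessian(H, tol=1e-10):
    """Classify a constant symmetric Hessian according to Table 4.1."""
    H = np.asarray(H, dtype=float)
    eig = np.linalg.eigvalsh(H)                                        # eigenvalues of a symmetric matrix
    minors = [np.linalg.det(H[:k, :k]) for k in range(1, len(H) + 1)]  # leading principal minors
    if np.all(eig > tol):
        label = "positive definite -> f is strictly convex"
    elif np.all(eig > -tol):
        label = "positive semidefinite -> f is convex"
    elif np.all(eig < -tol):
        label = "negative definite -> f is strictly concave"
    elif np.all(eig < tol):
        label = "negative semidefinite -> f is concave"
    else:
        label = "indefinite -> f is neither convex nor concave"
    return eig, minors, label

# Hessian of f(x) = 2*x1**2 - 3*x1*x2 + 2*x2**2 (Example 4.4 below)
eig, minors, label = classify_hessian([[4.0, -3.0], [-3.0, 4.0]])
print("eigenvalues:", eig, " leading principal minors:", minors)
print(label)
```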
EXAMPLE 4.4 DETERMINATION OF POSITIVE DEFINITENESS
Classify the function f(x) = 2x₁² − 3x₁x₂ + 2x₂² using the categories in Table 4.1, or state that it does not belong in any of the categories.

Solution

∂f(x)/∂x₁ = 4x₁ − 3x₂        ∂f(x)/∂x₂ = −3x₁ + 4x₂

∂²f(x)/∂x₁² = 4        ∂²f(x)/∂x₂² = 4        ∂²f(x)/∂x₁∂x₂ = ∂²f(x)/∂x₂∂x₁ = −3
Both diagonal elements are positive.
The leading principal minors are:

M₁ (order 1) = 4,        det M₁ = 4
M₂ (order 2) = H,        det M₂ = 7
hence H(x) is positive definite. Consequently, f(x) is strictly convex (as well as
convex).
EXAMPLE 4.5 DETERMINATION OF POSITIVE DEFINITENESS
Repeat the analysis of Example 4.4 for f(x) = x₁² + x₁x₂ + 2x₂ + 4.

Solution

H(x) = [ 2  1 ]
       [ 1  0 ]

det (H(x)) = −1

The principal minors are det M₁ = 2 and det M₂ = −1.
Consequently, f(x) does not fall into any of the categories in Table 4.1. We con-
clude that no unique extremum exists. Can you demonstrate the same result by
calculating the eigenvalues of H?
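To answer the closing question, the eigenvalues of this H can be computed directly (a quick numerical check, not part of the original example); they have mixed signs, so H is indefinite and no unique extremum exists.

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 0.0]])        # Hessian of Example 4.5
eig = np.linalg.eigvalsh(H)        # exact values are 1 + sqrt(2) and 1 - sqrt(2)
print(eig)                         # approximately [-0.414, 2.414]: mixed signs, H is indefinite
```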
EXAMPLE 4.6 DETERMINATION OF CONVEXITY AND CONCAVITY
Repeat the analysis of Example 4.4 for f(x) = 2x₁ + 3x₂ + 6.

Solution

H(x) = [ 0  0 ]
       [ 0  0 ]

hence the function is both convex and concave.
EXAMPLE 4.7 DETERMINATION OF CONVEXITY
Consider the following objective function: is it convex? Use eigenvalues in the analysis.

f(x) = 2x₁² + 2x₁x₂ + 1.5x₂²

Solution

∂²f(x)/∂x₁² = 4        ∂²f(x)/∂x₂² = 3        ∂²f(x)/∂x₁∂x₂ = 2

Therefore the Hessian matrix is

H(x) = [ 4  2 ]
       [ 2  3 ]

Next determine the eigenvalues of H(x):

det [ 4 − α    2    ] = 0
    [   2    3 − α  ]

α² − 7α + 8 = 0

α₁ = 5.56        α₂ = 1.44

Because both eigenvalues are positive, the function is strictly convex (and convex, of course) for all values of x₁ and x₂.
EXAMPLE 4.8 CLASSIFICATION OF STATIONARY POINTS
Find the stationary points and their classification for the nonlinear function

f(x) = x₁³ + x₂² − 3x₁ + 8x₂ + 2

Solution. The stationary points are found by solving simultaneously

∂f/∂x₁ = 3x₁² − 3 = 0        and        ∂f/∂x₂ = 2x₂ + 8 = 0

and are (1, −4) and (−1, −4). Next find the Hessian:

H(x) = [ 6x₁  0 ]
       [  0   2 ]

For x* = (1, −4), H is positive definite, hence f(x*) is convex at that point. For x* = (−1, −4), H is indefinite. This point corresponds to a saddle point, a topic to be discussed later in Sec. 4.6.
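The classification in this example can be reproduced symbolically. The sketch below (an illustration, not part of the original text) solves ∇f(x) = 0 with SymPy and inspects the eigenvalues of the Hessian at each stationary point.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**3 + x2**2 - 3*x1 + 8*x2 + 2

grad = [sp.diff(f, v) for v in (x1, x2)]
H = sp.hessian(f, (x1, x2))

for point in sp.solve(grad, [x1, x2], dict=True):         # the stationary points
    eig = list(H.subs(point).eigenvals())
    if all(e > 0 for e in eig):
        kind = "local minimum (H positive definite)"
    elif all(e < 0 for e in eig):
        kind = "local maximum (H negative definite)"
    else:
        kind = "saddle point (H indefinite)"
    print(point, eig, kind)
```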
In some instances we can use the concept that a sum of convex (concave) functions is also convex (concave). We can examine any f(x) in terms of its component parts. If the function is separable into f(x) = f₁(x) + f₂(x), and if f₁(x) is convex and f₂(x) is convex, then f(x) is convex. For example,

f(x) = a₁(x₁ − c₁)² + a₂(x₂ − c₂)²        aᵢ ≥ 0

is convex.
4.4 CONVEX REGION
A convex region (set of points) plays a useful role in optimization involving constraints. Figure 4.8 illustrates the concept of a convex region and one that is not convex.

Figure 4.8 Convex and nonconvex regions: (a) a convex region; (b) a nonconvex region.

A convex set of points exists if for any two points in a region, x_a and x_b, all points x = θx_a + (1 − θ)x_b, where 0 ≤ θ ≤ 1, on the line joining x_a and x_b are in the set. In Fig. 4.8b note how this requirement is not satisfied along the dashed line. If a region is completely bounded by concave functions for the case in which all gᵢ(x) ≥ 0, then the functions form a closed convex region. For the case in which all the inequality constraints stated in the form gᵢ(x) ≤ 0 are convex functions, the functions form a convex region. Keep in mind that straight lines are both concave and convex functions.
EXAMPLE 4.9 DETECTION OF A CONVEX REGION
Does the following set of constraints, which form a closed region, form a convex region?

−x₁² + x₂ ≥ 1
x₁ − x₂ ≤ −2

Solution. A plot of the two functions indicates the region circumscribed is closed. The arrows in Fig. E4.9 designate the directions in which the inequalities hold. Write the inequality constraints as gᵢ ≥ 0. Therefore

g₁(x) = −x₁² + x₂ − 1 ≥ 0
g₂(x) = −x₁ + x₂ − 2 ≥ 0

That the enclosed region is convex can be demonstrated by showing that both g₁(x) and g₂(x) are concave functions:

H[g₁(x)] = [ −2  0 ]
           [  0  0 ]

H[g₂(x)] = [ 0  0 ]
           [ 0  0 ]

Since all eigenvalues are zero or negative, according to Table 4.1 both g₁ and g₂ are concave and the region is convex.
Figure E4.9 Convex region composed of two concave functions.
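The conclusion of this example can also be checked directly against the definition of a convex set. The sketch below (an illustrative numerical check, not part of the original text) samples random pairs of feasible points and verifies that every point on the connecting line segment remains feasible.

```python
import numpy as np

rng = np.random.default_rng(0)

def feasible(x):
    """Constraints of Example 4.9 written as g_i(x) >= 0."""
    x1, x2 = x
    return (-x1**2 + x2 - 1 >= 0) and (-x1 + x2 - 2 >= 0)

# Collect feasible sample points inside a bounding box
points = [p for p in rng.uniform(-4, 8, size=(20000, 2)) if feasible(p)]

all_segments_feasible = True
for _ in range(2000):
    i, j = rng.choice(len(points), size=2, replace=False)
    xa, xb = points[i], points[j]
    for theta in np.linspace(0.0, 1.0, 11):
        if not feasible(theta * xa + (1 - theta) * xb):
            all_segments_feasible = False
print("all sampled line segments stayed feasible:", all_segments_feasible)
```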
EXAMPLE 4.10 CONSTRUCTION OF A CONVEX REGION
Construct the region given by the following inequality constraints; is it convex?

x₁ ≤ 6        x₂ ≤ 6        x₁ ≥ 0        x₂ ≥ 0        x₁ + x₂ ≤ 6
Solution. See Fig. E4.10 for the region delineated by the inequality constraints. By visual inspection, the region is convex. This set of linear inequality constraints
Figure E4.10 Diagram of region defined by linear inequality constraints.
forms a convex region since all the constraints are concave. In this case the convex
region is closed.
As mentioned before, the existence of convex regions has an important
bearing on optimization of functions subject to constraints. For example, exam-
ine Fig. 4.9. In optimization problems involving constraints, points that satisfy
all the constraints are said to be feasible points; all other points are nonfeasible
Figure 4.9 Effect of the character of the region of search in solving constrained optimization problems: (a) the unconstrained maximum lies outside the feasible region; (b) the unconstrained maximum lies inside the feasible region; (c) a nonconvex feasible region containing a local maximum and a local and global maximum (objective-function contours shown dashed).
points. Inequality constraints specify a feasible region comprised of the set of points that are feasible, whereas equality constraints limit the feasible set of points to hypersurfaces (multidimensional surfaces defined by hᵢ(x₁, x₂, ..., x_n) = 0),
curves, or perhaps even a single point. Solid lines in Fig. 4.9 represent inequality
constraint boundaries. Dashed lines are the contours of the objective function.
All points that satisfy all the inequality constraints (a) as strict inequalities are
termed interior points, and (b) as equalities are termed boundary points. All other
points are exterior points.
In Fig. 4.9a the maximum of the unconstrained objective function lies out-
side the feasible region. It is clear that the given set of constraints requires that
the search for the optimum terminate at a lesser value of the objective function
(for the maximum) termed the constrained solution. In Fig. 4.9a one of the
constraints at the optimal solution is "active," that is, the inequality constraint is
satisfied at its boundary as an equality (=0). If no constraints are "active" at
the optimal solution then the unconstrained maximum can be reached as shown
in Fig. 4.9b. Figure 4.9c illustrates the importance of having the constraints com-
prise a convex region if a global extremum is to be located. The search for the
maximum of f(x), if initiated in the left-hand part of the feasible region, might
well terminate at point A, whereas a search starting in the right-hand side of the
region might terminate at B. Therefore, we conclude that the nature of the
search region has an important bearing on the potential for obtaining suitable
results in optimization. If the feasible region in Fig. 4.9c were extended, as
shown by the dashed line, to make the region convex, then the search from any
initial point would converge to the same answer.
4.5 NECESSARY AND SUFFICIENT
CONDITIONS FOR AN EXTREMUM
OF AN UNCONSTRAINED FUNCTION
In optimization of an unconstrained function we are concerned with finding the
minimum or maximum of an objective function f(x), a function of one or more
variables. The problem can be interpreted geometrically as finding the point in
an n-dimension space at which the function has an extremum. Examine Fig. 4.10
in which the contours of a function of two variables are displayed.
An optimal point x* is completely specified by satisfying what are called
the necessary and sufficient conditions for optimality. A condition N is necessary for a result R if R can be true only if the condition is true (R ⇒ N). However, the reverse is not true; that is, if N is true, R is not necessarily true. A condition S is sufficient for a result R if R is true whenever the condition is true (S ⇒ R). A condition T is necessary and sufficient for result R if R is true if and only if T is true (T ⇔ R).
The easiest way to develop the necessary and sufficient conditions for a
minimum or maximum of f(x) is to start with a Taylor series expansion about
the presumed extremum x*
f(x) = f(x*) + ∇ᵀf(x*) Δx + ½ Δxᵀ ∇²f(x*) Δx + O₃(Δx) + ...        (4.3)
Figure 4.10a A function of two variables with a single stationary point, the extremum.

Figure 4.10b A function of two variables with three stationary points and two extrema, A and B.
where Δx = x − x*, the perturbation of x from x*. We assume all terms in Eq. (4.3) exist and are continuous, but we will ignore the terms of order 3 and higher and simply analyze what occurs for various cases involving just the terms through the second order.

By definition, a local minimum is a point x* such that no other point in the vicinity of x* yields a value of f(x) less than f(x*), or

f(x) − f(x*) ≥ 0        (4.4)

(x* is a global minimum if (4.4) holds for any x in the n-dimensional space of x). Similarly, x* is a local maximum if

f(x) − f(x*) ≤ 0        (4.5)
Examine the second term on the right-hand side of Eq. (4.3), ∇ᵀf(x*) Δx. Because Δx is arbitrary and its elements can take both plus and minus values, we must insist that ∇f(x*) = 0, or otherwise a term could be added to f(x*) so that Eq. (4.4) for a minimum, or (4.5) for a maximum, would be violated. Hence, a necessary condition for a minimum or maximum of f(x) is that the gradient of f(x) vanishes at x*,

∇f(x*) = 0        (4.6)

that is, x* is a stationary point.
With the second term on the right-hand side of Eq. (4.3) forced to be zero, we next examine the third term, ½ Δxᵀ ∇²f(x*) Δx. This term establishes the character of the stationary point (minimum, maximum, or saddle point). In Fig.
4.10b, A and B are minima while C is a saddle point. Note how movement along
one of the perpendicular search directions (dashed lines) from point C increases
f(x) whereas movement in the other direction decreases f(x). Thus, satisfaction
of the necessary conditions does not guarantee a minimum or maximum. Figure
4.11 illustrates the character of f(x) if the objective function is a function of a
single variable.
Figure 4.11 A function exhibiting different types of stationary points: a, inflection point (the scalar equivalent of a saddle point); b, global maximum (and local maximum); c, local minimum; d, local maximum.
To establish the existence of a minimum or maximum at x*, we know from Eq. (4.3) with ∇f(x*) = 0 and the conclusions reached in Sec. 4.3 concerning convexity that for Δx ≠ 0:

∇²f(x*)                  Δxᵀ∇²f(x*)Δx                  Near x*, f(x) − f(x*)
Positive definite        > 0                            Increases
Positive semidefinite    ≥ 0                            Possibly increases
Negative definite        < 0                            Decreases
Negative semidefinite    ≤ 0                            Possibly decreases
Indefinite               Both ≤ 0 and ≥ 0,              Increases, decreases, or neither
                         depending on Δx

Consequently, x* can be classified as

∇²f(x*)                  x*
Positive definite        Unique ("isolated") minimum
Negative definite        Unique ("isolated") maximum
These two conditions are known as the sufficiency conditions.
In summary, the necessary conditions (1 and 2 below) and the sufficient
condition (3) to guarantee that x* is an extremum are as follows:
1. f(x) is twice differentiable at x*.
2. Vf(x*) = 0, that is, a stationary point exists at x*.
3. H(x*) is positive definite for a minimum to exist at x*, and negative definite
for a maximum to exist at x*.
Of course, a minimum or maximum may exist at x* even though it is not possible to demonstrate the fact using the three conditions. For example, if f(x) = x^(4/3), x* = 0 is a minimum, but H(0) is not defined at x* = 0, hence condition 3 is not satisfied.
EXAMPLE 4.11 CALCULATION OF A MINIMUM OF f(x)
Does f(x) = x⁴ have an extremum? If so, what are the values of x* and f(x*) at the extremum?

Solution

f′(x) = 4x³        f″(x) = 12x²

Set f′(x) = 0 and solve for x; hence x = 0 is a stationary point. Also, f″(0) = 0, meaning that condition 3 is not satisfied. Figure E4.11 is a plot of f(x) = x⁴. Thus, a minimum exists for f(x), but the sufficiency condition is not satisfied.
Figure E4.11 Plot of f(x) = x⁴.
If both first and second derivatives vanish at the stationary point, then
further analysis is required to evaluate the nature of the function. For functions of
a single variable, take successively higher derivatives and evaluate them at the sta-
tionary point. Continue this procedure until one of the higher derivatives is not
zero (the nth one); hence f′(x*), f″(x*), ..., f⁽ⁿ⁻¹⁾(x*) all vanish. Two cases must
be analyzed:
(a) If n is even, the function attains a maximum or a minimum; a positive sign of
f(n) indicates a minimum, a negative sign a maximum.
(b) If n is odd, the function exhibits a saddle point.
For more details refer to Beveridge and Schechter (1970).
For application of these guidelines to f(x) = x⁴, you will find d⁴f(x)/dx⁴ = 24, for which n is even and the derivative is positive, so that a minimum exists.
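This successive-differentiation rule is easy to mechanize; the short sketch below (an illustration, not part of the original text) applies it to f(x) = x⁴ at the stationary point x* = 0.

```python
import sympy as sp

x = sp.symbols('x', real=True)
f = x**4
x_star = 0

n, d = 1, sp.diff(f, x)
while d.subs(x, x_star) == 0:          # find the first nonvanishing derivative at x*
    n += 1
    d = sp.diff(f, x, n)

value = d.subs(x, x_star)
print(f"first nonzero derivative at x* is of order n = {n}, value = {value}")
if n % 2 == 0:
    print("n is even:", "minimum" if value > 0 else "maximum")
else:
    print("n is odd: no extremum (inflection/saddle behavior)")
```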
EXAMPLE 4.12 CALCULATION OF EXTREMA
Identify the stationary points of the following function (Fox, 1971), and determine if any extrema exist:

f(x) = 4 + 4.5x₁ − 4x₂ + x₁² + 2x₂² − 2x₁x₂ + x₁⁴ − 2x₁²x₂

Solution. For this function, three stationary points can be located by setting ∇f(x) = 0:

∂f(x)/∂x₁ = 4.5 + 2x₁ − 2x₂ + 4x₁³ − 4x₁x₂ = 0        (a)
∂f(x)/∂x₂ = −4 + 4x₂ − 2x₁ − 2x₁² = 0                  (b)

The set of nonlinear equations (a) and (b) has to be solved, say by Newton's method, to get the pairs (x₁, x₂) as follows:

Stationary point (x₁, x₂)    f(x)       Hessian matrix eigenvalues    Classification
(1.941, 3.854)               0.9855     37.03      0.97               Local minimum
(−1.053, 1.028)              −0.5134    10.5       3.5                Local minimum (also the global minimum)
(0.6117, 1.4929)             2.83       7.0        −2.56              Saddle point
Figure 4.10b shows contours for the objective function in this example. Note
that the global minimum can only be identified by evaluating f(x) for all the
local minima. For general nonlinear objective functions, it is usually difficult to
ascertain the nature of the stationary points without detailed examination of
each point.
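A numerical version of this procedure is sketched below (an illustration that assumes the objective function as written above, not part of the original text). It solves Eqs. (a) and (b) from several starting guesses with scipy.optimize.fsolve and classifies each stationary point from the Hessian eigenvalues.

```python
import numpy as np
from scipy.optimize import fsolve

def f(x):
    x1, x2 = x
    return 4 + 4.5*x1 - 4*x2 + x1**2 + 2*x2**2 - 2*x1*x2 + x1**4 - 2*x1**2*x2

def grad(x):
    x1, x2 = x
    return [4.5 + 2*x1 - 2*x2 + 4*x1**3 - 4*x1*x2,     # equation (a)
            -4.0 + 4*x2 - 2*x1 - 2*x1**2]              # equation (b)

def hessian(x):
    x1, x2 = x
    return np.array([[2 + 12*x1**2 - 4*x2, -2 - 4*x1],
                     [-2 - 4*x1,            4.0]])

roots = []
for guess in [(2.0, 4.0), (-1.0, 1.0), (0.5, 1.5)]:    # several starting points
    r = fsolve(grad, guess)
    if not any(np.allclose(r, q, atol=1e-4) for q in roots):
        roots.append(r)

for r in roots:
    eig = np.linalg.eigvalsh(hessian(r))
    kind = "minimum" if np.all(eig > 0) else ("maximum" if np.all(eig < 0) else "saddle point")
    print(f"x* = ({r[0]:.4f}, {r[1]:.4f}),  f = {f(r):.4f},  eigenvalues = {eig},  {kind}")
```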
EXAMPLE 4.13
In many types of processes, such as batch constant-pressure filtration or fixed-bed ion exchange, the production rate decreases as a function of time. At some optimal time t^opt, production is terminated (at P^opt) and the equipment is cleaned. Figure E4.13a illustrates the cumulative throughput P(t) as a function of time t for such a process. For one cycle of production and cleaning, the overall production rate is

R(t) = P(t) / (t + t_c)        (a)

where R(t) is the overall production rate per cycle (mass/time) and t_c is the cleaning time (assumed to be constant). Determine the maximum production rate and show that t^opt indeed corresponds to a maximum.

Figure E4.13a Cumulative throughput P(t) as a function of time.
Solution. Differentiate R(t) with respect to t and equate the derivative to zero:

dR(t)/dt = [−P(t) + (dP(t)/dt)(t + t_c)] / (t + t_c)² = 0

P^opt = (dP(t)/dt)|_opt (t^opt + t_c)        (b)

The geometric interpretation of Eq. (b) is the classical result (Walker et al., 1937) that the tangent to P(t) at P^opt intersects the time axis at −t_c. Examine Fig. E4.13b. The maximum overall production rate is

R^opt = P^opt / (t^opt + t_c)        (c)

Figure E4.13b Tangent to P(t) at t^opt, with slope (dP(t)/dt)|_opt, intersecting the time axis at −t_c.

Figure E4.13c Plot of dP(t)/dt versus t; its slope, d²P(t)/dt², is negative.
Does P^opt meet the sufficiency condition to be a maximum? Is

d²R(t)/dt² = {2P(t) − 2[dP(t)/dt](t + t_c) + [d²P(t)/dt²](t + t_c)²} / (t + t_c)³ < 0 ?        (d)

Rearrangement of (d) and introduction of (b) into (d) for the pair (P^opt, t^opt) gives

d²R(t)/dt² |_opt = [d²P(t)/dt²]|_opt / (t^opt + t_c)

From Fig. E4.13b we note in the range 0 < t < t^opt that dP(t)/dt is always positive and decreasing, so that d²P(t)/dt² is always negative (see Fig. E4.13c). Consequently, the sufficiency condition is met.
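As an illustration of this result (a hypothetical case, not from the original example), suppose the cumulative throughput follows the declining-rate form P(t) = a√t. Setting dR/dt = 0 then gives t^opt = t_c analytically, and both the optimum and the tangent construction of Eq. (b) can be confirmed numerically.

```python
import numpy as np
from scipy.optimize import minimize_scalar

a, tc = 50.0, 4.0                       # assumed throughput coefficient and cleaning time

P = lambda t: a * np.sqrt(t)            # assumed cumulative throughput, mass
R = lambda t: P(t) / (t + tc)           # overall production rate, Eq. (a)

res = minimize_scalar(lambda t: -R(t), bounds=(1e-6, 100.0), method='bounded')
t_opt = res.x
print(f"numerical t_opt = {t_opt:.3f}   (for P = a*sqrt(t) the analytical result is t_c = {tc})")

# Tangent test of Eq. (b): the tangent to P(t) at t_opt should cross the time axis at -t_c
slope = a / (2.0 * np.sqrt(t_opt))      # dP/dt evaluated at t_opt
t_intercept = t_opt - P(t_opt) / slope
print(f"tangent intersects the time axis at t = {t_intercept:.3f}   (expected {-tc})")
```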
4.6 INTERPRETATION OF THE OBJECTIVE
FUNCTION IN TERMS
OF ITS QUADRATIC APPROXIMATION
If a function of two variables is quadratic, or is approximated by a quadratic function

f(x) = b₀ + b₁x₁ + b₂x₂ + b₁₁x₁² + b₂₂x₂² + b₁₂x₁x₂

then the eigenvalues of H(x) can be calculated and used to interpret the nature of f(x) at x*. Table 4.2 lists some conclusions that can be reached by examining the eigenvalues of H(x) for a function of two variables, and Figs. 4.12 through 4.16 illustrate the
Table 4.2 Geometric interpretation of a quadratic function

Case   Eigenvalue      Signs of      Types of         Geometric            Character of center
       relations       α₁    α₂      contours         interpretation       of contours            Figure
1      α₁ = α₂         −     −       Circles          Circular hill        Maximum                4.12
2      α₁ = α₂         +     +       Circles          Circular valley      Minimum                4.12
3      α₁ > α₂         −     −       Ellipses         Elliptical hill      Maximum                4.13
4      α₁ > α₂         +     +       Ellipses         Elliptical valley    Minimum                4.13
5      |α₁| = |α₂|     +     −       Hyperbolas       Symmetrical saddle   Saddle point           4.14
6      |α₁| = |α₂|     −     +       Hyperbolas       Symmetrical saddle   Saddle point           4.14
7      α₁ > α₂         +     −       Hyperbolas       Elongated saddle     Saddle point           4.14
8      α₂ = 0          −     0       Straight lines   Stationary ridge     None                   4.15
9      α₂ = 0          +     0       Straight lines   Stationary valley    None                   4.15
10     α₂ = 0          −     0       Parabolas        Rising ridge*†       At ∞                   4.16
11     α₂ = 0          +     0       Parabolas        Falling valley*†     At ∞                   4.16

* These are "degenerate" surfaces.
† The condition of rising or falling must be evaluated from the linear terms in f(x).
Figure 4.12 Geometry of second-order objective function of two independent variables-circular
contours.
Figure 4.13 Geometry of second-order objective function of two independent variables-elliptical
contours.
Figure 4.14 Geometry of second-order objective function of two independent variables-saddle
point.
Figure 4.15 Geometry of second-order objective function of two independent variables-valley.
Figure 4.16 Geometry of second-order objective function of two independent variables-falling
valley.
different types of surfaces corresponding to each case that arises for a quadratic function. By implication, analysis of a function of many variables via examination of the eigenvalues can be conducted, whereas contour plots are limited to functions of only two or three variables.
Figures 4.12 and 4.13 correspond to objective functions in well-posed opti-
mization problems. In Table 4.2, cases 1 and 2 (Fig. 4.12) correspond to con-
tours of f(x) that are concentric circles, but such functions rarely occur in
practice. Elliptical contours such as correspond to cases 3 and 4 are most likely
for well-behaved functions. Cases 5 to 10 correspond to degenerate problems,
those in which there is no finite maximum/minimum and/or perhaps nonunique
optima appear.
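A compact way to apply Table 4.2 is sketched below (an illustration, not from the original text): given the Hessian of a quadratic function of two variables, the signs of its eigenvalues select the qualitative type of surface.

```python
import numpy as np

def surface_type(H, tol=1e-10):
    """Map the eigenvalue signs of a 2 x 2 symmetric Hessian to the cases of Table 4.2."""
    a1, a2 = sorted(np.linalg.eigvalsh(np.asarray(H, dtype=float)), reverse=True)
    if a1 > tol and a2 > tol:
        return "valley (minimum): circular contours if a1 == a2, elliptical otherwise"
    if a1 < -tol and a2 < -tol:
        return "hill (maximum): circular contours if a1 == a2, elliptical otherwise"
    if a1 > tol and a2 < -tol:
        return "saddle point (hyperbolic contours)"
    return "degenerate surface: ridge or valley (straight-line or parabolic contours)"

print(surface_type([[4, 2], [2, 2]]))   # Example 4.14 below: elliptical valley (case 4)
print(surface_type([[2, 2], [2, 2]]))   # eigenvalues 4 and 0: degenerate case
print(surface_type([[2, 3], [3, 2]]))   # eigenvalues 5 and -1: saddle point
```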
EXAMPLE 4.14 INTERPRETATION OF AN OBJECTIVE FUNCTION
IN TERMS OF ITS EIGENVALUES
Examine the function f(x) = 2x₁² + x₂² + 2x₁x₂, for which

H(x) = [ 4  2 ]
       [ 2  2 ]

det [ 4 − α    2    ] = 0
    [   2    2 − α  ]

α² − 6α + 4 = 0

The eigenvalues are α = (6 ± √(36 − 16))/2 = 3 ± √5, both positive, but not equal, corresponding to case 4 in Table 4.2.
Now consider f(x) = x₁² + 2x₁x₂ + x₂², for which

H(x) = [ 2  2 ]
       [ 2  2 ]

det [ 2 − α    2    ] = 0
    [   2    2 − α  ]

so α₁ = 4 and α₂ = 0. This case corresponds to case 9 in Table 4.2.
For well-posed quadratic objective functions the contours always form a
convex region; for more general nonlinear functions, they do not (see for exam-
ple Figs. 4.10a or 4.10b). It is helpful to construct contour plots to assist in
analyzing the performance of multivariable optimization techniques when ap-
plied to problems of two or three dimensions. Most computer libraries have
contour plotting routines to generate the desired figures.
As indicated in Table 4.2, the eigenvalues of the Hessian matrix of f(x) indicate the shape of a function. For a positive definite symmetric matrix, the eigenvectors (refer to App. B) form an orthonormal set. For example, in two dimensions, if the eigenvectors are v₁ and v₂, then v₁ᵀv₂ = 0 (the eigenvectors are perpendicular to each other). The eigenvectors also correspond to the directions of the principal axes of the contours of f(x). This information can be used to make a variable transformation that eliminates the "tilt" in the contours and yields search directions that are efficient. The next example illustrates such a calculation.
EXAMPLE 4.15
For f(x) = 3x₁² + 2x₁x₂ + 1.5x₂², find the equations for the principal axes and determine a transformation x = Vz such that f = a₁₁z₁² + a₂₂z₂², thus eliminating the interaction term.

Solution. The optimum is at (0, 0). First, compute H(x) and the corresponding eigenvalues and eigenvectors:

H(x) = [ 6  2 ]
       [ 2  3 ]

The eigenvalues are 2 and 7, and the corresponding eigenvectors are

v₁ = [ −1 ]        and        v₂ = [ 2 ]
     [  2 ]                        [ 1 ]

Refer to App. B for the procedure to calculate these quantities. Note that v₁ᵀv₂ = 0.

From linear algebra (Amundson, 1966), the transformation that eliminates the interaction (x₁x₂) term is x = Vz, where

V = [ −1  2 ]
    [  2  1 ]
Substituting for V and z = [z₁  z₂]ᵀ,

x₁ = −z₁ + 2z₂        (a)
x₂ = 2z₁ + z₂         (b)

If Eqs. (a) and (b) are substituted into the objective function, the result in terms of z₁ and z₂ is

f(z) = 5z₁² + 17.5z₂²
The equations for the principal axes can be found from the eigenvectors [−1  2]ᵀ and [2  1]ᵀ:

x₂ = −2.0x₁        and        x₂ = 0.5x₁

Figure E4.15 illustrates the contours of f(x) and the principal axes for this example. The variable transformation represents a rotation of the variable axes from the x₁-x₂ plane to the z₁-z₂ plane.
Figure E4.15 Contours and principal axes for f = 3x₁² + 2x₁x₂ + 1.5x₂².
This example was somewhat simplified because the origin of the principal
axes and the origin for the original axes were the same. If they were not the same,
then the origin for the principal axes would be at the extremum, and you would
transform the variables so that the new variables in the principal axes, y, would be
related to x by y = x - x*, and thus eliminate linear terms in f(x).
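The calculation in Example 4.15 can be reproduced in a few lines (a sketch that assumes the coefficients used above): the eigenvectors of H form the columns of V, and substituting x = Vz removes the interaction term.

```python
import numpy as np
import sympy as sp

H = np.array([[6.0, 2.0],
              [2.0, 3.0]])                  # Hessian of f = 3*x1**2 + 2*x1*x2 + 1.5*x2**2
eigvals, eigvecs = np.linalg.eigh(H)
print("eigenvalues:", eigvals)              # [2. 7.]
print("orthonormal eigenvectors (columns):")
print(eigvecs)                              # proportional to [-1, 2] and [2, 1]

# Verify symbolically that x = V z eliminates the x1*x2 term
z1, z2 = sp.symbols('z1 z2')
V = sp.Matrix([[-1, 2], [2, 1]])            # unnormalized eigenvectors as columns
x1, x2 = V * sp.Matrix([z1, z2])
f = 3*x1**2 + 2*x1*x2 + sp.Rational(3, 2)*x2**2
print(sp.expand(f))                         # 5*z1**2 + 35*z2**2/2, i.e. 5 z1^2 + 17.5 z2^2
```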
One of the primary requirements of any successful optimization technique
is the ability to be able to move rapidly in a local region along a narrow valley
(in minimization) toward the minimum of the objective function. In other words,
an efficient algorithm will select a search direction which generally follows the
axis of the valley rather than jumping back and forth across the valley. Valleys
(ridges in maximization) occur quite frequently, at least locally, and these types
of surfaces have the potential to slow down greatly the search for the optimum.
A valley lies in the direction of the eigenvector associated with a small eigen-
value of the Hessian matrix of the objective function. For example, if the Hessian matrix of a quadratic function is

H = [ 1   0 ]
    [ 0  10 ]

then the eigenvalues are α₁ = 1 and α₂ = 10. The eigenvector associated with α₁ = 1, that is, the x₁ axis, is lined up with the valley in the ellipsoid. Transformation techniques such as those discussed above can be used to allow the problem to be more efficiently solved by a search technique. (See Chap. 6.)
Valleys and ridges corresponding to cases 1 through 4 can lead to a mini-
mum or maximum, respectively, but not for other cases. Do you see why?
REFERENCES
Amundson, N. R., Mathematical Methods in Chemical Engineering: Matrices and Their Application,
Prentice-Hall, Englewood Cliffs, New Jersey (1966).
Aoki, M., Introduction to Optimization Techniques, Macmillan Co., New York (1971), p. 24.
Beveridge, G. S. G., and R. S. Schechter, Optimization: Theory and Practice, McGraw-Hill, New York (1970), p. 126.
Fox, R. L., Optimization Methods for Engineering Design, Addison-Wesley, Reading, Massachusetts (1971), p. 42.
Noltie, C. B., Optimum Pipe Size Selection, Gulf Publ. Co., Houston, Texas (1978).
Walker, W. H., W. K. Lewis, W. H. McAdams, and E. R. Gilliland, Principles of Chemical Engineer-
ing, 3d ed., McGraw-Hill, New York (1937), p. 357.
Wilde, D. J., and C. S. Beightler, Foundations of Optimization, Prentice-Hall, Englewood Cliffs, New
Jersey (1967), p. 220.
SUPPLEMENTARY REFERENCES
Avriel, M., Nonlinear Programming, Prentice-Hall, Englewood Cliffs, New Jersey (1976).
Jeter, M. W., Mathematical Programming, Marcel Dekker, New York (1986).
Walsh, G. R., Methods of Optimization, John Wiley, New York (1975).
Wilde, D. J., Optimum Seeking Methods, Prentice-Hall, Englewood Cliffs, New Jersey (1964).
PROBLEMS
4.1. Classify the following functions as continuous (specify the range) or discrete:
(a) f(x) = eˣ
(b) f(x) = ax_{n−1} + b(x₀ − x_n), where x_n represents a stage in a distillation column
(c) f(x) = x_n − x_s/(1 + x_s), where x_n is the concentration of vapor from a still and x_s is the concentration in the still
4.2. The future worth S of a series of n uniform payments each of amount P is

S = (P/i)[(1 + i)ⁿ − 1]

where i is the interest rate per period. If i is considered to be the only variable, is it discrete or continuous? Explain. Repeat for n. Repeat for both n and i being variables.
4.3. In a plant the gross profit P in dollars is
P = nS - (nV + F)
where n is the number of units produced per year, S is the sales price in dollars per
unit, V is the variable cost of production in dollars per unit, and F is the fixed
charge in dollars. Suppose that the average unit cost is calculated as
Average unit cost = (nV + F)/n
Discuss under what circumstances n can be treated as a continuous variable.
4.4. One rate of return is the ratio of net profit P to total investment
R = 100 P(1 − t)/I = 100(1 − t)[S − (V + F/n)]/(I/n)
where t is the fraction tax rate and I is the total investment in dollars. Find the
maximum R as a function of n for a given I if n is a continuous variable. Repeat if n
is discrete. (See Prob. 4.3 for other notation.)
4.5. Classify the following functions as unimodal, multimodal, or neither:
(a) f(x) = 2x + 3
(b) f(x) = x²
(c) f(x) = a cos (x)
4.6. Are the following functions unimodal, multimodal, or neither:
(a) f(x) = x₁² + x₂² + x₁x₂
(b) f(x) = x₁⁴ − 4x₁²x₂ +
(c) f(x) =
4.7. Determine the convexity or concavity of the following objective functions:
(a) f(x₁, x₂) = (x₁ − x₂)² + x₂²
(b) f(x₁, x₂, x₃) = x₁² + x₂² + x₃²
(c) f(x₁, x₂) = e^{x₁} + e^{x₂}
4.8. Given a linear objective function

f = x₁ + x₂

explain why a nonconvex region such as region A in Fig. P4.8 will yield difficulties in the search for the maximum.

Figure P4.8 Nonconvex region A (x₁ and x₂ must lie in region A).

Why is region A not convex?
4.9. Are the following functions convex? Strictly convex? Why?
(a) 2x₁² + 2x₁x₂ + x₂² + 7x₁ + 8x₂ + 25
    What are the optimum values of x₁ and x₂?
(b) e^{5x}
4.10. A reactor converts an organic compound to product P by heating the material in
the presence of an additive A (mole fraction = x
A
). The additive can be injected into
the reactor, while steam can be injected into a heating coil inside the reactor to
provide heat. Some conversion can be obtained by heating without addition of A,
and vice versa.
The product P can be sold for $50 per lb-mol. For 1 lb-mol of feed, the cost of the additive (in dollars per lb-mol) as a function of x_A is given by the formula 2.0 + 10x_A + … x_A². The cost of the steam (in dollars) as a function of S is 1.0 + 0.003S + 2.0 × 10⁻⁶S² (S = lb steam/lb-mol feed). The yield equation is Y_P = 0.1 + 0.3x_A + 0.001S + 0.0001x_A S, where Y_P = lb-mol product P/lb-mol feed.
(a) Formulate the profit function (basis of 1.0 lb-mol feed) in terms of x_A and S:

    f = income − costs

(b) Maximize f subject to the constraints

    0 ≤ x_A ≤ 1        S ≥ 0
by any method you choose.
(c) Is f a concave function? Demonstrate mathematically why it is or why it is not
concave.
(d) Is the region of search convex? Why?
4.11. Show that f = e^{x₁} + e^{x₂} is convex. Is it also strictly convex?
4.12. Show that f = |x| is convex.
4.13. Is the following region, constructed by the four constraints, convex? Closed?

x₂ ≥ 1 − x₁
x₂ ≤ 1 + 0.5x₁
x₁ ≤ 2
x₂ ≥ 0
4.14. Does the following set of constraints form a convex region?
g₁(x) = −(x₁² + x₂²) + 9 ≥ 0
g₂(x) = −x₁ − x₂ + 1 ≥ 0
4.15. Consider the following problem:
Minimize f(x) = x₁² + x₂
Subject to g₁(x) = x₁² + x₂² − 9 ≤ 0
           g₂(x) = x₁ + x₂² − 1 ≤ 0
           g₃(x) = x₁ + x₂ − 1 ≤ 0
Does the constraint set form a convex region? Is it closed? (Hint: a plot will help
decide.)
4.16. The objective function (work requirement) for a three-stage compressor can be expressed as

f = (p₂/p₁)^0.286 + (p₃/p₂)^0.286 + (p₄/p₃)^0.286

where p₁ = 1 atm and p₄ = 10 atm. The minimum occurs at a pressure ratio for each stage of (10)^{1/3}. Is f convex for 1 ≤ p₂ ≤ 10, 1 ≤ p₃ ≤ 10?
4.17. Happel and Jordan [Chemical Process Economics, Marcel Dekker, New York, 1975,
p. 178] reported an objective function (cost) for the design of a distillation column
as follows:

f = 14720(100 − P) + 6560R − 30.2PR + 6560 − 30.2P
    + 19.5n(5000R − 23PR + 5000 − 23P)^0.5
    + 23.2(5000R − 23PR + 5000 − 23P)^0.62

where n = number of theoretical stages, R = reflux ratio, and P = percent recovery in the bottoms stream. They reported that the optimum occurs at R = 8, n = 55, and P = 99. Is f convex at this point? Are there nearby regions where f is not convex?
4.18. Consider the function

y = (x − a)²

Note that x = a minimizes y. Let z = x² − 4x + 16. Does the solution to x² − 4x + 16 = 0,

x = (4 ± √(−48))/2 = 2 ± j2√3

minimize z? (j = √(−1)).
4.19. The following objective function can be seen by inspection to have a minimum at
x=O:
Can the criteria of Sec. 4.5 be applied to test this outcome?
4.20. (a) Consider the objective function,
f = 6xi + ~ + 6X
1
X
2
+ x ~
Find the stationary points and classify them using the Hessian matrix.
(b) Repeat for
(c) Repeat for
4.21. Classify the stationary points of
(a) f = −x⁴ + x³ + 20
(b) f = x³ + 3x² + x + 5
(c) f = x⁴ − 2x² + 1
(d) f = x₁² − 8x₁x₂ + x₂²
according to Table 4.2.
4.22. List the stationary points and their classification (maximum, minimum, saddle point) of
(a) f = x₁² + 2x₁ + x₂² + 6x₂ + 4
(b) f = x₁ + x₂ + x₁² − 4x₁x₂ + x₂²
4.23. Show that the eigenvectors of the Hessian for a two-variable quadratic optimization
problem are orthogonal.
4.24. Figure P4.24 shows the contours of a quadratic optimization problem, which has a
minimum at (2, 2) where f(x) = 5. If the general quadratic function is described by
f = ½ xᵀAx + bᵀx + c
give characteristics of A, b, and c; be as quantitative as possible.
Figure P4.24 Contours of the quadratic objective function; the minimum f(x) = 5 is at (2, 2).
4.25. We wish to minimize

f(x) = −12x₁ + 2x₁² + ⋯ − 2x₁x₂

where the coefficients of the four terms are denoted a₁, a₂, a₃, and a₄, respectively.
(a) Find the stationary point and determine if it is a maximum or minimum based on the Hessian matrix.
(b) The coefficients of each term (a₁, a₂, a₃, a₄) can affect whether a maximum or a minimum occurs. If three of the four coefficients are fixed at the above values and we vary the remaining one, find the individual values of a₁, a₂, a₃, and a₄ that change the character of the stationary point (e.g., from minimum to saddle point).
4.26. For

f = x₁² + x₁x₂ + x₂²

find the transformation x = Vz such that

f = a₁₁z₁² + a₂₂z₂²

Illustrate the transformed coordinate system on a contour plot of f(x).
