
Chapter 4: Unconstrained Optimization

Unconstrained optimization problem:
$$\min_x F(x) \quad \text{or} \quad \max_x F(x)$$
Constrained optimization problem:
$$\min_x F(x) \quad \text{or} \quad \max_x F(x)$$
subject to $g(x) = 0$
and/or $h(x) < 0$ or $h(x) > 0$
Example: minimize the outer area of a cylinder subject to a fixed volume.
Objective function:
$$F(x) = 2\pi r^2 + 2\pi r h, \quad x = \begin{bmatrix} r \\ h \end{bmatrix}$$
Constraint: $\pi r^2 h = V$
Outline:
Part I: one-dimensional unconstrained optimization
Analytical method
Newton's method
Golden-section search method
Part II: multidimensional unconstrained optimization
Analytical method
Gradient method: steepest ascent (descent) method
Newton's method
PART I: One-Dimensional Unconstrained Optimization Techniques
1 Analytical approach (1-D)
$$\min_x F(x) \quad \text{or} \quad \max_x F(x)$$
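Let $F'(x) = 0$ and find $x = x^*$.
If $F''(x^*) > 0$, $F(x^*) = \min_x F(x)$ in a neighborhood of $x^*$, i.e. $x^*$ is a local minimum of $F(x)$;
If $F''(x^*) < 0$, $F(x^*) = \max_x F(x)$ in a neighborhood of $x^*$, i.e. $x^*$ is a local maximum of $F(x)$;
If $F''(x^*) = 0$, $x^*$ is a critical point of $F(x)$ and the second-derivative test is inconclusive: check the sign of $F'(x)$ on either side of $x^*$.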


Example 1: $F(x) = x^2$, $F'(x) = 2x = 0$, $x^* = 0$. $F''(x^*) = 2 > 0$. Therefore, $F(0) = \min_x F(x)$.
Example 2: $F(x) = x^3$, $F'(x) = 3x^2 = 0$, $x^* = 0$. $F''(x^*) = 0$. $x^*$ is neither a local minimum nor a local maximum.
Example 3: $F(x) = x^4$, $F'(x) = 4x^3 = 0$, $x^* = 0$. $F''(x^*) = 0$.
In example 2, $F'(x) > 0$ when $x < x^*$ and $F'(x) > 0$ when $x > x^*$.
In example 3, $x^*$ is a local minimum of $F(x)$. $F'(x) < 0$ when $x < x^*$ and $F'(x) > 0$ when $x > x^*$.
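The sign test applied to examples 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `classify_stationary_point` and the probe distance `h` are hypothetical choices made for this sketch, and the derivative is supplied by hand.

```python
def classify_stationary_point(dF, x_star, h=1e-3):
    """Classify x_star (where dF(x_star) = 0) from the sign of dF on each side.

    dF is the first derivative F'(x); h is an assumed small probe distance.
    """
    left, right = dF(x_star - h), dF(x_star + h)
    if left < 0 < right:
        return "local minimum"      # F decreases, then increases
    if left > 0 > right:
        return "local maximum"      # F increases, then decreases
    return "neither"                # no sign change across x_star


# The three examples from the text, each with x* = 0.
print(classify_stationary_point(lambda x: 2 * x, 0.0))       # F(x) = x^2
print(classify_stationary_point(lambda x: 3 * x ** 2, 0.0))  # F(x) = x^3
print(classify_stationary_point(lambda x: 4 * x ** 3, 0.0))  # F(x) = x^4
```

The check is useful exactly when $F''(x^*) = 0$, where the second-derivative test gives no answer.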
Figure 1: Example of constrained optimization problem
2 Newton's Method
$$\min_x F(x) \quad \text{or} \quad \max_x F(x)$$
Use $x_k$ to denote the current solution.
$$F(x_k + p) = F(x_k) + pF'(x_k) + \frac{p^2}{2}F''(x_k) + \dots \approx F(x_k) + pF'(x_k) + \frac{p^2}{2}F''(x_k)$$
$$F(x^*) = \min_x F(x) \approx \min_p F(x_k + p) \approx \min_p \left[ F(x_k) + pF'(x_k) + \frac{p^2}{2}F''(x_k) \right]$$
Let
$$\frac{\partial F(x_k + p)}{\partial p} \approx F'(x_k) + pF''(x_k) = 0$$
we have
$$p = -\frac{F'(x_k)}{F''(x_k)}$$
Newton's iteration
$$x_{k+1} = x_k + p = x_k - \frac{F'(x_k)}{F''(x_k)}$$
Example: find the maximum value of $f(x) = 2\sin x - \frac{x^2}{10}$ with an initial guess of $x_0 = 2.5$.
Solution:
$$f'(x) = 2\cos x - \frac{2x}{10} = 2\cos x - \frac{x}{5}$$
$$f''(x) = -2\sin x - \frac{1}{5}$$
$$x_{i+1} = x_i - \frac{2\cos x_i - \frac{x_i}{5}}{-2\sin x_i - \frac{1}{5}}$$
$x_0 = 2.5$, $x_1 = 0.995$, $x_2 = 1.469$.
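The iteration in this example can be reproduced with a short Python sketch. The function name `newton_optimize`, the tolerance `tol`, and the iteration cap `max_iter` are assumptions introduced here for illustration; the derivatives are the ones given in the solution.

```python
import math


def newton_optimize(df, d2f, x0, tol=1e-10, max_iter=50):
    """Newton's iteration x_{k+1} = x_k - F'(x_k) / F''(x_k).

    Returns the final iterate and the list of all iterates.
    tol and max_iter are assumed stopping parameters.
    """
    history = [x0]
    x = x0
    for _ in range(max_iter):
        curvature = d2f(x)
        if curvature == 0.0:
            raise ZeroDivisionError("F''(x) is zero; the Newton step is undefined")
        x_new = x - df(x) / curvature
        history.append(x_new)
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x, history


# f(x) = 2 sin x - x^2 / 10, starting from x0 = 2.5
df = lambda x: 2 * math.cos(x) - x / 5
d2f = lambda x: -2 * math.sin(x) - 1 / 5

x_opt, history = newton_optimize(df, d2f, 2.5)
print([round(x, 3) for x in history[:3]])   # first iterates: 2.5, 0.995, 1.469
print(round(x_opt, 4), d2f(x_opt) < 0)      # a negative F'' indicates a maximum
```

Because the iteration only drives $F'(x)$ to zero, the sign of $F''$ at the result is checked afterwards to tell a maximum from a minimum.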
Comments:
Same as the Newton-Raphson (N.-R.) method for solving $F'(x) = 0$.
Quadratic convergence: $|x_{k+1} - x^*| \le \beta |x_k - x^*|^2$ for some constant $\beta$.
May diverge.
Requires both first and second derivatives.
Solution can be either a local minimum or a local maximum.
3 Golden-section search for optimization in 1-D
$\max_x F(x)$ ($\min_x F(x)$ is equivalent to $\max_x [-F(x)]$)
Assume: only 1 peak value ($x^*$) in $(x_l, x_u)$
Steps:
1. Select $x_l < x_u$.
2. Select 2 intermediate values, $x_1$ and $x_2$, so that $x_1 = x_l + d$, $x_2 = x_u - d$, and $x_1 > x_2$.
3. Evaluate $F(x_1)$ and $F(x_2)$ and update the search range:
   If $F(x_1) < F(x_2)$, then $x^* < x_1$. Update $x_l = x_l$ and $x_u = x_1$.
   If $F(x_1) > F(x_2)$, then $x^* > x_2$. Update $x_l = x_2$ and $x_u = x_u$.
   If $F(x_1) = F(x_2)$, then $x_2 < x^* < x_1$. Update $x_l = x_2$ and $x_u = x_1$.
4. Estimate $x^* = x_1$ if $F(x_1) > F(x_2)$, and $x^* = x_2$ if $F(x_1) < F(x_2)$.
Figure 2: Golden search: updating search range (panels: $F(x_1) > F(x_2)$ and $F(x_1) < F(x_2)$)
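Calculate $\varepsilon_a$. If $\varepsilon_a < \varepsilon_{\text{threshold}}$, end.
$$\varepsilon_a = \left| \frac{x_{new} - x_{old}}{x_{new}} \right| \times 100\%$$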
The choice of d
Any value can be used as long as $x_1 > x_2$.
If d is selected appropriately, the number of function evaluations can be minimized.
Figure 3: Golden search: the choice of d
$d_0 = l_1$, $d_1 = l_2 = l_0 - d_0 = l_0 - l_1$. Therefore, $l_0 = l_1 + l_2$.
$\frac{l_0}{d_0} = \frac{l_1}{d_1}$. Then $\frac{l_0}{l_1} = \frac{l_1}{l_2}$.
$l_1^2 = l_0 l_2 = (l_1 + l_2) l_2$. Then $1 = \left(\frac{l_2}{l_1}\right)^2 + \frac{l_2}{l_1}$.
Define $r = \frac{d_0}{l_0} = \frac{d_1}{l_1} = \frac{l_2}{l_1}$. Then $r^2 + r - 1 = 0$, and $r = \frac{\sqrt{5} - 1}{2} \approx 0.618$.
$d = r(x_u - x_l) \approx 0.618(x_u - x_l)$ is referred to as the golden value.
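A quick numerical check shows why this choice of $d$ saves function evaluations: after the range is narrowed, one of the two new intermediate points coincides with a point that was already evaluated. The sketch below is illustrative only; the helper `interior_points` and the bracket $[0, 4]$ are assumptions made for the demonstration.

```python
import math

r = (math.sqrt(5) - 1) / 2   # positive root of r^2 + r - 1 = 0


def interior_points(xl, xu):
    """Return (x1, x2) with x1 = xl + d, x2 = xu - d and d = r * (xu - xl)."""
    d = r * (xu - xl)
    return xl + d, xu - d


x1, x2 = interior_points(0.0, 4.0)          # initial bracket [0, 4]
new_x1, new_x2 = interior_points(0.0, x1)   # bracket narrowed to [xl, x1]

print(abs(r * r + r - 1) < 1e-12)   # r satisfies the quadratic
print(abs(new_x1 - x2) < 1e-12)     # the new x1 is the old x2, so F(x2) can be reused
```

With any other ratio the two new points would both be fresh, and each iteration would cost two evaluations of $F$ instead of one.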
Relative error
$$\varepsilon_a = \left| \frac{x_{new} - x_{old}}{x_{new}} \right| \times 100\%$$
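Consider $F(x_2) < F(x_1)$. That is, $x_l = x_2$, and $x_u = x_u$.
For case (a), $x^* > x_2$ and $x^*$ is closer to $x_2$.
$$\begin{aligned}
\Delta x \le x_1 - x_2 &= (x_l + d) - (x_u - d) \\
&= (x_l - x_u) + 2d = (x_l - x_u) + 2r(x_u - x_l) \\
&= (2r - 1)(x_u - x_l) \approx 0.236(x_u - x_l)
\end{aligned}$$
For case (b), $x^* > x_2$ and $x^*$ is closer to $x_u$.
$$\begin{aligned}
\Delta x \le x_u - x_1 &= x_u - (x_l + d) = x_u - x_l - d \\
&= (x_u - x_l) - r(x_u - x_l) = (1 - r)(x_u - x_l) \\
&\approx 0.382(x_u - x_l)
\end{aligned}$$
Therefore, the maximum absolute error is $(1 - r)(x_u - x_l) \approx 0.382(x_u - x_l)$.
The relative error is then bounded by
$$\left| \frac{\Delta x}{x^*} \right| \times 100\% \le \frac{(1 - r)(x_u - x_l)}{|x^*|} \times 100\% \approx \frac{0.382(x_u - x_l)}{|x^*|} \times 100\%$$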
Example: Find the maximum of $f(x) = 2\sin x - \frac{x^2}{10}$ with $x_l = 0$ and $x_u = 4$ as the starting search range.
Solution:
Iteration 1: $x_l = 0$, $x_u = 4$, $d = \frac{\sqrt{5} - 1}{2}(x_u - x_l) = 2.472$, $x_1 = x_l + d = 2.472$, $x_2 = x_u - d = 1.528$. $f(x_1) = 0.63$, $f(x_2) = 1.765$.
Since $f(x_2) > f(x_1)$, $x^* = x_2 = 1.528$, $x_l = x_l = 0$ and $x_u = x_1 = 2.472$.
Iteration 2: $x_l = 0$, $x_u = 2.472$, $d = \frac{\sqrt{5} - 1}{2}(x_u - x_l) = 1.528$, $x_1 = x_l + d = 1.528$, $x_2 = x_u - d = 0.944$. $f(x_1) = 1.765$, $f(x_2) = 1.531$.
Since $f(x_1) > f(x_2)$, $x^* = x_1 = 1.528$, $x_l = x_2 = 0.944$ and $x_u = x_u = 2.472$.
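The two iterations above can be continued automatically with a short Python sketch of the golden-section steps. Several details are assumptions made for this sketch rather than part of the notes: the function name `golden_section_max`, the default tolerance, the use of the error bound $(1 - r)(x_u - x_l)/|x^*| \times 100\%$ as the stopping test, and the midpoint estimate when $F(x_1) = F(x_2)$. For readability it re-evaluates both intermediate points every iteration instead of reusing one.

```python
import math

R = (math.sqrt(5) - 1) / 2   # golden ratio r, about 0.618


def golden_section_max(f, xl, xu, tol_percent=1e-4, max_iter=200):
    """Golden-section search for the maximum of a single-peaked f on (xl, xu)."""
    x_opt = 0.5 * (xl + xu)
    for _ in range(max_iter):
        width = xu - xl                  # bracket width before the update
        d = R * width
        x1, x2 = xl + d, xu - d          # x1 > x2
        f1, f2 = f(x1), f(x2)
        if f1 > f2:                      # peak lies to the right of x2
            xl, x_opt = x2, x1
        elif f1 < f2:                    # peak lies to the left of x1
            xu, x_opt = x1, x2
        else:                            # peak lies between x2 and x1
            xl, xu = x2, x1
            x_opt = 0.5 * (x1 + x2)      # assumed estimate for the tie case
        # Stop once the relative error bound drops below the tolerance.
        if x_opt != 0.0 and (1 - R) * width / abs(x_opt) * 100 < tol_percent:
            break
    return x_opt


f = lambda x: 2 * math.sin(x) - x ** 2 / 10
x_star = golden_section_max(f, 0.0, 4.0)
print(round(x_star, 3), round(f(x_star), 3))
```

The bound is used for stopping instead of $\varepsilon_a$ computed from successive estimates because the estimate can stay unchanged between iterations (as in iterations 1 and 2, both giving 1.528), which would make $\varepsilon_a = 0$ long before the bracket is small.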
PART II: Multidimensional Unconstrained Optimization
4 Analytical Method
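Definitions:
If $f(x, y) < f(a, b)$ for all $(x, y)$ near $(a, b)$, $f(a, b)$ is a local maximum;
If $f(x, y) > f(a, b)$ for all $(x, y)$ near $(a, b)$, $f(a, b)$ is a local minimum.
If $f(x, y)$ has a local maximum or minimum at $(a, b)$, and the first order partial derivatives of $f(x, y)$ exist at $(a, b)$, then
$$\left.\frac{\partial f}{\partial x}\right|_{(a,b)} = 0, \quad \text{and} \quad \left.\frac{\partial f}{\partial y}\right|_{(a,b)} = 0$$
If
$$\left.\frac{\partial f}{\partial x}\right|_{(a,b)} = 0 \quad \text{and} \quad \left.\frac{\partial f}{\partial y}\right|_{(a,b)} = 0,$$
then $(a, b)$ is a critical point or stationary point of $f(x, y)$.
If
$$\left.\frac{\partial f}{\partial x}\right|_{(a,b)} = 0 \quad \text{and} \quad \left.\frac{\partial f}{\partial y}\right|_{(a,b)} = 0$$
and the second order partial derivatives of $f(x, y)$ are continuous, then
When $|H| > 0$ and $\left.\frac{\partial^2 f}{\partial x^2}\right|_{(a,b)} < 0$, $f(a, b)$ is a local maximum of $f(x, y)$.
When $|H| > 0$ and $\left.\frac{\partial^2 f}{\partial x^2}\right|_{(a,b)} > 0$, $f(a, b)$ is a local minimum of $f(x, y)$.
When $|H| < 0$, $f(a, b)$ is a saddle point.
Hessian of $f(x, y)$:
$$H = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial y \partial x} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix}$$
$$|H| = \frac{\partial^2 f}{\partial x^2} \cdot \frac{\partial^2 f}{\partial y^2} - \frac{\partial^2 f}{\partial x \partial y} \cdot \frac{\partial^2 f}{\partial y \partial x}$$
When $\frac{\partial^2 f}{\partial x \partial y}$ is continuous, $\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x}$.
When $|H| > 0$, $\frac{\partial^2 f}{\partial x^2} \cdot \frac{\partial^2 f}{\partial y^2} > 0$.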
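Example (saddle point): $f(x, y) = x^2 - y^2$.
$\frac{\partial f}{\partial x} = 2x$, $\frac{\partial f}{\partial y} = -2y$.
Let $\frac{\partial f}{\partial x} = 0$, then $x^* = 0$. Let $\frac{\partial f}{\partial y} = 0$, then $y^* = 0$.
Therefore, $(0, 0)$ is a critical point.
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}(2x) = 2, \quad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}(-2y) = -2$$
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}(-2y) = 0, \quad \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}(2x) = 0$$
$$|H| = \frac{\partial^2 f}{\partial x^2} \cdot \frac{\partial^2 f}{\partial y^2} - \frac{\partial^2 f}{\partial x \partial y} \cdot \frac{\partial^2 f}{\partial y \partial x} = -4 < 0$$
Therefore, $(x^*, y^*) = (0, 0)$ is a saddle point.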


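Figure 4: Saddle point ($z = x^2 - y^2$)
Example: $f(x, y) = 2xy + 2x - x^2 - 2y^2$, find the optimum of $f(x, y)$.
Solution:
$\frac{\partial f}{\partial x} = 2y + 2 - 2x$, $\frac{\partial f}{\partial y} = 2x - 4y$.
Let $\frac{\partial f}{\partial x} = 0$, $-2x + 2y = -2$.
Let $\frac{\partial f}{\partial y} = 0$, $2x - 4y = 0$.
Then $x^* = 2$ and $y^* = 1$, i.e., $(2, 1)$ is a critical point.
$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}(2y + 2 - 2x) = -2$$
$$\frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}(2x - 4y) = -4$$
$$\frac{\partial^2 f}{\partial x \partial y} = \frac{\partial}{\partial x}(2x - 4y) = 2, \quad \text{or} \quad \frac{\partial^2 f}{\partial y \partial x} = \frac{\partial}{\partial y}(2y + 2 - 2x) = 2$$
$$|H| = \frac{\partial^2 f}{\partial x^2} \cdot \frac{\partial^2 f}{\partial y^2} - \frac{\partial^2 f}{\partial x \partial y} \cdot \frac{\partial^2 f}{\partial y \partial x} = (-2) \times (-4) - 2^2 = 4 > 0$$
$\frac{\partial^2 f}{\partial x^2} < 0$. $(x^*, y^*) = (2, 1)$ is a local maximum.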
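The Hessian test used in both examples is mechanical enough to sketch as a small Python function. The name `classify_critical_point` and the "inconclusive" return value for $|H| = 0$ are assumptions of this sketch; the second derivatives are passed in as numbers already evaluated at the critical point.

```python
def classify_critical_point(fxx, fyy, fxy, fyx=None):
    """Classify a critical point of f(x, y) from its second partial derivatives.

    fyx defaults to fxy, which holds when the mixed partials are continuous.
    """
    if fyx is None:
        fyx = fxy
    det_h = fxx * fyy - fxy * fyx   # |H|
    if det_h > 0 and fxx < 0:
        return "local maximum"
    if det_h > 0 and fxx > 0:
        return "local minimum"
    if det_h < 0:
        return "saddle point"
    return "inconclusive"           # |H| = 0: the test gives no answer


# f(x, y) = x^2 - y^2 at (0, 0)
print(classify_critical_point(fxx=2, fyy=-2, fxy=0))
# f(x, y) = 2xy + 2x - x^2 - 2y^2 at (2, 1)
print(classify_critical_point(fxx=-2, fyy=-4, fxy=2))
```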


5 Steepest Ascent (Descent) Method
Idea: starting from an initial point, find the function maximum (minimum) along the steepest direction so that the shortest searching time is required.
Steepest direction: the direction in which the directional derivative is maximum, i.e. the gradient direction.
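Directional derivative
$$D_h f(x, y) = \frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta = \left\langle \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix}^T, \begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix}^T \right\rangle$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product.
Gradient
When $\begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix}^T$ is in the same direction as $\begin{bmatrix} \cos\theta & \sin\theta \end{bmatrix}^T$, the directional derivative is maximized. This direction is called the gradient of $f(x, y)$.
The gradient of a 2-D function is represented as $\nabla f(x, y) = \frac{\partial f}{\partial x}\vec{i} + \frac{\partial f}{\partial y}\vec{j}$, or $\begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix}^T$.
The gradient of an n-D function is represented as $\nabla f(\vec{X}) = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \dots & \frac{\partial f}{\partial x_n} \end{bmatrix}^T$, where $\vec{X} = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}^T$.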
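Example: $f(x, y) = xy^2$. Use the gradient to evaluate the path of steepest ascent at (2,2).
Solution:
$\frac{\partial f}{\partial x} = y^2$, $\frac{\partial f}{\partial y} = 2xy$.
$$\left.\frac{\partial f}{\partial x}\right|_{(2,2)} = 2^2 = 4, \quad \left.\frac{\partial f}{\partial y}\right|_{(2,2)} = 2 \times 2 \times 2 = 8$$
Gradient: $\nabla f(x, y) = \frac{\partial f}{\partial x}\vec{i} + \frac{\partial f}{\partial y}\vec{j} = 4\vec{i} + 8\vec{j}$
$\theta = \tan^{-1}\frac{8}{4} = 1.107$, or $63.4^\circ$.
$\cos\theta = \frac{4}{\sqrt{4^2 + 8^2}}$, $\sin\theta = \frac{8}{\sqrt{4^2 + 8^2}}$.
Directional derivative at (2,2): $\frac{\partial f}{\partial x}\cos\theta + \frac{\partial f}{\partial y}\sin\theta = 4\cos\theta + 8\sin\theta = 8.944$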
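If $\theta' \ne \theta$, for example, $\theta' = 0.5535$, then
$$D_{h'} f|_{(2,2)} = \frac{\partial f}{\partial x}\cos\theta' + \frac{\partial f}{\partial y}\sin\theta' = 4\cos\theta' + 8\sin\theta' = 7.608 < 8.944$$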
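The comparison between the gradient direction and any other direction can be checked numerically. The sketch below is an illustration only: the helper names `grad_f` and `directional_derivative` are hypothetical, and the off-gradient angle is taken as half the gradient angle purely as a sample direction.

```python
import math


def grad_f(x, y):
    """Gradient of f(x, y) = x * y**2."""
    return (y ** 2, 2 * x * y)


def directional_derivative(grad, theta):
    """Inner product of the gradient with the unit vector (cos theta, sin theta)."""
    fx, fy = grad
    return fx * math.cos(theta) + fy * math.sin(theta)


grad = grad_f(2.0, 2.0)                      # (4, 8)
theta = math.atan2(grad[1], grad[0])         # angle of the gradient, about 1.107 rad
steepest = directional_derivative(grad, theta)
off_axis = directional_derivative(grad, theta / 2)   # sample direction, about 0.5535 rad

print(round(theta, 3), round(math.degrees(theta), 1))
print(round(steepest, 3), round(off_axis, 3))
```

Along the gradient the directional derivative equals the gradient's length, $\sqrt{4^2 + 8^2}$, which no other unit direction can exceed.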


Steepest ascent method
Ideally:
Start from $(x_0, y_0)$. Evaluate the gradient at $(x_0, y_0)$.
Walk for a tiny distance along the gradient direction till $(x_1, y_1)$.
Reevaluate the gradient at $(x_1, y_1)$ and repeat the process.
Pros: always keeps the steepest direction and walks the shortest distance.
Cons: not practical due to continuous reevaluation of the gradient.
Practically:
Start from $(x_0, y_0)$.
Evaluate the gradient ($h$) at $(x_0, y_0)$.
Evaluate $f(x, y)$ in direction $h$.
Find the maximum function value in this direction at $(x_1, y_1)$.
Repeat the process until $(x_{i+1}, y_{i+1})$ is close enough to $(x_i, y_i)$.
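Find $\vec{X}_{i+1}$ from $\vec{X}_i$
For a 2-D function, evaluate $f(x, y)$ in direction $h$:
$$g(\alpha) = f\left(x_i + \left.\frac{\partial f}{\partial x}\right|_{(x_i, y_i)} \alpha,\; y_i + \left.\frac{\partial f}{\partial y}\right|_{(x_i, y_i)} \alpha\right)$$
where $\alpha$ is the coordinate in the h-axis.
For an n-D function $f(\vec{X})$,
$$g(\alpha) = f\left(\vec{X}_i + \nabla f|_{(\vec{X}_i)}\, \alpha\right)$$
Let $g'(\alpha) = 0$ and find the solution $\alpha = \alpha^*$.
Update $x_{i+1} = x_i + \left.\frac{\partial f}{\partial x}\right|_{(x_i, y_i)} \alpha^*$, $y_{i+1} = y_i + \left.\frac{\partial f}{\partial y}\right|_{(x_i, y_i)} \alpha^*$.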
Figure 5: Illustration of steepest ascent
Figure 6: Relationship between an arbitrary direction h and x and y coordinates
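Example: $f(x, y) = 2xy + 2x - x^2 - 2y^2$, $(x_0, y_0) = (-1, 1)$.
First iteration:
$x_0 = -1$, $y_0 = 1$.
$$\left.\frac{\partial f}{\partial x}\right|_{(-1,1)} = (2y + 2 - 2x)|_{(-1,1)} = 6, \quad \left.\frac{\partial f}{\partial y}\right|_{(-1,1)} = (2x - 4y)|_{(-1,1)} = -6$$
$$\nabla f = 6\vec{i} - 6\vec{j}$$
$$\begin{aligned}
g(\alpha) &= f\left(x_0 + \left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0)} \alpha,\; y_0 + \left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0)} \alpha\right) \\
&= f(-1 + 6\alpha,\; 1 - 6\alpha) \\
&= 2(-1 + 6\alpha)(1 - 6\alpha) + 2(-1 + 6\alpha) - (-1 + 6\alpha)^2 - 2(1 - 6\alpha)^2 \\
&= -180\alpha^2 + 72\alpha - 7
\end{aligned}$$
$g'(\alpha) = -360\alpha + 72 = 0$, $\alpha^* = 0.2$.
Second iteration:
$x_1 = x_0 + \left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0)} \alpha^* = -1 + 6 \times 0.2 = 0.2$, $y_1 = y_0 + \left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0)} \alpha^* = 1 - 6 \times 0.2 = -0.2$
$$\left.\frac{\partial f}{\partial x}\right|_{(0.2,-0.2)} = (2y + 2 - 2x)|_{(0.2,-0.2)} = 2 \times (-0.2) + 2 - 2 \times 0.2 = 1.2$$
$$\left.\frac{\partial f}{\partial y}\right|_{(0.2,-0.2)} = (2x - 4y)|_{(0.2,-0.2)} = 2 \times 0.2 - 4 \times (-0.2) = 1.2$$
$$\nabla f = 1.2\vec{i} + 1.2\vec{j}$$
$$\begin{aligned}
g(\alpha) &= f\left(x_1 + \left.\frac{\partial f}{\partial x}\right|_{(x_1, y_1)} \alpha,\; y_1 + \left.\frac{\partial f}{\partial y}\right|_{(x_1, y_1)} \alpha\right) \\
&= f(0.2 + 1.2\alpha,\; -0.2 + 1.2\alpha) \\
&= 2(0.2 + 1.2\alpha)(-0.2 + 1.2\alpha) + 2(0.2 + 1.2\alpha) - (0.2 + 1.2\alpha)^2 - 2(-0.2 + 1.2\alpha)^2 \\
&= -1.44\alpha^2 + 2.88\alpha + 0.2
\end{aligned}$$
$g'(\alpha) = -2.88\alpha + 2.88 = 0$, $\alpha^* = 1$.
Third iteration:
$x_2 = x_1 + \left.\frac{\partial f}{\partial x}\right|_{(x_1, y_1)} \alpha^* = 0.2 + 1.2 \times 1 = 1.4$, $y_2 = y_1 + \left.\frac{\partial f}{\partial y}\right|_{(x_1, y_1)} \alpha^* = -0.2 + 1.2 \times 1 = 1$
. . .
$(x^*, y^*) = (2, 1)$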
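The iterations of this example can be carried to convergence with a short Python sketch. It relies on an assumption that holds only because this $f$ is quadratic: with a constant Hessian $H$, the line search has the closed form $\alpha^* = -\frac{\nabla f^T \nabla f}{\nabla f^T H \nabla f}$, obtained from $g'(\alpha) = 0$. The names `grad_f`, `optimal_step`, `steepest_ascent` and the stopping parameters are hypothetical; a general function would need a numerical 1-D search in place of `optimal_step`.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = 2xy + 2x - x^2 - 2y^2."""
    return (2 * y + 2 - 2 * x, 2 * x - 4 * y)


# Constant Hessian of this quadratic f (assumption: valid for this example only).
H = ((-2.0, 2.0), (2.0, -4.0))


def optimal_step(grad):
    """Exact maximizer alpha* of g(alpha) = f(X + alpha * grad) for quadratic f."""
    gx, gy = grad
    hg_x = H[0][0] * gx + H[0][1] * gy
    hg_y = H[1][0] * gx + H[1][1] * gy
    curvature = gx * hg_x + gy * hg_y        # grad^T H grad, negative here
    return -(gx * gx + gy * gy) / curvature


def steepest_ascent(x, y, tol=1e-8, max_iter=500):
    """Repeat gradient evaluation and line search until the gradient vanishes."""
    path = [(x, y)]
    for _ in range(max_iter):
        grad = grad_f(x, y)
        if (grad[0] ** 2 + grad[1] ** 2) ** 0.5 < tol:
            break
        alpha = optimal_step(grad)
        x, y = x + alpha * grad[0], y + alpha * grad[1]
        path.append((x, y))
    return path


path = steepest_ascent(-1.0, 1.0)
print([(round(px, 3), round(py, 3)) for px, py in path[:3]])
print(tuple(round(v, 6) for v in path[-1]), len(path) - 1)
```

Successive search directions are perpendicular to each other, so the path zigzags toward $(2, 1)$ rather than heading straight for it.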
6 Newton's Method
Extend Newton's method for the 1-D case to the multidimensional case.
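Given $f(\vec{X})$, approximate $f(\vec{X})$ by a second order Taylor series at $\vec{X} = \vec{X}_i$:
$$f(\vec{X}) \approx f(\vec{X}_i) + \nabla f^T(\vec{X}_i)(\vec{X} - \vec{X}_i) + \frac{1}{2}(\vec{X} - \vec{X}_i)^T H_i (\vec{X} - \vec{X}_i)$$
where $H_i$ is the Hessian matrix evaluated at $\vec{X}_i$
$$H = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}$$
At the maximum (or minimum) point, $\frac{\partial f(\vec{X})}{\partial x_j} = 0$ for all $j = 1, 2, \dots, n$, or $\nabla f = \vec{0}$. Then
$$\nabla f(\vec{X}_i) + H_i(\vec{X} - \vec{X}_i) = 0$$
If $H_i$ is non-singular,
$$\vec{X} = \vec{X}_i - H_i^{-1} \nabla f(\vec{X}_i)$$
Iteration: $\vec{X}_{i+1} = \vec{X}_i - H_i^{-1} \nabla f(\vec{X}_i)$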
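Example: $f(\vec{X}) = 0.5x_1^2 + 2.5x_2^2$
$$\nabla f(\vec{X}) = \begin{bmatrix} x_1 \\ 5x_2 \end{bmatrix}$$
$$H = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 5 \end{bmatrix}$$
$$\vec{X}_0 = \begin{bmatrix} 5 \\ 1 \end{bmatrix}, \quad \vec{X}_1 = \vec{X}_0 - H^{-1} \nabla f(\vec{X}_0) = \begin{bmatrix} 5 \\ 1 \end{bmatrix} - \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{5} \end{bmatrix} \begin{bmatrix} 5 \\ 5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$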
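A single Newton step for a function of two variables can be sketched in Python as follows. The helper names `solve_2x2` and `newton_step` are assumptions of this sketch, and it solves the linear system $H s = \nabla f$ by Cramer's rule instead of forming $H^{-1}$ explicitly, which gives the same step for a non-singular 2 by 2 Hessian.

```python
def solve_2x2(H, b):
    """Solve H s = b for a 2x2 matrix H by Cramer's rule."""
    (h11, h12), (h21, h22) = H
    det = h11 * h22 - h12 * h21
    if det == 0:
        raise ValueError("Hessian is singular; the Newton step is undefined")
    s1 = (b[0] * h22 - h12 * b[1]) / det
    s2 = (h11 * b[1] - h21 * b[0]) / det
    return (s1, s2)


def newton_step(X, grad, hess):
    """One iteration X_{i+1} = X_i - H_i^{-1} grad f(X_i)."""
    s = solve_2x2(hess(X), grad(X))
    return (X[0] - s[0], X[1] - s[1])


# f(X) = 0.5 x1^2 + 2.5 x2^2
grad = lambda X: (X[0], 5 * X[1])
hess = lambda X: ((1.0, 0.0), (0.0, 5.0))

print(newton_step((5.0, 1.0), grad, hess))   # one step from X0 = (5, 1)
```

For a quadratic function the second order Taylor series is exact, so one step reaches the stationary point from any starting point; for general functions the step is repeated until the gradient is close to zero.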
Comments: Newton's method
Converges quadratically near the optimum
Sensitive to initial point
Requires matrix inversion
Requires first and second order derivatives