
Optimization

Unconstrained Minimization
Def: $f(x)$, $x \in R^n$, is said to be differentiable at a point $x^*$ if it is defined in a neighborhood $N$ around $x^*$ and if for $x^* + h \in N$ there exists a vector $a$, independent of $h$, such that

$f(x^* + h) = f(x^*) + \langle a, h \rangle + \langle h, \alpha(x^*, h) \rangle$

where the vector $a$ is called the gradient of $f(x)$ evaluated at $x^*$, denoted

$a = \nabla f(x^*) = \left[ \dfrac{\partial f}{\partial x_1} \;\; \dfrac{\partial f}{\partial x_2} \;\; \cdots \;\; \dfrac{\partial f}{\partial x_n} \right]^T \Big|_{x^*}$

The term $\langle a, h \rangle$ is called the 1st variation. Here

$\alpha(x^*, h) = [\alpha_1(x^*, h) \;\; \cdots \;\; \alpha_n(x^*, h)]^T$

and

$\lim_{h \to 0} \alpha_i(x^*, h) = 0, \quad i = 1, 2, \ldots, n$

Unconstrained Minimization
Note: if $f(x)$ is twice differentiable, then

$\alpha(x^*, h) = \frac{1}{2} F(x^*)\, h + \text{H.O.T.}$

where $F(x)$ is an $n \times n$ symmetric matrix, called the Hessian of $f(x)$:

$F(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[2mm] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \end{bmatrix}$

Then

$f(x^* + h) = f(x^*) + \underbrace{\langle \nabla f(x^*), h \rangle}_{\text{1st variation}} + \underbrace{\tfrac{1}{2} \langle F(x^*) h, h \rangle}_{\text{2nd variation}} + \text{H.O.T.}$
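As a quick sanity check (my own example, not from the slides), the following Python sketch verifies the expansion for a quadratic $f$, where the H.O.T. vanish identically:

```python
# Compare f(x*+h) with the first- and second-variation approximation
# f(x*) + <grad f, h> + 0.5*<F h, h> for the quadratic f(x) = x1^2 + 3*x1*x2.
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

def hess_f(x):
    return np.array([[2.0, 3.0],
                     [3.0, 0.0]])

x_star = np.array([1.0, 2.0])
h = np.array([1e-3, -2e-3])

second_order = f(x_star) + grad_f(x_star) @ h + 0.5 * h @ hess_f(x_star) @ h
print(f(x_star + h) - second_order)   # ~0: H.O.T. vanish for a quadratic f
```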

Directional derivatives
Let $w$ be a direction vector of unit norm, $\|w\| = 1$. Now consider

$g(r) = f(x^* + r w)$

which is a function of the scalar $r$ (geometrically, the point $x^*$ is displaced by $rw$ to $x^* + rw$).

Def: The directional derivative of $f(x)$ in the direction $w$ (unit norm) at $x^*$ is defined as

$D_w f(x^*) = \dfrac{dg(r)}{dr}\Big|_{r=0} = \lim_{r \to 0} \dfrac{g(r) - g(0)}{r} = \lim_{r \to 0} \dfrac{f(x^* + rw) - f(x^*)}{r}$

$= \lim_{r \to 0} \dfrac{\langle \nabla f(x^*), rw \rangle + \langle \alpha(x^*, rw), rw \rangle}{r}$

$= \lim_{r \to 0} \big( \langle \nabla f(x^*), w \rangle + \langle \alpha(x^*, rw), w \rangle \big)$

$= \langle \nabla f(x^*), w \rangle$, since $\lim_{r \to 0} \langle \alpha(x^*, rw), w \rangle = 0$.

Directional derivatives
Example: Let

$w = e_i = [0 \;\cdots\; 1 \;\cdots\; 0]^T$ (1 in the $i$-th entry)

Then

$D_{e_i} f(x^*) = \langle \nabla f(x^*), e_i \rangle = \dfrac{\partial f}{\partial x_i}$

i.e. the partial derivative of $f$ at $x^*$ w.r.t. $x_i$ is the directional derivative of $f(x)$ in the direction of $e_i$.

Interpretation of $D_w f(x^*)$:
Consider the projection of $\nabla f(x^*)$ on $w$,

$\mathrm{proj}\, \nabla f(x^*) = \langle \nabla f(x^*), w \rangle\, w$

Then, since $\langle w, w \rangle = \|w\|^2 = 1$,

$\|\mathrm{proj}\, \nabla f(x^*)\| = |\langle \nabla f(x^*), w \rangle| = |D_w f(x^*)|$

The directional derivative along a direction $w$ ($\|w\| = 1$) is the length of the projection vector of $\nabla f(x^*)$ on $w$.

Unconstrained Minimization

[Q]: What direction $w$ yields the largest directional derivative?

Ans:

$w = \dfrac{\nabla f(x^*)}{\|\nabla f(x^*)\|}$

Recall that the 1st variation of $f(x^* + rw)$ is

$r \langle \nabla f(x^*), w \rangle = r\, D_w f(x^*)$

Conclusion 1: The direction of the gradient is the direction that yields the largest change (1st variation) in the function. This suggests $x_{k+1} = x_k - \alpha \nabla f(x_k)$, as in the steepest descent method, which will be described later (see the sketch below).
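A small numerical illustration of Conclusion 1 (the gradient value here is a made-up example): sampling many unit directions shows $D_w f(x^*)$ peaks along $\nabla f(x^*)$, with peak value $\|\nabla f(x^*)\|$:

```python
# Sample random unit directions w and confirm D_w f(x*) = <grad f(x*), w>
# is largest when w points along the gradient.
import numpy as np

grad = np.array([2.0, 4.0])            # suppose grad f(x*) = [2, 4]^T
rng = np.random.default_rng(0)
ws = rng.normal(size=(10000, 2))
ws /= np.linalg.norm(ws, axis=1, keepdims=True)   # unit-norm directions

dir_derivs = ws @ grad                  # D_w f(x*) for each sample
best_w = ws[np.argmax(dir_derivs)]
print(best_w, grad / np.linalg.norm(grad))        # nearly identical
print(dir_derivs.max(), np.linalg.norm(grad))     # max D_w f ~ ||grad f||
```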

Directional derivatives
Example: $f(x) = x_1^2 + x_2^2$, $x \in R^2$. The level curves $f(x_1, x_2) = c$ are circles.

Sol:

$\nabla f(x) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}$

Let $\bar{x} = [1 \;\; 1]^T$. Then

$\nabla f(\bar{x}) = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \qquad \|\nabla f(\bar{x})\| = \sqrt{8}$

and $w$ with unit norm $= \dfrac{\nabla f(\bar{x})}{\|\nabla f(\bar{x})\|} = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$

(Figure: level circles $f(x_1, x_2) = c$ with $\nabla f(\bar{x})$ normal to them.)

Directional derivatives
The directional derivative in the direction of the gradient is

$D_w f(\bar{x}) = \langle \nabla f(\bar{x}), w \rangle = \Big\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \Big\rangle = \dfrac{4}{\sqrt{2}} = 2\sqrt{2}$

Notes:

$D_{e_1} f(\bar{x}) = \langle \nabla f(\bar{x}), e_1 \rangle = \Big\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \Big\rangle = 2 \le 2\sqrt{2}$

$D_{e_2} f(\bar{x}) = \langle \nabla f(\bar{x}), e_2 \rangle = \Big\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Big\rangle = 2 \le 2\sqrt{2}$
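A minimal check of these numbers in Python (same $f$ and $\bar{x}$ as above):

```python
# Numerical check of the example: f(x) = x1^2 + x2^2 at x_bar = [1, 1]^T.
import numpy as np

x_bar = np.array([1.0, 1.0])
grad = 2.0 * x_bar                   # grad f = [2x1, 2x2]^T = [2, 2]^T
w = grad / np.linalg.norm(grad)      # unit vector along the gradient

print(grad @ w)                      # 2*sqrt(2) ~ 2.828: D_w f(x_bar)
print(grad @ np.array([1.0, 0.0]))   # D_e1 f = 2
print(grad @ np.array([0.0, 1.0]))   # D_e2 f = 2
```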

Directional derivatives
Def: $f(x)$ is said to have a local (or relative) minimum at $x^*$ if, in a nbd $N$ of $x^*$,

$f(x^*) \le f(x), \quad \forall x \in N$

Theorem: Let $f(x)$ be differentiable. If $f(x)$ has a local minimum at $x^*$, then $\nabla f(x^*) = 0$ (or $D_w f(x^*) = 0, \; \forall w$).

pf:

$f(x^* + h) - f(x^*) = \langle \nabla f(x^*), h \rangle + \langle \alpha(x^*, h), h \rangle \ge 0$

As $h \to 0$ this requires $\langle \nabla f(x^*), h \rangle \ge 0$ for every $h$. In fact $\langle \nabla f(x^*), h \rangle = 0$, since if for some $h$ we had $\langle \nabla f(x^*), h \rangle > 0$, then $h_1 = -h$ would give $\langle \nabla f(x^*), h_1 \rangle < 0$. Therefore $\nabla f(x^*) = 0$.

Note: $\nabla f(x^*) = 0$ is a necessary condition, not a sufficient condition.

Directional derivatives
Theorem: If $f(x)$ is twice differentiable and $x^*$ satisfies

(1) $\nabla f(x^*) = 0$
(2) $F(x^*)$ is a positive definite matrix (i.e. $v^T F(x^*) v > 0, \; \forall v \ne 0$)

then $x^*$ is a local minimum of $f(x)$.

pf:

$f(x^* + h) - f(x^*) = \langle \nabla f(x^*), h \rangle + \frac{1}{2} \langle F(x^*) h, h \rangle + \text{H.O.T.} = \frac{1}{2} h^T F(x^*) h + \text{H.O.T.} > 0$

for sufficiently small $h$, hence $f(x^*) \le f(x), \; \forall x \in N(x^*)$.

Conclusion 2: The necessary & sufficient conditions for a local minimum of $f(x)$ at $x^*$ are

(1) $\nabla f(x^*) = 0$
(2) $F(x^*)$ is p.d. (a quick test is sketched below)
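A minimal sketch of the test, assuming NumPy and using a Cholesky factorization (which succeeds exactly for p.d. matrices) for condition (2); the helper name is hypothetical:

```python
# Check the two conditions: grad f(x*) ~ 0 and F(x*) positive definite.
import numpy as np

def is_local_min(grad, hessian, tol=1e-8):
    if np.linalg.norm(grad) > tol:       # condition (1): gradient vanishes
        return False
    try:
        np.linalg.cholesky(hessian)      # succeeds iff F(x*) is p.d.
        return True                      # condition (2) holds
    except np.linalg.LinAlgError:
        return False

# f(x) = x1^2 + x2^2 at x* = 0: grad = 0, F = 2I is p.d.
print(is_local_min(np.zeros(2), 2.0 * np.eye(2)))   # True
```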

Minimization of Unconstrained Functions


Prob.: Let $y = f(x)$, $x \in R^n$. We want to generate a sequence $x^{(0)}, x^{(1)}, x^{(2)}, \ldots$ such that $f(x^{(0)}) > f(x^{(1)}) > f(x^{(2)}) > \cdots$ and such that it converges to the minimum of $f(x)$.

Consider the $k$-th guess, $x^{(k)}$; we can generate $x^{(k+1)}$ provided that we have two pieces of information:

(1) $d^{(k)}$: the direction to go
(2) $\alpha^{(k)}$: a scalar step size

Then

$x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$

Basic descent methods


(1) Steepest descent
(2) Newton-Raphson method

Steepest Descent
Steepest descent: $d^{(k)} = -\nabla f(x^{(k)})$

$x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)})$ with $\alpha^{(k)} > 0$

To determine $\alpha^{(k)}$, consider $g(\alpha^{(k)}) = f(x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)}))$, a function of $\alpha^{(k)}$. Note

$f(x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)})) \approx f(x^{(k)}) - \alpha^{(k)} \langle \nabla f(x^{(k)}), \nabla f(x^{(k)}) \rangle + \cdots$

1.a. Optimum $\alpha^{(k)}$: choose $\alpha^{(k)}$ so that it minimizes $g(\alpha^{(k)})$, i.e.

$\dfrac{dg(\alpha^{(k)})}{d\alpha^{(k)}} = 0$

Steepest Descent
Example: For a general non-quadratic $f(x)$, $x \in R^2$, suppose at the $k$-th iterate

$x^{(k)} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \qquad \nabla f(x^{(k)}) = \frac{1}{64} \begin{bmatrix} 37 \\ 21 \end{bmatrix}$

Then

$x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)}) = \begin{bmatrix} 3 - \frac{37}{64} \alpha^{(k)} \\ 3 - \frac{21}{64} \alpha^{(k)} \end{bmatrix}$

and $g(\alpha^{(k)}) = f(x^{(k+1)})$ is a complicated expression in $\alpha^{(k)}$; solving

$\dfrac{dg(\alpha^{(k)})}{d\alpha^{(k)}} = 0$

for $\alpha^{(k)}$ is messy to calculate (in general).

Steepest Descent
Example: $f(x) = \frac{1}{2} x^T Q x - b^T x$, $Q_{n \times n}$: symmetric and p.d.

$\nabla f(x) = Qx - b$

$d^{(k)} = -(Qx^{(k)} - b)$

$x^{(k+1)} = x^{(k)} - \alpha^{(k)} (Qx^{(k)} - b)$

$g(\alpha^{(k)}) = f(x^{(k)} - \alpha^{(k)} (Qx^{(k)} - b)) = \frac{1}{2} (\alpha^{(k)})^2 (d^{(k)})^T Q d^{(k)} + \alpha^{(k)} \big( (x^{(k)})^T Q d^{(k)} - b^T d^{(k)} \big) + \frac{1}{2} (x^{(k)})^T Q x^{(k)} - b^T x^{(k)}$

(i.e. a parabola in $\alpha^{(k)}$)

Steepest Descent
$\dfrac{dg(\alpha^{(k)})}{d\alpha^{(k)}} = \alpha^{(k)} (d^{(k)})^T Q d^{(k)} + \big( (x^{(k)})^T Q - b^T \big) d^{(k)} = 0$

$\Rightarrow \alpha^{(k)} = \dfrac{(d^{(k)})^T d^{(k)}}{(d^{(k)})^T Q d^{(k)}}$

Note $\dfrac{d^2 g(\alpha^{(k)})}{d(\alpha^{(k)})^2} = (d^{(k)})^T Q d^{(k)} > 0$ ($Q$ is p.d.), so this is a minimum.

Optimum iteration:

$x^{(k+1)} = x^{(k)} + \dfrac{\langle d^{(k)}, d^{(k)} \rangle}{\langle d^{(k)}, Q d^{(k)} \rangle} \, d^{(k)}, \qquad d^{(k)} = -(Q x^{(k)} - b)$

Remark:
The optimal steepest descent step size can be determined analytically for quadratic functions, as the sketch below illustrates.
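A minimal sketch of this iteration, assuming NumPy; `Q` and `b` are made-up example data:

```python
# Steepest descent with the analytic step size alpha_k = <d,d>/<d,Qd>
# derived above, for f(x) = 0.5 x^T Q x - b^T x.
import numpy as np

def steepest_descent_quadratic(Q, b, x0, n_iter=50):
    x = x0.astype(float)
    for _ in range(n_iter):
        d = b - Q @ x                    # d^(k) = -grad f(x^(k)) = b - Qx^(k)
        denom = d @ Q @ d
        if denom == 0.0:                 # already at the minimum
            break
        alpha = (d @ d) / denom          # optimal step for a quadratic
        x = x + alpha * d
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric p.d. (example data)
b = np.array([1.0, 2.0])
print(steepest_descent_quadratic(Q, b, np.zeros(2)))
print(np.linalg.solve(Q, b))             # exact minimizer Q^{-1} b, for comparison
```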

Steepest Descent
1.b. Other possibilities for choosing $\alpha^{(k)}$:

(1) Constant step size, i.e. $\alpha^{(k)} = \alpha = \text{constant}, \; \forall k$:

$x^{(k+1)} = x^{(k)} - \alpha \nabla f(x^{(k)})$

adv: simple
disadv: no idea of which value of $\alpha$ to choose;
if $\alpha$ is too large $\Rightarrow$ diverges;
if $\alpha$ is too small $\Rightarrow$ very slow.

(2) Variable step size:
i.e. choose $\alpha^{(k)}$ from $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ such that $g(\alpha^{(k)})$ is minimized:
evaluate $g(\alpha_1), g(\alpha_2), \ldots, g(\alpha_k)$ and pick the one that gives the minimum.

Steepest Descent
1.b. Other possibilities for choosing $\alpha^{(k)}$ (cont.):

(3) Polynomial fit methods

(i) Quadratic fit: approximate

$g(\alpha) \approx a + b\alpha + c\alpha^2$

Guess three values of $\alpha$, say $\alpha_1, \alpha_2, \alpha_3$, and evaluate

$g_i = g(\alpha_i) = f(x^{(k)} - \alpha_i \nabla f(x^{(k)})) = a + b\alpha_i + c\alpha_i^2, \quad i = 1, 2, 3$

Solve for $a, b, c$. Minimize by

$\dfrac{dg(\alpha)}{d\alpha} = b + 2c\alpha = 0$

Check $\dfrac{d^2 g(\alpha)}{d\alpha^2} = 2c > 0$. Then

$\alpha^{(k)} = \dfrac{-b}{2c} = \text{fun}(\alpha_1, \alpha_2, \alpha_3, g_1, g_2, g_3)$

(A sketch follows below.)
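A sketch of the quadratic-fit step under these formulas; the trial points and test function are assumptions for the demo:

```python
# Fit g(alpha) ~ a + b*alpha + c*alpha^2 through three trial points
# and return the vertex alpha = -b/(2c).
import numpy as np

def quadratic_fit_step(alphas, g_vals):
    """alphas: three trial step sizes; g_vals: g(alpha) at those points."""
    A = np.vander(np.asarray(alphas), 3, increasing=True)  # columns: 1, alpha, alpha^2
    a, b, c = np.linalg.solve(A, np.asarray(g_vals))
    if c <= 0:                       # d^2 g/d alpha^2 = 2c must be positive
        raise ValueError("fit is not convex; pick other trial points")
    return -b / (2.0 * c)

# g(alpha) = (alpha - 1)^2 + 2 has its minimum at alpha = 1:
g = lambda a: (a - 1.0)**2 + 2.0
print(quadratic_fit_step([0.0, 0.5, 2.0], [g(0.0), g(0.5), g(2.0)]))  # 1.0
```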

Steepest Descent
1.b. Other possibilities for choosing $\alpha^{(k)}$ (cont.):

(3) Polynomial fit methods

(ii) Cubic fit: approximate

$g(\alpha) \approx a_1 \alpha^3 + a_2 \alpha^2 + a_3 \alpha + a_4$

Evaluate $g$ at four trial points $\alpha_1, \ldots, \alpha_4$ and solve

$\begin{bmatrix} \alpha_1^3 & \alpha_1^2 & \alpha_1 & 1 \\ \alpha_2^3 & \alpha_2^2 & \alpha_2 & 1 \\ \alpha_3^3 & \alpha_3^2 & \alpha_3 & 1 \\ \alpha_4^3 & \alpha_4^2 & \alpha_4 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{bmatrix}$

for $a_1, a_2, a_3, a_4$. Then

$\dfrac{dg(\alpha)}{d\alpha} = 3a_1\alpha^2 + 2a_2\alpha + a_3 = 0$

$\Rightarrow \alpha^{(k)} = \dfrac{-2a_2 + \sqrt{4a_2^2 - 12 a_1 a_3}}{6 a_1}$

Check $\dfrac{d^2 g(\alpha^{(k)})}{d\alpha^2} = 6 a_1 \alpha^{(k)} + 2 a_2 > 0$.
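A sketch of the cubic-fit step; here the coefficients are obtained with `numpy.polyfit` (an implementation choice, not from the slides) and the root formula above is applied:

```python
# Fit g(alpha) through four trial points and take the root of dg/dalpha
# with d^2g/dalpha^2 > 0.
import numpy as np

def cubic_fit_step(alphas, g_vals):
    a1, a2, a3, a4 = np.polyfit(alphas, g_vals, 3)   # highest power first
    # dg/dalpha = 3*a1*alpha^2 + 2*a2*alpha + a3 = 0
    disc = 4.0 * a2**2 - 12.0 * a1 * a3
    if disc < 0:
        raise ValueError("no real stationary point")
    alpha = (-2.0 * a2 + np.sqrt(disc)) / (6.0 * a1)
    assert 6.0 * a1 * alpha + 2.0 * a2 > 0           # second-derivative check
    return alpha

g = lambda a: a**3 - 3.0 * a                          # local minimum at alpha = 1
trial = [0.0, 0.5, 1.5, 2.0]
print(cubic_fit_step(trial, [g(a) for a in trial]))   # ~1.0
```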

Steepest Descent
1.b. Other possibilities for choosing $\alpha^{(k)}$ (cont.):

(4) Region elimination methods

Assume $g(\alpha)$ is convex over $[a, b]$, i.e. it has one minimum there. Pick two interior points $\alpha_1 < \alpha_2$ and compare $g_1 = g(\alpha_1)$ and $g_2 = g(\alpha_2)$:

(i) $g_1 > g_2$: the portion $[a, \alpha_1]$ is eliminated
(ii) $g_1 < g_2$: the portion $[\alpha_2, b]$ is eliminated
(iii) $g_1 = g_2$: both $[a, \alpha_1]$ and $[\alpha_2, b]$ are eliminated

Starting from the initial interval of uncertainty $[a, b]$, the next interval of uncertainty for (i) is $[\alpha_1, b]$; for (ii) is $[a, \alpha_2]$; for (iii) is $[\alpha_1, \alpha_2]$.

Steepest Descent
[Q]: how do we choose $\alpha_1$ and $\alpha_2$?

(i) Two-point equal interval search, i.e. $\alpha_1 - a = \alpha_2 - \alpha_1 = b - \alpha_2$ (a sketch follows below):

1st iteration: $L_0 = b - a$
2nd iteration: $L_1 = \frac{2}{3} L_0$
3rd iteration: $L_2 = \frac{2}{3} L_1 = \left(\frac{2}{3}\right)^2 L_0$
$k$-th iteration: $L_k = \left(\frac{2}{3}\right)^k L_0$
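A minimal sketch of this search; the convex test function is an assumption:

```python
# Place alpha1, alpha2 at the interior thirds of [a, b] and discard
# one third per iteration, so L_k = (2/3)^k L_0.
def equal_interval_search(g, a, b, n_iter=20):
    for _ in range(n_iter):
        third = (b - a) / 3.0
        a1, a2 = a + third, b - third
        if g(a1) > g(a2):        # minimum cannot lie in [a, alpha1]
            a = a1
        elif g(a1) < g(a2):      # minimum cannot lie in [alpha2, b]
            b = a2
        else:                    # g1 = g2: keep only [alpha1, alpha2]
            a, b = a1, a2
    return 0.5 * (a + b)

print(equal_interval_search(lambda x: (x - 1.0)**2, 0.0, 2.0))   # ~1.0
```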

Steepest Descent
[Q]: how do we choose $\alpha_1$ and $\alpha_2$?

(ii) Fibonacci search method:

$F_0 = 1, \quad F_1 = 1, \quad F_k = F_{k-1} + F_{k-2}$

For an $N$-iteration search,

$\alpha_1^{(k)} = \dfrac{F_{N-1-k}}{F_{N+1-k}} (b_k - a_k) + a_k, \quad k = 0, 1, 2, \ldots, N-1$

$\alpha_2^{(k)} = \dfrac{F_{N-k}}{F_{N+1-k}} (b_k - a_k) + a_k, \quad k = 0, 1, 2, \ldots, N-1$

Example: Let $N = 5$, initial $a = 0$, $b = 1$.

k = 0:
$\alpha_1^{(0)} = \dfrac{F_4}{F_6} (1 - 0) + 0 = \dfrac{5}{13}$
$\alpha_2^{(0)} = \dfrac{F_5}{F_6} (1 - 0) + 0 = \dfrac{8}{13}$
Compare $g_1, g_2 \Rightarrow [a_1, b_1]$, with $L_1 = \dfrac{8}{13} L_0$.

k = 1:
$\alpha_1^{(1)} = \dfrac{F_3}{F_5} (b_1 - a_1) + a_1$
$\alpha_2^{(1)} = \dfrac{F_4}{F_5} (b_1 - a_1) + a_1$
Compare $g_1, g_2 \Rightarrow [a_2, b_2]$, and so on.
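A sketch of the Fibonacci search under the index convention above; `g` is assumed unimodal on $[a, b]$:

```python
# N-iteration Fibonacci search on [a, b].
def fibonacci_search(g, a, b, N):
    F = [1, 1]
    while len(F) < N + 2:                 # need F_0 .. F_{N+1}
        F.append(F[-1] + F[-2])
    for k in range(N):
        a1 = F[N - 1 - k] / F[N + 1 - k] * (b - a) + a
        a2 = F[N - k] / F[N + 1 - k] * (b - a) + a
        if g(a1) > g(a2):
            a = a1                        # keep [alpha1, b]
        else:
            b = a2                        # keep [a, alpha2]
    return 0.5 * (a + b)

print(fibonacci_search(lambda x: (x - 0.3)**2, 0.0, 1.0, 5))   # ~0.3
```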

Steepest Descent
[Q]: how do we choose $\alpha_1$ and $\alpha_2$?

(iii) Golden section method: as $N \to \infty$,

$\lim_{N \to \infty} \dfrac{F_{N-1}}{F_{N+1}} = 0.382, \qquad \lim_{N \to \infty} \dfrac{F_N}{F_{N+1}} = 0.618$

so use

$\alpha_1^{(k)} = 0.382 (b_k - a_k) + a_k$
$\alpha_2^{(k)} = 0.618 (b_k - a_k) + a_k$

until $b_k - a_k < \varepsilon$.

Example: $[a_0, b_0] = [0, 2]$:

$\alpha_1^{(0)} = 0.382 (2 - 0) + 0 = 0.764$
$\alpha_2^{(0)} = 0.618 (2 - 0) + 0 = 1.236$

Suppose the comparison gives $[a_1, b_1] = [0, 1.236]$. Then

$\alpha_1^{(1)} = 0.382 (1.236 - 0) + 0 = 0.472$
$\alpha_2^{(1)} = 0.618 (1.236 - 0) + 0 \approx 0.764$

etc.
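A minimal sketch of the golden-section loop with the fixed 0.382/0.618 ratios:

```python
# Shrink the interval of uncertainty until it is shorter than eps;
# g is assumed unimodal on [a, b].
def golden_section(g, a, b, eps=1e-5):
    while b - a > eps:
        a1 = 0.382 * (b - a) + a
        a2 = 0.618 * (b - a) + a
        if g(a1) > g(a2):
            a = a1               # minimum is in [alpha1, b]
        else:
            b = a2               # minimum is in [a, alpha2]
    return 0.5 * (a + b)

print(golden_section(lambda x: (x - 0.5)**2, 0.0, 2.0))   # ~0.5
```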

Steepest Descent
Flow chart of steepest descent:

1. Initial guess $x^{(0)}$, set $k = 0$.
2. Compute $\nabla f(x^{(k)})$.
3. If $\|\nabla f(x^{(k)})\| < \varepsilon$: stop, $x^{(k)}$ is the minimum.
4. Otherwise, determine $\alpha^{(k)}$ (variable step size over $\{\alpha_1, \ldots, \alpha_n\}$; polynomial fit: quadratic, cubic; region elimination; ...).
5. Set $x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)})$, $k = k + 1$, and go to step 2.

(A sketch implementing this loop follows below.)
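A minimal sketch of this flow chart, reusing the `golden_section` helper from the earlier sketch for the line search (that pairing is a choice, not prescribed by the slides):

```python
# Steepest descent with a gradient-norm stopping test and a
# one-dimensional line search for alpha^(k).
import numpy as np

def steepest_descent(f, grad_f, x0, eps=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < eps:          # stop: x is (near) a minimum
            break
        # determine alpha^(k) by minimizing g(alpha) = f(x - alpha*grad)
        alpha = golden_section(lambda a: f(x - a * g), 0.0, 1.0)
        x = x - alpha * g                    # x^(k+1) = x^(k) - alpha^(k) grad f
    return x

print(steepest_descent(lambda x: x[0]**2 + 4 * x[1]**2,
                       lambda x: np.array([2 * x[0], 8 * x[1]]),
                       [1.0, 1.0]))          # ~[0, 0]
```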

Steepest Descent
[Q]: is the direction of $-\nabla f(x)$ the best direction to go?

$f(x) = \frac{1}{2} x^T Q x - b^T x$

$\nabla f(x) = Qx - b$

Suppose the initial guess is $x^{(0)}$. Consider the next guess

$x^{(1)} = x^{(0)} - M \nabla f(x^{(0)})$, $\quad M$: $n \times n$ matrix
$\;\;\;\;\; = x^{(0)} - M (Q x^{(0)} - b)$

What should $M$ be such that $x^{(1)}$ is the minimum, i.e. $\nabla f(x^{(1)}) = 0$?

Since we want

$\nabla f(x^{(1)}) = Q x^{(1)} - b = Q \big( x^{(0)} - M (Q x^{(0)} - b) \big) - b = Q x^{(0)} - QMQ x^{(0)} + QMb - b = 0$

this holds if $MQ = I$, or $M = Q^{-1}$.

Thus for a quadratic function, $x^{(k+1)} = x^{(k)} - Q^{-1} \nabla f(x^{(k)})$ will take us to the minimum in one iteration no matter what $x^{(0)}$ is.
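A one-line check of this fact in NumPy ($Q$, $b$, and $x^{(0)}$ are made-up example data):

```python
# With M = Q^{-1}, a single step lands on the minimizer Q^{-1} b.
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])       # symmetric p.d.
b = np.array([1.0, 2.0])
x0 = np.array([10.0, -7.0])                  # arbitrary initial guess

x1 = x0 - np.linalg.solve(Q, Q @ x0 - b)     # x^(1) = x^(0) - Q^{-1} grad f(x^(0))
print(x1, np.linalg.solve(Q, b))             # both equal the minimizer Q^{-1} b
```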

Newton-Raphson Method
Minimize $f(x)$. The necessary condition is $\nabla f(x) = 0$. The N-R algorithm finds the roots of $\nabla f(x) = 0$.

Guess $x^{(k)}$; then $x^{(k+1)}$ must satisfy (picture the tangent line to $\nabla f(x)$ at $x^{(k)}$ crossing the axis at $x^{(k+1)}$):

$\dfrac{\nabla f(x^{(k)})}{x^{(k)} - x^{(k+1)}} = \dfrac{d \nabla f(x)}{dx}\Big|_{x^{(k)}}$

$\Rightarrow x^{(k+1)} = x^{(k)} - \left( \dfrac{d \nabla f(x)}{dx}\Big|_{x^{(k)}} \right)^{-1} \nabla f(x^{(k)})$

Note: it does not always converge.

Newton-Raphson Method
A more formal derivation: minimize $f(x^{(k)} + h)$ w.r.t. $h$:

$f(x^{(k)} + h) \approx f(x^{(k)}) + \langle \nabla f(x^{(k)}), h \rangle + \frac{1}{2} \langle h, F(x^{(k)}) h \rangle$

$\nabla_h f(x^{(k)} + h) = \nabla f(x^{(k)}) + F(x^{(k)}) h = 0$

$\Rightarrow h = -[F(x^{(k)})]^{-1} \nabla f(x^{(k)})$

$x^{(k+1)} = x^{(k)} + h = x^{(k)} - [F(x^{(k)})]^{-1} \nabla f(x^{(k)})$

(Figure: successive iterates $x^{(k)}, x^{(k+1)}, x^{(k+2)}, x^{(k+3)}, \ldots$ approaching the minimum of $f(x)$. A sketch of the iteration follows below.)
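A minimal sketch of the iteration, solving $F(x^{(k)}) h = \nabla f(x^{(k)})$ rather than forming the inverse (the test function is an assumed example):

```python
# Multivariate Newton-Raphson: x^(k+1) = x^(k) - [F(x^(k))]^{-1} grad f(x^(k)).
import numpy as np

def newton_raphson(grad_f, hess_f, x0, eps=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < eps:
            break
        x = x - np.linalg.solve(hess_f(x), g)   # solve F h = grad, no explicit inverse
    return x

# f(x) = x1^4 + x2^2 (assumed example); minimum at the origin
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])
print(newton_raphson(grad, hess, [1.0, 1.0]))   # ~[0, 0]
```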

Newton-Raphson Method
Remarks:

1. Computation of $[F(x^{(k)})]^{-1}$ at every iteration is time consuming $\Rightarrow$ modify the N-R algorithm to calculate $[F(x^{(k)})]^{-1}$ only every $M$-th iteration.
2. Must check that $F(x^{(k)})$ is p.d. at every iteration. If not, replace $F(x^{(k)})$ by $F(x^{(k)}) + \mathrm{diag}(\varepsilon_1, \ldots, \varepsilon_n)$ with $\varepsilon_i > 0$ chosen large enough to make it p.d.

Example:

$f(x_1, x_2) = \dfrac{-1}{x_1^2 + x_2^2 + 1}$

$\nabla f = \begin{bmatrix} \dfrac{2 x_1}{(x_1^2 + x_2^2 + 1)^2} \\[2mm] \dfrac{2 x_2}{(x_1^2 + x_2^2 + 1)^2} \end{bmatrix}$

Newton-Raphson Method
$F(x) = \dfrac{2}{(x_1^2 + x_2^2 + 1)^3} \begin{bmatrix} -3x_1^2 + x_2^2 + 1 & -4 x_1 x_2 \\ -4 x_1 x_2 & x_1^2 - 3x_2^2 + 1 \end{bmatrix}$

The minimum of $f(x)$ is at $(0, 0)$. In the nbd of $(0, 0)$,

$F((0,0)) = 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is p.d.

Now suppose we start with the initial guess $x^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then

$\nabla f(x^{(0)}) = \begin{bmatrix} 1/2 \\ 0 \end{bmatrix}, \qquad F(x^{(0)}) = \dfrac{1}{2} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$ (not p.d.!)

Then

$x^{(1)} = x^{(0)} - (F(x^{(0)}))^{-1} \nabla f(x^{(0)}) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1/2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$

which is farther from $(0, 0)$: the iteration diverges.
Remark:

3. The N-R algorithm is good (fast) when the initial guess is close to the minimum, but not very good when it is far from it (a sketch reproducing the divergence above follows below).
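A short script reproducing the divergence in the example above (same $f$, gradient, Hessian, and starting point):

```python
# Plain N-R on f(x) = -1/(x1^2 + x2^2 + 1), starting from x^(0) = [1, 0]^T.
import numpy as np

def grad(x):
    D = x[0]**2 + x[1]**2 + 1.0
    return 2.0 * x / D**2

def hess(x):
    D = x[0]**2 + x[1]**2 + 1.0
    return (2.0 / D**3) * np.array(
        [[-3 * x[0]**2 + x[1]**2 + 1.0, -4 * x[0] * x[1]],
         [-4 * x[0] * x[1], x[0]**2 - 3 * x[1]**2 + 1.0]])

x = np.array([1.0, 0.0])
for _ in range(5):
    x = x - np.linalg.solve(hess(x), grad(x))
    print(x)     # first component grows: the iteration runs away from (0, 0)
```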
