
Optimization

Unconstrained Minimization
Def: $f(x)$, $x \in R^n$, is said to be differentiable at a point $x^*$ if it is defined in a neighborhood $N$ around $x^*$ and if for $x^* + h \in N$ there exists a vector $a$, independent of $h$, such that

$f(x^* + h) = f(x^*) + \langle a, h \rangle + \langle h, \alpha(x^*, h) \rangle$

where the vector $a$ is called the gradient of $f(x)$ evaluated at $x^*$, denoted

$a = \nabla f(x^*) = \left[ \dfrac{\partial f}{\partial x_1} \;\; \dfrac{\partial f}{\partial x_2} \;\; \cdots \;\; \dfrac{\partial f}{\partial x_n} \right]^T \Big|_{x^*}$

The term $\langle a, h \rangle$ is called the 1st variation. Here

$\alpha(x^*, h) = [\alpha_1(x^*, h) \;\; \cdots \;\; \alpha_n(x^*, h)]^T$

and

$\lim_{h \to 0} \alpha_i(x^*, h) = 0, \quad i = 1, 2, \ldots, n$

Unconstrained Minimization
Note: if $f(x)$ is twice differentiable, then

$\alpha(x^*, h) = \frac{1}{2} F(x^*)\, h + \text{H.O.T.}$

where $F(x)$ is an $n \times n$ symmetric matrix, called the Hessian of $f(x)$:

$F(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[2mm] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \end{bmatrix}$

Then

$f(x^* + h) = f(x^*) + \underbrace{\langle \nabla f(x^*), h \rangle}_{\text{1st variation}} + \underbrace{\tfrac{1}{2} \langle F(x^*) h, h \rangle}_{\text{2nd variation}} + \text{H.O.T.}$
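As a quick sanity check (my own example, not from the slides), the following Python sketch verifies the expansion for a quadratic $f$, where the H.O.T. vanish identically:

```python
# Compare f(x*+h) with the first- and second-variation approximation
# f(x*) + <grad f, h> + 0.5*<F h, h> for the quadratic f(x) = x1^2 + 3*x1*x2.
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1]

def grad_f(x):
    return np.array([2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]])

def hess_f(x):
    return np.array([[2.0, 3.0],
                     [3.0, 0.0]])

x_star = np.array([1.0, 2.0])
h = np.array([1e-3, -2e-3])

second_order = f(x_star) + grad_f(x_star) @ h + 0.5 * h @ hess_f(x_star) @ h
print(f(x_star + h) - second_order)   # ~0: H.O.T. vanish for a quadratic f
```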

Directional derivatives
Let $w$ be a direction vector of unit norm, $\|w\| = 1$. Now consider

$g(r) = f(x^* + r w)$

which is a function of the scalar $r$ (geometrically, the point $x^*$ is displaced by $rw$ to $x^* + rw$).

Def: The directional derivative of $f(x)$ in the direction $w$ (unit norm) at $x^*$ is defined as

$D_w f(x^*) = \dfrac{dg(r)}{dr}\Big|_{r=0} = \lim_{r \to 0} \dfrac{g(r) - g(0)}{r} = \lim_{r \to 0} \dfrac{f(x^* + rw) - f(x^*)}{r}$

$= \lim_{r \to 0} \dfrac{\langle \nabla f(x^*), rw \rangle + \langle \alpha(x^*, rw), rw \rangle}{r}$

$= \lim_{r \to 0} \big( \langle \nabla f(x^*), w \rangle + \langle \alpha(x^*, rw), w \rangle \big)$

$= \langle \nabla f(x^*), w \rangle$, since $\lim_{r \to 0} \langle \alpha(x^*, rw), w \rangle = 0$.

Directional derivatives
Example: Let

$w = e_i = [0 \;\cdots\; 1 \;\cdots\; 0]^T$ (1 in the $i$-th entry)

Then

$D_{e_i} f(x^*) = \langle \nabla f(x^*), e_i \rangle = \dfrac{\partial f}{\partial x_i}$

i.e. the partial derivative of $f$ at $x^*$ w.r.t. $x_i$ is the directional derivative of $f(x)$ in the direction of $e_i$.

Interpretation of $D_w f(x^*)$:
Consider the projection of $\nabla f(x^*)$ on $w$,

$\mathrm{proj}\, \nabla f(x^*) = \langle \nabla f(x^*), w \rangle\, w$

Then, since $\langle w, w \rangle = \|w\|^2 = 1$,

$\|\mathrm{proj}\, \nabla f(x^*)\| = |\langle \nabla f(x^*), w \rangle| = |D_w f(x^*)|$

The directional derivative along a direction $w$ ($\|w\| = 1$) is the length of the projection vector of $\nabla f(x^*)$ on $w$.

Unconstrained Minimization

[Q]: What direction $w$ yields the largest directional derivative?

Ans:

$w = \dfrac{\nabla f(x^*)}{\|\nabla f(x^*)\|}$

Recall that the 1st variation of $f(x^* + rw)$ is

$r \langle \nabla f(x^*), w \rangle = r\, D_w f(x^*)$

Conclusion 1: The direction of the gradient is the direction that yields the largest change (1st variation) in the function. This suggests $x_{k+1} = x_k - \alpha \nabla f(x_k)$, as in the steepest descent method, which will be described later (see the sketch below).
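A small numerical illustration of Conclusion 1 (the gradient value here is a made-up example): sampling many unit directions shows $D_w f(x^*)$ peaks along $\nabla f(x^*)$, with peak value $\|\nabla f(x^*)\|$:

```python
# Sample random unit directions w and confirm D_w f(x*) = <grad f(x*), w>
# is largest when w points along the gradient.
import numpy as np

grad = np.array([2.0, 4.0])            # suppose grad f(x*) = [2, 4]^T
rng = np.random.default_rng(0)
ws = rng.normal(size=(10000, 2))
ws /= np.linalg.norm(ws, axis=1, keepdims=True)   # unit-norm directions

dir_derivs = ws @ grad                  # D_w f(x*) for each sample
best_w = ws[np.argmax(dir_derivs)]
print(best_w, grad / np.linalg.norm(grad))        # nearly identical
print(dir_derivs.max(), np.linalg.norm(grad))     # max D_w f ~ ||grad f||
```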

Directional derivatives
Example: $f(x) = x_1^2 + x_2^2$, $x \in R^2$. The level curves $f(x_1, x_2) = c$ are circles.

Sol:

$\nabla f(x) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}$

Let $\bar{x} = [1 \;\; 1]^T$. Then

$\nabla f(\bar{x}) = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \qquad \|\nabla f(\bar{x})\| = \sqrt{8}$

and $w$ with unit norm $= \dfrac{\nabla f(\bar{x})}{\|\nabla f(\bar{x})\|} = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$

(Figure: level circles $f(x_1, x_2) = c$ with $\nabla f(\bar{x})$ normal to them.)

Directional derivatives
The directional derivative in the direction of the gradient is

$D_w f(\bar{x}) = \langle \nabla f(\bar{x}), w \rangle = \Big\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} \Big\rangle = \dfrac{4}{\sqrt{2}} = 2\sqrt{2}$

Notes:

$D_{e_1} f(\bar{x}) = \langle \nabla f(\bar{x}), e_1 \rangle = \Big\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \Big\rangle = 2 \le 2\sqrt{2}$

$D_{e_2} f(\bar{x}) = \langle \nabla f(\bar{x}), e_2 \rangle = \Big\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \Big\rangle = 2 \le 2\sqrt{2}$
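A minimal check of these numbers in Python (same $f$ and $\bar{x}$ as above):

```python
# Numerical check of the example: f(x) = x1^2 + x2^2 at x_bar = [1, 1]^T.
import numpy as np

x_bar = np.array([1.0, 1.0])
grad = 2.0 * x_bar                   # grad f = [2x1, 2x2]^T = [2, 2]^T
w = grad / np.linalg.norm(grad)      # unit vector along the gradient

print(grad @ w)                      # 2*sqrt(2) ~ 2.828: D_w f(x_bar)
print(grad @ np.array([1.0, 0.0]))   # D_e1 f = 2
print(grad @ np.array([0.0, 1.0]))   # D_e2 f = 2
```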

Directional derivatives
Def: $f(x)$ is said to have a local (or relative) minimum at $x^*$ if, in a nbd $N$ of $x^*$,

$f(x^*) \le f(x), \quad \forall x \in N$

Theorem: Let $f(x)$ be differentiable. If $f(x)$ has a local minimum at $x^*$, then $\nabla f(x^*) = 0$ (or $D_w f(x^*) = 0, \; \forall w$).

pf:

$f(x^* + h) - f(x^*) = \langle \nabla f(x^*), h \rangle + \langle \alpha(x^*, h), h \rangle \ge 0$

As $h \to 0$ this requires $\langle \nabla f(x^*), h \rangle \ge 0$ for every $h$. In fact $\langle \nabla f(x^*), h \rangle = 0$, since if for some $h$ we had $\langle \nabla f(x^*), h \rangle > 0$, then $h_1 = -h$ would give $\langle \nabla f(x^*), h_1 \rangle < 0$. Therefore $\nabla f(x^*) = 0$.

Note: $\nabla f(x^*) = 0$ is a necessary condition, not a sufficient condition.

Directional derivatives
Theorem: If $f(x)$ is twice differentiable and $x^*$ satisfies

(1) $\nabla f(x^*) = 0$
(2) $F(x^*)$ is a positive definite matrix (i.e. $v^T F(x^*) v > 0, \; \forall v \ne 0$)

then $x^*$ is a local minimum of $f(x)$.

pf:

$f(x^* + h) - f(x^*) = \langle \nabla f(x^*), h \rangle + \frac{1}{2} \langle F(x^*) h, h \rangle + \text{H.O.T.} = \frac{1}{2} h^T F(x^*) h + \text{H.O.T.} > 0$

for sufficiently small $h$, hence $f(x^*) \le f(x), \; \forall x \in N(x^*)$.

Conclusion 2: The necessary & sufficient conditions for a local minimum of $f(x)$ at $x^*$ are

(1) $\nabla f(x^*) = 0$
(2) $F(x^*)$ is p.d. (a quick test is sketched below)
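A minimal sketch of the test, assuming NumPy and using a Cholesky factorization (which succeeds exactly for p.d. matrices) for condition (2); the helper name is hypothetical:

```python
# Check the two conditions: grad f(x*) ~ 0 and F(x*) positive definite.
import numpy as np

def is_local_min(grad, hessian, tol=1e-8):
    if np.linalg.norm(grad) > tol:       # condition (1): gradient vanishes
        return False
    try:
        np.linalg.cholesky(hessian)      # succeeds iff F(x*) is p.d.
        return True                      # condition (2) holds
    except np.linalg.LinAlgError:
        return False

# f(x) = x1^2 + x2^2 at x* = 0: grad = 0, F = 2I is p.d.
print(is_local_min(np.zeros(2), 2.0 * np.eye(2)))   # True
```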

Minimization of Unconstrained Functions


Prob.: Let $y = f(x)$, $x \in R^n$. We want to generate a sequence $x^{(0)}, x^{(1)}, x^{(2)}, \ldots$ such that $f(x^{(0)}) > f(x^{(1)}) > f(x^{(2)}) > \cdots$ and such that it converges to the minimum of $f(x)$.

Consider the $k$-th guess, $x^{(k)}$; we can generate $x^{(k+1)}$ provided that we have two pieces of information:

(1) $d^{(k)}$: the direction to go
(2) $\alpha^{(k)}$: a scalar step size

Then

$x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}$

Basic descent methods


(1) Steepest descent
(2) Newton-Raphson method

Steepest Descent
Steepest descent: $d^{(k)} = -\nabla f(x^{(k)})$

$x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)})$ with $\alpha^{(k)} > 0$

To determine $\alpha^{(k)}$, consider $g(\alpha^{(k)}) = f(x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)}))$, a function of $\alpha^{(k)}$. Note

$f(x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)})) \approx f(x^{(k)}) - \alpha^{(k)} \langle \nabla f(x^{(k)}), \nabla f(x^{(k)}) \rangle + \cdots$

1.a. Optimum $\alpha^{(k)}$: choose $\alpha^{(k)}$ so that it minimizes $g(\alpha^{(k)})$, i.e.

$\dfrac{dg(\alpha^{(k)})}{d\alpha^{(k)}} = 0$

Steepest Descent
Example: For a general non-quadratic $f(x)$, $x \in R^2$, suppose at the $k$-th iterate

$x^{(k)} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}, \qquad \nabla f(x^{(k)}) = \frac{1}{64} \begin{bmatrix} 37 \\ 21 \end{bmatrix}$

Then

$x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)}) = \begin{bmatrix} 3 - \frac{37}{64} \alpha^{(k)} \\ 3 - \frac{21}{64} \alpha^{(k)} \end{bmatrix}$

and $g(\alpha^{(k)}) = f(x^{(k+1)})$ is a complicated expression in $\alpha^{(k)}$; solving

$\dfrac{dg(\alpha^{(k)})}{d\alpha^{(k)}} = 0$

for $\alpha^{(k)}$ is messy to calculate (in general).

Steepest Descent
Example: $f(x) = \frac{1}{2} x^T Q x - b^T x$, $Q_{n \times n}$: symmetric and p.d.

$\nabla f(x) = Qx - b$

$d^{(k)} = -(Qx^{(k)} - b)$

$x^{(k+1)} = x^{(k)} - \alpha^{(k)} (Qx^{(k)} - b)$

$g(\alpha^{(k)}) = f(x^{(k)} - \alpha^{(k)} (Qx^{(k)} - b)) = \frac{1}{2} (\alpha^{(k)})^2 (d^{(k)})^T Q d^{(k)} + \alpha^{(k)} \big( (x^{(k)})^T Q d^{(k)} - b^T d^{(k)} \big) + \frac{1}{2} (x^{(k)})^T Q x^{(k)} - b^T x^{(k)}$

(i.e. a parabola in $\alpha^{(k)}$)

Steepest Descent
$\dfrac{dg(\alpha^{(k)})}{d\alpha^{(k)}} = \alpha^{(k)} (d^{(k)})^T Q d^{(k)} + \big( (x^{(k)})^T Q - b^T \big) d^{(k)} = 0$

$\Rightarrow \alpha^{(k)} = \dfrac{(d^{(k)})^T d^{(k)}}{(d^{(k)})^T Q d^{(k)}}$

Note $\dfrac{d^2 g(\alpha^{(k)})}{d(\alpha^{(k)})^2} = (d^{(k)})^T Q d^{(k)} > 0$ ($Q$ is p.d.), so this is a minimum.

Optimum iteration:

$x^{(k+1)} = x^{(k)} + \dfrac{\langle d^{(k)}, d^{(k)} \rangle}{\langle d^{(k)}, Q d^{(k)} \rangle} \, d^{(k)}, \qquad d^{(k)} = -(Q x^{(k)} - b)$

Remark:
The optimal steepest descent step size can be determined analytically for quadratic functions, as the sketch below illustrates.
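A minimal sketch of this iteration, assuming NumPy; `Q` and `b` are made-up example data:

```python
# Steepest descent with the analytic step size alpha_k = <d,d>/<d,Qd>
# derived above, for f(x) = 0.5 x^T Q x - b^T x.
import numpy as np

def steepest_descent_quadratic(Q, b, x0, n_iter=50):
    x = x0.astype(float)
    for _ in range(n_iter):
        d = b - Q @ x                    # d^(k) = -grad f(x^(k)) = b - Qx^(k)
        denom = d @ Q @ d
        if denom == 0.0:                 # already at the minimum
            break
        alpha = (d @ d) / denom          # optimal step for a quadratic
        x = x + alpha * d
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric p.d. (example data)
b = np.array([1.0, 2.0])
print(steepest_descent_quadratic(Q, b, np.zeros(2)))
print(np.linalg.solve(Q, b))             # exact minimizer Q^{-1} b, for comparison
```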

Steepest Descent
1.b. Other possibilities for choosing $\alpha^{(k)}$:

(1) Constant step size, i.e. $\alpha^{(k)} = \alpha = \text{constant}, \; \forall k$:

$x^{(k+1)} = x^{(k)} - \alpha \nabla f(x^{(k)})$

adv: simple
disadv: no idea of which value of $\alpha$ to choose;
if $\alpha$ is too large $\Rightarrow$ diverges;
if $\alpha$ is too small $\Rightarrow$ very slow.

(2) Variable step size:
i.e. choose $\alpha^{(k)}$ from $\{\alpha_1, \alpha_2, \ldots, \alpha_k\}$ such that $g(\alpha^{(k)})$ is minimized:
evaluate $g(\alpha_1), g(\alpha_2), \ldots, g(\alpha_k)$ and pick the one that gives the minimum.

Steepest Descent
1.b. Other possibilities for choosing $\alpha^{(k)}$ (cont.):

(3) Polynomial fit methods

(i) Quadratic fit: approximate

$g(\alpha) \approx a + b\alpha + c\alpha^2$

Guess three values of $\alpha$, say $\alpha_1, \alpha_2, \alpha_3$, and evaluate

$g_i = g(\alpha_i) = f(x^{(k)} - \alpha_i \nabla f(x^{(k)})) = a + b\alpha_i + c\alpha_i^2, \quad i = 1, 2, 3$

Solve for $a, b, c$. Minimize by

$\dfrac{dg(\alpha)}{d\alpha} = b + 2c\alpha = 0$

Check $\dfrac{d^2 g(\alpha)}{d\alpha^2} = 2c > 0$. Then

$\alpha^{(k)} = \dfrac{-b}{2c} = \text{fun}(\alpha_1, \alpha_2, \alpha_3, g_1, g_2, g_3)$

(A sketch follows below.)
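A sketch of the quadratic-fit step under these formulas; the trial points and test function are assumptions for the demo:

```python
# Fit g(alpha) ~ a + b*alpha + c*alpha^2 through three trial points
# and return the vertex alpha = -b/(2c).
import numpy as np

def quadratic_fit_step(alphas, g_vals):
    """alphas: three trial step sizes; g_vals: g(alpha) at those points."""
    A = np.vander(np.asarray(alphas), 3, increasing=True)  # columns: 1, alpha, alpha^2
    a, b, c = np.linalg.solve(A, np.asarray(g_vals))
    if c <= 0:                       # d^2 g/d alpha^2 = 2c must be positive
        raise ValueError("fit is not convex; pick other trial points")
    return -b / (2.0 * c)

# g(alpha) = (alpha - 1)^2 + 2 has its minimum at alpha = 1:
g = lambda a: (a - 1.0)**2 + 2.0
print(quadratic_fit_step([0.0, 0.5, 2.0], [g(0.0), g(0.5), g(2.0)]))  # 1.0
```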

Steepest Descent
1.b. Other possibilities for choosing $\alpha^{(k)}$ (cont.):

(3) Polynomial fit methods

(ii) Cubic fit: approximate

$g(\alpha) \approx a_1 \alpha^3 + a_2 \alpha^2 + a_3 \alpha + a_4$

Evaluate $g$ at four trial points $\alpha_1, \ldots, \alpha_4$ and solve

$\begin{bmatrix} \alpha_1^3 & \alpha_1^2 & \alpha_1 & 1 \\ \alpha_2^3 & \alpha_2^2 & \alpha_2 & 1 \\ \alpha_3^3 & \alpha_3^2 & \alpha_3 & 1 \\ \alpha_4^3 & \alpha_4^2 & \alpha_4 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{bmatrix}$

for $a_1, a_2, a_3, a_4$. Then

$\dfrac{dg(\alpha)}{d\alpha} = 3a_1\alpha^2 + 2a_2\alpha + a_3 = 0$

$\Rightarrow \alpha^{(k)} = \dfrac{-2a_2 + \sqrt{4a_2^2 - 12 a_1 a_3}}{6 a_1}$

Check $\dfrac{d^2 g(\alpha^{(k)})}{d\alpha^2} = 6 a_1 \alpha^{(k)} + 2 a_2 > 0$.
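A sketch of the cubic-fit step; here the coefficients are obtained with `numpy.polyfit` (an implementation choice, not from the slides) and the root formula above is applied:

```python
# Fit g(alpha) through four trial points and take the root of dg/dalpha
# with d^2g/dalpha^2 > 0.
import numpy as np

def cubic_fit_step(alphas, g_vals):
    a1, a2, a3, a4 = np.polyfit(alphas, g_vals, 3)   # highest power first
    # dg/dalpha = 3*a1*alpha^2 + 2*a2*alpha + a3 = 0
    disc = 4.0 * a2**2 - 12.0 * a1 * a3
    if disc < 0:
        raise ValueError("no real stationary point")
    alpha = (-2.0 * a2 + np.sqrt(disc)) / (6.0 * a1)
    assert 6.0 * a1 * alpha + 2.0 * a2 > 0           # second-derivative check
    return alpha

g = lambda a: a**3 - 3.0 * a                          # local minimum at alpha = 1
trial = [0.0, 0.5, 1.5, 2.0]
print(cubic_fit_step(trial, [g(a) for a in trial]))   # ~1.0
```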

Steepest Descent
1.b. Other possibilities for choosing $\alpha^{(k)}$ (cont.):

(4) Region elimination methods

Assume $g(\alpha)$ is convex over $[a, b]$, i.e. it has one minimum there. Pick two interior points $\alpha_1 < \alpha_2$ and compare $g_1 = g(\alpha_1)$ and $g_2 = g(\alpha_2)$:

(i) $g_1 > g_2$: the portion $[a, \alpha_1]$ is eliminated
(ii) $g_1 < g_2$: the portion $[\alpha_2, b]$ is eliminated
(iii) $g_1 = g_2$: both $[a, \alpha_1]$ and $[\alpha_2, b]$ are eliminated

Starting from the initial interval of uncertainty $[a, b]$, the next interval of uncertainty for (i) is $[\alpha_1, b]$; for (ii) is $[a, \alpha_2]$; for (iii) is $[\alpha_1, \alpha_2]$.

Steepest Descent
[Q]: how do we choose $\alpha_1$ and $\alpha_2$?

(i) Two-point equal interval search, i.e. $\alpha_1 - a = \alpha_2 - \alpha_1 = b - \alpha_2$ (a sketch follows below):

1st iteration: $L_0 = b - a$
2nd iteration: $L_1 = \frac{2}{3} L_0$
3rd iteration: $L_2 = \frac{2}{3} L_1 = \left(\frac{2}{3}\right)^2 L_0$
$k$-th iteration: $L_k = \left(\frac{2}{3}\right)^k L_0$
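A minimal sketch of this search; the convex test function is an assumption:

```python
# Place alpha1, alpha2 at the interior thirds of [a, b] and discard
# one third per iteration, so L_k = (2/3)^k L_0.
def equal_interval_search(g, a, b, n_iter=20):
    for _ in range(n_iter):
        third = (b - a) / 3.0
        a1, a2 = a + third, b - third
        if g(a1) > g(a2):        # minimum cannot lie in [a, alpha1]
            a = a1
        elif g(a1) < g(a2):      # minimum cannot lie in [alpha2, b]
            b = a2
        else:                    # g1 = g2: keep only [alpha1, alpha2]
            a, b = a1, a2
    return 0.5 * (a + b)

print(equal_interval_search(lambda x: (x - 1.0)**2, 0.0, 2.0))   # ~1.0
```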

Steepest Descent
[Q]: how do we choose $\alpha_1$ and $\alpha_2$?

(ii) Fibonacci search method:

$F_0 = 1, \quad F_1 = 1, \quad F_k = F_{k-1} + F_{k-2}$

For an $N$-iteration search,

$\alpha_1^{(k)} = \dfrac{F_{N-1-k}}{F_{N+1-k}} (b_k - a_k) + a_k, \quad k = 0, 1, 2, \ldots, N-1$

$\alpha_2^{(k)} = \dfrac{F_{N-k}}{F_{N+1-k}} (b_k - a_k) + a_k, \quad k = 0, 1, 2, \ldots, N-1$

Example: Let $N = 5$, initial $a = 0$, $b = 1$.

k = 0:
$\alpha_1^{(0)} = \dfrac{F_4}{F_6} (1 - 0) + 0 = \dfrac{5}{13}$
$\alpha_2^{(0)} = \dfrac{F_5}{F_6} (1 - 0) + 0 = \dfrac{8}{13}$
Compare $g_1, g_2 \Rightarrow [a_1, b_1]$, with $L_1 = \dfrac{8}{13} L_0$.

k = 1:
$\alpha_1^{(1)} = \dfrac{F_3}{F_5} (b_1 - a_1) + a_1$
$\alpha_2^{(1)} = \dfrac{F_4}{F_5} (b_1 - a_1) + a_1$
Compare $g_1, g_2 \Rightarrow [a_2, b_2]$, and so on.
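A sketch of the Fibonacci search under the index convention above; `g` is assumed unimodal on $[a, b]$:

```python
# N-iteration Fibonacci search on [a, b].
def fibonacci_search(g, a, b, N):
    F = [1, 1]
    while len(F) < N + 2:                 # need F_0 .. F_{N+1}
        F.append(F[-1] + F[-2])
    for k in range(N):
        a1 = F[N - 1 - k] / F[N + 1 - k] * (b - a) + a
        a2 = F[N - k] / F[N + 1 - k] * (b - a) + a
        if g(a1) > g(a2):
            a = a1                        # keep [alpha1, b]
        else:
            b = a2                        # keep [a, alpha2]
    return 0.5 * (a + b)

print(fibonacci_search(lambda x: (x - 0.3)**2, 0.0, 1.0, 5))   # ~0.3
```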

Steepest Descent
[Q]: how do we choose $\alpha_1$ and $\alpha_2$?

(iii) Golden section method: as $N \to \infty$,

$\lim_{N \to \infty} \dfrac{F_{N-1}}{F_{N+1}} = 0.382, \qquad \lim_{N \to \infty} \dfrac{F_N}{F_{N+1}} = 0.618$

so use

$\alpha_1^{(k)} = 0.382 (b_k - a_k) + a_k$
$\alpha_2^{(k)} = 0.618 (b_k - a_k) + a_k$

until $b_k - a_k < \varepsilon$.

Example: $[a_0, b_0] = [0, 2]$:

$\alpha_1^{(0)} = 0.382 (2 - 0) + 0 = 0.764$
$\alpha_2^{(0)} = 0.618 (2 - 0) + 0 = 1.236$

Suppose the comparison gives $[a_1, b_1] = [0, 1.236]$. Then

$\alpha_1^{(1)} = 0.382 (1.236 - 0) + 0 = 0.472$
$\alpha_2^{(1)} = 0.618 (1.236 - 0) + 0 \approx 0.764$

etc.
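A minimal sketch of the golden-section loop with the fixed 0.382/0.618 ratios:

```python
# Shrink the interval of uncertainty until it is shorter than eps;
# g is assumed unimodal on [a, b].
def golden_section(g, a, b, eps=1e-5):
    while b - a > eps:
        a1 = 0.382 * (b - a) + a
        a2 = 0.618 * (b - a) + a
        if g(a1) > g(a2):
            a = a1               # minimum is in [alpha1, b]
        else:
            b = a2               # minimum is in [a, alpha2]
    return 0.5 * (a + b)

print(golden_section(lambda x: (x - 0.5)**2, 0.0, 2.0))   # ~0.5
```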

Steepest Descent
Flow chart of steepest descent:

1. Initial guess $x^{(0)}$, set $k = 0$.
2. Compute $\nabla f(x^{(k)})$.
3. If $\|\nabla f(x^{(k)})\| < \varepsilon$: stop, $x^{(k)}$ is the minimum.
4. Otherwise, determine $\alpha^{(k)}$ (variable step size over $\{\alpha_1, \ldots, \alpha_n\}$; polynomial fit: quadratic, cubic; region elimination; ...).
5. Set $x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)})$, $k = k + 1$, and go to step 2.

(A sketch implementing this loop follows below.)
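A minimal sketch of this flow chart, reusing the `golden_section` helper from the earlier sketch for the line search (that pairing is a choice, not prescribed by the slides):

```python
# Steepest descent with a gradient-norm stopping test and a
# one-dimensional line search for alpha^(k).
import numpy as np

def steepest_descent(f, grad_f, x0, eps=1e-6, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < eps:          # stop: x is (near) a minimum
            break
        # determine alpha^(k) by minimizing g(alpha) = f(x - alpha*grad)
        alpha = golden_section(lambda a: f(x - a * g), 0.0, 1.0)
        x = x - alpha * g                    # x^(k+1) = x^(k) - alpha^(k) grad f
    return x

print(steepest_descent(lambda x: x[0]**2 + 4 * x[1]**2,
                       lambda x: np.array([2 * x[0], 8 * x[1]]),
                       [1.0, 1.0]))          # ~[0, 0]
```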

Steepest Descent
[Q]: is the direction of $-\nabla f(x)$ the best direction to go?

$f(x) = \frac{1}{2} x^T Q x - b^T x$

$\nabla f(x) = Qx - b$

Suppose the initial guess is $x^{(0)}$. Consider the next guess

$x^{(1)} = x^{(0)} - M \nabla f(x^{(0)})$, $\quad M$: $n \times n$ matrix
$\;\;\;\;\; = x^{(0)} - M (Q x^{(0)} - b)$

What should $M$ be such that $x^{(1)}$ is the minimum, i.e. $\nabla f(x^{(1)}) = 0$?

Since we want

$\nabla f(x^{(1)}) = Q x^{(1)} - b = Q \big( x^{(0)} - M (Q x^{(0)} - b) \big) - b = Q x^{(0)} - QMQ x^{(0)} + QMb - b = 0$

this holds if $MQ = I$, or $M = Q^{-1}$.

Thus for a quadratic function, $x^{(k+1)} = x^{(k)} - Q^{-1} \nabla f(x^{(k)})$ will take us to the minimum in one iteration no matter what $x^{(0)}$ is.
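A one-line check of this fact in NumPy ($Q$, $b$, and $x^{(0)}$ are made-up example data):

```python
# With M = Q^{-1}, a single step lands on the minimizer Q^{-1} b.
import numpy as np

Q = np.array([[4.0, 1.0], [1.0, 3.0]])       # symmetric p.d.
b = np.array([1.0, 2.0])
x0 = np.array([10.0, -7.0])                  # arbitrary initial guess

x1 = x0 - np.linalg.solve(Q, Q @ x0 - b)     # x^(1) = x^(0) - Q^{-1} grad f(x^(0))
print(x1, np.linalg.solve(Q, b))             # both equal the minimizer Q^{-1} b
```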

Newton-Raphson Method
Minimize $f(x)$. The necessary condition is $\nabla f(x) = 0$. The N-R algorithm finds the roots of $\nabla f(x) = 0$.

Guess $x^{(k)}$; then $x^{(k+1)}$ must satisfy (picture the tangent line to $\nabla f(x)$ at $x^{(k)}$ crossing the axis at $x^{(k+1)}$):

$\dfrac{\nabla f(x^{(k)})}{x^{(k)} - x^{(k+1)}} = \dfrac{d \nabla f(x)}{dx}\Big|_{x^{(k)}}$

$\Rightarrow x^{(k+1)} = x^{(k)} - \left( \dfrac{d \nabla f(x)}{dx}\Big|_{x^{(k)}} \right)^{-1} \nabla f(x^{(k)})$

Note: it does not always converge.

Newton-Raphson Method
A more formal derivation: minimize $f(x^{(k)} + h)$ w.r.t. $h$:

$f(x^{(k)} + h) \approx f(x^{(k)}) + \langle \nabla f(x^{(k)}), h \rangle + \frac{1}{2} \langle h, F(x^{(k)}) h \rangle$

$\nabla_h f(x^{(k)} + h) = \nabla f(x^{(k)}) + F(x^{(k)}) h = 0$

$\Rightarrow h = -[F(x^{(k)})]^{-1} \nabla f(x^{(k)})$

$x^{(k+1)} = x^{(k)} + h = x^{(k)} - [F(x^{(k)})]^{-1} \nabla f(x^{(k)})$

(Figure: successive iterates $x^{(k)}, x^{(k+1)}, x^{(k+2)}, x^{(k+3)}, \ldots$ approaching the minimum of $f(x)$. A sketch of the iteration follows below.)
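A minimal sketch of the iteration, solving $F(x^{(k)}) h = \nabla f(x^{(k)})$ rather than forming the inverse (the test function is an assumed example):

```python
# Multivariate Newton-Raphson: x^(k+1) = x^(k) - [F(x^(k))]^{-1} grad f(x^(k)).
import numpy as np

def newton_raphson(grad_f, hess_f, x0, eps=1e-10, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < eps:
            break
        x = x - np.linalg.solve(hess_f(x), g)   # solve F h = grad, no explicit inverse
    return x

# f(x) = x1^4 + x2^2 (assumed example); minimum at the origin
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
hess = lambda x: np.array([[12 * x[0]**2, 0.0], [0.0, 2.0]])
print(newton_raphson(grad, hess, [1.0, 1.0]))   # ~[0, 0]
```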

Newton-Raphson Method
Remarks:

1. Computation of $[F(x^{(k)})]^{-1}$ at every iteration is time consuming $\Rightarrow$ modify the N-R algorithm to calculate $[F(x^{(k)})]^{-1}$ only every $M$-th iteration.
2. Must check that $F(x^{(k)})$ is p.d. at every iteration. If not, replace $F(x^{(k)})$ by $F(x^{(k)}) + \mathrm{diag}(\varepsilon_1, \ldots, \varepsilon_n)$ with $\varepsilon_i > 0$ chosen large enough to make it p.d.

Example:

$f(x_1, x_2) = \dfrac{-1}{x_1^2 + x_2^2 + 1}$

$\nabla f = \begin{bmatrix} \dfrac{2 x_1}{(x_1^2 + x_2^2 + 1)^2} \\[2mm] \dfrac{2 x_2}{(x_1^2 + x_2^2 + 1)^2} \end{bmatrix}$

Newton-Raphson Method
$F(x) = \dfrac{2}{(x_1^2 + x_2^2 + 1)^3} \begin{bmatrix} -3x_1^2 + x_2^2 + 1 & -4 x_1 x_2 \\ -4 x_1 x_2 & x_1^2 - 3x_2^2 + 1 \end{bmatrix}$

The minimum of $f(x)$ is at $(0, 0)$. In the nbd of $(0, 0)$,

$F((0,0)) = 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ is p.d.

Now suppose we start with the initial guess $x^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then

$\nabla f(x^{(0)}) = \begin{bmatrix} 1/2 \\ 0 \end{bmatrix}, \qquad F(x^{(0)}) = \dfrac{1}{2} \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$ (not p.d.!)

Then

$x^{(1)} = x^{(0)} - (F(x^{(0)}))^{-1} \nabla f(x^{(0)}) = \begin{bmatrix} 1 \\ 0 \end{bmatrix} - \begin{bmatrix} -2 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1/2 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$

which is farther from $(0, 0)$: the iteration diverges.
Remark:

3. The N-R algorithm is good (fast) when the initial guess is close to the minimum, but not very good when it is far from it (a sketch reproducing the divergence above follows below).
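A short script reproducing the divergence in the example above (same $f$, gradient, Hessian, and starting point):

```python
# Plain N-R on f(x) = -1/(x1^2 + x2^2 + 1), starting from x^(0) = [1, 0]^T.
import numpy as np

def grad(x):
    D = x[0]**2 + x[1]**2 + 1.0
    return 2.0 * x / D**2

def hess(x):
    D = x[0]**2 + x[1]**2 + 1.0
    return (2.0 / D**3) * np.array(
        [[-3 * x[0]**2 + x[1]**2 + 1.0, -4 * x[0] * x[1]],
         [-4 * x[0] * x[1], x[0]**2 - 3 * x[1]**2 + 1.0]])

x = np.array([1.0, 0.0])
for _ in range(5):
    x = x - np.linalg.solve(hess(x), grad(x))
    print(x)     # first component grows: the iteration runs away from (0, 0)
```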
