Unconstrained Minimization
Def: f(x) is said to be differentiable at a point x* if it is defined in a
neighborhood N around x* and if, for x* + h ∈ N, there exists a vector a,
independent of h, such that

f(x* + h) = f(x*) + ⟨a, h⟩ + ⟨h, α(x*, h)⟩

where the vector a is called the gradient of f(x) evaluated at x*,
denoted

a = ∇f(x*) = [∂f/∂x₁  ∂f/∂x₂  …  ∂f/∂xₙ]ᵀ |_{x*}

and

α(x*, h) = [α₁(x*, h)  …  αₙ(x*, h)]ᵀ → 0 as h → 0.
Unconstrained Minimization
Note: if f(x) is twice differentiable, then

α(x, h) = ½ F(x) h + H.O.T.

where F(x) is an n×n symmetric matrix, called the Hessian of f(x):

F(x) = [ ∂²f/∂x₁²     ∂²f/∂x₁∂x₂   …   ∂²f/∂x₁∂xₙ ]
       [ ∂²f/∂x₂∂x₁   ∂²f/∂x₂²     …   ∂²f/∂x₂∂xₙ ]
       [     ⋮             ⋮         ⋱       ⋮      ]

Then

f(x* + h) = f(x*) + ⟨∇f(x*), h⟩ + ½⟨F(x*)h, h⟩ + H.O.T.

where ⟨∇f(x*), h⟩ is the 1st variation and ½⟨F(x*)h, h⟩ is the 2nd variation.
Directional derivatives
Let w be a direction vector of unit norm, ‖w‖ = 1. Now consider

g(r) = f(x* + r w).

The directional derivative of f at x* along w is

D_w f(x*) = dg(r)/dr |_{r=0}.
Directional derivatives
Example: let w = eᵢ = [0 … 1 … 0]ᵀ (1 in the i-th position). Then

D_w f(x*) = ∂f/∂xᵢ |_{x*},

i.e. the partial derivatives are the directional derivatives along the
coordinate axes. In general,

D_w f(x*) = ⟨∇f(x*), w⟩,

and the projection of ∇f(x*) onto w is

proj_w ∇f(x*) = (⟨∇f(x*), w⟩ / ⟨w, w⟩) w = ⟨∇f(x*), w⟩ w,  since ‖w‖ = 1.

By Cauchy–Schwarz, ⟨∇f(x*), w⟩² ≤ ⟨∇f(x*), ∇f(x*)⟩ ⟨w, w⟩.
Unconstrained Minimization
Hence D_w f(x*) = ⟨∇f(x*), w⟩ is maximized, over all unit vectors w, by

w = ∇f(x*) / ‖∇f(x*)‖,

i.e. f(x* + rw) increases most rapidly in the direction of the gradient.
Directional derivatives
Example: f(x) = x₁² + x₂², x ∈ R².

Sol:

∇f(x) = [∂f/∂x₁  ∂f/∂x₂]ᵀ = [2x₁  2x₂]ᵀ

The level curves f(x₁, x₂) = c are circles. Let x̄ = [1, 1]ᵀ; then

∇f(x̄) = [2, 2]ᵀ,  ‖∇f(x̄)‖² = 8,  ‖∇f(x̄)‖ = 2√2.
Directional derivatives
The directional derivative in the direction of the gradient is

D_w f(x̄) = ⟨∇f(x̄), w⟩,  w = ∇f(x̄)/‖∇f(x̄)‖
         = (1/(2√2)) ⟨[2, 2]ᵀ, [2, 2]ᵀ⟩ = 8/(2√2) = 2√2.

Notes: along the coordinate directions,

D_{e₁} f(x̄) = ⟨∇f(x̄), e₁⟩ = ⟨[2, 2]ᵀ, [1, 0]ᵀ⟩ = 2 ≤ 2√2
D_{e₂} f(x̄) = ⟨∇f(x̄), e₂⟩ = ⟨[2, 2]ᵀ, [0, 1]ᵀ⟩ = 2 ≤ 2√2
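The arithmetic in this example is easy to check numerically; a minimal sketch in Python (the function, the point x̄ = (1, 1), and the directions are the ones used above):

```python
import math

# f(x) = x1^2 + x2^2 and its gradient, as in the example above.
def f(x):
    return x[0]**2 + x[1]**2

def grad_f(x):
    return [2 * x[0], 2 * x[1]]

def directional_derivative(grad, w):
    # D_w f(x) = <grad f(x), w> for a unit vector w
    return grad[0] * w[0] + grad[1] * w[1]

x_bar = [1.0, 1.0]
g = grad_f(x_bar)                  # (2, 2)
norm_g = math.hypot(g[0], g[1])    # ||grad f|| = 2*sqrt(2)

# Along the normalized gradient, the derivative equals ||grad f||.
w_grad = [g[0] / norm_g, g[1] / norm_g]
print(directional_derivative(g, w_grad))   # 2*sqrt(2)
print(directional_derivative(g, [1, 0]))   # D_e1 f = 2
print(directional_derivative(g, [0, 1]))   # D_e2 f = 2
```

Both coordinate directions give 2, strictly less than the value 2√2 attained along the gradient, as claimed.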
Directional derivatives
Def: f(x) is said to have a local (or relative) minimum at x* if, in a
nbd N of x*,

f(x*) ≤ f(x),  ∀x ∈ N.

Theorem: Let f(x) be differentiable. If f(x) has a local minimum at x*,
then

∇f(x*) = 0  (or D_w f(x*) = 0, ∀w).

pf: f(x* + h) − f(x*) = ⟨∇f(x*), h⟩ + ⟨α(x*, h), h⟩ ≥ 0 for all
sufficiently small h. As h → 0, α(x*, h) → 0, so ⟨∇f(x*), h⟩ ≥ 0 for all
small h. But if ⟨∇f(x*), h⟩ > 0 for some h, then h₁ = −h gives
⟨∇f(x*), h₁⟩ < 0, a contradiction. Hence ⟨∇f(x*), h⟩ = 0 ∀h, i.e.
∇f(x*) = 0. ∎
Directional derivatives
Theorem: If f(x) is twice differentiable and
(1) ∇f(x*) = 0
(2) F(x*) is a positive definite matrix (i.e. vᵀF(x*)v > 0, ∀v ≠ 0),
then x* is a local minimum of f(x).

pf:

f(x* + h) − f(x*) = ⟨∇f(x*), h⟩ + ½⟨F(x*)h, h⟩ + H.O.T.
                  = ½ hᵀF(x*)h + H.O.T. > 0 for all sufficiently small h ≠ 0

⇒ f(x*) < f(x), ∀x ∈ N(x*), x ≠ x*. ∎

Conclusion: sufficient conditions for a local minimum of f(x):
(1) ∇f(x*) = 0
(2) F(x*) is p.d.
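These two conditions can be checked mechanically. A minimal sketch, again using f(x) = x₁² + x₂² as an assumed example; for a 2×2 symmetric Hessian, positive definiteness reduces to Sylvester's criterion (positive leading principal minors):

```python
# Check (1) grad f(x) = 0 and (2) F(x) p.d. at a candidate point.
# f(x) = x1^2 + x2^2 is an assumed example; its Hessian is constant.
def grad_f(x):
    return [2 * x[0], 2 * x[1]]

def hessian_f(x):
    return [[2.0, 0.0], [0.0, 2.0]]

def is_local_min(x, tol=1e-8):
    g = grad_f(x)
    if abs(g[0]) > tol or abs(g[1]) > tol:   # condition (1): gradient = 0
        return False
    F = hessian_f(x)
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    return F[0][0] > 0 and det > 0           # condition (2): F is p.d.

print(is_local_min([0.0, 0.0]))  # True: grad = 0 and F = 2I is p.d.
print(is_local_min([1.0, 1.0]))  # False: grad = (2, 2) != 0
```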
Steepest Descent
General iteration:

x^(k+1) = x^(k) + α^(k) d^(k)

Steepest descent: d^(k) = −∇f(x^(k)), i.e.

x^(k+1) = x^(k) − α^(k) ∇f(x^(k)),  with α^(k) > 0.

1.a. Optimum α^(k)

To determine α^(k), consider g(α^(k)) = f(x^(k) − α^(k)∇f(x^(k))), a
function of α^(k). Note that for small α^(k) > 0,

f(x^(k) − α^(k)∇f(x^(k))) ≈ f(x^(k)) − α^(k)⟨∇f(x^(k)), ∇f(x^(k))⟩ < f(x^(k)).

The optimum α^(k) minimizes g(α^(k)), i.e.

dg(α^(k))/dα^(k) = 0.
Steepest Descent
Example: f(x) = x₁² + x₂² − x₁x₂ + 1, x ∈ R².

Suppose x^(k) = [3, 3]ᵀ; then ∇f(x^(k)) = [2x₁ − x₂, 2x₂ − x₁]ᵀ |_{x^(k)} = [3, 3]ᵀ, and

x^(k+1) = x^(k) − α^(k)∇f(x^(k)) = [3 − 3α^(k), 3 − 3α^(k)]ᵀ

g(α^(k)) = (3 − 3α^(k))² + (3 − 3α^(k))² − (3 − 3α^(k))(3 − 3α^(k)) + 1
         = (3 − 3α^(k))² + 1

Setting dg(α^(k))/dα^(k) = 0 determines α^(k).
Steepest Descent
Example: f(x) = ½ xᵀQx − bᵀx, with Q (n×n) symmetric and p.d.

∇f(x) = Qx − b,  d^(k) = −∇f(x^(k)) = −(Qx^(k) − b)

x^(k+1) = x^(k) − α^(k)(Qx^(k) − b)

g(α^(k)) = f(x^(k) − α^(k)(Qx^(k) − b))
         = ½(α^(k))² (d^(k))ᵀQ d^(k) + α^(k)((x^(k))ᵀQ d^(k) − bᵀd^(k))
           + ½(x^(k))ᵀQ x^(k) − bᵀx^(k)

(i.e. a parabola in α^(k)).
Steepest Descent
dg(α^(k))/dα^(k) = α^(k)(d^(k))ᵀQ d^(k) + ((x^(k))ᵀQ − bᵀ) d^(k) = 0

Since ((x^(k))ᵀQ − bᵀ) d^(k) = (Qx^(k) − b)ᵀ d^(k) = −(d^(k))ᵀd^(k), this gives

α^(k) = (d^(k))ᵀd^(k) / ((d^(k))ᵀQ d^(k))

Note d²g(α^(k))/d(α^(k))² = (d^(k))ᵀQ d^(k) > 0 (Q is p.d.), so this α^(k)
is indeed a minimizer.

Optimum iteration:

x^(k+1) = x^(k) + α^(k) d^(k),
α^(k) = ⟨d^(k), d^(k)⟩ / ⟨d^(k), Qd^(k)⟩,  d^(k) = −(Qx^(k) − b)

Remark: the optimal steepest descent step size can be determined
analytically for quadratic functions.
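The optimal-step formula above translates directly into an algorithm. A minimal sketch; the particular Q and b are assumed example data (Q symmetric p.d.), not values from the notes:

```python
# Steepest descent for f(x) = 1/2 x^T Q x - b^T x with the exact step
# alpha = <d, d> / <d, Qd>, where d = -(Qx - b) = -grad f(x).
def matvec(Q, x):
    return [sum(Q[i][j] * x[j] for j in range(len(x))) for i in range(len(Q))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def steepest_descent_quadratic(Q, b, x, iters=50):
    for _ in range(iters):
        d = [bi - gi for bi, gi in zip(b, matvec(Q, x))]  # d = b - Qx
        dQd = dot(d, matvec(Q, d))
        if dQd == 0:                      # d = 0: already at the minimum
            break
        alpha = dot(d, d) / dQd           # optimal step for a quadratic
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

Q = [[4.0, 1.0], [1.0, 3.0]]              # assumed symmetric p.d. matrix
b = [1.0, 2.0]
x_star = steepest_descent_quadratic(Q, b, [0.0, 0.0])
print(x_star)   # approaches the solution of Qx = b, i.e. (1/11, 7/11)
```

The iterates converge to the unique minimizer x* satisfying Qx* = b.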
Steepest Descent
1.b. Other possibilities for choosing α^(k)

(1) Constant step size, i.e. α^(k) = α = constant ∀k:

x^(k+1) = x^(k) − α ∇f(x^(k))

adv: simple
disadv: no idea of which value of α to choose;
  if α is too large ⇒ may diverge
  if α is too small ⇒ very slow convergence

(2) Variable step size
Steepest Descent
1.b. Other possibilities for choosing α^(k) (cont.)

(3) Polynomial fit methods

Quadratic fit: evaluate g at three trial steps α₁, α₂, α₃ with values
g₁, g₂, g₃, and fit

g(α) ≈ a + bα + cα²,  gᵢ = a + bαᵢ + cαᵢ²,  i = 1, 2, 3.

Solve for a, b, c, then minimize by

dg(α)/dα = b + 2cα = 0  ⇒  α^(k) = −b/(2c) = fun(α₁, α₂, α₃, g₁, g₂, g₃).

Check d²g(α)/dα² = 2c > 0.
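A sketch of the quadratic-fit step. The closed forms for b and c below come from divided differences of the three samples; the test parabola is an assumed example:

```python
# Quadratic fit: given three trial steps (a1, g1), (a2, g2), (a3, g3),
# fit g(a) ~ a0 + b*a + c*a^2 and return the stationary point -b/(2c).
def quadratic_fit_step(a1, a2, a3, g1, g2, g3):
    # Divided differences give the coefficients of the interpolating parabola.
    c = ((g3 - g1) / (a3 - a1) - (g2 - g1) / (a2 - a1)) / (a3 - a2)
    b = (g2 - g1) / (a2 - a1) - c * (a1 + a2)
    if c <= 0:
        raise ValueError("d^2g/da^2 = 2c must be positive for a minimum")
    return -b / (2.0 * c)

g = lambda a: (a - 1.5)**2 + 0.25   # assumed test function, minimum at a = 1.5
alpha = quadratic_fit_step(0.0, 1.0, 2.0, g(0.0), g(1.0), g(2.0))
print(alpha)   # exactly 1.5, since g really is a parabola
```

For a genuinely quadratic g the fitted minimizer is exact; otherwise it is used as the next trial step.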
Steepest Descent
1.b. Other possibilities for choosing α^(k) (cont.)

Cubic fit: with four trial steps α₁, …, α₄ and values g₁, …, g₄, fit
g(α) ≈ a₁α³ + a₂α² + a₃α + a₄ by solving

[ α₁³ α₁² α₁ 1 ] [ a₁ ]   [ g₁ ]
[ α₂³ α₂² α₂ 1 ] [ a₂ ] = [ g₂ ]
[ α₃³ α₃² α₃ 1 ] [ a₃ ]   [ g₃ ]
[ α₄³ α₄² α₄ 1 ] [ a₄ ]   [ g₄ ]

for a₁, a₂, a₃, a₄. Then

dg(α)/dα = 3a₁α² + 2a₂α + a₃ = 0

⇒ α^(k) = (−2a₂ + √(4a₂² − 12a₁a₃)) / (6a₁)

Check d²g(α^(k))/dα² = 6a₁α^(k) + 2a₂ > 0.
Steepest Descent
1.b. Other possibilities for choosing α^(k) (cont.)

Region elimination methods: for unimodal g on [a, b], evaluate g at two
interior points a < α₁ < α₂ < b, with g₁ = g(α₁), g₂ = g(α₂):

(a) g₁ > g₂: the minimum cannot lie in [a, α₁]; eliminate [a, α₁].
(b) g₁ < g₂: the minimum cannot lie in [α₂, b]; eliminate [α₂, b].
(c) g₁ = g₂: eliminate both [a, α₁] and [α₂, b]; the minimum lies in [α₁, α₂].
Steepest Descent
[Q]: how do we choose α₁ and α₂?

(i) Two-point equal-interval search, i.e. α₁ − a = α₂ − α₁ = b − α₂.

1st iteration: L₀ = b − a
2nd iteration: L₁ = (2/3) L₀
3rd iteration: L₂ = (2/3)² L₀
k-th iteration: L_k = (2/3)^k L₀
Steepest Descent
[Q]: how do we choose α₁ and α₂? (cont.)

(ii) Fibonacci search method: F₀ = 1, F₁ = 1, F_k = F_{k−1} + F_{k−2}.

α₁^(k) = (F_{N−1−k} / F_{N+1−k}) (b_k − a_k) + a_k,  k = 0, 1, 2, …, N−1
α₂^(k) = (F_{N−k} / F_{N+1−k}) (b_k − a_k) + a_k,  k = 0, 1, 2, …, N−1

Example (N = 5, [a₀, b₀] = [0, 1]):

α₁^(0) = (F₄/F₆)(1 − 0) + 0 = 5/13
α₂^(0) = (F₅/F₆)(1 − 0) + 0 = 8/13

Compare g₁, g₂ to obtain [a₁, b₁], with L₁ = (8/13) L₀; then

α₁^(1) = (F₃/F₅)(b₁ − a₁) + a₁
α₂^(1) = (F₄/F₅)(b₁ − a₁) + a₁
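The Fibonacci update rules above can be sketched as a short loop; the unimodal test function is an assumed example:

```python
# Fibonacci search on [a, b] for a unimodal g: trial points at
# (F_{N-1-k}/F_{N+1-k}) and (F_{N-k}/F_{N+1-k}) of the current interval.
def fibonacci_search(g, a, b, N):
    F = [1, 1]
    while len(F) < N + 2:                # F_0 .. F_{N+1}
        F.append(F[-1] + F[-2])
    for k in range(N - 1):
        L = b - a
        a1 = a + F[N - 1 - k] / F[N + 1 - k] * L
        a2 = a + F[N - k] / F[N + 1 - k] * L
        if g(a1) < g(a2):
            b = a2                       # eliminate [a2, b]
        else:
            a = a1                       # eliminate [a, a1]
    return (a + b) / 2.0

g = lambda t: (t - 0.3)**2               # assumed unimodal test function
print(fibonacci_search(g, 0.0, 1.0, 10)) # close to the minimizer 0.3
```

With N = 5 on [0, 1] the first two trial points are 5/13 and 8/13, matching the worked example above.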
Steepest Descent
[Q]: how do we choose α₁ and α₂? (cont.)

(iii) Golden section method: as N → ∞,

lim_{N→∞} F_{N−1}/F_{N+1} = 0.382,  lim_{N→∞} F_N/F_{N+1} = 0.618,

so use

α₁^(k) = 0.382 (b_k − a_k) + a_k
α₂^(k) = 0.618 (b_k − a_k) + a_k,  k = 0, 1, 2, …

until b_k − a_k ≤ ε.

Example: [a₀, b₀] = [0, 2]:

α₁^(0) = 0.382(2 − 0) + 0 = 0.764
α₂^(0) = 0.618(2 − 0) + 0 = 1.236

Comparing g₁ = g(0.764) and g₂ = g(1.236) gives [a₁, b₁] = [0, 1.236]; then

α₁^(1) = 0.382(1.236 − 0) + 0 = 0.472
α₂^(1) = 0.618(1.236 − 0) + 0 = 0.764

etc.
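The golden-section iteration, using the limiting ratios 0.382 / 0.618 from the slide; the unimodal test function is an assumed example:

```python
# Golden-section search on [a, b] for a unimodal g, stopping when the
# bracketing interval is shorter than eps.
def golden_section(g, a, b, eps=1e-5):
    while b - a > eps:
        a1 = a + 0.382 * (b - a)
        a2 = a + 0.618 * (b - a)
        if g(a1) < g(a2):
            b = a2                      # eliminate [a2, b]
        else:
            a = a1                      # eliminate [a, a1]
    return (a + b) / 2.0

g = lambda t: (t - 1.0)**2 + 2.0        # assumed test function, min at t = 1
print(golden_section(g, 0.0, 2.0))      # close to 1.0
```

Unlike Fibonacci search, the number of iterations need not be fixed in advance; the interval shrinks by the constant factor 0.618 per step.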
Steepest Descent
Flow chart of steepest descent:

1. Initial guess x^(0); set k = 0.
2. Compute ∇f(x^(k)).
3. If ‖∇f(x^(k))‖ ≤ ε: stop, x^(k) is the minimum.
4. Otherwise determine α^(k) (polynomial fit: quadratic or cubic;
   region elimination: Fibonacci or golden section).
5. x^(k+1) = x^(k) − α^(k) ∇f(x^(k)); set k = k + 1; go to step 2.
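The flow chart translates into a short loop; the function f, the constant step size, and the tolerance below are assumed for illustration:

```python
import math

# Steepest descent following the flow chart: iterate
# x <- x - alpha * grad f(x) until ||grad f(x)|| <= eps.
def f_grad(x):
    # assumed example f = (x1 - 1)^2 + (x2 + 2)^2, minimum at (1, -2)
    return [2 * (x[0] - 1.0), 2 * (x[1] + 2.0)]

def steepest_descent(x, alpha=0.1, eps=1e-8, max_iter=10000):
    for k in range(max_iter):
        g = f_grad(x)
        if math.hypot(g[0], g[1]) <= eps:      # stopping test from the chart
            return x, k
        x = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
    return x, max_iter

x_min, iters = steepest_descent([5.0, 5.0])
print(x_min, iters)   # close to (1, -2)
```

Here α^(k) is simply held constant (option (1) above); the "Determine α^(k)" box could equally call one of the line-search routines sketched earlier.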
Steepest Descent
[Q]: is the direction of −∇f(x) the best direction to go? Consider

f(x) = ½ xᵀQx − bᵀx,  ∇f(x) = Qx − b.
Newton-Raphson Method
Minimize f(x). The necessary condition is ∇f(x) = 0; the N-R algorithm
finds the roots of ∇f(x) = 0.

Scalar case (Newton's tangent construction applied to f′(x)):

x^(k+1) = x^(k) − f′(x^(k)) / f″(x^(k))

General case:

x^(k+1) = x^(k) − [F(x^(k))]⁻¹ ∇f(x^(k))

Note: the iteration does not always converge.
Newton-Raphson Method
A more formal derivation: min over h of f(x^(k) + h), where

f(x^(k) + h) ≈ f(x^(k)) + ⟨∇f(x^(k)), h⟩ + ½⟨h, F(x^(k)) h⟩

∇_h f(x^(k) + h) = ∇f(x^(k)) + F(x^(k)) h = 0

⇒ h = −[F(x^(k))]⁻¹ ∇f(x^(k))

x^(k+1) = x^(k) + h = x^(k) − [F(x^(k))]⁻¹ ∇f(x^(k))

(Figure: successive iterates x_k, x_{k+1}, x_{k+2}, x_{k+3} on the graph
of f(x), approaching the minimum.)
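The scalar iteration x^(k+1) = x^(k) − f′(x^(k))/f″(x^(k)) can be sketched in a few lines; the test function is an assumed example:

```python
# One-dimensional N-R for minimizing f: Newton's method applied to f'(x) = 0.
def newton_1d(df, d2f, x, iters=20):
    for _ in range(iters):
        x = x - df(x) / d2f(x)
    return x

# assumed example f(x) = x^4 - 3x^2 + 2, so f' = 4x^3 - 6x, f'' = 12x^2 - 6;
# local minima at x = +/- sqrt(3/2)
df  = lambda x: 4 * x**3 - 6 * x
d2f = lambda x: 12 * x**2 - 6
print(newton_1d(df, d2f, 2.0))   # converges to sqrt(3/2) ~ 1.2247
```

Starting at x = 2.0, where f″ > 0, the iterates converge quadratically to the nearby minimizer.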
Newton-Raphson Method
Remarks

1. Computation of [F(x^(k))]⁻¹ at every iteration is time consuming ⇒
   modify the N-R algorithm to recompute [F(x^(k))]⁻¹ only every M-th
   iteration.
2. Must check that F(x^(k)) is p.d. at every iteration. If not, replace
   F(x^(k)) by a positive definite modification F̃(x^(k)).
Example:

f(x₁, x₂) = −1 / (x₁² + x₂² + 1)

∇f(x) = [ 2x₁ / (x₁² + x₂² + 1)²,  2x₂ / (x₁² + x₂² + 1)² ]ᵀ

F(x) = (−2 / (x₁² + x₂² + 1)³) [ 3x₁² − x₂² − 1      4x₁x₂         ]
                                [ 4x₁x₂              3x₂² − x₁² − 1 ]

F((0,0)) = 2 [1 0; 0 1] is p.d., and ∇f((0,0)) = 0, so x* = (0, 0) is
the minimum.

Let x^(0) = [1, 0]ᵀ. Then ∇f(x^(0)) = [1/2, 0]ᵀ and
F(x^(0)) = [−1/2 0; 0 1/2], which is not p.d. The N-R step gives

x^(1) = x^(0) − [F(x^(0))]⁻¹ ∇f(x^(0))
      = [1, 0]ᵀ − [−2 0; 0 2] [1/2, 0]ᵀ = [2, 0]ᵀ,

and subsequent iterates move further from the minimum: the algorithm
diverges.
Remark

3. The N-R algorithm is good (fast) when the initial guess is close to
   the minimum, but not very good when it is far away.
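The divergence above is easy to reproduce for f(x₁, x₂) = −1/(x₁² + x₂² + 1) starting from x^(0) = (1, 0): along the x₂ = 0 axis the N-R update is one-dimensional, with f′(x) = 2x/S², f″(x) = −2(3x² − 1)/S³, S = x² + 1. A minimal sketch:

```python
# N-R iterates along x2 = 0 for f = -1/(x1^2 + x2^2 + 1), started at x = 1.
def newton_step(x):
    S = x * x + 1.0
    df = 2.0 * x / S**2                      # f'(x)
    d2f = -2.0 * (3.0 * x * x - 1.0) / S**3  # f''(x), negative for |x| > 1/sqrt(3)
    return x - df / d2f

x = 1.0
for k in range(5):
    x = newton_step(x)
    print(k + 1, x)   # 2.0, 2.909..., moving away from the minimum at 0
```

Because f″ < 0 at the starting point, the Newton step points uphill along f′ and each iterate lands farther from x* = 0, illustrating remark 3.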