
CCO-10/11

CONVEX & COMBINATORIAL OPTIMIZATION


SEC-2 P-004

WWW.SEARCHING-EYE.COM
http://www.searching-eye.com/~mac/
STANFORD AUDI: AUTONOMOUS DRIFTING.
Cool!!
Unconstrained Minimization
Steepest Descent Methods Under Different Norms
Including Coordinate Descent Algorithm

All downloadable material (including MATLAB code):
http://www.searching-eye.com/sanjeevsharma/co/cco/


Undergraduate Student: Final Year
Indian Institute of Technology, Roorkee
Electrical Engineering Department (2007-2011)

Interests: Reinforcement Learning, Decentralized Algorithms, Decomposition Methods, Convex Optimization, Machine Learning, Mixed Integer Programming.

Application Interests: Large-scale Semi-Autonomous & Fully-Autonomous Military Operations; Robots that can help human beings in Complex Tasks; Decentralized Path Planning & Joint Operation Systems; Decentralized Reinforcement Learning.

http://www.searching-eye.com/sanjeevs.html
Before taking CCO-10/11

Please refer to MAC-X:
http://www.searching-eye.com/~mac
or refer to the details of Sections 1 & 2 of CCO-10/11:
http://www.searching-eye.com/sanjeevsharma/co/cco/


This presentation
• Steepest Descent Methods
- Steepest Descent under the L-2 Norm
- Steepest Descent under the Quadratic Norm
- Steepest Descent under the L-1 Norm: the Coordinate Descent Method
- Convergence Analysis of Steepest Descent Methods
Directional Derivatives
• Consider a function $f : \mathbb{R}^n \to \mathbb{R}$.
- The directional derivative of $f$ in the direction $v$ is $\nabla f(x)^T v$.

• For descent methods:
- The directional derivative should be negative: $\nabla f(x)^T v < 0$.
- Such a $v$ is a descent direction. (See the 3rd presentation.)
Basic Vector Algebra
• $a \cdot b$ is the scalar projection of vector $b$ onto vector $a$, scaled by the length of $a$.

• From the linear-function point of view:
- $\nabla f(x)^T v$ is a linear function of $v$.
- It can be made arbitrarily negative by letting $\mathrm{length}(v) \to \infty$.
Limit of the Length (a Function of the Norm)
• The descent direction makes sense only if the length of the descent direction is restricted.

• Length is measured by a norm: $\|v\|$,
- where $\|\cdot\|$ is any valid norm.


Normalized Steepest Descent
• The normalized steepest descent direction $x_{nsd}$ (in steepest descent methods) is
$$x_{nsd} = \arg\min_v \{\nabla f(x)^T v \mid \|v\| \le 1\}.$$
• It is the direction in the unit ball defined by the norm that extends farthest in the direction $-\nabla f(x)$.
• Un-normalized steepest descent direction $x_{sd}$:
$$x_{sd} = \|\nabla f(x)\|_* \, x_{nsd}, \qquad \|\cdot\|_* = \text{dual norm}.$$


Un-normalized Steepest Descent: $x_{sd}$
• A property that will be utilized in the convergence analysis:
$$\nabla f(x)^T x_{sd} = \|\nabla f(x)\|_* \, \nabla f(x)^T x_{nsd} = -\|\nabla f(x)\|_*^2$$
- Derivation on the next page; it can be skipped.
- Dual norm: $\|z\|_* = \sup\{z^T x \mid \|x\| \le 1\}$
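For reference, the dual norms that appear later in this deck all follow from this definition; these are standard facts, stated here rather than derived:
$$\|z\|_{2*} = \|z\|_2 \ (\text{the L-2 norm is self-dual}), \qquad \|z\|_{P*} = (z^T P^{-1} z)^{1/2}, \qquad \|z\|_{1*} = \|z\|_\infty.$$
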
Derivation of the Relation
• A basic relation between inf (infimum) and sup (supremum):
- $\inf(z) = -\sup(-z)$ for any set $z \subseteq \mathbb{R}^n$, and vice-versa.
- Example: consider the sets $z = \{1, 2, 3, 4\}$ and $-z = \{-1, -2, -3, -4\}$.
- Then $\inf(z) = 1$ and $\sup(-z) = -1$, so $\inf(z) = -\sup(-z)$.
- Also $\sup(z) = 4$ and $\inf(-z) = -4$, so $\sup(z) = -\inf(-z)$.
• This will be used in the derivation.
Derivation of the Relation
• Claim: $\nabla f(x)^T x_{sd} = \|\nabla f(x)\|_* \, \nabla f(x)^T x_{nsd} = -\|\nabla f(x)\|_*^2$

- $\|\nabla f(x)\|_* \, \nabla f(x)^T x_{nsd} = \|\nabla f(x)\|_* \, \nabla f(x)^T \{\arg\min_v \{\nabla f(x)^T v \mid \|v\| \le 1\}\}$
- Therefore $x_{nsd}$ is the vector which gives $\inf(\nabla f(x)^T v)$ such that $\|v\| \le 1$.
- Therefore: $\nabla f(x)^T x_{nsd} = \inf\{\nabla f(x)^T v \mid \|v\| \le 1\}$
- Which is: $\nabla f(x)^T x_{nsd} = -\sup\{-\nabla f(x)^T v \mid \|v\| \le 1\} = -\|\nabla f(x)\|_*$ (by the definition of the dual norm).
• Hence $\nabla f(x)^T x_{sd} = -\|\nabla f(x)\|_*^2$.
• Steepest Descent for Different Norms


The Euclidean Norm

$$\text{L-2 Norm:}\quad \|z\|_2 = (z^T z)^{1/2}$$

Resulting in the Gradient Descent Algorithm


L-2 Norm: Steepest Descent
• The descent direction:
$$x_{sd} = -\|\nabla f(x)\|_* \, \frac{\nabla f(x)}{\|\nabla f(x)\|_2} = -\nabla f(x)$$
- Dual norm: $\|\cdot\|_{2*} = \|\cdot\|_2$ (the L-2 norm is its own dual).
- Finding the descent direction: two methods –
  • Simple vector algebra
  • Mathematical optimization: theoretically sound (the derivation can be skipped)
Simple Vector Algebra
• Dot product: $\nabla f(x)^T v = \|v\| \, \|\nabla f(x)\| \cos\theta$
- Norm ball $\|v\| \le 1$: a sphere in $n$-dimensional space.
- The product is minimum when $\|v\| = 1$ and $\cos\theta = -1$.
- Which gives: $x_{nsd} = v = -\dfrac{\nabla f(x)}{\|\nabla f(x)\|_2}$
• Hence $x_{sd} = -\|\nabla f(x)\|_* \, \dfrac{\nabla f(x)}{\|\nabla f(x)\|_2} = -\nabla f(x)$
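As a quick numerical sanity check of this vector-algebra argument (not part of the deck's downloadable MATLAB material; the gradient vector and sample size below are arbitrary illustrative choices), the snippet compares $\nabla f(x)^T v$ at $v = -\nabla f(x)/\|\nabla f(x)\|_2$ against random unit vectors:

```python
import numpy as np

# Over the Euclidean unit ball, grad^T v is smallest at v = -grad / ||grad||_2.
rng = np.random.default_rng(0)
grad = np.array([3.0, -1.0, 2.0])             # arbitrary illustrative gradient

v_star = -grad / np.linalg.norm(grad)          # claimed minimizer
best_random = min(
    grad @ (v / np.linalg.norm(v))             # grad^T v over random unit vectors
    for v in rng.normal(size=(10000, 3))
)
print(grad @ v_star, "<=", best_random)        # equals -||grad||_2, never beaten
```
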


Solving Equivalent Optimization Problem
• Primal equivalent problem:
$$\text{minimize } \nabla f(x)^T v \quad \text{subject to } v^T v \le 1 \qquad (\text{Lagrange multiplier } \lambda)$$
• Lagrangian: $L(v, \lambda) = \nabla f(x)^T v + \lambda\,(v^T v - 1)$
• Minimizing with respect to the primal variable:
$$v = -\frac{\nabla f(x)}{2\lambda}$$
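Filling in the step the slide only states (a standard calculation, not from the original deck): setting the gradient of the Lagrangian with respect to $v$ to zero gives the minimizer, and substituting it back yields the dual function used on the next slide,
$$\frac{\partial L(v,\lambda)}{\partial v} = \nabla f(x) + 2\lambda v = 0 \;\Rightarrow\; v = -\frac{\nabla f(x)}{2\lambda}, \qquad g(\lambda) = \inf_v L(v,\lambda) = -\frac{\nabla f(x)^T \nabla f(x)}{4\lambda} - \lambda.$$
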
Solving Equivalent Optimization Problem
• Dual problem:
$$\text{maximize } g(\lambda) = -\frac{\nabla f(x)^T \nabla f(x)}{4\lambda} - \lambda \qquad \{\text{with } \lambda \ge 0\}$$
• For $\lambda \ge 0$ the objective is a concave function of $\lambda$.
• Setting $\dfrac{\partial g(\lambda)}{\partial \lambda} = 0$ gives: $\lambda = \dfrac{\|\nabla f(x)\|_2}{2}$
• Hence $x_{nsd} = v = -\dfrac{\nabla f(x)}{2\lambda} = -\dfrac{\nabla f(x)}{\|\nabla f(x)\|_2}$
• Which gives: $x_{sd} = -\nabla f(x)$
Finally (by all means)
• In the case of the L-2 norm, the steepest descent algorithm is the same as the gradient descent algorithm.
• Gradient descent has been discussed in previous presentations, with a complete convergence analysis using line search methods and the condition number.
• (See Presentations 2 & 3 of Sec-2 in CCO-10/11.)
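The deck's MATLAB implementation is only linked, not reproduced here, so below is a minimal Python sketch of the L-2 steepest descent (gradient descent) loop with backtracking line search. The helper names (`backtracking`, `gradient_descent`), the test value of `gamma`, the starting point, and the stopping tolerance are illustrative assumptions, not the original code:

```python
import numpy as np

def backtracking(f, g, x, dx, alpha=0.1, beta=0.8):
    """Shrink t until f(x + t*dx) <= f(x) + alpha * t * g^T dx (sufficient decrease)."""
    t, fx = 1.0, f(x)
    while f(x + t * dx) > fx + alpha * t * (g @ dx):
        t *= beta
    return t

def gradient_descent(f, grad_f, x0, tol=1e-8, max_iter=1000):
    """Steepest descent under the L-2 norm: descent direction x_sd = -grad f(x)."""
    x = x0.astype(float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:        # stop when the gradient is small
            break
        dx = -g                            # L-2 steepest descent direction
        x = x + backtracking(f, g, x, dx) * dx
    return x, k

# Example on the deck's test function f(x) = x1^2 + gamma*x2^2 (gamma = 5 assumed).
gamma = 5.0
f = lambda x: x[0]**2 + gamma * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * gamma * x[1]])
print(gradient_descent(f, grad_f, np.array([9.0, 3.0])))
```
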


The Quadratic Norm

$$\text{Quadratic Norm:}\quad \|z\|_P = (z^T P z)^{1/2}$$

Resulting in Gradient Descent: Modified Coordinate System


Solving Equivalent Optimization Problem
• The descent direction:
$$\text{minimize } \nabla f(x)^T v \quad \text{subject to } v^T P v \le 1$$
• Lagrange multiplier $\lambda$:
$$L(v, \lambda) = \nabla f(x)^T v + \lambda\,(v^T P v - 1)$$
• Minimizing w.r.t. $v$:
$$\frac{\partial L(v, \lambda)}{\partial v} = 0 \;\Rightarrow\; v = -\frac{P^{-1}\nabla f(x)}{2\lambda}$$
• Dual function (a concave function):
$$\text{maximize } L(\lambda) = -\frac{\nabla f(x)^T P^{-1} \nabla f(x)}{4\lambda} - \lambda \qquad \{\lambda \ge 0\}$$
• Setting
$$\frac{\partial L(\lambda)}{\partial \lambda} = 0 \;\Rightarrow\; \lambda = \frac{(\nabla f(x)^T P^{-1} \nabla f(x))^{1/2}}{2}$$
• Which gives: $x_{nsd} = v = -(\nabla f(x)^T P^{-1}\nabla f(x))^{-1/2}\, P^{-1}\nabla f(x)$
• Hence, using the dual norm $\|z\|_* = (z^T P^{-1} z)^{1/2}$:
$$x_{sd} = \|\nabla f(x)\|_* \, x_{nsd} = -(\nabla f(x)^T P^{-1}\nabla f(x))^{1/2}\,(\nabla f(x)^T P^{-1}\nabla f(x))^{-1/2}\, P^{-1}\nabla f(x) = -P^{-1}\nabla f(x)$$


Steepest Descent: Quadratic Norm
• Descent direction: $x_{sd} = -P^{-1}\nabla f(x)$
• Change-of-coordinates interpretation: $y = P^{1/2} x$
- Giving: $\|y\|_2 = \|x\|_P$
• Define: $\bar{f}(y) = f(P^{-1/2} y) = f(x)$
• Descent direction for $\bar{f}(y)$:
$$y_{sd} = -\nabla \bar{f}(y) = -P^{-1/2}\,\nabla f(P^{-1/2} y) = -P^{-1/2}\,\nabla f(x)$$
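One step the slide leaves implicit (added here for completeness): a gradient step on $\bar f$ in the $y$ coordinates maps back to exactly the quadratic-norm steepest descent step in the original coordinates,
$$x^+ = P^{-1/2}\, y^+ = P^{-1/2}\,(y + t\, y_{sd}) = x - t\, P^{-1}\nabla f(x) = x + t\, x_{sd}.$$
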


Change of Coordinates Interpretation
• The steepest descent method in the quadratic norm is the gradient descent method applied to the problem after the change of coordinates.
• This can help in overcoming a large condition number of the Hessian of the function near the optimum.
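A minimal Python sketch of the quadratic-norm variant; the only change from the gradient descent sketch above is the direction $x_{sd} = -P^{-1}\nabla f(x)$. The choice of $P$ (here the exact Hessian of the assumed test function), all names, and all constants are illustrative assumptions, not the deck's MATLAB code:

```python
import numpy as np

def steepest_descent_quadratic_norm(f, grad_f, P, x0, alpha=0.1, beta=0.8,
                                    tol=1e-8, max_iter=1000):
    """Steepest descent under ||z||_P = sqrt(z^T P z): x_sd = -P^{-1} grad f(x)."""
    x = x0.astype(float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        dx = -np.linalg.solve(P, g)            # x_sd = -P^{-1} grad f(x)
        t = 1.0                                 # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
    return x, k

# With P equal to the Hessian of f(x) = x1^2 + gamma*x2^2, the quadratic-norm
# step lands on the minimizer in a single descent step, whatever gamma is.
gamma = 10.0
f = lambda x: x[0]**2 + gamma * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * gamma * x[1]])
P = np.diag([2.0, 2.0 * gamma])                # exact Hessian of the test function
print(steepest_descent_quadratic_norm(f, grad_f, P, np.array([9.0, 3.0])))
```
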


The L-1 Norm

$$\text{L-1 Norm:}\quad \|z\|_1 = \sum_i |z_i|$$

Resulting in Coordinate Descent Methods


Steepest Descent: L-1 Norm
• Descent direction: $x_{nsd} = \arg\min\{\nabla f(x)^T v \mid \|v\|_1 \le 1\}$

• Can be derived analytically –
- To simplify, assume all components of $\nabla f(x)$ are of the same sign (for example, negative).
- The dot product is then minimized if all components of $v$ are positive.


Descent Direction
- Hence the problem becomes:
$$\arg\min\Big(\sum_{i=1}^n \nabla f(x)_i\, v_i \;\Big|\; \sum_{i=1}^n v_i \le 1,\; v_i \ge 0\Big)$$
- The minimum is attained when $v_k = 1$, where $k = \arg\max_i\{|\nabla f(x)_i|\}$.
- Hence the absolute value of the component of $v$ corresponding to the element of $\nabla f(x)$ with maximum magnitude is 1, and all other components are 0 (due to the constraint $\|v\|_1 \le 1$).
Descent Direction
• This can be generalized to any $\nabla f(x)$.
• Let $k$ be the index such that $\|\nabla f(x)\|_\infty = |(\nabla f(x))_k|$.
• Then $x_{nsd} = -\operatorname{sign}\big((\nabla f(x))_k\big)\, e_k$, where $e_k$ is the $k$-th standard basis vector.
• Un-normalized descent direction:
$$x_{sd} = \|\nabla f(x)\|_* \, x_{nsd} = -\|\nabla f(x)\|_\infty \operatorname{sign}\!\Big(\frac{\partial f(x)}{\partial x_k}\Big)\, e_k$$
• Hence $x_{sd} = -\dfrac{\partial f(x)}{\partial x_k}\, e_k$.


Steepest Descent: L-1 Norm
• Thus resulting in the coordinate descent method.
• At each iteration, the component of $\nabla f(x)$ with the maximum absolute value is selected, and then the corresponding component of $x$ is adjusted (see the sketch below).
• Descent direction:
$$x_{sd} = -\frac{\partial f(x)}{\partial x_k}\, e_k, \qquad k = \arg\max_i\{|(\nabla f(x))_i|\}$$
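A minimal Python sketch of this coordinate descent rule with backtracking line search (the names, constants, and the choice of test function are illustrative assumptions, not the downloadable MATLAB code):

```python
import numpy as np

def coordinate_descent_l1(f, grad_f, x0, alpha=0.1, beta=0.8,
                          tol=1e-8, max_iter=5000):
    """Steepest descent under the L-1 norm: at each iteration move along the
    coordinate k with the largest |partial derivative|, x_sd = -(df/dx_k) e_k."""
    x = x0.astype(float)
    for it in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g, np.inf) < tol:
            break
        k = int(np.argmax(np.abs(g)))      # coordinate with the largest |gradient|
        dx = np.zeros_like(x)
        dx[k] = -g[k]                       # x_sd = -(df/dx_k) e_k
        t = 1.0                             # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
    return x, it

# Example on the deck's test function f(x) = x1^2 + gamma*x2^2 (gamma = 0.5 assumed).
gamma = 0.5
f = lambda x: x[0]**2 + gamma * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * gamma * x[1]])
print(coordinate_descent_l1(f, grad_f, np.array([9.0, 3.0])))
```
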
• Theory:
Convergence Analysis for a General Norm
with Backtracking Line Search
(can be skipped if your focus is just on implementation)


Convergence Analysis: Backtracking
• Using a property of norms:
- Any norm can be bounded by the L-2 norm:
- there exists $\gamma \in (0, 1]$ such that $\|z\| \ge \gamma \|z\|_2$.

• This relation can be checked in various mathematical sources describing norms.
• Using this, the convergence analysis will be carried out.
Convergence Rate: Backtracking
• Using the relation $\gamma \in (0,1]$: $\|z\| \ge \gamma\|z\|_2$, the definition of $x_{sd}$, and the relation $\nabla^2 f(x) \preceq M I$ (see the first presentation of CCO-10/11 Sec-2), it can easily be shown that:
$$f(t) = f(x + t\, x_{sd}) \le f(x) - t\,\|\nabla f(x)\|_*^2 + \frac{M t^2}{2\gamma^2}\,\|\nabla f(x)\|_*^2$$
- ($\gamma$ enters through $\|x_{sd}\|_2 = \|\nabla f(x)\|_* \|x_{nsd}\|_2 \le \|\nabla f(x)\|_*/\gamma$.)
- RHS: a convex function of the step-size $t$.


Convergence Rate: Backtracking
• The RHS can be minimized by setting $\dfrac{\partial(\mathrm{RHS})}{\partial t} = 0$,
which gives $t^* = \dfrac{\gamma^2}{M}$. Substituting this value gives:
$$f(t^*) = f(x + t^* x_{sd}) \le f(x) - \frac{\gamma^2}{2M}\,\|\nabla f(x)\|_*^2 = f(x) + \frac{\gamma^2}{2M}\,\nabla f(x)^T x_{sd}$$

• Now for backtracking line search: $\alpha \in (0, 0.5)$, $\beta \in (0, 1)$.
- Backtracking was discussed in detail in previous presentations.


Convergence Rate: Backtracking
• For $\alpha \in (0, 0.5)$ the relation below holds (since $\nabla f(x)^T x_{sd} < 0$):
$$f(t^*) = f(x + t^* x_{sd}) \le f(x) + \frac{\gamma^2}{2M}\,\nabla f(x)^T x_{sd} \le f(x) + \frac{\alpha\gamma^2}{M}\,\nabla f(x)^T x_{sd}$$
• Therefore the line search returns a step
$$t \ge \min\{1,\ \beta\gamma^2/M\}.$$
• Thus
$$f(x + t\, x_{sd}) \le f(x) - \alpha \min\{1,\ \beta\gamma^2/M\}\,\|\nabla f(x)\|_*^2$$
Convergence Rate: Backtracking
• Subtracting $p^*$ from both sides (as was done for gradient descent, see the previous presentation) gives:
$$f(x^+) - p^* \le f(x) - p^* - \alpha\gamma^2 \min\{1,\ \beta\gamma^2/M\}\,\|\nabla f(x)\|_2^2$$
- (using the norm relation, applied to the dual norm)

• From the 1st presentation we have $\|\nabla f(x)\|_2^2 \ge 2m\,(f(x) - p^*)$,
• where $m$ satisfies $\nabla^2 f(x) \succeq m I$ (strong convexity assumption; presentation 1).
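For completeness, the quoted bound follows from strong convexity by minimizing the quadratic lower bound over $y$ (the standard argument, restated here rather than taken from this deck):
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|_2^2 \;\Rightarrow\; p^* \ge f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2 \;\Rightarrow\; \|\nabla f(x)\|_2^2 \ge 2m\,(f(x) - p^*).$$
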
Convergence Rate: Backtracking
• Using this relation and substituting into
$$f(x^+) - p^* \le f(x) - p^* - \alpha\gamma^2 \min\{1,\ \beta\gamma^2/M\}\,\|\nabla f(x)\|_2^2$$
• results in:
$$f(x^+) - p^* \le f(x) - p^* - 2m\,\alpha\gamma^2 \min\{1,\ \beta\gamma^2/M\}\,(f(x) - p^*)$$
• Hence $f(x^+) - p^* \le c\,(f(x) - p^*)$, with
$$c = 1 - \alpha\gamma^2 \min\{2m,\ 2m\beta\gamma^2/M\} < 1.$$


Convergence Rate: Backtracking
• Applying the relation recursively over the iterations:
$$f(x_k) - p^* \le c^k\,(f(x_0) - p^*)$$
• Linear convergence.


• Experiments and Results


From the previous presentation:
1) $f(x) = x_1^2 + \gamma\, x_2^2,\ x \in \mathbb{R}^2$:
different $\gamma$ will give different eccentricity to the sublevel sets.


The Euclidean Norm
$$\text{L-2 Norm:}\quad \|z\|_2 = (z^T z)^{1/2}$$
Resulting in the Gradient Descent Algorithm
See the 2nd and 3rd presentations for detailed analysis.
Results:
Backtracking line search: $\alpha = 0.1$, $\beta = 0.8$


• Using the Euclidean norm: Gradient Descent.
- Dependent on the condition number of the Hessian near the optimum (see the previous presentation).
- Scaling issues: large $\gamma$ – large condition number.
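To make the scaling remark concrete, a small sketch (assuming the test function from the previous slide; not part of the deck's MATLAB material) prints the condition number of its Hessian for the values of $\gamma$ used in the plots:

```python
import numpy as np

# Hessian of f(x) = x1^2 + gamma*x2^2 is diag(2, 2*gamma), so its condition
# number is max(gamma, 1/gamma); this is what drives the iteration counts below.
for gamma in [0.05, 0.1, 0.5, 1.0, 5.0, 10.0]:
    H = np.diag([2.0, 2.0 * gamma])
    print(gamma, np.linalg.cond(H))
```
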


[Figure: contour plots with iterates. Steepest descent, L-2 norm: Gamma = 0.05, No. of iterations = 211; Gamma = 0.1, No. of iterations = 110.]


[Figure: Steepest descent, L-2 norm: Gamma = 0.5, No. of iterations = 19; Gamma = 1, No. of iterations = 1.]


[Figure: Steepest descent, L-2 norm: Gamma = 5, No. of iterations = 39; Gamma = 10, No. of iterations = 84.]


[Figure: number of iterations vs. Gamma for steepest descent under the L-2 norm.]


The Quadratic Norm

$$\text{Quadratic Norm:}\quad \|z\|_P = (z^T P z)^{1/2}$$

Resulting in Gradient Descent: Modified Coordinate System
$$P = \text{exact Hessian (up to a constant factor)} = \begin{bmatrix} 1 & 0 \\ 0 & \gamma \end{bmatrix} \quad\Rightarrow\quad \text{resulting in 1 iteration}$$
Backtracking line search: $\alpha = 0.1$, $\beta = 0.8$


[Figure: Steepest descent, quadratic norm: Gamma = 0.1, No. of iterations = 1; Gamma = 0.5, No. of iterations = 1.]


[Figure: Steepest descent, quadratic norm: Gamma = 1, No. of iterations = 1; Gamma = 5, No. of iterations = 1.]


[Figure: Steepest descent, quadratic norm: Gamma = 10, No. of iterations = 1. Number of iterations vs. Gamma: 1 iteration for each tested Gamma.]


The L-1 Norm

$$\text{L-1 Norm:}\quad \|z\|_1 = \sum_i |z_i|$$

Resulting in Coordinate Descent Methods

Backtracking line search: $\alpha = 0.1$, $\beta = 0.8$


[Figure: Steepest descent, L-1 norm: Gamma = 0.05, No. of iterations = 212; Gamma = 0.1, No. of iterations = 111.]


[Figure: Steepest descent, L-1 norm: Gamma = 0.5, No. of iterations = 20; Gamma = 1, No. of iterations = 2.]


[Figure: Steepest descent, L-1 norm: Gamma = 5, No. of iterations = 36; Gamma = 10, No. of iterations = 43.]


[Figure: number of iterations vs. Gamma for steepest descent under the L-1 norm.]


• The large number of iterations is a result of the backtracking line search.
• The coordinate descent in this example can be trivialized by solving an exact univariate problem at each iteration (see the sketch below).
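To illustrate the last remark, a sketch of coordinate descent with exact univariate minimization on the same separable test function; the closed-form coordinate update (each one-dimensional subproblem is minimized at 0) and all names are assumptions for this example only:

```python
import numpy as np

def coordinate_descent_exact(grad_f, coord_min, x0, tol=1e-12, max_iter=100):
    """Coordinate descent with exact univariate minimization: pick the coordinate
    with the largest |partial derivative| and set it to its exact 1-D minimizer."""
    x = x0.astype(float)
    for it in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g, np.inf) < tol:
            break
        k = int(np.argmax(np.abs(g)))
        x[k] = coord_min(x, k)              # exact minimization over coordinate k
    return x, it

# For the separable f(x) = x1^2 + gamma*x2^2 each univariate problem is minimized
# at 0, so the method terminates after two coordinate updates regardless of gamma,
# unlike the backtracking variant whose iteration counts are plotted above.
gamma = 10.0
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * gamma * x[1]])
coord_min = lambda x, k: 0.0
print(coordinate_descent_exact(grad_f, coord_min, np.array([9.0, 3.0])))
```
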


References for CCO-10/11
• Publications of Angelia Nedic, Asuman Ozdaglar, and Stephen Boyd.
• S. Boyd & L. Vandenberghe, Convex Optimization.
• J. Nocedal & S. J. Wright, Numerical Optimization.

(Each reference is explicitly mentioned on the CCO-10/11 webpage for each presentation.)


Thank you for watching.
SEARCHING-EYE.COM
- One-week project: Coming Soon

- Sanjeev Sharma
- http://www.searching-eye.com/sanjeevs.html
- http://www.searching-eye.com/~mac/
