
CCO-10/11

CONVEX & COMBINATORIAL OPTIMIZATION


SEC-2 P-004

WWW.SEARCHING-EYE.COM
http://www.searching-eye.com/~mac/
STANFORD AUDI: AUTONOMOUS DRIFTING.
Cool!!
Unconstrained Minimization
Steepest Descent Methods Under Different Norms
Including Coordinate Descent Algorithm

All downloadable material (including MATLAB code):
http://www.searching-eye.com/sanjeevsharma/co/cco/


Undergraduate Student: Final Year
Indian Institute of Technology, Roorkee
Electrical Engineering Department (2007-2011)

Interests: Reinforcement Learning, Decentralized Algorithms, Decomposition Methods, Convex Optimization, Machine Learning, Mixed Integer Programming.

Application Interests: Large-scale Semi-Autonomous & Fully-Autonomous Military Operations; Robots that can help human beings in Complex Tasks; Decentralized Path Planning & Joint Operation Systems; Decentralized Reinforcement Learning.

http://www.searching-eye.com/sanjeevs.html
Before taking CCO-10/11

Please refer to MAC-X:
http://www.searching-eye.com/~mac
or refer to the details of Sections 1 & 2 of CCO-10/11:
http://www.searching-eye.com/sanjeevsharma/co/cco/


This presentation
• Steepest Descent Methods
- Steepest Descent under the L-2 Norm
- Steepest Descent under the Quadratic Norm
- Steepest Descent under the L-1 Norm: the Coordinate Descent Method
- Convergence Analysis of Steepest Descent Methods
Directional Derivatives
• Consider a function $f : \mathbb{R}^n \to \mathbb{R}$.
- The directional derivative of $f$ in the direction $v$ is $\nabla f(x)^T v$.

• For descent methods:
- The directional derivative should be negative: $\nabla f(x)^T v < 0$.
- Such a $v$ is a descent direction. (See the 3rd presentation.)
Basic Vector Algebra
• $a \cdot b$ is the scalar projection of vector $b$ onto vector $a$, scaled by the length of $a$.

• From the linear-function point of view:
- $\nabla f(x)^T v$ is a linear function of $v$.
- It can be made arbitrarily negative by letting $\mathrm{length}(v) \to \infty$.
Limit of the Length (a Function of the Norm)
• The descent direction makes sense only if the length of the descent direction is restricted.

• Length is measured by a norm: $\|v\|$,
- where $\|\cdot\|$ is any valid norm.


Normalized Steepest Descent
• The normalized steepest descent direction $x_{nsd}$ (in steepest descent methods) is
$$x_{nsd} = \arg\min_v \{\nabla f(x)^T v \mid \|v\| \le 1\}.$$
• It is the direction in the unit ball defined by the norm that extends farthest in the direction $-\nabla f(x)$.
• Un-normalized steepest descent direction $x_{sd}$:
$$x_{sd} = \|\nabla f(x)\|_* \, x_{nsd}, \qquad \|\cdot\|_* = \text{dual norm}.$$


Un-normalized Steepest Descent: $x_{sd}$
• A property that will be utilized in the convergence analysis:
$$\nabla f(x)^T x_{sd} = \|\nabla f(x)\|_* \, \nabla f(x)^T x_{nsd} = -\|\nabla f(x)\|_*^2$$
- Derivation on the next page; it can be skipped.
- Dual norm: $\|z\|_* = \sup\{z^T x \mid \|x\| \le 1\}$
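For reference, the dual norms that appear later in this deck all follow from this definition; these are standard facts, stated here rather than derived:
$$\|z\|_{2*} = \|z\|_2 \ (\text{the L-2 norm is self-dual}), \qquad \|z\|_{P*} = (z^T P^{-1} z)^{1/2}, \qquad \|z\|_{1*} = \|z\|_\infty.$$
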
Derivation of the Relation
• A basic relation between inf (infimum) and sup (supremum):
- $\inf(z) = -\sup(-z)$ for any set $z \subseteq \mathbb{R}^n$, and vice-versa.
- Example: consider the sets $z = \{1, 2, 3, 4\}$ and $-z = \{-1, -2, -3, -4\}$.
- Then $\inf(z) = 1$ and $\sup(-z) = -1$, so $\inf(z) = -\sup(-z)$.
- Also $\sup(z) = 4$ and $\inf(-z) = -4$, so $\sup(z) = -\inf(-z)$.
• This will be used in the derivation.
Derivation of the Relation
• Claim: $\nabla f(x)^T x_{sd} = \|\nabla f(x)\|_* \, \nabla f(x)^T x_{nsd} = -\|\nabla f(x)\|_*^2$

- $\|\nabla f(x)\|_* \, \nabla f(x)^T x_{nsd} = \|\nabla f(x)\|_* \, \nabla f(x)^T \{\arg\min_v \{\nabla f(x)^T v \mid \|v\| \le 1\}\}$
- Therefore $x_{nsd}$ is the vector which gives $\inf(\nabla f(x)^T v)$ such that $\|v\| \le 1$.
- Therefore: $\nabla f(x)^T x_{nsd} = \inf\{\nabla f(x)^T v \mid \|v\| \le 1\}$
- Which is: $\nabla f(x)^T x_{nsd} = -\sup\{-\nabla f(x)^T v \mid \|v\| \le 1\} = -\|\nabla f(x)\|_*$ (by the definition of the dual norm).
• Hence $\nabla f(x)^T x_{sd} = -\|\nabla f(x)\|_*^2$.
• Steepest Descent for Different Norms


The Euclidean Norm

$$\text{L-2 Norm:}\quad \|z\|_2 = (z^T z)^{1/2}$$

Resulting in the Gradient Descent Algorithm


L-2 Norm: Steepest Descent
• The descent direction:
$$x_{sd} = -\|\nabla f(x)\|_* \, \frac{\nabla f(x)}{\|\nabla f(x)\|_2} = -\nabla f(x)$$
- Dual norm: $\|\cdot\|_{2*} = \|\cdot\|_2$ (the L-2 norm is its own dual).
- Finding the descent direction: two methods –
  • Simple vector algebra
  • Mathematical optimization: theoretically sound (the derivation can be skipped)
Simple Vector Algebra
• Dot product: $\nabla f(x)^T v = \|v\| \, \|\nabla f(x)\| \cos\theta$
- Norm ball $\|v\| \le 1$: a sphere in $n$-dimensional space.
- The product is minimum when $\|v\| = 1$ and $\cos\theta = -1$.
- Which gives: $x_{nsd} = v = -\dfrac{\nabla f(x)}{\|\nabla f(x)\|_2}$
• Hence $x_{sd} = -\|\nabla f(x)\|_* \, \dfrac{\nabla f(x)}{\|\nabla f(x)\|_2} = -\nabla f(x)$
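As a quick numerical sanity check of this vector-algebra argument (not part of the deck's downloadable MATLAB material; the gradient vector and sample size below are arbitrary illustrative choices), the snippet compares $\nabla f(x)^T v$ at $v = -\nabla f(x)/\|\nabla f(x)\|_2$ against random unit vectors:

```python
import numpy as np

# Over the Euclidean unit ball, grad^T v is smallest at v = -grad / ||grad||_2.
rng = np.random.default_rng(0)
grad = np.array([3.0, -1.0, 2.0])             # arbitrary illustrative gradient

v_star = -grad / np.linalg.norm(grad)          # claimed minimizer
best_random = min(
    grad @ (v / np.linalg.norm(v))             # grad^T v over random unit vectors
    for v in rng.normal(size=(10000, 3))
)
print(grad @ v_star, "<=", best_random)        # equals -||grad||_2, never beaten
```
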


Solving Equivalent Optimization Problem
• Primal equivalent problem:
$$\text{minimize } \nabla f(x)^T v \quad \text{subject to } v^T v \le 1 \qquad (\text{Lagrange multiplier } \lambda)$$
• Lagrangian: $L(v, \lambda) = \nabla f(x)^T v + \lambda\,(v^T v - 1)$
• Minimizing with respect to the primal variable:
$$v = -\frac{\nabla f(x)}{2\lambda}$$
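Filling in the step the slide only states (a standard calculation, not from the original deck): setting the gradient of the Lagrangian with respect to $v$ to zero gives the minimizer, and substituting it back yields the dual function used on the next slide,
$$\frac{\partial L(v,\lambda)}{\partial v} = \nabla f(x) + 2\lambda v = 0 \;\Rightarrow\; v = -\frac{\nabla f(x)}{2\lambda}, \qquad g(\lambda) = \inf_v L(v,\lambda) = -\frac{\nabla f(x)^T \nabla f(x)}{4\lambda} - \lambda.$$
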
Solving Equivalent Optimization Problem
• Dual problem:
$$\text{maximize } g(\lambda) = -\frac{\nabla f(x)^T \nabla f(x)}{4\lambda} - \lambda \qquad \{\text{with } \lambda \ge 0\}$$
• For $\lambda \ge 0$ the objective is a concave function of $\lambda$.
• Setting $\dfrac{\partial g(\lambda)}{\partial \lambda} = 0$ gives: $\lambda = \dfrac{\|\nabla f(x)\|_2}{2}$
• Hence $x_{nsd} = v = -\dfrac{\nabla f(x)}{2\lambda} = -\dfrac{\nabla f(x)}{\|\nabla f(x)\|_2}$
• Which gives: $x_{sd} = -\nabla f(x)$
Finally (by all means)
• In the case of the L-2 norm, the steepest descent algorithm is the same as the gradient descent algorithm.
• Gradient descent has been discussed in previous presentations, with a complete convergence analysis using line search methods and the condition number.
• (See Presentations 2 & 3 of Sec-2 in CCO-10/11.)
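The deck's MATLAB implementation is only linked, not reproduced here, so below is a minimal Python sketch of the L-2 steepest descent (gradient descent) loop with backtracking line search. The helper names (`backtracking`, `gradient_descent`), the test value of `gamma`, the starting point, and the stopping tolerance are illustrative assumptions, not the original code:

```python
import numpy as np

def backtracking(f, g, x, dx, alpha=0.1, beta=0.8):
    """Shrink t until f(x + t*dx) <= f(x) + alpha * t * g^T dx (sufficient decrease)."""
    t, fx = 1.0, f(x)
    while f(x + t * dx) > fx + alpha * t * (g @ dx):
        t *= beta
    return t

def gradient_descent(f, grad_f, x0, tol=1e-8, max_iter=1000):
    """Steepest descent under the L-2 norm: descent direction x_sd = -grad f(x)."""
    x = x0.astype(float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:        # stop when the gradient is small
            break
        dx = -g                            # L-2 steepest descent direction
        x = x + backtracking(f, g, x, dx) * dx
    return x, k

# Example on the deck's test function f(x) = x1^2 + gamma*x2^2 (gamma = 5 assumed).
gamma = 5.0
f = lambda x: x[0]**2 + gamma * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * gamma * x[1]])
print(gradient_descent(f, grad_f, np.array([9.0, 3.0])))
```
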


The Quadratic Norm

$$\text{Quadratic Norm:}\quad \|z\|_P = (z^T P z)^{1/2}$$

Resulting in Gradient Descent: Modified Coordinate System


Solving Equivalent Optimization Problem
• The descent direction:
$$\text{minimize } \nabla f(x)^T v \quad \text{subject to } v^T P v \le 1$$
• Lagrange multiplier $\lambda$:
$$L(v, \lambda) = \nabla f(x)^T v + \lambda\,(v^T P v - 1)$$
• Minimizing w.r.t. $v$:
$$\frac{\partial L(v, \lambda)}{\partial v} = 0 \;\Rightarrow\; v = -\frac{P^{-1}\nabla f(x)}{2\lambda}$$
• Dual function (a concave function):
$$\text{maximize } L(\lambda) = -\frac{\nabla f(x)^T P^{-1} \nabla f(x)}{4\lambda} - \lambda \qquad \{\lambda \ge 0\}$$
• Setting
$$\frac{\partial L(\lambda)}{\partial \lambda} = 0 \;\Rightarrow\; \lambda = \frac{(\nabla f(x)^T P^{-1} \nabla f(x))^{1/2}}{2}$$
• Which gives: $x_{nsd} = v = -(\nabla f(x)^T P^{-1}\nabla f(x))^{-1/2}\, P^{-1}\nabla f(x)$
• Hence, using the dual norm $\|z\|_* = (z^T P^{-1} z)^{1/2}$:
$$x_{sd} = \|\nabla f(x)\|_* \, x_{nsd} = -(\nabla f(x)^T P^{-1}\nabla f(x))^{1/2}\,(\nabla f(x)^T P^{-1}\nabla f(x))^{-1/2}\, P^{-1}\nabla f(x) = -P^{-1}\nabla f(x)$$


Steepest Descent: Quadratic Norm
• Descent direction: $x_{sd} = -P^{-1}\nabla f(x)$
• Change-of-coordinates interpretation: $y = P^{1/2} x$
- Giving: $\|y\|_2 = \|x\|_P$
• Define: $\bar{f}(y) = f(P^{-1/2} y) = f(x)$
• Descent direction for $\bar{f}(y)$:
$$y_{sd} = -\nabla \bar{f}(y) = -P^{-1/2}\,\nabla f(P^{-1/2} y) = -P^{-1/2}\,\nabla f(x)$$
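One step the slide leaves implicit (added here for completeness): a gradient step on $\bar f$ in the $y$ coordinates maps back to exactly the quadratic-norm steepest descent step in the original coordinates,
$$x^+ = P^{-1/2}\, y^+ = P^{-1/2}\,(y + t\, y_{sd}) = x - t\, P^{-1}\nabla f(x) = x + t\, x_{sd}.$$
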


Change of Coordinates Interpretation
• The steepest descent method in the quadratic norm is the gradient descent method applied to the problem after the change of coordinates.
• This can help in overcoming a large condition number of the Hessian of the function near the optimum.
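A minimal Python sketch of the quadratic-norm variant; the only change from the gradient descent sketch above is the direction $x_{sd} = -P^{-1}\nabla f(x)$. The choice of $P$ (here the exact Hessian of the assumed test function), all names, and all constants are illustrative assumptions, not the deck's MATLAB code:

```python
import numpy as np

def steepest_descent_quadratic_norm(f, grad_f, P, x0, alpha=0.1, beta=0.8,
                                    tol=1e-8, max_iter=1000):
    """Steepest descent under ||z||_P = sqrt(z^T P z): x_sd = -P^{-1} grad f(x)."""
    x = x0.astype(float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        dx = -np.linalg.solve(P, g)            # x_sd = -P^{-1} grad f(x)
        t = 1.0                                 # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
    return x, k

# With P equal to the Hessian of f(x) = x1^2 + gamma*x2^2, the quadratic-norm
# step lands on the minimizer in a single descent step, whatever gamma is.
gamma = 10.0
f = lambda x: x[0]**2 + gamma * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * gamma * x[1]])
P = np.diag([2.0, 2.0 * gamma])                # exact Hessian of the test function
print(steepest_descent_quadratic_norm(f, grad_f, P, np.array([9.0, 3.0])))
```
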


The L-1 Norm

$$\text{L-1 Norm:}\quad \|z\|_1 = \sum_i |z_i|$$

Resulting in Coordinate Descent Methods


Steepest Descent: L-1 Norm
• Descent direction: $x_{nsd} = \arg\min\{\nabla f(x)^T v \mid \|v\|_1 \le 1\}$

• Can be derived analytically –
- To simplify, assume all components of $\nabla f(x)$ are of the same sign (for example, negative).
- The dot product is then minimized if all components of $v$ are positive.


Descent Direction
- Hence the problem becomes:
$$\arg\min\Big(\sum_{i=1}^n \nabla f(x)_i\, v_i \;\Big|\; \sum_{i=1}^n v_i \le 1,\; v_i \ge 0\Big)$$
- The minimum is attained when $v_k = 1$, where $k = \arg\max_i\{|\nabla f(x)_i|\}$.
- Hence the absolute value of the component of $v$ corresponding to the element of $\nabla f(x)$ with maximum magnitude is 1, and all other components are 0 (due to the constraint $\|v\|_1 \le 1$).
Descent Direction
• This can be generalized to any $\nabla f(x)$.
• Let $k$ be the index such that $\|\nabla f(x)\|_\infty = |(\nabla f(x))_k|$.
• Then $x_{nsd} = -\operatorname{sign}\big((\nabla f(x))_k\big)\, e_k$, where $e_k$ is the $k$-th standard basis vector.
• Un-normalized descent direction:
$$x_{sd} = \|\nabla f(x)\|_* \, x_{nsd} = -\|\nabla f(x)\|_\infty \operatorname{sign}\!\Big(\frac{\partial f(x)}{\partial x_k}\Big)\, e_k$$
• Hence $x_{sd} = -\dfrac{\partial f(x)}{\partial x_k}\, e_k$.


Steepest Descent: L-1 Norm
• Thus resulting in the coordinate descent method.
• At each iteration, the component of $\nabla f(x)$ with the maximum absolute value is selected, and then the corresponding component of $x$ is adjusted (see the sketch below).
• Descent direction:
$$x_{sd} = -\frac{\partial f(x)}{\partial x_k}\, e_k, \qquad k = \arg\max_i\{|(\nabla f(x))_i|\}$$
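A minimal Python sketch of this coordinate descent rule with backtracking line search (the names, constants, and the choice of test function are illustrative assumptions, not the downloadable MATLAB code):

```python
import numpy as np

def coordinate_descent_l1(f, grad_f, x0, alpha=0.1, beta=0.8,
                          tol=1e-8, max_iter=5000):
    """Steepest descent under the L-1 norm: at each iteration move along the
    coordinate k with the largest |partial derivative|, x_sd = -(df/dx_k) e_k."""
    x = x0.astype(float)
    for it in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g, np.inf) < tol:
            break
        k = int(np.argmax(np.abs(g)))      # coordinate with the largest |gradient|
        dx = np.zeros_like(x)
        dx[k] = -g[k]                       # x_sd = -(df/dx_k) e_k
        t = 1.0                             # backtracking line search
        while f(x + t * dx) > f(x) + alpha * t * (g @ dx):
            t *= beta
        x = x + t * dx
    return x, it

# Example on the deck's test function f(x) = x1^2 + gamma*x2^2 (gamma = 0.5 assumed).
gamma = 0.5
f = lambda x: x[0]**2 + gamma * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * gamma * x[1]])
print(coordinate_descent_l1(f, grad_f, np.array([9.0, 3.0])))
```
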
• Theory:
Convergence Analysis for a General Norm
with Backtracking Line Search
(can be skipped if your focus is just on implementation)


Convergence Analysis: Backtracking
• Using a property of norms:
- Any norm can be bounded by the L-2 norm:
- there exists $\gamma \in (0, 1]$ such that $\|z\| \ge \gamma \|z\|_2$.

• This relation can be checked in various mathematical sources describing norms.
• Using this, the convergence analysis will be carried out.
Convergence Rate: Backtracking
• Using the relation $\gamma \in (0,1]$: $\|z\| \ge \gamma\|z\|_2$, the definition of $x_{sd}$, and the relation $\nabla^2 f(x) \preceq M I$ (see the first presentation of CCO-10/11 Sec-2), it can easily be shown that:
$$f(t) = f(x + t\, x_{sd}) \le f(x) - t\,\|\nabla f(x)\|_*^2 + \frac{M t^2}{2\gamma^2}\,\|\nabla f(x)\|_*^2$$
- ($\gamma$ enters through $\|x_{sd}\|_2 = \|\nabla f(x)\|_* \|x_{nsd}\|_2 \le \|\nabla f(x)\|_*/\gamma$.)
- RHS: a convex function of the step-size $t$.


Convergence Rate: Backtracking
• The RHS can be minimized by setting $\dfrac{\partial(\mathrm{RHS})}{\partial t} = 0$,
which gives $t^* = \dfrac{\gamma^2}{M}$. Substituting this value gives:
$$f(t^*) = f(x + t^* x_{sd}) \le f(x) - \frac{\gamma^2}{2M}\,\|\nabla f(x)\|_*^2 = f(x) + \frac{\gamma^2}{2M}\,\nabla f(x)^T x_{sd}$$

• Now for backtracking line search: $\alpha \in (0, 0.5)$, $\beta \in (0, 1)$.
- Backtracking was discussed in detail in previous presentations.


Convergence Rate: Backtracking
• For $\alpha \in (0, 0.5)$ the relation below holds (since $\nabla f(x)^T x_{sd} < 0$):
$$f(t^*) = f(x + t^* x_{sd}) \le f(x) + \frac{\gamma^2}{2M}\,\nabla f(x)^T x_{sd} \le f(x) + \frac{\alpha\gamma^2}{M}\,\nabla f(x)^T x_{sd}$$
• Therefore the line search returns a step
$$t \ge \min\{1,\ \beta\gamma^2/M\}.$$
• Thus
$$f(x + t\, x_{sd}) \le f(x) - \alpha \min\{1,\ \beta\gamma^2/M\}\,\|\nabla f(x)\|_*^2$$
Convergence Rate: Backtracking
• Subtracting $p^*$ from both sides (as was done for gradient descent, see the previous presentation) gives:
$$f(x^+) - p^* \le f(x) - p^* - \alpha\gamma^2 \min\{1,\ \beta\gamma^2/M\}\,\|\nabla f(x)\|_2^2$$
- (using the norm relation, applied to the dual norm)

• From the 1st presentation we have $\|\nabla f(x)\|_2^2 \ge 2m\,(f(x) - p^*)$,
• where $m$ satisfies $\nabla^2 f(x) \succeq m I$ (strong convexity assumption; presentation 1).
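For completeness, the quoted bound follows from strong convexity by minimizing the quadratic lower bound over $y$ (the standard argument, restated here rather than taken from this deck):
$$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|_2^2 \;\Rightarrow\; p^* \ge f(x) - \frac{1}{2m}\|\nabla f(x)\|_2^2 \;\Rightarrow\; \|\nabla f(x)\|_2^2 \ge 2m\,(f(x) - p^*).$$
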
Convergence Rate: Backtracking
• Using this relation and substituting into
$$f(x^+) - p^* \le f(x) - p^* - \alpha\gamma^2 \min\{1,\ \beta\gamma^2/M\}\,\|\nabla f(x)\|_2^2$$
• results in:
$$f(x^+) - p^* \le f(x) - p^* - 2m\,\alpha\gamma^2 \min\{1,\ \beta\gamma^2/M\}\,(f(x) - p^*)$$
• Hence $f(x^+) - p^* \le c\,(f(x) - p^*)$, with
$$c = 1 - \alpha\gamma^2 \min\{2m,\ 2m\beta\gamma^2/M\} < 1.$$


Convergence Rate: Backtracking
• Applying the relation recursively over the iterations:
$$f(x_k) - p^* \le c^k\,(f(x_0) - p^*)$$
• Linear convergence.


• Experiments and Results


From the previous presentation:
1) $f(x) = x_1^2 + \gamma\, x_2^2,\ x \in \mathbb{R}^2$:
different $\gamma$ will give different eccentricity to the sublevel sets.


The Euclidean Norm
$$\text{L-2 Norm:}\quad \|z\|_2 = (z^T z)^{1/2}$$
Resulting in the Gradient Descent Algorithm
See the 2nd and 3rd presentations for detailed analysis.
Results:
Backtracking line search: $\alpha = 0.1$, $\beta = 0.8$


• Using the Euclidean norm: Gradient Descent.
- Dependent on the condition number of the Hessian near the optimum (see the previous presentation).
- Scaling issues: large $\gamma$ – large condition number.
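To make the scaling remark concrete, a small sketch (assuming the test function from the previous slide; not part of the deck's MATLAB material) prints the condition number of its Hessian for the values of $\gamma$ used in the plots:

```python
import numpy as np

# Hessian of f(x) = x1^2 + gamma*x2^2 is diag(2, 2*gamma), so its condition
# number is max(gamma, 1/gamma); this is what drives the iteration counts below.
for gamma in [0.05, 0.1, 0.5, 1.0, 5.0, 10.0]:
    H = np.diag([2.0, 2.0 * gamma])
    print(gamma, np.linalg.cond(H))
```
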


[Figure: contour plots with iterates. Steepest descent, L-2 norm: Gamma = 0.05, No. of iterations = 211; Gamma = 0.1, No. of iterations = 110.]


[Figure: Steepest descent, L-2 norm: Gamma = 0.5, No. of iterations = 19; Gamma = 1, No. of iterations = 1.]


[Figure: Steepest descent, L-2 norm: Gamma = 5, No. of iterations = 39; Gamma = 10, No. of iterations = 84.]


[Figure: number of iterations vs. Gamma for steepest descent under the L-2 norm.]


The Quadratic Norm

$$\text{Quadratic Norm:}\quad \|z\|_P = (z^T P z)^{1/2}$$

Resulting in Gradient Descent: Modified Coordinate System
$$P = \text{exact Hessian (up to a constant factor)} = \begin{bmatrix} 1 & 0 \\ 0 & \gamma \end{bmatrix} \quad\Rightarrow\quad \text{resulting in 1 iteration}$$
Backtracking line search: $\alpha = 0.1$, $\beta = 0.8$


[Figure: Steepest descent, quadratic norm: Gamma = 0.1, No. of iterations = 1; Gamma = 0.5, No. of iterations = 1.]


[Figure: Steepest descent, quadratic norm: Gamma = 1, No. of iterations = 1; Gamma = 5, No. of iterations = 1.]


[Figure: Steepest descent, quadratic norm: Gamma = 10, No. of iterations = 1. Number of iterations vs. Gamma: 1 iteration for each tested Gamma.]


The L-1 Norm

$$\text{L-1 Norm:}\quad \|z\|_1 = \sum_i |z_i|$$

Resulting in Coordinate Descent Methods

Backtracking line search: $\alpha = 0.1$, $\beta = 0.8$


[Figure: Steepest descent, L-1 norm: Gamma = 0.05, No. of iterations = 212; Gamma = 0.1, No. of iterations = 111.]


[Figure: Steepest descent, L-1 norm: Gamma = 0.5, No. of iterations = 20; Gamma = 1, No. of iterations = 2.]


[Figure: Steepest descent, L-1 norm: Gamma = 5, No. of iterations = 36; Gamma = 10, No. of iterations = 43.]


[Figure: number of iterations vs. Gamma for steepest descent under the L-1 norm.]


• The large number of iterations is a result of the backtracking line search.
• The coordinate descent in this example can be trivialized by solving an exact univariate problem at each iteration (see the sketch below).
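To illustrate the last remark, a sketch of coordinate descent with exact univariate minimization on the same separable test function; the closed-form coordinate update (each one-dimensional subproblem is minimized at 0) and all names are assumptions for this example only:

```python
import numpy as np

def coordinate_descent_exact(grad_f, coord_min, x0, tol=1e-12, max_iter=100):
    """Coordinate descent with exact univariate minimization: pick the coordinate
    with the largest |partial derivative| and set it to its exact 1-D minimizer."""
    x = x0.astype(float)
    for it in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g, np.inf) < tol:
            break
        k = int(np.argmax(np.abs(g)))
        x[k] = coord_min(x, k)              # exact minimization over coordinate k
    return x, it

# For the separable f(x) = x1^2 + gamma*x2^2 each univariate problem is minimized
# at 0, so the method terminates after two coordinate updates regardless of gamma,
# unlike the backtracking variant whose iteration counts are plotted above.
gamma = 10.0
grad_f = lambda x: np.array([2.0 * x[0], 2.0 * gamma * x[1]])
coord_min = lambda x, k: 0.0
print(coordinate_descent_exact(grad_f, coord_min, np.array([9.0, 3.0])))
```
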


References for CCO-10/11
• Publications of Angelia Nedic, Asuman Ozdaglar, and Stephen Boyd.
• S. Boyd & L. Vandenberghe, Convex Optimization.
• J. Nocedal & S. J. Wright, Numerical Optimization.

(Each reference is explicitly mentioned on the CCO-10/11 webpage for each presentation.)


Thank you for watching.
SEARCHING-EYE.COM
- One-week project: Coming Soon

- Sanjeev Sharma
- http://www.searching-eye.com/sanjeevs.html
- http://www.searching-eye.com/~mac/
