CAMBRIDGE, MASS
DIMITRI P. BERTSEKAS
http://www.athenasc.com/nonlinbook.html
LECTURE 1: INTRODUCTION
LECTURE OUTLINE
• Nonlinear Programming
• Application Contexts
• Characterization Issue
• Computation Issue
• Duality
• Organization
NONLINEAR PROGRAMMING
    min  f(x)
    x∈X
where
• f : ℜⁿ → ℜ is a continuous (and usually differentiable) function of n variables
• X = ℜⁿ or X is a subset of ℜⁿ with a “continuous” character.
• Characterization of minima
− Necessary conditions
− Sufficient conditions
− Lagrange multiplier theory
− Sensitivity
− Duality
• Computation by iterative algorithms
− Iterative descent
− Approximation methods
− Dual and primal-dual methods
CHARACTERIZATION PROBLEM
• Unconstrained problems
− Zero 1st order variation along all directions
• Constrained problems
− Nonnegative 1st order variation along all feasible directions
• Equality constraints
− Zero 1st order variation along all directions on the constraint surface
− Lagrange multiplier theory
• Sensitivity
COMPUTATION PROBLEM
• Iterative descent
• Approximation
• Role of convergence analysis
• Role of rate of convergence analysis
• Using an existing package to solve a nonlinear programming problem
POST-OPTIMAL ANALYSIS
• Sensitivity
• Role of Lagrange multipliers as prices
DUALITY
[Figure: two panels (a), (b) showing max intercept points.]
Illustration of the optimal values of the min common point and max intercept point problems. In (a), the two optimal values are not equal. In (b), the set S, when “extended upwards” along the nth axis, yields the set …
LECTURE 2
UNCONSTRAINED OPTIMIZATION -
OPTIMALITY CONDITIONS
LECTURE OUTLINE
• Unconstrained Optimization
• Local Minima
• Necessary Conditions for Local Minima
• Sufficient Conditions for Local Minima
• The Role of Convexity
MATHEMATICAL BACKGROUND
d'∇f(x*) = 0,   ∀ d ∈ ℜⁿ

f(x* + αd) − f(x*) = α∇f(x*)'d + (α²/2) d'∇²f(x*)d + o(α²)

∇f(x*) = 0
[Figures: convex and nonconvex sets; a convex function, with the chord value αf(x) + (1 − α)f(y) lying above f(z); a function that is convex over a convex set C, with αf(x*) + (1 − α)f(x) above the graph; the graph of f lying above its linearization f(x) + (z − x)'∇f(x).]
− Implication:
LECTURE OUTLINE
    min   f(x) = ½ x'Qx − b'x
    x∈ℜⁿ
• Necessary conditions:
∇f(x*) = Qx* − b = 0,
min f (x)
x∈X
If ∇f(x) ≠ 0, there is an interval (0, δ) of stepsizes such that
f(x − α∇f(x)) < f(x),   ∀ α ∈ (0, δ).
[Figure: level sets f(x) = c1 > c2 > c3, the gradient ∇f(x), and the points xα = x − α∇f(x) for α up to δ.]
If d makes an angle with ∇f(x) that is greater than 90 degrees, i.e.,
∇f(x)'d < 0,
there is an interval (0, δ) of stepsizes such that f(x + αd) < f(x) for all α ∈ (0, δ).
[Figure: level sets f(x) = c1 > c2 > c3, the gradient ∇f(x), and the points xα = x + αd for α up to δ.]
PRINCIPAL GRADIENT METHODS
xk+1 = xk + αk dk,   k = 0, 1, . . . ,
where dk is a descent direction:
∇f(xk)'dk < 0.
• General scaled form: xk+1 = xk − αk Dk ∇f(xk), with Dk positive definite
• Steepest descent: xk+1 = xk − αk ∇f(xk),   k = 0, 1, . . .
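• A minimal numerical sketch of the steepest descent iteration above with a constant stepsize; the quadratic objective, stepsize 0.1, and tolerance are illustrative assumptions, not from the lecture:

import numpy as np

def steepest_descent(grad_f, x0, alpha=0.1, tol=1e-8, max_iter=1000):
    # Iterate x^{k+1} = x^k - alpha * grad f(x^k) until the gradient is small.
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - alpha * g
    return x, k

# Illustrative quadratic f(x) = 1/2 x'Qx - b'x, so grad f(x) = Qx - b.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_sol, iters = steepest_descent(lambda x: Q @ x - b, x0=np.zeros(2))
print(x_sol, iters)   # x_sol should be close to the solution of Qx = b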
LECTURE 4
LECTURE OUTLINE
• Armijo rule (successive stepsize reduction): try α = s, βs, β²s, . . . and accept the first stepsize for which
f(xk + αdk) − f(xk) ≤ σα∇f(xk)'dk.
[Figure: the set of acceptable stepsizes, unsuccessful stepsize trials at s and βs, the accepted stepsize αk = β²s, and the curves f(xk + αdk) − f(xk), σα∇f(xk)'dk, α∇f(xk)'dk.]
• Constant stepsize: αk = s for all k
• Diminishing stepsize: αk → 0
• Assume the Lipschitz condition
||∇f(x) − ∇f(y)|| ≤ L ||x − y||,   ∀ x, y ∈ ℜⁿ,
and a stepsize satisfying either
(1)  0 < ε ≤ αk ≤ (2 − ε) |∇f(xk)'dk| / ( L ||dk||² ),
or
(2)  αk → 0  and  Σ_{k=0}^∞ αk = ∞.
Then either f(xk) → −∞ or else {f(xk)} converges to a finite value and ∇f(xk) → 0.
MAIN PROOF IDEA
[Figure: the function f(xk + αdk) − f(xk), the line α∇f(xk)'dk, and the stepsize α* = |∇f(xk)'dk| / ( L ||dk||² ) that minimizes the quadratic upper bound.]
f(x + y) − f(x) = ∫₀¹ y'∇f(x + αy) dα
    ≤ ∫₀¹ y'∇f(x) dα + | ∫₀¹ y'( ∇f(x + αy) − ∇f(x) ) dα |
    ≤ y'∇f(x) + ∫₀¹ ||y|| · ||∇f(x + αy) − ∇f(x)|| dα
    ≤ y'∇f(x) + ∫₀¹ ||y|| Lα ||y|| dα
    = y'∇f(x) + (L/2) ||y||².
CONVERGENCE RESULT – ARMIJO RULE
Defining pk = dk / ||dk||  and  ᾱk = αk ||dk|| / β, we have
( f(xk) − f(xk + ᾱk pk) ) / ᾱk  <  −σ ∇f(xk)'pk.
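• A sketch of the Armijo rule as backtracking; the test function and the values of s, β, σ below are illustrative assumptions:

import numpy as np

def armijo_stepsize(f, grad_fx, x, d, s=1.0, beta=0.5, sigma=1e-4):
    # Try alpha = s, beta*s, beta^2*s, ... and accept the first alpha with
    # f(x + alpha d) - f(x) <= sigma * alpha * grad_fx' d  (Armijo condition).
    slope = float(grad_fx @ d)    # negative for a descent direction
    fx = f(x)
    alpha = s
    while f(x + alpha * d) - fx > sigma * alpha * slope:
        alpha *= beta
    return alpha

# Illustrative use with the steepest descent direction d = -grad f(x).
f = lambda x: x[0]**4 + x[1]**2
grad = lambda x: np.array([4 * x[0]**3, 2 * x[1]])
x = np.array([1.0, 1.0])
g = grad(x)
alpha = armijo_stepsize(f, g, x, -g)
print(alpha, f(x - alpha * g))    # accepted stepsize and the reduced cost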
LECTURE OUTLINE
CONVERGENCE ANALYSIS
some q > 0, p > 1 and β ∈ [0, 1), and for all k].
• Sublinear convergence
QUADRATIC MODEL ANALYSIS
||xk+1||² = xk'(I − αk Q)² xk
          ≤ ( max. eig. of (I − αk Q) )² ||xk||²

||xk+1|| / ||xk||  ≤  max{ |1 − αk m|, |1 − αk M| }
OPTIMAL CONVERGENCE RATE
With the optimal stepsize α = 2/(M + m):
||xk+1|| / ||xk||  ≤  (M − m)/(M + m)
[Figure: the function max{ |1 − αm|, |1 − αM| } of α, formed from the curves |1 − αm| and |1 − αM|; its minimum value (M − m)/(M + m) is attained at α = 2/(M + m), and the stepsizes that guarantee convergence form the interval (0, 2/M).]
xk+1 = xk − αk Dk ∇f (xk )
y k+1 = y k − αk ∇h(y k )
Sy k+1 = Sy k − αk S∇h(y k )
xk+1 = xk − αk Dk ∇f (xk )
DIAGONAL SCALING
||yk+1|| / ||yk||  ≤  max{ |1 − αk mk|, |1 − αk Mk| }

f(xk+1) / f(xk) = h(yk+1) / h(yk)  ≤  ( (Mk − mk)/(Mk + mk) )²

where mk and Mk are the smallest and largest eigenvalues of the Hessian of h.
LECTURE 6
LECTURE OUTLINE
• Newton’s Method
• Convergence Rate of the Pure Form
• Global Convergence
• Variants of Newton’s Method
• Least Squares Problems
• The Gauss-Newton Method
NEWTON’S METHOD
xk+1 = xk − αk ( ∇²f(xk) )⁻¹ ∇f(xk)
so
||xk+1 − x*|| = o( ||xk − x*|| ),
g(x) = eˣ − 1

  k      xk          g(xk)
  0   −1.00000    −0.63212
  1    0.71828     1.05091
  2    0.20587     0.22859
  3    0.01981     0.02000
  4    0.00019     0.00019
  5    0.00000     0.00000
[Figures: the Newton iterates x0 = −1, x1, x2, x3, . . . shown on the graph of g(x).]
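• A sketch that reproduces the table above by applying the pure Newton iteration xk+1 = xk − g(xk)/g'(xk) to g(x) = eˣ − 1, starting at x0 = −1:

import math

def newton_scalar(g, gprime, x0, iters=5):
    # Pure Newton iteration x^{k+1} = x^k - g(x^k)/g'(x^k) for a scalar equation g(x) = 0.
    x = x0
    for k in range(iters + 1):
        print(f"k={k}  x={x: .5f}  g(x)={g(x): .5f}")
        x = x - g(x) / gprime(x)

# g(x) = e^x - 1, starting at x0 = -1 as in the table above.
newton_scalar(lambda x: math.exp(x) - 1.0, lambda x: math.exp(x), x0=-1.0)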
MODIFICATIONS FOR GLOBAL CONVERGENCE
• Use a stepsize
• Modify the Newton direction when:
− Hessian is not positive definite
− When Hessian is nearly singular (needed to
improve performance)
• Use
dk = −( ∇²f(xk) + Δk )⁻¹ ∇f(xk),
where Δk is a diagonal matrix chosen so that
∇²f(xk) + Δk > 0
minimize f(x) = ½ ||g(x)||² = ½ Σ_{i=1}^m ||gi(x)||²
subject to x ∈ ℜⁿ,
• Many applications:
− Solution of systems of n nonlinear equations
with n unknowns
− Model Construction – Curve Fitting
− Neural Networks
− Pattern Classification
PURE FORM OF THE GAUSS-NEWTON METHOD
• The direction
−( ∇g(xk) ∇g(xk)' )⁻¹ ∇g(xk) g(xk)
ψi = arg min_{x∈ℜⁿ} Σ_{j=1}^i || g̃j(x, ψj−1) ||²,   i = 1, . . . , m,
½ Σ_{i=1}^m || zi − h(x, yi) ||²,   where
h(x, y) = x₃y³ + x₂y² + x₁y + x₀,
LECTURE OUTLINE
minimize f(x) = ½ ||g(x)||² = ½ Σ_{i=1}^m ||gi(x)||²
subject to x ∈ ℜⁿ,
xk+1 = xk − αk ∇f(xk) = xk − αk Σ_{i=1}^m ∇gi(xk) gi(xk)
• Incremental gradient method: ψ0 = xk,  ψi = ψi−1 − αk ∇gi(ψi−1) gi(ψi−1),  i = 1, . . . , m,  xk+1 = ψm
[Figure: the scalar terms (ai x − bi)², their individual minimizers bi/ai, the minimizer x* of the sum, and the region R between mini bi/ai and maxi bi/ai; outside R every term decreases in the same direction — the advantage of incrementalism far from the solution.]
VIEW AS GRADIENT METHOD W/ ERRORS
xk+1 = xk − αk Σ_{i=1}^m ∇gi(xk) gi(xk)
          + αk Σ_{i=1}^m ( ∇gi(xk) gi(xk) − ∇gi(ψi−1) gi(ψi−1) )
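• A sketch of the incremental gradient iteration (one cycle through the terms per outer step) for linear residuals gi(x) = ai'x − bi; the data, diminishing stepsize schedule, and cycle count are illustrative assumptions:

import numpy as np

def incremental_gradient(A, b, x0, alpha0=0.1, cycles=100):
    # One cycle per iteration: psi_0 = x^k, psi_i = psi_{i-1} - alpha^k * grad g_i * g_i,
    # x^{k+1} = psi_m, for the linear residuals g_i(x) = a_i'x - b_i (so grad g_i = a_i).
    x = np.asarray(x0, dtype=float)
    m = len(b)
    for k in range(cycles):
        alpha = alpha0 / (k + 1)          # diminishing stepsize
        psi = x.copy()
        for i in range(m):
            r = A[i] @ psi - b[i]         # g_i(psi_{i-1})
            psi = psi - alpha * r * A[i]  # A[i] is the gradient of g_i
        x = psi
    return x

# Illustrative data (not from the lecture).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 3))
x_true = np.array([1.0, -2.0, 0.5])
b = A @ x_true
print(incremental_gradient(A, b, x0=np.zeros(3)))   # approaches x_true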
xk+1 = xk + αk dk
[Figure: a Q-conjugate direction generated as d1 = Q^(−1/2) w1 from an orthogonal direction w1.]
• Expanding Subspace Theorem
GENERATING Q-CONJUGATE DIRECTIONS
di+1 = ξi+1 + Σ_{m=0}^{i} c(i+1)m dm
[Figure: Gram–Schmidt-like construction: ξ0 = d0, and d1 is obtained from ξ1 by subtracting c10 d0.]
CONJUGATE GRADIENT METHOD
dk = −gk + Σ_{j=0}^{k−1} ( (gk'Q dj) / (dj'Q dj) ) dj

dk = −gk + βk dk−1,   k = 1, . . . , n − 1,
where βk is given by
βk = (gk'gk) / (gk−1'gk−1)    or    βk = ( (gk − gk−1)'gk ) / (gk−1'gk−1)
g k is orthogonal to d0 , . . . , dk−1
g k is orthogonal to g 0 , . . . , g k−1
so g k is linearly independent of g 0 , . . . , g k−1 ,
completing the induction.
• Since at most n lin. independent gradients can
be generated, g k = 0 for some k ≤ n.
• Algebra to verify the direction formula.
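• A sketch of the conjugate gradient iteration for the quadratic ½x'Qx − b'x, using the first βk formula above and exact line minimization; the test matrix and right-hand side are illustrative assumptions:

import numpy as np

def conjugate_gradient(Q, b, tol=1e-10):
    # CG for min 1/2 x'Qx - b'x:  g^k = Qx^k - b,  d^k = -g^k + beta^k d^{k-1},
    # with exact line minimization along d^k; terminates in at most n steps.
    n = len(b)
    x = np.zeros(n)
    g = Q @ x - b
    d = -g
    for k in range(n):
        if np.linalg.norm(g) < tol:
            break
        alpha = -(g @ d) / (d @ Q @ d)      # exact minimization along d
        x = x + alpha * d
        g_new = Q @ x - b
        beta = (g_new @ g_new) / (g @ g)    # first beta formula above
        d = -g_new + beta * d
        g = g_new
    return x

# Illustrative positive definite quadratic (not from the lecture).
Q = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(conjugate_gradient(Q, b), np.linalg.solve(Q, b))   # the two should agree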
QUASI-NEWTON METHODS
qk ≈ ∇²f(xk+1) pk,   where pk = xk+1 − xk,  qk = ∇f(xk+1) − ∇f(xk)
• Broyden family update of the inverse Hessian approximation Dk:
Dk+1 = Dk + (pk pk') / (pk'qk) − (Dk qk qk' Dk) / (qk'Dk qk) + ξk τk vk vk',
vk = pk / (pk'qk) − (Dk qk) / τk,   τk = qk'Dk qk,   0 ≤ ξk ≤ 1,
and D0 > 0 is arbitrary, αk by line minimization, and Dn = Q⁻¹ for a quadratic.
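• A sketch of one matrix update from the family above (ξ = 0 gives DFP, ξ = 1 gives BFGS); the vectors used in the check are illustrative assumptions:

import numpy as np

def broyden_family_update(D, p, q, xi=1.0):
    # One update of the inverse Hessian approximation D along the family above:
    # xi = 0 gives the DFP update, xi = 1 the BFGS update.
    # p = x^{k+1} - x^k,  q = grad f(x^{k+1}) - grad f(x^k), with p'q > 0 assumed.
    pq = p @ q
    Dq = D @ q
    tau = q @ Dq
    v = p / pq - Dq / tau
    return (D + np.outer(p, p) / pq
              - np.outer(Dq, Dq) / tau
              + xi * tau * np.outer(v, v))

# Quick check of the quasi-Newton (secant) condition D^{k+1} q = p (illustrative vectors).
D = np.eye(3)
p = np.array([1.0, 0.5, -0.2])
q = np.array([0.8, 0.3, 0.1])
D_new = broyden_family_update(D, p, q)
print(np.allclose(D_new @ q, p))   # True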
NONDERIVATIVE METHODS
∂f(xk)/∂xi ≈ (1/h) ( f(xk + h ei) − f(xk) )

∂f(xk)/∂xi ≈ (1/(2h)) ( f(xk + h ei) − f(xk − h ei) )
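• A sketch of the two difference approximations above; the test function and the step h are illustrative assumptions:

import numpy as np

def fd_gradient(f, x, h=1e-6, central=True):
    # Approximate each partial derivative by a forward or central difference.
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = 1.0
        if central:
            g[i] = (f(x + h * e) - f(x - h * e)) / (2 * h)
        else:
            g[i] = (f(x + h * e) - f(x)) / h
    return g

# Illustrative check against the exact gradient of f(x) = x1^2 + 3 x1 x2.
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
x = np.array([1.0, 2.0])
print(fd_gradient(f, x), np.array([2 * x[0] + 3 * x[1], 3 * x[0]]))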
• Coordinate descent. Applies also to the case where there are bound constraints on the variables.
[Figure: coordinate descent iterates xk, xk+1, xk+2 moving along the coordinate directions.]
LECTURE 8
OPTIMALITY CONDITIONS
OPTIMALITY CONDITION
∇f(x*)'(x − x*) ≥ 0,   ∀ x ∈ X.
[Figure: the constraint set X and surfaces of equal cost f(x).] At a local minimum x*, the gradient ∇f(x*) makes an angle less than or equal to 90 degrees with all feasible variations x − x*, x ∈ X.
[Figure: a nonconvex constraint set X.] Illustration of failure of the optimality condition when X is not convex. Here x* is a local min but we have ∇f(x*)'(x − x*) < 0 for the feasible vector x shown.
PROOF
∂f(x*)/∂xi = 0,   if x*i > 0.
OPTIMIZATION OVER A SIMPLEX
X = { x | x ≥ 0, Σ_{i=1}^n xi = r }

x*i > 0   ⟹   ∂f(x*)/∂xi ≤ ∂f(x*)/∂xj,   ∀ j,
i.e., at the optimum, positive components have
minimal (and equal) first cost derivative.
OPTIMAL ROUTING
xp ≥ 0, ∀ p ∈ Pw , w ∈ W
• Optimality condition
x*p > 0   ⟹   ∂D(x*)/∂xp ≤ ∂D(x*)/∂xp′,   ∀ p′ ∈ Pw,
• If X is a subspace, z − x∗ ⊥ X.
• The mapping f : ℜⁿ → X defined by f(x) = [x]⁺ is continuous and nonexpansive, that is,
|| [x]⁺ − [y]⁺ || ≤ || x − y ||,   ∀ x, y ∈ ℜⁿ.
6.252 NONLINEAR PROGRAMMING
LECTURE OUTLINE
xk+1 = xk + αk dk,
xk+1 = xk + αk (x̄k − xk),   where x̄k = arg min_{x∈X} ∇f(xk)'(x − xk)
[Figure: the constraint set X, surfaces of equal cost, a point x, and the extreme point x̄ defining the direction x̄ − x.] Illustration of the direction of the conditional gradient method.
[Figure: the constraint set X, surfaces of equal cost, iterates x0, x1, x2, . . . and corresponding extreme points x̄0, x̄1, . . . approaching x*.] Operation of the method. Slow (sublinear) convergence.
CONVERGENCE OF CONDITIONAL GRADIENT
{ x̄k − xk }k∈K : bounded,   lim sup_{k→∞, k∈K} ∇f(xk)'(x̄k − xk) < 0

• Gradient projection method:
xk+1 = xk + αk (x̄k − xk),   x̄k = [ xk − sk ∇f(xk) ]⁺

{ x̄k − xk }k∈K : bounded,   lim sup_{k→∞, k∈K} ∇f(xk)'(x̄k − xk) < 0
1st relation holds because { x̄k − xk }k∈K converges to [ x̃ − s∇f(x̃) ]⁺ − x̃. By the optimality condition for projections,
( xk − s∇f(xk) − x̄k )'( x − x̄k ) ≤ 0,   ∀ x ∈ X,
||xk+1 − x*|| ≤ max{ |1 − sm|, |1 − sM| } ||xk − x*||
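• A sketch of the gradient projection iteration with αk = 1, so that xk+1 = [xk − s∇f(xk)]⁺, for a constraint set whose projection is cheap (a box); the objective, bounds, and stepsize are illustrative assumptions:

import numpy as np

def gradient_projection(grad_f, project, x0, s=0.1, iters=500):
    # x^{k+1} = P_X( x^k - s * grad f(x^k) ), with the projection P_X given by `project`.
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = project(x - s * grad_f(x))
    return x

# Illustrative problem: min 1/2 ||x - z||^2 over the box 0 <= x <= 1,
# whose solution is simply the clipped vector z.
z = np.array([1.5, -0.3, 0.4])
grad = lambda x: x - z
proj_box = lambda x: np.clip(x, 0.0, 1.0)
print(gradient_projection(grad, proj_box, x0=np.zeros(3)))   # ~ [1.0, 0.0, 0.4]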
LECTURE 10
LECTURE OUTLINE
xk+1 = xk + αk (x̄k − xk),
x̄k = arg min_{x∈X} { ∇f(xk)'(x − xk) + (1/(2sk)) (x − xk)'Hk(x − xk) }.
I⁺(xk) = { i | xki = 0, ∂f(xk)/∂xi > 0 }
PROPERTIES OF 2-METRIC PROJECTION
xk+1 = xk − αk (Hk)⁻¹ (c − A'λk),
[Figure: iterates x0, x1, x2, x3 approaching x*.] Importance of using a time-varying Hk (it should bend x̄k − xk away from the boundary).
AFFINE SCALING
Hk = (Xk)⁻²,
xk+1 = xk − αk (Xk)² (c − A'λk),   λk = ( A(Xk)²A' )⁻¹ A(Xk)² c
[Figure: the iteration in the original variables xk, xk+1 and in the scaled variables yk = (Xk)⁻¹ xk, where yk = (1, 1, 1).]
LECTURE 11
CONSTRAINED OPTIMIZATION;
LAGRANGE MULTIPLIERS
LECTURE OUTLINE
minimize f (x)
subject to hi (x) = 0, i = 1, . . . , m.
• Example (the constraint gradients are linearly dependent at x*):
minimize x1 + x2
subject to h1(x) = (x1 − 1)² + x2² − 1 = 0,   h2(x) = (x1 − 2)² + x2² − 4 = 0.
[Figure: the two constraint circles h1(x) = 0 and h2(x) = 0, which meet only at x* = (0, 0), with ∇f(x*) = (1, 1), ∇h1(x*) = (−2, 0), ∇h2(x*) = (−4, 0).]
By defining
λ* = −(B')⁻¹ ∇_B f(x*),
we have
0 ≤ d'∇²F(x*_R) d = y'∇²f(x*) y = y'( ∇²f(x*) + Σ_{i=1}^m λ*_i ∇²h_i(x*) ) y,
where y = (y_B, y_R) = ( −B⁻¹R d, d ).
• y has this form iff
Fᵏ(x) = f(x) + (k/2) ||h(x)||² + (α/2) ||x − x*||²,

Fᵏ(xk) = f(xk) + (k/2) ||h(xk)||² + (α/2) ||xk − x*||² ≤ Fᵏ(x*) = f(x*)
Hence, lim_{k→∞} ||h(xk)|| = 0, so for every limit point x̄ of {xk}, h(x̄) = 0.
• Furthermore, f(xk) + (α/2) ||xk − x*||² ≤ f(x*) for all k, so by taking lim,
f(x̄) + (α/2) ||x̄ − x*||² ≤ f(x*).
Combine with f(x*) ≤ f(x̄) [since x̄ ∈ S and h(x̄) = 0] to obtain ||x̄ − x*|| = 0, so that x̄ = x*. Thus {xk} → x*.
PENALTY APPROACH - CONTINUED
k h(xk) = −( ∇h(xk)'∇h(xk) )⁻¹ ∇h(xk)' ( ∇f(xk) + α(xk − x*) ).
∇x L(x∗ , λ∗ ) = 0, ∇λ L(x∗ , λ∗ ) = 0,
• Example
minimize ½ (x1² + x2² + x3²)
subject to x1 + x2 + x3 = 3.
Necessary conditions:
x*1 + λ* = 0,   x*2 + λ* = 0,   x*3 + λ* = 0.
6.252 NONLINEAR PROGRAMMING
LECTURE OUTLINE
minimize f (x)
subject to hi (x) = 0, i = 1, . . . , m.
∇x L(x∗ , λ∗ ) = 0, ∇λ L(x∗ , λ∗ ) = 0,
[Figure (sensitivity): the constraint hyperplane a'x = b with minimum x*, and the perturbed hyperplane a'x = b + Δb with minimum x* + Δx.]
[Figure: the primal function p(u), with slope ∇p(0) = −λ* = −1 at u = 0.]
Illustration of the primal function p(u) = f(x(u)) for the two-dimensional problem
minimize f(x) = ½ (x1² − x2²) − x2
subject to h(x) = x2 = 0.
Here,
p(u) = min_{h(x)=u} f(x) = −½ u² − u
LECTURE OUTLINE
minimize f (x)
subject to h(x) = 0, g(x) ≤ 0
where f : ℜⁿ → ℜ, h : ℜⁿ → ℜᵐ, g : ℜⁿ → ℜʳ are continuously differentiable. Here A(x*) = { j | gj(x*) = 0 } denotes the set of inequality constraints active at x*.
• If x∗ is a local minimum:
− The active inequality constraints at x∗ can be
treated as equations
− The inactive constraints at x∗ don’t matter
• Assuming regularity of x∗ and assigning zero
Lagrange multipliers to inactive constraints,
∇f(x*) + Σ_{i=1}^m λ*_i ∇hi(x*) + Σ_{j=1}^r µ*_j ∇gj(x*) = 0,   µ*_j = 0, ∀ j ∉ A(x*).
• Equivalently (Kuhn–Tucker conditions):
∇_x L(x*, λ*, µ*) = 0,
µ*_j ≥ 0,   j = 1, . . . , r,
µ*_j = 0,   ∀ j ∉ A(x*).
where
V(x*) = { y | ∇h(x*)'y = 0, ∇gj(x*)'y = 0, j ∈ A(x*) },
as well as regularity of x∗ .
PROOF OF KUHN-TUCKER CONDITIONS
• Use the penalty function
Fᵏ(x) = f(x) + (k/2) ||h(x)||² + (k/2) Σ_{j=1}^r ( gj⁺(x) )² + (1/2) ||x − x*||²
∇f(x*) + Σ_{j=1}^r µ*_j aj = 0,   µ*_j = 0, ∀ j ∉ A(x*).
[Figure: the cone C generated by vectors a1, a2 and its polar cone C⊥.]
C = { x | x = Σ_{j=1}^r µj aj, µj ≥ 0 },
C⊥ = { y | aj'y ≤ 0, j = 1, . . . , r }.
Then, (C⊥)⊥ = C.
[Figure: a vector x, its projection x̂ on the cone C = { x | x = Σ_{j=1}^r µj aj, µj ≥ 0 }, and the vector x − x̂ ∈ C⊥ = { y | aj'y ≤ 0, j = 1, . . . , r }.]
x'(x − x̂) = ||x − x̂||²,   (∗)
(x − x̂)'aj ≤ 0,   ∀ j,
x'(x − x̂) ≤ 0.   (∗∗)
[Figure: the constraint set { x | aj'x ≤ bj, j = 1, . . . , r }, a local minimum x*, the vector −∇f(x*), and an active constraint normal a1.]
LECTURE OUTLINE
minimize f(x)
subject to aj'x ≤ bj,   j = 1, . . . , r,

    min   f(x)
    x∈X, aj'x ≤ bj, j=1,...,r

minimize c'x
subject to ei'x = di,   i = 1, . . . , m,   x ≥ 0
• Dual function
q(λ) = inf_{x≥0} { Σ_{j=1}^n ( cj − Σ_{i=1}^m λi eij ) xj + Σ_{i=1}^m λi di }.
• If cj − Σ_{i=1}^m λi eij ≥ 0 for all j, the infimum is attained for x = 0, and q(λ) = Σ_{i=1}^m λi di. If cj − Σ_{i=1}^m λi eij < 0 for some j, the expression in braces can be made arbitrarily small by taking xj sufficiently large, so q(λ) = −∞. Thus, the dual is
maximize Σ_{i=1}^m λi di
subject to Σ_{i=1}^m λi eij ≤ cj,   j = 1, . . . , n.
THE DUAL OF A QUADRATIC PROGRAM
LECTURE OUTLINE
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
• Barrier Method:
xk = arg min_{x∈S} { f(x) + εk B(x) },   k = 0, 1, . . . ,
where {εk} is a positive sequence with εk ↓ 0.
[Figure: the barrier terms ε B(x) and ε' B(x), with ε' < ε, both blowing up at the boundary of S.]
CONVERGENCE
– a contradiction.
LINEAR PROGRAMS/LOGARITHMIC BARRIER
where S = { x | Ax = b, x > 0 }. We assume that S is nonempty and bounded.
• As ε → 0, x(ε) follows the central path
• By straightforward calculation, the pure Newton iterate for the barrier subproblem is
x̄ = x − X q(x, ε),
where
q(x, ε) = X z / ε − e,   e = (1 . . . 1)',   z = c − A'λ,
λ = ( A X² A' )⁻¹ A X ( X c − ε e ),
[Figure: the set S, the central path, the first iterate x0, the point x(ε0), the termination set { x | ||q(x, ε0)|| < 1 }, and the limit x∞.]
• If ||q(x, ε)|| < 1, then
c'x − min_{Ay=b, y≥0} c'y ≤ ε ( n + √n ).
• The “termination set” { x | ||q(x, ε)|| < 1 } is part of the region of quadratic convergence of the pure form of Newton’s method. In particular, if ||q(x, ε)|| < 1, then the pure Newton iterate x̄ = x − X q(x, ε) is an interior point, that is, x̄ ∈ S. Furthermore, we have ||q(x̄, ε)|| < 1 and in fact
||q(x̄, ε)|| ≤ ||q(x, ε)||².
SHORT STEP METHODS
In particular, if
δ ≤ γ (1 − γ) (1 + γ)⁻¹,
we have ||q(x̄, ε̄)|| ≤ γ.
• Can be used to establish nice complexity results; but ε must be reduced VERY slowly.
LONG STEP METHODS
• Main features:
− Decrease ε faster than dictated by complexity analysis.
− Require more than one Newton step per (approximate) minimization.
− Use line search as in unconstrained Newton’s method.
− Require much smaller number of (approximate) minimizations.
[Figures (a), (b): iterates xk, xk+1, xk+2 tracking the central path points x(εk), x(εk+1), x(εk+2) within S, approaching x* (limit x∞).]
LECTURE OUTLINE
• Example:
minimize f(x) = ½ (x1² + x2²)
subject to x1 = 1

Lc(x, λ) = ½ (x1² + x2²) + λ (x1 − 1) + (c/2) (x1 − 1)²

x1(λ, c) = (c − λ)/(c + 1),   x2(λ, c) = 0
EXAMPLE CONTINUED
[Figure: minimizers of Lc(·, λ) in four cases: (c = 1, λ = 0) gives x1 = 1/2; (c = 1, λ = −1/2) gives x1 = 3/4; (c = 1, λ = 0) gives x1 = 1/2; (c = 10, λ = 0) gives x1 = 10/11.]
GLOBAL CONVERGENCE
L_{ck}(xk, λk) = f(xk) + λk'h(xk) + (ck/2) ||h(xk)||² ≤ f*.

f(x̄) + λ̄'h(x̄) + lim sup_{k→∞} (ck/2) ||h(xk)||² ≤ f*.   (*)
Proof: We have
0 = ∇_x L_{ck}(xk, λk) = ∇f(xk) + ∇h(xk) ( λk + ck h(xk) )
  = ∇f(xk) + ∇h(xk) λ̃k,
where λ̃k = λk + ck h(xk). Multiply with
( ∇h(xk)'∇h(xk) )⁻¹ ∇h(xk)'
LECTURE OUTLINE
• Multiplier Methods
*******************************************
• Consider the equality constrained problem
minimize f (x)
subject to h(x) = 0,
where f : ℜⁿ → ℜ and h : ℜⁿ → ℜᵐ are continuously differentiable.
• The (1st order) multiplier method finds
xk = arg min_{x∈ℜⁿ} L_{ck}(x, λk) ≡ f(x) + λk'h(x) + (ck/2) ||h(x)||²
and updates
λk+1 = λk + ck h(xk)
CONVEX EXAMPLE
λk+1 = λk + ck ( (ck − λk)/(ck + 1) − 1 ),
λk+1 − λ* = (λk − λ*)/(ck + 1)
• We see that:
− λk → λ∗ = −1 and xk → x∗ = (1, 0) for ev-
ery nondecreasing sequence {ck }. It is NOT
necessary to increase ck to ∞.
− The convergence rate becomes faster as ck increases.

λk+1 − λ* = −(λk − λ*)/(ck − 1)
• We see that:
− No need to increase ck to ∞ for convergence;
doing so results in faster convergence rate.
− To obtain convergence, ck must eventually
exceed the threshold 2.
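• A sketch of the first-order multiplier method on the convex example above, using the closed-form minimizer x1(λ, c) = (c − λ)/(c + 1) of the augmented Lagrangian; with c held fixed at 1 (an illustrative choice), λ → λ* = −1 and x1 → 1:

def multiplier_method(c=1.0, lam=0.0, iters=10):
    # First-order multiplier method for: min 1/2 (x1^2 + x2^2)  s.t.  x1 = 1.
    # The augmented Lagrangian minimizer is known in closed form:
    # x1(lam, c) = (c - lam)/(c + 1), x2 = 0.
    for k in range(iters):
        x1 = (c - lam) / (c + 1.0)     # x^k = arg min_x L_c(x, lam^k)
        h = x1 - 1.0                   # constraint violation h(x^k)
        lam = lam + c * h              # multiplier update lam^{k+1} = lam^k + c h(x^k)
        print(f"k={k}  x1={x1:.6f}  lambda={lam:.6f}")

multiplier_method()   # lambda -> -1 and x1 -> 1 with c held fixed at 1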
THE PRIMAL FUNCTIONAL
[Figures (a), (b): examples of primal functions p(u), with p(0) = f(x*) and supporting slopes −λ and −λ*.]
[Figure: the primal function p(u), the point u(λ, c), the line with slope −λ through it, and the value min_x Lc(x, λ).]
λk+1 = λk + ck h(xk ).
[Figure: geometric interpretation of the multiplier iteration in terms of the primal function: the curve p(u) + (c/2)||u||², the points uk, uk+1, the value min_x L_{ck}(x, λk), and supporting lines with slopes −λk, −λk+1, −λk+2 approaching the slope −λ* at u = 0, where p(0) = f(x*).]
∇q_c(λ) = ∇_λ x(λ, c) ∇_x L_c( x(λ, c), λ ) + h( x(λ, c) ) = h( x(λ, c) ).
LECTURE OUTLINE
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
• Let M be a subset of ℜⁿ:
• Min Common Point Problem: Among all points that are common to both M and the nth axis, find the one whose nth component is minimum.
• Max Crossing Point Problem: Among all hyperplanes that intersect the nth axis and support the set M from “below”, find the one for which the point of intercept with the nth axis is maximum.
[Figure: two panels showing the set M, the min common point w*, and the max crossing point q*.]
µ*j ≥ 0,   j = 1, . . . , r,
and
f* = inf_{x∈X} L(x, µ*).
[Figure: two panels (a), (b) showing S = {(g(x), f(x)) | x ∈ X}, the point (0, f*), and a hyperplane with normal (µ*, 1) supporting S at (0, f*).]
[Figure: two examples.
(a) min f(x) = x  s.t. g(x) = x² ≤ 0,  x ∈ X = ℜ, with (0, f*) = (0, 0).
(b) min f(x) = −x  s.t. g(x) = x − 1/2 ≤ 0,  x ∈ X = {0, 1}, with (0, f*) = (0, 0); S consists of the points (−1/2, 0) and (1/2, −1).]
maximize q(µ)
subject to µ ≥ 0,
[Figure: S = {(g(x), f(x)) | x ∈ X}, a hyperplane H = {(z, w) | w + µ'z = b} with normal (µ, 1), the optimal dual value, and support points corresponding to minimizers of L(x, µ) over X, where q(µ) = inf_{x∈X} L(x, µ).]
WEAK DUALITY
• The domain of q is
Dq = { µ | q(µ) > −∞ }.
• Weak duality: q* ≤ f*, that is,
q* = sup_{µ≥0} q(µ) ≤ inf_{x∈X, g(x)≤0} f(x) = f*.
DUAL OPTIMAL SOLUTIONS AND G-MULTIPLIERS
• Example (a): min f(x) = x  s.t. g(x) = x² ≤ 0,  x ∈ X = ℜ,  f* = 0.
q(µ) = min_{x∈ℜ} { x + µx² } = { −1/(4µ) if µ > 0;  −∞ if µ ≤ 0 }
• Example (b): min f(x) = −x  s.t. g(x) = x − 1/2 ≤ 0,  x ∈ X = {0, 1},  f* = 0.
q(µ) = min_{x∈{0,1}} { −x + µ(x − 1/2) } = min{ −µ/2, µ/2 − 1 }
6.252 NONLINEAR PROGRAMMING
LECTURE OUTLINE
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
maximize q(µ)
subject to µ ≥ 0,
[Figure: S = {(g(x), f(x)) | x ∈ X}, a hyperplane H = {(z, w) | w + µ'z = b} with normal (µ, 1), the optimal dual value, and support points corresponding to minimizers of L(x, µ) over X, where q(µ) = inf_{x∈X} L(x, µ).]
• Example (b): min f(x) = x  s.t. g(x) = x² ≤ 0,  x ∈ X = { x | x > 0 }.
Here S = { (x², x) | x > 0 }, and f* = ∞, q* = 0.
• Example (c): min f(x) = x1 + x2  s.t. g(x) = x1 ≤ 0,  x ∈ X = { (x1, x2) | x1 > 0 }.
Here S = { (z, w) | z > 0 }, and f* = ∞, q* = −∞.
EXTENSIONS AND APPLICATIONS
• Separable problems:
minimize Σ_{i=1}^m fi(xi)
subject to Σ_{i=1}^m gij(xi) ≤ 0,   j = 1, . . . , r,
           xi ∈ Xi,   i = 1, . . . , m.
• Example:
minimize Σ_{i=1}^n fi(xi)
subject to Σ_{i=1}^n xi ≥ A,   αi ≤ xi ≤ βi,   ∀ i.
DUALITY THEOREM I FOR CONVEX PROBLEMS
minimize f(x)
subject to x ∈ X,   ai'x − bi = 0,  i = 1, . . . , m,
           ej'x − dj ≤ 0,  j = 1, . . . , r,
minimize f(x)
subject to x1 ≤ 0,   x ∈ X = { x | x ≥ 0 },
where
f(x) = e^(−√(x1 x2)),   ∀ x ∈ X,
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r.
gj (x̄) < 0, ∀ j = 1, . . . , r.
LECTURE OUTLINE
********************************
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
gj (x̄) < 0, ∀ j = 1, . . . , r.
[Figure: the set S = {(g(x), f(x)) | x ∈ X}, a point (g(x), f(x)), the point (0, f*), and a supporting hyperplane with normal (µ, 1).]
PROOF OUTLINE
β f* ≤ β w + µ'z,   ∀ (z, w) ∈ A.
ei'x − di = 0,   i = 1, . . . , m
• Assumptions:
− The set X is convex and the functions f , gj
are convex over X .
− The optimal value f ∗ is finite and there exists
a vector x̄ ∈ ri(X) such that
gj (x̄) < 0, j = 1, . . . , r,
ei'x̄ − di = 0,   i = 1, . . . , m.
• Consider
minimize f(x) = x1
subject to x2 = 0,   x ∈ X = { (x1, x2) | x1² ≤ x2 }.
= g2 (λ) − g1 (λ)
DUALITY THEOREM
[Figure: graphs of f1 over X1 and f2 over X2, supporting lines with slope λ (normal (λ, −1)), and the intercept relation sup_{x∈X2} { f2(x) − x'λ } = −g2(λ).]
• Assume that
− X1 and X2 are convex
− f1 and f2 are convex and concave over X1
and X2 , respectively
− The relative interiors of X1 and X2 intersect
• The duality theorem for equalities applies and
shows that
f* = max_{λ∈ℜⁿ} { g2(λ) − g1(λ) }
x∗ ∈ X1 ∩ X2 , (primal feasibility),
λ∗ ∈ Λ1 ∩ Λ2 , (dual feasibility),
x* = arg max_{x∈X1} { x'λ* − f1(x) }
   = arg min_{x∈X2} { x'λ* − f2(x) },   (Lagrangian optimality).
[Figure: the values g2(λ) − g1(λ) and g2(λ*) − g1(λ*) shown as the vertical gap between supporting lines of f1 and f2 with slopes λ and λ*, the latter meeting at x*.]
6.252 NONLINEAR PROGRAMMING
LECTURE OUTLINE
********************************
• Consider
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
[Figure: a branch-and-bound tree with subsets Ȳ = {1, 2, 3}, {4, 5}, {1}, {2}; each subset Y carries a lower bound f_Y, and a feasible solution x̄ ∈ Y provides an upper bound.]
BRANCH-AND-BOUND ALGORITHM
minimize f (x)
subject to gj (x) ≤ 0, j = 1, . . . , r,
x ∈ X,
where
where
q(µ) = min_{x∈X} { f(x) + Σ_{j=1}^r µj gj(x) }.
LECTURE OUTLINE
• Dual Methods
• Nondifferentiable Optimization
********************************
minimize f(x)
subject to x ∈ X,   gj(x) ≤ 0,  j = 1, . . . , r,
and its dual
maximize q(µ)
subject to µ ≥ 0.
PROS AND CONS FOR SOLVING THE DUAL
ADVANTAGES:
DISADVANTAGES:
can be written as
minimize F(x) + inf_{By=c−Ax, y∈Y} G(y)
subject to x ∈ X.
• Let
xµ = arg min_{x∈X} L(x, µ) = arg min_{x∈X} { f(x) + µ'g(x) }.

∇q(µ) = g(xµ),   ∀ µ ∈ ℜʳ.
NONDIFFERENTIABLE DUAL
∂q(µ) = { g | g = Σ_{i∈Iµ} ξi ai,   ξi ≥ 0,   Σ_{i∈Iµ} ξi = 1 }.
NONDIFFERENTIABLE OPTIMIZATION
µk+1 = [ µk + sk gk ]⁺,
[Figure: contours of q over the set M, the point µk, the subgradient gk, the step µk + sk gk, its projection [µk + sk gk]⁺, and the maximizer µ*.]
KEY SUBGRADIENT METHOD PROPERTY
[Figure: the angle between gk and µ* − µk is less than 90 degrees, so for a small enough stepsize sk the iterate µk+1 = [µk + sk gk]⁺ moves closer to µ*:]
||µk+1 − µ*|| < ||µk − µ*||,
• Stepsize rule: sk = αk ( q̄k − q(µk) ) / ||gk||²,
where q̄k ≈ q* and
0 < αk < 2.
• Some possibilities:
− q̄k is the best known upper bound to q*; α0 = 1 and αk decreased by a certain factor every few iterations.
− αk = 1 for all k and
q̄k = ( max_{0≤i≤k} q(µi) ) + δk,
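• A sketch of the projected subgradient iteration µk+1 = [µk + sk gk]⁺ applied to the earlier example q(µ) = min_{x∈{0,1}} { −x + µ(x − 1/2) }, whose maximum over µ ≥ 0 is at µ* = 1; the simple diminishing stepsize and iteration count are illustrative assumptions:

def subgradient_method(mu=0.0, iters=200):
    # Projected subgradient iteration mu^{k+1} = [mu^k + s^k g^k]^+ for maximizing
    # q(mu) = min_{x in {0,1}} { -x + mu (x - 1/2) }.
    for k in range(iters):
        x_mu = min((0, 1), key=lambda x: -x + mu * (x - 0.5))  # minimizer of the Lagrangian
        g = x_mu - 0.5                 # subgradient of q at mu
        s = 1.0 / (k + 1)              # diminishing stepsize
        mu = max(0.0, mu + s * g)      # projection onto { mu >= 0 }
    return mu

print(subgradient_method())   # approaches mu* = 1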
LECTURE OUTLINE
********************************
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
max_{µ∈M} Qk(µ)
where
Qk(µ) = min_{i=0,...,k−1} { q(µi) + (µ − µi)'gi }.
Set
µk = arg max_{µ∈M} Qk(µ).
[Figure: the dual function q(µ) over M, the linearizations q(µ0) + (µ − µ0)'g(x_µ0) and q(µ1) + (µ − µ1)'g(x_µ1), and the iterates µ0, µ1, µ2, µ3 approaching µ*.]
POLYHEDRAL CASE
q(µ) = min_{i∈I} { ai'µ + bi }
[Figure: a polyhedral q over M; the cutting plane iterates µ0, µ1, µ2, µ3 terminate finitely at µ4 = µ*.]
CONVERGENCE
q(µi) + (µ − µi)'gi ≥ q(µ),   ∀ µ ∈ M,
minimize Σ_{j=1}^J fj(xj)
subject to xj ∈ Xj,  j = 1, . . . , J,   Σ_{j=1}^J Aj xj = b.
• Dual function is
q(λ) = Σ_{j=1}^J min_{xj∈Xj} { fj(xj) + λ'Aj xj } − λ'b
     = Σ_{j=1}^J { fj(xj(λ)) + λ'Aj xj(λ) } − λ'b
• A subgradient is
gλ = Σ_{j=1}^J Aj xj(λ) − b.
DANTZIG-WOLFE DECOMPOSITION
minimize Σ_{i=0}^{k−1} ξi ( q(λi) − λi'gi )
subject to Σ_{i=0}^{k−1} ξi = 1,   Σ_{i=0}^{k−1} ξi gi = 0,
           ξi ≥ 0,  i = 0, . . . , k − 1,
DANTZIG-WOLFE DECOMPOSITION (CONT.)
ξi ≥ 0,  i = 0, . . . , k − 1.

Σ_{i=0}^{k−1} ξi fj( xj(λi) )

Σ_{i=0}^{k−1} ξi xj(λi)
GEOMETRICAL INTERPRETATION
LECTURE OUTLINE
= inf_{ (u,x) | x∈X, gj(x)≤uj, j=1,...,r } { f(x) + Σ_{j=1}^r µj uj }
= inf_{u∈ℜʳ}  inf_{ x∈X, gj(x)≤uj, j=1,...,r } { f(x) + Σ_{j=1}^r µj uj },
and finally q(µ) = inf_{u∈ℜʳ} { p(u) + µ'u } for all µ ≥ 0
• Thus,
q(µ) = −h(−µ), ∀ µ ≥ 0,
• We have