CAMBRIDGE, MASS
BY DIMITRI P. BERTSEKAS
http://web.mit.edu/dimitrib/www/home.html
http://www.athenasc.com/convexity.html
LECTURE OUTLINE
• Generic form:
minimize f (x)
subject to x ∈ C
L(x, µ) = f(x) + Σ_{j=1}^r µ_j g_j(x), x ∈ ℝⁿ, µ ∈ ℝʳ.
[Figure: min common/max crossing framework — panels (a), (b), (c) show the set M (and M̄), the min common point w*, and the max crossing point q* in the (u, w) plane.]
LECTURE OUTLINE
• ‖x‖ = √(x′x) is the (Euclidean) norm of x. We
use this norm almost exclusively
• See Section 1.1 of the textbook for an overview
of the linear algebra and real analysis background
that we will use
CONVEX SETS
αx + (1 − α)y ∈ C, ∀ x, y ∈ C, ∀ α ∈ [0, 1]
[Figure: convexity of a function — the chord value αf(x) + (1 − α)f(y) lies above f(z) for z between x and y; a level set {x | f(x) ≤ γ} of a convex function over C.]
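As an added illustration (not part of the original slides), the defining chord inequality can be spot-checked numerically; `is_convex_on_samples` is a hypothetical helper name:

```python
import random

def is_convex_on_samples(f, lo, hi, trials=1000):
    # Spot-check f(αx + (1-α)y) ≤ αf(x) + (1-α)f(y) on random samples.
    random.seed(0)
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        a = random.random()
        if f(a * x + (1 - a) * y) > a * f(x) + (1 - a) * f(y) + 1e-9:
            return False  # found a violating chord
    return True

print(is_convex_on_samples(lambda t: t * t, -5, 5))   # x² passes
print(is_convex_on_samples(lambda t: t ** 3, -5, 5))  # x³ fails on [-5, 5]
```

Passing the sampled test is of course only evidence, not a proof, of convexity.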
• Note that:
− If f is lower semicontinuous at all x ∈ dom(f ),
it is not necessarily closed
− If f is closed, dom(f ) is not necessarily closed
• Proposition: Let f : X → [−∞, ∞] be a func-
tion. If dom(f ) is closed and f is lower semicon-
tinuous at all x ∈ dom(f ), then f is closed.
EXTENDED REAL-VALUED CONVEX FUNCTIONS
[Figure: a convex function and a nonconvex function.]
LECTURE OUTLINE
[Figure: first-order condition — the linearization f(x) + (z − x)′∇f(x) underestimates f(z).]
f(y) = f(x) + (y − x)′∇f(x) + ½ (y − x)′∇²f(x + α(y − x))(y − x)
• Given a set X ⊂ ℝⁿ:
• A convex combination of elements of X is a
vector of the form Σ_{i=1}^m α_i x_i, where x_i ∈ X, α_i ≥
0, and Σ_{i=1}^m α_i = 1.
• The convex hull of X, denoted conv(X), is the
intersection of all convex sets containing X (also
the set of all convex combinations from X).
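A minimal numeric sketch (added here, not in the slides) of forming a convex combination of points in ℝ²:

```python
def convex_combination(points, alphas):
    # Requires Σ αi = 1 and αi ≥ 0, per the definition above.
    assert abs(sum(alphas) - 1.0) < 1e-12 and min(alphas) >= 0
    dim = len(points[0])
    return tuple(sum(a * p[d] for a, p in zip(alphas, points)) for d in range(dim))

pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
print(convex_combination(pts, [0.5, 0.25, 0.25]))  # (0.75, 0.75)
```

Every such combination lies in conv(pts), the triangle spanned by the three points.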
• The affine hull of X, denoted aff(X), is the in-
tersection of all affine sets containing X (an affine
set is a set of the form x + S, where S is a sub-
space). Note that aff(X) is itself an affine set.
• A nonnegative combination of elements of X is
a vector of the form Σ_{i=1}^m α_i x_i, where x_i ∈ X and
α_i ≥ 0 for all i.
• The cone generated by X, denoted cone(X), is
the set of all nonnegative combinations from X:
− It is a convex cone containing the origin.
− It need not be closed.
− If X is a finite set, cone(X) is closed (nontrivial to show!)
CARATHEODORY’S THEOREM
[Figure: (a) the convex hull conv(X) of points x_1, x_2, x_3, x_4; (b) the cone cone(X) generated by X.]
Σ_{i=1}^m λ_i x_i = 0
the sequence
(α_1^k, . . . , α_{n+1}^k, x_1^k, . . . , x_{n+1}^k)
x_α = αx + (1 − α)x̄
[Figure: line segment principle — the ball S_α of radius αε around x_α is contained in S ⊂ C.]
ADDITIONAL MAJOR RESULTS
[Figure: a point of X expressed as
x = Σ_{i=1}^m α_i z_i, Σ_{i=1}^m α_i < 1, α_i > 0, i = 1, . . . , m,
with the points z_i, x*, and aff(X).]
LECTURE OUTLINE
LINEAR TRANSFORMATIONS
C1 = {x | x ≤ 0}, C2 = {x | x ≥ 0}
CONTINUITY OF CONVEX FUNCTIONS
[Figure: the sequences x_k and z_k used in the proof.]
f(0) ≤ (‖x_k‖∞/(‖x_k‖∞ + 1)) f(z_k) + (1/(‖x_k‖∞ + 1)) f(x_k)
Since ‖x_k‖∞ → 0, f(x_k) → f(0). Q.E.D.
x + αy ∈ C, ∀ x ∈ C, ∀ α ≥ 0
[Figure: recession cone R_C of a convex set C — for x ∈ C and y ∈ R_C, x + αy ∈ C for all α ≥ 0.]
R_{C∩D} = R_C ∩ R_D
[Figure: proof construction — the sequence z_1 = x + ȳ, z_2, z_3, . . . and the vectors x̄ + y_1, x̄ + y_2, x̄ + y_3 approaching x̄ + y within C.]
L_C = R_C ∩ (−R_C)
C = L_C + (C ∩ L_C⊥).
LECTURE 5
LECTURE OUTLINE
[Figure: recession cone R_C of a convex set C.]
[Figure: recession cone of a function — the “slice” {(x, γ) | f(x) ≤ γ} of epi(f) and the recession cone R_f.]
• Terminology:
− y ∈ Rf : a direction of recession of f .
− Lf = Rf ∩ (−Rf ): the lineality space of f .
− y ∈ Lf : a direction of constancy of f .
− Function r_f : ℝⁿ → (−∞, ∞] whose epigraph is R_{epi(f)}: the recession function of f.
• Note: r_f(y) is the “asymptotic slope” of f in the
direction y. In fact, r_f(y) = lim_{α→∞} ∇f(x + αy)′y
if f is differentiable. Also, y ∈ R_f iff r_f(y) ≤ 0.
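As an added numerical sketch (not from the slides), the asymptotic slope can be estimated by evaluating the difference quotient at a large α:

```python
def recession_slope(f, x, y, alpha=1e8):
    # Approximates r_f(y), the asymptotic slope of f along y,
    # via (f(x + αy) - f(x)) / α for a large α.
    return (f(x + alpha * y) - f(x)) / alpha

f = abs  # convex on ℝ, with r_f(y) = |y|
print(round(recession_slope(f, 3.0, 1.0), 6))   # 1.0
print(round(recession_slope(f, 3.0, -2.0), 6))  # 2.0
```

Since r_f(y) > 0 in both directions, f = |·| has no nonzero direction of recession, consistent with y ∈ R_f iff r_f(y) ≤ 0.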
DESCENT BEHAVIOR OF A CONVEX FUNCTION
[Figure: descent behavior of a convex function — panels (a)–(f) show the possible behaviors of f(x + αy) as a function of α.]
‖x_k‖ → ∞, x_k/‖x_k‖ → d/‖d‖
[Figure: an asymptotic sequence {x_k} of the nested sets S_0 ⊃ S_1 ⊃ S_2 ⊃ S_3 ⊃ · · · and the corresponding asymptotic direction d.]
CONNECTION WITH RECESSION CONES
∩_{k=0}^∞ A_{S_k}
A_S = R_S \ {0}
∩_{k=0}^∞ R_{S_k} \ {0}
LECTURE 6
LECTURE OUTLINE
RETRACTIVE ASYMPTOTIC DIRECTIONS
[Figure: (a) a retractive and (b) a nonretractive asymptotic direction d of nested sets S_0, S_1, S_2 with points x_0, x_1, x_2.]
SET INTERSECTION THEOREM
[Figure: an asymptotic sequence x_0, x_1, . . . , x_5 and its asymptotic direction d in the set intersection theorem.]
RECOGNIZING RETRACTIVE SETS
S_k = {x ∈ X | x′Qx + c′x ≤ γ_k}
d′Qd ≤ 0, a_j′d ≤ 0, j = 1, . . . , r
(c + 2Qx)′d ≥ 0, ∀ x ∈ X
• Thus,
so x_k − d ∈ S_k. Q.E.D.
INTERSECTION THEOREM FOR CONVEX SETS
R = ∩_{k=0}^∞ R_{C_k}, L = ∩_{k=0}^∞ L_{C_k}.
∩_{k=0}^∞ C_k = L + C̃,
S_k^1 ∩ S_k^2 = Ø, k = 0, 1, . . . ,
[Figure: the sets S_k^1, S_k^2 and the “critical asymptote” d.]
HORIZON DIRECTIONS
[Figure: horizon directions — the sets S_k and S_{k+1} in the (x_1, x_2) plane, panels (a) and (b).]
S_k^j ≠ Ø, ∀ k, ∩_{k=0}^∞ S_k^j ≠ Ø, ∀ j.
• Let
f(x) = x′Qx + c′x,
X = {x | x′R_j x + a_j′x + b_j ≤ 0, j = 1, . . . , r},
where Q and R_j are positive semidefinite matrices. If the minimal value of f over X is finite, f attains a minimum over X.
Proof: Let f* be the minimal value, and let γ_k ↓ f*. The set of optimal solutions is
X* = ∩_{k=0}^∞ (X ∩ {x | x′Qx + c′x ≤ γ_k}).
LECTURE 7
LECTURE OUTLINE
(y − z)′(x − z) ≤ 0, ∀ y ∈ C
x − z ∈ S⊥,
f(x) = x′Qx + c′x,
X = {x | x′Q_j x + a_j′x + b_j ≤ 0, j = 1, . . . , r}.
[Figure: the positive halfspace {x | a′x ≥ b} defined by a hyperplane through x̄.]
either a′x_1 ≤ b ≤ a′x_2, ∀ x_1 ∈ C_1, ∀ x_2 ∈ C_2,
or a′x_2 ≤ b ≤ a′x_1, ∀ x_1 ∈ C_1, ∀ x_2 ∈ C_2
a′x_1 < b < a′x_2, ∀ x_1 ∈ C_1, ∀ x_2 ∈ C_2
[Figure: separation of C_1 = {(ξ_1, ξ_2) | ξ_1 ≤ 0} and a set C_2 by a hyperplane with normal a, panels (a) and (b).]
SUPPORTING HYPERPLANE THEOREM
[Figure: supporting hyperplane construction — points x_0, x_1, x_2, x_3 approaching C, with the associated normals a_0, a_1, a_2 converging to a supporting normal at x̄.]
a′x_1 ≤ a′x_2, ∀ x_1 ∈ C_1, ∀ x_2 ∈ C_2.
C_2 − C_1 = {x_2 − x_1 | x_1 ∈ C_1, x_2 ∈ C_2}
[Figure: separating hyperplanes for C_1 and C_2, panels (a) and (b).]
ri(C1 ) ∩ ri(C2 ) = Ø
MIN COMMON / MAX CROSSING PROBLEMS
[Figure: the set M in the (u, w) plane with max crossing point q*.]
[Figure: a vertical hyperplane and a nonvertical hyperplane with normal (µ, β) passing through (ū, w̄); a nonvertical hyperplane crosses the vertical axis at (µ/β)′ū + w̄.]
LECTURE OUTLINE
WEAK DUALITY
ξ ≤ w + µ′u, ∀ (u, w) ∈ M
subject to µ ∈ ℝⁿ.
• For all (u, w) ∈ M and µ ∈ ℝⁿ,
DUALITY THEOREMS
is convex.
• Min Common/Max Crossing Theorem I: We
have q* = w* if and only if for every sequence
{(u_k, w_k)} ⊂ M with u_k → 0, there holds w* ≤
lim inf_{k→∞} w_k.
• Min Common/Max Crossing Theorem II: Assume in addition that −∞ < w* and that the set
D = {u | there exists w ∈ ℝ with (u, w) ∈ M}
Given φ : X × Z → ℝ, where X ⊂ ℝⁿ, Z ⊂ ℝᵐ,
consider
minimize sup φ(x, z)
z∈Z
subject to x ∈ X
and
maximize inf φ(x, z)
x∈X
subject to z ∈ Z.
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r
x = (x1 , . . . , xn ), z = (z1 , . . . , zm )
• We always have
sup_{z∈Z} inf_{x∈X} φ(x, z) ≤ inf_{x∈X} sup_{z∈Z} φ(x, z)
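The minimax inequality sup inf ≤ inf sup can be verified by brute force on any finite example; this snippet (an added illustration, not from the slides) does so for φ(x, z) = (x − z)² on small grids:

```python
def minimax_bounds(phi, X, Z):
    # Weak minimax duality: max-min never exceeds min-max.
    maxmin = max(min(phi(x, z) for z in Z) for x in X)
    minmax = min(max(phi(x, z) for z in Z) for x in X)
    return maxmin, minmax

phi = lambda x, z: (x - z) ** 2
maxmin, minmax = minimax_bounds(phi, [0, 1, 2], [0, 1, 2])
print(maxmin, minmax)  # 0 1: a strict gap, so no saddle point exists here
```

The gap reflects that φ(·, z) is convex but φ(x, ·) is also convex (not concave), so the minimax equality assumptions fail.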
LECTURE OUTLINE
• Min-Max Problems
• Saddle Points
• Min Common/Max Crossing for Min-Max
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
Given φ : X × Z → ℝ, where X ⊂ ℝⁿ, Z ⊂ ℝᵐ,
consider
minimize sup φ(x, z)
z∈Z
subject to x ∈ X
and
maximize inf φ(x, z)
x∈X
subject to z ∈ Z.
x* ∈ arg min_{x∈X} sup_{z∈Z} φ(x, z), z* ∈ arg max_{z∈Z} inf_{x∈X} φ(x, z) (*)
[Figure: saddle point of φ(x, z) — the curve of maxima φ(x, ẑ(x)), the curve of minima φ(x̂(z), z), and the saddle point (x*, z*).]
Assume that:
(1) X and Z are convex.
(2) p(0) = inf x∈X supz∈Z φ(x, z) < ∞.
(3) For each z ∈ Z, the function φ(·, z) is convex.
(4) For each x ∈ X, the function −φ(x, ·) : Z → ℝ
is closed and convex.
Then, the minimax equality holds if and only if the
function p is lower semicontinuous at u = 0.
Proof: The convexity/concavity assumptions guar-
antee that the minimax equality is equivalent to
q ∗ = w∗ in the min common/max crossing frame-
work. Furthermore, w∗ < ∞ by assumption, and
the set M [equal to M and epi(p)] is convex.
By the 1st Min Common/Max Crossing The-
orem, we have w ∗ = q ∗ iff for every sequence
(uk , wk ) ⊂ M with uk → 0, there holds w∗ ≤
lim inf k→∞ wk . This is equivalent to the lower
semicontinuity assumption on p:
Assume that:
(1) X and Z are convex.
(2) p(0) = inf x∈X supz∈Z φ(x, z) > −∞.
(3) For each z ∈ Z, the function φ(·, z) is convex.
(4) For each x ∈ X, the function −φ(x, ·) : Z → ℝ
is closed and convex.
(5) 0 lies in the relative interior of dom(p).
Then, the minimax equality holds and the supre-
mum in supz∈Z inf x∈X φ(x, z) is attained by some
z ∈ Z. [Also the set of z where the sup is attained
is compact if 0 is in the interior of dom(p).]
Proof: Apply the 2nd Min Common/Max Cross-
ing Theorem.
EXAMPLE I
• Let X = {(x_1, x_2) | x ≥ 0} and Z = {z ∈ ℝ | z ≥ 0}, and let
φ(x, z) = e^{−√(x_1 x_2)} + z x_1,
which satisfy the convexity and closedness assumptions. For all z ≥ 0,
inf_{x≥0} {e^{−√(x_1 x_2)} + z x_1} = 0,
p(u) = inf_{x≥0} sup_{z≥0} {e^{−√(x_1 x_2)} + z(x_1 − u)}
= ∞ if u < 0, 1 if u = 0, 0 if u > 0
[Figure: p(u) and epi(p).]
EXAMPLE II
p(u) = −√u if u ≥ 0, ∞ if u < 0.
[Figure: p(u).]
SADDLE POINT ANALYSIS
where
F(x, u) = sup_{z∈Z} {φ(x, z) − u′z} if x ∈ X, ∞ if x ∉ X.
LECTURE OUTLINE
C* = {y | y′x ≤ 0, ∀ x ∈ C},
[Figure: (a), (b) cones C generated by vectors a_2 and their polars C*.]
[Figure: the projection ẑ of z on C, with z − ẑ ∈ C*.]
• A cone C ⊂ ℝⁿ is polyhedral, if
C = {x | a_j′x ≤ 0, j = 1, . . . , r},
C = {x | x = Σ_{j=1}^r µ_j a_j, µ_j ≥ 0, j = 1, . . . , r}
= cone({a_1, . . . , a_r}),
FARKAS-MINKOWSKI-WEYL THEOREMS
P̂ = {(x, w) | x = Σ_{j=1}^m µ_j v_j, w = Σ_{j=1}^m µ_j d_j, µ_j ≥ 0}
J+ = {j | d_j > 0}, J0 = {j | d_j = 0}
PROOF CONTINUED
P̂ = {(x, w) | x = Σ_{j∈J+∪J0} µ_j v_j, w = Σ_{j∈J+} µ_j, µ_j ≥ 0}
Since P = {x | (x, 1) ∈ P̂}, we obtain
P = {x | x = Σ_{j∈J+∪J0} µ_j v_j, Σ_{j∈J+} µ_j = 1, µ_j ≥ 0}
Thus,
P = conv({v_j | j ∈ J+}) + {Σ_{j∈J0} µ_j v_j | µ_j ≥ 0, j ∈ J0}
• To prove that the vector sum of conv({v_1, . . . , v_m})
and a finitely generated cone is a polyhedral set,
we reverse the preceding argument. Q.E.D.
POLYHEDRAL FUNCTIONS
Thus
dom(f) = {x | a_j′x + b_j ≤ 0, j = m + 1, . . . , r},
Q.E.D.
LECTURE 11
LECTURE OUTLINE
• Extreme points
• Extreme points of polyhedral sets
• Extreme points and linear/integer programming
−−−−−−−−−−−−−−−−−−−−−−−−−−
Recall some of the facts of polyhedral convexity:
• Polarity relation between polyhedral and finitely
generated cones
{x | a_j′x ≤ 0, j = 1, . . . , r} = cone({a_1, . . . , a_r})*
• Farkas’ Lemma
{x | a_j′x ≤ 0, j = 1, . . . , r}* = cone({a_1, . . . , a_r})
• Minkowski-Weyl Theorem: a cone is polyhedral
iff it is finitely generated. A corollary (essentially):
Polyhedral set P = conv {v1 , . . . , vm } + RP
EXTREME POINTS
[Figure: extreme points v of polyhedral sets P defined by constraint normals a_1, . . . , a_5, panels (a) and (b).]
PROOF OUTLINE
a_j′w = 0, ∀ a_j ∈ A_v
a_j′w = b_j, ∀ a_j ∈ Ā_v
[Figure: level sets of f over P and optimal solutions x* at extreme points; the set C ∩ H_1 ∩ H_2, panels (a) and (b).]
EXTREME POINTS AND INTEGER PROGRAMMING
P = {x | Ax = b, c ≤ x ≤ d},
LECTURE OUTLINE
ri(C) ∩ ri(P ) = Ø
[Figure: a separating hyperplane with normal a between P and C.]
[Figure: M = epi(p), where p(u) is the optimal value of minimizing f(x) subject to Ax − b ≤ u; the original constraint is Ax − b ≤ 0.]
PROOF
ξ′y ≤ 0, ∀ y with a_j′y ≤ 0, ∀ j ∈ J,
so by Farkas’ Lemma,
there exist µ_j ≥ 0, j ∈ J,
such that ξ = Σ_{j∈J} µ_j a_j. Defining µ_j = 0 for
j ∉ J, we have
ξ = A′µ and µ′(Az* − b) = 0, so ξ′z* = µ′b
• Hence from w* + ξ′z* ≤ inf_{(x,w)∈epi(f)} {w + ξ′x},
w* ≤ inf_{(x,w)∈epi(f)} {w + µ′(Ax − b)}
≤ inf_{(x,w)∈epi(f), u∈ℝⁿ, Ax−b≤u} {w + µ′u}
= inf_{(u,w)∈M} {w + µ′u} = q(µ) ≤ q*.
f(x) + Σ_{j=1}^r µ_j* g_j(x) ≥ 0, ∀ x ∈ C
[Figure: hyperplanes with normal (µ, 1).]
w∗ ≥ 0.
EXAMPLE
[Figure: f(x), g(x), and the region g(x) ≤ 0.]
minimize f(x)
subject to x ∈ F = {x ∈ C | g_j(x) ≤ 0, j = 1, . . . , r}
q(µ) ≤ f(x) + Σ_{j=1}^r µ_j g_j(x) ≤ f(x)
LECTURE OUTLINE
[Figure: the slope (f(z) − f(x))/(z − x) for z between x and y.]
f⁺(x) = lim_{α↓0} (f(x + α) − f(x))/α
f⁻(x) = lim_{α↓0} (f(x) − f(x − α))/α
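These one-sided limits can be approximated numerically; the sketch below (added here, with an illustrative step size) recovers f⁺(0) = 1 and f⁻(0) = −1 for f(x) = |x|:

```python
def dplus(f, x, a=1e-7):
    # f⁺(x) ≈ (f(x + α) - f(x)) / α for a small α > 0
    return (f(x + a) - f(x)) / a

def dminus(f, x, a=1e-7):
    # f⁻(x) ≈ (f(x) - f(x - α)) / α for a small α > 0
    return (f(x) - f(x - a)) / a

print(dplus(abs, 0.0), dminus(abs, 0.0))  # 1.0 -1.0
```

For a convex f, f⁻(x) ≤ f⁺(x) always, with equality exactly where f is differentiable.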
MULTI-DIMENSIONAL DIRECTIONAL DERIVATIVES
• For a convex f : ℝⁿ → ℝ,
f′(x; y) = lim_{α↓0} (f(x + αy) − f(x))/α,
f′(x; y) = ∇f(x)′y,
[Figure: convex functions f and their subdifferentials ∂f(x) as x varies; for f(x) = |x|, ∂f(x) = {−1} for x < 0, [−1, 1] at x = 0, {1} for x > 0.]
PROPERTIES OF SUBGRADIENTS I
f(x) ≤ f(x + u) + µ′u, ∀ u ∈ ℝⁿ
f′(x; y) = max_{d∈∂f(x)} y′d, ∀ y ∈ ℝⁿ
• Generalizes to functions F(x) = g(f(x)), where
g is smooth.
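For a polyhedral f(x) = max_i {a_i x + b_i} the max formula is easy to evaluate directly; this added sketch computes f′(x; y) as the maximum of a_i y over the active pieces:

```python
def dir_deriv(pieces, x, y, tol=1e-9):
    # f(x) = max_i (a_i*x + b_i); by the max formula,
    # f'(x; y) = max over the active indices i of a_i*y.
    fx = max(a * x + b for a, b in pieces)
    return max(a * y for a, b in pieces if abs(a * x + b - fx) < tol)

pieces = [(1.0, 0.0), (-1.0, 0.0)]  # f(x) = |x|
print(dir_deriv(pieces, 0.0, 1.0), dir_deriv(pieces, 0.0, -1.0))  # 1.0 1.0
```

At x = 0 both pieces are active, so f′(0; y) = |y|, matching ∂f(0) = [−1, 1].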
ADDITIONAL RESULTS ON SUBGRADIENTS
LECTURE OUTLINE
• Conical approximations
• Cone of feasible directions
• Tangent and normal cones
• Conditions for optimality
−−−−−−−−−−−−−−−−−−−−−−−−−−
• A basic necessary condition:
− If x* minimizes a function f(x) over x ∈ X,
then for every y ∈ ℝⁿ, α* = 0 minimizes
g(α) ≡ f(x* + αy) over the line subset
{α | x* + αy ∈ X}
∇f(x*)′y ≥ 0, ∀ y ∈ F_X(x*)
x_k → x, (x_k − x)/‖x_k − x‖ → y/‖y‖
[Figure: a sequence x_k → x in X defining a tangent direction y — the ball of radius ‖y‖ around x and points x + y_k → x + y.]
[Figure: tangent cones T_X(x) at x = (0, 1), panels (a) and (b); in (a), T_X(x) = cl(F_X(x)).]
xk → x, zk → z, zk ∈ TX (xk )∗ , ∀k
[Figure: an example with T_X(x) = ℝⁿ at x = 0, and the normal cone N_X(x).]
TX (x) = NX (x)∗ .
Therefore,
inf_{x∈cl(X)∩{z | ‖z−x*‖≤1}} sup_{d∈∂f(x*)} d′(x − x*) = 0
LECTURE OUTLINE
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
• Problem
minimize f (x)
subject to x ∈ X, h1 (x) = 0, . . . , hm (x) = 0
g1 (x) ≤ 0, . . . , gr (x) ≤ 0
µ∗j ≥ 0, ∀ j = 1, . . . , r,
L(x, λ, µ) = f(x) + Σ_{i=1}^m λ_i h_i(x) + Σ_{j=1}^r µ_j g_j(x)
∇f(x*) + Σ_{i=1}^m λ_i* ∇h_i(x*) + Σ_{j=1}^r µ_j* ∇g_j(x*) = 0
EXAMPLE OF NONEXISTENCE
OF A LAGRANGE MULTIPLIER
[Figure: constraint surfaces h_1(x) = 0 and h_2(x) = 0 in the (x_1, x_2) plane, with ∇f(x*) = (1, 1).]
Minimize
f(x) = x_1 + x_2
subject to the two constraints
h_i(x) = a_i′x − b_i, i = 1, . . . , m,
T(x*) = {y | a_i′y = 0, i = 1, . . . , m}
∇f(x*) + Σ_{i=1}^m λ_i* ∇h_i(x*) + Σ_{j=1}^r µ_j* ∇g_j(x*) = 0
FRITZ JOHN THEORY
∇f(x*) + Σ_{i=1}^m λ_i* ∇h_i(x*) = 0
Σ_{i=1}^m λ_i* ∇h_i(x*) = 0
µ_0* ∇f(x*) + Σ_{i=1}^m λ_i* ∇h_i(x*) = 0
[Figure: sensitivity — x* on the hyperplane a′x = b with ∇f(x*) along a; perturbing the constraint to a′x = b + Δb moves the solution to x* + Δx.]
• Consider
F_c(x) = f(x) + c (Σ_{i=1}^m |h_i(x)| + Σ_{j=1}^r g_j⁺(x))
[Figure: f(x), g(x), µ*g(x), and the exact penalty F_c(x) near x*, panels (a) and (b).]
gj (x∗ ) = 0, ∀j∈J
LECTURE 16
LECTURE OUTLINE
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
• Problem
minimize f (x)
subject to x ∈ X, h1 (x) = 0, . . . , hm (x) = 0
g1 (x) ≤ 0, . . . , gr (x) ≤ 0
L(x, λ, µ) = f(x) + Σ_{i=1}^m λ_i h_i(x) + Σ_{j=1}^r µ_j g_j(x)
Let x∗ be a local minimum. Then λ∗ and µ∗ are
Lagrange multipliers if for all j,
∇_x L(x*, λ*, µ*)′y ≥ 0, ∀ y ∈ T_X(x*)
[Figure: level sets of f, gradients ∇f(x*), ∇g_1(x*), ∇g_2(x*), the regions g_1(x) < 0 and g(x) < 0, the cone (T_X(x*))*, and sequences x_k → x*, panels (a) and (b).]
µ∗0 > 0, µ∗1 = λ∗ /µ∗0 , µ∗2 = 0 or µ∗0 > 0, µ∗1 = 0, µ∗2 = −λ∗ /µ∗0
PROOF OF ENHANCED FJ THEOREM
Dividing with δ_k,
−(µ_0^k ∇f(x_k) + Σ_{j=1}^r µ_j^k ∇g_j(x_k) + (1/δ_k)(x_k − x*)) ∈ T_X(x_k)*
Since by construction (µ_0^k)² + Σ_{j=1}^r (µ_j^k)² = 1, the
Σ_{j=1}^r µ_j g_j(x_k) > 0, ∀ k
• Assume that X = ℝⁿ
[Figure: the sets T_ε in the (u_1, u_2) plane with normal µ, three cases.]
[Figure: pseudonormality — the set G = {g(x) | x ∈ X} and hyperplanes H through g(x*), panels (a), (b), (c); in one case x* is not pseudonormal.]
SOME MAJOR CONSTRAINT QUALIFICATIONS
Σ_{i=1}^m λ_i ∇h_i(x*) ∈ N_X(x*)
µ′g(x) over x ∈ ℝⁿ, so
µ′g(x) ≤ µ′g(x*) = 0, ∀ x ∈ ℝⁿ
Thus,
−Σ_{j=1}^r µ_j* ∇g_j(x*) ∉ N_X(x*)*
Since N_X(x*) ⊂ N_X(x*)*,
−Σ_{j=1}^r µ_j* ∇g_j(x*) ∉ N_X(x*)
a contradiction of conditions (i) and (ii). Q.E.D.
LECTURE OUTLINE
• Sensitivity Issues
• Exact penalty functions
• Extended representations
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
Review of Lagrange Multipliers
Σ_{j=1}^r µ_j g_j(x_k) > 0, ∀ k
[Figure: example with constraint h(x) = x_2 = 0 in the (x_1, x_2) plane and x* = 0.]
• We have
[Diagram: relations among types of Lagrange multipliers — informative, strong, and minimal.]
SENSITIVITY
f(x_k) = f(x*) − Σ_{j=1}^r µ_j* g_j(x_k) + o(‖x_k − x*‖)
x ∈ X, gj (x) ≤ vj , 0 ≤ vj , j = 1, . . . , r
PROOF THAT PN IMPLIES EXACT PENALTY
−Σ_{j=1}^r µ_j ∇g_j(x*) ∈ N_X(x*)
Σ_{j=1}^r µ_j g_j(x_k) > 0
C = {x | gj (x) ≤ 0, j = 1, . . . , r}
∇f(x*) + Σ_{j=1}^r µ_j* ∇g_j(x*) = 0,
µ_j* ≥ 0, ∀ j = 1, . . . , r, µ_j* = 0, ∀ j ∉ A(x*),
where
A(x∗ ) = {j | gj (x∗ ) = 0, j = 1, . . . , r}
[Diagram: relationships among constraint qualifications (CQ1–CQ4; CQ5, CQ6), pseudonormality, quasiregularity, admittance of an exact penalty, and admittance of informative Lagrange multipliers / Lagrange multipliers / R-multipliers, for the cases X = ℝⁿ, X = ℝⁿ and regular, and general X.]
LECTURE 18
LECTURE OUTLINE
minimize f (x)
subject to x ∈ X, g1 (x) ≤ 0, . . . , gr (x) ≤ 0
f ∗ = inf L(x, µ∗ ),
x∈X
where
L(x, µ) = f(x) + µ′g(x)
[Figure: visualization in the (z, w) plane — S = {(g(x), f(x)) | x ∈ X}, the point (0, f*), and hyperplanes with normal (µ, 1) or (µ*, 1) whose vertical intercept is inf_{x∈X} L(x, µ) = inf_{x∈X} {f(x) + µ′g(x)}, panels (a)–(d).]
[Figure: two examples — (a) min f(x) = x s.t. g(x) = x² ≤ 0, x ∈ X = ℝ, with (0, f*) = (0, 0); (b) min f(x) = −x s.t. g(x) = x − 1/2 ≤ 0, x ∈ X = {0, 1}, with (0, f*) = (0, 0) and S containing the points (−1/2, 0) and (1/2, −1).]
maximize q(µ)
subject to µ ≥ 0,
[Figure: the dual value q(µ) = inf_{x∈X} L(x, µ) as the intercept of the hyperplane H = {(z, w) | w + µ′z = b} supporting S = {(g(x), f(x)) | x ∈ X}; support points correspond to minimizers of L(x, µ) over X.]
WEAK DUALITY
• The domain of q is
D_q = {µ | q(µ) > −∞}
q(µ) = inf_{z∈X} L(z, µ) ≤ f(x) + Σ_{j=1}^r µ_j g_j(x) ≤ f(x),
so
q* ≤ f*
[Figure: the dual function for min f(x) = −x s.t. g(x) = x − 1/2 ≤ 0, x ∈ X = {0, 1}, where f* = 0:
q(µ) = min_{x∈{0,1}} {−x + µ(x − 1/2)} = min{−µ/2, µ/2 − 1}.]
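The dual function of this example can be evaluated by enumeration over X = {0, 1}; the added sketch below confirms q* = −1/2 < f* = 0, i.e., a duality gap:

```python
def q(mu):
    # q(µ) = min over x in {0,1} of -x + µ(x - 1/2) = min{-µ/2, µ/2 - 1}
    return min(-x + mu * (x - 0.5) for x in (0, 1))

qstar = max(q(i / 1000) for i in range(5001))  # crude search over µ ∈ [0, 5]
print(qstar)  # -0.5, attained at µ = 1
```

The gap arises because X = {0, 1} is nonconvex, so no geometric multiplier exists.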
DUALITY AND MINIMAX THEORY
minimize f(x)
subject to x_1 ≤ 0, x ∈ X = {x | x ≥ 0},
where
f(x) = e^{−√(x_1 x_2)}, ∀ x ∈ X,
[Figure: two more examples in the (z, w) plane — (b) min f(x) = x s.t. g(x) = x² ≤ 0, x ∈ X = {x | x > 0}, with S = {(x², x) | x > 0}, f* = ∞, q* = 0; (c) min f(x) = x_1 + x_2 s.t. g(x) = x_1 ≤ 0, x ∈ X = {(x_1, x_2) | x_1 > 0}, with S = {(g(x), f(x)) | x ∈ X} = {(z, w) | z > 0}, f* = ∞, q* = −∞.]
LECTURE 19
LECTURE OUTLINE
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
• Primal problem: Minimize f (x) subject to x ∈ X,
and g1 (x) ≤ 0, . . . , gr (x) ≤ 0 (assuming −∞ <
f ∗ < ∞). It is equivalent to inf x∈X supµ≥0 L(x, µ).
• Dual problem: Maximize q(µ) subject to µ ≥ 0,
where q(µ) = inf x∈X L(x, µ). It is equivalent to
supµ≥0 inf x∈X L(x, µ).
• µ∗ is a geometric multiplier if and only if f ∗ = q ∗ ,
and µ∗ is an optimal solution of the dual problem.
• Question: Under what conditions do we have f* = q*
and existence of a geometric multiplier?
LINEAR AND QUADRATIC PROGRAMMING DUALITY
−∞ < f ∗ < ∞
minimize c′x
subject to e_i′x = d_i, i = 1, . . . , m, x ≥ 0
• Dual function
q(λ) = inf_{x≥0} {Σ_{j=1}^n (c_j − Σ_{i=1}^m λ_i e_{ij}) x_j + Σ_{i=1}^m λ_i d_i}
• If c_j − Σ_{i=1}^m λ_i e_{ij} ≥ 0 for all j, the infimum
is attained for x = 0, and q(λ) = Σ_{i=1}^m λ_i d_i. If
c_j − Σ_{i=1}^m λ_i e_{ij} < 0 for some j, the expression in
braces can be made arbitrarily small by taking x_j sufficiently
large, so q(λ) = −∞. Thus, the dual is
maximize Σ_{i=1}^m λ_i d_i
subject to Σ_{i=1}^m λ_i e_{ij} ≤ c_j, j = 1, . . . , n.
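The case analysis above translates directly into code; the following added sketch evaluates q(λ) for a tiny LP and exhibits weak duality:

```python
def lp_dual(c, e, d, lam):
    # q(λ) = Σ λi di if all reduced costs cj - Σ λi e_ij are ≥ 0, else -∞
    cols = list(zip(*e))
    reduced = [cj - sum(l * eij for l, eij in zip(lam, col))
               for cj, col in zip(c, cols)]
    if all(r >= 0 for r in reduced):
        return sum(l * di for l, di in zip(lam, d))
    return float('-inf')

# minimize 2x1 + 3x2 subject to x1 + x2 = 1, x ≥ 0; here f* = 2 at x = (1, 0)
c, e, d = [2.0, 3.0], [[1.0, 1.0]], [1.0]
print(lp_dual(c, e, d, [2.0]))  # 2.0 = f*, so λ* = 2 is dual optimal
print(lp_dual(c, e, d, [4.0]))  # -inf: a reduced cost is negative
```

For LP, strong duality holds whenever the primal optimal value is finite, which is why q(λ*) meets f* exactly here.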
THE DUAL OF A QUADRATIC PROGRAM
subject to Ax ≤ b,
where Q is a given n × n positive definite symmetric matrix, A is a given r × n matrix, and b ∈ ℝʳ
and c ∈ ℝⁿ are given vectors.
• Dual function:
subject to µ ≥ 0,
where P = AQ⁻¹A′ and t = b + AQ⁻¹c.
RECALL NONLINEAR FARKAS’ LEMMA
f(x) + Σ_{j=1}^r µ_j* g_j(x) ≥ 0, ∀ x ∈ C
APPLICATION TO CONVEX PROGRAMMING
F = {x ∈ C | a_j′x − b_j ≤ 0, j = 1, . . . , p}
Assert the existence of geometric multipliers in
the extended representation, and pass back to the
original representation. Q.E.D.
STRONG DUALITY THEOREM II
LECTURE OUTLINE
= inf_{(u,x): x∈X, g(x)≤u} {f(x) + µ′u}
• Thus
q(µ) = inf_{u∈ℝʳ} {p(u) + µ′u}, ∀ µ ≥ 0
[Figure: S = {(g(x), f(x)) | x ∈ X}, p(u), f*, and q* in the (u, w) plane — when a geometric multiplier µ* exists, p is supported at u = 0 by a hyperplane with normal (µ*, 1), i.e., with slope −µ*.]
u_j^γ = (0, . . . , 0, γ, 0, . . . , 0)
lim_{γ↑0} (p(u_j^γ) − p(0))/γ ≤ −µ_j* ≤ lim_{γ↓0} (p(u_j^γ) − p(0))/γ
Thus −µ_j* lies between the left and the right slope
of p in the direction of the jth axis starting at u = 0.
SENSITIVITY ANALYSIS II
[Figure: S = {(g(x), f(x)) | x ∈ X} supported at (0, f*) by a hyperplane with normal (µ*, µ_0*).]
LECTURE OUTLINE
• Fenchel Duality
• Conjugate Convex Functions
• Relation of Primal and Dual Functions
• Fenchel Duality Theorems
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
FENCHEL DUALITY FRAMEWORK
= g2 (λ) − g1 (λ)
CONJUGATE FUNCTIONS
[Figure: (a) a convex function f(x) and a supporting hyperplane with normal (−λ, 1); (b) the conjugate of the conjugate of f, f̃(x) = sup_λ {x′λ − g(λ)}.]
EXAMPLES OF CONJUGATE PAIRS
g(λ) = sup_{x∈ℝⁿ} {x′λ − f(x)}, f̃(x) = sup_{λ∈ℝⁿ} {x′λ − g(λ)}
• f(x) = αx − β, X = (−∞, ∞): g(λ) = β if λ = α, ∞ if λ ≠ α.
• f(x) = |x|, X = (−∞, ∞): g(λ) = 0 if |λ| ≤ 1, ∞ if |λ| > 1.
[Figure: graphs of these conjugate pairs, and a third pair on X = (−∞, ∞).]
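A conjugate can be approximated by maximizing xλ − f(x) over a finite grid; this added sketch recovers g(λ) = λ²/4 for f(x) = x²:

```python
def conjugate(f, lam, xs):
    # g(λ) = sup_x { xλ - f(x) }, approximated over the finite grid xs
    return max(x * lam - f(x) for x in xs)

xs = [i / 100 for i in range(-500, 501)]        # grid on [-5, 5]
g = lambda lam: conjugate(lambda x: x * x, lam, xs)
print(g(2.0))  # 1.0, matching g(λ) = λ²/4 at λ = 2
```

The grid approximation is exact only when the maximizer x = λ/2 lies on the grid and within its range; for |λ| large the grid truncates the supremum.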
CONJUGATE OF THE CONJUGATE FUNCTION
(a) We have
f(x) ≥ f̃(x), ∀ x ∈ ℝⁿ
f(x) = f̃(x), ∀ x ∈ ℝⁿ
f̂(x) = f̃(x), ∀ x ∈ ℝⁿ
CONJUGACY OF PRIMAL AND DUAL FUNCTIONS
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r.
• Thus, q(µ) = −sup_{u∈ℝʳ} {−µ′u − p(u)}, or
q(µ) = −h(−µ), ∀ µ ≥ 0,
σ_X(λ) = sup_{x∈X} λ′x,
x∈X
LECTURE OUTLINE
• Fenchel Duality
• Fenchel Duality Theorems
• Cone Programming
• Semidefinite Programming
−−−−−−−−−−−−−−−−−−−−−−−−−−−−
• Recall the conjugate convex function of a general extended real-valued proper function f : ℝⁿ →
(−∞, ∞]:
g(λ) = sup_{x∈ℝⁿ} {x′λ − f(x)}, λ ∈ ℝⁿ.
= g2(λ) − g1(λ)
FENCHEL DUALITY THEOREM
[Figure: Fenchel duality — f_1(x) and f_2(x) with supporting hyperplanes of normal (−λ, 1) and slope λ; sup_x {f_2(x) − x′λ} = −g_2(λ); the dual values g_2(λ) − g_1(λ) over dom(f_1) and dom(−f_2), with the optimum g_2(λ*) − g_1(λ*) at slope λ*, attained at x*.]
if either
− The relative interiors of dom(g1 ) and dom(−g2 )
intersect, or
− dom(g_1) and dom(−g_2) are polyhedral, and
g_1 and g_2 can be extended to real-valued
convex and concave functions over ℝⁿ.
CONIC DUALITY I
minimize f (x)
subject to x ∈ C,
We have
g_1(λ) = sup_{x∈ℝⁿ} {λ′x − f(x)}, g_2(λ) = inf_{x∈C} x′λ = 0 if λ ∈ Ĉ, −∞ if λ ∉ Ĉ,
Ĉ = −C* = {λ | x′λ ≥ 0, ∀ x ∈ C}
CONIC DUALITY II
minimize c′x
subject to x − b ∈ S, x ∈ C.
• The dual problem is
minimize b′λ
subject to λ − c ∈ S⊥, λ ∈ Ĉ.
maximize Σ_{i=1}^m b_i λ_i
subject to C − (λ_1 A_1 + · · · + λ_m A_m) ∈ D.
LECTURE OUTLINE
********************************
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
subject to µ ≥ 0.
PROS AND CONS FOR SOLVING THE DUAL
can be written as
subject to x ∈ X.
• Let
x_µ = arg min_{x∈X} L(x, µ) = arg min_{x∈X} {f(x) + µ′g(x)}
• Then, for any µ̄,
q(µ̄) ≤ f(x_µ) + µ̄′g(x_µ)
= f(x_µ) + µ′g(x_µ) + (µ̄ − µ)′g(x_µ)
= q(µ) + (µ̄ − µ)′g(x_µ).
∇q(µ) = g(x_µ), ∀ µ ∈ ℝʳ
NONDIFFERENTIABILITY OF THE DUAL
∂q(µ) = {g | g = Σ_{i∈I_µ} ξ_i a_i, ξ_i ≥ 0, Σ_{i∈I_µ} ξ_i = 1}
NONDIFFERENTIABLE OPTIMIZATION
[Figure: contours of q over M — iterate µ_k, optimum µ*, subgradient g_k, and the projected step [µ_k + s_k g_k]⁺.]
KEY SUBGRADIENT METHOD PROPERTY
[Figure: the angle between g_k and µ* − µ_k is less than 90°, so µ_{k+1} = [µ_k + s_k g_k]⁺ moves closer to µ*.]
lim sup_{k→∞} q(µ_k) ≥ q(µ*) − sC²/2
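The projected step [µ_k + s_k g_k]⁺ has the same shape for primal minimization; the added sketch below runs x_{k+1} = P_X(x_k − α g_k) for f(x) = |x − 2| over X = [0, 1], whose constrained minimizer is x* = 1:

```python
def projected_subgradient(g, proj, x0, alpha, iters):
    # x_{k+1} = P_X(x_k - α g_k), constant stepsize
    x = x0
    for _ in range(iters):
        x = proj(x - alpha * g(x))
    return x

g = lambda x: 1.0 if x >= 2.0 else -1.0     # a subgradient of |x - 2|
proj = lambda x: min(1.0, max(0.0, x))      # projection onto X = [0, 1]
x = projected_subgradient(g, proj, 0.0, 0.01, 500)
print(round(x, 6))  # 1.0
```

With a constant stepsize the iterates oscillate within O(α) of the solution, consistent with the sC²/2 error term above.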
OTHER STEPSIZE RULES
LECTURE OUTLINE
• Subgradient Methods
• Stepsize Rules and Convergence Analysis
********************************
• Consider a generic convex problem min_{x∈X} f(x),
where f : ℝⁿ → ℝ is a convex function and X is
a closed convex set, and the subgradient method
x_{k+1} = [x_k − α_k g_k]⁺,
x_{k+1} = ψ_{m,k}, ψ_{i,k} = [ψ_{i−1,k} − α_k g_{i,k}]⁺, i = 1, . . . , m
where
C = Σ_{i=1}^m C_i
• Constant Stepsize α_k ≡ α:
− By key lemma with f(y) ≈ f*, it makes progress
to the optimal if 0 < α < 2(f(x_k) − f*)/C², i.e., if
f(x_k) > f* + αC²/2
− Can show that
lim inf_{k→∞} f(x_k) ≤ f* + αC²/2,
where C = Σ_{i=1}^m C_i (in the case where f* = −∞,
we have lim inf_{k→∞} f(x_k) = −∞).
• Diminishing Stepsize α_k → 0, Σ_k α_k = ∞:
− Eventually makes progress (once α_k becomes
small enough).
• Proof by contradiction. Let ε > 0 be s.t.
lim inf_{k→∞} f(x_k) > f* + αC²/2 + 2ε,
lim inf_{k→∞} f(x_k) ≥ f(ŷ) + αC²/2 + 2ε
min_{0≤k≤K} f(x_k) ≤ f* + (αC² + ε)/2
where
K = ⌊d(x_0, X*)²/(αε)⌋
• By contradiction. Assume that for 0 ≤ k ≤ K
f(x_k) > f* + (αC² + ε)/2
d(x_{k+1}, X*)² ≤ d(x_k, X*)² − 2α(f(x_k) − f*) + α²C²
≤ d(x_k, X*)² − (α²C² + αε) + α²C²
= d(x_k, X*)² − αε.
Sum over k to get d(x_0, X*)² − (K + 1)αε ≥ 0.
CONVERGENCE FOR OTHER STEPSIZE RULES
Then,
lim inf f (xk ) = f ∗
k→∞
• Estimation method:
inf f (xk ) ≤ f ∗ + δ
k≥0
LECTURE OUTLINE
x_{k+1} = ψ_{m,k}, ψ_{i,k} = [ψ_{i−1,k} − α_k g_{i,k}]⁺, i = 1, . . . , m
• For α_k ≡ α, we have
lim inf_{k→∞} f(x_k) ≤ f* + αC²/2
• Example:
min_x C_0 Σ_{i=1}^M (|x + 1| + 2|x| + |x − 1|)
f* + αmC_0²/2 ≤ lim inf_{k→∞} f(x_k)
where
C_0 = max{C_1, . . . , C_m}
COMPLEXITY ESTIMATE FOR CONSTANT STEP
min_{0≤k≤K} f(x_k) ≤ f* + (αC² + ε)/2
where
K = ⌊d(x_0, X*)²/(αε)⌋
RANDOMIZED ORDER METHODS
x_{k+1} = [x_k − α_k g(ω_k, x_k)]⁺
where ω_k is a random variable taking equiprobable
values from the set {1, . . . , m}, and g(ω_k, x_k) is a
subgradient of the component f_{ω_k} at x_k.
• Assumptions:
(a) {ωk } is a sequence of independent random
variables. Furthermore, the sequence {ωk }
is independent of the sequence {xk }.
(b) The set of subgradients {g(ω_k, x_k) | k =
0, 1, . . .} is bounded, i.e., there exists a positive constant C_0 such that with prob. 1
• Stepsize Rules:
− Constant: αk ≡ α
− Diminishing: Σ_k α_k = ∞, Σ_k (α_k)² < ∞
− Dynamic
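A runnable sketch of the randomized incremental update (added here; the component functions and constants are illustrative):

```python
import random

def randomized_incremental(components, x0, alpha, iters, seed=0):
    # x_{k+1} = [x_k - α g(ω_k, x_k)]⁺ with ω_k equiprobable over components
    random.seed(seed)
    x = x0
    for _ in range(iters):
        g = random.choice(components)       # subgradient of the sampled f_i
        x = max(0.0, x - alpha * g(x))      # projection onto X = {x ≥ 0}
    return x

# minimize f(x) = |x - 1| + |x - 3| over x ≥ 0; every x in [1, 3] is optimal
comps = [lambda x: 1.0 if x >= 1 else -1.0,   # a subgradient of |x - 1|
         lambda x: 1.0 if x >= 3 else -1.0]   # a subgradient of |x - 3|
x = randomized_incremental(comps, 0.0, 0.01, 2000)
print(0.9 <= x <= 3.1)  # True: the iterates settle around the optimal interval
```

Outside [1, 3] both component subgradients push the iterate back toward the interval, so with a small constant stepsize it remains within O(α) of the optimal set.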
RANDOMIZED METHOD W/ CONSTANT STEP
• With probability 1
inf_{k≥0} f(x_k) ≤ f* + αmC_0²/2
where
z_k = (2α/m)(f(x̂_k) − f(y_γ)) − α²C_0² if x̂_k ∉ L_γ,
0 if x̂_k = y_γ.
PROOF CONTINUED II
• If x̂_k ∉ L_γ, we have
z_k = (2α/m)(f(x̂_k) − f(y_γ)) − α²C_0²
≥ (2α/m)(f* + 2/γ + αmC_0²/2 − f* − 1/γ) − α²C_0²
= 2α/(mγ).
E[‖x̂_{k+1} − y_γ‖² | F_k] ≤ ‖x̂_k − y_γ‖² − 2α/(mγ)
min_{0≤k≤N} f(x_k) ≤ f* + (αmC_0² + ε)/2,
LECTURE OUTLINE
********************************
minimize f (x)
subject to x ∈ X, gj (x) ≤ 0, j = 1, . . . , r,
max_{µ∈M} Q_k(µ)
where
Q_k(µ) = min_{i=0,...,k−1} {q(µ_i) + (µ − µ_i)′g^i}
Set
µ_k = arg max_{µ∈M} Q_k(µ)
[Figure: cutting plane method — the linearizations q(µ_0) + (µ − µ_0)′g(x_{µ_0}) and q(µ_1) + (µ − µ_1)′g(x_{µ_1}) above q(µ), with iterates µ_0, µ_1, µ_2, µ_3 and optimum µ* in M.]
POLYHEDRAL CASE
q(µ) = min_{i∈I} {a_i′µ + b_i}
[Figure: cutting plane iterates µ_0, . . . , µ_4 on a polyhedral q over M, with finite termination at µ_4 = µ*.]
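A one-dimensional sketch of the method (added here; the grid search stands in for the linear program that maximizes Q_k over M):

```python
def cutting_plane_1d(q, subgrad, lo, hi, iters=10):
    # Builds Q_k(µ) = min_i { q(µ_i) + g_i (µ - µ_i) } and maximizes it on a grid.
    grid = [lo + (hi - lo) * i / 1000 for i in range(1001)]
    cuts, mu = [], lo
    for _ in range(iters):
        cuts.append((q(mu), subgrad(mu), mu))
        mu = max(grid, key=lambda m: min(qi + gi * (m - mi) for qi, gi, mi in cuts))
    return mu

q = lambda m: min(-m / 2, m / 2 - 1)                 # concave piecewise linear
g = lambda m: -0.5 if -m / 2 <= m / 2 - 1 else 0.5   # a subgradient of q
mu = cutting_plane_1d(q, g, 0.0, 3.0)
print(abs(mu - 1.0) < 0.01)  # True: iterates converge near µ* = 1, q* = -1/2
```

Because q here is polyhedral with two pieces, two cuts already pin down the maximizer, matching the finite-termination behavior sketched above.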
CONVERGENCE
q(µ_i) + (µ − µ_i)′g^i ≥ q(µ), ∀ µ ∈ M,
minimize Σ_{j=1}^J f_j(x_j)
subject to x_j ∈ X_j, j = 1, . . . , J, Σ_{j=1}^J A_j x_j = b.
• Dual function is
q(λ) = Σ_{j=1}^J min_{x_j∈X_j} {f_j(x_j) + λ′A_j x_j} − λ′b
= Σ_{j=1}^J {f_j(x_j(λ)) + λ′A_j x_j(λ)} − λ′b
g_λ = Σ_{j=1}^J A_j x_j(λ) − b
DANTZIG-WOLFE DECOMPOSITION
maximize v
subject to v ≤ q(λ_i) + (λ − λ_i)′g^i, i = 0, . . . , k − 1
minimize Σ_{i=0}^{k−1} ξ_i (q(λ_i) − λ_i′g^i)
subject to Σ_{i=0}^{k−1} ξ_i = 1, Σ_{i=0}^{k−1} ξ_i g^i = 0,
ξ_i ≥ 0, i = 0, . . . , k − 1,
DANTZIG-WOLFE DECOMPOSITION (CONT.)
ξ_i ≥ 0, i = 0, . . . , k − 1.
Σ_{i=0}^{k−1} ξ_i f_j(x_j(λ_i))
Σ_{i=0}^{k−1} ξ_i x_j(λ_i)
GEOMETRICAL INTERPRETATION