
Analysis of Convex Sets and Functions

Robert M. Freund

April, 2004

© 2004 Massachusetts Institute of Technology.


1 Convex Sets - Basics

A set S ⊆ ℝⁿ is defined to be a convex set if for any x1 ∈ S, x2 ∈ S, and any scalar λ satisfying 0 ≤ λ ≤ 1, λx1 + (1 − λ)x2 ∈ S. Points of the form λx1 + (1 − λ)x2 are said to be a convex combination of x1 and x2, if 0 ≤ λ ≤ 1.

A hyperplane H ⊆ ℝⁿ is a set of the form {x ∈ ℝⁿ | pᵗx = α} for some fixed p ∈ ℝⁿ, p ≠ 0, and α ∈ ℝ.

A closed halfspace is a set of the form H⁺ = {x | pᵗx ≥ α} or H⁻ = {x | pᵗx ≤ α}. An open halfspace is defined analogously, with ≥ and ≤ replaced by > and <.

Any hyperplane or closed or open halfspace is a convex set.

The ball {x ∈ ℝⁿ | x1² + … + xn² ≤ 5} is a convex set.

The intersection of an arbitrary number of convex sets is a convex set.

A polyhedron {x ∈ ℝⁿ | Ax ≤ b} is a convex set.

X = {x | x = x1 + x2, where x1 ∈ S1 and x2 ∈ S2} is a convex set, if S1 and S2 are convex sets.
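The notes contain no code, but the defining property is easy to spot-check numerically. The sketch below (my own illustration; all function names are ad hoc) samples pairs of points in a closed halfspace H⁺ = {x | pᵗx ≥ α} and verifies that their convex combinations stay in H⁺:

```python
import random

# Illustrative sketch (mine, not from the notes): a closed halfspace
# H+ = {x | p.x >= alpha} is convex, so every convex combination of two
# of its points must remain in it.
def dot(p, x):
    return sum(pi * xi for pi, xi in zip(p, x))

def in_halfspace(x, p, alpha):
    return dot(p, x) >= alpha

def convex_combination(lam, x1, x2):
    return [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]

random.seed(0)
p, alpha = [1.0, -2.0], -3.0
checked = 0
for _ in range(2000):
    x1 = [random.uniform(-10, 10), random.uniform(-10, 10)]
    x2 = [random.uniform(-10, 10), random.uniform(-10, 10)]
    if not (in_halfspace(x1, p, alpha) and in_halfspace(x2, p, alpha)):
        continue  # rejection-sample points of H+
    lam = random.random()
    y = convex_combination(lam, x1, x2)
    assert dot(p, y) >= alpha - 1e-9   # y stays in H+ (up to roundoff)
    checked += 1
assert checked > 0
```

The same check applies verbatim to any of the convex sets listed above, since only the membership test changes.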

Let S ⊆ ℝⁿ. The convex hull of S, denoted H(S), is the collection of all convex combinations of points of S, i.e., x ∈ H(S) if x can be represented as

x = Σ_{j=1}^{k} λj xj

where

Σ_{j=1}^{k} λj = 1,  λj ≥ 0, j = 1, …, k,

for some positive integer k and vectors x1, …, xk ∈ S.
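As a small illustration of the definition (mine, not from the notes), the following sketch builds convex combinations Σ λj xj with random nonnegative weights summing to one, and checks that combinations of the unit square's corners land in the square:

```python
import random

# Illustrative sketch (mine): form a convex combination sum_j lam_j x_j
# with lam_j >= 0 summing to one, exactly as in the definition of H(S).
def convex_combination(points, weights):
    assert abs(sum(weights) - 1.0) < 1e-9 and min(weights) >= 0
    n = len(points[0])
    return [sum(w * p[i] for w, p in zip(weights, points)) for i in range(n)]

def random_weights(k, rng):
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

rng = random.Random(1)
square = [[0, 0], [1, 0], [1, 1], [0, 1]]   # H(corners) is the unit square
for _ in range(200):
    x = convex_combination(square, random_weights(4, rng))
    # every convex combination of the corners lands inside the square
    assert -1e-9 <= x[0] <= 1 + 1e-9 and -1e-9 <= x[1] <= 1 + 1e-9
```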

Lemma 1 If S ⊆ ℝⁿ, then H(S) is the smallest convex set containing S, and H(S) is the intersection of all convex sets containing S.

The convex hull of a finite number of points x1, …, xk+1 ∈ ℝⁿ is called a polytope.

If the vectors x2 − x1, x3 − x1, …, xk+1 − x1 are linearly independent, then H(x1, …, xk+1) is called a k-dimensional simplex, with vertices x1, …, xk+1.
Carathéodory Theorem. Let S ⊆ ℝⁿ, and let x ∈ H(S) be given. Then x ∈ H(x1, …, xn+1) for n + 1 suitably chosen points xj ∈ S, j = 1, …, n + 1. In other words, x can be represented as

x = Σ_{j=1}^{n+1} λj xj

where

Σ_{j=1}^{n+1} λj = 1,  λj ≥ 0, j = 1, …, n + 1,  for some xj ∈ S, j = 1, …, n + 1.

Proof: Since x ∈ H(S), there exist k points x1, …, xk and multipliers that satisfy

Σ_{j=1}^{k} λj xj = x

and

Σ_{j=1}^{k} λj = 1,  λ1, …, λk ≥ 0.

If k ≤ n + 1, we are done. If not, let us consider the above linear system as a system of n + 1 equations in the nonnegative variables λ ≥ 0, i.e., as a system Aλ = b, λ ≥ 0, where

A = [ x1  x2  …  xk ]        b = [ x ]
    [ 1   1   …  1  ],           [ 1 ].

Let e = (1, …, 1)ᵗ.

Notice that this system has n + 1 equations in k nonnegative variables. From the theory of linear programming, this system has a basic feasible solution, and we can presume, without loss of generality, that the basis consists of columns 1, …, n + 1 of A. Thus, there exists λ ≥ 0 with eᵗλ = 1 and λn+2 = … = λk = 0 such that

Σ_{j=1}^{n+1} λj xj = x,  Σ_{j=1}^{n+1} λj = 1,  λ ≥ 0.
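For intuition, here is an illustrative sketch (my own, restricted to n = 1, where Carathéodory's bound is n + 1 = 2): any convex combination of points on the line can be rewritten as a combination of just the two extreme points:

```python
# Illustrative sketch (mine, not from the notes): Caratheodory in R^1 says
# any x in H(S) is a convex combination of at most n + 1 = 2 points of S.
def caratheodory_1d(points, weights):
    """Rewrite x = sum_j lam_j * x_j as a combination of at most two points."""
    x = sum(w * p for w, p in zip(weights, points))
    a, b = min(points), max(points)
    if a == b:
        return [(a, 1.0)]
    lam = (b - x) / (b - a)          # solves x = lam*a + (1 - lam)*b
    return [(a, lam), (b, 1.0 - lam)]

points = [0.0, 1.0, 4.0, 9.0]
weights = [0.25, 0.25, 0.25, 0.25]
combo = caratheodory_1d(points, weights)
assert len(combo) <= 2
assert all(lam >= 0 for _, lam in combo)
assert abs(sum(lam for _, lam in combo) - 1.0) < 1e-12
x = sum(w * p for w, p in zip(weights, points))
assert abs(sum(lam * p for p, lam in combo) - x) < 1e-12
```

The general proof replaces this two-point interpolation with the basic-feasible-solution argument above.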

2 Closure and Interior of Convex Sets

Let x̄ ∈ ℝⁿ. Define Nε(x̄) = {y ∈ ℝⁿ | ‖y − x̄‖ < ε}, where ‖·‖ is the Euclidean norm:

‖z‖ := √( Σ_{j=1}^{n} zj² ) = √(zᵗz).

Let S be an arbitrary set in ℝⁿ. x ∈ cl S, the closure of S, if S ∩ Nε(x) ≠ ∅ for every ε > 0. If S = cl S, then S is a closed set. x ∈ int S, the interior of S, if there exists ε > 0 for which Nε(x) ⊆ S. If S = int S, then S is an open set. x ∈ ∂S, the boundary of S, if Nε(x) contains a point in S and a point not in S, for all ε > 0.

Theorem 2 Let S be a convex set in ℝⁿ. If x1 ∈ cl S and x2 ∈ int S, then λx1 + (1 − λ)x2 ∈ int S for every λ ∈ (0, 1).

Proof: Since x2 ∈ int S, there exists ε > 0 such that ‖z − x2‖ < ε implies z ∈ S. Also, since x1 ∈ cl S, for any δ > 0 there exists a z with ‖z − x1‖ < δ and z ∈ S. Now let y = λx1 + (1 − λ)x2 be given, with λ ∈ (0, 1). Let r = (1 − λ)ε. I claim that any z with ‖z − y‖ < r satisfies z ∈ S. This means that y ∈ int S, proving the theorem.

To prove the claim, consider such a z with ‖z − y‖ < r. Let δ = ((1 − λ)ε − ‖z − y‖)/λ. Then δ > 0, and so there exists z1 with ‖z1 − x1‖ < δ and z1 ∈ S. Let z2 = (z − λz1)/(1 − λ). Since (1 − λ)x2 = y − λx1,

‖z2 − x2‖ = ‖(z − λz1)/(1 − λ) − x2‖ = (1/(1 − λ)) ‖(z − y) − λ(z1 − x1)‖
          ≤ (1/(1 − λ)) (‖z − y‖ + λ‖z1 − x1‖)
          < (1/(1 − λ)) (‖z − y‖ + λδ)
          = (1/(1 − λ)) (‖z − y‖ + (1 − λ)ε − ‖z − y‖) = ε.

Thus z2 ∈ S. Also z = λz1 + (1 − λ)z2, and so z ∈ S, since S is convex.


The proof construction is illustrated in Figure 1.

Figure 1: Construction for the proof of Theorem 2 (the points x1, x2, y, z, z1, z2 and the ball of radius r around y).

3 Supporting Hyperplanes of Convex Sets

Let S be a nonempty set in ℝⁿ, and let x̄ ∈ ∂S. A hyperplane H = {x ∈ ℝⁿ | pᵗx = α} (where p ≠ 0) is called a supporting hyperplane of S at x̄ if either S ⊆ H⁺ or S ⊆ H⁻ (i.e., if either pᵗx ≥ α for all x ∈ S or pᵗx ≤ α for all x ∈ S), and pᵗx̄ = α.

Note that we can write H as H = {x | pᵗx = pᵗx̄}, i.e., H = {x | pᵗ(x − x̄) = 0}. If S ⊄ H, then H is called a proper supporting hyperplane.

Note that either pᵗx̄ = inf{pᵗx | x ∈ S} or pᵗx̄ = sup{pᵗx | x ∈ S}.

Theorem 3 If S is a nonempty convex set and x̄ ∈ ∂S, then there exists a supporting hyperplane to S at x̄; that is, there exists a nonzero vector p such that pᵗx ≤ pᵗx̄ for all x ∈ S.

Proof: Let x̄ ∈ ∂S. Then there exists a sequence yi, i = 1, 2, …, such that yi → x̄ and yi ∉ cl(S). Now cl(S) is a closed convex set, and so for each i there exist pi ≠ 0 and αi such that (pi)ᵗx ≤ αi for all x ∈ cl(S), and (pi)ᵗyi > αi. We can re-scale the pi so that ‖pi‖ = 1. As yi → x̄, choose a subsequence of the pi that converges, say, to p. Then (pi)ᵗx < (pi)ᵗyi for all x ∈ cl(S), and so, letting i → ∞, any x ∈ S ⊆ cl(S) satisfies pᵗx ≤ pᵗx̄.

Theorem 4 If A and B are nonempty convex sets and A ∩ B = ∅, then A and B can be separated by a hyperplane. That is, there exist p ≠ 0 and α such that pᵗx ≤ α for any x ∈ A and pᵗx ≥ α for any x ∈ B.

Proof: Let S = {x | x = x1 − x2, x1 ∈ A, x2 ∈ B}. Then S is a convex set, and 0 ∉ S. Let T = cl(S) = S ∪ ∂S. If 0 ∈ T, then 0 ∈ ∂S, and from Theorem 3 there exists p ≠ 0 such that pᵗ(x1 − x2) ≤ 0 for any x1 ∈ A, x2 ∈ B, so that pᵗx1 ≤ pᵗx2 for any x1 ∈ A, x2 ∈ B. Let α = inf{pᵗx2 | x2 ∈ B}. Then pᵗx1 ≤ α ≤ pᵗx2 for any x1 ∈ A, x2 ∈ B.

If 0 ∉ T, then since T is a closed set, we can apply the basic separating hyperplane theorem for convex sets: there exists p ≠ 0 for which pᵗ(x1 − x2) < pᵗ0 = 0 for any x1 ∈ A, x2 ∈ B, so that pᵗx1 ≤ pᵗx2 for any x1 ∈ A, x2 ∈ B. Again let α = inf{pᵗx2 | x2 ∈ B}. Then pᵗx1 ≤ α ≤ pᵗx2 for any x1 ∈ A, x2 ∈ B.

Corollary 5 If S1 and S2 are nonempty disjoint closed convex sets, and S1 is bounded, then S1 and S2 can be strongly separated.

4 Polyhedral Convex Sets

S is a polyhedral set if S = {x ∈ ℝⁿ | Ax ≤ b} for some (A, b). x̄ is an extreme point of S if whenever x̄ can be represented as x̄ = λx1 + (1 − λ)x2 for some x1, x2 ∈ S and λ ∈ (0, 1), then x1 = x2 = x̄.

d ∈ ℝⁿ is called a ray of S if whenever x ∈ S, x + θd ∈ S for all θ ≥ 0. d is called an extreme ray of S if whenever d = λ1 d1 + λ2 d2 for rays d1, d2 of S and λ1, λ2 ≠ 0, then d1 = αd2 for some α ≥ 0 or d2 = αd1 for some α ≥ 0.

Theorem 6 The set S = {x ∈ ℝⁿ | Ax = b, x ≥ 0} contains an extreme point if S ≠ ∅.

Theorem 7 Let S = {x ∈ ℝⁿ | Ax ≤ b} be nonempty. Then S has a finite number of extreme points and extreme rays.

Theorem 8 Let S = {x ∈ ℝⁿ | Ax ≤ b} be nonempty. Then there exist a finite number of points x1, …, xK and a finite number of directions d1, …, dL with the property that x̄ ∈ S if and only if the following inequality system has a solution in λ1, …, λK, μ1, …, μL:

Σ_{j=1}^{K} λj xj + Σ_{j=1}^{L} μj dj = x̄

Σ_{j=1}^{K} λj = 1

λ1 ≥ 0, …, λK ≥ 0,  μ1 ≥ 0, …, μL ≥ 0.

5 Convex Functions - Basics

Let S ⊆ ℝⁿ be a nonempty convex set. A function f(·) : S → ℝ ∪ {−∞, +∞} is a convex function on S if:

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all λ ∈ [0, 1] and x, y ∈ S.

f(·) is strictly convex if the inequality above is strict for all λ ∈ (0, 1) and x ≠ y.

f(·) is concave if −f(·) is convex.

f(·) is strictly concave if −f(·) is strictly convex.
Examples:

1. f(x) = ax + b

2. f(x) = |x|

3. f(x) = x² − 2x + 3

4. f(x) = −√x for x ≥ 0

5. f(x1, x2) = 2x1² + x2² − 2x1x2
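The defining inequality can be spot-checked numerically. The sketch below (illustrative, mine, not from the notes) samples random point pairs and λ for examples 3 and 5:

```python
import random

# Illustrative numerical spot-check (mine) of the defining inequality
# f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) for two of the examples.
def is_convex_on_samples(f, sampler, trials=2000, tol=1e-9):
    rng = random.Random(42)
    for _ in range(trials):
        x, y = sampler(rng), sampler(rng)
        lam = rng.random()
        mid = tuple(lam * a + (1 - lam) * b for a, b in zip(x, y))
        if f(mid) > lam * f(x) + (1 - lam) * f(y) + tol:
            return False
    return True

f3 = lambda x: x[0] ** 2 - 2 * x[0] + 3                    # example 3
f5 = lambda x: 2 * x[0] ** 2 + x[1] ** 2 - 2 * x[0] * x[1] # example 5

assert is_convex_on_samples(f3, lambda r: (r.uniform(-5, 5),))
assert is_convex_on_samples(f5, lambda r: (r.uniform(-5, 5), r.uniform(-5, 5)))
```

Of course, passing random samples is evidence, not proof; the proofs use the definition directly (or, for twice differentiable functions, the Hessian test of exercise 9).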

We define Sα := {x ∈ S | f(x) ≤ α}. Sα is called the level set of f(·) at level α.

Lemma 9 If f(·) is convex on S, then Sα is a convex set.

Proof: Let x, y ∈ Sα. Then f(x) ≤ α and f(y) ≤ α, and so λ ∈ [0, 1] implies

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) ≤ λα + (1 − λ)α = α,

and so λx + (1 − λ)y ∈ Sα. Therefore Sα is a convex set.

Note that the converse of Lemma 9 is false: f(x) = x³ has convex level sets on ℝ, but is not a convex function.
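A numerical illustration of Lemma 9 (my own, not from the notes): in one dimension a convex level set is an interval, which can be checked on a grid:

```python
# Illustrative check (mine) of Lemma 9 in one dimension: for a convex f, the
# level set S_alpha = {x | f(x) <= alpha}, sampled on a grid, is an interval
# (i.e., a convex subset of the line).
def level_set_is_interval(f, alpha, grid):
    inside = [x for x in grid if f(x) <= alpha]
    if len(inside) <= 1:
        return True
    lo, hi = min(inside), max(inside)
    # no grid point between lo and hi may fall outside the level set
    return all(f(x) <= alpha for x in grid if lo <= x <= hi)

f = lambda x: x * x - 2 * x + 3          # convex, minimum value 2 at x = 1
grid = [i / 100 for i in range(-500, 501)]
assert level_set_is_interval(f, 3.0, grid)   # S_3 = [0, 2]
assert level_set_is_interval(f, 2.5, grid)
```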

Theorem 10 If S ⊆ ℝⁿ is an open convex set and f(·) is a real-valued convex function on S, then f(·) is continuous on S.

Proof: We will prove this in three steps.

Step 1. If f(·) is convex on S and x1, …, xk ∈ S and λ1, …, λk ≥ 0 with Σ_{j=1}^{k} λj = 1, then

f( Σ_{j=1}^{k} λj xj ) ≤ Σ_{j=1}^{k} λj f(xj).

Step 2. For each x ∈ S, there exists a δ > 0 such that x ± δei ∈ S for i = 1, …, n, where ei is the ith unit vector. Call these 2n vectors z1, …, z2n, and let Mx := max{f(z1), …, f(z2n)}.

Since x lies in the interior of the convex hull of z1, …, z2n, there exists tx > 0 such that B(x, tx) := {y | ‖y − x‖ ≤ tx} also lies in the convex hull of z1, …, z2n. Then any y satisfying ‖y − x‖ ≤ tx can be written y = Σ_{j=1}^{2n} λj zj for some appropriate λ1, …, λ2n ≥ 0 with Σ_{j=1}^{2n} λj = 1, and so

f(y) = f( Σ_{j=1}^{2n} λj zj ) ≤ Σ_{j=1}^{2n} λj f(zj) ≤ Mx.

Step 3. (The proof of the theorem.) Without loss of generality, we assume that f(0) = 0 and we prove that f(·) is continuous at x = 0. For any ε > 0, we must exhibit a δ > 0 such that ‖y‖ < δ implies −ε ≤ f(y) ≤ ε. Let t > 0 and M be chosen so that ‖y‖ ≤ t implies f(y) ≤ M (from Step 2). Now let δ = tε/M. Let y satisfy ‖y‖ < δ. We can assume that M > ε, and write:

y = (1 − ε/M)·0 + (ε/M)·(M/ε)y.

Since ‖(M/ε)y‖ = (M/ε)‖y‖ < (M/ε)δ = t, it follows that f((M/ε)y) ≤ M. Therefore:

f(y) ≤ (1 − ε/M) f(0) + (ε/M) f((M/ε)y) ≤ 0 + (ε/M)M = ε.

We also have:

0 = [1/(1 + ε/M)] y + [(ε/M)/(1 + ε/M)] (−(M/ε)y),

and so

0 = f(0) ≤ [1/(1 + ε/M)] f(y) + [(ε/M)/(1 + ε/M)] f(−(M/ε)y) ≤ f(y)/(1 + ε/M) + (ε/M)M/(1 + ε/M).

Therefore f(y) ≥ −ε, and so −ε ≤ f(y) ≤ ε.


The directional derivative of f(·) at x̄ in the direction d, denoted by f′(x̄; d), is defined as:

f′(x̄; d) = lim_{λ→0⁺} [ f(x̄ + λd) − f(x̄) ] / λ,

when this limit exists.

Lemma 5.1 Suppose that S is convex and f(·) : S → ℝ is convex, x̄ ∈ S, and x̄ + λd ∈ S for all λ > 0 sufficiently small. Then f′(x̄; d) exists.

Proof: Let λ2 > λ1 > 0 be sufficiently small. Then

f(x̄ + λ1 d) = f( (λ1/λ2)(x̄ + λ2 d) + (1 − λ1/λ2) x̄ ) ≤ (λ1/λ2) f(x̄ + λ2 d) + (1 − λ1/λ2) f(x̄).

Rearranging the above, we have:

[ f(x̄ + λ1 d) − f(x̄) ] / λ1 ≤ [ f(x̄ + λ2 d) − f(x̄) ] / λ2.

Thus the term [ f(x̄ + λd) − f(x̄) ] / λ is nondecreasing in λ, and so the limit exists and is

f′(x̄; d) = inf_{λ>0} [ f(x̄ + λd) − f(x̄) ] / λ,

so long as we allow f′(x̄; d) = −∞.
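The monotonicity in the proof of Lemma 5.1 can be observed numerically. In this illustrative sketch (mine, not from the notes), the difference quotient of f(x) = x² at x̄ = 1 in direction d = 1 equals 2 + λ, which is nondecreasing in λ and converges to f′(1; 1) = 2:

```python
# Illustrative check (mine) of Lemma 5.1: for convex f, the difference
# quotient (f(xbar + lam*d) - f(xbar)) / lam is nondecreasing in lam,
# so its lam -> 0+ limit (the directional derivative) exists.
def quotient(f, xbar, d, lam):
    return (f(xbar + lam * d) - f(xbar)) / lam

f = lambda x: x * x                      # convex; here the quotient is 2 + lam
xbar, d = 1.0, 1.0
lams = [2.0 ** (-k) for k in range(20)]  # 1, 1/2, 1/4, ...
qs = [quotient(f, xbar, d, lam) for lam in lams]
# the quotient shrinks toward the directional derivative as lam -> 0+
assert all(qs[i] >= qs[i + 1] - 1e-12 for i in range(len(qs) - 1))
assert abs(qs[-1] - 2.0) < 1e-4          # f'(1; 1) = 2
```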

6 Subgradients of Convex Functions

Let S ⊆ ℝⁿ, and let f(·) : S → ℝ be given. The epigraph of f(·), denoted epi f(·), is a subset of ℝⁿ⁺¹ and is defined to be:

epi f(·) := {(x, α) ∈ ℝⁿ⁺¹ | x ∈ S, α ∈ ℝ, f(x) ≤ α}.

The hypograph of f(·), denoted hyp f(·), is defined analogously as:

hyp f(·) := {(x, α) ∈ ℝⁿ⁺¹ | x ∈ S, α ∈ ℝ, f(x) ≥ α}.

Theorem 11 Let S be a nonempty convex set in ℝⁿ, and f(·) : S → ℝ. Then f(·) is a convex function if and only if epi f(·) is a convex set.

Proof: Assume f(·) is convex. Let (x1, α1), (x2, α2) ∈ epi f(·) and λ ∈ (0, 1). Then λx1 + (1 − λ)x2 ∈ S, because x1 ∈ S, x2 ∈ S, and S is convex. Also α1 ≥ f(x1), α2 ≥ f(x2), and f(λx1 + (1 − λ)x2) ≤ λf(x1) + (1 − λ)f(x2) ≤ λα1 + (1 − λ)α2. Thus (λx1 + (1 − λ)x2, λα1 + (1 − λ)α2) ∈ epi f(·), and so epi f(·) is a convex set.

Conversely, assume that epi f(·) is a convex set. Let x1, x2 ∈ S, and let α1 = f(x1), α2 = f(x2). Then (x1, α1) ∈ epi f(·) and (x2, α2) ∈ epi f(·), and so (λx1 + (1 − λ)x2, λα1 + (1 − λ)α2) ∈ epi f(·) for λ ∈ (0, 1). This means f(λx1 + (1 − λ)x2) ≤ λα1 + (1 − λ)α2 = λf(x1) + (1 − λ)f(x2), and so f(·) is a convex function.
If f(·) : S → ℝ is a convex function, then ξ ∈ ℝⁿ is a subgradient of f(·) at x̄ ∈ S if

f(x) ≥ f(x̄) + ξᵗ(x − x̄) for all x ∈ S.

Example: Let f(x) = x², x̄ = 3, and ξ = 6. Notice that from the basic inequality

0 ≤ (x − 3)² = x² − 6x + 9

we obtain for any x:

f(x) = x² ≥ 6x − 9 = 9 + 6(x − 3) = f(x̄) + 6(x − x̄),

which shows that ξ = 6 is a subgradient of f(·) at x = x̄ = 3.
Example: f(x) = |x|. Then it is straightforward to show that:

If x̄ > 0, then ξ = 1 is a subgradient of f(·) at x̄.

If x̄ < 0, then ξ = −1 is a subgradient of f(·) at x̄.

If x̄ = 0, then any ξ ∈ [−1, 1] is a subgradient of f(·) at x̄.
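These claims can be verified numerically against the subgradient inequality. An illustrative sketch (mine; `is_subgradient` is an ad hoc helper, not from the notes):

```python
# Illustrative check (mine) of the |x| example: xi is a subgradient of f at
# xbar iff f(x) >= f(xbar) + xi*(x - xbar) for all sampled x.
def is_subgradient(f, xbar, xi, grid, tol=1e-12):
    return all(f(x) >= f(xbar) + xi * (x - xbar) - tol for x in grid)

f = abs
grid = [i / 10 for i in range(-100, 101)]
assert is_subgradient(f, 3.0, 1.0, grid)      # xbar > 0: xi = 1
assert is_subgradient(f, -3.0, -1.0, grid)    # xbar < 0: xi = -1
for xi in [-1.0, -0.5, 0.0, 0.5, 1.0]:        # xbar = 0: any xi in [-1, 1]
    assert is_subgradient(f, 0.0, xi, grid)
assert not is_subgradient(f, 0.0, 1.5, grid)  # |xi| > 1 fails at xbar = 0
```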

Theorem 12 Let S ⊆ ℝⁿ be a convex set and f(·) : S → ℝ be a convex function. If x̄ ∈ int S, then there exists a vector ξ ∈ ℝⁿ such that the hyperplane

H = {(x, y) ∈ ℝⁿ⁺¹ | y = f(x̄) + ξᵗ(x − x̄)}

supports epi f(·) at (x̄, f(x̄)). In particular, f(x) ≥ f(x̄) + ξᵗ(x − x̄) for all x ∈ S, and so ξ is a subgradient of f(·) at x̄.

Proof: Because epi f(·) is a convex set and (x̄, f(x̄)) belongs to the boundary of epi f(·), there exists a supporting hyperplane to epi f(·) at (x̄, f(x̄)). Thus there exists a nonzero vector (ξ, u) ∈ ℝⁿ⁺¹ such that

ξᵗx + uα ≤ ξᵗx̄ + u f(x̄) for all (x, α) ∈ epi f(·).

Let α be any scalar larger than f(x). Then as we make α arbitrarily large, the inequality must still hold. Thus u ≤ 0. If u < 0, we can re-scale (ξ, u) so that u = −1. Then ξᵗx − α ≤ ξᵗx̄ − f(x̄), which upon rearranging yields:

α ≥ f(x̄) + ξᵗ(x − x̄) for all (x, α) ∈ epi f(·).

In particular, with α = f(x), f(x) ≥ f(x̄) + ξᵗ(x − x̄), proving the theorem.

It remains to show that u = 0 is an impossibility. If u = 0, then ξᵗx ≤ ξᵗx̄ for all x ∈ S. But since x̄ ∈ int S, x̄ + θξ ∈ S for θ > 0 and sufficiently small. Thus ξᵗ(x̄ + θξ) ≤ ξᵗx̄, and so θξᵗξ ≤ 0. But this is a contradiction unless ξ = 0, which cannot be true since (ξ, u) ≠ (0, 0).


The collection of subgradients of f (x) at x = x is denoted by f ( x) and
is called the subdierential of f (). We write f (
x) if is a subgradient
of f (x) at x = x
.
13

Theorem 13 Let S ⊆ ℝⁿ be a convex set, and let f(·) : S → ℝ be a function defined on S. Suppose that for each x̄ ∈ int S there exists a subgradient vector ξ; that is, for each x̄ ∈ int S there exists a vector ξ such that f(x) ≥ f(x̄) + ξᵗ(x − x̄) for all x ∈ S. Then f(·) is a convex function on int S.

Theorem 14 Let f(·) be convex on ℝⁿ, let S be a convex set, and consider the problem:

P:  min_x  f(x)
    s.t.   x ∈ S.

Then x̄ ∈ S is an optimal solution of P if and only if f(·) has a subgradient ξ at x̄ such that ξᵗ(x − x̄) ≥ 0 for all x ∈ S.

Proof: Suppose f(·) has a subgradient ξ at x̄ for which ξᵗ(x − x̄) ≥ 0 for all x ∈ S. Then f(x) ≥ f(x̄) + ξᵗ(x − x̄) ≥ f(x̄) for all x ∈ S. Hence x̄ is an optimal solution of P.

Conversely, suppose x̄ is an optimal solution of P. Define the following sets:

A = {(d, α) ∈ ℝⁿ⁺¹ | f(x̄ + d) < α + f(x̄)}

and

B = {(d, α) ∈ ℝⁿ⁺¹ | x̄ + d ∈ S, α ≤ 0}.

Both A and B are convex sets. Also A ∩ B = ∅. (If not, there is a d and an α ≤ 0 such that f(x̄ + d) < α + f(x̄) ≤ f(x̄) and x̄ + d ∈ S, so x̄ is not an optimal solution.)

Therefore A and B can be separated by a hyperplane H = {(d, α) ∈ ℝⁿ⁺¹ | ξᵗd + uα = β} with (ξ, u) ≠ 0, such that:

f(x̄ + d) < α + f(x̄)  implies  ξᵗd + uα ≤ β

x̄ + d ∈ S, α ≤ 0  implies  ξᵗd + uα ≥ β

In the first implication α can be made arbitrarily large, so this means u ≤ 0. Also, in the first implication, setting d = 0 and α = ε > 0 implies that εu ≤ β for every ε > 0, and so β ≥ 0. In the second implication, setting d = 0 and α = 0 implies that β ≤ 0. Thus β = 0 and u ≤ 0. In the second implication, setting α = 0 we have ξᵗd ≥ 0 whenever x̄ + d ∈ S, and so ξᵗ(x̄ + d − x̄) ≥ 0 whenever x̄ + d ∈ S. Put another way, x ∈ S implies that ξᵗ(x − x̄) ≥ 0.

It only remains to show that ξ is a subgradient. Note that u < 0, for if u = 0 it would follow from the first implication that ξᵗd ≤ 0 for every d, a contradiction unless ξ = 0, which cannot be true since (ξ, u) ≠ (0, 0). Since u < 0, we can re-scale so that u = −1.

Now let d be given so that x̄ + d ∈ S, and let α = f(x̄ + d) − f(x̄) + ε for some ε > 0. Then f(x̄ + d) < α + f(x̄), and it follows from the first implication that ξᵗd − f(x̄ + d) + f(x̄) − ε ≤ 0 for all ε > 0. Thus f(x̄ + d) ≥ f(x̄) + ξᵗd for all x̄ + d ∈ S. Setting x = x̄ + d, we have that if x ∈ S,

f(x) ≥ f(x̄) + ξᵗ(x − x̄),

and so ξ is a subgradient of f(·) at x̄.
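Theorem 14 can be illustrated on a small example (my own, not from the notes): minimize f(x) = |x − 2| over S = [0, 1]. The optimum is x̄ = 1, where ξ = −1 is a subgradient and ξ(x − x̄) = 1 − x ≥ 0 on S:

```python
# Illustrative check (mine) of Theorem 14: minimize f(x) = |x - 2| over
# S = [0, 1]. At the optimum xbar = 1, xi = -1 is a subgradient of f and
# xi*(x - xbar) >= 0 for all x in S, certifying optimality.
f = lambda x: abs(x - 2)
xbar, xi = 1.0, -1.0
grid = [i / 100 for i in range(0, 101)]       # sample of S = [0, 1]

# xi is a subgradient of f at xbar ...
assert all(f(x) >= f(xbar) + xi * (x - xbar) - 1e-12 for x in grid)
# ... and xi*(x - xbar) >= 0 on S, so Theorem 14 applies:
assert all(xi * (x - xbar) >= -1e-12 for x in grid)
# indeed xbar minimizes f over the sampled S
assert all(f(x) >= f(xbar) for x in grid)
```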

7 Differentiable Convex Functions

Let f(·) : S → ℝ be given. f(·) is differentiable at x̄ ∈ int S if there is a vector ∇f(x̄), the gradient vector, such that

f(x) = f(x̄) + ∇f(x̄)ᵗ(x − x̄) + ‖x − x̄‖ α(x̄; x − x̄),

where lim_{x→x̄} α(x̄; x − x̄) = 0. Note that

∇f(x̄) = ( ∂f(x̄)/∂x1, …, ∂f(x̄)/∂xn )ᵗ.

Lemma 7.1 Let f(·) : S → ℝ be convex. If f(·) is differentiable at x̄ ∈ int S, then the collection of subgradients of f(·) at x̄ is the singleton set {∇f(x̄)}, i.e., ∂f(x̄) = {∇f(x̄)}.

Proof: The set of subgradients of f(·) at x̄ is nonempty by Theorem 12. Let ξ be a subgradient of f(·) at x̄. Then for any d and any λ > 0 we have:

f(x̄ + λd) ≥ f(x̄) + ξᵗ(x̄ + λd − x̄) = f(x̄) + λξᵗd.

Also,

f(x̄ + λd) = f(x̄) + λ∇f(x̄)ᵗd + λ‖d‖ α(x̄; λd).

Subtracting, we have

0 ≥ λ(ξ − ∇f(x̄))ᵗd − λ‖d‖ α(x̄; λd),

which is equivalent to:

0 ≥ (ξ − ∇f(x̄))ᵗd − ‖d‖ α(x̄; λd).

As λ → 0 we have α(x̄; λd) → 0, so (ξ − ∇f(x̄))ᵗd ≤ 0 for any d. This can only mean that ξ = ∇f(x̄).

Theorem 15 If f(·) : S → ℝ is differentiable, then f(·) is convex if and only if

f(x) ≥ f(x̄) + ∇f(x̄)ᵗ(x − x̄) for all x, x̄ ∈ int S.
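The gradient inequality of Theorem 15 can be spot-checked for example 5 from Section 5, f(x1, x2) = 2x1² + x2² − 2x1x2 (sketch mine, not from the notes; the gradient formula is computed by hand):

```python
import random

# Illustrative check (mine) of Theorem 15 for f(x1, x2) = 2*x1^2 + x2^2
# - 2*x1*x2: the gradient is (4*x1 - 2*x2, 2*x2 - 2*x1), and convexity
# is equivalent to f(x) >= f(xbar) + grad f(xbar).(x - xbar).
def f(x):
    return 2 * x[0] ** 2 + x[1] ** 2 - 2 * x[0] * x[1]

def grad_f(x):
    return (4 * x[0] - 2 * x[1], 2 * x[1] - 2 * x[0])

rng = random.Random(7)
for _ in range(1000):
    x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    xb = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    g = grad_f(xb)
    linear = f(xb) + g[0] * (x[0] - xb[0]) + g[1] * (x[1] - xb[1])
    assert f(x) >= linear - 1e-9   # the graph lies above every tangent plane
```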

8 Exercises on Convex Sets and Functions

1. Let K* denote the dual cone of the closed convex cone K ⊆ ℝⁿ, defined by:

K* := {y ∈ ℝⁿ | yᵗx ≥ 0 for all x ∈ K}.

Prove that (K*)* = K, thus demonstrating that the dual of the dual of a closed convex cone is the original cone.

2. Let ℝⁿ₊ denote the nonnegative orthant, namely ℝⁿ₊ = {x ∈ ℝⁿ | xj ≥ 0, j = 1, …, n}. Considering ℝⁿ₊ as a cone, prove that (ℝⁿ₊)* = ℝⁿ₊, thus showing that ℝⁿ₊ is self-dual.
3. Let Qⁿ = {x ∈ ℝⁿ | x1 ≥ √( Σ_{j=2}^{n} xj² )}. Qⁿ is called the second-order cone, the Lorentz cone, or the ice-cream cone (I am not making this up). Considering Qⁿ as a cone, prove that (Qⁿ)* = Qⁿ, thus showing that Qⁿ is self-dual.

4. Prove Lemma 1 of the notes on Analysis of Convex Sets.



5. Let S1 and S2 be two nonempty sets in ℝⁿ, and define S1 ⊕ S2 := {x | x = x1 + x2 for some x1 ∈ S1, x2 ∈ S2}.

(a) Show that S1 ⊕ S2 is a convex set if both S1 and S2 are convex sets.

(b) Show by an example that S1 ⊕ S2 is not necessarily a closed set, even if both S1 and S2 are closed convex sets.

(c) Show that if either S1 or S2 is a bounded convex set, and both S1 and S2 are closed sets, then S1 ⊕ S2 is a closed set.

6. Suppose that f(·) is a convex function on ℝⁿ. Prove that ξ is a subgradient of f(·) at x̄ if and only if

f′(x̄; d) ≥ ξᵗd

for all directions d ∈ ℝⁿ.

7. Prove Theorem 13 of the notes.

8. Consider a function f(·) : ℝⁿ → ℝ ∪ {−∞} ∪ {+∞}. Such a function is called an extended real-valued function. The epigraph of f(·) is defined as:

epi f(·) := {(x, α) | f(x) ≤ α}.

We define f(·) to be a convex function if epi f(·) is a convex set. Show that if f(·) : ℝⁿ → ℝ, then this definition of a convex function is equivalent to the standard definition of a convex function.

9. In class, we stated the meta result that the study of convex functions reduces to the study of convex functions on the real line. This exercise formalizes this observation and shows how we might use it to obtain results about convex functions from results about convex functions on the real line.

(a) Let f(x) be a real-valued function defined on an open set X ⊆ ℝⁿ. Show that f(x) is a convex function if and only if for any two points x, y ∈ X, the function g(λ) := f(λx + (1 − λ)y) of the scalar λ is convex on the open interval (0, 1).

(b) Suppose that we have proved that a twice differentiable function g(λ) of the scalar λ is convex on the open interval (0, 1) if and only if its second derivative is nonnegative for every point in this interval. Use this fact, part (a), and the chain rule to show that a twice differentiable real-valued function f(x) defined on an open set X ⊆ ℝⁿ is convex if and only if its Hessian matrix is positive semi-definite at every point in X.

10. (a) Let f(x) be a real-valued function defined on an open interval I = (l, u) of the real line. For any two points a < b in (l, u), let S(a, b) := (f(b) − f(a))/(b − a) be the secant slope of f(x) between the points a and b. Prove the following result:

Three Chord Lemma: f(x) is a convex function on the interval I = (l, u) of the real line if and only if for any three points a < b < c in I we have S(a, b) ≤ S(a, c) ≤ S(b, c).

(b) Use the Three Chord Lemma and the mean value theorem to show that a twice differentiable function f(x) of the scalar x is convex on the open interval I = (l, u) of the real line if and only if its second derivative is nonnegative for every point in this interval. (Hint: what does the Three Chord Lemma in part (a) say about the relationship between the derivative of f(x) at the points a and b?)
11. Let S be a nonempty bounded convex set in ℝⁿ, and define f(·) : ℝⁿ → ℝ as follows:

f(y) := sup{yᵗx | x ∈ S}.

The function f(·) is called the support function of S.

(a) Prove that f(·) is a convex function.

(b) Suppose that f(ȳ) = ȳᵗx̄ for some x̄ ∈ S. Show that x̄ ∈ ∂f(ȳ).
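A sketch of the objects in this exercise (mine, not from the notes; the vertex-maximization shortcut uses the fact that a linear function attains its sup over a polytope at a vertex):

```python
import random

# Illustrative sketch (mine) of exercise 11: for a polytope S = H(v1,...,vk),
# the support function f(y) = sup{y.x | x in S} can be computed as a max
# over the vertices.
def support(vertices, y):
    return max(sum(yi * vi for yi, vi in zip(y, v)) for v in vertices)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
assert support(square, (1, 1)) == 2      # attained at the vertex (1, 1)
assert support(square, (-1, 0)) == 0     # attained on the face x1 = 0

# part (a): spot-check that f is convex in y
rng = random.Random(3)
for _ in range(500):
    y1 = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    y2 = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    lam = rng.random()
    mid = tuple(lam * a + (1 - lam) * b for a, b in zip(y1, y2))
    bound = lam * support(square, y1) + (1 - lam) * support(square, y2)
    assert support(square, mid) <= bound + 1e-9
```

Convexity here is no accident: f is a pointwise supremum of the linear functions y ↦ yᵗx, which is the core of part (a).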
12. Let f(·) : ℝⁿ → ℝ be a convex function. Show that if ξ ∈ ∂f(x̄), then the hyperplane

H := {(x, α) | α = f(x̄) + ξᵗ(x − x̄)}

is a supporting hyperplane of epi f(·) at (x̄, f(x̄)).

13. Let f(·) : ℝⁿ → ℝ be a convex function, and let x̄ be given. Show that ∂f(x̄) is a closed convex set.

14. Let f1(·), …, fk(·) : ℝⁿ → ℝ be differentiable convex functions, and define:

f(x) := max{f1(x), …, fk(x)}.

Let I(x) := {i | f(x) = fi(x)}. Show that

∂f(x̄) = { ξ | ξ = Σ_{i∈I(x̄)} λi ∇fi(x̄),  Σ_{i∈I(x̄)} λi = 1,  λi ≥ 0 for i ∈ I(x̄) }.

15. Consider the function L*(u) defined by the following optimization problem, where X is a compact polyhedral set:

L*(u) := min_x  cᵗx + uᵗ(Ax − b)
         s.t.   x ∈ X.

(a) Show that L*(·) is a concave function.

(b) Characterize the set of subgradients of L*(·) at any given u = ū.
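A toy illustration of part (a) (my own data, not from the notes): since the compact polyhedral set X has finitely many extreme points, L*(u) is the minimum of finitely many affine functions of u, hence concave. The sketch checks the concavity inequality on random pairs:

```python
import random

# Illustrative sketch (mine) of exercise 15(a): with X a compact polyhedral
# set, the min defining L*(u) is attained at an extreme point of X, so
# L*(u) is a min of affine functions of u -- hence concave. Toy data below
# (n = 2, a single constraint row, so u is a scalar).
def L_star(u, c, A_row, b, vertices):
    return min(
        sum(ci * xi for ci, xi in zip(c, x))
        + u * (sum(ai * xi for ai, xi in zip(A_row, x)) - b)
        for x in vertices
    )

c, A_row, b = [1.0, 2.0], [1.0, 1.0], 1.0
X = [(0, 0), (1, 0), (0, 1)]                 # vertices of a simplex

rng = random.Random(5)
for _ in range(500):
    u1, u2 = rng.uniform(-10, 10), rng.uniform(-10, 10)
    lam = rng.random()
    mid = lam * u1 + (1 - lam) * u2
    # concavity: L*(lam*u1 + (1-lam)*u2) >= lam*L*(u1) + (1-lam)*L*(u2)
    lhs = L_star(mid, c, A_row, b, X)
    rhs = lam * L_star(u1, c, A_row, b, X) + (1 - lam) * L_star(u2, c, A_row, b, X)
    assert lhs >= rhs - 1e-9
```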
