
Exercises for Optimization and Algorithms

João Xavier
Instituto Superior Técnico

September 2014

1 Formulation of optimization problems


1. Target localization. You want to locate a target in Rn (n = 2 or 3). You have sensors at positions
ai ∈ Rn , i = 1, . . . , m. Each sensor provides a noisy measurement of its distance to the target:
sensor i gives you
    d_i = ‖x − a_i‖ + “noise”,
where x is the target’s location. Formulate the problem of estimating x from the available measurements d_1, d_2, . . . , d_m, as an optimization problem.
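One natural formulation is a nonlinear least-squares fit: choose the x whose predicted distances best match the measurements. A minimal numerical sketch, in which the sensor positions, noise level, and the helper `residuals` are illustrative assumptions rather than part of the exercise:

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical 2-D setup: a true target and four sensors (illustrative values).
rng = np.random.default_rng(0)
x_true = np.array([1.0, 2.0])
anchors = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 4.0]])
d = np.linalg.norm(anchors - x_true, axis=1) + 0.01 * rng.standard_normal(4)

# Residuals r_i(x) = ||x - a_i|| - d_i; least_squares minimizes sum_i r_i(x)^2.
def residuals(x):
    return np.linalg.norm(anchors - x, axis=1) - d

x_hat = least_squares(residuals, x0=np.array([2.0, 2.0])).x
```

With small noise and well-spread sensors, the estimate lands close to the true position.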

2. Optimal grades. You are enrolled in m courses for the semester. If you study a total of t_i hours for course i during the semester, your corresponding grade will be g_i(t_i) = 20(1 − e^{−t_i/d_i}). The constants d_i > 0 are given and represent the intrinsic difficulty of each course. You have T hours available for studying during the semester. Formulate the problem of choosing t_1, . . . , t_m, the study time that should be allocated to each course, as an optimization problem.

3. Linear receiver. A transmitter sends information symbols by modulating the amplitude of a carrier:
for each information symbol s ∈ R the transmitter emits the signal x(t) = sc(t), t = 1, . . . , T ,
where c(t) = cos(2π f0 t) is the carrier.
The receiver observes x(t) in additive noise plus an interference: y(t) = x(t) + n(t) + Ai(t),
t = 1, . . . , T , where y(t) is the signal measured at the receiver, n(t) denotes noise, and i(t) is the
interference signal with amplitude A. The receiver knows the carrier c(t) and the interference
i(t) but not the amplitude A nor the noise signal n(t) (of course, the receiver does not know the
information symbol either). The goal of the receiver is to estimate the information symbol s from
the available measurements y(1), . . . , y(T).
We want to design a linear receiver, i.e., a receiver that estimates the information symbol via a
weighted combination of the measurements, ŝ = w_1 y(1) + w_2 y(2) + · · · + w_T y(T). Note that, for
a given choice of weights w_1, . . . , w_T, there holds

    ŝ = s (∑_{t=1}^T w_t c(t)) + ∑_{t=1}^T w_t n(t) + A (∑_{t=1}^T w_t i(t)).

Formulate the problem of choosing the weights w1 , . . . , w T as an optimization problem.

4. Intruder detection. An intruder moves on the vertices of a given graph G = (V, E) where V =
{1, 2, . . . , N } is the node set and E is the edge set. There is a sensor at each node of the graph.
If the intruder is at node i, the sensor at that node detects him with probability α_i and flags a
variable Z_i = 1 (accordingly, the sensor misses the intruder with probability 1 − α_i and sets
Z_i = 0). If the intruder is not at node i, the sensor at node i may still report a (false) alarm, i.e.,
the sensor flags Z_i = 1 with probability β_i and sets Z_i = 0 with probability 1 − β_i. (Ideal sensors
correspond to αi = 1 and βi = 0.)
We have sensors’ measurements for T consecutive time instants: (Z(1), Z(2), . . . , Z(T )). The
vector Z(t) = (Z1 (t), Z2 (t), . . . , ZN (t)) represents the sensors’ readings at time t.
Formulate the problem of finding the most probable path of the intruder, given the measurements,
as an optimization problem. Use binary variables pn (t) ∈ {0, 1} to denote if the intruder is at node
n at time t (pn (t) = 1) or not (pn (t) = 0).

2 Convex functions

5. Convex functions? Check which of the following functions f : R → R are convex.

(a) f(x) = (|x| + 1)²
(b) f(x) = (|x| − 1)²
(c) f(x) = log(1 + eˣ)
(d) f(x) = x₊ where x₊ := max{x, 0}
(e) f(x) = ½ x₊².

6. Functions of discrete-time signals. We represent discrete-time signals of duration T as vectors in
R^T: x = (x_1, x_2, . . . , x_T). The following functions f : R^T → R compute characteristics of such
signals:

(a) Average: f(x) = (x_1 + · · · + x_T)/T
(b) Variance: f(x) = (1/T) ∑_{t=1}^T (x_t − (x_1 + · · · + x_T)/T)²
(c) Energy: f(x) = x_1² + x_2² + · · · + x_T²
(d) Peak: f(x) = max{|x_1|, . . . , |x_T|}
(e) Root mean square (RMS): f(x) = √((1/T) ∑_{t=1}^T x_t²)
(f) Largest increase: f(x) = max{(x_2 − x_1)₊, (x_3 − x_2)₊, . . . , (x_T − x_{T−1})₊}
(g) Dynamic range: f(x) = max{x_1, . . . , x_T} − min{x_1, . . . , x_T}.

Show that all these functions are convex.

7. Least squares. A least-squares problem is an optimization problem of the form

    minimize_x ‖Ax − b‖²        (1)

where A ∈ R^{m×n} and b ∈ R^m are given. Show that (1) is a convex optimization problem.
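As a quick numerical companion to (1): assuming A has full column rank, the minimizer can be computed from the normal equations A^T A x = A^T b and cross-checked against a library solver. The data below is arbitrary:

```python
import numpy as np

# Small illustrative least-squares instance (values are arbitrary).
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# The minimizer of ||Ax - b||^2 solves the normal equations A^T A x = A^T b.
x_star = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against the library solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
```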

8. Signal denoising. We observe the signal y t = x t + n t for t = 1, . . . , T , where n t denotes noise and
x t is the signal of interest—known to be usually nondecreasing, i.e., x t ≥ x t−1 for almost all t. To
recover x = (x 1 , . . . , x T ) from the measurement y = ( y1 , . . . , y T ) we formulate the optimization
problem
    minimize_x ‖y − x‖² + ρ ∑_{t=2}^T (x_{t−1} − x_t)₊        (2)

where ρ > 0 is given. Show that (2) is a convex optimization problem.

9. Optimal coverage. We want to cover given points x i ∈ Rn , i = 1, . . . , m, with a ball

    B(c, R) = {x : ‖x − c‖ ≤ R}.

If point x_i is not covered by the ball, we pay the quadratic penalty ω_i (‖x_i − c‖ − R)², where
ω_i ≥ 0 is given; note that ‖x_i − c‖ − R is the distance from x_i to B(c, R). We are given the
radius R > 0 and we want to find the center c. The optimal placement of the ball corresponds to
solving

    minimize_c ∑_{i=1}^m ω_i (‖x_i − c‖ − R)₊².        (3)

Show that (3) is a convex optimization problem.

10. Logistic regression. Logistic regression is a popular approach to model the influence of several
explanatory variables X = (X 1 , . . . , X n ) on a binary outcome Y ∈ {0, 1}. For example, Y may
represent the absence or presence of a certain disease and the X_i’s might represent age, weight,
blood pressure, etc. In logistic regression, we assume the parametric model
    Prob(Y | X, s, r) = (e^{s^T X + r} / (1 + e^{s^T X + r}))^Y (1 / (1 + e^{s^T X + r}))^{1−Y},

where s = (s1 , . . . , sn ) and r are the model parameters.


We are given independent observations (x k , yk ) ∈ Rn × {0, 1}, k = 1, . . . , K, where

x k = (X 1k , . . . , X nk ) ∈ Rn

contains the values of the explanatory variables in the kth example and y_k ∈ {0, 1} is the corresponding binary outcome. Computing the maximum likelihood estimates of the model parameters,
for the given dataset, corresponds to solving
    maximize_{s,r} ∑_{k=1}^K log P(y_k | x_k, s, r).        (4)

Show that (4) is a convex optimization problem.
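A minimal numerical sketch of (4) on a made-up dataset with one explanatory variable; the dataset and starting point are illustrative only. Writing log(1 + e^z) as logaddexp(0, z) keeps the objective numerically stable:

```python
import numpy as np
from scipy.optimize import minimize

# Toy dataset (illustrative): one explanatory variable, K = 8 examples.
xs = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
ys = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Negative log-likelihood of (4), with log(1 + e^z) written as logaddexp(0, z)
# for numerical stability.  theta = (s_1, ..., s_n, r).
def nll(theta):
    z = xs @ theta[:-1] + theta[-1]
    return np.sum(np.logaddexp(0.0, z) - ys * z)

theta_hat = minimize(nll, x0=np.zeros(2)).x
```

Since the problem is convex, any local solver run from any starting point reaches the same likelihood value.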

3 Optimization with linear equality constraints

11. Maximum entropy. A random variable X taking values in the finite alphabet A = {a1 , a2 , . . . , an } ⊂
R is characterized by its probability mass function (pmf):

P(X = ai ) = pi , i = 1, . . . , n.
Note that p_i ≥ 0 for all i and ∑_{i=1}^n p_i = 1. Assume a_1 < a_2 < · · · < a_n.
The entropy of X measures how “uncertain” the variable is:
    H(X) = ∑_{i=1}^n p_i log(1/p_i)

where 0 log(1/0) := 0. Note that the entropy is a function of the pmf p = (p1 , . . . , pn ).
Suppose that we do not know the pmf of X . We only know that the expected value of X is a given
µ ∈ (a_1, a_n):

    ∑_{i=1}^n p_i a_i = µ.

The goal of this exercise is to find the maximum entropy pmf with expected value µ; this corresponds to solving the optimization problem

    minimize_{p>0} −∑_{i=1}^n p_i log(1/p_i)
    subject to     a^T p = µ
                   1^T p = 1,

where a = (a1 , . . . , an ).

(a) Show that the optimal pmf is given by

    p_i(λ) = e^{a_i λ} / ∑_{k=1}^n e^{a_k λ},   i = 1, . . . , n,

where λ ∈ R solves
    ∑_{i=1}^n a_i p_i(λ) = µ.        (5)
(b) Show that φ(λ) = ∑_{i=1}^n a_i p_i(λ) is a strictly increasing function. Hint: show that

        φ̇(λ) = ∑_{i=1}^n p_i(λ)(a_i − µ(λ))²,        (6)

    where µ(λ) = ∑_{i=1}^n a_i p_i(λ). (The right-hand side of (6) can be interpreted as the variance
    of a random variable with pmf (p_1(λ), . . . , p_n(λ)).)
(c) Suggest a simple bisection method to find the solution of (5).
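A sketch of such a bisection on an illustrative alphabet and target mean. Since φ is strictly increasing with range (a_1, a_n), any bracket [lo, hi] with φ(lo) < µ < φ(hi) works; the bracket width used here is an arbitrary choice:

```python
import numpy as np

# Illustrative alphabet and target mean.
a = np.array([1.0, 2.0, 3.0, 4.0])
mu = 1.8

def p(lam):
    # Softmax of a*lam; subtracting the max avoids overflow.
    w = np.exp(a * lam - np.max(a * lam))
    return w / w.sum()

def phi(lam):
    return a @ p(lam)

# phi is strictly increasing, so plain bisection on lambda solves (5).
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if phi(mid) < mu:
        lo = mid
    else:
        hi = mid
lam_star = 0.5 * (lo + hi)
```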

12. Bandwidth allocation. You have a bandwidth B > 0 to split among n clients. Each client has a
utility function that depends on the amount of bandwidth he gets:

    U_i(B_i) = α_i log(B_i / β_i),   for client i = 1, . . . , n.

The constants α_i > 0, β_i > 0 are given.

(a) Find the bandwidth allocation that maximizes the overall utility, i.e., solve the optimization
    problem

        maximize_{B_1,...,B_n>0} ∑_{i=1}^n U_i(B_i)        (7)
        subject to               B_1 + · · · + B_n = B

(b) Re-solve (7) but with the utility functions U_i(B_i) = −γ_i/(B_i + δ_i), where γ_i > 0, δ_i > 0 are given
    constants.
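For part (a), a short numerical check of the closed form suggested by the stationarity condition α_i/B_i = λ, which makes each B_i proportional to α_i (the numbers are illustrative, and this is a sanity check rather than the worked solution):

```python
import numpy as np

# Illustrative data for part (a).
alpha = np.array([1.0, 2.0, 3.0])
B = 6.0

# Stationarity of the Lagrangian gives alpha_i / B_i = lambda for every i,
# so B_i is proportional to alpha_i; the budget constraint fixes the constant.
B_alloc = B * alpha / alpha.sum()

# Overall utility for a feasible allocation b (beta only shifts the value).
def utility(b, beta):
    return np.sum(alpha * np.log(b / beta))
```

Note that the optimal split does not depend on the β_i, which only add a constant to the objective.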

13. Projection onto an ellipsoid. Consider the ellipsoid


    E = {(x_1, . . . , x_n) : x_1²/R_1² + · · · + x_n²/R_n² = 1},
where R i > 0 are given for i = 1, . . . , n. We want to project the point a = (a1 , . . . , an ) onto E , i.e.,
we want to solve

    minimize_{x_1,...,x_n} ½ ∑_{i=1}^n (x_i − a_i)²        (8)
    subject to             x_1²/R_1² + · · · + x_n²/R_n² = 1.
Assume that ai > 0 for i = 1, . . . , n.

(a) Argue that the solution must satisfy x i > 0 for i = 1, . . . , n. Thus, problem (8) is equivalent
to

    minimize_{x_1,...,x_n>0} ½ ∑_{i=1}^n (x_i − a_i)²        (9)
    subject to               x_1²/R_1² + · · · + x_n²/R_n² = 1.

(b) Find an efficient algorithm for solving (9). Hint: note that the constraints in (9) are not
linear. Explore the change of variables y_i := x_i².

4 Optimization with linear inequality constraints

14. Projection onto the probability simplex. The set ∆_n = {x ∈ R^n : x ≥ 0, 1^T x = 1} is known as
the probability simplex: each point x = (x_1, . . . , x_n) in ∆_n can be interpreted as the probability
mass function of a random variable over an alphabet with n symbols (x_i is the probability of the
random variable taking on the ith symbol).
Consider the problem of projecting a point a = (a_1, . . . , a_n) ∈ R^n onto the probability simplex,
    minimize_x ½ ‖x − a‖²
    subject to x ≥ 0
               1^T x = 1.

(a) Show that the first-order necessary optimality conditions can be written as

        x_i − a_i = λ + µ_i,   i = 1, . . . , n
        x_1 + · · · + x_n = 1
        x_i ≥ 0,   i = 1, . . . , n
        µ_i ≥ 0,   i = 1, . . . , n
        x_i µ_i = 0,   i = 1, . . . , n,

    where λ, µ_1, . . . , µ_n denote Lagrange multipliers.


(b) Let c ∈ R be given. Show that the solution of the system

        a − b = c
        a ≥ 0
        b ≥ 0
        ab = 0

    with unknowns a and b is given by a = c₊ := max{c, 0} and b = c₋ := max{−c, 0}.


(c) Use part (b) to find an efficient algorithm that solves the system in (a).
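A sketch of one such algorithm: combining parts (a) and (b) suggests x_i = max(a_i + λ, 0), with λ tuned by bisection so that the entries sum to one (the left-hand side of that condition is nondecreasing in λ). The helper name `project_simplex`, the brackets, and the test point are my own choices:

```python
import numpy as np

# Parts (a) and (b) give x_i = max(a_i + lam, 0); choose lam by bisection so
# that sum_i x_i = 1, using that the sum is nondecreasing in lam.
def project_simplex(a):
    a = np.asarray(a, dtype=float)
    lo, hi = -a.max(), -a.min() + 1.0   # sum is 0 at lo and >= 1 at hi
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if np.maximum(a + lam, 0.0).sum() < 1.0:
            lo = lam
        else:
            hi = lam
    return np.maximum(a + 0.5 * (lo + hi), 0.0)

x = project_simplex([0.6, 0.9, -0.3])
```

For this test point the active coordinates are the first two, giving x = (0.35, 0.65, 0).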

15. Optimal power allocation. Consider the problem


    maximize_{P_1,...,P_n} ∑_{i=1}^n log(1 + P_i/N_i)
    subject to             P_i ≥ 0,   i = 1, . . . , n
                           P_1 + · · · + P_n = P_0.

This problem arises in communication theory. The goal is to assign powers to n transmitters in
order to maximize the communication rate of n parallel additive white Gaussian noise channels.
The constant P0 > 0 is the total power budget and Ni > 0 represents the power of the observation
noise in the ith channel.

(a) Show that the first-order necessary optimality conditions can be written as

        −1/(P_i + N_i) = λ + µ_i,   i = 1, . . . , n
        P_1 + · · · + P_n = P_0
        P_i ≥ 0,   i = 1, . . . , n
        µ_i ≥ 0,   i = 1, . . . , n
        P_i µ_i = 0,   i = 1, . . . , n,

    where λ, µ_1, . . . , µ_n denote Lagrange multipliers.


(b) Show that if (P1 , . . . , Pn , λ, µ1 , . . . , µn ) solves the system in (a) then λ < 0.
(c) Show that the system in (a) is equivalent to

        −λP_i − N_i µ_i = λN_i + 1,   i = 1, . . . , n
        P_1 + · · · + P_n = P_0
        P_i ≥ 0,   i = 1, . . . , n
        µ_i ≥ 0,   i = 1, . . . , n
        P_i µ_i = 0,   i = 1, . . . , n.

(d) Change variables as η := −1/λ (this is well defined since λ < 0) and show that the system in
    part (c) is equivalent to

        P_i − ηN_i µ_i = η − N_i,   i = 1, . . . , n
        P_1 + · · · + P_n = P_0
        P_i ≥ 0,   i = 1, . . . , n
        µ_i ≥ 0,   i = 1, . . . , n
        P_i µ_i = 0,   i = 1, . . . , n.

(e) Change variables as ν_i := ηN_i µ_i and show that the system in part (d) is equivalent to

        P_i − ν_i = η − N_i,   i = 1, . . . , n
        P_1 + · · · + P_n = P_0
        P_i ≥ 0,   i = 1, . . . , n
        ν_i ≥ 0,   i = 1, . . . , n
        P_i ν_i = 0,   i = 1, . . . , n.

(f) Find an efficient algorithm to solve the system in part (e). Can you see why the optimal
power assignments resemble water filling?
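A sketch of part (f), assuming the system in (e) is solved by P_i = max(η − N_i, 0) and ν_i = max(N_i − η, 0), with the common level η found by bisection (the sum of the P_i is nondecreasing in η). The function name and data are illustrative:

```python
import numpy as np

# "Water filling": pour total power P0 over noise levels N; each channel
# receives P_i = max(eta - N_i, 0), where eta is the water level.
def water_fill(N, P0):
    N = np.asarray(N, dtype=float)
    lo, hi = N.min(), N.max() + P0   # sum of powers is 0 at lo and >= P0 at hi
    for _ in range(100):
        eta = 0.5 * (lo + hi)
        if np.maximum(eta - N, 0.0).sum() < P0:
            lo = eta
        else:
            hi = eta
    return np.maximum(0.5 * (lo + hi) - N, 0.0)

P = water_fill([1.0, 2.0, 4.0], P0=3.0)
```

Here the level settles at η = 3, so the noisiest channel (N_3 = 4) stays above water and receives no power.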

16. Resource allocation. Consider the problem


    minimize_{x_1,...,x_n} ∑_{i=1}^n f_i(x_i)        (10)
    subject to             x_1 + x_2 + · · · + x_n = B
                           x_i ≥ 0,   i = 1, . . . , n.

We can interpret x_i as the amount of a resource (e.g., time or money) we allocate to task i, and
− f i (x i ) as the reward we receive back. The available budget is B > 0. The goal is to spread our
budget across the tasks in order to maximize the reward.

(a) Assume that each function f_i : R → R is differentiable. Write the first-order necessary
    conditions for problem (10) in terms of the variables (x_1, . . . , x_n) and the Lagrange multipliers
    λ and µ_1, . . . , µ_n (associated with the equality and inequality constraints, respectively).
(b) Let (x_1*, x_2*, . . . , x_n*) be a solution of (10). Use part (a) to show that there exists an η such that

        ḟ_i(x_i*) = η,  if x_i* > 0
        ḟ_i(x_i*) ≥ η,  if x_i* = 0.        (11)

(c) Suppose η and (x_1*, . . . , x_n*) satisfy (11). Choose λ and µ_1, . . . , µ_n such that all the conditions
    in part (a) are fulfilled, except for the constraint x_1 + · · · + x_n = B.
(d) Assume that f̈_i > 0 for i = 1, . . . , n. This means that all f_i’s are convex with strictly increasing
    derivative. Suggest a method for solving the system in part (a). Hint: try to design a
    bisection method that finds a suitable η.
(e) Apply your approach from part (d) to problem 2: make your method as explicit as possible.
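For part (e), a sketch under the grade model of problem 2, i.e., f_i(t) = −20(1 − e^{−t/d_i}), so ḟ_i(t) = −(20/d_i)e^{−t/d_i} and (11) gives t_i(η) = max(−d_i log(−η d_i/20), 0) for η < 0. The bracket choices and the helper name `study_times` are my assumptions:

```python
import numpy as np

# Bisection on eta for problem 2: total study time sum_i t_i(eta) increases
# with eta, so bisection enforces the budget T.
def study_times(d, T):
    d = np.asarray(d, dtype=float)

    def t_of(eta):
        # Inverts f_i'(t) = eta on t >= 0 for f_i(t) = -20(1 - exp(-t/d_i)).
        return np.maximum(-d * np.log(-eta * d / 20.0), 0.0)

    lo, hi = -20.0 / d.min(), -1e-8   # all t_i are 0 at lo; very large at hi
    for _ in range(200):
        eta = 0.5 * (lo + hi)
        if t_of(eta).sum() < T:
            lo = eta
        else:
            hi = eta
    return t_of(0.5 * (lo + hi))

t = study_times([1.0, 2.0], T=5.0)
```

With d = (1, 2) and T = 5, both courses get study time and the harder course gets more; solving (11) by hand gives t = (ln 2 + t_2/2, (10 − 2 ln 2)/3) ≈ (2.129, 2.871).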

5 Optimization with nonlinear constraints

17. Toy problem 1. Consider the optimization problem

    minimize_{x,y} −x − y        (12)
    subject to     y ≤ 3
                   y ≥ x² − 1.

(a) Sketch the feasible region and some level curves of the objective function.
(b) Use the KKT theorem to locate all solution candidates.
(c) Give the solution of the problem. Hint: argue through Weierstrass’s theorem (which says
that a continuous function on a compact set attains its infimum).

18. Toy problem 2. Consider the optimization problem

    minimize_{x,y} x² + (y + 1)²        (13)
    subject to     1 ≤ (x − 4)² + y² ≤ 9
                   y ≥ 0.

(a) Sketch the feasible region and some level curves of the objective function. Guess the solution
    (x*, y*).
(b) Write the KKT system associated with problem (13).
(c) Show that your guess (x*, y*) satisfies the KKT system.

19. Toy problem 3. Consider the optimization problem

    minimize_{x,y} −y²        (14)
    subject to     x² + y² ≤ 4
                   xy ≥ 0
                   x + y ≥ −1.

(a) Sketch the feasible region and some level curves of the objective function. Guess two local
    minimizers (x_1*, y_1*) and (x_2*, y_2*).
(b) Write the KKT system associated with problem (14).
(c) Show that your two guesses (x_1*, y_1*) and (x_2*, y_2*) satisfy the KKT system.

20. Distance to a hyperplane. A hyperplane in R^n is a set of the form

    H(s, r) = {x ∈ R^n : s^T x = r}

with s ≠ 0. In R² a hyperplane corresponds to a line; in R³, it corresponds to a plane.


We want to project a point p ∈ R^n onto a given hyperplane H(s, r), i.e., we want to solve

    minimize_x ½ ‖x − p‖²        (15)
    subject to s^T x = r.

(a) Show that the solution of (15) is

    x* = p + ((r − s^T p)/‖s‖²) s.

(b) Conclude that the distance from the point p to the hyperplane H(s, r), defined as
    d(p, H(s, r)) = inf{‖x − p‖ : x ∈ H(s, r)},

is given by

    d(p, H(s, r)) = |r − s^T p| / ‖s‖.
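A quick numerical check of the two formulas above on illustrative data:

```python
import numpy as np

# Closed-form projection onto H(s, r) and the resulting distance
# (illustrative numbers).
s = np.array([1.0, 2.0, 2.0])
r = 3.0
p = np.array([1.0, 1.0, 1.0])

x_star = p + (r - s @ p) / (s @ s) * s          # projection of p onto H(s, r)
dist = abs(r - s @ p) / np.linalg.norm(s)       # distance from p to H(s, r)
```

The projection lies on the hyperplane, and its distance to p matches the closed-form expression (here 2/3).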

21. Fitting a plane. We want to fit a plane to a cloud of points in R3 . We parameterize the plane as

    H(s, r) = {x ∈ R³ : s^T x = r}

with ‖s‖ = 1. The given dataset is {x_k : k = 1, . . . , K} ⊂ R³.


We fit the plane by minimizing the sum of the squared distances from each x_k to the plane:
    minimize_{s,r} ½ ∑_{k=1}^K d(x_k, H(s, r))²        (16)
    subject to     ‖s‖² = 1.

(a) Write the KKT system associated with problem (16).


(b) Use the KKT system to show that the optimal plane passes through the center of mass

        x̄ := (1/K) ∑_{k=1}^K x_k.

(c) Show that the KKT system can be reduced to

        R s = λ s
        ‖s‖ = 1

    with unknowns s ∈ R³ and λ ∈ R, where

        R := (1/K) ∑_{k=1}^K (x_k − x̄)(x_k − x̄)^T

    denotes the covariance matrix of the data.


(d) Conclude that the optimal plane is given by (s*, r*), where s* is an eigenvector associated with
    the minimum eigenvalue of R and r* = (s*)^T x̄.
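Parts (b)-(d) amount to a small eigenvalue computation. A sketch on synthetic, nearly planar data; the noise scale is an assumption chosen so that the recovered normal is predictable (close to the z-axis):

```python
import numpy as np

# Synthetic cloud that is nearly flat in the z direction.
rng = np.random.default_rng(2)
X = rng.standard_normal((50, 3))
X[:, 2] *= 0.01                      # squash the third coordinate

# Center of mass and covariance matrix R of the data.
xbar = X.mean(axis=0)
Rcov = (X - xbar).T @ (X - xbar) / len(X)

# s* = unit eigenvector for the smallest eigenvalue; r* = s*^T xbar.
eigvals, eigvecs = np.linalg.eigh(Rcov)   # eigh returns ascending eigenvalues
s_star = eigvecs[:, 0]
r_star = s_star @ xbar
```

Since the cloud is flattest along z, the estimated normal s* should be close to (0, 0, ±1).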

22. Linear function over an ellipsoid. Consider the problem

    minimize_x s^T x        (17)
    subject to (x − c)^T A⁻¹ (x − c) ≤ 1.

The data s ≠ 0, c, and A ≻ 0 are given. Problem (17) consists in minimizing a linear function over
an ellipsoid with center c. Use the KKT theorem to show that the solution is

    x* = c − (A s)/√(s^T A s).
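A numerical sanity check of this closed form on illustrative data, written with the minus sign, which is what makes s^T x smallest over the ellipsoid. Feasible points are sampled as x = c + Lu with A = LL^T and ‖u‖ ≤ 1:

```python
import numpy as np

# Illustrative data: s, center c, and a symmetric positive definite A.
rng = np.random.default_rng(3)
s = np.array([1.0, -2.0])
c = np.array([0.5, 0.5])
A = np.array([[2.0, 0.3], [0.3, 1.0]])

# Candidate minimizer from the KKT conditions.
x_star = c - A @ s / np.sqrt(s @ A @ s)

# Sample feasible points x = c + L u with ||u|| <= 1 (so (x-c)^T A^{-1} (x-c) <= 1).
L = np.linalg.cholesky(A)
U = rng.standard_normal((1000, 2))
U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1.0)
feasible = c + U @ L.T
```

No sampled feasible point should achieve a smaller objective value than x*, which itself sits on the boundary of the ellipsoid.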

23. Minimum volume box enclosing an ellipsoid. A box in R^n is a set of the form

    B(l, u) = {x ∈ R^n : l ≤ x ≤ u}.

Find the smallest box that contains a given ellipsoid E = {x : (x − c)^T A⁻¹(x − c) ≤ 1}.

6 Linear programs

24. Fitting a line. We want to fit a line to given data {(x k , yk ) : k = 1, . . . , K} ⊂ R2 . The line is
parameterized as y = s x + r and we penalize residuals via their absolute values:
    minimize_{s,r} ∑_{k=1}^K |y_k − (s x_k + r)|.        (18)

(a) Formulate (18) as a linear program (LP).


(b) Suppose now that each (x_k, y_k) is known only up to an additive uncertainty bounded by ρ_k ≥ 0.
    That is, the (unknown) true kth data point lies in the box {(x, y) : |x − x_k| ≤ ρ_k, |y − y_k| ≤ ρ_k}.
To cope with this uncertainty, we formulate a robust variation of (18) as
        minimize_{s,r} ∑_{k=1}^K max{|y − (s x + r)| : |x − x_k| ≤ ρ_k, |y − y_k| ≤ ρ_k} =: f(s, r).        (19)

    Note that (18) corresponds to the special case ρ_k ≡ 0. In (19) we fit a line that minimizes
    the sum of the worst-case residuals. Formulate (19) as an LP. Hint: it may be useful to show
    that max{|a + z b| : |z| ≤ ρ} = |a| + ρ|b| for any a, b ∈ R.
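For part (a), a sketch of the standard epigraph reformulation: introduce variables e_k ≥ |y_k − (s x_k + r)| and minimize their sum. The toy data lies exactly on y = 2x + 1, so the optimal residuals should vanish; the solver choice is illustrative:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data on the line y = 2x + 1 (illustrative).
xk = np.array([0.0, 1.0, 2.0, 3.0])
yk = np.array([1.0, 3.0, 5.0, 7.0])
K = len(xk)

# Variables z = (s, r, e_1, ..., e_K); minimize sum of e_k subject to
#   y_k - s x_k - r <= e_k   and   -(y_k - s x_k - r) <= e_k.
c = np.r_[0.0, 0.0, np.ones(K)]
A_ub = np.zeros((2 * K, 2 + K))
b_ub = np.zeros(2 * K)
for k in range(K):
    A_ub[2*k, :2], A_ub[2*k, 2+k], b_ub[2*k] = [-xk[k], -1.0], -1.0, -yk[k]
    A_ub[2*k+1, :2], A_ub[2*k+1, 2+k], b_ub[2*k+1] = [xk[k], 1.0], -1.0, yk[k]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None), (None, None)] + [(0, None)] * K)
s_hat, r_hat = res.x[:2]
```

At the optimum each e_k equals the kth absolute residual, so the LP value coincides with (18).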

25. Placing a box. The ℓ₁ distance from a point p ∈ R^n to a box

    B∞(c, R) = {x ∈ R^n : ‖x − c‖∞ ≤ R}

is defined as d₁(p, B∞(c, R)) = inf{‖y − p‖₁ : y ∈ B∞(c, R)}.
We want to place a box (with fixed radius R) near given points pk ∈ Rn , k = 1, . . . , K, by solving
    minimize_c ∑_{k=1}^K d₁(p_k, B∞(c, R)).        (20)
c

Formulate (20) as an LP.
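Because the ℓ₁ norm and the box both separate across coordinates, the distance has a simple closed form: the nearest point clips p to [c − R, c + R] coordinate-wise, so d₁(p, B∞(c, R)) = ∑_i max(|p_i − c_i| − R, 0). This hinge structure is what makes (20) LP-representable. A small check on illustrative values:

```python
import numpy as np

# ell-1 distance from p to the box B_inf(c, R): each coordinate contributes
# max(|p_i - c_i| - R, 0), since the nearest point clips p to [c - R, c + R].
def d1_to_box(p, c, R):
    p, c = np.asarray(p, float), np.asarray(c, float)
    return np.maximum(np.abs(p - c) - R, 0.0).sum()

val = d1_to_box([3.0, -1.0, 0.0], c=[0.0, 0.0, 0.0], R=1.0)
```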

26. Rankings. We have a collection of n items to rank. A ranking corresponds to the assignment of a
label x i ∈ R to each item i = 1, . . . , n. Thus, the x i ’s measure the status of the items (the top item
is the one with the highest of the x i ’s). We refer to a vector x = (x 1 , x 2 , . . . , x n ) as a ranking.
We want to build a ranking on the basis of a given set of K opinions that compare two items at
a time. If the kth opinion says “item i is better than j” we encode it as a vector ak ∈ Rn with a
1 in position i and a −1 in position j; the remaining entries are zero. A ranking x is compatible
with the kth opinion if a_k^T x ≥ ε, where ε > 0 is some (fixed) positive constant. Note that some
opinions may be contradictory.
We want to find a ranking that agrees the most with the K opinions. We formulate the problem
    minimize_x ∑_{k=1}^K w_k (ε − a_k^T x)₊        (21)
    subject to (1/n) 1^T x = 0,

where the constant w k ≥ 0 is the weight we place on opinion k (its credibility). The constraint
says that the average label should be zero; this constraint is introduced to normalize the labels,
as it is easily seen that the cost function in (21) is insensitive to an additive constant in x.
Formulate (21) as an LP.
