
MA4254: Discrete Optimization. Defeng Sun, Department of Mathematics, National University of Singapore. Office: S14-04-25. Telephone: 6516 3343.

Aims/Objectives: Discrete optimization deals with problems of maximizing or minimizing a function over a feasible region of discrete structure. These problems come from many fields, such as operations research, management science, and computer science. The primary objective of this course is twofold: a) to study key techniques to separate easy problems from difficult ones, and b) to use typical methods to deal with difficult problems. Mode of Evaluation: tutorial class performance (10%); mid-term test (20%); final examination (70%).

This course is taught at Department of Mathematics, National University of Singapore, Semester I, 2009/2010. E-mail: matsundf@nus.edu.sg

References: 1) D. Bertsimas and J. N. Tsitsiklis, Introduction to Linear Optimization. Athena Scientific, 1997.

2) G. L. Nemhauser and L. A. Wolsey, Integer and Combinatorial Optimization. John Wiley and Sons, 1999.

3) C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity. Prentice-Hall, 1982. Second edition by Dover, 1998. PARTIAL lecture notes will be made available on my webpage http://www.math.nus.edu.sg/matsundf/

1 Introduction

In this chapter we will briefly discuss the problems we are going to study, give a short review of the simplex method for solving linear programming problems, and introduce some basic concepts in graphs and digraphs.

1.1 Linear Programming (LP): a short review


Consider the following linear programming problem

(P)  min c^T x
     s.t. Ax ≥ b, x ≥ 0,

and its dual

(D)  max b^T y
     s.t. A^T y ≤ c, y ≥ 0.

Simplex method: Dantzig (1947). Very efficient in practice, but not a polynomial-time algorithm; Klee and Minty (1972) gave a counterexample (average-case analysis versus worst-case analysis).

Ellipsoid method (the "Russians" method): a polynomial-time algorithm (Khachiyan, 1979), but less efficient in practice.

Interior-point algorithms: Karmarkar (1984). Polynomial-time algorithms, efficient for some large-scale sparse LPs.

Others.

1.2 Discrete Optimization (DO)

Also called Combinatorial Optimization (CO). The general mathematical form is

    min φ(x)  s.t. x ∈ F,

where x is a decision policy, F is the collection of feasible decision policies, and φ(x) measures the value of members of F.

A typical DO (CO) problem:

(IP)  min c^T x
      s.t. Ax ≥ b, x ≥ 0,
           x_j integer for j ∈ I ⊆ N := {1, …, n},

where c ∈ ℝ^n, x ∈ ℝ^n, b ∈ ℝ^m and A ∈ ℝ^{m×n}.

If I = ∅, then (IP) = (LP); if I = N, then (IP) is a pure IP.

1.3 Specific Forms

1. The 0–1 Knapsack Problem. Suppose there are n projects. The j-th project has a cost a_j and a value c_j; each project is either done or not; a budget of b is available to fund the projects. Then the 0–1 Knapsack Problem can be formulated as

    max c^T x
    s.t. a^T x ≤ b, x ∈ B^n,

where a = (a_1, …, a_n)^T and B^n is the set of n-dimensional binary vectors.
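The formulation above can be checked on a tiny instance by brute force over all binary vectors x ∈ B^n. The sketch below uses made-up illustration data, not data from the notes:

```python
from itertools import product

def knapsack_brute_force(values, costs, budget):
    """Enumerate all binary vectors x in B^n and keep the best
    feasible one, i.e. max c^T x subject to a^T x <= budget."""
    best_value, best_x = 0, (0,) * len(values)
    for x in product((0, 1), repeat=len(values)):
        cost = sum(a * xj for a, xj in zip(costs, x))
        value = sum(c * xj for c, xj in zip(values, x))
        if cost <= budget and value > best_value:
            best_value, best_x = value, x
    return best_value, best_x

# Hypothetical instance: 4 projects, budget b = 10.
values = [10, 7, 5, 4]   # c_j
costs = [6, 5, 4, 3]     # a_j
print(knapsack_brute_force(values, costs, 10))  # -> (15, (1, 0, 1, 0))
```

Enumeration takes 2^n steps, which is exactly the "arrangements grow exponentially" difficulty discussed later in this chapter.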

2. The Assignment Problem. There are n people and m jobs, where n ≥ m. Each job must be assigned to exactly one person, and each person can do at most one job. The cost of person j doing job i is c_ij. Then the Assignment Problem can be formulated as

    min Σ_{i=1}^m Σ_{j=1}^n c_ij x_ij

    s.t. Σ_{j=1}^n x_ij = 1,  i = 1, …, m,

         Σ_{i=1}^m x_ij ≤ 1,  j = 1, …, n,

         x ∈ B^{mn}.

Extensions: the Three-Index Assignment Problem.
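For small instances the assignment model can be solved by enumerating all ways of giving the m jobs to m distinct people. A sketch with made-up cost data, where c[i][j] is the cost of person j doing job i:

```python
from itertools import permutations

def assignment_brute_force(c):
    """min sum c[i][p(i)] over all injections p from the m jobs into
    the n people (n >= m): each job gets exactly one person and each
    person does at most one job."""
    m, n = len(c), len(c[0])
    best_cost, best_p = float("inf"), None
    for people in permutations(range(n), m):
        cost = sum(c[i][people[i]] for i in range(m))
        if cost < best_cost:
            best_cost, best_p = cost, people
    return best_cost, best_p

# Hypothetical data: 2 jobs, 3 people.
c = [[4, 2, 8],
     [4, 3, 7]]
print(assignment_brute_force(c))  # -> (6, (1, 0)): job 0 -> person 1, job 1 -> person 0
```

This enumerates n!/(n−m)! assignments, so it only illustrates the model; efficient methods come from the total unimodularity theory of Chapter 2.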

3. Set-Covering, Set-Packing, and Set-Partitioning Problems

The Set-Covering Problem is

    min c^T x
    s.t. Ax ≥ 1, x ∈ B^n.

The Set-Packing Problem is

    max c^T x
    s.t. Ax ≤ 1, x ∈ B^n.

4. Traveling Salesman Problem (TSP)

We are given a set of nodes V = {1, …, n} and a set of arcs A. The nodes represent cities, and the arcs represent ordered pairs of cities between which direct travel is possible. For (i, j) ∈ A, c_ij is the direct travel time from city i to city j. The TSP is to find a tour, starting at city 1, that (a) visits each other city exactly once and then returns to city 1, and (b) takes the least total travel time.
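A tiny TSP instance can be solved exactly by enumerating all (n−1)! tours through the other cities. A sketch (cities are indexed 0, …, n−1 here, with city 0 playing the role of city 1; the cost matrix is made up):

```python
from itertools import permutations

def tsp_brute_force(c):
    """Exact TSP: enumerate all tours that start and end at city 0
    and visit every other city exactly once; c[i][j] is the direct
    travel time from city i to city j."""
    n = len(c)
    best_time, best_tour = float("inf"), None
    for middle in permutations(range(1, n)):
        tour = (0,) + middle + (0,)
        time = sum(c[tour[k]][tour[k + 1]] for k in range(n))
        if time < best_time:
            best_time, best_tour = time, tour
    return best_time, best_tour

# Hypothetical symmetric instance with 4 cities.
c = [[0, 1, 4, 3],
     [1, 0, 2, 5],
     [4, 2, 0, 1],
     [3, 5, 1, 0]]
print(tsp_brute_force(c))  # -> (7, (0, 1, 2, 3, 0))
```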

5. Facility Location Problem, Network Flow Problem, and many more

1.4 Why DO (CO) is difficult

That the number of arrangements grows exponentially is only the superficial reason. Topics to be covered: Total Unimodularity (TU) theory; shortest path; matroids and the greedy algorithm; complexity (the P ≠ NP conjecture); interior-point algorithms; cutting planes; branch and bound; decomposition; flowshop scheduling, etc.

1.5 Convex sets

In linear programming and nonlinear programming, we have already met many convex sets. For example, the line segment between two points in ℝ^n is a convex set; a unit ball in ℝ^n is a convex set; and, more importantly, a polyhedral set is a convex set (a formal definition of a polyhedral set is to be given shortly). But what is a convex set?

Definition 1.1 A set S ⊆ ℝ^n is convex if for any x, y ∈ S and any λ ∈ [0, 1], we have λx + (1 − λ)y ∈ S, i.e., the whole line segment between x and y is in S.

Exercise: Give two more sets which are convex and two sets which are not convex.

1.6 Hyperplanes and half spaces

Definition 1.2 Let a be a nonzero vector in ℝ^n and let b be a scalar. Then the set

    {x ∈ ℝ^n | a^T x = b}

is called a hyperplane, where a^T is the transpose of the (column) vector a.

Geometrically, the hyperplane {x ∈ ℝ^n | a^T x = b} can be understood by expressing it in the form

    {x ∈ ℝ^n | a^T (x − x0) = 0},

where x0 is any point in the hyperplane, i.e., a^T x0 = b. This representation can then be interpreted as

    {x ∈ ℝ^n | a^T (x − x0) = 0} = x0 + a^⊥,

where a^⊥ denotes the orthogonal complement of a, i.e., the set of all vectors orthogonal to it:

    a^⊥ = {d ∈ ℝ^n | a^T d = 0}.

This shows that the hyperplane consists of an offset of the hyperplane from the origin (i.e., x0), plus all vectors orthogonal to the (normal) vector a. A hyperplane divides ℝ^n into two parts, which are called half spaces.

Definition 1.3 Let a be a nonzero vector in ℝ^n and let b be a scalar. Then the set

    {x ∈ ℝ^n | a^T x ≥ b}

is called a halfspace.

Obviously, a halfspace is a convex set, and {x ∈ ℝ^n | a^T x ≤ b} is the other halfspace.

Let x0 be any point on the hyperplane {x ∈ ℝ^n | a^T (x − x0) = 0}. Then the halfspace {x ∈ ℝ^n | a^T x ≥ b} can be expressed as

    {x ∈ ℝ^n | a^T (x − x0) ≥ 0}.

This suggests a simple geometric interpretation: the half space consists of x0 plus any vector that makes an acute angle with the normal vector a. See the figure below.
[Figure 1.1: The half space {x ∈ ℝ^n | a^T x ≥ b} consists of x0 plus any vector that makes an acute angle with the normal vector a.]

1.7 Polyhedra

Definition 1.4 A polyhedron is a set that can be described in the form {x ∈ ℝ^n | Ax ≥ b}, where A is an m × n matrix and b is a vector in ℝ^m.

Let A ∈ ℝ^{m×n} and b ∈ ℝ^m be defined as follows:

    A = ( a_1^T )        b = ( b_1 )
        (  ⋮   ),            (  ⋮  ).
        ( a_m^T )            ( b_m )

Then the polyhedron defined in Definition 1.4 is the intersection of the following halfspaces:

    {x ∈ ℝ^n | a_i^T x ≥ b_i},  i = 1, …, m.

It is noted that these halfspaces are finite in number. The intersection of two polyhedrons is again a polyhedron. So {x ∈ ℝ^n | Cx ≥ d, Ax = b} is also a polyhedron, where C ∈ ℝ^{p×n} and d ∈ ℝ^p.

A polyhedron may have different representations. For example,

    {x ∈ ℝ^2 | x_1 + x_2 = 0, x_1 ≥ 0} = {x ∈ ℝ^2 | 2x_1 + 2x_2 ≥ 0, −x_1 − x_2 ≥ 0, x_1 ≥ 0}.

A bounded polyhedron is sometimes called a polytope.

Let e_i be the i-th unit vector in ℝ^n. Then, by noting that x_i = e_i^T x, we know that the nonnegative orthant

    ℝ^n_+ = {x ∈ ℝ^n | x_i ≥ 0, i = 1, …, n}

is a polyhedron.

Definition 1.5 Let x_1, …, x_k be vectors in ℝ^n and let λ_1, …, λ_k be nonnegative scalars whose sum is one.

(a) The vector Σ_{i=1}^k λ_i x_i is said to be a convex combination of the vectors x_1, …, x_k.

(b) The convex hull (conv, in short) of the vectors x_1, …, x_k is the set of all convex combinations of these vectors.

It is easy to see by Definition 1.5 that

    conv{e_1, …, e_n} = {x ∈ ℝ^n | Σ_{i=1}^n x_i = 1, x_i ≥ 0, i = 1, …, n}

is a polytope.

1.8 Basic feasible solutions

We already know that an optimal solution to a linear programming problem (assuming the existence of an optimal solution) can be found at a corner of the polyhedron over which we are optimizing. There are quite a number of different but equivalent ways to define the concept of a corner. Here we introduce two of them: extreme points and basic feasible solutions. Our first definition defines an extreme point of a polyhedron as a point that cannot be expressed as a convex combination of two other points of the polyhedron.

Definition 1.6 Let P ⊆ ℝ^n be a polyhedron. A vector x ∈ P is an extreme point of P if we cannot find two vectors y, z ∈ P, both different from x, and a scalar λ ∈ [0, 1], such that x = λy + (1 − λ)z.

It can be checked easily that the extreme points of P = conv{e_1, e_2, e_3} are e_1, e_2 and e_3. Clearly, Definition 1.6 is entirely geometric and does not refer to a specific representation of a polyhedron in terms of linear constraints. Next, we give a definition that relies on such a representation. Some terminology is necessary for this purpose.

Consider a polyhedron P ⊆ ℝ^n defined in terms of the linear equality and inequality constraints

    a_i^T x ≥ b_i,  i ∈ M_1,
    a_i^T x ≤ b_i,  i ∈ M_2,
    a_i^T x = b_i,  i ∈ M_3,

where M_1, M_2 and M_3 are finite index sets, each a_i is a vector in ℝ^n, and each b_i is a scalar. For example, let

    P = {x ∈ ℝ^3 | a_1^T x ≥ 1, a_2^T x ≤ 3, a_3^T x = 1, x ≥ 0},    (1.1)

where a_1 = (0, 0, 2)^T, a_2 = (4, 0, 0)^T and a_3 = (1, 1, 1)^T. Let a_4 = e_1, a_5 = e_2 and a_6 = e_3. Then M_1 = {1, 4, 5, 6}, M_2 = {2}, M_3 = {3}.

Definition 1.7 If a vector x* satisfies a_i^T x* = b_i for some i in M_1, M_2 or M_3, we say that the corresponding constraint is active or binding at x*. The active set of P at x* is defined as

    I(x*) = {i ∈ M_1 ∪ M_2 ∪ M_3 | a_i^T x* = b_i},

i.e., I(x*) is the set of indices of constraints that are active at x*.

For example, suppose that P is defined by (1.1). Let x* = (0.5, 0, 0.5)^T. All constraints active at x* are

    a_1^T x ≥ 1,  a_3^T x = 1,  a_5^T x (= x_2) ≥ 0,

and I(x*) = {1, 3, 5}.
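The active set I(x*) for the example (1.1) can be computed numerically. A small sketch (the floating-point tolerance is an implementation detail, not part of the notes):

```python
def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

def active_set(constraints, x, tol=1e-9):
    """Return the indices i with a_i^T x = b_i; each constraint is a
    triple (index, a_i, b_i), and equality is tested up to `tol`."""
    return [i for i, a, b in constraints if abs(dot(a, x) - b) <= tol]

# The example polyhedron (1.1): vectors a_1, ..., a_6 and right-hand sides.
constraints = [
    (1, (0, 0, 2), 1),   # a_1^T x >= 1
    (2, (4, 0, 0), 3),   # a_2^T x <= 3
    (3, (1, 1, 1), 1),   # a_3^T x  = 1
    (4, (1, 0, 0), 0),   # x_1 >= 0
    (5, (0, 1, 0), 0),   # x_2 >= 0
    (6, (0, 0, 1), 0),   # x_3 >= 0
]
x_star = (0.5, 0.0, 0.5)
print(active_set(constraints, x_star))  # -> [1, 3, 5]
```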

Recall that vectors x_1, …, x_k ∈ ℝ^n are said to be linearly independent if

    λ_1 x_1 + … + λ_k x_k = 0  ⟹  λ_1 = … = λ_k = 0.

The maximal number of linearly independent vectors in ℝ^n is exactly n. Thus k ≤ n if x_1, …, x_k ∈ ℝ^n are linearly independent. Note that x_1, …, x_n ∈ ℝ^n are linearly independent if and only if the matrix M = [x_1 … x_n] is nonsingular, i.e., the determinant of M is not zero.

If there are n constraints of P ⊆ ℝ^n that are active at a vector x*, then x* satisfies a certain system of n linear equations in n unknowns. This system has a unique solution if and only if the n vectors a_i of these n equations are linearly independent. This is stated precisely in the following proposition.

Proposition 1.1 Let x* ∈ ℝ^n. The following are equivalent.

(a) There exist n vectors in the set {a_i | i ∈ I(x*)} that are linearly independent.

(b) The span of the vectors a_i, i ∈ I(x*), is all of ℝ^n; that is, every element of ℝ^n can be expressed as a linear combination of the vectors a_i, i ∈ I(x*).

(c) The system of equations a_i^T x = b_i, i ∈ I(x*), has a unique solution.

[Observations: If I(x*) contains exactly n elements, the proof of the proposition is trivial. I(x*) may contain more than n elements.]

Proof. (a) ⟺ (b): Suppose that the vectors a_i, i ∈ I(x*), span ℝ^n. Then the span of these vectors has dimension n. This implies that there exist n vectors in the set {a_i | i ∈ I(x*)} that are linearly independent, because otherwise, if the maximal number of linearly independent vectors among them were k ≤ n − 1, then the span of these vectors would have dimension k.

Conversely, suppose that n of the vectors a_i, i ∈ I(x*), are linearly independent. Then the subspace spanned by these n vectors is n-dimensional and must be equal to ℝ^n. Hence, every element of ℝ^n is a linear combination of the vectors a_i, i ∈ I(x*).

(b) ⟺ (c): If the system of equations a_i^T x = b_i, i ∈ I(x*), has multiple solutions, say x^1 and x^2, then the nonzero vector d = x^1 − x^2 satisfies a_i^T d = 0, i ∈ I(x*). Then for any linear combination Σ_{i∈I(x*)} λ_i a_i of the vectors a_i, i ∈ I(x*), one has

    d^T Σ_{i∈I(x*)} λ_i a_i = Σ_{i∈I(x*)} λ_i d^T a_i = 0.

This, together with the fact that d^T d > 0, shows that d is not a linear combination of these vectors. Thus, the vectors a_i, i ∈ I(x*), do not span ℝ^n.

Conversely, if the vectors a_i, i ∈ I(x*), do not span ℝ^n, choose a nonzero vector d which is orthogonal to the subspace spanned by these vectors. If x satisfies a_i^T x = b_i for all i ∈ I(x*), we also have a_i^T (x + d) = b_i for all i ∈ I(x*), thus obtaining multiple solutions. We have therefore established that (b) and (c) are equivalent. Q.E.D.

With a slight abuse of language, we will often say that certain constraints are linearly independent, meaning that the corresponding vectors a_i are linearly independent. We are now ready to provide an algebraic definition of a corner point of the polyhedron P.

Definition 1.8 Let x* ∈ ℝ^n.

(a) The vector x* is called a basic solution if

    (i) a_i^T x* = b_i for all i ∈ M_3;

    (ii) out of {a_i}_{i∈I(x*)}, there are n of them that are linearly independent.

(b) If x* is a basic solution that satisfies all of the constraints, we say that it is a basic feasible solution.

Let P be defined by (1.1). Then x* = (0.5, 0, 0.5)^T is a basic feasible solution because x* ∈ P and a_1, a_3, a_5 are linearly independent.

Let us take another example by assuming that

    P = {y ∈ ℝ^m | A^T y ≤ c},

where A ∈ ℝ^{m×n} and c ∈ ℝ^n (surprisingly familiar? Think about the dual form of a linear programming problem). Then y ∈ ℝ^m is a basic solution if A_i^T y = c_i for i ∈ J ⊆ {1, …, n} and there exist m linearly independent vectors in {A_i}_{i∈J}. Here A_i denotes the i-th column of A. Further, y is a basic feasible solution if it is a basic solution and A^T y ≤ c.

Exercise: What is a basic (feasible) solution to P = {x ∈ ℝ^n | Ax = b, x ≥ 0}?

Note that if the number m of constraints used to define a polyhedron P ⊆ ℝ^n is less than n, then the number of active constraints at any given point must also be less than n, and there are no basic or basic feasible solutions.

1.9 Finite basis theorem for polyhedra

Definition 1.9 A set C ⊆ ℝ^n is a cone if λx ∈ C for all λ ≥ 0 and all x ∈ C.

[Figure 1.2: A closed convex cone K.]

From the definition we can see that 0 ∈ C. For vectors x_1, …, x_k ∈ ℝ^n, let

    cone{x_1, …, x_k} = {x ∈ ℝ^n | x = Σ_{i=1}^k λ_i x_i, λ_i ≥ 0, i = 1, …, k}.

Then cone{x_1, …, x_k} is a cone and a convex set, which is called the convex cone generated by x_1, …, x_k.

The set P = {x ∈ ℝ^n | Ax ≥ 0} is called a polyhedral cone.

Exercise: Is cone{x_1, …, x_k} a polyhedral cone?

Given A ∈ ℝ^{m×n} and b ∈ ℝ^m, consider

    P = {x ∈ ℝ^n | Ax ≥ b}

and y ∈ P.

Definition 1.10 The recession cone of P at y is defined as the set

    {d ∈ ℝ^n | A(y + λd) ≥ b for all λ ≥ 0}.

Roughly speaking, the recession cone of P at y is the set of all directions d along which we can move indefinitely away from y without leaving the set P. It can be easily seen that the recession cone of P at y is the same as

    {d ∈ ℝ^n | Ad ≥ 0},

and is a polyhedral cone. This means that the recession cone is independent of the starting point y. For P = {x ∈ ℝ^n | Ax = b, x ≥ 0}, where A ∈ ℝ^{m×n} and b ∈ ℝ^m, the recession cone is

    {d ∈ ℝ^n | Ad = 0, d ≥ 0}.

Definition 1.11

(a) A nonzero element d of a polyhedral cone C ⊆ ℝ^n is called an extreme ray if there are n − 1 linearly independent constraints that are active at d.

(b) An extreme ray of the recession cone associated with a nonempty polyhedron P is also called an extreme ray of P.

For example, consider the simple polyhedral cone ℝ^n_+. The extreme rays of ℝ^n_+ are {e_1, …, e_n}. Since there are n linearly independent constraints that are active at the zero vector, by the definition the zero vector is an extreme point of ℝ^n_+ (actually the only extreme point).

The following theorem, obtained by Minkowski, is called the finite basis theorem for polyhedra.

Theorem 1.1 Let P = {x ∈ ℝ^n | Ax ≥ b}, where A ∈ ℝ^{m×n} and rank(A) = n. Then there exist x_1, …, x_q and d_1, …, d_r in ℝ^n such that

    P = conv{x_1, …, x_q} + cone{d_1, …, d_r},

where {x_1, …, x_q} is the set of extreme points (basic feasible solutions) of P and {d_1, …, d_r} is the set of extreme rays of P.

Note that Theorem 1.1 implies that P can only have finitely many extreme points. It can also be seen easily that P is bounded if and only if the recession cone of P contains the zero vector only. There are several ways to prove the above theorem. For example, one may use Farkas' lemma and the duality theory of linear programming to prove the above theorem.

1.10 Simplex Method Revisited

Consider the standard linear programming problem

(P)  min c^T x
     s.t. Ax = b, x ≥ 0,        (1.2)

where A ∈ ℝ^{m×n} (m ≤ n) is of full row rank, and its dual form

(D)  max b^T y
     s.t. A^T y ≤ c.            (1.3)

Let x be a basic feasible solution to the standard form problem, let B(1), …, B(m) be the indices of the basic variables, and let B = [A_B(1) … A_B(m)] be the corresponding basis matrix. In particular, we have x_i = 0 for every nonbasic variable, while the vector x_B = (x_B(1), …, x_B(m))^T of basic variables is given by x_B = B^{-1} b ≥ 0.

The full tableau implementation of the simplex method at the very beginning is:

      0   | c_1  …  c_n
    ------+----------------
     b_1  |
      ⋮   |  A_1  …  A_n
     b_m  |

Let c̄^T = c^T − c_B^T B^{-1} A. By using elementary row operations to change c_B(1), …, c_B(m) to zero, we obtain the following tableau:

    −c_B^T x_B | c̄_1  …  c̄_n
    -----------+--------------------------
     x_B(1)    |
      ⋮        |  B^{-1}A_1  …  B^{-1}A_n
     x_B(m)    |

Anticycling rules: lexicography and Bland's rule.

Finding an initial basic feasible solution: the artificial variables method and the big-M method.

For the dual simplex method, we work with the same two tableaus. We do not require B^{-1} b to be nonnegative, which means that we have a basic, but not necessarily feasible, solution to the primal problem. However, we assume that c̄ ≥ 0; equivalently, the vector y^T = c_B^T B^{-1} satisfies y^T A ≤ c^T, and we have a feasible solution to the dual problem. The cost of this dual feasible solution is y^T b = c_B^T B^{-1} b = c_B^T x_B, which is the negative of the entry at the upper left corner of the tableau.
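The quantities above, x_B = B^{-1} b and the reduced costs c̄^T = c^T − c_B^T B^{-1} A, can be computed directly for a small example. The sketch below handles only a 2 × 2 basis with exact rational arithmetic; it is not a full simplex implementation, and the instance is made up:

```python
from fractions import Fraction

def reduced_costs(A, b, c, basis):
    """For a basis given by two column indices, form B, compute
    x_B = B^{-1} b and the reduced costs c_bar = c - (c_B^T B^{-1}) A."""
    B = [[Fraction(A[i][j]) for j in basis] for i in range(2)]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    Binv = [[ B[1][1] / det, -B[0][1] / det],
            [-B[1][0] / det,  B[0][0] / det]]
    xB = [sum(Binv[i][k] * b[k] for k in range(2)) for i in range(2)]
    # y^T = c_B^T B^{-1}, then c_bar_j = c_j - y^T A_j.
    y = [sum(c[basis[k]] * Binv[k][i] for k in range(2)) for i in range(2)]
    cbar = [c[j] - sum(y[i] * A[i][j] for i in range(2)) for j in range(len(c))]
    return xB, cbar

# Hypothetical standard-form LP with m = 2, n = 4 (columns 2 and 3 are slacks).
A = [[1, 1, 1, 0],
     [1, -1, 0, 1]]
b = [2, 0]
c = [-1, -2, 0, 0]
xB, cbar = reduced_costs(A, b, c, basis=[2, 3])
print(xB, cbar)  # x_B = B^{-1} b and the reduced-cost row of the tableau
```

A negative entry in c̄ (here both c̄_1 and c̄_2) signals a column eligible to enter the basis in a primal simplex iteration.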

1.11 Graphs and Digraphs

1.11.1 Graphs

Definition 1.12 A graph G is a pair (V, E), where V is a finite set and E is a set of unordered pairs of elements of V. Elements of V are called vertices and elements of E edges. We say that a pair of distinct vertices are adjacent if they define an edge, and the edge is said to be incident to its defining vertices. The degree of a vertex v (denoted deg(v)) is the number of edges incident to that vertex. An example is shown in Figure 1.3.

[Figure 1.3: A graph with edges e_1, e_2, e_3, e_4.]

Definition 1.13 A v_1v_k-path (or path connecting v_1 and v_k) is a sequence of edges v_1v_2, …, v_{i−1}v_i, …, v_{k−1}v_k. A cycle is a sequence of edges v_1v_2, …, v_{i−1}v_i, …, v_{k−1}v_k, v_kv_1. In both cases the vertices are all distinct. A graph is acyclic if it has no cycle.

Proposition 1.2 If every vertex of G has degree at least two, then G has a cycle.

Proof. Let P = v_1v_2, …, v_{k−1}v_k be a path of G with a maximum number of edges. Since deg(v_k) ≥ 2, there is an edge v_kw with w ≠ v_{k−1}. It follows from the choice of P that w is a vertex of P, i.e., w = v_i for some i ∈ {1, …, k − 2}. Then v_iv_{i+1}, …, v_{k−1}v_k, v_kv_i is a cycle. Q.E.D.

Definition 1.14 G is connected if each pair of vertices is connected by a path.

Proposition 1.3 Let G be a connected graph with a cycle C, and let e be an edge of C. Then G − e is connected.

Proof. Let v_1, v_2 be vertices of G − e. We need to show that there exists a v_1v_2-path of G − e. Since G is connected, there exists a v_1v_2-path P of G. If P does not use e, then we are done. Otherwise, P contains a v_1w_1-path P_1 and a w_2v_2-path P_2, where w_1, w_2 are the endpoints of e. Moreover, C − w_1w_2 is a w_1w_2-path. The result now follows. Q.E.D.

Definition 1.15 H is a subgraph of G if V(H) ⊆ V(G) and E(H) ⊆ E(G). It is a spanning subgraph if in addition V(H) = V(G).

Definition 1.16 A tree is a connected acyclic graph.

Theorem 1.2 If T = (V, E) is a tree, then |E| = |V| − 1.

Proof. We proceed by induction on the number of vertices in V. The base case |V| = 1 is trivial, since then |E| = 0. Assume now |V| ≥ 2 and suppose the theorem holds for all trees with |V| − 1 vertices. Since T is acyclic, it follows from Proposition 1.2 that there is a vertex v with deg(v) ≤ 1. Since T is connected and |V| ≥ 2, deg(v) ≠ 0. Thus, there is a unique edge uv incident to v. Let T′ be defined by V(T′) = V \ {v} and E(T′) = E \ {uv}. Observe that T′ is a tree. Hence, by induction, |E(T′)| = |V(T′)| − 1, and it follows that |E| = |V| − 1. Q.E.D.

Proposition 1.4 Let G = (V, E) be a connected graph. Then |E| ≥ |V| − 1. Moreover, if equality holds, then G is a tree.

Proof. If G has a cycle, then remove from G any edge on the cycle. Repeat until the resulting graph T is acyclic. It follows from Proposition 1.3 that T is connected. Hence T is a tree, and by Theorem 1.2, |E(G)| ≥ |E(T)| = |V(G)| − 1. Q.E.D.
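Theorem 1.2 and Proposition 1.4 together give a quick computational test: a graph is a tree iff it is connected and |E| = |V| − 1. A sketch:

```python
from collections import deque

def is_tree(vertices, edges):
    """A graph is a tree iff it is connected and |E| = |V| - 1
    (Theorem 1.2 / Proposition 1.4)."""
    if len(edges) != len(vertices) - 1:
        return False
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Breadth-first search from an arbitrary vertex to test connectivity.
    start = next(iter(vertices))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(vertices)

print(is_tree({1, 2, 3, 4}, [(1, 2), (1, 3), (3, 4)]))  # -> True
print(is_tree({1, 2, 3, 4}, [(1, 2), (2, 3), (3, 1)]))  # -> False: a cycle plus an isolated vertex
```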

1.11.2 Bipartite Graph

G = (S, T, E): every edge in E has one vertex in S and the other in T.

1.11.3 Vertex-Edge Incidence Matrix

Definition 1.17 The vertex-edge incidence matrix of a graph G = (V, E) is a matrix A with |V| rows and |E| columns whose entries are either 0 or 1, such that the rows correspond to the vertices of G, the columns correspond to the edges of G, and the entry A_{v,ij} for vertex v and edge ij is given by

    A_{v,ij} = 1 if v = i or v = j,  and 0 if v ≠ i and v ≠ j.

1.11.4 Digraphs (Directed Graphs)

Definition 1.18 A directed graph (or digraph) D is a pair (N, A), where N is a finite set and A is a set of ordered pairs of elements of N. Elements of N are called nodes and elements of A arcs. Node i is the tail (resp. node j is the head) of arc ij. The in-degree (resp. out-degree) of node v, denoted deg^+(v) (resp. deg^−(v)), is the number of arcs with head (resp. tail) v.

1.11.5 Bipartite Digraph

D = (S, T, A).

1.11.6 Node-Arc Incidence Matrix

Definition 1.19 The node-arc incidence matrix of a digraph D = (N, A) is a matrix M with |N| rows and |A| columns whose entries are either 0, +1, or −1, such that the rows correspond to the nodes of D, the columns correspond to the arcs of D, and the entry M_{v,ij} for node v and arc ij is given by

    M_{v,ij} = +1 if v = j,  −1 if v = i,  and 0 if v ≠ i and v ≠ j.

2 Total Unimodularity (TU) and Its Applications

In this section we will discuss total unimodularity theory and its applications to flows in networks.

2.1 Total Unimodularity: Definition and Properties

Consider the following integer linear programming problem

(P)  max c^T x
     s.t. Ax = b, x ≥ 0,

where A ∈ Z^{m×n}, b ∈ Z^m and c ∈ Z^n are all integer.

Definition 2.1 A square integer matrix B is called unimodular if |Det(B)| = 1. An integer matrix A is called totally unimodular (TU) if every square nonsingular submatrix of A is unimodular.

The above definition means that a TU matrix is a {−1, 0, 1}-matrix. But a {−1, 0, 1}-matrix is not necessarily a TU matrix, e.g.,

    A = (  1  1 )
        ( −1  1 ),

whose determinant is 2.

Lemma 2.1 Suppose that A ∈ Z^{n×n} is a unimodular matrix and that b ∈ Z^n is an integer vector. If A is nonsingular, then Ax = b has the unique integer solution x = A^{-1} b.

Proof. Let a_ij be the ij-th entry of A, i, j = 1, …, n. For any a_ij, define the cofactor of a_ij as

    Cof(a_ij) = (−1)^{i+j} Det(A_{{1,…,n}\{i}, {1,…,n}\{j}}),

where A_{{1,…,n}\{i}, {1,…,n}\{j}} is the matrix obtained by removing the i-th row and the j-th column of A. Then

    Det(A) = Σ_{i=1}^n a_i1 Cof(a_i1).    (2.1)

The adjoint of A is Adj(A) = Adj({a_ij}) = {Cof(a_ij)}^T, and the inverse of A is

    A^{-1} = (1 / Det(A)) Adj(A).

Since A ∈ Z^{n×n} is a unimodular nonsingular integer matrix, every Cof(a_ij) is an integer and Det(A) = ±1. Hence A^{-1} is an integer matrix, and x = A^{-1} b is integer whenever b is. Q.E.D.
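Lemma 2.1 can be illustrated numerically: for a unimodular integer matrix, the adjugate formula A^{-1} = Adj(A)/Det(A) yields an integer matrix. A sketch using a made-up unimodular example (the helper `det` is a plain cofactor expansion, fine for tiny matrices only):

```python
def det(M):
    """Determinant by cofactor expansion along the first column."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for i in range(n):
        minor = [row[1:] for r, row in enumerate(M) if r != i]
        total += (-1) ** i * M[i][0] * det(minor)
    return total

def integer_inverse(A):
    """Inverse of a unimodular integer matrix via the adjugate:
    A^{-1} = Adj(A) / Det(A), with Det(A) = +1 or -1 (Lemma 2.1)."""
    n, d = len(A), det(A)
    assert abs(d) == 1, "matrix is not unimodular"
    inv = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Entry (i, j) of the inverse is the (j, i) cofactor over d.
            minor = [[A[r][c] for c in range(n) if c != i]
                     for r in range(n) if r != j]
            inv[i][j] = (-1) ** (i + j) * det(minor) // d
    return inv

# A hypothetical unimodular matrix (Det = 1).
A = [[2, 1],
     [1, 1]]
print(integer_inverse(A))  # -> [[1, -1], [-1, 2]]
```

All entries of the inverse are integers, exactly as the lemma predicts.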

Theorem 2.1 If A is TU, then every basic solution to (P) is integer.

Proof. Suppose that x is a basic solution to (P). Let N be the set of indices of x such that x_j = 0. Since x is a basic solution to (P), there exist two nonnegative integers p and q with p + q = n, and indices B(1), …, B(p) ∈ {1, …, m} and N(1), …, N(q) ∈ N, such that the vectors

    {A_B(i)^T}_{i=1}^p ∪ {e_N(j)^T}_{j=1}^q

are linearly independent, where A_B(i) is the B(i)-th row of A and e_N(j) is the N(j)-th unit vector in ℝ^n.

From Ax = b and x_j = 0 for j ∈ N, we know that there exists a matrix B ∈ Z^{p×p} such that

    B x_B = b_B,

where x_B = (x_B(1), …, x_B(p))^T and b_B = (b_B(1), …, b_B(p))^T. The matrix B is nonsingular by the linear independence of {A_B(i)^T}_{i=1}^p ∪ {e_N(j)^T}_{j=1}^q. Since B is a square nonsingular submatrix of the TU matrix A, it is unimodular. Then, by Lemma 2.1, we know that x_B is integer. Noting that x_N = 0 is integer, we complete the proof. Q.E.D.

Proposition 2.1 A ∈ Z^{m×n} is TU if and only if A^T is totally unimodular.

Proposition 2.2 A ∈ Z^{m×n} is TU ⟹ (A, e_i) is TU, where e_i is the i-th unit vector of ℝ^m, i = 1, …, m.

Proposition 2.3 A ∈ Z^{m×n} is TU ⟹ (A, I) is TU, where I ∈ ℝ^{m×m} is the identity matrix.

Proposition 2.4 A ∈ Z^{m×n} is TU ⟹ the stacked matrix (A; I), with A on top of I, is TU, where I ∈ ℝ^{n×n} is the identity matrix.

Theorem 2.2 (Hoffman and Kruskal, 1956) For any integer matrix A ∈ Z^{m×n}, the following statements are equivalent:

1. A is TU;

2. the extreme points (if any) of S(b) = {x | Ax ≤ b, x ≥ 0} are integer for any integer b;

3. every square nonsingular submatrix of A has an integer inverse.

Proof. (1 ⟹ 2): After adding nonnegative slack variables, we have the system

    Ax + Is = b, x ≥ 0, s ≥ 0.

The extreme points of S(b) correspond to basic feasible solutions of this system (as an exercise). Let y = (x, s) be a basic feasible solution of the above system. If a given basis B contains only columns from A, then y_B is integer as A is TU (Lemma 2.1). The same is true if B contains only columns from I. So we have to consider the case B = (Ā Ī), where Ā is a submatrix of A and Ī is a submatrix of I. After a permutation of the rows of B, we have

    B̄ = ( A_1  0 )
         ( A_2  I ).

Obviously, |Det(B)| = |Det(B̄)| and |Det(B̄)| = |Det(A_1)||Det(I)| = |Det(A_1)|. Now, A being totally unimodular implies |Det(A_1)| = 0 or 1, and since B is assumed to be nonsingular, |Det(B)| = 1. Again, from Lemma 2.1, y_B is integer. Hence y is integer because y_j = 0 for j ∉ B. This implies that x is integer. [One may also make use of Theorem 2.1 and Proposition 2.3 to get the proof immediately.]

(2 ⟹ 3): Let B ∈ Z^{p×p} be any square nonsingular submatrix of A. It is sufficient to prove that b̄_j is an integer vector, where b̄_j is the j-th column of B^{-1}, j = 1, …, p. Let t be an integer vector such that t + b̄_j > 0, and let b_B(t) = Bt + e_j, where e_j is the j-th unit vector. Then

    x_B = B^{-1} b_B(t) = B^{-1}(Bt + e_j) = t + B^{-1} e_j = t + b̄_j > 0.

Choose b_N (N = {1, …, m} \ B) sufficiently large such that (Ax)_j < b_j for j ∈ N, where x_j = 0, j ∈ N. Hence x is an extreme point of S(b(t)). As x_B and t are integer vectors, b̄_j is an integer vector for each j = 1, …, p, and B^{-1} is an integer matrix.

(3 ⟹ 1): Let B be an arbitrary square nonsingular submatrix of A. Then

    1 = |Det(I)| = |Det(BB^{-1})| = |Det(B)||Det(B^{-1})|.

By the assumption, B and B^{-1} are integer matrices, so |Det(B)| and |Det(B^{-1})| are positive integers. Thus |Det(B)| = |Det(B^{-1})| = 1, and A is TU. Q.E.D.

Theorem 2.3 (A sufficient condition for TU) An integer matrix A with all a_ij = 0, +1, or −1 is TU if:

1. no more than two nonzero elements appear in each column;

2. the rows of A can be partitioned into two subsets M_1 and M_2 such that

   (a) if a column contains two nonzero elements with the same sign, one element is in each of the subsets;

   (b) if a column contains two nonzero elements of opposite signs, both elements are in the same subset.

Proof. The proof is by induction. Every one-element submatrix of A has determinant equal to 0, +1, or −1. Assume that the theorem is true for all submatrices of A of order k − 1 or less, and let B be a k × k submatrix of A. If B contains a column with only one nonzero element, we expand Det(B) by that column and apply the induction hypothesis.

Finally, consider the case in which every column of B contains two nonzero elements. Then, from 2(a) and 2(b), for every column j,

    Σ_{i∈M_1} b_ij = Σ_{i∈M_2} b_ij,  j = 1, …, k.

Let b^i be the i-th row of B. Then the above equality gives

    Σ_{i∈M_1} b^i − Σ_{i∈M_2} b^i = 0,

which implies that the rows {b^i}, i ∈ M_1 ∪ M_2, are linearly dependent, and thus B is singular, i.e., Det(B) = 0. Q.E.D.

Corollary 2.1 The vertex-edge incidence matrix of a bipartite graph is TU.

Corollary 2.2 The node-arc incidence matrix of a digraph is TU.
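Definition 2.1 can also be checked by brute force on small matrices: enumerate every square submatrix and test that its determinant lies in {−1, 0, 1}. The sketch below applies this to the node-arc incidence matrix of a small digraph, in line with Corollary 2.2, and to a non-TU example; the exponential enumeration is only meant for tiny matrices:

```python
from itertools import combinations

def det(M):
    """Determinant by cofactor expansion along the first column."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** i * M[i][0]
               * det([row[1:] for r, row in enumerate(M) if r != i])
               for i in range(len(M)))

def is_totally_unimodular(A):
    """Brute force: every square submatrix must have determinant
    in {-1, 0, 1}."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[r][c] for c in cols] for r in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# Node-arc incidence matrix of the digraph with arcs (1,2), (2,3), (1,3).
M = [[-1,  0, -1],
     [ 1, -1,  0],
     [ 0,  1,  1]]
print(is_totally_unimodular(M))                  # -> True (Corollary 2.2)
print(is_totally_unimodular([[1, 1], [-1, 1]]))  # -> False: determinant 2
```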

2.2 Applications

In this section we show that the assumptions of the theorems in Section 2.1 are fulfilled for the integer programming problems connected with the optimization of flows in networks. This means that these problems can be solved by the SIMPLEX METHOD. However, it is not necessary to use the simplex method, because more efficient methods have been developed that take into consideration the specific structure of these problems.

Many commodities, such as gas, oil, etc., are transported through networks in which we distinguish sources, intermediate transportation or distribution points, and destination points. We will represent a network as a directed graph G = (V, E) and associate with each arc (i, j) ∈ E the flow x_ij of the commodity and the capacity d_ij (possibly infinite) that bounds the flow through the arc. The set V is partitioned into three sets: V_1, the set of sources or origins; V_2, the set of intermediate points; and V_3, the set of destinations or sinks.

[Figure 2.1: A network.]

For each i ∈ V_1, let a_i be the supply of the commodity, and for each i ∈ V_3, let b_i be the demand for the commodity. We assume that there is no loss of flow at intermediate points. Additionally, denote

    V^+(i) = {j | (i, j) ∈ E}  and  V^−(i) = {j | (j, i) ∈ E}.

Then the minimum cost capacitated problem may be formulated as

(P)  v(P) = min Σ_{(i,j)∈E} c_ij x_ij

     s.t.  Σ_{j∈V^+(i)} x_ij − Σ_{j∈V^−(i)} x_ji ≤ a_i,    i ∈ V_1,
           Σ_{j∈V^+(i)} x_ij − Σ_{j∈V^−(i)} x_ji = 0,      i ∈ V_2,     (2.2)
           Σ_{j∈V^+(i)} x_ij − Σ_{j∈V^−(i)} x_ji ≤ −b_i,   i ∈ V_3,

           0 ≤ x_ij ≤ d_ij,  (i, j) ∈ E.    (2.3)

Constraint (2.2) requires the conservation of flow at intermediate points, a net flow into sinks at least as great as demanded, and a net flow out of sources equal to or less than the supply. In some applications, demand must be satisfied exactly and all of the supply must be used. If all of the constraints in (2.2) are equalities, the problem has no feasible solution unless

    Σ_{i∈V_1} a_i = Σ_{i∈V_3} b_i.

To avoid pathological cases, we assume for each cycle in the network G = (V, E) either that the sum of the costs of arcs in the cycle is positive or that the minimal capacity of an arc in the cycle is bounded.

Theorem 2.4 The constraint matrix corresponding to (2.2) and (2.3) is totally unimodular.

Proof. The constraint matrix has the form

    A = ( A_1 )
        (  I  ),

where A_1 is the matrix for (2.2) and I is an identity matrix for (2.3). In the last section, we showed that A_1 being totally unimodular implies that A is totally unimodular.

Each variable x_ij appears in exactly two constraints of (2.2), with coefficients +1 and −1. Thus A_1 is an incidence matrix of a digraph, and therefore it is totally unimodular. Q.E.D.

The most popular case of (P) is the so-called (capacitated) transportation problem. We obtain it if we put, in (P), V_2 = ∅, V^−(i) = ∅ for all i ∈ V_1, and V^+(i) = ∅ for all i ∈ V_3. So we get

(TP)  v(T) = min Σ_{(i,j)∈E} c_ij x_ij

      s.t.  Σ_{j∈V^+(i)} x_ij ≤ a_i,  i ∈ V_1,

            Σ_{j∈V^−(i)} x_ji ≥ b_i,  i ∈ V_3,

            0 ≤ x_ij ≤ d_ij,  (i, j) ∈ E.

If d_ij = ∞ for all (i, j) ∈ E, the uncapacitated version of (P) is sometimes called the transshipment problem.

If all a_i = 1 and all b_i = 1, and additionally |V_1| = |V_3|, the transshipment problem reduces to the so-called assignment problem of the form

(AP)  v(AP) = min Σ_{i∈V_1} Σ_{j∈V^+(i)} c_ij x_ij

      s.t.  Σ_{j∈V^+(i)} x_ij = 1,  i ∈ V_1,

            Σ_{j∈V^−(i)} x_ji = 1,  i ∈ V_3,

            x_ij ≥ 0.

Note that |V_1| = |V_3| implies that all constraints in (AP) must be satisfied as equalities.

Let V = {1, …, m}. Still another important practical problem obtained from (P) is called the maximum flow problem. In this problem, V_1 = {1}, V_3 = {m}, V^−(1) = ∅, V^+(m) = ∅, a_1 = ∞, b_m = ∞. The problem is to maximize the total flow into the vertex m under the capacity constraints:

(MF)  v(MF) = max Σ_{i∈V^−(m)} x_im

      s.t.  Σ_{j∈V^+(i)} x_ij − Σ_{j∈V^−(i)} x_ji = 0,  i ∈ V_2 = {2, …, m − 1},

            0 ≤ x_ij ≤ d_ij,  (i, j) ∈ E.

Finally, consider the shortest path problem. Let c_ij be interpreted as the length of edge (i, j). Define the length of a path in G to be the sum of the edge lengths over all edges in the path. The objective is to find a path of minimum length from vertex 1 to vertex m. It is assumed that all cycles have nonnegative length. This problem is a special case of the transshipment problem in which V_1 = {1}, V_3 = {m}, a_1 = 1 and b_m = 1.

Let A be the incidence matrix of the digraph G = (V, E), where V = {1, …, m} and E = {e_1, …, e_n}. With each arc e_j we associate its length c_j ≥ 0 and its flow x_j ≥ 0. The shortest path problem may be formulated as:

(SP)  v(SP) = min Σ_{j=1}^n c_j x_j

      s.t.  Ax = (−1, 0, …, 0, +1)^T,  x ≥ 0.

The first constraint corresponds to the source vertex, the m-th constraint corresponds to the demand vertex, while the remaining constraints correspond to the intermediate vertices, i.e., the points of distribution of the unit flow. The dual problem of (SP) is

(DSP)  v(DSP) = max (−u_1 + u_m),  s.t.  A^T u ≤ c.    (2.4)

3 The Shortest Path

3.1 The Primal-Dual Method

Consider the standard linear programming problem

(P)  min c^T x  s.t.  Ax = b ≥ 0,  x ≥ 0,

and its dual

(D)  max π^T b  s.t.  π^T A ≤ c^T.

Suppose that we have a current π which is feasible for the dual problem (D). Define the index set J by

J = { j : π^T A_j = c_j },

where A_j is the j-th column of A. Then for any j ∉ J we have π^T A_j < c_j. We call J the set of admissible columns. In order to search for an x that is not only feasible for the primal problem (P) but also, together with π, satisfies the complementarity conditions of (P) and (D), we introduce a new LP, called the restricted primal (RP):

ξ = min Σ_{i=1}^m x_i^a

(RP)  s.t.  Ax + x^a = b,
            x_j ≥ 0 for all j,  x_j = 0 for j ∉ J,
            x_i^a ≥ 0,  i = 1, ..., m,

i.e.,

ξ = min 0^T x_J + Σ_{i=1}^m x_i^a

(RP)  s.t.  A_J x_J + x^a = b,  x_J ≥ 0,  x^a ≥ 0.

The dual of (RP) is

w = max π^T b

(DRP)  s.t.  π^T A_j ≤ 0,  j ∈ J,
             π_i ≤ 1,  i = 1, ..., m.

Let (x̄_J, x̄^a) be an optimal basic feasible solution of (RP) and π̄ an optimal basic feasible solution of (DRP) obtained from (x̄_J, x̄^a). If w = 0, then ξ = 0, and such an x has been found. Otherwise w > 0 and we can update π to

π_new = π + θπ̄.

The new cost in (D) is

(π_new)^T b = π^T b + θπ̄^T b = π^T b + θw,

which means that we get a better π if we can take θ > 0. On the other hand, π_new should remain feasible for (D), i.e.,

(π_new)^T A_j = π^T A_j + θπ̄^T A_j ≤ c_j.

Since π̄^T A_j ≤ 0 for every j ∈ J, we only need to consider those j ∉ J with π̄^T A_j > 0. Therefore, we can take

θ = min { (c_j − π^T A_j) / (π̄^T A_j) : j ∉ J such that π̄^T A_j > 0 }.

Figure 3.1: An illustration of the primal-dual method: Primal (P) → Dual (D) → Restricted Primal (RP) → Dual of RP (DRP) → adjustment of π.

3.2 The Primal-Dual Method for the Shortest Path Problem

Let Â be the incidence matrix of the digraph G = (V, E), where V = {1, ..., m} and E = {e_1, ..., e_n}. With each arc e_j we associate its length c_j ≥ 0 and its flow x_j ≥ 0. The shortest path problem, as we have already seen, may be formulated as:

min Σ_{j=1}^n c_j x_j

s.t.  Âx = (−1, 0, ..., 0, +1)^T,  x ≥ 0.   (3.1)

Let Ā be the submatrix of Â that remains after removing the last row of Â (that row is redundant because the sum of all rows of Â is zero). Then (3.1) turns into

min Σ_{j=1}^n c_j x_j

s.t.  Āx = (−1, 0, ..., 0)^T,  x ≥ 0.   (3.2)

The dual problem of (3.2) is

max −π_1

s.t.  −π_i + π_j ≤ c_ij  for all (i,j) ∈ E,   (3.3)
      π_m = 0,

where we must fix π_m = 0 because the last row of Â is omitted in Ā.

The idea of the primal-dual algorithm is to search for a feasible point x such that

x_ij = 0  whenever  −π_i + π_j < c_ij,

for the given feasible π (think about the complementarity conditions). We search for such an x by solving an auxiliary problem, called the restricted primal (RP), determined by the π we are working with. If our search for the x is not successful, we nevertheless obtain information from the dual of RP, which we call DRP, and which tells us how to improve the particular π with which we started.

Next, we give the details. The shortest path problem can be written as

min Σ_{j=1}^n c_j x_j

s.t.  Ax = (+1, 0, ..., 0)^T,  x ≥ 0,   (3.4)

where A = −Ā. The purpose of introducing A is to make the right-hand side of the constraint Ax = b nonnegative. Now, the dual problem of (3.4) is

max π_1

s.t.  π_i − π_j ≤ c_ij  for all (i,j) ∈ E,   (3.5)
      π_m = 0.

For a given π feasible for (3.5), the set of admissible arcs is defined by

J = { arcs (i,j) : π_i − π_j = c_ij }.

The corresponding restricted primal problem (RP) is

ξ = min Σ_{i=1}^{m−1} x_i^a

s.t.  Ax + x^a = (+1, 0, ..., 0)^T,   (3.6)
      x_j ≥ 0 for all j,  x_j = 0 for j ∉ J,
      x_i^a ≥ 0,  i = 1, ..., m−1,

and the dual of the restricted primal (DRP) is

w = max π_1

s.t.  π_i − π_j ≤ 0  for all (i,j) ∈ J,   (3.7)
      π_i ≤ 1  for all i = 1, ..., m−1,
      π_m = 0.

DRP (3.7) is very easy to solve: since π_1 ≤ 1 and we wish to maximize π_1, we try π_1 = 1. If there is no path from node 1 to node m using only arcs in J, then we can propagate the 1 from node 1 to all nodes reachable by a path from node 1 without violating the π_i − π_j ≤ 0 constraints, and an optimal solution π̄ to the DRP is then

π̄_i = 1 for all nodes i reachable by paths from node 1 using arcs in J,
π̄_i = 0 for all nodes i from which node m is reachable using arcs in J,
π̄_i = 1 for all other nodes i.

(Notice that this solution is not unique.)

We can then calculate

θ_1 = min { c_ij − (π_i − π_j) : arcs (i,j) ∉ J such that π̄_i − π̄_j > 0 }

to update π and J, and re-solve the DRP.

Figure 3.2: A solution to the restricted dual problem (nodes reachable from node 1 via J labeled 1, nodes reaching node m via J labeled 0, remaining nodes labeled 1).

π := π + θ_1 π̄.

If we get to a point where there is a path from node 1 to node m using arcs in J, then the DRP forces π̄_1 = 0, and we have found an optimal solution because ξ = w = 0. Any path from node 1 to node m using only arcs in J is optimal.

The primal-dual algorithm thus reduces the shortest path problem to the repeated solution of a simpler problem: finding the set of nodes reachable from a given node.

Interpretation: Define at any point in the algorithm the set

W = { i : node m is reachable from i by admissible arcs } = { i : π̄_i = 0 }.

Then the variable π_i remains fixed from the time that i enters W to the conclusion of the algorithm, because the corresponding π̄_i will always be zero. Every arc that becomes admissible (enters J) stays admissible throughout the algorithm, because once we have π_i − π_j = c_ij for (i,j) ∈ E, we always change π_i and π_j by the same amount. For i ∈ W, π_i is the length of the shortest path from node i to node m, and the algorithm proceeds by adding to W, at each stage, the node not in W that is next closest to node m. There are at most |V| = m stages.

Dijkstra's algorithm is an efficient implementation of the primal-dual algorithm for the shortest path problem.

3.3 Bellman's Equations

Let c_ij be the length of arc (i,j) (positive arc lengths if c_ij > 0; nonnegative if c_ij ≥ 0). Let u_ij be the length of the shortest path from i to j. Define u_i = u_1i. Then Bellman's equations are

u_1 = 0,  u_i = min_{k≠i} { u_k + c_ki }.
3.4 Dijkstra's Algorithm

In this section we assume that c_ij ≥ 0. Denote by P the set of permanently labeled nodes and by T the set of temporarily labeled nodes.

Figure 3.3: Bellman's equation (u_i obtained from u_k plus the arc length c_ki).

P and T always satisfy P ∩ T = ∅ and P ∪ T = V.

The label for node j is [u_j, l_j], where u_j is the length of the (possibly temporary) shortest path from node 1 to j and l_j is the preceding node in that path. Dijkstra's algorithm can be summarized as follows.

Step 0. P = {1}, u_1 = 0, l_1 = 0, T = V \ P. Compute

u_j = c_1j if (1,j) ∈ E, and u_j = ∞ if (1,j) ∉ E;
l_j = 1 if (1,j) ∈ E, and l_j = 0 if (1,j) ∉ E.

Step 1. Find k ∈ T such that u_k = min_{j∈T} u_j. Let P = P ∪ {k} and T = T \ {k}. If k = n, stop.

Step 2. For each j ∈ T, if u_k + c_kj < u_j, set [u_j = u_k + c_kj, l_j = k]. Go back to Step 1.

Claim: At any step, u_j is the length of the shortest path from 1 to j passing only through nodes in P. [Suppose not, and let j be the first violation ...]

Claim: The total cost is O(n^2).
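The labeling procedure above can be sketched in a few lines. This is an illustrative implementation, not code from the notes; it uses an adjacency matrix (with INF marking absent arcs) so that the O(n^2) behaviour of the analysis is visible, and node indices start at 0 rather than 1.

```python
INF = float("inf")

def dijkstra(c, source=0):
    # u[j]: tentative shortest-path length from source to j; l[j]: predecessor
    n = len(c)
    u = [c[source][j] for j in range(n)]          # Step 0
    u[source] = 0
    l = [source if c[source][j] < INF else -1 for j in range(n)]
    l[source] = -1
    T = set(range(n)) - {source}                  # temporarily labeled nodes
    while T:
        k = min(T, key=lambda j: u[j])            # Step 1: closest node in T
        T.remove(k)                               # k becomes permanently labeled
        for j in T:                               # Step 2: relax arcs (k, j)
            if u[k] + c[k][j] < u[j]:
                u[j] = u[k] + c[k][j]
                l[j] = k
    return u, l

# arcs 0->1 (1), 0->2 (4), 1->2 (2), 1->3 (6), 2->3 (1)
c = [[INF, 1, 4, INF],
     [INF, INF, 2, 6],
     [INF, INF, INF, 1],
     [INF, INF, INF, INF]]
u, l = dijkstra(c)
```

Here u comes out as [0, 1, 3, 4] and the predecessor labels trace the path 0 → 1 → 2 → 3; the `min` over T at each step is exactly the O(n) scan that yields the O(n^2) total.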

3.5 PERT or CPM Networks

A large project is divisible into many unit tasks. Each task requires a certain amount of time for its completion, and the tasks are partially ordered. Such a network is sometimes called a PERT (Project Evaluation and Review Technique) or CPM (Critical Path Method) network. A PERT network is necessarily acyclic.

Theorem 3.1 A digraph is acyclic if and only if its nodes can be renumbered in such a way that for every arc (i,j), i < j. [The work required for this renumbering is O(n^2).]

Claim: Every acyclic graph has at least one node of indegree 0. After renumbering, we have i < j for every arc (i,j).

Bellman's equations are

u_1 = 0,  u_i = min_{k≠i} { u_k + c_ki }.

For acyclic graphs, they become

u_1 = 0,  u_i = min_{k<i} { u_k + c_ki }.

For a network with no cycles, one can replace each arc length by its negative and still carry out the computation successfully; equivalently,

u_1 = 0,  u_i = max_{k<i} { u_k + c_ki }.

Finding the longest path gives the time needed to finish the project.
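The acyclic max-form recursion admits a direct one-pass evaluation once the nodes are numbered so that every arc (i, j) has i < j. A small sketch (function name and data are illustrative, not from the notes):

```python
def critical_path(n, arcs):
    # u[i] = max_{k<i} (u[k] + c_ki): length of a longest path from node 0
    NEG = float("-inf")
    u = [NEG] * n
    u[0] = 0
    pred = {j: [] for j in range(n)}
    for i, j, c in arcs:
        assert i < j, "nodes must be renumbered topologically first"
        pred[j].append((i, c))
    for j in range(1, n):                 # one pass in node order suffices
        for i, c in pred[j]:
            u[j] = max(u[j], u[i] + c)
    return u

# tasks: 0->1 (3h), 0->2 (2h), 1->3 (4h), 2->3 (6h)
u = critical_path(4, [(0, 1, 3), (0, 2, 2), (1, 3, 4), (2, 3, 6)])
```

Here u[3] = 8 is the project finishing time, attained along the critical path 0 → 2 → 3.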

3.6 The Bellman-Ford Method

In this section we consider a general method of solution for Bellman's equations. Here we assume neither that the network is acyclic nor that all arc lengths are nonnegative. [We still assume that there are no negative cycles.]

Step 1. u_1^(1) = 0, u_j^(1) = c_1j for j ≠ 1.

Step k. For k = 2, ..., n,

u_j^(k) = min{ u_j^(k−1), min_{i≠j} { u_i^(k−1) + c_ij } },  j = 1, ..., n.

Clearly, for each node j, the successive approximations of u_j are monotone decreasing:

u_j^(1) ≥ u_j^(2) ≥ u_j^(3) ≥ ...

The total computational cost is O(n^3).

Outline of proof: u_j^(k) is the length of the shortest path from node 1 to node j, subject to the condition that the path contains no more than k arcs.
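The successive-approximation scheme translates almost verbatim into code. This sketch is illustrative (0-based nodes, arcs in a dictionary) and keeps the whole vector u^(k−1) while computing u^(k), exactly as in the recursion above:

```python
def bellman_ford(n, c, source=0):
    # c maps (i, j) -> arc length; negative lengths allowed, no negative cycles
    INF = float("inf")
    u = [INF] * n
    u[source] = 0                       # u^(1)
    for _ in range(n - 1):              # steps k = 2, ..., n
        new = u[:]                      # u^(k) is built from u^(k-1)
        for (i, j), cij in c.items():
            if u[i] + cij < new[j]:
                new[j] = u[i] + cij
        u = new
    return u

# a negative arc, but no negative cycle
arcs = {(0, 1): 4, (0, 2): 2, (2, 1): -1, (1, 3): 2, (2, 3): 5}
u = bellman_ford(4, arcs)
```

Here u ends as [0, 1, 2, 3]: node 1 is reached more cheaply through the negative arc (2, 1). The n−1 rounds over up to n^2 arcs give the O(n^3) bound.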


3.7 The Floyd-Warshall Method for Shortest Paths Between All Pairs

Again, we need the assumption that the network contains no negative cycles in order for the Floyd-Warshall method to work.

Step 0. u_ij^(1) = c_ij, i, j = 1, ..., n.

Step k. For k = 1, ..., n,

u_ij^(k+1) = min{ u_ij^(k), u_ik^(k) + u_kj^(k) },  i, j = 1, ..., n.

Claim: u_ij^(k) is the length of a shortest path from i to j, subject to the condition that the path does not pass through nodes k, k+1, ..., n (i and j excepted). [This means u_ij^(n+1) = u_ij.]

Proof by induction. It is clearly true for Step 0. Suppose it is true for u_ij^(k) for all i and j, and consider u_ij^(k+1). If a shortest path from node i to node j that does not pass through nodes k+1, k+2, ..., n also does not pass through k, then u_ij^(k+1) = u_ij^(k). Otherwise, if it does pass through node k, u_ij^(k+1) = u_ik^(k) + u_kj^(k).
It is easy to see that the complexity of the Floyd-Warshall method is O(n^3). The method requires the storage of an n × n matrix. Initially this is U^(1) = C. Thereafter, U^(k+1) is obtained from U^(k) by using row k and column k to revise the remaining elements: u_ij is compared with u_ik + u_kj, and if the latter is smaller, u_ik + u_kj is substituted for u_ij in the matrix. There are other methods of the above type, e.g., G. B. Dantzig's method.

3.8 Other Cases

1. Sparse graphs: |A| << |V|(|V| − 1)/2.
2. The k-th shortest path problem: with repetition allowed; with repeated arcs not allowed; with repeated nodes not allowed.
3. With time constraints.
4. With fixed charge.


4 The Greedy Algorithm and Computational Complexity

4.1 Matroids

Matroid theory was founded by H. Whitney in 1935; in 1965, J. Edmonds pointed out the significance of matroid theory to combinatorial optimization (CO). Its importance: 1) many CO problems can be formulated as matroid problems and solved by the same algorithm; 2) it gives insight into the structure of CO problems; 3) it is a special tool for CO.

Definition 4.1 Suppose we have a finite ground set S, |S| < ∞, and a collection F of subsets of S. Then H := (S, F) is said to be an independence system if the empty set is in F and F is closed under inclusion; that is, i) ∅ ∈ F; ii) X ∈ F and Y ⊆ X imply Y ∈ F. Elements of F are called independent sets, and subsets of S not in F are called dependent sets.

Example: Matching system. G = (V, E), F = {all matchings in G}.

[A matching M of a graph G = (V, E) is a subset of the edges with the property that no two edges of M share the same node; that is, a matching M is a pairwise disjoint edge set.]
Figure 4.1: A matching example with edge set {e_1, e_2, e_3, e_4}.

In Figure 4.1, S = {e_1, e_2, e_3, e_4} and F = {∅, {e_1}, {e_2}, {e_3}, {e_4}, {e_2, e_3}}.

Definition 4.2 If H = (S, F) is an independence system such that for all X, Y ∈ F with |X| = |Y| + 1 there exists e ∈ X \ Y such that Y + e ∈ F, then H (or the pair (S, F)) is called a matroid.

Examples: i) Matric matroid: given a matrix A = (a_1, ..., a_n) of size m × n, take S = {a_1, ..., a_n}, and X ∈ F if and only if X = {a_i1, ..., a_ik} is linearly independent. ii) Graphic matroid: G = (V, E), S = E, and X ∈ F if and only if X ⊆ E and X has no cycle.

ii) is a special case of i) with A the vertex-edge incidence matrix.
4.2 The Greedy Algorithm

Suppose that H = (S, F) is an independence system and W : S → R_+ is a weight function with W(e) ≥ 0 for all e ∈ S. For X ⊆ S, let

W(X) := Σ_{e∈X} W(e).

Then the matroid problem is

max W(X)  s.t.  X ∈ F.

Greedy Algorithm: Suppose W(e_1) ≥ W(e_2) ≥ ... ≥ W(e_n).

Step 0. Let X = ∅.

Step k (k = 1, ..., n). If X + e_k ∈ F, let X := X + e_k.

Theorem 4.1 (Rado, Edmonds) The above algorithm works (i.e., returns an optimal solution for every such weight function) if and only if H is a matroid.
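For the graphic matroid, the test "X + e_k ∈ F" in Step k is a cycle check, which a union-find structure performs in near-constant time. The following sketch (illustrative names and data, not code from the notes) is exactly the greedy algorithm on a graphic matroid:

```python
def greedy_max_forest(n, edges):
    # Greedy on the graphic matroid: scan edges by weight, largest first,
    # accepting an edge iff it joins two different components (no cycle).
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    chosen = []
    for w, i, j in sorted(edges, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                        # X + e stays independent
            parent[ri] = rj
            chosen.append((w, i, j))
    return chosen

tree = greedy_max_forest(4, [(5, 0, 1), (4, 1, 2), (3, 0, 2), (6, 2, 3)])
```

The result is a maximum spanning tree of total weight 15; the edge (3, 0, 2) is rejected because it would close a cycle with heavier edges already chosen.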

Applications: 1) The Maximal Spanning Tree Problem. Suppose that a television network leases video links so that its stations in various places can be formed into a connected network. Each link (i,j) has a different rental cost c_ij. The question is how the network can be constructed to have the minimum cost. Obviously, what is wanted is a minimum cost spanning tree of video links. Replacing c_ij by M − c_ij, where M is a large number, we can see that the problem turns into a maximum spanning tree (MST) problem. Kruskal proposed the following solution: choose the edges one at a time in order of their weights, largest first, rejecting an edge only if it forms a cycle with edges already chosen.

2) A Sequencing Problem. Suppose that there are a number of jobs which are to be processed by a single machine. All jobs require the same processing time. Each job j has assigned to it a deadline d_j and a penalty p_j, which must be paid if the job is not completed by its deadline. What ordering of the jobs minimizes the total penalty cost? It can easily be seen that there exists an optimal sequence in which all jobs completed on time appear at the beginning of the sequence in order of deadlines, earliest deadline first. The late jobs follow, in arbitrary order. Thus, the problem is to choose an optimal set of jobs which can be completed on time. The following procedure can be shown to accomplish that objective.

Choose the jobs one at a time in order of penalties, largest rst, rejecting a job only if its choice would mean that it, or one of the jobs already chosen, cannot be completed on time. [This requires checking to see that the total amount of processing to be completed by a particular deadline does not exceed the deadline in question.]

For example, consider the set of jobs below, where the processing time of each job is one hour, and the deadlines are expressed in hours of elapsed time.


Job j          1    2    3    4    5    6
Deadline d_j   1    1    3    2    3    6
Penalty p_j   10    9    7    6    4    2

Job 1 is chosen, but job 2 is discarded, because the two together require two hours of processing time and the deadline for job 2 is at the end of the first hour. Jobs 3 and 4 are chosen, job 5 is discarded, and job 6 is chosen. An optimal sequence is jobs 1, 4, 3, and 6, followed by the late jobs 2 and 5.
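The acceptance test in the procedure — can all chosen unit-time jobs meet their deadlines? — amounts to checking that, with the chosen deadlines sorted increasingly, the job in position p (0-based) has deadline at least p + 1. An illustrative sketch reproducing the table above:

```python
def select_on_time(jobs):
    # jobs: job -> (deadline, penalty); every processing time is one hour
    def schedulable(chosen):
        deadlines = sorted(jobs[j][0] for j in chosen)
        return all(d >= pos + 1 for pos, d in enumerate(deadlines))

    chosen = []
    for j in sorted(jobs, key=lambda j: -jobs[j][1]):   # largest penalty first
        if schedulable(chosen + [j]):
            chosen.append(j)
    return chosen

jobs = {1: (1, 10), 2: (1, 9), 3: (3, 7), 4: (2, 6), 5: (3, 4), 6: (6, 2)}
on_time = select_on_time(jobs)
```

This selects jobs {1, 3, 4, 6}, matching the worked example: jobs 2 and 5 are late, and the penalty paid is 9 + 4 = 13.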

3) A Semimatching Problem. Let W be an m × n nonnegative matrix. Suppose we wish to choose a maximum weight subset of elements, subject to the constraint that no two elements are from the same row of the matrix. In other words, the problem is to maximize

Σ_{i,j} w_ij x_ij

subject to

Σ_j x_ij ≤ 1,  i = 1, ..., m,

x_ij ∈ {0, 1}.

This semimatching problem can be solved by choosing the largest element in each row of W. Or alternatively: choose the elements one at a time in order of size, largest first, rejecting an element only if an element in the same row has already been chosen.
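Since the rows are independent of one another, the row-by-row rule is one line of code; a tiny sketch with made-up data:

```python
def semimatching(W):
    # for each row, pick the column index of its largest element
    return [max(range(len(row)), key=lambda j: row[j]) for row in W]

W = [[3, 7, 2],
     [5, 1, 4]]
picks = semimatching(W)   # one chosen column per row
```

Here picks is [1, 0], for a total weight of 7 + 5 = 12; the alternative largest-first scan selects the same set of elements.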
4.3 A General Introduction to Computational Complexity

Complexity theory in discrete optimization was initiated in large measure by the seminal papers of S. A. Cook (1971) and R. M. Karp (1972).

Definition 4.3 An instance of an optimization problem consists of a feasible set F and a cost function c : F → R. An optimization problem is defined as a collection of instances.

For example, linear programming is a problem and min{x : x ≥ 3} is an instance. Instances of a problem need to be described according to a common format. For example, instances of linear programming in standard form can be described by listing the entries of A, b, and c. Note that some instances are larger than others, and it is convenient to define the notion of the size of an instance.

Definition 4.4 The size of an instance is defined as the number of bits used to describe the instance, according to a prescribed format.

Since arbitrary real numbers cannot be represented in binary, this definition is geared towards instances involving integer (or rational) numbers. Note that any nonnegative integer r less than or equal to U can be written in binary as

r = a_k 2^k + a_{k−1} 2^{k−1} + ... + a_1 2^1 + a_0,

where each scalar a_0, ..., a_k is 0 or 1. The number k is clearly at most ⌊log_2 U⌋, since r ≤ U. We can then represent r by the binary vector (a_0, a_1, ..., a_k). With an extra bit for the sign, we can also represent negative numbers. In other words, we can represent any integer with absolute value less than or equal to U using at most ⌊log_2 U⌋ + 2 bits.

Consider now an instance of a linear programming problem in standard form, i.e., an m × n matrix A, an m-vector b, and an n-vector c, and assume that the magnitude of the largest entry of {A, b, c} is equal to U. Since there are (mn + m + n) entries in A, b, and c, the size of such an instance is at most (mn + m + n)(⌊log_2 U⌋ + 2). In fact, this count is not exactly correct: more bits are needed to encode flags that indicate where one number ends and another starts. However, our count is right as far as the order of magnitude is concerned. To avoid details of this kind, we will instead use order-of-magnitude notation, and we will simply say that the size of such an instance is O(mn log U).

Optimization problems are solved by algorithms. The running time of an algorithm will, in general, depend on the instance to which it is applied. Let T(n) be the worst-case running time of some algorithm over all instances of size n, under the bit model.

Definition 4.5 An algorithm runs in polynomial time if there exists an integer k such that T(n) = O(n^k).

Fact: Suppose that an algorithm takes polynomial time under the arithmetic model. Furthermore, suppose that on instances of size n, any integer produced in the course of execution of the algorithm has size bounded by a polynomial in n. Then the algorithm runs in polynomial time under the bit model as well.

The class P: a combinatorial optimization (CO) problem is in P if it admits an algorithm of polynomial complexity.

The class NP: a combinatorial problem is in NP if for every YES instance there exists a polynomial length certificate that can be used to verify in polynomial time that the answer is indeed YES.

An example of such a certificate: verifying the optimality of a given LP solution. Obviously, P ⊆ NP. But is P = NP?

Definition 4.6 Suppose that there exists an algorithm for some problem A that consists of a polynomial time computation in addition to a polynomial number of subroutine calls to an algorithm for problem B. We then say that problem A reduces (in polynomial time) to problem B; for short, A ⇒ B. In the above definition, all references to polynomiality are with respect to the size of an instance of problem A.

Theorem 4.2 If A ⇒ B and B ∈ P, then A ∈ P.

The above theorem says that if A ⇒ B, then problem A is not much more difficult than problem B. For example, let us consider the following scheduling problem: a set of jobs is to be processed on two machines, where no job requires more than three operations. A job may require, for example, processing on machine one first, followed by machine two, and finally back on machine one. Our objective is to minimize makespan, i.e., to complete the set of jobs in minimum time. Let us refer to this problem as (PJ).

Now take the one-row integer program, or knapsack problem, which we state in equality form: given integers a_1, a_2, ..., a_n and b, does there exist a subset S ⊆ {1, 2, ..., n} such that Σ_{j∈S} a_j = b? Calling the latter problem (PK), our objective is to show that (PK) polynomially reduces to (PJ).

For a given (PK) we construct an instance of (PJ) wherein the first n jobs require only one operation, this being on machine one; job j has processing time a_j for j = 1, 2, ..., n. Job n+1 possesses three operations constrained in such a way that the first is on machine two, the second on machine one, and the last on machine two again. The first such operation has duration b, the second duration 1, and the third duration Σ_{j=1}^n a_j − b.

Clearly, one lower bound on the completion time of all jobs in this instance of (PJ) is the sum of the processing times of job n+1, i.e., Σ_{j=1}^n a_j + 1. Any feasible schedule for all jobs achieving this makespan value must be optimal. Suppose a subset S exists such that the knapsack problem is solvable. For (PJ) we can schedule the jobs implied by S first on machine one, followed by the second operation of job n+1, and finish with the remaining jobs (those not given by S). The first and last operations of job n+1 (on machine two) finish at times b and Σ_{j=1}^n a_j + 1, respectively. Thus the completion time of this schedule is Σ_{j=1}^n a_j + 1.

If, conversely, there is no subset S ⊆ {1, 2, ..., n} with Σ_{j∈S} a_j = b, our scheduling instance is forced into a solution in which either job n+1 waits before it obtains the needed unit of time on machine one, or some of jobs 1, 2, ..., n wait to keep job n+1 progressing. Either way the last job completes after time Σ_{j=1}^n a_j + 1.

We conclude that the question of whether (PK) has a solution reduces to asking whether the corresponding (PJ) has makespan no greater than Σ_{j=1}^n a_j + 1. Since (as is usually the case) the size of the required (PJ) instance is a simple polynomial (in fact linear) function of the size of (PK), we have a polynomial reduction: problem (PK) indeed reduces polynomially to (PJ).
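The construction in the reduction is purely mechanical and can be sketched as follows (the job encoding is illustrative, not a standard format):

```python
def knapsack_to_scheduling(a, b):
    # Jobs 1..n: a single operation of length a_j on machine 1.
    # Job n+1: (machine 2, b), then (machine 1, 1), then (machine 2, sum(a)-b).
    jobs = [[("M1", aj)] for aj in a]
    jobs.append([("M2", b), ("M1", 1), ("M2", sum(a) - b)])
    target_makespan = sum(a) + 1   # (PK) is a YES instance iff this is achievable
    return jobs, target_makespan

jobs, target = knapsack_to_scheduling([3, 5, 2], 5)
```

For a = (3, 5, 2) and b = 5, the target makespan is 11, achieved by running the subset S = {2} (total length 5) before the unit operation of job n+1; the constructed instance is clearly linear in the size of (PK).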
4.4 Three Forms of a CO Problem

A CO problem consists of a feasible solution set F and a cost function c : F → R:

min c(f)  s.t.  f ∈ F.

The above CO problem has three versions:
a) The optimization version: find an optimal solution.
b) The evaluation version: find the optimal value of c(f), f ∈ F.
c) The recognition version: given an integer L, is there a feasible solution f ∈ F such that c(f) ≤ L?

These three types of problems are closely related in terms of algorithmic difficulty. In particular, the difficulty of the recognition problem is usually a very good indicator of the difficulty of the corresponding evaluation and optimization problems. For this reason, we can focus, without loss of generality, on recognition problems.

Consider the following combinatorial optimization problem, called the maximum clique problem: given a graph G = (V, E), find the largest subset C ⊆ V such that for all distinct u, v ∈ C, (v, u) ∈ E. The maximum clique problem is in NP, or in short, Clique ∈ NP. Assume that we have a procedure cliquesize which, given any graph G, evaluates the size of the maximum clique of G; in other words, cliquesize solves the evaluation version of the maximum clique problem. We can then make efficient use of this routine in order to solve the optimization version.

Step 0. X = ∅.
Step 1. Find v ∈ V such that cliquesize(G(v)) = cliquesize(G), where G(v) is the subgraph of G consisting of v and all its adjacent nodes.
Step 2. X = X + v, G = G(v) \ v. If G = ∅, stop; otherwise, go to Step 1.

We now discuss the relation between the three variants in general. Let us assume that the cost c(f) of any feasible f ∈ F can be computed in polynomial time. It is then clear that a polynomial time algorithm for the optimization problem leads to a polynomial time algorithm for the evaluation problem. (Once an optimal solution is found, use it to evaluate, in polynomial time, the optimal cost.) Similarly, a polynomial time algorithm for the evaluation problem immediately translates into a polynomial time algorithm for the recognition problem. For many interesting problems, the converse is also true: namely, a polynomial time algorithm for the recognition problem often leads to polynomial time algorithms for the evaluation and optimization problems.

Suppose that the optimal cost is known to take one of M values. We can then perform binary search and solve the evaluation problem using log M calls to an algorithm for the recognition problem. If log M is bounded by a polynomial function of the instance size (which is often the case), and if the recognition algorithm runs in polynomial time, we obtain a polynomial time algorithm for the evaluation problem.

We will now give another example to show how a polynomial time evaluation algorithm can lead to a polynomial time optimization algorithm, using the zero-one integer programming problem (ZOIP). Given an instance I of ZOIP, let us consider a particular component of the vector x to be optimized, say x_1, and let us form a new instance I' by adding the constraint x_1 = 0. We run an evaluation algorithm on instances I and I'. If the outcome is the same for both instances, we can set x_1 to zero without any loss of optimality. If the outcome is different, we conclude that x_1 should be set to 1. In either case, we have arrived at an instance involving one less variable to be optimized. Continuing in the same way, fixing the value of one variable at a time, we obtain an optimization algorithm whose running time is roughly equal to the running time of the evaluation algorithm times the number of variables.
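The variable-fixing argument can be simulated with any evaluation oracle. In the sketch below the oracle is a brute-force stand-in (so it is certainly not polynomial time; it only illustrates the reduction itself, whose own work is n + 1 oracle calls):

```python
from itertools import product

def eval_oracle(n, cost, feasible, fixed):
    # brute-force stand-in for an evaluation algorithm: optimal cost over all
    # 0/1 vectors consistent with the partial assignment `fixed` (None = free)
    best = None
    for x in product((0, 1), repeat=n):
        if all(f is None or xi == f for xi, f in zip(x, fixed)) and feasible(x):
            v = cost(x)
            if best is None or v < best:
                best = v
    return best

def optimize_via_evaluation(n, cost, feasible):
    fixed = [None] * n
    opt = eval_oracle(n, cost, feasible, fixed)
    for i in range(n):
        fixed[i] = 0                    # try adding the constraint x_i = 0
        if eval_oracle(n, cost, feasible, fixed) != opt:
            fixed[i] = 1                # the value changed, so x_i must be 1
    return fixed

x = optimize_via_evaluation(3, lambda v: 2*v[0] - 3*v[1] + v[2], lambda v: True)
```

The returned minimizer is [0, 1, 0], found with n + 1 = 4 oracle calls; replacing the stand-in with a genuine polynomial time evaluation algorithm makes the whole procedure polynomial.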
4.5 NPC

The class co-NP: a combinatorial problem is in co-NP if for every NO instance there exists a polynomial length certificate that can be used to verify in polynomial time that the answer is indeed NO. Obviously, P ⊆ co-NP. But is P = co-NP?

The next definition deals with the simplest type of reduction, where an instance of problem A is replaced by an equivalent instance of problem B. Rather than developing a general definition of equivalence, it is more convenient to focus on recognition problems, that is, problems that have a binary answer (e.g., YES or NO).

Figure 4.2: Relationships among P, NP, and co-NP.

Definition 4.7 Let A and B be two recognition problems. We say that problem A transforms to problem B (in polynomial time) if there exists a polynomial time algorithm which, given an instance I_1 of problem A, outputs an instance I_2 of B with the property that I_1 is a YES instance of A if and only if I_2 is a YES instance of B. [A ⇒ B.]

The class NP-hard: a problem A is NP-hard if for every problem B ∈ NP, B reduces to A.

Theorem 4.3 Suppose that a problem C is NP-hard and that C can be transformed (in polynomial time) to another problem D. Then D is NP-hard.

Define a set of Boolean variables {x_1, x_2, ..., x_n} and let the complement of any of these variables x_i be denoted by x̄_i. In the language of logic, these variables are referred to as literals. To each literal we assign a label of true or false such that x_i is true if and only if x̄_i is false.

Let the symbol ∨ denote "or" and the symbol ∧ denote "and". We can then write any Boolean expression in what is referred to as conjunctive normal form, i.e., as a finite conjunction of disjunctions, each using every literal at most once. For example, with the set of variables {x_1, x_2, x_3, x_4} one might encounter the following conjunctive normal form expression:

(x_1 ∨ x_2 ∨ x_4) ∧ (x̄_1 ∨ x_2 ∨ x̄_3) ∧ (x̄_2 ∨ x̄_4).

Each disjunctive grouping in parentheses is referred to as a clause. The satisfiability problem is: given a set of literals and a conjunction of clauses defined over the literals, is there an assignment of values to the literals for which the Boolean expression is true? If so, the expression is said to be satisfiable. The Boolean expression above is satisfiable via the following assignment: x_1 = x_2 = x_3 = true and x_4 = false.

Let SAT denote the satisfiability problem and Q be any member of NP.
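The certificate check that places SAT in NP is mechanical: substitute the assignment and test every clause. A small sketch encoding the expression above (a literal is represented here as a pair (variable, negated); the encoding is illustrative):

```python
def evaluate_cnf(clauses, assignment):
    # the conjunction is true iff every clause has a satisfied literal
    return all(any(assignment[v] != neg for v, neg in clause)
               for clause in clauses)

# (x1 v x2 v x4) ^ (~x1 v x2 v ~x3) ^ (~x2 v ~x4)
clauses = [[(1, False), (2, False), (4, False)],
           [(1, True), (2, False), (3, True)],
           [(2, True), (4, True)]]
assignment = {1: True, 2: True, 3: True, 4: False}
satisfied = evaluate_cnf(clauses, assignment)
```

With the assignment from the text, satisfied is True; the check clearly runs in time linear in the length of the expression, which is exactly what membership in NP requires.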

Theorem 4.4 (Cook (1971)) Every problem Q ∈ NP polynomially reduces to SAT.

Karp (1972) showed that SAT polynomially reduces to many combinatorial problems. The class NPC: a recognition problem A is in NPC if i) A ∈ NP, and ii) for every problem B ∈ NP, B reduces to A.

Cook's Theorem shows that SAT ∈ NPC, because it is easily checked that SAT ∈ NP. Examples of NPC problems: ILP, ZOIP, Clique, Vertex Packing, TSP, 3-Index Assignment, Knapsack, etc.

Figure 4.3: Relationships among P, NP, NPC, and NP-hard.

NP-hardness is not a definitive proof that no polynomial time algorithm exists. For all we know, it is always possible that ZOIP belongs to P, and hence P = NP. Nevertheless, NP-hardness suggests that we should stop searching for a polynomial time algorithm, unless we are willing to tackle the P = NP question.

For a good guide to the theory of NP-completeness, see:

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979.

C. H. Papadimitriou, Computational Complexity, 1995.
