
Advanced algorithms

Freely using the textbook by Cormen, Leiserson, Rivest, Stein

Péter Gács

Computer Science Department


Boston University

Spring 09

Péter Gács (Boston University) CS 530 Spring 09 1 / 165


Introduction

The class structure

See the course homepage.


In the notes, section numbers and titles generally refer to the
book: CLRS, Introduction to Algorithms, second edition.

Péter Gács (Boston University) CS 530 Spring 09 2 / 165


Linear algebra Matrices and vectors

Vectors

For us, a vector is always given by a finite sequence of numbers.


Row vectors, column vectors, matrices.
Notation:
Z: integers,
Q: rationals,
R: reals,
C: complex numbers.
Q, R, C are fields (allowing division as well as multiplication).
(We may get to see also some other fields later.)
Addition: componentwise. Over a field, multiplication of a vector
by a field element is also defined (componentwise).
Linear combination.

Péter Gács (Boston University) CS 530 Spring 09 3 / 165


Linear algebra Vector spaces

Vector spaces

Vector space over a field: a set M of vectors closed under linear


combination.
Elements of the field will be also called scalars.
Examples
The set C of complex numbers is a vector space over the field
R of real numbers (2 dimensional, see later).
It is also a vector space over the complex numbers (1
dimensional).
{ (x, y, z) : x + y + z = 0 }.
{ (2t + u, u, t − u) : t, u ∈ R }.

Péter Gács (Boston University) CS 530 Spring 09 4 / 165


Linear algebra Linear dependence

Linear dependence

Subspace. Generated subspace.


Two equivalent criteria of dependence:
one of them depends on the others (is in the subspace
generated by the others)
a nontrivial linear combination is 0.

Examples
{(1, 2), (3, 6)}. Two vectors are dependent when one is a scalar
multiple of the other.
{(1, 0, 1), (0, 1, 0), (1, 1, 1)}.

Basis in a subspace M: a maximal lin. indep. set.

Theorem
A set is a basis iff it is a minimal generating set.

Péter Gács (Boston University) CS 530 Spring 09 5 / 165


Linear algebra Linear dependence

Examples
A basis of { (x, y, z) : x + y + z = 0 } is {(0, 1, −1), (1, 0, −1)}.
A basis of { (2t + u, u, t − u) : t, u ∈ R } is {(2, 0, 1), (1, 1, −1)}.

Theorem
All bases have the same number of elements.

Proof. Via the exchange lemma.

Dimension of a vector space: this number.

Example
The set of all n-tuples of real numbers with the property that the
sum of their elements is 0 has dimension n − 1.

Péter Gács (Boston University) CS 530 Spring 09 6 / 165


Linear algebra Linear dependence

Let M be a vector space. If b_1, . . . , b_n is an n-element basis, then each
vector x in M has a unique expression as

x = x_1 b_1 + · · · + x_n b_n.

The x i are called the coordinates of x with respect to this basis.

Example
If M is the set Rn of all n-tuples of real numbers then the
n-tuples of form e i = (0, . . . , 1, . . . , 0) (only position i has 1) form a
basis. Then (x1 , . . . , xn ) = x1 e1 + · · · + xn e n .

Péter Gács (Boston University) CS 530 Spring 09 7 / 165


Linear algebra Linear dependence

Example
If A is the set of all n-tuples whose sum is 0 then the n − 1 vectors

(1, −1, 0, ..., 0)


(0, 1, −1, 0, . . . , 0)
...
(0, 0, 0, 0, . . . , 0, 1, −1)

form a basis of A (prove it!).

Péter Gács (Boston University) CS 530 Spring 09 8 / 165


Linear algebra Matrices

Matrices

(a i j ). Dimensions. m × n
Diagonal matrix diag(a 11 , . . . , a nn )
Identity matrix.
Triangular (unit triangular) matrices.
Permutation matrix.
Transpose A T . Symmetric matrix.

Péter Gács (Boston University) CS 530 Spring 09 9 / 165


Linear algebra Matrices

Matrix representing a linear map

A p × q matrix A can represent a linear map R q → R p as follows:

x1 = a 11 y1 + · · · + a 1 q yq
.. ..
. .
x p = a p1 y1 + · · · + a pq yq

With column vectors x = (x i ), y = (y j ) and matrix A = (a i j ), this


can be written as

x = A y.

This is taken as the definition of matrix-vector product.


General definition of a linear transformation F : V → W. Every
such transformation can be represented by a matrix, after we fix
bases in V and W.

Péter Gács (Boston University) CS 530 Spring 09 10 / 165


Linear algebra Matrices

Matrix multiplication

Let us also have

y1 = b 11 z1 + · · · + b 1r z r
.. ..
. .
yq = b q1 z1 + · · · + b qr z r

writeable as y = Bz. Then it can be computed that

x = Cz where C = (c ik ),

c ik = a i1 b 1k + · · · + a iq b qk (i = 1, . . . , p, k = 1, . . . , r).

Péter Gács (Boston University) CS 530 Spring 09 11 / 165


Linear algebra Matrices

We define the matrix product

AB = C

from above, which makes sense only for compatible matrices


(p × q and q × r). Then

x = A y = A(Bz) = Cz = (AB)z.

From this we can infer also that matrix multiplication is


associative.
Example
For A = ( 0, 1; 0, 0 ), B = ( 0, 0; 1, 0 ) we have AB ≠ BA.

Péter Gács (Boston University) CS 530 Spring 09 12 / 165


Linear algebra Matrices

Transpose of product
Easy to check: (AB)T = B T A T .

Inner product
If a = (a i ), b = (b i ) are vectors of the same dimension n taken as
column vectors then

aT b = a1 b1 + · · · + a n b n

is called their inner product: it is a scalar. The Euclidean norm


(length) of a vector v is defined as
√(vᵀv) = ( ∑_i v_i² )^{1/2}.

Péter Gács (Boston University) CS 530 Spring 09 13 / 165


Linear algebra Matrices

The (less frequently used) outer product makes sense for any two
column vectors of dimensions p, q, and is the p × q matrix
ab T = (a i b j ).

Péter Gács (Boston University) CS 530 Spring 09 14 / 165


Linear algebra Inverse, rank

Inverse, rank

Example

( 1, 1; 1, 0 )^{−1} = ( 0, 1; 1, −1 ).

(AB)−1 = B−1 A −1 .
(A T )−1 = (A −1 )T .
A square matrix with no inverse is called singular. Nonsingular
matrices are also called regular.

Example
The matrix ( 1, 0; 1, 0 ) is singular.

Péter Gács (Boston University) CS 530 Spring 09 15 / 165


Linear algebra Inverse, rank

Im(A) = set of image vectors of A. If the columns of matrix A are


a1 , . . . , a n , then the product Ax can also be written as

Ax = x1 a1 + · · · + xn a n .

This shows that Im(A) is generated by the column vectors of the


matrix, moreover

a_j = A e_j,   with e_1 = (1, 0, 0, . . . , 0)ᵀ, e_2 = (0, 1, 0, . . . , 0)ᵀ, and so on.

Péter Gács (Boston University) CS 530 Spring 09 16 / 165


Linear algebra Inverse, rank

Ker(A) = the set of vectors x with Ax = 0.


The sets Im(A) and Ker(A) are subspaces.
Null vector of a matrix: non-0 element of the kernel.
Theorem
If A : V → W then

dim Ker(A) + dim Im(A) = dim(V ).

Theorem
A square matrix A is singular iff Ker A ≠ {0}.

More generally, a non-square matrix A will be called singular, if


Ker A ≠ {0}.
The rank of a set of vectors: the dimension of the space they
generate.
The column rank of a matrix A is dim(ImA). (The row rank is
harder to interpret.)
Péter Gács (Boston University) CS 530 Spring 09 17 / 165
Linear algebra Inverse, rank

Theorem
The two ranks are the same (see proof later). Also, rank(A) is the
smallest r such that there is an m × r matrix B and an r × n
matrix C with A = BC.

A special case is easy:

Proposition
A triangular matrix with only r rows (or only r columns) and all
non-0 diagonal elements in those rows, has row rank and column
rank r.

Interpretation: going through spaces with dimensions m → r → n.

Example
The outer product A = bc T of two vectors has rank 1, and this
product is the decomposition.

Péter Gács (Boston University) CS 530 Spring 09 18 / 165


Linear algebra Inverse, rank

The following is immediate:

Proposition
A square matrix is nonsingular iff it has full rank.

Minors.

Péter Gács (Boston University) CS 530 Spring 09 19 / 165


Linear algebra Determinant

Determinant

Definition
A permutation: an invertible map σ : {1, . . . , n} → {1, . . . , n}.
The product of two permutations σ, τ is their consecutive
application: (στ)(x) = σ(τ(x)).
A transposition is a permutation that interchanges just two
elements.
An inversion in a permutation: a pair of numbers i < j with
σ(i) > σ( j). We denote by Inv(σ) the number of inversions in σ.
A permutation σ is even or odd depending on whether Inv(σ)
is even or odd.

Péter Gács (Boston University) CS 530 Spring 09 20 / 165


Linear algebra Determinant

Proposition
(a) A transposition is always an odd permutation.
(b) Inv(στ) ≡ Inv(σ) + Inv(τ) (mod 2).

It follows from these that multiplying a permutation with a


transposition always changes its parity.

Péter Gács (Boston University) CS 530 Spring 09 21 / 165


Linear algebra Determinant

Definition
Let A = (a_ij) be an n × n matrix. Then

det(A) = ∑_σ (−1)^{Inv(σ)} a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)}.    (1)

Geometrical interpretation: the absolute value of the determinant of a matrix A over R with column vectors a_1, . . . , a_n is the volume of the parallelepiped spanned by these vectors in n-space.
Recursive formula: Let A_ij be the submatrix (minor) obtained by deleting the ith row and jth column. Then

det(A) = ∑_j (−1)^{i+j} a_{ij} det(A_ij).

Computing det(A) using this formula is just as inefficient as


using the original definition (1).
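
To see concretely why the recursive formula is no better than (1), here is a small Python sketch (my own illustration, not from the lecture) that expands along the first row; it performs on the order of n! multiplications.

# Determinant by cofactor expansion along row 0 (illustration only: ~n! steps).
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # A_{0j}: delete row 0 and column j
        minor = [row[:j] + row[j+1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2, 1, 0], [1, 3, 1], [0, 1, 2]]))   # prints 8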

Péter Gács (Boston University) CS 530 Spring 09 22 / 165


Linear algebra Determinant

Properties

det A = det(A T ).
det(v1 , v2 , . . . , v n ) is multilinear, that is linear in each
argument separately. For example, in the first argument:

det(α u + βv, v2 , . . . , v n ) = α det(u, v2 , . . . , v n ) + β det(v, v2 , . . . , v n ).

Hence det(0, v2 , . . . , v n ) = 0.
Antisymmetric: changes sign at the swapping of any two
arguments. For example for the first two arguments:

det(v2 , v1 , . . . , v n ) = − det(v1 , v2 , . . . , v n ).

Hence det(u, u, v2 , . . . , v n ) = 0.

Péter Gács (Boston University) CS 530 Spring 09 23 / 165


Linear algebra Determinant

It follows that any multiple of one row (or column) can be added to
another without changing the determinant. From this it follows:

Theorem
A square matrix is singular iff its determinant is 0.

The following is also known.

Theorem
det(AB) = det(A) det(B).

Péter Gács (Boston University) CS 530 Spring 09 24 / 165


Linear algebra Positive definite matrices

Positive definite matrices

An n × n matrix A = (a i j ) is symmetric if a i j = a ji (that is,


A = A T ). To each symmetric matrix, we associate a function
Rn → R called a quadratic form and defined by

x ↦ xᵀAx = ∑_{ij} a_ij x_i x_j.

The matrix A is positive definite if xᵀAx ≥ 0 for all x and


equality holds only with x = 0.

Péter Gács (Boston University) CS 530 Spring 09 25 / 165


Linear algebra Positive definite matrices

For example, if B is a nonsingular matrix then A = B T B is always


positive definite. Indeed,

xT B T Bx = (Bx)T (Bx),

the squared length of the vector Bx, and since B is nonsingular,


this is 0 only if x is 0.

Theorem
A is positive definite iff A = B T B for some nonsingular B.

Péter Gács (Boston University) CS 530 Spring 09 26 / 165


Divide and conquer Polynomial multiplication

Divide and conquer


Polynomial multiplication

We will illustrate here the algebraic divide-and-conquer method.


The problem is similar for integers, but is slightly simpler for
polynomials.

f = f(x) = ∑_{i=0}^{n−1} a_i x^i,
g = g(x) = ∑_{i=0}^{n−1} b_i x^i,
f(x)g(x) = h(x) = ∑_{k=0}^{2n−2} c_k x^k,
where c_k = a_0 b_k + a_1 b_{k−1} + · · · + a_k b_0.

Péter Gács (Boston University) CS 530 Spring 09 27 / 165


Divide and conquer Polynomial multiplication

Let M(n) be the minimal number of multiplications of constants


needed to compute the product of two polynomials of length n.
The school method shows

M(n) ≤ n².

Can we do better?

Péter Gács (Boston University) CS 530 Spring 09 28 / 165


Divide and conquer Polynomial multiplication

Divide and conquer

For simplicity, assume n is a power of 2 (otherwise, we pick n′ > n
that is a power of 2). Let m = n/2, then

f (x) = a 0 + · · · + a m−1 x m−1 + x m (a m + · · · + a 2m−1 x m−1 )


= f 0 (x) + x m f 1 (x).

Similarly for g(x). So,

f g = f 0 g 0 + x m ( f 0 g 1 + f 1 g 0 ) + x2 m f 1 g 1 .

In order to compute f g, we need to compute

f0 g0, f0 g1 + f1 g0, f1 g1.

Péter Gács (Boston University) CS 530 Spring 09 29 / 165


Divide and conquer Polynomial multiplication

How many multiplications does this need? If we compute f i g j


separately for i, j = 0, 1 this would just give the recursion

M(2m) ≤ 4M(m)

which suggests that we really need n2 multiplications.

Péter Gács (Boston University) CS 530 Spring 09 30 / 165


Divide and conquer Polynomial multiplication

Trick that saves us a (polynomial) multiplication:

f_0 g_1 + f_1 g_0 = (f_0 + f_1)(g_0 + g_1) − f_0 g_0 − f_1 g_1.    (2)

We found M(2m) ≤ 3M(m). This trick saves us a lot more when we apply it recursively:

M(2^k) ≤ 3^k M(1) = 3^k.

So, if n = 2^k, then k = log n,

M(n) ≤ 3^{log n} = 2^{log n · log 3} = n^{log 3}.

log 4 = 2, so log 3 < 2, so n^{log 3} is a smaller power of n than n².


(It is actually possible to do much better than this.)
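
As a concrete sketch of this recursion (my own code, not part of the notes), the following Python function multiplies two coefficient lists of equal power-of-two length using the three recursive products and identity (2).

# Divide-and-conquer polynomial multiplication (Karatsuba-style sketch).
# f, g: coefficient lists of the same length n, n a power of 2.
def poly_add(f, g):
    return [a + b for a, b in zip(f, g)]

def poly_mul(f, g):
    n = len(f)
    if n == 1:
        return [f[0] * g[0]]
    m = n // 2
    f0, f1 = f[:m], f[m:]
    g0, g1 = g[:m], g[m:]
    p0 = poly_mul(f0, g0)                               # f0 g0
    p2 = poly_mul(f1, g1)                               # f1 g1
    p1 = poly_mul(poly_add(f0, f1), poly_add(g0, g1))   # (f0+f1)(g0+g1)
    mid = [a - b - c for a, b, c in zip(p1, p0, p2)]    # f0 g1 + f1 g0, by (2)
    h = [0] * (2 * n - 1)                               # h = p0 + x^m mid + x^{2m} p2
    for i, coeff in enumerate(p0):
        h[i] += coeff
    for i, coeff in enumerate(mid):
        h[i + m] += coeff
    for i, coeff in enumerate(p2):
        h[i + 2 * m] += coeff
    return h

print(poly_mul([1, 2], [3, 4]))   # (1 + 2x)(3 + 4x): prints [3, 10, 8]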

Péter Gács (Boston University) CS 530 Spring 09 31 / 165


Divide and conquer Polynomial multiplication

Counting also additions


Let L(n) be the complexity of multiplication when additions of
constants are also counted. The addition of two polynomials of
length n takes at most n additions of constants. Taking this into
account, the above trick gives the following new estimate:

L(2m) ≤ 3L(m) + 10m.

Let us show from here, by induction, that L(n) = O(n^{log 3}).

L(2m) ≤ 3L(m) + 10m,
L(4m) ≤ 9L(m) + 10m(2 + 3),
L(8m) ≤ 27L(m) + 10m(2² + 2·3 + 3²),
L(2^k) ≤ 3^k L(1) + 10(2^{k−1} + 2^{k−2}·3 + · · · + 3^{k−1})
       < 3^k + 10·3^{k−1}(1 + 2/3 + (2/3)² + · · · ).

Péter Gács (Boston University) CS 530 Spring 09 32 / 165


Divide and conquer Polynomial multiplication

As we see, counting also the additions did not change the


upper bound substantially. The reason is that even when
counting only multiplications, we already had to deal with the
most important issue: the number of recursive calls when
doing divide-and-conquer.
The best-known algorithm for multiplying polynomials or
integers requires of the order of n log n log log n operations.
(Surprisingly, it uses a kind of “Fourier transform”.)

Péter Gács (Boston University) CS 530 Spring 09 33 / 165


Divide and conquer Faster matrix multiplication

Faster matrix multiplication

For matrix multiplication, there is a trick similar to the one seen


for polynomial multiplication. Let

A = ( a, b; c, d ),   B = ( e, f; g, h ),
C = AB = ( r, s; t, u ).

Then r = ae + b g, s = a f + bh, t = ce + d g, u = c f + dh. The naive


way to compute these requires 8 multiplications. We will find a
way to compute them using only 7.

Péter Gács (Boston University) CS 530 Spring 09 34 / 165


Divide and conquer Faster matrix multiplication

Let

P1 = a( f − h),
P2 = (a + b)h,
P3 = (c + d)e,
P4 = d(g − e),
P5 = (a + d)(e + h),
P6 = (b − d)(g + h),
P7 = (a − c)(e + f ).

Then
r = −P2 + P4 + P5 + P6 ,
s = P1 + P2 ,
(3)
t = P3 + P4 ,
u = P1 − P3 + P5 − P7 .

Péter Gács (Boston University) CS 530 Spring 09 35 / 165


Divide and conquer Faster matrix multiplication

In all products P i , the elements of A are on the left, and the


elements of B on the right. Therefore the calculations leading
to (3) do not use commutativity, so they are also valid when
a, b, · · · , g, h are matrices. If M(n) is the number of multiplications
needed to multiply n × n matrices, then this leads (for n a power
of 2) to

M(n) ≤ n^{log 7}.

Taking also additions into account:

T(2n) ≤ 7T(n) + O(n²).

Read Section 4 of CLRS to recall how to prove from here


T(n) = O(n^{log 7}).
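
For concreteness, here is a short Python sketch (mine, not from CLRS) that applies the seven products recursively to matrices whose size is a power of 2, using plain nested lists.

# Strassen's multiplication for n x n matrices, n a power of 2 (sketch).
def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    m = n // 2
    split = lambda M: ([r[:m] for r in M[:m]], [r[m:] for r in M[:m]],
                       [r[:m] for r in M[m:]], [r[m:] for r in M[m:]])
    a, b, c, d = split(A)
    e, f, g, h = split(B)
    P1 = strassen(a, sub(f, h))
    P2 = strassen(add(a, b), h)
    P3 = strassen(add(c, d), e)
    P4 = strassen(d, sub(g, e))
    P5 = strassen(add(a, d), add(e, h))
    P6 = strassen(sub(b, d), add(g, h))
    P7 = strassen(sub(a, c), add(e, f))
    r = add(sub(add(P5, P4), P2), P6)   # r = -P2 + P4 + P5 + P6
    s = add(P1, P2)                     # s = P1 + P2
    t = add(P3, P4)                     # t = P3 + P4
    u = sub(sub(add(P1, P5), P3), P7)   # u = P1 - P3 + P5 - P7
    return [rr + rs for rr, rs in zip(r, s)] + [rt + ru for rt, ru in zip(t, u)]

print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]]))   # [[19, 22], [43, 50]]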

Péter Gács (Boston University) CS 530 Spring 09 36 / 165


Divide and conquer Faster matrix multiplication

The currently best known matrix multiplication algorithm


has an exponent substantially lower than log 7, but still
greater than 2.
There is a great difference between the applicability of fast
polynomial multiplication and fast matrix multiplication.
The former is practical and is used much, in computing products
of large polynomials and numbers (for example in
cryptography).
On the other hand, fast matrix multiplication is an (important)
theoretical result, but with serious obstacles to its practical
application. First, there are problems with its numerical
stability, due to all the subtractions, whose effect may magnify
round-off errors. Second, and more importantly, large matrices
in practice are frequently sparse, with much fewer than n2
elements. Strassen’s algorithm does not exploit this.

Péter Gács (Boston University) CS 530 Spring 09 37 / 165


Linear equations Elimination

Linear equations
Informal treatment first

a_11 x_1 + · · · + a_1n x_n = b_1,
⋮
a_m1 x_1 + · · · + a_mn x_n = b_m.

How many solutions? Undetermined and overdetermined


systems.

Péter Gács (Boston University) CS 530 Spring 09 38 / 165


Linear equations Elimination

For simplicity, let us count just multiplications again.


Jordan elimination: eliminating first x1 , then x2 , and so on.

n · (n + (n − 1) + · · · ) ≈ n³/2.

Gauss elimination: eliminating xk only from equations


k + 1, k + 2, . . . . Then solving a triangular set of equations.
Elimination:

n(n − 1) + (n − 1)(n − 2) + · · · ≈ n³/3.

Triangular set of equations:

1 + 2 + · · · + (n − 1) ≈ n²/2.

Péter Gács (Boston University) CS 530 Spring 09 39 / 165


Linear equations Elimination

Sparsity and fill-in

Example (Chvatal)
A sparse system that fills in.

x1 + x2 + x3 + x4 + x5 + x6 = 4,
x1 + 6x2 = 5,
x1 + 6x3 = 5,
x1 + 6x4 = 5,
x1 + 6x5 = 5,
x1 + 6x6 = 5.

Eliminating x1 fills in everything. There are some guidelines that


direct us to eliminate x2 first, which leads to no such fill-in.

Péter Gács (Boston University) CS 530 Spring 09 40 / 165


Linear equations Elimination

Outcomes of Gaussian elimination

(Possibly changing the order of equations and variables.)


Contradiction: no solution.
Triangular system with nonzero diagonal: 1 solution.
Triangular system with k lines: the solution contains n − k
parameters xk+1 , . . . , xn .

a_11 x_1 + · · · + a_{1,k+1} x_{k+1} + · · · + a_1n x_n = b_1,
a_22 x_2 + · · · + a_{2,k+1} x_{k+1} + · · · + a_2n x_n = b_2,
⋮
a_kk x_k + · · · + a_{k,k+1} x_{k+1} + · · · + a_kn x_n = b_k,

where a_11, . . . , a_kk ≠ 0. Then dim Ker(A) = n − k, dim Im(A) = k.

The operations performed do not change row and column rank,
so we find (row rank) = (column rank) = k.

Péter Gács (Boston University) CS 530 Spring 09 41 / 165


Linear equations Elimination

Duality

The original system has no solution if and only if a certain other


system has a solution. This other system is the one we obtain
trying to form a contradiction 0 = 1 from the original one, via a
linear combination with coefficients y1 , . . . , ym :

a_11 y_1 + · · · + a_m1 y_m = 0,
⋮
a_1n y_1 + · · · + a_mn y_m = 0,
b_1 y_1 + · · · + b_m y_m = 1.

Gives an easy way to prove that the system is unsolvable.

Péter Gács (Boston University) CS 530 Spring 09 42 / 165


Linear equations LUP decomposition

LUP decomposition

Permutation matrix. P A interchanges the rows, AP the columns.

Example
The following matrix represents the permutation (2, 3, 1) since its
rows are obtained by this permutation from the unit matrix:

0 0 1
1 0 0
0 1 0

Péter Gács (Boston University) CS 530 Spring 09 43 / 165


Linear equations LUP decomposition

LUP decomposition of matrix A:

P A = LU

Using for equation solution:

P b = P Ax = LU x.

From here, forward and back substitution.

Péter Gács (Boston University) CS 530 Spring 09 44 / 165


Linear equations LUP decomposition

Computing the LU decomposition


Zeroing out one column

The following operation adds λ_i times row 2 to rows 3, 4, . . . of A:

        ( 1   0    0  0  0  . . .  0 )
        ( 0   1    0  0  0  . . .  0 )
L_2 A = ( 0   λ_3  1  0  0  . . .  0 ) A.
        ( 0   λ_4  0  1  0  . . .  0 )
        ( ...                        )

           ( 1   0     0  0  0  . . .  0 )
           ( 0   1     0  0  0  . . .  0 )
L_2^{−1} = ( 0   −λ_3  1  0  0  . . .  0 ).
           ( 0   −λ_4  0  1  0  . . .  0 )
           ( ...                         )

Similarly, a matrix L1 might add multiples of row 1 to rows


2, 3, . . . .
Péter Gács (Boston University) CS 530 Spring 09 45 / 165
Linear equations LUP decomposition

Repeating:

B_3 = L_2^{−1} L_1^{−1} A,

A = L_1 L_2 B_3 =

( 1    0    0  0  . . .  0 )   ( a_11  a_12     a_13     . . . )
( λ_2  1    0  0  . . .  0 )   ( 0     a(1)_22  a(1)_23  . . . )
( λ_3  µ_3  1  0  . . .  0 ) · ( 0     0        a(2)_33  . . . )
( λ_4  µ_4  0  1  . . .  0 )   ( 0     0        a(2)_43  . . . )
( ...                      )   ( ...                           )

Péter Gács (Boston University) CS 530 Spring 09 46 / 165


Linear equations LUP decomposition

Example: If

A = ( a_11, wᵀ; v, A′ )

then setting

L_1 = ( 1, 0; v/a_11, I_{n−1} ),   L_1^{−1} = ( 1, 0; −v/a_11, I_{n−1} ),

we have L_1^{−1} A = B_2, A = L_1 B_2 where

B_2 = ( a_11, wᵀ; 0, A′ − v wᵀ/a_11 ).

The matrix A_2 = A′ − v wᵀ/a_11 is the Schur complement of A.
If A_2 is singular then so is A (look at row rank).

Péter Gács (Boston University) CS 530 Spring 09 47 / 165


Linear equations LUP decomposition

Positive definite matrix


If A is symmetric: A = ( a_11, vᵀ; v, A′ ), then with U_1 = L_1ᵀ we have

L_1^{−1} A U_1^{−1} = ( a_11, 0; 0, A_2 )

with Schur complement A_2 = A′ − v vᵀ/a_11.

Proposition
If A is positive definite then A_2 is also.

Proof. We have yᵀ A_2 y = xᵀ A x, with

x = U_1^{−1} ( 0; I_{n−1} ) y =: M_1 y.

If y shows A_2 not positive definite by yᵀ A_2 y ≤ 0 then x = M_1 y
shows A not positive definite.
Péter Gács (Boston University) CS 530 Spring 09 48 / 165
Linear equations LUP decomposition

Passing through a permutation

Suppose that having A = L_1 L_2 B_3 = L B_3, we want to permute the rows 3, 4, . . . using a permutation π before applying some L_3^{−1} to L^{−1} A (say because position (3, 3) in this matrix is 0). Let P be the permutation matrix belonging to π:

P L^{−1} A = L_3 B_4,
P A = P L P^{−1} L_3 B_4 = L̂ L_3 B_4   where

L̂ = P L P^{−1} =

( 1          0          0  0  . . .  0 )
( λ_2        1          0  0  . . .  0 )
( λ_{π(3)}   µ_{π(3)}   1  0  . . .  0 ),
( λ_{π(4)}   µ_{π(4)}   0  1  . . .  0 )
( ...                                  )

assuming L1 was formed with λ2 , λ3 , . . . , and L2 with µ3 , µ4 , . . . .

Péter Gács (Boston University) CS 530 Spring 09 49 / 165


Linear equations LUP decomposition

Organizing the computation: In the kth step, we have a


representation

P A = LB k+1 ,

where the first k columns of B k+1 are 0 below the diagonal.


During the computation, only one permutation π needs to be
maintained, in an array.
Pivoting (see later).
Positive definite matrices do not require it (see later).
Putting it all in a single matrix: Figure 28.1 of CLRS.

Péter Gács (Boston University) CS 530 Spring 09 50 / 165


Linear equations LUP decomposition

LUP decomposition, in a single matrix

for i = 1 to n do π[i] ← i
for k = 1 to n do
    p ← 0
    for i = k to n do
        if |a_ik| > p then
            p ← |a_ik|
            k′ ← i
    if p = 0 then error “singular matrix”
    exchange π[k] ↔ π[k′]
    for i = 1 to n do exchange a_ki ↔ a_k′i
    for i = k + 1 to n do
        a_ik ← a_ik/a_kk
        for j = k + 1 to n do a_ij ← a_ij − a_ik a_kj
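
A runnable Python translation of this pseudocode (my own; indices shifted to start from 0). It overwrites A so that the strict lower triangle holds the multipliers of L and the upper triangle holds U, and it returns the permutation as an array.

# In-place LUP decomposition with partial pivoting, following the pseudocode above.
def lup_decompose(A):
    n = len(A)
    pi = list(range(n))
    for k in range(n):
        # choose the pivot: largest |a_ik| in column k
        p, kp = 0.0, None
        for i in range(k, n):
            if abs(A[i][k]) > p:
                p, kp = abs(A[i][k]), i
        if p == 0:
            raise ValueError("singular matrix")
        pi[k], pi[kp] = pi[kp], pi[k]
        A[k], A[kp] = A[kp], A[k]          # exchange rows k and k'
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]             # multiplier, stored in place of L
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return pi

A = [[2.0, 0.0, 2.0], [3.0, 3.0, 4.0], [5.0, 5.0, 4.0]]
print(lup_decompose(A), A)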

Péter Gács (Boston University) CS 530 Spring 09 51 / 165


Linear equations LUP decomposition

What if there is no pivot in some column?


General form:

P AQ = LU, P A = LUQ −1 .

Using for equation solution:

P b = P Ax = LUQ −1 x.

Find P b by permutation via P, then Q −1 x by forward and


backward substitution, then x by permutation via Q.

Péter Gács (Boston University) CS 530 Spring 09 52 / 165


Linear equations LUP decomposition

Proposition
For an n × n matrix A, the row rank is the same as the column
rank.

Proof. Let P AQ = LU. If U has only r rows then L needs to


have only r columns, and vice versa, so L: n × r and U: r × n.
Let us see that r is the row rank of A. Indeed, A has a column
rank r since U maps onto Rr and the image of L is also
r-dimensional. By transposition, the same is true for A T = U T L T ,
and hence the row rank is the same as the column rank.

Péter Gács (Boston University) CS 530 Spring 09 53 / 165


Issues of rounding Determinant exactly

Computing the determinant exactly

Computing the determinant of an integer matrix is a task that


can stand for many similar ones, like the LU decomposition,
inversion or equation solution. The following considerations
apply to all.
How large is the determinant? Interpretation as volume: if
matrix A has rows a_1ᵀ, . . . , a_nᵀ then

det A ≤ |a_1| · · · |a_n| = ∏_{i=1}^{n} ( ∑_{j=1}^{n} a_ij² )^{1/2}.

This is known as Hadamard’s inequality.
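
A quick numerical illustration (my own, using numpy) that compares |det A| with the product of the row lengths:

# Hadamard's inequality on a random integer matrix.
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-9, 10, size=(5, 5)).astype(float)
row_lengths = np.sqrt((A ** 2).sum(axis=1))
print(abs(np.linalg.det(A)), "<=", row_lengths.prod())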

Péter Gács (Boston University) CS 530 Spring 09 54 / 165


Issues of rounding Determinant exactly

Working with exact fractions

A single addition or subtraction may double the number of


digits needed, even if the size of the numbers does not grow.

a/b + c/d = (ad + bc)/(bd).
If we are lucky, we can simplify the fraction.
It turns out that with Gaussian elimination, we will be lucky
enough.
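
A sketch of elimination over exact rationals (my own illustration): Python's Fraction type reduces every intermediate result to lowest terms with the Euclidean algorithm, which is exactly the cancellation the next theorem is about.

# Gaussian elimination without pivoting, with exact rational arithmetic.
from fractions import Fraction

def eliminate(rows):
    A = [[Fraction(x) for x in row] for row in rows]
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]        # assumes a_kk != 0 (no pivoting)
            for j in range(k, len(A[0])):
                A[i][j] -= factor * A[k][j]   # Fraction keeps lowest terms
    return A

for row in eliminate([[2, 1, 1], [4, 3, 3], [8, 7, 9]]):
    print(row)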

Péter Gács (Boston University) CS 530 Spring 09 55 / 165


Issues of rounding Determinant exactly

Theorem
Assume that Gaussian elimination on an integer matrix A
succeeds without pivoting. Every intermediate term in the
Gaussian elimination is a fraction whose numerator and
denominator are some subdeterminants of the original matrix.

(By the Hadamard inequality, these are not too large.)


More precisely, let
A^{(k)} = the matrix after k stages of the elimination,
D^{(k)} = the minor determined by the first k rows and columns of A,
D^{(k)}_ij = for k + 1 ≤ i, j ≤ n, the minor determined by the first k rows and the ith row, and the first k columns and the jth column.

Then for i, j > k we have a^{(k)}_ij = det D^{(k)}_ij / det D^{(k)}.

Péter Gács (Boston University) CS 530 Spring 09 56 / 165


Issues of rounding Determinant exactly

Proof. In the process of Gaussian elimination, the determinants


of the matrices D (k) and D (ikj ) do not change: they are the same for
A (k) as for A. But in A (k) , both matrices are upper triangular.
Denoting the elements on their main diagonal by d_1, . . . , d_k, a^{(k)}_ij,
we have

det D^{(k)} = d_1 · · · d_k,
det D^{(k)}_ij = d_1 · · · d_k · a^{(k)}_ij.

Divide these two equations by each other.

Péter Gács (Boston University) CS 530 Spring 09 57 / 165


Issues of rounding Determinant exactly

The theorem shows that if we always cancel (using the


Euclidean algorithm) our algorithm is polynomial.
There is a cheaper way than doing complete cancellation (see
exact-Gauss.pdf).
There is also a way to avoid working with fractions altogether:
modular computation. See for example the Lovász lecture
notes.

Péter Gács (Boston University) CS 530 Spring 09 58 / 165


Issues of rounding Pivoting and scaling

When rounding is unavoidable (reading)

Floating point: 0.235 · 10^5 (3 digits precision)


Complete pivoting: experts generally do not advise it.
Considerations of fill-in are typically given preference over
considerations of round-off errors, since if the matrix is huge and
sparse, we may not be able to carry out the computations at all if
there is too much fill-in.

Péter Gács (Boston University) CS 530 Spring 09 59 / 165


Issues of rounding Pivoting and scaling

Example

0.0001x + y=1
(4)
0.5x + 0.5y = 1

Eliminate x : −4, 999.5y = −4999.


Rounding to 3 significant digits:

−5, 000y = −5, 000


y= 1
x= 0

True solution: y = 0.999899, rounds to 1; x = 1.0001, rounds to 1.


We get the true solution by choosing the second equation for
pivoting, rather than the first equation.

Péter Gács (Boston University) CS 530 Spring 09 60 / 165


Issues of rounding Pivoting and scaling

Forward error analysis: comparing the solution with the true


solution.
We can make our solutions look better introducing backward
error analysis: showing that our solution solves precisely a
system that differs only a little from the original.

Péter Gács (Boston University) CS 530 Spring 09 61 / 165


Issues of rounding Pivoting and scaling

Frequently, partial pivoting (choosing the pivot element just in


the k-th column) is sufficient to find a good solution in terms of
forward error analysis. However:

Example

x + 10, 000y = 10, 000


(5)
0.5x + 0.5y = 1

Choosing the first equation for pivoting seems OK. Eliminate x


from the second eq:

−4,999.5y = −4,999


y= 1 after rounding
x= 0

Péter Gács (Boston University) CS 530 Spring 09 62 / 165


Issues of rounding Pivoting and scaling

This is wrong even if we do backward error analysis: every


system

a 11 x + a 12 y = 10, 000
a 21 x + a 22 y = 1

satisfied by x = 0, y = 1 must have a 22 = 1.

Péter Gács (Boston University) CS 530 Spring 09 63 / 165


Issues of rounding Pivoting and scaling

The problem is that our system is not well scaled. Row scaling
and column scaling:
∑_j r_i a_ij s_j x_j = r_i b_i

where r i , s j are powers of 10. Equilibration: we can always


achieve

0.1 < max_j | r_i a_ij s_j | ≤ 1,
0.1 < max_i | r_i a_ij s_j | ≤ 1.

Example
In (5), let r_1 = 10^{−4}, all other coeffs are 1: We get back (4), which
we solve by partial pivoting as before.

Péter Gács (Boston University) CS 530 Spring 09 64 / 165


Issues of rounding Pivoting and scaling

Sometimes, like here, there are several ways to scale, and not all
are good.

Example
Choose s_2 = 10^{−4}, all other coeffs 1:

x + y′ = 10,000
0.5x + 0.00005y′ = 1

(We could have gotten this system to start with. . . .) Eliminate x from the second equation:

−0.49995y′ = −4,999
y′ = 10,000 after rounding
x = 0

so, we again got the bad solution.

Fortunately, such pathological systems are rare in practice.


Péter Gács (Boston University) CS 530 Spring 09 65 / 165
Inverting matrices

Inverting matrices

Computing matrix inverse from an LUP decomposition:


solving equations

A X i = ei, i = 1, . . . , n.

Inverting a diagonal matrix: diag(d_1, . . . , d_n)^{−1} = diag(d_1^{−1}, . . . , d_n^{−1}).
Inverting a lower triangular block matrix L = ( B, 0; C, D ) = ( B, 0; 0, D )( I, 0; D^{−1}C, I ): we have

L^{−1} = ( I, 0; −D^{−1}C, I )( B^{−1}, 0; 0, D^{−1} ) = ( B^{−1}, 0; −D^{−1}CB^{−1}, D^{−1} ).

For an upper triangular matrix U = ( B, C; 0, D ) we get similarly

U^{−1} = ( B^{−1}, −B^{−1}CD^{−1}; 0, D^{−1} ).

Péter Gács (Boston University) CS 530 Spring 09 66 / 165


Inverting matrices

Theorem
Multiplication is no harder than inversion.

Proof. Let

D = L_1 L_2 = ( I, 0, 0; A, I, 0; 0, B, I ) = ( I, 0, 0; A, I, 0; 0, 0, I )( I, 0, 0; 0, I, 0; 0, B, I ).

Its inverse is

D^{−1} = L_2^{−1} L_1^{−1} = ( I, 0, 0; 0, I, 0; 0, −B, I )( I, 0, 0; −A, I, 0; 0, 0, I ) = ( I, 0, 0; −A, I, 0; BA, −B, I ).

Péter Gács (Boston University) CS 530 Spring 09 67 / 165


Inverting matrices

Theorem
Inversion is no harder than multiplication.

Let n be a power of 2. Assume first that A is symmetric, positive definite, A = ( B, Cᵀ; C, D ). Trying a block version of the LU decomposition:

A = ( I, 0; CB^{−1}, I )( B, Cᵀ; 0, D − CB^{−1}Cᵀ ).

Define Q = B−1 C T , and define the Schur complement as


S = D − CQ. We will see later that it is positive definite, so it has
an inverse.

Péter Gács (Boston University) CS 530 Spring 09 68 / 165


Inverting matrices

We have A = ( I, 0; Qᵀ, I )( B, Cᵀ; 0, S ). By the inversion of triangular matrices learned before:

( B, Cᵀ; 0, S )^{−1} = ( B^{−1}, −B^{−1}CᵀS^{−1}; 0, S^{−1} ) = ( B^{−1}, −QS^{−1}; 0, S^{−1} ),

A^{−1} = ( B^{−1}, −QS^{−1}; 0, S^{−1} )( I, 0; −Qᵀ, I ) = ( B^{−1} + QS^{−1}Qᵀ, −QS^{−1}; −S^{−1}Qᵀ, S^{−1} ).

Péter Gács (Boston University) CS 530 Spring 09 69 / 165


Inverting matrices

4 multiplications of size n/2 matrices:

Q = B^{−1}Cᵀ,   QᵀCᵀ,   S^{−1}Qᵀ,   Q(S^{−1}Qᵀ),

further 2 inversions and c·n² additions:

I(2n) ≤ 2I(n) + 4M(n) + c_1 n² = 2I(n) + F(n),
I(4n) ≤ 4I(n) + F(2n) + 2F(n),
I(2^k) ≤ 2^k I(1) + F(2^{k−1}) + 2F(2^{k−2}) + · · · + 2^{k−1}F(1).

Péter Gács (Boston University) CS 530 Spring 09 70 / 165


Inverting matrices

Assume F(n) ≤ c_2 n^b with b > 1. Then

F(2^{k−i}) 2^i ≤ c_2 2^{bk−bi+i} = c_2 2^{bk} 2^{−(b−1)i}.

So,

I(2^k) ≤ 2^k I(1) + c_2 2^{b(k−1)} (1 + 2^{−(b−1)} + 2^{−2(b−1)} + · · · )
       < 2^k + c_2 2^{b(k−1)} / (1 − 2^{−(b−1)}).

Inverting an arbitrary matrix: A^{−1} = (AᵀA)^{−1}Aᵀ.

Péter Gács (Boston University) CS 530 Spring 09 71 / 165


Inverting matrices

Proposition
The Schur complement is positive definite.

Proof.

(yᵀ, zᵀ) ( A, Bᵀ; B, C ) ( y; z ) = yᵀAy + yᵀBᵀz + zᵀBy + zᵀCz
                                  = (y + A^{−1}Bᵀz)ᵀ A (y + A^{−1}Bᵀz) + zᵀ(C − BA^{−1}Bᵀ)z.

For any z you can choose y to make the first term 0.

Péter Gács (Boston University) CS 530 Spring 09 72 / 165


Least squares approximation

Least squares approximation (reading)

Data: (x1 , y1 ), . . . , (xm , ym ).


Fitting F(x) = c 1 f 1 (x) + · · · + c n f n (x).
It is reasonable to choose n much smaller than m (noise).

A = ( f_1(x_1)  f_2(x_1)  . . .  f_n(x_1)
      f_1(x_2)  f_2(x_2)  . . .  f_n(x_2)
      ⋮         ⋮                ⋮
      f_1(x_m)  f_2(x_m)  . . .  f_n(x_m) ).

Equation Ac = y, generally unsolvable in the variable c. We want


to minimize the error η = Ac − y. Look at the subspace V of
vectors of the form Ac. In V , we want to find c for which Ac is
closest to y.

Péter Gács (Boston University) CS 530 Spring 09 73 / 165


Least squares approximation

Then Ac is the projection of y to V, with the property that


Ac − y is orthogonal to every vector of the form Ax:

(Ac − y)ᵀAx = 0 for all x, so
(Ac − y)ᵀA = 0,
Aᵀ(Ac − y) = 0.

The equation A T Ac = A T y is called the normal equation,


solvable by LU decomposition.
Explicit solution: Assume that A has full column rank, then A T A
is positive definite.
c = (A T A)−1 A T y. Here (A T A)−1 A T is called the pseudo-inverse
of A.
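
As a small illustration (mine, not from the notes), fitting F(x) = c_1 + c_2 x to five data points through the normal equation; numerically one would rather factor AᵀA (or use QR) than form an explicit inverse.

# Least squares fit via the normal equation A^T A c = A^T y.
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
A = np.column_stack([np.ones_like(x), x])   # basis f_1(x) = 1, f_2(x) = x
c = np.linalg.solve(A.T @ A, A.T @ y)       # solve the normal equation
print(c)                                    # approximate intercept and slope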

Péter Gács (Boston University) CS 530 Spring 09 74 / 165


Linear Programming Problem definition

Linear programming

How about solving a system of linear inequalities?

Ax ≤ b.

We will try to solve a seemingly more general problem:

maximize   cᵀx
subject to Ax ≤ b.

This optimization problem is called a linear program. (Not


program in the computer programming sense.)

Péter Gács (Boston University) CS 530 Spring 09 75 / 165


Linear Programming Problem definition

Example
Three voting districts: urban, suburban, rural.
Votes needed: 50,000, 100,000, 25,000.
Issues: build roads, gun control, farm subsidies, gasoline tax.
Votes gained, if you spend $ 1000 on advertising on any of these
issues:

adv. spent policy urban suburban rural


x1 build roads −2 5 3
x2 gun control 8 2 −5
x3 farm subsidies 0 0 10
x4 gasoline tax 10 0 −2
votes needed 50, 000 100, 000 25, 000

Minimize the advertising budget (x1 + · · · + x4 ) · 1000.

Péter Gács (Boston University) CS 530 Spring 09 76 / 165


Linear Programming Problem definition

The linear programming problem:

minimize   x_1 + x_2 + x_3 + x_4
subject to −2x_1 + 8x_2 + 10x_4 ≥ 50,000
           5x_1 + 2x_2 ≥ 100,000
           3x_1 − 5x_2 + 10x_3 − 2x_4 ≥ 25,000

Implicit inequalities: x_i ≥ 0.
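
For concreteness, this is how the problem could be handed to an off-the-shelf solver (a sketch using scipy's linprog, which is not discussed in the notes); linprog expects ≤ constraints, so the ≥ rows are negated, and its default bounds already impose x_i ≥ 0.

# The advertising LP: minimize x1+x2+x3+x4 subject to the three >= constraints.
from scipy.optimize import linprog

c = [1, 1, 1, 1]
A_ub = [[ 2, -8,   0, -10],    # -( -2x1 + 8x2        + 10x4 ) <= -50,000
        [-5, -2,   0,   0],    # -(  5x1 + 2x2               ) <= -100,000
        [-3,  5, -10,   2]]    # -(  3x1 - 5x2 + 10x3 -  2x4 ) <= -25,000
b_ub = [-50_000, -100_000, -25_000]

res = linprog(c, A_ub=A_ub, b_ub=b_ub)
print(res.x, res.fun)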

Péter Gács (Boston University) CS 530 Spring 09 77 / 165


Linear Programming Solution idea

Two-dimensional example

maximize   x_1 + x_2
subject to 4x_1 − x_2 ≤ 8
           2x_1 + x_2 ≤ 10
           5x_1 − 2x_2 ≥ −2
           x_1, x_2 ≥ 0

Graphical representation, see book.


Convex polyhedron, extremal points.
The simplex algorithm: moving from an extremal point to a
nearby one (changing only two inequalities) in such a way that
the objective function keeps increasing.

Péter Gács (Boston University) CS 530 Spring 09 78 / 165


Linear Programming Solution idea

Worry: there may be too many extremal points. For example, the set of 2n inequalities

0 ≤ x_i ≤ 1,   i = 1, . . . , n

has 2^n extremal points.

Péter Gács (Boston University) CS 530 Spring 09 79 / 165


Linear Programming Standard and slack form

Standard and slack form

Standard form

maximize   cᵀx
subject to Ax ≤ b
           x ≥ 0

Objective function, constraints, nonnegativity constraints,


feasible solution, optimal solution, optimal objective value.
Unbounded: if the optimal objective value is infinite.
Converting into standard form:

x_j = x′_j − x″_j,   subject to x′_j, x″_j ≥ 0.

Handling equality constraints.

Péter Gács (Boston University) CS 530 Spring 09 80 / 165


Linear Programming Standard and slack form

Slack form
In the slack form, the only inequality constraints are
nonnegativity constraints. For this, we introduce slack variables
on the left:
x_{n+i} = b_i − ∑_{j=1}^{n} a_ij x_j.

In this form, they are also called basic variables. The objective
function does not depend on the basic variables. We denote its
value by z.

Péter Gács (Boston University) CS 530 Spring 09 81 / 165


Linear Programming Standard and slack form

Example for the slack form notation:

z= 2x1 − 3x2 + 3x3


x4 = 7 − x1 − x2 + x3
x5 = −7 + x1 + x2 − x3
x6 = 4 − x1 + 2x2 − 2x3

More generally: B = set of indices of basic variables, |B| = m.


N = set of indices of nonbasic variables, | N | = n,
B ∪ N = {1, . . . , m + n}. The slack form is given by (N, B, A, b, c, v):
z = v + ∑_{j∈N} c_j x_j
x_i = b_i − ∑_{j∈N} a_ij x_j   for i ∈ B.

Note that these equations are always independent.

Péter Gács (Boston University) CS 530 Spring 09 82 / 165


Linear Programming Formulating problems as linear programs

Single-source shortest paths

(Maximization is counter-intuitive, but the book is wrong.)

maximize   d[t]
subject to d[v] ≤ d[u] + w(u, v)   for each edge (u, v)
           d[s] = 0

Péter Gács (Boston University) CS 530 Spring 09 83 / 165


Linear Programming Formulating problems as linear programs

Maximum flow

Capacity c(u, v) ≥ 0.

maximize   ∑_v f(s, v)
subject to f(u, v) ≤ c(u, v)
           f(u, v) = −f(v, u)
           ∑_v f(u, v) = 0   for u ∈ V − {s, t}

The matching problem.


Given m workers and n jobs, and a graph connecting each worker
with some jobs he is capable of performing. Goal: to connect the
maximum number of workers with distinct jobs.
This can be reduced to a maximum flow problem (see homework
and book).

Péter Gács (Boston University) CS 530 Spring 09 84 / 165


Linear Programming Formulating problems as linear programs

Minimum-cost flow
Edge cost a(u, v). Send d units of flow from s to t and minimize
the total cost
∑_{u,v} a(u, v) f(u, v).

Multicommodity flow
k different commodities K i = (s i , t i , d i ), where d i is the demand.
The capacities constrain the aggregate flow. There is nothing to
optimize: just determine the feasibility.

Péter Gács (Boston University) CS 530 Spring 09 85 / 165


Linear Programming Formulating problems as linear programs

Games

A zero-sum two-person game is played between player 1 and


player 2 and defined by an m × n matrix A. We say that if player
1 chooses a pure strategy i ∈ {1, . . . , m} and player 2 chooses pure
strategy j ∈ {1, . . . , n} then there is payoff: player 2 pays amount
a i j to player 1.

Example
m = n = 2, pure strategies {1, 2} are called “attack left”, “attack
right” for player 1 and “defend left”, “defend right” for player 2.
The matrix is
A = ( −1, 1; 1, −1 ).

Péter Gács (Boston University) CS 530 Spring 09 86 / 165


Linear Programming Formulating problems as linear programs

Mixed strategy: a probability distribution over pure strategies.


p = (p_1, . . . , p_m) for player 1 and q = (q_1, . . . , q_n) for player 2.
Expected payoff: ∑_{ij} a_ij p_i q_j.
If player 1 knows the mixed strategy q of player 2, he will want to achieve

max_p ∑_i p_i ∑_j a_ij q_j = max_i ∑_j a_ij q_j

since a pure strategy always achieves the maximum. Player 2 wants to minimize this and can indeed achieve

min_q max_i ∑_j a_ij q_j.

Péter Gács (Boston University) CS 530 Spring 09 87 / 165


Linear Programming Formulating problems as linear programs

Rewritten as a linear programming problem:

minimize   t
subject to t ≥ ∑_j a_ij q_j,   i = 1, . . . , m
           q_j ≥ 0,            j = 1, . . . , n
           ∑_j q_j = 1.

Péter Gács (Boston University) CS 530 Spring 09 88 / 165


Linear Programming The simplex algorithm

The simplex algorithm


Slack form. Example:

z= 3x1 + x2 + 2x3
x4 = 30 − x1 − x2 − 3x3
x5 = 24 − 2x1 − 2x2 − 5x3
x6 = 36 − 4x1 − x2 − 2x3

A basic solution: set each nonbasic variable to 0. Since all b i


are positive, the basic solution is feasible here.
Iteration step: Increase x1 until one of the constraints
becomes tight: now, this is x6 since b i /a i1 is minimal for i = 6.
Pivot operation: exchange x6 for x1 .

x1 = 9 − x2 /4 − x3 /2 − x6 /4

Here, x1 is the entering variable, x6 the leaving variable.


If not possible, are we done? See later.
Péter Gács (Boston University) CS 530 Spring 09 89 / 165
Linear Programming The simplex algorithm

In general:

Lemma
The slack form is uniquely determined by the set of basic variables.

Proof. Simple, using the uniqueness of linear forms.

This is useful, since the matrix is therefore only needed for


deciding how to continue. We might have other ways to decide
this.
Assume that there is a basic feasible solution. See later how
to find one.

Péter Gács (Boston University) CS 530 Spring 09 90 / 165


Linear Programming The simplex algorithm

Rewrite all other equations, substituting this x1 :

z = 27 + x2 /4 + x3 /2 − 3x6 /4
x1 = 9 − x2 /4 − x3 /2 − x6 /4
x4 = 21 − 3x2 /4 − 5x3 /2 + x6 /4
x5 = 6 − 3x2 /2 − 4x3 + x6 /2

Formal pivot algorithm: no surprise.
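
The pivot can be written out directly on the slack-form data (N, B, a, b, c, v); the sketch below (my own) reproduces the substitution above for entering variable x1 and leaving variable x6.

# One pivot step on the slack form: for i in B, x_i = b[i] - sum_{j in N} a[i][j] x_j,
# and z = v + sum_{j in N} c[j] x_j.
def pivot(N, B, a, b, c, v, leave, enter):
    an, bn = {}, {}
    # express the entering variable from the row of the leaving variable
    bn[enter] = b[leave] / a[leave][enter]
    an[enter] = {j: a[leave][j] / a[leave][enter] for j in N if j != enter}
    an[enter][leave] = 1 / a[leave][enter]
    # substitute it into the remaining rows
    for i in B:
        if i == leave:
            continue
        bn[i] = b[i] - a[i][enter] * bn[enter]
        an[i] = {j: a[i][j] - a[i][enter] * an[enter][j] for j in N if j != enter}
        an[i][leave] = -a[i][enter] * an[enter][leave]
    # substitute it into the objective
    vn = v + c[enter] * bn[enter]
    cn = {j: c[j] - c[enter] * an[enter][j] for j in N if j != enter}
    cn[leave] = -c[enter] * an[enter][leave]
    return (N - {enter}) | {leave}, (B - {leave}) | {enter}, an, bn, cn, vn

N, B = {1, 2, 3}, {4, 5, 6}
a = {4: {1: 1, 2: 1, 3: 3}, 5: {1: 2, 2: 2, 3: 5}, 6: {1: 4, 2: 1, 3: 2}}
b = {4: 30, 5: 24, 6: 36}
c = {1: 3, 2: 1, 3: 2}
print(pivot(N, B, a, b, c, 0, leave=6, enter=1))   # reproduces the system above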

Péter Gács (Boston University) CS 530 Spring 09 91 / 165


Linear Programming The simplex algorithm

When can we not pivot?


unbounded case
optimality
The problem of cycling Can be solved, though you will not
encounter it in practice.
Perturbation, or “Bland’s Rule”: choose variable with the
smallest index. (No proof here that this terminates.)
Geometric meaning: walking around a fixed extremal point,
trying different edges on which we can leave it while increasing
the objective.

Péter Gács (Boston University) CS 530 Spring 09 92 / 165


Linear Programming The simplex algorithm

Initial basic feasible solution

Solve the following auxiliary problem, with an additional variable


x0 :

minimize   x_0
subject to a_iᵀx − x_0 ≤ b_i,   i = 1, . . . , m,
           x, x_0 ≥ 0

If the optimal x0 is 0 then the optimal basic feasible solution is a


basic feasible solution to the original problem.

Péter Gács (Boston University) CS 530 Spring 09 93 / 165


Linear Programming The simplex algorithm

Complexity of the simplex method

Each pivot step takes O(mn) algebraic operations.


How many pivot steps? Can be exponential.
Does not occur in practice, where the number of needed
iterations is rarely higher than 3 max(m, n). Does not occur on
“random” problems, but mathematically random problems are
not typical in practice.
Spielman-Teng: on a small random perturbation of a linear
program (a certain version of) the simplex algorithm
terminates in polynomial time (on average).
There exists also a polynomial algorithm for solving linear
programs (see later). It is rarely competitive in practice.

Péter Gács (Boston University) CS 530 Spring 09 94 / 165


Linear Programming Duality

Duality

Primal (standard form): maximize cᵀx subject to Ax ≤ b and x ≥ 0. Value of the optimum (if feasible): z*. Dual:

Aᵀy ≥ c        yᵀA ≥ cᵀ
y ≥ 0          yᵀ ≥ 0
min bᵀy        min yᵀb

Value of the optimum if feasible: t*.

Proposition (Weak duality)

z* ≤ t*, moreover for every pair of feasible solutions x, y of the primal and dual:

cᵀx ≤ yᵀAx ≤ yᵀb = bᵀy.    (6)

Péter Gács (Boston University) CS 530 Spring 09 95 / 165


Linear Programming Duality

Use of duality. If somebody offers you a feasible solution to the


dual, you can use it to upperbound the optimum of the primal
(and for example decide that it is not worth continuing the
simplex iterations).

Péter Gács (Boston University) CS 530 Spring 09 96 / 165


Linear Programming Duality

Interpretation:
b i = the total amount of resource i that you have (kinds of
workers, land, machines).
a i j = the amount of resource i needed for activity j.
c j = the income from a unit of activity j.
x j = amount of activity j.
Ax É b says that you can use only the resources you have.
Primal problem: maximize the income c T x achievable with the
given resources.
Dual problem: Suppose that you can buy lacking resources and
sell unused resources.

Péter Gács (Boston University) CS 530 Spring 09 97 / 165


Linear Programming Duality

Resource i has price yi . Total income:

L(x, y) = cᵀx + yᵀ(b − Ax) = (cᵀ − yᵀA)x + yᵀb.

Let

f(x̂) = inf_{y≥0} L(x̂, y) ≤ L(x̂, ŷ) ≤ sup_{x≥0} L(x, ŷ) = g(ŷ).

Then f(x) > −∞ needs Ax ≤ b. Hence if the primal is feasible then for the optimal x* (choosing y to make yᵀ(b − Ax*) = 0) we have

sup_x f(x) = cᵀx* = z*.

Similarly g(y) < ∞ needs cᵀ ≤ yᵀA, hence if the dual is feasible then we have

z* ≤ inf_y g(y) = (y*)ᵀb = t*.

Péter Gács (Boston University) CS 530 Spring 09 98 / 165


Linear Programming Duality

Complementary slackness conditions:

yT (b − Ax) = 0, (yT A − c T )x = 0.

Proposition
Equality of the primal and dual optima implies complementary
slackness.

Interpretation:
Inactive constraints have shadow price yi = 0.
Activities that do not yield the income required by shadow
prices have level x j = 0.

Péter Gács (Boston University) CS 530 Spring 09 99 / 165


Linear Programming Duality

Theorem (Strong duality)


The primal problem has an optimum if and only if the dual is
feasible, and we have

z∗ = max c T x = min yT b = t∗ .

This surprising theorem says that there is a set of prices (called


shadow prices) which will force you to use your resources
optimally.
Many interesting uses and interpretations, and many proofs.

Péter Gács (Boston University) CS 530 Spring 09 100 / 165


Linear Programming Duality

Our proof of strong duality uses the following result of the


analysis of the simplex algorithm.

Theorem
If there is an optimum v then there is a basis B ⊂ {1, . . . , m + n}
belonging to a basic feasible solution, and coefficients c̃_i ≤ 0 such
that

c T x = v + c̃ T x,

where c̃ i = 0 for i ∈ B.

Define the nonnegative variables

ỹ_i = −c̃_{n+i},   i = 1, . . . , m.

Péter Gács (Boston University) CS 530 Spring 09 101 / 165


Linear Programming Duality

For any x, the following transformation holds, where i = 1, . . . , m,


j = 1, . . . , n:
∑_j c_j x_j = v + ∑_j c̃_j x_j + ∑_i c̃_{n+i} x_{n+i}
            = v + ∑_j c̃_j x_j + ∑_i (−ỹ_i)(b_i − ∑_j a_ij x_j)
            = v − ∑_i b_i ỹ_i + ∑_j (c̃_j + ∑_i a_ij ỹ_i) x_j.

This is an identity for x, so v = ∑_i b_i ỹ_i, and also c_j = c̃_j + ∑_i a_ij ỹ_i.
Optimality implies c̃ j É 0, which implies that ỹi is a feasible
solution of the dual.

Péter Gács (Boston University) CS 530 Spring 09 102 / 165


Linear Programming Duality

Linear programming and linear inequalities

Any feasible solution of the set of inequalities

Ax ≤ b
Aᵀy ≥ c
cᵀx − bᵀy = 0
x, y ≥ 0

gives an optimal solution to the original linear programming


problem.

Péter Gács (Boston University) CS 530 Spring 09 103 / 165


Linear Programming Alternatives

Theory of alternatives

Theorem (Farkas Lemma, not as in the book)


A set of inequalities Ax ≤ b is unsolvable if and only if a positive linear combination gives a contradiction: there is a solution y ≥ 0 to the inequalities

yᵀA = 0,
yᵀb < 0.

For proof, translate the problem to finding an initial feasible


solution to standard linear programming.

Péter Gács (Boston University) CS 530 Spring 09 104 / 165


Linear Programming Alternatives

We use the homework allowing variables without nonnegativity


constraints:

maximize   z
subject to Ax + z·e ≤ b        (7)

Here, e is the vector consisting of all 1’s. The dual is

minimize   yᵀb
subject to yᵀA = 0
           yᵀe = 1             (8)
           yᵀ ≥ 0

The original problem has no feasible solution if and only if


max z < 0 in (7). In this case, min yT b < 0 in (8). (Condition
yT e = 1 is not needed.)

Péter Gács (Boston University) CS 530 Spring 09 105 / 165


Linear Programming Alternatives

Separating hyperplane
Vectors u1 , . . . , u m in an n-dimensional space. Let L be the set of
convex linear combinations of these points: v is in L if
∑_i y_i u_i = v,   ∑_i y_i = 1,   y ≥ 0.

Using matrix U with rows u_iᵀ:

yᵀU = vᵀ,   ∑_i y_i = 1,   y ≥ 0.    (9)

If v ∉ L then we can put between L and v a hyperplane with equation dᵀv = c. Writing x in place of d and z in place of c, this says that the following set of inequalities has a solution for x, z:

u_iᵀx ≤ z   (i = 1, . . . , m),   vᵀx > z.

Can be derived from the Farkas Lemma.


Péter Gács (Boston University) CS 530 Spring 09 106 / 165
Linear Programming Applications of duality

Application to games

Primal, with dual variables written in parentheses at end of lines:

minimize   t
subject to t − ∑_j a_ij q_j ≥ 0,   i = 1, . . . , m    (p_i)
           ∑_j q_j = 1,                                (z)
           q_j ≥ 0,   j = 1, . . . , n

Dual:

maximize   z
subject to ∑_i p_i = 1,
           −∑_i a_ij p_i + z ≤ 0,   j = 1, . . . , n
           p_i ≥ 0,   i = 1, . . . , m.

Péter Gács (Boston University) CS 530 Spring 09 107 / 165


Linear Programming Applications of duality

Dual for max-flow: min-cut

maximize   ∑_{v∈V} f(s, v)
subject to f(u, v) ≤ c(u, v),    u, v ∈ V,
           f(u, v) = −f(v, u),   u, v ∈ V,
           ∑_{v∈V} f(u, v) = 0,  u ∈ V \ {s, t}.
Two variables associated with each edge, f (u, v) and f (v, u).
Simplify. Order the points arbitrarily, but starting with s and
ending with t. Leave f (u, v) when u < v: whenever f (v, u) appears
with u < v, replace with − f (u, v).

Péter Gács (Boston University) CS 530 Spring 09 108 / 165


Linear Programming Applications of duality

maximize   ∑_{v>s} f(s, v)
subject to f(u, v) ≤ c(u, v),    u < v,
           −f(u, v) ≤ c(v, u),   u < v,
           ∑_{v>u} f(u, v) − ∑_{v<u} f(v, u) = 0,   u ∈ V \ {s, t}.

Some constraints disappeared but others appeared, since in case of u < v the constraint f(v, u) ≤ c(v, u) is written now −f(u, v) ≤ c(v, u).
A dual variable for each constraint. For f(u, v) ≤ c(u, v), call it y⁺(u, v); for −f(u, v) ≤ c(v, u), call it y⁻(u, v). For

∑_{v>u} f(u, v) − ∑_{v<u} f(v, u) = 0

call it y(u).

Péter Gács (Boston University) CS 530 Spring 09 109 / 165


Linear Programming Applications of duality

Dual constraint for each primal variable f (u, v), u < v. Since
f (u, v) is not restricted by sign, the dual constraint is an
equation. If u, v ≠ s then f(u, v) has coefficient 0 in the objective
function. Let

y(u, v) = y+ (u, v) − y− (u, v).

The equation for u ≠ s, v ≠ t is y⁺(u, v) − y⁻(u, v) + y(u) − y(v) = 0, or

y(u, v) = y(v) − y(u).

For u = s, v ≠ t: y⁺(s, v) − y⁻(s, v) − y(v) = 1, or

y(s, v) = y(v) − (−1).

For u ≠ s but v = t, y⁺(u, t) − y⁻(u, t) + y(u) = 0, or

y(u, t) = 0 − y(u).

Péter Gács (Boston University) CS 530 Spring 09 110 / 165


Linear Programming Applications of duality

For u = s, v = t: y+ (s, t) − y− (s, t) = 1, or

y(s, t) = 0 − (−1).

Setting y(s) = −1, y(t) = 0, all these equations can be summarized


in y(u, v) = y(v) − y(u) for all u, v.
The objective function is ∑_{u,v} c(u, v)(y⁺(u, v) + y⁻(u, v)).

The minimum of x⁺ + x⁻ subject to x⁺, x⁻ ≥ 0, x⁺ − x⁻ = a is |a|, so the objective function can be simplified to ∑_{u,v} c(u, v)|y(u, v)|. Simplified dual problem:

minimize   ∑_{u<v} c(u, v)|y(v) − y(u)|
subject to y(s) = −1, y(t) = 0.

Let us require y(s) = 0, y(t) = 1 instead; the problem remains the


same.

Péter Gács (Boston University) CS 530 Spring 09 111 / 165


Linear Programming Applications of duality

Claim
There is an optimal solution in which each y(u) is 0 or 1.

Proof. Assume that there is an y(u) that is not 0 or 1. If it is


outside the interval [0, 1] then moving it towards this interval
decreases the objective function, so assume they are all inside. If
there are some variables y(u) inside this interval then move them
all by the same amount either up or down until one of them hits 0
or 1. One of these two possible moves will not increase the
objective function. Repeat these actions until each y(u) is 0 or
1.

Péter Gács (Boston University) CS 530 Spring 09 112 / 165


Linear Programming Applications of duality

Let y be an optimal solution in which each y(u) is either 0 or 1.


Let
S = { u : y(u) = 0 }, T = { u : y(u) = 1 }.
Then s ∈ S, t ∈ T. The objective function is
∑_{u∈S, v∈T} c(u, v).

This is the value of the “cut” (S, T). So the dual problem is about
finding a minimum cut, and the duality theorem implies the
max-flow/min-cut theorem.

Péter Gács (Boston University) CS 530 Spring 09 113 / 165


Linear Programming Applications of duality

Maximum bipartite matching


Bipartite graph with left set A, right set B and edges E ⊆ A × B.
Interpretation: elements of A are workers, elements of B are jobs.
(a, b) ∈ E means that worker a has the skill to perform job b. Two
edges are disjoint if both of their endpoints differ. Matching: a set
M of disjoint edges. Maximum matching: a maximum-size
assignment of workers to jobs.
Covering set C ⊆ A ∪ B: a set with the property that for each edge
(a, b) ∈ E we have a ∈ C or b ∈ C.
Clearly, the size of each matching is ≤ the size of each covering
set.
Theorem
The size of a maximum matching is equal to the size of a
minimum covering set.

There is a proof by reduction to the flow problem and using the


max-flow min-cut theorem.
Péter Gács (Boston University) CS 530 Spring 09 114 / 165
The ellipsoid algorithm The problem

The ellipsoid algorithm


The problem

The simplex algorithm may take an exponential number of


steps, as a function of m + n.
Consider just the problem of solving a set of inequalities

a_iᵀx ≤ b_i,   i = 1, . . . , m

for x ∈ Rn . If each entry has at most k digits then the size of


the input is

L = m · n · k.

We want a solution (or learn that none exists) in a number of


steps polynomial in L, that is O(L^c) for some constant c.

Péter Gács (Boston University) CS 530 Spring 09 115 / 165


The ellipsoid algorithm Ellipsoids

Ellipsoids

In space Rn , for all r > 0 the set

B(c, r) = { x : (x − c)ᵀ(x − c) ≤ r² }

is a ball with center c and radius r. A nonsingular linear


transformation L transforms B(0, r) into an ellipsoid

E = { Lx : xᵀx ≤ r² } = { y : yᵀA^{−1}y ≤ r² },

where A = LLᵀ is positive definite. A general ellipsoid E(c, A)


with center c has the form

{ x : (x − c)ᵀA^{−1}(x − c) ≤ r² }

where A is positive definite.

Péter Gács (Boston University) CS 530 Spring 09 116 / 165


The ellipsoid algorithm Ellipsoids

Though we will not use it substantially, the following theorem


shows that ellipsoids can always be brought to a simple form. A
basis b1 , . . . , b n of the vector space Rn is called orthonormal if
b_iᵀb_j = 0 for i ≠ j and 1 for i = j.

Theorem (Principal axes)


Let E be an ellipsoid with center 0. Then there is an orthonormal
basis such that if vectors are expressed with coordinates in this
basis then

E = { x : xᵀA^{−2}x ≤ 1 },

where A is a diagonal matrix with positive elements a 1 , . . . , a n on


the diagonal.
In other words, E = { x : x_1²/a_1² + · · · + x_n²/a_n² ≤ 1 }.

Péter Gács (Boston University) CS 530 Spring 09 117 / 165


The ellipsoid algorithm Ellipsoids

In 2 dimensions this gives the familiar equation of the ellipse

x²/a² + y²/b² = 1.

The numbers a, b are the lengths of the principal axes of the
ellipse, measured from the center. When they are all equal, we
get the equation of a circle (sphere in n dimensions).

Péter Gács (Boston University) CS 530 Spring 09 118 / 165


The ellipsoid algorithm Ellipsoids

Volume of an ellipsoid

Let Vn be the volume of a unit ball in n dimensions. It is easy to


see that the volume of the ellipsoid

E = { x : x_1²/a_1² + · · · + x_n²/a_n² ≤ 1 }

is Vol(E) = V_n a_1 a_2 · · · a_n. More generally, if E = { x : xᵀ(AAᵀ)^{−1}x ≤ 1 } then Vol(E) = V_n det A.

Péter Gács (Boston University) CS 530 Spring 09 119 / 165


The ellipsoid algorithm Upper and lower bounds

Bounding the set of solutions

The set of solutions is a (possibly empty) polyhedron P. Let

N = n^{n/2} 10^{2kn},   δ = 1/(2mN),   ε = δ/(10^k n),
b′_i = b_i + δ.

In preparation, we will show

Theorem
(a) There is a ball E_1 of radius ≤ N√n and center 0 with the property that if there is a solution then there is a solution in E_1.
(b) Ax ≤ b is solvable if and only if Ax ≤ b′ is solvable and its set of solutions contains a ball of radius ε.

Péter Gács (Boston University) CS 530 Spring 09 120 / 165


The ellipsoid algorithm Upper and lower bounds

Consider the upper bound first. We have seen in homework the


following:

Lemma
If there is a solution then there is one with |x_j| ≤ N for all j.

This implies (a).
Now for the lower bound. The coming homework has a problem showing the following:

Lemma
If Ax ≤ b has no solution then defining b′_i = b_i + δ, the system Ax ≤ b′ has no solution either.

Péter Gács (Boston University) CS 530 Spring 09 121 / 165


The ellipsoid algorithm Upper and lower bounds

The following clearly implies (b) of the theorem:

Corollary
If Ax ≤ b′ is solvable then its set of solutions contains a cube of size 2ε.

Proof. If Ax ≤ b′ is solvable then so is Ax ≤ b. Let x be a solution of Ax ≤ b. Then changing each x_j by any amount of absolute value at most ε changes

a_iᵀx = ∑_{j=1}^{n} a_ij x_j

by at most 10^k nε ≤ δ, so each inequality a_iᵀx ≤ b′_i still holds.

Péter Gács (Boston University) CS 530 Spring 09 122 / 165


The ellipsoid algorithm The algorithm

The algorithm

The algorithm will go through a series x^{(1)}, x^{(2)}, . . . of trial solutions, and in step t learn P ⊆ E_t where our wraps E_1, E_2, . . . are ellipsoids.
We start with x^{(1)} = 0, the center of our ball. Is it a solution? If not, there is an i with a_iᵀx^{(1)} > b_i. Then P is contained in the half-ball

H_1 = E_1 ∩ { x : a_iᵀx ≤ a_iᵀx^{(1)} }.

Péter Gács (Boston University) CS 530 Spring 09 123 / 165


The ellipsoid algorithm Shrinking rate

Shrinking rate

To keep our wraps simple, we enclose H_1 into an ellipsoid E_2 of as small a volume as possible.

Lemma
There is an ellipsoid E_2 containing H_1 with Vol(E_2) ≤ e^{-1/(2n)} Vol(E_1). This is true even if E_1 was also an ellipsoid.

[Figure: the half-ball H_1, with the old center x^(1) and the new center x^(2).]

Note e^{-1/(2n)} ≈ 1 − 1/(2n).

Péter Gács (Boston University) CS 530 Spring 09 124 / 165


The ellipsoid algorithm Shrinking rate

Proof

Assume without loss of generality


E_1 is the unit ball E_1 = { x : x^T x ≤ 1 },
a_i = −e_1, b_i < 0.
Then the half-ball to consider is { x ∈ E 1 : x1 Ê 0 }. The best
ellipsoid’s center has the form (d, 0, . . . , 0)T . The axes will be
(1 − d), b, b, . . . , b, so

E_2 = { x : (x_1 − d)^2/(1 − d)^2 + b^{-2} ∑_{j≥2} x_j^2 ≤ 1 }.

It touches the ball E_1 at the circle x_1 = 0, ∑_{j≥2} x_j^2 = 1:

d^2/(1 − d)^2 + b^{-2} = 1.

Péter Gács (Boston University) CS 530 Spring 09 125 / 165


The ellipsoid algorithm Shrinking rate

Hence

b^{-2} = 1 − d^2/(1 − d)^2 = (1 − 2d)/(1 − 2d + d^2),
b^2 = 1 + d^2/(1 − 2d) ≤ 1 + 2d^2   if d ≤ 1/4.

Using 1 + z ≤ e^z:

Vol(E_2) = V_n (1 − d) b^{n−1} ≤ V_n (1 − d)(1 + 2d^2)^{n/2} ≤ V_n e^{nd^2 − d}.

Choose d = 1/(2n); then this is V_n e^{-1/(2n)}.
This proves the Lemma for the case when E 1 is a ball. When E 1
is an ellipsoid, transform it linearly into a ball, apply the lemma
and then transform back. The transformation takes ellipsoids
into ellipsoids and does not change the ratio of volumes.

Péter Gács (Boston University) CS 530 Spring 09 126 / 165


The ellipsoid algorithm Bounding the number of iterations

Bounding the number of iterations


Now the algorithm constructs E 3 from E 2 in the same way, and so
on. If no solution is found, then r steps diminish the volume by a
factor
e^{-r/(2n)}.

We know Vol(E_1) ≤ V_n (N√n)^n, while if there is a solution then the set of solutions contains a ball of volume ≥ V_n ε^n. But if r is so large that

e^{-r/(2n)} < (ε/(N√n))^n

then Vol(E r+1 ) is smaller than the volume of this small ball, so
there is no solution.
It is easy to see from here that r can be chosen to be polynomial
in m, n, k.
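To spell out that last step (a short derivation added for completeness; the asymptotic bookkeeping is mine), take logarithms in the displayed inequality:

    % solving  e^{-r/(2n)} < (\varepsilon/(N\sqrt{n}))^n  for r:
    \frac{r}{2n} > n \ln\frac{N\sqrt{n}}{\varepsilon}
    \quad\Longleftrightarrow\quad
    r > 2n^2 \ln\frac{N\sqrt{n}}{\varepsilon}.
    % With N = n^{n/2} 10^{2kn} and \varepsilon = \delta/(10^k n) = 1/(2mN \cdot 10^k n),
    % \ln(N\sqrt{n}/\varepsilon) = O(kn + n\log n + \log m),
    % so r = O(n^2 (kn + n\log n + \log m)) iterations suffice.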
Péter Gács (Boston University) CS 530 Spring 09 127 / 165
NP-completeness

NP problems

Examples
Shortest vs. longest simple paths
Euler tour vs. Hamiltonian cycle
2-SAT vs. 3-SAT. Satisfiability for circuits and for conjunctive
normal form (SAT). Reducing satisfiability for circuits to
3-SAT.
Use of reduction in this course: proving hardness.
Ultrasound test of sex of fetus.

Péter Gács (Boston University) CS 530 Spring 09 128 / 165


NP-completeness

Decision problems vs. optimization problems vs. search problems.

Example
Given a graph G.
Decision Given k, does G have an independent subset of size Ê k?
Optimization What is the size of the largest independent set?
Search Given k, give an independent set of size k (if there is one).
Optimization+search Give a maximum size independent set.

Péter Gács (Boston University) CS 530 Spring 09 129 / 165


NP-completeness Polynomial time

Random access machine

Memory: one-way infinite tape: cell i contains natural number


T[i] of arbitrary size.
Program: a sequence of instructions, in the “program store”: a
(potentially) infinite sequence of labeled registers containing
instructions. A program counter.
Instruction types:
T[T[i]] = T[T[ j]] random access
T[i] = T[ j] ± T[k] addition
if T[0] > 0 then jump to s conditional branching
The cost of an operation will be taken to be proportional to the
total length of the numbers participating in it. This keeps the cost
realistic despite the arbitrary size of numbers in the registers.

Péter Gács (Boston University) CS 530 Spring 09 130 / 165


NP-completeness Polynomial time

Polynomial time

Abstract problems
Instance. Solution.
Encodings
Concrete problems: encoded into strings.
Polynomial-time computable functions, polynomial-time
decidable sets.
Polynomially related encodings.
Language: a set of strings. Deciding a language.

Péter Gács (Boston University) CS 530 Spring 09 131 / 165


NP-completeness Polynomial-time verification

Polynomial-time verification

Example
Hamiltonian cycles.

An NP problem is defined with the help of a polynomial-time


function

V (x, w)

with yes/no values that verifies, for a given input x and witness (certificate) w, whether w is indeed a witness for x.

Péter Gács (Boston University) CS 530 Spring 09 132 / 165


NP-completeness Polynomial-time verification

The same decision problem may arise from very different verification functions (search problems).

Example (Compositeness)
Let the decision problem be the question whether a number x is
composite (nonprime). The obvious verifiable property is

V1 (x, w) ⇔ (1 < w < x) ∧ (w| x).

There is also a very different verifiable property V2 (x, w) for


compositeness such that, for a certain polynomial-time
computable b(x), if x is composite then at least half of the
numbers 1 É w É b(x) are witnesses. This can be used for
probabilistic prime number tests.
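As a concrete illustration (a minimal Python sketch, not from the slides; the function name is mine), the first verification function is just trial division by the witness:

    def V1(x: int, w: int) -> bool:
        # w witnesses that x is composite iff it is a nontrivial divisor of x
        return 1 < w < x and x % w == 0

    # V1(91, 7) is True; no w makes V1(13, w) true, since 13 is prime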

Péter Gács (Boston University) CS 530 Spring 09 133 / 165


NP-completeness Reducibility, completeness

Reducibility, completeness

Reduction of problem A 1 to problem A 2 in terms of the


verification functions V1 , V2 and a reduction (translation)
function τ:

∃wV1 (x, w) ⇔ ∃ uV2 (τ(x), u).

Example
Reducing linear programming to solving a set of linear
inequalities.

NP-hardness.
NP-completeness.

Péter Gács (Boston University) CS 530 Spring 09 134 / 165


NP-completeness Circuit satisfiability and independent set

Theorem
Satisfiability is NP-complete.

Proof via circuit satisfiability.

Theorem
INDEPENDENT SET is NP-complete.

Reducing SAT to it.

Example
Integer linear programming. In particular, the subset sum
problem.

Reduction of 3SAT to subset sum.


Example
Set cover Ê vertex cover ∼ independent set.

Péter Gács (Boston University) CS 530 Spring 09 135 / 165


Approximations

Approximations
The setting

In case of NP-complete problems, maybe something can be said


about how well we can approximate a solution. We will formulate
the question only for problems where we maximize a positive function. For an objective function f(x, y), x, y ∈ {0, 1}^n, the optimum is

M(x) = max_y f(x, y)

where y runs over the possible “witnesses”.


For 0 < λ, an algorithm A(x) is a λ-approximation if

f (x, A(x)) > M(x)/λ.

For minimization problems, with minimum m(x), we require

f(x, A(x)) < m(x)λ.

Péter Gács (Boston University) CS 530 Spring 09 136 / 165


Approximations Greedy algorithms

Greedy algorithms
Try local improvements as long as you can.
Example (Maximum cut)
Graph G = (V, E), cut S ⊆ V, S̄ = V \ S. Find a cut S that maximizes the number of edges in the cut:

|{ {u, v} ∈ E : u ∈ S, v ∈ S̄ }|.

Greedy algorithm:
Repeat: find a point on one side of the cut such that moving it to the other side increases the cut size.

Theorem
If you cannot improve anymore with this algorithm then you are
within a factor 2 of the optimum.

Proof. The unimprovable cut contains at least half of all edges: for each point, at least half of its incident edges must cross the cut, otherwise moving the point would improve the cut; summing over all points gives the claim.

Péter Gács (Boston University) CS 530 Spring 09 137 / 165
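A direct implementation of the local-improvement rule above (a Python sketch; the function name and the edge-list representation of the graph are my own choices):

    def greedy_max_cut(n, edges):
        """Unweighted max cut by local improvement.
        n: number of vertices, edges: list of pairs (u, v) with 0 <= u, v < n."""
        side = [0] * n                     # S = { v : side[v] == 1 }
        improved = True
        while improved:
            improved = False
            for v in range(n):
                same = sum(1 for a, b in edges if v in (a, b) and side[a] == side[b])
                cross = sum(1 for a, b in edges if v in (a, b) and side[a] != side[b])
                if same > cross:           # moving v increases the cut size
                    side[v] = 1 - side[v]
                    improved = True
        cut = [v for v in range(n) if side[v] == 1]
        cut_size = sum(1 for u, v in edges if side[u] != side[v])
        return cut, cut_size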
Approximations Greedy algorithms

Randomized algorithms

Generalize maximum cut for the case where edges e have weights w_e, that is, maximize

∑_{u∈S, v∈S̄} w_{uv}.

Question The greedy algorithm brings us within a factor of 2 of the optimum also in the weighted case. But does it take a polynomial number of steps?
New idea: decide each “v ∈ S?” question by tossing a coin. The expected weight of the cut is (1/2) ∑_e w_e, since each edge is in the cut with probability 1/2.
We will do better with semidefinite programming.

Péter Gács (Boston University) CS 530 Spring 09 138 / 165


Approximations Greedy algorithms

Less greed is sometimes better

What does the greedy algorithm for vertex cover say?


The following, less greedy algorithm has a better performance guarantee.
Approx_Vertex_Cover(G):
    C ← ∅
    E' ← E[G]
    while E' ≠ ∅ do
        let (u, v) be an arbitrary edge in E'
        C ← C ∪ {u, v}
        remove from E' every edge incident on either u or v
    return C
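The same algorithm in executable form (a sketch; the function name and the edge-list input are assumptions of mine):

    def approx_vertex_cover(edges):
        """2-approximate vertex cover: repeatedly take an uncovered edge
        and put both of its endpoints into the cover."""
        cover = set()
        for u, v in edges:                 # the edges picked here form a matching
            if u not in cover and v not in cover:
                cover.update((u, v))
        return cover

    # approx_vertex_cover([(0, 1), (1, 2), (2, 3)]) returns {0, 1, 2, 3};
    # the optimum {1, 2} has size 2, so the ratio bound 2 is tight here.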

Péter Gács (Boston University) CS 530 Spring 09 139 / 165


Approximations Greedy algorithms

Theorem
Approx_Vertex_Cover has a ratio bound of 2.

Proof. The points of C are endpoints of a matching. Any optimum vertex cover must contain at least one endpoint of each of these matching edges, hence at least half of the points of C.

Péter Gács (Boston University) CS 530 Spring 09 140 / 165


Approximations Greedy algorithms

More general vertex cover problem for G = (V, E), with weight w_i at vertex i. Let x_i = 1 if vertex i is selected. Linear programming problem without the integrality condition:

minimize wT x
subject to x i + x j Ê 1, (i, j) ∈ E,
x Ê 0.

Let the optimal solution be x∗ . Choose x i = 1 if x∗i Ê 1/2 and 0


otherwise.
Claim
Solution x has approximation ratio 2.

Proof. For each edge (i, j) we have x*_i + x*_j ≥ 1, so at least one endpoint is rounded up to 1 and x is a vertex cover. In the rounding we increased each x*_i by at most a factor of 2, so w^T x ≤ 2 w^T x* ≤ 2·OPT.
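A sketch of this relax-and-round scheme using scipy's LP solver (assuming scipy is available; adding the bounds 0 ≤ x_i ≤ 1 does not change the optimum, since truncating a solution at 1 keeps it feasible):

    import numpy as np
    from scipy.optimize import linprog

    def lp_rounding_vertex_cover(n, edges, w):
        """Weighted vertex cover by LP rounding (2-approximation).
        n: number of vertices, edges: list of pairs (i, j), w: weights, length n."""
        # x_i + x_j >= 1 is written as -x_i - x_j <= -1 for linprog
        A_ub = np.zeros((len(edges), n))
        for row, (i, j) in enumerate(edges):
            A_ub[row, i] = A_ub[row, j] = -1.0
        b_ub = -np.ones(len(edges))
        res = linprog(c=w, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * n, method="highs")
        x_star = res.x
        return [i for i in range(n) if x_star[i] >= 0.5]   # round the large coordinates up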

Péter Gács (Boston University) CS 530 Spring 09 141 / 165


Approximations The set-covering problem

The set-covering problem

Given (X , F ): a set X and a family F of subsets of X , find a


min-size subset of F covering X .
Example: Smallest committee with people covering all skills.
Generalization: Set S has weight w(S) > 0. We want a
minimum-weight set cover.

Péter Gács (Boston University) CS 530 Spring 09 142 / 165


Approximations The set-covering problem

The algorithm Greedy_Set_Cover(X, F):

    U ← X
    C ← ∅
    while U ≠ ∅ do
        select an S ∈ F that maximizes |S ∩ U|/w(S)
        U ← U \ S
        C ← C ∪ {S}
    return C

If element e was covered by set S then let price(e) = w(S)/|S ∩ U|. Then we cover each element at minimum price (at the moment). Note that the total final weight is ∑_{k=1}^n price(e_k).
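The same greedy rule in Python (a sketch with names of my own; the family is given as a dict of sets and the weights as a parallel dict):

    def greedy_set_cover(X, family, w):
        """family: dict name -> set of elements of X, w: dict name -> positive weight."""
        U, cover = set(X), []
        while U:
            # the set with the best coverage-per-weight ratio on the uncovered part
            best = max(family, key=lambda S: len(family[S] & U) / w[S])
            if not family[best] & U:
                raise ValueError("the family does not cover X")
            U -= family[best]
            cover.append(best)
        return cover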

Péter Gács (Boston University) CS 530 Spring 09 143 / 165


Approximations The set-covering problem

Analysis

Let H(n) = 1 + 1/2 + · · · + 1/n (≈ ln n).

Theorem
Greedy_Set_Cover has a ratio bound maxS ∈F H(|S |).

Péter Gács (Boston University) CS 530 Spring 09 144 / 165


Approximations The set-covering problem

Lemma
For all S in F we have ∑_{e∈S} price(e) ≤ w(S) H(|S|).

Proof. Let e ∈ S ∩ S_i \ ∪_{j<i} S_j, and let V_i = S \ ∪_{j<i} S_j be the remaining part of S before being covered in the greedy cover. By the greedy property,

price(e) ≤ w(S)/|V_i|.

Let e_1, . . . , e_{|S|} be a list of the elements of S in the order in which they are covered (ties are broken arbitrarily). Then the above inequality implies

price(e_k) ≤ w(S)/(|S| − k + 1).

Summing for all k proves the lemma.

Péter Gács (Boston University) CS 530 Spring 09 145 / 165


Approximations The set-covering problem

Proof of the theorem. Let C* be the optimal set cover and C the cover returned by the algorithm.

∑_e price(e) ≤ ∑_{S∈C*} ∑_{e∈S} price(e) ≤ ∑_{S∈C*} w(S) H(|S|) ≤ H(|S*|) ∑_{S∈C*} w(S),

where S* is the largest set.

Question
Is this the best possible factor for set cover?

The answer is not known.

Péter Gács (Boston University) CS 530 Spring 09 146 / 165


Approximations Approximation schemes

Approximation scheme

An algorithm that, for every ε > 0, gives a (1 + ε)-approximation.


A problem is fully approximable if it has a polynomial-time
approximation scheme.
Example: see the version of KNAPSACK below.
It is partly approximable if there is a lower bound λmin > 1 on
the achievable approximation ratio.
Example: MAXIMUM CUT, VERTEX COVER, MAX-SAT.
It is inapproximable if even this cannot be achieved.
Example: INDEPENDENT SET (deep result). The
approximation status of this problem is different from
VERTEX COVER, despite the close equivalence between the
two problems.

Péter Gács (Boston University) CS 530 Spring 09 147 / 165


Approximations Approximation schemes

Fully approximable version of knapsack

Given: integers b Ê a 1 , . . . , a n , and integer weights w1 Ê . . . Ê wn .

maximize wT x
subject to aT x É b,
x i = 0, 1, i = 1, . . . , n.

Péter Gács (Boston University) CS 530 Spring 09 148 / 165


Approximations Approximation schemes

Dynamic programming: For 1 É k É n,

A k (p) = min{ aT x : wT x = p, xk+1 = · · · = xn = 0 }.

If the set is empty the minimum is ∞. Let w = w1 + · · · + wn . The


vector (A k+1 (0), . . . , A k+1 (w)) can be computed by a simple
recursion from (A k (0), . . . , A k (w)). Namely, if wk+1 > p then
A k+1 (p) = A k (p). Otherwise,

A k+1 (p) = min{ A k (p), a k+1 + A k (p − wk+1 ) }.

The optimum is max{ p : A n (p) É b }.


Complexity: roughly O(nw) steps.
Why is this not a polynomial algorithm?
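A direct transcription of this recursion (a Python sketch; the function name is mine, and the table A_k(p) is kept as a single array updated in place, scanning p downward so that each item is used at most once):

    import math

    def knapsack_max_value(a, w, b):
        """0/1 knapsack: maximize w^T x subject to a^T x <= b.
        A[p] = minimum total size a^T x that achieves profit exactly p."""
        W = sum(w)
        A = [math.inf] * (W + 1)
        A[0] = 0
        for ak, wk in zip(a, w):
            for p in range(W, wk - 1, -1):          # descending p: 0/1 choice per item
                if A[p - wk] + ak < A[p]:
                    A[p] = A[p - wk] + ak
        return max(p for p in range(W + 1) if A[p] <= b)   # roughly O(n w) steps

    # knapsack_max_value(a=[3, 4, 5], w=[4, 5, 6], b=8) == 10  (take items 1 and 3)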

Péter Gács (Boston University) CS 530 Spring 09 149 / 165


Approximations Approximation schemes

Idea for approximation: break each w i into a smaller number of


big chunks, and use dynamic programming. Let r > 0, w'_i = ⌊w_i /r⌋.

maximize (w')^T x
subject to aT x É b,
x i = 0, 1, i = 1, . . . , n.

Péter Gács (Boston University) CS 530 Spring 09 150 / 165


Approximations Approximation schemes

For the optimal solution x' of the changed problem, we estimate w^T x'/OPT, where OPT = w^T x*. We have

w^T x'/r ≥ (w')^T x' ≥ (w')^T x* ≥ (w/r)^T x* − n,
w^T x' ≥ OPT − r·n = OPT − ε w_1,

where we set r = ε w_1/n. This gives

w^T x'/OPT ≥ 1 − ε w_1/OPT ≥ 1 − ε.

With w = ∑_i w_i, the amount of time is of the order of

n w/r = n^2 w/(w_1 ε) ≤ n^3/ε,

which is polynomial in n and 1/ε.
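Putting the scaling and the dynamic program together (a self-contained Python sketch with names of my own; it assumes positive integer inputs with each a_i ≤ b, so that w_1 ≤ OPT, and 0 < ε ≤ 1):

    import math

    def knapsack_fptas(a, w, b, eps):
        """Return x with w^T x >= (1 - eps) * OPT for the 0/1 knapsack problem."""
        n = len(w)
        r = eps * max(w) / n                       # scaling factor r = eps * w_1 / n
        ws = [int(wi // r) for wi in w]            # w'_i = floor(w_i / r)
        W = sum(ws)
        A = [math.inf] * (W + 1)                   # exact DP on the scaled weights
        A[0] = 0
        used = [[False] * (W + 1) for _ in range(n)]
        for k, (ak, wk) in enumerate(zip(a, ws)):
            for p in range(W, wk - 1, -1):
                if A[p - wk] + ak < A[p]:
                    A[p] = A[p - wk] + ak
                    used[k][p] = True
        p = max(q for q in range(W + 1) if A[q] <= b)
        x = [0] * n                                # recover a solution by backtracking
        for k in range(n - 1, -1, -1):
            if used[k][p]:
                x[k] = 1
                p -= ws[k]
        return x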

Péter Gács (Boston University) CS 530 Spring 09 151 / 165


Approximations Approximation schemes

Look at the special case of knapsack, with w i = a i . Here, we just


want to fill up the knapsack as much as we can. This is
equivalent to minimizing the remainder,

b − ∑_i a_i x_i.

But this minimization problem is inapproximable.

Péter Gács (Boston University) CS 530 Spring 09 152 / 165


Convex programming Convexity

Convex programming
Convexity

Many methods and results of linear programming generalize to


the case when the set of feasible solutions is convex and there is a
convex function to minimize.
Definition
A function f : Rn → R is convex if the set { (x, y) : f (x) É y } is
convex. It is concave if − f (x) is convex.

Equivalently, f is convex if

f (λa + (1 − λ)b) É λ f (a) + (1 − λ) f (b)

holds for all 0 É λ É 1.

Péter Gács (Boston University) CS 530 Spring 09 153 / 165


Convex programming Convexity

Examples
Each linear function aT x + b is convex.
If a matrix A is positive semidefinite then the quadratic
function xT Ax is convex.
If f (x), g(x) are convex and α, β Ê 0 then α f (x) + β g(x) is also
convex.

If f (x) is convex then for every constant c the set { x : f (x) É c } is a


convex set.

Péter Gács (Boston University) CS 530 Spring 09 154 / 165


Convex programming Convexity

Definition
A convex program is an optimization problem of the form

min f 0 (x)
subject to f i (x) É 0 for i = 1, . . . , m,

where all functions f i for i = 0, . . . , m are convex.


More generally, we also allow constraints of the form

x∈H

for any convex set H given in some effective way.

Péter Gács (Boston University) CS 530 Spring 09 155 / 165


Convex programming Convexity

Example: Support vector machine


Vectors u1 , . . . , u k represent persons known to have ADD
(attention deficit disorder). u i j = measurement value of the
jth psychological or medical test of person i. v1 , . . . , v l ∈ Rn
represent persons known not to have ADD.
Separate the two groups, if possible, by a linear test: find a vector z and numbers x < y with
    z^T u_i ≤ x for i = 1, . . . , k,
    z^T v_i ≥ y for i = 1, . . . , l.
For z, x, y to maximize the width of the gap (y − x)/(z^T z)^{1/2}, solve the convex program:
convex program:
maximize y− x
subject to u Ti z É x, i = 1, . . . , k,
vTi z Ê y, i = 1, . . . , l,
z T z É 1.
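This program can be handed directly to a convex solver; below is a sketch using the cvxpy modeling package (assuming it is installed; U and V are the k × n and l × n data matrices, and the names are mine):

    import numpy as np
    import cvxpy as cp

    def max_margin_separator(U, V):
        """U: k x n matrix with rows u_i, V: l x n matrix with rows v_i.
        Returns (z, x, y) maximizing the gap y - x subject to z^T z <= 1."""
        k, l, n = U.shape[0], V.shape[0], U.shape[1]
        z, x, y = cp.Variable(n), cp.Variable(), cp.Variable()
        constraints = [U @ z <= x * np.ones(k),
                       V @ z >= y * np.ones(l),
                       cp.sum_squares(z) <= 1]
        cp.Problem(cp.Maximize(y - x), constraints).solve()
        return z.value, x.value, y.value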
Péter Gács (Boston University) CS 530 Spring 09 156 / 165
Convex programming Separation oracle

Separation oracle

For the definition of “given in an effective way”, take a cue from


the ellipsoid algorithm:
We were looking for a solution to a system of linear
inequalities

aTi x É b i , i = 1, . . . , n.

A trial solution x( t) was always the center of some ellipsoid E t .


If it violated the conditions, it violated one of these:
aTi x( t) > b i . We could then use this to cut the ellipsoid E t in
half and to enclose it into a smaller ellipsoid E t+1 .
Now we are looking for an element of an arbitrary convex set
H. Assume again, that at step t, it is enclosed in an ellipsoid
E t , and we are checking the condition x( t) ∈ H. How to imitate
the ellipsoid algorithm further?

Péter Gács (Boston University) CS 530 Spring 09 157 / 165


Convex programming Separation oracle

Definition
Let a : Qn → Qn , b : Qn → Q be functions computable in
polynomial time and H ⊆ Rn a (convex) set. These are a
separating (hyperplane) oracle for H if for all x ∈ Rn , with
a = a(x), b = b(x) we have:
If x ∈ H then a = 0.
If x 6∈ H then aT y É b for all y ∈ H and aT x Ê b.

Example
For the unit ball H = { x : x^T x ≤ 1 }, the functions a = x · |x^T x − 1|_+ and b = x^T x · |x^T x − 1|_+ give a separation oracle: for x ∉ H we have a^T y ≤ (x^T x − 1)(x^T x)^{1/2} ≤ b for every y ∈ H, and a^T x = b.
To find a separation oracle for an ellipsoid, transform it into a ball
first.
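In code (a small Python sketch with names of my own, returning the pair (a, b) of the example above):

    import numpy as np

    def unit_ball_oracle(x):
        """Separating oracle for H = { x : x^T x <= 1 }."""
        t = float(x @ x)
        if t <= 1.0:
            return np.zeros_like(x), 0.0           # x is in H: a = 0
        a = (t - 1.0) * x                          # a = x |x^T x - 1|_+
        b = t * (t - 1.0)                          # a^T y <= b for all y in H, and a^T x = b
        return a, b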

Péter Gács (Boston University) CS 530 Spring 09 158 / 165


Convex programming Separation oracle

If the goal is to find an element in a convex set H that allows a


separation oracle (a(·), b(·)) then we can use it to proceed with the
ellipsoid algorithm, enclosing the convex set H into ellipsoids of
smaller and smaller volume. This can frequently lead to good
approximation algorithms.

Péter Gács (Boston University) CS 530 Spring 09 159 / 165


Convex programming Semidefinite programs

Semidefinite programs

If A, B are symmetric matrices then A ⪯ B denotes that B − A


is positive semidefinite, and A ≺ B denotes that B − A is
positive definite.
Let the variables x i j be arranged in an n × n symmetric
matrix X = (x i j ). The set of positive semidefinite matrices

{ X : X ⪰ 0 }

is convex. Indeed, it is defined by the set of linear inequalities

a^T X a ≥ 0, that is, ∑_{ij} (a_i a_j) x_{ij} ≥ 0

where a runs through all vectors in Rn .

Péter Gács (Boston University) CS 530 Spring 09 160 / 165


Convex programming Semidefinite programs

Example: maximum cut

Recall the maximum cut problem in a graph G = (V , E, w(·))


where w e is the weight of edge e.
New idea:
Assign a unit vector u i ∈ Rn to each vertex i ∈ V of the graph.
Choose a random direction through 0, that is a random unit
vector z. The sign of the projection on z determines the cut:

S = { i : z T u i É 0 }.

Computation shows that in order to maximize the expected


cut weight, we need to minimize

∑_{i≠j} w_{ij} u_i^T u_j.

Péter Gács (Boston University) CS 530 Spring 09 161 / 165


Convex programming Semidefinite programs

This brings us to the program:

    minimize    ∑_{i≠j} w_{ij} u_i^T u_j
    subject to  u_i^T u_i = 1,  i = 1, . . . , n.

It is more convenient to work with the variables x i j = u Ti u j . The


matrix X = (x i j ) is positive semidefinite, with x ii = 1, if and only if
it can be represented as x i j = u Ti u j . We arrive at the semidefinite
program:
    minimize    ∑_{ij} w_{ij} x_{ij}
    subject to  x_{ii} = 1,  i = 1, . . . , n,
                X ⪰ 0.
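A sketch of this semidefinite program in cvxpy (assuming cvxpy with an SDP-capable solver is installed; the random-hyperplane rounding of the previous slide is included, and all names are mine):

    import numpy as np
    import cvxpy as cp

    def maxcut_sdp(W):
        """W: symmetric n x n weight matrix with zero diagonal.
        Solves the relaxation, then rounds with a random hyperplane."""
        n = W.shape[0]
        X = cp.Variable((n, n), symmetric=True)
        constraints = [X >> 0, cp.diag(X) == 1]
        cp.Problem(cp.Minimize(cp.sum(cp.multiply(W, X))), constraints).solve()
        # factor X = U^T U and round: S = { i : z^T u_i <= 0 }
        eigval, eigvec = np.linalg.eigh(X.value)
        U = (eigvec * np.sqrt(np.clip(eigval, 0, None))).T   # column i is u_i
        z = np.random.randn(n)
        return [i for i in range(n) if z @ U[:, i] <= 0]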

Péter Gács (Boston University) CS 530 Spring 09 162 / 165


Convex programming Semidefinite programs

Separation oracle for semidefiniteness

The LU decomposition algorithm, when the matrix A is symmetric, becomes the Cholesky decomposition. For

    A = [ a_11  v^T
          v     A'  ]

with U_1 = L_1^T:

    L_1^{-1} A U_1^{-1} = [ a_11  0
                            0     A_2 ]

with Schur's complement A_2 = A' − v v^T / a_11.

Péter Gács (Boston University) CS 530 Spring 09 163 / 165


Convex programming Semidefinite programs

Proposition
If A is positive definite then A 2 is also.

Proof. We have y^T A_2 y = x^T A x, with

    x = U_1^{-1} [ 0
                   I_{n−1} ] y =: M_1 y.

If y witnesses A 2 not positive definite by yT A 2 y É 0 then x = M 1 y


witnesses A not positive definite.

Péter Gács (Boston University) CS 530 Spring 09 164 / 165


Convex programming Semidefinite programs

Separation oracle (d, b) for positive definiteness


b(A) = 0.
We will set d i j = x i x j where x witnesses A not positive
definite.
If the first step of the decomposition fails, that is a 11 É 0, then
set x = e1 .
If the recursive step of the decomposition fails, that is y
witnesses A 2 not positive definite by yT A 2 y É 0, then set
x = M 1 y.
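A sketch of this recursive witness construction (Python, with names of my own; it follows the step A_2 = A' − v v^T / a_11 above and returns x with x^T A x ≤ 0 when A is not positive definite, or None when the recursion goes through):

    import numpy as np

    def pd_witness(A):
        """A: symmetric matrix. Returns a witness x with x^T A x <= 0, or None."""
        n = A.shape[0]
        a11 = A[0, 0]
        if a11 <= 0:
            return np.eye(n)[0]                    # x = e_1, since e_1^T A e_1 = a_11 <= 0
        if n == 1:
            return None
        v, A_prime = A[1:, 0], A[1:, 1:]
        y = pd_witness(A_prime - np.outer(v, v) / a11)   # recurse on the Schur complement
        if y is None:
            return None
        x = np.zeros(n)
        x[1:] = y
        x[0] = -(v @ y) / a11                      # x = M_1 y = U_1^{-1} (0; y)
        return x

    # the oracle then returns d = np.outer(x, x) (that is, d_ij = x_i x_j) and b = 0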

Péter Gács (Boston University) CS 530 Spring 09 165 / 165
