
STAT7003

Optimisation Algorithms
in
Operational Research

Course Notes
2011/2012

Contents

Course Outline

1 Linear Programming
  1.1 Introduction
  1.2 Mathematical Programming
  1.3 Linear Programming Problems (LPPs)
  1.4 Graphical Solution Method for LPPs
  1.5 Extreme Point Theorem
  1.6 Graphical Naïve Solution Method  10
  1.7 Slack Variables and Surplus Variables  11
  1.8 Graphical Sensitivity Analysis  12

2 The Simplex Algorithm  17
  2.1 Theory leading to the Simplex Algorithm  18
  2.2 Numerical Naïve Method  19
  2.3 The Simplex Algorithm  22
  2.4 Why Does the Simplex Algorithm Work?  26
  2.5 Another Example of the Simplex Algorithm  28
  2.6 Practical Considerations of the Simplex Algorithm  31
  2.7 Simplex Algorithm for More General Problems  33
  2.8 The Big-M Method  35
  2.9 Two-Phase Method  37

3 Sensitivity Analysis and Duality  42
  3.1 Sensitivity Analysis  42
  3.2 Duality  47

4 Game Theory  50
  4.1 Introduction  50
  4.2 Basic Formulation  51
  4.3 Nash Equilibrium  52
  4.4 Dominance  54
  4.5 Saddle Points  55
  4.6 Maximin and Minimax Mixed Strategies  56
  4.7 Graphical Solution for n × 2 Game  57
  4.8 Linear Programming Formulation of a Game  59
  4.9 Games Against Nature  62

5 Dynamic Programming  68
  5.1 Introduction  68
  5.2 Forward Recursion  69
  5.3 Backward Recursion  72
  5.4 Resource Allocation Problems  73
  5.5 The Warehousing Problem  78

6 Stochastic Optimisation  83
  6.1 Introduction  83
  6.2 Markov Chains  83
  6.3 Markov dynamic programming without actions  85
  6.4 Markov dynamic programming with actions  87
  6.5 Discounting  90
  6.6 Infinite horizon problems  91
  6.7 Optimal Stopping Routines  93

Course Outline
Lecturer
Dr. Ricardo Silva (room 139) ricardo@stats.ucl.ac.uk

Aims of course
To provide an introduction to the ideas underlying the optimal choice of component variables, subject
to constraints, that maximise (or minimise) an objective function. The algorithms described are both
mathematically interesting and applicable to a wide variety of complex real life situations.

Objectives of course
On successful completion of the course, a student should be able to understand the theoretical concepts
of linear programming, dynamic programming and finite Markov programming, set up correct models of
real life problems, interpret results correctly and check the validity of assumptions.

Applications
Optimisation methods provide the means for successful business strategies, scientific planning and statistical estimation under constraints. They are a critical component of any area where decision making
under limited resources is necessary.

Prerequisites
STAT1004 or equivalent.

Course content
Linear programming: graphical solution techniques, simplex method, sensitivity analysis.
Game theory: Zero-sum two player games, minimax, maximin, Laplace, Hurwicz and minimax regret
strategies, linear programming formulation.
Dynamic programming: systems, states, stages, principle of optimality, forward and backward recurrence.
Markov sequential processes: Markov processes with rewards values-iteration, policy iteration, sequential
decision processes.

Course Outline

Texts
This course covers a range of topics that are not usually covered all in one book. This is an abridged list
of textbooks. A more detailed list is given in Appendix 2 of the course notes.
F.S. Hillier & G.J. Lieberman Introduction to Operations Research (2005, McGraw Hill).
G. Gordon & I. Pressman Quantitative Decision-Making for Business (1978, Prentice Hall).
B.Kolman & R.E.Beck, Elementary Linear Programming with Applications (1980, Academic Press).
W. L. Winston Operations research : applications and algorithms (1994, Duxbury Press).
S. M. Ross Introduction to Stochastic Dynamic Programming (1983, Academic Press).

Assessment for examination grading

• In-course assessment: one compulsory set of problems done under supervision.
• 2½-hour written examination in term 3.

The final mark is a 9-to-1 weighted average of the written examination and in-course assessment marks.

Other set work

• Six sets of exercises. These will not count towards the examination grading.

Timetabled workload
Lectures and problems classes: 3 hours per week in term 1.

Chapter 1

Linear Programming

1.1 Introduction

What is OR?
OR stands for Operational research or alternatively Operations Research, and is sometimes
called Management Science.
To the mathematician, OR means maximising some function subject to a set of constraints.
To a manager, OR means building a model for his/her business which:

• allows the efficient or optimal allocation of limited resources, and
• identifies those resources that are limiting the business output.

Who uses OR?

• Usually large companies or organisations, e.g. oil companies, car manufacturers, the military.

When did it start?

• Surprisingly recently: during the Second World War, when scarce resources had to be assigned
to different military purposes. Scientists were asked to do research into military operations.
The Simplex algorithm was formulated in 1947.

To do practical OR one needs at least simple computers/calculators.

Main topics in this course

• Linear programming (Chapters 1 to 3),
• Game theory (Chapter 4),
• Dynamic programming - deterministic and stochastic (Chapters 5 and 6).

1.2 Mathematical Programming

Optimisation is the problem of finding the maximum or minimum of a function f(x) and the
value of x at which f(x) attains that maximum/minimum.

If x* is the value that minimises f(x), then x* is also the value that maximises −f(x), so a
minimisation problem can be trivially turned into a maximisation problem.

Example 1.1 Unconstrained Minimisation

Minimise

    z = x1² + x2².

It is easy to see that x1 = 0 and x2 = 0 gives the unique minimum, with z = 0. For a less
obvious problem, partial derivatives would yield the solution.
Example 1.2 Constrained Minimisation

Minimise

    z = x1² + x2²

subject to the conditions

    x1 − x2 = 3
    x2 ≥ 2.

Note that the value of z is the squared distance of the point (x1, x2) from the origin (Pythagoras's
Theorem). From the figure, the point that satisfies the constraints and is also closest to the
origin is x1 = 5, x2 = 2, which is the optimal solution.
A mathematical program is an optimisation problem in which the objective and constraints are
given as mathematical functions and functional relationships:

    optimise    z = f(x1, x2, ..., xn)
    subject to:
        g1(x1, x2, ..., xn)  (≤, = or ≥)  b1
        g2(x1, x2, ..., xn)  (≤, = or ≥)  b2
            ...
        gm(x1, x2, ..., xn)  (≤, = or ≥)  bm

The xj are called the control variables (or decision variables), f is the objective function and the bi
(i = 1, ..., m) are the resource values.

1.3 Linear Programming Problems (LPPs)

Definition

A mathematical program is a linear program if f(x1, x2, ..., xn) and all gi(x1, x2, ..., xn) are
linear in the decision variables, i.e. if f and all gi can be written as:

    f(x1, x2, ..., xn) = c1 x1 + c2 x2 + ... + cn xn

and

    gi(x1, x2, ..., xn) = ai,1 x1 + ai,2 x2 + ... + ai,n xn,

where the cj and ai,j (i = 1, 2, ..., m and j = 1, ..., n) are known constants and the xj are continuous
real variables.

In most problems we deal with in this course we also insist that the xi are non-negative.

Definitions

If all xi are non-negative variables and all the constraints are equalities, then the LPP is in
canonical form.

If all xi are non-negative variables and all the constraints use ≤, then the LPP is in standard
form.

The feasible region for an LPP is the set of all points (x1, ..., xn) that satisfy all the constraints
(and satisfy xi ≥ 0 for all i if the xi are non-negative variables).
Example 1.3 Clock Manufacturer

A clock manufacturer produces a number of standard clocks and a number of alarm clocks each
day.

Resource constraints:

• labour hours per day: 1600
• processing hours per day: 1800
• alarm assemblies available per day: 350

Consumption of resources:

• one standard clock uses: 2 hours labour, 6 hours processing
• one alarm clock uses: 4 hours labour, 2 hours processing, 1 alarm assembly.

Profit:

• one standard clock gives 3 profit
• one alarm clock gives 8 profit.

The objective is to maximise profit.

In linear programming notation, let x1 be the number of standard clocks produced per day
and let x2 be the number of alarm clocks per day.

The objective function is z = 3x1 + 8x2, which we want to maximise.

The constraints are:

    2x1 + 4x2 ≤ 1600    (labour)
    6x1 + 2x2 ≤ 1800    (processing)
    x2 ≤ 350            (alarm assemblies).

Also, the number of clocks produced per day cannot be negative, so x1 ≥ 0 and x2 ≥ 0.

There are several methods of solving this problem: graphical solution methods and numerical
methods, including the Simplex algorithm.
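Before solving, it is worth being able to check a candidate plan mechanically. Below is a minimal sketch in plain Python (the function names are ours, not from the notes) that tests whether a proposed production plan (x1, x2) is feasible for Example 1.3 and computes its profit:

```python
# Feasibility check for the clock problem of Example 1.3.
# Each constraint is stored as (coefficients, resource limit).
CONSTRAINTS = [
    ((2, 4), 1600),  # labour:            2*x1 + 4*x2 <= 1600
    ((6, 2), 1800),  # processing:        6*x1 + 2*x2 <= 1800
    ((0, 1), 350),   # alarm assemblies:  x2 <= 350
]

def is_feasible(x1, x2):
    """Return True if (x1, x2) satisfies every constraint and is non-negative."""
    if x1 < 0 or x2 < 0:
        return False
    return all(a1 * x1 + a2 * x2 <= b for (a1, a2), b in CONSTRAINTS)

def profit(x1, x2):
    """Objective function z = 3*x1 + 8*x2."""
    return 3 * x1 + 8 * x2
```

For instance, the plan (100, 350) is feasible with profit 3100, while (400, 350) violates the processing constraint.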

1.4 Graphical Solution Method for LPPs

Consider the alarm clock problem above. First draw the feasible region, defined by the 3
constraints.

[Figures: the feasible region for the clock problem, bounded by the labour constraint, the
processing constraint and the alarm assemblies constraint, plotted with standard clock
production x1 on the horizontal axis and alarm clock production x2 on the vertical axis.
A final panel shows contours of the objective function for z = 1200, z = 3100 and z = 5200;
the contour z = 3100 touches the feasible region at the vertex (100, 350).]


Taking z = 3x1 + 8x2, we can rearrange this to get x2 = −(3/8)x1 + (1/8)z. So, for any given
constant c, all points on the line x2 = −(3/8)x1 + c yield the same value of z, and this value of z
is given by (1/8)z = c, i.e. z = 8c. Lines x2 = −(3/8)x1 + c that do not intersect the feasible
region are of no interest, as no point on such a line satisfies the constraints. We are looking for
the line x2 = −(3/8)x1 + c which both has the greatest intercept, c, and intersects the feasible
region. All points on this line yield the same maximum value of z (z = 8c), and at least one
point on the line is a feasible solution.

By taking the line x2 = −(3/8)x1 + c for some arbitrary value of c, and moving it up and down
in a parallel fashion (i.e. increasing/decreasing c whilst keeping the gradient the same), we find
the line with the greatest intercept that intersects the feasible region. This gives us both the
maximum value of z and the values of x1 and x2 that maximise it.

In this example (x1*, x2*) = (100, 350) gives the optimal value z* = 3100.

If, instead, the objective function were z = 3x1 + 2x2, then the optimal solution would become
(x1*, x2*) = (200, 300) with z* = 1200.

Exercise: Check this.

Note: In both these examples, the optimising values of the control variables are unique and lie
at a vertex of the feasible region.

1.5 Extreme Point Theorem

Definitions

An extreme point of a region R is any vertex on the boundary of R.

R is bounded if it can be contained within some sphere of finite radius.

Extreme Point Theorem

• For a linear programming problem, if the feasible region is nonempty and bounded then an
optimal solution exists.
• If the optimal solution is unique then it occurs at one of the extreme points of the feasible
region.
• If a set of points all give the same maximal value of z then these points lie on a straight
line on the boundary of the feasible region.

Examples

The previous example of the alarm clock assembly problem is one that gives a unique solution.

The following is an example of an empty feasible region (two contradictory constraints, for
instance):

    maximise  z = x1 + x2
    s.t.      x1 + x2 ≤ 1
              x1 + x2 ≥ 2

The following is an example of an unbounded feasible region without an optimal solution:

    maximise  z = x1 + x2
    s.t.      x1 ≥ 0
              x2 ≥ 0

The following is an example of an unbounded feasible region with an optimal solution:

    minimise  z = x1 + x2
    s.t.      x1 ≥ 0
              x2 ≥ 0

The following is an example of a non-unique optimal solution:

    maximise  z = x1 + x2
    s.t.      x1 + x2 ≤ 5
              x1, x2 ≥ 0

Definition

The extreme points of a feasible region are called the feasible basis points.

1.6 Graphical Naïve Solution Method

If the feasible region is bounded, the Extreme Point Theorem tells us that the optimal solution
is at an extreme point of the feasible region (or on a line connecting two extreme points). So,
one can evaluate the objective function at each feasible basis point. The point that gives the
highest (lowest) value of z is the optimal solution.

Example 1.4 The alarm clock problem

    Coordinates (x1, x2)    z = 3x1 + 8x2
    (0, 0)                  0
    (300, 0)                900
    (0, 350)                2800
    (200, 300)              3000
    (100, 350)              3100

Comments

If the feasible region is not bounded, the method may give a suboptimal solution.

For complex LPPs, finding the feasible basis points becomes time consuming.
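The naïve method is just a loop over the feasible basis points. A sketch in plain Python (function name ours), using the vertices from Example 1.4; it also checks the alternative objective z = 3x1 + 2x2 from the exercise in Section 1.4:

```python
# Naive solution method: evaluate the objective at every feasible basis point
# and keep the best. The vertices are those listed in Example 1.4.
VERTICES = [(0, 0), (300, 0), (0, 350), (200, 300), (100, 350)]

def naive_maximise(c1, c2, vertices=VERTICES):
    """Return (best_vertex, best_z) for z = c1*x1 + c2*x2 over the vertices."""
    best = max(vertices, key=lambda v: c1 * v[0] + c2 * v[1])
    return best, c1 * best[0] + c2 * best[1]
```

Here naive_maximise(3, 8) returns ((100, 350), 3100) and naive_maximise(3, 2) returns ((200, 300), 1200), confirming both results quoted in Section 1.4.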

1.7 Slack Variables and Surplus Variables

In the optimal solution above, x1 = 100 and x2 = 350, which implies the following use of
resources: labour 2(100) + 4(350) = 1600, processing 6(100) + 2(350) = 1300, and 350 alarm
assemblies.

So, only 1300 units per day of processing are required, whereas we have 1800 available. This
is a slack of 500. The slack for labour and alarm assemblies is zero.

Considering further the processing constraint:

    6x1 + 2x2 ≤ 1800,

by introducing a slack variable x4 we can transform the constraint into

    6x1 + 2x2 + x4 = 1800,

with x4 ≥ 0. (If we allowed negative values for x4 then 6x1 + 2x2 could be > 1800.)

So the inequality has been transformed into an equality. This will be utilised in the next chapter.

Suppose there had been another constraint that was a minimum (≥), not a maximum (≤). E.g.
the total number of clocks produced must be at least 100:

    x1 + x2 ≥ 100.

We now introduce a surplus variable x6 and transform the constraint into

    x1 + x2 − x6 = 100,

with x6 ≥ 0.

Note: All slack and surplus variables are non-negative.

The full set of equations for the extended clock problem are:

    2x1 + 4x2 + x3 = 1600
    6x1 + 2x2 + x4 = 1800
    x2 + x5 = 350
    x1 + x2 − x6 = 100,

with x1, ..., x6 ≥ 0.

1.8 Graphical Sensitivity Analysis

Sensitivity analysis means determining the amount the optimal solution changes when the
specification of the LPP changes. How sensitive is the optimal solution to the correct model
specification?

The specification of the LPP may change in three ways:

1. The coefficients of the objective function (cj) change,
2. The resource constraints (bi) change, or
3. The coefficients of the constraints (ai,j) change.

Below, we look at 1 and 2.

Definition

The shadow price of a constraint is the amount by which the optimal value of the objective
function changes when the resource value of that constraint is increased by one unit.

Returning to the original clock problem.

Suppose we had increased the available labour resource by 1 unit to 1601. Then the optimal
solution will still be at the same vertex as before. The optimal value of x2 is still x2* = 350,
thus

    x1* = (1601 − 4 × 350)/2 = 100.5
    z* = 3(100.5) + 8(350) = 3101.5.

Because the problem is linear, the objective function increases by 1.5 for every unit increase in
the labour resource. The shadow price of the labour constraint is 1.5 (per labour unit).

Suppose, instead, the processing resource had increased by one unit. Then the optimal solution
would not change at all. This is because the optimal solution does not lie on the line corresponding to the processing constraint. Thus, the shadow price for processing is zero. In general,
if the slack/surplus variable for a constraint is non-zero then the shadow price of the constraint
will be zero.

Finally, suppose the number of alarm assemblies available increases by one. Then

    x2* = 351
    x1* = (1600 − 4 × 351)/2 = 98
    z* = 3(98) + 8(351) = 3102,

and so the shadow price for assemblies is 2 (per assembly unit).

Interpretation of the shadow price

Suppose the clock company considers increasing the number of alarm assemblies available, at a
cost. The shadow price of 2 tells them that they should do so, provided the cost of buying in
each extra alarm assembly is less than 2.
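The shadow-price calculations above can be reproduced numerically by re-solving for the labour-assembly vertex with a perturbed resource value. A minimal sketch in Python (helper name ours); it assumes the optimum stays at that vertex, which holds for small perturbations:

```python
def clock_optimum(labour=1600, assemblies=350):
    """Optimal z while the optimum stays at the labour/assembly vertex:
    x2 is fixed by the assembly constraint, x1 by the labour constraint."""
    x2 = assemblies
    x1 = (labour - 4 * x2) / 2          # from 2*x1 + 4*x2 = labour
    return 3 * x1 + 8 * x2              # z = 3*x1 + 8*x2

base = clock_optimum()                                   # 3100
shadow_labour = clock_optimum(labour=1601) - base        # 1.5
shadow_assembly = clock_optimum(assemblies=351) - base   # 2.0
```

This matches the shadow prices of 1.5 per labour unit and 2 per assembly derived above.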

At what point does the shadow price drop to zero?

Look at the previous graph. As the labour resource increases, the optimal solution will increase
until the point P. This corresponds to the processing slack being reduced to zero. Increasing
the labour resource beyond this point will have no effect on the optimal solution and the labour
slack will increase.

Point P is located at (183 1/3, 350), where z = 3350, corresponding to a labour resource of
1766 2/3. It is not worthwhile to increase the available labour by more than 166 2/3 hours/day.

Changes in the objective function

From before,

    z = 3x1 + 8x2
    x2 = −(3/8) x1 + (1/8) z.

So, the gradient of the contours of the objective function is −3/8.

In the graph, the gradient of the assembly constraint is zero and the gradient of the labour
constraint is −1/2. For the optimal solution to be at the assembly-labour vertex, the gradient of
the objective function must be between −1/2 and 0.

If the gradient of the objective function is between −3 and −1/2, then a greater value of the
objective function can be obtained at the labour-processing vertex. If the gradient of the objective
function is exactly −1/2 then it is parallel to the labour constraint and any point on the labour
constraint between the two vertices is optimal.

Question: If z = c1 x1 + c2 x2, what values of c1 and c2 give the same values x1* = 100,
x2* = 350?

So if we change the value of c1, the same position of the optimal solution will be obtained provided
c1 remains positive and does not exceed (1/2) c2.

Similarly, if we change only c2, the position of the optimal solution will remain the same provided
c2 exceeds 2 c1.

Note: The position of the optimal solution remains the same (i.e. x1* and x2* remain unchanged),
but the optimal value of the objective function will change.

Example 2

Suppose a company makes two models of car. These models are called Alpha and Omega. The
number of Alpha models to be produced is x1 and the number of Omega models is x2.

We have the following linear programming problem:

    maximise  z = 6x1 + 5x2       (profit)
    s.t.      4x1 + x2 ≤ 800      (materials)
              2x1 + 3x2 ≤ 900     (labour)
              x1 ≤ 180            (max Alpha sales)
              x2 ≤ 320            (max Omega sales)

Which gives the graph:

The optimal solution is at the materials-labour vertex, giving x1* = 150, x2* = 200 and
z* = 1900.

Again, if the objective function is altered, the location of the optimal solution could change
to another vertex. We shall find the values of c1 and c2 for which the optimal solution is still
located at the same vertex.

For z = c1 x1 + c2 x2, the gradient of the z contours is −c1/c2. So, for the optimum to remain
at the materials-labour vertex, this gradient must lie between the gradients of the materials and
labour constraints, i.e. between −4 and −2/3.

For c1 = 6 we have a tolerance interval for c2 of [1.5, 9]. As the original value of c1 was 6, the
original tolerance interval for c2 is [1.5, 9].

Exercise

1. What is the original tolerance interval for c1?
2. What are the shadow prices for the four constraints in this problem?

Chapter 2

The Simplex Algorithm

In this chapter we shall assume that all linear programming problems (LPPs) are maximisation
problems and that all control variables, xi, must be non-negative, unless otherwise stated.

General Notation

An LPP may be written using matrix notation. For example,

    maximise    z = c^T x     (obj. fn.)
    subject to  Ax ≤ b        (constraints)                              (2.1)

where

    A = [ai,j] is an m × n matrix,
    b is an m-vector of constraints (resources),
    x is an n-vector of variables, and
    c is an n-vector of objective function coefficients.

The above LPP is in standard form. This is because equation (2.1) has a ≤ inequality, and the
elements of x are all non-negative.

If instead

    Ax = b
    x ≥ 0

the LPP is in canonical form.

By using slack/surplus variables, it is always possible to convert any LPP into canonical form.

When Ax ≤ b, we require one slack variable for each of the m rows of A. Call these xn+1, ..., xn+m.
Putting

    Ac = [A | I]

(where I is the m × m identity matrix) and xc = (x1, ..., xn+m)^T gives Ac xc = b, i.e. canonical
form.

Example 2.1 Clock LPP revisited

The clock LPP can be written in matrix notation in standard form as

    [2 4]        [1600]
    [6 2] x  ≤   [1800]
    [0 1]        [ 350]

Add slack variables x3, x4, x5 to transform it into canonical form:

    [2 4 1 0 0]        [1600]
    [6 2 0 1 0] xc  =  [1800]
    [0 1 0 0 1]        [ 350]
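The construction Ac = [A | I] is easy to automate. A minimal sketch in Python (function name ours), run on the clock data:

```python
def to_canonical(A):
    """Given the coefficient rows A of a standard-form LPP (all constraints <=),
    return Ac = [A | I], the canonical-form coefficient matrix."""
    m = len(A)
    return [row + [1 if i == j else 0 for j in range(m)]
            for i, row in enumerate(A)]

A = [[2, 4], [6, 2], [0, 1]]       # clock problem
Ac = to_canonical(A)
# Ac == [[2, 4, 1, 0, 0], [6, 2, 0, 1, 0], [0, 1, 0, 0, 1]]
```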

2.1 Theory leading to the Simplex Algorithm

Assume the rows of Ac are linearly independent. Otherwise at least one constraint is redundant
(i.e. follows from the other constraints) and should be removed.

Solving Ac xc = b gives a region R (in R^(n+m)) satisfying the m constraints of the canonical
problem. Solving means: given Ac and b, find the set of xc that satisfy Ac xc = b. In general,
this will not be a unique solution because Ac has m rows and n + m columns.

The region S = {xc ∈ R : xc ≥ 0} is the feasible region of the LPP.

Definitions

If a vector xc solves the equation Ac xc = b and xc has at most m non-zero elements, xc is called
a basic solution of the LPP.

If xc is a basic solution of the LPP and every element of xc is non-negative, xc is called a feasible
basic solution of the LPP.

Theorem 2.1

Every extreme point of S is a feasible basic solution of the LPP, and conversely every feasible
basic solution of the LPP is an extreme point of S.

In the last chapter we showed that the optimal solution of an LPP lies at an extreme point of the
feasible region (or on a line connecting two extreme points). So, from the theorem above, the
optimal solution must be one (or possibly two) of the basic feasible solutions.

The following method involves finding all the basic solutions, determining which are feasible,
and then seeing which optimises the objective function.

2.2 Numerical Naïve Method

How do we find the basic solutions?

We can find one basic solution as follows. Take the first m columns of Ac (call this submatrix
A'), let x' be a vector of length m and solve the set of equations A'x' = b. Let xc = (x', 0, ..., 0)
(i.e. the last n elements of xc are zero). Then xc is a basic solution of the LPP. Furthermore, it
will be a feasible basic solution of the LPP if every element of x' is non-negative.

Another basic solution can be found by taking a different combination of m columns of Ac (call
this A') and again solving A'x' = b. The entries of xc that are set equal to x' correspond to the
columns of Ac selected. The other n elements of xc are set equal to zero.

The n elements of xc that are set to zero are called the non-basic variables. The other m
elements, which are set equal to x' (and so may be non-zero), are called the basic variables.

Using this method it is straightforward to find all the basic solutions, check which ones are
feasible and then inspect which of these feasible basic solutions gives the largest value of z. This
is the optimal solution.
There are C(m+n, m) combinations of m columns from Ac, and hence C(m+n, m) basic
solutions to be found.

Example 2.2 Clock LPP revisited

Earlier, we wrote the clock LPP in canonical form using matrix notation:

    [2 4 1 0 0] [x1]   [1600]
    [6 2 0 1 0] [x2]   [1800]
    [0 1 0 0 1] [x3] = [ 350]
                [x4]
                [x5]

There are C(5, 3) = 10 ways of choosing m = 3 columns from Ac, and so 10 basic solutions to
obtain.

Using columns 1, 2 and 3,

    [2 4 1] [x1]   [1600]
    [6 2 0] [x2] = [1800]
    [0 1 0] [x3]   [ 350]

gives x1 = 183 1/3, x2 = 350, x3 = -166 2/3. As x3 < 0, this basic solution is not feasible.

Using columns 1, 2 and 4 gives x1 = 100, x2 = 350, x4 = 500: feasible, with z = 3100.

Using columns 1, 2 and 5 gives x1 = 200, x2 = 300, x5 = 50: feasible, with z = 3000.

Using columns 1, 3 and 4, the third row reads 0 = 350, so this combination has no solution.

The final combination we consider is columns 3, 4 and 5, which gives x3 = 1600, x4 = 1800,
x5 = 350: feasible, with z = 0.
The following table summarises the results for the other five combinations of three columns:

    Columns   x1    x2    x3      x4      x5    Feasible?   z
    1 3 5     300   0     1000    0       350   Y           900
    1 4 5     800   0     0       -3000   350   N
    2 3 4     0     350   200     1100    0     Y           2800
    2 3 5     0     900   -2000   0       -550  N
    2 4 5     0     400   0       1000    -50   N

So the largest feasible value of the objective function is the second combination, with x1 = 100,
x2 = 350, x3 = 0, x4 = 500, x5 = 0, giving z = 3100, which confirms the result found
graphically.
Example 2.3 Another Example

The following LPP is in canonical form:

    maximise  z = x1 + x2
    s.t.      x1 + 2x2 + x3 = 10
              2x1 + x2 + x4 = 12

giving

    Ac = [1 2 1 0]    xc = (x1, x2, x3, x4)^T    b = (10, 12)^T
         [2 1 0 1]

One combination of m columns of Ac is column 1 and column 3. So, solve

    [1 1] [x1]   [10]
    [2 0] [x3] = [12]

giving x1 = 6, x2 = 0, x3 = 4 and x4 = 0. As x1 ≥ 0 and x3 ≥ 0, the solution is feasible. Now
compute z, which equals 6.

Exercise: repeat this for each combination of two columns of Ac, to find the optimal solution.

Comments

1. The above is simply a numerical version of the graphical naïve method (Section 1.6). The
graphical method is difficult for n ≥ 3.

2. The number of possible solutions, C(m+n, m), can become large very rapidly. For example,
m = 5, n = 5 gives C(10, 5) = 252 solutions to be found.

3. We have looked at the naïve numerical method for the case where all m constraints are
≤. When this is not so, the LPP in canonical form will still be Ac xc = b, but Ac may
not have m + n columns. The principle remains the same, however. We still take m × m
submatrices of Ac (A' say) and solve A'x' = b. The vector x' is still of length m and
there are still m basic variables. The number of non-basic variables will be l − m, where l
is the number of columns of Ac.
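The naïve numerical method translates directly into code: enumerate all C(m+n, m) column combinations, solve each m × m system, and keep the feasible solution with the largest z. A plain-Python sketch (helper names ours), run here on the clock problem:

```python
from itertools import combinations

def solve_square(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting.
    Returns None if M is singular (no unique solution)."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]   # augmented matrix [M | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[r][k] -= f * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def naive_lp(Ac, b, c):
    """Enumerate all basic solutions of Ac xc = b and return the feasible one
    maximising z = c . xc (c is padded with zeros for the slack variables)."""
    m, ncols = len(Ac), len(Ac[0])
    best_x, best_z = None, None
    for cols in combinations(range(ncols), m):
        sub = [[Ac[i][j] for j in cols] for i in range(m)]
        x_basic = solve_square(sub, b)
        if x_basic is None:
            continue                                 # singular combination
        xc = [0.0] * ncols
        for j, val in zip(cols, x_basic):
            xc[j] = val
        if all(v >= -1e-9 for v in xc):              # feasible?
            z = sum(ci * xi for ci, xi in zip(c, xc))
            if best_z is None or z > best_z:
                best_x, best_z = xc, z
    return best_x, best_z

Ac = [[2, 4, 1, 0, 0], [6, 2, 0, 1, 0], [0, 1, 0, 0, 1]]
b = [1600, 1800, 350]
c = [3, 8, 0, 0, 0]
x, z = naive_lp(Ac, b, c)   # x = (100, 350, 0, 500, 0), z = 3100
```

This reproduces the tables of Example 2.2, including the infeasible and singular combinations, and returns the optimum found there.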

2.3 The Simplex Algorithm

Assume for now that (i) all m constraints are ≤ constraints, so that we add m slack variables,
and (ii) all bi ≥ 0. We shall consider later what to do when this is not true.

In the numerical naïve method, we had to solve A'x' = b for each of the C(m+n, m) possible
matrices A'. One of these is trivial to solve: the one where A' consists of the last m columns of
Ac. A' is then the identity matrix and the solution is just x' = b (see Example 2.2). Thus, one
basic solution is just x1 = ... = xn = 0 (the non-basic variables) and xn+i = bi for i = 1, ..., m
(the basic variables). That is, set each slack variable equal to its corresponding resource value
and all the original variables to zero. This basic solution is feasible (because bi ≥ 0 for all i) and
z = 0.

By laying out the problem in a tabular format using this starting solution, we can successively
visit other feasible solutions in such a way that the objective function always increases. Once
z can be increased no more, we have reached the optimal solution. This method is called the
Simplex algorithm.

The advantage of the Simplex algorithm is that we do not have to investigate explicitly all of
the C(m+n, m) basic solutions.

Rearranging z = c1 x1 + ... + cn xn, we obtain

    −c1 x1 − ... − cn xn + z = 0.

Our m constraints are Ac xc = b. So, we can write

    [       Ac        0 ] [ xc ]   [ b ]
    [ −c1 ... −cn 0...0 1 ] [ z ] = [ 0 ]                                 (2.2)

From this we have the general format of the initial tableau.

Clock example: initial tableau

    Basic |  x1   x2   x3   x4   x5 |    b
    x3    |   2    4    1    0    0 | 1600
    x4    |   6    2    0    1    0 | 1800
    x5    |   0    1    0    0    1 |  350
    z     |  −3   −8    0    0    0 |    0

One basic solution is easy to read off from this tableau: x1 = x2 = 0 (the non-basic variables)
and x3 = 1600, x4 = 1800 and x5 = 350 (the basic variables), with z = 0. We say that this
tableau corresponds to this basic solution. Note that this basic solution is feasible, because
bi ≥ 0 for all i.
The Simplex algorithm now looks for another basic feasible solution that has a greater value of
z. This new basic solution will be adjacent to the old one. That is, m − 1 of the basic variables
will be the same and one will be different.

First, we choose which of the non-basic variables will enter (into the set of basic variables).
Second, we choose which basic variable will exit (from the set of basic variables).

Step 1: Find the non-basic variable with the largest negative entry in the last row (the
objective function row or z row). This variable will enter.

In the example, it is x2, which has z-row entry −8.

Step 2: Find the basic variable corresponding to the row with the smallest positive θ-ratio. This
variable will exit. The θ-ratio for row i when xj is the entering variable is bi/ai,j.

In this example, the θ-ratios are:

    θ1 = 1600/4 = 400,
    θ2 = 1800/2 = 900,
    θ3 = 350/1 = 350.

The smallest ratio is for row 3, so x5 is the exiting variable. Row 3 is called the pivotal row.

Step 3: Use elementary row operations to isolate the entering variable.

A variable is isolated when the column corresponding to this variable consists of zero entries
except for the entry in the pivotal row, which equals one.

The elementary row operations are:

• multiplying a row by a constant,
• adding one row to another.

In this example, we want the x2 column, (4, 2, 1, −8)^T, to become (0, 0, 1, 0)^T.

Row 3 already has a 1 in the x2 column. So there is no need to alter row 3, except that we
relabel it with x2, because the old basic variable x5 is being replaced by x2. We can write this
as R3 → R3'.

If we add −4 times row 3 to row 1, row 1 becomes

    x3 |  2   0   1   0  −4 |  200.

The element in the x2 column is now zero, as required. We can write this as R1 − 4R3 → R1'.

25

CHAPTER 2. THE SIMPLEX ALGORITHM


To isolate x2 , take
3

1
3
4
2
3
2
3
z
+8

This gives the second tableau:

Variables x2 , x3 and x4 are now the basic variables and so x1 and x5 are non-basic. The
values of x2 , x3 , x4 and z can easily be read o the tableau:
x2 = 350, x3 = 200, x4 = 1100, z = 2800.
(Note that z = 3x1 + 8x2 does indeed equal 2800.)
We have now completed one full iteration in the Simplex algorithm.

Notes

• z has increased.
• The values of the basic variables and of z come from:
0x2 + 1x3 + 0x4 = 200  ⇒  x3 = 200
0x2 + 0x3 + 1x4 = 1100  ⇒  x4 = 1100
1x2 + 0x3 + 0x4 = 350  ⇒  x2 = 350
0x2 + 0x3 + 0x4 + z = 2800  ⇒  z = 2800
since x1 = x5 = 0.


• It is easy to read off the values of the basic variables because the submatrix consisting
of the columns corresponding to the basic variables is the identity matrix (with columns
permuted).

Second iteration
We now repeat the same process to see if we can get a solution that is even better.
The entering variable is x1.
The θ-ratios are 100, 183 1/3 and ∞.
The exiting variable is x3 in row 1.
The third tableau is:

       x1   x2   x3   x4   x5 |  Sol
x1      1    0  1/2    0   -2 |   100
x4      0    0   -3    1   10 |   500
x2      0    1    0    0    1 |   350
z       0    0  3/2    0    2 |  3100

Stopping criterion
When the last row has non-negative entries for every non-basic variable, then the stopping
criterion has been met. In most cases that means that the optimal solution has been reached,
which can be read off directly from the tableau. The exception to this is for degenerate problems,
which are covered in the next section.
In our example, the stopping criterion is satisfied and thus the optimal solution is
x1 = 100, x2 = 350, x3 = 0, x4 = 500, x5 = 0 and z = 3100.
(Verify that z = 3x1 + 8x2 equals 3100.)

2.4 Why Does the Simplex Algorithm Work?

Full details are available in our references, such as Bertsimas and Tsitsiklis, Hillier and Lieberman, or Gordon and Pressman. Here I provide a brief account.
We have seen that the initial tableau represents the matrix equation
Ac xc = b
and the equation
z - c1 x1 - . . . - cn xn = 0,    (2.3)



which is equivalent to
z = c1 x1 + . . . + cn xn.    (2.4)

One basic feasible solution can easily be read off this tableau:
x1 = . . . = xn = 0,
xn+i = bi    (i = 1, . . . , m),
z = 0.
To obtain the second tableau, we perform elementary row operations. This amounts to premultiplying equation (2.3) by a particular matrix B:
B Ac xc = B b.    (2.5)

The set of solutions of (2.3) is the same as the set of solutions of (2.5).
As m columns of the matrix B Ac form an identity matrix (with columns permuted), it is easy
to read off the second basic feasible solution:
xi = 0    for i among the non-basic variables,
xi = (B b)i    for i among the basic variables,
where (B b)i denotes the ith element of the vector B b.


In the initial solution, x1 = . . . = xn = 0. So, equation (2.4) gives z = 0. In the second
basic feasible solution, the variable xi from {x1 , . . . , xn } with the largest positive ci value (and
hence most negative entry in the z row) has been allowed to be > 0, while one of the variables
xn+1 , . . . , xn+m has been set to zero. From equation (2.4), we see that setting xi > 0 will increase
z, while setting any of xn+1 , . . . , xn+m equal to zero will not change z. Thus, the second basic
feasible solution will have a greater z value than the initial solution.
The same reasoning shows that our third basic feasible solution will be better than our second,
and so on.
In any tableau, the z row indicates that
z + d1 xk1 + . . . + dn xkn = e    (2.6)
where xk1, . . . , xkn are the n non-basic variables, d1, . . . , dn are the corresponding entries in
the z row, and e is the entry in the solution column of the z row. When all entries of the z row
are non-negative (so, d1, . . . , dn ≥ 0), it is clear that making any of the non-basic variables into
basic variables (i.e. allowing them to be > 0) can only decrease z. Therefore, we have found an
optimal solution.
Example 2.4 A simple example

maximise z = x1 + 2x2
s.t. x1 + x2 ≤ 2
x1, x2 ≥ 0

The constraint becomes x1 + x2 + x3 = 2. As this is in three dimensions, the problem can be
plotted.

[Figure: the portion of the plane x1 + x2 + x3 = 2 in the positive octant, a triangle with
vertices A = (0, 0, 2), B = (2, 0, 0) and C = (0, 2, 0) on the x3, x1 and x2 axes.]

The area enclosed by triangle ABC is the feasible region. The vertices A, B and C are the three
basic solutions (two zero elements and one non-zero).
The initial solution is (0,0,2) (vertex A).
The optimal solution is (0,2,0) (vertex C).
The Simplex algorithm moves, at each iteration, from one basic feasible solution to an adjacent
and better solution, stopping when it can no longer find a better solution.
So, either:
• we move from A to B (i.e. enter x1 and exit x3), giving z = 2, followed by moving from B
to C (i.e. entering x2 and exiting x1), giving z = 4;
• or we move directly from A to C (i.e. enter x2 and exit x3).

Exercise: Run the Simplex algorithm on this problem, and investigate which route to the
optimal solution is taken.
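As a sketch of this exercise (illustrative code using the same pivot rules as the course: most negative z-row entry enters, smallest positive θ-ratio exits), one can run the algorithm on this tiny problem and watch which route it takes:

```python
# Run the Simplex algorithm on Example 2.4: max z = x1 + 2x2, x1 + x2 <= 2.
from fractions import Fraction as F

def simplex(T, basis):
    """T: tableau rows, last row = z row, last column = Sol."""
    while True:
        z = T[-1]
        col = min(range(len(z) - 1), key=lambda j: z[j])  # entering variable
        if z[col] >= 0:
            return basis  # stopping criterion met
        rows = [(r[-1] / r[col], i) for i, r in enumerate(T[:-1]) if r[col] > 0]
        _, pr = min(rows)                                  # pivotal row
        basis[pr] = col
        T[pr] = [v / T[pr][col] for v in T[pr]]
        for i in range(len(T)):
            if i != pr and T[i][col] != 0:
                f = T[i][col]
                T[i] = [v - f * w for v, w in zip(T[i], T[pr])]

# columns: x1, x2, x3 (slack), Sol
T = [[F(1), F(1), F(1), F(2)],
     [F(-1), F(-2), F(0), F(0)]]
basis = [2]              # x3 is the initial basic variable (vertex A)
basis = simplex(T, basis)
print(basis, T[-1][-1])
```

Because x2 has the more negative z-row entry (-2), it enters first: the algorithm takes the direct route from A to C, reaching z = 4 in a single pivot.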

2.5 Another Example Of the Simplex Algorithm

Example 2.5

maximise z = x1 + 2x2 + x3 + x4
s.t. 2x1 + x2 + 3x3 + x4 ≤ 8
2x1 + 3x2 + 4x4 ≤ 12
3x1 + x2 + 2x3 ≤ 18
x1, . . . , x4 ≥ 0


Introduce the slack variables x5, x6 and x7. These slack variables are the initial basic variables.

Initial Tableau

       x1   x2   x3   x4   x5   x6   x7 |  Sol
x5      2    1    3    1    1    0    0 |    8
x6      2    3    0    4    0    1    0 |   12
x7      3    1    2    0    0    0    1 |   18
z      -1   -2   -1   -1    0    0    0 |    0


After the second iteration (3rd tableau) the stopping criterion has been reached. Thus
x2 = 4, x3 = 4/3, x1 = x4 = 0,
and the slack variables are
x5 = x6 = 0, x7 = 11 1/3,
and the optimal value is z = 9 1/3. (Check this using z = x1 + 2x2 + x3 + x4.)
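The suggested check is quick to do mechanically (a sketch, taking the resource values as 8, 12 and 18, the first of which is recovered here from the final tableau):

```python
# Check the reported optimum of Example 2.5 with exact fractions.
from fractions import Fraction as F

x1, x2, x3, x4 = F(0), F(4), F(4, 3), F(0)
z = x1 + 2 * x2 + x3 + x4            # objective value
# slack in each constraint:
s1 = 8 - (2 * x1 + x2 + 3 * x3 + x4)     # x5
s2 = 12 - (2 * x1 + 3 * x2 + 4 * x4)     # x6
s3 = 18 - (3 * x1 + x2 + 2 * x3)         # x7
print(z, s1, s2, s3)   # 28/3 0 0 34/3, i.e. z = 9 1/3 and x7 = 11 1/3
```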

2.6 Practical Considerations of the Simplex Algorithm

The choice of entering variable


Different implementations of the Simplex algorithm use different methods for choosing the entering variable. The method given in this course (choose the most negative entry in
the z row) is a good, simple method.
In the situation where two or more entries in the z row are tied for the most negative value,
just choose one arbitrarily.

Alternative optima
If one (or more) of the non-basic variables has a zero in the z row of the final tableau, there
exists an alternative optimal solution. In other words there are two (or more) basic feasible
solutions that give the same maximum value of the objective function.
Why is this?
Look at equation (2.6) in Section 2.4. The fact that there is a zero in one of the columns
corresponding to a non-basic variable means that one of d1, . . . , dn (dj, say) equals zero. So, we
can enter the non-basic variable xkj without changing the value of z.
The geometric interpretation of this is that the z-contours are parallel to one (or more) of the
constraints and that any point on this constraint line also maximises z. So, alternative optimal
basic solutions will always be adjacent.
Example 2.6
This is a trivial example to demonstrate the idea. Consider the following LPP:

maximise z = x1 + 2x2
s.t. x1 ≤ …
x2 ≤ …
x1 + 2x2 ≤ 12
x1, x2 ≥ 0


Clearly the objective function is parallel to the third constraint.

The Simplex method moves from the origin to point A to point B, where the algorithm stops.
Any point on the line BC is optimal, but only B and C are basic solutions.
Exercise: Run the Simplex algorithm on this problem and verify that the z row of the final
tableau contains a zero in one of the columns corresponding to a non-basic variable.

Unbounded solutions
In the Simplex algorithm, if you encounter the situation in which none of the θ-ratios are positive,
then the problem is unbounded. There is no optimal solution and the Simplex algorithm
breaks down.
In any practical example this means that the problem has been ill-formulated. It is nonsense to
say that an infinite profit (or other objective) is attainable without violating any constraint.
Example 2.7 Another trivial example

maximise z = x1 + x2
s.t. 2x1 + x2 ≥ 10
x1 - x2 ≤ 5
x1, x2 ≥ 0

Degenerate problems
If two (or more) rows have the same smallest positive θ-ratio (so that there is a choice of
exiting variable) we have a degenerate problem.
• This will lead to one or more of the basic variables being zero.
• It may lead to the stopping criterion being satisfied even though a suboptimal solution is
present.


What can one do about this?
• Recognise when a choice of departing variable arises.
• Be aware that cycling may occur.
• When it does, continue the Simplex algorithm even if the stopping criterion has been met,
until z does not increase any more.
• Good linear programming software can handle degenerate problems.

For this course you only need to know about degeneracy and to be able to identify when it
occurs. You will not be expected to find the true optimal solution to a degenerate problem. For
more details consult Kolman and Beck.

2.7 Simplex Algorithm for More General Problems

So far, we have assumed that
• the objective function is to be maximised,
• all constraints are of the form ≤,
• all the resource values, bi, are ≥ 0,
• and all the control variables must be ≥ 0.
We now look at what to do when these are not true.

Minimisation Problems
Minimising
z = c1 x1 + . . . + cn xn
is the same as maximising
-z = -c1 x1 - . . . - cn xn.
So in the Simplex algorithm enter +c1, . . . , +cn in the objective function row.

Note that for problems in standard form with c1, . . . , cn all positive, the initial solution is the
minimising value for z. Why?


Negative Control Variables

So far we have assumed that all control variables must be non-negative. When a basic variable
(x1 say) departs the tableau, we set x1 to zero, because this maximises the increase in z. If x1
were allowed to be negative, we could increase z even more by putting x1 equal to some negative
number. Therefore, the Simplex algorithm as described above will not work.
The solution is straightforward. We replace x1 by two variables x1+ and x1- and let
x1 = x1+ - x1-, where x1+ ≥ 0 and x1- ≥ 0. This allows the Simplex algorithm to be specified in
terms of non-negative control variables while still allowing x1 to be negative. Although there is
no unique decomposition x1 = x1+ - x1-, the Simplex algorithm will always reach a final solution in
which either x1+ = 0 or x1- = 0.
You will not be required to solve problems of this kind in this course.

Artificial Variables
So far, we have assumed that all the constraints have a ≤ sign and all the resource values bi are
non-negative. In this case, we introduce slack variables xn+1, . . . , xn+m and the initial solution
x1, . . . , xn = 0 and xn+i = bi (i = 1, . . . , m) is feasible.
There are three situations to consider where this is not the case.
1. When the constraint involves ≤ and the resource value bi is negative, simply multiply the
constraint by -1. E.g.
a11 x1 + . . . + a1n xn ≤ b1 < 0
becomes
-a11 x1 - . . . - a1n xn ≥ -b1 > 0.
But now the constraint is ≥.
2. When a constraint involves ≥, we introduce a surplus variable. E.g.
a21 x1 + . . . + a2n xn ≥ b2
becomes
a21 x1 + . . . + a2n xn - xn+2 = b2.
If b2 ≤ 0, we can just set xn+2 = -b2. If b2 > 0, we cannot set xn+2 = -b2 because then
xn+2 < 0, which is not a feasible solution.
3. When a constraint is an equality, e.g.
a31 x1 + . . . + a3n xn = b3,
we do not introduce a slack or surplus variable and there is no simple way to find an initial
basic solution in the original variables.


To solve such LPPs using the Simplex algorithm, an artificial variable is added to each constraint
that has an = sign or is a ≥ constraint with bi > 0. So, we have
a21 x1 + . . . + a2n xn - xn+2 + y1 = b2
for a ≥ bi constraint with bi > 0, and
a31 x1 + . . . + a3n xn + y2 = b3
for an equality constraint.

Now we can set x1 = . . . = xn = xn+2 = 0 and set y1 = b2 and y2 = b3 as our starting point.
The purpose of an artificial variable is simply to allow a feasible initial basic solution; it has
no real-life meaning, and we require that each artificial variable becomes zero in the final
solution.
There are two adaptations to the Simplex algorithm to ensure that artificial variables end up
equal to zero: the big-M method and the two-phase method.

2.8 The Big-M Method

Introduce artificial variables, y1, . . . , yp, say, using the method described above. Change the
objective function to
z = c1 x1 + . . . + cn xn - M(y1 + . . . + yp)
where M is a very large number. In practice, M is any arbitrary number large enough to
ensure that once an artificial variable departs, it will never enter again.
The Simplex algorithm is then used to solve the new problem, with one adaptation: the stopping
criterion is not met until all the artificial variables are non-basic. This may mean entering a
variable whose z row entry is positive.
Example 2.8 The big-M method

The original problem is

maximise z = -x1 + 5x2
s.t. x1 ≤ 4
2x2 ≤ 12
3x1 + 2x2 ≥ 18
x1, x2 ≥ 0

The artificial problem is

maximise z = -x1 + 5x2 - M y1
s.t. x1 + x3 = 4
2x2 + x4 = 12
3x1 + 2x2 - x5 + y1 = 18
x1, . . . , x5, y1 ≥ 0

where x3 and x4 are slack variables, x5 is a surplus variable, and y1 is an artificial variable.
The initial basic solution is x1 = x2 = x5 = 0, x3 = 4, x4 = 12 and y1 = 18.

Initial tableau

       x1   x2   x3   x4   x5   y1 |  Sol
x3      1    0    1    0    0    0 |    4
x4      0    2    0    1    0    0 |   12
y1      3    2    0    0   -1    1 |   18
z       1   -5    0    0    0    M |    0

The entering variable is x2 (most negative z entry, -5); the θ-ratios are θ2 = 12/2 = 6 and
θ3 = 18/2 = 9 (row 1 has a zero in the x2 column), so x4 exits. The row operations are
ρ2' = (1/2)ρ2,  ρ3' = ρ3 - ρ2,  z' = z + (5/2)ρ2.

Second tableau

       x1   x2   x3   x4   x5   y1 |  Sol
x3      1    0    1    0    0    0 |    4
x2      0    1    0  1/2    0    0 |    6
y1      3    0    0   -1   -1    1 |    6
z       1    0    0  5/2    0    M |   30

The non-basic variables now all have non-negative z-row entries, but the artificial variable y1 is
still basic, so we continue: the entering variable is x1 (the non-basic variable with the smallest
positive z-row entry), the θ-ratios are 4, ∞ and 2, and so y1 exits. The row operations are
ρ3' = (1/3)ρ3,  ρ1' = ρ1 - ρ3',  z' = z - ρ3'.

Final tableau

       x1   x2   x3    x4    x5      y1 |  Sol
x3      0    0    1   1/3   1/3    -1/3 |    2
x2      0    1    0   1/2     0       0 |    6
x1      1    0    0  -1/3  -1/3     1/3 |    2
z       0    0    0  17/6   1/3   M-1/3 |   28



The final solution is x1 = 2, x2 = 6 and z = 28, with x3 = 2 and x4 = x5 = y1 = 0.

Notes:
• If we had not used the big-M then the z row entry for y1 would have been -1/3 and we
would try to enter the artificial variable back as a basic variable.
• The stopping criterion for the big-M method is slightly different. If all the non-basic variables have positive z row entries, but some of the artificial variables are still basic, then
continue the algorithm, choosing as the entering variable the non-basic variable with the
smallest positive z row entry.
• This approach only works for maximisation problems. One can adapt the method to work
for minimisation problems, but that is not covered in this course.
• This is a simplified version compared to most books.
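The big-M computation can be checked numerically. The sketch below is illustrative only: unlike the hand tableaux above, it substitutes a concrete M = 10^6 (as an exact rational) and first eliminates the basic artificial variable from the z row, after which the ordinary most-negative-entry rule finds the same optimum with no special stopping rule.

```python
# Numerical big-M check for Example 2.8 (objective taken as z = -x1 + 5x2,
# which matches the final value z = 28), with M = 10**6.
from fractions import Fraction as F

M = F(10**6)
#        x1     x2    x3    x4    x5     y1    Sol
T = [[F(1), F(0), F(1), F(0), F(0),  F(0), F(4)],
     [F(0), F(2), F(0), F(1), F(0),  F(0), F(12)],
     [F(3), F(2), F(0), F(0), F(-1), F(1), F(18)],
     [F(1), F(-5), F(0), F(0), F(0),  M,   F(0)]]
# eliminate the basic artificial variable y1 from the z row first
T[-1] = [v - M * w for v, w in zip(T[-1], T[2])]

basis = [2, 3, 5]   # x3, x4, y1
while True:
    z = T[-1]
    col = min(range(len(z) - 1), key=lambda j: z[j])
    if z[col] >= 0:
        break
    _, pr = min((r[-1] / r[col], i) for i, r in enumerate(T[:-1]) if r[col] > 0)
    basis[pr] = col
    T[pr] = [v / T[pr][col] for v in T[pr]]
    for i in range(len(T)):
        if i != pr and T[i][col] != 0:
            f = T[i][col]
            T[i] = [v - f * w for v, w in zip(T[i], T[pr])]

sol = dict(zip(basis, (row[-1] for row in T[:-1])))
print(sol.get(0), sol.get(1), T[-1][-1])   # x1, x2, z
```

With M this large the artificial variable is driven out automatically and the run finishes at x1 = 2, x2 = 6, z = 28, agreeing with the hand calculation.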

2.9 Two-Phase Method

This method involves running the Simplex algorithm twice: once to eliminate all the artificial
variables and so find a feasible basic solution to the original problem; and then again, after
dropping the artificial variables from the tableau, in order to optimise the original problem.

First Phase
Define one artificial variable for every constraint that needs one (y1, . . . , yp, p ≤ m). Define
the auxiliary objective function as
z* = -(y1 + . . . + yp)
(which we want to maximise).

Suppose (simply for notational convenience) that constraints 1, . . . , p have artificial variables
and constraints p + 1, . . . , m do not. Then putting
xi = 0 for i = 1, . . . , n + p,
yi = bi for i = 1, . . . , p,
and
xn+i = bi for i = p + 1, . . . , m
gives a feasible basic solution to the auxiliary problem.


We solve the auxiliary problem using the Simplex algorithm with the auxiliary objective function.
We continue until all the artificial variables have been eliminated. As soon as all the artificial
variables have been eliminated, we stop the first phase.
Example 2.9
Consider the same problem we used with the big-M example:

maximise z = -x1 + 5x2
s.t. x1 ≤ 4
2x2 ≤ 12
3x1 + 2x2 ≥ 18
x1, x2 ≥ 0

Phase One
Add slack and artificial variables to give the auxiliary problem:
maximise z* = -y1
s.t. x1 + x3 = 4
2x2 + x4 = 12
3x1 + 2x2 - x5 + y1 = 18
x1, . . . , x5, y1 ≥ 0.

The initial solution is x3 = 4, x4 = 12 and y1 = 18. The z* row would normally be
z* + y1 = 0.
There is a problem here: all the non-basic variables have a zero in the z* row. We have to re-write
z* as
z* = -y1 = -(18 - 3x1 - 2x2 + x5), i.e. z* - 3x1 - 2x2 + x5 = -18.

In general, we have
z* = -(y1 + . . . + yp)
and
ai,1 x1 + . . . + ai,n+m xn+m + yi = bi,



giving
yi = bi (ai,1 x1 + . . . ai,n+m xn+m ),
and hence

n+m

i=1

i=1

j=1

bi

xj =

ai,j

Note: rows 1, . . . , p are those rows which have an articial variable.


The other requirement is that we carry through the original z row, to ensure it is in the correct
form at the start of the second phase.

Phase I: initial tableau

       x1   x2   x3   x4   x5   y1 |  Sol
x3      1    0    1    0    0    0 |    4
x4      0    2    0    1    0    0 |   12
y1      3    2    0    0   -1    1 |   18
z       1   -5    0    0    0    0 |    0
z*     -3   -2    0    0    1    0 |  -18

The entering variable is x1 (most negative z* entry, -3); the θ-ratios are 4, ∞ and 6, so x3
exits. The row operations are
ρ3' = ρ3 - 3ρ1,  z' = z - ρ1,  z*' = z* + 3ρ1.

Second tableau

       x1   x2   x3   x4   x5   y1 |  Sol
x1      1    0    1    0    0    0 |    4
x4      0    2    0    1    0    0 |   12
y1      0    2   -3    0   -1    1 |    6
z       0   -5   -1    0    0    0 |   -4
z*      0   -2    3    0    1    0 |   -6

The entering variable is x2 (z* entry -2); the θ-ratios are ∞, 6 and 3, so y1 exits. The row
operations are
ρ3' = (1/2)ρ3,  ρ2' = ρ2 - ρ3,  z' = z + (5/2)ρ3,  z*' = z* + ρ3.

Final tableau

       x1   x2     x3   x4    x5    y1 |  Sol
x1      1    0      1    0     0     0 |    4
x4      0    0      3    1     1    -1 |    6
x2      0    1   -3/2    0  -1/2   1/2 |    3
z       0    0  -17/2    0  -5/2   5/2 |   11
z*      0    0      0    0     0     1 |    0



Now the articial variable is zero. The solution for z is zero. This must be the case. Why?
Our feasible solution for the original problem is

Phase Two
Use the final tableau of phase one as the starting point for phase two. Simply drop the z* row
and the y columns.

Phase II

       x1   x2     x3   x4    x5 |  Sol
x1      1    0      1    0     0 |    4
x4      0    0      3    1     1 |    6
x2      0    1   -3/2    0  -1/2 |    3
z       0    0  -17/2    0  -5/2 |   11

The entering variable is x3 (most negative z entry, -17/2); the θ-ratios are 4, 2 and -ve, so x4
exits. The row operations are
ρ2' = (1/3)ρ2,  ρ1' = ρ1 - ρ2',  ρ3' = ρ3 + (3/2)ρ2',  z' = z + (17/2)ρ2'.

       x1   x2   x3    x4    x5 |  Sol
x1      1    0    0  -1/3  -1/3 |    2
x3      0    0    1   1/3   1/3 |    2
x2      0    1    0   1/2     0 |    6
z       0    0    0  17/6   1/3 |   28

So, our final solution is x1 = 2, x2 = 6 and z = 28 (with x3 = 2, x4 = x5 = 0), agreeing with
the big-M method.

Comments
The big-M method is often easier for hand calculations than the two-phase method. However,
for numerical implementation on a computer, the big-M method is numerically unstable, giving (very)
inaccurate results. So, most computer algorithms use the two-phase method.



Summary of two-phase method
1. If all constraints are ≤ bi with bi ≥ 0, or ≥ bi with bi ≤ 0, then use the standard Simplex
method.
2. Otherwise, introduce artificial variables.
3. Obtain the auxiliary objective function z*, in terms of the xi.
4. Phase 1: Solve the auxiliary problem using the Simplex algorithm on z*, carrying
through z.
5. Phase 2: Drop the artificial variables and solve the original problem.

Chapter 3

Sensitivity Analysis and Duality


3.1 Sensitivity Analysis

Recall that sensitivity analysis means assessing how sensitive the optimal solution is to the
specification of the linear program. In Chapter 1, we looked at:
• how much the optimal value changes for a unit change in a resource value (shadow price);
• how much a resource value can change before the optimal solution moves to a different
extreme point (i.e. the set of basic variables changes);
• how much the objective function can change before the optimal solution moves to a different
extreme point.

We looked at sensitivity analysis for the graphical solution method. Now, we consider it for the
Simplex algorithm. Fortunately, the algorithm itself helps us answer such questions.

Change to a Resource Value


Example 3.1
Consider the following linear programming problem,
maximise: z = 4x1 + 2x2 + x3
subject to: x1 + x2 ≤ 4
x1 + x3 ≤ 6
x2 + x3 ≤ 8
x1, x2, x3 ≥ 0.
In canonical form this is



maximise: z = 4x1 + 2x2 + x3
subject to: x1 + x2 + x4 = 4
x1 + x3 + x5 = 6
x2 + x3 + x6 = 8
x1, . . . , x6 ≥ 0.
This gives the following Simplex algorithm tableaux.

Initial tableau

       x1   x2   x3   x4   x5   x6 |  Sol
x4      1    1    0    1    0    0 |    4
x5      1    0    1    0    1    0 |    6
x6      0    1    1    0    0    1 |    8
z      -4   -2   -1    0    0    0 |    0

The entering variable is x1 (most negative z entry, -4); the θ-ratios are 4, 6 and ∞, so x4
exits. The row operations are
ρ2' = ρ2 - ρ1,  z' = z + 4ρ1.

Second tableau

       x1   x2   x3   x4   x5   x6 |  Sol
x1      1    1    0    1    0    0 |    4
x5      0   -1    1   -1    1    0 |    2
x6      0    1    1    0    0    1 |    8
z       0    2   -1    4    0    0 |   16

The entering variable is x3; the θ-ratios are ∞, 2 and 8, so x5 exits. The row operations are
ρ3' = ρ3 - ρ2,  z' = z + ρ2.

Final tableau

       x1   x2   x3   x4   x5   x6 |  Sol
x1      1    1    0    1    0    0 |    4
x3      0   -1    1   -1    1    0 |    2
x6      0    2    0    1   -1    1 |    6
z       0    1    0    3    1    0 |   18

So the optimal solution is x1 = 4, x3 = 2, x6 = 6 (with x2 = x4 = x5 = 0) and z = 18.

CHAPTER 3. SENSITIVITY ANALYSIS AND DUALITY

44

We shall see later that the entries in the z row corresponding to the slack variables are the
shadow prices! The shadow prices for the first, second and third constraints are 3, 1 and 0,
respectively.
Definition
If the slack variable corresponding to a particular constraint is zero then it is a binding constraint,
otherwise it is a non-binding constraint.
Thus, a constraint is binding if the optimal solution exactly satisfies that constraint (with
equality).
Theorem 3.1 Complementary Slackness
• Non-binding constraints have zero shadow prices.
• Constraints with non-zero shadow prices are binding.

Usually, binding constraints have positive shadow prices. This reflects the fact that we could
get a higher return if we had a larger resource value for that constraint, since we are currently
bound by the constraint.
In Example 3.1 a greater profit can be obtained by increasing the first two resource values, b1
and b2, but not the third resource value, b3.

How Much Can We Change a Resource Value?


Suppose the value of b1 changes from 4 to 4 + Δ. Then from the shadow price we can see that
z increases by 3Δ units. Clearly, this cannot be true for unlimited values of Δ: there will be a
point at which this constraint becomes non-binding because the other resource values prevent
further increases in z. Increasing b1 beyond this point will make no further difference to z. Note
that at this point the set of basic variables will change (the slack variable for the first constraint
will become basic). An important question is: for what range of values of b1 does the constraint
remain binding?
We could re-run the whole Simplex algorithm again with the value of b1 replaced by its new
value. However, there is a quicker way.
Were we to repeat all the elementary row operations performed in the Simplex algorithm, we
would find that only the solution column would change. So, instead of repeating all the elementary row operations, just repeat them for the solution column. If all the entries in the solution


column remain non-negative, the solution we obtain is still feasible (since xi ≥ 0 for all i) and still
optimal (since all the entries in the z row are still ≥ 0: they have not changed).
In the above example, repeating the elementary row operations on the new solution column
gives, at the final basic solution, x1 = 4 + Δ, x3 = 2 - Δ and x6 = 6 + Δ. This is feasible if
4 + Δ ≥ 0,
2 - Δ ≥ 0,
6 + Δ ≥ 0.
Now,
4 + Δ ≥ 0  ⇒  Δ ≥ -4,    (3.1)
2 - Δ ≥ 0  ⇒  Δ ≤ 2,    (3.2)
6 + Δ ≥ 0  ⇒  Δ ≥ -6.    (3.3)
So, the solution is feasible if -4 ≤ Δ ≤ 2,    (3.4)
i.e. if 0 ≤ b1 ≤ 6.
The new optimal value is z = 18 + 3Δ, which confirms the shadow price.

Notes
1. In fact, it is not necessary to recalculate the solution column for each iteration of the
Simplex algorithm. The new final solution column is just the old final solution column
plus Δ times the column corresponding to the slack variable for the constraint whose
resource value has been changed, i.e. in this case the x4 column.
So, we see that the entries in the z row corresponding to the slack variables are indeed the
shadow prices.
2. If the value of Δ is outside the range -4 ≤ Δ ≤ 2, then the optimal solution will have
a different set of basic variables and the whole Simplex algorithm will have to be re-run
using the new value of b1.
3. We have assumed that only one resource value is changed. There are methods which allow
simultaneous changes to more than one resource value, but these are not covered in this
course.


4. Good linear programming software will usually report, for each constraint i, the range of
values of bi for which the optimal solution still has the same set of basic variables.
Exercise: If b2 changes from 6 to 8, what is the new optimal solution?
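The exercise can be worked with the shortcut from Note 1: add Δ = 2 times the x5 column of the final tableau to the final solution column. A sketch (using the final-tableau columns given above):

```python
# Resource ranging for Example 3.1: b2 changes from 6 to 8 (delta = 2).
from fractions import Fraction as F

final_sol = [F(4), F(2), F(6), F(18)]     # x1, x3, x6, z
x5_column = [F(0), F(1), F(-1), F(1)]     # slack column for constraint 2
delta = F(2)

new_sol = [s + delta * c for s, c in zip(final_sol, x5_column)]
print([int(v) for v in new_sol])          # x1, x3, x6, z
assert all(v >= 0 for v in new_sol[:-1])  # still feasible, so still optimal
```

All entries stay non-negative, so the basis is unchanged and the new optimum is x1 = 4, x3 = 4, x6 = 4 with z = 20, consistent with the shadow price of 1 for the second constraint.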

Changes to the Objective Function


We consider changes to basic and non-basic variables of the optimal solution separately.

Basic variables. Suppose we change the coefficient of x3 from 1 to 1 + Δ, so that
z = 4x1 + 2x2 + (1 + Δ)x3.
The old optimal solution we obtained will always still be feasible, since the constraints are
unaltered. However, the value of z will change and the solution may no longer be optimal. In
the final tableau the z row will change in the following way:
Take the row corresponding to the changed variable (x3), multiply it by Δ for the columns
corresponding to the non-basic variables and for the solution column only, and add this to
the z row.
The x3 row of the final tableau has non-basic entries x2 = -1, x4 = -1, x5 = 1 and solution 2,
so the new z-row entries are
x2: 1 - Δ,  x4: 3 - Δ,  x5: 1 + Δ,  Sol: 18 + 2Δ.
This solution will still be optimal provided
1 - Δ ≥ 0,  3 - Δ ≥ 0,  1 + Δ ≥ 0.
Solving this gives -1 ≤ Δ ≤ 1,
and so the coefficient for x3 (c3) must be within the range 0 ≤ c3 ≤ 2.
The optimal value will be
z = 18 + 2Δ.

Non-Basic Variables. Suppose that the objective function has a change in the coefficient of
a non-basic variable (e.g. x2) by an amount Δ. That is, z becomes
z = 4x1 + (2 + Δ)x2 + x3.
The only change to the final tableau is that the z-row entry of the variable whose coefficient has
been changed (x2) changes by an amount -Δ. Thus the z-row entry for x2 becomes 1 - Δ.
The solution will still be optimal provided 1 - Δ ≥ 0, i.e. provided Δ ≤ 1. So, we require c2 ≤ 3.
Note that Δ can be negative.
Again, the above methods only work for a change to one ci.

3.2 Duality

Each LPP can be formulated in two different ways: the primal problem and the dual problem.
We have been considering LPPs of the form
maximise: z = Σ_{i=1}^{n} ci xi
subject to:
a11 x1 + . . . + a1n xn ≤ b1
⋮
am1 x1 + . . . + amn xn ≤ bm
x1, . . . , xn ≥ 0.

This is the primal problem. It is closely related to the following LPP:

minimise: g = Σ_{i=1}^{m} bi yi
subject to:
a11 y1 + . . . + am1 ym ≥ c1
⋮
a1n y1 + . . . + amn ym ≥ cn
y1, . . . , ym ≥ 0,

which is called the dual problem.


Note that the dual of the dual problem is the primal problem.
Example 3.2
The dual for the LPP in Example 3.1 is
minimise: g = 4y1 + 6y2 + 8y3
subject to: y1 + y2 ≥ 4
y1 + y3 ≥ 2
y2 + y3 ≥ 1
y1, y2, y3 ≥ 0.

Exercise: You may want to check that the optimal solution to this dual problem is
y1 = 3,  y2 = 1,  y3 = 0,  g = 18.

Notice that g* = z* and that y1, y2 and y3 equal the shadow prices for the primal problem.
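These claims are easy to check mechanically. A quick verification sketch (the dual data 4, 6 and 8 are the primal resource values, and the dual constraints come from the columns of the primal):

```python
# Verify the claimed dual optimum for Example 3.2.
y1, y2, y3 = 3, 1, 0
g = 4 * y1 + 6 * y2 + 8 * y3
assert y1 + y2 >= 4          # dual constraint from the x1 column
assert y1 + y3 >= 2          # dual constraint from the x2 column
assert y2 + y3 >= 1          # dual constraint from the x3 column
print(g)                     # equals the primal optimal value z = 18
```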

Matrix Format
In matrix form, the primal problem is
maximise: z = cT x
subject to: Ax ≤ b,
x ≥ 0,
and the corresponding dual problem is
minimise: g = bT y
subject to: AT y ≥ c,
y ≥ 0.
Theorem 3.2
The optimal value of the objective function in the primal problem is the same as the optimal
value of the objective function in the dual problem. That is,
z* = g*.
Consider the dual problem in Example 3.2. Suppose b1 is changed to b1' = b1 + 1 = 5. Assuming
that the optimal solution remains feasible, the new value of g* is
g*' = 5y1 + 6y2 + 8y3 = g* + y1.
By increasing b1 by one unit the new optimal value of g has increased by an amount y1. Theorem 3.2 tells us that the optimal value of z will have also increased by y1. Hence y1 is the
shadow price for the first constraint of the primal problem.

Theorem 3.3
The optimal values of the control variables in the dual problem are the shadow prices of the
primal problem and vice versa.


Interpretation of the dual problem


Let us return to the alarm clock problem.
Mr. Primal is running a factory. He has 1600 units of labour, 1800 units of processing and 350
alarm assemblies at his disposal. He plans to produce x1 standard clocks and x2 alarm clocks.
For each standard clock he will get £3 and for each alarm clock £8.
An accident occurs and one unit of one of the resources (labour, processing or alarm assemblies)
is damaged and can no longer be used. Fortunately, Mr. Primal is insured for such losses by
Mrs. Dual.
Mr. Primal and Mrs. Dual have agreed beforehand how much one unit of each resource is worth,
and so how much Mrs. Dual will compensate Mr. Primal to ensure that he does not suffer a
financial loss. Let yi denote the agreed value of one unit of the ith resource (i = 1, 2, 3).
Mr. Primal explains to Mrs. Dual that each standard clock uses 2 labour units and 6 processing
units and makes £3 profit. Each alarm clock uses 4 labour units, 2 processing units and 1 alarm
assembly and makes £8. Thus, for the compensation to be fair, he insists that y1, y2 and y3
must satisfy the constraints
2y1 + 6y2 ≥ 3,
4y1 + 2y2 + y3 ≥ 8,    (3.5)
since, for example, with 2 units of labour and 6 of processing, he can make £3 of profit.
Mrs. Dual agrees this is fair. However, she wants to minimise her liabilities. If the factory burns
down and all Mr. Primal's resources are lost, she will have to pay out
g = 1600y1 + 1800y2 + 350y3
pounds.
So, Mrs. Dual wants to solve the following LPP:
minimise g subject to the constraints (3.5).
This is the dual of Mr. Primal's original optimisation problem.

Chapter 4

Game Theory
4.1 Introduction

In Chapters 1–3, we looked at situations in which one individual chooses what to do in order
to optimise some quantity (the objective function). In this chapter, we look at strategic games
(or games). In a game two or more individuals (the players) each try to maximise a different
quantity (the payoff for that player) which depends not only on what he/she does, but also on
what the other player(s) do.
Game Theory is applied in many fields:
• games of fun (Monopoly, card games, noughts and crosses)
• business and economics
• sociology
• politics
• evolutionary biology
It is often necessary to idealise (simplify) the real problem, in order to make it susceptible to
analysis by Game Theory.
Games may be classified as:
• simultaneous-move or sequential-move
• zero-sum or non-zero-sum (variable-sum)
• two-player or n-player (n > 2)
In this chapter, we shall look at two-player, zero-sum, simultaneous-move games. In Section 4.9
we shall briefly look at Games Against Nature.
An important assumption of Game Theory is that each player is rational, i.e. they are each
seeking to maximise their payoff and choose the best strategy to achieve this. This assumption
is necessary to work out what each player's best (optimal) strategy is.


4.2 Basic Formulation

We have two players (A and B). Each has a number of possible moves and must choose one of
these. We are dealing with simultaneous-move games, so players must choose their move before
knowing which move the other has chosen. The outcome of the game is a payoff for Player A
and a payoff for Player B. These depend on Player A's and B's choices of move. As we are dealing
with zero-sum games, the payoff to Player B is minus the payoff to Player A. So, we require only
one payoff matrix.
If Player A chooses move Ai and Player B chooses move Bj, the payoff to Player A is aij and to
Player B is -aij. The convention is that the numbers in the payoff matrix are the payoffs for
the player to the left of the matrix (Player A in our notation).
Example 4.1

            Player B
            B1   B2
Player  A1   0   -4
A       A2   3    2

Definitions
A strategy is a rule for determining which move to play.
A pure strategy specifies that the same move always be chosen.
A mixed strategy specifies that the move be chosen at random, with each move having some
specific probability of being chosen. E.g. Player B chooses B1 with probability y1, B2 with
probability y2, etc. (y1 + y2 + . . . + ym = 1).
(A pure strategy is a mixed strategy in which one of the probabilities equals one and the rest
are zero.)
Suppose the payoff matrix is

            Player B
            B1   . . .   Bm
Player  A1  a11  . . .   a1m
A       ⋮    ⋮            ⋮
        An  an1  . . .   anm

Let xi (i = 1, . . . , n) be the probability that Player A chooses move Ai and yj be the probability
that Player B chooses move Bj, and let x = (x1, . . . , xn) and y = (y1, . . . , ym). Then the expected
payoff for Player A is
E(x, y) = Σ_{i=1}^{n} Σ_{j=1}^{m} xi aij yj.
Note the non-standard notation for expectation.
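The formula for E(x, y) can be transcribed directly into code (a sketch; the example matrix takes the Example 4.1 entries, with the top-right entry read as -4, consistent with the Nash equilibrium discussed below):

```python
# Expected payoff of a two-player zero-sum game as a double sum.
def expected_payoff(A, x, y):
    """E(x, y) = sum_i sum_j x_i * a_ij * y_j."""
    n, m = len(A), len(A[0])
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(m))

A = [[0, -4],
     [3, 2]]
print(expected_payoff(A, [0, 1], [0, 1]))          # pure A2 vs pure B2
print(expected_payoff(A, [0.5, 0.5], [0.5, 0.5]))  # uniform mixtures
```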

4.3 Nash Equilibrium

We seek the optimal strategy for each player. These are defined in terms of the Nash equilibrium.
The Nash equilibrium of a game is the pair of strategies, one for each player, such that each
player's strategy is best for him, given that the other player is playing his equilibrium strategy.
Another way of looking at this is that neither player should want to change his strategy if he
knew what strategy the other player was using.
In fact, we shall often pretend that the other player knows our strategy in order to work out
what our optimal strategy is.
Example 4.2

            Player B
            B1   B2
Player  A1   0   -4
A       A2   3    2

The Nash equilibrium is: Player A always chooses A2 and Player B always chooses B2. Thus
these are the optimal strategies for the two players. They are pure strategies: x = (0, 1),
y = (0, 1). The payoff will then be 2 (to Player A).
Example 4.3
Example 4.3


            Player B
            B1   B2
Player  A1   4    3
A       A2   2    8

Do x = (1/2, 1/2) and y = (1/4, 3/4) constitute a Nash equilibrium?

Definition
If optimal strategies x* and y* exist, then
v = E(x*, y*)
is called the value of the game.

Corollary
E(x*, y) ≥ v ≥ E(x, y*) for all strategies x and y.

Theorem 4.1 Existence

Every two-player, simultaneous-move, zero-sum game has a Nash equilibrium and hence an
optimal strategy for each player and a value.
The optimal strategies may be pure or mixed.
If a pure strategy is optimal for one player then a pure strategy is also optimal for the other.

Solving a Game
Solving a game means finding x*, y* and v.
To solve a game, try the following methods (in order of increasing complexity):

54

CHAPTER 4. GAME THEORY

is there a single dominant strategy?


look for saddle point
if at least one player has only two possible moves, use graphical method
formulate game as LPP and solve

The rst two methods will nd optimal strategies if they are pure. The latter two are needed
otherwise.

4.4 Dominance

Definition
Let xp and xp' be pure strategies for Player A. Then xp dominates xp' if

    E(xp , yq ) ≥ E(xp' , yq )   for all pure strategies yq .

Also, yq dominates yq' if

    E(xp , yq ) ≤ E(xp , yq' )   for all pure strategies xp .

Dominated strategies are sub-optimal and can be eliminated (for the purpose of solving the
game).
Example 4.4

                        Player B
                   B1   B2   B3   B4   B5
    Player A  A1    1    2    1    2    0
              A2    2    1    0    3    1
              A3    3    2    0    0    0

We can drop the dominated strategies from the game. The sub-payoff matrix is now

If, for Player A, pure strategy xp dominates all other pure strategies, then xp is Player A's optimal
strategy. Player B's optimal strategy is then the pure strategy that minimises A's payoff when
A uses xp .
Example 4.5

                   Player B
                   B1   B2   B3
    Player A  A1   -2    5   -2
              A2    0    7    1
              A3   -3    6    1

Pure strategy A2 (i.e. x = (0, 1, 0)) dominates the other two pure strategies, A1 and A3 (i.e.
x = (1, 0, 0) and x = (0, 0, 1)). Hence, x* = (0, 1, 0).
What is B's optimal strategy?
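The elimination of dominated pure strategies can be mechanised. A minimal sketch in pure Python, which repeatedly deletes dominated rows (for the maximising Player A) and dominated columns (for the minimising Player B), applied to the matrix of Example 4.5 as printed above:

```python
def eliminate_dominated(A):
    """Iteratively remove dominated rows (A maximises) and columns (B minimises).
    Returns the 0-indexed surviving row and column indices."""
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    changed = True
    while changed:
        changed = False
        # Row r is dominated by row s if A[s][j] >= A[r][j] for every surviving column j.
        for r in rows[:]:
            if any(s != r and all(A[s][j] >= A[r][j] for j in cols) for s in rows):
                rows.remove(r)
                changed = True
        # Column c is dominated by column d if A[i][d] <= A[i][c] for every surviving row i.
        for c in cols[:]:
            if any(d != c and all(A[i][d] <= A[i][c] for i in rows) for d in cols):
                cols.remove(c)
                changed = True
    return rows, cols

# Example 4.5 (rows A1, A2, A3; columns B1, B2, B3)
A = [[-2, 5, -2],
     [0, 7, 1],
     [-3, 6, 1]]
print(eliminate_dominated(A))   # ([1], [0]): only A2 and B1 survive
```

Note that this removes weakly dominated strategies as well; for solving a zero-sum game this is harmless, since at least one optimal solution always survives.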

4.5 Saddle Points

Suppose that neither Player A nor Player B has a pure strategy that dominates all his/her other
pure strategies. Then we look for saddle points.

Definition
The pure maximin strategy for Player A is the strategy with maximum row-minimum. That is,

    \arg\max_{i=1,...,n} \min( a_{i1} , . . . , a_{im} ).

The pure minimax strategy for Player B is the strategy with minimum column-maximum. That is,

    \arg\min_{j=1,...,m} \max( a_{1j} , . . . , a_{nj} ).
Example 4.6
Two companies have rival products of Beer, Wine and Spirits. Each month they choose which
product to advertise. The following table shows the gain in A's profits (in thousands per
month) given the combination of advertising actions. We shall assume that B's profits drop by
the same amount.

                            Company B
                       Beer   Wine   Spirits
    Company A  Beer      7      8       5
               Wine      2     12       0
               Spirits   9      4       4

Are there any dominated strategies?

Definition
If the payoff for A's pure maximin strategy is the same as for B's pure minimax strategy then
there is a saddle point in the game.

Theorem 4.2 Saddle Point Theorem
If a two-player, zero-sum, simultaneous-move game has a saddle point, the pure maximin and
minimax strategies constitute a Nash equilibrium and are optimal strategies. The value of the
game, v, equals the maximin and minimax payoffs.

In Example 4.6 there is a saddle point in the top right hand corner. A's optimal strategy is
always to advertise Beer and B's optimal strategy is to advertise Spirits.
Exercise: Check that these two pure strategies constitute a Nash equilibrium.
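The saddle-point search can be done mechanically. A minimal sketch of the pure maximin/minimax computation, using the entries of Example 4.6 (rows and columns in the order Beer, Wine, Spirits):

```python
def pure_maximin_minimax(A):
    """Return (maximin row, maximin value, minimax column, minimax value)."""
    row_mins = [min(row) for row in A]
    col_maxs = [max(A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    i_star = max(range(len(A)), key=lambda i: row_mins[i])
    j_star = min(range(len(A[0])), key=lambda j: col_maxs[j])
    return i_star, row_mins[i_star], j_star, col_maxs[j_star]

# Example 4.6 (0-indexed: 0 = Beer, 1 = Wine, 2 = Spirits)
A = [[7, 8, 5],
     [2, 12, 0],
     [9, 4, 4]]
i, val_a, j, val_b = pure_maximin_minimax(A)
print(i, val_a, j, val_b)   # 0 5 2 5 -> equal payoffs: saddle at (Beer, Spirits), v = 5
```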

Notes

- A game may have multiple saddle points.
- If there is a pure strategy that dominates all other pure strategies, it will also be found by
  looking for a saddle point.
- Before looking for saddle points, you may want to eliminate dominated strategies (not
  strictly necessary).

4.6 Maximin and Minimax Mixed Strategies

If there is no saddle point, the optimal solution is mixed, and either pure maximin payoff < v
or pure minimax payoff > v or both.
When searching for a saddle point, we considered only pure maximin/minimax strategies. We
now consider mixed maximin/minimax strategies.
In order for Player A to determine his maximin strategy, he must consider what strategy Player B
would adopt if she knew x = (x1 , . . . , xn ). Player B would choose the move Bj for which the
expected payoff (to A)

    \sum_{i=1}^{n} x_i a_{i,j}

is minimised. So, the expected payoff would be

    \min \left( \sum_{i=1}^{n} x_i a_{i,1} , . . . , \sum_{i=1}^{n} x_i a_{i,m} \right).

Player A's maximin strategy is the strategy x* that maximises this:

    x* = \arg\max_{(x_1,...,x_n)} \min \left( \sum_{i=1}^{n} x_i a_{i,1} , . . . , \sum_{i=1}^{n} x_i a_{i,m} \right).

Similarly, the minimax strategy for B is

    y* = \arg\min_{(y_1,...,y_m)} \max \left( \sum_{j=1}^{m} y_j a_{1,j} , . . . , \sum_{j=1}^{m} y_j a_{n,j} \right).

Theorem 4.3
The maximin and minimax strategies constitute a Nash equilibrium and so are optimal strategies.

4.7 Graphical Solution for an n × 2 Game

A game that is not n × 2 may become so after eliminating dominated strategies. So, eliminate
dominated strategies first.
Assume we have eliminated all dominated strategies and looked for a saddle point. No saddle
point has been found, and so we must consider mixed strategies.
Example 4.7

                   Player B
                   B1   B2
    Player A  A1    2   -1
              A2   -3    4
              A3    0    2

Are there any dominated strategies?
Is there a saddle point?

Start by working out B's minimax strategy. The expected payoff to A for each of his pure
strategies is

    A1 : 2y1 - 1(1 - y1) = 3y1 - 1
    A2 : -3y1 + 4(1 - y1) = 4 - 7y1
    A3 : 0y1 + 2(1 - y1) = 2 - 2y1

Player B's minimax strategy is

    \arg\min_{0 \le y_1 \le 1} \max( 3y1 - 1, 4 - 7y1, 2 - 2y1 ).

We can plot the three components in the above formula as lines with y1 (0 ≤ y1 ≤ 1) on the
horizontal axis.

[Figure: the three lines A1 , A2 and A3 , payoff to A plotted against y1 .]

We can compute y1* exactly because it is the intersection of the lines A1 and A3 .
So, B's optimal strategy is y* = (3/5, 2/5). Now, we need to find A's optimal solution.

Theorem 4.4
The optimal (maximin) strategy for A requires only two strategies: those that intersect at B's
minimax solution.
If more than two strategies intersect at this point, choose any two that have opposite gradients.

In this example, A1 and A3 intersect at the minimax solution. This implies that A2 is not used
in the optimal solution and so x2* = 0.
We need to find the values of x1 and x3 that maximise the minimum of

    2x1   and   -x1 + 2x3 ,

where x3 = 1 - x1 . So, we want the x1 that maximises

    \min( 2x1 , 2 - 3x1 ).

This minimum occurs where the two lines intersect (draw the graph of 2x1 and 2 - 3x1 against
x1 if you are not sure why). Hence we have

    2x1* = 2 - 3x1*
    x1* = 2/5.

The optimal solution for this game is

    x* = (2/5, 0, 3/5)
    y* = (3/5, 2/5)
    v  = 2 (2/5)(3/5) - 1 (2/5)(2/5) + 2 (3/5)(2/5) = 4/5.

To solve a 2 × m game an almost identical approach is used. The maximin strategy for A is
found graphically by taking the maximum point of the lowest of the m lines. Then the minimax
strategy for B can be found. Always start with the player who has only two possible moves.
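The graphical argument can be automated: B's minimax y1 must occur either at an endpoint or where two of the payoff lines cross, so it suffices to check those finitely many candidates. A minimal sketch, applied to the payoff lines of Example 4.7 (A1: 3y1 - 1, A2: 4 - 7y1, A3: 2 - 2y1):

```python
def solve_n_by_2(A):
    """B's minimax strategy for an n x 2 zero-sum game (payoffs to A).
    Line i has payoff A[i][0]*t + A[i][1]*(1-t) when B plays (t, 1-t)."""
    def line(i, t):
        return A[i][0] * t + A[i][1] * (1 - t)
    n = len(A)
    candidates = {0.0, 1.0}
    for i in range(n):
        for k in range(i + 1, n):
            denom = (A[i][0] - A[i][1]) - (A[k][0] - A[k][1])
            if denom != 0:
                t = (A[k][1] - A[i][1]) / denom   # where lines i and k cross
                if 0 <= t <= 1:
                    candidates.add(t)
    # B minimises A's best response, i.e. the upper envelope of the lines.
    t_star = min(candidates, key=lambda t: max(line(i, t) for i in range(n)))
    v = max(line(i, t_star) for i in range(n))
    return t_star, v

A = [[2, -1], [-3, 4], [0, 2]]   # Example 4.7
print(solve_n_by_2(A))           # y1* = 0.6, i.e. y* = (3/5, 2/5), with v = 4/5
```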

4.8 Linear Programming Formulation of a Game

The following method can always be used, even if n > 2 and m > 2.
Suppose Player B plays the mixed strategy y and Player A knows what y is. Then Player A will
choose his move Ai to maximise the payoff, i.e.

    u = \max_{i=1,...,n} \sum_{j=1}^{m} a_{i,j} y_j .

Player B's minimax strategy is that which minimises u. From the definition of u we have

    \sum_{j=1}^{m} a_{1,j} y_j \le u ,   \sum_{j=1}^{m} a_{2,j} y_j \le u ,   . . . ,   \sum_{j=1}^{m} a_{n,j} y_j \le u        (4.1)

and

    y_1 + · · · + y_m = 1.        (4.2)
Taking (4.1) and (4.2) and dividing through by u (which for the moment we will assume is
positive), we get

    a_{1,1} y_1/u + · · · + a_{1,m} y_m/u \le 1
    . . .
    a_{n,1} y_1/u + · · · + a_{n,m} y_m/u \le 1
    y_1/u + · · · + y_m/u = 1/u .

Player B wants to choose y_1 , . . . , y_m to minimise u.
The above equations can be expressed as a linear programming problem:

    maximise    Y_1 + · · · + Y_m = Z
    subject to  a_{1,1} Y_1 + · · · + a_{1,m} Y_m \le 1
                . . .
                a_{n,1} Y_1 + · · · + a_{n,m} Y_m \le 1
                Y_j \ge 0   (j = 1, . . . , m),

where Y_j = y_j/u and Z = 1/u.


Now, suppose that Player A plays mixed strategy x. Player B will choose his move Bj to
minimise the payoff to A, i.e.

    w = \min_{j=1,...,m} \sum_{i=1}^{n} a_{i,j} x_i .

Player A wants to maximise w. We have

    \sum_{i=1}^{n} a_{i,1} x_i \ge w ,   . . . ,   \sum_{i=1}^{n} a_{i,m} x_i \ge w

and

    x_1 + · · · + x_n = 1,

which yields (assuming w is positive)

    \sum_{i=1}^{n} a_{i,1} x_i/w \ge 1 ,   . . . ,   \sum_{i=1}^{n} a_{i,m} x_i/w \ge 1
    x_1/w + · · · + x_n/w = 1/w .

Let X_i = x_i/w and V = 1/w. We have

    minimise    X_1 + · · · + X_n = V
    subject to  a_{1,1} X_1 + · · · + a_{n,1} X_n \ge 1
                . . .
                a_{1,m} X_1 + · · · + a_{n,m} X_n \ge 1
                X_i \ge 0   (i = 1, . . . , n).

Recognise this as the dual of the first LPP. Hence Z = V , i.e. min u = max w. Thus we have
the following theorem.

Theorem 4.5
Provided the value of the game is greater than zero (v > 0), the minimum possible value of u is
the value of the game.

Solving the first LPP using the simplex algorithm gives Z = 1/u = 1/v and Y_j = y_j/u
(j = 1, . . . , m).
How should we calculate x_i* (i = 1, . . . , n)?

To complete the method we must consider what to do when v ≤ 0. Since the value of the game
cannot be less than the smallest value in the payoff matrix, we can simply add a constant to all
entries in the payoff matrix, solve the LPP, and then subtract the same constant at the end.
If a is the minimum entry in the payoff matrix, and a ≤ 0, add -a + 1 to all payoff matrix
entries, solve the LPP and then find y_j* = Y_j/Z and v = 1/Z + a - 1.

Example 4.8
Suppose we have the following payoff table.

           B1   B2   B3
    A1      3   -2   -3
    A2     -1    0   -2
    A3     -2   -1    3

The smallest entry is -3, so we add 4 to every entry and get

           B1   B2   B3
    A1      7    2    1
    A2      3    4    2
    A3      2    3    7

which must have a value greater than zero. The LPP is

    maximise    Y1 + Y2 + Y3 = Z
    subject to  7Y1 + 2Y2 +  Y3 \le 1
                3Y1 + 4Y2 + 2Y3 \le 1
                2Y1 + 3Y2 + 7Y3 \le 1
                Y1 , Y2 , Y3 \ge 0.

This gives the initial tableau:


The final tableau is

          Y1   Y2   Y3    Y4     Y5     Y6     Sol
    Y1     1    0    0    .18   -.091    0     .091
    Y2     0    1    0   -.14    .39   -.091   .157
    Y3     0    0    1    .008  -.14    .18    .05
    Z      0    0    0    .05    .157   .091   .298

The optimal solution is Z = .298, Y1 = 0.091, Y2 = 0.157 and Y3 = 0.05. This gives
the optimal strategy for Player B: y1* = 0.091/0.298 = 0.305, y2* = 0.157/0.298 = 0.527, and
y3* = 0.05/0.298 = 0.168.
The optimal solution for Player A can be obtained using duality. The shadow prices for the
primal problem are the solution to the dual problem. Thus, X1* = 0.05, X2* = 0.157 and
X3* = 0.091. Now, Player A's optimal strategy is x1* = X1*/Z = 0.05/0.298 = 0.168, x2* =
X2*/Z = 0.157/0.298 = 0.527 and x3* = X3*/Z = 0.091/0.298 = 0.305.
Finally v = 1/0.298 - 4 = -0.644. The value, v, has turned out to be negative, so we did need
to add a constant to the payoff matrix entries.
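The decimals in the tableau are rounded; working exactly, Z = 36/121 and Y* = (11, 19, 6)/121, so y* = (11/36, 19/36, 6/36), x* = (6/36, 19/36, 11/36) and v = -23/36 ≈ -0.64 (the figure -0.644 above reflects the rounding of Z to .298). A minimal sketch that verifies the equilibrium conditions exactly with rational arithmetic:

```python
from fractions import Fraction as F

A = [[3, -2, -3], [-1, 0, -2], [-2, -1, 3]]   # original payoff matrix
x = [F(6, 36), F(19, 36), F(11, 36)]          # Player A's optimal strategy
y = [F(11, 36), F(19, 36), F(6, 36)]          # Player B's optimal strategy
v = F(-23, 36)                                # value of the game (about -0.64)

# Against optimal play, every pure reply here gives exactly the value of the game
# (this equilibrium happens to be completely mixed).
payoff_vs_cols = [sum(x[i] * A[i][j] for i in range(3)) for j in range(3)]
payoff_vs_rows = [sum(A[i][j] * y[j] for j in range(3)) for i in range(3)]
print(payoff_vs_cols == [v, v, v] and payoff_vs_rows == [v, v, v])   # True
```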

4.9 Games Against Nature

In this setting, we only have one rational player, but the payoff also depends on some as yet
unknown state of nature. Here "nature" might mean nature itself (e.g. weather, biological
competition) or some other phenomenon over which the player has no influence (e.g. stock-markets,
traffic conditions, political policy).
We adopt a similar framework to the two-player game, assuming that nature is player B. However,
we no longer assume that player B adopts a strategy to minimise player A's reward. Nature is
not out to get us. This fact means we must change our optimisation approach.
Example 4.9 Electronics company
An electronics company is considering launching a new product. They have 3 options:
A1: Launch now with a big advertising campaign
A2: Launch now with a minimal advertising campaign
A3: Make a small pilot study to see if the market is receptive to the product.
Nature is the market reaction, which can be good, moderate or poor.

    Strategy        Good                     Moderate          Poor
    A1 (risky)      High sales,              Moderate sales,   Poor sales,
                    high costs               high costs        high costs
    A2 (neutral)    Would do better with     Moderate sales,   Poor sales,
                    more advertising time    low costs         low costs
    A3 (cautious)   Competitors steal        Lower sales,      Very low costs,
                    the market               lower costs       can switch to
                                                               another product

The payoff matrix is given by

                    Good   Moderate   Poor
    A1 (risky)       110      45       -30
    A2 (neutral)      90      55       -10
    A3 (cautious)     80      40        10

In the previous sections A's strategy was determined by the maximin principle. The maximin
strategy is pure strategy A3, as there is a saddle point at the cautious/poor entry (10). However,
this is only optimal if we assume that the market is trying to minimise this company's payoff, an
assumption for which there is no justification.
We shall only consider pure strategies for games against nature.
There is no single optimal strategy. It depends on the player's attitude to risk and/or how likely
he thinks the various possible states of nature are.

Maximax strategy
This is the strategy that can give the maximum payoff (but only if the state of nature happens
to be right). Thus the maximax strategy is the one that corresponds to the row with the largest
entry in the payoff matrix. In Example 4.9 this is A1.
The maximax strategy is risk-taking.


Laplace strategy
The Laplacian strategy is the one that maximises the expected payoff, assuming that each state
of nature is equally likely.
The expected payoffs for strategies A1 , A2 and A3 are

    For A1 : 110/3 + 45/3 - 30/3 = 41 2/3
    For A2 :  90/3 + 55/3 - 10/3 = 45
    For A3 :  80/3 + 40/3 + 10/3 = 43 1/3

Thus, the Laplacian strategy is A2.

Hurwicz strategy
For a strategy Ai the Hurwicz number Hi is

    H_i = α a_{iu} + (1 - α) a_{il} ,

where a_{iu} is the maximum entry in row i and a_{il} is the minimum entry in row i of the payoff
matrix.
The Hurwicz strategy is the strategy that gives the greatest Hurwicz number. It depends on α.
When α = 0 we get the maximin solution and when α = 1 we get the maximax solution. So,
α is a weighting of the best and worst states of nature.
In our example we have:

    H1 = 110α - 30(1 - α) = 140α - 30
    H2 =  90α - 10(1 - α) = 100α - 10
    H3 =  80α + 10(1 - α) =  70α + 10

[Figure: H1 (α), H2 (α) and H3 (α) plotted against α, with intercepts -30, -10 and 10 at α = 0
and values 110, 90 and 80 at α = 1.]

So, when α = 0.25, we have H1 = 5, H2 = 15 and H3 = 27.5, giving A3 as the optimal strategy.
For what values of α is A3 the Hurwicz strategy?

In general, the Hurwicz solution is a compromise of the two extreme states of nature. However,
the method requires us to choose α and it does not take into account the other states of nature
(which may be more likely).

Minimax regret strategy

Definition
The regret for a strategy and state of nature combination is the cost of taking that strategy
compared to the best strategy for that state of nature.
For example, if the market turns out to be good, with A1 we have a payoff of 110 which is the
best payoff for that outcome, and we have zero regret. For A2 we only get 90, when we could
have obtained a payoff of 110, which is a regret of 20.
The regret matrix is

                    Good   Moderate   Poor
    A1 (risky)        0       10       40
    A2 (neutral)     20        0       20
    A3 (cautious)    30       15        0

The column minimum is always zero.
The minimax regret strategy is that which minimises the maximum regret, which in this case is
A2 .
We can also use the minimin, Hurwicz or Laplace criterion on the regret matrix.
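The criteria described above are easy to compute side by side. A minimal sketch for the payoff matrix of Example 4.9, taking α = 0.25 for the Hurwicz criterion:

```python
A = [[110, 45, -30],   # A1 (risky)
     [90, 55, -10],    # A2 (neutral)
     [80, 40, 10]]     # A3 (cautious)

def argmax(vals):
    return max(range(len(vals)), key=lambda i: vals[i])

maximin = argmax([min(row) for row in A])
maximax = argmax([max(row) for row in A])
laplace = argmax([sum(row) / len(row) for row in A])
alpha = 0.25
hurwicz = argmax([alpha * max(row) + (1 - alpha) * min(row) for row in A])
col_best = [max(A[i][j] for i in range(3)) for j in range(3)]
regret = [[col_best[j] - A[i][j] for j in range(3)] for i in range(3)]
minimax_regret = min(range(3), key=lambda i: max(regret[i]))

# Strategies are 0-indexed: 0 = A1, 1 = A2, 2 = A3.
print(maximin, maximax, laplace, hurwicz, minimax_regret)   # 2 0 1 2 1
```

This reproduces the conclusions of the text: maximin picks A3, maximax A1, Laplace A2, Hurwicz (α = 0.25) A3, and minimax regret A2.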

Expected Payoff and Variance

Suppose the probabilities of the states of nature are known/estimated, p = (p1 , . . . , pm ), with
\sum_j p_j = 1 and 0 ≤ p_j ≤ 1.
The expected payoff for pure strategy Ai is

    E_p(A_i) = \sum_{j=1}^{m} p_j a_{ij} ,    i = 1, . . . , n.

We can choose our preferred strategy to be the pure strategy Ai that maximises Ep . Note that
if each pj = 1/m, we have the Laplacian strategy.
Similarly, the variance of the payoff is

    V_p(A_i) = \sum_{j=1}^{m} p_j a_{ij}^2 - (E_p(A_i))^2 .

Expected value-standard deviation criterion

In many cases, the strategy that maximises the expected payoff is very risky (e.g. investment in
a high-return high-risk stock-market portfolio) and a less profitable but less risky option might
be more appropriate (e.g. for a pension fund).
The expected value-standard deviation (EVSD) criterion is defined as

    EVSD_p(A_i, K) = E_p(A_i) - K \sqrt{ V_p(A_i) } ,

for some K. If K is large, strategies with a highly variable payoff are penalised. The optimal
strategy under the EVSD criterion is the strategy Ai that maximises EVSD_p(A_i, K).
In Wilkes (1989) and many other OR books the expected value-variance criterion is considered
instead. The EVSD is better from a statistical viewpoint because the standard deviation is
measured on the same scale as the expected value, and so the EVSD has the same units as the
expected value.

Chapter 5

Dynamic Programming

5.1 Introduction

Example 5.1
Look at the following network. We have to get from node A to node Z in the shortest possible
time, moving only in the direction of the arrows. The time needed to move from one node
to another is indicated above the arrow. E.g. one possible route is A → C → G → J → L → Z,
which takes 4 + 6 + 3 + 3 + 5 = 21 minutes.

[Figure: the network, with stages of nodes A; B, C, D, E; F, G, H; . . .; L, M; Z and the
transition times marked on the arrows.]

We could inspect all possible routes. However, there are many possible routes, and this number
would grow rapidly as the network grew. Dynamic programming is a more efficient method to
solve this problem.
Dynamic programming is used when the problem can be separated into stages. An optimisation
is performed at each stage in turn, but the optimal decision found at each stage depends on the
optimal decision found at the next stage, and so on. It is only when the optimisation has been
performed at the final stage that it becomes clear what the optimal decision is at each of the
earlier stages.
One application of dynamic programming is to find the route through a network that gives the
minimum total cost or maximum total reward. Example 5.1 is an example of such a problem,
in which the costs are the times.

Notation

Stage n ∈ {0, 1, . . . , N }.
Sn is the state at stage n.
Qn is the state space at stage n, i.e. the set of all possible states at stage n. (So, Sn ∈ Qn .)
c_{i,j} is the transition cost for moving from state i to state j.

Example 5.1 continued
N = 5, Q0 = {A}, Q1 = {B, C, D, E}, Q2 = {F, G, H}, . . ., Q5 = {Z}, c_{A,B} = 2, etc.

For each state in stage 1, what is the quickest route from stage 0? How long does it take?
For each state in stage 2, what is the quickest route from stage 1? How long does it take?
. . .
Finally, for the state Z in stage 5, what is the quickest route from stage 4? How long does
it take?

We have separated the problem into stages and solved each stage in turn. This is dynamic
programming. The method we have just used is a dynamic programming algorithm called
forward recursion.

5.2 Forward Recursion

Define fn (Sn ) as the minimum cost for moving from (any state in) stage 0 to state Sn in stage n.
Clearly, f0 (S0 ) = 0 for all S0 ∈ Q0 .
The forward recursion equation is given by

    f_n(S_n) = \min_{S_{n-1} \in Q_{n-1}} \left\{ f_{n-1}(S_{n-1}) + c_{S_{n-1},S_n} \right\}    (for n = 1, . . . , N).

If the transition from a state Sn to a state Sn+1 is not possible, treat c_{S_n,S_{n+1}} as ∞.



Example 5.1 continued
In fact, forward recursion is precisely the algorithm we used above for Example 5.1.

    f1 (B) = \min_{S_0 \in \{A\}} { f0 (S0 ) + c_{S_0,B} } = f0 (A) + c_{A,B} = 0 + 2.

The optimal route to B is A → B. Similarly, for all the other states, C, D and E, in stage 1.
Then

    f2 (F ) = \min_{S_1 \in \{B,C,D,E\}} { f1 (S1 ) + c_{S_1,F} }
            = \min { 2 + 8, 4 + 2, 1 + ∞, 3 + 8 } = 6.

So, the optimal route to F passes through C. Since we know that the optimal route to C is
A → C, the optimal route to F is A → C → F . Similarly, for the other states, G and H, in stage
2. We do the same for stages 3 and 4, and find that f4 (L) = 10, f4 (M ) = 8 and the optimal
route to M is A → E → G → J → M .
Finally,

    f5 (Z) = \min_{S_4 \in \{L,M\}} { f4 (S4 ) + c_{S_4,Z} }
           = \min { 10 + 5, 8 + 1 } = 9.

So, the optimal route to Z passes through M. Since we know that the optimal route to M is
A → E → G → J → M , the optimal route to Z is A → E → G → J → M → Z.

Maximising reward
Alternatively, instead of minimising a total cost, we might want to maximise a total reward.
Here, instead of c_{i,j} being the transition cost for moving from state i to state j, r_{i,j} is the
transition reward.
Define fn (Sn ) as the maximum reward for moving from (any state in) stage 0 to state Sn in stage
n, and let f0 (S0 ) = 0 for all S0 ∈ Q0 again. The forward recursion equation to maximise a reward
is given by

    f_n(S_n) = \max_{S_{n-1} \in Q_{n-1}} \left\{ f_{n-1}(S_{n-1}) + r_{S_{n-1},S_n} \right\}.

The forward recursion equation implies that it is enough to know the optimal rewards/costs for
reaching every state in the previous stage and the transition rewards/costs to get from each of
these states to the current state. We do not need to consider the entire path to reach the current
state.
Example 5.2
The following diagram shows a network with 4 stages (0, 1, 2 and 3) and the transition rewards.
Find the route from stage 0 to stage 3 that maximises the total reward. Note: You may end in
any of the three states of stage 3.

[Figure: the network, with states 1 (stage 0); 2, 3, 4 (stage 1); 5, 6, 7 (stage 2); 8, 9, 10
(stage 3) and the transition rewards marked on the arrows.]

We find fn (Sn ) for every state at every stage (starting with stage 0) and the state in the previous
stage through which we should pass.
Stage 0: f0 (1) = 0.
Stage 1: The maximum rewards at the first stage are just the transition rewards:
    f1 (2) = max{f0 (1) + r_{1,2} } = 3
    f1 (3) = max{f0 (1) + r_{1,3} } = 9
    f1 (4) = max{f0 (1) + r_{1,4} } = 6
The optimal routes are obviously 1 → 2, 1 → 3 and 1 → 4.
Stage 2:
    f2 (5) = max{f1 (2) + r_{2,5} , f1 (3) + r_{3,5} } = max{3 + 8, 9 + 4} = 13 (through state 3)
    f2 (6) = max{f1 (2) + r_{2,6} , f1 (3) + r_{3,6} , f1 (4) + r_{4,6} } = max{3 + 2, 9 + 5, 6 + 10} = 16 (through 4)
    f2 (7) = max{f1 (3) + r_{3,7} , f1 (4) + r_{4,7} } = max{9 + 7, 6 + 9} = 16 (through 3)
Stage 3:
    f3 (8) = max{f2 (5) + r_{5,8} , f2 (6) + r_{6,8} } = max{13 + 8, 16 + 4} = 21 (through 5)
    f3 (9) = max{f2 (5) + r_{5,9} , f2 (6) + r_{6,9} , f2 (7) + r_{7,9} } = max{13 + 8, 16 + 2, 16 + 1} = 21 (through 5)
    f3 (10) = max{f2 (6) + r_{6,10} , f2 (7) + r_{7,10} } = max{16 + 1, 16 + 6} = 22 (through 7)
The optimal route ends at state 10 and passes through 7. The optimal route to 7 passes
through 3. So, the optimal route is 1 → 3 → 7 → 10.
Check that this route gives the correct total reward:
    r_{1,3} + r_{3,7} + r_{7,10} = 9 + 7 + 6 = 22.
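Forward recursion is only a few lines of code once the transition rewards are stored in a dictionary. A minimal sketch reproducing Example 5.2 (states labelled 1 to 10 as in the diagram):

```python
# Transition rewards r[(i, j)] for Example 5.2.
r = {(1, 2): 3, (1, 3): 9, (1, 4): 6,
     (2, 5): 8, (2, 6): 2,
     (3, 5): 4, (3, 6): 5, (3, 7): 7,
     (4, 6): 10, (4, 7): 9,
     (5, 8): 8, (5, 9): 8,
     (6, 8): 4, (6, 9): 2, (6, 10): 1,
     (7, 9): 1, (7, 10): 6}
stages = [[1], [2, 3, 4], [5, 6, 7], [8, 9, 10]]

f = {1: 0}     # f[s]: maximum reward for reaching state s
best = {}      # best[s]: predecessor of s on an optimal route (every state is reachable here)
for prev, curr in zip(stages, stages[1:]):
    for s in curr:
        preds = [p for p in prev if (p, s) in r]
        best[s] = max(preds, key=lambda p: f[p] + r[(p, s)])
        f[s] = f[best[s]] + r[(best[s], s)]

end = max(stages[-1], key=lambda s: f[s])
route = [end]
while route[-1] in best:       # roll back the solution
    route.append(best[route[-1]])
print(f[end], route[::-1])     # 22 [1, 3, 7, 10]
```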

CHAPTER 5. DYNAMIC PROGRAMMING

72

Note
To find the optimal route, we do not actually need to note down the optimal route to every
state. We can wait until we get to the end, when we know the optimal reward, and then find
the optimal route by rolling back the solution. You may prefer to find the optimal route this
way (it involves less work). In this example we want to finish at state 10, giving a reward of
22. But 22 comes from f2 (7) + r_{7,10} = 16 + 6, so if we want the optimal reward we must visit
state 7. The optimal reward for getting to state 7 is obtained from f1 (3) + r_{3,7} = 9 + 7, so we
must visit state 3. Thus the optimal route is to go through states 3, 7 and 10, giving a reward
of 9 + 7 + 6 = 22.

5.3 Backward Recursion

Example 5.3 Pipeline routeing problem
A company wishes to lay a pipeline from their processing plant to a new local distribution point.
The company has identified a network of possible routes with the associated costs of installing
each section.

Using the same method as before, the value fn (Sn ) can be written in next to the appropriate
state. Note that this is a minimisation problem, as we are dealing with costs.
We can also use Backward Recursion.
Here, we define gn (Sn ) as the maximum reward / minimum cost for moving from state Sn in
stage n to (any state in) stage N .
Clearly, gN (SN ) = 0 for all SN ∈ QN .
Now, we work backwards, going first to stage N - 1, then to N - 2, etc., and ending in stage 0.
For costs,

    g_n(S_n) = \min_{S_{n+1} \in Q_{n+1}} \left\{ g_{n+1}(S_{n+1}) + c_{S_n,S_{n+1}} \right\}    (n = 0, 1, . . . , N - 1)

(with the obvious changes for a maximising-rewards problem).
In forward recursion, fn (Sn ) is the optimum return obtainable by moving from stage 0 to state
Sn at stage n.
In backward recursion, gn (Sn ) is the optimum return obtainable by moving from state Sn at
stage n to stage N.
We can write in the values gn (Sn ) next to each state Sn . E.g.

    g3 (I) =
    g2 (B) =

We see that both forward and backward recursion have given the same solution.
Now, we roll forward the solution to find the optimal route. This is

Comments on forward and backward recursion

The computational effort may be different for the two methods.
When a dynamic programming problem has a random component to it (Chapter 6), only
backward recursion can be used.
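Backward recursion on the same data as Example 5.2 gives the same answer, working from stage N back to stage 0. A minimal sketch:

```python
# Same transition rewards as Example 5.2.
r = {(1, 2): 3, (1, 3): 9, (1, 4): 6,
     (2, 5): 8, (2, 6): 2,
     (3, 5): 4, (3, 6): 5, (3, 7): 7,
     (4, 6): 10, (4, 7): 9,
     (5, 8): 8, (5, 9): 8,
     (6, 8): 4, (6, 9): 2, (6, 10): 1,
     (7, 9): 1, (7, 10): 6}
stages = [[1], [2, 3, 4], [5, 6, 7], [8, 9, 10]]

g = {s: 0 for s in stages[-1]}   # g[s]: max reward from s to the final stage
# Work backwards: the pairs below are (stage 2, stage 3), (stage 1, stage 2), (stage 0, stage 1).
for curr, nxt in zip(reversed(stages[:-1]), reversed(stages[1:])):
    for s in curr:
        g[s] = max(g[t] + r[(s, t)] for t in nxt if (s, t) in r)
print(g[1])   # 22, the same optimal total reward as forward recursion
```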

5.4 Resource Allocation Problems

This is a common type of problem which, once formulated as a dynamic programming problem,
can be solved in a straightforward manner.
Example 5.4
A company wishes to expand its business by enlarging three of its manufacturing plants. For
each type of expansion there is an expense and a future revenue (reward) according to the
following table (all entries in M).

    Proposed        Plant 1             Plant 2             Plant 3
    Expansion   expense  revenue    expense  revenue    expense  revenue
    A              0        0          0        0          0        0
    B              1        5          2        8          1        3
    C              2        6          3        9
    D                                  4       12

The company has a budget of 5M and it wants to maximise its revenue from the available
expansions.

We can consider the problem sequentially.

Stage 0: no expansion plans have been decided.
Stage 1: the expansion plan for plant 1 has been decided.
Stage 2: the expansion plans for plants 1 and 2 have been decided.
Stage 3: the expansion plans for plants 1, 2 and 3 have been decided.

Let Sn , the state at stage n (n = 0, 1, 2, 3), be the cumulative expense by stage n.
Let rn (i, j) be the transition reward of moving from state i at stage n to state j at stage n + 1.
This will be the revenue of the expansion project for plant n + 1 that has associated expense
j - i.
We can now draw a network showing the possible states at each stage, the possible transitions
and the transition rewards.
This problem can be solved using forward or backward recursion.

    Q0 = {0}
    Q1 = {0, 1, 2}
    Q2 = {0, 1, 2, 3, 4, 5}
    Q3 = {0, 1, 2, 3, 4, 5}


Forward recursion
Let fn (Sn ) be the maximum revenue available in stages 0, . . . , n when the state at stage n is Sn
(i.e. having spent Sn million expanding plants 1, . . . , n).
So, the forward recursion equation is

    f_n(S_n) = \max_{S_{n-1} \in Q_{n-1}} \left\{ f_{n-1}(S_{n-1}) + r_{n-1}(S_{n-1}, S_n) \right\} ,

and f0 (0) = 0, as usual.
It is easiest to solve this problem in tabular format.

Stage 1
f1 (S1 ) = max_{S0 ∈ {0}} {f0 (S0 ) + r0 (S0 , S1 )} = r0 (0, S1 )

    S1   f1 (S1 )
    0       0
    1       5
    2       6

Stage 2
f2 (S2 ) = max_{S1 ∈ {0,1,2}} {f1 (S1 ) + r1 (S1 , S2 )}.
    S2   S1 = 0   S1 = 1   S1 = 2   f2 (S2 )   S1
    0     0+0                          0        0
    1              5+0                 5        1
    2     0+8               6+0        8        0
    3     0+9      5+8                13        1
    4     0+12     5+9      6+8       14      1 or 2
    5              5+12     6+9       17        1

Stage 3
f3 (S3 ) = max_{S2 ∈ {0,1,2,3,4,5}} {f2 (S2 ) + r2 (S2 , S3 )}.

    S3   S2 = 0   S2 = 1   S2 = 2   S2 = 3   S2 = 4   S2 = 5   f3 (S3 )   S2
    0     0+0                                                     0         0
    1     0+3      5+0                                            5         1
    2              5+3      8+0                                   8       1 or 2
    3                       8+3     13+0                         13         3
    4                               13+3     14+0                16         3
    5                                        14+3     17+0       17       4 or 5

Thus, the maximum revenue is 17, which is achieved by getting to S3 = 5, i.e. spending
a total of 5M.
We now roll back the solution to find the optimal investment strategy.

So, there is a choice of expansion plans:

Backward recursion
Let gn (Sn ) be the maximum revenue available over future stages, starting in state Sn at stage n
(i.e. having spent Sn million expanding plants 1, . . . , n).
Clearly, g3 (S3 ) = 0 for all S3 ∈ Q3 . Now,

    g_n(S_n) = \max_{S_{n+1} \in Q_{n+1}} \left\{ g_{n+1}(S_{n+1}) + r_n(S_n, S_{n+1}) \right\}    (n = 0, 1, 2)

Stage 2
g2 (S2 ) = max_{S3 ∈ {0,1,2,3,4,5}} {g3 (S3 ) + r2 (S2 , S3 )}.

    S2   S3 = 0   S3 = 1   S3 = 2   S3 = 3   S3 = 4   S3 = 5   g2 (S2 )   S3
    0     0+0      0+3                                            3         1
    1              0+0      0+3                                   3         2
    2                       0+0      0+3                          3         3
    3                                0+0      0+3                 3         4
    4                                         0+0      0+3        3         5
    5                                                  0+0        0         5

Stage 1
g1 (S1 ) = max_{S2 ∈ {0,1,2,3,4,5}} {g2 (S2 ) + r1 (S1 , S2 )}.

    S1   S2 = 0   S2 = 1   S2 = 2   S2 = 3   S2 = 4   S2 = 5   g1 (S1 )   S2
    0     3+0               3+8      3+9      3+12               15         4
    1              3+0               3+8      3+9      0+12      12       4 or 5
    2                       3+0               3+8      0+9       11         4

Stage 0
g0 (S0 ) = max_{S1 ∈ {0,1,2}} {g1 (S1 ) + r0 (S0 , S1 )}.

    S0   S1 = 0   S1 = 1   S1 = 2   g0 (S0 )   S1
    0     15+0     12+5     11+6      17      1 or 2

Again, the maximum revenue is 17. We can now roll forward the solution, which again
yields the same optimal routes as we obtained earlier using forward recursion.
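The whole of Example 5.4 can be reproduced in a few lines: the state is the cumulative expense, and each plant's expansion options define the transitions. A minimal sketch, with the expense/revenue pairs from the table above and budget 5:

```python
plants = [
    [(0, 0), (1, 5), (2, 6)],           # plant 1: (expense, revenue) options A, B, C
    [(0, 0), (2, 8), (3, 9), (4, 12)],  # plant 2: options A, B, C, D
    [(0, 0), (1, 3)],                   # plant 3: options A, B
]
budget = 5

# f[s]: maximum revenue achievable having spent s so far.
f = {0: 0}
for options in plants:
    new_f = {}
    for s, rev in f.items():
        for expense, revenue in options:
            if s + expense <= budget:
                t = s + expense
                new_f[t] = max(new_f.get(t, -1), rev + revenue)
    f = new_f
print(max(f.values()))   # 17
```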

5.5 The Warehousing Problem

This is another example of dynamic programming solving a practical problem.
A company manufactures and supplies a product. Each month they manufacture a certain
amount at a production plant, which is then stored at their warehouse. They also distribute
some or all of the warehoused stock to the retailers.
Assume that the distributed stock leaves the warehouse early afternoon on the first of each
month and the manufactured stock is brought to the warehouse in the evening on the day it is
manufactured. On the morning of the first of each month, a manager has to choose how much
stock is to be distributed that day, and how much is to be manufactured that month.

Notation

    t     month (t = 1, . . . , T )
    Qt    number of items manufactured in month t
    Vt    number of items distributed to the retailers during the first day of month t
    ct    manufacturing cost (per item) in month t
    pt    retail price (per item) in month t
    It    number of items stored in the warehouse at beginning of 1st day of month t
    K     the maximum capacity (no. of items that can be stored)
    πt    profit available in months t, t + 1, . . . , T .

Each of Qt , Vt , ct , pt , It and K are ≥ 0, and for any month t the number of items that can be
stored cannot exceed the maximum capacity, i.e. 0 ≤ It ≤ K. The costs, ct , and prices, pt , are
assumed to be known in advance but may be different from month to month.
The number of items in the warehouse at the start of month t + 1 is the number of items there
at the start of month t, plus the number of items manufactured in month t minus the number
of items distributed in month t, giving

    I_{t+1} = I_t - V_t + Q_t .

We want to choose Vt and Qt (t = 1, . . . , T ) so that the profit π1 is maximised.
The above is a general problem. In a specific example we have:
The maximum capacity is K = 1000.
The initial amount of stock in the warehouse is I1 = 300.
We are considering the problem over a six month period, so T = 6,
and at the end of the 6 month period we want an empty warehouse, so I7 = 0.
The values ct and pt are

    t    ct   pt
    1    70   90
    2    64   82
    3    72   70
    4    70   85
    5    65   90
    6    65   85

We need to use backward recursion for this problem.

Month 6
Let t = 6, and assume that today is the first day of month 6. Pretend that the number of items
in the warehouse at the start of today, I6 , is fixed.
The profit for month 6 is π6 = p6 V6 - c6 Q6 . Since we are required to have I7 = 0, I7 = I6 - V6 + Q6 ,
and Q6 ≥ 0, the only way that this can be satisfied is for Q6* = 0 and V6* = I6 . So

    π6* = p6 I6 = 85 I6 ,

with V6* = I6 and Q6* = 0.

Month 5
Now let t = 5, and assume that today is the first day of month 5. Pretend that the number of
items in the warehouse at the start of today, I5 , is fixed and allow that the number of items in
the warehouse one month later, I6 , is variable.
The two-month profit is

    π5 = p5 V5 - c5 Q5 + π6*
       = p5 V5 - c5 Q5 + 85 I6
       = p5 V5 - c5 Q5 + 85 (I5 - V5 + Q5 )
       = 90 V5 - 65 Q5 + 85 (I5 - V5 + Q5 )
       = 5 V5 + 20 Q5 + 85 I5 .

Also, we have I6 = I5 - V5 + Q5 ≤ K, and we cannot distribute more stock than is currently in
the warehouse, i.e. V5 ≤ I5 .
This gives a linear programming problem in two variables V5 and Q5 with a constant added to
the objective function.

    maximise:   π5 = 5 V5 + 20 Q5 + 85 I5
    subject to: V5 ≤ I5
                Q5 - V5 ≤ K - I5
                Q5 , V5 ≥ 0.

The feasible region is

Rearranging the objective function gives

    Q5 = (1/20) π5 - (5/20) V5 - (85/20) I5 ,

giving a slope of -1/4 and an intercept (π5 - 85 I5 )/20. So, we want to maximise the intercept.
The optimum is at the point B. Hence

    V5* = I5
    Q5* = K

    π5* = 90 I5 + 20 K.

This corresponds to distributing everything we have this month and filling up the warehouse for
the start of next month.

Month 4
Step back another month, so t = 4. Now we pretend that I4 is fixed, but allow that I5 be
variable.
The three-month profit is

    π4 = p4 V4 - c4 Q4 + π5* .

But π5* depends on I5 , which in turn depends on I4 , V4 and Q4 . So, we want to maximise

    π4 = p4 V4 - c4 Q4 + π5*
       = p4 V4 - c4 Q4 + 90 I5 + 20 K
       = 85 V4 - 70 Q4 + 90 (I4 - V4 + Q4 ) + 20 K
       = -5 V4 + 20 Q4 + 90 I4 + 20 K.

Similarly to when t = 5, we have the LPP

    maximise:   π4 = -5 V4 + 20 Q4 + 90 I4 + 20 K
    subject to: V4 ≤ I4
                Q4 - V4 ≤ K - I4
                Q4 , V4 ≥ 0.

This has exactly the same feasible region as before (apart from replacing I5 , Q5 and V5 with I4 ,
Q4 and V4 ). Now the slope of the objective function is +1/4, which again gives the maximum
at the point

    V4* = I4   and   Q4* = K ,

giving

    π4* = 85 I4 + 40 K.

Month 3
Exactly the same method can be applied, so

    π3 = -15 V3 + 13 Q3 + 85 I3 + 40 K

and again the feasible region is the same (apart from replacing I4 , Q4 and V4 with I3 , Q3 and
V3 ). This time the slope is 15/13 > 1 so the optimum is at

    V3* = 0   and   Q3* = K - I3 ,

giving

    π3* = 72 I3 + 53 K.

Month 2

    V2* = I2   and   Q2* = K ,

giving

    π2* = 82 I2 + 61 K.


Month 1
V1 = I1 and Q = K
1
giving

1 = 90I1 + 73K.

Because I1 = 300 and K = 1000 we have

1 = 90 300 + 73 1000 = 100000.

We also have enough information to calculate the numerical values for Vt , Q , It and t .
t

t
1
2
3
4
5
6

Vt
300
1000

Q
t
1000
1000

It
300
1000

t
100 000
143 000

Question: why is π2* > π1*?

Note: Forward recursion does not work with this problem. This is because the maximum profit made in the first two months is attained by a different strategy than the one that is optimal over the first month.
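The whole backward recursion can be checked numerically. The sketch below is an assumption-laden reconstruction: the monthly prices p_t and costs c_t are inferred from the objective coefficients in the derivation above (they are not listed in this part of the notes), and left-over stock is assumed to be sold at 85 per unit in month 6. Because each monthly value function is linear, the optimum of each month's LP lies at a corner of the feasible region, so only corner decisions need to be tried.

```python
# Prices p_t and costs c_t for months 1..5, inferred from the coefficients
# in the notes (an assumption), with left-over stock sold at 85 in month 6.
p = [90, 82, 70, 85, 90]
c = [70, 64, 72, 70, 65]
K = 1000      # warehouse capacity
I1 = 300      # initial stock

def best_profit(t, I):
    """Maximum profit from month t (0-indexed) onwards with opening stock I.

    The corners of the feasible region {0 <= V <= I, Q >= 0, I - V + Q <= K}
    are (V, Q) = (0, 0), (I, 0), (0, K - I) and (I, K).
    """
    if t == 5:                      # month 6: sell everything that is left
        return 85 * I
    best = float("-inf")
    for V, Q in [(0, 0), (I, 0), (0, K - I), (I, K)]:
        best = max(best, p[t] * V - c[t] * Q + best_profit(t + 1, I - V + Q))
    return best

print(best_profit(0, I1))
```

With these inferred data the recursion reproduces the optimal profit π1* = 90 × 300 + 73 × 1000 = 100 000 from the notes.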

Chapter 6

Stochastic Optimisation

6.1  Introduction

In Chapter 5 the rewards/costs of the transitions were assumed to be known exactly and we were
able to choose which transitions to make. In this chapter we deal with problems where these
assumptions are violated. Such problems are solved by stochastic programming or stochastic
optimisation algorithms. We shall concentrate on one aspect of stochastic optimisation known
as Markov Dynamic Programming (MDP), also known as Markov Programming or Markov
Decision Programming.
A stochastic (random) process is a set of random variables indexed by time, {Xt : t ≥ 0}. We shall only consider discrete-time random processes, X0, X1, X2, . . . .

6.2  Markov Chains

Definition
A sequence of random variables X0, X1, . . . is a Markov chain if the Markov property holds for each random variable in the sequence: that is, for all n ≥ 0,

    P(Xn+1 ≤ xn+1 | X0 = x0, X1 = x1, . . . , Xn = xn) = P(Xn+1 ≤ xn+1 | Xn = xn).

If X0, X1, . . . are discrete random variables, we can write the Markov property equivalently as

    P(Xn+1 = xn+1 | X0 = x0, X1 = x1, . . . , Xn = xn) = P(Xn+1 = xn+1 | Xn = xn)

(i.e. change the ≤s to =s).
Example 6.1


Mr. Bond is playing roulette. At each turn he places a £1 chip on number 17. Let Xn denote the number of £1 chips he has after n turns. He begins with X0 = x0 chips.
The number of chips he has after n + 1 turns, Xn+1 , depends on the number of chips he has after
n turns. Given Xn, Xn+1 is independent of X0, . . . , Xn−1. Therefore X0, X1, . . . is a Markov
chain.
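The chip-count chain can be simulated in a few lines. This is only a sketch: it assumes a single-zero wheel, so number 17 comes up with probability 1/37 and a winning £1 straight-up bet pays 35 chips net; neither payout detail is given in the notes.

```python
import random

def play(x0, turns, rng):
    """Simulate the Markov chain X_0, ..., X_turns of Mr. Bond's chip counts."""
    x = x0
    path = [x]
    for _ in range(turns):
        if rng.random() < 1 / 37:   # the ball lands on 17
            x += 35                 # win: net gain of 35 chips
        else:
            x -= 1                  # lose the chip staked
        path.append(x)
    return path

rng = random.Random(1)
path = play(100, 50, rng)
# Each step depends only on the current count and changes by -1 or +35.
print(all(b - a in (-1, 35) for a, b in zip(path, path[1:])))
```

The check at the end illustrates the Markov structure: the next count is determined by the current count plus an independent random increment.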
In most of this chapter we consider finite horizon problems, i.e. n = 0, 1, . . . , N, with N fixed. In Section 6.6 we allow an infinite number of stages.
In a finite horizon Markov dynamic programming problem we have the following:
1. a Markov chain X0, X1, . . . , XN with known transition probabilities,
2. costs or rewards associated with each transition in the Markov chain,
3. terminal costs or rewards associated with the states at the final stage, and
4. (usually) actions that alter the transition probabilities and costs/rewards.

Notation

We must always use backward recursion for MDP problems (for reasons explained later). For this reason, it is convenient to use n to denote the number of stages to go, rather than the number of the stage (as was the case in Chapter 5). So,

    n = 0 at the final (terminal) stage,
    n = N at the initial stage.

Let Xn denote the state with n stages to go.

Let Qn denote the state space with n stages to go. (Note the difference from Chapter 5.)
Let p^(n)_{i,j} denote the transition probability for moving from state i with n stages to go to state j (with n − 1 stages to go):

    p^(n)_{i,j} = P(Xn−1 = j | Xn = i).

The transition matrix P^(n) is the matrix whose (i, j)th entry is p^(n)_{i,j}. Clearly,

    0 ≤ p^(n)_{i,j} ≤ 1   and   Σ_j p^(n)_{i,j} = 1 for all i.

So, each element of P^(n) must be between zero and one, and the rows must sum to one.
Let r^(n)_{i,j} or c^(n)_{i,j} denote the transition reward/cost for the transition from state i with n stages to go to state j (with n − 1 stages to go).

Let r^(0)_i or c^(0)_i denote the terminal reward/cost for state i, i.e. the reward/cost incurred when we end in state i.


Note: Often the transition probabilities will be the same at every stage, in which case the superscript (n) in p^(n)_{i,j} and P^(n) can be dropped.
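The two conditions above are easy to check mechanically. A small helper, with a made-up matrix purely for illustration:

```python
def is_transition_matrix(P, tol=1e-9):
    """Check that every entry lies in [0, 1] and every row sums to one."""
    return all(
        all(0.0 <= p <= 1.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )

print(is_transition_matrix([[0.5, 0.3, 0.2], [0.4, 0.2, 0.4]]))   # True
print(is_transition_matrix([[0.7, 0.7, -0.4]]))                   # False
```

The second example sums to one but contains a negative entry, so it fails the element-wise condition.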

6.3  Markov dynamic programming without actions

Example 6.2

[Network diagram: stage 0 (n = 3) holds state 1; stage 1 (n = 2) holds states 2, 3, 4; stage 2 (n = 1) holds states 5, 6, 7; stage 3 (n = 0) holds states 8, 9, 10, with transition probabilities and rewards on the arcs and terminal rewards at the final stage.]

So, the state space for X0 is Q0 = {8, 9, 10}, for X1 is Q1 = {5, 6, 7}, for X2 is Q2 = {2, 3, 4}
and for X3 is Q3 = {1}.
The transition probabilities are P(X0 = 8 | X1 = 6) = p^(1)_{6,8} = 0.4, etc. The transition rewards are r^(3)_{1,2} = 3, r^(2)_{2,5} = 8, etc. Finally, the terminal rewards are r^(0)_8 = 1, r^(0)_9 = 1 and r^(0)_10 = 2.
Definitions

Let Rn(i) be the one-step expected reward when in state i with n stages to go (Xn = i). The formula for Rn(i) is

    Rn(i) = Σ_{j∈Qn−1} p^(n)_{i,j} r^(n)_{i,j}    for n ≥ 1,
    R0(i) = r^(0)_i                               for n = 0.


Let Vn(i) be the total future expected reward when in state i with n stages to go (Xn = i). The function Vn(i) is called the value function.

    Vn(i) = Σ_{j∈Qn−1} p^(n)_{i,j} [ r^(n)_{i,j} + Vn−1(j) ]
          = Rn(i) + Σ_{j∈Qn−1} p^(n)_{i,j} Vn−1(j)    for n ≥ 1,
    V0(i) = R0(i) = r^(0)_i                           for n = 0.

This is a recursive formula for Vn(i) in terms of Vn−1, so we can use methods similar to those used in the last chapter in order to find the expected reward over the whole problem, V3(1).
So, we start at stage 3, when n = 0:

    V0(8) = r^(0)_8 =        V0(9) = r^(0)_9 =        V0(10) = r^(0)_10 =

At stage 2 (n = 1) we can compute the one-step expected rewards for each state:

    R1(5) = Σ_{j=8}^{10} p^(1)_{5,j} r^(1)_{5,j} =
    R1(6) = Σ_{j=8}^{10} p^(1)_{6,j} r^(1)_{6,j} =
    R1(7) = Σ_{j=8}^{10} p^(1)_{7,j} r^(1)_{7,j} =

This in turn allows us to compute the value function for each state in stage 2 (n = 1):

    V1(5) = R1(5) + Σ_{j=8}^{10} p^(1)_{5,j} V0(j) =
    V1(6) = R1(6) + Σ_{j=8}^{10} p^(1)_{6,j} V0(j) =
    V1(7) = R1(7) + Σ_{j=8}^{10} p^(1)_{7,j} V0(j) =

Repeat this for stage 1 (n = 2):

    R2(2) =        R2(3) =        R2(4) =
    V2(2) =        V2(3) =        V2(4) =


And finally for stage 0 (n = 3):

    R3(1) =
    V3(1) =

The expected reward over the three stages starting in state 1 is 19.276.

Note: the deterministic Example 5.2 had the same network and the same rewards (but no
terminal rewards). The maximum total reward was 22 (24 if we add the terminal reward). This
was achieved by visiting states 1, 3, 7 and 10. In the MDP example, we cannot decide which
route to take (the route is random), so it is not surprising that the expected total reward is less
than the maximum reward.
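The recursion for Rn(i) and Vn(i) can be sketched in a few lines of code. The tiny chain below is invented purely for illustration (it is not Example 6.2, whose data sit in the figure): one state with two possible successors, terminal rewards 0 and 10, and equal transition probabilities.

```python
def one_step_reward(p, r):
    """R_n(i) = sum_j p_{i,j} r_{i,j} for a single state i."""
    return sum(p[j] * r[j] for j in p)

def value(p, r, V_prev):
    """V_n(i) = sum_j p_{i,j} (r_{i,j} + V_{n-1}(j)) for a single state i."""
    return sum(p[j] * (r[j] + V_prev[j]) for j in p)

V0 = {1: 0.0, 2: 10.0}   # terminal rewards r^(0)_j (made-up numbers)
p = {1: 0.5, 2: 0.5}     # transition probabilities from the current state
r = {1: 2.0, 2: 4.0}     # transition rewards from the current state
V1 = value(p, r, V0)     # 0.5*(2 + 0) + 0.5*(4 + 10) = 8
print(V1)
```

Starting from the terminal rewards and applying `value` stage by stage is exactly the backward recursion used in the worked example above.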

6.4  Markov dynamic programming with actions

Suppose the process is in state i with n stages to go (Xn = i), and that we now have K actions available, {a1^(n), a2^(n), . . . , aK^(n)} = A^(n). We call A^(n) the action space.
We can extend the dynamic programming model to depend on the actions taken:
We can extend the dynamic programming model to depend on the actions taken:

The transition probabilities depend on the action taken at each stage, so

    p^(n)_{i,j}(a^(n)) = P(Xn−1 = j | Xn = i, a^(n)),

where a^(n) ∈ A^(n) is the particular action chosen with n stages to go. Note that we also insist that the random variables are Markovian with respect to the actions. That is, the probability of a transition to Xn−1 = j depends on Xn = i and a^(n), but does not depend on the previous states, nor on any other actions.

The transition rewards/costs depend on the action taken at each stage. So, we have

    r^(n)_{i,j}(a^(n))   or   c^(n)_{i,j}(a^(n)).

The terminal rewards/costs do not depend on the actions.

Now that we have actions, there is a choice available. We want to find the sequence of actions that will maximise our expected total reward, or minimise our expected total cost. First, we extend the definitions of the one-step expected reward and the value function to take account of the actions chosen.

Definitions

Rn(i, a^(n)) is the one-step expected reward when the state is i with n stages to go (Xn = i) and action a^(n) is taken.

Vn*(i) is the future expected reward when in state i with n stages to go and an optimal action plan is used for these remaining n stages. Vn*(i) is called the optimal value function.



As before, we have an expression for the one-step expected reward:

    Rn(i, a^(n)) = Σ_{j∈Qn−1} p^(n)_{i,j}(a^(n)) r^(n)_{i,j}(a^(n))    for n ≥ 1,
    R0(i) = r^(0)_i                                                    for n = 0.

The optimal value function is given by the following recursive equation:

    Vn*(i) = max_{a^(n)∈A^(n)} Σ_{j∈Qn−1} p^(n)_{i,j}(a^(n)) [ r^(n)_{i,j}(a^(n)) + Vn−1*(j) ]
           = max_{a^(n)∈A^(n)} [ Rn(i, a^(n)) + Σ_{j∈Qn−1} p^(n)_{i,j}(a^(n)) Vn−1*(j) ]    for n ≥ 1,
    V0*(i) = r^(0)_i                                                                        for n = 0.

Example 6.3 Advertising for a Concert

A ticket agency is responsible for advertising and selling tickets for a concert which will take
place in two weeks. Each week the agency has the choice either to put a half-page advert in
the major newspapers and magazines, or just to submit the concert details to the "what's-on"
listings. To simplify the problem, the agency categorises the ticket sales into three states, fast
ticket sales, average ticket sales and slow ticket sales. At present, the ticket sales are average.
Let n denote the number of weeks remaining before the concert takes place. Label the states as:
3 for fast sales, 2 for average sales and 1 for slow sales. Label the actions as A for Advertise, and
D for Don't advertise. As the actions are the same at each stage we do not have to superscript
them with n.
We can represent this as a network.

[Network diagram of the two-week advertising problem.]

The transition matrices and rewards (in £1000s) are:

Terminal rewards:  r^(0)_1 = 2,  r^(0)_2 = 5  and  r^(0)_3 = 10.

 r^(1)_{i,j}(A):          r^(1)_{i,j}(D):
    −1   0   1                1   2   3
     0   1   2                2   3   4
     1   2   3                3   4   5

 P^(1)(A):                P^(1)(D):
    .5  .3  .2               1.0  0.0  0.0
    .4  .2  .4                .4   .3   .3
    .3  .3  .4                .4   .4   .2

With two weeks to go the sales are average (state 2), so only the transitions from state 2 are needed:

 r^(2)_{2,j}(A) = ( −1  0  1 )      p^(2)_{2,j}(A) = ( .4  .3  .3 )
 r^(2)_{2,j}(D) = (  1  2  3 )      p^(2)_{2,j}(D) = ( .6  .3  .1 )

The way to solve this problem is to set up a table for every state and possible action in each stage. For each action a and state i compute Rn(i, a) and Σ_{j∈Q} p^(n)_{i,j}(a) Vn−1(j). This gives us

    Rn(i, a) + Σ_{j∈Q} p^(n)_{i,j}(a) Vn−1(j),

and we choose the action that maximises this.



 n   state   action   Rn(i, a)   Σ_{j∈Q} p^(n)_{i,j}(a) Vn−1(j)   sum   Vn*(i)   a*
 0     3
       2
       1
 1     3
       2
       1
 2     2

The last two columns tell us the optimal expected future reward and the action that should be taken in order to attain the optimum. The optimal actions are: don't advertise with two weeks to go; and with one week to go advertise only if the ticket sales are slow. The optimal expected future reward is £7 340.

Notes:
The solution requires that the optimal actions are found for every state.
In deterministic dynamic programming problems the choice is the path through the network. In MDP we cannot choose the path that is taken. Instead, we choose actions which influence the probabilities of which path is taken.
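The backward recursion for this example can be coded directly and used as a numerical check. This is a sketch that takes the matrices printed above at face value, including the −1 entry of r^(1)(A) as reconstructed in these notes.

```python
V0 = {1: 2.0, 2: 5.0, 3: 10.0}                     # terminal rewards, in £1000s

P1 = {"A": {1: (.5, .3, .2), 2: (.4, .2, .4), 3: (.3, .3, .4)},
      "D": {1: (1., 0., 0.), 2: (.4, .3, .3), 3: (.4, .4, .2)}}
R1 = {"A": {1: (-1, 0, 1), 2: (0, 1, 2), 3: (1, 2, 3)},
      "D": {1: (1, 2, 3), 2: (2, 3, 4), 3: (3, 4, 5)}}

def q_value(p, r, V_prev):
    """Expected reward of one transition plus the value-to-go."""
    return sum(pj * (rj + V_prev[j + 1]) for j, (pj, rj) in enumerate(zip(p, r)))

# One week to go: best action and value for each state
V1, act1 = {}, {}
for i in (1, 2, 3):
    vals = {a: q_value(P1[a][i], R1[a][i], V0) for a in ("A", "D")}
    act1[i] = max(vals, key=vals.get)
    V1[i] = vals[act1[i]]

# Two weeks to go: sales are average (state 2)
p2 = {"A": (.4, .3, .3), "D": (.6, .3, .1)}
r2 = {"A": (-1, 0, 1), "D": (1, 2, 3)}
vals2 = {a: q_value(p2[a], r2[a], V1) for a in ("A", "D")}
best2 = max(vals2, key=vals2.get)
print(act1, best2, round(vals2[best2], 2))
```

This reproduces the answers above: advertise at one week to go only if sales are slow, don't advertise at two weeks to go, and an optimal expected reward of 7.34 (£7 340).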

6.5  Discounting

Suppose a profit of £1 now is worth more than £1 at the next stage. Reasons for this could be:

• inflation: by the next stage £1 will have less spending power;
• investment: £1 can be invested and interest earned by the next stage;
• risk of bankruptcy: no point in maximising next year's reward if you may go bankrupt this year.

In order to account for this, we use a discount factor α, which compounds year on year. We shall assume that α is constant and 0 < α < 1. So an amount of £1 at the next stage is worth £α at today's prices.



The recursive equation for the optimal value function is now given by

    Vn*(i) = max_{a^(n)∈A^(n)} [ Rn(i, a^(n)) + α Σ_{j∈Qn−1} p^(n)_{i,j}(a^(n)) Vn−1*(j) ].

Exercises 6, Question 5
Repeat the ticket agency example using a discount factor of
1. α = 0.9
2. α = 0.4
and compare with the initial answer.

6.6  Infinite horizon problems

When the problem has no (obvious) final stage, we still want to find the optimal value function and the optimal actions.

We shall assume that there is a discount factor α (0 < α < 1) and that the state space and action space are the same for each stage. So, we can drop the subscript/superscript n.

The optimal value function, i.e. optimal expected future (discounted) reward, is

    V*(i) = max_{a∈A} [ R(i, a) + α Σ_{j∈Q} p_{i,j}(a) V*(j) ]
          = R(i, a*(i)) + α Σ_{j∈Q} p_{i,j}(a*(i)) V*(j),

where a*(i) denotes the optimal action when in state i.


In general, solving the above equation is difficult.
One approach is to guess a general form of the solution and solve it analytically. Another is to
use iterative methods to obtain the optimal value function numerically.
Example 6.4 Machine Replacement

A machine in a manufacturing plant can be in states {0, 1, 2, . . . } according to the condition it


is in. At the end of each week the machine is inspected, its condition is noted and we make a
decision as to whether or not to replace it.


Let Yt denote the state of the machine at the end of week t. If Yt = i and the action is "don't replace", we incur a running cost of c(i) = i and

    Yt+1 = i       with prob. 1/2,
    Yt+1 = i + 1   with prob. 1/2.

If Yt = i and the action is "replace", we incur a fixed cost of K and

    Yt+1 = 0   with prob. 1/2,
    Yt+1 = 1   with prob. 1/2.

The costs are discounted by a factor α over an infinite horizon. Thus

    Q = {0, 1, 2, . . . },
    A = {1 = replace, 2 = don't replace},

    p_{i,0}(1) = 1/2,    p_{i,1}(1) = 1/2,
    p_{i,i}(2) = 1/2,    p_{i,i+1}(2) = 1/2,

    r(i, 1) = −K,    r(i, 2) = −i.

Definition
A policy δ(i) is a decision rule (or action plan) that specifies an action for each state i in the state space Q.

We might guess that the optimal policy is of the form: choose a = 1 (replace) if i ≥ N and a = 2 (don't replace) if i < N, for some unknown N.

The above policy is δ(i) = 2 for 0 ≤ i ≤ N − 1 and δ(i) = 1 for i ≥ N, and the value function corresponding to this policy, V(i), is

    V(i) = R(i, δ(i)) + α Σ_j p_{i,j}(δ(i)) V(j)

         = −i + α [ (1/2) V(i) + (1/2) V(i + 1) ]    if 0 ≤ i ≤ N − 1,

         = −K + α [ (1/2) V(0) + (1/2) V(1) ]        if i ≥ N.

Analytical methods can be used to show that this policy is optimal and to find the values of V(i) and N, but this is not covered in this course. Instead we shall use an iterative method.

For this iterative method, pretend that the number of stages is finite. Let Vn(i) be the expected future (discounted) reward when there are n stages to go and we use the optimal policy. Then

    V0(i) = 0                            for all i,
    Vn(i) = max { Un(i, 1), Un(i, 2) }   for all i,



where

    Un(i, 1) = −K + α [ (1/2) Vn−1(0) + (1/2) Vn−1(1) ],
    Un(i, 2) = −i + α [ (1/2) Vn−1(i) + (1/2) Vn−1(i + 1) ].

If Vn(i) converges (say to V(i)) as n → ∞, then V(i) is the optimal value function for the problem with an infinite number of stages, because it satisfies the equation

    V(i) = max_{a∈A} [ R(i, a) + α Σ_{j∈Q} p_{i,j}(a) V(j) ].

The iterative procedure will also give us the optimal policy.

If we now assume specific values for K and α of 10 and 0.75 respectively, and run the above iterative method we get:

                                      Iteration Number
 State        1                2                3               38               39
   i      a   V1(i)        a   V2(i)        a   V3(i)       a   V38(i)       a   V39(i)
   0      2    −0          2    −0.375      2    −0.938     2    −5.106      2    −5.106
   1      2    −1          2    −2.125      2    −3.250     2    −8.511      2    −8.511
   2      2    −2          2    −3.875      2    −5.562     2   −11.518      2   −11.518
   3      2    −3          2    −5.625      2    −7.875     2   −13.864      2   −13.864
   4      2    −4          2    −7.375      2   −10.188     1   −15.106      1   −15.106
   5      2    −5          2    −9.125      1   −10.938     1   −15.106      1   −15.106
   6      2    −6          1   −10.375      1   −10.938     1   −15.106      1   −15.106
   7      2    −7          1   −10.375      1   −10.938     1   −15.106      1   −15.106
   8      2    −8          1   −10.375      1   −10.938     1   −15.106      1   −15.106
   9      2    −9          1   −10.375      1   −10.938     1   −15.106      1   −15.106
  10                                                        1   −15.106      1   −15.106
  11                                                        1   −15.106      1   −15.106

The function Vn(i) converges (but needs 38 iterations to converge to 3 d.p.). The convergence for such problems in general is often very slow.

Notice that the solution is of the form postulated above: replace if i ≥ 4 and continue if i ≤ 3. The optimal value function is given in the rightmost column (with V(i) = −15.106 for all i ≥ 4).
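The iteration above is straightforward to code. A sketch, truncating the infinite state space at M = 30 with V(M + 1) approximated by V(M); this truncation is an implementation choice, not part of the model, and it does not affect the states near the replacement threshold.

```python
K, alpha, M = 10.0, 0.75, 30

V = [0.0] * (M + 2)
for _ in range(300):
    replace = -K + alpha * 0.5 * (V[0] + V[1])          # U_n(i, 1), same for all i
    newV = [max(replace, -i + alpha * 0.5 * (V[i] + V[i + 1]))
            for i in range(M + 1)]
    newV.append(newV[-1])                               # crude boundary approximation
    V = newV

replace = -K + alpha * 0.5 * (V[0] + V[1])
policy = [1 if replace >= -i + alpha * 0.5 * (V[i] + V[i + 1]) else 2
          for i in range(10)]
print([round(v, 3) for v in V[:5]])
print(policy)
```

The values converge to −5.107, −8.511, −11.518, −13.864 and then −15.107 (3 d.p.; the tables above truncate to −5.106 and −15.106), and the computed policy replaces exactly when i ≥ 4, matching the result in the notes.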

6.7  Optimal Stopping Routines

At every stage of an optimal stopping routine there are two possible actions, "stop" or "continue". If the action is "stop" then a reward is obtained, depending on the current state, and that is the

end of the routine. If the action is "continue" then a cost is incurred and the routine proceeds to the next stage, according to a Markov chain. Assume there are only a finite number of stages in the routine.
The routines we shall consider have the following form.
The reward for stopping in state i before the final stage is r(i). This is independent of the stage.
The reward for continuing to the final stage is r^(0)(i).
The cost of continuing is c(i). It depends on the state but not the stage.
If continuing, the probability that the next state is j, given that the current state is i with n stages to go, is p^(n)_{i,j}.
There is no discounting.
The optimal value function is

    Vn(i) = max { r(i),  −c(i) + Σ_{j∈Qn−1} p^(n)_{i,j} Vn−1(j) }    (n ≥ 1)

and

    V0(i) = r^(0)(i).

Example: TV Game Show

A contestant has N boxes, each containing a random amount of money from £0 to £1000. All amounts are equally likely, and the amount of money in each box is independent of the amount of money in the others.

The game-show host opens the first box and the contestant has to choose whether to keep the money, in which case the game is over ("stop"), or to move on to the next box ("continue"). Once the contestant has shut a box he/she can no longer claim the money in that box.

Given that the number of boxes is N = 4, what is the optimal strategy that the contestant should follow?

Stages are the boxes.
The rewards are the amounts in the box (i.e. the states): r(i) = i.
n is the number of stages left, i.e. the number of boxes still to be accepted or rejected.
For n ≥ 1, the state space Qn is all possible values in a box. The amounts of money are in pounds, so Qn = {0, 1, . . . , 999, 1000}.
The terminal reward is nothing: r^(0)(i) = 0.
The transition costs are zero: c(i) = 0.
For n ≥ 2, the transition probabilities are p^(n)_{i,j} = 1/1001.
Q0 = {0} and p^(1)_{i,0} = 1.



n = 0  Once all of the boxes have been rejected:
    V0(0) =

n = 1  This corresponds to deciding whether to accept or reject the last of the four boxes.
    V1(i) = max { i, V0(0) } =

So the optimal policy is

n = 2  When the contestant decides on the third box:

    V2(i) = max { i, Σ_{j=0}^{1000} p_{ij} V1(j) }

    Σ_{j=0}^{1000} p_{ij} V1(j) = (1/1001) Σ_{j=0}^{1000} j =

giving V2(i) =
So the optimal policy is to take the money if and only if the amount in the box is

n = 3  When the contestant opens the second box:

    V3(i) = max { i, Σ_{j=0}^{1000} p_{ij} V2(j) }

    Σ_{j=0}^{1000} p_{ij} V2(j) = (1/1001) [ Σ_{j=0}^{500} 500 + Σ_{j=501}^{1000} j ] =

    V3(i) =



So the optimal policy is to take the money if and only if the amount in the box is

n = 4  When the contestant sees the contents of the first box:

    V4(i) = max { i, Σ_{j=0}^{1000} p_{ij} V3(j) }

    Σ_{j=0}^{1000} p_{ij} V3(j) = (1/1001) [ Σ_{j=0}^{625} 625.125 + Σ_{j=626}^{1000} j ] =

    V4(i) =
So the optimal policy is to take the money if and only if the amount in the box is

The complete optimal strategy is for the contestant to accept a box if and only if it contains at least the amount shown:

    Box number   1   2   3   4
    Amount
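The continuation values can be checked numerically. A sketch of the recursion, working in whole pounds: with n boxes still to come, the contestant should accept an amount exactly when it exceeds the expected value of continuing, E[Vn−1].

```python
def continuation_values(N=4, top=1000):
    """Return [E V_0, E V_1, ..., E V_{N-1}]: the value of rejecting the
    current box when n = 1, 2, ..., N boxes remain (including this one)."""
    cont, values = 0.0, []
    for _ in range(N):
        values.append(cont)
        # E[V_n] over a uniform amount J in {0, 1, ..., top}
        cont = sum(max(j, cont) for j in range(top + 1)) / (top + 1)
    return values

vals = continuation_values()
print([round(v, 3) for v in vals])
```

This prints continuation values 0, 500, 625.125 and about 695.508, which suggests accepting the first box only from about £696 upwards, the second from £626, the third from £501, and always accepting the last box.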

Appendix 1: Exercises
STAT7003: Exercises 1
1. Solve the following linear programming problems graphically.

(a)  minimise: z = 5x1 + 2x2
     subject to: 4x1 + x2 ≥ 8
                 x1 + x2 ≥ 5
                 x2 ≥ 2
                 x1 ≥ 0

(b)  maximise: z = x1 + 2x2
     subject to: 3x1 + 3x2 ≤ 9
                 x1 − x2 ≤ 2
                 x1 + x2 ≤ 6
                 x1 + 3x2 ≤ 6
                 x1 , x2 ≥ 0

(c)  maximise: z = x1 + x2
     subject to: x1 + x2 ≤ 3
                 x1 ≤ 2
                 x1 , x2 ≥ 0

(d)  maximise: z = 3x1 − x2
     subject to: x1 − 2x2 ≤ 4
                 x1 + x2 ≤ 8
                 4x1 + 2x2 ≤ 20
                 x2 ≤ 4
                 x1 ≥ 4
                 x1 ≤ 8
                 x1 , x2 ≥ 0

2. A company wants to purchase at most 1800 units of a product. There are two types of the product available: product 1 and product 2. Product 1 occupies 2 ft³, costs £12 and the company makes a profit of £3. Product 2 occupies 3 ft³, costs £15 and the company makes a profit of £4. The company wants to maximise its profit. If the budget is £15 000 and the warehouse has 3 000 ft³ for the product,
a) set up the problem as a linear programming problem and
b) solve the problem graphically.
3. The BW Dog Food manufacturer has four production plants, and ships cases of dog food from these plants to 10 warehouses. The shipping costs per week for each plant/warehouse combination are given in the matrix below. The number of cases required by each warehouse and the capacity of each plant (in number of cases) are also given.

                                Warehouse
 Plant            1   2   3   4   5   6   7   8   9  10   Capacity
   1              5   4   8   4   9   2   9   4   8   6      10
   2              6   8   3   3   1   6   8   5   4   3      14
   3              8   3   7   5   4   4   2   3   4   5      16
   4              7   9   6   9   3   8   8   3   1   6      12
 Minimum number
 of cases required

a) Formulate this as a linear programming problem, for minimising costs. Hint: this problem has forty control variables.
b) Discuss how the model would change if in addition we had to consider the production cost per case, which is £12, £15, £10 and £16 for plants 1, 2, 3 and 4 respectively.
4. a) Show graphically that the following linear programming problem has an unbounded feasible region.

   maximise: z = (1/2)x1 + x2
   subject to: x1 + x2 ≥ 6
               3x1 − 2x2 ≥ 3
               x1 − 8x2 ≥ 8
               x1 , x2 ≥ 0

b) Is there an optimal solution to this problem? If so, what is it?
c) Would there be an optimal solution to this problem if it were a minimisation problem instead (again with z = (1/2)x1 + x2)? If so, what is it?

STAT7003: Exercises 2
1. For the problem in Exercises 1 Question 2, obtain the shadow prices for each constraint,
and give an interpretation to these shadow prices.
2. Solve these linear programming problems using the numerical naïve method.
a)  maximise: z = 2x1 + 5x2
    subject to: 2x1 + 5x2 ≤ 8
                7x1 + 7x2 ≤ 12
                x1 , x2 ≥ 0

b)  maximise: z = x1 + 3x2 + x3
    subject to: 2x1 − 5x2 + x3 ≤ 3
                x1 + 4x2 ≤ 5
                x1 , x2 , x3 ≥ 0

3. Solve the following linear programming problem using the simplex method.

   maximise: z = x1 + 2x2
   subject to: x1 + 2x2 ≤ 4
               2x1 + 5x2 ≤ 11
               x1 , x2 ≥ 0

(In all problems that involve using the simplex method, you are expected to include all intermediate tables and to show your working.)
4. A crisp company makes pizza flavoured and chilli flavoured crisps. These crisps go through three main processes: frying, flavouring and packing. Each kilogram of pizza flavoured crisps takes 3 minutes to fry, 5 minutes to flavour and 2 minutes to pack. Each kilogram of chilli flavoured crisps takes 3 minutes to fry, 4 minutes to flavour and 3 minutes to pack. The net profit on each kg of pizza flavoured crisps is £0.12 and is £0.10 for each kg of chilli flavoured crisps. The frying machine is available 4 hrs each day, the packing machine is available 6 hrs each day and the flavouring machine is available 8 hours each day. The manufacturer wants to maximise the daily profit for these two products. Express this as a linear programming problem and solve it using the simplex method.


STAT7003: Exercises 3
1. Convert the following linear programming problem into a maximisation problem in standard form, hence use the simplex algorithm to find the optimal solution. Write down the optimal value of the objective function.

   minimise: z = x1 − 3x2 − 2x3
   subject to: 3x1 − x2 + 2x3 ≤ 7
               −2x1 + 4x2 ≤ 12
               4x1 − 3x2 − 8x3 ≤ 10
               x1 , x2 , x3 ≥ 0
2. Write down the following linear programming problem in canonical form using artificial variables. State clearly which variables are slack, surplus and artificial variables. What is the auxiliary objective function z* in terms of the control variables xi? You do not need to solve this problem.

   maximise: z = x1 + 2x2 − x3
   subject to: x1 + x2 ≥ 4
               x1 − 2x2 + x3 = 3
               2x1 + 2x2 + x3 ≤ 12
               x1 + x2 − x3 ≤ 6
               x1 , x2 , x3 ≥ 0
3. Write down the following linear programming problem in canonical form using artificial variables. Solve it using the big-M method.

   maximise: z = x1 + 2x2
   subject to: x1 + x2 = 10
               x2 ≤ 8
               x1 , x2 ≥ 0
4. Write down the following linear programming problem in canonical form using artificial variables. Solve it using the two-phase method.

   maximise: z = 4x1 − x2
   subject to: x1 + x2 ≤ 4
               x1 ≤ 8
               x1 + x2 ≥ 1
               x1 , x2 ≥ 0


STAT7003: Exercises 4
1. Given the following pay-off matrix, find which of A's strategies are dominant, stating which strategies are dominated. Also find which of B's strategies are dominant, stating which strategies are dominated.
Write down the sub-pay-off matrix.

                   Player B
             B1   B2   B3   B4
        A1    7    0    3    2
 Player A2    5   −1    1    4
   A    A3    3    4    0    0
        A4    6    3    2    8

2. Find the saddle point in the following pay-off matrix, and hence state the optimal policy for A and B.

                   Player B
             B1   B2   B3   B4   B5
        A1    4   −2   −6    6   −3
 Player A2    7    5    3    4   −1
   A    A3    1   −4    0    5    0
        A4   −6    5    7    2   −2
        A5    2    6    4    3    0

3. Consider a game between players A and B with the following pay-off matrix.

                   Player B
             B1   B2   B3
        A1    2    1    6
 Player A2    0    5    4
   A    A3    4    0    1

Explain why the value of this game must be in the interval [1, 4].
4. Find an upper and lower bound for the value of the following game between players A and B.

                   Player B
             B1   B2   B3   B4
        A1    3   −3    5   −6
 Player A2    2    2   −6    7
   A    A3   −4    5    0    2

5. Find the saddle point of the following 2 games, and so give the value of the game and the optimal strategies.

(a)                Player B
             B1   B2
        A1    2   −1
 Player A2    2    4
   A

(b)                Player B
             B1   B2   B3
        A1    8   −5   −4
 Player A2   −1    4   −3
   A    A3   −7    7   −5

6. Find the region of values p and q such that the matrix entry (A1, B3) is a saddle point for a game with the following pay-off matrix. Hint: it may help to sketch a graph of q against p, plotting the inequalities that are required to make the saddle point.

                       Player B
             B1        B2          B3
        A1    3     −(q + 1)        p
 Player A2   2p         0           2
   A    A3    q         5       −(q − 2)

7. Solve the following game between players A and B graphically and find the value of the game.

                   Player B
             B1   B2   B3   B4
        A1    2    3   −4    6
 Player A2    3    4    5   −4
   A

8. In the game scissors-paper-stone two players simultaneously choose one of those three
objects. If both players make the same choice then it is a draw, otherwise the winner is
determined by the following three rules:
scissors cuts paper
stone blunts scissors and
paper covers stone.
Draw up a zero-sum game pay-off matrix for this game. What is the value of this game?
Without solving explicitly suggest what the optimal policy is, and explain why.
9. Consider a game between players with the following pay-off matrix.

                   Player B
             B1   B2   B3
        A1   −2   −2    2
 Player A2   −1    1    0
   A    A3    7    2   −2

Draw up the simplex algorithm initial table to solve this problem.

The final table (to 3 d.p.) is given below.

         Y1   Y2   Y3     Y4      Y5      Y6    Soln.
   Y3     0    0    1    .226   −.036   −.015   .173
   Y2     0    1    0   −.211    .368   −.053   .105
   Y1     1    0    0    .083   −.180    .128   .030
   Z      0    0    0    .098    .150    .060   .308

State the optimal policy for both players (to 2 d.p.) and the value of the game.


STAT7003: Exercises 5
1. A real estate development firm is comparing the following alternative development projects: building and leasing an office park, building an office block for rent, buying and leasing a warehouse, building a shopping centre, and building then selling a block of flats.
The financial payoffs of these projects depend on the interest rate levels over the next five years. They could decline, remain stable or increase. The financial returns for the projects (£ millions) are shown in the following payoff table.

                          Interest rates
 Project            Decline   Stable   Increase
 office park           2        6.8       18
 office block          6        7.6       9.4
 warehouse            6.8       5.6        4
 shopping centre      1.8       9.6      15.6
 block of flats      12.8        6        2.4

(a) Check for dominated investments, and exclude them (if any) from the payoff table.
(b) Find the optimal investment solutions based on the following decision principles: maximax, Laplace criterion, and Hurwicz criterion.
(c) Construct the regret matrix, and obtain the minimax regret strategy for the firm.
2. In an election year a UK investor must decide whether to invest capital in local stocks, long term gilts or an overseas mutual fund. Over the next four years the returns on the investment will depend on which of the three parties will win the election. With the help of a broker, and after reading daily comments in the business section of the newspapers, the investor compiles the following table of predicted annual percentage returns and their dependence on the winning party. The investor wishes to maximise the percentage return on the invested capital.

                        Winning Party
                      P1     P2     P3
 Local stocks        15%     5%    12%
 Long term gilts      7%    10%     7%
 Mutual fund          5%    11%    10%

(a) Check for dominated investments, and exclude them (if any) from the payoff table.
(b) Find the optimal investment solutions based on the following decision principles: maximax, Laplace criterion, Hurwicz criterion and minimax regret.
(c) The investor estimates that the winning chances of the three parties P1, P2 and P3 are 0.4, 0.5, and 0.1 respectively. Calculate the expectations of investment returns. Find the investment with the largest expected return and the one with the smallest expected variance of return.

3. For the network shown below:

(a) The first forward recursion equation is

    f1(S1) = max_{x∈{A}} { r_{x,S1} },    S1 ∈ {B, C, D}.

Write down equations for f2(S2), f3(S3) and f4(S4).

(b) Use these forward recursion equations to find the optimal route, and the optimal reward.

(c) The first backward recursion equation is

    g3(S3) = max_{x∈{K}} { r_{S3,x} },    S3 ∈ {G, H, I, J}.

Write down equations for g2(S2), g1(S1) and g0(S0).

(d) Use these backward recursion equations to find the optimal route and optimal reward.

[Network diagram: node A at stage 0; B, C, D at stage 1; E, F at stage 2; G, H, I, J at stage 3; K at stage 4, with rewards (including 10, 10, 10, 6, 12, 7, 9, 11, 8) on the arcs.]

4. Resource Allocation Problem

The following table lists the expenditures and revenues (in £M) for a resource allocation problem.
Assume that the maximum budget is £6M and they want to maximise their revenue. Formulate this problem as a dynamic programming problem (defining the stages and states), draw the network diagram, and use forward recursion to find the optimal solution.

 Proposed        Plant 1            Plant 2            Plant 3
 Expansion   expend  revenue    expend  revenue    expend  revenue
     0          0       0          0       0          0       0
     1          1       3          2       3          1       2
     2          3       5          3       4          2       5

5. Consider the resource allocation problem. Draw the network associated with tables (a) and (b) below, where ci and Ri denote the expenditure and revenue for the expansion of plant i. Assume that the total available capital is £8 million, and that plant i must be enlarged, unless it specifically has a proposal with zero ci and zero Ri.

(a)
 Proposal   Plant 1     Plant 2     Plant 3
            c1   R1     c2   R2     c3   R3
     1       3    5      3    4      0    0
     2       4    6      4    5      2    3
     3                   5    8      3    5
     4                               6    9

(b)
 Proposal   Plant 1     Plant 2     Plant 3     Plant 4
            c1   R1     c2   R2     c3   R3     c4   R4
     1       0    0      1   1.5     0    0      0    0
     2       3    5      3    5      1   2.1     2   2.8
     3       4    7      4    6                  3   3.6

6. (a) Solve Problem (5a) using the forward recursive equations.


(b) Similarly, using backward recursion obtain the optimal solution to Problem (5a).
7. Solve Problem (5b) using forward recursion.
8. A company is planning its advertising strategy for next year for its main product, in three different European countries. Since these countries have different languages and cultures, the advertising campaigns are developed independently. Six million pounds are available for advertising next year, the advertising expenditure for each product must be a multiple of £1 million, and each product must be advertised. The vice-president for marketing has established the objective: to determine how much to spend on each product in order to maximise its total increase in sales.
The following table gives the estimated increase in sales (£ millions) for the different advertising expenditures. Note: this is very similar to the resource allocation problem.
(a) What are the stages and states in this dynamic programming problem?
(b) Sketch a network of the problem (you do not need to include the rewards on the network).
(c) Solve the problem using forward recursion.

 Advertising        Country
 Expenditure     1    2    3
      1          7    4    6
      2         10    8    9
      3         14   11   14
      4         17   14   15

9. Warehousing Problem
You are given the warehousing problem with T = 4, K = 1500, I1 = 750 and the prices and costs in the following table.

    t    pt   ct
    1    18   20
    2    20   23
    3    22   16
    4    20   20

Again assume that the warehouse should be empty after four months (I5 = 0).
Find the optimal values for Vt* and Qt* and the maximum future profit πt*. Hint: be particularly careful about your solutions for V2* and Q2*.
10. The University Shop sells college scarves during the three terms of the academic year.
Each term up to 100 scarves can be kept in the shop stores. The shop has to order, during
the current term, the number of scarves it wants delivered at the start of the next term.
At the start of term 1 they have 30 scarves in the store room. Assume that all scarves
taken from the store room and placed in the shop during the term will be sold.
They want to have an empty store at the end of term 3, and do not need to order any
scarves during term 3 as this can be done during the long summer vacation.
The costs and selling prices are given in £s below.
Formulate this as a warehousing problem using the shop store as the warehouse.
Use this to find the optimal number of scarves to be ordered during each term (for delivery
at the start of the next term), and the optimal profit that can be expected.
term             1    2    3
cost price       7    6    6
selling price   10   10    7


Appendix 1: Exercises 6

STAT7003: Exercises 6
1. The following MDP uses the same notation as given in Chapter 6 of the lecture notes.
Assume that there are 2 stages to go, and at both stages the state space is {0, 1}. You are
currently in state 1, i.e. X2 = 1. At each stage there are two possible actions, a1 and a2.
The terminal rewards are r0^(0) = 0 and r1^(0) = 2.
The transition probabilities are (independent of the stage)

P(a1) =
  .8  .2
  .5  .5

P(a2) =
  .1  .9
  .2  .8

The transition rewards for action a1 are (independent of the stage)

r(a1) =
  0  1
  1  2

The transition rewards for action a2 are all zero.


Use a table similar to the one in Section 6.4 to find the optimal expected reward, and the
optimal actions for this problem.
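The backward induction asked for can be sketched in code using the data above; V[n](i) denotes the optimal expected reward with n stages to go.

```python
# Backward-induction sketch for the 2-stage MDP above.
P = {"a1": [[0.8, 0.2], [0.5, 0.5]],
     "a2": [[0.1, 0.9], [0.2, 0.8]]}
r = {"a1": [[0, 1], [1, 2]],
     "a2": [[0, 0], [0, 0]]}          # rewards for a2 are all zero
V = [0, 2]                            # terminal rewards r0^(0) = 0, r1^(0) = 2
policy = []
for n in (1, 2):                      # stages to go
    newV, acts = [], []
    for i in (0, 1):
        # expected reward of each action: sum_j p_ij (r_ij + V(j))
        vals = {a: sum(P[a][i][j] * (r[a][i][j] + V[j]) for j in (0, 1))
                for a in ("a1", "a2")}
        best = max(vals, key=vals.get)
        newV.append(vals[best])
        acts.append(best)
    V, policy = newV, acts
```

After the loop, V holds the optimal expected rewards with 2 stages to go and policy the corresponding optimal actions for states 0 and 1.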
2. A financial company makes heavy use of a computer network for its trading. The network
server they use is old and is due to be replaced in a month's time. Currently the server
often becomes unstable, and the best way to make the server revert to a stable state
is to reboot it.
Assume that the decision to reboot or not reboot the server is made once an hour. If
the server is unstable, then it will certainly remain unstable until it is rebooted. If it is
unstable and is rebooted it will be stable and fully operational in one hour 90% of the
time, but 10% of the time the reboot is unsuccessful and the server is still unstable. If the
server is stable, then there is a 40% probability it will be unstable in one hour's time if
the machine is not rebooted. However, if the server is rebooted when it is stable, it will
definitely be stable in one hour's time.
If the server is not rebooted, the reward is 20 units if the machine is stable at the beginning
of the hour, and 10 units if it is unstable. If the machine is rebooted, the down-time and
the disruption caused means that there is zero reward for that hour.
For the last hour of the trading day, there is no point in rebooting as this can be done
overnight. So, assume that the terminal rewards are 20 and 10 for a stable and an unstable
server respectively.
(a) Write down the transition matrices for the state of the server depending on the action
of reboot or don't reboot.
(b) Write down the recursion equation dening the optimal value function with n stages
to go.
(c) Find the optimal actions with 3 stages to go.
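The recursion from part (b) can be iterated numerically as a sketch. The state encoding (0 = stable, 1 = unstable) and the action names are choices made here, not part of the problem statement.

```python
# Backward-induction sketch for the server problem.
# State 0 = stable, 1 = unstable (an assumed encoding).
P = {"keep":   [[0.6, 0.4],    # stable, no reboot: unstable w.p. 0.4
                [0.0, 1.0]],   # unstable, no reboot: certainly stays unstable
     "reboot": [[1.0, 0.0],    # stable + reboot: definitely stable
                [0.9, 0.1]]}   # unstable + reboot: fixed 90% of the time
R = {"keep": [20, 10],         # reward for the coming hour, by current state
     "reboot": [0, 0]}         # rebooting earns nothing that hour
V = [20, 10]                   # terminal rewards
for n in range(3):             # iterate up to 3 stages to go
    V = [max(R[a][i] + sum(P[a][i][j] * V[j] for j in (0, 1))
             for a in ("keep", "reboot"))
         for i in (0, 1)]
```

After three iterations, V gives the optimal expected rewards with 3 stages to go for a stable and an unstable server; comparing the two terms inside the max at each step reveals the optimal actions.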


3. A self-employed person can greatly affect the standing of her company depending on how
hard she works. The harder she works the stronger the business becomes and the greater
the profits. However, she realises there is a downside to working hard in that she has less
time for her personal life. Also, working hard when the business is very successful is
very stressful and could lead to a nervous breakdown.
She asks your advice as to how hard she should work depending on the standing of her
company from year to year. After a brief discussion you simplify the problem to have the
following format:
The states (standing) of the business are X = 0 bankrupt, X = 1 poor business, X = 2
average business, X = 3 successful business or X = 4 very successful business.
The actions she can take are a = 1 slack off, a = 2 work as normal, or a = 3 work
very hard. Each stage is a year, and the reward for that year is related to the value of
the company minus the personal cost of working, in the form:
Rn(Xn = i, an = 1) = i^3
Rn(Xn = i, an = 2) = i^3 - 10
Rn(Xn = i, an = 3) = i^3 - 25.
The terminal reward at retirement is the value of the business, namely:
R0(X0 = i) = i^3.
The transition probabilities, which depend on the amount she works during that year, are
given by the transition matrices:

P(a = 1) =
  1     0     0     0     0
  0.8   0.1   0.1   0     0
  0.1   0.5   0.3   0.1   0
  0.1   0.3   0.3   0.2   0.1
  0     0.1   0.3   0.4   0.2

P(a = 2) =
  1     0     0     0     0
  0.1   0.7   0.2   0     0
  0.1   0.1   0.6   0.2   0
  0     0.1   0.1   0.6   0.2
  0     0     0.1   0.4   0.5

P(a = 3) =
  1     0     0     0     0
  0     0.3   0.4   0.3   0
  0     0     0.3   0.4   0.3
  0.1   0     0.1   0.3   0.5
  0.3   0     0     0.1   0.6

(a) Explain in words why the top row of each transition matrix is (1, 0, 0, 0, 0).


(b) Write out the table to obtain the optimal expected reward and the optimal actions,
with two and one stages to go, for each of the five states.
(c) Give an informal explanation as to why the optimal action with two stages to go
starting in state 1 is to take action a = 1, to slack off.
Do you think that starting in state 1 with many stages to go would give a different
optimal action? Why?
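The table in part (b) can be sketched computationally, iterating Vn(i) = max_a { Rn(i, a) + sum_j pij(a) Vn-1(j) } for one and two stages to go:

```python
# Sketch of the backward induction for the self-employed-person MDP.
P = {
    1: [[1, 0, 0, 0, 0],
        [0.8, 0.1, 0.1, 0, 0],
        [0.1, 0.5, 0.3, 0.1, 0],
        [0.1, 0.3, 0.3, 0.2, 0.1],
        [0, 0.1, 0.3, 0.4, 0.2]],
    2: [[1, 0, 0, 0, 0],
        [0.1, 0.7, 0.2, 0, 0],
        [0.1, 0.1, 0.6, 0.2, 0],
        [0, 0.1, 0.1, 0.6, 0.2],
        [0, 0, 0.1, 0.4, 0.5]],
    3: [[1, 0, 0, 0, 0],
        [0, 0.3, 0.4, 0.3, 0],
        [0, 0, 0.3, 0.4, 0.3],
        [0.1, 0, 0.1, 0.3, 0.5],
        [0.3, 0, 0, 0.1, 0.6]],
}
cost = {1: 0, 2: 10, 3: 25}            # personal cost of each action
V = [i ** 3 for i in range(5)]         # terminal rewards R0(i) = i^3
tables = []                            # tables[0]: 1 stage to go, tables[1]: 2
for n in (1, 2):
    rows = []
    for i in range(5):
        vals = {a: i ** 3 - cost[a] + sum(P[a][i][j] * V[j] for j in range(5))
                for a in (1, 2, 3)}
        a_star = max(vals, key=vals.get)
        rows.append((vals[a_star], a_star))
    tables.append(rows)
    V = [row[0] for row in rows]
```

Consistent with part (c), the computation gives a = 1 (slack off) as the optimal action in state 1 with two stages to go.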
4. A company has asked an employment agency to send interviewees for a vacant position.
The rewards for four different standards of applicant are:
100 for an Excellent candidate,
60 for Good,
30 for Poor and
0 for a Hopeless candidate.
The employment agency is not a very good one, and so the standard of the candidate they
send is equally likely to be Excellent, Good, Poor or Hopeless. The cost of continuing and
having a further interview is 20. Assume that the agency has a fixed maximum number of
applicants they will send.

(a) Formulate this as an optimal stopping MDP. Write down the formula for Vn(i), the
optimal reward given that the candidate sent with n stages to go is of standard i.
(b) Solve this problem by writing down the optimal values in tabular format in order to
find the optimal policy with 3 stages to go.
(c) Can you suggest what the optimal strategy would be if the agency will continue to
send candidates indefinitely (until one is employed by the company)?
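The recursion in part (b) can be sketched as follows, under the assumption that with 0 stages to go the current candidate must be accepted, i.e. V0(i) equals the reward of candidate i:

```python
# Optimal-stopping sketch: accept the candidate, or pay 20 and see another.
rewards = {"Excellent": 100, "Good": 60, "Poor": 30, "Hopeless": 0}
COST = 20                      # cost of a further interview
V = dict(rewards)              # 0 stages to go: assumed forced to accept
for n in (1, 2, 3):            # stages to go
    # value of continuing: pay the cost, next candidate is uniform over types
    continue_value = -COST + sum(V.values()) / 4
    V = {i: max(r, continue_value) for i, r in rewards.items()}
```

With 3 stages to go the continuation value exceeds the rewards for Poor and Hopeless candidates, so the policy is to accept only Excellent or Good candidates.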
5. Recompute the optimal policy of Example 6.2 (page 88) but now using a discount factor:
(a) of β = 0.9 and
(b) of β = 0.4.
Compare these solutions to the non-discounted solution.
6. An infinite horizon MDP has the following form. There are 3 states 0, 1 and 2, and two
actions a = 1 and a = 2. Given action a = 1, the transitions to each state are equally
likely and the reward is always 2. Given action a = 2, the transition probabilities are 0.9
that the state remains the same and 0.05 that it moves to one of the other two states. The
rewards are

rij(a = 2) =   0   for j = 0
               1   for j = 1
               10  for j = 2.

There is a discount factor of β.

(a) Write down the transition matrices for both actions, and find R(i, a) for all values of
i and a.

(b) Write down an expression for the optimal value function V(i).
(c) Let β = 3/4 and V0(i) = 0 for all i. Using

Um+1(i, a) = R(i, a) + β Σ_{j=0}^{2} pij(a) Vm(j)

find U1(i, a) for all values of i and a. Use this to find the first iteration of the value
function V1(i) for all values of i.
(d) Repeat step (c) to find U2(i, a) and V2(i).
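The two iterations in parts (c) and (d) can be sketched as value iteration with β = 3/4:

```python
# Value-iteration sketch for the infinite-horizon MDP above.
beta = 0.75
P = {1: [[1/3, 1/3, 1/3] for _ in range(3)],   # a = 1: uniform transitions
     2: [[0.9, 0.05, 0.05],
         [0.05, 0.9, 0.05],
         [0.05, 0.05, 0.9]]}                    # a = 2: "sticky" transitions
r = {1: [2, 2, 2],                              # a = 1: reward always 2
     2: [0, 1, 10]}                             # a = 2: reward depends on j
# Expected one-step reward R(i, a) = sum_j p_ij(a) r_j(a)
R = {a: [sum(P[a][i][j] * r[a][j] for j in range(3)) for i in range(3)]
     for a in (1, 2)}
V = [0.0, 0.0, 0.0]                             # V0(i) = 0
for m in range(2):                              # two iterations: V1, then V2
    U = {a: [R[a][i] + beta * sum(P[a][i][j] * V[j] for j in range(3))
             for i in range(3)] for a in (1, 2)}
    V = [max(U[1][i], U[2][i]) for i in range(3)]
```

After the loop, U holds U2(i, a) and V holds V2(i) for each state.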

Appendix 2: Reading List


This course covers a range of topics that are not usually covered all in one book. The following
list is for further reading, and the ones with asterisks are more relevant than the others.
* F.S. Hillier & G.J. Lieberman, Introduction to Operations Research (2005, McGraw Hill).

* B. Kolman & R.E. Beck, Elementary Linear Programming with Applications (1980, Academic Press).

D. Bertsimas & J. Tsitsiklis, Introduction to Linear Optimization (1997, Athena Scientific).

G. Gordon & I. Pressman, Quantitative Decision-Making for Business (1978, Prentice Hall).

M. Wilkes, Operational Research Analysis and Applications (1989, McGraw-Hill).

A.K. Dixit & S. Skeath, Games of Strategy (1999, Norton).

K.G. Binmore, Fun and Games (1992, Houghton Mifflin).

R. Gibbons, A Primer in Game Theory (1992, Pearson Education).

* W.L. Winston, Operations Research: Applications and Algorithms (1994, Duxbury Press).

S.M. Ross, Introduction to Stochastic Dynamic Programming (1983, Academic Press).

* J. Bather, Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions (2000, Wiley).
