Optimisation Algorithms
in
Operational Research
Course Notes
2011/2012
Contents

Course Outline

1 Linear Programming
1.1 Introduction
1.2 Mathematical Programming

2 General Notation
2.2 Numerical Naïve Method
2.9 Two-Phase Method

3 Sensitivity Analysis
3.1 Sensitivity Analysis
3.2 Duality

4 Game Theory
4.1 Introduction
4.2 Basic Formulation
4.3 Nash Equilibrium
4.4 Dominance
4.5 Saddle Points

5 Dynamic Programming
5.1 Introduction
5.2 Forward Recursion
5.3 Backward Recursion

6 Stochastic Optimisation
6.1 Introduction
6.2 Markov Chains
6.5 Discounting
Course Outline
Lecturer
Dr. Ricardo Silva (room 139) ricardo@stats.ucl.ac.uk
Aims of course
To provide an introduction to the ideas underlying the optimal choice of component variables, subject
to constraints, that maximise (or minimise) an objective function. The algorithms described are both
mathematically interesting and applicable to a wide variety of complex real life situations.
Objectives of course
On successful completion of the course, a student should be able to understand the theoretical concepts
of linear programming, dynamic programming and finite Markov programming, set up correct models of
real-life problems, interpret results correctly and check the validity of assumptions.
Applications
Optimisation methods provide the means for successful business strategies, scientific planning and
statistical estimation under constraints. They are a critical component of any area where decision
making under limited resources is necessary.
Prerequisites
STAT1004 or equivalent.
Course content
Linear programming: graphical solution techniques, simplex method, sensitivity analysis.
Game theory: Zero-sum two player games, minimax, maximin, Laplace, Hurwicz and minimax regret
strategies, linear programming formulation.
Dynamic programming: systems, states, stages, principle of optimality, forward and backward recurrence.
Markov sequential processes: Markov processes with rewards, value iteration, policy iteration, sequential
decision processes.
Texts
This course covers a range of topics that are not usually covered all in one book. This is an abridged list
of textbooks. A more detailed list is given in Appendix 2 of the course notes.
F.S. Hillier & G.J. Lieberman, Introduction to Operations Research (2005, McGraw Hill).
G. Gordon & I. Pressman, Quantitative Decision-Making for Business (1978, Prentice Hall).
B. Kolman & R.E. Beck, Elementary Linear Programming with Applications (1980, Academic Press).
W.L. Winston, Operations Research: Applications and Algorithms (1994, Duxbury Press).
S.M. Ross, Introduction to Stochastic Dynamic Programming (1983, Academic Press).
Assessment
The final mark is a 9 to 1 weighted average of the written examination and in-course assessment marks.
Timetabled workload
Lectures and problems classes: 3 hours per week in term 1.
Chapter 1
Linear Programming
1.1
Introduction
What is OR?
OR stands for Operational research or alternatively Operations Research, and is sometimes
called Management Science.
To the mathematician, OR means maximising some function subject to a set of constraints.
To a manager, OR means building a model for his/her business which:
1.2
Mathematical Programming
Optimisation is the problem of finding the maximum or minimum of a function f (x) and the
value of x at which f (x) attains that maximum/minimum.
If x* is the value that minimises f (x), then x* is also the value that maximises −f (x), so a
minimisation problem can be trivially turned into a maximisation problem.
Example 1.1 Unconstrained Minimisation
Minimise
z = x1^2 + x2^2.
It is easy to see that x1 = 0 and x2 = 0 gives the unique minimum, with z = 0. For a less
obvious problem, partial derivatives would yield the solution.
Example 1.2 Constrained Minimisation
Minimise
z = x1^2 + x2^2
subject to the conditions
x1 − x2 = 3
x2 ≥ 2.
Note that the value of z is the squared distance of the point (x1, x2) from the origin (Pythagoras's
Theorem). From the figure, the point that satisfies the constraints and is also the closest to the
origin is x1 = 5, x2 = 2, which is the optimal solution.
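The constrained optimum can also be checked numerically. A minimal sketch, assuming the constraints above (x1 − x2 = 3 and x2 ≥ 2): on the feasible line x1 = x2 + 3, we simply search over values of x2 ≥ 2.

```python
# Brute-force check of Example 1.2: search along the feasible line
# x1 = x2 + 3 with x2 >= 2, minimising z = x1^2 + x2^2.
best = min(((x2 + 3)**2 + x2**2, x2 + 3, x2)
           for x2 in [2 + 0.01*k for k in range(1000)])
print(best)  # (29.0, 5.0, 2.0): z = 29 at (x1, x2) = (5, 2)
```

The squared distance is increasing in x2 on this line, so the search confirms the minimum sits at the constraint boundary x2 = 2.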
A mathematical program is an optimisation problem in which the objective and constraints are
given as mathematical functions and functional relationships:

maximise f (x1, x2, . . . , xn)
subject to:
g1 (x1, x2, . . . , xn) {≤, =, ≥} b1
g2 (x1, x2, . . . , xn) {≤, =, ≥} b2
...
gm (x1, x2, . . . , xn) {≤, =, ≥} bm

The xj are called the control variables (or decision variables), f is the objective function and the bi
(i = 1, . . . , m) are the resource values.
1.3
Definition
A mathematical program is a linear program if f (x1, x2, . . . , xn) and all gi (x1, x2, . . . , xn) are
linear in the decision variables, i.e. if f and all gi can be written as:
f (x1, x2, . . . , xn) = c1x1 + c2x2 + . . . + cnxn
and
gi (x1, x2, . . . , xn) = ai,1x1 + ai,2x2 + . . . + ai,nxn,
where the cj and ai,j (i = 1, 2, . . . , m and j = 1, . . . , n) are known constants and the xj are continuous
real variables.
In most problems we deal with in this course we also insist that the xi are non-negative.

Definitions
If all xi are non-negative variables and all the constraints are equalities then the LPP is in
canonical form.
If all xi are non-negative variables and all the constraints use ≤ then the LPP is in standard
form.
The feasible region for a LPP is the set of all points (x1, . . . , xn) that satisfy all the constraints
(and satisfy xi ≥ 0 ∀i if the xi are non-negative variables).
Example 1.3 Clock Manufacturer
A clock manufacturer produces x1 standard clocks and x2 alarm clocks each day.
The resources available each day are 1600 hours of labour, 1800 hours of processing and 350
alarm assemblies.
Each standard clock requires 2 hours labour and 6 hours processing, and gives 3 profit. Each
alarm clock requires 4 hours labour, 2 hours processing and 1 alarm assembly, and gives 8 profit.
So we wish to maximise z = 3x1 + 8x2 subject to
2x1 + 4x2 ≤ 1600   (labour)
6x1 + 2x2 ≤ 1800   (processing)
x2 ≤ 350           (alarm assemblies)
x1 ≥ 0 and x2 ≥ 0.
There are several methods of solving this problem: graphical solution methods and numerical
methods, including the Simplex algorithm.
1.4 Graphical Solution
[Figure: the feasible region of the clock problem, with standard clock production x1 on the
horizontal axis and alarm clock production x2 on the vertical axis. The labour, processing and
alarm assemblies constraints bound the region; z contours are drawn for z = 1200, z = 3100 and
z = 5200, and the optimum lies at the vertex (100, 350).]
By taking the line x2 = −(3/8)x1 + c for some arbitrary value of c, and moving it up and down
in a parallel fashion (i.e. increasing/decreasing c whilst keeping the gradient the same), we find
the line with the greatest intercept that still intersects the feasible region. This gives us both the
maximum value of z and the values of x1 and x2 that maximise it.
In this example (x1*, x2*) = (100, 350) gives the optimal value z* = 3100.
If, instead, the objective function were z = 3x1 + 2x2 then the optimal solution would become
(x1*, x2*) = (200, 300) with z* = 1200.
Exercise: Check this.
Note: In both these examples, the optimising values of the control variables are unique and lie
at a vertex of the feasible region.
1.5
Definitions
An extreme point of a region R is any vertex on the boundary of R.
R is bounded if it can be enclosed within a sphere of finite radius (its points cannot be arbitrarily
far from the origin).
For a linear programming problem, if the feasible region is nonempty and bounded then an
optimal solution exists.
If the optimal solution is unique then it occurs at one of the extreme points of the feasible
region.
If a set of points all give the same maximal value of z then these points lie on a straight
line on the boundary of the feasible region.
Examples
The previous example of the alarm clock assembly problem is one that gives a unique solution.
Two other situations can arise.
The feasible region may be empty: if the constraints contradict one another, no point satisfies
them all and an LPP such as maximise z = x1 + x2 subject to contradictory constraints has no
solution at all.
The feasible region may be unbounded: then an objective such as z = x1 + x2 can be made
arbitrarily large and no optimal solution exists.
Finally, consider
maximise z = x1 + x2
s.t. x1 + x2 ≤ 5
     x1, x2 ≥ 0.
Every point on the segment of the line x1 + x2 = 5 with x1, x2 ≥ 0 is optimal: a set of points all
giving the same maximal value of z, lying on a straight line on the boundary of the feasible region.
Definition
The extreme points of a feasible region are called the feasible basis points.
1.6 The Naïve Method
If the feasible region is bounded, the Extreme Point Theorem tells us that the optimal solution
is at an extreme point of the feasible region (or on a line connecting two extreme points). So,
one can evaluate the objective function at each feasible basis point. The point that gives the
highest (lowest) value for z is the optimal solution.
Example 1.4 The alarm clock problem

Coordinates (x1, x2) | z = 3x1 + 8x2
(0, 0)               |    0
(300, 0)             |  900
(0, 350)             | 2800
(200, 300)           | 3000
(100, 350)           | 3100

Comments
If the feasible region is not bounded, the method may give a suboptimal solution.
For complex LPPs, finding the feasible basis points becomes time consuming.
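The extreme-point evaluation above is easy to script. A small sketch, using the clock problem's feasible basis points from Example 1.4:

```python
# Evaluate z = 3*x1 + 8*x2 at each feasible basis point of the clock problem
# and pick the best one, as the naive method prescribes.
vertices = [(0, 0), (300, 0), (0, 350), (200, 300), (100, 350)]

def z(v):
    x1, x2 = v
    return 3*x1 + 8*x2

best = max(vertices, key=z)
print(best, z(best))  # (100, 350) 3100
```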
1.7 Slack and Surplus Variables
In the optimal solution above, x1 = 100 and x2 = 350, which implies the following use of
resources:
labour: 2(100) + 4(350) = 1600;
processing: 6(100) + 2(350) = 1300;
alarm assemblies: 350.
So, only 1300 units per day of processing are required, whereas we have 1800 available. This
is a slack of 500. The slack for labour and alarm assemblies is zero.
Considering further the processing constraint
6x1 + 2x2 ≤ 1800,
we introduce a slack variable x4 and write
6x1 + 2x2 + x4 = 1800,   x4 ≥ 0.
(If we allowed negative values for x4 then 6x1 + 2x2 could be > 1800.)
So the inequality has been transformed into an equality. This will be utilised in the next chapter.
Suppose there had been another constraint that was a minimum (≥) not a maximum (≤). E.g.
the total number of clocks produced must be at least 100:
x1 + x2 ≥ 100.
Here we subtract a surplus variable, writing x1 + x2 − x5 = 100 with x5 ≥ 0.
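Mechanically, converting a row to an equality just appends one extra variable with coefficient +1 (a slack, for ≤) or −1 (a surplus, for ≥). A hedged sketch of that bookkeeping (the function name is illustrative, not from the notes):

```python
# Append a slack (+1) or surplus (-1) variable to turn an inequality into
# an equality, as done for the processing and minimum-production constraints.
def to_equality(coeffs, sense):
    assert sense in ("<=", ">=")
    return coeffs + [1 if sense == "<=" else -1]

print(to_equality([6, 2], "<="))  # [6, 2, 1]   i.e. 6x1 + 2x2 + x4 = 1800
print(to_equality([1, 1], ">="))  # [1, 1, -1]  i.e. x1 + x2 - x5 = 100
```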
1.8 Sensitivity Analysis
Sensitivity Analysis means determining the amount by which the optimal solution changes when
the specification of the LPP changes. How sensitive is the optimal solution to the correct model
specification?
The specification of the LPP may change in three ways:
1. The coefficients of the objective function (cj) change,
2. The resource constraints (bi) change, or
3. The coefficients of the constraints (ai,j) change.
Below, we look at 1 and 2.
Definition
The shadow price of a constraint is the amount by which the optimal value of the objective
function changes when the resource value of that constraint is increased by one unit.
Returning to the original clock problem.
Suppose we had increased the available labour resource by 1 unit to 1601. Then the optimal
solution will still be at the same vertex as before. The optimal value of x2 is still x2* = 350,
thus
x1* = (1601 − 4(350))/2 = 100.5
z* = 3(100.5) + 8(350) = 3101.5.
Because the problem is linear, the objective function increases by 1.5 for every unit increase in
the labour resource. The shadow price of the labour constraint is 1.5 (per labour unit).
Suppose, instead, the processing resource had increased by one unit. Then the optimal solution
would not change at all. This is because the optimal solution does not lie on the line corresponding
to the processing constraint. Thus, the shadow price for processing is zero. In general,
if the slack/surplus variable for a constraint is non-zero then the shadow price of the constraint
will be zero.
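The labour shadow price can be checked directly by re-solving the two binding constraints with a perturbed resource value. A sketch, assuming (as in the text) that labour and alarm assemblies remain the binding constraints; the function name is illustrative:

```python
# At the optimum the binding constraints are 2*x1 + 4*x2 = b_labour and
# x2 = 350; resolve them for a perturbed labour resource and compare z.
def optimal_z(b_labour):
    x2 = 350
    x1 = (b_labour - 4 * x2) / 2
    return 3 * x1 + 8 * x2

print(optimal_z(1601) - optimal_z(1600))  # 1.5
```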
Finally, suppose the number of alarm assemblies available increases by one. Then
x2* = 351,
x1* = (1600 − 4(351))/2 = 98,
z* = 3(98) + 8(351) = 3102,
so the shadow price of the alarm assemblies constraint is 2 (per assembly).
In the graph, the gradient of the assembly constraint is zero and the gradient of the labour
constraint is −1/2. For the optimum solution to be at the assembly-labour vertex, the gradient of
the objective function must be between −1/2 and 0.
If the gradient of the objective function is between −3 and −1/2, then a greater value of the
objective function can be obtained at the labour-processing vertex. If the gradient of the objective
function is exactly −1/2 then it is parallel to the labour constraint and any point on the labour
constraint between the two vertices is optimal.
Question: If z = c1x1 + c2x2, what values of c1 and c2 give the same optimal values x1* = 100,
x2* = 350?
So if we change only the value of c1, the same position of the optimal solution will be obtained
provided c1 remains positive and does not exceed (1/2)c2.
Similarly, if we change only c2, the position of the optimal solution will remain the same provided
c2 exceeds 2c1.
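These tolerance conditions amount to checking that the objective gradient −c1/c2 stays between the gradients of the two binding constraints. A small sketch (the function name is illustrative):

```python
# The optimum stays at the assembly-labour vertex (100, 350) while
# -1/2 <= -c1/c2 <= 0, i.e. 0 <= c1 <= c2/2 (for c2 > 0).
def same_vertex(c1, c2):
    return -1/2 <= -c1/c2 <= 0

print(same_vertex(3, 8))  # True  (the original objective)
print(same_vertex(5, 8))  # False (steeper than the labour constraint)
```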
Note: The position of the optimal solution remains the same (i.e. x1* and x2* remain unchanged),
but the optimal value of the objective function will change.
Example 2
Suppose a company makes two models of car. These models are called Alpha and Omega. The
number of Alpha models to be produced is x1 and the number of Omega models is x2.
We have the following linear programming problem:

maximise z = 6x1 + 5x2   (profit)
s.t. 4x1 + x2 ≤ 800      (materials)
     2x1 + 3x2 ≤ 900     (labour)
     x1 ≤ 180
     x2 ≤ 320.

The optimal solution lies where the materials and labour constraints intersect, giving
x1* = 150, x2* = 200 and z* = 1900.
Again, if the objective function is altered, the location of the optimal solution could change
to another vertex. We shall find the values of c1 and c2 for which the optimal solution is still
located at the same vertex.
For z = c1x1 + c2x2, the gradient of the z contours is −c1/c2. So, if
−4 ≤ −c1/c2 ≤ −2/3,
the optimal solution remains at the same vertex. For c1 = 6 this gives a tolerance interval for
c2 of [1.5, 9]: as the original value of c1 was 6, the original tolerance interval for c2 is [1.5, 9].
Exercise
1. What is the original tolerance interval for c1 ?
2. What are the shadow prices for the four constraints in this problem?
Chapter 2
General Notation
A LPP may be written using matrix notation. For example,

maximise z = cT x            (obj. fn.)
subject to Ax ≤ b, x ≥ 0     (constraints)    (2.1)

where
A = [ai,j] is an m × n matrix,
b is an m-vector of constraints (resources),
x is an n-vector of variables and
c is an n-vector of objective function coefficients.

The above LPP is in standard form. This is because equation (2.1) has a ≤ inequality, and the
elements of x are all non-negative.
If instead
Ax = b,  x ≥ 0,
the LPP is in canonical form.
By using slack/surplus variables, it is always possible to convert any LPP into canonical form.
For Ax ≤ b, we require one slack variable for each of the m rows of A. Call these xn+1, . . . , xn+m,
and write xc = (x1, . . . , xn+m)T. The canonical constraint matrix is then
Ac = [A  I],
where I is the m × m identity matrix.

2.1
Assume the rows of Ac are linearly independent. Otherwise at least one constraint is redundant
(i.e. follows from the other constraints) and should be removed.
Solving Ac xc = b gives a region R (in Rn+m) satisfying the m constraints of the canonical
problem.
Solving means: given Ac and b, find the set of xc that satisfy Ac xc = b. In general, this will not
be a unique solution because Ac has m rows and n + m columns.
The region S = {xc ∈ R : xc ≥ 0} is the feasible region of the canonical problem.
Definitions
If a vector xc solves the equation Ac xc = b and xc has at most m non-zero elements, xc is called
a basic solution of the LPP.
If xc is a basic solution of the LPP and every element of xc is non-negative, xc is called a feasible
basic solution of the LPP.
Theorem 2.1
Every extreme point of S is a feasible basic solution of the LPP, and conversely every feasible
basic solution of the LPP is an extreme point of S.
In the last chapter we showed that the optimal solution of a LPP lies at an extreme point of the
feasible region (or on a line connecting two extreme points). So, from the theorem above, the
optimal solution must be one (or possibly two) of the basic feasible solutions.
The following method involves nding all the basic solutions, determining which are feasible,
and then seeing which optimises the objective function.
2.2 Numerical Naïve Method
There are at most
C(m + n, m) = (m + n)! / (m! n!)
basic solutions, one for each choice of m columns of Ac.
Earlier, we wrote the clock LPP in canonical form. Using matrix notation:

[ 2 4 1 0 0 ] [x1]   [1600]
[ 6 2 0 1 0 ] [x2] = [1800]
[ 0 1 0 0 1 ] [x3]   [ 350]
              [x4]
              [x5]

There are C(5, 3) = 10 combinations of three columns to obtain. Using columns 1, 2 and 3,

[ 2 4 1 ] [x1]   [1600]
[ 6 2 0 ] [x2] = [1800]
[ 0 1 0 ] [x3]   [ 350]

gives x2 = 350, x1 = (1800 − 700)/6 = 183 1/3 and x3 = 1600 − 2(183 1/3) − 4(350) = −166 2/3,
so this basic solution is not feasible.
The following table summarises the results for five of the other combinations of three columns:

Columns | x1  | x2  | x3    | x4    | x5   | Feasible? | z
1 3 5   | 300 | 0   | 1000  | 0     | 350  | Y         | 900
1 4 5   | 800 | 0   | 0     | -3000 | 350  | N         |
2 3 4   | 0   | 350 | 200   | 1100  | 0    | Y         | 2800
2 3 5   | 0   | 900 | -2000 | 0     | -550 | N         |
2 4 5   | 0   | 400 | 0     | 1000  | -50  | N         |

Checking all ten combinations, the largest feasible value of the objective function comes from
columns 1, 2 and 4, with x1 = 100, x2 = 350, x3 = 0, x4 = 500, x5 = 0, giving z = 3100, which
confirms the result found graphically.
Example 2.3 Another Example
The following LPP is in canonical form:

maximise z = x1 + x2
s.t. x1 + 2x2 + x3 = 10
     2x1 + x2 + x4 = 12
     x1, . . . , x4 ≥ 0,

giving

Ac = [ 1 2 1 0 ]     xc = (x1, x2, x3, x4)T,     b = (10, 12)T.
     [ 2 1 0 1 ]

Using columns 1 and 3 gives x1 = 6, x2 = 0, x3 = 4 and x4 = 0. As x1 ≥ 0 and x3 ≥ 0, the solution
is feasible. Now compute z, which equals 6.
Exercise: repeat this for each combination of two columns of Ac, to find the optimal solution.
Comments
1. The above is simply a numerical version of the graphical naïve method (Section 1.6). The
graphical method is difficult for n ≥ 3.
2. The number of possible solutions grows quickly with the problem size: m = 5, n = 5 gives
C(10, 5) = 252 basic solutions.
3. We have looked at the naive numerical method for the case where all m constraints are
≤. When this is not so, the LPP in canonical form will still be Ac xc = b, but Ac may
not have m + n columns. The principle remains the same, however. We still take m × m
submatrices of Ac (A' say) and solve A' x' = b. The vector x' is still of length m and
there are still m basic variables. The number of non-basic variables will be l − m, where l
is the number of columns of Ac.
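The naive numerical method can be sketched directly from its definition: try every choice of m columns, solve the square system exactly, and keep the feasible solutions. A minimal implementation for the clock problem, using exact rational arithmetic; the helper names are illustrative:

```python
from fractions import Fraction
from itertools import combinations

# Naive enumeration for the clock LPP in canonical form: choose m = 3
# columns of Ac, solve the 3x3 system, keep feasible basic solutions.
A = [[2, 4, 1, 0, 0], [6, 2, 0, 1, 0], [0, 1, 0, 0, 1]]
b = [1600, 1800, 350]
c = [3, 8, 0, 0, 0]

def solve3(cols):
    # Gauss-Jordan elimination on the 3x3 submatrix (None if singular).
    M = [[Fraction(A[i][j]) for j in cols] + [Fraction(b[i])] for i in range(3)]
    for k in range(3):
        p = next((r for r in range(k, 3) if M[r][k] != 0), None)
        if p is None:
            return None
        M[k], M[p] = M[p], M[k]
        for r in range(3):
            if r != k:
                f = M[r][k] / M[k][k]
                M[r] = [a - f*t for a, t in zip(M[r], M[k])]
    return [M[i][3] / M[i][i] for i in range(3)]

best = None
for cols in combinations(range(5), 3):
    x = solve3(cols)
    if x and all(v >= 0 for v in x):
        full = [Fraction(0)] * 5
        for j, v in zip(cols, x):
            full[j] = v
        z = sum(ci*xi for ci, xi in zip(c, full))
        if best is None or z > best[0]:
            best = (z, full)

print(best[0], best[1][:2])  # 3100 [Fraction(100, 1), Fraction(350, 1)]
```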
2.3 The Simplex Algorithm
By laying out the problem in a tabular format using this starting solution, we can successively
visit other feasible solutions in such a way that the objective function always increases. Once
z can be increased no more, we have reached the optimal solution. This method is called the
Simplex algorithm.
The advantage of the Simplex algorithm is that we do not have to investigate explicitly all of
the C(m + n, m) basic solutions.
Rearranging z = c1x1 + · · · + cnxn, we obtain
−c1x1 − · · · − cnxn + z = 0.
Our m constraints are Ac xc = b. So, we can write

[ Ac                 0 ] [  x1  ]   [ b1 ]
[                  ... ] [  ... ] = [ ... ]    (2.2)
[ −c1 ... −cn 0 ... 0 1 ] [ xn+m ]   [ bm ]
                          [  z   ]   [  0 ]

From this we have the general format of the initial tableau. For the clock problem it is:

basic | x1  x2  x3  x4  x5 | Sol
  x3  |  2   4   1   0   0 | 1600
  x4  |  6   2   0   1   0 | 1800
  x5  |  0   1   0   0   1 |  350
  z   | -3  -8   0   0   0 |    0

One basic solution is easy to read off from this tableau: x1 = x2 = 0 (the non-basic variables)
and x3 = 1600, x4 = 1800 and x5 = 350 (the basic variables), with z = 0. We say that this
tableau corresponds to this basic solution. Note that this basic solution is feasible, because
bi ≥ 0 ∀i.
The Simplex algorithm now looks for another basic feasible solution that has a greater value of
z. This new basic solution will be adjacent to the old one. That is, m − 1 of the basic variables
will be the same and one will be different.
First, we choose which of the non-basic variables to enter (into the set of basic variables).
Second, we choose which basic variable to exit (from the set of basic variables).

Step 1: Find the non-basic variable with the largest negative entry in the last row (the
objective function row or z row). This variable will enter.
In the example, it is x2, which has z-row entry −8.

Step 2: Find the basic variable corresponding to the row with the smallest positive θ-ratio. This
variable will exit. The θ-ratio for row i when xj is the entering variable is θi = bi / ai,j.
In this example, the θ-ratios are:
θ1 = 1600/4 = 400,
θ2 = 1800/2 = 900,
θ3 = 350/1 = 350.
The smallest ratio is for row 3, so x5 is the exiting variable. Row 3 is called the pivotal row.

Step 3: Use elementary row operations to isolate the entering variable.
A variable is isolated when the column corresponding to this variable consists of zero entries
except for the entry in the pivotal row, which equals one. So we want the x2 column, currently
(4, 2, 1, −8)T, to become (0, 0, 1, 0)T.
Row 3 already has a 1 in the x2 column. So there is no need to alter row 3, except that we
relabel it with x2, because the old basic variable x5 is being replaced by x2. We can write this
as 3' = 3.
Subtracting 4 times row 3 from row 1 gives the new row 1, (2, 0, 1, 0, −4 | 200): the element in
the x2 column is now zero, as required. We can write this as 1' = 1 − 4 × 3'. Similarly,
2' = 2 − 2 × 3' gives (6, 0, 0, 1, −2 | 1100), and z' = z + 8 × 3' gives (−3, 0, 0, 0, 8 | 2800).
The second tableau is:

basic | x1  x2  x3  x4  x5 | Sol
  x3  |  2   0   1   0  -4 |  200
  x4  |  6   0   0   1  -2 | 1100
  x2  |  0   1   0   0   1 |  350
  z   | -3   0   0   0   8 | 2800
Variables x2, x3 and x4 are now the basic variables and so x1 and x5 are non-basic. The
values of x2, x3, x4 and z can easily be read off the tableau:
x2 = 350, x3 = 200, x4 = 1100, z = 2800.
(Note that z = 3x1 + 8x2 does indeed equal 2800.)
We have now completed one full iteration of the Simplex algorithm.

Notes
z has increased.
The values of the basic variables and of z come from:
0x2 + 1x3 + 0x4 = 200, so x3 = 200;
0x2 + 0x3 + 1x4 = 1100, so x4 = 1100;
1x2 + 0x3 + 0x4 = 350, so x2 = 350;
0x2 + 0x3 + 0x4 + z = 2800, so z = 2800;
since x1 = x5 = 0.
It is easy to read off the values of the basic variables because the submatrix consisting
of the columns corresponding to the basic variables is the identity matrix (with columns
permuted).
Second iteration
We now repeat the same process to see if we can get a solution that is even better.
The entering variable is x1 (z-row entry −3). The θ-ratios are 100, 183 1/3 and ∞ (row 3 has a
zero in the x1 column). The exiting variable is x3 in row 1. The third tableau is:

basic | x1  x2  x3   x4  x5 | Sol
  x1  |  1   0  1/2   0  -2 |  100
  x4  |  0   0  -3    1  10 |  500
  x2  |  0   1   0    0   1 |  350
  z   |  0   0  3/2   0   2 | 3100

Stopping criterion
When the last row has non-negative entries for every non-basic variable, the stopping
criterion has been met. In most cases that means that the optimal solution has been reached,
which can be read off directly from the tableau. The exception to this is for degenerate problems,
which are covered in the next section.
In our example, the stopping criterion is satisfied and thus the optimal solution is
x1 = 100, x2 = 350, x3 = 0, x4 = 500, x5 = 0 and z = 3100.
(Verify that z = 3x1 + 8x2 equals 3100.)
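The whole procedure of Section 2.3 is easy to mechanise. A compact sketch (entering rule: most negative z-row entry; exiting rule: smallest positive θ-ratio), run on the clock problem's initial tableau with exact arithmetic:

```python
from fractions import Fraction

# Simplex iterations on the clock tableau (rows: constraints, then the
# z row; the last column is the solution column).
T = [[2, 4, 1, 0, 0, 1600],
     [6, 2, 0, 1, 0, 1800],
     [0, 1, 0, 0, 1, 350],
     [-3, -8, 0, 0, 0, 0]]
T = [[Fraction(v) for v in row] for row in T]
basis = [2, 3, 4]          # x3, x4, x5 are the initial basic variables

while True:
    zrow = T[-1]
    j = min(range(5), key=lambda k: zrow[k])     # entering: most negative
    if zrow[j] >= 0:
        break                                    # stopping criterion met
    ratios = [(T[i][-1] / T[i][j], i) for i in range(3) if T[i][j] > 0]
    if not ratios:
        raise ValueError("unbounded problem")    # no positive theta-ratio
    _, p = min(ratios)                           # smallest theta-ratio exits
    piv = T[p][j]
    T[p] = [v / piv for v in T[p]]               # isolate entering variable
    for i in range(4):
        if i != p:
            f = T[i][j]
            T[i] = [v - f * w for v, w in zip(T[i], T[p])]
    basis[p] = j

x = {basis[i]: T[i][-1] for i in range(3)}
print(T[-1][-1], x[0], x[1])  # 3100 100 350
```

The two pivots reproduce the second and third tableaux above, and the run stops with z = 3100 at x1 = 100, x2 = 350.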
2.4
Full details are available in our references, such as Bertsimas and Tsitsiklis, Hillier and Lieberman,
or Gordon and Pressman. Here I provide a brief account.
We have seen that the initial tableau represents the matrix equation
Ac xc = b    (2.3)
and the equation
z − c1x1 − . . . − cnxn = 0.    (2.4)
The corresponding initial basic solution is xj = 0 (j = 1, . . . , n), xn+i = bi (i = 1, . . . , m) and
z = 0.
To obtain the second tableau, we perform elementary row operations. This amounts to
premultiplying equation (2.3) by a particular matrix B:
B Ac xc = B b.    (2.5)
The set of solutions of (2.3) is the same as the set of solutions of (2.5).
As m columns of the matrix BAc form an identity matrix (with columns permuted), it is easy
to read off the second basic feasible solution:
xi = 0 for the non-basic variables, xi = (B b)i for the basic variables.
At any stage, the z row of the tableau represents an equation of the form
z = e − d1 xk1 − . . . − dn xkn,    (2.6)
where xk1, . . . , xkn are the n non-basic variables, d1, . . . , dn are the corresponding entries in
the z row, and e is the entry in the solution column of the z row. When all entries of the z row
are non-negative (so d1, . . . , dn ≥ 0), it is clear that making any of the non-basic variables into
basic variables (i.e. allowing them to be > 0) can only decrease z. Therefore, we have found an
optimal solution.
Example 2.4 A simple example

maximise z = x1 + 2x2
s.t. x1 + x2 ≤ 2
     x1, x2 ≥ 0

[Figure: the feasible region in the (x1, x2) plane is the triangle with vertices A at the origin,
B = (2, 0) and C = (0, 2).]

The area enclosed by triangle ABC is the feasible region. Writing the solutions as (x1, x2, x3),
where x3 is the slack variable, the vertices A, B and C are the three basic solutions (two zero
elements and one non-zero).
The initial solution is (0, 0, 2) (vertex A). The optimal solution is (0, 2, 0) (vertex C).
The Simplex algorithm moves, at each iteration, from one basic feasible solution to an adjacent
and better solution, stopping when it can no longer find a better solution.
So, either:
we move from A to B (i.e. enter x1 and exit x3), giving z = 2, followed by moving from B
to C (i.e. entering x2 and exiting x1), giving z = 4; or
we move directly from A to C (i.e. enter x2 and exit x3), giving z = 4 in one step.
Exercise: Run the Simplex algorithm on this problem, and investigate which route to the
optimal solution is taken.
2.5
Example 2.5

maximise z = x1 + 2x2 + x3 + x4
s.t. 2x1 + x2 + 3x3 + x4 ≤ 8
     2x1 + 3x2 + 4x4 ≤ 12
     3x1 + x2 + 2x3 ≤ 18
     x1, . . . , x4 ≥ 0

Introduce the slack variables x5, x6 and x7. These slack variables are the initial basic variables.

Initial Tableau

basic | x1  x2  x3  x4  x5  x6  x7 | Sol
  x5  |  2   1   3   1   1   0   0 |  8
  x6  |  2   3   0   4   0   1   0 | 12
  x7  |  3   1   2   0   0   0   1 | 18
  z   | -1  -2  -1  -1   0   0   0 |  0

Running the Simplex algorithm to completion gives the optimal solution
x1 = 0, x2 = 4, x3 = 4/3, x4 = 0, x5 = x6 = 0, x7 = 11 1/3,
and the optimal value is z = 9 1/3. (Check this using z = x1 + 2x2 + x3 + x4.)
2.6
Alternative optima
If one (or more) of the non-basic variables has a zero in the z row of the final tableau, there
exists an alternative optimal solution. In other words, there are two (or more) feasible basic
solutions that give the same maximum value of the objective function.
Why is this?
Look at equation (2.6) in Section 2.4. The fact that there is a zero in one of the columns
corresponding to a non-basic variable means that one of d1, . . . , dn (dj, say) equals zero. So, we
can enter the non-basic variable xkj without changing the value of z.
The geometric interpretation of this is that the z-contours are parallel to one (or more) of the
constraints, and any point on this constraint line also maximises z. So, alternative optimal
basic solutions will always be adjacent.

Example 2.6
This is a trivial example to demonstrate the idea. Consider an LPP of the form
maximise z = x1 + 2x2
s.t. upper bounds on x1 and on x2,
     x1 + 2x2 ≤ 12,
     x1, x2 ≥ 0,
so that the z contours are parallel to the constraint x1 + 2x2 = 12.
The Simplex method moves from the origin to point A to point B, where the algorithm stops.
Any point on the line BC is optimal, but only B and C are basic solutions.
Exercise: Run the Simplex algorithm on this problem and verify that the z row of the final
tableau contains a zero in one of the columns corresponding to a non-basic variable.
Unbounded solutions
In the Simplex algorithm, if you encounter the situation in which none of the θ-ratios is positive,
then the problem is unbounded. There is no optimal solution and the Simplex algorithm
breaks down.
In any practical example this means that the problem has been ill-formulated. It is nonsense to
say that an infinite profit (or other objective) is attainable without violating any constraint.
Example 2.7 Another trivial example
maximise z = x1 + x2
s.t. 2x1 + x2 ≥ 10
     x1 − x2 ≤ 5
     x1, x2 ≥ 0
Degenerate problems
If two (or more) rows have the same smallest positive θ-ratio, so that there is a choice of
exiting variable, we have a degenerate problem.
This will lead to one or more of the basic variables being zero.
It may lead to the stopping criterion being satisfied even though a suboptimal solution is
present. When it does, continue the Simplex algorithm even if the stopping criterion has been
met, until z does not increase any more.
2.7 Minimisation Problems
Minimising
z = c1x1 + · · · + cnxn
is the same as maximising
z' = −c1x1 − · · · − cnxn.
So in the Simplex algorithm we maximise z' in the usual way, and the minimum of z is minus
the maximum of z'.
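In code the trick is one negation at each end. A toy sketch over a finite candidate set (the objective and data here are illustrative, not from the notes):

```python
# Minimise z = 2*x + 3*y by maximising z' = -z and negating the answer.
cands = [(0, 5), (2, 2), (4, 0)]
zprime_max = max(-(2*x + 3*y) for x, y in cands)
print(-zprime_max)  # 8  (attained at (4, 0))
```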
Artificial Variables
So far, we have assumed that all the constraints have a ≤ sign and all the resource values bi are
non-negative. In this case, we introduce slack variables xn+1, . . . , xn+m and the initial solution
x1, . . . , xn = 0 and xn+i = bi (i = 1, . . . , m) is feasible.
There are three situations to consider where this is not the case.
1. When the constraint involves ≤ with b1 < 0, we multiply the constraint by −1. E.g.
a11x1 + . . . + a1nxn ≤ b1,   b1 < 0,
becomes
−a11x1 − . . . − a1nxn ≥ −b1.
But now the constraint is a ≥ constraint with a positive resource value (see case 2).
2. When a constraint involves ≥, we introduce a surplus variable. E.g.
a21x1 + . . . + a2nxn ≥ b2
becomes
a21x1 + . . . + a2nxn − xn+2 = b2.
If b2 ≤ 0, we can just set xn+2 = −b2 in the initial solution. If b2 > 0, we cannot, because then
xn+2 < 0, which is not a feasible solution.
3. When a constraint is an equality, e.g.
a31x1 + . . . + a3nxn = b3,
we do not introduce a slack or surplus variable and there is no simple way to find an initial
basic solution in the original variables.
To solve such LPPs using the Simplex algorithm, an artificial variable is added to each constraint
that has an = sign or is ≥ bi with bi > 0. So, we have
a21x1 + . . . + a2nxn − xn+2 + y1 = b2
for a ≥ constraint, and similarly for an = constraint.
2.8 The Big-M Method
Introduce artificial variables, y1, . . . , yp, say, using the method described above. Change the
objective function to
z = c1x1 + . . . + cnxn − M(y1 + . . . + yp),
where M is a very large number. In practice, M is any arbitrary number large enough to
ensure that once an artificial variable departs, it will never enter again.
The Simplex algorithm is then used to solve the new problem, with one adaptation: the stopping
criterion is not met until all the artificial variables are non-basic. This may mean entering a
variable whose z row entry is positive.
Example 2.8 The big-M method

maximise z = −x1 + 5x2
s.t. x1 ≤ 4
     2x2 ≤ 12
     3x1 + 2x2 ≥ 18
     x1, x2 ≥ 0.

The artificial problem is

maximise z = −x1 + 5x2 − M y1
s.t. x1 + x3 = 4
     2x2 + x4 = 12
     3x1 + 2x2 − x5 + y1 = 18,

where x3 and x4 are slack variables, x5 is a surplus variable, and y1 is an artificial variable.
The initial basic solution is x1 = x2 = x5 = 0, x3 = 4, x4 = 12 and y1 = 18.
Initial tableau

basic | x1  x2  x3  x4  x5  y1 | Sol
  x3  |  1   0   1   0   0   0 |  4
  x4  |  0   2   0   1   0   0 | 12
  y1  |  3   2   0   0  -1   1 | 18
  z   |  1  -5   0   0   0   M |  0

The entering variable is x2. The θ-ratios are θ2 = 12/2 = 6 and θ3 = 18/2 = 9, so x4 exits.
The row operations are 2' = (1/2) × 2, 1' = 1, 3' = 3 − 2 × 2' and z' = z + 5 × 2'.

Second tableau

basic | x1  x2  x3  x4   x5  y1 | Sol
  x3  |  1   0   1   0    0   0 |  4
  x2  |  0   1   0  1/2   0   0 |  6
  y1  |  3   0   0  -1   -1   1 |  6
  z   | +1   0   0  5/2   0   M | 30

The artificial variable y1 is still basic, so the stopping criterion is not yet met: we enter x1
even though its z-row entry (+1) is positive. The θ-ratios are 4/1 = 4 and 6/3 = 2, so y1 exits.
The row operations are 3' = (1/3) × 3, 1' = 1 − 3', 2' = 2 and z' = z − 3'.

Final tableau

basic | x1  x2  x3   x4    x5   y1      | Sol
  x3  |  0   0   1   1/3   1/3  -1/3    |  2
  x2  |  0   1   0   1/2   0     0      |  6
  x1  |  1   0   0  -1/3  -1/3   1/3    |  2
  z   |  0   0   0  17/6   1/3   M - 1/3 | 28

All z-row entries are now non-negative and y1 is non-basic, so the optimal solution is
x1 = 2, x2 = 6 (with x3 = 2, x4 = x5 = 0) and z = −2 + 30 = 28.
Notes:
If we had not used the big M then the z row entry for y1 would have been −1/3 and we
would try to enter the artificial variable back as a basic variable.
The stopping criterion for the big-M method is slightly different. If all the non-basic variables
have positive z row entries, but some of the artificial variables are still basic, then
continue the algorithm, choosing as the entering variable the non-basic variable with the
smallest positive z row entry.
This approach only works for maximisation problems. One can adapt the method to work
for minimisation problems, but that is not covered in this course.
2.9 Two-Phase Method
This method involves running the Simplex algorithm twice: once to eliminate all the artificial
variables and so find a feasible basic solution to the original problem; and then again, after
dropping the artificial variables from the tableau, in order to optimise the original problem.

First Phase
Define one artificial variable for every constraint that needs one (y1, . . . , yp, p ≤ m). Define
the auxiliary objective function as
z* = −(y1 + · · · + yp).
We solve the auxiliary problem using the Simplex algorithm with the auxiliary objective function.
We continue until all the artificial variables have been eliminated. As soon as all the artificial
variables have been eliminated, we stop the first phase.
Example 2.9
Consider the same problem we used with the big-M example:

maximise z = −x1 + 5x2
s.t. x1 ≤ 4
     2x2 ≤ 12
     3x1 + 2x2 ≥ 18
     x1, x2 ≥ 0.

Phase One
Add slack and artificial variables to give the auxiliary problem. There is a problem here: all the
non-basic variables have a zero entry in the z* row. We have to re-write z* in terms of the
non-basic variables.
In general, we have
z* = −(y1 + · · · + yp)
and
ai,1x1 + . . . + ai,n+m xn+m + yi = bi   (i = 1, . . . , p),
so that
z* = − Σ_{i=1}^{p} bi + Σ_{i=1}^{p} Σ_{j=1}^{n+m} ai,j xj.
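The rewrite above is mechanical: negate each artificial constraint row and fold it into the z* row. A sketch for a single artificial constraint of the form 3x1 + 2x2 − x5 + y1 = 18, as in this example:

```python
# Express z* = -y1 in terms of the non-basic variables using
# y1 = 18 - (3*x1 + 2*x2 - x5): the z* row becomes -3, -2, 0, 0, 1 | -18.
row_y1, b_y1 = [3, 2, 0, 0, -1], 18   # coefficients of x1, x2, x3, x4, x5
zstar = [-a for a in row_y1]
print(zstar, -b_y1)  # [-3, -2, 0, 0, 1] -18
```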
The initial tableau, with both the z row and the auxiliary z* row, is:

basic | x1  x2  x3  x4  x5  y1 | Sol
  x3  |  1   0   1   0   0   0 |   4
  x4  |  0   2   0   1   0   0 |  12
  y1  |  3   2   0   0  -1   1 |  18
  z   |  1  -5   0   0   0   0 |   0
  z*  | -3  -2   0   0   1   0 | -18

The entering variable (using the z* row) is x1. The θ-ratios are 4/1 = 4 and 18/3 = 6, so x3
exits. The row operations are 1' = 1, 2' = 2, 3' = 3 − 3 × 1', z' = z − 1' and z*' = z* + 3 × 1'.
Second tableau

basic | x1  x2  x3  x4  x5  y1 | Sol
  x1  |  1   0   1   0   0   0 |  4
  x4  |  0   2   0   1   0   0 | 12
  y1  |  0   2  -3   0  -1   1 |  6
  z   |  0  -5  -1   0   0   0 | -4
  z*  |  0  -2   3   0   1   0 | -6

The entering variable is x2. The θ-ratios are 12/2 = 6 and 6/2 = 3, so y1 exits. The row
operations are 3' = (1/2) × 3, 1' = 1, 2' = 2 − 2 × 3', z' = z + 5 × 3' and z*' = z* + 2 × 3'.
Final tableau of phase one

basic | x1  x2  x3     x4   x5    y1  | Sol
  x1  |  1   0   1      0    0     0  |  4
  x4  |  0   0   3      1    1    -1  |  6
  x2  |  0   1  -3/2    0  -1/2   1/2 |  3
  z   |  0   0  -17/2   0  -5/2   5/2 | 11
  z*  |  0   0   0      0    0     1  |  0

The artificial variable y1 is now non-basic and z* = 0, so the first phase is complete.
Phase Two
Use the final tableau of phase one as the starting point for phase two. Simply drop the z* row
and the y1 column.

basic | x1  x2  x3     x4   x5  | Sol
  x1  |  1   0   1      0    0  |  4
  x4  |  0   0   3      1    1  |  6
  x2  |  0   1  -3/2    0  -1/2 |  3
  z   |  0   0  -17/2   0  -5/2 | 11

The entering variable is x3. The θ-ratios are 4/1 = 4 and 6/3 = 2 (the ratio for the x2 row is
negative), so x4 exits. The row operations are 2' = (1/3) × 2, 1' = 1 − 2', 3' = 3 + (3/2) × 2'
and z' = z + (17/2) × 2'.

basic | x1  x2  x3   x4    x5  | Sol
  x1  |  1   0   0  -1/3  -1/3 |  2
  x3  |  0   0   1   1/3   1/3 |  2
  x2  |  0   1   0   1/2   0   |  6
  z   |  0   0   0  17/6   1/3 | 28

The stopping criterion is met: the optimal solution is x1 = 2, x2 = 6, x3 = 2, with z = 28, as
found with the big-M method.
Comments
The big-M method is often easier for hand calculations than the two-phase method. However,
for numerical implementation on a computer the big-M method is numerically unstable, giving
(very) inaccurate results. So, most computer algorithms use the two-phase method.
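For a problem this small, the phase-two answer can be checked by brute force: enumerate every basic solution of the standard-form constraints and keep the best feasible one. The sketch below does exactly that (it is a check on the answer of Example 2.9, not an implementation of the two-phase method, and it assumes the objective z = −x1 + 5x2 as reconstructed above):

```python
from itertools import combinations
from fractions import Fraction

# Example 2.9 in standard form: maximise z = -x1 + 5x2 subject to
#   x1 + x3 = 4,  2x2 + x4 = 12,  3x1 + 2x2 - x5 = 18,  all variables >= 0.
A = [[Fraction(v) for v in row] for row in
     [[1, 0, 1, 0, 0],
      [0, 2, 0, 1, 0],
      [3, 2, 0, 0, -1]]]
b = [Fraction(4), Fraction(12), Fraction(18)]
c = [Fraction(-1), Fraction(5), Fraction(0), Fraction(0), Fraction(0)]

def solve_square(M, rhs):
    """Solve a square linear system by Gaussian elimination; None if singular."""
    n = len(M)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

best = None
for basis in combinations(range(5), 3):      # choose 3 basic variables
    cols = [[A[i][j] for j in basis] for i in range(3)]
    xb = solve_square(cols, b)
    if xb is None or any(v < 0 for v in xb):
        continue                             # singular basis or infeasible
    x = [Fraction(0)] * 5
    for j, v in zip(basis, xb):
        x[j] = v
    z = sum(ci * xi for ci, xi in zip(c, x))
    if best is None or z > best[0]:
        best = (z, x)

z_opt, x_opt = best
print(float(z_opt), [float(v) for v in x_opt])  # 28.0 [2.0, 6.0, 2.0, 0.0, 0.0]
```

This confirms the optimum x1* = 2, x2* = 6, z* = 28, although enumerating all bases grows combinatorially and is no substitute for the Simplex algorithm on larger problems.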
Chapter 3
Sensitivity Analysis
Recall that sensitivity analysis means assessing how sensitive the optimal solution is to the
specification of the linear program. In Chapter 1, we looked at:
how much the optimal value changes for a unit change in a resource value (shadow price);
how much a resource value can change before the optimal solution moves to a different
extreme point (i.e. the set of basic variables changes);
how much the objective function can change before the optimal solution moves to a different
extreme point.
We looked at sensitivity analysis for the graphical solution method. Now, we consider it for the
Simplex algorithm. Fortunately, the algorithm itself helps us answer such questions.
Example 3.1
maximise   z = 4x1 + 2x2 + x3
s.t.       x1 + x2 ≤ 4
           x1 + x3 ≤ 6
           x2 + x3 ≤ 8
           x1, x2, x3 ≥ 0

Initial tableau

Basis    x1    x2    x3    x4    x5    x6  |  Sol
x4        1     1     0     1     0     0  |    4
x5        1     0     1     0     1     0  |    6
x6        0     1     1     0     0     1  |    8
z        -4    -2    -1     0     0     0  |    0

x1 enters (most negative z-row entry, -4) and the ratio test (4/1 against 6/1) makes
x4 leave; the row operations are ρ1, ρ2 − ρ1, ρ3 and ρz + 4ρ1.
Second tableau

Basis    x1    x2    x3    x4    x5    x6  |  Sol
x1        1     1     0     1     0     0  |    4
x5        0    -1     1    -1     1     0  |    2
x6        0     1     1     0     0     1  |    8
z         0     2    -1     4     0     0  |   16

Now x3 enters (z-row entry -1) and the ratio test (2/1 against 8/1) makes x5 leave;
the row operations are ρ1, ρ2, ρ3 − ρ2 and ρz + ρ2.
Final tableau

Basis    x1    x2    x3    x4    x5    x6  |  Sol
x1        1     1     0     1     0     0  |    4
x3        0    -1     1    -1     1     0  |    2
x6        0     2     0     1    -1     1  |    6
z         0     1     0     3     1     0  |   18

So the optimal solution is x1* = 4, x2* = 0, x3* = 2, with z* = 18.
We shall see later that the entries in the z row corresponding to the slack variables are the
shadow prices! The shadow prices for the first, second and third constraints are 3, 1 and 0,
respectively.
Definition
If the slack variable corresponding to a particular constraint is zero then it is a binding constraint,
otherwise it is a non-binding constraint.
Thus, a constraint is binding if the optimal solution exactly satises that constraint (with
equality).
Theorem 3.1 Complementary Slackness
At an optimal solution, each constraint's shadow price multiplied by its slack is zero: a
non-binding constraint has a zero shadow price, and a constraint with a positive shadow price
is binding.
Usually, binding constraints have positive shadow prices. This reflects the fact that we could
get a higher return if we had a larger resource value for that constraint, since we are currently
bound by the constraint.
In Example 3.1 a greater profit can be obtained by increasing the first two resource values, b1
and b2, but not the third resource value, b3.
Suppose the first resource value changes from b1 = 4 to b1 = 4 + δ. Recalculating, the solution
column of the final tableau becomes

x1 = 4 + δ,        (3.1)
x3 = 2 − δ,        (3.2)
x6 = 6 + δ,        (3.3)
z  = 18 + 3δ.      (3.4)

Now, this solution remains feasible (and hence optimal) provided 4 + δ ≥ 0, 2 − δ ≥ 0 and
6 + δ ≥ 0, that is provided −4 ≤ δ ≤ 2, i.e. if 0 ≤ b1 ≤ 6.
Notes
1. In fact, it is not necessary to recalculate the solution column for each iteration of the
Simplex algorithm. The new final solution column is just the old final solution column
plus δ times the column corresponding to the slack variable for the constraint whose
resource value has been changed, i.e. in this case the x4 column.
So, we see that the entries in the z row corresponding to the slack variables are indeed the
shadow prices.
2. If the value of δ is outside the range −4 ≤ δ ≤ 2, then the set of basic variables in the
optimal solution changes.
4. Good linear programming software will usually report, for each constraint i, the range of
values of bi for which the optimal solution still has the same set of basic variables.
Exercise: If b2 changes from 6 to 8, what is the new optimal solution?
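The ranging calculation of Note 1 is easy to mechanise: the basis stays optimal exactly while the old solution column plus δ times the relevant slack column stays non-negative. A minimal sketch, taking the final-tableau columns above as given:

```python
from fractions import Fraction

# From the final tableau: basic variables x1, x3, x6 have solution column
# (4, 2, 6), and the x4 column (slack of constraint 1) is (1, -1, 1).
solution = [Fraction(4), Fraction(2), Fraction(6)]
slack_col = [Fraction(1), Fraction(-1), Fraction(1)]

def rhs_range(sol, col):
    """Range of delta for which sol + delta*col stays componentwise >= 0."""
    lo, hi = None, None
    for s, a in zip(sol, col):
        if a > 0:                  # s + delta*a >= 0  =>  delta >= -s/a
            bound = -s / a
            lo = bound if lo is None else max(lo, bound)
        elif a < 0:                # s + delta*a >= 0  =>  delta <= -s/a
            bound = -s / a
            hi = bound if hi is None else min(hi, bound)
    return lo, hi

lo, hi = rhs_range(solution, slack_col)
print(lo, hi)   # -4 2  (so 0 <= b1 <= 6)
```

The same function applied to the x5 column answers the exercise about b2.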
Non-Basic Variables  Suppose that the objective function has a change in the coefficient of
a non-basic variable (e.g. x2) by an amount δ. That is, z becomes

z = 4x1 + (2 + δ)x2 + x3.

The only change to the final tableau is that the z-row entry of the variable whose coefficient
has been changed (x2) changes by an amount −δ, becoming 1 − δ. The solution remains optimal
provided 1 − δ ≥ 0, i.e. provided δ ≤ 1. So, we require c2 ≤ 3.
3.2
Duality
Each LPP can be formulated in two different ways: the primal problem and the dual problem.
We have been considering LPPs of the form

maximise:   z = Σ_{i=1}^{n} c_i x_i
subject to: a11 x1 + ⋯ + a1n xn ≤ b1
              ⋮
            am1 x1 + ⋯ + amn xn ≤ bm
            x1, …, xn ≥ 0.

This is the primal problem. The corresponding dual problem is

minimise:   g = b1 y1 + ⋯ + bm ym
subject to: a11 y1 + ⋯ + am1 ym ≥ c1
              ⋮
            a1n y1 + ⋯ + amn ym ≥ cn
            y1, …, ym ≥ 0.
Exercise: You may want to check that the optimal solution to this dual problem is

y1* = 3,   y2* = 1,   y3* = 0,   g* = 18.

Notice that g* = z* and that y1*, y2* and y3* equal the shadow prices for the primal problem.
Matrix Format
In matrix form, the primal problem is

maximise:   z = cᵀx
subject to: Ax ≤ b
            x ≥ 0

and the corresponding dual problem is

minimise:   g = bᵀy
subject to: Aᵀy ≥ c
            y ≥ 0.
Theorem 3.2
The optimal value of the objective function in the primal problem is the same as the optimal
value of the objective function in the dual problem. That is,

z* = g*.
Consider the dual problem in Example 3.2. Suppose b1 is changed to b1′ = b1 + 1 = 5. Assuming
that the optimal solution remains feasible, the new value of g* is

g* = 5y1* + 6y2* + 8y3* = (4y1* + 6y2* + 8y3*) + y1*.

By increasing b1 by one unit the optimal value of g* has increased by an amount y1*. Theorem 3.2
tells us that the optimal value of z* will have also increased by y1*. Hence y1* is the
shadow price for the first constraint of the primal problem.
Theorem 3.3
The optimal values of the control variables in the dual problem are the shadow prices of the
primal problem and vice versa.
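Theorems 3.2 and 3.3 can be illustrated numerically. The sketch below solves both the primal of Example 3.1 and its dual by enumerating basic feasible solutions — adequate for tiny LPs, though no substitute for the Simplex algorithm — and confirms z* = g* = 18 with dual solution (3, 1, 0):

```python
from itertools import combinations
from fractions import Fraction

def lp_max(c, A, b):
    """Maximise c.x subject to A x <= b, x >= 0, by enumerating basic
    feasible solutions of the slack-augmented system (fine for tiny LPs)."""
    m, n = len(A), len(c)
    Aug = [[Fraction(A[i][j]) for j in range(n)] +
           [Fraction(1 if i == k else 0) for k in range(m)] for i in range(m)]
    cost = [Fraction(v) for v in c] + [Fraction(0)] * m
    best = None
    for basis in combinations(range(n + m), m):
        M = [[Aug[i][j] for j in basis] + [Fraction(b[i])] for i in range(m)]
        ok = True
        for col in range(m):                     # Gauss-Jordan on the basis
            piv = next((r for r in range(col, m) if M[r][col] != 0), None)
            if piv is None:
                ok = False
                break
            M[col], M[piv] = M[piv], M[col]
            M[col] = [v / M[col][col] for v in M[col]]
            for r in range(m):
                if r != col and M[r][col] != 0:
                    M[r] = [a - M[r][col] * p for a, p in zip(M[r], M[col])]
        if not ok or any(M[i][m] < 0 for i in range(m)):
            continue                             # singular or infeasible basis
        x = [Fraction(0)] * (n + m)
        for i, j in enumerate(basis):
            x[j] = M[i][m]
        z = sum(ci * xi for ci, xi in zip(cost, x))
        if best is None or z > best[0]:
            best = (z, x[:n])
    return best

# Primal of Example 3.1: max 4x1+2x2+x3, x1+x2<=4, x1+x3<=6, x2+x3<=8.
z_star, x_star = lp_max([4, 2, 1], [[1, 1, 0], [1, 0, 1], [0, 1, 1]], [4, 6, 8])
# Dual: min 4y1+6y2+8y3 s.t. A'y >= c, written as max -b.y s.t. -A'y <= -c.
g_neg, y_star = lp_max([-4, -6, -8],
                       [[-1, -1, 0], [-1, 0, -1], [0, -1, -1]], [-4, -2, -1])
print(float(z_star), [float(v) for v in x_star])   # 18.0 [4.0, 0.0, 2.0]
print(float(-g_neg), [float(v) for v in y_star])   # 18.0 [3.0, 1.0, 0.0]
```

The dual solution (3, 1, 0) is exactly the vector of shadow prices read from the primal's final tableau, as Theorem 3.3 asserts.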
4y1 + 2y2 + y3 ≥ 3,    (3.5)

since, for example, with 2 units of labour and 6 of processing, he can make 3 of profit.
Mrs. Dual agrees this is fair. However, she wants to minimise her liabilities. If the factory burns
down and all Mr. Primal's resources are lost, she will have to pay out

g = 1600y1 + 1800y2 + 350y3

pounds.
So, Mrs. Dual wants to solve the following LPP: minimise g subject to the constraints (3.5).
This is the dual of Mr. Primal's original optimisation problem.
Chapter 4
Game Theory
4.1
Introduction
In Chapters 1–3, we looked at situations in which one individual chooses what to do in order
to optimise some quantity (the objective function). In this chapter, we look at strategic games
(or games). In a game, two or more individuals (the players) each try to maximise a different
quantity (the payoff for that player), which depends not only on what he/she does, but also on
what the other player(s) do.
Game Theory is applied in many fields. It is often necessary to idealise (simplify) the real
problem in order to make it susceptible to analysis by Game Theory.
Games may be classied as:
simultaneous-move or sequential-move
zero-sum or non-zero sum (variable-sum)
two-player or n-player (n > 2)
In this chapter, we shall look at two-player, zero-sum, simultaneous-move games. In Section 4.9
we shall briefly look at Games Against Nature.
An important assumption of Game Theory is that each player is rational, i.e. they are each
seeking to maximise their payoff and choose the best strategy to achieve this. This assumption
is necessary to work out what each player's best (optimal) strategy is.
4.2
Basic Formulation
We have two players (A and B). Each has a number of possible moves and must choose one of
these. We are dealing with simultaneous-move games, so players must choose their move before
knowing which move the other has chosen. The outcome of the game is a payoff for Player A
and a payoff for Player B. These depend on Player A's and Player B's choices of move. As we
are dealing with zero-sum games, the payoff to Player B is minus the payoff to Player A. So, we
require only one payoff matrix:
If Player A chooses move Ai and Player B chooses move Bj, the payoff to Player A is aij and to
Player B is −aij. The convention is that the numbers in the payoff matrix are the payoffs for
the player to the left of the matrix (Player A in the matrix above).
Example 4.1

                 Player B
                 B1    B2
Player A   A1     0    -4
           A2     3     2
Definitions
A strategy is a rule for determining which move to play.
A pure strategy specifies that the same move always be chosen.
A mixed strategy specifies that the move be chosen at random, with each move having some
specific probability of being chosen. E.g. Player B chooses B1 with probability y1, B2 with
probability y2, etc. (y1 + y2 + ⋯ + ym = 1).
(A pure strategy is a mixed strategy in which one of the probabilities equals one and the rest
are zero.)
Suppose the payoff matrix is

                    Player B
                  B1    …    Bm
Player A   A1    a11    …    a1m
            ⋮     ⋮            ⋮
           An    an1    …    anm
Let xi (i = 1, …, n) be the probability that Player A chooses move Ai and yj be the probability
that Player B chooses move Bj, and let x = (x1, …, xn) and y = (y1, …, ym). Then the expected
payoff for Player A is

E(x, y) = Σ_{i=1}^{n} Σ_{j=1}^{m} x_i a_{ij} y_j .
Note the non-standard notation for expectation.
4.3
Nash Equilibrium
We seek the optimal strategy for each player. These are defined in terms of the Nash equilibrium.
The Nash equilibrium of a game is the pair of strategies, one for each player, such that each
player's strategy is best for him, given that the other player is playing his equilibrium strategy.
Another way of looking at this is that neither player should want to change his strategy if he
knew what strategy the other player was using.
In fact, we shall often pretend that the other player knows our strategy in order to work out
what our optimal strategy is.
Example 4.2

                 Player B
                 B1    B2
Player A   A1     0    -4
           A2     3     2

The Nash equilibrium is: Player A always chooses A2 and Player B always chooses B2. Thus
these are the optimal strategies for the two players. They are pure strategies: x* = (0, 1),
y* = (0, 1). The payoff will then be 2 (to Player A).
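A claimed pure Nash equilibrium can be verified mechanically by checking that neither player gains from any pure-strategy deviation. A sketch, assuming the Example 4.2 payoff matrix (to Player A) is ((0, −4), (3, 2)):

```python
# Best-response check of a pure Nash equilibrium in a two-player zero-sum game.
A = [[0, -4], [3, 2]]   # assumed payoff matrix (to Player A) of Example 4.2

def expected_payoff(x, y, A):
    """E(x, y) = sum_i sum_j x_i * a_ij * y_j."""
    return sum(xi * aij * yj
               for xi, row in zip(x, A)
               for aij, yj in zip(row, y))

x_star, y_star = (0, 1), (0, 1)   # A always plays A2, B always plays B2
v = expected_payoff(x_star, y_star, A)

# A (maximiser) cannot do better against y*; B (minimiser) cannot do
# better against x*:
a_ok = all(expected_payoff(x, y_star, A) <= v for x in [(1, 0), (0, 1)])
b_ok = all(expected_payoff(x_star, y, A) >= v for y in [(1, 0), (0, 1)])
print(v, a_ok, b_ok)  # 2 True True
```

Since it suffices to check pure-strategy deviations, the two `all(...)` checks confirm the equilibrium and its value v = 2.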
Example 4.3
                 Player B
                 B1    B2
Player A   A1     4     3
           A2     2     8

Do x = (1/2, 1/2) and y = (1/4, 3/4) constitute a Nash equilibrium?
Definition
If optimal strategies x* and y* exist, then

v = E(x*, y*)

is called the value of the game.

Corollary
For all strategies x and y,

E(x*, y) ≥ v ≥ E(x, y*).
The first two methods will find optimal strategies if they are pure. The latter two are needed
otherwise.
4.4
Dominance
Definition
Let xp and xp′ be pure strategies for Player A. Then xp dominates xp′ if

E(xp, yq) ≥ E(xp′, yq)   for every pure strategy yq of Player B.

Also, yq dominates yq′ if

E(xp, yq) ≤ E(xp, yq′)   for every pure strategy xp of Player A.

Dominated strategies are sub-optimal and can be eliminated (for the purpose of solving the
game).
Example 4.4

                 Player B
           B1   B2   B3   B4   B5
Player A
A1          1    2    1    2    0
A2          2    1    0    3    1
A3          3    2    0    0    0

For Player B, B3 dominates B1 and B5 dominates both B2 and B4 (their entries are no larger
in every row). With B1, B2 and B4 gone, A3 is dominated for Player A. We can drop the
dominated strategies from the game. The sub-payoff matrix is now

           B3   B5
A1          1    0
A2          0    1
If, for Player A, pure strategy xp dominates all other pure strategies, xp is Player A's optimal
strategy. Player B's optimal strategy is then the pure strategy that minimises A's payoff when
A uses xp.
Example 4.5

                 Player B
                B1   B2   B3
Player A
A1               2    0    3
A2               5    7    6
A3               2    1    1

Pure strategy A2 (i.e. x = (0, 1, 0)) dominates the other two pure strategies, A1 and A3 (i.e.
x = (1, 0, 0) and x = (0, 0, 1)). Hence, x* = (0, 1, 0).
What is B's optimal strategy?
4.5
Saddle Points
Suppose that neither Player A nor Player B has a pure strategy that dominates all his/her other
pure strategies. Then we look for saddle points.

Definition
The pure maximin strategy for Player A is the strategy with maximum row-minimum, that is,

arg max_{i=1,…,n}  min_{j=1,…,m}  a_{ij} .

The pure minimax strategy for Player B is the strategy with minimum column-maximum, that is,

arg min_{j=1,…,m}  max_{i=1,…,n}  a_{ij} .
Example 4.6
Two companies have rival products of Beer, Wine and Spirits. Each month they choose which
product to advertise. The following table shows the gain in A's profits (in £ thousands per
month) given the combination of advertising actions. We shall assume that B's profits drop by
the same amount.

                      Company B
                 Beer   Wine   Spirits
Company A
Beer               7      8       5
Wine               2     12       0
Spirits            9      4       4
Definition
If the payoff for A's pure maximin strategy is the same as for B's pure minimax strategy then
there is a saddle point in the game.

Theorem 4.2 Saddle Point Theorem
If a two-player, zero-sum, simultaneous-move game has a saddle point, the pure maximin and
minimax strategies constitute a Nash equilibrium and are optimal strategies. The value of the
game, v, equals the maximin and minimax payoffs.
In Example 4.6 there is a saddle point in the top right-hand corner. A's optimal strategy is
always to advertise Beer and B's optimal strategy is to advertise Spirits.
Exercise: Check that these two pure strategies constitute a Nash equilibrium.
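The saddle-point search is a one-line computation in each direction: take row minima and column maxima and compare. A sketch for the advertising game:

```python
# Pure maximin / minimax check for the advertising game of Example 4.6.
# Rows: A advertises Beer/Wine/Spirits; columns: the same choices for B.
payoff = [[7, 8, 5],
          [2, 12, 0],
          [9, 4, 4]]

row_min = [min(row) for row in payoff]          # worst case for each A move
col_max = [max(col) for col in zip(*payoff)]    # worst case for each B move

maximin = max(row_min)    # A's guaranteed payoff with a pure strategy
minimax = min(col_max)    # the most B must concede with a pure strategy

print(row_min, col_max)   # [5, 0, 4] [9, 12, 5]
print(maximin, minimax)   # 5 5  -> saddle point at (Beer, Spirits), v = 5
```

Because maximin = minimax = 5, the Saddle Point Theorem gives the pure equilibrium (Beer, Spirits) with value 5.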
Notes
If there is no saddle point, the optimal solution is mixed, and either pure maximin payoff < v,
or pure minimax payoff > v, or both.

4.6
Mixed Strategies
When searching for a saddle point, we considered only pure maximin/minimax strategies. We
now consider mixed maximin/minimax strategies.
In order for Player A to determine his maximin strategy, he must consider what strategy Player B
would adopt if she knew x = (x1, …, xn). Player B would choose the move Bj for which the
expected payoff (to A)

Σ_{i=1}^{n} x_i a_{i,j}

is smallest. So Player A's maximin strategy is the x that attains

max_{(x1,…,xn)}  min { Σ_{i=1}^{n} x_i a_{i,1}, …, Σ_{i=1}^{n} x_i a_{i,m} }.

Similarly, Player B's minimax strategy is the y that attains

min_{(y1,…,ym)}  max { Σ_{j=1}^{m} y_j a_{1,j}, …, Σ_{j=1}^{m} y_j a_{n,j} }.

Theorem 4.3
The maximin and minimax strategies constitute a Nash equilibrium and so are optimal strategies.
4.7
n × 2 Games
A game that is not n × 2 or 2 × m may become so after eliminating dominated strategies. So,
eliminate dominated strategies first.
Assume we have eliminated all dominated strategies and looked for a saddle point. No saddle
point has been found, and so we must consider mixed strategies.
Example 4.7

                 Player B
                 B1    B2
Player A
A1                2    -1
A2               -3     4
A3                0     2

Suppose Player B plays B1 with probability y1 and B2 with probability 1 − y1. Then A's
expected payoff from each of his pure strategies is

A1:  2y1 − (1 − y1) = 3y1 − 1
A2:  −3y1 + 4(1 − y1) = 4 − 7y1
A3:  2(1 − y1) = 2 − 2y1
We can plot the three expressions above as lines with y1 (0 ≤ y1 ≤ 1) on the
horizontal axis.
[Graph: payoff to A plotted against y1 for the lines A1, A2 and A3; the upper envelope is
minimised where A1 and A3 cross, at y1 = 3/5, giving B's minimax point with payoff 4/5.]
Theorem 4.4
The optimal (maximin) strategy for A requires only two strategies: those that intersect at B's
minimax solution.
If more than two strategies intersect at this point, choose any two that have opposite gradients.
In this example, A1 and A3 intersect at the minimax solution. This implies that A2 is not used
in the optimal solution and so x2* = 0.
We need to find the values of x1 and x3 (with x1 + x3 = 1, since x2* = 0) that maximise the
minimum of

2x1   and   −x1 + 2x3,

the expected payoffs against B1 and B2 respectively.
This minimum occurs where the two lines intersect (draw the graphs of 2x1 and 2 − 3x1 against
x1 if you are not sure why). Hence we have

2x1* = 2 − 3x1*   ⟹   x1* = 2/5.

The optimal solution for this game is

x* = (2/5, 0, 3/5),   y* = (3/5, 2/5),
v = E(x*, y*) = 2 × (2/5) = 4/5.
To solve a 2 × m game an almost identical approach is used. The maximin strategy for A is
found graphically by taking the maximum point of the lowest of the m lines. Then the minimax
strategy for B can be found. Always start with the player who has only two possible moves.
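The graphical argument reduces to intersecting two straight lines. A sketch, assuming the Example 4.7 rows are A1 = (2, −1), A2 = (−3, 4) and A3 = (0, 2) (the signs are reconstructed here):

```python
from fractions import Fraction

# Payoff lines against y = (y1, 1 - y1), rows as assumed above.
A1, A2, A3 = (2, -1), (-3, 4), (0, 2)

def payoff_line(row, y1):
    """A's expected payoff from this pure row when B plays (y1, 1 - y1)."""
    return row[0] * y1 + row[1] * (1 - y1)

# Intersection of the A1 and A3 lines:
#   (a1 - b1)*y1 + b1 = (a3 - b3)*y1 + b3
num = A3[1] - A1[1]
den = (A1[0] - A1[1]) - (A3[0] - A3[1])
y1 = Fraction(num, den)
v = payoff_line(A1, y1)

print(y1, v)                 # 3/5 4/5
print(payoff_line(A2, y1))   # -1/5  (below v, so A2 is unused: x2* = 0)
```

Checking that A2's line lies below v at the intersection is what justifies dropping A2 from the optimal mixture.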
4.8
Linear Programming Method
The following method can always be used, even if n > 2 and m > 2.
Suppose Player B plays the mixed strategy y and Player A knows what y is. Then Player A will
choose his move Ai to maximise the payoff, i.e.

u = max_{i=1,…,n} Σ_{j=1}^{m} a_{i,j} y_j .

Player B's minimax strategy is that which minimises u. From the definition of u we have

Σ_{j=1}^{m} a_{1,j} y_j ≤ u,   Σ_{j=1}^{m} a_{2,j} y_j ≤ u,   …,   Σ_{j=1}^{m} a_{n,j} y_j ≤ u,    (4.1)

and

y_1 + ⋯ + y_m = 1.    (4.2)
Taking (4.1) and (4.2) and dividing through by u (which for the moment we will assume is
positive), we get

a_{1,1} y_1/u + ⋯ + a_{1,m} y_m/u ≤ 1
  ⋮
a_{n,1} y_1/u + ⋯ + a_{n,m} y_m/u ≤ 1

and

y_1/u + ⋯ + y_m/u = 1/u.

Let Y_j = y_j/u and Z = 1/u. Minimising u is the same as maximising Z, so Player B's minimax
strategy is found by solving the LPP

maximise   Y_1 + ⋯ + Y_m = Z
subject to a_{1,1} Y_1 + ⋯ + a_{1,m} Y_m ≤ 1
             ⋮
           a_{n,1} Y_1 + ⋯ + a_{n,m} Y_m ≤ 1
           Y_1, …, Y_m ≥ 0.

Similarly, suppose Player A plays the mixed strategy x and Player B knows what x is. Then
Player B will choose the move that minimises the payoff to A, so Player A can guarantee only

w = min_{j=1,…,m} Σ_{i=1}^{n} a_{i,j} x_i .

Player A's maximin strategy maximises w. From the definition of w,

Σ_{i=1}^{n} a_{i,1} x_i ≥ w,   …,   Σ_{i=1}^{n} a_{i,m} x_i ≥ w,

and

x_1 + ⋯ + x_n = 1,

which yields (assuming w is positive)

Σ_{i=1}^{n} a_{i,1} x_i/w ≥ 1,   …,   Σ_{i=1}^{n} a_{i,m} x_i/w ≥ 1,

and

x_1/w + ⋯ + x_n/w = 1/w.

Let X_i = x_i/w and V = 1/w. Maximising w is the same as minimising V, so Player A's maximin
strategy is found by solving the LPP

minimise   X_1 + ⋯ + X_n = V
subject to a_{1,1} X_1 + ⋯ + a_{n,1} X_n ≥ 1
             ⋮
           a_{1,m} X_1 + ⋯ + a_{n,m} X_n ≥ 1
           X_1, …, X_n ≥ 0.
Recognise this as the dual of the first LPP. Hence Z* = V*, i.e. min u = max w. Thus we have
the following theorem.

Theorem 4.5
Provided the value of the game is greater than zero (v > 0), the minimum possible value of u is
the value of the game.

Solving the first LPP using the simplex algorithm gives Z* = 1/u* = 1/v and Y_j* = y_j*/v
(j = 1, …, m).
Consider, for example, the game with payoff matrix

           B1   B2   B3
A1          3   -2   -3
A2         -1    0   -2
A3         -2   -1    3

Its value may not be positive, so add 4 to every entry (which raises the value of the game by
exactly 4 and leaves the optimal strategies unchanged):

           B1   B2   B3
A1          7    2    1
A2          3    4    2
A3          2    3    7
Adding slack variables Y4, Y5 and Y6 and solving with the Simplex algorithm gives the final
tableau

Basis    Y1   Y2   Y3     Y4      Y5      Y6   |  Sol
Y1        1    0    0    .18    -.091     0    |  .091
Y2        0    1    0   -.14     .39    -.091  |  .157
Y3        0    0    1    .008   -.14     .18   |  .05
Z         0    0    0    .05     .157    .091  |  .298

The optimal solution is Z* = 0.298, Y1* = 0.091, Y2* = 0.157 and Y3* = 0.05. This gives
the optimal strategy for Player B: y1* = Y1*/Z* ≈ 0.31, y2* ≈ 0.53, and y3* ≈ 0.17.
The optimal solution for Player A can be obtained using duality. The shadow prices for the
primal problem are the solution to the dual problem. Thus, X1* = 0.05, X2* = 0.157 and
X3* = 0.091, giving x1* ≈ 0.17, x2* ≈ 0.53 and x3* ≈ 0.31.
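Because the optimal strategies here turn out to be fully mixed, every constraint of B's LPP is binding at the optimum, so the answer can be recovered exactly by solving a 3 × 3 linear system — a shortcut that assumes full mixing (in general the LPP must be solved by the Simplex algorithm). A sketch in exact arithmetic:

```python
from fractions import Fraction

# Shifted payoff matrix (original + 4), so that v > 0:
A = [[7, 2, 1],
     [3, 4, 2],
     [2, 3, 7]]

def solve(M, rhs):
    """Gauss-Jordan solve of a square system in exact rational arithmetic."""
    n = len(M)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                M[r] = [a - M[r][col] * p for a, p in zip(M[r], M[col])]
    return [row[n] for row in M]

# Full mixing => all constraints sum_j a_ij * Y_j <= 1 hold with equality.
Y = solve(A, [1, 1, 1])
Z = sum(Y)                 # Z = 1/v' for the shifted game
v_shifted = 1 / Z
y = [Yj / Z for Yj in Y]   # B's optimal mixed strategy

print(Y)              # [Fraction(1, 11), Fraction(19, 121), Fraction(6, 121)]
print(Z)              # 36/121  (approx. 0.298)
print(y)              # [Fraction(11, 36), Fraction(19, 36), Fraction(1, 6)]
print(v_shifted - 4)  # value of the original game: -23/36
```

The exact values Y = (1/11, 19/121, 6/121) and Z* = 36/121 ≈ 0.298 agree with the tableau (to rounding), and subtracting the shift of 4 recovers the original game's value, −23/36.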
4.9
Games Against Nature
In this setting, we only have one rational player, but the payoff also depends on some as yet
unknown state of nature. Here "nature" might mean nature (e.g. weather, biological competition)
or some other phenomenon over which the player has no influence (e.g. stock-markets,
traffic conditions, political policy).
We adopt a similar framework to the two-player game, treating nature as player B. However,
we no longer assume that player B adopts a strategy to minimise player A's reward. Nature is
not out to get us. This fact means we must change our optimisation approach.
Example 4.9 Electronics company
An electronics company is considering launching a new product. They have 3 options:
A1: Launch now with a big advertising campaign
A2: Launch now with a minimal advertising campaign
A3: Delay the launch (the cautious option)
The outcomes depend on the state of the market (Good, Moderate or Poor):

A1 (risky):    high costs, with high, moderate or poor sales according to the market;
A2 (neutral):  low costs, with moderate sales in a moderate market and poor sales in a
               poor market;
A3 (cautious): competitors steal the market — lower sales, but lower costs.

The payoff matrix is

                 Good   Moderate   Poor
A1 (risky)        110       45     -30
A2 (neutral)       90       55     -10
A3 (cautious)      80       40      10
In the previous sections A's strategy was determined by the maximin principle. The maximin
strategy is pure strategy A3, as there is a saddle point at the cautious/poor entry (10). However,
this is only optimal if we assume that the market is trying to minimise this company's profits, an
assumption for which there is no justification.
We shall only consider pure strategies for games against nature.
There is no single optimal strategy. It depends on the player's attitude to risk and/or how likely
he thinks the various possible states of nature are.
Maximax strategy
This is the strategy that can give the maximum payoff (but only if the state of nature happens
to be right). Thus the maximax strategy is the one that corresponds to the row with the largest
entry in the payoff matrix. In Example 4.9 this is A1.
The maximax strategy is risk-taking.
Laplace strategy
The Laplacian strategy is the one that maximises the expected payoff, assuming that each state
of nature is equally likely.
The expected payoffs for strategies A1, A2 and A3 are

For A1:  110/3 + 45/3 − 30/3 = 41 2/3
For A2:  90/3 + 55/3 − 10/3 = 45
For A3:  80/3 + 40/3 + 10/3 = 43 1/3

Thus, the Laplacian strategy is A2.
Hurwicz strategy
For a strategy Ai the Hurwicz number Hi is
Hi = aiu + (1 )ail
where aiu is the maximum entry in row i and ail is the minimum entry in row i of the payo
matrix.
The Hurwicz strategy is the strategy that gives the greatest Hurwicz number. It depends on .
When = 0 we get the maximin solution and when = 1 we get the maximax solution. So,
is a weighting of the best and worse states of nature.
In our example we have:

H1 = 110α − 30(1 − α) = 140α − 30
H2 = 90α − 10(1 − α) = 100α − 10
H3 = 80α + 10(1 − α) = 70α + 10
[Graph: H1, H2 and H3 plotted against α for 0 ≤ α ≤ 1, with intercepts −30, −10 and 10
at α = 0.]
So, when α = 0.25, we have H1 = 5, H2 = 15 and H3 = 27.5, giving A3 as the optimal strategy.
For what values of α is A3 the Hurwicz strategy?
In general, the Hurwicz solution is a compromise of the two extreme states of nature. However,
the method requires us to choose α, and it does not take into account the other states of nature
(which may be more likely).
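The criteria discussed so far are easy to compare side by side. A sketch for the payoff matrix of Example 4.9:

```python
# Decision criteria for a game against nature (Example 4.9).
payoffs = {"A1": [110, 45, -30],   # risky
           "A2": [90, 55, -10],    # neutral
           "A3": [80, 40, 10]}     # cautious

maximin = max(payoffs, key=lambda a: min(payoffs[a]))   # pessimistic
maximax = max(payoffs, key=lambda a: max(payoffs[a]))   # risk-taking
laplace = max(payoffs, key=lambda a: sum(payoffs[a]) / 3)

def hurwicz(a, alpha):
    """Weighted mix of the best and worst payoffs of strategy a."""
    return alpha * max(payoffs[a]) + (1 - alpha) * min(payoffs[a])

hurwicz_25 = max(payoffs, key=lambda a: hurwicz(a, 0.25))

print(maximin, maximax, laplace, hurwicz_25)   # A3 A1 A2 A3
print([hurwicz(a, 0.25) for a in payoffs])     # [5.0, 15.0, 27.5]
```

Each criterion picks a different strategy, which is the point: the "optimal" choice depends on the attitude to risk encoded in the criterion.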
Minimax regret
Another criterion compares each payoff with the best available in the same state of nature: the
regret of strategy Ai in state j is the best payoff in column j minus a_ij. For Example 4.9 the
regrets are:

                 Good   Moderate   Poor
A1 (risky)         0       10       40
A2 (neutral)      20        0       20
A3 (cautious)     30       15        0

The minimax-regret strategy minimises the maximum regret; the maximum regrets are 40, 20
and 30, so A2 is chosen.

Expected payoff
Suppose the player assigns probability pj to state of nature j (with p1 + ⋯ + pm = 1). The
expected payoff of strategy Ai is

Ep(Ai) = Σ_{j=1}^{m} pj a_{ij} ,    i = 1, …, n.
We can choose our preferred strategy to be the pure strategy Ai that maximises Ep . Note that
if each pj = 1/m, we have the Laplacian strategy.
Similarly, the variance of the payoff is

Vp(Ai) = Σ_{j=1}^{m} pj (a_{ij} − Ep(Ai))² ,

and the expected value–standard deviation (EVSD) criterion scores strategy Ai as

EVSDp(Ai, K) = Ep(Ai) − K √Vp(Ai) ,

for some K ≥ 0. If K is large, strategies with a highly variable payoff are penalised. The optimal
strategy under the EVSD criterion is the strategy Ai that maximises EVSDp(Ai, K).
In Wilkes (1989) and many other OR books the expected value–variance criterion is considered
instead. The EVSD is better from a statistical viewpoint because the standard deviation is
measured on the same scale as the expected value, and so the EVSD has the same units as the
expected value.
Chapter 5
Dynamic Programming
5.1
Introduction
Example 5.1
Look at the following network. We have to get from node A to node Z in the shortest possible
time, moving only in the direction of the arrows. The time needed to move from one node
to another is indicated above the arrow. E.g. one possible route is A → C → G → J → L → Z,
which takes 4 + 6 + 3 + 3 + 5 = 21 minutes.
[Network diagram: nodes A to Z arranged in stages, with the travel time marked on each arrow.]
We could inspect all possible routes. However, there are many possible routes, and this number
would grow rapidly as the network grew. Dynamic programming is a more efficient method to
solve this problem.
Dynamic programming is used when the problem can be separated into stages. An optimisation
is performed at each stage in turn, but the optimal decision found at each stage depends on the
optimal decision found at the next stage, and so on. It is only when the optimisation has been
performed at the final stage that it becomes clear what the optimal decision is at each of the
earlier stages.
One application of dynamic programming is to find the route through a network that gives the
minimum total cost or maximum total reward. Example 5.1 is an example of such a problem,
in which the costs are the times.
Notation
Stage n ∈ {0, 1, …, N}.
Sn is the state at stage n.
Qn is the state space at stage n, i.e. the set of all possible states at stage n. (So, Sn ∈ Qn.)
ci,j is the transition cost for moving from state i to state j.
For each state in stage 1, what is the quickest route from stage 0? How long does it take?
For each state in stage 2, what is the quickest route from stage 1? How long does it take?
.
.
.
Finally, for the state Z in stage 5, what is the quickest route from stage 4? How long does
it take?
We have separated the problem into stages and solved each stage in turn. This is dynamic
programming. The method we have just used is a dynamic programming algorithm called
forward recursion.
5.2
Forward Recursion
Define fn(Sn) as the minimum cost for moving from (any state in) stage 0 to state Sn in stage n.
Clearly, f0(S0) = 0 ∀ S0 ∈ Q0.
The forward recursion equation is given by

fn(Sn) = min_{S_{n−1} ∈ Q_{n−1}} { f_{n−1}(S_{n−1}) + c_{S_{n−1}, S_n} }     (for n = 1, …, N).

If the transition from a state Sn to a state Sn+1 is not possible, treat c_{S_n, S_{n+1}} as ∞.
The optimal route to B is A → B. Similarly for all the other states, C, D and E, in stage 1.
Then

f2(F) = min_{S1 ∈ {B,C,D,E}} { f1(S1) + c_{S1,F} } = min {2 + 8, 4 + 2, 1 + ∞, 3 + 8} = 6.

So, the optimal route to F passes through C. Since we know that the optimal route to C is
A → C, the optimal route to F is A → C → F. Similarly for the other states, G and H, in stage
2. We do the same for stages 3 and 4, and find that f4(L) = 10, f4(M) = 8 and the optimal
route to M is A → E → G → J → M.
Finally,

f5(Z) = min_{S4 ∈ {L,M}} { f4(S4) + c_{S4,Z} } = min {10 + 5, 8 + 1} = 9.

So, the optimal route to Z passes through M. Since we know that the optimal route to M is
A → E → G → J → M, the optimal route to Z is A → E → G → J → M → Z.
Maximising reward
Alternatively, instead of minimising a total cost, we might want to maximise a total reward.
Here, instead of ci,j being the transition cost for moving from state i to state j, ri,j is the
transition reward.
Define fn(Sn) as the maximum reward for moving from (any state in) stage 0 to state Sn in stage
n, and let f0(S0) = 0 ∀ S0 ∈ Q0 again. The forward recursion equation to maximise a reward
is given by

fn(Sn) = max_{S_{n−1} ∈ Q_{n−1}} { f_{n−1}(S_{n−1}) + r_{S_{n−1}, S_n} }.
The forward recursion equation implies that it is enough to know the optimal rewards/costs for
reaching every state in the previous stage and the transition rewards/costs to get from each of
these states to the current state. We do not need to consider the entire path to reach the current
state.
Example 5.2
The following diagram shows a network with 4 stages (0, 1, 2 and 3) and the transition rewards.
Find the route from stage 0 to stage 3 that maximises the total reward. Note: You may end in
any of the three states of stage 3.
[Diagram: a four-stage network. Stage 0 contains state 1; stage 1 contains states 2, 3 and 4;
stage 2 contains states 5, 6 and 7; stage 3 contains states 8, 9 and 10. The transition rewards
are the values used in the calculations below.]
We find fn(Sn) for every state at every stage (starting with stage 0), together with the state in
the previous stage through which we should pass.
Stage 0 f0 (1) = 0.
Stage 1 The maximum rewards at the rst stage are just the transition rewards:
f1 (2) = max{f0 (1) + r1,2 } = 3
f1 (3) = max{f0 (1) + r1,3 } = 9
f1 (4) = max{f0 (1) + r1,4 } = 6
The optimal routes are obviously 1 → 2, 1 → 3 and 1 → 4.
Stage 2
f2 (5) = max{f1 (2) + r2,5 , f1 (3) + r3,5 } = max{3 + 8, 9 + 4} = 13 (through state 3)
f2 (6) = max{f1 (2)+r2,6 , f1 (3)+r3,6 , f1 (4)+r4,6 } = max{3+2, 9+5, 6+10} = 16 (through 4)
f2 (7) = max{f1 (3) + r3,7 , f1 (4) + r4,7 } = max{9 + 7, 6 + 9} = 16 (through 3)
Stage 3
f3 (8) = max{f2 (5) + r5,8 , f2 (6) + r6,8 } = max{13 + 8, 16 + 4} = 21 (through 5)
f3 (9) = max{f2 (5)+r5,9 , f2 (6)+r6,9 , f2 (7)+r7,9 } = max{13+8, 16+2, 16+1} = 21 (through 5)
f3 (10) = max{f2 (6) + r6,10 , f2 (7) + r7,10 } = max{16 + 1, 16 + 6} = 22 (through 7)
The optimal route ends at state 10 and passes through 7. The optimal route to 7 passes
through 3. So, the optimal route is 1 → 3 → 7 → 10.
Check that this route gives the correct total reward:

r1,3 + r3,7 + r7,10 = 9 + 7 + 6 = 22.
Note
To find the optimal route, we do not actually need to note down the optimal route to every
state. We can wait until we get to the end, when we know the optimal reward, and then find
the optimal route by rolling back the solution. You may prefer to find the optimal route this
way (it involves less work). In this example we want to finish at state 10, giving a reward of
22. But 22 comes from f2(7) + r7,10 = 16 + 6, so if we want the optimal reward we must visit
state 7. The optimal reward for getting to state 7 is obtained from f1(3) + r3,7 = 9 + 7, so we
must visit state 3. Thus the optimal route is to go through states 3, 7 and 10, giving a reward
of 9 + 7 + 6 = 22.
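The forward recursion and the rollback just described can be written directly from the recursion equation. A sketch for Example 5.2:

```python
# Transition rewards of Example 5.2, keyed by (from_state, to_state).
r = {(1, 2): 3, (1, 3): 9, (1, 4): 6,
     (2, 5): 8, (2, 6): 2,
     (3, 5): 4, (3, 6): 5, (3, 7): 7,
     (4, 6): 10, (4, 7): 9,
     (5, 8): 8, (5, 9): 8,
     (6, 8): 4, (6, 9): 2, (6, 10): 1,
     (7, 9): 1, (7, 10): 6}
stages = [[1], [2, 3, 4], [5, 6, 7], [8, 9, 10]]

f = {1: 0}     # f_n(S_n): best reward from stage 0 to state S_n
back = {}      # predecessor achieving the maximum (for the rollback)
for prev, curr in zip(stages, stages[1:]):
    for s in curr:
        cands = [(f[p] + r[(p, s)], p) for p in prev if (p, s) in r]
        f[s], back[s] = max(cands)

# Roll back from the best final state to recover the optimal route.
state = max(stages[-1], key=lambda s: f[s])
route = [state]
while state in back:
    state = back[state]
    route.append(state)
route.reverse()

print({s: f[s] for s in stages[-1]})  # {8: 21, 9: 21, 10: 22}
print(route)                          # [1, 3, 7, 10]
```

The computed stage-3 values (21, 21, 22) and route 1 → 3 → 7 → 10 match the hand calculation above; for minimum-cost problems, replace `max` by `min`.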
5.3
Backward Recursion
Using the same method as before, the value fn(Sn) can be written next to the appropriate
state. Note that this is a minimisation problem, as we are dealing with costs.
We can also use Backward Recursion.
Here, we define gn(Sn) as the maximum reward / minimum cost for moving from state Sn in
stage n to (any state in) stage N. Clearly, gN(SN) = 0 ∀ SN ∈ QN.
Now, we work backwards, going first to stage N − 1, then to N − 2, etc., and ending in stage 0.
For a minimisation problem the backward recursion equation is

gn(Sn) = min_{S_{n+1} ∈ Q_{n+1}} { c_{S_n, S_{n+1}} + g_{n+1}(S_{n+1}) }     (n = 0, 1, …, N − 1).
5.4
Resource Allocation
This is a common type of problem which, once formulated as a dynamic programming problem,
can be solved in a straightforward manner.
Example 5.4
A company wishes to expand its business by enlarging three of its manufacturing plants. For
each type of expansion there is an expense and a future revenue (reward) according to the
following table (all entries in M).
     Plant 1              Plant 2              Plant 3
expense  revenue     expense  revenue     expense  revenue
   0        0           0        0           0        0
   1        5           2        8           1        3
   2        6           3        9
                        4       12

The company has £5M available in total. We formulate this as a dynamic programming problem
with four stages:

0: no expansion decisions have yet been made;
1: plant 1's expansion has been decided;
2: plants 1 and 2 have been decided;
3: all three plants have been decided.

The state Sn is the total amount (in £M) spent after stage n, so 0 ≤ Sn ≤ 5, and rn(Sn, Sn+1)
is the revenue obtained by spending Sn+1 − Sn on plant n + 1.
Forward recursion
Let fn (Sn ) be the maximum revenue available in stages 0, . . . , n when the state at stage n is Sn
(i.e. having spent Sn million expanding plants 1, . . . , n).
So, the forward recursion equation is

fn(Sn) = max_{S_{n−1} ∈ Q_{n−1}} { f_{n−1}(S_{n−1}) + r_{n−1}(S_{n−1}, Sn) }.
Stage 1
f1(S1) = max_{S0 ∈ {0}} {f0(S0) + r0(S0, S1)} = r0(0, S1).

S1        0   1   2
f1(S1)    0   5   6
Stage 2
f2(S2) = max_{S1 ∈ {0,1,2}} {f1(S1) + r1(S1, S2)}.

S2    S1 = 0   S1 = 1   S1 = 2   f2(S2)   best S1
0      0+0       -        -         0        0
1       -       5+0       -         5        1
2      0+8       -       6+0        8        0
3      0+9      5+8       -        13        1
4      0+12     5+9      6+8       14      1 or 2
5       -       5+12     6+9       17        1
Stage 3
f3(S3) = max_{S2 ∈ {0,…,5}} {f2(S2) + r2(S2, S3)}.

S3    S2 = 0   S2 = 1   S2 = 2   S2 = 3   S2 = 4   S2 = 5   f3(S3)   best S2
0      0+0       -        -        -        -        -          0        0
1      0+3      5+0       -        -        -        -          5        1
2       -       5+3      8+0       -        -        -          8      1 or 2
3       -        -       8+3     13+0       -        -         13        3
4       -        -        -      13+3     14+0       -         16        3
5       -        -        -        -      14+3     17+0        17      4 or 5
So the maximum total revenue is f3(5) = 17, i.e. spending the full £5M.
Backward recursion
Let gn (Sn ) be the maximum revenue available over future stages, starting in state Sn at stage n
(i.e. having spent Sn million expanding plants 1, . . . , n).
Clearly, g3(S3) = 0 ∀ S3 ∈ Q3. Now,

gn(Sn) = max_{S_{n+1} ∈ Q_{n+1}} { rn(Sn, S_{n+1}) + g_{n+1}(S_{n+1}) }     (n = 0, 1, 2).
Stage 2
g2(S2) = max_{S3 ∈ Q3} {g3(S3) + r2(S2, S3)}. For each S2 the only options are to spend 0 or
1 on plant 3 (spending 1 is impossible when S2 = 5):

S2    spend 0   spend 1   g2(S2)   best S3
0      0+0       0+3         3        1
1      0+0       0+3         3        2
2      0+0       0+3         3        3
3      0+0       0+3         3        4
4      0+0       0+3         3        5
5      0+0        -          0        5

Stage 1
g1(S1) = max_{S2 ∈ Q2} {g2(S2) + r1(S1, S2)}.
S1    spend 0   spend 2   spend 3   spend 4   g1(S1)   best S2
0      3+0       3+8       3+9      3+12        15        4
1      3+0       3+8       3+9      0+12        12      4 or 5
2      3+0       3+8       0+9        -         11        4
Stage 0
g0(S0) = max_{S1 ∈ {0,1,2}} {g1(S1) + r0(S0, S1)}.

S0    S1 = 0   S1 = 1   S1 = 2   g0(S0)   best S1
0      15+0     12+5     11+6      17      1 or 2

So the maximum total revenue is again 17 (£M), agreeing with the forward recursion.
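With only 3 × 4 × 2 = 24 possible spending plans, the DP tables can be cross-checked by brute force (the DP is what scales; this check does not). A sketch, assuming a total budget of £5M:

```python
from itertools import product

# (expense, revenue) options for each plant, in £M (Example 5.4).
plant1 = [(0, 0), (1, 5), (2, 6)]
plant2 = [(0, 0), (2, 8), (3, 9), (4, 12)]
plant3 = [(0, 0), (1, 3)]
budget = 5   # assumed total amount available

plans = [((e1, e2, e3), r1 + r2 + r3)
         for (e1, r1), (e2, r2), (e3, r3) in product(plant1, plant2, plant3)
         if e1 + e2 + e3 <= budget]   # keep only affordable plans
best_rev = max(r for _, r in plans)
optimal = [e for e, r in plans if r == best_rev]

print(best_rev)   # 17
print(optimal)    # [(1, 3, 1), (1, 4, 0), (2, 2, 1)]
```

The maximum revenue of 17 matches f3(5) = g0(0) = 17, and the three optimal spending plans correspond to the "or" entries (ties) in the DP tables.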
5.5
Stock Control
Notation
t     month (t = 1, …, T)
Qt    number of items manufactured in month t
Vt    number of items distributed to the retailers during the first day of month t
ct    manufacturing cost (per item) in month t
pt    retail price (per item) in month t
It    number of items stored in the warehouse at the beginning of the 1st day of month t
K     the maximum capacity (no. of items that can be stored)
πt    profit available in months t, t + 1, …, T.

Each of Qt, Vt, ct, pt, It and K is ≥ 0, and for any month t the number of items that can be
stored cannot exceed the maximum capacity, i.e. 0 ≤ It ≤ K. The costs, ct, and prices, pt, are
assumed to be known in advance but may be different from month to month.
The number of items in the warehouse at the start of month t + 1 is the number of items there
79
at the start of month t, plus the number of items manufactured in month t minus the number
of items distributed in month t, giving
It+1 = It − Vt + Qt.

We want to choose Vt and Qt (t = 1, …, T) so that the profit π1 is maximised.
The above is a general problem. In a specic example we have:
The maximum capacity is K = 1000.
The initial amount of stock in the warehouse is I1 = 300.
We are considering the problem over a six month period, so T = 6,
and at the end of the 6 month period we want an empty warehouse, so I7 = 0.
The values ct and pt are
t ct pt
1 70 90
2 64 82
3 72 70
4 70 85
5 65 90
6 65 85
We need to use backward recursion for this problem.
Month 6
Let t = 6, and assume that today is the first day of month 6. Pretend that the number of items
in the warehouse at the start of today, I6, is fixed.
The profit for month 6 is π6 = p6V6 − c6Q6. Since we are required to have I7 = 0, where
I7 = I6 − V6 + Q6, and Q6 ≥ 0, the only way that this can be satisfied is for Q6 = 0 and
V6 = I6. So

π6* = p6 I6 = 85 I6,

with V6* = I6 and Q6* = 0.
Month 5
Now let t = 5, and assume that today is the first day of month 5. Pretend that the number of
items in the warehouse at the start of today, I5, is fixed, and allow the number of items in
the warehouse one month later, I6, to be variable.
The two-month profit is

π5 = p5V5 − c5Q5 + π6*
   = p5V5 − c5Q5 + 85 I6
   = 90V5 − 65Q5 + 85(I5 − V5 + Q5)
   = 5V5 + 20Q5 + 85 I5.
80
This gives a linear programming problem in two variables V5 and Q5 with a constant added to
the objective function:

maximise:   π5 = 5V5 + 20Q5 + 85 I5
subject to: V5 ≤ I5
            Q5 − V5 ≤ K − I5
            Q5, V5 ≥ 0.
The feasible region is bounded by V5 ≤ I5 and Q5 ≤ K − I5 + V5. The objective contours are
the lines

Q5 = −(5/20)V5 + (π5 − 85 I5)/20,

of slope −1/4, so the maximum is at the corner

V5* = I5 and Q5* = K,

giving

π5* = 5 I5 + 20K + 85 I5 = 90 I5 + 20K.

This corresponds to distributing everything we have this month and filling up the warehouse for
the start of next month.
Month 4
Step back another month, so t = 4. Now we pretend that I4 is fixed, but allow I5 to be
variable.
The three-month profit is

π4 = p4V4 − c4Q4 + π5*
   = p4V4 − c4Q4 + 90 I5 + 20K
   = 85V4 − 70Q4 + 90(I4 − V4 + Q4) + 20K
   = −5V4 + 20Q4 + 90 I4 + 20K.
Similarly to when t = 5, we have the LPP

maximise:   π4 = −5V4 + 20Q4 + 90 I4 + 20K
subject to: V4 ≤ I4
            Q4 − V4 ≤ K − I4
            Q4, V4 ≥ 0.

This has exactly the same feasible region as before (apart from replacing I5, Q5 and V5 with I4,
Q4 and V4). Now the slope of the objective contours is +1/4, which again gives the maximum
at the point

V4* = I4 and Q4* = K,

giving

π4* = 85 I4 + 40K.
Month 3
Exactly the same method can be applied, so

π3 = 70V3 − 72Q3 + 85(I3 − V3 + Q3) + 40K = −15V3 + 13Q3 + 85 I3 + 40K,

and again the feasible region is the same (apart from replacing I4, Q4 and V4 with I3, Q3 and
V3). This time the slope is 15/13 > 1, so the optimum is at

V3* = 0 and Q3* = K − I3,

giving

π3* = 72 I3 + 53K.
Month 2
Applying the same method again, π2 = 82V2 − 64Q2 + 72(I2 − V2 + Q2) + 53K
= 10V2 + 8Q2 + 72 I2 + 53K, which is maximised at

V2* = I2 and Q2* = K,

giving

π2* = 82 I2 + 61K.
82
Month 1
Likewise π1 = 90V1 − 70Q1 + 82(I1 − V1 + Q1) + 61K = 8V1 + 12Q1 + 82 I1 + 61K, which
is maximised at

V1* = I1 and Q1* = K,

giving

π1* = 90 I1 + 73K.
We also have enough information to calculate the numerical values of Vt*, Qt*, It and πt*:

t          1        2        3        4        5        6
Vt*       300     1000        0     1000     1000     1000
Qt*      1000     1000        0     1000     1000        0
It        300     1000     1000     1000     1000     1000
πt*   100 000  143 000  125 000  125 000  110 000   85 000

(with I7 = 0, as required).
Question: why is π2* > π1*?
Note: Forward recursion does not work with this problem. This is because the maximum profit
made in the first two months is attained by a different strategy than the one that is optimal
over the first month.
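The derived policy and the value π1* = 90 I1 + 73K = 100 000 can be verified by simulating the six months directly:

```python
# Simulate the optimal policy derived above for the warehouse example,
# with K = 1000 and I1 = 300.
c = {1: 70, 2: 64, 3: 72, 4: 70, 5: 65, 6: 65}   # unit manufacturing cost
p = {1: 90, 2: 82, 3: 70, 4: 85, 5: 90, 6: 85}   # unit retail price
K, I = 1000, 300

profit = 0
for t in range(1, 7):
    if t == 3:                 # month 3: sell nothing, top the warehouse up
        V, Q = 0, K - I
    elif t == 6:               # final month: empty the warehouse, make nothing
        V, Q = I, 0
    else:                      # otherwise: sell everything, refill to capacity
        V, Q = I, K
    profit += p[t] * V - c[t] * Q
    I = I - V + Q              # stock carried into month t + 1

print(profit, I)   # 100000 0
```

Month 1 actually makes a loss (building up stock), which is precisely why forward recursion — which would optimise the early months in isolation — fails here.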
Chapter 6
Stochastic Optimisation
6.1
Introduction
In Chapter 5 the rewards/costs of the transitions were assumed to be known exactly and we were
able to choose which transitions to make. In this chapter we deal with problems where these
assumptions are violated. Such problems are solved by stochastic programming or stochastic
optimisation algorithms. We shall concentrate on one aspect of stochastic optimisation known
as Markov Dynamic Programming (MDP), also known as Markov Programming or Markov
Decision Programming.
A stochastic (random) process is a set of random variables indexed by time, {Xt : t ≥ 0}. We shall only consider discrete-time random processes, X0, X1, X2, . . . .
6.2
Markov Chains
Definition
A sequence of random variables X0, X1, . . . is a Markov chain if the Markov property holds for each random variable in the sequence: that is, for all n ≥ 0,
P(Xn+1 ≤ xn+1 | X0 = x0, X1 = x1, . . . , Xn = xn) = P(Xn+1 ≤ xn+1 | Xn = xn).
If X0, X1, . . . are discrete random variables, we can write the Markov property equivalently as
P(Xn+1 = xn+1 | X0 = x0, X1 = x1, . . . , Xn = xn) = P(Xn+1 = xn+1 | Xn = xn)
(i.e. change the ≤s to =s).
Example 6.1
Mr. Bond is playing roulette. At each turn he places a £1 chip on number 17. Let Xn denote the number of £1 chips he has after n turns. He begins with X0 = x0 chips.
The number of chips he has after n + 1 turns, Xn+1, depends on the number of chips he has after n turns. Given Xn, Xn+1 is independent of X0, . . . , Xn-1. Therefore X0, X1, . . . is a Markov chain.
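A short simulation illustrates why the chip count is Markov: the next value of Xn depends only on the current one. The wheel details below (a European wheel with win probability 1/37 and a straight-up payout of 35:1) are assumptions for illustration, not part of the notes.

```python
import random

# Simulate Mr. Bond's chip count as a Markov chain. Each turn the count
# moves by +35 (win, prob 1/37) or -1 (lose), and 0 is absorbing.
def play(x0, turns, rng):
    x = x0
    path = [x]
    for _ in range(turns):
        if x == 0:              # no chips left: the chain stays at 0
            path.append(0)
            continue
        x += 35 if rng.random() < 1 / 37 else -1
        path.append(x)
    return path

rng = random.Random(17)
path = play(100, 50, rng)
# Each step depends only on the current state, so the process is Markov.
print(path[-1])
```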
In most of this chapter we consider finite horizon problems, i.e. n = 0, 1, . . . , N, with N fixed. In Section 6.6 we allow an infinite number of stages.
In a finite horizon Markov dynamic programming problem we have the following:
1. a Markov chain X0, X1, . . . , XN with known transition probabilities,
2. costs or rewards associated with each transition in the Markov chain,
3. terminal costs or rewards associated with the states at the final stage, and
4. (usually) actions that alter the transition probabilities and costs/rewards.
Notation
We must always use backward recursion for MDP problems (for reasons explained later). For this reason, it is convenient to use n to denote the number of stages to go, rather than the number of the stage (as was the case in Chapter 5). So:
Let p^(n)_{i,j} denote the transition probability for moving from state i with n stages to go to state j (with n - 1 stages to go):
p^(n)_{i,j} = P(X_{n-1} = j | X_n = i).
The transition matrix P^(n) is the matrix whose (i, j)th entry is p^(n)_{i,j}.
Clearly,
0 ≤ p^(n)_{i,j} ≤ 1   and   Σ_j p^(n)_{i,j} = 1 for all i.
So, each element of P^(n) must be between zero and one, and the rows must sum to one.
Let r^(n)_{i,j} or c^(n)_{i,j} denote the transition reward/cost for the transition from state i with n stages to go to state j (with n - 1 stages to go).
Let r^(0)_i or c^(0)_i denote the terminal reward/cost for state i, i.e. the reward/cost incurred when we end in state i.
Note: Often the transition probabilities will be the same at every stage, in which case the superscript (n) in p^(n)_{i,j} and P^(n) can be dropped.
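As a quick sanity check, the two conditions can be verified in a couple of lines for a hypothetical transition matrix (the entries below are made up for illustration):

```python
# Check that a proposed 3-state transition matrix is valid:
# every entry lies in [0, 1] and every row sums to one.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.0, 0.4, 0.6]]

assert all(0 <= p <= 1 for row in P for p in row)      # entries in [0, 1]
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)   # rows sum to one

print("valid transition matrix")
```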
6.3
Example 6.2
[Network diagram: stages 0-3 (n = 3, 2, 1, 0), showing transition probabilities and rewards on the arcs, and terminal rewards at the final stage.]
So, the state space for X0 is Q0 = {8, 9, 10}, for X1 it is Q1 = {5, 6, 7}, for X2 it is Q2 = {2, 3, 4} and for X3 it is Q3 = {1}.
The transition probabilities are P(X0 = 8 | X1 = 6) = p^(1)_{6,8} = 0.4, etc. The transition rewards are r^(3)_{1,2} = 3, r^(2)_{2,5} = 8, etc. Finally, the terminal rewards are r^(0)_8 = 1, r^(0)_9 = 1 and r^(0)_10 = 2.
Definitions
Let Rn(i) be the one-step expected reward when in state i with n stages to go (Xn = i). The formula for Rn(i) is
Rn(i) = Σ_{j∈Q_{n-1}} p^(n)_{i,j} r^(n)_{i,j}   for n ≥ 1,
R0(i) = r^(0)_i.
Let Vn(i) be the total future expected reward when in state i with n stages to go (Xn = i). The function Vn(i) is called the value function. It satisfies
Vn(i) = Σ_{j∈Q_{n-1}} p^(n)_{i,j} [ r^(n)_{i,j} + V_{n-1}(j) ]   for n ≥ 1,
V0(i) = r^(0)_i,
or, equivalently,
Vn(i) = Rn(i) + Σ_{j∈Q_{n-1}} p^(n)_{i,j} V_{n-1}(j)   for n ≥ 1,
V0(i) = R0(i) = r^(0)_i.
This is a recursive formula for Vn(i) in terms of V_{n-1}, so we can use similar methods to those used in the last chapter in order to find the expected reward over the whole problem, V3(1).
So, we start at stage 3, when n = 0:
V0(8) = r^(0)_8 =        V0(9) = r^(0)_9 =        V0(10) = r^(0)_10 =
Then we compute the one-step expected rewards with one stage to go:
R1(5) = Σ_{j=8}^{10} p^(1)_{5,j} r^(1)_{5,j},
R1(6) = Σ_{j=8}^{10} p^(1)_{6,j} r^(1)_{6,j},   and
R1(7) = Σ_{j=8}^{10} p^(1)_{7,j} r^(1)_{7,j}.
This in turn allows us to compute the value function for each state in stage 2 (n = 1):
V1(5) = R1(5) + Σ_{j=8}^{10} p^(1)_{5,j} V0(j),
V1(6) = R1(6) + Σ_{j=8}^{10} p^(1)_{6,j} V0(j),
V1(7) = R1(7) + Σ_{j=8}^{10} p^(1)_{7,j} V0(j).
Note: the deterministic Example 5.2 had the same network and the same rewards (but no
terminal rewards). The maximum total reward was 22 (24 if we add the terminal reward). This
was achieved by visiting states 1, 3, 7 and 10. In the MDP example, we cannot decide which
route to take (the route is random), so it is not surprising that the expected total reward is less
than the maximum reward.
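The recursion Vn(i) = Σ_j p^(n)_{i,j} [ r^(n)_{i,j} + V_{n-1}(j) ] can be sketched in a few lines. The two-state chain below uses invented probabilities and rewards, not the Example 6.2 data:

```python
# Backward recursion for a Markov chain with transition rewards and a
# time-homogeneous transition matrix. All numbers are illustrative.
states = ('x', 'y')
p = {('x', 'x'): 0.6, ('x', 'y'): 0.4,    # transition probabilities
     ('y', 'x'): 0.2, ('y', 'y'): 0.8}
r = {('x', 'x'): 5, ('x', 'y'): 1,        # transition rewards
     ('y', 'x'): 2, ('y', 'y'): 4}

V = {'x': 1.0, 'y': 2.0}                  # V_0(j) = terminal reward r^(0)_j
for n in (1, 2):                          # two stages to go
    V = {i: sum(p[i, j] * (r[i, j] + V[j]) for j in states)
         for i in states}
print(V)
```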
6.4
Suppose the process is in state i with n stages to go (Xn = i), and that we now have K actions available, A^(n) = {a1^(n), a2^(n), . . . , aK^(n)}. We call A^(n) the action space.
We can extend the dynamic programming model to depend on the actions taken: the transition probabilities become p^(n)_{i,j}(a^(n)) and the transition rewards become r^(n)_{i,j}(a^(n)).
Now that we have actions, there is a choice available. We want to find the sequence of actions that will maximise our expected total reward, or minimise our expected total cost. First, we extend the definitions of the one-step expected reward and the value function to take account of the actions chosen.
Definitions
Rn(i, a^(n)) is the one-step expected reward when the state is i with n stages to go (Xn = i) and action a^(n) is taken:
Rn(i, a^(n)) = Σ_{j∈Q_{n-1}} p^(n)_{i,j}(a^(n)) r^(n)_{i,j}(a^(n))   for n ≥ 1,
R0(i) = r^(0)_i.
Vn(i) is the future expected reward when in state i with n stages to go and an optimal action plan is used for these remaining n stages. Vn(i) is called the optimal value function. It is given by the following recursive equation:
Vn(i) = max_{a^(n)∈A^(n)} Σ_{j∈Q_{n-1}} p^(n)_{i,j}(a^(n)) [ r^(n)_{i,j}(a^(n)) + V_{n-1}(j) ]   for n ≥ 1,
V0(i) = r^(0)_i,
or, equivalently,
Vn(i) = max_{a^(n)∈A^(n)} { Rn(i, a^(n)) + Σ_{j∈Q_{n-1}} p^(n)_{i,j}(a^(n)) V_{n-1}(j) }.
A ticket agency is responsible for advertising and selling tickets for a concert which will take place in two weeks. Each week the agency has the choice either to put a half-page advert in the major newspapers and magazines, or just to submit the concert details to the "what's on" listings. To simplify the problem, the agency categorises the ticket sales into three states: fast ticket sales, average ticket sales and slow ticket sales. At present, the ticket sales are average.
Let n denote the number of weeks remaining before the concert takes place. Label the states as: 3 for fast sales, 2 for average sales and 1 for slow sales. Label the actions as A for "Advertise" and D for "Don't advertise". As the actions are the same at each stage we do not have to superscript them with n.
We can represent this as a network.
r^(1)_{i,j}(A) =             r^(1)_{i,j}(D) =
( -1  0  1 )                 ( 1  2  3 )
(  0  1  2 )                 ( 2  3  4 )
(  1  2  3 )                 ( 3  4  5 )

r^(2)_{1,j}(A) = ( -1  0  1 )      p^(2)_{1,j}(A) = ( .4  .3  .3 )
r^(2)_{1,j}(D) = (  1  2  3 )      p^(2)_{1,j}(D) = ( .6  .3  .1 )

[Two further probability rows appear at this point in the source, ( .5 .3 .2 ) and ( 1.0 0.0 0.0 ); their labels were lost in extraction.]
The way to solve this problem is to set up a table for every state and possible action in each stage. For each action a and state i, compute Rn(i, a) and Σ_{j∈Q} p^(n)_{i,j}(a) V_{n-1}(j). This gives us
Rn(i, a) + Σ_{j∈Q} p^(n)_{i,j}(a) V_{n-1}(j).
 n   state i   action a   Rn(i, a)   Σ_j p^(n)_{i,j}(a) V_{n-1}(j)   sum   Vn(i)
 1      3         A
                  D
        2         A
                  D
        1         A
                  D
 2      2         A
                  D

The last two columns tell us the optimal expected future reward and the action that should be taken in order to attain the optimum. The optimal actions are: don't advertise with two weeks to go; and with one week to go, advertise only if the ticket sales are slow. The optimal expected future reward is 7.340.
Notes:
The solution requires that the optimal actions are found for every state.
In deterministic dynamic programming problems the choice is the path through the network. In MDP we cannot choose the path that is taken. Instead, we choose actions which influence the probabilities of which path is taken.
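A minimal sketch of finite-horizon value iteration with actions, Vn(i) = max_a [ Rn(i, a) + Σ_j p_{i,j}(a) V_{n-1}(j) ], on an invented two-state, two-action problem (the numbers below are not the ticket agency's data):

```python
# Finite-horizon MDP value iteration on a made-up two-state problem.
states, actions = (0, 1), ('A', 'D')
p = {'A': [[0.3, 0.7], [0.2, 0.8]],   # p[a][i][j]: transition probabilities
     'D': [[0.9, 0.1], [0.6, 0.4]]}
R = {'A': [1.0, 3.0],                 # R[a][i]: one-step expected rewards
     'D': [2.0, 4.0]}

V = [0.0, 0.0]                        # terminal rewards r^(0)_i = 0
policy = []
for n in (1, 2):                      # n stages to go
    U = {a: [R[a][i] + sum(p[a][i][j] * V[j] for j in states)
             for i in states] for a in actions}
    policy.append([max(actions, key=lambda a: U[a][i]) for i in states])
    V = [max(U[a][i] for a in actions) for i in states]
print(V, policy)
```

Note that the optimal action can change with the number of stages to go, just as in the agency example (here action A becomes optimal in state 0 once two stages remain).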
6.5
Discounting
Suppose a profit of £1 now is worth more than £1 at the next stage. Reasons for this could be:
In order to account for this, we use a discount factor α, which compounds year on year. We shall assume that α is constant and 0 < α < 1. So an amount of £1 at the next stage is worth £α at today's prices.
With discounting, the recursive equation becomes
Vn(i) = max_{a^(n)∈A^(n)} { Rn(i, a^(n)) + α Σ_{j∈Q_{n-1}} p^(n)_{i,j}(a^(n)) V_{n-1}(j) }.
Exercises 6, Question 5
Repeat the ticket agency example using a discount factor of
1. α = 0.9
2. α = 0.4
and compare with the initial answer.
6.6
When the problem has no (obvious) final stage, we still want to find the optimal value function and the optimal actions.
We shall assume that there is a discount factor α (0 < α < 1) and that the state space and action space are the same for each stage. So, we can drop the subscript/superscript n.
The optimal value function, i.e. the optimal expected future (discounted) reward, is
V(i) = max_{a∈A} { R(i, a) + α Σ_{j∈Q} p_{i,j}(a) V(j) }
     = R(i, a*(i)) + α Σ_{j∈Q} p_{i,j}(a*(i)) V(j),
where a*(i) denotes the optimal action in state i.
Let Yt denote the state of the machine at the end of week t. If Yt = i and the action is "don't replace" (a = 2), we incur a running cost of c(i) = i and
Yt+1 = i     with prob. 1/2,
Yt+1 = i+1   with prob. 1/2.
If the action is "replace" (a = 1), we incur a cost K and
Yt+1 = 0   with prob. 1/2,
Yt+1 = 1   with prob. 1/2.
So the transition probabilities are p_{i,i}(2) = p_{i,i+1}(2) = 1/2 and p_{i,0}(1) = p_{i,1}(1) = 1/2, and the rewards are
r(i, 1) = -K   and   r(i, 2) = -i.
Definition
A policy δ(i) is a decision rule (or action plan) that specifies an action for each state i in the state space Q.
We might guess that the optimal policy is of the form: choose a = 1 (replace) if i ≥ N, and a = 2 (don't replace) if i < N, for some unknown N.
The above policy is δ(i) = 2 for 0 ≤ i ≤ N - 1 and δ(i) = 1 for i ≥ N. The value function corresponding to this policy, V(i), is
V(i) = -i + α [ (1/2) V(i) + (1/2) V(i+1) ]   if 0 ≤ i ≤ N - 1,
V(i) = -K + α [ (1/2) V(0) + (1/2) V(1) ]     if i ≥ N.
Analytical methods can be used to show that this policy is optimal and to find the values of V(i) and N, but this is not covered in this course. Instead we shall use an iterative method.
For this iterative method, pretend that the number of stages is finite. Let Vn(i) be the expected future (discounted) reward when there are n stages to go and we use the optimal policy. Then V0(i) = 0 for all i, and for n ≥ 1
Un(i, 1) = -K + α [ (1/2) V_{n-1}(0) + (1/2) V_{n-1}(1) ],
Un(i, 2) = -i + α [ (1/2) V_{n-1}(i) + (1/2) V_{n-1}(i+1) ],
Vn(i) = max { Un(i, 1), Un(i, 2) }.
If Vn(i) converges (say to V(i)) as n → ∞, then V(i) is the optimal value function for the problem with an infinite number of stages, because it satisfies the equation
V(i) = max_{a∈A} { r(i, a) + α Σ_{j∈Q} p_{i,j}(a) V(j) }.
If we now assume specific values for K and α of 10 and 0.75 respectively, and run the above iterative method, we get:
 State                        Iteration number
   i     V1(i)   a    V2(i)    a    V3(i)    a   ...   V38(i)   a    V39(i)   a
   0      -0     2    -0.375   2    -0.938   2         -5.106   2    -5.106   2
   1      -1     2    -2.125   2    -3.250   2         -8.511   2    -8.511   2
   2      -2     2    -3.875   2    -5.562   2        -11.518   2   -11.518   2
   3      -3     2    -5.625   2    -7.875   2        -13.864   2   -13.864   2
   4      -4     2    -7.375   2   -10.188   2        -15.106   1   -15.106   1
   5      -5     2    -9.125   2   -10.938   1        -15.106   1   -15.106   1
   6      -6     2   -10.375   1   -10.938   1        -15.106   1   -15.106   1
   7      -7     2   -10.375   1   -10.938   1        -15.106   1   -15.106   1
   8      -8     2   -10.375   1   -10.938   1        -15.106   1   -15.106   1
   9      -9     2   -10.375   1   -10.938   1        -15.106   1   -15.106   1
  10     -10     1   -10.375   1   -10.938   1        -15.106   1   -15.106   1
  11     -10     1   -10.375   1   -10.938   1        -15.106   1   -15.106   1
The function Vn(i) converges (but needs 38 iterations to converge to 3 d.p.). The convergence for such problems in general is often very slow.
Notice that the solution is of the form postulated above: replace if i ≥ 4 and continue if i ≤ 3. The optimal value function is given in the rightmost column (with V(i) = -15.106 for all i ≥ 4).
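The iteration in the table can be reproduced directly; the only extra ingredient is a truncation of the (unbounded) state space, here at i = 12 as in the table:

```python
# Value iteration for the machine-replacement example (K = 10, alpha = 0.75).
K, alpha = 10.0, 0.75
M = 12                     # truncate the unbounded state space at M
V = [0.0] * (M + 1)        # V_0(i) = 0
for n in range(39):
    u_rep = -K + alpha * 0.5 * (V[0] + V[1])    # replace: next state 0 or 1
    Vnew = []
    for i in range(M + 1):
        if i < M:
            u_cont = -i + alpha * 0.5 * (V[i] + V[i + 1])   # keep the machine
            Vnew.append(max(u_rep, u_cont))
        else:
            Vnew.append(u_rep)   # replacement forced at the truncation boundary
    V = Vnew

print([round(v, 3) for v in V[:5]])
```

After 39 iterations the values agree with the table above to about 3 decimal places, and states i ≥ 4 all share the "replace" value.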
6.7
At every stage of an optimal stopping routine there are two possible actions, stop or continue. If the action is stop then a reward is obtained, depending on the current state, and that is the end of the routine. If the action is continue then a cost is incurred and the routine proceeds to the next stage, according to a Markov chain. Assume there are only a finite number of stages in the routine.
The routines we shall consider have the following form.
The reward for stopping in state i before the final stage is r(i). This is independent of stage.
The reward for continuing to the final stage is r^(0)(i).
The cost of continuing is c(i). It depends on the state but not the stage.
If continuing, the probability that the next state is j, given that the current state is i with n stages to go, is p^(n)_{ij}.
There is no discounting.
The optimal value function is
Vn(i) = max { r(i), -c(i) + Σ_{j∈Q_{n-1}} p^(n)_{i,j} V_{n-1}(j) }   for n ≥ 1,
V0(i) = r^(0)(i).
The rewards are the amounts in the box (i.e. the states): r(i) = i.
n is the number of stages left, i.e. the number of boxes still to be accepted or rejected.
For n ≥ 1, the state space Qn is the set of all possible values in the box. The amounts of money are in pounds, so Qn = {0, 1, . . . , 999, 1000}.
The terminal reward is nothing: r^(0)(i) = 0.
The transition costs are zero: c(i) = 0.
For n ≥ 1, each value is equally likely, so p^(n)_{ij} = 1/1001 for every j ∈ Q_{n-1}.
Since V1(i) = max{i, 0} = i, we have
V2(i) = max { i, Σ_{j=0}^{1000} p_{ij} V1(j) },
Σ_{j=0}^{1000} p_{ij} V1(j) = (1/1001) Σ_{j=0}^{1000} j = 500,
giving V2(i) = max{i, 500}.
So the optimal policy is to take the money if and only if the amount in the box is more than £500.
V3(i) = max { i, Σ_{j=0}^{1000} p_{ij} V2(j) },
Σ_{j=0}^{1000} p_{ij} V2(j) = (1/1001) [ Σ_{j=0}^{500} 500 + Σ_{j=501}^{1000} j ] = 625.125,
giving V3(i) = max{i, 625.125}, i.e. take the money if and only if the amount is at least £626.
V4(i) = max { i, Σ_{j=0}^{1000} p_{ij} V3(j) },
Σ_{j=0}^{1000} p_{ij} V3(j) = (1/1001) [ Σ_{j=0}^{625} 625.125 + Σ_{j=626}^{1000} j ] = 695.51,
giving V4(i) = max{i, 695.51}.
So the optimal policy is to take the money if and only if the amount in the box is at least £696.
The complete optimal strategy is for the contestant to accept a box if and only if it contains at least the amount shown:
Box number      1    2    3    4
Amount (£)    696  626  501    0
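The continuation values 500, 625.125 and so on can be generated by a short loop; this sketch follows the recursion Vn(i) = max(i, mean of V_{n-1} over Q):

```python
# Continuation values for the box game: with n boxes to go,
# V_n(i) = max(i, m) where m is the mean of V_{n-1} over 0..1000.
values = range(1001)
V = [float(i) for i in values]      # V_1(i) = i
cont = []
for n in (2, 3, 4):
    m = sum(V) / 1001               # expected reward of rejecting the box
    cont.append(round(m, 3))
    V = [max(i, m) for i in values]
print(cont)
```

The three continuation values are 500.0, 625.125 and roughly 695.51, reproducing the thresholds derived above.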
Appendix 1: Exercises
STAT7003: Exercises 1
1. Solve the following linear programming problems graphically.
(a) [statement garbled in the source; only the right-hand sides 8, 5, 2, 0 survive]
(b)
maximise: z = x1 + 2x2
subject to: 3x1 + 3x2 ≤ 9
            x1 - x2 ≤ 2
            x1 + x2 ≤ 6
            x1 + 3x2 ≤ 6
            x1, x2 ≥ 0
(c)
maximise: z = x1 + x2
subject to: x1 + x2 ≤ 3
            x1 ≤ 2
            x1, x2 ≥ 0
(d)
maximise: z = 3x1 - x2
subject to: x1 - 2x2 ≤ 4
            x1 + x2 ≤ 8
            4x1 + 2x2 ≤ 20
            x2 ≤ 4
            x1 ≥ 4
            x1 ≤ 8
            x1, x2 ≥ 0
2. A company wants to purchase at most 1800 units of a product. There are two types of the product available: product 1 and product 2. Product 1 occupies 2 ft³, costs £12 and the company makes a profit of £3. Product 2 occupies 3 ft³, costs £15 and the company makes a profit of £4. The company wants to maximise its profit. If the budget is £15 000 and the warehouse has 3 000 ft³ for the product,
a) set up the problem as a linear programming problem and
b) solve the problem graphically.
3. The BW Dog Food manufacturer has four production plants, and ships cases of dog food from these plants to 10 warehouses. The shipping costs per week for each plant/warehouse combination are given in the matrix below. The number of cases required by each warehouse and the capacity of each plant (in number of cases) are also given.

                           Warehouse
 Plant     1   2   3   4   5   6   7   8   9  10   Capacity
   1       5   4   8   4   9   2   9   4   8   6      10
   2       6   8   3   3   1   6   8   5   4   3      14
   3       8   3   7   5   4   4   2   3   4   5      16
   4       7   9   6   9   3   8   8   3   1   6      12

 Minimum number of cases required: [row garbled in the source]
a) Formulate this as a linear programming problem, for minimising costs. Hint: this problem has forty control variables.
b) Discuss how the model would change if in addition we had to consider the production cost per case, which is £12, £15, £10 and £16 for plants 1, 2, 3 and 4 respectively.
4. a) Show graphically that the following linear programming problem has an unbounded feasible region.
maximise: z = (1/2) x1 + x2
subject to: x1 + x2 ≥ 6
            3x1 - 2x2 ≤ 3
            x1 - 8x2 ≤ 8
            x1, x2 ≥ 0
b) Is there an optimal solution to this problem? If so, what is it?
c) Would there be an optimal solution to this problem if it were a minimisation problem instead (again with z = (1/2) x1 + x2)? If so, what is it?
STAT7003: Exercises 2
1. For the problem in Exercises 1 Question 2, obtain the shadow prices for each constraint, and give an interpretation of these shadow prices.
2. Solve these linear programming problems using the numerical naïve method.
a) [statement garbled in the source]
b)
maximise: z = x1 + 3x2 + x3
subject to: 2x1 - 5x2 + x3 ≤ 3
            x1 + 4x2 ≤ 5
            x1, x2, x3 ≥ 0
3. Solve the following linear programming problem using the simplex method.
maximise: z = x1 + 2x2
subject to: x1 + 2x2 ≤ 4
            2x1 + 5x2 ≤ 11
            x1, x2 ≥ 0
(In all problems that involve using the simplex method, you are expected to include all intermediate tables and to show your working.)
4. A crisp company makes pizza flavoured and chilli flavoured crisps. These crisps go through three main processes: frying, flavouring and packing. Each kilogram of pizza flavoured crisps takes 3 minutes to fry, 5 minutes to flavour and 2 minutes to pack. Each kilogram of chilli flavoured crisps takes 3 minutes to fry, 4 minutes to flavour and 3 minutes to pack. The net profit on each kg of pizza flavoured crisps is £0.12 and is £0.10 for each kg of chilli flavoured crisps. The frying machine is available 4 hrs each day, the packing machine is available 6 hrs each day and the flavouring machine is available 8 hours each day. The manufacturer wants to maximise the daily profit for these two products. Express this as a linear programming problem and solve it using the simplex method.
STAT7003: Exercises 3
1. Convert the following linear programming problem into a maximisation problem in standard form, hence use the simplex algorithm to find the optimal solution. Write down the optimal value of the objective function.
minimise: z = x1 - 3x2 - 2x3
subject to: 3x1 - x2 + 2x3 ≤ 7
            -2x1 + 4x2 ≤ 12
            -4x1 + 3x2 + 8x3 ≤ 10
            x1, x2, x3 ≥ 0
2. Write down the following linear programming problem in canonical form using artificial variables. State clearly which variables are slack, surplus and artificial variables. What is the auxiliary objective function z* in terms of the control variables xi? You do not need to solve this problem.
maximise: z = x1 + 2x2 - x3
subject to: x1 + x2 ≤ 4
            x1 - 2x2 + x3 = 3
            2x1 + 2x2 + x3 ≤ 12
            x1 + x2 - x3 ≥ 6
            x1, x2, x3 ≥ 0
3. Write down the following linear programming problem in canonical form using artificial variables. Solve it using the big-M method.
maximise: z = x1 + 2x2
subject to: x1 + x2 = 10
            x2 ≤ 8
            x1, x2 ≥ 0
4. Write down the following linear programming problem in canonical form using artificial variables. Solve it using the two-phase method.
maximise: z = 4x1 - x2
subject to: x1 + x2 ≤ 4
            x1 ≤ 8
            x1 + x2 ≥ 1
            x1, x2 ≥ 0
STAT7003: Exercises 4
1. Given the following pay-off matrix, find which of A's strategies are dominant, stating which strategies are dominated. Also find which of B's strategies are dominant, stating which strategies are dominated. Write down the sub-pay-off matrix.

               Player B
            B1   B2   B3   B4
 Player A
    A1       7    0    3    2
    A2       5   -1    1    4
    A3       3    4    0    0
    A4       6    3    2    8
2. Find the saddle point in the following pay-off matrix, and hence state the optimal policy for A and B.

               Player B
            B1   B2   B3   B4   B5
 Player A
    A1       4   -2   -6    6   -3
    A2       7    5    3    4   -1
    A3       1   -4    0    5    0
    A4      -6    7    2   -1   -2
    A5       2    6    4    3    0
3. Consider a game between players A and B with the following pay-off matrix.

               Player B
            B1   B2   B3
 Player A
    A1       2    1    6
    A2       0    5    4
    A3       4    0    1

Explain why the value of this game must be in the interval [1, 4].
4. Find an upper and lower bound for the value of the following game between players A and B.

               Player B
            B1   B2   B3   B4
 Player A
    A1       3   -3    5   -6
    A2       2    2   -6    7
    A3      -4    5    0    2
5. Find the saddle point of the following 2 games, and so give the value of the game and the optimal strategies.

(a)
               Player B
            B1   B2
 Player A
    A1       2   -1
    A2       2    4

(b)
               Player B
            B1   B2   B3
 Player A
    A1       8   -5   -4
    A2      -1    4   -3
    A3      -7    7   -5
6. Find the region of values p and q such that the matrix entry (A1, B3) is a saddle point for a game with the following pay-off matrix. Hint: it may help to sketch a graph of q against p, plotting the inequalities that are required to make the saddle point.

               Player B
            B1      B2       B3
 Player A
    A1       3    (q + 1)     p
    A2      2p       0        2
    A3       q       5     (q - 2)
7. Solve the following game between players A and B graphically and find the value of the game.

               Player B
            B1   B2   B3   B4
 Player A
    A1       2    3   -4    6
    A2       3    4    5   -4
8. In the game scissors-paper-stone two players simultaneously choose one of those three objects. If both players make the same choice then it is a draw; otherwise the winner is determined by the following three rules:
scissors cuts paper,
stone blunts scissors, and
paper covers stone.
Draw up a zero-sum game pay-off matrix for this game. What is the value of this game? Without solving explicitly, suggest what the optimal policy is, and explain why.
9. Consider a game between players with the following pay-off matrix.

               Player B
            B1   B2   B3
 Player A
    A1      -2   -2    2
    A2      -1    1    0
    A3       7    2   -2

The final simplex tableau for the associated linear programming problem is:

 Basis   Y1   Y2   Y3     Y4      Y5      Y6    Soln
  Y3      0    0    1    .226   -.036   -.015   .173
  Y2      0    1    0   -.211    .368   -.053   .105
  Y1      1    0    0    .083   -.180    .128   .030
  Z       0    0    0    .098    .150    .060   .308

State the optimal policy for both players (to 2 d.p.) and the value of the game.
STAT7003: Exercises 5
1. A real estate development firm is comparing the following alternative development projects: building and leasing an office park, building an office block for rent, buying and leasing a warehouse, building a shopping centre, and building then selling a block of flats.
The financial payoffs of these projects depend on the interest rate levels over the next five years. Rates could decline, remain stable or increase. The financial returns for the projects (£ millions) are shown in the following payoff table.

                         Interest rates
 Project            Decline   Stable   Increase
 office park           2        6.8      18
 office block          6        7.6       9.4
 warehouse             6.8      5.6       4
 shopping centre       1.8      9.6      15.6
 block of flats       12.8      6         2.4

(a) Check for dominated investments, and exclude them (if any) from the payoff table.
(b) Find the optimal investment solutions based on the following decision principles: maximax, Laplace criterion, and Hurwicz criterion.
(c) Construct the regret matrix, and obtain the minimax regret strategy for the firm.
2. In an election year a UK investor must decide whether to invest capital in local stocks, long-term gilts or an overseas mutual fund. Over the next four years the returns on the investment will depend on which of the three parties wins the election. With the help of a broker, and after reading daily comments in the business section of the newspapers, the investor compiles the following table of predicted annual percentage returns and their dependence on the winning party. The investor wishes to maximise the percentage return on the invested capital.

                       Winning Party
                     P1     P2     P3
 Local stocks       15%     5%    12%
 Long-term gilts     7%    10%     7%
 Mutual fund         5%    11%    10%

(a) Check for dominated investments, and exclude them (if any) from the payoff table.
(b) Find the optimal investment solutions based on the following decision principles: maximax, Laplace criterion, Hurwicz criterion and minimax regret.
(c) The investor estimates that the winning chances of the three parties P1, P2 and P3 are 0.4, 0.5 and 0.1 respectively. Calculate the expectations of investment returns. Find the investment with the largest expected return and the one with the smallest expected variance of return.
3. For the network shown below:
(a) The first forward recursion equation is
f1(S1) = max_{x∈{A}} { r_{x,S1} },   S1 ∈ {B, C, D}.
(b) The first backward recursion equation is
f1(S3) = max_{x∈{K}} { r_{S3,x} },   S3 ∈ {G, H, I, J}.
[Network diagram omitted: nodes A-K with arc lengths.]
4. [question statement lost in extraction; it uses the following expenditure/revenue tables]

          Plant 1            Plant 2            Plant 3
      expend  revenue    expend  revenue    expend  revenue
         0       0          0       0          0       0
         1       3          2       3          1       2
         3       5          3       4          2       5
5. Consider the resource allocation problem. Draw the network associated with tables (a) and (b) below, where ci and Ri denote the expenditure and revenue for the expansion of plant i. Assume that the total available capital is £8 million, and that plant i must be enlarged, unless it specifically has a proposal with zero ci and zero Ri.

(a)
 Proposal   c1  R1    c2  R2    c3  R3
    1        3   5     3   4     0   0
    2        4   6     4   5     2   3
    3                  5   8     3   5
    4                            6   9
(b)
 Proposal   c1  R1    c2  R2     c3  R3     c4  R4
    1        0   0     1  1.5     0   0      0   0
    2        3   5     3   5      1  2.1     2  2.8
    3        4   7     4   6                 3  3.6
[Questions 6-8 are garbled in the source; only the following table survives.]
           Country
            1    2    3
            7    4    6
           10    8    9
           14   11   14
           17   14   15
9. Warehousing Problem
You are given the warehousing problem with T = 4, K = 1500, I1 = 750 and the prices and costs in the following table.

 t    pt   ct
 1    18   20
 2    20   23
 3    22   16
 4    20   20

Again assume that the warehouse should be empty after four months (I5 = 0).
Find the optimal values for Vt and Qt and the maximum future profit πt. Hint: be particularly careful about your solutions for V2 and Q2.
10. The University Shop sells college scarves during the three terms of the academic year. Each term up to 100 scarves can be kept in the shop stores. The shop has to order the number of scarves it wants delivered at the start of the next term during the current term. At the start of term 1 they have 30 scarves in the store room. Assume that all scarves taken from the store room and placed in the shop during the term will be sold.
They want to have an empty store at the end of term 3, and do not need to order any scarves during term 3 as this can be done during the long summer vacation.
The costs and selling prices are given in £s below.
Formulate this as a warehousing problem, using the shop store as the warehouse.
Use this to find the optimal number of scarves to be ordered during each term (for delivery at the start of next term), and the optimal profit that can be expected.

 term   cost price   selling price
  1         7             10
  2         6             10
  3         6              7
STAT7003: Exercises 6
1. The following MDP uses the same notation as given in Chapter 6 of the lecture notes. Assume that there are 2 stages to go, and at both stages the state space is {0, 1}. You are currently in state 1, i.e. X2 = 1. At each stage there are two actions possible, a1 and a2.

P(a1) = ( .8  .2 )     P(a2) = ( .5  .5 )
        ( .1  .9 )             ( .2  .8 )

[The reward specification and the remainder of the question are garbled in the source; the surviving rows are ( 0 1 ) and ( 1 2 ). Question 2 is also missing.]
3. A self-employed person can greatly affect the standing of her company depending on how hard she works. The harder she works the stronger the business becomes and the greater the profits. However, she realises there is a downside to working hard, in that she has less time for her personal life. Also, working hard when the business is very successful is very stressful and could lead to a nervous breakdown.
She asks your advice as to how hard she should work depending on the standing of her company from year to year. After a brief discussion you simplify the problem to have the following format:
The states (standing) of the business are X = 0 bankrupt, X = 1 poor business, X = 2 average business, X = 3 successful business or X = 4 very successful business.
The actions she can take are a = 1 "slack off", a = 2 "work as normal", or a = 3 "work very hard". Each stage is a year, and the reward for that year is related to the value of the company minus the personal cost of working, in the form:
Rn(Xn = i, an = 1) = i^3
Rn(Xn = i, an = 2) = i^3 - 10
Rn(Xn = i, an = 3) = i^3 - 25.
The terminal reward at retirement is the value of the business, namely R0(X0 = i) = i^3.
The transition probabilities depend on the action taken, and are given by the transition matrices:
P(a = 1) =
( 1    0    0    0    0   )
( 0.8  0.1  0.1  0    0   )
( 0.1  0.5  0.3  0.1  0   )
( 0.1  0.3  0.3  0.2  0.1 )
( 0    0.1  0.3  0.4  0.2 )

P(a = 2) =
( 1    0    0    0    0   )
( 0.1  0.7  0.2  0    0   )
( 0.1  0.1  0.6  0.2  0   )
( 0    0.1  0.1  0.6  0.2 )
( 0    0    0.1  0.4  0.5 )

P(a = 3) =
( 1    0    0    0    0   )
( 0    0.3  0.4  0.3  0   )
( 0    0    0.3  0.4  0.3 )
( 0.1  0    0.1  0.3  0.5 )
( 0.3  0    0    0.1  0.6 )
(a) Explain in words why the top row of each transition matrix is (1, 0, 0, 0, 0).
(b) Write out the table to obtain the optimal expected reward and the optimal actions, with two and one stages to go, for each of the five states.
(c) Give an informal explanation as to why the optimal action with two stages to go, starting in state 1, is to take action a = 1, to slack off.
Do you think that starting in state one with many stages to go would give a different optimal action? Why?
4. A company has asked an employment agency to send interviewees for a vacant position. The rewards for four different standards of applicant are:
£100 for an Excellent candidate,
£60 for Good,
£30 for Poor and
£0 for a Hopeless candidate.
The employment agency is not a very good one, and so the standard of the candidate they send is equally likely to be Excellent, Good, Poor or Hopeless. The cost of continuing and having a further interview is £20. Assume that the agency has a fixed maximum number of applicants they will send.
(a) Formulate this as an optimal stopping MDP. Write down the formula for Vn(i), the optimal reward given that the candidate sent with n stages to go is of standard i.
(b) Solve this problem by writing down the optimal values in tabular format in order to find the optimal policy with 3 stages to go.
(c) Can you suggest what the optimal strategy would be if the agency will continue to send candidates indefinitely (until one is employed by the company)?
5. Recompute the optimal policy of Example 6.2 (page 88) but now using a discount factor:
(a) of α = 0.9 and
(b) of α = 0.4.
Compare these solutions to the non-discounted solution.
6. An infinite horizon MDP has the following form. There are 3 states 0, 1 and 2, and two actions a = 1 and a = 2. Given action a = 1, the transitions to each state are equally likely and the reward is always 2. Given action a = 2, the transition probabilities are 0.9 that the state remains the same and 0.05 that it moves to one of the other two states. The rewards are
rij(a = 2) =  0   for j = 0,
              1   for j = 1,
             10   for j = 2.
There is a discount factor of α.
(a) Write down the transition matrices for both actions, and find R(i, a) for all values of i and a.
(b) Write down an expression for the optimal value function V(i).
(c) Let α = 3/4 and V0(i) = 0 for all i. Using
Un(i, a) = R(i, a) + α Σ_{j=0}^{2} p_{ij}(a) V_{n-1}(j),
find U1(i, a) for all values of i and a. Use this to find the first iteration of the value function V1(i) for all values of i.
(d) Repeat step (c) for n = 2 to find U2(i, a) and V2(i).
* B. Kolman & R. E. Beck, Elementary Linear Programming with Applications (1980, Academic Press).
G. Gordon & I. Pressman, Quantitative Decision-Making for Business (1978, Prentice Hall).
* J. Bather, Decision Theory: An Introduction to Dynamic Programming and Sequential Decisions (2000, Wiley).