Operations Research 2
Dynamic Programming
Dynamic programming is a technique for
solving certain types of optimization
problems
The idea is to break up a large, complex
problem into many smaller, much easier ones
Usually, this technique can be applied to
problems in which a sequence of decisions
over time needs to be made to optimize
some criterion
Dynamic Programming
In many cases, solving a problem by
dynamic programming means
formulating this problem as a shortest path
problem in an acyclic network
The art of dynamic programming lies in
how to construct this network!
Example:
Travel from coast to coast
You currently live in NYC (1), but plan to move
to LA (10)
You will drive
To save money, you will spend each night of your
trip at a friend’s house
You structure your potential stopovers as follows:
In 1 day you can reach Columbus (2), Nashville (3), or
Louisville (4)
On the 2nd day, you can reach Kansas City (5), Omaha (6),
or Dallas (7)
On the 3rd day, you can reach San Antonio (8) or Denver (9)
On the 4th day, you can reach LA
To minimize your gas expenses, you are looking for
the route of minimum length
Example:
Travel from coast to coast
We can classify the cities as follows:
Call all cities that you can be in at the
beginning of your nth day the Stage n cities
The idea of solving this problem by
dynamic programming is to start by
solving easy problems that will eventually
help you solve the entire problem
In particular, we will work backward
Example:
Travel from coast to coast
Denote the distance between city i and
city j by ci,j
If city i is a stage t city, we denote the
length of the shortest path from city i to
LA by ft(i)
Clearly, we would like to find f1(1), since NYC (city 1) is the only stage 1 city
Example:
Travel from coast to coast
First, find the shortest path to LA from
each of the cities from which you can
reach LA in 1 day – the stage 4 cities
Note that these problems are trivial, since in
each case there’s only 1 way to go to LA
More formally,
f4(8) = c8,10
f4(9) = c9,10
Example:
Travel from coast to coast
Then, find the shortest path to LA from
each of the stage 3 cities
Note that this means that you should first go
to a stage 4 city, and then use the shortest
path from this stage 4 city to LA
These problems are not as trivial as the first
ones, but by simply looking at all possible
stage 4 cities and the solutions to the first set
of problems this remains relatively easy
Example:
Travel from coast to coast
From each stage 3 city
go to a stage 4 city, and then use the
shortest path from this stage 4 city to LA
So, for example, f3(5) is equal to
c5,8 + f4(8), or
c5,9 + f4(9)
Since we’re interested in the shortest
path, we have
f3(5) = min{c5,8 + f4(8) , c5,9 + f4(9)}
Example:
Travel from coast to coast
Perform the same procedure for the stage
2 cities
Perform the same procedure for the stage
1 city, NYC
From NYC you should first go to a stage 2
city, and then use the shortest path from this
stage 2 city to LA
We can find the best route from NYC to LA
by considering all possible stage 2 cities
Example:
Travel from coast to coast
In general, in stage t we are interested in
finding ft(i) for all stage t cities i
Using the earlier approach, we can write
$$f_t(i) = \min_{j \,:\, j \text{ is a stage } t+1 \text{ city}} \{ c_{i,j} + f_{t+1}(j) \}$$
for all stage t cities i
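The backward recursion above can be sketched in a few lines of Python. The slides do not list the distances c(i,j), so the `dist` table below holds illustrative placeholder values only; the names `dist`, `stages`, and `shortest_route` are likewise assumptions made for this sketch. The base case is written as f5(10) = 0, which is equivalent to f4(i) = c(i,10).

```python
# Placeholder distances between cities (NOT from the slides).
# City numbering follows the example: 1 = NYC, ..., 10 = LA.
dist = {
    (1, 2): 550, (1, 3): 900, (1, 4): 770,
    (2, 5): 680, (2, 6): 790, (2, 7): 1050,
    (3, 5): 580, (3, 6): 760, (3, 7): 660,
    (4, 5): 510, (4, 6): 700, (4, 7): 830,
    (5, 8): 610, (5, 9): 790,
    (6, 8): 540, (6, 9): 940,
    (7, 8): 790, (7, 9): 270,
    (8, 10): 1030, (9, 10): 1390,
}

# Stage n cities: where you can be at the beginning of day n.
stages = {1: [1], 2: [2, 3, 4], 3: [5, 6, 7], 4: [8, 9], 5: [10]}


def shortest_route(dist, stages):
    """Work backward from the last stage, storing f_t(i) and the best next city."""
    last = max(stages)
    f = {city: 0 for city in stages[last]}  # destination: nothing left to drive
    best_next = {}
    for t in range(last - 1, 0, -1):        # t = 4, 3, 2, 1
        for i in stages[t]:
            # f_t(i) = min over stage t+1 cities j of c(i,j) + f_{t+1}(j)
            candidates = [
                (dist[i, j] + f[j], j) for j in stages[t + 1] if (i, j) in dist
            ]
            f[i], best_next[i] = min(candidates)
    return f, best_next


f, best_next = shortest_route(dist, stages)

# Recover the route by following the stored decisions forward from NYC.
route = [1]
while route[-1] in best_next:
    route.append(best_next[route[-1]])

print(f[1], route)  # 2870 [1, 2, 5, 8, 10] for the placeholder distances
```

Each f_t(i) is computed once and reused by every stage t city that can reach i, which is where the savings over path enumeration come from.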
Computational efficiency of
dynamic programming
In the example, we could simply enumerate all
possible paths from NYC to LA
It is easy to see that there are 3x3x2=18 paths
However, suppose that we have more options:
Starting city is again stage 1
5 cities in each of 5 stages (stages 2,…,6)
Destination city is stage 7
Then there are 5^5 = 3,125 paths
Determining the length of each of these paths takes
a total of 5 x 5^5 = 15,625 additions and 3,124
comparisons
Computational efficiency of
dynamic programming
How much work is the dynamic
programming algorithm?
The stage 6 problems are trivial
Each of the other problems requires
5 additions (one per potential choice of next city to
visit) and 4 comparisons
For a total of 4x5x5 + 5 = 105 additions and
4x5x4 + 4 = 84 comparisons
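The two operation counts can be reproduced with a short calculation. This is only a counting sketch under the setup described above (one start city, k cities in each of m intermediate stages, one destination); the function names are hypothetical.

```python
def enumeration_ops(k, m):
    """Operation counts for listing every path: k cities in each of m inner stages."""
    paths = k ** m
    additions = m * paths       # a path has m + 1 arcs, so m additions per path
    comparisons = paths - 1     # find the minimum of all path lengths
    return paths, additions, comparisons


def dp_ops(k, m):
    """Operation counts for the backward recursion on the same network."""
    # The last inner stage is trivial. Each city in the other m - 1 inner
    # stages, plus the start city, needs k additions and k - 1 comparisons.
    additions = (m - 1) * k * k + k
    comparisons = (m - 1) * k * (k - 1) + (k - 1)
    return additions, comparisons


print(enumeration_ops(5, 5))  # (3125, 15625, 3124)
print(dp_ops(5, 5))           # (105, 84)
```

Enumeration grows exponentially in the number of stages, while the DP count grows only linearly in it.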
Characteristics of dynamic
programming
The problem should have stages
Each stage corresponds to a point at which a
decision needs to be made
Each stage should have a number of
associated states
The state contains all information that is
needed to make an optimal decision for the
remaining problem
Characteristics of dynamic
programming
The decision chosen at each stage
describes how the state at the current
stage is transformed into the state at the
next stage
The optimal decision at the current state
should not depend on previously visited
states or previous decisions
This is called the principle of optimality
Characteristics of dynamic
programming
There must be a recursion that relates
the cost or reward for stages t, t+1, …, T
to the cost or reward for stages t+1, t+2,
…, T
This recursion formalizes the procedure of
working backwards from the last stage to the
first stage
Dynamic programming
formulation
Stages: t =1,…,5
States: city
Decision in each stage:
Choose the stage t+1 city to go to
Dynamic programming recursion:
$$f_4(i) = c_{i,10} \quad \text{for all stage 4 cities } i$$
$$f_t(i) = \min_{j \,:\, j \text{ is a stage } t+1 \text{ city}} \{ c_{i,j} + f_{t+1}(j) \}$$
for all stage t cities i
Dynamic programming
without stages
You must drive from Bloomington to
Cleveland
You are interested in the route that takes
the least amount of time
Dynamic programming
without stages
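[Figure: road network from Bloomington to Cleveland, with driving times on the arcs. Recoverable labels: Bloomington, Cincinnati; 1 hour, 2 hours, 2.5 hours, 3 hours]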
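When the network has no natural stages, the same idea still applies: define f(i) as the shortest time from city i to the destination and let f(i) = min over successors j of { time(i,j) + f(j) }. The sketch below assumes an acyclic network. The original figure is not recoverable, so the arcs, the travel times, and the intermediate node names "A" and "B" are hypothetical placeholders.

```python
from functools import lru_cache

# Hypothetical travel times in hours (NOT the network from the slide).
times = {
    "Bloomington": {"A": 1.0, "Cincinnati": 3.0},
    "A": {"Cincinnati": 2.5, "B": 2.0},
    "Cincinnati": {"B": 1.0, "Cleveland": 4.0},
    "B": {"Cleveland": 2.0},
    "Cleveland": {},
}


@lru_cache(maxsize=None)
def f(city):
    """Shortest travel time from `city` to Cleveland (network must be acyclic)."""
    if city == "Cleveland":
        return 0.0
    # f(i) = min over successors j of time(i, j) + f(j)
    return min(t + f(nxt) for nxt, t in times[city].items())


print(f("Bloomington"))  # 5.0 for the placeholder network
```

Memoization plays the role that stages played before: every f(i) is solved once, in whatever order the recursion reaches it.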
Production & inventory
planning
Consider the following production & inventory
planning problem for a single item:
Consider a planning period of T periods, and assume
that
the demand for the item in each of the periods is known
the initial inventory level is known
At the start of each period, you must decide how
many units to produce; production capacity is limited
Each period’s demand must be met on time
There is a limited amount of storage space available
The goal is to minimize the total production &
inventory costs over the planning horizon
Production & inventory
planning
This is a periodic review model
Denote the demand in period t by dt
(t =1,…,T )
Denote the cost of producing x units in
period t by ct(x) (often, this function is
independent of t, i.e., ct(x)=c(x) )
If at the end of period t the inventory
level is I, a cost of ht(I) is charged (often,
these costs are independent of t, i.e.,
ht(I)=h(I) )
Production & inventory
planning
If the production and inventory holding
cost functions are linear, we can
formulate this problem as an LP problem
(how?)
Often, the production costs are assumed
to have a fixed-charge structure:
c(x) = 0 if x = 0, c(x) = a + bx if x > 0
In that case, we can formulate this problem
as a mixed-integer LP problem (how?)
Production & inventory
planning
More generally, we can formulate this
problem as an NLP problem (how?)
Production (and inventory) costs are
often assumed to be concave – reflecting
economies of scale
What does that mean for the ease of
solvability of the NLP problem?
Production & inventory
planning
NLP formulation:
$$\min \sum_{t=1}^{T} c_t(x_t) + \sum_{t=1}^{T} h_t(I_t)$$
subject to
$$I_{t-1} + x_t - d_t = I_t, \quad t = 1,\dots,T$$
$$0 \le I_t \le B, \quad t = 1,\dots,T$$
$$0 \le x_t \le C, \quad t = 1,\dots,T$$
Production & inventory
planning
Dynamic programming provides a solution
methodology that can be applied for
general cost functions
We only need to assume that the units of
demand, production, and inventory are
integers – which is not unrealistic in many
practical situations
This methodology will be efficient if the
magnitude of the numbers involved is not too
large
Production & inventory
planning
We must identify:
Stages
time: t = 1,...,T
States
(starting) inventory level: I = 0,...,B
Decisions
production quantity: x = 0,...,C
Recursion
minimal cost from start of stage t: ft(I)
Clearly, we are looking for f1(I0)
Production & inventory
planning
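Recursion:
Cost at the beginning of stage T:
$$f_T(I) = \min_{0 \le x \le C} \{ c_T(x) + h_T(I + x - d_T) \}$$
Cost at the beginning of stage t < T:
$$f_t(I) = \min_{\max(0,\, d_t - I) \,\le\, x \,\le\, \min(C,\, d_t + B - I)} \{ c_t(x) + h_t(I + x - d_t) + f_{t+1}(I + x - d_t) \}$$
Example for stage 3, with d_3 = 2, C = 5, and B = 4:
$$f_3(I) = \min_{\max(0,\, 2 - I) \,\le\, x \,\le\, \min(5,\, 2 + 4 - I)} \{ c_3(x) + h_3(I + x - 2) + f_4(I + x - 2) \}$$
$$f_3(0) = \min_{\max(0,\, 2 - 0) \,\le\, x \,\le\, \min(5,\, 2 + 4 - 0)} \{ c_3(x) + h_3(x - 2) + f_4(x - 2) \}$$
$$f_3(2) = \min \{ f_4(0),\ \min_{1 \le x \le 4} \{ 3 + 1.5x + f_4(x) \} \}$$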
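The recursion can be sketched as a memoized function of (t, I). The slides do not state the full data set, so everything below is an illustrative assumption: the demands, the capacity C = 5, the storage limit B = 4, a fixed-charge production cost c(x) = 3 + x for x > 0, and a holding cost of 0.5 per unit of ending inventory. For simplicity the sketch applies the same feasibility bounds on x at every stage, including stage T.

```python
from functools import lru_cache

# Illustrative data (assumptions, not taken from the slides).
demand = [1, 3, 2, 4]   # d_1, ..., d_T
C, B = 5, 4             # production capacity and storage limit


def c(x):
    """Assumed fixed-charge production cost."""
    return 0 if x == 0 else 3 + x


def h(inventory):
    """Assumed linear holding cost on ending inventory."""
    return 0.5 * inventory


def plan(demand, C, B, I0):
    """Return (minimal cost, production schedule) starting from inventory I0."""
    T = len(demand)

    @lru_cache(maxsize=None)
    def f(t, I):
        if t > T:                       # past the horizon: no further cost
            return 0.0, ()
        d = demand[t - 1]
        best = (float("inf"), ())
        # Demand must be met (x >= d - I); storage must fit (I + x - d <= B).
        for x in range(max(0, d - I), min(C, d + B - I) + 1):
            end = I + x - d
            tail_cost, tail_plan = f(t + 1, end)
            cost = c(x) + h(end) + tail_cost
            if cost < best[0]:
                best = (cost, (x,) + tail_plan)
        return best

    return f(1, I0)


cost, schedule = plan(demand, C, B, 0)
print(cost, schedule)  # 20.0 (1, 5, 0, 4) for the assumed data
```

The number of (t, I) states is T x (B + 1) and each state tries at most C + 1 decisions, which is why the method stays efficient as long as B and C are not too large.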
Production & inventory
planning
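I = 2:
$$f_3(2) = \min \begin{cases}
x = 0: & f_4(0) = 6 \\
x = 1: & 3 + 1.5 \cdot 1 + f_4(1) = 4.5 + 5 = 9.5 \\
x = 2: & 3 + 1.5 \cdot 2 + f_4(2) = 6 + 4 = 10 \\
x = 3: & 3 + 1.5 \cdot 3 + f_4(3) = 7.5 + 0 = 7.5 \\
x = 4: & 3 + 1.5 \cdot 4 + f_4(4) = 9 + 0 = 9
\end{cases} = 6 \quad (x = 0)$$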
Production & inventory
planning
Network representation:
Nodes: stage/state combinations (t,I)
Arcs: decisions x
Arc from node (t,I) corresponding to decision
x leads to node (t+1,I+x-dt)
Cost of this arc is ct(x) + ht(I+x-dt)
Resource allocation:
the knapsack problem (1)
Stockco is considering 4 investments
Investment 1 will yield an NPV of $16K, but
requires a cash outflow of $5K
Investment 2 will yield an NPV of $22K, but
requires a cash outflow of $7K
Investment 3 will yield an NPV of $12K, but
requires a cash outflow of $4K
Investment 4 will yield an NPV of $8K, but
requires a cash outflow of $3K
You have a budget of $14K
Resource allocation:
the knapsack problem (1)
IP formulation:
$$\max \sum_{i=1}^{4} NPV_i \, x_i = 16x_1 + 22x_2 + 12x_3 + 8x_4$$
subject to
$$\sum_{i=1}^{4} C_i \, x_i = 5x_1 + 7x_2 + 4x_3 + 3x_4 \le 14$$
$$x_i \in \{0,1\}, \quad i = 1,\dots,4$$
Resource allocation:
the knapsack problem (2)
You are planning an overnight hike, and are
considering taking 4 items along on your trip
Item 1 yields a “benefit” of 16, but weighs 5 lbs
Item 2 yields a “benefit” of 22, but weighs 7 lbs
Item 3 yields a “benefit” of 12, but weighs 4 lbs
Item 4 yields a “benefit” of 8, but weighs 3 lbs
You do not want to carry more than 14 lbs
You want to maximize your “benefit”
Mathematically, this is the same problem as the
investment problem!
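Because the two problems share the same data, one DP solves both. The sketch below treats the items (or investments) as stages and the remaining capacity (or budget) as the state. The function name `knapsack` and the table layout are choices made for this illustration, not something the slides prescribe.

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by backward recursion over items.

    f[i][y] is the best total value obtainable from items i, ..., n-1
    (0-based) when y units of capacity remain.
    """
    n = len(values)
    f = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for y in range(capacity + 1):
            f[i][y] = f[i + 1][y]                       # skip item i
            if weights[i] <= y:                         # take item i if it fits
                f[i][y] = max(f[i][y], values[i] + f[i + 1][y - weights[i]])

    # Walk forward through the table to recover which items were taken.
    chosen, y = [], capacity
    for i in range(n):
        if f[i][y] != f[i + 1][y]:
            chosen.append(i + 1)                        # report 1-based numbers
            y -= weights[i]
    return f[0][capacity], chosen


best, chosen = knapsack([16, 22, 12, 8], [5, 7, 4, 3], 14)
print(best, chosen)  # 42 [2, 3, 4]
```

For the data above, taking items 2, 3, and 4 uses the full 14 units and yields 42, which beats items 1 and 2 together (38).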
Resource allocation:
more general
Stockco is considering n investments
Investment i will yield an NPV of ri(di) when
di × $1,000 is invested (i = 1,…,n)
You only want to (or can) invest in integer multiples of
$1,000
You have a budget of B × $1,000
Example
n = 3, B = 6
r1(d1) = 7d1+2 (d1>0), r1(0) = 0
r2(d2) = 3d2+7 (d2>0), r2(0) = 0
r3(d3) = 4d3+5 (d3>0), r3(0) = 0
Resource allocation:
more general
NLP formulation:
$$\max \sum_{i=1}^{n} r_i(d_i)$$
subject to
$$\sum_{i=1}^{n} d_i \le B$$
$$d_i \in \{0,1,\dots\}, \quad i = 1,\dots,n$$
Resource allocation
To formulate this problem as a DP problem, we
must identify:
Stages
investment categories: i = 1,2,3
States
budget available: y = 0,1,...,6
Decisions
investment amount: d = 0,1,...,6
Recursion
maximal return from inv. categories i,…,3: fi(y)
Clearly, we are looking for f1(6)
Resource allocation
Recursion:
Return from investment in category 3 only:
$$f_3(y) = \max_{0 \le d \le y} r_3(d)$$
Note that you will always invest all remaining
budget in category 3 at this stage, i.e., d = y
$$f_3(y) = r_3(y) = \begin{cases} 0 & \text{if } y = 0 \\ 4y + 5 & \text{if } y = 1,\dots,6 \end{cases}$$
Resource allocation
Recursion:
Return from investment in categories 2 and 3:
$$f_2(y) = \max_{0 \le d \le y} \{ r_2(d) + f_3(y - d) \}$$
These subproblems are a little harder…
y=0: f2(0) = 0
y=1: f2(1) =max(r2(0)+f3(1),r2(1)+f3(0))
=max(0+9,10+0) = 10
y=2: f2(2) =
max(r2(0)+f3(2),r2(1)+f3(1),r2(2)+f3(0))
=max(0+13,10+9,13+0) = 19
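The remaining values of f2 and the final answer f1(6) follow from the same recursion. The sketch below uses the return functions of the example (n = 3, B = 6); the helper names `r` and `allocate` and the convention f4(y) = 0 are assumptions made for the illustration.

```python
def r(i, d):
    """Return functions of the example: r_i(d) = a*d + b for d > 0, r_i(0) = 0."""
    a, b = {1: (7, 2), 2: (3, 7), 3: (4, 5)}[i]
    return 0 if d == 0 else a * d + b


def allocate(n, B):
    """Backward recursion f_i(y) = max over 0 <= d <= y of r_i(d) + f_{i+1}(y - d)."""
    f = {(n + 1, y): 0 for y in range(B + 1)}   # nothing left to invest in
    best_d = {}
    for i in range(n, 0, -1):
        for y in range(B + 1):
            f[i, y], best_d[i, y] = max(
                (r(i, d) + f[i + 1, y - d], d) for d in range(y + 1)
            )
    return f, best_d


f, best_d = allocate(3, 6)

# Follow the stored decisions forward to recover the allocation.
y, allocation = 6, []
for i in (1, 2, 3):
    allocation.append(best_d[i, y])
    y -= best_d[i, y]

print(f[2, 1], f[2, 2])   # 10 19, matching the hand calculation above
print(f[1, 6], allocation)  # 49 [4, 1, 1]
```

So $4,000 goes to investment 1 and $1,000 each to investments 2 and 3, for a total return of 30 + 10 + 9 = 49.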
Resource allocation
Network representation:
Nodes: stage/state combinations (i,y)
Arcs: decisions d
Arc from node (i,y) corresponding to decision
d leads to node (i+1,y-d)
Return of this arc is ri(d)
Resource allocation:
even more general
NLP formulation:
$$\max \sum_{i=1}^{n} r_i(d_i)$$
subject to
$$\sum_{i=1}^{n} g_i(d_i) \le B$$
$$d_i \in \{0,1,\dots,U_i\}, \quad i = 1,\dots,n$$
Find the DP formulation for this general
case
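One possible formulation, offered as a sketch rather than the intended answer: keep the investment categories as stages and the remaining resource y as the state, and let a decision d consume g_i(d) units of the resource. This assumes that every g_i(d) is a nonnegative integer with g_i(0) = 0, so that the state stays in {0,...,B} and doing nothing is always feasible.

```latex
f_{n+1}(y) = 0, \qquad y = 0,1,\dots,B

f_i(y) = \max_{\substack{d \in \{0,1,\dots,U_i\} \\ g_i(d) \le y}}
         \bigl\{ r_i(d) + f_{i+1}\bigl(y - g_i(d)\bigr) \bigr\},
\qquad i = n, n-1, \dots, 1

\text{Optimal value: } f_1(B)
```

With g_i(d) = d and no upper bound U_i this reduces to the recursion of the previous example, and with U_i = 1 and g_i(1) equal to the weight of item i it reduces to the knapsack problem.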
Equipment replacement
problem
A company faces the problem of how long a
machine should be utilized before it should be
traded in for a new one
Example
A new machine costs p=$1,000, and has a useful
lifetime of 3 years
Maintaining a machine during its first 3 years costs
m1=$60, m2=$80, m3=$120, respectively
If a machine is traded in, a salvage value is
obtained: s1=$800, s2=$600, and s3=$500,
respectively, after the first 3 years
Equipment replacement
problem
We currently have a y year old machine
Find a policy that minimizes total net costs
over the next 5 years
Equipment replacement
problem
To formulate this problem as a DP problem, we
must identify:
Stages
time: t = 0,1,...,5
States
age of machine: y = 0,1,2,3
Decisions
keep or trade in: d = 0,1
Recursion
minimal net cost after period t: ft(y)
Clearly, we are looking for f0(y)
Equipment replacement
problem
Recursion:
Note that you will always salvage the
machine at the end of year 5:
Net cost after period 5:
$$f_5(y) = -s_y, \quad y = 1,2,3$$
Equipment replacement
problem
Recursion:
At the end of period t < 5, you must decide
whether to keep or trade-in the machine
If y=3, you must trade it in
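The slides break off before stating the recursion for t < 5, so the sketch below fills it in under explicit assumptions: keeping a y-year-old machine costs its next year of maintenance, m_{y+1}, and leads to age y+1; trading in yields s_y, costs p for the new machine plus m_1 for its first year, and leads to age 1; a machine of age 0 (only possible at t = 0) has no trade-in option. These timing conventions are assumptions of this illustration, not something the source confirms.

```python
from functools import lru_cache

P = 1000                              # price of a new machine
M = {1: 60, 2: 80, 3: 120}            # maintenance cost in year 1, 2, 3 of life
S = {1: 800, 2: 600, 3: 500}          # salvage value after 1, 2, 3 years
HORIZON, MAX_AGE = 5, 3


@lru_cache(maxsize=None)
def f(t, y):
    """Minimal net cost from time t to the horizon with a y-year-old machine."""
    if t == HORIZON:
        return -S[y]                  # always salvage at the end of year 5
    options = []
    if y < MAX_AGE:                   # keep: pay next year's maintenance
        options.append(M[y + 1] + f(t + 1, y + 1))
    if y >= 1:                        # trade in: salvage, buy new, maintain it
        options.append(-S[y] + P + M[1] + f(t + 1, 1))
    return min(options)


for age in range(MAX_AGE + 1):
    print(age, f(0, age))
# 0 280
# 1 460
# 2 640
# 3 780
```

Under these assumptions, starting with a brand-new machine gives a minimal net cost of 280 over the 5 years, excluding the purchase price of that first machine.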