ESI 4313 Operations Research 2: Dynamic Programming

Dynamic Programming
 Dynamic programming is a technique for
solving certain types of optimization
problems
 The idea is to break up a large, complex
problem into many smaller, much easier ones
 Usually, this technique can be applied to
problems in which a sequence of decisions
over time needs to be made to optimize
some criterion
Dynamic Programming
 In many cases, solving a problem by
dynamic programming means
 formulating this problem as a shortest path
problem in an acyclic network
 The art of dynamic programming lies in
how to construct this network!
Example:
Travel from coast to coast
 You currently live in NYC (1), but plan to move
to LA (10)
 You will drive
 To save money, you will spend each night of your
trip at a friend’s house
 You structure your potential stopovers as follows:
 In 1 day you can reach Columbus (2), Nashville (3), or
Louisville (4)
 On the 2nd day, you can reach Kansas City (5), Omaha (6),
or Dallas (7)
 On the 3rd day, you can reach San Antonio (8) or Denver (9)
 On the 4th day, you can reach LA
 To minimize your gas expenses, you are looking for
the route of minimum length
Example:
 We can classify the cities as follows:
 Call all cities that you can be in at the
beginning of your nth day the Stage n cities
 The idea of solving this problem by
dynamic programming is to start by
solving easy problems that will eventually
help you solve the entire problem
 In particular, we will work backward
Example:
 Denote the distance between city i and
city j by $c_{i,j}$
 If city i is a stage t city, we denote the
length of the shortest path from city i to
LA by $f_t(i)$
 Clearly, we would like to find $f_1(1)$, since
NYC (city 1) is the only stage 1 city
Example:
 First, find the shortest path to LA from
each of the cities from which you can
reach LA in 1 day – the stage 4 cities
 Note that these problems are trivial, since in
each case there’s only 1 way to go to LA
 More formally,
 $f_4(8) = c_{8,10}$
 $f_4(9) = c_{9,10}$
Example:
 Then, find the shortest path to LA from
each of the stage 3 cities
 Note that this means that you should first go
to a stage 4 city, and then use the shortest
path from this stage 4 city to LA
 These problems are not as trivial as the first
ones, but by simply looking at all possible stage
4 cities and the solutions to the first set
of problems this remains relatively easy
Example:
 From each stage 3 city
 go to a stage 4 city, and then use the
shortest path from this stage 4 city to LA
 So, for example, $f_3(5)$ is equal to
 $c_{5,8} + f_4(8)$, or
 $c_{5,9} + f_4(9)$
 Since we’re interested in the shortest
path, we have
 $f_3(5) = \min\{c_{5,8} + f_4(8),\ c_{5,9} + f_4(9)\}$
Example:
 Perform the same procedure for the stage
2 cities
 Perform the same procedure for the stage
1 city, NYC
 From NYC you should first go to a stage 2
city, and then use the shortest path from this
stage 2 city to LA
 We can find the best route from NYC to LA
by considering all possible stage 2 cities
Example:
 In general, in stage t we are interested in
finding $f_t(i)$ for all stage t cities i
 Using the earlier approach, we can write
 $f_t(i) = \min_{j:\ j \text{ is a stage } t+1 \text{ city}} \{c_{i,j} + f_{t+1}(j)\}$
for all stage t cities i
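The backward recursion above can be sketched in a few lines of Python. The city numbers and stages follow the example, but the distances in `c` are illustrative placeholders (an assumption, since the slides' distance figure is not reproduced here), and the helper names `solve` and `route` are hypothetical.

```python
from typing import Dict, List, Tuple

# stages[t] lists the cities you can be in at the beginning of day t;
# stage 5 holds only the destination, LA (city 10).
stages: Dict[int, List[int]] = {1: [1], 2: [2, 3, 4], 3: [5, 6, 7], 4: [8, 9], 5: [10]}

# c[i, j] = distance from city i to city j.
# ASSUMPTION: illustrative values only, not taken from the slides.
c: Dict[Tuple[int, int], int] = {
    (1, 2): 550, (1, 3): 900, (1, 4): 770,
    (2, 5): 680, (2, 6): 790, (2, 7): 1050,
    (3, 5): 580, (3, 6): 760, (3, 7): 660,
    (4, 5): 510, (4, 6): 700, (4, 7): 830,
    (5, 8): 610, (5, 9): 790,
    (6, 8): 540, (6, 9): 940,
    (7, 8): 790, (7, 9): 270,
    (8, 10): 1030, (9, 10): 1390,
}


def solve(stages, c):
    """Work backward from the last stage; return f (city -> shortest distance to
    the destination) and best_next (city -> best next city)."""
    last = max(stages)
    f = {city: 0 for city in stages[last]}  # nothing left to drive at the destination
    best_next = {}
    for t in range(last - 1, 0, -1):
        for i in stages[t]:
            # f_t(i) = min over stage t+1 cities j of c[i, j] + f_{t+1}(j)
            j_star = min(stages[t + 1], key=lambda j: c[i, j] + f[j])
            f[i] = c[i, j_star] + f[j_star]
            best_next[i] = j_star
    return f, best_next


def route(start, best_next):
    """Follow the stored best decisions from `start` to the destination."""
    path = [start]
    while path[-1] in best_next:
        path.append(best_next[path[-1]])
    return path


f, best_next = solve(stages, c)
print(f[1], route(1, best_next))  # with the placeholder distances: 2870 [1, 2, 5, 8, 10]
```

Because every city belongs to exactly one stage, `f` can be keyed by city alone; the stage index only fixes the order in which the subproblems are solved.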
Computational efficiency of
dynamic programming
 In the example, we could simply enumerate all
possible paths from NYC to LA
 It is easy to see that there are 3 × 3 × 2 = 18 paths
 However, suppose that we have more options:
 Starting city is again stage 1
 5 cities in each of 5 stages (stages 2,…,6)
 Destination city is stage 7
 Then there are $5^5$ = 3,125 paths
 Determining the length of each of these paths takes
a total of 5 × $5^5$ = 15,625 additions and 3,124
comparisons
Computational efficiency of
dynamic programming
 How much work is the dynamic
programming algorithm?
 The stage 6 problems are trivial
 Each of the other problems requires
 5 additions (potential choices for next city to
visit) and 4 comparisons
 For a total of 4 × 5 × 5 + 5 = 105 additions and
4 × 5 × 4 + 4 = 84 comparisons
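As a quick check of these counts, the sketch below recomputes them for a network with one origin, k intermediate stages of m cities each, and one destination. The function names `enumeration_ops` and `dp_ops` are hypothetical, and the counting conventions (one addition per arc after the first on a path, one comparison per discarded candidate) are assumptions chosen to match the numbers on the slides.

```python
def enumeration_ops(k: int, m: int):
    """Work needed to enumerate every path: (paths, additions, comparisons)."""
    paths = m ** k
    additions = k * paths    # each path has k + 1 arcs, so k additions per path
    comparisons = paths - 1  # keep a running minimum over all paths
    return paths, additions, comparisons


def dp_ops(k: int, m: int):
    """Work needed by the backward recursion: (additions, comparisons)."""
    # Cities in the last intermediate stage are trivial (one arc to the destination).
    # The other (k - 1) * m intermediate cities and the origin each look at m successors.
    subproblems = (k - 1) * m + 1
    return subproblems * m, subproblems * (m - 1)


print(enumeration_ops(5, 5))  # (3125, 15625, 3124)
print(dp_ops(5, 5))           # (105, 84)
```

The enumeration effort grows exponentially in the number of stages, while the dynamic programming effort grows only linearly in it.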
Characteristics of dynamic
programming
 The problem should have stages
 Each stage corresponds to a point at which a
decision needs to be made
 Each stage should have a number of
associated states
 The state contains all information that is
needed to make an optimal decision for the
remaining problem
Characteristics of dynamic
programming
 The decision chosen at each stage
describes how the state at the current
stage is transformed into the state at the
next stage
 The optimal decision at the current state
should not depend on previously visited
states or previous decisions
 This is called the principle of optimality
Characteristics of dynamic
programming
 There must be a recursion that relates
the cost or reward for stages t, t+1, …, T
to the cost or reward for stages t+1, t+2,
…, T
 This recursion formalizes the procedure of
working backwards from the last stage to the
first stage
Dynamic programming
formulation
 Stages: t =1,…,5
 States: city
 Decision in each stage:
 Choose the stage t+1 city to go to
 Dynamic programming recursion:
 $f_4(i) = c_{i,10}$ for all stage 4 cities i
 $f_t(i) = \min_{j:\ j \text{ is a stage } t+1 \text{ city}} \{c_{i,j} + f_{t+1}(j)\}$
for all stage t cities i
Dynamic programming
without stages
 You must drive from Bloomington to
Cleveland
 You are interested in the route that takes
the least amount of time
Dynamic programming
without stages
[Figure: road network from Bloomington to Cleveland through Indianapolis, Cincinnati, Gary, Dayton, Columbus, and Toledo; each arc is labeled with a driving time between 1 and 3 hours]
Production & inventory
planning
 Consider the following production & inventory
planning problem for a single item:
 Consider a planning horizon of T periods, and assume
that
 the demand for the item in each of the periods is known
 the initial inventory level is known
 At the start of each period, you must decide how
many units to produce; production capacity is limited
 Each period’s demand must be met on time
 There is a limited amount of storage space available
 The goal is to minimize the total production &
inventory costs over the planning horizon
Production & inventory
planning
 This is a periodic review model
 Denote the demand in period t by $d_t$
(t = 1,…,T)
 Denote the cost of producing x units in
period t by $c_t(x)$ (often, this function is
independent of t, i.e., $c_t(x) = c(x)$)
 If at the end of period t the inventory
level is I, a cost of $h_t(I)$ is charged (often,
these costs are independent of t, i.e.,
$h_t(I) = h(I)$)
Production & inventory
planning
 If the production and inventory holding
cost functions are linear, we can
formulate this problem as an LP problem
(how?)
 Often, the production costs are assumed
to have a fixed-charge structure:
 c(x) = 0 if x = 0, c(x) = a + bx if x > 0
 In that case, we can formulate this problem
as a mixed-integer LP problem (how?)
Production & inventory
planning
 More generally, we can formulate this
problem as an NLP problem (how?)
 Production (and inventory) costs are
often assumed to be concave – reflecting
economies of scale
 What does that mean for the ease of
solvability of the NLP problem?
Production & inventory
planning
 NLP formulation:
$$\min \sum_{t=1}^{T} c_t(x_t) + \sum_{t=1}^{T} h_t(I_t)$$
subject to
$$I_{t-1} + x_t - d_t = I_t, \quad t = 1, \ldots, T$$
$$0 \le I_t \le B, \quad t = 1, \ldots, T$$
$$0 \le x_t \le C, \quad t = 1, \ldots, T$$
Production & inventory
planning
 Dynamic programming provides a solution
methodology that can be applied for
general cost functions
 We only need to assume that the units of
demand, production, and inventory are
integers – which is not unrealistic in many
practical situations
 This methodology will be efficient if the
magnitude of the numbers involved is not too
large
Production & inventory
planning
 We must identify:
 Stages
time: $t = 1, \ldots, T$
 States
(starting) inventory level: $I = 0, \ldots, B$
 Decisions
production quantity: $x = 0, \ldots, C$
 Recursion
minimal cost from start of stage t: $f_t(I)$
 Clearly, we are looking for $f_1(I_0)$, where $I_0$ is the known initial inventory level
Production & inventory
planning
 Recursion:
 Cost at the beginning of stage T:
$$f_T(I) = \min_{0 \le x \le C} \{ c_T(x) + h_T(I + x - d_T) \}$$
 Note that you will always want to end up
with 0 inventory, so the final period’s
production will be $x = d_T - I$
 So: $f_T(I) = c_T(d_T - I)$
 (what if $d_T - I < 0$ or $d_T - I > C$ ???)
Production & inventory
planning
 Recursion:
 Cost at the beginning of stage t:
$$f_t(I) = \min_{\max(0,\, d_t - I) \le x \le \min(C,\, d_t + B - I)} \{ c_t(x) + h_t(I + x - d_t) + f_{t+1}(I + x - d_t) \}$$
 We have to make sure that we have
sufficient storage capacity, i.e., we need
$I + x - d_t \le B$ or $x \le d_t + B - I$
 We have to make sure that we deliver on
time: $I + x - d_t \ge 0$ or $x \ge d_t - I$
Production & inventory
planning example
 Example:
 T = 4 periods
 Demands: 1, 4, 2, 3
 Inventory holding costs: $0.50 per unit
 Production costs:
 fixed setup cost $3
 variable cost $1 per unit
 Production capacity C = 5 units
 Inventory capacity B = 4 units
Production & inventory
planning
 Initialization: T = 4
 $d_4 = 3$
 Cost at the beginning of stage T = 4:
$$f_T(I) = c_T(d_T - I)$$
 I = 0: $f_4(0) = c_4(3 - 0) = 3 + 3 \cdot 1 = 6$
 I = 1: $f_4(1) = c_4(3 - 1) = 3 + 2 \cdot 1 = 5$
 I = 2: $f_4(2) = c_4(3 - 2) = 3 + 1 \cdot 1 = 4$
 I = 3: $f_4(3) = c_4(3 - 3) = 0$
 I = 4: $f_4(4) = c_4(3 - 4)$ = ??? → 0
Production & inventory
planning
 Next stage: t = 3
 $d_3 = 2$
 Cost at the beginning of stage t = 3:
$$f_t(I) = \min_{\max(0,\, d_t - I) \le x \le \min(C,\, d_t + B - I)} \{ c_t(x) + h_t(I + x - d_t) + f_{t+1}(I + x - d_t) \}$$
$$f_3(I) = \min_{\max(0,\, 2 - I) \le x \le \min(5,\, 2 + 4 - I)} \{ c_3(x) + h_3(I + x - 2) + f_4(I + x - 2) \}$$

Production & inventory
planning
 I = 0:
$$f_3(0) = \min_{\max(0,\, 2 - 0) \le x \le \min(5,\, 2 + 4 - 0)} \{ c_3(x) + h_3(0 + x - 2) + f_4(0 + x - 2) \}$$
$$f_3(0) = \min_{2 \le x \le 5} \{ c_3(x) + \tfrac{1}{2}(x - 2) + f_4(x - 2) \}$$
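Production & inventory
planning
 I = 0: for x ≥ 2 we have $c_3(x) + \tfrac{1}{2}(x - 2) = 3 + x + \tfrac{1}{2}x - 1 = 2 + 1\tfrac{1}{2}x$, so $f_3(0)$ is the minimum of
 x = 2: $2 + 1\tfrac{1}{2} \cdot 2 + f_4(0) = 5 + 6 = 11$
 x = 3: $2 + 1\tfrac{1}{2} \cdot 3 + f_4(1) = 6\tfrac{1}{2} + 5 = 11\tfrac{1}{2}$
 x = 4: $2 + 1\tfrac{1}{2} \cdot 4 + f_4(2) = 8 + 4 = 12$
 x = 5: $2 + 1\tfrac{1}{2} \cdot 5 + f_4(3) = 9\tfrac{1}{2} + 0 = 9\tfrac{1}{2}$
 So $f_3(0) = 9\tfrac{1}{2}$, attained at x = 5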
Production & inventory
planning
 I = 2:
$$f_3(2) = \min_{\max(0,\, 2 - 2) \le x \le \min(5,\, 2 + 4 - 2)} \{ c_3(x) + h_3(2 + x - 2) + f_4(2 + x - 2) \}$$
$$f_3(2) = \min_{0 \le x \le 4} \{ c_3(x) + \tfrac{1}{2}x + f_4(x) \}$$
$$f_3(2) = \min\{ f_4(0),\ \min_{1 \le x \le 4} \{ 3 + 1\tfrac{1}{2}x + f_4(x) \} \}$$

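Production & inventory
planning
 I = 2: $f_3(2)$ is the minimum of
 x = 0: $f_4(0) = 6$
 x = 1: $3 + 1\tfrac{1}{2} \cdot 1 + f_4(1) = 4\tfrac{1}{2} + 5 = 9\tfrac{1}{2}$
 x = 2: $3 + 1\tfrac{1}{2} \cdot 2 + f_4(2) = 6 + 4 = 10$
 x = 3: $3 + 1\tfrac{1}{2} \cdot 3 + f_4(3) = 7\tfrac{1}{2} + 0 = 7\tfrac{1}{2}$
 x = 4: $3 + 1\tfrac{1}{2} \cdot 4 + f_4(4) = 9 + 0 = 9$
 So $f_3(2) = 6$, attained at x = 0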
Production & inventory
planning
 Network representation:
 Nodes: stage/state combinations (t, I)
 Arcs: decisions x
 Arc from node (t, I) corresponding to decision
x leads to node $(t+1,\ I + x - d_t)$
 Cost of this arc is $c_t(x) + h_t(I + x - d_t)$
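The recursion for this example can be checked with a short memoized sketch. It uses the example data (demands 1, 4, 2, 3; setup cost $3; variable cost $1 per unit; holding cost $0.50 per unit; C = 5; B = 4). Two points are assumptions rather than facts from the slides: the initial inventory is taken to be 0, and in the final period any surplus inventory is treated as costless, mirroring the convention $f_4(4) = 0$. The names `f` and `plan` are hypothetical.

```python
from functools import lru_cache

demand = {1: 1, 2: 4, 3: 2, 4: 3}  # d_t from the example
T, C, B = 4, 5, 4                  # horizon, production capacity, storage capacity


def c(x: int) -> float:
    """Fixed-charge production cost: $3 setup plus $1 per unit, zero if idle."""
    return 0.0 if x == 0 else 3.0 + 1.0 * x


def h(inv: int) -> float:
    """Holding cost of $0.50 per unit of end-of-period inventory."""
    return 0.5 * inv


@lru_cache(maxsize=None)
def f(t: int, inv: int):
    """Return (minimal cost from the start of period t with inventory inv, best x)."""
    d = demand[t]
    if t == T:
        # Final period: produce just enough to end with zero inventory.
        # ASSUMPTION: if inv > d, produce nothing and charge nothing (f_4(4) = 0).
        x = max(0, d - inv)
        return c(x), x
    best = None
    for x in range(max(0, d - inv), min(C, d + B - inv) + 1):
        cost = c(x) + h(inv + x - d) + f(t + 1, inv + x - d)[0]
        if best is None or cost < best[0]:
            best = (cost, x)
    return best


def plan(inv0: int):
    """Recover the production quantities by following the stored best decisions."""
    inv, xs = inv0, []
    for t in range(1, T + 1):
        x = f(t, inv)[1]
        xs.append(x)
        inv += x - demand[t]
    return xs


print(f(3, 0), f(3, 2))   # (9.5, 5) (6.0, 0)
print(f(1, 0)[0], plan(0))  # 19.5 [5, 0, 5, 0], assuming zero initial inventory
```

Each call `f(t, inv)` corresponds to a node (t, I) of the network representation, and each value of `x` tried in the loop corresponds to one outgoing arc.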
Resource allocation:
the knapsack problem (1)
 Stockco is considering 4 investments
 Investment 1 will yield a NPV of $16K, but
requires a cash outflow of $5K
 You have a budget of $14K
 IP formulation:
$$\max \sum_{i=1}^{4} \mathrm{NPV}_i x_i = 16x_1 + 22x_2 + 12x_3 + 8x_4$$
subject to
$$\sum_{i=1}^{4} C_i x_i = 5x_1 + 7x_2 + 4x_3 + 3x_4 \le 14$$
$$x_i \in \{0, 1\}, \quad i = 1, \ldots, 4$$
 You are planning an overnight hike, and are
considering taking 4 items along on your trip
 Item 1 yields a “benefit” of 16, but weighs 5 lbs
 You do not want to carry more than 14 lbs
 You want to maximize your “benefit”
 Mathematically, this is the same problem as the
investment problem!
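Both versions of this problem can be solved with the same stage-by-stage idea: treat each item as a stage and the remaining budget (or weight allowance) as the state. The sketch below is one possible implementation, using the coefficients from the IP formulation; the function name `knapsack` and the table layout are assumptions made for illustration.

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by backward recursion over items.

    f[i][y] = best total value from items i, ..., n-1 with y units of capacity left.
    Returns (optimal value, list of chosen item numbers, 1-based).
    """
    n = len(values)
    f = [[0] * (capacity + 1) for _ in range(n + 1)]  # f[n][y] = 0: no items left
    for i in range(n - 1, -1, -1):
        for y in range(capacity + 1):
            f[i][y] = f[i + 1][y]  # skip item i
            if weights[i] <= y:    # take item i if it fits
                f[i][y] = max(f[i][y], values[i] + f[i + 1][y - weights[i]])

    # Walk forward through the table to recover one optimal selection.
    chosen, y = [], capacity
    for i in range(n):
        if weights[i] <= y and f[i][y] == values[i] + f[i + 1][y - weights[i]]:
            chosen.append(i + 1)
            y -= weights[i]
    return f[0][capacity], chosen


# NPVs (in $K) and cash outflows (in $K) from the IP formulation, budget 14.
print(knapsack([16, 22, 12, 8], [5, 7, 4, 3], 14))  # (42, [2, 3, 4])
```

The table has (n + 1)(capacity + 1) entries, so the approach is practical whenever the budget, measured in its integer units, is not too large.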
more general
 Stockco is considering n investments
 Investment i will yield a NPV of $r_i(d_i)$ when
$d_i \times \$1{,}000$ is invested
 You only want to (or can) invest in integer multiples of
$1,000
 You have a budget of $B \times \$1{,}000$
 Example
 n = 3, B = 6
 $r_1(d_1) = 7d_1 + 2$ ($d_1 > 0$), $r_1(0) = 0$
 $r_2(d_2) = 3d_2 + 7$ ($d_2 > 0$), $r_2(0) = 0$
 $r_3(d_3) = 4d_3 + 5$ ($d_3 > 0$), $r_3(0) = 0$
more general
$$\max \sum_{i=1}^{n} r_i(d_i)$$
subject to
$$\sum_{i=1}^{n} d_i \le B$$
$$d_i \in \{0, 1, \ldots\}, \quad i = 1, \ldots, n$$
Resource allocation
 To formulate this problem as a DP problem, we
must identify:
 Stages
investment categories: $i = 1, 2, 3$
 States
budget available: $y = 0, 1, \ldots, 6$
 Decisions
investment amount: $d = 0, 1, \ldots, 6$
 Recursion
maximal return from inv. categories i,…,3: $f_i(y)$
 Clearly, we are looking for $f_1(6)$
Resource allocation
 Recursion:
 Return from investment in category 3 only:
$$f_3(y) = \max_{0 \le d \le y} r_3(d)$$
 Note that you will always invest all remaining
budget in category 3 at this stage, i.e., d = y
$$f_3(y) = r_3(y) = \begin{cases} 0 & \text{if } y = 0 \\ 4y + 5 & \text{if } y = 1, \ldots, 6 \end{cases}$$
Resource allocation
 Recursion:
 Return from investment in categories 2 and 3:
$$f_2(y) = \max_{0 \le d \le y} \{ r_2(d) + f_3(y - d) \}$$
 These subproblems are a little harder…
 y = 0: $f_2(0) = 0$
 y = 1: $f_2(1) = \max(r_2(0) + f_3(1),\ r_2(1) + f_3(0)) = \max(0 + 9,\ 10 + 0) = 10$
 y = 2: $f_2(2) = \max(r_2(0) + f_3(2),\ r_2(1) + f_3(1),\ r_2(2) + f_3(0)) = \max(0 + 13,\ 10 + 9,\ 13 + 0) = 19$
Resource allocation
 Nodes: stage/state combinations (i,y)
 Arcs: decisions d
 Arc from node (i, y) corresponding to decision
d leads to node (i+1, y−d)
 Return of this arc is $r_i(d)$
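The full recursion for the three-investment example can be carried out with a small table-filling sketch. The return functions are those of the example; the names `r` and `allocate`, and the extra boundary stage with zero return after the last investment, are assumptions made for this illustration.

```python
def r(i: int, d: int) -> int:
    """Return of investment i when d thousand dollars are invested (example data)."""
    slope, fixed = {1: (7, 2), 2: (3, 7), 3: (4, 5)}[i]
    return 0 if d == 0 else slope * d + fixed


def allocate(n: int, budget: int):
    """Backward recursion f_i(y) = max over 0 <= d <= y of r_i(d) + f_{i+1}(y - d)."""
    f = {(n + 1, y): 0 for y in range(budget + 1)}  # boundary: no investments left
    best = {}
    for i in range(n, 0, -1):
        for y in range(budget + 1):
            d_star = max(range(y + 1), key=lambda d: r(i, d) + f[i + 1, y - d])
            f[i, y] = r(i, d_star) + f[i + 1, y - d_star]
            best[i, y] = d_star

    # Follow the best decisions forward from (1, budget) to read off the allocation.
    y, amounts = budget, []
    for i in range(1, n + 1):
        amounts.append(best[i, y])
        y -= best[i, y]
    return f, amounts


f, amounts = allocate(3, 6)
print(f[2, 1], f[2, 2])  # 10 19, matching the hand computation
print(f[1, 6], amounts)  # 49 [4, 1, 1]
```

Reading the result: investing $4,000, $1,000, and $1,000 in investments 1, 2, and 3 yields the maximal total return for this data.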
even more general
$$\max \sum_{i=1}^{n} r_i(d_i)$$
subject to
$$\sum_{i=1}^{n} g_i(d_i) \le B$$
$$d_i \in \{0, 1, \ldots, U_i\}, \quad i = 1, \ldots, n$$
 Find the DP formulation for this general
case
Equipment replacement
problem
 A company faces the problem of how long a
machine should be utilized before it should be
traded in for a new one
 Example
 A new machine costs $p = \$1{,}000$, and has a useful
lifetime of 3 years
 Maintaining a machine during its first 3 years costs
$m_1 = \$60$, $m_2 = \$80$, $m_3 = \$120$, respectively
 If a machine is traded in, a salvage value is
obtained: $s_1 = \$800$, $s_2 = \$600$, and $s_3 = \$500$,
respectively, after the first 3 years
Equipment replacement
problem
 We currently have a y-year-old machine
 Find a policy that minimizes total net costs
over the next 5 years
Equipment replacement
problem
 To formulate this problem as a DP problem, we
must identify:
 Stages
time: $t = 0, 1, \ldots, 5$
 States
age of machine: $y = 0, 1, 2, 3$
 Decisions
keep or trade-in: $x = 0, 1$
 Recursion
minimal net cost after period t: $f_t(y)$
 Clearly, we are looking for $f_0(y)$
Equipment replacement
problem
 Recursion:
 Note that you will always salvage the
machine at the end of year 5:
 Net cost after period 5:
$$f_5(y) = -s_y, \quad y = 1, 2, 3$$
Equipment replacement
problem
 Recursion:
 At the end of period t < 5, you must decide
whether to keep or trade-in the machine
 If y = 3, you must trade it in
$$f_t(3) = -s_3 + p + f_{t+1}(1), \quad t = 0, 1, \ldots, 4$$
 If y < 3, you have a real choice:
$$f_t(y) = \min \begin{cases} x = 0: & m_y + f_{t+1}(y + 1) \\ x = 1: & -s_y + p + f_{t+1}(1) \end{cases} \quad y = 1, 2;\ t = 0, 1, \ldots, 4$$
Equipment replacement
problem
 Nodes: stage/state combinations (t, y)
 Arcs: decisions x
 Arc from node (t, y) corresponding to decision
 x = 0 leads to node (t+1, y+1)
 x = 1 leads to node (t+1, 1)
 Cost of the arc is
 $m_y$ when x = 0
 $-s_y + p$ when x = 1
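The recursion can be evaluated with a short memoized sketch. It follows the arc costs exactly as stated on the slides (keeping costs $m_y$, trading in costs $-s_y + p$) and uses the example data. Restricting the state to ages 1, 2, 3 is an assumption, since the recursion is only written out for those ages; the function name `f` and the tie-breaking rule (keep when costs are equal) are also illustrative choices.

```python
from functools import lru_cache

P = 1000                       # price of a new machine
M = {1: 60, 2: 80, 3: 120}     # maintenance costs m_y
S = {1: 800, 2: 600, 3: 500}   # salvage values s_y
HORIZON, MAX_AGE = 5, 3


@lru_cache(maxsize=None)
def f(t: int, y: int):
    """Return (minimal net cost from time t on with a y-year-old machine, decision x).

    x = 0 means keep, x = 1 means trade in; x is None at the end of the horizon.
    ASSUMPTION: y is restricted to 1, 2, 3, as in the written recursion.
    """
    if t == HORIZON:
        return -S[y], None                # always salvage at the end of year 5
    trade = -S[y] + P + f(t + 1, 1)[0]
    if y == MAX_AGE:
        return trade, 1                   # a 3-year-old machine must be traded in
    keep = M[y] + f(t + 1, y + 1)[0]
    return (keep, 0) if keep <= trade else (trade, 1)


for age in (1, 2, 3):
    print(age, f(0, age))
# 1 (200, 1)
# 2 (380, 0)
# 3 (500, 1)
```

Each pair (t, y) visited by `f` is a node of the network representation, and the two branches `keep` and `trade` are its outgoing arcs.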
