Jeff Linderoth
Dept. of Industrial and Systems Engineering
Univ. of Wisconsin-Madison
linderoth@wisc.edu
Enterprise-Wide Optimization Meeting
Carnegie-Mellon University
March 10th, 2009
Jeff Linderoth (UW-Madison)
CMU-EWO
1 / 82
Mission Impossible
Explaining Stochastic Programming in 90 mins
I will try to give an overview; please interrupt with questions!
Algorithms
Extensive Form
Benders Decomposition (2-stage)
Sampling
Nested Benders Decomposition (multistage)
Etymology
program:
(3) An ordered list of events to take place or procedures to be followed; a schedule
Late Latin programma, public notice, from Greek programma, programmat-, from prographein, to write publicly
stochastic:
(1b) Involving chance or probability
Greek stokhastikos, from stokhastēs, diviner, from stokhazesthai, to guess at, from stokhos, aim, goal.
Source: The American Heritage Dictionary of the English Language, Fourth Edition.
Sources of Uncertainty
Houston, we have uncertainty!
"What we anticipate seldom occurs; what we least expected generally happens." Benjamin Disraeli (1804 - 1881)
Financial
Market price movements
Defaults by a business partner
Market Related
Shifts in tastes
Competition
What will your competitors' strategy be next year?
Operational
Customer demands, Travel times
Technology related
Will a new technology be ready in time?
Stochastic Programming
A tool used in planning under uncertainty
More specifically: Mathematical Programming, or Optimization, in which some of the parameters defining a problem instance are random, or uncertain
Optimization: $\min_{x \in X} f(x)$
Stochastic Optimization is UNDEFINED
You can't possibly choose an x that optimizes for all outcomes $\omega$ of the random parameters
More specification is required
Probability Theory(?)
This notion of having to know a probability distribution for the randomness is troubling, since in reality, very few people know exactly that
Their customer demands follow a log-normal distribution with mean 17.26 and variance 2.88726
Their plant will have forced shutdowns following a Weibull distribution with parameters (100.25, 73.7916)
Three Approaches
Robust Optimization
Uncertain data is assumed to lie in an uncertainty set: $(T(\omega), h(\omega)) \in U$
Guarantee that constraints be satisfied for all possible realizations:
$$\min\ cx \quad \text{s.t.} \quad Ax \ge b, \quad Tx \ge h \ \ \forall (T, h) \in U, \quad x \ge 0$$
Tractability depends on structure of U
To control conservatism, the uncertainty set can be parameterized by a budget of uncertainty
Example 1: $T_{ij}(\omega) \in [l_{ij}, u_{ij}]$ (Bertsimas and Sim)
At most K of the components in each row can differ from the nominal value
Nature can choose which K will differ
K large: highly conservative (Soyster)
K = 0: no robustness
Can formulate this problem as a linear program
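To make the budget of uncertainty concrete, here is a minimal Python sketch of the worst case of a single row $\sum_j T_{ij} x_j$ when nature may move at most K coefficients. It assumes (the slide does not say so) that each interval is symmetric around a nominal value, so `deviation[j]` is the half-width of $[l_{ij}, u_{ij}]$, and that the constraint reads $Tx \ge h$, so the worst case is the smallest left-hand side. The function name and all numbers are illustrative only.

```python
def worst_case_row(nominal, deviation, x, K):
    """Smallest value of sum_j T_j * x_j when at most K coefficients
    deviate from their nominal value by up to deviation[j] (assumed symmetric)."""
    base = sum(t * xj for t, xj in zip(nominal, x))
    # Nature picks the K coefficients whose deviation hurts the most.
    impacts = sorted((d * abs(xj) for d, xj in zip(deviation, x)), reverse=True)
    return base - sum(impacts[:K])


nominal = [2.0, 3.0, 1.0]      # hypothetical nominal coefficients of one row
deviation = [0.5, 1.0, 0.2]    # hypothetical half-widths of the intervals
x = [4.0, 1.0, 5.0]            # a fixed candidate decision

for K in range(4):
    print(K, worst_case_row(nominal, deviation, x, K))
```

With K = 0 the row keeps its nominal value (no robustness); with K equal to the number of coefficients every entry sits at its worst end, which is the conservative Soyster case.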
Advantages:
Computationally tractable
Can yield extremely reliable solutions
Does not require stochastic model
Disadvantages:
Does not use a stochastic model
Although conservatism can be controlled, the control parameter doesn't have meaning to decision makers
Stochastic Programming
Assume uncertain data are random variables with known distributions
Two approaches to uncertain constraints:
1. Require constraint to be satisfied with high probability
$\min \{cx : x \in X,\ P\{T(\omega)x \ge h(\omega)\} \ge 1 - \epsilon\}$
$\epsilon$ is a parameter, e.g. $\epsilon = 0.05$ or $\epsilon = 0.01$
Linear program with probabilistic (chance) constraints
2. Penalize violations of constraints
$\min \{cx + E[(h(\omega) - T(\omega)x)^+] : x \in X\}$
Special case of a two-stage stochastic program
Individual constraints: $\min \{cx : x \in X,\ P\{T(\omega)_i x \ge h(\omega)_i\} \ge 1 - \epsilon_i\ \ \forall i\}$
Joint constraints: $\min \{cx : x \in X,\ P\{T(\omega)x \ge h(\omega)\} \ge 1 - \epsilon\}$
Bad news: calculating probability is hard
Worse news: probabilistic constraints are generally non-convex!
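Since calculating the probability exactly is hard, a common workaround is to estimate it by sampling. The sketch below checks a joint chance constraint for a fixed x by plain Monte Carlo. The scenario generator `demo_scenario` (one row, a normally distributed right-hand side) and all numbers are made up for illustration; only the constraint form $P\{T(\omega)x \ge h(\omega)\} \ge 1 - \epsilon$ comes from the slides.

```python
import random


def estimate_satisfaction(x, sample_scenario, n_samples=20000, seed=0):
    """Monte Carlo estimate of P{ T(w) x >= h(w) } (all rows jointly) for a fixed x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        T, h = sample_scenario(rng)
        if all(sum(t * xj for t, xj in zip(row, x)) >= h_i for row, h_i in zip(T, h)):
            hits += 1
    return hits / n_samples


def demo_scenario(rng):
    """Hypothetical data: a single row x1 + x2 >= demand, with demand ~ Normal(10, 2)."""
    return [[1.0, 1.0]], [rng.gauss(10.0, 2.0)]


epsilon = 0.05
p_hat = estimate_satisfaction([7.0, 7.0], demo_scenario)
print(f"estimated probability {p_hat:.3f}, required {1 - epsilon:.2f}")
```

This only evaluates one candidate x; optimizing over x is where the non-convexity mentioned above bites.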
Choose x, observe $(T(\omega), h(\omega))$, pay penalty
Good news: (SP) is convex
Bad news: Calculating expectation is hard
Successful Approach: Sample Average Approximation
Generate $(T(\omega^1), h(\omega^1)), \ldots, (T(\omega^N), h(\omega^N))$ and solve
$$(SP_N) \qquad \min \left\{ cx + \frac{1}{N} \sum_{i=1}^{N} (h(\omega^i) - T(\omega^i)x)^+ : x \in X \right\}$$
The solution $x_N$ of $(SP_N)$ is often a good approximation to the true optimal solution
We'll see (a lot) more later!
Stochastic Programming
(Con): More challenging to build and solve models
(Pro): SP helps you optimize over your what-ifs.
The Upshot! Use simulation to generate scenarios. Input the scenarios to a stochastic program to decide how best to hedge against this uncertainty
The evolution of information is of fundamental importance to the decision-making process.
We make a decision now ($x_1$)
Nature makes a random decision $\omega_2$: (stuff happens)
We make a second period decision $x_2$ that attempts to repair the havoc wrought by nature (recourse).
Repeat as necessary...
We make decisions in stages, in between which uncertainty is revealed to us
The Newsvendor
The Newsvendor Problem
Given only knowledge of the probability distribution F of demand, how many papers should the newsvendor buy?
Newsvendor Problem
Suppose that the newsvendor's goal is to maximize profits in the long run (in expectation).
Intuitively, it seems that the newsvendor's best strategy is to always purchase the average demand
Take Away Message! The optimal solution is NOT to use the mean demand. In fact, the two solutions can be far apart (depending on the distribution and the parameters r, c, q).
Example: The Newsvendor
c = 50, q = 70, r = 5
Demand: (Truncated) Normal distributed, $\mu = 100$, $\sigma = 50$
Mean Value Solution
Buy 100. Expect to profit: 2000. TRUE long run profit ≈ 650 (Duh!)
Stochastic Solution
Buy 75. Expect to profit: 1500. TRUE long run profit ≈ 880
The difference between the two solutions (880 − 650 = 230) is called the value of the stochastic solution.
How much is it worth to you to plan using full uncertainty information as opposed to mean-values for the uncertain parameters?
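The comparison can be replayed with a short simulation. The sketch below reads c as the unit purchase cost, q as the selling price and r as the salvage value of an unsold paper; that reading is an assumption, chosen because it reproduces the "expected" profits 2000 and 1500 quoted above. Demand is drawn from Normal(100, 50) and clipped at zero as a stand-in for the truncation, which the slides do not specify, so the simulated profits need not match the 650 and 880 on the slide.

```python
import random
from statistics import NormalDist

c, q, r = 50.0, 70.0, 5.0          # assumed roles: unit cost, sale price, salvage value
mu, sigma = 100.0, 50.0


def profit(x, d):
    """Profit from buying x papers when demand turns out to be d."""
    sold = min(x, d)
    return q * sold + r * (x - sold) - c * x


# Classical critical-fractile rule for the untruncated normal: F(x*) = (q - c) / (q - r).
fractile = (q - c) / (q - r)
x_sp = NormalDist(mu, sigma).inv_cdf(fractile)
x_mv = mu                           # mean-value solution: buy the average demand

rng = random.Random(1)
demands = [max(0.0, rng.gauss(mu, sigma)) for _ in range(200000)]


def long_run_profit(x):
    """Average profit over the simulated demands (same sample for every x)."""
    return sum(profit(x, d) for d in demands) / len(demands)


profit_mv = long_run_profit(x_mv)
profit_sp = long_run_profit(x_sp)
vss = profit_sp - profit_mv         # value of the stochastic solution (simulated)
print(f"buy {x_sp:.1f} instead of {x_mv:.0f}: {profit_sp:.0f} vs {profit_mv:.0f}")
```

Both order quantities are evaluated on the same simulated demands, so the difference between the two averages is less noisy than either average on its own.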
Point Estimates
If you are planning with point estimates for demands, then you are planning sub-optimally
It doesn't matter how carefully you choose the point estimate: it is impossible to hedge against future uncertainty by considering one realization of the uncertainty in your planning process
Financial Optimization
Russell-Yasuda Kasai
Yasuda Kasai: Seventh largest (worldwide) property and casualty insurer. Assets of > ¥3.47 trillion
Liability structure is complex, but they want a tool that will allow them to maximize the revenue from these assets in the face of asset management restrictions
Frank Russell Company hired to develop Asset-Liability Management Model based on (multistage) stochastic programming
Carino, Myers, Ziemba: Second place in Edelman prize competition of INFORMS.
Random Events:
Return on investment for each asset
Liability payouts
Constraints:
Asset Allocation Constraints
(Complex) Loan Model
Liability Model
"Compared to a performance benchmark established at Yasuda Kasai at the beginning of the Fiscal Year to measure the value added by their use of the model, the new model increased annual income by ¥9.5 billion."
Mr. Kunihiko Sasamoto, Director and Deputy President, Yasuda Kasai.
Ease of Use
Risk is well defined, not using some abstract measure like standard deviation
Decisions:
Invest in various projects (all-or-nothing investment)
Complicated project prerequisite structure
Random Events:
Design-win from customers
Technology failures
Market forces
(HUGE impact)
Constraints:
Resources
Hire-fire costs
The muckety-mucks loved it!
They like the ability to talk about the different scenarios.
Focuses discussion in business planning meetings
Gives unbiased simulator view of potential outcomes of decisions
Logistics
Decisions:
Regular supply chain decision: How much? where? and when?
Random Elements:
Demands, prices, resource capacity. Supply chains going global imply that companies are now more exposed to risky factors such as exchange rates and reliability of transfer channels.
Constraints:
Regular supply chain constraints: Flow balance, material availability, etc.
A Case Study
T. Santoso, S. Ahmed, M. Goetschalckx, and A. Shapiro. A Stochastic Programming Approach for Supply Chain Network Design under Uncertainty, European Journal of Operational Research, vol.167, pp.96-115, 2005.
Sizes: Around 100 facilities, around 100 customers.
In general, the (sampled) stochastic model was roughly 5% better than using the mean value of demand, translating into millions of dollars in potential savings.
Lesson Learned
Having a (static) simulation of the production-distribution process is a key component of the project
Other Industries
Telecommunication
Capacity/bandwidth planning: Invest in capacity for the network before you know the true bandwidth demands
Military
Network Interdiction Problem: Where to place agents on a network to interrupt evil-doers
It ain't that rosy
As far as I know, most implementations are built on a case-by-case basis and are fairly ad hoc.
Each of these implies a different notion of risk, and leads to a different stochastic optimization problem
Stochastic Programming isn't about getting a number, it's about getting a distribution that looks good to you
Some SP Objectives
[Figure: diagram of the different flavors of SP; legible labels include "(Joint) Chance Constraints", "Joint Distributions" and "(Conditional)"]
I point out all these different flavors of SP to highlight what I think has been one of the hindrances to having a modeling language for SP.
"I don't know the key to success, but the key to failure is trying to please everybody." Bill Cosby (1937 - )
I believe the fact that a stochastic program is not a well-defined concept is one of the fundamental reasons why more people don't use stochastic programming
Other reasons people don't use stochastic programming?
Probability Management
A true believer is Sam Savage (consulting professor at Stanford). He believes companies should have a comprehensive probability management plan.
Probability Management
Simulations to generate distributions
Information systems to hold distributions of key uncertain inputs
A Chief Probability Officer responsible for signing off on the distributions
You can start small...
1. What are your scenarios and distributions?
Algorithms
ALGORITHMS
I focus almost exclusively on two-stage recourse problems
Stochastic Programming
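A Stochastic Program
$$\min_{x \in X} \{ E_\omega F(x, \omega) \}$$
For a two-stage stochastic LP with recourse, $F(x, \omega) = c^T x + Q(x, \omega)$ with
$$Q(x, \omega) \stackrel{\text{def}}{=} \min_y \{ q^T y : Wy = h(\omega) - T(\omega)x,\ y \ge 0 \}$$
Extensive Form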
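Assume $\Omega = \{\omega_1, \omega_2, \ldots, \omega_S\} \subseteq R^r$, $P(\omega = \omega_s) = p_s$, $s = 1, 2, \ldots, S$
$T_s \equiv T(\omega_s)$, $h_s = h(\omega_s)$
Then can write extensive form:
$$
\begin{array}{rrcrcrcrcl}
\min & c^T x & + & p_1 q^T y_1 & + & p_2 q^T y_2 & + \cdots + & p_S q^T y_S & & \\
\text{s.t.} & Ax & & & & & & & = & b \\
& T_1 x & + & W y_1 & & & & & = & h_1 \\
& T_2 x & & & + & W y_2 & & & = & h_2 \\
& \vdots & & & & & \ddots & & & \vdots \\
& T_S x & & & & & + & W y_S & = & h_S \\
& x \in X, & & y_1 \in Y, & & y_2 \in Y, & \cdots & y_S \in Y & &
\end{array}
$$
The Upshot!
This is just a larger linear program
It is a larger linear program that also has special structure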
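To see the special structure, it can help to assemble the extensive form for a toy instance. The sketch below stacks the first-stage rows and one block of rows per scenario into a single constraint matrix, using plain Python lists; the helper names and the tiny data set are hypothetical, and a real model would hand sparse matrices to an LP solver instead.

```python
def extensive_form(A, b, T_list, W, h_list):
    """Stack [A 0 ... 0] on top of one block row [T_s 0 .. W .. 0] per scenario."""
    S = len(T_list)
    n1, n2 = len(A[0]), len(W[0])
    rows, rhs = [], []
    for a_row, b_i in zip(A, b):
        rows.append(list(a_row) + [0.0] * (S * n2))
        rhs.append(b_i)
    for s, (T_s, h_s) in enumerate(zip(T_list, h_list)):
        for t_row, w_row, h_i in zip(T_s, W, h_s):
            row = list(t_row) + [0.0] * (S * n2)
            start = n1 + s * n2              # columns that belong to y_s
            row[start:start + n2] = w_row
            rows.append(row)
            rhs.append(h_i)
    return rows, rhs


def extensive_cost(c, q, probs):
    """Objective vector (c, p_1 q, ..., p_S q)."""
    return list(c) + [p * q_j for p in probs for q_j in q]


# Hypothetical instance: 2 first-stage variables, 2 recourse variables, 3 scenarios.
A, b = [[1, 1]], [10]
W = [[1, -1]]
T_list = [[[1, 0]], [[0, 1]], [[1, 1]]]
h_list = [[3], [4], [5]]
probs = [0.5, 0.3, 0.2]

rows, rhs = extensive_form(A, b, T_list, W, h_list)
cost = extensive_cost([1, 2], [3, 4], probs)
for row, h_i in zip(rows, rhs):
    print(row, "=", h_i)
```

The matrix has one row block and one column block per scenario, so its size grows linearly in S while each scenario block touches only the x columns and its own y columns.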
[Figure: solution time versus number of scenarios (50 to 300), comparing Cplex on the extensive form with the L-shaped method]
$$Q(x) = \sum_{i=1}^{N} p_i Q(x, \omega_i)$$
For a partition of the N scenarios into sets $N_1, N_2, \ldots, N_t$, let $Q_{[j]}(x)$ be the contribution of the jth set to $Q(x)$:
$$Q_{[j]}(x) \stackrel{\text{def}}{=} \sum_{i \in N_j} p_i Q(x, \omega_i)$$
so then $Q(x) = \sum_{j=1}^{t} Q_{[j]}(x)$
Key Idea
Represent $Q_{[j]}(x)$ by an artificial variable $\theta_j$ and find supporting planes for $\theta_j$
$$\theta_j \ge Q_{[j]}(x^k) + g_j(x^k)^T (x - x^k)$$
Point of Decomposition
Evaluation of $Q(\hat{x})$ is separable
We can solve linear programs corresponding to each $Q(\hat{x}, \omega_i)$ independently, in parallel!
[Figures: Q(x) plotted against x, with the iterates $x^k$, $x^1$ and $x^2$ marked on the x axis]
[Figure: master problem M connected to subproblems s1, s2, s3, s4, s5]
1. Solve the master problem M with the current approximation to Q(x) for $x^k$.
2. Solve the subproblems ($s_j$), evaluating $Q(x^k)$ and obtaining subgradient(s) to update the master approximation M.
3. k = k+1. Goto 1.
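The loop above can be tried on a toy problem without any LP solver. The sketch below runs a single-cut version on a sampled newsvendor instance: the "subproblems" have a closed-form value and subgradient, and because there is only one first-stage variable the master problem is solved by enumerating crossing points of cuts. All data (c, q, r read as cost, price and salvage value, the bound `X_MAX`, 200 sampled demands) are assumptions for illustration; a real implementation would solve LPs for both master and subproblems.

```python
import random

c, q, r = 50.0, 70.0, 5.0                      # hypothetical newsvendor data
X_MAX = 300.0                                  # assumed upper bound on the order x
rng = random.Random(0)
scenarios = [min(X_MAX, max(0.0, rng.gauss(100.0, 50.0))) for _ in range(200)]
p = 1.0 / len(scenarios)                       # equally likely scenarios


def subproblem(x, d):
    """Return Q(x, d) and one subgradient of Q(., d) at x."""
    if x <= d:
        return -q * x, -q                      # everything is sold
    return -q * d - r * (x - d), -r            # leftovers are salvaged


def expected_recourse(x):
    """Q(x) = sum_i p_i Q(x, d_i) together with an aggregated subgradient."""
    results = [subproblem(x, d) for d in scenarios]
    return sum(p * v for v, _ in results), sum(p * g for _, g in results)


def solve_master(cuts):
    """Minimize c*x + theta s.t. theta >= a + g*x for all cuts, 0 <= x <= X_MAX.
    In one dimension the optimum is at a bound or where two cuts cross."""
    def model(x):
        return c * x + max(a + g * x for a, g in cuts)

    candidates = [0.0, X_MAX]
    for i, (a1, g1) in enumerate(cuts):
        for a2, g2 in cuts[i + 1:]:
            if g1 != g2:
                x = (a2 - a1) / (g1 - g2)
                if 0.0 <= x <= X_MAX:
                    candidates.append(x)
    x_best = min(candidates, key=model)
    return x_best, model(x_best)


def l_shaped(x_start=0.0, tol=1e-6, max_iter=500):
    x_k, cuts = x_start, []
    best_x, upper, lower = x_start, float("inf"), float("-inf")
    for _ in range(max_iter):
        Q_k, g_k = expected_recourse(x_k)      # solve the subproblems at x_k
        if c * x_k + Q_k < upper:
            best_x, upper = x_k, c * x_k + Q_k
        cuts.append((Q_k - g_k * x_k, g_k))    # theta >= Q(x_k) + g_k * (x - x_k)
        x_k, lower = solve_master(cuts)        # re-solve the master for the next iterate
        if upper - lower <= tol * (1.0 + abs(upper)):
            break
    return best_x, upper, lower


x_opt, upper_bound, lower_bound = l_shaped()
print(f"order {x_opt:.2f}, cost {upper_bound:.2f}, gap {upper_bound - lower_bound:.2e}")
```

The master value is a lower bound and the best evaluated point gives an upper bound, so the gap between them is a natural stopping test. Splitting the scenarios into clusters with one $\theta_j$ each would give the multi-cut variant described earlier.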
Warning!
If Q(x) is not convex, then this algorithm doesn't work
If you have integer recourse variables $y \in Z^p \times R^{n-p}$, the problem becomes significantly more difficult.
Your Options
Give your favorite solver the full extensive form (and pray)
Weak relaxation
Decomposition method: Carøe and Schultz, Sen
Spatial branch and bound: Talk to Nick!
Want to know about stochastic integer programming/spatial branch and bound?
$A \in R^{985{,}032{,}889 \times 12{,}590{,}000{,}121}$
TA-DA!!!!!
Wall clock time
CPU time
Avg. # machines
Max # machines
Parallel Efficiency
Master iterations
CPU Time solving the master problem
Maximum number of rows in master problem
Number of Workers
[Figure: number of workers (#workers, 100 to 600) over the course of the run; horizontal axis from 80000 to 140000]
Sampling
How do we solve a problem that has more variables and more constraints than the number of subatomic particles in the universe?
The Story
Solving two-stage SP exactly is often impossible
Solving two-stage SP approximately is often easy: Sample Average Approximation (SAA)
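Take a sample $(\omega_1, \ldots, \omega_N)$ of N realizations of the vector $\omega$, and form the sample average function
$$f_N(x) \stackrel{\text{def}}{=} N^{-1} \sum_{j=1}^{N} F(x, \omega_j)$$
For Stochastic LP w/recourse, evaluating $f_N(x)$ means solving one LP for each of the N scenarios
Then solve the sampled approximating problem
$$v_N = \min_{x \in X} N^{-1} \sum_{j=1}^{N} F(x, \omega_j)$$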
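As a small illustration of the sample average function, the sketch below evaluates $f_N(x)$ at a fixed x for growing N. It reuses the newsvendor as a stand-in for F, written as a cost (negative profit), with c, q, r read as cost, price and salvage value and demand drawn from Normal(100, 50) clipped at zero; all of that is assumed for the example. In a stochastic LP each call to `F` would be one recourse LP solve.

```python
import random

c, q, r = 50.0, 70.0, 5.0          # hypothetical newsvendor data


def F(x, demand):
    """Cost (negative profit) of ordering x when the realized demand is `demand`."""
    sold = min(x, demand)
    return c * x - q * sold - r * (x - sold)


def sample_average(F, x, sample):
    """f_N(x): the average of F(x, w_j) over the sampled realizations w_j."""
    return sum(F(x, w) for w in sample) / len(sample)


rng = random.Random(7)


def draw(n):
    """n demand realizations (clipped at zero, an assumed stand-in for truncation)."""
    return [max(0.0, rng.gauss(100.0, 50.0)) for _ in range(n)]


for n in (10, 100, 1000, 10000):
    print(n, round(sample_average(F, 75.0, draw(n)), 1))

f_big = sample_average(F, 75.0, draw(50000))
```

The printed values settle down as N grows, which is the behavior the sampling approach relies on.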
Note that $v_N$ is a random variable, as it depends on the (random) sample of size N
From this information, we can get bounds on the optimal solution value $v^*$
1. Get an upper bound on $v^*$ from $f(\hat{x})$. Estimate $f(\hat{x})$ by solving $N'$ (completely independent) linear programs: recourse LPs with $\hat{x}$ fixed.
$$f_{N'}(\hat{x}) \stackrel{\text{def}}{=} (N')^{-1} \sum_{j=1}^{N'} F(\hat{x}, \omega_j)$$
2. Get a lower bound on $v^*$ from $E(v_N)$, since $E(v_N) \le v^*$. Estimate $E(v_N)$ by solving M independent stochastic LPs, giving optimal values $v_N^1, v_N^2, \ldots, v_N^M$
$$E(v_N) \approx M^{-1} \sum_{j=1}^{M} v_N^j$$
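Both estimates are easy to prototype. The sketch below uses the newsvendor cost as a hypothetical F (c, q, r read as cost, price and salvage value, demand Normal(100, 50) clipped at zero), solves M small sampled problems for the lower bound, and evaluates one candidate $\hat{x}$ on a large independent sample for the upper bound. The sample sizes and the choice of $\hat{x}$ (the solution of the first replication) are arbitrary choices for the example.

```python
import random
import statistics

c, q, r = 50.0, 70.0, 5.0                      # hypothetical newsvendor data


def F(x, d):
    """Cost (negative profit) of ordering x when demand turns out to be d."""
    sold = min(x, d)
    return c * x - q * sold - r * (x - sold)


def draw(rng, n):
    """n demand realizations; clipping at zero is an assumed stand-in for truncation."""
    return [max(0.0, rng.gauss(100.0, 50.0)) for _ in range(n)]


def solve_saa(sample):
    """Minimize the sample average of F; for this F a minimizer sits at a sampled demand."""
    def f_n(x):
        return sum(F(x, d) for d in sample) / len(sample)
    x_n = min(sample, key=f_n)
    return x_n, f_n(x_n)


rng = random.Random(42)
N, M, N_PRIME = 30, 100, 20000

# Lower bound: average the optimal values of M independent sampled problems.
replications = [solve_saa(draw(rng, N)) for _ in range(M)]
values = [v for _, v in replications]
lower = statistics.mean(values)
lower_se = statistics.stdev(values) / M ** 0.5

# Upper bound: evaluate one candidate on a fresh, much larger sample.
x_hat = replications[0][0]
costs = [F(x_hat, d) for d in draw(rng, N_PRIME)]
upper = statistics.mean(costs)
upper_se = statistics.stdev(costs) / N_PRIME ** 0.5

print(f"lower {lower:.1f} +/- {lower_se:.1f}, upper {upper:.1f} +/- {upper_se:.1f}")
```

Both numbers are statistical estimates, so the standard errors matter: the bounds hold in expectation, not for every individual sample.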
More Theory
A very interesting result of Shapiro and Homem-de-Mello says the following:
Suppose that $x^*$ is the unique optimal solution to the true problem
Let $\hat{x}_N$ be the solution to the sampled approximating problem
Under certain conditions, the event $(\hat{x}_N = x^*)$ happens with probability 1 for N large enough.
The probability of this event approaches 1 exponentially fast as $N \to \infty$!! There exists a constant $\beta > 0$ such that
$$\limsup_{N \to \infty} \frac{1}{N} \log\left[1 - P(\hat{x}_N = x^*)\right] \le -\beta$$
This is a qualitative result indicating that it might not be necessary to have a large sample size in order to solve the true problem exactly.
For a problem with $5^{1000}$ scenarios, a sample of size $N \approx 400$ is required in order to find the true optimal solution with probability 95%!!!
20term Convergence
[Figure: lower and upper bounds on the optimal value (roughly 251500 to 255500) versus sample size N (10 to 10000)]
ssn Convergence
[Figure: lower and upper bounds on the optimal value (roughly 2 to 18) versus sample size N (10 to 10000)]
storm Convergence
[Figure: lower and upper bounds on the optimal value (roughly 1.544e+06 to 1.555e+06) versus sample size N (10 to 10000)]
gbd Convergence
[Figure: lower and upper bounds on the optimal value (roughly 1550 to 1800)]
Multistage SPs
Multistage Stochastic LP
$\omega_1 \to x_1 \to \omega_2 \to x_2 \to \omega_3 \to \cdots \to x_{T-1} \to \omega_T \to x_T$
Random vectors $\omega_1 \in R^{n_1}, \omega_2 \in R^{n_2}, \ldots, \omega_T \in R^{n_T}$
Make sequence of decisions $x_1 \in X_1, x_2 \in X_2, \ldots, x_T \in X_T$.
Risk Neutral: We always aim to optimize the expected value of our current decision $x_t$
Linear: Assume $X_t$ are polyhedra
Discrete: Assume $\omega_t$ are drawn from a discrete distribution.
The Hard Part
Decisions made at period t ($x_t$) must only depend on events and decisions up to period t
Create (extra) variables for all possible scenarios, and enforce equality between decisions that should be nonanticipative (Progressive Hedging)
Scenario Tree
N: Set of nodes in the tree
$\rho(n)$: Unique predecessor of node n in the tree
S(n): Set of successor nodes of n
$q_n$: Probability that the sequence of events leading to node n occurs
$x_n$: Decision taken at node n
Warning! Scenario Trees can get big
There are some tools that try to prune the tree while keeping similar statistical properties in the stochastic process
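A scenario tree maps directly onto a small data structure. The sketch below stores the predecessor $\rho(n)$ and successors S(n) of each node and computes $q_n$ as the product of conditional probabilities along the path from the root. The class and function names, the two-way branching and the probabilities are all made up for illustration.

```python
class Node:
    """One node n of a scenario tree."""

    def __init__(self, name, cond_prob=1.0, parent=None):
        self.name = name
        self.cond_prob = cond_prob   # probability of reaching n from its predecessor
        self.parent = parent         # rho(n); None for the root
        self.children = []           # S(n)

    def add_child(self, name, cond_prob):
        child = Node(name, cond_prob, parent=self)
        self.children.append(child)
        return child

    def path_probability(self):
        """q_n: product of conditional probabilities along the path from the root."""
        prob, node = 1.0, self
        while node is not None:
            prob *= node.cond_prob
            node = node.parent
        return prob


def leaves(node):
    """Leaf nodes below `node`; each leaf corresponds to one scenario."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.children)


def full_tree_size(branching, stages):
    """Number of nodes in a tree where every node has `branching` successors."""
    return sum(branching ** t for t in range(stages))


root = Node("root")
for label, prob in (("up", 0.6), ("down", 0.4)):
    stage2 = root.add_child(label, prob)
    for label3, prob3 in (("up", 0.7), ("down", 0.3)):
        stage2.add_child(f"{label}-{label3}", prob3)

scenario_probs = {leaf.name: leaf.path_probability() for leaf in leaves(root)}
print(scenario_probs)
print(count_nodes(root), full_tree_size(10, 5))
```

`full_tree_size` makes the warning concrete: the node count grows geometrically with the number of stages, so even modest branching gets large quickly.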
$$\min \sum_{n \in N} q_n c_n^T x_n \quad \text{s.t.} \quad T_n x_{\rho(n)} + W_n x_n = h_n \quad \forall n \in N$$
Algorithm
Nested Decomposition
0: Root node of the scenario tree
$x_0$: Initial state of the system
Recursive Formulation: $z_{SP} = Q_0(x_0)$
Cost to go: $G_n(x)$, approximated from below at iteration k by $M_n^k(x)$
$(MLP_n)$: the master LP at node n, with constraints $W_n x_n = h_n - T_n x_{\rho(n)}$
Building $M_n^k(x)$
Create a partition (or clustering $C_n$) of S(n)
A lower bound $m_{n[j]}^k$ for each element of the partition (each cluster) is created independently
$$M_n^k(x) \stackrel{\text{def}}{=} \sum_{j \in C_n} m_{n[j]}^k(x)$$
$$m_{n[j]}^k(x) \stackrel{\text{def}}{=} \inf \{ \theta_j : \theta_j e \ge F_{n[j]}^k x + f_{n[j]}^k \}$$
$F_{n[j]}^k$, $f_{n[j]}^k$ obtained from dual solutions (to form subgradients) of linear programs of nodes within cluster [j]
$M_n^k(x) \le G_n(x)$
Action Pictures
[Figure: nested decomposition iterations on the scenario tree; legible labels: $x_0$, $x_0^1$]
small
[Figure: network with nodes A, B, C, D, E, F]
Set of stages T
Set J of links
Sets $I_t$ of demands
Random demand $d_t(\omega) \in R^{|I_t|}$
Budget each period
Install capacity on links each period to minimize the total expected unserved demand
Some (Limited) Computational Results
K: 30, 50, 60
Computational Results
It: Number of iterations (times $MLP_0$ was solved)
E: Parallel efficiency = (Time machines solving $MLP_n$) / (Time machines available)

K    It   Avg Workers   Wall Time     CPU Time       E
30   9    62            2:34:21       6:15:15:10     67
50   7    75            1:12:49:27    85:20:24:15    77
60   11   162           3:16:51:00    431:12:15:37   73
Modeling Tools
Name: AIMMS; Gams; MPL; XPRESS-SP; SPiNE; STRUMS; SUTIL; SLPLib; COIN-Smi, SP/OSL
Author(s): AIMMS Team; Gams Team; Kristjensen; Verma, Dash Opt.; Valente, CARISMA; Fourer and Lopes; Czyzyk and Linderoth; Felt, Sarich, Ariyawansa; COIN, IBM
Comment: Commercial Commercial Commercial Commercial, Beta Prototype(?) C++ classes Open Source C Routines C++ methods
Most stochastic programming implementations of which I'm aware merely form and solve the extensive form
Other software:

Author(s)           Comment
AIMMS Team          Commercial, L-shaped method
Kall, Mayer         L-shaped, Stochastic Decomposition, others
Gassmann            Nested L-shaped
Valente, CARISMA    Commercial, L-shaped method, may not exist anymore
Altenstedt          Nested L-shaped method, Open source
Linderoth, Wright   Designed to run in parallel. Not simple to build and run
Conclusions
Stochastic Programming: A tool for decision making under uncertainty
Considers the impact of recourse decisions
It may not be the answer, but it does help you hedge against upcoming uncertainty
More importantly, it gets people talking about the impact of uncertainty in the decision making process
Planning with mean-value estimates will not lead to an optimal policy
Used with some success in industry
Financial Services (Many successes)
Logistics and Supply Chain (Fewer successes, but coming!)
We Want YOU!
To consider using Stochastic Programming as a decision support tool to help manage in turbulent times!
linderoth@wisc.edu
"It is a good thing for the uneducated person to read books of quotations" Winston Churchill