
“Mining the Matrix” – Applying Data Mining Concepts to Large Scale MIPs
J.L. Higle
May, 2004
Groningen WSIP
Outline
• “VanTran” paratransit driver schedules
(a problem formulation)
• Problem Characteristics
(or why I gave up on that idea…)
• Observations based on SSD computations
(relationship to some data mining techniques)
• An initial foray …
• Computational results so far
• What’s next…
VanTran – Paratransit service provider in Tucson

• ± 80 vans, each holding up to 3 wheelchairs and 10 ambulatory riders
• Reservation-based ride service
• Changes in federal regulation impose stricter service requirements
• Driver schedules (“routes”) are set well in advance of the day of service, and remain constant over lengthy periods of time.
• Demand changes each day, so vehicle movements vary.

Question: Given that service requirements are changing, how should the driver schedules be adapted to accommodate this change?

My response … “This sounds like a recourse problem …
first stage ~ “schedule drivers”
second stage ~ “assign customers to drivers”

One formulation
(or something like that…):
Variables:
$s_{rt} = 1$ if driver $r$ “starts” in period $t$; $0$ otherwise
$e_{rt} = 1$ if driver $r$ “ends” in period $t$; $0$ otherwise
$D_r^s = 1$ if $r$ drives $< 2$ hrs; $0$ otherwise
$D_r^l = 1$ if $r$ drives $> 10$ hrs; $0$ otherwise
$x_{ir} = 1$ if customer $i$ is assigned to $r$; $0$ otherwise
$u_i = 1$ if customer $i$ is not assigned; $0$ otherwise
$z_{ij} = 1$ if $i$ and $j$ are assigned together; $0$ otherwise

$$\min \; \sum_{r,t} t\,(e_{rt} - s_{rt}) \;+\; \sum_r \left(p_s D_r^s + p_l D_r^l\right) \;+\; \sum_i M u_i \;+\; \sum_{i,j} d(i,j)\, z_{ij}$$

s.t.

$\sum_t t\,(e_{rt} - s_{rt}) + M D_r^s \ge t_s \quad \forall r$   (driver’s schedule is too short)

$\sum_t t\,(e_{rt} - s_{rt}) - M D_r^l \le t_l \quad \forall r$   (driver’s on the road too long)

$s_{rt} + e_{rt} \le 1 \quad \forall r, t$   (don’t start and end at the same time*)

$\sum_r \sum_{q \le t} (s_{rq} - e_{rq}) \ge 1 \quad \forall t$   (a driver on the road at all times)

$\sum_r x_{ir} + u_i = 1 \quad \forall i$   (account for each customer)

$z_{ij} \ge x_{ir} + x_{jr} - 1 \quad \forall i, j, r$   ($i, j$ assigned to the same driver?)

$\sum_{i \in I(t)} n_i x_{ir} \le c \quad \forall r, t$   (seat capacity)

$\sum_t t\, s_{rt} - M(1 - x_{ir}) \le t_i^{\text{pick}} \quad \forall i, r$   (don’t assign $i$ to $r$ if $r$ starts too late)

$\sum_t t\, e_{rt} + M(1 - x_{ir}) \ge t_i^{\text{drop}} \quad \forall i, r$   (don’t assign $i$ to $r$ if $r$ ends too early)
Plus a few extra “tie-breaking” constraints to tighten the relaxation
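To make the structure concrete, here is a minimal, runnable sketch of a small slice of this formulation, written with the open-source PuLP modeler (my substitution; the talk used CPLEX directly). All data, the big-M value, and the start-once/end-once constraints are hypothetical simplifications; the duration penalties, pairing variables, capacity, and time-window constraints are omitted.

```python
# A hedged sketch of a slice of the formulation above, using PuLP (an
# assumption; the talk used CPLEX directly). All data and the
# start-once/end-once constraints are hypothetical simplifications.
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum

R, T, I = range(3), range(8), range(5)   # drivers, periods, customers
M = 100                                  # big-M penalty for unassigned riders

prob = LpProblem("VanTran_slice", LpMinimize)
s = LpVariable.dicts("s", (R, T), cat=LpBinary)  # driver r "starts" in period t
e = LpVariable.dicts("e", (R, T), cat=LpBinary)  # driver r "ends" in period t
x = LpVariable.dicts("x", (I, R), cat=LpBinary)  # customer i assigned to r
u = LpVariable.dicts("u", I, cat=LpBinary)       # customer i left unassigned

# minimize total time on the road plus penalties for unassigned customers
prob += lpSum(t * (e[r][t] - s[r][t]) for r in R for t in T) \
        + lpSum(M * u[i] for i in I)

for r in R:
    prob += lpSum(s[r][t] for t in T) == 1   # simplification: start exactly once
    prob += lpSum(e[r][t] for t in T) == 1   # simplification: end exactly once
    # end no earlier than start (the real model's tie constraints do this work)
    prob += lpSum(t * e[r][t] for t in T) >= lpSum(t * s[r][t] for t in T)
for i in I:                                  # account for each customer
    prob += lpSum(x[i][r] for r in R) + u[i] == 1
for t in T:                                  # a driver on the road at all times
    # toy variant: a driver still counts as on the road in its end period
    prob += (lpSum(s[r][q] for r in R for q in T if q <= t)
             - lpSum(e[r][q] for r in R for q in T if q < t)) >= 1

prob.solve()
```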


Problem Characteristics (or why I gave up on that idea…)

• Hmm… how about 4 drivers, 20 customers, and only one demand scenario…
• This results in a MIP – 605 binaries, 210 continuous, 1546 constraints…
• I let CPLEX work on it all night … after 5,000,000 B&B nodes I managed to get the gap down to 7.7% (ouch)

• VanTran’s got 80 vehicles, 60± drivers, thousands of passengers, random daily demand…

I abandoned that approach …

This is a large (binary) MIP with lots of rows and lots of columns, and it’s probably going to be solved lots of times…

SSD: Stochastic Scenario Decomposition
(Higle, Rayco, Sen ’01)

• Solution method for multistage SLPs
• Uses scenario decomposition + piecewise linear concave approximation of the dual objective function
• Uses randomly generated observations for
successive improvement of approximation
– adaptive sampling (aka “internal”) as opposed to
nonadaptive sampling (aka “external”)

SSD: Salient Features
• New observations – increase master program column dimension
• New cuts – increase master program row dimension
• Column dimension growth is the more problematic of the two…
To solve the master program we introduced some column and row
aggregation to reduce the size as follows:
• aggregate most of the cuts (except for two … “new”, “incumbent”)
• represent each column with 4-tuple
– current value
– coefficient in “new” cut
– coefficient in “incumbent” cut
– coefficient in “aggregated” cut
• columns with similar 4-tuples are aggregated
Note: aggregation ignores scenario and stage associated with
column … looks at the “data” and considers only similarities in
the data.
Note also: it worked surprisingly well on a variety of problems.
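As a concrete illustration, here is a minimal sketch of the 4-tuple aggregation idea on hypothetical data; the bucketing rule (rounding to a tolerance) is my assumption, since the slides do not specify how “similar” is measured.

```python
# A sketch of 4-tuple column aggregation on hypothetical data. Each column is
# summarized as (current value, coeff. in "new" cut, coeff. in "incumbent"
# cut, coeff. in aggregated cut); the rounding-based similarity rule is an
# assumption, not necessarily the rule used in SSD.
import numpy as np

def aggregate_columns(tuples, tol=1e-3):
    """Group column ids whose 4-tuples agree to within tol."""
    buckets = {}
    for col_id, t in enumerate(tuples):
        key = tuple(np.round(np.asarray(t) / tol).astype(int))
        buckets.setdefault(key, []).append(col_id)
    return list(buckets.values())

# three columns; the first two have nearly identical 4-tuples
cols = [(0.5, 1.0, 0.0, 2.0), (0.5, 1.0, 0.0, 2.0004), (3.0, 0.0, 1.0, 0.0)]
print(aggregate_columns(cols))   # -> [[0, 1], [2]]
```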
The aggregation scheme used by SSD can be viewed as a form
of “data mining” of the master program matrix.
“Data Mining” is a catch-all phrase – it’s a collection of
techniques used to draw some meaning from a large data set
&/or reduce its storage requirements.
What’s the connection…
Large problems → big matrices → lots of “data”
“VanTran” takes a long time to solve because it’s hard to
choose from “similar” solutions, and lots of “ties” have to be
broken
Perhaps columns can be “clustered” so that tie-breaking can be
postponed until later...
Perhaps the “information” contained in the constraints can be
represented in a compressed form…

Variable Clustering
$x_b, x_c$ are the binary and continuous variables.

$A_b x_b + A_c x_c \ge r$ are the constraints … i.e., $A_b = [a_1, \ldots, a_{n_b}]$ where $a_i \in \mathbb{R}^m$.

For each pair of columns $i$ and $j$, calculate $d_{ij} = D(a_i, a_j)$ for some “distance” $D$.

If $\{a_i\}_{i \in I}$ are all “close” to each other, replace the columns by a “clustered” variable $X_I$ (general integer) so that the constraints become:

$$\sum_I a_I X_I + A_c x_c \ge r$$

where $a_I = \frac{1}{|I|} \sum_{i \in I} a_i$ and $X_I \le u_I = |I|$ (obj. coeffs. similarly defined).

The resulting MIP has fewer variables, and is likely to spend less time
trying to break ties.
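A minimal sketch of this clustering step on a hypothetical 0/1 matrix, using scipy’s hierarchical clustering (my substitution for the Matlab routines mentioned later in the talk):

```python
# A sketch of the column-clustering step on hypothetical data, using scipy
# in place of the Matlab routines mentioned later in the talk.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

Ab = np.random.default_rng(0).integers(0, 2, size=(30, 12)).astype(float)

d = pdist(Ab.T, metric="cityblock")              # distances between columns
Z = linkage(d, method="average")                 # hierarchical linkage tree
labels = fcluster(Z, t=4, criterion="maxclust")  # cut into at most 4 clusters

clustered_cols, upper_bounds = [], []
for c in np.unique(labels):
    members = np.where(labels == c)[0]
    clustered_cols.append(Ab[:, members].mean(axis=1))  # a_I = (1/|I|) sum a_i
    upper_bounds.append(len(members))                   # X_I <= u_I = |I|

A_clustered = np.column_stack(clustered_cols)
print(A_clustered.shape, upper_bounds)
```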
A solution to the clustered MIP can be converted to a solution to the original MIP (“parsed”) as follows:

$\hat{X}_I = 0 \;\Rightarrow\; x_i = 0 \;\; \forall i \in I$

$\hat{X}_I = u_I \;\Rightarrow\; x_i = 1 \;\; \forall i \in I$

If $0 < \hat{X}_I < u_I$, the “cluster” must be undone, which can be accomplished with a simple MIP … during which $x_c$ can be assigned as well.
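A minimal sketch of these parsing rules on hypothetical data; the fractional case is only flagged here, since resolving it requires the follow-up MIP just described.

```python
# A sketch of "parsing" a clustered solution back to the original binaries.
# Hypothetical data; fractional clusters are only flagged, since undoing them
# requires the small follow-up MIP described above.
def parse_solution(X_hat, clusters):
    """clusters[k] lists the original column ids merged into cluster k."""
    x, undecided = {}, []
    for k, members in enumerate(clusters):
        if X_hat[k] == 0:                    # X_I = 0   =>  x_i = 0 for i in I
            x.update({i: 0 for i in members})
        elif X_hat[k] == len(members):       # X_I = u_I =>  x_i = 1 for i in I
            x.update({i: 1 for i in members})
        else:                                # 0 < X_I < u_I: undo the cluster
            undecided.append(k)
    return x, undecided

x, undecided = parse_solution([0, 3, 1], [[0, 1], [2, 3, 4], [5, 6]])
print(x, undecided)   # cluster 2 is fractional, left for the follow-up MIP
```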

Two questions—
How should “distance” be defined?
How should variables be “linked” to form clusters?
• Possible “distance” definitions…
  • cityblock: $d_{ij} = \sum_k |a_{ki} - a_{kj}|$
  • euclidean: $d_{ij} = \sqrt{\sum_k (a_{ki} - a_{kj})^2}$

• Less standard:
  • indicator: $d_{ij} = \sum_k \left[1 - \delta(|a_{ki} \cdot a_{kj}|)\right]$, where $\delta(u) = 1$ if $u > 0$; $0$ otherwise
  • correlation: $d_{ij} = 1 - \rho(a_i, a_j)$
  • hamming: $d_{ij} = \frac{1}{n} \sum_k \delta(|a_{ki} - a_{kj}|)$ (percent of coordinates that differ)
  • jaccard: $d_{ij} = \frac{1}{n_{\text{nonzeroes}}} \sum_k \delta(|a_{ki} - a_{kj}|)$
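The less standard definitions are easy to implement directly; a minimal sketch for two column vectors (my own implementations, following the formulas above):

```python
# Sketches of the less-standard distances above, for two column vectors;
# delta(u) = 1 if u > 0, else 0. My own implementations of the formulas.
import numpy as np

def indicator(ai, aj):
    """Count coordinates where the columns do NOT share a nonzero entry."""
    return np.sum(1 - (np.abs(ai * aj) > 0))

def hamming(ai, aj):
    """Fraction of coordinates that differ."""
    return np.mean(np.abs(ai - aj) > 0)

def jaccard(ai, aj):
    """Differing coordinates, relative to coordinates where either is nonzero."""
    nz = (ai != 0) | (aj != 0)
    return np.sum((ai != aj) & nz) / np.sum(nz)

a1, a2 = np.array([1.0, 0.0, 2.0, 0.0]), np.array([1.0, 0.0, 0.0, 3.0])
print(indicator(a1, a2), hamming(a1, a2), jaccard(a1, a2))  # 3, 0.5, 0.666...
```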
Possible “linking” definitions:
“linking” creates a hierarchical tree that indicates the order in
which clusters are aggregated. Some possible methods for
aggregating two sub-clusters:

• single: min. smallest distance between elements of both sets
• complete: min. largest distance between elements of both sets
• average: min. average distance between objects in the two sets
• ward: minimize inner squared distance

JH Confession: I didn’t want to code this, so I just used Matlab … these are some of the standard linkages that Matlab provides (a scipy analogue is sketched below).
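```python
# The Matlab pdist/linkage/cluster workflow has a direct scipy analogue;
# a sketch on a hypothetical 0/1 matrix whose columns are the objects.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

A = np.random.default_rng(1).integers(0, 2, size=(20, 10)).astype(float)

d = pdist(A.T, metric="hamming")            # distance between columns
for method in ("single", "complete", "average", "ward"):
    # note: ward formally assumes euclidean distances; scipy cannot check
    # this when given a precomputed condensed distance matrix
    Z = linkage(d, method=method)
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(method, labels)
```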

My initial foray…

MIP Solver: CPLEX 8.1 (BTTOL 0.1, EMPHASIS feasibility)

6 distances × 4 linkages × 3 cluster sizes = 72 runs

VanTran: 605 binary, 210 continuous variables … 1546 constraints

Best known objective value = 605
CPLEX required 1,490,789 nodes to identify it
Gap after 5,000,000 nodes (37,481,441 iterations) is 7.7%

Some especially good combinations:

Dist         Link      ClusterSize   ObjVal    Nodes      Iters
Jaccard      Ward      300           607        1,240      8,881
Indicator    Average   300           609          940      8,543
CityBlock    Average   0.50          616.5     10,000     61,359
Correlation  Complete  0.50          616.5     10,000     65,473
Correlation  Single    0.50          616.5     10,000    108,396
Euclidean    Average   300           617        5,264     62,120
Euclidean    Single    0.50          618.5     10,000     73,302
Correlation  Average   0.50          622.5     10,000     66,865
CityBlock    Single    300           626          359      5,814
CityBlock    Single    0.75          626          359      5,814

Not quite so good…

Euclidean    Single    300           815          350      4,230
Euclidean    Average   50            902        4,282     50,514

The rest…

average                              1951.3    217.4      1,617.8
(std dev)                            (260.5)   (1,283.4)  (6,708.2)

(Note: 28 runs did not return a feasible solution.)
(Note: node limit = 10,000; relative gap tolerance = 0.03.)

Some significant correlations:

measure                  measure                correlation
Parsed objective value   # of clusters          -0.369
Parsed objective value   MIP solution node id   -0.604
Parsed objective value   MIP iteration count    -0.706
MIP solution node id     # of clusters           0.523
[Figure: interaction plot (data means) for the parsed objective value, crossing distance (Abs, Cor, Euc, Ham, Ind, Jac) with linkage (Average, Complete, Single, Ward) and cluster size; a cutoff level is marked.]
What’s next…
• An obvious next step
– use the solution as an initial solution for the original MIP.
– slight problem: the lower bounds are still weak, so it doesn’t help much

• Obvious questions –
can we “mine” for improved lower bounds?
(maybe, but not ready for prime time …)

• Does this generalize beyond VanTran?

Experimentation with some MIPLIB problems:

Problems and Characteristics

Problem    Binaries  Continuous  Row Types            Notes
10teams    1800      225         110 G, 40 P, 80 S    correlations n/a
danoint    56        465         256 G, 392 U
egout      55        86          43 G, 55 U
fiber      1254      44          44 G, 378 U
khb05250   24        1326        77 G, 24 U
misc06     112       1696        820 G
mod011     96        10862       4400 G, 16 S, 64 U
rentacar   55        9502        6674 G, 55 U         correlations n/a
rgn        100       80          20 G, 4 P

G: General   P: Packing   S: Special Ordered Set   U, L: Upper, Lower Bound

Problem     Combinations   # within 1.5% of      # within 0.01% of     # w/o initial
                           best known solution   best known solution   solution
10teams     60             60
danoint     72             12                    12
egout       72             72                    54
fiber       72             ***                   69
khb05250    72             68                    49
misc06      72             72                    0
mod011      72             72                    72
rentacar    60             60
rgn         72             49                    49
*** when “parsed” solution was used to initialize B&B, optimal solution
identified within 285 nodes, 4300 iterations

Combination Summary: # within 1.5% of best known solution

              Average   Complete   Single   Ward
CityBlock     15        15         15       15
Correlation   13        15         13       13
Euclidean     17        15         15       15
Hamming       15        14         13       15
Indicator     15        15         13       16
Jaccard       14        14         13       13

Each distance/linkage combination has 3 cluster sizes × 9 problems = 27 combinations.
(correlation: 7 problems, so 21 combinations)

Conclusions?

• There’s still a lot of work (data analysis / interpretation) to do
• It appears that there may be some problem classes where this type of approach is beneficial
• The “parsed” solution isn’t necessarily feasible, but a complete-recourse type of formulation should eliminate that problem
• Lower bounds … might be achievable through a row aggregation scheme
