
http://www.analytics-magazine.org
ALSO INSIDE:
NOVEMBER/DECEMBER 2012 DRIVING BETTER BUSINESS DECISIONS
Image analytics: next really big data thing
Distribution processing: math of uncertainty
Consulting & communication: achieving buy-in
DRILLING
with big data
Digital oil field helps oil & gas industry
produce cost-effective energy while addressing
environmental concerns.
Executive Edge
Macys.com VP
Kerem Tomak on
overcoming big
data, analytics
challenges
The AIMMS PRO Platform
A New Revolution In Optimization
Ever thought about

... creating an optimization App Store
for business users?
... providing web access to your
optimization solutions?
...deploying in the cloud?
No? We have!
Check out our AIMMS PRO Platform at:
www.aimms.com/aimms-pro-platform
Norman Jerome, Aromatics Business Optimization Advisor - BP, plc.
"In the first three months since migrating to the AIMMS PRO platform,
we have seen our user base expand five-fold for one of our core
scheduling applications. And with so many new eyes reviewing this
application, the data are more reliable, and we are producing better
schedules than ever before."
Bellevue, WA, USA Haarlem, the Netherlands Singapore Shanghai, China
www.aimms.com info@aimms.com
AIMMS is a registered trademark of Paragon Decision Technology B.V. Other brands and their products are trademarks of their respective holders
www.aimms.com/aimms-pro-bp
Big Data's Big Daddy
INSIDE STORY
Can big data get any bigger?
The question reminds me of the
old joke about the bear in the woods.
Of course big data is going to get
bigger. Today, and with apologies to
Sting, every breath you take, every
move you make, every bond you
break, every step you take seem-
ingly produces data. Multiply the
moves you make and the claims you
stake by the billions of other peo-
ple around the world and suddenly
you're talking really big data.
You might say Big Data has
many Big Daddies, all of whom are
prolific.
Wasn't it just a nanosecond ago
(in relative terms) that analysts every-
where were whining that if they only
had more data, they could solve all
manner of complex operational prob-
lems that were heretofore intractable?
Now those same analysts are drown-
ing in data and struggling to keep their
heads above the data deluge.
It turns out that irony really is
a dish best served with cold, hard
facts. Just go easy on the side order.
Today, the analytics community
is basically scratching the surface in
terms of turning the deluge of data into
meaningful decision-making insight
on a widespread, corporate-world
scale. The sheer volume of available
data is imposing enough, but then
the data has to be properly mined,
cleaned, analyzed and presented to
decision-makers or it's going right
back on the scrap heap along with all
the other promising ideas that never
garnered C-level buy-in.
That, in a nutshell, is the theme
of several articles in this issue of
Analytics magazine. For example,
in his cover story on the potential
of big data analytics and the digital
oil field to revolutionize the oil and
gas industry, Adam Farris notes
that breaking into the oil and gas
industry is difficult for analysts be-
cause data scientists and petroleum
engineers not only don't speak the
same language, they don't appear
to be from the same planet. Yet the
potential for big data analytics to im-
prove energy production and safety
while protecting the environment is
enormous.
Go figure.
PETER HORNER, EDITOR
peter.horner@mail.informs.org
Discover the easiest, fastest, lowest cost and risk on
ramp to full-power data mining and predictive
analytics: Analyze your data in Microsoft Excel with
XLMiner software, now from Frontline Systems.
You won't need a statistics PhD, months of learning
time, or a huge budget for enterprise data mining
software. But you will have the power of predictive
analytics and data visualization at your fingertips.
Sophisticated Data Mining Power.
XLMiner goes far beyond other statistics and forecasting
add-ins for Excel. It starts with multiple regression,
exponential smoothing, and ARIMA models, but goes
further with regression trees, k-nearest neighbors, and
neural networks for prediction. It offers discriminant
analysis, logistic regression, k-nearest neighbors,
classification trees, naïve Bayes and neural nets for
classification, and association rules for affinity analysis.
Practical Data Cleaning and Exploration.
XLMiner includes utilities for data sampling and
partitioning, missing data handling, binning, and
transforming categorical data. You can apply principal
component analysis, k-means clustering, and
hierarchical clustering to simplify and cluster your data.
Help, Textbook and Instructor Support.
XLMiner's extensive online Help introduces the data
mining methods and explains the options in each dialog.
With the popular textbook Data Mining for Business
Intelligence built around XLMiner software you can
go from beginner to expert business analyst in a high-
demand field. Instructors can take advantage of low-
cost classroom licenses, PowerPoint slides, example
files, solutions to exercises, an online forum, and more.
Download Your Free Trial Now.
Visit www.solver.com/xlminer to learn more, register
and download a free trial or email or call us today.
Tel 775 831 0300 Fax 775 831 0314 info@solver.com
XLMINER: VISUAL DATA MINING
Works with PowerPivot to Access 100 Million Rows in Excel
DRIVING BETTER BUSINESS DECISIONS
CONTENTS
FEATURES
IMAGES & VIDEOS: SOME REALLY BIG DATA
By Fritz Venter and Andrew Stein
Sizing up the potential impact of prescriptive analytics driven by
proliferation of images and video.
HOW BIG DATA IS CHANGING OIL & GAS INDUSTRY
By Adam Farris
Advent of the digital oil field helps produce cost-effective energy
while addressing safety and environmental concerns.
DISTRIBUTION PROCESSING ADDRESSES UNCERTAINTY
By Sam L. Savage
Non-profit organization promotes standards for making rational,
auditable calculations based on probability distributions.
SOFT SKILLS: ART OF EFFECTIVE COMMUNICATION
By Gary Cokins
How to achieve corporate buy-in during the Twitter-influenced,
short-attention-span era.
SUCCESSFULLY OPERATIONALIZING ANALYTICS
By James Taylor
A repeatable, efficient process for creating and effectively
deploying predictive analytic models into production.
SPORTS ANALYTICS: BASKETBALL GENOMICS
By William Cade
Evaluation of performance: evolution of the official box score
reveals true on-court player values.
NOVEMBER/DECEMBER 2012
REGISTER FOR A FREE SUBSCRIPTION:
http://analytics.informs.org
INFORMS BOARD OF DIRECTORS
President Terry P. Harrison, Penn State University
President-Elect Anne G. Robinson, Verizon Wireless
Past President Rina R. Schneur,
Verizon Network & Technology
Secretary Brian Denton,
University of Michigan
Treasurer Nicholas G. Hall, Ohio State University
Vice President-Meetings William "Bill" Klimack, Chevron
Vice President-Publications Linda Argote, Carnegie Mellon University
Vice President-
Sections and Societies Barrett Thomas, University of Iowa
Vice President-
Information Technology Bjarni Kristjansson, Maximal Software
Vice President-Practice Activities Jack Levis, UPS
Vice President-International Activities Jionghua "Judy" Jin, Univ. of Michigan
Vice President-Membership
and Professional Recognition Ozlem Ergun, Georgia Tech
Vice President-Education Joel Sokol, Georgia Tech
Vice President-Marketing,
Communications and Outreach E. Andrew "Andy" Boyd,
University of Houston
Vice President-Chapters/Fora Olga Raskina, Con-way Freight
INFORMS OFFICES
www.informs.org Tel: 1-800-4INFORMS

Executive Director Melissa Moore
Meetings Director Teresa V. Cryan
Marketing Director Gary Bennett
Communications Director Barry List

Headquarters INFORMS (Maryland)
7240 Parkway Drive, Suite 300
Hanover, MD 21076 USA
Tel.: 443.757.3500
E-mail: informs@informs.org
ANALYTICS EDITORIAL AND ADVERTISING
Lionheart Publishing Inc., 506 Roswell Street, Suite 220, Marietta, GA 30060 USA
Tel.: 770.431.0867 Fax: 770.432.6969
President & Advertising Sales John Llewellyn
john.llewellyn@mail.informs.org
Tel.: 770.431.0867, ext.209
Editor Peter R. Horner
peter.horner@mail.informs.org
Tel.: 770.587.3172
Art Director Lindsay Sport
lindsay.sport@mail.informs.org
Tel.: 770.431.0867, ext.223
Advertising Sales Sharon Baker
sharon.baker@mail.informs.org
Tel.: 813.852.9942
Analytics (ISSN 1938-1697) is published six times a year by
the Institute for Operations Research and the Management
Sciences (INFORMS). For a free subscription, register at
http://analytics.informs.org. Address other correspondence to
the editor, Peter Horner, peter.horner@mail.informs.org. The
opinions expressed in Analytics are those of the authors, and
do not necessarily reflect the opinions of INFORMS, its officers,
Lionheart Publishing Inc. or the editorial staff of Analytics.
Analytics copyright © 2012 by the Institute for Operations
Research and the Management Sciences. All rights reserved.
DEPARTMENTS
Inside Story
Executive Edge
Profit Center
Analyze This!
Forum
Conference Preview
Five-Minute Analyst
Thinking Analytically
The Excel Solver's Big Brother Makes Optimization
Easier, More Powerful and Far More Visual.
From its classic Solver Parameters dialog to its Ribbon,
Task Pane, charts to visualize the shape of functions,
Constraint Wizard, model diagnosis, Help, and coverage
in leading management science textbooks, Risk Solver
Platform makes it easier to learn and use optimization.
Learn and Use Monte Carlo Simulation, Decision
Analysis, and More with the Same Software.
You can use its powerful optimization features alone,
but Risk Solver Platform is also a full-power tool for
Monte Carlo simulation and decision trees, with a Dis-
tribution Wizard, 50 distributions, 30 statistics and risk
measures, and a wide array of charts and graphs.
Use the Full Spectrum of Optimization Methods.
Fast linear, quadratic and mixed-integer programming is
just the starting point in Risk Solver Platform. SOCP,
nonlinear, non-smooth and global optimization are just
the next step. Easily incorporate uncertainty and solve
with simulation optimization, stochastic programming,
and robust optimization, all at your fingertips.
Go Beyond Single Optimization/Simulation Solutions.
Define parameters in seconds, run high-speed multi-way
parameterized simulations and optimizations, capture
all the solutions, and produce instant reports, charts
and graphs, without programming. And you can do
even more in Excel VBA.
Free Trials and Support for Your Efforts at Solver.com.
Download a free trial of Risk Solver Platform, and learn
about the many other ways we can support your
learning and teaching efforts, all at www.solver.com.
TEACH AND LEARN OPTIMIZATION AND SIMULATION
RISK SOLVER PLATFORM
Tel 775 831 0300 Fax 775 831 0314 info@solver.com
It's been more than a decade since the In-
ternet became a household shopping front.
We shop without leaving the sofa during a
commercial break, thanks to the ease of a tablet
device. Our smartphone tells us how much an
item costs on a competing e-commerce site while
we are shopping in a retail store. If we like a
product, we buy it instantly without waiting in a
checkout line.
One common theme behind all these ac-
tivities: we implicitly or explicitly create data
as we interact with these devices. We trans-
mit data to the cloud where it is stored. This
data (with our permission) then becomes part
of an analytic workflow somewhere and comes
back to us with recommendations and/or of-
fers on what we should buy next, and the circle
of commerce continues.
Twenty years ago, a 30MB hard disk was
so immense that one didn't know what to do
with so much storage space. A gigabyte was
big data for an 8086 processor and the DOS-
based Lotus 1-2-3 worksheets of the day.
The Internet did not exist, so the speed at
which data increased was contingent upon the
speed at which one could receive floppy disks
in the mail, 360KB at a time.
However, we still had the same workflow
that we have today for the analytic ex-
ercise. We sampled, ran descriptive statistics
and visualized the data. Based on our findings,
we came up with a model or series of models
that best fit the data, calibrated the model pa-
rameters based on simulations and completed
version 0 of the analytics deliverable. As
we collected new data, we would revisit the
process and assess whether we needed a
new model or could keep the existing one, making a
few parametric changes here and there. All the
data we had filled a spreadsheet back then.
We could eyeball the data and see patterns
easily.
Similarly, when we sample data today, we
need efficient and fast visualization tools that
allow us to get to the nuggets quickly. Not
only is the data much larger, but the dimen-
sions over which the data is collected are nu-
merous. The belief that since we have more
data we do not need to sample is a flawed
one. A critical assumption behind that thought
is that big data is accurately and comprehen-
sively capturing every known piece of informa-
tion there is to know about everything. Within
the modeling realm there is also the concept
of over-fitting, data quality, etc., which still im-
plies sampling as a step in the analytic pro-
cess. However, a 1 percent sample of 100TB of
data is still large data.
RISING CUSTOMER EXPECTATIONS
As the time spans in which data is creat-
ed are compressed, customer expectations
of companies to provide information about
products and services such as availability, de-
livery and discounts in near real time, if not real
time, increase dramatically. To complicate
things even further, there is a new addition to
the data types that has added a twist to the
story: social media feeds. Semi- or un-struc-
tured data makes parsing, analyzing and in-
terpreting the data even more challenging, as
the data does not come in a traditional columnar
setup. What is the value of a fan's comment
on a business's Facebook page? Who are the
social influencers in a company's network of
fans and how can we use this information to
reach the right audience? How can a com-
pany understand which products are trendy or
what brands are in high demand from tweets?
After pre-processing and massaging the so-
cial data, these and similar questions can be
answered by using statistical tools and experi-
menting with findings to see if any of those are
actionable.
Thanks to the cloud, we do not need to in-
vest a lot of money in hardware and software to
process all this data. Our ability to dissemi-
nate information quickly across different units
is constrained by the slowest link we maintain
in our network. If we are not comfortable with
moving and/or sharing a lot of data, we can
build our own cloud behind firewalls. Sophisti-
cated statistical and visualization software is
affordable as well. It can be only a matter of
Overcoming big data
challenges for analytics
BY KEREM TOMAK
As the time spans in
which data is created
are compressed,
customer expectations
of companies to provide
information about products
and services increase
dramatically.
EXECUTIVE EDGE
days before a company obtains more
than simple analytical capabilities. En-
terprise class operations still require
significant investment, but even these
are relatively cheap.
These affordable technological ca-
pabilities enable the possibility of build-
ing a successful analytics function as if
the unit is a startup company within a
larger organization. This is one of the
many scenarios in which an analytics
team can be established. With buy-
in from senior management already
achieved and seed funding ready, the
main starting point is to hire an expe-
rienced analytics leader and empower
him or her to build the roadmap to es-
tablish a proactive team.
ANALYTICS LEADERSHIP
Analytics leaders need to speak
the language of at least one quantita-
tive field such as mathematics, statis-
tics, operations research or economics.
This is necessary to build credible
leadership vertically and across the or-
ganization. Think of them as interpret-
ers between the quantitative types and
execution teams. An efficient analytics
leader needs to understand the busi-
ness and trends, anticipate the chang-
es in requests for information and plan
ahead to build required capacity to
respond to the changes. Many analyt-
ics projects fail because either the informa-
tion is too overwhelming or the model
is too complex for a non-quantitative
end-user to comprehend and act
upon.
One of the key early steps is to
have a dedicated systems team that
is given the right funding and flex-
ibility to build the analytics systems
and support. Without a clear roadmap
toward scalable and robust systems
and processes, an analytics team is
limited in capabilities. Analytics lead-
ership needs to pass requirements to
the systems team or teams in order
to put the building blocks in place.
This requires comprehensive un-
derstanding of, exposure to and hands-on
experience with data and analytics
systems and tools.
What does this flexibility enable
an analytics team to accomplish?
They can rapidly prototype automat-
ed, data-driven solutions in reporting,
product recommendations, personal-
ized offers and more. Being on the
cutting edge of tools and techniques
enables the right data scientist to
have the freedom to invent. Business
units not only benefit from improved
internal processes that deliver the in-
formation they need much faster, but
The Optimization/Simulation Library that Speaks the
Languages of Desktop and Web Computing.
No other callable library matches the flexibility of
Solver SDK Platform, from solving Excel models on
your server, to solving models created in your browser.
Build Your Model Faster in Excel or Visual Studio.
Use Risk Solver Platform or its subsets to build your
model in Excel. Use Microsoft Visual Studio to create
your model in C++, C# or VB.NET. Use C, R, MATLAB,
Java, PHP or JavaScript; it's your choice.
Use Optimization and Simulation as a Web Service.
Run the Solver Server that comes with Solver SDK on
any PC or server, then solve your models over the
wire using our standard Web Service, SOAP and WSDL
protocols. Get real-time progress, or just disconnect
and retrieve the solution later; it's your choice.
Solve Your Most Challenging Linear Mixed-Integer,
Nonlinear and Stochastic Optimization Models.
Solver SDK's wide choice of built-in and plug-in Solver
Engines handle every type and size of optimization
problem, from Gurobi and Xpress-MP to Mosek and
KNITRO. Use the SDK's built-in simulation optimization
and stochastic programming algorithms, and support
for multi-core CPUs.
Download a Free Trial at www.solver.com.
Learn more and download a FREE trial version of Risk
Solver Platform, Solver SDK Platform, and our eight
plug-in Solver Engines from Solver.com.
THE MOST FLEXIBLE WAY TO DEPLOY YOUR MODEL
SOLVER SDK PLATFORM
Tel 775 831 0300 Fax 775 831 0314 info@solver.com
A membership in INFORMS will help!
visit http://join.informs.org to join online
New in 2013! Certification for Analytics Professionals
Online access to the latest in operations research
and advanced analytics techniques
Subscriptions to online and print INFORMS Publications
Networking Opportunities available at INFORMS Meetings
and Communities
Education programs around the world to enhance your
professional development and growth!
How will you stand out from the crowd?
2013
INFORMS Renewals
available online
Renewing online is quick
and easy. Renew online
to help us stay green.
http://renew.informs.org
they also start to find novel ways to serve
their customers, to improve their product of-
ferings, and to understand where the bottle-
necks are within the organization, and the
list grows.
TESTING AND PRODUCTION OF PROTOTYPES
Finally, the path to testing and production
of working prototypes needs to be smooth and
supported by technology teams across differ-
ent business units. An analytics team needs to
be able to build dashboards and disseminate
the information through centralized systems
for everyone who needs that information to
use. They need to be able to test new algo-
rithms live or by using simulations to see what
needs to be tweaked and/or improved. But
most importantly they need to work hand in
hand with agile technology teams to turn pro-
totypes into products that pass strict SLAs and
requirements to meet the performance criteria
of the production systems.
The road to taming big data passes through
people who are trained to handle the intrica-
cies of data, understand their business, ar-
ticulate what they see and, most importantly,
are enabled to feed their intellectual curiosity
by learning new tools and thinking outside the
box. Aligned with testing and delivery teams,
an analytics team with a keen focus on the
end-goal can be a major driver of a successful
business.
Kerem Tomak (kerem.tomak@macys.com) is vice president of
Marketing Analytics at Macys.com. He is a member of INFORMS.
An analytics team
needs to be able to
build dashboards and
disseminate the information
through centralized
systems for everyone who
needs that information.
Subscribe to Analytics
It's fast, it's easy and it's FREE!
Just visit: http://analytics.informs.org/
How does an organization move from
practicing little or no analytics to becoming
a world leader? The answer isn't simple. But
much can be gleaned from taking a look at
companies and industries that now employ
analytics at the highest levels. And one of
the great success stories is that of the airline
industry.
Prior to 1978 the Civil Aeronautics Board
(CAB) regulated where, when and at what
price every airline could fly. If an airline wanted
to offer a new flight, it had to file the appro-
priate paperwork, then wait for a decision from
the CAB. Prices, which were identical across
carriers, were set by the CAB to reflect the
airlines' reported cost of service. The environ-
ment didn't encourage the industry to operate
efficiently.
That situation changed with the Airline De-
regulation Act of 1978. Airlines were free to es-
tablish their own routes and schedules and to
set prices however they saw fit. It was an era
of tremendous upheaval as airlines sought to
adapt to the competitive environment in order
to survive. Analytics proved to be a corner-
stone of the adaptation process.
Where exactly did analytics fit in? One
area was that of choosing the routes air-
craft would fly. If an airline serves 100 cities
and a typical route involves a plane visiting
three cities per day, there are roughly a mil-
lion different routes a single plane can be
assigned to. Of course, the actual problem
is far more complicated. All of the planes in
the fleet must be routed and scheduled so
that their arrival and departure times are co-
ordinated, thus allowing passengers to make
connections.
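A rough back-of-the-envelope count, under the simple assumption that a route is an ordered visit to three distinct cities out of the 100 served, shows where that figure comes from: 100 × 99 × 98 = 970,200 possible three-city sequences for a single aircraft, which is on the order of a million.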
The problem of finding a single, reason-
able schedule is in itself a difficult task. But
to be competitive, airlines need to find good
schedules: schedules that fill flights with pas-
sengers. In the wake of deregulation, airlines
developed analytical models to predict pas-
senger demand, demand that was in turn fed
into large optimization models to generate the
most profitable schedule.
Routing and scheduling are only part of
the operational problem. Pilots and flight at-
tendants must be assigned to staff flights.
The question for airlines is who to assign to
various flights. Simply finding an assignment
can be difficult since union contracts and
government regulations place restrictions on
what crews are allowed to do. A pilot, for ex-
ample, can't fly for 24 hours without mandat-
ed rest breaks. But among the many potential
crew assignments, some are more cost ef-
fective than others; for example, those that
require fewer crews, reduce overnight stays
in hotels and trim other costs. For large airlines,
crew costs run well into the billions of dollars
annually, and large optimization models are
routinely used to find crew assignments with
the lowest possible cost.
One of the more interesting practices to
spring from the Airline Deregulation Act was
the practice of dynamic pricing. Airlines quick-
ly realized there were two primary classes of
flyers: business passengers, who were rela-
tively price insensitive, and leisure passen-
gers, who cared a lot about price. Airlines
were able to segregate these two groups by
introducing fare restrictions. A $200 ticket
might be available up to three weeks in ad-
vance, after which the price would go up to
$300. Segregation of this type worked be-
cause business travelers frequently booked
only a few days ahead of departure while
leisure travelers were willing to book their
vacations further in advance to obtain lower
prices.
The practice worked well, and once Pan-
dora's Box was open, airlines rushed to take a
look inside. If raising the price three weeks be-
fore departure was successful, why not raise it
again to $600 with one week to go? If a plane
is nearly full four weeks before departure, why
wait another three weeks to raise the price to
$600? Why not do so immediately? Over time
the practice incrementally evolved to a point
where future demand was being forecast by
price point and the interaction between differ-
ent fares on routes using shared flight legs was
Embracing analytics
BY E. ANDREW BOYD
The practice worked well,
and once Pandora's Box
was open, airlines rushed
to take a look inside.
PROFIT CENTER
being accounted for. Dynamic pricing in
the airline industry (revenue manage-
ment in industry jargon) is one of the
most advanced applications of analyt-
ics in use today.
The rise of advanced analytics in
the airline industry can be attributed
to many factors, but two stand out in
particular. One was Robert Crandall,
the CEO of American Airlines from
1985 to 1998, who believed in the
power of analytics. Crandall was no
lover of mathematics, but he was no-
toriously competitive and believed an-
alytics could be used as a competitive
weapon. Under his leadership Ameri-
can embraced analytics and became
the most feared and revered airline of
the 1980s and 1990s, employing hun-
dreds of analytics professionals who
had their hands involved in every as-
pect of running the airline.
American's innovations caught the
attention of other carriers who realized
the value of analytics. And this was the
second factor leading to wide-scale
adoption of analytics: airlines needed
it to remain competitive. The practice
of analytics had become necessary to
stay in business.
Most industries haven't undergone
the analytics conversion experienced
by the airlines. While it's true that de-
regulation helped serve as a catalyst
for the airline industry, earth-shaking
events arent required to embrace ana-
lytics. All that's needed is recognizing
the competitive advantage it provides
and nurturing a sustained effort to im-
prove over time. American Airlines
started with an analytics group of eight
people doing what they could in an or-
ganization devoid of analytics. It took
time to grow in size and sophistication,
but American was ahead of its competi-
tors. And in a period that saw the de-
mise of dozens of established airlines,
American survived and thrived. It's one
of the great analytics success stories,
and one we have much to learn from.
Andrew Boyd, senior INFORMS member and
INFORMS VP of Marketing, Communications and
Outreach, has been an executive and chief scientist at
an analytics firm for many years. He can be reached at
e.a.boyd@earthlink.net.
A leading marketing company's scoring models used to take
4.5 hours to process. Now, with high-performance analytics
from SAS,

they're scored in 60 seconds.


Get the relevant insights you need to make decisions in an
ever-shrinking window of opportunity and capitalize on
the complexity of big data to differentiate and innovate.
269
What would you do
with an extra 269 minutes?
High-Performance Computing
Grid Computing
In-Database Analytics
In-Memory Analytics
Big Data
high-performance
analytics
A real
game changer.
sas.com/269 for a free white paper
Each SAS customer's experience is unique. Actual results vary depending on the customer's individual conditions. SAS does not guarantee results, and nothing herein should be construed as constituting an additional warranty. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks
of SAS Institute Inc. in the USA and other countries. © 2012 SAS Institute Inc. All rights reserved. S91566US.0412
Help Promote Analytics
It's fast and it's easy! Visit: http://analytics.informs.org/button.html
Not long after finishing graduate school,
I found myself working at what used to be
known as an operations research consulting
firm (today, this company would be called an
analytics services provider or some such),
working full-time on a project for a large client.
I still have a lot of scars from that project.
The core of the model that we were build-
ing was coded and implemented on a main-
frame computer. This meant that I often had
to struggle with writing a few crucial and con-
fusing lines of Job Control Language (JCL),
something that I never did quite master,
and every time one of my compute jobs
crashed, I would receive a late-night phone
call at home.
Also, upon joining the project team, I had
inherited a largely undocumented SAS pro-
gram from a departing colleague, a mess of
spaghetti code that contained the guts of the
model that we were implementing. I spent
countless hours trying to sort it out and clean it
up without causing our nightly production runs
to crash (see late-night phone call above).
THE BIGGEST PROBLEM
By far the biggest problem, however, was
the lack of clarity about the project's purpose.
The project was sponsored and funded by the
client's IT organization. The actual business
groups who were expected to use the models
were not at all clear on what the value proposi-
tion was for them, and we could see that our
project deliverables were being shoved down
their throats.
Not surprisingly, the business users we
worked with were motivated to fnd problems
with what we were doing and they often
did. Some instability in the networking infra-
structure often prevented us from delivering
updated results, which generated one set of
complaints. Even when everything worked on
schedule, our results (based on a daily snap-
shot of the system's state) would inevitably be
out of synch with the latest data that was avail-
able, which in turn produced a whole other set
of complaints. At its core, there was a fundamen-
tal disconnect between the business users
(who believed they had asked IT for a tactical
reporting tool) and the IT organization (who
believed that we had been asked to deliver a
more sophisticated decision support system).
Meanwhile, we were pushed for political
reasons to get the model into production as
soon as possible, while the business users
kept finding reasons not to sign off on the de-
liverables and refused to use our solution at all
until it had been formally accepted. As such, a
huge amount of effort was spent designing the
perfect user interface, right down to the choice
of colors to be used to represent different kinds
of outputs, while the model's core logic and
functionality were never seriously examined by
the clients who would ultimately have to use it.
PROJECT'S VALUE PROPOSITION
Though nearly 20 years have passed since
then, this project came to mind again the other
day when an old friend of mine (let's call him
Doug) told me a familiar tale about one of
his recent projects. "From the beginning, our
understanding was that the purpose of the
project was to build an alpha version," Doug
explained, "something to demonstrate the
potential of the application while giving us a
chance to establish data connectivity, get a
bunch of technologies talking to each other,
and use a sample of the operational data to
show that the optimization could actually pro-
vide significant savings."
So far, this made great sense to me. In fact,
of late I have had conversations with many
people in the analytics field about the value of
rapid prototyping for engaging potential stake-
holders, for demonstrating business value, and
for ensuring that the team really understands
the problem domain.
So what went wrong for Doug?
Turns out his project was also being led by
the IT organization, and that those folks did
not have any sense at all about the project's
value proposition. In addition, not long after
Doug and his team had begun working, some-
one somewhere in the IT chain of command
made an ad hoc decision to roll out the alpha
Even tragic projects can
have happy endings
BY VIJAY MEHROTRA
The business users we
worked with were
motivated to find problems
with what we were doing
and they often did.
ANALYZE THIS!
version of the software to a group of busi-
ness users around the county. Within the
client's IT group, this was interpreted as
a decision to treat this alpha version as a
production system and as such to hold
the consultants' feet to the fire as if tak-
ing delivery on commercial enterprise
software.
Doug and his team were perplexed
and distracted from what they thought
their focus should be. A great deal of time
was spent on minute details associated
with the graphical user interface, includ-
ing long discussions about the layout and
coloring of various output values, even
though there were no established accep-
tance criteria (since the GUI had not been
viewed as a significant part of the original
project's scope). Meanwhile, despite re-
peated attempts by Doug, no one on the
client side was willing to even look at the
results of the optimization until the entire
user interface design was signed off and
fully functional.
In fact, with the possible exception of
the original executive sponsor (who was
extremely busy and far removed from
the reality of the project), it seemed to
Doug that no one really understood how
the system's pieces (including the GUI,
a configuration rules engine, the optimi-
zation model, a relational database, and
Java code for developing and deploying
the application) fit together, or why the
project was being done in the first place.
ABANDON SHIP
After more than a year on my project,
I had become fed up. The partners in my
firm had made a great deal of money
from my billable hours on this project, but
I had come to understand that this was
clearly just part of the standard profes-
sional services business model. Howev-
er, I had nothing to show for my efforts
but lost sleep, a bunch of unhappy people
within the client organization, and a deep
sense that I was wasting their time and
my own. I left the project, and the firm,
soon thereafter.
For many years, I felt quite smug about
this decision to abandon ship. Indeed, I
have since been told by several people
that I respect and trust that the willing-
ness to put your job on the line for what
you believe every single day should be
a core value for successful project lead-
ership. In any case, I was young, single
and free of debt, and walking out of my
employer's offices on that final day, I felt
that I had very little to lose by leaving it all
behind.
After talking to Doug, however, I'm
not so sure. Far older and wiser now
than I was then, Doug has worked
through this challenging project calmly
despite the many frustrations, commu-
nicating his concerns to his own man-
agement and doing his best to educate
people throughout the client organiza-
tion all the way along. Though the ini-
tial project was ultimately cancelled, the
client's executive sponsor has recently
re-engaged with Doug's company, rec-
ognizing her own part in the project's
failure and still believing in the poten-
tial business value that the optimization
might be able to provide.
While his initial project had appeared
to be a tragedy, or at least a black come-
dy, Doug's story may yet turn out to have
a happy ending. In any case, I plan to
stay tuned, and I hope to learn something
along the way.
Vijay Mehrotra (vmehrotra@usfca.edu) is an
associate professor in the Department of Analytics and
Technology at the University of San Francisco's School
of Management. He is also an experienced analytics
consultant and entrepreneur, an angel investor in several
successful analytics companies and a longtime member
of INFORMS.
prize
call for
nominations
The Institute for Operations Research and the Management
Sciences annually awards the INFORMS Prize for effective
integration of Operations Research/Management Science
(OR/MS) and advanced analytics into organizational decision
making. The award is given to an organization that has
repeatedly applied the principles of OR/MS and advanced
analytics in pioneering, varied, novel, and lasting ways.
2013
DEADLINE FOR APPLICATIONS IS DECEMBER 1, 2012
2013 COMMITTEE CHAIR:
Stefan Karisch
Jeppesen
voice: +1 303.328.6389
e-mail: Stefan.Karisch@jeppesen.com

Tell us why the title should be yours.
log onto: www.informs.org/informsprize for more information
Which organization has the best O.R. department in the world?
Statisticians and analytics-related profes-
sionals have been conducting time-to-event
analyses across myriad applications for as long
as data has been collected and analyzed. Gov-
ernments and religious institutions throughout
history have collected data on birth and death
rates to better estimate resource demands
and predict tax revenue. Insurance compa-
nies use sophisticated time-to-event models
to predict accident, illness and mortality rates
in order to set policy costs and forecast profits.
Engineers use these techniques (often called
reliability analysis) to model the lifetime and
failure rates of mechanical or electronic sys-
tems and uncover the factors that impact those
rates, producing metrics such as MTTF (mean
time to failure) among others. Marketers have
begun adopting many of these techniques to
study important time-to-event phenomena of
their customers, such as the rates of product
adoption or time for a customer to upgrade a
service contract.
This article illustrates an example of how
the author used the analytical techniques of
time-to-event analysis to build a set of statis-
tical models forecasting the time for known
software bugs to be encountered by custom-
ers after the software had been installed on
their networks.
As anyone involved in software develop-
ment knows, it's not possible to have 100 per-
cent of known bugs fixed prior to release if a
company is to meet market demands and ex-
pectations. Being able to predict the expected
time for bugs to be encountered by customers
once the software is out in the field and deter-
mining the characteristics that impact that time
can be very valuable information for the orga-
nization. Management can prioritize efforts
and focus engineering and testing resources
on the bugs most likely to be encountered, and
then hopefully fix those bugs before too many
customers encounter the problem.
CENSORED DATA
Time-to-event data and the statistical tech-
niques developed to analyze such data have
an important distinction. The data is often in-
complete, or what in the statistical literature
is referred to as censored data. That is, in-
stead of having exact time-to-event values,
only upper and/or lower bounds for when the
event in question occurred is known. For ex-
ample, consider that a release of the software
is installed on a customer's network with 250
known bugs not fixed at the time of installa-
tion. Consider that after six months (180 days)
in use, the customer had encountered and re-
ported on 86 of those 250 bugs. The length of
time between installation and the date the bug
was encountered can be used as the time-to-
event for each of those 86 bugs. The remaining
250 - 86 = 164 bugs, however, are still pres-
ent in the software but havent been found by
the customer yet. If it's desired to conduct the
analysis after 180 days, it would be incorrect
to use the time-to-event value of 180 days for
those 164 bugs. If given enough time in use,
those 164 bugs would eventually be encoun-
tered and would each have a time-to-event
value of 181 days or greater. Statistical tech-
niques for time-to-event analysis can handle
this, and each of those 164 bugs would be as-
signed what is called a right-censored value
of 180 days. Similarly, there is also left censor-
ing when only the upper bound for the time-to-
event is known and interval censoring when
the time-to-event is only known within a range.
In many time-to-event studies, data will con-
tain a combination of complete data as well as
all forms of censoring. The data used for this
project had complete data (i.e., time-to-event
known) and right-censored data (i.e., a lower
bound on the time-to-event known).
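As a minimal illustration of how right-censored observations enter such an analysis (not the author's code; the individual times below are invented to match the spirit of the example above), the following sketch builds a toy data set with 86 observed times and 164 bugs right-censored at 180 days, then computes a nonparametric Kaplan-Meier style estimate of the probability that a bug has been encountered by a given day:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data in the spirit of the article's example (values are invented):
# 86 bugs encountered within 180 days, 164 still unencountered at day 180.
observed_times = np.sort(rng.integers(1, 181, size=86))      # exact times-to-event
censored_times = np.full(164, 180)                            # right-censored at 180 days
durations = np.concatenate([observed_times, censored_times])
event_observed = np.concatenate([np.ones(86, dtype=bool),
                                 np.zeros(164, dtype=bool)])  # False = right-censored

def kaplan_meier(durations, event_observed):
    """Return event times and the estimated survival curve S(t)."""
    times = np.unique(durations[event_observed])
    survival = []
    s = 1.0
    for t in times:
        at_risk = np.sum(durations >= t)                      # still unencountered just before t
        events = np.sum((durations == t) & event_observed)    # bugs first encountered at t
        s *= 1.0 - events / at_risk
        survival.append(s)
    return times, np.array(survival)

times, surv = kaplan_meier(durations, event_observed)
# P(bug encountered by day t) is 1 - S(t); print a few points of the curve.
for t, s in list(zip(times, surv))[::20]:
    print(f"day {int(t):3d}: P(encountered) ~ {1 - s:.3f}")
```

The censored bugs contribute to the at-risk counts without ever registering an event, which is exactly why treating them as if they had a time-to-event of 180 days would bias the estimate.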
LEARNING AGENDA
As with any analytics initiative, consolidat-
ing a learning agenda that all stakeholders and
teams had input in creating is essential to get the
needed alignment and support. This work can
take anywhere from a few days for a simple proj-
ect to many months for a more complex initiative
that an organization has not attempted before.
For this particular project, three months of time
Time-to-event analysis
BY KEVIN J. POTCNER
It's not possible to have
100 percent of known
bugs fixed prior to release
if a company is to meet
market demands and
expectations.
FORUM
was spent in discussions with senior man-
agement across various groups including
QA & Testing, Development
& Engineering, Product and Release Man-
agement, among others. These discus-
sions are the frst opportunity to establish
priorities and set realistic expectations. In
addition to developing a set of key ques-
tions and project objectives, these discus-
sions provide valuable historic context on
the problem and can identify sensitive top-
ics that a consultant has to tread very care-
fully around.
For this project, the learning agenda
was distilled down to three key questions.
1. What is the likelihood that a customer
will encounter a bug as a function of
time in usage?
2. How do characteristics of the bug
impact that likelihood?
3. Which bug types are the most likely
to be encountered by a customer?
DATA COLLECTION & AGGREGATION
Collecting and aggregating all the
needed data can be one of the most chal-
lenging and time-consuming aspects of
an analytics initiative. The data needed for
this effort was spread across myriad data-
bases requiring many different resources
to fully source. Resources worked on this
stage for over two months, often requiring
multiple data extractions.
The definition of data should be broad-
ened to include information not contained
within a database. Resources intimate
with the software and how the customer
uses it can provide a wealth of knowledge
that can often be translated into quantita-
tive variables providing additional dimen-
sions to the analysis.
QA, ANALYSIS & DATA CLEANSING
Once all the data has been extract-
ed, its important to plan for a proper
amount of time and effort to validate
and clean the data. This step is often
underestimated in analytics projects but
is one of the most critical, as misleading
results can be produced if this work is
not done thoroughly. Almost all data will
have issues that need to be resolved.
Errors, incorrect values, unusual ob-
servations, extreme outliers and data
inconsistencies are quite common; ad-
dressing these issues will benefit both
the project at hand as well as other
applications that these data are being
used for. For this particular project, the
technical teams uncovered a host of
problems with a few of the databases
revealing that an uncomfortable level of
inaccurate data was being used to cre-
ate various reports distributed across
the business. A separate project was
kicked off to fix these issues and im-
prove the accuracy of these reports.
Validating and cleaning data gener-
ates some very rich conversations among
stakeholders and technical teams. This
stage is also helpful to set expectations
with stakeholders when the gaps and
limitations of the data can be more clearly
shown. These discussions help the stat-
istician better connect how the business
views the problem to how the available
data can be used to produce actionable
business metrics. Valuable insight into
the nature of the data is gleaned as the
statistician is able to examine the vari-
ability and correlation structure identify-
ing issues that may impact the statistical
modeling and analysis techniques to be
used.
Limited print copies are available to purchase for $45.00
http://tutorials.pubs.informs.org
NOW AVAILABLE
FREE! 2012 TutORials available ONLINE for INFORMS Annual Meeting attendees!
New Directions in Informatics,
Optimization, Logistics, and Production
Pitu Mirchandani, volume editor
J. Cole Smith, series editor
INFORMS 2012 edition of the TutORials in Operations Research
series is available online to registrants of the 2012 INFORMS
Annual Meeting. It will be made available online to all 2013
INFORMS members on January 1, 2013.

To access the 2012 TutORials log in at:
www.informs.org/tutorialsonline
and enter your INFORMS username and password.
Request a no-obligation
INFORMS Member Benefits Packet
For more information, visit:
http://www.informs.org/Membership
STATISTICAL MODELING
Once the data has been adequately
cleaned and prepared, the statistical mod-
eling work begins. By this point, enough
analysis work and data examination should
have been done so the statistician has a
very clear idea of the technique and ap-
proach that will be used. The goal here is
to reduce the data to a mathematical ex-
pression with a component that provides
a description of the overall structure in the
data and a component that accounts for
the variability and uncertainty around that
structure. It's important to remember that
the goal of statistical modeling is to build
as simple a model as possible that ad-
equately describes the key features in the
data, allowing the hypotheses/questions of
interest to be addressed without adding un-
necessary complexity.
A great quote that most statisticians
keep top of mind during this process to
help strike this balance is from one of the
pioneers of the science, George Box: "All
models are wrong, but some are useful."
For time-to-event analyses, the mod-
eling technique needs to account for
the censored nature of the data. Many
statistical model forms that are common
in time-to-event analyses can handle
censored data, and a variety of statistical
software packages contain these techniques.
The author used the Minitab Statistical
Software, which is a software package
common among reliability engineers.
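The article does not publish its model form or code, so the following is only a hedged sketch of what a censored-data fit might look like outside Minitab: a Weibull time-to-event model fit by maximum likelihood with scipy, where exact times enter the likelihood through the density and right-censored times through the survival function. The Weibull form and all names here are illustrative assumptions, not the author's method.

```python
import numpy as np
from scipy.optimize import minimize

def weibull_neg_loglik(params, durations, event_observed):
    """Negative log-likelihood of a Weibull model with right-censored data."""
    log_shape, log_scale = params            # optimize on the log scale to keep both positive
    k, lam = np.exp(log_shape), np.exp(log_scale)
    z = (durations / lam) ** k
    log_f = np.log(k) - np.log(lam) + (k - 1) * (np.log(durations) - np.log(lam)) - z
    log_s = -z                               # log survival function, used for censored points
    return -np.sum(np.where(event_observed, log_f, log_s))

def fit_weibull(durations, event_observed):
    """Fit Weibull shape and scale by maximizing the censored-data likelihood."""
    res = minimize(weibull_neg_loglik,
                   x0=[0.0, np.log(np.mean(durations))],
                   args=(durations, event_observed),
                   method="Nelder-Mead")
    shape, scale = np.exp(res.x)
    return shape, scale

# Example usage with censored durations/event flags (e.g., the toy arrays sketched earlier):
# shape, scale = fit_weibull(durations, event_observed)
# prob_by_day = lambda t: 1 - np.exp(-(t / scale) ** shape)   # P(bug encountered by day t)
```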
RESULTS
In most analytics projects, the more
advanced statistical analyses and models
are not shared beyond the core techni-
cal team doing the analysis work. These
models and analyses need to be trans-
lated into a variety of summary statistics
and graphical displays that communicate
the features in the data and are easy
to share across a broad range of audi-
ences. For this project, a technical report
containing a variety of graphical displays
and data tables was produced. Figure 1
is an example of one of the graphical dis-
plays produced in this project, and one
thats commonly used in time-to-event
analyses.
The graph displays the likelihood that a
customer will encounter a bug as a func-
tion of time (Note: the probability values are
not shown to protect the confidentiality of
the work). This approach shows the rate at
which that likelihood increases over time.
The likelihood for five different bug types is
displayed (A, B, C, D and E), allowing for a
comparison across the bug types. For ex-
ample, bug type E has the greatest chance
of being found by a customer, while bug
types A and B have the least chance.
Management can use graphical dis-
plays such as these to help determine
the time in usage at which certain bug
types would have a likelihood of being
encountered beyond the desired level. In this proj-
ect, a certain level of likelihood was de-
cided upon by senior management and
displayed on the graph (shown by the
grey horizontal line). As can be seen, bug
types A and B don't reach that likelihood
until almost three years in usage, indicat-
ing that fixing these bugs can be of lower
priority. Bug types D and E, on the oth-
er hand, reach that likelihood within the
first few months of usage, indicating that
these bugs have a high chance of being
encountered and should be top priority to
fix before too many customers encounter
them.
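If a parametric form such as the Weibull sketch above were used, the usage time at which a bug type's encounter probability crosses a management-chosen threshold p (the grey horizontal line in Figure 1) can be read off in closed form; again, this assumes the illustrative model, not the author's actual one.

```python
import numpy as np

def crossing_time(p, shape, scale):
    """Time at which a Weibull encounter probability F(t) = 1 - exp(-(t/scale)**shape) reaches p."""
    return scale * (-np.log(1.0 - p)) ** (1.0 / shape)

# e.g., crossing_time(0.5, shape, scale) for each bug type's fitted parameters
```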
Kevin Potcner (Kevin.potcner@exsilondata.com) is
a director at Exsilon Data & Statistical Solutions. A
statistician, Potcner has provided analytics consulting and
training for a variety of industries including automotive,
biotech, medical device, pharmaceutical, financial
services, software, e-commerce and retail. He holds a
master's degree in applied statistics from the Rochester
Institute of Technology.
Figure 1: Graph displaying likelihood that a customer will encounter a bug as a function of time.
Join the Analytics Section of INFORMS
For more information, visit: http://www.informs.org/Community/Analytics/Membership
Sizing up the potential impact of prescriptive analytics driven by proliferation of images and video.
The human brain simultane-
ously processes millions of
images, movement, sound
and other esoteric informa-
tion from multiple sources. The brain is
exceptionally efficient and effective in its
capacity to prescribe and direct a course
of action and eclipses any computing
power available today. Smartphones now
record and share images, audio and vid-
eo at an ever-increasing rate, forc-
ing our brains to process more.
Technology is catching up to the
brain. Google's image recognition in
"Self-taught Software" is working to
replicate the brain's capacity to learn
through experience. In parallel, pre-
scriptive analytics is becoming far more
intelligent and capable than predictive
analytics. Like the brain, prescriptive
analytics learns and adapts as it pro-
cesses images, videos, audio, text
and numbers to prescribe a course of
action.
THE FUTURE IS NOW
Google is working on simulating the
human brain's ability to compute, evalu-
ate and choose a course of action using
massive neural networks.
The image and video analytics science
has scaled with advances in machine vi-
sion, multi-lingual speech recognition and
rules-based decision engines. Intense in-
terest exists in prescriptive analytics driv-
en by real-time streams of rich image and
video content. Consumers with mobile
devices drive an explosion of location-
tracked image and video data. Lowering
costs have democratized cloud-based
high-performance computing. Andrew
McAfee and Erik Brynjolfsson in Har-
vard Business Review in October 2012
called this "Big Data: The Management
Revolution."
Image analytics is seen as a po-
tential solution to social, political, eco-
nomic and industry issues. Thirty years
of Intel's Gordon E. Moore's law and
Images & videos: really big data
BY FRITZ VENTER AND ANDREW STEIN
THE NEXT BIG THING
Harvard Business School's Clayton
Christensen's disruptive innovation
have created the current experience-
driven generation that is fully aware
of technology's potential to solve is-
sues plaguing these global domains.
On the consumption side, mobile
consumption of video is growing dra-
matically. Bandwidth is no longer a con-
cern. Prescriptive analytics is poised to
deliver relevant video to viewers be-
yond Netflix's algorithm for recommending DVDs
to rent based on viewing interests.
IMAGE ANALYTICS: TECHNOLOGY
PROCESS
Image analytics is the automatic al-
gorithmic extraction and logical analy-
sis of information found in image data
using digital image processing tech-
niques. The use of bar codes and QR
codes is a simple example, but in-
teresting examples are as complex
as facial recognition and position and
movement analysis.
Today, images and image sequenc-
es (videos) make up about 80 percent
Decision Sciences Institute
advancing the science and practice of decision making
The Decision Sciences Institute is a nonprofit
professional organization of researchers, managers,
educators, and students interested in decision-
making techniques and processes in private and
public organizations. The Institute is an international
organization with over 3,500 members in 32 coun-
tries. The annual meetings and regional conferences
attract over 3,500 participants a year.
The Decision Sciences Institute
Is Committed to . . .
Research. A focus on the integration of research
in the art and science of managerial decision making
across traditional functional academic disciplines; an
international forum for presentation and discussion
of research.
Teaching. A forum for presentation and discussion
of innovative teaching; recognition of teaching
excellence and curriculum innovation.
Practice. An exchange of ideas between leading
professional practitioners and educators.
Benefits Members Receive:
Annual meeting. Members receive discounted fees to
attend, and the meeting draws over 1,500 professionals
together to share current thoughts on theoretical and
applied issues.
Decision Sciences is a highly respected journal among
scholars and is subscribed to by over 1,000 libraries.
It seeks and publishes high quality, theoretical and
empirical articles.
Decision Sciences Journal of Innovative Education is
a high quality, peer-reviewed scholarly journal whose
mission is to publish significant research.
Decision Line, the Institute's news publication, is
published five times annually and includes feature
columns, as well as information on members, regions,
annual meetings, and placement activities.
Job Placement Services are offered throughout the
year, with position and applicant listings available via
the Internet. Also at the conference, special facilities
aid in position search activities.
www.DecisionSciences.org
Decision Sciences Institute
43rd ANNUAL MEETING
NOVEMBER 17-20, 2012
Join us.
Log on to our website and complete
the membership application to begin
enjoying your benefits.
Regular membership: $160/year
Student membership: $25/year
Figure 1: Fast-growing consumption of mobile video.
of all corporate and public unstructured
big data. As growth of unstructured data
increases, analytical systems must as-
similate and interpret images and videos
as well as they interpret structured data
such as text and numbers.
An image is a set of signals sensed
by the human eye and processed by
the visual cortex in the brain, creat-
ing a vivid experience of a scene that
is instantly associated with concepts
and objects previously perceived and
recorded in one's memory. To a com-
puter, images are either a raster image
or a vector image. Simply put, raster
images are a sequence of pixels with
discrete numerical values for color; vec-
tor images are a set of color-annotated
polygons. To perform analytics on im-
ages or videos, the geometric encod-
ing must be transformed into constructs
depicting physical features, objects and
movement represented by the image
or video. These constructs can then be
logically analyzed by a computer.
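As a toy illustration of that transformation (a minimal sketch, not the authors' pipeline), the following converts a synthetic raster of pixel values into a few higher-level constructs (an object's presence, location and size) that a rules engine or optimization model could then reason about:

```python
import numpy as np

# A synthetic 100x100 grayscale raster: background noise plus one bright object.
rng = np.random.default_rng(1)
raster = rng.normal(30, 5, size=(100, 100))
raster[40:60, 55:80] += 150          # the "object" we want to describe

# Step 1: segment pixels into object vs. background with a simple threshold.
mask = raster > 100

# Step 2: summarize the pixel mask as higher-level constructs (position, size).
rows, cols = np.nonzero(mask)
construct = {
    "present": rows.size > 0,
    "bounding_box": (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())),
    "area_pixels": int(rows.size),
    "centroid": (float(rows.mean()), float(cols.mean())),
}
print(construct)   # facts that downstream logic can reason about
```

Real image analytics replaces the crude threshold with machine vision models, but the shape of the step is the same: geometric pixel encodings in, logical constructs out.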
The process of transforming big data
(including image data) into higher-level
constructs that can be analyzed is or-
ganized in progressive steps that each
adds value to the original information in
a value chain (see Figure 2) a concept
developed by Harvard professor Michael
Porter. Prescriptive analytics leverages
the emergence of big data and computa-
tional and scientific advances in the fields
of statistics, mathematics, operations
research, business rules and machine
learning.
Prescriptive analytics is essentially
this chain of transformations whereby
structured and unstructured big data is
processed through intermediate repre-
sentations to create a set of prescrip-
tions (suggested future actions). These
actions are essentially changes (over a
future time frame) to variables that in-
fluence metrics of interest to an enter-
prise, government or another institution.
These variables influence target metrics
over a specified time frame. The struc-
ture of the relationship between a met-
ric and the variables that influence it is
called a predictive model. A predictive
model represents detected patterns,
time series and relationships among
sets of variables and metrics. Predictive
models of key metrics can project future
time series of metrics from forecasted
influencing variables.
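As an illustration of what such a predictive model looks like in code, here is a hedged Python sketch (the data, variable names and choice of a linear model are synthetic illustrations, not the authors' method; it assumes NumPy and scikit-learn are installed). A metric is fitted against its influencing variables, and the fitted relationship then projects the metric from forecasted values of those variables.

# Sketch of a predictive model in the article's sense: metric ~ influencing variables.
# All numbers and names below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ad_spend = rng.uniform(10, 100, size=200)          # influencing variable 1
site_visits = rng.uniform(1_000, 5_000, size=200)  # influencing variable 2
revenue = 3.0 * ad_spend + 0.05 * site_visits + rng.normal(0, 10, size=200)  # metric

model = LinearRegression().fit(np.column_stack([ad_spend, site_visits]), revenue)

# Project the metric over a future time frame from forecasted influencing variables.
forecast = np.array([[120.0, 5_500.0], [130.0, 6_000.0], [140.0, 6_500.0]])
print(model.predict(forecast))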
The first step in the prescriptive
analytics process transforms the ini-
tial unstructured and structured data
sources into analytically prepared data.
Although there are parallels with stan-
dard data-warehousing/ETL, this step
is different from that approach in that it
contends with the complexities of pre-
processing of unstructured data, as well
as structured data including databases,
narrative text files, images, videos and
sound.
DEFENSE AND SECURITY DRIVING
DEMAND
The need to analyze data and pro-
actively prescribe actions is pervasive
in nearly every vibrant growth industry,
government and institutional sector.
This has created a vacuum, or demand,
Figure 2: Value chain of transformations.
for prescriptive analytics systems. De-
fense and security, as well as health-
care, are particularly good examples
of industries that are driving demand
for such systems.
The defense industry has pushed the
envelope for image processing, and it is
reflected in the storage that is being pro-
cured by government. GovWin Consult-
ing reports that Defense agencies are
the largest spenders on a per-agency ba-
sis at the federal level for electronic data
storage. The Army, Navy and Air Force,
along with the Department of Defense,
account for 58.4 percent of all federal
spending for storage. GovWin indicates
that the drivers for this spend are big
data and full motion video.
The proliferation of captured data of
interest to defense and security comes
from four clear sources.
1. Predator drones gathering
intelligence via video and image
reconnaissance at reduced
risk as they seek out hostile
scenarios.
2. In-place surveillance cameras
increasingly prevalent in public
places, managed by federal, state
and local governments.
3. Stationary commercial and
institutional surveillance mounted
Predator drones gather intelligence via video and image reconnaissance.
in public places of business, the
workplace, hospitals and schools.
4. Consumer-created image and video
shared on YouTube, Facebook,
Twitter, blogs and other online social
media sharing/publishing sites.
While the demand drives proliferation,
it also presents a conflict between safety
and privacy. People value surveillance
as a resource when a child is taken or
a loved one goes missing. On the other
hand, people see it as an invasion of pri-
vacy during everyday activities. Likewise,
people value sharing their personal pho-
tos with family and friends, but they are
concerned that their images and videos
may be anonymously processed and an-
alyzed to identify criminal activity. Where
is the ethical line of too much drawn?
And, do younger generations have the
same privacy-loss perspective?
Major cities around the world, from
London to Las Vegas, have cameras in-
stalled so densely that it's nearly impos-
sible to move about the city without being
recorded. Keeping up with the installa-
tion statistics is almost impossible. The
availability of easy-to-deploy, consumer-
installed cameras is ubiquitous. This rate
of adoption for security video capture
makes an accurate assessment of how
much video is being recorded difficult.
We just know it is BIG.
Is all this surveillance coupled with the
potential of video/image analytics help-
ing? Research published in the Journal
of Experimental Social Psychology sug-
gests that increased surveillance only
increases our propensity to be Good Sa-
maritans, not reduce crime. Eric Jaffe
calls this the "reverse bystander effect"
in his recent article. In the end, sur-
veillance and image analytics does
provide data that can help officials pur-
sue criminal activity and justice,
albeit ex post facto.
How does cost drive the demand for
video and image analytics? People expect
the nation's defense and security effort
to be cost-effective. This means that the
country will move to a smaller but more
educated fighting force and at the same
time increase the use of remote sensing,
observation and monitoring tools. Simply
put, this means more image and video
capture or surveillance everywhere.
HEALTHCARE: A PERFECT DOMAIN
The complexity of healthcare makes it
a perfect domain to explore the potential
for prescriptive analytics and imaging.
Healthcare has been a pioneer in captur-
ing rich imaging information and has built
databases to develop a variety of statistical
medical norms. The next step is to use
image analytics to provide real-time insight
to healthcare providers during diagnosis and treatment.
agnosis and treatment.
The advances in medical science
come fast, and physicians have a dif-
ficult time keeping up with new proce-
dures, treatments and pharmacology
while they care for patients. Whether a
routine office visit, serious disease or an
emergency, prescriptive analytics inte-
grated in medical workflow promises to
improve the standard of care and speed
of diagnosis, treatment and recovery.
It's happening now. In Science Busi-
ness, Alan Kotok wrote about University
of Michigan researchers who adapted
computed tomography image analytics to
Parametric response mapping lung images.
diagnose chronic obstructive pulmonary
disease (COPD).
Advanced medical decision-support
systems (MDDS) link massive knowl-
edge bases to multiple clinical data-
bases. These in turn are linked to a
patients data. These complex systems
have varying schemata, comparative
image banks and discipline vocabu-
laries, even local languages. Image
analytics reduces varying subjective in-
terpretation and human error, thereby
accelerating the process of treatment
and recovery.
With an image analytics system that
can accurately process and prescribe
action, its possible to envision real-
time patient monitoring systems with
rules-based analysis and caregiver
notification.
The increasing role of algorithmic di-
agnosis and treatment creates the per-
fect opportunity to integrate images with
prescriptive analytics. For example,
medical professionals at the German
company Medal and the Institute for Al-
gorithmic Medicine in Houston, Texas,
curate and credential a digital knowl-
edge base of medical algorithms and
computational procedures for medical
treatment and administration. Integrat-
ing image analytics with such technol-
ogy in a prescriptive analytics system
holds potential to make faster and more
informed decisions, streamline costs
and broadly improve the quality and
economics of healthcare.
THE ROAD AHEAD
Looking further ahead, several trends,
opportunities and issues for video and
image analytics will certainly emerge. For
example:
Other industries are already forging
strategies for video and image
analytics. Consumer and marketing
research is one good example.
Expect global firms such as GfK,
Nielsen, Acxiom and Symphony IRI
to reinvent survey-based research to
add tone captured in the video of a
panelist. Video and image analytics
will generate a deeper understanding
in both staged and impromptu
marketing research. The oil and gas
industry is considering what proactive
action could be possible by analyzing
video and image feeds during drilling
and fracking processes (see related
story).
The demand for talent in this area will
increase, creating the job of the future.
Look for additional technology
breakthroughs involving 3D image and
video analytics, breakthroughs that will
exponentially increase the potential for
prescriptive analytics.
Where applicable, ethics and the social
effect of image and video analytics on
people, groups and systems must be
considered. Jay Stanley, a senior policy
analyst with the ACLU, addresses
the topic in "Video Analytics: A Brain
Behind the Eye?" and explores the
moral question of machines interpreting
human activity predictively and
prescriptively.
Finally, expect to see many purpose-
built solutions that can be leveraged across
industries, cultures, domains and other
boundaries through a common image
and video-processing platform for
prescriptive analytics.
Fritz Venter (fritz.venter@ayata.com) is the director of
technology at AYATA, a prescriptive analytics software
company headquartered in Austin, Texas, where he
is responsible for product and intellectual property
development, as well as delivery of solutions to
customers. He has 20 years of industry experience and is
finishing his Ph.D. in pattern matching.

Andrew Stein (andrew_stein16@msn.com) is the chief
advisor at the Pervasive Strategy Group located near
Chicago, where he fuels creative vision for sustainable
analytics-based strategies for continuous innovation. He
can be found sharing disruptive innovative ideas on his
blog, www.SteinVox.com.
How big data is changing the oil & gas industry
BY ADAM FARRIS
The advent of the digital oil field helps produce
cost-effective energy while addressing safety and
environmental concerns.
Everyone needs it, few know
how we get it, and many feel
compelled to slow down efforts
to find and produce oil. One of
the primary assets of success-
ful, thriving societies is a low-cost energy
source. What drives low cost? Supply
greater than demand! What drives supply?
Finding supplies in sufficient quantities
so producing oil and gas is economically
viable. Finding and producing hydrocar-
bons is technically challenging and eco-
nomically risky. The process generates
a large amount of data, and the industry
needs new technologies and approach-
es to integrate and interpret this data to
drive faster and more accurate decisions.
Doing so will lead to safely finding new
resources, increasing recovery rates and
reducing environmental impacts.
The term big data has historically
been regarded by the oil and gas industry
as a term used by softer industries to
track people's behaviors, buying tenden-
cies, sentiments, etc. However, the con-
cept of big data defned as increasing
volume, variety and velocity of data is
quite familiar to the oil and gas industry.
The processes and decisions related to
oil and natural gas exploration, development
and production generate large amounts of
data. The data volume grows daily. With
new data acquisition, processing and stor-
age solutions and the development of new
devices to track a wider array of reservoir,
machinery and personnel performance,
today's total data is predicted to double in
the next two years.
Many types of captured data are
used to create models and images of the
Finding and producing hydrocarbons is technically challenging and economically risky.
Earth's structure and layers 5,000-35,000
feet below the surface and to describe
activities around the wells themselves,
such as machinery performance, oil flow
rates and pressures. With approximately
one million wells currently producing oil
and/or gas in the United States alone,
and many more gauges monitoring per-
formance, this dataset is growing daily.
The oil industry recognizes that great
power and imminent breakthroughs can
be found in this data by using it in smarter,
faster ways. However, resistance regard-
ing workflows and analysis approaches
remains in place, as it has for the last 30
years. How does the industry bridge the
vocabulary and cultural gap between data
scientists and technical petroleum pro-
fessionals? Ideas, applications and solu-
tions generated outside the oil and gas
industry rarely find their way inside. Other
industries seem to have bridged this gap,
but in talking to experts in the broader
technology industry, the oil industry is
seen as a no man's land for new-age
entrepreneurs, while major technology
providers spend billions trying to enter it
(e.g., GE, IBM and Microsoft).
Breaking into the oil and gas indus-
try is difficult for analysts, but the need
and potential for reward are great. Nine
of the top 10 organizations in Fortune's
Global 500 are oil and gas companies.
More than 20,000 companies are asso-
ciated with the oil business, and almost
all of them need data analytics and inte-
grated technology throughout the oil and
gas lifecycle.
Throughout the 1990s, the oil and
gas industry focused on data integra-
tion, i.e., How do we get all the data in
one place and make it available to the
geo-scientists and engineers working to
find and produce hydrocarbons? Since
the turn of the century, technology de-
velopment has mainly focused on soft-
ware that integrates across the major
disciplines to speed up old workflows.
The industry has had many amazing
technical professionals, but the idea of
a data scientist is new, and should be
considered alongside the petrophysical,
geophysical and engineering scientists.
The next decade must focus on ways to
use all of the data the industry gener-
ates to automate simple decisions and
guide harder ones, ultimately reducing
the risk and resulting in finding and pro-
ducing more oil and gas with less envi-
ronmental impact.
TECHNICALLY COMPLEX, HIGH RISK
Despite its astronomical revenues,
the profit margin of the oil and gas ma-
jors is 8 percent to 9 percent. Find-
ing and developing oil and gas while
reducing the safety risk and environ-
mental impact is difficult. The layers
of hydrocarbon-bearing rock are deep
below the Earths surface, with much
of the worlds hydrocarbons locked in
hard-to-reach places, such as in deep
water or areas with difficult geopolitics.
Oil is not found in big, cavernous
pools in the ground. It resides in layers
of rock, stored in the tiny pores between
the grains of rock. Much of the rock con-
taining oil is tighter than the surface on
which your computer currently sits. Fur-
ther, oil is found in areas that have struc-
turally trapped the oil and gas; there is
no way out. Without a structural trap, oil
and gas commonly migrates throughout
the rock, resulting in lower pressures and
uneconomic deposits. All of the geologi-
cal components play an important role; in
drilling wells, all components are techni-
cally challenging.
Following are three big oil industry
problems that consume money and pro-
duce data:
1. Oil is hard to find. Reservoirs are
generally 5,000 to 35,000 feet below the
Earth's surface. Low-resolution imaging
and expensive well logs (after the wells
are drilled) are the only options for find-
ing and describing the reservoirs. Rock is
complex for fluids to move through to the
wellbore, and the fluids themselves are
complex and have many different physi-
cal properties.
2. Oil is expensive to produce. Produc-
ing a barrel of oil requires a large amount
of science, machinery and manpower, and
it must be done profitably, taking into
account cost, quantity and market
availability.
3. Drilling for oil presents potential
environmental and human safety con-
cerns that must be addressed.
Finding and producing oil involves
many specialized scientific domains (i.e.,
geophysics, geology and engineering),
each solving important parts of the equa-
tion. When combined, these components
describe a localized system containing
hydrocarbons. Each localized system
(reservoir) has a unique recipe for getting
the most out of the ground profitably and
safely.
FINDING IT
To locate oil, geologists and petro-
physicists use data that indicates the type
of rock in nearby wells, as well as seismic
data, produced by sending sound waves
deep into the subsurface, which bounce
back to receivers. The data collected
from wells has high-
er resolution but is
accurate for only a
small area (10 feet)
around the well,
so data interpreta-
tion techniques are
used to extrapolate
between wells. This
requires scientists to
insert considerable
interpretation into
the analysis. Geo-
physics is the study
of the Earth, but
in the oil business,
geophysicists focus
primarily on seismic data. They use this
data to create a subsurface picture, but
the resolution is several hundred meters
at best. Over time, many breakthroughs
in fnding new deposits of oil and gas
have occurred through the combination
of geology, petrophysics and geophys-
ics, ultimately developing better models
of the Earths subsurface, but this area
still has the most uncertainty. Find a way
to paint a clearer, more accurate image of
the subsurface and you've found the Holy
Grail of the oil and gas industry.
To date, 3D seismic data has been
the industry's most impactful scientific
breakthrough. This data vastly improves
the picture of the Earth's subsurface
and removes the need to drill a multi-
million dollar hole, with very little data,
to explore what is in the rock. Seis-
mology, rightfully so, has received the
most research attention (billions of dol-
lars yearly), trying to better tune data
acquisition and processing, in an effort
to get a clearer image.
R&D spending in geophysics centers
around four main categories: acquisi-
tion, processing, interpretation and hard-
ware optimization, all rich in the three
Vs of big data (volume, variety and veloc-
ity). One raw seismic dataset is usually
in the hundreds of gigabytes, resulting in
terabytes once the numerous and expen-
sive processing and interpretations are
finished. These processing algorithms
calculate many billions of data points with
each run, and hundreds of these runs oc-
cur globally every day, all for one goal:
create a clear, accurate picture of the
Earth's subsurface and identify all of the
major components of the hydrocarbon
systems.
With the lower seismic data resolution,
data from other existing nearby wells is
used to enhance the overall Earth picture.
Well log data is captured on every well
and is interpreted to generate specific in-
formation about the rock that was cut and
the fluids that exist. While the resolution
Figure 1: Simple seismic acquisition diagram (left) and a processed, interpreted 3D seismic Earth model (right).
of this data is one to two feet, it has lim-
ited accuracy as mentioned earlier. The
problem: lots of data, different scales, dif-
ferent types, with a critical need to get the
most clarity possible through integration.
Finally, while much time is spent on
using captured data to make the subsur-
face clear, equivalent time is spent on
acquisition machinery and techniques.
Seismic acquisition crews, well-logging
crews and other services deploy machin-
ery on and in the Earth, taking readings
that are processed, often manually, to
produce a best possible picture used for
planning, locating and drilling wells, and
as the guide for economic planning and
reporting.
PRODUCING IT
Producing oil and gas involves drill-
ing and completing the wells, connecting
them to pipelines and then keeping the
flow of the hydrocarbons at an optimum
rate, all integrally related to the subsur-
face environment. The path to optimizing
production is dependent on the type of
rock and structure of the reservoir. These
decisions depend heavily on models cre-
ated in the exploration phase described
earlier.
Today, every well that's drilled uses
extensive machinery, measurement de-
vices and people, all of which produce
video, image and structured data.
This area is probably the fastest
growing in terms of the volume,
variety and velocity of data being
captured. Improving drilling and
completion operations can sig-
nificantly reduce costs. For refer-
ence, the widely publicized shale
oil wells typically cost between
$7 million and $10 million each.
Offshore wells can cost tens or
hundreds of millions of dollars.
The cost goes up as the seafloor
and the reservoirs deepen, re-
quiring more technology to do it
safely and successfully.
The average finding and de-
velopment (F&D) cost of opera-
tors is between $7 and $15 a
barrel (the wide range dependent
on geographies and geologies). If
there is enough oil to make the
economics work, they will pro-
ceed; if there are smaller pockets
of oil, development costs must be
lower in order for the economics
to make sense.
Some simple math: If oil com-
panies found a one million barrel
oil reservoir (at a price of $100 per
barrel), that seems like good mon-
ey, right? $100 million! However, to
find it, you have to shell out up to
Figure 2: Raw well-log (left) and processed well log images showing the rock type (right).
Figure 3: Simple well diagram (left) and an onshore drilling rig in south Texas (right).
$30 million for the acquisition, process-
ing and interpretation of the seismic data.
The operators must then produce it, and
the cost of land, drilling and getting oil to
market is significant. Land can cost as
much as $30,000 per acre for access
and often requires 120 acres for one
well. Typical deals involve thousands of
acres, plus drilling costs of between $5
million and $10 million for U.S. onshore
wells and up to $100 million for offshore
drilling. If you are a major integrated oil
and gas company, your profit on $100
million will be $1 million to $12 million
(a bit higher for independent operators).
Many will lose money overall. Analytical
approaches that impact the success rate
of finding or reducing the cost to develop
and produce oil and gas can make en-
ergy more affordable, safer and environ-
mentally conscious.
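A rough back-of-the-envelope sketch of that cost stack, using the ranges quoted above, is shown below in Python. The well count and the mid-range figures chosen are illustrative assumptions, not field data, and operating costs, royalties and taxes are deliberately left out.

# Rough sketch of the cost stack described above; every figure is an
# illustrative placeholder drawn from the ranges quoted in the article.
barrels = 1_000_000
price_per_barrel = 100
gross_revenue = barrels * price_per_barrel            # $100 million

wells = 3                                              # hypothetical small development
seismic = 30_000_000                                   # "up to $30 million" for acquisition/processing/interpretation
land = 30_000 * 120 * wells                            # $30,000/acre, ~120 acres per well
drilling = 7_500_000 * wells                           # $5 million-$10 million per U.S. onshore well

upfront = seismic + land + drilling
print(f"gross revenue:   ${gross_revenue:>12,}")
print(f"up-front outlay: ${upfront:>12,}")
# Operating costs, royalties and taxes (not modeled here) consume most of the
# remainder, which is how a $100 million find nets roughly $1-12 million of profit.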
SOLUTION
The integration and mining of data
produced in the hydrocarbon finding
and producing process offers amazing
potential for answering some of the big
questions facing the oil industry. The
biggest question: Where is more oil?
The next biggest question: How do we
get substantially more out of the ground
safely, with minimal environmental im-
pact? The less sexy, but possibly more
relevant question is: How do we use
data that has such potential to unlock
these answers?
The oil and gas industry can learn
much from the data, yet it is generally
used the same way it was historically
used. The industry must look at broad-
er areas and functioning of individual
components to create different views
and perspectives. For example, Drill-
inginfo, a leading data and intelligence
provider of upstream data for oil and
gas decisions, has begun to break the
barriers of geography and discipline to
consider many potential variables and,
based on thousands of wells, create a
statistically predictive model for a given
area's producibility.
Most analysis in oil and gas is done
within technical disciplines and within
a relatively small geographical study
area. This would be easier if the Earth's
properties were not so variable. Dep-
ositional systems vary greatly based
on rock types and the different layers
caused by ancient rivers, mountains,
plains and deserts. How do we learn be-
tween reservoirs when all of the drilling
environments are seemingly different
systems? Typically, the data is gath-
ered and stored in different databases,
file cabinets and various geoscientists'
and engineers' desks. Grains of similar
rock behave the same everywhere else
on Earth. While they are never laid down
the exact same way, the lessons learned
in one area could be extrapolated or ap-
plied to another area. Today, this process
is very manual and labor intensive.
Data science will help the oil and gas
industry learn more about each subsys-
tem and inject more accuracy and con-
fidence in every decision, ultimately
reducing risk. Big data analytics will be
key. While the concept is still in its infan-
cy as far as the oil and gas industry is
concerned, here are some possible near-
term big data analytical solutions:
Integration over a wide variety of
large data volumes incorporating
all relevant information for finding
additional hydrocarbons, and
identifying the data and the best
known technologies to produce it for
that particular system (the recipe).
Make daily operational data relevant
to reduce operating costs and improve
recovery rate.
Decision management: take into
consideration all knowns and local
conditions and quickly identify if or
how to proceed.
In the oil and gas industry, all data is
critical, but not all pieces are critical for
every decision. So how do we break down
every decision, identify every potential
piece of contributing data and quickly sift
through to a decision?
Other industries are embracing big
data analytics, but the oil and gas indus-
try is just now getting the concept. The oil
and gas industry has dealt with big vol-
ume, variety and velocity, but must start
thinking beyond self-made boundaries to
truly capture the benefit awaiting.
In the past decade, oil and gas tech-
nology focused on the finding half of
the equation and benefitted from imaging
breakthroughs in the healthcare industry.
The magnetic resonance imaging (MRI)
technology that doctors use to see inside
of humans without a knife has also proved
useful to see into the rock from inside the
wellbore, identifying useful rock and flu-
id properties. For the producing side of
the equation, the digital oil well, digital oil
field and data formatting standards have
led to operators putting gauges and data
gathering devices on everything possible
in the field. In the continuum of descrip-
tive, predictive and prescriptive ana-
lytics, the oil and gas industry is just
learning how to use this data to make
decisions. The descriptive portion (per
device) is being embraced. Capturing
this data allows very large fields to be
managed from a central command
center rather than by people physically
checking every well. In a field with thou-
sands of wells, where one person once
checked five to 10 wells per day, per-
sonnel costs and the cost of optimizing
production have been vastly reduced
while production has increased. One of
the first implementations of the digital
oilfield was Occidental Petroleum's Elk
Hills field in Bakersfield, Calif.
A decade ago, the term digital oil-
field meant installing digital gauges
and transmitting devices for production
rates and pressures (instead of manual
readings). Field personnel could target
wells that were down or having prob-
lems. In the next stage, companies dis-
played this data in one room and then
on the same big computer screen as an
earthmodel built by the geoscientists
and engineers. Todays stage is a
rather linear extension from this hard-
ware-centric integration solution, with
dashboards and software making digi-
tal monitoring and operations more ef-
fective, but more can be done.
Soon we will not just capture data
and view it, which still requires experi-
enced personnel to make a large num-
ber of decisions. We will have smarter
solutions, with built-in intelligence, so
computers can make simple decisions,
while indicating a set of potential out-
comes to the user in more difficult situ-
ations, helping with faster decisions
based on best practices. Ultimately,
costs for these operations will be cut
and production will go up.
Soon we will have automated inter-
pretation systems that learn while the
user interprets seismic data, and we'll
begin completing more of the interpreta-
tions, as they improve in understanding
the user's selection methods. First ver-
sions of this concept exist today. The
next stage of development must focus
on merging this knowledge with other
data types to identify what is productive
and what is not. Further, the economics,
necessary machinery, personnel and en-
vironmental risks should be considered as
the seismic work is done. This is possi-
ble! We have the data! We have the ana-
lytical expertise. They've just never been
introduced to each other. The bottom
line: The oil and gas industry is ripe for
big data analytics, whether through new
software, new middleware, data handling
solutions or data manipulation tools. The
primary focus areas should be finding it,
producing it and operations.
CONCLUSION: IT MUST HAPPEN
While we have plenty of data and
challenges to undertake, how do we
bridge the gap between the two? One
way the oil industry tries is through ven-
ture capital funds. Both Shell and Chev-
ron have their own such funds to create
an outlet to explore new ideas. While
not enough, the majors have begun to
embrace analytics and others are find-
ing ways to follow.
The oil and gas industry needs
more cross-fertilization. As oil and
gas companies awake to the potential
of analytics, many jobs will be created
for data scientists, opening a portal
for new applications and ideas to en-
ter the industry.
Many technology providers exist in
the industry. The successful ones from
the past decade must embrace big data
analytics to succeed in the future. This
is challenging enough, but it will also
require a mindset change. Few are
poised to do so, but such companies
will have the strategic data and intelli-
gence to train new applications to be
smarter and provide more complete
solutions than stranded static data in
empty point software products.
The oil and gas industry has an op-
portunity to capitalize on big data an-
alytics solutions. Now the oil and gas
industry must educate big data on the
types of data the industry captures in
order to utilize the existing data in fast-
er, smarter ways that focus on helping
find and produce more hydrocarbons,
at lower costs in economically sound
and environmentally friendly ways.
Adam Farris (adam.farris@drillinginfo.com) is senior
vice president of business development for Drillinginfo.
Based in Austin, Texas, Drillinginfo is the second
largest oil and gas data and analysis marketplace in
the country. Farris has 15 years of management and
engineering experience in the oil and gas industry.
Hes a member of Society of Petroleum Engineers and
the Society of Exploration Geophysics.
Distribution Processing and the arithmetic of uncertainty
BY SAM L. SAVAGE
Non-profit organization promotes standards for making rational, auditable calculations based on probability distributions.
Distribution Processing per-
forms calculations with un-
certainties just as if they
were ordinary numbers. It
can be done in Microsoft Excel without
using macros, and it leverages the cur-
rent generation of simulation tools. Prob-
abilityManagement.org is a non-profit
organization actively promoting this new
arithmetic of uncertainty through edu-
cational tools, standards and best prac-
tices. This article introduces distribution
processing in Excel and describes the
ecosystem into which it fits. But first let's
take a detour through medieval Italy to
review the adoption of the arithmetic we
take for granted today.
WHAT DID FIBONACCI BRING TO ITALY?
The famous mathematician, Fibonac-
ci, returned to Italy in 1199 A.D. with ab-
solutely nothing, nada, zilch, zero. After
studying with leading Arab mathematicians
of the time, he also brought the rest of the
Hindu-Arabic numbers, 1 through 9. But
zero was the important one, as it had no
representation in Roman numerals. This
made it hard to do arithmetic at all, let alone
get your financial statements out on time.
When Fibonacci attempted to promote
his new system, the typical response was:
"Can't you see we're busy? We need to
multiply these two Roman numerals to-
gether by next March!" So after beating
his head against the wall for a couple of
years, he published his book, Liber Aba-
ci, in 1202, which showed the business
community how Arabic arithmetic vastly
simplifed calculations involving interest,
currency conversion, etc. Apparently he
didn't even invent the famous Fibonacci
series [1], but merely presented it in his
book as a challenge, as in "try doing that
with your lousy Roman numerals!"
Ultimately 0 through 9 caught on and
the rest is history, as chronicled in a love-
ly book on zero by Charles Seife [2].
The book contains a woodblock of a
medieval quiz show in which a contestant
with Arabic numbers is up against anoth-
er with an abacus (Figure 1) [3]. Roman
numerals were undoubtedly eliminated in
the previous round. A Medieval show host,
Vanna Bianca, is presenting the ques-
tions. Her fond gaze implies something
might be going on after the show with Mr.
Arithmetic, who has a look of smug con-
centration on his face. From the frown on
the fellow with the abacus, he has appar-
ently just soiled his medieval undergar-
ments. Beyond these artistic touches, the
graphic indicates that calculation was not
just an academic exercise in 16th century
Europe, and that there were still compet-
ing methodologies.
THE GOALS OF PROBABILITY
MANAGEMENT
With this in mind, the broad goal of Prob-
ability Management is to do for uncertainty
what Fibonacci did for numbers. That is,
to promote standards for making rational,
auditable calculations based on probabil-
ity distributions. There are three prevalent
ways of calculating with uncertainties to-
day. The first and most widespread is to
ignore distributions altogether and replace
them with single average numbers. This
leads to a host of systematic errors, which
I refer to collectively as the Flaw of Aver-
ages [4]. The second approach is to ap-
ply classical statistical techniques. This has
provided groundbreaking insights in fields
as diverse as supply chains and finance,
but for the everyday manager it falls into
the Roman numeral category. The third
is computer simulation, which is the most
general. However, there are three barriers
to this approach. First, it requires statistical-
ly trained experts to generate the distribu-
tions of uncertain inputs; second, it requires
specialized software to perform the calcu-
lations; and third, the results are difficult for
management to interpret.
DISTRIBUTION PROCESSING: MONTE
CARLO SIMULATION AND MUCH LESS
The first problem is solved by allow-
ing statistical experts to store simulation
trials as data in Stochastic Information
Packets (SIPs). Because a SIP can con-
tain thousands of potential outcomes in-
stead of a single number it cures the Flaw
of Averages. Furthermore, native Excel
can do SIP math, replacing numbers with
uncertainties, thereby bringing down the
second and third barriers. I call this Distri-
bution Processing, and it may be viewed as
Monte Carlo simulation without the Monte
Carlo. Instead of generating random num-
bers, the user simply performs SIP math
with pre-generated, auditable inputs. If you
think of traditional simulation as generating
electricity instead of random numbers, then
distribution processing is a light bulb for il-
luminating uncertainty.
SIP MATH WITH DATA TABLES IN
MICROSOFT EXCEL
In the late 1980s, economist William
Sharpe of CAPM (capital asset pricing mod-
el) fame used data tables in Lotus 1-2-3 to
perform simple Monte Carlo simulations
using random numbers generated within
the spreadsheet. I have used data tables
to teach interactive simulation in spread-
sheets since the early 1990s [5]. Back then
this would often crash Excel, but it was still
an inspirational teaching tool.
I recently re-examined this approach,
but using SIPs instead of =RAND(). I
was amazed at how fast and robust data
tables had become on the latest machines
and versions of Excel. As a bonus, Excel
2010 introduced sparklines, tiny charts in-
spired by Edward Tufte [6] that can repre-
sent the shape of a distribution in a single
cell. The accompanying sidebar story de-
scribes how to do this from scratch in the
privacy of your own cubicle. Below I'll dis-
play some results and discuss some impli-
cations of this approach. You may follow
along by downloading the examples from
www.ProbabilityManagement.org.
SPARKLAND
We'll start with a file I call Sparkland,
because it represents the results of simu-
lations as sparklines. Figure 2 depicts a
portion of a worksheet that took only a
few minutes to create in Excel 2010 and
uses no macros.
The sparklines in cells B3 and C3 are
histograms of SIPs of 10,000 trials of inde-
pendent, uniform random variables.
Figure 1: Woodblock of a medieval quiz show.
SIPmath videos
SIPmath, Part 1: Calculating with Uncertainties
SIPmath, Part 2: The Model that Launched 1,000
Mutual Funds
SIPmath, Part 3: Its Done with Data Tables
SIPmath, Part 4: More SIPmath Examples
(Central Limit Theorem, Lognormal)
Cell B5 is shown in edit mode while
the formula =B3+C3 is being entered.
The instant the formula is completed,
the worksheet appears as shown in
Figure 3. The sum of two uniform distri-
butions is triangular; for example there
are more ways to get a seven than a
two or 12 on a pair of dice.
When the formula in B5 is replaced
with =Cos(B3+C3), then you instantly
get the histogram in Figure 4, not the
type of calculation you can do in your
head. Click the undo key and instantly
you're back to Figure 3.
The resulting output SIPs may be
interactively analyzed with Excels sta-
tistical functions, such as AVERAGE,
STDEV or PERCENTILE.
Hey Fibonacci, try doing that with
your lousy Arabic numerals!
OK, I hear some of you saying,
"When would I need to calculate the
cosine of a sum of uniforms?" and
you have a point. So let's consider a
more practical example, the model that
launched a thousand mutual funds: the
1952 mean variance analysis of Harry
Markowitz [7, 8].
A PORTFOLIO MODEL
Figure 5 shows a distribution-pro-
cessing dashboard based on SIPs of
returns of seven financial assets. As
Figure 2: Portion of worksheet created in
Excel 2010.
Figure 3: Worksheet instantly depicts formula as
histogram.
Figure 4: Changing the formula instantly
produces new histogram.
you change the weights in your portfolio,
a simulation of 1,000 trials is run instan-
taneously, with the risk, return and distri-
bution of results displayed graphically. A
partial view of the stochastic library driv-
ing this model appears in Figure 6.
The primary data element is the SIP,
containing pre-generated trials for each
asset. Of course, in general there is sta-
tistical dependence between the various
inputs that must be maintained. So what is
required for this model is a Stochastic Li-
brary Unit with Relationships Preserved,
or a SLURP [9] of all seven assets.
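A hedged Python sketch of the same idea follows (an illustration only: the SLURP is represented as a two-dimensional array of pre-generated trials whose columns preserve the dependence between assets, and the synthetic means, covariances and weights below are not from the article's library).

# Sketch: a SLURP as a 2-D trial array (rows = trials, columns = assets).
import numpy as np

rng = np.random.default_rng(1)
trials, assets = 1_000, 7

# Synthetic correlated asset returns standing in for a real SLURP library.
mean = np.full(assets, 0.07)
cov = np.full((assets, assets), 0.0008) + np.eye(assets) * 0.03
slurp = rng.multivariate_normal(mean, cov, size=trials)

weights = np.full(assets, 1 / assets)   # portfolio weights chosen by the user
portfolio = slurp @ weights             # one output trial per input trial

print("expected return:", round(portfolio.mean(), 4))
print("risk (std dev): ", round(portfolio.std(ddof=1), 4))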
ProbabilityManagement.org promotes
formatting standards for such libraries to
maximize the potential for sharing and
collaboration among statistical experts
and end users across diverse fields.
DISTRIBUTIONS AS DATA
The library shown above is based on
lognormal distributions of asset returns.
The way modern finance is taught, this
assumption was apparently welded in at
the factory. This is analogous to a formula
in Excel having the number 7 embed-
ded in it. Data and formulas should not be
mixed! The beauty of distributions as data
is that different distributional assumptions
may be swapped in and out. If you toggle
the Library control from Lognormal to Fat
Tailed, 7,000 trials are instantly loaded
into the model, and all the results update
instantly.
Figure 7: Toggling from Lognormal to Fat Tailed.
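The design point, that the downstream formula stays fixed while the trial library feeding it is swapped, can be sketched as follows (Python, with synthetic stand-ins for the Lognormal and Fat Tailed libraries; none of the numbers come from the article's model).

# Sketch of "distributions as data": the calculation never changes, only the library.
import numpy as np

rng = np.random.default_rng(2)
libraries = {
    "Lognormal": rng.lognormal(mean=0.05, sigma=0.2, size=7_000),
    "Fat Tailed": rng.standard_t(df=3, size=7_000) * 0.2 + 1.05,
}

def portfolio_value(return_factors, stake=1_000):
    # The "formula" stays fixed, exactly like a SIP-math cell in the spreadsheet.
    return stake * return_factors

for name, sip in libraries.items():
    values = portfolio_value(sip)
    print(name, "-> 5th percentile:", round(np.percentile(values, 5), 1))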
The idea of probability distributions as
data is not new. In 1955 the Rand Corpora-
tion published a book containing one million
random digits [10] (a hit with insomniacs).
And data-driven simulation is at the heart
of statistical bootstrapping as pioneered
by Brad Efron [11]. The Fed runs simula-
tions that generate sample paths of future
economic conditions [12]. Numerous large
firms and risk management systems have
their own internal databases of scenarios
for stochastic modeling in many contexts.
A confluence of technologies is now
making it not only practical, but indeed
necessary to better coordinate this sort of
data. Nobel Laureate and board member
of ProbabilityManagement.org, Harry Mar-
kowitz, puts it this way: "Decision-makers of
the future will have access to an ocean of
data: some generated by their own orga-
nizations, some downloaded from networks
including, but not limited to, historical se-
ries collected by governments and private
Figure 5: Distribution processing dashboard based on a stochastic library.
Figure 6. Partial view of the stochastic library driving the model.
firms, series generated by simulation runs,
statistical analyses based on one or more
of the foregoing, etc. The standardization of
the representation of such information
(with associated provenance!) is essential
to not clumsily drown in this ocean of data."
NETWORKED ANALYTICS
Network effects provide economic
benefit. A single telephone is worthless,
but a network of phones brings increas-
ing value as it grows in size. Add cam-
eras, maps and angry birds, and you get
a whole new economy.
In principle any analytical software
that manipulates simulation scenarios
can be networked through stochastic li-
braries. Some packages (notably Analytica from Lumina Systems, Crystal Ball from Oracle Corporation, Risk Solver from Frontline Systems and XLSim from Vector Economics) are already compliant by design with the principles of Prob-
ability Management. Many others, such
as @RISK, JMP, Tableau and Matlab are
a mere import or export away.
Networked analytics will play an impor-
tant role in such enterprise-wide activities
as the stress testing of financial institutions, management of portfolios of R&D projects and in communicating forecasts as distributions rather than single numbers. At ProbabilityManagement.org we are striving to establish standards to help this network flourish.
Sam L. Savage (savage@stanford.edu) is executive
director of ProbabilityManagement.org, author of The
Flaw of Averages and consulting professor at Stanford
University. In 2012, ProbabilityManagement.org was
incorporated as a non-profit with board members Harry Markowitz (Nobel Laureate in Economics) and Michael Salama (lead tax counsel for Walt Disney) with the goal
of promoting more consistent and auditable ways of
communicating and managing uncertainty. Savage has
a long history of bringing analytics to spreadsheets,
spanning his design of WhatsBest!, which coupled Lotus
1-2-3 to linear programming in 1985, to distribution
processing in Excel in 2012. Savage holds a Ph.D. in the
area of computational complexity from Yale University. He
is a senior member of INFORMS.
REFERENCES
1. The Fibonacci sequence {F_n}, n = 1, 2, ..., is prescribed by the linear recurrence F_n = F_{n-1} + F_{n-2} with F_1 = F_2 = 1.
2. Charles Seife, Zero: The Biography of a
Dangerous Idea, Penguin Group, 2000.
3. Gregor Reisch, Margarita Philosophica,
Strassbourg, 1504.
4. Sam L. Savage, The Flaw of Averages: Why We Underestimate Risk in the Face of Uncertainty, John Wiley, 2009, 2012.
5. Sam L. Savage, Decision Making with Insight,
Cengage Learning, 2003, p. 44.
6. Edward Tufte, Beautiful Evidence, Graphics
Press, 2004.
7. Markowitz, H.M., Portfolio Selection, Journal of Finance, Vol. 7, No. 1, pp. 77-91, March 1952.
8. Markowitz, H.M., Portfolio Selection: Efficient Diversification of Investments, second edition, Blackwell Publishers, Inc., Malden, Mass., 1957, 1997.
9. Sam Savage, Stefan Scholtes and Daniel Zweidler,
Probability Management, OR/MS Today, February
2006, Vol. 33 No. 1.
10. www.rand.org/pubs/monograph_reports/MR1418/
index2.html.
11. Efron, B. and Tibshirani, R., An Introduction to the
Bootstrap, Boca Raton, Fla.: Chapman & Hall, 1993.
12. www.phil.frb.org/research-and-data/real-time-
center/PRISM/.
SIP math in Excel
You can do everything described below in Excel
2007 except the sparklines used to create the
Sparkland worksheet.
The proper way to learn this is to download the
tutorial file shown in Figure 8 from
www.ProbabilityManagement.org. But if you insist
on being told, here goes.
The components are as follows:
1) The stochastic library. This is comprised of one
or more columns (or rows) of pre-computed
Monte Carlo trials (SIPs). The SIPs in Figure
8 were created with two columns of RAND()
formulas, which were then copied and pasted as values using Paste Special.
2) Index formulas that point into the two SIPs.
3) This cell drives the Index formulas, and is also
the Column Input cell of the Data Table.
4) The formula specifying the SIP math, in this case the sum of the SIPs referenced by the Index formula cells (C14+D14). (A Python sketch of the same construction appears after this list.)
5) The Data Table. The use of this powerful but
arcane feature of the spreadsheet is a folk art,
passed down in the aural tradition. If you are
not familiar with it already, I suggest the tutorial
mentioned above or a knowledgeable colleague
for guidance.
6) The bins and frequency formulas for the
histogram. Frequency is an array formula,
another folk art. However, Excel help is pretty
useful on this one. Just don't forget to press
<Ctrl><Shift><Enter> while holding down the
<N> key with your nose as you enter the formula.
OK, I was just kidding about the <N> key, but you
need all the others.
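For those who would rather see the construction in code than in a worksheet, the following Python sketch mirrors the components above under simplifying assumptions: the two SIPs are uniform trials (standing in for the pasted RAND() columns), the SIP math is their sum (the C14+D14 step), and NumPy's histogram plays the role of the Data Table plus FREQUENCY. The trial count and bin edges are chosen arbitrarily for the illustration.

import numpy as np

rng = np.random.default_rng(42)
n_trials = 1000

# The stochastic library: two columns of pre-computed uniform trials.
sip_a = rng.uniform(0, 1, n_trials)   # analogous to a RAND() column pasted as values
sip_b = rng.uniform(0, 1, n_trials)

# The SIP math: the formula applied trial by trial (the C14+D14 step).
output = sip_a + sip_b

# The Data Table / FREQUENCY step: count how many trials fall in each bin.
bins = np.linspace(0, 2, 11)
counts, _ = np.histogram(output, bins=bins)
print(counts)

Because the trials are stored rather than regenerated, re-running the script reproduces the same histogram every time, which is the reproducibility point made below about replacing the Index formulas with RAND().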
So how about Sparkland? Sparklines in Excel can reside in the same cells as formulas. So if you take the
above example, and put sparklines of the histograms of the two uniforms in the cells containing the Index
formulas (2), and put a sparkline of the output cell's histogram bins in the formula cell (4), then you get the
behavior shown in Figures 3 and 4.
And note, you can replace the Index formulas (2 in Figure 8) with RAND() formulas and Excel no longer
blows up. Of course this does not yield the reproducible results obtainable with SIP math.
Figure 8: The tutorial file from www.ProbabilityManagement.org.
For analytics, the singer is more important than the song
BY GARY COKINS
Effective communications: How to achieve buy-in during the Twitter-influenced, short-attention-span era.
Most advocates of analyt-
ics do not have problems
performing analysis. Their
challenge is effectively
communicating with their colleagues
and getting them on board and getting
buy-in.
Consider this example. An analyst
comes up with an idea, perhaps to iden-
tify the most valuable type of customer
to pursue for actions to lift profits. The analyst conducts a meeting, explains the problem to be solved or the opportunity to pursue, and describes a solution. The analyst concludes the meeting by defining and scheduling next steps.
Now imagine the analyst gets to secretly listen as the meeting attendees share with others their impressions of the analyst's proposal. What
might the analyst hear? Attendees might
confess they did not really understand
the problem, opportunity or solution.
Some might react negatively, indicating
that they do not like the solution. Some
might not like the analyst and his or her
message.
THE PROBLEM WITH GETTING AHEAD OF
YOUR AUDIENCE
The root problem is that the analyst
assumes everyone understands the is-
sues and that they are all on board and
ready to take the next steps. Don't be so
hasty. A mistake made by idea genera-
tors, such as the analyst, is assuming that
informing and educating their colleagues
equates to having their support and com-
mitment. The analyst has erroneously
gotten ahead of his or her audience.
In his book Beyond the Wall of Re-
sistance [1] author Rick Maurer uses a
similar example as an introduction to his
copyrighted Cycle of Change wheel dia-
gram. The cycle includes these stages:
In the dark: The people you want to influence are unaware of the problem and therefore the need for change. They do not see reasons to act on the issue.
See the challenge: Maurer refers to this as the "aha" moment when someone acknowledges the problem, threat or opportunity. This is the most critical stage in the cycle. Once people see the challenge, it becomes possible to align them and advance in the cycle.
Get started: These are the typical initial actions such as identifying objectives and making plans. The obstacle is people who are still in the dark or who do not see the challenge. They are not ready to get started.
Roll out: Roll out is sometimes confused with a successful conclusion. It is just the beginning. The system may be in place and people trained, but the roll out means you are just starting the analysis or project implementation.
Results: This is the stage where the benefits, partial or full, are realized. Examples of benefits may be increased revenues, cost reductions or reduced cycle time. Results are the test of both the idea and its solution including the implementation.
Time to move on: Eventually the idea and its implementation are completed. Minor refinements can continue, but generally at this point the change has served its purpose, and there are new priorities to pursue.
The more critical obstacles are in
the beginning stages. In addition to the
previously stated mistake of assuming
everyone is on board and prematurely
advancing to the "get started" stage,
Maurer cites other mistakes. One is
underestimating the potential power of
co-worker and management engage-
ment. Employees who are disengaged
can break the spirit of those who are.
Another mistake is not recognizing how
much resistance to change and fear may
reside in co-workers.
An additional and critical mistake
Maurer cites is a failure to acknowledge
how lack of trust or confidence in the analyst or leadership can prevent pursuing a good idea. The adage from the movie Field of Dreams was, "If you build it (the baseball field), then they will come." That won't work.
THE WHY BEFORE THE HOW
The lesson here is to not move to action
too quickly. Until understanding and educa-
tion are firmly in place, others might tune out or even sabotage an idea. In the "see the challenge" stage of the cycle, analysts, as change agents, should be empathetic.
They need to place themselves in the shoes
of their audience and consider what may
emotionally affect the audience.
Do not underestimate the impor-
tance of making a strong case for the
idea or change. Do not rush to the "how" unless everyone understands and agrees to the "why."
At any stage in the cycle of change
there is vulnerability to resistance that
can emerge from any missteps. Resis-
tance to change is human nature. Peo-
ple like the status quo. Maurer warns
change agents to watch out for three
increasing levels of concern:
1. Logical concern: Confusion vs. understanding; your audience is thinking, "I don't get it."
2. Emotional concern: Fear vs. a favorable reaction; your audience is thinking, "I don't like it."
3. Personal concern: Mistrust vs. confidence; your audience is thinking, "I don't like you."
People are seldom neutral. They are
either supportive or opposing.
EFFECTIVE COMMUNICATIONS: SHORT
AND SWEET
Getting your message out involves
effective communication. Good com-
munication matters when you want
to convert opposing views, build that
business case earlier mentioned or, at
a minimum, have an audience to listen
to you. This is essential during the "in the dark" and "see the challenge" stages of Maurer's Cycle of Change.
Effective communication is compli-
cated in the Twitter era by the growing
deluge of social media and other dis-
tractions coupled with the shortening
attention span of people. Good, old-fashioned memory retention (or lack thereof) further complicates the problem, as evidenced by the Ebbinghaus Forgetting Curve. Hermann Ebbinghaus was a German psychologist who
researched memory in the late 19th
century. His tests revealed that imme-
diate recall and memory of a message
typically drops to 60 percent in 19 min-
utes, 45 percent in 63 minutes and 33
percent in a day.
So, what are effective ways to com-
municate? Most important is to keep
your message brief and simple.
Consider this example. On Nov.
19, 1863, at the dedication of Gettysburg's Soldiers' National Cemetery, the first speaker was a popular orator of the time, Edward Everett. His talk lasted more than two hours and included roughly 13,700 words. President Abraham Lincoln was also invited to speak. His address lasted two minutes
and included 271 words. One talk was
memorable and the other forgettable.
Lincoln did not quickly compose his
speech. He carefully crafted it. That re-
veals a difficult task: the shorter the time to communicate a message, the more thought must go into crafting it.
In the book Made to Stick: Why Some Ideas Survive and Others Die [2], authors Chip and Dan Heath emphasize key points to make a message "stick" with readers or listeners. The following quote is from Publishers Weekly's review of the book: "What makes stories memorable and ensures their spread around the globe? The authors credit six key principles: simplicity, unexpectedness, concreteness, credibility, emotions and stories. (The initial letters spell out 'success' ... well, almost.) They illustrate these principles with a host of stories, some familiar (Kennedy's stirring call to land a man on the moon and return him safely to the Earth within a decade) and others very funny (Nora Ephron's anecdote of how her high school journalism teacher used a simple, embarrassing trick to teach her how not to bury the lead)."
I have made many presentations for
my employers, admittedly to drum up
business and sales leads. I may not al-
ways have been perfectly on message
for what my employer was selling at the
time, but I have been effective at hold-
ing peoples attention and earning the
right to advance. Sometimes the singer
is more important than the song.
Analysts are passionate about their
ideas and techniques to gain insights
and provide foresight for their colleagues
to make better decisions. It is not suffi-
cient to have a good idea. Analysts need
to explain it, get buy-in for it, sell it and
implement it. You may not have a great
singing voice. Not everyone is a Frank Sinatra. But if your song is a decent one, singing it well and not getting ahead of your audience may be the secret to greater success for your ideas and analysis.
Gary Cokins (gcokins@garycokins.com) is an
internationally recognized expert, speaker and
author in advanced cost management and enterprise
performance and risk management systems. He
is the founder of Analytics-Based Performance
Management (www.garycokins.com), an advisory firm
located in Cary, N.C. He began his career in industry
with a Fortune 100 company in CFO and operations
roles. He then worked 15 years in consulting with
Deloitte, KPMG and EDS. In 1992 he joined SAS, a
global leader in business intelligence and analytics
software, as a principal consultant. His two most
recent books are Performance Management: Finding
the Missing Pieces to Close the Intelligence Gap
and Performance Management: Integrating Strategy
Execution, Methodologies, Risk, and Analytics. He is
a member of INFORMS.
REFERENCES
1. Rick Maurer, Beyond the Wall of Resistance
(revised edition), Bard Press, 2010.
2. Chip and Dan Heath, Made to Stick, Random
House, 2007.
Successfully operationalizing analytics
BY JAMES TAYLOR
A repeatable, efficient process for creating and effectively deploying predictive analytic models into production.
Organizations are increasingly adopting predictive analytics, and they're adopting it more broadly. Many are now using
dozens or even hundreds of predictive
analytic models. These models are now
used in real-time decision-making and
in operational, production systems.
However, many analytic teams rely on
tools and approaches that will not scale
to this level of adoption. These teams
lack a repeatable, efficient process for
creating and effectively deploying pre-
dictive analytic models into production.
To succeed they need to operationalize
analytics.
Operationalizing analytics requires
both a repeatable, industrial-scale pro-
cess for developing the dozens or hun-
dreds of predictive analytic models that
will be needed and a reliable architecture
for deploying predictive analytic models
into production systems.
The first step in operationalizing analyt-
ics is moving from a cottage industry to an
industrial process for building analytic mod-
els. This means moving away from individu-
al scripting environments where every task
is performed by hand, reuse is limited, only
small numbers of expert analytic practitio-
ners are involved and these practitioners
do not follow a standard process. Such an
approach can and does produce high qual-
ity models, but it cannot scale to allow an
organization to become broadly analytic in
its decision-making. A more industrialized
process has a number of characteristics.
Access to data is standardized based
on well-defined metadata and standard sets of data about customers, products, etc. Definitions of this data are shared and analytical datasets are generated in a repeatable and increasingly automated way. This more systematic approach to data management feeds into a workbench environment for defining the modeling workflow.
Products such as SAS Enterprise
Miner allow standard workflows to be developed and shared through a repository, streamlining and standardizing how modeling is performed. These workflows can use in-database mining capabilities to push data preparation, transformation and even algorithms into the data infrastructure, improving throughput by reducing data movement. In-memory and other high-performance analytic capabilities, as well as intelligent automation of modeling activities, can be applied as appropriate in the steps defined in the workflows.
Predictive analytic workbenches also
support the ongoing management and
monitoring of models once they are in
use. Workbenches allow an analytic team
to set up automated monitoring of models to see when they need to be re-tuned or even completely re-built. These capabilities also help track the performance of models to confirm their predictive power
and behavior.
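As a rough sketch of the kind of automated check such monitoring might perform (the metric and the 5-percentage-point tolerance are arbitrary choices for illustration, not any vendor's actual rule), a model can be flagged for re-tuning when its recent accuracy drifts well below the level recorded at validation:

# Minimal, illustrative monitoring rule: flag a model for re-tuning when recent
# accuracy falls more than an (arbitrary) tolerance below its validation baseline.
def needs_retuning(baseline_accuracy, recent_outcomes, recent_predictions, tolerance=0.05):
    correct = sum(1 for y, p in zip(recent_outcomes, recent_predictions) if y == p)
    recent_accuracy = correct / len(recent_outcomes)
    return recent_accuracy < baseline_accuracy - tolerance

# Hypothetical example: 82 percent accuracy at validation, a recent batch scoring 60 percent.
print(needs_retuning(0.82, [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
                           [1, 0, 0, 1, 1, 1, 0, 0, 1, 1]))   # prints True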
Finally, these workbenches can often
be wrapped with an interface suitable for
less technical users. Such features allow
less technical users to build and execute
workflows that take advantage of the automation capabilities to produce large numbers of good-enough models quickly. Working with an analytic team, these users can produce first-cut models and participate more fully in reviews of models, while allowing the analytic team to focus on high-value, high-complexity problems.
These kinds of capabilities allow orga-
nizations to develop an industrial-scale
process for developing predictive analyt-
ics. Operationalizing analytics requires
this, but it also requires a more sys-
tematic focus on the use of analytics in
operational systems. As organizations
expand their use of predictive analytics, it
becomes increasingly clear that the long-
term value of predictive analytics comes
in improving the quality and effectiveness
of operational decisions.
Operational decisions are highly re-
peatable decisions about a single cus-
tomer or transaction. Because they are
made over and over again, these deci-
sions create the data needed for effec-
tive predictive analytics. Because they
must be made by front-line staff or completely automated systems, they don't lend themselves to analytic approaches that rely on the end user, such as visu-
alization and query tools. To influence
these decisions using predictive ana-
lytics, organizations must embed ex-
ecutable analytic models deeply into
their operational systems, driving better
decisions in real-time in high-volume,
transactional environments.
Many analytic teams think of themselves as "done" when the model is built
and validated. A team focused on opera-
tionalizing analytics will instead focus on
the point at which operational decisions
are being made more effectively thanks to
analytics embedded in the systems used
to make those decisions. This requires
a change in focus and the use of some
more modern technologies.
One way to make predictive analytic
models available in operations is to use
in-database scoring infrastructure, which
takes models and pushes them directly
into the core of an organization's opera-
tional data stores. Once deployed, the
models are available as a function and
can be included in views or stored pro-
cedures. This allows operational systems
direct access to the result of the model
while ensuring that this is calculated live,
when requested, and not based on a po-
tentially out of date batch run.
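A toy illustration of the pattern, using Python's built-in sqlite3 module rather than any particular vendor's in-database scoring engine, and an invented two-variable logistic model: the model is registered as a database function and exposed through a view, so every query computes the score live instead of reading a stale batch result.

import math
import sqlite3

# Invented coefficients for illustration; a real deployment would load the
# fitted model produced by the analytic workbench.
INTERCEPT, B_AGE, B_BALANCE = -2.0, 0.03, 0.0005

def churn_score(age, balance):
    # Logistic score computed at query time.
    z = INTERCEPT + B_AGE * age + B_BALANCE * balance
    return 1.0 / (1.0 + math.exp(-z))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, age REAL, balance REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, 34, 1200.0), (2, 61, 250.0)])

# Register the model as a SQL function, then expose it through a view so
# operational queries always see a live score.
conn.create_function("churn_score", 2, churn_score)
conn.execute("CREATE VIEW scored_customers AS "
             "SELECT id, churn_score(age, balance) AS score FROM customers")
for row in conn.execute("SELECT * FROM scored_customers"):
    print(row)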
Predictive analytic models can also
be deployed directly into operational en-
vironments. The increasing adoption of
the Predictive Model Markup Language
(PMML), an XML standard for defining predictive analytic models, means that a model can be generated from an analytic workbench and then deployed into a variety of environments. Many business rules and business process management systems, for instance, support PMML. This allows them to load the definition of
a model directly and then execute it when
the rules or process need to know the
prediction.
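To make that concrete, here is a deliberately minimal sketch of reading a PMML regression model's coefficients with Python's standard XML tooling. The file name and field names are hypothetical, and a production system would use a real PMML execution engine rather than hand-rolled parsing.

import xml.etree.ElementTree as ET

PMML_FILE = "churn_model.pmml"   # hypothetical file exported from a workbench

def load_regression_coefficients(path):
    # Pull the intercept and per-field coefficients from a PMML RegressionModel.
    # PMML elements live in a versioned namespace; the {*} wildcard (Python 3.8+)
    # matches any namespace.
    tree = ET.parse(path)
    table = tree.getroot().find(".//{*}RegressionTable")
    coefficients = {"(intercept)": float(table.get("intercept", 0.0))}
    for predictor in table.findall("{*}NumericPredictor"):
        coefficients[predictor.get("name")] = float(predictor.get("coefficient"))
    return coefficients

def score(coefficients, record):
    return coefficients["(intercept)"] + sum(
        coefficients[name] * value for name, value in record.items())

# Example usage, assuming the hypothetical file above exists:
# coefs = load_regression_coefficients(PMML_FILE)
# print(score(coefs, {"age": 34, "balance": 1200.0}))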
In addition, many applications for fraud detection, credit risk management or customer intelligence have been designed to
allow new models to be rapidly integrated
into operational systems. As more pack-
aged applications tackle decision-mak-
ing, this capability will only become more
widespread, giving analytic teams more
deployment options and further reducing
the barriers to using predictive analytics
in operations.
To broadly and effectively adopt predictive analytics, organizations must operationalize analytics. Operationalizing analytics requires an industrialized process
for building predictive analytic models us-
ing technology that emphasizes automa-
tion, collaboration and reuse combined with
a focus on rapid deployment of predictive
analytic models into operational systems.
Note: For a complimentary copy of James Taylor's white paper on Operationalizing Analytics, click here.
James Taylor is CEO of Decision Management Solutions
(www.decisionmanagementsolutions.com). A consultant
and leading expert in decision management, Taylor
is also a writer, speaker and a faculty member of the
International Institute for Analytics.
Basketball genomics
BY WILLIAM CADE
Evaluation of performance: Evolution of the official box score.
What if basketball ana-
lytics could formulate an
end-all value that could
justly evaluate team and/or
player performance? Regardless of the
complexity of its formulation, those im-
mersed in the world of basketball analyt-
ics are challenged with this mission: to
translate a game of interdependent fac-
tors into simple measures of player and
team performance. The Four Factors
of Basketball Success, established by
basketball statistician pioneer Dean Oli-
ver, have long played a role in the un-
derstanding and the evaluation of team
success (www.basketball-reference.
com/about/factors.html). Simply put:
If you're better than your opponent at making field goals, creating turnovers, grabbing rebounds and getting to the foul line, then you're going to win many more games than you will lose.
Genetics is the
study of the variations
between humans and
how those variations
are passed through a
family. Described as
the cookbook of recipes that tells our
body how to grow and how to develop,
DNA is the basis of genetics. In genom-
ics, team research is conducted to inves-
tigate the complex instructions between
multiple environmental and genetic risk
factors.
Interestingly, the same advanced statistical methods implemented to discover and map the genes responsible for disease in families and populations can equivalently be modified to identify and evaluate the "basketball DNA" genes associated with success on a basketball team. In principle, the evaluation of performance in basketball analytics parallels the application of methodology utilized in
the analytics of genomics. So then, what is your favorite team's basketball DNA?
"When a player looks at the game, they begin with the least important statistic and that's playing time. They think that if they play a lot of minutes then everything will work out in their favor. Not just our guys, but I think players across the country think it's just all about playing time. If I play a lot, then I'll play well ... and well, you have to earn that!"
JIM LARRANAGA, HEAD COACH, UNIVERSITY OF MIAMI MEN'S BASKETBALL TEAM
A fundamental building block for the
measure of team performance is time, as a function of minutes (playing time). With-
in the analytics of basketball genomics,
the genetic makeup of playing time is se-
quenced and coded as a possession.
The amount of information observed and collected on a particular possession is
fundamental in identifying the basketball
DNA associated with/within a team (play-
er and/or lineup).
The Official Box-Score (play-by-play), for example, serves as an invaluable guide in understanding the analysis of the game on a fundamental level. With an unbiased precision, the composition of the Official Box-Score is two-fold. It provides a quarter-by-quarter description of events, along with descriptive measurements, used to inform how well or how poorly a player and/or team have performed. Statistics included in an Official Box-Score are field goals made and field goals attempted (FGM and FGA), three-point field goals made and three-point field goals attempted (3PM and 3PA), free throws made and free throws attempted (FTM and FTA), offensive rebounds (OR or OREB), defensive rebounds (DR or DREB), total rebounds (TREB), assists (A or AST), steals (S or STL), blocked shots (B or BS), personal fouls (F or PF), turnovers (TOV or TO), minutes (M or MIN) and points (P or PTS). Validated by the aforementioned play-by-play component of the Official Box-Score, the methodology for the evaluation of player and/or team lineup performance is displayed in the truncated example of game charting shown in Table 1.
With the use of advanced statistical software tools (SAS Version 9.3), I have extended the Official Box-Score and established a never-ending framework that can measure the offensive and defensive prowess for a basketball team, by lineup (per game, seasonally, etc.), entitled "Official Box-Score DNA." The principal areas of extension within Official Box-Score DNA include possession, field goal, rebound and free throw. Derivative and unique (*) statistics provided in Official Box-Score DNA are two-point field goals made and two-point field goals attempted (2FGM and 2FGA), offensive and defensive possessions (OP and DP), three-point field goal attempted offensive rebounds (O3REB and O3RB), three-point field goal attempted defensive rebounds (D3REB or D3RB), free throw offensive rebounds (FTOREB or FTORB), free throw defensive rebounds (FTDREB or FTDRB) and potential free throws (PFT).
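The article's charting-based definitions are not spelled out here, but the flavor of such derived statistics can be illustrated with ordinary box-score totals. In the sketch below the game totals are hypothetical, and the possession count uses Dean Oliver's common approximation rather than the possession coding used in Official Box-Score DNA.

# Illustrative only: derive a few extended quantities from standard box-score
# totals. The possession formula is Dean Oliver's common approximation, not the
# charting-based definition used in the article.
def derived_stats(box):
    two_fgm = box["FGM"] - box["3PM"]      # two-point field goals made
    two_fga = box["FGA"] - box["3PA"]      # two-point field goals attempted
    possessions = box["FGA"] - box["OREB"] + box["TOV"] + 0.44 * box["FTA"]
    return {"2FGM": two_fgm, "2FGA": two_fga, "EST_POSS": round(possessions, 1)}

# Hypothetical team totals for one game.
game = {"FGM": 28, "FGA": 60, "3PM": 7, "3PA": 20,
        "FTM": 15, "FTA": 22, "OREB": 10, "TOV": 13}
print(derived_stats(game))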
Subsequent basketball analytics ex-
ecuted with the utilization of advanced
statistical software tools produces mea-
surements of frequency, efficiency and precision relative to team performance.
Table 1: Game charting.
"Whereas a coach looks at playing well: earn your playing time by your performance on the court, by how well you defend, how well you rebound, how well you guard your man, how well you run the floor, how well you make good decisions on offense (make your shots, make your passes correctly, don't turn the ball over) ..."
JIM LARRANAGA
Complementary to the notable plus/minus basketball statistic that looks at the point differential when players are both in
and out of the game, the example in Table
2 illustrates team lineup performance for
the entirety of a single basketball game.
This impartial approach to quantifying team chemistry clearly identifies the qualities shared by players whose play on the court seems simply to flow. It can look at a variety of combinations of players on the court and clearly show which combinations have the biggest effect: the most impactful two-player, three-player and even five-player combinations for each game.
For example, suppose your favorite team's opponent is entering a tournament or playoff game setting, and game preparation requires exact knowledge of successful team lineup defensive performance. Game strategy and decision-making to implement the best team defense would naturally lend itself to some of the following questions: What lineup has played the most defensive possessions together? What lineup defends the two-point field goal attempt the best? What lineup rebounds the three-point field goal attempt the best? What lineup fouls the least? The solutions to these defensive questions of interest, with respect to game preparation, are illustrated in Table 3.
In essence, the collection of information provided by Official Box-Score DNA statistics allows for an efficacious way of showing the best-assembled combination of players on the court. Though
basketball analytics comes with its limita-
tions and imperfections, the pursuit of the
advancement of knowledge of the game
further incites ongoing analyses and a
penchant for better statistics!
William Cade (qstatscade@gmail.com), who holds a master's degree in public health, is a senior data analyst at the John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, and an institutional staff member of the University of Miami men's basketball team.
Table 2: Team lineup performance.
Table 3: Team defense.
INFORMS Conference on Business Analytics & O.R.
Nicole Piasecki, vice president of Business
Development and Strategic Integration at Boe-
ing, will deliver the opening keynote address at
the 2013 INFORMS Conference on Business
Analytics and O.R., April 7-9, in San Antonio,
Texas, kicking off two days of intensive, real-
world education on descriptive, predictive and
prescriptive analytics.
The conference program is being designed
by a 27-member committee of analytic prac-
titioners and academics, chaired by Jim Wil-
liams, manager of Advanced Analytics at Land O'Lakes. Two expanded, 10-session tracks will
focus on marketing analytics and the analytics
process. Other tracks will cover big data, sup-
ply chain management, forecasting, decision
analysis, soft skills and special topics such as
HR analytics.
The Franz Edelman Competition, showcasing the best in high-impact analytics, will be held April 7. A new track for 2013 will highlight applied work
that has received recognition in other competi-
tions, including finalists for the Innovation in Ana-
lytics Award and winners of the Wagner Prize,
INFORMS Prize, UPS George D. Smith Prize
and Spreadsheet Guru Prize.
While most speakers are hand-picked by
the committee to deliver talks on specific topics,
interested pro-
fessionals can
submit a pre-
sentation pro-
posal for a talk
or poster. Ac-
ceptance for
either program
enables the
presenter to
register at a 35 percent to 40 percent discount off
regular rates. The submission deadline for talks
(Select Presentations) is Dec. 15, 2012; the
deadline for posters is Feb. 1, 2013.
Select Presentations are 50-minute talks,
integrated into the regular conference program
and scheduled in special tracks. Talks cover
the full range of analytics topics, including best
practice examples, tutorials with a practice ori-
entation, case studies, emerging topics and
applications reports. The review process is
quite rigorous, with both content and speaking
expertise considered in the judging.
The poster format lends itself to works-in-
progress on which the presenter is looking for
feedback, successful projects that may not be
extensive enough for a 50-minute talk, and
corporate or consulting work subject to non-
disclosure restrictions that can still be present-
ed at a high level. Poster presentations will be
scheduled in standalone sessions on two days
of the conference.
For conference information and to submit
a proposal, visit http://meetings2.informs.org/
Analytics2013.
Certifed Analytics Professional
program prepares for launch
The Institute for Operations Research and the
Management Sciences (INFORMS) will hold the first exam for its new Certified Analytics Professional (CAP) program in
conjunction with the 2013 INFORMS Conference on Business
Analytics & Operations Research in San Antonio, Texas, in
April. The exam will also be offered at the INFORMS Annual
Meeting in Minneapolis in October 2013.
The certification program is designed to help analysts differentiate themselves in the profession, while simultaneously helping organizations looking for analytical talent to find qualified individuals with the skills needed to do the job.
In order to qualify to take the exam, the CAP process
requires candidates to have:
a bachelor's degree in a related area and at least five years of analytics work-related experience; or
a master's degree in a related area and at least three years of analytics work-related experience; or
a bachelor's degree in an unrelated area and at least seven years of analytics work-related experience.
In addition, all candidates must provide verification of soft
skills such as communication and speaking skills, ability to
make presentations, etc.
The 100-question, multiple-choice exam will include
questions in the following seven domains (with percentage
of weight toward the final score):
business problem (question) framing (15 percent)
analytics problem framing (17 percent)
data (22 percent)
methodology (approach) selection (15 percent)
model building (16 percent)
deployment (9 percent)
life cycle management (6 percent)
For more information on CAP including a candidate
handbook/study guide, click here.
San Antonio, home to the Alamo,
will host the conference.
INFORMS Healthcare 2013
In 2011, INFORMS introduced a new, high-
ly focused brand of conference with a three-
day meeting on healthcare. That meeting was
such a success both in terms of attendance
and quality of presentations that a second
INFORMS conference on healthcare is being
planned for 2013. Aimed at bringing healthcare
researchers and stakeholders together around
the most current research and practice, IN-
FORMS Healthcare 2013 will be held June 23-
26 at the Marriott Downtown in Chicago.
The conference will combine the deep re-
search focus of a traditional INFORMS meeting
with an emphasis on real-world application that
distinguishes the INFORMS analytics confer-
ence. General Chair Sanjay Mehrotra, professor
of Industrial Engineering and Management Sci-
ences at Northwestern University's McCormick
School of Engineering, has formed an Organizing
Committee of respected OR/MS professionals,
including Steven Shechter (University of Brit-
ish Columbia), Vedat Verter (McGill University),
Amy Cohn (University of Michigan), Hari Bala-
subramanian (University of Massachusetts-Am-
herst Diwakar Gupta (University of Minnesota)
and others. The INFORMS Health Applications
Society is a co-sponsor of the meeting and is ac-
tively involved in the planning.
The conference will offer a small-scale, col-
legial environment where high-quality talks are
presented by leaders in the field. All submissions
will be reviewed by the committee and chairs.
Authors are invited to submit in a choice
of formats, either for oral presentation or for
interactive poster sessions, and must submit
a two-page extended abstract as well as a 50-word summary. Abstract submission is now
open, with the deadline of March 1, 2013.
The conference venue, the Marriott Down-
town Magnificent Mile, is a renowned Chicago
hotel, situated at the best address in a great
city, with world-class dining and shopping, top
attractions and the best in theatre and muse-
ums within walking distance.
For complete information and to submit
an abstract, visit: http://meetings.informs.org/
healthcare2013/.
The Traveling (Christmas Tree) Salesman Problem
BY HARRISON SCHRAMM
Like many of us, I feel as though I've been an analyst all my life. While it was always part of who I am, it was not my profession until
relatively recently. One of the things I got to
do on the way here was to sell Christmas
trees. Since it's the November-December
edition of the magazine, I thought I would
take some time to think about the purchase
of Christmas trees from an analytic point of
view.
THE CHRISTMAS TREE EXPERIENCE
When a person goes to buy a Christmas
tree, what are they really purchasing? It is not
the tree itself but rather the experience of buy-
ing the tree. This is why Internet sales of real trees are unlikely to replace the traditional tree lot (but they do exist: www.atreetoyourdoor.com). Getting the Christmas tree is frequently a family affair and a family decision. Let's
suppose that there is a particular tree that the
salesman wants to sell, perhaps because he
gets a better commission, or perhaps he has a
bet with other salesmen about whether he can
sell it. How does the analytical tree salesman
approach this problem?
OPTIMAL SALES STRATEGY
The Christmas tree
salesman, if he's an analyst,
will quickly observe that the
tree decision is a collabora-
tive process. The decision is
based on two factors: first,
the quality, price and suit-
ability of a tree; and second, the comfort (read: coldness) of the coldest person. While the first factor is
the one that everyone says
they use, in practice, the
second factor may be more
important. So, frequently
the purchase is made when
one of the family's decision-makers reaches the critical level of discomfort and says "this one" in front of whichever one "this one" happens to be.
Now, family members typically come with the same kit (gloves, hat, scarf, coat, boots) but not the same rig. It was my experience that one member of the family, for ease of exposition we'll call him "Dad," typically comes with coat buttoned up, hat on, scarf around neck, gloves on, etc., while another family member, whom we'll call "Mom," has the same gear, but hat and gloves tucked uselessly in oversize pockets, coat open, scarf fashionably and uselessly draped over shoulders.
Let us presume that body temperatures are a proxy for comfort and that exposed skin experiences Newton's Law of cooling, i.e. dT/dt = -k(T - T_air), where T_air is the outside air temperature. If we presume that the parameter k is dependent on the amount of exposed skin, with larger areas of exposed skin leading to higher values for k, fully dressed-out family members will have values of k near zero.
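A quick numerical illustration of the model, with every constant invented for the example (37 degrees C skin, 0 degrees C air, a 15-degree comfort threshold, and guessed per-minute cooling constants for the two shoppers):

import math

# All constants are invented for illustration only.
T_SKIN, T_AIR = 37.0, 0.0     # starting skin temperature and outside air temperature (Celsius)
T_CRITICAL = 15.0             # exposed-skin temperature at which "this one!" gets declared
k_values = {"Dad": 0.01, "Mom": 0.04}   # per-minute cooling constants; more exposed skin, larger k

def minutes_until_critical(k):
    # Newton's Law of cooling gives T(t) = T_air + (T_skin - T_air) * exp(-k * t);
    # solve T(t) = T_critical for t.
    return math.log((T_SKIN - T_AIR) / (T_CRITICAL - T_AIR)) / k

for person, k in k_values.items():
    print(person, "reaches the critical level after about",
          round(minutes_until_critical(k), 1), "minutes")

With these made-up numbers, "Mom" reaches the critical level after roughly 22 minutes while "Dad" could browse for an hour and a half, in the spirit of the 22-minute example described in Figure 1.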
Immediately, we see that the optimal strate-
gy for family members is to have and properly
use warm clothing. We should not be surprised
that, as in other analytical problems, zero cost
optimal strategies are not always adopted in the real world!
Figure 1: Newton's Law of cooling applied to family members at the tree lot. Because of a greater proportion of exposed skin, Mom's exposed skin temperature drops dramatically faster than Dad's. When Mom's skin temperature reaches the critical level (marked on the chart), the tree immediately in front of the family becomes the chosen one. In this example, the time until purchase is 22 minutes.
Now consider the salesman's problem. He needs to do three things: First, estimate k for each of the family members; second, convert this into a time; and finally, determine a path through the tree lot that has the family shopping in front of a tree he wants to sell when this time occurs, without making it obvious that he's doing so.
The first two pieces of this are straightforward. The third problem might be hard (as in NP). It is a variation on the Traveling Salesman Problem. In the
Traveling (Christmas Tree) Salesman
Problem (TCTSP) the salesman needs to
select an acyclic, directed path through
the tree lot network which arrives at the
selected tree at the time that the deci-
sion-maker is cold (Figure 1).
I said that the problem might be NP-hard, but it turns out that in practice, it
is not terribly difficult. This is because
the arc times are not fixed, but have
some variability; i.e. the salesman can
adjust his speed as he walks through
the lot, provided that he isn't too obvi-
ous about it. This adds a
new degree of freedom
to the problem. In fact,
the problem has two de-
cision variables: First,
the route to take, and
secondly, the speed at
which to travel. Because
it's an easier day for the tree salesman to walk slower, we'll say that his objective is to minimize
speed (Figure 2).
In practice, this prob-
lem is not terribly difficult,
because the salesman
does not have to hit ex-
actly the right spot, and
he generally controls the
speed of the walk. For example, he will
typically only have two or three realistic
paths through the tree lot, which he may
solve by enumeration (Figure 3).
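Because the realistic paths can be listed, the salesman's problem can be solved by simple enumeration. The sketch below is an invented instance (the two paths, the admissible walking-time range on each segment and the 22-minute target are all made up): for each path ending at the tree he wants to sell, it checks whether some acceptable pace puts the family in front of that tree exactly when the decision-maker goes critical.

# Invented tree-lot instance. Each path to the target tree is a list of segments,
# and each segment can be walked in any time within [fastest, slowest] minutes
# without the detour looking suspicious.
paths_to_target_tree = {
    "path A": [(3, 6), (4, 7), (5, 7)],            # can stretch to at most 20 minutes
    "path B": [(2, 4), (6, 12), (3, 7), (4, 8)],   # can stretch from 15 up to 31 minutes
}
TARGET_TIME = 22.0   # minutes until the coldest decision-maker says "this one"

def feasible_total_time(segments, target):
    # Return the target time if some admissible pacing hits it exactly, else None.
    fastest = sum(lo for lo, _ in segments)
    slowest = sum(hi for _, hi in segments)
    # The salesman wants to walk as slowly as possible, so the best plan is to
    # stretch the walk to land on the target exactly, which is feasible only if
    # the target lies inside the achievable window.
    return target if fastest <= target <= slowest else None

for name, segments in paths_to_target_tree.items():
    plan = feasible_total_time(segments, TARGET_TIME)
    if plan is None:
        print(name, "cannot be stretched to", TARGET_TIME, "minutes")
    else:
        print(name, "works: pace the walk to take exactly", plan, "minutes")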
Of course, running a tree lot is not only
about selling the right tree for a commis-
sion. There's also:
Security. Christmas trees generally have a high markup, typically around 100 percent. This is not pathological, however,
as the trees are generally cheap, and
selling them can be expensive. Should
Christmas trees be guarded after hours?
Figure 2: Mathematical formulation of the Traveling Christmas Tree Salesman's problem. Note that it is the side constraint that is the cause of difficulty.
Figure 3: Traveling the tree lot: While the general formulation of the
TCTSP problem is hard, in practice the number of paths is small and
may be enumerated. Two possible paths through a notional tree lot are
shown here.
The rate of spillage in our experience
was around one to two trees per year. Again,
for reasons discussed in the introduction,
the customers are paying for the Christmas
tree experience, and we figured (probably rightly) that people who were willing to steal a Christmas tree probably needed it so badly that we would have felt sorry for them anyway. The vigilance issues (what is the probability of catching a tree thief on a large lot at 3 a.m.?) combined with the legal/liability issues (what are you going to do about it even if you did?) made security in our particular situation not worth it. Others may,
of course, come
up with a different
solution.
End of the season.
Christmas trees are
different from other
commodities be-
cause they have a known, fixed shelf-life. Depending on the climate, the consumer's value of a Christmas tree is a function of the time of purchase. Suppose that the first day that you may buy a tree is Dec. 1. If you live in a cold climate, buying that early brings the tree indoors and risks browning by Christmas Day. However, if you live on, say, Guam (as my wife and I did for two Christmases) you are better off buying a tree immediately; first because of the danger of the lot selling out, but more importantly because your house is cooler than the tree lot, and this prolongs the tree's useful life.
We may agree that Christmas trees
become worthless for retail on Dec. 26.
Also, there are very few people who go
out and buy a tree after Dec. 21. With
this in mind, we deeply discounted our
trees in the last few days of the season.
While someone may wait until Dec. 21 to buy a tree at a discount, this doesn't often happen in practice, and trees left on the lot after Christmas become a liability: you have to figure out how to dispose of them. Therefore, at some
point in the days immediately before
Christmas, the expected value of the
trees on the lot goes from positive to
negative, and the proprietor would re-
ally prefer to give them away. However,
there is a limited appetite for discount
trees; therefore, the first tree salesman to "fold," as it were, receives a bonus
in the form of getting rid of trees faster.
SUMMARY
I hope that you have fun shopping for
your tree this year. As a retired tree sales-
man, my preference is Douglas Fir, sin-
gle-wrap. They have just a little touch of
Charlie Brown to them, and they smell
wonderful in the house. Happy holidays
to all and best of luck in 2013!
This article is dedicated to my father, Dale.
Harrison Schramm (harrison.schramm@gmail.com) is a
military instructor in the Operations Research Department
at the Naval Postgraduate School in Monterey, Calif. He is
a member of INFORMS.
F I VE - MI NU T E ANALYS T
Figure 4: Values of Christmas trees as a function of date in various markets. Christmas trees increase in value at first in the United States because they are kept fresher, longer, by being on the tree lot. The opposite effect happens on Guam, where exposure to the heat and humidity tends to dry out the trees.
Combination locks
BY JOHN TOCZEK
Many people store their
valuables in home safes be-
cause they protect against
burglaries and fires. They
are a good place for stor-
ing insurance documents, car titles, cash and
many other valuables.
Figure 1 shows six dials that are on the front
of your home safe. In order to open the safe, you
must set each of the dials to one number. When
the correct number is selected for each dial, the
safe will open. Unfortunately you have forgotten
the combination. All you can remember is that
the numbers on all of the dials summed to 419.
QUESTION:
What numbers should you select in order to
unlock the safe?
Send your answer to puzzlor@gmail.com
by Jan. 7, 2013. The winner, chosen ran-
domly from correct answers, will receive an
Analytics - Driving Better Business Deci-
sions T-shirt. Past questions can be found
at puzzlor.com.
John Toczek is the manager of Decision Support and Analytics
for ARAMARK Corporation in the Global Risk Management group.
He earned a bachelor's degree in chemical engineering at Drexel University (1996) and a master's degree in operations research from Virginia Commonwealth University (2005). He is a member of INFORMS.
Figure 1: What numbers will unlock the safe?