
Oil, Gas, and Data

High-performance data tools in the production of industrial power


By Daniel Cowles
April 15, 2015

Introduction
When you hear "innovation in oil and gas," your first thoughts might go to
hardware (bigger, faster, deeper drilling; more powerful pumping equipment;
and bigger transport) or to the "shale revolution": unconventional wells,
hydraulic fracturing, horizontal drilling, and other enhanced oil recovery (EOR)
techniques. But as in any industry where optimization matters (and, given the
large capital investments and high cost of error, it arguably matters more in oil
and gas than in most other industries), the potential benefits of predictive
analytics, data science, and machine learning, along with rapid increases in
computer processing power and speed, greater and cheaper storage, and
advances in digital imaging and processing, have driven innovation and created
a rich and disruptive movement among oil and gas companies and their
suppliers.
The truth is, the oil and gas industry has been dealing with large amounts of
data longer than most, with some even calling it the "original big data
industry." [Karren, Charles. "Insight Report: Data Centre Developments Get up
Close and Personal." OFFCOM News. CTLD Publishing Ltd. Web. 20 Mar. 2015.]
[Boman, Karen. "What Upstream Oil, Gas Can Learn About Big Data from Social
Media." Rigzone News. Dice Holdings, Inc., 10 Dec. 2014. Web. 20 Mar. 2015.]
Large increases in the quantity, resolution, and frequency of seismic data, and
advances in "Internet-of-Things"-like network-attached sensors, devices, and
appliances, are being combined with large amounts of historical data, both
digital and physical, to create one of the most complex data science problems
out there, and a new industry is developing to help solve it.
In oil and gas, more than in almost any other industry, efficiency and accuracy
are highly valued, and small improvements in efficiency and productivity can
make a significant economic difference. When a typical well can cost upwards of
ten million dollars (and often much more), the cost of error is great, and
managing cost versus benefit can mean the difference between profitability and
loss. And, not unlike a tech startup, where a meaningful investment is required
upfront before knowing how much the return will be (if any), you may have to
dig many holes to find a successful well. Obviously, the more certainty you can
have, the better, and incrementally increasing certainty is where data science
and predictive analytics promise to help. The payoff from analytics isn't limited
to exploration: once a well has been successfully drilled, production efficiency
and optimization remain important to maximizing the well's ultimate recovery
over its lifetime.
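The point about incrementally increasing certainty can be made concrete with a simple expected-value calculation. The sketch below is a hypothetical single-well model, not an industry method: it assumes the full well cost is spent whether or not the well succeeds, that a success is worth a fixed payoff, and it uses made-up figures (a $10 million well, loosely following the figure above, and an assumed $40 million payoff for a successful well).

```python
def expected_value(p_success: float, payoff: float, well_cost: float) -> float:
    """Expected monetary value of drilling one well (hypothetical model).

    The well cost is paid whether or not the well succeeds; the payoff
    is realized only if it does.
    """
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return p_success * payoff - well_cost


def breakeven_probability(payoff: float, well_cost: float) -> float:
    """Lowest probability of success at which drilling breaks even."""
    if payoff <= 0:
        raise ValueError("payoff must be positive")
    return well_cost / payoff


# Hypothetical figures, in millions of US dollars.
WELL_COST = 10.0
PAYOFF = 40.0

for p in (0.20, 0.25, 0.30):
    ev = expected_value(p, PAYOFF, WELL_COST)
    print(f"p(success) = {p:.2f} -> expected value = {ev:+.1f} $M")
print(f"break-even p(success) = {breakeven_probability(PAYOFF, WELL_COST):.2f}")
```

Under these assumptions, better data that raises the estimated chance of success from 20% to 30% moves the same well from an expected loss of about $2 million to an expected gain of about $2 million.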
In addition, given crude price fluctuations and many other unpredictable outside
variables, capital project planning itself is rife with uncertainty, and large-scale
projects often face significant overages. "In 2011, upstream offshore oil and gas
projects: around 28% had a cost blowout of more than 50%, and the root cause
of that is, they got the numbers wrong," says Dominic Thasarathar, who
watches the energy sector for the Thought Leadership team at Autodesk. "Their
costs have gone up; they're dealing in everything from frontier environments to
difficulties raising finance." According to the International Energy Agency,
capital investments in energy projects have more than doubled since 2000, and
are expected to grow to around $2 trillion annually by 2035, so accurately
predicting cost versus benefit is extremely important. ["World Energy
Investment Outlook 2014 Factsheet - Overview." International Energy Agency.
OECD/IEA, 1 Jan. 2014. Web. 20 Mar. 2015.] "Where we see big data fitting in,"
continues Thasarathar, "is, if you look at the performance for those big
projects, it's pretty much a horror story in terms of how it's dropped off over the
last 15–20 years, and the root cause of that is, there's so much that project
teams need to understand and assimilate in terms of information to make the
right decision."
But exploration and production aren't the only areas that can benefit from
innovative, data science-driven solutions. From health, safety, and environment
to cyber security, transportation, and manufacturing, opportunities to create
greater efficiencies exist throughout the entire hydrocarbon production and
delivery cycle.

Overview
The oil and gas industry is traditionally broken down into three broad
categories: upstream, which includes exploration, discovery, and both land and
sea drilling and production; midstream, which includes transportation, wholesale
markets, and manufacturing and refinement of crude; and downstream, which is
primarily concerned with the delivery of refined products to the consumer. The
majority of big and fast data-related innovation is found upstream, in the
discovery and exploration phase, where risk and uncertainty are high,
conditions can be (to put it mildly) challenging, and failure is very
expensive.
The industry is a mature and unique one, built on experience and hard-won
knowledge, and employing the world's leading geological scientists and
engineers. They're very good at what they do, and they've been doing it for a
long time, but there is an imperative to add more big data and data science
skills like machine learning and predictive analytics into the mix, skills that oil
companies haven't traditionally and broadly had in-house. According to Boaz
Nur, former VP of Energy at data science startup Kaggle, energy analysts think
big data and analytics are the next frontier in oil and gas, but the industry is
only now in the early adoption phase. "They [oil and gas companies] don't shy
away from technology, they're just careful," Nur says. "A lot of snake oil has
been sold to the oil and gas companies over the years. They've also historically
done a pretty good job of producing oil. They're [already] doing OK; what we're
proposing will help them take it up to the next level." Adds Nur: "They're
cautious but they're optimistic."
Halliburton is using big data and data science techniques to try to solve a
variety of problems in the E&P (exploration and production) upstream phase.
"We are looking at trying to optimize seismic space, trying to optimize drilling
space, well planning," says Dr. Satyam Priyadarshy, Halliburton's recently hired
Chief Data Scientist. Priyadarshy is bringing some big data techniques to the
space: "For example, we are looking at how to optimize in the seismic world
through distributed computing [techniques] because it takes a long time to
process the data." But Priyadarshy says that it's a mistake to think that data
science methods and techniques are new to oil and gas. "They've actually been
using machine learning for many years," he says. "People have been using
neural networks, fuzzy logic, SVM, SVRs; pretty much any algorithm you want
to talk about in machine learning, they have been using it. But they have been
using these in limited cases, to limited value, and the goal is now for people like
us (data scientists) to build this into a more valuable product." He says that the
oil and gas industry is unique in terms of the complexity of the data and models,
and that turnkey solutions from other traditional big data industries can't be
easily applied here. "It's a complex challenge. It's not the same as the other big
data players," says Priyadarshy, who has worked widely on big data projects in
the news, media, Internet, and insurance spaces. "The complexity in the oil and
gas industry outweighs any other."
Because of that complexity, Priyadarshy stresses the need for domain
expertise when dealing with petrotechnical data, and he has his own definition
of the skills a data scientist should have for the space: "You need a person who
has domain expertise, a person who is a computer scientist, and a business
person; these three actually form a real domain data scientist for the oil and
gas space."
Another complication is legacy and historical data: some is digital, but much is
still found in binders and on paper. From a predictive modeling standpoint,
there's value to be had, but dealing with old systems and documents, often at
isolated physical properties or (as often happens in the industry) inherited
through acquisitions and neglected, makes integrating these pieces into your
model challenging.

Remote standalone locations and physical records and manuals also hamper
efforts to digitally connect a company's systems and assets: the much-discussed
"digital oilfield" idea, where systems are integrated and automated to
tune and optimize operations across the breadth of the production cycle. The
move to digital operations is increasing steadily, but "there's an awful lot of
legacy out there, things going back decades, where the drawings were done
with, literally, pen and paper," says Neale Stidolph, Head of Information
Management at Lockheed Martin, who is based in Aberdeen, where he primarily
deals with North Sea oil fields, including many older legacy wells. "A large part
of the industry is very much tied to documents and records. So, there's still a
need to maintain vast physical archives of boxes full of old information. And
there's a need to analyze and strip that to get more value." And since many of
the physical sites involved are isolated, supplying their own power without
modern communication networks, there are additional barriers to fully digitizing
operations. "One of the factors the rigs have to cope with is what they call a
black start," says Stidolph. "If your rig goes down, it means you've lost
everything: you've lost all power generation, all connectivity, all systems of every
type. You need a flashlight and you need a manual to be able to see how to get
this thing operational again." Many of these rigs are in hazardous and remote
environments, so off-the-shelf connectivity solutions aren't typically sufficient.
But, challenges and cultural resistance aside, big data methods are changing
how the industry does business, and these changes will ultimately reshape the
oil and gas industry itself.

Upstream
As previously mentioned, oil and gas has long been familiar with large and
diverse datasets, and improvements in technology and methodology are driving
an exponential increase in the amount of data being collected.
In the exploration space, for example, thanks to advances in seismic acquisition
methodology, storage capabilities, and processing power, data gathered via
offshore seismic acquisition has gotten both bigger, due to increased resolution,
and faster, due to increases in the frequency and rate of acquisition. The result is
4D data (x/y/z space, and time) at a far higher resolution, providing far better
understanding of subsurface deposits and reservoirs than previously possible.
Wide azimuth towed streamer (WATS) acquisition, seismic exploration using
multiple ships deploying a miles-wide array of acoustic equipment, allows
companies like Chevron and BP to create high-resolution topographic maps
under the earth and beneath salt canopies, and locate new oil fields that may
not have been found otherwise. ["Marine Seismic Imaging." BP.com. British
Petroleum. Web. 20 Mar. 2015.] Time-lapse seismic data acquisition also allows
them to see how reservoirs are behaving as oil begins to flow, allowing them to
optimize production once it begins. As the world's energy demands continue to
grow, and exploration efforts move farther offshore and into deeper waters, the
ability to accurately visualize deep, complex, subsurface topography is
essential. Recent deepwater discoveries in the Gulf of Mexico have been greatly
aided by new seismic techniques, and there is a direct relationship between
improvements in data storage and data processing, and improvements in
seismically generated image resolution, which in turn result in new and better
understood hydrocarbon discoveries. And there is still room for improvement in
seismic image resolution: "Even at very high resolution, the images
we can make today still have gaps bigger than the size of a conference room,"
says BP's John Etgen.

Well Optimization and Mature Wells


Although a lot of recent press and activity focus on the "shale boom" and other
unconventional extraction techniques, according to Halliburton, 70% of the
world's oil and gas comes from mature wells. ["Maximizing the Value of Mature
Fields." Halliburton.com. Halliburton, 1 May 2012. Web. 20 Mar. 2015.] A mature
well is usually defined as one where peak production levels have been reached
and extraction rates are on the decline, or where the majority of the relatively
"easy to get" hydrocarbons that the well will ultimately deliver have been
extracted. Typically, a well's early oil and gas is easier and cheaper to
extract, and the industry hasn't been enthusiastic about optimizing extraction
from mature wells, holding a common belief that there is an "economic limit"
where it costs more to get the resources out of the ground than they're worth on
the market. However, modern EOR techniques have become more efficient, and
sensor data and predictive models play a part in that. These wells have a wide
range of factors that make them more complicated (poor flow, poor rock
formations, bore cracks, complex geological conditions), but they still have a lot
to offer in terms of hydrocarbons. In an industry where small margins mean
large sums of money, getting the most from mature and end-of-life wells at the
lowest cost is another area where improved use of data can have a significant
impact on results. Again according to Halliburton, a 1% increase in production
from the mature fields currently active would add two years to the world's oil
and gas supply.
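The "economic limit" idea can be sketched numerically. The example below assumes a simple exponential decline, q(t) = q0 * exp(-D * t), which is only one of several decline models used in practice, and entirely hypothetical inputs (initial rate, decline rate, operating cost, and net oil price). It shows how lowering operating cost, one of the levers better data can help with, lowers the economic-limit rate and extends a mature well's producing life.

```python
import math


def economic_limit_rate(opex_per_day: float, net_price_per_bbl: float) -> float:
    """Production rate (bbl/day) at which daily revenue just covers daily operating cost."""
    return opex_per_day / net_price_per_bbl


def years_to_limit(q0: float, decline_per_year: float, q_limit: float) -> float:
    """Years until q(t) = q0 * exp(-D * t) falls to q_limit (0 if already below it)."""
    if q0 <= q_limit:
        return 0.0
    return math.log(q0 / q_limit) / decline_per_year


def remaining_recovery_bbl(q0: float, decline_per_year: float, q_limit: float) -> float:
    """Barrels produced before reaching the limit: the integral of q(t) up to that time.

    q is in bbl/day and D is per year, so the result is scaled by 365 days/year.
    """
    if q0 <= q_limit:
        return 0.0
    return (q0 - q_limit) / decline_per_year * 365.0


# Hypothetical mature well: 50 bbl/day today, continuous decline of 15% per year.
Q0, D, NET_PRICE = 50.0, 0.15, 50.0

for opex in (1_500.0, 1_200.0):  # $/day, before and after an assumed cost reduction
    q_el = economic_limit_rate(opex, NET_PRICE)
    print(f"opex ${opex:,.0f}/day: limit {q_el:.0f} bbl/day, "
          f"{years_to_limit(Q0, D, q_el):.1f} more years, "
          f"{remaining_recovery_bbl(Q0, D, q_el):,.0f} bbl left to produce")
```

In this toy case, cutting daily operating cost by a fifth lowers the limit from 30 to 24 bbl/day and extends the well's remaining life from roughly 3.4 to 4.9 years.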

Remote Sensors and Network-Attached Devices/IoT
There is already a lot of application of, and ongoing interest in, network-attached
devices, appliances, and Internet of Things-like connected devices in the oil and
gas space. Halliburton's Priyadarshy prefers the term "emerging technology
devices" to the "Internet of Things" label, which causes some confusion and
resistance in the industry. In any event, the remoteness, geographic breadth of
facilities and pipelines, hazardous environments, and inaccessibility of many
aspects of the oil and gas production cycle make it well suited to
automation and remote monitoring and optimization. Remotely monitored and
controlled devices can help lower cost, effort, and error in resource tracking,
and can decrease workforce overhead, improve logistics, and drive well and
operations automation and optimization. It's a big piece of the "digital oilfield"
concept, and one that the industry has already embraced.
Sensors of all kinds are already used throughout the detection, production, and
manufacturing cycle to better understand and monitor processes and gather
data. Sensors can capture fluid pressure, velocity and flow, temperature,
radiation levels (gamma ray energy is a useful indication in hydrocarbon
discovery), relative orientation and position, as well as the chemical and
biological make-up of physical materials. And as sensors trend toward cheaper,
smaller, connected arrays, newer microsensors can communicate with each
other and with external networks.
From exploration to the gas pump, there are opportunities to use networked
devices. Offshore, submersible devices that gather information can be remotely
controlled and are safer alternatives to human-piloted craft. Pumps can be
remotely monitored and adjusted, which can be far more economical than
manual maintenance. Midstream, in the transportation phase, networked devices
can help track resources through the many and various stages and handoffs that
happen throughout the crude transport process. Pipelines and remote
equipment can be monitored and even maintained remotely. Biomonitoring
workers could increase safety. Gartner has predicted as many as 30 billion
connected devices by 2020, with 15% of those in the manufacturing sector. [van
der Meulen, Rob. "Gartner Says Personal Worlds and the Internet of Everything
Are Colliding to Create New Markets." Gartner.com. Gartner, Inc., 11 Nov. 2013.
Web. 20 Mar. 2015.]
The data gathered from all these devices will be valuable for predictive analytics
and other applications: from well sensor data that can be analyzed to help
optimize productivity, to operations data that can monitor and calibrate
operational systems, to transportation data that can help identify bottlenecks
and inefficiencies, to workforce data that can help drive safety. But, as
Halliburton's Priyadarshy points out, those benefits also bring new
challenges, for example, sensor data veracity in different physical environments:
"Imagine a situation where you build a sensor for Texas weather. If you were to
take it to some Middle Eastern country like Kuwait, where temperatures are
[significantly] higher, if the sensor starts sending data and you are trying to
predict based on what you know from Texas, then you may be in deep trouble."
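Priyadarshy's Texas-to-Kuwait example is a case of a model receiving inputs outside the conditions it was trained on. One simple guard, sketched below with hypothetical names and made-up temperature data, is to record the range of each input seen during training and flag new readings that fall outside it, so that predictions based on them can be treated with suspicion. Real deployments would typically use richer drift tests; this is only a minimal illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TrainingEnvelope:
    """Range of a sensor reading seen in the data a model was trained on."""
    low: float
    high: float
    margin: float = 0.0  # tolerance allowed beyond the observed range

    @classmethod
    def from_samples(cls, samples: Iterable[float], margin: float = 0.0) -> "TrainingEnvelope":
        values = list(samples)
        if not values:
            raise ValueError("need at least one training sample")
        return cls(min(values), max(values), margin)

    def contains(self, x: float) -> bool:
        return self.low - self.margin <= x <= self.high + self.margin


def flag_out_of_envelope(readings: Iterable[float], envelope: TrainingEnvelope) -> List[int]:
    """Indices of readings unlike anything the model saw during training."""
    return [i for i, x in enumerate(readings) if not envelope.contains(x)]


# Hypothetical ambient temperatures (deg C) behind a model built "for Texas weather".
texas_training = [5.0, 12.5, 21.0, 30.0, 38.0, 40.5]
envelope = TrainingEnvelope.from_samples(texas_training, margin=2.0)

# Hypothetical readings after the same sensor is deployed somewhere hotter.
field_readings = [35.0, 41.0, 44.5, 49.0, 51.5]
print(flag_out_of_envelope(field_readings, envelope))
```

Flagging inputs rather than silently predicting on them keeps the model's known limits visible to the people relying on its output.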

Security
As discussed, there is significant pressure to lower costs and optimize, and
remote-controlled and network-attached devices of all types are a means to that
end. The downside is, the more connected you are, the more vulnerable you are
to network intrusions, intentional or otherwise. And while remote monitoring is
also crucial to improved security, it can open holes itself. "There are gains from
the automation; you can get more protection, you can do better sensing of
what's happening along your line, there's lots and lots of opportunities for
managing and monitoring the line using automation," says industrial and oil and
gas cyber-security expert Eric Byres, "but your automation system, which is
supposed to be protecting your pipeline, [can become] the problem."
A series of events and attacks has made the oil and gas industry keenly aware
of the need to dramatically improve its cyber-security. After 9/11, the industry
became more concerned about intentional and coordinated attacks, but it wasn't
until the Stuxnet worm attack in 2010 that it started to really address the
problem. Stuxnet hit an Iranian nuclear facility, causing the failure of
uranium-enriching centrifuges. Written specifically to exploit Microsoft and
Siemens vulnerabilities, Stuxnet was the first prominent attack against the
PLC/SCADA (programmable logic controller/supervisory control and data
acquisition) systems used by industrial plants of all types, including those in the
oil and gas industry, which had previously been assumed to be safe from cyber
attack. To make things even scarier, Stuxnet, widely reported to be a joint
Israeli/US-made cyber-weapon, found its way into the Natanz enrichment facility,
which was not connected to the Internet, via "sneakernet," on USB drives. In the
case of Stuxnet, the collateral damage (what might even be called friendly fire in
this new battlefield) spread into the wider industrial ecosystem, infecting
Chevron, with unconfirmed reports of at least three other major oil companies
being affected as well. [Sale, Richard. "Stuxnet Hit 4 Oil Companies."
Isssource.com. Industrial Safety and Security Source, 15 Nov. 2012. Web.
20 Mar. 2015.] [King, Rachael. "Virus Aimed at Iran Infected Chevron
Network." Wall Street Journal. Dow Jones and Company, Inc., 9 Nov. 2012.
Web. 20 Mar. 2015.]
Given the sociopolitically charged nature of the industry, oil companies were
justifiably worried by Stuxnet. Suddenly, the ability of remote and unaffiliated
parties to influence operational and safety systems was very real, and with it
the risk of spills, blowouts, explosions, and loss of life. In addition to wells and
refineries, pipelines, trains, and other transportation methods are vulnerable to
attack, and beyond any human disaster, the repercussions could be
environmentally catastrophic as well as disruptive to business.

Since Stuxnet, there have been other attacks, including the Shamoon virus that
hit Saudi Aramco in 2012. Initiated by a "disgruntled insider," Shamoon wiped
out the contents of between 30,000 and 55,000 Saudi Aramco workstations.
These attacks, coupled with environmental and PR disasters like Deepwater
Horizon, have given the industry all the motivation it needs to get serious about
security. "I do think the oil and gas industry is ahead of all the other companies
[in terms of security]," continues Byres. "There is a real serious attempt to try
and get security under control; that's the good news." But while the majors like
Shell, Exxon, Chevron, Total, and in particular BP (where Paul Dorey was an
early and vocal security advocate) have become very serious about security,
you're only as strong as your weakest link, and the industry is dependent on and
tightly integrated with suppliers, contractors, and vendors, many with less
sophisticated approaches to security. "That terrifies the guys at BP; that's why
they started becoming evangelists in 2006, 2005, because they realized they
could do a good job on their site, and gain nothing because of the integration to
all the other companies around them. The other companies were so insecure."
And it's not just production that is threatened. The Night Dragon attacks,
thought to have originated in China, targeted intellectual property. [Kirk,
Jeremy. "'Night Dragon' Attacks from China Strike Energy Companies."
PCWorld.com. IDG, 12 Feb. 2011. Web. 20 Mar. 2015.] In the PR
space, a Sony-type attack on internal email and proprietary information systems
could also have huge ramifications in a competitive and secretive industry.
While some facilities remain off the net by virtue of being old and isolated, and
instances of air-gapped systems may persist, in general the digitally attached
genie is out of the bottle: the industry is moving rapidly toward digital openness,
and it won't be going back. As Byres notes, "The reality is, modern networks in
the oil and gas industry need a steady diet of data. Data going in and out;
security patches, lab results, remote maintenance, [and] interactions with
customers. So, there's no way you can isolate a refinery anymore. There's just
too much need for data on the plant floor now with the way we've built our
systems."
Technology might not always be the best solution in an industry as
fundamentally physical as this one. Byres relates a story of how one Niger
Delta oil company battled the theft of sections of pipe that were being taken and
sold as scrap metal. They started making them heavy enough to sink the boats
that were used to carry them off, and the thefts stopped. But anecdotes aside,
security is now a top priority for oil and gas IT, and while prevention is still
important, most now agree that 100% impenetrability is unlikely, and rapid
detection is the most important security tool. This is an area where data science
and threat analytics can potentially help: applying machine learning and pattern
recognition to noisy and ever-larger data streams can detect anomalies and
identify attacks early. But Byres thinks the complexity of the problem means the
industry is a ways off from really leveraging big data and data science solutions
in the security space: "There is an opportunity, there's no question, but we're
still a few years away before anyone uses it effectively."
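As a hedged illustration of the kind of stream anomaly detection mentioned above, the sketch below uses a rolling z-score, a simple statistical baseline rather than a machine learning model: each new value is compared with the mean and standard deviation of the preceding window of values. The traffic figures (connections per minute to a hypothetical SCADA gateway) and the window and threshold settings are assumptions chosen for illustration.

```python
import math
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_zscore_anomalies(stream: Iterable[float], window: int = 20,
                             threshold: float = 3.0) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z) for values far from the recent rolling mean.

    Each value is scored only against the previous `window` values, so the
    detector can run online over an unbounded stream. Flagged values still
    enter the window, so a sustained shift gradually becomes the new baseline.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    recent: deque = deque(maxlen=window)
    for i, x in enumerate(stream):
        if len(recent) == window:
            mean = sum(recent) / window
            std = math.sqrt(sum((v - mean) ** 2 for v in recent) / (window - 1))
            if std > 0:
                z = (x - mean) / std
                if abs(z) > threshold:
                    yield i, x, z
        recent.append(x)


# Hypothetical connections per minute: a steady baseline, one burst, then normal again.
traffic = [100, 102, 98, 101, 99] * 6 + [180] + [100, 101, 99]
for index, value, z in rolling_zscore_anomalies(traffic, window=20, threshold=3.0):
    print(f"minute {index}: {value} connections (z = {z:.1f})")
```

A fixed threshold like this is easy to reason about but noisy on real traffic, which is part of why learned models are of interest for this problem.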

Health, Safety, and Environment


There is also a lot of optimism around the ways big data can help in the health,
safety, and environment (HSE) space, and around the ways that predictive
analytics and machine learning can be applied to anticipate well and
manufacturing downtimes, malfunctions, accidents, and spills. As increasing
energy demand pushes oil and gas production into untapped frontiers and
deeper waters, with ever harsher and more unpredictable environments, the
potential for ecological, human, economic, and public relations disaster
increases. So, companies are highly incentivized to do everything they can to
anticipate and proactively address potential problems. This is a space where
historical data can be analyzed to predict future issues, and where models can
also tie in new data sources, like weather.
In addition, unconventional resource plays have introduced a whole new set of
environmental and safety concerns, from water, air, and soil pollution to
earthquakes. There is thought to be a significant opportunity to tune and
improve all aspects of unconventional drilling to reduce the harmful side effects.
Data science and predictive models can help drive optimization of fluid injection
and more accurate drilling, and by incorporating ever more underground sensor
data into the model, further improvements can be made.

High-Performance Computing and Beyond

To handle the massive increase in the amount of data (for example, in 2013 BP
stated its computing needs were 20,000 times greater than in 1999 ["BP Opens
New Facility in Houston to House the World's Largest Supercomputer for
Commercial Research." BP.com. British Petroleum, 22 Oct. 2013. Web. 20 Mar.
2015.]), companies have turned to expensive high-performance computing (HPC)
centers, and are building out data science expertise in-house or engaging data
science partners. Some of the largest private supercomputing facilities in the
world are now run by oil companies, with Italian energy company Eni,
France-based Total Group, and now BP all recently building HPC centers capable
of more than 2 petaflops. [Trader, Tiffany. "Eni Joins Oil and Gas Petaflop
Club." HPCWire.com. Tabor Communications, Inc., 20 Nov. 2013. Web. 20 Mar.
2015.] Eni, which uses a CPU/GPU cluster, claims upward of 3 petaflops,
while BP, whose facility cost upwards of $100 million, claims 3.8 petaflops of
computing power and 23.5 petabytes of disk space, all geared toward
processing seismic imaging and hydrocarbon exploration data. ["Number
Crunching with Big Data." BP.com. British Petroleum, 22 Dec. 2014. Web. 20
Mar. 2015.] GPU processing is now commonplace for seismic data-crunching,
and Intel, with its Xeon Phi processor, claims similar or better cost/benefit
performance.
As falling crude prices impact IT spending, and with cloud computing prices
dropping almost as fast as crude prices, it seems that open-source distributed
data management and computing, in the cloud or on commodity hardware, could
be poised to become a real presence in the oil and gas space, particularly for
smaller companies that can't afford their own HPC center, or for skunkworks
projects where resources are scarce. But it's a cautious IT culture, with many
companies waiting to see what others do before them. "It's usually cultural
problems that get in the way more than technical capabilities," says Stidolph
about adopting new technology solutions. "We often call it 'the race to be
second.' In most industries, people want to innovate, to be the leader, which
means you take certain risks, and you make certain investments. In oil and gas,
everybody kind of queues up to see who steps out of line to make that
investment and take that risk, and follow suit only once it's proved successful."
But while crunching seismic data may continue to live in the HPC world, there
are many other use cases where open source distributed data management
systems like Hadoop, along with NoSQL stores, could provide cost-effective
alternatives to HPC. Hadoop providers like Hortonworks have begun working with
the industry, and they see opportunities throughout the petroleum cycle, from
exploration to delivery (more on Hortonworks below). Meanwhile, Cloudera has
developed a "Seismic Hadoop" project to demonstrate how to store and process
seismic data in a Hadoop cluster on commodity hardware. [Wills, Josh. "Seismic
Data Science: Reflection Seismology and Hadoop." Cloudera.com. 25 Jan. 2012.
Web. 20 Feb. 2015.]
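The text doesn't describe how Seismic Hadoop itself is implemented, so the sketch below only illustrates the general map/shuffle/reduce shape that lets seismic processing spread across a cluster, written in plain Python rather than against the Hadoop API. It uses one classic step, common-midpoint (CMP) stacking: the map phase assigns each trace to a midpoint bin computed from its source and receiver positions, the shuffle groups traces by bin, and the reduce phase averages each group sample by sample. The trace geometry and amplitudes are hypothetical, and normal-moveout correction, which a real stack requires, is deliberately omitted.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Trace(NamedTuple):
    source_x: float      # metres along a 2D line (hypothetical geometry)
    receiver_x: float
    samples: List[float]


def map_phase(traces: Iterable[Trace], bin_size: float) -> Iterator[Tuple[int, List[float]]]:
    """Emit (CMP bin, samples) for each trace; each trace is handled independently."""
    for t in traces:
        midpoint = (t.source_x + t.receiver_x) / 2.0
        yield math.floor(midpoint / bin_size), t.samples


def shuffle_phase(pairs: Iterable[Tuple[int, List[float]]]) -> Dict[int, List[List[float]]]:
    """Group values by key, as the framework would between map and reduce."""
    groups: Dict[int, List[List[float]]] = defaultdict(list)
    for key, samples in pairs:
        groups[key].append(samples)
    return groups


def reduce_phase(gather: List[List[float]]) -> List[float]:
    """Stack one CMP gather by averaging its traces sample by sample (no NMO correction)."""
    n = len(gather[0])
    if any(len(trace) != n for trace in gather):
        raise ValueError("all traces in a gather must have the same length")
    return [sum(trace[i] for trace in gather) / len(gather) for i in range(n)]


def cmp_stack(traces: Iterable[Trace], bin_size: float = 25.0) -> Dict[int, List[float]]:
    groups = shuffle_phase(map_phase(traces, bin_size))
    return {cmp: reduce_phase(gather) for cmp, gather in sorted(groups.items())}


traces = [
    Trace(0.0, 100.0, [1.0, 2.0, 3.0]),   # midpoint 50 m -> bin 2
    Trace(25.0, 75.0, [3.0, 2.0, 1.0]),   # midpoint 50 m -> bin 2
    Trace(0.0, 50.0, [5.0, 5.0, 5.0]),    # midpoint 25 m -> bin 1
]
print(cmp_stack(traces))
```

Because each map call touches one trace and each reduce call touches one gather, a framework like Hadoop can run them in parallel across many commodity machines.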

More Cloud and Mobile


Cloud-based processing of large datasets is also driving innovation and
disruption in the supply chain, and allowing for an untethered workforce.
Products like Autodesk's ReCap allow customers to create large point-cloud
datasets (many billions of points) and quickly render them as 3D models on
mobile devices. In the oil and gas manufacturing space, this can mean
visualizing wells or facilities on-site via tablet or mobile device. "It's advances
like the cloud that allow things like ReCap to be able to crunch those numbers
and stitch photographs together," says Autodesk's Thasarathar. And using the
cloud to crunch those datasets is becoming attractive to oil and gas companies
for a few different reasons. Not only does it allow them to lessen their capital
investment in soon-to-be-obsolete hardware, but they can also move the
infrastructure cost/benefit burden to the cloud provider. "The fact that they can
do it on demand, and you pay for what you use; that consumption-based
business model is incredibly attractive to the industry," says Thasarathar.

Midstream and Downstream


Primarily because there is less uncertainty, and the cost of failure is lower, there
is less innovative data science activity in the midstream and downstream
sectors, but there is opportunity there as well. Many of the same principles and
techniques apply, especially in midstream activities like crude transport and
pipeline security and safety, refinery maintenance, and failure monitoring,
logistics, and people and resource management.

Emerging Tech

Things are changing within the sector, where cluster compute platforms,
massive and affordable storage, and new techniques have enabled companies
to evolve their existing tools and methods. Data science as a practice is being
adopted within the industry, but many companies lack the needed internal data
science resources. While they have abundant expertise in geosciences and
engineering, among other things, they don't typically have expertise in big and
unstructured data, machine learning, predictive analytics, artificial intelligence, or
other data science specialties. "They're recognizing that they have a lot of data,
both historical as well as new, that they aren't getting everything they can out
of," says Kaggle's Nur. So oil and gas companies are taking different
approaches, building out data science teams internally or turning to outside
companies for expertise. Let's take a look at some of these outside companies.

Hortonworks
Hortonworks is a leading provider of Hadoop solutions, well known in the tech
sector but relatively new to oil and gas. They bring technical expertise and
provide solutions with a toolset that oil and gas isn't familiar with, and they bring
an open-source approach to a sector that isn't known for its openness or its
willingness to share. But that's changing as the industry starts to understand the
potential in data science and predictive analytics. "They all want to get into a
modern data architecture, and they realize Hadoop is a cornerstone for that,"
says Ofer Mendelevitch, Hortonworks' Director of Data Science. Hortonworks
sees opportunities to provide insight throughout the upstream sector, from using
predictive analytics to improve production optimization by providing a better
sense of when a well might go down, to being better able to predict and
proactively handle safety and environmental hazards, to providing a broader
and more multidimensional dashboard, including services like weather and
social feeds.
Mendelevitch also sees potential in niche cases, like automatically processing
LAS (Log ASCII Standard) files, using algorithms and fitting curves to identify
redundancy and greatly reduce work currently done manually. In addition, with a
lot of buzz around data security and the Internet of Things, they see companies
adjusting their IT processes to collect more and new data, and becoming more
in touch with their social media streams and presence. There are also
opportunities midstream and downstream, in areas like equipment failure
prediction, safety analytics, and portfolio analysis.
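As a rough illustration of the curve-redundancy idea, here is a minimal Python sketch. It assumes a simplified LAS 2.0 file (only the ~C curve section and the ~A data section are read) and treats two log curves as redundant when a straight-line fit of one against the other explains nearly all of its variance, using a hypothetical R² threshold of 0.99. The sample file, curve values, and threshold are invented for illustration; this is not Hortonworks' method, and a real tool would more likely rely on a full LAS parser such as the open-source lasio library.

```python
from itertools import combinations

NULL = -999.25  # common LAS null value; real files declare theirs in the ~W section


def parse_las(text):
    """Return (curve_names, rows) from a minimal LAS 2.0 string.

    Only the ~C (curve) and ~A (ASCII data) sections are read; every other
    section is skipped. Assumes one depth step per line in ~A (no wrapping).
    """
    names, rows, section = [], [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if line.startswith("~"):
            section = line[1:2].upper()  # "~Curve" -> "C", "~A" -> "A"
            continue
        if section == "C":
            names.append(line.split(".", 1)[0].strip())  # mnemonic precedes first "."
        elif section == "A":
            rows.append([float(v) for v in line.split()])
    return names, rows


def r_squared(xs, ys):
    """R^2 of a least-squares straight-line fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant curve carries no fit information
    return sxy * sxy / (sxx * syy)


def redundant_pairs(text, threshold=0.99):
    """Flag curve pairs where one is (nearly) a linear function of the other."""
    names, rows = parse_las(text)
    cols = list(zip(*rows))
    flagged = []
    # Column 0 is the index curve (usually depth), so compare the others only.
    for i, j in combinations(range(1, len(names)), 2):
        pairs = [(x, y) for x, y in zip(cols[i], cols[j]) if x != NULL and y != NULL]
        if len(pairs) < 3:
            continue  # too few valid samples to judge
        xs, ys = zip(*pairs)
        r2 = r_squared(xs, ys)
        if r2 >= threshold:
            flagged.append((names[i], names[j], round(r2, 4)))
    return flagged


# Invented sample: where GR is valid, GR2 = 2 * GR + 0.1, so GR2 is redundant.
SAMPLE = """~Version
 VERS.   2.0 : CWLS LOG ASCII STANDARD - VERSION 2.0
~Curve
DEPT.M    : Depth
GR  .GAPI : Gamma ray
GR2 .GAPI : Gamma ray, second run
RHOB.G/C3 : Bulk density
~A
1000.0   45.0    90.1  2.45
1000.5   50.0   100.1  2.41
1001.0   62.0   124.1  2.50
1001.5 -999.25   80.1  2.38
1002.0   38.0    76.1  2.47
"""

if __name__ == "__main__":
    print(redundant_pairs(SAMPLE))
```

On this sample only the GR/GR2 pair is flagged, with R² of essentially 1.0; bulk density is not. A straight-line test is the simplest choice and only catches linearly related curves; nonlinear relationships would need a different fit.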

Kaggle

Kaggle, a startup with roots as an analytics competition platform, is another tech
sector company to have brought data science expertise to the oil and gas
space. Kaggle took a different approach, leveraging its large data science
competition community and platform to serve an industry that might not always
have the data science skills in-house to find the solution. Though this business
model ultimately did not pan out, Kaggle did achieve successes using data science
with well logs, production data, and completion data to optimize drilling
parameters like well spacing, orientation, length, and more. "Of course all these
decisions they have to make have an economic cost component to them," said
Boaz Nur, former VP of Energy at Kaggle. "The longer you drill the well, the
more it costs; the more proppant ['Hydraulic Fracturing Proppants.'
Wikipedia.com. Wiki Foundation, 2 Feb. 2015. Web. 20 Mar. 2015.] you use, the
more it costs; the more fluid you use, the more it costs. There is some sort of
optimal solution," Nur continues. "We ingest all the data and we basically come
up with that optimal solution. By helping guide them, they're able to think about
the parameters that are most important. Our strategy is to find where data
science solutions add the most value, where the challenging problems are, and
where data science is the most applicable solution."
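Nur's "optimal solution" can be made concrete with a toy model. The Python sketch below assumes, purely for illustration, that recovery from a horizontal well grows with lateral length but with diminishing returns (an exponential saturation curve), while cost grows linearly with length. Every price, cost, and curve parameter is invented, and the names are hypothetical; Kaggle's actual models were built from well logs, production, and completion data rather than an assumed formula. A simple grid search picks the lateral length with the highest net value.

```python
from math import exp, log

# All inputs are hypothetical round numbers chosen for illustration only.
OIL_PRICE = 50.0            # $/bbl
MAX_EUR = 400_000.0         # bbl recovered as lateral length grows without bound
DECAY_FT = 5_000.0          # length scale of diminishing returns, ft
COST_PER_FT = 1_000.0       # drilling + completion cost per lateral foot, $
FIXED_COST = 3_000_000.0    # vertical section, pad, hookup, etc., $


def eur(lateral_ft):
    """Toy estimated ultimate recovery (bbl): saturates as the lateral lengthens."""
    return MAX_EUR * (1.0 - exp(-lateral_ft / DECAY_FT))


def net_value(lateral_ft):
    """Undiscounted revenue minus cost ($) for a given lateral length."""
    revenue = OIL_PRICE * eur(lateral_ft)
    cost = FIXED_COST + COST_PER_FT * lateral_ft
    return revenue - cost


def best_lateral(lo=1_000, hi=15_000, step=10):
    """Grid-search the lateral length (ft) with the highest net value."""
    return max(range(lo, hi + 1, step), key=net_value)


def closed_form_optimum():
    """Where marginal revenue equals marginal cost: L* = k * ln(P * E / (k * c))."""
    return DECAY_FT * log(OIL_PRICE * MAX_EUR / (DECAY_FT * COST_PER_FT))


if __name__ == "__main__":
    best = best_lateral()
    print(best, round(closed_form_optimum(), 1), round(net_value(best)))
```

Because this toy net value is concave in length, setting its derivative P·E/k·exp(-L/k) - c to zero gives L* = k·ln(P·E/(k·c)), about 6,931 ft for these inputs, and the 10 ft grid search lands on 6,930 ft. Proppant and fluid volumes would enter the same way, as extra dimensions of the search, with the recovery curve coming from a fitted model instead of a formula.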

SparkBeyond
Unlike some purely data-science-oriented startups, SparkBeyond uses domain
experts to work with their data science team, to help ask the right
questions and pick the right inputs for their models and machine learning. They
emphasize the value of expertise, and stress sound methodology when building
complex models. They use Apache Spark, yes, but many other tools as well,
and apply a broad multidisciplinary approach and diverse datasets to their oil
and gas sector work, a sector full of uncertainty throughout the entire production
chain. In addition to standard seismic, production, operational, and log data,
they pull from a variety of other sources. "You need to incorporate weather data
and APIs with geological data, financial data with news articles to see how
geopolitical events can affect (production) cycles," says Sagie Davidovich,
SparkBeyond's CEO. They also incorporate data from other energy sectors,
since those sectors have a direct impact on oil and gas cost/benefit models.
Currently most of the work they do is in the unconventional well space, which
makes building models challenging because of the relatively small and
incomplete sample set for wells of the shale boom era. "Wells drilled since, say,
2009, 2010 are very different from the wells drilled 20 years ago," says Meir
Maor, SparkBeyond's Chief Architect. "So, there's actually less relevant data to
learn from, and most of these wells have not completed their lifetime, [which
makes ultimate recovery predictions difficult]." "We're looking for the areas where
there is a lot of uncertainty in the exploration process: how much oil is going to
be produced, how fast it's going to come out," says Maor, "and we're also looking
for places that decisions can be made, that if we can manage to lower that
uncertainty with a predictive model, it will be actionable." And with so many new
techniques and methods emerging in the unconventional oil space, how one
drills is becoming as much of an issue as whether and where to drill. "When you
are talking about extracting hydrocarbons from solid rock, it becomes
exponentially more difficult. The techniques have advanced, so there are many
different ways [to drill], which can behave differently, so the decision space is
wide and there is a lot of money at stake and a lot of uncertainty as to what will
come out."
Given the high cost of error in the industry, trust and adoption of new
technologies don't come easily. Even if you have sound predictive models,
actual economic success can take years to prove. So, in the meantime,
SparkBeyond works hard to remain transparent and to build models that are
easy to understand. Says Davidovich: "What we learned is that being 30–40%
more successful than our competitors is only the first step to get in the door,
then you go through other steps."
But they're seeing things change, partly due to external forces. "If you think
about it, the big data hype actually creates a lot of pressure on companies to
introduce predictive analytics, and the oil and gas industry is no different," says
Davidovich, and that's an exciting prospect to him. "There are so many new
undiscovered opportunities to bring more certainty to this space, which affects
every aspect of our lives." Adds Maor: "What's even more interesting is when
our client actually acts on this: when the insights we deliver drive action to
change the world in a meaningful way."

WellWiki

Joel Gehman started the WellWiki project when he was a grad student, at a
time when the Pennsylvania Marcellus Formation and debates around fracking
first appeared in mainstream consciousness. Hoping to become the "Wikipedia
for wells," WellWiki scrapes public databases to compile a wiki of North
American well info. They then combine the database feed with contributions
from the community to create a structured dataset tied to user-driven narrative
content. The goal is to have information on all the wells (4 million by Gehman's
estimate) drilled in North America since the Drake Well in 1859. Neither a
watchdog nor an industry-backed entity, WellWiki remains neutral while trying to
provide information and bring transparency to a fragmented and controversial
space. "I think of it as giving wells biographies," Gehman says. "Every well has a
story."
While the data was generally publicly available, Gehman found it difficult to
consume in its nonstandardized form. He wants to harmonize the data and
information, which is regulated and recorded inconsistently by each state and
province in the United States and Canada. Landowners, community members,
citizens, journalists, and attorneys have used the site.
But maybe the real power of WellWiki is what happens behind the scenes,
where all the data that has been collected, scrubbed, and normalized can then
be joined to other datasets using standardized and unique keys, including
parent company financial reports and other business data, to help researchers,
academics, journalists, and others understand and report on the industry.
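A minimal Python sketch of the "standardized and unique keys" idea follows. It assumes US well records keyed by API well numbers, which different feeds write with or without dashes and with or without the trailing sidetrack and event digits. The sketch reduces every variant to the 10-digit core (state, county, and unique well code), then joins a well registry to production figures and a hypothetical operator-to-parent-company table. All records, company names, and volumes are invented, the function names are illustrative, and this is not WellWiki's code; real harmonization also has to handle Canadian well identifiers and mismatched reporting periods.

```python
import re
from collections import defaultdict


def normalize_api(raw):
    """Reduce a US API well number to its 10-digit core.

    Digits 1-2 are the state code, 3-5 the county code, 6-10 the unique well
    code; any trailing sidetrack/event digits are dropped, so sidetracks of
    the same well share a key.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) < 10:
        raise ValueError(f"not an API well number: {raw!r}")
    return digits[:10]


def join_wells(wells, production, parents):
    """Join well registry rows to production totals and parent companies."""
    gas_by_key = defaultdict(int)
    for rec in production:
        gas_by_key[normalize_api(rec["api_no"])] += rec["gas_mcf"]  # sums sidetracks
    joined = []
    for well in wells:
        key = normalize_api(well["api"])
        joined.append({
            "api10": key,
            "operator": well["operator"],
            "parent": parents.get(well["operator"], "unknown"),
            "gas_mcf": gas_by_key.get(key),  # None when no production is reported
        })
    return joined


def gas_by_parent(joined):
    """Total reported gas (Mcf) per parent company."""
    totals = defaultdict(int)
    for row in joined:
        if row["gas_mcf"] is not None:
            totals[row["parent"]] += row["gas_mcf"]
    return dict(totals)


# Invented records: the same wells written in different feed formats.
WELLS = [
    {"api": "37-125-23456", "operator": "Example Energy LLC"},
    {"api": "3706312345", "operator": "Sample Gas Co"},
    {"api": "37-059-99999", "operator": "Example Energy LLC"},  # no production yet
]
PRODUCTION = [
    {"api_no": "37125234560000", "gas_mcf": 120_000},
    {"api_no": "37-063-12345-00-00", "gas_mcf": 45_000},
]
PARENTS = {
    "Example Energy LLC": "Hypothetical Holdings Inc",
    "Sample Gas Co": "Imaginary Resources Corp",
}

if __name__ == "__main__":
    rows = join_wells(WELLS, PRODUCTION, PARENTS)
    for row in rows:
        print(row)
    print(gas_by_parent(rows))
```

Once every feed shares the same key, adding another dataset (inspection records, permits, financial filings) is just one more lookup on that key or on the operator and parent names it leads to.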
There are other well-monitoring organizations, some working to keep an eye on
the industry's activities. FracTracker is another data-driven organization,
providing maps and analysis to shine a light on the impact of fracking and
other oil and gas development projects. SkyTruth is a nonprofit that uses satellite
and aerial remote sensor data and imagery to identify and quantify the effects of
oil and gas production on the environment. According to its site, SkyTruth
was the first to publicly challenge BP's inaccurate reports of the rate of oil
spilling into the Gulf.

Other Disruptors
There are other relatively new technologies beginning to be adopted that may
impact niche segments of the industry. 3D printing has the potential to bring
disruption to the supply chain. Drones are beginning to be used to acquire aerial
images and remote-sensor data. Crowd-funding platforms like
crudefunders.com allow individuals to participate directly in the oil business. DIY
spill cleanup and monitoring organizations have sprung up in the aftermath of
the Deepwater Horizon spill.

Summary
"It's early times, but one of the important things to remember is this is iterative,
[it's] low hanging fruit at first, but over time you can really dial in models to
become really good at predicting, safety, or maintenance," says Hortonworks'
Mendelevitch. Meanwhile, the capabilities of these new tools and platforms are in
turn changing the way oil and gas does business; for example, cheaper storage
allows companies to change retention policies and keep more data longer. "In the
past, a lot of this data was thrown away pretty quickly, because the cost of
storing it was very high. That's the disruptive part of Hadoop: data storage is so
inexpensive."
"It's really important to understand the industry well and the pain points of the
industry in order to develop the appropriate solutions," says Kaggle's Nur,
adding that cultural differences don't matter if you deliver. "If you have a solution
that creates value, and is proven, then clients will use it."
Despite its massive size, the oil and gas industry operates on small margins,
and depends on efficiency and optimization for profitability. Given extraordinary
capital costs, a wide and deep array of risks, and a high cost of error, machine
learning and predictive analytics, driven by faster distributed cluster computing
and larger, cheaper storage, are becoming increasingly important for achieving
the efficiency and optimization required to extract profits along with
hydrocarbons.

Innovation in Tough Economic Times


Recently, of course, oil prices have plummeted. The question is, how will lower
crude prices affect investments in innovation and data science? At a time when
margins are being squeezed even further, and vendors up and down the supply
chain are being asked to cut back, will resources be found for new investment?
The consensus seems to be that, at first, projects will be cut. But then,
innovation becomes a necessity. "The first reaction is usually, we'll look at our
suppliers, and we'll look at our staff, and we'll look at our freelance contractors,
and we'll ask them all to take a haircut," says Lockheed's Stidolph, "and then
they'll look and see 'Have we got any projects we can suspend or defer?' But
then they have to look at how they can be smarter, and collaborate more, share
drilling rigs, move the data more efficiently, mine it harder." Autodesk's
Thasarathar also sees opportunity in collaboration and shared IP, as well as a
chance to get leaner and smarter: "Once the dust has settled on the initial
knee-jerk of 'cut capital spending by 20%, or whatever it might be,' I think they're
going to look to the supply chains to deliver those costs that are going to be
cut, and one way to do that is through innovation and technology." But the
current drop in prices may require more than a "haircut," as falling prices are
causing a steady decline in the number of rigs open for business, with over 500
US rigs having closed in the past year, while Halliburton, Schlumberger, and
Baker Hughes have all signaled significant layoffs to come.
While some people might see downturns as an opportunity to innovate, others
become even more risk averse. But SparkBeyond's Maor sees even more
reason to embrace data solutions in that case: "When you become risk
averse, uncertainty becomes even more of an issue. You want to have a really
good idea of how much oil is going to come out. They aren't going to stop
drilling, and they still have great uncertainty. Our solution has proven to
dramatically reduce the amount of uncertainty."
"The cost of error is higher now [and] you cannot make as many trial and errors
as you could before, because you do need to slow down some of your
activities," adds Davidovich. "But this also creates a higher pressure to innovate."
The downturn takes the inherent challenges of applying predictive analytics in
such a risk-averse space and makes them even more acute.
Halliburton's Priyadarshy sees the commitment to big data as a long-term play:
"Data projects are an investment; it's not like you can get the return tomorrow.
You have to invest in it, you have to build a team, and you have to come up with
a plan," he says. "Think of it like building a startup within a company."

Post-Mortem
The falling price of crude is already having an impact on data science forays into
the industry. Kaggle (mentioned earlier in this article) recently eliminated its
energy-industry consulting business [McMillan, Robert. "Data-Science Darling
Kaggle Cuts a Third of Its Staff." Wired.com. Condé Nast, 9 Feb. 2015. Web.
20 Mar. 2015.]. In addition to falling prices, it's possible that petro companies
were uncomfortable with Kaggle's "competition-based" business model, which
could require them to share their private data with the data science community.
While Kaggle's innovative algorithms and predictive expertise may very well
have contributed insights and improved efficiencies to the century-old effort of
supplying petroleum to the industrial world, it seems that not enough companies
were willing to make that leap quite yet, particularly given the current
environment. In the data era, however, we look forward to seeing more
experiments, and successes, in this high-stakes industry.
Daniel Cowles
Dan Cowles is a writer, filmmaker, and data geek who has worked in the tech
sector for more than 20 years. Dan is interested in human beings and their
stories, and all manner and method of telling them. He lives in Berkeley, CA with
his wife and son.

https://www.oreilly.com/ideas/oil-gas-data
