
Towards a Framework for Automating Drilling States Analysis Using Stochastic Grammars

Orestes Appel
Candidate to MSc Intelligent Systems and Robotics
E-mail: orestes.appel@email.dmu.ac.uk
De Montfort University
The Gateway
Leicester, LE1 9BH, United Kingdom

Overview

The utilisation of Stochastic Grammars has been gaining momentum in recent years within the Natural Language Understanding community. However, the applications of this technique go beyond that domain: it can be utilised in a variety of disciplines, such as image and audio processing, biochemistry, genetics and, last but not least, oilfield drilling operations. For the latter, gaining a better understanding of the drilling process is key to achieving operations automation and the significant cost reductions that come with it. Whilst in operation, the drilling equipment at the rig streams surface drilling measurements at a very fast pace, and for humans the task of analysing these data and discovering meaningful patterns in them is simply daunting. In this research project we aim to explore, evaluate and develop a framework that will automatically generate a Stochastic Grammar using as input the data streamed from the rig, with the ultimate goal of contributing to the automation of drilling operations. As a consequence of this project, the existing body of knowledge in both disciplines, Drilling States Analysis Automation and Stochastic Grammars, will be expanded.

Background

The Oil & Gas industry tends to be rather conservative when it comes to quickly adopting new computing technologies without plenty of testing and validation. The reason is, obviously, cost: the spending that goes into drilling a well is far too high to risk on unproven technology. Having said that, the advent of new discoveries in a number of scientific disciplines is bringing new computing techniques closer to field operations, where, if successful, they could deliver massive benefits in cost reduction, increased operational safety, efficiency, etc. The volume of data generated in a typical oil & gas activity, say seismic acquisition, is in the order of terabytes. These data, however, are first captured, then processed and eventually interpreted, with gaps between the times at which these activities happen. As described by [8], in the case of drilling operations the pace is different, and the requirements for dealing with the data generated during the actual drilling process are quite unique. In the oil & gas business it is not uncommon to hear that drilling a well accounts for two-thirds or more of the total cost of getting hydrocarbons out of the ground. With such a high cost, interest in techniques that could improve our understanding of the drilling process, with the objective of enhancing and automating it as much as possible, keeps growing day by day. Business managers will be willing to invest, provided that the technology is carefully designed and tested before being put into use. As noted in [8] and [12], the surface measurements captured from the rig whilst drilling are key to understanding the drilling process and its different phases. Whilst in operation, the rig goes through a number of states that must be comprehended in order to understand the process as a whole. For example, as indicated by [5], there are a number of drilling states that need to be managed efficiently in order to improve depth measurement in an LWD (Logging While Drilling) operation. Nevertheless, LWD is not the only activity performed at the rig, and there are many other drilling states that must be captured and understood. As already described by [8] and [12], the drilling equipment at the rig streams a variety of surface measurements every 3-5 seconds, a flow that a human being is simply unable to process, let alone mine for meaningful patterns.
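Purely as an illustrative sketch of the kind of pre-processing such an analysis would require (the channel names, units and thresholds below are assumptions made for the example, not values taken from the cited works), the stream of surface measurements arriving every few seconds could first be discretised into a finite alphabet of symbols, which is the form of input a grammar-based analyser would consume:

```python
# Illustrative sketch only: channel names, units and thresholds are assumed
# for the example and are not taken from the cited references.
from dataclasses import dataclass

@dataclass
class SurfaceSample:
    hookload_klbs: float   # weight suspended from the hook
    rpm: float             # rotary speed of the drill string
    flow_gpm: float        # mud flow rate
    bit_depth_m: float     # measured depth of the bit
    hole_depth_m: float    # measured depth of the hole

def to_symbol(s: SurfaceSample) -> str:
    """Map one raw sample to a coarse symbol describing rig activity."""
    on_bottom = abs(s.hole_depth_m - s.bit_depth_m) < 0.5
    pumping = s.flow_gpm > 50.0
    rotating = s.rpm > 5.0
    if on_bottom and pumping and rotating:
        return "DRILLING"
    if pumping and rotating:
        return "REAMING"
    if pumping:
        return "CIRCULATING"
    return "STATIC"

# Two mock samples stand in for the 3-5 second stream from the rig.
stream = [SurfaceSample(250.0, 120.0, 600.0, 3120.4, 3120.6),
          SurfaceSample(180.0, 0.0, 0.0, 3100.0, 3120.6)]
print([to_symbol(s) for s in stream])   # ['DRILLING', 'STATIC']
```

A symbol stream of this kind is what the grammar-oriented analysis discussed next would operate on.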
It is precisely at this point that the Computational Linguistics technique known as Stochastic Grammars comes into play. It has been suggested by [8] that probabilistic grammars could be a good fit for the task at hand. Specifically, the inversion of Stochastic Grammars seems very promising in this arena. Our main objective in this research project will be to establish whether Stochastic Grammars can be generated efficiently from the torrent of data streamed from the rig to the computer operator (typically, a Drilling Engineer). If success is achieved -and there seems to be evidence of this in the work of researchers like [12], who consider this technique to be very promising- we would like to extend the scope of previous studies and strive to build firstly a prototype and then a full-fledged system capable of implementing the technique described above using real oilfield cases. A number of examples of the utilisation of stochastic grammars are available in the Natural Language Understanding and Image Recognition disciplines, as stated in [18] and [21]. In those works, the fitness of the technique studied herein has been established for the challenge of analysing high-volume unstructured data in order to organise, classify and process it. In [16] it is explained how Stochastic Grammars are introduced as a mechanism for dealing with ambiguity. That situation can be likened to the one experienced when inspecting drilling states and the various data types at the rig site: there is usually noise in the transmission as well, so some form of ambiguity seems to be present. In this proposal we intend to clarify how fit for purpose the Probabilistic Grammar technique is, and to put it to use if it is adequate for the job being examined. As already shown, a significant amount of knowledge has been generated through research in the area of Stochastic Grammars. Based on the work already developed by [12], [8], [4], [18], [3], and [2] -the latter presenting a real application in the drilling domain based on probabilistic grammars and mainly Hidden Markov Models- we will attempt to apply Stochastic Grammars to the problem of understanding and analysing drilling states, with the final goal of contributing to the automation of drilling states analysis in a typical upstream operation in the oil & gas sector.
The research challenges are many, among others:
1. Realise whether Stochastic Grammars (from now on referred to as SGs) will suffice to solve the problem at hand, especially since, according to [16], SGs do not carry context, so we may need to investigate whether a context-sensitive model will be required (there are pros and cons).
2. Discard -or confirm- whether other tools with a good fit for dealing with ambiguity, such as some flavour of Fuzzy Logic (FL) as defined by [20], could be better equipped to deal with the issue (a brief fuzzy-membership sketch is given at the end of this section).
3. Consider the possibility of combining both techniques -SGs and FL- in the process of analysing the drilling data. Perhaps both could work together in synergy. The other option would be to use one technique to validate the results obtained using the other, and/or vice versa.
4. Confirm -from the computational standpoint- that the chosen technique will be capable of dealing with the complexity of the input represented by the data streamed in from the rig and will be able to cope with pattern-matching activities -or at least be synergistic with the latter-.
5. Validate whether similar models already used in other domains (i.e. Linguistics, Speech Recognition, Natural Language Understanding & Processing, etc.) can be incorporated into a new context and scaled up for the task at hand.
6. Determine the best programming tool/language to deal with the research topic and to be utilised as a vehicle for implementing and deploying the devised model.
7. Design a Graphical User Interface (GUI) smart enough to make the system usable, especially considering the hostile operating environment of an oilfield driller.

We are aware that there will be many additional challenges, but the research team is extremely well motivated and confident that it will be able to find answers to the questions above, as well as to the many others that will undoubtedly surface during
the investigation process.
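As a minimal, purely hypothetical illustration of the fuzzy-logic alternative raised in challenges 2 and 3 above (the channel, membership functions and breakpoints are assumptions chosen for the example, not taken from [20] or from any drilling data set), a noisy flow-rate reading could be given a graded degree of membership in two states rather than a crisp label:

```python
# Hypothetical fuzzy membership functions for a noisy mud flow-rate channel.
# The breakpoints (20 and 80 gallons per minute) are assumed purely for illustration.
def mu_pumps_off(flow_gpm: float) -> float:
    """Degree to which the reading belongs to the 'pumps off' fuzzy set."""
    if flow_gpm <= 20.0:
        return 1.0
    if flow_gpm >= 80.0:
        return 0.0
    return (80.0 - flow_gpm) / 60.0   # linear ramp between the two breakpoints

def mu_pumps_on(flow_gpm: float) -> float:
    """Complementary membership in the 'pumps on' fuzzy set."""
    return 1.0 - mu_pumps_off(flow_gpm)

# A noisy reading near the boundary is no longer forced into a crisp state.
print(mu_pumps_off(35.0), mu_pumps_on(35.0))   # 0.75 0.25
```

Memberships of this kind could either replace a crisp discretisation step or be used to cross-check the labels produced by a grammar-based analysis.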

Proposed work

2.1 Aims and objectives


The central aim of this research project is to produce both a conceptual framework and a software system for the application of the Inversion of Stochastic Grammars in the context of Oilfield Drilling States Analysis and its Automation. Within the scope of this investigation we define the Inversion of a Stochastic Grammar as the generation of a probabilistic grammar from data streamed in from the drilling rig.
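As a minimal sketch of the simplest form such an inversion step could take -maximum-likelihood estimation of production probabilities from counted observations- consider the toy example below; the nonterminals and rules are invented for illustration and are not the grammar this project would actually produce:

```python
from collections import Counter, defaultdict

# Toy observations: each entry is a production assumed to have been observed
# while annotating or parsing a symbol stream from the rig.  The nonterminals
# and rules are invented purely for illustration.
observed_productions = [
    ("TRIP", ("CONNECT", "HOIST")),
    ("TRIP", ("CONNECT", "HOIST")),
    ("TRIP", ("HOIST",)),
    ("DRILL_CYCLE", ("DRILLING", "CONNECT")),
]

rule_counts = Counter(observed_productions)
lhs_totals = defaultdict(int)
for (lhs, _rhs), count in rule_counts.items():
    lhs_totals[lhs] += count

# Maximum-likelihood rule probabilities, normalised per left-hand side,
# yield a (toy) stochastic grammar induced from the observations.
for (lhs, rhs), count in sorted(rule_counts.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{count / lhs_totals[lhs]:.2f}]")
```

In the actual project the derivations would of course not be hand-annotated but induced from the streamed data, which is precisely the hard part this research addresses.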
Objectives: In order to achieve our central goal we rely on the following sub-objectives:

1. Validate the fitness for purpose of the model being considered (the stochastic grammar and its inversion)
2. Elucidate whether models based on stochastic grammars previously used in other disciplines (e.g. Computational Linguistics, Natural Language Understanding, Bioinformatics) have the potential to be assimilated into the domain of drilling and its operational states
3. Compare the stochastic grammar model with the fuzzy logic model -or equivalent- and find out whether they are interchangeable or complementary in the context of the problem being analysed
4. Define a validation model, based on the outcome of the previous sub-objectives, whereby one model can validate the results obtained by using the other one, if possible
5. Establish a sound model to automate the process of analysing drilling states in the oilfield context, including pattern identification
6. Generate a software prototype & system using the devised framework
7. Perform an initial validation study with the end-users and drilling experts to evaluate the soundness of the devised approach and define a continuous improvement process for the model and system developed
2.2 Rationale
A number of researchers in the upstream segment of the oil & gas sector have been paying attention to the latest developments in Mathematics and Artificial Intelligence. The first attempts to achieve upstream automation started in the 1980s and, as per [8], left a lot of disappointment behind them. The main reason, it seems, was a lack of maturity in both the concept of operations automation pursued at the time (the trend now is Human-Centred Automation) and the techniques necessary to achieve the goal. In the last few years the situation has started to change, and there is significant excitement around the technology that is now available. In [12] this excitement is apparent: the authors establish the need for introducing so-called drilling states, which basically define what is happening at all times with four key surface measurements at the rig, plus some additional data, as described by [12] and [8]. However, the volume of the data streamed in from the rig -and the speed at which it is transmitted- poses a challenge for human interpretation and pattern matching. Some researchers do believe that there is a lot of potential, in terms of automating the drilling process, in designing a framework capable of being converted into a computing system that would generate a grammar from the data received from the rig and produce automated analysis on the basis of the previously generated grammar. It is fundamental to note, however, that this grammar must be capable of incorporating probabilities in order to be useful.
As a brief historical recount: [6] created the modern definition of a formal grammar, whilst [4] introduced the concept of probabilities in formal grammars. Then [7] went a step further, presenting stochastic logic programs. Furthermore, [1] published a key paper covering statistical methods in linguistics, treating ambiguity and probabilities, and [21] produced work on using stochastic context-free grammars in understanding images. Later on, [9] used stochastic context-free grammars for the analysis of protein sequences, which implicitly speaks of an increase in the complexity of the data being studied and their relationships. All in all, based on a sound mathematical foundation, a number of researchers have come forward and have found applications of the idea of adding probabilities to grammars as an important technique in various fields of study. Among the latter, the main common characteristics are: (a) unstructured data as input; (b) high-volume and fast-paced data flows; (c) the need to create structured information out of unstructured data; (d) the need to find patterns that are usually hidden in the data; (e) very often, the presence of ambiguity and noise; (f) the presence of random events; and (g) the dilemma between the need for context and a context-free interpretation of the received data. Hence, we have both a clear definition of the problem and the theoretical and pragmatic background needed to face it. In this proposal we aim to use the previous experience available in other domains as a foundation and to extend it to cover a new field (oilfield drilling), with the confidence that we will be able to solve the problem at hand and, at the same time, increase the body of knowledge of the subject matter and of the theoretical and pragmatic tools in this area of study. It is important to note that several of the techniques we will explore can be seen as belonging to a broader AI category, namely Machine Learning [13]. If for a moment we think of Machine Learning as the marriage of Computer Science and Statistics, it becomes easier to understand why so many of the techniques analysed here are heavily based on statistics, as in the work of [11, 10, 19, 9, 17] and many others.
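Since [2] applies Hidden Markov Models in the drilling domain, the short sketch below illustrates the kind of probabilistic decoding such approaches rely on; the states, observation symbols and probability values are invented for the example and are not taken from [2] or from any real rig data:

```python
import math

# Toy HMM in which hidden drilling states emit observable surface-measurement
# symbols.  All states, symbols and probabilities are invented for illustration.
states = ["DRILLING", "TRIPPING"]
start_p = {"DRILLING": 0.6, "TRIPPING": 0.4}
trans_p = {"DRILLING": {"DRILLING": 0.8, "TRIPPING": 0.2},
           "TRIPPING": {"DRILLING": 0.3, "TRIPPING": 0.7}}
emit_p = {"DRILLING": {"ON_BOTTOM": 0.7, "PUMPS_ON": 0.25, "STATIC": 0.05},
          "TRIPPING": {"ON_BOTTOM": 0.05, "PUMPS_ON": 0.15, "STATIC": 0.8}}

def viterbi(observations):
    """Return the most probable hidden-state sequence for the observations."""
    # best[t][s] = (log-probability of the best path ending in state s, that path)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
             for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            logp, path = max((best[-1][p][0] + math.log(trans_p[p][s]), best[-1][p][1])
                             for p in states)
            layer[s] = (logp + math.log(emit_p[s][obs]), path + [s])
        best.append(layer)
    return max(best[-1].values())[1]

print(viterbi(["ON_BOTTOM", "ON_BOTTOM", "STATIC", "STATIC"]))
# ['DRILLING', 'DRILLING', 'TRIPPING', 'TRIPPING']
```

A probabilistic grammar generalises this idea by allowing hierarchical structure (e.g. a trip composed of repeated connection and hoisting cycles) rather than a flat chain of states.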
2.3 Methodology
First of all, the research process will start with a complete review of the existing body of knowledge around Stochastic Grammars and their inversion, with the idea of extending it and finding the best fit-for-purpose solution for the problem being studied.
A PhD candidate will carry out the literature review. Upon completion, the initial analysis will be performed by the PhD
candidate under the guidance of the PhD advisor, who should be a senior member of the CCI (Centre for Computational
Intelligence: a prestigious research group hosted by the Division of Artificial Intelligence and Computer Modelling within the
School of Computing at De Montfort University). Once these steps are successfully completed, the process of building the desired conceptual framework will be put in motion and finalised by building, firstly, a prototype based on the devised framework and, secondly, a comprehensive software system. In order to calibrate the deliverables obtained,
a group of expert drillers will be appointed to perform the validation of the model/software and to ensure that it meets the
objectives of the project. The population sample of drillers that will validate the model, the prototype/software system and the
approval criteria will be defined beforehand and integrated into the research instrument. In addition, it will be tightly coupled
to the research plan. There will be several control points along the way during the research project at which the drilling experts will participate in the process: in the first place they will provide (a) initial input, and then they will participate in (b) a short loop to validate the model(s) and the prototype(s). In addition -after the completion and deployment of the final model & software system- a smaller committee of drilling experts will remain engaged in order to provide enhancements that increase the strengths of the model/system and to incorporate functionality improvements and/or corrections on a continuous basis.
2.4 Work Packages
In this section we detail the work packages that will compose this research project, which -without hesitation- we believe has plenty of merit to become the core of a PhD thesis at De Montfort University (DMU).
1. Literature Review and Analysis: the PhD candidate will spend time getting familiar with the literature on both the drilling domain and the stochastic grammars discipline, as well as on related or broader topics that deal with learning and ambiguity (i.e. fuzzy logic, probabilities, pattern matching and, more generally, machine learning). Besides the main topics, there are additional avenues of literature examination to pursue, such as the incorporation of probabilities into existing programming languages as covered by [14, 15], because the definition of the programming tools to be used in the project is key to a successful implementation. The CCI advisor will participate in this activity as well, providing guidelines (her/his time will overlap with that of the PhD candidate).
2. Research instrument design: in this package we will design the research instrument to be utilised. It will cover aspects such as the model to be used, data gathering, study population, sample selection and manipulation, data analysis and, in general, all the aspects required to form part of a comprehensive research instrument. Given the importance of relying on a good research instrument, we have specifically included it among the work packages to be delivered.
3. Analysis and comparison of methods and techniques: in this package we will perform a full analysis -and comparison- of the different methods and techniques researched (i.e. Probabilistic Grammars, Hidden Markov Models, Fuzzy Logic, additional machine learning tools, etc.), with the objective of identifying their strengths and weaknesses and then selecting the best among them in the context of the problem being examined. At this point, we do not rule out that a combination of techniques could be applicable as well.
4. Development of a Basic Framework and Prototype: the primary goal of this work package is to articulate a basic framework using as a foundation the knowledge developed in the items above. In essence, this deliverable will be treated as a proof-of-concept framework. Once this initial stage is completed, a prototype modelled after the aforementioned framework will be built using the appropriate -and previously identified- programming tools.
5. Field initial feedback report: this work package will deliver a report to the sponsors of the research project covering the initial findings, describing the progress achieved at the time of writing the document and presenting the feedback supplied by the expert drillers on the proposed conceptual framework and the developed prototype V0. This package is critical, as it is at this point that we will get to know whether we have the endorsement of the group of experts to continue in the same research direction. If a change of course were deemed necessary, it is at this point that we will learn about it and will have to act on it accordingly. In terms of the progress of the project, this activity can be considered to be on the project's critical path. It is important to note that in this step we will apply a pre-built research instrument designed to collect feedback from the subject matter experts, leaving room for periodic, recurring improvements (i.e. getting feedback on enhancements, corrections, recommendations, etc.).
6. Building of a Full-fledged Framework and Prototype: in this step we will complete the conceptual model and will proceed to the construction of the actual -full-fledged- software system. At this point, all the required elements will be properly incorporated after obtaining the approval of the domain experts. Three software environments will be mandatory, namely: development, staging and production. The first is for the pure software development process. The second provides stability to the system and gives the experts access to the latest version of the system still under test. The third, the production environment, is where the end-users will perform their day-to-day tasks and will host the production system.

7. Creation of a field enhancement process: a permanent enhancement process needs to be built in cooperation with the sponsors, end-users, expert drillers and developers. Its objective is to make sure the business rules are clear to all participants, so that effective progress is guaranteed with the framework and software system delivered to the end-users. This enhancement process will clearly define how new requirements and improvements will be communicated to the developers, how the enhancements will be included in a newer version of the system (Vk+1), service level agreements and associated costs, system maintenance plans, consulting agreements with the CCI, etc.
8. Final Evaluation, Report and Information Dissemination: preparation of a final report to stakeholders clearly presenting the conclusions of the research process and highlighting the benefits of the devised framework and associated software system. This phase will also include the publication of findings in technical journals and technical reports (IEEE, ACM, SPE, AAAI, PhD thesis and others). In addition, it will present a forward-looking strategy to further advance research in this area of interest to both the AI and the oil & gas communities.

Relevance to Beneficiaries

The main beneficiaries of this research will be the oil & gas producers, the research departments of the oilfield services companies and the Artificial Intelligence research community as a whole. For all of them, the possibility of embracing a framework that could be applied to the automation of drilling states analysis is key, as drilling costs represent around two-thirds of total upstream spending. Any new, well-supported technology capable of reducing drilling costs, either directly or by helping to increase the existing knowledge of the drilling states sub-domain, will be extremely well received by the oil & gas industry community as a whole. A successful implementation of such a framework/system would translate into a fundamental step forward in automating drilling operations. For the Artificial Intelligence community, and particularly for the CCI at De Montfort, besides the direct effect of increasing the existing body of knowledge in the field of study and augmenting its scientific and technical prestige, the opportunity to generate technology that could be deployed successfully in the Energy Sector will be paramount, as it could open many doors for future research and additional funding.

Justification of Resources

The objectives defined for this project are very challenging, so having the right resources will be absolutely critical. In the first place, a PhD candidate in Computer Science will be mandatory; for this candidate, knowledge of upstream oil & gas operations will be extremely desirable. In addition, an experienced researcher from the CCI should be appointed as PhD thesis advisor. We expect the PhD candidate to be interested in the Artificial Intelligence field and to be familiar with the Structured, Logic and Functional Programming paradigms. In addition, he or she should be capable of using modern commercial modelling software such as MATLAB and programming language interpreters and/or compilers such as PROLOG, SCHEME/LISP and/or C/C++. We expect the member of the CCI to serve as both the PhD and the research project advisor. She/he is also expected to participate in the development of the required framework by applying her/his research skills, knowledge of the theoretical aspects involved in this research study and ability to draw on previous research experience. In terms of computer hardware, 3 laptops running Windows XP, Windows Vista or Windows 7 will be sufficient, provided they have at least 4 GB of RAM, 250 GB of disk space and latest-generation CPUs.
A separate graphics processor will be very desirable. Access to printing facilities will be key, as will access to library resources (books, technical journals, technical reports, etc.). Provision should also be made to cover some travel, lodging and per diem expenses related to visiting a drilling field location in order to get familiar with drilling operations, and to visiting some oilfield services technology centres -owned by the sponsor of the project- to validate and confirm the research direction as well as the interim and final results. It must be stressed that access to samples of real data flows coming from the rig will be of the utmost importance in order to carry out this research project.

References
[1] Steven ABNEY. Statistical methods and linguistics. In Judith Klavans and Philip Resnik, editors, The Balancing Act:
Combining Symbolic and Statistical Approaches to Language. The MIT Press, Cambridge, MA, 1996.
[2] Harry BARROW and Bertrand du CASTEL. System and method for determining drilling activity. Patent application PCT/IB2009/006346, filed July 2009.
[3] Rens BOD. Using an annotated language corpus as a virtual stochastic grammar. Proceedings of the Annual Conference of the Association for the Advancement of Artificial Intelligence (AAAI), pages 778-783, 1993.
[4] Taylor L. BOOTH. Probabilistic representation of formal languages. Proceedings of the 10th Annual Symposium on Switching and Automata Theory (SWAT 1969), pages 74-81, October 1969.
[5] C. CHIA, H. LAASTAD, A. KOSTIN, F. HJORTLAND, and G. BORDAKOV. A new method for improving LWD logging depth. Proceedings of the Society of Petroleum Engineers Annual Technical Conference and Exhibition, 2:1184-1192, 2006.
[6] Noam CHOMSKY. Syntactic Structures. Mouton, The Hague, 1st edition, 1957; Mouton de Gruyter, 2nd revised edition, 2002. ISBN 3110172798.
[7] James CUSSENS. Stochastic logic programs: sampling, inference and applications. Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-2000), Stanford, CA, 2000.
[8] Bertrand du CASTEL. Human-centered oilfield automation. Schlumberger Journal of Modeling, Design and Simulation, 1:10-24, December 2010.
[9] Witold DYRKA and Jean-Christophe NEBEL. A stochastic context free grammar based framework for analysis of
protein sequences. BMC Bioinformatics, 10(323), October 2009.
[10] D. JURAFSKY, C. WOOTERS, J. SEGAL, A. STOLCKE, E. FOSLER, G. TAJCHMAN, and N. MORGAN. Using a stochastic context-free grammar as a language model for speech recognition. International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), 1:189-192, May 1995.

[11] Christopher D. MANNING and Hinrich SCHÜTZE. Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
[12] G. MCLAREN, N. BROWN, Z. OKAFOR, and I. MEGAT. Improving the value of real-time drilling data to aid collaboration, drilling optimization, and decision making. Proceedings of the Society of Petroleum Engineers Annual Technical Conference, SPE 110563, 2007.
[13] Tom MITCHELL. Machine Learning. McGraw Hill Higher Education, new edition, October 1997.
[14] Luc De RAEDT and Kristian KERSTING. Probabilistic logic learning. ACM SIGKDD (Special Interest Group on Knowledge Discovery in Data) Explorations Newsletter, 5(1):31-48, July 2003.
[15] Luc De RAEDT, Angelika KIMMIG, and Hannu TOIVONEN. ProbLog: A probabilistic Prolog and its application in link discovery. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 2468-2473. AAAI Press, 2007.
[16] Stuart RUSSELL and Peter NORVIG. Artificial Intelligence: a modern approach. Prentice-Hall, Inc., 1995.
[17] J.A. SANCHEZ and J.M. BENEDI. Stochastic inversion transduction grammars for obtaining word phrases for phrase-based statistical machine translation. Proceedings of the Workshop on Statistical Machine Translation, Association for Computational Linguistics, pages 130-133, 2006.
[18] Dekai WU. Stochastic inversion transduction grammars, with application to segmentation, bracketing, and alignment of parallel corpora. Technical Report HKUST-CS95-30, Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, 1995.
[19] Ryo YAMAMOTO, Shinji SAKO, Takuya NISHIMOTO, and Shigeki SAGAYAMA. On-line recognition of handwritten mathematical expressions based on stroke-based stochastic context-free grammar. International Workshop on Frontiers in Handwriting Recognition (IWFHR 2006), 2006.
[20] L.A. ZADEH. Fuzzy sets. Information and Control, 8:338-353, 1965.
[21] Song-Chun ZHU and David MUMFORD. A Stochastic Grammar of Images. Now Publishers, Inc., 2007.

Appendix: Project and Diagrammatic Work Plans

In this appendix we show the Project Work Plan (PWP) and the Diagrammatic Work Plan (DWP), the latter displaying a timeline for the project activities related to this research proposal. Please keep in mind that the DWP does not show the time assignment of the PhD/Research advisor. There are two reasons for this: first, her/his time must be agreed beforehand with the CCI, and second, her/his time assignment will always overlap with the time assignments of the main researcher, who is the PhD candidate, so it will not cause any delays (the participation of a CCI member will, however, definitely have an impact on the cost structure of the project, and as such it must be clearly defined as soon as possible). As a quick assessment, we expect the PhD/Research Advisor's time allocation to be around 20% of the time slotted to the PhD Candidate. Thus, if we assume the time allotted to the PhD Candidate to be 37 man-months, we estimate that the PhD/Research Advisor will use approximately 20% of that, or 6-7 man-months.

Task  Description                                               Deliverable                          Person Months
1     Literature Review and Analysis                            Reports                              6
2     Research instrument design                                Models, Reports                      3
3     Analysis & comparison of methods and techniques           Reports                              6
4     Development of a Basic Framework and Prototype            Abstract Models, Papers, Software    6
5     Field initial feedback                                     Reports, Software                    3
6     Building of a Full-fledged Framework and Prototype        Abstract Models, Papers, Software    6
7     Creation of a field enhancement process                   Reports, Software                    3
8     Final Evaluation, Report and Information Dissemination    Reports                              4

Table 1: Project Work Plan (PWP)

Figure 1: DWP - Diagrammatic Work Plan (time represented in three-month units)
