Abstract—In recent years, there has been an increasing interest in the field of big data analytics. It has been established that there exist large amounts of data in the energy industry¹. However, there is a need to develop methods combining domain knowledge to transform this data into meaningful information to return business intelligence. The existing literature on big data analytics focuses on applications in various fields such as healthcare, aviation industry, finance, energy industry, and supply chain. However, within the energy industry, the application of big data analytics in process safety and risk management is in the nascent stages. The objective of this study is to discuss the potential of big data analytics in the area of process safety and risk management in the energy industry. The paper outlines the systemic framework with different stakeholders, data sources, challenges, and discusses the benefits of big data analytics in process safety. Four case studies with different applications ranging from incident database analysis, predictive modeling for pump failures, dynamic risk mapping of operating plant, and image analysis to gain insights are demonstrated. It is concluded that the application of big data analytics would provide valuable insights for more informed policy, strategic, and operational risk decision-making leading to a safer and more reliable industry.

¹ Energy industry includes oil & gas industry, petroleum refining, chemical manufacturing.

Keywords-Big Data Analytics; Process Safety; Risk Map; Incident Database; Fault Detection; Image Analysis

I. INTRODUCTION

“Big Data” has transformed from a buzzword to a real value creator in recent years and is serving as a key enabler in boosting the performance of operations, economy, and businesses. Several countries and organizations have started various projects to harness big data. In the United States, the Obama Administration launched the ‘Big Data Research and Development Initiative’ in 2012, and in 2016 the administration released “The Federal Big Data Research and Development Strategic Plan”. This highlights the emerging big data capabilities and provides guidance for developing or expanding federal big data research and development (R&D) plans [1], [2]. In China, the Ministry of Industry and Information Technology (MIIT) has prepared a five-year plan for developing big data infrastructure through standardized systems [3]. In Japan, big data has been a key component of the national technological strategy since 2012. The United Nations (UN) established the “Global Pulse initiative” in 2009 to harness big data for development and humanitarian actions and published a report [4] highlighting the challenges and opportunities. According to [5], [6], a social, economic, and technical revolution has emerged around us, resulting in an exponential growth of data. This data is generated at different levels in the form of social media information, smart devices, Internet of Things (IoT), bank services, and reports. With the advancements in computing technologies, it is easier to store data (clouds, data warehouses), and draw insights with the help of tools such as artificial intelligence (AI), machine/deep learning, granular computing [7], cognitive computing, and computer vision. Big data has been defined by different users, and selected definitions of big data are summarized in Table I. The attributes of big data are defined as 7 V’s and listed as follows [8], [9]:
• Volume: large amounts of data generated from devices.
• Variety: heterogeneity of data types, representation, and semantic interpretation.
• Velocity: data is generated at a rapid rate compared to the traditional systems and requires processing.
• Value: added value from the information extracted.
• Veracity: uncertainty, accuracy, and reliability of data.
• Variability: number of inconsistencies, variable data sources and data changes (dynamic).
• Valence: inter-connectedness, inter-relation.

This paper provides the basis of application of big data analytics in process safety that would provide valuable insights. This would result in more informed policy, strategic, and operational risk decision-making leading to a safer and more reliable industry. The paper is organized as follows: the applications in other industrial sectors and value created by big data analytics are described in Section II. Section III outlines the system framework with different stakeholders, data sources, challenges, and discusses the benefits of big data analytics in process safety. In Section IV, four different case studies in process safety and risk management and their results are explained. Section V concludes the paper and highlights future research areas and applications.

II. BIG DATA ANALYTICS AND APPLICATION

In addition to the attributes of big data mentioned in Section I, it is essential that mechanisms exist for visualization
B. Process Safety Data
Within the energy industry, data is generated continuously from various sources and is available in different formats. Process safety related data can be broadly categorized into three different levels as depicted in Figure 2.
• Data collected by regulatory agencies such as the Department of Transportation (DoT), Occupational Safety and Health Administration (OSHA), United States Environmental Protection Agency (USEPA) and similar agencies in other countries. Some examples of databases are incident statistics, statutory fines;
• Data collected by industry consortiums such as American Petroleum Institute (API), Oil and Gas Producers Association (OGP), and many more. Some examples of databases are metrics system, injury records, production data;
• Data collected by organizations (manufacturing facilities) such as chemical plants, oil and gas exploration units etc. These databases are further classified into seven areas based on the source and type of data. These are as follows:
– Historian: process parameters, production data, alarm logs, machine monitoring, system fault records.
– Design data: process flow diagrams (PFDs), piping and instrumentation diagrams (P&IDs), plant layouts, standard operating procedures (SOPs), instrument and equipment data-sheets.
– Operational data: work permits, mechanical integrity and quality assurance data.
– Centralized Maintenance Management System (CMMS): maintenance and reliability records, risk-based inspections and field visit records.
– Laboratory Information Management System (LIMS): quality reports, lab test reports.
– Process Safety Management (PSM) system: audit reports, Learning From Incident (LFI) communications, training records, safety culture assessments.
– Process safety studies: process hazard analysis (PHA)/hazard and operability studies (HAZOP), emergency response plan evaluation studies, incident investigation reports.

Process safety data can be of two types, static and dynamic, and of two classifications, structured and unstructured, as shown in Figure 3. Static data means data or reports generated over a period that remain fixed for a considerable amount of time, while dynamic data means data which change with time and are continuous in nature. Structured data refers to data in table or specific report formats whereas unstructured refers to data primarily expressed as text.

Figure 3. Types and classification of process safety data

C. Process Safety Challenges
Many authors have established that one major challenge in process safety is that incidents continue to occur [15],[16],[17],[18]. Some reasons noted in literature are failure to learn from incidents [19], challenges in alarm management and decision making [20], inadequate tools to quantify social (human & organizational) aspects etc. [21]. Also, there has been an increase in the development of different process safety and risk assessment methodologies and tools over the past few decades (1970-2020) [15]. From the data application viewpoint, the authors of this study believe that in each of those development stages, data was collected and utilized in some form. However, a systematic approach has not been established to implement process safety big data management. This gap can be filled with the incorporation of PSBDMF in addition to current risk assessment and mitigation methods. Some of the critical questions that most risk assessors deal with are: what is the right format for data collection? What data are significant to collect? Are our facilities becoming any safer? Which metrics have an impact on safety? Can we analyze the health of safety barriers? What will be an effective maintenance schedule? The incorporation of PSBDMF will address the above-mentioned questions at different levels. Challenges related to these questions can be categorized into policy, strategic, and operational as follows, see Figure 4.

• Policy: This refers to the policy or rule-making related challenges. These can be addressed by the regulatory agencies. Analysis of current databases can help infer knowledge on which other data may be relevant, or effective usage of collected data, or prioritizing the inspection schedules.
• Strategic: This refers to the industry consortiums such as API and OGP, which collect data for industrial sectors. Analysis of these databases can help in the identification of robust metrics that influence process safety significantly, or improvement of data collection and management structure, or improvement in monitoring with insights on new metrics.
• Operational: This refers to the manufacturing
plants/facilities, which collect a wealth of data
within the organization. Analysis of these databases
can help in the identification of weak signals, or
evaluation of the effectiveness of safety barriers,
or recognition of optimal maintenance schedule, or
barriers prioritization and resource allocation for
emergency response based on dynamic risk profiles.
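
The three stakeholder levels (policy, strategic, operational) and the static/dynamic and structured/unstructured classification described above can be pictured as a small data catalog. The following is only a minimal sketch: the names `Level`, `DataSource`, `CATALOG`, and `select` are hypothetical, and the static/dynamic and structured/unstructured labels assigned to the example sources are illustrative assumptions, not values taken from the paper or from Figure 3.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Stakeholder level at which the data is collected."""
    POLICY = "regulatory agencies"
    STRATEGIC = "industry consortiums"
    OPERATIONAL = "manufacturing facilities"


@dataclass(frozen=True)
class DataSource:
    name: str
    level: Level
    dynamic: bool      # True: changes continuously with time; False: static
    structured: bool   # True: tables or fixed report formats; False: mainly text


# Illustrative entries only; the dynamic/structured flags are assumptions.
CATALOG = [
    DataSource("incident statistics", Level.POLICY, False, True),
    DataSource("injury records", Level.STRATEGIC, False, True),
    DataSource("historian process parameters", Level.OPERATIONAL, True, True),
    DataSource("incident investigation reports", Level.OPERATIONAL, False, False),
]


def select(catalog, level=None, dynamic=None, structured=None):
    """Return the sources matching every filter that is not None."""
    return [
        s for s in catalog
        if (level is None or s.level is level)
        and (dynamic is None or s.dynamic == dynamic)
        and (structured is None or s.structured == structured)
    ]


# Example query: dynamic data available at the operational level.
operational_dynamic = select(CATALOG, level=Level.OPERATIONAL, dynamic=True)
```

A catalog of this kind makes it explicit which analyses are feasible at each level, for instance time-series methods for dynamic data and text mining for unstructured reports.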
Most of the organizations in the oil & gas industry have implemented process safety management, which involves capturing the details related to near-misses or incidents along with other PSM elements. Similarly, several federal agencies such as DoT capture similar information related to their jurisdictions. From these databases important trends, areas of concern and improvement methods can be derived to reduce the losses due to downtime, injuries, property damage and environmental impact. A detailed analytics plan can be used to address the challenges and derive the information for the stakeholders. One of the main challenges during analysis is how to select specific variables from hundreds of variables and draw meaningful conclusions [23]. To demonstrate the application of such analysis on an incident reporting database, we used a publicly available dataset of HAZMAT incidents from the PHMSA website [24], processed it to build a predictive model and validated it with the help of Python [25] and IBM SPSS Modeler [26]. The summary of the database is described in Table III. Two datasets, A and B, have been used in this analysis. In general, missing data is imputed during the analysis; however, in this case it is not possible since the database is an incident database, based on actual scenarios and investigations. Hence, the missing values observed were discarded. For ease in visualization, the following changes were made to the datasets:
1) The description for commodity classification was made short.
2) The description of commodities and causes for both datasets were made universal.

Figure 7. Choropleth incident map (developed from PHMSA database: 2002-2017)

available data, model generation and application of the model. For this purpose, one of the significance criteria was used, based on property loss >= $50,000 USD [24]. First, a descriptive analysis was performed in Python to understand the datasets available, the nature of the incidents, and the types of commodities involved, as illustrated through a graphic in Figure 6. To categorize the states based on number of incidents, a choropleth map, shown in Figure 7, was prepared using ‘Python’ [25] and ‘Plotly’ [27]. To prepare a predictive model for the incident significance classification, first the dataset from 2002-2009 was used to train the model and generate various decision rules with IBM SPSS Modeler. The details observed from different methods used for the purpose of model generation and deployment are shown in Table IV and a chi-squared automatic interaction detection (CHAID) tree is shown in Figure 8. Out of the mentioned models, classification and regression tree (C&RT) was selected as it provided the predictions at a higher accuracy according to the significance rule defined prior to the study.
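
The data preparation steps described above (discarding records with missing values, labeling incidents by the property-loss criterion, using 2002-2009 for training, and counting incidents per state for the map) can be sketched as follows. This is a minimal sketch in plain Python, not the authors' pipeline: the field names (`year`, `state`, `commodity`, `cause`, `property_loss`), the example records, and the treatment of later years as a held-out set are all assumptions made for illustration.

```python
from collections import Counter

SIGNIFICANCE_THRESHOLD_USD = 50_000  # property-loss criterion quoted in the text


def prepare(records, required=("year", "state", "commodity", "cause", "property_loss")):
    """Drop records with missing values and add a binary 'significant' label."""
    cleaned = []
    for rec in records:
        # Missing values are discarded rather than imputed, as in the text.
        if any(rec.get(field) is None for field in required):
            continue
        labelled = dict(rec)
        labelled["significant"] = rec["property_loss"] >= SIGNIFICANCE_THRESHOLD_USD
        cleaned.append(labelled)
    return cleaned


def split_by_year(records, train_start=2002, train_end=2009):
    """Training set: train_start..train_end inclusive; the rest is held out."""
    train = [r for r in records if train_start <= r["year"] <= train_end]
    held_out = [r for r in records if not (train_start <= r["year"] <= train_end)]
    return train, held_out


# Hypothetical records; field names and values are invented for illustration.
incidents = [
    {"year": 2005, "state": "TX", "commodity": "gasoline",
     "cause": "corrosion", "property_loss": 120_000},
    {"year": 2008, "state": "LA", "commodity": "crude oil",
     "cause": None, "property_loss": 10_000},
    {"year": 2012, "state": "TX", "commodity": "propane",
     "cause": "human error", "property_loss": 4_000},
]

cleaned = prepare(incidents)
train, held_out = split_by_year(cleaned)

# Incident counts per state, the quantity a choropleth map would color by.
incidents_per_state = Counter(r["state"] for r in cleaned)
```

The labelled training records could then be passed to any decision-tree implementation; the source used IBM SPSS Modeler for that step.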
Table IV
MODELS TESTED ON DATASETS
Figure 8. CHAID tree for dataset A
Figure 9. Tree for dataset A
Figure 10. Tree for dataset B

Table V
DETAILS OF FEATURES EXTRACTED FOR THE DATASET
Variable       Extracted features
Vibration      3-h and 24-h rolling mean and standard deviation
Voltage        3-h and 24-h rolling mean and standard deviation
Faults         Number and type of faults
Maintenance    Days since last replacement of pump parts
Failures       Actual failures
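
The rolling-window features listed in Table V can be illustrated with a short sketch. It assumes hourly samples, so that a 3-h window spans three readings and a 24-h window spans twenty-four; the function name `rolling_features`, the trailing-window convention, and the use of the sample standard deviation are assumptions, since the source does not specify them.

```python
from statistics import fmean, stdev


def rolling_features(values, window):
    """Trailing-window mean and sample standard deviation.

    Returns one entry per input value: a (mean, std) tuple once a full
    window is available, otherwise None.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
            continue
        chunk = values[i + 1 - window : i + 1]
        out.append((fmean(chunk), stdev(chunk)))
    return out


# Hypothetical hourly vibration readings (units unspecified).
vibration = [1.0, 2.0, 3.0, 4.0, 5.0]
features_3h = rolling_features(vibration, window=3)
```

The same function with `window=24` would give the 24-h features; short windows react to sudden changes while long windows capture slower drift.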
Table VI
CONFUSION MATRIX

Table VII
EVALUATION METRICS FOR CONFUSION MATRIX
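
The contents of Tables VI and VII are not reproduced here, so the specific metrics the authors reported are unknown. As a hedged illustration of how evaluation metrics are commonly derived from a binary confusion matrix, the sketch below computes accuracy, precision, recall, and F1; the function name `evaluate` and the example counts are hypothetical and are not results from the paper.

```python
def evaluate(tp, fp, fn, tn):
    """Standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives. Ratios with a zero denominator are reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Invented counts for illustration only.
metrics = evaluate(tp=8, fp=2, fn=2, tn=88)
```

For rare events such as equipment failures, precision and recall are usually more informative than accuracy, because a model that never predicts a failure can still score a high accuracy.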
at Imperial Sugar manufacturing facility [37],[38].
Case 1: Existing or conventional method
Case 2: Proposed method with consideration of operations hazard factor (hot work) and penalty factors for safety barriers (no maintenance).
The methodology outlined in the flowchart is applied and Figures 17 and 18 show the input and output screens for the two cases. It is evident that the risk level changed for Case 2 with inclusion of dynamic components in plants such as OHF (hot work) and penalty of no maintenance. This type of dynamic risk profile analysis would support more informed operational decisions, improved maintenance plans, work execution strategies, and overall safer and more reliable operations.

Figure 17. Dynamic Risk Analysis Case 1
data, identify changes, and provide directions for the users to
take appropriate actions. This can reduce the overall analysis
time, action time and increase efficiency, productivity and
availability of the systems.
V. CONCLUSION
REFERENCES
[1] M. K. (2016) Administration issues strategic plan for big data research and development.
[2] N. NITRD et al., “The federal big data research and development strategic plan,” 2016.
[3] Xinhua. (2017) China to manage big data through standardized system.
[4] E. Letouzé et al., “Big data for development: Challenges & opportunities, New York: UN Global Pulse (white paper): Big data for development: Opportunities & challenges (2012),” Retrieved on, vol. 13, 2016.
[5] D. Maltby, “Big data analytics,” in 74th Annual Meeting of the Association for Information Science and Technology (ASIST), 2011, pp. 1–6.
[6] L. Chiang, B. Lu, and I. Castillo, “Big data analytics in chemical engineering,” Annual Review of Chemical and Biomolecular Engineering, no. 0, 2017.
[7] A. Skowron, A. Jankowski, and S. Dutta, “Interactive granular computing,” Granular Computing, vol. 1, no. 2, pp. 95–113, 2016.
[8] P. Bellini, M. Di Claudio, P. Nesi, and N. Rauch, “Tassonomy and review of big data solutions navigation,” Big Data Computing To Be Published 26th July, 2013.
[9] Y. Demchenko, P. Grosso, C. De Laat, and P. Membrey, “Addressing big data issues in scientific data infrastructure,” in Collaboration Technologies and Systems (CTS), 2013 International Conference on. IEEE, 2013, pp. 48–55.
[10] W. L. Chang, “NIST big data interoperability framework: Volume 1, definitions,” Special Publication (NIST SP)-1500-1, 2015.
[11] D. IDC-Vesset, B. Woo, H. Morris, R. Villars, G. Little, J. Bozman, L. Borovick, C. Olofson, S. Feldman, S. Conway et al., “Market analysis–worldwide big data technology and services 2012-2015 forecast,” IDC Analyze the Future, vol. 1, pp. 1–34, 2012.
[12] J.-P. Dijcks, “Oracle: Big data for the enterprise,” Oracle White Paper, 2012.
[13] IBM. (2017) Bringing big data to the enterprise. [Online]. Available: http://www-01.ibm.com/software/in/data/bigdata/
[14] A. Oussous, F.-Z. Benjelloun, A. A. Lahcen, and S. Belfkih, “Big data technologies: A survey,” Journal of King Saud University-Computer and Information Sciences, 2017.
[15] P. Jain, H. J. Pasman, S. P. Waldram, W. J. Rogers, and M. S. Mannan, “Did we learn about risk control since Seveso? Yes, we surely did, but is it enough? An historical brief and problem analysis,” Journal of Loss Prevention in the Process Industries, 2016.
[16] M. S. Mannan, O. Reyes-Valdes, P. Jain, N. Tamim, and M. Ahammad, “The evolution of process safety: current status and future direction,” Annual Review of Chemical and Biomolecular Engineering, vol. 7, pp. 135–162, 2016.
[17] MARSH, “The 100 largest losses 1974-2015.”
[18] P. Jain, A. M. Reese, D. Chaudhari, R. A. Mentzer, and M. S. Mannan, “Regulatory approaches-safety case vs US approach: Is there a best solution today?” Journal of Loss Prevention in the Process Industries, vol. 46, pp. 154–162, 2017.
[19] A. Hopkins et al., Failure to learn: the BP Texas City refinery disaster. CCH Australia Ltd, 2008.
[20] P. Goel, A. Datta, and M. S. Mannan, “Industrial alarm systems: Challenges and opportunities,” Journal of Loss Prevention in the Process Industries, vol. 50, pp. 23–36, 2017.
[21] P. Jain, H. J. Pasman, S. Waldram, E. Pistikopoulos, and M. S. Mannan, “Process resilience analysis framework (PRAF): A systems approach for improved risk and safety management,” Journal of Loss Prevention in the Process Industries, 2017.
[22] P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth, “CRISP-DM 1.0 step-by-step data mining guide,” 2000.
[23] S. Anand, N. Keren, M. J. Tretter, Y. Wang, T. M. O’Connor, and M. S. Mannan, “Harnessing data mining to explore incident databases,” Journal of Hazardous Materials, vol. 130, no. 1, pp. 33–41, 2006.
[24] PHMSA. (2017). [Online]. Available: https://www.phmsa.dot.gov/
[25] G. Van Rossum et al., “Python programming language,” in USENIX Annual Technical Conference, vol. 41, 2007, p. 36.
[26] I. S. Modeler, “14.2 algorithms guide,” IBM Corporation, 2011.
[27] P. T. Inc. (2015) Collaborative data science. Montréal, QC. [Online]. Available: https://plot.ly
[28] F. I. Khan and M. M. Haddara, “Risk-based maintenance (RBM): a quantitative approach for maintenance/inspection scheduling and planning,” Journal of Loss Prevention in the Process Industries, vol. 16, no. 6, pp. 561–573, 2003.
[29] M. Čepin, “Optimization of safety equipment outages improves safety,” Reliability Engineering & System Safety, vol. 77, no. 1, pp. 71–80, 2002.
[30] R. C. Team, “R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2014,” 2014.
[31] H. Wickham, ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2009. [Online]. Available: http://ggplot2.org
[32] scales: Scale functions for visualization. [Online]. Available: https://cran.r-project.org/web/packages/scales/index.html
[33] Statisticat and LLC., LaplacesDemon: Complete Environment for Bayesian Inference, 2016, R package version 16.0.1. [Online]. Available: http://www.bayesian-inference.com/software
[34] M. Neill et al., “An integrated approach to operational risk management–the role of process safety management,” in SPE Health, Safety, Security, Environment, & Social Responsibility Conference-North America. Society of Petroleum Engineers, 2017.
[35] T. Whipple and R. Pitblado, “Applied risk-based process safety: A consolidated risk register and focus on risk communication,” Process Safety Progress, vol. 29, no. 1, pp. 39–46, 2010.
[36] T. Aven, S. Hauge, S. Sklet, and J. E. Vinnem, “Methodology for incorporating human and organizational factors in risk analysis for offshore installations,” International Journal of Materials & Structural Reliability, vol. 4, no. 1, pp. 1–14, 2006.
[37] N. Khakzad, F. Khan, and P. Amyotte, “Dynamic risk analysis using bow-tie approach,” Reliability Engineering & System Safety, vol. 104, pp. 36–44, 2012.
[38] CSB, “Imperial Sugar dust explosion and fire final investigation report,” 2009.
[39] Electric panel. [Online]. Available: https://www.flickr.com/photos/scottbb/202290560
[40] The OpenCV Reference Manual, 2nd ed., Itseez, April 2014.
[41] Itseez, “Open source computer vision library,” https://github.com/itseez/opencv, 2015.
[42] F. Boulogne, J. D. Warner, and E. Neil Yager, “scikit-image: Image processing in Python,” 2014.
[43] Mathworks. (2017) Structural similarity index (SSIM) for measuring image quality.