

Root Cause Analysis of an Industrial Boiler Explosion (and How Hazard Analysis
Could Have Prevented It)

Conference Paper · January 2010


DOI: 10.1115/IMECE2010-37944



Proceedings of the ASME 2010 International Mechanical Engineering Congress & Exposition
IMECE2010
November 12-18, 2010, Vancouver, British Columbia, Canada

IMECE2010-37944

ROOT CAUSE ANALYSIS OF AN INDUSTRIAL BOILER EXPLOSION
(AND HOW HAZARD ANALYSIS COULD HAVE PREVENTED IT)

Juan C. Ramirez, Exponent, Lisle, IL, USA
Mark Fecke, Exponent, Lisle, IL, USA
Delmar “Trey” Morrison, Exponent, Lisle, IL, USA
John D. Martens, Exponent, Lisle, IL, USA

ABSTRACT
An explosion occurred in the firebox of an industrial boiler with a nominal fuel input rate of 100 MW (340 million Btu/hr), in a processing plant during final commissioning of the burner systems. This paper describes the investigation of the incident, root cause analysis, and lessons learned from the incident. The original burners in the boiler had recently been replaced with low-NOx burners, and the facility was in the process of commissioning the new burner system. The boiler was running only on natural gas igniters at the time of the incident. While firing on igniters, an undetected stoppage of the control equipment occurred, which led to a restriction of airflow through the secondary air dampers. The boiler controls included programmable logic controllers (PLCs) for both the combustion control system (CCS) for regulation and the burner management system (BMS) for safety functions. The BMS was intended to detect a loss of control such as this and immediately stop fuel to the boiler; however, it did not. The BMS PLC was not configured to detect the dangerous states and allowed the igniters to continue to fire. An explosion subsequently occurred within the boiler firebox that caused extensive damage to the facility and equipment. This paper will describe the incident investigation and determination of multiple root causes for failure of the BMS to prevent the explosion. The inadequate configuration of the control systems was likely present for some time prior to the incident, and the explosion was eventually caused when the right conditions occurred during this commissioning. We found through the investigation that the BMS deficiencies could have been detected and prevented (and almost were) through standard hazard analysis techniques common in the chemical processing industries. This paper will also discuss how hazard analysis can be applied to detect and prevent similar system failures.

INTRODUCTION
This paper is divided into several sections. The Background section provides information on the equipment involved in the explosion; this includes the boiler and a description of the controls system. The Incident Summary and Analysis section discusses some important facts and witness observations preceding the accident. This section is a brief summary of the accident sequence of events but has been generalized to protect confidential details. This is followed by the Root Causes section, in which we identify the root causes for the failure. We finish the paper with a Hazard Analysis section, which provides an introduction to this field and describes how we are currently applying the techniques of hazard analysis to assist plant personnel in avoiding events like this one in the future.

BACKGROUND
As part of the investigation we used several codes, standards, and guidelines [1,2,3,4] as authoritative references. The boiler in question is a two-drum design with superheat, yielding superheated steam at a design pressure of 7 MPa (1000 psig) and 450°C (842°F). The boiler produces steam at a rate of 25 kg/s (200,000 lbs/hr) that can be used for either process heat or a power turbine.

Boiler System and Equipment
The boiler provided steam as one of the primary utilities for the plant. A block flow diagram is provided in Annex A to illustrate the fuel and air flow to the boiler. The low-NOx burner system for the boiler consisted of primary burners coupled with natural-gas-fired igniter burners. The air stream to the burners was substoichiometric, requiring a secondary air feed for stable combustion.
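
To make the term substoichiometric concrete, the following Python sketch computes how much secondary air would be needed to bring a fuel stream up to stoichiometric proportions. It is illustrative only: it assumes pure methane as a stand-in for natural gas and an oxygen mole fraction of 0.21 in air, and the function names and the example primary-air figure are hypothetical rather than values from the investigation.

```python
# Illustrative sketch, not plant data: pure methane stands in for natural gas.
O2_MOLE_FRACTION_IN_AIR = 0.21  # approximate value for dry air (assumption)
O2_PER_MOL_CH4 = 2.0            # CH4 + 2 O2 -> CO2 + 2 H2O


def stoichiometric_air(fuel_mol: float) -> float:
    """Moles of air needed to burn fuel_mol of methane completely."""
    return fuel_mol * O2_PER_MOL_CH4 / O2_MOLE_FRACTION_IN_AIR


def air_fraction(fuel_mol: float, air_mol: float) -> float:
    """Supplied air as a fraction of the stoichiometric requirement.

    A value below 1 means the stream is substoichiometric.
    """
    return air_mol / stoichiometric_air(fuel_mol)


def secondary_air_needed(fuel_mol: float, primary_air_mol: float) -> float:
    """Additional air (mol) required to reach stoichiometric proportions."""
    return max(0.0, stoichiometric_air(fuel_mol) - primary_air_mol)


if __name__ == "__main__":
    fuel, primary = 1.0, 6.0  # hypothetical: 1 mol CH4 with 6 mol primary air
    print(f"stoichiometric air: {stoichiometric_air(fuel):.2f} mol")
    print(f"primary air fraction: {air_fraction(fuel, primary):.2f}")
    print(f"secondary air needed: {secondary_air_needed(fuel, primary):.2f} mol")
```

Under these assumed inputs the primary air alone covers only part of the requirement, so complete combustion depends on the secondary feed remaining available.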



A natural-gas-fired igniter was mounted adjacent to each burner. Each igniter had a natural gas feed and an igniter primary air feed. The igniter gas/air stream was substoichiometric and relied upon the additional air feed from the windbox to achieve stoichiometric combustion.

The constant-speed forced draft (FD) fan supplied the additional air stream. Exhaust gases from the boiler exited the furnace through the boiler penthouse before entering the pollution control equipment. Exhaust gases were drawn from the furnace and through the air pollution control equipment via the constant-speed induced draft (ID) fan. The FD and ID dampers controlled the rate of airflow through each fan. The discharge of the ID fan exited through the stack.

Control System
The control system consisted of several interacting components and subsystems that were coordinated to control the operation of the boiler. The various devices communicated through hard-wired signals and network communications, and the operator interacted with the systems using a human-machine interface (HMI).

The boiler had four systems that worked together: Burner Management System, Combustion Control System, HMI, and Plant Information (PI) system. The BMS and CCS were Programmable Logic Controller systems that had physical connections to the sensors and actuators, known as field elements. The PI data system was an electronic data recording system with a historian function.

Through actuators, sensors, and interlocks, the BMS PLCs controlled various field devices (e.g., valve positions, feeders, and mills) according to permissives defined by the logic running in the PLC program. The BMS represented the standard safety functions typical of boiler systems.

The CCS was a typical PLC that consisted of several subsystems: a main processor, flexible I/O modules, power supplies, and networking cards. Much like the BMS, the I/O modules allowed the PLC to ascertain the state of the boiler by reading the value of sensors. The I/O modules also allowed the PLC to control the various actuators by changing the outputs. As is typical with PLCs, a key switch was located on the main PLC in the cabinet for the CCS. Such key switches control the state of the PLC using three settings: run (causes the PLC to immediately execute the programmed logic), remote (the state is remotely controlled by a programmer, which is typically a personal computer), and program (ceases program execution in preparation for the downloading of new instructions).

INCIDENT SUMMARY AND ANALYSIS
The explosion occurred in an industrial boiler. The burners on this boiler had recently been replaced, and the plant was in the process of commissioning the new burner system. The boiler was running on natural gas igniters only. On the day of the accident, the boiler operator realized that he had lost control over the boiler system. The explosion occurred within minutes of this upset. No personnel were injured in the incident, and facility damage was limited to the boiler, boiler house, and nearby equipment.

The underlying cause of the explosion in the boiler and the events leading up to the incident are discussed below. The focus of the investigation was on the industrial boiler and the control system (BMS, CCS, and operator interface).

A few minutes before the explosion, a series of observations were noted regarding the control system and final control elements in the field. Unbeknownst to the operator, the CCS had stopped execution. The boiler operator reported a loss of control of the boiler and was working with a field operator to troubleshoot the issue. Multiple control elements were observed to be only intermittently responding to control signals from the boiler operator. The field operator reported that these control elements not only failed to respond but operated in contradiction to the boiler operator’s commands. The boiler operator was unable to control the boiler due to a loss of control system response.

An examination of the PI data after the incident revealed that the data from some sensors was not recorded for a period of several minutes surrounding the explosion. The data also showed that the control of various process variables resumed after the outage. Furthermore, the PI data for pressures upstream and downstream of the pollution control equipment was recorded during the outage (i.e., non-CCS process data was not lost during the outage). The recorded PI data indicated that there was a loss of airflow through the boiler firebox during the control outage.

Based on fault tree analyses (which we omit for the sake of brevity) and incorporating all information from witness statements, physical evidence, and calculations, we concluded that the igniters approached a near flamed-out condition and the conditions in the boiler reached the lower explosive limit (LEL) of the gas/air mixture. In order to reach this condition, it is hypothesized that the airflow was reduced and that this was caused by the FD and ID dampers closing due to a loss of control.

ROOT CAUSE ANALYSIS
The investigation was very detailed and included the fault tree analyses, calculations, and other analytical tools as mentioned above. Additionally, a detailed analysis of the controls hardware, PLC programs, and programming procedures was required. Based upon the detailed investigation, a set of causal factors was identified for further consideration. This set of causal factors was reduced to a more succinct set of root causes (i.e., conditions whose removal may have prevented the incident sequence of events). Several of the more substantial root causes are discussed below to provide lessons learned from the investigation. These root causes have been generalized to protect any confidential details.

Root Cause 1. PLC State Not Monitored
This condition refers to the lack of an external indication to the BMS that the CCS had stopped or otherwise failed. The fact that the PLC state was not being monitored may have been revealed if a functional specification had called for a PLC state monitor or if an independent review of the system had occurred. Such a review could have included a comparison to current standards and recommendations, including, for example, the ABMA guidelines [3].

If the BMS is able to detect that the CCS is no longer able to control an essential field device, then the BMS may act on this information. The condition of the loss of the CCS may be detected by the BMS in advance of any safety sensors triggering as a result of improper operating conditions. This information, i.e., the loss of the CCS, would trip the boiler. Incorporating functional specifications, procedures, functional testing, and inspection requirements in the management-of-change system may have reasonably prevented this accident or similar accidents.

Root Cause 2. I/O Module Configuration
The default configuration for the I/O modules was to output a signal that would close the ID and FD dampers upon a PLC stoppage. This undesired fail-closed configuration may have been caught if a functional specification had been written for the CCS or if a review procedure requiring independent review of the system design had taken place. Such a review could have included a comparison to industry standards and recommendations, including, for example, maintaining a minimum airflow regardless of the operating state of the control system by not closing the dampers. It is important that the default setting does not cause the dampers to close in the event of a stopped processor.

Root Cause 3. Minimum Damper Position
Because the damper was allowed to close fully, the existing requirements of the NFPA 85 standard were not met. An independent review of the airflow arrangement for the system would likely have caught this issue. Administrative controls to limit the minimum setting for the damper positions are insufficient in the case of a control system outage such as that experienced in this case. NFPA 85 specifically requires that the airflow not be below the 25% level while there is fuel flow to the burners or igniters. Typically, this function is ensured through mechanical stops in the damper system to limit its minimum position.

Root Cause 4. Non-Operational Minimum Airflow Sensor
The terminal connections for the airflow sensor were non-operational; therefore, the BMS could not detect a low airflow condition using this sensor. Such a condition must lead to a master fuel trip based on the NFPA 85 code. The root cause analysis was focused on how this condition was created through the legacy system and remained hidden through the current burner upgrade project. Based upon the history of the minimum airflow monitoring strategy for the boiler, it was inferred that this condition was created during a previous system upgrade. The change in this strategy should have reasonably included verification of the final control elements but apparently did not.

HAZARD ANALYSIS
A common definition for a hazard in an industrial setting is “a physical or chemical condition that has the potential for causing harm to people, property, or the environment.” [7] A steam boiler plant is a specialized type of process unit that may operate within a larger chemical process plant or as the source of municipal power generation. Steam boilers may present process hazards from combustion, high temperature, high pressure, and steam along with typical machinery and occupational hazards (e.g., rotating machinery, electricity, slip and fall). NFPA 85 recommends reviewing both initial design and changes to a boiler system for safety. NFPA 85 provides many prescriptive guidelines for designing, commissioning, operating, and maintaining boiler systems. NFPA 85 does not prescribe specific hazard evaluation techniques to accomplish these safety objectives.

Several types of process hazard evaluation techniques can be applied to identifying potential hazards in steam boiler systems at the various stages in the life cycle of the process unit. The process unit life cycle includes the stages of design, commissioning, and operation. At various points of the process life after initial commissioning, changes may be undertaken to mechanical, electrical, or control systems that alter the functional characteristics. These types of changes may fall under the definition of a company’s management of change procedures. A process hazard analysis (PHA) should be recommended for any such changes; however, the complexity of the changes should be considered when determining what type of PHA is to be conducted. Qualitative scenario-driven hazard analysis techniques such as Preliminary Hazard Analyses (PrHA), Hazard Identification Studies (HAZID), and Hazard and Operability studies (HAZOP) are ubiquitous in the process industries, but these are better suited to unique processes and initial system designs. The boiler system discussed in this paper had been safely in service for over two decades prior to the incident; thus, most hazard scenarios that could be identified by the earlier techniques were likely already mitigated for the existing design. However, the incident occurred during a design modification, which could be used to trigger a limited PHA to address the new changes.

For boiler systems, an effective form of PHA is a Checklist Analysis; detailed guidance can be found in Reference [6]. A checklist is a written list or table of items that is used to verify the status of a system. Individual checklists may be highly specific to a certain process or company, but they are frequently used to measure compliance with standards and guidelines. [7] A checklist will provide criteria against which a given system is evaluated. The authors have found good success in developing boiler safety checklists for use in management of change activities for boiler system upgrades and commissioning. Excerpts from such a generic checklist and an explanation of how this checklist would have identified the specific root causes are provided below.

A checklist-based analysis built on the provisions of NFPA 85 would likely have detected the root causes identified in the previous section when used by experienced and knowledgeable personnel as part of the facility’s management of change. For instance, careful consideration of section 4.6.3.2.3 of NFPA 85 may have addressed the conditions creating root causes 1, 2, and 3, as it explicitly requires evaluating specific failure modes such as processor faults.

The lack of a state-of-health monitor between the BMS and CCS (Root Cause #1) may have been detected by implementing a checklist including ABMA - Combustion control guidelines 2001, Section 3.3. This section states: “The control system is to send a signal to the BMS anytime there is an internal fault that jeopardizes the ability of the control system to maintain safe operation.”

The non-operational minimum airflow sensor (Root Cause #4) may have been detected had section 6.6.5.2.1.1 - (14) of NFPA 85 been followed carefully. This section reads: “A complete functional check of the interlocks has been made after an overhaul or modification of the interlock system.” If a thorough functional check had been performed during many of the system upgrades over the years, the absence of the sensor functionality may have been detected before the special requirements leading to the incident were met. Similar language requiring functional checks can be found in sections 4.6.2.3.2.4* - (J), 6.4.2.2.2, 6.4.2.2.3, and 6.6.5.2.1.1 - (14).
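
The checklist approach described in this section can be sketched in code. The following Python example is a minimal illustration and not the authors' checklist: the item wording paraphrases the four root causes rather than quoting NFPA 85 or ABMA, and the configuration keys (such as `ccs_health_signal_to_bms`) and the `as_found` values are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Config = Dict[str, Any]  # hypothetical description of a boiler control setup


@dataclass(frozen=True)
class ChecklistItem:
    item_id: str
    question: str
    passes: Callable[[Config], bool]


# Items paraphrase the four root causes; they are not quoted from any standard.
CHECKLIST: List[ChecklistItem] = [
    ChecklistItem(
        "RC1",
        "Does the BMS receive a state-of-health signal from the CCS?",
        lambda cfg: cfg.get("ccs_health_signal_to_bms") is True,
    ),
    ChecklistItem(
        "RC2",
        "Do the I/O modules avoid closing the FD/ID dampers when the PLC stops?",
        lambda cfg: cfg.get("damper_output_on_plc_stop", "close") != "close",
    ),
    ChecklistItem(
        "RC3",
        "Do mechanical stops keep airflow at or above 25%?",
        lambda cfg: cfg.get("mechanical_stop_min_airflow_pct", 0) >= 25,
    ),
    ChecklistItem(
        "RC4",
        "Was the low-airflow interlock functionally checked after the last modification?",
        lambda cfg: cfg.get("low_airflow_interlock_checked") is True,
    ),
]


def open_findings(cfg: Config) -> List[str]:
    """Return the IDs of checklist items that the configuration fails."""
    return [item.item_id for item in CHECKLIST if not item.passes(cfg)]


# Hypothetical "as found" configuration mirroring the conditions in the paper.
as_found: Config = {
    "ccs_health_signal_to_bms": False,
    "damper_output_on_plc_stop": "close",
    "mechanical_stop_min_airflow_pct": 0,
    "low_airflow_interlock_checked": False,
}

if __name__ == "__main__":
    for item_id in open_findings(as_found):
        print("open finding:", item_id)
```

Missing entries are treated as failures in this sketch, so an incomplete system description produces open findings instead of passing silently. A real checklist would still rely on experienced reviewers to confirm each answer in the field.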

REFERENCES
1. NFPA 85: Boiler and Combustion Systems Hazards Code. National Fire Protection Association, Quincy, MA.
2. ASME Boiler and Pressure Vessel Code, Section VII.
3. American Boiler Manufacturers Association (ABMA) – Multiple Burner Boilers Combustion Control Guidelines, Foreword. 2001.
4. American National Standards Institute (ANSI) and Instrumentation, Systems, and Automation Society (ISA) – 77.41.01-2005, Foreword. 2005.
5. Guidelines for Investigating Chemical Process Incidents, 2nd edition, AIChE Center for Chemical Process Safety, p. 179 (2003).
6. Fecke M, Morrison DR, Martens J, Cowells J. “A guide to developing and implementing safety checklists: Plant steam utilities.” American Institute of Chemical Engineers, 2010 Spring National Meeting, 25th Center for Chemical Process Safety International Conference, San Antonio, TX, March 22–24, 2010.
7. Guidelines for Hazard Evaluation Procedures, 3rd Edition, Center for Chemical Process Safety, New York (2008).

Copyright © 2010 by ASME



ANNEX A

BOILER FUEL / AIR FLOW DIAGRAM
