
ASM

Abnormal
Situation
Management
Defining the way things
will be.
The birth of ASM...

• ASM grew from an initial focus on alarm
management. Most sites are aware that operator
overload and alarm floods are common during
abnormal operations. As we analyzed the issues
around alarm management, we discovered that
operator problems with the alarm system were
only a symptom of a general issue:
– the design, implementation, and maintenance
of many facilities, systems, and practices.
ASM Consortium
• Charter:
– Research the causes of abnormal situations and create technologies to address this problem
• Deliverables:
– Technology, best practices, application knowledge, prototypes, metrics
• History:
– Started in 1994
– Co-funded by US Govt (NIST)
– Budget: +$16M USD
• Current Status:
– Committed through 2002
– Honeywell leadership
– Expanding membership
[Figure: logos of the current membership and university affiliates, including Brad Adams Walker Architecture, P.C.]
Requirements for Safe Operation
• Hazards must be recognized and understood
• Equipment must be “fit for purpose”
• Systems and procedures to maintain plant integrity
• Competent staff
• Emergency preparedness
• Monitor performance
In the area of alarm management, most companies fail to meet these basic requirements for safe operation.
Various cost elements
[Figure: plant performance scale running from Loss through Break-even to Profit, with the Theoretical Limit, Current Limit, and Operating Target marked.]
• Future upgrades (e.g., Advanced Control): theoretically possible; currently unsustainable
• Comfort margin: lost opportunity (cost of comfort)
• Incident: lost profit, lost revenue
• Shutdown (idle plant): fixed costs
• Accident: additional unplanned costs (equipment damage, etc.)
• Efficiency: savings from reducing the comfort margin
• Losses due to incidents and accidents: about 10% of operating costs
A Look At Plant Operations
A typical production profile for an asset-intensive facility for a calendar year.
[Figure: histogram of days per year against daily production, from < 60% through 95% to 100%. Bar labels as extracted: 95, 79, 62, 47, 23, 30, 16, 8, and 5 days. The production target is set by the enterprise.]
Factors Affecting Plant Operations
[Figure: the same production profile (days per year against daily production, < 60% to 100%), annotated with the Plant Capacity Limit, Plant Operating Target, Planning Constraints, Operational Constraints, Plant Availability, Plant Incidents, Production, Effectiveness, and Asset Utilization.]
Agility/Flexibility
[Figure: histograms of frequency (# days) against rate. Panels: production rate (axis 112 to 183), feed rate (axis 457 to 595), and total feed rate (axis 280 to 620, two panels). The 1σ and 2σ bands are marked. Annotations: 3.2%, 5.8%, $24.2M, $33.5M, $38.5M.]
Real Life Examples
• This plant had 5.8% in lost capacity!
• This plant had $24.2M in lost capacity due to asset availability & incidents!
• This plant lost $38.5M!
• And this plant lost $33.5M!
Site Studies have identified Plant Lost Opportunity
Between 3-15% in lost capacity is attributed to asset unavailability and incidents.
NEW EMPHASIS!!
• Production Management: DCS/APC/Optimization efforts
• Asset Management: Reliability & CMMS
• Manufacturing Execution: Scheduling & ERP
[Figure: production profile (days per year against daily production, < 60% to 100%), annotated with the Plant Capacity Limit, Plant Operating Target, Planning Constraints, Operational Constraints, Plant Availability, and Plant Incidents.]
Major Profit Potential
Emphasis on plant & equipment reliability improvements and reduced incidents can result in a recovery of 3-15% of lost capacity!
[Figure: production profile (days per year against daily production, < 60% to 100%) below the Plant Capacity Limit, showing a higher Plant Operating Target, fewer Planning Constraints, and fewer Operational Constraints.]

The Importance of Alarm Management
Improvement Project
Alarm management is the proper
design, implementation, operation,
and maintenance of industrial
manufacturing plant alarm systems.
Current alarming practices are leading to incidents.
Major problems are:
• Alarm floods
• Standing alarms
• Poor configuration of alarms
• Nuisance alarms
Technology exists to significantly contribute to
effective alarm systems and provide good
Situation Awareness
Alarms identified as contribution
A Case
The lightning struck just before 9:00 AM on a Sunday. It immediately started a
fire in the crude distillation unit of the refinery. The control operators on
duty responded by calling out the fire brigade, and then had to divert their
attention to a growing number of alarms while desperately trying to bring the
crude unit to a safe emergency shutdown.
Hydrocarbon flow was lost to the deethanizer in the FCCU recovery section,
which fed the debutanizer further along. The system was arranged to prevent
total loss of liquid level in the two vessels, so the falling level in the
deethanizer caused the deethanizer discharge valve to close. This, in turn,
caused the level in the debutanizer to drop rapidly and its discharge valve
also closed. Heat remained on the debutanizer and the trapped liquid
vaporized as the pressure rose, causing the pressure relief valve to “pop” (for
the first of three times) into the flare KO drum and then immediately onto the
flare itself.

In a matter of minutes, the board operator was able to restore flow to the
deethanizer. This permitted the deethanizer discharge valve to be opened,
allowing renewed flow forward to the debutanizer. The rising level in the
debutanizer should have caused the debutanizer discharge valve to open (by
level controller action) and allow flow on to the naphtha splitter. Although
the operators in the control room received a signal indicating the valve had
opened, the debutanizer was nonetheless filling rapidly with liquid while the
naphtha splitter was emptying. The operators were concentrating on the
displays that focused on the problems with the deethanizer and debutanizer,
and had no overview of the process available to indicate that, even though the
debutanizer discharge valve registered as open, there was no flow going from
the debutanizer to the naphtha splitter.
Despite attempts to divert the excess, the debutanizer became liquid-logged
about an hour later and the pressure relief valve lifted for the second time,
venting to the flare via the flare KO drum. Because there were enormous
volumes of gas venting, the level of liquid in the flare KO drum was rising
to a very high value.

About 2-1/2 hours later, the debutanizer vented to the flare a third time AND
CONTINUED VENTING FOR 36 MINUTES. The high level alarm for the flare
drum was activated at this time. But with alarms going off every 2 to 3 seconds,
there appears to be no evidence that that alarm was ever seen. By this time, the flare
KO drum had filled with liquid well beyond its design capacity. The fast-flowing gas
through the overfilled drum forced liquid out of the drum’s discharge pipe. The
discharge line was not designed for liquid, so the force of the liquid caused a rupture
at an elbow. This released over 20 tons of highly flammable hydrocarbon.

The ensuing release quickly formed an ominous drifting cloud of vapor and
droplets. In a matter of minutes, this cloud found its ignition source 350 feet
downwind. The resulting explosion was heard 80 miles away. In the town nearest
the plant, few windows still held intact panes, so overpowering was the
pressure shock wave from the blast. The last fires in the refinery were
eventually extinguished 2 days later.
[Figure: accident causation model spanning the interface between the organization and the individual, from Management to Workplace. Stages, left to right: Source Failure Types (Organization), Functional Failure Types, Condition Tokens (Precursors), and Unsafe Acts (Errors & Violations, Individual).]
• Source Failure Types: stylistic or cultural indicators; top down: commitment, competence, cognizance
• Functional Failure Types: General Failure Types; accidents, incidents, near-misses; 1-10 hit list; data collected & analyzed
• Condition Tokens: poor workplace design, high workload, unsociable hours, inadequate training, poor perception of hazards
• Unsafe Acts: near miss, auditing, Du Pont, training, workspace, motivation, attitude, alarms, human factors, group factors, working practice
• Other labels: proactive design, SI projects, Safety Information System, control room design, best practices, diagnostic and remedial measures
Managing Abnormal Situations
Anatomy of a Disaster from Operations Perspective
[Figure: escalation chart with five columns. Entries are listed from most to least severe.]
• Plant States: Disaster, Accident, Out of Control, Abnormal, Normal
• Operational Modes: Emergency, Abnormal, Normal
• Critical Systems:
– Area Emergency Response System
– Site Emergency Response System
– Physical and Mechanical Containment System
– Safety Shutdown, Protective Systems, Hardwired Emergency Alarms
– DCS Alarm System, Decision Support System
– Process Equipment, DCS, Automatic Controls, Plant Management Systems
• Operational Goals: Minimize Impact, Bring to Safe State, Return to Normal, Keep Normal
• Plant Activities: Firefighting, First Aid, Rescue, Evacuation, Manual Control & Troubleshooting, Preventative Monitoring & Testing
Unexpected Upsets Cost 3-8% of Capacity
~ $10 Billion annually in lost production!
[Figure: histograms of frequency (# days) against rate, repeated from the earlier slides. Panels: production rate (axis 112 to 183) with 1σ and 2σ bands, feed rate (axis 457 to 595), and total feed rate (axis 280 to 620, two panels). Marked levels: Plant Operating Target, Planning Constraints, Operational Constraints. Annotations: 3.2%, 5.8%, $24.2M, $33.5M, $38.5M.]

Summarized Production Data
[Figure: production profile (days per year against daily production, < 60% to 100%) below the Plant Capacity Limit, with optimization efforts marked.]


Major Profit Potential
Focused efforts can result in recovery of 3-8% of capacity.
~ $10 Billion potential to the bottom line!
[Figure: production profile (days per year against daily production, < 60% to 100%) below the Plant Capacity Limit, showing a higher Plant Operating Target, fewer Planning Constraints, and fewer Operational Constraints.]


Timing diagram of DIN V 19251 as applicable for a single channel SRS with ultimate self tests executed within the PST
[Figure: time axis t with three events in sequence: failure occurrence in the process or in the safeguarding system; failure is detected; safe status of the process assured.]
• System internal diagnostic time
• Time for corrective action
• Time for reaction of the process on the corrective action
• Fault tolerance time of the process, or Process Safety Time (PST): the overall window that contains the three intervals above
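
The timing relationship in the diagram can be sketched as a simple check: the three intervals, added together, should not exceed the Process Safety Time. This is a minimal illustration only. The class name, field names, and the numbers are assumptions made for the example and do not come from DIN V 19251.

```python
from dataclasses import dataclass


@dataclass
class SafeguardingTiming:
    """Intervals along the time axis of the diagram, in seconds (hypothetical names)."""
    diagnostic_time: float          # system internal diagnostic time
    corrective_action_time: float   # time for the corrective action
    process_reaction_time: float    # time for the process to react to the action

    def total_response_time(self) -> float:
        # Sum of the three consecutive intervals from fault to safe state.
        return (self.diagnostic_time
                + self.corrective_action_time
                + self.process_reaction_time)

    def fits_within(self, process_safety_time: float) -> bool:
        # True when the safe state is reached before the PST expires.
        return self.total_response_time() <= process_safety_time


# Illustrative figures only; they are not taken from the slides.
timing = SafeguardingTiming(diagnostic_time=2.0,
                            corrective_action_time=1.0,
                            process_reaction_time=4.0)
print(timing.total_response_time())
print(timing.fits_within(process_safety_time=10.0))
```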


Reliability Requirements for Alarms
Claimed PFDavg: 1 – 0.1
• Alarm system integrity/reliability requirements: alarms may be integrated into the process control system.
• Human reliability requirements: no special requirements; however, the alarm system should be operated, engineered, and maintained to the good engineering standards identified in the EEMUA Guide.
EEMUA Alarm Systems Guide page 17


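CONCEPT 1: RISK REDUCTION
[Figure: axis of increasing risk showing three levels: the actual remaining risk, the risk to meet the required level of safety, and the EUC risk.]
• Necessary minimum risk reduction [ΔR]: from the EUC risk down to the risk that meets the required level of safety
• Actual risk reduction: from the EUC risk down to the actual remaining risk
• The actual risk reduction is made up of:
– Partial risk covered by E/E/PES SRSs
– Partial risk covered by Other Technology SRSs
– Partial risk covered by External Risk Reduction Facilities
• Risk reduction achieved by all SRSs & External Risk Reduction Facilities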
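
As a rough sketch of the concept in the figure, the check below compares the sum of the partial risk reductions against the necessary minimum risk reduction. It assumes that risk is expressed on a single additive scale, which is a simplification introduced here for illustration. The function names and the numbers are hypothetical.

```python
def necessary_risk_reduction(euc_risk: float, tolerable_risk: float) -> float:
    """Gap between the EUC risk and the risk that meets the required level of safety."""
    return max(euc_risk - tolerable_risk, 0.0)


def actual_remaining_risk(euc_risk: float, partial_reductions: list[float]) -> float:
    """EUC risk minus everything the safety-related systems and facilities remove."""
    return euc_risk - sum(partial_reductions)


def meets_required_safety(euc_risk: float,
                          tolerable_risk: float,
                          partial_reductions: list[float]) -> bool:
    """True when the actual reduction is at least the necessary minimum."""
    needed = necessary_risk_reduction(euc_risk, tolerable_risk)
    return sum(partial_reductions) >= needed


# Illustrative, unitless numbers (assumption): E/E/PES, other technology, external.
reductions = [50.0, 30.0, 15.0]
print(necessary_risk_reduction(100.0, 10.0))
print(actual_remaining_risk(100.0, reductions))
print(meets_required_safety(100.0, 10.0, reductions))
```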
SAFETY INTEGRITY LEVELS
TABLE 2: SAFETY INTEGRITY LEVELS: TARGET FAILURE MEASURES
Column A: demand mode of operation (average probability of failure to perform its design function on demand)
Column B: continuous/high demand mode of operation (average probability of a dangerous failure per year)

SIL    Column A              Column B
4      10^-5 to < 10^-4      10^-5 to < 10^-4
3      10^-4 to < 10^-3      10^-4 to < 10^-3
2      10^-3 to < 10^-2      10^-3 to < 10^-2
1      10^-2 to < 10^-1      10^-2 to < 10^-1
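
The demand mode column of the table can be read as a lookup from an average probability of failure on demand to a SIL. The sketch below shows one way to encode it. Treating the lower bound of each band as inclusive is an assumption, since the table only marks the upper bound as exclusive. The names are hypothetical.

```python
# (SIL, lower bound, upper bound) for the demand mode of operation.
# Lower bounds are treated as inclusive here (assumption).
DEMAND_MODE_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd_avg(pfd_avg: float):
    """Return the SIL whose band contains pfd_avg, or None if no band matches."""
    for sil, low, high in DEMAND_MODE_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


for value in (0.05, 0.005, 5e-5, 0.5):
    print(value, sil_for_pfd_avg(value))
```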


Reliability requirements for alarms
Claimed PFDavg: 0.1 – 0.01
• Alarm system integrity/reliability requirements:
– The alarm system should be designated as safety related and categorized as SIL 1
– The alarm system should be independent from the process control system
• Human reliability requirements:
– The operator should be trained in the management of the specific plant failure that the alarm indicates
– The alarm presentation arrangements should make the claimed alarm very obvious to the operator and distinguishable from other alarms
– The alarm should remain on view to the operator for the whole of the time it is active
EEMUA Alarm Systems Guide page 17
Reliability requirements for alarms
Claimed PFDavg: below 0.01
• Alarm system integrity/reliability requirements: the alarm system would have to be designated as safety related and categorized as at least SIL 2.
• Human reliability requirements: it is not recommended that claims for a PFDavg below 0.01 are made for any operator action, even if it is multiple alarmed and very simple.
• For all credible accident scenarios, the designer should demonstrate that the total number of safety related alarms and their maximum rate of presentation does not overload the operator.
EEMUA Alarm Systems Guide page 17
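
The three reliability slides amount to a banding of the claimed PFDavg. A minimal sketch of that banding follows. The function name and the handling of the exact boundary values 0.1 and 0.01 are assumptions, because the slides do not say which band a boundary value belongs to. The returned text paraphrases the slides.

```python
def alarm_claim_requirements(claimed_pfd_avg: float) -> str:
    """Summarize the alarm system requirement for a claimed PFDavg (hypothetical helper)."""
    if not 0.0 < claimed_pfd_avg <= 1.0:
        raise ValueError("claimed PFDavg must be in the range (0, 1]")
    if claimed_pfd_avg >= 0.1:
        # Band 1 to 0.1: alarms may be integrated into the process control system.
        return "no special requirements"
    if claimed_pfd_avg >= 0.01:
        # Band 0.1 to 0.01: safety related, independent from the control system.
        return "safety related, SIL 1"
    # Below 0.01: claims for operator action are not recommended at all.
    return "safety related, at least SIL 2; claim not recommended"


for claim in (0.5, 0.05, 0.005):
    print(claim, "->", alarm_claim_requirements(claim))
```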


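The Setting of a high pre-trip alarm
[Figure: alarmed variable against time during a fault, rising at its maximum rate of change from the alarm setting (A) to the limit at which protection operates (B).]
• A: alarm setting, above the limit of the largest normal operational fluctuation
• B: limit at which protection operates
• The region between the normal operating limit and B is the abnormal operating region
• The time taken to travel from A to B at the maximum rate of change is the time for the operator to respond to the alarm and correct the fault
EEMUA Alarm Systems Guide page 17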
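
The geometry of the pre-trip alarm figure can be worked through numerically: given the trip limit, the maximum rate of change during a fault, and the time the operator needs, the highest usable alarm setting follows directly. The sketch below assumes a linear rise at the maximum rate. The function names and the numbers are illustrative and are not taken from the EEMUA Guide.

```python
def latest_high_alarm_setting(trip_limit: float,
                              max_rate_of_change: float,
                              operator_response_time: float) -> float:
    """Highest alarm setting that still leaves the operator the required time.

    Assumes the variable rises linearly at max_rate_of_change (units per second)
    from the alarm setting to the trip limit.
    """
    return trip_limit - max_rate_of_change * operator_response_time


def setting_is_practical(alarm_setting: float,
                         normal_fluctuation_limit: float) -> bool:
    """True when the setting sits above the largest normal operational fluctuation."""
    return alarm_setting > normal_fluctuation_limit


# Illustrative numbers only: trip at 100 units, 0.5 units/s, 60 s to respond.
setting = latest_high_alarm_setting(100.0, 0.5, 60.0)
print(setting)
print(setting_is_practical(setting, normal_fluctuation_limit=65.0))
```

When the computed setting falls inside the normal fluctuation band, the choices listed on the next slide apply.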
[Figure: gas concentration (percentage of LEL, 0 to 120) against time after onset of fault (0 to 80 seconds). Curves: actual gas concentration and measured gas concentration. Marked levels: normal operating level (gas concentration prior to fault), set trip point, actual trip point, level error, Lower Explosive Limit (LEL) at 100, and explosion above it. Marked intervals along the time axis: fault occurs, sampling delay, sensor delay, error delay, shut down system delay.]
Redesign Choices
• Redesign: redesign the plant or its controls to provide greater margin between the normal operating limits and the trip limits. This is the most desirable solution but is often impractical or too expensive.
• Setting within normal operating limits: set the alarm within the limits of normal operating fluctuations and accept that spurious alarms will occur during large normal disturbances. This is ergonomically very undesirable and will tend to increase alarm rates and reduce operator confidence in the alarm system. In effect it increases the Average Probability of Failure on Demand (PFDavg) for the alarm system as a whole.
• Setting nearer trip limits: set the alarm closer to the trip limits and accept that some fast transients will not be corrected by the operator before they reach the trip level. This will increase the production losses due to plant trips and, because there are more demands on the protection system, tend to make the plant less safe. It also implies an increased PFDavg for the alarm system.

EEMUA Alarm Systems Guide page 17


Different Kinds of Events
[Figure: potential impact of initiating event against time, with three curves: Abrupt/Catastrophic, Manageable, and Insidious.]
Impact of DCS Alarm System
Awareness of Disturbances
With typical alarm systems, orienting begins after an event creates an abnormal plant state. The extent of the problem can impact the operator’s ability to be fully aware of the locations of process disturbances.
As disturbances propagate, the number of conditions to be aware of increases, as do the response requirements and the likelihood of missing important information.
[Figure: potential impact of initiating event against time, rising toward an incident. Marked points: failure occurrence in the process or in the safeguarding system; failure is detected; safe status of the process assured; point of operator awareness. Correct intervention causes return to normal.]


Impact of DCS Alarm System
Management of Problems
• Standing alarms interfere with orientation
• Alarm floods delay evaluation
• Inadequate filtering interferes with action
[Figure: potential impact of initiating event against time, rising toward an incident, with the point of operator awareness marked. Correct intervention causes return to normal.]


Impact of Good Alarm Management in Situation Awareness
• Increases likelihood of awareness of disturbances
• Reduces time to awareness
• Hence, reduces the average impact of initiating events
[Figure: potential impact of initiating event against time, showing the average shift in awareness with decision support.]


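Impact of Protection System
[Figure: impact of initiating event against time, with the impact scale running from Profit through Quality and Loss to Incident. Alarm levels: High Alarm, Emergency Alarm, Trip (trip from SIS). The trip separates the SAFE region from the UN-SAFE region. Marked intervals: operator diagnostic time, Process Safety Time, FTT. FTT = Fault Tolerance Time.]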
[Figure: potential impact of initiating event against time, with four response curves: No response, Incorrect, Suboptimal, and Best.]
Impact of Decision Support System
Support for Optimal Response
• Reduces errors
• Decreases time to implement response
• Manages side effects
• Increases awareness
[Figure: potential impact of initiating event against time.]
ASM Alarm Management Solutions
• Education for Management, Engineers, Technicians and Operators.
• Alarm Performance Assessment.
• Requirement for alarm optimization tools.
• Alignment with Company & EEMUA Guidelines.
• Alarm Rationalization.
• User Interface Design.
• Decision Support Activities.
Alarm Management Optimization
Objectives
• Enhance operator effectiveness
– Avoid alarm floods
– Identify root causes
– Eliminate nuisance alarms
• Enhance profitability
– Reduce variability
– Maximize plant up time
– Prevent damage to equipment
• Reduce risk of:
– Injury to personnel
– Environmental incidents
Alarm Management Optimization
The Process
[Figure: cycle of steps arranged around a central activity, "Develop Plant Alarm Management Standards & Philosophy".]
• Collect Data
• Analyze
• Identify Enhancements
• Verify Against Standards
• Implement
• Change Management
Alarm Management Optimization
[Figure: alarm counts by point, before and after alarm management. Before: 30 points account for ~85% of all alarms (chart scale 100 K). After: 30 points account for ~52% of all alarms (chart scale 2 K).]
• Increase the effectiveness of the existing alarm system through proven methodology
– Analyze existing system performance
– Assist in developing an alarm strategy and educating operations staff
– Rationalize existing alarm system
• Recommend and apply new alarm management software
– UserAlert
– Optimization Suite
• Alarm Rationalization and Documentation
• Alarm Metrics and Analysis
• Advanced Alarm Handlers
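
The "30 points account for ~85% of all alarms" metric can be reproduced from an alarm journal by counting alarms per point and summing the largest contributors. The sketch below is a generic illustration and is not how UserAlert or the Optimization Suite computes it. The function name, the tag names, and the tiny sample log are made up.

```python
from collections import Counter
from typing import Iterable


def top_points_share(alarm_tags: Iterable[str], n: int = 30) -> float:
    """Percentage of all alarms raised by the n most frequently alarming points."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_total = sum(count for _, count in counts.most_common(n))
    return 100.0 * top_total / total


# Hypothetical journal: one entry per alarm, identified by its point tag.
log = ["FI101"] * 6 + ["LI202"] * 3 + ["PI303"]
print(top_points_share(log, n=1))
print(top_points_share(log, n=2))
```

Tracking this figure before and after rationalization gives a simple measure of how concentrated the alarm load is in a few bad actors.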
Optimization Suite…
Alarm Rationalization
• Alarm priority (class) is based on severity and
level of impact and time
• Available priority options in TPS:
– No Action
– Journal
– Print
– Print & Journal
– Low
– High
– Emergency
Optimization Suite…
Alarm Rationalization
• Recommends alarm priorities based on plant
philosophy
– Severity of impact
– Time to respond
– Trip Point
• Electronically captures plant alarm
management philosophy
– Time to respond rules definition
– Impact and severity rules definition
• Apply manual priority override
• Use Alarm Impact Templates
• Generate EC Files (Honeywell)
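
A rule-based priority recommendation of the kind described above could look like the following sketch. The severity categories, the time thresholds, and the scoring are all invented for illustration. A real plant alarm philosophy would define its own rules, and this is not the Optimization Suite's logic. Only the priority names come from the TPS list on the previous slide.

```python
from typing import Optional

# Hypothetical severity-of-impact categories and their scores.
SEVERITY_SCORE = {"minor": 1, "major": 2, "severe": 3}


def urgency_score(time_to_respond_min: float) -> int:
    """Score the time available to respond; thresholds are assumptions."""
    if time_to_respond_min < 3:
        return 3
    if time_to_respond_min < 15:
        return 2
    return 1


def recommend_priority(severity: str,
                       time_to_respond_min: float,
                       manual_override: Optional[str] = None) -> str:
    """Recommend an alarm priority from severity and time to respond."""
    if manual_override is not None:
        # A manual priority override takes precedence over the rules.
        return manual_override
    score = SEVERITY_SCORE[severity] + urgency_score(time_to_respond_min)
    if score >= 5:
        return "Emergency"
    if score == 4:
        return "High"
    if score == 3:
        return "Low"
    return "Journal"


print(recommend_priority("severe", 2))
print(recommend_priority("major", 10))
print(recommend_priority("minor", 30))
print(recommend_priority("minor", 30, manual_override="High"))
```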
