Presentation On Alarm Management

ASM (Abnormal Situation Management)
Defining the way things will be.
The birth of ASM...
• ASM grew from an initial focus on alarm
management. Most sites are aware that operator
overload and alarm floods are common during
abnormal operations. As we analyzed the issues
around alarm management, we discovered that
operator problems with the alarm system were
only a symptom of a general issue:
– the design, implementation, and maintenance
of many facilities, systems, and practices.
ASM Consortium
• Charter:
– Research the causes of abnormal situations and create technologies to address this problem
• Deliverables:
– Technology, best practices, application knowledge, prototypes, metrics
• History:
– Started in 1994
– Co-funded by US Govt (NIST)
– Budget: +$16M USD
• Current Status:
– Committed through 2002
– Honeywell leadership
– Expanding membership
[Logos: Current Membership and University Affiliates; the only legible name is Brad Adams Walker Architecture, P.C.]
Requirements for Safe Operation
• Hazards must be recognized and understood
• Equipment must be “fit for purpose”
• Systems and procedures to maintain plant integrity
• Competent staff
• Emergency preparedness
• Monitor performance
In the area of alarm management most companies fail to
meet these basic requirements for safe operation.
Various cost elements
[Figure: plant performance against cost. Levels marked: Theoretical Limit (theoretically possible; currently unsustainable), Current Limit, Operating Target, Break-even, Fixed Costs, Shutdown (Idle Plant). Regions and annotations: future upgrades (e.g., Advanced Control); Comfort Margin and lost opportunity (cost of comfort); Profit; Lost Profit and Lost Revenue from an Incident; Loss; Efficiency; additional unplanned costs from an Accident (equipment damage, etc.). Callouts: losses due to incidents and accidents (about 10% of operating costs); savings from reducing the comfort margin.]
A Look At Plant Operations
A typical Production Profile for an Asset Intensive Facility for a calendar year.
[Figure: histogram of Days per Year against Daily Production, from < 60% through 95% to 100% of the Production Target set by Enterprise. Bar labels: 95, 79, 62, 47, 23, 30, 16, 8 and 5 days.]

Factors Affecting Plant Operations
[Figure: production profile (Days per Year against Production) annotated with Plant Capacity Limit, Plant Operating Target, Planning Constraints, Operational Constraints, Plant Availability and Plant Incidents, and with the measures Effectiveness, Asset Utilization and Agility/Flexibility.]
Real Life Examples
[Figure: histograms of # Days (Frequency) against Total Feed Rate and against Production rate, with 1σ and 2σ bands marked. Annotations on the charts: 3.2%, 5.8%, $24.2M, $33.5M, $38.5M.]
• This plant lost $24.2M in lost capacity due to asset availability & incidents!
• This plant had 5.8% in lost capacity!
• This plant had lost $33.5M!
• And this plant $38.5M!
Site Studies have identified Plant Lost Opportunity
Between 3-15% in Lost Capacity is attributed to asset unavailability and incidents
[Figure: production profile (Days per Year against Production) with Plant Operating Target, Operational Constraints, Plant Availability and Plant Incidents marked.]

[Figure: layers of plant systems: Scheduling & ERP; Manufacturing Execution; DCS/APC/Optimization efforts; Asset Management (Reliability & CMMS), marked NEW EMPHASIS!!]
Major Profit Potential
Emphasis on plant & equipment reliability improvements and reduced incidents can result in a recovery of 3-15% of lost capacity!
• Higher Plant Operating Target
• Fewer Planning Constraints
• Fewer Operational Constraints

The Importance of Alarm Management Improvement Project
Alarm management is the proper design, implementation, operation,
and maintenance of industrial manufacturing plant alarm systems.
Current alarming practices are leading to incidents. The major problems are:
• Alarm floods
• Standing alarms
• Poor configuration of alarms
• Nuisance alarms
Technology exists to significantly contribute to effective alarm systems
and provide good Situation Awareness.
Alarms identified as a contributing factor
A Case
The lightning struck just before 9:00 AM on a Sunday. It immediately started a
fire in the crude distillation unit of the refinery. The control operators on
duty responded by calling out the fire brigade, and then had to divert their
attention to a growing number of alarms while desperately trying to bring the
crude unit to a safe emergency shutdown.
Hydrocarbon flow was lost to the deethanizer in the FCCU recovery section,
which fed the debutanizer further along. The system was arranged to prevent
total loss of liquid level in the two vessels, so the falling level in the
deethanizer caused the deethanizer discharge valve to close. This, in turn,
caused the level in the debutanizer to drop rapidly and its discharge valve
also closed. Heat remained on the debutanizer and the trapped liquid
vaporized as the pressure rose causing the pressure relief valve to “pop” (for
the first of three times) into the flare KO drum and then immediately onto the
flare itself.
In a matter of minutes, the board operator was able to restore flow to
the deethanizer. This permitted the deethanizer discharge valve to
be opened, allowing renewed flow forward to the debutanizer. The
rising level in the debutanizer should have caused the debutanizer
discharge valve to open (by the level controller action) and allow
flow on to the naphtha splitter. Although the operators in the
control room received a signal indicating the valve had opened, the
debutanizer was nonetheless filling rapidly with liquid while the
naphtha splitter was emptying. The operators were concentrating
on the displays which focussed on the problems with the
deethanizer and debutanizer, and had no overview of the process
available to indicate that even though the debutanizer discharge
valve registered as open, there was no flow going from the
debutanizer to the naphtha splitter.
Despite attempts to divert the excess, the debutanizer became liquid-logged
about an hour later and the pressure relief valve lifted for the second time,
venting to the flare via the flare KO drum. Because there were enormous
volumes of gas venting, the level of liquid in the flare KO drum was rising
to a very high value.
About 2-1/2 hours later, the debutanizer vented to the flare a third time AND
CONTINUED VENTING FOR 36 MINUTES. The high level alarm for the flare
drum was activated at this time. But with alarms going off every 2 to 3 seconds,
there appears to be no evidence that that alarm was ever seen. By this time, the flare
KO drum had filled with liquid well beyond its design capacity. The fast-flowing gas
through the overfilled drum forced liquid out of the drum’s discharge pipe. The
discharge line was not designed for liquid, so the force of the liquid caused a rupture
at an elbow. This released over 20 tons of highly flammable hydrocarbon.
The ensuing release quickly formed an ominous
drifting cloud of vapor and droplets. In a matter
of minutes, this cloud found its ignition source
350 feet downwind. The resulting explosion was
heard 80 miles away. In the town nearest the
plant, few windows still held intact panes, so
overpowering was the pressure shock wave from
the blast. The last fires in the refinery were
eventually extinguished 2 days later.
Interface between the organization & the individual
[Diagram: a chain running from Management (Organization) to Workplace (Individual), in four stages:]
• Source Failure Types: Stylistic or Cultural Indicators; Top Down: Commitment, Competence, Cognizance
• Functional Failure Types: General Failure Types; Accidents, Incidents, Near-Misses; 1-10 hit list; data collected & analyzed; Proactive Design; SI Projects; Safety Information System; Diagnostic and remedial measures; Best Practices
• Condition Tokens (Precursors): Poor workplace design; High workload; Unsociable hours; Inadequate training; Poor perception of hazards; Alarms; Human Factors; Control room design
• Unsafe Acts (Errors & Violations): Near miss Auditing; Du Pont Training; Workspace; Motivation; Attitude; Group Factors; Working Practice
Managing Abnormal Situations
Anatomy of a Disaster from Operations Perspective
[Diagram with five columns, each listed here from the most severe level down to normal operation:]
• Plant States: Disaster; Accident; Out of Control; Abnormal; Normal
• Operational Modes: Emergency; Abnormal; Normal
• Critical Systems: Area Emergency Response System; Site Emergency Response System; Physical and Mechanical Containment System; Safety Shutdown, Protective Systems, Hardwired Emergency Alarms; DCS Alarm System, Decision Support System; Process Equipment, DCS, Automatic Controls, Plant Management Systems
• Operational Goals: Minimize Impact; Bring to Safe State; Return to Normal; Keep Normal
• Plant Activities: Firefighting, First Aid, Rescue; Evacuation; Manual Control & Troubleshooting; Preventative Monitoring & Testing
Unexpected Upsets Cost 3-8% of Capacity
[Figure: histograms of # Days (Frequency) against Total Feed Rate and against Production rate, with 1σ and 2σ bands marked and the annotations 3.2%, 5.8%, $24.2M, $33.5M and $38.5M, leading to Summarized Production Data (Days per Year) shown against the Plant Operating Target and Operational Constraints.]
~ $10 Billion annually in lost production!
Major Profit Potential
• Higher Plant Operating Target
• Fewer Planning Constraints
• Fewer Operational Constraints
Focused optimization efforts can result in recovery of 3-8% of capacity
~ $10 Billion potential to the bottom line!

Timing diagram of DIN V 19251 as applicable
for a single channel SRS with ultimate self tests
executed within the PST
[Timeline along time t:]
• Events in sequence: Failure Occurrence in the Process or in the Safeguarding System; Failure is Detected; Safe status of the Process assured
• Intervals in sequence: System internal diagnostic time; Time for corrective action; Time for reaction of the Process on the corrective action
• Overall span: Fault Tolerance Time (fault tolerance time of the process, or Process Safety Time (PST))
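The timing diagram reduces to a simple budget: the three intervals must add up to no more than the Process Safety Time. A minimal sketch of that check follows; the class name, field names and example times are illustrative assumptions, not values from DIN V 19251.

```python
from dataclasses import dataclass


@dataclass
class SafeguardTiming:
    """Intervals in seconds; names and values are illustrative assumptions."""
    diagnostic_time: float         # system internal diagnostic time
    corrective_action_time: float  # time for corrective action
    process_reaction_time: float   # time for the process to react to the action

    def total_response_time(self) -> float:
        # Time from failure occurrence until the safe status is assured
        return (self.diagnostic_time
                + self.corrective_action_time
                + self.process_reaction_time)

    def fits_within(self, process_safety_time: float) -> bool:
        # The safe state must be reached within the fault tolerance time (PST)
        return self.total_response_time() <= process_safety_time


timing = SafeguardTiming(diagnostic_time=2.0,
                         corrective_action_time=5.0,
                         process_reaction_time=10.0)
print(timing.total_response_time())  # 17.0
print(timing.fits_within(30.0))      # True
print(timing.fits_within(15.0))      # False
```

If the check fails, one of the intervals has to shrink (for example faster diagnostics) or the process has to be made more tolerant of the fault.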

Reliability Requirements for Alarms
Claimed PFDavg: 1 – 0.1
• Alarm system integrity/reliability requirements: Alarms may be integrated into the process control system
• Human reliability requirements: No special requirements; however, the alarm system should be operated, engineered and maintained to the good engineering standards identified in the EEMUA Guide
Source: EEMUA Alarm Systems Guide, page 17

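CONCEPT 1 : RISK REDUCTION
[Figure: risk axis (Increasing Risk) marking, in order of increasing risk, the Actual remaining risk, the Risk to meet required Level of Safety, and the EUC Risk. The Necessary minimum risk reduction [∆R] spans from the EUC Risk down to the risk that meets the required level of safety; the Actual risk reduction spans from the EUC Risk down to the actual remaining risk. The risk reduction achieved by all SRSs & External Risk Reduction Facilities is made up of the partial risk covered by E/E/PES SRSs, the partial risk covered by Other Technology SRSs, and the partial risk covered by External Risk Reduction Facilities.]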
SAFETY INTEGRITY LEVELS
TABLE 2: SAFETY INTEGRITY LEVELS: TARGET FAILURE MEASURES

SAFETY INTEGRITY   DEMAND MODE OF OPERATION           CONTINUOUS/HIGH DEMAND MODE OF OPERATION
LEVEL (SIL)        (Average Probability of failure    (Average Probability of a dangerous
                   to perform its design function     failure per year)
                   on demand)
4                  10^-5 to < 10^-4                   10^-5 to < 10^-4
3                  10^-4 to < 10^-3                   10^-4 to < 10^-3
2                  10^-3 to < 10^-2                   10^-3 to < 10^-2
1                  10^-2 to < 10^-1                   10^-2 to < 10^-1
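The demand-mode column of the table can be read as a lookup from an average probability of failure on demand to a SIL. A minimal sketch, assuming each band includes its lower bound and excludes its upper bound as the "to <" notation suggests; the function and constant names are illustrative.

```python
from typing import Optional

# (SIL, lower bound inclusive, upper bound exclusive), demand mode of operation
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def sil_for_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose band contains pfd_avg, or None if outside the table."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


print(sil_for_pfd(0.05))   # 1
print(sil_for_pfd(0.005))  # 2
print(sil_for_pfd(0.5))    # None
```

A result of None means the value falls outside the four bands, for example a PFDavg of 0.1 or higher, which earns no SIL.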

Reliability requirements for alarms
Claimed PFDavg: 0.1 – 0.01
• Alarm system integrity/reliability requirements:
– Alarm system should be designated as safety related & categorized as SIL 1
– Alarm system should be independent from the process control system
• Human reliability requirements:
– The operator should be trained in the management of the specific plant failure that the alarm indicates
– The alarm presentation arrangements should make the claimed alarm very obvious to the operator and distinguishable from other alarms
– The alarm should remain on view to the operator for the whole of the time it is active
Reliability requirements for alarms
Claimed PFDavg: Below 0.01
• Alarm system integrity/reliability requirements:
– Alarm system would have to be designated as safety related and categorized as at least SIL 2
• Human reliability requirements:
– It is not recommended that claims for a PFDavg below 0.01 are made for any operator action, even if it is multiple alarmed and very simple
For all credible accident scenarios the designer should demonstrate that the total number of safety related alarms and their maximum rate of presentation does not overload the operator
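The three slides on reliability requirements amount to a banding of the claimed PFDavg. A minimal sketch of that banding follows; the type and field names are illustrative, and the handling of the exact boundary values (0.1 and 0.01) is an assumption, since the slides do not say which band they belong to.

```python
from typing import NamedTuple, Optional


class AlarmClaim(NamedTuple):
    safety_related: bool                # must the alarm system be designated safety related?
    minimum_sil: Optional[int]          # None where no SIL is required
    independent_of_pcs: Optional[bool]  # None where the slides do not say
    claim_recommended: bool             # False where the guide advises against the claim


def alarm_requirements(claimed_pfd_avg: float) -> AlarmClaim:
    """Map a claimed PFDavg to the band of requirements on the slides."""
    if not 0.0 < claimed_pfd_avg <= 1.0:
        raise ValueError("claimed PFDavg must be in (0, 1]")
    if claimed_pfd_avg >= 0.1:    # band "1 - 0.1" (boundary placement assumed)
        return AlarmClaim(False, None, False, True)
    if claimed_pfd_avg >= 0.01:   # band "0.1 - 0.01" (boundary placement assumed)
        return AlarmClaim(True, 1, True, True)
    return AlarmClaim(True, 2, None, False)  # band "below 0.01"


print(alarm_requirements(0.5))
print(alarm_requirements(0.05))
print(alarm_requirements(0.005))
```

The last band returns claim_recommended=False because the guide advises against claiming a PFDavg below 0.01 for any operator action.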

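The Setting of a high pre-trip alarm
[Figure: alarmed variable against time during a fault, rising at the maximum rate of change of the alarmed variable during the fault. Marked levels: limit of largest normal operational fluctuation; Alarm Setting (point A); limit at which protection operates (point B). The band between A and B is the Abnormal Operating Region; the time taken to cross it is the time for the operator to respond to the alarm and correct the fault.]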
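The figure implies two constraints on a high alarm setting: it should sit above the largest normal fluctuation, and far enough below the trip limit that the operator has time to act at the maximum rate of change. A minimal sketch of both checks, assuming a constant worst-case rate of change; the function names, messages and numbers are illustrative.

```python
from typing import List


def latest_alarm_setting(trip_limit: float, max_rate: float,
                         response_time: float) -> float:
    """Highest setting that still leaves the operator response_time before the trip."""
    return trip_limit - max_rate * response_time


def review_high_alarm(alarm_setting: float, trip_limit: float,
                      normal_limit: float, max_rate: float,
                      response_time: float) -> List[str]:
    """Return a list of findings; an empty list means both constraints are met."""
    findings = []
    if alarm_setting <= normal_limit:
        findings.append("inside normal fluctuations: expect spurious alarms")
    if alarm_setting > latest_alarm_setting(trip_limit, max_rate, response_time):
        findings.append("too close to trip: operator may not have time to respond")
    return findings


# Trip at 100 units, worst-case rise 0.5 units/s, operator needs 60 s
print(latest_alarm_setting(100.0, 0.5, 60.0))           # 70.0
print(review_high_alarm(65.0, 100.0, 60.0, 0.5, 60.0))  # []
print(review_high_alarm(80.0, 100.0, 60.0, 0.5, 60.0))
print(review_high_alarm(55.0, 100.0, 60.0, 0.5, 60.0))
```

When the limit of normal fluctuation lies above the latest acceptable setting, no setting satisfies both checks, so either the plant margins or one of the two constraints has to give.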
[Figure: Gas Concentration (Percentage of LEL, 0 to 120) against Time after onset of fault (Seconds, 0 to 80). Curves: Actual Gas Concentration and Measured Gas Concentration, both rising from the gas concentration prior to fault (normal operating level). Marked levels: Set trip point, Actual trip point (the gap between them is the Level Error), Lower Explosive Limit (LEL) at 100, Explosion. Time intervals after Fault Occurs: Sampling Delay, Sensor Delay, Error Delay, Shut Down System Delay.]
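The gas-detection figure shows why the concentration actually reached is higher than the set trip point: the gas keeps rising through every delay in the chain. A rough sketch of that arithmetic follows, under the simplifying assumption of a constant rise rate after the set point is crossed; the function name, delay values and rates are illustrative and are not read off the figure.

```python
from typing import Sequence


def concentration_at_shutdown(set_trip_point: float, rise_rate: float,
                              delays: Sequence[float]) -> float:
    """Actual gas concentration (% of LEL) when the shutdown takes effect.

    Assumes the actual concentration crosses the set trip point and then keeps
    rising linearly at rise_rate (% LEL per second) through all delays.
    """
    return set_trip_point + rise_rate * sum(delays)


LEL = 100.0
# Sampling, sensor, error and shutdown system delays in seconds (assumed values)
delays = [10.0, 10.0, 10.0, 20.0]

slow = concentration_at_shutdown(60.0, 0.5, delays)
fast = concentration_at_shutdown(60.0, 1.0, delays)
print(slow, slow < LEL)  # 85.0 True
print(fast, fast < LEL)  # 110.0 False
```

With the same set point and delays, doubling the rise rate moves the outcome from a 15-point margin below the LEL to a value above it, which is the point the figure makes about choosing the set trip point.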
Redesign Choices
• Redesign - redesigning the plant or its controls to provide greater margin between the normal
operating limits & the trip limits. This is the most desirable solution but is often
impractical or too expensive;
• Setting within normal operating limits - setting the alarm within the limits of normal
operating fluctuations & accepting that spurious alarms will occur during large normal
disturbances. This is ergonomically very undesirable and will tend to increase alarm rates
and reduce the operator's confidence in the alarm system. In effect it increases the Average
Probability of Failure on Demand (PFDavg) for the alarm system as a whole;
• Setting nearer trip limits - setting the alarm closer to the trip limits and accepting that some
fast transients will not be corrected by the operator before they reach the trip level. This
will increase the production losses due to plant trips and, because there are more demands
on the protection system, tend to make the plant less safe. It also implies an increased
PFDavg for the alarm system.

Different Kinds of Events
[Figure: Potential Impact of Initiating Event against Time for three kinds of event: Abrupt/Catastrophic, Manageable and Insidious.]
Impact of DCS Alarm System
Awareness of Disturbances
With typical alarm systems, orienting begins after an event creates an abnormal plant state. The extent of the problem can impact the operator's ability to be fully aware of the locations of process disturbances.
As disturbances propagate, the number of conditions to be aware of increases, as do the response requirements and the likelihood of missing important information.
[Figure: Potential Impact of Initiating Event against Time, rising toward Incident. Marked along the curve: Failure Occurrence in the Process or in the Safeguarding System; Failure is Detected; Point of operator awareness; Safe status of the Process assured. Correct intervention causes return to normal.]

Impact of DCS Alarm System
Management of Problems
• Standing Alarms interfere with Orientation
• Alarm Floods delay Evaluation
• Inadequate filtering interferes with Action
[Figure: Potential Impact of Initiating Event against Time, rising toward Incident, with the Point of operator awareness marked. Correct intervention causes return to normal.]

Impact of Good Alarm Management in Situation Awareness
• Increases likelihood of awareness of disturbances
• Reduces time to awareness
• Hence, reduces the average impact of initiating events
[Figure: Potential Impact of Initiating Event against Time, showing the average shift in awareness with decision support.]

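Impact of Protection System
[Figure: Potential Impact of Initiating Event against Time. Impact levels marked: Profit, Quality, Loss and Incident, with SAFE and UN-SAFE regions. Responses marked in order of escalation: High Alarm, Emergency Alarm, Trip (Trip from SIS). Time spans: Operator diagnostic time, FTT, Process Safety Time. FTT = Fault Tolerance Time.]
[Figure: Potential Impact of Initiating Event against Time for four operator responses: No response, Incorrect, Suboptimal and Best.]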
Impact of Decision Support System
Support for Optimal Response
• Reduces errors
• Decreases time to implement response
• Manages side effects
• Increases awareness
[Figure: Potential Impact of Initiating Event against Time.]
ASM Alarm Management Solutions
• Education for Management, Engineers, Technicians and Operators.
• Alarm Performance Assessment.
• Requirement for alarm optimization tools.
• Alignment with Company & EEMUA Guidelines.
• Alarm Rationalization.
• User Interface Design.
• Decision Support Activities.
Alarm Management Optimization
Objectives
• Enhance operator effectiveness
– Avoid alarm floods
– Identify root causes
– Eliminate nuisance alarms
• Enhance profitability
– Reduce variability
– Maximize plant uptime
– Prevent damage to equipment
• Reduce risk of:
– Injury to personnel
– Environmental incidents
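Avoiding alarm floods starts with being able to detect one in the alarm journal. A minimal sketch: find the largest number of alarms falling in any sliding time window and compare it with a threshold. The threshold of 10 alarms in 10 minutes is an assumption chosen for illustration, not a figure from these slides, and the function names are hypothetical.

```python
from typing import Sequence


def max_alarms_in_window(timestamps: Sequence[float], window_s: float) -> int:
    """Largest number of alarms whose timestamps fall within any window_s span."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Advance the left edge until the window [t - window_s, t] holds
        while t - times[start] > window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


def is_flood(timestamps: Sequence[float], window_s: float = 600.0,
             threshold: int = 10) -> bool:
    # Threshold and window are assumed values; set them from the plant philosophy
    return max_alarms_in_window(timestamps, window_s) > threshold


# One alarm every 2.5 s, comparable to the rate described in the case above
upset = [i * 2.5 for i in range(100)]
quiet = [0.0, 700.0, 1400.0]
print(max_alarms_in_window(upset, 600.0), is_flood(upset))  # 100 True
print(max_alarms_in_window(quiet, 600.0), is_flood(quiet))  # 1 False
```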
The Process
[Figure: a cycle of steps arranged around the central task "Develop Plant Alarm Management Standards & Philosophy": Collect Data, Analyze, Identify, Implement Enhancements, Verify Against Standards, Change Management.]
Alarm Management
• Increase the effectiveness of the existing alarm system through proven methodology
– Analyze existing system performance
– Assist in developing an alarm strategy and educating operations staff
– Rationalize existing alarm system
• Recommend and apply new alarm management software
– UserAlert
– Optimization Suite
• Alarm Rationalization and Documentation
• Alarm Metrics and Analysis
• Advanced Alarm Handlers
[Charts: Before - 30 Points Account for ~ 85 % of All Alarms (axis label 100 K); After - 30 Points Account for ~ 52 % of All Alarms (axis label 2 K).]
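The before/after figures (30 points accounting for about 85% and then about 52% of all alarms) are a top-N share of the alarm journal. A minimal sketch of how such a share could be computed, assuming the journal is available as a sequence of point tags; the tag names and counts are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def top_n_share(alarm_tags: Iterable[str], n: int = 30) -> float:
    """Fraction of all alarms raised by the n most frequently alarming points."""
    counts = Counter(alarm_tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Hypothetical journal: one entry per alarm, identified by its point tag
journal = ["FI101"] * 6 + ["LI202"] * 3 + ["PI303"] * 1
print(top_n_share(journal, n=1))  # 0.6
print(top_n_share(journal, n=2))  # 0.9
```

A high share concentrated in a few points suggests that rationalizing those points first will remove most of the alarm load.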
Optimization Suite…
Alarm Rationalization
• Alarm priority (class) is based on severity and
level of impact and on time to respond
• Available priority options in TPS:
– No Action
– Journal
– Print
– Print & Journal
– Low
– High
– Emergency
Optimization Suite…
Alarm Rationalization
• Recommends alarm priorities based on plant
philosophy
– Severity of impact
– Time to respond
– Trip Point
• Electronically captures plant alarm
management philosophy
– Time to respond rules definition
– Impact and severity rules definition
• Apply manual priority override
• Use Alarm Impact Templates
• Generate EC Files (Honeywell)
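The rule-based recommendation described above can be pictured as a lookup on severity of impact and time to respond, with a manual override on top. The sketch below is not the Optimization Suite's logic: the severity classes, time bands and grid entries are hypothetical, and only the priority names come from the TPS list.

```python
from typing import Optional

# Priority names from the TPS options listed earlier
TPS_PRIORITIES = ("No Action", "Journal", "Print", "Print & Journal",
                  "Low", "High", "Emergency")

# Hypothetical plant philosophy: (severity, time class) -> priority
PRIORITY_GRID = {
    ("minor", "long"): "Journal",  ("minor", "medium"): "Low",
    ("minor", "short"): "Low",
    ("major", "long"): "Low",      ("major", "medium"): "High",
    ("major", "short"): "High",
    ("severe", "long"): "High",    ("severe", "medium"): "Emergency",
    ("severe", "short"): "Emergency",
}


def time_class(seconds_to_respond: float) -> str:
    """Hypothetical time-to-respond rule: under 5 min, under 30 min, longer."""
    if seconds_to_respond < 300:
        return "short"
    if seconds_to_respond < 1800:
        return "medium"
    return "long"


def recommend_priority(severity: str, seconds_to_respond: float,
                       override: Optional[str] = None) -> str:
    """Recommend a priority from the grid unless a manual override is given."""
    if override is not None:
        if override not in TPS_PRIORITIES:
            raise ValueError(f"unknown priority: {override}")
        return override
    if severity == "none":
        return "No Action"
    return PRIORITY_GRID[(severity, time_class(seconds_to_respond))]


print(recommend_priority("severe", 60))                    # Emergency
print(recommend_priority("minor", 3600))                   # Journal
print(recommend_priority("major", 600, override="Low"))    # Low
```

Keeping the grid as data rather than code is one way to capture the plant alarm philosophy in a form that can be reviewed and changed without touching the logic.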
